Job Description
Software Engineer - Kernel Developer
Mountain View, CA
Acceler8 Talent is seeking an experienced Software Engineer with deep kernel development experience to join a well-funded startup whose hardware promises to drastically change the economics of compute for the world's largest models.
With a $500M Series B in the bank and a world-class team with a track record of shipping highly successful products, this company abandons legacy chip design assumptions and strives for the best possible solution for every aspect of its chip; there is no such thing as good enough.
As a Kernel Engineer, you will be responsible for designing and optimizing performance-critical kernels that interface directly with custom AI hardware. You will work closely with ML Research and Hardware Engineering teams, providing a programmer’s perspective on hardware architecture and ensuring tight integration across the software stack.
Responsibilities:
Design, implement, and optimize high-performance kernels that interface directly with custom AI hardware
Partner closely with ML Research and Hardware Engineering teams to translate algorithmic intent into efficient kernel implementations
Provide architectural feedback and guidance from a programmer’s perspective to influence hardware and system design decisions
Optimize kernels using techniques such as parallelism, SIMD/vectorization, low-level memory optimization, and instruction-level tuning
Support performance analysis, profiling, and debugging across kernels, runtime, and hardware
Requirements:
Bachelor’s degree in Computer Science or equivalent practical experience
Experience optimizing software for specialized or accelerator hardware, including techniques such as parallel programming, SIMD, low-level C/C++, assembly-level optimization, or GPU/CUDA programming
Proficiency in at least one of: Assembly, C, C++, Zig, or Rust
Strong understanding of performance bottlenecks across compute, memory, and data movement
Preferences:
Experience implementing kernels for ML workloads, including models such as Transformers
Familiarity with distributed and parallel execution models, including AllReduce, AllToAll, data parallelism, and tensor parallelism
Working knowledge of compiler fundamentals and how code is lowered, optimized, and executed on modern hardware
If you're interested in building the future of AI compute, apply here or reach out to me at ltomaszko@acceler8talent.com to discuss further.
Ready to Apply?
Don't miss this opportunity! Apply now and join our team.
Job Details
Posted Date: March 2, 2026
Job Type: Technology
Location: Mountain View, California, 94039, United States
Company: Acceler8 Talent