Staff AI Systems Engineer

10xEngineers
Experience: 6–9 years

Location: Onsite – Lahore 

About the Role

We are looking for a Staff AI Systems Engineer whose primary domain is AI model algorithms and optimizations, but who can follow a problem all the way down the stack — through compilers, kernels, firmware, and onto custom silicon — when the work demands it.

You are first and foremost an algorithms and optimization person. But you don't stop at the model layer when there's a systems problem blocking progress. You understand enough of the stack below you to diagnose, collaborate, and contribute at every level.

What You'll Do
  • Own model-level algorithm research and optimization — including inference techniques such as quantization, sparsity, attention variants, KV cache strategies, and memory bandwidth optimization

  • Evaluate and integrate state-of-the-art developments in AI inference and language modeling, translating research advances into practical gains on custom hardware

  • When algorithms hit hardware limits, go deeper — trace bottlenecks through the compiler, kernel, and firmware layers and drive solutions in collaboration with the relevant teams or directly when needed

  • Work with ASIC and hardware design teams to ensure AI workloads are efficiently mapped to custom silicon, providing algorithm-level insight that shapes hardware and compiler roadmaps

  • Build internal tooling and frameworks that allow the broader team to experiment with and deploy optimized models on proprietary hardware

  • Jump across team boundaries when needed — if something is broken or blocked, you help fix it regardless of where it sits organizationally

What We're Looking For
  • 6–9 years of experience in AI systems or ML engineering, with a strong primary focus on model architectures and inference optimization

  • Deep hands-on knowledge of inference optimization techniques — quantization, sparsity, PEFT, speculative decoding, and related methods

  • Ability to work through the stack when needed — practical familiarity with compilers (MLIR, LLVM, or equivalent), kernel development, and hardware-software interfaces on AI accelerators

  • Experience working with or around custom ASICs — understanding how hardware architecture decisions affect model-level performance and how to adapt algorithms accordingly

  • Strong programming skills in Python and C/C++

  • Ability to communicate across disciplines — you can go deep on algorithms with an AI researcher and engage meaningfully with a compiler or chip architect
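To give a flavor of the inference-optimization work named above, here is a minimal sketch of symmetric per-tensor int8 weight quantization, one of the listed techniques. The helper names are illustrative only and do not come from any particular codebase.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Round-trip a random weight tensor and check the worst-case error,
# which is bounded by half the quantization step (scale / 2).
rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_int8(w)
max_err = float(np.abs(w - dequantize(q, s)).max())
assert max_err <= s / 2 + 1e-6
```

Production schemes are more involved (per-channel scales, zero points, calibration), but this is the core float-to-integer mapping that trades precision for memory bandwidth.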

Strong Candidates May Also Have Experience With
  • High-performance ML systems — designing or optimizing systems where throughput, latency, and efficiency are first-class constraints

  • GPU/accelerator programming — CUDA, ROCm, or vendor-specific accelerator SDKs at a kernel or driver level

  • ML framework internals — deep familiarity with PyTorch, JAX, or similar frameworks beyond the user-facing API

  • OS internals — understanding of scheduling, memory management, and system calls as they relate to AI workload performance

  • Language modeling with transformers — practical experience working with large language models, attention mechanisms, and their computational characteristics

What This Role Is Not
  • Not a DevOps / SRE / cloud infrastructure role

  • Not focused on dashboards, business intelligence (BI), or generic data science

  • Not suited for candidates whose background is primarily freelance or short-term project work

  • Prior experience training models on custom datasets and deploying them to hardware through standard SDK calls or APIs may not, on its own, be enough to meet the demands of this role

Logistics 
  • Applications will be reviewed on a rolling basis

Why 10xEngineers

At 10xEngineers, we build the systems and infrastructure that bring machine learning algorithms to life on both standard and custom hardware. Our work spans the complete ML inference stack — from deeply understanding model architectures and algorithmic optimizations, through serving, compilation, and kernel development, all the way down to precise mapping on the hardware itself.

We don't specialize in just one layer. We own the full picture, and that means every engineer here has the opportunity — and the expectation — to think across abstractions, connect dots others miss, and solve problems that don't fit neatly into a job description.

If you're energized by hard engineering challenges, comfortable operating at multiple levels of the stack, and want to work where AI research meets real silicon — this might be exactly where you belong.