Staff AI Systems Engineer
10xEngineers
Location: Onsite – Lahore
We are looking for a Staff AI Systems Engineer whose primary domain is AI model algorithms and optimizations, but who can follow a problem all the way down the stack — through compilers, kernels, firmware, and onto custom silicon — when the work demands it.
You are first and foremost an algorithms and optimization person. But you don't stop at the model layer when there's a systems problem blocking progress. You understand enough of the stack below you to diagnose, collaborate, and contribute at every level.

Responsibilities:
Own model-level algorithm research and optimization — including inference techniques such as quantization, sparsity, attention variants, KV cache strategies, and memory bandwidth optimization
Evaluate and integrate state-of-the-art developments in AI inference and language modeling, translating research advances into practical gains on custom hardware
When algorithms hit hardware limits, go deeper — trace bottlenecks through the compiler, kernel, and firmware layers and drive solutions in collaboration with the relevant teams or directly when needed
Work with ASIC and hardware design teams to ensure AI workloads are efficiently mapped to custom silicon, providing algorithm-level insight that shapes hardware and compiler roadmaps
Build internal tooling and frameworks that allow the broader team to experiment with and deploy optimized models on proprietary hardware
Jump across team boundaries when needed — if something is broken or blocked, you help fix it regardless of where it sits organizationally

Requirements:
5–8 years of experience in AI systems or ML engineering, with a strong primary focus on model architectures and inference optimization
Deep hands-on knowledge of inference optimization techniques — quantization, sparsity, PEFT, speculative decoding, and related methods
Ability to work through the stack when needed — practical familiarity with compilers (MLIR, LLVM, or equivalent), kernel development, and hardware-software interfaces on AI accelerators
Experience working with or around custom ASICs — understanding how hardware architecture decisions affect model-level performance and how to adapt algorithms accordingly
Strong programming skills in Python and C/C++
Ability to communicate across disciplines — you can go deep on algorithms with an AI researcher and engage meaningfully with a compiler or chip architect

Preferred experience:
High-performance ML systems — designing or optimizing systems where throughput, latency, and efficiency are first-class constraints
GPU/accelerator programming — CUDA, ROCm, or vendor-specific accelerator SDKs at a kernel or driver level
ML framework internals — deep familiarity with PyTorch, JAX, or similar frameworks beyond the user-facing API
OS internals — understanding of scheduling, memory management, and system calls as they relate to AI workload performance
Language modeling with transformers — practical experience working with large language models, attention mechanisms, and their computational characteristics

What this role is not:
Not a DevOps / SRE / cloud infrastructure role
Not focused on dashboards, business intelligence (BI), or generic data science
Not suited for candidates with primarily freelance or short-term project work
Prior experience training models on custom datasets and porting them to hardware through standard SDK calls or APIs may not be enough to meet the demands of this role
Applications will be reviewed on a rolling basis.

About 10xEngineers:
At 10xEngineers, we build the systems and infrastructure that bring machine learning algorithms to life on both standard and custom hardware. Our work spans the complete ML inference stack — from deeply understanding model architectures and algorithmic optimizations, through serving, compilation, and kernel development, all the way down to precise mapping on the hardware itself.
We don't specialize in just one layer. We own the full picture, and that means every engineer here has the opportunity — and the expectation — to think across abstractions, connect dots others miss, and solve problems that don't fit neatly into a job description.
If you're energized by hard engineering challenges, comfortable operating at multiple levels of the stack, and want to work where AI research meets real silicon — this might be exactly where you belong.