Engineering Manager - ML Platform and Infrastructure

🇺🇸 Sunnyvale, California
$204K - $343K Annual
Posted 3 months ago
Expires June 9, 2026
Full Time, On-site, Engineering, Operations

As an Engineering Manager on the Machine Learning (ML) Platform team at Applied Intuition, you will lead a team of engineers dedicated to building the infrastructure that powers Physical AI at scale. Your team will focus on three critical areas: Training & Inference Orchestration, GPU Cluster Architecture, and Performance Optimization. This role involves close collaboration with research and development teams to accelerate the transition from experimentation to production.

Key responsibilities include growing and managing a team of infrastructure and systems engineers to deliver a best-in-class ML platform. You will own the design and evolution of frameworks for orchestrating distributed training and inference jobs across thousands of GPUs. You will also drive the buildout and scaling of GPU cluster infrastructure, making critical decisions on architecture, scheduling, networking, and resource management. Finally, you will lead efforts to optimize training and inference performance, including throughput, fault tolerance, GPU utilization, and cost efficiency at scale.

The ideal candidate will have over three years of engineering management experience, preferably leading infrastructure or platform teams. A deep understanding of distributed systems, GPU computing, and large-scale ML infrastructure is essential. Direct experience building or operating large GPU clusters (1,000+ GPUs) and a strong grasp of distributed training frameworks such as PyTorch Distributed, Megatron-LM, DeepSpeed, or FSDP are required. Familiarity with GPU cluster management, high-performance networking (InfiniBand, RDMA), and resource scheduling tools like Slurm or Kubernetes is also important.

Compensation for this full-time position in Sunnyvale, California, ranges from $204,000 to $343,000 annually. The total compensation package includes base salary, equity in the form of options and/or restricted stock units, comprehensive health, dental, vision, life, and disability insurance coverage, 401(k) retirement benefits with employer match, learning and wellness stipends, and paid time off. Benefits are subject to change and may vary based on the jurisdiction of employment.

Applied Intuition fosters a collaborative and mission-driven work environment, emphasizing innovation and excellence in the field of Physical AI. Employees are expected to work primarily from the office five days a week, with flexibility for occasional remote work to accommodate personal commitments. This role offers significant opportunities for professional growth and the chance to contribute to cutting-edge advancements in AI infrastructure.
