Member of Technical Staff - Training Platform

🇺🇸 San Francisco, California
$2K - $3K Annual
Posted 3 days ago
Expires July 10, 2026
Full Time, Hybrid, Engineering, Product

BUILDING OPEN SUPERINTELLIGENCE INFRASTRUCTURE

Prime Intellect is building the open superintelligence stack - from frontier agentic models to the infrastructure that lets anyone create, train, and deploy them. We aggregate and orchestrate global compute into a single control plane and pair it with the full RL post-training stack: environments, secure sandboxes, verifiable evals, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end reinforcement learning at frontier scale, adapting models to real tools, workflows, and deployment contexts.

We recently raised $15M in funding (taking total funding to $20M), led by Founders Fund with participation from Menlo Ventures and prominent angels including Andrej Karpathy (Eureka Labs, Tesla, OpenAI), Tri Dao (Chief Scientist, Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Hugging Face), Emad Mostaque (Stability AI), and many others.

ROLE IMPACT

You'll help build our hosted training platform - the product that lets users launch LoRA and full fine-tuning runs on managed GPU clusters with a single API call or a few clicks. The role spans the developer-facing platform and the underlying Kubernetes-based training infrastructure that runs the jobs.

CORE TECHNICAL RESPONSIBILITIES

HOSTED TRAINING INFRASTRUCTURE

- Design and operate Kubernetes-based training and inference orchestration across multi-cluster, multi-cloud GPU fleets

- Build and maintain Helm charts that compose trainers, inference servers, environment servers, and supporting services into reproducible "training stacks"

- Develop the Python control-plane agents that watch pods, report run state to the platform, and keep clusters in sync

- Implement scheduling and autoscaling for heterogeneous hardware (H100/H200/B200) using KEDA, LeaderWorkerSet, taints/tolerations, and gang scheduling

- Run a tight GitOps workflow - every change ships through PRs, Helm value...
