Member of Technical Staff - Voice Model
As a Member of Technical Staff on the Grok Voice Model team at xAI, you will contribute to developing advanced voice AI systems that deliver smooth, natural, and low-latency spoken interactions. This role involves working within a highly motivated team dedicated to engineering excellence, aiming to create AI systems that accurately understand the universe and assist humanity in its pursuit of knowledge.
Your primary responsibilities will include designing and executing large-scale speech data curation and processing pipelines, encompassing the collection of diverse real-world audio, synthetic data generation, and automated annotation workflows that support high-quality model training and evaluation. You will also pre-train and post-train speech-language models, applying supervised fine-tuning, reinforcement learning, and other techniques so that Grok Voice responses are accurate, factually grounded, natural in spoken style, conversational in tone, and fluent across multiple languages. Additionally, you will build and iterate on comprehensive evaluation frameworks covering objective metrics, human preference studies, content factuality assessments, real-time interaction quality, and experimentation infrastructure to measure and improve performance. You will also collaborate with product teams to integrate voice models into applications and real-time environments, define spoken interaction specifications, and manage the full lifecycle from prototype to global-scale deployment, delivering stable, low-latency, and delightful voice experiences.
The ideal candidate will possess deep proficiency in Python, with a strong track record of writing clean, efficient code for AI/ML systems. Hands-on experience processing large-scale datasets with tools like Spark and Ray for cleaning, augmentation, and feature extraction is essential. Proficiency in pre-training and post-training speech-language models with JAX/PyTorch is required, including supervised fine-tuning, reinforcement learning, and optimizations for accuracy, factuality, natural spoken style, detail, and multilingual fluency. You should also be able to set up and run rigorous evaluation pipelines, including objective metrics, human preference studies, content factuality checks, and iterative A/B testing, to drive model improvements. Experience building or working with large-scale distributed training and inference systems on Kubernetes is preferred. A proactive, self-driven attitude and a readiness to thrive in a fast-paced, high-caliber team delivering outstanding voice AI experiences are highly valued.
The position offers a competitive base salary ranging from $150,000 to $450,000 USD. In addition to the base salary, xAI provides equity; comprehensive medical, vision, and dental coverage; access to a 401(k) retirement plan; short- and long-term disability insurance; life insurance; and various other discounts and perks.
At xAI, we foster a culture of curiosity, engineering excellence, and hands-on contribution. Our flat organizational structure empowers employees to take initiative and deliver excellence. We value strong communication skills and the ability to concisely and accurately share knowledge with teammates. Joining our team offers growth opportunities in a dynamic environment dedicated to advancing AI systems that aid humanity in its pursuit of knowledge.