Software Engineer, Data Infrastructure
The Software Engineer, Data Infrastructure role at Thinking Machines Lab involves designing and operating scalable, fault-tolerant infrastructure to support large language model (LLM) research. The engineer will be part of a high-impact team responsible for developing distributed training pipelines, multimodal data catalogs, and intelligent processing systems that handle petabytes of data. This position is based in San Francisco, California.
Key responsibilities include building high-throughput systems for data ingestion, processing, and transformation, such as training data catalogs, deduplication, quality checks, and search. The role also involves implementing monitoring and alerting to ensure platform reliability and performance, and collaborating with research teams to improve data quality and accelerate training cycles.
Candidates should have a bachelor's degree or equivalent experience in computer science, engineering, or a related field. Proficiency in at least one backend language, such as Python or Rust, is required. Applicants should be fluent in distributed compute frameworks like Apache Spark or Ray and have a strong understanding of cloud infrastructure, data lake architectures, and both batch and streaming pipelines. The ability to work across the technology stack and manage projects end-to-end is essential.
The expected annual salary range for this position is $350,000 to $475,000 USD, depending on background, skills, and experience. Thinking Machines Lab offers generous health, dental, and vision benefits, unlimited paid time off, paid parental leave, and relocation support as needed.
Thinking Machines Lab is committed to advancing collaborative general intelligence, aiming to make AI accessible and customizable for diverse needs and goals. The company fosters a highly collaborative environment that encourages initiative and cross-functional teamwork. This role offers significant opportunities for professional growth and the chance to contribute to cutting-edge AI research and development.