AI Software Engineer (Model Training)
ABOUT THE ROLE
Maincode is training Matilda, the first large language model built and trained from scratch in Australia. Our new compute cluster is live, and we are now scaling the next version.
This role sits directly inside that training stack. You will build the pipelines, infrastructure, and tooling that determine how efficiently Matilda trains, how stable long runs are, and how fast new experiments can be executed. Training runs last days or weeks. Small changes propagate through complex systems. The work requires precision and patience.
We build AI systems from first principles: designing the architectures, running the infrastructure, shaping the training process, and operating the models ourselves. Matilda is not a research prototype. It is a production system, trained at scale and served for open public access.
Maincode operates one of the largest private AI compute environments in Australia, built for a single purpose: training our own models. This is not a role that wraps external APIs or ships user-facing features. You will be working on the systems that train a large language model from scratch.
WHAT YOU WOULD ACTUALLY DO
You will build and maintain the systems that support large-scale model training.
This includes:
- Designing and maintaining distributed training pipelines for large language models
- Building data ingestion and preprocessing systems for large training datasets
- Developing tooling for experiment management, checkpointing, and reproducibility
- Monitoring and debugging long-running training jobs across clusters
- Improving reliability and observability across the training stack
- Optimising training throughput across compute, memory, and data pipelines
- Working closely with researchers to translate experimental ideas into training runs
- Diagnosing failures across infrastructure, training loops, and data pipelines
You wi...