Multimodal AI Systems Architect (AI Engineering)

🇦🇺 Oregon, Australia
Posted 3 weeks ago
Expires June 23, 2026

Hyphen Connect is seeking a Multimodal AI Systems Architect to develop and optimize AI systems that integrate vision and audio models. The role focuses on improving voice-to-voice interactions and multimodal retrieval capabilities, with an emphasis on efficient, low-latency systems.

The primary responsibilities include integrating vision encoders and audio-native models into core agent reasoning loops, optimizing streaming latency for voice-to-voice AI interactions, and architecting multimodal retrieval-augmented generation (RAG) systems capable of extracting insights from videos and PDFs.

Candidates should have experience with Whisper, CLIP, and multimodal large language model (LLM) integration. Knowledge of streaming architectures and WebRTC is essential, along with expertise in cross-modal alignment.

Hyphen Connect offers a collaborative work environment that values innovation. Employees have opportunities for professional growth and development while working on cutting-edge AI technologies that have a significant impact on the industry.
