Multimodal AI Systems Architect (AI Engineering)

🇺🇸 Seattle, WA
Posted 3 weeks ago
Expires June 23, 2026

Hyphen Connect is seeking a Multimodal AI Systems Architect to join our AI Engineering team in Seattle, WA. The role focuses on building and optimizing AI systems that integrate vision and audio models to improve our voice-to-voice interactions and multimodal retrieval capabilities.

As a Multimodal AI Systems Architect, you will integrate vision encoders and audio-native models into core agent reasoning loops, optimize streaming latency for voice-to-voice AI interactions, and architect multimodal retrieval-augmented generation (RAG) systems that extract insights from videos and PDFs.

The ideal candidate will have experience with Whisper, CLIP, and multimodal large language model (LLM) integration. Knowledge of streaming architectures and WebRTC is essential, along with expertise in cross-modal alignment.

Hyphen Connect offers a dynamic work environment where innovation and collaboration are highly valued. We provide opportunities for professional growth and development, encouraging our team members to stay at the forefront of AI technology.
