Synthetic Data Engineer (AI Data/Training)
The Synthetic Data Engineer (AI Data/Training) will join Hyphen Connect, a forward-looking company focused on applying artificial intelligence to data solutions. The role sits within a team dedicated to creating and optimizing the synthetic data pipelines that underpin machine learning model development and performance. In this position, you will contribute to the foundation of innovative AI products, working closely with data science and engineering colleagues in a fast-paced, collaborative environment.
Day-to-day responsibilities include designing and building scalable architectures for generating synthetic datasets, curating and anonymizing data, and rigorously testing data output for accuracy and bias. You will work with state-of-the-art tools and frameworks to improve data diversity, quality, and privacy, ensuring that datasets enable robust machine learning training and evaluation. You will also collaborate with product, engineering, and machine learning teams to deliver data-driven results and troubleshoot technical issues.
The ideal candidate will have extensive experience in data engineering and synthetic data generation, along with familiarity with cloud platforms. Proficiency in Python and relevant libraries and a strong understanding of AI/ML concepts are critical. A background in handling large-scale datasets, knowledge of privacy-preserving techniques, and demonstrated problem-solving skills are highly desirable. Experience with data pipelines, APIs, and data security best practices is a plus.
Hyphen Connect offers a competitive compensation package that may include salary, equity, and comprehensive health benefits. Team members enjoy flexible working arrangements, wellness programs, and access to professional development resources.
The company fosters an innovative and supportive culture, valuing diversity and continuous learning. Employees are encouraged to take initiative, contribute to open discussions, and grow within a rapidly evolving AI and data-driven landscape.