Lead Lab Support Engineer
As a Lead Lab Support Engineer at Graphcore, you will serve as the technical lead providing IT support to Engineering Labs and Silicon development projects. This includes bring-ups and testing in hybrid Linux and Windows environments. You will deliver reliable, scalable, and high-quality services to internal customers, partners, and collaborators. You will also lead a small team of Lab Support Engineers and encourage a culture of accountability, collaboration, and continuous improvement.
Working alongside Project Managers, engineering teams, and other stakeholders, you will build and deploy scalable workflows and ensure that IT support for Engineering Labs keeps pace with the organisation’s evolving requirements. Initially, the role requires direct involvement in daily support, configuration, and onboarding activities while you form the Lab Support team and set up scalable processes.
Key responsibilities include logging and resolving support requests, both face-to-face and via a ticketing system; ensuring high-quality L1 and L2 support through effective triage, prioritisation, and allocation of tickets; and establishing, managing, and continuously refining the Lab Support service, covering standards, processes, and governance. You will serve as the primary point of escalation and decision-making for IT service delivery. You will also support Linux-based and Windows-based systems, manage a fleet of servers, assist Hardware Lab teams with daily activities, install and troubleshoot servers, perform hardware maintenance and fault finding, and document solutions in a clear, up-to-date internal knowledge base.
The ideal candidate will possess excellent communication and customer service skills, a good understanding of troubleshooting principles and methodical problem-solving, strong Linux administration skills in Debian and Red Hat derivatives, good Windows administration skills including Active Directory domain joining and policies, and good networking knowledge covering VLANs, VPNs, Wi-Fi, routing, and subnetting. Familiarity with desktop and server hardware, BMCs, out-of-band networks, firmware and BIOS upgrades, PDUs, and rack mounts is essential. Experience in managing Infrastructure-as-Code using Puppet, Ansible, or similar via Git, and an understanding of authentication services such as LDAP and RADIUS, are also required.
Desirable qualifications include experience in managing web servers, load balancers, and reverse proxies (e.g., HAProxy, nginx); the ability to identify network, storage, CPU, and RAM bottlenecks across complex workloads; experience with monitoring solutions and stacks (e.g., Zabbix, Prometheus, Grafana Mimir, OpenTelemetry); and proficiency with containerisation frameworks and orchestration (e.g., Docker, containerd, Kubernetes). Python programming skills, with the ability to write code to interact with APIs, process data, and build small applications, and experience with CI/CD pipelines via GitLab or GitHub are also desirable.