Infrastructure Engineer - Observability (APAC)
We are seeking a seasoned infrastructure operations expert who has experience orchestrating high-throughput data services.
Demonstrated high-availability and systems-reliability experience in high-volume data pipeline environments is a big plus.
WHAT YOU'LL OWN
- Collaborate closely with our infrastructure and product teams to establish and enforce org-wide practices for emitting and collecting telemetry across a wide range of internal and external-facing services. This includes contributing to shared documentation, advocating best practices, and helping teams adopt these standards.
- Own and operate the Observability team's Kubernetes infrastructure. You will help define the documentation, operational workflows, and engineering standards that keep our logging, tracing, and metrics systems highly available for internal and external stakeholders.
- Work within the Observability team to apply industry-standard deployment and reliability practices, and develop industry-leading reliability software so that our observability systems never go down for our customers.
- Orchestrate and scale systems such as VictoriaMetrics, OpenTelemetry Collector, and Vector.
WHAT YOU BRING
- 5+ years of experience in a Site Reliability Engineering role
- Experience operating and supporting clustered applications in production environments
- Hands-on experience deploying and managing applications in Kubernetes (k8s) environments
- Working knowledge of PostgreSQL, including administration, performance tuning, and troubleshooting
- Proficiency with at least one Infrastructure as Code (IaC) tool (e.g., Terraform, Pulumi, OpenTofu, or equivalent)
- Experience with telemetry tooling such as OpenTelemetry, VictoriaMetrics, Grafana, and Prometheus
- Experience with AWS services is a plus
- Strong documentation and communication skills are a plus
WHAT WE OFFER
- Fully Remote
We hire globally. We believe yo...