Tech Lead - Infrastructure Services N 4C
Genpact
- Site Reliability Engineering
- Terraform
- Datadog
- Python
- Traces
- CI/CD
1. Observability Platform Engineering
- Design, implement, and maintain Datadog-based observability solutions across infrastructure, platforms, and applications.
- Develop and optimize dashboards, monitors, and alerts to support proactive detection and triage of performance and reliability issues.
- Integrate custom telemetry pipelines (metrics, logs, traces, events) aligned with OpenTelemetry and platform architecture standards.
- Manage instrumentation strategies to ensure accurate and consistent coverage across services.

2. Site Reliability Engineering (SRE) Practices
- Apply SRE principles to improve service reliability, availability, and performance.
- Define and track SLIs, SLOs, and SLAs for critical systems, and build feedback loops to continuously improve service health.
- Automate manual operational processes using Python, Terraform, or CI/CD tooling.
- Collaborate with development and platform teams to identify resilience patterns and embed observability by design.

3. Datadog Expertise