Platform Engineer / Distributed Systems Engineer - Gurgaon - ref. f8620419
Job Description
What You'll Work On
Design and develop a next-generation scalable observability platform for modern cloud-native and hybrid infrastructures that works in tandem with AI agents.
Create intelligent AI agents to analyze logs, traces, and metrics in real time, delivering automated insights and remediation.
Build scalable and fault-tolerant AI agent frameworks.
Engineer and optimize large-scale analytics pipelines to process high-velocity telemetry data.
Build resilient distributed systems with high reliability, performance, and fault tolerance.
Implement and fine-tune LLMs for natural language querying and automated troubleshooting.
Partner with ML engineers to streamline AI model deployment and management.
What We're Looking For
Strong programming skills in Python and Golang (experience with Rust is a plus)
Track record of building distributed systems and large-scale analytics pipelines
Hands-on experience with cloud infrastructure (AWS, GCP, or Azure) and Kubernetes
Deep understanding of observability technologies (Prometheus, OpenTelemetry, Grafana, Elastic, etc.)
Knowledge of LLMs, AI agents, and agent frameworks like LangChain and AutoGen is a plus
Experience with stream processing and real-time data processing frameworks
Proficiency in database technologies (SQL & NoSQL, Time-Series DBs)
5+ years of relevant experience
Bachelor's degree in Computer Science, Engineering, or a related field (Master's/PhD is a plus)