Lead DevOps Engineer - Azure
Job Description
Job Specification: Lead DevOps Engineer - Azure
Job Title: Lead DevOps Engineer
Location: Fully Remote (India only)
Work Hours: 2:00 PM to 11:00 PM IST
Experience Required: 9+ Years (4+ Years on Azure)
Role Summary:
We are seeking a highly skilled and experienced Lead DevOps Engineer to join our dynamic team. The ideal candidate will have extensive expertise in Azure Cloud Services, Infrastructure as Code (IaC), and containerization technologies. This role requires strong leadership and technical skills to manage and optimize a multi-tenant shared production infrastructure, drive cost-efficiency, and ensure seamless deployment processes. The individual will also lead efforts in implementing robust disaster recovery strategies, cloud security, and production support for critical applications.
Key Responsibilities:
Design, implement, and maintain IaC using Terraform for scalable and efficient infrastructure management.
Manage and optimize Azure services, including Web Apps, App Services, Front Door, API Management, Redis Cache, Cosmos DB (Postgres/MongoDB), AI Search Index, Event Hub, Azure Functions, and Key Vault.
Implement cost management solutions and drive Azure cloud spend optimization.
Operate and manage containerized workloads on Kubernetes (AKS), using Istio for service mesh management.
Implement and maintain CI/CD pipelines using GitHub Actions for streamlined container deployments.
Lead the transition from App Insights to Splunk Observability for application performance monitoring and troubleshooting.
Implement logging and alerting mechanisms for proactive incident management using tools such as Splunk and OpsGenie.
Design and execute disaster recovery failover and failback processes to ensure business continuity.
Drive chaos engineering practices to test and improve system resilience.
Enhance container security by integrating CrowdStrike for robust threat detection and mitigation.
Ensure adherence to cloud security best practices for infrastructure and applications.
Implement Blue-Green deployments and manage API versioning for seamless application updates.
Provide deployment and incident support for both production and non-production environments.
Optimize database performance, including migrating Cosmos DB for MongoDB from the RU-based model to the vCore-based model.
Maintain and enhance database reliability and scalability for multi-tenant environments.
Work closely with cross-functional teams to decommission legacy infrastructure and support ephemeral environments for testing.
Collaborate with teams using Jira and Confluence to streamline DevOps processes and ensure effective documentation.
Manage peak event capacity planning to ensure high availability during critical business periods.
Optimize cloud resource usage and costs through strategic planning and automation.
Required Qualifications:
9+ years of experience in DevOps roles, with 4+ years working on Azure cloud services.
Extensive hands-on experience with Terraform for Infrastructure as Code.
Strong knowledge of AKS, Web Apps, App Services, and related Azure technologies.
Proficient with Front Door, API Management, Redis Cache, Cosmos DB (Postgres/MongoDB), AI Search Index, Event Hub, Azure Functions, and Key Vault.
Skilled in GitHub, GitHub Actions, and automation pipelines.
Hands-on experience with Kubernetes, Istio, and container security tools like CrowdStrike.
Experience transitioning observability tools (e.g., App Insights to Splunk) and configuring OpsGenie alerts.
Familiarity with CloudBolt for cloud spend optimization.
Preferred Qualifications:
Experience with incident management and support in production and non-production environments.
Hands-on experience with tools like Jira, Confluence, and OpsGenie.
Exposure to advanced DevOps practices like chaos engineering and ephemeral environments.
Key Initiatives Led:
Disaster recovery failover and failback.
Multi-tenant shared infrastructure management in production.
Database optimization (e.g., RU-based MongoDB to vCore).
Splunk implementation and App Insights decommissioning.
Cloud security for containers and legacy infrastructure decommissioning.
Blue-Green deployments and API version management.
Peak event capacity management and Azure cloud spend optimization.
Soft Skills:
Strong problem-solving and troubleshooting skills.
Excellent communication skills and the ability to collaborate with cross-functional teams.
Ownership mindset with a proactive approach to addressing challenges.