Site Reliability Engineer

Spydra, Hyderabad

Job Description

Site Reliability Engineer (SRE)

Position Overview

We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our team. The SRE will bridge the gap between software development and IT operations, ensuring the reliability, scalability, and performance of our systems.

The ideal candidate will be responsible for implementing and managing infrastructure, automating processes, and proactively addressing issues to maintain uptime and improve the overall efficiency of our platforms.

Key Responsibilities
  • System Reliability and Performance
      • Design, implement, and maintain scalable, reliable, and secure systems.
      • Monitor system performance and proactively address issues to meet SLAs and SLOs.
      • Conduct capacity planning and scalability testing.
  • Automation and Efficiency
      • Develop automation tools and scripts to streamline operational processes.
      • Implement CI/CD pipelines to ensure smooth software deployment.
      • Create self-healing mechanisms to improve system resiliency.
  • Incident Management
      • Lead incident response, troubleshooting, and root cause analysis for system issues.
      • Establish and maintain robust incident response plans.
      • Collaborate with cross-functional teams to resolve incidents and prevent recurrence.
  • Monitoring and Metrics
      • Develop and maintain monitoring, alerting, and logging solutions.
      • Track and analyze system performance metrics to drive improvements.
      • Provide regular reports on system reliability and operational health.
  • Infrastructure Management
      • Manage cloud-based (e.g., AWS, Azure, GCP) and on-premises infrastructure.
      • Implement best practices for infrastructure as code (e.g., Terraform, Ansible).
      • Ensure system compliance with security and data protection regulations.
Required Qualifications
  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 3+ years of experience in site reliability, DevOps, or infrastructure engineering.
  • Strong programming and scripting skills (e.g., Python, Go, Bash).
  • Hands-on experience with cloud platforms (AWS, Azure, GCP) and containerization tools (Docker, Kubernetes).
  • Proficiency in CI/CD tools (Jenkins, GitLab, ArgoCD) and infrastructure as code tools (Terraform, Ansible).
  • Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack, Datadog).
  • Solid understanding of networking, security, and system architecture principles.
  • Exceptional problem-solving skills and a proactive approach to system management.
Preferred Qualifications
  • Experience with microservices architecture and distributed systems.
  • Knowledge of database administration (SQL, NoSQL).
  • Familiarity with incident management frameworks (ITIL, SRE practices).
  • Previous experience working in an Agile environment.
  • Certification in cloud platforms (e.g., AWS Certified Solutions Architect, Google Cloud Professional Engineer).