Senior Site Reliability Engineer (SRE)

Bangalore

About the role

As a Site Reliability Engineer (Cloud), you will join and reinforce the TCM Kondor modernization and cloud enablement team, whose primary objective is to act as a central team that accelerates the Cloud Transformation journey across our core systems.

We are looking for a curious and enthusiastic Site Reliability Engineer to join our team to design, implement, optimize, observe, and maintain our organization’s cloud-based systems.

A Site Reliability Engineer’s responsibilities include designing, deploying, and debugging systems, as well as executing new cloud initiatives.

Ultimately, you will work with different IT professionals and teams to ensure our cloud computing systems meet the needs of our organization and customers.

Objectives of this Role
  • Work in tandem with our engineering team to identify and implement optimal cloud-based solutions for the company.
  • Define and document best practices and strategies regarding application deployment and infrastructure maintenance.
  • Provide guidance, thought leadership, and mentorship to development teams to build cloud competencies.
  • Ensure application performance, uptime, and scale, maintaining high standards of code quality and thoughtful design.
  • Manage cloud environments in accordance with company security guidelines.
  • Stay current with industry trends, making recommendations as needed to help the organization innovate and excel.
Responsibilities
  • Develop, deploy and maintain infrastructure on Azure using Docker and Kubernetes.
  • Implement automation tools and frameworks (CI/CD pipelines).
  • Collaborate with team members to improve the company’s engineering tools, systems and procedures, and data security.
  • Optimize the company’s computing architecture.
  • Conduct system tests for security, performance, and availability.
  • Develop and maintain design and troubleshooting documentation.
  • Collaborate with the engineering teams to enable their applications to run on cloud infrastructure.
  • Debug technical issues inside a complex stack involving virtualization, containers, microservices, etc.
  • Troubleshoot incidents, identify root cause, fix and document problems, and implement preventive measures.
  • Employ exceptional problem-solving skills, with the ability to see and solve issues before they snowball into problems.
Requirements
  • Bachelor’s degree in computer science, information technology, or mathematics
  • 5+ years of proven experience as a Site Reliability Engineer or similar role in software development and system administration.
  • Experience with Docker for containerization and application deployment.
  • Experience with Kubernetes and Helm for orchestration of Docker containers.
  • Experience with Azure cloud services and an understanding of their offerings and architecture.
  • Knowledge of databases and operating systems.
  • Ability to troubleshoot complex software and hardware issues.
  • Knowledge of best practices related to data encryption and cybersecurity.
  • Excellent problem-solving and communication skills.
  • Experience in network, server, and application-status monitoring.
  • Operating systems – any Linux/Unix flavor
  • Monitoring – Prometheus, Grafana
Nice to Have
  • Relevant certifications such as Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), or Azure Certifications (AZ-104, AZ-204, AZ-400, etc.).
  • Experience with other cloud platforms like AWS or GCP.
  • CI/CD: Jenkins (Groovy)
  • Exposure to Azure Pipelines
  • Knowledge of Git version control
  • Scripting