Senior Site Reliability Engineer

Oracle, Hyderabad

Job Description

Oracle is seeking a motivated Senior Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This position requires broad knowledge of Linux administration, software development, cloud computing, networking, performance analysis, and monitoring to provide the stability, performance, and reliability our infrastructure needs.
The Senior Site Reliability Engineer is expected to work with multiple service development teams, identifying cross-team issues that create risk for operations across the organization and resolving those issues with a mixture of engineering, development, troubleshooting expertise, and general operational guidance.

This role also requires excellent communication and organizational skills. The candidate is expected to collaborate with service owners, other engineers, and developers to deliver a superior support experience to the development community.

Career Level - IC3

Responsibilities

  • Solve complex problems related to Linux and cloud infrastructure, and build automation to prevent their recurrence.
  • Identify opportunities and drive the implementation of automation to improve service health, availability and reliability
  • Configure, design, and script end-to-end service monitoring, alerting and self-healing capabilities for production services
  • Understand the end-to-end configuration, technical dependencies, and characteristics of production infrastructure and services
  • Quickly grasp and analyze new technologies that are complex and rapidly changing and integrate those into automation and infrastructure support
  • Act as an escalation point for complex or critical issues that may not have a documented procedure, and provide root cause analysis (RCA)
  • Author functional and technical documentation and standard operating procedures (SOPs)
  • Collaborate with development teams in defining and implementing improvements in service architecture.
  • Articulate technical characteristics of services and technology areas and guide cross-functional teams to engineer and add capabilities to internal tools.
  • Take responsibility for the design and delivery of mission-critical automation, with a focus on security, resiliency, scale, and performance.

Knowledge and Skills

  • 5 - 8 years of experience in Site Reliability Engineering and in implementing automation.
  • Experience in Linux administration with good knowledge of kernel-level debugging
  • Experience in debugging operating system performance issues and performance tuning
  • Excellent troubleshooting skills for resolving critical application, networking and system administration issues
  • Experience working with fault tolerant, highly available, high throughput, distributed, scalable systems
  • Expertise in developing scripts, utilities, and tools to automate routine or manually intensive tasks
  • Experience in application, compute, and storage troubleshooting to improve application reliability, scalability, and availability
  • Experience in cloud infrastructure technologies
  • Experience with monitoring tools such as Prometheus, Grafana
  • Development experience using Python and Terraform
  • Experience in managing high-availability production applications.
  • Strong logical-thinking skills, intellectual curiosity, and a strong drive for self-development
  • Aptitude to be a good team player and the desire to learn and implement new cloud technologies as needed
  • Excellent organizational, verbal, and written communication skills

About Us

As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's problems. True innovation starts with diverse perspectives and various abilities and backgrounds.

When everyone's voice is heard, we're inspired to go beyond what's been done before. It's why we're committed to expanding our inclusive workforce that promotes diverse insights and perspectives.

We've partnered with industry leaders in almost every sector and continue to thrive after 40+ years of change by operating with integrity.

Oracle careers open the door to global opportunities where work-life balance flourishes. We offer a highly competitive suite of employee benefits designed on the principles of parity and consistency. We put our people first with flexible medical, life insurance and retirement options.

We also encourage employees to give back to their communities through our volunteer programs.

We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by calling +1 888 404 2494, option one.

Disclaimer:

Oracle is an Equal Employment Opportunity Employer*. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans status, or any other characteristic protected by law.

Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

* Which includes being a United States Affirmative Action Employer