[ref. d78103524] Senior Site Reliability Engineer

Bangalore | 08/01/2025

Role Description:

Our mission at Booking.com is to create transformative, innovative, and personalised travel experiences for millions of customers all across the world. We want customers to have an amazing experience wherever and whenever they choose: on mobile, on the web, and through partners and third parties.

About the team: Private Cloud

The Private Cloud group operates, orchestrates, and optimizes Booking-managed cloud infrastructure. The Private Cloud capabilities are provided on platform instances that are privately owned and centrally managed by Booking.com. These platform instances, and the workloads running on them, are hosted both in Booking data centers (“on-premises”) and on public cloud infrastructure (AWS).

The Private Cloud platform has three primary internal customer-facing verticals: virtualisation, containerisation, and serverless, corresponding to the three types of workloads it supports.

At the highest level, the Booking Private Cloud drives three primary business outcomes:

Agility in provisioning and using cloud infrastructure.
Efficiency in cost and utilisation of cloud infrastructure, as well as toil reduction for developers and engineers.
Trust in the safety, reliability, and performance of our cloud infrastructure.

Key Job Responsibilities and Duties:

The core premise of SRE at Booking is treating the operational and reliability problems of software systems as software engineering problems. Where operations are concerned, we code our way out of problems, addressing availability, scalability, latency, and efficiency challenges across the vast infrastructure here at Booking.

We expect our SREs to be software engineers who optimize systems rather than system operators.

You will impact millions of people all over the globe with your creative solutions
You will work in one of the biggest e-commerce companies in the world
You will solve exciting problems at scale by writing and deploying code across tens of thousands of servers
You will foster an “everything as code” mindset for yourself and your team
You will have the opportunity to collaborate with many of the world’s leading SREs
You will be free to launch your own ideas and solutions within our sophisticated production environment
Here are some of the tools and technologies we use to achieve this: Python, Go, Puppet, Kubernetes, Elasticsearch, Prometheus, HAProxy, Cassandra, Kafka, and more

What you’ll be doing:

Design, develop, and implement software that improves the stability, scalability, availability, and latency of Booking.com products;
Take ownership of one or more services and have the freedom to do what is best for our business and customers;
Solve problems in our highly available production systems, and build solutions and automation that prevent them from recurring;
Build effective monitoring to supervise the health of your system, and jump in to handle outages;
Build and run capacity tests to manage the growth of your systems;
Plan for reliability by designing systems to work across our multinational data centers;
Develop tools that help product development teams successfully deploy thousands of change sets every day;
Advocate for standard engineering processes;
Share the on-call rotation and serve as an escalation contact for incidents;
Contribute to Booking.com's growth through interviewing, on-boarding, or other recruitment efforts.

What you’ll bring:

8+ years of hands-on experience in software and site reliability engineering within the technology sector, coupled with expertise in building, operating, and maintaining sophisticated, scalable systems;
Solid experience in at least one programming language (we use Java, Python, Go, Ruby, and Perl);
Experience with Infrastructure as Code technologies;
Knowledge of cloud computing fundamentals;
Solid foundation in Linux administration and troubleshooting;
Understanding of service level agreements (SLAs) and service level objectives (SLOs);
Additional experience in OpenStack, Kubernetes, Networking, Security or Storage is desirable;
Experience with monitoring and observability technologies such as Prometheus, Graphite, Grafana, Kibana, and Elasticsearch is a plus;
Good interpersonal skills;
Proficient command of the English language, both written and spoken.

Immediate start
