Site Reliability Engineer
Job Description
At Altizon, we are on a mission to build easy-to-use products that solve hard, complex, real-world problems for our customers.
We are building mission-critical products and solutions that help our manufacturing customers make data-driven decisions. Our goal is to put data insights at our users' fingertips in the simplest yet most powerful way. Over the last 11 years we have built deep expertise in Digital Factory solutions, and we continue to grow through continuous learning and innovation.
Our customers in the Food & Beverage, Chemical, and Automobile industries love our products. Our flagship IIoT platform is a critical element of their operations playbook, whether they are improving productivity and throughput on the shop floor, running predictive maintenance or digital checklists, predicting quality with our advanced AI/ML algorithms, or taking a bird's-eye view of factory efficiency from a control tower.
To maintain our leadership position in these markets and continue to deliver new and innovative products, we are looking to hire a Site Reliability Engineer with a strong grasp of statistical and analytics concepts and expertise in Python or R. The successful candidate will have 1 to 3 years of experience and deep knowledge of these technologies.
What you will do
- Work on a cutting-edge, highly scalable technology stack that handles billions of events.
- Maintain the health and integrity of the platform infrastructure (compute, network, and storage) and the data processing pipelines.
- Design, develop, and deploy tools, scripts, instrumentation, and dashboards that monitor the availability and performance of each platform component.
- Treat server and service outages as a priority, and communicate any unavailability to stakeholders.
- Ensure that the platform and its internal services comply with established security guidelines and compliance requirements.
- Implement backup and disaster recovery strategies.
- Own continuous builds, nightly builds, and the associated tooling.
- Maintain the Docker images and scripts for internal microservices.
- Own the DevOps scripts for deploying new stacks and updating existing ones at the required footprint (small or HA), covering both Day 1 and Day 2 operations.
What we are looking for
- Understanding of and hands-on experience with Azure is very important.
- AWS Certified Solutions Architect Associate or AWS Certified Cloud Practitioner certification
- Bachelor's (BE) or Master's (MS, MCS) degree, preferably in Computer Science
- Minimum of first class in academics
- Unix and shell scripting
- Understanding of public cloud infrastructure (AWS/GCP)
- Familiarity with DevOps tools (e.g., Docker, load balancers)
- Good programming skills in the scripting languages needed to write and maintain automation scripts
- Optional but good to have:
  - ELK stack
  - Ansible
  - Jenkins or another deployment tool
  - Python
Candidates should be able to demonstrate their experience and skills through previous work.