- Bridging the gaps between the core infrastructure, security, QA and development teams.
- Owning the end-to-end availability and performance of applications and their infrastructure.
- Providing 24x7 infrastructure and application support.
- Automating and improving development and release processes.
- Creating, managing and maintaining the entire infrastructure using Infrastructure as Code (IaC).
- Onboarding new applications through the production readiness review process.
- Working with the Core Infra, Dev and Product teams to define SLOs, error budgets and alerts.
- Working with the Dev team to gain an in-depth understanding of the application architecture and its bottlenecks.
- Identifying observability gaps in applications and infrastructure, and working with stakeholders to close them using the right toolsets.
- Managing outages, conducting detailed RCAs with developers and identifying ways to prevent recurrence.
- Automating toil and repetitive work.
- Experience managing high-traffic, large-scale microservices and infrastructure, with excellent troubleshooting skills.
- Experience troubleshooting, managing and deploying containerized environments using Docker/containerd and Kubernetes is a must.
- Must be proficient with Helm.
- Must be very hands-on in managing and troubleshooting Kubernetes environments.
- Extensive experience with Linux administration and a good understanding of the various Linux kernel subsystems (memory, storage, network, etc.).
- Extensive experience with DNS, TCP/IP, UDP, routing and load balancing.
- Expertise in GitOps and Infrastructure as Code tools such as Terraform.
- Expertise in AWS and/or other relevant cloud infrastructure solutions.
- Experience building CI/CD pipelines with tools such as Jenkins and GitLab.
- Experience with multiple datastores and messaging systems (Kafka/RabbitMQ, Redis, Elasticsearch) is a plus.
- Must be proficient in at least one DevOps scripting language: Bash, Python or Go.
- A collaborative spirit with the ability to work across disciplines to influence, learn and deliver.
- A deep understanding of computer science, software development and networking principles.
Must-Have Skills: Kubernetes, Helm, GitLab/Jenkins, Terraform, AWS (EC2, RDS, ElastiCache, VPC, Route 53, etc.)