Site Reliability Engineer

Bengaluru, Karnataka Autodesk

Posted 4 days ago

Job Description

**Job Requisition ID #**
25WD89004
**Position Overview**
Are you excited about shaping the future of platform development through cutting-edge technologies and cloud services?
Autodesk is searching for a skilled Platform Development Engineer to join our Platform Services and Emerging Technologies team. In this role, you will play a pivotal part in designing, developing, and optimizing our cloud-based platform, with a focus on serverless computing, virtual machines, and container orchestration services in both AWS and Azure environments. If you are passionate about leveraging cloud services to build scalable and resilient platforms while driving innovation and collaboration, this is the perfect opportunity for you.
**Minimum Qualifications**
+ Bachelor's degree in Computer Science, Computer Engineering, or a related field, with a minimum of 2 years of experience in platform development or a similar role.
+ 2+ years of hands-on software development experience in Python, Java, Go, NodeJS, or .NET.
+ Proficiency in serverless computing services such as AWS Lambda and Azure Functions.
+ Hands-on experience with virtual machine services like AWS EC2 and Azure Virtual Machines.
+ Familiarity with container orchestration services such as AWS ECS, Azure Container Instances (ACI), and Azure Kubernetes Service (AKS).
+ Strong understanding of Infrastructure as Code (IaC) principles and tools like Terraform and CloudFormation.
+ Experience designing and implementing scalable, high-performance platform solutions.
+ Experience implementing unit and integration tests.
**Preferred Qualifications**
+ AWS certifications such as AWS Certified Solutions Architect or AWS Certified Developer.
+ Azure certifications such as Azure Solutions Architect or Azure Developer Associate.
+ Proficiency in CI/CD pipelines for automated deployment and testing.
+ Experience with monitoring and logging tools such as CloudWatch, Azure Monitor, and ELK Stack.
**The Ideal Candidate**
+ You have a strong background in platform development, with expertise in serverless computing, virtual machines, and container orchestration services in both AWS and Azure environments.
+ You are well-versed in Infrastructure as Code (IaC) principles and tools.
+ You thrive in a collaborative environment and are passionate about leveraging cloud services to build scalable and resilient platforms.
+ You stay updated with the latest trends and best practices in cloud computing and platform development.
**Learn More**
**About Autodesk**
Welcome to Autodesk! Amazing things are created every day with our software - from the greenest buildings and cleanest cars to the smartest factories and biggest hit movies. We help innovators turn their ideas into reality, transforming not only how things are made, but what can be made.
We take great pride in our culture here at Autodesk - it's at the core of everything we do. Our culture guides the way we work and treat each other, informs how we connect with customers and partners, and defines how we show up in the world.
When you're an Autodesker, you can do meaningful work that helps build a better world designed and made for all. Ready to shape the world and your future? Join us!
**Salary transparency**
Salary is one part of Autodesk's competitive compensation package. Offers are based on the candidate's experience and geographic location. In addition to base salaries, our compensation package may include annual cash bonuses, commissions for sales roles, stock grants, and a comprehensive benefits package.
**Diversity & Belonging**
We take pride in cultivating a culture of belonging where everyone can thrive. Learn more here:
**Are you an existing contractor or consultant with Autodesk?**
Please search for open jobs and apply internally (not on this external site).

Site Reliability Engineer

Bengaluru, Karnataka NetApp

Posted 5 days ago

Job Description

**Job Summary**
NetApp's Instaclustr offering provides open source as a service, delivering reliability at scale. As a TechOps Engineer, you will be part of our TechOps team, which maintains and supports our large fleet of cloud-hosted Apache Cassandra, Kafka, OpenSearch, and other open-source technology clusters. Every day will raise interesting and challenging problems across many different areas, such as fleet management, customer support, operations, and incident response. You will work with other teams, including Product Management, Development, and Customer Success, to help drive and shape the way we support our customers.
**Key Responsibilities**
+ Provide expert operational support for our nodes running in the cloud (AWS/Azure/GCP), using technologies such as Linux (CoreOS, Ubuntu) and Docker, and languages including Java, Python, and bash.
+ Respond to customer queries and incidents, diagnosing and solving complex technical issues by liaising with customers' engineers on Apache Cassandra, Apache Kafka, Elasticsearch, Redis, and other supported technologies, while maintaining a high standard of customer communication.
+ Investigate issues and apply standard change and maintenance procedures to optimize the performance and stability of production systems
+ Undertake complex cluster operations such as migrations, upgrades and maintenance on our fleet.
+ Develop and continually improve our suite of internal automation tools, applications, documentation, processes and procedures
+ Be a proactive, reliable and supportive member of the TechOps team, and participate in a 24/7 rotating shift roster
**Job Requirements**
+ Strong knowledge of and experience with Unix/Linux, and comfort working from the command line. This is essential; there are no GUIs here.
+ Good fundamental computer science / software engineering skills and knowledge, particularly operating system internals, resource management, and networking.
+ Previous experience working with databases and/or open-source technologies (managing standard tasks, handling production issues, performance tuning) in a support role.
+ Programming skills in Python, bash scripting, and SQL, and source code control using Git.
+ Exceptional ability to communicate clearly and professionally in written and verbal English (essential).
+ Demonstrated ability to multitask.
+ Passion for all things IT, and especially open source.
+ Any customer service experience is an advantage.
**Education**
Typically requires 2-3 years of related experience.

At NetApp, we embrace a hybrid working environment designed to strengthen connection, collaboration, and culture for all employees. This means that most roles will have some level of in-office and/or in-person expectations, which will be shared during the recruitment process.
**Equal Opportunity Employer:**
NetApp is firmly committed to Equal Employment Opportunity (EEO) and to compliance with all laws that prohibit employment discrimination based on age, race, color, gender, sexual orientation, gender identity, national origin, religion, disability or genetic information, pregnancy, and any protected classification.
**Why NetApp?**
We are all about helping customers turn challenges into business opportunity. It starts with bringing new thinking to age-old problems, like how to use data most effectively to run better - but also to innovate. We tailor our approach to the customer's unique needs with a combination of fresh thinking and proven approaches.
We enable a healthy work-life balance. Our volunteer time off program is best in class, offering employees 40 hours of paid time off each year to volunteer with their favourite organizations. We provide comprehensive benefits, including health care, life and accident plans, emotional support resources for you and your family, legal services, and financial savings programs to help you plan for your future. We support professional and personal growth through educational assistance and provide access to various discounts and perks to enhance your overall quality of life.
If you want to help us build knowledge and solve big problems, let's talk.

Site Reliability Engineer

Bangalore, Karnataka IBM

Posted 17 days ago

Job Description

**Introduction**
A career in IBM Software means you'll be part of a team that transforms our customers' challenges into solutions.
Seeking new possibilities and always staying curious, we are a team dedicated to creating the world's leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career.
IBM's product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
**Your role and responsibilities**
As a Site Reliability Engineer, you will work in an agile, collaborative environment to build, deploy, configure, and maintain systems for the IBM client business. In this role, you will lead the problem resolution process for our clients, from analysis and troubleshooting to deploying the latest software updates and fixes.
* System Architecture and Automation: Design, implement, and maintain scalable infrastructure and services. Automate infrastructure management, configuration, and deployment.
* Incident Response: Act as a point of escalation for major incidents, resolving service disruptions swiftly. Perform post-incident reviews and implement permanent resolutions.
* Monitoring and Performance Tuning: Create and maintain monitoring, logging, and alerting solutions to track system performance. Identify bottlenecks and optimize performance.
* Service Level Objectives (SLOs) and SLAs: Define and manage service-level objectives (SLOs) and service-level agreements (SLAs). Ensure services meet or exceed these benchmarks.
* Capacity Planning: Analyze service trends to predict future needs and ensure that the infrastructure is prepared to handle anticipated loads.
* Collaboration with Engineering Teams: Work closely with software engineering teams to build reliable and scalable solutions, contributing to the design and implementation of new features.
* Security and Compliance: Implement and manage security best practices. Ensure compliance with regulatory and corporate policies.
**Required technical and professional expertise**
2-5 years in Site Reliability Engineering, DevOps, or a related field.
* Cloud Technologies: Hands-on experience with AWS and Azure.
* Infrastructure as Code (IaC): Expertise in automation tools like Terraform (preferred) or CloudFormation.
* Containerization and Orchestration: Proficiency with Docker, Kubernetes, or OpenShift.
* Programming and Scripting: Proficiency in languages like Python, Go, or Bash for automation and scripting tasks.
* CI/CD Pipelines: Experience with Continuous Integration/Continuous Deployment tools such as GitLab CI or Jenkins.
**Preferred technical and professional experience**
* Monitoring Tools: Strong knowledge of monitoring and alerting tools like Prometheus and Grafana.
* Networking and Security: Strong understanding of networking concepts (DNS, TCP/IP, HTTP) and security best practices.
* Database Management: Experience in managing both SQL and NoSQL databases.
* Experience with large-scale distributed systems.
* Exposure to microservices architecture.
* Strong troubleshooting and analytical skills.
* Certifications in cloud platforms (AWS Certified, Azure Certified, K8s certification etc.).
* Experience working in Agile environments.
IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.

Site Reliability Engineer

Bengaluru, Karnataka Tavant

Posted 5 days ago

Job Description

About Tavant:


With 25+ years of experience building innovative digital products and solutions, Tavant provides impactful results to its customers. It has been the frontrunner in driving digital innovation and tech-enabled transformation across a wide range of industries such as Consumer Lending, Manufacturing, Agtech, Media & Entertainment, and Retail in North America, Europe, and Asia-Pacific. Powered by Artificial Intelligence and Machine Learning algorithms, we help our customers improve their operational efficiency, productivity, speed, and accuracy. Our suite of products and solutions is routinely rated highly by the industry.


Ours is a challenging workplace where teams are diverse, competitive, and continually searching for tomorrow's technology and brilliant minds to create it. And we don’t focus just on what we do – we also care how we do it. So, bring your talent and ambition to make a difference. We’ll create a world of opportunities for you.


Job Title - Site Reliability Engineer (SRE)

Location - Bangalore

Work Experience - 9 to 15 years

Seeking an experienced Site Reliability Engineer to join our Data Compute DevOps/Cloud/SRE team. This role focuses on ensuring high availability, scalability, and reliability of cloud infrastructure with emphasis on Kubernetes orchestration and AWS cloud services.


Essential Requirements

  • Kubernetes: Proficient level expertise (9+ years) - cluster management, networking, security, troubleshooting
  • AWS: Working knowledge of core services (EC2, EKS, S3, IAM, VPC)
  • Infrastructure as Code: Terraform and Terraform Enterprise (TFE)
  • CI/CD: GitLab and GitLab CI/CD pipelines
  • Monitoring & Observability: Prometheus and Grafana
  • GitOps: ArgoCD deployment experience
  • Container Orchestration: Helm charts and package management
  • Version Control: Git workflows and repository management


Core Competencies

  • SRE principles and practices (SLIs, SLOs, error budgets)
  • Infrastructure automation and orchestration
  • Incident management and post-mortem processes
  • System reliability and performance optimization
  • Scripting abilities (Bash, Python)

Site Reliability Engineer

Bengaluru, Karnataka Xebia

Posted 5 days ago

Job Description

We are seeking an experienced AWS DevOps Engineer with strong expertise in Observability and Site Reliability Engineering (SRE) to design, build, and manage scalable, reliable, and secure cloud environments. The role requires hands-on experience with AWS services, Infrastructure as Code (IaC), CI/CD, monitoring & observability frameworks, and incident response practices to ensure high availability, performance, and resilience of business-critical systems.


Key Responsibilities

Cloud Infrastructure (AWS)

  • Design, implement, and manage scalable, resilient, and cost-optimized cloud infrastructure using AWS services (EC2, EKS, Lambda, RDS, S3, CloudFront, IAM, VPC, etc.).
  • Implement Infrastructure as Code (IaC) using tools like Terraform or CloudFormation.

DevOps & Automation

  • Build and maintain CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, or AWS CodePipeline) for automated deployments.
  • Automate repetitive tasks to improve development velocity and operational efficiency.

Observability & Monitoring

  • Define and implement an observability strategy covering monitoring, logging, tracing, and alerting.
  • Work with tools like Prometheus, Grafana, ELK/EFK stack, AWS CloudWatch, Datadog, New Relic, Splunk, or Dynatrace.
  • Establish SLIs, SLOs, and SLAs to measure and improve system reliability.

Site Reliability Engineering (SRE)

  • Drive incident management processes: detection, alerting, root cause analysis, and postmortems.
  • Apply chaos engineering principles to validate resilience and recovery.
  • Optimize reliability, latency, scalability, and system efficiency.

Security & Compliance

  • Implement best practices for cloud security, identity & access management, and compliance frameworks (ISO, SOC2, GDPR, etc.).
  • Ensure observability and monitoring meet security and audit requirements.

Collaboration & Leadership

  • Partner with development, QA, and product teams to ensure seamless deployments.
  • Mentor junior engineers and promote a culture of reliability, automation, and continuous improvement.


Required Skills & Qualifications

  • 7+ years of professional experience in DevOps, Cloud Infrastructure, or SRE roles.
  • Strong expertise in AWS Cloud (certification preferred: AWS Certified DevOps Engineer, Solutions Architect, or SysOps).
  • Proficiency in IaC tools (Terraform, CloudFormation).
  • Solid experience in CI/CD pipeline tools (Jenkins, GitHub Actions, GitLab CI/CD, AWS CodePipeline).
  • Hands-on with observability tools: Prometheus, Grafana, CloudWatch, ELK, Datadog, New Relic, Splunk, or similar.
  • Deep understanding of SRE principles: SLIs/SLOs, error budgets, incident response, chaos testing.
  • Strong scripting/coding experience (Python, Bash, Go, or similar).
  • Knowledge of containers & orchestration (Docker, Kubernetes, EKS).
  • Familiarity with security best practices in cloud-native environments.


Preferred Skills

  • Experience with multi-cloud or hybrid-cloud environments.
  • Exposure to resiliency testing & chaos engineering tools (Gremlin, Litmus, Chaos Mesh).
  • Knowledge of cost-optimization and FinOps in AWS.
  • Excellent communication and stakeholder management skills.


What We Offer

  • Opportunity to work on cutting-edge cloud-native architectures.
  • A culture focused on automation, reliability, and innovation.
  • Growth opportunities with certifications, training, and leadership exposure.

Site Reliability Engineer

Bengaluru, Karnataka Enterprise Minds, Inc

Posted 5 days ago

Job Description

We're Hiring! | Site Reliability Engineer | 8-10 years


Site Reliability Engineer

Bengaluru, Karnataka WhiteLotus Talent Partners

Posted 10 days ago

Job Description

We are looking for L0 and L1 Site Reliability Engineer (SRE) Support staff to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system availability, reliability, and performance. You will be responsible for identifying and addressing simple issues, as well as escalating more complex problems to senior SREs when needed.

The ideal candidate should have a basic understanding of cloud infrastructure (especially OpenStack and Kubernetes), containerized environments, and system monitoring. This position offers an excellent opportunity for someone looking to grow into a more advanced SRE or DevOps role.

Site Reliability Engineer

Bengaluru, Karnataka Tookitaki

Posted 13 days ago

Job Description

Position Overview

Job Title: Site Reliability Engineer (SRE)

Department: Technology

Location: Bangalore

Reporting To: Head of Infra


Tookitaki is looking for a Site Reliability Engineer (SRE) with 3–6 years of experience to help maintain and scale the infrastructure that powers our flagship products, FinCense and the AFC Ecosystem. As an SRE, you will work at the intersection of software engineering and infrastructure, ensuring high availability, performance, and scalability of our platforms. You will collaborate with engineering, DevOps, and client success teams to operationalize deployments across on-premise, VPC, and Compliance as a Service (CaaS) environments while improving monitoring, automation, and incident response.


Position Purpose

The SRE role is responsible for ensuring the reliability and efficiency of Tookitaki’s production systems and environments. This includes building monitoring systems, improving deployment pipelines, automating routine operations, and responding to production incidents. You’ll help build a resilient infrastructure that supports our mission to provide AI-driven solutions that prevent financial crime.


Key Responsibilities

System Monitoring & Incident Management

  • Build and maintain monitoring, alerting, and logging systems using tools like Prometheus, Grafana, and ELK.
  • Respond to incidents and outages, conduct post-mortems, and implement corrective actions.

Infrastructure & Deployment Automation

  • Automate infrastructure provisioning and application deployment using Terraform, Ansible, or Helm.
  • Contribute to CI/CD pipelines (GitLab CI, Jenkins, etc.) to improve the reliability and speed of software delivery.

Container & Orchestration Management

  • Manage and troubleshoot Docker containers and Kubernetes clusters, ensuring workload scaling, resource management, and health.
  • Support application updates, rollbacks, and blue-green or canary deployments.

Cloud & Platform Operations

  • Operate within AWS (preferred) or GCP environments (EC2, S3, VPC, IAM).
  • Monitor system availability and resource usage across environments.

Security & Reliability Enhancements

  • Implement and monitor TLS/SSL, RBAC, SSO, and secure API practices.
  • Support compliance and security audit activities by maintaining logs, access controls, and operational hygiene.

Collaboration & Documentation

  • Work closely with developers, infra engineers, and support teams to ensure production readiness.
  • Maintain playbooks, runbooks, and system documentation for reliability engineering activities.


Qualifications and Skills

Education

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field.

Experience

  • 3–6 years in Site Reliability Engineering, DevOps, Platform Engineering, or a related role.
  • Experience with production environments and live system debugging.

Technical Skills

  • Kubernetes, Docker, Helm: experience deploying and scaling services.
  • Linux administration and command-line debugging.
  • Hands-on with AWS (preferred) or GCP cloud platforms.
  • Scripting in Bash and Python for automation and monitoring tasks.
  • Experience with monitoring and alerting tools like Prometheus, Grafana, ELK, or Datadog.
  • Familiarity with databases (e.g., MariaDB, ScyllaDB) and SQL/CQL querying.

Soft Skills

  • Strong problem-solving and debugging skills.
  • Ability to work in on-call rotations and high-pressure production environments.
  • Excellent communication and documentation abilities.

Key Competencies

  • Operational Reliability: Ensures system uptime and performance through proactive monitoring and maintenance.
  • Automation Mindset: Reduces manual effort through scripting and tooling.
  • Incident Response: Quickly identifies and resolves issues to minimize downtime.
  • Cross-Functional Collaboration: Works effectively with engineering, support, and infra teams.
  • Security Awareness: Applies best practices in infrastructure and platform security.

Success Metrics

  • Maintain 99.9%+ uptime across production environments.
  • Reduce mean time to detect (MTTD) and mean time to resolve (MTTR) for critical incidents.
  • Increase automation coverage and reduce manual deployment steps.
  • Achieve high internal satisfaction among developers with CI/CD and platform reliability.
  • Ensure compliance readiness and the availability of security logs for audits.
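As a rough illustration of what the 99.9%+ uptime metric above implies, an availability target can be converted into an allowed-downtime "error budget". The sketch below is a hypothetical helper, not Tookitaki's tooling; it assumes a simple time-based availability SLO measured over a rolling 30-day window, and the function names are illustrative.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime (minutes) for a time-based availability SLO.

    Hypothetical helper: assumes availability = uptime / total time
    over a rolling window of `window_days` days.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo)


def remaining_budget_minutes(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Budget left after subtracting downtime already observed in the window."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


if __name__ == "__main__":
    # 99.9% over 30 days: 43,200 minutes * 0.001
    print(round(error_budget_minutes(0.999), 1))  # 43.2
    # After a 30-minute outage in the same window
    print(round(remaining_budget_minutes(0.999, 30), 1))  # 13.2
```

In other words, a 99.9% target leaves roughly 43 minutes of downtime per 30 days; a negative remaining budget would signal that the SLO has been missed for that window.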

Benefits

  • Competitive compensation
  • Work on a globally recognized RegTech platform transforming financial crime prevention.
  • Exposure to cutting-edge AI and big data infrastructure (Spark, Kafka, ScyllaDB, Flink).

Site Reliability Engineer

Bengaluru, Karnataka Altimetrik India

Posted 14 days ago

Job Description

Experience - 8 to 12 years

Location - Chennai, Bangalore

Notice Period - Immediate to 30 days


Roles and Responsibilities


  • Assist in the design and implementation of reliable and scalable systems using Kubernetes, Docker, and Istio.
  • Proactively identify performance improvements in areas such as responsiveness, availability, and scalability.
  • Monitor system performance and respond to incidents as they arise, utilizing Datadog for observability.
  • Help develop automation scripts for deployment and monitoring.
  • Leverage GitOps to ensure that software can reliably and smoothly be shipped to production.
  • Collaborate with development teams to identify and resolve reliability issues.
  • Conduct load testing to verify that systems can handle expected loads for new products and updates to existing products.
  • Implement A/B deployments, canary deployments, and traffic mirroring strategies to ensure critical updates go smoothly and can be rolled back easily if necessary.
  • Utilize Helm charts for application deployment and management.
  • Understand AWS systems, including AWS Load Balancers, EKS and routing, to support systems handling millions of requests per hour.
  • Ensure that solutions are cost-effective while providing a high-quality customer experience and maintaining very high availability.
  • Participate in on-call rotations and support production systems, collaborating with SREs in other parts of the world.
  • Contribute to documentation and knowledge sharing within the team.
  • Assist in the implementation of best practices for system reliability.


Skills

  • 8+ years of experience in Site Reliability Engineering, DevOps, or a related field.
  • Expertise with AWS.
  • Expertise with Kubernetes, Docker, and Istio.
  • Knowledge of monitoring and alerting tools, particularly Datadog, AppDynamics, ELK, Grafana, or Prometheus.
  • Ability to implement and tune Horizontal Pod Autoscalers (HPAs) to optimize resource utilization.
  • Understanding of Argo CD for GitOps practices.
  • Familiarity with A/B, Canary, Blue/Green deployments, and traffic mirroring techniques.
  • Understanding of scripting and orchestration tools such as Terraform, Ansible, or equivalent.
  • Awareness of cost management in cloud environments and the ability to balance cost with performance and reliability.
  • Advanced problem-solving, troubleshooting, and decision-making skills.
  • Ability to handle a team effectively
  • Excellent verbal and written communication skills.
  • Expertise in Golang or Rust
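For context on the HPA skill listed above: the Kubernetes documentation describes the core autoscaling rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping any change when the ratio is within a tolerance (0.1 by default) and keeping the result within the configured minimum and maximum replica counts. The sketch below is a simplified, hypothetical model of that rule, not the controller's code; it ignores stabilization windows, pod readiness, and multi-metric selection, and the default bounds are illustrative assumptions.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Simplified HPA scaling rule (hypothetical helper for illustration)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # No scaling when the observed metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds (minReplicas / maxReplicas).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods at 90% average CPU against a 60% target: scale out
    print(desired_replicas(4, 90, 60))  # 6
    # 4 pods at 20% against a 60% target: scale in
    print(desired_replicas(4, 20, 60))  # 2
```

Rounding up with `ceil` biases the rule toward having slightly more capacity than the raw ratio suggests, which is the safer direction for availability.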

Site Reliability Engineer

Bengaluru, Karnataka Rapid Circle

Posted 18 days ago

Job Description

Making a difference and driving positive change is what we do every day at Rapid Circle. Our Cloud Pioneers help our clients in their digital transformation. Are you someone who goes for constant, positive change? Then this vacancy is for you!


As a Cloud Pioneer at Rapid Circle, you will work with our customers on different projects, for example, making an impact in the healthcare sector by making research data safely available. Awesome projects in the manufacturing and energy markets also make this job very challenging.

At Rapid Circle we are curious and are constantly improving our expertise to help customers find their way in a rapidly changing world. We share our knowledge and discover new ways to learn.


Rapid Circle is growing rapidly and is therefore looking for the right person for this role. You will be given lots of freedom to develop personally. We also have a lot of in-house knowledge (MVPs) within the Netherlands, Australia, and India. By working closely with your (international) colleagues, you can continue to challenge yourself and create your own growth path. Freedom, entrepreneurship, and development are key at Rapid Circle, and that holds for the role of Site Reliability Engineer too.


  • Develop, manage, and optimize Terraform modules and deployments across multiple environments.
  • Handle SRE operational duties including responding to pull requests and ensuring smooth continuous integration and delivery processes.
  • Maintain and fine-tune applications for optimal performance, ensuring they meet specified requirements.
  • Explore and experiment with new technologies through Proof-of-Concepts to enhance existing functionalities or discover new opportunities.
  • Automate deployment, configuration, and operational processes to improve efficiency and accuracy.
  • Collaborate with development teams to guide system architecture and design, focusing on reliability, efficiency, and scalability.
  • Implement and manage observability tools such as Grafana, Prometheus, and New Relic to ensure all critical services are monitored effectively.
  • Develop custom reliability tools and frameworks for use by engineering teams.
  • Participate in an on-call rotation for critical systems, lead incident responses, and conduct thorough post-mortem analyses.
  • Drive system and process efficiencies including capacity planning, configuration management, performance tuning, monitoring, and root cause analysis.
  • Act as a consultant within the organization for best practices in infrastructure management and assist teams in effective infrastructure utilization.
  • Play a key role in capacity planning to help teams prepare for scaling and growth.


Must have:


  • In-depth knowledge of cloud service providers like Azure or AWS, with a professional or specialty level certification (security certification is a plus).
  • Strong understanding of REST and/or Graph APIs.
  • Background in DevSecOps or cloud security, with experience in cloud security posture management applications.
  • Experience with state machines such as AWS Step Functions or Azure Logic Apps.
  • Deep knowledge of telemetry and observability; experience with Prometheus, OpenTelemetry, or Dynatrace is highly desirable.
  • Proficiency in Kubernetes with CKA/CKAD certification being advantageous.
  • Expertise in Terraform, with experience in setting up pipelines for multi-environment deployments.
  • Good programming skills in high-level languages, with a preference for Python; experience with Go or any other compiled language is an advantage.
  • Familiarity with Observability tools like Grafana, Prometheus, and New Relic.
  • Strong project management and organizational skills.
  • An open mindset with the ability to quickly adapt to new technologies and learning practices.


About Cloud Native Engineering

The Cloud Native Engineering Practice is an organization of engineers who work with our production services throughout their entire life cycle, from design and architecture through implementation, deployment, and sustained operation. SREs deliver important system properties (reliability, performance, efficiency, and scalability) for the products and platforms that our customers use every day.

SREs work in high-performance squads with expertise in large-scale system reliability and an in-depth understanding of the architecture of critical business components, as well as dedicated engineering teams building comprehensive tools, platforms, and infrastructure.

We need your skills and passion to help make it happen!
