4,338 Site Reliability Engineer jobs in India
Site Reliability Engineer
Posted 2 days ago
Job Description
**About Ensono**
Ensono is an expert technology adviser and managed service provider. As a relentless ally, we accelerate clients' digital transformation to achieve business outcomes that last. Our dedicated team helps organizations optimize today's systems across any hybrid environment with services such as consulting, mainframe and application modernization, public cloud migration, and cloud-native development. With certified experts in AWS, Azure, and Google Cloud, and recognized as Microsoft Datacenter Transformation Partner of the Year, Ensono has more than 3,500 associates globally and is headquartered in the greater Chicago area.
We care about your success, offering comprehensive strategic and managed services for mission-critical applications. Our Advisory and Consulting services can help upfront with an application strategy or find the right places for your applications, whether public, multi- or hybrid cloud, or mainframe. And because we span all mission-critical platforms, we can meet you wherever you are in your digital transformation journey, with 24/7 support when you need it. We are your relentless ally, flexing with you when challenges emerge so you don't feel stuck in place. With cross-platform certifications and decades of experience, our technology experts become an extension of your team so you're continuously innovating, doing more with less while remaining secure. And that's just the beginning.
**Technical Skills:**
SRE
+ Commercial experience with, and proficiency in, industry-standard tooling:
  1. IaC tooling (preferably Terraform, or ARM/Bicep and CloudFormation)
  2. Core CI/CD tooling (Azure DevOps, GitHub Actions, or GitLab; experience with Harness is beneficial)
  3. Monitoring tooling (Datadog, Splunk, New Relic, Azure Monitor)
+ Commercial experience in at least one core technology (.NET, Java, JavaScript), including:
  1. Troubleshooting issues and identifying systemic failings indicated by incidents/failures
  2. Implementing fixes
  3. Proposing solutions for reducing toil
+ Providing leadership in the incident resolution process, including creating and maintaining documentation and providing key input to post-mortem analysis.
+ Improving Service Request and Change Management processes, both technically and through stakeholder management.
+ Participating in the security management process and proactively mitigating risks (vulnerabilities in code, infrastructure, and dependencies).
+ Leading client-facing meetings and discussions on the SRE process, and identifying areas for expanding the SRE footprint.
+ Engaging with suppliers and third parties for support, requests, and opportunities.
+ An SRE Foundation certificate (DevOps Institute) and an Azure Associate-level certification are highly beneficial, or must be obtained during the probationary period.
Experience: 4+ years
Site Reliability Engineer

Posted 3 days ago
Job Description
Explore your next opportunity at an organization ranked among the world's 500 largest companies. Envision innovative opportunities, discover our rewarding culture, and work with talented teams that push you to grow every day. We know what it takes to lead UPS into the future: passionate people with a unique combination of skills. If you have the qualities, the motivation, the autonomy, or the leadership to lead teams, there are positions suited to your aspirations and to your skills, today and tomorrow.
**Job Description:**
**Job Summary:**
We are seeking a skilled and proactive **Site Reliability Engineer (SRE)** with 5-8 years of experience and deep expertise in **Google Cloud Platform (GCP)**. The ideal candidate will be responsible for the reliability, availability, and performance of cloud-based applications and infrastructure. You will collaborate with development, operations, and security teams to build and maintain scalable, secure, and highly available systems.
**Key Responsibilities:**
+ Design, develop, and maintain **reliable, scalable, and highly available systems** on GCP.
+ Build and manage **CI/CD pipelines**, infrastructure as code (IaC), and monitoring solutions.
+ Proactively monitor and manage **system performance, uptime, and capacity** using observability tools.
+ Troubleshoot and resolve **infrastructure and application-level issues** in real time.
+ Implement and maintain **disaster recovery**, **failover mechanisms**, and **backup strategies**.
+ Automate repetitive tasks and processes to improve **efficiency and reduce toil**.
+ Participate in **on-call rotations**, incident management, and root cause analysis (RCA).
+ Ensure compliance with **security standards, privacy regulations, and governance policies**.
+ Collaborate with cross-functional teams to support **DevOps and SRE best practices**.
+ Drive improvements in **SLAs, SLOs, and error budgets** through data-driven insights.
**Required Qualifications:**
+ 5-8 years of relevant experience as an SRE, DevOps Engineer, or Cloud Infrastructure Engineer.
+ Strong hands-on experience with **Google Cloud Platform (GCP)**: Compute Engine, GKE, Cloud Functions, Cloud Storage, IAM, BigQuery, etc.
+ Proficiency in **Infrastructure as Code** tools like **Terraform**, **Deployment Manager**, or **CloudFormation**.
+ Experience with **Kubernetes**, **Docker**, and container orchestration.
+ Proficiency in scripting languages like **Python**, **Shell**, or **Go**.
+ Deep understanding of **monitoring and logging tools** such as **Prometheus**, **Grafana**, **Stackdriver**, or **Datadog**.
+ Knowledge of **CI/CD tools** such as Jenkins, GitLab CI, or Cloud Build.
+ Experience with **incident response**, **postmortem analysis**, and **site reliability principles**.
+ Strong problem-solving and communication skills.
**Preferred Qualifications:**
+ GCP certifications (e.g., **Professional Cloud DevOps Engineer**, **Professional Cloud Architect**).
+ Exposure to **multi-cloud environments** or hybrid cloud infrastructure.
+ Familiarity with **Agile** and **ITIL** frameworks.
+ Experience working in regulated environments with compliance standards (e.g., ISO, SOC 2).
**Contract Type:**
Permanent (CDI)
_At UPS, equal opportunity, fair treatment, and an inclusive work environment are key values to which we are committed._
Site Reliability Engineer

Posted 3 days ago
Job Description
Explore your next opportunity at a Fortune Global 500 organization. Envision innovative possibilities, experience our rewarding culture, and work with talented teams that help you become better every day. We know what it takes to lead UPS into tomorrow: people with a unique combination of skill and passion. If you have the qualities and drive to lead yourself or teams, there are roles ready to cultivate your skills and take you to the next level.
**Job Description:**
**Job Summary:**
We are seeking a skilled and proactive **Site Reliability Engineer (SRE)** with 5-8 years of experience and deep expertise in **Google Cloud Platform (GCP)**. The ideal candidate will be responsible for the reliability, availability, and performance of cloud-based applications and infrastructure. You will collaborate with development, operations, and security teams to build and maintain scalable, secure, and highly available systems.
**Key Responsibilities:**
+ Design, develop, and maintain **reliable, scalable, and highly available systems** on GCP.
+ Build and manage **CI/CD pipelines**, infrastructure as code (IaC), and monitoring solutions.
+ Proactively monitor and manage **system performance, uptime, and capacity** using observability tools.
+ Troubleshoot and resolve **infrastructure and application-level issues** in real time.
+ Implement and maintain **disaster recovery**, **failover mechanisms**, and **backup strategies**.
+ Automate repetitive tasks and processes to improve **efficiency and reduce toil**.
+ Participate in **on-call rotations**, incident management, and root cause analysis (RCA).
+ Ensure compliance with **security standards, privacy regulations, and governance policies**.
+ Collaborate with cross-functional teams to support **DevOps and SRE best practices**.
+ Drive improvements in **SLAs, SLOs, and error budgets** through data-driven insights.
**Required Qualifications:**
+ 5-8 years of relevant experience as an SRE, DevOps Engineer, or Cloud Infrastructure Engineer.
+ Strong hands-on experience with **Google Cloud Platform (GCP)**: Compute Engine, GKE, Cloud Functions, Cloud Storage, IAM, BigQuery, etc.
+ Proficiency in **Infrastructure as Code** tools like **Terraform**, **Deployment Manager**, or **CloudFormation**.
+ Experience with **Kubernetes**, **Docker**, and container orchestration.
+ Proficiency in scripting languages like **Python**, **Shell**, or **Go**.
+ Deep understanding of **monitoring and logging tools** such as **Prometheus**, **Grafana**, **Stackdriver**, or **Datadog**.
+ Knowledge of **CI/CD tools** such as Jenkins, GitLab CI, or Cloud Build.
+ Experience with **incident response**, **postmortem analysis**, and **site reliability principles**.
+ Strong problem-solving and communication skills.
**Preferred Qualifications:**
+ GCP certifications (e.g., **Professional Cloud DevOps Engineer**, **Professional Cloud Architect**).
+ Exposure to **multi-cloud environments** or hybrid cloud infrastructure.
+ Familiarity with **Agile** and **ITIL** frameworks.
+ Experience working in regulated environments with compliance standards (e.g., ISO, SOC 2).
**Employee Type:**
Permanent
UPS is committed to providing a workplace free of discrimination, harassment, and retaliation.
Site Reliability Engineer

Posted 3 days ago
Job Description
**Grade Level (for internal use):**
10
Department overview
S&P Global provides innovative products and services that enhance transparency, reduce risk, and improve operational efficiency. Our customers include banks, hedge funds, asset managers, central banks, regulators, auditors, fund administrators, and insurance companies. We develop large-scale technology platforms and enterprise software to produce global financial data, with a focus on analysis and regulatory requirements.
Position Summary:
We are seeking a proactive and innovative Site Reliability Engineer to join our growing team. In this role, you will be a key player in ensuring the reliability, scalability, and performance of our critical systems. You will move beyond traditional monitoring to implement advanced observability, leverage AIOps for predictive insights, and use Chaos Engineering to proactively uncover system weaknesses. This is an opportunity to help shape a modern SRE culture, automate away toil, and empower our development teams to build more resilient applications from the ground up.
Key Responsibilities
1. Observability & Proactive System Health
+ Design, build, and maintain a comprehensive observability platform using tools like Splunk and OpenTelemetry to provide deep insights into system health and performance.
+ Leverage AIOps principles and platforms to enhance anomaly detection, automate event correlation, and enable predictive alerting, reducing mean time to detection (MTTD).
+ Develop and manage robust alerting strategies and SLO-based dashboards to ensure critical issues are addressed before they impact customers.
+ Drive a data-driven culture by providing engineering teams with the visibility they need to understand the impact of their code in production.
2. Reliability & Resilience Engineering
+ Design, implement, and conduct Chaos Engineering experiments to proactively identify and remediate system weaknesses, architectural flaws, and potential cascading failures.
+ Partner with software engineering teams throughout the application lifecycle to architect for high availability, disaster recovery, and fault tolerance.
+ Define, measure, and evangelize Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and manage the associated error budgets to balance reliability with feature velocity.
+ Analyze and lead blameless post-mortems for incidents, ensuring that root causes are addressed and preventative measures are implemented to avoid recurrence.
3. Performance & Efficiency Optimization
+ Analyze performance metrics and distributed traces to identify and resolve latency bottlenecks across our infrastructure and applications.
+ Implement cost optimization (FinOps) strategies by identifying and eliminating resource waste, optimizing cloud service usage, and promoting efficient architecture patterns.
+ Work with development teams to conduct performance testing and ensure new features do not introduce performance regressions.
4. Automation & Platform Engineering
+ Identify and aggressively automate manual operational tasks (toil) by developing scripts, tools, and self-healing systems.
+ Enhance and maintain our Infrastructure as Code (IaC) modules, promoting reusable patterns and best practices with Terraform.
+ Improve and secure CI/CD pipelines (e.g., GitHub Actions, Azure DevOps) to enable safe, automated, and rapid deployment and rollback procedures.
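For illustration only: the error budgets mentioned in these responsibilities reduce to simple arithmetic. The Python sketch below assumes a time-based availability SLO measured over a fixed window; `error_budget` and `budget_remaining` are hypothetical helper names, not part of any S&P Global tooling.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Downtime allowed by a time-based availability SLO over `window`.

    `slo` is a fraction, e.g. 0.999 for a 99.9% target (hypothetical helper).
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    # The budget is the share of the window the SLO permits to be "bad".
    return window * (1.0 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> float:
    """Fraction of the error budget still unspent; negative once overspent."""
    return 1.0 - downtime / error_budget(slo, window)


# Example: a 99.9% SLO over a 30-day window.
window = timedelta(days=30)
print(error_budget(0.999, window))  # 0:43:12
print(budget_remaining(0.999, window, timedelta(minutes=21, seconds=36)))  # 0.5
```

With these inputs the 30-day budget is 43 minutes 12 seconds, and 21 minutes 36 seconds of downtime spends half of it. A request-based SLO would follow the same pattern with event counts in place of durations.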
Requirements and Qualifications
Core Technical Skills
+ Experience: 4+ years in a Site Reliability, DevOps, or Cloud Engineering role, with demonstrable experience in a large-scale production environment.
+ Cloud Proficiency: Deep experience with AWS services (EKS, ECS, EC2, S3, RDS, Lambda) and managing production workloads in the cloud.
+ Observability: Proficient in application observability, monitoring, and logging. Hands-on experience with tools like Splunk, OpenTelemetry, Prometheus, Grafana, or Datadog is essential.
+ Infrastructure as Code (IaC): Strong experience with Terraform for provisioning and managing cloud infrastructure.
+ Containerization: Solid understanding of containerization technology, particularly managed services such as EKS or ECS.
+ CI/CD: Experience building and maintaining CI/CD pipelines using tools like GitHub Actions, Azure DevOps, or Jenkins.
+ Scripting & Automation: Strong scripting skills in languages like Python, Bash, or PowerShell for automation and tooling. Familiarity with a higher-level language such as C# (.NET) is a plus.
+ Modern Practices: Experience with or a demonstrated understanding of AIOps concepts and Chaos Engineering principles and tools (e.g., Gremlin, AWS Fault Injection Simulator).
Professional Attributes
+ SRE Mindset: A true understanding of Site Reliability Engineering principles, including SLOs, error budgets, and the value of eliminating toil.
+ Problem-Solving: Excellent troubleshooting and problem-solving skills, with a methodical approach to resolving complex technical issues under pressure.
+ Collaboration: Ability to work effectively with development teams, product managers, and other stakeholders, communicating complex technical ideas clearly.
+ Ownership & Drive: A strong sense of ownership, urgency, and a passion for building and maintaining highly available, performant, and reliable systems.
+ Agile Experience: Comfortable working in an agile environment and contributing to team sprints and planning.
+ On-Call: Willingness to participate in a scheduled on-call rotation.
Education & Certifications
+ Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience.
+ AWS certification (e.g., AWS Certified Solutions Architect, DevOps Engineer) is highly preferred.
**About S&P Global Market Intelligence**
At S&P Global Market Intelligence, a division of S&P Global, we understand the importance of accurate, deep, and insightful information. Our team of experts delivers unrivaled insights and leading data and technology solutions, partnering with customers to expand their perspective, operate with confidence, and make decisions with conviction.
For more information, visit .
**What's In It For You?**
**Our Purpose:**
Progress is not a self-starter. It requires a catalyst to be set in motion. Information, imagination, people, technology: the right combination can unlock possibility and change the world.
Our world is in transition and getting more complex by the day. We push past expected observations and seek out new levels of understanding so that we can help companies, governments and individuals make an impact on tomorrow. At S&P Global we transform data into Essential Intelligence®, pinpointing risks and opening possibilities. We Accelerate Progress.
**Our People:**
We're more than 35,000 strong worldwide, so we're able to understand nuances while having a broad perspective. Our team is driven by curiosity and a shared belief that Essential Intelligence can help build a more prosperous future for us all.
From finding new ways to measure sustainability, to analyzing the energy transition across the supply chain, to building workflow solutions that make it easy to tap into insight and apply it, we are changing the way people see things and empowering them to make an impact on the world we live in. We're committed to a more equitable future and to helping our customers find new, sustainable ways of doing business. We're constantly seeking new solutions that have progress in mind. Join us and help create the critical insights that truly make a difference.
**Our Values:**
**Integrity, Discovery, Partnership**
At S&P Global, we focus on Powering Global Markets. Throughout our history, the world's leading organizations have relied on us for the Essential Intelligence they need to make confident decisions about the road ahead. We start with a foundation of **integrity** in all we do, bring a spirit of **discovery** to our work, and collaborate in close **partnership** with each other and our customers to achieve shared goals.
**Benefits:**
We take care of you, so you can take care of business. We care about our people. That's why we provide everything you, and your career, need to thrive at S&P Global.
Our benefits include:
+ Health & Wellness: Health care coverage designed for the mind and body.
+ Flexible Downtime: Generous time off helps keep you energized for your time on.
+ Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
+ Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
+ Family-Friendly Perks: It's not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
+ Beyond the Basics: From retail discounts to referral incentive awards, small perks can make a big difference.
For more information on benefits by country, visit:
**Hiring and Opportunity at S&P Global:**
At S&P Global, we are committed to fostering a connected and engaged workplace where all individuals have access to opportunities based on their skills, experience, and contributions. Our hiring practices emphasize fairness, transparency, and merit, ensuring that we attract and retain top talent. By valuing different perspectives and promoting a culture of respect and collaboration, we drive innovation and power global markets.
**Recruitment Fraud Alert:**
If you receive an email from a spglobalind.com domain or any other regionally based domains, it is a scam and should be reported to . S&P Global never requires any candidate to pay money for job applications, interviews, offer letters, "pre-employment training", or equipment/delivery of equipment. Stay informed and protect yourself from recruitment fraud by reviewing our guidelines, fraudulent domains, and how to report suspicious activity here.
---
**Equal Opportunity Employer**
S&P Global is an equal opportunity employer and all qualified candidates will receive consideration for employment without regard to race/ethnicity, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, marital status, military veteran status, unemployment status, or any other status protected by law. Only electronic job submissions will be considered for employment.
If you need an accommodation during the application process due to a disability, please send an email to: and your request will be forwarded to the appropriate person.
**US Candidates Only:** The EEO is the Law Poster describes discrimination protections under federal law. Pay Transparency Nondiscrimination Provision - - Middle Professional Tier I (EEO Job Group)
**Job ID:**
**Posted On:**
**Location:** Gurgaon, Haryana, India
Site Reliability Engineer

Posted 3 days ago
Job Description
**OSTTRA India**
**The Role: Site Reliability Engineer**
**The Team:** SRE is a global team that provides technical support across the suite of OSTTRA products. The SRE team works closely with a highly competent Technical Operation Centre (TOC) and with the Development and Infrastructure teams to carry out proactive work that improves the supportability of our platforms. Our work helps ensure that OSTTRA provides a high-quality service and maintains client satisfaction.
**The Impact:** Together, we build, support, protect and manage high-performance, resilient platforms that process more than 100 million messages a day. Our services are vital to automated trade processing around the globe, managing peak volumes and working with our customers and regulators to ensure the efficient settlement of trades and effective operation of global capital markets.
**What's in it for you:**
OSTTRA is seeking a Site Reliability Engineer to join the SRE Team. The role will specialise in designated platforms, providing 2nd-line technical support to the TOC as well as integration support for our Trade Processing applications. This person will report directly to the regional SRE manager and work closely with an experienced global team to contribute to the quality of our support.
You will have 6-10 years' experience in roles such as Site Reliability Engineer or Application Support, including Project Management tasks, to meet the needs of our expanding portfolio of Financial Services clients.
This role presents an excellent opportunity to be part of an agile team based out of India, collaborating with colleagues across multiple regions globally, with a strong focus on delivering value through self-service.
**Responsibilities:**
+ Your duties will include Capacity Management, Operational Support Design, Audit Preparation, Incident Escalation, Problem Management Engagement, DR Design and Execution and ad hoc High Profile Client Engagement for your designated platform(s) in our full suite of OTC Derivative products and FX for post-trade confirmation processing.
+ You will need to demonstrate excellent communication skills, a natural ability to learn and a keen interest in technology. You must be a team player and enjoy working in a high-performance collaborative environment with multiple teams.
+ The successful candidate will need to apply strong technical skills and good business knowledge, together with investigative techniques and problem-solving skills, to identify gaps and improve the overall estate, bringing resilience and stability to the platform(s).
+ Liaise with other teams across Product, Development and particularly Infrastructure as required for 3rd-line escalation. Product, business stakeholders or clients will at times require technical advice for solution delivery.
+ Work closely with the Development and Infrastructure teams to understand and ensure the supportability of platforms, and liaise with delivery teams to ensure readiness for new platform releases. Based in our Gurgaon office, you will be responsible for handling, identifying and communicating technical resolutions in English.
**What We're Looking For:**
+ University graduate or equivalent, with a bachelor's degree in computer science
+ Experience in, or strong motivation for, managing the capacity, performance, throughput and EOS/EOL of platforms, from infrastructure to software
+ Experience in troubleshooting issues, defining supportability, and working within the software development life cycle (SDLC) to streamline application delivery from Dev/QA to UAT/Production
+ Good understanding of Site Reliability Engineering and Application Support processes, including incident support and the design and execution of disaster recovery
+ Strong ability to understand application architecture, navigate effectively to the problem area, and identify proactive measures for resiliency and recovery design
+ Ability to apply analytical methods, such as trend and distribution analysis, to gain insight from application data that helps with troubleshooting and choosing the best approach
+ Ability to understand business workflows and tie them to the technical implementation
+ Experience in reading and tracing Java, C++, Python and/or scripting languages
+ Experience with databases, including SQL scripting (preferably, but not limited to, Oracle)
**Good to Have:**
+ Understanding of networking principles, their practical uses and basic troubleshooting.
+ Understanding of cloud (AWS, GCP or Azure) and PaaS, and of implementations with Kubernetes, OpenShift, Windows and Linux
+ Experience in handling client issues and expectation management
+ Good understanding of messaging platforms, formats and protocols such as XML, XSLT, IBM MQ, AMQ, etc.
+ Knowledge of financial messaging protocols such as FIX, FpML, TOF, etc.
+ Experience with security protocols for connection encryption, such as SSL and TLS
+ Experience working in the finance industry
+ Knowledge of the Financial OTC Derivative and FX products
+ Awareness of Derivatives products and post trade processing (desirable)
**The Location: Gurgaon, India**
**About Company Statement:**
OSTTRA is a market leader in derivatives post-trade processing, bringing innovation, expertise, processes and networks together to solve the post-trade challenges of global financial markets. OSTTRA operates cross-asset post-trade processing networks, providing a proven suite of Credit Risk, Trade Workflow and Optimization services. Together these solutions streamline post-trade workflows, enabling firms to connect to counterparties and utilities, manage credit risk, reduce operational risk and optimize processing to drive post-trade efficiencies.
OSTTRA was formed in 2021 through the combination of four businesses that have been at the heart of post trade evolution and innovation for the last 20+ years: MarkitServ, Traiana, TriOptima and Reset. These businesses have an exemplary track record of developing and supporting critical market infrastructure and bring together an established community of market participants comprising all trading relationships and paradigms, connected using powerful integration and transformation capabilities.
**About OSTTRA**
_Candidates should note that OSTTRA is an independent firm, jointly owned by S&P Global and CME Group. As part of the joint venture, S&P Global provides recruitment services to OSTTRA; however, successful candidates will be interviewed and directly employed by OSTTRA, joining our global team of more than 1,200 post trade experts._
OSTTRA is a joint venture, owned 50/50 by S&P Global and CME Group.
With an outstanding track record of developing and supporting critical market infrastructure, our combined network connects thousands of market participants to streamline end-to-end workflows, from trade capture at the point of execution, through portfolio optimization, to clearing and settlement.
Joining the OSTTRA team is a unique opportunity to help build a bold new business with an outstanding heritage in financial technology, playing a central role in supporting global financial markets.
Learn more at .
**What's In It For You?**
**Benefits:**
We take care of you, so you can take care of business. We care about our people. That's why we provide everything you and your career need to thrive at S&P Global.
Our benefits include:
+ Health & Wellness: Health care coverage designed for the mind and body.
+ Flexible Downtime: Generous time off helps keep you energized for your time on.
+ Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
+ Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
+ Family Friendly Perks: It's not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
+ Beyond the Basics: From retail discounts to referral incentive awards, small perks can make a big difference.
For more information on benefits by country, visit:
**Equal Opportunity Employer**
S&P Global is an equal opportunity employer and all qualified candidates will receive consideration for employment without regard to race/ethnicity, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, marital status, military veteran status, unemployment status, or any other status protected by law. Only electronic job submissions will be considered for employment.
If you need an accommodation during the application process due to a disability, please send an email to: and your request will be forwarded to the appropriate person.
**US Candidates Only:** The EEO is the Law Poster describes discrimination protections under federal law. Pay Transparency Nondiscrimination Provision - - Professional (EEO-2 Job Categories-United States of America), BSMGMT203 - Entry Professional (EEO Job Group)
**Job ID:**
**Posted On:**
**Location:** Gurgaon, Haryana, India
Site Reliability Engineer

Posted 3 days ago
Job Description
We provide expert, sustainable solutions in records and information management, digital transformation services, data centers, asset lifecycle management, and fine art storage, handling, and logistics. We proudly partner every day with our 225,000 customers around the world to preserve their invaluable artifacts, extract more from their inventory, and protect their data privacy in innovative and socially responsible ways.
Are you curious about being part of our growth story while evolving your skills in a culture that will welcome your unique contributions? If so, let's start the conversation.
Category: Information Technology
Iron Mountain is a global leader in storage and information management services trusted by more than 225,000 organizations in 60 countries. We safeguard billions of our customers' assets, including critical business information, highly sensitive data, and invaluable cultural and historic artifacts. Take a look at our history here.
Iron Mountain helps lower cost and risk, comply with regulations, recover from disaster, and enable digital and sustainable solutions, whether in information management, digital transformation, secure storage and destruction, data center operations, cloud services, or art storage and logistics. Please see our Values and Code of Ethics for a look at our principles and aspirations in elevating the power of our work together.
If you have a physical or mental disability that requires special accommodations, please let us know by sending an email to See the Supplement to learn more about Equal Employment Opportunity.
Iron Mountain is committed to a policy of equal employment opportunity. We recruit and hire applicants without regard to race, color, religion, sex (including pregnancy), national origin, disability, age, sexual orientation, veteran status, genetic information, gender identity, gender expression, or any other factor prohibited by law.
To view the Equal Employment Opportunity is the Law posters and the supplement, as well as the Pay Transparency Policy Statement, CLICK HERE
**Requisition:** J
Site Reliability Engineer

Posted 3 days ago
Job Description
At Amgen, if you feel like you're part of something bigger, it's because you are. Our shared mission, to serve patients living with serious illnesses, drives all that we do.
Since 1980, we've helped pioneer the world of biotech in our fight against the world's toughest diseases. With our focus on four therapeutic areas (Oncology, Inflammation, General Medicine, and Rare Disease), we reach millions of patients each year. As a member of the Amgen team, you'll help make a lasting impact on the lives of patients as we research, manufacture, and deliver innovative medicines to help people live longer, fuller, happier lives.
Our award-winning culture is collaborative, innovative, and science based. If you have a passion for challenges and the opportunities that lie within them, you'll thrive as part of the Amgen team. Join us and transform the lives of patients while transforming your career.
Site Reliability Engineer
**What you will do**
Let's do this. Let's change the world. In this vital role you will be responsible for the reliability, stability, performance, scalability, and security of platforms that support Amgen's digital products and engineering teams. This hands-on role focuses on supporting cloud-based infrastructure, automating operations, maintaining observability, and improving platform reliability through code.
You'll work closely with senior engineers and cross-functional teams to support CI/CD workflows, container platforms, incident response, and enterprise tooling, all while adopting modern SRE principles and practices.
This role is ideal for engineers who have foundational site reliability experience and are looking to expand their skills in a cloud-native, enterprise-scale environment.
**Roles & Responsibilities:**
**Infrastructure & Platform Support**
+ Provision and manage cloud infrastructure using Infrastructure as Code (IaC)
+ Support container orchestration platforms, ensuring availability, access control, and resource management
+ Assist in configuring and maintaining CI/CD pipelines and environments
**Monitoring & Incident Response**
+ Set up and maintain observability tools to track system health and performance
+ Participate in alert tuning, incident resolution, and root cause analysis
+ Support integration of observability platforms with incident response workflows
**Automation & Platform Operations**
+ Automate routine platform tasks such as provisioning, patching, and configuration
+ Write scripts to improve platform reliability, reduce manual work, and enforce compliance
+ Participate in platform upgrades, maintenance windows, and service validation efforts
**AI Enablement & Intelligence**
+ Support the adoption of AI-assisted operational tools for log analysis, anomaly detection, and predictive alerts
+ Collaborate with senior engineers to evaluate AI/ML-based observability and automation platforms
+ Assist in integrating AI-driven insights into dashboards, alerts, or incident workflows
+ Stay current with emerging AI trends in infrastructure and site reliability, and contribute to tool evaluations and pilots
**Collaboration & Enablement**
+ Work with development, QA, and security teams to ensure reliable and secure deployments
+ Document operational procedures, playbooks, and system runbooks
+ Learn and support enterprise collaboration platforms and internal tooling
+ Participate in Agile and SAFe delivery processes (including sprint planning, stand-ups, retrospectives, and PI planning) to ensure security and platform reliability are embedded across development cycles.
**What we expect of you**
We are all different, yet we all use our unique contributions to serve patients. The professional we seek will have these qualifications.
**Basic Qualifications:**
+ Master's or Bachelor's degree and 5 to 9 years of experience in Computer Science, IT or a related field
+ 4 years of hands-on related experience in site reliability, DevOps, or platform engineering roles
+ Hands-on experience with cloud platforms, preferably AWS
+ Familiarity with Kubernetes or container orchestration technologies
+ Exposure to CI/CD practices and pipeline automation
+ Experience troubleshooting Linux systems, processes, and services
**Preferred Qualifications:**
**Must-Have Skills:**
+ Practical experience with **cloud platforms** (e.g., AWS, Azure, or GCP), including compute, networking, IAM, and storage services
+ Familiarity with **container orchestration platforms** (e.g., Kubernetes, Docker), including basic workload deployment and troubleshooting
+ Experience using **Infrastructure as Code (IaC)** tools such as **Terraform** or **CloudFormation**
+ Working knowledge of **Linux administration**, including system services, package management, and file system structures
+ Hands-on exposure to **CI/CD platforms** (e.g., GitLab CI, Jenkins, GitHub Actions) and pipeline troubleshooting
+ Proficiency in **scripting or automation languages** like **Python**, **Bash**, or **Go**
+ Exposure to **observability tooling** (e.g., **Dynatrace**, **Prometheus**, or **Grafana**) for monitoring and alerting
+ Familiarity with **incident management practices** and tools (e.g., runbooks, escalation workflows, basic alert tuning)
+ Version control skills using **Git** and understanding of branching strategies
+ Experience supporting or integrating **enterprise collaboration platforms** (e.g., Jira, Confluence, ServiceNow)
+ Interest in, and basic understanding of, **AI/ML tools** used in infrastructure and operations (e.g., anomaly detection, intelligent alerting, log analysis)
**Good-to-Have Skills:**
+ Experience using Infrastructure as Code (IaC) tools like Terraform or CloudFormation
+ Familiarity with IT incident response workflows and ticketing platforms
+ Knowledge of secrets management, configuration management tools (e.g., Ansible), or logging frameworks
+ Exposure to **AI-assisted tooling** (e.g., AIOps platforms, AI-enhanced alerting, anomaly detection)
**Professional Certifications (Preferred)**
+ Cloud DevOps Certification (AWS/Azure/GCP)
+ Certified Kubernetes Administrator (CKA) or Security Specialist (CKS)
+ CI/CD Platform Certification
+ ITIL Foundation or equivalent service management certification
**Soft Skills:**
+ Strong analytical and troubleshooting skills
+ Collaborative and proactive mindset
+ Effective communication and documentation practices
+ Curiosity and willingness to adopt new tools and methods, including AI integrations
+ Ability to manage time and prioritize tasks in dynamic environments
**Shift Information:** This position is an onsite role and may require working during later hours to align with business hours. Candidates must be willing and able to work outside of standard hours as required to meet business needs.
**What you can expect of us**
As we work to develop treatments that take care of others, we also work to care for your professional and personal growth and well-being. From our competitive benefits to our collaborative culture, we'll support your journey every step of the way.
In addition to the base salary, Amgen offers competitive and comprehensive Total Rewards Plans that are aligned with local industry standards.
**Apply now and make a lasting impact with the Amgen team.**
**careers.amgen.com**
As an organization dedicated to improving the quality of life for people around the world, Amgen fosters an inclusive environment of diverse, ethical, committed and highly accomplished people who respect each other and live the Amgen values to continue advancing science to serve patients. Together, we compete in the fight against serious disease.
Amgen is an Equal Opportunity employer and will consider all qualified applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability status, or any other basis protected by applicable law.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
Site Reliability Engineer
Posted 1 day ago
Job Description
Position: Site Reliability Engineer (SRE)
Role Overview:
We are seeking an experienced Site Reliability Engineer (SRE) with a strong background in Windows infrastructure to manage and optimize our cloud and on-premises environments. The ideal candidate will partner with development teams to improve service reliability, implement automation, and ensure high-performance systems across VMC, AWS, and Azure platforms.
Key Responsibilities:
- Manage day-to-day operations of VMC, AWS, and Azure infrastructure.
- Gather and analyze metrics from operating systems and applications to support performance tuning and fault finding.
- Collaborate with development teams to improve services through rigorous testing and release procedures.
- Participate in system design consulting, platform management, and capacity planning.
- Create sustainable systems and services through automation and Infrastructure as Code (IaC).
- Balance feature development speed and reliability by defining and maintaining service-level objectives (SLOs).
- Build and document automation processes for Infrastructure as a Service (IaaS).
- Oversee backup, patch management, and system maintenance.
- Configure, deploy, maintain, troubleshoot, and monitor container orchestration on AWS.
- Communicate complex technical ideas effectively to both technical and non-technical stakeholders.
Qualifications:
- Bachelor’s degree (or equivalent) in Computer Science or related discipline.
- 7+ years of experience in IT operations, Windows infrastructure, or SRE roles.
- Hands-on experience with Windows Server, Active Directory (AD), LDAP, DNS, network storage, and Azure compute services.
- Strong scripting and programming skills using PowerShell, Python, Ansible, Terraform, and Bash.
- Solid understanding of container orchestration (Kubernetes, Docker, etc.).
- Familiarity with ITSM processes (Incident, Problem, Change Management) using tools such as ServiceNow (preferred).
- Proactive approach to identifying performance bottlenecks and areas for improvement.
- Strong analytical, problem-solving, and communication skills.
- Ability to work both independently and collaboratively with a sense of ownership.
Skills & Attributes:
- Strong interpersonal and teamwork skills.
- Self-driven with a proactive mindset.
- Ability to balance reliability and speed while maintaining high-quality services.