
1,399 Software Reliability jobs in India

Lead Software Engineer - Reliability Engineering, ITC

560093 Bangalore City, Karnataka NIKE

Posted 1 day ago


Job Description

LEAD Site Reliability Engineer

India Technology Center

WHO YOU WILL WORK WITH

You will be a part of a team of talented Site Reliability Engineers focused on delivering reliable and observable software used by millions of athletes* around the world. You will be a part of the Resilience Engineering organization, which includes Reliability Engineering, Live Site Support Engineering, Peak Event Management, Insights & Efficiency Engineering, and Enterprise Systems Engineering.

While a variety of engagement methods exist, SREs are primarily embedded with product delivery teams across Global Technology. These teams span all of Nike’s most critical digital properties: Nike.com, Nike App, SNKRS, brick & mortar retail, wholesale platforms, and supply chain technologies.

WHO WE ARE LOOKING FOR

We are looking for talented and passionate lead full-stack developers with knowledge of data centre infrastructure and cloud platforms who can bring the following:

Bachelor’s degree in Computer Science, Information Systems, Business, or other relevant subject areas
7-12 years of professional experience in software development, operations, or support
Strong design and development experience with Java
Proficient with JavaScript on the frontend (React, Angular, etc.) and backend (Node.js) components
Kubernetes working knowledge and experience
Experience in other modern enterprise languages (functional or other – Scala, Python, Golang, etc.) is preferred
Basic understanding of DNS, Networking, Virtualization, Linux
Expertise in designing/building/supporting scalable cloud-based Micro Services
Experience with Docker and/or Serverless patterns
Experience with at least one NoSQL database such as DynamoDB or Cassandra
Good understanding of RESTful APIs
Good understanding of common tools for service management, agile, and observability: ServiceNow, Jira, Jenkins, Splunk, New Relic, SignalFx

Ability to observe, diagnose, and develop fixes for production issues quickly and efficiently
Ability to develop and drive real-time monitoring solutions that provide visibility into site health and key performance indicators. Practical experience with observability tools such as SignalFx, CatchPoint, Splunk, New Relic, or other industry-specific products is a plus. Customizing observability tools will be part of the job
Strong communication skills (written and verbal), including the ability to articulate issues and their impact(s)
Highly confident and capable of reporting and communicating high-value metrics to leadership. Deep understanding of the business landscape and how site reliability influences our consumers
Ability to work across teams (business and technical) to continuously analyze system performance in production, troubleshoot consumer reported issues, and proactively identify areas in need of optimization
Demonstrated negotiation and influencing skills
Passion for coaching, teaching, mentoring and learning

WHAT YOU WILL WORK ON

As a site reliability engineer, you will be focused on maximum availability, observability, reliability, security, and performance for Nike Digital Experiences.

SREs perform deep problem analysis, detect infrastructure or code defects, define, report, and create observability processes for Key Performance Indicators (KPIs), and work with product delivery teams to provide long-term solutions to production issues.



Principal Software Engineer - Reliability Engineering, ITC

560093 Bangalore City, Karnataka NIKE

Posted 1 day ago


Job Description

PRINCIPAL SITE RELIABILITY ENGINEER
India Technology Center

WHO YOU WILL WORK WITH

The Principal Site Reliability Engineer will work alongside a talented team of Site Reliability Engineers focused on delivering reliable and observable software used by millions of athletes* around the world. You will be a part of the Resilience Engineering organization, which includes Site Reliability Engineering, Quality & Release Engineering, Accessibility Engineering, and High Availability/Disaster Recovery. This role reports to the Senior Director, Reliability Engineering.

While a variety of engagement methods exist, SREs are primarily embedded with product delivery teams across Global Technology. These product delivery teams can span all of Nike’s most critical digital properties: Nike.com, Nike App, SNKRS, brick & mortar retail, wholesale platforms, marketing technologies, order fulfillment and supply chain platforms, etc. The Principal Site Reliability Engineer will drive the work done by embedded SREs to a full consumer journey and product level, breaking down organizational silos to deliver true end-to-end consumer reliability.

To deliver Reliability Engineering goals, you will partner with and influence stakeholders not only at multiple levels of Global Technology (Director up to CTO) but also across business units and geographical locations.

WHO WE ARE LOOKING FOR

We’re looking for a talented engineering leader to join our Global Technology Reliability Engineering team in the role of Principal Site Reliability Engineer.

This leader will have a deep software engineering background, a demonstrated ability to influence and partner, and a deep passion for learning and mentoring. The person in this critically important role will have a track record of delivering large scale distributed systems that are made reliable and observable through the application of concepts from Site Reliability Engineering, DevOps, and other relevant disciplines.

10-14 years combined work experience as a software engineer, team lead/principal engineer, or manager leading distributed teams
Deep understanding of how to deliver large scale software with modern reliability and resilience concepts (multi-region, multi-cloud, active/active, canary deploys, synthetic testing, containers, etc.)
Hands-on experience architecting, deploying, and operating software using modern cloud-based distributed system techniques, micro-service architecture patterns, and DevOps processes
Expertise in data structures, algorithms, and complexity analysis. Experience with AI Ops or AI/ML is a plus

Ability to build strong relationships with partners/stakeholders and use technical credibility and influence to drive positive outcomes
Demonstrated experience implementing Service Level Objectives, error budgets, and the associated cultural change
A history of finding and reducing toil within complex systems and processes
Experience with modern observability tooling, processes, and mindset – Splunk, SignalFx, New Relic, CatchPoint, etc. Bonus points for experience with Open Source observability stacks
A passion for learning, teaching, and mentoring
A strong desire for building and motivating teams focused on data-driven continuous improvement

WHAT YOU WILL WORK ON

Site Reliability Engineers are blessed to work with a variety of technologies and teams, have the opportunity to solve complex issues, and contribute to the most important areas of Nike’s global business.

As a Principal Site Reliability Engineer you will:

Partner with leaders in product, engineering, business, and operations to identify and address risks, vulnerabilities, and limits in our end-to-end systems
Technically lead and mentor the SRE team with a focus towards improving the availability, reliability, and observability of Nike’s digital platforms while reducing the burden of toil using tooling, automation, or process change
Use your technical expertise to identify training and up-skilling opportunities, monitor industry trends, and define new reliability patterns for the broader organization
Influence systems design decisions and patterns across business-value engineering teams, infrastructure teams, and architecture
Make the life of on-call engineers safe by delivering deep observability, actionable alerts and runbooks, and iterative Service Level Objectives that truly align with consumer experience
Strategically define a multi-year roadmap in collaboration with peer engineering teams, geo partners, and product management teams
Identify, curate, implement, and adapt key metrics for end-to-end system health and performance


Software Engineer I - Reliability Engineering, ITC

560093 Bangalore City, Karnataka NIKE

Posted 1 day ago


Job Description

Site Reliability Engineer I

India Technology Center

WHO YOU’LL WORK WITH

As a Software Engineer specializing in Resilience Engineering, you will play a critical role in ensuring the maximum availability, observability, reliability, security, and performance of Nike’s digital experiences. This position requires a proactive approach to maintaining robust, consumer-facing systems that support millions of users worldwide.

In this role, you will focus on in-depth problem analysis, identify infrastructure and code-level defects, establish observability processes for key performance indicators (KPIs), and collaborate closely with product delivery teams to design sustainable solutions to production challenges. Your expertise will be vital to enhancing Nike’s commitment to a seamless and resilient digital experience.

WHO WE ARE LOOKING FOR

Nike is seeking talented and driven full stack developers with expertise in cloud infrastructure and services. The ideal candidate will possess:

A Bachelor’s degree in Computer Science, Information Systems, or a related field
Alternatively, 1-3 years of relevant professional experience in lieu of a degree will be considered
Proven experience in designing and developing applications using Java, Node.js, or similar languages
Familiarity with front-end frameworks (e.g., React, Angular) is advantageous
Experience with modern programming languages such as Scala, Python, or Golang is preferred
A solid understanding of DNS, networking, virtualization, and Linux operating systems
Demonstrated expertise in building and managing scalable, cloud-based microservices, ideally on AWS
Experience with Docker or serverless architectures
Proficiency in at least one NoSQL database (e.g., DynamoDB, Cassandra)
Strong understanding of RESTful APIs
Familiarity with service management, agile, and observability tools such as ServiceNow, Jira, Jenkins, Splunk, New Relic, and SignalFX

WHAT YOU’LL WORK ON

Observing, diagnosing, and quickly resolving production issues with precision to minimize service interruptions
Developing and implementing real-time monitoring solutions that deliver essential insights into system health and key performance indicators
Communicating technical issues and their business impacts clearly, ensuring alignment across teams and effective response strategies
Reporting high-value metrics and insights to leadership, demonstrating the impact of site reliability on consumer experience and overall business objectives
Managing IT service processes such as Incident, Problem, Change, and Knowledge Management to maintain service quality and reliability
Collaborating closely with both business and technical teams to analyze system performance, troubleshoot consumer-reported issues, and proactively optimize system efficiency
Leading initiatives to enhance application reliability for high-demand consumer web and mobile platforms, ensuring consistent performance
Leveraging negotiation and influence to foster alignment and drive collaborative solutions across multiple teams
Promoting a culture of growth by coaching, mentoring, and sharing knowledge, supporting continuous improvement and resilience across the team

Join us in delivering resilient, high-performance digital solutions that will empower millions of consumers around the world. Your skills and insights will be pivotal in driving Nike’s digital transformation.


Software Engineer I - Reliability Engineering, ITC

560093 Bangalore City, Karnataka NIKE

Posted 1 day ago


Job Description

Site Reliability Engineer I

India Technology Center

WHO YOU’LL WORK WITH

WHO WE ARE LOOKING FOR

Nike is seeking talented and driven full stack developers with expertise in cloud infrastructure and services. The ideal candidate will possess:

A Bachelor’s degree in Computer Science, Information Systems, or a related field
Alternatively, 1-3 years of relevant professional experience in lieu of a degree will be considered
Proven experience in designing and developing applications using Java, Node.js, or similar languages
Familiarity with front-end frameworks (e.g., React, Angular) is advantageous
Experience with modern programming languages such as Scala, Python, or Golang is preferred
A solid understanding of DNS, networking, virtualization, and Linux operating systems
Demonstrated expertise in building and managing scalable, cloud-based microservices, ideally on AWS
Experience with Docker or serverless architectures
Proficiency in at least one NoSQL database (e.g., DynamoDB, Cassandra)
Strong understanding of RESTful APIs
Familiarity with service management, agile, and observability tools such as ServiceNow, Jira, Jenkins, Splunk, New Relic, and SignalFX

WHAT YOU’LL WORK ON

Observing, diagnosing, and quickly resolving production issues with precision to minimize service interruptions
Developing and implementing real-time monitoring solutions that deliver essential insights into system health and key performance indicators
Communicating technical issues and their business impacts clearly, ensuring alignment across teams and effective response strategies
Reporting high-value metrics and insights to leadership, demonstrating the impact of site reliability on consumer experience and overall business objectives
Managing IT service processes such as Incident, Problem, Change, and Knowledge Management to maintain service quality and reliability
Collaborating closely with both business and technical teams to analyze system performance, troubleshoot consumer-reported issues, and proactively optimize system efficiency
Leading initiatives to enhance application reliability for high-demand consumer web and mobile platforms, ensuring consistent performance
Leveraging negotiation and influence to foster alignment and drive collaborative solutions across multiple teams
Promoting a culture of growth by coaching, mentoring, and sharing knowledge, supporting continuous improvement and resilience across the team


Software Engineer III - Reliability Engineering, ITC

560093 Bangalore City, Karnataka NIKE

Posted 1 day ago


Job Description

SRE III, Reliability Engineering

India Technology Center

WHO YOU’LL WORK WITH

WHO WE ARE LOOKING FOR

The ideal candidate will have a strong software engineering background, a demonstrated ability to influence and partner, and show a passion for learning and mentoring. This engineer will have a track record of delivering reliable and observable digital experiences through the application of concepts from Site Reliability Engineering, DevOps, and other relevant disciplines.

Bachelor’s degree in Computer Science, Information Systems, or other relevant subject areas
4-7 years of professional experience in software engineering

Deep understanding of how to deliver large scale software with modern reliability and resilience concepts (multi-region, multi-cloud, active/active, canary deploys, synthetic testing, containers, etc.)
Hands-on experience building, deploying, and operating software using modern cloud-based distributed system techniques and micro-service architecture patterns. AWS experience preferred

Ability to build strong relationships with partners/stakeholders and use technical credibility and influence to drive positive outcomes
Demonstrated experience implementing Service Level Objectives, error budgets, and the associated cultural change
A history of finding and reducing toil within complex systems and processes

Experience with modern observability tooling, processes, and mindset – Splunk, SignalFx, New Relic, CatchPoint, etc. Bonus points for experience with Open Source observability stacks. Extra bonus points for experience with AI Ops, AI/ML

WHAT YOU’LL WORK ON

As a Site Reliability Engineer, you will be focused on maximum availability, observability, reliability, security, and performance for Nike Digital Experiences. SREs perform deep problem analysis, detect infrastructure or code defects, define, report, and create observability processes for Key Performance Indicators (KPIs), and work with product delivery teams to provide long-term solutions to production issues.

Partner with leaders in product, engineering, business, and operations to identify and address risks, vulnerabilities, and limits in our end-to-end systems
As an embedded engineer, deliver product roadmaps for key Nike initiatives like Nike.com, Nike App, Retail systems, Wholesale platforms, Supply Chain systems, and more
Influence systems design decisions and patterns across business-value engineering teams, infrastructure teams, and architecture
Make the life of on-call engineers safe by delivering deep observability, actionable alerts and runbooks, and iterative Service Level Objectives that truly align with consumer experience
Identify, curate, implement, and adapt key metrics for end-to-end system health and performance


Software Engineer I - Reliability Engineering, ITC

Karnataka, Karnataka Nike

Posted 1 day ago


Job Description

**Site Reliability Engineer I**
India Technology Center
**WHO YOU'LL WORK WITH**
As a Software Engineer specializing in Resilience Engineering, you will play a critical role in ensuring the maximum availability, observability, reliability, security, and performance of Nike's digital experiences. This position requires a proactive approach to maintaining robust, consumer-facing systems that support millions of users worldwide.
In this role, you will focus on in-depth problem analysis, identify infrastructure and code-level defects, establish observability processes for key performance indicators (KPIs), and collaborate closely with product delivery teams to design sustainable solutions to production challenges. Your expertise will be vital to enhancing Nike's commitment to a seamless and resilient digital experience.
**WHO WE ARE LOOKING FOR**
Nike is seeking talented and driven full stack developers with expertise in cloud infrastructure and services. The ideal candidate will possess:
+ A Bachelor's degree in Computer Science, Information Systems, or a related field
+ Alternatively, 1-3 years of relevant professional experience in lieu of a degree will be considered
+ Proven experience in designing and developing applications using Java, Node.js, or similar languages
+ Familiarity with front-end frameworks (e.g., React, Angular) is advantageous
+ Experience with modern programming languages such as Scala, Python, or Golang is preferred
+ A solid understanding of DNS, networking, virtualization, and Linux operating systems
+ Demonstrated expertise in building and managing scalable, cloud-based microservices, ideally on AWS
+ Experience with Docker or serverless architectures
+ Proficiency in at least one NoSQL database (e.g., DynamoDB, Cassandra)
+ Strong understanding of RESTful APIs
+ Familiarity with service management, agile, and observability tools such as ServiceNow, Jira, Jenkins, Splunk, New Relic, and SignalFX
**WHAT YOU'LL WORK ON**
+ Observing, diagnosing, and quickly resolving production issues with precision to minimize service interruptions
+ Developing and implementing real-time monitoring solutions that deliver essential insights into system health and key performance indicators
+ Communicating technical issues and their business impacts clearly, ensuring alignment across teams and effective response strategies
+ Reporting high-value metrics and insights to leadership, demonstrating the impact of site reliability on consumer experience and overall business objectives
+ Managing IT service processes such as Incident, Problem, Change, and Knowledge Management to maintain service quality and reliability
+ Collaborating closely with both business and technical teams to analyze system performance, troubleshoot consumer-reported issues, and proactively optimize system efficiency
+ Leading initiatives to enhance application reliability for high-demand consumer web and mobile platforms, ensuring consistent performance
+ Leveraging negotiation and influence to foster alignment and drive collaborative solutions across multiple teams
+ Promoting a culture of growth by coaching, mentoring, and sharing knowledge, supporting continuous improvement and resilience across the team
Join us in delivering resilient, high-performance digital solutions that will empower millions of consumers around the world. Your skills and insights will be pivotal in driving Nike's digital transformation.
NIKE, Inc. is committed to employing a diverse workforce. Qualified applicants will receive consideration without regard to race, color, religion, sex, national origin, age, sexual orientation, gender identity, gender expression, protected veteran status, or disability. NIKE is committed to working with and providing reasonable accommodation to individuals with disabilities. If, because of a medical condition or disability, you need a reasonable accommodation for any part of the employment process, please call and let us know the nature of your request, your location and your contact information.


Software Engineer II, Reliability Engineering, ITC

Karnataka, Karnataka Nike

Posted 1 day ago


Job Description

**Site Reliability Engineer II**
India Technology Center
**WHO YOU'LL WORK WITH**
You will be a part of a team of talented Site Reliability Engineers focused on delivering reliable and observable software used by millions of athletes* around the world. You will be a part of the Resilience Engineering organization, which includes Reliability Engineering, Live Site Support Engineering, Peak Event Management, Insights & Efficiency Engineering, and Enterprise Systems Engineering.
While a variety of engagement methods exist, SREs are primarily embedded with product delivery teams across Global Technology. These teams span all of Nike's most critical digital properties: Nike.com, Nike App, SNKRS, brick & mortar retail, wholesale platforms, and supply chain technologies.
**WHO WE ARE LOOKING FOR**
The ideal candidate will have a strong software engineering background, a demonstrated ability to influence and partner, and show a passion for learning and mentoring. This engineer will have a track record of delivering reliable and observable digital experiences through the application of concepts from Site Reliability Engineering, DevOps, and other relevant disciplines.
+ Bachelor's degree in Computer Science, Information Systems, or other relevant subject areas
+ 2-4 years of professional experience in software engineering
+ Understanding of how to deliver large scale software with modern reliability and resilience concepts (multi-region, multi-cloud, active/active, canary deploys, synthetic testing, containers, etc.)
+ Hands-on experience building, deploying, and operating software using modern cloud-based distributed system techniques and micro-service architecture patterns. AWS experience preferred
+ Experience with modern observability tooling, processes, and mindset - Splunk, SignalFx, New Relic, CatchPoint, etc. Bonus points for experience with Open Source observability stacks. Extra bonus points for experience with AI Ops, AI/ML
+ Strong design and development experience with Java
+ Proficient with JavaScript on frontend (React, Angular, etc.) and backend (Node.js) components
+ Experience in other modern enterprise languages (functional or other - Scala, Python, Golang, etc.) is preferred
+ Basic understanding of DNS, Networking, Virtualization, Linux
+ Expertise in designing/building/supporting scalable cloud-based Micro Services
+ Experience with Docker and/or Serverless patterns
+ Experience with at least one NoSQL database such as DynamoDB or Cassandra
+ Good understanding of RESTful APIs
+ Strong communication skills (written and verbal), including the ability to clearly articulate issues and their impact(s)
+ Highly confident and capable of reporting and communicating high-value metrics to leadership. Deep understanding of the business landscape and how site reliability influences our consumers
**WHAT YOU'LL WORK ON**
As a Site Reliability Engineer, you will be focused on maximum availability, observability, reliability, security, and performance for Nike Digital Experiences. SREs perform deep problem analysis, detect infrastructure or code defects, define, report, and create observability processes for Key Performance Indicators (KPIs), and work with product delivery teams to provide long-term solutions to production issues.
+ Observing, diagnosing, and quickly resolving production issues with precision to minimize service interruptions
+ Developing and implementing real-time monitoring solutions that deliver essential insights into system health and key performance indicators
+ Communicating technical issues and their business impacts clearly, ensuring alignment across teams and effective response strategies
+ Reporting high-value metrics and insights to leadership, demonstrating the impact of site reliability on consumer experience and overall business objectives
+ Managing IT service processes such as Incident, Problem, Change, and Knowledge Management to maintain service quality and reliability
+ Collaborating closely with both business and technical teams to analyze system performance, troubleshoot consumer-reported issues, and proactively optimize system efficiency
+ Leading initiatives to enhance application reliability for high-demand consumer web and mobile platforms, ensuring consistent performance
+ Leveraging negotiation and influence to foster alignment and drive collaborative solutions across multiple teams
+ Promoting a culture of growth by coaching, mentoring, and sharing knowledge, supporting continuous improvement and resilience across the team
NIKE, Inc. is committed to employing a diverse workforce. Qualified applicants will receive consideration without regard to race, color, religion, sex, national origin, age, sexual orientation, gender identity, gender expression, protected veteran status, or disability. NIKE is committed to working with and providing reasonable accommodation to individuals with disabilities. If, because of a medical condition or disability, you need a reasonable accommodation for any part of the employment process, please call and let us know the nature of your request, your location and your contact information.


Sr Engineer, Software Reliability [T500-20217]

Hyderabad, Telangana TMUS Global Solutions

Posted 3 days ago


Job Description

About T-Mobile:

T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

About TMUS Global Solutions:

TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.

TMUS India Private Limited operates as TMUS Global Solutions.

About the Role:

As a Senior Engineer – Oracle ERP Security & Identity Management, you will be responsible for designing, implementing, and maintaining secure access within Oracle ERP/EPM applications. You will manage role-based security, integrate with Microsoft Entra ID (Azure AD), and support governance across global teams. Your expertise will ensure that ERP platforms remain secure, compliant, and aligned with enterprise identity standards.

We pride ourselves on encouraging a culture of innovation, agile ways of working, and transparency in all we do. Join us and make a tangible impact!

What You’ll Do:

Configure and manage Oracle ERP/EPM security roles, including creation, editing, and assignment of roles and privileges.
Integrate Oracle ERP/EPM with Microsoft Entra ID (Azure AD) for single sign-on and identity federation.
Partner with security and compliance teams to enforce segregation of duties (SoD) and internal control requirements.
Develop and maintain documentation: security design, role matrices, access reviews, audit evidence.
Support provisioning and de-provisioning processes for ERP/EPM users.
Troubleshoot access issues, analyze security logs, and ensure compliance with enterprise security standards.
Collaborate with functional and technical teams to align security design with business processes.
Provide production support for security incidents, role conflicts, and access audits.
Contribute to security governance frameworks, reusable patterns, and automation initiatives for ERP/EPM security.
Support and configure Oracle Risk Management Cloud (RMC) for access certification, advanced controls, and continuous monitoring, ensuring alignment with enterprise compliance standards.

What You’ll Bring:

4-7 years of IT experience, including at least 3 years in Oracle ERP/EPM security administration.
Strong knowledge of role-based access control (RBAC), duty roles, and data security policies.
Hands-on experience integrating Oracle ERP/EPM with Microsoft Entra ID (Azure AD).
Familiarity with provisioning, SSO, MFA, and identity lifecycle management.
Experience in audits, access reviews, and SoD controls.
Strong analytical and problem-solving skills for resolving security and access issues.
Effective communication and collaboration skills with global teams.
Adaptability and resilience in fast-changing environments.
Customer-focused mindset with ability to support governance and compliance needs.

Must Have Skills:

Oracle ERP
Oracle Identity and Access Management (Oracle IAM), SailPoint, Okta, Microsoft Entra ID or any other Identity and access management tool
Oracle Security Roles and Personas

Nice-to-Have:

Oracle certifications in Security, Cloud Administration, or ERP/EPM modules.
Experience with automation of security provisioning using OIC or other tools.
Exposure to compliance frameworks (SOX, USGCI) and audit support.
Familiarity with Agile delivery tools (JIRA, Confluence).


Software Engineering Manager II, Site Reliability Engineering

Bengaluru, Karnataka Google

Posted 1 day ago


Job Description

Software Engineering Manager II, Site Reliability Engineering
Google, Bengaluru, Karnataka, India
**Advanced**
Experience owning outcomes and decision making, solving ambiguous problems and influencing stakeholders; deep expertise in domain.
**Minimum qualifications:**
+ Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
+ 8 years of experience with software development in one or more programming languages.
+ 3 years of experience managing people or teams.
+ 3 years of experience leading projects.
+ 3 years of experience designing, analyzing, and troubleshooting distributed systems.
**Preferred qualifications:**
+ Master's degree in Computer Science or Engineering.
+ Experience designing, analyzing, and troubleshooting large-scale distributed systems or microservices architectures.
+ Experience in driving organizational change.
+ Excellent communication skills and a systematic problem-solving approach.
**About the job**
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google's services (both our internally critical and our externally visible systems) have reliability, uptime appropriate to users' needs, and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.
Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google, while using your expertise in coding, algorithms, complexity analysis and large-scale system design.
SRE's culture of intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
**To learn more:** check out our books on Site Reliability Engineering or read a career profile about why a Software Engineer chose to join SRE.
Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.
**Responsibilities**
+ Lead a team of Software/Systems Engineers in Bengaluru on projects for users, and be responsible for uptime.
+ Defend the service's Service Level Objective (SLO), participate in a sustainable tier-1 on-call rotation, and support a postmortem culture. Ensure our services continue to operate in a reliable, low-toil fashion.
+ Identify and lead opportunities to use Artificial Intelligence and Machine Learning to solve traditional system engineering issues, driving improvements in customer satisfaction.
+ Collaborate with the partner teams to deliver business impact and identify risks inside our corporate systems and help define the expanded functional scope for the team as technologies and business needs evolve.
Information collected and processed as part of your Google Careers profile, and any job applications you choose to submit, is subject to Google's Applicant and Candidate Privacy Policy.
Google is proud to be an equal opportunity and affirmative action employer. We are committed to building a workforce that is representative of the users we serve, creating a culture of belonging, and providing an equal employment opportunity regardless of race, creed, color, religion, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition (including breastfeeding), expecting or parents-to-be, criminal histories consistent with legal requirements, or any other basis protected by law. See also Google's EEO Policy, Know your rights: workplace discrimination is illegal, Belonging at Google, and How we hire.
If you have a need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
Google is a global company and, in order to facilitate efficient collaboration and communication globally, English proficiency is a requirement for all roles unless stated otherwise in the job posting.
To all recruitment agencies: Google does not accept agency resumes. Please do not forward resumes to our jobs alias, Google employees, or any other organization location. Google is not responsible for any fees related to unsolicited resumes.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements.


Principal Software Engineer - Network Reliability Engineering - AI/ML

Oracle

Posted today


Job Description

**Description:**
Oracle Cloud Infrastructure (OCI) provides mission-critical cloud services to enterprises worldwide. The Network Reliability Engineering (NRE) Automation, Reporting, and Tooling team builds innovative solutions that boost the productivity and efficiency of the Global Network Operations Center (GNOC). Our tooling empowers the GNOC and NRE teams with observability, automation, and actionable insights at hyperscale.
As a Principal Software Developer, you will design, build, and deliver scalable automation frameworks and advanced platforms leveraging AI/ML to drive operational excellence across OCI's global network. This includes work on network event-driven data (such as failures), hybrid classification, and both training and inference. You are passionate about developing software that solves real-world operational challenges, thrive in a fast-paced team, and are comfortable working with complex distributed systems. You value simplicity, scalability, and collaboration.
**Responsibilities:**
+ Architect, build, and support distributed systems for process control and execution based on Product Requirement Documents (PRDs).
+ Develop and sustain DevOps tooling, new product process integrations, and automated testing.
+ Develop ML in Python 3; build backend services in Go (Golang); create command-line interface (CLI) tools in Rust or Python 3; and integrate with other services as needed using Go, Python 3, or C.
+ Build and maintain schemas/models to ensure every platform and service write is captured for monitoring, debugging, and compliance.
+ Build and maintain dashboards that monitor the quality and effectiveness of service execution for the "process as code" your team delivers.
+ Build automated systems that route code failures to the appropriate oncall engineers and service owners.
+ Ensure high availability, reliability, and performance of developed solutions in production environments.
+ Support serverless workflow development for workflows that call and utilize the above-mentioned services to support our GNOC, GNRE, and onsite operations and hardware support teams.
+ Participate in code reviews, mentor peers, and help build a culture of engineering excellence.
+ Operate in an Extreme Programming (XP) asynchronous environment (chat/tasks) without daily standups, and keep work visible by continuously updating task and ticket states in Jira.
**Required Qualifications:**
+ 8-10 years of experience in process as code, software engineering, automation development, or similar roles
+ Bachelor's degree in Computer Science and Engineering or a related engineering field
+ Strong coding skills in Go and Python 3
+ Experience with distributed systems, micro-services, and cloud-native technologies
+ Proficiency in Linux environments and scripting languages
+ Proficiency with database creation, maintenance, and coding using SQL and Go or Python 3 libraries
+ Understanding of network operations or large-scale IT infrastructure
+ Excellent problem-solving, organizational, and communication skills
+ Experience using AI coding assistants or AI-powered tools to help accelerate software development, including code generation, code review, or debugging.
**Preferred Qualifications:**
+ Process engineering experience (control systems, proportional-integral-derivative (PID) control, statistical process control (SPC))
+ Proficiency with data modeling, data analysis, and reporting frameworks (e.g., SQL, Spark, Prometheus, Grafana, etc.)
+ Experience with C, C++, Java, or Rust
+ Experience developing automation and tools for network or cloud operations at scale
+ Background in creating dashboards, alerts, and real-time reporting platforms
+ Familiarity with workflow automation (e.g., Apache Airflow), CI/CD pipelines, or infrastructure as code
+ Previous experience supporting or building tools for (any) hyperscale or at-scale cloud network, compute, or storage operations
+ Knowledge of REST APIs, remote procedure calls (RPCs), and service oriented architectures (SOA)
+ Familiarity with Extreme Programming (XP), Agile, and DevOps processes
+ Experience with ticketing and version control systems (e.g., Jira, Git)
Career Level - IC4
**About Us**
As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity.
We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing or by calling in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.


