83 DevOps Engineer jobs in Chennai
Site Reliability Engineer

Posted 1 day ago
Job Description
Explore your next opportunity at an organization that ranks among the world's 500 largest companies. Envision innovative opportunities, discover our rewarding culture, and work with talented teams that push you to grow every day. We know what it takes to lead UPS into the future: passionate people with a unique combination of skills. If you have the qualities, motivation, autonomy, or leadership to lead teams, there are roles suited to your aspirations and your skills, today and tomorrow.
**Job Description:**
**Job Summary:**
We are seeking a skilled and proactive **Site Reliability Engineer (SRE)** with 5-8 years of experience and deep expertise in **Google Cloud Platform (GCP)**. The ideal candidate will be responsible for the reliability, availability, and performance of cloud-based applications and infrastructure. You will collaborate with development, operations, and security teams to build and maintain scalable, secure, and highly available systems.
**Key Responsibilities:**
+ Design, develop, and maintain **reliable, scalable, and highly available systems** on GCP.
+ Build and manage **CI/CD pipelines**, infrastructure as code (IaC), and monitoring solutions.
+ Proactively monitor and manage **system performance, uptime, and capacity** using observability tools.
+ Troubleshoot and resolve **infrastructure and application-level issues** in real time.
+ Implement and maintain **disaster recovery**, **failover mechanisms**, and **backup strategies**.
+ Automate repetitive tasks and processes to improve **efficiency and reduce toil**.
+ Participate in **on-call rotations**, incident management, and root cause analysis (RCA).
+ Ensure compliance with **security standards, privacy regulations, and governance policies**.
+ Collaborate with cross-functional teams to support **DevOps and SRE best practices**.
+ Drive improvements in **SLAs, SLOs, and error budgets** through data-driven insights.
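For readers unfamiliar with the SLO and error-budget terms in the last responsibility, the arithmetic can be sketched as follows. This is a minimal illustration, not taken from the posting: the function names, the 30-day window, and the 99.9% target are all assumptions chosen for the example.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% SLO over 30 days and one million requests.
    print(round(error_budget_minutes(0.999, 30), 1))
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

Under these assumed numbers, a 99.9% target over 30 days allows roughly 43 minutes of downtime, and 500 failed requests out of one million would consume about half of the budget.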
**Required Qualifications:**
+ 5-8 years of relevant experience as an SRE, DevOps Engineer, or Cloud Infrastructure Engineer.
+ Strong hands-on experience with **Google Cloud Platform (GCP)** - Compute Engine, GKE, Cloud Functions, Cloud Storage, IAM, BigQuery, etc.
+ Proficiency in **Infrastructure as Code** tools like **Terraform**, **Deployment Manager**, or **CloudFormation**.
+ Experience with **Kubernetes**, **Docker**, and container orchestration.
+ Proficiency in scripting languages like **Python**, **Shell**, or **Go**.
+ Deep understanding of **monitoring and logging tools** such as **Prometheus**, **Grafana**, **Stackdriver**, or **Datadog**.
+ Knowledge of **CI/CD tools** such as Jenkins, GitLab CI, or Cloud Build.
+ Experience with **incident response**, **postmortem analysis**, and **site reliability principles**.
+ Strong problem-solving and communication skills.
**Preferred Qualifications:**
+ GCP certifications (e.g., **Professional Cloud DevOps Engineer**, **Cloud Architect**).
+ Exposure to **multi-cloud environments** or hybrid cloud infrastructure.
+ Familiarity with **Agile** and **ITIL** frameworks.
+ Experience working in regulated environments with compliance standards (e.g., ISO, SOC2).
**Contract Type:**
Permanent
_At UPS, equal opportunity, fair treatment, and an inclusive work environment are key values to which we are committed._
Devops Network Engineers
Posted 3 days ago
Job Description
Required Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
- 5+ years of experience in network engineering, systems administration, or engineering operations
- Strong proficiency in Linux and Windows server environments
- Deep understanding of networking protocols (TCP/IP, DNS, DHCP, VPN, routing)
- Experience with CI/CD tools (e.g., Jenkins, GitLab CI), monitoring platforms (e.g., Prometheus, Grafana), and infrastructure as code (e.g., Terraform, Ansible)
- Scripting skills (e.g., Bash, PowerShell, Python)
- Familiarity with cloud platforms (AWS, Azure, GCP) and containerization (Docker, Kubernetes)
- Experience with asset tracking tools and inventory management systems.
Senior Site Reliability Engineer

Posted 1 day ago
Job Description
Are you a visible champion with a 'can do' attitude and enthusiasm that inspires others?
About the Business
LexisNexis Risk Solutions is the essential partner in the assessment of risk. Within our Business Services vertical, we offer a multitude of solutions focused on helping businesses of all sizes drive higher revenue growth, maximize operational efficiencies, and improve customer experience. Our solutions help our customers solve difficult problems in the areas of Anti-Money Laundering/Counter Terrorist Financing, Identity Authentication & Verification, Fraud and Credit Risk mitigation and Customer Data Management. You can learn more about LexisNexis Risk at the link below.
About the Team
This Team performs complex research, design, and software development assignments within a software functional area or product line, and provides direct input to project plans, schedules, and methodology in the development of cross-functional software products.
About the Role
We're looking for an experienced Site Reliability Engineer (SRE) to join our team. In this role, you'll work on meaningful projects that improve the reliability, performance, and efficiency of our systems. You'll help reduce manual work through automation, support incident response, and contribute to continuous improvement efforts.
This position is ideal for someone who enjoys solving complex problems, collaborating across teams, and making systems more resilient and scalable.
Responsibilities:
+ Design and implement solutions to improve system reliability and reduce manual tasks.
+ Monitor distributed systems and their dependencies to ensure performance and availability.
+ Automate recovery processes to maintain service levels.
+ Participate in on-call rotations and support incident response.
+ Share knowledge and provide informal mentorship to team members.
+ Contribute to process and tooling improvements based on hands-on experience.
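As one illustration of the "automate recovery processes" responsibility above, a recovery loop with exponential backoff might look like the sketch below. It is an assumption-laden example, not the team's tooling: `recover_with_backoff`, `check`, and `restart` are hypothetical names, and a real setup would typically rely on an orchestrator's health checks instead.

```python
import time
from typing import Callable


def recover_with_backoff(
    check: Callable[[], bool],
    restart: Callable[[], None],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Restart a service until its health check passes or attempts run out.

    `check` and `restart` are hypothetical hooks supplied by the caller.
    The wait doubles after each restart: base_delay, 2x, 4x, and so on.
    """
    for attempt in range(max_attempts):
        if check():
            return True
        restart()
        sleep(base_delay * (2 ** attempt))
    # Final verdict after the last restart has had time to take effect.
    return check()
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; returning `False` is the natural point at which to page a human.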
Requirements:
+ Experience with SRE practices such as monitoring, incident response, and automation.
+ CI/CD (including microservices pipeline design and large-scale Docker image handling) and Git.
+ Proficiency in writing Terraform modules.
+ Demonstrated networking experience (NAT, outbound proxy, subnet-to-cluster communication).
+ Exposure to application architecture and pre-production performance testing.
+ Demonstrated cloud experience.
+ Linux (including Bash scripting).
+ Strong troubleshooting and cost optimization skills.
+ Security (Snyk, infrastructure security tooling and processes).
+ Familiarity with distributed systems and cloud infrastructure.
+ Ability to write scripts or code to automate tasks (e.g., Python, Bash, Go).
+ Strong problem-solving skills and a collaborative mindset.
+ Willingness to learn and grow in a supportive team environment.
Learn more about the LexisNexis Risk team and how we work.
We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form.
Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants.
Please read our Candidate Privacy Policy.
We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.
USA Job Seekers: EEO Know Your Rights.
RELX is a global provider of information-based analytics and decision tools for professional and business customers, enabling them to make better decisions, get better results and be more productive.
Our purpose is to benefit society by developing products that help researchers advance scientific knowledge; doctors and nurses improve the lives of patients; lawyers promote the rule of law and achieve justice and fair results for their clients; businesses and governments prevent fraud; consumers access financial services and get fair prices on insurance; and customers learn about markets and complete transactions.
Our purpose guides our actions beyond the products that we develop. It defines us as a company. Every day across RELX our employees are inspired to undertake initiatives that make unique contributions to society and the communities in which we operate.
Senior Site Reliability Engineer
Posted 3 days ago
Job Description
We’re looking for an experienced Site Reliability Engineer to fill the mission-critical role of ensuring that our complex, web-scale systems are healthy, monitored, automated, and designed to scale. You will use your background as an operations generalist to work closely with our development teams from the early stages of design all the way through identifying and resolving production issues. The ideal candidate will be passionate about an operations role that involves deep knowledge of both the application and the product, and will also believe that automation is a key component to operating large-scale systems.
6-Month Accomplishments
- Become familiar with the Poshmark tech stack and functional requirements.
- Get comfortable with the automation tools and frameworks used within the CloudOps organization and with the associated deployment processes.
- Gain in-depth knowledge of the relevant product functionality and the infrastructure it requires.
- Start contributing by working on small to medium-scale projects.
- Understand and follow the on-call rotation as a secondary to become familiar with the on-call process.
12+ Month Accomplishments
- Execute projects related to comms functionality independently, with little guidance from the lead.
- Create meaningful alerts and dashboards for the various subsystems involved in the targeted infrastructure.
- Identify gaps in infrastructure and suggest or implement improvements.
- Get involved in the on-call rotation.
Responsibilities
- Serve as a primary point responsible for the overall health, performance, and capacity of one or more of our Internet-facing services.
- Gain deep knowledge of our complex applications.
- Assist in the roll-out and deployment of new product features and installations to facilitate our rapid iteration and constant growth.
- Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large-scale UNIX environment.
- Work closely with development teams to ensure that platforms are designed with "operability" in mind.
- Function well in a fast-paced, rapidly-changing environment.
- Participate in a 24x7 on-call rotation.
Desired Skills
- 4+ years of experience in a Systems Engineering/Site Reliability Operations role is required, ideally in a startup or fast-growing company.
- 4+ years in a UNIX-based large-scale web operations role.
- 4+ years of experience providing 24/7 support for large-scale production environments.
- Battle-proven, real-life experience in running a large-scale production operation.
- Experience working on cloud-based infrastructure (e.g., AWS, GCP, Azure).
- Hands-on experience with continuous integration tools such as Jenkins, configuration management with Ansible, and systems monitoring and alerting with tools such as Nagios, New Relic, and Graphite.
- Experience with scripting or coding.
- Ability to use a wide variety of open source technologies and tools.
Technologies we use:
- Ruby, JavaScript, Node.js, Tomcat, Nginx, HAProxy
- MongoDB, RabbitMQ, Redis, Elasticsearch
- Amazon Web Services (EC2, RDS, CloudFront, S3, etc.)
- Terraform, Packer, Jenkins, Datadog, Kubernetes, Docker, Ansible and other DevOps tools.
Senior Site Reliability Engineer
Posted 3 days ago
Job Description
Dear Candidates,
Greetings from TCS!
TCS is looking for Senior Site Reliability Engineer – AWS
Experience: 8-12 years
Location: Chennai
Must have skills:
- Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
- Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness
- Own and implement monitoring, alerting, logging, and distributed tracing with tools like Dynatrace or Datadog
- Troubleshoot production incidents, conduct blameless postmortems, and improve incident response processes
- Optimize systems for cost, performance, and reliability
- Drive chaos engineering and resilience testing
- Collaborate with development teams to embed SRE practices like SLAs, SLOs, and error budgets
- Mentor junior SREs and promote DevOps/SRE culture across the org
Good to have skills:
- Experience with service meshes (e.g., Istio, App Mesh)
- Familiarity with the AWS Well-Architected Framework
- Experience building self-healing systems and automated remediation
- Background in security, compliance, or multi-account/multi-region AWS architectures
- AWS Certified DevOps Engineer – Professional
- AWS Certified Solutions Architect – Professional
Roles and responsibilities:
- We are looking for a Senior Site Reliability Engineer (SRE) with deep experience in AWS infrastructure, automation, observability, and production support.
- As an SRE, you will ensure our cloud-native systems are resilient, scalable, and efficient, driving reliability through code, not just processes.
- Coordinate daily with the customer, onsite, and testing teams on tasks and bugs.
- Report to TCS management on assigned tasks.
Senior Site Reliability Engineer I

Posted 1 day ago
Job Description
Join Our Diverse and Inclusive Team Delivering High-Quality Software Worldwide
Are you someone who enjoys working with others, solving problems creatively, and making a meaningful difference?
About the Business
At ICIS, our purpose is to optimize the world's resources and empower strategic, sustainable decisions by bringing market transparency to all. We support organizations of every size and background with accessible and actionable insights across global value chains. Our team values diverse perspectives and welcomes candidates from all walks of life.
About the Team
Our teams, called Squads, are made up of people with a variety of skills and experiences, including Squad Leads, Business Analysts, Developers, and Testers. We work collaboratively, learn from each other, and support one another to achieve shared goals. Everyone's contributions are valued, and we encourage open communication and continuous learning.
About the Role
As a Senior Site Reliability Engineer I, you will play a key role in ensuring our applications and infrastructure are reliable, scalable, and secure. You will work closely with development, architecture, and service management teams, using your skills to configure, maintain, monitor, and improve our systems. This role is ideal for someone who enjoys tackling challenges, sharing knowledge, and driving innovation that benefits our customers and colleagues.
Responsibilities
+ Lead efforts to enhance system reliability and scalability, designing solutions that support our evolving business needs while maintaining security and quality.
+ Work collaboratively with software engineers and other teams to design and implement deployment approaches using automated processes for continuous integration and delivery.
+ Help design, develop, test, and implement solutions that improve availability, reliability, and scalability of our applications.
+ Implement infrastructure, configuration, and network as code for supported applications and platforms.
+ Partner with Infrastructure, DevOps, Development, and SRE teams to resolve complex technical challenges.
+ Monitor service levels and proactively address issues before they affect our customers.
+ Promote and support best practices in site reliability engineering across the team.
Requirements
+ Experience working with cloud platforms (such as Amazon Web Services) and Infrastructure as a Service (IaaS).
+ Background in DevOps, site reliability engineering practices, or related areas. We value a variety of experiences and encourage you to apply even if you don't meet every listed qualification.
+ Understanding of site reliability principles and how they can improve the software development process.
+ Familiarity with continuous integration and delivery tools (e.g., Jenkins, GitLab, Terraform).
+ Experience with containers and orchestration tools (such as ECS, Kubernetes, or Docker) is helpful but not required.
+ Ability to diagnose and resolve networking, performance, and optimization issues in distributed systems.
+ Basic knowledge of Linux, networking, and storage fundamentals.
+ Interest in sharing knowledge, supporting teammates, and contributing to a positive team environment.
+ Understanding of release, integration, and deployment processes.
+ Familiarity with monitoring, logging, and alerting tools (such as Grafana, Prometheus, or similar) is a plus.
+ Ability to use scripting languages (such as Python, Bash, or PowerShell) to automate tasks is welcome but not required.
+ Strong communication skills and ability to work collaboratively within a diverse team.
Our Commitment to Diversity, Equity, and Inclusion
We believe that diversity, equity, and inclusion make us stronger. We welcome applications from individuals of all backgrounds, experiences, and abilities. If you require an accommodation at any stage of the application process, please let us know; we are happy to support you.
We encourage individuals from all backgrounds and experiences to apply, even if you are unsure whether you meet every qualification. If you have a passion for reliability, collaboration, and continuous improvement, we want to hear from you!
Senior Site Reliability Engineer I

Posted 1 day ago
Job Description
Are you excited to work with a variety of products and technologies in a collaborative and supportive environment?
Do you enjoy sharing knowledge and learning with colleagues from diverse backgrounds?
About the Business
LexisNexis Risk Solutions is a trusted partner in risk assessment. Within our insurance vertical, we provide customers with innovative solutions and decision tools that leverage public and industry-specific content, advanced technology, and analytics. Our products help drive informed, data-driven decisions across the insurance policy lifecycle, supporting operational efficiency and reducing risk.
About the Team
Our teams are structured in inclusive and collaborative squads, each consisting of a Squad Lead, Business Analyst, Dev Lead, Developers, and Testers. This structure fosters strong connections and empowers members to contribute their unique perspectives and deliver outcomes together.
About the Role
This is a developed professional level role for an SRE. Individuals are responsible for challenging reliability and toil reduction projects. At this level, SREs have hands-on experience across most SRE practices. They have a good understanding of how to observe distributed systems and their dependencies, and how to automate recovery to protect service levels. SREs are on-call and assist others during incidents. They contribute to process improvements through experience and knowledge. Individuals in this role provide informal guidance to junior staff.
Responsibilities
+ Design, modify, develop, write, and implement data applications and REST APIs in cloud.
+ Develop microservices or middleware APIs under a microservice-style architecture.
+ Fix issues in existing systems using industry-standard coding practices.
+ Interface with other technical personnel or team members to finalize requirements.
+ Write and review portions of detailed specifications for the development of complex system components.
+ Implement DevOps strategy in existing and new projects.
+ Upgrade and enhance existing C++ applications to enterprise Java.
+ Work closely with other development team members to understand complex product requirements and translate them into software designs.
+ Successfully implement development processes, coding best practices, and code reviews.
+ Operate in various development environments (Agile, Waterfall, etc.) while collaborating with key stakeholders.
+ Resolve complex technical issues, as necessary.
+ Keep abreast of new technology developments.
+ Design and work with complex data models.
+ All other duties as assigned.
Requirements
+ At least 7-8 years of software development experience with enterprise Java.
+ Proficiency in Java programming with Spring Framework
+ At least 3 years of experience with asynchronous, event-driven microservices programming or event messaging under a pub/sub model.
+ Strong secure coding and system vulnerability mitigation experience.
+ Strong experience in AI-assisted software development.
+ Strong knowledge of Docker containers and the Docker CLI; strong practical knowledge of Kubernetes and the Kubernetes CLI is desired.
+ Expertise in other development languages including but not limited to PowerShell, Python, and SQL is desired.
+ Strong knowledge of multiple data storage subsystems, especially relational databases including MySQL, Sybase, PostgreSQL, and/or Oracle.
+ Advanced knowledge of software development methodologies (e.g., Agile, Waterfall).
+ Strong proficiency with data manipulation language including optimization techniques.
+ Strong understanding of normalized/dimensional data modeling principles.
+ Strong research skills.
+ Strong proficiency with DevOps and Continuous Integration/Delivery tools is desired: GitLab, GitHub Actions, Ansible, and Linux Bash scripting.
+ Strong experience working with one of the two major cloud vendors (AWS, Azure) and/or a strong foundational understanding of Microsoft Azure IaaS and PaaS service offerings is desired (e.g., Azure VMs, Azure App Service, Azure Container Instances, Azure Storage accounts, Azure Database for MySQL).
+ Good to have Azure/AWS fundamental and/or developer certifications.
+ Experience with Azure cloud services, in particular running containerized workloads in Azure Kubernetes Service, is a big plus.
+ Ability to interface competently with other technical personnel or team members to finalize requirements.
+ Ability to work well with internal and external technology resources.
+ Ability to partner and lead internal and external technology resources in solving complex business needs.
+ Knowledge of test-driven development and unit testing frameworks/methodologies.
+ Ability to write and review portions of detailed specifications for the development of complex system components.
+ Good oral and written communication skills.
+ 7-8 years of software engineering experience.
+ B.Sc. (IT)/B.Sc. (CS)/BCA/MCA/BE (Computers/IT) or equivalent experience required.
Learn more about the LexisNexis Risk Solutions team and how we work.
We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or please contact .
Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here.
Please read our Candidate Privacy Policy.
We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.
USA Job Seekers:
EEO Know Your Rights.
RELX is a global provider of information-based analytics and decision tools for professional and business customers, enabling them to make better decisions, get better results and be more productive.
Our purpose is to benefit society by developing products that help researchers advance scientific knowledge; doctors and nurses improve the lives of patients; lawyers promote the rule of law and achieve justice and fair results for their clients; businesses and governments prevent fraud; consumers access financial services and get fair prices on insurance; and customers learn about markets and complete transactions.
Our purpose guides our actions beyond the products that we develop. It defines us as a company. Every day across RELX our employees are inspired to undertake initiatives that make unique contributions to society and the communities in which we operate.
Senior Site Reliability Engineer 1

Posted 1 day ago
Job Description
Are you a visible champion with a 'can do' attitude and enthusiasm that inspires others?
About the Business
LexisNexis Risk Solutions is the essential partner in the assessment of risk. Within our Business Services vertical, we offer a multitude of solutions focused on helping businesses of all sizes drive higher revenue growth, maximize operational efficiencies, and improve customer experience. Our solutions help our customers solve difficult problems in the areas of Anti-Money Laundering/Counter Terrorist Financing, Identity Authentication & Verification, Fraud and Credit Risk mitigation, and Customer Data Management. You can learn more about LexisNexis Risk at the link below.
About the Team
This team performs complex research, design, and software development assignments within a software functional area or product line, and provides direct input to project plans, schedules, and methodology in the development of cross-functional software products.
About the Role
We are looking for a Senior DevOps / Site Reliability Engineer (SRE) with 7+ years of experience to join our high-performing engineering team. This role is pivotal in building scalable systems, reducing operational toil, and improving the reliability of our infrastructure and applications. As a senior member, you'll lead initiatives around automation, observability, and resilience, while collaborating closely with product, development, and operations teams.
Responsibilities:
+ Lead initiatives to identify and eliminate manual, repetitive tasks through automation and tooling.
+ Develop self-healing infrastructure solutions and drive continuous operational efficiency. Lead efforts to build resilient systems and proactively identify potential points of failure across the stack.
+ Design and implement reliability-focused automation and tooling to ensure consistent system performance and uptime.
+ Support post-release validations and operational readiness assessments to ensure smooth rollouts.
+ Occasional weekend support may be required (e.g., during major releases or critical changes).
+ Design, implement, and manage cloud-native infrastructure using Terraform and other IaC tools.
+ Ensure infrastructure follows principles of scalability, fault tolerance, and security.
+ Design and implement robust monitoring and alerting solutions using Elastic Stack, OpenTelemetry (OTEL), and similar tools.
+ Define and manage SLIs/SLOs, and partner with development teams to ensure service reliability.
+ Partner with engineering teams to create & improve CI/CD pipelines and deployment processes.
+ Provide technical leadership and recommendations to improve system architecture, release velocity, and developer productivity.
Requirements:
+ Good experience with Linux (OS) and AWS (cloud).
+ Strong skills in Terraform and Ansible for infrastructure.
+ Good experience in DevOps, Site Reliability Engineering, or Infrastructure Engineering.
+ Strong experience with AWS services and Terraform for IaC.
+ Deep understanding of incident response, post-mortem analysis, and reliability engineering principles.
+ Proven track record with Elastic Stack, or other observability tools.
+ Proficient in scripting (Python, Bash, etc.) and working with Git-based workflows.
+ Solid grasp of modern CI/CD tooling and software development lifecycle practices.
Good to Have Skills:
+ Experience in Azure, Kubernetes, or container orchestration tools.
+ Experience with OpenTelemetry.