16 Site Reliability Engineer jobs in Coimbatore
Site Reliability Engineer
Posted today
Job Description
Job Title: Site Reliability Engineer
About noon
noon.com is a technology leader with a simple mission: to be the best place to buy and sell things. In doing this we hope to accelerate the digital economy of the Middle East, empowering regional talent and businesses to meet the full range of consumers' online needs.
noon operates without boundaries; we are aggressively and voraciously ambitious. Starting in 2017 with noon.com, the region’s homegrown e-commerce platform and leading online shopping destination, noon is now a digital ecosystem of products and services - noon, noon Food, Noon in Minutes, NowNow, SIVVI, noon One, and noon Pay.
At noon we have the courage to pursue what seems impossible, we work hard to get things done, and we go to great lengths to ensure that the experience of everyone, from our customers to our sellers and noon Bandidos, is stellar. Above all, we are grateful for the opportunities we have. If the above values resonate with you, you will enjoy this incredible journey with us!
Job Description
As a Site Reliability Engineer (SRE) at noon payments, you will play a crucial role in maintaining and enhancing the reliability, availability, and performance of our cloud-based infrastructure and services.
You will be responsible for automating deployments, optimizing systems, and ensuring seamless performance across our platforms. This position requires a strong foundation in cloud infrastructure management, particularly with Azure (AKS) and GCP (GKE), alongside hands-on experience with Azure DevOps and monitoring tools like Datadog.
You will:
- Cloud Infrastructure Management: Manage and optimize cloud environments across Azure and GCP, ensuring efficient resource utilization, high system availability, and scalability (AKS and GKE).
- Infrastructure as Code: Utilize Terraform for infrastructure provisioning, ensuring consistent and scalable deployments, and managing infrastructure via Azure DevOps pipelines.
- Configuration Management: Implement and manage system configurations using Ansible to ensure consistency and streamline updates across different environments.
- Continuous Integration/Continuous Deployment (CI/CD): Develop, maintain, and optimize CI/CD pipelines within Azure DevOps to automate testing and deployment processes, reducing time from development to production.
- Monitoring and Observability: Set up and maintain comprehensive monitoring and observability solutions using Datadog to track system health and performance and to proactively detect issues.
- Container Orchestration: Deploy, manage, and optimize Kubernetes clusters to support scalable and resilient application deployments.
- Incident Management: Participate in a 24/7 on-call or roster-based team to respond to incidents, conduct root cause analysis, and implement solutions to minimize downtime and ensure system reliability.
- Performance Tuning: Continuously monitor system performance, identify bottlenecks, and implement optimizations to improve efficiency and response times.
- Capacity Planning: Plan and manage system capacity to ensure resources meet current and future demands, enabling seamless service delivery.
- Collaboration: Work closely with Network Operations Center (NOC) and DevOps teams to troubleshoot issues, optimize deployment processes, and drive continuous improvement.
- Documentation: Create and maintain detailed documentation for system configurations, deployment processes, and incident reports.
Skill Requirements
- Bachelor’s degree in Computer Science, Information Technology, or another related discipline, or equivalent related experience.
- Cloud, ITIL, CKA certifications are a plus.
- 6+ years of directly related or relevant experience, preferably in information security.
- Extensive experience with cloud platforms such as Azure, GCP, and Huawei Cloud.
- Proficiency with Terraform for infrastructure automation and Ansible for configuration management.
- Hands-on experience with Kubernetes for container orchestration, mainly AKS and GKE.
- Expertise in monitoring and observability tools such as Datadog.
- Familiarity with Azure VMSS and GCP MIG for virtual machine scaling and management.
- Experience in a 24/7 on-call or roster-based team environment, focusing on system uptime and incident response.
- Strong understanding of SRE processes and best practices for system reliability, availability, and performance.
- Excellent problem-solving skills and the ability to handle complex technical issues under pressure.
- Effective communication skills and a collaborative approach to working with diverse teams.
- Experience with payment gateway projects or similar high-transaction systems is preferred.
- Additional knowledge in advanced monitoring techniques, performance tuning, and capacity planning is a plus.
Who will excel?
We’re looking for candidates who thrive in a fast-paced, dynamic start-up environment. We’re searching for problem solvers, people who operate with a bias for action and have a deep understanding of the importance of resourcefulness over reliance.
Candor is our only default. Demanding unequivocal high standards should be non-negotiable because quality matters. We want people who are radically candid, cohorts who commit to settling for nothing but the best - in hiring, in accepting work from colleagues, and in your own work.
Ours is not an easy mission, but it is a meaningful one. Every hire must actively raise the bar of talent in the company to help us reach our vision.
Site Reliability Engineer
Posted today
Job Description
Role: Senior Site Reliability Engineer
Location: Remote position
Mode: Remote (may need to travel to nearby Concentrix office as per business need)
Minimum Experience required: 8+ Years
Key Responsibilities:
- Stakeholder Management: Work with key technology stakeholders to deliver SRE strategies and capabilities that drive and support the digital transformation agenda.
- Work with and use Agile SDLC methodology to deliver critical automation and solutions.
E2E SRE Lifecycle Management
- Develop automated CI/CD pipelines for building, deploying, and delivering critical software.
- Create automated infrastructure mechanisms, leveraging Infrastructure-as-Code (IAC) tooling and processes.
- Write modular IAC code to provision, maintain, and decommission infrastructure in any of the major cloud vendors (AWS, Azure, GCP) – with the end goal of creating infrastructure on demand.
- Help define CI/CD pipeline design & configuration – which can be used to set the standard for the corporation.
- Build new features and incorporate best practices to enhance the CI/CD processes and innovate on existing processes.
- Integrate security tooling into every stage of the development journey, continually delivering security throughout the software development and delivery process.
- Lead the planning and documentation of complex CI/CD strategies based on customer needs.
- Maintain CI/CD pipelines and configurations and perform reviews of the same.
- Analyze and repair broken pipelines and failures.
- Investigate, diagnose, and repair complex pipeline and testing bugs, and manage CI/CD reports.
- Understand business processes at the Solution level to integrate industry best practices into DevSecOps process.
- Track usage of DevSecOps practices across business units.
Experience:
- 5-8 years leading DevSecOps/SRE teams to help drive digital transformation in the financial services or technology domain.
- Experience developing and implementing DevSecOps practices using modern technology stacks and patterns.
- Familiarity with various CI/CD methodologies and their corresponding tools.
- Proficiency in the SRE role, including CI/CD, automation, IaC, security best practices built into the process, continuous monitoring, and Agile ways of working.
- Strong experience in driving DevSecOps transformation.
- Proven track record in partnering with the business to support SRE mission critical transformations using an Agile approach.
Skills and Competencies:
- Hands-on experience in an SRE role.
- Proficient in defining CI/CD pipelines, security best practices, deployment best practices in cloud using Azure, AWS, GCP, or equivalent public cloud.
- DevSecOps SRE Strategy: Align the Strategy with Agile Development Model and use relevant techniques such as blue-green deployments, canary releases, continuous security scanning, etc. to implement scalable, automated DevSecOps practices.
- Communication: Ability to influence and help communicate the organization’s direction, defend technical rationale for decisions, and ensure results are achieved.
- Collaboration: Proven track record of building collaborative partnerships and ability to operate effectively in a global environment.
- Diverse environment: Can-do attitude and ability to work in a fast-paced environment.
Tech Stack preferred:
- Development & Delivery Methods: Agile (Scaled Agile Framework)
- Languages and Scripting: Python, YAML, Groovy, Shell, awk, Ansible, Terraform
- DevSecOps: Azure DevOps, Jenkins, JFrog or Nexus, GitLab, GitHub, Maven/Ant/Gradle
- Security: SonarQube, BlackDuck, Fortify
- Container and Orchestration Platforms: Azure Kubernetes Service (AKS), GKE, EKS, Kubernetes, Docker, Docker Swarm
- Release and Monitoring Tools and Methodologies: Blue-Green Deployment, Canary Release, Splunk, Grafana (LGTM) or ES.
Site Reliability Engineer
Posted today
Job Description
About Mindcurv
We help our customers rethink their digital business, experiences, and technology to navigate the new digital reality.
We do this by designing sustainable and accountable solutions for humans living in a digital world.
Mindcurv holistically covers the market’s need to digitalise business processes and customer experiences and take advantage of the cloud following DevOps and agile principles.
We cater to the following six solution lines:
- Strategy and Advisory
- Creative Services and Digital Products
- Client Engagement Platforms
- Digital Experience and Solutions
- Data Services
- Cloud Platforms and Managed Services
We are made up of a team of experts from various domains that define, create, and improve digital experiences.
They engage with people to achieve solutions that enable businesses to grow and scale sustainably.
Within Digital Platforms & Commerce we design and fully craft tailored solutions for our customers enabling them to get the most out of their business.
We design and build a solid foundation in commerce, marketplace, responsive design, DXP and order management to name a few.
You’ll be instrumental in automating operations, solving complex infrastructure challenges, and driving continuous improvement to deliver seamless and resilient services.
Your responsibilities will include:
Design, build, and maintain scalable infrastructure and systems.
Automate operational tasks to improve efficiency and reliability.
Implement application monitoring and continuous improvement of application performance and stability.
Develop and implement disaster recovery and incident management strategies.
Collaborate with developers to improve application architecture and deployment.
Optimize system availability, latency, and performance metrics.
Manage CI/CD pipelines for seamless software delivery.
Perform root cause analysis and lead detailed post-mortems.
Consult with software development teams to implement reliability best practices.
Write and maintain infrastructure and operational documentation.
Take operational responsibility for a number of distributed applications, including on-call shifts.
Multiple years of experience programming in languages such as Python, Go, or Java.
Expertise with cloud platforms (AWS, Azure, GCP) and tools.
Hands-on experience with infrastructure as code (Terraform, Ansible, etc.).
Familiarity with Linux/Unix systems and networking fundamentals.
Familiarity with containerization and orchestration tools like Docker and Kubernetes.
Proven ability to monitor, debug, and optimize distributed systems.
Experience managing CI/CD pipelines and automation frameworks.
Strong problem-solving skills and attention to detail.
Excellent communication and collaboration skills for cross-functional teamwork.
Ability to analyze and improve complex systems for reliability and scalability.
Self-motivated with a passion for continuous learning and improvement.
We are an Advanced consulting partner for AWS and Gold partner for Microsoft Azure.
We train our engineers with the required skills to be able to develop platforms in the cloud.
We invest in your certifications that are provided by approved programs like AWS and Microsoft and are mandated in the role.
Access to learning and training portals to help you stay up to date with the latest technologies.
We have virtual rooms where specialists from various regions support active discussion on technologies.
Finally, we offer a great place to work and collaborate.
What do we offer you?
Perks, like drinks and snacks, pizza sessions, gaming nights and sensational parties, a pension scheme and a great salary, we’ve got them all.
And if you crave an intellectual challenge, Mindcurv has you covered.
Interesting projects involving the latest, hyper innovative tech.
An agile, entrepreneurial environment, with lots of freedom and no politics.
Work-life balance, a culture of transparency and a management team with their ears to the ground.
We typically are the trailblazers and innovators.
A hybrid way of working in which you can work from home.
You just come into one of our offices in Essen, Cologne, Düsseldorf, Munich, Frankfurt a.M., Hamburg, Jena, Utrecht, Madrid, Cochin, Coimbatore or Trivandrum when it adds value to be on-site.
Looking for a glance behind the scenes?
Our high performers
You know who really thrives with us?
Self-starters, team-players and continuous learners, with an uncanny ability to handle ambiguity.
We’ll equip you with everything you need to succeed, help you explore the length and breadth of your domain and provide you with constant growth opportunities, to enrich your career.
Ready for change?
Are you ready for the next step in your career?
For a role in which you can be fully yourself and bring out the best?
In yourself, your colleagues and your clients?
Don’t wait any longer and apply for this job right now.
Senior Site Reliability Engineer
Posted today
Job Description
Role - Senior SRE
Location: Pune, Bangalore, Hyderabad, Chennai, Ahmedabad, Noida
Type - Permanent
Job Description:
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- Minimum of 6 years of IT experience for an engineer-level position.
Basic Skills:
- 2-3 years of experience as a Cloud SRE/Engineer with applications utilising services such as:
  - Cloud Build
  - Cloud Functions
  - Logging
  - Monitoring
  - GCS (Google Cloud Storage)
  - CloudSQL
  - IAM (Identity and Access Management)
- Proficiency in Python, with experience in a secondary language like Golang or Java.
- Proven ability to manage codebases and configurations effectively.
Key Skills:
- Strong knowledge and hands-on experience with:
  - Docker
- Experience in implementing and maintaining CI/CD pipelines using:
  - GCP Cloud Build
  - Other cloud-native services
- Proficiency in Infrastructure as Code (IaC) tools like Terraform.
- Knowledge of security best practices and RBAC.
- Experience in defining, monitoring, and achieving:
  - Service Level Objectives (SLOs)
  - Service Level Agreements (SLAs)
- Proficiency with source control tools like GitHub Enterprise.
- Commitment to continuous improvement and automation of manual tasks.
- Familiarity with monitoring tools such as:
  - Grafana
  - Prometheus
  - Splunk
  - GCP native logging solutions
- Willingness to provide extra hours of support when necessary.
Nice to Have Skills:
- Experience in managing K8s workloads.
- Experience in secrets management using HashiCorp Vault.
- Experience with tracing tools such as Google tracing or Honeycomb.
Senior Site Reliability Engineer - ELK Expert
Posted today
Job Description
Location: India (Remote) - Must be available to work in the EST (US/Canada) Time Zone.
Role Summary:
Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure?
We're looking for an SRE with 7+ years of experience, including 4+ years specializing in the ELK stack (Elasticsearch, Logstash, Kibana), to join our Platform Engineering Practice. In this role, you’ll design, manage, and scale ELK clusters ingesting 2–3+ TB/day, enhance reliability across distributed systems, and drive automation within Azure cloud environments. This is a high-impact engineering opportunity focused on performance, observability, and operational excellence at scale.
- Career Growth: Work alongside industry experts on cutting-edge cloud technologies
- Competitive Compensation and Benefits: We recognize and reward top talent
- Exciting, Impactful Work: Design and build scalable, resilient cloud environments
- Strategic Platform Role: Contribute to the foundation of next-gen observability and reliability infrastructure
- Design and Optimize Cloud Infrastructure: Architect scalable, fault-tolerant systems on Microsoft Azure
- Automate Everything: Use Terraform, Ansible, and GitHub Actions to streamline deployment and configuration
- Ensure Reliability and Performance: Proactively monitor, troubleshoot, and resolve production issues using Prometheus, Grafana, and Azure Monitor
- Enhance Security and Compliance: Implement security best practices across DevOps workflows
- Collaborate and Innovate: Work closely with engineering, security, and operations teams to drive automation and efficiency
- Manage and scale large ELK clusters handling 2–3+ TB/day log volumes, ensuring high availability and performance
- Optimize ELK architecture: Implement efficient index lifecycle policies, shard strategies, and hot-warm-cold tiered storage
- Build and tune log pipelines: Scale Logstash and Beats pipelines across distributed environments
- Support Kibana observability layers: Create dashboards, visualizations, and custom alerting frameworks (e.g., Watcher, ElastAlert)
- 7+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering
- 4+ years of dedicated, hands-on experience with ELK (Elasticsearch, Logstash, Kibana)
- Strong experience managing large-scale ELK clusters in production with heavy ingestion (multi-TB/day)
- Deep knowledge of index tuning, shard allocation, ILM policies, and scaling ELK components
- Expertise in GitHub Actions, Terraform, Ansible, and Infrastructure as Code (IaC)
- Proficiency in Python, Go, or Bash for automation and scripting
- Deep understanding of Kubernetes, Docker, and cloud-native architectures
- Experience with observability tools such as Prometheus, Grafana, Azure Monitor
- Ability to work in a fast-paced, collaborative environment and solve complex operational issues
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field
- Microsoft Azure certifications: AZ-104, AZ-400
Cloud Infrastructure Specialist
Posted today
Job Description
We are seeking an experienced infrastructure specialist to join our platform engineering team.
- This is a high-performance, cloud-native infrastructure system role.
Senior Cloud Infrastructure Engineer
Posted today
Job Description
This role offers a chance to work on cutting-edge AI projects with leading language model companies, leveraging expertise in cloud infrastructure and software development.
Cloud Infrastructure Architect Lead
Posted today
Job Description
Cloud Infrastructure Engineer Lead
We are seeking an experienced engineer to oversee the design, implementation, and management of scalable, secure cloud infrastructure. The ideal candidate will possess a strong background in cloud platforms, CI/CD pipelines, and team leadership.
Key Responsibilities:
Lead the implementation and maintenance of cloud infrastructure on platforms like AWS, Azure, or GCP.
Design and manage efficient CI/CD pipelines for reliable deployments.
Automate infrastructure provisioning using tools like Terraform or Ansible.
Monitor system performance, troubleshoot issues, and optimize scalability.
Drive DevOps best practices, security standards, and governance within the organization.
Collaborate with engineering teams to ensure smooth release processes.
Requirements:
10+ years of experience in DevOps, SRE, or Cloud Engineering roles.
Expertise with AWS, Azure, or GCP, and infrastructure-as-code tools (e.g., Terraform).
Strong scripting skills (e.g., Python, Bash).
Solid understanding of container orchestration and Docker.
Experience with CI/CD tools (e.g., Jenkins, GitHub Actions).
Proficient in system monitoring/logging (e.g., Prometheus, Grafana).
What We Offer:
A competitive salary package.
A flexible remote work setup.
Career growth and leadership development opportunities.
An annual learning & development budget.
A collaborative engineering culture.
Benefits:
Flexible Work Arrangement: 100% remote work setup with flexible hours.
Career Growth Opportunities: Annual learning & development budget to support continuous growth.
Engineering Culture: Collaborative and forward-thinking environment.
Why Choose Us:
We offer a dynamic work environment where engineers can thrive. If you're passionate about cloud infrastructure and DevOps, this is the perfect opportunity to join our team.