1816 Devops Engineers jobs in Bengaluru
Cloud Site Reliability Engineer
Posted 1 day ago
Job Description
Location: Bangalore
Experience: 6 - 9 Years
- Deploying Docker images including Docker Compose
- Production experience running Kubernetes workloads ideally on AWS EKS
- Experience managing and maintaining Kubernetes Clusters on AWS EKS
- Experience creating and deploying Helm charts & libraries
- Production experience with infrastructure-as-code (IaC), Terraform preferred
- Hands-on experience with Jenkins Core, including authoring and maintaining declarative CI/CD pipelines and libraries
- Experience with monitoring tools e.g., CloudWatch, Datadog & Splunk Cloud
- Proficiency with UNIX operating systems and shell scripting
- Programming experience, e.g., Python preferred
- Experience with distributed version control systems, Git preferred
- Experience with the agile software development lifecycle and Kanban preferred
- Experience with CDN Providers e.g., Akamai preferred
The Skills that are good to have for this Role
- Experience with Amazon Web Services (AWS), having managed services and applications in a large AWS cross-account environment using IAM and federated SSO
- Experience crafting and maintaining logging, monitoring, and alerting capabilities using tools like Datadog and Splunk
- Ability to communicate at all levels, with a track record of strong written and verbal communication
- See problems as opportunities to automate
- Ability to work independently with minimal direction
- Drive and champion the overall design of highly available, secure, scalable microservices-based applications in AWS
Site Reliability Engineer
Posted 6 days ago
Job Description
Looking for a DevOps Senior Engineer in the Data Engineering team who can help us support next-generation Analytics applications over Oracle cloud.
This posting is for a DevOps Senior Engineer in the Oracle Analytics Warehouse product development organization. The product is a fully managed cloud service that provides customers a turn-key enterprise warehouse on the cloud for Fusion Applications. The service is being built on a sophisticated technology stack featuring a brand-new data integration platform and the industry's most sophisticated business analytics platform.
We are looking for a senior engineer with experience in supporting data warehousing products. As a member of the product development organization, you will focus on working with development teams, providing timely support to customers, and identifying and implementing process automation for the cloud BI product.
· BE or equivalent experience or higher degree in Computer Science / Engineering or equivalent from top university
· Proven experience supporting business customers on any cloud or on-premise BI application
· Proficiency in using Python to interact with the Apache Spark framework for big data processing
· Experience in SQL/PL-SQL and excellent debugging skills
· Experience in diagnosing network latency and intermittent issues, and in reading and analyzing log files
· Good Functional Knowledge in domains like ERP, HCM, SCP and CX
· Working experience with any ERP/in-demand application such as Oracle EBS, Fusion is helpful
· Good programming skills in Python/Java
· Exposure to cloud infrastructure, Oracle Cloud Infrastructure (OCI) is helpful
· Experience in performance tuning SQL and understanding ETL pipelines
· Build, Configure, Manage and Coordinate all Build and Release engineering activities
· Strong logical/critical thinking and problem resolution skills
· Excellent interpersonal skills
**Responsibilities**
Roles and Responsibilities:
· As a member of Pipeline Production Operations, you will address customer issues and tickets within defined SLAs
· Proactively identify and resolve potential problems in an effort to prevent them from occurring and improve the overall customer experience
· You will approach each case with a goal of ensuring Oracle Analytics products are performing at an efficient level by addressing any underlying or additional problems uncovered during each customer engagement.
· Co-ordinate and connect with different team members to formulate the solutions to customer issues
· You will ensure full understanding of the issue, including impact to customer.
· You will recommend solutions to customers and follow through to resolution or escalate the case in a timely manner if no resolution can be found.
· Bring together logs and configuration details and attempt to reproduce the reported issues.
· Develop and improve the knowledge base for the issues and their solutions.
· Participate in knowledge sharing via involvement in technical discussions and Knowledge Base documentation.
· Prioritize workload based on severity and demonstrate a sense of urgency when handling cases.
· Find opportunities for process improvements and automation through building the right utilities/tools
· Willingness to work in shifts and on weekends based on the support rota.
Career Level - IC3
**About Us**
As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity.
We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing or by calling in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
Site Reliability Engineer
Posted 1 day ago
Job Description
Hiring for Lead / Sr Site Reliability Engineer for Mumbai & Bangalore Location
Experience - 8 - 14 Years
Job Purpose
- Analysing, troubleshooting, and designing vital services, platforms, and infrastructure on GCP while always thinking about reliability, scalability, resilience, security, and performance.
Job Responsibilities:
- Help build a Site Reliability Engineering culture by sharing the best practices, approaches, documentation, and code with other engineering teams
- Apply automation and software to any tasks or parts of the system which are performed manually
- Able to troubleshoot complicated, cross platform issues handling OS, Networking, Database in a cloud-based SaaS environment and handle live production incidents
- Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation
Key Skills:
- Experience in monitoring and analyzing infrastructure performance using standard performance monitoring tools
- Demonstrable experience in Containerization-Docker and orchestration (Kubernetes)
- Experience with Infrastructure as Code (Terraform, CloudFormation, Ansible)
- Knowledge and proven hands-on experience in large-scale databases and distributed technologies, such as Kafka and Confluent Platform Kafka
- Basic programming and scripting skills
Site Reliability Engineer
Posted 1 day ago
Job Description
Good day,
We have an immediate opportunity for a Senior Site Reliability Engineer.
Job Role: Senior Site Reliability Engineer
Job Location: Synechron (Bengaluru/Pune)
Experience- 10 to 15 years
Notice : Immediate Joiner
About Company:
At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Synechron’s progressive technologies and optimization strategies span end-to-end Artificial Intelligence, Consulting, Digital, Cloud & DevOps, Data, and Software Engineering, servicing an array of noteworthy financial services and technology firms. Through research and development initiatives in our FinLabs we develop solutions for modernization, from Artificial Intelligence and Blockchain to Data Science models, Digital Underwriting, mobile-first applications and more. Over the last 20+ years, our company has been honoured with multiple employer awards, recognizing our commitment to our talented teams. With top clients to boast about, Synechron has a global workforce of 13,950+, and has 52 offices in 20 countries within key global markets. For more information on the company, please visit our website or LinkedIn community.
Diversity, Equity, and Inclusion
Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative-action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity and an environment that is respectful to all. We strongly believe that a diverse workforce helps build stronger, successful businesses as a global company. We encourage applicants from across diverse backgrounds, race, ethnicities, religion, age, marital status, gender, sexual orientations, or disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.
All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.
JOB DESCRIPTION
Experience & qualifications:
10 to 15 years of IT experience with hands-on experience in CI/CD pipelines, DevOps tools & site reliability practices.
Excellent database knowledge with indexing, performance tuning and query optimization skills
Excellent skills in Linux, networking and firewall configurations.
Experience in production monitoring systems
Experience in DevOps tools like Docker, Git, Kubernetes etc.
Experience in DevOps - infrastructure as code, containerization, and automation of infrastructure provisioning
Experience in studying and suggesting improvements on application workload efficiency.
Strong admin level experience in application monitoring tools like Splunk.
Strong configuration experience in job scheduler tools like Control-M or AutoSys
Experience in collaborating with multiple teams to understand and improve application performance
Preferably from payments and cards domain.
Very good understanding of version control systems like GitHub.
Excellent knowledge of ITIL version 4 with thorough understanding of problem management life cycle, root cause analysis and permanent resolution, change management & incident management process.
Job roles & responsibilities
- Automate software build and deployment processes.
- Build CI/CD pipelines tailored to the needs and requirements of the environment.
- Take accountability for permanent resolution of recurring issues within defined SLAs.
- Provide technical solutions to complex issues, working with solution architects.
- Identify system bottlenecks in the operating system, database and network, and work with the infrastructure teams on permanent resolution.
- Work closely with infrastructure managers and system architects on improving platform stability and robustness.
- Understand the existing change implementation process for upstream systems and propose improvement ideas.
- Work on improving system reliability with key parameters like latency, traffic, errors & saturation.
- Understand end user issues and liaise with partners & stakeholders (internal and external) to co-ordinate on permanent resolution for business user pain points.
- Study existing manual support processes and propose automated solutions to enhance productivity and save manual effort.
- Review DR exercise processes and provide recommendations to reduce any outage events during the exercises.
- Build monitoring systems that alert on symptoms rather than on outages.
- Review knowledge base content such as standard operating procedures and wiki page articles for quality, and guide the team on constantly updating it with live cases.
- Study and implement measures to enhance system availability.
To expedite the application process, I would appreciate it if you could provide the following information at your earliest convenience:
- Tentative Date to Join (if selected):
- Current Location:
- Preferred Location:
- Current Salary:
- Expected Salary:
- Reason for Change:
- Total Experience:
- Relevant Experience:
- Share official email confirmation of Notice Period or Last Working Day:
- Primary Skills (Hands-on):
- Secondary Skills:
Please send your updated resume to my email at or reach out to me via WhatsApp at .
Site Reliability Engineer
Posted 1 day ago
Job Description
Job Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 3+ years of experience in a relevant role, such as Site Reliability Engineer, DevOps Engineer, or similar, is preferred but not mandatory.
- Basic understanding of AWS solutions including EC2, S3, CloudWatch, Lambda, and RDS.
- Interest and understanding of Platform Engineering concepts and principles.
- Familiarity with monitoring and observability tools such as Prometheus, Grafana, or ELK stack
Job Description
- Assist in designing, implementing, and maintaining scalable monitoring, alerting, and logging solutions to ensure the availability and performance of backend services.
- Support the development and implementation of observability tools and practices to derive actionable insights from operational data.
- Collaborate with development teams to design and support scalable, reliable, and resilient systems on AWS.
- Contribute to platform engineering projects to create automated solutions for deployment, scaling, and operations.
- Assist in the development of disaster recovery plans, and participate in DR and capacity testing under supervision.
- Analyze system performance and provide recommendations for optimization and improvement.
- Participate in post-incident reviews, identifying root causes and preventive measures.
- Maintain documentation for system procedures, operations, and architecture configurations.
- Stay up-to-date with industry best practices, tools, and technologies related to site reliability and platform engineering.
- Participate in on-call rotations as needed to ensure 24/7 availability of critical systems and services.
Site Reliability Engineer
Posted 1 day ago
Job Description
We are seeking an experienced AWS DevOps Engineer with strong expertise in Observability and Site Reliability Engineering (SRE) to design, build, and manage scalable, reliable, and secure cloud environments. The role requires hands-on experience with AWS services, Infrastructure as Code (IaC), CI/CD, monitoring & observability frameworks, and incident response practices to ensure high availability, performance, and resilience of business-critical systems.
Key Responsibilities
- Cloud Infrastructure (AWS):
  - Design, implement, and manage scalable, resilient, and cost-optimized cloud infrastructure using AWS services (EC2, EKS, Lambda, RDS, S3, CloudFront, IAM, VPC, etc.).
  - Implement Infrastructure as Code (IaC) using tools like Terraform/CloudFormation.
- DevOps & Automation:
  - Build and maintain CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, or AWS CodePipeline) for automated deployments.
  - Automate repetitive tasks to improve development velocity and operational efficiency.
- Observability & Monitoring:
  - Define and implement observability strategy covering monitoring, logging, tracing, and alerting.
  - Work with tools like Prometheus, Grafana, ELK/EFK stack, AWS CloudWatch, Datadog, New Relic, Splunk, or Dynatrace.
  - Establish SLIs, SLOs, and SLAs to measure and improve system reliability.
- Site Reliability Engineering (SRE):
  - Drive incident management processes: detection, alerting, root cause analysis, and postmortems.
  - Apply chaos engineering principles to validate resilience and recovery.
  - Optimize reliability, latency, scalability, and system efficiency.
- Security & Compliance:
  - Implement best practices for cloud security, identity & access management, and compliance frameworks (ISO, SOC2, GDPR, etc.).
  - Ensure observability and monitoring meet security and audit requirements.
- Collaboration & Leadership:
  - Partner with development, QA, and product teams to ensure seamless deployments.
  - Mentor junior engineers and promote a culture of reliability, automation, and continuous improvement.
Required Skills & Qualifications
- 7+ years of professional experience in DevOps, Cloud Infrastructure, or SRE roles.
- Strong expertise in AWS Cloud (certification preferred: AWS Certified DevOps Engineer, Solutions Architect, or SysOps).
- Proficiency in IaC tools (Terraform, CloudFormation).
- Solid experience in CI/CD pipeline tools (Jenkins, GitHub Actions, GitLab CI/CD, AWS CodePipeline).
- Hands-on with observability tools: Prometheus, Grafana, CloudWatch, ELK, Datadog, New Relic, Splunk, or similar.
- Deep understanding of SRE principles: SLIs/SLOs, error budgets, incident response, chaos testing.
- Strong scripting/coding experience (Python, Bash, Go, or similar).
- Knowledge of containers & orchestration (Docker, Kubernetes, EKS).
- Familiarity with security best practices in cloud-native environments.
Preferred Skills
- Experience with multi-cloud or hybrid-cloud environments.
- Exposure to resiliency testing & chaos engineering tools (Gremlin, Litmus, Chaos Mesh).
- Knowledge of cost-optimization and FinOps in AWS.
- Excellent communication and stakeholder management skills.
What We Offer
- Opportunity to work on cutting-edge cloud-native architectures.
- A culture focused on automation, reliability, and innovation.
- Growth opportunities with certifications, training, and leadership exposure.
Site Reliability Engineer
Posted 1 day ago
Job Description
About Tavant:
With 25+ years of experience building innovative digital products and solutions, Tavant provides impactful results to its customers. It has been the frontrunner in driving digital innovation and tech-enabled transformation across a wide range of industries such as Consumer Lending, Manufacturing, Agtech, Media & Entertainment, and Retail in North America, Europe, and Asia-Pacific. Powered by Artificial Intelligence and Machine Learning algorithms, we help our customers improve their operational efficiency, productivity, speed, and accuracy. Our suite of products and solutions are routinely rated high by the industry.
Ours is a challenging workplace where teams are diverse, competitive, and continually searching for tomorrow's technology and brilliant minds to create it. And we don’t focus just on what we do – we also care how we do it. So, bring your talent and ambition to make a difference. We’ll create a world of opportunities for you.
Job Title - Site Reliability Engineer (SRE)
Location - Bangalore
Work Experience - 9 to 15 years
Seeking an experienced Site Reliability Engineer to join our Data Compute DevOps/Cloud/SRE team. This role focuses on ensuring high availability, scalability, and reliability of cloud infrastructure with emphasis on Kubernetes orchestration and AWS cloud services.
Essential Requirements
- Kubernetes: Proficient level expertise (9+ years) - cluster management, networking, security, troubleshooting
- AWS: Working knowledge of core services (EC2, EKS, S3, IAM, VPC)
- Infrastructure as Code: Terraform and Terraform Enterprise (TFE)
- CI/CD: GitLab and GitLab CI/CD pipelines
- Monitoring & Observability: Prometheus and Grafana
- GitOps: ArgoCD deployment experience
- Container Orchestration: Helm charts and package management
- Version Control: Git workflows and repository management
Core Competencies
- SRE principles and practices (SLIs, SLOs, error budgets)
- Infrastructure automation and orchestration
- Incident management and post-mortem processes
- System reliability and performance optimization
- Scripting abilities (Bash, Python)
Site Reliability Engineer
Posted 1 day ago
Job Description
Performance & Reliability Engineer (Senior, Lead, Principal & Manager)
Hybrid
Location: Pune, Chennai, Bangalore & Gurgaon
Need immediate joiners only
Job Overview:
We are seeking a highly skilled and motivated Performance & Reliability Engineer to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our systems and applications. You will leverage tools such as Dynatrace, CloudWatch, and Python to monitor and optimize system performance, troubleshoot issues, and enhance the overall reliability of our infrastructure with SRE best practices.
Key Responsibilities:
- Performance Monitoring & Optimization:
  - Use Dynatrace and CloudWatch to monitor system performance and availability.
  - Implement performance tuning techniques to ensure high availability and optimal system performance.
  - Identify performance bottlenecks and optimize applications and infrastructure for scalability.
- System Observability:
  - AppDynamics and monitoring dashboards.
  - Collaborate with development and operations teams to troubleshoot incidents and provide recommendations for performance improvements.
  - Proactively identify areas of risk and implement preventive measures.
- Automation & Scripting:
  - Develop automation scripts in Python to enhance monitoring, incident response, and reporting processes.
  - Write and maintain Python-based tools for proactive monitoring, alerting, and issue resolution.
- Cloud Monitoring & Alerts:
  - Configure CloudWatch for real-time monitoring and alerting of cloud infrastructure.
  - Develop and manage dashboards to visualize system health and performance metrics.
  - Prepare and present performance reports, incident post-mortems, and improvement recommendations to senior leadership.
- Chaos Engineering, Fault Management:
  - Vulnerability identification, failure simulation, stress management
Required Skills and Experience:
- Strong experience with Dynatrace for application performance monitoring and root cause analysis.
- Proficiency in CloudWatch for monitoring AWS cloud infrastructure, configuring alerts, and visualizing metrics.
- Solid understanding of Python for automating tasks, building performance tools, and writing scripts to enhance operations.
- Experience in analyzing system logs, troubleshooting performance issues, and providing technical recommendations.
- Hands-on experience with cloud environments (AWS preferred), including development knowledge
- Experience with load testing and performance benchmarking.
About Xebia:
Site Reliability Engineer
Posted 1 day ago
Job Description
About the company:
At WizCommerce, we are revolutionising B2B commerce for manufacturers, wholesalers, and distributors across the USA, India and Canada. Our mission is to empower businesses with innovative technology solutions that drive revenue and foster growth. Recognizing the challenges posed by outdated software and cumbersome offline selling methods, we offer a cutting-edge, AI-powered B2B sales app designed to enhance customer engagement.
Our platform allows sales representatives to take orders seamlessly, whether at trade shows, in the field, or online. With our industry’s only AI engine, reps can upsell effectively and automate internal processes, all while enjoying an intuitive and beautifully crafted user interface. Efficiency and user experience go hand in hand in our design philosophy.
While B2C e-commerce has rapidly advanced, the B2B sector has often lagged behind, relying on disjointed systems. At WizCommerce, we see a vital opportunity to disrupt this space, bringing the same level of innovation that consumers expect. Our vision is to equip B2B businesses with the tools they need to thrive in a competitive market, streamline operations, and enhance customer interactions.
If you want to disrupt a Trillion$+ space, WizCommerce is the place to be. Read more about us: Forbes, The Morning Star, Yourstory, or our website!
Founders:
Divyaanshu Makkar (Co-founder, CEO)
Vikas Garg (Co-founder, CCO)
Why should you join us?
- Huge Market Size: B2B commerce in just India and the US, our two main geos, is a 5 Trillion$ space waiting to be disrupted. Every start-up has a risk of going to 0, but few have a market large enough to become 100B+ companies.
- Never-ending exciting opportunities: If you are a person who loves to dig deeper into problems, figure out solutions, take ownership and implement the solution and then chill - this is the right place for you. We are building something that has never been made before
- Customers love us: There are many times when customers have invited us to stay with them to understand their pain points as we have exceeded their expectations in every feature launch, that’s the ownership and quality bar we set here.
We’re looking for people who have a strong interest in building successful products or systems, are comfortable dealing with lots of moving pieces, have exquisite attention to detail, and are comfortable learning new technologies and methods.
Role & Responsibilities:
- System Reliability & Monitoring: Design and implement monitoring solutions to track system performance, uptime, and critical metrics.
- Incident Response: Act as the first responder during system outages, resolving issues or escalating as needed.
- Automation: Develop tools and scripts to automate routine tasks, ensuring scalability and minimal downtime.
- Capacity Planning: Forecast and plan for growth in system capacity and scale resources accordingly.
- Performance Tuning: Continuously monitor and optimize the performance of systems and applications.
- Collaboration: Work with cross-functional teams, including DevOps and engineering, to improve system design and deployment processes.
- Security & Compliance: Ensure systems meet security and compliance standards, including backups, disaster recovery, and failover strategies.
- Documentation: Maintain clear and detailed documentation of system architecture, configuration, and procedures.
Qualifications and Skills:
- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
- 1+ years of experience in SRE or DevOps
- Proven experience in managing and supporting cloud infrastructure (AWS, Azure, or GCP). Experience with GCP is a plus.
- Proficiency in programming languages such as Python, Go, or Shell scripting.
- Experience with containerization tools like Docker, orchestration platforms like Kubernetes, and package managers like Helm.
- Strong understanding of CI/CD pipelines and configuration management tools (Jenkins, Ansible, Terraform).
- Expertise in monitoring tools such as Prometheus, Grafana, or New Relic.
- Strong troubleshooting and problem-solving skills.
Compensation: Best in the industry
Role location: Bengaluru/Gurugram
Website Link:
Site Reliability Engineer
Posted 1 day ago
Job Description
We are looking for an L0 and L1 Site Reliability Engineer (SRE) Support to join our Krutrim Cloud Site Reliability operations team and ensure the smooth functioning of our cloud infrastructure powered by OpenStack and Kubernetes. In this role, you will focus on monitoring, basic troubleshooting, and incident response, helping to maintain high system availability, reliability, and performance. You will be responsible for identifying and addressing simple issues, as well as escalating more complex problems to senior SREs when needed.
The ideal candidate should have a basic understanding of cloud infrastructure (especially OpenStack and Kubernetes), containerized environments, and system monitoring. This position offers an excellent opportunity for someone looking to grow into a more advanced SRE or DevOps role.