5,496 SRE jobs in India

Site Reliability Engineer (SRE)

Bengaluru, Karnataka ScaleneWorks

Posted today

Job Description

Excellent written and verbal communication skills.
Experience handling large production system environments.
Must be extremely comfortable using and navigating a Linux environment.
Ability to do low-level debugging and problem analysis by examining logs and running Unix commands.
Must be efficient at writing and debugging scripts.
Experience in virtualization technologies and automation/configuration management:
Automation and configuration management tools/solutions: Ansible, Python, Bash, Terraform, Golang, etc. (at least one).
Virtualization technologies: Citrix Xen Hypervisor (preferred), KVM (also preferred), libvirt, VMware vSphere, etc. (at least one).
Monitoring technologies: Zabbix, Sysdig, Grafana, Nagios, Splunk, etc. (at least one).
Working knowledge of container technologies: Kubernetes, Docker, etc.
Flexibility to work in shifts to handle production systems.
Preferred Technical and Professional Experience
Good experience with public cloud platforms and Kubernetes clusters, strong Linux skills for managing services across a microservices platform, and good SRE knowledge of cloud compute, storage and network services.

Site Reliability Engineer (SRE)

Hyderabad, Andhra Pradesh GSPANN

Posted today

Job Description

Description

GSPANN is hiring an experienced Site Reliability Engineer (SRE) with 8+ years in IT Service Management (ITSM) and hands-on expertise in Application Performance Monitoring (APM) tools like Datadog and Splunk. We’re looking for a self-driven professional who can build resilient monitoring systems and improve application observability.

Role and Responsibilities

  • Utilize and manage APM tools such as Datadog and ensure efficient performance monitoring.
  • Leverage expertise in Splunk to support system observability and troubleshooting.
  • Integrate APM tools with third-party platforms like Outlook, Microsoft Teams, and other communication tools.
  • Design and implement alerting mechanisms using monitoring and logging tools to detect and resolve potential issues proactively.
  • Define and track Key Performance Indicators (KPIs) for system health and efficiency.
  • Streamline alerts to reduce noise and improve incident relevance.
  • Conduct periodic reviews to expand observability coverage across applications.
  • Analyze metrics regularly to identify anomalies and performance patterns.
  • Update dashboards as needed to reflect current operational metrics and system status.

Skills and Experience

  • 8+ years of experience in ITIL/ITSM process management.
  • At least 5 years of hands-on experience with Datadog or equivalent APM tools.
  • Proficiency in Splunk for data analysis and visualization.
  • Strong verbal and written communication skills.
  • Excellent organizational abilities with the capability to manage and prioritize multiple tasks.
  • Previous experience in an application support or similar SRE role.
  • Familiarity with other monitoring tools and emerging technologies is a plus.

    SRE (Site Reliability Engineer)

    Bengaluru, Karnataka Garden City Games India Pvt Ltd

    Posted today

    Job Description

    NOC + SRE + DevOps Engineer


    About Us:

    Garden City Games develops free-to-play mobile and social games enjoyed by millions worldwide. The Bangalore-based studio employs over 100 talented professionals dedicated to creating exceptional games for phones, tablets, and desktops.

     

    Previously known as PopReach Games, the company was founded by leading figures in the mobile games industry. It manages popular intellectual properties such as Smurfs Village, War of Nations, Kingdoms of Camelot, and Kitchen Scramble, with over 400 million lifetime downloads.

     

    Garden City Games is committed to diversity and inclusion, fostering a team with varied backgrounds, perspectives, and skills. In 2024, the studio became part of the Phoenix Games Group, benefiting from enhanced resources and support. With its new identity, Garden City Games continues to deliver top-tier mobile and social games to players.

     

    Job Summary:

    The NOC + SRE + DevOps Engineer will play a crucial role in maintaining and improving the reliability and performance of our services. This hybrid role encompasses monitoring and supporting our network operations, ensuring the reliability and scalability of our infrastructure, and enhancing our CI/CD processes. The ideal candidate will have a strong background in network operations, site reliability engineering, and DevOps practices with hands-on experience in Linux, AWS, ECS, Ansible, Terraform, and OpenStack.

     

    Key Responsibilities:

    ·    Linux, AWS knowledge.

    ·    Willingness to join an on-call rotation for after-hours support.

    Network Operations Center (NOC) Responsibilities:

    ·    Monitor and manage the company's network infrastructure and servers 24/7.

    ·    Respond to and troubleshoot network issues, outages, and performance bottlenecks.

    ·    Ensure high availability and reliability of network services.

    ·    Maintain and update network documentation and incident logs.

    ·    Coordinate with stakeholders for issue resolution.


     

    Site Reliability Engineering (SRE) Responsibilities:

    ·    Develop and implement strategies for maintaining system reliability, availability, and performance.

    ·    Automate repetitive tasks to improve operational efficiency.

    ·    Monitor system performance and reliability metrics, and proactively address potential issues.

    ·    Conduct root cause analysis of incidents and implement preventive measures.

    ·    Keep infrastructure costs under control.

    ·    Collaborate with development teams to ensure applications are designed for reliability and scalability.

     

    DevOps Responsibilities:

    ·    Manage and optimize CI/CD pipelines for application deployment.

    ·    Collaborate with development and operations teams to automate and streamline build, test, and deployment processes.

    ·    Implement and maintain infrastructure as code (IaC) using tools like Terraform and Ansible.

    ·    Ensure security and compliance of the deployment pipeline and infrastructure.

    ·    Continuously evaluate and integrate new tools and technologies to improve the DevOps process.



    Requirements and Qualifications:

    ·    Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent experience.

    ·    5+ years of experience in network operations, site reliability engineering, and/or DevOps roles.

    ·    Strong experience with Linux system administration.

    ·    Proficiency in cloud platforms, particularly AWS.

    ·    Hands-on experience with configuration management tools such as Ansible.

    ·    Proficiency in infrastructure as code (IaC) tools such as Terraform.

    ·    Experience with OpenStack for cloud infrastructure management.

    ·    Strong understanding of network protocols, network troubleshooting, and monitoring tools.


    Work Mode: Hybrid
    Location: Bangalore





    Site Reliability Engineer (SRE)

    InOrg

    Posted today

    Job Description

    About VivaOps:

    VivaOps is a leading DevSecOps platform company specializing in GitLab, the comprehensive DevOps platform, to transform and secure software development processes. We help organizations streamline their DevSecOps journey by offering a complete range of GitLab services, from advisory to implementation and managed services, to accelerate deployment, optimize security, and improve collaboration. Our deep expertise in GitLab enables clients to unlock the full potential of their development pipelines, driving efficiency, innovation, and competitive advantage.

    Job Title: Site Reliability Engineer (SRE)

    Location: Remote

    Shift Timings: 5:30 PM to 3:00 AM IST to ensure support for global operations.

    Job Description:

    We are seeking a skilled Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have a strong background in both log and metrics monitoring stacks, specifically ELK (Elasticsearch, Logstash, Kibana) and TICK (Telegraf, InfluxDB, Chronograf, Kapacitor). As an SRE, you will be responsible for ensuring the reliability, availability, and performance of our customers' platforms and services, bridging the gap between development and operations.

    Required Skills and Qualifications:

    ● 3+ years of experience in Site Reliability Engineering, DevOps, or a related role.

    ● Proficiency in the ELK stack (Elasticsearch, Logstash, Kibana) for log monitoring.

    ● Experience with the TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor) for metrics monitoring.

    ● Strong scripting skills in languages such as Python, Bash, or Ruby.

    ● Understanding of operating systems: Ubuntu (OpenStack) is a must-have; also Debian, Red Hat, etc.

    ● DevOps platforms: GitLab (good to have) or similar.

    ● Solid understanding of Grafana and Prometheus.

    ● Experience with ServiceNow or a similar tool.

    ● Experience with configuration management tools like Ansible, Puppet, or Chef.

    ● Familiarity with containerization and orchestration tools like Docker and Kubernetes.

    ● Understanding of cloud platforms (Any of AWS, Azure, or GCP) and their services.

    ● Bachelor's degree in Computer Science, Information Technology, or a related field.

    ● Excellent problem-solving skills and attention to detail.

    ● Strong communication and collaboration abilities.

    About InOrg:

    InOrg handles India Operations for VivaOps and is dedicated to empowering organizations to achieve global growth at scale. InOrg is establishing a Global Capability Center (GCC) for VivaOps.

    Site Reliability Engineer (SRE)

    Pune, Maharashtra GfK

    Posted today

    Job Description

    Description

    About You

    You are a DevOps or Site Reliability Engineer with a passion for cloud infrastructure and automation. You’re a self-starter and you love keeping up to date with the latest developments in cloud, configuration management and container technologies. You understand the benefits of an immutable infrastructure and you enjoy enabling self-service deployments through continuous delivery pipelines.

    However, you have a head for operations and you know that security, backups, scalability and resilience are paramount concerns. In fact, you have a keen interest in security and you’re always looking for ways to use infrastructure-as-code to achieve security and compliance at scale. You understand that costs are a concern for the modern infrastructure engineer and you’re always looking for ways to reduce them.

    You thrive in a collaborative DevOps culture and you love to work as part of a cross-functional agile product delivery team alongside developers, data scientists, product owners and technical architects.


    Responsibilities

    Working in partnership with our software engineering teams, the Site Reliability Engineering (SRE) team is responsible for building and operating the next-generation infrastructure to support the GfK product portfolio.

    Working extensively within our hybrid cloud environment (AWS, GCP, Azure, private VMware cloud), you will be embedded within the product development and SRE squads to enable them to test and deploy software rapidly whilst ensuring the highest standards of reliability and security.

    You will influence architectural decisions with a focus on security, scalability, cost and high performance.

    Part of your role will be to set up and maintain monitoring, metrics and reporting systems for fine-grained observability and actionable alerting. You will also create and maintain appropriate backup, restore and redundancy solutions for business-critical data.

    Skills

  • Linux administration skills and a deep understanding of networking and TCP/IP.
  • Experience with the major cloud providers and Terraform.
  • Knowledge of technical architecture and modern-day design patterns, including microservices, serverless functions, NoSQL, RESTful APIs, etc.
  • Demonstrable skills in a configuration management tool like Ansible.
  • Experience in setting up and supporting CI/CD pipelines and tooling such as GitHub or GitLab CI.
  • Proficiency in a high-level programming language such as Python or Go.
  • Experience with monitoring, log aggregation, and alerting tooling (ELK, Prometheus, Grafana, etc.).
  • Experience with Docker and Kubernetes.
  • Experience with secret management tools like HashiCorp Vault is a plus.
  • Proficiency in applying SRE core tenets, including SLI/SLO/SLA measurement, toil elimination, and reliability modeling for optimizing system performance and resilience.
  • Experience with cloud-native tools like Cluster API, service mesh, KEDA, OPA, and Kubernetes Operators.
  • Experience with data technologies such as NoSQL/RDBMS (PostgreSQL, Oracle, MongoDB), Redis, Spark, RabbitMQ, Kafka, etc.
  • Experience in troubleshooting and monitoring large-scale distributed systems.

    With @NielsenIQ, we’re now an even more diverse team of 40,000 people, each with their own stories.

    Our increasingly diverse workforce empowers us to better reflect the diversity of the markets we measure.

    We are an ethical and honest company that is wholly committed to its clients and employees. We are proud to be an inclusive workplace for all and are committed to equal employment opportunity, focusing on all of our employees reaching their full potential. 

    We respect and value every employee regardless of race, ethnicity, gender, sex, sexual orientation, age, personality, experience, culture, faith, socio-economic status, or physical or mental disabilities.

    We endorse the core principles and rights set forth in the United Nations Declaration of Human Rights and the Social Charter of Fundamental Rights of the European Union, promoting the universal values of human dignity, freedom, equality, and solidarity.

    Site Reliability Engineer (SRE)

    Hyderabad, Andhra Pradesh Arrise Solutions (India) Pvt. Ltd

    Posted today

    Job Description

    Site Reliability Engineer


    Location: Hyderabad

    Reports to: Head – DevOps & TechOps

    Type of Position: Full Time



    About us:


    Arrise Solutions (India) Pvt. Ltd. is a leading content provider to the iGaming and Betting Industry, offering a multi-product portfolio that is innovative, regulated and mobile-focused. Arrise Solutions (India) Pvt. Ltd. strives to create the most engaging and evocative experience for customers globally across a range of products, including slots, live casino, sports betting, virtual sports and bingo. 


    Job Description:


    About The Role


    The team at Arrise Solutions (India) Pvt. Ltd. deals with infrastructure provisioning automation, configuration management, tool administration, production system support, application environment maintenance, observability, release management, and internal tool development for efficient handling of the aforementioned activities. You will be actively involved in all aspects of the team activities.


    What You’ll Do Here

    • Analyze, troubleshoot, and debug problems in test and production environments, and assist in resolving them within the framework of incident and change management processes.
    • Lead triage of any outage that impacts the infrastructure or applications, collaborate closely with support and development teams, play an active role in blameless postmortems, and refine playbooks to reduce MTTR.
    • Ensure the operational health, high availability, reliability, and security of the applications and infrastructure.
    • Maintain production services by measuring and monitoring availability, latency, and overall system health.
    • Handle on-call and emergency support.


    What We're Looking For

    • 7-9 years of experience in SRE, DevOps, System Engineer, or Production Support roles.
    • Infrastructure/application performance engineering experience is nice to have.
    • Experience with Linux administration and troubleshooting Java applications in production.
    • Experience maintaining and developing monitoring, logging, tracing, and alerting solutions (Grafana, ELK, PagerDuty, or equivalents).
    • Hands-on experience with tools and techniques to diagnose container and overall system performance issues.
    • Ability to handle a fast-paced environment with multiple simultaneous projects and incident responses.
    • Hands-on experience managing infrastructure on AWS, GCP, data centers, or similar cloud/on-premises platforms.
    • Expertise in scripting languages: Python, Groovy, Ruby, or equivalent.
    • Experience in DevOps, CI/CD, configuration management, and release management, and the relevant tools.




    Mainframe SRE - Site Reliability Engineer

    Bangalore, Karnataka Kyndryl

    Posted 14 days ago

    Job Description

    **Who We Are**
    At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward - always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities.
    **The Role**
    Join us as a Site Reliability Engineer (SRE) and embark on an exciting journey of ensuring reliability, resiliency, and innovation in our information systems and ecosystems. As an SRE at Kyndryl, you'll be at the forefront of driving continuous improvement and delivering exceptional service to our customers.
    Your role goes beyond traditional engineering, as you'll have the opportunity to analyze business needs, tackle complex problems, and provide strategic advice and designs. You'll be involved in every stage of the software lifecycle, from building and testing to deploying changes and maintaining robust systems.
    We're looking for a true visionary who can think strategically and help shape the future of our services. Your expertise in building trusted relationships with customers and partnering with them for success will be instrumental in driving our growth.
    As an SRE, you'll have the unique opportunity to work on end-to-end services, spanning customer sites and platforms. Collaboration and proactivity are key as you work alongside a talented team of professionals, eager to make a difference. You'll embrace an entrepreneurial mindset, taking ownership of your responsibilities and constantly seeking innovative solutions.
    With an unwavering focus on quality, robustness, and security, you'll be a driving force in implementing cutting-edge tools that enhance our operations, improve reliability, and gather valuable feedback on our platforms. Your ability to identify and mitigate common operational issues will play a crucial role in delivering seamless experiences to our customers.
    If you're passionate about pushing the boundaries of technology, thrive in a collaborative environment, and are motivated by the opportunity to shape the future of reliability engineering, then we want to hear from you. Join our team and be part of a dynamic and forward-thinking organization that values innovation and excellence in everything we do.
    Your Future at Kyndryl
    Kyndryl has a global footprint, which means that as a Site Reliability Engineer at Kyndryl you will have opportunities to work on projects and collaborate with colleagues from around the world. This role is dynamic and influential - offering a wide range of professional and personal growth opportunities that you won't find anywhere else.
    **Who You Are**
    You're good at what you do and possess the required experience to prove it. However, equally as important - you have a growth mindset; keen to drive your own personal and professional development. You are customer-focused - someone who prioritizes customer success in their work. And finally, you're open and borderless - naturally inclusive in how you work with others.
    Required Skills and Experience
    + 10+ years of experience in operational management, including incident management and escalations
    + Experience with design and implementation of application monitoring to ensure reliability and performance meets or exceeds business goals
    + Experience implementing strategies to cap operations load and to handle overflow using appropriate tooling and metrics; defining service level indicators and objectives in collaboration with stakeholders, business, development, DevSecOps and Operations teams
    + Ability to analyze z/OS-related problems
    + Knowledge of ISV products, systems and solutions
    + Knowledge of LPAR and Parallel Sysplex
    + Knowledge of z/OS, including Allocation, the Consoles component, Health Checker, the I/O subsystem, Logger, network connectivity, RRS, Scheduler/SMF, storage management, Supervisor/Contents/NIP/IPL, and WLM
    Preferred Skills and Experience
    + BS degree in Computer Science, Engineering, or other highly technical, scientific discipline
    + Expertise with Ansible, Terraform, and Python
    + Experience with distributed technologies as well as dynamic resource management frameworks such as Kubernetes
    + Expertise in leveraging open-source tooling such as Prometheus, Grafana, or Loki
    **Being You**
    Diversity is a whole lot more than what we look like or where we come from, it's how we think and who we are. We welcome people of all cultures, backgrounds, and experiences. But we're not doing it single-handedly: Our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you - and everyone next to you - the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That's the Kyndryl Way.
    **What You Can Expect**
    With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate, build new capabilities, new relationships, new processes, and new value. Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees and support you and your family through the moments that matter - wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more. Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you, we want you to succeed so that together, we will all succeed.
    **Get Referred!**
    If you know someone that works at Kyndryl, when asked 'How Did You Hear About Us' during the application process, select 'Employee Referral' and enter your contact's Kyndryl email address.
    Kyndryl is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, pregnancy, disability, age, veteran status, or other characteristics. Kyndryl is also committed to compliance with all fair employment practices regarding citizenship and immigration status.

    Mainframe SRE - Site Reliability Engineer

    Chennai, Tamil Nadu Kyndryl

    Posted 14 days ago

    Job Description

    **Who We Are**
    At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward - always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities.
    **The Role**
    Join us as a Site Reliability Engineer (SRE) and embark on an exciting journey of ensuring reliability, resiliency, and innovation in our information systems and ecosystems. As an SRE at Kyndryl, you'll be at the forefront of driving continuous improvement and delivering exceptional service to our customers.
    Your role goes beyond traditional engineering, as you'll have the opportunity to analyze business needs, tackle complex problems, and provide strategic advice and designs. You'll be involved in every stage of the software lifecycle, from building and testing to deploying changes and maintaining robust systems.
    We're looking for a true visionary who can think strategically and help shape the future of our services. Your expertise in building trusted relationships with customers and partnering with them for success will be instrumental in driving our growth.
    As an SRE, you'll have the unique opportunity to work on end-to-end services, spanning customer sites and platforms. Collaboration and proactivity are key as you work alongside a talented team of professionals, eager to make a difference. You'll embrace an entrepreneurial mindset, taking ownership of your responsibilities and constantly seeking innovative solutions.
    With an unwavering focus on quality, robustness, and security, you'll be a driving force in implementing cutting-edge tools that enhance our operations, improve reliability, and gather valuable feedback on our platforms. Your ability to identify and mitigate common operational issues will play a crucial role in delivering seamless experiences to our customers.
    If you're passionate about pushing the boundaries of technology, thrive in a collaborative environment, and are motivated by the opportunity to shape the future of reliability engineering, then we want to hear from you. Join our team and be part of a dynamic and forward-thinking organization that values innovation and excellence in everything we do.
    Your Future at Kyndryl
    Kyndryl has a global footprint, which means that as a Site Reliability Engineer at Kyndryl you will have opportunities to work on projects and collaborate with colleagues from around the world. This role is dynamic and influential - offering a wide range of professional and personal growth opportunities that you won't find anywhere else.
    **Who You Are**
    You're good at what you do and possess the required experience to prove it. Equally important, you have a growth mindset and are keen to drive your own personal and professional development. You are customer-focused - someone who prioritizes customer success in their work. And finally, you're open and borderless - naturally inclusive in how you work with others.
    Required Skills and Experience
    + 10+ years of experience in operational management, including incident management and escalations
    + Experience with design and implementation of application monitoring to ensure reliability and performance meets or exceeds business goals
    + Experience implementing strategies to cap operations load and to handle overflow using appropriate tooling and metrics; defining service level indicators and objectives in collaboration with stakeholders, business, development, DevSecOps and Operations teams
    + Analyze z/OS-related problems
    + Apply knowledge of ISV products, ISV systems, and ISV solutions
    + Apply knowledge of LPAR and Parallel Sysplex
    + Apply knowledge of z/OS, including Allocation, the Consoles component, Health Checker, I/O Subsystem, Logger, Network Connectivity, RRS, Scheduler/SMF, Storage Management, Supervisor/Contents/NIP/IPL, and WLM
    Preferred Skills and Experience
    + BS degree in Computer Science, Engineering, or other highly technical, scientific discipline
    + Expertise with Ansible, Terraform, and Python
    + Experience with distributed technologies as well as dynamic resource management frameworks such as Kubernetes
    + Expertise in leveraging open-source tooling such as Prometheus, Grafana, or Loki
    **Being You**
    Diversity is a whole lot more than what we look like or where we come from; it's how we think and who we are. We welcome people of all cultures, backgrounds, and experiences. But we're not doing it single-handedly: our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you - and everyone next to you - the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That's the Kyndryl Way.
    **What You Can Expect**
    With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate, build new capabilities, new relationships, new processes, and new value. Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees and support you and your family through the moments that matter - wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more. Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you, we want you to succeed so that together, we will all succeed.
    **Get Referred!**
    If you know someone who works at Kyndryl, select 'Employee Referral' when asked 'How Did You Hear About Us' during the application process, and enter your contact's Kyndryl email address.
    Kyndryl is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, pregnancy, disability, age, veteran status, or other characteristics. Kyndryl is also committed to compliance with all fair employment practices regarding citizenship and immigration status.

    Remote Site Reliability Engineer (SRE)

    641014 Coimbatore, Tamil Nadu ₹100000 per month WhatJobs

    Posted 1 day ago

    Job Description

    full-time
    Our client is seeking a highly skilled and motivated Remote Site Reliability Engineer (SRE) to join their growing technology team. This is a fully remote position, offering the opportunity to work on mission-critical infrastructure and ensure the high availability, performance, and scalability of their services. The ideal candidate will have a strong background in systems administration, software development, and cloud technologies. You will be responsible for automating infrastructure operations, developing monitoring solutions, managing cloud environments, and responding to incidents to maintain robust and reliable systems.

    Key Responsibilities:
    • Design, build, and maintain scalable and reliable cloud infrastructure (AWS, Azure, GCP).
    • Develop automation tools and scripts (e.g., Python, Bash, Go) to streamline infrastructure deployment and management.
    • Implement and manage robust monitoring, alerting, and logging systems (e.g., Prometheus, Grafana, ELK Stack).
    • Troubleshoot and resolve production issues, ensuring minimal downtime and impact on users.
    • Participate in on-call rotation to provide 24/7 support for critical systems.
    • Develop and maintain system documentation, runbooks, and architectural diagrams.
    • Implement and manage CI/CD pipelines for automated software delivery.
    • Conduct performance tuning and capacity planning for production environments.
    • Collaborate with development teams to ensure system resilience and operability.
    • Participate in incident response and post-mortem analysis to identify root causes and implement preventative measures.
    • Ensure security best practices are implemented and maintained across all infrastructure.
    Qualifications:
    • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
    • Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering.
    • Strong proficiency in at least one cloud platform (AWS, Azure, GCP).
    • Expertise in scripting languages such as Python, Bash, or Go.
    • Experience with containerization technologies (Docker, Kubernetes).
    • In-depth knowledge of Linux operating systems.
    • Experience with infrastructure as code tools (e.g., Terraform, Ansible).
    • Familiarity with monitoring and logging tools.
    • Excellent problem-solving and debugging skills.
    • Strong communication and collaboration abilities, essential for a remote role.
    • Experience with CI/CD tools and practices.
    This remote role offers a competitive salary, excellent benefits, and the chance to work with a cutting-edge technology stack on critical systems. Join our client and make a significant impact on their platform's reliability.
     
