Site Reliability Engineering Lead

Hyderabad, Andhra Pradesh GSPANN

Posted today

Job Description

GSPANN is hiring a Site Reliability Engineering (SRE) Lead with 10+ years of experience in IT Service Management (ITSM) and Application Performance Monitoring (APM). The ideal candidate will have hands-on expertise with Datadog, a strong grasp of IT operations, and the ability to implement workflow automation and drive operational excellence through data and KPIs.

Role and Responsibilities

  • Apply deep knowledge of the Information Technology Infrastructure Library (ITIL v4) and ITSM platforms (certification preferred).
  • Use Datadog to monitor performance, infrastructure, and digital experience (RUM, Synthetic Monitoring, etc.).
  • Implement complex process workflows and track performance using metrics-driven reporting.
  • Demonstrate a strong understanding of IT Operations and its impact on application reliability.
  • Communicate technical concepts clearly and concisely to both technical teams and executive leadership.
  • Build strategic relationships across teams, departments, business stakeholders, and external partners.
  • Translate business requirements into measurable KPIs that reflect application stability and provide business insights.
  • Troubleshoot recurring issues with a focus on incident reduction and operational automation.
  • Identify toil (manual, repetitive work) and propose automation opportunities.
  • React quickly to time-sensitive issues with strong problem-solving and decision-making skills.

Skills and Experience

  • 7+ years of experience in ITIL/ITSM management.
  • 3+ years working with Datadog APM tools, including infrastructure monitoring, logs, and digital experience components.
  • Proven experience in administering the Datadog platform across its various features.
  • Prior experience in a similar application support or SRE leadership role.
  • Familiarity with additional monitoring tools and modern observability technologies.
  • Excellent analytical, troubleshooting, and problem-solving skills.
  • Strong communication and organizational capabilities.
  • Ability to manage multiple tasks while prioritizing effectively.

    Manager Data Reliability Engineering

    Hyderabad, Andhra Pradesh Zeta Services Inc.

    Posted today

    Job Description

    About Zeta:

    Zeta is a Next-Gen Banking Tech company that empowers banks and fintechs to launch banking products for the future. It was founded by and Ramki Gaddipati in 2015.

    Our flagship processing platform, Zeta Tachyon, is the industry’s first modern, cloud-native, and fully API-enabled stack that brings together issuance, processing, lending, core banking, fraud & risk, and many more capabilities as a single-vendor stack. 20M+ cards have been issued on our platform globally.

    Zeta is actively working with the largest banks and fintechs in multiple global markets, transforming customer experience for multi-million card portfolios.

    Zeta has 1,700+ employees, with over 70% of roles in R&D, across locations in the US, EMEA, and Asia. We raised $280 million at a valuation from Softbank, Mastercard, and other investors in 2021.

    About the Role:

    As a Lead Data Reliability Engineer, you will be accountable for designing, deploying, and managing our complex cloud-based data systems. You will lead efforts to ensure the reliability, scalability, and security of our data infrastructure, while mentoring and guiding junior team members.

    Responsibilities:

  • Leverage deep expertise in database management and optimization to ensure high performance and reliability of our data systems.
  • Identify bottlenecks and performance issues within data pipelines, and optimize query performance, data access, and overall data processing.
  • Design, deploy, and manage complex data systems in cloud environments (AWS, Azure, GCP) using tools such as Terraform and adhering to the AWS Well-Architected Framework.
  • Develop and implement complex automation solutions using tools such as Jenkins and scripts to streamline data operations and enhance efficiency.
  • Architect and manage enterprise HA and DR solutions to ensure business continuity and data availability.
  • Expertly analyze and optimize database performance, identifying and resolving bottlenecks.
  • Ensure adherence to cloud security best practices and compliance standards, protecting sensitive data and systems.
  • Manage complex incidents, troubleshoot issues, and implement effective solutions to maintain data integrity and system reliability.
  • Demonstrate leadership skills, mentor junior team members, and foster a collaborative and communicative team environment.

    Skills:

  • Leadership: Proven leadership skills with the ability to inspire and motivate a team.
  • Project Management: Proven experience in managing complex database projects.
  • Database Management Systems: Expertise in one or more relational database management systems.
  • Security and Compliance: In-depth knowledge of database security principles and compliance requirements.
  • Performance Tuning: Advanced skills in monitoring and optimizing database performance.
  • Team Collaboration: Effective collaboration with cross-functional teams and departments.
  • Vendor Management: Experience in engaging with database technology vendors.
  • Problem-Solving: Strong problem-solving skills, especially in resolving complex database issues.
  • Communication: Excellent communication skills for conveying technical information to both technical and non-technical stakeholders.
  • Strategic Planning: Ability to contribute to the development of the organization's overall database strategy.

    Experience and Qualifications:

  • Bachelor’s degree in Computer Science or equivalent, with 8 to 11 years of hands-on experience in database management and optimization on various relational databases, with PostgreSQL as the primary skill set.
  • Experience administering cloud databases (configuration, backup/restore, replication) on the order of tens of terabytes in size.
  • Expertise in identifying and resolving performance issues within data pipelines.
  • Experience in developing and implementing complex automation solutions using tools like Jenkins, Terraform & Python.
  • Experience in architecting and managing high availability and disaster recovery solutions of Databases in the cloud across regions.
  • Expert in performance monitoring and optimization of database systems.
  • In-depth knowledge of cloud security best practices and compliance standards.
  • Extensive experience in managing complex incidents and troubleshooting.
  • Should be able to solve problems, make sound decisions and use good judgment. 
  • Leadership skills with the ability to mentor and guide junior team members.
  • Experience with modern data processing tools and frameworks (Apache Kafka, Apache Spark, Airflow, Debezium, etc.) is a plus.

    Zeta is an equal opportunity employer. At Zeta, we are committed to equal employment opportunities regardless of job history, disability, gender identity, religion, race, marital/parental status, or another special status. We are proud to be an equitable workplace that welcomes individuals from all walks of life if they fit the roles and responsibilities.

    Site Reliability Engineering Advisor

    Hyderabad, Andhra Pradesh FedEx

    Posted today

    Job Description

    This is the critical Site Reliability Engineer Advisor position in FedEx Office Omni Channel team for the current and future vision of FedEx Retail online Services. If not filled, this will put the execution of our vision of framework stability and resilience as well as print omni channel runway at risk.

    JOB SPECIFIC INFORMATION
     

  • Drive improvement initiatives that support the overarching global reliability of the company's systems, including capacity planning, failover strategies, performance improvements, reduction of Mean Time to Awareness/Resolve, and postmortems.
  • Leverage critical thinking to improve best practices and provide enterprise-level recommendations that ensure reliability and resiliency.
  • Be future-oriented.
  • Drive strategic planning.
  • Mentor those in less senior positions.
  • Perform other duties as assigned.
  • Design, develop and implement enterprise scale reliability solutions to support the organization’s business strategy and goals.
  • Work closely with our architects on an Agile team to fully understand systems and application monitoring, provide feedback, and request clarification as needed.
  • Work with other engineers on a collaborative Agile team to address latency best practices on an ongoing basis.
  • Review reliability requirements and data models and translate into responsible architecture
  • Conduct design and code reviews with extended team to ensure that reliability code meets FedEx Enterprise Foundational Services standards
  • Track and resolve performance bottlenecks
  • Standardize logs across new and existing applications so that only the information required to support each application is produced.
  • Work well under pressure.
  • Manage stakeholders across India and US time zones.

    Skills/Knowledge Considered a Plus:

  • Experience developing in the following languages: PHP, Java, JavaScript and SQL
  • Experience with Relational Databases like Oracle and MySQL
  • Experience using Monitoring Tools like VisualVM, Splunk, AppDynamics and New Relic
  • Strong Knowledge of the Linux Operating System
  • Understanding of Cloud Architectures - IaaS vs PaaS vs SaaS
  • Understanding of software architecture and design patterns - Monolithic vs Microservices
  • Practitioner of SAFe (Scaled Agile Framework) or other Agile methodologies
  • Experience with the GitHub and Jenkins CI/CD platforms
  • Experience using Spring Frameworks like Spring Boot, Spring Cloud and Spring Data JPA
  • Familiarity with networking concepts and best practices
  • Understanding of the 12-Factor App and associated principles
  • Automation experience, such as testing and deploying production applications
  • Knowledge of the API First Philosophy


    Education: Bachelor's degree or equivalent in Computer Science, Electrical / Electronics Engineering, MIS or related discipline

    Experience: Five (5) years of work experience in managing reliability and performance of infrastructure, applications and systems at scale.

    Knowledge, Skills and Abilities
    • Fluency in English
    • Problem Solving Skills
    • Communication
    • Collaboration
    • Adaptability
    • Ability to operate in a 24x7 environment encompassing global timezones


    FedEx was built on a philosophy that puts people first, one we take seriously. We are an equal opportunity/affirmative action employer and we are committed to a diverse, equitable, and inclusive workforce in which we enforce fair treatment, and provide growth opportunities for everyone.

    All qualified applicants will receive consideration for employment regardless of age, race, color, national origin, genetics, religion, gender, marital status, pregnancy (including childbirth or a related medical condition), physical or mental disability, or any other characteristic protected by applicable laws, regulations, and ordinances.

    Our Company

    FedEx is one of the world's largest express transportation companies and has consistently been selected as one of the top 10 World’s Most Admired Companies by "Fortune" magazine. Every day FedEx delivers for its customers with transportation and business solutions, serving more than 220 countries and territories around the globe. We can serve this global network due to our outstanding team of FedEx team members, who are tasked with making every FedEx experience outstanding.

    Our Philosophy

    The People-Service-Profit philosophy (P-S-P) describes the principles that govern every FedEx decision, policy, or activity. FedEx takes care of our people; they, in turn, deliver the impeccable service demanded by our customers, who reward us with the profitability necessary to secure our future. The essential element in making the People-Service-Profit philosophy such a positive force for the company is where we close the circle, and return these profits back into the business, and invest back in our people. Our success in the industry is attributed to our people. Through our P-S-P philosophy, we have a work environment that encourages team members to be innovative in delivering the highest possible quality of service to our customers. We care for their well-being, and value their contributions to the company.

    Our Culture

    Our culture is important for many reasons, and we intentionally bring it to life through our behaviors, actions, and activities in every part of the world. The FedEx culture and values have been a cornerstone of our success and growth since we began in the early 1970s. While other companies can copy our systems, infrastructure, and processes, our culture makes us unique and is often a differentiating factor as we compete and grow in today’s global marketplace.

    Software Engineer (Site Reliability)

    Hyderabad, Andhra Pradesh Procter & Gamble

    Posted 17 days ago

    Job Description

    Job Location
    HYDERABAD OFFICE INDIA
    Job Description
    We are seeking a motivated Software/Platform Engineer with experience in Databricks observability to join our dynamic team in D&A Platforms SRE. The ideal candidate will play a role in maintaining the reliability, availability, and performance of our data infrastructure and applications, using Databricks to ensure flawless operations and efficient performance. You will collaborate closely with development, operations, and data teams to implement best practices in observability and monitoring, enabling a proactive approach to incident management and system optimization.
    Key Responsibilities
    Reliability and Performance:
    + Design, implement, and maintain scalable and reliable systems and services.
    + Monitor system performance, availability, and reliability, proactively identifying and resolving issues.
    Observability Implementation:
    + Apply Databricks observability tools to develop and maintain dashboards, alerts, and reporting mechanisms that provide insights into system performance and usage.
    + Establish and improve observability frameworks to monitor key performance indicators (KPIs) and service-level objectives (SLOs).
    Incident Management:
    + Respond to and fix production incidents, performing root cause analysis and implementing corrective actions to prevent future occurrences.
    + Collaborate with cross-functional teams to ensure effective incident response processes and documentation.
    Automation and Efficiency:
    + Develop automation scripts and tools to streamline operational tasks, improve deployment processes, and enhance system reliability.
    + Contribute to the continuous improvement of deployment pipelines and infrastructure as code (IaC) practices.
    Collaboration and Documentation:
    + Work closely with development teams to understand application architectures and contribute to system design discussions.
    + Document processes, best practices, and system architecture to facilitate knowledge sharing and onboarding.
    Performance Optimization:
    + Analyze system performance and application usage patterns to recommend and implement optimizations that improve efficiency and reduce costs.
    Job Qualifications
    + Education:
    + Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
    + Experience:
    + 3 to 6 years of experience in Platform Engineering, DevOps, or a related field.
    + Experience with Databricks, including its observability and monitoring features.
    + Experience in Grafana observability platform
    + Familiarity with cloud platforms (Azure)
    + Technical Skills:
    + Programming languages skills: Python, Scala.
    + SQL knowledge for data extraction and transformation
    + Experience in Power BI development, both semantic models & visualizations
    + Experience in Grafana visualizations
    + Soft Skills:
    + Problem-solving skills and the ability to work in a fast-paced, collaborative environment.
    + Good communication skills, with the ability to convey sophisticated technical concepts to non-technical collaborators.
    + A proactive attitude with a focus on continuous improvement and learning.
    + Open to exploring and experimenting with new SRE processes and tools to support the technical requirements of the D&A platform.
    + Willing to proactively seek new opportunities to learn and adopt new knowledge into practice.
    About us
    We produce globally recognized brands and we grow the best business leaders in the industry. With a portfolio of trusted brands as diverse as ours, it is paramount our leaders are able to lead with courage the vast array of brands, categories and functions. We serve consumers around the world with one of the strongest portfolios of trusted, quality, leadership brands, including Always®, Ariel®, Gillette®, Head & Shoulders®, Herbal Essences®, Oral-B®, Pampers®, Pantene®, Tampax® and more. Our community includes operations in approximately 70 countries worldwide.
    We are an equal opportunity employer and value diversity at our company. We do not discriminate against individuals on the basis of race, color, gender, age, national origin, religion, sexual orientation, gender identity or expression, marital status, citizenship, disability, HIV/AIDS status, or any other legally protected factor.
    "At P&G, the hiring journey is personalized every step of the way, thereby ensuring equal opportunities for all, with a strong foundation of Ethics & Corporate Responsibility guiding everything we do. All the available job opportunities are posted either on our website - pgcareers.com, or on our official social media pages, for the convenience of prospective candidates, and do not require them to pay any kind of fees towards their application."
    Job Schedule
    Full time
    Job Number
    R
    Job Segmentation
    Experienced Professionals (Job Segmentation)

    Software Engineer (Site Reliability)

    Hyderabad, Andhra Pradesh Procter & Gamble

    Posted today

    Job Description

    Description

    We are seeking a motivated Software/Platform Engineer with experience in Databricks observability to join our dynamic team in D&A Platforms SRE. The ideal candidate will play a role in maintaining the reliability, availability, and performance of our data infrastructure and applications, using Databricks to ensure flawless operations and efficient performance. You will collaborate closely with development, operations, and data teams to implement best practices in observability and monitoring, enabling a proactive approach to incident management and system optimization.

    Key Responsibilities

    Reliability and Performance:

  • Design, implement, and maintain scalable and reliable systems and services.
  • Monitor system performance, availability, and reliability, proactively identifying and resolving issues.

    Observability Implementation:

  • Apply Databricks observability tools to develop and maintain dashboards, alerts, and reporting mechanisms that provide insights into system performance and usage.
  • Establish and improve observability frameworks to monitor key performance indicators (KPIs) and service-level objectives (SLOs).

    Incident Management:

  • Respond to and fix production incidents, performing root cause analysis and implementing corrective actions to prevent future occurrences.
  • Collaborate with cross-functional teams to ensure effective incident response processes and documentation.

    Automation and Efficiency:

  • Develop automation scripts and tools to streamline operational tasks, improve deployment processes, and enhance system reliability.
  • Contribute to the continuous improvement of deployment pipelines and infrastructure as code (IaC) practices.

    Collaboration and Documentation:

  • Work closely with development teams to understand application architectures and contribute to system design discussions.
  • Document processes, best practices, and system architecture to facilitate knowledge sharing and onboarding.

    Performance Optimization:

  • Analyze system performance and application usage patterns to recommend and implement optimizations that improve efficiency and reduce costs.

    Job Qualifications

    Education:

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

    Experience:

  • 3 to 6 years of experience in Platform Engineering, DevOps, or a related field.
  • Experience with Databricks, including its observability and monitoring features.
  • Experience in Grafana observability platform.
  • Familiarity with cloud platforms (Azure).

    Technical Skills:

  • Programming language skills: Python, Scala.
  • SQL knowledge for data extraction and transformation.
  • Experience in Power BI development, both semantic models & visualizations.
  • Experience in Grafana visualizations.

    Soft Skills:

  • Problem-solving skills and the ability to work in a fast-paced, collaborative environment.
  • Good communication skills, with the ability to convey sophisticated technical concepts to non-technical collaborators.
  • A proactive attitude with a focus on continuous improvement and learning.
  • Open to exploring and experimenting with new SRE processes and tools to support the technical requirements of the D&A platform.
  • Willing to proactively seek new opportunities to learn and adopt new knowledge into practice.

    About us

    We produce globally recognized brands and we grow the best business leaders in the industry. With a portfolio of trusted brands as diverse as ours, it is paramount our leaders are able to lead with courage the vast array of brands, categories and functions. We serve consumers around the world with one of the strongest portfolios of trusted, quality, leadership brands, including Always®, Ariel®, Gillette®, Head & Shoulders®, Herbal Essences®, Oral-B®, Pampers®, Pantene®, Tampax® and more. Our community includes operations in approximately 70 countries worldwide.

    We are an equal opportunity employer and value diversity at our company. We do not discriminate against individuals on the basis of race, color, gender, age, national origin, religion, sexual orientation, gender identity or expression, marital status, citizenship, disability, HIV/AIDS status, or any other legally protected factor.

    "At P&G, the hiring journey is personalized every step of the way, thereby ensuring equal opportunities for all, with a strong foundation of Ethics & Corporate Responsibility guiding everything we do.
    All the available job opportunities are posted either on our website - pgcareers.com, or on our official social media pages, for the convenience of prospective candidates, and do not require them to pay any kind of fees towards their application."

    Job Schedule

    Full time

    Job Number

    R

    Job Segmentation

    Experienced Professionals (Job Segmentation)

    Azure Data Engineers - Site Reliability Engineering

    Hyderabad, Andhra Pradesh GSPANN

    Posted today

    Job Description

    GSPANN is hiring Azure Data Engineers with expertise in Site Reliability Engineering (SRE) to optimize and automate large-scale data applications. The role involves ensuring system reliability and performance using Azure Data Factory, Databricks, Cosmos DB, and Power BI.

    Role and Responsibilities

  • Develop a deep understanding of the business and analyze the end-to-end customer journey.
  • Collaborate with stakeholders to enhance the design, visibility, availability, scalability, and performance of services.
  • Work closely with Data Engineering teams to implement necessary improvements and enhancements.
  • Identify and escalate potential production-impacting issues proactively in collaboration with Engineering teams.
  • Automate manual processes efficiently, conduct in-depth incident analysis, and drive blameless postmortems.
  • Optimize alert management, decision-making, and performance analysis by leveraging standardized telemetry data.
  • Assist in deployment, post-deployment monitoring, and dashboard/alert creation to track system changes effectively.
  • Ensure adherence to critical company controls required for internal and external audit compliance.
  • Develop value-proposition presentations, case studies, and accelerators to drive business impact.

    Skills and Experience

  • 8+ years of experience in software development, technical operations, and managing large-scale applications.
  • 7+ years of experience in developing or supporting Azure Data Factory (Application Programming Interface/API & API Management/APIM), Azure Databricks, Azure DevOps, Azure Data Lake Storage (ADLS), SQL, Synapse Data Warehouse, and Azure Cosmos DB.
  • 5+ years of hands-on experience in Data Engineering and coding.
  • Experience in data virtualization products like Denodo is desirable.
  • Holding an Azure Data Engineer or Solutions Architect certification is a plus.
  • Work with container platforms such as Docker and Kubernetes, assessing applications/platforms periodically for architectural improvements.
  • Apply strong troubleshooting skills to quickly identify and resolve application issues with minimal business impact.
  • Handle high-volume, mission-critical applications with hands-on experience.
  • Utilize expertise in IT tools, techniques, systems, and solutions to drive operational excellence.
  • Lead triage calls with multiple technical stakeholders and communicate effectively across teams.
  • Solve cross-functional issues creatively, adapting to changing priorities.
  • Manage escalations effectively, taking full ownership of critical issues to ensure timely resolution.
  • Basic knowledge of Data Science and Machine Learning (DSML), Artificial Intelligence/Machine Learning (AI/ML), and Machine Learning Operations (ML Ops).
  • Familiarity with Azure Event Hub, IoT Hub, and Azure Application Insights.
  • Knowledge of the IT Infrastructure Library (ITIL) framework and IT Service Management (ITSM) tools.
  • Understanding of SAP HANA and willingness to work in rotational shifts.

    Software Engineering

    Hyderabad, Andhra Pradesh Microsoft Corporation

    Posted 9 days ago

    Job Description

    Do you want to be a part of a multi-billion-dollar organization that is rapidly growing and is responsible for 200M MAU and exabytes of customer data in the cloud at high performance and scale? Do you want to work on technically challenging problems on the cloud in a full-stack environment, with an opportunity to influence the roadmap and vision of not only your team but your partner teams as well? If so, come join the OneDrive-SharePoint (ODSP) team as part of Office M365 ecosystem in Hyderabad!
    SharePoint helps millions of people work better together and empowers the biggest companies in the world to solve mission critical problems. We create global scale services to store, secure and manage some of the most sensitive data on the planet.
    We have fantastic opportunities and are on the front-line of making many of our next generation architecture investments to deliver multi-geo content store, amazing performance/scale/reliability, and security capabilities using scalable cloud distributed systems.
    Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
    **Responsibilities**
    Towards this vision, we are seeking a solid and highly motivated **Software Engineer** to build the next generation of products and take them to the next level:
    + Drive complex problem solving and the design and development of software, and ensure its quality.
    + Define new components with a complete understanding of service interdependencies and limitations.
    + Bring knowledge of, and curiosity to learn more about, performance, scalability, enterprise system architecture, and engineering best practices.
    + Create prototypes and proofs of concept for iterative development.
    + Work effectively with product development and engineering teams.
    + Be self-driven, curious to learn, proactive, and result-oriented.
    Join a team of builders and innovators that think outside the box. A team that's committed to a low operational burden by designing for it. A team that puts work-life balance, personal and professional growth as a principle, not just a goal. If you enjoy working in a dynamic environment to deliver world class mission critical systems, this may be the career opportunity for you!
    **Qualifications**
    **Required Qualifications:**
    + Bachelor's or Master's degree in Computer Science or a related technical field AND 2+ years of technical engineering experience coding in languages including, but not limited to, C#, C, C++, Java, JavaScript, or Python (**C# preferred**).
    + Strong CS fundamentals and exceptional coding skills.
    + Good communication and cross group collaboration skills.
    + Experience in Azure, Exchange, or other cloud and distributed systems is a big plus.
    Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

    Software Engineering Lead

    Hyderabad, Andhra Pradesh UnitedHealth Group

    Posted 2 days ago

    Job Description

    Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start **Caring. Connecting. Growing together.**
    **Primary Responsibilities:**
    + Working with the rest of the team to deploy, maintain, and run a highly available, multi-tenant distributed system
    + Automating both the infrastructure creation and the application deployment to that environment
    + Supporting Linux systems internals and administration
    + Building and maintaining different flavors of Kubernetes clusters on-prem and in the public cloud
    + Supporting application teams in resolving the issues they face
    + Updating, enhancing, and creating tools using Python or Golang
    + Using modern AI tools and agents to create new tools to enhance our products and services
    + Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies in regard to flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so
    **Required Qualifications:**
    + Full time graduation degree
    + Experience with or understanding of lifecycle management of Linux servers, including server provisioning, build, Kickstart, decommissioning, and configuring servers using configuration management tools such as Ansible, Chef, or Terraform
    + Experience supporting on-call rotations and handling war rooms and P1 and P2 tickets
    + Working experience applying SRE concepts
    + Working experience with the use of AI tools and agents
    + Experience with scaling, monitoring, and troubleshooting actively running systems
    + Understanding of the end-to-end DevOps model and experience automating the software development, test, or deployment lifecycle with continuous integration and continuous deployment
    + Working knowledge of managing different flavors of Kubernetes clusters on-prem and in public clouds (AWS, Azure, and GCP)
    + Expert in working with Docker and Kubernetes
    + Expert in tools such as Monit, ELK, Splunk, Prometheus, and Grafana
    + Expert in scripting, such as shell scripting
    + Expertise in leveraging Open Source Software technologies to solve business problems
    + Proficient in using Python or Golang
    + Proven ability to recommend application hosting solutions or design an application hosting strategy
    + Proven development skills in building auto-healing for system alerts
    + Demonstrated ability to work with different customers over email, ChatOps, and one-on-one meetings or calls
    + Proven solid cross-organizational collaboration skills
    + Proven solid hands-on experience with DevOps toolchains and practices
    _At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone - of every race, gender, sexuality, age, location and income - deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes - an enterprise priority reflected in our mission._