7 Server Management jobs in Hyderabad
Infrastructure Specialist-System Administration
Posted 4 days ago
Job Description
A career in IBM Consulting is rooted in long-term relationships and close collaboration with clients across the globe. You'll work with visionaries across multiple industries to improve the hybrid cloud and AI journey for the most innovative and valuable companies in the world. Your ability to accelerate impact and make meaningful change for your clients is enabled by our strategic partner ecosystem and our robust technology platforms across the IBM portfolio.
**Your role and responsibilities**
As a Software Developer you'll participate in many aspects of the software development lifecycle, such as design, code implementation, testing, and support. You will create software that enables your clients' hybrid-cloud and AI journeys.
Your primary responsibilities include:
* Envision, design, and build the Software infrastructure that keeps the solutions running.
* Challenge ideas, identify problems and create efficient solutions.
* Develop flexible, maintainable, and scalable application components.
* Collaborate with development peers and lead the way in staying up to date with tools and technology trends.
**Required technical and professional expertise**
* Minimum 4 years of experience.
* We are seeking a skilled Telecom Operations Specialist / Support Engineer to join our team.
* The ideal candidate will have a strong background in Linux Shell Scripting, SQL Database Querying, and hands-on experience with OSS/BSS systems in the telecommunications sector.
* Familiarity with Cramer tools, defect triage, and incident management processes is key to ensuring optimal performance and efficient resolution of network and service-related issues.
* Exposure to COTS products used in telecom operations is highly desirable.
**Preferred technical and professional experience**
* Manage and monitor system resources and databases.
* Develop and maintain shell scripts for automating repetitive tasks.
* Streamline system operations in a Linux environment.
IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
L3 Server Engineer – Major Incident Management
Posted today
Job Description
Company Description
Nextbridge IT Solutions is a US-based IT solution firm specializing in connecting exceptional talent with organizations driving transformation in infrastructure, cloud, and emerging technologies. We partner closely with clients to understand their technical needs and organizational goals, delivering tailored solutions through highly skilled professionals. Our culture values forward-thinking, accountability, and agility, encouraging continuous growth and supporting long-term success. Join us to shape the future together.
Role Description
This is a remote contract role for an L3 Server Engineer – Major Incident Management. The L3 Server Engineer will be responsible for managing and resolving major incidents, providing expert troubleshooting, and ensuring uptime and performance of infrastructure. Duties include handling operating systems, supporting databases, and overseeing overall IT infrastructure. The role also requires effective communication and collaboration with other IT professionals and stakeholders to ensure swift resolution of incidents.
Key Responsibilities
- Serve as the primary technical escalation point for all server-related Major Incidents (MIM) and P1 events.
- Lead technical triage on bridge calls and in war rooms, coordinating efforts between L2 support, application teams, vendors, and other cross-functional stakeholders.
- Perform advanced, real-time troubleshooting to diagnose and resolve complex issues across Windows Server, Linux, and VMware virtualization platforms.
- Drive the restoration of critical infrastructure services with a focus on minimizing business impact.
- Author and deliver comprehensive Root Cause Analysis (RCA) and detailed post-incident reports.
- Partner with the Problem Management team to identify trends, implement proactive solutions, and prevent incident recurrence.
- Mentor and provide technical guidance to L1/L2 support teams to improve overall incident response capabilities.
- Participate in a 24x7 on-call rotation to provide critical support when needed.
Qualifications
- 5–8 years of hands-on experience in enterprise server administration and high-severity incident response.
- Expert-level knowledge of Windows Server (2016/2019/2022) and Linux (RHEL, Ubuntu).
- Deep expertise with virtualization technologies, specifically VMware ESXi/vSphere in a large-scale environment.
- Solid understanding of core infrastructure concepts: TCP/IP networking, SAN/NAS storage, and enterprise backup/recovery solutions.
- Hands-on experience with enterprise monitoring platforms (e.g., SolarWinds, Datadog, Nagios).
- Proficiency with an ITSM tool, preferably ServiceNow, for incident lifecycle management.
- Demonstrated ability to remain calm, focused, and organized during high-pressure situations.
- ITIL v3/v4 Foundation certification is required.
Preferred
- Advanced certifications such as MCSE, VCP, RHCE.
- ITIL Intermediate/Expert or related certifications.
- Experience with public cloud platforms (Azure, AWS) and hybrid cloud environments.
- Scripting and automation skills (PowerShell, Bash) for diagnostics and reporting.
Key Competencies
- Crisis Management: Able to lead effectively in high-pressure, time-critical situations.
- Collaboration: Works seamlessly with vendors, internal teams, and stakeholders to achieve common goals.
- Analytical Mindset: Possesses superior troubleshooting and Root Cause Analysis (RCA) capabilities.
- Communication: Delivers clear, concise, and timely updates during incidents, tailored for both technical and business audiences.
- Proactive Mindset: Focuses on prevention and continuous service improvement, not just reactive resolution.
Remote Work Environment
- This is a fully remote position. A company-provided Virtual Desktop (VDI) will be used for all work.
- Candidates are expected to provide their own reliable computer (laptop or desktop) and at least one monitor capable of accessing the VDI.
- A dedicated and quiet workspace is essential to maintain a professional environment during critical incident bridge calls.
Sr Specialist Systems Administration - Cloud and Linux System Administrator
Posted 1 day ago
Job Description
Title: Sr Specialist Systems Administration
Role: Cloud and Linux System Administrator
Experience Level: 8+ Years
Job Summary/ Description
Network Cloud Operations is responsible for operations, administration, and maintenance of server infrastructure in the AT&T AIC (AT&T Integrated Cloud) and NC (Network Cloud) environments. This job involves software updates and deployment of the Network Cloud environment, firmware updates, and work on tickets and changes, in an environment built on Linux, OpenStack, and Kubernetes with 24x7 operations.
Roles and Responsibilities:
Performs work focusing on software, maintenance, and operations of systems used by AT&T or its clients to conduct business.
Performs feasibility assessments, creates requirements, manages projects, and integrates and tests technical solutions for software.
Primary / Mandatory skills:
This position requires a broad-spectrum skill set across Linux, KVM, OpenStack, Kubernetes, containers, cloud infrastructure platforms, Windows, and monitoring of the server infrastructure. The candidate should have a minimum of 6-10 years of working experience in production environments.
Should have excellent troubleshooting and analytical skills in systems administration of Linux, cloud, OpenStack, and Kubernetes environments.
Should be well versed in, and have worked on, operations and maintenance of OpenStack modules such as Keystone, Nova, Neutron, Swift, Cinder, Heat, Glance, Horizon, and Fuel.
Should be well versed with concepts like High Availability, DRS, Fault Tolerance, Scalability, Reliability, shared resources etc. including Handling of Tier 1/ Tier 2 Tickets in AT&T AIC/ NC Server Environment.
Creating, reviewing, approving, and implementing changes in AIC/ NC Server Environment.
Restoration of Applications/services in AIC/NC infrastructure from Hardware/ OpenStack/ OS perspective in Outages/ Service degradation scenarios.
Experience with Python and shell scripting.
Upgrade and deployment of AIC/NC server infrastructure (Linux, OpenStack, and Kubernetes).
Knowledge and understanding of CI/CD pipelines in Jenkins and GitHub.
Coordination with OS vendors, development teams, and production support teams for root cause analysis, configuration changes, etc.
Coordination with hardware vendors (Dell, HP) for replacement of faulty hardware.
Should have a very high level of customer focus to provide an effortless customer experience. Should display a positive attitude when working with peers, customers, clients, vendors, etc.
The candidate should be ready to work in 24x7 rotational shift operations.
Should have strong analytical, logical, and problem-solving skills.
Should be able to work with diverse teams across regions and cultures.
Certification in systems administration of Linux, OpenStack, and Kubernetes is preferable.
The candidate should have a high level of coordination and communication skills (verbal and written).
Education Qualification: Engineering/ Graduation/ Equivalent in Computer Science, Electronics and Communications.
**Weekly Hours:** 40
**Time Type:** Regular
**Location:** Hyderabad, India
It is the policy of AT&T to provide equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state or local law. In addition, AT&T will provide reasonable accommodations for qualified individuals with disabilities. AT&T is a fair chance employer and does not initiate a background check until an offer is made.
AT&T will consider for employment qualified applicants in a manner consistent with the requirements of federal, state and local laws.
We expect employees to be honest, trustworthy, and operate with integrity. Discrimination and all unlawful harassment (including sexual harassment) in employment are not tolerated. We encourage success based on our individual merits and abilities without regard to race, color, religion, national origin, gender, sexual orientation, gender identity, age, disability, marital status, citizenship status, military status, protected veteran status or employment status.