82 DevOps Manager jobs in Hyderabad
DevOps Manager
Posted today
Job Description
Position: DevOps Manager
Location: India (Remote)
Employment Type: Full-Time
Schedule: Monday to Friday, Day Shift
Company Description
Scry AI is a research-led enterprise AI company that builds intelligent platforms to drive efficiency, insight, and compliance. Our platforms, Collatio®, Auriga®, and Concentio®, streamline complex workflows by automating data extraction, validation, and reconciliation, and by delivering real-time intelligence.
We are seeking a DevOps Manager to lead our infrastructure, CI/CD, and reliability practices across cloud and on-prem deployments. You will own uptime, performance, security, and cost efficiency for AI/ML workloads powering Collatio®, Auriga®, and Concentio®.
Role Overview
As DevOps Manager, you will lead a small team of DevOps/SRE engineers to design, automate, and operate secure, compliant, and highly available platforms across AWS/Azure/GCP and customer on-prem environments. You will standardize IaC, improve CI/CD velocity, build robust observability, and enable GPU-accelerated AI inference at scale for enterprise clients.
Key Responsibilities
Platform Reliability & Operations
• Own SLOs/SLIs, availability, latency, and capacity planning across services.
• Lead incident response, root-cause analysis, postmortems, and on-call processes.
• Implement backup, disaster recovery, and business continuity for multi-region and on-prem.
Cloud, On-Prem & Edge Deployments
• Architect Kubernetes platforms (managed and self-hosted), including RBAC, network policies, and secrets management.
• Standardize infrastructure with Terraform, Helm, and GitOps (Argo CD) for repeatable customer deployments.
• Support Concentio® edge/IoT rollouts with secure remote updates and telemetry pipelines.
AI/ML & Data Infrastructure
• Enable GPU scheduling and drivers (CUDA, NVIDIA), inference runtimes (Triton), and model packaging.
• Build MLOps foundations (MLflow, feature stores) and artifact/version governance.
• Operate data services (Kafka, PostgreSQL, Redis, MinIO/S3, Elasticsearch/OpenSearch) for high-throughput pipelines.
CI/CD & Developer Experience
• Own CI/CD with GitHub Actions/GitLab CI/Jenkins; establish trunk-based development, automated testing, and canary/blue-green releases.
• Maintain internal developer platforms, templates, and golden paths to improve delivery speed and quality.
Security, Compliance & Observability
• Implement least-privilege access, SSO (Okta/AAD), Vault-based secrets, image scanning (Trivy), and policy as code.
• Ensure SOC 2, ISO 27001, HIPAA/GDPR alignment with audit trails and immutable logs.
• Build end-to-end observability using Prometheus, Grafana, Loki/EFK, and OpenTelemetry.
FinOps & Stakeholder Management
• Track cloud spend, rightsize resources, and negotiate quotas for GPU/compute.
• Partner with Product, Data Science, and Customer Success to plan capacity for new features and enterprise go-lives.
Required Qualifications & Skills
• Strong Kubernetes expertise (production operations, networking, security, Helm, GitOps).
• Proven IaC experience with Terraform and configuration management (Ansible).
• CI/CD at scale with GitHub Actions/GitLab CI/Jenkins; artifact registries and SBOMs.
• Observability: Prometheus, Grafana, ELK/EFK or Loki, alerting and runbooks.
• Cloud proficiency in at least one major provider (AWS/Azure/GCP) and Linux fundamentals.
• Security fundamentals: network segmentation, TLS, secrets management, container hardening.
• Experience running data/streaming systems (Kafka, Redis, PostgreSQL) in production.
• Excellent communication, incident leadership, and stakeholder management.
Nice-to-Have
• GPU orchestration, Triton Inference Server, Hugging Face model serving.
• Service mesh (Istio/Linkerd), API gateways, and zero-trust patterns.
• MLOps tooling (MLflow, Feast), Airflow, dbt.
• Compliance implementations for regulated industries (BFSI, healthcare).
• Certifications: CKA/CKAD, AWS/Azure/GCP Architect, Security+.
Our Ideal Candidate
• Drives reliability with automation, not toil.
• Balances speed and safety with measurable delivery improvements.
• Thrives in customer-facing, hybrid cloud, and on-prem environments.
• Coaches teams with clear standards, runbooks, and continuous improvement.
Tip for Candidates
If you want to build secure, high-performance platforms for real-world AI at enterprise scale, follow our page for more relevant job openings.
DevOps Manager
Posted 1 day ago
Job Description
We are hiring for a leading OTT platform.
Designation: DevOps Manager
Experience: 8+ years
Location: Hyderabad (Hybrid)
About The Role
As DevOps Engineering Manager, you will build and manage a team of engineers who provide DevOps support to our development teams and are directly responsible for crucial parts of our infrastructure platform. You will provide technical guidance and leadership to your team. You will drive the team’s day-to-day activities, work with other Infrastructure leads on prioritization, and partner closely with our development teams. As Engineering Manager, you will report to the Senior Engineering Manager in Infrastructure.
Requirements
About you
• 8+ years of experience in a DevOps, SRE, or related infrastructure engineering role.
• 3+ years as a team lead or manager, with experience hiring and mentoring engineers at all levels.
• You are a hands-on manager, able to review your team’s code and help your team make technical decisions.
• Experience in managing large complex projects and communicating effectively with stakeholders.
• Expert in AWS cloud technologies, coupled with Infrastructure as Code (IaC) tools such as Pulumi and Terraform.
• Experience managing containerized infrastructure in ECS or Kubernetes.
• Experience in GitOps practices and CI/CD.
• Familiar with Site Reliability principles and practices such as Incident Management, Root Cause Analysis, monitoring and alerting.
• Experience working with and guiding Software Engineers in production best practices, sharing knowledge across teams and teammates.
• Experience in at least two programming languages, including TypeScript.
• Able to accommodate periodic evening meetings with the US-based team.
• Excellent communication skills and ability to collaborate with stakeholders.
• Participate in on-call and developer support rotations.
Pluses
• Experience working in a video or content streaming environment.
• Understanding of security concepts such as cryptography, authentication, authorization, and security protocols.
• Familiarity with Datadog.
• Familiarity with GitHub Actions.
DevOps Manager
Posted today
Job Description
We are seeking a highly skilled and motivated DevOps Lead with expertise in both AWS and Azure cloud platforms to join our dynamic team. The successful candidate will collaborate with solution architects, developers, project managers, customer technical teams, and internal stakeholders to drive results. Your primary focus will be ensuring seamless customer access to applications in the cloud, managing customer workload migrations, implementing robust backup policies, overseeing hybrid cloud deployments, and building solutions for service assurance with a strong emphasis on leveraging Azure's unique capabilities.
Responsibilities:
- Define architecture, design, implement, program manage, and lead technology teams in delivering complex technical solutions for our clients across both AWS and Azure platforms.
- Work across DevOps, Continuous Integration (CI), and Continuous Delivery (CD), bringing demonstrable implementation experience to shape value-add consulting solutions.
- Deploy, automate, and maintain cloud infrastructure with leading public cloud vendors such as Amazon Web Services (AWS) and Microsoft Azure, with a keen focus on integrating Azure-specific services and tools.
- Set up backups, replication, and archiving, and implement disaster recovery measures leveraging Azure's resilience and geo-redundancy features.
- Utilize Azure DevOps services for better collaboration, reporting, and increasing automation in the CI/CD pipelines.
Mandatory Skills and Experience:
- Detail-oriented with a holistic perspective on system architecture, including at least 1 year of hands-on experience with Azure cloud services.
- Strong shell scripting and Linux administration skills, with a deep understanding of Linux and virtualization.
- Expertise in server technologies like Apache, Nginx, and Node, including optimization experience.
- Knowledge of database technologies such as MySQL, Redis, and MongoDB, with proficiency in management, replication, and disaster recovery.
- Proven experience in medium to large-scale public cloud deployments on AWS and Azure, including the migration of complex, multi-tier applications to these platforms.
- In-depth working knowledge of AWS and Azure, showcasing the ability to leverage Azure-specific features such as Azure Active Directory, Azure Kubernetes Service (AKS), Azure Functions, and Azure Logic Apps.
- Familiarity with CI/CD, automation, and monitoring processes for production-level infrastructure, including the use of Azure Monitor and Azure Automation.
- Deep understanding of system performance and the ability to analyze root causes using tools available in Azure.
- Experience with Azure-specific management and governance tools, such as Azure Policy, Azure Blueprints, and Azure Resource Manager (ARM) templates.
- Proficiency in CI/CD automation using tools like Jenkins, Travis CI, CircleCI, or Azure DevOps.
- Knowledge of security infrastructure and vulnerabilities, including Azure's security tools like Azure Security Center and Azure Sentinel.
- Capability to analyze costs for the entire infrastructure, including cost management and optimization in Azure environments.
- Hands-on experience with configuration management tools like Ansible, Puppet, Chef, or similar, with an emphasis on their integration in Azure environments.
- Experience with Docker containers and container orchestration tools such as Kubernetes and Docker Swarm; proficiency in Azure Kubernetes Service (AKS) is preferred.
DevOps Manager
Posted today
Job Viewed
Job Description
Position: DevOps Manager
Location: India (Remote)
Employment Type: Full-Time
Schedule: Monday to Friday, Day Shift
Company Description
Scry AI is a research-led enterprise AI company that builds intelligent platforms to drive efficiency, insight, and compliance. Our platforms Collatio®, Auriga®, and Concentio® streamline complex workflows by automating data extraction, validation, reconciliation and delivering real-time intelligence.
We are seeking a DevOps Manager to lead our infrastructure, CI/CD, and reliability practices across cloud and on-prem deployments. You will own uptime, performance, security, and cost efficiency for AI/ML workloads powering Collatio®, Auriga®, and Concentio®.
Role Overview
As DevOps Manager, you will lead a small team of DevOps/SRE engineers to design, automate, and operate secure, compliant, and highly available platforms across AWS/Azure/GCP and customer on-prem environments. You will standardize IaC, improve CI/CD velocity, build robust observability, and enable GPU-accelerated AI inference at scale for enterprise clients.
Key Responsibilities
Platform Reliability & Operations
• Own SLOs/SLIs, availability, latency, and capacity planning across services.
• Lead incident response, root-cause analysis, postmortems, and on-call processes.
• Implement backup, disaster recovery, and business continuity for multi-region and on-prem.
Cloud, On-Prem & Edge Deployments
• Architect Kubernetes platforms (managed and self-hosted), including RBAC, network policies, and secrets management.
• Standardize infrastructure with Terraform, Helm, and GitOps (Argo CD) for repeatable customer deployments.
• Support Concentio® edge/IoT rollouts with secure remote updates and telemetry pipelines.
AI/ML & Data Infrastructure
• Enable GPU scheduling and drivers (CUDA, NVIDIA), inference runtimes (Triton), and model packaging.
• Build MLOps foundations (MLflow, feature stores) and artifact/version governance.
• Operate data services (Kafka, PostgreSQL, Redis, MinIO/S3, Elasticsearch/Opensearch) for high-throughput pipelines.
CI/CD & Developer Experience
• Own CI/CD with GitHub Actions/GitLab CI/Jenkins; establish trunk-based development, automated testing, and canary/blue-green releases.
• Maintain internal developer platforms, templates, and golden paths to improve delivery speed and quality.
Security, Compliance & Observability
• Implement least-privilege access, SSO (Okta/AAD), Vault-based secrets, image scanning (Trivy), and policy as code.
• Ensure SOC 2, ISO 27001, HIPAA/GDPR alignment with audit trails and immutable logs.
• Build end-to-end observability using Prometheus, Grafana, Loki/EFK, and OpenTelemetry.
FinOps & Stakeholder Management
• Track cloud spend, rightsize resources, and negotiate quotas for GPU/compute.
• Partner with Product, Data Science, and Customer Success to plan capacity for new features and enterprise go-lives.
Required Qualifications & Skills
• Strong Kubernetes expertise (production operations, networking, security, Helm, GitOps).
• Proven IaC experience with Terraform and configuration management (Ansible).
• CI/CD at scale with GitHub Actions/GitLab CI/Jenkins; artifact registries and SBOMs.
• Observability: Prometheus, Grafana, ELK/EFK or Loki, alerting and runbooks.
• Cloud proficiency in at least one major provider (AWS/Azure/GCP) and Linux fundamentals.
• Security fundamentals: network segmentation, TLS, secrets management, container hardening.
• Experience running data/streaming systems (Kafka, Redis, PostgreSQL) in production.
• Excellent communication, incident leadership, and stakeholder management.
Nice-to-Have
• GPU orchestration, Triton Inference Server, Hugging Face model serving.
• Service mesh (Istio/Linkerd), API gateways, and zero-trust patterns.
• MLOps tooling (MLflow, Feast), Airflow, dbt.
• Compliance implementations for regulated industries (BFSI, healthcare).
• Certifications: CKA/CKAD, AWS/Azure/GCP Architect, Security+.
Our Ideal Candidate
• Drives reliability with automation, not toil.
• Balances speed and safety with measurable delivery improvements.
• Thrives in customer-facing, hybrid cloud, and on-prem environments.
• Coaches teams with clear standards, runbooks, and continuous improvement.
Tip for Candidates
If you want to build secure, high-performance platforms for real-world AI at enterprise scale, follow our page for more relevant job openings.
DevOps Manager
Posted 6 days ago
Job Description
We are seeking a highly skilled and motivated DevOps Lead with expertise in both AWS and Azure cloud platforms to join our dynamic team. The successful candidate will collaborate with solution architects, developers, project managers, customer technical teams, and internal stakeholders to drive results. Your primary focus will be ensuring seamless customer access to applications in the cloud, managing customer workload migrations, implementing robust backup policies, overseeing hybrid cloud deployments, and building solutions for service assurance with a strong emphasis on leveraging Azure's unique capabilities.
Responsibilities:
- Define architecture, design and implement solutions, manage programs, and lead technology teams in delivering complex technical solutions for our clients across both AWS and Azure platforms.
- Work across DevOps, Continuous Integration (CI), and Continuous Delivery (CD), applying demonstrable implementation experience to shape value-add consulting solutions.
- Deploy, automate, and maintain cloud infrastructure with leading public cloud vendors such as Amazon Web Services (AWS) and Microsoft Azure, with a keen focus on integrating Azure-specific services and tools.
- Set up backups, replication, and archiving, and implement disaster recovery measures leveraging Azure's resilience and geo-redundancy features.
- Utilize Azure DevOps services for better collaboration, improved reporting, and increased automation in CI/CD pipelines.
Mandatory Skills and Experience:
- Detail-oriented, with a holistic perspective on system architecture and at least 1 year of hands-on experience with Azure cloud services.
- Strong shell scripting and Linux administration skills, with a deep understanding of Linux and virtualization.
- Expertise in server technologies like Apache, Nginx, and Node, including optimization experience.
- Knowledge of database technologies such as MySQL, Redis, and MongoDB, with proficiency in management, replication, and disaster recovery.
- Proven experience in medium to large-scale public cloud deployments on AWS and Azure, including the migration of complex, multi-tier applications to these platforms.
- In-depth working knowledge of AWS and Azure, showcasing the ability to leverage Azure-specific features such as Azure Active Directory, Azure Kubernetes Service (AKS), Azure Functions, and Azure Logic Apps.
- Familiarity with CI/CD, automation, and monitoring processes for production-level infrastructure, including the use of Azure Monitor and Azure Automation.
- Deep understanding of system performance and the ability to analyze root causes using tools available in Azure.
- Experience with Azure-specific management and governance tools, such as Azure Policy, Azure Blueprints, and Azure Resource Manager (ARM) templates.
- Proficiency in CI/CD automation using tools like Jenkins, Travis CI, CircleCI, or Azure DevOps.
- Knowledge of security infrastructure and vulnerabilities, including Azure's security tools like Azure Security Center and Azure Sentinel.
- Capability to analyze costs for the entire infrastructure, including cost management and optimization in Azure environments.
- Hands-on experience with configuration management tools like Ansible, Puppet, Chef, or similar, with an emphasis on their integration in Azure environments.
- Experience with Docker containers and container orchestration tools such as Kubernetes and Docker Swarm; proficiency in Azure Kubernetes Service (AKS) is preferred.