Terraform
Posted 6 days ago
Job Description
Overall Responsibilities:
Design and implement Infrastructure as Code (IaC) solutions using Terraform.
Develop and maintain reusable Terraform modules for cloud infrastructure automation.
Collaborate with cloud architects, DevOps, and security teams to optimize cloud deployments.
Ensure scalability, security, and compliance of infrastructure solutions.
Improve cloud infrastructure reliability through automated provisioning and monitoring.
Document best practices, standards, and Terraform coding guidelines.
Provide mentorship and guidance to junior Terraform developers and cloud engineers.
Technical Responsibilities:
Write and manage Terraform scripts for provisioning AWS, Azure, and GCP infrastructure.
Optimize Terraform configurations for high availability, cost efficiency, and security.
Integrate Terraform with CI/CD pipelines using GitHub Actions, Jenkins, and AWS CodePipeline.
Manage state files and remote backends using Terraform Cloud or S3 with DynamoDB locking.
Implement Role-Based Access Control (RBAC) and security best practices in IaC deployments.
Troubleshoot and debug Terraform-related issues in production and staging environments.
Automate infrastructure testing using tools like Terratest and Checkov.
Contribute to infrastructure governance, enforcing policies via Sentinel or Open Policy Agent (OPA).
Tools & Technologies (as applicable):
Terraform, AWS, Azure, GCP, Terraform Cloud, AWS CloudFormation, Kubernetes, Docker, Ansible, Jenkins, GitHub Actions, AWS CodePipeline, Prometheus, Datadog, Sentinel, Open Policy Agent (OPA), Terratest, Checkov
Preferred Experience:
Extensive experience in designing, developing, and managing Terraform-based infrastructure.
Strong knowledge of cloud platforms (AWS, Azure, GCP) and their best practices.
Experience integrating Terraform with DevOps pipelines and automation workflows.
Hands-on expertise in Terraform security, governance, and policy enforcement.
Familiarity with containerization and orchestration using Kubernetes and Docker.
Experience in cloud migration projects and hybrid cloud architectures.
Golang+Terraform
Posted 3 days ago
Job Description
Must-Have
- Proficiency in Go
- Proficiency in Terraform/HCL (HashiCorp Configuration Language)
- Proficiency in SDLC concepts and tools (Bitbucket, Jira, Jenkins, etc.)
- Fundamental knowledge of IT Service Management/ITIL processes and tools
- REST API connectivity programming concepts
- Appreciation for clean and well-documented code
- Use of source control products and methodologies
- 10+ years of total IT experience
- 8+ years of experience in middleware integration
- 4+ years of experience designing and deploying integration services solutions using MuleSoft
- Advanced knowledge of SOA design patterns for building middleware systems from the ground up using message routing, content enrichment, message filtering, message transformation, guaranteed delivery, message sequencing, batch message processing, error handling, and reconciliation mechanisms
- Experienced in engineering/upgrading complex solutions in line with demanding client requirements
- Experienced in J2EE, Java Servlets, JMS, EJB, Design Patterns, Web services and building integration projects
- Strong troubleshooting skills
- Excellent communication skills; candidate must be able to concisely communicate complex technical topics
- Proficient in Agile, Scrum, and iterative SDLC practices
Good-to-Have
- Build integration flows, Terraform modules and/or APIs using MuleSoft Anypoint Studio 3.x and 4.x, Java, JMS, Structured Query Language (SQL)
- Fundamental knowledge of IT Service Management/ITIL processes and tools
- REST API connectivity programming concepts
- Appreciation for clean and well-documented code
- Use of source control products and methodologies
Experience Range: 5 to 10 years
Location: Chennai/Hyderabad/Bangalore/Mumbai/Indore
DevOps Engineer - Kubernetes / OpenShift, Terraform and Azure / AWS DevOps
Posted today
Job Description
**Job Summary**
We are seeking an experienced Infra Dev Specialist with 4 to 8 years of experience to join our team. The ideal candidate will have expertise in Docker, Ansible, Kubernetes, Linux, CI/CD, Azure DevOps, Terraform, and shell scripting. This role is hybrid with no travel required and involves working closely with our development and operations teams to ensure seamless infrastructure management and deployment.
**Responsibilities**
+ Develop and maintain scalable infrastructure solutions using Docker, Ansible, Kubernetes, Linux, CI/CD, Azure DevOps, Terraform, and shell scripting to support business applications.
+ Implement and manage Docker Containers to ensure efficient application deployment and operation.
+ Collaborate with cross-functional teams to design and optimize Kubernetes Work Models for improved performance.
+ Monitor and troubleshoot infrastructure issues to minimize downtime and ensure system reliability.
+ Automate deployment processes to enhance efficiency and reduce manual intervention.
+ Provide technical guidance and support to development teams to ensure best practices in containerization.
+ Conduct regular system audits to identify potential improvements and implement necessary changes.
+ Document infrastructure configurations and processes to maintain a comprehensive knowledge base.
+ Stay updated with the latest industry trends and technologies to continuously improve infrastructure solutions.
+ Ensure compliance with security policies and procedures to protect company data and assets.
+ Participate in capacity planning to ensure infrastructure scalability and performance.
+ Optimize resource utilization to reduce costs and improve system efficiency.
+ Contribute to the company's purpose by ensuring robust and reliable infrastructure that supports business growth.
**Qualifications**
+ Possess strong experience in Kubernetes Services and Docker Container management.
+ Demonstrate proficiency in Kubernetes Work Model design and implementation.
+ Exhibit excellent problem-solving skills with a focus on infrastructure optimization.
+ Have a solid understanding of automation tools and techniques for deployment processes.
+ Show ability to work collaboratively in a hybrid work environment.
+ Display strong communication skills to effectively interact with cross-functional teams.
+ Maintain a proactive approach to learning and adapting to new technologies.
**Certifications Required**
Certified Kubernetes Administrator (CKA)
Cognizant is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, color, religion, national origin, disability, protected Veteran status, age, or any other characteristic protected by law.
GCP Infrastructure Engineer - Google Cloud, Terraform, Python, Bash, GKE, CI/CD
Posted 1 day ago
Job Description
Discover your next opportunity at an organization that ranks among the world's 500 largest companies. Envision innovative opportunities, discover our rewarding culture, and work with talented teams that push you to grow every day. We know what it takes to lead UPS into the future: passionate people with a unique combination of skills. If you have the qualities, motivation, autonomy, or leadership to lead teams, there are roles suited to your aspirations and to your skills, today and tomorrow.
**Job Description:**
Job Summary:
We are seeking a highly skilled GCP Infrastructure Engineer to design, build, and manage the cloud infrastructure that powers Generative AI (GenAI) applications at scale. In this role, you will leverage Google Cloud Platform (GCP) Vertex AI, IBM Watsonx, and containerization technologies such as Docker and Kubernetes (GKE) to deliver secure, scalable, and high-performance AI solutions. You will own the end-to-end infrastructure lifecycle - from design and provisioning to automation, monitoring, and optimization - while enabling data scientists and ML engineers to seamlessly deploy and operate GenAI workloads.
Key Responsibilities:
Cloud Infrastructure & Platform Engineering
+ Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.
+ Deploy and manage containerized workloads using Docker and Kubernetes (GKE).
+ Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.
+ Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.
+ Ensure business continuity through backup, disaster recovery, and multi-region deployments.
Automation & Reliability
+ Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.
+ Adopt GitOps practices (Flux) for infrastructure lifecycle management.
+ Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.
+ Apply SRE principles (SLIs, SLOs, SLAs) to guarantee platform reliability and uptime.
Security, Governance & Compliance
+ Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.
+ Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards (HIPAA, SOX, GDPR, FedRAMP).
+ Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.
Monitoring, Observability & Cost Optimization
+ Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infra health and ML-specific metrics (model drift, data anomalies).
+ Define KPIs to monitor system health, performance, and adoption across AI workloads.
+ Optimize cloud cost efficiency for GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring.
Collaboration & Enablement
+ Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.
+ Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.
+ Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration.
Required Education
Bachelor's or master's degree in Computer Science, Software Engineering, or a related field.
Required Experience
+ **5+ years** of experience in cloud infrastructure engineering, **DevOps,** or platform engineering.
+ Experience with GenAI use cases (chatbots, content generation, code assistants, etc.).
+ Strong hands-on expertise with **Google Cloud Platform (GCP),** especially **Vertex AI.**
+ Experience with **IBM Watsonx for AI application** deployment and management.
+ Proven skills in **Docker, Kubernetes (GKE),** and container orchestration at scale.
+ Proficiency in **Python, Bash,** or other relevant scripting languages.
+ Strong understanding of cloud networking, IAM, and security best practices.
+ Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins) and IaC tools (Terraform, Pulumi, Ansible, Deployment Manager).
+ Familiarity with data pipelines and integration tools (Dataflow, Apache Beam, Pub/Sub, Kafka).
+ Excellent problem-solving, debugging, and communication skills.
Preferred Experience
+ Experience in MLOps practices for model deployment, monitoring, and retraining.
+ Exposure to multi-cloud or hybrid cloud environments (GCP, AWS, Azure, on-prem).
+ Hands-on experience with feature stores (Vertex AI Feature Store, Feast) and ML observability tools (EvidentlyAI, Fiddler).
+ Knowledge of distributed training frameworks (Horovod, DeepSpeed, PyTorch Distributed).
+ Contributions to open-source projects in infrastructure, MLOps, or GenAI.
+ Experience managing infrastructure in regulated industries.
Preferred Certifications:
+ Google Cloud Certified - Professional Cloud Architect
+ Google Cloud Certified - Machine Learning Engineer
+ Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
+ IBM Certified Watsonx Generative AI Engineer - Associate
+ IBM Certified Solution Architect - Cloud Pak for Data
+ Other relevant certifications in AI, Machine Learning, or Cloud-Native technologies.
**Contract Type:**
Permanent
_At UPS, equal opportunity, fair treatment, and an inclusive work environment are key values to which we are committed._
GCP Infrastructure Engineer - Google Cloud, Terraform, Python, Bash, GKE, CI/CD
Posted 1 day ago
Job Description
Explore your next opportunity at a Fortune Global 500 organization. Envision innovative possibilities, experience our rewarding culture, and work with talented teams that help you become better every day. We know what it takes to lead UPS into tomorrow: people with a unique combination of skill + passion. If you have the qualities and drive to lead yourself or teams, there are roles ready to cultivate your skills and take you to the next level.
**Job Description:**
Job Summary:
We are seeking a highly skilled GCP Infrastructure Engineer to design, build, and manage the cloud infrastructure that powers Generative AI (GenAI) applications at scale. In this role, you will leverage Google Cloud Platform (GCP) Vertex AI, IBM Watsonx, and containerization technologies such as Docker and Kubernetes (GKE) to deliver secure, scalable, and high-performance AI solutions. You will own the end-to-end infrastructure lifecycle - from design and provisioning to automation, monitoring, and optimization - while enabling data scientists and ML engineers to seamlessly deploy and operate GenAI workloads.
Key Responsibilities:
Cloud Infrastructure & Platform Engineering
+ Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.
+ Deploy and manage containerized workloads using Docker and Kubernetes (GKE).
+ Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.
+ Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.
+ Ensure business continuity through backup, disaster recovery, and multi-region deployments.
Automation & Reliability
+ Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.
+ Adopt GitOps practices (Flux) for infrastructure lifecycle management.
+ Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.
+ Apply SRE principles (SLIs, SLOs, SLAs) to guarantee platform reliability and uptime.
Security, Governance & Compliance
+ Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.
+ Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards (HIPAA, SOX, GDPR, FedRAMP).
+ Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.
Monitoring, Observability & Cost Optimization
+ Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infra health and ML-specific metrics (model drift, data anomalies).
+ Define KPIs to monitor system health, performance, and adoption across AI workloads.
+ Optimize cloud cost efficiency for GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring.
Collaboration & Enablement
+ Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.
+ Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.
+ Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration.
Required Education
Bachelor's or master's degree in Computer Science, Software Engineering, or a related field.
Required Experience
+ **5+ years** of experience in cloud infrastructure engineering, **DevOps,** or platform engineering.
+ Experience with GenAI use cases (chatbots, content generation, code assistants, etc.).
+ Strong hands-on expertise with **Google Cloud Platform (GCP),** especially **Vertex AI.**
+ Experience with **IBM Watsonx for AI application** deployment and management.
+ Proven skills in **Docker, Kubernetes (GKE),** and container orchestration at scale.
+ Proficiency in **Python, Bash,** or other relevant scripting languages.
+ Strong understanding of cloud networking, IAM, and security best practices.
+ Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins) and IaC tools (Terraform, Pulumi, Ansible, Deployment Manager).
+ Familiarity with data pipelines and integration tools (Dataflow, Apache Beam, Pub/Sub, Kafka).
+ Excellent problem-solving, debugging, and communication skills.
Preferred Experience
+ Experience in MLOps practices for model deployment, monitoring, and retraining.
+ Exposure to multi-cloud or hybrid cloud environments (GCP, AWS, Azure, on-prem).
+ Hands-on experience with feature stores (Vertex AI Feature Store, Feast) and ML observability tools (EvidentlyAI, Fiddler).
+ Knowledge of distributed training frameworks (Horovod, DeepSpeed, PyTorch Distributed).
+ Contributions to open-source projects in infrastructure, MLOps, or GenAI.
+ Experience managing infrastructure in regulated industries.
Preferred Certifications:
+ Google Cloud Certified - Professional Cloud Architect
+ Google Cloud Certified - Machine Learning Engineer
+ Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
+ IBM Certified Watsonx Generative AI Engineer - Associate
+ IBM Certified Solution Architect - Cloud Pak for Data
+ Other relevant certifications in AI, Machine Learning, or Cloud-Native technologies.
**Employee Type:**
Permanent
UPS is committed to providing a workplace free of discrimination, harassment, and retaliation.
GCP Infrastructure Engineer - Google Cloud, Terraform, Python, Bash, GKE, CI/CD
Posted 1 day ago
Job Description
Discover your next opportunity at an organization that ranks among the world's 500 largest companies. Envision innovative opportunities, discover our rewarding culture, and work with talented teams that push you to grow every day. We know what it takes to lead UPS into the future: passionate people with a unique combination of skills. If you have the qualities, motivation, autonomy, or leadership to lead teams, there are roles suited to your aspirations and to your skills, today and tomorrow.
**Job Description:**
**Job Summary:**
We are seeking a highly skilled GCP Infrastructure Engineer to design, build, and manage the cloud infrastructure that powers Generative AI (GenAI) applications at scale. In this role, you will leverage Google Cloud Platform (GCP) Vertex AI, IBM Watsonx, and containerization technologies such as Docker and Kubernetes (GKE) to deliver secure, scalable, and high-performance AI solutions. You will own the end-to-end infrastructure lifecycle - from design and provisioning to automation, monitoring, and optimization - while enabling data scientists and ML engineers to seamlessly deploy and operate GenAI workloads.
**Key Responsibilities:**
**Cloud Infrastructure & Platform Engineering**
+ Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.
+ Deploy and manage containerized workloads using Docker and Kubernetes (GKE).
+ Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.
+ Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.
+ Ensure business continuity through backup, disaster recovery, and multi-region deployments.
**Automation & Reliability**
+ Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.
+ Adopt GitOps practices (Flux) for infrastructure lifecycle management.
+ Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.
+ Apply SRE principles (SLIs, SLOs, SLAs) to guarantee platform reliability and uptime.
**Security, Governance & Compliance**
+ Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.
+ Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards (HIPAA, SOX, GDPR, FedRAMP).
+ Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.
**Monitoring, Observability & Cost Optimization**
+ Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infra health and ML-specific metrics (model drift, data anomalies).
+ Define KPIs to monitor system health, performance, and adoption across AI workloads.
+ Optimize cloud cost efficiency for GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring.
**Collaboration & Enablement**
+ Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.
+ Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.
+ Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration.
**Required Qualifications**
**Education**
Bachelor's or master's degree in computer science, software engineering, or a related field.
**Experience**
+ **5+ years** of experience in cloud infrastructure engineering, **DevOps,** or platform engineering.
+ Experience with GenAI use cases (chatbots, content generation, code assistants, etc.).
+ Strong hands-on expertise with **Google Cloud Platform (GCP),** especially **Vertex AI.**
+ Experience with **IBM Watsonx** for AI application deployment and management.
+ Proven skills in **Docker, Kubernetes (GKE),** and container orchestration at scale.
+ Proficiency in **Python, Bash,** or other relevant scripting languages.
+ Strong understanding of cloud networking, IAM, and security best practices.
+ Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins) and IaC tools (Terraform, Pulumi, Ansible, Deployment Manager).
+ Familiarity with data pipelines and integration tools (Dataflow, Apache Beam, Pub/Sub, Kafka).
+ Excellent problem-solving, debugging, and communication skills.
**Preferred Experience**
+ Experience in MLOps practices for model deployment, monitoring, and retraining.
+ Exposure to multi-cloud or hybrid cloud environments (GCP, AWS, Azure, on-prem).
+ Hands-on experience with feature stores (Vertex AI Feature Store, Feast) and ML observability tools (EvidentlyAI, Fiddler).
+ Knowledge of distributed training frameworks (Horovod, DeepSpeed, PyTorch Distributed).
+ Contributions to open-source projects in infrastructure, MLOps, or GenAI.
+ Experience managing infrastructure in regulated industries.
**Preferred Certifications:**
+ Google Cloud Certified - Professional Cloud Architect
+ Google Cloud Certified - Machine Learning Engineer
+ Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
+ IBM Certified Watsonx Generative AI Engineer - Associate
+ IBM Certified Solution Architect - Cloud Pak for Data
+ Other relevant certifications in AI, Machine Learning, or Cloud-Native technologies.
**Contract Type:**
Permanent contract (CDI)
_At UPS, equal opportunity, fair treatment, and an inclusive work environment are key values to which we are committed._
GCP Infrastructure Engineer - Google Cloud, Terraform, Python, Bash, GKE, CI/CD
Posted 1 day ago
Job Description
Explore your next opportunity at a Fortune Global 500 organization. Envision innovative possibilities, experience our rewarding culture, and work with talented teams that help you become better every day. We know what it takes to lead UPS into tomorrow: people with a unique combination of skill and passion. If you have the qualities and drive to lead yourself or teams, there are roles ready to cultivate your skills and take you to the next level.
**Job Description:**
**Job Summary:**
We are seeking a highly skilled GCP Infrastructure Engineer to design, build, and manage the cloud infrastructure that powers Generative AI (GenAI) applications at scale. In this role, you will leverage Google Cloud Platform (GCP) Vertex AI, IBM Watsonx, and containerization technologies such as Docker and Kubernetes (GKE) to deliver secure, scalable, and high-performance AI solutions. You will own the end-to-end infrastructure lifecycle - from design and provisioning to automation, monitoring, and optimization - while enabling data scientists and ML engineers to seamlessly deploy and operate GenAI workloads.
**Key Responsibilities:**
**Cloud Infrastructure & Platform Engineering**
+ Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.
+ Deploy and manage containerized workloads using Docker and Kubernetes (GKE).
+ Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.
+ Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.
+ Ensure business continuity through backup, disaster recovery, and multi-region deployments.
**Automation & Reliability**
+ Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.
+ Adopt GitOps practices (Flux) for infrastructure lifecycle management.
+ Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.
+ Apply SRE principles (SLIs, SLOs, SLAs) to guarantee platform reliability and uptime.
**Security, Governance & Compliance**
+ Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.
+ Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards (HIPAA, SOX, GDPR, FedRAMP).
+ Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.
**Monitoring, Observability & Cost Optimization**
+ Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infra health and ML-specific metrics (model drift, data anomalies).
+ Define KPIs to monitor system health, performance, and adoption across AI workloads.
+ Optimize cloud cost efficiency for GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring.
**Collaboration & Enablement**
+ Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.
+ Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.
+ Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration.
**Required Qualifications**
**Education**
Bachelor's or master's degree in computer science, software engineering, or a related field.
**Experience**
+ **5+ years** of experience in cloud infrastructure engineering, **DevOps,** or platform engineering.
+ Experience with GenAI use cases (chatbots, content generation, code assistants, etc.).
+ Strong hands-on expertise with **Google Cloud Platform (GCP),** especially **Vertex AI.**
+ Experience with **IBM Watsonx** for AI application deployment and management.
+ Proven skills in **Docker, Kubernetes (GKE),** and container orchestration at scale.
+ Proficiency in **Python, Bash,** or other relevant scripting languages.
+ Strong understanding of cloud networking, IAM, and security best practices.
+ Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins) and IaC tools (Terraform, Pulumi, Ansible, Deployment Manager).
+ Familiarity with data pipelines and integration tools (Dataflow, Apache Beam, Pub/Sub, Kafka).
+ Excellent problem-solving, debugging, and communication skills.
**Preferred Experience**
+ Experience in MLOps practices for model deployment, monitoring, and retraining.
+ Exposure to multi-cloud or hybrid cloud environments (GCP, AWS, Azure, on-prem).
+ Hands-on experience with feature stores (Vertex AI Feature Store, Feast) and ML observability tools (EvidentlyAI, Fiddler).
+ Knowledge of distributed training frameworks (Horovod, DeepSpeed, PyTorch Distributed).
+ Contributions to open-source projects in infrastructure, MLOps, or GenAI.
+ Experience managing infrastructure in regulated industries.
**Preferred Certifications:**
+ Google Cloud Certified - Professional Cloud Architect
+ Google Cloud Certified - Machine Learning Engineer
+ Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
+ IBM Certified Watsonx Generative AI Engineer - Associate
+ IBM Certified Solution Architect - Cloud Pak for Data
+ Other relevant certifications in AI, Machine Learning, or Cloud-Native technologies.
**Employee Type:**
Permanent
UPS is committed to providing a workplace free of discrimination, harassment, and retaliation.
GCP Infrastructure Engineer - Google Cloud, Terraform, Python, Bash, GKE, CI/CD
Posted 1 day ago
Job Description
Discover your next opportunity at an organization that ranks among the world's 500 largest companies. Envision innovative opportunities, experience our rewarding culture, and work with talented teams that push you to grow every day. We know what it takes to lead UPS into the future: passionate people with a unique combination of skills. If you have the qualities, motivation, autonomy, or leadership to lead teams, there are roles suited to your aspirations and to your skills, today and tomorrow.
**Job Description:**
**Job Summary:**
We are seeking a highly skilled GCP Infrastructure Engineer to design, build, and manage the cloud infrastructure that powers Generative AI (GenAI) applications at scale. In this role, you will leverage Google Cloud Platform (GCP) Vertex AI, IBM Watsonx, and containerization technologies such as Docker and Kubernetes (GKE) to deliver secure, scalable, and high-performance AI solutions. You will own the end-to-end infrastructure lifecycle - from design and provisioning to automation, monitoring, and optimization - while enabling data scientists and ML engineers to seamlessly deploy and operate GenAI workloads.
**Key Responsibilities:**
**Cloud Infrastructure & Platform Engineering**
+ Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.
+ Deploy and manage containerized workloads using Docker and Kubernetes (GKE).
+ Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.
+ Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.
+ Ensure business continuity through backup, disaster recovery, and multi-region deployments.
**Automation & Reliability**
+ Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.
+ Adopt GitOps practices (Flux) for infrastructure lifecycle management.
+ Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.
+ Apply SRE principles (SLIs, SLOs, SLAs) to guarantee platform reliability and uptime.
**Security, Governance & Compliance**
+ Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.
+ Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards (HIPAA, SOX, GDPR, FedRAMP).
+ Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.
**Monitoring, Observability & Cost Optimization**
+ Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infra health and ML-specific metrics (model drift, data anomalies).
+ Define KPIs to monitor system health, performance, and adoption across AI workloads.
+ Optimize cloud cost efficiency for GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring.
**Collaboration & Enablement**
+ Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.
+ Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.
+ Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration.
**Required Education**
Bachelor's or master's degree in computer science, software engineering, or a related field.
**Required Experience**
+ **8+ years** of experience in cloud infrastructure engineering, **DevOps,** or platform engineering.
+ Experience with GenAI use cases (chatbots, content generation, code assistants, etc.).
+ Strong hands-on expertise with **Google Cloud Platform (GCP),** especially **Vertex AI.**
+ Experience with **IBM Watsonx** for AI application deployment and management.
+ Proven skills in **Docker, Kubernetes (GKE),** and container orchestration at scale.
+ Proficiency in **Python, Bash,** or other relevant scripting languages.
+ Strong understanding of cloud networking, IAM, and security best practices.
+ Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins) and IaC tools (Terraform, Pulumi, Ansible, Deployment Manager).
+ Familiarity with data pipelines and integration tools (Dataflow, Apache Beam, Pub/Sub, Kafka).
+ Excellent problem-solving, debugging, and communication skills.
**Preferred Experience**
+ Experience in MLOps practices for model deployment, monitoring, and retraining.
+ Exposure to multi-cloud or hybrid cloud environments (GCP, AWS, Azure, on-prem).
+ Hands-on experience with feature stores (Vertex AI Feature Store, Feast) and ML observability tools (EvidentlyAI, Fiddler).
+ Knowledge of distributed training frameworks (Horovod, DeepSpeed, PyTorch Distributed).
+ Contributions to open-source projects in infrastructure, MLOps, or GenAI.
+ Experience managing infrastructure in regulated industries.
**Preferred Certifications:**
+ Google Cloud Certified - Professional Cloud Architect
+ Google Cloud Certified - Machine Learning Engineer
+ Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
+ IBM Certified Watsonx Generative AI Engineer - Associate
+ IBM Certified Solution Architect - Cloud Pak for Data
+ Other relevant certifications in AI, Machine Learning, or Cloud-Native technologies.
**Contract Type:**
Permanent contract (CDI)
_At UPS, equal opportunity, fair treatment, and an inclusive work environment are key values to which we are committed._