30 Infrastructure As Code Tools jobs in Chennai
AI & Cloud Infrastructure Lead
Posted today
Job Description
**Required Skills and Qualifications:**
+ **Experience:**
+ Proven experience as an Oracle DBA with a deep understanding of Oracle Database management (versions 11g, 12c, and 19c) in both on-premises and cloud environments.
+ Strong experience working with **Oracle Cloud Infrastructure (OCI)** and cloud-based database services.
+ **Skills:**
+ Expert knowledge in database tuning, backup, recovery, security, and troubleshooting.
+ Hands-on experience with Oracle Enterprise Manager (OEM), RMAN, and SQL*Plus.
+ Proficient in cloud architecture and services related to Oracle Cloud.
+ Familiarity with cloud-native services such as **Oracle Autonomous Database** and **Oracle Exadata Cloud Service**.
+ Strong scripting skills (Shell, Python, etc.) for automation and management.
+ In-depth understanding of database replication, clustering, and high availability configurations.
+ Experience with performance monitoring and tuning using tools such as **AWR**, **ADDM**, and **ASH**.
+ **Certifications:**
+ Oracle Certified Professional (OCP) is highly preferred.
+ Oracle Cloud certification is a plus (OCI Architect or Oracle Autonomous Database certification).
+ **Soft Skills:**
+ Excellent problem-solving skills and attention to detail.
+ Strong communication and collaboration skills.
+ Ability to work independently and in a team environment.
+ Ability to manage multiple priorities and meet deadlines.
**Preferred Qualifications:**
+ Academics: B.Tech/B.E./MCA/M.Sc.
+ Experience in **Oracle RAC (Real Application Clusters)** and **Data Guard** configuration and management.
+ Experience with migration strategies from on-premises Oracle environments to Oracle Cloud.
+ Experience in Oracle Cloud networking and security best practices.
**Responsibilities**
**Key Responsibilities:**
+ **Database Administration:** Install, configure, manage, and maintain Oracle databases (Oracle 11g/12c/19c) both on-premises and in Oracle Cloud.
+ **Performance Tuning:** Perform database performance tuning, troubleshooting, and optimization for both traditional on-premises and cloud environments.
+ **Backup and Recovery:** Implement and manage backup and disaster recovery solutions for Oracle databases, ensuring data integrity and availability.
+ **Cloud Management:** Utilize Oracle Cloud Infrastructure (OCI) to deploy, monitor, and manage databases, implementing best practices for high availability, scalability, and security.
+ **Security & Compliance:** Manage database security configurations, enforce policies, and monitor for vulnerabilities. Ensure databases meet industry compliance standards.
+ **Capacity Planning:** Monitor database capacity, performance, and system utilization. Plan for scaling databases in the cloud to meet business needs.
+ **Automation & Scripting:** Automate routine DBA tasks such as backups, patches, and database monitoring using tools like shell scripting, Python, and Oracle Enterprise Manager.
+ **Troubleshooting:** Provide database troubleshooting and resolution for issues related to database connectivity, performance, and security.
+ **Collaboration:** Work with development teams to optimize SQL queries and database schemas for better application performance.
+ **Upgrades & Patching:** Plan and execute upgrades and patches to Oracle database systems, ensuring minimal downtime and data consistency.
+ **Documentation & Reporting:** Maintain up-to-date documentation of database environments, configurations, and procedures. Provide regular reports on database performance and security.
Career Level - IC3
**About Us**
As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry leaders in almost every sector and continue to thrive after 40+ years of change by operating with integrity.
We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing or by calling in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
Google Cloud Infrastructure Support Engineer
Posted 1 day ago
Job Description
Explore your next opportunity at a Fortune Global 500 organization. Envision innovative possibilities, experience our rewarding culture, and work with talented teams that help you become better every day. We know what it takes to lead UPS into tomorrow: people with a unique combination of skill and passion. If you have the qualities and drive to lead yourself or teams, there are roles ready to cultivate your skills and take you to the next level.
**Job Description:**
The Google Cloud Infrastructure Support Engineer will be responsible for ensuring the reliability, performance, and security of our Google Cloud Platform (GCP) infrastructure, working closely with cross-functional teams to troubleshoot issues, optimize infrastructure, and implement best practices for cloud architecture. The role requires:
+ Experience using Terraform to deploy and manage infrastructure templates.
+ The ability to administer BigQuery environments, including managing datasets and access controls and optimizing query performance.
+ Familiarity with Vertex AI for monitoring and managing machine learning model deployments.
+ Knowledge of Google Kubernetes Engine (GKE) and its integration with the wider GCP ecosystem.
+ An understanding of cloud security best practices and experience implementing security measures.
+ Knowledge of setting up and managing data clean rooms within BigQuery.
+ An understanding of the Analytics Hub platform and how it integrates with data clean rooms to support sensitive data-sharing use cases.
+ Knowledge of Dataplex and how it integrates with other Google Cloud services such as BigQuery, Dataproc Metastore, and Data Catalog.
**Key Responsibilities:**
+ Provide technical support for our Google Cloud Platform infrastructure, including compute, storage, networking, and security services.
+ Monitor system performance and proactively identify and resolve issues to ensure maximum uptime and reliability.
+ Collaborate with cross-functional teams to design, implement, and optimize cloud infrastructure solutions.
+ Automate repetitive tasks and develop scripts to streamline operations and improve efficiency.
+ Document infrastructure configurations, processes, and procedures.
**Qualifications:**
Required:
+ Strong understanding of GCP services, including Compute Engine, Kubernetes Engine, Cloud Storage, VPC networking, and IAM.
+ Experience with BigQuery and Vertex AI.
+ Proficiency in scripting languages such as Python, Bash, or PowerShell.
+ Experience with infrastructure as code tools such as Terraform or Google Cloud Deployment Manager.
+ Strong communication and collaboration skills.
+ Bachelor's degree in Computer Science or a related discipline, or the equivalent in education and work experience.
Preferred:
+ Google Cloud certification (e.g., Google Cloud Certified - Professional Cloud Architect, Google Cloud Certified - Professional Cloud DevOps Engineer)
**Employee Type:**
Permanent
UPS is committed to providing a workplace free of discrimination, harassment, and retaliation.
Oracle Cloud Infrastructure Architect, Chennai (preferred)/Mumbai/Pune (not looking for admin profiles)
Posted today
Job Description
Role:
The OCI Architect will lead the design, implementation, and management of robust cloud solutions on Oracle Cloud Infrastructure (OCI). This role demands deep technical expertise, strategic thinking, and hands-on experience in provisioning, migrating, and optimizing OCI environments.
Responsibilities:
+ Architect and implement scalable, secure, and cost-efficient solutions on OCI.
+ Lead environment provisioning, cloud migration, and deployment activities.
+ Design and develop OCI infrastructure covering compute, networking, storage, and security.
+ Automate provisioning and configuration using Infrastructure-as-Code (IaC) tools.
+ Enforce OCI security best practices and ensure compliance with regulatory standards.
Requirements:
+ 10-15 years of experience (strictly within this range).
+ Subject-matter expertise in OCI core services such as Compute, Storage, Networking, and IAM.
+ Strong understanding of OCI security and compliance frameworks.
+ Proven experience designing OCI architectures for enterprise workloads.
+ Proficiency in Terraform or OCI Resource Manager for IaC.
+ Hands-on experience with DevOps practices and CI/CD pipelines.
+ Familiarity with cloud observability tools for monitoring and performance tuning.
GCP Infrastructure Engineer - Google Cloud, Terraform, Python, Bash, GKE, CI/CD
Posted 1 day ago
Job Description
Discover your next opportunity at an organization ranked among the 500 largest companies in the world. Envision innovative possibilities, experience our rewarding culture, and work with talented teams that push you to grow every day. We know what it takes to lead UPS into the future: passionate people with a unique combination of skills. If you have the qualities, motivation, autonomy, or leadership to lead teams, there are roles suited to your aspirations and to your skills of today and tomorrow.
**Job Description:**
Job Summary:
We are seeking a highly skilled GCP Infrastructure Engineer to design, build, and manage the cloud infrastructure that powers Generative AI (GenAI) applications at scale. In this role, you will leverage Vertex AI on Google Cloud Platform (GCP), IBM Watsonx, and containerization technologies such as Docker and Kubernetes (GKE) to deliver secure, scalable, and high-performance AI solutions. You will own the end-to-end infrastructure lifecycle, from design and provisioning to automation, monitoring, and optimization, while enabling data scientists and ML engineers to deploy and operate GenAI workloads seamlessly.
Key Responsibilities:
Cloud Infrastructure & Platform Engineering
+ Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.
+ Deploy and manage containerized workloads using Docker and Kubernetes (GKE).
+ Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.
+ Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.
+ Ensure business continuity through backup, disaster recovery, and multi-region deployments.
Automation & Reliability
+ Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.
+ Adopt GitOps practices (Flux) for infrastructure lifecycle management.
+ Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.
+ Apply SRE principles (SLIs, SLOs, SLAs) to guarantee platform reliability and uptime.
Security, Governance & Compliance
+ Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.
+ Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards (HIPAA, SOX, GDPR, FedRAMP).
+ Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.
Monitoring, Observability & Cost Optimization
+ Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infrastructure health and ML-specific metrics (model drift, data anomalies).
+ Define KPIs to monitor system health, performance, and adoption across AI workloads.
+ Optimize cloud cost efficiency for GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring.
Collaboration & Enablement
+ Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.
+ Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.
+ Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration.
Required Education
Bachelor's or master's degree in Computer Science, Software Engineering, or a related field.
Required Experience
+ **5+ years** of experience in cloud infrastructure engineering, **DevOps**, or platform engineering.
+ Experience with GenAI use cases (chatbots, content generation, code assistants, etc.).
+ Strong hands-on expertise with **Google Cloud Platform (GCP)**, especially **Vertex AI**.
+ Experience with **IBM Watsonx** for AI application deployment and management.
+ Proven skills in **Docker**, **Kubernetes (GKE)**, and container orchestration at scale.
+ Proficiency in **Python**, **Bash**, or other relevant scripting languages.
+ Strong understanding of cloud networking, IAM, and security best practices.
+ Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins) and IaC tools (Terraform, Pulumi, Ansible, Deployment Manager).
+ Familiarity with data pipelines and integration tools (Dataflow, Apache Beam, Pub/Sub, Kafka).
+ Excellent problem-solving, debugging, and communication skills.
Preferred Experience
+ Experience in MLOps practices for model deployment, monitoring, and retraining.
+ Exposure to multi-cloud or hybrid cloud environments (GCP, AWS, Azure, on-prem).
+ Hands-on experience with feature stores (Vertex AI Feature Store, Feast) and ML observability tools (EvidentlyAI, Fiddler).
+ Knowledge of distributed training frameworks (Horovod, DeepSpeed, PyTorch Distributed).
+ Contributions to open-source projects in infrastructure, MLOps, or GenAI.
+ Experience managing infrastructure in regulated industries.
Preferred Certifications:
+ Google Cloud Certified - Professional Cloud Architect
+ Google Cloud Certified - Professional Machine Learning Engineer
+ Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
+ IBM Certified Watsonx Generative AI Engineer - Associate
+ IBM Certified Solution Architect - Cloud Pak for Data
+ Other relevant certifications in AI, Machine Learning, or Cloud-Native technologies.
**Employee Type:**
Permanent
_At UPS, equal opportunity, fair treatment, and an inclusive work environment are key values to which we are committed._
GCP Infrastructure Engineer - Google Cloud, Terraform, Python, Bash, GKE, CI/CD
Posted 1 day ago
Job Viewed
Job Description
Explore your next opportunity at a Fortune Global 500 organization. Envision innovative possibilities, experience our rewarding culture, and work with talented teams that help you become better every day. We know what it takes to lead UPS into tomorrow-people with a unique combination of skill + passion. If you have the qualities and drive to lead yourself or teams, there are roles ready to cultivate your skills and take you to the next level.
**Job Description:**
Job Summary:
We are seeking a highly skilled GCP Infrastructure Engineer to design, build, and manage the cloud infrastructure that powers Generative AI (GenAI) applications at scale. In this role, you will leverage Google Cloud Platform (GCP) Vertex AI, IBM Watsonx, and containerization technologies such as Docker and Kubernetes (GKE) to deliver secure, scalable, and high-performance AI solutions. You will own the end-to-end infrastructure lifecycle - from design and provisioning to automation, monitoring, and optimization - while enabling data scientists and ML engineers to seamlessly deploy and operate GenAI workloads.
Key Responsibilities:
Cloud Infrastructure & Platform Engineering
+ Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.
+ Deploy and manage containerized workloads using Docker and Kubernetes (GKE).
+ Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.
+ Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.
+ Ensure business continuity through backup, disaster recovery, and multi-region deployments.
Automation & Reliability
+ Develop and maintain Infrastructure as Code (IaC) templates with Terraform, or Cloud Deployment Manager.
+ Adopt GitOps practices (Flux) for infrastructure lifecycle management.
+ Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.
+ Apply SRE principles (SLIs, SLOs, SLAs) to guarantee platform reliability and uptime.
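The SLI/SLO practice mentioned above is often made concrete with an error budget. The following is a minimal Python sketch, assuming a request-based SLI (good events divided by total events) measured over a single window; `evaluate_slo` and `SloReport` are hypothetical names, and SLAs (the contractual layer built on top of SLOs) are not modeled.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # fraction of good events in the window
    error_budget: float      # number of bad events the SLO allows
    budget_remaining: float  # fraction of budget left (negative means the SLO is breached)
    burn_rate: float         # observed error rate / allowed error rate (1.0 = on pace)


def evaluate_slo(good_events: int, total_events: int, slo_target: float) -> SloReport:
    """Compute SLI, error budget, and burn rate for one measurement window."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    bad_events = total_events - good_events
    allowed_error_rate = 1.0 - slo_target
    error_budget = allowed_error_rate * total_events
    return SloReport(
        sli=good_events / total_events,
        error_budget=error_budget,
        budget_remaining=(error_budget - bad_events) / error_budget,
        burn_rate=(bad_events / total_events) / allowed_error_rate,
    )


# 1,000,000 requests with 500 failures against a 99.9% SLO:
# a 1,000-request budget, half of it spent, burning at 0.5x the allowed pace.
print(evaluate_slo(999_500, 1_000_000, 0.999))
```

A burn rate above 1.0 means the budget will run out before the window ends, which is a common trigger for alerting.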
**Security, Governance & Compliance**
+ Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.
+ Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards (HIPAA, SOX, GDPR, FedRAMP).
+ Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.
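As a small illustration of the policy-as-code and IAM bullets above, the Python sketch below scans a Google Cloud IAM policy document (the `bindings` / `role` / `members` JSON shape) for two example rules: basic roles (`roles/owner`, `roles/editor`, `roles/viewer`) and grants to the public principals `allUsers` and `allAuthenticatedUsers`. The rule set and the name `find_violations` are illustrative assumptions; production setups typically express such rules in a dedicated policy engine rather than ad hoc scripts.

```python
from typing import Any

# Example rules only; a real rule set would come from the security team.
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}
PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}


def find_violations(policy: dict[str, Any]) -> list[str]:
    """Return human-readable violations found in an IAM policy document."""
    violations = []
    for binding in policy.get("bindings", []):
        role = binding.get("role", "")
        for member in binding.get("members", []):
            if member in PUBLIC_PRINCIPALS:
                violations.append(f"{role} is granted to public principal {member}")
            elif role in BASIC_ROLES:
                violations.append(f"basic role {role} is granted to {member}")
    return violations


policy = {
    "bindings": [
        {"role": "roles/editor", "members": ["user:dev@example.com"]},
        {"role": "roles/storage.objectViewer", "members": ["allUsers"]},
        {"role": "roles/aiplatform.user",
         "members": ["serviceAccount:trainer@my-project.iam.gserviceaccount.com"]},
    ]
}
for violation in find_violations(policy):
    print(violation)
```

Run in a CI pipeline against exported policies, a non-empty result can fail the build before an overly broad grant reaches production.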
**Monitoring, Observability & Cost Optimization**
+ Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infrastructure health and ML-specific metrics (model drift, data anomalies).
+ Define KPIs to monitor system health, performance, and adoption across AI workloads.
+ Optimize cloud cost efficiency for GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring.
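One common way to quantify the model-drift and data-anomaly metrics mentioned above is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against a baseline: `PSI = sum((a_i - e_i) * ln(a_i / e_i))` over bins. This is an illustrative choice, not necessarily what any given observability tool computes. The equal-width binning, the `eps` floor for empty bins, and the name `population_stability_index` are assumptions of this Python sketch.

```python
import math
from typing import Sequence


def population_stability_index(expected: Sequence[float], actual: Sequence[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    """PSI between a baseline (expected) sample and a production (actual) sample."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(sample: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            # Bins come from the baseline range; out-of-range values go to the edge bins.
            idx = min(bins - 1, max(0, int((x - lo) / width)))
            counts[idx] += 1
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(sample), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


baseline = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in baseline]
print(population_stability_index(baseline, baseline))  # identical distributions
print(population_stability_index(baseline, shifted))   # large shift
```

An often-quoted rule of thumb treats PSI below 0.1 as little change and above 0.25 as a significant shift, but thresholds should be tuned per feature.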
**Collaboration & Enablement**
+ Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.
+ Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.
+ Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration.
**Required Education**
Bachelor's or master's degree in Computer Science, Software Engineering, or a related field.
**Required Experience**
+ **5+ years** of experience in cloud infrastructure engineering, **DevOps**, or platform engineering.
+ Experience with GenAI use cases (chatbots, content generation, code assistants, etc.).
+ Strong hands-on expertise with **Google Cloud Platform (GCP)**, especially **Vertex AI**.
+ Experience with **IBM Watsonx** for AI application deployment and management.
+ Proven skills in **Docker**, **Kubernetes (GKE)**, and container orchestration at scale.
+ Proficiency in **Python**, **Bash**, or other relevant scripting languages.
+ Strong understanding of cloud networking, IAM, and security best practices.
+ Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins) and IaC tools (Terraform, Pulumi, Ansible, Deployment Manager).
+ Familiarity with data pipelines and integration tools (Dataflow, Apache Beam, Pub/Sub, Kafka).
+ Excellent problem-solving, debugging, and communication skills.
**Preferred Experience**
+ Experience in MLOps practices for model deployment, monitoring, and retraining.
+ Exposure to multi-cloud or hybrid cloud environments (GCP, AWS, Azure, on-prem).
+ Hands-on experience with feature stores (Vertex AI Feature Store, Feast) and ML observability tools (EvidentlyAI, Fiddler).
+ Knowledge of distributed training frameworks (Horovod, DeepSpeed, PyTorch Distributed).
+ Contributions to open-source projects in infrastructure, MLOps, or GenAI.
+ Experience managing infrastructure in regulated industries.
**Preferred Certifications:**
+ Google Cloud Certified - Professional Cloud Architect
+ Google Cloud Certified - Professional Machine Learning Engineer
+ Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
+ IBM Certified Watsonx Generative AI Engineer - Associate
+ IBM Certified Solution Architect - Cloud Pak for Data
+ Other relevant certifications in AI, Machine Learning, or Cloud-Native technologies.
**Employee Type:**
Permanent
UPS is committed to providing a workplace free of discrimination, harassment, and retaliation.