5,695 Data Preprocessing jobs in India
Director Data Science & Data Engineering
Posted today
Job Description
At eBay, we're more than a global ecommerce leader — we’re changing the way the world shops and sells. Our platform empowers millions of buyers and sellers in more than 190 markets around the world. We’re committed to pushing boundaries and leaving our mark as we reinvent the future of ecommerce for enthusiasts.
Our customers are our compass, authenticity thrives, bold ideas are welcome, and everyone can bring their unique selves to work — every day. We're in this together, sustaining the future of our customers, our company, and our planet.
Join a team of passionate thinkers, innovators, and dreamers — and help us connect people and build communities to create economic opportunity for all.
Director – Data Science & Data Engineering
Shape the Future of AI-Driven eCommerce Discovery
About the Role
We're reimagining how people discover products in eCommerce—and we're looking for a visionary leader who blends technical depth with product intuition. If you're passionate about structured data, large language models, and building high-impact data products, this role is tailor-made for you.
As Director of Data Science & Data Engineering, you’ll lead a talented team of data scientists, analysts, and engineers working at the cutting edge of AI/ML, product analytics, and taxonomy design. Your mission? Drive innovation in product discovery through smarter data, scalable infrastructure, and breakthrough AI-powered solutions.
You’ll join the Product Knowledge org and play a key role in designing the backbone of next-gen search, recommendations, and generative AI experiences.
This is a high-impact, high-agency role—perfect for a hands-on leader who thrives in fast-paced, collaborative environments.
What You’ll Work On
Lead and inspire a cross-functional team to:
Transform Product Data into Insights
Conduct deep-dive SQL and Python analyses to uncover opportunities in taxonomy, ontology, and catalog structure that enhance discovery and user experience.
Harness the Power of Generative AI
Use prompt engineering and LLMs to create innovative tools for classification, taxonomy validation, and data enrichment.
Build & Evaluate AI/ML Models
Design frameworks to evaluate product knowledge models, semantic embeddings, and ML-based categorization systems.
Drive Data-Informed Strategy
Translate complex findings into clear, actionable insights for Product and Engineering teams. Influence roadmap decisions on entity resolution, catalog optimization, and knowledge graph development.
Partner Across Functions
Collaborate closely with Applied Research, Engineering, and Product teams to build and deploy high-impact data and AI solutions at scale.
Experiment & Innovate Fast
Prototype quickly, validate hypotheses, and iterate on structured data and AI-driven solutions that push boundaries.
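The LLM-based classification and taxonomy-validation work described above can be sketched roughly as follows. This is an illustrative prompt-template pattern only: `CATEGORIES` is a made-up taxonomy slice, and no real model client is shown (a production version would send the prompt to whichever LLM API the team uses).

```python
# Sketch: constrained-prompt approach to LLM-assisted product categorization.
# The category list is illustrative; a real taxonomy would be far larger.

CATEGORIES = ["Electronics > Headphones", "Home > Kitchen", "Fashion > Shoes"]

def build_classification_prompt(title, categories):
    """Build a prompt asking the model to pick exactly one known category."""
    options = "\n".join(f"- {c}" for c in categories)
    return (
        "Classify the product title into exactly one category below.\n"
        f"Categories:\n{options}\n"
        f"Title: {title}\n"
        "Answer with the category string only."
    )

def parse_category(raw, categories):
    """Validate the model's raw answer against the known taxonomy."""
    answer = raw.strip()
    return answer if answer in categories else None

prompt = build_classification_prompt("Wireless noise-cancelling earbuds", CATEGORIES)
```

Constraining the answer to a closed category list and validating it afterwards is what makes this usable for taxonomy validation rather than free-text generation.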
What You Bring
12+ years of experience in data science or analytics roles, including 5+ years leading teams
Proven track record building data products, knowledge graphs, and scalable data pipelines
Deep understanding of eCommerce search, recommendation systems, and product analytics
Hands-on experience with LLMs, prompt engineering, and RAG techniques (preferred)
Strong communication skills and ability to influence cross-functional stakeholders
Experience evaluating ML models with custom metrics and robust frameworks
Startup mindset—comfortable with ambiguity, bias for action, and fast iteration
Why Join Us
Be at the forefront of AI-powered product discovery in eCommerce
Own high-impact initiatives in a startup-style culture with real autonomy
Work alongside world-class talent across AI, Product, and Engineering
Build solutions that scale—serving millions of users and shaping the future of shopping
Ready to lead the next wave of AI + Data innovation in commerce? Let’s build the future together.
Please see the eBay privacy notice for information regarding how eBay handles your personal data collected when you use the eBay Careers website or apply for a job with eBay.
eBay is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, sex, sexual orientation, gender identity, veteran status, disability, or other legally protected status. If you have a need that requires accommodation, please contact us. We will make every effort to respond to your request for accommodation as soon as possible. eBay is committed to ensuring digital accessibility for people with disabilities.
Data Engineering
Posted today
Job Description
Role Responsibilities:
- Design and build scalable data pipelines and architectures
- Integrate and transform large-scale datasets from varied sources
- Collaborate with solution designers to implement data ingestion and enrichment
- Promote coding standards and ensure data quality in agile delivery
Job Requirements:
- Strong experience with Hadoop ecosystem (Hive, Spark), Java/Scala
- Proficient in SQL, Shell Scripting, and version control tools
- Exposure to AWS cloud infrastructure and data warehouses
- Familiarity with Agile methodology and automation tools like GitHub and TeamCity
Skills Required
Data Engineering, Hadoop, Spark, Scala, AWS, SQL
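As a minimal, framework-agnostic sketch of the ingest-and-transform responsibilities above: at production scale this logic would typically run as a Spark or Hive job over HDFS data, but the same parse-validate-aggregate shape can be shown in plain Python (the record format and field names are illustrative).

```python
# Sketch of one transform stage: parse raw delimited records, drop
# malformed rows, and aggregate per key. In a Spark pipeline the same
# logic would be expressed as DataFrame/RDD operations.
from collections import defaultdict

def parse_record(line):
    """Split a 'user_id,amount' record; return None for malformed rows."""
    parts = line.strip().split(",")
    if len(parts) != 2:
        return None
    user, amount = parts
    try:
        return user, float(amount)
    except ValueError:
        return None

def total_by_user(lines):
    """Aggregate amounts per user, silently skipping rows that fail parsing."""
    totals = defaultdict(float)
    for line in lines:
        rec = parse_record(line)
        if rec is not None:
            totals[rec[0]] += rec[1]
    return dict(totals)

raw = ["u1,10.5", "u2,3.0", "bad-row", "u1,4.5"]
```

Dropping (or routing to a quarantine sink) rows that fail parsing, rather than letting them abort the job, is the usual data-quality posture in agile delivery of pipelines like these.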
Data Engineering
Posted today
Job Description
Job Title: Middleware Engineer
Position: Data Engineer
Experience: 5-6 years
Category: IT Infrastructure
Main location: India, Karnataka, Bangalore
Employment Type: Full Time
Qualification: Bachelor's degree in Computer Science or related field or higher.
Roles and Responsibilities
Data Engineer - 5-6 years experience.
Responsibilities
Design, develop, and maintain data architectures, pipelines, and workflows for the collection, processing, storage, and retrieval of large volumes of structured and unstructured data from multiple sources.
Collaborate with cross-functional teams to identify and prioritize data engineering requirements and to develop and deploy data-driven solutions to address business challenges.
Build and maintain scalable data storage and retrieval systems (e.g., data lakes, data warehouses, databases), fault-tolerant, and high-performance data platforms on cloud infrastructure such as AWS, Azure, or Google Cloud Platform.
Develop and maintain ETL workflows, data pipelines, and data transformation processes to prepare data for machine learning and AI applications.
Implement and optimize distributed computing frameworks such as Hadoop, Spark, or Flink to support high-performance and scalable processing of large data sets.
Build and maintain monitoring, alerting, and logging systems to ensure the availability, reliability, and performance of data pipelines and data platforms.
Collaborate with Data Scientists and Machine Learning Engineers to deploy models on production environments and ensure their scalability, reliability, and accuracy.
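The monitoring, alerting, and reliability responsibilities in the list above can be sketched as a small retry-with-logging wrapper around a pipeline step; the retry counts and delay are illustrative defaults, and in production the log records would feed whatever alerting system is in place.

```python
# Sketch: run a flaky pipeline step with retries and logging, so transient
# failures surface in monitoring without killing the whole run.
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_with_retries(step, retries=3, delay=0.01):
    """Call `step()`, retrying on any exception; re-raise after the last attempt."""
    for attempt in range(1, retries + 1):
        try:
            return step()
        except Exception as exc:
            log.warning("step failed (attempt %d/%d): %s", attempt, retries, exc)
            if attempt == retries:
                raise
            time.sleep(delay)
```

Logging every failed attempt (not just the final failure) is what makes intermittent source flakiness visible in dashboards before it becomes a hard outage.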
Requirements:
Bachelor's or Master's degree in computer science, engineering, or a related field.
At least 5-6 years of experience in data engineering, with a strong background in machine learning, cloud computing and big data technologies.
Experience with at least one major cloud platform (AWS, Azure, GCP).
Proficiency in programming languages like Python, Java, and SQL.
Experience with distributed computing technologies such as Hadoop, Spark, and Kafka.
Familiarity with database technologies such as SQL, NoSQL, NewSQL.
Experience with data warehousing and ETL tools such as Redshift, Snowflake, or Airflow.
Strong problem-solving and analytical skills.
Excellent communication and teamwork skills.
Preferred qualification:
Experience with DevOps practices and tools such as Docker, Kubernetes, or Ansible, Terraform.
Experience with data visualization tools such as Tableau, Superset, Power BI, or Plotly, D3.js.
Experience with stream processing frameworks such as Kafka, Pulsar or Kinesis.
Experience with data governance, data security, and compliance.
Experience with software engineering best practices and methodologies such as Agile or Scrum.
Must Have Skills
Data engineer with expertise in machine learning, cloud computing, and big data technologies.
Data engineering experience on multiple clouds, with depth in at least one, preferably GCP
Data lakes, data warehouses, databases
ETL workflows, data pipelines, data platforms
Hadoop, Spark, or Flink
Hadoop, Spark, and Kafka
SQL, NoSQL, NewSQL
Redshift, Snowflake, or Airflow
Data Engineering
Posted today
Job Description
We are seeking a skilled Data Engineer to join our team in India. The ideal candidate will have 6-16 years of experience in designing and building data systems to support analytical and operational needs.
Responsibilities
- Designing and implementing data pipelines to support data acquisition, transformation, and storage.
- Collaborating with data scientists and analysts to optimize data processing and retrieval.
- Ensuring data quality and integrity by developing and maintaining robust data validation processes.
- Managing and optimizing databases and data warehouses for performance and scalability.
- Developing and maintaining documentation for data engineering processes, data models, and systems architecture.
Requirements
- Proficiency in programming languages such as Python, Java, or Scala.
- Strong knowledge of SQL and experience with relational databases (e.g., MySQL, PostgreSQL).
- Experience with big data technologies such as Hadoop, Spark, or Kafka.
- Familiarity with cloud platforms (AWS, Azure, or Google Cloud) and their data services.
- Understanding of ETL (Extract, Transform, Load) processes and tools.
- Knowledge of data modeling techniques and practices.
- Experience with version control systems such as Git.
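The "robust data validation processes" responsibility above can be sketched as a declarative set of per-column rules applied at a pipeline's quality gate; the column names and rules here are illustrative, not from any specific system.

```python
# Sketch: declarative row-level validation for a pipeline quality gate.
# Each rule maps a required column to a predicate it must satisfy.

RULES = {
    "order_id": lambda v: isinstance(v, str) and len(v) > 0,
    "quantity": lambda v: isinstance(v, int) and v > 0,
}

def validate_row(row):
    """Return the columns that are missing or fail their rule (empty list = valid)."""
    return [col for col, rule in RULES.items()
            if col not in row or not rule(row[col])]
```

Returning the list of failing columns, rather than a bare pass/fail, makes the validation output directly usable for data-quality reporting.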
Education
Bachelor Of Technology (B.Tech/B.E), Bachelor of Ayurvedic Medicine and Surgery (BAMS), Doctor of Dental Surgery/Medicine (DDS/DMD), Doctor of Nursing Practice (DNP), Master of Public Administration (MPA), Master of Library & Information Science (MLIS), Doctor of Public Health (DrPH), Masters in Technology (M.Tech/M.E), PGDM, Doctor of Psychology (Psy. D./D. Psych.), Bachelor Of Computer Application (B.C.A), Bachelors of Law (B.L/L.L.B), Doctor of Physical Therapy, PGP, Bachelor of Homeopathic Medicine and Surgery (BHMS), Master of Law (M.L/L.L.M), MBBS, Doctor of Medicine (M.D/M.S), Bachelor of Business Administration (B.B.A), Doctor of Optometry, Doctor of Business Administration (DBA), Master OF Business Administration (M.B.A), Post Graduate Diploma in Computer Applications (PGDCA), Master in Computer Application (M.C.A), Bachelor of Dental Surgery (B.D.S), Post Graduate Programme in Management for Executives (PGPX), Doctor of Pharmacy (PharmD), Doctor of Veterinary Medicine (DVM)
Skills Required
Python, SQL, ETL, Data Warehousing, Apache Spark, Kafka, NoSQL, Data Modeling, Cloud Services, Apache Airflow
Data Engineering
Posted today
Job Description
Develop, optimize, and maintain robust ETL pipelines for data ingestion, transformation, and processing.
Design and implement scalable data solutions on Azure Cloud Services, leveraging tools like Azure Data Factory, Databricks, and Key Vault.
Work with real-time and batch data processing systems, ensuring data accuracy and availability.
Collaborate with cross-functional teams to define data engineering requirements and deliver efficient solutions in an Agile environment.
Perform advanced SQL queries and Python scripting for data extraction, transformation, and analysis.
Ensure data security, integrity, and compliance with organizational and industry standards.
Monitor, troubleshoot, and enhance data pipelines using modern tools and techniques.
Use scheduling tools like Control-M or equivalent to automate and orchestrate workflows.
Document technical workflows, processes, and troubleshooting guides for stakeholders.
Requirements
Proficiency in SQL and Python for advanced data manipulation and automation.
Strong understanding of data engineering principles, including data modeling, data warehousing, and pipeline optimization.
Hands-on experience with Databricks and its ecosystem.
Familiarity with Azure cloud services, particularly Azure Data Factory, Key Vault, and other relevant components.
Experience working with real-time data systems (e.g., Kafka, Event Hub, or similar).
Good understanding of Agile methodologies and ability to work effectively in collaborative, fast-paced environments.
Exposure to Snowflake is an added advantage.
Knowledge of scheduling tools like Control-M or equivalent.
Strong analytical and problem-solving skills with a proactive mindset.
Excellent communication and teamwork abilities.
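The SQL-plus-Python extraction and transformation skills listed above can be sketched with the standard library's `sqlite3`. This is purely illustrative: the table, columns, and threshold are made up, and in this role the actual targets would be Databricks/Azure data sources rather than SQLite.

```python
# Sketch: parameterized SQL extraction with a small Python-side transform,
# using an in-memory SQLite database for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, amount REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("u1", 10.0), ("u1", 5.0), ("u2", 7.5)])

def totals_over(conn, threshold):
    """Extract per-user totals above a threshold, ordered for stable output."""
    cur = conn.execute(
        "SELECT user_id, SUM(amount) AS total FROM events "
        "GROUP BY user_id HAVING total > ? ORDER BY user_id",
        (threshold,),
    )
    return {user: total for user, total in cur.fetchall()}
```

Using a bound parameter (`?`) instead of string interpolation is the habit that keeps extraction scripts safe and compliant with the security requirements mentioned above.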
What we offer
Culture of caring. At GlobalLogic, we prioritize a culture of caring. Across every region and department, at every level, we consistently put people first. From day one, you'll experience an inclusive culture of acceptance and belonging, where you'll have the chance to build meaningful connections with collaborative teammates, supportive managers, and compassionate leaders.
Learning and development. We are committed to your continuous learning and development. You'll learn and grow daily in an environment with many opportunities to try new things, sharpen your skills, and advance your career at GlobalLogic. With our Career Navigator tool as just one example, GlobalLogic offers a rich array of programs, training curricula, and hands-on opportunities to grow personally and professionally.
Interesting & meaningful work. GlobalLogic is known for engineering impact for and with clients around the world. As part of our team, you'll have the chance to work on projects that matter. Each is a unique opportunity to engage your curiosity and creative problem-solving skills as you help clients reimagine what's possible and bring new solutions to market. In the process, you'll have the privilege of working on some of the most cutting-edge and impactful solutions shaping the world today.
Balance and flexibility. We believe in the importance of balance and flexibility. With many functional career areas, roles, and work arrangements, you can explore ways of achieving the perfect balance between your work and life. Your life extends beyond the office, and we always do our best to help you integrate and balance the best of work and life, having fun along the way!
High-trust organization. We are a high-trust organization where integrity is key. By joining GlobalLogic, you're placing your trust in a safe, reliable, and ethical global company. Integrity and trust are a cornerstone of our value proposition to our employees and clients. You will find truthfulness, candor, and integrity in everything we do.
About GlobalLogic
GlobalLogic, a Hitachi Group Company, is a trusted digital engineering partner to the world's largest and most forward-thinking companies. Since 2000, we've been at the forefront of the digital revolution – helping create some of the most innovative and widely used digital products and experiences. Today we continue to collaborate with clients in transforming businesses and redefining industries through intelligent products, platforms, and services.
Skills Required
Data Engineering, Python, SQL
Senior Data Engineering Analyst - Data Engineering

Posted 3 days ago
Job Description
**Primary Responsibilities:**
+ Ensures that all standard requirements have been met and performs the technical analysis
+ Assists the project manager by compiling information from current systems, analyzing program requirements, and ensuring that specified time requirements are met
+ Resolves moderate problems associated with the designed programs and provides technical guidance on complex programming
+ Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies in regards to flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so
**Required Qualifications:**
+ Graduate degree or equivalent experience
+ 4+ years of experience in database architecture, engineering, design, optimization, security, and administration; as well as data modeling, big data development, Extract, Transform, and Load (ETL) development, storage engineering, data warehousing, data provisioning
+ Good hands-on experience in Azure, ADF and Databricks
+ Experience in RDBMS such as Oracle, SQL Server, DB2, etc.
+ Knowledge of Azure, ADF, Databricks, Scala, Airflow, DWH concepts
+ Good knowledge in Unix and shell scripting
+ Good understanding of Extract, Transform, and Load (ETL) Architecture, Cloud, Spark
+ Good understanding of Data Architecture
+ Understanding of the business needs and designing programs and systems that match the complex business requirements and records all the specifications that are involved in the development and coding process
+ Understanding of QA and testing automation process
+ Proven ability to participate in agile development projects for batch and real time data ingestion
+ Proven ability to work with business and peers to define, estimate, and deliver functionality
+ Proven ability to be involved in creating proper technical documentation in the work assignments
**Preferred Qualifications:**
+ Knowledge of Agile, Automation, Big Data, DevOps, Python, Scala/PySpark programming, HDFS, Hive, AI/ML
+ Understanding and knowledge of Agile
_At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone-of every race, gender, sexuality, age, location and income-deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes - an enterprise priority reflected in our mission._
#Nic
Senior Data Engineering Analyst - Data Engineering
Posted today
Job Description
Optum is a global organization that delivers care, aided by technology to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.
Primary Responsibilities:
Required Qualifications:
Preferred Qualifications:
At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone–of every race, gender, sexuality, age, location and income–deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes - an enterprise priority reflected in our mission.
#Nic