3,610 Data Preprocessing jobs in India
Director Data Science & Data Engineering
Posted today
Job Description
At eBay, we're more than a global ecommerce leader — we’re changing the way the world shops and sells. Our platform empowers millions of buyers and sellers in more than 190 markets around the world. We’re committed to pushing boundaries and leaving our mark as we reinvent the future of ecommerce for enthusiasts.
Our customers are our compass, authenticity thrives, bold ideas are welcome, and everyone can bring their unique selves to work — every day. We're in this together, sustaining the future of our customers, our company, and our planet.
Join a team of passionate thinkers, innovators, and dreamers — and help us connect people and build communities to create economic opportunity for all.
Director – Data Science & Data Engineering
Shape the Future of AI-Driven eCommerce Discovery
About the Role
We're reimagining how people discover products in eCommerce—and we're looking for a visionary leader who blends technical depth with product intuition. If you're passionate about structured data, large language models, and building high-impact data products, this role is tailor-made for you.
As Director of Data Science & Data Engineering, you’ll lead a talented team of data scientists, analysts, and engineers working at the cutting edge of AI/ML, product analytics, and taxonomy design. Your mission? Drive innovation in product discovery through smarter data, scalable infrastructure, and breakthrough AI-powered solutions.
You’ll join the Product Knowledge org and play a key role in designing the backbone of next-gen search, recommendations, and generative AI experiences.
This is a high-impact, high-agency role—perfect for a hands-on leader who thrives in fast-paced, collaborative environments.
What You’ll Work On
Lead and inspire a cross-functional team to:
Transform Product Data into Insights
Conduct deep-dive SQL and Python analyses to uncover opportunities in taxonomy, ontology, and catalog structure that enhance discovery and user experience.
Harness the Power of Generative AI
Use prompt engineering and LLMs to create innovative tools for classification, taxonomy validation, and data enrichment.
Build & Evaluate AI/ML Models
Design frameworks to evaluate product knowledge models, semantic embeddings, and ML-based categorization systems.
Drive Data-Informed Strategy
Translate complex findings into clear, actionable insights for Product and Engineering teams. Influence roadmap decisions on entity resolution, catalog optimization, and knowledge graph development.
Partner Across Functions
Collaborate closely with Applied Research, Engineering, and Product teams to build and deploy high-impact data and AI solutions at scale.
Experiment & Innovate Fast
Prototype quickly, validate hypotheses, and iterate on structured data and AI-driven solutions that push boundaries.
What You Bring
12+ years of experience in data science or analytics roles, including 5+ years leading teams
Proven track record building data products, knowledge graphs, and scalable data pipelines
Deep understanding of eCommerce search, recommendation systems, and product analytics
Hands-on experience with LLMs, prompt engineering, and RAG techniques (preferred)
Strong communication skills and ability to influence cross-functional stakeholders
Experience evaluating ML models with custom metrics and robust frameworks
Startup mindset—comfortable with ambiguity, bias for action, and fast iteration
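The listing asks for experience evaluating ML models with custom metrics. As a minimal illustration of that kind of evaluation framework (all category names and data here are hypothetical, not from the posting), per-class precision and recall for a product-categorization model can be computed directly:

```python
from collections import Counter

def per_class_precision_recall(y_true, y_pred):
    """Compute per-class precision and recall for a categorization model."""
    classes = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was t
            fn[t] += 1          # missed the true class t
    metrics = {}
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        metrics[c] = {"precision": prec, "recall": rec}
    return metrics

# Hypothetical taxonomy labels for four products:
truth = ["Shoes", "Shoes", "Bags", "Watches"]
pred = ["Shoes", "Bags", "Bags", "Watches"]
m = per_class_precision_recall(truth, pred)
```

In practice such per-class numbers feed a larger evaluation harness (e.g., aggregated to macro/micro averages), but the shape of the computation is the same.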
Why Join Us
Be at the forefront of AI-powered product discovery in eCommerce
Own high-impact initiatives in a startup-style culture with real autonomy
Work alongside world-class talent across AI, Product, and Engineering
Build solutions that scale—serving millions of users and shaping the future of shopping
Ready to lead the next wave of AI + Data innovation in commerce? Let’s build the future together.
eBay is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, sex, sexual orientation, gender identity, veteran status, disability, or other legally protected status. If you have a need that requires accommodation, please contact us at . We will make every effort to respond to your request for accommodation as soon as possible. View our to learn more about eBay's commitment to ensuring digital accessibility for people with disabilities.
Data Engineering
Posted 1 day ago
Job Description
Responsibilities:
- Work with stakeholders to understand data requirements, then design, develop, and maintain complex ETL processes.
- Create data integration and data diagram documentation.
- Lead data validation, UAT, and regression testing for new data assets.
- Create and maintain data models, including schema design and optimization.
- Create and manage data pipelines that automate the flow of data, ensuring data quality and consistency.
Qualifications and Skills:
- Strong knowledge of Python and PySpark, with the ability to write PySpark scripts for developing data workflows.
- Strong knowledge of SQL, Hadoop, Hive, Azure, Databricks, and Greenplum, including writing SQL to query metadata and tables across data management systems such as Oracle, Hive, Databricks, and Greenplum.
- Familiarity with big data technologies such as Hadoop, Spark, and distributed computing frameworks.
- Experience using Hue to run Hive SQL queries and scheduling Apache Oozie jobs to automate data workflows.
- Experience communicating with stakeholders and collaborating effectively with business teams on data testing.
- Strong problem-solving and troubleshooting skills.
- Ability to establish comprehensive data quality test cases and procedures and to implement automated data validation processes.
- Degree in Data Science, Statistics, Computer Science, or a related field, or an equivalent combination of education and experience.
- 3-7 years of experience as a Data Engineer.
- Proficiency in programming languages commonly used in data engineering, such as Python, PySpark, and SQL.
- Experience with the Azure cloud platform, such as developing ETL processes with Azure Data Factory and big data processing and analytics with Azure Databricks.
- Strong communication, problem-solving, and analytical skills, with time management and multitasking ability and attention to detail and accuracy.
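Several of the qualifications above centre on automated data validation. A minimal, hypothetical sketch in plain Python of rule-based data quality checks (a production pipeline would typically express the same checks in PySpark or a framework such as Great Expectations; the rule and field names are invented for illustration):

```python
def validate_rows(rows, rules):
    """Apply named validation rules to each row; return (row_index, rule_name) failures."""
    failures = []
    for i, row in enumerate(rows):
        for name, check in rules.items():
            if not check(row):
                failures.append((i, name))
    return failures

# Hypothetical data quality test cases for a product feed:
rules = {
    "price_non_negative": lambda r: r.get("price", 0) >= 0,
    "sku_present": lambda r: bool(r.get("sku")),
}
rows = [
    {"sku": "A1", "price": 10.0},   # clean row
    {"sku": "", "price": -5.0},     # fails both rules
]
issues = validate_rows(rows, rules)
```

The same pattern scales to a catalogue of named test cases run on every pipeline load, with failures routed to alerting rather than returned as a list.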
Data Engineering
Posted today
Job Description
- Design, build, and maintain scalable data pipelines to process structured, semi-structured, and unstructured data.
- Develop ETL processes to ingest, transform, and load data into cloud data warehouses (Snowflake or Redshift), ensuring data quality and consistency.
- Collaborate with cross-functional teams including business stakeholders, analysts, report developers, architects, and data scientists to gather and translate data requirements into technical solutions.
- Optimize data infrastructure and processes for performance and efficiency, including storage, processing, and querying.
- Implement data governance policies and best practices to ensure compliance, security, and privacy.
- Troubleshoot and resolve data issues, identifying root causes and implementing effective solutions.
- Utilize AWS cloud services (S3, EC2, EMR, Glue, Athena, Lambda) for data engineering tasks.
- Code primarily in Python and PySpark; proficiency in SQL required.
- Participate in designing next-generation data platform architecture to meet increasing data demands.
- Employ distributed computing frameworks (e.g., Hadoop, Spark) and orchestration tools (e.g., Airflow, Jenkins).
- Use containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Build CI/CD pipelines using tools like GitLab and Bitbucket.
- Apply knowledge of data modeling, database concepts, and query languages (SQL, Hive).
- Ensure data quality using monitoring tools (e.g., Great Expectations).
- Support data governance, lineage, and cataloging using relevant tools (e.g., DataHub).
- Work with real-time data streaming technologies (e.g., Kafka) as needed.
- Contribute to reporting and analytics initiatives, including data visualization with tools like Tableau, Power BI, or MicroStrategy.
- Adhere to Agile/Scrum methodologies and participate in related development processes.
- Retail industry experience considered a plus.
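The responsibilities above describe ETL pipelines that ingest semi-structured data into a cloud warehouse. A toy, dependency-free sketch of that extract/transform/load shape (the dict here merely stands in for a MERGE/upsert into Snowflake or Redshift; all field names are hypothetical):

```python
import json

def extract(raw_lines):
    """Parse semi-structured JSON-lines input into records."""
    return [json.loads(line) for line in raw_lines]

def transform(records):
    """Normalise field names and derive a total column."""
    return [
        {"order_id": r["id"], "total": round(r["qty"] * r["unit_price"], 2)}
        for r in records
    ]

def load(records, warehouse):
    """Idempotent upsert keyed on order_id (stand-in for a warehouse MERGE)."""
    for r in records:
        warehouse[r["order_id"]] = r
    return warehouse

raw = [
    '{"id": 1, "qty": 3, "unit_price": 2.5}',
    '{"id": 2, "qty": 1, "unit_price": 9.99}',
]
wh = load(transform(extract(raw)), {})
```

Keeping extract, transform, and load as separate pure functions is what makes the pipeline easy to test and to orchestrate step by step (e.g., as Airflow tasks).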
Data Engineering
Posted today
Job Description
Job Title: Middleware Engineer
Position: Data Engineer
Experience: 5-6 years
Category: IT Infrastructure
Main location: India, Karnataka, Bangalore
Employment Type: Full Time
Qualification: Bachelor's degree in Computer Science or related field or higher.
Roles and Responsibilities
Data Engineer - 5-6 years experience.
Responsibilities
===
Design, develop, and maintain data architectures, pipelines, and workflows for the collection, processing, storage, and retrieval of large volumes of structured and unstructured data from multiple sources.
Collaborate with cross-functional teams to identify and prioritize data engineering requirements and to develop and deploy data-driven solutions to address business challenges.
Build and maintain scalable, fault-tolerant, and high-performance data storage and retrieval systems (e.g., data lakes, data warehouses, databases) and data platforms on cloud infrastructure such as AWS, Azure, or Google Cloud Platform.
Develop and maintain ETL workflows, data pipelines, and data transformation processes to prepare data for machine learning and AI applications.
Implement and optimize distributed computing frameworks such as Hadoop, Spark, or Flink to support high-performance and scalable processing of large data sets.
Build and maintain monitoring, alerting, and logging systems to ensure the availability, reliability, and performance of data pipelines and data platforms.
Collaborate with Data Scientists and Machine Learning Engineers to deploy models on production environments and ensure their scalability, reliability, and accuracy.
Requirements:
===
Bachelor's or master's degree in computer science, engineering, or a related field.
At least 5-6 years of experience in data engineering, with a strong background in machine learning, cloud computing and big data technologies.
Experience with at least one major cloud platform (AWS, Azure, GCP).
Proficiency in programming languages like Python, Java, and SQL.
Experience with distributed computing technologies such as Hadoop, Spark, and Kafka.
Familiarity with database technologies such as SQL, NoSQL, NewSQL.
Experience with data warehousing and ETL tools such as Redshift, Snowflake, or Airflow.
Strong problem-solving and analytical skills.
Excellent communication and teamwork skills.
Preferred qualification:
===
Experience with DevOps practices and tools such as Docker, Kubernetes, or Ansible, Terraform.
Experience with data visualization tools such as Tableau, Superset, Power BI, or Plotly, D3.js.
Experience with stream processing frameworks such as Kafka, Pulsar or Kinesis.
Experience with data governance, data security, and compliance.
Experience with software engineering best practices and methodologies such as Agile or Scrum.
Must Have Skills
===
Data engineer with expertise in machine learning, cloud computing, and big data technologies
Data engineering experience on multiple clouds, preferably including GCP
data lakes, data warehouses, databases
ETL workflows, data pipelines, data platforms
Hadoop, Spark, or Flink
Hadoop, Spark, and Kafka
SQL, NoSQL, NewSQL
Redshift, Snowflake, or Airflow
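The streaming requirements above (Kafka, Pulsar, Kinesis, Flink) ultimately come down to windowed aggregation over event streams. A dependency-free sketch of a tumbling-window count (timestamps in seconds; event names are hypothetical, and a real deployment would use Spark Structured Streaming or Flink rather than an in-memory dict):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, key) events into fixed, non-overlapping windows
    and count occurrences of each key per window."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in windows.items()}

# Four hypothetical events: three fall in window [0, 10), one in [10, 20).
events = [(0, "click"), (3, "view"), (7, "click"), (12, "click")]
agg = tumbling_window_counts(events, 10)
```

Tumbling windows are the simplest case; sliding and session windows follow the same group-by-window-key idea with a different window assignment rule.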
Data Engineering
Posted today
Job Description
Profile Description
Division: IST
Location: Mumbai
We're seeking someone to join our team as a Data Engineer in Institutional Securities Technology, where you will play a key role in helping transform how Morgan Stanley operates.
Institutional Securities Technology
Institutional Securities Technology (IST) develops and oversees the overall technology strategy and bespoke technology solutions to drive and enable the institutional businesses and enterprise-wide functions. IST's 'clients' include Fixed Income, Equities, Commodities, Investment Banking, Research, Prime Brokerage and Global Capital Markets.
Advisory & Sales Distribution
Advisory is responsible for end-to-end process life cycle support for Investment Banking, Global Capital Markets and Research with a focus on efficiency, scale, and time to market. Sales Distribution develops technology for the firm's IED and FID Sales and Distribution business units, creating application synergies across the businesses' various platforms.
Software Engineering
This is a Director-level position that develops and maintains software solutions to support business needs.
Morgan Stanley is an industry leader in financial services, known for mobilizing capital to help governments, corporations, institutions, and individuals around the world achieve their financial goals.
At Morgan Stanley India, we support the Firm's global businesses, with critical presence across Institutional Securities, Wealth Management, and Investment Management, as well as in the Firm's infrastructure functions of Technology, Operations, Finance, Risk Management, Legal and Corporate & Enterprise Services. Morgan Stanley has been rooted in India since 1993, with campuses in both Mumbai and Bengaluru. We empower our multi-faceted and talented teams to advance their careers and make a global impact on the business. There's ample opportunity to move across the businesses for those who show passion and grit in their work.
Interested in joining a team that's eager to create, innovate and make an impact on the world? Read on.
What you'll do in the role:
Strong hands-on experience with Snowflake, keeping up to date with its latest features.
Strong hands-on experience with the Python programming language.
Strong hands-on experience writing complex SQL.
Strong understanding of how distributed platforms work.
Ability to develop, maintain, and distribute code in a modular fashion.
Process oriented, focused on standardization, streamlining, and implementation of a best-practices delivery approach.
Basic understanding of Unix shell scripting and the Unix OS platform.
Excellent problem-solving and analytical skills.
Excellent verbal and written communication skills.
Ability to multi-task and function efficiently in a fast-paced environment.
Self-starter with flexibility and adaptability in a dynamic work environment.
Good understanding of Agile methodology.
5+ years of experience in a Data Engineer role.
5+ years of experience designing and building real-time data pipelines and analytical solutions using the big data and cloud technology ecosystem (Azure, Snowflake, Databricks, etc.).
Expertise with Data Lake/Big Data project implementation in the cloud (preferably Azure + Snowflake).
What you'll bring to the role:
Experience in Data Reporting Tools (preferably Business Objects)
Experience in Visualization Tools (preferably Power BI)
Financial Domain experience will be a plus
What you can expect from Morgan Stanley
We are committed to maintaining the first-class service and high standard of excellence that have defined Morgan Stanley for over 85 years. At our foundation are five core values - putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back - that guide our more than 80,000 employees in 1,200 offices across 42 countries. At Morgan Stanley, you'll find trusted colleagues, committed mentors and a culture that values diverse perspectives, individual intellect and cross-collaboration. Our Firm is differentiated by the caliber of our diverse team, while our company culture and commitment to inclusion define our legacy and shape our future, helping to strengthen our business and bring value to clients around the world. Learn more about how we put this commitment to action: morganstanley.com/diversity. We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry.
Morgan Stanley is an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximize their full potential. Our skilled and creative workforce is comprised of individuals drawn from a broad cross section of the global communities in which we operate and who reflect a variety of backgrounds, talents, perspectives, and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing, and advancing individuals based on their skills and talents.
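The role above combines Python, complex SQL, and Snowflake, with an emphasis on modular code. A small sketch of the modular, parameterised query-helper pattern it describes, using the stdlib sqlite3 module as a stand-in (Snowflake's Python connector follows the same DB-API shape; the table and column names are hypothetical):

```python
import sqlite3

def run_query(conn, sql, params=()):
    """Thin helper that keeps SQL execution modular and parameterised."""
    return conn.execute(sql, params).fetchall()

# In-memory database standing in for a warehouse connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (desk TEXT, notional REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("FID", 100.0), ("IED", 250.0), ("FID", 50.0)],
)

# A scaled-down example of the aggregated queries the role expects:
rows = run_query(
    conn,
    "SELECT desk, SUM(notional) FROM trades GROUP BY desk ORDER BY desk",
)
```

Centralising execution in one helper makes it straightforward to later swap the connection object for a Snowflake one, add logging, or enforce parameterisation everywhere.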