31 Big Data Technologies jobs in Indore
Data Engineering Role
Posted 4 days ago
Job Description
Minimum Requirements:
- At least 3 years of professional experience in Data Engineering
- Demonstrated end-to-end ownership of ETL pipelines
- Deep, hands-on experience with AWS services: EC2, Athena, Lambda, and Step Functions (non-negotiable)
- Strong proficiency in MySQL (non-negotiable)
- Working knowledge of Docker: setup, deployment, and troubleshooting
Highly Preferred Skills:
- Experience with orchestration tools such as Airflow
- Hands-on with PySpark
- Familiarity with the Python data ecosystem: SQLAlchemy, DuckDB, PyArrow, Pandas, NumPy
- Exposure to DLT (Data Load Tool)
Ideal Candidate Profile:
The role demands a builder’s mindset over a maintainer’s. Independent contributors with clear, efficient communication thrive here. Those who excel tend to embrace fast-paced startup environments, take true ownership, and are motivated by impact—not just lines of code. Candidates are expected to include the phrase Red Panda in their application to confirm they’ve read this section in full.
Key Responsibilities:
- Architect, build, and optimize scalable data pipelines and workflows
- Manage AWS resources end-to-end: from configuration to optimization and debugging (see the sketch after this list)
- Work closely with product and engineering to enable high-velocity business impact
- Automate and scale data processes—manual workflows are not part of the culture
- Build foundational data systems that drive critical business decisions
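A minimal sketch of the kind of Athena-from-Python step such a pipeline might include, using boto3; the region, database, table, and S3 output location are illustrative assumptions, not details from the posting:

```python
import time

import boto3  # AWS SDK for Python

athena = boto3.client("athena", region_name="ap-south-1")  # region is an assumption

# Kick off a query; the database, table, and bucket names are placeholders.
resp = athena.start_query_execution(
    QueryString="SELECT event_date, COUNT(*) AS n FROM events GROUP BY event_date",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = resp["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

# Print result rows (the first row is the header).
if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```

In production, a step like this would typically run inside Lambda and be orchestrated by Step Functions rather than a local polling loop.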
Compensation range: ₹8.4–12 LPA (fixed base), excluding equity, performance bonus, and revenue share components.
Data Engineering (Azure Databricks)
Posted 12 days ago
Job Description
The Data Engineer (DE) Consultant is responsible for designing, developing, and maintaining data assets and data-related products, liaising with multiple stakeholders.
Responsibilities:
- Work with stakeholders to understand data requirements, then design, develop, and maintain complex ETL processes.
- Create data integration and data diagram documentation.
- Lead data validation, UAT, and regression testing for new data assets.
- Create and maintain data models, including schema design and optimization.
- Create and manage data pipelines that automate the flow of data, ensuring data quality and consistency.
Qualifications and Skills:
- Strong knowledge of Python and PySpark.
- Ability to write PySpark scripts to develop data workflows (see the sketch after this list).
- Strong knowledge of SQL, Hadoop, Hive, Azure, Databricks, and Greenplum.
- Ability to write SQL to query metadata and tables from data management systems such as Oracle, Hive, Databricks, and Greenplum.
- Familiarity with big data technologies like Hadoop, Spark, and distributed computing frameworks.
- Comfort using Hue to run Hive SQL queries and scheduling Apache Oozie jobs to automate data workflows.
- Experience communicating with stakeholders and collaborating effectively with business teams on data testing.
- Strong problem-solving and troubleshooting skills.
- Ability to establish comprehensive data quality test cases and procedures, and to implement automated data validation processes.
- Degree in Data Science, Statistics, Computer Science, or a related field, or an equivalent combination of education and experience.
- 4-7 years of experience in data engineering.
- Proficiency in programming languages commonly used in data engineering, such as Python, PySpark, and SQL.
- Experience with the Azure cloud platform, such as developing ETL processes with Azure Data Factory and big data processing and analytics with Azure Databricks.
- Strong communication, problem-solving, and analytical skills, with effective time management, multitasking, and attention to detail.
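A rough sketch of the PySpark scripting the list above refers to: a small batch ETL that cleans raw records and writes an aggregate table. The paths, columns, and table names are invented for the example:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Read raw orders from the data lake (path is a placeholder).
orders = spark.read.parquet("/mnt/raw/orders")

# Cleanse: drop duplicates and null keys, derive a revenue column.
clean = (
    orders.dropDuplicates(["order_id"])
    .filter(F.col("order_id").isNotNull())
    .withColumn("revenue", F.col("quantity") * F.col("unit_price"))
)

# Aggregate to a daily summary and write it back as a managed table.
daily = clean.groupBy("order_date").agg(F.sum("revenue").alias("total_revenue"))
daily.write.mode("overwrite").saveAsTable("analytics.daily_revenue")
```

On Databricks, a script like this would typically run as a notebook or job task, with Azure Data Factory or Oozie handling the scheduling.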
Data Science Internship
Posted 1 day ago
Job Description
Data Science Internship at SpectoV – Analyze. Predict. Innovate.
SpectoV is offering an excellent opportunity for Data Science Interns (Freshers) to work on real-world projects involving data analytics, machine learning, and AI-driven solutions. This internship is designed to provide practical experience in handling and analyzing data to build impactful solutions.
Program Structure:
• 1 Month Mandatory Training – Sankalp Data Science Training Program
• 2 Months Live Project Work – Real-world data science applications
Internship Details:
• Duration: 3 Months
• Location: Remote
• Eligibility: Freshers with basic knowledge of Python, Data Analytics, and Machine Learning concepts
Perks:
• Performance-based stipend (up to ₹10,000 per month)
• Full-time offer for top performers
• Training and Internship Completion Certificates
• Letter of Recommendation
Website: spectov.in
Apply now and build a strong foundation in Data Science with one of India’s leading AR/VR and AI innovators.
Data Science Specialist
Posted 1 day ago
Job Description
Our client, a global leader in the maritime industry, is seeking a talented Data Science Analyst to join their dynamic team in Mumbai.
Data Science Analyst
based in Andheri, Mumbai, India
ORGANIZATION: This position reports to a Senior Data Expert at company headquarters. You will be part of the global analytics function that builds and drives data-driven decision-making across the organization. This role focuses on developing analytics platforms, dashboards, and predictive models that enhance operational efficiency and business intelligence across the company's global fleet and offices. If you enjoy solving complex data challenges, building analytical solutions, and turning insights into real-world impact, this is an excellent opportunity to join an innovative team shaping the future of maritime analytics.
Key Responsibilities:
- Lead and deliver analytics projects from concept to completion, aligned with the company’s analytics roadmap.
- Source, cleanse, and transform raw data from large-scale systems to create strong foundations for modelling and dashboarding.
- Develop, validate, and deploy machine learning models using Python or R, leveraging modern ML libraries such as Scikit-learn or PyTorch (see the sketch after this list).
- Build interactive and user-friendly dashboards using Tableau or Power BI, ensuring data is translated into actionable insights.
- Conduct exploratory data analysis (EDA) to identify trends and opportunities that drive strategic business outcomes.
- Collaborate with IT and operations teams across India and Hong Kong to design scalable, high-quality analytical solutions.
- Communicate findings effectively to senior stakeholders and non-technical users through visual and written reports.
- Promote data governance, best practices, and data literacy across the organization.
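To give a flavor of the model-building work in the list above, here is a minimal scikit-learn train/validate sketch; the synthetic regression data stands in for real fleet data, which the posting does not describe:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for operational features and a numeric target.
X, y = make_regression(n_samples=1_000, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a baseline model and check held-out error before any deployment step.
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```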
Your Profile:
- Bachelor’s or Master’s degree in Mathematics, Statistics, Computer Science, or a related quantitative field.
- 5+ years of hands-on experience in Data Analytics or Data Science roles.
- Proficiency in SQL and experience working with NoSQL or Graph Databases.
- Demonstrated experience building machine learning models and deploying them in production environments.
- Strong working knowledge of data visualization tools such as Tableau or Power BI.
- Familiarity with Git for version control and collaborative development.
- Excellent analytical, problem-solving, and communication skills with the ability to simplify complex data for decision-makers.
- Experience with Big Data technologies (e.g. Spark, Hive, Presto) or cloud platforms (AWS, Azure, Google Cloud) will be an advantage.
- Exposure to Generative AI tools and frameworks such as LangChain, Vertex AI, or Hugging Face will be a plus.
The Offer:
- Join us to make a significant impact in the maritime industry through cutting-edge data analytics.
- A competitive remuneration package with benefits.
- Work with a global leader in the shipping industry.
RECRUITMENT PROCESS: Please apply with an updated resume. All applications will be treated as strictly confidential. Our team will review your application and a consultant will get in touch with you accordingly.
Comaea Consulting
Data Science Intern
Posted 26 days ago
Job Description
NLP Data Science Intern
Did you notice a shortage of food at supermarkets during Covid? Have you heard about the recent issues in the global shipping industry? Or perhaps you've heard about the shortages of microchips? These problems are called supply chain disruptions. They have been increasing in frequency and severity, and they are threatening our very way of life.
Our vision is to advance society's capacity to withstand shocks and stresses. Kavida.ai believes the only way to ensure security is through supply chain resiliency. We are on a mission to help companies proactively manage supply chain disruption risks using integrated data.
Our Story
In March 2020, over 35 academics, data scientists, students, and software engineering volunteers came together to address the food shortages caused by the pandemic (Covid19foodsupply.com). A core team of nine was formed and spun off into a startup, and the rest is history.
Our investors include one of the world's largest supply chain quality & compliance monitoring companies, a £1.25bn apparel manufacturer, and some very impressive angel investors.
Social Impact:
Social impact is in our DNA. We believe private sector innovation is the only way to address social problems at scale. If we achieve our mission, humanity will always have access to its essential goods for sustenance. No more shortages of food, PPE, medicine, etc.
Our Culture:
Idea Meritocracy:
The best ideas win. We only care about what is right, not who is right. We know arriving at the best answer requires constructive tension. Sometimes it can get heated but it's never personal. Everyone contributes to better ideas knowing they will be heard but also challenged.
Drivers Not Passengers:
We think as owners who drive the bus, not as passengers. We are self-starters and never wait for instructions. We are hungry for autonomy, trust, and responsibility. Everyone is a leader because we know leadership is a trait, not a title. Leaders drive growth and navigate the chaos.
We Figure Out The Answers:
We trust our ability to figure stuff out. We do not need all the information to start answering the question. We can connect the dots and answer difficult questions with logic.
Customer & Mission Obsessed:
Our customers are our heroes and we are obsessed with helping them: understanding their supply chains better, resolving their biggest headaches, and advancing their competitiveness.
Learning and Growth:
We all take personal responsibility for becoming smarter, wiser, more skilled, and happier. We are obsessed with learning about our industry, improving our skills, and growing personally: to become more.
Job Description:
As a member of our Research team, you will be responsible for researching, developing, and coding agents using state-of-the-art LLMs with automated pipelines.
- Write code for the development of our ML engines and microservices pipelines.
- Use, optimize, train, and evaluate state-of-the-art GPT models.
- Research and develop agentic pipelines using LLMs.
- Research and develop RAG-based pipelines using vector DBs.
Essential Requirements:
- Prompt engineering and agentic LLM frameworks such as LangChain or LlamaIndex
- Solid understanding of vectors/tensors and RAG pipelines (see the sketch after this list)
- Knowledge of building NLP systems using transfer learning or building custom NLP systems from scratch using TensorFlow or PyTorch.
- In-depth knowledge of DSA, async programming, Python, and containers.
- Knowledge of transformers and NLP techniques is essential; deployment experience is a significant advantage.
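As an illustration of the retrieve-then-generate structure behind the RAG pipelines mentioned above, here is a toy sketch: it uses bag-of-words similarity as a stand-in for real embeddings and a vector DB, and stops at prompt assembly rather than calling an LLM:

```python
import math
from collections import Counter

# Toy corpus standing in for supply chain disruption documents.
DOCS = [
    "Port congestion in Shanghai is delaying container shipments by two weeks.",
    "A fabrication plant outage is constraining global semiconductor supply.",
    "Flooding has closed several rail freight routes across central Europe.",
]

def embed(text: str) -> Counter:
    # Bag-of-words counts as a crude stand-in for an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    # A real pipeline would send this prompt to an LLM; that call is omitted.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is disrupting semiconductor supply?"))
```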
Salary Range: ₹15,000–₹25,000
We are offering a full-time internship position to final-year students. The internship will last for an initial period of 6-12 months before converting to a full-time job, depending on suitability for both parties. If the applicant is a student who needs to return to university, they can continue with the program on a part-time basis.
Graduate Trainee - Data Science
Posted 2 days ago
Job Description
Program Highlights:
- Intensive training in data analysis techniques, statistical modeling, and machine learning algorithms.
- Exposure to programming languages commonly used in data science, such as Python and R.
- Hands-on experience with data manipulation, cleaning, and preparation tools.
- Learn to build and evaluate predictive models and machine learning pipelines.
- Develop skills in data visualization and storytelling to communicate insights effectively.
- Work on challenging projects that address real business problems.
- Receive mentorship from senior data scientists and guidance on career development.
- Collaborate within a remote team environment, enhancing communication and teamwork skills.
- Opportunity to contribute to innovative data-driven solutions.
- Potential for full-time employment upon successful completion of the trainee program.
Ideal Candidate Profile:
- Recent graduate with a Bachelor's or Master's degree in Data Science, Statistics, Mathematics, Computer Science, Economics, or a related quantitative field.
- Strong academic record and a demonstrated interest in data analysis and machine learning.
- Foundational knowledge of programming concepts, preferably in Python or R.
- Excellent analytical and problem-solving abilities.
- Strong communication and interpersonal skills, crucial for remote collaboration.
- Eagerness to learn and adapt to new tools and methodologies.
- Ability to work independently and manage time effectively in a remote setting.
- A proactive and curious mindset towards data exploration and discovery.
This is a fully remote program, offering flexibility and the chance to learn and grow from anywhere. If you are a driven graduate eager to dive into the world of data science, apply today!
Junior Data Science Apprentice
Posted 14 days ago
Job Description
- Assist senior data scientists in data collection, cleaning, and preprocessing tasks.
- Perform exploratory data analysis to identify trends, patterns, and insights.
- Support the development and validation of machine learning models under guidance.
- Create data visualizations to communicate findings effectively to team members.
- Learn and apply various statistical techniques and data mining methods.
- Contribute to documentation of data processes, models, and findings.
- Participate in team meetings and knowledge-sharing sessions.
- Gain exposure to various data science tools and technologies such as Python, R, SQL, and data visualization libraries.
- Collaborate with team members on assigned tasks and projects.
- Actively seek feedback and opportunities for learning and skill development.
- Help in the preparation of reports and presentations summarizing analytical outcomes.
Qualifications:
- Bachelor's degree or current enrollment in a Bachelor's or Master's program in a quantitative field such as Computer Science, Statistics, Mathematics, Economics, Physics, or Engineering.
- Strong analytical and quantitative skills with a demonstrable interest in data science.
- Basic understanding of programming concepts, preferably in Python or R.
- Familiarity with database concepts and SQL is a plus.
- Excellent problem-solving abilities and attention to detail.
- Strong communication and interpersonal skills, essential for remote collaboration.
- Eagerness to learn and adapt to new technologies and methodologies.
- Ability to work independently and manage time effectively in a remote setting.
- Prior exposure to data analysis projects (academic or personal) is advantageous.
- A proactive and curious mindset.
QA Analyst – Data Science
Posted 1 day ago
Job Description
**We are currently hiring for a senior-level position and are looking for immediate joiners only.
If you are interested, please send your updated resume to along with details of your CTC, ECTC and notice period **
Location: Remote
Employment Type: Full-time
About the Role
The QA Engineer will own quality assurance across the ML lifecycle: from raw data validation through feature engineering checks, model training/evaluation verification, batch prediction/optimization validation, and end-to-end (E2E) workflow testing. The role is hands-on with Python automation, data profiling, and pipeline test harnesses in Azure ML and Azure DevOps. Success means provably correct data, models, and outputs at production scale and cadence.
Key Responsibilities
● Test Strategy & Governance
○ Define an ML-specific Test Strategy covering data quality KPIs, feature consistency checks, model acceptance gates (metrics + guardrails), and E2E run acceptance (timeliness, completeness, integrity).
○ Establish versioned test datasets & golden baselines for repeatable regression of features, models, and optimizers.
● Data Quality & Transformation
○ Validate raw data extracts and landed data lake data: schema/contract checks, null/outlier thresholds, time-window completeness, duplicate detection, site/material coverage.
○ Validate transformed/feature datasets: deterministic feature generation, leakage detection, drift vs. historical distributions, feature parity across runs (hash or statistical similarity tests).
○ Implement automated data quality checks (e.g., Great Expectations/pytest + Pandas/SQL) executed in CI and AML pipelines (see the sketch after this list).
● Model Training & Evaluation
○ Verify training inputs (splits, windowing, target leakage prevention) and hyperparameter configs per site/cluster.
○ Automate metric verification (e.g., MAPE/MAE/RMSE, uplift vs. last model, stability tests) with acceptance thresholds and champion/challenger logic.
○ Validate feature importance stability and sensitivity/elasticity sanity checks (price-volume monotonicity where applicable).
○ Gate model registration/promotion in AML based on signed test artifacts and reproducible metrics.
● Predictions, Optimization & Guardrails
○ Validate batch predictions: result shapes, coverage, latency, and failure handling.
○ Test model optimization outputs and enforced guardrails: detect violations and prove idempotent writes to the DB.
○ Verify API pushes to third-party systems (idempotency keys, retry/backoff, delivery receipts).
● Pipelines & E2E
○ Build pipeline test harnesses for AML pipelines (nightly data-gen, weekly training, prediction/optimization), including orchestrated synthetic runs and fault injection (missing slice, late competitor data, SB backlog).
○ Run E2E tests from raw data store -> ADLS -> AML -> RDBMS -> APIM/Frontend; assert freshness SLOs and audit event completeness (Event Hubs -> ADLS immutable).
● Automation & Tooling
○ Develop Python-based automated tests (pytest) for data checks, model metrics, and API contracts; integrate with Azure DevOps (pipelines, badges, gates).
○ Implement data-driven test runners (parameterized by site/material/model-version) and store signed test artifacts alongside models in the AML Registry.
○ Create synthetic test data generators and golden fixtures to cover edge cases (price gaps, competitor shocks, cold starts).
● Reporting & Quality Ops
○ Publish weekly test reports and go/no-go recommendations for promotions; maintain a defect taxonomy (data vs. model vs. serving vs. optimization).
○ Contribute to SLI/SLO dashboards (prediction timeliness, queue/DLQ, push success, data drift) used for release gates.
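To make the data-quality checks and metric gates above concrete, here is a minimal pytest sketch; the thresholds and the synthetic prediction frame are illustrative assumptions, not values from this role:

```python
import numpy as np
import pandas as pd
import pytest

# Illustrative gates; real values would come from the test strategy.
MAX_NULL_FRACTION = 0.01
MAPE_GATE = 0.15

@pytest.fixture
def predictions() -> pd.DataFrame:
    # Synthetic stand-in for a batch-prediction extract.
    rng = np.random.default_rng(0)
    actual = rng.uniform(50, 150, size=500)
    predicted = actual * rng.normal(1.0, 0.05, size=500)
    return pd.DataFrame({"actual": actual, "predicted": predicted})

def test_no_excess_nulls(predictions):
    # Data-quality gate: no column may exceed the null threshold.
    assert predictions.isna().mean().max() <= MAX_NULL_FRACTION

def test_mape_within_gate(predictions):
    # Model acceptance gate: mean absolute percentage error must pass.
    errors = (predictions["actual"] - predictions["predicted"]).abs()
    mape = errors.div(predictions["actual"]).mean()
    assert mape <= MAPE_GATE
```

In CI, a run such as `pytest -q` behind an Azure DevOps gate would block promotion whenever either assertion fails.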
Required Qualifications
○ 5–7+ years in QA with 3+ years focused on ML/data systems (data pipelines + model validation).
○ Python automation (pytest, pandas, NumPy), SQL (PostgreSQL/Snowflake), and CI/CD (Azure DevOps) for fully automated ML QA.
○ Strong grasp of ML validation: leakage checks, proper splits, metric selection (MAE/MAPE/RMSE), drift detection, sensitivity/elasticity sanity checks.
○ Experience testing AML pipelines (pipelines/jobs/components) and message-driven integrations (Service Bus/Event Hubs).
○ API test skills (FastAPI/OpenAPI, contract tests, Postman/pytest), plus idempotency and retry patterns.
○ Familiarity with feature stores/feature engineering concepts and reproducibility.
○ Solid understanding of observability (App Insights/Log Analytics) and auditability requirements.
Education
• Bachelor’s or Master’s degree in Computer Science, Information Technology, or related field.
• Certification in Azure Data or ML Engineer Associate is a plus.
Data Science Intern (Remote)
Posted 26 days ago
Job Description
Job title: Data Science Intern- Remote
Report to: Data Science Manager in Pune
Job Responsibilities
- Solve 1D time-series (continuous glucose monitoring), RNA sequencing, and computer vision (mainly medical images) problems (see the sketch after this list)
- Solve challenging problems using scalable 1D signal processing, machine learning, and deep learning approaches.
- Develop state-of-the-art machine learning/deep learning algorithms for medical datasets
- Communicate highly technical results and methods concisely and clearly
- Collaborate with researchers in Japan to understand requirements and request data.
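As a small illustration of the 1D time-series work listed above, here is a sliding-window sketch that turns a synthetic CGM-like trace into supervised training pairs; the 5-minute sampling rate and 30-minute horizon are assumptions for the example:

```python
import numpy as np

def sliding_windows(series: np.ndarray, window: int, horizon: int):
    """Turn a 1D signal into (history window, future target) pairs."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        y.append(series[start + window + horizon - 1])
    return np.array(X), np.array(y)

# Synthetic glucose-like trace: one day at 5-minute samples (288 points).
t = np.arange(288)
cgm = 110 + 25 * np.sin(2 * np.pi * t / 288)
cgm += np.random.default_rng(1).normal(0, 5, t.size)

# One hour of history (12 samples) predicting 30 minutes ahead (6 samples).
X, y = sliding_windows(cgm, window=12, horizon=6)
print(X.shape, y.shape)  # (271, 12) (271,)
```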
Requirement
- Master's/Ph.D. in a relevant field from a tier-1 college
- 1-2 years of experience with programming languages such as Python or C++, plus Pandas
- 1-2 years of experience with a deep learning framework such as PyTorch or TensorFlow
- Well acquainted with classical time-series problems and algorithms, NLP, computer vision, etc.
- Demonstrated experience with machine learning/deep learning models.
- Ability to read and implement research papers from top conferences.
- Willingness to develop IP (patents) and publish papers.
- Proficiency in Windows, Linux, Docker, PowerPoint, and Git is required.
Preferred Skills
- Experience working with time-series, text, and sequential datasets in real-world settings.
- Proven track record of research or industry experience in time-series problems, NLP, and tabular datasets.
- Well acquainted with machine learning libraries such as pandas and scikit-learn.
- Experience programming on Azure, GCP, or another cloud service.
- Publications in top-tier conferences will be a plus.
Location
This is a remote internship. If your performance is strong, we will consider converting your role to a full-time position after six months.
Data Science Lead, Clinical Intelligence
Posted 2 days ago
Job Description
Job Description: Data Science Lead, Clinical Intelligence
Company Overview
Anervea is a pioneering AI transformation tech company delivering AI-powered SaaS solutions for the US pharma industry. Our therapy-agnostic clinical intelligence platform leverages real-world evidence (RWE) to predict patient outcomes and personalize treatments, empowering clients to optimize clinical trials and accelerate drug development.
Job Title
Data Science Lead, Clinical Intelligence
Location
Pune, India (hybrid office/remote) or fully remote within India.
Job Summary
We are seeking an experienced Data Science Lead to spearhead data operations for our clinical intelligence SaaS platform. You will lead data pipeline development, integrate computational biology insights, and ensure compliance with US pharma regulations, driving AI-powered predictions for patient outcomes across therapies (e.g., oncology, diabetes). This role is perfect for a leader with expertise in computational biology, clinical research, and scalable data systems.
Key Responsibilities
- Build and manage end-to-end data pipelines for ingesting and analyzing de-identified RWE, preclinical data, and public datasets (e.g., TCGA, ChEMBL) using Python, pandas, and cloud tools.
- Ensure data quality, privacy, and compliance with HIPAA, FDA (21 CFR Part 11), and GDPR, focusing on de-identification and bias mitigation (see the sketch after this list).
- Lead integration of computational biology (e.g., genomics, AlphaFold protein modeling) into AI models for therapy-agnostic outcome predictions.
- Collaborate with AI teams to develop predictive models (e.g., XGBoost, PyTorch) for clinical trials and personalized medicine.
- Optimize data operations for scalability and cost-efficiency, handling large, diverse health datasets.
- Oversee cross-functional teams (remote/hybrid) to troubleshoot issues, audit data, and deliver client-ready insights.
- Stay ahead of US pharma trends (e.g., RWE, precision medicine) to enhance platform capabilities.
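To illustrate the de-identification concern in the pipeline work above, here is a minimal pandas sketch that pseudonymizes a patient identifier with a salted hash and generalizes a date of birth to a year; the columns and the inline salt are simplified assumptions:

```python
import hashlib

import pandas as pd

# Toy patient-level rows standing in for an RWE feed.
df = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003"],
    "dob": ["1961-04-02", "1948-11-19", "1975-06-30"],
    "hba1c": [7.2, 8.1, 6.5],
})

SALT = "rotate-me"  # in practice a managed secret, never a literal in code

def pseudonymize(value: str) -> str:
    # One-way salted hash: records stay linkable without exposing the ID.
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:16]

deidentified = df.assign(
    patient_key=df["patient_id"].map(pseudonymize),
    birth_year=pd.to_datetime(df["dob"]).dt.year,  # generalize DOB to year
).drop(columns=["patient_id", "dob"])

print(deidentified)
```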
Qualifications and Requirements
- Master’s or PhD in Computational Biology, Bioinformatics, Data Science, or related field.
- 4+ years in data science or operations in US pharma/biotech, with expertise in clinical research (e.g., trials, RWE).
- Deep knowledge of computational biology (e.g., genomics, RDKit/AlphaFold for drug-protein interactions).
- Proficiency in Python, SQL, ETL tools (e.g., Airflow), and big data frameworks (e.g., Spark).
- Familiarity with US pharma regulations (HIPAA, FDA) and clinical trial processes (Phase 1-3).
- Experience with AI/ML for health data (e.g., scikit-learn, PyTorch).
- Based in India; open to remote or hybrid work in Pune.
- Strong leadership and communication skills for global client collaboration.
Preferred Skills
- Experience with cloud platforms (e.g., AWS, Azure) for secure data processing.
- Knowledge of SaaS platforms and API integrations for client data.
- Background in oncology or precision medicine (e.g., breast cancer outcome predictions).
- Expertise in mitigating data biases for fair AI predictions.
What We Offer
- Highly competitive salary based on experience, with performance bonuses.
- Flexible remote/hybrid work, health benefits, and learning opportunities.
- A leadership role in cutting-edge US pharma AI innovation, with travel opportunities to the US.
- Collaborative, global team environment.