Location: Indore/Raipur

Experience: 2 to 4 years

Key Responsibilities:

  • Design, develop, and manage scalable data pipelines and ETL workflows using Databricks, PySpark, and SQL for large-scale data processing.
  • Build and maintain data ingestion frameworks to extract data from enterprise systems such as SAP APIs, REST services, and relational databases.
  • Develop and optimize Delta Lake-based data architecture to ensure reliable, high-performance data storage and processing.
  • Design and implement data transformation pipelines to convert raw data into curated datasets for analytics and reporting.
  • Optimize Spark jobs and SQL queries to improve performance and reduce compute costs.
  • Implement data quality validation, monitoring, and error handling frameworks for reliable pipeline execution.
  • Build automated workflow orchestration and scheduling mechanisms for end-to-end data processing pipelines.
  • Collaborate with data analysts, business stakeholders, and platform teams to design efficient data solutions.
  • Develop and maintain data models and schema design for data lake and downstream analytical systems.
  • Support data platform engineering activities, including cluster configuration, performance tuning, and reusable utility development.
  • Troubleshoot production pipeline failures, data inconsistencies, and performance issues.
  • Develop Python utilities and frameworks to support data ingestion, transformation, and automation tasks.
  • Implement data governance, security, and access control standards across enterprise data pipelines.
  • Participate in code reviews and documentation, and help establish best practices that raise overall data engineering standards.
  • Support large-scale data integrations and migrations from legacy systems to modern cloud data platforms.
  • Take ownership of the entire data pipeline lifecycle, from development to deployment.

Required Skills:

  • 2+ years of experience in data engineering, data pipeline development, and data processing.
  • Strong experience with Python, PySpark, and SQL for large-scale data transformations.
  • Hands-on experience with Databricks, Delta Lake, and distributed data processing frameworks.
  • Experience integrating data from REST APIs, SAP systems, and other enterprise data sources.
  • Strong knowledge of data modeling, schema design, and ETL best practices.
  • Experience working with cloud data platforms (GCP / AWS / Azure) and cloud storage systems.
  • Experience with workflow orchestration, job scheduling, and automated data pipelines.
  • Ability to optimize Spark workloads and troubleshoot performance issues when processing large datasets.
  • Strong problem-solving skills and ability to work in fast-paced data platform environments.