Responsibilities:

  1. Data Processing: Design, develop, and maintain scalable and efficient data processing pipelines using technologies such as Apache Spark, Hive, and Hadoop.
  2. Programming: Apply proficiency in Python, Scala, SQL, and Shell Scripting for data processing, transformation, and automation.
  3. Cloud Platform Expertise: Apply hands-on experience with Google Cloud Platform (GCP) services, including but not limited to BigQuery, BigTable, Cloud Composer, Dataflow, Google Cloud Storage, and Identity and Access Management (IAM).
  4. Version Control and CI/CD: Implement and maintain version control using Git and establish continuous integration/continuous deployment (CI/CD) pipelines for data processing workflows.
  5. Jenkins Integration: Use Jenkins to automate the building, testing, and deployment of data pipelines.
  6. Data Modeling: Work on data modeling and database design to ensure optimal storage and retrieval of data.
  7. Performance Optimization: Identify and implement performance optimization techniques for large-scale data processing.
  8. Collaboration: Collaborate with cross-functional teams, including data scientists, analysts, and other engineers, to understand data requirements and deliver solutions.
  9. Security and Networking: Apply basic knowledge of GCP Networking and GCP IAM to ensure secure and compliant data processing.
  10. Documentation: Create and maintain comprehensive documentation for data engineering processes, workflows, and infrastructure.

Mandatory Skill Set: Apache Spark, Hive, Hadoop, BigQuery, BigTable, Cloud Composer, Dataflow, Google Cloud Storage, Python, SQL, Shell Scripting, Git.

Good-to-Have Skill Set: CI/CD, Jenkins, Security and Networking, Scala, GCP Identity and Access Management (IAM).

Qualifications:

  1. Proven experience with Apache Spark, Hive, and Hadoop.
  2. Strong programming skills in Python, Scala, SQL, and Shell Scripting.
  3. Hands-on experience with GCP services, including BigQuery, BigTable, Cloud Composer, Dataflow, Google Cloud Storage, and Identity and Access Management (IAM).
  4. Familiarity with version control using Git and experience in implementing CI/CD pipelines.
  5. Experience with Jenkins for automating data pipeline processes.
  6. Basic understanding of GCP Networking.
  7. Excellent problem-solving and analytical skills.
  8. Strong communication and collaboration skills.

NucleusTeq Culture:

Our positive and supportive culture encourages our associates to do their best work every day. We celebrate individuals by recognizing their uniqueness and offering them the flexibility to make daily choices that help them be healthy, centered, confident, and aware. We offer well-being programs and continuously look for new ways to maintain a culture where our people excel and lead healthy, happy lives.