Mindera
We are looking for an experienced Data Engineer with 5 to 7+ years of relevant experience to join our dynamic team. The ideal candidate will have a strong background in big data technologies, ETL/ELT processes, and data modeling. This role will focus on building and optimizing data pipelines, ensuring data quality, and supporting our analytics initiatives.
Key Responsibilities:
- Design, develop, and maintain scalable ETL/ELT pipelines using PySpark and Databricks to facilitate data ingestion and processing.
- Implement and enhance data streaming solutions for real-time data processing.
- Improve Spark job performance through memory management, partitioning strategies, and efficient data storage formats.
- Collaborate with data scientists and analysts to gather data requirements and provide reliable datasets for analysis.
- Create and refine complex SQL queries for data extraction, transformation, and analysis.
- Maintain data quality and integrity through automated testing and validation methods.
- Document data workflows and maintain metadata for governance purposes.
- Research and adopt new data engineering technologies and methods to enhance efficiency and scalability.
Mandatory Skills:
- PySpark: Proficient in using PySpark for data processing and ETL workflows.
- Azure Databricks: Experience with the Databricks platform, including cluster setup and management.
- Data Streaming: Knowledge of streaming data processing with frameworks such as Spark Streaming.
- Python: Strong programming skills in Python for scripting and automation tasks.
- SQL: Advanced skills in SQL for querying and managing relational databases.
- Spark Optimization: Experience in optimizing Spark applications for enhanced performance.
Optional Skills:
- Snowflake: Familiarity with Snowflake for data warehousing and query optimization.
- Cloud Platforms: Understanding of cloud services (AWS, Azure, GCP) for data storage and processing.
- ETL/ELT Concepts: Knowledge of ETL/ELT processes, data modeling, and data warehousing best practices.
- Big Data Tools: Familiarity with tools and frameworks such as Kafka, Hadoop, and Hive.
- CI/CD Practices: Understanding of CI/CD for automated deployment and version control using tools such as Git and Jenkins.