Overview:
The Data Engineer is responsible for the creation, maintenance, and continuous improvement of data pipelines. This includes implementing best practices in data management (i.e., cleaning, validation, and transformation of data) and turning raw data into usable datasets that other teams can easily consume.
What you will do:
- Work on existing data pipelines, including the development of data models and data management, whether in a data warehouse, data/delta lake, or lakehouse. Collaborate with upstream teams that pass data into the data architecture and with downstream teams that use the data within it.
- Help maintain the overall data architecture, ensuring scalability, high availability, on-time data ingestion, and uninterrupted operations.
- Build data pipelines as new data comes in, applying best practices and DataOps principles.
- Acquire and maintain in-depth domain knowledge of the data within the assigned scope. This domain knowledge is crucial for designing and developing data models for Zone 2 and Zone 3 data (a.k.a. the silver and gold layers, respectively), and it ensures that DE-transformed business datasets are usable by downstream teams.
What we are looking for:
- BS or MS in Computer Science or equivalent
- Proven experience in Data Engineering roles focusing on Data Architecture, Data Management, and DataOps practices
- Has relevant experience in data modeling
- Has good working knowledge of shell scripting (e.g., bash, zsh)
- Has good working knowledge of data manipulation (SQL statements, JSON, NoSQL queries, etc.)
- Good working knowledge of AWS services (EC2, S3, Glue Crawlers and Jobs, Batch, Athena, Lambda, etc.) or equivalent cloud offerings is a big plus
- Has good working knowledge of Apache Spark using SQL/Python
- Has a good understanding of data warehouse, data lake/Delta Lake, and/or lakehouse concepts
- Has good knowledge of Linux/Unix administration
- Able to work with other Leads to foster a culture of collaboration and teamwork
- CI/CD experience using Terraform is a huge advantage
- Experience working with Amazon Web Services / Cloud is a big plus