Job Description
We are seeking an experienced Databricks/Data Engineer to enhance our existing data framework and build an end-to-end ingestion workflow for unstructured and semi-structured data. This is the first phase of a larger project, with the potential for long-term engagement.
Project Scope
- Build a data ingestion workflow in Databricks for (see the first sketch after this list):
  - CSV files (semi-structured)
  - PDF files (unstructured/semi-structured)
- Implement mapping, normalization, and transformation logic for the incoming data.
- Set up an end-to-end record-processing workflow that ensures data consistency and integrity.
- Automate data ingestion (see the second sketch after this list):
  - Detect file drops in designated locations
  - Trigger notifications/alerts after successful or failed ingestion
- Collaborate on designing scalable pipelines for future phases of the project.
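To give candidates a concrete sense of the Phase 1 scope, here is a minimal sketch of the CSV leg of the workflow using Databricks Auto Loader with Delta Lake, including a token mapping/normalization step. It assumes a Databricks notebook or job (where `spark` is predefined); the landing path, checkpoint path, and table name are hypothetical placeholders, not our actual environment. PDFs could be landed the same way by setting `cloudFiles.format` to `binaryFile` and parsing the documents downstream.

```python
# Runs in a Databricks notebook/job, where `spark` is predefined.
from pyspark.sql import functions as F

landing_path = "/mnt/landing/csv/"          # hypothetical file-drop location
checkpoint_path = "/mnt/checkpoints/csv/"   # hypothetical checkpoint/schema path
bronze_table = "bronze.ingested_records"    # hypothetical Delta target

# Auto Loader ("cloudFiles") incrementally detects new files dropped in the
# landing location, so no hand-rolled polling logic is needed.
raw = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .load(landing_path)
)

# Example mapping/normalization step: snake_case the column names and stamp
# each record with its source file and ingestion time for lineage checks.
normalized = (
    raw.select([F.col(c).alias(c.strip().lower().replace(" ", "_")) for c in raw.columns])
    .withColumn("_source_file", F.input_file_name())
    .withColumn("_ingested_at", F.current_timestamp())
)

# Stream into Delta; the checkpoint gives exactly-once file processing, which
# underpins the consistency/integrity requirement at the storage layer.
(
    normalized.writeStream.format("delta")
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)   # process everything new, then stop (batch-style runs)
    .toTable(bronze_table)
)
```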
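And a second minimal sketch of the alerting piece, assuming a hypothetical incoming-webhook URL (Slack/Teams style) and a `run_ingestion` wrapper standing in for the workflow above; both names are illustrative only.

```python
import json
import traceback
import urllib.request

WEBHOOK_URL = "https://hooks.example.com/ingestion-alerts"  # hypothetical endpoint


def notify(status: str, detail: str) -> None:
    """POST a short status message to the alert channel."""
    payload = json.dumps({"text": f"Ingestion {status}: {detail}"}).encode("utf-8")
    req = urllib.request.Request(
        WEBHOOK_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=10)


def run_ingestion() -> None:
    # Placeholder for the Auto Loader workflow sketched above.
    ...


try:
    run_ingestion()
    notify("succeeded", "all newly dropped files processed")
except Exception:
    notify("failed", traceback.format_exc(limit=1))
    raise  # re-raise so the Databricks job run is still marked as failed
```

For simple cases, Databricks Workflows can also send success/failure notifications natively with no code; a coded hook like this is only needed when the alert must carry custom context.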
Candidate Requirements
- Proven experience with Databricks (PySpark, Delta Lake, or Spark SQL).
- Experience handling semi-structured and unstructured data (CSV, PDF, JSON, XML, etc.).
- Knowledge of data mapping, normalization, and ETL pipelines.
- Experience with workflow automation and notification triggers.
- Strong problem-solving skills and ability to work independently.
- Familiarity with version control tools (Git) and Agile processes is a plus.
Engagement
- Phase 1: Build and automate the ingestion workflow
- Long-term: Opportunity to contribute to broader project initiatives