CORDIS Fabric Data Platform
Built a Microsoft Fabric Lakehouse solution for CORDIS (Community Research and Development Information Service) European research data using a medallion architecture. The project covered source extraction, processing through the Landing, Bronze, Silver and Gold layers, audit logging, data validation and Power BI semantic modelling.
This document explains the medallion architecture design, the Fabric Lakehouse layers, the PySpark transformations, audit logging and the Power BI-ready Gold model.
- Landing, Bronze, Silver and Gold layer design
- Metadata-driven ingestion and Delta table loading
- Audit logs, validations and row-count checks
- Power BI semantic model preparation
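The metadata-driven ingestion, schema checks and row-count validation listed above can be illustrated with a small plain-Python sketch. It is not the platform's PySpark code: the `SOURCES` metadata list, the `ingest` function, the column names and the audit fields are all hypothetical, and in-memory lists stand in for Fabric Lakehouse Delta tables.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical ingestion metadata: one entry per source file. Names, columns
# and delimiters are illustrative; the real platform's control data is not shown.
SOURCES = [
    {"programme": "H2020", "delimiter": ",", "target": "bronze_h2020_project",
     "required": ["id", "acronym", "status"]},
    {"programme": "FP7", "delimiter": ",", "target": "bronze_fp7_organization",
     "required": ["projectID", "name", "country"]},
]


def ingest(meta, raw_text, bronze, audit_log):
    """Load one landing file into an in-memory Bronze table and append an audit row."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=meta["delimiter"])
    header = reader.fieldnames or []
    table = bronze.setdefault(meta["target"], [])  # create the table on first use
    before = len(table)
    missing = [col for col in meta["required"] if col not in header]
    if missing:
        # Schema check failed: count the rows but write nothing, and log the attempt.
        rows_read = sum(1 for _ in reader)
        status = "SCHEMA_MISMATCH: missing " + ", ".join(missing)
    else:
        # Tag each row with its source programme so later layers can combine them.
        rows = [dict(row, source_programme=meta["programme"]) for row in reader]
        rows_read = len(rows)
        table.extend(rows)
        # Row-count check: rows written to the target must match rows read.
        written = len(table) - before
        status = "SUCCEEDED" if written == rows_read else "ROW_COUNT_MISMATCH"
    audit_log.append({
        "target": meta["target"],
        "rows_read": rows_read,
        "rows_written": len(table) - before,
        "status": status,
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    })
    return status


# Usage with two small in-memory "landing" files.
landing_files = {
    "bronze_h2020_project": "id,acronym,status\n101,ALPHA,SIGNED\n102,BETA,CLOSED\n",
    "bronze_fp7_organization": "projectID,name\n201,Uni A\n",
}
bronze, audit_log = {}, []
for meta in SOURCES:
    ingest(meta, landing_files[meta["target"]], bronze, audit_log)
for entry in audit_log:
    print(entry["target"], entry["rows_read"], entry["rows_written"], entry["status"])
# bronze_h2020_project 2 2 SUCCEEDED
# bronze_fp7_organization 1 0 SCHEMA_MISMATCH: missing country
```

In a Fabric notebook the same loop would more likely read its metadata from a control table and write each file to a Delta table with Spark; the row-count check only becomes meaningful there, where rows written are counted from the target table rather than from the same Python list.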
Landing, Bronze, Silver and Gold lakehouse design.
Cleaned CORDIS projects prepared for reporting and search.
Fact and dimension tables for Power BI analytics.
Problem
CORDIS data is spread across multiple programmes and file formats. The goal was to create an analytics-ready data platform that could standardise project, organisation, publication, deliverable, report, IPR and policy-priority data for reporting.
Architecture
What I Built
- Metadata-driven extraction from CORDIS Horizon, H2020 and FP7 sources.
- Bronze loading with file format handling, schema checks and table creation.
- Silver transformations for standardised field names, data types, status values and country names.
- Gold semantic model using fact and dimension tables for reporting.
- Audit tables and row-count checks to make loads more reliable and easier to debug.
- Power BI-style dashboard design for executive, organisation, country and research topic views.
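The Silver standardisation and Gold fact/dimension steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: the column names (`ecMaxContribution`, `coordinatorCountry`), the `STATUS_MAP` and `COUNTRY_NAMES` lookups, and the `dim_country`/`fact_project` table names are hypothetical.

```python
import re

# Illustrative lookups only; the platform's actual mappings are not shown in the source.
STATUS_MAP = {"SIGNED": "Active", "CLOSED": "Closed", "TERMINATED": "Terminated"}
COUNTRY_NAMES = {"DE": "Germany", "FR": "France", "IE": "Ireland"}


def to_snake_case(name):
    """Standardise a column name, e.g. 'ecMaxContribution' -> 'ec_max_contribution'."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).replace(" ", "_").lower()


def to_silver(row):
    """Silver step: rename columns, cast types, standardise status and country values."""
    clean = {to_snake_case(key): value.strip() for key, value in row.items()}
    amount = clean["ec_max_contribution"]
    clean["ec_max_contribution"] = float(amount) if amount else None  # assumes '.' decimals
    clean["status"] = STATUS_MAP.get(clean["status"].upper(), "Unknown")
    code = clean.pop("coordinator_country").upper()
    clean["country_code"] = code
    clean["country_name"] = COUNTRY_NAMES.get(code, "Unknown")
    return clean


def build_gold(silver_rows):
    """Gold step: split Silver rows into a country dimension and a project fact table."""
    dim_country = {}
    for row in silver_rows:
        code = row["country_code"]
        if code not in dim_country:
            # Surrogate key assigned in order of first appearance.
            dim_country[code] = {"country_key": len(dim_country) + 1,
                                 "country_code": code,
                                 "country_name": row["country_name"]}
    fact_project = [
        {"project_id": row["id"],
         "country_key": dim_country[row["country_code"]]["country_key"],
         "status": row["status"],
         "ec_max_contribution": row["ec_max_contribution"]}
        for row in silver_rows
    ]
    return list(dim_country.values()), fact_project


bronze_projects = [
    {"id": "101", "acronym": "ALPHA", "status": "signed",
     "ecMaxContribution": "250000.50", "coordinatorCountry": "de"},
    {"id": "102", "acronym": "BETA ", "status": "CLOSED",
     "ecMaxContribution": "", "coordinatorCountry": "FR"},
    {"id": "103", "acronym": "GAMMA", "status": "SIGNED",
     "ecMaxContribution": "90000", "coordinatorCountry": "DE"},
]
silver = [to_silver(row) for row in bronze_projects]
dim_country, fact_project = build_gold(silver)
for dim_row in dim_country:
    print(dim_row["country_key"], dim_row["country_code"], dim_row["country_name"])
# 1 DE Germany
# 2 FR France
```

Keeping country names in a small dimension keyed by a surrogate `country_key` lets a Power BI model filter many fact rows through one lookup table, which is the usual reason for a star-schema Gold layer.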
Outcome
The project became the foundation for the later CORDIS Research Explorer web application and the portable CORDIS-to-Supabase ETL pipeline.