CORDIS to Supabase Pipeline
Summary: A portable Python ETL pipeline that prepares cleaned CORDIS data for the live research web app and loads the final web-ready dataset into Supabase.
82,370 web-ready project records
0 blank programme values after validation
Portable: runs outside Fabric
Objective
The pipeline was created to make the CORDIS project independent of temporary cloud trial accounts. It allows the cleaned web dataset to be rebuilt locally and loaded into Supabase whenever needed.
Pipeline Flow
- Download or reuse CORDIS source files.
- Extract and standardise programme datasets.
- Clean project, organisation, topic, country and funding fields.
- Create Gold-style fact and dimension outputs.
- Generate a web-optimised cordis_projects table.
- Validate row counts, duplicate IDs, blank statuses and unknown country rows.
- Load the final table into Supabase.
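The validation step in the flow above can be sketched as follows. This is a minimal illustration using only the standard library, not the pipeline's actual code: the function name validate_projects, the column names project_id, status, programme and country, and the "Unknown" country marker are all assumptions made for the example.

```python
from collections import Counter


def is_blank(value):
    """Treat None, empty strings and whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""


def validate_projects(rows, expected_count=None):
    """Return validation metrics for a list of web-ready project rows.

    Column names (project_id, status, programme, country) are hypothetical.
    """
    id_counts = Counter(row.get("project_id") for row in rows)
    report = {
        "row_count": len(rows),
        # IDs that appear more than once, in first-seen order
        "duplicate_ids": [pid for pid, n in id_counts.items() if n > 1],
        "blank_status_rows": sum(is_blank(r.get("status")) for r in rows),
        "blank_programme_rows": sum(is_blank(r.get("programme")) for r in rows),
        # Assumes unresolved countries are stored as the literal "Unknown"
        "unknown_country_rows": sum(
            str(r.get("country") or "").strip().lower() == "unknown" for r in rows
        ),
    }
    # The row count check is skipped when no expected count is supplied
    report["row_count_ok"] = expected_count is None or len(rows) == expected_count
    return report


# Small made-up sample: one duplicate ID, one blank status, one unknown country
rows = [
    {"project_id": "101", "status": "SIGNED", "programme": "HORIZON", "country": "DE"},
    {"project_id": "102", "status": "", "programme": "H2020", "country": "Unknown"},
    {"project_id": "101", "status": "CLOSED", "programme": "FP7", "country": "FR"},
]
report = validate_projects(rows, expected_count=3)
print(report)
```

Returning a report rather than raising on the first failure lets a run log every metric before deciding whether to proceed with the Supabase load.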
Outcome
The pipeline supports the live CORDIS Research Explorer app and gives a repeatable path to rebuild the dataset without depending on a Fabric trial account.