Portable Data Engineering Pipeline

CORDIS to Supabase Pipeline

A portable Python ETL flow that prepares cleaned CORDIS project data for the web application and loads the final web dataset into Supabase.

Python, Pandas, Parquet, Supabase, Data Cleaning, ETL
PROJECT DOCUMENT

CORDIS to Supabase Pipeline Full Document


Document Preview

This document explains the Python ETL design, data cleaning logic, validation checks, Gold-style outputs and Supabase loading process.

  • Source extraction and data preparation
  • Programme, status and country standardisation
  • Gold-style fact and dimension outputs
  • Supabase web table load and validation
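A minimal sketch of what the programme, status and country standardisation step could look like in Pandas. The column names (`programme`, `status`, `country`), the lookup tables and the `UNKNOWN` fallback are illustrative assumptions, not the mappings used in the actual pipeline.

```python
import pandas as pd

# Hypothetical lookup tables; the real pipeline's mappings are not shown here.
PROGRAMME_MAP = {
    "h2020": "H2020",
    "horizon 2020": "H2020",
    "horizon": "HORIZON",
    "horizon europe": "HORIZON",
}
STATUS_MAP = {"signed": "SIGNED", "closed": "CLOSED", "terminated": "TERMINATED"}
# EU-style country codes mapped to ISO 3166-1 alpha-2 (assumed target convention).
COUNTRY_MAP = {"EL": "GR", "UK": "GB"}


def standardise(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with programme, status and country normalised."""
    out = df.copy()

    # Normalise the lookup key, map it, and label anything unmapped explicitly
    # so that no blank values survive into the output.
    programme_key = out["programme"].fillna("").str.strip().str.lower()
    out["programme"] = programme_key.map(PROGRAMME_MAP).fillna("UNKNOWN")

    status_key = out["status"].fillna("").str.strip().str.lower()
    out["status"] = status_key.map(STATUS_MAP).fillna("UNKNOWN")

    # Country codes are upper-cased; only the listed codes are rewritten.
    country = out["country"].fillna("").str.strip().str.upper()
    out["country"] = country.replace(COUNTRY_MAP)
    return out


raw = pd.DataFrame(
    {
        "project_id": [1, 2, 3],
        "programme": [" H2020", "Horizon Europe", None],
        "status": ["signed ", "CLOSED", ""],
        "country": ["el", "UK", "de"],
    }
)
clean = standardise(raw)
```

Mapping unknown values to an explicit label rather than leaving them empty is one way to keep the final dataset free of blank programme and status values.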
82,370

Web-ready CORDIS project records validated after cleaning.

0 blanks

No blank programme or status values in the final validation output.
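A blank-value check of this kind might be written as follows. This is a sketch under assumptions: the function names `count_blanks` and `validate` and the column names are hypothetical, and the real validation step may check more than this.

```python
import pandas as pd


def count_blanks(df: pd.DataFrame, columns) -> dict:
    """Count null, empty, or whitespace-only values in each listed column."""
    result = {}
    for col in columns:
        as_text = df[col].fillna("").astype(str).str.strip()
        result[col] = int((as_text == "").sum())
    return result


def validate(df: pd.DataFrame, columns=("programme", "status")) -> dict:
    """Raise if any listed column contains blanks; otherwise return a small report."""
    blanks = count_blanks(df, columns)
    failing = {col: n for col, n in blanks.items() if n > 0}
    if failing:
        raise ValueError(f"Blank values found: {failing}")
    return {"rows": len(df), "blanks": blanks}


# Tiny illustrative frame; the real dataset holds 82,370 records.
web = pd.DataFrame(
    {"programme": ["H2020", "HORIZON"], "status": ["SIGNED", "CLOSED"]}
)
report = validate(web)
```

Failing loudly before the load step means a rebuild cannot silently push blank values into the web table.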

Portable

Designed to run outside Microsoft Fabric when needed.

Purpose

The pipeline prepares the final web dataset using Python so the data can be rebuilt and loaded again when needed.

Pipeline Flow

Source files → Extract → Transform → Gold outputs → Web dataset → Supabase load
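The stages above can be sketched as a chain of small functions. Everything named here is an assumption made for illustration: the column names, the single programme dimension, the way the web dataset is rebuilt from the fact and dimension tables, and the `writer` callable that stands in for the Supabase client call (which is not shown).

```python
import io

import pandas as pd


def extract(source) -> pd.DataFrame:
    # Read every column as text so codes and IDs are not coerced to numbers.
    return pd.read_csv(source, dtype=str)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["programme"] = out["programme"].str.strip().str.upper()
    out["status"] = out["status"].str.strip().str.upper()
    # Keep one row per project.
    return out.drop_duplicates(subset="project_id")


def build_gold(df: pd.DataFrame):
    # Dimension: one row per programme with a surrogate key.
    dim = (
        df[["programme"]]
        .drop_duplicates()
        .sort_values("programme")
        .reset_index(drop=True)
    )
    dim["programme_key"] = dim.index + 1
    # Fact: projects referencing the dimension by key.
    fact = df.merge(dim, on="programme")[["project_id", "programme_key", "status"]]
    return fact, dim


def build_web_dataset(fact: pd.DataFrame, dim: pd.DataFrame) -> pd.DataFrame:
    # Flatten fact and dimension back into one table for the web app.
    joined = fact.merge(dim, on="programme_key")
    return joined[["project_id", "programme", "status"]]


def load(df: pd.DataFrame, writer, batch_size: int = 500) -> int:
    # `writer` is a placeholder for whatever sends one batch of rows to Supabase.
    records = df.to_dict(orient="records")
    for start in range(0, len(records), batch_size):
        writer(records[start:start + batch_size])
    return len(records)


# In-memory stand-in for a source file, including one duplicate row.
SAMPLE = (
    "project_id,programme,status\n"
    "101, h2020,signed\n"
    "102,HORIZON,closed \n"
    "101, h2020,signed\n"
)

sink = []  # collects the batches instead of calling a real database
raw = extract(io.StringIO(SAMPLE))
fact, dim = build_gold(transform(raw))
web = build_web_dataset(fact, dim)
loaded = load(web, sink.extend, batch_size=1)
```

Passing the writer in as an argument keeps the flow runnable without credentials, which fits the goal of a pipeline that can be rebuilt outside any one platform.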

Key Work

Outcome

The pipeline supports the live CORDIS Research Explorer app and gives a repeatable path to rebuild the cleaned dataset.