Adapter details
Last updated
Overview
Metric: A metric is a column in an event data file on which aggregations are performed to derive insights.
The cQube adapter is an ETL (Extract, Transform and Load) pipeline that moves data from the adapter database into multiple CSV files after applying the required transformations. An adapter is needed because cQube expects data in a specific format; the CSV files the adapter produces can be ingested directly into cQube to generate the programs, reports and indicators.
The adapter connects to the state data source (for example, an Azure container, AWS S3 bucket, Oracle file system or MinIO bucket) and reads the zipped data file from the emission folder: emission/<date>/<file_name>.csv.
It then reads the raw data files from the data source and transforms them to generate the Dimension and Event (Fact) CSV files. The desired format and the list of output columns in the dimension and event files for each program can be found here.
Select the required columns from the report (zip file).
Split the files according to the number of metrics in the report.
Output event and dimension CSV files are stored in the input bucket (AWS S3 / MinIO / Azure) in the following formats: process_input/program/<date>/<event_name>-event.data.csv and process_input/program/<date>/<event_name>-dimension.data.csv.
NiFi runs this adapter ETL pipeline at a specific frequency so that the output CSV data is refreshed and the latest data is ingested into the system.
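The select-and-split steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the adapter's actual implementation: the function names `read_zipped_csv` and `split_by_metric` are hypothetical, the key columns are taken from the illustration in this document, and the zip archive is assumed to be already downloaded into memory as bytes.

```python
import csv
import io
import zipfile

# Assumed key columns, copied from the illustration in this document.
KEY_COLUMNS = ["date", "district_id", "block_id", "cluster_id",
               "school_id", "schoolcategory_name", "grade", "gender"]


def read_zipped_csv(zip_bytes, member_name):
    """Return the rows of one CSV inside a zip archive as a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


def split_by_metric(rows, key_columns, metric_columns):
    """Build one CSV text per metric: the key columns plus that single metric."""
    outputs = {}
    for metric in metric_columns:
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer,
            fieldnames=key_columns + [metric],
            extrasaction="ignore",  # drop columns that are not required
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows)
        outputs[metric] = buffer.getvalue()
    return outputs
```

Calling `split_by_metric(rows, KEY_COLUMNS, ["KPI-1", "KPI-2", "KPI-3"])` returns a dictionary with one CSV string per metric, each of which could then be uploaded to the input bucket by whatever storage client the deployment uses.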
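Example for illustration: an initial file with three metrics (KPI-1, KPI-2 and KPI-3) is split into three final files, one per metric. The column layouts of the initial file and the final files are listed at the end of this section.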
Any system, programming language or ETL tool can be used to develop the cQube adapter.
For example:
Python scripts can be used to extract data from the source / state database, transform it and finally export the CSV files to the AWS S3 bucket or whichever cloud storage is in use. Apache Airflow can be used to schedule the Python scripts.
Or, Apache NiFi can be used to create the end-to-end ETL pipeline.
The only requirement is that the adapter-generated CSV files have the same column names and data format as the schema. Refer to this link for a detailed explanation.
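Because the column names must match the schema exactly, an adapter might check each generated file's header before uploading it. The sketch below is one possible way to do that; `check_header` is a hypothetical helper, and the expected column list must come from the actual cQube schema for the program (the list used in the usage note is only an example). It checks column names only, not the data format of the values.

```python
import csv
import io


def check_header(csv_text, expected_columns):
    """Compare a CSV header row with the expected column list.

    Returns (missing, unexpected) as two lists; both empty means a match.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = [c for c in expected_columns if c not in header]
    unexpected = [c for c in header if c not in expected_columns]
    return missing, unexpected
```

For instance, `check_header(csv_text, ["date", "school_id", "KPI-1"])` reports a column written as `School_ID` as both unexpected and as the cause of a missing `school_id`, since the comparison is case-sensitive.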
Initial file (columns):
date | district_id | block_id | cluster_id | school_id | schoolcategory_name | grade | gender | KPI-1 | KPI-2 | KPI-3

Final file 1 (columns):
date | district_id | block_id | cluster_id | school_id | schoolcategory_name | grade | gender | KPI-1

Final file 2 (columns):
date | district_id | block_id | cluster_id | school_id | schoolcategory_name | grade | gender | KPI-2

Final file 3 (columns):
date | district_id | block_id | cluster_id | school_id | schoolcategory_name | grade | gender | KPI-3
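Each final file is then written under the output path format described earlier in this section. A small helper for building those paths might look like the following sketch. The function name `output_key` is hypothetical, the `program` path segment is kept literally as written in the path format, and the date is passed through as a string because the expected date format is not specified here.

```python
def output_key(date, event_name, kind):
    """Build the object key for an output CSV.

    kind is "event" or "dimension", matching the two file name suffixes.
    """
    if kind not in ("event", "dimension"):
        raise ValueError(f"unknown kind: {kind!r}")
    return f"process_input/program/{date}/{event_name}-{kind}.data.csv"
```

With three metrics, the adapter would call this once per final file, for example `output_key(date, name, "event")` for each event name chosen for KPI-1, KPI-2 and KPI-3.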