Sunbird cQube
S3 Partitioning

S3 bucket partitioning

Data files in the S3 buckets are stored in partitions (folder hierarchies). Partitions are created in the S3 emission bucket, the S3 input bucket and the S3 output bucket.

S3 emission bucket partitions

  • Data files in the S3 emission bucket are CSV files, emitted inside zip archives.
  • The S3 emission bucket uses a folder hierarchy based on the data source.
S3 -> Bucket name -> Data source -> emitted zip files with timestamp
  • Each emitted zip file contains timestamped CSV data files and a timestamped manifest file.
  • Folders and files are removed from the S3 emission bucket once NiFi has copied the data.
  • Unprocessed data files remain in the S3 emission bucket for one week and are then deleted automatically.
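As a rough illustration of the emission layout and the one-week retention, the Python sketch below builds an object key under a data-source folder and checks whether an unprocessed object has passed the retention window. The zip naming pattern and all function names are hypothetical. The text does not say how cQube performs the automatic deletion, so the S3 lifecycle rule shown is only one possible way to get the same effect, not the documented mechanism.

```python
from datetime import datetime, timedelta, timezone

# One-week retention for unprocessed files, as stated in the text.
RETENTION = timedelta(days=7)


def emission_key(data_source: str, emitted_at: datetime) -> str:
    """Build a key of the form <data source>/<timestamped zip file>.

    Hypothetical naming: <data source>_<YYYYMMDDHHMMSS>.zip. This is an
    illustrative choice, not the documented cQube file name format.
    """
    return f"{data_source}/{data_source}_{emitted_at:%Y%m%d%H%M%S}.zip"


def is_expired(last_modified: datetime, now: datetime) -> bool:
    """True once an unprocessed object has been stored longer than RETENTION."""
    return now - last_modified > RETENTION


def lifecycle_rule(data_source: str) -> dict:
    """S3 lifecycle configuration expiring objects under one data-source
    prefix after RETENTION.days days.

    The dict has the shape accepted by boto3's
    put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...).
    Using a lifecycle rule here is an assumption, not cQube's confirmed setup.
    """
    return {
        "Rules": [
            {
                "ID": f"expire-{data_source}",
                "Filter": {"Prefix": f"{data_source}/"},
                "Status": "Enabled",
                "Expiration": {"Days": RETENTION.days},
            }
        ]
    }


# Usage: key for a student_attendance zip emitted at 2020-05-29 10:30 UTC.
ts = datetime(2020, 5, 29, 10, 30, tzinfo=timezone.utc)
print(emission_key("student_attendance", ts))
```

S3 lifecycle expiration counts object age in whole days and rounds to the next midnight UTC, so a rule like this may delete objects slightly after the exact seven-day mark.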

S3 input bucket partitions

  • All files in the S3 input bucket are in CSV format.
  • The S3 input bucket is partitioned hierarchically by data source, year, month, date and timestamp.
S3 -> Bucket name -> Data source -> Year -> Year-Month -> Date_Data source
Example path in the S3 input bucket:
S3/cqube-gj-input/student_attendance/2020/2020-05/2020-05-29_student_attendance
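The input bucket hierarchy can be expressed as a small path builder and parser. This sketch assumes the key inside the bucket starts at the data source folder, treating the leading "S3/" in the example as notation rather than part of the key. The helper names input_key and parse_input_key are hypothetical, and file names below the date folder are not covered.

```python
from datetime import date


def input_key(data_source: str, day: date) -> str:
    """Build <data source>/<YYYY>/<YYYY-MM>/<YYYY-MM-DD>_<data source>."""
    return f"{data_source}/{day:%Y}/{day:%Y-%m}/{day:%Y-%m-%d}_{data_source}"


def parse_input_key(key: str):
    """Split a partition path back into (data source, date).

    Raises ValueError if the year, year-month and date folders disagree,
    or if the leaf folder names a different data source.
    """
    data_source, year, year_month, leaf = key.split("/")
    # The date part contains no underscore, so the first "_" separates the
    # date from the data source name (which may itself contain underscores).
    day_str, _, leaf_source = leaf.partition("_")
    day = date.fromisoformat(day_str)
    if (year != f"{day:%Y}" or year_month != f"{day:%Y-%m}"
            or leaf_source != data_source):
        raise ValueError(f"inconsistent partition path: {key}")
    return data_source, day


key = input_key("student_attendance", date(2020, 5, 29))
print("S3/cqube-gj-input/" + key)
```

Prefixed with the bucket name cqube-gj-input, the generated key matches the example path above: S3/cqube-gj-input/student_attendance/2020/2020-05/2020-05-29_student_attendance.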

S3 output bucket partitions

  • All files in the S3 output bucket are in JSON format.
  • The S3 output bucket is partitioned hierarchically by data source, year, month, date and timestamp, the same scheme the S3 input bucket uses.
  • Metadata files record which output files were updated most recently, so that cQube uses the latest output files during the visualization stage.
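Because the output bucket reuses the input hierarchy, the same prefix builder applies; what differs is selecting the newest output file from the metadata. The text does not specify the metadata format, so this sketch assumes a hypothetical JSON list of entries with "key" and "updated_at" (ISO 8601) fields. The function names and the output file names are illustrative only.

```python
import json
from datetime import date, datetime


def output_prefix(data_source: str, day: date) -> str:
    """Same hierarchy as the input bucket:
    <data source>/<YYYY>/<YYYY-MM>/<YYYY-MM-DD>_<data source>."""
    return f"{data_source}/{day:%Y}/{day:%Y-%m}/{day:%Y-%m-%d}_{data_source}"


def latest_output(metadata_json: str) -> str:
    """Return the key of the most recently updated output file.

    Assumes a hypothetical metadata layout: a JSON list of objects, each
    with a "key" and an "updated_at" ISO 8601 timestamp (all naive or all
    timezone-aware, so they can be compared).
    """
    entries = json.loads(metadata_json)
    newest = max(entries, key=lambda e: datetime.fromisoformat(e["updated_at"]))
    return newest["key"]


# Usage with two hypothetical output files on consecutive days.
metadata = json.dumps([
    {"key": output_prefix("student_attendance", date(2020, 5, 29)) + "/a.json",
     "updated_at": "2020-05-29T10:00:00"},
    {"key": output_prefix("student_attendance", date(2020, 5, 30)) + "/b.json",
     "updated_at": "2020-05-30T09:00:00"},
])
print(latest_output(metadata))
```

Keeping the "latest" decision in a small metadata file means the visualization stage can read one object instead of listing every partition to find the newest output.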