PTB-XL ECG Dataset — Analysis Report
Automated analysis of metadata, missing values, ECG signal quality, and clean record identification.
Dataset Structure
Folder and file tree of the PTB-XL v1.0.1 dataset.
clean_records.csv lists every ECG record that passed all quality criteria: all 12 leads present, no NaN values in any lead, no flat signals, no missing required metadata fields, and no quality flags (baseline drift, static noise, burst noise, or electrode problems). Each row contains the record's ecg_id and its strat_fold assignment for use in cross-validation splits.
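A minimal plain-Python sketch of how these criteria could be combined into one per-record check. It assumes metadata arrives as a dict and the signal as a dict mapping lead name to a list of samples. The report does not say which metadata fields are required or what counts as "flat", so REQUIRED_FIELDS, FLAT_STD_THRESHOLD, and the lead spellings are hypothetical placeholders; the four flag names follow the corresponding columns of ptbxl_database.csv, and treating any non-empty flag cell as "flagged" is an assumption.

```python
import math
from statistics import pstdev

# Standard 12-lead names (exact spelling in the signal files is an assumption).
LEADS = ["I", "II", "III", "AVR", "AVL", "AVF",
         "V1", "V2", "V3", "V4", "V5", "V6"]
# Hypothetical: the report does not list its required metadata fields.
REQUIRED_FIELDS = ("age", "sex")
# Quality-flag columns of ptbxl_database.csv.
QUALITY_FLAGS = ("baseline_drift", "static_noise", "burst_noise", "electrodes_problems")
# Hypothetical cutoff for a "flat" lead, in the signal's own units.
FLAT_STD_THRESHOLD = 1e-3


def is_missing(value):
    """Treat None, blank strings, and float NaN as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def is_clean(meta, signals):
    """Return True if one record passes every criterion.

    meta:    dict of metadata fields for the record
    signals: dict mapping lead name -> list of samples
    """
    for lead in LEADS:
        samples = signals.get(lead)
        # 1. Lead present and non-empty.
        if not samples:
            return False
        # 2. No NaN values (checked first so the std is computed on valid samples only).
        if any(math.isnan(x) for x in samples):
            return False
        # 3. Not flat: population standard deviation above the cutoff.
        if pstdev(samples) < FLAT_STD_THRESHOLD:
            return False
    # 4. Every required metadata field present.
    if any(is_missing(meta.get(field)) for field in REQUIRED_FIELDS):
        return False
    # 5. No quality flag set; a non-empty cell counts as a flag (assumption).
    return all(is_missing(meta.get(flag)) for flag in QUALITY_FLAGS)
```

Because each check returns early, a record is discarded at the first failed criterion; counting failures per criterion would need a variant that evaluates all five.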
Hosted on 🤗 Hugging Face: vlbthambawita/ecg-metadata-curated
Download clean_records.csv
Dataset Summary
Key statistics from ptbxl_database.csv and signal files.
Age Distribution
Histogram of patient ages across all recordings.
Sex Distribution
Count of female and male patients.
Height & Weight
Scatter plot of patient height vs. weight (up to 5,000 sampled points).
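The report does not say how the 5,000 points are chosen. One plausible approach, sketched here as an assumption, is to drop rows missing either height or weight and then draw a uniform random sample without replacement with a fixed seed, so the plot is reproducible. sample_height_weight and its seed parameter are hypothetical.

```python
import math
import random

MAX_POINTS = 5000  # the "up to 5,000 sampled points" from the chart description


def _present(value):
    """True unless the value is None or float NaN."""
    return value is not None and not (isinstance(value, float) and math.isnan(value))


def sample_height_weight(rows, max_points=MAX_POINTS, seed=0):
    """Return up to max_points (height, weight) pairs, skipping incomplete rows."""
    pairs = [(row["height"], row["weight"]) for row in rows
             if _present(row.get("height")) and _present(row.get("weight"))]
    if len(pairs) <= max_points:
        return pairs
    rng = random.Random(seed)  # fixed seed -> same sample on every run
    return rng.sample(pairs, max_points)
```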
Recording Site
Top recording sites by number of ECG recordings.
Stratified Fold Distribution
Number of records per cross-validation fold (1–10).
Top SCP Diagnostic Codes
Most frequent SCP-ECG diagnostic codes across all records.
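PTB-XL stores each record's scp_codes as a string-encoded Python dict mapping SCP code to likelihood, for example "{'NORM': 100.0, 'SR': 0.0}", which ast.literal_eval parses safely. This sketch counts each code once per record regardless of its likelihood; whether the report applies a likelihood cutoff is not stated, so that choice is an assumption. top_scp_codes is a hypothetical helper.

```python
import ast
from collections import Counter


def top_scp_codes(scp_code_strings, n=10):
    """Return the n most common SCP codes as (code, record_count) pairs."""
    counts = Counter()
    for text in scp_code_strings:
        # "{'NORM': 100.0, 'SR': 0.0}" -> {'NORM': 100.0, 'SR': 0.0}
        codes = ast.literal_eval(text)
        # Count each code once per record, ignoring its likelihood value.
        counts.update(codes.keys())
    return counts.most_common(n)
```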
Diagnostic Class Distribution
High-level diagnostic classes assigned to records.
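Diagnostic classes are derived from the SCP codes rather than stored directly per record. A sketch, assuming a code-to-class lookup such as the diagnostic_class column of scp_statements.csv: the four-entry CODE_TO_CLASS below is an illustrative excerpt, not the full table. Codes without an entry (for example the rhythm code SR) contribute no class, and a record with several diagnostic codes counts toward each of its classes, so class totals can exceed the number of records.

```python
import ast
from collections import Counter

# Illustrative excerpt only; a full mapping would be built from scp_statements.csv.
CODE_TO_CLASS = {"NORM": "NORM", "IMI": "MI", "LVH": "HYP", "CLBBB": "CD"}


def diagnostic_classes(scp_codes_str, code_to_class=CODE_TO_CLASS):
    """Return the set of high-level classes for one record's scp_codes string."""
    codes = ast.literal_eval(scp_codes_str)
    # Codes missing from the mapping (non-diagnostic statements) are skipped.
    return {code_to_class[code] for code in codes if code in code_to_class}


def class_distribution(scp_code_strings, code_to_class=CODE_TO_CLASS):
    """Count records per class; one record may add to several classes."""
    counts = Counter()
    for text in scp_code_strings:
        counts.update(diagnostic_classes(text, code_to_class))
    return counts
```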
Heart Axis
Distribution of heart axis categories.
Quality Flags
Number of records flagged for each signal quality issue.
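In ptbxl_database.csv the four quality columns contain a short text note when an issue was recorded and are empty otherwise. The sketch below treats any non-empty cell as a flag, which is an assumption about how the report counts them. It accepts row dicts from csv.DictReader (blank cells are empty strings) or from pandas (blank cells are NaN); count_quality_flags is a hypothetical helper.

```python
import math

QUALITY_FLAGS = ("baseline_drift", "static_noise", "burst_noise", "electrodes_problems")


def flag_is_set(value):
    """A flag counts as set when its cell is non-empty (assumption)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return False
    return str(value).strip() != ""


def count_quality_flags(rows):
    """Count, for each quality flag, how many rows have it set."""
    counts = dict.fromkeys(QUALITY_FLAGS, 0)
    for row in rows:
        for flag in QUALITY_FLAGS:
            if flag_is_set(row.get(flag)):
                counts[flag] += 1
    return counts
```

For the full table, the rows could come from csv.DictReader over ptbxl_database.csv. A record with several flags is counted once under each of them.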
Missing Metadata
Percentage of records missing each metadata field.
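Per-field missingness is the number of rows missing that field divided by the total number of rows, times 100. This minimal sketch treats None, blank strings, and NaN as missing so it works with both csv and pandas readers; which fields to check is left to the caller, and missing_percentages is a hypothetical name.

```python
import math


def is_missing(value):
    """Treat None, blank strings, and float NaN as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def missing_percentages(rows, fields):
    """Return {field: percent of rows where the field is missing}."""
    rows = list(rows)
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: 100.0 * sum(is_missing(row.get(field)) for row in rows) / len(rows)
        for field in fields
    }
```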
Missing ECG Leads
Percentage of records where each ECG lead is absent or contains NaN values.
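The same ratio applies per lead on the signal side. This sketch assumes each record has already been loaded into a dict mapping lead name to a list of samples, and counts a lead as missing if it is absent, empty, or contains any NaN. The record layout and the LEADS spelling are assumptions; lead_missing_percentages is hypothetical.

```python
import math

# Standard 12-lead names (exact spelling in the signal files is an assumption).
LEADS = ["I", "II", "III", "AVR", "AVL", "AVF",
         "V1", "V2", "V3", "V4", "V5", "V6"]


def lead_missing_percentages(records, leads=LEADS):
    """Percent of records (a list of dicts) in which each lead is absent, empty, or has NaN."""
    counts = dict.fromkeys(leads, 0)
    for record in records:
        for lead in leads:
            samples = record.get(lead)
            if not samples or any(math.isnan(x) for x in samples):
                counts[lead] += 1
    n = len(records)
    return {lead: (100.0 * c / n if n else 0.0) for lead, c in counts.items()}
```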
Flat Signal Leads
Percentage of records where each lead has a flat (near-zero std) signal.
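Flatness can be judged from each lead's population standard deviation. The report only says "near-zero", so FLAT_STD_THRESHOLD below is a hypothetical value in the signal's own units, and is_flat and flat_lead_percentages are hypothetical helpers.

```python
import math
from statistics import pstdev

# Hypothetical cutoff for a "flat" lead, in the signal's own units.
FLAT_STD_THRESHOLD = 1e-3


def is_flat(samples, threshold=FLAT_STD_THRESHOLD):
    """True if a lead's standard deviation is below threshold.

    Empty leads and leads containing NaN return False here; they are
    left to the missing/NaN check instead.
    """
    if not samples or any(math.isnan(x) for x in samples):
        return False
    return pstdev(samples) < threshold


def flat_lead_percentages(records, leads, threshold=FLAT_STD_THRESHOLD):
    """Percent of records (a list of dicts) in which each lead is flat."""
    n = len(records)
    return {
        lead: (100.0 * sum(is_flat(r.get(lead, []), threshold) for r in records) / n
               if n else 0.0)
        for lead in leads
    }
```

Skipping empty and NaN-containing leads is a design choice: it keeps a single defect from being reported in both the missing/NaN chart and the flat-signal chart.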
Missing / NaN Leads Heatmap
Missing or NaN values per record and per lead, shown for a sample of records. Blue = lead present; red = lead missing or containing NaN values.
Clean Records per Fold
Number of clean vs discarded records in each stratified fold.
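Given each record's strat_fold from ptbxl_database.csv and the set of ecg_id values in clean_records.csv, the per-fold comparison reduces to a grouped count. A minimal sketch; clean_counts_per_fold and its input shapes are hypothetical.

```python
from collections import defaultdict


def clean_counts_per_fold(records, clean_ids):
    """records: iterable of (ecg_id, strat_fold) pairs; clean_ids: set of passing ecg_ids.

    Returns {fold: {"clean": n, "discarded": m}}, sorted by fold number.
    """
    counts = defaultdict(lambda: {"clean": 0, "discarded": 0})
    for ecg_id, fold in records:
        key = "clean" if ecg_id in clean_ids else "discarded"
        counts[fold][key] += 1
    return dict(sorted(counts.items()))
```

Comparing the two counts fold by fold shows whether the quality filter removes records unevenly, which would shift the balance the stratified folds were built to preserve.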