PTB-XL ECG Dataset — Analysis Report

Automated analysis of metadata, missing values, ECG signal quality, and clean record identification.

Dataset Structure

Folder and file tree of the PTB-XL v1.0.1 dataset.

Figure: PTB-XL Directory Tree (legend: folders, CSV files, DAT binary signal files, HEA WFDB header files, Python scripts, text files).
Clean Records CSV

Lists every ECG record that passed all quality criteria: all 12 leads present, no NaN values in any lead, no flat signals, no missing required metadata fields, and no quality flags (baseline drift, static noise, burst noise, or electrode problems). Each row contains the ecg_id and its strat_fold assignment for use in cross-validation splits.
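The filtering rule above can be sketched in plain Python as follows. This is a minimal illustration, not the report's actual code. It assumes each record is a metadata dict (one row of ptbxl_database.csv) plus a signal given as a list of samples, each holding 12 lead values. It also assumes the four quality columns (baseline_drift, static_noise, burst_noise, electrodes_problems) are empty when no problem was noted. The required-field list `REQUIRED_FIELDS` and the flatness threshold `FLAT_STD` are hypothetical, because the report does not state them.

```python
import csv
import io
import math
import statistics

N_LEADS = 12
QUALITY_FLAGS = ["baseline_drift", "static_noise", "burst_noise", "electrodes_problems"]
# Hypothetical: the report does not say which metadata fields are "required".
REQUIRED_FIELDS = ["age", "sex", "strat_fold"]
# Hypothetical threshold for a "near-zero" standard deviation.
FLAT_STD = 1e-6


def is_missing(value):
    """Treat None, NaN and empty strings as missing."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def is_clean(meta, signal):
    """meta: dict of metadata; signal: list of samples, each a list of 12 lead values."""
    # All 12 leads must be present in every sample.
    if not signal or any(len(row) != N_LEADS for row in signal):
        return False
    for lead in range(N_LEADS):
        values = [row[lead] for row in signal]
        if any(math.isnan(v) for v in values):  # no NaN values
            return False
        if statistics.pstdev(values) < FLAT_STD:  # no flat leads
            return False
    if any(is_missing(meta.get(f)) for f in REQUIRED_FIELDS):
        return False
    # Assumed: a quality column is empty unless a problem was noted.
    return all(is_missing(meta.get(f)) for f in QUALITY_FLAGS)


def write_clean_csv(records, out):
    """Write ecg_id and strat_fold for every clean (meta, signal) pair."""
    writer = csv.writer(out)
    writer.writerow(["ecg_id", "strat_fold"])
    for meta, signal in records:
        if is_clean(meta, signal):
            writer.writerow([meta["ecg_id"], meta["strat_fold"]])
```

With real data, `signal` could come from `wfdb.rdsamp(path)[0].tolist()`. Converting to a list matters here, because the `if not signal` check is not valid on a NumPy array.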

Hosted on Hugging Face (vlbthambawita/ecg-metadata-curated) as clean_records.csv.

Dataset Summary

Key statistics from ptbxl_database.csv and signal files.

Table: Dataset Summary.

Age Distribution

Histogram of patient ages across all recordings.
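A histogram like this one needs a binning rule. The sketch below uses 10-year bins and skips missing ages. Both choices are assumptions, since the report does not give its bin width. The PTB-XL documentation states that ages above 89 are encoded as 300 for HIPAA compliance. For that reason, this sketch places every age at or above a hypothetical `cap` of 90 into a single top bin instead of plotting 300 as an outlier.

```python
import math
from collections import Counter


def age_histogram(ages, bin_width=10, cap=90):
    """Count ages in fixed-width bins keyed by bin start; ages >= cap share one top bin."""
    bins = Counter()
    for age in ages:
        if age is None or math.isnan(age):  # skip missing ages
            continue
        start = cap if age >= cap else int(age // bin_width) * bin_width
        bins[start] += 1
    return dict(sorted(bins.items()))
```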

Figure: Age Distribution. Table: Age Statistics.

Sex Distribution

Count of female and male patients.

Figure: Sex Distribution. Table: Sex Statistics.

Height & Weight

Scatter plot of patient height (cm) vs. weight (kg), using up to 5,000 sampled points.
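The report does not say how the 5,000 points are chosen. One reasonable approach, sketched here as an assumption, keeps only rows that have both height and weight and then draws a uniform random sample without replacement. A fixed seed makes the plot reproducible. The function name and the `seed` parameter are hypothetical.

```python
import math
import random


def sample_height_weight(rows, max_points=5000, seed=0):
    """Keep rows with both height and weight, then draw at most max_points without replacement."""
    pairs = [
        (r["height"], r["weight"])
        for r in rows
        if r.get("height") is not None
        and r.get("weight") is not None
        and not math.isnan(r["height"])
        and not math.isnan(r["weight"])
    ]
    if len(pairs) <= max_points:
        return pairs
    rng = random.Random(seed)  # fixed seed so repeated runs plot the same points
    return rng.sample(pairs, max_points)
```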

Figure: Height vs. Weight. Table: Height & Weight Statistics.

Recording Site

Top recording sites by number of ECG recordings.

Figure: Recording Site Distribution. Table: Site Statistics.

Stratified Fold Distribution

Number of records per cross-validation fold (1–10).

Figure: Stratified Fold Distribution. Table: Fold Statistics.

Top SCP Diagnostic Codes

Most frequent SCP-ECG diagnostic codes across all records.
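In ptbxl_database.csv, the scp_codes column holds a Python-style dict literal that maps each code to a likelihood, for example `{'NORM': 100.0, 'SR': 0.0}`. The minimal sketch below counts each code once per record and ignores the likelihood values. That counting rule is an assumption, because the report does not say whether low-likelihood codes are excluded.

```python
import ast
from collections import Counter


def count_scp_codes(scp_code_strings, top_n=10):
    """Return the top_n most frequent SCP codes, counting each code once per record."""
    counts = Counter()
    for text in scp_code_strings:
        codes = ast.literal_eval(text)  # "{'NORM': 100.0, 'SR': 0.0}" -> dict
        # Pass the keys, not the dict: Counter.update(dict) would add the likelihoods.
        counts.update(codes.keys())
    return counts.most_common(top_n)
```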

Figure: Top SCP Codes. Table: SCP Code Statistics.

Diagnostic Class Distribution

High-level diagnostic classes assigned to records.
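A common way to derive high-level classes is to map each SCP code to the diagnostic_class column of scp_statements.csv, using only rows marked as diagnostic. This sketch assumes that approach. The small `code_to_class` mapping in the usage example is illustrative and is not the full table. A record can belong to several classes, and codes without a class (such as the rhythm code SR) are ignored.

```python
import ast
from collections import Counter


def diagnostic_classes(scp_codes_text, code_to_class):
    """Return the set of diagnostic classes for one record's scp_codes string."""
    codes = ast.literal_eval(scp_codes_text)
    return {code_to_class[c] for c in codes if c in code_to_class}


def class_distribution(scp_code_strings, code_to_class):
    """Count records per diagnostic class; a multi-class record counts once per class."""
    counts = Counter()
    for text in scp_code_strings:
        counts.update(diagnostic_classes(text, code_to_class))
    return counts
```

For example, with `code_to_class = {"NORM": "NORM", "IMI": "MI", "LVH": "HYP"}`, the record `"{'IMI': 50.0, 'LVH': 100.0}"` maps to `{"MI", "HYP"}`.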

Figure: Diagnostic Class Distribution. Table: Diagnostic Class Statistics.

Heart Axis

Distribution of heart axis categories.

Figure: Heart Axis Distribution. Table: Heart Axis Statistics.

Quality Flags

Number of records flagged for each signal quality issue.
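The PTB-XL quality columns store a free-text note (for example, the affected leads) when a problem was observed. This sketch assumes that a column counts as flagged whenever it holds a non-empty value, and as unflagged when it is None, NaN, or blank.

```python
import math

QUALITY_FLAGS = ["baseline_drift", "static_noise", "burst_noise", "electrodes_problems"]


def is_flagged(value):
    """A quality column is flagged when it holds a non-empty note (not None, NaN or blank)."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return str(value).strip() != ""


def count_quality_flags(rows):
    """Number of rows flagged for each quality issue."""
    return {flag: sum(is_flagged(r.get(flag)) for r in rows) for flag in QUALITY_FLAGS}
```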

Figure: Quality Flag Counts. Table: Quality Flag Statistics.

Missing Metadata

Percentage of records missing each metadata field.
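The percentage shown for each field equals 100 times the number of records missing that field, divided by the total number of records. The sketch below treats None, NaN, empty strings, and absent keys as missing. That definition is an assumption about how the report counts missing values.

```python
import math


def missing_percentages(rows, fields):
    """Percentage of rows where each field is absent, None, NaN or an empty string."""
    if not rows:
        raise ValueError("no rows to summarise")

    def missing(v):
        return v is None or v == "" or (isinstance(v, float) and math.isnan(v))

    n = len(rows)
    return {f: 100.0 * sum(missing(r.get(f)) for r in rows) / n for f in fields}
```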

Figure: Missing Metadata (%). Table: Missing Metadata Statistics.

Missing ECG Leads

Percentage of records where each ECG lead is absent or contains NaN values.
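Here a lead counts as missing when it is absent from the record, empty, or contains at least one NaN sample. The sketch represents each record as a dict that maps lead names to sample lists. Both the representation and the lead-name spelling in `LEADS` are assumptions made for illustration.

```python
import math

LEADS = ["I", "II", "III", "AVR", "AVL", "AVF", "V1", "V2", "V3", "V4", "V5", "V6"]


def lead_missing(record, lead):
    """True if the lead is absent, empty, or contains any NaN sample."""
    samples = record.get(lead)
    return samples is None or len(samples) == 0 or any(math.isnan(x) for x in samples)


def missing_lead_percentages(records):
    """Percentage of records in which each lead is missing or contains NaN."""
    n = len(records)
    return {lead: 100.0 * sum(lead_missing(r, lead) for r in records) / n for lead in LEADS}
```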

Figure: Missing / NaN Leads (%). Table: Missing Lead Statistics.

Flat Signal Leads

Percentage of records where each lead has a flat signal (near-zero standard deviation).
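Flatness can be tested by comparing each lead's population standard deviation with a small threshold. The report says only "near-zero", so `FLAT_STD = 1e-3` here is hypothetical (PTB-XL signals are in millivolts, which makes this about 1 µV). Absent leads are left to the missing-lead check and are not counted as flat. A lead that contains NaN yields a NaN deviation, which fails the comparison, so this check does not count it as flat.

```python
import statistics

LEADS = ["I", "II", "III", "AVR", "AVL", "AVF", "V1", "V2", "V3", "V4", "V5", "V6"]
FLAT_STD = 1e-3  # hypothetical threshold in mV


def is_flat(samples, threshold=FLAT_STD):
    """A lead is flat if it has fewer than two samples or almost no variation."""
    return len(samples) < 2 or statistics.pstdev(samples) < threshold


def flat_lead_percentages(records, threshold=FLAT_STD):
    """Percentage of records in which each lead is present but flat."""
    n = len(records)
    return {
        lead: 100.0 * sum(is_flat(r[lead], threshold) for r in records if lead in r) / n
        for lead in LEADS
    }
```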

Figure: Flat Leads (%). Table: Flat Lead Statistics.

Missing / NaN Leads Heatmap

Per-record × per-lead presence of missing or NaN values (sampled records). Blue = present, Red = missing/NaN.
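The heatmap is a binary matrix with one row per sampled record and one column per lead. The sketch below builds that matrix with 1 for a present lead and 0 for a missing or NaN lead, and leaves rendering (blue/red coloring) to any plotting library. The sample size `n_sample` and the fixed seed are assumptions, since the report does not state how many records are sampled.

```python
import math
import random

LEADS = ["I", "II", "III", "AVR", "AVL", "AVF", "V1", "V2", "V3", "V4", "V5", "V6"]


def presence_matrix(records, n_sample=100, seed=0):
    """Rows: sampled records; columns: leads. 1 = present, 0 = missing or NaN."""
    rng = random.Random(seed)
    picked = rng.sample(records, min(n_sample, len(records)))
    matrix = []
    for rec in picked:
        row = []
        for lead in LEADS:
            samples = rec.get(lead)
            ok = bool(samples) and not any(math.isnan(x) for x in samples)
            row.append(1 if ok else 0)
        matrix.append(row)
    return matrix
```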

Figure: Missing Leads Heatmap.

Clean Records per Fold

Number of clean vs discarded records in each stratified fold.
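Once each record has a clean/discarded decision, the per-fold counts are a simple group-by on strat_fold. This minimal sketch takes `(strat_fold, is_clean)` pairs, a hypothetical input format, and returns `(clean, discarded)` counts for each fold.

```python
from collections import defaultdict


def clean_per_fold(pairs):
    """pairs: iterable of (strat_fold, is_clean) -> {fold: (clean, discarded)}."""
    counts = defaultdict(lambda: [0, 0])
    for fold, clean in pairs:
        counts[fold][0 if clean else 1] += 1
    return {fold: tuple(c) for fold, c in sorted(counts.items())}
```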

Figure: Clean Records per Fold. Table: Clean Record Statistics.