PTB-XL+ Dataset Analysis Report

Source: PTB-XL+: A Comprehensive Electrocardiographic Feature Dataset (available on physionet.org; see also the accompanying paper)

Dataset Summary

High-level counts across all data sources in PTB-XL+ v1.0.1.

Record Coverage by Source

Number of ECG records available in each data source, compared with the canonical RECORDS file.

Records per data source
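A minimal sketch of the coverage comparison. It assumes each source's record IDs have already been collected into a set of integers (for example, PTB-XL `ecg_id` values read from a feature table or parsed from file names). The helper name `coverage_vs_records` and the sample IDs are hypothetical and are not real PTB-XL+ counts.

```python
from typing import Dict, Set


def coverage_vs_records(canonical: Set[int], sources: Dict[str, Set[int]]) -> Dict[str, dict]:
    """Compare each source's record IDs with the canonical RECORDS list.

    Hypothetical helper: returns counts of records present, missing, and
    unexpected (in the source but not in RECORDS), plus percent coverage.
    """
    report = {}
    for name, ids in sources.items():
        present = ids & canonical
        report[name] = {
            "present": len(present),
            "missing": len(canonical - ids),  # listed in RECORDS, absent from source
            "extra": len(ids - canonical),    # in source, not listed in RECORDS
            "pct": 100.0 * len(present) / len(canonical) if canonical else 0.0,
        }
    return report


# Toy IDs for illustration only; these are not real PTB-XL+ counts.
canonical = {1, 2, 3, 4, 5}
sources = {"12SL": {1, 2, 3, 4, 5}, "ECGDeli": {1, 2, 4}, "UNIG": {1, 2, 3, 6}}
for name, row in coverage_vs_records(canonical, sources).items():
    print(name, row)
```

Reporting "extra" alongside "missing" separates a source that lacks records from one that contains IDs the RECORDS file does not list.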

Feature Missing Values

Percentage of missing values per feature, by algorithm source. Features with more than 5% missing values are flagged.

12SL — top features by missing %
ECGDeli — top features by missing %
UNIG — top features by missing %
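One way to compute the per-feature missing percentage and the 5% flag, sketched with the standard library. It assumes each algorithm's feature table is loaded as a list of dicts (one per record) in which missing values are `None` or NaN. The feature names `P_dur` and `QRS_dur` are hypothetical placeholders, not PTB-XL+ column names.

```python
import math
from typing import Dict, List, Optional, Tuple

THRESHOLD_PCT = 5.0  # flagging threshold used in the report


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_report(
    rows: List[Dict[str, Optional[float]]], threshold: float = THRESHOLD_PCT
) -> List[Tuple[str, float, bool]]:
    """Return (feature, missing %, flagged) tuples, highest missing % first."""
    features = sorted({key for row in rows for key in row})
    n = len(rows)
    result = []
    for feat in features:
        # A feature key absent from a row also counts as missing.
        n_missing = sum(1 for row in rows if is_missing(row.get(feat)))
        pct = 100.0 * n_missing / n if n else 0.0
        result.append((feat, pct, pct > threshold))  # strictly above the threshold
    return sorted(result, key=lambda item: item[1], reverse=True)


# Toy table: 20 records, hypothetical feature names.
rows = [{"P_dur": 100.0, "QRS_dur": 90.0} for _ in range(20)]
rows[0]["P_dur"] = None
rows[1]["P_dur"] = float("nan")
rows[2]["QRS_dur"] = None
for feat, pct, flagged in missing_report(rows):
    print(f"{feat}: {pct:.1f}% missing, flagged={flagged}")
```

With a strict comparison, a feature missing in exactly 5% of records is not flagged; the report does not say which convention it uses.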

Key Interval & Amplitude Statistics

Summary statistics for clinically important ECG features (P/QRS/T durations, PR/QT intervals, amplitudes) across all records.

12SL feature statistics
ECGDeli feature statistics
UNIG feature statistics
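The summary statistics can be sketched per feature with Python's `statistics` module, dropping missing entries first. Which statistics the report actually shows is not stated; the set below (count, mean, sample standard deviation, min, quartiles, max) is an assumption, and the QRS durations are made-up values in milliseconds.

```python
import math
import statistics
from typing import Dict, Iterable, Optional


def summarize(values: Iterable[Optional[float]]) -> Dict[str, float]:
    """Summary statistics for one feature, ignoring None/NaN entries."""
    clean = [v for v in values if v is not None and not math.isnan(v)]
    if len(clean) < 2:
        raise ValueError("need at least two non-missing values")
    # Quartiles via linear interpolation ("inclusive" method, Python 3.8+).
    q1, median, q3 = statistics.quantiles(clean, n=4, method="inclusive")
    return {
        "n": len(clean),
        "mean": statistics.fmean(clean),
        "std": statistics.stdev(clean),  # sample standard deviation
        "min": min(clean),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(clean),
    }


# Toy QRS durations in ms (made up), with one missing value.
print(summarize([80.0, 90.0, None, 100.0, 110.0, 120.0]))
```

The "inclusive" quantile method interpolates linearly between order statistics, which is the same convention as NumPy's default percentile.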

PTB-XL Label Frequency (SCP Codes)

Most common SCP diagnostic codes from cardiologist annotations in PTB-XL.

Top SCP codes by record count
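In PTB-XL's `ptbxl_database.csv`, each record's `scp_codes` field is stored as a text dict literal that maps SCP codes to a likelihood. The sketch below counts records per code from those strings. The report does not say whether statements with likelihood 0 are included, so the `min_likelihood` parameter (a hypothetical name) defaults to counting all of them. The rows are invented examples in the PTB-XL format.

```python
import ast
from collections import Counter
from typing import Iterable


def scp_code_counts(scp_code_strings: Iterable[str], min_likelihood: float = 0.0) -> Counter:
    """Count records per SCP code from PTB-XL's `scp_codes` column.

    Each entry is a dict literal stored as text, e.g. "{'NORM': 100.0, 'SR': 0.0}".
    Dict keys are unique, so each code is counted at most once per record.
    """
    counts = Counter()
    for text in scp_code_strings:
        codes = ast.literal_eval(text)  # safely parse the dict literal
        counts.update(code for code, lik in codes.items() if lik >= min_likelihood)
    return counts


# Invented rows in the PTB-XL format, not real records.
rows = [
    "{'NORM': 100.0, 'SR': 0.0}",
    "{'IMI': 50.0, 'SR': 0.0}",
    "{'NORM': 80.0, 'SR': 0.0}",
]
print(scp_code_counts(rows).most_common(2))
print(scp_code_counts(rows, min_likelihood=50.0).most_common(2))
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for parsing this column.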

12SL Algorithm Statement Frequency

Most common automated diagnostic statements from the 12SL algorithm.

Top 12SL statements by record count
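Counting "by record count" means a statement repeated within one record should count once. A minimal sketch, assuming the 12SL output has been flattened into hypothetical `(record_id, statement)` pairs; the statement strings below are illustrative and are not taken from the actual 12SL vocabulary.

```python
from collections import Counter
from typing import Iterable, Tuple


def statement_record_counts(pairs: Iterable[Tuple[int, str]]) -> Counter:
    """Count distinct records per statement from (record_id, statement) pairs.

    Duplicate pairs (the same statement repeated within one record) are
    collapsed first, so each record contributes at most once per statement.
    """
    unique_pairs = set(pairs)
    return Counter(statement for _, statement in unique_pairs)


# Hypothetical long-format rows; record 1 repeats a statement.
pairs = [
    (1, "sinus rhythm"),
    (1, "sinus rhythm"),
    (1, "left axis deviation"),
    (2, "sinus rhythm"),
    (3, "atrial fibrillation"),
]
for statement, n in statement_record_counts(pairs).most_common():
    print(f"{n:>3}  {statement}")
```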

Label Co-occurrence

How often pairs of labels appear together in the same record, restricted to the most frequent labels.

PTB-XL SCP code co-occurrence heatmap
12SL statement co-occurrence heatmap
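The heatmaps can be built from a square co-occurrence matrix over the most frequent labels. For a binary record-by-label matrix X, this matrix equals XᵀX: off-diagonal cells count records containing both labels, and the diagonal holds each label's own record count. The sketch below computes it directly from per-record label sets. `top_k` is a hypothetical parameter, and the records use real PTB-XL SCP codes but are made up.

```python
from collections import Counter
from typing import List, Set, Tuple


def cooccurrence_matrix(records: List[Set[str]], top_k: int = 10) -> Tuple[List[str], List[List[int]]]:
    """Co-occurrence counts among the top_k most frequent labels.

    matrix[i][j] is the number of records containing both labels i and j;
    the diagonal holds each label's own record count (X^T X for binary X).
    """
    label_counts = Counter(label for labels in records for label in labels)
    top = [label for label, _ in label_counts.most_common(top_k)]
    index = {label: i for i, label in enumerate(top)}
    matrix = [[0] * len(top) for _ in top]
    for labels in records:
        # Labels outside the top_k are ignored.
        present = [index[label] for label in labels if label in index]
        for i in present:
            for j in present:
                matrix[i][j] += 1
    return top, matrix


# Made-up records; each record is a set, so labels are not double-counted.
records = [{"NORM", "SR"}, {"IMI", "SR"}, {"NORM", "SR"}, {"AFIB"}]
labels, matrix = cooccurrence_matrix(records, top_k=2)
print(labels)
for row in matrix:
    print(row)
```

Dividing each row by its diagonal entry gives conditional frequencies (how often label j appears given label i), which is often easier to read on a heatmap than raw counts.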

SNOMED CT Coverage

How many SNOMED CT concepts from the dataset description file are covered by each label source.

SNOMED concept coverage
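Coverage reduces to a set intersection between the concepts listed in the description file and the concepts each label source maps to. This sketch assumes each source's mapping is available as a `{label: concept_id}` dict. The function name and the placeholder concept IDs are hypothetical; real SNOMED CT identifiers are numeric codes.

```python
from typing import Dict, Set, Tuple


def snomed_coverage(
    described: Set[str], source_maps: Dict[str, Dict[str, str]]
) -> Dict[str, Tuple[int, float]]:
    """For each label source, count described SNOMED CT concepts it maps to.

    `source_maps` maps a source name to {source label: SNOMED CT concept ID}.
    Returns (covered count, percent of described concepts) per source.
    """
    result = {}
    for source, mapping in source_maps.items():
        covered = set(mapping.values()) & described  # concepts outside `described` are ignored
        pct = 100.0 * len(covered) / len(described) if described else 0.0
        result[source] = (len(covered), pct)
    return result


# Placeholder concept IDs; real SNOMED CT IDs are numeric codes.
described = {"C1", "C2", "C3", "C4"}
source_maps = {
    "PTB-XL": {"NORM": "C1", "IMI": "C2"},
    "12SL": {"sinus rhythm": "C1", "atrial fibrillation": "C3", "LVH": "C4", "other": "C9"},
}
print(snomed_coverage(described, source_maps))
```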

Fiducial Point Coverage (ECGDeli)

Number of records with ECGDeli delineation annotations per lead.

Records with .atr annotation files per lead
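Per-lead coverage can be counted from annotation file names. The actual ECGDeli file layout in PTB-XL+ is not described here, so the `<record>_<lead>.atr` naming scheme and `ATR_PATTERN` below are hypothetical and would need to be adapted. Distinct records are counted per lead, so a file listed twice is not double-counted.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical naming scheme "<record>_<lead>.atr"; adjust to the real layout.
ATR_PATTERN = re.compile(r"(?P<record>\d+)_(?P<lead>[A-Za-z0-9]+)\.atr")


def records_per_lead(file_names: Iterable[str], pattern: re.Pattern = ATR_PATTERN) -> Dict[str, int]:
    """Count distinct records with an annotation file for each lead."""
    seen = defaultdict(set)
    for name in file_names:
        match = pattern.fullmatch(name)
        if match:  # non-matching files (e.g. notes, headers) are skipped
            seen[match["lead"]].add(match["record"])
    return {lead: len(recs) for lead, recs in sorted(seen.items())}


names = [
    "00001_I.atr", "00001_II.atr", "00002_I.atr",
    "00002_I.atr", "00002_aVR.atr", "notes.txt",
]
print(records_per_lead(names))
```

In practice, the names might come from `p.name for p in Path(root).rglob("*.atr")` (with `from pathlib import Path`), where `root` is wherever the ECGDeli annotations were downloaded. If the lead is encoded in a directory name instead, the pattern would need to match the relative path.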

Median Beat Coverage

Number of records with median beat waveform files (.dat/.hea) for each algorithm.

12SL median beats
UNIG median beats
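The .dat/.hea pair is the WFDB convention: a record needs both its header and its signal file to be readable. A minimal sketch that counts only complete pairs within one algorithm's directory. The `_medians` file naming is a hypothetical example.

```python
from collections import defaultdict
from pathlib import PurePath
from typing import Iterable, List

REQUIRED = {".dat", ".hea"}  # a WFDB record needs both the signal and the header


def complete_records(file_names: Iterable[str]) -> List[str]:
    """Return record names (file stems) that have both a .dat and a .hea file."""
    suffixes = defaultdict(set)
    for name in file_names:
        path = PurePath(name)
        suffixes[path.stem].add(path.suffix)
    return sorted(stem for stem, found in suffixes.items() if REQUIRED <= found)


# Hypothetical file names from one algorithm's median-beat directory.
names = [
    "00001_medians.dat", "00001_medians.hea",
    "00002_medians.hea",  # header only: not counted
    "00003_medians.dat", "00003_medians.hea",
]
print(complete_records(names))
```

Records that pass this check can then be loaded with a WFDB reader such as `rdrecord` from the `wfdb` Python package.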

Cross-Source Label Agreement

Agreement between PTB-XL cardiologist annotations and 12SL algorithm statements, compared after mapping both to shared SNOMED CT concepts. "Both" means the concept was assigned by both sources to the same record.

Top SNOMED concepts — agreement breakdown
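A per-concept breakdown into "both", "PTB-XL only", and "12SL only" can be sketched as follows, assuming each source has been reduced to a dict from record ID to its set of SNOMED CT concepts. Only records present in both dicts are compared, so a record missing from one source is not counted as a disagreement; records with no mapped concepts should therefore be included with an empty set. The report does not name a summary metric, so the Jaccard index (records flagged by both, divided by records flagged by either) is an added illustration. Concept IDs are placeholders.

```python
from collections import Counter, defaultdict
from typing import Dict, Set


def agreement_breakdown(
    ptbxl: Dict[int, Set[str]], sl12: Dict[int, Set[str]]
) -> Dict[str, Counter]:
    """Per SNOMED concept, count records flagged by both sources or by only one.

    Only records present in both dictionaries are compared.
    """
    breakdown = defaultdict(Counter)
    for rec in ptbxl.keys() & sl12.keys():
        a, b = ptbxl[rec], sl12[rec]
        for concept in a & b:
            breakdown[concept]["both"] += 1
        for concept in a - b:
            breakdown[concept]["ptbxl_only"] += 1
        for concept in b - a:
            breakdown[concept]["12sl_only"] += 1
    return dict(breakdown)


def jaccard(counts: Counter) -> float:
    """Records flagged by both sources / records flagged by either source."""
    total = counts["both"] + counts["ptbxl_only"] + counts["12sl_only"]
    return counts["both"] / total if total else 0.0


# Placeholder concepts; records 3 and 4 appear in only one source and are skipped.
ptbxl = {1: {"C1"}, 2: {"C1", "C2"}, 3: {"C2"}}
sl12 = {1: {"C1"}, 2: {"C2"}, 4: {"C1"}}
for concept, counts in sorted(agreement_breakdown(ptbxl, sl12).items()):
    print(concept, dict(counts), round(jaccard(counts), 2))
```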