PTB-XL

Open Completed

Quick facts

Format: 12-lead · 10 s · 500 Hz (also 100 Hz)
Patients: 18,869
Records: 21,799
Leads: 12
License: CC BY 4.0
Origin: Physikalisch-Technische Bundesanstalt, Germany

Overview

PTB-XL is a large, publicly available 12-lead ECG dataset of 21,799 clinical records from 18,869 patients, collected at the Physikalisch-Technische Bundesanstalt (PTB) between October 1989 and June 1996. Each record is 10 seconds long and is provided at both 500 Hz and 100 Hz. Annotations draw on 71 distinct SCP-ECG statements, and a record can carry several of them. The diagnostic statements are grouped into 5 superclasses (NORM, MI, STTC, CD, HYP).
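
To make the superclass grouping concrete, here is a minimal sketch of mapping one record's SCP statement dictionary to superclasses. The mapping table is a small hand-picked subset for illustration only, not the full statement set, and `to_superclasses` is a hypothetical helper rather than part of ECGBench. In the PTB-XL release the complete mapping is distributed in scp_statements.csv.

```python
# Illustrative subset of the SCP-statement -> superclass mapping.
# Statements without a diagnostic superclass (e.g. the rhythm statement "SR")
# are simply absent from the table.
SCP_TO_SUPERCLASS = {
    "NORM": "NORM",
    "IMI": "MI",
    "ASMI": "MI",
    "NDT": "STTC",
    "CLBBB": "CD",
    "LVH": "HYP",
}


def to_superclasses(scp_codes):
    """Return the sorted, de-duplicated superclasses for one record.

    `scp_codes` is a dict of SCP statement -> likelihood, as in the
    per-record annotations. Likelihood values are ignored here.
    """
    found = {SCP_TO_SUPERCLASS[code] for code in scp_codes if code in SCP_TO_SUPERCLASS}
    return sorted(found)


# Toy record: two MI statements collapse into one MI label.
record = {"IMI": 100.0, "ASMI": 50.0, "CLBBB": 100.0, "SR": 0.0}
print(to_superclasses(record))  # -> ['CD', 'MI']
```

A record with no diagnostic statement yields an empty list, so downstream code should not assume every record has at least one superclass.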

ECGBench bundles a deterministic 10-fold stratified patient-level split derived from the SCP superclass labels, ready to consume via the ECGDataset class.
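
The sketch below illustrates what a patient-level fold split implies in practice. It assumes folds 1-8 form the training split (as in the loading example on this page) and, following the convention commonly used with PTB-XL, that fold 9 is validation and fold 10 is test. That last assignment is an assumption, as are the helper names and the toy data; ECGBench's internal implementation may differ.

```python
from collections import Counter, defaultdict


def split_for_fold(fold):
    """Map a fold number (1-10) to a split name under the assumed convention."""
    if not 1 <= fold <= 10:
        raise ValueError(f"fold must be in 1..10, got {fold}")
    if fold <= 8:
        return "train"
    return "val" if fold == 9 else "test"


def patients_in_multiple_folds(records):
    """Return patient IDs that appear in more than one fold (should be empty)."""
    folds_per_patient = defaultdict(set)
    for patient_id, fold in records:
        folds_per_patient[patient_id].add(fold)
    return sorted(pid for pid, folds in folds_per_patient.items() if len(folds) > 1)


# Toy (patient_id, fold) pairs; patient 101 has two records in the same fold.
records = [(101, 1), (101, 1), (102, 9), (103, 10), (104, 4)]

split_sizes = Counter(split_for_fold(fold) for _, fold in records)
print(dict(split_sizes))                    # -> {'train': 3, 'val': 1, 'test': 1}
print(patients_in_multiple_folds(records))  # -> []
```

Keeping all records of a patient in one fold prevents the same person from appearing in both training and evaluation data.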

Diagnostic superclass breakdown

Superclass  Description             Records
NORM        Normal ECG              9,514
MI          Myocardial Infarction   5,486
STTC        ST/T changes            5,250
CD          Conduction disturbance  4,907
HYP         Hypertrophy             2,655
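
The per-class counts add up to more than the 21,799 records, so some records necessarily belong to more than one superclass. The short check below, using only the numbers from the table, makes that explicit and computes each class's share of all records. The variable names are illustrative.

```python
# Record counts per superclass, copied from the table above.
counts = {"NORM": 9_514, "MI": 5_486, "STTC": 5_250, "CD": 4_907, "HYP": 2_655}
n_records = 21_799

total_labels = sum(counts.values())
print(total_labels)              # -> 27812
print(total_labels > n_records)  # -> True, so the labels overlap

# Share of all records carrying each superclass (shares need not sum to 100%).
prevalence = {name: n / n_records for name, n in counts.items()}
for name, share in prevalence.items():
    print(f"{name}: {share:.1%}")
```

Because the task is multi-label, metrics and loss functions should treat each superclass as an independent binary target rather than as one of five mutually exclusive classes.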

Loading with ECGBench

from ecgbench import ECGDataset, ecg_collate_fn
from torch.utils.data import DataLoader

# Load the training split (folds 1-8) at 100 Hz
dataset = ECGDataset(
    physionet_path="/path/to/ptb-xl/1.0.3/",
    dataset_name="ptbxl",
    split="train",
    frequency="100",
)

loader = DataLoader(dataset, batch_size=32, collate_fn=ecg_collate_fn)

for batch in loader:
    signals = batch["signal"]     # (B, 12, 1000) at 100 Hz
    ecg_ids = batch["ecg_id"]     # list of record IDs
    labels  = batch["scp_codes"]  # list of per-record SCP dicts
    break
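
Since `batch["scp_codes"]` is a list of per-record dicts, a model typically needs a fixed-length target instead. The following is a minimal sketch of multi-hot encoding over the five superclasses, assuming the superclass names have already been derived for each record. `multi_hot` and the class order are hypothetical choices, not part of ECGBench.

```python
# Fixed class order; the position in this list is the target index.
SUPERCLASSES = ["NORM", "MI", "STTC", "CD", "HYP"]


def multi_hot(superclasses):
    """Encode a list of superclass names as a 5-element 0/1 vector."""
    present = set(superclasses)
    unknown = present - set(SUPERCLASSES)
    if unknown:
        raise ValueError(f"unknown superclasses: {sorted(unknown)}")
    return [1.0 if name in present else 0.0 for name in SUPERCLASSES]


# Toy batch: one normal record, one with two labels, one with none.
batch_labels = [["NORM"], ["MI", "CD"], []]
targets = [multi_hot(labels) for labels in batch_labels]
print(targets[1])  # -> [0.0, 1.0, 0.0, 1.0, 0.0]
```

The resulting nested list can be converted with `torch.tensor(targets)` to a `(B, 5)` float tensor for use with a binary cross-entropy loss.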

Inspecting the catalogue entry

from ecgbench import get_dataset

entry = get_dataset("ptb-xl")
print(entry.patients, entry.records, entry.access)
# -> 18,869 21,799 open