Reading WFDB data into Power BI

Daniel Sepulveda Estay, PhD
6 min read · Jan 16, 2023

WFDB (WaveForm DataBase) is a file format used for storing physiological signals such as electrocardiograms (ECGs) and blood pressure recordings. The format is supported by the PhysioNet WFDB software library and tools, which are widely used in biomedical research and education. A record typically consists of two files: a header file (.hea) that contains information about the signals, such as the number of channels, sampling frequency, and gain, and a data file (.dat) that contains the actual signal data, often stored as 16-bit integers.
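To show what the header file looks like in practice, here is a minimal sketch of a parser for the simplest header layout: a record line (record name, number of signals, sampling frequency, number of samples) followed by one line per signal. The function name parse_header and the example header text are illustrative assumptions, and real headers can carry extra fields that this sketch ignores. The wfdb package handles all of this for you.

```python
def parse_header(text):
    """Parse the record line and signal lines of a simple WFDB header."""
    # Drop blank lines and comment lines, which start with '#'
    lines = [ln.strip() for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]

    # Record line: record name, number of signals, sampling frequency, samples
    rec = lines[0].split()
    header = {
        "record_name": rec[0],
        "n_sig": int(rec[1]),
        "fs": float(rec[2]) if len(rec) > 2 else None,
        "sig_len": int(rec[3]) if len(rec) > 3 else None,
        "signals": [],
    }

    # Signal lines: file name, storage format, gain, ..., description last
    for ln in lines[1:1 + header["n_sig"]]:
        parts = ln.split(maxsplit=8)
        header["signals"].append({
            "file_name": parts[0],
            "fmt": parts[1],
            "gain": parts[2],  # kept as text: it may include baseline and units
            "description": parts[8] if len(parts) > 8 else "",
        })
    return header


# Illustrative header text, written in the style of a two-channel ECG record
example = """100 2 360 650000
100.dat 212 200 11 1024 995 -22131 0 MLII
100.dat 212 200 11 1024 1011 20052 0 V5
"""

info = parse_header(example)
print(info["n_sig"], info["fs"], [s["description"] for s in info["signals"]])
```

The gain is left as a string on purpose, since in real headers it can be written together with a baseline and a unit.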

The two files of a record share a base name and differ only in their extension: “.hea” for the header file and, typically, “.dat” for the data file. They are stored separately, and the data file cannot be interpreted on its own: the header supplies the number of channels, the sampling frequency, and the gain needed to turn the stored integers into physical values.
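As an illustration of how the header information is applied to the data file, the sketch below decodes 16-bit samples by hand and scales them with a gain. It assumes WFDB format 16 (little-endian two's-complement integers, interleaved sample by sample across the signals). The function names decode_format16 and to_physical and the sample values are made up for this example, and other storage formats need different decoding.

```python
import struct


def decode_format16(raw, n_sig):
    """Decode interleaved little-endian 16-bit samples into one row per sample."""
    count = len(raw) // 2
    values = struct.unpack("<%dh" % count, raw[:2 * count])
    n_frames = count // n_sig
    return [list(values[k * n_sig:(k + 1) * n_sig]) for k in range(n_frames)]


def to_physical(digital, gain, baseline=0):
    """Convert a stored integer to physical units using the header's gain."""
    return (digital - baseline) / gain


# Two samples of a hypothetical two-channel recording
raw = struct.pack("<4h", 100, -200, 300, 400)
frames = decode_format16(raw, n_sig=2)
print(frames)                             # [[100, -200], [300, 400]]
print(to_physical(frames[0][0], 200.0))   # 0.5
```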

When reading or writing WFDB files using the wfdb package, you typically only need to provide the base name of the file (without the extension) and the package will automatically look for the corresponding header and data files. For example, if you have a WFDB file called "example", the package will look for "example.hea" as the header file and "example.dat" as the data file.
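Since the package expects the base name, a small helper can strip the extension when all you have is a path to one of the two files. This is a sketch using only the standard library; record_base and companion_files are hypothetical helper names, not part of the wfdb package.

```python
from pathlib import Path


def record_base(path):
    """Return the path without a trailing .hea or .dat extension."""
    p = Path(path)
    if p.suffix.lower() in {".hea", ".dat"}:
        p = p.with_suffix("")
    return str(p)


def companion_files(path):
    """Return the header and data file names that belong to a record."""
    base = record_base(path)
    return base + ".hea", base + ".dat"


print(record_base("example.dat"))      # example
print(companion_files("example.hea"))  # ('example.hea', 'example.dat')
```

Only the two known extensions are removed, so a record name that itself contains a dot is left untouched.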

Reading WFDB Data

The following is Python code that shows how to use the wfdb package to read a WFDB file and convert it to a CSV file:

import wfdb
import csv

# Read the WFDB record; `rdsamp` returns the samples and a dictionary of metadata
signals, fields = wfdb.rdsamp('your_file_name')

# Open a new CSV file for writing
with open('your_file_name.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)

    # Write the header row, using the signal names stored in the header file
    writer.writerow(['time'] + fields['sig_name'])

    # Write one row per sample; the time in seconds is the sample index divided by fs
    for i, row in enumerate(signals):
        writer.writerow([i / fields['fs']] + list(row))

This code reads a WFDB record named 'your_file_name' using the rdsamp function from the wfdb package, which returns the samples as an array (one row per sample, one column per signal) together with a dictionary of header fields. It then writes the data to a new CSV file, 'your_file_name.csv', with the time in seconds as the first column and one column per signal. The column names come from fields['sig_name'], so nothing needs to be hardcoded as signal1, signal2, and so on.

Note that rdsamp returns every signal in the record, so you may need to modify the code to select the specific signals you want.

Reading each channel

The rdsamp function from the wfdb package has a parameter called channels that you can use to select specific signals. This parameter takes a list of integers, where each integer corresponds to the index of the signal you want to read. For example, to read the first and third signals from a multi-channel WFDB file, you can use the following code:

signals, fields = wfdb.rdsamp('your_file_name', channels=[0, 2])

Alternatively, if you want to select signals by name, recent versions of the wfdb package accept a channel_names parameter, which takes a list of signal names. You can pass it to the rdrecord function, which, unlike rdsamp, returns a Record object containing all the metadata of the record, including the names of the signals. The samples are then available in physical units through the p_signal attribute of the Record object.

record = wfdb.rdrecord('your_file_name',
                       channel_names=['signal_name_1', 'signal_name_2'])

This reads only the signals 'signal_name_1' and 'signal_name_2'. The p_signal attribute then holds one column per selected signal, and you can write its rows to the CSV file using writerow as in the previous example.

It’s worth noting that you can also use the sig_name attribute of the Record object to get the names of all the signals in the record and use it to filter the signals you want to read.
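The filtering described above can be sketched as a lookup from names to channel indices. The helper name channel_indices and the signal names in the example are assumptions for illustration; in real use the list would come from the record's sig_name.

```python
def channel_indices(sig_names, wanted):
    """Map signal names to their positions in the record."""
    missing = [name for name in wanted if name not in sig_names]
    if missing:
        raise ValueError(f"Signals not in record: {missing}")
    return [sig_names.index(name) for name in wanted]


# Hypothetical list of names, as it would be found in sig_name
sig_names = ["MLII", "V1", "V5"]
print(channel_indices(sig_names, ["MLII", "V5"]))  # [0, 2]
```

The resulting list of integers is in the form that the channels parameter expects.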

Implementation using an app window

Here’s some sample Python code that demonstrates how to use the wfdb package to read a WFDB file chosen in a file dialog, list the available channels, and plot the selected channel's waveform.

import os
import wfdb
import matplotlib.pyplot as plt
from tkinter import Tk
from tkinter.filedialog import askopenfilename

# Open a file dialog to select the header file of the WFDB record
root = Tk()
root.withdraw()
filepath = askopenfilename(title="Select WFDB file",
                           filetypes=[("WFDB header", "*.hea")])

# wfdb expects the record name without its extension
record_name = os.path.splitext(filepath)[0]

# Read the record using the `rdrecord` function, which returns a Record object
record = wfdb.rdrecord(record_name)

# Get the names of the signals
sig_names = record.sig_name

# Print the names of the signals
print("Available Channels:")
for i, name in enumerate(sig_names):
    print(i, name)

# Ask the user to select a channel
selected_channel = int(input("Enter the number of the channel you want to view: "))

# Plot the selected channel's waveform
# (p_signal has one row per sample and one column per channel)
plt.plot(record.p_signal[:, selected_channel])
plt.title(sig_names[selected_channel])
plt.xlabel('Sample index')
plt.ylabel('Amplitude')
plt.show()

This code uses the askopenfilename function from the tkinter.filedialog module to open a file dialog in which the user selects the header file of a WFDB record. It strips the extension from the selected path, because the wfdb package expects the base name, and reads the record with the rdrecord function. The sig_name attribute of the returned Record object provides the names of the signals. The code then prints the available channels and asks the user to select one. Finally, it plots the selected channel's waveform using the matplotlib package.

Note that this code plots a single channel at a time. If you want to view several leads of a multi-lead record together, you will need to modify it, for example by using subplots to show multiple channels in the same window.

Using WFDB data with multiple lead signals

Here’s some sample Python code that shows how to use the wfdb package to read a WFDB file and convert it to a unified CSV file that contains data for all the channels and patients:

import csv
import os
import wfdb

# Folder path that contains all the WFDB files
folder_path = 'your_folder_path'

# Create a new CSV file for writing
with open('unified.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)

    # Write the header row to the CSV file
    writer.writerow(['patient', 'channel', 'time', 'signal'])

    # Iterate through all the header files in the folder
    for filename in os.listdir(folder_path):
        if filename.endswith(".hea"):
            # The record name is the file name without its extension
            patient = os.path.splitext(filename)[0]
            # Read the WFDB record using the `rdsamp` function
            signals, fields = wfdb.rdsamp(os.path.join(folder_path, patient))
            # Write the data to the CSV file, one row per channel and sample
            for i, channel in enumerate(fields['sig_name']):
                for j, value in enumerate(signals[:, i]):
                    writer.writerow([patient, channel, j / fields['fs'], value])

This code reads all the WFDB records in a given folder, 'your_folder_path', using the rdsamp function from the wfdb package and then writes the patient, channel, time and signal data to a new CSV file, 'unified.csv'. It uses a nested loop to iterate through all the channels of each record and through all the samples of each channel. The time is given in seconds from the start of the record, computed as the sample index divided by the sampling frequency.

Note that this layout writes one row per sample and per channel, so the file can become large for long multi-lead recordings. The CSV file can then be imported into Power BI for further analysis.

Reading all signals from a multi-lead WFDB File

Here is the same approach with the loop over the signals made explicit. It reads all the signals from each multi-lead WFDB record and writes them to the CSV file:

import csv
import os
import wfdb

# Folder path that contains all the WFDB files
folder_path = 'your_folder_path'

# Create a new CSV file for writing
with open('unified.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)

    # Write the header row to the CSV file
    writer.writerow(['patient', 'channel', 'time', 'signal'])

    # Iterate through all the header files in the folder
    for filename in os.listdir(folder_path):
        if filename.endswith(".hea"):
            # The record name is the file name without its extension
            patient = os.path.splitext(filename)[0]
            # Read the WFDB record using the `rdsamp` function
            signals, fields = wfdb.rdsamp(os.path.join(folder_path, patient))

            # Iterate through all the signals in the record
            for i, channel in enumerate(fields['sig_name']):
                # Column i of the array holds every sample of this signal
                for j, value in enumerate(signals[:, i]):
                    writer.writerow([patient, channel, j / fields['fs'], value])

This code reads all the signals of each multi-lead WFDB record by iterating over the signal names in an outer loop and over the samples of each signal in an inner loop, writing the patient, channel, time and signal value to the CSV file.

Additionally, it uses the splitext function of the os.path module to extract the patient name from the filename, which is used as the first column in the CSV file.

Make sure to check the format of the time column. Power BI might not be able to read it as a date type, so you might need to convert it to a string or datetime format. You can also use the signal names stored in the record's sig_name field as the column names in the CSV file.
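One way to produce a time column in a datetime format is to add the elapsed time of each sample to a known start time and write the result as an ISO 8601 string. The sketch below assumes you know when the recording started. The function name sample_timestamps and the start value are hypothetical, and whether Power BI detects the column as a date/time type should be checked in your own report.

```python
from datetime import datetime, timedelta


def sample_timestamps(n_samples, fs, start):
    """Return one ISO 8601 timestamp string per sample."""
    return [(start + timedelta(seconds=i / fs)).isoformat()
            for i in range(n_samples)]


# Hypothetical recording start; 2 samples per second for readability
start = datetime(2023, 1, 16, 8, 0, 0)
print(sample_timestamps(3, 2.0, start))
# ['2023-01-16T08:00:00', '2023-01-16T08:00:00.500000', '2023-01-16T08:00:01']
```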

Exporting the names of the signals

import csv
import os
import wfdb

# Folder path that contains all the WFDB files
folder_path = 'your_folder_path'

# Create a new CSV file for writing
with open('unified.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    header_written = False

    # Iterate through all the header files in the folder
    for filename in os.listdir(folder_path):
        if filename.endswith(".hea"):
            # The record name is the file name without its extension
            patient = os.path.splitext(filename)[0]
            # Read the WFDB record using the `rdsamp` function
            signals, fields = wfdb.rdsamp(os.path.join(folder_path, patient))

            # Get the names of the signals
            sig_names = fields['sig_name']

            # Write the header row once; this assumes that every record
            # contains the same signals in the same order
            if not header_written:
                writer.writerow(['patient', 'time'] + sig_names)
                header_written = True

            # Write one row per sample, with one column per signal
            for j, row in enumerate(signals):
                writer.writerow([patient, j / fields['fs']] + list(row))

This code takes the names of all the signals in the record from the sig_name field returned by rdsamp and writes them as the header row of the CSV file, after the patient name and time columns. The header is written only once, which assumes that every record in the folder contains the same signals in the same order. The code then iterates through all the samples in each record and writes one row per sample, containing the patient, the time in seconds, and the value of every signal.


Daniel Sepulveda Estay, PhD

I am an engineer and researcher specialized in the operation and management of supply chains, their design, structure, dynamics, risk and resilience