Python mido: "OSError: undefined status byte 0xf9"


I got

OSError: undefined status byte 0xf9

while using mido to split the songs in a dataset into separate tracks.


The dataset is the Clean MIDI subset of the Lakh MIDI dataset. I am not sure whether this error is caused by a particular MIDI file or by the Python script itself, because the error still shows up after I deleted the MIDI file that stopped the run.

The full .py file that I ran:

from mido import Message, MidiFile, MidiTrack
import os

root_dir = r"C:\Users\admin\Desktop\clean_midi"
mid = MidiFile(r"C:\Users\admin\Desktop\clean_midi\.38 Special\Caught Up In You.mid")

musicain_dirs = os.listdir(root_dir)
for dir in musicain_dirs:
    musicain_dir = os.path.join(root_dir, dir)
    if not os.path.exists(dir):
        os.makedirs(dir)
    musics = os.listdir(musicain_dir)
    for music in musics:
        src_mid = MidiFile(os.path.join(musicain_dir, music))
        tar_dir = os.path.join(dir, music)
        if not os.path.exists(tar_dir):
            os.makedirs(tar_dir)
        for track in src_mid.tracks:
            if len(track) > 5:
                tar_mid = MidiFile()
                tar_mid.tracks.append(track)
                tar_name = os.path.join(tar_dir, track.name.replace("/"," ") + "_" + music)
                if not os.path.exists(tar_name):
                    tar_mid.save(tar_name)
        print(music)

I have no idea what the "SPEC_BY_STATUS" mentioned in the traceback is, and I could not find anything about this list or about specs.py.

There is 1 answer

Answer by BlackJack

Problems reading the files

There is something in the MIDI file that mido doesn't like or understand. Midi.org has a table of status bytes and their meanings that lists status byte value 0xF9 (decimal 249) as "Undefined (Reserved)". So raising an exception if this occurs in a MIDI file looks like a reasonable action. Other software might just ignore this status and pretend it isn't there.

Interestingly I can't reproduce this with the given data set, but get the same exception with the status byte value 0xF5, which according to the table is also undefined. The song is "Always on My Mind" by the Pet Shop Boys.
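
To make the table lookup concrete, here is a small sketch that doesn't use mido at all. The names are made up for this example and this is not how mido implements the check; the four values are the ones the MIDI 1.0 status byte table marks as undefined or reserved.

```python
# Status bytes the MIDI 1.0 status byte table lists as undefined/reserved.
# The constant and function names are hypothetical, not part of mido.
UNDEFINED_STATUS_BYTES = frozenset({0xF4, 0xF5, 0xF9, 0xFD})


def is_status_byte(value):
    """Status bytes have the high bit set; data bytes are 0..127."""
    return 0x80 <= value <= 0xFF


def is_undefined_status(value):
    """Return True for status bytes that have no defined meaning."""
    return value in UNDEFINED_STATUS_BYTES


for value in (0x90, 0xF5, 0xF9):
    kind = "undefined" if is_undefined_status(value) else "defined"
    print(f"0x{value:02X} ({value}): {kind}")
```

A reader has no way to know how many data bytes follow an undefined status byte, which is probably why refusing the file is the safer reaction.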

But that's not the only problem with loading MIDI files from that set, so the code must handle exceptions when creating MidiFile objects. I encountered a total of 272 files that didn't load with mido for various reasons. A very obvious one is a WAV file: clean_midi/Queen/temp2.wav. Others have data bytes outside the range of 0…127, end unexpectedly, or lack the MThd signature at the start of the file, and some have more specific problems with notes and modes that don't match somehow.
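
The missing signature is the easiest of these to check without any MIDI library, because a standard MIDI file starts with the four bytes "MThd". A minimal sketch, with has_midi_header being a name I made up:

```python
from pathlib import Path


def has_midi_header(path):
    """Return True if the file starts with the 4-byte chunk ID b"MThd"."""
    with Path(path).open("rb") as file:
        return file.read(4) == b"MThd"
```

Passing this check only rules out one kind of problem. A file can start with MThd and still contain undefined status bytes or end too early, so the exception handling is still needed.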

The code from the question also treats every entry in the clean_midi/ directory as a subdirectory, which fails for the two text files in clean_midi/.

So on the reading side the code must make sure it treats only directories as directories, filters for files with the *.mid extension so it doesn't pick up the WAV file, and handles exceptions when loading a MIDI file, for instance by writing out the file's path and at least the exception type and text, so the failures can be analyzed later.
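
One way to collect those failures for later analysis could look like the following sketch. All function names are hypothetical, and the loader is passed in as an argument so the sketch itself doesn't depend on mido; with mido installed you would pass MidiFile as load.

```python
from collections import Counter
from pathlib import Path


def find_midi_files(root):
    """Yield *.mid files that sit in direct subdirectories of root."""
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir():  # Skips plain files such as the text files.
            yield from sorted(entry.glob("*.mid"))


def collect_load_failures(paths, load):
    """Try load(path) for each path; return (path, type name, message) tuples."""
    failures = []
    for path in paths:
        try:
            load(path)
        except Exception as error:
            failures.append((path, type(error).__name__, str(error)))
    return failures


def summarize(failures):
    """Count the failures per exception type."""
    return Counter(type_name for _, type_name, _ in failures)
```

Calling summarize(collect_load_failures(find_midi_files(root), MidiFile)) would then give a quick overview of which kinds of errors occur and how often.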

Depending on what the goal is, this relatively small amount of unreadable files, 272 of 17,256 total, could just be ignored.

The source of the dataset says for the full dataset of 176,581 MIDI files:

Please note - no attempt was made to remove invalid MIDI files from this collection; as a result it contains a few thousand files which are likely corrupt.

Problems writing the tracks

There are problems on the track writing side too. At least one track name contains a byte of value 0 which can't be used in file names.

Tracks don't have to have distinct names within a MIDI file, so the original code silently skips every track whose file name collides with an already written track file. This could be fixed by including the track number in the file name. I would also rearrange the order to {tracknumber}_{original_name}_{track_name} so it's easier to see which files belong together when the files are in the same directory and sorted lexicographically.

Then there is one MIDI file with a really long file name and a really long track name which would result in a 530 character long file name in the original code. That is too long for some file systems. So the code should limit the file name to a reasonable length.
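
The three file name issues (forbidden characters, duplicate names, length) could be handled in one small helper. This is only a sketch: the function name, the limit of 50 characters per part, and the character set are my assumptions. The set covers the characters Windows rejects in file names plus control characters like the zero byte.

```python
import re

# Characters that are not allowed in file names on Windows, plus control
# characters such as the zero byte.  (Assumed set for this sketch.)
FORBIDDEN_CHARACTERS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def make_track_file_name(track_number, original_name, track_name, max_part_length=50):
    """Build "{tracknumber}_{original_name}_{track_name}.mid" from safe parts."""
    parts = [
        # Replace forbidden characters first, then shorten each part.
        FORBIDDEN_CHARACTERS.sub(" ", part)[:max_part_length].strip()
        for part in (original_name, track_name)
    ]
    return f"{track_number:03d}_{parts[0]}_{parts[1]}.mid"
```

The leading track number keeps the names unique within one MIDI file even if two tracks share a name or only differ in the part that gets cut off.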

The revised code

Besides the things specific to the MIDI files mentioned above, the os and os.path functions should be replaced by pathlib to make the code a bit simpler and easier to read and understand. Also the main program is quite long and nested, so at least splitting a MIDI file into tracks should be factored out into a function.

#!/usr/bin/env python3
from pathlib import Path

from mido import MidiFile

CLEAN_MIDI_PATH = Path("/home/bj/forum/clean_midi")


def log_file_problem(path, type_, message):
    print("----")
    print(path)
    print(type_, message)


def split_midi(midi, file_path, tracks_path):
    for track_number, track in enumerate(midi.tracks, 1):
        if len(track) > 5:
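            # Keep the source file's timing resolution; a bare MidiFile()
            # would fall back to mido's default ticks per beat.
            track_midi = MidiFile(ticks_per_beat=midi.ticks_per_beat)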
            track_midi.tracks.append(track)

            track_name = track.name[:100].replace("/", " ").replace("\0", " ")
            track_midi_file_path = (
                tracks_path
                / f"{track_number:03d}_{file_path.stem[-50:]}_{track_name}.mid"
            )
            if track_midi_file_path.exists():
                log_file_problem(
                    track_midi_file_path, "Warning:", "file already exists!"
                )
            else:
                track_midi.save(track_midi_file_path)


def main():
    for musician_source_path in CLEAN_MIDI_PATH.iterdir():
        if musician_source_path.is_dir():
            musician_target_path = Path(musician_source_path.name)
            musician_target_path.mkdir(exist_ok=True)

            for midi_file_path in musician_source_path.glob("*.mid"):
                try:
                    midi = MidiFile(midi_file_path)
                except Exception as error:
                    log_file_problem(midi_file_path, type(error), error)
                else:
                    tracks_path = musician_target_path / midi_file_path.stem
                    tracks_path.mkdir(exist_ok=True)
                    split_midi(midi, midi_file_path, tracks_path)


if __name__ == "__main__":
    main()