I got the error OSError: undefined status byte 0xf9 while using mido to separate the tracks of the songs in a dataset.
The dataset is the Clean MIDI subset of the Lakh MIDI Dataset. I am not sure whether this error is caused by a specific MIDI file or by the Python script itself, because the error still shows up after I deleted the MIDI file that stopped the run.
The full .py file that I ran:
from mido import Message, MidiFile, MidiTrack
import os

root_dir = r"C:\Users\admin\Desktop\clean_midi"
mid = MidiFile(r"C:\Users\admin\Desktop\clean_midi\.38 Special\Caught Up In You.mid")
musicain_dirs = os.listdir(root_dir)
for dir in musicain_dirs:
    musicain_dir = os.path.join(root_dir, dir)
    if not os.path.exists(dir):
        os.makedirs(dir)
    musics = os.listdir(musicain_dir)
    for music in musics:
        src_mid = MidiFile(os.path.join(musicain_dir, music))
        tar_dir = os.path.join(dir, music)
        if not os.path.exists(tar_dir):
            os.makedirs(tar_dir)
        for track in src_mid.tracks:
            if len(track) > 5:
                tar_mid = MidiFile()
                tar_mid.tracks.append(track)
                tar_name = os.path.join(tar_dir, track.name.replace("/"," ") + "_" + music)
                if not os.path.exists(tar_name):
                    tar_mid.save(tar_name)
        print(music)
I have no idea what "SPEC_BY_STATUS" is, and I could not find anything about this list or about specs.py.
Problems reading the files
There is something in the MIDI file that mido doesn't like or understand. Midi.org has a table of status bytes and their meanings that lists status byte value 0xF9 (decimal 249) as "Undefined (Reserved)". So raising an exception if this occurs in a MIDI file looks like a reasonable action. Other software might just ignore this status and pretend it isn't there.
Interestingly, I can't reproduce this with the given data set, but I get the same exception with the status byte value 0xF5, which according to the table is also undefined. The song is "Always on My Mind" by the Pet Shop Boys.
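To make the status-byte remark concrete, here is a small lookup sketch. The set lists the status bytes that, as far as I know, the MIDI 1.0 status table marks as undefined or reserved; only 0xF9 and 0xF5 actually came up here. The names UNDEFINED_STATUS_BYTES and is_undefined_status are made up for illustration and are not part of mido.

```python
# Status bytes marked "Undefined (Reserved)" in the MIDI 1.0 status table.
# 0xF9 and 0xF5 are the two seen in this thread; 0xF4 and 0xFD are listed
# for completeness (assumption: taken from the same table).
UNDEFINED_STATUS_BYTES = frozenset({0xF4, 0xF5, 0xF9, 0xFD})


def is_undefined_status(byte: int) -> bool:
    """Return True if the byte is a reserved status byte with no defined meaning."""
    return byte in UNDEFINED_STATUS_BYTES


# 0xF9 and decimal 249 are the same value, just written differently.
print(is_undefined_status(0xF9), is_undefined_status(249))
```

A strict reader rejects such a byte, which is what mido does here; a lenient reader would skip it.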
But that's not the only problem loading MIDI files from that set, so the code must handle exceptions when creating MidiFile objects. I encountered a total of 272 files that didn't load with mido for various reasons. A very obvious one: there is a WAV file, clean_midi/Queen/temp2.wav. Others have data bytes outside the range 0…127, or have an unexpected end of file, or have no MThd signature at the start of the file, and some have more specific problems with notes and modes that don't match somehow.
The code from the question tries to treat each entry in the clean_midi/ directory as a subdirectory, which fails for the two text files in clean_midi/.
So on the reading side the code must make sure it treats only directories as directories, filters for files with the *.mid extension so it doesn't pick up the WAV file, and handles exceptions when loading a MIDI file, for instance by writing out the file's path and at least the exception type and text, to be able to analyze it later.
Depending on what the goal is, this relatively small number of unreadable files, 272 of 17,256 total, could just be ignored.
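The reading-side requirements listed above (only real directories, only *.mid files, log failures instead of stopping) could be sketched roughly like this. The function names iter_midi_paths and try_load are hypothetical, and the loader is passed in as a parameter; with mido it would be the MidiFile class.

```python
from pathlib import Path


def iter_midi_paths(root):
    """Yield the *.mid files one level below root, skipping anything else."""
    for artist_dir in sorted(Path(root).iterdir()):
        if not artist_dir.is_dir():
            continue  # e.g. text files lying next to the artist folders
        for path in sorted(artist_dir.iterdir()):
            if path.is_file() and path.suffix.lower() == ".mid":
                yield path


def try_load(path, loader, log):
    """Return loader(path), or None after logging path, exception type and text."""
    try:
        return loader(path)
    except Exception as error:  # failures come in several kinds, so catch broadly
        log.write(f"{path}\t{type(error).__name__}\t{error}\n")
        return None
```

A driver loop would then call try_load(path, MidiFile, log) for each path from iter_midi_paths(root_dir) and skip the None results, leaving a tab-separated log that can be analyzed afterwards.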
The source of the dataset says for the full dataset of 176,581 MIDI files:
Problems writing the tracks
There are problems on the track writing side too. At least one track name contains a byte of value 0 which can't be used in file names.
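A minimal sketch of cleaning a track name before using it in a file name. It assumes Windows naming rules, since the paths in the question are Windows paths; the function name sanitize_name is made up for illustration.

```python
import re

# Control characters (including the 0 byte) and the characters Windows does
# not allow in file names. "/" is covered too, as in the question's code.
_FORBIDDEN = re.compile(r'[\x00-\x1f<>:"/\\|?*]')


def sanitize_name(name: str, replacement: str = " ") -> str:
    """Replace characters that can't appear in a file name and trim the result."""
    return _FORBIDDEN.sub(replacement, name).strip()


print(sanitize_name("Lead/Vocals"))
```

On other file systems fewer characters are forbidden, so the pattern could be narrowed to the 0 byte and "/" if only those matter.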
Tracks don't have to have distinct names within a MIDI file, so the original code skips some tracks because the target file name is already taken by a previously written track. This could be fixed by including the track number in the file name. I would also rearrange the order to {tracknumber}_{original_name}_{track_name} so it's easier to see which files belong together when the file names are in the same directory, sorted lexicographically.
Then there is one MIDI file with a really long file name and a really long track name, which would result in a 530-character file name with the original code. That is too long for some file systems, so the code should limit the file name to a reasonable length.
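The naming scheme and the length limit could be combined along these lines. This is only a sketch: the function name build_track_filename, the zero-padded track number, and the limit of 200 characters are my assumptions, not fixed requirements.

```python
def build_track_filename(track_number, original_name, track_name,
                         max_length=200, suffix=".mid"):
    """Build '{tracknumber}_{original_name}_{track_name}.mid', cut to max_length.

    The track number is zero-padded so that lexicographic order matches
    numeric order. The limit counts characters; some file systems count
    bytes instead, so a conservative value is used (assumption).
    """
    stem = f"{track_number:02d}_{original_name}_{track_name}"
    # Truncate the stem, never the extension.
    return stem[: max_length - len(suffix)] + suffix


print(build_track_filename(3, "Caught Up In You", "Bass"))
```

Because the track number comes first, two tracks with the same name in one MIDI file still get different file names, even after truncation.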
The revised code
Besides the things specific to the MIDI files mentioned above, the os and os.path functions should be replaced by pathlib to make the code a bit simpler and easier to read and understand. Also, the main program is quite long and deeply nested, so at least splitting a MIDI file into tracks should be factored out into a function.
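As a rough illustration of that refactoring, here is one possible shape for such a function, using pathlib instead of os.path. The name split_into_tracks and its parameters are hypothetical. The factory for empty MIDI files is passed in as new_midi_file (with mido that would be the MidiFile class), and the files are named by track number only to keep the sketch short.

```python
from pathlib import Path


def split_into_tracks(src_mid, target_dir, new_midi_file, min_messages=6):
    """Write each track with at least min_messages messages to its own file.

    min_messages=6 mirrors the `len(track) > 5` check in the question.
    Returns the list of paths that were written.
    """
    target_dir = Path(target_dir)
    # One call replaces the os.path.exists() / os.makedirs() pair.
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, track in enumerate(src_mid.tracks):
        if len(track) < min_messages:
            continue
        target = target_dir / f"{number:02d}.mid"
        track_file = new_midi_file()
        track_file.tracks.append(track)
        track_file.save(str(target))
        written.append(target)
    return written
```

The main loop then shrinks to iterating over the MIDI files and calling this function once per file.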