Extracting a key/tonality from a filename using


I'm trying to find a way to extract the musical key from a file name, ignoring any min/maj suffix (case-insensitive) but keeping # and b.

A couple of file name examples:

sample_F_one_shot.wav
sample_oneshot_F#maj.wav
sample_Ab_one_shot.wav

An example of what the result should look like:

A; B; C; D; E; F; G

and their combinations with # and b.

Here are the basic rules:

  • The key is not case sensitive and can be written in small or capital letters.
  • The key is not at the beginning of the file name.
  • The key can be in brackets: "()"; "[]".
  • There may be signs before the key: ""; ""; "."; " "; "space"; "-"; "--"; "*"; "-".
  • After the key there may be signs: ""; ""; "."; " "; "space"; "-"; "--"; "*"; "-"; ".wav"; ".aif"; ".rx2".
  • There cannot be letters before the key.
  • There cannot be numbers before the key.
  • After the key there may be numbers: "C1"; "F#2" - numbers must be cut off.
  • Possible options with: "#maj"; "#Maj"; "#min"; "#Min"; "#m" - everything after "#" should be cut off.

I came up with a two-step system using regular expressions. In the first step I find the key together with the symbols around it, for example: F; .F.; [F]; F#; _F#maj; and so on.

Something like:

"regexp (?:[^a-gA-G#b]|^)([A-G](?:#|b)?(?:m(?:in(?:or)?)?|M(?:aj(?:or)?)?))(?:[^a-zA-Z#bd]|$)"

The second step is simply to strip everything else from that match with another regular expression, leaving only the key letter and an optional # or b.
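
As a rough illustration of this second step, here is a minimal Dart sketch. The function name bareKey is hypothetical, and the sketch assumes the first step has already isolated a short token such as "_F#maj" or "[Ab]"; it is not meant to be run on whole file names, where ordinary letters would match too.

```dart
/// Hypothetical helper: reduces a token found in step one to the bare key.
/// Returns null when the token contains no key letter at all.
String? bareKey(String token) {
  // First key letter (either case), optionally followed by # or b.
  final match = RegExp(r'[A-Ga-g][#b]?').firstMatch(token);
  if (match == null) return null;
  final raw = match.group(0)!;
  // Assumption: the result is reported with an upper-case key letter.
  return raw[0].toUpperCase() + raw.substring(1);
}
```

With this sketch, bareKey("_F#maj") gives "F#" and bareKey("[Ab]") gives "Ab". All of the accuracy still depends on the first step picking the right token.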

Problem: In the first step I am unable to achieve high accuracy in isolating the key with symbols.

Question: Perhaps there is an easier way to do this?

I think I'm confused about this. I also tried to do this using ChatGPT and it seems to be the best option I got, but it is not entirely accurate.

There are 2 answers

lrn On

In the examples, the keys you want to extract appear to be surrounded by _ or . characters. The keys have the form: a single letter A-G, optionally followed by either b or # or #maj. You also mentioned 'min', so let's include that too.

A regexp for that would be:

final keyRE = RegExp(r"(?<=^|[_.])[A-G](?:b|#(?:maj|min)?)?(?=[_.]|$)");

That exactly follows the description above:

  • Following a _ or . (or start of input),
  • A single A..G
  • Optionally followed by either b or # or #maj or #min,
  • Ended by another . or _ (or the end of input).
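
A possible way to apply this pattern to a file name is sketched below. The letter range is written as A-G so that G keys are covered, and the function name keyFromFileName is hypothetical. Because the pattern consumes a trailing maj/min, the sketch strips that suffix afterwards, on the assumption that only the letter and accidental are wanted.

```dart
// The pattern from this answer, with the letter range written as A-G.
final keyRE = RegExp(r'(?<=^|[_.])[A-G](?:b|#(?:maj|min)?)?(?=[_.]|$)');

/// Hypothetical helper: returns the key found in [fileName], or null.
String? keyFromFileName(String fileName) {
  final match = keyRE.firstMatch(fileName);
  if (match == null) return null;
  // Drop a trailing "maj"/"min" so only the letter and accidental remain.
  return match.group(0)!.replaceFirst(RegExp(r'(?:maj|min)$'), '');
}
```

For the three sample names this yields "F", "F#" and "Ab". The pattern is case-sensitive and only accepts _ or . as delimiters, so a name such as sample_C1.wav gives null; the digit, bracket and lower-case rules from the question would need further work.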
Nick On

Based on your sample filenames and rules, you could use this regex with the case-insensitive flag (i):

(?<![A-Z0-9])[A-G][b#]?(?=(?:m(?:(?:aj|in)(?:or)?)?)?(?![A-Z]))

This matches:

  • (?<![A-Z0-9]) : a negative lookbehind asserting that the preceding character is not a letter or digit
  • [A-G] : a character in the set A to G
  • [b#]? : an optional b or #
  • (?=...) : a positive lookahead assertion for:
    • (?:m(?:(?:aj|in)(?:or)?)?)? : an optional m, maj, major, min or minor
    • (?![A-Z]) : a negative lookahead asserting that the next character is not a letter
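
In Dart, the case-insensitive flag corresponds to caseSensitive: false. The following is a minimal sketch of how the pattern might be used; the names keyPattern and extractKey are hypothetical, and normalising the result to an upper-case letter plus a lower-case accidental is an assumption rather than something the question asks for.

```dart
// The pattern above, compiled with the case-insensitive flag.
final keyPattern = RegExp(
  r'(?<![A-Z0-9])[A-G][b#]?(?=(?:m(?:(?:aj|in)(?:or)?)?)?(?![A-Z]))',
  caseSensitive: false,
);

/// Hypothetical helper: returns the first key in [fileName], or null.
String? extractKey(String fileName) {
  final match = keyPattern.firstMatch(fileName);
  if (match == null) return null;
  final raw = match.group(0)!;
  // Assumption: report "f#" as "F#" and "AB" as "Ab".
  return raw[0].toUpperCase() + raw.substring(1).toLowerCase();
}
```

Since the maj/min suffix sits inside the lookahead, it is checked but never becomes part of the match, so no second clean-up step is needed. The sketch returns only the first match and does not enforce the rule that the key is never at the start of the file name.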

Regex demo on regex101