Extract ID3 tags from an MP3 URL with a partial download using Python


I need to extract the ID3 tags and metadata of remote MP3 files.

I wrote a few lines that get the ID3 tags of a local file:

from mutagen.mp3 import MP3
import urllib2

audio = MP3("Whistle.mp3")

songtitle = audio["TIT2"]
artist = audio["TPE1"]

print "Title: " + str(songtitle)
print "Artist: "+str(artist)

I need to do the same for MP3 files given as URLs. I tried downloading only part of each file using urllib2:

import urllib2
from mutagen.mp3 import MP3

req = urllib2.Request('http://www.1songday.com/wp-content/uploads/2013/08/Lorde-Royals.mp3')
req.headers['Range'] = 'bytes=%s-%s' % (0, 100)
response = urllib2.urlopen(req)
headers = response.info()
print headers.type
print headers.maintype

data = response.read()
print len(data)

How can I extract the ID3 tags of an MP3 URL without downloading the complete file?


There are 2 answers

Answer by hansaplast

ID3v2 tags are usually stored in front of the MP3 frames (which contain the audio), but the ID3v2.4 specification also allows them to follow the audio frames. Older ID3v1 tags always occupy the last 128 bytes of the file.
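A minimal sketch of telling the two usual placements apart, assuming you have already fetched the first few bytes (head) and the last 128 bytes (tail) of the file; locate_id3 is a hypothetical helper, not part of any library. A leading ID3v2 tag starts with the magic bytes ID3, and an ID3v1 tag is the final 128-byte block starting with TAG. Appended ID3v2 tags (which end in a 3DI footer) are not detected by this sketch.

```python
def locate_id3(head: bytes, tail: bytes) -> list:
    """Report which ID3 tags are present, given the first bytes and the last bytes of a file."""
    found = []
    # ID3v2 tags placed before the audio begin with the 3-byte magic "ID3"
    if head[:3] == b"ID3":
        found.append("ID3v2 (front)")
    # ID3v1 tags are a fixed 128-byte block at the very end, starting with "TAG"
    if len(tail) >= 128 and tail[-128:-125] == b"TAG":
        found.append("ID3v1 (end)")
    return found


# Synthetic example bytes (not taken from a real file)
head = b"ID3\x04\x00\x00\x00\x00\x00\x0a"
tail = b"TAG" + b"\x00" * 125
print(locate_id3(head, tail))  # ['ID3v2 (front)', 'ID3v1 (end)']
```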

To download the minimum number of bytes you need to:

  1. download the first 10 bytes of the MP3, extract the ID3v2 header and compute the size of the ID3v2 tag
  2. download the first 10 + size bytes of the MP3 to retrieve the full ID3v2 tag
  3. use a Python library to extract the ID3 tags from those bytes
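Steps 1 and 2 boil down to decoding the 10-byte header. A Python 3 sketch (id3v2_tag_length is a hypothetical name) that returns how many bytes from the start of the file cover the whole tag. It assumes the layout from the ID3v2.3/2.4 specifications: bytes 6 to 9 hold a synchsafe size (7 useful bits per byte) that excludes the 10-byte header, and an ID3v2.4 tag with flag 0x10 set carries an extra 10-byte footer.

```python
def id3v2_tag_length(header: bytes) -> int:
    """Total bytes to fetch to cover a leading ID3v2 tag, computed from its 10-byte header."""
    if len(header) < 10 or header[:3] != b"ID3":
        raise ValueError("no ID3v2 header at the start of the data")
    major_version = header[3]
    flags = header[5]
    # Bytes 6-9: 28-bit synchsafe integer, the top bit of every byte is always 0
    size = 0
    for b in header[6:10]:
        size = (size << 7) | (b & 0x7F)
    # The size excludes the 10-byte header; ID3v2.4 may append a 10-byte footer (flag 0x10)
    footer = 10 if major_version == 4 and flags & 0x10 else 0
    return 10 + size + footer


# Synchsafe bytes 00 00 02 01 encode 2*128 + 1 = 257, plus the 10-byte header
print(id3v2_tag_length(b"ID3\x04\x00\x00\x00\x00\x02\x01"))  # 267
```

With n = id3v2_tag_length(first_ten_bytes), a request with the header Range: bytes=0-(n-1) fetches exactly the tag.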

Here's a script (Python 2 or 3) which extracts the album art while downloading as few bytes as possible:

try:
    import urllib2 as request  # python 2
except ImportError:
    from urllib import request  # python 3
from functools import reduce  # available in python 2.6+ and python 3
import sys
from io import BytesIO
from mutagen.mp3 import MP3

url = sys.argv[1]

def get_n_bytes(url, size):
    """Download the first `size` bytes of `url` using an HTTP Range request."""
    req = request.Request(url)
    req.add_header('Range', 'bytes=%s-%s' % (0, size-1))
    response = request.urlopen(req)
    # read at most `size` bytes, in case the server ignores Range and sends the whole file
    return response.read(size)

# the ID3v2 header is the first 10 bytes of the file
data = get_n_bytes(url, 10)
if data[0:3] != b'ID3':
    raise Exception('ID3 not in front of mp3 file')

# bytes 6-9 hold the tag size as a synchsafe integer (7 bits per byte)
size_encoded = bytearray(data[6:10])
size = reduce(lambda a, b: a*128 + b, size_encoded, 0)

# the size excludes the 10-byte header, and mutagen needs one full audio frame
# in order to function, so add the header length and the max frame size
data = get_n_bytes(url, 10 + size + 2881)
header = BytesIO(data)
f = MP3(header)

if f.tags and 'APIC:' in f.tags.keys():
    artwork = f.tags['APIC:'].data
    with open('image.jpg', 'wb') as img:
        img.write(artwork)
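Not every server supports HTTP range requests: one that ignores the Range header replies 200 OK with the whole file instead of 206 Partial Content. A Python 3 sketch of a fetch helper (get_first_bytes is a hypothetical name) that reads at most n bytes either way and also reports whether the range was honoured, so the caller can tell a cheap partial fetch from a server that is streaming the full file:

```python
from urllib import request


def get_first_bytes(url, n, timeout=10.0):
    """Return (data, ranged): at most the first n bytes of url, and whether Range was honoured."""
    req = request.Request(url, headers={"Range": "bytes=0-%d" % (n - 1)})
    with request.urlopen(req, timeout=timeout) as response:
        # 206 Partial Content means the server honoured the range; anything else
        # (e.g. 200 OK) means the full body is coming, so stop after n bytes.
        ranged = response.getcode() == 206
        return response.read(n), ranged


# Offline check with a data: URL (10 bytes: "ID3", 0x04, six zero bytes).
# urllib ignores Range for data: URLs and reports no HTTP status, so ranged is False.
url = "data:application/octet-stream;base64,SUQzBAAAAAAAAA=="
print(get_first_bytes(url, 3))  # (b'ID3', False)
```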

A few remarks:

  • it checks that an ID3v2 tag (identified by the "ID3" magic bytes) is at the front of the file
  • the size of the ID3v2 tag is stored in bytes 6 to 9 (counting from 0) of the header, as documented on id3.org. It is a "synchsafe" integer (only the lower 7 bits of each byte are used) and does not include the 10-byte header itself
  • unfortunately mutagen's MP3 class needs one full MP3 audio frame to parse the ID3 tags. You therefore also need to download one MP3 frame (which is at most 2881 bytes long according to this comment)
  • instead of blindly assuming that the album art is a JPEG, you should check the image format first, as ID3 allows many different image types
  • tested with about 10 random MP3s from the internet, e.g. this one: python url.py http://www.fuelfriendsblog.com/listenup/01%20America.mp3
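For the remark about image formats, a minimal sketch that picks a file extension from the artwork's leading magic bytes instead of assuming JPEG. Only JPEG, PNG and GIF signatures are checked, and the .bin fallback is an arbitrary choice; image_extension is a hypothetical helper. mutagen's APIC frame also has a declared mime attribute you could consult instead.

```python
def image_extension(data: bytes) -> str:
    """Guess a file extension for embedded artwork from its magic bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return ".gif"
    return ".bin"  # unknown format: don't pretend it is a JPEG
```

In the script, the output file would then be opened as open('image' + image_extension(artwork), 'wb').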
Answer by Pierre-Francoys Brousseau

In your example, the Range header only requests the first 101 bytes (bytes=0-100), which is generally not enough to contain the ID3 tags, so you cannot extract them.

I played around a bit after reading the ID3 spec, and here's a good way to get started:

# Search for ID3v1 tags (data is the raw bytes you downloaded)
tagIndex = data.find(b'TAG')
if tagIndex >= 0:
    if data[tagIndex+3:tagIndex+4] == b'+':
        print("Found extended ID3v1 tag!")
        # Enhanced tag: 'TAG+' followed by a 60-byte title
        title = data[tagIndex+4:tagIndex+64]
        print(title)
    else:
        print("Found ID3v1 tags")
        # ID3v1: 'TAG' followed by a 30-byte title
        title = data[tagIndex+3:tagIndex+33]
        print(title)
        # So on.
else:
    # Look for ID3v2 tags
    composerIndex = data.find(b'TCOM')
    if composerIndex >= 0:
        pass  # and so on. See Wikipedia for a full list of frame specifications
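Searching for TAG or TCOM anywhere in the data can match bytes that merely happen to occur inside audio frames or other tag data. Both formats have fixed layouts, so a Python 3 sketch that parses them directly could look like this. Assumptions: tail holds the last 128 bytes of the file (fetchable with the suffix range Range: bytes=-128), tag holds a complete ID3v2.3 or 2.4 tag starting at its ID3 header, and unsynchronisation, extended headers and frame-level flags are not handled. The function names are hypothetical.

```python
import struct

# ID3v2 text encodings: 0 = ISO-8859-1, 1 = UTF-16 with BOM, 2 = UTF-16BE, 3 = UTF-8
TEXT_ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def _clean(field: bytes) -> str:
    # ID3v1 fields are NUL-padded (some taggers pad with spaces instead)
    return field.split(b"\x00", 1)[0].decode("latin-1").strip()


def parse_id3v1(tail: bytes) -> dict:
    """Parse an ID3v1 tag from the last 128 bytes of a file; {} if there is none."""
    block = tail[-128:]
    if len(block) != 128 or block[:3] != b"TAG":
        return {}
    # "TAG", title(30), artist(30), album(30), year(4), comment(30), genre(1) = 128 bytes
    _, title, artist, album, year, comment, genre = struct.unpack(">3s30s30s30s4s30sB", block)
    return {"title": _clean(title), "artist": _clean(artist), "album": _clean(album),
            "year": _clean(year), "comment": _clean(comment), "genre": genre}


def _synchsafe(raw: bytes) -> int:
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value


def id3v2_text_frames(tag: bytes) -> dict:
    """Return {frame_id: text} for the text frames (T***) of a complete ID3v2.3/2.4 tag."""
    if tag[:3] != b"ID3":
        raise ValueError("not an ID3v2 tag")
    version, flags = tag[3], tag[5]
    if flags & 0xC0:
        raise ValueError("unsynchronisation / extended header not handled in this sketch")
    end = min(10 + _synchsafe(tag[6:10]), len(tag))
    frames, pos = {}, 10
    while pos + 10 <= end and tag[pos] != 0:  # a NUL byte marks the start of padding
        frame_id = tag[pos:pos + 4].decode("latin-1")
        raw_size = tag[pos + 4:pos + 8]
        # ID3v2.4 frame sizes are synchsafe; ID3v2.3 uses a plain big-endian integer
        size = _synchsafe(raw_size) if version == 4 else int.from_bytes(raw_size, "big")
        body = tag[pos + 10:pos + 10 + size]
        # TXXX holds a description plus a value, so it is skipped here
        if frame_id.startswith("T") and frame_id != "TXXX" and body:
            encoding = TEXT_ENCODINGS.get(body[0], "latin-1")
            frames[frame_id] = body[1:].decode(encoding, "replace").rstrip("\x00")
        pos += 10 + size
    return frames
```

For the composer, id3v2_text_frames(tag).get("TCOM") then replaces the string search.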