XBRL data extraction


I'm a bit of a newbie when it comes to this, but I have around 15,000 HTML files with XBRL data in them, downloaded from http://download.companieshouse.gov.uk/en_monthlyaccountsdata.html. Ideally I want to extract from all of these files the company's name and its intangible assets, but I'm unsure how to do this.

Ideally I'd want to export the data into columns in a single Excel file.

Any help would be appreciated.

There is 1 answer

Answered by shunty

A bit late to answer, but never mind. As a start, you could have a look at VT Fact Viewer. It gives you a grid display of the XBRL facts in the document, which you can then export to Excel. Once there, you'll need to filter for tags such as "core:IntangibleAssets" or maybe "uk-gaap:Intangible....".

However, if you're doing this for a lot of documents (such as the CH data dump), you're going to need to do some "proper" XML processing of your own using a programming or scripting language. The viewer will still be helpful, as it shows you the sort of things you are aiming to extract.

As a simple example, the following will get you some intangible asset data in CSV format, which you can open in Excel. It's written in C# (using LINQPad), so you'll have to translate if required:

// LINQPad imports the needed namespaces by default; in a regular project add
// using System.Collections.Generic; using System.IO; using System.Linq; using System.Xml.Linq;
string fname = @"C:\ch_data\Prod223_1770_00101234_20160331.html";
var doc = XDocument.Load(fname);
// The 'ix' namespace may use the 2008 or 2013 schema, so match on the tag's LocalName only
var elements = doc.Root
    .Descendants()
    .Where(x => x.Name.LocalName == "nonFraction")
    // Keep facts where any attribute (normally 'name') mentions "Intangible"
    .Where(x => x.Attributes().Any(a => a.Value.Contains("Intangible")));

// Start with a header row so the columns are labelled in Excel
var lines = new List<string> { "\"contextRef\",\"decimals\",\"scale\",\"unitRef\",\"name\",\"format\",\"value\"" };
foreach (var element in elements)
{
    // Attribute(...) returns null when the attribute is absent, hence the ?? ""
    var ctx = element.Attribute("contextRef")?.Value ?? "";
    var dec = element.Attribute("decimals")?.Value ?? "";
    var scale = element.Attribute("scale")?.Value ?? "";
    var units = element.Attribute("unitRef")?.Value ?? "";
    var fmt = element.Attribute("format")?.Value ?? "";
    var name = element.Attribute("name")?.Value ?? "";
    // Double any embedded quotes so the value stays a single CSV field
    var value = element.Value.Replace("\"", "\"\"");

    string line = $"\"{ctx}\",\"{dec}\",\"{scale}\",\"{units}\",\"{name}\",\"{fmt}\",\"{value}\"";
    lines.Add(line);
    //Console.WriteLine(line);
}
// Writes the CSV next to the input file, with the same name and a .csv extension
File.WriteAllLines(Path.ChangeExtension(fname, "csv"), lines);
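
One thing to watch once the CSV is in Excel: the value column holds the text as displayed in the accounts (for example "1,234"), which is not necessarily the reported number. As far as I understand Inline XBRL, the reported number is the displayed text multiplied by 10 to the power of the scale attribute, and negated when the element carries sign="-" (an attribute the example above does not capture). A rough sketch of that conversion is below. The helper name ToNumber is made up for illustration, and it assumes comma thousands separators with a dot decimal point, so check the format column before relying on it for every file:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: convert the displayed text of an ix:nonFraction to a number.
// Assumes text like "1,234.5" (comma thousands, dot decimal); other formats will throw.
decimal ToNumber(string displayed, string scale, string sign)
{
    var number = decimal.Parse(
        displayed.Trim(),
        NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint,
        CultureInfo.InvariantCulture);

    // A missing or empty scale means the text is already in base units
    int power = string.IsNullOrEmpty(scale) ? 0 : int.Parse(scale, CultureInfo.InvariantCulture);
    for (int i = 0; i < power; i++) number *= 10m;   // positive scale, e.g. 3 for thousands
    for (int i = 0; i > power; i--) number /= 10m;   // negative scale

    return sign == "-" ? -number : number;
}

// "1,234" reported in thousands (scale 3)
Console.WriteLine(ToNumber("1,234", "3", ""));
```

Values shown as a dash or in a different number format would need extra handling, which is where the format attribute comes in.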

To process many documents, replace the hard-coded fname with a loop over a directory or a list of filenames, as appropriate.
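
A minimal sketch of that loop is below, writing everything to one CSV and adding the company name asked about in the question. Treat the details as assumptions rather than a definitive implementation: the folder path, the output file name and the helper names Csv and ExtractRows are made up, and the company name is assumed to be an ix:nonNumeric fact whose name ends with "EntityCurrentLegalOrRegisteredName", which you should confirm against a few documents in the viewer first.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Quote a field for CSV, doubling any embedded quotes
string Csv(string s) => "\"" + (s ?? "").Replace("\"", "\"\"") + "\"";

// Build one CSV row per "Intangible" fact found in a single document
IEnumerable<string> ExtractRows(string fileLabel, XDocument doc)
{
    var all = doc.Root.Descendants().ToList();

    // Assumed tag for the company name; empty string if the document has none
    var company = all
        .Where(x => x.Name.LocalName == "nonNumeric")
        .FirstOrDefault(x => ((string)x.Attribute("name") ?? "")
            .EndsWith("EntityCurrentLegalOrRegisteredName", StringComparison.Ordinal))
        ?.Value.Trim() ?? "";

    return all
        .Where(x => x.Name.LocalName == "nonFraction")
        .Where(x => ((string)x.Attribute("name") ?? "").Contains("Intangible"))
        .Select(x => string.Join(",",
            Csv(fileLabel), Csv(company),
            Csv((string)x.Attribute("name")),
            Csv((string)x.Attribute("contextRef")),
            Csv((string)x.Attribute("scale")),
            Csv(x.Value)))
        .ToList();
}

// Hypothetical folder; point it at wherever the download was unpacked
string folder = @"C:\ch_data";
var lines = new List<string> { "file,company,name,contextRef,scale,value" };
if (Directory.Exists(folder))
{
    foreach (var path in Directory.EnumerateFiles(folder, "*.html"))
    {
        try
        {
            lines.AddRange(ExtractRows(Path.GetFileName(path), XDocument.Load(path)));
        }
        catch (System.Xml.XmlException ex)
        {
            // Skip files that are not well-formed XML instead of aborting the run
            Console.Error.WriteLine($"Skipped {path}: {ex.Message}");
        }
    }
    File.WriteAllLines(Path.Combine(folder, "intangibles.csv"), lines);
}
```

Each fact becomes its own row, so a company reporting intangible assets for both the current and the prior year appears twice; the contextRef column is what tells the rows apart.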