Bit of a newbie when it comes to this, but I have around 15,000 HTML files with XBRL data in them, which I downloaded from http://download.companieshouse.gov.uk/en_monthlyaccountsdata.html. I want to extract the company's name and intangible assets from all of these files, but I'm unsure how to do this.
Ideally I'd export the data into columns in a single Excel file.
Any help would be appreciated.
A bit late to answer, but never mind. As a start, you could have a look at VT Fact Viewer. It gives you a grid display of the XBRL facts in the document, and you can export them to Excel. Once there, you'll need to filter for tags like "core:IntangibleAssets" or perhaps "uk-gaap:Intangible...." and similar.
However, if you're doing this on a lot of documents (such as the CH data dump), you'll need to do some "proper" XML processing of your own in a programming or scripting language. The viewer will still be helpful, though, because it shows you the sort of tags you are aiming to extract.
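To give an idea of what that XML processing looks like, here is a minimal Python sketch (not the C# mentioned below). It assumes the files are inline XBRL, i.e. well-formed XHTML, so the standard-library `xml.etree.ElementTree` can parse them; files using undeclared HTML entities such as `&nbsp;` will fail to parse. It collects every `ix:nonFraction` and `ix:nonNumeric` fact, matching on the element's local name so either inline XBRL namespace works. The concept names in the sample document are illustrative assumptions; check the real ones in the viewer.

```python
import xml.etree.ElementTree as ET

# Inline XBRL fact elements, matched by local name so that both the
# iXBRL 1.0 (2008) and iXBRL 1.1 (2013) "ix" namespaces are handled.
FACT_TAGS = {"nonFraction", "nonNumeric"}


def local_name(qname: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return qname.rsplit("}", 1)[-1]


def extract_facts(xhtml: str) -> list[tuple[str, str, str]]:
    """Return (concept, contextRef, displayed text) for every ix fact."""
    root = ET.fromstring(xhtml)  # raises ET.ParseError if not well-formed
    facts = []
    for el in root.iter():
        if not isinstance(el.tag, str) or local_name(el.tag) not in FACT_TAGS:
            continue
        text = "".join(el.itertext()).strip()  # includes text in nested spans
        if el.get("sign") == "-":  # iXBRL marks negative values with sign="-"
            text = "-" + text
        facts.append((el.get("name", ""), el.get("contextRef", ""), text))
    return facts


# Hypothetical sample document; real Companies House files are much larger.
sample = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">
  <body>
    <ix:nonNumeric name="bus:EntityCurrentLegalOrRegisteredName" contextRef="c1">ACME LIMITED</ix:nonNumeric>
    <ix:nonFraction name="core:IntangibleAssets" contextRef="c2" unitRef="GBP" decimals="0">12,345</ix:nonFraction>
  </body>
</html>"""

for concept, context, value in extract_facts(sample):
    print(concept, context, value)
```

Note that the values are the displayed text, so any `scale` attribute (e.g. figures shown in thousands) and thousands separators are left for you to handle, and the `contextRef` is what tells you which period a value belongs to.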
As a simple example, the following will get you some intangible asset data in CSV format, which you can open in Excel. It's written in C# (using LINQPad), so you'll have to translate it if required:
Change the input filename to loop through a directory or list of filenames as appropriate.
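For the directory-looping part, a rough Python sketch is below; it is an independent illustration, not a translation of the C# code. It assumes the files are well-formed inline XBRL with an `.html` extension, that the company name is tagged with a concept whose local name is `EntityCurrentLegalOrRegisteredName`, and that intangible-asset concepts contain "Intangible" in their names (deliberately broad, so it catches both `core:IntangibleAssets` and `uk-gaap:Intangible...` style tags along with related ones). It writes one row per file, and files that fail to parse are skipped with a message.

```python
import csv
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed concept local names; prefixes (core:, uk-gaap:, bus:, ...) vary
# between taxonomy versions, so only the part after the colon is compared.
NAME_CONCEPT = "EntityCurrentLegalOrRegisteredName"
INTANGIBLES_MARKER = "Intangible"  # broad on purpose; narrow it as needed


def local_name(qname: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return qname.rsplit("}", 1)[-1]


def summarise(path: Path) -> dict[str, str]:
    """Pull the company name and any intangible-asset facts out of one file."""
    row = {"file": path.name, "company": "", "intangibles": ""}
    values = []
    for el in ET.parse(path).iter():
        if not isinstance(el.tag, str) or local_name(el.tag) not in ("nonFraction", "nonNumeric"):
            continue
        concept = el.get("name", "").split(":")[-1]  # drop the taxonomy prefix
        text = "".join(el.itertext()).strip()
        if el.get("sign") == "-":
            text = "-" + text
        if concept == NAME_CONCEPT and not row["company"]:
            row["company"] = text
        elif INTANGIBLES_MARKER in concept:
            # Keep the contextRef so current and prior periods can be told apart.
            values.append(f"{el.get('contextRef', '')}={text}")
    row["intangibles"] = "; ".join(values)
    return row


def write_csv(folder: Path, out_file: Path) -> int:
    """Summarise every *.html file in folder into one CSV; return rows written."""
    written = 0
    # utf-8-sig adds a BOM so Excel detects the encoding of non-ASCII names.
    with out_file.open("w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.DictWriter(fh, fieldnames=["file", "company", "intangibles"])
        writer.writeheader()
        for path in sorted(folder.glob("*.html")):
            try:
                writer.writerow(summarise(path))
                written += 1
            except ET.ParseError as exc:
                print(f"skipping {path.name}: {exc}", file=sys.stderr)
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract.py <folder of .html files> <output.csv>
    count = write_csv(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"{count} files written")
```

Putting all intangible facts for a file into one cell keeps the sketch short; if you want one Excel column per concept and period, you would collect the facts into a dictionary per file and pick the column names once you've seen which tags actually occur.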