I understand from the info/download page that the format of the Google Ngrams data is:
ngram TAB year TAB match_count TAB volume_count NEWLINE
Here's a small extract from the file of 1-grams that start with a:
announced.37_VERB 2008 1 1
annually.34 1913 2 2
I understand that the _VERB part is a POS tag. However, I couldn't find reliable documentation on what the numbers after the period mean, i.e. .37, .34, etc. If someone could provide a lead on this, it would be a great help to everyone getting started on NLP with Google Ngrams as a data source.
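
To make the question concrete, here is a minimal Python sketch of how such a record could be split into its parts. The names `Record`, `parse_line` and `suffix` are hypothetical, chosen only for illustration. The sketch assumes the four fields are separated by whitespace (tabs in the real files, spaces in the extract as pasted above) and treats the digits after the period as an opaque suffix, since their meaning is exactly what I'm asking about.

```python
import re
from typing import NamedTuple, Optional


class Record(NamedTuple):
    token: str              # the word itself, e.g. "announced"
    suffix: Optional[str]   # digits after the period, meaning unknown
    pos: Optional[str]      # POS tag without the underscore, e.g. "VERB"
    year: int
    match_count: int
    volume_count: int


# token, then an optional ".<digits>", then an optional "_<TAG>".
# The [A-Z]+ tag pattern is an assumption based on the extract above.
_TOKEN_RE = re.compile(
    r"^(?P<token>.+?)(?:\.(?P<suffix>\d+))?(?:_(?P<pos>[A-Z]+))?$"
)


def parse_line(line: str) -> Record:
    # Split from the right so an n-gram containing spaces stays in one piece.
    ngram, year, match_count, volume_count = line.rstrip("\n").rsplit(None, 3)
    m = _TOKEN_RE.match(ngram)
    if m is None:
        raise ValueError(f"unrecognised ngram field: {ngram!r}")
    return Record(
        token=m.group("token"),
        suffix=m.group("suffix"),
        pos=m.group("pos"),
        year=int(year),
        match_count=int(match_count),
        volume_count=int(volume_count),
    )


if __name__ == "__main__":
    for sample in ("announced.37_VERB 2008 1 1", "annually.34 1913 2 2"):
        print(parse_line(sample))
```

With the two lines from the extract, the `suffix` field comes out as 37 and 34 respectively, and those are the values I can't account for.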