I want to use the 20 newsgroups dataset to test an algorithm and analyze the significant words for each group.
On the website provided by the University of Toronto, I can't find the corresponding vocabulary file for this dataset. Could anyone point me in the right direction?
You could try here for the 20 newsgroups dataset. It also includes a vocabulary file, but that file may not be consistent with the data files you already have, so it is probably safest to use all of the files from there together.
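If the vocabulary file still does not line up with your copy of the data, one fallback is to build the vocabulary from the documents themselves, which guarantees the word IDs match. Here is a minimal sketch of that idea in plain Python. The names `tokenize` and `build_vocabulary`, the `min_count` parameter, the lowercase letters-only tokenization, and the 1-based alphabetical IDs are all my own assumptions for illustration, not the format of any published vocabulary file, and the two sample documents are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents, min_count=1):
    """Map each word seen at least `min_count` times to a 1-based ID.

    IDs are assigned in alphabetical order so the mapping is reproducible.
    """
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    words = sorted(word for word, count in counts.items() if count >= min_count)
    return {word: idx for idx, word in enumerate(words, start=1)}


# Two made-up documents standing in for newsgroup posts.
docs = [
    "The graphics card renders images.",
    "Hockey players skate; hockey fans cheer.",
]

vocab = build_vocabulary(docs)
frequent = build_vocabulary(docs, min_count=2)
```

With the real dataset you would pass in the text of every post instead of `docs`, and raise `min_count` to drop rare words before looking for the significant ones per group.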
Hope this helps!