Yelp Dataset - Accessing review.json specifically

I'm doing an online course where we use the Yelp dataset.

The dataset is available here
https://www.yelp.com/dataset

The download arrives as a yelp.dataset.tar file.

If I extract that file on, say, Windows 7, it becomes a file named "yelp_dataset" whose type I can't tell, because it has no "." extension. The course, which uses Python to get at the review data, goes straight to:

import json

path = 'C:/Users/xyz/Desktop/Python Folder/Data/yelp_dataset/review.json'
f = open(path)
d = json.loads(f.readline())  # parse the first line, which holds one JSON object

However, I obviously don't have review.json or any of the other .json files, such as user.json. The dataset documentation says that "Each file is composed of a single object type, one JSON-object per-line.", but I'm not sure how to get at the review.json file.
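For reference, the "one JSON-object per-line" layout quoted above can be read line by line once a file like review.json is available. This is only a minimal sketch: the helper name read_json_lines, the temporary sample file, and the field names review_id and stars are made up for illustration and are not taken from the dataset documentation.

```python
import json
import os
import tempfile


def read_json_lines(path, limit=None):
    """Parse a file holding one JSON object per line; stop after `limit` records if given."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            records.append(json.loads(line))
            if limit is not None and len(records) >= limit:
                break
    return records


# Made-up sample standing in for review.json; the field names are illustrative only.
sample_path = os.path.join(tempfile.mkdtemp(), "sample_review.json")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write('{"review_id": "r1", "stars": 5}\n')
    f.write('{"review_id": "r2", "stars": 3}\n')

reviews = read_json_lines(sample_path)
print(len(reviews), reviews[0]["stars"])
```

Reading with a limit, for example read_json_lines(path, limit=1000), avoids loading a very large file into memory all at once.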

Many thanks

There is 1 answer

Answer by user2126062

Thank you, Furas - you put me on the right track. I should not have been extracting the file using WinZip or similar. The correct thing to do is to use Python's tarfile module to extract the files.

import tarfile

path = 'aFILEPATH/yelp_dataset/yelp_dataset.tar'

# open the tar archive
file = tarfile.open(path)

# extract every member into the current working directory
file.extractall()

# close the archive
file.close()
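It can also help to check what is inside the archive before extracting, and to extract into a folder you choose rather than the current working directory. The sketch below is an assumption-laden illustration, not the course's method: it builds a tiny stand-in archive in a temporary folder so it runs anywhere, and the member name review.json and the folder name "extracted" are made up. It also shows that an extensionless file such as "yelp_dataset" can be tested with tarfile.is_tarfile, on the unconfirmed assumption that such a file may itself still be a tar archive.

```python
import os
import tarfile
import tempfile

work_dir = tempfile.mkdtemp()

# Build a tiny stand-in archive so the sketch runs anywhere; the member name is made up.
member_src = os.path.join(work_dir, "review.json")
with open(member_src, "w", encoding="utf-8") as f:
    f.write('{"review_id": "r1", "stars": 5}\n')

# No extension on purpose, like the extracted file described in the question.
archive_path = os.path.join(work_dir, "yelp_dataset")
with tarfile.open(archive_path, "w") as tar:
    tar.add(member_src, arcname="review.json")

# A missing extension does not matter: is_tarfile inspects the file contents.
print(tarfile.is_tarfile(archive_path))

dest_dir = os.path.join(work_dir, "extracted")
os.makedirs(dest_dir, exist_ok=True)

with tarfile.open(archive_path) as tar:  # default mode detects compression automatically
    names = tar.getnames()               # list the members before extracting
    tar.extractall(path=dest_dir)        # extract into dest_dir instead of the cwd

print(names)
print(os.listdir(dest_dir))
```

Using a with block closes the archive automatically, so no explicit close() call is needed.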