I'm loading JSON files and then trying to split the data with scikit-learn's train_test_split.
The code looks like this:
import json
from sklearn.model_selection import train_test_split

# Load the JSON files
with open(r'C:\Users\\Documents\Github\\data\training profile sentence real profiles.json', 'r') as f:
    real_dataset = json.load(f)
with open(r'C:\Users\\Documents\Github\\\data\training profile sentence fake profiles.json', 'r') as f:
    fake_dataset = json.load(f)

# Split my own datasets into training and validation sets (this code is called from a function)
real_train, real_valid = train_test_split(real_dataset, test_size=0.25, shuffle=True)
fake_train, fake_valid = train_test_split(fake_dataset, test_size=0.25, shuffle=True)
This is giving me the following error:
ValueError: With n_samples=1, test_size=0.25 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
I tried adding the train_size parameter to the call, but I still get the same error.
Most of the examples I've seen show n_samples=0, but I'm getting n_samples=1, which suggests to me that the data is being read.
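To see what train_test_split might be counting as a sample, here is a minimal sketch of how the top-level structure of the loaded data could be inspected. The sample JSON string, the "profiles" key, and the describe_top_level helper are all made up for illustration; I don't know yet whether my files are actually laid out this way.

```python
import json

# Hypothetical stand-in for one of my files: a single top-level key
# wrapping the actual list of records (the real layout may differ).
sample_text = '{"profiles": [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]}'
dataset = json.loads(sample_text)


def describe_top_level(obj):
    """Return the type name and length of a loaded JSON object."""
    return type(obj).__name__, len(obj)


# Length of the top-level container versus the nested list of records
print(describe_top_level(dataset))
print(describe_top_level(dataset["profiles"]))
```

If my real files are wrapped like this, the top-level object has a length of 1 even though it holds several records, which would match n_samples=1.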
Is it not possible to use train_test_split with JSON files? Do I need to do more to the data before I can use it, or is there another way to do this?
I've used train_test_split before, but only with DataFrame data, and I cannot find much information about using it with JSON files.