Using train_test_split for a JSON file


I'm loading json files and then trying to use train_test_split to split the data.

The code looks like this:

    import json
    from sklearn.model_selection import train_test_split

    # Load the JSON files
    with open(r'C:\Users\\Documents\Github\\data\training profile sentence real profiles.json', 'r') as f:
        real_dataset = json.load(f)

    with open(r'C:\Users\\Documents\Github\\\data\training profile sentence fake profiles.json', 'r') as f:
        fake_dataset = json.load(f)

    # Split my own datasets with train_test_split (this code is called from inside a function)
    real_train, real_valid = train_test_split(real_dataset, test_size=0.25, shuffle=True)

    fake_train, fake_valid = train_test_split(fake_dataset, test_size=0.25, shuffle=True)

This is giving me the following error:

ValueError: With n_samples=1, test_size=0.25 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.

I've tried adding a train_size parameter to the call, but I still get the same error.

A lot of the examples I've seen show n_samples=0, but I'm getting n_samples=1, which suggests to me that it is seeing the data.
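One way to check what n_samples=1 might mean is to look at the type and length of the object that json.load returns, since the sample count appears to come from the length of the top-level object. The sketch below uses made-up in-memory JSON (the `profiles` key and the `text` field are hypothetical, as the real file layout isn't shown) to contrast a top-level object that wraps the records with a plain top-level list:

```python
import json

# Hypothetical file contents: the real files' layout is not shown here.
# Case 1: the records are wrapped in a single top-level object.
wrapped = json.loads(
    '{"profiles": [{"text": "a"}, {"text": "b"}, {"text": "c"}, {"text": "d"}]}'
)
# Case 2: the records are a plain top-level list.
flat = json.loads('[{"text": "a"}, {"text": "b"}, {"text": "c"}, {"text": "d"}]')


def describe(obj):
    """Return the top-level type name and length of a loaded JSON object."""
    return type(obj).__name__, len(obj)


print(describe(wrapped))  # ('dict', 1)
print(describe(flat))     # ('list', 4)
```

If the real files look like the first case, a length of 1 would be consistent with the n_samples=1 in the error, even though the file contains many records.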

Is it not possible to use train_test_split with JSON files, or do I need to do more in order to use it? Is there another way to do this?

I've used train_test_split before, but only with DataFrame data. I can't find much information about using it with JSON files.
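For reference, here is a minimal sketch of what I'd expect to work, assuming the loaded JSON either is a list of records or wraps that list under a single key. The `profiles` key and the stand-in data are hypothetical; `random_state` is added only to make the split repeatable:

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the result of json.load(f); the real structure is unknown.
real_dataset = {"profiles": [{"text": f"sentence {i}"} for i in range(8)]}

# Pull out the list of records if it is wrapped in a top-level object
# ("profiles" is an assumed key name, not taken from the real files).
if isinstance(real_dataset, dict):
    records = real_dataset["profiles"]
else:
    records = real_dataset

# train_test_split accepts a plain Python list and returns two lists.
real_train, real_valid = train_test_split(
    records, test_size=0.25, shuffle=True, random_state=0
)

print(len(real_train), len(real_valid))  # 6 2
```

The same steps would apply to fake_dataset. If the records are nested differently, only the line that extracts `records` would need to change.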
