Converting CSV file data into federated data


I am trying to convert my CSV dataset into federated data. Below are my code and the error I get when I run it.

Code:

import collections

import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_federated as tff

np.random.seed(0)
df = pd.read_csv('path to my csv file')

client_id_colname = 'aratio: continuous.' 
SHUFFLE_BUFFER = 1000
NUM_EPOCHS = 1

client_ids = df[client_id_colname].unique()
train_client_ids = sample(client_ids.tolist(),500)
test_client_ids = [x for x in client_ids if x not in train_client_ids]

def create_tf_dataset_for_client_fn(client_id):
  client_data = df[df[client_id_colname] == client_id]
  dataset = tf.data.Dataset.from_tensor_slices(client_data.to_dict('list'))
  dataset = dataset.shuffle(SHUFFLE_BUFFER).batch(1).repeat(NUM_EPOCHS)
  return dataset

train_data = tff.simulation.ClientData.from_clients_and_fn(
        client_ids=train_client_ids,
        create_tf_dataset_for_client_fn=create_tf_dataset_for_client_fn
    )
test_data = tff.simulation.ClientData.from_clients_and_fn(
        client_ids=test_client_ids,
        create_tf_dataset_for_client_fn=create_tf_dataset_for_client_fn
    )

Error:

---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)
<ipython-input-7-9d85508920a8> in <module>
     15 # split client id into train and test clients
     16 client_ids = df[client_id_colname].unique()
---> 17 train_client_ids = sample(client_ids.tolist(),500)
     18 test_client_ids = [x for x in client_ids if x not in train_client_ids]
     19 

NameError: name 'sample' is not defined

There are 2 answers

Zachary Garrett

Python cannot find the sample function. The code needs to import it from somewhere; there are a few possible options, the first being random.sample from the Python standard library.

To use the first, the code needs an import random statement, and the sample line needs to change to:

train_client_ids = random.sample(client_ids.tolist(), 500)
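A slightly fuller sketch of the split, assuming the client IDs fit in memory as a list. Two details are worth guarding against: random.sample raises ValueError if the population has fewer than 500 items, and np.random.seed(0) in the question does not seed Python's random module, so the sketch uses a dedicated, seeded random.Random instance to make the split reproducible. The helper name split_client_ids and its num_train/seed parameters are hypothetical, not part of TFF or the original code.

```python
import random


def split_client_ids(client_ids, num_train=500, seed=0):
    """Split client IDs into disjoint train and test lists (hypothetical helper)."""
    ids = list(client_ids)
    # random.sample raises ValueError if k > len(population), so cap k.
    k = min(num_train, len(ids))
    # A dedicated Random instance gives a reproducible split;
    # np.random.seed does not affect Python's random module.
    rng = random.Random(seed)
    train_ids = rng.sample(ids, k)
    # A set makes each membership test O(1) instead of scanning a list.
    train_set = set(train_ids)
    test_ids = [x for x in ids if x not in train_set]
    return train_ids, test_ids


# Stand-in IDs; in the question these would come from
# df[client_id_colname].unique().tolist().
train_ids, test_ids = split_client_ids(range(1000), num_train=500)
print(len(train_ids), len(test_ids))  # 500 500
```

The resulting lists can be passed as client_ids to the two ClientData constructors in the question unchanged.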
oreopot

Add the following line to your import statements:

from random import sample
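This import and the import random approach bind the same function: from random import sample picks up the sample method of the random module's shared generator, so random.seed controls it either way. A small check, assuming placeholder IDs in place of the real CSV client IDs (the ids list and the pick_both function are hypothetical):

```python
import random
from random import sample

# Placeholder client IDs; in the question these come from the CSV column.
ids = [f"client_{i}" for i in range(20)]


def pick_both(seed, k=5):
    # Re-seed the module-level generator before each draw so both
    # spellings start from the same state.
    random.seed(seed)
    via_from_import = sample(ids, k)
    random.seed(seed)
    via_module = random.sample(ids, k)
    return via_from_import, via_module


a, b = pick_both(0)
print(a == b)  # True
```

So the choice between the two answers is purely stylistic.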