Initializing a large number of variables in TensorFlow takes a long time


I have a large number of variables (about 2000) that need to be initialized. TensorFlow takes a long time to initialize them, which is a blocker for me right now. I am running TF in distributed mode with between-graph replication.

with tf.variable_scope("f_counts"):
    per_ps_features = []  # A list of lists
    for node in xrange(num_workers):
        with tf.device("/job:ps/task:{}".format(node % num_ps)):
            f = []  # List of feature variables placed on this parameter server
            for ps_node in xrange(num_workers):
                # One string variable per (node, ps_node) pair: unique features per node
                f.append(tf.get_variable(
                    name='ps_' + str(node) + 'features_' + str(ps_node),
                    initializer=tf.constant([], dtype=tf.string),
                    dtype=tf.string,
                    validate_shape=False,
                    trainable=False))
            per_ps_features.append(f)

As you can see, the nested loops create num_workers × num_workers of these variables, spread round-robin across the parameter servers. This makes the following very slow (sometimes an hour just to create the session):

with tf.train.MonitoredTrainingSession(
        master=server.target,
        is_chief=is_chief,
        config=tf.ConfigProto(log_device_placement=False)) as session:
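
For scale, the variable count grows quadratically with num_workers, which is easy to confirm right after graph construction (assuming TF 1.x, where tf.global_variables() is available):

# Prints num_workers * num_workers for the graph built above
# (roughly 2000 variables in my current setup).
print(len(tf.global_variables()))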

Is there a workaround or alternative when, say, num_workers = 200?
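
The only direction I have come up with so far (untested) is to collapse the per-pair variables into a single 2-D string variable per PS task, so the graph holds num_ps variables instead of num_workers². This only works if each feature list can be packed into one serialized string per slot, e.g.:

with tf.variable_scope("f_counts"):
    per_ps_features = []
    for ps in xrange(num_ps):
        # Workers whose state this PS held under the original node % num_ps placement
        workers_here = [w for w in xrange(num_workers) if w % num_ps == ps]
        with tf.device("/job:ps/task:{}".format(ps)):
            per_ps_features.append(tf.get_variable(
                name='ps_{}_features'.format(ps),
                # One row per worker on this PS, one column per peer;
                # each element is a single (serialized) string.
                initializer=tf.constant("", dtype=tf.string,
                                        shape=[len(workers_here), num_workers]),
                dtype=tf.string,
                trainable=False))

But I'm not sure whether this defeats the purpose of validate_shape=False in the original code, since the collapsed variable has a fixed shape.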
