I have a large number of variables (around 2000) that need to be initialized. TensorFlow takes a long time to initialize them, which is a blocker for me right now. I am running TF in distributed mode with between-graph replication:
with tf.variable_scope("f_counts"):
    per_ps_features = []  # a list of lists, one per worker
    for node in xrange(num_workers):
        with tf.device("/job:ps/task:{}".format(node % num_ps)):
            f = []  # feature variables placed on this parameter server
            for ps_node in xrange(num_workers):
                # one unique string variable per (node, ps_node) pair
                f.append(tf.get_variable(
                    initializer=tf.constant([], dtype=tf.string),
                    dtype=tf.string,
                    validate_shape=False,
                    trainable=False,
                    name='ps_' + str(node) + 'features_' + str(ps_node)))
            per_ps_features.append(f)
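To put numbers on it: assuming the loops stay as written, both iterate over num_workers, so the snippet creates num_workers ** 2 string variables in total. A quick plain-Python sanity check (the 45 is just an estimate that lines up with my ~2000 variables):

```python
# Both loops in the snippet above iterate over num_workers, so the
# total number of tf.string variables created is num_workers squared.
def total_variables(num_workers):
    return num_workers * num_workers

print(total_variables(45))   # 2025 -- roughly my current ~2000-variable setup
print(total_variables(200))  # 40000 -- the scale I am asking about
```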
As you can see, each worker creates one variable per peer on its assigned parameter server, so the total number of variables grows very quickly. This makes the following very slow (sometimes an hour just to create the session):
with tf.train.MonitoredTrainingSession(master=server.target, is_chief=is_chief,
                                       config=tf.ConfigProto(log_device_placement=False)) as session:
Is there a workaround or an alternative approach when, say, num_workers = 200?
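One direction I have been sketching (names are illustrative, not tested at scale) is consolidating the per-(node, ps_node) variables into a single variable per PS task, so initialization would touch num_ps variables instead of num_workers ** 2. A plain-Python stand-in for the naming scheme:

```python
# Plain-Python stand-in for a restructured graph: one consolidated
# tf.string variable per PS task instead of num_workers ** 2 variables.
def consolidated_names(num_ps):
    # Each name would become a single tf.get_variable(...) call holding
    # all workers' feature strings for that PS task.
    return ["f_counts/ps_{}_all_features".format(p) for p in range(num_ps)]

print(consolidated_names(3))
# ['f_counts/ps_0_all_features', 'f_counts/ps_1_all_features', 'f_counts/ps_2_all_features']
```

I am not sure whether packing everything into a few large variables is viable for my access pattern, so other suggestions are welcome.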