I would appreciate help understanding the behaviour described below. Thanks in advance. My setup: Ubuntu 16.04 with 2 Titan X GPUs; TensorFlow 0.12.1 installed with pip in a conda environment, as described in the TF docs; Python 3.5.
Code:
I ran the code below to test my 2-GPU setup, once with random_matrix = tf.zeros(...) and once with random_matrix = tf.random_uniform(...). The outputs are shown below.
Questions:
1) When I run with tf.zeros, the timings on CPU and GPU are nearly identical for every shape except the smallest. With tf.random_uniform, the GPU is much faster, as I had expected. Why is the tf.zeros version so slow on the GPU? What am I missing?
2) I have fixed both the global seed and the local seed. Why do the outputs differ between the two GPUs in the tf.random_uniform case?
Thanks a lot for any insights in advance.
import sys
import numpy as np
import tensorflow as tf
from datetime import datetime

device_names = ["/cpu:0", "/gpu:0", "/gpu:1"]
shapes = [(3000, 3000), (6000, 6000), (9000, 9000), (12000, 12000)]

all_timings = []
tf.set_random_seed(1234)
for device_name in device_names:
    device_timings = []
    for shape in shapes:
        print("device_name:::::::::{}".format(device_name))
        with tf.device(device_name):
            # random_matrix = tf.zeros(shape)
            random_matrix = tf.random_uniform(shape=shape,
                                              minval=0,
                                              maxval=1,
                                              seed=1234)
            result_op = tf.reduce_sum(tf.matmul(random_matrix, tf.transpose(random_matrix)))

        start_time = datetime.now()
        result = -1.0
        with tf.Session(config=tf.ConfigProto(log_device_placement=False)) as session:
            result = session.run(result_op)
        time_diff = datetime.now() - start_time

        device_timings.append((device_name,
                               shape,
                               "time_taken (secs): {}".format(time_diff.total_seconds()),
                               "result: {}".format(result)))
        print("++++++++++++++++++++++++++++++++++++++++++++++++++++++\n\n")
    all_timings.append(device_timings)

print("\n\n")
for device_timings in all_timings:
    for t in device_timings:
        print(t)
    print("---------------------------------------------------------\n\n")
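One thing to keep in mind when reading the numbers: the timed region in the script above starts before tf.Session(...) is created, so each measurement includes session start-up as well as the computation itself. A minimal sketch of a timing helper that runs a few untimed warm-up calls first and then reports the fastest of several timed calls. The names time_call and workload are hypothetical, and workload is only a stand-in for something like session.run(result_op) on an already-open session:

```python
import time


def time_call(fn, warmup=1, repeats=3):
    """Return the fastest wall-clock time (seconds) of `repeats` calls to fn,
    measured after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-time start-up costs
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def workload():
    # Hypothetical stand-in for the real computation being measured.
    return sum(i * i for i in range(100000))


elapsed = time_call(workload)
print("best of 3 runs: {:.6f} s".format(elapsed))
```

Taking the minimum rather than the mean is a design choice that reduces the influence of unrelated background activity; whether that is appropriate depends on what is being compared.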
Timings with tf.random_uniform():
('/cpu:0', (3000, 3000), 'time_taken (secs): 1.146831', 'result: 6754431488.0')
('/cpu:0', (6000, 6000), 'time_taken (secs): 2.816985', 'result: 54023852032.0')
('/cpu:0', (9000, 9000), 'time_taken (secs): 9.372665', 'result: 184425938944.0')
('/cpu:0', (12000, 12000), 'time_taken (secs): 21.718614', 'result: 439655661568.0')
--------------------------------------------------------
('/gpu:0', (3000, 3000), 'time_taken (secs): 0.39667', 'result: 6754406912.0')
('/gpu:0', (6000, 6000), 'time_taken (secs): 0.085984', 'result: 54006796288.0')
('/gpu:0', (9000, 9000), 'time_taken (secs): 0.221407', 'result: 182251880448.0')
('/gpu:0', (12000, 12000), 'time_taken (secs): 0.444187', 'result: 431996174336.0')
---------------------------------------------------------
('/gpu:1', (3000, 3000), 'time_taken (secs): 0.399159', 'result: 6754401792.0')
('/gpu:1', (6000, 6000), 'time_taken (secs): 0.102889', 'result: 54006857728.0')
('/gpu:1', (9000, 9000), 'time_taken (secs): 0.262842', 'result: 182251585536.0')
('/gpu:1', (12000, 12000), 'time_taken (secs): 0.469139', 'result: 431996141568.0')
---------------------------------------------------------
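On question 2, one possible contributor (an assumption on my part, not something established by the runs above) is that float32 addition is not associative: summing the same values in a different order can round differently, so a large reduction whose order varies between devices need not give bit-identical results. A minimal pure-Python sketch of that effect. It uses struct to emulate float32 rounding and contains nothing TensorFlow-specific; f32 and sum_f32 are illustrative helper names:

```python
import struct


def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Sum left to right, rounding to float32 after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total


# The same four numbers, added in two different orders.
forward = sum_f32([1e8, 1.0, -1e8, 1.0])
reordered = sum_f32([1e8, -1e8, 1.0, 1.0])

print("forward:  ", forward)
print("reordered:", reordered)
```

Here forward comes out as 1.0 and reordered as 2.0, because 1e8 + 1 rounds back to 1e8 in float32 (adjacent float32 values near 1e8 are 8 apart).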
Timings with tf.zeros():
('/cpu:0', (3000, 3000), 'time_taken (secs): 1.040602', 'result: 0.0')
('/cpu:0', (6000, 6000), 'time_taken (secs): 2.760587', 'result: 0.0')
('/cpu:0', (9000, 9000), 'time_taken (secs): 9.134257', 'result: 0.0')
('/cpu:0', (12000, 12000), 'time_taken (secs): 21.410583', 'result: 0.0')
---------------------------------------------------------
('/gpu:0', (3000, 3000), 'time_taken (secs): 0.394707', 'result: 0.0')
('/gpu:0', (6000, 6000), 'time_taken (secs): 2.750311', 'result: 0.0')
('/gpu:0', (9000, 9000), 'time_taken (secs): 9.141721', 'result: 0.0')
('/gpu:0', (12000, 12000), 'time_taken (secs): 21.441183', 'result: 0.0')
--------------------------------------------------------
('/gpu:1', (3000, 3000), 'time_taken (secs): 0.390197', 'result: 0.0')
('/gpu:1', (6000, 6000), 'time_taken (secs): 2.788815', 'result: 0.0')
('/gpu:1', (9000, 9000), 'time_taken (secs): 9.335516', 'result: 0.0')
('/gpu:1', (12000, 12000), 'time_taken (secs): 21.654866', 'result: 0.0')
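To make the gap in question 1 concrete, the CPU/GPU time ratios can be computed directly from the two listings above. A minimal sketch: the timings are copied from the output for /cpu:0 and /gpu:0 only, and the dictionary names and the speedups helper are made up for illustration:

```python
# Wall-clock seconds copied from the listings above, keyed by matrix side length.
random_uniform = {
    "/cpu:0": {3000: 1.146831, 6000: 2.816985, 9000: 9.372665, 12000: 21.718614},
    "/gpu:0": {3000: 0.39667, 6000: 0.085984, 9000: 0.221407, 12000: 0.444187},
}
zeros = {
    "/cpu:0": {3000: 1.040602, 6000: 2.760587, 9000: 9.134257, 12000: 21.410583},
    "/gpu:0": {3000: 0.394707, 6000: 2.750311, 9000: 9.141721, 12000: 21.441183},
}


def speedups(timings, baseline="/cpu:0", device="/gpu:0"):
    """Return {size: baseline_time / device_time} for every matrix size."""
    return {n: timings[baseline][n] / timings[device][n] for n in timings[baseline]}


for label, timings in (("tf.random_uniform", random_uniform), ("tf.zeros", zeros)):
    for n, ratio in sorted(speedups(timings).items()):
        print("{:<18} {:>5} x {:<5} CPU/GPU time ratio: {:6.2f}".format(label, n, n, ratio))
```

With tf.random_uniform the ratio at 12000 x 12000 is about 49, while with tf.zeros it is about 1 for every size except 3000 x 3000.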
Thanks Yaroslav! I am posting the code and results from my run, in case somebody else is interested. If you try the code, please be patient: it takes a few minutes to run.
Code:
Summary: