Using TensorFlow on GPU Nodes

From T2B Wiki

OBSOLETE

Although the T2B cluster has GPUs, they are not supported by TensorFlow.
Others have reported that, with the kind of GPUs we have, the results given by TensorFlow might be wrong!
We recommend not using TensorFlow on our GPUs.



Setting up your environment

In order to use TensorFlow, you need to be connected to one of the machines containing a GPU. You can do this either interactively or via the queue. If you are developing, we recommend connecting directly to the machine.
To connect to a GPU machine, first log in to the cluster. You need to add the -A option (SSH agent forwarding) to your ssh command. A good example of a connection command is the following:

ssh -o ServerAliveInterval=100 -AX mshort.iihe.ac.be

When you connect to the cluster in this way, you can tunnel further to the GPU machine:

ssh node49-1.wn
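The two-hop login above can also be recorded in your ~/.ssh/config, so that a single ssh command reaches the GPU node. This is a sketch, assuming a recent OpenSSH client with ProxyJump support; the host aliases t2b and gpunode are invented for the example:

```
Host t2b
    HostName mshort.iihe.ac.be
    ForwardAgent yes
    ForwardX11 yes
    ServerAliveInterval 100

Host gpunode
    HostName node49-1.wn
    ProxyJump t2b
```

With this in place, "ssh gpunode" performs both hops in one command.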

Next, declare the GPUs in your environment:

$ source /swmgrs/icecubes/set_gpus.sh

To make sure that it worked, try something like this:

$ echo $CUDA_VISIBLE_DEVICES
0,1

In this case, we have 2 GPU devices at our disposal.
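CUDA_VISIBLE_DEVICES is an ordinary comma-separated environment variable, so you can count the exposed devices, or restrict a run to a subset of them, with plain shell. A minimal sketch; the value "0,1" below stands in for whatever set_gpus.sh actually exported on your node:

```shell
# Example value standing in for whatever set_gpus.sh exported on a
# two-GPU node (assumption for illustration).
CUDA_VISIBLE_DEVICES="0,1"

# Count the visible devices by splitting the comma-separated list.
NGPUS=$(echo "$CUDA_VISIBLE_DEVICES" | awk -F',' '{print NF}')
echo "GPUs visible: $NGPUS"

# Restrict subsequent CUDA programs to the first GPU only.
export CUDA_VISIBLE_DEVICES=0
echo "Now restricted to GPU: $CUDA_VISIBLE_DEVICES"
```

Restricting the list this way is useful when sharing a node, since CUDA programs only see the devices named in the variable.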

To easily get a ready-to-use software environment for TensorFlow with GPU support, we make use of a Singularity container.

$ singularity shell --nv -B /swmgrs -B /cvmfs -B /scratch   /swmgrs/nonvo/singularity/osgvo-tensorflow-gpu.simg 

Testing your environment

When launching the previous command, you are in a shell inside the osgvo-tensorflow-gpu image. This image is Ubuntu-based, so in that respect it differs from the rest of our cluster, which is Red Hat-like. However, the main functionality is not changed by this fact.

Let us now test a small program. Paste the following code into a file named TF.py:

import tensorflow as tf

# Build the graph: a 1x2 matrix times a 2x1 matrix gives a 1x1 matrix.
# tf.device must wrap graph *construction* for the op to land on GPU 0.
with tf.device("/gpu:0"):
  matrix1 = tf.constant([[3., 3.]])
  matrix2 = tf.constant([[2.], [2.]])
  product = tf.matmul(matrix1, matrix2)

# Run the graph in a session (TensorFlow 1.x API).
with tf.Session() as sess:
  result = sess.run(product)
  print(result)

This program defines two matrices and multiplies them on the first GPU, then prints the result.
To run it, issue:

python TF.py

The end result should be [[12.]].
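As a quick cross-check that needs neither TensorFlow nor a GPU, the single entry of the 1x1 result matrix is just the dot product of the row [3, 3] with the column [2, 2]:

```shell
# Recompute TF.py's product with shell arithmetic: the single entry of
# the 1x1 result is 3*2 + 3*2.
result=$((3*2 + 3*2))
echo "expected entry: $result"
```

If TF.py prints anything other than this value, the environment (rather than the arithmetic) is the first thing to suspect.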

Running on the cluster

If you want to run jobs on the cluster, create the following batch script as job.sh and make it executable (chmod +x job.sh).

#!/bin/bash
source /swmgrs/icecubes/set_gpus.sh
hostname
env | grep CUDA
echo $CUDA_VISIBLE_DEVICES

singularity exec --nv -B /swmgrs -B /cvmfs -B /scratch /swmgrs/nonvo/singularity/osgvo-tensorflow-gpu.simg python TF.py

Notice the "singularity exec" command, which executes "python TF.py" within our Singularity container.
This job can be submitted to the gpu queue in the following way:

qsub -q gpu job.sh

When the job is completed, the file "job.sh.o<jobnr>" will contain the result of your calculations.
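Torque/PBS writes a job's stdout to <script>.o<jobid> and its stderr to <script>.e<jobid>. After several submissions you accumulate one such file per run, so a small sketch for picking up the newest one can help; the job id 12345 below is invented for the example:

```shell
# Stand-in for a finished job's stdout file; the job id 12345 is invented.
touch job.sh.o12345

# Pick the most recently modified output file for job.sh.
latest=$(ls -t job.sh.o* | head -n 1)
echo "latest output file: $latest"
```

On the real cluster you would then inspect that file (e.g. with cat or less) to read the result of your calculation.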
