Param-Ishan for Deep learning - 101

How to configure a new account from scratch on Param-Ishan

Let’s configure a new account on the Param-Ishan supercomputer to successfully execute:

>>> import tensorflow as tf

I got help from various sources but couldn’t find a single resource with all the information in one place. This post is meant to be that resource.

Basics

The basics of logging in to the various nodes and using the SLURM scheduler are clearly described here (please refer to that article to understand what these terms stand for). Briefly, the scheduler’s role is this: the login node we are given lets a script run for about 1 hour, after which the process is automatically killed. The idea is that we run our script in debug mode on the login node and, once we are sure it works, submit it to the scheduler, which assigns it a priority (based on the resources it requests) and runs it accordingly. Please refer to the link above for more insight into how this is actually done.
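As a rough sketch of that debug-then-submit workflow, the snippet below writes a minimal SLURM batch script and submits it only if sbatch is available on the machine. The job name, time limit, GPU count, log file name, and the script name train.py are illustrative placeholders of mine, not values from the Param-Ishan documentation; check the cluster's own guide for partitions and limits.

```shell
# Write a minimal batch script. All values below are illustrative placeholders.
cat > train_job.sh <<'EOF'
#!/bin/bash
# Job name shown in the queue
#SBATCH --job-name=tf-train
# One node, one GPU (asking for more usually means a longer wait)
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
# Wall-clock limit (hh:mm:ss) and log file (%j expands to the job ID)
#SBATCH --time=02:00:00
#SBATCH --output=train_%j.log

# Hypothetical training script; replace with your own
python train.py
EOF

# Submit only where SLURM is actually installed (i.e. on the cluster).
if command -v sbatch >/dev/null 2>&1; then
    sbatch train_job.sh
    squeue -u "$USER"
else
    echo "sbatch not found; wrote train_job.sh without submitting"
fi
```

Once the job is queued, squeue shows its state and scancel with the job ID removes it.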

A few additional things to keep in mind:

  1. You cannot connect to the internet on Param-Ishan using your normal 202.* proxy.
  2. If you get greedy and request an unreasonably high number of GPUs from the scheduler, you may end up waiting in the queue for a LONG time.
  3. Never comment out this line of your .bashrc file; bad things will happen if you do:

export LD_LIBRARY_PATH=/cm/shared/apps/glibc_2.14:$LD_LIBRARY_PATH
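If you edit your .bashrc often, a small check like the one below can confirm that the line is still active. This is only a sketch: the function name check_glibc_line is mine, and the match is strict (the whole line must be identical, with no leading spaces), so a commented-out copy does not count.

```shell
# File to check; defaults to your own .bashrc.
BASHRC="${BASHRC:-$HOME/.bashrc}"

# Single quotes keep $LD_LIBRARY_PATH literal, exactly as it appears in the file.
GLIBC_LINE='export LD_LIBRARY_PATH=/cm/shared/apps/glibc_2.14:$LD_LIBRARY_PATH'

check_glibc_line() {
    # -F: fixed string, -x: the whole line must match, -q: no output.
    if grep -Fxq -- "$GLIBC_LINE" "$1" 2>/dev/null; then
        echo "ok: glibc line is active in $1"
    else
        echo "warning: glibc line missing or commented out in $1" >&2
        return 1
    fi
}

# Report the status without aborting the shell if the check fails.
check_glibc_line "$BASHRC" || true
```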

Setup

These steps were compiled with help from various sources.

1) Add the following to your .bashrc file:

module unload cuda80/toolkit/8.0.44
module unload gcc/5.2.0
module load gcc/4.8.2
module load cuda75/toolkit/7.5.18

2) Install Anaconda (transfer the installer .sh file from your local machine using scp).

3) Install virtualenv.
4) git clone the latest versions of keras and theano on your laptop and upload them to the supercomputer. Install theano and keras by going into their respective folders and running: python setup.py install. (Once you have Anaconda installed, you should be able to use pip [with the --proxy flag] for the installation.)
5) Create .theanorc
6) Set the following theano flags in it:

[global]
floatX=float32
device=gpu

[gcc]
cxxflags=-march=core2
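One way to create the file without opening an editor is a heredoc. The sketch below assumes that floatX and device belong under a [global] section, which is where Theano's configuration docs place them. It writes to a sample file in the current directory (the name theanorc.sample is my own choice) so that an existing config is not overwritten by accident.

```shell
# Write the config to a sample file first, so an existing ~/.theanorc is left alone.
THEANORC_SAMPLE="theanorc.sample"

cat > "$THEANORC_SAMPLE" <<'EOF'
[global]
floatX = float32
device = gpu

[gcc]
cxxflags = -march=core2
EOF

# Show the result for a quick visual check.
cat "$THEANORC_SAMPLE"
```

When it looks right, copy it into place with: cp theanorc.sample ~/.theanorc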

7) Create the virtualenv:

virtualenv -p /home/apps/cdac/Python-2.7.10/bin/python <Name>
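The transfer and install steps above can be sketched as a dry-run script. Everything in the first block of variables is a placeholder I made up (user name, host, installer file name, proxy address, environment name); none of it comes from the cluster documentation. By default the script only prints the commands; set DRY_RUN=0 to actually execute them. In practice the first half runs on your laptop and the second half on the cluster.

```shell
# Placeholders: replace all of these before real use.
CLUSTER_USER="${CLUSTER_USER:-username}"
CLUSTER_HOST="${CLUSTER_HOST:-login.example.org}"
INSTALLER="${INSTALLER:-Anaconda2-installer.sh}"
PROXY="${PROXY:-http://proxy.example.org:3128}"
VENV_NAME="${VENV_NAME:-dl-env}"
DRY_RUN="${DRY_RUN:-1}"

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# --- On your laptop: copy the installer and the cloned repositories over ---
run scp "$INSTALLER" "$CLUSTER_USER@$CLUSTER_HOST:~/"
run scp -r Theano keras "$CLUSTER_USER@$CLUSTER_HOST:~/"

# --- On the cluster: install Anaconda (-b: batch mode, -p: install prefix) ---
run bash "$INSTALLER" -b -p "$HOME/anaconda2"

# pip needs the --proxy flag because there is no direct internet access.
run pip install --proxy "$PROXY" virtualenv

# Install Theano and keras from the uploaded source folders.
run sh -c 'cd Theano && python setup.py install'
run sh -c 'cd keras && python setup.py install'

# Create the virtualenv with the cluster's Python 2.7.10.
run virtualenv -p /home/apps/cdac/Python-2.7.10/bin/python "$VENV_NAME"
```

After the last step, activate the environment with: source <Name>/bin/activate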

Tensorflow

TensorFlow (TF) GPU is currently supported only for version 0.10. You can refer to the following tutorial to get it installed. The TF_BINARY_URL needs to be changed if you are using a Python 2.7 environment. Refer to the docs to get the correct link for your environment.
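The install itself comes down to pointing pip at the right wheel. Here is a minimal sketch, assuming you have already copied the correct TF_BINARY_URL from the docs; the function name install_tf_from_url and the optional PROXY variable are mine. The URL is deliberately left unset, because it depends on your Python version.

```shell
install_tf_from_url() {
    # Refuse to run without a URL instead of guessing one.
    if [ -z "${TF_BINARY_URL:-}" ]; then
        echo "TF_BINARY_URL is not set; copy it from the TensorFlow install docs" >&2
        return 1
    fi
    # ${PROXY:+...} adds the --proxy flag only when PROXY is non-empty.
    pip install ${PROXY:+--proxy "$PROXY"} --upgrade "$TF_BINARY_URL"
}

# Usage: export TF_BINARY_URL (and PROXY if needed), then call install_tf_from_url
```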

But if your code requires the latest version of TF, the only option (go through this only if it is really, really necessary) is currently to use TF CPU. For this, you need to install GLIBC 2.17. Refer to this stackoverflow link for setting up GLIBC and this one to make TF work with it. To make your life easier, you can add this to your .bashrc file:

alias libs='/tmp/libc6_2.17/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 --library-path /tmp/libc6_2.17/lib/x86_64-linux-gnu/:$LIBRARY_PATH:/cm/shared/apps/gcc/4.8.2/lib64/'

Then you can run Python using its full path. In my case it works as:

libs /home/<username>/anaconda2/bin/python2.7
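Aliases are not expanded in non-interactive scripts (for example, inside a SLURM job), so a shell function can be a more robust alternative. The sketch below wraps the same loader call; the function name run_with_glibc217 and the overridable GLIBC_DIR and GCC_LIB_DIR variables are my own additions, while the paths are the ones used in the alias.

```shell
# Same paths as in the alias; override them if your GLIBC lives elsewhere.
GLIBC_DIR="${GLIBC_DIR:-/tmp/libc6_2.17/lib/x86_64-linux-gnu}"
GCC_LIB_DIR="${GCC_LIB_DIR:-/cm/shared/apps/gcc/4.8.2/lib64}"

run_with_glibc217() {
    loader="$GLIBC_DIR/ld-linux-x86-64.so.2"
    # Fail early with a clear message if GLIBC 2.17 has not been unpacked yet.
    if [ ! -x "$loader" ]; then
        echo "loader not found: $loader" >&2
        return 1
    fi
    # Run the given program through the newer dynamic loader.
    "$loader" --library-path "$GLIBC_DIR:${LIBRARY_PATH:-}:$GCC_LIB_DIR" "$@"
}

# Example (replace <username> first):
#   run_with_glibc217 /home/<username>/anaconda2/bin/python2.7 -c 'import tensorflow as tf; print(tf.__version__)'
```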

Now, you can use TF (CPU) with the latest version. Hope this helps someone. :)

Cheers! To a better tomorrow :)
