User Guide

The following User Guide pertains to the wesley cluster. Training for the Visualization facilities is done on an individual basis and varies depending on the requirements of your project.

Logging Into the Head Node

Logging into wesley is done through the SSH protocol and requires an SSH client on your computer. Linux and Mac include a built-in ssh client that can be run from the command line. Windows users can download and install one of several free SSH clients; we recommend MobaXterm or PuTTY.

When you log in to wesley you are actually logging on to the head node (sometimes also called the master, the front end, or the login node). The head node is where you will do virtually all of your work, including submitting your programs (jobs) for execution, monitoring your jobs, editing, compiling and debugging programs, and managing your files.

Logging Into a Compute Node

In some circumstances you may need to log in to one of the compute nodes, for example to monitor your job's processes on that node with top, or to manage temporary job files on the node's local scratch disk (/data). For small tasks like these it is acceptable to log on to the node directly using ssh. You must first be logged in to the head node; from there, use the ssh command to log in to the node itself. For example, to log in to the compute node named wes-04-03, issue the following command from the head node:

ssh wes-04-03

In some situations you may need to run a software package interactively on a compute node. This might be the case if, for example, your interactive work needs more than the 14 GB per-user RAM limit on the head node, or needs hardware that is only available on a particular node (e.g. GPU software development using CUDA). In these cases you should use the queuing system to submit an interactive job. The queuing system will find a free node, reserve a CPU core and the requested amount of RAM, and then log you in to that node with an interactive shell. Using interactive jobs this way ensures that you will not interfere with other running jobs on the cluster, and other users' jobs will not interfere with you. For example, to get an interactive shell on any node with 24 GB of available RAM, reserved for 4 hours, use:

qsub -I -l pvmem=24gb -l walltime=4:00:00

Access restrictions (Firewall)

wesley is accessible from any computer on the Lakehead University network and from several well-known off-campus networks. For security reasons, wesley is not accessible from the Internet at large.

Changing your Password

To change your password, log in to the head node and type:

passwd

The first time that you log in you should change the initial password that was assigned to you. It is also recommended that you change your password periodically.

Transferring Files

Files can be transferred between wesley and your computer using the sftp or scp protocols (both are secure protocols based on the SSH protocol). Mac and Linux have these available through the command line. Windows users can download free clients such as MobaXterm or WinSCP.

File Storage

Home Directory

Your home directory is located in /home/YOUR_USERNAME. /home is a filesystem which is physically located on the storage node of the cluster, but exported using the NFS protocol so that it is fully accessible from the head node and all of the compute nodes. You see exactly the same files under /home no matter which node you are logged into or running jobs on.

There are currently no quotas on the amount of data you can store in your home directory. However, since the entire /home filesystem is only 9 TB and is shared by all users, it is recommended that you keep your usage below 30 GB.

If usage on /home becomes a problem, the users with the most data will be contacted directly and asked to reduce their usage. If problems persist, system-imposed disk quotas will be implemented.
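To see how close you are to the recommended 30 GB, you can total up your home directory with du. The short script is a sketch using only standard GNU tools; the 30 GB threshold comes from this guide, and sizes are reported in whole GiB (1024 x 1024 KB), rounded down.

```shell
#!/bin/bash
# Sketch: compare your home directory usage with the recommended 30 GB ceiling.
limit_gb=30

# du -sk prints "<kilobytes><TAB><path>"; keep only the number.
used_kb=$(du -sk "$HOME" 2>/dev/null | cut -f1)
used_kb=${used_kb:-0}                    # treat a failed du as 0
used_gb=$(( used_kb / 1024 / 1024 ))     # whole GiB, rounded down

echo "Home directory usage: ${used_gb} GB (recommended maximum: ${limit_gb} GB)"

if (( used_gb >= limit_gb )); then
    # Show the ten largest top-level (non-hidden) items, biggest first.
    echo "Largest items in ${HOME} (KB):"
    du -sk "$HOME"/* 2>/dev/null | sort -rn | head -n 10
fi
```

Run it on the head node; when the total is at or above the limit it also lists your largest top-level items as candidates for cleanup.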

Scratch File System

Some HPC applications create large and/or numerous temporary scratch files while running; these files are not needed after the calculation completes. In other cases, the output files are kept, but they are large and are written with very frequent small write operations. Because /home is accessed over a relatively slow 1 Gbps Ethernet network, and because hundreds of such applications may be running simultaneously, read and write performance on /home can be severely degraded.

To address this issue, each compute node is equipped with a 1 TB locally attached disk (in practice, two disks striped together as RAID 0). High I/O jobs running on a compute node can use this disk for their scratch files to obtain better performance and improve overall cluster efficiency.

To provide access to these local disks, a directory /data/YOUR_USERNAME is automatically created for you on each node. To use it, write your job control script so that the job works in /data/YOUR_USERNAME on the node where it runs and copies any results you want to keep back to your home directory before it ends, since /data on one node is not visible from the head node or from other nodes.
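A job script following this pattern might look like the sketch below. It assumes a Torque/PBS-style scheduler (as the qsub options used in this guide suggest), which sets PBS_JOBID and PBS_O_WORKDIR inside a job; the run_in_scratch helper, input.dat, my_program and the results directory are hypothetical names, and the resource values are placeholders.

```shell
#!/bin/bash
#PBS -l walltime=2:00:00
#PBS -l pvmem=4gb
# Sketch: stage input onto the node-local /data disk, run there, copy results
# back to /home, and clean up. Names other than /data/$USER are hypothetical.

# run_in_scratch SCRATCH_ROOT SUBMIT_DIR INPUT_FILE COMMAND [ARGS...]
# The body runs in a subshell, so the EXIT trap fires when the function returns.
run_in_scratch() (
    set -euo pipefail
    scratch_root=$1 submit_dir=$2 input=$3
    shift 3

    # Private per-job directory so simultaneous jobs never collide.
    scratch=$(mktemp -d "${scratch_root}/job.XXXXXX")
    trap 'rm -rf "$scratch"' EXIT      # remove scratch even if the job fails

    cp "${submit_dir}/${input}" "$scratch/"
    cd "$scratch"
    "$@"                               # heavy I/O happens on the local disk

    # /data is local to this node, so copy everything back to /home.
    mkdir -p "${submit_dir}/results"
    cp -r "$scratch"/. "${submit_dir}/results/"
)

# Only run the real job when started by the scheduler (PBS sets PBS_JOBID).
if [[ -n "${PBS_JOBID:-}" ]]; then
    run_in_scratch "/data/${USER}" "$PBS_O_WORKDIR" input.dat \
        "${PBS_O_WORKDIR}/my_program" input.dat   # hypothetical program/input
fi
```

Keeping the work in a function makes the staging logic easy to try out by hand against ordinary directories before submitting it as a real job.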

Running Programs (Jobs)

Running programs on the cluster is somewhat different from running programs on a standalone Linux/Unix machine. In general, your programs must be submitted to a job queue rather than run directly on the login/head node. An automatic scheduler is responsible for identifying compute nodes with adequate resources, taking jobs from the queue and starting them on the assigned compute nodes, and cleaning up after each job completes. Several user commands are provided for submitting your jobs to the queue and for monitoring and managing (e.g. deleting) them. These commands are described in detail in the Job Control section.

Programs run directly on the head node should be limited to editing, compiling, debugging, simple interactive graphical applications (e.g. word processing, graphing), managing your files, and submitting and managing your jobs. Limits are imposed on the memory size and run time of user processes on the login node to prevent the system from being overloaded.
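As a rough illustration of what gets submitted to the queue, here is a minimal batch script. It assumes the cluster runs a Torque/PBS-style scheduler, as the qsub options used elsewhere in this guide suggest; the job name and resource values are placeholders, and my_program is a hypothetical executable. The #PBS lines are read by qsub as job options and are ordinary comments to bash.

```shell
#!/bin/bash
#PBS -N hello_wesley
#PBS -l walltime=0:10:00
#PBS -l pvmem=1gb
# Minimal job sketch (Torque/PBS-style scheduler assumed).

# Torque starts jobs in your home directory; move to the directory qsub was
# run from. Outside a job PBS_O_WORKDIR is unset, so stay where we are.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

node=$(uname -n)
echo "Job ${PBS_JOBID:-<not under PBS>} running on ${node} in $(pwd)"

# Replace this line with your real program (hypothetical name):
# ./my_program input.dat > output.dat
```

Saved as, say, hello.pbs, it would be submitted with qsub hello.pbs. Under Torque, qstat -u $USER lists your queued and running jobs, qdel followed by a job ID deletes one, and the job's standard output is normally written to a file named after the job (e.g. hello_wesley.o followed by the job number) in the submission directory.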

Using the GPGPUs

Running CUDA programs

The Nvidia Tesla M2050 GPGPU is attached to the compute node wes-00-00. In order to run a program that requires the GPU, you must explicitly request wes-00-00 in your job control script.
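A batch job for the GPU node might look like the following sketch. It reuses the -l host=wes-00-00 request that this guide uses for interactive CUDA sessions and assumes a Torque/PBS-style scheduler; the walltime and pvmem values and my_cuda_program are placeholders. Before starting the real work it checks that it actually landed on wes-00-00 and, if nvidia-smi is available, records the GPU status in the job output.

```shell
#!/bin/bash
#PBS -l host=wes-00-00
#PBS -l walltime=1:00:00
#PBS -l pvmem=8gb
# Sketch of a GPU job pinned to wes-00-00 (resource values are placeholders).

cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

gpu_node="wes-00-00"
node=$(uname -n)
node=${node%%.*}          # drop any domain suffix, e.g. wes-00-00.local

if [[ "$node" != "$gpu_node" ]]; then
    echo "Warning: running on ${node}, not ${gpu_node}; no GPU is available." >&2
elif command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi            # record the GPU state in the job's output file
fi

# ./my_cuda_program       # hypothetical: replace with your CUDA executable
```

The host check is cheap insurance: if the host request is mistyped or dropped, the job output says so immediately instead of failing later inside the CUDA program.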

Compiling and Debugging CUDA code

Start an interactive job on wes-00-00 to compile your CUDA GPU code or to debug it interactively with Nvidia's CUDA debugger. For example, the following requests a 4-hour session on wes-00-00 and reserves 8 GB of RAM:

qsub -I -X -l host=wes-00-00 -l walltime=4:00:00 -l pvmem=8gb

The -X option turns on X11 (graphics) forwarding and is required if you intend to run any graphical user interface. It also requires that you enabled X11 forwarding when you logged in to the head node (for example, with ssh -X).