The Forge

System Information

Software

The Forge was built with Rocks 6.1.1, which is built on top of CentOS 6.8. With the Forge we converted from PBS/Maui to SLURM as our scheduler and resource manager.

Hardware

Management nodes

The head node and login node are Dell commodity servers configured as follows.

Dell R430: Dual 8 core Haswell CPUs with 64GB of DDR4 RAM and dual 1TB 7.2K RPM drives in RAID 1

Compute nodes

The newly added compute nodes are all SuperMicro TP-2028 SuperServer chassis, configured as follows.

SuperMicro TP-2028: 4-node chassis, with each node containing dual 16 core Haswell CPUs, 256 GB of DDR4 RAM, and six 300GB 15K SAS drives in RAID 0.

Storage

General Policy Notes

None of the cluster attached storage available to users is backed up in any way by us. This means that if you delete something and don't have a copy somewhere else, it is gone. Please note that data stored on cluster attached storage is limited to Data Class 1 and 2 as defined by the UM System Data Classifications. If you need to store DCL3 or DCL4 data, please contact us so we may find a solution for you.

Home Directories

The Forge home directory storage is served from an NFS share, meaning your home directory is the same across the entire cluster. The storage system provides 14 TB of raw storage, of which 13 TB is available to users, limited to 50GB per user; the quota can be expanded upon request with proof of need. This volume is not backed up, and we do not provide any data recovery guarantee in the event of a storage system failure. System failures where data loss occurs are rare, but they do happen. All this to say, you should not be storing the only copy of your critical data on this system.

Scratch Directories

Each user gets a scratch directory created for them at /mnt/stor/scratch/$USER; an alias of `cdsc` has also been made for users to cd directly to this location. As with all cluster storage, scratch space is not backed up, and to reinforce the impermanence of data in this location it is actively cleaned in an attempt to prevent the storage from filling up. This storage is intended for temporary files created by your programs that you only need to keep for a short time after a calculation completes. The volume is a high speed network attached scratch space. There are currently no quotas placed on directories in this scratch space; however, if the 20TB volume filling up becomes a problem we will have to implement quotas.
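
For example (assuming the alias is available in your login shell):

# jump to your scratch directory using the provided alias
cdsc
# or change directory explicitly
cd /mnt/stor/scratch/$USER
# check how much scratch space your files are using
du -sh /mnt/stor/scratch/$USER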

Along with the networked scratch space, there is always local scratch on each compute node for use during calculations in /tmp. There is no quota placed on this space, and it is cleaned regularly as well, but things stored in this space will only be available to processes executing on the node in which they were created. Meaning if you create it in /tmp in a job, you won't be able to see it on the login node, and other processes won't be able to see it if they are on a different node than the process which created the file.
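
Here is a minimal sketch of a job that works out of /tmp (input.dat and a.out are placeholder names for your own input file and executable):

local_scratch.sub
#!/bin/bash
#SBATCH --job-name=local_scratch_example
#SBATCH --ntasks=1
#SBATCH --time=0-00:10:00
#SBATCH --out=Forge-%j.out
 
# input.dat and a.out are placeholders; substitute your own files
cp ~/input.dat /tmp/
cd /tmp
~/a.out input.dat > result.out
# copy anything you want to keep back to your home directory before the job ends,
# since /tmp is local to this compute node and is cleaned regularly
cp /tmp/result.out ~/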

Leased Space

If the home directory and scratch space aren't enough for your storage needs, we also lease out quantities of cluster attached storage. If you are interested in leasing storage please contact us. If you are already leasing storage and need a reference guide on how to manage it, please go here.

Policies

Under no circumstances should your code be running on the login node.

You are allowed to install software in your home directory for your own use. Know that you will *NOT* be given root/sudo access, so if your software requires it you will not be able to use that software. Contact ITRSS about having the software installed cluster-wide.

User data on the Forge is not backed up, meaning it is your responsibility to back up important research data to an off-site location via any of the methods in the moving_data section.

If you are a student your jobs can run on any compute node in the cluster, even the ones dedicated to researchers; however, if the researcher who has priority access to a dedicated node submits work to it, your job will be preempted and placed back into the queue. You may prevent this preemption by specifying that your job run only on the general nodes in your job file; please see the documentation on how to submit this request.

If you are a researcher who has purchased priority access to an allocation of nodes for 3 years, your jobs will primarily run on those nodes. If your job requires more resources than are available in your priority queue, it will run in the general queue at the same priority as student jobs. You may add users to your allocation so that your research group may also benefit from the priority access.

If you are a researcher who has purchased an allocation of CPU hours, you will run on the general nodes at the same priority as the students. Your job will not run on any dedicated nodes and will not be susceptible to preemption by any other user. Once your job starts it will run until it fails, completes, or runs out of allocated time.


Partitions

The hardware in the Forge is split up into separate groups, or partitions. Some hardware is in more than one partition. For the most part you won't have to specify a partition in a job submission file, as we have a plugin that selects the best partition for your user and job. However, there are a few cases where you will want to assign a job to a specific partition. Please see the table below for the limits and default values given to jobs based on the partition; the important thing to note is how long you can request your job to run. An example of explicitly requesting a partition follows the table.

Partition                Time Limit   Default Memory per CPU
short                    4 hours      1000MB
requeue                  7 days       1000MB
free                     14 days      1000MB
any priority partition   30 days      varies by hardware
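
For example, if your job needs more than the 7 day limit of the requeue partition, you could request the free partition and a longer wall time explicitly (a sketch; normally the routing plugin picks the partition for you):

#SBATCH -p free
#SBATCH --time=10-00:00:00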

Quick Start

Logging in

SSH (Linux)

Open a terminal and type

 ssh username@forge.mst.edu 

replacing username with your campus SSO username, then enter your SSO password.

Logging in places you onto the login node. Under no circumstances should you run your code on the login node.

If you submit a batch file, your job will be sent to a compute node to run.

However, if you are attempting to use a GUI, ensure that you do not run your session on the login node (example prompt: username@login-44-0). Use an interactive session to be directed to a compute node to run your software.

sinteractive

For further description of sinteractive, read the section in this documentation titled Interactive Jobs.

Putty (Windows)

Open Putty and connect to forge.mst.edu using your campus SSO.

Off Campus Logins

Our off campus logins go through a different host, forge-remote.mst.edu, and use public key authentication only; password authentication is disabled for off campus users. To learn how to connect from off campus, please see our how-to on setting up public key authentication.
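
As a rough sketch of what that setup involves using standard OpenSSH tools (the exact procedure is in the how-to; the key type and the on-campus copy step here are assumptions):

# on your own machine, generate a key pair
ssh-keygen -t ed25519
# copy the public key into ~/.ssh/authorized_keys on the Forge, e.g. while on campus or on the VPN
ssh-copy-id username@forge.mst.edu
# then connect through the off campus host
ssh username@forge-remote.mst.edu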

Submitting a job

Using SLURM is very similar to using PBS/Maui to submit jobs: you create a submission script to execute on the backend nodes, then use a command line utility to submit the script to the resource manager. See the file contents of a general submission script, complete with comments, below.

Example Job Script
batch.sub
#!/bin/bash
#SBATCH --job-name=Change_ME 
#SBATCH --ntasks=1
#SBATCH --time=0-00:10:00 
#SBATCH --mail-type=begin,end,fail,requeue 
#SBATCH --export=all 
#SBATCH --out=Forge-%j.out 
 
# %j will substitute to the job's id
# Now run your executables just like you would in a shell script. Slurm sets the working directory to the directory the job was submitted from.
# e.g. if you submitted from /home/blspcy/softwaretesting, your job would run in that directory.
 
#(executables) (options) (parameters)
echo "this is a general submission script"
echo "I've submitted my first batch job successfully"

Now you need to submit that batch file to the scheduler so that it will run when it is time.

 sbatch batch.sub 

You will see the output of sbatch after the job submission; it will give you the job number. If you would like to monitor the status of your jobs, you may do so with the squeue command.

Common SBATCH Directives
  • --job-name= (string value, no spaces): Sets the job name to something more friendly, which is useful when examining the queue.
  • --ntasks= (integer value): Sets the number of CPUs requested for the job.
  • --nodes= (integer value): Sets the number of nodes you wish to use; useful if you want all your tasks to land on one node.
  • --time= (D-HH:MM:SS or HH:MM:SS): Sets the allowed run time for the job.
  • --mail-type= (begin,end,fail,requeue): Sets when you would like the scheduler to notify you about a job. By default no email is sent.
  • --mail-user= (email address): Sets the mailto address for this job.
  • --export= (ALL, or specific variable names): By default Slurm exports the current environment variables, so all loaded modules will be passed to the environment of the job.
  • --mem= (integer value): Amount of memory in MB you would like the job to have access to. Each queue has a default memory-per-CPU value set, so unless your executable runs out of memory you will likely not need this directive.
  • --mem-per-cpu= (integer): Amount of memory in MB you want per CPU. Default values vary by queue but are typically at least 1000MB.
  • --nice= (integer): Allows you to lower a job's priority if you would like other jobs set to a higher priority in the queue; the higher the nice number, the lower the priority.
  • --constraint= (see the sbatch man page for usage): Used only if you want to constrain your job to run on resources with specific features; please see the next table for a list of valid features to constrain on.
  • --gres= (name:count): Allows the user to reserve additional resources on the node, specifically GPUs on our cluster. e.g. --gres=gpu:2 will reserve 2 GPUs on a GPU-enabled node.
  • -p (partition_name): Not typically used; if not defined, jobs get routed to the highest priority partition your user has permission to use. If you want to specifically use a lower priority partition because of higher resource availability, you may do so.
Valid Constraints
Feature   Description
intel     Node has Intel CPUs
amd       Node has AMD CPUs
EDR       Node has an EDR (100Gbit/sec) InfiniBand interconnect
FDR       Node has an FDR (56Gbit/sec) InfiniBand interconnect
QDR       Node has a QDR (40Gbit/sec) InfiniBand interconnect
DDR       Node has a DDR (20Gbit/sec) InfiniBand interconnect
serial    Node has no high speed interconnect
gpu       Node has GPU acceleration capabilities
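
For example, either of the following directives could be added to a submission script (shown as independent examples, not one combined request; make sure nodes with the features you ask for actually exist):

# restrict the job to nodes with Intel CPUs and an EDR interconnect
#SBATCH --constraint="intel&EDR"
# or reserve two GPUs on a GPU-capable node
#SBATCH --gres=gpu:2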

Monitoring your jobs

 squeue -u username 
JOBID     PARTITION    NAME    USER     STATE        TIME CPUS NODES
  719       requeue Submiss  blspcy   RUNNING       00:01    1     1

Cancel your job

scancel - Command to cancel a job, user must own the job being cancelled or must be root.

scancel <jobnumber>

Viewing your results

Output from your submission will go into an output file in the submission directory; this will either be slurm-jobnumber.out or whatever you defined in your submission script. In our example script we set this to Forge-jobnumber.out. This file is written asynchronously, so for a very short job it may take a little while after the job completes for the file to show up.

Moving Data

Moving data in and out of the Forge can be done with a few different tools depending on your operating system and preference.

DFS volumes

Missouri S&T users can mount their web volumes and S Drives with the

mountdfs

command. This will mount your user directories on the login machine under /mnt/dfs/$USER. The data can be copied to your home directory with command line tools; your data will not be accessible from the compute nodes, so do not submit jobs from these directories. Aliases “cds” and “cdwww” have been created to allow you to cd into your S drive and web volume quickly and easily.
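
For example (the file name here is a placeholder):

# mount your DFS volumes on the login node
mountdfs
# jump to your S drive and copy a file back to your home directory
cds
cp report.docx ~/
# the web volume works the same way via cdwww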

Windows

WinSCP

Using WinSCP, connect to forge.mst.edu using your SSO just as you would with ssh or PuTTY, and you will be presented with the contents of your home directory. You can then drag files into the WinSCP window and drop them in the folder you want them in, and the copy will begin. It works the same in the opposite direction to get data back out.

Filezilla

Using Filezilla, connect to forge.mst.edu using your SSO and the contents of your home directory will be displayed; drag and drop works with Filezilla as well.

Git

git is installed on the cluster and is recommended for keeping track of code changes across your research. See getting started with git for usage guides. Campus offers a hosted private git server at https://git.mst.edu at no additional cost.

Linux

Filezilla

See windows instructions

scp

scp is a command line utility that allows for secure copies from one machine to another through ssh; it is available on most Linux distributions. If I wanted to copy a file in using scp, I would open a terminal on my workstation and issue the following command.

 scp /home/blspcy/batch.sub blspcy@forge.mst.edu:/home/blspcy/batch.sub 

It will then ask me to authenticate using my campus SSO, then copy the file from my local location of /home/blspcy/batch.sub to my Forge home directory. If you have questions on how to use scp, I recommend reading the man page for scp, or check it out online at the SCP man page.

rsync

rsync is a more powerful command line utility than scp; it has a simpler syntax, and it checks whether a file has actually changed before performing the copy. See the man page for usage details or the online documentation.
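
Mirroring the scp example above, an equivalent rsync copy might look like this (-a preserves permissions and timestamps, -v lists what was transferred):

 rsync -av /home/blspcy/batch.sub blspcy@forge.mst.edu:/home/blspcy/ 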

git

See the git entry under Windows for instructions; it works the same way.

Modules

An important concept for running on the cluster is modules. Unlike a traditional computer, where you can run every program from the command line after installing it, on the cluster we install programs to a central repository and you load only the ones you need as modules. To see which modules are available to you, type “module avail”. Once you find the module you need, type “module load <module>”, where <module> is the module you want from the module avail list. You can see which modules you already have loaded by typing “module list”.
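
For example, using one of the Python builds from the listing below (any module name in the list works the same way):

# see what is available
module avail
# load a specific module
module load python/3.7.1
# show what is currently loaded
module list
# unload it when you are finished
module unload python/3.7.1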

Here is the output of module avail as of 02/01/2019

------------------------------------------------------------------------------------------------------ /share/apps/modulefiles/mst-app-modules ------------------------------------------------------------------------------------------------------
   AnsysEM/17/17          SAC/101.2                           ansys/17.0               curl/7.63.0              gnuplot                 matlab/2012                       msc/nastran/2014            rlwrap/0.43             trinity
   AnsysEM/19/19          SAC/101.6a                   (D)    ansys/18.0               dirac                    gperf/3.1               matlab/2013a                      msc/patran/2014             rum                     valgrind/3.12
   CORE/06.05.92          SAS/9.4                             ansys/18.1               dock/6.6eth              gridgen                 matlab/2014a               (D)    mseed2sac/2.2               samtools/0.1.19         valgrind/3.14 (D)
   CST/2014        (D)    SEGY2ASCII/2.0                      ansys/18.2               dock/6.6ib        (D)    gromacs                 matlab/2016a                      namd/2.9                    scipy                   vasp/4.6
   CST/2015               SEISAN/10.5                         ansys/19.0               espresso                 gv/3.7.4                matlab/2017a                      namd/2.10            (D)    siesta                  vasp/5.4.1    (D)
   CST/2016               SU2/3.2.8                           ansys/19.2               feh/2.28                 hadoop/1.1.1            matlab/2017b                      namd/2.12b                  spark                   vim/8.0.1376
   CST/2017               SU2/4.1.3                           apbs                     fio/3.12                 hadoop/1.2.1            matlab/2018a                      nwchem/6.6                  sparta                  visit/2.8.2
   CST/2018               SU2/5.0.0                           bison/3.1                flex/2.6.0               hadoop/2.6.0     (D)    matlab/2018b                      octave/3.8.2                spin                    visit/2.9.0   (D)
   GMT/4.5.17             SU2/6.0.0                    (D)    bowtie                   freefem++                hk/1.3                  mcnp/mcnp-nic              (D)    octave/4.2.1         (D)    starccm/6.04            vmd
   GMT/5.4.2       (D)    SeisUnix/44R2                       cadence/cadence          fun3d/custom      (D)    htop/2.0.2              mcnp/6.1                          openssl/1.1.1a              starccm/7.04            voro++/0.4.6
   GMTSAR/gmon.out        SeismicHandler/2013a                calcpicode               fun3d/12.2               imake                   mctdh                             overture                    starccm/8.04     (D)    vulcan
   GMTSAR/5.6.test        a2ps/4.14                           casino                   fun3d/12.3               jdupes/1.10.4           metis/5.1.0                       packmol                     starccm/10.04           x11/1.20.1
   GMTSAR/5.6      (D)    abaqus/6.11-2                       cmg                      fun3d/12.4               lammps/9Dec14           molpro/2010.1.nightly             parallel/20180922           starccm/10.06           xdvi/22.87.03
   JWM/2.3.7              abaqus/6.12-3                (D)    comsol/4.3a              fun3d/2014-12-16         lammps/17Nov16          molpro/2010.1.25                  parmetis/4.0.3              starccm/11.02           yade/gmon.out
   LIGGGHTS/3.8           abaqus/6.14-1                       comsol/4.4        (D)    gamess                   lammps/23Oct2017 (D)    molpro/2012.1.nightly      (D)    proj                        starccm/11.04           yade/2017.01a (D)
   OpenFoam/2.3.1         abaqus/2016                         comsol/5.2a_deng         gaussian                 lammps/24Jul17          molpro/2012.1.9                   psi4                        starccm/12.02           zpaq/7.15
   OpenFoam/2.4.x         abaqus/2018                         comsol/5.2a_park         gaussian-09-e.01         lammps/30July16         molpro/2015.1.source.block        qe                          starccm/12.06
   OpenFoam/5.x           abinit/8.6.3                        comsol/5.2a_yang         gccmakedep/1.0.3         linpack                 molpro/2015.1.source.s            qmcpack/3.0                 starccm/13.04
   OpenFoam/6.x    (D)    abinit/8.10.1                (D)    comsol/5.2a_zaeem        gdal                     lz4/1.8.3               molpro/2015.1.source              qmcpack/2014                tecplot/2013
   ParaView               accelrys/material_studio/6.1        comsol/5.3a_park         gdk-pixbuf/2.24.1        madagascar/2.0          molpro/2015.1                     qmcpack/2016         (D)    tecplot/2014     (D)
   QtCreator/Qt5          amber/12                            comsol/5.4_park          geos                     maple/16                moose/3.6.4                       quantum-espresso/6.1        tecplot/2016.old
   R/3.1.2                amber/13                     (D)    cp2k                     git/2.20.1               maple/2015       (D)    mpb-meep                          rdkit/2017-03-3             tecplot/2016
   R/3.4.2                ansys/14.0                          cplot                    gmp                      maple/2016              mpc                               rdseed/5.3.1                tecplot/2017r3
   R/3.5.1         (D)    ansys/15.0                   (D)    cpmd                     gnu-tools                maple/2017              msc/adams/2015                    rheotool/3.0                tecplot/2017

--------------------------------------------------------------------------------------------------- /share/apps/modulefiles/mst-compiler-modules ----------------------------------------------------------------------------------------------------
   cilk/5.4.6         cmake/3.9.4     gmake/4.2.1        gnu/5.4.0    gnu/7.1.0     gpi2/1.1.1            (D)    intel/2015.2.164 (D)    java/9.0.4  (D)    ninja/1.8.2    pgi/RCS/11.4,v     python/2.7.14       python/3.5.2    python/3.7.1 (D)
   cmake/3.2.1 (D)    cmake/3.11.1    gnu/4.9.2   (D)    gnu/6.3.0    gnu/7.2.0     intel/2011_sp1.11.339        intel/2018.3            julia/0.6.2        perl/5.26.1    python/gmon.out    python/2.7.15       python/3.6.2
   cmake/3.6.2        cmake/3.13.1    gnu/4.9.3          gnu/6.4.0    gpi2/1.0.1    intel/2013_sp1.3.174         java/8u161              mono/3.12.0        pgi/11.4       python/2.7.9       python/3.5.2_GCC    python/3.6.4

------------------------------------------------------------------------------------------------------ /share/apps/modulefiles/mst-mpi-modules ------------------------------------------------------------------------------------------------------
   intelmpi                             mvapich2/pgi/11.4/eth                       openmpi/1.10.7/intel/2011_sp1.11.339        openmpi/2.0.3/gnu/7.2.0             (D)    openmpi/2.1.1/gnu/7.2.0             (D)    openmpi/intel/13/eth
   mvapich2/2.2/intel/2015.2.164        mvapich2/pgi/11.4/ib                 (D)    openmpi/1.10.7/intel/2013_sp1.3.174         openmpi/2.0.3/intel/2011_sp1.11.339        openmpi/2.1.1/intel/2011_sp1.11.339        openmpi/intel/13/ib  (D)
   mvapich2/2.2/intel/2018.3     (D)    openmpi                                     openmpi/1.10.7/intel/2015.2.164      (D)    openmpi/2.0.3/intel/2013_sp1.3.174         openmpi/2.1.1/intel/2013_sp1.3.174         openmpi/intel/15/eth
   mvapich2/2.3/gnu/7.2.0               openmpi/1.10.6/gnu/4.9.3                    openmpi/1.10/gnu/4.9.2                      openmpi/2.0.3/intel/2015.2.164             openmpi/2.1.1/intel/2015.2.164             openmpi/intel/15/ib  (D)
   mvapich2/2.3/intel/2018.3            openmpi/1.10.6/gnu/5.4.0                    openmpi/1.10/intel/15                       openmpi/2.0.3/intel/2018.3          (D)    openmpi/2.1.1/intel/2018.3          (D)    openmpi/pgi/11.4/eth
   mvapich2/gnu/4.9.2/eth               openmpi/1.10.6/gnu/6.3.0                    openmpi/2.0.2/gnu/4.9.3                     openmpi/2.1.0/gnu/4.9.3                    openmpi/2.1.5/intel/2018.3                 openmpi/pgi/11.4/ib  (D)
   mvapich2/gnu/4.9.2/ib         (D)    openmpi/1.10.6/gnu/7.1.0                    openmpi/2.0.2/gnu/5.4.0                     openmpi/2.1.0/gnu/5.4.0                    openmpi/3.1.2/intel/2018.3                 rocks-openmpi
   mvapich2/intel/11/eth                openmpi/1.10.6/gnu/7.2.0             (D)    openmpi/2.0.2/gnu/6.3.0                     openmpi/2.1.0/gnu/6.3.0                    openmpi/gnu/4.9.2/eth                      rocks-openmpi_ib
   mvapich2/intel/11/ib          (D)    openmpi/1.10.6/intel/2011_sp1.11.339        openmpi/2.0.2/gnu/7.1.0              (D)    openmpi/2.1.0/gnu/7.1.0             (D)    openmpi/gnu/4.9.2/ib                (D)
   mvapich2/intel/13/eth                openmpi/1.10.6/intel/2013_sp1.3.174         openmpi/2.0.2/intel/2011_sp1.11.339         openmpi/2.1.0/intel/2011_sp1.11.339        openmpi/intel/11/backup
   mvapich2/intel/13/ib          (D)    openmpi/1.10.6/intel/2015.2.164      (D)    openmpi/2.0.2/intel/2013_sp1.3.174          openmpi/2.1.0/intel/2013_sp1.3.174         openmpi/intel/11/eth
   mvapich2/intel/15/eth                openmpi/1.10.7/gnu/6.4.0                    openmpi/2.0.2/intel/2015.2.164       (D)    openmpi/2.1.0/intel/2015.2.164      (D)    openmpi/intel/11/ib                 (D)
   mvapich2/intel/15/ib          (D)    openmpi/1.10.7/gnu/7.2.0             (D)    openmpi/2.0.3/gnu/6.4.0                     openmpi/2.1.1/gnu/6.4.0                    openmpi/intel/13/backup

------------------------------------------------------------------------------------------------------ /share/apps/modulefiles/mst-lib-modules ------------------------------------------------------------------------------------------------------
   CUnit/2.1-3                        cryptopp/7.0.0                                glibc/2.23                     (D)    libtiff/5.3.0                         netcdf/openmpi_ib/gnu/4.3.2     (D)    python/modules/theano/0.8.0
   Imlib2/1.4.4                       docbook/4.1.2                                 glpk/4.63                             libxcb/0.4.0                          netcdf/openmpi_ib/intel/3.6.2          python/modules/tinyarray/1.0.5
   Qt5/5.6.3                          eigen/3.3.4                                   gtk-doc/1.9                           libxcomposite/0.4.4                   netcdf/openmpi_ib/intel/4.3.2   (D)    readline/7.0-alt
   Qt5/5.11.3                  (D)    expat/2.2.6                                   gts/0.7.6                             libxdmcp/1.1.2                        pixman/0.34.0                          readline/7.0                   (D)
   arpack                             fftw/3.3.6/gnu/7.1.0                          hdf/4/gnu/2.10                        libxfont/1.5.4                        python/modules/argparse/1.4.0          sgao
   atlas/gnu/3.10.2                   fftw/mvapich2_ib/gnu4.9.2/2.1.5               hdf/4/intel/2.10                      libxkbfile/1.0.9                      python/modules/decorator/4.0.11        snappy/1.1.7
   atlas/intel/3.10.2                 fftw/mvapich2_ib/gnu4.9.2/3.3.4        (D)    hdf/4/pgi/2.10                        libxpm/3.5.12                         python/modules/hostlist/1.15           vtk/6.1
   autoconf-archive/2018.03.13        fftw/mvapich2_ib/intel2015.2.164/2.1.5        hdf/5/mvapich2_ib/gnu/1.8.14          libxtst/1.2.2                         python/modules/kwant/1.0.5             vtk/8.1.1                      (D)
   boost/gnu/gmon.out                 fftw/mvapich2_ib/intel2015.2.164/3.3.4 (D)    hdf/5/mvapich2_ib/intel/1.8.14        loki/gmon.out                         python/modules/lasagne/1               xbitmaps/1.1.2
   boost/gnu/1.51.0                   fftw/mvapich2_ib/pgi11.4/2.1.5                hdf/5/mvapich2_ib/pgi/1.8.14          loki/0.1.7                     (D)    python/modules/matplotlib/1.4.2        xcb/lib/1.13
   boost/gnu/1.53.0                   fftw/mvapich2_ib/pgi11.4/3.3.4         (D)    hdf/5/openmpi_ib/gnu/1.8.14           mkl/2011_sp1.11.339                   python/modules/mpi4py/2.0.1            xcb/proto/1.13
   boost/gnu/1.55.0            (D)    fftw/openmpi_ib/gnu4.9.2/2.1.5                hdf/5/openmpi_ib/intel/1.8.14         mkl/2013_sp1.3.174                    python/modules/mpmath/0.19             xcb/util/0.4.0
   boost/gnu/1.62.0                   fftw/openmpi_ib/gnu4.9.2/3.3.4         (D)    hdf/5/openmpi_ib/pgi/1.8.14           mkl/2015.2.164                 (D)    python/modules/networkx/1.11           xcb/util/image/0.4.0
   boost/gnu/1.65.1-python2           fftw/openmpi_ib/intel2015.2.164/2.1.5         lapack/3.8.0                          mkl/2018.3                            python/modules/nolearn/0.6             xcb/util/keysyms/0.4.0
   boost/gnu/1.65.1                   fftw/openmpi_ib/intel2015.2.164/3.3.4  (D)    libdrm/2.4.94                         ncurses/6.1                           python/modules/numpy/1.10.0            xcb/util/renderutil/0.3.9
   boost/gnu/1.69.0                   fftw/openmpi_ib/pgi11.4/2.1.5                 libepoxy/1.5.2                        netcdf/mvapich2_ib/gnu/3.6.2          python/modules/obspy/1.0.4             xcb/util/wm/0.4.1
   boost/intel/1.55.0          (D)    fftw/openmpi_ib/pgi11.4/3.3.4          (D)    libfontenc/1.1.3                      netcdf/mvapich2_ib/gnu/4.3.2   (D)    python/modules/opencv/3.1.0            xorg-macros/1.19.1
   boost/intel/1.62.0                 freeglut/3.0.0                                libgbm/2.1                            netcdf/mvapich2_ib/intel/3.6.2        python/modules/pylab/0.1.4             xorgproto/2018.4
   cgns/openmpi_ib/gnu         (D)    glibc/2.14                                    libqglviewer/2.7.0                    netcdf/mvapich2_ib/intel/4.3.2 (D)    python/modules/scipy/0.16.0            xtrans/1.3.5
   cgns/openmpi_ib/intel              glibc/2.18                                    libqglviewer/2.7.1             (D)    netcdf/openmpi_ib/gnu/3.6.2           python/modules/sympy/1.0

  Where:
   D:  Default Module

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

Compiling Code

There are several compilers available through modules; to see a full list of modules run

module avail 

The naming scheme for the compiler modules is as follows.

MPI_PROTOCOL/COMPILER/COMPILER_VERSION/INTERFACE, e.g. openmpi/intel/15/ib is the 2015 Intel compiler built with OpenMPI libraries and set to communicate over the high speed InfiniBand interface.

After you have decided which compiler you want to use you need to load it.

 module load openmpi/intel/15/ib 

Then compile your code; use mpicc for C code and mpif90 for Fortran code. Here is an MPI hello world C program.

helloworld.c
/* C Example */
#include <stdio.h>
#include <mpi.h>
 
 
int main (int argc, char *argv[])
{
  int rank, size;
 
  MPI_Init (&argc, &argv);	/* starts MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);	/* get current process id */
  MPI_Comm_size (MPI_COMM_WORLD, &size);	/* get number of processes */
  printf( "Hello world from process %d of %d\n", rank, size );
  MPI_Finalize();
  return 0;
}

Use mpicc to compile it.

 mpicc ./helloworld.c 

Now you should see an a.out executable in your current working directory; this is your MPI-compiled code that we will run when we submit it as a job.

Submitting an MPI job

You need to be sure that you have the same modules loaded in your job environment as you did when you compiled the code, to ensure that the compiled executables will run correctly. You may either load them before submitting a job and use the directive

 #SBATCH --export=all 

in your submission script, or load the module prior to running your executable in your submission script. Please see the sample submission script below for an MPI job.

helloworld.sub
#!/bin/bash
#SBATCH -J MPI_HELLO
#SBATCH --ntasks=8
#SBATCH --export=all
#SBATCH --out=Forge-%j.out
#SBATCH --time=0-00:10:00
#SBATCH --mail-type=begin,end,fail,requeue
 
module load openmpi/intel/15/ib
mpirun ./a.out

Now we need to submit that file to the scheduler to be put into the queue.

 sbatch helloworld.sub 

You should see the scheduler report back which job number your job was assigned, just as before, and you should shortly see an output file in the directory you submitted your job from.

Interactive jobs

Some things can't be run with a batch script because they require user input, or you may need to compile some large code and are worried about bogging down the login node. To start an interactive job, simply use the

sinteractive

command and your terminal will now be running on one of the compute nodes. The hostname command can help you confirm you are no longer running on a login node. Now you may run your executable by hand without worrying about impacting other users. The sinteractive script will by default allot 1 CPU for 1 hour; you may request more by using SBATCH directives, e.g.

sinteractive --time=02:00:00 --cpus-per-task=2

will start a job with 2 CPUs on one node for 2 hours.

If you will need a GUI window for whatever you are running inside the interactive job, you will need to connect to The Forge with X forwarding enabled. For Linux this is simply adding the -X switch to the ssh command.

 ssh forge.mst.edu -X 

For Windows there are a couple of X server packages available, Xming and X-Win32, that can be configured with PuTTY. Here is a simple guide for configuring PuTTY to use Xming.

Job Arrays

If you have a large number of jobs you need to start, I recommend becoming familiar with job arrays; they allow you to submit one job file to start up to 10000 jobs at once.

One of the ways you can vary the input of a job array from task to task is to set a variable based on the array task id of the job and then use that value to read the matching line of a file. For instance, the following line, when put into a script, will set the variable PARAMETERS to the matching line of the file data.dat in the submission directory.

PARAMETERS=$(awk -v line=${SLURM_ARRAY_TASK_ID} '{if (NR == line) { print $0; };}' ./data.dat)

You can then use this variable in your execution line to do whatever you would like; you just have to have the appropriate data in the data.dat file on the appropriate lines for the array you are submitting. See the sample data.dat file below.

data.dat
"I am line number 1"
"I am line number 2"
"I am line number 3"
"I am line number 4"

You can then submit your job as an array by using the --array directive, either in the job file or as an argument at submission time; see the example below.

array_test.sub
#!/bin/bash
#SBATCH -J Array_test
#SBATCH --ntasks=1
#SBATCH --out=Forge-%j.out
#SBATCH --time=0-00:10:00
#SBATCH --mail-type=begin,end,fail,requeue
 
 
PARAMETERS=$(awk -v line=${SLURM_ARRAY_TASK_ID} '{if (NR == line) { print $0; };}' ./data.dat)
 
echo $PARAMETERS

I prefer to use the array as an argument at submission time so I don't have to touch my submission file again, just the data.dat file that it reads from.

sbatch --array=1-2,4 array_test.sub

This will execute lines 1, 2, and 4 of data.dat, each of which echoes out what line number it is from my data.dat file.

You may also add this as a directive in your submission file and submit without any switches as normal. Adding the following line to the header of the submission file above will accomplish the same thing as supplying the array values at submission time.

#SBATCH --array=1-2,4 

Then you may submit it as normal

 sbatch array_test.sub 

Converting From Torque

With the new cluster we have changed resource managers from PBS/Maui to Slurm. We have provided a script to convert your submission files from PBS directives to Slurm directives. This script, pbs2slurm.py, should be in your executable path and can be used as follows.

pbs2slurm.py < batch.qsub > batch.sbatch 

Where batch.qsub is the name of your old submission script file and batch.sbatch is the name of your new submission script file. Please see the table below for commonly used command conversions.

PBS/Maui Command   Slurm Equivalent     Purpose
qsub               sbatch               Submit a job for execution
qdel               scancel              Cancel a job
showq              squeue               Check the status of jobs
qstat              squeue               Check the status of jobs
checkjob           scontrol show job    Check the detailed information of a specific job
checknode          scontrol show node   Check the detailed information of a specific node
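
For example, using the job number from the squeue output earlier (substitute your own job number or a real node name):

# detailed information about a specific job
scontrol show job 719
# detailed information about a specific node
scontrol show node nodename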

Checking your account usage

If you have purchased a number of CPU hours from us you may check on how many hours you have used by issuing the

usereport

command from a login node; this will show your account's CPU hour limit and the total amount used.

Note this is usage for your account, not your user.

Applications

The applications portion of this wiki is currently a work in progress; not all applications are documented here yet, but they will be eventually.

Abaqus

  • Default Version = 6.12-3
  • Other versions available: 2019, 2018, 2016, 6.14-1, 6.11-2
module load abaqus/2019
module load abaqus/2018
module load abaqus/2016
module load abaqus/6.14-1
module load abaqus/6.12-3

This example is for 6.12

The SBATCH commands are explained in the Forge Documentation.

You will need a *.inp file from where you built the model on another system. This .inp file needs to be in the same folder as the jobfile you will create.

Example jobfile:

abaqus.sub
#!/bin/bash
#SBATCH --job-name=sbatchfilename.sbatch
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --mem=4000
#SBATCH --time=00:60:00
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=joeminer@mst.edu
 
input=inputfile.inp
 
/share/apps/ABAQUS/Commands/abq6123 job=$input analysis cpus=2 interactive
  • Replace the sbatchfilename.sbatch with the name of your jobfile.
  • Replace the email address with your own.
  • Replace the inputfile.inp with the *.inp file that you want to analyze.
  • Keep ntasks and cpus set to the same number of processors.

This job will run on 1 node with 2 processors and allocate 4GB of memory to the job. These node/processor values can change, but be aware that Abaqus licensing is limited and the job may queue for hours or days before licenses are available, depending on the quantities you select.

Simple breakdown of Abaqus Licensing:

Parallel License   Number of Processors used   Tokens Allocated
1                  4                           20
2                  8                           40
3                  16                          60
4                  32                          80
5                  64                          100

Abaqus 2016

module load abaqus/2016
abaqus2016.sbatch
#!/bin/bash
#SBATCH --job-name=sbatchfilename.sbatch
#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH --mem=4000
#SBATCH --partition=requeue
#SBATCH --time=10:00:00
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=joeminer@mst.edu
 
unset SLURM_GTIDS
 
input=inputfile.inp
time abq2016hf3 job=$input analysis standard_parallel=all cpus=20 interactive
 unset SLURM_GTIDS 

The SLURM_GTIDS variable must be unset for Abaqus 2016 to run with MPI.

Chain loading jobs for Abaqus 2016.

abaqus2016chain.sbatch
#!/bin/bash
#SBATCH --job-name=sbatchfilename.sbatch
#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH --mem=4000
#SBATCH --partition=requeue
#SBATCH --time=10:00:00
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=joeminer@mst.edu
 
unset SLURM_GTIDS
 
input=inputfile.inp
oldjob=oldjobfile.inp
 
abq2016hf3 job=$input oldjob=$oldjob cpus=20 double interactive

Ansys

  • Default Version = 15.0
  • Other versions available: 14.0, 17.0, 18.0, 18.1, 18.2, 19.0, 19.2

Running the Workbench

Be sure you are connected to the Forge with X forwarding enabled, and running inside an interactive job using command

  sinteractive

before you attempt to launch the workbench. Running sinteractive without any switches will give you 1 CPU for 1 hour; if you need more time or resources you may request them. See Interactive Jobs for more information.
Once inside an interactive job you need to load the ansys module.

  module load ansys

Now you may run the workbench.

  runwb2


Job Submission Information


Fluent is the primary tool in the Ansys suite of software used on the Forge.
Most of the fluent simulation creation process is done on your Windows or Linux workstation.
The 'Solving' portion of a simulation is where the Forge is utilized.
Fluent will output a lengthy results file based on the simulation being run; that file is then used on your Windows or Linux workstation to do the final review and analysis of your simulation.

The basic steps


1. Create your geometry
2. Setup your mesh
3. Setup your solving method
4. Use the .cas and .dat files, generated from the first three steps, to construct your jobfile
5. Copy those files to the Forge, to your home folder
6. Create your jobfile using the slurm tools on the Forge Documentation page
7. Load the Ansys module
8. Submit your newly created jobfile with sbatch

Serial Example.

I used the Turbulent Flow example from Cornell's SimCafe.
On the Forge, I have the following directory structure for this example. Please create whatever structure makes sense to you.

TurbulentFlow/
|-- flntgz-48243.cas
|-- flntgz-48243.dat
|-- output.dat
|-- slurm-8731.out
|-- TurbulentFlow_command.txt
|-- TurbulentFlow.sbatch

The .cas file is the CASE file that contains the parameters defined by you when creating the model.
The .dat file is the data result file used when running the simulation.
The .txt file is the command equivalent of your model, in a form that the Forge understands.
The .sbatch file is the Slurm job file that you will use to submit your model for analysis.
The .out file is the output from the run.
The output .dat file is the binary (Ansys specific) file created during the solution, which can be imported into Ansys back on the Windows/Linux workstation for further analysis.

Jobfile Example.

TurbulentFlow.sbatch
#!/bin/bash
#SBATCH --job-name=TurbulentFlow.sbatch
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=rlhaffer@mst.edu
#SBATCH -o Forge-%j.out
 
fluent 2ddp -g < /home/rlhaffer/unittests/ANSYS/TurbulentFlow/TurbulentFlow_command.txt

The SBATCH commands are explained in the Forge Documentation.

The job-name is a name given to help you determine which job is which.
No partition is specified, so the scheduler will route the job automatically; for student jobs this is typically the requeue partition.
It will use 1 node: --nodes=1.
It will use 1 processor on that node: --ntasks=1.
It has a wall clock time of 1 hour: --time=01:00:00.
It will email the user when it begins, ends, or if it fails: --mail-type and --mail-user.
fluent is the command we are going to run.
2ddp is the mode we want fluent to use.

Modes The [mode] option must be supplied and is one of the following:
* 2d runs the two-dimensional, single-precision solver
* 3d runs the three-dimensional, single-precision solver
* 2ddp runs the two-dimensional, double-precision solver
* 3ddp runs the three-dimensional, double-precision solver

-g turns off the GUI.
The < /home/rlhaffer/unittests/ANSYS/TurbulentFlow/TurbulentFlow_command.txt portion redirects the command file into fluent.

Contents of the command file:
This file can get long, as it contains the .cas file and .dat file information as well as the saving frequency and iteration count.
NOTE: this is all on one line when creating the command file.

/file/rcd /home/rlhaffer/unittests/ANSYS/TurbulentFlow/flntgz-48243.cas /file/autosave/data-frequency 20000 /solve/iterate 150000 /file/wd /home/rlhaffer/unittests/ANSYS/TurbulentFlow/output.dat /exit

When the simulation is finished, you will have a Forge-#####.out file that looks something like this:

/share/apps/ansys_inc/v150/fluent/fluent15.0.7/bin/fluent -r15.0.7 2ddp -g
/share/apps/ansys_inc/v150/fluent/fluent15.0.7/cortex/lnamd64/cortex.15.0.7 -f fluent -g (fluent "2ddp  -alnamd64 -r15.0.7 -path/share/apps/ansys_inc/v150/fluent")
Loading "/share/apps/ansys_inc/v150/fluent/fluent15.0.7/lib/fluent.dmp.114-64"
Done.
/share/apps/ansys_inc/v150/fluent/fluent15.0.7/bin/fluent -r15.0.7 2ddp -alnamd64 -path/share/apps/ansys_inc/v150/fluent -cx edrcompute-43-17.local:56955:53521
Starting /share/apps/ansys_inc/v150/fluent/fluent15.0.7/lnamd64/2ddp/fluent.15.0.7 -cx edrcompute-43-17.local:56955:53521

     Welcome to ANSYS Fluent 15.0.7

     Copyright 2014 ANSYS, Inc.. All Rights Reserved.
     Unauthorized use, distribution or duplication is prohibited.
     This product is subject to U.S. laws governing export and re-export.
     For full Legal Notice, see documentation.

Build Time: Apr 29 2014 13:56:31 EDT  Build Id: 10581
 
Loading "/share/apps/ansys_inc/v150/fluent/fluent15.0.7/lib/flprim.dmp.1119-64"
Done.

     --------------------------------------------------------------
     This is an academic version of ANSYS FLUENT. Usage of this product
     license is limited to the terms and conditions specified in your ANSYS
     license form, additional terms section.
     --------------------------------------------------------------


Cleanup script file is /home/rlhaffer/unittests/ANSYS/TurbulentFlow/cleanup-fluent-edrcompute-43-17.local-17945.sh

> 
Reading "/home/rlhaffer/unittests/ANSYS/TurbulentFlow/flntgz-48243.cas"...
    3000 quadrilateral cells, zone  2, binary.
    5870 2D interior faces, zone  1, binary.
      30 2D velocity-inlet faces, zone  5, binary.
      30 2D pressure-outlet faces, zone  6, binary.
     100 2D wall faces, zone  7, binary.
     100 2D axis faces, zone  8, binary.
    3131 nodes, binary.
    3131 node flags, binary.

Building...
     mesh
     materials,
     interface,
     domains,
	mixture
     zones,
	pipewall
	outlet
	inlet
	interior-surface_body
	centerline
	surface_body
Done.

Reading "/home/rlhaffer/unittests/ANSYS/TurbulentFlow/flntgz-48243.dat"...
Done.
  iter  continuity  x-velocity  y-velocity           k     epsilon     time/iter
!  389 solution is converged
   389  9.7717e-07  1.0711e-07  2.9115e-10  5.2917e-08  3.4788e-07  0:00:00 150000
!  390 solution is converged
   390  9.5016e-07  1.0389e-07  2.8273e-10  5.1020e-08  3.3551e-07  1:11:14 149999

Writing "/home/rlhaffer/unittests/ANSYS/TurbulentFlow/output.dat"...
Done.

Parallel Example

To use Fluent in parallel you need to set the PBS_NODEFILE environment variable inside your job. Please see the example submission file below.

TurbulentFlow.sbatch
#!/bin/bash
 
#SBATCH --job-name=TurbulentFlow.sbatch
#SBATCH --ntasks=32
#SBATCH --time=01:00:00
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=rlhaffer@mst.edu
#SBATCH -o Forge-%j.out
 
#generate a node file
export PBS_NODEFILE=`generate_nodefile`
#run fluent in parallel.
fluent 2ddp -g -t32 -pinfiniband -cnf=$PBS_NODEFILE -ssh < /home/rlhaffer/unittests/ANSYS/TurbulentFlow/TurbulentFlow_command.txt

Interactive Fluent

If you would like to run the full GUI you may do so inside an interactive job; make sure you've connected to The Forge with X forwarding enabled. Start the job with:

sinteractive

This will give you 1 processor for 1 hour; to request more processors or more time please see the documentation at Interactive Jobs.

Once inside the interactive job you will need to load the ansys module.

module load ansys

Then you may start fluent from the command line.

fluent 2ddp 

will start the 2d, double precision version of fluent. If you've requested more than one processor you need to first run

export PBS_NODEFILE=`generate_nodefile`

Then you need to add some switches to fluent to get it to use those processors.

fluent 2ddp -t## -pethernet -cnf=$PBS_NODEFILE -ssh

You need to replace the ## with the number of processors requested.
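
For example, if you started the interactive job with two processors (sinteractive --cpus-per-task=2), the command might look like this:

fluent 2ddp -t2 -pethernet -cnf=$PBS_NODEFILE -ssh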

Ansys Mechanical

Tested for version 18.1

Simple tutorial

round.sbatch
#!/bin/bash
#SBATCH --job-name=round
#SBATCH --ntasks=10
#SBATCH --mem=20000
#SBATCH -p requeue
#SBATCH --time=00:60:00
##SBATCH --mail-type=begin
##SBATCH --mail-type=end
#SBATCH --export=all
#SBATCH --out=Forge-%j.out
 
module load ansys/18.1
 
time ansys181 -j round -b -dis -np 10 < /home/rlhaffer/unittests/ANSYS/roundthing/round.log

Ansys Mechanical log file

round.inp
/INPUT,'round','inp',,0,0
/SOLU
SOLVE
FINISH

AnsysEM

  • Default Version = 17.0
  • Other installed version: 19.0
  • Also Called Ansys Electronics Desktop
ansysem.sub
#!/bin/bash
#SBATCH --job-name=jobfile.sbatch
#SBATCH --partition=requeue
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --time=01:00:00
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=joeminer@mst.edu
 
module load AnsysEM/17
 
time ansysedt -ng -batchsolve Model_Cluster_Test_v2.aedt

CMG

Computer Modeling Group - This is a WORKING SECTION - Edits are actively taking place.

Default version = 2015

Constructing the jobfile

From the Windows installation the .dat model will be needed.

Please follow the general Forge documentation for submission techniques.

After logging into The Forge, run:

module load cmg

This loads the environment variables needed for CMG to function.

Copy your .dat file from your Windows system to your cluster home folder. This is explained in the Forge documentation.

Example sbatch jobfile

cmg.sub
#!/bin/bash
#SBATCH --job-name=batchfilename.sbatch
#SBATCH --nodes=1
#SBATCH --ntasks=7
#SBATCH --mem=17000
#SBATCH --time=10:00:00
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=joeminer@mst.edu
#SBATCH --export=ALL
 
RunSim.sh gem 2015.10 GasModel4P.dat -parasol 7
  • RunSim.sh: the CMG simulation script that allows the simulation to run on multiple processors
  • gem: the solver specified
  • 2015.10: the version of the software used
  • GasModel4P.dat: the model to be analyzed
  • -parasol 7: the command switch telling CMG how many processors to use. In this example, 7 processors were allocated.

CMG licensing

When running the above example, we discovered how CMG’s licensing really works.
We have varying quantities of licenses for different features and each seat of those licenses equates to 20 solver tokens that will be allocated to jobs on the cluster.

For this example..

20 tokens of the solve-stars solver are being used, because we have an instance of the solver open.
35 tokens of the solve-parallel solver are being used, because we have asked for 7 processors.

We currently have 8 seats for Stars and 8 seats for Parallel. This equates to 160 tokens of each. Out of that 160 tokens, your job can take at least 20 tokens from the stars solver, and up to 160 tokens of the parallel solver.

160 tokens of the parallel solver actually equates to 32 processors when building the job file.

So, the max processors requested is 32.

Token Scenario example

If a student submits a parallel job asking for 32 processors (all 160 tokens) and all 160 are free, the job will begin running.

If a second student then submits a job asking for 10 processors (50 tokens) but all 160 are currently in use, the job will fail. In the output file of the failed job, it will indicate that the job failed because it did not have enough licenses to run.

If this happens, the best suggestion is to modify your jobfile with a smaller number of processors. This may take a few iterations until you find a number of processors that will work.
There is no efficient way to know how many licenses are in use.

Licensing for CMG is limited; some features have 2160 tokens, some only 100.
Please be patient when submitting jobs, as they could fail in this no-license manner.

The jobs tested only indicate the solve-star and solve-parallel are being used. We have not seen a test case using these other features:

gem_forgas
gem_gap
imex_forgas
imex_gap
imex_iam
stars_forgas
stars_gap
winprop
builder
cmost_studio
dynagrid
solve_university
rlm_roam

Comsol

Licensing

The licensing scheme for Comsol is seat based, which means that if you only have 1 license and you have it installed on both your workstation and the Forge, you can only use one or the other at any point in time. If your workstation has the license checked out when your Forge job starts, your job will fail to get a license and will fail to run.

This problem is compounded when you are working with other users in a group sharing a handful of seats. You will need to coordinate with them on when you and they will run so that you don't run into license problems.

Running Batch

Running Comsol with the Forge as your solver is fairly straightforward. First load your module (module names for Comsol differ by which license pool you are using):

  module load comsol/5.4_$pool
  

then create your batch file for submission.

comsol.sbatch
#!/bin/bash
#SBATCH --job-name=comsol_test
#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH --mem=20000
#SBATCH --time=60:00:00
#SBATCH --export=ALL
 
comsol -np 20 batch -inputfile multilayer.mph -outputfile laminate_out.mph

Please note that Comsol is a memory intensive application; you will likely have to adjust the values for mem and ntasks to suit what your simulation will need.

We also advise creating the input file on a Windows workstation and using the Forge for simulation solving; however, running interactively should be possible inside an interactive job with X forwarding enabled.

CST

  • (Computer Simulation Technology)
  • Default version = 2014
  • Other versions available = 2015, 2016, 2017, 2018

Job Submission information

CST is an electromagnetic simulation solver, primarily used by the EMCLab.

Steps.

  1. Build your simulation on one of the Windows workstations available to you.
  2. Copy your .cst file to your home directory on the Forge.
  3. Write your job submission file.
  4. Either use the default CST version of 2014 or use 2015 by loading its module file.
 module load CST/2015 

Example Job File

CST.sub
#!/bin/bash
#SBATCH --job-name=CST_Test
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --mem=4000
#SBATCH --partition=cuda
#SBATCH --gres=gpu:2
#SBATCH --time=08:00:00
#SBATCH --mail-type=FAIL
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-user=joeminer@mst.edu
 
cst_design_environment /home/userID/path/to/.cst/file/example.cst --r --withgpu=2

This job will use 1 node, 2 processors from that node, 4GB of memory, the cuda queue, 2 GPUs, and 8 hours of wall clock time, and it will email the user when the job starts, ends, or fails.

This will output the results in the standard CST folder structure:

/path/to/.cst/file/example
|-- DC
|-- Model
|-- ModelCache
|-- Model.lok
|-- Result
|-- SP
`-- Temp

Cuda

If you would like to use CUDA programs or GPU accelerated code you will need to use our GPU nodes. Access to our GPU nodes is granted on a case by case basis; please request access through the help desk or submit a ticket at http://help.mst.edu.

Our login nodes don't have the CUDA toolkit installed, so to compile your code you will need to start an interactive job on one of the GPU nodes.

sinteractive -p cuda --time=01:00:00 --gres=gpu:1

This interactive session will start on a cuda node and give you access to one of the GPUs on the node; once started you may compile your code and do whatever testing you need to do inside this interactive session.

To submit a job for batch processing please see this example submission file below.

cuda.sub
#!/bin/bash
#SBATCH -J Cuda_Job
#SBATCH -p cuda
#SBATCH -o Forge-%j.out
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
 
./a.out

This file requests 1 CPU and 1 GPU on 1 node for 1 hour; to request more CPUs or GPUs you will need to modify the values for ntasks and gres=gpu. It is recommended that you have at least 1 CPU for each GPU you intend to use. We currently have only 2 GPUs available per node; once we incorporate the remainder of the GPU nodes we will have 7 GPUs available in one chassis.

Espresso

All of the available pseudopotential (.UPF) files, as of 12/14/2015, are copied into the pseudo install folder.

Before running an Espresso job, run:

 module load espresso 

Example:

The sbatch file is the job file you will run on the Forge HPC.
The .pw.in file is the input file you create, which you will reference in the job file.
In the job file, change the path to your input file.
In the .pw.in file, change the outdir and wfcdir paths to a location in your Forge home directory and change the pseudo_dir path as in the example below.

FeGeom2BU.sbatch
#!/bin/bash
#SBATCH --job-name=FeGeomBU
#SBATCH --ntasks=20
#SBATCH --time=0-04:00:00
#SBATCH --mail-type=begin,end,fail,requeue
#SBATCH --out=forge-%j.out
 
mpirun -np 20 pw.x -i /path/to/your/inputfile/FeGeom2BU.pw.in
FeGeom2BU.pw.in
&CONTROL
                       title = 'Na3FePO4CO3 geom+U' ,
                 calculation = 'vc-relax' ,
                      outdir = '/path/to/your/inputfile' ,
                      wfcdir = '/path/to/your/inputfile' ,
                  pseudo_dir = '/share/apps/espresso/espresso-5.2.1/pseudo' ,
                      prefix = 'Na3FePO4CO3Geom' ,
                   verbosity = 'high' ,
 /
 &SYSTEM
                       ibrav = 12,
                           A = 8.997 ,
                           B = 5.163 ,
                           C = 6.741 ,
                       cosAB = -0.0027925231 ,
                         nat = 11,
                        ntyp = 5,
                     ecutwfc = 40 ,
                     ecutrho = 400 ,
                  lda_plus_u = .true. ,
             lda_plus_u_kind = 0 ,
                Hubbard_U(1) = 6.39,
                 space_group = 11 ,
                     uniqueb = .false. ,
 /
 &ELECTRONS
                    conv_thr = 1d-6 ,
                 startingpot = 'file' ,
                 startingwfc = 'atomic' ,
 /
 &IONS
                ion_dynamics = 'bfgs' ,
 /
 &CELL
               cell_dynamics = 'bfgs' ,
                 cell_dofree = 'all' ,
 /
ATOMIC_SPECIES
   Fe   55.93300  Fe.pbe-sp-van_mit.UPF 
    P   30.97400  P.pbe-van_ak.UPF 
    C   12.01100  C.pbe-van_ak.UPF 
    O   15.99900  O.pbe-van_ak.UPF 
   Na   22.99000  Na.pbe-sp-van_ak.UPF 
ATOMIC_POSITIONS crystal_sg 
   Fe      0.362400000    0.279900000    0.250000000    
    P      0.587500000    0.798900000    0.250000000    
   Na      0.740900000    0.251300000    0.002200000    
   Na      0.917800000    0.738500000    0.250000000    
    C      0.060500000    0.234500000    0.250000000    
    O      0.123300000    0.461600000    0.250000000    
    O      0.143000000    0.031700000    0.250000000    
    O      0.919500000    0.215700000    0.250000000    
    O      0.321000000    0.281700000    0.565500000    
    O      0.431600000    0.674000000    0.250000000    
    O      0.569400000    0.096500000    0.250000000    
K_POINTS automatic 
  6 12 10   1 1 1 
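Once your paths are updated, load the module and submit the job file from the directory containing your input file:

 module load espresso
 sbatch FeGeom2BU.sbatch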

Lammps

To use lammps you will need to load the lammps module

module load lammps

and use a submission file similar to the following

lammps.sub
#!/bin/bash
#SBATCH -J Zn-ZnO-rapid
 
#SBATCH --ntasks=4
 
#SBATCH --mail-type=FAIL
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --time=02:00:00
 
#SBATCH --export=ALL  
 
module load lammps/23Oct2017
 
mpirun lmp_forge < in.heating
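Submit it from the directory containing your LAMMPS input file (in.heating in this example):

 sbatch lammps.sub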

Other options may be configured; please refer to the lammps documentation.

Version 23Oct2017 has support for the following packages:

asphere body class2 colloid compress coreshell dipole granular kspace manybody mc meam misc molecule mpiio opt peri python qeq replica rigid shock snap srd user-cgdna user-cgsdk user-diffraction user-dpd user-drude user-eff user-fep user-lb user-manifold user-meamc user-meso user-mgpt user-misc user-molfile user-netcdf user-omp user-phonon user-qtb user-reaxc user-smtbq user-sph user-tally user-uef

Maple

Default version = 2015. Other installed versions: 2016, 2017.

Example:

This is a simple polynomial and the goal is to find the roots of the polynomial.

Open Maple on your Windows or Linux workstation and start a new project:

findroots.mpl
#Solve a simple polynomial equation
 
f(x):=(x^9-x^8+x^7-x^6+x^5-x^4+x^3-x^2+x^4+9*x^3-2.4^2+.5*x)/(x-1);
myroots:=solve(f(x)=148596);
 
#Save the results into a Maple input file for later post-processing.
save myroots, "myroots.mpl";

When this runs, it will save the output to the myroots.mpl file.


Now, you need to SSH into the Forge with your MST userID/password.

Once connected, you need to copy the findroots.mpl file from your local system to your home directory on the cluster.

You can build your folder structure on the Forge how you wish, just use something that is meaningful.

Once copied, you can return to your terminal on Forge and create your jobfile to run your findroots.mpl project.

Open vi to create your job file.

Since this example is simple, it is assigned 1 node, with 2 processors and 15 minutes wall time.
The job will email when it begins, ends or fails.

First:

 module load maple/2015

Then write your jobfile:

maple.sub
#!/bin/bash
##Example Maple script
 
#SBATCH --job-name=maple_example
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --mem=2000
#SBATCH --time=00:15:00
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=rlhaffer@mst.edu
 
maple -q findroots.mpl

Then run:

 sbatch maple.sub


When the job finishes, you will have a slurm-jobID.out file, which contains the output of the finished job.

Matlab

IMPORTANT NOTE: Currently campus has 100 Matlab seat licenses to be shared between the Forge and research desktops. There are certain times of the year when Matlab usage is quite high. License checkout is on a first come, first served basis. If you are not able to get a Matlab license, you might consider using GNU Octave, which is available on the Forge and will do much of what Matlab will do.

Matlab is available to run in batch form or interactively on the cluster.

  • Default version = 2014a
  • Other installed versions: 2012, 2013a, 2016a, 2017a, 2017b, 2018a, 2018b

Interactive Matlab

The simplest way to get up and running with matlab on the cluster is to simply run

matlab

from the login node. This will start an interactive job on the backend nodes, load the 2014a module, and open matlab. If you have connected with X forwarding you will get the full matlab GUI to use however you would like. This method, however, limits you to 1 core for a maximum of 4 hours on one of our compute nodes. To use more than 1 core, or to run for longer than 4 hours, you will need to submit your job as a batch submission and use Matlab's Distributed Computing Engine.

Batch Submit

If you want to use batch submissions for Matlab you will need to create a submission script similar to the ones above in quick start, but you will want to limit your job to 1 node; please see the sample submission script below.

matlab.sub
#!/bin/bash
 
#SBATCH --nodes=1
#SBATCH --ntasks=12
#SBATCH -J Matlab_job
#SBATCH -o Forge-%j.out
#SBATCH --time=01:00:00
 
module load matlab
matlab < helloworld.m

This submission asks for 12 processors on 1 node for an hour; the maximum per node we currently have is 64. If you load any version older than 2014a, Matlab cannot use more than 12 processors, as that is the limit Mathworks placed on a single machine; Mathworks raised this limit to 512 processors in 2014a.
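If you need a version other than the default, load it explicitly in your submission file in place of the bare module load matlab line (check module avail matlab for the exact module names; the one below is only illustrative):

 module load matlab/2018a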

Distributed Computing Engine

Mathworks has developed a parallel computing product that the campus has purchased licenses for, and we have developed a parallel cluster profile that must be loaded on a user-by-user basis. To begin using the distributed computing profile you must open an interactive matlab session with X forwarding so you get the matlab GUI.

[blspcy@login-44-0 ~]$ matlab 

Once the GUI is open you need to open the parallel computing options menu, labeled simply Parallel.

Now open the Parallel profile manager and import the Forge.settings profile from the root directory of the version of matlab you are using, e.g. for 2014a /share/apps/matlab/matlab-2014/

You will now have a Forge_Import profile in your profile list; select this profile and set it as default.

If you would like you may validate the imported profile and see that it is indeed working.

The validation will run a series of five tests, which it should pass. If for any reason the profile doesn't pass a test, please contact the help desk or create a ticket at http://help.mst.edu

To modify how many workers the pool will try to use when you call

 matlabpool open 

in your matlab script, you may edit the profile and set the NumWorkersRange value to the number you wish to use, or simply specify the number when you open the pool:

matlabpool open 24

will open a pool of 24 workers.

You may now use matlabpool open in your code and it will use the Forge cluster profile configured above. These steps must be followed for every version of Matlab you wish to use before calling the pool. The parallel profiles are only built for versions newer than 2014a.

To make use of this newfound power you must add

 matlabpool open 

calls in your matlab script, then run it using either the interactive method or the batch submit method listed above. Make sure to close the pool when you are finished with your work with

 matlabpool close 

Keep in mind that the University only has 160 of these licenses and your job may fail if someone else is using the licenses when your job runs.

METIS

Under Construction

METIS and parMETIS are installed as modules. To use the software, please load the appropriate module with either

module load metis

or

module load parmetis

Once loaded you may submit jobs using the METIS or parMETIS binaries as described in the METIS user manual or the parMETIS user manual.
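As a rough sketch (this section is still under construction; the graph file name mygraph.graph below is only a placeholder), a serial job that partitions a graph with the standard gpmetis program could look like this:

metis.sub
#!/bin/bash
 
#SBATCH -J METIS_test
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH -o Forge-%j.out
 
module load metis
# partition the placeholder graph file into 8 parts
gpmetis mygraph.graph 8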

MSC

Nastran

MSC nastran 2014.1 is available through the module msc/nastran/2014

 module load msc/nastran/2014 

Here is an example job file.

msc.sub
#!/bin/bash
 
#SBATCH -J Nastran
#SBATCH -o Forge-%j.out
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
 
nast20141 test.bdf
#you should also be able to use .dat files with nastran this way.

Submit it as normal.

NAMD

This article is under development. Things referenced below are written as we make progress and some instructions may not work as intended.

Below is an example file for namd. Versions prior to 2.12b have not been tested as of yet and may not function in the same way as 2.12b.

namd.sub
#!/bin/bash
#SBATCH -J NAMD_EXAMPLE
#SBATCH --ntasks=16
#SBATCH --time=01:00:00
#SBATCH -o Forge-%J.out
 
module load namd/2.12b
 
charmrun +p16 namd2 [options]

You will need to give namd2 whatever options you would like to run; user documentation for what those may be is available from the developers at http://www.ks.uiuc.edu/Research/namd/2.12b1/ug/.
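For example, a typical run simply passes your NAMD configuration file to namd2 (the file name below is only a placeholder):

 charmrun +p16 namd2 mysim.conf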

Gamess

To use gamess you will need to load the gamess module

module load gamess

and use a submission file similar to the following

gamess.sub
#!/bin/bash
 
#SBATCH -J Gamess_test
#SBATCH --time=01:00:00
#SBATCH --ntasks=2
#SBATCH -o Forge-%J.out
 
rungms -i ./test.inp

We have customized the rungms script extensively for our system and have introduced several useful switch options.

  • -i - defines the path to the input file; no default value
  • -d - defines the path to the data folder; default is the submission directory
  • -s - defines the path to the scratch folder; default is ~/gamess/scr-$input-$date (this folder gets removed once the job completes)
  • -b - defines the binary that will be used; default is /share/apps/gamess/gamess.00.x (useful if you've compiled your own version of gamess)
  • -a - defines the auxdata folder; default is /share/apps/gamess/auxdata (useful if you've compiled your own version of gamess)
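For example (the scratch path below is only illustrative), you could point rungms at a scratch directory of your own:

 rungms -i ./test.inp -s /mnt/stor/scratch/$USER/gamess-scr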

OpenFOAM

First load the module

module load OpenFOAM 

then source the bash file,

source $FOAMBASH

or the csh file if you are using csh as your shell

source $FOAMCSH

Then you may use OpenFOAM to work on your data files. Anything that needs to be generated interactively needs to be done in an interactive job:

sinteractive

You can create a job file, submitted from the data directory, to do anything that needs more processing; please see the example below.

foam.sub
#!/bin/bash
 
#SBATCH -J FOAMtest
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --time=01:00:00
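# Note: this example assumes the OpenFOAM module was loaded and $FOAMBASH was
# sourced (as described above) in the shell you submitted from.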
 
blockMesh

This will execute blockMesh on the input files in the submission directory on 1 node with 2 CPUs for 1 hour.

Python

Python 2

Python modules are available through system modules as well. To see what python modules have been installed, issue

module avail python

These are the site installed python modules and can be used just like any other module.

module load python

will load all python modules,

module load python/matplotlib

will load the matplotlib module and any module it depends on.

Python 3

There are many packages available for Python 3. However, unlike Python 2.7, the Python 3 packages are installed with the Python installation itself rather than as separate environment modules. Currently, the newest Python 3 version available on the Forge is Python 3.6.4. To see a current list of available python versions, run the command

module avail python

To see a list of all Python packages available for a particular Python version, load that Python version and run

pip list

Users are welcome to install additional modules on their account using

pip install --user <package name>

*NOTE* The default versions of Python on the Forge do *NOT* have pip installed. You will have to load a Python module to get pip functionality.
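For example (the module version and package name below are only illustrative; check module avail python for what is actually installed):

 module load python/3.6.4
 pip install --user numpy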

QMCPack

Start by loading the module.

module load qmcpack

Then create your submission file in the directory where your input files are; see the example below.

qmcpack.sub
#!/bin/bash
 
#SBATCH -J qmcpack
#SBATCH -o Forge-%j.out
#SBATCH --ntasks=8
#SBATCH --time=1:00:00
#SBATCH --constraint=intel
 
mpirun qmcapp input.xml

Note: QMCPack is compiled to run only on Intel processors, so you need to add

--constraint=intel

to your submission file to make sure that your job doesn't run on a node with AMD processors.

This should run qmcpack on 8 cpus for 1 hour on your input file. If you want to run multiple instances in parallel in the same job, you may pass an input.list file, containing all the input xml files you wish to run, in place of input.xml.

For more information on creating the input files, please see QMCPack's user guide.

SAS 9.4

Notes:
SAS can be run from the login node, with an X session, or from command line.
SAS is node-locked to only forge.mst.edu, so it won't work in a jobfile. You must run it interactively on forge.mst.edu.

For both methods, you need to load the SAS module.

 module load SAS/9.4 

Interactive:

Steps:

  1. Start a local X server such as X-Win32
  2. SSH to forge.mst.edu
  3. On our Linux builds, type the command “ssh -XC forge.mst.edu”
  4. Once logged in onto the Forge, run
     sinteractive -p=free

    - the sinteractive command will open a new SSH tunnel on a compute node with free resources.

  5. On this new command line, run
    sas_en

    and it will open the SAS interface on the compute node that sinteractive has assigned.

  6. Remember to close all your windows when you're finished

A series of SAS windows will appear.

Jobfile

sas.sub
#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --ntasks=1
#SBATCH --time=23:59:59 
#SBATCH --mem=31000
#SBATCH --mail-type=begin,end,fail,requeue 
#SBATCH --export=all 
#SBATCH --out=Forge-%j.out 
 
module load SAS/9.4
sas_en -nodms sas_script.sas

Command line:

 sas_en -nodms 

runs the command line version of sas. (type endsas; to get out)

[rlhaffer@login-44-0 apps]$ sas_en -nodms
NOTE: Copyright (c) 2002-2012 by SAS Institute Inc., Cary, NC, USA. 
NOTE: SAS (r) Proprietary Software 9.4 (TS1M3) 
      Licensed to THE CURATORS OF THE UNIV OF MISSOURI - T&R, Site 70084282.
NOTE: This session is executing on the Linux 2.6.32-431.11.2.el6.x86_64 (LIN 
      X64) platform.
      SAS/QC 14.1

NOTE: Additional host information:

 Linux LIN X64 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 UTC 2014 
      x86_64 CentOS release 6.5 (Final) 

You are running SAS 9. Some SAS 8 files will be automatically converted 
by the V9 engine; others are incompatible.  Please see 
http://support.sas.com/rnd/migration/planning/platform/64bit.html

PROC MIGRATE will preserve current SAS file attributes and is 
recommended for converting all your SAS libraries from any 
SAS 8 release to SAS 9.  For details and examples, please see
http://support.sas.com/rnd/migration/index.html


This message is contained in the SAS news file, and is presented upon
initialization.  Edit the file "news" in the "misc/base" directory to
display site-specific news and information in the program log.
The command line option "-nonews" will prevent this display.




NOTE: SAS initialization used:
      real time           0.11 seconds
      cpu time            0.04 seconds
      
  1? 

PRINTING:

  • Select File | Print.
  • This will create a postscript file, “sasprt.ps”, in the directory you started SAS from. You can download it to your computer and use ps2pdf to print it.

StarCCM+

Engineering Simulation Software

Default version = 10.04.011

Other working versions:

  • 13.04
  • 12.06
  • 11.04
  • 11.02.009
  • 10.06.010
  • 8.04.010
  • 7.04.011
  • 6.04.016

Job Submission Information

Copy your .sim file from the workstation to your cluster home directory. The Forge documentation explains how to do this.
Once copied, create your job file.

Example job file. First:

 module load starccm/10.04
starccm.sub
#!/bin/bash
#SBATCH --job-name=starccm_example
#SBATCH --nodes=1
#SBATCH --ntasks=12
#SBATCH --partition=requeue
#SBATCH --time=08:00:00
#SBATCH --mail-type=FAIL
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-user=joeminer@mst.edu
 
module load starccm/10.04
 
time starccm+ -power -batch -np 12 -licpath 1999@flex.cd-adapco.com -podkey AABBCCDDee1122334455 /path/to/your/starccm/simulation/example.sim

It's preferred that you keep ntasks and -np set to the same processor count.

Breakdown of the script:
This job will use 1 node, asking for 12 processors, for a total wall time of 8 hours and will email you when the job starts, finishes or fails.

The StarCCM commands:

  • -power - use the power session and Power On Demand key
  • -batch - tells Star to utilize more than one processor
  • -np - number of processors to allocate
  • -licpath 1999@flex.cd-adapco.com - since users of StarCCM at S&T must have Power On Demand keys, we use CD-adapco's license server
  • -podkey - enter your PoD key (AABBCCDDee1122334455 in this example); this is given to you when you sign up for Star, or someone from the FSAE team provides it
  • /path/to/your/starccm/simulation/example.sim - use the true path to your .sim file in your cluster home directory
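Submit it as normal with:

 sbatch starccm.sub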

Vasp

To use our site installation of Vasp you must first prove that you have a license to use it by emailing your vasp license confirmation to it-research-support@mst.edu.

Once you have been granted access to using vasp you may load the vasp module

module load vasp

and create a vasp job file, in the directory where your input files are, that will look similar to the one below.

vasp.sub
#!/bin/bash
 
#SBATCH -J Vasp
#SBATCH -o Forge-%j.out
#SBATCH --time=1:00:00
#SBATCH --ntasks=8
 
mpirun vasp

This example will run the standard vasp compilation on 8 cpus for 1 hour.

If you need the gamma only version of vasp use

 mpirun vasp_gam 

in your submission file.

If you need the non-collinear version of vasp use

 mpirun vasp_ncl 

in your submission file.

There are some globally available pseudopotentials; the module sets the environment variable $POTENDIR to the global directory.
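For example, after loading the vasp module you can see which pseudopotential sets are available with:

 ls $POTENDIR

and reference or copy files from that directory when assembling the POTCAR for your calculation.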
