  
Dell C6525: 4-node chassis with each node containing dual 32-core AMD EPYC Rome 7452 CPUs, 256 GB of DDR4 RAM, and six 480 GB SSD drives in RAID 0.

As of 06/17/21, the Foundry has over 11,000 CPU cores of compute capacity.
  
===GPU nodes===

The newly added GPU nodes are Dell C4140s configured as follows.
  
Dell C4140: 1-node chassis with 4 Nvidia V100 GPUs connected via NVLink and interconnected with other nodes via HDR-100 InfiniBand. Each node has dual 20-core Intel processors and 192 GB of DDR4 RAM.
  
As of 06/17/21, we have 24 V100 GPUs available for use.
  
===Storage===
==Scratch Directories==
  
Each user will get a scratch directory created for them at /lustre/scratch/$USER; an alias `cdsc` has also been created so users can cd directly to this location. As with all storage, scratch space is not backed up; moreover, because data in this location is meant to be impermanent, it is actively cleaned to keep the storage from filling up. The intent of this storage is for your programs to create temporary files which you may need to keep for only a short time after the calculation completes. The volume is a high-speed, network-attached scratch space. There are currently no quotas placed on directories in this scratch space; however, if the 60 TB volume filling up becomes a problem, we will have to implement quotas.
  
Along with the networked scratch space, there is always local scratch on each compute node in /tmp for use during calculations. There is no quota placed on this space, and it is cleaned regularly as well, but files stored in this space are only available to processes executing on the node where they were created. This means that if you create a file in /tmp inside a job, you won't be able to see it on the login node, and other processes won't be able to see it if they are on a different node than the process which created the file.
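
As a minimal sketch of how these spaces are typically used inside a job script (the resource requests, program, and file names below are placeholders, not part of the Foundry documentation):

<code>
#!/bin/bash
#SBATCH --job-name=scratch_example
#SBATCH --ntasks=1
#SBATCH --time=1:00:00

# work in a per-job directory under the networked scratch space
SCRATCH=/lustre/scratch/$USER/$SLURM_JOB_ID
mkdir -p $SCRATCH
cd $SCRATCH

# ... run your program here, writing temporary files to $SCRATCH (or /tmp for node-local files) ...

# copy anything you need to keep back to your home directory before the job ends
cp results.out $HOME/
</code>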
  
Missouri S&T users can mount their web volumes and S drives with the <code>mountdfs</code> command. This will mount your user directories on the login machine under /mnt/dfs/$USER. The data can be copied to your home directory with command line tools; **your data will not be accessible from the compute nodes, so do not submit jobs from these directories.** Aliases "cds" and "cdwww" have been created to allow you to cd into your S drive and web volume quickly and easily.

You can un-mount your user directories with the <code>umountdfs</code> command. If you have trouble accessing these resources, you may be able to get them working again by un-mounting the directories and then mounting them again. When the file servers reboot for monthly patching or other scheduled maintenance, the mount might not reconnect automatically.
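
As an illustration, a typical session using these commands and aliases might look like the following (the file name is a placeholder):

<code>
mountdfs                   # mount your S drive and web volume under /mnt/dfs/$USER
cds                        # cd into the mounted S drive
cp mydata.tar.gz $HOME/    # copy data to your home directory before submitting jobs
umountdfs                  # un-mount when finished, or to reset a stale mount
</code>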
  
=== Windows ===
** The applications portion of this wiki is currently a work in progress. Not all applications are listed here, nor will they ever all be, as the set of applications we support continually grows. **
  
==== Abaqus ====

  * Default Version = 2022
  * Other versions available: 2020


=== Using Abaqus ===

Abaqus should not be run on the login node at all.
\\
Be sure you are connected to the Foundry with X forwarding enabled, and are running inside an interactive job started with the command
    sinteractive
before you attempt to run Abaqus. Running sinteractive without any switches will give you 1 cpu for 10 minutes; if you need more time or resources, you may request it. See [[pub:foundry#interactive_jobs|Interactive Jobs]] for more information.
\\
Once inside an interactive job you need to load the Abaqus module.
    module load abaqus
Now you may run Abaqus.
    ABQLauncher cae -mesa


====Anaconda====
If you would like to install python modules via conda, you may load the anaconda module to get access to conda for this purpose. After loading the module, you will need to initialize conda to work with your shell.
<code>
module load anaconda
conda init
</code>
This will ask you what shell you are using, and when it is done it will ask you to log out and back in again to load the conda environment. After you log back in, your command prompt will look different than it did before: it should now show (base) at the far left. This is the virtual environment you are currently in. Since you do not have permission to modify base, you will need to create and activate your own virtual environment to build your software inside of.
<code>
conda create --name myenv
conda activate myenv
</code>
Now instead of (base) it should say (myenv), or whatever you named your environment in the create step. These environments are stored in your home directory, so they are unique to you. If you are working with a group, everyone in your group will need their own copy of the environment you've built, since environments live in $HOME/.conda/envs/.
\\
Once you are inside your virtual environment, you can run whatever conda installs you would like, and the packages and their dependencies will be installed inside this environment. If you would like to execute code that depends on the modules you install, you will need to be sure that you are inside your virtual environment: (myenv) should be shown on your command prompt, and if it is not, activate it with `conda activate myenv`.
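
As a quick example, installing a package into your environment and checking that python can see it might look like this (numpy is used purely as an illustration):

<code>
conda activate myenv
conda install numpy
python -c "import numpy; print(numpy.__version__)"
</code>
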
==== Ansys ====
  
If you would like to run the full GUI you may do so inside an interactive job; make sure you've connected to The Foundry with X forwarding enabled. Start the job with <code>sinteractive</code>. This will give you 1 processor for 1 hour; to request more processors or more time, please see the documentation at [[pub:foundry#interactive_jobs|Interactive Jobs]].
  
Once inside the interactive job you will need to load the ansys module. <code>module load ansys</code> Then you may start Fluent from the command line. <code>fluent 2ddp</code> will start the 2D, double-precision version of Fluent. If you've requested more than one processor, you need to first run <code>export PBS_NODEFILE=`generate_pbs_nodefile`</code> Then you need to add some switches to fluent to get it to use those processors. <code>fluent 2ddp -t## -pethernet -cnf=$PBS_NODEFILE -ssh</code> You need to replace the ## with the number of processors requested.
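
Putting those steps together, a parallel interactive Fluent session requesting, for example, 4 processors would look like the following sketch (run inside an interactive job that was started with 4 processors):

<code>
module load ansys
export PBS_NODEFILE=`generate_pbs_nodefile`
fluent 2ddp -t4 -pethernet -cnf=$PBS_NODEFILE -ssh
</code>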

==== Comsol ====

Comsol Multiphysics is available for general usage through a comsol/5.6_gen module. Please see the sample submission script below for running comsol in parallel on the Foundry.

<file bash comsol.sub>
#!/bin/bash
#SBATCH -J Comsol_job
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=64
#SBATCH --mem=0
#SBATCH --time=1-00:00:00
#SBATCH --export=ALL

module load comsol/5.6_gen
ulimit -s unlimited
ulimit -c unlimited

comsol batch -mpibootstrap slurm -inputfile input.mph -outputfile out.mph

</file>
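
Assuming the script above is saved as comsol.sub in the same directory as your input.mph file, it can be submitted with sbatch:

<code>
sbatch comsol.sub
</code>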
  
==== Cuda ====
  
This file requests 1 cpu and 1 gpu on 1 node for 1 hour; to request more cpus or more gpus you will need to modify the values related to ntasks and gres=gpu. It is recommended that you have at least 1 cpu for each gpu you intend to use. We currently only have 2 gpus available per node; once we incorporate the remainder of the GPU nodes, we will have 7 gpus available in one chassis.
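
As a sketch, requesting 2 gpus with a matching 2 cpus would mean changing those lines to something like the following (the exact values depend on what is available on the node):

<code>
#SBATCH --ntasks=2
#SBATCH --gres=gpu:2
</code>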

====Gaussian====

Gaussian has 2 different versions on the Foundry. The sample submission file below uses the g09 executable; however, if you load the version 16 module you will need to use g16 instead.
<file bash gaussian.sub>
#!/bin/bash
#SBATCH --job-name=gaussian
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=1000
module load gaussian/09e01
g09 < Fe_CO5.inp
</file>

You will need to replace the file name of the input file in the sample provided with your own.
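
For the version 16 module, the last two lines of the script would change to something like the sketch below; the exact module name is an assumption here, so check "module avail gaussian" for the version string installed on the Foundry.

<code>
module load gaussian/16    # assumed module name; confirm with "module avail gaussian"
g16 < your_input_file.inp
</code>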
  
==== Matlab ====
  
Matlab is available to run in batch form or interactively on the cluster.
  * Default version = 2021a
  * Other installed version(s): 2019b, 2020a, 2020b (run "module avail" to see what versions are currently available)
  
=== Interactive Matlab ===
  
To get started with Matlab, run the following sequence of commands from the login node. This will start an interactive job on a backend node, load the default Matlab module, and then launch Matlab. If you have connected with X forwarding, you will get the full Matlab GUI to use however you would like. By default, this limits you to 1 core for 4 hours maximum on one of our compute nodes. To use more than 1 core, or to run for longer than 4 hours, you will need to either add additional parameters to the "sinteractive" command or submit a batch submission job that configures all of the job parameters you require.
  
<code>sinteractive
module load matlab
matlab
</code>
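
For example, a longer or larger interactive Matlab session might be started as sketched below; the switches shown are standard Slurm-style options and are only an assumption about what sinteractive accepts, so see [[pub:foundry#interactive_jobs|Interactive Jobs]] for the exact syntax.

<code>
sinteractive --time=8:00:00 --ntasks=4    # assumed switches; see Interactive Jobs for supported options
module load matlab
matlab
</code>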

Please note that Matlab does not parallelize your code by default; it only runs in parallel if you use parallelized calls in your code. If you have parallelized your code, you will need to first open a parallel pool to run your code in.
  
=== Batch Submit ===
Another thing to note is the flexibility of Singularity. It can run containers from its own library, Docker, Docker Hub, Singularity Hub, or a local file in many formats.
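
For instance, pulling an image from Docker Hub and running a command inside it looks roughly like the following (the ubuntu image is just an illustration; load the singularity module first if your session does not already have the singularity command available):

<code>
singularity pull docker://ubuntu:20.04                  # downloads ubuntu_20.04.sif into the current directory
singularity exec ubuntu_20.04.sif cat /etc/os-release
</code>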
  

====StarCCM+====

Engineering Simulation Software\\

Default version = 2021.2\\

Other working versions:
  * 2020.1
  * 12.02.010


Job Submission Information

Copy your .sim file from the workstation to your cluster home directory (see the example below).\\
Once copied, create your job file.
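
For the copy step, a minimal example run from your workstation's command line is shown below; the login hostname and file name are placeholders, so substitute the Foundry login address you normally connect to.

<code>
scp example.sim username@<foundry-login-host>:~/
</code>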

Example job file:

<file bash starccm.sub>

#!/bin/bash
#SBATCH --job-name=starccm_test
#SBATCH --nodes=1
#SBATCH --ntasks=12
#SBATCH --mem=40000
#SBATCH --partition=requeue
#SBATCH --time=12:00:00
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=FAIL
#SBATCH --mail-type=END
#SBATCH --mail-user=username@mst.edu

module load starccm/2021.2

time starccm+ -batch -np 12 /path/to/your/starccm/simulation/example.sim
</file>

** It is preferred that you keep ntasks and -np set to the same processor count.**\\

Breakdown of the script:\\
This job will use **1** node, asking for **12** processors and **40,000 MB** of memory, for a total wall time of **12 hours**, and will email you when the job starts, finishes, or fails.

The StarCCM+ commands:\\

|-batch| runs StarCCM+ in batch (non-GUI) mode|
|-np| sets the number of processors to allocate|
|/path/to/your/starccm/simulation/example.sim| use the true path to your .sim file in your cluster home directory|

====TensorFlow with GPU support====

https://www.tensorflow.org/

We have been able to get TensorFlow to work with GPU support if we install it within an anaconda environment. Other methods do not seem to work as smoothly (if they even work at all).

First use [[#Anaconda|Anaconda]] to create and activate a new environment (e.g. tensorflow-gpu). Then use anaconda to install TensorFlow with GPU support:

<code>conda install tensorflow-gpu</code>

At this point you should be able to activate that anaconda environment and run TensorFlow with GPU support.
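
A quick way to confirm that the GPU is visible (run from inside a GPU job with the environment activated) is the one-line check below, which assumes a TensorFlow 2.x install from conda:

<code>
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
</code>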

Job Submission Information

Copy your python script to the cluster. Once copied, create your job file.

Example job file:

<file bash tensorflow-gpu.sub>
#!/bin/bash
#SBATCH --job-name=tensorflow_gpu_test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=cuda
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=FAIL
#SBATCH --mail-type=END
#SBATCH --mail-user=username@mst.edu

module load anaconda/2020.7
conda activate tensorflow-gpu
python tensorflow_script_name.py
</file>

==== Thermo-Calc ====

  * Default Version = 2021a
  * Other versions available: none yet

=== Accessing Thermo-Calc ===

Thermo-Calc is restricted software. If you need access, please email nic-cluster-admins@mst.edu for more info.

=== Using Thermo-Calc ===

Thermo-Calc will not operate on the login node at all.
\\
Be sure you are connected to the Foundry with X forwarding enabled, and are running inside an interactive job started with the command
    sinteractive
before you attempt to run Thermo-Calc. Running sinteractive without any switches will give you 1 cpu for 10 minutes; if you need more time or resources, you may request it. See [[pub:foundry#interactive_jobs|Interactive Jobs]] for more information.
\\
Once inside an interactive job you need to load the Thermo-Calc module.
    module load thermo-calc
Now you may run Thermo-Calc.
    Thermo-Calc.sh

====Vasp====

To use our site installation of Vasp you must first prove that you have a license to use it by emailing your Vasp license confirmation to <nic-cluster-admins@mst.edu>.

Once you have been granted access to using Vasp, you may load the vasp module with <code>module load vasp</code> (you might need to select the version that you are licensed for).

Then create a vasp job file, in the directory where your input files are, that looks similar to the one below.

<file bash vasp.sub>
#!/bin/bash

#SBATCH -J Vasp
#SBATCH -o Foundry-%j.out
#SBATCH --time=1:00:00
#SBATCH --ntasks=8

module load vasp
module load libfabric

srun vasp

</file>

This example will run the standard vasp compilation on 8 cpus for 1 hour. \\

If you need the gamma-point-only version of vasp, use <code> srun vasp_gam </code> in your submission file. \\

If you need the non-collinear version of vasp, use <code> srun vasp_ncl </code> in your submission file.\\

It might work to launch vasp with "mpirun vasp", but running "srun vasp" should automatically configure the MPI job parameters based on the configured slurm job parameters and should run more cleanly than mpirun.\\

There are some globally available pseudopotentials; the module sets the environment variable $POTENDIR to the global pseudopotential directory.
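
For example, after loading the module you can browse that directory and copy the pseudopotential files you need into your job directory; the exact layout under $POTENDIR depends on which potential sets are installed, so treat the listing step below as the starting point.

<code>
module load vasp
ls $POTENDIR        # list the globally available pseudopotential sets
</code>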
  
  