Installing packages
Overview
Teaching: 15 min
Exercises: 15 minQuestions
How do I install a package for framwork/language X?
Can I use the operating system package manager?
Objectives
Install packages for a wide range of languages such as Python, R, Anaconda, Perl
Use containers to install complete software packages with dependencies in a single image with Singularity.
Installing packages for common frameworks/languages is very useful for a user on HPC to perform. Fortunately we allow downloads from the Internet so this is quite easy to achieve (some HPC systems can be isolated from the Internet that makes things harder). There will be many ways to install packages for programs, for example Perl can be just the operating system version of Perl or part of a bigger packages such as ActiveState Perl. ActiveState has its own package system rather than using the default package system in Perl. However we aim to provide the user in this section to be able to understand how packages work and install for common languages.
Once installed on the login node, the SLURM job scripts can be written to take advantage of the new package or software. You do not need to install the package again!
Check versions
Versions of software modules can be checked with
module av
, for example for Python$ module av python
Specifying the version of the software also makes sure changes to default version will not impact your jobs. For example default python loaded with
module load python
could install packages into one location but if the default version changed it will not automatically reinstall those packages.
Python
Python is becoming the de-facto language across many disciplines. It is easy to read, write and run. Python 3 should be preferred over the now retired Python 2. On Hawk a user can load Python from a module
$ module load python/3.7.0
Python versions
Calling Python can be performed using just
python
or with the version number such aspython3
or even more exactpython3.7
. To be sure you are calling Python 3 we recommend running withpython3
rather than the unversionedpython
. What is the difference on Hawk?Solution
$ python --version
will print
Python 2.7.5
Python 3.7.0 is now loaded into your environment, this can be checked with
$ python3 --version
$ Python 3.7.0
Installing a package
To install a package we recommend using the pip
functionality with a virtual environment. Using the previously recommdended --user
option can cause problems due to not being isolated and conflict with other packages. Pip provides many packages of Python and handles
dependencies between the packages. For a user on Hawk it may require 2 steps.
To create a virtual environment and update Pip (updating Pip resolves odd behaviour especially when detecting available platform downloads) run:
$ python3 -m venv my_env
$ source ./my_env/bin/activate
$ pip install --upgrade pip
Then install a package with:
$ pip install <package>
Pip versions
Pip (just like Python) also is versioned with
pip
just being a default version in the OS or virtual environment. Whilstpip3
will call the Python 3 version of Pip. This is important to match overwise it will install packages in the wrong location.
Also to add another option Python3 comes with pip as a module sopython3 -m pip
can also be a valid way and is now recommended as the future way to call pip. What versions are available on Hawk?Solution
There is no default OS
pip
butpip
andpip
are supplied by the module.$ pip --version $ pip3 --version
Both print
pip 20.0.2 from /apps/languages/python/3.7.0/el7/AVX512/intel-2018/lib/python3.7/site-packages/pip (python 3.7)
Where <package>
is the name of the package. The packages can be found using the Python Package Index or
PyPI.
Example
To install a rather unhelpful hello-world
package into a virtual environment, activate your virtual environment and run
$ pip install pip-hello-world
This will install the package in your $HOME
directory.
This package can be tested with the following Python script
from helloworld import hello
c = hello.Hello()
c.say_hello()
The script will produce
'Hello, World!'
Virtual environments in detail
Python supports a concept called virtual environments. These are self-contained Python environments that allows you to install packages without impacting other environments you may have. For example:
$ python3.7 -m venv pyenv-test $ source pyenv-test/bin/activate $ pip3 install pip-hello-world $ python3.7 $ deactivate
Everything is contained with the directory
pyenv-test
. There was also no need to specify--user
. For beginners this can be quite confusing but is very useful to master. Once finished you candeactivate
the environment`.
R
R is a language for statisticians. It is very popular language to perform statistics (its primary role). Package
management for R is inbuilt rather than using a separate program to perform it.
On Hawk, R can be loaded via the module system with
$ module load R/3.5.1
Example
Running R, the following command will install the jsonlite
package
install.packages(“jsonlite”)
# (If prompted, answer Yes to install locally)
# Select a download mirror in UK if possible.
Where jsonlite
is the package name you want to install. Information can be found at
CRAN
In this case the package will be installed at $HOME/R/x86_64-pc-linux-gnu-library/3.5/
Anaconda
Anaconda is a fully featured software management tool that provides many packages for many different languages. Anaconda allow dependencies between software packages to be installed.
Since Anaconda keeps updating we can load the module on Hawk with
$ module load anaconda
$ source activate
This should load a default version of Anaconda. Specific versions can be loaded as well.
Example
A package can then be installed with
$ conda search biopython
$ conda create --name snowflakes biopython
This first checks there is a biopython
package and then creates an environment snowflakes
to store the package in.
The files are stored in $HOME/.conda
This package can then be activated using
$ conda info --envs
$ conda activate snowflakes
$ python
$ source deactivate
This first lists all environments you have installed. Then activates the environment you would like and then you can
run python
. Finally you can deactivate the environment.
Newer versions of Anaconda may change the procedure.
Anaconda directory
Every package you download is stored in
$HOME/.conda
and also lots of other files are created for every package. Overtime this directory can have many files and will reach the quota limit for files. We recommend usingconda clean -a
to tidyup the directory and reduce the number of files.
Perl
Perl has been around for a long-time but becoming uncommon to see being used on HPC. On Hawk we recommend using the operating system Perl command so no module is required.
$ perl --version
To manage packages we can use the cpanm
utility. This can be installed in your home directory with
$ mkdir -p ~/bin && cd ~/bin
$ curl -L https://cpanmin.us/ -o cpanm
$ chmod +x cpanm
Then setup cpanm
with
$ cpanm --local-lib=~/perl5 local::lib
$ eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)
$ perl -I ~/perl5/lib/perl5/ -Mlocal::lib
# Add output from perl command above to $HOME/.myenv
This will install Perl packages in $HOME/perl5
.
Example
Install the Test::More
package.
$ cpanm Test::More
Then run the following Perl script
use Test::More;
print $Test::More::VERSION ."\n";
Which should produce something like
1.302175
Singularity
Singularity is a container solution for HPC where alternatives such as Docker are not suitable. Singularity can pretend to be another operating system but inherits all the local privilege you have on the host system. On Hawk we do not want anyone to be able to run software as another user - especially root. Also containers are immutable so you save files inside containers - $HOME and /scratch are mounted within the image.
Singulairity is available as a module with:
$ module load singularity
The latest version should be preferred due to containing the latest versions of the code that may include security and performance fixes.
Singularity can then be used, initially users will want to download existing containers but eventually you can build you own - maybe via the Singularity Hub
MPI and GPUs
Singularity supports using MPI and GPUs. MPI can be tricky especially with OpenMPI but GPUs are supported using the
--nv
command line option that loads the relevent GPU libraries into the container from the host operating system.
Example
To download a container from Singularity Hub then you can run:
$ singularity pull shub://vsoch/hello-world
This will pull down an image file hello-world_latest.sif
. This can be run with:
$ singularity run hello-world_latest.sif
This produces the following output:
RaawwWWWWWRRRR!! Avocado!
Docker Hub can also be used for Docker containers such as
$ singularity pull docker://ubuntu
You can then run a Ubuntu application on Redhat, e.g.
$ singularity exec ubuntu_latest.sif cat /etc/os-release
A useful site for GPUs is NGC that hosts containers for software optimised for GPUs. Instructions are on the website.
Containers
Containers allows a user to build once and run anywhere (if same processor type). A recipe file can be created to store the exact steps used to create the image. This can make sure any code is reproducible by others. It is useful for packages that assume a user has root access to their machine and install OS level packages for dependencies.
Others?
There may be other software package managers that you want to investigte. Please get in touch with us if you need help.
For example, the Spack is a software manager to make it easy to install packages. Finally there is
also the traditional autoconf
and make
approach. For example you can build software with:
$ ./configure --prefix=$HOME/my_package
$ make
$ make install
This will install my_package
in your $HOME
directory.
Experimenting with software packages is encouraged to find new and novel ways to perform your research.
Key Points
Users can control their own software when needed for development.