Bede Software

A basic overview of Bede's software landscape is given here. It is intended as an introduction to the software, packages, and libraries that are available to users of the Bede Supercomputer.

Bede is running Red Hat Enterprise Linux 7 and access to its computational resources is mediated by the Slurm batch scheduler.

A range of software will be supported by Bede. The system will have all of the software from Watson Machine Learning: Community Edition installed as standard. This comprises a number of open-source packages that the majority of researchers and data scientists will be familiar with:

  • TensorFlow
  • Caffe
  • PyTorch
  • Keras

These will be supplemented by bespoke tools from IBM:

  • Large Model Support
  • Distributed Deep Learning
  • Snap Machine Learning

Large Model Support

This package is intended to help avoid the ‘out of memory’ error that can often occur when training a model. It takes advantage of the bandwidth of Bede’s NVLink interconnects by allowing tensor outputs to be moved out of GPU memory and into the much larger system memory.

Doing this enables models to train on much higher-resolution imagery, which can increase the model’s accuracy and usefulness.
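
To see why image resolution runs into GPU memory limits so quickly, the following is a rough back-of-the-envelope sketch rather than a description of how Large Model Support works internally. The function name `feature_map_bytes` and the batch size, channel count and resolutions are illustrative assumptions, not figures from Bede or IBM.

```python
def feature_map_bytes(batch, channels, height, width, bytes_per_value=4):
    """Bytes needed to hold one float32 feature map of the given shape."""
    return batch * channels * height * width * bytes_per_value


GIB = 1024 ** 3

# Illustrative shapes only: the output of a single 64-channel layer
# for a batch of 8 images at two different resolutions.
low = feature_map_bytes(batch=8, channels=64, height=512, width=512)
high = feature_map_bytes(batch=8, channels=64, height=2048, width=2048)

print(f"512 x 512:   {low / GIB:.2f} GiB")
print(f"2048 x 2048: {high / GIB:.2f} GiB")
print(f"ratio: {high // low}x")
```

With these example numbers, one layer’s output grows from 0.5 GiB to 8 GiB when the image side length is quadrupled. A deep network keeps many such tensors alive during training, which is why being able to move some of them to system memory matters.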


Distributed Deep Learning

Distributed Deep Learning is IBM’s answer to the problem of scaling GPU-accelerated workloads across multiple GPUs and nodes. The package will test a range of MPI communication algorithms and choose the best one for your topology.
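
The idea of choosing a communication algorithm to suit the situation can be illustrated with a small sketch. This is not IBM’s implementation: the package is described as testing algorithms, whereas the sketch below swaps in a simple latency/bandwidth cost model to compare two textbook all-reduce strategies. The function names and the values of `alpha` (per-message latency, in seconds) and `beta` (seconds per byte) are assumptions chosen for illustration.

```python
import math


def ring_allreduce_cost(p, n_bytes, alpha, beta):
    """Ring all-reduce: 2(p-1) steps, each moving n/p bytes."""
    return 2 * (p - 1) * (alpha + (n_bytes / p) * beta)


def recursive_doubling_cost(p, n_bytes, alpha, beta):
    """Recursive doubling: ceil(log2 p) steps, each moving all n bytes."""
    return math.ceil(math.log2(p)) * (alpha + n_bytes * beta)


def pick_algorithm(p, n_bytes, alpha, beta):
    """Return the name of the algorithm with the lowest estimated cost."""
    costs = {
        "ring": ring_allreduce_cost(p, n_bytes, alpha, beta),
        "recursive_doubling": recursive_doubling_cost(p, n_bytes, alpha, beta),
    }
    return min(costs, key=costs.get)


# Assumed link characteristics: 5 microseconds latency, 10 GB/s bandwidth.
ALPHA, BETA = 5e-6, 1e-10

small = pick_algorithm(p=8, n_bytes=1024, alpha=ALPHA, beta=BETA)
large = pick_algorithm(p=8, n_bytes=100_000_000, alpha=ALPHA, beta=BETA)
print("1 KiB message: ", small)
print("100 MB message:", large)
```

Under these assumptions the latency-bound small message favours recursive doubling, while the bandwidth-bound large message favours the ring. No single algorithm is best everywhere, which is why automatic selection is useful.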


Snap Machine Learning

Snap Machine Learning is a product of IBM Research Zurich. It is a distributed GPU-accelerated machine learning library. It enables traditional machine learning algorithms such as Random Forest or Gradient Boosting to benefit from GPU acceleration.

IBM have done a significant amount of work to present it through one consistent API, so that researchers can use it without having to learn lots of new elements.


Distribution

All of these packages are distributed using Conda, a package and environment manager widely used for Python. IBM offer these packages via three separate channels within Conda.

The main channel features packages that are tried, tested and supported by IBM. There is also an ‘early access’ channel, which lets users make use of the next release of a package; these builds will be as close to the final open source release as possible.

Finally, there is a community channel. This is for users who have ported a Python package that perhaps wasn’t behaving as expected on a POWER9 system and want to make it available to other users within the community.
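
As a minimal sketch of how a channel might be selected, the snippet below builds the `conda config --prepend channels` command for a chosen channel. The URLs are hypothetical placeholders, not the real channel addresses, and the `CHANNELS` mapping and `prepend_channel_command` helper are names introduced here for illustration only.

```python
import shlex

# Hypothetical placeholder URLs: substitute the channel addresses
# published for your system before running anything.
CHANNELS = {
    "main": "https://example.org/ibm/conda/",
    "early-access": "https://example.org/ibm/conda-early-access/",
}


def prepend_channel_command(name):
    """Build the `conda config` command that gives a channel top priority."""
    return ["conda", "config", "--prepend", "channels", CHANNELS[name]]


cmd = prepend_channel_command("main")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)`. Prepending rather than appending means Conda will prefer packages from that channel over those in channels already configured.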

