Bede runs Red Hat Enterprise Linux 7, and access to its computational resources is mediated by the Slurm batch scheduler.
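Because work on Bede runs inside Slurm jobs, a program often needs to know which job it belongs to and which task it is. The minimal sketch below reads the standard environment variables Slurm exports to each task (`SLURM_JOB_ID`, `SLURM_NTASKS`, `SLURM_PROCID`). The `SlurmContext` class and the single-task fallback are illustrative assumptions, not part of Slurm itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class SlurmContext:
    job_id: Optional[str]  # None when not running under Slurm
    ntasks: int            # total number of tasks in the job step
    rank: int              # this task's index, 0..ntasks-1


def slurm_context(env: Mapping[str, str] = os.environ) -> SlurmContext:
    """Read the job layout from variables Slurm exports to each task.

    Hypothetical helper: it falls back to a single-task layout when the
    variables are absent, e.g. when testing outside a job.
    """
    return SlurmContext(
        job_id=env.get("SLURM_JOB_ID"),
        ntasks=int(env.get("SLURM_NTASKS", "1")),
        rank=int(env.get("SLURM_PROCID", "0")),
    )
```

Inside a job step launched with `srun`, each task would call `slurm_context()` and could, for example, use `rank` to decide which shard of the data to process.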
Bede will support a range of software. All of the software from IBM Watson Machine Learning Community Edition will be installed as standard. This comprises a number of open-source packages that most researchers and data scientists will be familiar with:
These will be supplemented by bespoke tools from IBM:
- Large Model Support
- Distributed Deep Learning
- Snap Machine Learning
Large Model Support
This package is intended to help avoid the ‘out of memory’ errors that often occur when training large models on a GPU. It makes use of the high bandwidth of Bede’s NVLink interconnects to move tensors out of GPU memory into the much larger system memory, and back again when they are needed.
This enables models to be trained on much higher-resolution imagery, which can increase the model’s accuracy and usefulness.
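The swapping idea behind Large Model Support can be illustrated with a toy model. The sketch below is not IBM's implementation and does not touch real GPU memory: tensors are just names with sizes, and `SwappingAllocator` is a hypothetical class. It assumes a least-recently-used eviction policy, so when a tensor is needed on the GPU and there is no room, older tensors are moved to host memory instead of an out-of-memory error being raised.

```python
from collections import OrderedDict


class SwappingAllocator:
    """Toy model of swapping tensors between GPU and host memory (hypothetical)."""

    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity
        self.on_gpu = OrderedDict()  # name -> size in bytes, least recently used first
        self.on_host = {}            # name -> size in bytes for swapped-out tensors
        self.swaps_out = 0
        self.swaps_in = 0

    def gpu_used(self) -> int:
        return sum(self.on_gpu.values())

    def _make_room(self, size: int) -> None:
        if size > self.gpu_capacity:
            # Swapping cannot help if a single tensor exceeds the GPU.
            raise MemoryError("tensor larger than the whole GPU")
        while self.gpu_used() + size > self.gpu_capacity:
            # Evict the least recently used tensor to host memory.
            name, victim_size = self.on_gpu.popitem(last=False)
            self.on_host[name] = victim_size
            self.swaps_out += 1

    def use(self, name: str, size: int) -> None:
        """Make `name` resident on the GPU, allocating or swapping in as needed."""
        if name in self.on_gpu:
            self.on_gpu.move_to_end(name)  # mark as most recently used
            return
        if name in self.on_host:
            size = self.on_host.pop(name)  # bring it back from host memory
            self.swaps_in += 1
        self._make_room(size)
        self.on_gpu[name] = size
```

With a 100-byte "GPU", using three 40-byte tensors pushes the first one out to host memory; touching it again swaps it back in and evicts the next-oldest tensor. A real system trades this extra traffic for capacity, which is why interconnect bandwidth matters.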
Distributed Deep Learning
Distributed Deep Learning is IBM’s answer to the problem of scaling GPU-accelerated workloads. The package will test a range of MPI communication algorithms and choose the best one for your network topology.
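The idea of choosing a communication algorithm to suit the system can be sketched with a simple latency-bandwidth (alpha-beta) cost model for an allreduce. This is a swapped-in technique for illustration: the description above says the package tests the algorithms, whereas this sketch only estimates costs analytically. The function names and the parameter values in the usage note are hypothetical, and `p` is assumed to be a power of two.

```python
def allreduce_costs(p: int, n: float, alpha: float, beta: float) -> dict:
    """Estimated seconds for an allreduce of n bytes across p ranks.

    alpha: per-message latency (s); beta: transfer time per byte (s/byte).
    Uses textbook alpha-beta estimates and ignores reduction compute time.
    """
    if p < 2 or p & (p - 1):
        raise ValueError("this sketch assumes p is a power of two >= 2")
    steps = p.bit_length() - 1  # log2(p)
    return {
        # reduce-scatter then allgather around a ring: 2(p-1) steps of n/p bytes
        "ring": 2 * (p - 1) * alpha + 2 * (p - 1) / p * n * beta,
        # recursive doubling: log2(p) exchanges of the full buffer
        "recursive_doubling": steps * (alpha + n * beta),
        # binomial-tree reduce to a root followed by a binomial-tree broadcast
        "tree": 2 * steps * (alpha + n * beta),
    }


def choose_algorithm(p: int, n: float, alpha: float, beta: float) -> str:
    """Return the name of the algorithm with the lowest estimated cost."""
    costs = allreduce_costs(p, n, alpha, beta)
    return min(costs, key=costs.get)
```

With 8 ranks, alpha = 10 µs and beta = 1 ns/byte, a 100 MB gradient buffer favours the bandwidth-efficient ring, while an 8-byte message favours recursive doubling, which needs fewer latency-bound steps. Measuring on the real hardware, as described above, captures effects such as link topology that a two-parameter model cannot.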
Snap Machine Learning
Snap Machine Learning is a product of IBM Research Zurich. It is a distributed, GPU-accelerated machine learning library that enables traditional machine learning algorithms, such as Random Forest and Gradient Boosting, to benefit from GPU acceleration.
IBM have done a significant amount of work to integrate it behind one consistent API, so that researchers can use it without having to learn many new concepts.
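To make the algorithm family and the idea of a consistent API concrete, the sketch below implements gradient boosting with depth-1 regression trees (stumps) in pure Python, behind a scikit-learn-style `fit`/`predict` interface. It runs on the CPU only and is not Snap ML's implementation; `StumpBoostingRegressor` is a hypothetical name, and the assumption that Snap ML's estimators follow the same `fit`/`predict` convention should be checked against its documentation.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left_value, right_value)


def _apply(stump: Stump, row: Sequence[float]) -> float:
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def _fit_stump(X: List[Sequence[float]], residuals: List[float]) -> Stump:
    """Find the single split that best fits the residuals in squared error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv = sum(left) / len(left)  # never empty: t is a value in column j
            rv = sum(right) / len(right) if right else 0.0
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best_err:
                best_err, best = err, (j, t, lv, rv)
    return best


class StumpBoostingRegressor:
    """Gradient boosting for squared loss using stumps (hypothetical sketch)."""

    def __init__(self, n_estimators: int = 50, learning_rate: float = 0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.base_ = sum(y) / len(y)  # start from the mean prediction
        self.stumps_ = []
        pred = [self.base_] * len(y)
        for _ in range(self.n_estimators):
            # For squared loss the negative gradient is simply the residual.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = _fit_stump(X, residuals)
            self.stumps_.append(stump)
            pred = [p + self.learning_rate * _apply(stump, row) for p, row in zip(pred, X)]
        return self

    def predict(self, X):
        return [
            self.base_ + self.learning_rate * sum(_apply(s, row) for s in self.stumps_)
            for row in X
        ]
```

Because the model exposes only `fit` and `predict`, calling code such as `StumpBoostingRegressor(n_estimators=100).fit(X, y).predict(X)` would not need to change if a faster estimator with the same interface were swapped in, which is the practical benefit of a consistent API.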
All of these packages are distributed using Conda, the open-source package and environment manager. IBM offer these packages via three separate Conda channels.
The main channel contains packages that are tried, tested and supported by IBM. There is also an ‘early access’ channel, which lets users try the next release of a package before it is finalised; its packages are kept as close to the final open-source release as possible.
Finally, there is a community channel. This allows users who have ported a Python package that was not behaving as expected on a POWER9 system to make their work available to others in the community.
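When several channels are configured, the order in which they are listed decides where a package comes from. The minimal sketch below models Conda's strict channel priority in a simplified form: the first channel that provides a package wins, even if a lower-priority channel has a newer version. Real Conda also solves dependency constraints, which this ignores, and the channel contents and package names in the tests are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, ...]
# Hypothetical channel contents: channel name -> {package name: available versions}
Channels = Dict[str, Dict[str, List[Version]]]


def resolve(package: str, channel_order: List[str],
            channels: Channels) -> Optional[Tuple[str, Version]]:
    """Pick (channel, version) for `package` under strict channel priority.

    The highest-priority channel that provides the package is used, and the
    newest version within that channel is chosen; lower-priority channels
    are consulted only if no higher one has the package.
    """
    for name in channel_order:
        versions = channels.get(name, {}).get(package)
        if versions:
            return name, max(versions)
    return None
```

Listing the main channel first therefore keeps supported builds by default, while putting the early access channel first would pick up its newer releases of the same packages.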