Machine Learning for Humanists

Overview

There are numerous Machine Learning techniques, but they all allow the programmer to do something similar and remarkable. Usually, a human programmer needs to explicitly tell the computer how to complete a task before it is able to do so. Using Machine Learning, the programmer instead defines a learning objective, and allows the computer to search for its own solution.

Until recently, Machine Learning was generally inapplicable to Humanities problems. This has changed. The Humanities are becoming more data-rich, computer scientists are becoming more interested in creative applications of AI, and today, Machine Learning is an increasingly important weapon in the armoury of the digital humanist.

Session 1: Basic Concepts of Machine Learning

The first session covered basic concepts of Machine Learning, including supervised vs. unsupervised methods; learning vs. inference; generative vs. discriminative models; metrics (accuracy, precision, recall, and others); and the rise of ‘deep learning’. It went on to look at some of the basic Python packages for Machine Learning: numpy, scikit-learn, gensim, and TensorFlow.
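
As a rough illustration of the metrics mentioned above, the sketch below computes accuracy, precision and recall by hand for a binary classifier. It is not taken from the workshop notebooks: the labels and the function names are invented for the example, and in practice scikit-learn provides equivalent functions.

```python
# Hypothetical example: 1 = "poem", 0 = "not a poem".
# y_true holds the correct labels, y_pred the classifier's guesses.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]


def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred):
    """Of the items predicted positive, the share that really are positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    """Of the items that really are positive, the share the model found."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


print("accuracy: ", accuracy(y_true, y_pred))   # 0.625
print("precision:", precision(y_true, y_pred))  # 0.6
print("recall:   ", recall(y_true, y_pred))     # 0.75
```

The three numbers differ because they answer different questions: this toy classifier finds most of the poems (high recall) but also labels some non-poems as poems (lower precision).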

Session 2: Topic Modelling

The second session looked at one of the most popular ‘big data’ methods in the Humanities: Topic Modelling. Roughly speaking, Topic Modelling allows scholars to determine what texts are about. The session discussed the most popular Topic Modelling technique (latent Dirichlet allocation), demonstrated how to apply it to a corpus of documents with only a few lines of simple code, and then considered the right way to interpret the output of such a model.

Session 3: Deep Learning: Text Generation

The third session discussed the most popular of all Machine Learning techniques: Deep Learning, also known as Artificial Neural Networks. It considered one of the first ways that Deep Learning was applied to Humanities data: to create generative models of language.
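
The idea of a generative model of language can be illustrated without a neural network. The sketch below is a deliberately simpler stand-in, a character-level Markov chain, and is not the Deep Learning method used in the workshop. It shares the same basic loop, though: learn from a text which character tends to follow a given context, then generate new text one character at a time. The function names and the choice of a three-character context are assumptions made for the example.

```python
import random
from collections import defaultdict

# Training text: the opening sentence of Pride and Prejudice.
text = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)


def train(text, order=3):
    """Record, for every run of `order` characters, which characters follow it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model


def generate(model, seed, length=60, rng=None):
    """Extend `seed` one character at a time by sampling from the model.

    `seed` must be exactly as long as the order the model was trained with.
    """
    rng = rng or random.Random(0)
    order = len(seed)
    output = seed
    for _ in range(length):
        followers = model.get(output[-order:])
        if not followers:  # context never seen with a successor: stop early
            break
        output += rng.choice(followers)
    return output


model = train(text, order=3)
sample = generate(model, seed=text[:3], length=60)
print(sample)
```

A neural language model replaces the lookup table with a network that can generalise to contexts it has never seen, but the generation step, repeatedly sampling the next symbol given what came before, is the same in spirit.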

Session 4: Deep Learning: Word Vectors

The final session explored another Deep Learning technique: Word Vectors. Like Topic Modelling, Word Vectors allow the scholar to model the meaning of words directly. In this case, however, instead of sorting related words into ‘topics’, the computer tries to represent the meaning of each word individually as a set of numbers.
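
The sketch below shows what it means to represent each word as a set of numbers and to compare words by comparing those numbers. The three-dimensional vectors are invented by hand for illustration; real word vectors are learned from a large corpus (for example with gensim) and typically have a hundred or more dimensions.

```python
import numpy as np

# Hypothetical, hand-written vectors. Trained vectors would come from a model.
vectors = {
    "king": np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.75, 0.20]),
    "apple": np.array([0.10, 0.20, 0.90]),
    "pear": np.array([0.15, 0.10, 0.85]),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def most_similar(word):
    """Return the other word whose vector points in the most similar direction."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


print(most_similar("king"))   # queen
print(most_similar("apple"))  # pear
print(round(cosine_similarity(vectors["king"], vectors["apple"]), 2))
```

Because similarity is measured by the angle between vectors, words used in similar contexts end up close together, and the scholar can query the model for the nearest neighbours of any word.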

Resources and Programming Notebooks

Throughout these workshops, Falk refers to programming notebooks that are hosted on Google Colab. You can view these at:

Michael Falk: Machine Learning for Humanists