Python for Web Scraping - Digital Humanities

This workshop starts with an overview of using social media data for applications in digital humanities and an introduction to Python. It then covers Twitter data collection using the Twitter Developer API, data preprocessing and machine learning methods to process the collected data.

22 Jul 2020 1 p.m. — 2 p.m.

Overview

Real-time social media data from platforms such as Twitter, Facebook and LinkedIn, as well as query volumes from search engines, are being used to track real-world phenomena across a wide range of topics. Because these data are generated by large groups of users, they make it possible to observe public opinion and activity in real time.

This workshop starts with an overview of using social media data to measure the impact of real-life events and an introduction to Python. It then covers Twitter data collection using the Twitter Developer API, data preprocessing and machine learning methods to process the collected data.

Finally, it introduces topic modelling, a powerful tool for digital humanities research, and shows some use cases of how it can be applied to solve real-world problems.

The session is intended for beginner or intermediate Python users; only a small amount of Python knowledge is required to join.

This session is for those working or studying at one of the N8 Research Partnership Universities.

This session will be delivered by Tahir Aduragba, Zhongtian Sun and Jialin Yu from Durham University.

Zhongtian Sun - I graduated from the University of Nottingham (Bachelor's) and Warwick Business School (Master's), and I am a first-year PhD student in the Department of Computer Science at Durham University. I am interested in knowledge representation learning, graph neural networks and machine learning.

Tahir Aduragba - I'm a PhD student at the Department of Computer Science, Durham University. My research interests are in deep learning, natural language processing and data science. Specifically, I'm interested in the prediction of infectious disease spread on social media. I have a bachelor's degree in Computer Science from Brunel University London and a master's degree in Information Systems from the University of Manchester.

Jialin Yu - Jialin received his BEng from the University of Nottingham and his MSc from UCL, and is now a second-year PhD student in the Department of Computer Science at Durham University. His research is in probabilistic modelling and machine learning, with a focus on text data. He was a demonstrator for the second-year module "Theory of Computation" at Durham University from 2019 to 2020.