Research Project: From Modern Tweets to Victorian Broadsheets: Sentiment Analysis Software for Victorian Literature
Why did you apply for this internship?
I applied for this internship because it offered a valuable and unique opportunity to combine my passion for English Language with my interest in computational solutions to research challenges.
I was particularly excited by the project I chose because the problems Emily Middleton identified in conducting sentiment analysis were challenges that I had also faced.
What did you hope to gain in completing this project?
I hoped to gain a comprehensive understanding of the challenges faced by research software engineers and the methods they use to solve them. I also wanted to build confidence in coding so that I could create computational research solutions.
Project Overview
Many existing digital tools for sentiment analysis are built with a modern sentiment lexicon (a dictionary of lexical items labelled with sentiment values such as positive, neutral, and negative) and require the user to know a programming language.
Reliance on a modern sentiment lexicon excludes context-specific or evolving terms, so researchers interested in a specific historical period, literary genre, or form are often unable to tailor their analysis to their research material.
To provide a digital solution to this problem, I developed prototype sentiment analysis software that allows users to customise and compare lexicons, so that the analysis can be tailored to their research material.
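To illustrate the idea, the following is a minimal sketch of lexicon-based scoring with interchangeable lexicons. It is not the prototype's actual code: the function names, the toy lexicons, and every score in them are hypothetical and chosen only for illustration. It shows how a word that is missing from one lexicon contributes nothing to the result, and how the same text can be scored side by side with several lexicons.

```python
import re

# Hypothetical toy lexicons mapping words to scores between -1 and 1.
# The entries and values are invented for illustration only.
MODERN_LEXICON = {"merry": 0.6, "dreadful": -0.7}
VICTORIAN_LEXICON = {"gay": 0.7, "merry": 0.6, "dreadful": -0.7}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_text(text, lexicon):
    """Average the scores of the tokens found in the lexicon.

    Tokens missing from the lexicon are ignored. Returns 0.0 when no
    token in the text has an entry.
    """
    scores = [lexicon[token] for token in tokenize(text) if token in lexicon]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def label(score, threshold=0.1):
    """Turn a numerical score into a positive/neutral/negative label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def compare_lexicons(text, lexicons):
    """Score the same text with each named lexicon."""
    return {name: score_text(text, lexicon) for name, lexicon in lexicons.items()}


if __name__ == "__main__":
    sample = "A gay and merry party"
    results = compare_lexicons(
        sample, {"modern": MODERN_LEXICON, "victorian": VICTORIAN_LEXICON}
    )
    for name, score in results.items():
        print(name, round(score, 2), label(score))
```

Keeping each lexicon as a plain dictionary means a researcher could add, remove, or re-score entries without touching the scoring logic.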
What were the key results of your research project?
- I developed prototype sentiment analysis software that allows users to customise and compare lexicons, so that the analysis can be tailored to their research material.
- I created a prototype historically accurate Victorian sentiment lexicon by converting an open-source sentence-level Victorian lexicon to a word-level sentiment lexicon.
- I improved the accuracy of this word-level lexicon using a large corpus of Charles Dickens's Victorian-era writing. Although the resulting sentiment lexicon showed some accuracy when tested in the prototype software, I am aware that it carries my own personal bias as well as that of the creator of the original sentence-level lexicon. I suggest that future iterations of the methodology use a large quantity of historically accurate annotated data as seed data to improve overall accuracy and reduce bias in the lexicon.
- I began the internship with no prior programming experience, but over the eight weeks I produced a working software prototype with a graphical user interface, using Python.
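The conversion from sentence-level to word-level sentiment mentioned in the results above could be approached in several ways. The sketch below shows one simple possibility, assumed here purely for illustration and not taken from the project: each word receives the average score of the sentences it appears in, and words seen in too few sentences are dropped. The function name, the `min_count` parameter, and the example sentences are all hypothetical.

```python
import re
from collections import defaultdict


def build_word_lexicon(labelled_sentences, min_count=2):
    """Derive word-level scores from (sentence, score) pairs.

    Each word is scored as the mean of the scores of the sentences that
    contain it. Words found in fewer than `min_count` sentences are
    left out because a single sentence is weak evidence.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sentence, score in labelled_sentences:
        # Use a set so a word repeated within one sentence counts once.
        for word in set(re.findall(r"[a-z']+", sentence.lower())):
            totals[word] += score
            counts[word] += 1
    return {
        word: totals[word] / counts[word]
        for word in totals
        if counts[word] >= min_count
    }


if __name__ == "__main__":
    # Invented example data: 1.0 is positive, -1.0 is negative.
    sentences = [
        ("The merry child laughed", 1.0),
        ("A merry feast", 1.0),
        ("The wretched child wept", -1.0),
    ]
    print(build_word_lexicon(sentences))
```

A lexicon built this way inherits any bias in the sentence labels, and common function words such as "the" end up with scores too, so in practice some filtering and manual review would be needed.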
How do you feel you have benefited from completing this internship, and has it made you consider future career paths?
Before beginning my project, I had no coding experience. However, over the course of the eight-week internship, I received invaluable guidance from the Research Computing team, which allowed me to create Python-based prototype sentiment analysis software.
This introduction to software engineering has provided me with the skills to develop existing digital research tools beyond sentiment analysis to support my research in the final year of my degree.
Although I was previously unfamiliar with research software engineering, working within the Research Computing team at the University of Leeds has provided me with an understanding of the value and impact of delivering high-quality research solutions. I have found the interdisciplinary nature of the project both exciting and challenging, and am now keen to incorporate more computational methodology into my language and literature studies.
Throughout the internship, I have also had the opportunity to attend networking events such as the N8 Conference in York and the Research Culture’s Communities of Practice Event at the University of Leeds. By attending poster presentations and lectures and taking part in roundtable discussions at these events, I have gained insight into the valuable work of research software engineers across the country.