Zichen Yang presenting at the 2025 intern showcase

Zichen Yang

Zichen received a first-class honours degree in Computer Science from the University of Liverpool in 2025 and completed his internship after graduating.

AI tools for large-scale text mining of PubMed

Why did you apply for this internship?

I wanted to apply my skills to innovative software that goes beyond standard CRUD applications. I had already developed some side projects and was looking for an opportunity to work within a more structured and rigorous development process. This research-focused internship provided the ideal environment for that.

What did you hope to gain in completing this project?

I wanted to learn more about software development best practices, gain first-hand experience in GPU-based parallel computation, and produce tools that people would find useful and actually use.


Project Overview

Scientific databases require structured data to be useful for research. This data, which links genes to their functions, often comes from millions of public research papers. Currently, human experts read these papers and manually extract the information, but this process is slow.

This project explores whether an AI system can automate this task. We built an AI agent that reads a scientific paper, identifies the species and genes discussed, and then proposes the relevant gene ontology annotations.
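As a rough illustration of what one structured record could look like, the sketch below models a single proposed annotation and checks that it is well formed. The class name, field names, and example values are assumptions made for illustration; they are not taken from the project's actual schema. The only fixed facts used are that GO identifiers have the form `GO:` followed by seven digits and that GO has three aspects.

```python
import re
from dataclasses import dataclass, replace

# The three Gene Ontology aspects (categories).
GO_ASPECTS = {"biological_process", "molecular_function", "cellular_component"}

# GO identifiers are "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


@dataclass(frozen=True)
class ProposedAnnotation:
    """Hypothetical record for one gene-to-GO-term suggestion."""
    pmid: str     # PubMed ID of the source paper
    species: str  # species the gene belongs to
    gene: str     # gene symbol as written in the paper
    go_id: str    # proposed GO term identifier
    aspect: str   # one of GO_ASPECTS


def is_well_formed(annotation: ProposedAnnotation) -> bool:
    """Check format only; this says nothing about biological correctness."""
    return (
        GO_ID_PATTERN.fullmatch(annotation.go_id) is not None
        and annotation.aspect in GO_ASPECTS
        and annotation.gene.strip() != ""
    )


# Illustrative values only; the PMID is a placeholder, not a real paper.
example = ProposedAnnotation(
    pmid="00000000",
    species="Homo sapiens",
    gene="TP53",
    go_id="GO:0006915",
    aspect="biological_process",
)
```

A format check like this only confirms that a record can be loaded into a database; whether the suggested term is the right one still needs expert judgement.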

Early results show that the AI finds most of the genes mentioned in a paper and suggests reasonable annotations, although these sometimes differ from the ones chosen by human experts.

What were the key results of your research project?

  • The AI agent can find most genes mentioned in a PubMed article, and sometimes finds extra ones that are not confirmed by human curators.
  • The agent suggests Gene Ontology (GO) terms for each gene. These suggestions are sometimes more specific, or in a different category, than those chosen by human curators.
  • The agent can structure data from the literature, but deciding which GO annotation is “correct” is difficult; there is no simple metric for accuracy.
  • The Model Context Protocol (MCP) servers built for this project can be reused by other agents and systems, not just this one.
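The gene-finding result can be made concrete with a simple set comparison. The sketch below is one possible way to compare the genes an agent reports for a paper with the genes a curator annotated; the function name and the example gene lists are hypothetical and are not results from the project. It matches symbols exactly, which is a simplifying assumption, since real gene names have synonyms and species-specific spellings.

```python
def compare_gene_sets(agent_genes, curator_genes):
    """Compare agent-reported genes with curator-annotated genes for one paper.

    Symbols are matched exactly after stripping whitespace (a simplification).
    """
    agent = {g.strip() for g in agent_genes}
    curated = {g.strip() for g in curator_genes}
    found = agent & curated
    return {
        # Share of curator-annotated genes that the agent also reported.
        "recall": len(found) / len(curated) if curated else 0.0,
        # Curated genes the agent did not report.
        "missed": sorted(curated - agent),
        # Genes the agent reported that no curator confirmed.
        "extra": sorted(agent - curated),
    }


# Made-up gene lists, for illustration only.
report = compare_gene_sets(
    agent_genes=["TP53", "BRCA1", "MDM2"],
    curator_genes=["TP53", "BRCA1", "ATM", "CHEK2"],
)
```

Note that an "extra" gene is not necessarily an error: it may be mentioned in the paper without having been annotated. A comparison like this also does not extend to GO terms, where a more specific term or a term from a different category can still be reasonable.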


GitHub repository: https://github.com/PeronGH/bioagent



How do you feel you have benefited from completing this internship, and has it made you consider future career paths?

This internship gave me hands-on experience with new technologies such as MCP and AI agents. I learned how research software is developed and used at a UK university. The work increased my interest in AI and software engineering, and I am now considering a career in this field in the UK.


Download presentation slides:

  Internships 2025 - Zichen Yang

