Talebpour, Mozhgan (2025) Quantifying encoded semantics in language models through topical and bias information. Doctoral thesis, University of Essex. DOI https://doi.org/10.5526/ERR-00042465
Abstract
The use of language models has become increasingly popular in recent years, with applications spanning a wide range of Natural Language Processing (NLP) downstream tasks. Despite their widespread use, their internal mechanisms remain largely unexplored. Since language models are initially trained on domain-independent data, the information they encode is often ambiguous, making it difficult to interpret their behaviour and potential unintended biases. Although prior research has shed some light on language model architectures and the types of information they capture, much remains unknown. Existing studies suggest that in models such as BERT, lower layers capture surface linguistic features, middle layers capture syntactic information, and higher layers encode contextual meaning. This PhD thesis advances prior work by examining how different language model architectures capture semantic information, focusing specifically on the encoding of topical and bias information. It examines not only whether different language model architectures inherently encode semantically related information, but also the mechanisms through which that information is encoded. Given the centrality of self-attention in Transformer-based language models, the analysis concentrates on the role of attention weights in capturing biased and topical information. By comparing attention-weight clustering with LDA topic modelling results, the thesis helps to clarify the black-box nature of pre-trained language models, explaining how and why the contextual layers in BERT capture topical information. It also shows that differences in attention weights between biased and neutral content indicate that encoded bias in language models is complex and shaped by both the model's architecture and the training data. These insights contribute to the broader field of explainable Artificial Intelligence, supporting the development of more transparent and interpretable language models.
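To illustrate the kind of comparison the abstract describes, the following is a minimal sketch, assuming BERT via the Hugging Face transformers library and scikit-learn for clustering and topic modelling. The model name, layer choice, cluster count, feature construction, and example documents are all illustrative assumptions, not the thesis's actual experimental setup.

```python
# Hypothetical sketch: summarise BERT attention weights per document,
# cluster those summaries, and measure agreement with LDA topics.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import adjusted_mutual_info_score

# Toy corpus with two rough topics (illustrative only).
docs = [
    "the court ruled on the appeal",
    "the striker scored in the final minute",
    "the judge dismissed the case",
    "the team won the championship match",
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

features = []
for doc in docs:
    inputs = tokenizer(doc, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    # out.attentions is a tuple of (batch, heads, seq, seq) tensors, one per
    # layer. Take a higher ("contextual") layer and average over positions
    # to get one fixed-size per-head vector per document (an assumption).
    attn = out.attentions[-2][0]                      # (heads, seq, seq)
    features.append(attn.mean(dim=(1, 2)).numpy())    # (heads,)
X = np.stack(features)

# Cluster the per-document attention summaries.
attn_clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Fit LDA on raw token counts and take each document's dominant topic.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda_topics = lda.fit_transform(counts).argmax(axis=1)

# Agreement between attention clusters and LDA topics (1.0 = identical
# partitions, ~0.0 = no relation beyond chance).
print(adjusted_mutual_info_score(lda_topics, attn_clusters))
```

A higher adjusted mutual information between the two partitions would be one way to operationalise the claim that contextual layers encode topical structure; on a corpus this small the score is noisy and serves only to show the shape of the pipeline.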
| Field | Value |
|---|---|
| Item Type | Thesis (Doctoral) |
| Subjects | Q Science > Q Science (General) |
| Divisions | Faculty of Science and Health > School of Computer Science and Electronic Engineering |
| Depositing User | Mozhgan Talebpour |
| Date Deposited | 05 Jan 2026 12:41 |
| Last Modified | 05 Jan 2026 12:41 |
| URI | http://repository.essex.ac.uk/id/eprint/42465 |
Available files
Filename: Mozhgan_PhD_Thesis_v1.pdf