How well do distributed representations convey contextual lexical semantics: a Thesis Proposal
Abstract
Modern neural networks (NNs), trained on extensive raw sentence data, construct distributed representations by compressing individual words into dense, continuous, high-dimensional vectors. These representations are specifically designed to capture the varied meanings, including ambiguity, of word occurrences within context. In this thesis, our objective is to examine the efficacy of distributed representations from NNs in encoding lexical meaning. Initially, we identify four sources of ambiguity - homonymy, polysemy, semantic roles, and multifunctionality - based on the relatedness and similarity of meanings influenced by context. Subsequently, we aim to evaluate these sources by collecting or constructing multilingual datasets, leveraging various language models, and employing linguistic analysis tools.
- Publication:
-
arXiv e-prints
- Pub Date:
- June 2024
- DOI:
- 10.48550/arXiv.2406.00751
- arXiv:
- arXiv:2406.00751
- Bibcode:
- 2024arXiv240600751L
- Keywords:
-
- Computer Science - Computation and Language
- E-Print:
- 6 pages