Speaker: Professor Shu-Kai Hsieh (Graduate Institute of Linguistics, NTU)
Topic: From Data Semantics to Semantic Data: A Computational Linguistic Reframing of Meaning
Time: Dec 12 (Friday), 2025, 10:40-11:30
Place: 4F-427, Assembly Building I
Abstract
This talk examines how the study of linguistic meaning has shifted from discrete word-sense inventories to high-dimensional contextual representations, and what this transition implies for statistical modeling in the era of large language models. I begin with classical approaches to word-sense representation and disambiguation, where semantic distinctions were explicitly annotated, categorized, and computed over symbolic structures. These methods assumed that meaning could be decomposed into stable units and that statistical models operated on data whose semantics were externally defined.
Recent developments in contextual embeddings challenge this assumption. Instead of treating semantics as labels attached to data, semantic structure now emerges from the distributional geometry of large corpora. Meaning becomes dynamic, situated, and encoded implicitly in vector spaces shaped by context.
To illustrate this shift, I draw on two empirical domains: diachronic sentiment analysis and financial market emotion modeling. Both demonstrate how semantic trajectories, once manually defined, can now be quantified through vector drift, contextual clustering, and temporal embedding alignment. I conclude by arguing that this shift reframes fundamental questions in computational linguistics and in other data-intensive disciplines concerned with how information is represented, processed, and interpreted.
