Cognitive Aspects of the Lexicon

CogALex 2020



1 Background
Words are important: they support us in many tasks (thinking, searching, memorizing and communicating). Hence, one may wonder how to build tools supporting their learning and usage (access/navigation). Alas, the answer is not quite as straightforward as it may seem. It depends on various factors: the questioner's background (lexicography, psychology, computer science), the task (production/reception), and the material support (hardware). Words in books, computers and the human brain are not the same. Being aware of this, different communities have focused on different issues (dictionary building; creation of navigational tools; representation and organization of words; time course for accessing a word, etc.), yet their views and respective goals have changed considerably over time.
The lexicon is no longer considered a static entity in which discrete units (words) are organized alphabetically (database view). Instead, dictionaries are now viewed dynamically, i.e., as lexical graphs whose entities are linked in various ways (topical relations; associations) and whose link weights may vary over time. While lexicographers view words as products (holistic entities), psychologists and neuroscientists view them as processes (decomposition), involving various steps or layers (representations) between an input and an output.
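To make the graph view concrete, here is a minimal sketch of a lexical graph with typed, weighted links whose weights can change over time. It is purely illustrative: the class name `LexicalGraph`, its methods, the relation labels and the numeric weights are assumptions made for this example and do not describe any particular resource.

```python
from collections import defaultdict


class LexicalGraph:
    """Toy lexical graph: words are nodes, typed links carry a weight.

    Hypothetical structure for illustration only.
    """

    def __init__(self):
        # word -> {(relation, target): weight}
        self._links = defaultdict(dict)

    def add_link(self, source, relation, target, weight=1.0):
        """Create or overwrite a typed link from source to target."""
        self._links[source][(relation, target)] = weight

    def reinforce(self, source, relation, target, delta=0.1):
        """Increase a link weight, e.g. after the link has been used."""
        key = (relation, target)
        self._links[source][key] = self._links[source].get(key, 0.0) + delta

    def neighbours(self, word, relation=None):
        """Return (target, relation, weight) triples, strongest first."""
        links = self._links.get(word, {})
        items = [
            (target, rel, weight)
            for (rel, target), weight in links.items()
            if relation is None or rel == relation
        ]
        return sorted(items, key=lambda item: item[2], reverse=True)


graph = LexicalGraph()
graph.add_link("coffee", "association", "cup", 0.6)
graph.add_link("coffee", "association", "black", 0.5)
graph.add_link("coffee", "hypernym", "beverage", 0.9)

# Weights are not fixed: repeated use strengthens a link and reorders neighbours.
graph.reinforce("coffee", "association", "black", 0.2)
for target, rel, weight in graph.neighbours("coffee", relation="association"):
    print(f"coffee --{rel}--> {target} ({weight:.2f})")
```

In contrast to an alphabetically ordered list, navigation here follows links from a source word, and the ranking of its neighbours shifts as the weights are updated.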
Computational linguists have their own ways of looking at words, and their proposals have also changed quite a bit during the last decade. Discrete count-based vector representations have successively been replaced by continuous vectors (i.e., word embeddings) and then by contextualized representations based on language models. The latter are generally more powerful than the earlier forms, as they are able to account for context-dependent ambiguity, and they outperform static models (including word embeddings) in a broad range of tasks.
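As an illustration of the oldest of these representations, the following sketch builds count-based co-occurrence vectors from a tiny corpus and compares words by cosine similarity. The corpus, the window size and the function names `cooccurrence_vectors` and `cosine` are assumptions chosen for this example; real systems would use far larger corpora and typically some weighting scheme.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (made up for illustration).
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
]
vectors = cooccurrence_vectors(corpus)

# Words occurring in similar contexts receive similar vectors.
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["milk"]))
```

Such a model assigns a single vector to each word form, whatever the sentence it occurs in, which is the limitation that contextualized representations address.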
As one can see, different communities look at words from different angles, which can be an asset, as complementary views may help us to broaden and deepen our understanding of this fundamental cognitive resource. Yet, this diversity of perspectives can also be a problem, in particular if the field is moving rapidly, as in our case. It thus becomes harder and harder for everyone, including experts, to remain fully informed about the latest changes (state of the art). This is one of the reasons why we organize this workshop. More precisely, our goal is not only to keep people informed without crushing them under the information glut, but also to help them to perceive clearly what is new, relevant, and hence important. Last, but not least, we would like to connect people from different communities in the hope that this may help them to gain new insights or inspiration.
2 Scope and Topics
This workshop is about possible enhancements of lexical resources (representation, organization of the data, etc.). To this end, we invite researchers to submit their contributions. The idea is to discuss the limitations of existing resources and to explore possible enhancements that take into account the users' and the engineers' needs (computational aspects).
As in the past, we again propose a 'shared task'. This time, the goal is to provide a common benchmark for testing lexical representations for the automatic identification of lexical semantic relations (synonymy, antonymy, hypernymy, part-whole meronymy) in various languages (English, Chinese, and so on).
For this workshop, we solicit papers on topics including, but not limited to, the following, each of which can be considered from various points of view: linguistics (lexicography, computational or corpus linguistics), neuro- or psycholinguistics (tip-of-the-tongue problem, word associations), network-related sciences (vector-based approaches, graph theory, small-world problem), and so on.
Organization, i.e. structure of the lexicon
• Micro- and macrostructure of the lexicon;
• Indexical categories (taxonomies, thesaurus-like topical structures, etc.);
• Map of the lexicon (topology) and relations between words (word associations).
The meaning of words and how to reveal it
• Lexical representation (holistic, decomposed);
• Meaning representation (concept based, primitives);
• Distributional semantics (count models, neural embeddings, etc.).
Analysis of the conceptual input given by a dictionary user
• What information do language producers typically provide when looking for a word (terms, relations)?
• What kind of relational information do they give: typed or untyped relations?
• Which relations are typically used?
Methods for crafting dictionaries or indexes
• Manual, automatic or collaborative building of dictionaries and indexes (crowdsourcing, serious games, etc.);
• Extraction of associations from corpora to build semantic networks supporting navigation;
• (Semi-)automatic induction of the link type (e.g., synonym, hypernym, meronym, ...).
Creation of new types of dictionaries
• Concept dictionary;
• Dictionary of larger segments than words (clauses, phrasal elements);
• Dictionary of patterns or concept-patterns;
• Dictionary of syllables.
Dictionary access (navigation and search strategies), interface issues
• Search based on sound (rhymes), meaning or contextually related words (associations);
• Determination of appropriate search space based on the user’s cognitive state (information available at the onset: query) and meta-knowledge (knowledge concerning the relationship between the input and the target word);
• Identification of typical word access strategies (navigational patterns) used by people;
• Interface problems, data visualization.
3 Workshop Submissions
The workshop features two tracks:
* A regular research track, where the submissions must be substantially original.
* A shared task track, with submissions consisting of system description papers.
The regular research track submissions should follow one of two formats:
* Long papers (9 content pages + references) should report on solid and finished research including new experimental results, resources and/or techniques.
* Short papers (4 content pages + references) should report on small experiments, focused contributions, ongoing research, negative results and/or philosophical discussion.
Submissions must be anonymized, conform to the COLING style sheet (https://coling2020.org/pages/call_for_papers), and be submitted via the workshop's submission site (https://www.softconf.com/coling2020/CogALex/).
4 Invited Speaker
Alex Arenas (http://deim.urv.cat/~alexandre.arenas/)
Alephsys Lab, Computer Science & Mathematics, Universitat Rovira i Virgili, 43007 Tarragona, Spain
5 Workshop Organizers
Michael Zock (LIS, CNRS, Aix-Marseille University, Marseille, France)
Alessandro Lenci (Computational Linguistics Laboratory, University of Pisa, Italy)
Enrico Santus (MIT Computer Science & AI Lab, Boston, USA)
Emmanuele Chersoni (Hong Kong Polytechnic University, Hong Kong, China)
6 Program Committee
See: https://sites.google.com/view/cogalex-2020/home/programme-committee
7 Contacts
For general questions, please get in touch with Michael Zock (michael.zock@lis-lab.fr).
Concerning the shared task, please contact Enrico Santus (esantus@gmail.com) or Emmanuele Chersoni (emmanuelechersoni@gmail.com).