D-Terminer

About D-Terminer

Terminology

Automatic term extraction is the process of automatically identifying terminology in domain-specific text.

Terms consist of one or more words that express a specific concept in a domain.
The word terminology has two meanings. It can refer to the specialised vocabulary of a domain, i.e., the collection of all terms in that domain, or it can refer to the study of terms.
Examples of terms in the domain of heart failure are: cardiology, beta-blockers, myocardial infarction, and heart failure with reduced ejection fraction.
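To make the idea of single-word and multi-word terms concrete, here is a minimal sketch that looks up a small, hand-written term list in a sentence, preferring the longest match. It is only an illustration: the names KNOWN_TERMS, tokenize, and find_terms are hypothetical, the term list is copied from the examples above, and a dictionary lookup like this is not how D-Terminer itself extracts terms.

```python
import re

# Hypothetical term list, taken from the heart failure examples above.
# Each term is stored as a tuple of lowercase tokens.
KNOWN_TERMS = {
    ("cardiology",),
    ("beta-blockers",),
    ("myocardial", "infarction"),
    ("heart", "failure", "with", "reduced", "ejection", "fraction"),
}
MAX_LEN = max(len(term) for term in KNOWN_TERMS)


def tokenize(text):
    """Lowercase the text and split it into words, keeping internal hyphens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def find_terms(text):
    """Return the known terms found in text, in order of appearance."""
    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate first so multi-word terms win.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in KNOWN_TERMS:
                match_len = n
                break
        if match_len:
            found.append(" ".join(tokens[i:i + match_len]))
            i += match_len
        else:
            i += 1
    return found


sentence = ("Beta-blockers are prescribed for heart failure with reduced "
            "ejection fraction after a myocardial infarction.")
print(find_terms(sentence))
```

A fixed list like this can only find terms that are already known. Automatic term extraction, by contrast, aims to identify terms in text without such a list being supplied in advance.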

Domain in this context can be interpreted as any area in which people can build expertise. Common examples are the domains of medicine, technology, or finance. Equally valid domains are music, football, or cooking. Domains can be defined very broadly, e.g., medicine, or more specifically, e.g., heart failure. The ACTER dataset that is used to train the system covers four domains: corruption, dressage (horse riding), heart failure, and wind energy. The system will therefore work particularly well on those domains and on strongly related ones. However, it will also generalise to other domains.

The definition of terms leaves much room for interpretation, so the boundary between terms and general language is not always clear. Some people only consider very specific terms to be valid, while others will include more general words. For instance, in the domain of heart failure, some people will consider heart to be a valid term, while others consider this general language. The interpretation usually depends on the intended application.
This project distinguishes between several types of terms. For monolingual term extraction, you can choose to focus on all of these types or on a subset. The results are most accurate when the system is trained to extract all types (the standard setting), but you can also use a system trained to find only a subset.

Publications

This demo has been developed based on Ayla Rigouts Terryn's PhD research. More information about the methodology, evaluation, and dataset can be found in the following publications:

Contact

This demo is a work in progress, so feel free to contact us at ayla.rigoutsterryn@kuleuven.be with suggestions on how to improve it. We will also gladly answer your questions regarding this demo.

Planned improvements include: