Termset Expansion Macros

NLPQL supports a set of macros for termset generation. The macros provide a compact syntax for representing lists of synonyms and lexical variants (plurals and verb inflections). The macros also support the concept of a “namespace”, so that terms can be generated from different sources.

The use of termset expansion macros is optional. They are provided purely for convenience, as a means to generate and suggest additional synonyms.

Syntax

The macro syntax is namespace.function(args), where the namespace is either Clarity or OHDSI. The argument is either a single term in double quotes or a comma-separated list of terms surrounded by brackets:

namespace.function("term")
namespace.function(["term1", "term2", ..., "termN"])

If the namespace is omitted, it defaults to Clarity. The supported macros are:

Macro                    Meaning
Clarity.Synonyms         Generate a list of synonyms from WordNet
Clarity.Plurals          Generate a list of plural forms
Clarity.VerbInflections  Generate inflections for the verb in base form
OHDSI.Synonyms           Generate a list of OHDSI synonyms for the concept
OHDSI.Ancestors          Generate all OHDSI ancestor concepts
OHDSI.Descendants        Generate all OHDSI descendant concepts
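
To make the syntax rules concrete, here is a minimal Python sketch that splits a single, non-nested macro call into its namespace, function, and list of terms, applying the Clarity default when no namespace is given. The function parse_macro, the SUPPORTED table, and the regular expression are hypothetical names introduced for illustration; they are not part of ClarityNLP, and the real parser may work quite differently. The sketch recognizes only the macros listed in the table.

```python
import ast
import re

# Macros from the table, keyed by namespace (illustrative, not ClarityNLP code).
SUPPORTED = {
    "Clarity": {"Synonyms", "Plurals", "VerbInflections"},
    "OHDSI": {"Synonyms", "Ancestors", "Descendants"},
}

# Optional "namespace." prefix, a function name, and a parenthesized argument.
MACRO_RE = re.compile(r'^\s*(?:(\w+)\.)?(\w+)\((.*)\)\s*$', re.DOTALL)


def parse_macro(text):
    """Split a non-nested macro call into (namespace, function, terms)."""
    match = MACRO_RE.match(text)
    if match is None:
        raise ValueError(f"not a macro call: {text!r}")
    namespace, function, arg = match.groups()
    namespace = namespace or "Clarity"  # an omitted namespace means Clarity
    if function not in SUPPORTED.get(namespace, set()):
        raise ValueError(f"unsupported macro: {namespace}.{function}")
    # The argument is a double-quoted term or a bracketed list of such terms;
    # both forms happen to be valid Python literals, so literal_eval reads them.
    value = ast.literal_eval(arg)
    terms = [value] if isinstance(value, str) else list(value)
    return namespace, function, terms


print(parse_macro('Synonyms("heart attack")'))
print(parse_macro('OHDSI.Ancestors(["fever", "cough"])'))
```

Normalizing the single-term form to a one-element list lets downstream code treat both argument forms the same way.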

The synonym finder examines the macro argument(s) and attempts to identify the nouns, adjectives, and adverbs. It generates synonyms for each one that it finds and returns the Cartesian product [1] of all possibilities. This process can cause a combinatorial explosion in the number of results. To illustrate, consider this example:

The human walks the pet.

If the synonyms for human are man, woman, boy, girl and the synonyms for pet are dog, cat, then 4*2 = 8 results will be generated, in addition to the original:

The human walks the pet.
The man walks the dog.
The woman walks the dog.
The boy walks the dog.
The girl walks the dog.
The man walks the cat.
The woman walks the cat.
The boy walks the cat.
The girl walks the cat.

Expanding terms that have many synonyms could generate hundreds or perhaps thousands of result strings. We therefore recommend caution with synonym generation, limiting its use to single terms or short strings.
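
The growth in the number of results can be reproduced with a short Python sketch. It is only an illustration of the Cartesian-product idea using the example sentence, not the ClarityNLP implementation; the expand function and the synonym dictionary are hypothetical.

```python
from itertools import product


def expand(tokens, synonyms):
    """Return every string formed by substituting synonyms for tokens."""
    # Each token contributes its synonym list, or only itself if it has none.
    choices = [synonyms.get(token, [token]) for token in tokens]
    return [" ".join(combo) for combo in product(*choices)]


synonyms = {
    "human": ["man", "woman", "boy", "girl"],
    "pet": ["dog", "cat"],
}

results = expand("The human walks the pet".split(), synonyms)
for sentence in results:
    print(sentence)
print(len(results))  # 4 * 2 = 8
```

The result count is the product of the synonym list sizes, so a five-word string in which every word has ten synonyms would already yield 10^5 = 100,000 strings.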

Both single-word and multiword terms can be passed to a macro. A macro can be applied to an entire list of terms or to only selected terms within a list:

Synonyms(["heart", "heart attack", "heart disease"])
"heart", Synonyms("heart attack"), "heart disease",

IMPORTANT NOTE: the VerbInflections macro requires that the verb be given in base form (also called “raw infinitive” form, “dictionary” form, or “bare” form). It is not possible to unambiguously determine the base form of a verb from an arbitrary inflection, so the ClarityNLP verb inflector requires the base form as input. See the documentation for the verb inflector for more on this topic.

Macro Nesting

Macros can also be nested:

Clarity.LexicalVariants(OHDSI.Synonyms(["myocardial infarction"]))
Plurals(Synonyms("neoplasm"))

The nesting depth is limited to two, as these examples illustrate.

API

The API endpoint nlpql_expander allows users to view the results of macro expansion. For instance, to expand macros in the NLPQL file macros.nlpql, HTTP POST the file to the nlpql_expander API endpoint with this cURL [2] command:

curl -i -X POST http://localhost:5000/nlpql_expander -H "Content-Type: text/plain" --data-binary "@macros.nlpql"

Another HTTP client, such as Postman [3], could also be used to POST the file.
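
The same request can also be sent from Python using only the standard library. The sketch below assumes the same host, port, and Content-Type header as the cURL command; the helper names build_expander_request and expand_file are hypothetical, and decoding the response as UTF-8 text is an assumption, since the response format is not described here.

```python
import urllib.request


def build_expander_request(nlpql_bytes, base_url="http://localhost:5000"):
    """Build a POST request carrying raw NLPQL bytes to nlpql_expander."""
    return urllib.request.Request(
        url=f"{base_url}/nlpql_expander",
        data=nlpql_bytes,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def expand_file(path, base_url="http://localhost:5000"):
    """POST the NLPQL file at `path` and return the response body as text."""
    # Read the file as bytes so it is sent unmodified, like --data-binary.
    with open(path, "rb") as infile:
        request = build_expander_request(infile.read(), base_url)
    with urllib.request.urlopen(request) as response:
        # Assumption: the server replies with UTF-8 encoded text.
        return response.read().decode("utf-8")


# Example usage (requires a running ClarityNLP instance):
# print(expand_file("macros.nlpql"))
```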

Examples

Here is an example that illustrates the use of the NLPQL macros.

Consider this termset for symptoms related to influenza:

termset FluTermset: [
"coughing",
OHDSI.Synonyms("fever"),
Synonyms("body ache"),
VerbInflections("have fever"),
];

After macro expansion, the termset becomes:

termset FluTermset: [
"coughing",
"febrile", "fever", "fever (finding)", "pyrexia", "pyrexial",
"body ache", "body aching", ... "torso aching", "trunk ache", "trunk aching",
"had fever", "has fever", "have fever", "having fever",
];

Some synonyms for “body ache” have been omitted. The expanded result will generally need editing to remove irrelevant synonyms. The macros therefore fit well into an iterative development process for termsets: use them to generate an initial list of terms, then prune and refine that list.