Clarity.ngram
Description
Task that aggregates n-grams across the selected document set, using textacy. There is no need to specify final on this task: every n-gram that occurs at least as often as the minimum frequency (min_freq) appears in the final result.
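To illustrate what "aggregating across the document set" means, here is a minimal pure-Python sketch. It is not ClarityNLP's or textacy's code. It assumes each document has already been tokenized into a list of lowercase strings, and it uses the standard library's Counter in place of textacy. Counts are summed over all documents, and only n-grams that reach min_freq are kept.

```python
from collections import Counter
from typing import Iterable

def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every run of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def aggregate_ngrams(documents: Iterable[list[str]], n: int = 2, min_freq: int = 1) -> Counter:
    """Sum n-gram counts over all documents, keeping those seen at least min_freq times."""
    totals: Counter = Counter()
    for tokens in documents:
        # Join each n-gram's tokens with a space so it can be counted as one string.
        totals.update(" ".join(gram) for gram in ngrams(tokens, n))
    return Counter({text: count for text, count in totals.items() if count >= min_freq})

docs = [["patient", "is", "male"], ["patient", "is", "female"]]
print(aggregate_ngrams(docs, n=2, min_freq=2))  # Counter({'patient is': 2})
```

"patient is" occurs once in each document, so its aggregated count of 2 meets min_freq. "is male" and "is female" occur only once overall and are dropped.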
Example
define demographicsNgram:
    Clarity.ngram({
      termset: [DemographicTerms],
      "n": "3",
      "filter_nums": false,
      "filter_stops": false,
      "filter_punct": true,
      "min_freq": 2,
      "lemmas": true,
      "limit_to_termset": true
    });
Arguments
Name | Type | Required | Notes |
---|---|---|---|
termset | termset | No | |
documentset | documentset | No | |
cohort | cohort | No | |
n | int | No | Default = 2; Number of tokens in each n-gram |
filter_nums | bool | No | Default = false; Exclude numbers from n-grams |
filter_stops | bool | No | Default = true; Exclude stop words |
filter_punct | bool | No | Default = true; Exclude punctuation |
lemmas | bool | No | Default = true; Convert word tokens to their lemmas |
limit_to_termset | bool | No | Default = false; Only include n-grams that contain at least one term from termset |
min_freq | int | No | Default = 1; Minimum number of occurrences for an n-gram to appear in the final result |
Results
Name | Type | Notes |
---|---|---|
text | str | The n-gram detected |
count | int | The number of occurrences of the n-gram |
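As a sketch of the result shape, the function below turns aggregated counts into rows carrying the two documented fields, text and count. The sort order (descending count, then alphabetical) is an assumption made for readability, and the function name to_results is hypothetical rather than part of ClarityNLP.

```python
from collections import Counter

def to_results(counts: Counter) -> list[dict]:
    """Convert n-gram counts into rows with the documented text and count fields."""
    rows = [{"text": text, "count": count} for text, count in counts.items()]
    # Assumed ordering: most frequent first, ties broken alphabetically by text.
    return sorted(rows, key=lambda row: (-row["count"], row["text"]))

print(to_results(Counter({"age 55": 1, "male smoker": 2})))
# [{'text': 'male smoker', 'count': 2}, {'text': 'age 55', 'count': 1}]
```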