Clarity.CQLExecutionTask
Description
This is a custom task that allows ClarityNLP to execute CQL (Clinical Quality Language) queries embedded in NLPQL files. ClarityNLP directs CQL code to a FHIR (Fast Healthcare Interoperability Resources) server, which runs the query and retrieves structured data for a single patient. The data returned from the CQL query appears in the job results for the NLPQL file.
The CQL query requires several FHIR-related parameters, including the patient ID and the URL of the FHIR server; the full set is described below. These parameters can either be specified in the NLPQL file itself or supplied by ClarityNLP as a Service (ClarityNLPaaS).
Documentsets for Unstructured and Structured Data
ClarityNLP was originally designed to process unstructured text documents. In a typical workflow the user specifies a documentset in an NLPQL file, along with the tasks and NLPQL expressions needed to process the documents. ClarityNLP issues a Solr query to retrieve the matching documents, which it divides into batches. ClarityNLP launches a separate task per batch to process the documents in parallel. The number of tasks spawned by the Luigi scheduler depends on the number of unstructured documents returned by the Solr query. In general, the results obtained include data from multiple patients.
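The relationship between document count and task count can be pictured with a short Python sketch. The helper name and the batch size of 10 are illustrative assumptions, not ClarityNLP settings; the only point is that one task per batch means the task count grows with the number of documents returned.

```python
import math

def split_into_batches(doc_ids, batch_size):
    """Divide a list of document IDs into consecutive batches.

    Hypothetical helper: batch_size is an assumed parameter, not a
    documented ClarityNLP setting.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [doc_ids[i:i + batch_size]
            for i in range(0, len(doc_ids), batch_size)]

# 25 matching documents with an assumed batch size of 10
doc_ids = [f"doc{n}" for n in range(25)]
batches = split_into_batches(doc_ids, 10)

# One task per batch, so the task count is ceil(25 / 10)
task_count = len(batches)
assert task_count == math.ceil(len(doc_ids) / 10)
```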
ClarityNLP can also support single-patient structured CQL queries with a few
simple modifications to the documentset. For CQL queries the documentset must
be specified in the NLPQL file so that it limits the unstructured documents to
those for a single patient only. FHIR is essentially a single-patient,
read-only data retrieval standard. Each patient with data stored on a FHIR
server has a unique patient ID. This ID must be used in the documentset
statement and in the Clarity.CQLExecutionTask body itself, as illustrated
below. The documentset specifies the unstructured data for the patient, and
the CQL query specifies the structured data for the patient.
Relevant FHIR Parameters
These parameters are needed to connect to the FHIR server, evaluate the CQL
statements, and retrieve the results. They can be provided directly as
parameters in the CQLExecutionTask statement (see below), or indirectly
via ClarityNLPaaS:
Parameter | Meaning |
---|---|
cql_eval_url | URL of the FHIR server’s CQL Execution Service |
patient_id | Unique ID of patient whose data will be accessed |
fhir_data_service_uri | FHIR service base URL |
fhir_terminology_service_endpoint | Set to the string "Terminology Service Endpoint" |
fhir_terminology_service_uri | URI for a service that conforms to the FHIR Terminology Service Capability Statement |
fhir_terminology_user_name | Username for terminology service authentication |
fhir_terminology_user_password | Password for terminology service authentication |
The terminology user name and password parameters may not be required, depending on whether or not the terminology server enforces password authentication.
Time Filtering
This task supports a time filtering capability for the CQL query results. Two
optional parameters, time_start and time_end, can be used to specify a time
window. Any results whose timestamps lie outside of this window will be
discarded. If the time window parameters are omitted, all results from the
CQL query will be kept.
The time_start and time_end parameters must be quoted strings with syntax as
follows:
DATETIME(YYYY, MM, DD, HH, mm, ss)
DATE(YYYY, MM, DD)
EARLIEST()
LATEST()
An optional offset in days can be added to or subtracted from these:
LATEST() - 7d
DATE(2010, 7, 15) + 20d
The offset consists of digits followed by a d character, indicating days.
Both time_start and time_end are assumed to be expressed in Coordinated Universal Time (UTC).
Here are some time window examples:
1. Discard any results not occurring in March 2016:
"time_start":"DATE(2016, 03, 01)",
"time_end":"DATE(2016, 03, 31)"
2. Keep all results within one week of the most recent result:
"time_start":"LATEST() - 7d",
"time_end":"LATEST()"
3. Keep all results within a window of 20 days beginning July 4, 2018, at 3 PM:
"time_start":"DATETIME(2018, 7, 4, 15, 0, 0)",
"time_end":"DATETIME(2018, 7, 4, 15, 0, 0) + 20d"
Note that the strings to the left and right of the colon must be surrounded by quotes.
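To make the time window syntax concrete, here is a minimal Python sketch that resolves a time_start or time_end string to a UTC datetime and filters a list of timestamps. It is not ClarityNLP's own parser: the function names are hypothetical, DATE(...) is assumed to mean midnight UTC on that day, and the window bounds are treated as inclusive, in line with the Arguments table (results earlier than time_start or later than time_end are discarded).

```python
import re
from datetime import datetime, timedelta, timezone

# Base expression, then an optional "+ Nd" or "- Nd" offset in days.
_TIME_EXPR = re.compile(
    r"^\s*(DATETIME|DATE|EARLIEST|LATEST)\s*\(([^)]*)\)"
    r"\s*(?:([+-])\s*(\d+)d)?\s*$"
)

def resolve_time(expr, timestamps):
    """Turn a time_start/time_end string into a UTC datetime.

    `timestamps` holds the result datetimes that EARLIEST() and
    LATEST() refer to. Illustrative only; not ClarityNLP's parser.
    """
    match = _TIME_EXPR.match(expr)
    if match is None:
        raise ValueError(f"unrecognized time expression: {expr!r}")
    name, arg_text, sign, days = match.groups()
    args = [int(a) for a in arg_text.split(",") if a.strip()]

    if name == "DATETIME" and len(args) == 6:
        base = datetime(*args, tzinfo=timezone.utc)
    elif name == "DATE" and len(args) == 3:
        # Assumption: a bare date means midnight UTC on that day.
        base = datetime(*args, tzinfo=timezone.utc)
    elif name == "EARLIEST" and not args:
        base = min(timestamps)
    elif name == "LATEST" and not args:
        base = max(timestamps)
    else:
        raise ValueError(f"wrong arguments in: {expr!r}")

    if days is not None:
        delta = timedelta(days=int(days))
        base = base + delta if sign == "+" else base - delta
    return base

def filter_by_window(timestamps, time_start=None, time_end=None):
    """Keep timestamps inside the window; an omitted bound is open."""
    if not timestamps:
        return []
    start = resolve_time(time_start, timestamps) if time_start else None
    end = resolve_time(time_end, timestamps) if time_end else None
    return [t for t in timestamps
            if (start is None or t >= start)
            and (end is None or t <= end)]

# Made-up result timestamps, all in UTC
results = [
    datetime(2016, 2, 27, 8, 0, tzinfo=timezone.utc),
    datetime(2016, 3, 10, 12, 30, tzinfo=timezone.utc),
    datetime(2016, 3, 15, 9, 0, tzinfo=timezone.utc),
]
march = filter_by_window(results, "DATE(2016, 03, 01)", "DATE(2016, 03, 31)")
last_week = filter_by_window(results, "LATEST() - 7d", "LATEST()")
```

With this sample data both windows keep the two March results and drop the February one. Under the midnight assumption, a result later in the day on March 31 would fall outside the first window.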
Example
Here is an example of how to use the CQLExecutionTask directly, without
using ClarityNLPaaS. In the example below there is a documentset creation
statement followed by an invocation of the CQLExecutionTask. The
documentset consists of all indexed documents for patient 99999 with a
source field equal to MYDOCS. These documents are specified explicitly
in the CQLExecutionTask invocation that follows, to limit the source
documents to those for patient 99999 only.
The task_index parameter is used in an interprocess communication scheme
for controlling task execution. ClarityNLP’s Luigi scheduler creates worker
task clones in proportion to the number of unstructured documents in the
documentset. Only a single task from among the clones should actually connect
to the FHIR server, run the CQL query, and retrieve the structured data.
ClarityNLP uses the task_index parameter to identify the single task
that should execute the CQL query. Any NLPQL file can contain multiple
invocations of Clarity.CQLExecutionTask. Each of these should have
a task_index parameter, and they should be numbered sequentially starting
with 0. In other words, each define statement containing an invocation
of Clarity.CQLExecutionTask should have a unique value for the zero-based
task_index.
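The numbering rule for task_index can be checked mechanically before a file is submitted. The following Python sketch scans NLPQL text for task_index entries and confirms that they are unique and run 0, 1, 2, and so on. It is a plain text scan rather than an NLPQL parser, and the helper names and the second define statement (Conditions) are hypothetical.

```python
import re

# Matches entries such as  "task_index": 0  in NLPQL text.
_TASK_INDEX = re.compile(r'"task_index"\s*:\s*(\d+)')

def find_task_indices(nlpql_text):
    """Return every task_index value found, in order of appearance."""
    return [int(value) for value in _TASK_INDEX.findall(nlpql_text)]

def indices_are_valid(indices):
    """True when the values are exactly 0, 1, ..., n-1 with no repeats."""
    return sorted(indices) == list(range(len(indices)))

# Abbreviated NLPQL with two invocations; other parameters are omitted.
nlpql_text = '''
define WBC:
    Clarity.CQLExecutionTask({ "task_index": 0, "patient_id":"99999" });
define Conditions:
    Clarity.CQLExecutionTask({ "task_index": 1, "patient_id":"99999" });
'''

indices = find_task_indices(nlpql_text)
print(indices, indices_are_valid(indices))
```

A file that reused an index or skipped one (for example 0 and 2) would fail the check.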
The patient_id parameter identifies the patient whose data will be accessed
by the CQL query. This ID should match that specified in the documentset
creation statement.
The remaining parameters from the table above are set to values appropriate for Georgia Tech’s FHIR infrastructure.
The cql parameter is a triple-quoted string containing the CQL query.
This CQL code is assumed to be syntactically correct and is passed to the FHIR
server’s CQL evaluation service unaltered. All CQL code should be checked for
syntax errors and other problems prior to its use in an NLPQL file.
This example omits the optional time window parameters.
documentset PatientDocs:
    Clarity.createDocumentSet({
        "filter_query":"source:MYDOCS AND subject:99999"
    });

define WBC:
    Clarity.CQLExecutionTask({
        documentset: [PatientDocs],
        "task_index": 0,
        "patient_id":"99999",
        "cql_eval_url":"https://gt-apps.hdap.gatech.edu/cql/evaluate",
        "fhir_data_service_uri":"https://apps.hdap.gatech.edu/gt-fhir/fhir/",
        "fhir_terminology_service_uri":"https://cts.nlm.nih.gov/fhir/",
        "fhir_terminology_service_endpoint":"Terminology Service Endpoint",
        "fhir_terminology_user_name":"username",
        "fhir_terminology_user_password":"password",
        cql: """
             library Retrieve2 version '1.0'

             using FHIR version '3.0.0'

             include FHIRHelpers version '3.0.0' called FHIRHelpers

             codesystem "LOINC": 'http://loinc.org'

             define "WBC": Concept {
                 Code '26464-8' from "LOINC",
                 Code '804-5' from "LOINC",
                 Code '6690-2' from "LOINC",
                 Code '49498-9' from "LOINC"
             }

             context Patient

             define "result":
                 [Observation: Code in "WBC"]
             """
    });

context Patient;
Arguments
Name | Type | Required | Notes |
---|---|---|---|
documentset | documentset | Yes | Documents for a SINGLE patient only. |
task_index | int | Yes | Each CQLExecutionTask statement must have a unique value of this index. |
patient_id | str | Yes | CQL query executed on FHIR server for this patient. |
cql_eval_url | str | Yes | See table above. |
fhir_data_service_uri | str | Yes | See table above. |
fhir_terminology_service_uri | str | Yes | See table above. |
fhir_terminology_service_endpoint | str | Yes | See table above. |
cql | triple-quoted str | Yes | Properly-formatted CQL query, sent verbatim to FHIR server. |
fhir_terminology_user_name | str | No | Optional, depends on configuration of terminology server |
fhir_terminology_user_password | str | No | Optional, depends on configuration of terminology server |
time_start | str | No | Optional, discard results with timestamp < time_start |
time_end | str | No | Optional, discard results with timestamp > time_end |
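The required and optional columns of this table translate directly into a pre-flight check. The sketch below is a hypothetical Python helper, not part of ClarityNLP: it takes a dictionary of argument names and values and reports which required names are missing and which names are not in the table at all.

```python
# Argument names copied from the table above.
REQUIRED = (
    "documentset", "task_index", "patient_id", "cql_eval_url",
    "fhir_data_service_uri", "fhir_terminology_service_uri",
    "fhir_terminology_service_endpoint", "cql",
)
OPTIONAL = (
    "fhir_terminology_user_name", "fhir_terminology_user_password",
    "time_start", "time_end",
)

def check_arguments(args):
    """Return (missing, unknown) argument names for one task invocation."""
    missing = [name for name in REQUIRED if name not in args]
    unknown = [name for name in args
               if name not in REQUIRED and name not in OPTIONAL]
    return missing, unknown

# Deliberately incomplete argument set, for illustration
args = {
    "documentset": ["PatientDocs"],
    "task_index": 0,
    "patient_id": "99999",
    "cql_eval_url": "https://gt-apps.hdap.gatech.edu/cql/evaluate",
    "time_start": "LATEST() - 7d",
}
missing, unknown = check_arguments(args)
print("missing:", missing)
print("unknown:", unknown)
```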
Results
The specific fields returned by the CQL query depend on the type of FHIR
resource that contains the data. ClarityNLP can decode these FHIR resource
types: Patient, Procedure, Condition, and Observation. It can also decode
bundles of these resource types.
Fields in the MongoDB result documents are prefixed with the type of FHIR
resource from which they were taken, except for the datetime field (and the
end_datetime field for Condition resources), which omits the prefix to enable
date-based sorting. The prefix for each resource type is:
FHIR Resource Type | Prefix |
---|---|
Patient | patient |
Procedure | procedure |
Condition | condition |
Observation | obs |
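Because the datetime field carries no prefix, result documents from different resource types can be sorted together, and the prefixes on the remaining fields identify where each document came from. The Python sketch below illustrates both ideas on made-up documents. It assumes the datetime values are strings in the YYYY-MM-DDTHH:mm:ss+hhmm format listed in the field tables; the helper names are hypothetical.

```python
from datetime import datetime

# Prefix-to-resource mapping taken from the table above.
PREFIXES = {
    "patient": "Patient",
    "procedure": "Procedure",
    "condition": "Condition",
    "obs": "Observation",
}

def resource_type(result_doc):
    """Infer the FHIR resource type from the first prefixed field name."""
    for field in result_doc:
        prefix = field.split("_", 1)[0]
        if prefix in PREFIXES:
            return PREFIXES[prefix]
    return None

def sort_by_datetime(result_docs):
    """Sort the documents that carry a datetime field, oldest first."""
    dated = [doc for doc in result_docs if "datetime" in doc]
    return sorted(
        dated,
        key=lambda doc: datetime.strptime(doc["datetime"],
                                          "%Y-%m-%dT%H:%M:%S%z"),
    )

# Made-up result documents with only a few fields each
docs = [
    {"obs_value": 7.2, "datetime": "2018-07-04T15:00:00+0000"},
    {"procedure_status": "completed", "datetime": "2018-07-04T09:00:00-0500"},
    {"patient_gender": "female"},
]
ordered = sort_by_datetime(docs)
print([resource_type(doc) for doc in ordered])
```

The procedure sorts first here because 09:00 at UTC-5 is 14:00 UTC, one hour before the observation. The Patient document has no datetime field and is left out of the sorted list.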
The fields returned for the Patient resource are:
Field Name | Meaning |
---|---|
patient_subject | patient ID |
patient_fname_1 | patient first name (could have multiple first names, numbered sequentially) |
patient_lname_1 | patient last name (could have multiple last names, numbered sequentially) |
patient_gender | gender of the patient |
patient_date_of_birth | date of birth in YYYY-MM-DD format |
The fields returned for the Procedure resource are:
Field Name | Meaning |
---|---|
procedure_id_value | ID of the procedure |
procedure_status | status indicator for the procedure |
procedure_codesys_code_1 | code for the procedure; multiple codes are numbered sequentially |
procedure_codesys_system_1 | code system; multiple code systems are numbered sequentially |
procedure_codesys_display_1 | code system procedure name; multiple names are numbered sequentially |
procedure_subject_ref | typically the string ‘Patient/’ followed by a patient ID, e.g. Patient/99999 |
procedure_subject_display | patient full name string |
procedure_context_ref | typically the string ‘Encounter/’ followed by a number, e.g. Encounter/31491 |
procedure_performed_date_time | timestamp of the procedure in YYYY-MM-DDTHH:mm:ss+hhmm format |
datetime | identical to procedure_performed_date_time |
The fields returned for the Condition resource are:
Field Name | Meaning |
---|---|
condition_id_value | ID of the condition |
condition_category_code_1 | category code value; multiple codes are numbered sequentially |
condition_category_system_1 | category code system; multiple code systems are numbered sequentially |
condition_category_display_1 | category name; multiple names are numbered sequentially |
condition_codesys_code_1 | code for the condition; multiple codes are numbered sequentially |
condition_codesys_system_1 | code system; multiple code systems are numbered sequentially |
condition_codesys_display_1 | code system condition name; multiple names are numbered sequentially |
condition_subject_ref | typically the string ‘Patient/’ followed by a patient ID, e.g. Patient/99999 |
condition_subject_display | patient full name string |
condition_context_ref | typically the string ‘Encounter/’ followed by a number, e.g. Encounter/31491 |
condition_onset_date_time | timestamp of condition onset in YYYY-MM-DDTHH:mm:ss+hhmm format |
datetime | identical to condition_onset_date_time |
condition_abatement_date_time | timestamp of condition abatement in YYYY-MM-DDTHH:mm:ss+hhmm format |
end_datetime | identical to condition_abatement_date_time |
The fields returned for the Observation resource are:
Field Name | Meaning |
---|---|
obs_codesys_code_1 | code for the observation; multiple codes are numbered sequentially |
obs_codesys_system_1 | code system; multiple code systems are numbered sequentially |
obs_codesys_display_1 | code system observation name; multiple names are numbered sequentially |
obs_subject_ref | typically the string ‘Patient/’ followed by a patient ID, e.g. Patient/99999 |
obs_subject_display | patient full name string |
obs_context_ref | typically the string ‘Encounter/’ followed by a number, e.g. Encounter/31491 |
obs_value | numeric value of what was observed or measured |
obs_unit | string identifying the units for the value observed |
obs_unit_system | typically a URL with information on the units used |
obs_unit_code | unit string with customary abbreviations |
obs_effective_date_time | timestamp in YYYY-MM-DDTHH:mm:ss+hhmm format |
datetime | identical to obs_effective_date_time |
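Several fields in these tables are numbered sequentially (for example obs_codesys_code_1, obs_codesys_code_2), so a consumer usually needs to gather them back into a list. A minimal Python sketch, using a hypothetical helper and a made-up result document:

```python
import re

def collect_numbered(result_doc, stem):
    """Gather the values of stem_1, stem_2, ... in numeric order."""
    pattern = re.compile(rf"^{re.escape(stem)}_(\d+)$")
    found = []
    for field, value in result_doc.items():
        match = pattern.match(field)
        if match:
            found.append((int(match.group(1)), value))
    # Sort on the numeric suffix so that _10 follows _9, not _1.
    return [value for _, value in sorted(found, key=lambda pair: pair[0])]

# Made-up Observation result with two codes stored out of order
doc = {
    "obs_codesys_code_2": "804-5",
    "obs_codesys_code_1": "26464-8",
    "obs_codesys_system_1": "http://loinc.org",
    "obs_value": 7.2,
}
codes = collect_numbered(doc, "obs_codesys_code")
print(codes)
```

The same helper works for the other numbered stems, such as patient_fname or condition_category_code.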
Collector
No