Clarity.CQLExecutionTask¶

Description¶

This is a custom task that allows ClarityNLP to execute CQL (Clinical Quality Language) queries embedded in NLPQL files. ClarityNLP directs CQL code to a FHIR (Fast Healthcare Interoperability Resources) server, which runs the query and retrieves structured data for a single patient. The data returned from the CQL query appears in the job results for the NLPQL file.

The CQL query requires several FHIR-related parameters, such as the patient ID, the URL of the FHIR server, and several others to be described below. These parameters can either be specified in the NLPQL file itself or supplied by ClarityNLP as a Service.

Documentsets for Unstructured and Structured Data¶

ClarityNLP was originally designed to process unstructured text documents. In a typical workflow the user specifies a documentset in an NLPQL file, along with the tasks and NLPQL expressions needed to process the documents. ClarityNLP issues a Solr query to retrieve the matching documents, which it divides into batches. ClarityNLP launches a separate task per batch to process the documents in parallel. The number of tasks spawned by the Luigi scheduler depends on the number of unstructured documents returned by the Solr query. In general, the results obtained include data from multiple patients.

ClarityNLP can also support single-patient structured CQL queries with a few simple modifications to the documentset. For CQL queries the documentset must be specified in the NLPQL file so that it limits the unstructured documents to those for a single patient only. FHIR is essentially a single-patient readonly data retrieval standard. Each patient with data stored on a FHIR server has a unique patient ID. This ID must be used in the documentset statement and in the Clarity.CQLExecutionTask body itself, as illustrated below. The documentset specifies the unstructured data for the patient, and the CQL query specifies the structured data for the patient.

Relevant FHIR Parameters¶

These parameters are needed to connect to the FHIR server, evaluate the CQL statements, and retrieve the results. They can be provided directly as parameters in the CQLExecutionTask statement (see below), or indirectly via ClarityNLPaaS:

Parameter	Meaning
cql_eval_url	URL of the FHIR server’s CQL Execution Service
patient_id	Unique ID of patient whose data will be accessed
fhir_data_service_uri	FHIR service base URL
fhir_terminology_service_endpoint	Set to `Terminology Service Endpoint`
fhir_terminology_service_uri	URI for a service that conforms to the FHIR Terminology Service Capability Statement
fhir_terminology_user_name	Username for terminology service authentication
fhir_terminology_user_password	Password for terminology service authentication

The terminology user name and password parameters may not be required, depending on whether or not the terminology server enforces password authentication.

Time Filtering¶

This task supports a time filtering capability for the CQL query results. Two optional parameters, time_start and time_end, can be used to specify a time window. Any results whose timestamps lie outside of this window will be discarded. If the time window parameters are omitted, all results from the CQL query will be kept.

The time_start and time_end parameters must be quoted strings with syntax as follows:

DATETIME(YYYY, MM, DD, HH, mm, ss)
DATE(YYYY, MM, DD)
EARLIEST()
LATEST()

An optional offset in days can be added or subtracted to these:

LATEST() - 7d
DATE(2010, 7, 15) + 20d

The offset consists of digits followed by a d character, indicating days.

Both ``time_start`` and ``time_end`` are assumed to be expressed in Universal Coordinated Time (UTC).

Here are some time window examples:

1. Discard any results not occurring in March, 2016:

"time_start":"DATE(2016, 03, 01)",
  "time_end":"DATE(2016, 03, 31)"

2. Keep all results within one week of the most recent result:

"time_start":"LATEST() - 7d",
  "time_end":"LATEST()"

3. Keep all results within a window of 20 days beginning July 4, 2018, at 3 PM:

"time_start":"DATETIME(2018, 7, 4, 15, 0, 0)",
  "time_end":"DATETIME(2018, 7, 4, 15, 0, 0) + 20d"

Note that the strings to the left and right of the colon must be surrounded by quotes.

Example¶

Here is an example of how to use the CQLExecutionTask directly, without using ClarityNLPaaS. In the text box below there is a documentset creation statement followed by an invocation of the CQLExecutionTask. The documentset consists of all indexed documents for patient 99999 with a source field equal to MYDOCS. These documents are specified explicitly in the CQLExecutionTask invocation that follows, to limit the source documents to those for patient 99999 only.

The task_index parameter is used in an interprocess communication scheme for controlling task execution. ClarityNLP’s Luigi scheduler creates worker task clones in proportion to the number of unstructured documents in the documentset. Only a single task from among the clones should actually connect to the FHIR server, run the CQL query, and retrieve the structured data.

ClarityNLP uses the task_index parameter to identify the single task that should execute the CQL query. Any NLPQL file can contain multiple invocations of Clarity.CQLExecutionTask. Each of these should have a task_index parameter, and they should be numbered sequentially starting with 0. In other words, each define statement containing an invocation of Clarity.CQLExecutionTask should have a unique value for the zero-based task_index.

The patient_id parameter identifies the patient whose data will be accessed by the CQL query. This ID should match that specified in the documentset creation statement.

The remaining parameters from the table above are set to values appropriate for GA Tech’s FHIR infrastructure.

The cql parameter is a triple-quoted string containing the CQL query. This CQL code is assumed to be syntactically correct and is passed to the FHIR server’s CQL evaluation service unaltered. All CQL code should be checked for syntax errors and other problems prior to its use in an NLPQL file.

This example omits the optional time window parameters.

documentset PatientDocs:
 Clarity.createDocumentSet({
     "filter_query":"source:MYDOCS AND subject:99999"
 });

 define WBC:
     Clarity.CQLExecutionTask({
         documentset: [PatientDocs],
         "task_index": 0,
         "patient_id":"99999",
         "cql_eval_url":"https://gt-apps.hdap.gatech.edu/cql/evaluate",
         "fhir_data_service_uri":"https://apps.hdap.gatech.edu/gt-fhir/fhir/",
         "fhir_terminology_service_uri":"https://cts.nlm.nih.gov/fhir/",
         "fhir_terminology_service_endpoint":"Terminology Service Endpoint",
         "fhir_terminology_user_name":"username",
         "fhir_terminology_user_password":"password",
         cql: """
              library Retrieve2 version '1.0'

              using FHIR version '3.0.0'

              include FHIRHelpers version '3.0.0' called FHIRHelpers

              codesystem "LOINC": 'http://loinc.org'

              define "WBC": Concept {
                  Code '26464-8' from "LOINC",
                  Code '804-5' from "LOINC",
                  Code '6690-2' from "LOINC",
                  Code '49498-9' from "LOINC"
              }

              context Patient

              define "result":
                  [Observation: Code in "WBC"]
              """
     });

     context Patient;

Extends¶

BaseTask

Arguments¶

Name	Type	Required	Notes
documentset	documentset	Yes	Documents for a SINGLE patient only.
task_index	int	Yes	Each CQLExecutionTask statement must have a unique value of this index.
patient_id	str	Yes	CQL query executed on FHIR server for this patient.
cql_eval_url	str	Yes	See table above.
fhir_data_service_uri	str	Yes	See table above.
fhir_terminology_service_uri	str	Yes	See table above.
fhir_terminology_service_endpoint	str	Yes	See table above.
cql	triple-quoted str	Yes	Properly-formatted CQL query, sent verbatim to FHIR server.
fhir_terminology_user_name	str	No	Optional, depends on configuration of terminology server
fhir_terminology_user_password	str	No	Optional, depends on configuration of terminology server
time_start	str	No	Optional, discard results with timestamp < time_start
time_end	str	No	Optional, discard results with timestamp > time_end

Results¶

The specific fields returned by the CQL query are dependent on the type of FHIR resource that contains the data. ClarityNLP can decode these FHIR resource types: Patient, Procedure, Condition, and Observation. It can also decode bundles of these resource types.

Fields in the MongoDB result documents are prefixed with the type of FHIR resource from which they were taken except for the datetime field, which omits the prefix to enable date-based sorting. The prefixes for each are:

FHIR Resource Type	Prefix
Patient	patient
Procedure	procedure
Condition	condition
Observation	obs

The fields returned for the Patient resource are:

Field Name	Meaning
patient_subject	patient id
patient_fname_1	patient first name (could have multiple first names, numbered sequentially)
patient_lname_1	patient last name (could have multiple last names, numbered sequentially)
patient_gender	gender of the patient
patient_date_of_birth	date of birth in YYYY-MM-DD format

The fields returned for the Procedure resource are:

Field Name	Meaning
procedure_id_value	ID of the procedure
procedure_status	status indicator for the procedure
procedure_codesys_code_1	code for the procedure; multiple codes are numbered sequentially
procedure_codesys_system_1	code system; multiple code systems are numbered sequentially
procedure_codesys_display_1	code system procedure name; multiple names are numbered sequentially
procedure_subject_ref	typically the string ‘Patient/’ followed by a patient ID, i.e. Patient/99999
procedure_subject_display	patient full name string
procedure_context_ref	typically the string ‘Encounter/’ followed by a number, i.e. Encounter/31491
procedure_performed_date_time	timestamp of the procedure in YYYY-MM-DDTHH:mm:ss+hhmm format
datetime	identical to procedure_performed_date_time

The fields returned for the Condition resource are:

Field Name	Meaning
condition_id_value	ID of the condition
condition_category_code_1	category code value; multiple codes are numbered sequentially
condition_category_system_1	category code system; multiple code systems are numbered sequentially
condition_category_display_1	category name; multiple names are numbered sequentially
condition_codesys_code_1	code for the condition; multiple codes are numbered sequentially
condition_codesys_system_1	code system; multiple code systems are numbered sequentially
condition_codesys_display_1	code system condition name; multiple names are numbered sequentially
condition_subject_ref	typically the string ‘Patient/’ followed by a patient ID, i.e. Patient/99999
condition_subject_display	patient full name string
condition_context_ref	typically the string ‘Encounter/’ followed by a number, i.e. Encounter/31491
condition_onset_date_time	timestamp of condition onset in YYYY-MM-DDTHH:mm:ss+hhmm format
datetime	identical to condition_onset_date_time
condition_abatement_date_time	timestamp of condition abatement in YYYY-MM-DDTHH:mm:ss+hhmm format
end_datetime	identical to condition_abatement_date_time

The fields returned for the Observation resource are:

Field Name	Meaning
obs_codesys_code_1	code for the observation; multiple codes are numbered sequentially
obs_codesys_system_1	code system; multiple code systems are numbered sequentially
obs_codesys_display_1	code system observation name; multiple names are numbered sequentially
obs_subject_ref	typically the string ‘Patient/’ followed by a patient ID, i.e. Patient/99999
obs_subject_display	patient full name string
obs_context_ref	typically the string ‘Encounter/’ followed by a number, i.e. Encounter/31491
obs_value	numberic value of what was observed or measured
obs_unit	string identifying the units for the value observed
obs_unit_system	typically a URL with information on the units used
obs_unit_code	unit string with customary abbreviations
obs_effective_date_time	timestamp in YYYY-MM-DDTHH:mm:ss+hhmm format
datetime	identical to obs_effective_date_time

Collector¶

No