Solr Setup and Configuration

Data types

We use standard Solr data types with one custom data type, searchText. searchText is a text field, tokenized on spaces, with filtering to support case insensitivity.


All documents in ClarityNLP are stored in Solr. These are the minimal required fields:

        "report_type":"Report Type",
        "source":"My Institution",
        "report_text":"Report text here"

id and report_id should each be unique within the data set, although a document's id and report_id may share the same value. report_text should be plain text. subject is generally the patient identifier, but it can be another identifier, such as drug_name. source is generally your institution or the name of the document set.
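As a minimal sketch, a document carrying the fields named in this section could be indexed through Solr's JSON update handler. Every value below is a placeholder, and the post_document helper is hypothetical; the host, port, and core name (report_core) are the ones used by the other commands in this document.

```shell
#!/bin/sh
# Placeholder document: all values are illustrative, not real data.
doc=$(cat <<'EOF'
[{
  "id": "1",
  "report_id": "1",
  "report_type": "Report Type",
  "source": "My Institution",
  "subject": "patient-123",
  "report_text": "Report text here"
}]
EOF
)

# Hypothetical helper: POST a JSON array of documents and commit immediately.
# Assumes Solr is reachable at localhost:8983 with a core named report_core.
post_document() {
  curl -sS "http://localhost:8983/solr/report_core/update?commit=true" \
    -H 'Content-Type: application/json' \
    --data-binary "$1"
}

# Usage (requires a running Solr instance):
#   post_document "$doc"
```

Here id and report_id share the value "1", which is allowed as long as no other document reuses it.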

Additional fields can be added to store additional metadata. The following fields are allowable as dynamic fields:

  • *_section (searchText) e.g. past_medical_history_section (for indexing specific sections of notes)
  • *_id (long) e.g. doctor_id (any other id you wish to store)
  • *_ids (long, multiValued) e.g. medication_ids (any other ids, stored as an array)
  • *_system (string) e.g. code_system (noting any system values)
  • *_attr (string) e.g. clinic_name_attr (any single-valued custom attribute)
  • *_attrs (string, multiValued) e.g. insurer_names_attrs (any multi-valued custom attribute)
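To illustrate how the dynamic field suffixes are used, here is a sketch of a document that adds one field per pattern on top of the required fields. All field values are invented for the example, and the field names are only chosen to match the suffix patterns listed above.

```shell
#!/bin/sh
# Illustrative document: every value is made up for the example.
doc_with_metadata=$(cat <<'EOF'
[{
  "id": "2",
  "report_id": "2",
  "report_type": "Report Type",
  "source": "My Institution",
  "subject": "patient-123",
  "report_text": "Report text here",
  "past_medical_history_section": "Section text here",
  "doctor_id": 42,
  "medication_ids": [1001, 1002],
  "code_system": "Example System",
  "clinic_name_attr": "Example Clinic",
  "insurer_names_attrs": ["Insurer A", "Insurer B"]
}]
EOF
)

# To index it (requires a running Solr instance with these dynamic fields defined):
#   curl -sS "http://localhost:8983/solr/report_core/update?commit=true" \
#     -H 'Content-Type: application/json' --data-binary "$doc_with_metadata"
```

Multi-valued fields (*_ids, *_attrs) are sent as JSON arrays, while single-valued fields take a scalar.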

Custom Solr Setup

If you are using Docker, this setup is done for you. Otherwise, use the following commands to set up Solr.

  • Add the searchText field type:
curl -X POST -H 'Content-type:application/json' --data-binary '{
      "add-field-type" : {
         "analyzer" : {
               "pattern":"([a-zA-Z])\\\\1+" }],
               "class":"solr.WhitespaceTokenizerFactory" },
               "preserveOriginal":"0" },
               {"class": "solr.LowerCaseFilterFactory"
    }' http://localhost:8983/solr/report_core/schema
  • Add standard fields (Solr 6):
curl -X POST -H 'Content-type:application/json' --data-binary '{
}' http://localhost:8983/solr/report_core/schema
  • Add standard fields (Solr 7 and later):
curl -X POST -H 'Content-type:application/json' --data-binary '{
}' http://localhost:8983/solr/report_core/schema
  • Add dynamic fields (Solr 6):
curl -X POST -H 'Content-type:application/json' --data-binary '{
}' http://localhost:8983/solr/report_core/schema
  • Add dynamic fields (Solr 7 and later):
curl -X POST -H 'Content-type:application/json' --data-binary '{
}' http://localhost:8983/solr/report_core/schema
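
For orientation, Schema API requests generally have the shape sketched below. This is not ClarityNLP's actual schema definition: the analyzer is reduced to the two behaviors described under Data types (split on whitespace, then lowercase), the plong type name assumes a Solr 7 or later default configset, and post_schema is a hypothetical helper.

```shell
#!/bin/sh
SCHEMA_URL="http://localhost:8983/solr/report_core/schema"

# Sketch of a text type: tokenize on whitespace, then lowercase each token.
field_type_payload=$(cat <<'EOF'
{
  "add-field-type": {
    "name": "searchText",
    "class": "solr.TextField",
    "analyzer": {
      "tokenizer": { "class": "solr.WhitespaceTokenizerFactory" },
      "filters": [ { "class": "solr.LowerCaseFilterFactory" } ]
    }
  }
}
EOF
)

# Sketch of one dynamic field. "plong" is an assumed type name; use whichever
# long type your schema actually defines.
dynamic_field_payload=$(cat <<'EOF'
{
  "add-dynamic-field": {
    "name": "*_id",
    "type": "plong",
    "indexed": true,
    "stored": true
  }
}
EOF
)

# Hypothetical helper: send one Schema API command to the core.
post_schema() {
  curl -sS -X POST -H 'Content-type:application/json' \
    --data-binary "$1" "$SCHEMA_URL"
}

# Usage (requires a running Solr instance):
#   post_schema "$field_type_payload"
#   post_schema "$dynamic_field_payload"
```

The field type has to exist before any field or dynamic field that refers to it is added, so the order of the requests matters.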

Deleting documents

These commands will permanently delete your documents; use with caution.

Delete documents based on a custom query:

curl "http://localhost:8983/solr/report_core/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>source:"My Source"</query></delete>'

Delete all documents:

curl "http://localhost:8983/solr/report_core/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
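
Because deletes are permanent, it can help to check how many documents a query matches before deleting them. The sketch below assumes the same host and core as the commands above; build_delete_xml and count_matching are hypothetical helper names, not part of Solr or ClarityNLP.

```shell
#!/bin/sh
SOLR_CORE_URL="http://localhost:8983/solr/report_core"

# Hypothetical helper: wrap a Solr query in a delete-by-query XML message,
# escaping the characters that are special in XML text.
build_delete_xml() {
  escaped=$(printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')
  printf '<delete><query>%s</query></delete>' "$escaped"
}

# Hypothetical helper: ask Solr how many documents match a query without
# returning any rows. The count appears as numFound in the response.
count_matching() {
  curl -sS -G "$SOLR_CORE_URL/select" \
    --data-urlencode "q=$1" \
    --data-urlencode "rows=0"
}

# Usage (requires a running Solr instance):
#   count_matching 'source:"My Source"'
#   curl "$SOLR_CORE_URL/update?commit=true" -H "Content-Type: text/xml" \
#     --data-binary "$(build_delete_xml 'source:"My Source"')"
```

Running the count first with the exact query string you intend to delete by gives a quick sanity check that the query is not broader than expected.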