Overview/Challenge
The roll-out of electronic health records (EHRs) has transformed data science opportunities across all specialties. In UK mental healthcare these data resources are already sizeable, because many mental health service providers implemented EHRs 10 or more years before acute care providers did. However, realising the research potential of mental health records depends heavily on developing and applying natural language processing (NLP). The most valuable information (for example, on presentations, risk factors, interventions, and outcomes) is recorded in free text rather than in structured fields, and the pre-structured data available from the standard EHR are of very limited use for mental health research.
Substantial progress has been made in this field at the NIHR Maudsley Biomedical Research Centre via the Clinical Record Interactive Search (CRIS) platform at the South London and Maudsley NHS Foundation Trust (SLaM), in collaboration with King’s College London: over 100 NLP applications have been developed over the last 10 years and are in active routine use by researchers. These applications have potential utility for other Trusts and EHR data platforms, but there has been no means of sharing this functionality, whether for the Maudsley team or for any other NLP provider. To address this, the Mental Health Text Analytics Cloud (MH-TAC) has been created. MH-TAC is a prototype cloud-based platform, set up within the NHS firewall, that will give NLP algorithm hosts secure domains in which uploaded text files (e.g., from different mental health services) can be processed and returned to users with the required NLP-derived metadata.
Providing technical infrastructure for sharing functionality is an important step, but it is not the only requirement. Appropriate and acceptable governance procedures and data sharing agreements must also be in place to permit transfers of clinical data, and algorithms may well need further evaluation and modification to assess their cross-site applicability and to address any deficiencies uncovered. Wider NLP provision is therefore better conceptualised as a service than as a technical solution; developing and configuring such a service, in turn, requires both a technical solution and proof-of-concept pilot studies. Ultimately, impetus is needed to roll out a prototype NLP service in order to establish how it might best be configured and supported in future, and we intend to provide this through DATAMIND’s Roadbuilder 4 initiative.
Impact and Outcomes
- We have demonstrated technical proof of concept for MH-TAC functionality.
- We have assembled the necessary resources to evaluate service-level functionality for MH-TAC.
- Both build on extensive previous and continuing research output using the mental healthcare NLP resources developed within the CRIS architecture (for example, these resources underpin most of the 300+ publications to date).
- We have begun to establish a nationwide collaborative network to support and coordinate mental healthcare NLP (and thus mental health data science more broadly).
- UK mental healthcare is currently well ahead of the field internationally in the research application of NLP.
What’s next?
- Via the collaborative network, we hope to engage with the HDR UK Gateway and Phenotype Library to provide resources and templates that standardise how NLP applications are described and provided, in line with the FAIR principles.
- Now that the necessary technical infrastructure and governance templates are in place, we hope to evaluate MH-TAC as a prototype NLP provision service, identifying and communicating how it might be optimised.
- The VISION Consortium should provide similar opportunities for proof-of-concept research applications of NLP beyond healthcare.
- Through SLaM’s Clinical Informatics Service, and in collaboration with CogStack, we are developing clinical dashboard and caseload management tools that incorporate NLP applications originally developed for research use. We anticipate that this could support wider roll-out of NLP functionality.