GitXplorerGitXplorer
d

phis

public
0 stars
2 forks
0 issues

Commits

List of commits on branch master.
Unverified
328312bc64046e0bb777fa4d481cde2b4a933a08

`

bbdaduy committed 6 years ago
Unverified
da89ade4d7dbe77842bdde8c44264a1c6bfc97b3

`

bbdaduy committed 6 years ago
Unverified
04f38246bcb1b8bf2399113ac987555c55ace19b

`

bbdaduy committed 6 years ago
Unverified
9f1610a3301f83de49577d9ef6fc27a5c831837a

`

bbdaduy committed 6 years ago
Unverified
43bcf141349c146da97a24fa1fbf8fed0aff2fed

`

bbdaduy committed 6 years ago
Unverified
c0344a67f061e95e642ff7514cca441537f82db2

`

bbdaduy committed 6 years ago

README

The README file for this repository.

Personal Health Information Scrubber (PHIS)

Introduction

Personal Health Information Scrubber, or PHIS, is a de-identification software helped to remove individual-identifiable protected health information (PHI) (e.g, patient names, SSN, address, etc. ) from clinical texts. Our PHI model is based on an extended definition of HIPAA Privacy Rule which has two models: Safe Harbor (SH) and Limited Data Set (LDS). The SH model included 18 HIPAA-recommended identifiers for patients, their relatives, employers, extended to cover physicians and providers. the LDS model keeps information such as Age, Date, Zip code. LDS model retains more information for research, however, data recipients need to sign Data Use Agreement (DUA) with providers to promise additional data protection mechanisms. PHIS was originally developed at the Informatics Institute, University of Alabama at Birmingham (UAB) and released to public under MIT license.

The majority code of the solution used Java technology. Only the thin client of the client-server model was implemented using C#.NET.

Basic Features

  • Standalone and client-server deployment modes.
  • Manual and automated de-identification.
  • Management of custom PHI terminology defined by Users.
  • Batch processing of multiple documents (standalone version only).
  • Two de-identification modes: "Safe Harbor" AND "Limited Data Set".

Latest Build

How To Run

Minimum Runtime Requirement

  • Windows 7 or later / MacOS
  • Java 8

Standalone Deployment

  • On Shell, execute command:
java -jar phis_standalone.jar
  • On a Windows machine:
open phis_standalone.exe

phis standalone

Client-server deployment

  • To start phis client on a Windows machine:
open phis_client.exe

phis client

  • To start phis_server from shell, execute command:
java -jar phis_server.jar

phis server

  • On the first run, phis_standalone and phis_client will automatically create the "config" folder allowing ad hoc configuration (e.g, custom terminology, networking parameters, etc.)

Developer's Quickstart Guide

Development Tools

  • JDK 1.8.0_192
  • Apache Maven 3.6
  • Visual Studio Community 2017
  • Netbeans 8.2

Compiling

  1. Clone/Download the source code at https://github.com/bdaduy/phis.git
  2. Test if JDK and Maven were imported to environment variables.
java -version
mvn -version
  1. Compiling Java code
  • Navigate to project root folder ("phis"), execute command
mvn clean install
  • Examine artifacts on "phis_standalone\target", "phis_server\target"
  1. Compiling C# code
  • Open "phis_client.sln" with Visual Studio
  • Build -> Rebuild Solution
  • Examine artifacts on "phis_client\bin"

Development Notes

  • Both deployment models used the NLP logics coded in the phis_standalone module. The phis_client user-interface (UI) used Windows Form App (C#.NET) technology, while phis_standalone ui used JavaFX technology.
  • The client-server model may be less flexible, however, it is more secure to utilize confidential resources (e.g, patient and physician databases).
  • The symetric encryption algorithm, Advanced Encryption Standard (AES), was used to encrypt messages between client and server. To change the encryption key, modify the "ChangeMe" value in PHISController.java (server) and in config/AppConfig.xml (client).
  • The de-Identification solution used a mix of methods such as pattern-matching, dictionary-matching, and machine-learning divided into many independent units or pipelines.
  • The machine-learning approach used the conditional random field (CRF) classifier implemented by Stanford NLP Group. The CRF classifiers were trained on the 2014 I2b2 de-identification dataset and the UAB local dataset.
  • PHIS allows user-defined dictionaries by dropping text files to corresponding folder on "config\custom_dict".
  • PHIS might underperform on texts on different domains due to the effect of medical sublanguage. In such situation, users can evolve phis by (1) develop meaningful local terminologies (2) re-train the machine-learning classifer on local data, and (3) Add more rules and ad hoc algorithms to the pipeline.

Trouble Shooting

  • Try to "Run as administrator" if the program cannot start.
  • If you encounter OutOfMemoryError, try to increase maximum Java heap space (e.g,-Xmx1g)

Disclaimer

PHIS was developed for internal use. Some UAB-specific components were not included in the public release. Further uses of the tool in specific situations will need cautiously and comprehensively testing and adaptation. We hold no liability and accoutability for any damages or losses resulted from the uses of the software.

References

  • Developer: Duy Duc An Bui (PhD)
  • Supervisor: James J. Cimino (MD)
  • Related publications:
  Bui DDA, Wyatt M, Cimino JJ. The UAB Informatics Institute and 2016 CEGS N-GRID de-identification shared task challenge. J Biomed Inform. 2017;75S:S54-S61.
  Bui DDA, Redden DT and Cimino JJ. Is Multiclass Automatic Text De-Identification Worth the Effort?. Methods of Information in Medicine.2018; 57(04); pp.177-184.