Personal Health Information Scrubber (PHIS)

Introduction

Personal Health Information Scrubber, or PHIS, is de-identification software that removes individually identifiable protected health information (PHI), such as patient names, SSNs, and addresses, from clinical text. Our PHI model is based on an extended definition of the HIPAA Privacy Rule, which has two models: Safe Harbor (SH) and Limited Data Set (LDS). The SH model covers the 18 HIPAA identifiers for patients, their relatives, and their employers, and is extended to cover physicians and providers. The LDS model keeps information such as age, dates, and ZIP codes. It retains more information for research; however, data recipients must sign a Data Use Agreement (DUA) with providers that commits them to additional data protection mechanisms. PHIS was originally developed at the Informatics Institute, University of Alabama at Birmingham (UAB), and released to the public under the MIT license.

Most of the solution is written in Java. Only the thin client of the client-server model is implemented in C#.NET.
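
The difference between the two models can be pictured as a per-category retention rule. The sketch below is illustrative only: the `PhiCategory` and `DeidMode` names and the short category list are hypothetical and are not taken from the PHIS source. It encodes only what the introduction states, namely that LDS keeps age, date, and ZIP code while SH removes them.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical PHI categories; the real PHIS category list is not shown in this README.
enum PhiCategory { NAME, SSN, ADDRESS, AGE, DATE, ZIP_CODE }

// Hypothetical mode type mirroring the two models described in the introduction.
enum DeidMode {
    // Simplification: Safe Harbor is modeled here as retaining none of the listed categories.
    SAFE_HARBOR(EnumSet.noneOf(PhiCategory.class)),
    // Limited Data Set keeps age, date, and ZIP code for research use.
    LIMITED_DATA_SET(EnumSet.of(PhiCategory.AGE, PhiCategory.DATE, PhiCategory.ZIP_CODE));

    private final Set<PhiCategory> retained;

    DeidMode(Set<PhiCategory> retained) {
        this.retained = retained;
    }

    // A category is scrubbed unless the mode explicitly retains it.
    boolean shouldScrub(PhiCategory category) {
        return !retained.contains(category);
    }
}

class DeidModeDemo {
    public static void main(String[] args) {
        for (DeidMode mode : DeidMode.values()) {
            for (PhiCategory category : PhiCategory.values()) {
                System.out.println(mode + " scrubs " + category + ": " + mode.shouldScrub(category));
            }
        }
    }
}
```

Keeping the retained set on the mode itself means a downstream scrubbing step only needs to ask `shouldScrub`, whichever mode the user selected.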

Basic Features

  • Standalone and client-server deployment modes.
  • Manual and automated de-identification.
  • Management of user-defined custom PHI terminology.
  • Batch processing of multiple documents (standalone version only).
  • Two de-identification modes: "Safe Harbor" and "Limited Data Set".

Latest Build

How To Run

Minimum Runtime Requirement

  • Windows 7 or later, or macOS
  • Java 8

Standalone Deployment

  • From a shell, run:
java -jar phis_standalone.jar
  • On a Windows machine:
open phis_standalone.exe

[Screenshot: phis standalone]

Client-Server Deployment

  • To start phis client on a Windows machine:
open phis_client.exe

[Screenshot: phis client]

  • To start phis_server from a shell, run:
java -jar phis_server.jar

[Screenshot: phis server]

  • On the first run, phis_standalone and phis_client automatically create a "config" folder for ad hoc configuration (e.g., custom terminology, networking parameters).

Developer's Quickstart Guide

Development Tools

  • JDK 1.8.0_192
  • Apache Maven 3.6
  • Visual Studio Community 2017
  • Netbeans 8.2

Compiling

  1. Clone or download the source code from https://github.com/bdaduy/phis.git
  2. Check that the JDK and Maven are set up in your environment variables:
java -version
mvn -version
  3. Compile the Java code:
  • Navigate to the project root folder ("phis") and run:
mvn clean install
  • Examine the artifacts in "phis_standalone\target" and "phis_server\target".
  4. Compile the C# code:
  • Open "phis_client.sln" with Visual Studio.
  • Select Build -> Rebuild Solution.
  • Examine the artifacts in "phis_client\bin".

Development Notes

  • Both deployment models use the NLP logic implemented in the phis_standalone module. The phis_client user interface (UI) is a Windows Forms app (C#.NET), while the phis_standalone UI uses JavaFX.
  • The client-server model may be less flexible; however, it offers a more secure way to use confidential resources (e.g., patient and physician databases).
  • Messages between client and server are encrypted with the Advanced Encryption Standard (AES), a symmetric encryption algorithm, so both sides must hold the same key. To change the encryption key, modify the "ChangeMe" value in PHISController.java (server) and in config/AppConfig.xml (client).
  • The de-identification solution combines pattern matching, dictionary matching, and machine learning, divided into many independent units, or pipelines.
  • The machine-learning approach uses the conditional random field (CRF) classifier implemented by the Stanford NLP Group. The CRF classifiers were trained on the 2014 i2b2 de-identification dataset and a local UAB dataset.
  • PHIS supports user-defined dictionaries: drop text files into the corresponding folder under "config\custom_dict".
  • PHIS might underperform on text from other domains because of medical sublanguage effects. In such situations, users can adapt PHIS by (1) developing meaningful local terminologies, (2) retraining the machine-learning classifier on local data, and (3) adding more rules and ad hoc algorithms to the pipeline.
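
The idea of independent units chained into a pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the PHIS implementation: the `ScrubUnit`, `PatternUnit`, `DictionaryUnit`, and `ScrubPipelineSketch` names, the SSN regular expression, and the mask strings are all hypothetical. The in-memory term list stands in for terms that would be loaded from a custom dictionary file, and the CRF machine-learning unit is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical interface: one independent unit takes text and returns it with some PHI masked.
interface ScrubUnit {
    String apply(String text);
}

// Pattern-matching unit: masks every match of a regular expression.
class PatternUnit implements ScrubUnit {
    private final Pattern pattern;
    private final String mask;

    PatternUnit(String regex, String mask) {
        this.pattern = Pattern.compile(regex);
        this.mask = mask;
    }

    @Override
    public String apply(String text) {
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(mask));
    }
}

// Dictionary-matching unit: masks whole-word, case-insensitive occurrences of listed terms.
class DictionaryUnit implements ScrubUnit {
    private final List<Pattern> terms = new ArrayList<>();
    private final String mask;

    DictionaryUnit(List<String> dictionary, String mask) {
        for (String term : dictionary) {
            String trimmed = term.trim();
            if (!trimmed.isEmpty()) {
                terms.add(Pattern.compile("\\b" + Pattern.quote(trimmed) + "\\b",
                        Pattern.CASE_INSENSITIVE));
            }
        }
        this.mask = mask;
    }

    @Override
    public String apply(String text) {
        String result = text;
        for (Pattern term : terms) {
            result = term.matcher(result).replaceAll(Matcher.quoteReplacement(mask));
        }
        return result;
    }
}

class ScrubPipelineSketch {
    // Runs the units in order; each unit sees the output of the previous one.
    static String run(List<ScrubUnit> units, String text) {
        String result = text;
        for (ScrubUnit unit : units) {
            result = unit.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<ScrubUnit> units = Arrays.<ScrubUnit>asList(
                new PatternUnit("\\b\\d{3}-\\d{2}-\\d{4}\\b", "[SSN]"),
                new DictionaryUnit(Arrays.asList("Smith", "Birmingham"), "[TERM]"));
        // Prints: Mr. [TERM], SSN [SSN], was seen in [TERM].
        System.out.println(run(units, "Mr. Smith, SSN 123-45-6789, was seen in Birmingham."));
    }
}
```

Because every unit shares the same small interface, adding a local rule or a new dictionary amounts to appending one more unit to the list.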

Troubleshooting

  • Try "Run as administrator" if the program cannot start.
  • If you encounter an OutOfMemoryError, try increasing the maximum Java heap size (e.g., -Xmx1g).
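
To confirm that a larger heap setting has taken effect, the JVM's limit can be read back with the standard `Runtime.maxMemory()` call. The `HeapCheck` class below is a hypothetical standalone helper, not part of PHIS.

```java
// Hypothetical helper for checking the effective maximum heap size of the JVM.
class HeapCheck {
    // Maximum amount of memory the JVM will attempt to use, in mebibytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it with the same flag you intend to pass to PHIS (for example, `java -Xmx1g HeapCheck` after compiling) shows the limit the JVM actually applies; the reported value can be somewhat below the requested size.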

Disclaimer

PHIS was developed for internal use. Some UAB-specific components are not included in the public release. Using the tool in other settings requires careful, comprehensive testing and adaptation. We accept no liability or accountability for any damages or losses resulting from use of the software.

References

  • Developer: Duy Duc An Bui (PhD)
  • Supervisor: James J. Cimino (MD)
  • Related publications:
  Bui DDA, Wyatt M, Cimino JJ. The UAB Informatics Institute and 2016 CEGS N-GRID de-identification shared task challenge. J Biomed Inform. 2017;75S:S54-S61.
  Bui DDA, Redden DT, Cimino JJ. Is multiclass automatic text de-identification worth the effort? Methods Inf Med. 2018;57(4):177-184.