Document Type

Conference Proceeding

Publication Date



Clinical health information systems capture massive amounts of unstructured data from health and medical facilities. This study uses unstructured patient clinical text to develop an intelligent assistant that identifies possible related diagnoses for a given text input. The approach applies one-vs-rest binary classification: for each diagnosis, the input text is classified as either positive or negative. A multi-layer feed-forward neural network model was developed for each individual diagnosis. The intelligent assistant iterates over all of the models and returns the diagnoses whose models output a positive result. To validate the models, their performance metrics were compared against those of Naive Bayes, Decision Tree, and K-Nearest Neighbor classifiers. The results show that the neural network learner achieved better scores in both accuracy and area under the curve (AUC). Further, testing on multiple diagnoses shows that the methodology for developing the diagnosis models can be replicated to develop models for other diseases.
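The assistant's loop over per-diagnosis models can be illustrated with a minimal sketch. It is not the study's implementation: `keyword_model` is a hypothetical stand-in for a trained feed-forward network, and the diagnosis names, keywords, `suggest_diagnoses` function, and 0.5 threshold are all assumptions chosen only to show the one-vs-rest control flow.

```python
from typing import Callable, Dict, List

# A binary model maps input text to a score in [0, 1] for one diagnosis.
BinaryModel = Callable[[str], float]


def keyword_model(keywords: List[str]) -> BinaryModel:
    """Hypothetical stand-in for a trained per-diagnosis network.

    Scores text by the fraction of the given keywords it contains.
    """
    def predict(text: str) -> float:
        tokens = set(text.lower().split())
        hits = sum(1 for kw in keywords if kw in tokens)
        return hits / len(keywords)
    return predict


def suggest_diagnoses(
    text: str,
    models: Dict[str, BinaryModel],
    threshold: float = 0.5,  # assumed decision threshold
) -> List[str]:
    """Run every one-vs-rest model and keep the diagnoses scored positive."""
    return [name for name, model in models.items() if model(text) >= threshold]


# Illustrative models only; the diagnoses and keywords are made up.
models = {
    "pneumonia": keyword_model(["cough", "fever"]),
    "hypertension": keyword_model(["headache", "dizziness"]),
}

result = suggest_diagnoses("patient reports cough and fever for three days", models)
print(result)
```

Because each diagnosis has its own independent binary model, a new diagnosis could be supported by adding one more entry to `models` without retraining the others, which is in the spirit of the replicability the abstract describes.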