This experiment uses data derived from MRI scans to diagnose schizophrenia, giving doctors a second opinion.
## Introduction ##

Over 200,000 people in the United States suffer from a somewhat obscure but extremely detrimental disorder known as schizophrenia. To an outside observer, those with the disorder seem out of touch with reality and act in a disorganized fashion that prevents them from performing daily activities. The root cause of schizophrenia is as yet unknown, though scientists believe that it results from a combination of factors such as genetics, the environment, and abnormal brain chemistry. Once it affects someone, it can last for many years or even an entire lifetime. However, with medication and intensive care, conditions can improve for many patients. To provide that treatment, though, the disease must first be diagnosed.

The primary goal of this project is to corroborate doctors’ diagnoses of schizophrenia. Because substance abuse, bipolar disorder, and other afflictions can produce symptoms similar to those of schizophrenia, MRI scans are taken to validate the doctor’s initial assessment of the type of disorder. Computers can also take the information gleaned from these scans into account and, using machine learning, provide confirmation of doctors’ diagnoses.

## Data ##

- The data is publicly available and was collected at the Mind Research Network by Dr. Vince Calhoun.
- Information from a total of 86 patients has been aggregated in the dataset.
- There are 2 types of data in the set:
  - Source-Based Morphometry (SBM) loadings, derived through independent component analysis (ICA) on gray-matter concentration, essentially show the “computational power” of the indicated regions of the brain, since gray matter, the brain’s outer sheet, is where signal processing takes place. Lower values indicate low gray-matter concentration and could point towards the presence of schizophrenia in a subject.
  - Functional Network Connectivity (FNC) values are functional modality features: they describe patterns in brain function. Derived from fMRI scans through Group Independent Component Analysis (GICA), they show the synchronicity between brain networks, providing connectivity patterns over time.

## Feature Selection ##

The original dataset has over 400 columns. Narrowing down the number of columns often improves both accuracy and efficiency, even though it entails losing some aspects of the data. Two different methods of feature selection were employed in this project.

The first method added a column of random numbers to the dataset. Azure ML has a module called “Permutation Feature Importance,” which provides a numerical value for how important a given column is in determining the end result. Any column with an importance value lower than that of the random-number column was removed.

The second method used Azure ML’s “Filter Based Feature Selection” module. The scoring method chosen for this module was “Mutual Information,” in which the algorithm measures how much a column reduces the uncertainty of the result and then eliminates the columns that do not benefit the experiment in this way. Reducing uncertainty is an extremely important part of diagnosing disorders like schizophrenia, because the reason to use a computer in the first place is to validate a doctor’s assessment of the problem.

## Results ##

The model reached an accuracy of about 80%, far better than a random-guessing baseline. With more training data or different types of data, this accuracy could perhaps be increased further. The algorithm is meant to give doctors a second opinion on the diagnosis.
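
The random-column method described under Feature Selection can be illustrated outside Azure ML with a small sketch. It is not the experiment itself: the data is synthetic, the nearest-centroid classifier is only a stand-in for whichever model the experiment trained, and every function name here (`fit_centroids`, `permutation_importance`, `select_with_random_probe`) is hypothetical. The idea shown is the one from the text: append a random column, score each column by how much accuracy drops when it is shuffled, and keep only the columns that score higher than the random one.

```python
import random

def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class label."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def accuracy(centroids, X, y):
    """Fraction of rows whose nearest centroid matches the true label."""
    def predict(x):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(centroids, X, y, rng, repeats=10):
    """Mean drop in accuracy when each column is shuffled across rows."""
    base = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(centroids, shuffled, y))
        importances.append(sum(drops) / repeats)
    return importances

def select_with_random_probe(X, y, rng):
    """Keep the columns whose importance beats an appended random column."""
    probed = [row + [rng.random()] for row in X]   # last column is the probe
    half = len(probed) // 2                        # first half trains, second half scores
    model = fit_centroids(probed[:half], y[:half])
    scores = permutation_importance(model, probed[half:], y[half:], rng)
    probe_score = scores[-1]
    kept = [j for j, s in enumerate(scores[:-1]) if s > probe_score]
    return kept, scores

# Synthetic stand-in data (NOT the MRI dataset):
# column 0 carries signal, columns 1 and 2 are pure noise.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[3.0 * label + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

kept, scores = select_with_random_probe(X, y, rng)
print("kept columns:", kept)
print("importances (last is the probe):", [round(s, 3) for s in scores])
```

Because the noise columns and the random column all score close to zero, a noise column can occasionally edge past the random one by chance. Averaging over more shuffles, or repeating the procedure with several random columns, would make the cutoff more stable.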