ABSTRACT

Over the past couple of years, data science has been a buzzword in the business and scientific worlds, and it has now also become a topic of strong interest in the healthcare industry. Data scientists increasingly use data mining techniques to extract knowledge hidden in large, complex, multi-dimensional datasets from a variety of domains. Moreover, the Internet, being the largest source of data, holds a rich collection of data that is not only heterogeneous but also unstructured, which makes it difficult to collect manually and to use in automated processes. In the last few years, however, several techniques have emerged that facilitate data collection and conversion to a structured format. This paper lists applications of data science in the healthcare industry. We also use a web scraping technique to extract data from HTML documents and collect it in R objects for analysis. A logistic regression model is built that predicts whether or not a patient has diabetes based on different parameters. Metrics are used to identify the significant parameters that help deduce the correct outcome, and the accuracy of the model is checked to gain confidence in it.