ABSTRACT

After the data are collected, entered, cleaned, and collated, they are ready for the rigors of analysis. This includes procedures such as obtaining correct estimates of various parameters (e.g., incidence and prevalence), constructing confidence intervals, testing statistical significance, and assessing the strength and type of relationship between characteristics. The actual methods depend on the nature of the data, the type of hypotheses to be examined, and the theoretical conjectures that form the foundation of the study.
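As a simple illustration of estimating a parameter such as prevalence together with a confidence interval, the sketch below uses hypothetical survey counts and the normal-approximation (Wald) interval; the figures and the choice of method are illustrative assumptions, not results from the chapter.

```python
import math

# Hypothetical survey: 120 cases of the condition among 800 subjects examined
cases, n = 120, 800

prevalence = cases / n                             # point estimate of prevalence
se = math.sqrt(prevalence * (1 - prevalence) / n)  # standard error of a proportion
z = 1.96                                           # normal quantile for 95% confidence

lower, upper = prevalence - z * se, prevalence + z * se
print(f"Prevalence = {prevalence:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```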

Statistical analysis of data can take several forms. For descriptive studies and for some analytical studies, the primary objective is to find the percentage of cases that have a particular outcome, or the average level of a medical measurement. This is called estimation. Such an estimate is accompanied by a confidence interval, which delineates the range of values beyond which the sample summary is unlikely to lie in repeated samples. Common methods of obtaining such intervals are provided in this chapter. The other important activity under data analysis is the test of hypothesis, whereby we assess how likely it is that the values obtained in our study could have arisen from a presumed population. This requires the concepts of P-value and power, which are also briefly discussed. The basic methods for testing significance (a term that seems to be on its way out), such as the chi-square test for qualitative data and Student's t-test for quantitative data, are presented. The next discussion is on regression, which is used to study the relationship between two or more characteristics. This includes both ordinary least squares regression, where the dependent variable is a quantitative measurement, and logistic regression, where the dependent variable is binary. This section also contains a brief explanation of the correlation coefficient. The methods for assessing cause–effect relationships and for validation of results are presented later in the chapter, and finally the statistical fallacies that so commonly arise in medical research are discussed.
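To make a few of the methods named above concrete, the sketch below applies them to invented data: a chi-square test on a 2×2 table of qualitative outcomes, a Student's t-test comparing a quantitative measurement between two groups, and a simple least-squares regression whose output also yields the correlation coefficient. The data and the use of scipy are illustrative assumptions rather than material from the chapter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Chi-square test for qualitative data: hypothetical 2x2 table of
# treatment group (rows) versus outcome (columns).
table = np.array([[30, 70],
                  [45, 55]])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)
print(f"Chi-square = {chi2:.2f}, P = {p_chi:.3f}")

# Student's t-test for quantitative data: hypothetical systolic BP in two groups.
group_a = rng.normal(loc=128, scale=12, size=40)
group_b = rng.normal(loc=134, scale=12, size=40)
t_stat, p_t = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, P = {p_t:.3f}")

# Ordinary least squares regression of a quantitative dependent on one
# predictor; rvalue is the Pearson correlation coefficient.
age = rng.uniform(30, 70, size=60)
bp = 100 + 0.6 * age + rng.normal(scale=8, size=60)
fit = stats.linregress(age, bp)
print(f"slope = {fit.slope:.2f}, r = {fit.rvalue:.2f}, P = {fit.pvalue:.3f}")
```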