Most introductory statistics courses focus on concepts and techniques for statistical inference, which allow researchers to quantify how strongly the available data support or contradict scientific hypotheses about the associations between covariates and outcomes of interest. Much less time is spent on the distinct problem of prediction, i.e., how to build a model using existing data that accurately predicts future outcomes based on individual characteristics. The prediction problem is of great interest in the health sciences; for example, the digitization of health records and the adoption of automated clinical decision support systems have made it possible for risk prediction algorithms to be tightly integrated into clinical care. At the same time, uncritical use of automated algorithms has the potential to increase health disparities. This course introduces key concepts and techniques for using and assessing prediction models for biomedical data. Students will learn how to use statistical machine learning models to predict binary outcomes (logistic regression, classification trees, support vector machines) and continuous outcomes (linear regression, regression trees, generalized additive models), and how to compare the performance of multiple models using cross-validation and sample splitting. Additional topics will include ensemble methods, feature selection, clustering, common pitfalls in the use of prediction models for biomedical data, and issues of algorithmic fairness as they relate to health equity. For the final project, students will build a prediction model and assess its performance using a dataset from a biomedical area that interests them. Methods will be illustrated and implemented in R.