BMI 209 - Statistical Data Mining & Analysis of Microarray Data
Fall 2005, 1 unit

Course coordinators:
Ru-Fang Yeh <rufang@biostat.ucsf.edu>
Jane Fridlyand <jfridlyand@cc.ucsf.edu>

Office Hours: By appointment.

Objective: This course will introduce, illustrate and evaluate a variety of statistical data mining methods used in large-scale genomic experiments, with an emphasis on applications to DNA microarrays. The techniques covered correspond to commonly encountered research questions and study designs. They include preprocessing/normalization of expression array data, exploratory data analysis, hypothesis testing, linear models, clustering, discrimination, prediction, hidden Markov models, and annotation with gene ontology and sequence information. The course will feature extensive discussion and illustration of the data mining techniques covered in the text “The Elements of Statistical Learning” by Hastie et al. (2001, Springer), along with associated computational tools and resources. A brief overview of microarray technology is included, as is a discussion of recent array-related developments and extensions.

Upon completion of the course, students will be able to:
1. Perform basic microarray data analyses.
2. Identify and use relevant resources (genomic data and tools) for their own research.
3. Assess genomic data analyses performed by others.
Topics:
9/15 Lecture 1: Introduction to genomics and microarray technology [Yeh]
9/22 Lecture 2: Introduction to statistics and microarray analysis [Yeh]
  - Summary statistics and exploratory data analysis methods
  - Clustering: partitioning and hierarchical methods
  - Sources of variability and experimental design
  - Data preprocessing for expression arrays
9/29 Lecture 3: Hypothesis testing and linear models [Yeh]
  - Two-sample statistics
  - Introduction to linear models for factorial experiments
  - Multiple testing issues
10/6 Lecture 4: Classification I [Fridlyand]
  - Linear methods for classification (Hastie Ch. 4)
  - Linear discriminant analysis and variations
  - Logistic regression
10/13 Lecture 5: Classification II [Fridlyand]
  - Tree-based methods (Hastie Ch. 9)
  - Ensembles: bagging, boosting, random forests (Hastie Ch. 10)
10/20 Lecture 6: Classification III [Segal]
  - Support vector machines (Hastie Ch. 12)
  - Neural networks (Hastie Ch. 11)
10/27 Lecture 7: Model selection [Fridlyand / Segal]
  - Bias, variance and model complexity (Hastie Ch. 7)
  - Model search (forward/backward/stochastic)
  - Model selection criteria: AIC/BIC
  - Cross-validation and performance assessment
  - Application to estimating the number of clusters
11/3 Lecture 8: Regression [Segal]
  - Penalization and selection (Hastie Ch. 3)
  - Continuous and survival endpoints
11/10 Lecture 9: Hidden Markov models & functional annotation [Yeh]
  - HMM, GHMM, paired HMM and their applications in genomics
  - Functional annotation and motif finding for groups of co-expressed genes
11/17 Lecture 10: Case studies: “other” arrays [Fridlyand/Yeh]
  - Array CGH
  - SNP arrays
11/24 Thanksgiving holiday
12/1 Student presentations

Grades:
30% - classroom participation
70% - oral/poster presentation of a student project: a case study of your own data or a re-analysis of publicly available data.

Textbook: The Elements of Statistical Learning by T. Hastie, R. Tibshirani, and J. H. Friedman. 2001. Springer.

Recommended readings: