BMI 209 - Statistical Data Mining & Analysis of Microarray Data

Fall 2005, 1 unit


Course website: http://www.biostat.ucsf.edu/cbmb/bmi209

Lectures: Thursdays 2-3:30pm, Genentech Hall S201 (Mission Bay campus)

Instructors: CBMB faculty <cbmb@biostat.ucsf.edu>

(Ru-Fang Yeh, Jane Fridlyand, Mark Segal) 

Course coordinators: Ru-Fang Yeh <rufang@biostat.ucsf.edu>
                                       Jane Fridlyand <jfridlyand@cc.ucsf.edu>


Office Hours: By appointment.

Objective:  This course will introduce, illustrate and evaluate a variety of statistical data mining methods employed in the context of large-scale genomic experiments, with an emphasis on applications to DNA microarrays. Techniques covered correspond to commonly encountered research questions and study designs, and include preprocessing/normalization of expression array data, exploratory data analysis, hypothesis testing, linear models, clustering, discrimination, prediction, hidden Markov models and annotation with Gene Ontology and sequence information. The course will feature extensive discussion and illustration of data mining techniques covered in the text “The Elements of Statistical Learning” by Hastie et al. (2001, Springer), along with associated computational tools and resources. A brief overview of microarray technology is included, as is discussion of recent array-related developments and extensions.

Upon completion of the course students will be able to:
    1.  Perform basic microarray data analyses.
    2.  Identify and use relevant resources (genomic data and tools) for their own research.
    3.  Assess genomic data analyses performed by others.

Topics:

9/15    Lecture 1: Introduction to genomics and microarray technology [Yeh]

9/22    Lecture 2: Introduction to statistics and microarray analysis [Yeh]
           - Summary statistics and exploratory data analysis methods
           - Clustering: partitioning and hierarchical methods
           - Sources of variability and experimental design
           - Data preprocessing for expression arrays

9/29    Lecture 3: Hypothesis testing and Linear Models [Yeh]
           - Two-sample statistics
           - Introduction to linear models for factorial experiments 
           - Multiple testing issues

10/6    Lecture 4: Classification I [Fridlyand]
           - Linear methods for classification (Hastie Ch.4)
           - Linear discriminant analysis and variations
           - Logistic regression

10/13  Lecture 5: Classification II [Fridlyand]
           - Tree-based methods (Hastie Ch.9)
           - Ensembles: bagging, boosting, random forests (Hastie Ch.10)

10/20  Lecture 6: Classification III [Segal]
           - Support vector machines (Hastie Ch.12)
           - Neural networks (Hastie Ch.11)
                       
10/27  Lecture 7: Model selection [Fridlyand / Segal]
           - Bias, variance and model complexity (Hastie Ch.7)
           - Model search (forward/backward/stochastic)
           - Model selection criteria: AIC/BIC
           - Cross validation and performance assessment
           - Application to estimating the number of clusters

11/3    Lecture 8: Regression [Segal]
           - Penalization and selection (Hastie, Ch. 3)
           - Continuous and survival endpoints

11/10  Lecture 9: Hidden Markov models & Functional annotation [Yeh]
           - HMM, GHMM, paired HMM and their applications in genomics
           - Functional annotation and motif finding for groups of co-expressed genes

11/17  Lecture 10: Case Studies: “other” arrays [Fridlyand/Yeh]
           - Array CGH
           - SNP arrays

11/24  Thanksgiving holiday

12/1    Student presentations

Grades: 
    30% - classroom participation
    70% - oral/poster presentation of a student project: case study of your own data or re-analysis of publicly available data.


Textbook:
    The Elements of Statistical Learning by T. Hastie, R. Tibshirani, J. H. Friedman. 2001. Springer.

Recommended readings:

  1. Bioinformatics and Computational Biology Solutions using R and Bioconductor, edited by R. Gentleman et al. 2005. Springer.
  2. Statistical Analysis of Gene Expression Microarray Data, edited by T.P. Speed. 2003. Chapman & Hall/CRC.
  3. A Primer of Genome Science by G. Gibson & S.V. Muse. 2001. Sinauer Associates.