http://epibiostat.ucsf.edu/biostat/res_prog.html
UCSF
Department of Epidemiology & Biostatistics

Division of Biostatistics

Research Program Archives

Upcoming Research Program meetings are now listed on the Seminar page.

August 21, 2008

Naisyin Wang

Texas A&M University

Flexible Modeling of Longitudinal Response/Covariate Observations

The analysis of hierarchical biomedical data sometimes requires more modeling flexibility than can be provided by standard parametric approaches. It is commonly believed that the main effect of ignoring covariance structure is a loss of efficiency. In this talk, I will discuss some recently developed flexible semiparametric models for longitudinal observations. I will use numerical outcomes and examples to illustrate some potential concerns when one ignores the correlations in longitudinal measurements. A less well-known fact is that serious biases can be induced by ignoring correlations in the longitudinal covariate observations. Modeling considerations in the use of functional principal component analysis (FPCA) in a recently developed latent-feature regression model will also be discussed.

June 26, 2008

Gang Li

Department of Biostatistics, UCLA

Joint Modeling of Longitudinal and Competing Risks Survival Data

In this talk I will review some recent developments in joint modeling of longitudinal data and competing risks survival data. By modeling the event time jointly, the analysis of longitudinal measurements is adjusted for non-ignorable missing data due to informative dropout that cannot always be handled appropriately by the standard linear mixed effects models alone. Joint models utilize information from both outcomes, and thus can be more efficient. In addition, a joint model enables one to make joint inference on both endpoints, which is often desired in the analysis of clinical trials. Our models allow for more than one type of failure and provide a simple means to handle dependent censoring. New methods for dealing with outliers and heterogeneous random effects will also be discussed. The methods will be illustrated using data from a clinical trial for scleroderma lung disease.

June 17, 2008

Joan Hilton

Division of Biostatistics, UCSF

Noninferiority trial designs for binomial rate differences and odds ratios

For noninferiority trials in which binomial response rates differ under H_0 and are equal under H_A, we show that the marginal response rate and the noninferiority margin are convenient design parameters. We also show that the minimum overall sample size, N, and optimal allocation ratio associated with fixed type-1 and type-2 error rates depend on how the margin is parameterized. Since investigators commonly use the difference between experimental and control response rates (delta) for design and the odds ratio (psi) for analysis, we examine the effects on sample size and power of switching parameterizations of the margin. We also model the sample-size ratio, N_delta/N_psi, as a function of a wide range of design parameters; the regression estimates from this model can be used at the design stage to identify pairs for which the margin's parameterization should not be interchanged between design and analysis. Finally, we discuss ways to quantify the unknown marginal response rate.
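To see why the parameterization of the margin matters, the following minimal sketch converts a rate-difference margin (delta) into the odds-ratio margin (psi) it implies at a given control response rate. The function names, the choice of the control rate as the anchor (the talk uses the marginal response rate), and the numeric values are illustrative assumptions, not the speaker's method.

```python
def odds(p):
    """Odds corresponding to a probability p in (0, 1)."""
    return p / (1.0 - p)


def implied_odds_ratio_margin(control_rate, delta):
    """Odds-ratio margin implied by a rate-difference margin.

    Assumes (for illustration only) that under H_0 the experimental
    rate equals control_rate - delta, i.e. it is worse by delta.
    """
    experimental_rate = control_rate - delta
    if not 0.0 < control_rate < 1.0 or not 0.0 < experimental_rate < 1.0:
        raise ValueError("rates must lie strictly between 0 and 1")
    return odds(experimental_rate) / odds(control_rate)


# The same delta = 0.10 maps to a different psi at each control rate.
for rate in (0.5, 0.7, 0.9):
    psi = implied_odds_ratio_margin(rate, 0.10)
    print(f"control rate {rate:.1f}: psi = {psi:.3f}")
```

Because a fixed delta corresponds to a different psi at each response rate, a margin chosen on one scale is not automatically equivalent to a margin on the other.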

May 20, 2008

Ying Lu

Department of Radiology, UCSF

"Who Should Take Which Test" — A Recursive Tree Algorithm for Choosing the Optimum Diagnostic Strategy

Several diagnostic tests are commonly available to clinicians. For example, both dual X-ray absorptiometry (DXA) and quantitative ultrasound (QUS) can be used to diagnose osteoporosis so patients can receive treatments to prevent hip fracture. In general, these diagnostic tests vary in cost, and their diagnostic accuracy in predicting adverse outcomes (such as hip fracture) depends on subject characteristics. In this talk, we present a new recursive partitioning tree-structured algorithm to determine "who should take which test" according to freely available, easily collected risk factors. Our algorithm compares four possible actions based on two diagnostic tests (test 1 and test 2): (A0) no diagnostic test and no treatment (sufficiently low risk of disease); (A1) applying diagnostic test 1 and treating only those with positive test results; (A2) applying diagnostic test 2 and treating only those with positive test results; and (A3) treating all subjects without any diagnostic tests. The algorithm assigns subjects to one of the four action groups according to their answers to sequential binary questions about their observed risk factors. For continuous or ordered variables, the question is whether a subject has a value above or below a threshold. For categorical variables, the question is whether a subject belongs to one category. The risk factors utilized in the tree, the corresponding thresholds, and the order of questions are determined by the data. The splitting criterion is to maximize the gain in cost-effective difference (CED), which is the difference between quality-adjusted-life-year (QALY) gain (in $) and incremental diagnostic and treatment cost, in comparison to the strategy of the parent node. Pruning is based on cross-validation. Cost parameters and the discount of QALY can be obtained from published literature and are assumed to be independent of risk factors.
The joint distribution of time to adverse outcome and the subsequent time to death with and without treatment depends on risk factors and should be estimated by different partitioning choices. We provide non-parametric estimation procedures to determine these distributions and CED. The proposed method is applied to determine who should receive DXA versus QUS tests and be treated with Alendronate to prevent hip fractures based on age, height reduction from youth, weight, body mass index, walking speed, etc. Sensitivity analysis also shows the ranges of cost parameters, treatment efficacy, and acceptable cost in dollars of one QALY gain within which the resulting decision tree remains appropriate. This method can be applied to large-scale cohort studies or treatment clinical trials with multiple diagnostic tests as ancillary components to determine the optimum cost-effective combination of diagnostic tests and treatments.

Joint work with Caixia Li.
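As a rough illustration of the cost-effective difference (CED) criterion described in the abstract, the sketch below scores the four actions for a single group of subjects and picks the one with the largest CED. The class, function names, and all numeric values are hypothetical; the sketch omits the recursive splitting, pruning, and non-parametric estimation steps and is not the speakers' implementation.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    qaly_gain: float   # expected QALY gain per subject vs. the reference strategy
    extra_cost: float  # incremental diagnostic + treatment cost per subject ($)


def ced(action, dollars_per_qaly):
    """Cost-effective difference: QALY gain valued in dollars minus incremental cost."""
    return dollars_per_qaly * action.qaly_gain - action.extra_cost


def best_action(actions, dollars_per_qaly):
    """Return the action with the largest CED for one group of subjects."""
    return max(actions, key=lambda a: ced(a, dollars_per_qaly))


# Hypothetical per-subject values for a single node of the tree.
node_actions = [
    Action("A0: no test, no treatment", 0.000, 0.0),
    Action("A1: test 1, treat positives", 0.020, 600.0),
    Action("A2: test 2, treat positives", 0.015, 300.0),
    Action("A3: treat all", 0.025, 1500.0),
]

choice = best_action(node_actions, dollars_per_qaly=50_000)
print(choice.name, ced(choice, 50_000))
```

In a full tree, a comparison of this kind would be repeated for each candidate split, with the QALY gains and costs estimated from data within each child node.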

April 22, 2008

Diana Miglioretti

Group Health Center for Health Studies, Seattle

Misleading marginal analyses in settings where random effect variances depend on covariates

Clustered data are commonly collected in medical studies and typically analyzed using either marginal or conditional modeling approaches to account for potential correlation induced by unmeasured heterogeneity among clusters. In many cases, it is reasonable to expect that the magnitude of this heterogeneity may depend on a covariate that varies either between or within clusters. We show that when a covariate influences both the conditional mean and the random effect variance, marginal analyses may provide misleading results, suggesting there is no covariate effect or even an effect in the opposite direction of the conditional effect. Conditional models that falsely assume a constant random effect variance may also provide biased estimates. We use simulations to show that this bias decreases as the cluster sizes get larger when the random effect variance depends on between-cluster covariate, but that the bias remains regardless of cluster size when the random effect variance depends on a within-cluster covariate. For conditional models we show how to accommodate non-constant (either between or within cluster) random effects variances. We illustrate our findings using data from the Breast Cancer Surveillance Consortium to examine the effect of radiologist experience on the interpretive performance of mammography.

Joint work with Sebastien JPA Haneuse, Charles McCulloch and John Neuhaus.
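The reversal described in the abstract can be reproduced numerically in a simple case. The sketch below assumes a random-intercept logistic model with a normally distributed random effect whose standard deviation depends on a binary covariate x; the model form, parameter values, and function names are illustrative assumptions rather than the authors' simulation setup.

```python
import math


def expit(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def marginal_prob(linear_predictor, sigma, n_points=4001, half_width=8.0):
    """E[expit(linear_predictor + sigma * Z)] for Z ~ N(0, 1), by midpoint rule."""
    step = 2.0 * half_width / n_points
    total = 0.0
    for i in range(n_points):
        z = -half_width + (i + 0.5) * step
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += expit(linear_predictor + sigma * z) * density * step
    return total


# Hypothetical values: positive conditional effect, larger variance when x = 1.
beta0, beta1 = 1.0, 0.5
sigma_by_x = {0: 0.5, 1: 3.0}

for x in (0, 1):
    conditional = expit(beta0 + beta1 * x)  # probability for a cluster with b = 0
    marginal = marginal_prob(beta0 + beta1 * x, sigma_by_x[x])
    print(f"x={x}: conditional {conditional:.3f}, marginal {marginal:.3f}")
```

With these made-up values the conditional probability rises with x while the population-averaged probability falls, because the larger random effect variance at x = 1 pulls the marginal probability toward 0.5.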

March 18, 2008

Ru-Fang Yeh

Division of Biostatistics and CBMB, UCSF

Statistical Inference of Gene regulatory modules from linked multi-level molecular profiling data

As high-throughput biotechnology matures and becomes ubiquitous for monitoring various molecular changes, many studies have begun to generate linked genomic data that simultaneously profile events during the multi-step process of gene expression. There are well-established analytic methods for typical problems using one microarray platform at a time, such as differential expression analysis of transcriptome arrays and DNA aberration detection by array comparative genomic hybridization (array CGH). However, integrative analysis of linked genomic data is largely under-explored and limited to anecdotal successes. Using brain tumors as an example, I will discuss statistical challenges and open problems arising from the hierarchical nature and networked dependency of such data, and present some preliminary results on identifying microRNA-regulated gene modules from mRNA expression and array CGH data using gene set tests and simple linear models.

February 19, 2008

Steve Gregorich

Department of Medicine and CAPS, UCSF

REPEATED MEASURES MODELS WITH MULTIPLE, CORRELATED RANDOM EFFECTS

I will discuss two types of random effects models for repeated measures, with example applications to data from a prospective study of women with non-cancerous uterine conditions. First, I will describe an associative latent growth model, which allows for estimation of covariation between temporal changes (trajectories) in multiple dimensions. Such models are illustrated using longitudinal assessments of women's self-reported sexual, physical, and mental health (e.g., are changes in sexual functioning associated with changes in mental health status?). Second, I will describe spline models of pre- and post-hysterectomy health-related quality of life (HRQOL) trajectories as well as the 'instantaneous' HRQOL change (or 'bump') attributable to the surgical intervention. These models include correlated random intercept, trajectory, and 'bump' effects, which address interesting research questions: e.g., are women's pre-surgical HRQOL trajectories associated with the 'instantaneous' HRQOL changes that are attributable to the surgical intervention?

January 29, 2008

Kevin Delucchi

Department of Psychiatry, UCSF

Capturing Group Membership via Growth Mixture Models: A Simulation Study

This talk presents the results of a series of simulations examining the ability of latent growth mixture models (GMM) to capture group membership and intercept and slope parameters. The basic model was that of a common design: a longitudinal study with four equally spaced assessment points. Underlying the observed data were two known populations from which the observed data were generated. We then fit a two-class GMM to the observed data, assigned subjects to their most likely class, and compared that assignment to their true membership and the estimates of the intercept and slope to population values. A total of 56 conditions were simulated from a 7 x 2 x 2 x 2 factorial design with 1000 samples per condition: 7 levels of degree of imbalance of sample sizes, 2 differences in intercept means, 2 differences in slope means, and 2 levels of residual variance. This was conducted for total Ns of 300 and 900 with uneven numbers per group.

Focusing on the extreme effect-size conditions, the percentage correctly classified ranged from 58% to 88%. When the effects for slope and intercept were large, the percentage correctly classified for the larger group increased as the N in that group increased. For the smaller group, as the N declined from 420 to 300, the correct classification rate dropped, but it then increased as the N declined further, reaching a high of only 72%. As the sample size of the larger of the two groups increased, the mean estimated intercept approached the true value of 0. The mean estimates for the second class under the large effect approached the true value of 1.0 until the N declined to 240; then, as the sample size declined further, the estimate moved back away from the true value. For the large effect case, the mean estimates of slope approached the true value of 0.5 as the N increased for the larger group. For the smaller group, the mean estimated slope initially approached the true value of -0.5 but never reached it and moved away from it in the smallest sample size condition.

These results raise concerns about the quality of results based on GMMs. This brings into question results from applied analyses in which study participants are assigned to their most likely latent group and then are compared on covariates. Further details, implications and future plans will be presented.
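The 7 x 2 x 2 x 2 layout of the simulation conditions described above can be enumerated in a few lines. In this sketch the factor names and level labels are placeholders, since the abstract gives only the number of levels for each factor.

```python
from itertools import product

# Level labels are placeholders; the abstract gives only the number of levels.
factors = {
    "imbalance": [f"imbalance_{i}" for i in range(1, 8)],  # 7 levels
    "intercept_diff": ["small", "large"],                  # 2 levels
    "slope_diff": ["small", "large"],                      # 2 levels
    "residual_variance": ["low", "high"],                  # 2 levels
}

# One dict per design cell, mapping each factor name to its level.
conditions = [dict(zip(factors, combo)) for combo in product(*factors.values())]
samples_per_condition = 1000

print(len(conditions))                          # number of design cells
print(len(conditions) * samples_per_condition)  # simulated data sets per total N
```

Each entry of `conditions` could then be passed to a data-generating routine and a two-class GMM fit, which are not shown here.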

January 8, 2008

Dennis Osmond

Department of Epidemiology & Biostatistics, UCSF

An overview of the uses of propensity scores, propensity score weights, and instrumental variables

December 4, 2007

Mei Polley

Department of Neurosurgery, UCSF

Two-Stage Designs for Dose-Finding Trials with a Biologic Endpoint Using Stepwise Tests

We tackle the problem of early phase dose-finding trials with monotone biologic endpoints such as biologic measurements and laboratory values. A specific aim of this type of trial is to identify the minimum dose that exhibits adequate drug activity and shifts the mean of the endpoint from a zero dose, the so-called minimum effective dose. Stepwise tests for dose-finding have been well studied in the context of non-human studies where the sampling plan is done in one stage. We extend the notion of stepwise tests to a two-stage setting in an attempt to reduce the sample size requirement by shutting down unpromising doses in a futility interim. Specifically, we examine four two-stage designs and apply them to design a statin trial with four doses and a placebo in patients with Hodgkin's disease. We discuss the calibration of the design parameters and the implementation of these methods.

Joint work with Ken Cheung, Department of Biostatistics, Columbia University.

November 6, 2007

Rebecca Scherzer

Metabolism Section, VAMC

Closed testing procedures for group sequential clinical trials with multiple survival endpoints

Clinical trials often involve multiple survival endpoints with group sequential monitoring, but most studies specify a primary outcome, following a univariate approach and ignoring multiplicity. This research gives methods for such data. We illustrate the use of marginal proportional hazards models with a Lan and DeMets (1983) type alpha spending function to test multiple survival endpoints at K interim analyses. To adjust for multiplicity at each interim analysis, we consider and extend methods developed by Tang and Geller (1999) and Follmann, et al. (1994). These methods are motivated using survival data from a clinical study of primary biliary cirrhosis. Type I error and power are examined using simulation studies.
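For readers unfamiliar with alpha spending, the sketch below evaluates two commonly used spending functions of the Lan and DeMets type at K equally spaced interim analyses. The abstract does not say which spending function was used, so the choice of these two forms, the value K = 4, and alpha = 0.05 are assumptions made for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obf_type_spending(t, alpha=0.05):
    """O'Brien-Fleming-type spending: 2 - 2 * Phi(z_{alpha/2} / sqrt(t))."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must be in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * _STD_NORMAL.cdf(z / math.sqrt(t))


def pocock_type_spending(t, alpha=0.05):
    """Pocock-type spending: alpha * ln(1 + (e - 1) * t)."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must be in (0, 1]")
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


# Cumulative type I error spent at K = 4 equally spaced interim analyses.
K = 4
for k in range(1, K + 1):
    t = k / K
    print(f"t={t:.2f}  OBF-type {obf_type_spending(t):.5f}  "
          f"Pocock-type {pocock_type_spending(t):.5f}")
```

Both functions spend the full alpha at t = 1; the O'Brien-Fleming-type function spends very little early, while the Pocock-type function spends more evenly across analyses.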

October 16, 2007

Mark Segal

Division of Biostatistics and CBMB, UCSF

Re-Cracking the Second Genetic Code

In a recent, widely celebrated computational biology paper, Segal et al. (Nature, 2007) provide extensive evidence supporting the existence of a second genetic code embodied in DNA. This second code pertains to the positioning of nucleosomes (the fundamental repeating subunits of all eukaryotic chromatin), which are responsible for packaging DNA into chromosomes inside the cell nucleus and controlling gene expression. Here, we re-evaluate both the basis for, and performance of, the proposed nucleosome positioning code. Tools employed in this process include the spectral envelope and discriminatory motif finding.

May 8, 2007

Mary Lesperance

Department of Mathematics & Statistics, University of Victoria

GRAPHICAL TECHNIQUES FOR GENE EXPRESSION STUDIES

Correspondence analysis (CA) is a descriptive technique designed for investigating the association between row and column variables by graphically displaying the patterns in the data. It has been widely applied to categorical data. We explore and develop variations of CA techniques to identify differentially expressed genes and to assess the quality of replicate DNA arrays.

Multiple correspondence analysis (MCA) and a related technique called joint correspondence analysis (JCA) are methods for visualizing the joint features of 2 or more categorical variables. We have been working with the Genetic Pathology Evaluation Centre (GPEC) at UBC and the Breast Outcomes Unit (BCOU) at the B.C. Cancer Agency (BCCA) to study relationships between molecular markers and outcomes for breast cancer. Molecular markers and diagnostic variables are typically categorized as positive/negative by pathologists and oncologists, whereas outcome measures such as time to recurrence or breast cancer specific survival time are continuous and possibly censored. We consider fuzzy coding methods to display survival information in an MCA analysis of molecular markers.

March 13, 2007

Chuck McCulloch

Professor and Division Head of Biostatistics, UCSF

The good, the bad, and the ugly of joint modeling using shared and correlated random effects

Multiple outcomes, often with very different marginal distributions, are common in studies in a variety of scientific fields. But direct specification of joint distributions is difficult and has led many to consider building correlated data models through conditional independence along with shared or correlated random effects. After a brief review of these models and some motivating examples, I elaborate the consequences, good and bad, of this type of model specification. An example of a joint model is illustrated using data from the Osteoarthritis Initiative.

February 22, 2007

Yu Shen

Department of Biostatistics, M. D. Anderson Cancer Center

Inference of Tamoxifen's Effects on Prevention of Breast Cancer from a Randomized Controlled Trial

Breast cancer is the most common non-skin cancer among women in the United States, and continues to be an important cause of morbidity and mortality for women at high risk of developing the disease. The advent of preventive intervention and early detection of cancer brings greater hope to the control of breast cancer, while also posing significant challenges to researchers and public health policy makers. To provide quantitative frameworks that describe the natural history of breast cancer and assess the impact of the primary preventive intervention on the natural progression of the disease, we propose a flexible semiparametric model to assess the effects of a preventive agent on the incidence of breast cancer as well as time to the diagnosis of the disease, separately, in the framework of a cure-rate model. We used an estimating equation approach to estimate the unknown parameters, and assessed the semiparametric model assumption with a test based on the area between two survival curves. This is joint work with Qin and Costantino.

February 13, 2007

Michael Rosenblum

Division of Biostatistics, UCB

Latex Diaphragms In Preventing HIV Among Women:
Statistical Issues in the Methods for Improving Reproductive Health in Africa (MIRA) trial, Conducted Jointly by UCSF and University of Zimbabwe

The MIRA trial is a randomized, controlled trial with two arms, in which a primary intervention (diaphragms and gel) is given only to the treatment arm, and a secondary intervention (condom provision and counseling) is given to both treatment and control arms. In this setting, the standard intent to treat analysis, which compares the mean outcomes in the treatment and control arms, gives an unbiased estimate of the causal effect of assignment to the diaphragm and gel arm, in the presence of condom counseling. However, it may be of more public health interest to estimate the effectiveness of the primary intervention in the absence of the secondary intervention. We attempt to estimate a related parameter: what the causal effect of assignment to the diaphragm and gel arm would be if condom use were set at a fixed level. We describe how we implement a direct effects analysis to estimate this parameter from data collected in the MIRA trial.

January 16, 2007

Ying Lu and Caixia Li

Department of Radiology, UCSF

The added utility of a variable in the presence of other covariates

September 22, 2005

Su-Chun Cheng

Associate Professor of Biostatistics, UCSF

COMBINING MULTIPLE DIAGNOSTIC TESTS WITH NONPARAMETRIC TRANSFORMATION MODELS FOR CLASSIFYING CENSORED EVENT TIMES

November 17, 2004

Kevin Delucchi

Associate Adjunct Professor of Psychiatry, UCSF

LATENT PATTERNS OF CHANGE IN LONGITUDINAL DRUG ABUSE RESEARCH

November 3, 2004

Saunak Sen

Assistant Professor of Biostatistics, UCSF

QUANTITATIVE TRAIT MAPPING STUDY DESIGNS FROM AN INFORMATION PERSPECTIVE

June 2, 2004

John Witte

Professor of Epidemiology & Biostatistics, and Urology, UCSF

STATISTICAL ISSUES IN FAMILY STUDIES

May 5, 2004

John Kornak

UCSF Department of Radiology and Department of Epidemiology & Biostatistics / VA Medical Center Magnetic Resonance Unit

IMPROVING THE EFFECTIVE RESOLUTION OF LOW SIGNAL MAGNETIC RESONANCE IMAGING MODALITIES BY INCORPORATING HIGH RESOLUTION STRUCTURAL INFORMATION

April 21, 2004

Tor Tosteson

Associate Professor of Community and Family Medicine (Biostatistics), Biostatistics Director, MCRC/SPORT, Dartmouth Medical School

CHANGES IN FUNCTIONAL HEALTH STATUS ASSOCIATED WITH TREATMENT FOR LUMBAR SPINE DISORDERS: Methods for longitudinal analysis

Statistical methods for longitudinal data analysis are proposed to provide estimates of change in patient outcomes associated with treatment for disc herniation and spinal stenosis in two observational studies with longitudinal follow-up. Special statistical issues include nonlinear trends, unequal timing of visits, variable timing for surgical treatment, strong regression to the mean, and potentially biased follow-up rates. Surgically treated patients show greater functional health status gains than non-surgically treated patients, although some inconsistencies are noted between the two studies. The methods and results are discussed in the context of the ongoing Spine Patients Outcomes Research Trial (SPORT).

April 7, 2004

Su-Chun Cheng

Associate Professor of Biostatistics, UCSF

JOURNAL CLUB

Margaret S. Pepe, Tianxi Cai, and Zheng Zhang, "Robust Binary Regression for Optimally Combining Predictors" (April 18, 2003). UW Biostatistics Working Paper Series. Working Paper 198.

March 24, 2004

Chuck McCulloch

Professor and Head of Biostatistics, UCSF

REPEATED MEASURES LOGISTIC REGRESSION: MARGINAL AND CONDITIONAL MODELS

March 10, 2004

Eric Vittinghoff

Associate Adjunct Professor of Biostatistics, UCSF

JOURNAL CLUB

NP Jewell and MJ van der Laan (2002). Current Status Data: Review, Recent Developments and Open Problems.

February 25, 2004

Nancy Hills

Department of Epidemiology, UC Berkeley

STATISTICAL ISSUES IN THE ANALYSIS OF DATA FROM STUDIES OF HUMAN PAPILLOMA VIRUS

January 28, 2003

Peter Gilbert

Statistical Center for HIV/AIDS Research & Prevention (SCHARP)

SENSITIVITY ANALYSES COMPARING OUTCOMES MEASURED ONLY IN A SUBSET SELECTED POST-RANDOMIZATION, WITH APPLICATION TO HIV VACCINE TRIALS

January 14, 2003

John Neuhaus

Professor of Biostatistics, UCSF

JOURNAL CLUB DISCUSSION OF TWO PAPERS ON RESPONSE-SELECTIVE SAMPLING DESIGNS

J. Scott and C. J. Wild. (1997). Fitting Regression Models to Case-Control Data by Maximum Likelihood. Biometrika 84:57-71.

J. F. Lawless, J. D. Kalbfleisch, C. J. Wild. (1999). Semiparametric methods for response-selective and missing data problems in regression. Journal of the Royal Statistical Society, Series B 61:413-438.

December 3, 2003

Ying Lu

Associate Adjunct Professor of Radiology, UCSF

THE OPTIMAL COMBINATION OF MULTIPLE DIAGNOSTIC VARIABLES

November 19, 2003

John Neuhaus

Professor of Biostatistics, UCSF

THE ANALYSIS OF CLUSTERED DATA WITH RESPONSE DEPENDENT CLUSTER SIZES

November 5, 2003

John Kornak

UCSF/VA Medical Center Magnetic Resonance Unit

ISSUES IN THE STATISTICAL ANALYSIS OF fMRI DATA

October 22, 2003

Steve Gregorich

Assistant Adjunct Professor of Medicine/DGIM, UCSF

FITTING MIXED LOGIT MODELS VIA ADAPTIVE QUADRATURE (PART II): BIAS AND COVERAGE OF FIXED AND RANDOM PARAMETER ESTIMATES

October 8, 2003

Eric Vittinghoff

Associate Adjunct Professor of Biostatistics, UCSF

A COST-EFFICIENT CASE-ONLY METHOD FOR EXPLORATORY ANALYSIS OF TREATMENT-COVARIATE INTERACTIONS IN RANDOMIZED TRIALS WITH FAILURE-TIME ENDPOINTS

September 24, 2003

Su-Chun Cheng

Associate Professor of Biostatistics, UCSF

SEMIPARAMETRIC REGRESSION ANALYSIS OF MEAN RESIDUAL LIFE WITH CENSORED SURVIVAL DATA

September 10, 2003

Hua Jin

UCSF Department of Radiology

TREE STRUCTURED SURVIVAL ANALYSIS

May 21, 2003

Saunak Sen

Assistant Professor, CBMB, UCSF

A DISCUSSION ON FALSE DISCOVERY RATES

based on the following three papers:

Yoav Benjamini, Yosef Hochberg. (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57(1):289-300.

Joel Ira Weller, Jiu Zhou Song, David W. Heyen, Harris A. Lewin, and Micha Ron. (1998) A New Approach to the Problem of Multiple Comparisons in the Genetic Dissection of Complex Traits. Genetics 150:1699-1706.

John D. Storey. (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B 64(3):479-498.

April 30, 2003

Peter Bacchetti

Adjunct Professor of Biostatistics, UCSF

SURVIVAL REGRESSION METHODS FOR VERY SMALL STUDIES

April 2, 2003

Steve Gregorich

Assistant Adjunct Professor, DGIM, UCSF

AN ASSESSMENT OF THE PERFORMANCE OF MIXED-EFFECT LOGISTIC MODELS WITH AN EMPHASIS ON CONVERGENCE, BIAS, AND COVERAGE

February 19, 2003

Joan Hilton

Associate Professor of Biostatistics, UCSF

USE OF BASELINE VALUES IN LINEAR MIXED EFFECTS MODELS

February 5, 2003

Peter Bacchetti

Adjunct Professor of Biostatistics, UCSF

THE ROLES OF AGE AND EXTRAPOLATION IN PROJECTING THE EPIDEMIC OF MAD COW DISEASE IN HUMANS