The Biostatistics High Performance Computing System (henceforth referred to as the "cluster") is a shared equipment facility for statistical computing that is beyond the capacity of a high-end desktop computer. It is geared especially towards high-memory and disk-intensive computing, such as some genetics and bioinformatics applications. As a shared resource, it is maintained by contributions from participating research groups.
The system consists of 11 machines (also known as "nodes") organized in a hierarchy. One computer is designated as the "head" node and is the central gateway to all the other nodes, which are the "compute" nodes. Each node runs the Linux operating system on two quad-core Intel CPUs. The head node has 64G of RAM, two nodes have 48G RAM, two have 32G RAM, two have 24G RAM, and four nodes have 16G RAM. Each compute node has its own disks, and all nodes share disk space via NFS mounted on the head node.
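As a quick check on the figures above, the node inventory can be tallied in a few lines of Python. This is only an illustrative sketch: the variable and function names are made up for this example, and "G" is assumed to mean gigabytes.

```python
# Node inventory as described in the text: (role, node count, RAM per node).
# Assumption: "G" in the description means gigabytes (GB).
NODE_GROUPS = [
    ("head", 1, 64),
    ("compute", 2, 48),
    ("compute", 2, 32),
    ("compute", 2, 24),
    ("compute", 4, 16),
]

CPUS_PER_NODE = 2   # two CPUs per node
CORES_PER_CPU = 4   # each CPU is quad-core


def summarize(groups):
    """Return total node count, core count, and RAM (GB) for the given groups."""
    nodes = sum(count for _, count, _ in groups)
    cores = nodes * CPUS_PER_NODE * CORES_PER_CPU
    ram_gb = sum(count * ram for _, count, ram in groups)
    return {"nodes": nodes, "cores": cores, "ram_gb": ram_gb}


# Totals for the whole cluster, and for the compute nodes alone.
whole_cluster = summarize(NODE_GROUPS)
compute_only = summarize([g for g in NODE_GROUPS if g[0] == "compute"])

print(whole_cluster)
print(compute_only)
```

Note that these totals are sums across separate machines. Memory is not pooled, so a single process running on one node is limited to that node's RAM (at most 64G, on the head node).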
At this time, we have a host of open-source software installed on the system. This includes common tools such as the GCC compiler, the LAPACK linear algebra library, and the R software package. To facilitate distributed computing, we have installed Open MPI and the R package snow. No proprietary software is installed at this time, although we are open to adding it, resources permitting.
The cluster was started in 2004 using seed money from a Shared Equipment Grant awarded by the Research Evaluation and Allocation Committee (REAC) of UCSF's School of Medicine. This was supplemented by contributions from research groups led by Drs. Peter Bacchetti, Mark Segal, and Sergio Baranzini. System administration and hosting were provided by the Division of Biostatistics (Dr. Chuck McCulloch, head). The first cluster had six dual-CPU nodes running Apple's OS X.
In 2008, the second-generation system became operational using funds contributed by research groups led by Drs. Jeff Wall, Saunak Sen, Esteban Burchard, and Mark Segal. The Division of Biostatistics (Dr. Chuck McCulloch, head) continues to support system administration and hosting.
The cluster is overseen by Saunak Sen and administered by Richard Tabor. A technical advisory board makes day-to-day technical decisions. Funding decisions are made in consultation with the steering committee, which consists of the principal investigators leading the stakeholder research groups that have contributed resources to the cluster.
If you are interested in gaining access, please send an email to Saunak Sen (sen@biostat.ucsf.edu). All Division of Biostatistics researchers can gain access by virtue of their affiliation with the division, which supports hosting and maintenance. We invite all interested research groups to a one-month trial access period. If the group is interested in continued access, we request a contribution of resources to the shared equipment facility.