Recent advances in sequencing technology have led to explosive growth in the amount of publicly available human sequence data. This represents an invaluable opportunity for scientific exploration and discovery, but the quantity of data presents a major challenge for analysis. For instance, the 1000 Genomes Project sequences a large number of human genomes to provide a comprehensive resource on human genetic variation. This data is publicly available, but because of its size it has been practically inaccessible to contemporary sequence search methods.
The Terabase Search Engine (TSE) project will develop novel software and databases that allow users to search, retrieve, and re-analyze the raw data underlying thousands of human genomes. SciServer will provide the building blocks of the query and data management framework for the TSE project. New tools built on top of SciServer will make this rich resource available to the research community for exploration and discovery.