Compute Images – SciServer

SciServer Compute Images

NEW: Astronomy
Language-based images
      Python + R
      Matlab R2016a
      BeakerX
      Julia
Project-based images
      LSST Science Pipeline
      Montage
      HEASARCv6.28
      Marvin
      Recount
      JH Turbulence DB
      Geo
      Oceanography

An image is a virtual machine setup for SciServer Compute, which comes pre-installed with important data analysis packages.

When you create a container in SciServer Compute, you have the option of selecting an Image to use with that container. Some images are designed to support specific programming languages, while others are designed to support research within a specific science domain.

The lists below describe each of the images that you can select when you create a new container in SciServer Compute. You may only select one image per container, and you may only create up to three containers.

SciServer Essentials

The default image for all newly-created containers using Python or R is SciServer Essentials. This default computing environment contains basic tools for a wide range of data analysis and machine learning tasks. It comes pre-installed with:

Python 3.7 (Anaconda 2019.10): run conda list in a new notebook for a full list of packages
R 3.6.2: run installed.packages() in a new notebook for a full list of packages
Jupyter Notebook 6.0.3
JupyterLab 1.2.6 (you can switch freely between the classical Jupyter Notebook and JupyterLab views)
TeX Live 2019 (to support advanced text rendering in Matplotlib and R, and to allow saving entire notebooks as PDF)

The environment also includes two machine learning libraries, available in both Python and R:

TensorFlow 2.0.0
PyTorch 1.4.0

Both machine learning libraries work with CPU only. GPU hardware acceleration is not yet available in this image.

The image also includes all SciScript libraries (v2.0.13) for both Python and R. See our API documentation pages for what those libraries contain.

Please note that all official support for Python 2 has been dropped as of January 1, 2020, so we are only including Python 3 in this new image.
If you still need Python 2, you can use our old Python + R image, but we strongly encourage you to switch to Python 3 as soon as possible.

All SciServer Images

The SciServer Essentials image is the default choice when creating a container, but many other computing environments are available by choosing from the Compute Image dropdown menu when you create a new container. This section describes features that all available images share; later sections describe each individual image.

All SciServer images – except Matlab – come pre-installed with the SciServer modules/packages/libraries, which allow SciServer Compute to communicate with all other SciServer components (e.g., CasJobs and SkyQuery). Although these SciServer communication packages come pre-installed with each image, you must still load them within your scripts. You can do this with the import command in Python or the library command in R. For further information on what the SciServer modules/packages/libraries contain and how they work, see the SciServer API Documentation.
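In Python, loading the pre-installed libraries amounts to an ordinary import. The sketch below assumes that the Python package is named SciServer and that it provides submodules such as CasJobs, SkyQuery, and Jobs (SciServer.Jobs appears in the Julia example on this page; the other two names are taken from the components mentioned above and should be checked against the API documentation). The helper import_sciscript is hypothetical. It reports missing modules instead of raising an error, so the same cell also runs outside SciServer.

```python
import importlib


def import_sciscript(submodules=("CasJobs", "SkyQuery", "Jobs"), package="SciServer"):
    """Import the requested submodules of `package`.

    Returns (loaded, missing): a dict of imported modules keyed by name,
    and a list of the names that could not be imported.
    """
    loaded, missing = {}, []
    for name in submodules:
        try:
            loaded[name] = importlib.import_module(package + "." + name)
        except ImportError:
            missing.append(name)
    return loaded, missing


loaded, missing = import_sciscript()
print("loaded:", sorted(loaded))
print("missing:", missing)
```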

All images are based on Scientific Linux 7’s official Docker images. All contain the following packages:

The CentOS “Development Tools” package group, which provides access to GCC’s C and C++ compilers, gfortran, Autotools, ctags, flex/bison, make, git, subversion, and other useful tools
Several X11-related libraries, as well as cmake and wget
The time zone database (the so-called “Olson Database”), which is explicitly re-installed so that programs that rely on it continue to work; this applies, for example, to packages that use the OlsonNames function in R

To see the full list of packages in the CentOS Development Tools package group, open a terminal and run the following command:

curl https://mirror.centos.org/centos/7/os/x86_64/repodata/repomd.xml 2>/dev/null | sed -r 's#xmlns[^=]*?="[^"]*"##g' | xmllint --xpath "repomd/data[@type='group']/location/@href" - | sed -e 's#href="#https://mirror.centos.org/centos/7/os/x86_64/#' -e 's/"//' | xargs -n1 wget -O- -q | xmllint --xpath "comps/group/id[.='development']/parent::group/packagelist/packagereq[@type!='optional']" - | sed -e 's#</packagereq>#\n#g' -e 's#<packagereq[^>]*>##g' | sort

If none of these images contains the packages that you need for your work, you can always use pip or conda to install new packages. To install a new package, create a new notebook (or open an existing notebook) and type the following in its own Code cell at the top of the notebook:

!pip install [package]

!conda install -y [package]

replacing [package] (and the surrounding brackets) with the name of the package you want to install. Don’t forget to include the exclamation point at the beginning of the line.
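A possible alternative, when the install should target exactly the interpreter that runs the notebook kernel, is to call pip through sys.executable from Python. The following is a minimal sketch, not part of the SciServer tooling: the helper names pip_install_command and ensure_package are hypothetical, and the function skips the install when the module is already importable.

```python
import importlib.util
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_package(module_name, package=None):
    """Install `package` (default: same as module_name) only if the module is missing.

    Returns True if pip was invoked, False if the module was already importable.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    subprocess.check_call(pip_install_command(package or module_name))
    return True


# The standard-library module json is always present, so nothing is installed.
print(ensure_package("json"))  # prints False
```

The optional second argument covers packages whose install name differs from their import name.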

NEW: Astronomy Image

We have created a new Compute Image that makes it easier than ever to do astronomy research with SciServer.
Select Astronomy from the dropdown list of images to use it.
The new Astronomy image contains the following packages:

astropy
astroquery
specutils
astropy-healpix
halotools
photutils
esutil
fitsio
ccdproc
ipyaladin
astroML
pywwt
toasty
shapely
galsim
yt
sherpa
regions
reproject
pyvo
saba
pyregion
cdshealpix
mocpy
pyds9
pyraf
MontagePy
montage-wrapper
pymangle
pysynphot
healpy
sdss-marvin
jdaviz
specdash
@wwtelescope/jupyterlab (worldwide telescope widget)

Language-based images

Python + R

The Python + R image is a good default image for working with SciServer using those scripting languages. When you create a new notebook, use the dropdown menu to specify whether that notebook will use Python 2, Python 3, or R. You can also upload existing scripts as .ipynb, .py or .r files.

Python scripts can be written in either Python 2 or Python 3; if you are new to Python, we recommend Python 3, as some features of Python 2 will no longer be supported in the future.

The list below provides full details about the Python + R image.

The Python + R image, like all SciServer images, can be accessed through various user interfaces, but the underlying image is the same. The default is Classical Jupyter, which is ideal for most research and education use cases. The JupyterLab user interface has some advanced features that may be useful, while the RStudio user interface is optimized for working with R scripts.
There is a known issue in which JupyterLab remembers the files that were opened and tries to restore them, even across different containers.
The image can use any of the following versions:
- The Anaconda Python 2 distribution of Python 2.7
- The Anaconda Python 3 distribution of Python 3.5
- The Anaconda R Essentials distribution of R 3.4
Python 2, Python 3, and R versions of our SciScript libraries are installed
The image has redis and libdynd installed for both R and Python (2 and 3)
The image comes installed with both Python 2 and Python 3 versions of the following packages:
- Argcomplete
- Astroml
- Chest
- Configobj
- Dill
- Dynd-python
- Gatspy
- GRequests
- Pymc
- Pymc3
- Redis-py
- Sphinx_rtd_theme
- Sockjs-tornado
- Supersmoother
- Traceback2
- Unittest2
In addition to the Anaconda R Essentials distribution, the image also comes installed with the bit64 and jpeg R packages.
The RStudio image uses RStudio 1.1.453 as the interface.
For Python 2 users: Python 2 and its packages are installed in a conda environment called “py27.” To use this version of Python in scripts within a SciServer Compute terminal, it will be necessary to run the command source activate py27. This is only necessary for running scripts in a terminal; when you create a new Python 2 notebook from the dropdown menu, it happens automatically.

Matlab R2016a

The Matlab R2016a image is the only one in which you can write notebooks in Matlab. This image is not based on the Python + R image described above.

BeakerX

The BeakerX image provides access to the BeakerX package on top of the Python+R image. The primary advantage of using it is that it allows notebooks to use JVM languages, such as Java, Kotlin, Scala, Clojure, and Groovy.

Julia

The Julia image, built on top of the Python + R image, allows you to write notebooks in Julia. It comes installed with the Julia, IJulia, Plots, PyPlot, and PyCall packages, and is available with either the Classical Jupyter or JupyterLab interfaces.

While there is no SciServer library available for Julia, the Python version of SciScript is available. For example, the following retrieves the list of Compute Jobs:

using PyCall
@pyimport SciServer.Jobs as Jobs
Jobs.getJobsList()
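For comparison, the same call made directly from a Python notebook looks roughly like this. The sketch assumes only what the Julia example shows, namely that SciServer.Jobs provides getJobsList(). The wrapper list_compute_jobs is hypothetical, and it returns None when SciScript is not installed so that the cell does not fail outside SciServer.

```python
def list_compute_jobs():
    """Return the user's Compute Jobs, or None if SciScript is unavailable."""
    try:
        from SciServer import Jobs  # pre-installed in SciServer images
    except ImportError:
        return None
    return Jobs.getJobsList()


jobs = list_compute_jobs()
print("SciScript not available" if jobs is None else jobs)
```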

We would like to hear from users of this image, and are open to feedback concerning what packages would be useful.

Project-based images

LSST Science Pipeline (Astronomy)

The LSST Science Pipeline image is designed to address the use cases of the upcoming Large Synoptic Survey Telescope (LSST). The LSST is an 8.4-meter telescope, now under construction in Chile, that will conduct the largest-ever survey of the night sky. The LSST will not obtain first light until 2019, but its science team is now developing the data processing and analysis pipelines to support its ambitious mission. This Compute Image is optimized to support that design work.

This image is not based on the Python + R image
Red Hat’s devtoolset-7 software collection is installed, providing much newer development tools (e.g., GCC 7)
Version 15.0 of the LSST pipeline is installed: specifically, the lsst_distrib package, which contains almost all packages required to run the LSST pipeline
The startup script for Classical Jupyter ensures that the LSST packages are set up, so relevant commands like setup and eups should work immediately upon starting this image.

Montage (Astronomy)

Montage is a set of tools, available as either command-line programs or Python modules, that allow users to reproject images, analyze and correct background differences between images, and create mosaics. Montage includes tools for searching image metadata for several datasets and downloading the images found. On the back end, it provides visualization of grayscale or 3-color composite images, catalog and image coverage metadata overlays, etc. Catalogs and other data can be found and retrieved using the included Astroquery package.

For more on how to use Montage with online Python notebooks, see the Montage Jupyter Notebooks page.

HEASARCv6.28 (Astronomy)

The HEASARCv6.28 image contains a full copy of the HEASoft software package v6.28 as described on the HEASoft documentation page at HEASARC. The Compute Image also has a Python installation with common astronomical packages pre-installed and a user environment already configured.

For questions on the HEASoft tools themselves, please see the FTOOLS HELP DESK at the bottom of the HEASoft documentation page. For questions on the environment within SciServer (or if you’re not sure which it is), please contact the HEASARC using the Feedback link at the bottom of the HEASARC home page.

Some startup instructions can be found at the HEASARC SciServer page.

The corresponding HEASARC data can be found in the HEASARC data volume, described on the SciServer Hosted Datasets page.

Marvin (Astronomy)

MaNGA (Mapping Nearby Galaxies at APO) is one of the three main surveys of the SDSS-IV program. Its goal is to map the detailed composition and kinematic structure of 10,000 nearby galaxies. MaNGA uses integral field unit (IFU) spectroscopy to measure spectra for hundreds of points within each galaxy.

Marvin is a software suite of tools designed for streamlined access to the MaNGA data, optimized for overcoming the challenges of searching, accessing, and visualizing the complexity of the MaNGA dataset. Marvin has two main components: a Web Interface for quick visual introduction into the world of MaNGA data, and a Python package of tools, for more in-depth scientific analysis and inclusion in your science workflow. The Marvin Image provides access to the suite of Python tools tailored for exploring MaNGA data.

The Marvin image is built from the Astronomy image, with the following additional packages preinstalled:

marvin

Recount (Genomics)

The Recount image is associated with Recount, a genomics project that has created a searchable online database of RNA gene sequences from more than 2,000 published studies. The image is designed for use with the Recount public data volume, which can be mounted onto a new container at the same time the image is selected.

The Recount image is based on the Python + R image, and comes installed with the R-based Bioconductor genomics analysis package, version 3.6.

In addition to the packages already installed in the base Python + R image, the Recount image comes with the following packages:

bioconductor-annotationdbi
bioconductor-biobase
bioconductor-biocinstaller
bioconductor-biocstyle
bioconductor-deformats
bioconductor-deseq2
bioconductor-geoquery
bioconductor-iranges
bioconductor-recount
bioconductor-summarizedexperiment
bioconductor-txdb.hsapiens.ucsc.hg18.knowngene
r-bibtex
r-cli
r-knitrbootstrap
r-lubridate
r-rcpp
r-testthat
knitcitations from CRAN
GenomeInfoDbData from bioconductor
regionReport from bioconductor

The bioconda channel has been added as the preferred repository for packages in this image.

JH Turbulence DB (Fluid dynamics)

The JH Turbulence DB image on SciServer provides direct access to datasets archived and maintained in the Johns Hopkins Turbulence Databases (JHTDB, https://turbulence.pha.jhu.edu/). The system contains space-time data of turbulent flows, produced by world-class high-resolution direct numerical simulations of the Navier-Stokes equations. The data are publicly available to the research community. The package pyJHTDB (https://github.com/idies/pyJHTDB) provides a Python interface for querying, downloading, and analyzing the data. Its built-in functions evaluate simulation fields and compute spatial differentiation, interpolation, filtering, and particle tracking directly on the data clusters. By combining this open simulation laboratory for turbulence research with the Python notebook capabilities of SciServer, we hope that broader access to simulation data will further accelerate turbulence research in the coming years.

In addition to the packages from the Python + R image above, the following packages are installed for both the Anaconda Python 2 and Python 3 environments:

gsl
h5py
mpi4py
mpich
pyfftw
pyJHTDB

Geo (Earth/social sciences)

The Geo image comes with packages to create maps and conduct geospatial analyses with Geographic Information Systems (GIS). This image is ideal for research in fields such as earth science and social science, where important quantities vary with geography.

The Geo image is built from the Python + R image, with the following additional packages preinstalled:

Oceanography

The Oceanography image is designed to work with the Johns Hopkins Ocean Circulation Models described on the Datasets page.

The analysis of these large datasets is often restricted by limited computational resources. To address this issue, a team led by Mattia Almansi has developed OceanSpy, a Python package that facilitates extracting information from model output fields. SciServer users can use the modules included in this Image to run analyses online and store post-processing files within SciServer Compute.

OceanSpy builds on software packages developed by the Pangeo community, in particular xarray, dask, and xgcm.

Here is a list of packages available in the Oceanography image (on top of Miniconda):

dask
distributed
bottleneck
netCDF4
xarray
cartopy
esmpy
ffmpeg
cmocean
eofs
geopy
xgcm
xesmf
xmitgcm
oceanspy

OceanSpy: Extraction of oceanographic properties

Extracting information from the model output can be done along ‘surveys’, ‘mooring arrays’, or at point locations. The ‘survey’ resembles a hydrographic ship survey, with equidistant ‘stations’ (vertical profiles) along a great-circle path between two points on the Earth. The data (e.g. temperature and salinity) on the vertical profiles are interpolated from the regular grid onto the station locations. The ‘mooring array’ creates a zig-zag path along the model grid through user-defined mooring locations in the ocean. It enables exact calculation of the transport through an arbitrary curve in lat-lon space. Last, oceanographic properties can be extracted in random locations in the model 4D space, facilitating comparison with floats in the ocean.
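The geometry behind a 'survey' can be illustrated without OceanSpy itself. The sketch below is not OceanSpy's implementation: it is the standard great-circle interpolation formula on a spherical Earth, written in plain Python, and the function name great_circle_stations is hypothetical. It returns equally spaced station positions between two end points; interpolating model fields onto those positions is a separate step not shown here.

```python
import math


def great_circle_stations(lat1, lon1, lat2, lon2, n_stations):
    """Return n_stations (lat, lon) pairs in degrees, equally spaced along
    the great circle from point 1 to point 2 (end points included)."""
    if n_stations < 2:
        raise ValueError("n_stations must be at least 2")
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))

    # Angular separation between the end points (haversine formula).
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    d = 2 * math.asin(min(1.0, math.sqrt(h)))
    if d == 0.0:
        return [(lat1, lon1)] * n_stations
    if math.isclose(d, math.pi):
        raise ValueError("antipodal points do not define a unique great circle")

    stations = []
    for i in range(n_stations):
        f = i / (n_stations - 1)  # fraction of the path, 0 at start, 1 at end
        a = math.sin((1 - f) * d) / math.sin(d)
        b = math.sin(f * d) / math.sin(d)
        # Weighted sum of the two end-point unit vectors.
        x = a * math.cos(p1) * math.cos(l1) + b * math.cos(p2) * math.cos(l2)
        y = a * math.cos(p1) * math.sin(l1) + b * math.cos(p2) * math.sin(l2)
        z = a * math.sin(p1) + b * math.sin(p2)
        stations.append((math.degrees(math.atan2(z, math.hypot(x, y))),
                         math.degrees(math.atan2(y, x))))
    return stations


# Four stations along the equator between longitudes 0 and 90 degrees east.
for lat, lon in great_circle_stations(0.0, 0.0, 0.0, 90.0, 4):
    print(round(lat, 6), round(lon, 6))
```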

OceanSpy: Computation of useful diagnostics

Apart from extracting readily available information, OceanSpy can calculate new diagnostics that are not part of the model output. For example, it can calculate the velocity component orthogonal to a ‘survey’, the Brunt–Väisälä frequency, the Ertel potential vorticity, the eddy kinetic energy, the horizontal divergence, and volume fluxes. For regions in the model domain where all the required model diagnostics are available, it can also calculate all heat and salt budget terms to machine precision.
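As an illustration of what such a diagnostic involves, the squared Brunt–Väisälä frequency can be estimated from a vertical density profile as N² = (g/ρ₀) ∂ρ/∂(depth), with depth positive downward. The sketch below is not OceanSpy's implementation: it uses simple finite differences between consecutive levels in plain Python. The function name brunt_vaisala_squared, the constant gravitational acceleration, and the reference density are all assumptions made for the example.

```python
G = 9.81        # gravitational acceleration, m s^-2 (assumed constant)
RHO_0 = 1027.0  # reference density, kg m^-3 (assumed typical seawater value)


def brunt_vaisala_squared(depth, density, g=G, rho0=RHO_0):
    """Return (mid_depths, N2) from a vertical density profile.

    depth   : depths in metres, positive downward, strictly increasing
    density : potential density in kg m^-3 at those depths
    N2 is evaluated halfway between consecutive levels, in s^-2.
    """
    if len(depth) != len(density) or len(depth) < 2:
        raise ValueError("need two or more matching depth/density values")
    mid_depths, n2 = [], []
    for k in range(len(depth) - 1):
        dz = depth[k + 1] - depth[k]        # > 0, since depth increases downward
        drho = density[k + 1] - density[k]  # > 0 for a stably stratified column
        # With z positive upward, N^2 = -(g/rho0) d(rho)/dz; with depth positive
        # downward the sign flips, so N^2 = (g/rho0) d(rho)/d(depth).
        mid_depths.append(0.5 * (depth[k] + depth[k + 1]))
        n2.append(g / rho0 * drho / dz)
    return mid_depths, n2


mid, n2 = brunt_vaisala_squared([0.0, 100.0, 500.0], [1026.0, 1027.0, 1027.5])
for z, value in zip(mid, n2):
    print("depth", z, "m: N^2 =", value, "s^-2")
```

Positive values indicate stable stratification; a negative value between two levels would indicate that denser water lies above lighter water.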