Photo-Z regression calculator

  • Query training set of galaxies for which redshift (~distance measure) is available.
  • Derive a k-nearest-neighbour regressor that determines redshift estimate from 5 "broad-band" magnitudes
  • Test regressor against test set by plotting predicted vs actual redshift

Author: Jake VanderPlas vanderplas@astro.washington.edu

License: BSD

The figure is an example from astroML: see http://astroML.github.com

Modified by: Dmitry Medvedev dmedv@jhu.edu to work with SDSS DB

In [1]:
# standard first block for defining the token and makinhg it available as a system variable for the session
# token must be replaced with new one once it has expired
with open('/home/idies/keystone.token', 'r') as f:
    token = f.read().rstrip('\n')# replace with your own token ID
import sys
sys.argv.append("--ident="+token)
In [2]:
import SciServer.CasJobs as CasJobs
from io import StringIO
import numpy as np
import pandas
from matplotlib import pyplot as plt
from sklearn.neighbors import KNeighborsRegressor
from astroML.plotting import scatter_contour
In [3]:
# some special settings
# ensure columns get written completely in notebook
pandas.set_option('display.max_colwidth', -1)
# do *not* show python warnings 
import warnings
warnings.filterwarnings('ignore')
In [4]:
NOBJECTS = 20000
GAL_COLORS_DTYPE = [('u', float),
                    ('g', float),
                    ('r', float),
                    ('i', float),
                    ('z', float),
                    ('redshift', float),
                    ('redshift_err', float)]

Now we do this

In [5]:
# get data from CasJobs
query_text = ('\n'.join(
    ("SELECT TOP %i" % NOBJECTS,
    "   p.u, p.g, p.r, p.i, p.z, s.z, s.zerr",
    "FROM PhotoObj AS p",
    "   JOIN SpecObj AS s ON s.bestobjid = p.objid",
    "WHERE ",
    "   p.u BETWEEN 0 AND 19.6",
    "   AND p.g BETWEEN 0 AND 20",
    "   AND (s.class = 'GALAXY' OR s.class = 'QSO')")))
response = CasJobs.executeQuery(query_text, "DR12", token = token)
# read the result as a
output = StringIO(response.read().decode())
data = np.loadtxt(output, delimiter=',',skiprows=1, dtype=GAL_COLORS_DTYPE)
executeQuery POST response:  200 OK
In [6]:
# show some of the data
print(data[1:10])
[(18.317, 16.30371, 15.33307, 14.87664, 14.52129, 0.07437177, 1.571993e-05)
 (18.16069, 16.87605, 16.2232, 15.87214, 15.63378, 0.07517696, 1.396806e-05)
 (19.34731, 18.22945, 17.69255, 17.33694, 17.17545, 0.07510134, 7.82039e-06)
 (19.27735, 17.10828, 16.03202, 15.53831, 15.14866, 0.07395959, 1.70741e-05)
 (18.68361, 16.6412, 15.72005, 15.30056, 14.97785, 0.0762095, 1.405473e-05)
 (17.93991, 15.96646, 15.03117, 14.56741, 14.21826, 0.02708067, 9.246141e-06)
 (19.37848, 18.06206, 17.49278, 17.14781, 16.93153, 0.07536609, 1.557691e-05)
 (19.32579, 18.07899, 17.47068, 17.07045, 16.82918, 0.07588968, 1.05723e-05)
 (18.30457, 17.08336, 16.37024, 15.95688, 15.70534, 0.07475523, 6.839745e-06)]
In [7]:
n_neighbors = 1

N = len(data)

# shuffle data
np.random.seed(0)
np.random.shuffle(data)

# put colors in a matrix
X = np.zeros((N, 4))
X[:, 0] = data['u'] - data['g']
X[:, 1] = data['g'] - data['r']
X[:, 2] = data['r'] - data['i']
X[:, 3] = data['i'] - data['z']
z = data['redshift']

# divide into training and testing data
Ntrain = N / 2
Xtrain = X[:Ntrain]
ztrain = z[:Ntrain]

Xtest = X[Ntrain:]
ztest = z[Ntrain:]

knn = KNeighborsRegressor(n_neighbors, weights='uniform')
zpred = knn.fit(Xtrain, ztrain).predict(Xtest)

rms = np.sqrt(np.mean((ztest - zpred) ** 2))
print("RMS error = %.2g" % rms)
RMS error = 0.36
In [8]:
axis_lim = np.array([-0.1, 2.5])

plt.figure(figsize=(12, 8))
ax = plt.axes()
plt.scatter(ztest, zpred, c='k', lw=0, s=4)
plt.plot(axis_lim, axis_lim, '--k')
plt.plot(axis_lim, axis_lim + rms, ':k')
plt.plot(axis_lim, axis_lim - rms, ':k')
plt.xlim(axis_lim)
plt.ylim(axis_lim)

plt.text(0.99, 0.02, "RMS error = %.2g" % rms,
         ha='right', va='bottom', transform=ax.transAxes,
         bbox=dict(ec='w', fc='w'), fontsize=16)

plt.title('Photo-z: Nearest Neigbor Regression')
plt.xlabel(r'$\mathrm{z_{spec}}$', fontsize=20)
plt.ylabel(r'$\mathrm{z_{phot}}$', fontsize=20)
plt.show()

Exercise 3: alternative ML algorithm

Improve on this photo-z predictor.

In [ ]: