I. Function
II. Input
III. Output
IV. Design Assumptions
V. Processing Flowchart
VI. Pipeline Processing
GALWORKS detects and estimates parameters for probable galaxies and other extended objects that are smaller than 3 arcmin, and produces a catalog of these parameters and an image database containing the coadd image of these objects as well as images of (pieces of) larger galaxies.
Galaxy extraction requires circularly symmetric PSFs throughout the coadd that do not vary rapidly along the scan. Data showing rapid variations in seeing that are not tracked will be rejected, and information identifying the rejected scan portion will be output. The input seeing ridgeline information will be used to establish the zero-point for the relations used to distinguish stars from galaxies, such as the central surface brightness and size. Areas around bright stars will not be searched for galaxies. Bright stars off the edge of the scan will cause problems, but GALWORKS cannot reliably find these stars until they have been observed by 2MASS. Thus DBMAN will have to periodically mark sources in the extended source database that are found to be near newly-observed bright stars; alternatively, see Post-Processing Remarks. How to treat galaxies with multiple nuclei, galaxies that overlap other galaxies, or galaxies that have stars superimposed on them are significant issues, and the treatment may require revision as more data are examined. The current design choice is given below; see also Post-Processing Remarks and ISSUES.
Two procedural modes of operation are performed on the coadd data. The purpose of the first mode is to establish the stellar ridgeline parameters unique to the coadd images. The second mode uses this ridgeline information to characterize each object as extended or stellar (or point source) in nature. The extended source character information is written to an external file, and small "postage stamp" images centered on these extended objects are extracted and archived.
VI.a. Compute Flux Thresholds
Starcounts are performed for each coadd and the number density as a function of total
integrated flux is computed. Flux thresholds are set by the coadd source density. If the density
is sufficiently high (e.g. near the Galactic plane), then the limiting flux (i.e., the total integrated
flux) is modified according to the estimated confusion, as given in the memo by Beichman dated
02-19-93. The estimated confusion noise is illustrated in Figure 5 of the noted memo (where the
confusion+sky noise is plotted vs. source density; here the confusion noise is computed as a mean
value in a 5' aperture). For example, for a stellar density of 1100 stars/deg2 (K < 14) the sky
surface brightness due to confusion noise has increased by 0.2 mag compared to the pole (300
stars/deg2); for 5000 stars/deg2 the increase is 0.5 mag; for 10000 stars/deg2 the increase is 1.3
mag; for 20000 stars/deg2 the increase is 2.0 mag. The corresponding values for J and H are
similar.
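The confusion values quoted above can be tabulated and interpolated for intermediate source densities. The following is a minimal sketch only: the table holds the numbers quoted in the text (with the polar density as the zero reference), while the log-linear interpolation, the clamping at the table ends, and the name `confusion_increase` are assumptions made for illustration, not the scheme of the Beichman memo.

```python
import math

# (stars per square degree with K < 14, brightening in mag relative to the pole),
# as quoted in the text. The pole (300 stars/deg2) is the reference, hence 0.0.
CONFUSION_TABLE = [
    (300.0, 0.0),
    (1100.0, 0.2),
    (5000.0, 0.5),
    (10000.0, 1.3),
    (20000.0, 2.0),
]


def confusion_increase(density):
    """Interpolate the confusion brightening (mag) linearly in log10(density).

    Assumption: log-linear interpolation between tabulated points, clamped
    to the first and last table values outside the tabulated range.
    """
    if density <= CONFUSION_TABLE[0][0]:
        return CONFUSION_TABLE[0][1]
    if density >= CONFUSION_TABLE[-1][0]:
        return CONFUSION_TABLE[-1][1]
    for (d0, m0), (d1, m1) in zip(CONFUSION_TABLE, CONFUSION_TABLE[1:]):
        if d0 <= density <= d1:
            t = (math.log10(density) - math.log10(d0)) / (
                math.log10(d1) - math.log10(d0)
            )
            return m0 + t * (m1 - m0)
```

A limiting magnitude for a given coadd could then be made brighter by `confusion_increase(density)`, although how the pipeline applies the correction is not specified here.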
The nominal flux threshold values are:
VI.b. Coadd Preparation for standard processing
The coadds are first ``cleaned'' of bright stars (K < 9, H < 10, J < 11) and large (D > 1 arcmin) galaxies and other large extended objects from catalogs. Note: we will attempt to process known large galaxies with D < 5' as long as 75% of the galaxy diameter is enclosed within the coadd image. (The coadd images, or pieces thereof, of these bright galaxies and other known galaxies are saved in the Extended Source Image Database.) A secondary task, not connected to galaxy processing, is also performed for reasons of an efficient system design: the band-merged point source list is band-filled, i.e., in bands where no extracted point source is available, upper limits are obtained from the coadded image, and the band-merged band-filled point source list is written out. See Appendix for further discussion of band filling.
For bright stars, cleaning consists of masking pixels in circular regions centered on these objects, as well as horizontal, vertical, and diagonal rectangular regions containing diffraction spikes from these bright objects. Persistence blanking is also performed on the coadds. The positions and magnitudes of bright stars are obtained from PIXPHOT output and from an internal IPAC catalog called "JCAT" (constructed for the IRTS mission; ref: Bev Smith).
For large galaxies, cleaning uses a catalog containing positions, magnitudes, and sizes of large optical galaxies. The catalog was generated from a ``NED'' search of the UGC catalog. Pixel masking is performed for regions defined by the size of the galaxy as given in the catalog.
Object detection (see below) is used only in checkout to ensure that the system detection step does not miss any extended objects. In addition to the three (JHK) band coadds, GALWORKS employs a "super" image, or "supercoadd", consisting of a weighted average of these images, which is used only to determine the axial ratio and position angle for galaxy candidates and to determine a more precise source position. The weights are derived from optimization of the SNR and colors of galaxies in the average.
VI.c. Background Determination
The most critical operation for galaxy detection and parameterization is the correct determination of the coadd background, which is subtracted from the coadd. The background is computed using a two-step iterative scheme. First, all sources detected from PIXPHOT are blanked from the coadd. All known galaxies (from input catalogs) are blanked. The coadd is then median filtered with an N X N kernel (e.g., 8 X 8). A one dimensional cubic polynomial is then fit to each column of the coadd (after stars have been masked from the image). A residual between the data and the fit is computed, from which those points whose residuals are less than 3-sigma are included in the next iterative fit. This rejection procedure is iterated 2 - 3 times. In this way, stars and small galaxies (not included in the PIXPHOT output or those missed by the median filter) are excluded from the background determination. The second step consists of a cubic polynomial fit to each line of the coadd, where the line consists of solutions from the column 1-D fit in step 1. By coupling the column solution with the line solution, we are in effect fitting a 2-D surface to the coadd. The solution from this step is the background, which is then subtracted from the coadd. The cubic polynomial tracks only variations in the background larger than about 100-300 arcsec. See Appendix for further discussion of background determination.
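The two-step fit described above can be sketched as follows. This is an illustrative outline, not the pipeline code: it assumes masked pixels are supplied as NaN, omits the median filtering stage, and uses hypothetical function names (`clipped_cubic_fit`, `fit_background`).

```python
import numpy as np


def clipped_cubic_fit(values, n_iter=3, n_sigma=3.0):
    """Fit a 1-D cubic, keeping only points within n_sigma of the fit each pass."""
    x = np.linspace(-1.0, 1.0, values.size)   # normalized coordinate for stability
    good = np.isfinite(values)                # masked pixels are passed in as NaN
    coeffs = np.polyfit(x[good], values[good], 3)
    for _ in range(n_iter - 1):
        resid = values - np.polyval(coeffs, x)
        sigma = np.std(resid[good])
        keep = good & (np.abs(resid) < n_sigma * sigma)
        if keep.sum() < 4 or keep.sum() == good.sum():
            break                             # too few points left, or converged
        good = keep
        coeffs = np.polyfit(x[good], values[good], 3)
    return np.polyval(coeffs, x)


def fit_background(image):
    """Step 1: clipped cubic per column. Step 2: cubic along each line of the
    column solutions, which together approximate a smooth 2-D surface."""
    n_lines, n_cols = image.shape
    col_model = np.column_stack(
        [clipped_cubic_fit(image[:, j]) for j in range(n_cols)]
    )
    return np.vstack(
        [clipped_cubic_fit(col_model[i, :], n_iter=1) for i in range(n_lines)]
    )
```

The returned array has the same shape as the input and would be subtracted from the coadd. Because the second step fits the already smooth column solutions, no further rejection is applied there in this sketch.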
VI.d. Identify Possible Extended Sources
With bright stars and large galaxies masked from the coadds, and the background determined and subtracted from each JHK coadd, each coadd is searched for additional extended sources.
Two different algorithms are used to find galaxies (see below). The first algorithm, described in this section, finds ``normal'' galaxies that have a central peak greater than 4-sigma for an individual pixel and is used only in checkout. A set of classification operations that will distinguish galaxies from stars is performed on each detected object. The second algorithm, described below, operates on the JHK coadds after all point sources found by the point source processor and galaxies found by the first algorithm are masked off. It then block averages the masked-off image to find low central surface brightness galaxies or galaxies that do not have a classic radial profile. The purpose of the detection step is to find ``normal'' galaxies that were not detected as point sources by PIXPHOT and which have a classic galaxy profile (as opposed to a general amorphous blob). All stars found by PIXPHOT are masked off using a radius large enough to ensure that no flux remains in each JHK coadd. Object detection consists of a two-step process. The first step is to find local maxima (where the minimum separation between local maxima is given by some constant value). In the second step, all local maxima are compared to their immediate ``local'' background value, which is computed in an annulus centered on the object (e.g., Rann = 5 - 6 arcsec). If the maximum pixel value of the object is less than 3-sigma above the ``local'' background value, then the object is rejected. This operation is designed to eliminate noise spikes and other false detections that tend to occur along steep flux gradients associated with moderate to large galaxies or in regions of complex nebulosity (e.g., Rho Oph).
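The two-step detection above (local maxima, then a comparison against an annular background) might look like the sketch below. It assumes a background-subtracted image with 1 arcsec pixels and a known per-pixel noise `sigma`; the use of the annulus median as the local background, the 3-pixel minimum separation, and the function name are illustrative assumptions.

```python
import numpy as np


def detect_peaks(image, sigma, peak_nsigma=4.0, local_nsigma=3.0,
                 r_in=5, r_out=6, min_sep=3):
    """Return (y, x) of local maxima that stand out from their local annulus."""
    yy, xx = np.indices(image.shape)
    candidates = np.argwhere(image > peak_nsigma * sigma)
    # Visit the brightest candidates first so they win separation conflicts.
    order = np.argsort(-image[candidates[:, 0], candidates[:, 1]])
    found = []
    for y, x in candidates[order]:
        window = image[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
        if image[y, x] < window.max():
            continue                      # step 1: not a local maximum
        if any((y - fy) ** 2 + (x - fx) ** 2 < min_sep ** 2 for fy, fx in found):
            continue                      # too close to an accepted peak
        r2 = (yy - y) ** 2 + (xx - x) ** 2
        annulus = image[(r2 >= r_in ** 2) & (r2 <= r_out ** 2)]
        if annulus.size == 0:
            continue
        if image[y, x] - np.median(annulus) < local_nsigma * sigma:
            continue                      # step 2: not above the local background
        found.append((int(y), int(x)))
    return found
```

A bump sitting on a bright plateau, such as the disk of a large galaxy, exceeds the global threshold but fails the annulus test, which is the behaviour the text asks for.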
All sources are then band merged to form one source per combined JHK band (with improved position). As an initial estimate of extent, the differential central surface brightness (normalized by their total integrated brightness) is computed for each object (see below for discussion of this parameter). This parameter is used as a conservative estimate of the likelihood that an object is a point source (as opposed to a galaxy, e.g.). Objects that are likely point sources are subtracted from the coadd when the parameterization & characterization operation is being performed for each object (see below for discussion of "nearby neighbor subtraction"). We will process all objects that are part of an input catalog (provided by S. Schneider, e.g.) regardless of whether they are extended or not.
VI.e. Parameterization & Photometry
VI.e.1 Parameterization
The next level of processing is to parameterize each detected object and perform a set of classification operations that will distinguish stars from galaxies. Galaxies are discriminated from stars in three general ways: their central surface brightness is significantly lower than that of stars of the same brightness, and their radial profile and annular brightness (see below) are significantly more extended than those of stars. These discrimination parameters are computed in part by fitting the three-parameter function f0 * exp [ -(r / alpha) ** (1 / beta) ] to the coadd pixels centered on each object, where r is the radial distance from the object centroid and f0 is the central surface brightness. Only pixels located within a circular aperture centered on the object are considered. Pixels centered on neighboring detections are subtracted if their initial differential central surface brightness (see below) is consistent with their being stellar (or point-like). A radius of 6 pixels (= 6 arcsec) is initially used for the fit. After the fit parameters are determined, an optimized aperture size is computed and the fit repeated.
The central surface brightness of the fit to each object is compared with the expected central surface brightness of a stellar source with the same total integrated magnitude. The differential surface brightness, f0(star) - f0(object), normalized by the total integrated flux, is the first discriminant between galaxies and stars.
Analysis shows that alpha and beta are correlated and that the value of beta changes with source magnitude and the point spread function, rather than with galaxy type. However, the quantity alpha * beta, termed the ``radial shape discriminant'' (abbreviated as ``shape''), is useful as the second discriminant between galaxies and stars. The dependence on the point spread function is removed by subtracting the value for point sources, and defines the ``differential shape''. This radial shape discriminant is not to be confused with the two-dimensional contours on the sky, and is not an intrinsic property of each galaxy due to its dependence on seeing, signal-to-noise, etc.
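The profile function and the shape discriminant described above can be illustrated with a small fitting sketch. The fitting strategy here is an assumption chosen to keep the example self-contained: for a fixed beta, ln f is linear in r**(1/beta), so beta is scanned on a grid and the other two parameters come from a straight-line fit in log space. This weights the data differently from a direct least-squares fit and is not claimed to be the method GALWORKS uses.

```python
import numpy as np


def profile(r, f0, alpha, beta):
    """Modified exponential: f0 * exp(-(r / alpha) ** (1 / beta))."""
    return f0 * np.exp(-(r / alpha) ** (1.0 / beta))


def fit_profile(r, flux, betas=None):
    """Return (f0, alpha, beta) from a grid scan over beta (illustrative)."""
    if betas is None:
        betas = np.linspace(0.3, 3.0, 271)     # hypothetical search grid
    ok = flux > 0
    ln_flux = np.log(flux[ok])
    best = None
    for beta in betas:
        u = r[ok] ** (1.0 / beta)
        slope, intercept = np.polyfit(u, ln_flux, 1)
        if slope >= 0:
            continue                            # profile must fall with radius
        f0 = np.exp(intercept)
        alpha = (-slope) ** (-beta)             # since slope = -alpha**(-1/beta)
        chi2 = np.sum((flux - profile(r, f0, alpha, beta)) ** 2)
        if best is None or chi2 < best[0]:
            best = (chi2, f0, alpha, float(beta))
    return best[1:]


def differential_shape(alpha, beta, star_shape):
    """Shape (alpha * beta) minus the point-source ridgeline value."""
    return alpha * beta - star_shape
```

Here `star_shape` stands for the stellar ridgeline value at the source magnitude, which in the pipeline would come from the seeing characterization rather than from this code.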
The ridgeline parameters f0(star) and shape for point sources and other ridgelines can be fully described by point sources found from PIXPHOT and characterized by SEEMAN.
In order to distinguish double stars from truly extended objects, the radial profile fit is performed again on each potential extended source after masking off a 60 degree wedge, with the vertex anchored to the source peak. Nine different fits are performed, using wedges that in turn cover a full circle around the source. After determining where the minimum shape occurs, a finer grid of wedge angular placements is used to determine the actual minimum shape for each source. Since the ``other'' star will be masked by one of these wedges, the minimum shape will be that of a point source, whereas galaxies almost never attain such a low minimum shape value due to their extent in all directions.
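The wedge masking step can be sketched as below. Only the mask geometry is shown; the profile fit that would be repeated for each wedge is left out. The angle convention (measured from the +x pixel axis), the even spacing of the nine coarse placements, and the function names are assumptions for illustration.

```python
import numpy as np


def wedge_mask(shape, center, start_deg, width_deg=60.0):
    """True for pixels whose position angle about `center` lies in the wedge."""
    yy, xx = np.indices(shape)
    angle = np.degrees(np.arctan2(yy - center[0], xx - center[1])) % 360.0
    inside = ((angle - start_deg) % 360.0) < width_deg
    inside[center] = False        # the vertex (source peak) is never masked
    return inside


def coarse_wedge_starts(n_wedges=9):
    """Starting angles of evenly spaced wedges covering the full circle."""
    return [i * 360.0 / n_wedges for i in range(n_wedges)]
```

With nine 60 degree wedges stepped by 40 degrees, adjacent placements overlap, so a companion star at any position angle is covered by at least one wedge. The minimum shape over the nine fits would then seed the finer grid of placements described in the text.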
In order to attempt to distinguish triple stars from extended objects, a profile fit is performed for the two symmetric radial vectors extending along the elliptical major axis from the source peak, as well as 8 additional radial fits along the cardinal points and rays rotated 45 degrees from the cardinal points. Another powerful discriminant of triple stars is the psf-subtracted segmented annular flux.
Double and some triple stars are rejected by applying thresholds in ``wedged'' shape versus radial shape, in the ``major-axis'' shape versus radial shape and annular flux as mentioned above. A score is assigned to each remaining object that represents the probability of that object being a galaxy. Additional parameters used to discriminate point sources from extended objects include the radial 1st and 2nd moments and a set of annular parameters: differential annular brightness (normalized by the total integrated brightness) for several different annuli, and symmetry (about a point) in the annular brightness distribution (specifically, the azimuthal flux distribution). Finally, the reduced chi-square determined from the radial galaxy profile fit to the major axis of the object can be used to distinguish double and triple stars from truly extended objects.
The key parametric values computed for each object are the position, 2-D elliptical shape (computed from K alone, or H alone, or J alone, depending on whether the source is a 3-band source, a 2-band (JH) source or a J-only source), maximum circular and elliptical sizes, central surface brightness, elliptical fit to the 3-sigma isophote, and the source extent: generalized galaxy profile shape, wedge-masked galaxy profile shape, major-axis profile shape, the elliptical profile shape and the radial moments. Neighboring detections are subtracted (or masked out, depending upon their relative brightness) during the elliptical and profile fits.
Galaxies and other extended sources are then selected from these objects by applying thresholds in central surface brightness and in the various galaxy profile shapes as a function of magnitude. These thresholds apply to the quantitative distance these parametric values have from ridgelines defined by pure stellar sources. Double stars and other false galaxies are rejected with the wedge-masked, radial vector profile shapes, and the psf-subtracted annular flux. Other artifacts, including geometrically well-defined ``ghosts'' and persistence are also rejected. A score is assigned to each object that represents the probability of that object being a galaxy. If a source appears to be a galaxy in any of the three bands, then it is identified as a galaxy (i.e., to reject a source as an extended object, it must fail the score criteria in each band).
VI.e.2. Source Position
The position of the source is determined by computing the intensity-weighted centroid for each band using a box kernel of width 5 pixels. The pixel positions are then combined in a simple average to obtain the final pixel position.
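A minimal sketch of this position estimate follows. It assumes the images are background-subtracted, that the peak lies at least two pixels from the image edge, and that every band is centroided around the same starting peak; the function names are hypothetical.

```python
import numpy as np


def box_centroid(image, peak, width=5):
    """Intensity-weighted centroid in a width x width box centered on `peak`."""
    half = width // 2
    y0, x0 = peak
    cut = image[y0 - half:y0 + half + 1, x0 - half:x0 + half + 1]
    yy, xx = np.mgrid[y0 - half:y0 + half + 1, x0 - half:x0 + half + 1]
    total = cut.sum()
    return float((yy * cut).sum() / total), float((xx * cut).sum() / total)


def combined_position(band_images, peak):
    """Simple (unweighted) average of the per-band centroids."""
    centroids = [box_centroid(img, peak) for img in band_images]
    y, x = np.mean(centroids, axis=0)
    return float(y), float(x)
```

For example, `combined_position([j_image, h_image, k_image], (row, col))` would return the final pixel position for a 3-band source.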
VI.e.3. Photometry
With extended sources detected and identified, the next step is to compute true integrated fluxes (or upper limits, as the case may be). The magnitude is determined via aperture photometry using several fixed-size apertures, as well as total magnitudes derived from elliptical and circular apertures determined from fitting an adaptive aperture to the galaxy on the K image (the same size apertures are used in each band). In addition, an isophotal integrated magnitude is computed from the 20 and 21 mag per arcsec2 elliptical isophotes at K, using the same aperture at J and H for a 3-band source. Finally, a Petrosian radius and magnitude are computed; the Petrosian radius is defined as the radius at which the radial SB = 1/5 of the median SB of the galaxy. It is also possible to compute a hybrid flux consisting of the total flux as derived from the adaptive aperture method plus a flux computed from the integral of a modified exponential function fit to the "wings" of the radial profile. The purpose of this flux measure is to recapture galaxy flux lost in the noise, assuming smooth galaxy profiles. For more discussion of this flux, see routine surfrad in the appendix.
For sources detected at H & J only, the apertures determined for H are applied to J & K. For sources detected only in J, the J apertures are applied to H & K. Integrated fluxes are derived as follows. The method uses only the global background (from the cubic polynomial fit; see above), and uses both neighboring star blanking and subtraction. Bright stars and large extended sources are blanked as described earlier. All sources are subtracted from the coadd. With neighboring sources either blanked or subtracted as appropriate, the various fixed and adaptive apertures are then applied.
Extended sources will go into the main galaxy database, which will include the position (RA, DEC), central surface brightness, circular aperture magnitudes (radii = 5, 10, 15, 20, 25, 30, 40, 50 and 60 arcsec), optimized elliptical & circular aperture magnitudes, isophotal magnitudes, Petrosian radius and magnitude, hybrid extrapolated radius and magnitude, 2-D elliptical shape parameters (eccentricity, semi-major axis, position angle), mean background value and sigma, the gradient and residual emission in the local background, and radial profile shape parameters (central surface brightness, alpha, beta).
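The fixed circular aperture magnitudes listed above can be sketched as follows. This is a simplified illustration: it sums whole pixels whose centers fall inside each aperture (no fractional-pixel weighting), assumes neighboring sources have already been blanked or subtracted, and takes the photometric `zero_point` and a 1 arcsec pixel scale as inputs supplied by the caller.

```python
import math

import numpy as np

RADII_ARCSEC = (5, 10, 15, 20, 25, 30, 40, 50, 60)   # radii quoted in the text


def aperture_magnitudes(image, center, zero_point, pixel_scale=1.0,
                        radii=RADII_ARCSEC):
    """Return {radius_arcsec: magnitude or None} for circular apertures."""
    yy, xx = np.indices(image.shape)
    r2 = (yy - center[0]) ** 2 + (xx - center[1]) ** 2
    mags = {}
    for radius in radii:
        flux = float(image[r2 <= (radius / pixel_scale) ** 2].sum())
        # A non-positive flux has no magnitude; an upper limit would be
        # reported instead, which is outside the scope of this sketch.
        mags[radius] = zero_point - 2.5 * math.log10(flux) if flux > 0 else None
    return mags
```

The same aperture set would be applied to each band at the common source position, following the band rules given above.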
VI.f. Extract Postage-Stamp Images
The final step of algorithm 1 is to build the extended source image archive. For objects classified as extended sources, or known to be large galaxies from the optical input catalog, regions surrounding the objects are extracted from the coadds and stored to disk. These subimages have a size such that at least 95% of the observable flux of the galaxy is contained in the image.
After image extraction and image blanking (that is, the area defining the processed object is blanked from the coadd image), the last set of characterization parameters are computed. These include computation and parameterization of the "local" background -- corresponding to the pixels local to the object which has been blanked from the coadd. These parameters are designed to detect nebular flux and other residual emission local to the object.
VI.g. LCSB Galaxy Detection and Extraction
The purpose of this operation is to recover low central surface brightness galaxies (and nebular structures) that might have been missed by the pipeline GALWORKS processing. All stars found by PIXPHOT are masked off (from JHK-band images) using a radius large enough to ensure that no flux remains in the coadd from them. All galaxies found earlier are masked off similarly. The masked-off coadd is then block averaged, first 2 X 2, then 4 X 4 and finally 8 X 8. A filter is then applied to the blocked image that detects all sources with integrated flux above an N-sigma threshold (N is tied to source density; for the low density case, N = 4.0). Another filter will derive circular aperture magnitudes for various radii for all detections from the individual JHK images and compute a Petrosian radius and flux magnitude. Sources found from this process will go into a separate low surface brightness galaxy database (as it is anticipated that the reliability of such detections will be lower than with algorithm 1), which will include only the following information: position (RA, DEC), circular aperture magnitudes (radii = 5, 10, 15, 20, 30, 40, 50, 60, 70), circular Petrosian radius and flux magnitude, peak SNR and total SNR.
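The block averaging and thresholding described above can be sketched as below. Several points are assumptions: each block size is applied independently to the masked image, masked pixels are taken to have been set to zero, and the noise in an n x n block mean is taken to be the per-pixel sigma divided by n, which holds only for uncorrelated pixel noise.

```python
import numpy as np


def block_average(image, n):
    """Mean of non-overlapping n x n blocks (trailing partial blocks dropped)."""
    ny = (image.shape[0] // n) * n
    nx = (image.shape[1] // n) * n
    return image[:ny, :nx].reshape(ny // n, n, nx // n, n).mean(axis=(1, 3))


def lcsb_candidates(image, pixel_sigma, n_sigma=4.0, block_sizes=(2, 4, 8)):
    """Return (block_size, block_row, block_col) for blocks above threshold."""
    hits = []
    for n in block_sizes:
        blocked = block_average(image, n)
        threshold = n_sigma * pixel_sigma / n   # noise of an n*n pixel mean
        for by, bx in np.argwhere(blocked > threshold):
            hits.append((n, int(by), int(bx)))
    return hits
```

A faint patch that is invisible at 2 X 2 and 4 X 4 can cross the threshold at 8 X 8, which is the point of stepping through progressively larger blocks.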
After the primary extended source pipeline processing has completed, it will be necessary to perform some additional operations in order to generate the "final" galaxy catalog product. Two such examples include cleaning the extended source lists of contaminant (false) detections and merging duplicate sources. Many of these tasks will be performed in DBMAN.
False detections can occur for situations in which a bright star (or a large, uncataloged galaxy) is located on the edge of a scan. Bright stars have diffraction spikes and other "extended" structures that may be detected (falsely) as galaxies. To clean these false sources, the candidate galaxy list and the list of known bright stars (from PIXPHOT and elsewhere) are matched and the false detections (those located near bright stars or near their associated diffraction spikes) are eliminated. Note that it may be possible to detect meteor streaks in the main pipeline processing. See Appendix for further discussion of meteor/streak detection.
To clean false detections due to large galaxies near the edge of a scan, the galaxy list is searched for close proximal objects: pairs of objects which are located within some radial annular distance to each other (e.g., this radius may be set by the 1… isophotal radius of the brighter object). This operation will clean both false detections due to large galaxies near the edge of scans and false detections located in close proximity to large galaxies (e.g., a noise spike occurring near the disk of a large spiral galaxy). Optimal setting of the proximity radius is necessary to balance the cleaning of false detections (in which a large radius is optimum) versus inclusion of merging galaxies and other extended objects that are in close angular proximity (here we want a small cleaning radius to avoid rejecting these objects).
The same galaxies, or duplicates, may occur more than once in the candidate galaxy list since there is 10% overlap between scans and actual coadd images. It is therefore necessary to eliminate duplicate appearances of galaxies by selecting the object located the farthest from the edge of a scan or coadd (since more of the galaxy flux is captured).
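Duplicate removal by edge distance could be sketched as follows. The 5 arcsec match radius, the dictionary keys, and the greedy matching order are assumptions for illustration; RA wrap-around at 0/360 degrees is ignored to keep the example short.

```python
import math


def separation_arcsec(a, b):
    """Small-angle separation between two sources with 'ra' and 'dec' in degrees."""
    mean_dec = math.radians(0.5 * (a["dec"] + b["dec"]))
    d_ra = (a["ra"] - b["ra"]) * math.cos(mean_dec)
    d_dec = a["dec"] - b["dec"]
    return 3600.0 * math.hypot(d_ra, d_dec)


def merge_duplicates(sources, match_radius_arcsec=5.0):
    """Keep, for each group of coincident sources, the one farthest from an edge.

    Each source carries 'edge_dist', its distance from the nearest scan or
    coadd edge. Sources are visited from largest to smallest edge distance,
    so the first member of any duplicate group to be kept is the preferred one.
    """
    kept = []
    for src in sorted(sources, key=lambda s: -s["edge_dist"]):
        if all(separation_arcsec(src, k) > match_radius_arcsec for k in kept):
            kept.append(src)
    return kept
```

In practice the match radius would probably need to scale with galaxy size, for the same reasons given for the proximity radius above.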
The probability of stellar contamination is essentially independent of galaxy magnitude. This arises because brighter galaxies are bigger, but the contaminating star density is lower since the threshold for contamination is raised. The probability that {the J isophotal magnitude measurement corresponding to H isophotal mag of 20 mag per square arcsecond} is contaminated by 10% is about 10% at the galactic poles, and 30% at b ~ 20 degrees. The probability that the J "total magnitude" is contaminated by 10% is about 20% at the poles, and 80% at b ~ 20 degrees.
Thus a significant portion of the entire galaxy database will consist of sources contaminated by stellar sources, which makes it essential to verify either that the algorithms have done the best job, or to identify sources affected by a particular sort of contamination. Without detailed examination, we won't really know how many sources are affected by problems of other sorts. How do we implement the "human-in-the-loop" examination of all galaxy images? One possible approach is the following: A human can quickly look at a coadd image and set flags, as appropriate, for the following conditions:
1. Simulation: Two kinds of simulations are performed, the first being a full simulation of both stars (modeled using a Lorentzian distribution) and galaxies (exponential disk and r^(1/4) elliptical distributions), and the second being a hybrid using simulations of stars and actual coadd data (e.g., coadds containing the Coma Cluster region). These simulations (performed using a wide range in stellar number density) are designed to test for completeness and reliability, as well as photometric uncertainty due to SNR and stellar contamination.
2. Repeated observations of galaxy cluster fields (e.g., Coma, Hercules, HHE, Abell 2065, Abell 262, Virgo) as well as random fields (e.g., near the Galactic plane).
3. Comparison to higher resolution observations (both optical and infrared).
4. Comparison to previously obtained results at other wavelengths (e.g., using the NED database; APM)
More Details on BACKGROUND DETERMINATION
Section VI.c. gave an overview of the background determination. Here we provide a more detailed description. Proper background subtraction is crucial to efficient star-galaxy discrimination and accurate photometry. As an example of the latter, an error of only 0.1 DN in the background results in ~11% error in the total magnitude estimate for a 13th mag galaxy. Given the relatively large number of pixels in a coadd image (512 X 1024), it is possible to measure the background accurately if one uses as much of this information as possible. The background determination algorithm developed for 2MAPPS employs both median filtering and polynomial fitting of the coadd as described below.
Note: the input image array has been "cleaned" as follows:
1. Bright stars have been masked, as well as their associated persistence and ghost features.
2. Stars (from PROPHOT) have been masked from the image.
3. Large galaxies (from the UGC catalog) have been masked from the image.
Algorithm:
1. Median filter coadd using N X N box kernel. To minimize runtime N > 4, with a current default
value of 8 (preliminary testing, however, suggests that a block of 16 is feasible and is much
faster). The purpose of median filtering is twofold: (1) screen out deviant pixels due to star
contamination or bad pixels and (2) compress pixel information to speed up the polynomial fitting
procedure, described below.
2. Polynomial fitting of coadd columns and rows. A one dimensional cubic polynomial is first
fit to each column of the image array. A residual between the data and the fit is computed, from
which those points whose residuals are less than 3-sigma are included in the next iterative fit.
This rejection procedure is iterated two - three times in order to minimize the inclusion of stars
and small galaxies in the background determination. The next step consists of a cubic
polynomial fit to each line of the image, where the line consists of solutions from the column
1-D fit in step 1. By coupling the column solution with the line solution, we are in effect fitting
a 2-D surface to the coadd. The solution from this step is the final background.
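Step 1 of the algorithm above can be sketched as a block median. This reading is an assumption: because the text says the filter compresses the pixel information, the example reduces each N X N block to a single median value rather than applying a sliding-window filter. Masked pixels are assumed to be stored as NaN and are ignored.

```python
import numpy as np


def median_block_filter(image, n=8):
    """Median of each non-overlapping n x n block, ignoring NaN (masked) pixels."""
    ny = (image.shape[0] // n) * n
    nx = (image.shape[1] // n) * n
    blocks = image[:ny, :nx].reshape(ny // n, n, nx // n, n)
    # Bring the two within-block axes together so each block is one row of n*n values.
    blocks = blocks.transpose(0, 2, 1, 3).reshape(ny // n, nx // n, n * n)
    return np.nanmedian(blocks, axis=2)
```

For a 512 X 1024 coadd and the default N = 8, this leaves a 64 X 128 array for the polynomial fitting of step 2, which is where the runtime saving comes from.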
This algorithm is sensitive to background gradients or structures that have a size scale between 3 and 5
arcminutes.
Point sources that require a bandfill flux measurement are handled within the GALWORKS subsystem as follows. One of the first steps of GALWORKS is to call the bandfill routine, which reads the point source file, converts position coordinates to coadd x-y coordinates, performs bandfill operations as needed, writes back out the point source list with bandfill entries, and returns to GALWORKS the point source list array (x,y, J, H, K mags).
1. Read point source list file from disk
2. Convert source position to coadd x-y coordinates
3. For those sources requiring a bandfill, perform aperture photometry
LCSB PROCESSOR
EXTENDED BRIGHT STAR DETECTION
It is possible to detect emission surrounding bright stars if the star and its associated
reflected light (e.g., diffraction spikes) are properly subtracted from the coadd first. The
algorithm described here (a.k.a. algorithm 3) is designed to find bright emission around (but well
beyond the PSF of) bright stars.
METEOR / STREAK DETECTOR
The purpose of this module is to detect linear streaks in the coadd images caused by meteors, satellites, airplanes and other fast moving objects in the sky. The basic idea is to find linearly correlated pixels (mini-streaks or vectors) and to then collate the vectors into one single vector/corridor according to vertex position and position angle. A meteor streak, for example, will have several mini-streaks that are correlated along one direction (or along a corridor), so that if one counts the number of pixels within this corridor the value will be large (and the length of the corridor will be relatively long) compared to some random set of linearly correlated pixels that happen to be aligned along some corridor (which may be due to some faint galaxy, bright star diffraction spikes, faint triple stars, etc.).
The input image has been previously processed by GALWORKS so that all sources (brighter than some limit), bright stars and their associated persistence ghosts and galaxies have been masked from the coadd image. LSB sources, however, have not been masked (the reason being that the LSB processor will be sensitive to meteor streaks, so the streak detector needs to be run before the LSB detector).
Algorithm:
1. Block average image (2 X 2 seems to work best)
2. Compute stats of image: mean, median and st. dev
3. Linear Correlation: for each pixel, find all connected pixels with vals > 2*sigma, search
position angle space (i.e., vectors) for connected pixels.
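The corridor idea behind these steps can be illustrated with a simplified stand-in. Instead of building mini-streak vectors from connected pixels and collating them, the sketch below lets every pixel above the 2-sigma cut vote for a corridor at each trial position angle, in the manner of a Hough transform, and reports the most populated corridor. The block averaging of step 1 is omitted, and the corridor width, angle grid, and function name are assumptions.

```python
import numpy as np


def find_streak(image, n_sigma=2.0, n_angles=180, corridor_width=2.0):
    """Return (pixel_count, angle_deg, offset) of the most populated corridor.

    `offset` is the signed perpendicular distance (pixels) of the corridor
    from the image origin; the angle is measured from the +x pixel axis.
    """
    sigma = np.std(image)
    ys, xs = np.nonzero(image - np.median(image) > n_sigma * sigma)
    best = (0, None, None)
    if ys.size == 0:
        return best
    for angle in np.linspace(0.0, 180.0, n_angles, endpoint=False):
        theta = np.radians(angle)
        # Perpendicular distance of each bright pixel from a line through
        # the origin at this position angle.
        offset = ys * np.cos(theta) - xs * np.sin(theta)
        bins = np.round(offset / corridor_width).astype(int)
        values, counts = np.unique(bins, return_counts=True)
        k = counts.argmax()
        if counts[k] > best[0]:
            best = (int(counts[k]), float(angle), float(values[k] * corridor_width))
    return best
```

A real streak would be accepted only if the returned count (and the corridor length) is large compared with what randomly aligned bright pixels produce; choosing that cut is left open here.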