IV. WISE Data Processing
7. Archive Preparation and Loading
Contents
a. Source Data and Metadata
b. Image Data and Metadata
c. Data Output
d. Data and Metadata Transfer to IRSA
The WISE archive is built upon the existing infrastructure of the Infrared Science Archive (IRSA).
IRSA has curated and served scientific data products from NASA's infrared and
submillimeter projects and missions. It provides an excellent
platform for optimal scientific exploration of datasets from NASA missions.
IRSA's data holdings include data from the Infrared Astronomical Satellite (IRAS), 2MASS, Spitzer, Planck, Herschel, and various value-added Legacy survey datasets. IRSA provides seamless access to infrared data covering multi-wavelength, multi-epoch all-sky surveys. The new "Hydra" configurable archive interface, recently developed by the IRSA team, allows a unified experience across missions. Because the same data access platform serves many different databases, the user community can move between them interchangeably.
By using IRSA to archive the WISE dataset,
we aim to achieve stable, long-term management of the data and efficient service to the community.
The WISE mission requirement is that within 1 week of receiving the raw data at IPAC, the WISE archive shall deliver the data products, including the source tables and images, to the science team. The WISE archive has met this requirement by achieving
rapid data loading and efficient updating of the new database. Specifically, we were able
to load new datasets at least every 3 days, i.e., a minimum archive data loading frequency of twice a week. The procedure for WISE archive data loading can be summarized as follows. First, the raw data are ingested and processed through the WISE pipeline. The archive then prepares the source tables, image data, and metadata, and stages all of the data on the IRSA storage space for IRSA loading.
After the WISE OPS team completes and delivers the prepared WISE data to IRSA, the IRSA team loads the data, updates the various databases, and serves the new data to the science team.
The Archive Preparation subsystem compiles source data, image data, and metadata (data about the data)
that are output from various subsystems in the scan-frame and multi-frame WSDC processing pipelines,
adds a few columns in the data structures that no subsystem provides (such as galactic and ecliptic coordinates),
and outputs the combined data in files that are readable by the database loading system created by IRSA.
The Archive Preparation subsystems are run independently of the data processing pipelines,
at any time after the data processing and Quality Assurance are complete.
a. Source Data and Metadata
i. Pipeline Output Data and Metadata
The Archive Preparation subsystem reads output files from the Scan/Frame pipeline or Multiframe pipeline
containing the
- source data,
- photometric calibration data,
- known solar system object association data,
- and quality assurance data, including quality scores and 2MASS source associations,
and metadata from the
- instrumental calibration subsystem,
- source detection subsystem,
- source photometry subsystem,
- position reconstruction subsystem,
- known solar system object association subsystem,
- artifact identification subsystem and
- the pipeline executive system.
2MASS Associations
Archive Preparation also adds 2MASS All-Sky Point Source Catalog (PSC)
association data to the WISE source record. These associations were
created during the Quality Assurance process by searching for entries
in the 2MASS PSC that lie within 3.0 arcsec of the position of the WISE
source in the Single-exposure frame or Atlas Tile.
The full composite 2MASS All-Sky PSC was searched,
including the high reliability
Catalog and the lower signal-to-noise extension (see
section I.6.b.i of the 2MASS All-Sky Data Release Explanatory Supplement).
If more than one 2MASS PSC object was found within 3 arcseconds of the WISE
source, the closest association was recorded.
No correction was made for proper motion between the epochs of 2MASS and WISE.
The following columns are added to WISE source records that have an associated 2MASS PSC entry
within 3 arcseconds: tmass_key, the unique
identifier of the associated 2MASS PSC source; n_2mass,
the number of 2MASS PSC sources found within 3 arcseconds
of the WISE source position; the J, H, and Ks
photometry and uncertainties of the associated 2MASS source; and the amplitude and
position angle (in degrees East of North) of the vector
from the WISE source to the 2MASS source.
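The association rule described above (closest 2MASS PSC entry within 3 arcseconds, plus a count of all entries inside that radius) can be sketched in Python as follows. This is only an illustration, not the Quality Assurance code: the function names and the output keys sep_arcsec and pa_deg are hypothetical, the amplitude is assumed to be expressed in arcseconds, and the search radius is assumed to be inclusive.

```python
import math

MATCH_RADIUS_ARCSEC = 3.0  # search radius quoted in the text


def separation_and_pa(ra1, dec1, ra2, dec2):
    """Vector from point 1 to point 2; all inputs in degrees.

    Returns (separation in arcsec, position angle in degrees East of North).
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    dra = ra2 - ra1
    # The haversine form stays accurate at arcsecond-scale separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(dra / 2) ** 2)
    sep = 2 * math.asin(min(1.0, math.sqrt(h)))
    pa = math.atan2(
        math.sin(dra) * math.cos(dec2),
        math.cos(dec1) * math.sin(dec2)
        - math.sin(dec1) * math.cos(dec2) * math.cos(dra),
    )
    return math.degrees(sep) * 3600.0, math.degrees(pa) % 360.0


def associate(wise_ra, wise_dec, candidates):
    """candidates: iterable of (key, ra_deg, dec_deg) 2MASS entries.

    Returns the closest match within the radius, or None if there is none.
    No proper-motion correction is applied, as in the text.
    """
    matches = []
    for key, ra, dec in candidates:
        sep, pa = separation_and_pa(wise_ra, wise_dec, ra, dec)
        if sep <= MATCH_RADIUS_ARCSEC:  # inclusive radius is an assumption
            matches.append((sep, pa, key))
    if not matches:
        return None
    sep, pa, key = min(matches)  # smallest separation wins
    return {"tmass_key": key, "n_2mass": len(matches),
            "sep_arcsec": sep, "pa_deg": pa}
```

For a WISE source with two 2MASS entries inside the radius, n_2mass would be 2 while tmass_key and the photometry would come from the nearer entry only.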
CAUTION: Any 2MASS source
information included in the WISE source record is an association,
not an identification. Although the position accuracies of
the two Catalogs are good, there is a non-zero probability of chance
associations between physically unrelated objects, as well as
missed associations between the Catalogs. You should always
confirm associations by reviewing the entries from both Catalogs.
ii. Data Added in Archive Preparation
The Archive Preparation subsystem also creates certain data to add to the source data.
The added data fields for both scan-frame and multiframe data are as follows:
- Galactic and ecliptic coordinates glon, glat, elon, and elat
are calculated for each source and frame center
from the equatorial coordinates output by the pipeline.
For the frame metadata, the frame center is defined as the frame center of band W1.
- The source photometric quality flag, ph_qual,
is a four-character flag, with one character per band [W1/W2/W3/W4],
that provides a shorthand summary of the quality of the profile-fit
photometry measurement in each band, as derived from the measurement
signal-to-noise ratio. The values are:
- A - Source is detected in this band with a flux signal-to-noise
ratio w?snr>10.
- B - Source is detected in this band with a flux signal-to-noise
ratio 3<w?snr<10.
- C - Source is detected in this band with a flux signal-to-noise
ratio 2<w?snr<3.
- U - Upper limit on magnitude; the source measurement has w?snr<2.
The profile-fit magnitude w?mpro is a 95% confidence upper limit.
- The bitwise source detection flag, det_bit,
is a bit-encoded integer indicating bands in which a source has
a w?snr>2 detection. For example, a source detected in W1 only
has det_bit=1 (binary 0001). A source detected in
W4 only has det_bit=8 (binary 1000). A source detected in
all four bands has det_bit=15 (binary 1111).
- Unit sphere coordinates, x, y, and z,
are calculated for all output tables containing positional information:
the source data, frame metadata, known solar system object associations, and photometric calibrator associations.
They are defined as follows:
x = cos(RA) * cos(Dec)
y = sin(RA) * cos(Dec)
z = sin(Dec)
Together these three columns form a unit three-vector representation of the position.
- The spatial index spt_ind value,
calculated from the unit sphere coordinates,
is the level 7
Hierarchical Triangular Mesh index for the position,
in decimal format.
Sources with the same spatial index value are close to each other on the sky,
and the spatial indexes thus provide a useful mapping of the WISE release on storage media.
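As a rough illustration of the ph_qual and det_bit definitions in the list above, the following Python sketch derives both flags from four per-band signal-to-noise values ordered W1 to W4. The function names are hypothetical, and the treatment of values exactly equal to 10, 3, or 2 is an assumption, since the text gives only strict inequalities; it should not be read as the subsystem's actual code.

```python
def ph_qual_char(snr):
    """Grade one band from its signal-to-noise ratio (w?snr)."""
    # Assumption: boundary values fall into the lower-quality grade.
    if snr > 10.0:
        return "A"
    if snr > 3.0:
        return "B"
    if snr > 2.0:
        return "C"
    return "U"  # upper limit


def ph_qual(snrs):
    """Four-character flag, one character per band in W1..W4 order."""
    return "".join(ph_qual_char(s) for s in snrs)


def det_bit(snrs):
    """Bit-encoded integer: bit 0 = W1, bit 1 = W2, bit 2 = W3, bit 3 = W4."""
    bits = 0
    for band_index, snr in enumerate(snrs):
        if snr > 2.0:
            bits |= 1 << band_index
    return bits
```

A source with w?snr values of 25.0, 7.5, 2.4, and 0.3 would then receive ph_qual "ABCU" and det_bit 7 (binary 0111).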
Additional data created by the Archive Preparation subsystem
for the scan-frame pipeline data are as follows:
- Designation is defined as sexagesimal, equatorial position-based source name in the form: hhmmss.ss+ddmmss.s. The full naming convention for WISE All-Sky Release Catalog sources has the form "WISE Jhhmmss.ss+ddmmss.s," where "WISE" indicates the source is from the All-Sky Release Source Catalog, and "J" indicates the position is J2000. The "WISE" acronym is not listed explicitly in the designation column.
The acronym for entries in the All-Sky Release Reject Table is "WISER".
CAUTION: Source designations should not be used as an astrometric reference. They are not a substitute for the source position information in the ra and dec columns.
- Both the source data and the Atlas tile metadata contain the 4-digit moon_lev flag for the Atlas tile. The four digits represent the "moon level" in the coadded images for bands 1 through 4. Each digit is the fraction of frames in that band's coadded image that are contaminated by moon-glow, multiplied by 10 and rounded up (the "ceiling"), with a maximum value of "9". For example, a band 1 coadded image made up of 100 frames, with 14 contaminated by moon-glow, would have a moon_lev value of:
moon_lev = ceil[(Nfrms_moon_contam/Nfrms_total)* 10]
or
moon_lev = ceil[(14/100) * 10] = ceil(1.4) = 2,
and the Atlas tile would have a moon_lev flag value of "2xxx" (where "xxx" are the calculated values for the other three bands).
- In the source data, the cntr number is a unique identification number for each source,
formed entirely from the source identifier source_id.
The source identifier is in turn formed from the scan identifier, scan_id,
the frame number within the scan, frame_id,
and the sequential number of the source within the frame, src.
Therefore, on average sources with cntr values close to each other are also close to each other on the sky,
except at scan/frame boundaries.
Cntr ordering thus provides a useful mapping of the WISE release on storage media,
though somewhat less useful than spatial index mapping.
The cntr value is formed by making the source identifier into an integer, in the format:
SSSSSssFFFIIIIII
where
- SSSSSss = Scan identifier scan_id,
with the letter in the last position translated into
two zero-filled digits corresponding to the letters' places in the alphabet.
- FFF = Frame number frame_id,
which is three zero-filled digits.
- IIIIII = six-digit, zero-filled, sequential extracted source number, src, within the frame.
For example, a source in the scan 04381b, frame number 057,
with a source_id of 04381b057-012345,
would have a cntr value of 438102057012345.
- In the metadata and association data, the cntr number is a unique identification number
for each output table entry,
created sequentially in the order of the Archive load processing.
- A value of "1" for the known solar system object association flag, sso_flg,
indicates that a known solar system object was associated
with this detected source during pipeline processing.
A value of "0" indicates that there was no association.
- The modified Julian date of the observation, mjd,
is calculated from the mid-point of the observation of the frame.
- In the metadata, the flag indicating whether the frame is within the moon-mask area, moon_masked, was compiled into a 4-digit flag, with one digit for each band.
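Two of the derived values in the list above, the moon_lev digits and the scan-frame cntr, are simple enough to sketch in Python. The function names are hypothetical, and the parsing assumes that a scan_id is always five digits followed by one letter and that frame_id is three digits, as in the example source_id 04381b057-012345.

```python
def moon_lev_digit(n_contaminated, n_total):
    """ceil(10 * fraction of moon-contaminated frames), capped at 9."""
    # Integer ceiling division avoids floating-point rounding surprises.
    level = -(-(10 * n_contaminated) // n_total)
    return min(9, level)


def moon_lev_flag(per_band_counts):
    """per_band_counts: four (n_contaminated, n_total) pairs for W1..W4."""
    return "".join(str(moon_lev_digit(c, t)) for c, t in per_band_counts)


def scan_frame_cntr(source_id):
    """Turn a source_id such as '04381b057-012345' into the integer cntr."""
    frame_part, src = source_id.split("-")
    scan_digits = frame_part[:5]      # SSSSS
    scan_letter = frame_part[5]       # translated into ss
    frame_num = frame_part[6:9]       # FFF
    letter_code = ord(scan_letter.lower()) - ord("a") + 1  # 'b' -> 2
    return int(f"{scan_digits}{letter_code:02d}{frame_num}{int(src):06d}")
```

With these definitions, scan_frame_cntr("04381b057-012345") reproduces the value 438102057012345 given in the text, and moon_lev_digit(14, 100) gives 2. The leading zero of the scan identifier is lost when the digit string becomes an integer, which is why the example cntr has 15 digits rather than 16.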
Additional data created by the Archive Preparation subsystem
for the multiframe pipeline data are as follows:
- In the source data, the cntr number is a unique identification number for each source,
formed entirely from the source identifier source_id.
The source identifier is in turn formed from the coadd identifier, coadd_id,
and the sequential number of the source within the tile, src.
Therefore, on average sources with cntr values close to each other are also close to each other on the sky, except at Tile boundaries.
Cntr ordering thus provides a useful mapping of the WISE release on storage media,
though somewhat less useful than spatial index mapping.
The cntr value is formed by making the source identifier into an integer, in the format:
RRRRsDDDttrevIIIIII, where
- RRRR = Tile center RA in deci-degrees, truncated not rounded
(e.g. RRRR=int[10*ra]), as in the coadd_id.
- s = Tile center Declination sign translated from "p" or "m" into "1" or "0", respectively.
- DDD = Tile center Declination in deci-degrees, as in the coadd_id.
For positive declinations, the tenths of a degree is truncated not rounded
(e.g. DDD=int[10*dec]).
For negative declinations, the tenths of a degree is always truncated leftward
on the number line (e.g. DDD=ceil[10*abs(dec)]).
- [ttrev] = Disambiguation string, translated where necessary into digits
corresponding to the letters' places in the alphabet.
- t - Tile type translated into two digits, zero-filled ("a" = "01").
- r - Data Release identifier translated into a single digit,
since the possible values are limited to "a" through "i" (The All-sky Release uses "b" = "2").
- e - Data epoch ("4" = 4-band data, "3" = 3-band data).
- v - Processing version.
The translated disambiguation string is always 01111 for Tiles in the WISE
Preliminary Data Release.
- IIIIII = six-digit, zero-filled, sequential extracted source number,
src, within the tile.
For example, a source in the tile 3041m137_aa11,
with a source_id of 3041m137_aa11-012345,
would have a cntr value of 3041013701111012345.
- In the metadata, the cntr number is a unique identification number
for each metadata output table entry,
created sequentially in the order of the Archive load processing.
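The multiframe cntr translation described above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the coadd_id layout seen in the example (four RA digits, "p" or "m", three declination digits, an underscore, then one character each for tile type, release, epoch, and version, with epoch and version already being digits).

```python
def letter_to_number(letter):
    """Place of a letter in the alphabet: 'a' -> 1, 'b' -> 2, ..."""
    return ord(letter.lower()) - ord("a") + 1


def multiframe_cntr(source_id):
    """Turn a source_id such as '3041m137_aa11-012345' into the integer cntr."""
    coadd_id, src = source_id.split("-")
    position, disambiguation = coadd_id.split("_")
    rrrr, sign, ddd = position[:4], position[4], position[5:8]
    tile_type, release, epoch, version = disambiguation  # four characters
    sign_digit = "1" if sign == "p" else "0"
    digits = (
        f"{rrrr}{sign_digit}{ddd}"
        f"{letter_to_number(tile_type):02d}"   # tt, zero-filled
        f"{letter_to_number(release)}"         # r, single digit ('a'..'i')
        f"{epoch}{version}"                    # e and v assumed to be digits
        f"{int(src):06d}"                      # IIIIII
    )
    return int(digits)
```

For the example in the text, multiframe_cntr("3041m137_aa11-012345") returns 3041013701111012345. The result has up to 19 digits, so it needs a 64-bit integer type wherever it is stored.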
b. Image Data and Metadata
i. Pipeline Output Data and Metadata
The Archive Preparation subsystem reads image file headers
and quality assurance score data
from the Scan/Frame pipeline or Multiframe pipeline output.
It also compiles a file transfer list of image files and artifact identification files
that must be transferred to the IRSA system.
A few of the fields from the image header keywords are renamed to improve clarity
and the date-time values are reformatted to be consistent
with other date-time output formats in Archive Preparation.
ii. Data Added in Archive Preparation
The Archive Preparation subsystem also creates certain data to add to the image metadata.
The added data for both scan-frame and multiframe data are as follows:
- The FITS image header WCS data and WCS subroutines
are used to calculate the equatorial positions
of the four corner pixels of each image,
w?ra1, w?dec1, w?ra2, w?dec2,
w?ra3, w?dec3, and w?ra4, w?dec4.
- The cntr number is a unique identification number
for each metadata output table entry,
created sequentially in the order of the Archive load processing.
- Unit sphere coordinates, x, y, and z,
are calculated for the image center position.
They are defined as follows:
x = cos(RA) * cos(Dec)
y = sin(RA) * cos(Dec)
z = sin(Dec)
Together these three columns form a unit three-vector representation of the position.
- The spatial index spt_ind value,
calculated from the unit sphere coordinates,
is the level 7
Hierarchical Triangular Mesh index for the position,
in decimal format.
Sources with the same spatial index value are close to each other on the sky,
and the spatial indexes thus provide a useful mapping of the WISE release on storage media.
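The unit sphere coordinates can be computed with the standard spherical-to-Cartesian conversion, in which y = sin(RA) * cos(Dec), so that the three components always have unit length. The Python sketch below is a minimal illustration with hypothetical function names; it also shows one common use of such columns, recovering the angle between two positions from their unit vectors. It does not attempt the Hierarchical Triangular Mesh index calculation.

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Return (x, y, z) on the unit sphere for an equatorial position in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (
        math.cos(ra) * math.cos(dec),
        math.sin(ra) * math.cos(dec),
        math.sin(dec),
    )


def angle_between_deg(v1, v2):
    """Angular separation in degrees between two unit vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    cross = (
        v1[1] * v2[2] - v1[2] * v2[1],
        v1[2] * v2[0] - v1[0] * v2[2],
        v1[0] * v2[1] - v1[1] * v2[0],
    )
    # atan2 of |cross| and dot is well behaved for both tiny and large angles.
    return math.degrees(math.atan2(math.sqrt(sum(c * c for c in cross)), dot))
```

Storing x, y, and z lets a database evaluate proximity constraints with simple arithmetic on three columns instead of trigonometric functions of RA and Dec.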
Also, additional data created for the scan-frame data are as follows:
- Galactic and ecliptic coordinates glon, glat, elon, and elat
are calculated for each image center
from the equatorial WCS frame center coordinates output by the pipeline.
(The galactic and ecliptic coordinates are already available in the pipeline output
for the multiframe data.)
c. Data Output
The Archive Preparation subsystem compiles all of the above data and metadata together
for each scan, frame, or atlas tile,
organizes them by the types of data,
and outputs them in various files that are readable by the IRSA database loading system.
Source and Image data inventory tables are also written to record the date-times each scan or atlas tile was processed
through the source and image processing steps in the archive preparation subsystem.
The subsystem also writes the file transfer list
to another file used by the file transfer routine in the IRSA database loading system.
To optimize the loading process,
many scan/frames' or atlas tiles' data are then compiled together to create larger load batches
for the database loading system.
Checksums for each image, data, and metadata file are also determined and recorded.
d. Data and Metadata Transfer to IRSA
When all the data is compiled together for a load batch,
the Archive Preparation subsystem creates a manifest file
that lists the data and metadata files, file transfer list file,
and checksums for each file
for that load batch and sends it to the IRSA database staff,
who initiate the IRSA database loading system.
The IRSA database loading system transfers all the files to the IRSA systems,
determines checksums for each transferred file,
and makes sure each checksum is identical to the one
recorded by the Archive Preparation subsystem.
The loading system then loads the data and metadata files
into the proper database tables, indexes the tables,
then releases them to the project-internal Gator User interface.
It also makes the image files available in the IRSA WISE Archive Image Interface.
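The checksum comparison between the manifest and the transferred files can be sketched in Python as below. The text does not say which checksum algorithm or manifest layout is used, so both are assumptions here: the sketch uses SHA-256 from the standard hashlib module and a hypothetical manifest with one "checksum  filename" pair per line. The function names are likewise illustrative only.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hex digest of a file, read in chunks so large images fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(paths, manifest_path):
    """Sending side: record a checksum for every file in the load batch."""
    lines = [f"{file_checksum(p)}  {Path(p).name}" for p in paths]
    Path(manifest_path).write_text("\n".join(lines) + "\n")


def verify_manifest(manifest_path, directory):
    """Receiving side: return names of files that are missing or do not match."""
    failures = []
    for line in Path(manifest_path).read_text().splitlines():
        expected, name = line.split(maxsplit=1)
        target = Path(directory) / name
        if not target.is_file() or file_checksum(target) != expected:
            failures.append(name)
    return failures
```

An empty list from verify_manifest would correspond to the case where every transferred file matches the checksum recorded by the Archive Preparation subsystem, so loading can proceed.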
One important and useful feature of WISE data loading into the archive database is the load ID, which identifies each dataset
loaded into the archive with a unique number. The load ID allows any specific dataset to be changed or updated easily and supports flexible queries on smaller subsets of the data. The IRSA team also uses this ID to track down bad data for replacement and to reload the correct or reprocessed data
into the database.
Last update: 2012 March 15