IV. WISE Data Processing

7. Archive Preparation and Loading


Contents

a. Source Data and Metadata
b. Image Data and Metadata
c. Data Output
d. Data and Metadata Transfer to IRSA

The WISE archive is built on the existing infrastructure of the Infrared Science Archive (IRSA). IRSA curates and serves scientific data products from NASA's infrared and submillimeter projects and missions, and provides a platform for scientific exploration of those datasets. The IRSA data holdings include data from the Infrared Astronomical Satellite (IRAS), 2MASS, Spitzer, Planck, Herschel and various value-added Legacy survey datasets. IRSA provides seamless access to infrared data covering multi-wavelength, multi-epoch all-sky surveys. The new "Hydra" configurable archive interface, recently developed by IRSA, allows a unified experience across missions. By using IRSA to archive the WISE dataset, we aim to achieve stable, long-term management of the data and efficient service to the community.

The WISE mission requirement is that within 1 week of receiving the raw data at IPAC, the WISE archive shall deliver the data products, including the source tables and images, to the science team. The WISE archive has met this requirement by achieving rapid data loading and efficient updating of the new database. Specifically, we were able to load new datasets at least every 3 days, i.e., a minimum archive data loading frequency of twice a week. The procedure for WISE archive data loading can be summarized as follows. First, the raw data are ingested and processed through the WISE pipeline. The archive then prepares source tables, image data and metadata, and stages all of the data for IRSA loading on the IRSA storage space. After the WISE OPS team completes and delivers the prepared WISE data to IRSA, the IRSA team begins the loading, updates the various databases, and serves the new data to the science team.

The Archive Preparation subsystem compiles source data, image data, and metadata (data about the data) that are output from various subsystems in the scan-frame and multi-frame WSDC pipelines, adds a few columns to the data structures that no subsystem provides (such as galactic and ecliptic coordinates), and outputs the combined data in files that are readable by the database loading system created by IRSA. The Archive Preparation subsystem is run independently of the data processing pipelines, at any time after the data processing and Quality Assurance are complete.
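As an illustration of the kind of derived column mentioned above, the following is a minimal sketch of converting a J2000 equatorial position to galactic and ecliptic coordinates. It is not the WSDC implementation: the function names are hypothetical, and the constants (the standard J2000 position of the north galactic pole and the J2000 mean obliquity) are assumptions about the reference frames, which this section does not specify.

```python
import math

# Assumed J2000 constants; the frames used by Archive Preparation are not stated here.
RA_NGP = math.radians(192.85948)      # RA of the north galactic pole
DEC_NGP = math.radians(27.12825)      # Dec of the north galactic pole
L_NCP = math.radians(122.93192)       # galactic longitude of the north celestial pole
OBLIQUITY = math.radians(23.4392911)  # mean obliquity of the ecliptic at J2000


def _clamp(value):
    """Keep a sine value inside [-1, 1] against floating-point round-off."""
    return max(-1.0, min(1.0, value))


def equatorial_to_galactic(ra_deg, dec_deg):
    """Return (glon, glat) in degrees for an equatorial position in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    d_ra = ra - RA_NGP
    sin_b = (math.sin(dec) * math.sin(DEC_NGP)
             + math.cos(dec) * math.cos(DEC_NGP) * math.cos(d_ra))
    y = math.cos(dec) * math.sin(d_ra)
    x = (math.sin(dec) * math.cos(DEC_NGP)
         - math.cos(dec) * math.sin(DEC_NGP) * math.cos(d_ra))
    glon = (L_NCP - math.atan2(y, x)) % (2.0 * math.pi)
    return math.degrees(glon), math.degrees(math.asin(_clamp(sin_b)))


def equatorial_to_ecliptic(ra_deg, dec_deg):
    """Return (elon, elat) in degrees for an equatorial position in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    sin_beta = (math.sin(dec) * math.cos(OBLIQUITY)
                - math.cos(dec) * math.sin(OBLIQUITY) * math.sin(ra))
    y = (math.sin(ra) * math.cos(dec) * math.cos(OBLIQUITY)
         + math.sin(dec) * math.sin(OBLIQUITY))
    x = math.cos(ra) * math.cos(dec)
    elon = math.atan2(y, x) % (2.0 * math.pi)
    return math.degrees(elon), math.degrees(math.asin(_clamp(sin_beta)))
```

Calling equatorial_to_galactic(266.40499, -28.93617) should return a position very close to the galactic origin, which is a convenient sanity check on the constants.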

a. Source Data and Metadata

i. Pipeline Output Data and Metadata

The Archive Preparation subsystem reads output files from the Scan/Frame pipeline or Multiframe pipeline containing the

and metadata from the

2MASS Associations

Archive Preparation also adds 2MASS All-Sky Point Source Catalog (PSC) association data to the WISE source record. These associations were created during the Quality Assurance process by searching for entries in the 2MASS PSC that lie within 3.0 arcsec of the position of the WISE source in the Single-exposure frame or Atlas Tile. The full composite 2MASS All-Sky PSC was searched, including the high reliability Catalog and the lower signal-to-noise extension (see section I.6.b.i of the 2MASS All-Sky Data Release Explanatory Supplement). If more than one 2MASS PSC object was found within 3 arcseconds of the WISE source, the closest association was recorded. No correction was made for proper motion between the epochs of 2MASS and WISE.

For WISE source records that have associated 2MASS PSC entries within 3 arcseconds, the following are added: the tmass_key column, which is the unique identifier of the associated 2MASS PSC source; n_2mass, which is the number of 2MASS PSC sources found within 3 arcseconds of the WISE source position; the J, H and Ks photometry and uncertainties of the associated 2MASS source; and the amplitude and position angle (in degrees East of North) of the vector from the WISE source to the 2MASS source.

CAUTION - Any 2MASS source information included in the WISE source record is an association, not an identification. Although the position accuracies of the two Catalogs are good, there is a non-zero probability of chance associations between physically unrelated objects, as well as missed associations between the Catalogs. You should always confirm associations by reviewing the entries from both Catalogs.
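The association rule described above (search within 3 arcseconds, keep the closest match, record the number of candidates and the separation vector) can be sketched as follows. This is an illustrative brute-force version, not the Quality Assurance code: the function names and the output keys r_arcsec and pa_deg are hypothetical, treating a source at exactly 3 arcseconds as a match is an assumption, and a production system would use a spatial index rather than a loop over all candidates.

```python
import math

MATCH_RADIUS_ARCSEC = 3.0  # search radius stated in the text


def separation_and_pa(ra1, dec1, ra2, dec2):
    """Separation (arcsec) and position angle (degrees East of North)
    of the vector from point 1 to point 2. All inputs are in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    d_ra = ra2 - ra1
    # The haversine form stays accurate at arcsecond-scale separations.
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(d_ra / 2.0) ** 2)
    sep = 2.0 * math.asin(min(1.0, math.sqrt(a)))
    pa = math.atan2(
        math.sin(d_ra) * math.cos(dec2),
        math.cos(dec1) * math.sin(dec2)
        - math.sin(dec1) * math.cos(dec2) * math.cos(d_ra),
    )
    return math.degrees(sep) * 3600.0, math.degrees(pa) % 360.0


def associate(wise_ra, wise_dec, tmass_sources):
    """Match one WISE position against (key, ra, dec) tuples from the 2MASS PSC.

    Returns None when nothing lies inside the search radius; otherwise a
    dict describing the closest source and the number of candidates found.
    """
    matches = []
    for key, ra, dec in tmass_sources:
        sep, pa = separation_and_pa(wise_ra, wise_dec, ra, dec)
        if sep <= MATCH_RADIUS_ARCSEC:  # assumption: boundary counts as a match
            matches.append((sep, pa, key))
    if not matches:
        return None
    sep, pa, key = min(matches)  # smallest separation wins
    return {"tmass_key": key, "n_2mass": len(matches),
            "r_arcsec": sep, "pa_deg": pa}
```

Because no proper-motion correction is applied, a fast-moving star can fall outside the fixed radius and return None even though it is present in both Catalogs.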

ii. Data Added in Archive Preparation

The Archive Preparation subsystem also creates certain data to add to the source data. The added data fields for both scan-frame and multiframe data are as follows:

Additional data created by the Archive Preparation subsystem for the scan-frame pipeline data are as follows:

Additional data created by the Archive Preparation subsystem for the multiframe pipeline data are as follows:

b. Image Data and Metadata

i. Pipeline Output Data and Metadata

The Archive Preparation subsystem reads image file headers and quality assurance score data from the Scan/Frame pipeline or Multiframe pipeline output. It also compiles a file transfer list of image files and artifact identification files that must be transferred to the IRSA system. A few of the fields from the image header keywords are renamed to improve clarity and the date-time values are reformatted to be consistent with other date-time output formats in Archive Preparation.
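The renaming and date-time reformatting step can be pictured with the short sketch below. Everything specific in it is hypothetical: this section does not list the keywords that are renamed or the date-time formats involved, so the keyword names, the rename map, and both format strings are placeholders chosen only to show the shape of the operation.

```python
from datetime import datetime

# Hypothetical rename map: pipeline header keyword -> archive metadata name.
RENAMES = {"DATIME": "date_obs"}
# Hypothetical set of output fields that hold date-time values.
DATETIME_FIELDS = {"date_obs"}
# Hypothetical input and output date-time formats.
IN_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"
OUT_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def prepare_image_metadata(header):
    """Return a copy of a header dict with keywords renamed and
    date-time values rewritten in a single common format."""
    prepared = {}
    for keyword, value in header.items():
        name = RENAMES.get(keyword, keyword)  # unlisted keywords pass through
        if name in DATETIME_FIELDS:
            value = datetime.strptime(value, IN_FORMAT).strftime(OUT_FORMAT)
        prepared[name] = value
    return prepared
```

Keeping the rename map and the formats as data rather than code makes it straightforward to apply the same conventions to both scan-frame and multiframe headers.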

ii. Data Added in Archive Preparation

The Archive Preparation subsystem also creates certain data to add to the image metadata. The added data for both scan-frame and multiframe data are as follows:

Also, additional data created for the scan-frame data are as follows:

c. Data Output

The Archive Preparation subsystem compiles all of the above data and metadata together for each scan, frame, or atlas tile, organizes them by the types of data, and outputs them in various files that are readable by the IRSA database loading system. Source and Image data inventory tables are also written to record the date-times each scan or atlas tile was processed through the source and image processing steps in the archive preparation subsystem. The subsystem also writes the file transfer list to another file used by the file transfer routine in the IRSA database loading system. To optimize the loading process, many scan/frames' or atlas tiles' data are then compiled together to create larger load batches for the database loading system. Checksums for each image, data, and metadata file are also determined and recorded.

d. Data and Metadata Transfer to IRSA

When all the data is compiled together for a load batch, the Archive Preparation subsystem creates a manifest file that lists the data and metadata files, file transfer list file, and checksums for each file for that load batch and sends it to the IRSA database staff, who initiate the IRSA database loading system.
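A manifest of this kind can be sketched as below. The checksum algorithm and the manifest layout are not specified in this section, so SHA-256 and a "checksum, two spaces, path" line format are assumptions, and the function names are hypothetical.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Checksum a file in chunks so large image files need not fit in memory.
    The algorithm is an assumption; the text does not name one."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(manifest_path, file_paths):
    """Write one 'checksum  path' line per file in a load batch and
    return the (checksum, path) pairs that were written."""
    entries = [(file_checksum(path), path) for path in file_paths]
    with open(manifest_path, "w") as out:
        for checksum, path in entries:
            out.write(f"{checksum}  {path}\n")
    return entries
```

In this sketch the file transfer list would simply be one of the paths passed to write_manifest, so that it is covered by a checksum like every other file in the batch.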

The IRSA database loading system transfers all the files to the IRSA systems, determines checksums for each transferred file, and makes sure each checksum is identical to the one recorded by the Archive Preparation subsystem. The loading system then loads the data and metadata files into the proper database tables, indexes the tables, and releases them to the project-internal Gator User interface. It also makes the image files available in the IRSA WISE Archive Image Interface.
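The post-transfer check can be sketched as follows. This is not the IRSA loading system: the function names are hypothetical, SHA-256 stands in for the unspecified checksum algorithm, and the recorded checksums are assumed to be available as a mapping from file name to checksum.

```python
import hashlib
import os


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Checksum a file in chunks. The algorithm is an assumption."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(recorded, transferred_dir):
    """Compare transferred files against recorded checksums.

    recorded maps file name -> checksum computed before the transfer.
    Returns the names of files that are missing or whose checksum
    differs; an empty list means the batch is safe to load.
    """
    failures = []
    for name, checksum in recorded.items():
        path = os.path.join(transferred_dir, name)
        if not os.path.isfile(path) or file_checksum(path) != checksum:
            failures.append(name)
    return failures
```

Loading would proceed only when verify_transfer returns an empty list; any name it returns identifies a file to transfer again.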

One important and useful feature of WISE data loading into the archive database is the load ID, a unique number that identifies each dataset loaded into the archive. The load ID allows easy change or update of any specific dataset, as well as flexible queries on a smaller dataset. The IRSA team also uses this ID to track down any bad data for replacement and to reload the correct or reprocessed data into the database.
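The role of the load ID can be illustrated with a small in-memory SQLite example. The table layout, column names, and function names are hypothetical and far simpler than the real archive schema, and SQLite merely stands in for the IRSA database; the point is only that tagging every row with its load ID lets one batch be selected or replaced without touching the others.

```python
import sqlite3


def make_demo_db():
    """Create a toy source table in which every row carries its load ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE source (source_id TEXT, load_id INTEGER, mag REAL)")
    conn.execute("CREATE INDEX idx_source_load ON source (load_id)")
    return conn


def rows_for_load(conn, load_id):
    """Query only the rows that belong to one load batch."""
    cursor = conn.execute(
        "SELECT source_id, mag FROM source WHERE load_id = ? "
        "ORDER BY source_id", (load_id,))
    return cursor.fetchall()


def replace_load(conn, load_id, rows):
    """Swap out one load batch for reprocessed data in a single transaction.

    rows is a list of (source_id, mag) pairs for the replacement batch.
    """
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute("DELETE FROM source WHERE load_id = ?", (load_id,))
        conn.executemany(
            "INSERT INTO source (source_id, load_id, mag) VALUES (?, ?, ?)",
            [(source_id, load_id, mag) for source_id, mag in rows])
```

Running the delete and the insert inside one transaction means a failed reload leaves the previous version of the batch in place.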


Last update: 2011 April 22

