IV. WISE Data Processing

2. Data Ingest


Contents

a. Data Transfer
i. Data Storage
ii. Manifest Files
iii. Output Log
iv. The Working Directory
b. MOS Ingest
i. MOS Data Requirements
ii. MOS Data Delivery Cadence
iii. Ancillary Data Database Generation
iv. MOS Ingest Metadata Table Output
c. HRP Ingest
i. Image Reconstruction
ii. Correlation to Ancillary Data
iii. Level 0 Image Creation and Archive Update
iv. Scan ID and Frame Number Assignment
v. Frame Index Update
vi. Recovery From Telemetry Errors
vii. Frame Accounting
viii. Scan Pipeline Kickoff
ix. HRP Ingest Meta-data File Output
d. Quicklook Processing

2. Data Ingest

The Ingest subsystem at the WSDC was responsible for assembling Level 0 images from various mission ancillary data products from the Mission Operations System (MOS) and science image data from the High Rate Processor (HRP).

Deliveries include the following:

Ingest has six primary functions:

  1. Detect data transferred to the WSDC ingest and move those data to archive locations ready for further processing
  2. Interpret mission ancillary data sent by MOS and construct internal databases for use in data processing
  3. Depacketize and decompress image data from the HRP and correlate it with mission ancillary data to create Level 0 images (L0)
  4. Initiate downstream Quicklook and Scan/frame pipeline processing
  5. Write/update L0 meta-data to the Frame Index (FIX)
  6. Provide QA metrics on the L0 images, mission ancillary data, and the ingest process

Figure 1 - End-to-End Data Flow

Ingest carries out its primary functions in three distinct stages:

  1. Data transfer: deliveries are archived and copied, triggering the subsequent stages (IV.2.a)
  2. Mission ancillary data ingest upon delivery by MOS (IV.2.b)
  3. HRP ingest upon delivery; this stage also performs pipeline kickoff and Frame Index updating (IV.2.c)

a. Data Transfer

The transfer process verified data for completeness and integrity and stored the transferred files on the WSDC operations system at IPAC. Copies of these data were used for the actual ingest process, allowing the original data to remain untouched.

i. Data Storage

Data integrity checking ensured that each file transfer was complete and uncorrupted. Files were compared against their accompanying manifest to verify that file names and sizes were as expected.
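
In practice, the check amounts to comparing each file's name and size against the corresponding manifest entries. The following Python sketch illustrates the idea under an assumed manifest format of one "filename size_in_bytes" pair per line; the actual WSDC manifest layout and tooling are not specified here.

    import os

    def verify_delivery(manifest_path, delivery_dir):
        """Report files that are missing or whose size differs from the manifest.

        Assumes a hypothetical manifest format of one 'filename size_in_bytes'
        pair per line; the real WSDC manifest layout may differ.
        """
        problems = []
        with open(manifest_path) as manifest:
            for line in manifest:
                if not line.strip():
                    continue
                name, expected_size = line.split()
                path = os.path.join(delivery_dir, name)
                if not os.path.exists(path):
                    problems.append("missing: " + name)
                elif os.path.getsize(path) != int(expected_size):
                    problems.append("size mismatch: " + name)
        return problems

    # A delivery passes integrity checking only if the returned list is empty.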

ii. Manifest Files

A summary of a transfer's contents was created and saved for later reference.

iii. Output Log

A log file containing a record of all actions taken was then created, saved and subsequently reviewed by the operations staff.

iv. The Working Directory

Each time the completion of a new delivery was detected, all data were copied into the operations working directory and validated. All subsequent ingest occurred out of the working directory.

b. MOS Ingest

The MOS at JPL is responsible for delivering all ancillary data, i.e. non-image meta-data necessary for subsequent data processing and evaluation.

Figure 2 - MOS Data Ingest Overview

i. MOS Data Requirements

Every ingest of HRP data (WISE science images) required that the following MOS ancillary data be up to date, i.e. that the most recent data available had already been transferred and ingested:

If any of these files were missing or out of date, HRP ingest was aborted and then restarted once all prerequisites were present.

ii. MOS Data Delivery Cadence

SCLK kernels were updated roughly weekly and on an as-needed basis. PEF and ground track files were delivered twice a week, in step with the command sequence generation cycle. All other MOS ancillary data were derived directly from telemetry downlinks and were transferred ahead of the associated HRP image data.

iii. Ancillary Data Database Generation

All MOS deliveries were handled by a common ingest pipeline with the actions to be taken driven entirely by what files were present in the delivery. The many MOS data file types were then segregated by the file name suffix, and each type was handled in its own specialized manner.

1. LSK, SCLK DB
The Leap Second Kernel (LSK) is a SPICE kernel describing the translation between Ephemeris Time (ET) and UTC. The Spacecraft Clock Kernel (SCLK) is also a SPICE kernel, describing, together with the LSK, the translation between Vehicle Time Codes (VTC) and UTC dates and times.

The clock kernel files are cumulative, that is, they include data that cover the whole mission up to the final time covered by the file. Thus, each new SCLK kernel ingested simply replaced previous versions.
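
As an illustration of how the LSK and SCLK kernels are used together, the SPICE toolkit (here via the spiceypy Python wrapper) converts a VTC to UTC in two steps. The kernel file names and the NAIF spacecraft ID below are assumptions for the sketch, not values taken from this document.

    import spiceypy as spice

    # Load the MOS-delivered leap second and spacecraft clock kernels
    # (file names are placeholders for whichever versions are current).
    spice.furnsh("naif0009.tls")   # LSK
    spice.furnsh("wise.tsc")       # SCLK

    WISE_SC_ID = -163  # assumed NAIF ID for WISE; confirm against the SCLK kernel

    def vtc_to_utc(vtc_string):
        """Translate a Vehicle Time Code string to a UTC date/time via SPICE."""
        et = spice.scs2e(WISE_SC_ID, vtc_string)   # VTC -> ephemeris time (ET)
        return spice.et2utc(et, "ISOC", 3)         # ET  -> ISO-format UTC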

2. CK DB
The C-kernel (CK) files are SPICE C-kernels describing spacecraft orientation. C-kernels (like SP kernels below) form a complete database when all CK files are read in time order by NAIF library routines.

3. SPK DB
The SP-kernel (SPK) files are SPICE kernels describing the ephemerides of various bodies, particularly the Moon, the planets and, most especially, the WISE satellite. SP-kernels (like C-kernels above) form a complete database for the mission when all SPK files are read in time order by the NAIF library routines.

NAIF library routines are directed by the ingest pipeline to read all C- and SP-kernels within a selectable time window covering the period of the data being ingested. In concert with the LSK and SCLK kernels, they then provide all of the functionality needed to determine spacecraft position and estimated orientation as a function of time.
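
A minimal sketch of that usage, again assuming spiceypy and placeholder kernel file names; the NAIF spacecraft ID is an assumption, and the C-kernel (attitude) query is only indicated in a comment.

    import spiceypy as spice

    # Load only the kernels whose coverage spans the ingest time window
    # (file names are placeholders for the MOS-delivered kernels).
    for kernel in ("naif0009.tls", "wise.tsc", "wise_spk.bsp", "wise_ck.bc"):
        spice.furnsh(kernel)

    WISE_SC_ID = "-163"   # assumed NAIF spacecraft ID for WISE

    def wise_geocentric_state(utc_string):
        """Position and velocity of WISE relative to Earth (J2000 frame, km and km/s)."""
        et = spice.utc2et(utc_string)
        state, _light_time = spice.spkezr(WISE_SC_ID, et, "J2000", "NONE", "EARTH")
        return state

    # The estimated spacecraft orientation at the same epoch comes from the C-kernels,
    # queried with the analogous CK reader routines (e.g. ckgp) using the encoded
    # spacecraft clock value.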

4. Ground Track DB
The ground track file contains ASCII record-oriented, time-stamped spacecraft longitude, latitude, altitude and SAA distance information.

5. Predicted Event File
The Predicted Event File (PEF) is a Sequence and Command Generation (SEQGEN) product which records start and stop times for a set of interesting on-orbit events. Events of greatest interest to ingest and pipeline processing are extracted and reformatted into an event table. The orbit number (which is used to generate the scan ID of the scan in which the event occurred) is also recorded.

6. Mission Plan DB
Mission Plan files, which originate at the SOC (UCLA), are text files passed to the WSDC via MOS. These files were archived for future reference.

7. H/K DB
The housekeeping (H/K) telemetry data were delivered to the WSDC as a series of compressed CSV files, correlating time to the various telemetry values. These files were combined to produce a cumulative, SQL-searchable DBMS of H/K data indexed on time.
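
A minimal sketch of that combination step, assuming one (utc, mnemonic, value) triple per CSV row and using SQLite as the DBMS; both are assumptions for illustration, not the actual H/K column layout or the operational database.

    import csv
    import gzip
    import sqlite3

    def ingest_hk_files(db_path, csv_paths):
        """Fold compressed housekeeping CSV files into one time-indexed table."""
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS hk (utc TEXT, mnemonic TEXT, value REAL)")
        con.execute("CREATE INDEX IF NOT EXISTS hk_time ON hk (utc)")
        for path in csv_paths:
            with gzip.open(path, "rt") as f:
                # Assumed row layout (no header line): utc, telemetry mnemonic, value
                rows = [(r[0], r[1], float(r[2])) for r in csv.reader(f) if r]
                con.executemany("INSERT INTO hk VALUES (?, ?, ?)", rows)
        con.commit()
        con.close()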

iv. MOS Ingest Metadata Table Output

Various summary statistics and other data were written to a metadata table for each MOS ancillary data delivery.

c. HRP Ingest

The High Rate Processor (HRP) at the White Sands Complex received telemetry from the TDRSS ground station, stripped off the communication packet wrappers, and organized the underlying source packets into four time-ordered output files, one for each WISE band included in that transfer. These, along with a manifest file, comprised an HRP delivery to the WSDC. Within each band's file for a given delivery, images were stored as a stream of compressed images, each tagged with its VTC in the CCSDS secondary packet header. The ingest process stripped off the source packet metadata, assembled and decompressed the images for each band into FITS format, and rotated the images to a common orientation. Ingest then associated (correlated) the image data, via the VTC (converted to UTC using the SPICE LSK and SCLK kernels), with various mission ancillary data, which were written, along with standard FITS image definition data, to the FITS header. The result comprised the L0 data product and was saved in the permanent Level-0 Archive.

Each delivery included data from an average of about 8 scans. A Quicklook Pipeline was run on a selected subset of the images in each delivery to provide a rapid evaluation of survey quality. All completed scans in a delivery were submitted for full processing by the Scan/Frame Pipeline.

Figure 3 - HRP Data Ingest Overview

i. Image Reconstruction

Telemetry files sent from the HRP contain time-ordered CCSDS source packets for one band covering the time range of that delivery. The packets were read as streams, and the CCSDS packet headers were used to establish the start and end positions of images in the stream, the VTC, and how much data was in the terminal packet (the only one which might contain fill data). The data portions of the source packets for a given image (i.e. with packet headers and fill data stripped off) were appended to one another to produce the complete image. The image data were then passed to the Rice decompression library to produce a decompressed image. Packet header data, mainly the VTC, were saved for each image for later use.
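
The reassembly step can be sketched as follows, assuming only the standard 6-byte CCSDS primary header layout; how WISE encoded the VTC secondary header and the terminal-packet fill length is not reproduced here, and this is not the actual ingest code.

    import struct

    def read_packets(stream):
        """Yield (apid, seq_flags, seq_count, data) for each CCSDS source packet."""
        while True:
            header = stream.read(6)
            if len(header) < 6:
                return
            word0, seq, length = struct.unpack(">HHH", header)
            apid = word0 & 0x07FF           # application process ID (11 bits)
            seq_flags = seq >> 14           # 0b01 first, 0b00 continuation, 0b10 last
            seq_count = seq & 0x3FFF        # 14-bit source sequence count
            data = stream.read(length + 1)  # data field length is stored as (bytes - 1)
            yield apid, seq_flags, seq_count, data

    def reassemble_images(stream):
        """Concatenate packet data fields between 'first' and 'last' segment markers."""
        image = bytearray()
        for _apid, seq_flags, _seq_count, data in read_packets(stream):
            if seq_flags == 0b01:           # first packet of a new image
                image = bytearray()
            image.extend(data)
            if seq_flags == 0b10:           # last packet: image is complete
                yield bytes(image)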

Every packet header was examined for internal consistency by checking the following:

If any of these checks failed, the packet was assumed to be corrupted, an error generated, the image was skipped and ingest processing continued.

If all checks were successful, the raw image was ready for correlation with the ancillary data.

ii. Correlation to Ancillary Data

To be processed by the Scan/Frame Pipeline, image data had to be associated with mission ancillary data; all of the required ancillary data had to be present, or processing could not continue. The required ancillary data are listed below in rough order of importance:

Other SEQGEN files, such as the mission plan file, were also delivered, but these were not used in Level 0 image construction.

All of the sources listed above were queried by image VTC to obtain the required data. These data were then saved as metadata associated with the image and written into the Level 0 image FITS header.
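
A sketch of the final write step using astropy.io.fits; the keyword names in the example are illustrative placeholders, not the actual WISE Level 0 header definitions.

    from astropy.io import fits

    def write_level0(image_array, ancillary, out_path):
        """Save a decompressed image plus correlated ancillary values as a Level 0 FITS file."""
        hdu = fits.PrimaryHDU(data=image_array)
        for keyword, (value, comment) in ancillary.items():
            hdu.header[keyword] = (value, comment)   # keywords supplied here are placeholders
        hdu.writeto(out_path, overwrite=True)

    # Hypothetical usage:
    # write_level0(img, {"FRAMEUTC": ("2010-04-07T12:34:56", "frame time (UTC)")}, "l0.fits")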

iii. Level 0 Image Creation and Archive Update

With the raw image and correlated ancillary data in hand for each image, the Level 0 FITS image is created as follows:

iv. Scan ID and Frame Number Assignment

The event table provides start and stop times for science scans, and with them a unique scan ID for that scan. The scan ID was derived as follows:

For example, suppose the PEF lists the orbit number as 1506.5 after a particular SEP crossing. The half-orbit number is 3013. The second scan that starts in this half-orbit would produce the scan ID 03013b.

"Frame numbers" (a frame is an image and associated meta-data) were assigned to each image based on the time elapsed from start of the scan (as recorded in the event table) to the start of the image's exposure in integer counts of 10 second bins as follows:

n(frame) = int((t(frame) - t0(scan)) / 10) + 1


where t(frame) is the image time, t0(scan) is the scan start time, and n(frame) is the resulting frame number. Note that because the 10-second bins are shorter than the interval between successive exposures, frame numbers are not sequential. This was necessary to guarantee unique frame numbers even when the spacecraft clock was being adjusted during a scan.
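
For illustration, the worked example above and the frame-number formula can be expressed directly in code; the zero-padding and letter-suffix convention below simply follow the 03013b example, since the full scan ID derivation list is not reproduced in this section.

    def frame_number(t_frame, t0_scan):
        """Frame number from time elapsed since scan start, in integer 10-second bins."""
        return int((t_frame - t0_scan) / 10) + 1

    def scan_id(orbit_number, scan_letter):
        """Scan ID built from the PEF orbit number, following the example above:
        twice the orbit number gives the half-orbit number, zero-padded to five
        digits, with a letter distinguishing successive scans in that half-orbit."""
        half_orbit = int(orbit_number * 2)
        return "%05d%s" % (half_orbit, scan_letter)

    # Reproducing the example: orbit 1506.5 -> half-orbit 3013, second scan -> '03013b'
    print(scan_id(1506.5, "b"))                      # prints 03013b
    print(frame_number(t_frame=137.0, t0_scan=0.0))  # frame 14 for an exposure 137 s into the scan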

v. Frame Index Update

Some of the metadata associated with each Level 0 frame were written to the Frame Index, a set of RDBMS tables that tracked important image information and the progress of every frame as it was processed. The key data in the Frame Index are:

The image center position, spatial bin, and frame time columns were indexed.

Subsequent processing, such as by the Scan/Frame Pipeline, updated other columns in the Frame Index to track processing progress and record QA data.
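
A minimal sketch of such a table, using SQLite for concreteness; the real Frame Index was a set of RDBMS tables with many more columns, and the column names below are placeholders chosen to match the indexed quantities mentioned above.

    import sqlite3

    con = sqlite3.connect("frame_index.db")
    con.executescript("""
    CREATE TABLE IF NOT EXISTS frame_index (
        scan_id     TEXT,
        frame_num   INTEGER,
        ra_center   REAL,      -- image center position
        dec_center  REAL,
        spatial_bin INTEGER,
        frame_utc   TEXT,      -- frame time
        status      TEXT,      -- updated by downstream pipeline processing
        PRIMARY KEY (scan_id, frame_num)
    );
    CREATE INDEX IF NOT EXISTS fix_pos  ON frame_index (ra_center, dec_center);
    CREATE INDEX IF NOT EXISTS fix_bin  ON frame_index (spatial_bin);
    CREATE INDEX IF NOT EXISTS fix_time ON frame_index (frame_utc);
    """)
    con.commit()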

vi. Recovery From Telemetry Errors

Ingest of the image telemetry packet stream occasionally generated errors due to packets being corrupted or lost in transmission. When this happened, the corrupted image was discarded and processing continued after resyncing to the start of the next image in the stream. All complete images not affected by corruption were ultimately extracted from a telemetry file.

vii. Frame Accounting

Frame accounting was done on an ongoing basis to ensure there were no major communication problems. This accounting was entirely advisory as it was not possible for the WSDC to initiate any actions to retrieve missing data.

When frame images were lost in transmission from the satellite to the WSDC, they were often, but not always, retransmitted by an automated accounting process at MOS. Images for a given scan could therefore arrive not only split across multiple deliveries but also out of order. In any given ingest run it was thus impossible to know whether missing frame data for a scan might ultimately appear, so frame accounting could only be done well after ingest completed, when no more frames for a scan could be downlinked.

Counting missing frame data was done in a few different ways:

viii. Scan Pipeline Kickoff

The Scan Pipeline was started for every scan that had all expected frames, or for which a drop-dead time interval had passed since the last frame was received. The Scan Pipeline would in turn start an instance of the Frame Pipeline for every frame with Level 0 image data.

Ingest created the output directories for frames known to be present in the scan, and the Frame Pipeline created symlinks to the Level 0 images for each frame and copied the ingest meta-data. These, the Frame Index, and static calibration data formed the prerequisites for Scan/Frame processing.
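
A sketch of the kind of setup involved, with a purely hypothetical directory layout and mapping from frame numbers to Level 0 image paths; the actual WSDC paths are not documented in this section.

    import os

    def prepare_frame_dirs(scan_dir, frame_numbers, l0_paths_for_frame):
        """Create per-frame working directories and symlink the Level 0 images into them.

        l0_paths_for_frame maps a frame number to its Level 0 image paths; the
        layout here is hypothetical."""
        for frame in frame_numbers:
            frame_dir = os.path.join(scan_dir, "%03d" % frame)
            os.makedirs(frame_dir, exist_ok=True)
            for l0_image in l0_paths_for_frame.get(frame, []):
                link = os.path.join(frame_dir, os.path.basename(l0_image))
                if not os.path.lexists(link):
                    os.symlink(l0_image, link)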

ix. HRP Ingest Meta-data File Output

Meta-data files recording detailed information for all frames received were saved with the Level 0 archive and in the Scan/Frame Pipeline processing directories.

d. Quicklook Processing

For each delivery, the ingest process started one Quicklook pipeline run on approximately 100 frames to give an immediate, preliminary look at data quality. Frames were selected based on the sky location exposed, statistical properties, in-field calibrators, and observation time, as well as a population of random frames. The selection criteria were meant to emphasize frames that would give clean PSF measurements, to confirm that the scan mirror was in sync with the scan rate.

The Quicklook Pipeline is the Scan Pipeline with command line parameters set such that:

A composite PSF was generated by Quicklook combining data from all the frames selected, and its properties were evaluated to confirm scan synchronization.



Last update: 2011 May 18

