Expected Confusion Noise as a Function of Stellar Number Density

Application of Confusion Noise and "Tuned" Score Thresholds

T. Jarrett, IPAC
(980121)

As the surface density of stars increases exponentially toward the disk of the Galaxy, the "confusion noise" becomes appreciable and is one of the primary obstacles to galaxy detection and extraction. To avoid wasting processor time on objects fainter than the confusion limit (defined below), GALWORKS estimates the confusion noise (via the stellar number density) and throttles back the magnitude (sensitivity) threshold limits accordingly. In addition to the mag limits, the star-galaxy discrimination score thresholds must be adjusted with density, both to minimize stellar contamination and to cap the total number of sources that GALWORKS churns out. The discussion of score thresholding, or "tuning" of the scores, follows the discussion of confusion noise.

The confusion noise calculation is explained briefly in the GALWORKS SDS document. The reader may also want to refer to the similar topic of Expected Number of Multiple Star Systems as a Function of Galactic Latitude.

The crux of the confusion noise matter is reproduced below:
Starcounts are performed for each coadd and the number density as a function of total integrated flux is computed. Flux thresholds are set by the coadd source density. If the density is sufficiently high (e.g. near the Galactic plane), then the limiting flux (i.e., the total integrated flux) is modified according to the estimated confusion, as given in the memo by Beichman dated 02-19-93. The estimated confusion noise is illustrated in Figure 5 of the noted memo (where the confusion+sky noise is plotted vs. source density; here the confusion noise is computed as a mean value in a 5 arcsec aperture). For example, for a stellar density of 1100 stars/deg^2 (K < 14) the sky surface brightness due to confusion noise has increased by 0.2 mag compared to the pole (300 stars/deg^2); for 5000 stars/deg^2 the increase is 0.5 mag; for 10000 stars/deg^2 the increase is 1.3 mag; for 20000 stars/deg^2 the increase is 2.0 mag.
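The quoted values can be turned into a rough lookup. The following is a minimal sketch, assuming linear interpolation in log10(density) between the five points quoted above and clamping outside that range. The interpolation scheme and the name `confusion_dmag` are illustrative assumptions, not the GALWORKS implementation, and the result will not match the curve GALWORKS actually uses.

```python
import math

# (stars/deg^2 with K < 14, increase in mag relative to the pole),
# taken from the values quoted in the text above.
QUOTED_POINTS = [
    (300.0, 0.0),
    (1100.0, 0.2),
    (5000.0, 0.5),
    (10000.0, 1.3),
    (20000.0, 2.0),
]


def confusion_dmag(density):
    """Estimate the confusion increase (mag) for a stellar density.

    Assumption: piecewise-linear interpolation in log10(density);
    values outside the quoted range are clamped to the end points.
    """
    if density <= QUOTED_POINTS[0][0]:
        return QUOTED_POINTS[0][1]
    if density >= QUOTED_POINTS[-1][0]:
        return QUOTED_POINTS[-1][1]
    x = math.log10(density)
    for (d0, m0), (d1, m1) in zip(QUOTED_POINTS, QUOTED_POINTS[1:]):
        x0, x1 = math.log10(d0), math.log10(d1)
        if x0 <= x <= x1:
            return m0 + (m1 - m0) * (x - x0) / (x1 - x0)
    raise ValueError("density not covered by the quoted points")


if __name__ == "__main__":
    for density in (300, 1100, 5000, 10000, 20000):
        print(density, round(confusion_dmag(density), 2))
```

Interpolating in the logarithm of the density is chosen here only because the plot described below uses log10 of the cumulative star counts as its axis.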

The estimated confusion noise as a function of the K stellar number density (log10 of cumulative stars per degree**2, with Kmag < 14.0) is given below. The solid white line represents the nominal confusion noise, while the red dashed line represents the actual value used by GALWORKS.

The GALWORKS confusion noise threshold was artificially set lower than the nominal value for data checkout purposes (amongst other things). Also denoted on the plot is the range of densities for some of the GALWORKS RTB fields, including:

mid30, located at glat = 30 deg (glong = 40)

mid20, located at glat = 19 deg

mid10, located at glat = 8 to 9 deg

low5, located at glat = 3 to 6 deg

NAN (north american nebula), located at -2 deg glat

msx, located near 0 to 1 deg glat

From the confusion noise plot we can estimate (to within 10 to 20%) the expected mag thresholds after adjusting for the confusion noise. The nominal mag limits for GALWORKS are 15.50, 14.75 and 14.25, JHK respectively, where the mag refers to the "K" fiducial isophotal photometry.

Sensitivity (mag) Thresholds for Extended Source Processing

field	density	dmag	J	H	K
NAN	4.3	2.0	13.5	12.75	12.25
low5	4.1	1.4	14.1	13.40	12.80
mid10	3.8	0.9	14.6	13.85	13.40
mid20	3.35	0.3	15.2	14.45	14.00
mid30	2.90	0.1	15.4	14.65	14.15
abell262	<2.70	0.0	15.5	14.75	14.25
coma	<2.50	0.0	15.5	14.75	14.25
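The table entries follow from subtracting the confusion dmag from the nominal limits. A minimal sketch of that arithmetic is given below; the names `NOMINAL_LIMITS` and `adjusted_limits` are hypothetical, and the tabulated values differ from the plain subtraction by up to 0.05 mag in a few entries.

```python
# Nominal GALWORKS mag limits quoted in the text (J, H, K).
NOMINAL_LIMITS = {"J": 15.50, "H": 14.75, "K": 14.25}


def adjusted_limits(dmag):
    """Brighten each nominal limit by the confusion dmag."""
    return {band: round(limit - dmag, 2) for band, limit in NOMINAL_LIMITS.items()}


if __name__ == "__main__":
    # dmag values taken from the table above.
    for field, dmag in [("NAN", 2.0), ("low5", 1.4), ("mid10", 0.9),
                        ("mid20", 0.3), ("mid30", 0.1)]:
        print(field, adjusted_limits(dmag))
```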

Threshold Tuning: Case Study of "low5"

For low density fields where the level-1 specs apply, the "score" thresholds are tuned such that the internal completeness of the database (galaxy candidates) is maximized, with reliability improved only where that does not adversely affect completeness. The lev-1 completeness spec is >90% for low density fields (~glat > 30 deg) and 80% for mid-density fields (glat between ~10 and 30 deg). We therefore choose our score thresholds such that the internal completeness is at least 95 to 99% for low density and 90% for mid density. For mid-density and high density fields (see below) the critical score thresholds are on "msh" and "r23".

For high density fields, (~glat < 20), the "score" thresholds are tuned such that the total number of galaxy candidates is no more than 50 or so objects per scan. The critical score threshold is on "r23".

The following plot shows the "r23" score for galaxy candidates in the "low5" field, which has a very high source density. The objects include real galaxies (filled circles), double stars (red triangles) and triple stars (blue crosses). This score is the most robust discriminator between real extended sources and compact double and triple stars. The "msh" score works well with double stars, but is insensitive to triple stars. It does, however, function as the "preliminary" score threshold in the GALWORKS processor, so it is important to set this score as high as possible (to reduce runtime) without affecting completeness. Note: yellow crosses are sources that cannot be accurately classified (they tend to be at the faint limit of the survey).

"r23" vs isophotal mag for the "low5" field

One can see from the plot that double/triple stars are a major headache for galaxy detection and extraction. The best we can do is minimize the total number of these "false" galaxy candidates while preserving most of our brighter galaxies (brighter than the mag limit after adjusting for the confusion noise).

The source counts for the galaxy candidates found in the "low5" fields are given below. The first gif image shows the counts before any "tuning" of "r23" or "msh" is attempted. The important column to look at is the last one, entitled "GTOT/scan", which is the total number of sources per scan. The total reflects a "cumulative" count, including real galaxies, "bogies" (stars, doubles, triples), artifacts (pieces of trails), "unknown" objects and "no verify" objects. We desire no more than 50 objects per scan (6 degree scan) total up to the mag limit. For "low5" the mag limit is around 14.1, 13.4 and 12.8, JH & K, respectively. The 50 objects per scan number comes from runtime considerations (as well as disk space consideration).

Binned galaxy candidate counts; no tuning applied

Raising the threshold for "msh" and "r23" to values of 3 and 4, respectively, gives a satisfactory total number of objects per scan at the limit we desire:

Binned galaxy candidate counts; "light" tuning

Notice that the total number of sources per scan is about 50 at the mag thresholds. Note, however, that the reliability is still rather horrible.

"Moderate" tuning, with an r23 threshold of 4.0, cuts the total number of objects by nearly one half, with very little effect on the completeness:

Binned galaxy candidate counts; "moderate" tuning

Finally, setting r23 to 5.0 results in a completeness hit in the last relevant mag bin:

Binned galaxy candidate counts; "heavy" tuning

Note that the total counts are now down to 20 or so objects per scan brighter than the mag threshold limit.

Since we would like to err on the side of completeness, the "moderate" score thresholds will be used. After applying the confusion noise adjustment to the mag limits and employing the "moderate" score thresholds (msh = 3.0, r23 = 4.0), the resulting galaxy database counts are:

Binned galaxy candidate counts; final tuning

and the 2-D score plots are the following:
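The selection described in this case study can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Candidate` record and its values are hypothetical, a single band is used, and the choice of "greater than or equal to" for the score comparisons is a guess (the text only says that raising the thresholds removes sources). The "msh" test is placed first because the text describes it as the preliminary threshold.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical galaxy-candidate record (single band)."""
    mag: float  # isophotal magnitude
    msh: float  # "msh" score
    r23: float  # "r23" score


def passes(cand, mag_limit, msh_min=3.0, r23_min=4.0):
    """Apply the mag limit and the "moderate" score thresholds."""
    if cand.mag > mag_limit:   # fainter than the confusion-adjusted limit
        return False
    if cand.msh < msh_min:     # preliminary score cut
        return False
    return cand.r23 >= r23_min


if __name__ == "__main__":
    k_limit_low5 = 12.80  # "low5" K limit from the table
    # Made-up candidates, for illustration only.
    sample = [
        Candidate(12.1, 4.2, 5.5),
        Candidate(12.5, 2.0, 6.0),
        Candidate(12.6, 3.5, 3.0),
        Candidate(13.5, 5.0, 6.0),
    ]
    kept = [c for c in sample if passes(c, k_limit_low5)]
    print(len(kept), "of", len(sample), "candidates kept")
```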

North American Nebula (NAN)

As we go deeper into the plane of the Milky Way, the confusion noise rises accordingly (since the near-IR bands are relatively insensitive to extinction). The odds of finding a galaxy in any given scan are remote. We did, however, find one beautiful galaxy in the NAN field scans. The following plots show the scores for this galaxy along with all of the other junk (i.e., false galaxies) that GALWORKS extracts in one scan, as well as the galaxy candidate source counts. Note: there is one other possible galaxy (it is very difficult to classify, but it looks compelling nonetheless).

"r23" vs isophotal mag for the "NAN" field

"msh" vs isophotal mag for the "NAN" field

Binned galaxy candidate counts; no tuning applied

Using the "moderate" score thresholds determined for "low5" (which is at a slightly lower stellar number density) and the confusion noise adjusted mag limits of 13.5, 12.75, and 12.25, JHK, respectively, we arrive at the following source counts:

Binned galaxy candidate counts; "moderate" tuning applied

... And so we arrive at a satisfactory conclusion: the total number of sources per scan is less than 50 and our one beautiful galaxy is preserved. Hurray!

"mid10"

The "mid10" field has an average confusion noise of about 0.9 mag. False detections still dominate the total source counts. Since there is no lev-1 spec on reliability, we want to again be as complete as possible, while limiting the total number counts to less than 50 or so per scan.

"r23" vs isophotal mag for the "mid10" field

"msh" vs isophotal mag for the "mid10" field

Binned galaxy candidate counts; no tuning applied

The optimum tuned thresholds for this field are msh = 2 and r23 = 2.5.

Binned galaxy candidate counts; mid10 tuned

"mid20"

The "mid20" field has an average confusion noise of about 0.3 mag. Real galaxies now dominate the total source counts. Since there is no lev-1 spec on reliability, we want to again be as complete as possible, while limiting the total number counts to less than 50 or so per scan.

"r23" vs isophotal mag for the "mid20" field

"msh" vs isophotal mag for the "mid20" field

Binned galaxy candidate counts; no tuning applied

The optimum tuned thresholds for this field are msh = 2 and r23 = 1.5.

Binned galaxy candidate counts; mid20 tuned

"mid30"

The "mid30" field has an average confusion noise of about 0.1 mag. The level-1 specs now apply, with a reliability limit of 99% and a completeness limit of 90%. Since the 99% reliability can only be achieved with post-processing tricks (e.g., using OBDT blackboxes or N-cube classifications and a human in the loop to eliminate extraneous artifacts) our goal for the database is to achieve excellent completeness (at least 95%), while minimizing the impact of bogus sources as much as possible.

"r23" vs isophotal mag for the "mid30" field

"msh" vs isophotal mag for the "mi30" field

Binned galaxy candidate counts; no tuning applied

The optimum tuned thresholds for this field are msh = 1.5 and r23 = 1.5.

Binned galaxy candidate counts; mid30 tuned

To summarize the "tuning" results for confusion noise and score thresholding ("r23" and "msh"):

Sensitivity & Score Thresholds for Extended Source Processing

field	density	dmag	J	H	K	msh	r23
NAN	4.3	2.0	13.5	12.75	12.25	>3.0	>4.0
low5	4.1	1.4	14.1	13.40	12.80	3.0	4.0
mid10	3.8	0.9	14.6	13.85	13.40	2.0	2.5
mid20	3.35	0.3	15.2	14.45	14.00	2.0	1.5
mid30	2.90	0.1	15.4	14.65	14.15	1.5	1.5
coma	<2.50	0.0	15.5	14.75	14.25	1.5	1.0

The confusion noise, score thresholds and densities are also conveniently plotted below:

Confusion Noise & Score Thresholds vs. Density

Here the confusion noise is denoted by the white solid line, the "r23" thresholds by the blue crosses and the "msh" thresholds by the green filled circles. It turns out that a convenient "fit" to the thresholds is given by a scaling of the confusion mag. For "r23" the formula is:

and for the "msh" threshold the formula is:

msh_threshold = 1.5 + (confusion_dmag * 1.0)

The "fits" are shown with the green (msh) and blue (r23) dashed lines.
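As a quick check, the "msh" fit can be compared with the tuned values in the summary table. The sketch below assumes nothing beyond the formula and the table; the function and variable names are hypothetical. NAN is left out because its tuned value is only given as a lower bound (>3.0).

```python
def msh_threshold(confusion_dmag):
    """The "msh" fit quoted in the text."""
    return 1.5 + (confusion_dmag * 1.0)


# (field, confusion dmag, tuned msh threshold) from the summary table.
TUNED_MSH = [
    ("low5", 1.4, 3.0),
    ("mid10", 0.9, 2.0),
    ("mid20", 0.3, 2.0),
    ("mid30", 0.1, 1.5),
    ("coma", 0.0, 1.5),
]

if __name__ == "__main__":
    for field, dmag, tuned in TUNED_MSH:
        fit = msh_threshold(dmag)
        print(f"{field:6s} dmag={dmag:.1f} fit={fit:.2f} "
              f"tuned={tuned:.1f} diff={fit - tuned:+.2f}")
```

For these five fields the fit stays within about 0.4 of the hand-tuned thresholds, with the largest difference at "mid10".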

Runtime & Such

Runtime for Pre-Tuned and Tuned GALWORKS

field	density	scans	pre-tune real	pre-tune user	pre-tune dt/scan	tuned real	tuned user	tuned dt/scan	improve	tuned #stamps	tuned #/scan	short #/scan
ab262	2.70	15	720	448	30	564	482	32	0.94	1960	131	..
coma	2.50	5	283	170	34	183	156	31	1.10	818	164	127
herc	2.80	5	382	246	49	258	235	47	1.04	1350	270	193
mid30	2.90	9	461	291	32	310	290	32	1.00	1017	113	..
mid47	2.75	8	353	237	30	211	200	25	1.20	886	111	43
mid40	2.80	5	209	138	28	131	124	25	1.12	506	101	37
misc	3.05	6	333	206	34	182	151	25	1.36	562	94	..
mid20	3.35	4	326	200	50	190	181	45	1.11	590	148	77
mid10	3.80	6	903	655	109	623	376	63	1.73	257	43	..
maffei	3.75	3	400	290	97	305	201	69	1.41	n/a	n/a	..
low5	4.10	7	1640	1351	193	884	611	87	2.22	455	65	..
neb	4.20	1	282	226	226	104	49	49	4.62	n/a	n/a	..
nan	4.30	1	260	216	216	103	59	59	3.66	30	30	..
msx	4.50	1	264	209	209	100	50	50	4.18	13	13	..
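The derived columns are not defined in the text, but most rows are consistent with dt/scan being the "user" time divided by the number of scans (rounded), and "improve" being the ratio of the pre-tune to the tuned dt/scan. The sketch below checks that reading against a few rows; it is an inference from the numbers, not a documented definition, and the function names are hypothetical.

```python
def dt_per_scan(user_time, scans):
    """Assumed definition: user time per scan, rounded to an integer."""
    return round(user_time / scans)


def improvement(pre_user, tuned_user, scans):
    """Assumed definition: pre-tune dt/scan over tuned dt/scan."""
    return dt_per_scan(pre_user, scans) / dt_per_scan(tuned_user, scans)


if __name__ == "__main__":
    # (field, scans, pre-tune user, tuned user, tabulated improve)
    rows = [
        ("ab262", 15, 448, 482, 0.94),
        ("low5", 7, 1351, 611, 2.22),
        ("msx", 1, 209, 50, 4.18),
    ]
    for field, scans, pre, tuned, tabulated in rows:
        print(field, round(improvement(pre, tuned, scans), 2), tabulated)
```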

Notes:

Final Note: the runtime can be improved by 13 to 19% by eliminating Algor II (LCSB detector) from GALWORKS. There is no improvement for high source density fields (because Algor II does not run for these fields by default).