
To: MSI/NIS Science Team
From: Jeff Warren (warren@msss.com), Mike Malin (malin@msss.com)
Subject: Shot Noise Limiting Companding for MSI
Date: 8 February 1995
Introduction
Two 12-to-8-bit companding algorithms have been suggested for MSI: one
uses a simple linear mapping between 4095 and 255; the other attempts
to minimize the error introduced by the coarser bins when attempting
to reproduce the original 12-bit values from compressed data (i.e.,
minimizing the percentage difference between the original
12-bit input and a 12-bit value reconstructed from an 8-bit value).
One point that is overlooked in these algorithms is the impact of
nonlinear noise in the original 12 bit signal; that is, the photon
counting, statistical or "shot" noise. Since shot noise increases as
the square root of the signal, higher signal data have larger noise
levels (although they also have higher signal-to-noise ratios).
Linear, or lin-log companding, leads to an oversampling noisy data
while unnecessarily throwing away data that could otherwise be
resolved. This memorandum describes an algorithm for computing a
lookup table to optimally compand "noisy" 12 bit data to 8 bits while
introducing a minimal amount of quantization noise.
Noise Characteristics
For the purposes of this memo, a CCD will be considered a device for
counting electrons. A perfect device would be able to count the
electrons in the input signal and produce an exact replica as its
output. However, experiments that count items over an
interval of time (e.g., photons incident upon a CCD during a 100 ms
exposure) are subject to measurement fluctuations that result because
random samples of events distributed randomly in time contain numbers
of events that fluctuate from sample to sample. The likelihood of
observing any given number of counts is given by the Poisson
probability function, where the uncertainty is proportional to the
square root of the mean value.
In attempting to invert the observed count to infer the original
signal, the counting uncertainty introduces an uncertainty in the
reconstructed value. Table 1 shows an example of how uncertainty in
the counting statistics can affect the output signal. Note that in
Table 1 not only are the output values incorrect, they actually trend
in the direction opposite of the input signal. Since this attribute
of the data is independent of the efficiency of the device, it sets a
limit on the precision with which any measurement should be made. Any
additional precision is wasted, as one is then simply measuring
"noise."
Input Signal Max Counting error/Noise Example noise Output Signal
100 sqrt(100) = 10 +5 105
103 sqrt(103)=10.1489 -5 98
TABLE 1
Figure 1 shows in graphical form the "one-sigma" uncertainties
associated with a portion of a CCD's total range. Note the
significant overlap in the possible output values. The regions where
the uncertainty bars overlap represent output values that cannot be
mapped to distinguishable input values. In this example, an output
value of 100 has a finite probability of having been generated by an
original value as low as 91 and as high as 110. Even lower (and
higher) original values may also have been counted as 100, with lower
probability.

If two values in the output signal have uncertainty values that do not
overlap, the input signals that generated them are considered
discernible from one another, on the basis of statistical arguments
that values that differ by two-sigma are discriminable. Figure 1
illustrates this convention: consecutive or adjacent output signals
that vary from, for example, 90 to 115, are more likely to have been
caused by a real change in the input signal, while those that vary
from 92 to 108 are consistent with a constant input signal of 100 with
only a variation in the random noise. If one does not wish to retain
and quantify the high frequency variation in the output signal that
results from this statistical variation, the output signal can be
resampled to a coarser resolution which increases the probability that
transitions in the output signal represent real changes in the input
signal. Since this is a coarser sampling of the original output data,
a smaller number of bits can be used to represent this new output
signal.
Noise Limiting Quantization
A step function has been developed that resamples the original count
space into bins whose centers have a constant probability of being
discriminated from adjacent bins. Figure 2 shows graphically what the
resampling algorithm achieves. Stated explicitly, if the sqrt(N)
distribution about the input value represents a 1 sigma probability,
then this requantization ensures that the centers of adjacent bins
represent real changes in the input signal to a probability of 2
sigma.

The first bin is chosen such that its maximum value equals the "full
well" (the maximum possible number of counts) of the detector. For
the purposes of this discussion, it has been assumed that the
full-well count of the detector is equal to 500,000 electrons.
Other full-well values will give rise to different values for
resampling, as discussed at the end of this memo. For this example,
the center of the first bin (represented by No) is chosen
such that:
No + sqrt(No) = 500000
To solve this equation for No, a change of variables is
performed:
X = sqrt(No)
This yields the equation:
X^2 + X - 500000 = 0
This is simply a quadratic whose roots are given by:
X = { -1 +/- sqrt[1 + (4 * 500000)] } / 2
Having solved for X, No can now be computed as:
No = X^2 = 499293.393042
The next bin is then chosen such that its maximum value matches the
the minimum value of the first bin. The minimum value of the first
bin is given by:
k = No - sqrt(No) = 499293.393042 -
706.606958 = 498586.786084
The equation for the center of the next bin (represented by
N1) is then given by:
N1 + sqrt(N1) = k
This is the same equation used above to generate the first bin (when k
was equal to 500000). Thus, the center of each of bin can be computed
using the following equation:
X = { -1 +/- sqrt[1 + (4 * k)] } / 2
N = X^2
where k is initially 500000, and then subsequently set to X - sqrt(X)
for the next iteration.
Up to this point, this calculation has assumed that the CCD produces
counts at a resolution of 1 electron. Owing to limitations in
analog-to-digital conversion, the CCD's signal is actually quantized
into 4096 distinct levels. Assuming that the conversion from
electron counts to DN follows a linear relationship, and maps the
full-well value to the full 12 bit range, the actual signal
produced by the CCD is given by:
DN = N / S
DN = CCD output
N = original electron count
S = scaling constant = 4096 / 500000 = 122.0703
Values for N produced by the previous equations should be converted to
DN values using the equation above. As values for N become smaller,
multiple bin centers start mapping to the same value in DN space. At
this point, signal differences between the centers of DN values are
guaranteed to be discernible from one another and therefore no
quantization is necessary for smaller DN values.
Listing 1 shows the C code that implements the
algorithm just described. This program produces 677 bins in the
original 12 bit DN space that represent a noise limited approximation
to the original input signal. Table 2 shows the
results obtained by running this program.
At this point, the algorithm has requantized the dynamic range from
4096 levels (12 bits) to 677 levels (10 bits), preserving within the
data the ability to discriminate any two values with a confidence of
greater than 2-sigma.
Additional Quantization To 8 Bit Space
Owing to the nature of the MSI, the desired companding algorithm is
one that reduces the original 12 bits down to 8 bits using a single
lookup table. This can be achieved by sampling the 677 levels
previously determined at 256 regular intervals. Listing 2 shows the C code that implements this
sampling. In implementation, the order of the original 677 entry
table is reversed so that the first entry corresponds to the lowest
signal and the last entry corresponds to the highest signal. The
table is then divided into 256 bins using linear interpolation between
values in the original table to compute the values for the sampled
points. The reconstructed DN value for the center of each bin is
computed from the average of the bin's extents. Table 3 lists the output produced by this algorithm.
Columns 1 and 4 of this table can be used as the lookup table to
expand the data from 8 bits back to 12 bits. This table is easily
inverted to obtain the lookup table for converting from 12 bits to 8
bits. Figure 3 shows the behavior of this companding algorithm as a
plot of the 8 bit values against the 12 bit values. For comparison,
the lin-log companding table proposed by APL is also shown.
Implementation Issues
It has been assumed that MSI is a fixed gain camera, that is,
that it converts the electron count to DN space by dividing by a
constant (full-well value / 4096 DN). Increases in exposure
time result in more electrons accumulating on the detector prior to it
being read out, but the size of the 4096 bins is a constant, and such
exposure variations only change which part of this linear space is
being used. Following the division, the DN values are forced to lie
in the range of 0 to 4095. This means that 0 electrons maps to
exactly 0 DN and the full-well value maps to exactly 4096 DN, which is
then clipped to 4095. If the scale factor for converting from
electron count to DN space is different from the one assumed
throughout this memo (i.e., if it isn't 500,000 electrons), then it
should be a simple matter to change the scale factor given in Listing 1. If the analog to digital conversion is
nonlinear, then the scaling equation in Listing
1 will need to reflect this more complicated function. In either
case, these changes should only affect the values in columns 3, 4 and
5 of table 2. After making these changes, the
quantization to 8 bit space should be performed the same way as
described to produce new values for Table 3.