Summary

This plugin was created to streamline the generation of image chips with "labels" to be fed into machine learning (ML) algorithms. In order to facilitate robust training, we want to generate a large number of image chips across a wide range of acquisition parameters. Those include:

  • Different targets and/or variants of targets

  • Different backgrounds (the context within which the target appears)

  • Different ground sampling distances (GSDs)

  • Different sensor view angles (zenith and azimuth)

  • Different illumination angles (zenith and azimuth)

Historically this has been accomplished using external scripting with a conventional DIRSIG simulation. The primary goal of this plugin is to make it easy to configure all the degrees of freedom in one location and have the plugin manage the creation of the images.

Assumptions and Simplifications

This approach makes several assumptions and employs simplifications in how it models some elements of the simulation. Most of these choices were made in light of what training and test data for ML algorithms look like. Specifically, most ML workflows employ 8-bit and/or 24-bit images, and various physical parameters of the sensor, scene, atmosphere, etc. are generally irrelevant. For example, the algorithm isn’t aware of the size of the pixels on the focal plane or the effective focal length, but it is aware of the GSD of the images. Likewise, the algorithm isn’t explicitly aware of a hazy maritime atmosphere vs. a clear desert atmosphere, but it is aware that some images have lower contrast and some have higher contrast. Given how these images are generally used in ML workflows, many of the approaches employed in this plugin have been simplified to streamline the setup of these simulations.

Camera Modeling

The modeling of the camera has been simplified to avoid requiring detailed system specifications that are largely irrelevant in the context in which the output images are used. For example, the user defines the GSD directly rather than the physical size of the pixel elements on the focal plane and an effective focal length. As a result, the object to image plane projection is orthographic rather than perspective. Because the final imagery (the PNG, JPEG, etc. images used with the ML algorithm) won’t have physical units, it is not important to have detailed spectral response functions for each channel. Hence, the definition of spectral channels is limited to a simple bandpass defined by a lower and upper wavelength, and the response is assumed to be uniform across that bandpass. There are options to incorporate the effective point-spread function (PSF) of the system, but that PSF is currently assumed to be constant across all channels.
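
For reference, a conventional perspective camera description would derive the GSD from system parameters (roughly, for a nadir view, GSD ≈ pixel pitch × altitude / focal length); this plugin bypasses that relationship and treats the GSD itself as the input.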

Atmospheric Modeling

The ChipMaker plugin in DIRSIG5 is technically a combo plugin because it binds to both the sensor API (to drive the image formation) and the atmosphere API (to drive the source direction, direct illumination and diffuse illumination). The atmospheric modeling provided by this plugin is not physics-driven (for example, computed by MODTRAN), but rather a simple analytical model. The total irradiance from the hemisphere is spectrally constant and partitioned between direct and diffuse components. There is no path scattering or path transmission between the sensor and the scene. For a physics-based, remote sensing simulation tool this may seem like an inappropriate simplification of the real world. However, utilizing a physics-based atmosphere model (for example, MODTRAN) would entail an enormous amount of computation since every chip would involve a unique view and illumination geometry. At this time, the reality is that calibrated images are rarely used to train ML algorithms, and ML algorithms are rarely supplied calibrated images to analyze. Hence, it doesn’t matter whether the exact transmission and scattering are modeled, because the algorithms are typically working with 8-bit, 24-bit, etc. images where the impacts of path transmission and scattering manifest as relative contrast differences in the image. Therefore, the approach here is to capture the multiplicative transmission loss and additive scattering gain in the conversion from output radiance to integer count images. For example, a hazy atmosphere (high scattering, low transmission) can be emulated as a linear radiance to counts scaling that has a lower gain and higher bias when compared to a clearer atmosphere.
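
As a simple illustration of this idea (the specific numbers here are notional, not plugin defaults), the radiance to counts conversion can be thought of as a linear mapping of the form

counts = gain * radiance + bias

where a clear atmosphere might be emulated with, say, gain = 0.9 and bias = 10 counts, while a hazy atmosphere might use gain = 0.6 and bias = 40 counts, compressing the apparent contrast of the resulting 8-bit or 24-bit image.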

Input

The input file for the plugin is a JSON formatted file. An example file is shown below and will be discussed section by section. See the ChipMaker1 demo for a working example.

{
    "camera" : {
        "image_size" : {
            "x" : 128,
            "y" : 128
        },
        "gsd_range" : {
            "minimum" : 0.05,
            "maximum" : 0.10
        },
        "channellist" : [
            {
                "name" : "Red",
                "minimum" : 0.6,
                "maximum" : 0.7
            },
            {
                "name" : "Green",
                "minimum" : 0.5,
                "maximum" : 0.6
            },
            {
                "name" : "Blue",
                "minimum" : 0.4,
                "maximum" : 0.5
            }
        ],
        "readout" : {
            "frame_time" : 1e-03,
            "integration_time" : 1e-04
        },
        "psf" : {
            "image" : "circle_psf.png",
            "scale" : 10.0
        },
        "image_filename" : {
            "basename" : "chip",
            "extension" : "img"
        },
        "truth" : [
            "scene_x", "scene_y", "scene_z", "geometry_index"
        ]
    },
    "time_range" : {
        "minimum" : 0,
        "maximum" : 0
    },
    "view" : {
        "zenith_range" : {
            "minimum" : 5,
            "maximum" : 40
        },
        "azimuth_range" : {
            "minimum" : 0,
            "maximum" : 360
        }
    },
    "source" : {
        "zenith_range" : {
            "minimum" : 5,
            "maximum" : 40
        },
        "azimuth_range" : {
            "minimum" : 0,
            "maximum" : 360
        }
    },
    "setup" : {
        "random_seed" : 54321,
        "target_tags" : [ "box", "sphere" ],
        "options" : [ "with_and_without" ],
        "count" : 100,
        "report_filename" : "labels.txt"
    }
}

Camera

The camera description utilizes parameters that are image-centric rather than camera-centric. That means that rather than specifying the physical size of the pixels in the array, an effective focal length, etc., the user specifies the dimensions of the image and the GSD. The camera is currently modeled as an ortho camera to avoid camera-specific distortions that are beyond the scope of the camera model.

image_size

The size of the image frames to be generated in x (width) and y (height).

gsd_range

The user can (optionally) provide a range of GSDs to model. If the user wants all the images to have the same GSD, then set the minimum and maximum to the same value. If this range is not provided, the plugin will automatically compute the GSD so that each target fits within the image.
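
For example, to force every chip to use the same GSD, the range can be collapsed to a single value (shown here using the upper value from the example above):

"gsd_range" : {
    "minimum" : 0.10,
    "maximum" : 0.10
}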

channellist

The user can specify a set of channels to be modeled by the sensor. The channels are assumed to have simple uniform responses across the spectral bandpass defined by the minimum and maximum variables. The name variable specifies the name that will be used for the corresponding band in the output image.
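
For example, a single broad channel spanning the visible region could be described with one entry (the wavelength limits follow the same convention as the example above):

"channellist" : [
    {
        "name" : "Pan",
        "minimum" : 0.4,
        "maximum" : 0.7
    }
]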

image_filename

The user specifies the file "basename" and "extension" and the simulation will write images to files using a basenameX.extension naming pattern, where X is the index of the chip (for example, chip0.img, chip1.img, etc.).

readout

The pixels can be integrated using either a global shutter, where all pixels are integrated synchronously and then read out, or asynchronously in a line-by-line manner to emulate either a rolling shutter or a pushbroom scanning sensor. The global (synchronous) integration method is the default, and the integration_time is the duration that every pixel is integrated for. To enable the line-by-line (asynchronous) integration method, the frame_time must be set, and the line-to-line delay is assumed to be the frame time divided by the number of lines. In this case, the integration_time is the duration that each line of pixels is integrated for.
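
For example (a minimal sketch based on the description above), a global shutter readout presumably only needs the integration_time:

"readout" : {
    "integration_time" : 1e-04
}

Adding "frame_time" : 1e-03 (as in the full example above) enables the line-by-line mode; for a 128-line image that implies a line-to-line delay of roughly 1e-03 / 128 ≈ 7.8e-06 seconds.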

truth

The user can optionally request truth for each image. This will be output as additional bands in the image files.

psf

The user can optionally describe the point spread function (PSF) of the system using an image file. The image variable is used to supply the name of the file containing the PSF image (PNG, JPEG, TIFF, GIF). Because the contribution area described in the PSF image is usually much larger than the pixel, the scale variable is used to describe the width of that image in pixel units.

Time

Scenes that contain motion (moving objects) can be sampled as a function of time, which allows the moving objects to be imaged in different locations and/or orientations (as defined by their respective motion). The range of sample times is defined in the time_range section of the input. The minimum and maximum times are relative and in seconds.
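
For example, to allow each chip to be captured at a time sampled somewhere within the first 30 seconds of scene motion (a simple variation of the static example above):

"time_range" : {
    "minimum" : 0,
    "maximum" : 30
}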

View

The range of view directions for the camera is defined in the view section of the input. The zenith (declination from nadir) and azimuth (CW East of North) are supplied as minimum and maximum pairs. These angles are in degrees.

The optional offset_range will introduce a spatial offset of the target within the image. The range is used to generate a random XY offset. The values are in meters.
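
The offset_range does not appear in the full example above; a hypothetical entry is sketched below, assuming it follows the same minimum/maximum convention as the other ranges (the actual key layout should be confirmed against the plugin schema):

"view" : {
    "zenith_range" : {
        "minimum" : 5,
        "maximum" : 40
    },
    "azimuth_range" : {
        "minimum" : 0,
        "maximum" : 360
    },
    "offset_range" : {
        "minimum" : -1.0,
        "maximum" : 1.0
    }
}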

Source

The direction of the source (sun) in relation to the target is defined in the source section of the input. The zenith (declination from nadir) and azimuth (CW East of North) are supplied as minimum and maximum pairs. These angles are in degrees.

Setup

The setup section of the file specifies the overall setup of the simulation to be performed, including the specification of which targets to sample, the number of images to be generated and the name of the file containing key label information.

target_tags

The list of tags used to select the targets in the scene to be imaged.

count

The number of image chips to generate.

random_seed

The random set of targets, view directions, source directions, etc. can be expected to change from simulation to simulation because the seed for the random number generator that drives these random parameters is different for each execution. If the user desires the ability to reproduce a specific simulation, they can supply the random_seed variable to fix the seed so that it won’t change between runs.

options

There are several options related to how the simulation runs. See below for more detail.

report_filename

The ASCII/text report that describes the target, view angles, illumination angles, GSD, etc. for each image chip is written to the filename provided by this variable.

Options

The following options control how the simulation is performed.

hide_others

This option will cause the simulation to hide all the other targets in the selection set while the chip for a given target is being generated. In the example above, the selection set includes anything that has either the "box" or "sphere" tag. Therefore each chip will be centered on a "box" or "sphere". With this option included, all other "box" and "sphere" objects will be hidden except for the one being imaged. Note that "cylinders" (not included in the example tag set) will neither be a chip target nor be hidden when imaging any of the "box" or "sphere" targets.
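
For example, enabling this behavior only requires adding the option to the options list in the setup section:

"options" : [ "hide_others" ]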

with_and_without

This option will cause the simulation to produce A/B image pairs with and without the current target present. If N chips are requested (see the count variable in the setup), 2N images are produced; for example, chip0a.img (contains the target) and chip0b.img (same parameters, but without the target).

rerun_from_report

This option allows the user to reproduce a set of images using the output label report (see the report_filename variable in the setup) from a previous simulation. When using this mode, rather than choosing a random target, random view, etc., the plugin will use the parameters (target index, time, GSD, source angles, etc.) from the report file. Note that if the scene changes (specifically, if new targets are added), then the output image set will be different.
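
A sketch of a rerun configuration is shown below, under the assumption that the report produced by the earlier run is referenced via the existing report_filename variable:

"setup" : {
    "target_tags" : [ "box", "sphere" ],
    "options" : [ "rerun_from_report" ],
    "count" : 100,
    "report_filename" : "labels.txt"
}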

Note The with_and_without and rerun_from_report options cannot be combined at this time.

Usage

To use the ChipMaker plugin in DIRSIG5, the user must use the newer JSON formatted simulation input file (referred to as a JSIM file, with a .jsim file extension). At this time, these files are hand-crafted (no graphical editor is available). An example is shown below:

[{
    "scene_list" : [
        { "inputs" : "./demo.scene" }
    ],
    "plugin_list" : [
        {
            "name" : "ChipMaker",
            "inputs" : {
                "input_filename" : "./chips.json"
            }
        }
    ]
}]

The ChipMaker1 demo contains a working example of this plugin.