NOTICE: This software (or technical data) was produced for the U.S. Government under contract, and is subject to the Rights in Data-General Clause 52.227-14, Alt. IV (DEC 2007). Copyright 2023 The MITRE Corporation. All Rights Reserved.

Overview

OpenMPF provides a Markup component that can be used to draw bounding boxes and labels on images and videos. The component provides one task called OCV GENERIC MARKUP TASK that can be added to the end of any image and/or video pipeline. By default, many other OpenMPF components provide * (WITH MARKUP) pipelines that use this task. Note that the Markup component will not appear in the list of components in the Component Registration web UI because it's a core feature of OpenMPF.

Configuration

The following properties can be set as job properties or algorithm properties on the MARKUPCV algorithm. Also, the default values can be changed by setting the system property listed for each:

Video Markup Icons

| Icon | Meaning | Setting |
| --- | --- | --- |
| Exemplar Icon | Track exemplar | MARKUP_VIDEO_EXEMPLAR_ICONS_ENABLED |
| Motion Icon | Track or detection is moving | MARKUP_VIDEO_MOVING_OBJECT_ICONS_ENABLED |
| Stationary Icon | Track or detection is stationary | MARKUP_VIDEO_MOVING_OBJECT_ICONS_ENABLED |
| Detection Algorithm Icon | Detection is the direct result of a component detection algorithm | MARKUP_VIDEO_BOX_SOURCE_ICONS_ENABLED |
| Tracking Filled Gap Icon | Detection is the result of a component performing tracking in an attempt to fill in the gaps between algorithm detections | MARKUP_VIDEO_BOX_SOURCE_ICONS_ENABLED |
| Animation Icon | Detection is the result of the Workflow Manager interpolating (animating) the size and position of the bounding box to fill gaps between detections in the track | MARKUP_VIDEO_BOX_SOURCE_ICONS_ENABLED |
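As a rough illustration of enabling the three icon settings as job properties, the sketch below builds a request body as a plain dictionary and prints it as JSON. The property names come from the table above. Everything else is an assumption made for illustration: the `build_job_request` helper, the field names `pipelineName`, `media`, `mediaUri`, and `jobProperties`, the placeholder pipeline name, the placeholder media path, and the choice to encode values as the string `"true"`.

```python
import json

# Property names are taken from the icon table above. Encoding the values
# as strings is an assumption, not something this guide specifies.
MARKUP_ICON_PROPERTIES = {
    "MARKUP_VIDEO_EXEMPLAR_ICONS_ENABLED": "true",
    "MARKUP_VIDEO_MOVING_OBJECT_ICONS_ENABLED": "true",
    "MARKUP_VIDEO_BOX_SOURCE_ICONS_ENABLED": "true",
}


def build_job_request(media_uri, pipeline_name, job_properties):
    """Assemble a hypothetical job request body as a plain dict.

    The field names used here are illustrative assumptions, not a
    confirmed request schema.
    """
    return {
        "pipelineName": pipeline_name,
        "media": [{"mediaUri": media_uri}],
        "jobProperties": dict(job_properties),  # copy so callers can reuse the input
    }


request = build_job_request(
    "file:///path/to/video.mp4",                 # placeholder media location
    "EXAMPLE DETECTION (WITH MARKUP) PIPELINE",  # placeholder pipeline name
    MARKUP_ICON_PROPERTIES,
)
print(json.dumps(request, indent=2))
```

Note that the motion and stationary icons share one setting, so they are enabled or disabled together, as are the three bounding box source icons.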

Video Markup Examples

[Image: Walking No Border Exemplar]

Above we show frame 94 of a marked up video. Frame numbers are enabled, so the frame number is shown in the top-right corner. Exemplar icons are enabled, and since this detection is the track exemplar, a star icon is shown in the label. Also, the label shows the track's CLASSIFICATION property followed by the track confidence. All of the examples shown in this section display track-level information because MARKUP_LABELS_FROM_DETECTIONS=false. The circle marks the top-left corner of the detection. See the C++ Batch Component API documentation for more information on flip and rotation.

[Image: Walking With Border Algorithm Detection]

Above we show frame 25 of the marked up video. This time we configured markup to show a black border around the video frame, which is useful when the label extends beyond the edge of the original video frame, as shown here. We also enabled icons indicating whether the track is moving or stationary: the fast-forward icon at the start of the label indicates that this track is moving. Finally, we enabled icons indicating the bounding box source: the magnifying glass icon after the fast-forward icon indicates that this detection is a direct result of the component's detection algorithm. Note that the magnifying glass icon will be replaced with the star icon for exemplars.

[Image: Walking With Border Animation]

The frame above shows a movie camera icon to indicate that the detection is the result of the Workflow Manager (WFM) interpolating (animating) the size and position of the bounding box to fill gaps between detections in the track. Considering how blurry the person appears in this frame, it's not surprising that the algorithm could not detect him. If you perform a job with FRAME_INTERVAL greater than one, or otherwise perform frame skipping, then all bounding boxes in skipped frames will be the result of WFM animation. Note that the classification and confidence values are simply carried over from the last detection that was not the result of WFM animation.
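To make the idea of animating a bounding box concrete, here is a minimal sketch that fills a gap between two detections by linear interpolation. Linear interpolation is an assumption; this guide does not state which method the WFM actually uses. The `Box` type and the `interpolate_box` function are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical bounding box: top-left corner plus size, in pixels."""
    x: float
    y: float
    width: float
    height: float


def interpolate_box(start_frame, start_box, end_frame, end_box, frame):
    """Linearly interpolate a box for a frame strictly between two detections."""
    if not start_frame < frame < end_frame:
        raise ValueError("frame must lie strictly between the two detections")
    # Fraction of the way from the first detection to the second.
    t = (frame - start_frame) / (end_frame - start_frame)

    def lerp(a, b):
        return a + t * (b - a)

    return Box(
        lerp(start_box.x, end_box.x),
        lerp(start_box.y, end_box.y),
        lerp(start_box.width, end_box.width),
        lerp(start_box.height, end_box.height),
    )


# Detections exist at frames 10 and 14; frames 11 to 13 form the gap.
first = Box(100, 50, 40, 80)
last = Box(140, 50, 48, 96)
for frame in range(11, 14):
    print(frame, interpolate_box(10, first, 14, last, frame))
```

Under this assumption, a job run with FRAME_INTERVAL set to 4 would have real detections only on every fourth frame, and the three frames in between would each receive a box computed this way.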

[Image: Walking With Border Filled Gap]

The frame above shows a paper clip icon to indicate that the detection is the result of the component performing tracking in an attempt to fill in the gaps between algorithm detections. In general, these detections are more trustworthy than the ones resulting from WFM animation, but not as trustworthy as the ones directly resulting from the detection algorithm.

[Image: Walking With Border Skis]

The frame above shows the person detection in addition to a new skis detection. The confidence for the latter is lower, which is good considering the algorithm misclassified the person's shadow as skis. The skis track is only a few frames long, so the WFM determined it was a non-moving (stationary) track. This is represented by the anchor icon at the start of the label. Also, notice that the labels are semi-transparent. This allows you to read labels and see frame content that would otherwise be hidden if the labels were completely opaque. Note that you may want to set MARKUP_LABELS_ALPHA to 0.75 or greater when using the mjpeg encoder.
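The effect of the alpha value can be sketched with the standard "over" blending formula, where each output channel is `alpha * label + (1 - alpha) * background`. That the Markup component blends labels exactly this way is an assumption, as are the example colors and the helper names `blend_channel` and `blend_pixel`.

```python
def blend_channel(label, background, alpha):
    """Blend one 0-255 color channel of the label over the frame content."""
    return round(alpha * label + (1 - alpha) * background)


def blend_pixel(label_rgb, background_rgb, alpha):
    """Blend an RGB label color over an RGB background color."""
    return tuple(
        blend_channel(l, b, alpha) for l, b in zip(label_rgb, background_rgb)
    )


# Illustrative colors only: a white label element over a dark frame region.
label_color = (255, 255, 255)
dark_background = (15, 15, 15)

for alpha in (0.5, 0.75, 1.0):
    print(alpha, blend_pixel(label_color, dark_background, alpha))
```

With these example colors, a higher alpha keeps the blended result closer to the label's own color, which is one plausible reason a value of 0.75 or greater reads better than 0.5 over dark content.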

Video Encoder Considerations

Performing markup on an image will always generate a .png file. Performing markup on a video will generate a video file based on the value of MARKUP_VIDEO_ENCODER. The vp9, h264, and mjpeg encoders are supported.

The vp9 and h264 encoders serve the same purpose in that both formats can be played in the WFM web UI in most web browsers, while the .avi files resulting from the mjpeg format must be downloaded and played using a separate program like VLC or mpv. In general, h264 encoding is much faster than vp9 encoding, so you may want to use it instead of vp9. Please be aware that you may be required to pay Usage Royalties when using the h264 encoder for commercial purposes.

The mjpeg encoder is faster than both the vp9 and h264 encoders, but produces lower quality video. Specifically, the label text is not as clear. You may want to use it when developing components or marking up large video files.

To give you a sense of performance, here are the results of a very limited batch of tests. Note that if you choose to use the vp9 encoder, you can increase the CRF value to reduce processing time at the cost of reduced video quality.

input media: 23 frames @ 3840x2160:

| Encoder | CRF | Time (secs) | Notes |
| --- | --- | --- | --- |
| mjpeg | | 6.94 | |
| h264 | | 9.478 | |
| vp9 | 60 | 13.194 | |
| vp9 | 31 | 21.431 | |

input media: 509 frames @ 640x480:

| Encoder | CRF | Time (secs) | Notes |
| --- | --- | --- | --- |
| mjpeg | | 6.927 | alpha 0.5 is hard to read when blended with dark background; 0.75 does better |
| h264 | | 11.259 | |
| vp9 | 60 | 35.945 | text not acceptable due to low resolution |
| vp9 | 31 | 52.178 | |
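To compare the encoders on a common footing, the following sketch converts the timings above into frames per second and into a slowdown factor relative to mjpeg. The numbers are copied from the two tables; the function names and the choice of mjpeg as the baseline are illustrative assumptions, and these few runs should not be read as a general benchmark.

```python
# Timings copied from the two tables above: (frame count, {encoder: seconds}).
BENCHMARKS = {
    "23 frames @ 3840x2160": (23, {
        "mjpeg": 6.94,
        "h264": 9.478,
        "vp9 (CRF 60)": 13.194,
        "vp9 (CRF 31)": 21.431,
    }),
    "509 frames @ 640x480": (509, {
        "mjpeg": 6.927,
        "h264": 11.259,
        "vp9 (CRF 60)": 35.945,
        "vp9 (CRF 31)": 52.178,
    }),
}


def frames_per_second(frame_count, seconds):
    """Encoding throughput for one run."""
    return frame_count / seconds


def slowdown(seconds, baseline_seconds):
    """How many times longer a run took than the baseline run."""
    return seconds / baseline_seconds


def summarize(frame_count, timings, baseline="mjpeg"):
    """Map each encoder to (frames per second, slowdown vs. baseline)."""
    base = timings[baseline]
    return {
        name: (frames_per_second(frame_count, secs), slowdown(secs, base))
        for name, secs in timings.items()
    }


for label, (frame_count, timings) in BENCHMARKS.items():
    print(label)
    for name, (fps, factor) in summarize(frame_count, timings).items():
        print(f"  {name:<14} {fps:8.2f} frames/s  {factor:5.2f}x mjpeg time")
```

In both runs the ordering is the same: mjpeg is fastest, then h264, then vp9 at CRF 60, then vp9 at CRF 31.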