External Data Formats

NNTC Log Files

This is the standard set of files logged by Teledyne FLIR’s computer vision library NNTC.

./outputs/2022-10-13_15-50-46_0xb32
├── config.json
├── detections.json
├── nntc_config_expanded.json
├── nntc.log
├── run_command.txt
├── tracks.json
└── version.json

config.json (alternatively: config.json.aes)
    A flattened ($ref objects resolved) copy of the configuration used to run
    NNTC. The file is encrypted and carries the .aes extension if the input
    configuration is encrypted.

nntc_config_expanded.json (alternatively: nntc_config_expanded.json.aes)
    Similar to config.json, except additional preprocessing is applied to
    prepare values into a usable state for NNTC (defaults applied and values
    cast to their proper types).

nntc.log
    Raw log output (the same text that is printed to the screen).

run_command.txt
    The command that was run, with all input arguments.

version.json
    Information about application and model versions, for record keeping.

detections.json
    Output of the CNN, logged in the Formatted Detections format. The name of
    this file is set by the input argument --log_detections=detections.json.

tracks.json
    Output of the tracker, in the Formatted Detections format. The name of
    this file is set by the input argument --log_tracks=tracks.json.

Formatted Detections

The Teledyne FLIR Formatted Detections format holds a textual representation of bounding box output (detections and tracks) using JSON syntax. It is one of the key formats used by the NNTC computer vision library. Formatted Detections are derived from the COCO format, but have some extensions (e.g. the addition of “track_id” and “image_filepath”) and changes (e.g. “image_id” is not required if the file path is present).

Minimal Prototype (Only Required Fields)

[
    {
        "area"          : int,
        "bbox"          : [x,y,width,height],
        "category_id"   : int,
        "score"         : float,
        "image_filepath": String
    }
]

Detailed Prototype

[
    {
        "area"                  : int,
        "bbox"                  : [x,y,width,height],
        "category_id"           : int,
        "score"                 : float,
        "object_class_label"    : String,
        "image_filepath"        : String,
        "image_id"              : int,
        "frame_number"          : int,
        "det_time"              : int,
        "net_time"              : int,
        "timestamp"             : int,
        "track_id"              : int,
        "fine_grained"          : [
            {
              "characteristic_1"    : int_or_something_else
            },
            {
              "characteristic_2"    : int_or_something_else
            },
            ...
        ]
    }
]
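
As a quick sanity check, here is a short Python sketch that loads a Formatted Detections file and verifies the required fields described above. The file name detections.json matches the NNTC output; the helper name is illustrative.

import json

# Required keys per the minimal prototype; "image_filepath" may be replaced
# by "image_id" (see Top-Level Fields below).
REQUIRED_KEYS = {"area", "bbox", "category_id", "score"}

def load_formatted_detections(path):
    """Load a Formatted Detections file and check its required fields."""
    with open(path) as f:
        detections = json.load(f)
    for i, det in enumerate(detections):
        missing = REQUIRED_KEYS - det.keys()
        if missing:
            raise ValueError(f"detection {i} is missing fields: {missing}")
        if "image_filepath" not in det and "image_id" not in det:
            raise ValueError(f"detection {i} needs image_filepath or image_id")
        x, y, w, h = det["bbox"]  # raises if bbox is not [x,y,width,height]
    return detections

dets = load_formatted_detections("detections.json")
print(f"loaded {len(dets)} detections")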

Full Examples

NNTC Detection Output
[
    {
        "area": 2291,
        "bbox": [
            245,
            466,
            29,
            79
        ],
        "category_id": 1,
        "det_time": 1,
        "detection_source": "acquisition",
        "frame_number": 0,
        "image_filepath": "data/video-BzZspxAweF8AnKhWK-frame-000745-SSCRtAHcFjphNPczJ.jpg",
        "net_time": 1,
        "object_class_label": "person",
        "score": 0.96,
        "timestamp": "1652415948505",
        "track_id": -1
    },
    {
        "area": 2320,
        "bbox": [
            243,
            465,
            29,
            80
        ],
        "category_id": 1,
        "det_time": 1,
        "detection_source": "acquisition",
        "frame_number": 1,
        "image_filepath": "data/video-BzZspxAweF8AnKhWK-frame-000746-MHHMa9SEkr4Be79Fg.jpg",
        "net_time": 1,
        "object_class_label": "person",
        "score": 0.98,
        "timestamp": "1652415948539",
        "track_id": -1
    }
]
NNTC Track Output
[
    {
        "area": 2233,
        "bbox": [
            243,
            464,
            29,
            77
        ],
        "category_id": 1,
        "det_time": 1,
        "frame_number": 3,
        "image_filepath": "data/video-BzZspxAweF8AnKhWK-frame-000748-9tAPb5b34YrhNisoL.jpg",
        "net_time": 1,
        "object_class_label": "person",
        "score": 0.99,
        "timestamp": "1652415948605",
        "track_id": 1,
        "track_update_source": "object_detector",
        "track_update_state": "updated"
    },
    {
        "area": 2310,
        "bbox": [
            242,
            464,
            30,
            77
        ],
        "category_id": 1,
        "det_time": 1,
        "frame_number": 4,
        "image_filepath": "data/video-BzZspxAweF8AnKhWK-frame-000749-haAffTiwkCuseTx9S.jpg",
        "net_time": 1,
        "object_class_label": "person",
        "score": 0.66,
        "timestamp": "1652415948639",
        "track_id": 1,
        "track_update_source": "object_detector",
        "track_update_state": "not_updated"
    }
]
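
A common way to consume the track output is to group entries by track_id. Below is a minimal Python sketch, assuming a tracks.json file in the format above (note that the detection examples use a track_id of -1 for boxes not associated with any track):

import json
from collections import defaultdict

with open("tracks.json") as f:
    entries = json.load(f)

# Group the per-frame entries by track_id.
by_track = defaultdict(list)
for entry in entries:
    by_track[entry["track_id"]].append(entry)

for track_id, observations in sorted(by_track.items()):
    observations.sort(key=lambda e: e["frame_number"])
    frames = [e["frame_number"] for e in observations]
    print(f"track {track_id}: frames {frames[0]}..{frames[-1]}, "
          f"{len(observations)} observations")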

Top-Level Fields

The Formatted Detections format consists of an array of JSON objects. Each object represents a detection at a certain point in time. Some key/value pairs are optional: see the minimal prototype above for the required fields and the detailed prototype for the full set, including optional fields.

"area" (int, example: 327680)
    Required. Area of the detected object in pixels, typically width * height.

"bbox" (list, example: [0,0,640,512])
    Required. Location of the detected object in pixels: top-left x and y
    coordinates followed by the width and height, i.e. [x,y,width,height].

"category_id" (int, example: 1)
    Required. The category or class of the detected object in its numerical
    representation (starting from 1, not 0). For the string representation,
    look up the index in the label_vector file.

"object_class_label" (string, example: "person")
    Optional. Name of the object class ID, looked up in the label vector.

"score" (float, example: 0.95)
    Required. "Confidence" score for the detection or track (0.0 to 1.0).

"image_filepath" (string, example: "data/haAffTiwkCuseTx9S.jpg")
    File path of the image, either relative to this file or absolute.
    Conditionally optional if "image_id" is present.

"image_id" (int, example: 123)
    Conditionally optional unique ID for the image. If this is not present,
    "image_filepath" must be present.

"frame_number" (int, example: 123)
    Optional. Integer representing the order in which the frame was read
    (i.e. the nth frame processed). For some datasets "image_id" is already
    assigned, so this may hold redundant information.

"timestamp" (int, example: 1652415948639)
    Optional. System timestamp of when the frame was read, in Unix time with
    millisecond accuracy. NNTC outputs this field (note that the examples
    above serialize it as a string).

"fine_grained" (list, example: [ {..}, {..}, .. ])
    Optional. List of objects containing fine-grained classification of the
    object. This classifier is separate from the object detector, so it has
    its own label vector, category IDs, etc. The output is meant to be a
    further characterization of the object (i.e. if the object class is
    "vehicle", fine_grained tells you what kind of vehicle).

"net_time" (int, example: 1)
    Deprecated. Inference time of the network in µs. Originally used by
    scripts to calculate FPS and network speed. If this value is 1, the
    timing info is not being written.

"det_time" (int, example: 1)
    Deprecated. Combined time of several operations in µs: setup (copy
    inputs), inference, and cleanup (copy back results) of the network.
    If this value is 1, the timing info is not being written.
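
Since "bbox" stores [x,y,width,height] with a top-left origin, converting to the corner form that some tools expect, or recomputing "area", takes only a few lines. A Python sketch with illustrative helper names:

def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x1, y1, x2, y2] corner form."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

def area_matches(det):
    """Check that "area" equals width * height of the bbox."""
    _, _, w, h = det["bbox"]
    return det["area"] == w * h

print(xywh_to_xyxy([245, 466, 29, 79]))                          # [245, 466, 274, 545]
print(area_matches({"bbox": [245, 466, 29, 79], "area": 2291}))  # True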

Label Vector

The label vector JSON file contains an array of class names. The class ID is the index of the class name, starting at 0. In the example below, index 0 is "background", so "person" has class ID 1, matching the category_id values in the detection examples above.

Example label vector file:

{
  "name": "ObjectClass",
  "values": [
    "background",
    "person",
    "bicycle",
    "car",
    "uav",
    "vehicle",
    "unknown",
    "face"
  ]
}
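
Because the class ID is simply an index, resolving a category_id to its name is a list lookup. A Python sketch, assuming the label vector above is saved as label_vector.json:

import json

with open("label_vector.json") as f:
    label_vector = json.load(f)["values"]

# category_id indexes directly into "values" (index 0 is "background").
print(label_vector[1])  # "person", matching the detection examples above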

Remap File

A remap file should be a dictionary mapping input names to output names in JSON syntax.

Example remap file:

{
  "car": "vehicle",
  "motorcycle": "bicycle",
  "truck": "vehicle",
  "van": "vehicle",
  "bus": "vehicle",
  "other vehicle": "vehicle",
  "drone": "uav",
  "phantom": "uav",
  "fixed wing": "uav",
  "inspire 1": "uav",
  "matrice 210": "uav",
  "pedestrian": "person"
}
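
Applying a remap is a plain dictionary lookup, with unmapped labels passed through unchanged. A Python sketch, assuming the remap above is saved as remap.json (an illustrative file name):

import json

with open("remap.json") as f:
    remap = json.load(f)

def remap_label(label):
    """Map an input label to its output name; unmapped labels pass through."""
    return remap.get(label, label)

print(remap_label("truck"))       # "vehicle"
print(remap_label("pedestrian"))  # "person"
print(remap_label("person"))      # "person" (no entry, unchanged)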

Conservator Dataset

A Conservator dataset is a JSON file (index.json) that describes the videos, frames, and annotations; the associated frame images usually live in the data or analyticsData subdirectory.

General structure of index.json file

{
    "frames":
    [
        // Frame object
        {
            "annotations":
            [
                // Annotation object
                {
                    "attributes":
                    [
                        // Attribute object
                        { .. }
                    ]
                }
            ]
        }
    ],
    "videos":
    [
        // Video object
        { .. }
    ]
}

A Conservator file consists of the following key building blocks:

  • Frame - Each image file in a dataset is a frame

  • Annotation - Describes a region in the image. Can be a bounding box, point, or polygon

  • Source - Describes the origin of the annotation (user, tool, etc.)

  • Attribute - Describes a property of an annotation

  • Video - Organizational mechanism that groups frames. A video has common attributes such as the sensor or temporal relations

Full Examples

Conservator Dataset Example
{
    "datasetId": "nsfhqYrxd4Nk3tGv6",
    "datasetName": "dataset_rgb_img_val",
    "owner": "name@company.com",
    "version": 1,
    "frames": [
        {
            "annotations": [
                {
                    "boundingBox": { "h": 166, "w": 304, "x": 1079, "y": 656 },
                    "labels": [
                        "car"
                    ],
                    "attributes": [
                        {
                            "_id": "MgtmF58vxKTSy3hqH",
                            "attributePrototypeId": "HR8P4QdTtYcMCe5vp",
                            "name": "color",
                            "options": [
                                "unknown",
                                "white",
                                "black",
                                "red",
                                ...
                            ],
                            "source": "Conservator",
                            "type": "radio",
                            "value": "red"
                        },
                        ...
                    ]
                },
                ...
            ],
            "datasetFrameId": "BqWJ7uGCivSmkkBfj",
            "videoMetadata": {
                "frameId": "KsfdGuvSpnYfDEK9L",
                "frameIndex": 0,
                "videoId": "GFsKBJbRzCdRPPpo9"
            },
            ...
        },
        ...
    ],
    "videos": [
        {
          "id": "dkgizpZBCSA4WFkPz",
          ...
        },
        ...
    ]
}
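
Each frame references its source video through videoMetadata.videoId, so reading a dataset typically starts by indexing the videos by ID. A minimal Python sketch over an index.json in the format above:

import json

with open("index.json") as f:
    dataset = json.load(f)

# Index videos by ID so frames can be joined back to their source video.
videos = {v["id"]: v for v in dataset["videos"]}

for frame in dataset["frames"]:
    meta = frame["videoMetadata"]
    video = videos.get(meta["videoId"], {})
    for ann in frame.get("annotations", []):
        box = ann.get("boundingBox")
        if box:
            print(video.get("name"), meta["frameIndex"], ann["labels"][0],
                  box["x"], box["y"], box["w"], box["h"])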

Top-Level Fields

{
  "datasetId": String,
  "datasetName": String,
  "owner": String,
  "version": int,
  "frames": list,
  "videos": list
}

"datasetId" (string, example: "nsfhqYrxd4Nk3tGv6")
    Required. Unique dataset identifier assigned by Conservator.

"datasetName" (string, example: "dataset_rgb_img_val")
    Required. Human-readable name assigned to the dataset by a user.

"owner" (string, example: "name@company.com")
    Required. Email address of the user who owns the dataset.

"version" (int, example: 1)
    Required. Format version of the dataset file (currently just 1).

"frames" (list, example: [ {..}, {..} ])
    Required. List of Frame objects in the dataset.

"videos" (list, example: [ {..}, {..} ])
    Required. List of Video objects in the dataset.

Frame

"frameIndex" (int, example: 0)
    Required. The zero-based frame index of the target frame. Must be greater
    than or equal to 0. The frameIndex must be unique and may not be greater
    than the highest frame index of the target video.

"annotations" (list, example: [ {..}, {..} ])
    Optional. List of zero or more Annotation objects. Providing an array of
    Annotation objects replaces all annotations in the current frame with the
    provided annotations. To remove all annotations for the current frame,
    provide an empty array. Omit this field completely to leave the existing
    annotations for the current frame intact.

"custom" (object, example: { .. })
    Free-form object for storing custom frame metadata. Accepts any valid
    JSON object, including nested objects and arrays.
Annotation

An annotation can be human-generated or machine-generated (described in the Source object).

"targetId" (int, example: 1)
    Optional. Integer that identifies the object from frame to frame.
    Subsequent frames of the same object in a video would have the same
    targetId. This is useful in applications where the object is tracked
    over time, and allows computation of metrics such as MOTA, MOTP,
    HOTA, etc.

"labels" (list, example: [ "person" ])
    Required. Array of exactly one non-empty string. Allowed characters are
    lower case a-z, 0-9, hyphen, underscore, and space. Historically multiple
    labels were supported, but that functionality has been deprecated.

"labelId" (string, example: "rsAGiJREDEiyz4H2u")
    Unique ID that identifies the label in the Label Set.

"boundingBox" (object, example: { .. })
    Object that describes the size and position of a rectangular annotation
    in pixels. The origin (0,0) of the image is the top left corner. One of
    the following keys is conditionally required: "boundingBox", "point",
    "boundingPolygon".

    Example:

    {
       "h": 512, // Height
       "w": 640, // Width
       "x": 0,   // The x coordinate of the top left corner
       "y": 0    // The y coordinate of the top left corner
    }

"point" (object, example: { .. })
    Single Point object that describes the coordinates of a point in pixels.
    The origin (0,0) of the image is the top left corner.

    {
       "x": 0, // The x coordinate of the point
       "y": 0  // The y coordinate of the point
    }

"boundingPolygon" (list, example: [ {..}, {..} ])
    List of Point objects that make up a polygon. The origin (0,0) of the
    image is the top left corner.

    [
      { "x": 0, "y": 0 },    // Point 1
      { "x": 10, "y": 10 },  // Point 2
      { "x": 0, "y": 10 }    // Point 3
    ]

"attributes" (list, example: [ {..}, {..} ])
    A list of Attribute objects that correspond to the annotation. For
    example, an attribute can describe the occlusion or truncation status of
    the object, or it can be used to specify a more detailed label: the label
    can be "vehicle" and an attribute can specify the type of vehicle, such
    as "car" or "truck".

"custom" (object, example: { .. })
    Free-form object for storing custom annotation metadata. Accepts any
    valid JSON object, including nested objects and arrays.
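
All three geometry types share the same top-left pixel origin, so basic measurements are straightforward. A Python sketch with illustrative helper names, using the shoelace formula for the polygon case:

def bbox_area(box):
    """Area of a boundingBox object ({"x", "y", "w", "h"}) in pixels."""
    return box["w"] * box["h"]

def polygon_area(points):
    """Area of a boundingPolygon (list of {"x", "y"} points), via shoelace."""
    acc = 0
    for i, p in enumerate(points):
        q = points[(i + 1) % len(points)]
        acc += p["x"] * q["y"] - q["x"] * p["y"]
    return abs(acc) / 2

print(bbox_area({"h": 166, "w": 304, "x": 1079, "y": 656}))  # 50464
print(polygon_area([{"x": 0, "y": 0}, {"x": 10, "y": 10}, {"x": 0, "y": 10}]))  # 50.0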

Source

"type" (string, example: "human")
    Required. String that describes the origin of the annotation.
    - "human" - a human drew the annotation
    - "machine" - the annotation came from a computer vision system
      (CNN/tracker)

"meta" (object, example: { .. })
    Additional context about the annotation, such as the user or the
    algorithm that was used to create the annotation.

    "meta": {
        "tool": "conservator",
        "user": "Q5XkSBgAmASqLiRcS"
    }

Attribute

Example attribute:

{
    "_id": "pwGzbJYvXyPdJcJA8",
    "attributePrototypeId": "2skd7FtnBtokA5nWc",
    "name": "occluded",
    "options": [
        "No (Fully Visible)",
        "1% - 70% Occluded (Partially Occluded)",
        "70% - 90% Occluded (Difficult to see)"
    ],
    "source": "Conservator",
    "type": "radio",
    "value": "No (Fully Visible)"
}

"_id" (string, example: "pwGzbJYvXyPdJcJA8")
    Required. Unique identifier of the attribute assigned by Conservator.

"attributePrototypeId" (string, example: "2skd7FtnBtokA5nWc")
    Identifier of the attribute prototype (the attribute definition) that
    this attribute is an instance of.

"name" (string, example: "occluded")
    Required. Human-readable description of the Attribute. In this example,
    "occluded" describes the amount that other objects in the scene are
    blocking this annotation.

"type" (string, example: "radio")
    Required. Type of UI element used to enter the attribute.
    - "radio" - only one option can be selected at once
    - "checkbox?" - TODO

"options" (list, example: [ "a", "b", "c" ])
    A list of available options for the Attribute.

"source" (string, example: "Conservator")
    The tool that defined the Attribute. This is used to determine whether
    the Attribute came from Conservator or a 3rd party tool such as LabelBox.

"value" (string, example: "a")
    The value of the attribute.

Video

"id" (string, example: "pwGzbJYvXyPdJcJA8")
    NOTE: This field (a unique identifier) is included for reference when
    exporting metadata, but is ignored when importing metadata into
    Conservator.

"filename" (string, example: "24mm_day_280.tiff")
    Original filename that was uploaded to Conservator.

"name" (string, example: "24mm_day_280.tiff")
    Human-readable file name that describes the video. By default this is
    the same as the filename, but the name can be changed in the UI after
    upload.

"owner" (string, example: "name@company.com")
    Email address of the user who owns the video.

"description" (string)
    User-defined field with comments about the video.

"tags" (list, example: [ "a", "b", "c" ])
    Optional. Array of zero or more non-empty strings. Allowed characters
    are lower case a-z, 0-9, hyphen, underscore, and space.

"source" (string, example: "Conservator")
    The tool that created the video entry. This is used to determine whether
    the video came from Conservator or a 3rd party tool such as LabelBox.

"custom" (object, example: { .. })
    Optional. Free-form object for storing custom video metadata. Accepts
    any valid JSON object, including nested objects and arrays.

COCO Format

COCO is a large-scale object detection, segmentation, and captioning dataset developed by academic and industry partners.

In general, there is not a clear file organization established by COCO; however, the following examples can be used as semi-standard schemes.

  1. Datumaro follows this format (see the Datumaro documentation for details).

./
├── images/
│   ├── {filename0}.jpg
│   ├── {filename1}.jpg
│   └── ..
└── annotations/instances_default.json
  2. Similar to the Conservator format, the following structure can be used.

./
├── data/
│   ├── {filename0}.jpg
│   ├── {filename1}.jpg
│   └── ..
└── annotations_coco.json

The VisData import dialog looks for a simple pattern: if the annotation file contains the term “coco” it will read it as a COCO-formatted file.

General Annotation File Structure:

{
    "info": {..},
    "licenses": [
        {
            "id": 1,
            "name": "Attribution-NonCommercial-ShareAlike License",
            "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
        },
        ..
    ],
    "categories": [
        ..
        {
            "id": 2,
            "name": "cat",
            "supercategory": "animal",
            "keypoints": ["nose", "head", .. ],
            "skeleton": [[12, 14], [14, 16], .. ]
        },
        ..
    ],
    "images": [
        {
            "id": 1,
            "license": 1,
            "file_name": "<filename0>.<ext>",
            "height": 480,
            "width": 640,
            "date_captured": null
        },
        ..
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 2,
            "bbox": [260, 177, 231, 199],
            "segmentation": [ .. ],
            "keypoints": [224, 226, 2, .. ],
            "num_keypoints": 10,
            "score": 0.95,
            "area": 45969,
            "iscrowd": 0
        },
        ..
    ]
}
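
Because annotations reference images only by image_id, reading a COCO file usually starts by indexing the images and categories by ID. A minimal Python sketch, assuming the annotations_coco.json layout above:

import json

with open("annotations_coco.json") as f:
    coco = json.load(f)

images = {img["id"]: img for img in coco["images"]}
categories = {c["id"]: c["name"] for c in coco["categories"]}

for ann in coco["annotations"]:
    img = images[ann["image_id"]]
    x, y, w, h = ann["bbox"]  # COCO bboxes are also [x, y, width, height]
    print(img["file_name"], categories[ann["category_id"]], x, y, w, h)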