External Data Formats

NNTC Log Files

This is the standard set of files logged by Teledyne FLIR’s computer vision library NNTC.

./outputs/2022-10-13_15-50-46_0xb32
├── config.json
├── detections.json
├── nntc_config_expanded.json
├── nntc.log
├── run_command.txt
├── tracks.json
└── version.json

config.json (alternatively: config.json.aes)
    A flattened ($ref objects resolved) copy of the configuration used to run
    NNTC. The file is encrypted and carries the .aes extension if the input
    configuration is encrypted.

nntc_config_expanded.json (alternatively: nntc_config_expanded.json.aes)
    Similar to config.json, except additional preprocessing is applied to
    prepare values into a usable state for NNTC (defaults applied and values
    cast to their proper types).

nntc.log
    Raw log output (the same text that is printed to the screen).

run_command.txt
    The command that was run, with all input arguments.

version.json
    Information about application and model versions, for record keeping.

detections.json
    Output of the CNN, logged in the Formatted Detections format. The name of
    this file is set by the input argument --log_detections=detections.json.

tracks.json
    Output of the tracker, in the Formatted Detections format. The name of
    this file is set by the input argument --log_tracks=tracks.json.

Formatted Detections

The Teledyne FLIR Formatted Detections format holds a textual representation of bounding box output (detections and tracks) using JSON syntax. It is one of the key formats used by the NNTC computer vision library. Formatted Detections are derived from the COCO format, but have some extensions (e.g. the addition of “track_id” and “image_filepath”) and changes (e.g. “image_id” is not required if the file path is present).

Minimal Prototype (Only Required Fields)

[
    {
        "area"          : int,
        "bbox"          : [x,y,width,height],
        "category_id"   : int,
        "score"         : float,
        "image_filepath": String
    }
]

Detailed Prototype

[
    {
        "area"                  : int,
        "bbox"                  : [x,y,width,height],
        "category_id"           : int,
        "score"                 : float,
        "object_class_label"    : String,
        "image_filepath"        : String,
        "image_id"              : int,
        "frame_number"          : int,
        "det_time"              : int,
        "net_time"              : int,
        "timestamp"             : int,
        "track_id"              : int,
        "fine_grained"          : [
            {
              "characteristic_1"    : int_or_something_else
            },
            {
              "characteristic_2"    : int_or_something_else
            },
            ...
        ]
    }
]
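
As a quick sanity check, here is a short Python sketch that loads a Formatted Detections file and verifies the required fields described above. The file name detections.json matches the NNTC output; the helper name is illustrative.

import json

# Required keys per the minimal prototype; "image_filepath" may be replaced
# by "image_id" (see Top-Level Fields below).
REQUIRED_KEYS = {"area", "bbox", "category_id", "score"}

def load_formatted_detections(path):
    """Load a Formatted Detections file and check its required fields."""
    with open(path) as f:
        detections = json.load(f)
    for i, det in enumerate(detections):
        missing = REQUIRED_KEYS - det.keys()
        if missing:
            raise ValueError(f"detection {i} is missing fields: {missing}")
        if "image_filepath" not in det and "image_id" not in det:
            raise ValueError(f"detection {i} needs image_filepath or image_id")
        x, y, w, h = det["bbox"]  # raises if bbox is not [x,y,width,height]
    return detections

dets = load_formatted_detections("detections.json")
print(f"loaded {len(dets)} detections")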

Full Examples

NNTC Detection Output
[
    {
        "area": 2291,
        "bbox": [
            245,
            466,
            29,
            79
        ],
        "category_id": 1,
        "det_time": 1,
        "detection_source": "acquisition",
        "frame_number": 0,
        "image_filepath": "data/video-BzZspxAweF8AnKhWK-frame-000745-SSCRtAHcFjphNPczJ.jpg",
        "net_time": 1,
        "object_class_label": "person",
        "score": 0.96,
        "timestamp": "1652415948505",
        "track_id": -1
    },
    {
        "area": 2320,
        "bbox": [
            243,
            465,
            29,
            80
        ],
        "category_id": 1,
        "det_time": 1,
        "detection_source": "acquisition",
        "frame_number": 1,
        "image_filepath": "data/video-BzZspxAweF8AnKhWK-frame-000746-MHHMa9SEkr4Be79Fg.jpg",
        "net_time": 1,
        "object_class_label": "person",
        "score": 0.98,
        "timestamp": "1652415948539",
        "track_id": -1
    }
]
NNTC Track Output
[
    {
        "area": 2233,
        "bbox": [
            243,
            464,
            29,
            77
        ],
        "category_id": 1,
        "det_time": 1,
        "frame_number": 3,
        "image_filepath": "data/video-BzZspxAweF8AnKhWK-frame-000748-9tAPb5b34YrhNisoL.jpg",
        "net_time": 1,
        "object_class_label": "person",
        "score": 0.99,
        "timestamp": "1652415948605",
        "track_id": 1,
        "track_update_source": "object_detector",
        "track_update_state": "updated"
    },
    {
        "area": 2310,
        "bbox": [
            242,
            464,
            30,
            77
        ],
        "category_id": 1,
        "det_time": 1,
        "frame_number": 4,
        "image_filepath": "data/video-BzZspxAweF8AnKhWK-frame-000749-haAffTiwkCuseTx9S.jpg",
        "net_time": 1,
        "object_class_label": "person",
        "score": 0.66,
        "timestamp": "1652415948639",
        "track_id": 1,
        "track_update_source": "object_detector",
        "track_update_state": "not_updated"
    }
]
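
A common way to consume the track output is to group entries by track_id. Below is a minimal Python sketch, assuming a tracks.json file in the format above (note that the detection examples use a track_id of -1 for boxes not associated with any track):

import json
from collections import defaultdict

with open("tracks.json") as f:
    entries = json.load(f)

# Group the per-frame entries by track_id.
by_track = defaultdict(list)
for entry in entries:
    by_track[entry["track_id"]].append(entry)

for track_id, observations in sorted(by_track.items()):
    observations.sort(key=lambda e: e["frame_number"])
    frames = [e["frame_number"] for e in observations]
    print(f"track {track_id}: frames {frames[0]}..{frames[-1]}, "
          f"{len(observations)} observations")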

Top-Level Fields

The Formatted Detections format consists of an array of JSON objects. Each object represents a detection at a certain point in time. Some key/value pairs are optional: see the minimal prototype above for the required fields and the detailed prototype for the full set, including optional fields.

"area" (int, example: 327680)
    Required. Area of the detected object in pixels, typically width * height.

"bbox" (list, example: [0,0,640,512])
    Required. Location of the detected object in pixels: top-left x and y
    coordinates followed by the width and height, i.e. [x,y,width,height].

"category_id" (int, example: 1)
    Required. The category or class of the detected object in its numerical
    representation (starting from 1, not 0). For the string representation,
    look up the index in the label_vector file.

"object_class_label" (string, example: "person")
    Optional. Name of the object class ID, looked up in the label vector.

"score" (float, example: 0.95)
    Required. "Confidence" score for the detection or track (0.0 to 1.0).

"image_filepath" (string, example: "data/haAffTiwkCuseTx9S.jpg")
    File path of the image, either relative to this file or absolute.
    Conditionally optional if "image_id" is present.

"image_id" (int, example: 123)
    Conditionally optional unique ID for the image. If this is not present,
    "image_filepath" must be present.

"frame_number" (int, example: 123)
    Optional. Integer representing the order in which the frame was read
    (i.e. the nth frame processed). For some datasets "image_id" is already
    assigned, so this may hold redundant information.

"timestamp" (int, example: 1652415948639)
    Optional. System timestamp of when the frame was read, in Unix time with
    millisecond accuracy. NNTC outputs this field (note that the examples
    above serialize it as a string).

"fine_grained" (list, example: [ {..}, {..}, .. ])
    Optional. List of objects containing fine-grained classification of the
    object. This classifier is separate from the object detector, so it has
    its own label vector, category IDs, etc. The output is meant to be a
    further characterization of the object (i.e. if the object class is
    "vehicle", fine_grained tells you what kind of vehicle).

"net_time" (int, example: 1)
    Deprecated. Inference time of the network in µs. Originally used by
    scripts to calculate FPS and network speed. If this value is 1, the
    timing info is not being written.

"det_time" (int, example: 1)
    Deprecated. Combined time of several operations in µs: setup (copy
    inputs), inference, and cleanup (copy back results) of the network.
    If this value is 1, the timing info is not being written.
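
Since "bbox" stores [x,y,width,height] with a top-left origin, converting to the corner form that some tools expect, or recomputing "area", takes only a few lines. A Python sketch with illustrative helper names:

def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x1, y1, x2, y2] corner form."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

def area_matches(det):
    """Check that "area" equals width * height of the bbox."""
    _, _, w, h = det["bbox"]
    return det["area"] == w * h

print(xywh_to_xyxy([245, 466, 29, 79]))                          # [245, 466, 274, 545]
print(area_matches({"bbox": [245, 466, 29, 79], "area": 2291}))  # True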

Label Vector

The label vector JSON file contains an array of class names. The class ID is the index of the class name, starting at 0. In the example below, index 0 is "background", so "person" has class ID 1, matching the category_id values in the detection examples above.

Example label vector file:

{
  "name": "ObjectClass",
  "values": [
    "background",
    "person",
    "bicycle",
    "car",
    "uav",
    "vehicle",
    "unknown",
    "face"
  ]
}
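
Because the class ID is simply an index, resolving a category_id to its name is a list lookup. A Python sketch, assuming the label vector above is saved as label_vector.json:

import json

with open("label_vector.json") as f:
    label_vector = json.load(f)["values"]

# category_id indexes directly into "values" (index 0 is "background").
print(label_vector[1])  # "person", matching the detection examples above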

Remap File

A remap file should be a dictionary mapping input names to output names in JSON syntax.

Example remap file:

{
  "car": "vehicle",
  "motorcycle": "bicycle",
  "truck": "vehicle",
  "van": "vehicle",
  "bus": "vehicle",
  "other vehicle": "vehicle",
  "drone": "uav",
  "phantom": "uav",
  "fixed wing": "uav",
  "inspire 1": "uav",
  "matrice 210": "uav",
  "pedestrian": "person"
}
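
Applying a remap is a plain dictionary lookup, with unmapped labels passed through unchanged. A Python sketch, assuming the remap above is saved as remap.json (an illustrative file name):

import json

with open("remap.json") as f:
    remap = json.load(f)

def remap_label(label):
    """Map an input label to its output name; unmapped labels pass through."""
    return remap.get(label, label)

print(remap_label("truck"))       # "vehicle"
print(remap_label("pedestrian"))  # "person"
print(remap_label("person"))      # "person" (no entry, unchanged)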

Conservator Dataset

A Conservator dataset is a JSON file (index.json) that describes the videos, frames, and annotations; the associated frame images usually live in the data or analyticsData subdirectory.

General structure of index.json file

{
    "frames":
    [
        // Frame object
        {
            "annotations":
            [
                // Annotation object
                {
                    "attributes":
                    [
                        // Attribute object
                        { .. }
                    ]
                }
            ]
        }
    ],
    "videos":
    [
        // Video object
        { .. }
    ]
}

A Conservator file consists of the following key building blocks:

  • Frame - Each image file in a dataset is a frame

  • Annotation - Describes a region in the image. Can be a bounding box, point, or polygon

  • Source - Describes the origin of the annotation (user, tool, etc.)

  • Attribute - Describes a property of an annotation

  • Video - Organizational mechanism that groups frames. A video has common attributes such as the sensor or temporal relations

Full Examples

Conservator Dataset Example
{
    "datasetId": "nsfhqYrxd4Nk3tGv6",
    "datasetName": "dataset_rgb_img_val",
    "owner": "name@company.com",
    "version": 1,
    "frames": [
        {
            "annotations": [
                {
                    "boundingBox": { "h": 166, "w": 304, "x": 1079, "y": 656 },
                    "labels": [
                        "car"
                    ],
                    "attributes": [
                        {
                            "_id": "MgtmF58vxKTSy3hqH",
                            "attributePrototypeId": "HR8P4QdTtYcMCe5vp",
                            "name": "color",
                            "options": [
                                "unknown",
                                "white",
                                "black",
                                "red",
                                ...
                            ],
                            "source": "Conservator",
                            "type": "radio",
                            "value": "red"
                        },
                        ...
                    ]
                },
                ...
            ],
            "datasetFrameId": "BqWJ7uGCivSmkkBfj",
            "videoMetadata": {
                "frameId": "KsfdGuvSpnYfDEK9L",
                "frameIndex": 0,
                "videoId": "GFsKBJbRzCdRPPpo9"
            },
            ...
        },
        ...
    ],
    "videos": [
        {
          "id": "dkgizpZBCSA4WFkPz",
          ...
        },
        ...
    ]
}
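
Each frame references its source video through videoMetadata.videoId, so reading a dataset typically starts by indexing the videos by ID. A minimal Python sketch over an index.json in the format above:

import json

with open("index.json") as f:
    dataset = json.load(f)

# Index videos by ID so frames can be joined back to their source video.
videos = {v["id"]: v for v in dataset["videos"]}

for frame in dataset["frames"]:
    meta = frame["videoMetadata"]
    video = videos.get(meta["videoId"], {})
    for ann in frame.get("annotations", []):
        box = ann.get("boundingBox")
        if box:
            print(video.get("name"), meta["frameIndex"], ann["labels"][0],
                  box["x"], box["y"], box["w"], box["h"])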

Top-Level Fields

{
  "datasetId": String,
  "datasetName": String,
  "owner": String,
  "version": int,
  "frames": list,
  "videos": list
}

"datasetId" (string, example: "nsfhqYrxd4Nk3tGv6")
    Required. Unique dataset identifier assigned by Conservator.

"datasetName" (string, example: "dataset_rgb_img_val")
    Required. Human-readable name assigned to the dataset by a user.

"owner" (string, example: "name@company.com")
    Required. Email address of the user who owns the dataset.

"version" (int, example: 1)
    Required. Format version of the dataset file (currently just 1).

"frames" (list, example: [ {..}, {..} ])
    Required. List of Frame objects in the dataset.

"videos" (list, example: [ {..}, {..} ])
    Required. List of Video objects in the dataset.

Frame

"frameIndex" (int, example: 0)
    Required. The zero-based frame index of the target frame. Must be greater
    than or equal to 0. The frameIndex must be unique and may not be greater
    than the highest frame index of the target video.

"annotations" (list, example: [ {..}, {..} ])
    Optional. List of zero or more Annotation objects. Providing an array of
    Annotation objects replaces all annotations in the current frame with the
    provided annotations. To remove all annotations for the current frame,
    provide an empty array. Omit this field completely to leave the existing
    annotations for the current frame intact.

"custom" (object, example: { .. })
    Free-form object for storing custom frame metadata. Accepts any valid
    JSON object, including nested objects and arrays.
Annotation

An annotation can be human-generated or machine-generated (described in the Source object).

"targetId" (int, example: 1)
    Optional. Integer that identifies the object from frame to frame.
    Subsequent frames of the same object in a video would have the same
    targetId. This is useful in applications where the object is tracked
    over time, and allows computation of metrics such as MOTA, MOTP,
    HOTA, etc.

"labels" (list, example: [ "person" ])
    Required. Array of exactly one non-empty string. Allowed characters are
    lower case a-z, 0-9, hyphen, underscore, and space. Historically multiple
    labels were supported, but that functionality has been deprecated.

"labelId" (string, example: "rsAGiJREDEiyz4H2u")
    Unique ID that identifies the label in the Label Set.

"boundingBox" (object, example: { .. })
    Object that describes the size and position of a rectangular annotation
    in pixels. The origin (0,0) of the image is the top left corner. One of
    the following keys is conditionally required: "boundingBox", "point",
    "boundingPolygon".

    Example:

    {
       "h": 512, // Height
       "w": 640, // Width
       "x": 0,   // The x coordinate of the top left corner
       "y": 0    // The y coordinate of the top left corner
    }

"point" (object, example: { .. })
    Single Point object that describes the coordinates of a point in pixels.
    The origin (0,0) of the image is the top left corner.

    {
       "x": 0, // The x coordinate of the point
       "y": 0  // The y coordinate of the point
    }

"boundingPolygon" (list, example: [ {..}, {..} ])
    List of Point objects that make up a polygon. The origin (0,0) of the
    image is the top left corner.

    [
      { "x": 0, "y": 0 },    // Point 1
      { "x": 10, "y": 10 },  // Point 2
      { "x": 0, "y": 10 }    // Point 3
    ]

"attributes" (list, example: [ {..}, {..} ])
    A list of Attribute objects that correspond to the annotation. For
    example, an attribute can describe the occlusion or truncation status of
    the object, or it can be used to specify a more detailed label: the label
    can be "vehicle" and an attribute can specify the type of vehicle, such
    as "car" or "truck".

"custom" (object, example: { .. })
    Free-form object for storing custom annotation metadata. Accepts any
    valid JSON object, including nested objects and arrays.
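
All three geometry types share the same top-left pixel origin, so basic measurements are straightforward. A Python sketch with illustrative helper names, using the shoelace formula for the polygon case:

def bbox_area(box):
    """Area of a boundingBox object ({"x", "y", "w", "h"}) in pixels."""
    return box["w"] * box["h"]

def polygon_area(points):
    """Area of a boundingPolygon (list of {"x", "y"} points), via shoelace."""
    acc = 0
    for i, p in enumerate(points):
        q = points[(i + 1) % len(points)]
        acc += p["x"] * q["y"] - q["x"] * p["y"]
    return abs(acc) / 2

print(bbox_area({"h": 166, "w": 304, "x": 1079, "y": 656}))  # 50464
print(polygon_area([{"x": 0, "y": 0}, {"x": 10, "y": 10}, {"x": 0, "y": 10}]))  # 50.0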

Source

"type" (string, example: "human")
    Required. String that describes the origin of the annotation.
    - "human" - a human drew the annotation
    - "machine" - the annotation came from a computer vision system
      (CNN/tracker)

"meta" (object, example: { .. })
    Additional context about the annotation, such as the user or the
    algorithm that was used to create the annotation.

    "meta": {
        "tool": "conservator",
        "user": "Q5XkSBgAmASqLiRcS"
    }

Attribute

Example attribute:

{
    "_id": "pwGzbJYvXyPdJcJA8",
    "attributePrototypeId": "2skd7FtnBtokA5nWc",
    "name": "occluded",
    "options": [
        "No (Fully Visible)",
        "1% - 70% Occluded (Partially Occluded)",
        "70% - 90% Occluded (Difficult to see)"
    ],
    "source": "Conservator",
    "type": "radio",
    "value": "No (Fully Visible)"
}

"_id" (string, example: "pwGzbJYvXyPdJcJA8")
    Required. Unique identifier of the attribute assigned by Conservator.

"attributePrototypeId" (string, example: "2skd7FtnBtokA5nWc")
    Identifier of the attribute prototype (the attribute definition) that
    this attribute is an instance of.

"name" (string, example: "occluded")
    Required. Human-readable description of the Attribute. In this example,
    "occluded" describes the amount that other objects in the scene are
    blocking this annotation.

"type" (string, example: "radio")
    Required. Type of UI element used to enter the attribute.
    - "radio" - only one option can be selected at once
    - "checkbox?" - TODO

"options" (list, example: [ "a", "b", "c" ])
    A list of available options for the Attribute.

"source" (string, example: "Conservator")
    The tool that defined the Attribute. This is used to determine whether
    the Attribute came from Conservator or a 3rd party tool such as LabelBox.

"value" (string, example: "a")
    The value of the attribute.

Video

"id" (string, example: "pwGzbJYvXyPdJcJA8")
    NOTE: This field (a unique identifier) is included for reference when
    exporting metadata, but is ignored when importing metadata into
    Conservator.

"filename" (string, example: "24mm_day_280.tiff")
    Original filename that was uploaded to Conservator.

"name" (string, example: "24mm_day_280.tiff")
    Human-readable file name that describes the video. By default this is
    the same as the filename, but the name can be changed in the UI after
    upload.

"owner" (string, example: "name@company.com")
    Email address of the user who owns the video.

"description" (string)
    User-defined field with comments about the video.

"tags" (list, example: [ "a", "b", "c" ])
    Optional. Array of zero or more non-empty strings. Allowed characters
    are lower case a-z, 0-9, hyphen, underscore, and space.

"source" (string, example: "Conservator")
    The tool that created the video entry. This is used to determine whether
    the video came from Conservator or a 3rd party tool such as LabelBox.

"custom" (object, example: { .. })
    Optional. Free-form object for storing custom video metadata. Accepts
    any valid JSON object, including nested objects and arrays.

COCO Format

COCO is a large-scale object detection, segmentation, and captioning dataset developed by academic and industry partners.

In general, there is not a clear file organization established by COCO; however, the following examples can be used as semi-standard schemes.

  1. Datumaro follows this format (see the Datumaro documentation for details).

./
├── images/
│   ├── {filename0}.jpg
│   ├── {filename1}.jpg
│   └── ..
└── annotations/instances_default.json
  2. Similar to the Conservator format, the following structure can be used.

./
├── data/
│   ├── {filename0}.jpg
│   ├── {filename1}.jpg
│   └── ..
└── annotations_coco.json

The VisData import dialog looks for a simple pattern: if the annotation file contains the term “coco” it will read it as a COCO-formatted file.

General Annotation File Structure:

{
    "info": {..},
    "licenses": [
        {
            "id": 1,
            "name": "Attribution-NonCommercial-ShareAlike License",
            "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
        },
        ..
    ],
    "categories": [
        ..
        {
            "id": 2,
            "name": "cat",
            "supercategory": "animal",
            "keypoints": ["nose", "head", .. ],
            "skeleton": [[12, 14], [14, 16], .. ]
        },
        ..
    ],
    "images": [
        {
            "id": 1,
            "license": 1,
            "file_name": "<filename0>.<ext>",
            "height": 480,
            "width": 640,
            "date_captured": null
        },
        ..
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 2,
            "bbox": [260, 177, 231, 199],
            "segmentation": [ .. ],
            "keypoints": [224, 226, 2, .. ],
            "num_keypoints": 10,
            "score": 0.95,
            "area": 45969,
            "iscrowd": 0
        },
        ..
    ]
}
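
Because annotations reference images only by image_id, reading a COCO file usually starts by indexing the images and categories by ID. A minimal Python sketch, assuming the annotations_coco.json layout above:

import json

with open("annotations_coco.json") as f:
    coco = json.load(f)

images = {img["id"]: img for img in coco["images"]}
categories = {c["id"]: c["name"] for c in coco["categories"]}

for ann in coco["annotations"]:
    img = images[ann["image_id"]]
    x, y, w, h = ann["bbox"]  # COCO bboxes are also [x, y, width, height]
    print(img["file_name"], categories[ann["category_id"]], x, y, w, h)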