External Data Formats
NNTC Log Files
This is the standard set of files logged by Teledyne FLIR’s computer vision library NNTC.
./outputs/2022-10-13_15-50-46_0xb32
├── config.json
├── detections.json
├── nntc_config_expanded.json
├── nntc.log
├── run_command.txt
├── tracks.json
└── version.json
| File | Description |
|---|---|
| config.json (alternatively config.json.aes) | A flattened ($ref objects resolved) copy of the configuration used to run NNTC. The file is encrypted and carries the .aes extension if the input configuration is encrypted. |
| nntc_config_expanded.json (alternatively nntc_config_expanded.json.aes) | Similar to config.json, except additional preprocessing is used to prepare values into a usable state for NNTC (defaults applied and values cast to proper types). |
| nntc.log | Raw log output (same as what is printed to screen). |
| run_command.txt | Saves the command that was run with all input arguments. |
| version.json | Information about application and model versions for record keeping. |
| detections.json | Output of the CNN, logged using the Formatted Detections format. The name of this file is determined by the input argument --log_detections=detections.json. |
| tracks.json | Output of the tracker, logged using the Formatted Detections format. The name of this file is determined by the input argument --log_tracks=tracks.json. |
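To work with a run directory programmatically, something like the following Python sketch can gather the JSON artifacts. `load_run_outputs` is a hypothetical helper, and the sketch assumes an unencrypted run (no .aes files).

```python
import json
from pathlib import Path

def load_run_outputs(run_dir):
    """Hypothetical helper: collect the JSON artifacts of an NNTC run."""
    run = Path(run_dir)
    outputs = {}
    for name in ("config.json", "nntc_config_expanded.json",
                 "version.json", "detections.json", "tracks.json"):
        path = run / name
        if path.exists():  # encrypted runs ship .aes variants instead
            outputs[name] = json.loads(path.read_text())
    return outputs

outputs = load_run_outputs("./outputs/2022-10-13_15-50-46_0xb32")
print(sorted(outputs))
```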
Formatted Detections
The Teledyne FLIR Formatted Detections format holds a textual representation of bounding box output (detections and tracks) using JSON syntax. It is one of the key formats used by the NNTC computer vision library. Formatted Detections is derived from the COCO format, but has some extensions (e.g. the addition of "track_id" and "image_filepath") and changes (e.g. "image_id" is not required if the file path is present).
Minimal Prototype (Only Required Fields)
[
   {
      "area" : int,
      "bbox" : [x,y,width,height],
      "category_id" : int,
      "score" : float,
      "image_filepath" : String
   }
]
Detailed Prototype
[
   {
      "area" : int,
      "bbox" : [x,y,width,height],
      "category_id" : int,
      "score" : float,
      "object_class_label" : String,
      "image_filepath" : String,
      "image_id" : int,
      "frame_number" : int,
      "det_time" : int,
      "net_time" : int,
      "timestamp" : int,
      "track_id" : int,
      "fine_grained" : [
         {
            "characteristic_1" : int_or_something_else
         },
         {
            "characteristic_2" : int_or_something_else
         },
         ...
      ]
   }
]
Full Examples
NNTC Detection Output
[
{
"area": 2291,
"bbox": [
245,
466,
29,
79
],
"category_id": 1,
"det_time": 1,
"detection_source": "acquisition",
"frame_number": 0,
"image_filepath": "data/video-BzZspxAweF8AnKhWK-frame-000745-SSCRtAHcFjphNPczJ.jpg",
"net_time": 1,
"object_class_label": "person",
"score": 0.96,
"timestamp": "1652415948505",
"track_id": -1
},
{
"area": 2320,
"bbox": [
243,
465,
29,
80
],
"category_id": 1,
"det_time": 1,
"detection_source": "acquisition",
"frame_number": 1,
"image_filepath": "data/video-BzZspxAweF8AnKhWK-frame-000746-MHHMa9SEkr4Be79Fg.jpg",
"net_time": 1,
"object_class_label": "person",
"score": 0.98,
"timestamp": "1652415948539",
"track_id": -1
}
]
NNTC Track Output
[
{
"area": 2233,
"bbox": [
243,
464,
29,
77
],
"category_id": 1,
"det_time": 1,
"frame_number": 3,
"image_filepath": "data/video-BzZspxAweF8AnKhWK-frame-000748-9tAPb5b34YrhNisoL.jpg",
"net_time": 1,
"object_class_label": "person",
"score": 0.99,
"timestamp": "1652415948605",
"track_id": 1,
"track_update_source": "object_detector",
"track_update_state": "updated"
},
{
"area": 2310,
"bbox": [
242,
464,
30,
77
],
"category_id": 1,
"det_time": 1,
"frame_number": 4,
"image_filepath": "data/video-BzZspxAweF8AnKhWK-frame-000749-haAffTiwkCuseTx9S.jpg",
"net_time": 1,
"object_class_label": "person",
"score": 0.66,
"timestamp": "1652415948639",
"track_id": 1,
"track_update_source": "object_detector",
"track_update_state": "not_updated"
}
]
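As an illustration of consuming these files, the following Python sketch loads a track output file and groups confident boxes by track. The 0.5 score threshold is an arbitrary example value, not something NNTC prescribes; a track_id of -1 marks an untracked detection, as in the detection output above.

```python
import json
from collections import defaultdict

# Load a Formatted Detections file written with --log_tracks=tracks.json.
with open("tracks.json") as f:
    records = json.load(f)

# Keep confident boxes and group them by track.
tracks = defaultdict(list)
for rec in records:
    if rec["score"] >= 0.5 and rec["track_id"] != -1:
        tracks[rec["track_id"]].append(rec["bbox"])

for track_id, boxes in tracks.items():
    print(f"track {track_id}: {len(boxes)} boxes")
```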
Top-Level Fields
The Formatted Detections format consists of an array of JSON objects, each representing a detection at a certain point in time. Some of the key/value pairs are optional: the minimal prototype above shows the typical required keys, and the detailed prototype shows all keys, including the optional ones.
| Key | Type | Example | Description |
|---|---|---|---|
| "area" | int | 327680 | Required. Area of the detected object in pixels; typically width * height. |
| "bbox" | list | [0,0,640,512] | Required. Location of the detected object in pixels: top-left x and y coordinates followed by the width and height, i.e. [x,y,width,height]. |
| "category_id" | int | 1 | Required. The category or class of the detected object in its numerical representation (starting from 1, not 0). For the string representation, look up the index in the label vector file. |
| "object_class_label" | string | "person" | Optional. Name of the object class ID, looked up in the label vector. |
| "score" | float | 0.95 | Required. "Confidence" score for the detection or track (0.0 to 1.0). |
| "image_filepath" | string | "data/haAffTiwkCuseTx9S.jpg" | File path of the image; can be relative to this file or absolute. Conditionally optional if "image_id" is present. |
| "image_id" | int | 123 | Conditionally optional. Unique ID for the image. If this is not present, "image_filepath" must be present. |
| "frame_number" | int | 123 | Optional. Integer representing the order in which the frame was read (i.e. the nth frame processed). For some datasets "image_id" is already assigned, so this may hold redundant information. |
| "timestamp" | int | 1652415948639 | Optional. System timestamp of when the frame was read, in Unix time with millisecond accuracy. NNTC outputs this field. |
| "fine_grained" | list | [ {..}, {..}, .. ] | Optional. List of objects containing fine-grained classifications of the object. This classifier is separate from the object detector, so it has its own label vector, category IDs, etc. The output is a further characterization of the object (e.g. if the object class is "vehicle", the fine-grained output tells you what kind of vehicle). |
| "net_time" | int | 1 | Deprecated. Inference time of the network in µs, originally used by scripts to calculate FPS and network speed. If this value is 1, timing info is not being written. |
| "det_time" | int | 1 | Deprecated. Combined time of several operations in µs: setup (copy inputs), inference, and cleanup (copy back results). If this value is 1, timing info is not being written. |
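A minimal sketch of how the required and conditionally-optional rules above could be checked; `validate_record` is a hypothetical helper, not part of NNTC.

```python
def validate_record(record):
    """Check the required fields from the table above."""
    missing = {"area", "bbox", "category_id", "score"} - record.keys()
    # "image_filepath" and "image_id" are conditionally optional:
    # at least one of the two must be present.
    if "image_filepath" not in record and "image_id" not in record:
        missing.add("image_filepath or image_id")
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not 0.0 <= record["score"] <= 1.0:
        raise ValueError(f"score out of range: {record['score']}")

validate_record({"area": 2291, "bbox": [245, 466, 29, 79],
                 "category_id": 1, "score": 0.96,
                 "image_filepath": "data/frame.jpg"})
```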
Label Vector
The label vector JSON file contains an array of class names. The class ID is the index of the class name, starting at 0.
Example label vector file:
{
"name": "ObjectClass",
"values": [
"background",
"person",
"bicycle",
"car",
"uav",
"vehicle",
"unknown",
"face"
]
}
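For example, resolving a category ID against this label vector in Python (the label_vector.json filename is an assumption):

```python
import json

with open("label_vector.json") as f:
    labels = json.load(f)["values"]

# Class IDs index the array starting at 0; with "background" at index 0,
# category_id 1 from the detection examples above resolves to "person".
print(labels[1])  # -> person
```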
Remap File
A remap file is a dictionary of input label names to output label names in JSON syntax.
Example remap file:
{
"car": "vehicle",
"motorcycle": "bicycle",
"truck": "vehicle",
"van": "vehicle",
"bus": "vehicle",
"other vehicle": "vehicle",
"drone": "uav",
"phantom": "uav",
"fixed wing": "uav",
"inspire 1": "uav",
"matrice 210": "uav",
"pedestrian": "person"
}
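Applying a remap in Python could look like the following sketch (the remap.json filename is an assumption); names with no entry pass through unchanged:

```python
import json

with open("remap.json") as f:
    remap = json.load(f)

def remap_label(label):
    # Labels without a remap entry pass through unchanged.
    return remap.get(label, label)

print(remap_label("truck"))   # -> vehicle
print(remap_label("person"))  # -> person (no entry, unchanged)
```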
Conservator Dataset
A JSON file that describes the videos, frames, and annotations of a dataset; the associated frame images are usually in the data or analyticsData subdirectory.
General structure of the index.json file:
{
"frames":
[
// Frame object
{
"annotations":
[
// Annotation object
{
"attributes":
[
// Attribute object
{ .. }
]
}
]
}
],
"videos":
[
// Video object
{ .. }
]
}
A Conservator file consists of the following key building blocks (a traversal sketch follows the list):
- Frame - Each image file in a dataset is a frame.
- Annotation - Describes a region in the image; can be a bounding box, point, or polygon.
- Source - Describes the origin of an annotation (user, tool, etc.).
- Attribute - Describes a property of an annotation.
- Video - Organizational mechanism that groups frames; a video has common attributes such as the sensor or temporal relations.
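As a traversal example, the sketch below walks frames, annotations, and labels to count annotations per label; it assumes the index.json layout shown above.

```python
import json
from collections import Counter

with open("index.json") as f:
    dataset = json.load(f)

# Walk frames -> annotations -> labels and count annotations per label.
counts = Counter()
for frame in dataset["frames"]:
    for annotation in frame.get("annotations", []):
        for label in annotation["labels"]:
            counts[label] += 1

print(counts.most_common())
```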
Full Examples
Conservator Dataset Example
{
"datasetId": "nsfhqYrxd4Nk3tGv6",
"datasetName": "dataset_rgb_img_val",
"owner": "name@company.com",
"version": 1,
"frames": [
{
"annotations": [
{
"boundingBox": { "h": 166, "w": 304, "x": 1079, "y": 656 },
"labels": [
"car"
],
"attributes": [
{
"_id": "MgtmF58vxKTSy3hqH",
"attributePrototypeId": "HR8P4QdTtYcMCe5vp",
"name": "color",
"options": [
"unknown",
"white",
"black",
"red",
...
],
"source": "Conservator",
"type": "radio",
"value": "red"
},
...
]
},
...
],
"datasetFrameId": "BqWJ7uGCivSmkkBfj",
"videoMetadata": {
"frameId": "KsfdGuvSpnYfDEK9L",
"frameIndex": 0,
"videoId": "GFsKBJbRzCdRPPpo9"
},
...
},
...
],
"videos": [
{
"id": "dkgizpZBCSA4WFkPz",
...
},
...
]
}
Top-Level Fields
{
"datasetId": String,
"datasetName": String,
"owner": String,
"version": int,
"frames": list,
"videos": list
}
| Key | Type | Example | Description |
|---|---|---|---|
| "datasetId" | string | "nsfhqYrxd4Nk3tGv6" | Required unique dataset identifier assigned by Conservator. |
| "datasetName" | string | "dataset_rgb_img_val" | Required human-readable name assigned to the dataset by a user. |
| "owner" | string | "name@company.com" | Required email address of the user who owns the dataset. |
| "version" | int | 1 | Required format version of the dataset file (currently just 1). |
| "frames" | list | [ {..}, {..} ] | Required list of Frame objects in the dataset. |
| "videos" | list | [ {..}, {..} ] | Required list of Video objects in the dataset. |
Frame
| Key | Type | Example | Description |
|---|---|---|---|
| "frameIndex" | int | 0 | Required. The zero-based frame index of the target frame. Must be greater than or equal to 0. The frameIndex must be unique and may not be greater than the highest frame index of the target video. |
| "annotations" | list | [ {..}, {..} ] | Optional list of zero or more Annotation objects. Providing an array of Annotation objects replaces all annotations in the current frame with the provided annotations. To remove all annotations for the current frame, provide an empty array. Omit this field completely to leave the existing annotations for the current frame intact. |
| "custom" | object | { .. } | Free-form object for storing custom frame metadata. Accepts any valid JSON object, including nested objects and arrays. |
Annotation
An annotation can be human-generated or machine-generated (described in the Source object).
| Key | Type | Example | Description |
|---|---|---|---|
| "targetId" | int | 123 | Optional integer that identifies the object from frame to frame. Subsequent frames of the same object in a video have the same targetId. This is useful in applications where the object is tracked over time and allows computation of metrics such as MOTA, MOTP, HOTA, etc. |
| "labels" | list | [ "person" ] | Required array of one non-empty string. Allowed characters are lower-case a-z, 0-9, hyphen, underscore, and space. Historically multiple labels were supported, but that functionality has been deprecated. |
| "labelId" | string | "rsAGiJREDEiyz4H2u" | Unique ID that identifies the label in the Label Set. |
| "boundingBox" | object | { "h": 512, "w": 640, "x": 0, "y": 0 } | Object that describes the position and size of a rectangular annotation in pixels: x and y are the coordinates of the top-left corner, w is the width, and h is the height. The origin (0,0) of the image is the top-left corner. One of the keys "boundingBox", "point", or "boundingPolygon" is conditionally required. |
| "point" | object | { "x": 0, "y": 0 } | Single Point object that describes the coordinates of a point in pixels. The origin (0,0) of the image is the top-left corner. |
| "boundingPolygon" | list | [ {"x": 0, "y": 0}, {"x": 10, "y": 10}, {"x": 0, "y": 10} ] | List of Point objects that make up a polygon. The origin (0,0) of the image is the top-left corner. |
| "attributes" | list | [ {..}, {..} ] | A list of Attribute objects that correspond to the annotation. For example, an attribute can describe the occlusion or truncation status of the object, or it can be used to specify a more detailed label: the label can be "vehicle" and an attribute can specify the type of vehicle, such as "car" or "truck". |
| "custom" | object | { .. } | Free-form object for storing custom annotation metadata. Accepts any valid JSON object, including nested objects and arrays. |
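As an illustration, a bounding-box annotation can be mapped onto the Formatted Detections format described earlier. `to_formatted_detection` is a hypothetical helper, and the category-ID lookup assumes a label vector's "values" array whose indices match the detection category IDs:

```python
def to_formatted_detection(annotation, image_filepath, label_vector):
    """Sketch: convert a Conservator bounding-box annotation into a
    Formatted Detections record."""
    box = annotation["boundingBox"]   # {"x", "y", "w", "h"} in pixels
    label = annotation["labels"][0]   # single-label convention (see above)
    return {
        "bbox": [box["x"], box["y"], box["w"], box["h"]],
        "area": box["w"] * box["h"],
        "category_id": label_vector.index(label),  # raises if label unknown
        "object_class_label": label,
        "score": 1.0,                 # ground truth, so full confidence
        "image_filepath": image_filepath,
    }
```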
Source
| Key | Type | Example | Description |
|---|---|---|---|
| "type" | string | "human" | Required string that describes the origin of the annotation: "human" (a human drew the annotation) or "machine" (the annotation came from a computer vision system, such as a CNN or tracker). |
| "meta" | object | { .. } | Additional context about the annotation, such as the user or the algorithm that was used to create the annotation. Example: "meta": { "tool": "conservator", "user": "Q5XkSBgAmASqLiRcS" } |
Attribute
Example attribute:
{
"_id": "pwGzbJYvXyPdJcJA8",
"attributePrototypeId": "2skd7FtnBtokA5nWc",
"name": "occluded",
"options": [
"No (Fully Visible)",
"1% - 70% Occluded (Partially Occluded)",
"70% - 90% Occluded (Difficult to see)"
],
"source": "Conservator",
"type": "radio",
"value": "No (Fully Visible)"
}
| Key | Type | Example | Description |
|---|---|---|---|
| "_id" | string | "pwGzbJYvXyPdJcJA8" | Required unique identifier of the attribute assigned by Conservator. |
| "attributePrototypeId" | string | "2skd7FtnBtokA5nWc" | Identifier of the attribute prototype that this attribute is based on. |
| "name" | string | "occluded" | Required human-readable description of the Attribute. In this example, "occluded" describes the amount that other objects in the scene are blocking this annotation. |
| "type" | string | "radio" | Required type of UI element used to enter the attribute: "radio" (only one option can be selected at a time); "checkbox?" - TODO. |
| "options" | list | [ "a", "b", "c" ] | A list of the available options for the Attribute. |
| "source" | string | "Conservator" | The tool that defined the Attribute. This is used to determine whether the Attribute came from Conservator or a 3rd-party tool such as LabelBox. |
| "value" | string | "a" | The value of the attribute. |
Video
| Key | Type | Example | Description |
|---|---|---|---|
| "id" | string | "pwGzbJYvXyPdJcJA8" | NOTE: This field (a unique identifier) is included for reference when exporting metadata, but is ignored when importing metadata into Conservator. |
| "filename" | string | "24mm_day_280.tiff" | Original filename that was uploaded to Conservator. |
| "name" | string | "24mm_day_280.tiff" | Human-readable name that describes the video. By default this is the same as the filename, but the name can be changed in the UI after upload. |
| "owner" | string | "name@company.com" | Email address of the user who owns the video. |
| "description" | string | | User-defined field with comments about the video. |
| "tags" | list | [ "a", "b", "c" ] | Optional array of zero or more non-empty strings. Allowed characters are lower-case a-z, 0-9, hyphen, underscore, and space. |
| "source" | string | "Conservator" | The tool that created the video. This is used to determine whether the video came from Conservator or a 3rd-party tool such as LabelBox. |
| "custom" | object | { .. } | Optional free-form object for storing custom video metadata. Accepts any valid JSON object, including nested objects and arrays. |
COCO Format
COCO is a large-scale object detection, segmentation, and captioning dataset developed by academic and industry partners.
In general, COCO does not establish a clear file organization; however, the following examples can be used as semi-standard schemes.
Datumaro follows this format; see the following guide.
./
├── images/
│ ├── {filename0}.jpg
│ ├── {filename1}.jpg
│ └── ..
└── annotations/instances_default.json
Similar to the Conservator format, the following structure can be used.
./
├── data/
│ ├── {filename0}.jpg
│ ├── {filename1}.jpg
│ └── ..
└── annotations_coco.json
The VisData import dialog looks for a simple pattern: if the annotation file name contains the term "coco", it is read as a COCO-formatted file.
General Annotation File Structure:
{
"info": {..},
"licenses": [
{
"id": 1,
"name": "Attribution-NonCommercial-ShareAlike License",
"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
},
..
],
"categories": [
..
{
"id": 2,
"name": "cat",
"supercategory": "animal",
"keypoints": ["nose", "head", .. ],
"skeleton": [[12, 14], [14, 16], .. ]
},
..
],
"images": [
{
"id": 1,
"license": 1,
"file_name": "<filename0>.<ext>",
"height": 480,
"width": 640,
"date_captured": null
},
..
],
"annotations": [
{
"id": 1,
"image_id": 1,
"category_id": 2,
"bbox": [260, 177, 231, 199],
"segmentation": [ .. ],
"keypoints": [224, 226, 2, .. ],
"num_keypoints": 10,
"score": 0.95,
"area": 45969,
"iscrowd": 0
},
..
]
}
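Because annotations reference images and categories by ID, consuming a COCO file usually starts with lookup tables, as in this sketch (the annotations_coco.json filename follows the second layout above):

```python
import json

with open("annotations_coco.json") as f:
    coco = json.load(f)

# Build ID -> name lookups for images and categories.
images = {img["id"]: img["file_name"] for img in coco["images"]}
categories = {cat["id"]: cat["name"] for cat in coco["categories"]}

for ann in coco["annotations"]:
    print(images[ann["image_id"]], categories[ann["category_id"]], ann["bbox"])
```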