

If you don't have BoxJelly installed, please see the installation instructions here.


Once BoxJelly is installed, you can run it from the command line:


Cthulhu configuration (required)

Recently, BoxJelly ditched its internal video player in favor of Cthulhu. This integration is still in development and has some limitations.

As a result, you must configure Cthulhu and BoxJelly before they can be used together. The following configuration is required:

  1. Set the BoxJelly framerate. Cthulhu does not report video framerate, so a default of 29.97 is assumed. This is configurable in the BoxJelly settings (Ctrl+,).
  2. Set the Cthulhu global duration. Set the appropriate duration for localizations in the Cthulhu "Annotations" settings. Normally, this is 1000/fps. If you notice flickering, you may need to increase this value. If you notice overlapping boxes within the same track, you may need to decrease this value.
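
As a worked example of the 1000/fps rule in step 2, the sketch below computes the per-frame duration for a given framerate. It assumes the Cthulhu duration is expressed in milliseconds, which the rule implies but this guide does not state; the function name is hypothetical and is not part of BoxJelly or Cthulhu.

```python
def localization_duration_ms(fps: float) -> float:
    """Return the duration of a single frame in milliseconds (1000 / fps)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# BoxJelly's assumed default framerate (assumption: duration unit is ms).
print(round(localization_duration_ms(29.97), 2))  # prints 33.37
```

A value slightly above this result may help if boxes flicker; a value slightly below it may help if boxes from the same track overlap.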


Before loading anything in BoxJelly, you should ensure that the settings are correct. Select Settings from the File menu or press Ctrl+,.


Open a video and track file

From the main window, navigate to the File menu and select Open, or press Ctrl+O.


A dialog will appear asking you to select a video and track file.


BoxJelly can open a video either from your local filesystem or from a URL.

Track file

BoxJelly currently only supports local track files. There are several track file formats available, documented here. If you have a suggestion or need for another format, please create an issue in the BoxJelly repository.

YOLOv5-DeepSort (.txt)

This format is a modification of the MOT challenge format to suit the MBARI VARS Annotation Assistance (VARS-AA/VAA) project. Each line in the file contains the following space-delimited information:

<frame> <id> <bb_left> <bb_top> <bb_width> <bb_height> <conf> <label>
  • <frame> is the frame number of the track.
  • <id> is the integer track ID. Internally, these are remapped to UUIDs for consistency with the other track formats.
  • <bb_left> is the left bounding box coordinate.
  • <bb_top> is the top bounding box coordinate.
  • <bb_width> is the width of the bounding box.
  • <bb_height> is the height of the bounding box.
  • <conf> is the confidence of the bounding box, a floating-point number between 0 and 1.
  • <label> is the label of the bounding box. This can be any string, including spaces.
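
Because the label may contain spaces, a parser cannot simply split each line on every space. The following is a minimal sketch of one way to read a line in this format, not BoxJelly's own parser; the `Detection` class and `parse_line` function are hypothetical names, and parsing the box coordinates as floats is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    frame: int
    track_id: int
    left: float
    top: float
    width: float
    height: float
    confidence: float
    label: str


def parse_line(line: str) -> Detection:
    """Parse '<frame> <id> <bb_left> <bb_top> <bb_width> <bb_height> <conf> <label>'."""
    # Split at most 7 times so the label (8th field) keeps its internal spaces.
    parts = line.strip().split(maxsplit=7)
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}: {line!r}")
    frame, track_id, left, top, width, height, conf, label = parts
    return Detection(
        frame=int(frame),
        track_id=int(track_id),
        left=float(left),      # assumption: coordinates may be fractional
        top=float(top),
        width=float(width),
        height=float(height),
        confidence=float(conf),
        label=label,
    )


detection = parse_line("12 3 100 50 40 30 0.87 sea star")
print(detection.label)  # prints sea star
```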

Deepsea-Track (.tar.gz)

This format is the default output of the VAA deepsea-track stack. The .tar.gz archive must contain one file per frame, named f<frame number>.json; e.g.: f0000001.json. The JSON schema is documented within the deepsea-track repository.
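
To illustrate the archive layout, the sketch below reads every f<frame number>.json member of a .tar.gz archive into a dictionary keyed by frame number. It is an assumption-laden example rather than BoxJelly's loader: `read_frames` is a hypothetical name, and the JSON is loaded without interpreting its schema, which is documented in the deepsea-track repository.

```python
import json
import re
import tarfile
from pathlib import PurePosixPath

# Matches member file names such as f0000001.json and captures the frame number.
FRAME_NAME = re.compile(r"f(\d+)\.json")


def read_frames(archive_file) -> dict:
    """Map frame number -> parsed JSON for each f<frame number>.json member.

    `archive_file` is a binary file object opened on the .tar.gz archive.
    """
    frames = {}
    with tarfile.open(fileobj=archive_file, mode="r:gz") as tar:
        for member in tar.getmembers():
            match = FRAME_NAME.fullmatch(PurePosixPath(member.name).name)
            if not member.isfile() or match is None:
                continue  # skip directories and unrelated files
            with tar.extractfile(member) as fh:
                frames[int(match.group(1))] = json.load(fh)
    return frames
```

It could be called as `read_frames(open(path, "rb"))`, where `path` points to a Deepsea-Track archive.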

JSON (.json)

This format is a more concise, human-readable representation of tracks as JSON and serves mostly for debugging purposes. Notably, a track in this format contains an internal list of its own detections, as opposed to the alternative formats, which maintain a flat representation of detections by frame.
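
The difference between the two representations can be sketched as a grouping step: flat per-frame detections are collected into tracks that each own a list of detections. The field names used here (`track_id`, `frame`, `id`, `detections`) are hypothetical and chosen for illustration; they are not BoxJelly's actual JSON schema.

```python
from collections import defaultdict


def group_by_track(flat_detections):
    """Convert a flat list of detections into a list of tracks.

    Each input item is a dict with (hypothetical) keys 'track_id' and 'frame'.
    Each output track is a dict holding its own list of detections.
    """
    grouped = defaultdict(list)
    for detection in flat_detections:
        grouped[detection["track_id"]].append(detection)
    return [
        # Order each track's detections by frame number.
        {"id": track_id, "detections": sorted(dets, key=lambda d: d["frame"])}
        for track_id, dets in grouped.items()
    ]
```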

BoxJelly (track) window

The main window of BoxJelly shows the track panel, consisting of the track list (left) and track timeline (right). The track list shows the ID and label of each track, whereas the track timeline shows the portion of the video spanned by each track. These two views are synchronized.

Track panel

Track list

The track list allows selection of tracks. Click on the track's entry in the list to select it. Ctrl and Shift can be used to select multiple tracks.

Note: Due to a limitation in Cthulhu's interface, selecting tracks can be slow and BoxJelly may freeze for a few seconds until Cthulhu can process the selection.

For aesthetic reasons, track IDs (UUIDs) are truncated to their first 8 characters when displayed. Hovering over a track in the list will show the track's full UUID.

Track timeline

Clicking on a track in the timeline will select it.

In addition, the ruler at the top of the timeline shows frame numbers. Clicking anywhere on the ruler will seek the video to the desired frame.

Scrolling up/down in the track timeline will show earlier/later tracks by start frame.

Scrolling left/right in the track timeline will move the displayed window of time earlier/later in the video. This window can be rescaled by Ctrl+scrolling.

A red line indicates the current frame. When the video is playing, the view will scroll automatically to keep the current frame in view.

Cthulhu (video) window

When a video is loaded, Cthulhu will open a video window. This window is synchronized with the BoxJelly window.

Video window

Editing tracks

BoxJelly currently offers a number of ways to edit tracks. All editing actions are available from the Edit menu. Additionally, all actions can be undone with Ctrl+Z and redone with Ctrl+Shift+Z.

Relabel (Ctrl+R)

Select a track or multiple tracks and click on the Relabel button in the Edit menu. A dialog will appear asking you to enter a new label.


Delete (Del)

Select a track or multiple tracks and click on the Delete button in the Edit menu. The selected tracks will be removed from the track list and the track timeline.

Split (Ctrl+N)

Select a track and seek the video to the moment where you would like to split the track, then click on the Split button in the Edit menu. A new track will be created at the given moment.
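
Conceptually, a split partitions a track's detections around the current frame. The sketch below shows one way this could work and is not BoxJelly's implementation: the dict layout (`id`, `label`, `detections`, `frame`) is hypothetical, and it assumes that the detection at the split frame goes to the new track and that the new track keeps the original label.

```python
import uuid


def split_track(track, frame):
    """Split `track` at `frame`, returning (original_part, new_track)."""
    before = [d for d in track["detections"] if d["frame"] < frame]
    after = [d for d in track["detections"] if d["frame"] >= frame]
    if not before or not after:
        raise ValueError("split frame must fall inside the track")
    original_part = {**track, "detections": before}
    # Assumption: the new track gets a fresh UUID and inherits the label.
    new_track = {**track, "id": str(uuid.uuid4()), "detections": after}
    return original_part, new_track
```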

Merge (Ctrl+M)

Select two or more tracks and click on the Merge button in the Edit menu. The tracks will be merged into a single track with a new ID.
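
A merge can be pictured as pooling the detections of the selected tracks under a newly generated ID. The following sketch is illustrative only and is not BoxJelly's implementation: the dict layout is hypothetical, taking the label from the first selected track is an assumption, and detections that share a frame are all kept rather than resolved.

```python
import uuid


def merge_tracks(tracks):
    """Merge two or more tracks into one track with a new ID."""
    if len(tracks) < 2:
        raise ValueError("select at least two tracks to merge")
    detections = sorted(
        (d for track in tracks for d in track["detections"]),
        key=lambda d: d["frame"],
    )
    return {
        "id": str(uuid.uuid4()),      # the merged track receives a new ID
        "label": tracks[0]["label"],  # assumption: keep the first track's label
        "detections": detections,
    }
```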

Save tracks

BoxJelly will save tracks in the same format as the track file you opened.

You can save to the same location with the Save (Ctrl+S) button in the File menu, or you can save to a different location with the Save As (Ctrl+Shift+S) button. The Save As button will prompt you to select a location to save the track file.