ViPER Toolkit Software Requirements

This page describes what the software should do when it is completed. If you are a developer and want to modify this document, please read the howto document for editing this page. The system requirements (what software and hardware are required to run the toolkit) are listed on the home page and in the quick-start document.

Three related projects, the Pure Java MPEG Decoder, the chronicle widget, and the LAMP AppLoader, have their own requirements pages. Here are links to the JMPEG requirements, the chronicle requirements, and the AppLoader requirements.

ViPER Goals

ViPER is designed to support video processing evaluation. To a lesser extent, it should support document, audio, and other media processing evaluation. This support is broadly divided into two categories: ground truth management and evaluation.

It should be noted that a third category, workflow management and algorithm execution, which is included with many other such systems, is not a current goal of the system. This includes things like automatic execution of algorithms and creation of ground truth from a source document. For examples, see Tapas Kanungo's work with PSET, or Chhabra's and Phillips's Benchmark for Graphics Recognition Systems. We mostly use ViPER to allow different groups to run algorithms and mail us their results, allowing them to use their own computers.

These requirements are goals for the next version of ViPER, v4. This revision will make radical changes to the ground truth tool to support new visualizations and, hopefully, a cleaner workflow. There will also be minor changes to the performance evaluation tool, including the introduction of a fourth evaluation paradigm for activity detection.

Goals for Metadata Interface
- Must maintain current ease of use, with three standard descriptor types.
- Old ViPER data must be importable into any new file format.
- The data must be searchable. This includes direct queries and searching for text strings.
- Sequences of media files must be supported.
- Editing instance data must be supported.
- Editing configuration should be supported.
- It should be easy to add new data types.
- The format should allow versioning, for support of undo/redo.
- The data must support indexing by time or by frame number.
- Relations and hierarchies should be supported. These are useful in defining activities and events, for example.
- Generated attributes and objects should be supported, through a generalized rule set or scripting language.
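The requirement that data be indexable by time or by frame number can be illustrated with a small conversion sketch. This is only an illustration under stated assumptions: the function names are hypothetical and not part of the ViPER API, frames are assumed to be numbered from 1, and the frame rate is assumed constant. Exact rational arithmetic is used so that rates such as 30000/1001 do not drift.

```python
import math
from fractions import Fraction


def frame_to_time(frame: int, fps: Fraction, first_frame: int = 1) -> Fraction:
    """Return the start time, in seconds, of the given frame (hypothetical helper)."""
    return Fraction(frame - first_frame) / fps


def time_to_frame(seconds: Fraction, fps: Fraction, first_frame: int = 1) -> int:
    """Return the number of the frame being shown at the given time."""
    # A frame is shown from its start time until the next frame starts,
    # so the elapsed frame count is rounded down.
    return math.floor(seconds * fps) + first_frame


if __name__ == "__main__":
    ntsc = Fraction(30000, 1001)  # assumed constant frame rate
    t = frame_to_time(900, ntsc)
    print(float(t), time_to_frame(t, ntsc))
```

With a mapping like this, a descriptor stored against frame numbers can also be queried by time, and the reverse, provided the media really has a constant frame rate.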
Goals for GT
- It must allow the user to view all data contained in a GT file.
- It must provide a graphical interface to the data.
- The graphical interface must overlay the metadata on the source media, or otherwise juxtapose it, when appropriate.
- The user must be allowed to edit the metadata.
- The interface should support undo/redo.
- The interface should support importing multiple metadata files.
- The interface should support editing the metadata configuration.
- The interface should contain tools to allow a user to mark up a file quickly, possibly as fast as real time.
- The interface should contain tools to allow a user to verify a set of ground truth quickly, preferably in real time.
- The interface may support drag and drop, and any other direct manipulation UI metaphors.
Goals for PE
- This tool must provide a simple method for comparing a set of result data to ground truth data.
- It must support evaluation of object tracking algorithms.
- It must support evaluation of object detection algorithms.
- It must support activity and scenario detection evaluation.
- It must allow the user to select what subsets of data are appropriate for evaluation.
- It must support a user directly comparing several different algorithms against the same corpus.
- It should be easy to use.
- It should support clear graphical output, or exporting to Excel or some other file format.

Definitions

ViPER: The Video Performance Evaluation Toolkit provides a set of tools for evaluating video processing algorithms, such as person tracking and text detection.
Candidate: Also known as result data, this is the set of descriptors generated by some algorithm that will be compared to the target data.
Target: Also known as truth data, this is the set of descriptors that represent the true content of the media file.

ViPER-GT: The Ground Truth Authoring and Viewing Tool

Remember Ben Shneiderman's infoviz mantra:

Overview First
Zoom + Filter
Details on Demand

Core System
1. Metadata Management
  1. The core system must allow the plug-ins to access the metadata via the ViPER API.
  2. The core system must support loading any valid ViPER file and saving it back, so that the two files are equivalent.
  3. The system should have a history of user files, to allow quick loading of recently used files.
  4. The system should be able to handle loading multiple files at a time, for editing and comparing results, and switching between related files in a project (e.g. different camera angles).
  5. The system may support inserting other files or merging two metadata files. If the GT system doesn't support this, a separate tool must exist for this kind of functionality (it is often necessary to divide work among multiple truth editors, and merging is a necessary step).
2. User Interface Mediation for Modules
  1. There must be a means of reducing clutter and focusing the user's attention on the work at hand. This includes a method for selecting what file to work on, what frames are important, and what descriptors are relevant to the user's task. An example would be an option to show only descriptors that are relevant to another descriptor, that appear on a certain frame, or that contain a certain lvalue.
  2. There should be a method for supporting DnD. 694122
3. Basic Graphical User Interface
  1. The menus and windows must follow standard guidelines (e.g. the Windows guidelines).
  2. The main window must show the current sourcefile selected in its title bar. 694123
  3. The user should have access to a list of most recently used files, to save time looking for files to open.
4. User Preferences Management
  1. Different modules should have their own preferences, accessible in the same manner or through the same interface.
  2. The user should be able to specify their hotkey preferences. 694133
5. Undo/Redo 694113
  1. Undo/redo should be accessible through the standard hotkeys and the standard place in the menu.
  2. The undo manager should have a user interface to display multiple levels of undo.
  3. Any undo manager may support transactions, to allow multiple actions to be bundled into one by a script.
6. Documentation
  1. There must exist a clear design specification of the core system, allowing plug-in writers a clear handle on how to extend the app.
  2. The users must have a manual that describes the system from the perspective of a ground truth author or designer.
7. Ontology Support
8. Help System
  1. There must be some form of online help, if only a link to the documentation.
  2. There should exist context-sensitive help, explaining certain features.
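The undo/redo requirements above (multiple levels of undo, plus optional transactions that bundle several actions into one step) can be sketched as follows. This is a minimal illustration, not the design of the ViPER core: the class name `UndoManager`, its methods, and the `set_value` helper are all hypothetical, and nested transactions are deliberately not handled.

```python
class UndoManager:
    """Hypothetical multi-level undo manager with simple transactions."""

    def __init__(self):
        self._undo = []    # each entry is a list of (apply, revert) pairs
        self._redo = []
        self._open = None  # actions collected by the open transaction, if any

    def begin(self):
        """Start bundling subsequent actions into one undo step."""
        self._open = []

    def commit(self):
        """Close the transaction and record it as a single undo step."""
        if self._open:
            self._undo.append(self._open)
            self._redo.clear()
        self._open = None

    def do(self, apply, revert):
        """Perform an action and remember how to revert it."""
        apply()
        if self._open is not None:
            self._open.append((apply, revert))
        else:
            self._undo.append([(apply, revert)])
            self._redo.clear()

    def undo(self):
        if not self._undo:
            return False
        group = self._undo.pop()
        for _, revert in reversed(group):  # revert in reverse order
            revert()
        self._redo.append(group)
        return True

    def redo(self):
        if not self._redo:
            return False
        group = self._redo.pop()
        for apply, _ in group:
            apply()
        self._undo.append(group)
        return True

    def levels(self):
        """Number of undo steps available, e.g. for display in a UI."""
        return len(self._undo)


def set_value(store, key, new):
    """Build an (apply, revert) pair that sets store[key]; illustrative only."""
    missing = object()
    old = store.get(key, missing)

    def apply():
        store[key] = new

    def revert():
        if old is missing:
            store.pop(key, None)
        else:
            store[key] = old

    return apply, revert
```

A script that moves a box would wrap its changes to `x` and `y` between `begin()` and `commit()`, so that a single undo restores both values.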
Canvas Module
1. A media canvas must exist, which presents a view of the current file and frame, with the metadata appropriately represented directly upon it, and directly manipulatable when possible.
2. The canvas must be frame accurate: it must display frame x as defined by the media format specification, and be consistent with time as well.
3. The canvas should be fast, displaying the new frame less than a second after the user requests a change. It would be preferable to allow real-time playback of video, as well.
4. The canvas should use standard vector manipulation UI metaphors and gestures, unless there is a clear reason to use different ones. 694134
5. There should be a way to select multiple objects on the canvas and manipulate them at once. 694114
6. There should be a way for the user to customize colors or to associate colors with each instance of an object. 694120
7. The shapes and so forth should be drawn appropriately to the scale of the zoom. 694124
8. In order to support more clarity while editing ground truth, the canvas should allow the user to manipulate the image quality, e.g. contrast, gamma, and possibly sharpness. 694126
9. Should have a refresh or snapback button to go to a preferred view clear of clutter or pixel effluvia. 694128
10. Should have standard video controls to allow real-time scan of video for errors. 694130
11. It would be nice if there were a way to create a movie of the canvas. 694106
12. Should have a way of tracing object tracks through a video at near-real-time. 793929
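Drawing shapes at the scale of the zoom, while keeping them directly manipulable, comes down to a consistent mapping between image coordinates (where the metadata lives) and screen coordinates (where the user clicks). The sketch below shows one common way to do this, assuming a uniform zoom plus a pan offset. The `View` class and its methods are hypothetical names, not part of the toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Hypothetical canvas view: uniform zoom plus the screen position of the image origin."""
    zoom: float = 1.0
    pan_x: float = 0.0
    pan_y: float = 0.0

    def to_screen(self, x, y):
        """Map an image-space point to screen space for drawing."""
        return (x * self.zoom + self.pan_x, y * self.zoom + self.pan_y)

    def to_image(self, sx, sy):
        """Map a screen-space point (e.g. a mouse click) back to image space."""
        return ((sx - self.pan_x) / self.zoom, (sy - self.pan_y) / self.zoom)

    def zoom_about(self, sx, sy, factor):
        """Return a new view zoomed by `factor` that keeps screen point (sx, sy) fixed."""
        ix, iy = self.to_image(sx, sy)
        z = self.zoom * factor
        return View(z, sx - ix * z, sy - iy * z)
```

Keeping the metadata in image coordinates and converting only at draw and hit-test time means edits remain frame accurate regardless of the zoom level.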
Timeline Module 694089
1. The new system must have a timeline (see LifeLines or OntoLog to get an idea of what a timeline view is) that shows an overview of the metadata along the t dimension.
2. The timeline must support direct manipulation paradigms of some sort.
3. The timeline must provide all the functionality currently provided by the range slider and the buttons beneath it. This includes:
  1. The user can use the timeline widget to scrub through time, dragging a slider, or something, to skip around through the video. The change must be reflected in the canvas.
  2. The user must be able to use the canvas to select frames, down to frame level precision. This may involve the use of a type-in box, or some sort of enhanced fisheye scrubbing.
  3. The user should be able to use the timeline to propagate descriptor data across frames.
  4. The user must be able to use the timeline to interpolate a descriptor between frames.
4. The display should support dynamic-query type manipulation to display only lines that are relevant to the user.
5. The timeline should support zooming along the time axis. It may support zooming along the orthogonal axis, but that may prove distracting.
6. The timeline should present the lines in a coherent ordering, and allow the user to alter the ordering (either with direct manipulation or with some sort of dynamic query type functionality).
7. The timeline should have thumbnails of important frames (to the user). 694131
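The propagate and interpolate operations required of the timeline can be sketched for the simple case of a bounding box. This is an illustration only: it assumes boxes are `(x, y, width, height)` tuples with integer coordinates and that interpolation is linear between two keyframes. The function names are hypothetical.

```python
def propagate(box, first_frame, last_frame):
    """Copy one box unchanged onto every frame in [first_frame, last_frame]."""
    return {frame: box for frame in range(first_frame, last_frame + 1)}


def interpolate_box(frame, key_a, key_b):
    """Linearly interpolate a box between two (frame, box) keyframes.

    Boxes are assumed to be (x, y, width, height) tuples.
    """
    (fa, box_a), (fb, box_b) = key_a, key_b
    if fa >= fb:
        raise ValueError("keyframes must be in increasing frame order")
    if not fa <= frame <= fb:
        raise ValueError("frame lies outside the keyframe span")
    t = (frame - fa) / (fb - fa)  # 0.0 at key_a, 1.0 at key_b
    # Round to whole pixels, assuming integer box coordinates.
    return tuple(round(a + t * (b - a)) for a, b in zip(box_a, box_b))


if __name__ == "__main__":
    start = (10, (0, 0, 20, 40))
    end = (20, (100, 50, 40, 40))
    for f in (10, 15, 20):
        print(f, interpolate_box(f, start, end))
```

An annotator would mark the object on two frames and let the tool fill in the frames between, correcting by hand only where the motion is not close to linear.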
Table Module (See the spec)
1. The table module presents a spreadsheet-like table view of all instances of a descriptor type or set of descriptor types.
2. The table module must allow direct editing of all attribute values.
3. The table must have tabs for each object descriptor type, one for all content descriptors, and one for the file descriptor. 694118
4. It should be possible to hide selected attributes from the view. This will help to reduce clutter and increase user focus. 694116
Property Sheet Module
1. The property sheets must provide an editable overview of a single descriptor. It may show the descriptor over the course of its life, or for the currently focussed frame.
Ontology/Configuration Editor Module
1. The system needs a config editor. The current method of editing .gtc files is a significant barrier to the utility of the toolkit.
Tree View of Instances 694117
1. The new GT tool should include a tree view of the instances, which will allow the user to select the appropriate ones and to compare them.

ViPER-PE: A Tool for Performance Evaluation

1. The performance evaluation tool must take in two different representations of metadata for the same source media file and present the user with a numeric valuation of the distance between the two metadata sets.
2. The performance evaluation tool must support multiple evaluations and scripted evaluation runs.
3. The performance evaluation tool must support user defined types and evaluation techniques.
4. The performance evaluation tool must allow the user to evaluate only the section or sections of the truth that are relevant to the user's interest.
5. The performance evaluation tool must support the ViPER data format.
6. The user may set parameters to indicate that different metadata classes represent the same thing.
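To make the requirements above concrete, here is a small sketch of a numeric distance between target and candidate data for one object. It restricts the evaluation to selected frames (requirement 4) and uses a user-supplied mapping between class names (requirement 6). Everything here is an assumption for illustration: the Jaccard distance is just one possible metric and is not claimed to be the one ViPER-PE uses, the data layout (a dict from frame number to `(class_name, box)`) is invented, and all function names are hypothetical.

```python
def box_area(box):
    _, _, w, h = box
    return max(w, 0) * max(h, 0)


def intersection_area(a, b):
    """Area shared by two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def jaccard_distance(a, b):
    """1 - intersection/union: 0.0 for identical boxes, 1.0 for disjoint ones."""
    inter = intersection_area(a, b)
    union = box_area(a) + box_area(b) - inter
    return 1.0 if union == 0 else 1.0 - inter / union


def mean_distance(target, candidate, frames, equivalent=None):
    """Average per-frame distance over the selected frames.

    `target` and `candidate` map frame -> (class_name, box).
    `equivalent` maps candidate class names to target class names.
    """
    equivalent = equivalent or {}
    distances = []
    for frame in frames:
        if frame not in target:
            continue  # no truth on this frame, so nothing to evaluate
        t_label, t_box = target[frame]
        if frame not in candidate:
            distances.append(1.0)  # missed detection counts as maximal distance
            continue
        c_label, c_box = candidate[frame]
        c_label = equivalent.get(c_label, c_label)
        same_class = c_label == t_label
        distances.append(jaccard_distance(t_box, c_box) if same_class else 1.0)
    if not distances:
        raise ValueError("no target data in the selected frames")
    return sum(distances) / len(distances)
```

For example, a candidate file that labels its objects `Human` can be scored against truth labelled `PERSON` by passing `equivalent={"Human": "PERSON"}`; without the mapping, every frame would count as a class mismatch.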

ViPER-Viz: Scripts and Tools for Analyzing Evaluations