Media Processing Evaluation Weblog

Something's been really, really annoying me

Okay, so I talked with Ilya this afternoon about getting ViPER-PE to perform per his requirements. He needs a list of all objects that are identified at all in a video. He doesn't much care that the objects are identified well. Originally I suggested a framewise evaluation, but he doesn't care if an object is identified on each frame, just that it is identified sometime during its existence in the field of view. So, he should logically use an object evaluation with high thresholds and no statistical evaluation. But he has identified a few bugs...

The first bug is that some items are making it through localization. I hadn't noticed it, because they are caught during the statistical match. I've tracked this down to a bug in the 'clear' method of the 'FrameSpan' object, where it wasn't always clearing values. I've added a few more tests to the AttributeTest unit test set to exercise the framespan more.

The second bug arises in MULTIPLE matching, where some cromulent matches are thrown out... MULTIPLE is almost what Ilya wants, although NONE is a better choice. I'll look into fixing that; I think what he really wants is an asymmetric multiple match, like multi-greedy, where each target descriptor is matched to 1..k candidates, and each candidate is matched to at most one target.
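
To make the asymmetric idea concrete, here is a minimal sketch of one way such a match could work. It is not ViPER-PE code: the class name, the method, the distance matrix layout and the assumption that a lower distance means a better match are all made up for illustration. Pairs that pass the threshold are taken best-first; a target keeps collecting candidates until it has k, while a candidate is handed out only once.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not ViPER-PE code: class and method names are made up.
final class AsymmetricGreedyMatch {

    /**
     * distance[t][c] is the distance between target t and candidate c
     * (assumed: lower is better). Returns, for each target, the candidates
     * assigned to it: at most k per target, and no candidate used twice.
     */
    static List<List<Integer>> match(double[][] distance, double threshold, int k) {
        // Collect every target/candidate pair that passes the threshold.
        List<int[]> pairs = new ArrayList<>();
        for (int t = 0; t < distance.length; t++) {
            for (int c = 0; c < distance[t].length; c++) {
                if (distance[t][c] <= threshold) {
                    pairs.add(new int[] {t, c});
                }
            }
        }
        // Greedy order: best (smallest distance) pairs first.
        pairs.sort(Comparator.comparingDouble((int[] p) -> distance[p[0]][p[1]]));

        List<List<Integer>> assigned = new ArrayList<>();
        for (int t = 0; t < distance.length; t++) {
            assigned.add(new ArrayList<>());
        }
        Set<Integer> usedCandidates = new HashSet<>();
        for (int[] p : pairs) {
            List<Integer> forTarget = assigned.get(p[0]);
            // Asymmetry: a target may collect up to k candidates,
            // but a candidate belongs to at most one target.
            if (forTarget.size() < k && usedCandidates.add(p[1])) {
                forTarget.add(p[1]);
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        double[][] distance = {
            {0.10, 0.20, 0.90},
            {0.15, 0.80, 0.30},
        };
        System.out.println(match(distance, 0.5, 2));
    }
}
```

With the sample matrix, candidate 0 is close to both targets but goes to target 0 (0.10 beats 0.15), so the program prints [[0, 1], [2]]. A target left with an empty list would count as missed.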

Two Useful features

Talking with Raj today as he gets deeper into annotating the HAI data, I mentioned two features that I'd always been meaning to add: 'E-mail Logs to viper-bugs' and 'Zoom to Selection'. I already have zoom to selection working on the person editor, so it shouldn't be too hard to add to viper-proper. The email thing might be harder; hopefully I will be able to use the JavaMail API.

truth

A video sequence can represent any number of persons, activities, and objects. The goal of a video understanding algorithm is to automatically extract information - allowing a machine to perform a dull task that would take dozens of man hours per hour of footage. ViPER-GT is a system for that dull task. It allows a user to define a set of data that a video represents, and mark it up in painstaking, frame-by-frame detail. In a sense, ViPER-GT is a video annotation tool, in the same vein as VideoAnnEx or VideoMiner. However, it is more accurate to compare it to Anvil or other tools for performance evaluation. I will present information about ViPER-GT's predecessors (and descendants), information about its design and implementation, some quantification of the effort required to use ViPER per quality of output, and some use cases.

Quartz 2D Extreme + Viper = Bad

Okay, so I've finally gotten around to installing Mac OS X 10.4 on my Mac, and one of the first things I did was enable Quartz 2D Extreme. This enables quick drawing. However, it also made viper's zooming features behave in... questionable ways. It may be my video card (ATI Mobility 9700 with 128MB of RAM), or it could be a driver issue, or something to do with Java in Mac OS X. When you zoom in, the resolution is incorrect - the bitmap is rendered at whatever the last resolution was. This is also independent of Java version - it happens on 1.5 as well as 1.4.2. (1.5 seems faster, but I didn't test that.) I've also noticed a similar problem in iMovie when Q2DE is enabled and I resize the application window. It seems that an image at a specific resolution is being placed in the video RAM, and then being reused after it should have been replaced.

scripts manager

ViPER-GT needs a utility to manage user scripts. Right now, the user installs scripts by placing them in the ~/.viper/scripts directory. This is a lot of work - especially because most file managers hide the .viper directory. A user should be able to install, upgrade and remove scripts from within viper, and maybe browse for new ones. Also, it should be possible to bind a script to a hotkey.

The scripts are already somewhat self-describing; they include a method to get their name. Improvements would be to get a default bundle or classification (for grouping scripts when they become too numerous for a single menu, or tags for browsing and discovery purposes). Other than that, the scripts shouldn't require too many changes.

There are a few things I want to emulate: Firefox, jEdit and Eclipse all have pretty much what I want, in the form of their plug-in managers. Of them, jEdit is probably the most useful to us directly; it is written in Swing and has a compatible license, so some of the code might be usable directly.

Required Preference Items

I would like for the scripts manager to make use of the existing infrastructure as much as possible. As such, it is a good idea to get a handle on the PrefsManager, which is described somewhat in the AppLoader specification. Basically, the PrefsManager is a layer on top of Jena to provide some functionality that was missing when I started using it. Jena is an RDF triplestore for Java. RDF is a good way to keep track of a lot of semi-structured information. XML presents a rooted tree, while RDF's data model is a graph. An RDF document is a list of subject, predicate, object triples. This makes describing things with lots of loopy links easier, e.g. describing the connections between JavaBeans in ViPER-GT.

I also find it to be a lot easier to read and more compact than XML, at least in n3 form. (It seems that most of the data I've found on the web is in RDF/XML, which is sort of the worst of both worlds.) Anyway, extending ViPER-GT in any way that involves munging more than one simple component usually means editing a lot of n3, so it is important to understand how to read it and how ViPER-GT uses it, both in the apploader packages and in some of the beans that use PrefsManager directly, like the UndoManager (which uses prefs to look up text strings for each undo item). For a simple example of using the user prefs file to track some information, look at how ViperViewMediator's getLocalPathToFile and putCanonicalToLocalMapping keep track of where a file referenced in a gt file can be found on the user's local disk.

Basically, to add elements to the RDF model, you use the changeUser(toRemove,toAdd) method on the PrefsManager. This makes sure the change happens atomically and that the appropriate listeners are notified. Later, to get information, query the unified model field directly, using the normal Jena methods (e.g. getResource and listStatements). You will need to define new properties, such as 'version' and 'last-update-check-time', and probably a 'Script' RDF class.
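
The remove-then-add pattern can be sketched without any libraries. The following is a plain-Java stand-in, not Jena and not the real PrefsManager: the class, its methods and the 'script:overlay' subject are hypothetical, and only the 'version' property name comes from the list above. It just shows the shape of the idea, where the removals and additions land together and listeners hear about the change once, after it is complete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java stand-in for illustration only: this is NOT Jena and NOT the
// real PrefsManager; every name here is hypothetical.
final class TripleStoreSketch {
    private final Set<List<String>> triples = new LinkedHashSet<>();
    private final List<Runnable> listeners = new ArrayList<>();

    /** A triple is modelled as a three-element list: subject, predicate, object. */
    static List<String> triple(String subject, String predicate, String object) {
        return Arrays.asList(subject, predicate, object);
    }

    void addListener(Runnable listener) {
        listeners.add(listener);
    }

    /** Removes and adds in one locked step, then notifies each listener once. */
    void change(Collection<List<String>> toRemove, Collection<List<String>> toAdd) {
        synchronized (this) {
            triples.removeAll(toRemove);
            triples.addAll(toAdd);
        }
        for (Runnable listener : listeners) {
            listener.run();
        }
    }

    /** Returns the objects of every triple with the given subject and predicate. */
    synchronized List<String> objectsOf(String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (List<String> t : triples) {
            if (t.get(0).equals(subject) && t.get(1).equals(predicate)) {
                result.add(t.get(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TripleStoreSketch prefs = new TripleStoreSketch();
        prefs.addListener(() -> System.out.println("prefs changed"));
        List<String> v10 = triple("script:overlay", "version", "1.0");
        List<String> v11 = triple("script:overlay", "version", "1.1");
        prefs.change(new ArrayList<>(), Arrays.asList(v10));
        // Upgrade: the old version statement goes away as the new one arrives.
        prefs.change(Arrays.asList(v10), Arrays.asList(v11));
        System.out.println(prefs.objectsOf("script:overlay", "version"));
    }
}
```

In the real thing the query side would go through Jena's own methods on the unified model rather than a hand-rolled loop like objectsOf.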

Where possible, you should use existing techniques and RDF vocabulary. For example, you can use the apploader HOTKEYS vocabulary to bind hotkeys to each script.

Setting Up the Management Protocol

I'd recommend doing whatever Firefox or jEdit does. I'm sure the Firefox stuff is better documented, but the jEdit stuff is browsable.

Requests from the PSU Meeting on June 1, 2005

Improved performance: the ability to play back at full speed or faster with fewer dropped frames. While I doubt ViPER will ever achieve VirtualDub-quality scrubbing, especially while written in Java, it can probably be improved considerably. The first step is getting a profiler configured on my machine at work and trying to quantify the performance issues. It might also be a good idea to make a simple piccolo-video player to see what the performance bounds are on that sort of widget.
Diff-type functionality. The way I see implementing this is by having one .xgtf file as the 'Primary' and possibly several others loaded as 'Secondary'. We discussed outputting four different files from a comparison (correct truth data, correct result data, missed truth and false results) and displaying them with different colors (like the old overlay scripts).
Shift+constrained orientation of shapes: holding shift while drawing an oriented box or line of a polygon should constrain it to one of the eight major orientations.
Per-attribute coloring, or descriptor specific styles. For example, the VideoMining annotation tool supports coloring based on another attribute value (male = red, female = blue, etc).
Better handling of hiding things using the tabs and column header colored sphere icons. Right now, it is too easy to change those accidentally. I should add a pop-up menu, as well, which would help make the icons easier to understand.
A variety of improvements to the timeline were requested. The most obvious one was making the descriptor summary lines clearer and more/less meaningful. Right now, if there are more than a couple of descriptors, the line becomes unusable and slow; if there are fewer than a couple, the line is needlessly obscure. The button for 'display where valid' is a horrible UI, and the roll-out of a set of descriptors is similarly useless. It would also be nice to constrain annotation to one subset of the video. Finally, you probably shouldn't be able to drag the time-cursor to an invalid frame or type an invalid frame into the frame number box.
While 'Toggle Display of Invalid' is nice, it should never hide the selected attribute. This leads to the strange problem of accidentally drawing on the screen when you think you should be in 'select' mode. Even if you do mean to draw the shape, you certainly mean for it to be automatically set to 'valid', too.
'Advance to next descriptor' seemed well-implemented in the video miner. I'll probably steal their icon idea (an arrow pointing to a box).
In general, viper does not provide enough support for editing existing data. Often it is easier to start from scratch than to fix a problem (e.g. an annotator made all of the faces too large).
Event-annotation features would be nice - thumbnail display, shot-level editing, etc.
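
For the Shift-constrained orientation request in the list above, the geometry is simple enough to sketch. This is only an illustration under my own assumptions: the class and method names are hypothetical, and I chose to keep the length of the drag and change only its direction (projecting onto the nearest axis would be another reasonable choice).

```java
// Hypothetical sketch of Shift-constrained drawing; not taken from ViPER-GT.
final class OrientationSnap {

    /** Snaps an angle in radians to the nearest multiple of 45 degrees, in [0, 2*pi). */
    static double snapAngle(double radians) {
        double step = Math.PI / 4;               // eight major orientations
        double snapped = Math.round(radians / step) * step;
        double full = 2 * Math.PI;
        double wrapped = snapped % full;
        if (wrapped < 0) {
            wrapped += full;
        }
        return wrapped;
    }

    /**
     * Given an anchor (x0, y0) and the current mouse position (x1, y1),
     * returns the constrained end point: same distance from the anchor,
     * but along the nearest of the eight major orientations.
     */
    static double[] constrain(double x0, double y0, double x1, double y1) {
        double dx = x1 - x0;
        double dy = y1 - y0;
        double length = Math.hypot(dx, dy);
        double angle = snapAngle(Math.atan2(dy, dx));
        return new double[] {
            x0 + length * Math.cos(angle),
            y0 + length * Math.sin(angle),
        };
    }

    public static void main(String[] args) {
        // A drag that is almost, but not quite, diagonal.
        double[] end = constrain(0, 0, 5, 4);
        System.out.println(end[0] + ", " + end[1]);
    }
}
```

The event handler would call constrain only while Shift is held, and use the raw mouse position otherwise.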

text lines

Daniel R is doing an excellent job with the text lines, and I've had to make surprisingly few changes to the code to support some of the more interesting things that he's doing. For more information about text lines, see the relevant post on the Google group.

SpeedPlayer 2

So, I've been working on fixing up Ayesh's 'SpeedPlayer' to be used as a testbed for my ideas about how to get semantically relevant highlights of a single video stream. These ideas will be integrated into panoply, but I want to have some good short-term goals first. I've got it set up to do a sort of generic dynamic query on videos by chaining relevance information filters (or, more generally, n-dimensional functions parameterized by frame number, just so long as they are converted into a relevance function at the end).

In Ayesh and Daniel's work, the video summarization was based on curve simplification of a frame histogram - with the frames corresponding to key points of the simplified curve used as the summary. The curve simplification technique they used gave a score of 'curve importance' to each point on the curve, which was used to determine relevance rank for the corresponding frame.

First, I refactored the applet to use Java Swing instead of AWT and Jonathan's pure Java MPEG decoder instead of the one from J. Anders. Next, I refactored the frame rank information to use my classes for relevance. Then, I replaced the existing fixed rank -> threshold modification path with a 'relevance chain', inspired by the iTunes smart playlist editor and the new Mac OS X Automator as much as by Shneiderman's original paper on dynamic queries. More recently, I've been trying to replace the jmpeg stuff with QuickTime for Java, but that requires either frame-level access or lots of quicktime-edit calls, which seems to be giving me trouble. Larry also mentioned keeping a 'frames seen' history. An interesting idea - as this can also be used as a relevance data source.
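
A 'relevance chain' in this sense can be sketched as function composition over a per-frame score. The code below is an illustration only, not the SpeedPlayer classes: the names are hypothetical, and I am assuming relevance is a double per frame number, with each link in the chain transforming the score produced by the previous one.

```java
import java.util.function.DoubleUnaryOperator;
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch of a relevance chain; not the actual SpeedPlayer code.
final class RelevanceChain {

    /** Appends one filter to a relevance source, giving a new source. */
    static IntToDoubleFunction then(IntToDoubleFunction source, DoubleUnaryOperator filter) {
        return frame -> filter.applyAsDouble(source.applyAsDouble(frame));
    }

    /** Link that clamps scores into the range [0, 1]. */
    static DoubleUnaryOperator clamp() {
        return s -> Math.max(0.0, Math.min(1.0, s));
    }

    /** Link that zeroes anything below the cutoff: the old fixed threshold as one link. */
    static DoubleUnaryOperator threshold(double cutoff) {
        return s -> s >= cutoff ? s : 0.0;
    }

    /** Merges two sources by taking the larger score at each frame. */
    static IntToDoubleFunction max(IntToDoubleFunction a, IntToDoubleFunction b) {
        return frame -> Math.max(a.applyAsDouble(frame), b.applyAsDouble(frame));
    }

    public static void main(String[] args) {
        // Made-up 'curve importance' values for four frames.
        double[] importance = {0.2, 0.9, 1.4, 0.05};
        IntToDoubleFunction chain =
            then(then(frame -> importance[frame], clamp()), threshold(0.5));
        for (int frame = 0; frame < importance.length; frame++) {
            System.out.println(frame + " -> " + chain.applyAsDouble(frame));
        }
    }
}
```

A 'frames seen' history would fit in as just another source, combined with the others through something like max.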

The more interesting question is how to use multiple relevance streams, and allow the user to integrate them. What makes a good summary? The idea of a goal-oriented summary - a summary that includes just one possible story - is what I am looking at here. Daniel suggested using curve simplification on: total centroid mass, inverse of the distance between two shapes, correlation between pairs of shapes, and on the tracks themselves. The idea of track correlation is interesting, as this will give a curve point that is of interest whenever the tracks change correlation - e.g. when people start walking together or when they separate. There is also the issue of a single person who is segmented into two tracks.

transcode using vlc

Just so you know, to save 'inputFile.avi' as the MPEG-4 video 'outfile.mp4', use the following command line:

vlc -vvv inputFile.avi --sout '#transcode{vcodec=mp4v,acodec=mpga,vb=800,ab=128}:duplicate{dst=std{access=file,mux=mp4,url="outfile.mp4"}}'

fuzzy importance framework for smart fastforward in panoply

Something I doubt has been done yet: Daniel originally used frame ranking for smart ffwd. I propose a more generic frame-score approach, which has a lot of added benefits. It can still support frame ranking, simply by providing some appropriate map of rank to score, with lower ranks receiving correspondingly lower scores. However, it also makes it possible to combine multiple scores or perform transformations, such as arbitrary convolutions, on the scores as a one-dimensional function of time. For example, we could directly map the score to playback speed, with scores at the highest level playing back in realtime or less-than-realtime, and scores at the lowest level perhaps being skipped entirely, or at least only displayed briefly.

In the Dotworld interface, I see adding another panel, one which might prove unnecessary for the user, displaying the relevance information currently used for playback. A sparkline showing relevance could be placed beneath each movie, and possibly beneath each actor. These could be summed or unioned somehow to get more interesting information. From the perspective of the movie as a three-dimensional volume, this would be like extending Jacobs' work on intelligent thumbnails to allow spatial distortion of the images. In space, however, we will likely consider using intelligent cropping only, as such distortions in the spatial domain may prove less useful than simple cropping, or cropping + duplication + scaling.
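
Going back to the scoring idea, here is a minimal sketch of the three pieces: a rank-to-score map, a convolution over the scores, and a score-to-playback-rate map. None of this is panoply code. The names are hypothetical, and the specifics are my assumptions: rank 1 is the top rank, scores live in [0, 1], the rate falls linearly from a maximum down to realtime, and an infinite rate stands for "skip".

```java
// Hypothetical sketch of frame scores for smart fast-forward; not panoply code.
final class SmartFastForward {

    /** Marker rate meaning "skip this frame entirely". */
    static final double SKIP = Double.POSITIVE_INFINITY;

    /**
     * Maps a rank (assumed: 1 = top rank, frameCount = lowest rank) to a
     * score in [0, 1], so that lower ranks get correspondingly lower scores.
     */
    static double rankToScore(int rank, int frameCount) {
        if (frameCount <= 1) {
            return 1.0;
        }
        return (double) (frameCount - rank) / (frameCount - 1);
    }

    /** Convolves the scores with a centered kernel, treating out-of-range frames as 0. */
    static double[] convolve(double[] scores, double[] kernel) {
        int half = kernel.length / 2;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            for (int j = 0; j < kernel.length; j++) {
                int src = i - (j - half);
                if (src >= 0 && src < scores.length) {
                    out[i] += kernel[j] * scores[src];
                }
            }
        }
        return out;
    }

    /**
     * Maps a score to a playback-rate multiplier: score 1.0 plays in realtime
     * (1.0x), score 0.0 plays at maxRate, and anything under skipBelow is skipped.
     */
    static double playbackRate(double score, double maxRate, double skipBelow) {
        if (score < skipBelow) {
            return SKIP;
        }
        double s = Math.max(0.0, Math.min(1.0, score));
        return maxRate - (maxRate - 1.0) * s;
    }

    public static void main(String[] args) {
        int frameCount = 5;
        for (int rank = 1; rank <= frameCount; rank++) {
            double score = rankToScore(rank, frameCount);
            System.out.println("rank " + rank + ": score " + score
                + ", rate " + playbackRate(score, 8.0, 0.1));
        }
    }
}
```

Smoothing the scores with a small kernel before mapping them to a rate would keep playback from lurching between speeds on neighbouring frames. A less-than-realtime rate for the very best frames would need a different curve than the linear one used here.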