Blog for work on my Master's thesis - a survey of methods for evaluating media understanding, object detection, and pattern matching algorithms. Mostly, it is related to ViPER, the Video Performance Evaluation Resource. If you find a good reference, or would like to comment, e-mail viper at cfar.umd.edu.
Media Processing Evaluation Weblog
Tuesday, May 24, 2005
text lines
Daniel R is doing an excellent job with the text lines, and I've had to make surprisingly few changes to the code to support some of the more interesting things that he's doing. For more information about text lines, see the relevant post on the Google group.
- posted by David @ 2:02 PM
Wednesday, May 04, 2005
SpeedPlayer 2
So, I've been working on fixing up Ayesh's 'SpeedPlayer' to be used as a testbed for my ideas about how to get semantically relevant highlights of a single video stream. These ideas will be integrated into panoply, but I want to have some good short-term goals first. I've got it set up to do a sort of generic dynamic query on videos by chaining relevance information filters (or, more generally, any n-dimensional function parameterized by frame number, so long as it is converted into a relevance function at the end).
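To make the 'relevance function' idea concrete, here is a rough sketch of the kind of interface I have in mind - the names here are hypothetical, just for illustration, not the applet's actual classes:

    // Rough sketch only; hypothetical names, not the real SpeedPlayer classes.

    // (RelevanceFunction.java)
    public interface RelevanceFunction {
        /** Returns a relevance score in [0, 1] for the given frame. */
        double relevanceAt(int frameNumber);
    }

    // (NormalizedSignal.java) - wraps any per-frame signal, e.g. one dimension
    // of a feature curve, and normalizes it into a relevance function.
    public class NormalizedSignal implements RelevanceFunction {
        private final double[] values;   // raw signal, indexed by frame
        private final double min, max;

        public NormalizedSignal(double[] values) {
            this.values = values;
            double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
            for (double v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            this.min = lo;
            this.max = hi;
        }

        public double relevanceAt(int frameNumber) {
            if (max == min) return 0.0;  // a constant signal carries no relevance
            return (values[frameNumber] - min) / (max - min);
        }
    }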
In Ayesh and Daniel's work, the video summarization was based on curve simplification of a frame histogram - with the frames corresponding to key points of the simplified curve used as the summary. The curve simplification technique they used gave a score of 'curve importance' to each point on the curve, which was used to determine the relevance rank for the corresponding frame.
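For reference, one way to get that kind of per-point importance score is a Douglas-Peucker-style recursion, where each interior point is scored by the perpendicular distance that caused the recursion to split on it. This is just an illustrative sketch, not necessarily the exact technique Ayesh and Daniel used:

    // Illustrative only - a Douglas-Peucker-style 'curve importance' score.
    public class CurveImportance {

        /** Scores each point of the curve (x[i], y[i]); endpoints are always
            kept, and each interior point gets the distance that split on it. */
        public static double[] score(double[] x, double[] y) {
            double[] importance = new double[x.length];
            importance[0] = Double.MAX_VALUE;
            importance[x.length - 1] = Double.MAX_VALUE;
            scoreRange(x, y, 0, x.length - 1, importance);
            return importance;
        }

        private static void scoreRange(double[] x, double[] y, int first, int last,
                                       double[] importance) {
            if (last - first < 2) return;             // no interior points left
            int split = first + 1;
            double maxDist = -1.0;
            for (int i = first + 1; i < last; i++) {
                double d = perpendicularDistance(x[i], y[i],
                                                 x[first], y[first], x[last], y[last]);
                if (d > maxDist) { maxDist = d; split = i; }
            }
            importance[split] = maxDist;              // farther from the chord = more important
            scoreRange(x, y, first, split, importance);
            scoreRange(x, y, split, last, importance);
        }

        /** Distance from (px, py) to the line through (ax, ay) and (bx, by). */
        private static double perpendicularDistance(double px, double py,
                                                     double ax, double ay,
                                                     double bx, double by) {
            double dx = bx - ax, dy = by - ay;
            double len = Math.sqrt(dx * dx + dy * dy);
            if (len == 0.0) return Math.sqrt((px - ax) * (px - ax) + (py - ay) * (py - ay));
            return Math.abs(dy * px - dx * py + bx * ay - by * ax) / len;
        }
    }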
First, I refactored the applet to use Java Swing instead of AWT and Jonathan's pure Java MPEG decoder instead of the one from J. Anders. Next, I refactored the frame rank information to use my classes for relevance. Then, I replaced the existing fixed rank -> threshold modification path with a 'relevance chain', inspired by the iTunes smart playlist editor and the new Mac OS X Automator as much as by Shneiderman's original paper on dynamic queries. More recently, I've been trying to replace the jmpeg stuff with QuickTime for Java, but that requires either frame-level access or lots of quicktime-edit calls, which seems to be giving me trouble. Larry also mentioned keeping a 'frames seen' history - an interesting idea, as this can also be used as a relevance data source.
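The chain itself is conceptually just an ordered list of filters, each refining the per-frame relevance of the stage before it, much the way stacked smart-playlist rules narrow a library. Something like this sketch (reusing the hypothetical RelevanceFunction interface from above, not the applet's real classes):

    // Illustrative sketch of a 'relevance chain'.
    import java.util.List;

    public class RelevanceChain implements RelevanceFunction {

        /** One stage in the chain, e.g. a threshold, a ramp, or a smoother. */
        public interface Filter {
            double apply(int frameNumber, double upstreamRelevance);
        }

        private final RelevanceFunction source;
        private final List<Filter> filters;

        public RelevanceChain(RelevanceFunction source, List<Filter> filters) {
            this.source = source;
            this.filters = filters;
        }

        public double relevanceAt(int frameNumber) {
            double r = source.relevanceAt(frameNumber);
            for (Filter f : filters) {
                r = f.apply(frameNumber, r);    // each stage refines the score
            }
            return r;
        }
    }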
The more interesting question is how to use multiple relevance streams and allow the user to integrate them. What makes a good summary? The idea of a goal-oriented summary - a summary that includes just one possible story - is what I am looking at here. Daniel suggested using curve simplification on: total centroid mass, the inverse of the distance between two shapes, the correlation between pairs of shapes, and the tracks themselves. The idea of track correlation is interesting, as this will give a curve point of interest whenever the tracks change correlation - e.g. when people start walking together or when they separate. There is also the issue of what to do about a single person who is segmented into two tracks.
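One simple way to let the user integrate several streams would be a weighted combination, with the weights bound to sliders in the dynamic-query panel. A minimal sketch, again with hypothetical names and assuming the RelevanceFunction interface from above:

    // Illustrative sketch: combine several relevance streams with user weights.
    public class WeightedCombination implements RelevanceFunction {

        private final RelevanceFunction[] streams;
        private final double[] weights;   // e.g. one slider per stream

        public WeightedCombination(RelevanceFunction[] streams, double[] weights) {
            this.streams = streams;
            this.weights = weights;
        }

        public double relevanceAt(int frameNumber) {
            double sum = 0.0, totalWeight = 0.0;
            for (int i = 0; i < streams.length; i++) {
                sum += weights[i] * streams[i].relevanceAt(frameNumber);
                totalWeight += weights[i];
            }
            return totalWeight == 0.0 ? 0.0 : sum / totalWeight;   // stays in [0, 1]
        }
    }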