Okay, so I talked with Ilya this afternoon about getting ViPER-PE to meet his requirements. He needs a list of all objects that are identified at all in a video; he doesn't much care how well they are identified. I originally suggested a framewise evaluation, but he doesn't care whether an object is identified on each frame, only that it is identified at some point while it is in the field of view. So, logically, he should use an object evaluation with high thresholds and no statistical evaluation. But he has identified a few bugs...
The first bug is that some items that shouldn't are making it through localization. I hadn't noticed this because they are caught during the statistical match. I've tracked it down to a bug in the 'clear' method of the 'FrameSpan' class, which wasn't always clearing values. I've added a few more tests to the AttributeTest unit test suite to exercise FrameSpan more thoroughly.
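The real FrameSpan code isn't reproduced here. As an illustration only, here is a minimal sketch of a bitset-backed frame span whose clear removes every frame in an inclusive range. That is the property the new tests should pin down. The `SimpleFrameSpan` class is hypothetical. One easy way to get this wrong with `java.util.BitSet` is to forget that its range methods are end-exclusive.

```java
import java.util.BitSet;

/**
 * Hypothetical, simplified stand-in for FrameSpan: a set of frame numbers
 * backed by a BitSet. Not the ViPER class; it only illustrates a clear
 * operation that removes every frame in an inclusive range.
 */
class SimpleFrameSpan {
    private final BitSet frames = new BitSet();

    /** Marks frames start..end (inclusive) as present. */
    void set(int start, int end) {
        checkRange(start, end);
        frames.set(start, end + 1); // BitSet ranges are end-exclusive
    }

    /** Removes frames start..end (inclusive), however they were added. */
    void clear(int start, int end) {
        checkRange(start, end);
        frames.clear(start, end + 1); // without the +1, frame 'end' would stay set
    }

    boolean contains(int frame) {
        return frame >= 0 && frames.get(frame);
    }

    int size() {
        return frames.cardinality();
    }

    boolean isEmpty() {
        return frames.isEmpty();
    }

    private static void checkRange(int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("bad range " + start + ".." + end);
        }
    }
}
```

A test in the spirit of the new AttributeTest cases would follow three steps:

1. Set a span such as 10..20.
2. Clear a subrange that includes the endpoint (12..20).
3. Check that only frames 10 and 11 remain.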
The second bug arises in MULTIPLE matching, where some cromulent matches are thrown out. MULTIPLE is almost what Ilya wants, although NONE is a better choice. I'll look into fixing that. I think what he really wants is an asymmetric multiple match, like multi-greedy, where each target descriptor is matched to 1..k candidates and each candidate is matched to at most one target.
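A minimal sketch of the asymmetric multi-greedy match. It assumes a precomputed similarity score for every target/candidate pair, where higher is better. The procedure:

- Visit pairs from best to worst.
- Accept a pair if the candidate is still free and the target has fewer than k candidates.

`MultiGreedyMatcher` is a hypothetical name, not ViPER-PE's API. Greedy assignment is not globally optimal, but it is cheap and easy to reason about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of an asymmetric "multi-greedy" match: each target may
 * take up to k candidates, each candidate goes to at most one target.
 */
class MultiGreedyMatcher {
    /**
     * @param score    score[t][c] = similarity of target t and candidate c
     * @param k        maximum number of candidates per target
     * @param minScore a pair must score strictly above this to be matched
     * @return owner[c] = index of the target that candidate c matched, or -1
     */
    static int[] match(double[][] score, int k, double minScore) {
        int targets = score.length;
        int candidates = targets == 0 ? 0 : score[0].length;

        // Collect every acceptable (target, candidate) pair.
        List<int[]> pairs = new ArrayList<>();
        for (int t = 0; t < targets; t++) {
            for (int c = 0; c < candidates; c++) {
                if (score[t][c] > minScore) {
                    pairs.add(new int[] {t, c});
                }
            }
        }
        // Visit the best-scoring pairs first.
        pairs.sort(Comparator.comparingDouble((int[] p) -> score[p[0]][p[1]]).reversed());

        int[] owner = new int[candidates];
        Arrays.fill(owner, -1);
        int[] load = new int[targets]; // candidates assigned so far, per target
        for (int[] p : pairs) {
            int t = p[0];
            int c = p[1];
            if (owner[c] == -1 && load[t] < k) {
                owner[c] = t;
                load[t]++;
            }
        }
        return owner;
    }
}
```

As an example, take scores {{0.9, 0.8, 0.1}, {0.85, 0.2, 0.7}}, k = 2 and a threshold of 0.5. Candidates 0 and 1 go to target 0, and candidate 2 goes to target 1. A target that ends up with no candidates counts as a miss.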
ViPER-GT needs a utility to manage user scripts. Right now, users install scripts by placing them in the ~/.viper/scripts directory. This is a lot of work, especially because most file managers hide the .viper directory. A user should be able to install, upgrade and remove scripts from within ViPER-GT, and maybe browse for new ones. It should also be possible to bind a script to a hotkey.
The scripts are already somewhat self-describing; they include a method to get their name. Useful additions would be:

- A default bundle or category, for grouping scripts once there are too many for a single menu.
- Tags, for browsing and discovery.

Other than that, the scripts shouldn't require many changes.
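A sketch of what that extended self-description might look like, assuming scripts implement a Java interface. `getName()` stands in for the existing name method, whose real name may differ. `ScriptInfo`, `getCategory`, `getTags` and `ScriptMenus` are hypothetical. Using default methods means existing scripts keep working unchanged, which fits the goal of not requiring many changes to them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical extended script description. */
interface ScriptInfo {
    /** Stand-in for the name method scripts already provide. */
    String getName();

    /** Hypothetical: default bundle/category, used to build submenus. */
    default String getCategory() {
        return "Other";
    }

    /** Hypothetical: free-form tags for browsing and discovery. */
    default Set<String> getTags() {
        return Set.of();
    }
}

class ScriptMenus {
    /** Groups script names by category; categories and names are sorted alphabetically. */
    static Map<String, List<String>> byCategory(List<? extends ScriptInfo> scripts) {
        Map<String, List<String>> menus = new TreeMap<>();
        for (ScriptInfo s : scripts) {
            menus.computeIfAbsent(s.getCategory(), c -> new ArrayList<>()).add(s.getName());
        }
        menus.values().forEach(names -> names.sort(String::compareTo));
        return menus;
    }
}
```

Each map entry would become a submenu once the flat script menu gets too long.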
There are a few things I want to emulate: Firefox, jEdit and Eclipse all have pretty much what I want in their plug-in managers. Of these, jEdit is probably the most directly useful to us. It is written in Swing and has a compatible license, so some of its code might be usable directly.
I would like the script manager to use the existing infrastructure as much as possible, so it is a good idea to get a handle on the PrefsManager, which is described somewhat in the AppLoader specification. Basically, the PrefsManager is a layer on top of Jena that provides some functionality that was missing when I started using it. Jena is an RDF triplestore for Java. RDF is a good way to keep track of a lot of semi-structured information. Where XML presents a rooted tree, RDF's data model is a graph. An RDF document is a set of subject, verb, object triples; RDF calls the verb a predicate. This makes it easier to describe things with lots of loopy links, e.g. the connections between JavaBeans in ViPER-GT.
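To make the triple model concrete, here is a toy store that holds a graph as a flat list of triples. `TinyTripleStore` is hypothetical and is not Jena. The bean names used with it are made up. The point is that a cycle between two beans is just two ordinary triples, whereas a rooted XML tree would need some kind of cross-reference to express it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration of the RDF data model (not Jena): a graph stored as a flat
 * list of subject-predicate-object triples. Any object can also be a subject,
 * so loops need no special handling.
 */
class TinyTripleStore {
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    private final List<Triple> triples = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        triples.add(new Triple(subject, predicate, object));
    }

    /** Objects of every triple matching (subject, predicate), in insertion order. */
    List<String> objects(String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (Triple t : triples) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                result.add(t.object);
            }
        }
        return result;
    }
}
```

Here is a two-step example of a loop:

1. Add ("mediator", "listensTo", "player") and ("player", "listensTo", "mediator").
2. Follow listensTo twice from "mediator". The path leads back to "mediator".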
I also find RDF a lot easier to read, and more compact, than XML, at least in n3 form. (Most of the RDF data I've found on the web seems to be in RDF/XML, which is sort of the worst of both worlds.) Anyway, extending ViPER-GT in any way that means munging with more than one simple component usually involves editing a lot of n3. It is therefore important to understand how to read it and how ViPER-GT uses it, in two places:

- The apploader packages.
- The beans that use PrefsManager directly, such as the UndoManager, which uses prefs to look up the text string for each undo item.

For a simple example of using the user prefs file to track information, look at how ViperViewMediator's getLocalPathToFile and putCanonicalToLocalMapping keep track of where a file referenced in a gt file can be found on the user's local disk.
To add statements to the RDF model, use the changeUser(toRemove, toAdd) method on the PrefsManager. This ensures the change happens atomically and that the appropriate listeners are notified. Later, to get information back, query the unified model field directly using the normal Jena methods (e.g. getResource and listStatements). You will need to define new properties, such as 'version' and 'last-update-check-time', and probably a 'Script' RDF class.
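PrefsManager's actual parameter types aren't given here, so what follows is a toy model of the contract described above. `ToyPrefs` is hypothetical, and it represents statements as plain strings. The contract has two parts:

- Removals and additions are applied together under one lock.
- Listeners hear about the change once, after it is complete.

Upgrading a script then becomes a single call that removes the old 'version' statement and adds the new one.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;

/**
 * Toy model of the remove-then-add pattern described for changeUser.
 * Statements are plain strings; the real PrefsManager works on Jena models.
 * Everything in this class is hypothetical.
 */
class ToyPrefs {
    private final Set<String> statements = new LinkedHashSet<>();
    private final List<BiConsumer<Set<String>, Set<String>>> listeners = new ArrayList<>();

    /** Listener receives (removed, added) after each change. */
    void addListener(BiConsumer<Set<String>, Set<String>> listener) {
        listeners.add(listener);
    }

    /** Applies both halves of the change under one lock, then notifies listeners once. */
    void changeUser(Collection<String> toRemove, Collection<String> toAdd) {
        Set<String> removed = new LinkedHashSet<>();
        Set<String> added = new LinkedHashSet<>();
        synchronized (statements) {
            for (String s : toRemove) {
                if (statements.remove(s)) {
                    removed.add(s);
                }
            }
            for (String s : toAdd) {
                if (statements.add(s)) {
                    added.add(s);
                }
            }
        }
        // Notification happens after the whole change is applied.
        for (BiConsumer<Set<String>, Set<String>> l : listeners) {
            l.accept(removed, added);
        }
    }

    boolean contains(String statement) {
        synchronized (statements) {
            return statements.contains(statement);
        }
    }
}
```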
Where possible, you should use existing techniques and RDF vocabulary. For example, you can use the apploader HOTKEYS vocabulary to bind hotkeys to each script.
I'd recommend doing whatever Firefox or jEdit does. I'm sure the Firefox stuff is better documented, but the jEdit stuff is browsable.
So, I've been working on fixing up Ayesh's 'SpeedPlayer' to use as a testbed for my ideas about how to get semantically relevant highlights of a single video stream. These ideas will be integrated into panoply, but I want to have some good short-term goals first. I've set it up to do a sort of generic dynamic query on videos by chaining relevance information filters. More generally, the filters can operate on any n-dimensional function parameterized by frame number, so long as the result is converted into a relevance function at the end.
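A minimal sketch of that chaining idea. It assumes each stage maps an n-dimensional per-frame function to another one, and that a final step reduces each frame's vector to a scalar relevance. `RelevanceChain`, `weights` and `window` are hypothetical names for illustration, not the SpeedPlayer classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

/** Hypothetical relevance chain: n-D per-frame source, filters, then a scalar reduction. */
class RelevanceChain {
    /** An n-dimensional value parameterized by frame number. */
    interface FrameFunction {
        double[] at(int frame);
    }

    /** A scalar relevance score parameterized by frame number. */
    interface Relevance {
        double at(int frame);
    }

    private final FrameFunction source;
    private final List<UnaryOperator<FrameFunction>> filters = new ArrayList<>();

    RelevanceChain(FrameFunction source) {
        this.source = source;
    }

    /** Appends a filter; filters run in the order they were added. */
    RelevanceChain then(UnaryOperator<FrameFunction> filter) {
        filters.add(filter);
        return this;
    }

    /** Composes the filters, then reduces each frame's vector to one relevance value. */
    Relevance toRelevance(ToDoubleFunction<double[]> reduce) {
        FrameFunction f = source;
        for (UnaryOperator<FrameFunction> filter : filters) {
            f = filter.apply(f);
        }
        final FrameFunction chained = f;
        return frame -> reduce.applyAsDouble(chained.at(frame));
    }

    /** Example filter: multiplies each dimension by its own weight. */
    static UnaryOperator<FrameFunction> weights(double... w) {
        return f -> frame -> {
            double[] v = f.at(frame).clone();
            for (int i = 0; i < v.length && i < w.length; i++) {
                v[i] *= w[i];
            }
            return v;
        };
    }

    /** Example filter: keeps frames in [first, last]; other frames become all zeros. */
    static UnaryOperator<FrameFunction> window(int first, int last) {
        return f -> frame -> {
            double[] v = f.at(frame);
            return (frame >= first && frame <= last) ? v : new double[v.length];
        };
    }
}
```

For example, start with a source returning {frame, 1.0}. Weight it by (0.5, 2.0), window it to frames 10..20, and reduce by summing. The result is a relevance of 7.0 at frame 10 and 0 outside the window.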
In Ayesh and Daniel's work, video summarization was based on curve simplification of a frame histogram. The frames corresponding to key points of the simplified curve were used as the summary. The curve simplification technique they used gave each point on the curve a 'curve importance' score, which determined the relevance rank of the corresponding frame.
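The notes don't say which simplification algorithm was used. As a stand-in, here is Visvalingam-Whyatt-style simplification, which naturally produces a per-point importance:

- Repeatedly delete the interior point whose triangle with its current neighbors has the smallest area.
- Record that area as the deleted point's importance.

Each point is treated as a vector, such as a frame's histogram, and the triangle area is computed in any number of dimensions. `CurveImportance` is a hypothetical name.

```java
/**
 * Stand-in technique (Visvalingam-Whyatt style), not necessarily the one used
 * in the original work. Points are frames as n-dimensional vectors.
 */
class CurveImportance {
    /** Returns importance[i] for points[i]; the two endpoints get +infinity. */
    static double[] importance(double[][] points) {
        int n = points.length;
        double[] importance = new double[n];
        int[] prev = new int[n];
        int[] next = new int[n];
        boolean[] alive = new boolean[n];
        for (int i = 0; i < n; i++) {
            prev[i] = i - 1;
            next[i] = i + 1;
            alive[i] = true;
        }
        if (n > 0) {
            importance[0] = Double.POSITIVE_INFINITY;
            importance[n - 1] = Double.POSITIVE_INFINITY;
        }
        // Remove interior points one at a time, smallest triangle first (O(n^2) overall).
        for (int step = 0; step < n - 2; step++) {
            int best = -1;
            double bestArea = Double.POSITIVE_INFINITY;
            for (int i = 1; i < n - 1; i++) {
                if (!alive[i]) {
                    continue;
                }
                double a = area(points[prev[i]], points[i], points[next[i]]);
                if (best == -1 || a < bestArea) {
                    best = i;
                    bestArea = a;
                }
            }
            importance[best] = bestArea;   // area at the moment of removal
            alive[best] = false;
            next[prev[best]] = next[best]; // unlink best from its neighbors
            prev[next[best]] = prev[best];
        }
        return importance;
    }

    /** Triangle area 0.5 * sqrt(|u|^2 |v|^2 - (u.v)^2), valid in any dimension. */
    static double area(double[] a, double[] b, double[] c) {
        double uu = 0, vv = 0, uv = 0;
        for (int k = 0; k < a.length; k++) {
            double u = b[k] - a[k];
            double v = c[k] - a[k];
            uu += u * u;
            vv += v * v;
            uv += u * v;
        }
        return 0.5 * Math.sqrt(Math.max(0.0, uu * vv - uv * uv));
    }
}
```

Sorting frames by importance, highest first, gives a relevance rank. On the curve (0,0), (1,0), (2,0), (3,3), (4,0), the sketch assigns 0, 3 and 6 to the three interior points. The sharp corner at (3,3) therefore ranks highest. Note that this simple version does not force importances to be monotone in removal order.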
First, I refactored the applet in two ways:

- It uses Java Swing instead of AWT.
- It uses Jonathan's pure Java MPEG decoder instead of the one from J. Anders.

Next, I refactored the frame rank information to use my relevance classes. Then I replaced the existing fixed rank-to-threshold modification path with a 'relevance chain'. The chain is inspired as much by the iTunes smart playlist editor and the new Mac OS X Automator as by Shneiderman's original paper on dynamic queries. More recently, I've been trying to replace the jmpeg code with QuickTime for Java. That requires either frame-level access or lots of QuickTime edit calls, which has been giving me trouble. Larry also mentioned keeping a 'frames seen' history. It's an interesting idea, since that history could also be used as a relevance data source.
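Here is one way Larry's 'frames seen' history could feed the relevance chain. The sketch assumes that already-watched frames should simply score lower than unwatched ones. `FramesSeenRelevance` and its `seenScore` parameter are hypothetical.

```java
import java.util.BitSet;

/**
 * Hypothetical relevance source built from a "frames seen" history:
 * unseen frames score 1.0, already-watched frames score seenScore.
 */
class FramesSeenRelevance {
    private final BitSet seen = new BitSet();
    private final double seenScore;

    FramesSeenRelevance(double seenScore) {
        this.seenScore = seenScore;
    }

    /** Records that the viewer has watched frames first..last (inclusive). */
    void markSeen(int first, int last) {
        seen.set(first, last + 1); // BitSet ranges are end-exclusive
    }

    /** Relevance of a frame given the viewing history so far. */
    double relevance(int frame) {
        return seen.get(frame) ? seenScore : 1.0;
    }
}
```

Multiplying this score into another relevance stream would push a summary toward material the viewer hasn't seen yet.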
The more interesting question is how to use multiple relevance streams and let the user integrate them. What makes a good summary? What I am looking at here is a goal-oriented summary: one that includes just one possible story. Daniel suggested applying curve simplification to several signals:

- total centroid mass;
- the inverse of the distance between two shapes;
- the correlation between pairs of shapes;
- the tracks themselves.

Track correlation is interesting because it yields a point of interest on the curve whenever the tracks change correlation, e.g. when people start walking together or when they separate. There is also the issue of a single person who is segmented into two tracks.
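A sketch of the track-correlation input. It makes two assumptions:

- Each track is reduced to one number per frame (say, its x coordinate).
- Correlation is measured with a Pearson coefficient over a sliding window.

The signal, the window size and the `TrackCorrelation` name are all assumptions. The output is an ordinary per-frame curve, so it could be fed to the same curve simplification step.

```java
/**
 * Hypothetical sliding-window Pearson correlation between two per-frame
 * track signals; the result is a per-frame curve.
 */
class TrackCorrelation {
    /** corr[i] covers frames i-half..i+half, clipped to the data; NaN where undefined. */
    static double[] windowed(double[] a, double[] b, int half) {
        int n = Math.min(a.length, b.length);
        double[] corr = new double[n];
        for (int i = 0; i < n; i++) {
            int lo = Math.max(0, i - half);
            int hi = Math.min(n - 1, i + half);
            corr[i] = pearson(a, b, lo, hi);
        }
        return corr;
    }

    /** Pearson correlation of a[lo..hi] and b[lo..hi]; NaN if either is constant there. */
    static double pearson(double[] a, double[] b, int lo, int hi) {
        int m = hi - lo + 1;
        double meanA = 0, meanB = 0;
        for (int i = lo; i <= hi; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= m;
        meanB /= m;
        double cov = 0, varA = 0, varB = 0;
        for (int i = lo; i <= hi; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return (varA == 0 || varB == 0) ? Double.NaN : cov / Math.sqrt(varA * varB);
    }
}
```

A window in which either signal is constant has no defined correlation, so the sketch returns NaN there. A real version would need a policy for that case, for example when one person stands still.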
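A VLC command line that transcodes an AVI file to MP4 (MPEG-4 video at 800 kb/s, MPEG audio at 128 kb/s):
vlc -vvv inputFile.avi --sout '#transcode{vcodec=mp4v,acodec=mpga,vb=800,ab=128}:duplicate{dst=std{access=file,mux=mp4,url="outfile.mp4"}}'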