Blog for work on my Masters thesis - a survey of methods for evaluating media understanding, object detection, and pattern matching algorithms. Mostly, it is related to ViPER, the Video Performance Evaluation Resource. If you find a good reference, or would like to comment, e-mail viper at cfar.umd.edu.
Media Processing Evaluation Weblog
Monday, May 31, 2004
More Fixes, Features, and a Plan for 4.0
After pushing out alpha 7, I've been trying to wrap up the last remaining feature requests marked for 4.0. I moved several back to 4.1 or later, and implemented Autosave, which was probably the last big one required to make the program cool. Well, the last feature, anyway. There are a variety of bugs that get in the way of using the program, the most annoying being vdub4java's 'I'll only run once' hissy fit/core dump thing. While I'm not comfortable releasing a 4.0 beta with that bug, I'll probably do so anyway, as I've hacked around the associated crashing. I'll release one more alpha, probably tomorrow, then focus completely on bug fixes for a week or two. After the major bugs are fixed, I'll release the first beta. Once the documentation is caught up to the release and I'm not noticing any major problems, I'll put out 4.0. Hopefully, this can be finished before the June VACE meeting.
- posted by David @ 2:45 PM
Tuesday, May 25, 2004
Tracking When Events Are Consumed in Java with Eclipse
If you find your events are being consumed surreptitiously by your widgets, preventing your listener from hearing about them, add a breakpoint at the 'consume' method in the InputEvent class. Browse into the JRE_LIB jar, look for the java.awt.event package, and click on the InputEvent class to open it. (You might have to modify your 'Java > Installed JREs' settings in the preferences window to get the source code to display.) Next, modify the breakpoint so it only stops when the event you are looking for is consumed: right-click on the breakpoint icon and select 'Breakpoint Properties', click 'Enable Condition', and add a condition, e.g. 'this instanceof KeyEvent' to break on the consumption of any key event. You may also want to 'Disable Breakpoint' until you get close to the point where the event is generated, to minimize unnecessary breaking and resuming. Once the breakpoint hits, you can see in the stack trace exactly where 'consume' was called.
This is most useful for KeyEvents, which the Java containers have an annoying habit of consuming early and often. For example, if you have a scroll pane nested within two split panes, you will have to remove the arrow keys from the ancestor input maps of all three separately in order for the events to bubble up to the outside container.
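As a minimal sketch of the input-map trick (the class and helper names here are mine, not from ViPER): mapping a keystroke to the placeholder action name "none" in a component's WHEN_ANCESTOR_OF_FOCUSED_COMPONENT input map effectively unbinds it, so the key event survives long enough to reach your own listener. You would call this on the scroll pane and each enclosing split pane.

```java
import javax.swing.*;
import java.awt.event.KeyEvent;

public class UnbindArrows {
    // Map each arrow key to the action name "none" in the component's
    // ancestor input map. Since no action named "none" exists in the
    // action map, the binding does nothing and the event is not consumed.
    static void unbindArrowKeys(JComponent c) {
        InputMap map = c.getInputMap(JComponent.WHEN_ANCESTOR_OF_FOCUSED_COMPONENT);
        int[] arrows = {KeyEvent.VK_UP, KeyEvent.VK_DOWN,
                        KeyEvent.VK_LEFT, KeyEvent.VK_RIGHT};
        for (int code : arrows) {
            map.put(KeyStroke.getKeyStroke(code, 0), "none");
        }
    }

    public static void main(String[] args) {
        JScrollPane scroll = new JScrollPane();
        unbindArrowKeys(scroll);
        Object binding = scroll
            .getInputMap(JComponent.WHEN_ANCESTOR_OF_FOCUSED_COMPONENT)
            .get(KeyStroke.getKeyStroke(KeyEvent.VK_UP, 0));
        System.out.println(binding); // prints "none"
    }
}
```

Remember that this has to be repeated on every ancestor that binds the keys; disabling the binding on the scroll pane alone won't help if a split pane above it grabs the event first.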
- posted by David @ 4:51 PM
Monday, May 24, 2004
ViPER-GT 4.0 Alpha 7
I just pushed out another version of ViPER, including some changes to ViPER-GT:
- Playback where valid: this allows the user to play only the parts of the video where a descriptor or class of descriptors exists.
- Section advance buttons: these skip to the beginning or end of the video, or, if 'playback where valid' is turned on, the beginning or end of a segment of the descriptor.
- Display with Respect To: this keeps an object in the same place while the video is playing, moving the other attributes behind it. It isn't quite finished yet; I still haven't implemented propagate/interpolate with respect to.
- Various interface improvements, including more tooltips
- A few small bug fixes
- posted by David @ 12:48 PM
Thursday, May 13, 2004
A Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics
David Martin, Charles Fowlkes, Doron Tal and Jitendra Malik present an approach to comparing an image segmentation against a set of human-authored segmentations. It reminds me of BLEU, in that it presents an evaluation metric that compensates for massive inter-operator disagreement in a smart fashion, exploiting the differences rather than hiding them. However, it has the flaw that even the more extreme segmentations therefore count as acceptable.
By giving no instructions to the human segmenters, they seem to be grasping for a base set of image segmentations: segmentations representative of each person's idea of the most appropriate segmentation for the image. The metric could be useful for three-dimensional segmentation as well. It also gives me some ideas about how to better handle multiple matching.
Link Reference
@inproceedings{martin2001iccv,
  author = "D. Martin and C. Fowlkes and D. Tal and J. Malik",
  title  = "A Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics",
  pages  = "416--425",
  year   = "2001",
  url    = "http://citeseer.nj.nec.com/martin01database.html"
}
- posted by David @ 2:05 PM
Wednesday, May 12, 2004
Evaluating OntoSem
I stopped by the CLIP seminar this afternoon, which Doug introduced as the ultimate seminar with the longest title: Evaluating Basic Text Meaning Representations Produced by the OntoSem Semantic Analyzer. Sergei Nirenburg, the presenter from UMBC's ILIT, gave an overview of his OntoSem textual semantic analysis system and ran through some tests of it.
OntoSem extracts TMRs (text meaning representations) from natural language corpora. A TMR is, essentially, a graphical model of meaning. These can be edited by humans and used as gold standard data for evaluation or training. To evaluate the automatically extracted data, an OntoSearch concept distance measure was used to tell how different two ideas are. The evaluation used human-corrected data, corrected after each of the three stages: preprocessing, syntactic analysis, and semantic analysis. This allows errors to be somewhat localized, indicating how much (Amdahl-style) improvement would come from replacing any one of the stages with an oracle. This type of evaluation is useful for blame assignment, and also helps to train one layer specifically for the higher layers in the system.
Dr. Nirenburg's lab has a Java environment to generate gold standard data, run analysis, and do evaluation online. This is more complete than ViPER, which cannot do the whole circuit online. There is an entire book, 'Experimental Environments for Computer Vision', dedicated to that sort of closed-loop evaluation system for vision. I would like ViPER to reach that level, but it doesn't seem that important at the moment, since I would rather focus on evaluating pre-existing systems than on developing plug-in wrappers for them.
Dr. Nirenburg mentioned that production of gold standard TMRs takes on the order of ten minutes per sample. This is quick. A large number of these would make a good training set, and be good for evaluation as well. They are proposing to generate TMR samples from a corpus of 100,000+ items. They also noticed that using their system's output turned the gold standard authors into editors, which helped to reduce interannotator disagreement, as well as greatly accelerating the process. I still worry about bias, but I suppose, for evaluating their own system rather than another, this will at least be useful to demonstrate progress.
- posted by David @ 12:27 PM
Friday, May 07, 2004
ACM Open Source Multimedia Applications
Nevenka Dimitrova stopped by today to give a seminar on using vision techniques for information extraction, and mentioned that the ACM is hosting a competition for open source multimedia applications. Her own work is very interesting; she works with Philips Labs on next-generation PVR and lifelog devices (a TiVo that watches TV). Some of the more interesting topics included screenplay-caption alignment for scene detection and other kinds of data fusion for set-top-box applications.
- posted by David @ 2:52 PM
Sunday, May 02, 2004
SILVER: Video editing
I haven't had time to read all their papers yet, but CMU's SILVER video editing project, which is heavily influenced by Informedia, has plenty of interesting ideas about editing video, many of which could apply to editing media annotations. For more information, see the list of Silver publications.