Blog for work on my Master's thesis - a survey of methods for evaluating media understanding, object detection, and pattern matching algorithms. Mostly, it is related to ViPER, the Video Performance Evaluation Resource. If you find a good reference, or would like to comment, e-mail viper at cfar.umd.edu.
Archives
Media Processing Evaluation Weblog
Wednesday, July 30, 2003
GT v4 Canvas
So, I probably should write a design spec for the canvas. I'm thinking that there is a way to handle the canvas that is more object-oriented than the old method. It is probably a question of registering the correct handlers. For example, the canvas will implement a standard selection handler, which issues an API query to get all items in a certain region. Those items are then in 'selected' mode, and will receive mouse events. And how to unselect? A mouse event trickles down to all selected items, and if none of them captures it, it is passed back to the canvas selection handler. This allows the PowerPoint-style selection behaviour: click away to deselect.
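A rough sketch of the handler idea, in Java; every type and method name here is made up for illustration and isn't part of any current API:

    // Hypothetical sketch of the selection scheme described above.
    import java.awt.Rectangle;
    import java.awt.event.MouseEvent;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;

    interface CanvasItem {
        boolean handle(MouseEvent e); // true if the item captured the event
    }

    interface CanvasModel {
        Iterator itemsIn(Rectangle region); // items intersecting the region
    }

    class SelectionHandler {
        private final Set selected = new HashSet(); // currently selected items

        // Called when the user drags out a selection region on the canvas.
        void selectRegion(CanvasModel model, Rectangle region) {
            selected.clear();
            for (Iterator iter = model.itemsIn(region); iter.hasNext();) {
                selected.add(iter.next());
            }
        }

        // Mouse events trickle to the selected items first; if none of them
        // captures the event, clicking away deselects everything.
        void dispatch(MouseEvent e) {
            for (Iterator iter = selected.iterator(); iter.hasNext();) {
                if (((CanvasItem) iter.next()).handle(e)) {
                    return; // captured by a selected item
                }
            }
            selected.clear(); // PowerPoint-style: click away to deselect
        }
    }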
And how to handle the necessary API changes? The selection represents a larger move to a standard 'query' API, accomplishing SQL-type stuff. I would like to allow faster queries (e.g. by storing appropriate things in quadtrees), but that shouldn't be necessary for single-frame queries (get all things in this box on frame 12, the kind needed for selection). Still, it would be nice to provide hooks for indexing (either for geometric data, as in this example, or for word search using Lucene), and I probably should do that. The datatypes are loaded during parsing, and they could pass the information then. Another method would be to have a separate 'indexer' class that can be associated with different datatypes, or even attributes.
ViperQuery queryObj = QueryFactory.generate("hasAttribute", "within(" + x + "," + y + ")");
Iterator resultIter = sourcefileMetadata.query(queryObj);
The iterator could then highlight the descriptors, and request event handlers.
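Continuing the snippet above (highlight and routeMouseEventsTo are just as hypothetical as the query call itself):

    while (resultIter.hasNext()) {
        Descriptor desc = (Descriptor) resultIter.next();
        canvas.highlight(desc);          // hypothetical: mark it as selected
        canvas.routeMouseEventsTo(desc); // hypothetical: it now receives events
    }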
- posted by David @ 4:51 PM
Project Plans for GT v4
I need to write some good project plans for GT. If I expect to have to coordinate between developers, it is good to have a plan, like Mozilla's roadmap.
- posted by David @ 1:46 PM
Tuesday, July 29, 2003
Segmentation Evaluations
Okay, so I already discussed the simple news story segmentation evaluation used in Daniel's book, but what about higher dimensionality? For two dimensions, there is Kanungo et al.'s PSET, developed at our lab before ViPER. Another class of methods, presented by Erik B. Dam in his thesis, Evaluation of Diffusion Schemes for Multi-scale Watershed Segmentation, is based on the idea of processing cost: how much work is required to transform the candidate into the target. For example, a minimal processing cost method would count the minimum possible number of region selections and deselections needed to convert the candidate into the target segmentation.
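As a toy sketch of the processing cost idea, assuming (for simplicity) that both segmentations are expressed as selection states over the same set of atomic regions, so the minimal cost is just the number of regions whose state has to flip:

    // Toy processing-cost sketch: both segmentations are selection states over
    // the same atomic regions, so the minimal edit cost is just the number of
    // regions whose selected/deselected state differs.
    class ProcessingCost {
        static int cost(boolean[] candidate, boolean[] target) {
            int cost = 0;
            for (int i = 0; i < candidate.length; i++) {
                if (candidate[i] != target[i]) {
                    cost++; // one selection or deselection for this region
                }
            }
            return cost;
        }
    }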
- posted by David @ 4:48 PM
New Name
I am still somewhat annoyed at the number of things that are named ViPER, or Viper, or VIPER. At least we are on the front page of Google for a 'viper video' search. Since we are the top result for 'video processing evaluation' and 'video performance evaluation,' perhaps branding beyond what we have isn't necessary.
- posted by David @ 3:44 PM
Definitions
Derived from Zhang via Koester and Spann's An Evaluation of 3D Segmentation algorithms using Seismic Variance Data.
- Analytical evaluation
- Evaluating an algorithm or technique directly from the theory itself.
- Empirical evaluation
- Evaluating something, in our case, an algorithm, from performance data.
- Goodness Measurements
- Evaluating the quality of the output based on qualities inherent in it. An example would be evaluating a translation system by reading its output without regard for the input (assuming no cheating, this works fairly well).
- Discrepancy Measurements
- Comparing the results to some gold standard or ground truth, and using that as the metric for evaluation.
According to these definitions, ViPER is a system for empirical evaluation using discrepancy methods.
- posted by David @ 3:29 PM
Monday, July 28, 2003
Tracking Evaluations
ViPER implements three major modes of evaluation: Framewise, Keyed Tracking, and Object Matching. The framewise evaluation looks at each frame of the video separately. The keyed tracking algorithm compares how well a single found object tracks a single truth object, with the matching determined before the candidates are given to ViPER. The object matching algorithm is similar to the keyed tracking algorithm, but it allows for a greater variety of evaluations and doesn't require matching beforehand. Instead, you can use a Hungarian algorithm approach to select the best possible matching, resulting in the best possible score for the result set. But how to handle splits and merges? There is a merge algorithm that starts with the optimal single matching and tries to agglomerate objects. If any agglomeration improves the overall score (the calculation of which is pretty arbitrary), then the two objects are viewed as an 'aggregate object'. To handle splits, the algorithm works on both the targets (ground truth objects) and the candidates (result objects). But is this sound? What does that mean?
It really depends on the shape of the search space. In the only space I've implemented, polygons, it seems to work, although I haven't proven it. I haven't been able to find a set of boxes that won't be combined as expected, though. There are a few problems with the output. Basically, there are few cases where a box will not be combined, which often results in a big pile of objects flattened into one. For example, several result boxes that overlap will usually be combined if they all overlap the truth object (always, if they contribute to a better overall match). Unfortunately, this may give the same numerical result for different amounts of overlap. How should these splits/merges be scored?
Let's imagine all possible ways that a candidate could give perfect precision but imperfect recall to two target objects. It would be better to reward consistency (we don't want the track to flip back and forth between the two targets often) than inconsistency. The best solution would be to stick the candidate to a single target, with each track loss counting as another bad mark. Finally, assuming that there are the same number of track switches, what is better: a switch in the middle, or one near an end? This, perhaps, should be left undefined for now. But this technique only applies in time; what about in other spaces? In convex polygons, like bounding boxes, this doesn't come up that often. But it may if we allow more general shapes, or other sets, as attributes.
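For reference, the agglomeration step described above boils down to a greedy loop something like the following sketch. Every name here is mine, not the actual ViPER-PE code, and the score function is the arbitrary part:

    // Greedy agglomeration sketch: start from the optimal one-to-one matching
    // and keep merging objects into aggregates as long as some merge improves
    // the overall score.
    import java.util.List;

    interface Scorer {
        double overallScore(List candidates, List targets);
        List merge(List objects, int i, int j); // replace i and j with an aggregate
    }

    class Agglomerator {
        static List agglomerate(List candidates, List targets, Scorer scorer) {
            boolean improved = true;
            while (improved) {
                improved = false;
                double best = scorer.overallScore(candidates, targets);
                // try every pair of candidates; the same loop would run over targets
                for (int i = 0; i < candidates.size() && !improved; i++) {
                    for (int j = i + 1; j < candidates.size() && !improved; j++) {
                        List trial = scorer.merge(candidates, i, j);
                        if (scorer.overallScore(trial, targets) > best) {
                            candidates = trial; // keep the aggregate object
                            improved = true;
                        }
                    }
                }
            }
            return candidates;
        }
    }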
What about matching generic 3d shapes? I need to look up some research on this. There is probably some good stuff about 3d segmentation and level-set evaluation for things like medical imaging.
- posted by David @ 3:03 PM
Vacation
D'oh. I won't be able to give a presentation Monday, as I will be on vacation. I'll prepare something for the week after, instead.
- posted by David @ 2:41 PM
Use Cases for Metadata Itself
So, I recently realized that I should probably publish the assumptions about the kind of metadata ViPER will be used for, since a lot of assumptions went into its design. The most obvious one is how we chose the default data types, which are fairly specific to the tasks we've used ViPER for. Daniel asked me about using ViPER for his region tracking stuff, and I don't know how well suited the current tools or data types are to the task. The main method I would choose for ground truthing would be a set of descriptors, one for each wormhole type, with the wormhole itself a dynamic bounding box, but that imposes a lot of constraints. Another possibility would be to have a 'wormhole' type with an 'object_type' attribute, which may be a necessary abuse of the data format to get evaluation to work under the current non-inheritance, non-DL regime. So, I'm going to try to come up with some use cases. They may imply the need for an API, but they won't make reference to one as such, or to any specific model, if I can avoid it.
- posted by David @ 2:39 PM
Meeting Notes for July 28th Meeting
Huanfeng Ma: Gabor Filter Based Multi-class Classifier for Scanned Document Images
Huanfeng presented his work in preparation for the IJDAR conference paper next month; well, next week, actually. The work is in development for the TIDES bilingual dictionary project. He showed some examples, like finding Hindi script in a Hindi/English dictionary for the surprise language project. The system will do script, font-face, and font-style identification to improve OCR, parsing, and extraction of dictionary information. He separated existing work by goal (script, font-face, and font-style detection methods) and by algorithm type (template based, feature based, etc.). His system will use Gabor feature vectors and work on all three classifications at once.
He went through the different stages in the system, including training. He first demonstrated symbol detection, using the Hausdorff distance to find special symbols, identified by the trainer, that are common in dictionaries and often contain information necessary for parsing. He then showed how he normalized the text (word segmentation occurs first, and the Gabor filters require a certain-sized block to work) by tiling a 64x64 square with the segment. He then showed the feature vector (four orientations in the isometric Gabor filter, with four distances). The classifier for the presented results used the distance to the closest cluster centroid.
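If I understood the classification step correctly, it amounts to nearest-centroid assignment in the Gabor feature space. A quick sketch of that stage (the Gabor feature extraction itself is the real work and is omitted here):

    // Nearest-centroid classification sketch: assign a feature vector to the
    // class whose training centroid is closest in Euclidean distance.
    class NearestCentroid {
        static int classify(double[] features, double[][] centroids) {
            int best = -1;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < centroids.length; c++) {
                double dist = 0;
                for (int i = 0; i < features.length; i++) {
                    double d = features[i] - centroids[c][i];
                    dist += d * d;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c; // index of the closest class centroid
                }
            }
            return best;
        }
    }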
He then showed some results, with the accuracy hanging around 80%. The best results were for Hindi vs. Roman, with the worst for Arabic vs. Roman. He also showed a classifier for Arial vs. Times, but that is a bit of a fixed match, as it is really just a comparison between a sans-serif and a serif face, and the roman text was a little smaller. Some errors at the classifier level included classifying an 'a' or 'an' as Korean, which seemed plausible. It looked as though many errors were due to improper segmentation of words; segmentation is the first step and uses a smoothed connected components technique, sometimes resulting in, for example, the dot being separated from a lower case 'i' in 'in.' It was especially damaging to the Arabic script, sometimes resulting in one line of text being segmented as two lines. Finally, he showed how replacing the classifier with an SVM may improve the results. Although he didn't show any numbers, the improvement was obviously significant for Arabic vs. Roman.
Arvind Karunanidhi - Programming for the Series 60 Symbian Phone
The second presentation at this week's meeting was about how to develop camera apps for the Series 60 Symbian OS-based Nokia phones. This is part of an ongoing project at LAMP to get mobile multimedia working. The goals for the project were to develop an API for native apps (i.e. not J2ME applications, but apps written in C), a viewfinder application (like the display on the back of a camera), and the ability to take and manipulate images before they are saved to a file (on the user's media card, which can take several seconds). The presentation was given with the small Nokia cameraphone, although he indicated that the API would work on the larger, UIQ/handheld style Sony Ericsson P800.
In the Symbian system for the phone, each library acts like a server, and applications connect to the DLL as a client. Arvind showed the DLLs using FExplorer, a third-party app for Symbian. He then went through the development process, where he used an emulator but found it better to run and debug on the phone itself, as that is more accurate. He's using the free Borland compiler. He also went through the method for creating a project: writing a project file (it lists the files to use, the application GUID, etc.), a build file (his example only had the project name), and a make file (with the target architecture, either armws for the ARM Windows emulator or armib for the phone itself), as well as how to create a package containing the finished application and the .sis Symbian installer, which is transferred to the phone via IR or Bluetooth.
He then presented the two applications: the image capture app and the viewfinder. He also presented some future plans, such as a barcode reader. He discussed some of the problems with that (namely, that C++ isn't well suited to the development environment) and some of the intricacies of the C API for the system, such as its descriptor passing, two-phase construction of objects, lack of C++ exception handling, and its garbage collector and memory model.
After the meeting, we discussed my presenting next week. I also need to write a proposal for the August 19th lab review.
- posted by David @ 12:46 PM
Wednesday, July 23, 2003
Use Cases for API
So, in all the use cases for ViPER that I wrote a few months back, I didn't write anything for the ViPER API. There are three programs right now that use the ViPER API: ViPER-GT, ViPER-PE, and Malach. I think the Malach project is by far the most interesting use, and it uses a fork of the API that was branched off about a year ago. Still, it is informative to have the three use cases. ViPER-PE uses an older API, developed on its own, and there isn't any reason to change it at the moment. So ViPER-GT is the only one using the in-development API right now. Regardless, here are the three use cases.
An Editor
A client requires an application that will support editing video metadata. Existing applications to do this include the IBM MPEG-7 Annotation Tool.
Comparison
The developer wishes to compare two metadata files. The purpose is to determine, quantitatively, how different they are. It will require access to at least two sets of video data, programmatic access to their descriptions, and a meaningful interpretation of their data.
A Browser
A user wishes to be able to search and browse a large (on the order of many thousands of hours) corpus of video, annotated with metadata. The metadata API must be able to access data from multiple sources and servers. It must support describing terabytes of video with gigabytes of metadata.
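To make the comparison case a bit more concrete, here is a hypothetical sketch of the kind of programmatic access it asks for. None of these types or methods exist in the current API; they are stand-ins:

    // Hypothetical sketch of the 'comparison' use case.
    import java.util.Iterator;

    interface Descriptor {
        double distanceTo(Descriptor other); // 0 means identical
    }

    interface ViperData {
        Iterator getDescriptors();                 // all descriptors in the file
        Descriptor findMatching(Descriptor other); // best match, or null
    }

    class CompareExample {
        // compares two already-parsed metadata sets and returns a total distance
        static double compare(ViperData truth, ViperData result) {
            double distance = 0;
            for (Iterator iter = truth.getDescriptors(); iter.hasNext();) {
                Descriptor t = (Descriptor) iter.next();
                Descriptor r = result.findMatching(t);
                distance += (r == null) ? 1.0 : t.distanceTo(r); // missed counts worst
            }
            return distance;
        }
    }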
- posted by David @ 6:35 PM
Monday, July 21, 2003
New Release (for configurator)
I put out a new release of ViPER, a pre-alpha of the fourth version. The only new things an end user will see are the configuration tool and a bunch of new bugs. However, I think the configuration tool is on the right track, with things long missing from the old architecture, including undo/redo (with a spiffy little manager), a recently used file history, and an API that makes it more extensible. I'll probably try to mess around with drag-and-drop support next. Anyway, here is the link to the pre-alpha quality ViPER build for July 2003.
- posted by David @ 12:50 AM
Sunday, July 20, 2003
Bootstrapping the AppLoader
So, the AppLoader requires a big RDF graph in order to load the application. But you would like to be able to just double-click on a jar file and have it run, so there must be a way to pass the necessary resources to the application. I can't figure out how to make a jar always load a given Java property, so I must either move away from, or add an alternative to, the current method of setting the Java property lal.prefs to the URI of the preferences. I think I'll probably have a default name for the file, locally, so I can package it with the jar and use the standard methods for getting resources, through the ClassLoader.
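Something like the following, where the default resource name is just a guess on my part:

    // Fall back to a preferences file packaged in the jar when the lal.prefs
    // property isn't set. The default resource name here is just a placeholder.
    import java.net.MalformedURLException;
    import java.net.URL;

    class PrefsLocator {
        static URL findPrefs() {
            String uri = System.getProperty("lal.prefs");
            if (uri != null) {
                try {
                    return new URL(uri); // the explicit override still works
                } catch (MalformedURLException e) {
                    // fall through to the packaged default
                }
            }
            // load the default file from the jar via the class loader
            return PrefsLocator.class.getClassLoader().getResource("lal.prefs");
        }
    }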
- posted by David @ 11:22 PM
Friday, July 18, 2003
Video Transcoding
So we might need to convert MPEG-2 into MPEG-4. One solution is FFMPEG. It seems to support MPEG-4, but it isn't clear whether it is a stable solution. However, I don't see any other option. It is used in most open source applications of its type, like VideoLAN and transcode, so I'm optimistic.
- posted by David @ 12:37 PM
Thursday, July 17, 2003
Web Page Concerns
I just imported the web site into CVS. This means that there will be a cron job running to update the site once a night. Since this blog is controlled by Blogger, it will still be updated in real time. It is worth noting that the web site is hosted in SourceForge's CVS repository, unlike the source code for the project, which is hosted internally on a UMIACS machine. The site is still using fried SHTML, even though it is all static content. I'll try to get the baking scripts working soon; it should improve load time.
- posted by David @ 6:58 PM
ViPER Schema Editor
Okay, so Jena2 has a fourth preview release, and now a beta; they just didn't announce it on their site. I've spent the last couple of days doing documentation; I'll try to get out an executable jar of the software soon.
- posted by David @ 1:07 PM
Monday, July 14, 2003
ViPER Schema Editor
I think I'm about ready to cut a beta for the schema editor, but it relies on a live CVS version of the in-development Jena2 RDF API. If I put out a jar, it will have to include a version of Jena2 that I compiled, as the most recent developer preview lacks several features (like the calls for setting which namespace prefixes you want during serialization) and contains a few bugs (the one that bit me and made me switch to the in-development version involved literal datatyping). So I'll probably just keep adding features. The undo history list needs some clickability, and I would like to add drag-and-drop, or at least cut-and-paste, functionality to the tree view.
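For reference, the prefix-setting calls I'm relying on from the CVS version look something like this (the viper namespace URI in the example is made up):

    // Set the namespace prefixes used during RDF/XML serialization, the Jena2
    // feature mentioned above. The viper namespace URI is a placeholder.
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    class PrefixDemo {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            model.setNsPrefix("viper", "http://example.org/viper/gt#"); // placeholder URI
            // ... build the schema model here ...
            model.write(System.out, "RDF/XML-ABBREV");
        }
    }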
- posted by David @ 7:27 PM
Meeting Notes
Today's meeting consisted of a presentation by Huiping Li, one of the post-docs here. He is working with Dr. Doermann and Yang on a new system for storing and analyzing surveillance video; he presented a vague overview of the system (it will use the MPEG standards for object-based video compression, MPEG-4, and for metadata, MPEG-7). He also presented an overview of object-based compression for video, and some ideas about when it is a better idea to use frame-based compression instead.
- posted by David @ 7:17 PM
Monday, July 07, 2003
Meeting Notes
Today's weekly meeting was composed entirely of a talk by Xavi, one of the PhD students working at the lab with Dr. Doermann and Dr. Kang. His work focuses on video genre classification. All of his results are in sports; I suppose this builds off of Vikrant Kobla's previous work on sports/non-sports classification here. His first talk was a run-through for a five-minute presentation of his work at this year's ICME in Baltimore, which is this week.
Eight slides in all, it presented a quick overview of his work. The system used motion features (from the MPEG motion vectors) and color features (a simple four-bucket histogram of white, green, yellow/brown, and don't-care) fed to an HMM to classify the videos. The results were good for the four video types he used (basketball, hockey, soccer, and football). It was not surprising to see football and soccer have the highest confusion rate.
His second talk, a draft talk for his dissertation defense, was considerably longer. It covered his more recent work, which expanded the features to include audio, and the number of sports to include tennis (on grass courts), golf, and cricket. The increase in sports certainly reduced the accuracy of the system, despite the increased information from the audio. He also showed his interface for the system, which uses WMP, and displays the features along the bottom.
- posted by David @ 1:12 PM
Sunday, July 06, 2003
Performance Assessment by Resampling: Rigid Motion Estimators
This paper starts with a discussion of the commonly cited flaws in performance evaluation methodologies, referring to a paper by Christensen and Förstner, and then presents three resampling techniques: the bootstrap, the jackknife, and the empirical influence function. The paper then goes into methods for estimating error, using rigid motion estimation as an example. I don't understand the paper; it appears that I'll need more background in numerical analysis and statistics. I do appreciate numerical bases for evaluation technique papers, and I did like this paper. It appears that much of the paper applies existing numerical methods to improve error estimates on one particular problem, something that perhaps should be done for all problems. Its references may prove more instructive.
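For my own reference, the basic bootstrap idea is: resample the data with replacement many times, re-run the estimator on each resample, and use the spread of the re-computed estimates as an error estimate. A minimal sketch, using the sample mean as a stand-in estimator:

    // Minimal bootstrap sketch: estimate the standard error of a statistic by
    // resampling the data with replacement and measuring the spread of the
    // re-computed estimates. The sample mean stands in for the real estimator.
    import java.util.Random;

    class Bootstrap {
        static double standardError(double[] data, int resamples, Random rng) {
            double[] estimates = new double[resamples];
            for (int b = 0; b < resamples; b++) {
                double sum = 0;
                for (int i = 0; i < data.length; i++) {
                    sum += data[rng.nextInt(data.length)]; // draw with replacement
                }
                estimates[b] = sum / data.length; // estimator on this resample
            }
            double mean = 0;
            for (int b = 0; b < resamples; b++) {
                mean += estimates[b];
            }
            mean /= resamples;
            double var = 0;
            for (int b = 0; b < resamples; b++) {
                var += (estimates[b] - mean) * (estimates[b] - mean);
            }
            return Math.sqrt(var / (resamples - 1)); // sample std dev of estimates
        }
    }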
Reference Link
@incollection{Matei1998,
  author    = {Bogdan Matei and Peter Meer and David Tyler},
  title     = {Performance Assessment by Resampling: Rigid Motion Estimators},
  booktitle = {Empirical Evaluation Techniques in Computer Vision},
  year      = {1998},
  pages     = {72--95}
}
- posted by David @ 5:18 PM
Tuesday, July 01, 2003
Blog Moving
I'm moving the blog from its current location at Blog*Spot to a new location within the current ViPER site at SourceForge. So, remember, go to the new location!