Blog for work on my Master's thesis - a survey of methods for evaluating media understanding, object detection, and pattern matching algorithms. Mostly, it relates to ViPER, the Video Performance Evaluation Resource. If you find a good reference, or would like to comment, e-mail viper at cfar.umd.edu.
Media Processing Evaluation Weblog
Wednesday, February 26, 2003
I've started setting up a SourceForge project for ViPER. Thanks to Dave, I can now release it under the GPL. It has some non-GPL'd code checked into the CVS tree right now (or, rather, some non-GPL'd jar files), so I'll have to move those to a separate package before I can set up the CVS tree on their servers, or at least modify the documentation to fulfill their license agreements for redistribution (standard attribution requirements).
I'll probably move this blog there, as well.
- posted by David @ 6:06 PM
Thursday, February 20, 2003
A Performance Evaluation Protocol for Graphics Recognition Systems
This paper presents an evaluation protocol for vector graphics recognition systems, with a description of metrics for line segment, arc, circle, and text box entities. It presents a method for matching ground truth entities to detected entities, including similarity measures for the aforementioned types and an algorithm for selecting matches. The results are simply the list of matches and the similarity measures for each pairing; further interpretation is left out of the paper, which only mentions that 'performance measurements for the recognition system can be formulated' and that 'a linear combination of... the matching results... can be summed to produce an overall score relevant to the application'. For an example of this, see the referencing paper.
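As a rough illustration of that 'linear combination' idea (the weights, entity types, and field names here are my own, not the paper's notation), something like this would turn the matching results into an application-specific overall score:

def overall_score(matches, weights):
    """matches: list of (entity_type, similarity) pairs, one per matched pairing.
    weights: per-type weights chosen for the application."""
    return sum(weights.get(kind, 1.0) * sim for kind, sim in matches)

# e.g. weighting text boxes heavily for a text-oriented application:
score = overall_score(
    [("line", 0.92), ("arc", 0.80), ("textbox", 0.75)],
    {"line": 1.0, "arc": 0.5, "textbox": 2.0},
)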
The paper makes references to a 1995 benchmark for dashed line recognition for the first International Workshop on Graphics Recognition at Penn State University. I should probably ask someone at PSU for a copy.
One of the more interesting things the paper discusses that ViPER-PE does not consider is using different matching metrics for heterogeneous pairings of target and candidate entities, i.e. using a different metric for matching an arc segment to a line segment than for arc-to-arc or line-to-line.
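A sketch of what that could look like in code; the measure functions below are placeholder stubs, not the paper's definitions, and only the dispatch on the type pairing is the point:

def line_to_line(t, c): return 0.0   # placeholder line-to-line measure
def arc_to_arc(t, c):   return 0.0   # placeholder arc-to-arc measure
def arc_to_line(t, c):  return 0.0   # placeholder cross-type measure

METRICS = {
    ("line", "line"): line_to_line,
    ("arc", "arc"):   arc_to_arc,
    ("arc", "line"):  arc_to_line,
    ("line", "arc"):  lambda t, c: arc_to_line(c, t),
}

def similarity(target, candidate):
    # choose the metric based on the (target type, candidate type) pairing
    return METRICS[(target["kind"], candidate["kind"])](target, candidate)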
The presentation of the multiple matching (only many/one and one/many) does not include any explanation, theoretical or otherwise. The benchmark first gets one-to-one matches with a sort of hobbled Hungarian algorithm (it might actually be a full-fledged Hungarian algorithm, but I doubt it). Then it looks at groups of matches whose residual similarities, those greater than a lower threshold, sum to a total similarity measure greater than a larger upper threshold; these are taken as many/one or one/many matches. At least the one other paper I found that used the Hungarian algorithm tried to show why it would be a good idea. I would like to pose a similar argument for ViPER-PE's multiple matching algorithm, but I think I need to take some more algorithms classes first.
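Here is roughly how I read the two-stage matching, as a sketch rather than the paper's actual procedure; the greedy first stage stands in for the 'hobbled' Hungarian step, and the threshold names are mine:

def match(sim, lower_thresh, upper_thresh):
    """sim: 2D list, sim[i][j] = similarity between ground truth entity i and detection j."""
    n_truth, n_det = len(sim), len(sim[0])

    # Stage 1: greedy one-to-one matching -- repeatedly take the best remaining
    # (truth, detection) pair whose similarity clears the lower threshold.
    pairs = sorted(((sim[i][j], i, j) for i in range(n_truth) for j in range(n_det)),
                   reverse=True)
    used_t, used_d, one_to_one = set(), set(), []
    for s, i, j in pairs:
        if s >= lower_thresh and i not in used_t and j not in used_d:
            one_to_one.append((i, j))
            used_t.add(i)
            used_d.add(j)

    # Stage 2: for each unmatched truth entity, gather detections whose residual
    # similarity clears the lower threshold; accept the group as a one-to-many
    # match if the similarities sum past the upper threshold.
    one_to_many = []
    for i in range(n_truth):
        if i in used_t:
            continue
        parts = [j for j in range(n_det) if sim[i][j] >= lower_thresh]
        if parts and sum(sim[i][j] for j in parts) >= upper_thresh:
            one_to_many.append((i, parts))

    return one_to_one, one_to_many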
Reference
@incollection{Phillips1997,
  author    = {Ihsin T. Phillips and Jisheng Liang and Atul K. Chhabra and Robert Haralick},
  title     = {A Performance Evaluation Protocol for Graphics Recognition Systems},
  booktitle = {Lecture Notes in Computer Science},
  editor    = {K. Tombre and A. Chhabra},
  volume    = {1389},
  year      = {1997},
  pages     = {372--389}
}
- posted by David @ 3:42 PM
A Benchmark for Graphics Recognition Systems
A. Chhabra and I. Phillips present a system for testing recognition of various elements of CAD drawings. The system can take files in the AutoCAD .dxf format and convert most of the visual entities into its own .vec format, which it may then perturb before rasterizing the output. Then it can run the recognizers on the raster images and compare the results with the original .vec files. They use the evaluation described in an earlier paper.
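My reading of that pipeline, as a sketch; every function here is a hypothetical placeholder stub, not anything from the benchmark itself:

def dxf_to_vec(path):    return []          # placeholder: parse the .dxf into .vec entities
def perturb(entities):   return entities    # placeholder: add vector-level noise
def rasterize(entities): return None        # placeholder: render the entities to a raster image
def compare(truth, det): return {}          # placeholder: match detections against ground truth

def run_benchmark(dxf_path, recognizer):
    truth = dxf_to_vec(dxf_path)
    image = rasterize(perturb(truth))
    return compare(truth, recognizer(image))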
The evaluation combines the results from the metrics described above into a weighted average for the detection rate, instead of the simpler rate that ViPER currently uses. It also produces a false alarm rate and a miss rate instead of precision/recall; this is more consistent with their conception of multi-match results. They also offer an 'Editing Cost' metric, a weighted sum of the number of false or missed entities and the number of merge events. These are a useful set of metrics. The paper includes some evaluation, but unfortunately 'we set... w6 and w7 to zero', the weights of the merges in the 'Editing Cost' metric. There is also no analysis of the significance of their results.
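For concreteness, roughly what these quantities look like, with my own variable names; the paper's actual weights w1..w7 and its exact averaging are not reproduced here:

def miss_and_false_rates(n_truth, n_detected, n_matched_truth, n_matched_detected):
    miss_rate = (n_truth - n_matched_truth) / float(n_truth)            # ground truth left unmatched
    false_rate = (n_detected - n_matched_detected) / float(n_detected)  # detections left unmatched
    return miss_rate, false_rate

def editing_cost(n_false, n_missed, n_merges, w_false=1.0, w_miss=1.0, w_merge=1.0):
    # weighted sum of false entities, missed entities, and merge events;
    # their reported experiments effectively dropped the merge weights (w6, w7 = 0)
    return w_false * n_false + w_miss * n_missed + w_merge * n_merges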
The perturbation of the vector data stands in contrast to LAMP's evaluation software for fully automated OCR evaluation, which uses Tapas Kanungo's software for adding noise to raster images. Kanungo's software is backed by some math, while Chhabra and Phillips's vector perturbation algorithm is only vaguely outlined in the paper.
Reference
@incollection{Atul1998,
  author    = {Atul K. Chhabra and Ihsin T. Phillips},
  title     = {A Benchmark for Graphics Recognition Systems},
  booktitle = {Empirical Evaluation Techniques in Computer Vision},
  year      = {1998},
  pages     = {28--44}
}