LAMP     The Language and Media Processing Laboratory

Blog for work on my Master's thesis, a survey of methods for evaluating media understanding, object detection, and pattern matching algorithms. Mostly, it is related to ViPER, the Video Performance Evaluation Resource. If you find a good reference, or would like to comment, e-mail viper at cfar.umd.edu.


Media Processing Evaluation Weblog

Friday, June 27, 2003

Interface (Chronicle & Canvas) Ideas

I've been writing up the spec and requirements for the ViPER chronicle widget, and found this Gantt chart app, which is under the GPL, as is ViPER. This may be useful to Nagia on her project as well; I'll have to send her the link.

Also, I've been talking with Dave about modifying the input mechanism for bounding boxes to be more efficient (the old term was 'ergonomic', but I think that means something different now) for power users, which is pretty much anybody who has to use the application for its original purpose. Trust me: after eight hours of ground truthing a guy standing around and reading a newspaper while the handheld camera shakes for four minutes of footage, anybody will look for the most efficient means of input, no matter how hard it is to learn or how far it strays from the standard mouse-drawn box editing metaphors.

In other news, I am close to having undo work properly in the Config editor. This will be the first demonstration that my plan to rebuild ViPER for version four might be feasible, which I'd been beginning to doubt. As someone who has a hard time shaking the 'bottom-up' approach (I like to mess with details too much), I often find it difficult to get to a stage where things work together.

Monday, June 23, 2003

More interesting code

I found a few more interesting links at freshmeat. One is a flow based video processing toolkit in Squeak. The other is similar, but for Linux, called Veejay.

Wednesday, June 18, 2003

Functional Needs for ViPER

ViPER, as its name states, is used for evaluating video processing algorithms. The use cases I've put together so far detail the situation. A person wishes to evaluate an algorithm, and so must develop ground truth data and a means of comparison. ViPER, with its ground truth editor and a comparison tool, provides a great deal of assistance, but the user must still spend a considerable amount of effort deciding on the metrics and programming viper-pe, not to mention the man-hours that are often required to generate ground truth (and the forethought that goes into designing the schema). So ViPER must make this possible, and should make it less painful than it currently is.
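To make the comparison step concrete, here is a minimal sketch of the kind of frame-level scoring a comparison tool performs. The box format, overlap metric, and threshold here are my own illustrative assumptions, not viper-pe's actual data model: each ground truth box is greedily matched against detected boxes by area of overlap, yielding counts of hits, misses, and false alarms.

```python
def box_overlap(a, b):
    """Intersection over union for two (x, y, w, h) boxes."""
    iw = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    ih = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

def frame_scores(truth, detected, threshold=0.5):
    """Count (hits, misses, false alarms) for one frame.

    Greedy matching: each detected box may satisfy at most one
    ground truth box.
    """
    matched = set()
    hits = 0
    for t in truth:
        best = max(range(len(detected)),
                   key=lambda i: box_overlap(t, detected[i]),
                   default=None)
        if (best is not None and best not in matched
                and box_overlap(t, detected[best]) >= threshold):
            matched.add(best)
            hits += 1
    misses = len(truth) - hits
    false_alarms = len(detected) - len(matched)
    return hits, misses, false_alarms

truth = [(10, 10, 50, 20)]
detected = [(12, 11, 50, 20), (200, 200, 30, 30)]
print(frame_scores(truth, detected))  # (1, 0, 1): one hit, one false alarm
```

Precision and recall for a whole clip then fall out of summing these counts across frames.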

Where should ViPER be in Three Years

So, in 2006, when I should theoretically have a PhD in hand, I will look back at ViPER and ask, 'What has come out of it?' I mean, looking at something like Haystack, which produces a freakishly large number of publications, the most obvious thing I need to do is write papers. I would also like ViPER-GT to be a stable platform that is modular and extensible. The same can be said for ViPER-PE, but there I would rather focus on getting it to work in a larger system.

There are three kinds of papers I see coming out of the project: user interface papers relating to GT, information architecture papers relating to the ViPER-API, and processing evaluation papers relating to PE (and the system as a whole). The only possibly innovative thing coming out of the GUI end is the timeline control, although a user study with a good outcome might be worth a paper as well; either way, a TR might be in order. A paper about the RDF-based application loader might also be worthwhile, but I think it is only interesting enough for a TR and maybe a few extra pages on the web site. As for the development of the API and file format, these probably go more towards the multimedia metadata community than a description logic or ontology group, as the API doesn't support much real reasoning. There is some literature about using DLs for multimedia retrieval, but I haven't found much in the way of evaluation of the proposed systems. [Work by Ardizzone and Hacid; Carrive, Pachet, and Ronfard; and Bechhofer and Goble] As for the evaluation side of things, this is obviously ViPER's strength and its purpose.

In the near future, I would like to get a system working for evaluating activity detection. I'd like to get it working on Nagia's work, as well as on the Flightline Activity use case, as those are situations where I have actual data and can get feedback from the algorithms' designers. This might lead to a paper (hopefully, a thesis). The system could be expanded and made more user-friendly, including better output and a more convenient front-end. The evaluation is slow; the order n-cubed Hungarian algorithm is dominated by the order n-squared computation of the distances, which is itself sometimes dominated by the slow parser, or even by Java virtual machine overhead. So a more efficient system could be designed, but I think what would be more interesting is a more usable one, with a better set of support scripts and possibly some more interesting forms of output.
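The two phases mentioned above can be sketched like this: first the n-squared distance matrix between ground truth objects and candidates is computed, then an optimal one-to-one assignment is found. This toy brute-forces the assignment over permutations instead of using the Hungarian algorithm, which is only viable for very small, square cases; the centroid distance function is an illustrative stand-in for whatever metric is configured.

```python
from itertools import permutations

def distance_matrix(truth, candidates, dist):
    """The O(n^2) phase: all pairwise distances."""
    return [[dist(t, c) for c in candidates] for t in truth]

def best_assignment(matrix):
    """Return (total_cost, mapping) minimizing summed distances.

    Assumes a square matrix; real systems use the O(n^3)
    Hungarian algorithm here instead of O(n!) brute force.
    """
    n = len(matrix)
    best_cost, best_map = float('inf'), None
    for perm in permutations(range(n)):
        cost = sum(matrix[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_cost, best_map = cost, perm
    return best_cost, best_map

def sq_dist(a, b):
    """Squared centroid distance between (x, y) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

truth = [(0, 0), (10, 10)]
candidates = [(11, 9), (1, 0)]
m = distance_matrix(truth, candidates, sq_dist)
print(best_assignment(m))  # (3, (1, 0)): truth[0] pairs with candidates[1]
```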

Wednesday, June 11, 2003

External Memories

There are several projects I know of to keep track of everything. I would like to have my own personal media database, a camera attached to my retina, and a record of everything I've ever said or written. MSR is working on their own project, as is MIT, with Project Oxygen, and now it seems that Daniel, one of the co-authors of the video-mining book I mentioned earlier, is working on something similar as well.

See Also:

Tuesday, June 10, 2003

Use Case: Flightline Activity

I need to come up with a use case for activity detection in airport runway surveillance. There's already a lot of ground truth.

Friday, June 06, 2003

Config (Schema/Ontology) Editor

So, we need a new component for ViPER: an ontology editor. Since the new system will use the viper-api, it is likely we'll have a working config editor before we have a new working system. Anyway, we have to decide how to do it. A simple solution would be to use a tree view with tear-off property sheets, while a better one would also offer a UML view. If we do that, the program would most likely use an existing open source tool for the view, such as the one in ArgoUML.

Link

Use Case: Text Tracking

In order to do good OCR in video, it is often necessary to fuse multiple frames of video together to provide what is known as superresolution enhancement. The text can then be interpreted with greater accuracy. Also, if text scrolls off of or onto the screen, or is occluded, it may become necessary to combine multiple runs of OCR into a single text entity. Key to both of these steps, in current systems, is text tracking.
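The fusion step can be sketched as plain temporal averaging of already-registered frames; real superresolution also estimates sub-pixel motion between frames, but averaging alone shows how combining observations suppresses noise. Representing frames as nested lists of grayscale values is an assumption for illustration, not any particular system's format.

```python
def fuse_frames(frames):
    """Average co-registered grayscale frames pixel by pixel.

    Assumes all frames have the same dimensions and are already
    aligned (the job of the text tracker).
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]

# Three noisy observations of the same 1x3 strip of text pixels.
frames = [[[100, 210, 90]],
          [[110, 190, 100]],
          [[90, 200, 110]]]
print(fuse_frames(frames))  # [[100.0, 200.0, 100.0]]
```

With independent noise, the averaged strip is closer to the underlying signal than any single frame, which is what buys the OCR stage its extra accuracy.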

Thursday, June 05, 2003

LTI-Lib

After submitting a notice for ViPER to freshmeat today, I saw an interesting notice there for a C++ image processing and computer vision algorithm library.

Link

Wednesday, June 04, 2003

Use Case: Outdoor Surveillance Footage

Most of the outdoor surveillance footage we have is collected from a set of cameras located around the building. Most of it includes people walking around, getting in and out of cars, and moving packages. There is footage of some thefts and some phone calls. The idea of activity detection in an outdoor setting is very open ended. We could easily capture footage of cricket players and picnickers in the courtyard, for example. Since we only have cameras on some of the entrances, we can't track the comings and goings of everyone, and probably won't try, although we may attempt to catalogue the arrivals and departures of some individuals who give consent. To avoid such problems, we are going to focus, for this use case, on thefts, running, package delivery, and otherwise suspicious or noteworthy behaviour.

We want to detect these activities, and determine how well we've detected them. Usually, such activities are pseudo-hierarchical: we detect a person going in, and that person is running, and that person is named Jerry. What does it mean when the processor says that Jerry is skipping? The detection of an 'entering the building' event is still correct, but the finer-grained activity was a false detection. Should this count as separate events, then: one correct ('entering'), one false ('skipping'), and one missed ('running')? An alternative would be to first check at the highest level, assign a score, then apply evaluations at finer and finer grains.
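The coarse-to-fine alternative can be sketched as follows: compare the event descriptions level by level, granting credit until the first mismatch, after which all finer levels lose credit. Describing an event as a tuple ordered from coarse to fine is an illustrative assumption here, not ViPER's actual event model.

```python
def hierarchical_score(truth_event, detected_event):
    """Per-level credit, coarse to fine: 1.0 until the first
    mismatch, then 0.0 for that level and all finer ones."""
    scores = []
    matched = True
    for t, d in zip(truth_event, detected_event):
        matched = matched and (t == d)
        scores.append(1.0 if matched else 0.0)
    return scores

truth = ('entering', 'running', 'Jerry')
detected = ('entering', 'skipping', 'Jerry')
print(hierarchical_score(truth, detected))  # [1.0, 0.0, 0.0]
```

So the 'entering' detection keeps its credit, while 'skipping' fails at the activity level and, under this scheme, drags the identity level down with it; whether that last choice is fair is exactly the open question above.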

Use Case: Finding Activities in a Conference Room

There are a variety of activities that take place in a conference room. For most of its existence, a conference room lies dormant, with the lights off. When the lights are on, there can either be people inside or crews managing and cleaning the space. Most interesting, from the perspective of video processing, are the conferences themselves and what goes on in them.

Most of the footage we currently have of conferences consists of talking heads before a whiteboard or screen. There may be some interaction with the audience. From a high level perspective, the conference is a series of seminars, lectures and presentations given by individuals or small groups.

At the coarsest level, it would be useful to get an idea of the content of the whole conference. At a slightly finer grain, it would be useful to segment the conference video into individual presentations, and then classify them. The presentations can be divided by style (lecture v. Q & A; PowerPoint v. whiteboard) and by content (thesis, keywords, etc.). It may be necessary to further segment talks into different sections, based on which slide the presenter is on, who is speaking, or some other information.

Finally, at the finest grain we will consider here, it may be useful to do things like word/utterance segmentation, object tracking, person identification, and character recognition. These may all be used in support of the higher levels of activity detection, or may be used directly for things like categorization. However, the quality is likely to be poor enough to make these unusable as transcripts, and these also get away from our goal of activity detection.

Tuesday, June 03, 2003

Use Case: The Wherehouse

Groups Alpha and Beta are trying to develop software that monitors warehouses using digital video cameras. Team Alpha focuses on person detection and uses a rule-based system to turn person tracks and object tracks into sets of activities. Group Beta uses a more statistical approach, skipping object detection and trying to turn segment trace information into activity information. As such, the two teams have different goals, and very different ontologies.

Group Alpha's software generates a very rich set of data, with information regarding person tracks and which person is standing where. The rule system then attempts to use colorimetric data to merge different person tracks into a single person. After that, it can detect if a person is getting something, if a person is sleeping on the job, when a person transfers an object, and so forth. The major activities detected include idle time, transfer events, sleeping, running, and alarm state.

Group Beta's software is much more opaque, generating only probabilities for each of five different states: idle, active, unusual, theft, and fire.

There are two major reasons to use ViPER here: to benchmark and improve an individual system, or to determine which system best suits the customer. Teams Alpha and Beta could each develop ground truth that is similar to their output, and use ViPER to mark performance improvements as they tweak the parameters. From the customer's perspective, what is important is that the software works well on her problems.

The customer has two different uses for the software: real-time enhancement of existing monitoring systems, and for use to collect aggregate data. It makes the most sense to develop a set of ground truth that accurately reflects her situation, preferably using existing surveillance footage. She may also stage some scenarios.

Monday, June 02, 2003

Haystack and Adenine

Haystack is an interesting application that uses the idea of RDF and intertwingularity to display user data. The system uses an N3-like language called Adenine. Adenine is actually a Lisp-like scripting language with a Python-like syntax (indentation is used in addition to parens to make it cleaner) that happens to have a lot of sugar for creating and manipulating sets of RDF triples. If it becomes necessary to turn the configs for the apploader into a complete language, or to turn the loader itself from a Swing application into an SWT one, I'll have to give Haystack another look.
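Stripped of the sugar, what Adenine manipulates boils down to sets of (subject, predicate, object) triples matched against patterns with wildcards. The subjects and predicates below are made-up illustrations, not Haystack's actual vocabulary.

```python
def triples_matching(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None is a wildcard,
    mirroring the pattern queries an RDF store supports."""
    return {t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

store = {
    ('viper', 'rdf:type', 'Tool'),
    ('viper', 'dc:title', 'Video Performance Evaluation Resource'),
    ('viper-gt', 'partOf', 'viper'),
}
print(sorted(triples_matching(store, s='viper')))  # both 'viper' triples
```

An app loader config expressed this way stays declarative data, with any "language" features layered on top, which is more or less Adenine's trick.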

Prelinger

The Prelinger Archives are a large collection of motion picture ephemera, of which a portion is available at archive.org. There is an interview with the collector, Rick Prelinger, up at the O'Reilly Network. His work is useful for people like myself who rely on public domain video to do research and to publish papers. Good stuff.

Video Mining

Azriel Rosenfeld, David Doermann and Daniel DeMenthon are about to release a book that collects a wide variety of interesting papers on video mining from the DIMACS Workshop on Video Mining. I've been reading a review copy, and it looks pretty interesting, if suffering from the standard problems of edited books (redundancy and omission). I'll probably write a couple of short entries about some of the more relevant articles.

Thesis Tracking

If I am going to do my thesis, I've gotta actually get started. My first goal is to write some use cases for activities. These should get me towards actually having a good idea about how to evaluate activities, which are pretty much the current top of the semantic cake for video.

