Shpungin, Boris
Coauthor(s):
UCSD
Cognitive Science
4249 Nobel Dr. #9
San Diego, CA 92122-1115
mplab.ucsd.edu
A System for Robustly Tracking Faces in Real-time
I will present work in progress on a face tracking
system that operates in real time on streaming
color images from a video camera and presently
consists of two stages. The multi-stage
approach follows recent ideas in real-time natural
signal processing, which involve applying graded
and escalating levels of analysis to the signal over
time, as information accumulates online from
previous computational steps. The first stage
uses a mixture-of-hues model combined with a
spatial blob bias to locate a region of interest in the
image and then track it efficiently and robustly,
resisting occlusion, rapid accelerations, and varying
levels of illumination. The second stage classifies images
based on whether they contain a face. It is applied to
the region of interest selected by the first stage,
and is particularly useful in enabling the tracker to
resist distractions such as non-face parts of the
body, or flesh-colored objects in the background.
This classification module needs to be tolerant of
widely varying orientations of the face, including
extreme in- and out-of-plane rotations; it must also
be tolerant of varying backgrounds, diverse facial
features and expressions, other non-face parts of
the human body, and random natural and artificial
images. At present, the best candidate for the
module is a hierarchical mixture of experts network
operating on a vector of magnitudes produced by a
grid of Gabor jets applied to the automatically
normalized subimage. The current system
approaches an overall discrimination accuracy
of 80%, with the possibility of further improvement.
Work is also in progress to evaluate alternative
architectures, including support vector machines,
as applied to this problem. Additionally, time
permitting, I plan to spend the coming month and a
half investigating ways to make the second stage
resistant to partial occlusion of the face, with more
results to show at the symposium if all goes well. Results and
performance figures for the tracker will be presented
along with analysis of how the architecture is solving
the problem. The tracker is designed to compute
efficiently, use minimal resources, and operate at 30
frames per second on a mid-range PC, all of which
it currently does. If possible, I hope to
provide an online demonstration.
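
To make the first stage concrete, the following is a minimal sketch of one
tracking update, not the system's actual implementation. It assumes the
mixture of hues model is a Gaussian mixture over circular hue values in
[0, 1), and that the spatial blob bias is a Gaussian weighting centered on
the previous estimate. Neither detail is specified above, and all function
and parameter names (hue_mixture_likelihood, blob_bias, track_step, means,
sigmas, weights, scale) are hypothetical.

```python
import numpy as np


def hue_mixture_likelihood(hue, means, sigmas, weights):
    """Per-pixel likelihood under an assumed Gaussian mixture over hue.

    hue is an (H, W) array with values in [0, 1). Hue is circular, so the
    distance to each component mean wraps around at 1.
    """
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    weights = np.asarray(weights, dtype=float)
    d = np.abs(hue[..., None] - means)      # shape (H, W, K)
    d = np.minimum(d, 1.0 - d)              # circular hue distance
    norm = sigmas * np.sqrt(2.0 * np.pi)
    comps = weights * np.exp(-0.5 * (d / sigmas) ** 2) / norm
    return comps.sum(axis=-1)


def blob_bias(shape, center, scale):
    """Assumed spatial bias: a Gaussian bump around the previous center."""
    rows, cols = np.indices(shape)
    d2 = ((rows - center[0]) / scale[0]) ** 2
    d2 = d2 + ((cols - center[1]) / scale[1]) ** 2
    return np.exp(-0.5 * d2)


def track_step(hue, center, scale, means, sigmas, weights):
    """One update: weight hue likelihood by the blob bias, take the centroid."""
    score = hue_mixture_likelihood(hue, means, sigmas, weights)
    score = score * blob_bias(hue.shape, center, scale)
    total = score.sum()
    if total <= 0.0:
        # No evidence in this frame: keep the previous estimate.
        return (float(center[0]), float(center[1]))
    rows, cols = np.indices(hue.shape)
    new_row = float((score * rows).sum() / total)
    new_col = float((score * cols).sum() / total)
    return (new_row, new_col)
```

Calling track_step once per frame, and feeding each returned center into
the next call, would pull the estimate toward the nearest region whose hue
matches the model. The spatial bias is what lets such a tracker ignore
similarly colored regions far from the previous position.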