In large-scale multimedia event detection, complex target events are extracted from a large set of
consumer-generated web videos taken in unconstrained environments. We propose a multimedia event detection
method based on Gaussian mixture model (GMM) supervectors and support vector machines. A GMM supervector
consists of the parameters of a GMM fitted to the distribution of low-level features extracted from a video clip. A GMM can be
regarded as a probabilistic extension of the bag-of-words framework, and it is therefore expected to
be robust against data insufficiency. We also propose a camera motion cancelled feature, which is a
spatio-temporal feature robust against camera motions found in consumer-generated web videos. By combining
these methods with existing features, we aim to construct a high-performance event detection system. The
effectiveness of our method is evaluated using the TRECVID MED task benchmark.
Keywords: Multimedia event detection; Feature extraction; GMM supervector; Support vector machines;
Camera motion cancelled features
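
As an illustration of the GMM supervector idea summarized above, the following is a minimal sketch, not the implementation evaluated in this work. It assumes a diagonal-covariance background GMM that is already trained, adapts only its means to the features of one clip by MAP estimation, and stacks the normalized means into a single vector. The function name gmm_supervector, the relevance factor tau, and the sqrt(weight)/sigma normalization are assumptions introduced for this example.

```python
import numpy as np


def gmm_supervector(features, weights, means, variances, tau=20.0):
    """Build a supervector for one clip from a background GMM (illustrative sketch).

    features:  (N, D) low-level features extracted from one video clip
    weights:   (K,)   mixture weights of the background GMM
    means:     (K, D) component means of the background GMM
    variances: (K, D) diagonal covariances of the background GMM
    tau:       assumed relevance factor controlling how far the means move
    """
    # Log-density of every feature under every weighted diagonal Gaussian: (N, K).
    diff = features[:, None, :] - means[None, :, :]
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    log_prob = log_prob + np.log(weights)

    # Posterior responsibilities, computed with a numerically stable softmax.
    log_prob = log_prob - log_prob.max(axis=1, keepdims=True)
    resp = np.exp(log_prob)
    resp = resp / resp.sum(axis=1, keepdims=True)

    # Soft counts per component and responsibility-weighted feature sums.
    counts = resp.sum(axis=0)      # (K,)
    first_order = resp.T @ features  # (K, D)

    # MAP adaptation of the means only; weights and variances stay fixed.
    adapted = (tau * means + first_order) / (tau + counts)[:, None]

    # Scale each adapted mean by sqrt(weight) / sigma and concatenate.
    normalised = np.sqrt(weights)[:, None] * adapted / np.sqrt(variances)
    return normalised.ravel()
```

Each clip then maps to one fixed-length vector of size K x D regardless of how many features it contains, so the vectors can be passed to a standard classifier such as a support vector machine.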