Dim P. Papadopoulos, Vicky S. Kalogeiton, Savvas A. Chatzichristofis and Nikos Papamarkos
"Automatic Summarization and Annotation of Videos with Lack of Metadata Information"
Expert Systems With Applications, Volume 40, Issue 14, Pages 5765–5778, 2013.
Advances in computer and network infrastructure, together with the rapid evolution of multimedia data,
have drawn growing attention to the development of digital video. The scientific community has
intensified research into new technologies aimed at improving the utilization of digital video: its
archiving, indexing, accessibility, acquisition, storage, processing and usability. All of these aspects
of video utilization require extracting the important information contained in a video, especially when
metadata are lacking. The main goal of this paper is to construct a system that automatically generates
and provides all the essential information of a video, in both visual and textual form. Using either the
visual or the textual information, a user can locate a specific video and can also rapidly grasp its
basic points and main concept without having to watch it in full. The visual information of the system
comes from a video summarization method, while the textual information derives from a keyword-based
video annotation approach. Because the video annotation technique operates on the keyframes that
constitute the video abstract, the first part of the system consists of the new video summarization
method.
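To make the idea of keyframe-based keyword annotation concrete, the following Python sketch aggregates keywords over a set of keyframes by simple nearest-neighbour tag voting. It is a generic stand-in and does not reproduce the paper's annotation model: the tagged reference collection, the two-dimensional descriptors, the L1 distance and the parameters `n_closest` and `top_k` are all hypothetical assumptions chosen for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical types: a descriptor is a plain list of floats, and a reference
# image is a (descriptor, tags) pair taken from some tagged photo collection.
Vector = Sequence[float]
TaggedImage = Tuple[Vector, List[str]]


def l1_distance(a: Vector, b: Vector) -> float:
    """L1 distance between two descriptors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def annotate_keyframes(
    keyframes: List[Vector],
    reference: List[TaggedImage],
    n_closest: int = 2,
    top_k: int = 3,
) -> List[str]:
    """Return the top_k tags voted for by the n_closest reference images of each keyframe."""
    votes: Counter = Counter()
    for feat in keyframes:
        # Rank the tagged reference images by distance to this keyframe.
        nearest = sorted(reference, key=lambda item: l1_distance(feat, item[0]))[:n_closest]
        for _, tags in nearest:
            votes.update(set(tags))  # each reference image votes once per tag
    # Most-voted tags first; ties are broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_k]]


if __name__ == "__main__":
    reference = [
        ([1.0, 0.0], ["beach", "sea", "sun"]),
        ([0.9, 0.1], ["sea", "boat"]),
        ([0.0, 1.0], ["city", "night"]),
        ([0.1, 0.9], ["city", "street"]),
    ]
    keyframes = [[0.95, 0.05], [0.05, 0.95]]
    print(annotate_keyframes(keyframes, reference))  # ['city', 'sea', 'beach']
```

Pooling the votes over all keyframes, rather than annotating each keyframe separately, reflects the aim that the keywords describe the video as a whole; whether the paper aggregates in this way is not stated here.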
In the proposed video abstraction technique, each frame of the video is first described by the Compact
Composite Descriptors (CCDs) and a visual word histogram. The approach then uses the Self-Growing and
Self-Organized Neural Gas (SGONG) network to classify the frames into clusters. Extracting a
representative keyframe from every cluster yields the video abstract. The most significant advantage of
this video summarization approach is its ability to determine the appropriate number of final clusters
dynamically. Subsequently, a new video annotation method is applied to the generated video summary,
automatically producing keywords capable of describing the semantic content of the given video. This
approach is based on the recently proposed N-closest Photos Model (NCP). Experimental results on several
videos are presented both to evaluate the proposed system and to demonstrate its effectiveness.
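The summarization stage can be approximated by the Python sketch below. It does not implement SGONG. Instead it uses a much simpler threshold-based online clustering, a hypothetical stand-in that only mimics the property highlighted above: the number of clusters is not fixed in advance. The per-frame feature vectors stand in for the CCD and visual-word descriptors. The L1 distance, the `threshold` parameter and the choice of the frame nearest each cluster mean as its keyframe are all assumptions.

```python
from typing import List, Sequence

# Hypothetical per-frame feature vector (stand-in for CCD + visual-word histograms).
Vector = Sequence[float]


def l1_distance(a: Vector, b: Vector) -> float:
    """L1 distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def grow_clusters(frames: List[Vector], threshold: float) -> List[List[int]]:
    """Group frame indices into clusters whose number is not fixed in advance.

    A frame joins the nearest existing cluster if its distance to that
    cluster's centroid is at most `threshold`; otherwise it starts a new one.
    """
    centroids: List[List[float]] = []
    members: List[List[int]] = []
    for idx, feat in enumerate(frames):
        if centroids:
            dists = [l1_distance(feat, c) for c in centroids]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] <= threshold:
                members[best].append(idx)
                n = len(members[best])
                # Incrementally update the centroid as the mean of its members.
                centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], feat)]
                continue
        # No sufficiently close cluster: grow a new one seeded by this frame.
        centroids.append(list(feat))
        members.append([idx])
    return members


def select_keyframes(frames: List[Vector], clusters: List[List[int]]) -> List[int]:
    """Pick, per cluster, the frame closest to the cluster mean; return sorted indices."""
    keyframes = []
    for cluster in clusters:
        dim = len(frames[cluster[0]])
        mean = [sum(frames[i][d] for i in cluster) / len(cluster) for d in range(dim)]
        keyframes.append(min(cluster, key=lambda i: l1_distance(frames[i], mean)))
    return sorted(keyframes)


if __name__ == "__main__":
    frames = [
        [1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.2, 0.0],  # shot A
        [0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.2, 0.8, 0.0],  # shot B
        [0.0, 0.0, 1.0],                                     # shot C
    ]
    clusters = grow_clusters(frames, threshold=0.5)
    print(clusters)                            # [[0, 1, 2], [3, 4, 5], [6]]
    print(select_keyframes(frames, clusters))  # [1, 4, 6]
```

Lowering `threshold` produces more, smaller clusters and therefore a longer summary. In this simplified scheme the cluster count depends on that single parameter and on frame order, whereas SGONG adapts its number of neurons through its own training criteria.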