Multimedia Semantics: Metadata, Analysis and Interaction. Lecture Talk at the 5th Summer School on Multimedia Semantics (SSMS), August 2010, Amsterdam, The Netherlands
2. Some BIG numbers
User Generated Content (July 2010)
4.3+ billion photos (50% are public, 30% are tagged)
30+ billion photos (2.5 billion per month)
110+ million videos
24 hours uploaded / min ≈ 90 000 full length movies / week
2 billion videos served a day
Archived TV content
1.5 million hours ≈ 120 km of shelves
300000 hours | 1 petabyte / year
News content
Content difficult to search and reuse
Barely visible for the search engines
31/08/2010 - Multimedia Semantics: Metadata, Analysis and Interaction -SSMS 2010 -2
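The upload figure on slide 2 can be checked with a few lines of arithmetic. This is only a sanity check: the 90 000 movies per week comes from the slide, while the implied average movie length is derived here and is not stated in the talk.

```python
# Figure from the slide: 24 hours of video uploaded every minute.
HOURS_UPLOADED_PER_MINUTE = 24
MINUTES_PER_WEEK = 60 * 24 * 7

hours_per_week = HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_WEEK

# Derived value (an assumption of this sketch, not on the slide): the
# average movie length implied by "90 000 full length movies / week".
MOVIES_PER_WEEK = 90_000
implied_movie_length_hours = hours_per_week / MOVIES_PER_WEEK

print(hours_per_week)                        # 241920
print(round(implied_movie_length_hours, 2))  # 2.69
```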
3. Why is it so difficult to find appropriate multimedia content, to reuse and
repurpose content previously published, and to present this content in
interfaces that vary with user needs?
4. Image/Video indexing
Techniques used by mainstream search engines
search term occurs in the filename or in the caption or in user tags
no semantics
Image indexing: main problem
an image is not alphabetic: there are no countable discrete units that,
in combination, provide the meaning of the image
image descriptors are not given with the image: one needs to
extract or interpret them
Video indexing: additional problem
a video has additionally a temporal dimension to take into account
a video has a priori no discrete units either (i.e. frames, shots,
sequences cannot be absolutely defined)
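A minimal sketch of the keyword matching described on this slide, to make the "no semantics" point concrete. The `ImageRecord` class, the file names and the tags are all hypothetical; no real search engine is claimed to work exactly this way.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    filename: str
    caption: str = ""
    tags: list = field(default_factory=list)


def keyword_match(term, record):
    """Return True if the term occurs in the filename, caption or tags."""
    needle = term.lower()
    haystack = [record.filename, record.caption, *record.tags]
    return any(needle in text.lower() for text in haystack)


# Hypothetical collection: the first two photos show Churchill but never
# name him; the third only shares a word with the query.
photos = [
    ImageRecord("yalta_conference.jpg", "The Big Three", ["WWII"]),
    ImageRecord("IMG_0042.jpg"),
    ImageRecord("churchill_downs_race.jpg", "Horse race", ["derby"]),
]

hits = [p.filename for p in photos if keyword_match("churchill", p)]
print(hits)  # ['churchill_downs_race.jpg']
```

String matching returns the racetrack and misses both photos that actually depict the person, which is the gap the rest of the talk addresses.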
5. Sounds Familiar?
[Arnold Smeulders, PAMI, 2000]
The semantic gap is the lack of coincidence between the information that one
can extract from the sensory data and the interpretation that the same data
has for a user in a given situation
6. A little drop of semantics goes a long way
Jim Hendler [1997]
7. Multimedia Research Themes @EURECOM
From signal … to symbols … to meaning … to users
Content Analysis | Content Modeling & Indexing | Multimedia Semantics & Interaction
Audio processing | Video Indexation | Semantic Web
Video Segmentation | Video Summarization | Social networks
Emotion Recognition | Facial+Body Biometrics | Multimedia Interaction
Applications: Security in Multimedia, Multimedia on the Web
8. Learning Objectives
Learn how to get metadata (machine learning)
(Semantic) multimedia analysis … or the science of labeling
(Semantic) audio processing (ASR + NER + background knowledge)
Explore various multimedia metadata formats
Be aware of the advantages and limitations of various models
Know the interoperability issues and understand COMM, a Core
Ontology for Multimedia, learn about the W3C ontology for Media
Resources
Discuss exploratory interfaces based on rich
multimedia metadata semantics
Know how to link and expose your data on the web
See various multimedia presentation interfaces
9. Agenda
1. Semantics in multimedia analysis
Detecting concepts in video and speech
Evaluating interactive search tasks
2. Semantics in metadata
MPEG-7 based ontologies and COMM: a Core Ontology for
Multimedia
Expose your data following 4 basic principles and re-use a
growing number of publicly open datasets
3. Semantics in user interfaces
Provide meaningful presentation of underlying data
HTML5: a game changer for video on the web
Event-centric interfaces for browsing rich media collections
10. Overview of Canonical Processes
11. Canonical Processes Possible Flow
12. The Importance of the Annotations
13. The science of labeling
Automatically detecting the presence of a
concept in a video stream
airplane
Naming visual information
14. The Computer Vision Approach
Building detectors one at a time
a face detector for frontal faces
3 years later: a face detector for non-frontal faces
One (or more) PhD for every new concept
15. So how about these?
[Cees Snoek and Marcel Worring, SSMS, 2007]
16. A Simple Concept Detector
[Cees Snoek and Marcel Worring, SSMS, 2007]
17. Support Vector Machine
[Cees Snoek and Marcel Worring, SSMS, 2007]
19. NIST TRECVID Evaluation
Until 2001, everybody defined his own concepts
Using specific and small data sets
Hard to compare methodologies
Since 2001, worldwide evaluation by NIST
Promote progress in video retrieval research
Provide common datasets (shots, ASR, key frames)
Use open, metrics-based evaluation
Large-Scale Concept Ontology for Multimedia
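As an illustration of "metrics-based evaluation", here is a sketch of average precision over a ranked shot list, a measure commonly used for this kind of benchmark. The slide does not say which metric TRECVID applies, so treat the choice and the function name as assumptions of this example.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Average precision of a ranked list of booleans (True = relevant shot).

    total_relevant is the number of relevant shots in the whole collection;
    if omitted, the number of relevant shots in the list is used.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


# Hypothetical detector output for the concept "airplane":
# relevant shots are returned at ranks 1 and 3.
ap = average_precision([True, False, True, False])
print(round(ap, 4))  # 0.8333
```

Averaging this value over all concepts in a lexicon gives a single score per system, which is what makes methodologies comparable across groups.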
20. Success and Criticism
More and more concept detectors available:
TRECVID 2005: 101 concept lexicon
TRECVID 2006: 491 concept lexicon
MediaMill Challenge 2007: 572 concept lexicon
... but focus is on the final result
relative merit of indexing methods: intermediary steps are
ignored while systems become more complex (several
features and learning methods)
... but the concept detectors developed do not match
user information needs
21. TRECVID Interactive Video Search Task
Query selection:
by keyword,
by concept,
by example
Topics unknown
Test set
English (2004)
Chinese (2005-6)
Dutch (2007-8-9)
22. VideOlympics
Benchmark performance cannot be the sole criterion
Experience of searcher counts
Usability of systems matters
VideOlympics: live interactive search task
Simultaneous exposure of video retrieval systems
Showcase that goes beyond a regular demo session
Fun to do (participants) & fun to watch (audience)
23. VideOlympics Setup
One display
TRECVID-like queries
Results pushed by searchers
24. How to make video viewable to the blind?
What is required to make video accessible on the Web?
How to increase the number of accessible videos?
Technologies:
Annotating: automatic (speech transcription) and manual (social
collaborative annotation tool)
Addressing: pointing to, retrieving, transmitting only parts of media
Rendering: video visualization for the impaired, Braille output
Expected benefits for:
disabled people, getting better access to video
video provider, reaching a wider audience
the Web in general, using semantic annotations
25. ACAV: Collaborative Annotation for Video Accessibility
Produce (semantic) annotations of multimedia content:
Automatically: speaker diarization, speech recognition
Manually: collaborative annotations, template
Generate multimodal presentation of annotated content
Subtitles / Surtitles / Closed captioning
Braille output
Media Fragment access
26. Accessibility Features for Visually
Impaired and Blind People
Man’s actions: Put on his shoes | Walk in the street
Son’s actions: Look at his mother
Characters: The mother, her son | The son, the man | The man and his friend
Scenery: In the shop | In the street
Multimodal presentation of annotations
Annotations depend on video context and user preferences
Audio track | Auditory icons | Audio description | Braille
27. Accessibility Features for Deaf People
Mother’s dialogues: How are you?
Son’s dialogues: Hi mom | Fine and you?
Sound: Car horn
Presentation of annotations
Annotations depend on video context and user preferences
Video track | Subtitles | Surtitles
28. Producing Video Annotations
Automatic annotations:
Speaker diarization: who spoke and when?
Speech recognition: transcription
Resulting annotations:
Mother: How are you?
Son: Ho mom Fine
Social annotations:
Annotation corrections, enhancement
Audio description (for visually impaired)
Resulting annotations:
Mother: How are you?
Son: Hi mom Fine and you?
Sound: Car horn
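To show how typed annotations like these could drive a presentation for deaf users, here is a sketch that renders them as SubRip (SRT) subtitles. The timings, the dictionary layout and the choice of SRT are assumptions made for illustration; the slides do not describe the format ACAV actually uses.

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


# Corrected annotations from the slide, with hypothetical start/end times.
annotations = [
    {"start": 1.0, "end": 2.5, "type": "Mother", "text": "How are you?"},
    {"start": 2.5, "end": 4.0, "type": "Son", "text": "Hi mom. Fine, and you?"},
    {"start": 4.0, "end": 5.0, "type": "Sound", "text": "Car horn"},
]


def to_srt(items):
    """Render typed annotations as SRT cues; sounds are shown in brackets."""
    blocks = []
    for index, item in enumerate(items, start=1):
        if item["type"] == "Sound":
            line = f"[{item['text']}]"
        else:
            line = f"{item['type']}: {item['text']}"
        timing = f"{format_timestamp(item['start'])} --> {format_timestamp(item['end'])}"
        blocks.append(f"{index}\n{timing}\n{line}\n")
    return "\n".join(blocks)


print(to_srt(annotations))
```

Because the annotation type is kept separate from the text, the same data could be routed to another channel (for example Braille or speech synthesis) by swapping the rendering function.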
31. Braille Rendering
The Advene prototype emulation views
Enriched Media Player
Timeline with typed annotations
32. Preliminary study (1/2)
Semi-structured interviews with blind users (n=2)
Participants’ habits when watching programs with audio description
Audio description process
Multimodal presentations of descriptions
Requirements:
R1: generate additional descriptions and provide unobtrusive access
to descriptions (tactile access for blind Braille readers)
R2: descriptions at various levels of granularity and verbosity
R3: use system’s multimodal output to provide two or more
descriptions (e.g. speech synthesis and Braille display)
33. Preliminary study (2/2)
Goal: see whether we can use auditory icons to convey
the rhythm of the editing of a movie to blind users
e.g.: sound of a locomotive arriving from the right to convey the
concept of a tracking shot ("traveling") from right to left
Experiment and questionnaires (n=16+9)
Viewing 5 min of Ratatouille with headsets,
http://www.imdb.com/title/tt0382932/
Results:
Rhythm and movie dynamics better perceived
Usefulness of auditory icons but must be limited (5 max) and be very
different from the main soundtrack of the movie
Editing cues: change of scenes, camera movement, flashback (e.g. NCIS)
Audio zoom (e.g. Survivor)
34. ACAV Architecture
ASR Engine: Sphinx/HTK
NER + full text index with the
transcription
Interlinking with the Linked Data
Cloud to enable semantic search
35. Agenda
1. Semantics in multimedia analysis
Detecting concepts in video and speech
Evaluating interactive search tasks
2. Semantics in metadata
MPEG-7 based ontologies and COMM: a Core Ontology for
Multimedia
Expose your data following 4 basic principles and re-use a
growing number of publicly open datasets
3. Semantics in user interfaces
Provide meaningful presentation of underlying data
HTML5: a game changer for video on the web
Event-centric interfaces for browsing rich media collections
36. What is Ontology?
Ontology (from the Greek ὄν, genitive ὄντος: of being (neuter participle of
εἶναι: to be) and -λογία, -logia: science, study, theory) is the philosophical
study of the nature of being, existence or reality in general, as well as the
basic categories of being and their relations.
Science of Being (Aristotle, Metaphysics, IV, 1)
Tries to answer the questions:
What characterizes being?
Eventually, what is being?
How should things be classified?
37. Why is this Funny?
In “The analytical language of John Wilkins”*, Jorge Luis
Borges writes about a “certain Chinese encyclopaedia”
that has the following categorization of animals:
(a) belonging to the emperor,
(b) embalmed,
(c) tame,
(d) sucking pigs,
(e) sirens,
(f) fabulous,
(g) stray dogs,
(h) included in the present classification,
(i) frenzied,
(j) innumerable,
(k) drawn with a very fine camelhair brush,
(l) et cetera,
(m) having just broken the water pitcher,
(n) that from a long way off look like flies.
* http://agents.umbc.edu/misc/johnWilkins.html
38. Ontology in Computers
An ontology is an engineering artifact consisting of:
A vocabulary used to describe (a particular view of)
some domain
An explicit specification of the intended meaning of the
vocabulary.
almost always includes how concepts should be classified
Constraints capturing additional knowledge about the
domain
Ideally, an ontology should:
Capture a shared understanding of a domain of interest
Provide a formal and machine manipulable model of the
domain
39. Ontologies: more definitions
An ontology is a "formal, explicit
specification of a shared conceptualization".
Ontologies define the concepts and
relationships used to describe and represent an
area of knowledge. Ontologies are used to
classify the terms used in a particular application,
characterize possible relationships, and define
possible constraints on using those relationships.
In practice, ontologies can be very complex (with
several thousands of terms) or very simple
(describing one or two concepts only).
42. MPEG-7: a multimedia description language?
ISO standard since December 2001
Main components:
Descriptors (Ds) and Description Schemes (DSs)
DDL (XML Schema + extensions)
Concerns all types of media
[Figure: Part 5 – MDS, Multimedia Description Schemes]
Content organization: Collections, Models
User interaction: User Preferences, User History
Navigation & Access: Summaries, Views, Variations
Content management: Creation & Production, Media, Usage
Content description: Structural aspects, Semantic aspects
Basic elements: Schema Tools, Basic datatypes, Links & media localization, Basic Tools
43. MPEG-7 and the Semantic Web
MDS Upper Layer represented in RDFS
2001: Hunter
Later on: link to the ABC upper ontology
MDS fully represented in OWL-DL
2004: Tsinaraki et al., DS-MIRF model
MPEG-7 fully represented in OWL-DL
2005: Garcia and Celma, Rhizomik model
Fully automatic translation of the whole standard
MDS and Visual parts represented in OWL-DL
2007: Arndt et al., COMM model
Re-engineering MPEG-7 using DOLCE design patterns
44. Requirements [aceMedia, MMSEM XG]
MPEG-7 compliance
Support most descriptors (decomposition, visual, audio)
Syntactic and Semantic interoperability
Shared and formal semantics represented in a Web language (OWL,
RDF/XML, RDFa, etc.)
Separation of concerns
Domain knowledge versus multimedia specific information
Modularity
Enable customization of multimedia ontology
Extensibility
Enable inclusion of further descriptors (non MPEG-7)
45. MPEG-7 Based Ontologies
Ontology | Hunter | DS-MIRF | Rhizomik | COMM
Foundational Ontologies | ABC | None | None | DOLCE
Complexity | OWL-Full | OWL-DL | OWL-DL | OWL-DL
Coverage | MDS+Visual | MDS+CS | All | MDS+Visual
Applications | Digital Libraries | Digital Libraries | Digital Rights | MM Analysis
46. Common Scenario
The "Big Three" at the Yalta
Conference (Wikipedia)
47. Common Scenario: Tagging Approach
Reg1
The "Big Three" at the Yalta
Conference (Wikipedia)
Localize a region
Draw a bounding box, a circle around a shape
Annotate the content
Interpret the content
Tag: Winston Churchill, UK Prime Minister, Allied Forces, WWII
48. Common Scenario: SW Approach
Reg1
The "Big Three" at the Yalta
Conference (Wikipedia)
Localize a region
Draw a bounding box, a circle around a shape
Annotate the content
Interpret the content
Link to knowledge on the Web
:Reg1 foaf:depicts dbpedia:Winston_Churchill
dbpedia:Winston_Churchill skos:altLabel
"Sir Winston Leonard Spencer-Churchill"
dbpedia:Winston_Churchill rdf:type foaf:Person
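The three statements above can be held in a tiny in-memory triple store to show what the Semantic Web approach adds over plain tags. This is a sketch with prefixed names kept as plain strings; `match` and `regions_depicting` are hypothetical helpers, not part of any RDF library.

```python
# The triples from the slide, with prefixed names kept as plain strings.
triples = {
    (":Reg1", "foaf:depicts", "dbpedia:Winston_Churchill"),
    ("dbpedia:Winston_Churchill", "skos:altLabel",
     "Sir Winston Leonard Spencer-Churchill"),
    ("dbpedia:Winston_Churchill", "rdf:type", "foaf:Person"),
}


def match(pattern, store):
    """Return the triples matching (s, p, o); None acts as a wildcard."""
    return sorted(
        t for t in store
        if all(q is None or q == v for q, v in zip(pattern, t))
    )


def regions_depicting(rdf_class, store):
    """Find regions that depict any resource typed with rdf_class."""
    instances = {s for s, _, _ in match((None, "rdf:type", rdf_class), store)}
    return sorted(
        s for s, _, o in match((None, "foaf:depicts", None), store)
        if o in instances
    )


print(regions_depicting("foaf:Person", triples))  # [':Reg1']
```

The query "regions showing a person" succeeds although the word "person" never appears in the annotation of `:Reg1`; the answer comes from following the link to the DBpedia resource.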
49. Hunter's MPEG-7 Ontology
[Figure: RDF graph annotating http://en.wikipedia.org/wiki/Image:Yalta_Conference.jpg
with Hunter's MPEG-7 ontology]
Resources: the image (rdf:type mpeg7:image), Reg1 (rdf:type mpeg7:StillRegion),
dbpedia:Churchill, "The Big Three at the Yalta Conference"
Properties and classes shown: mpeg7:MediaLocator, mpeg7:spatial_decomposition,
mpeg7:DominantColor (rgb(25,255,255)), mpeg7:depicts, mpeg7:SpatialMask,
mpeg7:Polygon, mpeg7:Coords ("5 25 10 20 15 15 10 10 5 15"^^xsd:string)
50. DS-MIRF MPEG-7 Ontology
[Figure: RDF graph annotating http://en.wikipedia.org/wiki/Image:Yalta_Conference.jpg
with the DS-MIRF MPEG-7 ontology]
Resources: the image (rdf:type mpeg7:image), Reg1 (rdf:type mpeg7:StillRegion),
dbpedia:Churchill, contentString "The Big Three at the Yalta Conference"
Properties and classes shown: mpeg7:MediaURI, mpeg7:MediaLocator,
mpeg7:SpatialDecomposition, mpeg7:RelatedMaterial, mpeg7:CreationInformation,
mpeg7:Creation, mpeg7:Title, mpeg7:SpatialMask, mpeg7:SubRegion, mpeg7:Polygon,
mpeg7:dim, mpeg7:Coords ("5 25 10 20 15 15 10 10 5 15"^^xsd:string)
51. Rhizomik MPEG-7 Ontology
[Figure: RDF graph annotating http://en.wikipedia.org/wiki/Image:Yalta_Conference.jpg
with the Rhizomik MPEG-7 ontology]
Resources: the image (rdf:type mpeg7:image), Reg1 (rdf:type mpeg7:SegmentType),
dbpedia:Churchill, "The Big Three at the Yalta Conference"
Properties and classes shown: mpeg7:MediaLocator, mpeg7:spatial_decomposition,
mpeg7:Semantic, mpeg7:CreationInformation, mpeg7:Title, mpeg7:SpatialMask,
mpeg7:SubRegion, mpeg7:Polygon, mpeg7:dim,
mpeg7:Coords ("5 25 10 20 15 15 10 10 5 15"^^xsd:string)
53. Comparison
Link with domain semantics
Hunter: ABC model + mpeg7:depicts relationship
DS-MIRF: Domain ontologies need to subclass the general MPEG-7
categories
Rhizomik: Use the mpeg7:semantic relationship
COMM: Semantic Annotation pattern
MPEG-7 coverage
Hunter: extension of the MPEG-7 visual descriptors
COMM:
Formalization of the context of the annotation
Representation of the method (algorithm) that provides the annotation
54. Comparison
Modeling Decisions:
DS-MIRF and Rhizomik: 1-to-1 translation from MPEG-7 to
OWL/RDF
Hunter: Simplification and link to the ABC upper model
COMM: NO 1-to-1 translation
Need for patterns: use DOLCE, a well-designed foundational ontology
as a modeling basis
Scalability:
Ontology | Hunter | DS-MIRF | Rhizomik | COMM
Triples | 11 | 27 | 20 | 19
57. COMM: Design Rationale
Approach:
NO 1-to-1 translation from MPEG-7 to OWL/RDF
Need for patterns: use DOLCE, a well-designed foundational
ontology as a modeling basis
Design patterns:
Ontology of Information Objects (OIO)
Formalization of information exchange
Multimedia = complex compound information objects
Descriptions and Situations (D&S)
Formalization of context
Multimedia = contextual interpretation (situation)
Define multimedia patterns that translate MPEG-7 in the
DOLCE vocabulary
58. COMM: Core Functionalities
Most important MPEG-7 functionalities:
Decomposition of multimedia content into segments
Annotation of segments with metadata
Administrative metadata: creation & production
Content-based metadata: audio/visual descriptors
Semantic metadata: interface with domain specific ontologies
Note that all are subjective and context-dependent
situations
59. COMM: D&S / OIO Patterns
Definition of design patterns for decomposition and
annotation based on D&S and OIO
MPEG-7 describes digital data (multimedia information objects) with
digital data (annotation)
Digital data entities are information objects
Decompositions and annotations are situations that satisfy the rules
of a method or algorithm
67. W3C Ontology for Media Resources
“The ontology for media resources is meant to bridge the
different descriptions of media resources on the Web, as
opposed to media resources in local archives or musea. It is
defined based on a core set of properties which covers
basic metadata to describe media resources. Further it
defines syntactic and semantic level mappings between
elements from existing formats. The ontology is supposed
to foster the interoperability among various kinds of
metadata formats currently used to describe media
resources on the Web.”
http://www.w3.org/TR/mediaont-10/
68. Media Ontology: A useful set of mappings
Identifier | Format | Example | Reference
cl11 | CableLabs 1.1 | cl11:Writer_Display | CableLabs 1.1
dig35 | DIG35 | dig35:ipr_name/ipr_person@description='Image Creator' | DIG35
dc | Dublin Core | dc:creator | Dublin Core
ebucore | EBUCore | ebuc:creator | EBUCore
exif | EXIF 2.2 | exif:Artist | EXIF
id3 | ID3 | id3:TCOM | ID3
iptc | IPTC | iptc:Creator | IPTC
lom21 | LOM 2.1 | lom21:LifeCycle/Contribute/Entity | LOM
ma | Core properties of the MA WG | ma:creator | 4 Property definitions
media | Media RDF | media:Recording | Media RDF
mrss | Media RSS | mrss:credit@role='author' | Media RSS
mets | METS | mets:agency | METS
mpeg7 | MPEG-7 | mpeg7:CreationInformation/Creation/Creator/Agent | MPEG-7
dms | DMS-1 | dms:Participant/Person | DMS-1
tva | TV-Anytime | tva:CreditsList/CreditsItem | TV-Anytime
txf | TXFeed | txf:author | TXFeed
xmp | XMP | xmpDM:composer | XMP
yt | YouTube Data API Protocol | yt:author | YouTube Data API Protocol
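A sketch of how such a mapping table might be applied: a lookup from a format identifier to the property that plays the creator role, normalised to `ma:creator`. Only a handful of rows from the table are included, the input records are invented, and the function name is hypothetical; the actual W3C mappings are richer than a one-to-one rename.

```python
# Subset of the table above: format identifier -> property used as creator.
CREATOR_MAPPINGS = {
    "dc": "dc:creator",
    "exif": "exif:Artist",
    "id3": "id3:TCOM",
    "iptc": "iptc:Creator",
    "yt": "yt:author",
}


def to_media_ontology(source_format, record):
    """Map a format-specific record to the core property ma:creator.

    Returns an empty dict when the format is unknown or the record
    carries no creator information.
    """
    source_property = CREATOR_MAPPINGS.get(source_format)
    if source_property is None or source_property not in record:
        return {}
    return {"ma:creator": record[source_property]}


# Hypothetical metadata records in two different formats.
photo = {"exif:Artist": "Jane Doe", "exif:Model": "ExampleCam"}
clip = {"yt:author": "janedoe42"}

print(to_media_ontology("exif", photo))  # {'ma:creator': 'Jane Doe'}
print(to_media_ontology("yt", clip))     # {'ma:creator': 'janedoe42'}
```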
69. Media Ontology: classes
70. Media Ontology: object properties
71. Media Ontology: datatype properties
73. Media Ontology exemplified on Flickr
74. Linked Data Cloud
75. Linked Data Principles
Tim Berners-Lee [2006] (Design Issues)
1. Use URIs to identify things
(anything, not just documents);
2. Use HTTP URIs – globally unique names, distributed
ownership –
so that people can look up those names;
3. Provide useful information in RDF –
when someone looks up a URI;
4. Include RDF links to other URIs –
to enable discovery of related information
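Principles 2 and 3 amount to an HTTP lookup that asks for RDF. The sketch below only builds such a request with the standard library and does not send it; the helper name is hypothetical, and the assumption is that the server honours the `Accept` header through content negotiation.

```python
from urllib.request import Request


def build_rdf_lookup(uri):
    """Build (but do not send) an HTTP GET that asks for RDF/XML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


# dbpedia:Winston_Churchill expanded to its full HTTP URI.
request = build_rdf_lookup("http://dbpedia.org/resource/Winston_Churchill")

print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; the RDF returned is where the links of principle 4 are found.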
77. Image Annotation with Linked Data
Reg1
The "Big Three" at the Yalta
Conference (Wikipedia)
Localize a region (bounding box)
Annotate the content (interpretation)
Tag: Winston Churchill, UK Prime Minister, Allied Forces, WWII
Link to knowledge on the Web
:Reg1 foaf:depicts dbpedia:Winston_Churchill
----------------------------------------------
dbpedia:Winston_Churchill dbpedia:spouse
dbpedia:Clementine_Churchill
dbpedia:Winston_Churchill owl:sameAs
fbase:Winston_Churchill
81. Find connections between media
Unexpected relationships:
enable further discovery, exploration
:Clip foaf:depicts dbpedia:Boris_Yeltsin
:Clip foaf:depicts dbpedia:Bill_Clinton
:Clip foaf:depicts fbase:Laughter
Research problems
Where should we stop in the exploration?
When does it start to be intrusive for the end-user?
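A sketch of how such connections could be found once clips are annotated with shared URIs. The first clip uses the statements from this slide; the second clip and its annotations are hypothetical, as is the function name.

```python
from collections import defaultdict

# (clip, depicted resource) pairs; :ClipB is an invented second clip.
depicts = [
    (":ClipA", "dbpedia:Boris_Yeltsin"),
    (":ClipA", "dbpedia:Bill_Clinton"),
    (":ClipA", "fbase:Laughter"),
    (":ClipB", "dbpedia:Nicolas_Sarkozy"),
    (":ClipB", "dbpedia:Vladimir_Putin"),
    (":ClipB", "fbase:Laughter"),
]


def shared_entities(statements):
    """Return the resources depicted by more than one clip."""
    clips_by_entity = defaultdict(set)
    for clip, entity in statements:
        clips_by_entity[entity].add(clip)
    return {
        entity: sorted(clips)
        for entity, clips in clips_by_entity.items()
        if len(clips) > 1
    }


print(shared_entities(depicts))  # {'fbase:Laughter': [':ClipA', ':ClipB']}
```

The two clips share no person, yet they are linked through the concept both depict. How many such hops to follow is exactly the open question raised under "Research problems".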
82. Agenda
1. Semantics in multimedia analysis
Detecting concepts in video and speech
Evaluating interactive search tasks
2. Semantics in metadata
MPEG-7 based ontologies and COMM: a Core Ontology for
Multimedia
Expose your data following 4 basic principles and re-use a
growing number of publicly open datasets
3. Semantics in user interfaces
Provide meaningful presentation of underlying data
HTML5: a game changer for video on the web
Event-centric interfaces for browsing rich media collections
83. Who are the users?
Why would they use the cloud?
What tasks can be supported?
How will the semantics help?
84. How can semantics help?
Query construction
disambiguate input (auto-completion)
selection of available terms (grouping and ranking algorithms)
(Semantic) search algorithm
graph traversal
query expansion
RDFS/OWL reasoning
Presentation of search results
grouping by property
visualization on timeline, map, etc.
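Graph traversal and query expansion can be combined in a few lines. The sketch below expands a query concept with all of its narrower concepts by walking a subclass hierarchy breadth-first; the `ex:` hierarchy is invented for illustration and stands in for what RDFS reasoning over rdfs:subClassOf would provide.

```python
from collections import deque

# Hypothetical hierarchy: concept -> its direct superclasses.
BROADER = {
    "ex:Airplane": ["ex:Aircraft"],
    "ex:Helicopter": ["ex:Aircraft"],
    "ex:Aircraft": ["ex:Vehicle"],
    "ex:Car": ["ex:Vehicle"],
}


def expand_query(concept, broader=BROADER):
    """Return the concept plus every narrower concept below it."""
    # Invert the hierarchy so that it can be walked downwards.
    narrower = {}
    for child, parents in broader.items():
        for parent in parents:
            narrower.setdefault(parent, []).append(child)

    seen = {concept}
    queue = deque([concept])
    while queue:  # breadth-first traversal
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print(expand_query("ex:Aircraft"))
# ['ex:Aircraft', 'ex:Airplane', 'ex:Helicopter']
```

A search for "aircraft" would then also return shots labeled only with the "airplane" detector output.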
85. Provide meaningful presentation of data
86. ... and behind the scenes
87. ... link an artist to more data
91. Going through the Walled Gardens
David Simonds: Everywhere and nowhere. 19 May 2008, The Economist.
92. Reinventing HTML
Tim Berners-Lee (27/10/2006, blog post)
«The attempt to get the world to switch to XML … all at
once didn't work. The large HTML-generating public did not
move … Some large communities did shift and are enjoying
the fruits of well-formed systems … The plan is to charter a
completely new HTML group. »
94. Basic Layout in HTML5
95. HTML5 Audio / Video
Native support in the browser
No need for plug-ins anymore
Flash, Silverlight, Quicktime, Windows Media
DOM APIs for scripts to control the playback
<audio src="music.oga" controls>
<a href="music.oga">Download song</a>
</audio>
<video src="video.ogv" controls
poster="poster.jpg" width="320" height="240">
<a href="video.ogv">Download movie</a>
</video>
96. HTML5 Codecs
Media containers:
MPEG 4 (extension .mp4)
Ogg (extension .ogg)
AVI (extension .avi)
Flash video (extension .flv)
WebM: container based on a profile of Matroska
Media codecs:
MPEG 4: various implementations (Xvid is open source) but the codec
is covered by various patents
H.264 (MPEG-4 Part 10, AVC): high compression. It is used by YouTube
for HD and by Blu-ray
Theora: free codec. It is generally used within the Ogg container
VP8: open video compression format released by Google (On2)
97. HTML5 Audio / Video specification
Element:
<audio>, <video>
Attributes for both:
src: URL of the media container
autobuffer: boolean, media starts loading with the page (replaced by
preload in later drafts of the specification)
autoplay: boolean, media starts playing automatically
loop: boolean
controls: boolean, display default controls
(boolean attributes are switched on by their presence, not by a true/false value)
Attributes for <video>
width, height: dimensions displayed
poster: URL of a still image shown before playback starts
videoWidth, videoHeight: original dimensions of the video (read-only
DOM properties rather than markup attributes)
98. HTML5 <source> Element Demo
Use the <source> element to provide
alternative streams and let the browser choose
among them based on its media and codec support:
<audio>
<source src="music.oga" type="audio/ogg"/>
<source src="music.mp3" type="audio/mpeg"/>
</audio>
<video poster="poster.jpg">
<source src="video.3gp" type="video/3gpp"
media="handheld"/>
<source src="video.ogv" type='video/ogg; codecs="theora, vorbis"'/>
<source src="video.mp4" type="video/mp4"/>
</video>
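The selection the browser performs can be approximated as "take the first source whose type is supported". The sketch below simulates that rule only; it ignores the codecs parameter and the media attribute, and the supported-type sets are invented, so it is not a description of any particular browser.

```python
def pick_source(sources, supported_types):
    """Return the first source whose container type is supported, else None.

    sources is a list of (src, type) pairs in document order.
    """
    for src, mime in sources:
        container = mime.split(";")[0].strip()  # drop the codecs parameter
        if container in supported_types:
            return src
    return None


audio_sources = [("music.oga", "audio/ogg"), ("music.mp3", "audio/mpeg")]
video_sources = [
    ("video.ogv", 'video/ogg; codecs="theora, vorbis"'),
    ("video.mp4", "video/mp4"),
]

# Hypothetical browsers with different codec support.
print(pick_source(audio_sources, {"audio/ogg", "video/ogg"}))  # music.oga
print(pick_source(video_sources, {"audio/mpeg", "video/mp4"}))  # video.mp4
```

Order matters: a browser that supports both formats gets whichever source is listed first, so authors put the preferred encoding at the top.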
99. Sarkozy Laughing with Putin?
http://www.youtube.com/watch?v=7fMCTo-GQ2A#t=34s
100. Clinton Laughing with Yeltsin?
Temporal annotation in YouTube
... but the UA seeks, buffers and downloads the resource
... and the YouTube syntax is different from Google Video,
Vimeo, DailyMotion, etc.
http://www.youtube.com/watch?v=sxoh1z6s_Cw#t=15s
101. Media Fragments
Every popular web site does it ...
region-based annotation in Flickr
temporal sequence annotation in YouTube (#t=34s, #t=15s)
... BUT:
region-based annotations cannot be exported
YouTube syntax is different from DailyMotion, Vimeo, etc.
102. W3C Media Fragments WG
http://www.w3.org/2008/WebVideo/Fragments/
103. W3C Media Fragments WG
Provide URI-based mechanisms for uniquely identifying fragments for media
objects on the Web, such as video, audio, and images.
104. Use Case
Aidem received a status message on her Facebook wall containing a
Media Fragment URI
Use a ‘#’!
Highlight a video sequence
Highlight a region to pay attention to
105. Requirements
r01: Temporal fragments:
a clipping along the time dimension from a start to an end time that
are within the duration of the media resource
r02: Spatial fragments:
a clipping of an image region, only consider rectangular regions
r03: Track fragments:
a track as exposed by a container format of the media resource
r04: Named fragments:
a media fragment - either a track, a time section, or a spatial region -
that has been given a name through some sort of annotation
mechanism
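The four requirements map onto four fragment dimensions. The sketch below parses them from the part after '#', using the `t`, `xywh`, `track` and `id` names as I understand them from the W3C Media Fragments drafts; the slides do not show the syntax, so treat those names as an assumption. Only plain seconds are handled for time, so the YouTube form `#t=34s` is deliberately not accepted, and the example.com URIs are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_media_fragment(uri):
    """Parse the temporal, spatial, track and named dimensions of a URI."""
    result = {}
    for name, value in parse_qsl(urlsplit(uri).fragment):
        if name == "t":  # r01: temporal fragment, "start,end" in seconds
            if value.startswith("npt:"):
                value = value[len("npt:"):]
            start, _, end = value.partition(",")
            result["t"] = (
                float(start) if start else 0.0,
                float(end) if end else None,  # None = until the end
            )
        elif name == "xywh":  # r02: rectangular spatial fragment
            unit, _, coords = value.rpartition(":")
            numbers = tuple(int(c) for c in coords.split(","))
            result["xywh"] = (unit or "pixel", numbers)
        elif name in ("track", "id"):  # r03 and r04
            result[name] = value
    return result


print(parse_media_fragment("http://www.example.com/video.ogv#t=10,20"))
# {'t': (10.0, 20.0)}
print(parse_media_fragment("http://www.example.com/photo.jpg#xywh=160,120,320,240"))
# {'xywh': ('pixel', (160, 120, 320, 240))}
```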
106. Side Conditions
Restrict to what the container format (encapsulating the
compressed media content) can express (and expose),
thus no transcoding
Protocols covered: HTTP(S), FILE, RTSP, RTMP
http://www.w3.org/TR/media-frags-reqs/
107. Media Fragments processing
General principle:
Smart UA will strip out the fragment definition and
encode it into custom HTTP headers ...
(Media) Servers will handle the request, slice the media
content and serve just the fragment while old ones will
serve the whole resource
Four recipes proposed
UA knows how to map a fragment into bytes
UA sends a Range request expressed in a custom unit
Variant with cacheability
Server serves a playable media resource
108. Recipe 1: UA mapped byte ranges
The User Agent knows how to map a custom unit into bytes and
sends a normal Range request expressed in bytes
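Recipe 1 can be sketched as follows. This is an illustration under stated assumptions: the user agent is assumed to hold a seek index that maps times to byte offsets (for example, read from the container's header). `SEEK_INDEX`, `FILE_SIZE` and both function names are hypothetical, and the numbers are made up.

```python
from bisect import bisect_left, bisect_right

# Hypothetical seek index: (time in seconds, byte offset) of each
# random access point in the media file.
SEEK_INDEX = [(0.0, 0), (5.0, 40_000), (10.0, 95_000),
              (15.0, 150_000), (20.0, 210_000), (25.0, 260_000)]
FILE_SIZE = 300_000  # total size in bytes (illustrative)

def byte_range_for(start, end, index=SEEK_INDEX, size=FILE_SIZE):
    """Map a time interval in seconds to an inclusive byte range."""
    times = [t for t, _ in index]
    # Last access point at or before `start`.
    first = max(bisect_right(times, start) - 1, 0)
    # First access point at or after `end`; fall back to the end of file.
    last = bisect_left(times, end)
    first_byte = index[first][1]
    last_byte = index[last][1] - 1 if last < len(index) else size - 1
    return first_byte, last_byte

def range_header(start, end):
    """Build an ordinary HTTP Range header, expressed in bytes."""
    first_byte, last_byte = byte_range_for(start, end)
    return {"Range": f"bytes={first_byte}-{last_byte}"}

if __name__ == "__main__":
    print(range_header(10, 20))
```

Because the request only uses the standard `bytes` unit, any server or cache that understands byte ranges can answer it; the cost is that the user agent needs format-specific knowledge to build the index.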
109. Recipe 1: UA mapped byte ranges
110. Recipe 2: Server mapped byte ranges
The UA sends a Range request expressed in a custom unit (e.g.
seconds), the server answers directly with a 206 Partial Content
and indicates the mapping between bytes and the custom unit
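A rough sketch of the user agent side of Recipe 2 is given here. The header syntax (`Range: t:npt=...` and a `Content-Range-Mapping` response header) is an assumption based on the working group's drafts, not on the slides; the function names and the sample header values are hypothetical.

```python
def time_range_header(start, end):
    """Build a Range header expressed in seconds rather than bytes."""
    return {"Range": f"t:npt={start:g}-{end:g}"}

def interpret_response(status, headers):
    """Decide what the server actually sent back."""
    if status == 206:
        # Fragment-aware server: only the fragment is served, together
        # with the mapping between the time range and the byte range.
        return {"fragment_only": True,
                "bytes": headers.get("Content-Range"),
                "mapping": headers.get("Content-Range-Mapping")}
    if status == 200:
        # Older server: the custom unit is ignored and the whole resource
        # is served, so the user agent must seek locally.
        return {"fragment_only": False, "bytes": None, "mapping": None}
    raise ValueError(f"unexpected status {status}")

if __name__ == "__main__":
    print(time_range_header(10, 20))
    print(interpret_response(200, {}))
```

The 200 branch mirrors the general principle that servers unaware of media fragments simply return the full resource.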
111. Recipe 2: Server mapped byte ranges
112. Implementation
Media Fragment server (4 recipes supported):
Ninsuna: http://ninsuna.elis.ugent.be/MediaFragmentsServer
Media Fragment user agents:
Ninsuna Flash player:
http://ninsuna.elis.ugent.be/MediaFragmentsPlayer
Supports recipe 1
Silvia Pfeiffer's experiment with HTML5 + JS:
http://annodex.net/~silvia/itext/mediafrag.html
Supports recipe 1 (for .ogg files and time dimension)
Firefox plugin under development in order to support all recipes
(HTML5 + XMLHttpRequest)
127. Event-centric interfaces
Event: an action or occurrence taking place at a certain time at a
specific location
Useful for organizing and browsing collections of media
Useful for discovering complex relationships between data
Need for an expressive event model for connecting pieces of data
Not Yet Another Model!
128. There are already many event ontologies
Event Model            Ontology URL
CIDOC CRM              http://cidoc.ics.forth.gr/OWL/cidoc_v4.2.owl
ABC Ontology           http://metadata.net/harmony/ABC/ABC.owl
Event Ontology         http://purl.org/NET/c4dm/event.owl#
EventsML-G2            http://www.iptc.org/EventsML/
Dolce+DnS Ultralite    http://www.loa-cnr.it/ontologies/DUL.owl
F                      http://events.semantic-multimedia.org/ontology/2008/12/15/model.owl
OpenCyc Ontology       http://www.opencyc.org/
SEM                    http://semanticweb.cs.vu.nl/2009/04/event/
129. Fundamental Types of Events
Aspect: ongoing activity vs transition between states
cyc:Event ∩ cyc:StaticSituation ≤ cyc:Situation
cidoc:E5.Event ∩ cidoc:E3.Condition_State ≤ cidoc:E2.Temporal_Entity
abc:Event is a transition between abc:Situation ≈ cidoc:E3.Condition_State
Agentivity: who has produced the event?
cyc:Action, dul:Action ≤ Event
cidoc:E7.Activity ≤ cidoc:E5.Event
abc:Action ∩ abc:Event = Ø
Events are fully described as a set of actions taken by specific agents
Issue for modeling e.g. earthquakes
Interpretation matters!
Identifiable changes or not? Agency can be assigned?
dul:Situation describes dul:Event
dul:Action, dul:Process ≤ dul:Event
130. Events and Temporal Intervals
Relating events to chronological spans of time
Persistent, socially attributed meanings
Arbitrary system for subdividing an abstract space
Modeling a class for temporal intervals and using an object property (OP)
ABC, CIDOC, EO (owl:TemporalEntity)
Modeling an XML Schema typed value and using a datatype property (DP)
Pro: simplicity, values expressed as xsd:date or xsd:dateTime
Cons: inability to express uncertain periods or periods that do not
coincide with date units
Having two properties
dul:hasEventDate ... literal value
dul:isObservableAt ... dul:TimeInterval
131. Events, Spaces and Places
Relating events to places
Semantically significant places
Abstract spatial regions
Support spatial regions only: ABC, CIDOC, EO
eo:Event eo:place wgs84:SpatialThing
cidoc:E5.Event cidoc:P7.took_place_at cidoc:E53.Place
Support the place/space distinction
dul:Event dul:hasLocation dul:Place
dul:Event dul:hasRegion dul:SpaceRegion
Most flexible approach: allows resolving to places that have no
geographical coordinate system (e.g. mythical events, SecondLife)
133. Events, Influence, Purpose and Causality
Making broad assertions linking events to any thing
cidoc:P12.occurred_in_the_presence_of, cidoc:P15.was_influenced_by
eo:factor, abc:hasResult
F model uses the DnS pattern
134. Events, Parts and Composition
Event A being part of event B ≠ A's timespan falling within B's timespan
cidoc:P86.falls_within for expressing containment among timespans
cidoc:P9.consists_of ≈ eo:sub_event ≈ abc:isSubEventOf
Linking sub-events with parthood
dul:hasPart
The 20th century contains the year 1923
World War II included Pearl Harbour
Linking sub-events with composition
dul:hasConstituent
The French Revolution is composed of the storming of the Bastille
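The difference between temporal containment and parthood can be made concrete with a toy model. This is only a sketch: the `Event` class, `falls_within` and `is_part_of` are hypothetical names, and the "unrelated concert" is an invented event used for contrast.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Event:
    name: str
    begin: date
    end: date
    parts: list = field(default_factory=list)  # explicitly asserted sub-events

def falls_within(a, b):
    """Temporal containment only: A's timespan lies inside B's timespan."""
    return b.begin <= a.begin and a.end <= b.end

def is_part_of(a, b):
    """Parthood: A was asserted (directly or transitively) as a sub-event of B."""
    return any(p is a or is_part_of(a, p) for p in b.parts)

pearl_harbour = Event("Pearl Harbour", date(1941, 12, 7), date(1941, 12, 7))
concert = Event("Unrelated concert", date(1941, 12, 7), date(1941, 12, 7))
ww2 = Event("World War II", date(1939, 9, 1), date(1945, 9, 2),
            parts=[pearl_harbour])

if __name__ == "__main__":
    for e in (pearl_harbour, concert):
        print(e.name, falls_within(e, ww2), is_part_of(e, ww2))
```

Both events fall within the timespan of the war, but only one of them is a part of it, which is why a sub-event relation has to be stated explicitly rather than derived from dates.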
135. Towards a Linked Data Event Model
136. Some mappings in LODE
ABC          CIDOC                            DUL             EO      LODE
atTime       P4.has_time_span                 isObservableAt  time    atTime
-            P7.took_place_at                 -               place   inSpace
inPlace      -                                hasLocation     -       atPlace
involves     P12.occurred_in_the_presence_of  hasParticipant  factor  involved
hasPresence  P11.had_participant              involvesAgent   agent   involvedAgent
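The mappings can be applied mechanically to rewrite statements from the other vocabularies into LODE terms. This is a minimal sketch that treats triples as plain tuples of prefixed names; the `TO_LODE` table is transcribed from the slide, while `to_lode` and the sample triples are hypothetical.

```python
# Property mappings from the slide, keyed by vocabulary prefix.
TO_LODE = {
    "abc": {"atTime": "atTime", "inPlace": "atPlace",
            "involves": "involved", "hasPresence": "involvedAgent"},
    "cidoc": {"P4.has_time_span": "atTime", "P7.took_place_at": "inSpace",
              "P12.occurred_in_the_presence_of": "involved",
              "P11.had_participant": "involvedAgent"},
    "dul": {"isObservableAt": "atTime", "hasLocation": "atPlace",
            "hasParticipant": "involved", "involvesAgent": "involvedAgent"},
    "eo": {"time": "atTime", "place": "inSpace",
           "factor": "involved", "agent": "involvedAgent"},
}

def to_lode(triples):
    """Rewrite mapped predicates to lode:*; leave everything else untouched."""
    result = []
    for subject, predicate, obj in triples:
        prefix, _, local = predicate.partition(":")
        target = TO_LODE.get(prefix, {}).get(local)
        result.append((subject, f"lode:{target}" if target else predicate, obj))
    return result

if __name__ == "__main__":
    sample = [("ex:concert", "eo:place", "ex:venue"),
              ("ex:concert", "dul:hasLocation", "ex:budapest"),
              ("ex:concert", "rdfs:label", "A concert")]
    for triple in to_lode(sample):
        print(triple)
```

In an RDF setting the same correspondences would more naturally be published as mapping axioms between the ontologies; the dictionary here only illustrates the direction of the rewriting.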
138. What to do in Nimes in July?
139. Events and Media
Events are observable occurrences grouping people, places and time
Experiences are documented by media
140. Goal
1. Discover PAST, PRESENT and FUTURE events
2. Live, relive and predict experiences through shared media
3. Identify meaningful and/or interesting relationships
between events/media/people
146. Services
Existing services to explore, share and discover events
Aggregate these heterogeneous data sources
Enrich with media and social data
148. LODE Example
Jack recorded a video with his mobile phone camera while attending
Radiohead's Haiti relief concert, given on January 24th, 2010 in LA.
He thinks it was a really nice experience and wants to share it on-line.
He would also like to see how other people experienced the show
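Jack's scenario can be sketched as a handful of event-centric statements. This is an illustration only: the triples use LODE property names (`lode:atTime`, `lode:atPlace`, `lode:involvedAgent`, `lode:illustrate`) as assumptions about the vocabulary, and every `ex:` identifier, including the second attendee and her photo, is invented.

```python
EVENT = "ex:radiohead-haiti-relief-concert"

# Event description plus the media that document it (all ex: names are made up).
TRIPLES = [
    (EVENT, "rdf:type", "lode:Event"),
    (EVENT, "lode:atTime", "2010-01-24"),
    (EVENT, "lode:atPlace", "ex:los-angeles"),
    (EVENT, "lode:involvedAgent", "ex:radiohead"),
    ("ex:jack-video", "lode:illustrate", EVENT),
    ("ex:jack-video", "ex:creator", "ex:jack"),
    ("ex:other-photo", "lode:illustrate", EVENT),
    ("ex:other-photo", "ex:creator", "ex:another-attendee"),
]

def media_for(event, triples, exclude_creator=None):
    """Return media items illustrating `event`, optionally skipping one creator."""
    creators = {s: o for s, p, o in triples if p == "ex:creator"}
    return [s for s, p, o in triples
            if p == "lode:illustrate" and o == event
            and creators.get(s) != exclude_creator]

if __name__ == "__main__":
    # How did other people experience the show?
    print(media_for(EVENT, TRIPLES, exclude_creator="ex:jack"))
```

Once Jack's video is attached to the event, the event becomes the hub through which media from other attendees can be found.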
152. Jamiroquai @ Sziget Festival (Budapest)
153. Take Home Message
Concept detection challenges: machine learning and IR
Features can be extracted and used to describe multimedia content
Show generality of approach, dynamic nature of video (event)
Show that an ontology can help
Semantic metadata representation challenges: KR
Media and metadata can be passed around and among systems
Reuse what is there
Expose what you make
Interaction challenges: CHI
Users can be given much richer
and more flexible access to (semantically annotated) content
... but we are still figuring out how to do this!
154. Credits
Many people
Cees Snoek, Marcel Worring, Alex Hauptmann,
Alan Smeaton, Ivan Herman, Krishna Chandramouli,
David Simonds, Laurent Le Meur
Colleagues from the Interactive Information Access
Group, CWI Amsterdam
Datasets
http://www.slideshare.net/troncy