Multimedia Semantics - SSMS 2010

Multimedia Semantics:
Metadata, Analysis and Interaction

Raphaël Troncy <raphael.troncy@eurecom.fr>
Multimedia Semantics, EURECOM (FR)

Some BIG numbers
 User Generated Content (July 2010)
 4.3+ billion photos (50% are public, 30% are tagged)
  30+ billion photos (2.5 billion per month)
 110+ million videos
24 hours uploaded / min ≈ 90 000 full length movies / week
2 billion videos served a day

 Archived TV content
 1.5 million hours ≈ 120 km of shelves
 300000 hours | 1 petabyte / year

 News content
 Content difficult to search and reuse
 Barely visible for the search engines

31/08/2010 - Multimedia Semantics: Metadata, Analysis and Interaction - SSMS 2010

Why is it so difficult to find
appropriate multimedia content, to
reuse and repurpose content
previously published and to present
this content in interfaces that vary
with user needs?

Image/Video indexing

 Techniques used by mainstream search engines
 search term occurs in the filename or in the caption or in user tags
 no semantics

 Image indexing: main problem
  an image is not alphabetic: there are no countable discrete units that,
in combination, provide the meaning of the image
 image descriptors are not given with the image: one needs to
extract or interpret them

 Video indexing: additional problem
 a video has additionally a temporal dimension to take into account
  a video has a priori no discrete units either (i.e. frames, shots,
sequences cannot be absolutely defined)


Sounds Familiar?
 [Arnold Smeulders,
PAMI, 2000]
The semantic gap is the
lack of coincidence
between the information
that one can extract from
the sensory data and the
interpretation that the
same data has for a user
in a given situation


a little drop of semantics goes a
long way
Jim Hendler [1997]

Multimedia Research Themes @EURECOM

From signal … to symbols … to meaning … to users
110010000011111110101001001001
101010111011011011101001111110
010000000001010001101100000010
010110001111100010101100011110
001011101000100011111111111010
000010010101010111001000010100
101100001101011101101011011001

Content Analysis & Indexing: Audio processing, Video Segmentation, Emotion Recognition
Content Modeling & Interaction: Video Indexation, Video Summarization, Facial+Body Biometrics
Multimedia Semantics: Semantic Web, Social networks, Multimedia Interaction

Applications: Security in Multimedia, Multimedia on the Web


Learning Objectives
 Learn how to get metadata (machine learning)
 (Semantic) multimedia analysis … or the science of labeling
 (Semantic) audio processing (ASR + NER + background knowledge)

 Explore various multimedia metadata formats
 Be aware of the advantages and limitations of various models
 Know the interoperability issues and understand COMM, a Core
Ontology for Multimedia, learn about the W3C ontology for Media
Resources

 Discuss exploratory interfaces based on rich
multimedia metadata semantics
 Know how to link and expose your data on the web
 See various multimedia presentation interfaces


Agenda
1. Semantics in multimedia analysis
 Detecting concepts in video and speech
 Evaluating interactive search tasks

2. Semantics in metadata
 MPEG-7 based ontologies and COMM: a Core Ontology for
Multimedia
  Expose your data following 4 basic principles and re-use a
growing number of publicly available datasets

3. Semantics in user interfaces
 Provide meaningful presentation of underlying data
 HTML5: a game changer for video on the web
  Event-centric interfaces for browsing rich media collections


Overview of Canonical Processes


Canonical Processes Possible Flow


The Importance of the Annotations


The science of labeling

 Automatically detecting the presence of a
concept in a video stream

airplane

 Naming visual information


The Computer Vision Approach

 Building detectors one at a time

a face detector for
frontal faces

3 years later

a face detector for
non-frontal faces

One (or more) PhD for
every new concept


So how about these?

[Cees Snoek and Marcel Worring, SSMS, 2007]

A Simple Concept Detector


Support Vector Machine


Supervised Learner


NIST TRECVID Evaluation

 Until 2001, everybody defined their own concepts
 Using specific and small data sets
 Hard to compare methodologies

 Since 2001, worldwide evaluation by NIST
  Promote progress in video retrieval research
 Provide common datasets (shots, ASR, key frames)
 Use open, metrics-based evaluation
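A metrics-based evaluation of a concept detector can be sketched as follows. Average precision over a ranked list of shots is used here as one common ranking metric; the function name and the toy ranking are assumptions made for illustration, not the official NIST tooling.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool], total_relevant: int) -> float:
    """Non-interpolated average precision of a ranked list of shots.

    ranked_relevance[i] is True when the shot at rank i+1 really
    contains the concept; total_relevant is the number of relevant
    shots in the whole test set.
    """
    if total_relevant <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # precision measured at the rank of each relevant shot
            precision_sum += hits / rank
    return precision_sum / total_relevant


# Toy ranking returned by a hypothetical "airplane" detector
ranking = [True, False, True, False, False]
ap = average_precision(ranking, total_relevant=2)
```

Averaging this value over all concepts gives a single score per system, which makes methodologies comparable on a common dataset.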

Large-Scale Concept
Ontology for Multimedia


Success and Criticism

 More and more concept detectors available:
 TRECVID 2005: 101 concept lexicon
 TRECVID 2006: 491 concept lexicon
 MediaMill Challenge 2007: 572 concept lexicon

 ... but focus is on the final result
 relative merit of indexing methods: ignore intermediary
steps while systems become more complex (several
features and learning methods)

 ... but the concept detectors developed do not match
user information needs


TRECVID Interactive Video Search Task
 Query selection:
 by keyword,
 by concept,
 by example

 Topics unknown
 Test set
 English (2004)
 Chinese (2005-6)
 Dutch (2007-8-9)


VideOlympics
 Benchmark performance cannot be sole criterion
 Experience of searcher counts
 Usability of systems matters

 VideOlympics: live interactive search task
 Simultaneous exposure
of video retrieval systems
 Showcase that goes
beyond a regular demo
session
 Fun to do (participants)
& Fun to watch (audience)


VideOlympics Setup

 One display
 TRECVID like queries
 Results pushed by searchers

How to make video viewable to the blind?
 What is required to make video accessible on the Web?
 How to increase the number of accessible videos?
 Technologies:
 Annotating: automatic (speech transcription) and manual (social
collaborative annotation tool)
 Addressing: pointing to, retrieving, transmitting only parts of media
 Rendering: video visualization for the impaired, Braille output

 Expected benefits for:
 disabled people, getting better access to video
 video provider, reaching a wider audience
 the Web in general, using semantic annotations


ACAV: Collaborative Annotation for Video Accessibility

 Produce (semantic) annotations of multimedia content:
 Automatically: speaker diarization, speech recognition
 Manually: collaborative annotations, template

 Generate multimodal presentation of annotated content
  Subtitles / Surtitles / Closed captioning
 Braille output
 Media Fragment access


Accessibility Features for Visually
Impaired and Blind People

Man’s actions Put on his shoes Walk in the street

Son’s actions Look at his mother

Characters The mother, her son The son, the man The man and his friend

Scenery In the shop In the street

Annotations multimodal presentation
Annotations depend on video context
and user preferences

Audio track, Auditory icons, Audio description, Braille


Accessibility Features for Deaf People

Mother‘s dialogues How are you ?

Son’s dialogues Hi mom Fine and you ?

Sound Car horn

Annotations presentation
Annotations depend on video context
and user preferences

Video track, Subtitles, Surtitles


Producing Video Annotations

 Automatic annotations
  Speaker diarization: who spoke and when?
  Speech recognition: transcription

 Social annotations
  Annotation corrections, enhancement
  Audio description (for visually impaired)

Annotations (automatic)
Mother How are you ?
Son Ho mom Fine

Annotations (after social correction)
Mother How are you ?
Son Hi mom Fine and you ?
Sound Car horn


Speech Processing


Demo: http://acav.eurecom.fr/


Braille Rendering
The Advene prototype emulation views

Enriched
Media Player

Timeline
with typed
annotations


Preliminary study (1/2)
 Semi-structured interviews with blind users (n=2)
 Participant’s habits when watching programs with audio description
 Audio description process
 Multimodal presentations of descriptions

 Requirements:
 R1: generate additional descriptions and provide unobtrusive access
to descriptions (tactile access for blind Braille readers)
  R2: descriptions at various levels of granularity and verbosity
 R3: use system’s multimodal output to provide two or more
descriptions (e.g. speech synthesis and Braille display)


Preliminary study (2/2)
 Goal: see whether we can use auditory icons to convey
the rhythm of the editing of a movie to blind users
  e.g.: sound of a locomotive arriving from the right to convey the
concept of a travelling shot from right to left

 Experiment and questionnaires (n=16+9)
 Viewing with headsets of 5 min of Ratatouille,
http://www.imdb.com/title/tt0382932/

 Results:
  Rhythm and movie dynamics better perceived
 Usefulness of auditory icons but must be limited (5 max) and be very
different from the main soundtrack of the movie
 Editing cues: change of scenes, camera movement, flashback (e.g. NCIS)
 Audio zoom (e.g. Survivor)


ACAV Architecture

ASR Engine: Sphinx/HTK

 NER + full text index with the
transcription
 Interlinking with the Linked Data
Cloud to enable semantic search


Agenda

Multimedia



What is Ontology ?
 Ontology (from the Greek ὄν, genitive ὄντος: of
being (neuter participle of εἶναι: to be) and -
λογία, -logia: science, study, theory) is the
philosophical study of the nature of being,
existence or reality in general, as well as the
basic categories of being and their relations.

 Science of Being (Aristotle, Metaphysics, IV, 1)
 Tries to answer the questions:
What characterizes being?
Eventually, what is being?
 How should things be classified?


Why is this Funny?
In “The analytical language of John Wilkins”*, Jorge Luis
Borges writes about a “certain Chinese encyclopaedia”
that has the following categorization of animals:
(a) belonging to the emperor,
(b) embalmed,
(c) tame,
(d) sucking pigs,
(e) sirens,
(f) fabulous,
(g) stray dogs,
(h) included in the present classification,
(i) frenzied,
(j) innumerable,
(k) drawn with a very fine camelhair brush,
(l) et cetera,
(m) having just broken the water pitcher,
(n) that from a long way off look like flies.
* http://agents.umbc.edu/misc/johnWilkins.html


Ontology in Computers
 An ontology is an engineering artifact consisting of:
 A vocabulary used to describe (a particular view of)
some domain
 An explicit specification of the intended meaning of the
vocabulary.
almost always includes how concepts should be classified
 Constraints capturing additional knowledge about the
domain

 Ideally, an ontology should:
 Capture a shared understanding of a domain of interest
 Provide a formal and machine manipulable model of the
domain


Ontologies: more definitions
 An ontology is a "formal, explicit
specification of a shared conceptualization".
 Ontologies define the concepts and
relationships used to describe and represent an
area of knowledge. Ontologies are used to
classify the terms used in a particular application,
characterize possible relationships, and define
possible constraints on using those relationships.
In practice, ontologies can be very complex (with
several thousands of terms) or very simple
(describing one or two concepts only).


What is a
Multimedia Ontology?

Multimedia: Description methods

MPEG-21

MPEG-7

MPEG-4

MPEG-2

MPEG-1

ISO W3C


MPEG-7: a multimedia description language?

 ISO standard since December 2001

 Main components:
  Descriptors (Ds) and Description Schemes (DSs)

 DDL (XML Schema + extensions)
  Concern all types of media

Part 5 – MDS: Multimedia Description Schemes
Content organization: Collections, Models
User interaction: User Preferences, Usage History
Navigation & Access: Summaries, Views, Variations
Content management: Creation & Production, Media, Usage
Content description: Structural aspects, Semantic aspects
Basic elements: Schema Tools, Basic datatypes, Links & media localization, Basic Tools

MPEG-7 and the Semantic Web
 MDS Upper Layer represented in RDFS
 2001: Hunter
 Later on: link to the ABC upper ontology

 MDS fully represented in OWL-DL
 2004: Tsinaraki et al., DS-MIRF model

 MPEG-7 fully represented in OWL-DL
 2005: Garcia and Celma, Rhizomik model
 Fully automatic translation of the whole standard

 MDS and Visual parts represented in OWL-DL
 2007: Arndt et al., COMM model
 Re-engineering MPEG-7 using DOLCE design patterns


Requirements [aceMedia, MMSEM XG]
 MPEG-7 compliance
 Support most descriptors (decomposition, visual, audio)

 Syntactic and Semantic interoperability
 Shared and formal semantics represented in a Web language (OWL,
RDF/XML, RDFa, etc.)

 Separation of concerns
 Domain knowledge versus multimedia specific information

 Modularity
 Enable customization of multimedia ontology

 Extensibility
 Enable inclusion of further descriptors (non MPEG-7)


MPEG-7 Based Ontologies

 | Hunter | DS-MIRF | Rhizomik | COMM
Foundational Ontologies | ABC | None | None | DOLCE
Complexity | OWL-Full | OWL-DL | OWL-DL | OWL-DL
Coverage | MDS+Visual | MDS+CS | All | MDS+Visual
Applications | Digital Libraries | Digital Libraries | Digital Rights | MM Analysis


Common Scenario

The "Big Three" at the Yalta
Conference (Wikipedia)


Common Scenario: Tagging Approach
Reg1


 Localize a region
 Draw a bounding box, a circle around a shape

 Annotate the content
 Interpret the content
 Tag: Winston Churchill, UK Prime Minister, Allied Forces, WWII


Common Scenario: SW Approach
Reg1


 Draw a bounding box, a circle around a shape

 Interpret the content
 Link to knowledge on the Web
:Reg1 foaf:depicts dbpedia:Winston_Churchill
dbpedia:Winston_Churchill skos:altLabel
"Sir Winston Leonard Spencer-Churchill"
dbpedia:Winston_Churchill rdf:type foaf:Person
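The three statements above can be sketched as an in-memory set of triples, which is enough to show what the Semantic Web approach adds over flat tags: a region can be found through the type of the thing it depicts. The helper names `match` and `regions_depicting_type` are hypothetical; a real application would use an RDF library and a triple store.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# The three statements of the slide, written as (subject, predicate, object)
GRAPH: Set[Triple] = {
    (":Reg1", "foaf:depicts", "dbpedia:Winston_Churchill"),
    ("dbpedia:Winston_Churchill", "skos:altLabel",
     "Sir Winston Leonard Spencer-Churchill"),
    ("dbpedia:Winston_Churchill", "rdf:type", "foaf:Person"),
}


def match(graph: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def regions_depicting_type(graph: Set[Triple], rdf_type: str) -> List[str]:
    """Regions that depict any resource typed as rdf_type."""
    typed = {s for s, _, _ in match(graph, p="rdf:type", o=rdf_type)}
    return sorted(s for s, _, o in match(graph, p="foaf:depicts") if o in typed)


people_regions = regions_depicting_type(GRAPH, "foaf:Person")
```

A tag-only annotation could answer "regions tagged Winston Churchill", but not "regions depicting a person" unless that tag was typed in by hand.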

Hunter's MPEG-7 Ontology

http://en.wikipedia.org/wiki/
Image:Yalta_Conference.jpg

mpeg7:MediaLocator
mpeg7:StillRegion

rdf:type

mpeg7:image mpeg7:spatial_decomposition
mpeg7:DominantColor
Reg1 rgb(25,255,255)

mpeg7:depicts
mpeg7:SpatialMask
mpeg7:depicts

The Big Three at the Yalta Conference mpeg7:Polygon
dbpedia:Churchill
mpeg7:Coords

"5 25 10 20 15 15 10 10 5 15"^^xsd:string


DS-MIRF MPEG-7 Ontology

mpeg7:MediaURI

mpeg7:MediaLocator
mpeg7:StillRegion
rdf:type

mpeg7:image mpeg7:SpatialDecomposition
Reg1 dbpedia:Churchill
mpeg7:RelatedMaterial
mpeg7:CreationInformation
mpeg7:SpatialMask

mpeg7:Creation
mpeg7:SubRegion mpeg7:Coords
mpeg7:Polygon

mpeg7:Title mpeg7:dim

The Big Three at the Yalta Conference
"5 25 10 20 15 15 10 10 5 15"^^xsd:string
contentString


Rhizomik MPEG-7 Ontology


mpeg7:MediaLocator
mpeg7:SegmentType

rdf:type

mpeg7:image mpeg7:spatial_decomposition
Reg1 dbpedia:Churchill
mpeg7:Semantic
mpeg7:CreationInformation
mpeg7:SpatialMask

mpeg7:SubRegion mpeg7:Coords
mpeg7:Polygon

mpeg7:Title mpeg7:dim

The Big Three at the Yalta Conference
"5 25 10 20 15 15 10 10 5 15"^^xsd:string

COMM: Fragment Identification


dns:realized-by

dns:setting
core:semantic-
core:image-data
annotation

dns:plays dns:defines foaf:Person

loc:region- loc:spatial-mask- core:semantic-label-
locator-descriptor role role
dns:played-by
rdf:type
dns:defines dns:played-by

loc:bounding-box "5 25 10 20 15 15 10 10 5 15"^^xsd:string dbpedia:Churchill

data:has-rectangle


Comparison
 Link with domain semantics
 Hunter: ABC model + mpeg7:depicts relationship
  DS-MIRF: domain ontologies need to subclass the general
MPEG-7 categories
 Rhizomik: Use the mpeg7:semantic relationship
 COMM: Semantic Annotation pattern

 MPEG-7 coverage
 Hunter: extension of the MPEG-7 visual descriptors
 COMM:
Formalization of the context of the annotation
Representation of the method (algorithm) that provides the annotation


Comparison

 Modeling Decisions:
 DS-MIRF and Rhizomik: 1-to-1 translation from MPEG-7 to
OWL/RDF
 Hunter: Simplification and link to the ABC upper model
 COMM: NO 1-to-1 translation
Need for patterns: use DOLCE, a well designed foundational ontology
as a modeling basis

 Scalability:

Hunter DS-MIRF Rhizomik COMM

Triples 11 27 20 19


Research Problem

Reg1: The "Big Three" at the Yalta Conference (Wikipedia)
Seq1, Seq4: A history of G8 violence (video) (© Reuters)
 Multimedia objects are complex  MPEG-7
  Compound information objects, fragment identification
 Semantic annotation  D&S | OIO
  Subjective interpretation, context dependent
 Linked data principle  RDF
  Open to reuse existing knowledge

COMM: Design Rationale
 Approach:
 NO 1-to-1 translation from MPEG-7 to OWL/RDF
 Need for patterns: use DOLCE, a well designed foundational
ontology as a modeling basis

 Design patterns:
 Ontology of Information Objects (OIO)
Formalization of information exchange
Multimedia = complex compound information objects
 Descriptions and Situations (D&S)
Formalization of context
Multimedia = contextual interpretation (situation)

 Define multimedia patterns that translate MPEG-7 in the
DOLCE vocabulary


COMM: Core Functionalities

 Most important MPEG-7 functionalities:
 Decomposition of multimedia content into segments
 Annotation of segments with metadata
Administrative metadata: creation & production
Content-based metadata: audio/visual descriptors
Semantic metadata: interface with domain specific ontologies

 Note that all are subjective and context
dependent situations


COMM: D&S / OIO Patterns

Definition of design patterns for decomposition and
annotation based on D&S and OIO
MPEG-7 describes digital data (multimedia information objects) with
digital data (annotation)
Digital data entities are information objects
Decompositions and annotations are situations that satisfy the rules
of a method or algorithm


COMM: Decomposition Pattern

MPEG-7


COMM: Annotation Pattern

MPEG-7


COMM: Semantic Pattern

Domain
Ontologies


COMM:
Modules

Annotation
Pattern

Decomposition
Pattern


Example 1: Region Annotation


dns:realized-by

dns:setting
core:semantic-
core:image-data
annotation

dns:plays dns:defines foaf:Person

loc:region- loc:spatial-mask- core:semantic-label-
locator-descriptor role role
dns:played-by
rdf:type

loc:bounding-box "5 25 10 20 15 15 10 10 5 15"^^xsd:string
Churchill
data:has-rectangle


Example 2: Sequence Annotation

http://www.reuters.com/news/video/
summitVideo?videoId=56114

dns:realized-by

dns:setting
core:semantic-
core:image-data
annotation

dns:plays dns:defines tgn:Sweden

loc:media-time- loc:temporal- core:semantic-label-
descriptor mask-role role
dns:played-by
skos:broader

loc:media-time-
"1:21"^^xsd:time tgn:Gothenburg
point
data:has-time

W3C Ontology for Media Resources
“The ontology for media resources is meant to bridge the
different descriptions of media resources on the Web, as
opposed to media resources in local archives or musea. It is
defined based on a core set of properties which covers
basic metadata to describe media resources. Further it
defines syntactic and semantic level mappings between
elements from existing formats. The ontology is supposed
to foster the interoperability among various kinds of
metadata formats currently used to describe media
resources on the Web.”

http://www.w3.org/TR/mediaont-10/


Media Ontology: A useful set of mappings
Identifier | Format | Example | Reference
cl11 | CableLabs 1.1 | cl11:Writer_Display | CableLabs 1.1
dig35 | DIG35 | dig35:ipr_name/ipr_person@description='Image Creator' | DIG35
dc | Dublin Core | dc:creator | Dublin Core
ebucore | EBUCore | ebuc:creator | EBUCore
exif | EXIF 2.2 | exif:Artist | EXIF
id3 | ID3 | id3:TCOM | ID3
iptc | IPTC | iptc:Creator | IPTC
lom21 | LOM 2.1 | lom21:LifeCycle/Contribute/Entity | LOM
ma | Core properties of the MA WG | ma:creator | 4 Property definitions
media | Media RDF | media:Recording | Media RDF
mrss | Media RSS | mrss:credit@role='author' | Media RSS
mets | METS | mets:agency | METS
mpeg7 | MPEG-7 | mpeg7:CreationInformation/Creation/Creator/Agent | MPEG-7
dms | DMS-1 | dms:Participant/Person | DMS-1
tva | TV-Anytime | tva:CreditsList/CreditsItem | TV-Anytime
txf | TXFeed | txf:author | TXFeed
xmp | XMP | xmpDM:composer | XMP
yt | YouTube Data API Protocol | yt:author | YouTube Data API Protocol

Media Ontology: classes


Media Ontology: object properties


Media Ontology: datatype properties


Media Ontology exemplified on Flickr


Linked Data Cloud


Linked Data Principles

 Tim Berners-Lee [2006] (Design Issues)
1. Use URIs to identify things
(anything, not just documents);
2. Use HTTP URIs – globally unique names, distributed
ownership –
so that people can look up those names;
3. Provide useful information in RDF –
when someone looks up a URI;
4. Include RDF links to other URIs –
to enable discovery of related information
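Principles 2 and 3 can be sketched from the client side: looking up an HTTP URI while asking for RDF rather than HTML. The snippet only builds the request and does not touch the network; the helper name `build_rdf_lookup` and the choice of `application/rdf+xml` as the requested format are assumptions for illustration.

```python
import urllib.request


def build_rdf_lookup(uri: str,
                     accept: str = "application/rdf+xml") -> urllib.request.Request:
    """Prepare an HTTP lookup of a thing's URI that asks for an RDF description."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: names should be HTTP URIs so that they can be looked up
        raise ValueError("not an HTTP URI: " + uri)
    # Principle 3: content negotiation asks the server for RDF
    return urllib.request.Request(uri, headers={"Accept": accept})


request = build_rdf_lookup("http://dbpedia.org/resource/Winston_Churchill")
```

Passing `request` to `urllib.request.urlopen` would perform the actual lookup; the RDF links found in the response (principle 4) are what lets a client move on to related resources.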


Interlinking Multimedia
wp:2006_FIFA_World_Cup#Final
nc:15054000

nar:subject events:id

nar:location foaf:depicts

geonames:2950159 dbpedia:Zidane

Image Annotation with Linked Data
Reg1

 Localize a region (bounding box)
 Annotate the content (interpretation)
 Tag: Winston Churchill, UK Prime Minister, Allied Forces, WWII
 Link to knowledge on the Web
:Reg1 foaf:depicts dbpedia:Winston_Churchill
----------------------------------------------
dbpedia:Winston_Churchill dbpedia:spouse
dbpedia:Clementine_Churchill
dbpedia:Winston_Churchill owl:sameAs
fbase:Winston_Churchill

Video Annotation with Linked Data
Seq4

Seq1
A history of G8 violence (video)
(© Reuters)

 Tag: G8 Summit, Heiligendamm, 2007; EU Summit, Gothenburg, 2001
 Link to knowledge on the Web
:Seq1 foaf:depicts dbpedia:33rd_G8_Summit
----------------------------------------------
dbpedia:33rd_G8_Summit foaf:based_near geo:Heiligendamm
geo:Heiligendamm skos:broader geo:Germany

Media Annotations

• Annotate the content
(interpretation)
Boris Yeltsin, Bill Clinton,
laugh, Bosnia, Hyde Park

 Using structured knowledge on the Web
:Clip foaf:depicts dbpedia:Laughter
:Clip foaf:depicts dbpedia:Boris_Yeltsin
:Clip foaf:depicts dbpedia:Bill_Clinton
:Clip foaf:depicts dbpedia:Hyde_Park,New_York
----------------------------------------------
dbpedia:Hyde_Park,New_York owl:sameAs fbase:hyde_park
fbase:hyde_park skos:broader fbase:new_york_state


Answer abstract queries

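PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT ?Clip
WHERE {
# the clip depicts laughter and two people identified only by their class
?Clip foaf:depicts dbpedia:Laughter , ?p1 , ?p2 .
?p1 a yago:PresidentsOfTheRussianFederation .
?p2 a yago:President110468559 .
}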
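What makes the query "abstract" is that it never names Yeltsin or Clinton: the clip is found through the classes of the resources it depicts. A minimal sketch of that evaluation over toy data follows, reading the yago: terms as classes of the depicted resources. The type assignments, the extra `:OtherClip` and the function name are assumptions for illustration; a real system would send the SPARQL query to an endpoint.

```python
from typing import Dict, List, Set, Tuple

# (clip, depicted resource) pairs; :OtherClip is a hypothetical distractor
DEPICTS: List[Tuple[str, str]] = [
    (":Clip", "dbpedia:Laughter"),
    (":Clip", "dbpedia:Boris_Yeltsin"),
    (":Clip", "dbpedia:Bill_Clinton"),
    (":OtherClip", "dbpedia:Bill_Clinton"),
]

# Assumed class membership, as it would come from the linked datasets
TYPES: Dict[str, Set[str]] = {
    "dbpedia:Boris_Yeltsin": {"yago:PresidentsOfTheRussianFederation"},
    "dbpedia:Bill_Clinton": {"yago:President110468559"},
}


def clips_matching(depicts: List[Tuple[str, str]], types: Dict[str, Set[str]],
                   wanted_resources: Set[str], wanted_types: Set[str]) -> List[str]:
    """Clips depicting all wanted resources and at least one member of each wanted class."""
    by_clip: Dict[str, Set[str]] = {}
    for clip, thing in depicts:
        by_clip.setdefault(clip, set()).add(thing)
    result = []
    for clip, things in by_clip.items():
        # every class reachable from what the clip depicts
        classes = set().union(*(types.get(t, set()) for t in things))
        if wanted_resources <= things and wanted_types <= classes:
            result.append(clip)
    return sorted(result)


answer = clips_matching(
    DEPICTS, TYPES,
    wanted_resources={"dbpedia:Laughter"},
    wanted_types={"yago:PresidentsOfTheRussianFederation", "yago:President110468559"},
)
```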

 Research Problems
 Data modeling, vocabulary alignment, disambiguation


Find connection between media
 Unexpected relationships:
enable further discovery, exploration

:Clip foaf:depicts dbpedia:Boris_Yeltsin
:Clip foaf:depicts dbpedia:Bill_Clinton
:Clip foaf:depicts fbase:Laughter

 Research problems
 Where should we stop in the exploration?
 When does it start to be intrusive for the end-user?


Agenda

Multimedia



 Who are the users?
 Why would they use the cloud?
 What tasks can be supported?
 How will the semantics help?


How can semantics help?

 Query construction
 disambiguate input (auto-completion)
 selection of available terms (grouping and ranking algorithms)

 (Semantic) search algorithm
 graph traversal
 query expansion
 RDFS/OWL reasoning

 Presentation of search results
 grouping by property
 visualization on timeline, map, etc.
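Query expansion by graph traversal can be sketched with the skos:broader links used earlier in this deck: a search for a broad term also returns media annotated with narrower terms. The geo:Europe and geo:Sweden entries, the assignment of Seq4 to Gothenburg and the function names are assumptions for illustration.

```python
from typing import Dict, List, Set

# narrower term -> broader term (skos:broader); partly hypothetical
BROADER: Dict[str, str] = {
    "geo:Heiligendamm": "geo:Germany",
    "geo:Gothenburg": "geo:Sweden",
    "geo:Germany": "geo:Europe",
    "geo:Sweden": "geo:Europe",
}

# media fragment -> terms it is annotated with (assumed)
ANNOTATIONS: Dict[str, Set[str]] = {
    ":Seq1": {"geo:Heiligendamm"},
    ":Seq4": {"geo:Gothenburg"},
}


def expand(term: str, broader: Dict[str, str]) -> Set[str]:
    """Return the term plus every term that is transitively narrower."""
    expanded = {term}
    changed = True
    while changed:
        changed = False
        for narrow, broad in broader.items():
            if broad in expanded and narrow not in expanded:
                expanded.add(narrow)
                changed = True
    return expanded


def search(term: str) -> List[str]:
    """Media annotated with the term or with any narrower term."""
    terms = expand(term, BROADER)
    return sorted(clip for clip, tags in ANNOTATIONS.items() if tags & terms)


german_results = search("geo:Germany")
european_results = search("geo:Europe")
```

Grouping the results by the property that produced the match (here, the place) is one way to present them afterwards.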


Provide meaningful presentation of data


... and behind the scene


... link an artist to more data


... myspace


... last.fm


... IMDb


Going through the Walled Gardens

David Simonds: Everywhere and nowhere. 19 May 2008, The Economist.

Reinventing HTML

 Tim Berners-Lee (27/10/2006, blog post)

«The attempt to get the world to switch to XML … all at
once didn't work. The large HTML-generating public did not
move … Some large communities did shift and are enjoying
the fruits of well-formed systems … The plan is to charter a
completely new HTML group. »


Basic Layout in HTML5


HTML5 Audio / Video

 Native support in the browser
 No need for plug-ins anymore
Flash, Silverlight, Quicktime, Windows Media
 DOM APIs for scripts to control the playback
<audio src="music.oga" controls>
<a href="music.oga">Download song</a>
</audio>

<video src="video.ogv" controls
poster="poster.jpg" width="320" height="240">
<a href="video.ogv">Download movie</a>
</video>


HTML5 Codecs

 Media containers:
 MPEG 4 (extension .mp4)
 Ogg (extension .ogg)
 AVI (extension .avi)
 Flash video (extension .flv)
  WebM: container based on a profile of Matroska

 Media codecs:
  MPEG-4 Part 2: various implementations (Xvid is open source), but the
codec is covered by various patents
  H.264 (MPEG-4 Part 10): high compression; it is used by YouTube for
HD and by Blu-ray
 Theora: free codec. It is generally used within the ogg container
 VP8: open video compression format released by Google (On2)


HTML5 Audio / Video specification
 Element:
 <audio>, <video>

 Attributes for both:
  src: URL of the media resource
  autobuffer: boolean attribute, the media starts loading with the page
(later drafts replace it with preload)
  autoplay: boolean attribute, playback starts automatically
  loop: boolean attribute, playback restarts when the end is reached
  controls: boolean attribute, display the default controls

 Attributes for <video>
  width, height: dimensions displayed
  poster: URL of a still image shown until the video plays
  videoWidth, videoHeight: original dimensions of the video
(read-only DOM properties, not content attributes)


HTML5 <source> Element Demo
 Use the <source> element to provide
alternative streams and let the browser choose
among them based on its media and codec support:
<audio>
<source src="music.oga" type="audio/ogg"/>
<source src="music.mp3" type="audio/mpeg"/>
</audio>

<video poster="poster.jpg">
<source src="video.3gp" type="video/3gpp"
media="handheld"/>
<source src="video.ogv"
type='video/ogg; codecs="theora, vorbis"'/>
<source src="video.mp4" type="video/mp4"/>
</video>


Sarkozy Laughing with Putin?

http://www.youtube.com/watch?v=7fMCTo-GQ2A#t=34s

Clinton Laughing with Yeltsin?

• Temporal annotation in YouTube
... but the UA seeks, buffers and downloads the resource
... and the YouTube syntax is different from Google Video,
Vimeo, DailyMotion, etc.
http://www.youtube.com/watch?v=sxoh1z6s_Cw#t=15s

Media Fragments

 Every popular web site does it ...
 region-based annotation in Flickr
 temporal sequence annotation
in YouTube

#t=34s #t=15s

 ... BUT:
 region-based annotations cannot be exported
  YouTube syntax is different from DailyMotion, Vimeo, etc.

W3C Media Fragments WG
http://www.w3.org/2008/WebVideo/Fragments/



 Provide URI-based
mechanisms for
uniquely identifying
fragments for media
objects on the Web,
such as video, audio,
and images.


Use Case
 Aidem received on her Facebook
wall a status message containing a
Media Fragment URI
 Use a ‘#’ !
 Highlight a video
sequence
 Highlight a region
to pay attention to


Requirements
 r01: Temporal fragments:
 a clipping along the time dimension from a start to an end time that
are within the duration of the media resource

 r02: Spatial fragments:
  a clipping of an image region; only rectangular regions are considered

 r03: Track fragments:
 a track as exposed by a container format of the media resource

 r04: Named fragments:
 a media fragment - either a track, a time section, or a spatial region -
that has been given a name through some sort of annotation
mechanism
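The four fragment dimensions can be sketched as a small URI fragment parser. The name=value syntax used here (t=start,end with an optional npt: prefix, xywh=x,y,w,h with an optional pixel: or percent: prefix, track=name, id=name) follows the W3C Media Fragments working drafts as far as recalled; it is an assumption, not something stated on the slides. Site-specific forms such as YouTube's #t=34s are not handled.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlparse


def parse_media_fragment(uri: str) -> Dict[str, object]:
    """Split the fragment of a media URI into its recognised dimensions."""
    fragment = urlparse(uri).fragment
    result: Dict[str, object] = {}
    for name, value in parse_qsl(fragment, keep_blank_values=True):
        if name == "t":
            # temporal fragment: start and/or end in seconds
            if value.startswith("npt:"):
                value = value[len("npt:"):]
            start, _, end = value.partition(",")
            result["t"] = (float(start) if start else 0.0,
                           float(end) if end else None)
        elif name == "xywh":
            # spatial fragment: rectangular region only
            unit, _, numbers = value.rpartition(":")
            x, y, w, h = (int(n) for n in numbers.split(","))
            result["xywh"] = (unit or "pixel", x, y, w, h)
        elif name in ("track", "id"):
            # track and named fragments are kept as plain names
            result[name] = value
    return result


clip = parse_media_fragment("http://example.org/video.ogv#t=10,20")
```

An end time of None means "until the end of the resource"; a missing start time is read as 0.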


Side Conditions
 Restrict to what the container format (encapsulating the
compressed media content) can express (and expose),
thus no transcoding

 Protocols covered: HTTP(S), FILE, RTSP, RTMP
http://www.w3.org/TR/media-frags-reqs/

Media Fragments processing

 General principle:
  Smart UA will strip out the fragment definition and
encode it into custom HTTP headers ...
  (Media) Servers will handle the request, slice the media
content and serve just the fragment, while legacy servers will
serve the whole resource

 Four recipes proposed
 UA knows how to map a fragment into bytes
 UA sends a Range request expressed in a custom unit
 Variant with cacheability
 Server serves a playable media resource


Recipe 1: UA mapped byte ranges
 The User Agent knows how to map a custom unit into bytes and
sends a normal Range request expressed in bytes
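Recipe 1 can be sketched from the user agent's side, assuming it already holds a seek index of the resource (for example read from the container header). The index values, the total size and the function names are hypothetical; the only standard piece is the resulting bytes Range header.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

# Hypothetical seek index: (time in seconds, byte offset) of each random access point
SEEK_INDEX: List[Tuple[float, int]] = [
    (0.0, 0), (5.0, 40_000), (10.0, 85_000), (15.0, 130_000), (20.0, 170_000),
]
TOTAL_BYTES = 200_000


def byte_range_for(start: float, end: float,
                   index: List[Tuple[float, int]] = SEEK_INDEX,
                   total_bytes: int = TOTAL_BYTES) -> Tuple[int, int]:
    """Map a time interval to the smallest enclosing byte range of whole segments."""
    times = [t for t, _ in index]
    # last access point at or before the requested start
    first = max(bisect_right(times, start) - 1, 0)
    # first access point at or after the requested end, if there is one
    last = bisect_left(times, end)
    last_byte = index[last][1] - 1 if last < len(index) else total_bytes - 1
    return index[first][1], last_byte


def range_header(start: float, end: float) -> Dict[str, str]:
    """Build the normal HTTP Range header for a temporal fragment."""
    first_byte, last_byte = byte_range_for(start, end)
    return {"Range": "bytes={}-{}".format(first_byte, last_byte)}


headers = range_header(12.0, 17.0)
```

Because the request is an ordinary byte range, any HTTP server and cache that supports ranges can answer it; the cost is that the user agent must know the container format.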


Recipe 1: UA mapped byte ranges


Recipe 2: Server mapped byte ranges
 The UA sends a Range request expressed in a custom unit (e.g.
seconds), the server answers directly with a 206 Partial Content
and indicates the mapping between bytes and the custom unit
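Recipe 2 moves the mapping to the server. The sketch below plays the server's role for a range expressed in seconds. The unit name `seconds`, the response header `X-Range-Mapping` and the keyframe table are hypothetical stand-ins chosen for illustration; the actual header names are defined by the Media Fragments specification and are not reproduced here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical keyframe table of one resource: (time in seconds, byte offset)
KEYFRAMES: List[Tuple[float, int]] = [
    (0.0, 0), (5.0, 40_000), (10.0, 85_000), (15.0, 130_000), (20.0, 170_000),
]
TOTAL_BYTES = 200_000
DURATION = 25.0


def serve_time_range(range_value: str) -> Tuple[int, Dict[str, str]]:
    """Answer a Range header given in seconds with the bytes actually served."""
    m = re.fullmatch(r"seconds=(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)", range_value)
    if m is None:
        # unit not understood: behave like a legacy server, whole resource
        return 200, {}
    start, end = float(m.group(1)), float(m.group(2))
    # snap outwards to keyframes so that the served part is decodable
    before = [kf for kf in KEYFRAMES if kf[0] <= start]
    after = [kf for kf in KEYFRAMES if kf[0] >= end]
    first_time, first_byte = before[-1]
    last_time, next_byte = after[0] if after else (DURATION, TOTAL_BYTES)
    return 206, {
        "Content-Range": "bytes {}-{}/{}".format(first_byte, next_byte - 1, TOTAL_BYTES),
        # hypothetical header telling the UA which time span those bytes cover
        "X-Range-Mapping": "seconds {}-{}/{}".format(first_time, last_time, DURATION),
    }


status, response_headers = serve_time_range("seconds=12-17")
```

The mapping in the response matters because the served span (10 to 20 seconds here) is usually wider than the requested one (12 to 17 seconds), so the user agent has to know where to start and stop playback.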


Recipe 2: Server mapped byte ranges


Implementation
 Media Fragment server (4 recipes supported):
 Ninsuna: http://ninsuna.elis.ugent.be/MediaFragmentsServer

 Media Fragment user agents:
 Ninsuna Flash player:
http://ninsuna.elis.ugent.be/MediaFragmentsPlayer
Supports recipe 1
 Silvia Pfeiffer's experiment with HTML5 + JS:
http://annodex.net/~silvia/itext/mediafrag.html
Supports recipe 1 (for .ogg files and time dimension)
  Firefox plugin under development in order to
support all recipes
(HTML5 + XMLHttpRequest)


Towards an Event-Based
Multimedia Web

We have directory of events...


We have knowledge about “many things”...


Event-centric interfaces

 Action or occurrence taking place at a certain
time at a specific location
 Useful for organizing and browsing collections of media
 Useful for discovering complex relationships between
data

 Need for an expressive event model for
connecting pieces of data

 Not Yet Another Model!


There are already many event ontologies
Event Model | Ontology URL
CIDOC CRM | http://cidoc.ics.forth.gr/OWL/cidoc_v4.2.owl
ABC Ontology | http://metadata.net/harmony/ABC/ABC.owl
Event Ontology | http://purl.org/NET/c4dm/event.owl#
EventsML-G2 | http://www.iptc.org/EventsML/
Dolce+DnS Ultralite | http://www.loa-cnr.it/ontologies/DUL.owl
F | http://events.semantic-multimedia.org/ontology/2008/12/15/model.owl
OpenCyc Ontology | http://www.opencyc.org/
SEM | http://semanticweb.cs.vu.nl/2009/04/event/


Fundamental Types of Events
 Aspect: ongoing activity vs transition between states
 cyc:Event ∩ cyc:StaticSituation ≤ cyc:Situation
  cidoc:E5.Event ∩ cidoc:E3.Condition_State ≤ cidoc:E2.Temporal_Entity
 abc:Event is a transition between abc:Situation ≈ cidoc:E3.Condition_State

 Agentivity: who has produced the event?
 cyc:Action, dul:Action ≤ Event
 E7.Activity ≤ E5.Event
 abc:Action ∩ abc:Event = Ø
Events are fully described as a set of actions taken by specific agents
Issue for modeling e.g. earthquakes

 Interpretation matters!
 Identifiable changes or not? Agency can be assigned?
 dul:Situation describe dul:Event
 dul:Action, dul:Process ≤ dul:Event

Events and Temporal Intervals
 Relating events to chronological spans of time
 Persistent, socially attributed meanings
 Arbitrary system for subdividing an abstract space

 Model a class for temporal intervals and use an object property (OP)
  ABC, CIDOC, EO (owl:TemporalEntity)

 Model an XML Schema typed value and use a datatype property (DP)
  Pro: simplicity, values expressed as xsd:date or xsd:dateTime
  Con: cannot express uncertain periods or periods that do not
coincide with date units

 Having two properties
  dul:hasEventDate ... literal value
  dul:isObservableAt ... dul:TimeInterval
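The trade-off between a typed literal and an interval class can be sketched as follows. The class `TimeInterval` is a hypothetical Python stand-in for an interval resource such as dul:TimeInterval: unlike a single date value, it can leave a bound open when the period is uncertain.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimeInterval:
    """An interval whose bounds may be unknown (None)."""
    start: Optional[date] = None
    end: Optional[date] = None
    label: str = ""

    def may_contain(self, day: date) -> bool:
        """True unless a known bound excludes the given day."""
        after_start = self.start is None or self.start <= day
        before_end = self.end is None or day <= self.end
        return after_start and before_end


# Datatype-property style: one literal value, simple but exact by construction
event_date = date(1945, 2, 4)

# Object-property style: a resource describing the whole period
yalta = TimeInterval(date(1945, 2, 4), date(1945, 2, 11), "Yalta Conference")

# An uncertain period: only the latest possible end is known
sometime_before_1946 = TimeInterval(end=date(1945, 12, 31))
```

Keeping both properties, as in the last option of the list, lets simple clients read the literal while richer clients follow the link to the interval.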


Events, Spaces and Places
 Relating events to places
 Semantically significant places
 Abstract spatial regions

 Support spatial regions only: ABC, CIDOC, EO
 eo:Event  eo:place  wgs84:SpatialThing
 cidoc:E5.Event  cidoc:P7.took_place_at  cidoc:E53.Place

 Support the place/space distinction
 dul:Event  dul:hasLocation  dul:Place
 dul:Event  dul:hasRegion  dul:SpaceRegion
  Most flexible approach: allows resolving to places that have no
geographical coordinate system (e.g. mythical events, SecondLife)


Participation in events
 Object involvement in events:
 Simple involvement in event:
abc:Event  abc:involves  owl:Thing (≤ abc:Actuality)
cidoc:E5.Event  cidoc:P12.occurred_in_the_presence_of  cidoc:E77
dul:Event  dul:hasParticipant  dul:Object
eo:Event  eo:factor  owl:Thing
 Tangible thing which results from an event:
abc:Event  abc:hasResult  owl:Thing
eo:Event  eo:product  owl:Thing

 Agent participation in events:
 abc:hasParticipant ≤ abc:hasPresence
 cidoc:P14.carried_out_by ≤ cidoc:P11.had_participant
 dul:involvesAgent ≤ dul:hasParticipant


Events, Influence, Purpose and Causality
 Making broad assertions linking events to any thing
 cidoc:P12.occurred_in_the_presence_of, cidoc:P15.was_influenced_by
 eo:factor, abc:hasResult

 F model uses the DnS pattern


Events, Parts and Composition

 Event A being part of event B ≠ A's timespan ∈ B's timespan

 cidoc:P86.falls_within for expressing containment among timespans
 cidoc:P9.consists_of ≈ eo:sub_event ≈ abc:isSubEventOf

 Linking sub-events with parthood
 dul:hasPart
The 20th century contains the year 1923
World War II included the attack on Pearl Harbor

 Linking sub-events with composition
 dul:hasConstituent
The French Revolution is composed of events such as the storming of the Bastille
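
A small Python sketch of why temporal containment and parthood are kept apart. The two helper functions, the ex: identifiers and the "village fair" event are hypothetical; the parthood relation is asserted explicitly and is never derived from the dates.

```python
from datetime import date

# Start and end dates per event; ex:VillageFair is a hypothetical unrelated event.
timespans = {
    "ex:WorldWarII": (date(1939, 9, 1), date(1945, 9, 2)),
    "ex:PearlHarbor": (date(1941, 12, 7), date(1941, 12, 7)),
    "ex:VillageFair": (date(1942, 6, 1), date(1942, 6, 2)),
}

# Parthood is asserted explicitly as (whole, part) pairs.
has_part = {("ex:WorldWarII", "ex:PearlHarbor")}

def falls_within(inner, outer):
    """True if inner's timespan is contained in outer's timespan."""
    (inner_start, inner_end) = timespans[inner]
    (outer_start, outer_end) = timespans[outer]
    return outer_start <= inner_start and inner_end <= outer_end

def is_part_of(part, whole):
    """True only if the parthood was asserted, regardless of dates."""
    return (whole, part) in has_part
```

Both smaller events fall within the timespan of the war, but only one of them is recorded as a part of it.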


Towards a Linked Data Event Model

16/09/2009 Event-based Annotation and Exploration of Media - PetaMedia SYTIM, Lausanne (CH)

Some mappings in LODE
ABC          CIDOC                            DUL             EO      LODE
atTime       P4.has_time_span                 isObservableAt  time    atTime
-            P7.took_place_at                 -               place   inSpace
inPlace      -                                hasLocation     -       atPlace
involves     P12.occurred_in_the_presence_of  hasParticipant  factor  involved
hasPresence  P11.had_participant              involvesAgent   agent   involvedAgent
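
The mappings can also be read as a lookup table. The sketch below assumes that each row lists its values in the column order ABC, CIDOC, DUL, EO, LODE and that the two shorter rows skip the vocabularies with no counterpart; the dictionary layout and the `to_lode` helper are hypothetical.

```python
# LODE property -> counterpart in each source vocabulary (missing key = no mapping).
LODE_MAPPINGS = {
    "lode:atTime": {
        "abc": "atTime", "cidoc": "P4.has_time_span",
        "dul": "isObservableAt", "eo": "time",
    },
    "lode:inSpace": {"cidoc": "P7.took_place_at", "eo": "place"},
    "lode:atPlace": {"abc": "inPlace", "dul": "hasLocation"},
    "lode:involved": {
        "abc": "involves", "cidoc": "P12.occurred_in_the_presence_of",
        "dul": "hasParticipant", "eo": "factor",
    },
    "lode:involvedAgent": {
        "abc": "hasPresence", "cidoc": "P11.had_participant",
        "dul": "involvesAgent", "eo": "agent",
    },
}

def to_lode(vocabulary, prop):
    """Return the LODE property mapped to `prop` in `vocabulary`, or None."""
    for lode_prop, sources in LODE_MAPPINGS.items():
        if sources.get(vocabulary) == prop:
            return lode_prop
    return None
```

For example, `to_lode("eo", "place")` resolves to `lode:inSpace`, while `to_lode("dul", "hasLocation")` resolves to `lode:atPlace`.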



What to do in Nimes in July?


Events and Media

Events are observable occurrences grouping People, Places and Time

Experiences documented by Media


Goal

1. Discover PAST, PRESENT and FUTURE events
2. Live, relive and predict experiences through shared media
3. Identify meaningful and/or interesting relationships
between events/media/people


Exploratory Study

Online Survey (n=28), 2 group discussions (n=35)

Past Experiences (Memorable Events)
• Discovery
• Decision making
• Registering & sharing
• Meaningful relationships

Existing Technologies
• Opinions
• Interests
• Suggestions
• Benefits/drawbacks

Leading to: Scenarios, Requirements, 1st Design Concept


Results (1/3)

Discovery
 Invitations and recommendations
 Rely on traditional media
 Social networks (Facebook - students)
 Previously attended events or venues

Decision Making
 Who’s Joining?
 Where, When, How Much? (constraints)
 What? (e.g. type, performer, topic)
 Subjective factors (fun, atmosphere)


Results (2/3)

Registering and Sharing
 Communicating their experience
 Pictures and short videos (for sharing)
 Media directories and social networks

Meaningful Relationships
 Similar categories, attributes and content
 User attendance (similar interests, behaviors)
 Repeated events (e.g. annual festivals)


Results (3/3)

Event Directories
 Single source event overview & information which allows
opportunistic/serendipitous discovery
 Limited exploration/browsing features
 Information overload (cluttered, difficult)
 Information incompleteness (coverage, decision)

Media Directories
 Aids decision making, remembering and sharing
experiences

Social Networks
 Allows communication, sharing and event attendance

Services

 Existing services to explore, share and
discover events

 Aggregate these heterogeneous data sources
 Enrich with media and social data


Semantization of Data

Search on machine tags “lastfm:events”: 1,438,128 results

Last.fm + Flickr APIs

...Events[ event_id, ...medias[photo_id, user_id, url_t, url_o, title, description]]

Last.fm events to LODE
Upcoming + Flickr (363,137)
Eventful, Dailymotion, YouTube?
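
A hedged Python sketch of how one nested record of the shape shown above (event_id plus a list of medias) could be flattened into LODE-style triples. The field names come from the slide; the ex: identifier scheme, the choice of lode:illustrate, dc:creator and dc:title, the sample values and the function name are assumptions, and the slide does not say how the actual conversion was implemented.

```python
def record_to_triples(record):
    """Flatten one event record with its media list into (s, p, o) tuples."""
    event = "ex:event/%s" % record["event_id"]  # hypothetical identifier scheme
    triples = [(event, "rdf:type", "lode:Event")]
    for media in record["medias"]:
        photo = "ex:photo/%s" % media["photo_id"]
        # Assumed linking property: the photo illustrates the event.
        triples.append((photo, "lode:illustrate", event))
        triples.append((photo, "dc:creator", "ex:user/%s" % media["user_id"]))
        triples.append((photo, "dc:title", media["title"]))
    return triples

# Sample record with made-up values, following the field list on the slide.
sample = {
    "event_id": 1001,
    "medias": [
        {
            "photo_id": 42,
            "user_id": "u7",
            "url_t": "http://example.org/t/42.jpg",
            "url_o": "http://example.org/o/42.jpg",
            "title": "Main stage",
            "description": "",
        }
    ],
}
triples = record_to_triples(sample)
```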


LODE Example
Jack recorded a video with his mobile phone camera while attending
Radiohead's Haiti Relief concert, given on January 24th, 2010 in LA.
He thinks it was a really nice experience and wants to share it online. He would
also like to see how other people experienced the show.
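
The scenario can be sketched as a handful of triples plus one lookup, again in plain Python. The LODE property names follow the mapping slide; lode:illustrate, dc:creator, every ex: identifier and the second video are assumptions added for illustration, and the date is written as a literal for brevity where an interval resource could be used instead.

```python
concert = "ex:radiohead-haiti-relief-concert"

scenario = [
    (concert, "rdf:type", "lode:Event"),
    (concert, "lode:atTime", "2010-01-24"),  # simplified: literal instead of interval
    (concert, "lode:atPlace", "ex:LosAngeles"),
    (concert, "lode:involvedAgent", "ex:Radiohead"),
    # Jack's video, linked to the event it documents.
    ("ex:video-jack", "lode:illustrate", concert),
    ("ex:video-jack", "dc:creator", "ex:Jack"),
    # A hypothetical video from another attendee.
    ("ex:video-other", "lode:illustrate", concert),
    ("ex:video-other", "dc:creator", "ex:AnotherFan"),
]

def media_for(event, triples):
    """Return all media resources that illustrate the given event."""
    return [s for (s, p, o) in triples if p == "lode:illustrate" and o == event]

def media_by_others(event, me, triples):
    """Media illustrating the event that were not created by `me`."""
    mine = {s for (s, p, o) in triples if p == "dc:creator" and o == me}
    return [m for m in media_for(event, triples) if m not in mine]
```

Sharing the video amounts to adding the two triples about it, and seeing how others experienced the show becomes a query such as `media_by_others(concert, "ex:Jack", scenario)`.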


Jamiroquai @ Sziget Festival (Budapest)


Take Home Message
 Concept detection challenges: machine learning and IR
 Features can be extracted and used to describe multimedia content
 Show generality of approach, dynamic nature of video (event)
 Show that an ontology can help

 Semantic metadata representation challenges: KR
 Media and metadata can be passed around and among systems
 Reuse what is there
 Expose what you make

 Interaction challenges: CHI
 Users can be given much richer
and more flexible access to (semantically annotated) content
 ... but we are still figuring out how to do this!


Credits

 Many people
 Cees Snoek, Marcel Worring, Alex Hauptmann,
Alan Smeaton, Ivan Herman, Krishna Chandramouli,
David Simonds, Laurent Le Meur
 Colleagues from the Interactive Information Access
Group, CWI Amsterdam

 Datasets

http://www.slideshare.net/troncy

