Citizen Sensing, Social Media Analytics, and Applications
1. Citizen Sensor Data Mining,
Social Media Analytics and
Development Centric Web Applications.
Tutorial at
Semantic Technology Conference,
San Francisco, CA.
Karthik Gomadam, Accenture Technology Labs, San Jose
Amit Sheth, Kno.e.sis @ Wright State University
Selvam Velmurugan, eMoksha, Kiirti
Monday, June 6, 2011
3. A Quick Word
Much of the work discussed in this tutorial is
primarily the doctoral research by Dr. Meena
Nagarajan, currently at IBM Almaden. It also
includes current work done at the Kno.e.sis Center at
Wright State University.
4. Outline
Citizen Sensing: Role, Enablers, Apps
Systematic Study of Social Media
Citizen Sensing @ Real-time
Emerging Research Areas
‣ Spam and Trust in Social Media, Mobile Social Computing
Research Application: Twitris
Tutorial part 2
5. Citizen Sensing
Everyday users of Web2.0 and social networks:
Citizens of an Internet- or Web-enabled social
community
Observation and Information reported by citizens
=> Citizen Sensing
Human-in-the-loop (participatory) sensing + Web
2.0 + mobile computing = emergence of
citizen-sensor networks
6. Social Signals
The activity of observing, reporting, and disseminating
information via text, audio, video, and built-in device
sensors (and smart devices)
‣ Creating social signals through aggregation, enhancement,
analysis, visualization, and interpretation.
Immense potential to disseminate information
quickly and in real-time
7. Enablers: Mobile Devices &
Ubiquitous Connectivity
Mobile devices are fast emerging as our primary tool
‣ Redefines the way we engage with people, information,
etc.
Global, Ubiquitous, always available
Sense where you are, how you are, …
11. Enablers: Mobile Devices &
Ubiquitous Connectivity
Mobile Platforms Hit Critical Mass
‣ Over 5 billion users
‣ 1+B with internet connected mobile devices (2010)
‣ Smartphones > Notebooks + Netbooks (2010E)
‣ 500K+ mobile phone applications
‣ 74% of mobile phone users (2.4B) worldwide texted (2007)
12. Enablers: Web 2.0 & Social Media
500M+ Facebook Users
100M+ Twitter users, 85M+ tweets/day
Internet Users: 1.8 billion
Content dissemination medium
‣ Even for traditional media (@cnn, @nytimes)
17. Enablers: Web 2.0 & Social Media
Types of UGC: Twitter (text/microblogs), Facebook
(multimedia), YouTube (videos),
Flickr (images), Blogs (text),
Ping (social network for music)
21. Citizen Sensors in Action
Iran election
Haiti Earthquake
US healthcare debate
22. Revolution 2.0
Political/Social Activism
“If you want to liberate a government, give them the
internet.” - Wael Ghonim (Egyptian social activist)
When Blitzer asked “Tunisia, then Egypt, what’s
next?,” Ghonim replied succinctly “Ask Facebook.”
26. Social Media Influence:
Intelligence, News & Analysis
Many media companies use Facebook and Twitter
as a news-delivery platform. Many individuals rely on
them as a news source. News is increasingly social.
28. Development
(Education, Health, eGov)
LiveMocha (http://www.livemocha.com/)
‣ Online Language learning tool with social engagement
‣ bridging the gap!!
Soliya (http://www.soliya.net/)
‣ Dialogue between students from diverse backgrounds
across the globe using latest multimedia technologies
Project Einstein (http://digital-democracy.org/what-we-do/programs/)
‣ A photography-based digital penpal program connecting
youths in refugee camps to the world
32. Development
(Education, Health, eGov)
PatientsLikeMe (http://mashable.com/2010/07/13/social-media-health-trends/)
TrialX (http://trialx.com)
Image: http://www.dragonsearchmarketing.com/blog/social-media-development-through-visual-aids-tools/
34. Dimensions of Systematic Study
of Social Media
Spatio - Temporal - Thematic
+
People - Content - Network
35. Social Information
Processing
"Who says what, to whom, why, to what extent
and with what effect?" [Lasswell]
Network: Social structure emerges from the
aggregate of relationships (ties)
People: poster identities, the active effort of
accomplishing interaction
Content: studying the content of communication.
36. Studying Online Human Social
Dynamics
How does the (semantics or style of) content fit
into the observations made about the network?
‣ Often, the three-dimensional dynamic of people,
content and link structure is what shapes the social
dynamic.
38. Studying Online Human Social
Dynamics
Example: how do the topic of discussion, the
emotional charge of a conversation, the presence of an
expert, and connections between participants together
explain information propagation in a social network?
42. People Metadata: Variety of
Self-expression Modes on Multiple
Social Media Platforms
Explicit information from user profiles
‣ User Names, Pictures, Videos, Links, Demographic
Information, Group memberships...
‣ Often is not updated
Implicit information from user attention metadata
‣ Page views, Facebook 'Likes', Comments; Twitter
'Follows', Retweets, Replies..
44. People Metadata: Continued
User Demographic Metadata
•User-id
•Screen/Display-name of user
•Real name of user
•Location
•Profile Creation Date
•User description
•User Bio
•URL
Interest Level Metadata
•Author type: Trustee/donor, journalist, blogger, scientist etc.
•Favorite tweets
•Types of lists subscribed
•Style of Writing (personality indicator)
•No. of Followees
•Author type trend of Followees
45. People Metadata: Continued
Activity Level Metadata
•Age of the profile
•Frequency of posts
•Timestamp of last status
•No. of Posts
•No. of Lists/groups created
•No. of Lists/groups subscribed
Influence Level Metadata
(Inferring People Metadata from Network level Information)
•No. of Followers (normal, influential)
•No. of Mentions
•No. of Retweets/Forwards
•No. of Replies
•No. of Lists/groups following
•No. of people following back
•Authority & Hub Scores
Web Presence:
•User affiliations
•KLOUT Score: influence measure (www.klout.com)
46. Content Metadata
Content Independent metadata
‣ date, location, author etc.
Content Dependent metadata
‣ Direct content-based metadata
  ‣ Explicit/Mentioned Content metadata
    ‣ named entities in content
  ‣ Implicit/Inferred Content Metadata
    ‣ related named entities from knowledge sources
‣ Indirect content-based metadata (External metadata)
  ‣ context inferred from URLs in content (images, links to articles, FourSquare checkins etc.)
49. Content Independent Metadata
For Tweets
‣ Published date and time
‣ Location (where tweet was generated from)
‣ Tweet posting method (smart-phone, twitter.com,
clients for twitter)
‣ Author information
58. Metadata Creation & Extraction
Extracted Metadata
‣ Directly visible information from the user profile, tweet
content & community structure
Created Metadata
‣ After processing information in the user profile, content
and/or network structure
59. An Example
Length: 144 characters; General topic: Egypt protest
This poor {sentiment_expression: {target: "Lara Logan", polarity: "negative"}} woman!
RT @THR CBS News' {entity: {type: "News Agency"}}
Lara Logan {entity: {type: "Person"}}
Released From Hospital {entity: {type: "Location"}}
After Egypt {entity: {type: "Country"}}
Assault {type: "topic"}
http://bit.ly/dKWTY0 {external_URL}
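One way to hold the inline annotations from this example as data is a list of stand-off annotations over the plain tweet text. The sketch below is only an illustration: the `Annotation` class, the `with_offsets` helper, and the choice of which surface string each label covers are assumptions made here, not a format defined in the tutorial.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    text: str                      # surface string the annotation covers (assumed reading of the slide)
    kind: str                      # "entity", "sentiment_expression", "topic", "external_URL"
    attrs: dict = field(default_factory=dict)


tweet = ("This poor woman! RT @THR CBS News' Lara Logan Released From "
         "Hospital After Egypt Assault http://bit.ly/dKWTY0")

annotations = [
    Annotation("This poor", "sentiment_expression",
               {"target": "Lara Logan", "polarity": "negative"}),
    Annotation("CBS News", "entity", {"type": "News Agency"}),
    Annotation("Lara Logan", "entity", {"type": "Person"}),
    Annotation("Hospital", "entity", {"type": "Location"}),
    Annotation("Egypt", "entity", {"type": "Country"}),
    Annotation("Assault", "topic"),
    Annotation("http://bit.ly/dKWTY0", "external_URL"),
]


def with_offsets(text, anns):
    """Attach (start, end) character offsets by locating each surface string."""
    spans = []
    for ann in anns:
        start = text.find(ann.text)
        spans.append((start, start + len(ann.text), ann.kind, ann.attrs))
    return spans


for start, end, kind, attrs in with_offsets(tweet, annotations):
    print(start, end, kind, attrs)
```

Keeping the annotations separate from the text leaves the original message untouched, so several extractors can add metadata to the same tweet.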
60. Why is the Semantic Web a standard
for social metadata?
Rich Snippet, RDFa, open graph, semantic web
based social data standards
Relationships/connections play a central role
‣ Treating relationships as first-class objects is important
62. Semantic Web: A Very Short
Primer
Representation
‣ RDF
‣ relationships as first-class objects: <subject, predicate, object>
‣ OWL
‣ Representing Knowledge and Agreements:
nomenclature, taxonomy, folksonomy, ontology
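The <subject, predicate, object> idea can be sketched without any RDF library by storing triples as tuples and matching them against a pattern. This is a toy illustration only: the `ex:` identifiers and the `match` helper are hypothetical, and a real system would use an RDF store and SPARQL rather than this in-memory set.

```python
# Each fact is one (subject, predicate, object) triple; the relationship
# (predicate) is stored explicitly, just like the things it connects.
triples = {
    ("ex:tweet1", "ex:author", "ex:userA"),
    ("ex:tweet1", "ex:mentions", "ex:Lara_Logan"),
    ("ex:Lara_Logan", "rdf:type", "ex:Person"),
    ("ex:userA", "ex:follows", "ex:userB"),
}


def match(pattern, store):
    """Return the triples matching a (s, p, o) pattern; None is a wildcard."""
    return sorted(
        t for t in store
        if all(want is None or want == have for want, have in zip(pattern, t))
    )


# Which resources does tweet1 mention?
print(match(("ex:tweet1", "ex:mentions", None), triples))
```

Because the predicate is part of the data, questions such as "who follows whom" and "what does this tweet mention" use the same lookup.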
66. Semantic Web: A Very Short
Primer
Annotation
‣ RDFa, Xlink, model reference
Web of Data
‣ Linked Open Data
Querying
‣ SPARQL; Rules: SWRL, RIF
67. How to save and use metadata?
Store metadata as data and use standard database
techniques
Use filtering and clustering, summarization,
statistics - implicit semantics
71. How to save and use metadata?
Use explicit semantics and Semantic Web
standards and technologies
‣ semantics = meaning
‣ richer representation, support for relationships, context
‣ supports use of background knowledge
‣ better integration, powerful analysis
Semantics: the implicit, the formal and the
powerful
Social metadata on the Web
72. Metadata Extraction from
Informal Text
Meena Nagarajan, Understanding User-Generated Content on
Social Media, Ph.D. Dissertation, Wright State University, 2010
75. Content Analysis - Typical Sub-tasks
Recognize key entities mentioned in content
‣ Information Extraction (entity recognition, anaphora
resolution, entity classification..)
‣ Discovery of Semantic Associations between entities
Topic Classification, Aboutness of content
‣ What is the content about?
Intention Analysis
‣ Why did they share this content?
80. Content Analysis - Typical Sub-tasks
Sentiment Analysis
‣ What opinions are people conveying via the content?
Author Profiling
‣ What can we infer about the author from the content he
posts?
Context (external to content) extraction
‣ URL extraction, analyzing external content
81. Research Efforts, Contributions in
this space..
Examining usefulness of multiple context cues
for text mining algorithms
‣ Compensating for informal, highly variable
language, lack of context
‣ Using context cues: Document corpus, syntactic,
structural cues, social medium, external domain
knowledge…
In this talk, highlighting sample metadata
creation tasks: NER, Key Phrase Extraction,
Intention, Sentiment/Opinion Mining
82. Part 1. NER, Key
Phrase Extraction
Named Entity Recognition
‣ I loved <movie> the hangover </movie>!
Key Phrase Extraction
84. Multiple Context Cues Utilized for
Keyphrase Extraction from Twitter,
Facebook and MySpace
85. Focus, Impact
Techniques focus on
‣ relatively less explored content aspects on social
media platforms
Combination of top-down, bottom-up analysis
for informal text
‣ Statistical NLP, ML algorithms over large corpora
‣ Models and rich knowledge bases in a domain
87. NAMED ENTITY
RECOGNITION
I loved your music Yesterday!
“It was THE HANGOVER of the year..lasted
forever..
So I went to the movies..bad choice picking “GI
Jane” worse now”
88. NAMED ENTITY
RECOGNITION
Identifying and classifying tokens
89. NER in prior work vs. NER for
Informal Text
90. Cultural Named Entities
NER focus in this work: Cultural Named
Entities
Artifacts of Culture
‣ Names of books, music albums, films, video games,
etc.
Common words in a language
‣ The Lord of the Rings, Lips, Crash, Up, Wanted,
Today, Twilight, Dark Knight…
91. Characteristics of Cultural Entities
Varied senses, several poorly documented
‣ Merry Christmas: covered by 60+ artists
‣ Star Trek: movies, TV series, media franchise.. and cuisines !!
Changing contexts with recent events
‣ The Dark Knight reference to Obama, health care
reform
Unrealistic expectations
‣ Comprehensive sense definitions, enumeration of
contexts, labeled corpora for all senses ..
‣ NER: relaxing the closed-world sense assumptions
93. A Spot and Disambiguate
Paradigm
NER is generally treated as a sequential prediction problem
‣ An NER system achieves a 90.8 F1 score on the
CoNLL-2003 NER shared task (PER, LOC, ORG
entities) [Lev Ratinov, Dan Roth]
Focus of approach: Spot and Disambiguate
Paradigm
Starting off with a dictionary or list of entities we
want to spot
94. A Spot and Disambiguate
Paradigm
Spot, then disambiguate in context (natural
language, domain knowledge cues)
Binary Classification
Is this mention of “the hangover” in a sentence
referring to a movie?
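A minimal sketch of the two steps, assuming a tiny dictionary and a hand-picked list of cue words. `MOVIE_TITLES`, `CUE_WORDS`, and the window rule in `is_movie_mention` are hypothetical stand-ins: the work described here uses supervised learners over language and domain-knowledge features, not this rule.

```python
import re

MOVIE_TITLES = ["the hangover", "twilight"]   # hypothetical dictionary of entities to spot
CUE_WORDS = {"watch", "watching", "watched", "movie", "movies", "scenes", "film"}  # hypothetical cues


def spot(text, titles=MOVIE_TITLES):
    """Step 1: return (title, start, end) for each dictionary entry found in text."""
    lowered = text.lower()
    spots = []
    for title in titles:
        for m in re.finditer(r"\b" + re.escape(title) + r"\b", lowered):
            spots.append((title, m.start(), m.end()))
    return sorted(spots, key=lambda s: s[1])


def is_movie_mention(text, start, end, window=5):
    """Step 2 (toy binary decision): is a cue word within `window` tokens of the spot?"""
    before = re.findall(r"[a-z']+", text[:start].lower())[-window:]
    after = re.findall(r"[a-z']+", text[end:].lower())[:window]
    return any(token in CUE_WORDS for token in before + after)


for sentence in ["I am watching Pattinson scenes in Twilight for the nth time.",
                 "It was THE HANGOVER of the year..lasted forever.."]:
    for title, start, end in spot(sentence):
        print(title, is_movie_mention(sentence, start, end))
```

A cue-word rule like this is easily fooled (a sentence about watching the twilight by the bay would be accepted), which is one reason to train a classifier over richer context features instead.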
97. Algorithmic Contributions
Supervised Algorithms
Examples:
“I am watching Pattinson scenes in <movie
id=2341>Twilight</movie> for the nth time.”
“I spent a romantic evening watching the Twilight
by the bay..”
“I love <artist id=357688>Lily’s</artist> song
102. Algorithm Preliminaries
Goal: Semantic Annotation of
music named entities (w.r.t
MusicBrainz)
103. Using a Knowledge Resource for
NER is not straightforward..
104. Approach Overview
Scoped Relationship graphs
‣ Using context cues from the content, webpage title, url…
  (e.g., “new Merry Christmas tune”)
‣ Reduce potential entity spot size
  (e.g., new albums/songs)
‣ Generate candidate entities
‣ Spot and Disambiguate
105. Sample Real-world Constraints
Career Restrictions
‣“release your third album already..”
Recent Album restrictions
‣“I loved your new album..”
Artist age restrictions
‣”happy 25th rihanna, loved alfie btw..” etc.
106. Non-Music Mentions
Challenge 1: Several senses in the same domain
‣ Scoping relationship graphs narrows possible senses
‣ Solves the named entity identification problem
partially
Challenge 2: Non-music mentions
‣ Got your new album Smile. Loved it!
‣ Keep your SMILE on!
108. Using Language Features to
eliminate incorrect mentions..
Syntactic features
‣ POS Tags, Typed dependencies..
‣ Example here
Word-level features
‣ Capitalization, Quotes
Domain-level features
110. Hand Labeling - Fairly Subjective
1800+ spots in MySpace user comments from
artist pages
Keep your SMILE on!
– good spot, bad spot, inconclusive?
4-way annotator agreements
– Madonna 90% agreement
– Rihanna 84% agreement
– Lily Allen 53% agreement
111. Dictionary Spotter + NLP Step
Daniel Gruhl, Meena Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Context and Domain
Knowledge Enhanced Entity Spotting in Informal Text, The 8th International Semantic Web Conference,
2009: 260-276
112. NER on Social Media Text using
Domain Knowledge
Highlights issues with using domain
knowledge for an IE task
Two stage approach: chaining NL learners over
results of domain model based spotters
Improves accuracy up to a further 50%
‣ allows the more time-intensive NLP analytics to
run on less than the full set of input data
113. BBC SoundIndex (IBM Almaden):
Pulse of the Online Music
Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: “Multimodal Social
Intelligence in a Real-Time Dashboard System,” special issue of the VLDB Journal on "Data Management
and Mining for Social Networks and Social Media", 2010. http://www.almaden.ibm.com/cs/projects/iis/sound/
114. The Vision
http://www.almaden.ibm.com/cs/projects/iis/sound/
116. Several Insights
Trending popularity of artists
Trending topics in artist pages
Only 4% -ve sentiments, perhaps ignore the Sentiment Annotator on this data source?
Ignoring Spam can change ordering of popular artists
117. Predictive Power of Data
Billboard's Top 50 Singles chart during the week of
Sept 22-28 ’07 vs. MySpace popularity charts.
User study indicated 2:1 and up to 7:1 (younger age
groups) preference for MySpace list.
Challenging traditional polling methods!
119. Key Phrase Extraction: Example
Key phrases extracted from prominent discussions
on Twitter around the 2009 Health Care Reform
debate and 2008 Mumbai Terror Attack on one day
120. Key Phrase Extraction from SM
Text
Different from Information Extraction
Extracting vs. Assigning Key Phrases
Focus: Key Phrase Extraction
Prior work focus: extracting phrases that
summarize a document -- a news article, a web
page, a journal article, a book..
Focus: summarize multiple documents (UGC)
around same event/topic of interest
121. Key Phrase Extraction on SM
Content
Focus: Summarizing Social Perceptions via key
phrase extraction
Preserving/Isolating the social behind the social
data
‣ What is said in Egypt vs. the USA should be viewed in
isolation
122. Key Phrase Extraction on SM
Content
‣ Accounting for redundancy, variability, off-topic
content
“Met up with mom for lunch, she looks lovely as ever,
good genes .. Thanks Nike, I love my new
Gladiators ..smooth as a feather. I burnt all the calories of
Italian joy in one run.. if you are looking for good Italian
food on Main, Buca is the place to go.”
123. Social and Cultural Logic in SMC
Thematic components
‣ similar messages convey similar ideas
Space, time metadata
‣ role of community and geography in communication
Poster attributes
‣ age, gender, socio-economic status reflect similar
perceptions
124. Feature Space (common to several
efforts)
Focus: n-grams, spatio-temporal metadata (social
components)
Syntactic Cues: In quotes, italics, bold; in
document headers; phrases collocated with
acronyms
125. Feature Space (common to several
efforts)
Document and Structural Cues: Two word
phrases, appearing in the beginning of a
document, frequency, presence in multiple similar
documents etc.
Linguistic Cues: Stemmed form of a phrase,
phrases that are simple and compound nouns in
sentences etc.
126. Key Phrase Extraction: Overview
“President Obama in trying to regain control of the
health-care debate will likely shift his pitch in
September”
1-grams: President, Obama, in, trying, to, regain, ...
2-grams: “President Obama”, “Obama in”, “in trying”, “trying ...
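The n-gram listing above can be reproduced with a few lines of code. This sketch assumes plain whitespace tokenization, which the slides do not specify; the function name `ngrams` is chosen here for illustration.

```python
def ngrams(text, n):
    """Return all contiguous n-token phrases of a whitespace-tokenized text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = ("President Obama in trying to regain control of the "
            "health-care debate will likely shift his pitch in September")

print(ngrams(sentence, 1)[:6])
print(ngrams(sentence, 2)[:3])
```

Every n-gram produced this way is a candidate descriptor; the weighting described next decides which ones are kept.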
127. A descriptor is an n-gram weighted by:
‣ Thematic Importance
‣ TFIDF, stop words, noun phrases
‣ Redundancy: statistically discriminatory in nature
‣ variability: contextually important
‣ Spatial Importance (local vs. global popularity)
‣ Temporal Importance (always popular vs. currently trending)
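The thematic part of the weight mentions TFIDF. A minimal sketch of a plain TF-IDF computation over tokenized documents follows; it is a textbook formulation assumed here for illustration and does not include the spatial or temporal components, whose exact form the slides do not give.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


docs = [["health", "care", "reform"],
        ["health", "care", "debate"],
        ["mumbai", "attack"]]
for weights in tfidf(docs):
    print(sorted(weights.items(), key=lambda kv: -kv[1]))
```

Terms that appear in fewer documents ("reform") score higher than terms shared across documents ("health"), which is the discriminating behaviour the thematic weight relies on.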
129. Eliminating Off-topic Content [WISE2009]
Frequency based heuristics will not eliminate
off-topic content that is ALSO POPULAR
130. Approach Overview
“Yeah i know this a bit off topic but the other
electronics forum is dead right now. im looking
for a good camcorder, somethin not to large that
can record in full HD only ones so far that ive
seen are sonys”
“Canon HV20. Great little cameras under $1000.”
131. Approach Overview
Assume one or more seed words (from domain
knowledge base)
C1 - ['camcorder']
Extracted Key words / phrases
C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive',
'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']
Gradually expand C1 by adding phrases from C2
that are strongly associated with C1
Mutual Information based algorithm [WISE2009]
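The expansion of C1 from C2 can be sketched as a greedy loop that scores each candidate by pointwise mutual information (PMI) with the phrases already accepted. Everything concrete here is an assumption for illustration: the toy `docs` co-occurrence data, the `threshold`, and the use of document-level PMI. The published algorithm is described in [WISE2009].

```python
import math


def pmi(a, b, docs):
    """Pointwise mutual information of phrases a and b over a list of phrase sets."""
    n = len(docs)
    p_ab = sum(a in d and b in d for d in docs) / n
    if p_ab == 0:
        return float("-inf")          # never co-occur: no evidence of association
    p_a = sum(a in d for d in docs) / n
    p_b = sum(b in d for d in docs) / n
    return math.log(p_ab / (p_a * p_b))


def expand(seeds, candidates, docs, threshold=0.0):
    """Greedily move candidates into the seed set while any is associated strongly enough."""
    c1 = list(seeds)
    remaining = [c for c in candidates if c not in c1]
    changed = True
    while changed:
        changed = False
        for cand in list(remaining):
            if max(pmi(cand, s, docs) for s in c1) > threshold:
                c1.append(cand)
                remaining.remove(cand)
                changed = True
    return c1


# Hypothetical co-occurrence data: each set is the phrases seen in one document.
docs = [{"camcorder", "canon hv20", "hd"},
        {"camcorder", "canon hv20"},
        {"camcorder", "hd", "cameras"},
        {"electronics forum", "offtopic"},
        {"offtopic", "somethin"},
        {"electronics forum", "ive"}]
c2 = ["electronics forum", "hd", "camcorder", "somethin", "ive",
      "canon", "little camera", "canon hv20", "cameras", "offtopic"]
print(expand(["camcorder"], c2, docs))
```

With this toy data the on-topic phrases join C1 while 'offtopic', 'somethin', and 'electronics forum' stay out, even though some of them are frequent.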
132. Key Phrases and Aboutness
Evaluations
Are the key phrases we extracted topical and
good indicators of what the content is about?
‣ If they are, they should act as effective index/search
phrases and return relevant content
Evaluation Application: Targeted Content
Delivery
133. Targeted Content
Delivery - Evaluations
12K posts from MySpace and Facebook
Electronics forums
‣ Baseline phrases: Yahoo Term Extractor
‣ Our method phrases: Key phrase extraction,
elimination
Targeted Content from Google AdSense
134. Targeted Content for all content
vs. extracted key phrases
136. Impact and Contributions
TFIDF + social contextual cues yield more useful
phrases that preserve social perceptions
Corpus + seeds from a domain knowledge base
eliminate off-topic phrases effectively
138. Targeted Content Delivery via
Intention Mining
On social networks
Use case for this talk
‣ Targeted content = content-based advertisements
‣ Target = user profiles
Content-based advertisements (CBAs)
‣ Well-known monetization model for online content
141. What is going on here
Interests do not translate to purchase intents
‣ Interests are often outdated..
‣ Intents are rarely stated on a profile..
Cases that do seem to work
‣ New store openings, sales
‣ Highly demographic-targeted ads
144. Targeted Content-based
Advertising
Non-trivial
‣ Non-policed content
  ‣ Brand image, Unfavorable sentiments
‣ People are there to network
  ‣ User attention to ads is not guaranteed
‣ Informal, casual nature of content
  ‣ People are sharing experiences and events
  ‣ Main message overloaded with off-topic content
146. Targeted Content-based
Advertising
I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a
video project due tomorrow for merrill lynch :(( all i need to
do is simple: Extract several scenes from a clip, insert
captions, transitions and thats it. really. omgg i cant figure out
anything!! help!! and i got food poisoning from eggs. its not
fun. Pleasssse, help? :(
Learning from Multi-topic Web Documents for Contextual
Advertisement, Zhang, Y., Surendran, A. C., Platt, J. C., and
Narasimhan, M., KDD 2008
147. Preliminary Results in…
Identifying intents behind user posts on social
networks
‣ Identify Content with monetization potential
Identifying keywords for advertising in user-
generated content
‣ Considering interpersonal communication & off-topic
chatter
148. Investigations
User studies
‣ Hard to compare activity based ads to s.o.t.a
‣ Impressions to Clickthroughs
‣ How well are we able to identify monetizable posts
‣ How targeted are ads generated using our keywords
vs. entire user generated content
149. Identifying Monetizable Intents
Scribe Intent not same as Web Search Intent [1]
People write sentences, not keywords or phrases
Presence of a keyword does not imply
navigational / transactional intents
‣ ‘am thinking of getting X’ (transactional)
‣ ‘I like my new X’ (information sharing)
‣ ‘what do you think about X’ (information seeking)
[1] B. J. Jansen, D. L. Booth, and A. Spink, “Determining the informational, navigational, and transactional intent of web
queries,” Inf. Process. Manage., vol. 44, no. 3, 2008.
150. From X to Action Patterns
Action patterns surrounding an entity
‣ How questions are asked and not topic words that indicate
what the question is about
‣ “where can I find a chottopspcam”
‣ User post also has an entity
151. Conceptual Overview
Bootstrapping to learn IS patterns
Set of user posts from SNSs
Not annotated for presence or absence of any intent
152. Bootstrapping to
learn IS patterns
Generate a universal set of n-gram patterns; freq > f
S = set of all 4-grams; freq > 3
153. Bootstrapping to
learn IS patterns
Generate set of candidate patterns from seed words
(why, when, where, how, what)
Sc = all 4-grams in S that extract seed words
154. Bootstrapping to
learn IS patterns
User picks 10 seed patterns from Sc
Sis = ‘does anyone know how’, ‘where do I find’,
‘someone tell me where’…
155. Bootstrapping to
learn IS patterns
Gradually expand Sis by adding Information
Seeking patterns from Sc
156. Bootstrapping to
learn IS patterns
For every pis in Sis generate set of filler patterns
157. Bootstrapping to
learn IS patterns
‘.* anyone know how’
‘does .* know how’
‘does anyone .* how’
‘does anyone know .*’
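The first steps of the bootstrapping procedure can be sketched as three small functions: collect frequent 4-grams (S), keep those containing a seed word (Sc), and generate filler patterns for a chosen pattern. Whitespace tokenization, lowercasing, and reading "extract seed words" as "contain a seed word" are assumptions made here; the function names are hypothetical.

```python
from collections import Counter

SEED_WORDS = {"why", "when", "where", "how", "what"}


def frequent_ngrams(posts, n=4, min_freq=3):
    """S: all n-grams whose corpus frequency is greater than min_freq."""
    counts = Counter()
    for post in posts:
        tokens = post.lower().split()
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram for gram, count in counts.items() if count > min_freq}


def candidate_patterns(grams, seeds=SEED_WORDS):
    """Sc: the n-grams in S that contain at least one seed word."""
    return {gram for gram in grams if seeds & set(gram.split())}


def filler_patterns(pattern):
    """Replace each token in turn with the wildcard '.*'."""
    tokens = pattern.split()
    return [" ".join(tokens[:i] + [".*"] + tokens[i + 1:]) for i in range(len(tokens))]


posts = ["does anyone know how to fix this"] * 4 + ["i like my new phone"]
s = frequent_ngrams(posts)
sc = candidate_patterns(s)
print(sorted(sc))
print(filler_patterns("does anyone know how"))
```

The filler patterns are what allow the seed set to grow: any 4-gram in Sc that matches a filler pattern differs from an accepted pattern by a single word and becomes a candidate for inclusion.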
163. Expanding the Pattern Pool
Functional properties / communicative functions
of words
From a subset of LIWC [1]
– cognitive mechanical (e.g., if, whether, wondering, find)
• ‘I am thinking about getting X’
– adverbs (e.g., how, somehow, where)
– (e.g., someone, anybody, whichever)
• ‘Someone tell me where can I find X’
[1] Linguistic Inquiry Word Count, LIWC, http://liwc.net
164. Details in [WISE2009] for..
Over iterations, single-word substitutions,
functional usage and empirical support
conservatively expand Sis
Infusing new patterns and seed words
Stopping conditions
166. Identifying Monetizable Posts
Information Seeking patterns generated offline
Information seeking intent score of a post
‣ Extract and compare patterns in posts with
extracted patterns
Transactional intent score of a post
‣ LIWC ‘Money’ dictionary - 173 words and
word forms indicative of transactions, e.g.,
trade, deal, buy, sell, worth, price etc.
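A rough sketch of the two scores, assuming simple counting: the number of known information-seeking patterns found in a post, and the number of tokens found in a money lexicon. The pattern list and the six-word lexicon below are small stand-ins taken from the examples on these slides, not the full learned pattern set or the 173-entry LIWC 'Money' dictionary, and the counting scheme itself is an assumption.

```python
import re

IS_PATTERNS = ["does anyone know how", "where do i find", "someone tell me where"]
MONEY_WORDS = {"trade", "deal", "buy", "sell", "worth", "price"}


def _tokens(post):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", post.lower())


def information_seeking_score(post, patterns=IS_PATTERNS):
    """Number of known information-seeking patterns that occur in the post."""
    text = " " + " ".join(_tokens(post)) + " "
    return sum(1 for pattern in patterns if f" {pattern} " in text)


def transactional_score(post, money_words=MONEY_WORDS):
    """Number of tokens in the post that appear in the money lexicon."""
    return sum(1 for token in _tokens(post) if token in money_words)


post = "Does anyone know how much a used camcorder is worth? Looking to buy one."
print(information_seeking_score(post), transactional_score(post))
```

A post that scores on both counts is a candidate monetizable post; how the two scores are combined or thresholded is left open here.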
168. Keywords for Advertising
Identifying keywords in monetizable posts
‣ Plethora of work in this space
Off-topic noise removal is our focus
‣ I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and
i have a video project due tomorrow for merrill lynch :(( all
i need to do is simple: Extract several scenes from a clip,
insert captions, transitions and thats it. really. omgg i cant
figure out anything!! help!! and i got food poisoning from
eggs. its not fun. Pleasssse, help? :(
169. Conceptual Overview
(also see slides 88,89)
Topical hints
‣ C1 - ['camcorder']
Keywords in post
‣ C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon',
'little camera', 'canon hv20', 'cameras', 'offtopic']
Move strongly related keywords from C2 to C1 one-by-one
‣ Relatedness determined using information gain
‣ Using the Web as a corpus, domain independent
171. Evaluations - User Study
Keywords from 60 monetizable user posts
‣ Monetizable intent, at least 3 keywords in content
45 MySpace Forums, 15 Facebook Marketplace, 30
graduate students
‣ 10 sets of 6 posts each
‣ Each set evaluated by 3 randomly selected users
Monetizable intents?
‣ All 60 posts voted as unambiguously information seeking in intent
172. 1. Effectiveness of using
topical keywords
Google AdSense ads for user post vs. extracted
topical keywords
174. Result - 2X Relevant Impressions
Users picked ads relevant to the post
‣ At least 50% inter-evaluator agreement
For the 60 posts
‣ Total of 144 ad impressions
‣ 17% of ads picked as relevant
For the topical keywords
‣ Total of 162 ad impressions
‣ 40% of ads picked as relevant
175. 2. Profile Ads vs. Activity Ads
User’s profile information
‣ Interests, hobbies, TV shows..
‣ Non-demographic information
Submit a post
Looking to buy and why (induced noise)
Ads that generate interest, captured attention
176. Result - 8X Generated Interest
Using profile ads
‣ Total of 56 ad impressions
‣ 7% of ads generated interest
Using authored posts
‣ Total of 56 ad impressions
‣ 43% of ads generated interest
Using topical keywords from authored posts
‣ Total of 59 ad impressions
‣ 59% of ads generated interest
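The "2X" and "8X" headline figures follow from the percentages reported in the two result slides. The short check below assumes the comparisons are 17% vs. 40% (ads relevant for full posts vs. topical keywords) and 7% vs. 59% (profile ads vs. topical keywords from authored posts); the `lift` helper is named here for illustration.

```python
def lift(baseline_share, improved_share):
    """Ratio of two shares, e.g. the fraction of ads judged relevant."""
    return improved_share / baseline_share


relevance_lift = lift(0.17, 0.40)   # full post vs. extracted topical keywords
interest_lift = lift(0.07, 0.59)    # profile ads vs. topical keywords from posts

print(f"relevant impressions: {relevance_lift:.1f}x")
print(f"generated interest:   {interest_lift:.1f}x")
```

The ratios come out at roughly 2.4 and 8.4, consistent with the rounded 2X and 8X in the slide titles.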
177. To note…
User studies small and preliminary, clearly suggest
‣ Monetization potential in user activity
‣ Improvement for Ad programs in terms of relevant
impressions
Evaluations based on forum, marketplace
‣ Verbose content
‣ Status updates, notes, community and event
memberships…
‣ One size may not fit all
178. To note…
A world between relevant impressions and click
throughs
‣ Objectionable content, vocabulary impedance, Ad
placement, network behavior
In a pipeline of other community efforts
No profile information taken into account
Cannot custom send information to Google AdSense
180. Content Analysis: Sentiment
Analysis/Opinion Mining
Two main types of information we can learn from
user-generated content: fact vs. opinion
Much of what we read in social media (e.g., blogs,
Twitter, Facebook) is a mix of facts and opinions.
For example, “Latest news: Mobile web services not
working in #Bahrain and Internet is extremely slow
#feb14 {fact}... looks like they "learned" from #Egypt
{opinion}”
181. Sentiment Analysis Motivation
Which movie should I see?
What customers complain about?
Why do people oppose health care reform?
182. Sentiment Analysis: Tasks
Example:
‣ How awful that many #Egyptian artifacts are in danger of
being destroyed.
‣ What Zahi Hawass must be thinking #jan25 (read in the
tone of “what were YOU thinking”)
184. Sentiment Analysis: Tasks
Classification: overall sentiment polarity: positive/
neutral/negative
‣ Example: “How awful that many #Egyptian artifacts are
in danger of being destroyed.”
‣ Overall polarity is negative
Target-specific sentiment polarity: positive/neutral/
negative
‣ Example: for target "egyptian artifacts", polarity is
"negative"; for target "Zahi Hawass", polarity is "neutral"
189. Sentiment Analysis: Approaches
Identification & Extraction:
‣ utilizing the relations between opinion and opinion target,
‣ proximity,
‣ syntactic dependency,
‣ co-occurrence and
‣ prepared patterns/rules
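Of the cues listed above, proximity is the simplest to illustrate. The sketch below is only an assumption of how a proximity rule could look: opinion words count toward a target only if they fall within a fixed token window around it. The four-word lexicon, the window size, and the function names are all hypothetical.

```python
import re

# Tiny illustrative lexicon (hypothetical); a real system would load a
# full subjectivity lexicon instead.
OPINION_WORDS = {"awful": -1, "terrible": -1, "great": 1, "love": 1}


def tokenize(text):
    """Lowercase and split into word tokens; '#' and '@' prefixes are dropped."""
    return re.findall(r"\w+", text.lower())


def target_polarity(text, target, window=4):
    """Sum the polarity of opinion words within `window` tokens of the target."""
    tokens = tokenize(text)
    target_tokens = tokenize(target)
    n = len(target_tokens)
    score = 0
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != target_tokens:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + n + window)
        context = tokens[lo:i] + tokens[i + n:hi]
        score += sum(OPINION_WORDS.get(tok, 0) for tok in context)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweet_1 = "How awful that many #Egyptian artifacts are in danger of being destroyed."
tweet_2 = "What Zahi Hawass must be thinking #jan25"
print(target_polarity(tweet_1, "egyptian artifacts"))
print(target_polarity(tweet_2, "Zahi Hawass"))
```

A window rule like this misses sarcasm and long-range dependencies, which is why the slide also lists syntactic dependency and patterns.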
190. Sentiment Analysis:
From Tweets to polls
Corpus:
• 0.7 billion tweets, Jan 2008 – Oct 2009
• 1.5 billion tweets, Jan 2008 – May 2010
Lexicon-based approach for sentiment analysis of tweets:
Subjectivity lexicon from OpinionFinder (Wilson et al., 2005)
Within topic tweets, count messages containing the positive and
negative words defined by the lexicon
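The counting step can be sketched as follows. This is a simplified illustration, not the authors' code: the word sets stand in for the OpinionFinder lexicon, topic membership is a plain keyword match, and reporting a positive-to-negative ratio is an assumption about how the counts are combined.

```python
import re

# Stand-ins for lexicon entries (hypothetical, for illustration only).
POSITIVE = {"good", "hope", "great"}
NEGATIVE = {"bad", "fear", "awful"}


def sentiment_counts(messages, topic_keyword):
    """Count topic messages containing any positive / any negative word.

    A message containing both kinds of word is counted on both sides.
    Returns (positive_count, negative_count, ratio or None).
    """
    pos = neg = 0
    topic = topic_keyword.lower()
    for msg in messages:
        words = set(re.findall(r"\w+", msg.lower()))
        if topic not in words:
            continue  # not a topic tweet
        if words & POSITIVE:
            pos += 1
        if words & NEGATIVE:
            neg += 1
    ratio = pos / neg if neg else None
    return pos, neg, ratio


sample = [
    "Good news on jobs today",
    "I fear more jobs will be lost",
    "jobs report: great numbers, bad timing",
    "Great weather",  # ignored: not about the topic
]
print(sentiment_counts(sample, "jobs"))
```

Computed per day over a large corpus, a ratio like this gives a time series that can be compared against poll results.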
B. O'Connor, R. Balasubramanyan, B. R. Routledge, and
N. A. Smith. From Tweets to Polls: Linking Text Sentiment to Public
Opinion Time Series. In Intl. AAAI Conference on Weblogs and
Social Media, Washington, D.C., 2010.
194. Sentiment Analysis: Predicting
the Future With Social Media
Corpus: 2.89 million tweets referring to 24 movies released over a period of three months
Sentiment Analysis Classifier:
DynamicLMClassifier provided by the LingPipe linguistic analysis package
Thousands of Amazon Mechanical Turk workers assigned
sentiments (positive, negative, neutral) to a large random sample of tweets
The classifier was trained using an n-gram model
S. Asur and B. Huberman. Predicting the Future With Social Media. 2010. http://arxiv.org/abs/1003.5699
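To make "train the classifier using an n-gram model" concrete, here is a minimal Python sketch of a language-model classifier: one character n-gram model per sentiment label, with add-one smoothing. It is not LingPipe's implementation or API; the class name, the trigram order, and the toy training data are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


class CharNgramClassifier:
    """One add-one-smoothed character n-gram language model per label."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)    # label -> n-gram counts
        self.context_counts = defaultdict(Counter)  # label -> (n-1)-gram counts
        self.doc_counts = Counter()                 # label -> training docs
        self.alphabet = set()

    def _ngrams(self, text):
        padded = " " * (self.n - 1) + text.lower()
        for i in range(len(padded) - self.n + 1):
            yield padded[i:i + self.n]

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.alphabet.update(text.lower())
        for gram in self._ngrams(text):
            self.ngram_counts[label][gram] += 1
            self.context_counts[label][gram[:-1]] += 1

    def classify(self, text):
        """Return the label whose model gives the text the highest log score."""
        total_docs = sum(self.doc_counts.values())
        vocab = len(self.alphabet) + 1  # +1 reserves mass for unseen characters
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in self._ngrams(text):
                num = self.ngram_counts[label][gram] + 1
                den = self.context_counts[label][gram[:-1]] + vocab
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = CharNgramClassifier(n=3)
# Toy stand-ins for the crowd-labeled tweets described on the slide.
clf.train("loved this movie, great fun", "positive")
clf.train("great acting, loved it", "positive")
clf.train("boring and awful movie", "negative")
clf.train("awful plot, so boring", "negative")
clf.train("the movie opens on friday", "neutral")
print(clf.classify("loved it, great"))
print(clf.classify("so boring, awful"))
```

Character n-grams tolerate the misspellings and elongations common in tweets, which is one reason a language-model classifier suits this data.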
200. Sentiment Analysis: Target-specific opinion
identification & Classification of
Tweets - Unsupervised Approach
Simple lexicon-based method doesn't work.
Observations:
The opinions may not contribute toward the given target (1,2,3,6)
The subjectivity and polarity of opinion clues are domain-dependent (5,7)
Single words are not enough (4,7,8)
201. Sentiment Analysis: Target-specific opinion
identification & Classification of
Tweets - Unsupervised Approach
General subjective lexicon
‣ Commonly used subjective lexicon + popular slang learned from
Urban Dictionary
Domain-dependent sentiment lexicon
‣ Learned from domain-specific corpus
‣ Bootstrapping
‣ More than words (word/phrase/pattern)
‣ n-gram + statistical model
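One way to read "bootstrapping" here is: start from a few seed words of known polarity, then promote words and bigrams that consistently co-occur with one polarity in the domain corpus. The sketch below is an assumption about that idea, not the method used in the tutorial; the agreement rule, the thresholds, and the toy movie corpus are hypothetical.

```python
import re
from collections import Counter


def bootstrap_lexicon(corpus, seeds, rounds=2, min_count=2):
    """Expand a seed lexicon (term -> +1/-1) with unigrams and bigrams.

    A candidate is added only if it appears in at least `min_count`
    polar messages and every one of them has the same polarity.
    """
    lexicon = dict(seeds)
    for _ in range(rounds):
        votes, seen = Counter(), Counter()
        for text in corpus:
            tokens = re.findall(r"\w+", text.lower())
            bigrams = {" ".join(pair) for pair in zip(tokens, tokens[1:])}
            candidates = set(tokens) | bigrams
            polarity = sum(lexicon[c] for c in candidates if c in lexicon)
            if polarity == 0:
                continue  # no usable evidence in this message
            for cand in candidates:
                if cand not in lexicon:
                    votes[cand] += 1 if polarity > 0 else -1
                    seen[cand] += 1
        added = {
            cand: (1 if vote > 0 else -1)
            for cand, vote in votes.items()
            if seen[cand] >= min_count and abs(vote) == seen[cand]
        }
        if not added:
            break
        lexicon.update(added)
    return lexicon


movie_corpus = [
    "great film, a must see",
    "must see thriller, great cast",
    "awful film, total snoozefest",
    "awful pacing, such a snoozefest",
]
lexicon = bootstrap_lexicon(movie_corpus, {"great": 1, "awful": -1})
print(lexicon["must see"], lexicon["snoozefest"])
```

The phrase "must see" and the slang "snoozefest" are picked up, while "film", which appears with both polarities, is not. On a real corpus a statistical association measure and a stop-word filter would be needed in place of the strict agreement rule.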
208. Content Analysis: Context
Extraction, Utilization
URL Extraction is for Tweets
FourSquare in Facebook, Twitter
What is it in other media/SMS?
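For tweets, URL extraction can be sketched with a regular expression. The pattern below is a deliberately simple assumption (http/https links only, trailing punctuation trimmed), not a full URL grammar.

```python
import re

# Matches http:// or https:// followed by non-space, non-quote characters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text):
    """Return the URLs found in a message, without trailing punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(text)]


message = "Report here: http://example.org/a?b=1, more at https://example.com/x."
print(extract_urls(message))
```

The extracted links are one source of context: the linked page can be fetched and analyzed alongside the short message itself.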
210. Author Categorization: Using
Content to derive additional
People metadata
Personality Signals
Blogs, Style of Writing
Psychometric analysis of content
Sample study: Gendered writing styles online
211. People Analysis: Using Network
to derive People metadata
Interesting questions to ask:
‣ Who are the most popular people in the network?
‣ Who are the most influential people in the network?
‣ Who are the most active people in the network?
‣ What are the types of people in communities of the
network?
‣ Who are the bridges between communities in the network?
212. People Analysis: Influence
By Link Analysis Algorithms
HITS [K-99] & variants
PageRank [BP-97] & variants etc.
Links not sufficient!
‣ Million Follower Fallacy [C-10]
Source : informing-arts
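As a reference point for link-based influence, here is a minimal PageRank sketch using power iteration on a follower graph. The damping factor of 0.85 and the fixed iteration count are conventional choices assumed here, and the three-node graph is made up.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    graph: dict mapping each node to the list of nodes it links to
    (for example, the accounts it follows).
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread over all nodes.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for u, targets in graph.items():
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


follows = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(follows)
print(sorted(ranks, key=ranks.get, reverse=True))
```

A score like this depends only on link structure, which is exactly the limitation the slide points out: a large follower count does not by itself show that anyone acts on what the account posts.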
214. People Analysis: Influence
Flavor of Context Analysis (activity level)
Popularity NOT = Influence!
‣ Influence & Passivity [RGAH-10]
Interest Similarity
‣ TwitterRank: Reciprocity & Homophily [WLJH-10]
Klout Score - True Reach, Amplification [Klout]
215. People Analysis: User types
& Affiliation
Blogger, Scientist, Journalist, Artist, Trustee,
Company X in Domain Y..
‣ Multiple types and affiliations!
User interest mining
‣ Key Phrase Extraction followed by semantic association on
user bio, tweets, lists, favorite posts
‣ Twitter Study [BCDMJNRM-09]
Source: kahunainstitute.com
217. People Analysis: User types
& Affiliation
Semantic analysis of profile description
‣ Web Presence: Use of Web & Knowledge bases
(Wikipedia, Blogs) to build context for user types
‣ Entity Spotting & Extraction, followed by Semantic
Association and Similarity with user-type context
218. People Analysis:
Social Engagement
Source: http://www.syscomminternational.com/
Frequency Distribution Analysis of user activity
‣ posting, retweet, reply, mentions, lists etc.
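A frequency distribution of activity types per user can be sketched as below. The text-prefix heuristics for telling a retweet, reply, and mention apart are assumptions for illustration; real data would normally carry this in tweet metadata.

```python
import re
from collections import Counter, defaultdict


def classify_activity(text):
    """Label a tweet by simple textual conventions (heuristic)."""
    if text.startswith("RT @"):
        return "retweet"
    if text.startswith("@"):
        return "reply"
    if re.search(r"@\w+", text):
        return "mention"
    return "post"


def activity_distribution(tweets):
    """tweets: iterable of (user, text) pairs -> user -> Counter of activity types."""
    distribution = defaultdict(Counter)
    for user, text in tweets:
        distribution[user][classify_activity(text)] += 1
    return distribution


sample_tweets = [
    ("bob", "RT @amy: big news"),
    ("bob", "@amy thanks!"),
    ("bob", "Lunch with @carl"),
    ("bob", "Hello world"),
    ("amy", "Hello"),
]
for user, counts in activity_distribution(sample_tweets).items():
    print(user, dict(counts))
```

Comparing these per-user distributions separates, for instance, accounts that mostly broadcast from accounts that mostly converse.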
219. Network Analysis
Foundation of network:
• Nodes
• Connections/Relationships
Interesting questions to ask:
How communities form around topics: growth & evolution
What are the effects of the presence of influential participants in the
communities
What are the effects of the nature of content (or sentiment, opinions)
flowing in the network on community life
What is the community structure: degree of separation and
sub-communities
222. Network Analysis: Algorithms
Community Discovery, growth, evolution
‣ Based on relationship types (e.g., signed network),
geography/location based etc.
Hierarchical clustering algorithms – Top-down,
bottom-up
Modularity Maximization [NW-06]
Algorithms comparison survey [B-06]
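Modularity maximization needs a way to score a candidate partition. The sketch below computes the standard modularity Q of an undirected, unweighted graph for a given node-to-community assignment; the search over partitions is left out, and the two-triangle graph is a made-up example.

```python
def modularity(edges, community):
    """Modularity Q = sum over communities of (L_c / m) - (d_c / 2m)^2.

    edges: list of undirected (u, v) pairs, no duplicates or self-loops.
    community: dict mapping node -> community label.
    L_c is the number of edges inside community c, d_c the total degree
    of its nodes, and m the number of edges.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    intra_edges = sum(1 for u, v in edges if community[u] == community[v])
    degree_sum = {}
    for node, k in degree.items():
        label = community[node]
        degree_sum[label] = degree_sum.get(label, 0) + k
    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())
    return intra_edges / m - expected


# Two triangles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting the graph at the bridge edge scores higher than keeping everything in one community, which is the signal a maximization procedure follows.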
223. Network Analysis: Algorithms
Graph Partitioning & Traversal
Best time-complexity & reachability
Follow Greedy paths
‣ K-way multilevel partitioning,
‣ Bron-Kerbosch, K-plex, K-core or N-cliques, DFS, BFS,
MST
"We dream in Graph and
We analyze in Matrix" -
Barry Wellman, INSNA
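Of the algorithms listed, Bron-Kerbosch is compact enough to show in full. This is the basic recursive form without pivoting, enumerating all maximal cliques of an undirected graph given as an adjacency dict; the small example graph is made up.

```python
def bron_kerbosch(adj, r=None, p=None, x=None):
    """Yield every maximal clique as a frozenset.

    adj: dict mapping node -> set of neighbouring nodes.
    r: current clique, p: candidates that can extend it,
    x: nodes already processed (used to avoid reporting non-maximal cliques).
    """
    if r is None:
        r, p, x = set(), set(adj), set()
    if not p and not x:
        yield frozenset(r)
        return
    for v in list(p):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}
        x = x | {v}


# Triangle a-b-c with a pendant edge c-d.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cliques = set(bron_kerbosch(adjacency))
print(sorted(sorted(c) for c in cliques))
```

The worst case is exponential, so on large social graphs the pivoting variant or a relaxation such as k-core is usually preferred.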
224. Network Analysis: Methods
Network Modeling Approaches
‣ Random graph model (Erdős-Rényi model)
‣ Small-world model (Small World Phenomenon)
‣ Scale-free model (led to Power-Law degree distribution)
Social Network Analysis methods
‣ Centrality (Degree, Eigenvector, Betweenness, Closeness)
‣ Clusters (Cliques and extensions, Communities)
Source: http://www.kudos-dynamics.com/
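Two of the centrality measures named above, degree and closeness, can be sketched in a few lines for an undirected, unweighted graph. The normalizations used here (divide degree by n - 1; closeness as reachable nodes over summed distances) are common conventions assumed for illustration.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to.

    Assumes the graph has at least two nodes.
    """
    n = len(adj)
    return {node: len(neighbours) / (n - 1) for node, neighbours in adj.items()}


def closeness_centrality(adj):
    """Reachable nodes divided by the sum of shortest-path distances (BFS)."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Star graph: hub "h" connected to three leaves.
star = {"h": {"a", "b", "c"}, "a": {"h"}, "b": {"h"}, "c": {"h"}}
print(degree_centrality(star)["h"], closeness_centrality(star)["a"])
```

In the star graph the hub scores 1.0 on both measures, while each leaf reaches the others only through the hub and scores lower.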
225. Network Analysis:
Diffusion & Homophily
Information Flow: Diffusion
‣ Maximizing Spread (Opinion, Innovation, Recommendation)
‣ Outbreak Detection (e.g., disease)
Social Network: no info about user action;
understanding dynamics is challenging!
Power Law distribution [LAH-07]
Factors impacting flow:
‣ Sampling strategy, user Homophily, content nature
[CLSCK-10, NPS-10]
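Diffusion is often studied by simulation. The sketch below uses the independent cascade model, which is an assumption brought in for illustration and is not named on the slide: each newly activated user gets one chance to activate each follower with a fixed probability.

```python
import random


def independent_cascade(graph, seeds, prob, rng=None):
    """Simulate one cascade and return the set of activated nodes.

    graph: dict mapping node -> list of nodes it can influence.
    seeds: initially active nodes.
    prob: activation probability per edge (same for all edges here).
    rng: optional random.Random instance for reproducible runs.
    """
    if rng is None:
        rng = random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


influence = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
reached = independent_cascade(influence, ["a"], prob=0.5, rng=random.Random(7))
print(len(reached), "users reached")
```

Averaging the reach over many runs for different seed sets is the basic building block of spread maximization.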
230. Real-Time Motivation
People can't wait for information
500 years ago
‣ A single lifetime
20 years ago
‣ Next day or two
‣ Television, newspapers
Presently
‣ Minutes are not considered fast enough
‣ Digital media, Social media
231. Real-Time Social Media
Is Real-Time the future of Web?
Social Media for Real-Time Web
‣ Disaster Management
‣ Ushahidi
‣ Real-Time Markets
‣ Examples
‣ Brand Tracking
‣ Twarql
‣ Movie reviews
232. Scenario
The Guardian, Feb 2010
Journalist
235. Challenges
Information Overload
‣ Can we aggregate, organize and collectively analyze data?
Real Time
‣ Can we deliver the data as it is generated?
236. A Semantic Web Approach
Expressive description of Information need
‣ Using SPARQL (Instead of traditional keyword search)
Flexibility on the point of view
‣ Ability to "slice and dice" the data in several dimensions: thematic,
spatial, temporal, sentiment etc..
Streaming data with Background Knowledge
‣ Enables automatic evolution and serendipity
Scalable Real-Time delivery
‣ Using sparqlPuSH (SFSW'10)
243. Metadata Extractions
(Social Sensor Server)
Other Metadata provided by Twitter
‣ User profile: User Name, Location, Time etc..
‣ Tweet: RT, reply etc..
244. Structured Data
(Social Sensor Server)
RDF Annotation
‣ Common RDF/OWL Vocabularies
‣ FOAF - (foaf-project.org) Friend of a Friend
‣ SIOC - (sioc-project.org) Semantically Interlinked
Online Communities
‣ OPO - (online-presence.net) Online Presence Ontology
‣ MOAT - (moat-project.org) Meaning Of A Tag
246. Structured Data
(Social Sensor Server)
A snippet of the annotation
<http://twitter.com/bob/statuses/123456789>
    rdf:type sioct:MicroblogPost ;
    sioc:content "Fingers crossed for the upcoming #hcrvote" ;
    sioc:has_creator <http://twitter.com/bob> ;
    foaf:maker <http://example.org/bob> ;
    moat:taggedWith dbpedia:Healthcare_reform .
<http://twitter.com/bob> geonames:locatedIn dbpedia:Ohio .
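A sketch of how such an annotation might be produced from a tweet, using plain Python tuples rather than an RDF library. The function names and the hashtag-to-concept table are hypothetical; in the real pipeline the concept would come from entity spotting, and prefix declarations are omitted as in the snippet above.

```python
import re


def annotate_tweet(user, status_id, text, concept_for_hashtag):
    """Return (subject, predicate, object) triples describing one tweet."""
    tweet = f"<http://twitter.com/{user}/statuses/{status_id}>"
    author = f"<http://twitter.com/{user}>"
    escaped = text.replace('"', '\\"')
    triples = [
        (tweet, "rdf:type", "sioct:MicroblogPost"),
        (tweet, "sioc:content", f'"{escaped}"'),
        (tweet, "sioc:has_creator", author),
    ]
    for tag in re.findall(r"#(\w+)", text):
        concept = concept_for_hashtag.get(tag.lower())
        if concept:
            triples.append((tweet, "moat:taggedWith", concept))
    return triples


def to_statements(triples):
    """Serialize triples one statement per line, Turtle-style."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical lookup table standing in for the entity-spotting step.
concepts = {"hcrvote": "dbpedia:Healthcare_reform"}
triples = annotate_tweet(
    "bob", 123456789, "Fingers crossed for the upcoming #hcrvote", concepts
)
print(to_statements(triples))
```

Linking the hashtag to a DBpedia concept is what later lets queries combine the tweet with background knowledge.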
248. Semantic Publisher
Virtuoso to store triples
Queries formulated by the users are stored
SPARQL protocol over HTTP to access RDF from
the store
Combine data from tweets with the background
knowledge in the RDF store
250. Application Server & Distribution
Hub
Distribution Hub
‣ PUSH Model - PubSubHubbub protocol
‣ Pushes the tweets to the Application Server
Application Server
‣ Delivers data to the Clients
‣ RSS-enabled concept feeds
251. Brand Tracking - Example
Background Knowledge (e.g. DBpedia)
[Figure: query graph connecting ?tweet, ?competitor, ?category and dbpedia:IPad through moat:taggedWith and skos:subject edges. Example values shown: category:Wi-Fi, category:Touchscreen, IPhone, HPTabletPC. Sample tweet: "@anonymized Lorem ipsum bla bla this is an example tweet"]
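The brand-tracking idea can be sketched over an in-memory set of triples. The query shape used here is an assumption read off the variable and property names on the slide: find tweets tagged with any entity that shares a category with the tracked brand. The tweet identifiers and the toaster entry are made up.

```python
# Background knowledge plus annotated tweets, as (subject, predicate, object).
TRIPLES = {
    ("dbpedia:IPad", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:IPad", "skos:subject", "category:Touchscreen"),
    ("dbpedia:IPhone", "skos:subject", "category:Touchscreen"),
    ("dbpedia:HPTabletPC", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:Toaster", "skos:subject", "category:Kitchen"),
    ("tweet:1", "moat:taggedWith", "dbpedia:IPhone"),
    ("tweet:2", "moat:taggedWith", "dbpedia:Toaster"),
    ("tweet:3", "moat:taggedWith", "dbpedia:IPad"),
}


def competitor_tweets(triples, brand):
    """Return sorted (tweet, competitor) pairs for entities sharing a category."""
    categories = {
        o for s, p, o in triples if s == brand and p == "skos:subject"
    }
    competitors = {
        s for s, p, o in triples
        if p == "skos:subject" and o in categories and s != brand
    }
    return sorted(
        (s, o) for s, p, o in triples
        if p == "moat:taggedWith" and o in competitors
    )


print(competitor_tweets(TRIPLES, "dbpedia:IPad"))
```

Nothing in the query names a competitor explicitly: the set of competitors comes from the background knowledge, so a new product that enters the shared categories would be picked up without changing the query.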
256. 1242 articles from Nytimes, around 800,000 tweets
President Obama lays out plan for health care reform in speech to Joint
Session of Congress (10th Sept, Timeline.com)
Obama taking an active role in health talks in pursuing his
proposed overhaul of health care system. (13th Aug
261. Spam in Social Networks
Reasons for spamming include:
Gaining popularity
‣ Use of popular topic-related keywords (e.g. hashtags of
trending topics) to propagate something off topic.
Launching malicious attacks
‣ Phishing attacks, viruses, malware etc.
Misleading the masses
‣ Propagating false information [MM-10].
262. Spam in Social Networks
Gaining popularity using trending keywords:
This tweet uses #Cairo but refers to a fashion
website.
[Figure label: Egypt Protests]
271. Spam in Social Networks
Spam detection
‣ Content-based features
‣ Content Size, URL type, spam words
‣ Metadata-based features
‣ Account information, behavior.
‣ Network-based features
‣ Provenance. (e.g. content from a reliable source)
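The content-based and metadata-based features above can be sketched as simple extractors whose output would feed a classifier. The spam-word list, the feature names, and the follower-ratio feature are illustrative assumptions, not a published feature set.

```python
import re

# Illustrative spam vocabulary (hypothetical).
SPAM_WORDS = {"free", "win", "click", "deal"}


def content_features(text):
    """Content-based features: size, URLs, hashtags, spam words."""
    tokens = re.findall(r"\w+", text.lower())
    return {
        "length": len(text),
        "num_urls": len(re.findall(r"https?://\S+", text)),
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_spam_words": sum(tok in SPAM_WORDS for tok in tokens),
    }


def metadata_features(account_age_days, followers, following):
    """Metadata-based features describing the posting account."""
    return {
        "account_age_days": account_age_days,
        "follower_ratio": followers / max(following, 1),
    }


tweet = "#Cairo Click here for a FREE deal http://example.com/shop"
features = {**content_features(tweet), **metadata_features(3, 10, 2000)}
print(features)
```

A trending hashtag combined with a link, spam vocabulary, and a very new account that follows many users is the kind of pattern the #Cairo example illustrates.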
272. Trust in Social Networks
Reputation, Policy, Evidence, and Provenance used
to derive trustworthiness.
Illustrative examples of online cues used for trust
assessment.
‣ Wikipedia: article size, number of references, author, edit
history, age of the article, edit frequency etc.
‣ Product Reviews: number of helpful, very helpful ratings,
author expertise, sentiments in comments received for a
review etc.
273. Trust in Social Networks
We propose a trust ontology [AHTS-10] that
‣ Captures semantics of trust.
‣ Enables representation and reasoning with trust.
Semantics of Trust specifies, for a given trustor and
trustee, the following features.
‣ Type - Type of trust relationship.
‣ Scope - Context of the trust relationship.
‣ Value - Quantifies the trust relationship.
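The trustor, trustee, type, scope, and value features map naturally onto a small record type. The sketch below is not the ontology itself: the class name, the [0, 1] value range, and the example strings are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustRelation:
    """One trust statement from a trustor about a trustee."""
    trustor: str
    trustee: str
    type: str     # kind of trust relationship (illustrative string)
    scope: str    # context in which the trust holds
    value: float  # strength of trust, assumed here to lie in [0, 1]

    def __post_init__(self):
        if not 0.0 <= self.value <= 1.0:
            raise ValueError("value must be in [0, 1]")


relation = TrustRelation(
    trustor="alice",
    trustee="bob",
    type="functional",
    scope="car repair",
    value=0.8,
)
print(relation)
```

Keeping scope explicit matters: Alice may trust Bob on car repair and not at all on medical advice, so two relations between the same pair can carry different values.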
274. Trust in Social Networks
Gleaning primitive (edge) trust
‣ Trust value between two nodes is quantified using
numbers, e.g., [0,1] or [-1,1], or a partial ordering [TAHS-09].
Gleaning composite (path) trust
‣ Propagation via chaining and aggregation (transitivity)
Some popular algorithms for trust computation
‣ Eigentrust, Spreading Activation, SUNNY etc.
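Chaining and aggregation can be sketched with one common pair of choices, assumed here for illustration: multiply edge trust along a path, then take the maximum over alternative paths. This is not Eigentrust, Spreading Activation, or SUNNY, and the edge values are made up.

```python
def path_trust(edge_trust, path):
    """Chaining: multiply the trust values along consecutive edges of a path."""
    trust = 1.0
    for u, v in zip(path, path[1:]):
        trust *= edge_trust[(u, v)]
    return trust


def aggregate_trust(edge_trust, paths):
    """Aggregation: keep the strongest of several paths between two nodes."""
    return max(path_trust(edge_trust, path) for path in paths)


# Primitive (edge) trust values in [0, 1].
edge_trust = {
    ("a", "b"): 0.9, ("b", "d"): 0.5,
    ("a", "c"): 0.6, ("c", "d"): 0.8,
}
paths_a_to_d = [["a", "b", "d"], ["a", "c", "d"]]
print(aggregate_trust(edge_trust, paths_a_to_d))
```

With multiplication, trust can only decay along a chain, so the path through "c" wins here even though its first hop is weaker.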
275. Integrating Social And
Sensor Networks
Machine sensor observations are quantitative in
nature, while human observations can be both
qualitative and quantitative.
Benefits of combining observations from humans
and machine sensors
‣ Complementary evidence.
‣ Corroborative evidence
276. Integrating Social And
Sensor Networks
Applications of integrating heterogeneous sensor
observations
‣ Situation Awareness by using human observations to
interpret machine sensor observations.
‣ Enhancing trustworthiness using corroborative evidence.
277. Mobile Social Computing
Instant Discovery: Geo-tagging and location-aware
services, in combination with search, have
made discovery a two-way street.
Compressed Expression: Mobile makes social
networking even more compelling
Outsourced Memory: Cloud-based servers to
store all of their mobile applications and
databases
282. Mobile Social Computing
Automated Decisions: Smart apps help us make
faster decisions, or even make decisions for
us
Peer Power: Mobiles can create social movements
based on peer influence
283. Mobile Social Computing (Cont.)
Personalized Branding: advertising is rapidly
becoming personalized based on individuals' needs
and preferences
Mobiles in social development: becoming an integral
part of development
‣ Coordination in disaster situations
‣ Health care delivery, especially in developing countries
‣ Elections and other forms of political expression
285. Twitris - Motivation
1. Information Overload
Multiple events around us
WHAT to be aware of
Multiple Storylines about same event!!
286. Twitris - Motivation
2. Evolution of Citizen Observation
‣ with location and time
287. Twitris - Motivation
3. Semantics of Social perceptions
‣ What is being said about an event (theme)
‣ Where (spatial)
‣ When (temporal)
Twitris lets you browse citizen reports using social
perceptions as the fulcrum
288. Twitris: Semantic Social Web
Mash-up
Facilitates understanding of multi-dimensional social perceptions over
SMS, Tweets, multimedia Web content, electronic news media