Citizen Sensor Data Mining,
Social Media Analytics and
Development Centric Web Applications.
Tutorial at
Semantic Technology Conference,
San Francisco, CA.

            Karthik Gomadam, Accenture Technology Labs, San Jose
            Amit Sheth, Kno.e.sis @ Wright State University
            Selvam Velmurugan, eMoksha, Kiirti



Monday, June 6, 2011
  Meena Nagarajan (Content Analysis)
  Selvam Velmurugan (Kiirti, eMoksha NGOs)
  Hemant Purohit (People & Network analysis)
  Amit Sheth (Semantic Web)
  Ashutosh Jadhav (Event Analysis)
  Lu Chen (Sentiment Analysis)
  Pramod Anantharam (Social & Sensor web)
  Pavan Kapanipathi (Real Time Web)




A  Quick  Word

       Much  of  the  work  discussed  in  this  tutorial  is  
       primarily  the  doctoral  research  by  Dr.  Meena  
       Nagarajan,  currently  at  IBM  Almaden.  It  also  
       includes  current  work  done  at  kno.e.sis  center  at  
       Wright  State  University.




Outline

        Citizen  Sensing:  Role,  Enablers,  Apps    
        Systematic Study of Social Media
        Citizen Sensing @ Real-time
        Emerging  Research  Areas
      ‣ Spam  and  Trust  in  Social  Media,  Mobile  Social  Computing
        Research  Application:  Twitris
        Tutorial  part  2  


Citizen  Sensing

    Everyday users of Web2.0 and social networks:
    Citizens of an Internet- or Web-enabled social
    community
    Observation and Information reported by citizens
    => Citizen Sensing
    Human-in-the-loop (participatory) sensing + Web
    2.0 + mobile computing = emergence of
    citizen-sensor networks




Social  Signals

       The activity of observing, reporting, and disseminating
       information via text, audio, video, and built-in device
       sensors (and smart devices)
      ‣ Creating social signals through aggregation, enhancement,
            analysis, visualization, and interpretation.

       Immense potential to disseminate information
       quickly and in real-time



Enablers:  Mobile  Devices  &  
                 Ubiquitous  Connectivity
       Mobile device fast emerging as our primary tool
      ‣ Redefines the way we engage with people, information,
        etc.
       Global, Ubiquitous, always available
       Sense where you are, how you are, …




Enablers:  Mobile  Devices  &  
                 Ubiquitous  Connectivity
       Mobile Platforms Hit Critical Mass 
      ‣     Over 5 billion users
      ‣     1+B with internet connected mobile devices (2010)
      ‣     Smartphones > Notebooks + Netbooks (2010E)
      ‣     500K+ mobile phone applications
      ‣     74% of mobile phone users (2.4B) worldwide texted (2007)




Enablers:  Web  2.0  &  Social  Media

       500M+ Facebook Users
       100M+ Twitter users, 85M+ tweets/day
       Internet Users: 1.8 Bln
       Content dissemination medium
      ‣ Even for traditional media (@cnn, @nytimes)




Enablers:  Web  2.0  &  Social  Media

        Types of UGC: Twitter (text/microblogs), Facebook
        (multimedia), YouTube (videos),
        Flickr (images), Blogs (text),
        Ping (social network for music)




Citizen  Sensors  in  Action


                                     Iran election
                                     Haiti Earthquake
                                     US healthcare
                                     debate




Revolution  2.0  
                         Political/Social  Activism
       “If you want to liberate a society, just give them the
       Internet.” - Wael Ghonim (Egyptian social activist)
       When Blitzer asked “Tunisia, then Egypt, what’s
       next?,” Ghonim replied succinctly “Ask Facebook.”




Citizen  Journalism




                                      Twitter Journalism




Social  Media  Influence:  
            Intelligence,  News  &  Analysis  
      Many media companies use Facebook and Twitter
 as news-delivery platforms. Many individuals rely on
 them as a news source. News is increasingly social.




Business Intelligence Trend
           Spotting, Forecasting, Brand
        Tracking and Crisis Management
     Sysomos: http://www.sysomos.com/
     Trendspotting: http://trendspotting.com
     Simplify360: http://simplify360.com/
     Shoutlet: http://www.shoutlet.com/
     Reputation (Defender): http://www.reputationdefender.com/

Development  
                   (Education,  Health,  eGov)
       LiveMocha  (http://www.livemocha.com/)
      ‣ Online Language learning tool with social engagement 
      ‣ bridging the gap!!
       Soliya (http://www.soliya.net/)
      ‣ Dialogue between students from diverse backgrounds
        across the globe using latest multimedia technologies
       Project Einstein (http://digital-democracy.org/what-we-do/programs/) 
      ‣ A photography-based digital penpal program connecting
        youths in refugee camps to the world



Development  
                   (Education,  Health,  eGov)
       PatientsLikeMe (http://mashable.com/2010/07/13/social-media-health-trends/)  
       TrialX (http://trialx.com)




       Image: http://www.dragonsearchmarketing.com/blog/social-media-development-through-visual-aids-tools/




Why People-Content-Network
                    metadata?




Dimensions  of  Systematic  Study  
                of  Social  Media


             Spatio - Temporal -Thematic
                          +
             People - Content - Network



Social Information
                           Processing
       "Who says what, to whom, why, to what extent
       and with what effect?" [Lasswell]
       Network: Social structure emerges from the
       aggregate of relationships (ties)
       People: poster identities, the active effort of
       accomplishing interaction
       Content: studying the content of communication.


Studying Online Human Social
                      Dynamics
        How does the (semantics or style of) content fit
 into the observations made about the network?
     ‣ Often, the three-dimensional dynamic of people,
         content and link structure is what shapes the social
         dynamic.




Studying  Online  Human  Social  
                      Dynamics
        Example: how do the topic of discussion, the
 emotional charge of a conversation, the presence of an
 expert, and connections between participants together
 explain information propagation in a social network?




Metadata/Annotations

       Metadata: an organized way to study
      ‣ types

      ‣ creation/extraction and storage

      ‣ use




The  Anatomy  of  a  Tweet




People Metadata: Variety of
         Self-expression Modes on Multiple
                  Social Media Platforms
           Explicit information from user profiles
         ‣ User Names, Pictures, Videos, Links, Demographic
           Information, Group memberships...
         ‣ Often is not updated
           Implicit information from user attention metadata
         ‣ Page views, Facebook 'Likes', Comments; Twitter
           'Follows', Retweets, Replies...




People  Metadata:  Various  Levels


                                    Demographic




                                      Interests




             Activity

                        Network



People Metadata: Continued
     User Demographic Metadata
     • User-id
     • Screen/Display-name of user
     • Real name of user
     • Location
     • Profile Creation Date
     • User description
          • User Bio
          • URL
     Interest Level Metadata
     • Author type
          • Trustee/donor, journalist, blogger, scientist etc.
     • Favorite tweets
     • Types of lists subscribed
     • Style of Writing – personality indicator
     • No. of Followees
     • Author type trend of Followees

People Metadata: Continued
     Activity Level Metadata
     • Age of the profile
     • Frequency of posts
     • Timestamp of last status
     • No. of Posts
     • No. of Lists/groups created
     • No. of Lists/groups subscribed
     Influence Level Metadata (Inferring People Metadata from Network level Information)
     • No. of Followers – normal, influential
     • No. of Mentions
     • No. of Retweets/Forwards
     • No. of Replies
     • No. of Lists/groups following
     • No. of people following back
     • Authority & Hub Scores
     Web Presence
     • User affiliations
     • KLOUT Score – influence measure (www.klout.com)
Content  Metadata

          Content Independent metadata
      ‣        date, location, author etc.
       Content Dependent metadata
      ‣        Direct content-based metadata
      ‣        Explicit/Mentioned Content metadata
           ‣     named entities in content

      ‣        Implicit/Inferred Content Metadata
           ‣     related named entities from knowledge sources

      ‣        Indirect content-based metadata (External metadata)
           ‣     context inferred from URLs in content (images, links to articles,
                 FourSquare checkins etc.)
Content  Independent  Metadata

           For Tweets
           ‣ Published date and time
           ‣ Location (where tweet was generated from)
           ‣ Tweet posting method (smart-phone, twitter.com,
                 clients for twitter)
           ‣     Author information




Content  Independent  Metadata

           For Text messages
           ‣     Published date and time
           ‣     Origin location
           ‣     Recipient
           ‣     Carrier information




Content Dependent Metadata (Tweet)

         Direct content-based metadata
         Indirect content-based metadata (External metadata)


Network Metadata

     Connections/Relationships (foundation for the network) matter!
     Structure Level Metadata
     • Community Size
     • Community growth rate
     • Largest Strongly Connected Component size
     • Weakly Connected Components & Max. size
     • Average Degree of Separation
     • Clustering Coefficient
     Relationship Level Metadata
     • Type of Relationship
     • Relationship strength
     • User Homophily based on certain characteristic (e.g., Location, interest etc.)
     • Reciprocity: mutual relationship
     • Active Community/Ties
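
Two of the network measures named on this slide, reciprocity and the clustering coefficient, can be computed directly from a follower edge list. The sketch below is a minimal illustration, not the tutorial's implementation: the function names and the toy `follows` graph are assumptions, and the clustering coefficient treats follow edges as undirected.

```python
from collections import defaultdict

def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

def average_clustering(edges):
    """Average local clustering coefficient, treating edges as undirected."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    if not neighbors:
        return 0.0
    total = 0.0
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        # Count connected pairs among this node's neighbors.
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in neighbors[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(neighbors)

# Hypothetical "who follows whom" edges.
follows = [("ann", "bob"), ("bob", "ann"), ("bob", "cat"),
           ("cat", "ann"), ("dan", "ann")]
print(reciprocity(follows), average_clustering(follows))
```

On real data a graph library would be the usual choice; the plain-Python version is kept only to make each definition visible.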
Metadata:  Creation,  Extraction  
                    and  Storage



Metadata  Creation  &  Extraction

      Extracted Metadata
     ‣ Directly visible information from the user profile, tweet
         content & community structure
      Created Metadata
     ‣ After processing information in the user profile, content
         and/or network structure




An  Example

     Length: 144 characters; General topic: Egypt protest
     This poor {sentiment_expression:{target:"Lara Logan",
     polarity:"negative"}} woman! RT @THR CBS
     News' {entity:{type:"News Agency"}} Lara Logan
     {entity:{type:"Person"}} Released From Hospital
     {entity:{type:"Location"}} After Egypt
     {entity:{type:"Country"}} Assault {type:"topic"}
     http://bit.ly/dKWTY0 {external_URL}
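
One way to store the inline annotations from this example is as standoff records that point back into the raw tweet text by character offset. This is a sketch under assumptions: the field names (`kind`, `start`, `end`) and the `annotate` helper are hypothetical, and the annotation values are copied from the slide.

```python
import json

tweet = ("This poor woman! RT @THR CBS News' Lara Logan Released From "
         "Hospital After Egypt Assault http://bit.ly/dKWTY0")

def annotate(text, phrase, **metadata):
    """Return a standoff annotation for the first occurrence of phrase."""
    start = text.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} not found in text")
    return {"start": start, "end": start + len(phrase), "text": phrase, **metadata}

record = {
    "text": tweet,
    "topic": "Egypt protest",
    "annotations": [
        annotate(tweet, "poor", kind="sentiment_expression",
                 target="Lara Logan", polarity="negative"),
        annotate(tweet, "CBS News", kind="entity", type="News Agency"),
        annotate(tweet, "Lara Logan", kind="entity", type="Person"),
        annotate(tweet, "Hospital", kind="entity", type="Location"),
        annotate(tweet, "Egypt", kind="entity", type="Country"),
        annotate(tweet, "Assault", kind="topic"),
        annotate(tweet, "http://bit.ly/dKWTY0", kind="external_URL"),
    ],
}
print(json.dumps(record, indent=2))
```

Keeping the text untouched and the metadata separate means the same tweet can carry annotations from several extractors without re-parsing inline markup.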


Why is the Semantic Web a standard
              for social metadata?

        Rich Snippets, RDFa, Open Graph: Semantic Web
       based social data standards
        Relationships/connections play a central role
      ‣ Treating relationships as first-class objects is important




Semantic  Web:  A  Very  Short  
                         Primer
      Representation
     ‣ RDF
       ‣ relationships as first class object <subject,
            predicate,object>
     ‣ OWL
       ‣ Representing Knowledge  and Agreements:
            nomenclature, taxonomy, folksonomy, ontology
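
The subject-predicate-object idea can be illustrated without any RDF library: each relationship is a triple, and a query is a pattern with wildcards. This is a sketch only; the `ex:` names are made-up identifiers, `match` is a hypothetical helper, and a real system would use an RDF store queried with SPARQL.

```python
# Each relationship is a first-class (subject, predicate, object) triple.
triples = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "ex:posted", "ex:tweet1"),
    ("ex:tweet1", "ex:mentions", "ex:LaraLogan"),
    ("ex:LaraLogan", "rdf:type", "ex:Person"),
}

def match(pattern, store):
    """Yield triples matching pattern; None acts as a wildcard."""
    for triple in store:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple

# Everything asserted about ex:alice, then every typed resource.
about_alice = sorted(match(("ex:alice", None, None), triples))
typed = list(match((None, "rdf:type", None), triples))
print(about_alice)
print(typed)
```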




Semantic  Web:  A  Very  Short  
                         Primer
      Annotation
     ‣ RDFa, Xlink, model reference
      Web of Data
     ‣ Linked Open Data 
      Querying
     ‣ SPARQL; Rules: SWRL, RIF




How  to  save  and  use  metadata?

      Store metadata as data and use standard database
 techniques
      Use filtering and clustering, summarization,
 statistics - implicit semantics




How  to  save  and  use  metadata?

    Use explicit semantics and Semantic Web
 standards and technologies
       ‣semantics = meaning
       ‣richer representation, support for relationships, context
      ‣supports use of background knowledge
       ‣better integration, powerful analysis
    Semantics- the implicit, the formal and the
 powerful
   Social metadata on the Web


Metadata  Extraction  from  
                            Informal  Text
   Meena Nagarajan, Understanding User-Generated Content on
   Social Media, Ph.D. Dissertation, Wright State University, 2010


Characteristics  of  Text  on  Social  
                       Media




The  Formality  of  Text




Content Analysis - Typical Sub-tasks

       Recognize key entities mentioned in content
      ‣ Information Extraction (entity recognition, anaphora
        resolution, entity classification..)
      ‣ Discovery of Semantic Associations between entities
       Topic Classification, Aboutness of content 
      ‣ What is the content about?
       Intention Analysis 
      ‣ Why did they share this content?




Content Analysis - Typical Sub-tasks

      Sentiment Analysis
       ‣What opinions are people conveying via the content?
       Author Profiling
       ‣What can we infer about the author from the content he
       posts?
      Context (external to content) extraction
       ‣URL extraction, analyzing external content




Research  Efforts,  Contributions  in  
                   this  space..
       Examining usefulness of multiple context cues
       for text mining algorithms
      ‣ Compensating for informal, highly variable
        language, lack of context
      ‣ Using context cues: Document corpus, syntactic,
        structural cues, social medium, external domain
        knowledge…
       In this talk, highlighting sample metadata
       creation tasks: NER, Key Phrase Extraction,
       Intention, Sentiment/Opinion Mining

Part 1. NER, Key Phrase Extraction
       Named Entity Recognition
      ‣ I loved <movie> the hangover </movie>!
       Key Phrase Extraction




Multiple  Context  Cues  Utilized  for  
        NER  in  Blogs  and  MySpace  




Multiple Context Cues Utilized for
     Keyphrase Extraction from Twitter,
          Facebook and MySpace




Focus,  Impact

       Techniques focus on
      ‣ relatively less explored content aspects on social
        media platforms
       Combination of top-down, bottom-up analysis
       for informal text
      ‣ Statistical NLP, ML algorithms over large corpora
      ‣ Models and rich knowledge bases in a domain




NAMED  ENTITY  
                       RECOGNITION
      I loved your music Yesterday!
      “It was THE HANGOVER of the year..lasted
     forever..
      So I went to the movies..badchoice picking “GI
     Jane”worse now”




NAMED  ENTITY  
                           RECOGNITION



                       Identifying and classifying tokens




NER  in  prior  work  vs.  NER  for  
                    Informal  Text




Cultural  Named  Entities

          NER focus in this work: Cultural Named
     Entities
        Artifacts of Culture
      ‣ Names of books, music albums, films, video games,
        etc.
        Common words in a language
      ‣ The Lord of the Rings, Lips, Crash, Up, Wanted,
        Today, Twilight, Dark Knight…




Characteristics  of  Cultural  Entities

       Varied senses, several poorly documented
      ‣ Merry Christmas: covered by 60+ artists
      ‣ Star Trek: movies, TV series, media franchise.. and cuisines!!
       Changing contexts with recent events
      ‣ The Dark Knight reference to Obama, health care
        reform
       Unrealistic expectations
      ‣ Comprehensive sense definitions, enumeration of
        contexts, labeled corpora for all senses ..
      ‣ NER: relaxing the closed-world sense assumptions

A  Spot  and  Disambiguate  
                             Paradigm
       NER generally a sequential prediction problem
      ‣ NER system that achieves 90.8 F1 score on the
        CoNLL-2003 NER shared task (PER, LOC, ORG
        entities) [Lev Ratinov, Dan Roth]
       Focus of approach: Spot and Disambiguate
       Paradigm
       Starting off with a dictionary or list of entities we
       want to spot



A  Spot  and  Disambiguate  
                             Paradigm
       Spot, then disambiguate in context (natural
       language, domain knowledge cues)
       Binary Classification
       Is this mention of “the hangover” in a sentence
       referring to a movie?
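
The two stages can be sketched as a dictionary spotter followed by a yes/no decision per spot. In the sketch below the title list, the cue words, and the five-token window are all assumptions, and the keyword rule only stands in for the supervised classifier that the tutorial describes.

```python
import re

MOVIE_TITLES = ["the hangover", "twilight"]  # entities to spot (illustrative)
MOVIE_CUES = {"watch", "watched", "watching", "movie", "movies",
              "film", "scene", "scenes", "loved"}  # assumed context cues

def spot(text, titles):
    """Return (title, start, end) for every case-insensitive dictionary hit."""
    hits = []
    for title in titles:
        pattern = r"\b" + re.escape(title) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((title, m.start(), m.end()))
    return hits

def is_movie_mention(text, start, end, window=5):
    """Binary decision: is a cue word within `window` tokens of the spot?"""
    left = re.findall(r"\w+", text[:start].lower())[-window:]
    right = re.findall(r"\w+", text[end:].lower())[:window]
    return any(token in MOVIE_CUES for token in left + right)

def spot_and_disambiguate(text):
    """Spot every dictionary entry, then label each spot True or False."""
    return [(title, is_movie_mention(text, s, e))
            for title, s, e in spot(text, MOVIE_TITLES)]

print(spot_and_disambiguate("I loved the hangover!"))
print(spot_and_disambiguate("It was THE HANGOVER of the year..lasted forever.."))
```

A keyword rule like this misfires easily (it would accept "watching the Twilight by the bay"), which is the motivation for learning the decision from richer context features.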




Algorithmic  Contributions  
                  Supervised  Algorithms
 Examples:
 “I am watching Pattinson scenes in <movie
     id=2341> Twilight</movie> for the nth time.”
 “I spent a romantic evening watching the Twilight
     by the bay..”
 “I love <artist id=357688>Lily’s</artist> song




Multiple  Senses  in  the  Same  
                         Domain




Algorithm  Preliminaries

       Problem Definition
       ‣ Cultural Entity Identification: music albums, tracks
       ‣ Smile (Lily Allen), Celebration (Madonna)
      Corpus: MySpace comments
       ‣ Context-poor utterances
       “Happy 25th Lilly, Alfie is funny”




Algorithm  Preliminaries




  Goal:  Semantic  Annotation  of  
    music  named  entities  (w.r.t  
           MusicBrainz)
Using a Knowledge Resource for
         NER is not straightforward..




Approach  Overview  


      Scoped Relationship graphs
       ‣ Using context cues from the content, webpage title, url…
         (e.g., "new Merry Christmas tune")
       ‣ Reduce potential entity spot size (e.g., new albums/songs)
       ‣ Generate candidate entities
       ‣ Spot and Disambiguate


Sample Real-world Constraints


      Career Restrictions
       ‣“release your third album already..”
      Recent Album restrictions
       ‣“I loved your new album..”
      Artist age restrictions
       ‣”happy 25th rihanna, loved alfie btw..” etc.
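
Constraints like these can be read as filters that shrink the set of albums worth spotting before any disambiguation runs. The sketch below implements two of them (recent album, artist age) under assumptions: the catalogue, the birth years, the trigger phrases, and `scope_albums` are all invented for illustration, and the career restriction is left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Album:
    artist: str
    title: str
    year: int

# Invented catalogue and birth years, for illustration only.
ALBUMS = [
    Album("Artist A", "First Light", 2006),
    Album("Artist A", "Second Wind", 2009),
    Album("Artist B", "Smile Again", 2008),
]
BIRTH_YEARS = {"Artist A": 1984, "Artist B": 1970}

def scope_albums(albums, birth_years, comment, posted_year):
    """Keep only the albums that the cues in the comment make plausible."""
    text = comment.lower()
    scoped = list(albums)
    if "new album" in text:
        # Recent-album restriction: each artist's latest release only.
        latest_year = {}
        for album in scoped:
            latest_year[album.artist] = max(latest_year.get(album.artist, 0), album.year)
        scoped = [a for a in scoped if a.year == latest_year[a.artist]]
    if "happy 25th" in text:
        # Artist-age restriction: artists who turn 25 in the posting year.
        scoped = [a for a in scoped if posted_year - birth_years[a.artist] == 25]
    return scoped

print([a.title for a in scope_albums(ALBUMS, BIRTH_YEARS, "I loved your new album..", 2009)])
```

In the tutorial's setting the catalogue would come from a knowledge resource such as MusicBrainz rather than a hand-written list.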


Non-Music Mentions

       Challenge 1: Several senses in the same domain
      ‣ Scoping relationship graphs narrows possible senses
      ‣ Solves the named entity identification problem
        partially

       Challenge 2: Non-music mentions
      ‣ Got your new album Smile. Loved it!
      ‣ Keep your SMILE on!

Using  Language  Features  to  
        eliminate  incorrect  mentions..
       Syntactic features
      ‣ POS Tags, Typed dependencies..
      ‣ Example here
       Word-level features
      ‣ Capitalization, Quotes
       Domain-level features
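
The word-level cues (capitalization, quotes) are simple to compute for a spotted span. A minimal sketch follows, with hypothetical feature names and helpers; syntactic features such as POS tags and typed dependencies would need a parser and are not shown.

```python
QUOTE_CHARS = "\"'“”‘’"

def word_level_features(text, start, end):
    """Word-level cues for the spotted span text[start:end]."""
    span = text[start:end]
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    prefix = text[:start].rstrip()
    return {
        "all_caps": span.isupper(),
        "capitalized": span[:1].isupper() and not span.isupper(),
        # Both neighbors must exist and be quote characters.
        "in_quotes": bool(before) and bool(after)
                     and before in QUOTE_CHARS and after in QUOTE_CHARS,
        # Capitalization is weaker evidence at the start of a sentence.
        "sentence_initial": prefix == "" or prefix[-1] in ".!?",
    }

def features_for(text, phrase):
    """Convenience wrapper: features for the first occurrence of phrase."""
    start = text.index(phrase)
    return word_level_features(text, start, start + len(phrase))

print(features_for("Got your new album Smile. Loved it!", "Smile"))
print(features_for("Keep your SMILE on!", "SMILE"))
```

Feature dictionaries like these would be the input to the supervised learners that decide whether a spot is a valid mention.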




Supervised  Learners




Hand Labeling - Fairly Subjective

       1800+ spots in MySpace user comments from
       artist pages
       Keep your SMILE on!
       – good spot, bad spot, inconclusive?

       4-way annotator agreements
       – Madonna 90% agreement
       – Rihanna 84% agreement
       – Lily Allen 53% agreement


Dictionary Spotter + NLP Step

       Daniel Gruhl, Meena Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Context and Domain
       Knowledge Enhanced Entity Spotting in Informal Text, The 8th International Semantic Web Conference,
       2009: 260-276

NER  on  Social  Media  Text  using  
              Domain  Knowledge
       Highlights issues with using domain
       knowledge for an IE task
       Two-stage approach: chaining NL learners over
       results of domain model based spotters
       Improves accuracy by up to a further 50%
      ‣ allows the more time-intensive NLP analytics to
        run on less than the full set of input data



BBC SoundIndex (IBM Almaden):
         Pulse of the Online Music

        Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: “Multimodal Social
        Intelligence in a Real-Time Dashboard System,” special issue of the VLDB Journal on “Data Management
        and Mining for Social Networks and Social Media”, 2010.
        http://www.almaden.ibm.com/cs/projects/iis/sound/

The  Vision

     http://www.almaden.ibm.com/cs/projects/iis/sound/




Several Insights
     Trending popularity of artists
     Trending topics in artist pages
     Only 4% -ve sentiments, perhaps ignore the Sentiment Annotator on this data source?
     Ignoring Spam can change ordering of popular artists

Predictive  Power  of  Data

    Billboard's Top 50 Singles chart during the week of
  Sept 22-28 ’07 vs. MySpace popularity charts.
    User study indicated 2:1 and up to 7:1 (younger age
  groups) preference for MySpace list.
    Challenging traditional polling methods!




Key  Phrase  Extraction




Key  Phrase  Extraction:  Example

     Key phrases extracted from prominent discussions
     on Twitter around the 2009 Health Care Reform
     debate and 2008 Mumbai Terror Attack on one day




Key  Phrase  Extraction  from  SM  
                        Text
          Different from Information Extraction
          Extracting vs. Assigning Key Phrases. Focus:
          Key Phrase Extraction
          Prior work focus: extracting phrases that
          summarize a document -- a news article, a web
          page, a journal article, a book..
          Focus: summarize multiple documents (UGC)
          around same event/topic of interest

Key  Phrase  Extraction  on  SM  
                       Content
       Focus: Summarizing Social Perceptions via key
       phrase extraction
      Preserving/Isolating the social behind the social
     data
     ‣ What is said in Egypt vs. the USA should be viewed in
         isolation




Key  Phrase  Extraction  on  SM  
                       Content
     ‣ Accounting for redundancy, variability, off-topic
      content
      “Met up with mom for lunch, she looks lovely as ever,
     good genes .. Thanks Nike, I love my new
     Gladiators ..smooth as a feather. I burnt all the calories of
     Italian joy in one run.. if you are looking for good Italian
     food on Main, Buca is the place to go.”




Social  and  Cultural  Logic  in  SMC

       Thematic components
      ‣ similar messages convey similar ideas
       Space, time metadata
      ‣ role of community and geography in communication
       Poster attributes
      ‣ age, gender, socio-economic status reflect similar
        perceptions




Feature  Space  (common  to  several  
                    efforts)
           Focus: n-grams, spatio-temporal metadata (social
           components)
           Syntactic Cues: In quotes, italics, bold; in
           document headers; phrases collocated with
           acronyms




Monday, June 6, 2011
Feature  Space  (common  to  several  
                    efforts)
           Document and Structural Cues: Two word
           phrases, appearing in the beginning of a
           document, frequency, presence in multiple similar
           documents etc.
           Linguistic Cues: Stemmed form of a phrase,
           phrases that are simple and compound nouns in
           sentences etc.


Monday, June 6, 2011
Key  Phrase  Extraction:  Overview




“President Obama in trying to regain control of the
  health-care debate will likely shift his pitch in
  September”
‣ 1-grams: President, Obama, in, trying, to, regain, ...
‣ 2-grams: “President Obama”, “Obama in”, “in trying”, ...
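
The n-gram enumeration on this slide can be sketched in a few lines. This is a minimal illustration that assumes plain whitespace tokenization; the function name `ngrams` is hypothetical and not taken from the tutorial.

```python
def ngrams(text, n):
    """Return the word n-grams of `text`, in order of appearance."""
    tokens = text.split()  # assumption: whitespace tokenization is good enough
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = ("President Obama in trying to regain control of the "
            "health-care debate will likely shift his pitch in September")
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
```

Real social media text would likely need a tokenizer that handles hashtags, mentions, and punctuation before n-grams are formed.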

Monday, June 6, 2011
A descriptor is an n-gram weighted by:

     ‣ Thematic Importance
       ‣ TFIDF, stop words, noun phrases
          ‣ Redundancy: statistically discriminatory in nature
          ‣ variability: contextually important
     ‣ Spatial Importance (local vs. global popularity)
     ‣ Temporal Importance (always popular vs. currently trending)
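
One way to read the weighting above is as a thematic score (TF-IDF) adjusted by how local a phrase is. The sketch below is an assumption-laden illustration: the smoothed IDF variant and the local/global ratio used for spatial importance are choices made here, not formulas given in the slides.

```python
import math
from collections import Counter

def tfidf(term, doc_tokens, corpus):
    """TF-IDF of `term` in one tokenized document against a tokenized corpus."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for d in corpus if term in d)
    # Smoothed IDF (an assumption): never negative, defined even when df == 0.
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf

def spatial_importance(term, local_docs, global_docs):
    """Ratio of local to global document share (hypothetical definition)."""
    local = sum(term in d for d in local_docs) / len(local_docs)
    global_share = sum(term in d for d in global_docs) / len(global_docs)
    return local / global_share if global_share else 0.0

corpus = [["health", "care", "reform"], ["health", "debate"], ["mumbai", "attack"]]
```

A temporal score could be defined the same way by comparing a recent time window against the full history.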

Monday, June 6, 2011
Eliminating Off-topic Content [WISE2009]
           Frequency based heuristics will not eliminate
           off-topic content that is ALSO POPULAR




Monday, June 6, 2011
Approach  Overview

      “Yeah i know this a bit off topic but the other
     electronics forum is dead right now. im looking
     for a good camcorder, somethin not to large that
     can record in full HD only ones so far that ive
     seen are sonys”
      “Canon HV20. Great little cameras under $1000.”




Monday, June 6, 2011
Approach  Overview

      Assume one or more seed words (from a domain
      knowledge base):
      C1 = ['camcorder']
      Extracted key words / phrases:
      C2 = ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive',
      'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']

      Gradually expand C1 by adding phrases from C2
     that are strongly associated with C1
      Mutual Information based algorithm [WISE2009]
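
The gradual expansion of C1 can be sketched as a greedy loop. The version below uses pointwise mutual information over a toy set of documents as the association measure; the actual scoring in [WISE2009] may differ, and the corpus, the threshold, and the function names are all hypothetical.

```python
import math

def pmi(a, b, docs):
    """Pointwise mutual information of phrases a and b over a list of phrase sets."""
    n = len(docs)
    pa = sum(a in d for d in docs) / n
    pb = sum(b in d for d in docs) / n
    pab = sum(a in d and b in d for d in docs) / n
    if pab == 0:
        return float("-inf")  # never co-occur: no association
    return math.log(pab / (pa * pb))

def expand(seeds, candidates, docs, threshold=0.0):
    """Greedily move the most strongly associated candidate into C1."""
    c1 = list(seeds)
    remaining = [c for c in candidates if c not in c1]
    while remaining:
        scored = [(max(pmi(c, s, docs) for s in c1), c) for c in remaining]
        best_score, best = max(scored)
        if best_score <= threshold:  # assumed stopping rule
            break
        c1.append(best)
        remaining.remove(best)
    return c1

docs = [
    {"camcorder", "hd", "canon hv20"},
    {"camcorder", "canon hv20", "cameras"},
    {"camcorder", "hd"},
    {"electronics forum", "offtopic"},
    {"offtopic", "ive"},
    {"somethin", "ive"},
]
c2 = ["hd", "canon hv20", "cameras", "offtopic", "ive", "electronics forum"]
c1 = expand(["camcorder"], c2, docs)
```

On this toy corpus the off-topic phrases never co-occur with anything in C1, so they are left behind in C2.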

Monday, June 6, 2011
Key  Phrases  and  Aboutness  
                       Evaluations
       Are the key phrases we extracted topical and
       good indicators of what the content is about?
      ‣ If they are, they should act as effective index/search
        phrases and return relevant content
       Evaluation Application: Targeted Content
       Delivery




Monday, June 6, 2011
Targeted Content Delivery: Evaluations
       12K posts from MySpace and Facebook
       Electronics forums
      ‣ Baseline phrases: Yahoo Term Extractor
      ‣ Our phrases: key phrase extraction followed by
        off-topic elimination
       Targeted content from Google AdSense




Monday, June 6, 2011
Targeted  Content  for  all  content  
            vs.  extracted  key  phrases




Monday, June 6, 2011
User  Studies  and  Results




Monday, June 6, 2011
Impact  and  Contributions

       TFIDF + social contextual cues yield more useful
       phrases that preserve social perceptions
       Corpus + seeds from a domain knowledge base
       eliminate off-topic phrases effectively




Monday, June 6, 2011
Intention  Mining




Monday, June 6, 2011
Targeted Content Delivery via Intention Mining
       On social networks
       Use case for this talk
      ‣ Targeted content = content-based advertisements
      ‣ Target = user profiles
       Content-based advertisements (CBAs)
      ‣ Well-known monetization model for online content




Monday, June 6, 2011
Circa 2009: Content-based Ads




Monday, June 6, 2011
Circa 2009: Ads on Profiles




Monday, June 6, 2011
What  is  going  on  here

      Interests do not translate to purchase intents
     ‣ Interests are often outdated
     ‣ Intents are rarely stated on a profile
      Cases that do seem to work
     ‣ New store openings, sales
     ‣ Highly demographic-targeted ads




Monday, June 6, 2011
Intents  in  User  




Monday, June 6, 2011
Content  Ads  Outside  Profiles




Monday, June 6, 2011
Targeted Content-based Advertising
       Non-trivial
      ‣ Non-policed content
        ‣ Brand image, unfavorable sentiments
      ‣ People are there to network
        ‣ User attention to ads is not guaranteed
      ‣ Informal, casual nature of content
      ‣ People are sharing experiences and events
        ‣ Main message overloaded with off-topic content

Monday, June 6, 2011
Targeted Content-based Advertising




Monday, June 6, 2011
Targeted Content-based Advertising
      I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a
      video project due tomorrow for merrill lynch :(( all i need to
      do is simple: Extract several scenes from a clip, insert
      captions, transitions and thats it. really. omgg i cant figure out
      anything!! help!! and i got food poisoning from eggs. its not
      fun. Pleasssse, help? :(

      Learning from Multi-topic Web Documents for Contextual
      Advertisement, Zhang, Y., Surendran, A. C., Platt, J. C., and
      Narasimhan, M., KDD 2008



Monday, June 6, 2011
Preliminary  Results  in…  

       Identifying intents behind user posts on social
       networks

      ‣ Identify Content with monetization potential
       Identifying keywords for advertising in user-
       generated content

      ‣ Considering interpersonal communication & off-topic
            chatter




Monday, June 6, 2011
Investigations



       User studies

     ‣ Hard to compare activity-based ads to the state of the art
       ‣ Impressions to clickthroughs
     ‣ How well are we able to identify monetizable posts?
     ‣ How targeted are ads generated using our keywords
       vs. the entire user-generated content?
Monday, June 6, 2011
Identifying  Monetizable  Intents
       Scribe intent is not the same as Web search intent [1]
       People write sentences, not keywords or phrases
       Presence of a keyword does not imply
       navigational / transactional intents

      ‣ ‘am thinking of getting X’ (transactional)
      ‣ ‘I like my new X’ (information sharing)
      ‣ ‘what do you think about X’ (information seeking)
       [1] B. J. Jansen, D. L. Booth, and A. Spink, “Determining the informational, navigational, and transactional intent of web
       queries,” Inf. Process. Manage., vol. 44, no. 3, 2008.




Monday, June 6, 2011
From X to Action Patterns

       Action patterns surrounding an entity

      ‣ How questions are asked and not topic words that indicate
            what the question is about
      ‣ “where can I find a chottopspcam”
        ‣ User post also has an entity




Monday, June 6, 2011
Conceptual Overview
       Bootstrapping to learn IS (information-seeking) patterns
       Set of user posts from SNSs
       Not annotated for presence or absence of any intent




Monday, June 6, 2011
Bootstrapping to learn IS patterns
     Generate a universal set of n-gram patterns with frequency > f
     S = set of all 4-grams with frequency > 3




Monday, June 6, 2011
Bootstrapping to learn IS patterns
     Generate a set of candidate patterns from seed words
        (why, when, where, how, what)

     Sc = all 4-grams in S that contain seed words
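
The first two bootstrapping steps (building S, then filtering it down to Sc) can be sketched as follows. This is a minimal illustration assuming lowercase whitespace tokenization; the function names are hypothetical, while the seed words and the frequency threshold of 3 come from the slides.

```python
from collections import Counter

SEED_WORDS = {"why", "when", "where", "how", "what"}

def frequent_ngrams(posts, n=4, min_freq=3):
    """S: all n-grams (as token tuples) whose corpus frequency exceeds min_freq."""
    counts = Counter()
    for post in posts:
        tokens = post.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram for gram, c in counts.items() if c > min_freq}

def candidate_patterns(S, seeds=SEED_WORDS):
    """Sc: the n-grams in S that contain at least one seed word."""
    return {gram for gram in S if seeds.intersection(gram)}

posts = (["does anyone know how to fix this"] * 4
         + ["i love my new camera so much"] * 4
         + ["where do i find it"] * 2)
S = frequent_ngrams(posts)
Sc = candidate_patterns(S)
```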




Monday, June 6, 2011
Bootstrapping to learn IS patterns
     User picks 10 seed patterns from Sc

     Sis = ‘does anyone know how’, ‘where do I find’,
        ‘someone tell me where’ …



Monday, June 6, 2011
Bootstrapping to learn IS patterns
     Gradually expand Sis by adding information-seeking
        patterns from Sc



Monday, June 6, 2011
Bootstrapping to learn IS patterns
     For every pis in Sis, generate a set of filler patterns




Monday, June 6, 2011
Bootstrapping to learn IS patterns
     Filler patterns for ‘does anyone know how’:
     ‘.* anyone know how’
     ‘does .* know how’
     ‘does anyone .* how’
     ‘does anyone know .*’
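
Generating the filler patterns amounts to replacing one word at a time with a wildcard. A minimal sketch, with the hypothetical helper name `filler_patterns`:

```python
def filler_patterns(pattern, wildcard=".*"):
    """Return one filler pattern per position, with that word replaced by a wildcard."""
    tokens = pattern.split()
    return [" ".join(tokens[:i] + [wildcard] + tokens[i + 1:])
            for i in range(len(tokens))]

fillers = filler_patterns("does anyone know how")
```

Each filler pattern can then be matched against the candidate pool Sc to find 4-grams that differ from the seed in exactly one word.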




Monday, June 6, 2011
Extracting and Scoring Patterns

     • ‘does * know how’
       – ‘does someone know how’
         • Functional compatibility: impersonal pronouns
         • Empirical support: 1/3
       – ‘does somebody know how’
         • Functional compatibility: impersonal pronouns
         • Empirical support: 0
         • Pattern retained
       – ‘does john know how’
         • Pattern discarded
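
The retain/discard decision above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the pronoun set stands in for a LIWC category, and empirical support is defined as the fraction of posts containing the candidate, which may not be how [WISE2009] computes it. The toy posts were chosen so the numbers mirror the slide.

```python
# Hypothetical stand-in for the LIWC impersonal-pronoun category.
IMPERSONAL_PRONOUNS = {"anyone", "someone", "somebody", "anybody"}

def empirical_support(candidate, posts):
    """Fraction of posts that contain the candidate pattern (assumed definition)."""
    if not posts:
        return 0.0
    return sum(candidate in p.lower() for p in posts) / len(posts)

def score(seed, candidate, posts, category=IMPERSONAL_PRONOUNS):
    """Return the support of a retained candidate, or None if it is discarded."""
    s, c = seed.split(), candidate.split()
    if len(s) != len(c):
        return None
    diffs = [(a, b) for a, b in zip(s, c) if a != b]
    if len(diffs) != 1:  # only single-word substitutions are considered
        return None
    old, new = diffs[0]
    if not (old in category and new in category):
        return None  # not functionally compatible: discard
    return empirical_support(candidate, posts)

posts = ["Does someone know how to rip a dvd",
         "I like my new phone",
         "where do i find a charger"]
seed = "does anyone know how"
```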
Monday, June 6, 2011
Extracting and Scoring Patterns

        Sc = {‘does anyone know how’, ‘where do I find’,
       ‘someone tell me where’}
          pis = ‘does anyone know how’




Monday, June 6, 2011
Extracting and Scoring Patterns




Monday, June 6, 2011
Expanding the Pattern Pool

         Functional properties / communicative functions
         of words
         From a subset of LIWC [1]
       – cognitive mechanisms (e.g., if, whether, wondering, find)
         • ‘I am thinking about getting X’
       – adverbs (e.g., how, somehow, where)
       – impersonal pronouns (e.g., someone, anybody, whichever)
         • ‘Someone tell me where can I find X’

         [1] Linguistic Inquiry and Word Count (LIWC), http://liwc.net




Monday, June 6, 2011
Details  in  [WISE2009]  for..

            Over iterations, single-word substitutions,
            functional usage, and empirical support
            conservatively expand Sis

            Infusing new patterns and seed words
            Stopping conditions




Monday, June 6, 2011
Sample Extracted Patterns




Monday, June 6, 2011
Identifying Monetizable Posts

        Information-seeking patterns generated offline
        Information-seeking intent score of a post
      ‣ Extract patterns from the post and compare them with
         the extracted patterns
        Transactional intent score of a post
      ‣ LIWC ‘Money’ dictionary: 173 words and
         word forms indicative of transactions, e.g.,
         trade, deal, buy, sell, worth, price, etc.
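
The two intent scores can be sketched as simple counts. This is a rough illustration: the pattern list holds only the three seed patterns quoted earlier, the money words are the six examples named on this slide rather than the full LIWC dictionary, and counting matches is an assumed scoring rule.

```python
import re

IS_PATTERNS = ["does anyone know how", "where do i find", "someone tell me where"]
MONEY_WORDS = {"trade", "deal", "buy", "sell", "worth", "price"}  # subset only

def _tokens(post):
    # Assumption: keep alphabetic tokens only, lowercased.
    return re.findall(r"[a-z']+", post.lower())

def information_seeking_score(post, patterns=IS_PATTERNS):
    """Number of known information-seeking patterns found in the post."""
    text = " ".join(_tokens(post))
    return sum(p in text for p in patterns)

def transactional_score(post, money_words=MONEY_WORDS):
    """Number of tokens in the post that appear in the money word list."""
    return sum(t in money_words for t in _tokens(post))

post = "Does anyone know how much a used HV20 is worth? Looking to buy one."
```

A post could then be flagged as monetizable when both scores are above thresholds chosen on held-out data.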


Monday, June 6, 2011
Keywords  for  Advertising

       Identifying keywords in monetizable posts

      ‣ Plethora of work in this space
       Off-topic noise removal is our focus

      ‣ I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and
            i have a video project due tomorrow for merrill lynch :(( all
            i need to do is simple: Extract several scenes from a clip,
            insert captions, transitions and thats it. really. omgg i cant
            figure out anything!! help!! and i got food poisoning from
            eggs. its not fun. Pleasssse, help? :(



Monday, June 6, 2011
Conceptual Overview
                       (also see slides 88, 89)
       Topical hints
      ‣ C1 = ['camcorder']
       Keywords in post
      ‣ C2 = ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon',
            'little camera', 'canon hv20', 'cameras', 'offtopic']
       Move strongly related keywords from C2 to C1 one by one
      ‣ Relatedness determined using information gain
      ‣ Using the Web as a corpus, domain independent


Monday, June 6, 2011
Off-topic Chatter

       C1 = ['camcorder']
       C2 = ['electronics forum', 'hd', 'camcorder', 'somethin',
       'ive', 'canon', 'little camera', 'canon hv20', 'cameras',
       'offtopic']
       Informative words
      ‣ ['camcorder', 'canon hv20', 'little camera', 'hd', 'cameras',
            'canon']



Monday, June 6, 2011
Evaluations: User Study

       Keywords from 60 monetizable user posts
      ‣ Monetizable intent, at least 3 keywords in content
       45 posts from MySpace forums, 15 from Facebook Marketplace;
       30 graduate students as evaluators
      ‣ 10 sets of 6 posts each
      ‣ Each set evaluated by 3 randomly selected users
       Monetizable intents?
      ‣ All 60 posts voted as unambiguously information seeking in intent

Monday, June 6, 2011
1.  Effectiveness  of  using  
                            topical  keywords
       Google AdSense ads for the full user post vs. for the
       extracted topical keywords




Monday, June 6, 2011
Instructions: User Study




Monday, June 6, 2011
Result: 2X Relevant Impressions

       Users picked ads relevant to the post

      ‣ At least 50% inter-evaluator agreement
       For the 60 posts

      ‣ Total of 144 ad impressions
      ‣ 17% of ads picked as relevant
       For the topical keywords

      ‣ Total of 162 ad impressions
      ‣ 40% of ads picked as relevant
Monday, June 6, 2011
2.  Profile  Ads  vs.  Activity  Ads

       User’s profile information

      ‣ Interests, hobbies, TV shows..
      ‣ Non-demographic information
       Submit a post
       Looking to buy and why (induced noise)
       Ads that generate interest, captured attention




Monday, June 6, 2011
Result: 8X Generated Interest
       Using profile ads
      ‣ Total of 56 ad impressions
      ‣ 7% of ads generated interest
       Using authored posts
      ‣ Total of 56 ad impressions
      ‣ 43% of ads generated interest
       Using topical keywords from authored posts
      ‣ Total of 59 ad impressions
      ‣ 59% of ads generated interest
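
The "2X" and "8X" headlines follow from the reported percentages. A small worked check, using only the figures on the two result slides (the helper name `lift` is hypothetical):

```python
def lift(baseline_pct, treatment_pct):
    """How many times larger the treatment rate is than the baseline rate."""
    return treatment_pct / baseline_pct

# Relevant impressions: full post (17%) vs. topical keywords (40%).
relevance_lift = lift(17, 40)
# Generated interest: profile ads (7%) vs. topical keywords from posts (59%).
interest_lift = lift(7, 59)
```

The reported percentages are rounded, so these ratios are approximate.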

Monday, June 6, 2011
To  note…

       User studies small and preliminary, clearly suggest

      ‣ Monetization potential in user activity
      ‣ Improvement for Ad programs in terms of relevant
            impressions
       Evaluations based on forum, marketplace

      ‣ Verbose content
      ‣ Status updates, notes, community and event
            memberships…
      ‣ One size may not fit all
Monday, June 6, 2011
To  note…

       A world between relevant impressions and click
       throughs

      ‣ Objectionable content, vocabulary impedance, Ad
            placement, network behavior
       In a pipeline of other community efforts
       No profile information taken into account
       Cannot custom send information to Google AdSense



Monday, June 6, 2011
SENTIMENT  /  OPINION  
                         MINING




Monday, June 6, 2011
Content  Analysis:  Sentiment  
               Analysis/Opinion  Mining
       Two main types of information we can learn from
       user-generated content: fact vs. opinion
       Much of what we read in social media (e.g., blogs,
       Twitter, Facebook) is a mix of facts and opinions.  
       For example: “Latest news: Mobile web services not
       working in #Bahrain and Internet is extremely slow
       #feb14 {fact} ... looks like they "learned" from #Egypt
       {opinion}”

Monday, June 6, 2011
Sentiment  Analysis  Motivation




           Which movie should I see?
           What do customers complain about?
           Why do people oppose health care reform?



Monday, June 6, 2011
Sentiment  Analysis:  Tasks

       Example:

      ‣ How awful that many #Egyptian artifacts are in danger of
            being destroyed.
      ‣ What Zahi Hawass must be thinking #jan25 (read in the
            tone of “what were YOU thinking”)




Monday, June 6, 2011
Sentiment  Analysis:  Tasks

       Classification
      ‣ Overall sentiment polarity: positive/neutral/negative
        ‣ Example: “How awful that many #Egyptian artifacts are
          in danger of being destroyed.”
        ‣ Overall polarity is negative
      ‣ Target-specific sentiment polarity: positive/neutral/negative
        ‣ Example: for target "egyptian artifacts", polarity is
          "negative"; for target "Zahi Hawass", polarity is "neutral"




Monday, June 6, 2011
Sentiment  Analysis:  Tasks

       Identification & Extraction: opinion, opinion
       holder, opinion target
      ‣ Example: opinion="awful", opinion holder="the
        author", target="egyptian artifacts are in danger"
      ‣ Example: opinion="must be thinking", opinion holder="the
        author", target="Zahi Hawass"




Monday, June 6, 2011
Sentiment  Analysis:  Approaches

       Classification:

      ‣ Supervised: 
        ‣ labeled training data
           ‣ features, differ from traditional topic classification tasks
           ‣ learning strategies
      ‣ Unsupervised:
        ‣ lexicon-based approach
           ‣ Bootstrapping
Monday, June 6, 2011
Sentiment  Analysis:  Approaches

      Identification & Extraction:
      ‣utilizing the relations between opinion and opinion target,
      ‣proximity,
      ‣syntactic dependency,
      ‣co-occurrence and
      ‣prepared patterns/rules




Monday, June 6, 2011
Sentiment Analysis: From Tweets to Polls
       Corpus:
      ‣ 0.7 billion tweets, Jan 2008 – Oct 2009
      ‣ 1.5 billion tweets, Jan 2008 – May 2010
       Lexicon-based approach for sentiment analysis of tweets:
      ‣ Subjectivity lexicon from OpinionFinder (Wilson et al., 2005)
      ‣ Within topic tweets, count messages containing the positive and
        negative words defined by the lexicon
       B. O’Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith.
       From Tweets to Polls: Linking Text Sentiment to Public Opinion Time
       Series. In Intl. AAAI Conference on Weblogs and Social Media,
       Washington, D.C., 2010.
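
The lexicon-based counting step can be sketched as below. The tiny word lists are hypothetical placeholders for the OpinionFinder lexicon, and matching the topic as a single lowercase token is a simplifying assumption.

```python
POSITIVE = {"good", "great", "love"}   # hypothetical mini-lexicon
NEGATIVE = {"bad", "awful", "hate"}    # hypothetical mini-lexicon

def count_sentiment(tweets, topic):
    """Count topic tweets that contain any positive / any negative lexicon word."""
    pos = neg = 0
    for tweet in tweets:
        tokens = set(tweet.lower().split())
        if topic not in tokens:
            continue  # only tweets on the topic are counted
        if tokens & POSITIVE:
            pos += 1
        if tokens & NEGATIVE:
            neg += 1  # a tweet can count toward both totals
    return pos, neg

tweets = ["love the new jobs report",
          "jobs numbers look awful",
          "great weather today",
          "good jobs news but bad timing"]
pos, neg = count_sentiment(tweets, "jobs")
```

Daily counts like these can be turned into a time series, for example as a positive-to-negative ratio per day, and compared against poll data.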




Monday, June 6, 2011
Sentiment Analysis: Predicting the Future With Social Media

     Corpus: 2.89 million tweets referring to 24 movies released over a period of three months
     Sentiment analysis classifier:
      ‣ DynamicLMClassifier provided by the LingPipe linguistic analysis package
      ‣ Thousands of Amazon Mechanical Turk workers assigned sentiments
        (positive, negative, neutral) to a large random sample of tweets
      ‣ Classifier trained using an n-gram model
     S. Asur and B. Huberman. Predicting the Future With Social Media. 2010. http://arxiv.org/abs/1003.5699

Monday, June 6, 2011
Sentiment Analysis: Target-specific Opinion Identification &
          Classification of Tweets (Unsupervised Approach)
       A simple lexicon-based method doesn't work.

        Observations:
              Opinions may not be directed at the given target (examples 1, 2, 3, 6)
              The subjectivity and polarity of opinion clues are
              domain-dependent (examples 5, 7)
              Single words are not enough (examples 4, 7, 8)
Monday, June 6, 2011
Sentiment Analysis: Target-specific Opinion Identification &
           Classification of Tweets (Unsupervised Approach)
          General subjective lexicon
      ‣ Commonly used subjective lexicon + popular slang learned from
        Urban Dictionary
          Domain-dependent sentiment lexicon
      ‣ Learned from a domain-specific corpus
        ‣ Bootstrapping
      ‣ More than words (word/phrase/pattern)
        ‣ n-gram + statistical model

   


Monday, June 6, 2011
Sentiment Analysis: Target-specific Opinion Identification &
           Classification of Tweets (Unsupervised Approach)
         Target-specific opinion identification/extraction
       ‣ Shallow syntactic analysis
       ‣ Rules + proximity
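
The proximity part of this step can be illustrated with a token-window rule: only opinion words close to the target are attributed to it. This sketch covers proximity only, not the syntactic rules; the window size, the lexicon, and the function name are assumptions.

```python
def opinions_near_target(tokens, target, lexicon, window=5):
    """Return (word, polarity) pairs found within `window` tokens of the target."""
    positions = [i for i, t in enumerate(tokens) if t == target]
    found = []
    for i, t in enumerate(tokens):
        if t in lexicon and any(abs(i - p) <= window for p in positions):
            found.append((t, lexicon[t]))
    return found

lexicon = {"awful": "negative", "lovely": "positive"}  # hypothetical
tokens = ("how awful that many artifacts are in danger but the "
          "weather is really quite lovely today").split()
near = opinions_near_target(tokens, "artifacts", lexicon)
```

Here "lovely" is ignored for the target "artifacts" because it lies outside the window, which is the behavior a plain lexicon count cannot provide.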




Monday, June 6, 2011
Content Analysis: Context Extraction, Utilization
        URL extraction for tweets
        FourSquare in Facebook, Twitter
        What is the equivalent in other media/SMS?




Monday, June 6, 2011
Content  Analysis:  
                        URL  extraction
       Resolution
       Semantic Context Relevance
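
The extraction step that precedes resolution can be sketched with a regular expression. This is a deliberately simple pattern and an assumption, not the tutorial's method; resolving shortened links to their final destination would additionally require following HTTP redirects, which is not shown.

```python
import re

URL_RE = re.compile(r"https?://\S+")

def extract_urls(tweet):
    """Return the URLs in a tweet, with trailing punctuation stripped."""
    return [u.rstrip(".,;:!?)") for u in URL_RE.findall(tweet)]

tweet = ("Mobile web down in #Bahrain http://bit.ly/abc123, "
         "more at https://example.org/news.")
urls = extract_urls(tweet)
```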




Monday, June 6, 2011
Author  Categorization:  Using  
             Content  to  derive  additional  
                  People  metadata
       Personality Signals
       Blogs, Style of Writing
       Psychometric analysis of content
       Sample study: Gendered writing styles online




Monday, June 6, 2011
People  Analysis:  Using  Network  
             to  derive  People  metadata
       Interesting questions to ask:

      ‣     Who are the most popular people* in the network
      ‣     Who are the most influential people in the network
      ‣     Who are the most active people in the network
      ‣     What are the types of people in communities of the
            network
      ‣ Who are the bridges between communities in the network



Monday, June 6, 2011
People  Analysis:  Influence

       By link analysis algorithms
      ‣ HITS [K-99] & variants
      ‣ PageRank [BP-97] & variants, etc.
       Links not sufficient!

      ‣ Million Follower Fallacy [C-10]
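
As a reference point for link-based influence, here is a minimal power-iteration sketch of PageRank on a follower graph. The graph, the damping factor of 0.85, and the fixed iteration count are illustrative assumptions; as the slide notes, link structure alone is not a sufficient measure of influence.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph maps each user to the list of users they follow (link to)."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = graph.get(u, [])
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

follows = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(follows)
```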


                                            Source : informing-arts



Monday, June 6, 2011
People  Analysis:  Influence

          Flavor of Context Analysis (activity level)
          Popularity NOT = Influence!
           ‣ Influence & Passivity [RGAH-10]
          Interest Similarity
           ‣ TwitterRank: Reciprocity & Homophily [WLJH-10]
          Klout Score - True Reach, Amplification [Klout]




Monday, June 6, 2011
People  Analysis:  User  types  
                        &  Affiliation
       Blogger, Scientist, Journalist, Artist, Trustee,
       Company X in  Domain Y..

      ‣ Multiple types and affiliations!
       User interest mining

      ‣ Key phrase extraction followed by semantic association on
            user bio, tweets, lists, favorite posts

           ‣ Twitter Study [BCDMJNRM-09]
                                             Source: kahunainstitute.com



Monday, June 6, 2011
People  Analysis:  User  types  
                        &  Affiliation
      Semantic analysis of profile description
       ‣ Web Presence: Use of Web & Knowledge bases
           (Wikipedia, Blogs) to build context for user types
       ‣ Entity Spotting & Extraction, followed by Semantic
           Association and Similarity with user-type context




Monday, June 6, 2011
People  Analysis:  
                             Social  Engagement




             Source: http://www.syscomminternational.com/


             Frequency  Distribution  Analysis  of  user  activity
         ‣      posting,  retweet,  reply,  mentions,  lists  etc.  
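
A frequency distribution of user activity can be computed with two counters. The event format (user, activity type) and the function names are assumptions made for this sketch.

```python
from collections import Counter

def activity_by_type(events):
    """How many events of each type (post, retweet, reply, ...) occurred."""
    return Counter(kind for _, kind in events)

def activity_distribution(events):
    """Map activity level -> number of users with exactly that many events."""
    per_user = Counter(user for user, _ in events)
    return Counter(per_user.values())

events = [("u1", "post"), ("u1", "retweet"), ("u2", "reply"),
          ("u3", "post"), ("u3", "mention"), ("u3", "post")]
by_type = activity_by_type(events)
distribution = activity_distribution(events)
```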




Monday, June 6, 2011
Network  Analysis  

           Foundation of a network:
          • Nodes
          • Connections/Relationships

    Interesting questions to ask:

          How communities form around topics: growth & evolution

          What are the effects of the presence of influential participants in the
          communities?

          What are the effects of the nature of content (sentiment, opinions)
          flowing in the network on community life?

          What is the community structure: degrees of separation and
          sub-communities?
Monday, June 6, 2011
Network  Analysis:  Methods

           Network structure metrics
      ‣ Centrality, Connected Components, Avg. Degree,
        Clustering Coefficient, Avg. Path Length,
        Bridge, Cohesion, Prestige, Reciprocity
           Important literature:
      ‣ [AB-02, WS-98, BW-00, NW-06, WF-92, MW-10]
                                     Source: http://www.kudos-dynamics.com/
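
Two of the metrics listed above, average degree and the local clustering coefficient, can be computed directly from an adjacency map. A minimal sketch for an undirected graph stored as a dict of neighbor sets (the representation is an assumption):

```python
from itertools import combinations

def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```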




Monday, June 6, 2011
Network  Analysis:  Algorithms  

       Community Discovery, growth, evolution

      ‣ Based on relationship types (e.g., signed network),
            geography/location based etc.
       Hierarchical clustering algorithms – Top-down,
       bottom-up
       Modularity Maximization [NW-06]
       Algorithms comparison survey [B-06]
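Modularity maximization searches for the partition with the highest modularity score. The sketch below only evaluates that score for a given partition, using the common form Q = sum over communities of (L_c / m - (d_c / 2m)^2), where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges. The edge list and community labels are made up.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    intra = {}        # community -> number of edges inside it
    degree_sum = {}   # community -> sum of degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Two triangles joined by a single bridge edge (hypothetical data).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {n: "A" for n in range(1, 7)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the graph at the bridge scores higher than putting every node in one community, which is the signal a maximization algorithm follows.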



Monday, June 6, 2011
Network  Analysis:  Algorithms  

       Graph Partitioning & Traversal
       Best time-complexity & reachability
       Follow Greedy paths

      ‣ K-way multilevel partitioning
      ‣ Bron-Kerbosch, K-plex, K-core or N-cliques, DFS, BFS,
            MST
                       "We dream in Graph and
                        We analyze in Matrix" -
                          Barry Wellman, INSNA
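Two of the listed algorithms, BFS for reachability and Bron-Kerbosch for maximal cliques, can be sketched as follows. This is the basic Bron-Kerbosch recursion without pivoting, run on a made-up graph; it is meant as an illustration rather than an efficient implementation.

```python
from collections import deque

# Hypothetical undirected graph: node -> set of neighbours.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def bfs_reachable(g, start):
    """Return the set of nodes reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in g[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def bron_kerbosch(g, r=frozenset(), p=None, x=frozenset()):
    """Yield every maximal clique of g as a frozenset."""
    if p is None:
        p = frozenset(g)
    if not p and not x:
        yield r
        return
    for v in list(p):
        yield from bron_kerbosch(g, r | {v}, p & g[v], x & g[v])
        p = p - {v}   # v has been fully explored
        x = x | {v}   # exclude v from later cliques

cliques = set(bron_kerbosch(graph))
```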


Monday, June 6, 2011
Network  Analysis:  Methods

       Network Modeling Approaches

      ‣     Random graph model (Erdős-Rényi model)
      ‣     Small-world model (Small World Phenomenon)
      ‣     Scale-free model (power-law degree distribution)
       Social Network Analysis methods

      ‣     Centrality (Degree, Eigenvector, Betweenness, Closeness)
      ‣     Clusters (Cliques and extensions, Communities)

                                  Source: http://www.kudos-dynamics.com/
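The random graph model is the simplest of these to generate. Below is a minimal sketch of the G(n, p) variant, in which every possible edge is included independently with probability p. The function names and parameters are illustrative.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Generate an undirected G(n, p) random graph as an adjacency dict."""
    rng = random.Random(seed)
    g = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            g[u].add(v)
            g[v].add(u)
    return g

def degree_histogram(g):
    """Map degree -> number of nodes with that degree."""
    hist = {}
    for nbrs in g.values():
        hist[len(nbrs)] = hist.get(len(nbrs), 0) + 1
    return hist

sample = erdos_renyi(20, 0.3, seed=42)
```

Comparing the degree histogram of such a graph with that of an observed social network is one way to see how far the real network departs from the random model.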

Monday, June 6, 2011
Network  Analysis:  
                       Diffusion  &  Homophily
       Information Flow: Diffusion

      ‣ Maximizing Spread (Opinion, Innovation, Recommendation)
      ‣ Outbreak Detection (e.g., disease)
       Social Network: No info about user action, so
       understanding the dynamics is challenging!
       Power Law distribution [LAH-07]
       Factors impacting flow:

      ‣ Sampling strategy, user Homophily, content nature
            [CLSCK-10, NPS-10]
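Information flow is often simulated with a cascade model. The sketch below uses the independent cascade model, which the slides do not name and which is assumed here only for illustration: each newly activated user gets one chance to activate each follower with a fixed probability. The graph and the probability are made up.

```python
import random

def independent_cascade(g, seeds, prob, seed=None):
    """Simulate one cascade; g maps a user to the users they can influence."""
    rng = random.Random(seed)
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in g.get(u, ()):
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

# Hypothetical directed influence graph.
influence = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
```

Maximizing spread then amounts to choosing the seed set whose simulated cascades reach the most users on average.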
Monday, June 6, 2011
Querying




Monday, June 6, 2011
Analysis  &  Visualization  Tools

       NWB (Network Workbench)

       Truthy
                                          Source: http://truthy.indiana.edu/

       Graph-tool
       Orange

       Pajek

       Tulip

       http://en.wikipedia.org/wiki/social_network_analysis_software

Monday, June 6, 2011
Event  Detection




Monday, June 6, 2011
Citizen Sensing in Real-time




Monday, June 6, 2011
Real-Time Motivation
       People can't wait for information
       500 years ago

      ‣     Single lifetime

       20 years ago

      ‣     Next day or two

           ‣    Television, Newspapers

       Presently

      ‣     Minutes are not considered fast enough

           ‣    Digital media, Social media

Monday, June 6, 2011
Real-Time Social Media

       Is Real-Time the future of the Web?
       Social Media for Real-Time Web

      ‣ Disaster Management
        ‣ Ushahidi
      ‣ Real-Time Markets
        ‣ Examples
      ‣ Brand Tracking
        ‣ Twarql
      ‣ Movie reviews
Monday, June 6, 2011
           Scenario


        The Guardian, Feb 2010




                                      Journalist

Monday, June 6, 2011
Challenges

       Information Overload

      ‣ Can we aggregate, organize and collectively analyze data?


       Real Time

      ‣ Can we deliver the data as it is generated?




Monday, June 6, 2011
A  Semantic  Web  Approach
       Expressive description of Information need

      ‣ Using SPARQL (instead of traditional keyword search)
       Flexibility on the point of view

      ‣ Ability to "slice and dice" the data in several dimensions: thematic,
            spatial, temporal, sentiment etc.

       Streaming data with Background Knowledge

      ‣ Enables automatic evolution and serendipity
       Scalable Real-Time delivery 

      ‣ Using sparqlPuSH (SFSW'10)

Monday, June 6, 2011
Concept  Feed




Monday, June 6, 2011
Architecture




Monday, June 6, 2011
Social  Sensor  Server




Monday, June 6, 2011
Metadata  Extractions    
                       (Social  Sensor  Server)
       Named Entity Recognition

      ‣ 2 Million Entities from DBpedia
      ‣ Load as Trie for efficiency
      ‣ N-grams matched
        ‣ Example: Obama, Barack Obama
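The trie-based n-gram matching can be sketched as a token-level trie with longest-match lookup. The entity list, the whitespace tokenization, and the function names are assumptions made for this example; the actual system loads DBpedia labels and is not shown in the slides.

```python
_END = object()  # marker key: an entity name ends at this trie node

def build_trie(entities):
    """Build a token-level trie from entity names."""
    root = {}
    for name in entities:
        node = root
        for token in name.lower().split():
            node = node.setdefault(token, {})
        node[_END] = name
    return root

def spot_entities(text, trie):
    """Return entities found in text, preferring the longest n-gram match."""
    tokens = text.lower().split()
    found = []
    i = 0
    while i < len(tokens):
        node, match, match_end = trie, None, i
        j = i
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if _END in node:
                match, match_end = node[_END], j
        if match:
            found.append(match)
            i = match_end   # skip past the matched n-gram
        else:
            i += 1
    return found

trie = build_trie(["Obama", "Barack Obama", "Ohio"])
```

Longest-match lookup is what makes "Barack Obama" win over the shorter "Obama" when both are in the entity list. Punctuation handling is left out of this sketch.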




Monday, June 6, 2011
Metadata  Extractions    
                       (Social  Sensor  Server)
       URL, HashTag Extraction

      ‣ Regex extraction
      ‣ Resolution
        ‣ URL Resolution: Follows http redirects for resolution
        ‣ HashTag Resolution: Tagdef, Tagal, WTHashTag.com
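Regex extraction and redirect-following URL resolution might look like the sketch below. The patterns are simplified assumptions, not the ones used by the system, and `resolve_url` needs network access, so it is defined but not called here.

```python
import re
import urllib.request

# A hashtag must start a token, so URL fragments such as /page#top are ignored.
HASHTAG_RE = re.compile(r"(?<!\S)#(\w+)")
URL_RE = re.compile(r"https?://\S+")

def extract_metadata(tweet):
    """Pull hashtags (without '#') and URLs out of a tweet."""
    return {
        "hashtags": HASHTAG_RE.findall(tweet),
        "urls": URL_RE.findall(tweet),
    }

def resolve_url(url, timeout=5):
    """Follow HTTP redirects and return the final URL (performs a request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()

meta = extract_metadata(
    "Fingers crossed for the upcoming #hcrvote http://example.org/x#frag"
)
```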




Monday, June 6, 2011
Metadata  Extractions    
                       (Social  Sensor  Server)
    Other Metadata provided by Twitter
     ‣ User profile: User Name, Location, Time etc.
     ‣ Tweet: RT, reply etc.




Monday, June 6, 2011
Structured  Data
                       (Social  Sensor  Server)
       RDF Annotation

      ‣ Common RDF/OWL Vocabularies
        ‣ FOAF - (foaf-project.org) Friend of a Friend
        ‣ SIOC - (sioc-project.org) Semantically Interlinked
                Online Communities
        ‣ OPO - (online-presence.net) Online Presence Ontology
        ‣ MOAT - (moat-project.org) Meaning Of A Tag

Monday, June 6, 2011
Structured  Data
                       (Social  Sensor  Server)

                                  A snippet of the annotation
                     <http://twitter.com/bob/statuses/123456789>
                         rdf:type          sioct:MicroblogPost ;
                         sioc:content      "Fingers crossed for the upcoming #hcrvote" ;
                         sioc:has_creator  <http://twitter.com/bob> ;
                         foaf:maker        <http://example.org/bob> ;
                         moat:taggedWith   dbpedia:Healthcare_reform .
                     <http://twitter.com/bob>
                         geonames:locatedIn  dbpedia:Ohio .
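An annotation of this shape could be produced from extracted metadata roughly as follows. The function name and arguments are hypothetical, prefix declarations are omitted, and only the predicates shown in the snippet are emitted.

```python
def tweet_to_turtle(user, status_id, text, tags, location=None):
    """Serialize one tweet as Turtle (prefix declarations not included)."""
    post = f"<http://twitter.com/{user}/statuses/{status_id}>"
    author = f"<http://twitter.com/{user}>"
    # Escape characters that would break a Turtle string literal.
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    pairs = [
        ("rdf:type", "sioct:MicroblogPost"),
        ("sioc:content", f'"{escaped}"'),
        ("sioc:has_creator", author),
    ]
    pairs += [("moat:taggedWith", f"dbpedia:{tag}") for tag in tags]
    body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
    turtle = f"{post}\n    {body} .\n"
    if location:
        turtle += f"{author} geonames:locatedIn dbpedia:{location} .\n"
    return turtle

annotation = tweet_to_turtle(
    "bob", 123456789, "Fingers crossed for the upcoming #hcrvote",
    ["Healthcare_reform"], "Ohio",
)
```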




Monday, June 6, 2011
Semantic  Publisher

       Virtuoso stores the triples
       Queries formulated by the users are stored
       SPARQL protocol over HTTP is used to access RDF from
       the store
       Data from tweets is combined with the background
       knowledge in the RDF store
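With the SPARQL protocol, a query can be sent as the `query` parameter of an HTTP GET request. The sketch below only builds the request URL. The endpoint address is an assumption (a local Virtuoso instance), and the query leaves out its PREFIX declarations, so it would need them added before it could run against a real store.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://localhost:8890/sparql"  # assumed local endpoint

# Illustrative query: tweets tagged with a given DBpedia resource.
QUERY = """
SELECT ?tweet WHERE {
  ?tweet moat:taggedWith dbpedia:Healthcare_reform .
}
"""

def sparql_get_url(endpoint, query):
    """Build the GET URL for a SPARQL protocol query request."""
    return endpoint + "?" + urlencode({"query": query})

url = sparql_get_url(ENDPOINT, QUERY)
```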




Monday, June 6, 2011
Application  Server  &  Distribution  
                     Hub
          Distribution Hub
      ‣       PUSH Model - PubSubHubbub protocol
      ‣       Pushes the tweets to the Application Server



          Application Server
      ‣       Delivers data to the Clients
      ‣       RSS-enabled Concept Feeds
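The push model can be illustrated with an in-memory hub. This is not the PubSubHubbub wire protocol, which delivers updates through HTTP callbacks; the class and method names are hypothetical and only show the subscribe-then-push flow.

```python
from collections import defaultdict

class DistributionHub:
    """Minimal in-memory publish/subscribe hub."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be called for every item on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, item):
        """Push item to all subscribers; return how many were notified."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(item)
        return len(callbacks)

hub = DistributionHub()
inbox = []                                   # stands in for the Application Server
hub.subscribe("concept-feed/ipad", inbox.append)
delivered = hub.publish("concept-feed/ipad", "tweet-1")
```

Subscribers never poll: they receive each tweet as soon as it is published.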




Monday, June 6, 2011
Brand Tracking - Example
       Tweet: @anonymized "Lorem ipsum bla bla this is an example tweet"
       Background Knowledge (e.g. DBpedia)

       Graph pattern shown in the diagram:

      ‣ ?tweet        moat:taggedWith  ?competitor
      ‣ ?competitor   skos:subject     ?category
      ‣ dbpedia:IPad  skos:subject     ?category

       Example bindings:

      ‣ ?category: category:Wi-Fi, category:Touchscreen
      ‣ ?competitor: IPhone, HPTabletPC
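The brand-tracking pattern is a join between tweet annotations and background knowledge. The sketch below evaluates it over a small in-memory triple set. All triples are made up, and excluding the product itself from its own competitors is a choice made for this example.

```python
# Hypothetical triples: background knowledge plus tweet annotations.
triples = {
    ("dbpedia:IPad", "skos:subject", "category:Touchscreen"),
    ("dbpedia:IPad", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:IPhone", "skos:subject", "category:Touchscreen"),
    ("dbpedia:HPTabletPC", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:Ohio", "skos:subject", "category:States"),
    ("tweet:1", "moat:taggedWith", "dbpedia:IPhone"),
    ("tweet:2", "moat:taggedWith", "dbpedia:Ohio"),
}

def competitor_tweets(triples, product):
    """Find tweets tagged with anything sharing a category with product."""
    categories = {o for s, p, o in triples
                  if s == product and p == "skos:subject"}
    competitors = {s for s, p, o in triples
                   if p == "skos:subject" and o in categories and s != product}
    return sorted((s, o) for s, p, o in triples
                  if p == "moat:taggedWith" and o in competitors)

matches = competitor_tweets(triples, "dbpedia:IPad")
```

The tweet never mentions the iPad. It is matched only because the background knowledge links its tag to a shared category.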

Monday, June 6, 2011
        1242 Articles from NYTimes
        Around 800,000 tweets

      ‣ President Obama lays out plan for Health care reform in Speech to
            Joint Session of Congress (10th Sept, Timeline.com)
      ‣ Obama taking an active role in Health talks in pursuing his
            proposed overhaul of health care system. (13th Aug
Monday, June 6, 2011
Twarql  on  Linked  Open  Data




Monday, June 6, 2011
Emerging  Research  Areas  




Monday, June 6, 2011
Spam  in  Social  Networks

       Reasons for spamming include:

      ‣ Gaining popularity
        ‣ Use of popular topic-related keywords (e.g. hashtags of
                trending topics) to propagate something off topic.
      ‣ Launching malicious attacks
        ‣ Phishing attacks, viruses, malware etc.
      ‣ Misleading the masses
        ‣ Propagating false information [MM-10].
Monday, June 6, 2011
Spam  in  Social  Networks

       Gaining popularity using trending keywords:
       This tweet uses #Cairo but refers to a fashion
       website.
       (Figure label: Egypt Protests)




Monday, June 6, 2011
Spam  in  Social  Networks

       Spam detection

      ‣ Content-based features
        ‣ Content Size, URL type, spam words
      ‣ Metadata-based features
        ‣ Account information, behavior.
      ‣ Network-based features
        ‣ Provenance. (e.g. content from a reliable source)
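The three feature groups can be turned into a feature vector for a classifier. This is a minimal sketch: the spam word list, the chosen features, and the argument names are assumptions, not the feature set of any particular detector.

```python
import re

SPAM_WORDS = {"free", "win", "click"}  # hypothetical word list

def spam_features(text, account_age_days, followers, following, trusted_source):
    """Build content-, metadata- and network-based features for one tweet."""
    tokens = re.findall(r"\w+", text.lower())
    return {
        # Content-based
        "length": len(text),
        "num_urls": len(re.findall(r"https?://\S+", text)),
        "num_hashtags": len(re.findall(r"(?<!\S)#\w+", text)),
        "spam_word_count": sum(t in SPAM_WORDS for t in tokens),
        # Metadata-based
        "account_age_days": account_age_days,
        "follower_ratio": followers / max(following, 1),
        # Network-based (provenance)
        "trusted_source": trusted_source,
    }

features = spam_features(
    "Win FREE shoes #Cairo http://example.com/shop",
    account_age_days=2, followers=5, following=500, trusted_source=False,
)
```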


Monday, June 6, 2011
Trust  in  Social  Networks

       Reputation, Policy, Evidence, and Provenance used
       to derive trustworthiness.
       Illustrative examples of online cues used for trust
       assessment.

      ‣ Wikipedia: article size, number of references, author, edit
            history, age of the article, edit frequency etc.
      ‣ Product Reviews: number of helpful, very helpful ratings,
            author expertise, sentiments in comments received for a
            review etc.


Monday, June 6, 2011
Trust  in  Social  Networks

       We propose a trust ontology [AHTS-10] that

      ‣ Captures semantics of trust.
      ‣ Enables representation and reasoning with trust.
       Semantics of Trust specifies, for a given trustor and
       trustee, the following features.

      ‣ Type - Type of trust relationship.
      ‣ Scope - Context of the trust relationship.
      ‣ Value - Quantifies the trust relationship.

Monday, June 6, 2011
Trust  in  Social  Networks

       Gleaning primitive (edge) trust

      ‣ Trust value between two nodes is quantified using
            numbers, e.g., [0,1] or [-1,1], or a partial ordering [TAHS-09].
       Gleaning composite (path) trust

      ‣  Propagation via chaining and aggregation (transitivity)
       Some popular algorithms for trust computation 

      ‣ Eigentrust, Spreading Activation, SUNNY etc.
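Chaining and aggregation can be illustrated with one simple pair of rules: multiply edge trust values along a path, then take the maximum over alternative paths. These rules are assumed for the example only and are not how Eigentrust, Spreading Activation, or SUNNY compute trust. The edge values are made up and lie in [0,1].

```python
def path_trust(edge_trust, path):
    """Chaining: multiply the trust values of consecutive edges on a path."""
    value = 1.0
    for u, v in zip(path, path[1:]):
        value *= edge_trust[(u, v)]
    return value

def aggregate_trust(edge_trust, paths):
    """Aggregation: keep the most trusted of several alternative paths."""
    return max(path_trust(edge_trust, p) for p in paths)

# Hypothetical primitive (edge) trust values.
edge_trust = {
    ("a", "b"): 0.9, ("b", "d"): 0.5,
    ("a", "c"): 0.6, ("c", "d"): 0.8,
}
trust_a_d = aggregate_trust(edge_trust, [["a", "b", "d"], ["a", "c", "d"]])
```

With multiplication, trust can only decay along a chain, which matches the intuition that longer referral paths are weaker evidence.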


Monday, June 6, 2011
Integrating  Social  And  
                          Sensor  Networks
       Machine sensor observations are quantitative in
       nature, while human observations can be both
       qualitative and quantitative.
       Benefits of combining observations from humans
       and machine sensors

      ‣ Complementary evidence.
      ‣ Corroborative evidence


Monday, June 6, 2011
Integrating  Social  And  
                          Sensor  Networks
       Applications of integrating heterogeneous sensor
       observations

      ‣ Situation Awareness by using  human observations to
            interpret machine sensor observations.
      ‣ Enhancing trustworthiness using corroborative evidence.




Monday, June 6, 2011
Mobile  Social  Computing

        Instant Discovery: Geo-tagging and location-aware
       services, in combination with search, have
       made discovery a two-way street.
        Compressed Expression: Mobile makes social
       networking even more compelling
        Outsourced Memory: Users rely on cloud-based servers to
       store all of their mobile applications and
       databases

Monday, June 6, 2011
Mobile  Social  Computing

         Automated Decisions: Smart apps help us make
         faster decisions, or even make decisions for
         us
         Peer Power: Mobiles can create social movements
         based on peer influence




Monday, June 6, 2011
Mobile  Social  Computing  (Cont.)

       Personalized Branding: advertising is rapidly
       becoming personalized based on individuals' needs
       and preferences
       Mobiles are becoming an integral part of social
       development

      ‣ Coordination in disaster situations
      ‣ Health care delivery, especially in developing countries
      ‣ Elections and other forms of political expression

Monday, June 6, 2011
Research  Application:  Twitris




Monday, June 6, 2011
Twitris - Motivation

       1. Information Overload
       Multiple events around us
       WHAT to be aware of
       Multiple Storylines about same event!!




Monday, June 6, 2011
Twitris - Motivation

       2. Evolution of Citizen Observation

      ‣ with location and time 




Monday, June 6, 2011
Twitris - Motivation

       3. Semantics of Social perceptions

           ‣ What is being said about an event (theme)
           ‣ Where (spatial)
           ‣ When (temporal)
       Twitris lets you browse citizen reports using social
       perceptions as the fulcrum



Monday, June 6, 2011
Twitris: Semantic Social Web Mash-up
      Facilitates understanding of multi-dimensional social perceptions over
         SMS, Tweets, multimedia Web content, electronic news media




Monday, June 6, 2011
Twitris:  Architecture




Monday, June 6, 2011
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 

Citizen Sensing, Social Media Analytics, and Applications

  • 1. Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications
      Tutorial at Semantic Technology Conference, San Francisco, CA.
      Karthik Gomadam (Accenture Technology Labs, San Jose), Amit Sheth (Kno.e.sis @ Wright State University), Selvam Velmurugan (eMoksha, Kiirti)
      Monday, June 6, 2011
  • 2. Meena Nagarajan (Content Analysis); Selvam Velmurugan (Kiirti, eMoksha NGOs); Hemant Purohit (People & Network Analysis); Amit Sheth (Semantic Web); Ashutosh Jadhav (Event Analysis); Lu Chen (Sentiment Analysis); Pramod Anantharam (Social & Sensor Web); Pavan Kapanipathi (Real Time Web)
  • 3. A Quick Word
      Much of the work discussed in this tutorial is the doctoral research of Dr. Meena Nagarajan, currently at IBM Almaden. It also includes current work done at the Kno.e.sis center at Wright State University.
  • 4. Outline
      Citizen Sensing: Role, Enablers, Apps
      Systematic Study of Social Media
      Citizen Sensing @ Real-time
      Emerging Research Areas
        ‣ Spam and Trust in Social Media, Mobile Social Computing
      Research Application: Twitris
      Tutorial part 2
  • 5. Citizen Sensing
      Everyday users of Web 2.0 and social networks: citizens of an Internet- or Web-enabled social community
      Observations and information reported by citizens => Citizen Sensing
      Human-in-the-loop (participatory) sensing + Web 2.0 + mobile computing = emergence of citizen-sensor networks
  • 6. Social Signals
      The activity of observing, reporting and disseminating information via text, audio, video and built-in device sensors (and smart devices)
        ‣ Creating social signals through aggregation, enhancement, analysis, visualization, and interpretation
      Immense potential to disseminate information quickly and in real time
  • 7. Enablers: Mobile Devices & Ubiquitous Connectivity
      Mobile device fast emerging as our primary tool
        ‣ Redefines the way we engage with people, information, etc.
      Global, ubiquitous, always available
      Sense where you are, how you are, …
  • 11. Enablers: Mobile Devices & Ubiquitous Connectivity
      Mobile Platforms Hit Critical Mass
        ‣ Over 5 billion users
        ‣ 1+B with internet connected mobile devices (2010)
        ‣ Smartphones > Notebooks + Netbooks (2010E)
        ‣ 500K+ mobile phone applications
        ‣ 74% of mobile phone users (2.4B) worldwide texted (2007)
  • 12. Enablers: Web 2.0 & Social Media
      500M+ Facebook users
      100M+ Twitter users, 85M+ tweets/day
      Internet users: 1.8 billion
      Content dissemination medium
        ‣ Even for traditional media (@cnn, @nytimes)
  • 17. Enablers: Web 2.0 & Social Media
      Types of UGC: Twitter (text/microblogs), Facebook (multimedia), YouTube (videos), Flickr (images), Blogs (text), Ping (social network for music)
  • 21. Citizen Sensors in Action
      Iran election
      Haiti earthquake
      US healthcare debate
  • 22. Revolution 2.0: Political/Social Activism
      “If you want to liberate a government, give them the internet.” - Wael Ghonim (Egyptian social activist)
      When Blitzer asked “Tunisia, then Egypt, what’s next?,” Ghonim replied succinctly “Ask Facebook.”
  • 25. Citizen Journalism
      Twitter Journalism
  • 26. Social Media Influence: Intelligence, News & Analysis
      Many media companies use Facebook and Twitter as news-delivery platforms.
      Many individuals rely on them as a news source.
      News is increasingly social.
  • 27. Business Intelligence: Trend Spotting, Forecasting, Brand Tracking and Crisis Management
      Sysomos: http://www.sysomos.com/
      Trendspotting: http://trendspotting.com
      Simplify: http://simplify360.com/
      Shoutlet: http://www.shoutlet.com/
      Reputation (Defender): http://www.reputationdefender.com/
  • 28. Development (Education, Health, eGov)
      LiveMocha (http://www.livemocha.com/)
        ‣ Online language learning tool with social engagement
        ‣ Bridging the gap!!
      Soliya (http://www.soliya.net/)
        ‣ Dialogue between students from diverse backgrounds across the globe using the latest multimedia technologies
      Project Einstein (http://digital-democracy.org/what-we-do/programs/)
        ‣ A photography-based digital penpal program connecting youths in refugee camps to the world
  • 32. Development (Education, Health, eGov)
      PatientsLikeMe (http://mashable.com/2010/07/13/social-media-health-trends/)
      TrialX (http://trialx.com)
      Image: http://www.dragonsearchmarketing.com/blog/social-media-development-through-visual-aids-tools/
  • 33. Why People-Content-Network metadata?
  • 34. Dimensions of Systematic Study of Social Media
      Spatio-Temporal-Thematic + People-Content-Network
  • 35. Social Information Processing
      "Who says what, to whom, why, to what extent and with what effect?" [Lasswell]
      Network: social structure emerges from the aggregate of relationships (ties)
      People: poster identities, the active effort of accomplishing interaction
      Content: studying the content of communication
  • 36. Studying Online Human Social Dynamics
      How does the (semantics or style of) content fit into the observations made about the network?
        ‣ Often, the three-dimensional dynamic of people, content and link structure is what shapes the social dynamic.
  • 38. Studying Online Human Social Dynamics
      Example: how do the topic of discussion, the emotional charge of a conversation, the presence of an expert and the connections between participants together explain information propagation in a social network?
  • 40. Metadata/Annotations
      Metadata: an organized way to study
        ‣ types
        ‣ creation/extraction and storage
        ‣ use
  • 41. The Anatomy of a Tweet
  • 42. People Metadata: Variety of Self-expression Modes on Multiple Social Media Platforms
      Explicit information from user profiles
        ‣ User names, pictures, videos, links, demographic information, group memberships...
        ‣ Often is not updated
      Implicit information from user attention metadata
        ‣ Page views, Facebook 'Likes', comments; Twitter 'Follows', retweets, replies..
  • 43. People Metadata: Various Levels
      Demographic
      Interests
      Activity
      Network
  • 44. People Metadata: Continued
      User Demographic Metadata
        ‣ User-id
        ‣ Screen/display name of user
        ‣ Real name of user
        ‣ Location
        ‣ Profile creation date
        ‣ User description
        ‣ User bio
        ‣ URL
      Interest Level Metadata
        ‣ Author type: trustee/donor, journalist, blogger, scientist etc.
        ‣ Favorite tweets
        ‣ Types of lists subscribed
        ‣ Style of writing: personality indicator
        ‣ No. of followees
        ‣ Author type trend of followees
  • 45. People Metadata: Continued
      Activity Level Metadata
        ‣ Age of the profile
        ‣ Frequency of posts
        ‣ Timestamp of last status
        ‣ No. of posts
        ‣ No. of lists/groups created
        ‣ No. of lists/groups subscribed
      Influence Level Metadata (inferring people metadata from network-level information)
        ‣ No. of followers: normal, influential
        ‣ No. of mentions
        ‣ No. of retweets/forwards
        ‣ No. of replies
        ‣ No. of lists/groups following
        ‣ No. of people following back
        ‣ Authority & hub scores
      Web Presence
        ‣ User affiliations
        ‣ KLOUT score: influence measure (www.klout.com)
  • 46. Content Metadata
      Content independent metadata
        ‣ date, location, author etc.
      Content dependent metadata
        ‣ Direct content-based metadata
          ‣ Explicit/mentioned content metadata
            ‣ named entities in content
          ‣ Implicit/inferred content metadata
            ‣ related named entities from knowledge sources
        ‣ Indirect content-based metadata (external metadata)
          ‣ context inferred from URLs in content (images, links to articles, FourSquare checkins etc.)
  • 49. Content Independent Metadata
      For tweets
        ‣ Published date and time
        ‣ Location (where the tweet was generated from)
        ‣ Tweet posting method (smart-phone, twitter.com, clients for Twitter)
        ‣ Author information
  • 51. Content Independent Metadata
      For text messages
        ‣ Published date and time
        ‣ Origin location
        ‣ Recipient
        ‣ Carrier information
  • 54. Content Dependent Metadata (Tweet)
      Direct content-based metadata
      Indirect content-based metadata (external metadata)
  • 55. Content Dependent Metadata
      Direct content-based metadata
  • 56. Network Metadata
      Connections/relationships (foundation for the network) matter!
      Structure Level Metadata
        ‣ Community size
        ‣ Community growth rate
        ‣ Largest strongly connected component size
        ‣ Weakly connected components & max. size
        ‣ Average degree of separation
        ‣ Clustering coefficient
      Relationship Level Metadata
        ‣ Type of relationship
        ‣ Relationship strength
        ‣ User homophily based on a certain characteristic (e.g., location, interest etc.)
        ‣ Reciprocity: mutual relationship
        ‣ Active community/ties
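Two of the measures listed on this slide, reciprocity and the clustering coefficient, can be computed directly from a list of "follows" edges. The sketch below is only an illustration: the edge list and user names are made up, and the definitions used (share of directed edges that are reciprocated; average local clustering on the undirected version of the graph) are common textbook choices rather than anything the tutorial specifies.

```python
from itertools import combinations


def reciprocity(follows):
    """Fraction of directed edges (a, b) whose reverse edge (b, a) also exists."""
    edges = set(follows)
    if not edges:
        return 0.0
    mutual = sum(1 for (a, b) in edges if (b, a) in edges)
    return mutual / len(edges)


def average_clustering(follows):
    """Average local clustering coefficient, treating every edge as undirected."""
    neighbors = {}
    for a, b in follows:
        if a == b:
            continue  # ignore self-loops
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    if not neighbors:
        return 0.0
    total = 0.0
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        # Count pairs of neighbors that are themselves connected.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        total += 2 * links / (k * (k - 1))
    return total / len(neighbors)


# Hypothetical "who follows whom" edges.
follows = [("ann", "bob"), ("bob", "ann"), ("bob", "cat"),
           ("cat", "ann"), ("dan", "ann")]
print(reciprocity(follows))
print(average_clustering(follows))
```

Treating the graph as undirected for clustering is a simplification; a directed variant would distinguish follower and followee triangles.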
  • 57. Metadata: Creation, Extraction and Storage
  • 58. Metadata Creation & Extraction
      Extracted metadata
        ‣ Directly visible information from the user profile, tweet content & community structure
      Created metadata
        ‣ Obtained after processing information in the user profile, content and/or network structure
  • 59. An Example
      Length: 144 characters; General topic: Egypt protest
      This poor {sentiment_expression: {target: "Lara Logan", polarity: "negative"}} woman! RT @THR CBS News' {entity: {type="News Agency"}} Lara Logan {entity: {type="Person"}} Released From Hospital {entity: {type="Location"}} After Egypt {entity: {type="Country"}} Assault {type="topic"} http://bit.ly/dKWTY0 {external_URL}
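One way to hold the annotations from this example is as stand-off metadata: the tweet text is stored once, and each annotation records the span it applies to. The field names (`span`, `kind`, `type`) and the two helper functions below are assumptions made for illustration; the slides do not prescribe a storage format.

```python
# The annotated tweet from the slide, stored as plain text plus stand-off annotations.
tweet = {
    "text": ("This poor woman! RT @THR CBS News' Lara Logan Released From "
             "Hospital After Egypt Assault http://bit.ly/dKWTY0"),
    "general_topic": "Egypt protest",
    "annotations": [
        {"span": "This poor", "kind": "sentiment_expression",
         "target": "Lara Logan", "polarity": "negative"},
        {"span": "CBS News", "kind": "entity", "type": "News Agency"},
        {"span": "Lara Logan", "kind": "entity", "type": "Person"},
        {"span": "Hospital", "kind": "entity", "type": "Location"},
        {"span": "Egypt", "kind": "entity", "type": "Country"},
        {"span": "Assault", "kind": "topic"},
        {"span": "http://bit.ly/dKWTY0", "kind": "external_URL"},
    ],
}


def spans_by_kind(tweet, kind):
    """Return the annotated spans of one kind, in annotation order."""
    return [a["span"] for a in tweet["annotations"] if a["kind"] == kind]


def missing_spans(tweet):
    """Return annotated spans that do not actually occur in the tweet text."""
    return [a["span"] for a in tweet["annotations"] if a["span"] not in tweet["text"]]


print(spans_by_kind(tweet, "entity"))
print(missing_spans(tweet))
```

Keeping the text and the annotations separate means extracted metadata (visible in the text) and created metadata (sentiment, topic) can sit side by side without altering the original post.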
  • 60. Why is the Semantic Web a standard for social metadata?
      Rich Snippets, RDFa, Open Graph: Semantic Web based social data standards
      Relationships/connections play a central role
        ‣ Treating relationships as first-class objects is important
  • 61. Semantic Web: A Very Short Primer
  • 62. Semantic Web: A Very Short Primer
      Representation
        ‣ RDF
          ‣ relationships as first-class objects: <subject, predicate, object>
        ‣ OWL
          ‣ Representing knowledge and agreements: nomenclature, taxonomy, folksonomy, ontology
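The <subject, predicate, object> model can be illustrated without any RDF library: a graph is a set of triples, and a query is a triple pattern with wildcards. This is a toy sketch only; the `ex:` identifiers are hypothetical, and a real application would use an RDF store and SPARQL rather than the `match` helper assumed here.

```python
# A tiny in-memory triple set; every identifier below is a made-up example.
triples = {
    ("ex:alice", "ex:follows", "ex:bob"),
    ("ex:bob", "ex:follows", "ex:alice"),
    ("ex:alice", "ex:posted", "ex:tweet1"),
    ("ex:tweet1", "ex:mentions", "ex:LaraLogan"),
    ("ex:LaraLogan", "rdf:type", "ex:Person"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Because the relationship is a first-class element, it can be queried directly.
print(match(triples, p="ex:follows"))
print(match(triples, s="ex:tweet1"))
```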
  • 66. Semantic Web: A Very Short Primer
      Annotation
        ‣ RDFa, XLink, model reference
      Web of Data
        ‣ Linked Open Data
      Querying
        ‣ SPARQL; Rules: SWRL, RIF
  • 67. How to save and use metadata?
      Store metadata as data and use standard database techniques
      Use filtering and clustering, summarization, statistics: implicit semantics
  • 71. How to save and use metadata?
      Use explicit semantics and Semantic Web standards and technologies
        ‣ semantics = meaning
        ‣ richer representation, support for relationships, context
        ‣ supports use of background knowledge
        ‣ better integration, powerful analysis
      Semantics: the implicit, the formal and the powerful
      Social metadata on the Web
  • 72. Metadata Extraction from Informal Text
      Meena Nagarajan, Understanding User-Generated Content on Social Media, Ph.D. Dissertation, Wright State University, 2010
  • 73. Characteristics of Text on Social Media
  • 74. The Formality of Text
  • 75. Content Analysis: Typical Sub-tasks
      Recognize key entities mentioned in content
        ‣ Information extraction (entity recognition, anaphora resolution, entity classification..)
        ‣ Discovery of semantic associations between entities
      Topic classification, aboutness of content
        ‣ What is the content about?
      Intention analysis
        ‣ Why did they share this content?
  • 80. Content Analysis: Typical Sub-tasks
      Sentiment analysis
        ‣ What opinions are people conveying via the content?
      Author profiling
        ‣ What can we infer about the author from the content he or she posts?
      Context (external to content) extraction
        ‣ URL extraction, analyzing external content
  • 81. Research Efforts, Contributions in this space..
      Examining the usefulness of multiple context cues for text mining algorithms
        ‣ Compensating for informal, highly variable language and lack of context
        ‣ Using context cues: document corpus, syntactic and structural cues, social medium, external domain knowledge…
      In this talk, highlighting sample metadata creation tasks: NER, Key Phrase Extraction, Intention, Sentiment/Opinion Mining
  • 82. Part 1. NER, Key Phrase Extraction
      Named Entity Recognition
        ‣ I loved <movie> the hangover </movie>!
      Key Phrase Extraction
  • 83. Multiple Context Cues Utilized for NER in Blogs and MySpace
  • 84. Multiple Context Cues Utilized for Keyphrase Extraction from Twitter, Facebook and MySpace
  • 85. Focus, Impact
      Techniques focus on
        ‣ relatively less explored content aspects on social media platforms
      Combination of top-down and bottom-up analysis for informal text
        ‣ Statistical NLP, ML algorithms over large corpora
        ‣ Models and rich knowledge bases in a domain
  • 86. NAMED ENTITY RECOGNITION
  • 87. NAMED ENTITY RECOGNITION
      I loved your music Yesterday!
      “It was THE HANGOVER of the year..lasted forever.. So I went to the movies..bad choice picking “GI Jane” worse now”
  • 88. NAMED ENTITY RECOGNITION
      Identifying and classifying tokens
  • 89. NER in prior work vs. NER for Informal Text
  • 90. NER focus in this work: Cultural Named Entities
      Artifacts of culture
        ‣ Names of books, music albums, films, video games, etc.
      Common words in a language
        ‣ The Lord of the Rings, Lips, Crash, Up, Wanted, Today, Twilight, Dark Knight…
  • 91. Characteristics of Cultural Entities
      Varied senses, several poorly documented
        ‣ Merry Christmas covered by 60+ artists
        ‣ Star Trek: movies, TV series, media franchise.. and cuisines!!
      Changing contexts with recent events
        ‣ The Dark Knight reference to Obama, health care reform
      Unrealistic expectations
        ‣ Comprehensive sense definitions, enumeration of contexts, labeled corpora for all senses..
        ‣ NER: relaxing the closed-world sense assumptions
  • 92. NER in prior work vs. NER for Informal Text
  • 93. A Spot and Disambiguate Paradigm
      NER is generally a sequential prediction problem
        ‣ NER system that achieves 90.8 F1 score on the CoNLL-2003 NER shared task (PER, LOC, ORGN entities) [Lev Ratinov, Dan Roth]
      Focus of approach: Spot and Disambiguate paradigm
      Starting off with a dictionary or list of entities we want to spot
  • 94. A Spot and Disambiguate Paradigm
      Spot, then disambiguate in context (natural language, domain knowledge cues)
      Binary classification
      Is this mention of “the hangover” in a sentence referring to a movie?
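The two stages can be sketched as a dictionary spotter followed by a yes/no decision per spot. Everything concrete below is an assumption for illustration: the title list, the cue words and the fixed context window stand in for the learned classifier and domain knowledge the tutorial refers to, and a hand-written cue list like this will misfire on many real sentences.

```python
import re

# Hypothetical dictionary of entities to spot and hand-picked context cues.
MOVIE_TITLES = ["the hangover", "twilight"]
MOVIE_CUES = {"watch", "watching", "watched", "movie", "movies",
              "film", "films", "cinema", "theater"}


def spot(text, titles):
    """Stage 1: return (title, start, end) for every case-insensitive dictionary match."""
    spots = []
    for title in titles:
        pattern = r"\b" + re.escape(title) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spots.append((title, m.start(), m.end()))
    return sorted(spots, key=lambda s: s[1])


def is_movie_mention(text, start, end, window=4):
    """Stage 2: binary decision based on cue words near the spot."""
    before = re.findall(r"\w+", text[:start].lower())[-window:]
    after = re.findall(r"\w+", text[end:].lower())[:window]
    return any(word in MOVIE_CUES for word in before + after)


def annotate(text):
    """Spot every dictionary entry, then label each spot as movie / not movie."""
    return [(title, is_movie_mention(text, s, e))
            for title, s, e in spot(text, MOVIE_TITLES)]


print(annotate("We watched The Hangover at the cinema"))
print(annotate("It was THE HANGOVER of the year..lasted forever"))
```

The point of the split is that the cheap spotter keeps recall high, while the second stage only has to answer one narrow question per candidate.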
  • 95. NER in prior work vs. NER for Informal Text
  • 96. Algorithmic Contributions: Supervised Algorithms
  • 97. Algorithmic Contributions: Supervised Algorithms
      Examples:
      “I am watching Pattinson scenes in <movie id=2341>Twilight</movie> for the nth time.”
      “I spent a romantic evening watching the Twilight by the bay..”
      “I love <artist id=357688>Lily’s</artist> song
  • 98. Multiple Senses in the Same Domain
  • 99. Algorithm Preliminaries
      Problem definition
        ‣ Cultural entity identification: music albums, tracks
        ‣ Smile (Lily Allen), Celebration (Madonna)
      Corpus: MySpace comments
        ‣ Context-poor utterances
        ‣ “Happy 25th Lilly, Alfie is funny”
  • 102. Algorithm Preliminaries
      Goal: semantic annotation of music named entities (w.r.t. MusicBrainz)
  • 103. Using a Knowledge Resource for NER is not straightforward..
  • 104. Approach Overview
      Scoped relationship graphs
        ‣ Using context cues from the content, webpage title, URL… (new Merry Christmas tune)
        ‣ Reduce potential entity spot size (new albums/songs)
        ‣ Generate candidate entities
        ‣ Spot and disambiguate
  • 105. Sample Real-world Constraints
      Career restrictions
        ‣ “release your third album already..”
      Recent album restrictions
        ‣ “I loved your new album..”
      Artist age restrictions
        ‣ “happy 25th rihanna, loved alfie btw..”
      etc.
  • 106. Non-Music Mentions
      Challenge 1: Several senses in the same domain
        ‣ Scoping relationship graphs narrows possible senses
        ‣ Solves the named entity identification problem partially
      Challenge 2: Non-music mentions
        ‣ Got your new album Smile. Loved it!
        ‣ Keep your SMILE on!
  • 108. Using Language Features to eliminate incorrect mentions..
      Syntactic features
        ‣ POS tags, typed dependencies..
        ‣ Example here
      Word-level features
        ‣ Capitalization, quotes
      Domain-level features
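The word-level cues named here (capitalization, quotes) are cheap to compute once a spot's character offsets are known. The sketch below is an assumption about how such features might be encoded; the feature names are invented, and the syntactic and domain-level features from the slide would need a parser and a knowledge base, so they are left out.

```python
QUOTE_CHARS = "\"'“”‘’"


def word_level_features(text, start, end):
    """Capitalization and quotation features for the spot text[start:end]."""
    span = text[start:end]
    words = span.split()
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    return {
        # Whole spot written in capitals, e.g. SMILE.
        "all_caps": span.isupper(),
        # Every word starts with a capital letter, e.g. Smile or Dark Knight.
        "title_case": bool(words) and all(w[0].isupper() for w in words),
        # Spot is immediately wrapped in quotation marks.
        "in_quotes": bool(before) and bool(after)
                     and before in QUOTE_CHARS and after in QUOTE_CHARS,
    }


album = 'Got your new album "Smile". Loved it!'
s = album.index("Smile")
print(word_level_features(album, s, s + len("Smile")))

shout = "Keep your SMILE on!"
s2 = shout.index("SMILE")
print(word_level_features(shout, s2, s2 + len("SMILE")))
```

Feature dictionaries of this shape can be fed to any off-the-shelf classifier alongside syntactic features.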
  • 110. Hand Labeling: Fairly Subjective
      1800+ spots in MySpace user comments from artist pages
      Keep your SMILE on!
        ‣ good spot, bad spot, inconclusive?
      4-way annotator agreements
        ‣ Madonna: 90% agreement
        ‣ Rihanna: 84% agreement
        ‣ Lily Allen: 53% agreement
  • 111. Dictionary Spotter + NLP Step
      Daniel Gruhl, Meena Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Context and Domain Knowledge Enhanced Entity Spotting in Informal Text, The 8th International Semantic Web Conference, 2009: 260-276
  • 112. NER on Social Media Text using Domain Knowledge
      Highlights issues with using domain knowledge for an IE task
      Two-stage approach: chaining NL learners over the results of domain-model-based spotters
      Improves accuracy by up to a further 50%
        ‣ allows the more time-intensive NLP analytics to run on less than the full set of input data
  • 113. BBC SoundIndex (IBM Almaden): Pulse of the Online Music
      Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: “Multimodal Social Intelligence in a Real-Time Dashboard System,” special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010
      http://www.almaden.ibm.com/cs/projects/iis/sound/
  • 114. The Vision
      http://www.almaden.ibm.com/cs/projects/iis/sound/
  • 116. Several Insights
      Trending popularity of artists
      Trending topics in artist pages
      Only 4% -ve sentiments, perhaps ignore the Sentiment Annotator on this data source?
      Ignoring spam can change the ordering of popular artists
  • 117. Predictive Power of Data
      Billboard's Top 50 Singles chart during the week of Sept 22-28 ’07 vs. MySpace popularity charts
      User study indicated a 2:1 and up to 7:1 (younger age groups) preference for the MySpace list
      Challenging traditional polling methods!
  • 119. Key  Phrase  Extraction:  Example Key phrases extracted from prominent discussions on Twitter around the 2009 Health Care Reform debate and 2008 Mumbai Terror Attack on one day Monday, June 6, 2011
  • 120. Key  Phrase  Extraction  from  SM   Text Different from Information Extraction Extracting vs. Assigning Key Phrases " Focus: Key Phrase Extraction Prior work focus: extracting phrases that summarize a document -- a news article, a web page, a journal article, a book.. Focus: summarize multiple documents (UGC) around same event/topic of interest Monday, June 6, 2011
  • 121. Key  Phrase  Extraction  on  SM   Content Focus: Summarizing Social Perceptions via key phrase extraction Preserving/Isolating the social behind the social data ‣"What is said in Egypt vs. the USA should be viewed in isolation Monday, June 6, 2011
  • 122. Key  Phrase  Extraction  on  SM   Content ‣ Accounting for redundancy, variability, off-topic content " “Met up with mom for lunch, she looks lovely as ever, good genes .. Thanks Nike, I love my new Gladiators ..smooth as a feather. I burnt all the calories of Italian joy in one run.. if you are looking for good Italian food on Main, Bucais the place to go.” Monday, June 6, 2011
  • 123. Social  and  Cultural  Logic  in  SMC Thematic components ‣ similar messages convey similar ideas Space, time metadata ‣ role of community and geography in communication Poster attributes ‣ age, gender, socio-economic status reflect similar perceptions Monday, June 6, 2011
  • 124. Feature  Space  (common  to  several   efforts) Focus: n-grams, spatio-temporal metadata (social components) Syntactic Cues: In quotes, italics, bold; in document headers; phrases collocated with acronyms Monday, June 6, 2011
  • 125. Feature  Space  (common  to  several   efforts) Document and Structural Cues: Two word phrases, appearing in the beginning of a document, frequency, presence in multiple similar documents etc. Linguistic Cues: Stemmed form of a phrase, phrases that are simple and compound nouns in sentences etc. Monday, June 6, 2011
  • 126. Key Phrase Extraction: Overview “President Obama in trying to regain control of the health-care debate will likely shift his pitch in September” 1-grams: President, Obama, in, trying, to, regain, ... 2-grams: “President Obama”, “Obama in”, “in trying”, “trying
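The n-gram enumeration on this slide can be sketched in a few lines. This is a minimal illustration that assumes whitespace tokenization; the function name `ngrams` is hypothetical and the slides do not say how the original system tokenized text.

```python
def ngrams(text, n):
    """Return the word n-grams of `text`, in order of appearance."""
    tokens = text.split()  # assumption: plain whitespace tokenization
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "President Obama in trying to regain control of the health-care debate"
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
```

Each n-gram produced this way is a candidate descriptor; the weighting by thematic, spatial and temporal importance is a separate step.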
  • 127. A descriptor is an n-gram weighted by: ‣ Thematic Importance ‣ TFIDF, stop words, noun phrases ‣ Redundancy: statistically discriminatory in nature ‣ variability: contextually important ‣ Spatial Importance (local vs. global popularity) ‣ Temporal Importance (always popular vs. currently trending) Monday, June 6, 2011
  • 129. Eliminating Off-topic Content [WISE2009] Frequency based heuristics will not eliminate off-topic content that is ALSO POPULAR Monday, June 6, 2011
  • 130. Approach Overview “Yeah i know this a bit off topic but the other electronics forum is dead right now. im looking for a good camcorder, somethin not to large that can record in full HD only ones so far that ive seen are sonys” “Canon HV20. Great little cameras under $1000.”
  • 131. Approach  Overview Assume one or more seed words (from domain knowledge base) C1 -['camcorder'] Extracted Key words / phrases C2 -['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic'] Gradually expand C1 by adding phrases from C2 that are strongly associated with C1 Mutual Information based algorithm [WISE2009] Monday, June 6, 2011
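The gradual expansion of C1 from C2 can be sketched as a greedy loop. The slide only says the algorithm is mutual-information based [WISE2009]; the sketch below swaps in pointwise mutual information over a toy co-occurrence corpus, and the names `make_pmi` and `expand`, the threshold and the toy documents are all assumptions for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def make_pmi(docs):
    """Build a PMI function from per-document phrase lists (toy corpus)."""
    n = len(docs)
    single, pair = Counter(), Counter()
    for doc in docs:
        phrases = set(doc)
        single.update(phrases)
        pair.update(frozenset(p) for p in combinations(sorted(phrases), 2))

    def pmi(a, b):
        joint = pair[frozenset((a, b))]
        if joint == 0:
            return float("-inf")  # never seen together
        return math.log((joint * n) / (single[a] * single[b]))

    return pmi


def expand(seeds, candidates, pmi, threshold):
    """Greedily move the candidate most associated with C1 into C1."""
    c1 = list(seeds)
    remaining = [c for c in candidates if c not in c1]
    while remaining:
        best = max(remaining, key=lambda c: max(pmi(c, s) for s in c1))
        if max(pmi(best, s) for s in c1) < threshold:
            break  # nothing left that is strongly associated
        c1.append(best)
        remaining.remove(best)
    return c1


docs = [
    ["camcorder", "canon hv20", "hd"],
    ["camcorder", "canon hv20"],
    ["camcorder", "hd"],
    ["somethin", "ive"],
    ["ive", "offtopic"],
    ["somethin", "offtopic"],
]
pmi = make_pmi(docs)
c1 = expand(["camcorder"], ["canon hv20", "hd", "somethin", "ive", "offtopic"], pmi, 0.5)
```

On this toy corpus the on-topic phrases join C1 and the chatter terms stay behind in C2.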
  • 132. Key  Phrases  and  Aboutness   Evaluations Are the key phrases we extracted topical and good indicators of what the content is about? ‣ If it is, it should act as an effective index/search phrase and return relevant content Evaluation Application: Targeted Content Delivery Monday, June 6, 2011
  • 133. Targeted Content Delivery - Evaluations 12K posts from MySpace and Facebook Electronics forums ‣ Baseline phrases: Yahoo Term Extractor ‣ Our method phrases: Key phrase extraction, elimination Targeted Content from Google AdSense
  • 134. Targeted  Content  for  all  content   vs.  extracted  key  phrases Monday, June 6, 2011
  • 135. User  Studies  and  Results Monday, June 6, 2011
  • 136. Impact  and  Contributions TFIDF + social contextual cues yield more useful phrases that preserve social perceptions Corpus + seeds from a domain knowledge base eliminate off-topic phrases effectively Monday, June 6, 2011
  • 138. Targeted Content Delivery via Intention Mining On social networks Use case for this talk ‣ Targeted content = content-based advertisements ‣ Target = user profiles Content-based advertisements (CBAs) ‣ Well-known monetization model for online content
  • 139. Circa 2009: Content-based Ads
  • 140. Circa 2009: Ads on Profiles
  • 141. What is going on here Interests do not translate to purchase intents ‣ Interests are often outdated.. ‣ Intents are rarely stated on a profile.. Cases that do seem to work ‣ New store openings, sales ‣ Highly demographic-targeted ads
  • 142. Intents  in  User   Monday, June 6, 2011
  • 143. Content  Ads  Outside  Profiles Monday, June 6, 2011
  • 144. Targeted Content-based Advertising Non-trivial ‣ Non-policed content: brand image, unfavorable sentiments ‣ People are there to network: user attention to ads is not guaranteed ‣ Informal, casual nature of content ‣ People are sharing experiences and events: main message overloaded with off-topic content
  • 145. Targeted Content-based Advertising
  • 146. Targeted Content-based Advertising I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :( Learning from Multi-topic Web Documents for Contextual Advertisement, Zhang, Y., Surendran, A. C., Platt, J. C., and Narasimhan, M., KDD 2008
  • 147. Preliminary Results in… Identifying intents behind user posts on social networks ‣ Identify content with monetization potential Identifying keywords for advertising in user-generated content ‣ Considering interpersonal communication & off-topic chatter
  • 148. Investigations User studies ‣ Hard to compare activity based ads to s.o.t.a ‣ Impressions to Clickthroughs ‣ How well are we able to identify monetizable posts ‣ How targeted are ads generated using our keywords vs. entire user generated content
  • 149. Identifying Monetizable Intents Scribe Intent not same as Web Search Intent [1] People write sentences, not keywords or phrases Presence of a keyword does not imply navigational / transactional intents ‣ ‘am thinking of getting X’ (transactional) ‣ ‘I like my new X’ (information sharing) ‣ ‘what do you think about X’ (information seeking) [1] B. J. Jansen, D. L. Booth, and A. Spink, “Determining the informational, navigational, and transactional intent of web queries,” Inf. Process. Manage., vol. 44, no. 3, 2008.
  • 150. From X to Action Patterns Action patterns surrounding an entity ‣ How questions are asked and not topic words that indicate what the question is about ‣ “where can I find a chottopspcam” ‣ User post also has an entity
  • 151. Conceptual Overview: Bootstrapping to learn IS patterns Set of user posts from SNSs Not annotated for presence or absence of any intent
  • 152. Bootstrapping to learn IS patterns Generate a universal set of n-gram patterns; freq > f S = set of all 4-grams; freq > 3
  • 153. Bootstrapping to learn IS patterns Generate set of candidate patterns from seed words (why, when, where, how, what) Sc = all 4-grams in S that extract seed words
  • 154. Bootstrapping to learn IS patterns User picks 10 seed patterns from Sc Sis = ‘does anyone know how’, ‘where do I find’, ‘someone tell me where’…
  • 155. Bootstrapping to learn IS patterns Gradually expand Sis by adding Information Seeking patterns from Sc
  • 156. Bootstrapping to learn IS patterns For every pis in Sis generate set of filler patterns
  • 157. Bootstrapping to learn IS patterns ‘.* anyone know how’, ‘does .* know how’, ‘does anyone .* how’, ‘does anyone know .*’
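The filler patterns listed on slide 157 follow mechanically from a seed pattern: each word is replaced by a wildcard in turn. A minimal sketch, assuming single-word substitution as the slides describe; the helper names `filler_patterns` and `matches` are hypothetical.

```python
import re


def filler_patterns(pattern, wildcard=".*"):
    """Replace each word of a seed pattern by a wildcard, one at a time."""
    words = pattern.split()
    return [" ".join(words[:i] + [wildcard] + words[i + 1:])
            for i in range(len(words))]


def matches(filler, candidate, wildcard=".*"):
    """True if `candidate` fits `filler` with exactly one word in the slot."""
    parts = [r"\S+" if w == wildcard else re.escape(w) for w in filler.split()]
    return re.fullmatch(" ".join(parts), candidate) is not None


fillers = filler_patterns("does anyone know how")
```

Note that `matches` only tests the surface form, so ‘does john know how’ also fits ‘does .* know how’; rejecting it is the job of the functional compatibility and empirical support checks on the following slides.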
  • 158. Extracting and Scoring Patterns
  • 159. Extracting and Scoring Patterns • ‘does * know how’ – ‘does someone know how’ • Functional Compatibility - Impersonal pronouns • Empirical Support – 1/3 – ‘does somebody know how’ • Functional Compatibility - Impersonal pronouns • Empirical Support – 0 • Pattern retained – ‘does john know how’ • Pattern discarded
  • 160. Extracting and Scoring Patterns Sc = {‘does anyone know how’, ‘where do I find’, ‘someone tell me where’} pis = ‘does anyone know how’
  • 161. Extracting and Scoring Patterns pis = ‘does anyone know how’
  • 162. Extracting and Scoring Patterns
  • 163. Expanding the Pattern Pool Functional properties / communicative functions of words From a subset of LIWC (Linguistic Inquiry Word Count, http://liwc.net) – cognitive mechanical (e.g., if, whether, wondering, find) • ‘I am thinking about getting X’ – adverbs (e.g., how, somehow, where) – impersonal pronouns (e.g., someone, anybody, whichever) • ‘Someone tell me where can I find X’
  • 164. Details in [WISE2009] for.. Over iterations, single-word substitutions, functional usage and empirical support conservatively expand Sis Infusing new patterns and seed words Stopping conditions
  • 166. Identifying Monetizable Posts Information Seeking patterns generated offline Information seeking intent score of a post ‣ Extract and compare patterns in posts with extracted patterns Transactional intent score of a post ‣ LIWC ‘Money’ dictionary - 173 words and word forms indicative of transactions, e.g., trade, deal, buy, sell, worth, price etc.
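The two scores on this slide can be approximated by simple counting. This is a sketch under stated assumptions: the lexicon holds only the six example words from the slide (the LIWC ‘Money’ dictionary has 173 entries), the pattern list holds the three seed patterns shown earlier, and the function names are hypothetical. The slides do not say how the two scores are combined, so none is attempted here.

```python
import re

# Assumption: only the example words given on the slide, not the full LIWC list.
MONEY_WORDS = {"trade", "deal", "buy", "sell", "worth", "price"}
# Assumption: the three seed patterns shown on the bootstrapping slides.
IS_PATTERNS = ["does anyone know how", "where do i find", "someone tell me where"]


def tokenize(post):
    """Lowercase the post and keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", post.lower())


def information_seeking_score(post, patterns=IS_PATTERNS):
    """Count how many known information-seeking patterns occur in the post."""
    text = " ".join(tokenize(post))
    return sum(1 for p in patterns if p in text)


def transactional_score(post, lexicon=MONEY_WORDS):
    """Count tokens of the post that appear in the money lexicon."""
    return sum(1 for tok in tokenize(post) if tok in lexicon)


post = "Does anyone know how much a used camcorder is worth? Looking to buy one."
is_score = information_seeking_score(post)
tx_score = transactional_score(post)
```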
  • 168. Keywords for Advertising Identifying keywords in monetizable posts ‣ Plethora of work in this space Off-topic noise removal is our focus ‣ I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :(
  • 169. Conceptual Overview (also see slides 88, 89) Topical hints ‣ C1 - ['camcorder'] Keywords in post ‣ C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic'] Move strongly related keywords from C2 to C1 one-by-one ‣ Relatedness determined using information gain ‣ Using the Web as a corpus, domain independent
  • 170. Off-topic Chatter C1 - ['camcorder'] C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic'] Informative words ‣ ['camcorder', 'canon hv20', 'little camera', 'hd', 'cameras', 'canon']
  • 171. Evaluations - User Study Keywords from 60 monetizable user posts ‣ Monetizable intent, at least 3 keywords in content 45 MySpace Forums, 15 Facebook Marketplace, 30 graduate students ‣ 10 sets of 6 posts each ‣ Each set evaluated by 3 randomly selected users Monetizable intents? ‣ All 60 posts voted as unambiguously information seeking in intent
  • 172. 1. Effectiveness of using topical keywords Google AdSense ads for user post vs. extracted topical keywords
  • 174. Result - 2X Relevant Impressions Users picked ads relevant to the post ‣ At least 50% inter-evaluator agreement For the 60 posts ‣ Total of 144 ad impressions ‣ 17% of ads picked as relevant For the topical keywords ‣ Total of 162 ad impressions ‣ 40% of ads picked as relevant
  • 175. 2.  Profile  Ads  vs.  Activity  Ads User’s profile information ‣ Interests, hobbies, TV shows.. ‣ Non-demographic information Submit a post Looking to buy and why (induced noise) Ads that generate interest, captured attention Monday, June 6, 2011
  • 176. Result - 8X Generated Interest Using profile ads ‣ Total of 56 ad impressions ‣ 7% of ads generated interest Using authored posts ‣ Total of 56 ad impressions ‣ 43% of ads generated interest Using topical keywords from authored posts ‣ Total of 59 ad impressions ‣ 59% of ads generated interest
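The "2X" and "8X" headlines are ratios of the reported percentages. A short check, using only the rates stated on slides 174 and 176; the helper name `lift` is hypothetical.

```python
def lift(baseline_rate, treatment_rate):
    """Ratio of the treatment rate to the baseline rate."""
    return treatment_rate / baseline_rate


# Slide 174: 17% relevant for full posts vs. 40% for topical keywords.
relevant_lift = lift(0.17, 0.40)
# Slide 176: 7% interest for profile ads vs. 59% for topical keywords.
interest_lift = lift(0.07, 0.59)
# Slide 176: 7% for profile ads vs. 43% for authored posts without keyword extraction.
posts_only_lift = lift(0.07, 0.43)
```

The ratios come out at roughly 2.35 and 8.43, which the slide titles round down to 2X and 8X; authored posts alone already give about a 6X lift over profile ads.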
  • 177. To  note… User studies small and preliminary, clearly suggest ‣ Monetization potential in user activity ‣ Improvement for Ad programs in terms of relevant impressions Evaluations based on forum, marketplace ‣ Verbose content ‣ Status updates, notes, community and event memberships… ‣ One size may not fit all Monday, June 6, 2011
  • 178. To  note… A world between relevant impressions and click throughs ‣ Objectionable content, vocabulary impedance, Ad placement, network behavior In a pipeline of other community efforts No profile information taken into account Cannot custom send information to Google AdSense Monday, June 6, 2011
  • 179. SENTIMENT  /  OPINION   MINING Monday, June 6, 2011
  • 180. Content Analysis: Sentiment Analysis/Opinion Mining Two main types of information we can learn from user-generated content: fact vs. opinion Much of what we read in social media (e.g., blogs, Twitter, Facebook) is a mix of facts and opinions. For example: Latest news: Mobile web services not working in #Bahrain and Internet is extremely slow #feb14 {fact}... looks like they "learned" from #Egypt {opinion}
  • 181. Sentiment Analysis Motivation ‣ Why do people oppose health care reform? ‣ Which movie should I see? ‣ What customers complain about?
  • 182. Sentiment Analysis: Tasks Example: ‣ How awful that many #Egyptian artifacts are in danger of being destroyed. ‣ What Zahi Hawass must be thinking #jan25 (read in the tone of “what were YOU thinking”)
  • 184. Sentiment Analysis: Tasks Classification: overall sentiment polarity: positive/neutral/negative ‣ Example: “How awful that many #Egyptian artifacts are in danger of being destroyed.” ‣ overall polarity is negative ‣ Target-specific sentiment polarity: positive/neutral/negative ‣ Example: for target "egyptian artifacts", polarity is "negative"; for target "Zahi Hawass", polarity is "neutral"
  • 186. Sentiment  Analysis:  Tasks Identification & Extraction: opinion, opinion holder, opinion target Example: opinion="awful", opinion holder="the author", target="egyptian artifacts are in danger" Opinion="must be thinking", opinion holder="the author", target="Zahi Hawass" Monday, June 6, 2011
  • 187. Sentiment  Analysis:  Approaches Classification: ‣ Supervised:  ‣ labeled training data ‣ features, differ from traditional topic classification tasks ‣ learning strategies ‣ Unsupervised: ‣ lexicon-based approach ‣ Bootstrapping Monday, June 6, 2011
  • 189. Sentiment  Analysis:  Approaches Identification & Extraction: ‣utilizing the relations between opinion and opinion target, ‣proximity, ‣syntactic dependency, ‣co-occurrence and ‣prepared patterns/rules Monday, June 6, 2011
  • 190. Sentiment Analysis: From Tweets to polls Corpus: • 0.7 billion tweets, Jan 2008 – Oct 2009 • 1.5 billion tweets, Jan 2008 – May 2010 Lexicon-based approach for sentiment analysis of tweets: ‣ subjective lexicon from OpinionFinder (Wilson et al., 2005) ‣ Within topic tweets, count messages containing these positive and negative words defined by the lexicon
  • 193. Sentiment Analysis: From Tweets to polls Corpus: • 0.7 billion tweets, Jan 2008 – Oct 2009 • 1.5 billion tweets, Jan 2008 – May 2010 B. O’Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith. From Tweets to polls: Linking text sentiment to public opinion time series. In Intl. AAAI Conference on Weblogs and Social Media, Washington, D.C., 2010.
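The lexicon-based counting described on these slides can be sketched as follows. The word lists here are tiny illustrative stand-ins, not the OpinionFinder lexicon, and the function name `sentiment_counts` is hypothetical. Taking the ratio of positive to negative message counts is one common way to turn the counts into a single score; the slides themselves only mention counting.

```python
# Assumption: toy lexicons for illustration only.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def sentiment_counts(tweets, topic):
    """Count topic tweets containing positive words and negative words.

    A tweet with both kinds of words is counted on both sides.
    """
    pos = neg = 0
    for tweet in tweets:
        words = set(tweet.lower().split())
        if topic not in words:
            continue  # only tweets about the topic are considered
        if words & POSITIVE:
            pos += 1
        if words & NEGATIVE:
            neg += 1
    return pos, neg


tweets = [
    "obama gave a great speech",
    "awful day for obama",
    "i love pizza",
    "obama good and bad",
    "obama speech was great",
]
pos, neg = sentiment_counts(tweets, "obama")
ratio = pos / neg if neg else float("inf")
```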
  • 194. Sentiment Analysis: Predicting the Future With Social Media Corpus: 2.89 million tweets referring to 24 movies released over a period of three months Sentiment Analysis Classifier: ‣ DynamicLMClassifier provided by LingPipe linguistic analysis package ‣ thousands of workers from the Amazon Mechanical Turk to assign sentiments (positive, negative, neutral) for a large random sample of tweets ‣ train the classifier using an n-gram model S. Asur and B. Huberman. Predicting the Future With Social Media. 2010. http://arxiv.org/abs/1003.5699
  • 200. Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach Simple lexicon-based method doesn't work. Observations: The opinions may not contribute toward the given target (1,2,3,6) The subjectivity and polarity of opinion clues are domain-dependent (5,7) Single words are not enough (4,7,8)
  • 201. Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach General subjective lexicon ‣ Commonly used subjective lexicon + popular slang learned from Urban Dictionary Domain-dependent sentiment lexicon ‣ Learned from domain-specific corpus ‣ bootstrapping ‣ More than words (word/phrase/pattern) ‣ n-gram + statistical model
  • 204. Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach
  • 205. Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach
  • 206. Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach
  • 207. Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach Target-specific opinion identification/extraction ‣ Shallow syntactic analysis ‣ Rules + Proximity
  • 208. Content Analysis: Context Extraction, Utilization URL Extraction is for Tweets FourSquare in Facebook, Twitter What is it in other mediums/SMS?
  • 209. Content  Analysis:   URL  extraction Resolution Semantic Context Relevance Monday, June 6, 2011
  • 210. Author  Categorization:  Using   Content  to  derive  additional   People  metadata Personality Signals Blogs, Style of Writing Psychometric analysis of content Sample study: Gendered writing styles online Monday, June 6, 2011
  • 211. People  Analysis:  Using  Network   to  derive  People  metadata Interesting questions to ask: ‣ Who are the most popular people* in the network ‣ Who are the most influential people in the network ‣ Who are the most active people in the network ‣ What are the types of people in communities of the network ‣ Who are the bridges between communities in the network Monday, June 6, 2011
  • 212. People  Analysis:  Influence By Link Analysis Algorithms Hits [K-99] & variants   PageRank [BP-97] & variants  etc.. Links not sufficient! ‣ Million Follower Fallacy [C-10] Source : informing-arts Monday, June 6, 2011
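As a concrete instance of the link-analysis family named on this slide, here is a minimal power-iteration PageRank over a follower graph. It is a textbook sketch, not the method of any cited paper; the function name, the damping factor of 0.85 and the toy graph are assumptions. An edge from a to c is read as "a follows c".

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; `links` maps a node to the nodes it points to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


follows = {"a": ["c"], "b": ["c"], "c": ["a"]}
rank = pagerank(follows)
top = max(rank, key=rank.get)
```

A score like this rewards accounts that attract links, which is exactly why the slide warns that links alone are not sufficient: a highly followed account is not necessarily an influential one.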
  • 214. People  Analysis:  Influence Flavor of Context Analysis (activity level) Popularity NOT = Influence! ‣ Influence & Passivity [RGAH-10] Interest Similarity ‣ TwitterRank: Reciprocity & Homophily [WLJH-10] Klout Score - True Reach, Amplification [Klout] Monday, June 6, 2011
  • 215. People  Analysis:  User  types   &  Affiliation Blogger, Scientist, Journalist, Artist, Trustee, Company X in  Domain Y.. ‣ Multiple types and affiliations! User interest mining ‣ Key Phrase Extraction followed by semantic association on user bio, tweets, lists, favorite posts Source: kahunainstitute.com ‣ Twitter Study [BCDMJNRM-09] Monday, June 6, 2011
  • 216. People  Analysis:  User  types   &  Affiliation Monday, June 6, 2011
  • 217. People  Analysis:  User  types   &  Affiliation Semantic analysis of profile description ‣ Web Presence: Use of Web & Knowledge bases (Wikipedia, Blogs) to build context for user types ‣ Entity Spotting & Extraction, followed by Semantic Association and Similarity with user-type context Monday, June 6, 2011
  • 218. People Analysis: Social Engagement Source: http://www.syscomminternational.com/ Frequency Distribution Analysis of user activity ‣ posting, retweet, reply, mentions, lists etc.
  • 219. Network Analysis Foundation of network: • Nodes • Connections/Relationships Interesting questions to ask: ‣ How communities form around topics: growth & evolution ‣ What are the effects of presence of influential participants in the communities ‣ What are the effects of content nature (or sentiment, opinions) flowing in network on the community life ‣ What is the community structure: degree of separation and sub-communities
  • 220. Network  Analysis:  Methods Source: http://www.kudos- dynamics.com/ Monday, June 6, 2011
  • 221. Network Analysis: Methods Network Structure metrics Centrality, Connected Component, Avg. Degree, Clustering Coefficient, Avg. Path Length, Bridge, Cohesion, Prestige, Reciprocity Important Literature: [AB-02, WS-98, BW-00, NW-06, WF-92, MW-10] Source: http://www.kudos-dynamics.com/
  • 222. Network  Analysis:  Algorithms   Community Discovery, growth, evolution ‣ Based on relationship types (e.g., signed network), geography/location based etc. Hierarchical clustering algorithms – Top-down, bottom-up Modularity Maximization [NW-06] Algorithms comparison survey [B-06] Monday, June 6, 2011
  • 223. Network Analysis: Algorithms Graph Partitioning & Traversal Best time-complexity & reachability Follow Greedy paths ‣ K-way multilevel Partitioning ‣ Bron-Kerbosch, K-plex, K-core or N-cliques, DFS, BFS, MST “We dream in Graph and We analyze in Matrix” - Barry Wellman, INSNA
  • 224. Network  Analysis:  Methods Network Modeling Approaches  ‣ Random graph model (Erdos-Renyi model) ‣ Small-world model (Small World Phenomenon)  ‣ Scale-free model (led to Power-Law degree distribution) ‣ Social Network Analysis methods ‣ Centrality (Degree, Eigenvector, Betweenness, Closeness) ‣ Clusters (Cliques and extensions, Communities) Source: http://www.kudos- dynamics.com/ Monday, June 6, 2011
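Two of the centrality measures listed here can be computed with nothing more than an adjacency map and a breadth-first search. A minimal sketch for an undirected, connected toy graph; the function names and the graph are assumptions for illustration.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {v: len(neighbors) / (n - 1) for v, neighbors in graph.items()}


def closeness_centrality(graph, source):
    """Reachable nodes divided by the sum of BFS distances from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Undirected toy network stored as symmetric adjacency sets.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}
degree = degree_centrality(graph)
closeness_a = closeness_centrality(graph, "a")
closeness_e = closeness_centrality(graph, "e")
```

Node a scores highest on both measures, while e, which hangs off the network through d, scores lowest on closeness.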
  • 225. Network  Analysis:   Diffusion  &  Homophily Information Flow: Diffusion ‣ Maximizing Spread (Opinion, Innovation, Recommendation) ‣ Outbreak Detection (e.g., disease) Social Network: No info about user action– Understanding dynamics is challenging! Power Law distribution [LAH-07] Factors impacting flow: ‣ Sampling strategy, user Homophily, content nature [CLSCK-10, NPS-10] Monday, June 6, 2011
  • 227. Analysis & Visualization Tools NWB (Network WorkBench), Truthy, Graph-tool, Orange, Pajek, Tulip Source: http://truthy.indiana.edu/ http://en.wikipedia.org/wiki/social_network_analysis_software
  • 229. Citizen Sensing in Real-time
  • 230. Real-Time Motivation People can't wait for information 500 years ago ‣ Single lifetime 20 years ago ‣ Next day or two ‣ Television, newspapers Presently ‣ Minutes are not considered fast enough ‣ Digital media, social media
  • 231. Real-Time Social Media Is Real-Time the future of Web? Social Media for Real-Time Web ‣ Disaster Management ‣ Ushahidi ‣ Real-Time Markets ‣ Examples ‣ Brand Tracking ‣ Twarql ‣ Movie reviews
  • 232.            Scenario The  Guardian Feb  2010 Monday, June 6, 2011
  • 234.            Scenario The  Guardian Feb  2010 Journalist Monday, June 6, 2011
  • 235. Challenges Information Overload ‣ Can we aggregate, organize and collectively analyze data Real Time ‣ Can we deliver the data as it is generated Monday, June 6, 2011
  • 236. A  Semantic  Web  Approach Expressive description of Information need ‣ Using SPARQL (Instead of traditional keyword search)  Flexibility on the point of view ‣ Ability to "slice and dice" the data in several dimensions: thematic, spatial, temporal, sentiment etc.. Streaming data with Background Knowledge ‣ Enables automatic evolution and serendipity Scalable Real-Time delivery  ‣ Using sparqlPuSH (SFSW'10) Monday, June 6, 2011
  • 240. Metadata  Extractions     (Social  Sensor  Server) Named Entity Recognition ‣ 2 Million Entities from DBPedia ‣ Load as Trie for efficiency ‣ N-grams matched ‣ Example: Obama, Barack Obama Monday, June 6, 2011
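Loading entity labels into a trie and matching n-grams against it can be sketched as below. This is a token-level trie with longest-match lookup, written under assumptions: whitespace tokenization, case-insensitive matching, and a reserved key `"$end"` to mark complete entities. The slides do not describe the actual data structure beyond "Load as Trie", and the function names are hypothetical.

```python
def build_trie(entities):
    """Build a token-level trie; the '$end' key stores the entity label."""
    root = {}
    for entity in entities:
        node = root
        for token in entity.lower().split():
            node = node.setdefault(token, {})
        node["$end"] = entity
    return root


def spot_entities(text, trie):
    """Scan left to right, keeping the longest entity match at each position."""
    tokens = text.lower().split()
    found = []
    i = 0
    while i < len(tokens):
        node, match, match_end = trie, None, i
        j = i
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if "$end" in node:
                match, match_end = node["$end"], j
        if match:
            found.append(match)
            i = match_end  # skip past the matched n-gram
        else:
            i += 1
    return found


trie = build_trie(["Obama", "Barack Obama", "Health care reform"])
spotted = spot_entities("Barack Obama lays out plan for health care reform", trie)
```

Longest-match lookup means "Barack Obama" is reported once rather than as both "Barack Obama" and "Obama"; lookup cost depends on the length of the text and of the longest label, not on the number of entities loaded.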
  • 241. Metadata  Extractions     (Social  Sensor  Server) URL, HashTag Extraction ‣ Regex extraction ‣ Resolution ‣ URL Resolution: Follows http redirects for resolution ‣ HashTag Resolution: Tagdef, Tagal,WTHashTag.com Monday, June 6, 2011
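The regex extraction step can be sketched with two patterns. The expressions below are simple assumptions, not the ones used by the Social Sensor Server, and the resolution step (following HTTP redirects, looking up hashtag definitions) is left out because it needs network access.

```python
import re

# Assumption: deliberately simple patterns for illustration.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract(tweet):
    """Return the URLs and hashtags found in a tweet."""
    urls = URL_RE.findall(tweet)
    # Remove URLs first so that fragments like '#frag' are not read as hashtags.
    without_urls = URL_RE.sub(" ", tweet)
    hashtags = HASHTAG_RE.findall(without_urls)
    return {"urls": urls, "hashtags": hashtags}


result = extract("Fingers crossed for the upcoming #hcrvote http://example.org/a#frag")
```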
  • 242. Metadata  Extractions     (Social  Sensor  Server) Monday, June 6, 2011
  • 243. Metadata  Extractions     (Social  Sensor  Server) Other Metadata provided by Twitter ‣ User profile: User Name, Location, Time etc.. ‣ Tweet: RT, reply etc.. Monday, June 6, 2011
  • 244. Structured  Data (Social  Sensor  Server) RDF Annotation ‣ Common RDF/OWL Vocabularies ‣ FOAF - (foaf-project.org) Friend of a Friend ‣ SIOC - (sioc-project.org) Semantically Interlinked Online Communities ‣ OPO - (online-presence.net) Online Presence Ontology ‣ MOAT - (moat-project.org) — Meaning Of A Tag Monday, June 6, 2011
  • 245. Structured  Data (Social  Sensor  Server) Monday, June 6, 2011
  • 246. Structured Data (Social Sensor Server) A snippet of the annotation <http://twitter.com/bob/statuses/123456789> rdf:type sioct:MicroblogPost ; sioc:content "Fingers crossed for the upcoming #hcrvote" ; sioc:has_creator <http://twitter.com/bob> ; foaf:maker <http://example.org/bob> ; moat:taggedWith dbpedia:Healthcare_reform . <http://twitter.com/bob> geonames:locatedIn dbpedia:Ohio .
  • 248. Semantic  Publisher Virtuoso to store triples Queries formulated by the users are stored SPARQL protocol over the HTTP to access rdf from the store Combine data from tweet with the background knowledge in the rdf store  Monday, June 6, 2011
  • 249. Application  Server  &  Distribution   Hub Monday, June 6, 2011
  • 250. Application Server & Distribution Hub Distribution Hub ‣ PUSH Model - PubSubHubbub protocol ‣ Pushes the tweets to the Application Server Application Server ‣ Delivers data to the Clients ‣ RSS-enabled concept feeds
  • 251-254. Brand Tracking - Example [Figure: query graph over background knowledge (e.g. DBpedia). dbpedia:IPad skos:subject ?category (e.g. category:Wi-Fi, category:Touchscreen); ?competitor skos:subject ?category (e.g. IPhone, HPTabletPC); ?tweet moat:taggedWith ?competitor. Sample tweet: “@anonymized Lorem ipsum bla bla this is an example tweet”]
  • 255. 1242  Articles  from  Nytimes Around  800,000  tweets Monday, June 6, 2011
  • 256. 1242 Articles from Nytimes Around 800,000 tweets President Obama lays out plan for Health care reform in Speech to Joint Session of Congress (10th Sept Timeline.com)
  • 257. 1242 Articles from Nytimes Around 800,000 tweets President Obama lays out plan for Health care reform in Speech to Joint Session of Congress (10th Sept Timeline.com) Obama taking an active role in Health talks in pursuing his proposed overhaul of health care system. (13th Aug
  • 258. Twarql  on  Linked  Open  Data Monday, June 6, 2011
  • 260. Emerging  Research  Areas   Monday, June 6, 2011
  • 261. Spam  in  Social  Networks Reasons for spamming include: ‣ Gaining Popularity ‣ Use of popular topic related keywords (e.g. hashtags of trending topics) to propagate something off topic. Launching malicious attacks ‣ Phishing attacks, virus, malware etc. ‣ Misleading the masses ‣ Propagating false information [MM-10]. Monday, June 6, 2011
  • 262. Spam  in  Social  Networks Gaining popularity using trending keywords: This tweet uses #Cairo but refers to a fashion website. Monday, June 6, 2011
  • 265. Spam in Social Networks. Gaining popularity using trending keywords: this tweet uses #Cairo but refers to a fashion website. [Figure label: Egypt Protests]
  • 271. Spam in Social Networks. Spam detection: ‣ Content-based features: content size, URL type, spam words. ‣ Metadata-based features: account information, behavior. ‣ Network-based features: provenance (e.g. content from a reliable source).
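The three feature families above could be gathered into one feature dictionary along the following lines. This is a minimal sketch under stated assumptions: the `Tweet` fields, the `SPAM_WORDS` list, the regular expressions and the `extract_features` name are hypothetical, and counting URLs stands in for the "URL type" cue on the slide.

```python
import re
from dataclasses import dataclass

# Hypothetical spam vocabulary; a real system would learn or curate this.
SPAM_WORDS = {"free", "winner", "click", "discount"}
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")


@dataclass
class Tweet:
    text: str
    account_age_days: int    # metadata: account information
    followers: int           # metadata: behavior proxy
    following: int
    source_is_trusted: bool  # network: provenance, assumed to be given


def extract_features(tweet):
    """Collect content-, metadata- and network-based cues for one tweet."""
    words = re.findall(r"[a-z']+", tweet.text.lower())
    return {
        # Content-based features
        "content_size": len(tweet.text),
        "url_count": len(URL_RE.findall(tweet.text)),
        "hashtag_count": len(HASHTAG_RE.findall(tweet.text)),
        "spam_word_count": sum(word in SPAM_WORDS for word in words),
        # Metadata-based features
        "account_age_days": tweet.account_age_days,
        "follower_ratio": tweet.followers / max(tweet.following, 1),
        # Network-based feature
        "trusted_source": tweet.source_is_trusted,
    }


example = Tweet(
    text="Free discount shoes http://example.com #Cairo",
    account_age_days=2,
    followers=10,
    following=1000,
    source_is_trusted=False,
)
print(extract_features(example))
```

The resulting dictionary could feed any standard classifier; the sketch deliberately stops at feature extraction because the slides do not name a specific model.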
  • 272. Trust in Social Networks. Reputation, policy, evidence, and provenance are used to derive trustworthiness. Illustrative examples of online cues used for trust assessment: ‣ Wikipedia: article size, number of references, author, edit history, age of the article, edit frequency, etc. ‣ Product reviews: number of "helpful" and "very helpful" ratings, author expertise, sentiment in comments received for a review, etc.
  • 273. Trust in Social Networks. We propose a trust ontology [AHTS-10] that: ‣ Captures the semantics of trust. ‣ Enables representation of, and reasoning with, trust. The semantics of trust specifies, for a given trustor and trustee, the following features: ‣ Type: type of trust relationship. ‣ Scope: context of the trust relationship. ‣ Value: quantifies the trust relationship.
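The trustor, trustee, type, scope and value features can be pictured as a single record. The sketch below is not the ontology from [AHTS-10]; the class name `TrustRelationship`, the field names, the example values and the choice of a [0, 1] value range are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustRelationship:
    """One trust statement from a trustor about a trustee."""
    trustor: str
    trustee: str
    trust_type: str  # Type: kind of trust relationship (label is a placeholder)
    scope: str       # Scope: context in which the trust applies
    value: float     # Value: quantified trust, assumed here to lie in [0, 1]

    def __post_init__(self):
        # Reject values outside the assumed numeric range.
        if not 0.0 <= self.value <= 1.0:
            raise ValueError("trust value must lie in [0, 1]")


# Hypothetical example: alice trusts bob for traffic reports.
relation = TrustRelationship(
    trustor="alice",
    trustee="bob",
    trust_type="functional",
    scope="traffic reports",
    value=0.8,
)
print(relation)
```

Keeping scope as an explicit field makes it possible to hold different trust values for the same pair of people in different contexts.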
  • 274. Trust in Social Networks. Gleaning primitive (edge) trust: ‣ The trust value between two nodes is quantified using numbers (e.g. in [0,1] or [-1,1]) or a partial ordering [TAHS-09]. Gleaning composite (path) trust: ‣ Propagation via chaining and aggregation (transitivity). Some popular algorithms for trust computation: ‣ Eigentrust, Spreading Activation, SUNNY, etc.
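Chaining and aggregation can be illustrated with a small example. This sketch assumes edge trust values in [0, 1], multiplies values along a path (chaining) and takes the maximum over alternative paths (aggregation). These are simple illustrative choices, not the rules used by Eigentrust, Spreading Activation or SUNNY, and the function names and the `max_hops` limit are hypothetical.

```python
def simple_paths(edges, source, target, max_hops):
    """Yield cycle-free paths from source to target with at most max_hops edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        for neighbor in edges.get(node, {}):
            if neighbor in path:
                continue  # avoid revisiting nodes
            new_path = path + [neighbor]
            if neighbor == target:
                yield new_path
            elif len(new_path) - 1 < max_hops:
                stack.append((neighbor, new_path))


def chain(edges, path):
    """Chaining: multiply edge trust values along one path."""
    trust = 1.0
    for a, b in zip(path, path[1:]):
        trust *= edges[a][b]
    return trust


def composite_trust(edges, source, target, max_hops=4):
    """Aggregation: take the strongest chained trust over all paths found."""
    values = [chain(edges, p) for p in simple_paths(edges, source, target, max_hops)]
    return max(values) if values else 0.0


# Hypothetical edge trust values in [0, 1].
EDGES = {
    "alice": {"bob": 0.9, "carol": 0.5},
    "bob": {"dave": 0.8},
    "carol": {"dave": 1.0},
}
print(composite_trust(EDGES, "alice", "dave"))
```

Multiplication makes trust decay with path length, and taking the maximum is an optimistic aggregation; averaging over paths would be a more conservative alternative.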
  • 275. Integrating Social And Sensor Networks. Machine sensor observations are quantitative in nature, while human observations can be both qualitative and quantitative. Benefits of combining observations from humans and machine sensors: ‣ Complementary evidence. ‣ Corroborative evidence.
  • 276. Integrating Social And Sensor Networks. Applications of integrating heterogeneous sensor observations: ‣ Situation awareness by using human observations to interpret machine sensor observations. ‣ Enhancing trustworthiness using corroborative evidence.
  • 277. Mobile Social Computing. Instant Discovery: Geo-tagging and location-aware services, in combination with search, have made discovery a two-way street. Compressed Expression: Mobile makes social networking even more compelling. Outsourced Memory: Users rely on cloud-based servers to store all of their mobile applications and databases.
  • 282. Mobile Social Computing. Automated Decisions: Smart apps help us make faster decisions, or even make decisions for us. Peer Power: Mobiles can create social movements based on peer influence.
  • 283. Mobile Social Computing (Cont.). Personalized Branding: Advertising is rapidly becoming personalized based on an individual's needs and preferences. Mobiles are becoming an integral part of social development: ‣ Coordination in disaster situations ‣ Health care delivery, especially in developing countries ‣ Elections and other forms of political expression
  • 285. Twitris - Motivation. 1. Information Overload: multiple events around us; WHAT to be aware of; multiple storylines about the same event.
  • 286. Twitris - Motivation. 2. Evolution of Citizen Observation ‣ with location and time
  • 287. Twitris - Motivation. 3. Semantics of Social Perceptions: ‣ What is being said about an event (theme) ‣ Where (spatial) ‣ When (temporal). Twitris lets you browse citizen reports using social perceptions as the fulcrum.
  • 288. Twitris: Semantic Social Web Mash-up. Facilitates understanding of multi-dimensional social perceptions over SMS, Tweets, multimedia Web content, and electronic news media.