Detecting Deception in Writing Style

Sadia Afroz, Michael Brennan and Rachel Greenstadt
Privacy, Security and Automation Lab
Drexel University

Overview

•  Authorship recognition
•  Authorship recognition in adversarial environment
•  Deception detection
•  Experiments on different datasets

Authorship recognition

Who wrote the document?

Authorship recognition

Stylometry:
  –  An authorship recognition system based solely on writing style.
  –  Not handwriting
  –  Only linguistic style: word choice, sentence length, parts-of-speech usage, …
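
The kind of linguistic features listed above can be sketched in a few lines. This is a minimal illustration, not the feature set used in the talk: the function name `style_features` and the four features chosen here are assumptions, and the sentence and word splitting is deliberately naive.

```python
import re

def style_features(text):
    """Compute a few simple stylometric features (illustrative only)."""
    # Naive sentence split: ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Naive word tokenizer: runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words) or 1  # avoid division by zero on empty input
    return {
        "avg_sentence_length": len(words) / (len(sentences) or 1),
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "unique_word_ratio": len({w.lower() for w in words}) / n_words,
        "short_word_ratio": sum(1 for w in words if len(w) <= 3) / n_words,
    }

if __name__ == "__main__":
    print(style_features("I saw a cat. The cat saw me."))
```

Parts-of-speech features would need a tagger and are left out of this sketch.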

Why does it work?

•  Everybody has learned language differently

How regular authorship recognition works

[Diagram: features are extracted and passed to a machine learning system. For a document of unknown authorship, features are extracted and the machine learning system determines authorship.]
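
The extract-features, train, then determine-authorship flow can be sketched as follows. This is a toy under stated assumptions: it swaps in a nearest-centroid rule for the neural network and SVM classifiers named later in the slides, and the `FUNCTION_WORDS` list and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical feature list: relative frequencies of a few common words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "i", "it", "that", "was"]

def extract(text):
    """Turn a text into a vector of function-word frequencies."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def train(samples):
    """samples maps author -> list of texts; returns author -> mean vector."""
    centroids = {}
    for author, texts in samples.items():
        vectors = [extract(t) for t in texts]
        centroids[author] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return centroids

def attribute(centroids, text):
    """Return the author whose centroid is closest to the unknown text."""
    vector = extract(text)
    return min(centroids, key=lambda a: math.dist(vector, centroids[a]))

if __name__ == "__main__":
    model = train({
        "alice": ["the cat and the dog and the bird", "the sun and the moon"],
        "bob": ["i think that it was late", "i said that it was fine"],
    })
    print(attribute(model, "the fox and the hen"))
```

The sample texts and author names are made up for the example; a real system needs far more text per author and a richer feature set.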

Assumptions

•  Writing style is invariant.
   –  It’s like a fingerprint, you can’t really change it.
   –  Authorship recognition can identify you if there are sufficient writing samples and a set of suspects.

Adversarial Attacks

•  Imitation or framing attack
    –  Where one author imitates another author
    –  Participants were asked to imitate Cormac McCarthy in writing about their day
•  Obfuscation attack
    –  Where an author hides his regular style
    –  Participants were asked to write about their neighborhood in a different style

M. Brennan and R. Greenstadt. Practical attacks against authorship recognition techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, CA, 2009.

Accuracy in detecting authorship of regular documents

[Figure: accuracy (0 to 1) versus number of authors (5 to 40) for 9-Feature (NN), Synonym-Based, Writeprints Baseline (SVM) and Random.]

More than 80% accurate in detecting authorship with 40 authors in regular documents.

Accuracy in detecting authorship of obfuscated documents

[Figure: accuracy (0 to 1) versus number of authors (5 to 40) for 9-Feature (NN), Synonym-Based, Writeprints Baseline (SVM) and Random.]

Accuracy is less than random chance in obfuscated documents.
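
For reference, the "Random" baseline in these charts is presumably the accuracy of guessing uniformly among the candidate authors, which is 1/N for N authors. That reading is an assumption; the slides do not define it. A small sketch:

```python
def random_chance_accuracy(num_authors):
    """Expected accuracy of guessing uniformly among num_authors candidates."""
    if num_authors < 1:
        raise ValueError("need at least one candidate author")
    return 1.0 / num_authors

if __name__ == "__main__":
    # Author-set sizes taken from the x-axis of the charts.
    for n in (5, 10, 20, 40):
        print(n, f"{random_chance_accuracy(n):.3f}")
```

Under this reading, a classifier choosing among 40 authors would fall below the baseline whenever its accuracy drops under 2.5%.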

Accuracy in detecting authorship of imitated documents

[Figure: accuracy (0 to 1) versus number of authors (5 to 40) for 9-Feature (NN), Synonym-Based, Writeprints Baseline (SVM) and Random.]

Accuracy is less than random chance in imitated documents.

Can we detect Stylistic Deception?

[Diagram labels: Regular, Imitated, Obfuscated.]

Extended-Brennan-Greenstadt Corpus

•  56 authors
   –  12 of the participants are from Drexel University
   –  The rest are paid workers from Amazon Mechanical Turk
•  Three kinds of writing samples
   –  Regular writings (5000 words)
   –  Imitated writing
      •  A 500-word article describing a day
      •  Imitate Cormac McCarthy from ‘The Road’
   –  Obfuscated writing
      •  A 500-word article describing neighborhood
      •  Hide own writing style

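Detecting stylistic deception is possible

[Bar chart: results on a 0 to 100 scale for Regular, Imitation and Obfuscation documents, with three series: Writeprint, SVM; Lying-detection, J48; 9-feature set, J48. Data labels in extraction order: 98, 94.5, 89.5, 95.7, 85, 75.3, 59.9, 48, 43. Which bar each label belongs to is not recoverable from the extraction.]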
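
Deception detection here amounts to labelling each document as regular, imitation or obfuscation and scoring the classifier per class. The slides do not say which metric the chart reports, so the sketch below uses per-class recall as one plausible choice; the function name and the sample labels are made up for illustration.

```python
from collections import Counter

def per_class_recall(true_labels, predicted_labels):
    """Fraction of documents of each true class that were labelled correctly."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}

if __name__ == "__main__":
    truth = ["regular", "regular", "imitation", "obfuscation", "obfuscation"]
    guess = ["regular", "imitation", "imitation", "obfuscation", "regular"]
    print(per_class_recall(truth, guess))
```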

Feature Changes in Imitation and Obfuscation

[Bar chart: change in each feature for Imitation and Obfuscation, on a scale from -80 to 100. Features shown: Personal pronoun, Sentence count, Particle, Short words, Verb, Unique words, Adverb, Existential there, Average syllables per word, Average word length, Adjective, Cardinal number, Gunning-Fog readability index, Average sentence length.]
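
Two of the quantities in this chart can be sketched in code. The Gunning fog index is the standard formula 0.4 × (words per sentence + 100 × share of words with three or more syllables). Treating the chart's axis as percent change relative to regular writing is an assumption, and the syllable counter below is a crude vowel-run heuristic rather than a dictionary lookup.

```python
import re

def count_syllables(word):
    """Crude estimate: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    """Gunning fog index: 0.4 * (words/sentences + 100 * complex/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

def percent_change(regular_value, deceptive_value):
    """Change of a feature value relative to the author's regular writing."""
    return 100.0 * (deceptive_value - regular_value) / regular_value

if __name__ == "__main__":
    print(gunning_fog("The cat sat. The dog ran."))
    print(percent_change(20.0, 15.0))
```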

Problem with the dataset: Topic Similarity

•  All the deceptive documents were on the same topic.
•  Non-content-specific features have the same effect as content-specific features.

[Figure: "Effect of different feature set in detecting adversarial authorship". F-measure (0 to 1) for different writing samples (imitation, regular, obfuscation), with series Syntactic, Lexical and Content.]
Hemingway-Faulkner Imitation Corpus

•  Articles from the International Imitation Hemingway Contest (2000-2005)
•  Articles from the Faux Faulkner Contest (2001-2005)
•  Original excerpts of Ernest Hemingway and William Faulkner

Deception detection is possible even when the topic is not similar

•  81.2% accurate in detecting imitated documents.

Long term deception: A Gay Girl In Damascus

[Images: Thomas MacMaster; fake picture of Amina Arraf.]

–  Original author was a 40-year-old American citizen, Thomas MacMaster.
–  Pretended to be a Syrian gay woman, Amina Arraf.
–  The author worked for at least 5 years to create a new style.

Long term deception is hard to detect

•  None of the blog posts were found to be deceptive.
•  But regular authorship recognition can help.
•  We tried to attribute authorship of the blog posts using Thomas (as himself), Thomas (as Amina), Britta (Thomas’s wife).

Long term deception
Authorship recognition of the blog posts

Thomas MacMaster: 54%
Amina Arraf: 43%
Britta (Thomas’s wife): 3%
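
Percentages like these can be read as the share of blog posts the classifier attributed to each candidate. That reading is an assumption, as is the helper name. A minimal sketch with made-up labels:

```python
from collections import Counter

def attribution_shares(predicted_authors):
    """Percentage of documents attributed to each candidate author."""
    counts = Counter(predicted_authors)
    total = len(predicted_authors)
    return {author: 100.0 * n / total for author, n in counts.items()}

if __name__ == "__main__":
    # Hypothetical per-post predictions, not data from the talk.
    print(attribution_shares(["A", "A", "B", "C"]))
```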

Future works

•  Intrusion detection
•  Social spam detection
•  Identifying quality discourse

Two Tools

•  JStylo: Authorship Recognition Analysis Tool.
•  Anonymouth: Authorship Recognition Evasion Tool.

•  Free, Open Source. (GNU GPL)
•  Alpha releases available today at https://psal.cs.drexel.edu
   –  Migrating to GitHub soon.

Privacy, Security and Automation Lab (https://psal.cs.drexel.edu)

•  Faculty
   –  Dr. Rachel Greenstadt
•  Graduate Students
   –  Sadia Afroz (Deception Detection Lead)
   –  Diamond Bishop
   –  Michael Brennan
   –  Aylin Caliskan
   –  Ariel Stolerman (JStylo Lead Developer)
•  Undergraduate Students
   –  Pavan Kantharaju
   –  Andrew McDonald (Anonymouth Lead Developer)


Sadia Afroz: Detecting Hoaxes, Frauds, and Deception in Writing Style Online