A sample of 2.5M tweets mentioning "Ebola" was collected during November 5-12, 2014. The titles of the 6227 web pages referenced by the tweets were used to cluster the web pages into roughly 100 topics. Paragon Science's patented dynamic anomaly detection software (http://www.paragonscience.com/intellectual_property.htm) then identified the five most anomalous topics. This research demonstrates how these techniques allow us to focus attention quickly on viral, emerging topics. A video animating those anomalous topics and the key related web pages for every hour of that week is available at https://www.youtube.com/watch?v=AEQ02hv4Xjw.
2. What Are We Doing?
• Provide valuable intelligence results to clients using our
dynamic anomaly detection software and data mining tools
• Many possible application areas:
– Social media alerting and sentiment change detection
– Analysis of web trends and user activities
– Pricing and market trend analysis and alerting
– Network defense against cyberattacks
– Insider threat detection
– Fraud prevention (banking, insurance, online auctions,…)
– Healthcare data mining
Paragon Science, Inc. 2
3. How Is It Done Today?
• Existing approaches
– Standard SNA metrics
– Rule-based systems (transaction profiling, etc.)
– Bayesian and other statistical/probabilistic models
– Machine learning tools (neural nets, HMMs, etc.)
• Some limitations of existing methods
– Training requirements can be large for neural nets.
– For rule-based systems, it is difficult to effectively predict or define
new “bad” anomalies or patterns in advance.
– Many current methods are not scalable to real-world operational
requirements.
4. What Is New in Our Patented Approach?
• A powerful anomaly detection approach that incorporates
nonlinear time series analysis methods
– US Patent #8738652 (1.usa.gov/1kkyVD9)
“Systems and Methods for Dynamic Anomaly Detection”
• Key questions answered:
– Which entities behave or evolve differently than others in the data set?
– Which entities have shifted their behavior unexpectedly?
5. What Is New in Our Approach? (Cont’d.)
• Our framework inherently captures the dynamics of the entities under
study, without having to specify in advance normal vs. abnormal
behavior.
• We can simultaneously analyze the time evolution of
– Network structures
– Any associated attributes (text terms, geospatial position, etc.)
• Our technique is robust with respect to missing or erroneous data.
• As a result, we can
– Find key players in rapidly changing networks
– Provide early warning of viral videos and online documents
– Focus attention on the most-anomalous events or transactions
6. Dynamic Anomaly Detection Overview
• A general approach that incorporates nonlinear time series
analysis methods
– Complexity measures
– Finite-time Lyapunov exponents (FTLEs)
• Input data
– Communications or transactional data streams
– General time-dependent data sets
• Key questions
– Which entities behave or evolve differently than others in the data
set?
– Which entities have shifted their behavior unexpectedly?
7. Finite-Time Lyapunov Exponents (FTLEs)
• General dynamical system
• Flow map
– Advects points in the state space
– Describes the time evolution of the system
8. Finite-Time Lyapunov Exponents (FTLEs)
• FTLEs characterize the amount of stretching or contraction
about a point x0 during a time interval T
– Stability
– Predictability
• Definition
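The slide equations are not reproduced in this text. For reference, the standard textbook forms of the dynamical system, the flow map, and the FTLE are sketched below. The symbols (v for the velocity field, φ for the flow map, Δ for the Cauchy-Green deformation tensor, σ for the FTLE) follow common convention and are an assumption here, not notation taken from the slides.

```latex
\begin{align}
  % General dynamical system: state x(t) driven by a velocity field v
  \dot{x}(t) &= v\bigl(x(t), t\bigr), \qquad x(t_0) = x_0 \\
  % Flow map: carries the initial point x_0 to its position after time T
  \phi_{t_0}^{t_0+T} &: x_0 \mapsto x(t_0 + T;\, t_0, x_0) \\
  % Cauchy-Green deformation tensor built from the flow map's Jacobian
  \Delta(x_0) &= \left(\frac{d\phi_{t_0}^{t_0+T}}{dx}(x_0)\right)^{\!\top}
                 \left(\frac{d\phi_{t_0}^{t_0+T}}{dx}(x_0)\right) \\
  % FTLE: average exponential stretching rate about x_0 over the interval T
  \sigma_{t_0}^{T}(x_0) &= \frac{1}{|T|}\,
                 \ln\sqrt{\lambda_{\max}\bigl(\Delta(x_0)\bigr)}
\end{align}
```

A large positive σ marks a point whose neighbors separate quickly over the interval (low stability and predictability); a negative σ indicates contraction.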
9. Derived Jacobian Vectors
• Similarly, characteristic vectors derived from the flow map’s
Jacobian can describe the generalized directions of the
local stretching or contraction.
• Possible derivation approaches:
– Weight-based column sampling
– Singular value decomposition (SVD)
– Principal component analysis (PCA)
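As an illustration of the SVD option, the sketch below estimates a flow map's Jacobian by central differences and reads the FTLE and the dominant stretching direction off its singular value decomposition. It is a minimal example under stated assumptions, not Paragon Science's implementation: the function names, the toy linear flow map, and the step size `eps` are all hypothetical.

```python
import numpy as np


def flow_map_jacobian(flow_map, x0, eps=1e-6):
    """Approximate the Jacobian of flow_map at x0 by central differences."""
    x0 = np.asarray(x0, dtype=float)
    columns = []
    for j in range(x0.size):
        step = np.zeros_like(x0)
        step[j] = eps
        forward = np.asarray(flow_map(x0 + step), dtype=float)
        backward = np.asarray(flow_map(x0 - step), dtype=float)
        columns.append((forward - backward) / (2.0 * eps))
    return np.column_stack(columns)


def ftle_and_stretch_direction(jacobian, T):
    """Return the FTLE and the initial-space direction of maximum stretching.

    The largest singular value of the Jacobian equals the square root of the
    largest eigenvalue of J^T J, so FTLE = ln(sigma_max) / |T|.
    """
    _, singular_values, vt = np.linalg.svd(jacobian)
    ftle = np.log(singular_values[0]) / abs(T)
    return ftle, vt[0]


# Hypothetical toy system: stretches axis 0 and contracts axis 1 at unit rate.
T = 2.0


def toy_flow_map(x):
    return np.array([np.exp(T) * x[0], np.exp(-T) * x[1]])


J = flow_map_jacobian(toy_flow_map, [0.3, -0.1])
ftle, direction = ftle_and_stretch_direction(J, T)
print(ftle, direction)
```

For this toy map the FTLE comes out approximately 1.0 and the stretching direction lies along the first axis (up to sign), which matches the unit growth rate built into the example.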
10. Paragon Dynamic Anomaly Detection
[Flowchart of the processing pipeline]
• Stages repeated for each time interval t=ti (loop while "More Time Intervals?" is Yes)
– Representation of data at t=ti
– Clustering / segmentation
– Cluster resolution
– Feature vector encoding
– Outlier detection at t=ti
• Stages once no time intervals remain
– Dynamic anomaly detection: nonlinear time series analysis (FTLEs, dynamic thresholds, etc.)
– Pattern classification
– Outlier detection
– Domain-specific filtering (threat signatures, risk profiles, etc.)
11. Example: Ebola Twitter Analysis
• Sample data set from Twitter API collected using twittertap
– Date range: 11/8/2014 – 11/16/2014
– 2,541,812 tweets
– 4,708,678 generated links with hashtags, URLs, and user replies
• Research plan
– Perform k-core decomposition
– Run anomaly detection software on sub-networks of nodes in the
central core to find the most influential users and most viral URLs
– Carry out community detection and topic detection
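The k-core step in the plan above can be illustrated with a small peeling routine. This is a plain-Python sketch on a hypothetical toy edge list, not the Lanet-vi computation used for the slides; the node labels are invented for illustration. NetworkX, which the deck acknowledges, offers the same computation through `networkx.core_number`.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: k-shell index} for an undirected edge list.

    Repeatedly removes the node of smallest remaining degree; a node's
    shell index is the largest minimum degree seen up to its removal.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adjacency[u].add(v)
            adjacency[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# Hypothetical toy network: three mutually linked nodes plus one pendant URL.
toy_edges = [
    ("user:a", "user:b"),
    ("user:b", "user:c"),
    ("user:a", "user:c"),
    ("user:a", "url:1"),
]
shells = core_numbers(toy_edges)
central_core = {n for n, k in shells.items() if k == max(shells.values())}
print(shells, central_core)
```

The linear scan for the minimum-degree node keeps the sketch short but makes it quadratic in the number of nodes; a network with millions of links, like the one described here, would need a bucket-based implementation or a dedicated library.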
12. K-Core Decomposition of the Ebola Network
http://sourceforge.net/projects/lanet-vi/
14. Top URLs in the Central Core
URL                                                                                 K-Shell  Degree
http://goo.gl/pFg3Z2                                                                49       279
http://goo.gl/BFEUgy                                                                49       233
http://goo.gl/S37kHT                                                                49       212
http://goo.gl/silISF                                                                47       364
http://invst.rs/7MKWHB                                                              22       779
http://cnn.it/1wlIlUe                                                               22       741
http://trib.al/YKSMCSN                                                              22       734
http://nyp.st/136BPG3                                                               22       698
http://nypost.com/2014/10/29/cdc-admits-droplets-from-a-sneeze-could-spread-ebola/  22       415
http://fxn.ws/1oVgLwc                                                               22       406
15. Top-Ranked Website (URLs 1, 2, and 4)
UMA MENTIRA CHAMADA ,,EBOLA,, VEJAM !!! | NOTICIÁRIO DA WEB
A statement made by a man in Ghana called Nana Kwame
rocked the internet in recent days. The following information
has to reach people. We need to see the Ebola for what it
really is. It's time to wake up the world agenda behind this
whole story.
Follow what this man has to say about what is happening in
their country of origin:
People in the world need to know what is happening here in
West Africa. They are lying! The ''Ebola'' as a virus does
not exist and is not contagious. The Red Cross brought a
disease to four specific countries, for four specific reasons and
is only contracted by those who receive treatments and
injections of the Red Cross. That's why Liberians and
Nigerians began to expel the Red Cross in their countries!
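18. Topic Detection in the Ebola Twitter Network
[Diagram of the network schema]
• Users (User A, User B, User C) are linked to other users by "replies to" and "mentions" edges
• Users are linked to the URLs (URL 1, URL 2) they reference
• URLs are linked to terms (Term 1, Term 2, Term 3, … Term N)
• Terms are grouped into topics (Topic 1, Topic 2, … Topic M)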
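The user, URL, and reply/mention relationships in this network could be extracted from raw tweets along the following lines. This is a hedged sketch, not the pipeline used for the study: it assumes tweets arrive as dictionaries shaped like Twitter REST API v1.1 tweet objects (`user.screen_name`, `in_reply_to_screen_name`, `entities.user_mentions`, `entities.urls`, `entities.hashtags`), and the `user:` / `url:` / `hashtag:` node prefixes and the "tags" relation name are hypothetical.

```python
def tweet_to_edges(tweet):
    """Return (source, relation, target) triples for one tweet dictionary."""
    author = "user:" + tweet["user"]["screen_name"]
    entities = tweet.get("entities", {})
    edges = []

    # A reply yields one "replies to" edge from the author to the recipient.
    reply_to = tweet.get("in_reply_to_screen_name")
    if reply_to:
        edges.append((author, "replies to", "user:" + reply_to))

    # The replied-to user usually also appears among the mentions;
    # skip it there so the same pair is not counted twice.
    for mention in entities.get("user_mentions", []):
        name = mention["screen_name"]
        if name != reply_to:
            edges.append((author, "mentions", "user:" + name))

    for url in entities.get("urls", []):
        edges.append((author, "references", "url:" + url["expanded_url"]))

    for tag in entities.get("hashtags", []):
        edges.append((author, "tags", "hashtag:" + tag["text"].lower()))

    return edges


# Hypothetical sample tweet, trimmed to the fields the function reads.
sample_tweet = {
    "user": {"screen_name": "alice"},
    "in_reply_to_screen_name": "bob",
    "entities": {
        "user_mentions": [{"screen_name": "bob"}, {"screen_name": "carol"}],
        "urls": [{"expanded_url": "http://example.com/a"}],
        "hashtags": [{"text": "Ebola"}],
    },
}
sample_edges = tweet_to_edges(sample_tweet)
for edge in sample_edges:
    print(edge)
```

Keeping the relation name on each edge preserves the distinction between "replies to", "mentions", and "references" that the diagram draws, so later stages can weight or filter them separately.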
20. Key Sites Related to Top 5 Ebola Topic Anomalies
Topic     Max Change Metric  Peak Datetime        Top Related URL Title
Topic 99  2.973              2014-11-06 17:18:27  FACT SHEET: Emergency Funding Request to Enhance the U.S. Government’s Response to Ebola at Home and Abroad | The White House
Topic 8   2.888              2014-11-05 20:18:27  BBC News - Ebola outbreak: Barack Obama 'to ask Congress for $6bn'
Topic 59  2.426              2014-11-07 02:18:27  » Obama Caught Ordering Press to Cover Up Ebola Alex Jones' Infowars: There's a war on for your mind!
Topic 1   2.321              2014-11-05 17:18:27  UMA MENTIRA CHAMADA ,,EBOLA,, VEJAM !!! | NOTICIÁRIO DA WEB
Topic 52  2.296              2014-11-05 17:18:27  Nigeria Property: Ebola Virus Originated From US Bio-warfare Labs In West Africa – American Prof
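The slides do not define the "Max Change Metric", and the patented method is not described here. Purely to illustrate the general idea of ranking topics by their sharpest shift in activity, the sketch below uses a simple stand-in: the largest standardized hour-over-hour change in a topic's activity count. The stand-in metric, the function names, and the toy counts are all assumptions and should not be read as Paragon Science's algorithm.

```python
from statistics import mean, pstdev


def max_change(series):
    """Return (score, index) of the largest standardized step in a series.

    The score is |step - mean step| / std of steps; index is the position
    in `series` where that step lands. Flat or constant-slope series score 0.
    """
    steps = [later - earlier for earlier, later in zip(series, series[1:])]
    if not steps:
        return 0.0, None
    mu, sigma = mean(steps), pstdev(steps)
    if sigma == 0:
        return 0.0, None
    scores = [abs(step - mu) / sigma for step in steps]
    peak = max(range(len(scores)), key=scores.__getitem__)
    return scores[peak], peak + 1


def rank_topics(hourly_counts, top_n=5):
    """Rank topics by their stand-in change score, highest first."""
    scored = {topic: max_change(counts) for topic, counts in hourly_counts.items()}
    return sorted(scored.items(), key=lambda item: item[1][0], reverse=True)[:top_n]


# Hypothetical hourly activity counts for three topics.
toy_counts = {
    "steady": [5, 5, 5, 5, 5, 5],
    "burst": [5, 5, 5, 50, 5, 5],
    "drift": [1, 2, 3, 4, 5, 6],
}
ranking = rank_topics(toy_counts, top_n=3)
print(ranking)
```

In this toy data only the "burst" topic receives a nonzero score, with its peak at the hour of the jump; a steadily growing topic scores zero because its hour-over-hour change never deviates from its own norm.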
27. Topic 99f: Follow Ebola
Follow Ebola | Updated every second & see what the
#CDC & #WHO is not telling you about #Ebola
28. Animation of Evolving Topic Network
http://youtu.be/AEQ02hv4Xjw
29. What Are the Payoffs?
• Quickly identify key influencers and trends in online
networks
• Provide early warning of viral videos, anomalous
web events, or unusual network traffic
• Enable enhanced business intelligence without
having to specify normal vs. abnormal behavior in
advance
30. Third-Party Software Acknowledgements
• Paragon Science gratefully acknowledges the following researchers and software providers:
– Cytoscape (http://www.cytoscape.org/)
– dynnetwork Cytoscape plugin (https://code.google.com/p/dynnetwork/)
– Lanet-vi (http://sourceforge.net/projects/lanet-vi/)
• J. Alvarez-Hamelin, et al., "Understanding Edge Connectivity in the Internet through Core Decomposition," Internet Mathematics 7 (1): 45–66, 2011.
– Louvain community detection software (http://perso.crans.org/aynaud/communities/)
• V. Blondel, et al., "Fast Unfolding of Communities in Large Networks," Journal of Statistical Mechanics: Theory and Experiment, 10, P10008, 2008.
– NetworkX (https://networkx.github.io/)
• A. Hagberg, D. Conway, "Hacking social networks using the Python programming language (Module II - Why do SNA in NetworkX)," Sunbelt 2010: International Network for Social Network Analysis.