Network analysis using
 Hadoop and Neo4j




                  Friso van Vollenhoven
                                   @fzk
           fvanvollenhoven@xebia.com
Why do networks matter?
[Figure: a Twitter network around @fzk; edges are labelled "follows", "RT" and "goes to", and several nodes carry retweets of @krisgeus ("RT: @krisgeus...").]
See any meaningful patterns?
A toy problem...

TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21|
    1299 6461 9318 38091|EGP|195.66.224.97|0|0||NAG||




            AS1299 -> AS6461 -> AS9318 -> AS38091
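To make the step from a dump line to graph edges concrete, here is a minimal Java sketch. It assumes the AS path is the seventh pipe-separated field (which matches the field list shown later in the deck) and that the path is a plain space-separated list of AS numbers. The class and method names are hypothetical and are not taken from the talk's source code.

```java
import java.util.ArrayList;
import java.util.List;

class BgpPathEdges {
    /** Returns the AS path (7th pipe-separated field) as a list of AS numbers. */
    static List<String> asPath(String line) {
        // limit -1 keeps trailing empty fields, so field positions stay stable
        String[] fields = line.split("\\|", -1);
        return List.of(fields[6].trim().split("\\s+"));
    }

    /** Turns a path into hops between consecutive ASes: [from, to] pairs. */
    static List<String[]> edges(List<String> path) {
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i + 1 < path.size(); i++) {
            result.add(new String[] {path.get(i), path.get(i + 1)});
        }
        return result;
    }

    public static void main(String[] args) {
        // The sample line from the slide, joined onto one line.
        String line = "TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21|"
                + "1299 6461 9318 38091|EGP|195.66.224.97|0|0||NAG||";
        for (String[] e : edges(asPath(line))) {
            System.out.println("AS" + e[0] + " -> AS" + e[1]);
        }
    }
}
```

Paths that contain anything other than plain AS numbers (for example bracketed sets) are not handled by this sketch.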
[Figure: processing pipeline]

raw data (many lines like TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21)
    -> transform -> nodes.txt + edges.txt
    -> enrich?   -> enriched-nodes.txt + enriched-edges.txt
    -> import
    -> query / interact
[head]

GlobHfs[/Users/friso/Downloads/bview/alltxt.txt]
    [{14}:'proto', 'time', 'type', 'peerip', 'peeras', 'prefix', 'path', 'origin', 'nexthop', 'localpref', 'MED', 'community', 'AAGG', 'aggregator']

nodes branch:
    Each('nodes')[PathToNodes[decl:'id', 'name']]
    Each('nodes')[FilterPartialDuplicates[decl:'id', 'name']]
    GroupBy('nodes')[by:['id']]
    Every('nodes')[First[decl:'id', 'name']]
    Hfs['TextDelimited[['id', 'name']]']['/tmp/nodes']']

edges branch:
    Each('edges')[PathToEdges[decl:'from', 'to', 'updatecount']]
    GroupBy('edges')[by:['from', 'to']]
    Every('edges')[Sum[decl:'updatecount'][args:1]]
    Hfs['TextDelimited[['from', 'to', 'updatecount']]']['/tmp/edges']']

[tail]
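The edges branch of the flow groups tuples by ('from', 'to') and sums 'updatecount'. The sketch below shows the same aggregation in plain Java on an in-memory list, only to illustrate what the GroupBy + Sum steps compute. It is not Cascading code; the `Edge` record and `sumUpdateCounts` method are hypothetical names, and records require Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EdgeAggregation {
    /** One edge tuple with the three fields named in the flow. */
    record Edge(String from, String to, long updateCount) {}

    /** Groups by (from, to) and sums updateCount per group. */
    static Map<String, Long> sumUpdateCounts(List<Edge> edges) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Edge e : edges) {
            // The tab-joined pair serves as the grouping key.
            totals.merge(e.from() + "\t" + e.to(), e.updateCount(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Illustrative input: the same hop seen in two different updates.
        List<Edge> edges = List.of(
                new Edge("1299", "6461", 1),
                new Edge("6461", "9318", 1),
                new Edge("1299", "6461", 1));
        sumUpdateCounts(edges).forEach((key, total) -> System.out.println(key + "\t" + total));
    }
}
```

On Hadoop the grouping happens in the shuffle phase rather than in a single in-memory map, which is what lets the same logic scale to the full dump.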
http://bit.ly/IzWvcT and http://bit.ly/HHNNIb
nodes.txt:
1       AS1       LVLT-1 - Level 3 Communications, Inc.
10      AS10      CSNET-EXT-AS - CSNET Coordination and Information Center (CSNET-CIC)
100     AS100     FMC-CTC - FMC Central Engineering Laboratories
1000    AS1000    GONET-ASN-17 - GONET
10000   AS10000   NCM Nagasaki Cable Media Inc.
10001   AS10001   MICSNET Mics Network Corporation
10002   AS10002   ICT IGAUENO CABLE TELEVISION CO.,LTD
10003   AS10003   OCT-NET Ogaki Cable Television Co.,Inc.
10004   AS10004   AS-PHOENIX-J JIN Office Service Inc.
10006   AS10006   SECOMTRUST SECOM Trust Systems Co.,Ltd.
10010   AS10010   TOKAI TOKAI Communications Corporation
10011   AS10011   ADVAN advanscope.inc
10012   AS10012   FUSION Fusion Communications Corp.


edges.txt:
1       21616     3
1       3705      3
1       2         3
2       3         1
3       4         2
3       11488     2
4       5         1
10      10        2
10      13227     2
12      12        1
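For readers who want to load these two files, here is a small Java sketch of a line parser. It assumes the columns are separated by whitespace (the slide only shows aligned columns, so the real delimiter is an assumption) and reads a node line as id, AS label, and free-text description. All names are hypothetical; records require Java 16 or later.

```java
class GraphFiles {
    /** A line of nodes.txt: numeric id, AS label, free-text description. */
    record Node(long id, String label, String description) {}

    /** A line of edges.txt: from id, to id, update count. */
    record Edge(long from, long to, long updateCount) {}

    static Node parseNode(String line) {
        // Limit 3 keeps spaces inside the description intact.
        String[] parts = line.trim().split("\\s+", 3);
        String description = parts.length > 2 ? parts[2] : "";
        return new Node(Long.parseLong(parts[0]), parts[1], description);
    }

    static Edge parseEdge(String line) {
        String[] parts = line.trim().split("\\s+");
        return new Edge(Long.parseLong(parts[0]), Long.parseLong(parts[1]),
                Long.parseLong(parts[2]));
    }

    public static void main(String[] args) {
        System.out.println(parseNode("1       AS1       LVLT-1 - Level 3 Communications, Inc."));
        System.out.println(parseEdge("1       21616     3"));
    }
}
```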
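// Package names below are assumed; the batch inserter classes moved between
// Neo4j 1.x releases, so adjust the imports to the version on your classpath.
import static org.neo4j.helpers.collection.MapUtil.map;
import static org.neo4j.helpers.collection.MapUtil.stringMap;

import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.index.IndexManager;
import org.neo4j.index.lucene.unsafe.batchinsert.LuceneBatchInserterIndexProvider;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserterImpl;
import org.neo4j.unsafe.batchinsert.BatchInserterIndex;
import org.neo4j.unsafe.batchinsert.BatchInserterIndexProvider;

public class SillyImporter {
    private static enum ConnectionTypes implements RelationshipType {
        FOLLOWS;
    }

    public static void main(String[] args) {
        // Open the store directly on disk and set up a full-text Lucene index.
        BatchInserter database = new BatchInserterImpl("/Users/friso/Desktop/graph.db");
        BatchInserterIndexProvider provider = new LuceneBatchInserterIndexProvider(database);
        BatchInserterIndex index = provider.nodeIndex("allnodes", stringMap(
                "type", "fulltext",
                IndexManager.PROVIDER, "lucene"
                ));

        // Create each node, then add it to the index so queries can find it by name.
        long fzkNodeId = database.createNode(map(
                new Object[] {"name", "fzk", "tweets", 25L} //node properties
                ));
        index.add(fzkNodeId, map(new Object[] {"name", "fzk"}));

        long krisgeusNodeId = database.createNode(map(
                new Object[] {"name", "krisgeus", "tweets", 100L} //node properties
                ));
        index.add(krisgeusNodeId, map(new Object[] {"name", "krisgeus"}));

        database.createRelationship(
                krisgeusNodeId, //from node
                fzkNodeId, //to node
                ConnectionTypes.FOLLOWS, //relationship type
                map(new Object[] {"retweets", 3})); //relationship properties

        // Make the indexed entries visible before shutting down.
        index.flush();

        // Shut down the index provider first, then the inserter.
        provider.shutdown();
        database.shutdown();
    }
}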
30M nodes + 250M edges, < 30 minutes
            (if your graph fits in memory)
Cypher graph query language

start a = node:allnodes('name:"fzk"')
match p = a-[r]-b
return p




  All relationships of the node with name=fzk, in either direction
Cypher graph query language

    start a = node:allnodes('name:"fzk"')
    match p = a-[r]-b
    where any(x in r.amounts where x > 500)
    return p




  All relationships of the node with name=fzk where any element in the
  relationship's array property amounts is greater than 500
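For readers more at home in Java than in Cypher, the `any(x in r.amounts where x > 500)` predicate behaves like `anyMatch` over the array property. The sketch below applies that filter to an in-memory list. The `Relationship` record, the user names other than fzk, and the sample amounts are made up for illustration; records require Java 16 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class AnyPredicate {
    /** A relationship with an array property, as in the query above. */
    record Relationship(String from, String to, long[] amounts) {}

    /** True if at least one element is strictly greater than the threshold. */
    static boolean anyAbove(long[] amounts, long threshold) {
        return Arrays.stream(amounts).anyMatch(x -> x > threshold);
    }

    /** Keeps only the relationships whose amounts pass the predicate. */
    static List<Relationship> filter(List<Relationship> rels, long threshold) {
        return rels.stream()
                .filter(r -> anyAbove(r.amounts(), threshold))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical data: only the first relationship has an amount above 500.
        List<Relationship> rels = List.of(
                new Relationship("fzk", "alice", new long[] {100, 750}),
                new Relationship("fzk", "bob", new long[] {10, 20}));
        for (Relationship r : filter(rels, 500)) {
            System.out.println(r.from() + " - " + r.to());
        }
    }
}
```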
Cypher graph query language

     start a = node:allnodes('name:"fzk"'),
     b = node:allnodes('name:"krisgeus"')
     match p = a-[:FOLLOWS*1..3]-b
     return p




All paths with a length between 1 and 3 (inclusive) between the
 node with name=fzk and the node with name=krisgeus, using
 only relationships of type FOLLOWS (in either direction)
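To show what a variable-length match such as `*1..3` has to compute, here is a Java sketch of a depth-limited search over an in-memory edge list. It follows edges in either direction and, as a design choice of this sketch, never reuses an edge within one path. It illustrates the idea only and makes no claim about how Neo4j evaluates the query; all names and the sample edges are hypothetical, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

class BoundedPaths {
    /** One FOLLOWS edge; direction is ignored during traversal. */
    record Edge(String from, String to) {}

    /** All paths from start to end with between min and max hops. */
    static List<List<String>> paths(List<Edge> edges, String start, String end,
                                    int min, int max) {
        List<List<String>> found = new ArrayList<>();
        List<String> current = new ArrayList<>();
        current.add(start);
        walk(edges, new boolean[edges.size()], current, end, min, max, found);
        return found;
    }

    private static void walk(List<Edge> edges, boolean[] used, List<String> current,
                             String end, int min, int max, List<List<String>> found) {
        int hops = current.size() - 1;
        String here = current.get(current.size() - 1);
        if (hops >= min && here.equals(end)) {
            found.add(new ArrayList<>(current));
        }
        if (hops == max) {
            return; // depth limit reached
        }
        for (int i = 0; i < edges.size(); i++) {
            if (used[i]) {
                continue; // each edge appears at most once per path
            }
            Edge e = edges.get(i);
            String next;
            if (e.from().equals(here)) {
                next = e.to();
            } else if (e.to().equals(here)) {
                next = e.from();
            } else {
                continue;
            }
            used[i] = true;
            current.add(next);
            walk(edges, used, current, end, min, max, found);
            current.remove(current.size() - 1);
            used[i] = false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical graph: one direct edge and one detour via "alice".
        List<Edge> edges = List.of(
                new Edge("krisgeus", "fzk"),
                new Edge("fzk", "alice"),
                new Edge("alice", "krisgeus"));
        for (List<String> p : paths(edges, "fzk", "krisgeus", 1, 3)) {
            System.out.println(String.join(" - ", p));
        }
    }
}
```

The number of candidate paths grows quickly with the upper bound, which is one reason to keep bounds such as `1..3` small on dense graphs.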
Cypher graph query language

  start a = node:allnodes('name:"fzk"'),
  b = node:allnodes('name:"krisgeus"')
  match p = a<-[:FOLLOWS]-x-[:FOLLOWS]->b
  return x




All people who follow both fzk and krisgeus. Note the
 indication of direction in the relationship predicates.
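The pattern `a<-[:FOLLOWS]-x-[:FOLLOWS]->b` asks for every x with an outgoing FOLLOWS edge to both a and b, which amounts to intersecting the two follower sets. A minimal Java sketch of that reading follows. The `Follows` record and the user names other than fzk and krisgeus are hypothetical; records require Java 16 or later.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CommonFollowers {
    /** A directed edge: follower -[:FOLLOWS]-> followed. */
    record Follows(String follower, String followed) {}

    /** Everyone with an outgoing FOLLOWS edge to the given user. */
    static Set<String> followersOf(List<Follows> edges, String user) {
        Set<String> result = new TreeSet<>();
        for (Follows f : edges) {
            if (f.followed().equals(user)) {
                result.add(f.follower());
            }
        }
        return result;
    }

    /** Users who follow both a and b: the intersection of the two sets. */
    static Set<String> followBoth(List<Follows> edges, String a, String b) {
        Set<String> result = followersOf(edges, a);
        result.retainAll(followersOf(edges, b));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data: alice follows both, bob only follows fzk.
        List<Follows> edges = List.of(
                new Follows("alice", "fzk"),
                new Follows("alice", "krisgeus"),
                new Follows("bob", "fzk"),
                new Follows("fzk", "krisgeus"));
        System.out.println(followBoth(edges, "fzk", "krisgeus"));
    }
}
```

Direction matters here: fzk follows krisgeus in the sample data, but that edge points away from fzk, so fzk is not counted as a follower of both.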
http://thejit.org/
Toy problem and viewer source code:
   https://github.com/friso/graphs




   Questions?
                        Friso van Vollenhoven
                                         @fzk
                 fvanvollenhoven@xebia.com

More Related Content

What's hot

python chapter 1
python chapter 1python chapter 1
python chapter 1Raghu nath
 
Python chapter 2
Python chapter 2Python chapter 2
Python chapter 2Raghu nath
 
Go 프로그래밍 소개 - 장재휴, DomainDriven커뮤니티
Go 프로그래밍 소개 - 장재휴, DomainDriven커뮤니티Go 프로그래밍 소개 - 장재휴, DomainDriven커뮤니티
Go 프로그래밍 소개 - 장재휴, DomainDriven커뮤니티JaeYeoul Ahn
 
刘平川:【用户行为分析】Marmot实践
刘平川:【用户行为分析】Marmot实践刘平川:【用户行为分析】Marmot实践
刘平川:【用户行为分析】Marmot实践taobao.com
 
Groovy ネタ NGK 忘年会2009 ライトニングトーク
Groovy ネタ NGK 忘年会2009 ライトニングトークGroovy ネタ NGK 忘年会2009 ライトニングトーク
Groovy ネタ NGK 忘年会2009 ライトニングトークTsuyoshi Yamamoto
 
Is HTML5 Ready? (workshop)
Is HTML5 Ready? (workshop)Is HTML5 Ready? (workshop)
Is HTML5 Ready? (workshop)Remy Sharp
 
The Ring programming language version 1.8 book - Part 28 of 202
The Ring programming language version 1.8 book - Part 28 of 202The Ring programming language version 1.8 book - Part 28 of 202
The Ring programming language version 1.8 book - Part 28 of 202Mahmoud Samir Fayed
 
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...MongoDB
 
The MongoDB Driver for F#
The MongoDB Driver for F#The MongoDB Driver for F#
The MongoDB Driver for F#MongoDB
 
Replication and Replica Sets
Replication and Replica SetsReplication and Replica Sets
Replication and Replica SetsMongoDB
 
Python 내장 함수
Python 내장 함수Python 내장 함수
Python 내장 함수용 최
 
Replication and Replica Sets
Replication and Replica SetsReplication and Replica Sets
Replication and Replica SetsMongoDB
 
GeeCON Prague 2014 - Metaprogramming with Groovy
GeeCON Prague 2014 - Metaprogramming with GroovyGeeCON Prague 2014 - Metaprogramming with Groovy
GeeCON Prague 2014 - Metaprogramming with GroovyIván López Martín
 
Gpu programming with java
Gpu programming with javaGpu programming with java
Gpu programming with javaGary Sieling
 
The Ring programming language version 1.3 book - Part 33 of 88
The Ring programming language version 1.3 book - Part 33 of 88The Ring programming language version 1.3 book - Part 33 of 88
The Ring programming language version 1.3 book - Part 33 of 88Mahmoud Samir Fayed
 
Q-learning and Deep Q Network (Reinforcement Learning)
Q-learning and Deep Q Network (Reinforcement Learning)Q-learning and Deep Q Network (Reinforcement Learning)
Q-learning and Deep Q Network (Reinforcement Learning)Thom Lane
 
Clustering com numpy e cython
Clustering com numpy e cythonClustering com numpy e cython
Clustering com numpy e cythonAnderson Dantas
 
(Greach 2015) Dsl'ing your Groovy
(Greach 2015) Dsl'ing your Groovy(Greach 2015) Dsl'ing your Groovy
(Greach 2015) Dsl'ing your GroovyAlonso Torres
 

What's hot (20)

python chapter 1
python chapter 1python chapter 1
python chapter 1
 
Python chapter 2
Python chapter 2Python chapter 2
Python chapter 2
 
Go 프로그래밍 소개 - 장재휴, DomainDriven커뮤니티
Go 프로그래밍 소개 - 장재휴, DomainDriven커뮤니티Go 프로그래밍 소개 - 장재휴, DomainDriven커뮤니티
Go 프로그래밍 소개 - 장재휴, DomainDriven커뮤니티
 
刘平川:【用户行为分析】Marmot实践
刘平川:【用户行为分析】Marmot实践刘平川:【用户行为分析】Marmot实践
刘平川:【用户行为分析】Marmot实践
 
Groovy ネタ NGK 忘年会2009 ライトニングトーク
Groovy ネタ NGK 忘年会2009 ライトニングトークGroovy ネタ NGK 忘年会2009 ライトニングトーク
Groovy ネタ NGK 忘年会2009 ライトニングトーク
 
Is HTML5 Ready? (workshop)
Is HTML5 Ready? (workshop)Is HTML5 Ready? (workshop)
Is HTML5 Ready? (workshop)
 
The Ring programming language version 1.8 book - Part 28 of 202
The Ring programming language version 1.8 book - Part 28 of 202The Ring programming language version 1.8 book - Part 28 of 202
The Ring programming language version 1.8 book - Part 28 of 202
 
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...
 
The MongoDB Driver for F#
The MongoDB Driver for F#The MongoDB Driver for F#
The MongoDB Driver for F#
 
Groovy
GroovyGroovy
Groovy
 
Replication and Replica Sets
Replication and Replica SetsReplication and Replica Sets
Replication and Replica Sets
 
Python 내장 함수
Python 내장 함수Python 내장 함수
Python 내장 함수
 
DDS-20m
DDS-20mDDS-20m
DDS-20m
 
Replication and Replica Sets
Replication and Replica SetsReplication and Replica Sets
Replication and Replica Sets
 
GeeCON Prague 2014 - Metaprogramming with Groovy
GeeCON Prague 2014 - Metaprogramming with GroovyGeeCON Prague 2014 - Metaprogramming with Groovy
GeeCON Prague 2014 - Metaprogramming with Groovy
 
Gpu programming with java
Gpu programming with javaGpu programming with java
Gpu programming with java
 
The Ring programming language version 1.3 book - Part 33 of 88
The Ring programming language version 1.3 book - Part 33 of 88The Ring programming language version 1.3 book - Part 33 of 88
The Ring programming language version 1.3 book - Part 33 of 88
 
Q-learning and Deep Q Network (Reinforcement Learning)
Q-learning and Deep Q Network (Reinforcement Learning)Q-learning and Deep Q Network (Reinforcement Learning)
Q-learning and Deep Q Network (Reinforcement Learning)
 
Clustering com numpy e cython
Clustering com numpy e cythonClustering com numpy e cython
Clustering com numpy e cython
 
(Greach 2015) Dsl'ing your Groovy
(Greach 2015) Dsl'ing your Groovy(Greach 2015) Dsl'ing your Groovy
(Greach 2015) Dsl'ing your Groovy
 

Viewers also liked

Class graph neo4j and software metrics
Class graph neo4j and software metricsClass graph neo4j and software metrics
Class graph neo4j and software metricsjexp
 
PMCD Fall 2015 Newsletter
PMCD Fall 2015 NewsletterPMCD Fall 2015 Newsletter
PMCD Fall 2015 NewsletterSandeep Raju
 
Xebicon 2015 - Go Data Driven NOW!
Xebicon 2015 - Go Data Driven NOW!Xebicon 2015 - Go Data Driven NOW!
Xebicon 2015 - Go Data Driven NOW!fvanvollenhoven
 
Divolte Collector - meetup presentation
Divolte Collector - meetup presentationDivolte Collector - meetup presentation
Divolte Collector - meetup presentationfvanvollenhoven
 
Unidirectional Security, Andrew Ginter of Waterfall Security
Unidirectional Security, Andrew Ginter of Waterfall Security Unidirectional Security, Andrew Ginter of Waterfall Security
Unidirectional Security, Andrew Ginter of Waterfall Security Digital Bond
 
Security best practices for hyper v and server virtualisation [svr307]
Security best practices for hyper v and server virtualisation [svr307]Security best practices for hyper v and server virtualisation [svr307]
Security best practices for hyper v and server virtualisation [svr307]Louis Göhl
 
Introduction To Work Item Customisation
Introduction To Work Item CustomisationIntroduction To Work Item Customisation
Introduction To Work Item Customisationwbarthol
 
SQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerSQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerMichael Rys
 
Understanding AzMan In Hyper-V
Understanding AzMan In Hyper-VUnderstanding AzMan In Hyper-V
Understanding AzMan In Hyper-VLai Yoong Seng
 
Windows Server 2008 R2 Hyper-V SP1 Component Architecture
Windows Server 2008 R2 Hyper-V SP1 Component Architecture Windows Server 2008 R2 Hyper-V SP1 Component Architecture
Windows Server 2008 R2 Hyper-V SP1 Component Architecture Tũi Wichets
 
Rodc features
Rodc featuresRodc features
Rodc featurespothurajr
 
Getting Started With The TFS API
Getting Started With The TFS APIGetting Started With The TFS API
Getting Started With The TFS APIwbarthol
 
Managing Hyper-V With PowerShell
Managing Hyper-V With PowerShellManaging Hyper-V With PowerShell
Managing Hyper-V With PowerShellRavikanth Chaganti
 
Attacking Web Applications
Attacking Web ApplicationsAttacking Web Applications
Attacking Web ApplicationsSasha Goldshtein
 
Storage and hyper v - the choices you can make and the things you need to kno...
Storage and hyper v - the choices you can make and the things you need to kno...Storage and hyper v - the choices you can make and the things you need to kno...
Storage and hyper v - the choices you can make and the things you need to kno...Louis Göhl
 
Software development manager performance appraisal
Software development manager performance appraisalSoftware development manager performance appraisal
Software development manager performance appraisalmartinjack417
 
Hyper-V Best Practices & Tips and Tricks
Hyper-V Best Practices & Tips and TricksHyper-V Best Practices & Tips and Tricks
Hyper-V Best Practices & Tips and TricksAmit Gatenyo
 
DeltaV Development Systems in a Virtualized Environment
DeltaV Development Systems in a Virtualized EnvironmentDeltaV Development Systems in a Virtualized Environment
DeltaV Development Systems in a Virtualized EnvironmentEmerson Exchange
 

Viewers also liked (20)

Neo4j Jízdomat
Neo4j JízdomatNeo4j Jízdomat
Neo4j Jízdomat
 
Class graph neo4j and software metrics
Class graph neo4j and software metricsClass graph neo4j and software metrics
Class graph neo4j and software metrics
 
PMCD Fall 2015 Newsletter
PMCD Fall 2015 NewsletterPMCD Fall 2015 Newsletter
PMCD Fall 2015 Newsletter
 
RuG Guest Lecture
RuG Guest LectureRuG Guest Lecture
RuG Guest Lecture
 
Xebicon 2015 - Go Data Driven NOW!
Xebicon 2015 - Go Data Driven NOW!Xebicon 2015 - Go Data Driven NOW!
Xebicon 2015 - Go Data Driven NOW!
 
Divolte Collector - meetup presentation
Divolte Collector - meetup presentationDivolte Collector - meetup presentation
Divolte Collector - meetup presentation
 
Unidirectional Security, Andrew Ginter of Waterfall Security
Unidirectional Security, Andrew Ginter of Waterfall Security Unidirectional Security, Andrew Ginter of Waterfall Security
Unidirectional Security, Andrew Ginter of Waterfall Security
 
Security best practices for hyper v and server virtualisation [svr307]
Security best practices for hyper v and server virtualisation [svr307]Security best practices for hyper v and server virtualisation [svr307]
Security best practices for hyper v and server virtualisation [svr307]
 
Introduction To Work Item Customisation
Introduction To Work Item CustomisationIntroduction To Work Item Customisation
Introduction To Work Item Customisation
 
SQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerSQL and NoSQL in SQL Server
SQL and NoSQL in SQL Server
 
Understanding AzMan In Hyper-V
Understanding AzMan In Hyper-VUnderstanding AzMan In Hyper-V
Understanding AzMan In Hyper-V
 
Windows Server 2008 R2 Hyper-V SP1 Component Architecture
Windows Server 2008 R2 Hyper-V SP1 Component Architecture Windows Server 2008 R2 Hyper-V SP1 Component Architecture
Windows Server 2008 R2 Hyper-V SP1 Component Architecture
 
Rodc features
Rodc featuresRodc features
Rodc features
 
Getting Started With The TFS API
Getting Started With The TFS APIGetting Started With The TFS API
Getting Started With The TFS API
 
Managing Hyper-V With PowerShell
Managing Hyper-V With PowerShellManaging Hyper-V With PowerShell
Managing Hyper-V With PowerShell
 
Attacking Web Applications
Attacking Web ApplicationsAttacking Web Applications
Attacking Web Applications
 
Storage and hyper v - the choices you can make and the things you need to kno...
Storage and hyper v - the choices you can make and the things you need to kno...Storage and hyper v - the choices you can make and the things you need to kno...
Storage and hyper v - the choices you can make and the things you need to kno...
 
Software development manager performance appraisal
Software development manager performance appraisalSoftware development manager performance appraisal
Software development manager performance appraisal
 
Hyper-V Best Practices & Tips and Tricks
Hyper-V Best Practices & Tips and TricksHyper-V Best Practices & Tips and Tricks
Hyper-V Best Practices & Tips and Tricks
 
DeltaV Development Systems in a Virtualized Environment
DeltaV Development Systems in a Virtualized EnvironmentDeltaV Development Systems in a Virtualized Environment
DeltaV Development Systems in a Virtualized Environment
 

Similar to Network analysis using Hadoop and Neo4j

Leap Ahead with Redis 6.2
Leap Ahead with Redis 6.2Leap Ahead with Redis 6.2
Leap Ahead with Redis 6.2VMware Tanzu
 
Stupid Awesome Python Tricks
Stupid Awesome Python TricksStupid Awesome Python Tricks
Stupid Awesome Python TricksBryan Helmig
 
Happy Go Programming
Happy Go ProgrammingHappy Go Programming
Happy Go ProgrammingLin Yo-An
 
1-Object and Data Structures.pptx
1-Object and Data Structures.pptx1-Object and Data Structures.pptx
1-Object and Data Structures.pptxRobNieves1
 
Python basic
Python basic Python basic
Python basic sewoo lee
 
Data Mangling with mongoDB the Right Way [PyData London] 2016]
Data Mangling with mongoDB the Right Way [PyData London] 2016]Data Mangling with mongoDB the Right Way [PyData London] 2016]
Data Mangling with mongoDB the Right Way [PyData London] 2016]Alexander Hendorf
 
The Rust Programming Language: an Overview
The Rust Programming Language: an OverviewThe Rust Programming Language: an Overview
The Rust Programming Language: an OverviewRoberto Casadei
 
js+ts fullstack typescript with react and express.pdf
js+ts fullstack typescript with react and express.pdfjs+ts fullstack typescript with react and express.pdf
js+ts fullstack typescript with react and express.pdfNuttavutThongjor1
 
fullstack typescript with react and express.pdf
fullstack typescript with react and express.pdffullstack typescript with react and express.pdf
fullstack typescript with react and express.pdfNuttavutThongjor1
 
Opentalk at Large - StS 2005
Opentalk at Large - StS 2005Opentalk at Large - StS 2005
Opentalk at Large - StS 2005Martin Kobetic
 
GECon2017_Cpp a monster that no one likes but that will outlast them all _Ya...
GECon2017_Cpp  a monster that no one likes but that will outlast them all _Ya...GECon2017_Cpp  a monster that no one likes but that will outlast them all _Ya...
GECon2017_Cpp a monster that no one likes but that will outlast them all _Ya...GECon_Org Team
 

Similar to Network analysis using Hadoop and Neo4j (20)

Introduction to Groovy
Introduction to GroovyIntroduction to Groovy
Introduction to Groovy
 
Leap Ahead with Redis 6.2
Leap Ahead with Redis 6.2Leap Ahead with Redis 6.2
Leap Ahead with Redis 6.2
 
Lab 13
Lab 13Lab 13
Lab 13
 
Lập trình Python cơ bản
Lập trình Python cơ bảnLập trình Python cơ bản
Lập trình Python cơ bản
 
Stupid Awesome Python Tricks
Stupid Awesome Python TricksStupid Awesome Python Tricks
Stupid Awesome Python Tricks
 
Data type in c
Data type in cData type in c
Data type in c
 
Data type2 c
Data type2 cData type2 c
Data type2 c
 
Happy Go Programming
Happy Go ProgrammingHappy Go Programming
Happy Go Programming
 
Arrays
ArraysArrays
Arrays
 
1-Object and Data Structures.pptx
1-Object and Data Structures.pptx1-Object and Data Structures.pptx
1-Object and Data Structures.pptx
 
Python basic
Python basic Python basic
Python basic
 
Data Mangling with mongoDB the Right Way [PyData London] 2016]
Data Mangling with mongoDB the Right Way [PyData London] 2016]Data Mangling with mongoDB the Right Way [PyData London] 2016]
Data Mangling with mongoDB the Right Way [PyData London] 2016]
 
360|iDev
360|iDev360|iDev
360|iDev
 
The Rust Programming Language: an Overview
The Rust Programming Language: an OverviewThe Rust Programming Language: an Overview
The Rust Programming Language: an Overview
 
ts+js
ts+jsts+js
ts+js
 
js+ts fullstack typescript with react and express.pdf
js+ts fullstack typescript with react and express.pdfjs+ts fullstack typescript with react and express.pdf
js+ts fullstack typescript with react and express.pdf
 
fullstack typescript with react and express.pdf
fullstack typescript with react and express.pdffullstack typescript with react and express.pdf
fullstack typescript with react and express.pdf
 
Opentalk at Large - StS 2005
Opentalk at Large - StS 2005Opentalk at Large - StS 2005
Opentalk at Large - StS 2005
 
FalcorJS
FalcorJSFalcorJS
FalcorJS
 
GECon2017_Cpp a monster that no one likes but that will outlast them all _Ya...
GECon2017_Cpp  a monster that no one likes but that will outlast them all _Ya...GECon2017_Cpp  a monster that no one likes but that will outlast them all _Ya...
GECon2017_Cpp a monster that no one likes but that will outlast them all _Ya...
 

More from fvanvollenhoven

Prototyping online ML with Divolte Collector
Prototyping online ML with Divolte CollectorPrototyping online ML with Divolte Collector
Prototyping online ML with Divolte Collectorfvanvollenhoven
 
Apache Spark talk @ The Amsterdam Applied Machine Learning meetup group
Apache Spark talk @ The Amsterdam Applied Machine Learning meetup groupApache Spark talk @ The Amsterdam Applied Machine Learning meetup group
Apache Spark talk @ The Amsterdam Applied Machine Learning meetup groupfvanvollenhoven
 
JFall 2011 no sql workshop
JFall 2011 no sql workshopJFall 2011 no sql workshop
JFall 2011 no sql workshopfvanvollenhoven
 
GOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x HadoopGOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x Hadoopfvanvollenhoven
 
Hadoop, HDFS and MapReduce
Hadoop, HDFS and MapReduceHadoop, HDFS and MapReduce
Hadoop, HDFS and MapReducefvanvollenhoven
 

More from fvanvollenhoven (6)

Prototyping online ML with Divolte Collector
Prototyping online ML with Divolte CollectorPrototyping online ML with Divolte Collector
Prototyping online ML with Divolte Collector
 
Apache Spark talk @ The Amsterdam Applied Machine Learning meetup group
Apache Spark talk @ The Amsterdam Applied Machine Learning meetup groupApache Spark talk @ The Amsterdam Applied Machine Learning meetup group
Apache Spark talk @ The Amsterdam Applied Machine Learning meetup group
 
JFall 2011 no sql workshop
JFall 2011 no sql workshopJFall 2011 no sql workshop
JFall 2011 no sql workshop
 
GOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x HadoopGOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x Hadoop
 
Hadoop, HDFS and MapReduce
Hadoop, HDFS and MapReduceHadoop, HDFS and MapReduce
Hadoop, HDFS and MapReduce
 
Berlin Buzzwords preso
Berlin Buzzwords presoBerlin Buzzwords preso
Berlin Buzzwords preso
 

Recently uploaded

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
Network analysis using Hadoop and Neo4j

  • 1. Network analysis using Hadoop and Neo4j. Friso van Vollenhoven, @fzk, fvanvollenhoven@xebia.com
  • 3. [Diagram: a Twitter network around @fzk; edges are labeled "follows" and "RT" (retweets, e.g. "RT: @krisgeus...")]
  • 4.
  • 5. [Diagram: edges labeled "goes to"]
  • 6. See any meaningful patterns?
  • 7. A toy problem...
       TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21|1299 6461 9318 38091|EGP|195.66.224.97|0|0||NAG||
       [Diagram: the four ASes on this line's path drawn as graph nodes: AS1299, AS6461, AS9318, AS38091]
  • 8. [Pipeline diagram over a backdrop of repeated TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21 lines]
       raw data -> transform -> nodes.txt + edges.txt -> enrich? -> enriched-nodes.txt + enriched-edges.txt -> import -> query / interact
  • 9. [Flow plan diagram; the operator names (Each, Every, GroupBy, Hfs) are Cascading's]
       Source: GlobHfs[/Users/friso/Downloads/bview/alltxt.txt]
         fields [{14}:'proto', 'time', 'type', 'peerip', 'peeras', 'prefix', 'path', 'origin', 'nexthop', 'localpref', 'MED', 'community', 'AAGG', 'aggregator']
       Branch 'nodes', fields [{2}:'id', 'name']:
         Each('nodes')[PathToNodes[decl:'id', 'name']]
         Each('nodes')[FilterPartialDuplicates[decl:'id', 'name']]
         GroupBy('nodes')[by:['id']]
         Every('nodes')[First[decl:'id', 'name']]
         Hfs['TextDelimited[['id', 'name']]']['/tmp/nodes']']
       Branch 'edges', fields [{3}:'from', 'to', 'updatecount']:
         Each('edges')[PathToEdges[decl:'from', 'to', 'updatecount']]
         GroupBy('edges')[by:['from', 'to']]
         Every('edges')[Sum[decl:'updatecount'][args:1]]
         Hfs['TextDelimited[['from', 'to', 'updatecount']]']['/tmp/edges']']
  • 11. nodes.txt:
         1      AS1 LVLT-1 - Level 3 Communications, Inc.
         10     AS10 CSNET-EXT-AS - CSNET Coordination and Information Center (CSNET-CIC)
         100    AS100 FMC-CTC - FMC Central Engineering Laboratories
         1000   AS1000 GONET-ASN-17 - GONET
         10000  AS10000 NCM Nagasaki Cable Media Inc.
         10001  AS10001 MICSNET Mics Network Corporation
         10002  AS10002 ICT IGAUENO CABLE TELEVISION CO.,LTD
         10003  AS10003 OCT-NET Ogaki Cable Television Co.,Inc.
         10004  AS10004 AS-PHOENIX-J JIN Office Service Inc.
         10006  AS10006 SECOMTRUST SECOM Trust Systems Co.,Ltd.
         10010  AS10010 TOKAI TOKAI Communications Corporation
         10011  AS10011 ADVAN advanscope.inc
         10012  AS10012 FUSION Fusion Communications Corp.
        edges.txt (from, to, updatecount):
         1   21616  3
         1   3705   3
         1   2      3
         2   3      1
         3   4      2
         3   11488  2
         4   5      1
         10  10     2
         10  13227  2
         12  12     1
  • 12. // Imports are not shown on the slide; map(...) and stringMap(...) look like
        // static imports from Neo4j's MapUtil helper (an assumption).
        public class SillyImporter {
            private static enum ConnectionTypes implements RelationshipType {
                FOLLOWS;
            }
            public static void main(String[] args) {
                BatchInserter database = new BatchInserterImpl("/Users/friso/Desktop/graph.db");
                BatchInserterIndexProvider provider = new LuceneBatchInserterIndexProvider(database);
                BatchInserterIndex index = provider.nodeIndex("allnodes", stringMap(
                        "type", "fulltext",
                        IndexManager.PROVIDER, "lucene"
                ));
                long fzkNodeId = database.createNode(map(
                        new Object[] {"name", "fzk", "tweets", 25L} // node properties
                ));
                index.add(fzkNodeId, map(new Object[] {"name", "fzk"}));
                long krisgeusNodeId = database.createNode(map(
                        new Object[] {"name", "krisgeus", "tweets", 100L} // node properties
                ));
                index.add(krisgeusNodeId, map(new Object[] {"name", "krisgeus"}));
                database.createRelationship(
                        krisgeusNodeId,                   // from node
                        fzkNodeId,                        // to node
                        ConnectionTypes.FOLLOWS,          // relationship type
                        map(new Object[] {"retweets", 3}) // relationship properties
                );
                index.flush();
                provider.shutdown();
                database.shutdown();
            }
        }
  • 13. 30M nodes + 250M edges, < 30 minutes (if your graph fits in memory)
  • 14. Cypher graph query language
        start a = node:allnodes('name:"fzk"')
        match p = a-[r]-b
        return p
        All relationships from the node with name=fzk.
  • 15. Cypher graph query language
        start a = node:allnodes('name:"fzk"')
        match p = a-[r]-b
        where any(x in r.amounts where x > 500)
        return p
        All relationships from the node with name=fzk, where any element in the relationship's array property amounts is greater than 500.
  • 16. Cypher graph query language
        start a = node:allnodes('name:"fzk"'), b = node:allnodes('name:"krisgeus"')
        match p = a-[:FOLLOWS*1..3]-b
        return p
        All paths with a length between 1 and 3 (inclusive) between the node with name=fzk and the node with name=krisgeus, where every relationship has type FOLLOWS (in either direction).
  • 17. Cypher graph query language
        start a = node:allnodes('name:"fzk"'), b = node:allnodes('name:"krisgeus"')
        match p = a<-[:FOLLOWS]-x-[:FOLLOWS]->b
        return x
        All people who follow both fzk and krisgeus. Note the indication of direction in the relationship predicates.
  • 19.
  • 20. Toy problem and viewer source code: https://github.com/friso/graphs Questions? Friso van Vollenhoven @fzk fvanvollenhoven@xebia.com
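
The transform step on slides 7 to 9 turns each dump line's AS path into edges and sums a count per (from, to) pair. The following is a minimal plain-Java sketch of that idea, not the deck's Cascading job: the class name `PathToEdgesSketch` and both method names are hypothetical, and it assumes the path is the seventh pipe-delimited field (as in the field list on slide 9) and contains only plain AS numbers. It also assumes that the count goes up by one each time a pair appears on a line, which is a guess at what `updatecount` means.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PathToEdgesSketch {

    /** Returns the AS path of one pipe-delimited dump line (seventh field) as a list of AS numbers. */
    public static List<Long> asPath(String line) {
        String[] fields = line.split("\\|", -1);
        List<Long> path = new ArrayList<>();
        // Assumption: the path holds only space-separated numbers (no AS sets in braces).
        for (String as : fields[6].trim().split("\\s+")) {
            path.add(Long.parseLong(as));
        }
        return path;
    }

    /** Groups consecutive (from, to) pairs over all lines and counts occurrences per pair. */
    public static Map<String, Integer> edgeCounts(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            List<Long> path = asPath(line);
            for (int i = 0; i + 1 < path.size(); i++) {
                // Key is "from<TAB>to", mirroring the first two columns of edges.txt.
                String key = path.get(i) + "\t" + path.get(i + 1);
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21|"
                + "1299 6461 9318 38091|EGP|195.66.224.97|0|0||NAG||";
        // Feeding the same line twice gives every edge a count of 2.
        edgeCounts(List.of(line, line))
                .forEach((edge, count) -> System.out.println(edge + "\t" + count));
    }
}
```

With this pairing rule, a path that repeats an AS number yields a self-edge, which may be where rows such as `10 10 2` in the sample edges.txt come from.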
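
To make the Cypher patterns on slides 16 and 17 concrete, here is a rough in-memory Java sketch of what they ask for. It does not use Neo4j; `FollowGraphSketch` and its methods are hypothetical names. One simplification to be aware of: `pathsUpTo` enumerates simple paths (no node visited twice) with a depth-limited search, whereas Cypher applies its own uniqueness rules, so results can differ on graphs with cycles or parallel edges.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FollowGraphSketch {
    // follower -> accounts that follower follows (directed FOLLOWS edges)
    private final Map<String, Set<String>> follows = new HashMap<>();

    public void addFollows(String follower, String followed) {
        follows.computeIfAbsent(follower, k -> new HashSet<>()).add(followed);
    }

    /** Slide 17: every x with x-[:FOLLOWS]->a and x-[:FOLLOWS]->b. */
    public Set<String> followersOfBoth(String a, String b) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : follows.entrySet()) {
            if (e.getValue().contains(a) && e.getValue().contains(b)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Neighbours ignoring edge direction, as in the undirected pattern a-[:FOLLOWS]-b. */
    private Set<String> neighbours(String node) {
        Set<String> result = new TreeSet<>(follows.getOrDefault(node, Set.of()));
        for (Map.Entry<String, Set<String>> e : follows.entrySet()) {
            if (e.getValue().contains(node)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Slide 16 (simplified): simple paths from 'from' to 'to' with 1..maxLength edges. */
    public List<List<String>> pathsUpTo(String from, String to, int maxLength) {
        List<List<String>> found = new ArrayList<>();
        extend(new ArrayList<>(List.of(from)), to, maxLength, found);
        return found;
    }

    private void extend(List<String> current, String to, int remaining, List<List<String>> found) {
        if (remaining == 0) {
            return; // edge budget used up
        }
        String last = current.get(current.size() - 1);
        for (String next : neighbours(last)) {
            if (current.contains(next)) {
                continue; // keep paths simple: never revisit a node
            }
            current.add(next);
            if (next.equals(to)) {
                found.add(new ArrayList<>(current));
            } else {
                extend(current, to, remaining - 1, found);
            }
            current.remove(current.size() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        FollowGraphSketch g = new FollowGraphSketch();
        g.addFollows("krisgeus", "fzk");
        g.addFollows("alice", "fzk");      // "alice" and "bob" are made-up accounts
        g.addFollows("alice", "krisgeus");
        g.addFollows("bob", "alice");
        System.out.println(g.followersOfBoth("fzk", "krisgeus"));
        System.out.println(g.pathsUpTo("fzk", "krisgeus", 3));
    }
}
```

A graph database answers these from its index and adjacency structures rather than by scanning every entry as `neighbours` does here; the scan only keeps the sketch short.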