How to Win Friends and Influence People (with Hadoop)
Strata Conference New York
Sam Shah and Joseph Adler
October 25, 2012


©2012 LinkedIn Corporation. All Rights Reserved.
Sam Shah
Principal Engineer and Engineering Manager
www.linkedin.com/in/shahsam

Joseph Adler
Senior Data Scientist
www.linkedin.com/in/josephadler




LinkedIn is the leading professional network site

    175M+     LinkedIn members
    640M+     Professionals worldwide
    3,300M+   Worldwide workforce




Data rich

    175M+   Members
    175M    Member profiles


LinkedIn

    9.3B   Page views per quarter
    130M   Unique visitors



We have a lot of data.
We want to leverage this data to build products.
How do you make it easy to build products from data?




Products we have built on Hadoop




Building products from data

Examples of products built with data

    Year in Review Email
    Network Updates
    Skills and Endorsements
    People You May Know
    and more…




Year in Review

One of the most successful email messages ever.

    20% response rate
    5 clicks per responder


Network updates




People you may know




Skills and Endorsements




Building products from data

Hadoop is awesome for building products with data

    Lots of cheap storage
    Vast computational resources
    Lots of tools for processing data, learning from data
    Shared infrastructure
    Shared support services
    Runs on commodity hardware (or AWS)




Leverage

The marginal cost of building new products is low

    People You May Know (2 people)
    Skills and Endorsements (2 people)
    Year in Review (1 person, 1 month)
    Network Updates Stream (1 person, 3 months)

 Hadoop can empower small teams to build things




Turning data into products

How we build products




Year in Review


                  Steps to make the email

                    – Collect job changers
                    – Figure out who is connected
                      to them
                    – Rank job changes




Example: Year in Review


memberPosition = LOAD '$latest_positions' USING BinaryJSON;
memberWithPositionsChangedLastYear = FOREACH (
  FILTER memberPosition BY ((start_date >= $start_date_low ) AND
   (start_date <= $start_date_high))
) GENERATE member_id, start_date, end_date;

allConnections = LOAD '$latest_bidirectional_connections' USING BinaryJSON;

allConnectionsWithChange_nondistinct = FOREACH (
  JOIN memberWithPositionsChangedLastYear BY member_id,
  allConnections BY dest
) GENERATE allConnections::source AS source,
 allConnections::dest AS dest;

allConnectionsWithChange = DISTINCT
 allConnectionsWithChange_nondistinct;

memberinfowpics = LOAD '$latest_memberinfowpics' USING
 BinaryJSON;
pictures = FOREACH ( FILTER memberinfowpics BY
 ((cropped_picture_id is not null) AND
 ( (member_picture_privacy == 'N') OR
   (member_picture_privacy == 'E')))
) GENERATE member_id, cropped_picture_id, first_name as
  dest_first_name, last_name as dest_last_name;

resultPic = JOIN allConnectionsWithChange BY dest, pictures
 BY member_id;
connectionsWithChangeWithPic = FOREACH resultPic GENERATE
 allConnectionsWithChange::source AS source_id,
 allConnectionsWithChange::dest AS member_id,
 pictures::cropped_picture_id AS pic_id,
 pictures::dest_first_name AS dest_first_name,
 pictures::dest_last_name AS dest_last_name;

joinResult = JOIN connectionsWithChangeWithPic BY source_id,
  memberinfowpics BY member_id;
withName = FOREACH joinResult GENERATE
 connectionsWithChangeWithPic::source_id AS source_id,
 connectionsWithChangeWithPic::member_id AS member_id,
 connectionsWithChangeWithPic::dest_first_name as first_name,
 connectionsWithChangeWithPic::dest_last_name as last_name,
 connectionsWithChangeWithPic::pic_id AS pic_id,
 memberinfowpics::first_name AS firstName,
 memberinfowpics::last_name AS lastName,
 memberinfowpics::gmt_offset as gmt_offset,
 memberinfowpics::email_locale as email_locale,
 memberinfowpics::email_address as email_address;

resultGroup0 = GROUP withName BY (source_id, firstName,
 lastName, email_address, email_locale, gmt_offset);

-- get the count of results per recipient
resultGroupCount = FOREACH resultGroup0 GENERATE group ,
 withName as toomany, COUNT_STAR(withName) as num_results;

resultGroupPre = filter resultGroupCount by num_results > 2;
resultGroup = FOREACH resultGroupPre {
  withName = LIMIT toomany 64;
  GENERATE group, withName, num_results;
}

x_in_review_pre_out = FOREACH resultGroup GENERATE
 FLATTEN(group) as (source_id, firstName, lastName,
 email_address, email_locale, gmt_offset),
 withName.(member_id, pic_id, first_name, last_name) as
 jobChanger, '2011' as changeYear:chararray,
 num_results as num_results;

x_in_review = FOREACH x_in_review_pre_out GENERATE
 source_id as recipientID, gmt_offset as gmtOffset,
 firstName as first_name, lastName as last_name, email_address,
 email_locale,
 TOTUPLE( changeYear, source_id,firstName, lastName,
  num_results,jobChanger) as body;

rmf $xir;
STORE x_in_review INTO '$xir' USING BinaryJSON('recipientID');




Example: Year in Review

{body={num_results=80, lastName=Adler, changeYear=2011, firstName=Joseph, jobChanger=[{last_name=O'Connor, first_n
ame=Br?on, member_id=12562482, pic_id=/p/3/000/086/1bd/10ee035.jpg}, {last_name=Sundaram, first_name=Vivek, member
_id=6590171, pic_id=/p/3/000/0ae/354/36eb54c.jpg}, {last_name=Crane, first_name=Patrick, member_id=8628324, pic_id
=/p/1/000/09c/064/10191de.jpg}, {last_name=McLennan, first_name=Dan, member_id=10551114, pic_id=/p/2/000/09d/12f/1
47def1.jpg}, {last_name=Shaughnessy, first_name=Helen, member_id=2211035, pic_id=/p/3/000/06d/2ba/06a113c.jpg}, {l
ast_name=Chen, first_name=Richard, member_id=12800647, pic_id=/p/2/000/007/1ad/0fb84f9.jpg}, {last_name=Barba, fir
st_name=Troy, member_id=27577, pic_id=/p/2/000/0a2/3e9/3a83a33.jpg}, {last_name=Reed, first_name=Harper, member_id
=1865420, pic_id=/p/1/000/001/17b/396a2c3.jpg}, {last_name=Goldstein, first_name=Peter, member_id=205610, pic_id=/
p/2/000/01c/2e6/042999f.jpg}, {last_name=Koren, first_name=Yuval, member_id=2289577, pic_id=/p/1/000/02b/3d3/1fc36
27.jpg}, {last_name=Kiang, first_name=Andy, member_id=8347, pic_id=/p/1/000/063/115/1256f61.jpg}, {last_name=Green
field, first_name=Nick, member_id=82814545, pic_id=/p/1/000/068/39f/2080b8f.jpg}, {last_name=Murarka, first_name=B
ubba, member_id=174233, pic_id=/p/3/000/011/2c8/33837b8.jpg}, {last_name=Kutter, first_name=Norbert, member_id=310
933, pic_id=/p/3/000/005/0e2/02775a0.jpg}, {last_name=Ehrenberg, first_name=Roger, member_id=1662181, pic_id=/p/3/
000/038/066/3572baf.jpg}, {last_name=Coderre, CISSP, first_name=Rob, member_id=68521, pic_id=/p/1/000/088/0d5/2438
981.jpg}, {last_name=Stephens, first_name=Bradford, member_id=10900447, pic_id=/p/1/000/0ad/0dc/15f9df5.jpg}, {las
t_name=Shiau, first_name=Peter, member_id=300654, pic_id=/p/2/000/056/2a6/18938e3.jpg}, {last_name=Rajan, first_na
me=Arvind, member_id=1260, pic_id=/p/3/000/019/3f7/1e6e0f2.jpg}, {last_name=Bellister, first_name=Jesse, member_id
=25234604, pic_id=/p/3/000/00a/17d/1e2136b.jpg}, {last_name=Mohan, first_name=Viraj, member_id=56817108, pic_id=/p
/3/000/0cd/0a4/097527a.jpg}, {last_name=Ragade, first_name=Dhananjay, member_id=325284, pic_id=/p/3/000/000/035/05
04fe7.jpg}, {last_name=Richards, first_name=Jeff, member_id=16762, pic_id=/p/2/000/039/14e/081d1c7.jpg}, {last_nam
e=Wittenauer, first_name=Allen, member_id=3328775, pic_id=/p/3/000/08d/2a3/307b112.jpg}, {last_name=Porzak, first_
name=Jim, member_id=1708710, pic_id=/p/2/000/00d/109/0e4aa34.jpg}, {last_name=Ruma, first_name=Laurel, member_id=3
429732, pic_id=/p/1/000/01e/277/2bb115b.jpg}, {last_name=Higgins, first_name=Josh, member_id=1458792, pic_id=/p/1/
000/0c9/38b/1a24457.jpg}, {last_name=Benedict, first_name=Harvey, member_id=641340, pic_id=/p/3/000/0c6/1eb/2eb711
9.jpg}, {last_name=Lazarus, first_name=Brett, member_id=49965786, pic_id=/p/2/000/03b/04e/318d080.jpg}, {last_name
=Zhang, first_name=Simon, member_id=16323996, pic_id=/p/3/000/03f/0fe/35d4ded.jpg}, {last_name=Aspen, first_name=M
att, member_id=25240804, pic_id=/p/3/000/09b/371/22ec974.jpg}, {last_name=Herz, first_name=Erik, member_id=147604,
pic_id=/p/3/000/086/014/0fab4d6.jpg}, {last_name=Sanders, first_name=Geoffrey, member_id=340570, pic_id=/p/1/000/0
d1/2d1/37a76e6.jpg}, {last_name=Wright, first_name=Caleb, member_id=12798700, pic_id=/p/2/000/08c/337/2cc951a.jpg}
, {last_name=Parab, first_name=Guru, member_id=8915230, pic_id=/p/1/000/08a/257/051926a.jpg}, {last_name=Grossman,
first_name=Nick, member_id=12159520, pic_id=/p/2/000/005/2f3/1955f31.jpg}, {last_name=Skomoroch, first_name=Peter,
member_id=11642980, pic_id=/p/2/000/0b4/12d/31eadbe.jpg}, {last_name=Singh, first_name=Deepak, member_id=1246166,
pic_id=/p/1/000/042/3f5/369f807.jpg}, {last_name=Noakes, first_name=Geoffrey, member_id=3518726, pic_id=/p/3/000/0
05/3d7/3f67632.jpg}, {last_name=Scudiere, first_name=Robert, member_id=3965286, pic_id=/p/2/000/090/210/009a099.jp
g}, {last_name=Skyler, first_name=David, member_id=15377099, pic_id=/p/3/000/005/1bf/080b255.jpg}, {last_name=Shar
ma, first_name=Manu, member_id=19295378, pic_id=/p/3/000/0d4/11e/2176c30.jpg}, {last_name=Huang, first_name=Erica,
member_id=1808438, pic_id=/p/1/000/001/3a5/02ddd24.jpg}, {last_name=Ballotta, first_name=Pete, member_id=2011178,
pic_id=/p/2/000/0b6/08f/3a92357.jpg}, {last_name=Kast, first_name=Anton, member_id=1092686, pic_id=/p/1/000/054/0e
2/1a8efb2.jpg}, {last_name=Redfern, first_name=Joff, member_id=2849241, pic_id=/p/3/000/03d/28d/19f5688.jpg}, {las
t_name=Smith, first_name=Aaron, member_id=83470876, pic_id=/p/2/000/08c/27c/3cfe37a.jpg}, {last_name=Yadav, first_
name=Rishi, member_id=2097381, pic_id=/p/2/000/0c8/08d/3ab9006.jpg}, {last_name=Repass, first_name=Mike, member_id
=8633208, pic_id=/p/2/000/071/195/0bfc573.jpg}, {last_name=Dalvi, first_name=Anand, member_id=8388, pic_id=/p/1/00
0/003/3cd/3127384.jpg}, {last_name=Croll, first_name=Alistair, member_id=511218, pic_id=/p/2/000/029/0e5/1ebc076.j
pg}, {last_name=Tolman, first_name=Sarah, member_id=86040596, pic_id=/p/2/000/06f/1c9/1a7870e.jpg}, {last_name=Suv
arna, first_name=Sandeep, member_id=10558779, pic_id=/p/1/000/05b/2c7/0ec214a.jpg}, {last_name=Elliott-
McCrea, first_name=Kellan, member_id=163959, pic_id=/p/1/000/06b/2e8/2dbd3ae.jpg}, {last_name=Jatkar, first_name=T
arang, member_id=17763609, pic_id=/p/1/000/012/010/2e8ee7f.jpg}, {last_name=Brown, first_name=David, member_id=420
737, pic_id=/p/3/000/002/140/0b2dbcc.jpg}, {last_name=Patel, first_name=Jay, member_id=1179857, pic_id=/p/2/000/07
c/0b2/0365e91.jpg}, {last_name=Field, first_name=Dylan, member_id=13066037, pic_id=/p/2/000/0a5/3e2/1fb7f06.jpg},
{last_name=Patel, first_name=Sumeet, member_id=23402387, pic_id=/p/2/000/0bf/3ca/2ca5f1f.jpg}, {last_name=Ting, fi
rst_name=Moses, member_id=15624915, pic_id=/p/2/000/0ac/117/29e329a.jpg}, {last_name=Hinnach, first_name=Yassine,
member_id=1731285, pic_id=/p/3/000/000/035/330cce0.jpg}, {last_name=Das, first_name=Anshu, member_id=38878221, pic
_id=/p/3/000/0b2/1ac/15902f4.jpg}, {last_name=Mendelson, first_name=Jordan, member_id=8598415, pic_id=/p/3/000/032
/22a/1d2eaa6.jpg}, {last_name=Besbeas, first_name=Nick, member_id=12510505, pic_id=/p/3/000/093/167/34f5b6b.jpg}],
 source_id=256842}, first_name=Joseph, email_locale=en_US, last_name=Adler, gmtOffset=-
8, recipientID=256842, email_address=jadler@linkedin.com}

Each message requires a lot of data:

    – Header information (10 fields)
    – 4 fields per person, 64 people
    – That’s over 250 data fields for the final message

How do we turn this raw data into web content or email messages?
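
The field count quoted on this slide can be checked with a short calculation. The sketch below assumes the slide's figures (10 header fields, 4 fields per person, at most 64 people per message); the function and constant names are illustrative only and do not come from the talk.

```python
# Figures taken from the slide; names are hypothetical.
HEADER_FIELDS = 10       # header information per message
FIELDS_PER_PERSON = 4    # member_id, pic_id, first_name, last_name
MAX_PEOPLE = 64          # the Pig script limits job changers per recipient to 64


def max_fields_per_message(header=HEADER_FIELDS,
                           per_person=FIELDS_PER_PERSON,
                           people=MAX_PEOPLE):
    """Upper bound on the number of data fields in one message."""
    return header + per_person * people


print(max_fields_per_message())  # 266
```

With the full 64 people, a message carries 10 + 4 × 64 = 266 fields, consistent with "over 250".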




People you may know

[Figure: connection graph of Alice, Bob, and Carol]

More than 80% of connections come from triangle closing
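
Triangle closing can be illustrated with a small in-memory example: if Alice knows both Bob and Carol, then Bob and Carol are candidates to know each other. The sketch below is an assumption-laden toy, not the production algorithm from the talk; the function name, the dictionary-of-sets graph, and the ranking by shared-connection count are all illustrative choices.

```python
from collections import Counter


def triangle_closing_candidates(connections, member):
    """Rank non-connections of `member` by number of shared connections.

    `connections` maps each member to the set of members they are
    connected to (assumed symmetric).
    """
    friends = connections.get(member, set())
    counts = Counter()
    for friend in friends:
        for friend_of_friend in connections.get(friend, set()):
            # Skip the member themselves and people they already know.
            if friend_of_friend != member and friend_of_friend not in friends:
                counts[friend_of_friend] += 1
    # Most shared connections first; ties broken by name for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


connections = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice"},
    "Carol": {"Alice"},
}
print(triangle_closing_candidates(connections, "Bob"))  # [('Carol', 1)]
```

On a real member graph this two-hop expansion is what makes the problem expensive, which is one reason a batch system such as Hadoop is a natural fit.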



People you may know

[Figure: connection graph of Alice, Bob, Carol, Dave, and Eve. Labels: Organizational Overlap, Age, Distance, User Interactions, Ranked Matches, Results]



Skills and Endorsements




Tagging Skills




Skills and Endorsements




  A combination of
           – Propensity to know member
           – Propensity for member to have skill
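
One way to picture "a combination of" two propensities is to multiply them and rank suggestions by the result. The slide does not say how the two signals are combined, so the product below is purely an assumption for illustration, as are the function names and the input tuple layout.

```python
def endorsement_score(p_knows_member, p_has_skill):
    """Combine the two propensities; using the product is an assumed choice."""
    if not (0.0 <= p_knows_member <= 1.0 and 0.0 <= p_has_skill <= 1.0):
        raise ValueError("propensities must be in [0, 1]")
    return p_knows_member * p_has_skill


def rank_suggestions(candidates):
    """Order (member, skill) suggestions by combined score, best first.

    `candidates` is an iterable of (member, skill, p_knows, p_has_skill).
    """
    scored = [(endorsement_score(p_knows, p_skill), member, skill)
              for member, skill, p_knows, p_skill in candidates]
    return [(member, skill) for _, member, skill in sorted(scored, reverse=True)]


suggestions = rank_suggestions([
    ("Dave", "Hive", 0.9, 0.2),
    ("Bob", "Hadoop", 0.9, 0.8),
    ("Carol", "Pig", 0.5, 0.9),
])
print(suggestions)  # [('Bob', 'Hadoop'), ('Carol', 'Pig'), ('Dave', 'Hive')]
```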

Productionalization

Take something that runs once…

         … and run it multiple times
         … and serve it reliably at scale
         … and iterate quickly




Data Lifecycle

 Moving data around is the key problem

         1. Ingress
            Moving raw data from online systems to offline systems

         2. Workflow management
            Managing offline processes

         3. Egress
            Moving results from offline systems to online systems




Ingress

 Apache Kafka: Low latency publish/subscribe message bus
         – Common data format (Avro)
         – Changelog is the abstraction for integration
         – Schema evolution
                    Programmatic compatibility model
                    Explicit schema reviews
                    “O(1)” ETL

 K. Goodhope, J. Koshy, J. Kreps, N. Narkhede, R. Park, J. Rao, V.Y. Ye: Building
  LinkedIn’s Real-time Activity Data Pipeline. In IEEE Data Engineering Bulletin. Vol
  35, No. 2, June 2012.
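
A "programmatic compatibility model" for schema evolution can be sketched as a check that runs before a new schema version is accepted. The example below is a deliberately simplified assumption: schemas are plain dictionaries rather than real Avro schemas, the function name `can_read` is hypothetical, and only two rules are enforced (a newly added field needs a default, and an existing field may not change type). Real Avro schema resolution is richer, for example it allows some type promotions.

```python
def can_read(reader_schema, writer_schema):
    """Check whether data written with writer_schema can be read
    with reader_schema under the simplified rules described above.

    Each schema maps a field name to {"type": ..., "default": ...},
    where "default" is optional. Returns (ok, problems).
    """
    problems = []
    for name, spec in reader_schema.items():
        if name not in writer_schema:
            # Old data lacks this field, so the reader needs a default.
            if "default" not in spec:
                problems.append(f"new field '{name}' has no default")
        elif writer_schema[name]["type"] != spec["type"]:
            problems.append(f"field '{name}' changed type")
    return (not problems, problems)


v1 = {"member_id": {"type": "long"}, "page": {"type": "string"}}
v2_ok = {**v1, "locale": {"type": "string", "default": "en_US"}}
v2_bad = {**v1, "locale": {"type": "string"}}

print(can_read(v2_ok, v1))   # (True, [])
print(can_read(v2_bad, v1))  # (False, ["new field 'locale' has no default"])
```

A check like this is what lets downstream consumers keep reading old events after a producer changes its schema, without a hand-written conversion step per data source.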




Workflows

[Figure: a job flow built up over four slides. First Job A, Job B, and Job C; then a Push to Production step; then an additional Job X; then a Push to QA step]
Real workflows are complicated




Workflow Management: Azkaban

    Dependency management
    Diverse job types (Pig, Hive, Java, . . . )
    Scheduling
    Monitoring
    Configuration
    Retry/restart on failure
    Resource locking
    Log collection
    Historical information
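
Dependency management, the first item in the list above, amounts to running jobs in an order that respects a dependency graph. The sketch below uses Python's standard `graphlib` module and is not Azkaban's API or configuration format. The job names come from the workflow slides, but the edges between them are assumed, since the original diagram is not recoverable from the transcript.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each job maps to the set of jobs it depends on (edges are assumed).
flow = {
    "Job A": set(),
    "Job B": {"Job A"},
    "Job C": {"Job B"},
    "Job X": {"Job B"},
    "Push to Production": {"Job C"},
    "Push to QA": {"Job X"},
}


def execution_order(dependencies):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


for job in execution_order(flow):
    print(job)
```

Several orders are valid for this graph, so the printed sequence may differ between runs of a scheduler; what is guaranteed is that every job appears after all of its dependencies.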




Egress: Voldemort

 Distributed key/value store
 Easy to integrate into workflows
         – Off the shelf jobs to copy Voldemort Stores
         – One line command in Pig
    Cost of data load
    Data stored per node? Response time
    Fail-over
    How to transfer
    Versioning & rollback

 R. Sumbaly, J. Kreps, L. Gao, A. Feinberg, C. Soman, & S. Shah. Serving Large-
  Scale Batch Computed Data With Project Voldemort. In FAST 2012.
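
The "versioning & rollback" item can be pictured as a store that keeps whole datasets as versions: a batch job loads a complete new dataset, the store swaps it in, and a bad push is undone by falling back to the previous version. The class below is a toy in-memory sketch under those assumptions; the class and method names are hypothetical and are not Voldemort's API.

```python
class VersionedStore:
    """Toy read-only key/value store that keeps whole-dataset versions."""

    def __init__(self, keep=2):
        if keep < 2:
            raise ValueError("keep at least 2 versions to allow rollback")
        self._versions = []   # list of (version_id, data), oldest first
        self._keep = keep     # number of versions retained
        self._next_id = 1

    def swap(self, data):
        """Load a freshly computed dataset and make it the live version."""
        version_id = self._next_id
        self._next_id += 1
        self._versions.append((version_id, dict(data)))
        # Drop versions that fall outside the retention window.
        del self._versions[:-self._keep]
        return version_id

    def rollback(self):
        """Discard the live version and fall back to the previous one."""
        if len(self._versions) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._versions.pop()
        return self._versions[-1][0]

    def get(self, key, default=None):
        """Read a key from the live version."""
        if not self._versions:
            return default
        return self._versions[-1][1].get(key, default)


store = VersionedStore()
store.swap({"alice": ["bob", "carol"]})   # results of Monday's batch run
store.swap({"alice": ["dave"]})           # results of Tuesday's batch run
print(store.get("alice"))                 # ['dave']
store.rollback()                          # Tuesday's data turned out to be bad
print(store.get("alice"))                 # ['bob', 'carol']
```

Keeping the previous version around trades extra storage for the ability to undo a bad data push quickly, without rerunning the offline job.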




Recap

Why we use Hadoop

 Simple programming model
 Rich developer ecosystem
         – Languages: Pig, Hive, Crunch, Cascading, …
         – Libraries: Mahout, DataFu, ElephantBird, …
 Horizontal scalability, fault tolerance, multi-tenancy
         – Reliably process multiple TB of data
 Don’t need hardcore distributed systems engineers




Recap

How we use Hadoop

Open source projects started at LinkedIn:

 Getting data in: Kafka
 Building and running job flows: Azkaban
 Getting data out: Voldemort

This empowers data scientists and engineers to focus on new product
ideas, not infrastructure




Learning More

data.linkedin.com




©2012 LinkedIn Corporation. All Rights Reserved.   STRATA NY 2012 45

More Related Content

What's hot

Aesthetics and the Beauty of an Architecture
Aesthetics and the Beauty of an ArchitectureAesthetics and the Beauty of an Architecture
Aesthetics and the Beauty of an ArchitectureTom Scott
 
Doctrine MongoDB Object Document Mapper
Doctrine MongoDB Object Document MapperDoctrine MongoDB Object Document Mapper
Doctrine MongoDB Object Document MapperJonathan Wage
 
Symfony Day 2010 Doctrine MongoDB ODM
Symfony Day 2010 Doctrine MongoDB ODMSymfony Day 2010 Doctrine MongoDB ODM
Symfony Day 2010 Doctrine MongoDB ODMJonathan Wage
 
ZendCon2010 Doctrine MongoDB ODM
ZendCon2010 Doctrine MongoDB ODMZendCon2010 Doctrine MongoDB ODM
ZendCon2010 Doctrine MongoDB ODMJonathan Wage
 
Practical Ruby Projects with MongoDB - MongoSF
Practical Ruby Projects with MongoDB - MongoSFPractical Ruby Projects with MongoDB - MongoSF
Practical Ruby Projects with MongoDB - MongoSFAlex Sharp
 
Practical Ruby Projects (Alex Sharp)
Practical Ruby Projects (Alex Sharp)Practical Ruby Projects (Alex Sharp)
Practical Ruby Projects (Alex Sharp)MongoSF
 
Symfony2 from the Trenches
Symfony2 from the TrenchesSymfony2 from the Trenches
Symfony2 from the TrenchesJonathan Wage
 
MongoDB, PHP and the cloud - php cloud summit 2011
MongoDB, PHP and the cloud - php cloud summit 2011MongoDB, PHP and the cloud - php cloud summit 2011
MongoDB, PHP and the cloud - php cloud summit 2011Steven Francia
 
The Testing Games: Mocking, yay!
The Testing Games: Mocking, yay!The Testing Games: Mocking, yay!
The Testing Games: Mocking, yay!Donny Wals
 
Embedding a language into string interpolator
Embedding a language into string interpolatorEmbedding a language into string interpolator
Embedding a language into string interpolatorMichael Limansky
 
Doctrator Symfony Live 2011 San Francisco
Doctrator Symfony Live 2011 San FranciscoDoctrator Symfony Live 2011 San Francisco
Doctrator Symfony Live 2011 San Franciscopablodip
 
W3C XBL 2.0 and Widgets 1.0
W3C XBL 2.0 and Widgets 1.0 W3C XBL 2.0 and Widgets 1.0
W3C XBL 2.0 and Widgets 1.0 Marcos Caceres
 
Couchbase Korea User Group 2nd Meetup #2
Couchbase Korea User Group 2nd Meetup #2Couchbase Korea User Group 2nd Meetup #2
Couchbase Korea User Group 2nd Meetup #2won min jang
 
XML Binding Language 2.0
XML Binding Language 2.0XML Binding Language 2.0
XML Binding Language 2.0Marcos Caceres
 
Type safe embedded domain-specific languages
Type safe embedded domain-specific languagesType safe embedded domain-specific languages
Type safe embedded domain-specific languagesArthur Xavier
 
Real World CouchDB
Real World CouchDBReal World CouchDB
Real World CouchDBJohn Wood
 
MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know Norberto Leite
 

What's hot (20)

Aesthetics and the Beauty of an Architecture
Aesthetics and the Beauty of an ArchitectureAesthetics and the Beauty of an Architecture
Aesthetics and the Beauty of an Architecture
 
Doctrine MongoDB Object Document Mapper
Doctrine MongoDB Object Document MapperDoctrine MongoDB Object Document Mapper
Doctrine MongoDB Object Document Mapper
 
Symfony Day 2010 Doctrine MongoDB ODM
Symfony Day 2010 Doctrine MongoDB ODMSymfony Day 2010 Doctrine MongoDB ODM
Symfony Day 2010 Doctrine MongoDB ODM
 
ZendCon2010 Doctrine MongoDB ODM
ZendCon2010 Doctrine MongoDB ODMZendCon2010 Doctrine MongoDB ODM
ZendCon2010 Doctrine MongoDB ODM
 
Practical Ruby Projects with MongoDB - MongoSF
Practical Ruby Projects with MongoDB - MongoSFPractical Ruby Projects with MongoDB - MongoSF
Practical Ruby Projects with MongoDB - MongoSF
 
Practical Ruby Projects (Alex Sharp)
Practical Ruby Projects (Alex Sharp)Practical Ruby Projects (Alex Sharp)
Practical Ruby Projects (Alex Sharp)
 
Symfony2 from the Trenches
Symfony2 from the TrenchesSymfony2 from the Trenches
Symfony2 from the Trenches
 
MongoDB, PHP and the cloud - php cloud summit 2011
MongoDB, PHP and the cloud - php cloud summit 2011MongoDB, PHP and the cloud - php cloud summit 2011
MongoDB, PHP and the cloud - php cloud summit 2011
 
The Testing Games: Mocking, yay!
The Testing Games: Mocking, yay!The Testing Games: Mocking, yay!
The Testing Games: Mocking, yay!
 
Embedding a language into string interpolator
Embedding a language into string interpolatorEmbedding a language into string interpolator
Embedding a language into string interpolator
 
Curious case of Dust
Curious case of DustCurious case of Dust
Curious case of Dust
 
Doctrator Symfony Live 2011 San Francisco
Doctrator Symfony Live 2011 San FranciscoDoctrator Symfony Live 2011 San Francisco
Doctrator Symfony Live 2011 San Francisco
 
W3C XBL 2.0 and Widgets 1.0
W3C XBL 2.0 and Widgets 1.0 W3C XBL 2.0 and Widgets 1.0
W3C XBL 2.0 and Widgets 1.0
 
Jongo mongo sv
Jongo mongo svJongo mongo sv
Jongo mongo sv
 
Couchbase Korea User Group 2nd Meetup #2
Couchbase Korea User Group 2nd Meetup #2Couchbase Korea User Group 2nd Meetup #2
Couchbase Korea User Group 2nd Meetup #2
 
XML Binding Language 2.0
XML Binding Language 2.0XML Binding Language 2.0
XML Binding Language 2.0
 
Type safe embedded domain-specific languages
Type safe embedded domain-specific languagesType safe embedded domain-specific languages
Type safe embedded domain-specific languages
 
Real World CouchDB
Real World CouchDBReal World CouchDB
Real World CouchDB
 
MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know
 
Web 6 | JavaScript DOM
Web 6 | JavaScript DOMWeb 6 | JavaScript DOM
Web 6 | JavaScript DOM
 

Viewers also liked

Daniel Anguilla Options
Daniel Anguilla OptionsDaniel Anguilla Options
Daniel Anguilla Optionschglat
 
Adp 1003 power point tutorial 1
Adp 1003 power point tutorial 1Adp 1003 power point tutorial 1
Adp 1003 power point tutorial 1smartphones1
 
Stacey pvr
Stacey pvrStacey pvr
Stacey pvrchglat
 
St. Lucia Options
St. Lucia OptionsSt. Lucia Options
St. Lucia Optionschglat
 
Andrew St. Lucia Part 1
Andrew St. Lucia Part 1Andrew St. Lucia Part 1
Andrew St. Lucia Part 1chglat
 
Jamaica Options
Jamaica OptionsJamaica Options
Jamaica Optionschglat
 
Puerto Rico Honeymoon
Puerto Rico HoneymoonPuerto Rico Honeymoon
Puerto Rico Honeymoonchglat
 
Hyatt Ziva Cancun
Hyatt Ziva CancunHyatt Ziva Cancun
Hyatt Ziva Cancunchglat
 
Jill Hawaii
Jill HawaiiJill Hawaii
Jill Hawaiichglat
 
Dr Jacqueline Stevenson MoRKSS presentation 17 Oct 2013
Dr Jacqueline Stevenson MoRKSS presentation 17 Oct 2013 Dr Jacqueline Stevenson MoRKSS presentation 17 Oct 2013
Dr Jacqueline Stevenson MoRKSS presentation 17 Oct 2013 viscabarca
 
Eastenders, submarine, only fools and horses
Eastenders, submarine, only fools and horsesEastenders, submarine, only fools and horses
Eastenders, submarine, only fools and horsesGiggleMeTimbers
 
All about me ofa tahilanu
All about me ofa tahilanuAll about me ofa tahilanu
All about me ofa tahilanuotahilanu11316
 
Cabo Wedding Options
Cabo Wedding OptionsCabo Wedding Options
Cabo Wedding Optionschglat
 
Shahrukh Riviera Maya Honeymoon Options
Shahrukh Riviera Maya Honeymoon OptionsShahrukh Riviera Maya Honeymoon Options
Shahrukh Riviera Maya Honeymoon Optionschglat
 
Gwenna Costa Rica Options
Gwenna Costa Rica OptionsGwenna Costa Rica Options
Gwenna Costa Rica Optionschglat
 
Jill St. Lucia Options
Jill St. Lucia OptionsJill St. Lucia Options
Jill St. Lucia Optionschglat
 
Mexico Options
Mexico OptionsMexico Options
Mexico Optionschglat
 

How to win friends and influence people (with Hadoop)

  • 1. How to Win Friends and Influence People (with Hadoop). Strata Conference New York. Sam Shah and Joseph Adler. October 25, 2012. ©2012 LinkedIn Corporation. All Rights Reserved.
  • 2. Sam Shah, Principal Engineer and Engineering Manager, www.linkedin.com/in/shahsam. Joseph Adler, Senior Data Scientist, www.linkedin.com/in/josephadler.
  • 3. LinkedIn is the leading professional network site: 175M+ LinkedIn members; 640M+ worldwide professionals; 3,300M+ worldwide workforce.
  • 4. Data rich: 175M+ members; 175M member profiles.
  • 5. LinkedIn: 9.3B page views per quarter; 130M unique visitors.
  • 6. We have a lot of data.
  • 7. We have a lot of data. We want to leverage this data to build products.
  • 8. We have a lot of data. We want to leverage this data to build products. How do you make it easy to build products from data?
  • 9. Products we have built on Hadoop
  • 10. Building products from data. Examples of products built with data: Year in Review Email; Network Updates; Skills and Endorsements; People You May Know; and more…
  • 11. Year in Review: One of the most successful email messages ever. 20% response rate; 5 clicks per responder.
  • 12. Network updates
  • 13. People you may know
  • 14. Skills and Endorsements
  • 15. Building products from data. Hadoop is awesome for building products with data: lots of cheap storage; vast computational resources; lots of tools for processing data and learning from data; shared infrastructure; shared support services; runs on commodity hardware (or AWS).
  • 16. Leverage. The marginal cost of building new products is low: People You May Know (2 people); Skills and Endorsements (2 people); Year in Review (1 person, 1 month); Network Updates Stream (1 person, 3 months). Hadoop can empower small teams to build things.
  • 18. Turning data into products: How we build products
  • 19. Year in Review. Steps to make the email: – Collect job changers – Figure out who is connected to them – Rank job changes
  • 20. Example: Year in Review
      memberPosition = LOAD '$latest_positions' USING BinaryJSON;

      memberWithPositionsChangedLastYear = FOREACH (
          FILTER memberPosition BY ((start_date >= $start_date_low) AND
                                    (start_date <= $start_date_high))
        ) GENERATE member_id, start_date, end_date;

      allConnections = LOAD '$latest_bidirectional_connections' USING BinaryJSON;

      allConnectionsWithChange_nondistinct = FOREACH (
          JOIN memberWithPositionsChangedLastYear BY member_id,
               allConnections BY dest
        ) GENERATE allConnections::source AS source, allConnections::dest AS dest;

      allConnectionsWithChange = DISTINCT allConnectionsWithChange_nondistinct;

      memberinfowpics = LOAD '$latest_memberinfowpics' USING BinaryJSON;

      pictures = FOREACH ( FILTER memberinfowpics BY
          ((cropped_picture_id is not null) AND
           ((member_picture_privacy == 'N') OR
            (member_picture_privacy == 'E')))
        ) GENERATE member_id, cropped_picture_id, first_name as dest_first_name, last_name as dest_last_name;

      resultPic = JOIN allConnectionsWithChange BY dest, pictures BY member_id;

      connectionsWithChangeWithPic = FOREACH resultPic GENERATE
          allConnectionsWithChange::source AS source_id,
          allConnectionsWithChange::dest AS member_id,
          pictures::cropped_picture_id AS pic_id,
          pictures::dest_first_name AS dest_first_name,
          pictures::dest_last_name AS dest_last_name;

      joinResult = JOIN connectionsWithChangeWithPic BY source_id,
                        memberinfowpics BY member_id;

      withName = FOREACH joinResult GENERATE
          connectionsWithChangeWithPic::source_id AS source_id,
          connectionsWithChangeWithPic::member_id AS member_id,
          connectionsWithChangeWithPic::dest_first_name as first_name,
          connectionsWithChangeWithPic::dest_last_name as last_name,
          connectionsWithChangeWithPic::pic_id AS pic_id,
          memberinfowpics::first_name AS firstName,
          memberinfowpics::last_name AS lastName,
          memberinfowpics::gmt_offset as gmt_offset,
          memberinfowpics::email_locale as email_locale,
          memberinfowpics::email_address as email_address;

      resultGroup0 = GROUP withName BY (source_id, firstName, lastName, email_address, email_locale, gmt_offset);

      -- get the count of results per recipient
      resultGroupCount = FOREACH resultGroup0 GENERATE group,
          withName as toomany, COUNT_STAR(withName) as num_results;

      resultGroupPre = filter resultGroupCount by num_results > 2;
      resultGroup = FOREACH resultGroupPre {
          withName = LIMIT toomany 64;
          GENERATE group, withName, num_results;
      }

      x_in_review_pre_out = FOREACH resultGroup GENERATE
          FLATTEN(group) as (source_id, firstName, lastName,
                             email_address, email_locale, gmt_offset),
          withName.(member_id, pic_id, first_name, last_name) as jobChanger,
          '2011' as changeYear:chararray,
          num_results as num_results;

      x_in_review = FOREACH x_in_review_pre_out GENERATE
          source_id as recipientID, gmt_offset as gmtOffset,
          firstName as first_name, lastName as last_name, email_address,
          email_locale,
          TOTUPLE(changeYear, source_id, firstName, lastName, num_results, jobChanger) as body;

      rmf $xir;
      STORE x_in_review INTO '$xir' USING BinaryJSON('recipientID');
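The core of the Pig script above (filter job changers by date, join them to connections, count per recipient, keep recipients with more than two results, cap the list at 64) can be sketched in plain Python. This is an illustrative in-memory version, not LinkedIn's implementation: the function name `year_in_review` and its parameters are hypothetical, the picture and privacy filter is omitted, and the list is sorted only to make the output deterministic (Pig's `LIMIT` makes no ordering promise).

```python
from collections import defaultdict


def year_in_review(positions, connections, start_low, start_high,
                   min_results=3, max_listed=64):
    """positions: iterable of (member_id, start_date).
    connections: iterable of (source, dest) pairs, listed in both directions.
    Returns recipient -> {"num_results": int, "jobChanger": [member ids]}.
    """
    # Members whose position started inside the date window.
    changers = {m for m, start in positions if start_low <= start <= start_high}

    # For each recipient (source), the distinct connections that changed jobs.
    per_recipient = defaultdict(set)
    for source, dest in connections:
        if dest in changers:
            per_recipient[source].add(dest)

    result = {}
    for recipient, changed in per_recipient.items():
        if len(changed) >= min_results:  # same cut as "num_results > 2"
            result[recipient] = {
                "num_results": len(changed),
                "jobChanger": sorted(changed)[:max_listed],
            }
    return result
```

As in the script, `num_results` counts all matching connections, while the listed job changers are capped at 64.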
  • 21. Example: Year in Review. Each message requires a lot of data: – Header information (10 fields) – 4 fields per person, 64 people – That's over 250 data fields for the final message. How do we turn this raw data into web content or email messages? Sample record:
      {body={num_results=80, lastName=Adler, changeYear=2011, firstName=Joseph, jobChanger=[{last_name=O'Connor, first_name=Br?on, member_id=12562482, pic_id=/p/3/000/086/1bd/10ee035.jpg}, {last_name=Sundaram, first_name=Vivek, member_id=6590171, pic_id=/p/3/000/0ae/354/36eb54c.jpg}, {last_name=Crane, first_name=Patrick, member_id=8628324, pic_id=/p/1/000/09c/064/10191de.jpg}, {last_name=McLennan, first_name=Dan, member_id=10551114, pic_id=/p/2/000/09d/12f/147def1.jpg}, {last_name=Shaughnessy, first_name=Helen, member_id=2211035, pic_id=/p/3/000/06d/2ba/06a113c.jpg}, {last_name=Chen, first_name=Richard, member_id=12800647, pic_id=/p/2/000/007/1ad/0fb84f9.jpg}, {last_name=Barba, first_name=Troy, member_id=27577, pic_id=/p/2/000/0a2/3e9/3a83a33.jpg}, {last_name=Reed, first_name=Harper, member_id=1865420, pic_id=/p/1/000/001/17b/396a2c3.jpg}, {last_name=Goldstein, first_name=Peter, member_id=205610, pic_id=/p/2/000/01c/2e6/042999f.jpg}, {last_name=Koren, first_name=Yuval, member_id=2289577, pic_id=/p/1/000/02b/3d3/1fc3627.jpg}, {last_name=Kiang, first_name=Andy, member_id=8347, pic_id=/p/1/000/063/115/1256f61.jpg}, {last_name=Greenfield, first_name=Nick, member_id=82814545, pic_id=/p/1/000/068/39f/2080b8f.jpg}, {last_name=Murarka, first_name=Bubba, member_id=174233, pic_id=/p/3/000/011/2c8/33837b8.jpg}, {last_name=Kutter, first_name=Norbert, member_id=310933, pic_id=/p/3/000/005/0e2/02775a0.jpg}, {last_name=Ehrenberg, first_name=Roger, member_id=1662181, pic_id=/p/3/000/038/066/3572baf.jpg}, {last_name=Coderre, CISSP, first_name=Rob, member_id=68521, pic_id=/p/1/000/088/0d5/2438981.jpg}, {last_name=Stephens, first_name=Bradford, member_id=10900447, pic_id=/p/1/000/0ad/0dc/15f9df5.jpg}, {last_name=Shiau, first_name=Peter, member_id=300654, pic_id=/p/2/000/056/2a6/18938e3.jpg}, {last_name=Rajan, first_name=Arvind, member_id=1260, pic_id=/p/3/000/019/3f7/1e6e0f2.jpg}, {last_name=Bellister, first_name=Jesse, member_id=25234604, pic_id=/p/3/000/00a/17d/1e2136b.jpg}, {last_name=Mohan, first_name=Viraj, member_id=56817108, pic_id=/p/3/000/0cd/0a4/097527a.jpg}, {last_name=Ragade, first_name=Dhananjay, member_id=325284, pic_id=/p/3/000/000/035/0504fe7.jpg}, {last_name=Richards, first_name=Jeff, member_id=16762, pic_id=/p/2/000/039/14e/081d1c7.jpg}, {last_name=Wittenauer, first_name=Allen, member_id=3328775, pic_id=/p/3/000/08d/2a3/307b112.jpg}, {last_name=Porzak, first_name=Jim, member_id=1708710, pic_id=/p/2/000/00d/109/0e4aa34.jpg}, {last_name=Ruma, first_name=Laurel, member_id=3429732, pic_id=/p/1/000/01e/277/2bb115b.jpg}, {last_name=Higgins, first_name=Josh, member_id=1458792, pic_id=/p/1/000/0c9/38b/1a24457.jpg}, {last_name=Benedict, first_name=Harvey, member_id=641340, pic_id=/p/3/000/0c6/1eb/2eb7119.jpg}, {last_name=Lazarus, first_name=Brett, member_id=49965786, pic_id=/p/2/000/03b/04e/318d080.jpg}, {last_name=Zhang, first_name=Simon, member_id=16323996, pic_id=/p/3/000/03f/0fe/35d4ded.jpg}, {last_name=Aspen, first_name=Matt, member_id=25240804, pic_id=/p/3/000/09b/371/22ec974.jpg}, {last_name=Herz, first_name=Erik, member_id=147604, pic_id=/p/3/000/086/014/0fab4d6.jpg}, {last_name=Sanders, first_name=Geoffrey, member_id=340570, pic_id=/p/1/000/0d1/2d1/37a76e6.jpg}, {last_name=Wright, first_name=Caleb, member_id=12798700, pic_id=/p/2/000/08c/337/2cc951a.jpg}, {last_name=Parab, first_name=Guru, member_id=8915230, pic_id=/p/1/000/08a/257/051926a.jpg}, {last_name=Grossman, first_name=Nick, member_id=12159520, pic_id=/p/2/000/005/2f3/1955f31.jpg}, {last_name=Skomoroch, first_name=Peter, member_id=11642980, pic_id=/p/2/000/0b4/12d/31eadbe.jpg}, {last_name=Singh, first_name=Deepak, member_id=1246166, pic_id=/p/1/000/042/3f5/369f807.jpg}, {last_name=Noakes, first_name=Geoffrey, member_id=3518726, pic_id=/p/3/000/005/3d7/3f67632.jpg}, {last_name=Scudiere, first_name=Robert, member_id=3965286, pic_id=/p/2/000/090/210/009a099.jpg}, {last_name=Skyler, first_name=David, member_id=15377099, pic_id=/p/3/000/005/1bf/080b255.jpg}, {last_name=Sharma, first_name=Manu, member_id=19295378, pic_id=/p/3/000/0d4/11e/2176c30.jpg}, {last_name=Huang, first_name=Erica, member_id=1808438, pic_id=/p/1/000/001/3a5/02ddd24.jpg}, {last_name=Ballotta, first_name=Pete, member_id=2011178, pic_id=/p/2/000/0b6/08f/3a92357.jpg}, {last_name=Kast, first_name=Anton, member_id=1092686, pic_id=/p/1/000/054/0e2/1a8efb2.jpg}, {last_name=Redfern, first_name=Joff, member_id=2849241, pic_id=/p/3/000/03d/28d/19f5688.jpg}, {last_name=Smith, first_name=Aaron, member_id=83470876, pic_id=/p/2/000/08c/27c/3cfe37a.jpg}, {last_name=Yadav, first_name=Rishi, member_id=2097381, pic_id=/p/2/000/0c8/08d/3ab9006.jpg}, {last_name=Repass, first_name=Mike, member_id=8633208, pic_id=/p/2/000/071/195/0bfc573.jpg}, {last_name=Dalvi, first_name=Anand, member_id=8388, pic_id=/p/1/000/003/3cd/3127384.jpg}, {last_name=Croll, first_name=Alistair, member_id=511218, pic_id=/p/2/000/029/0e5/1ebc076.jpg}, {last_name=Tolman, first_name=Sarah, member_id=86040596, pic_id=/p/2/000/06f/1c9/1a7870e.jpg}, {last_name=Suvarna, first_name=Sandeep, member_id=10558779, pic_id=/p/1/000/05b/2c7/0ec214a.jpg}, {last_name=Elliott-McCrea, first_name=Kellan, member_id=163959, pic_id=/p/1/000/06b/2e8/2dbd3ae.jpg}, {last_name=Jatkar, first_name=Tarang, member_id=17763609, pic_id=/p/1/000/012/010/2e8ee7f.jpg}, {last_name=Brown, first_name=David, member_id=420737, pic_id=/p/3/000/002/140/0b2dbcc.jpg}, {last_name=Patel, first_name=Jay, member_id=1179857, pic_id=/p/2/000/07c/0b2/0365e91.jpg}, {last_name=Field, first_name=Dylan, member_id=13066037, pic_id=/p/2/000/0a5/3e2/1fb7f06.jpg}, {last_name=Patel, first_name=Sumeet, member_id=23402387, pic_id=/p/2/000/0bf/3ca/2ca5f1f.jpg}, {last_name=Ting, first_name=Moses, member_id=15624915, pic_id=/p/2/000/0ac/117/29e329a.jpg}, {last_name=Hinnach, first_name=Yassine, member_id=1731285, pic_id=/p/3/000/000/035/330cce0.jpg}, {last_name=Das, first_name=Anshu, member_id=38878221, pic_id=/p/3/000/0b2/1ac/15902f4.jpg}, {last_name=Mendelson, first_name=Jordan, member_id=8598415, pic_id=/p/3/000/032/22a/1d2eaa6.jpg}, {last_name=Besbeas, first_name=Nick, member_id=12510505, pic_id=/p/3/000/093/167/34f5b6b.jpg}], source_id=256842}, first_name=Joseph, email_locale=en_US, last_name=Adler, gmtOffset=-8, recipientID=256842, email_address=jadler@linkedin.com}
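The field arithmetic on this slide, and one naive way to turn such a record into message text, can be sketched as follows. The key names (`body`, `jobChanger`, `num_results`, `changeYear`, `first_name`, `last_name`) come from the sample record; the function names and the sentence template are hypothetical and are not the wording of the actual email.

```python
HEADER_FIELDS = 10       # "Header information (10 fields)"
FIELDS_PER_PERSON = 4    # last_name, first_name, member_id, pic_id
MAX_PEOPLE = 64          # cap applied by the Pig script


def total_fields(people=MAX_PEOPLE):
    """Number of data fields a single message can carry."""
    return HEADER_FIELDS + FIELDS_PER_PERSON * people


def render_summary(record, shown=3):
    """Render a one-line summary from a record shaped like the sample."""
    body = record["body"]
    names = [f'{p["first_name"]} {p["last_name"]}'
             for p in body["jobChanger"][:shown]]
    others = body["num_results"] - len(names)
    line = f'Hi {record["first_name"]}: {", ".join(names)}'
    if others > 0:
        line += f' and {others} others'
    return line + f' changed jobs in {body["changeYear"]}.'
```

With the slide's numbers, `total_fields()` gives 10 + 4 × 64 = 266, which is the "over 250 data fields" figure.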
  • 22. People you may know
  • 23. People you may know: Alice, Bob, Carol
  • 24. People you may know: Alice, Bob, Carol. More than 80% of connections come from triangle closing.
  • 25. People you may know. Diagram labels: Age, Organizational Overlap, Distance; Alice, Bob, Carol, Dave, Eve; Ranked Matches; User Interactions; Results.
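Triangle closing (Alice knows Bob, Bob knows Carol, so suggest Carol to Alice) can be sketched as a friends-of-friends count. This is a minimal in-memory illustration, not the production Hadoop job: the function name is hypothetical, and ranking purely by the number of shared connections is an assumption; the slides list additional signals such as age, organizational overlap, and distance.

```python
from collections import Counter


def triangle_closing_candidates(graph, member):
    """graph: dict mapping each member to the set of their connections
    (undirected, so every edge appears in both directions).
    Returns [(candidate, shared_connections)] sorted by count, then name.
    """
    direct = graph.get(member, set())
    counts = Counter()
    for friend in direct:
        for fof in graph.get(friend, set()):
            # Skip the member themselves and people already connected.
            if fof != member and fof not in direct:
                counts[fof] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Each unit of count corresponds to one open triangle that a new connection would close.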
  • 26. Skills and Endorsements
  • 27. Tagging Skills
  • 28. ©2012 LinkedIn Corporation. All Rights Reserved. STRATA NY 2012 28
  • 29. Skills and Endorsements A combination of – Propensity to know member – Propensity for member to have skill ©2012 LinkedIn Corporation. All Rights Reserved. STRATA NY 2012
  • 30. Productionalization Take something that runs once… … and run it multiple times … and serve it reliably at scale … and iterate quickly ©2012 LinkedIn Corporation. All Rights Reserved. STRATA NY 2012 31
  • 31. Data Lifecycle ▪ Moving data around is the key problem: 1. Ingress: moving raw data from online systems to offline systems 2. Workflow management: managing offline processes 3. Egress: moving results from offline systems to online systems
  • 32. Ingress ▪ Apache Kafka: low-latency publish/subscribe message bus – Common data format (Avro) – Changelog is the abstraction for integration – Schema evolution ▪ Programmatic compatibility model ▪ Explicit schema reviews ▪ “O(1)” ETL ▪ K. Goodhope, J. Koshy, J. Kreps, N. Narkhede, R. Park, J. Rao, V.Y. Ye: Building LinkedIn’s Real-time Activity Data Pipeline. In IEEE Data Engineering Bulletin, Vol. 35, No. 2, June 2012.
  • 33. Workflows: Job A, Job B, Job C
  • 34. Workflows: Job A, Job B, Job C, Push to Production
  • 35. Workflows: Job A, Job B, Job X, Job C, Push to Production
  • 36. Workflows: Job A, Job B, Job X, Job C, Push to Production, Push to QA
  • 37. Real workflows are complicated
  • 38. Workflow Management: Azkaban ▪ Dependency management ▪ Diverse job types (Pig, Hive, Java, ...) ▪ Scheduling ▪ Monitoring ▪ Configuration ▪ Retry/restart on failure ▪ Resource locking ▪ Log collection ▪ Historical information
  • 39. Workflow Management: Azkaban
  • 40. Workflow Management: Azkaban
  • 41. Egress: Voldemort ▪ Distributed key/value store ▪ Easy to integrate into workflows – Off-the-shelf jobs to copy Voldemort stores – One-line command in Pig ▪ Cost of data load ▪ Data stored per node? Response time ▪ Fail-over ▪ How to transfer ▪ Versioning & rollback ▪ R. Sumbaly, J. Kreps, L. Gao, A. Feinberg, C. Soman, & S. Shah. Serving Large-Scale Batch Computed Data With Project Voldemort. In FAST 2012.
  • 42. Recap: Why we use Hadoop ▪ Simple programmatic model ▪ Rich developer ecosystem – Languages: Pig, Hive, Crunch, Cascading, … – Libraries: Mahout, DataFu, ElephantBird, … ▪ Horizontal scalability, fault tolerance, multi-tenancy – Reliably process multiple TB of data ▪ Don’t need hardcore distributed systems engineers
  • 43. Recap: How we use Hadoop. Open source projects started at LinkedIn: ▪ Getting data in: Kafka ▪ Building and running job flows: Azkaban ▪ Getting data out: Voldemort. This empowers data scientists and engineers to focus on new product ideas, not infrastructure
  • 44. Learning More: data.linkedin.com
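
The triangle-closing idea from the People You May Know slides (Alice knows Bob, Bob knows Carol, so Carol is a candidate for Alice) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not LinkedIn's implementation: the function names `build_graph` and `suggest_connections`, the toy edge list, and ranking purely by the number of shared connections are all hypothetical. The slides indicate that the real system also uses features such as age, organizational overlap, and distance, and runs on Hadoop rather than in memory.

```python
from collections import Counter, defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (member, member) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def suggest_connections(graph, member):
    """Rank members two hops away by how many connections they share.

    Each shared connection is one open triangle that a new
    connection would close.
    """
    direct = graph.get(member, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the member themselves and existing connections.
            if candidate != member and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most shared connections first; ties broken by name for determinism.
    return sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy network using the names from the slides.
edges = [
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Alice", "Dave"),
    ("Dave", "Carol"),
    ("Dave", "Eve"),
]
graph = build_graph(edges)
print(suggest_connections(graph, "Alice"))
```

In this toy network Carol shares two connections with Alice (Bob and Dave) and Eve shares one (Dave), so Carol ranks first. At LinkedIn's scale the same counting would be expressed as a join over the connection graph in a batch job rather than as nested loops.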

Editor's Notes

  1. Today, Sam and I are going to talk about how we use Hadoop to build products with data. Sam and I are both engineers at LinkedIn. My title is trendier than Sam’s, but don’t hold that against me. Or him. We both know how to build products with data. Both of us have talked about a lot of the products in this presentation before, but we haven’t focused as much on infrastructure.
  2. We’d like to start by telling you a little bit about LinkedIn (and LinkedIn’s data). LinkedIn is the leading web site for professional networking. We currently have over 175 million members, and we’re still growing. That means that our data is growing too.
  3. Each member has a profile. We know a lot about our members (start scrolling animation)… We know their current position, past positions, schools they attended, skills they have, skills that other people have endorsed them for, people and companies they follow, companies they work at. We think this data is very interesting. We can use this data to help members connect to each other, and make them more productive. That’s actually LinkedIn’s mission statement… I can’t believe I recited our mission statement in a public presentation. Anyway, let’s take a look at how we use this data. <Flip to next slide to show how we use this data>
  4. When a user logs into LinkedIn, they see a page like this. Almost every part of our home page has been touched by data science. The home page is purely driven by data: news articles, news stream, PYMK, display ad, JYMBII, WVMP, GYML, etc. And by the way, we also learn what our members like and don’t like. We have over 130 million visitors to our site every quarter, and deliver over 9.3 billion web pages. (That’s even more data.)
  5. So, here’s the point of today’s talk. At LinkedIn, we have a lot of data.
  6. We store our data in Hadoop, and we want to build products using that data on Hadoop.
  7. So here’s the big challenge: how do we make it easy for our engineers, product managers, data scientists, analysts, web devs, receptionists, whatever, to build products from our data? That’s what we’re going to talk about today. We’ll tell you about some of the products that we’ve built from data, how we built these products, and why we built infrastructure to support these products.
  8. Let’s start by telling you a little more about some of the products that we have built with Hadoop, then we’ll tell you more about two of those products and the challenges that we faced productionalizing them.
  9. More examples: Groups you might like; Network updates digest email; “People who viewed this profile also viewed”; etc.
  10. Let’s start with a project that I worked on at LinkedIn that I think illustrates the power of building products with data. (Ask audience: “Who got this email?”) We sent this to every LinkedIn member who had a lot of job changes in their network. [now read the numbers] Later in this presentation, we’ll tell you how we built this email from our data. I’ll even show you the code.
  11. Here is another example of a product that I’ve worked on. In the network stream on our home page, we’ve started sharing trends and patterns in data. We also tell you things that you might not know about your network. For example, it turns out that 21 of my former coworkers are now working at Google.
  12. One of the most famous examples of data products at LinkedIn is People You May Know. PYMK was invented at LinkedIn. The idea of PYMK was to help you discover current coworkers, former coworkers, and friends on LinkedIn to help make your experience better. (This is not an actual screen shot from my account; I’m already connected to Sam.) We used Hadoop to build and scale PYMK. (We’ll also tell you more about how we built PYMK later in this presentation.)
  13. Has anyone in the room seen a screen like this on LinkedIn? Has anyone endorsed someone else? Has anyone found it hard to stop endorsing people? We also used Hadoop to build our suggested endorsements.
  14. We love using Hadoop for building data products. There are so many things that are great about Hadoop. (Our user quotas are in TB.) Hundreds of nodes. Great tools for working with data, like Pig, Hive, and Crunch. Shared infrastructure: hundreds of employees have accounts on Hadoop and run jobs (engineers, data scientists, product managers, even designers and finance people).
  15. One of the greatest advantages of Hadoop is that it empowers small teams to build great things. Here are a few examples. Most of the items on this slide are big, important features: lots of page views, lots of new connections, lots of great content. The marginal cost of building more products is low.
  16. One of the greatest advantages of Hadoop is that it empowers small teams to build great things. Here are a few examples. Most of the items on this slide are big, important features: lots of page views, lots of new connections, lots of great content. The marginal cost of building more products is low.
  17. Let’s talk a little more about the year in review email. This is actually a pretty straightforward message in theory. Here’s how we do it. (Read slide) There isn’t any machine learning, or fancy algorithms. It’s just grouping and ranking. And in practice, it’s not that hard.
  18. This is the code to compose this message. It’s about 60 lines of code, and most of that code involves renaming things. This is why we love Hadoop: we can do something simple without much code… Great! We’re done. We write this code and the message is done.
  19. Well, not so fast… here’s the challenge. We know how to do the computation to make this message. But every message requires a lot of data: we potentially look at hundreds of MB of data before generating every message, and in the end the messages are up to 1MB in size. How do we get all the raw data that we need to make this message? How do we keep it up to date? How do we run this job frequently so the results stay current? How do we get these results out of Hadoop, turn them into email messages, and send them out? Let’s consider another problem.
  20. One of the most famous examples of data products is People You May Know. PYMK was invented at LinkedIn. The idea of PYMK was to help you discover current coworkers, former coworkers, and friends on LinkedIn to help make your experience better. (This is not an actual screen shot from my account; I’m already connected to Sam.) We used Hadoop to build and scale PYMK.
  21. PYMK started simpler, grew more complicated. Complicated workflow, required tools and infrastructure to do this --> we needed it in place.
  22. Throw over the wall from data science to productionization. No one dedicated to productionization. Provided “as a service” to do so.
  23. Don’t want to beg for data. Others: Scribe, Flume
  24. Others: Oozie
  25. Others: HBase, Cassandra, Kafka
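
The workflow notes above (dependency management in Azkaban, with Oozie named as an alternative) come down to running each job only after the jobs it depends on have finished. The following is a minimal sketch of that ordering step using Python's standard-library `graphlib` module (Python 3.9+). It does not use or imitate Azkaban's API. The job names come from the Workflows slides, but the dependency edges are assumptions, since the transcript does not preserve the arrows in the diagram; `execution_order` and `run_workflow` are hypothetical helper names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Maps each job to the set of jobs it depends on.
# These edges are assumed for illustration only.
workflow = {
    "Job A": set(),
    "Job B": {"Job A"},
    "Job X": {"Job A"},
    "Job C": {"Job B", "Job X"},
    "Push to QA": {"Job C"},
    "Push to Production": {"Job C"},
}


def execution_order(dependencies):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


def run_workflow(dependencies, run_job):
    """Run jobs in dependency order, stopping at the first failure.

    `run_job` is a callable that takes a job name and returns True on
    success. A real scheduler would add retries, logging, and locking.
    """
    completed = []
    for job in execution_order(dependencies):
        if not run_job(job):
            break
        completed.append(job)
    return completed


order = execution_order(workflow)
print(order)

# Simulate a failure in Job C: nothing downstream of it gets pushed.
print(run_workflow(workflow, lambda job: job != "Job C"))
```

Several orders are valid for this graph (Job B and Job X can run in either order), so the sketch only guarantees that dependencies come first. Stopping at the first failure is the simplest policy; the slides list retry and restart on failure as separate scheduler features.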