The CERN Approach in the Age of “Big Data”

T. Nicholas Kypreos, Ph.D.
Principal Data Scientist and Project Manager, MAQ Software

Seattle Scalability Meetup, 26 Feb 2014
acknowledgments

• CMS Collaboration and CERN
• FNAL for support in writing my dissertation… and for computing facilities to abuse
• Bradford Stephens for letting me do this…
• Mike Miller and Adam Kocoloski from Cloudant for letting me steal some nice slides on the subject!
• UF for sending me there!
• Cloudant was just bought by IBM — so clearly all physicists are experts in the cloud!
• everyone here for suffering this diatribe

2
OBLIGATORY TRAVEL PHOTOS…
the life of LHC data

• information picked up by detectors
• data are clumped together to create events
• real-time hardware filtering — FIFO buffering
• raw data are sent to the T0 center at CERN for archival (replicated at FNAL)
• transferred to T1 sites, reconstructed, filtered into primary datasets
• transferred to T2 sites, reconstructed, skimmed and filtered
  • accessed at this level by general users across the globe
• moved to laptops for direct analysis
• peer review of analysis, lots of liquor, an eventual paper

(the whole flow is sketched as a toy pipeline below)

4
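The flow above is essentially a staged pipeline: each tier consumes what the previous one produced and reduces it further. A minimal sketch of that idea in Python; the stage names and the toy reduction fractions are illustrative assumptions, not actual CMS rates:

```python
# Illustrative sketch of the LHC data flow as a staged pipeline.
# Keep fractions are toy numbers, not CMS trigger/skim rates.

def detector_readout(n_collisions):
    """Yield raw 'events' clumped together from detector channels."""
    for i in range(n_collisions):
        yield {"id": i, "raw": f"raw-data-{i}"}

def hardware_trigger(events, keep_every=400):
    """Real-time hardware filtering: keep only a small fraction of events."""
    for e in events:
        if e["id"] % keep_every == 0:
            yield e

def tier0_archive(events):
    """Archive raw data at T0 (and replicate); pass events downstream."""
    for e in events:
        e["archived"] = True
        yield e

def tier1_reconstruct(events):
    """Reconstruct and split into primary datasets at T1."""
    for e in events:
        e["reco"] = True
        yield e

def tier2_skim(events, selector=lambda e: e["id"] % 800 == 0):
    """Skim and filter at T2 into what analysts actually read."""
    for e in events:
        if selector(e):
            yield e

if __name__ == "__main__":
    pipeline = tier2_skim(tier1_reconstruct(tier0_archive(
        hardware_trigger(detector_readout(10_000)))))
    print(sum(1 for _ in pipeline), "events reach the analyst")
```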
the reader

[Figure 3-3. Slice of the CMS detector in the r-φ plane showing the paths of different particles.]

• lots of ways you can describe this
• collection of detectors reading out information (like a camera?)

5
luminosity

[plot labels: LHC 2010-2011, LHC 2012+]

• the LHC can produce all SM particles across a range of energies
• production is not equiprobable; it is governed by the cross section, σ
• small cross sections correspond to rarer processes in nature
• event production depends on the integrated luminosity of the machine:
  N_events = σ(pp → X) · L   (probability × trials)
• higher luminosity, more events (toy calculation below)

I WANT THE UNICORNS!

6
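The yield relation on the slide is just cross section times integrated luminosity. A minimal sketch of the unit bookkeeping; the cross section and luminosity used in the example call are placeholder numbers, not measured values:

```python
# N_events = sigma(pp -> X) * integrated luminosity ("probability x trials").
# The example numbers below are illustrative placeholders.

def expected_events(sigma_pb: float, lumi_fb_inv: float) -> float:
    """Expected yield for a cross section in pb and luminosity in fb^-1.
    1 fb^-1 = 1000 pb^-1, so the product picks up a factor of 1000."""
    return sigma_pb * (lumi_fb_inv * 1000.0)

# e.g. a hypothetical 20 pb process with 5 fb^-1 of data:
print(expected_events(20.0, 5.0))  # 100000 expected events before any selection
```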
information has to all come together somehow…

• trigger system helps determine what we want to keep and throw away (hardware-based pattern recognition)
• real-time interface with detectors keeps data flowing that’s not broken and collected at the right time — constant feedback and data validation

this still hurts my brain

[Figure 3-16. Architecture of the CMS Level 1 trigger.]
[Figure 3-17. Overview of the CMS trigger control system.]

7
what that information looks like…

not even the highest granularity…

8
triggering

• collision rate of 40 MHz
  • 100 Hz can be written to tape ➞ rate reduction is necessary
• rate reduced in two phases (toy simulation below)
  • Level-1 (hardware)
    • collects the full event, 100 kHz out
  • High Level Trigger (HLT) (software)
    • on-site computing farm performs fast partial reconstruction
    • pack trigger menus into bits; users fight for priority

9
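A toy sketch of the two-stage rate reduction described above: 40 MHz of collisions, roughly 100 kHz out of Level-1, and roughly 100 Hz to tape. The accept fractions are chosen only to reproduce those nominal rates; the real triggers are pattern-recognition decisions, not random sampling:

```python
import random

# Nominal rates from the slide: 40 MHz -> ~100 kHz (Level-1) -> ~100 Hz (HLT).
L1_ACCEPT = 100e3 / 40e6      # hardware trigger keeps ~1 in 400
HLT_ACCEPT = 100.0 / 100e3    # software trigger keeps ~1 in 1000

def level1(event) -> bool:
    """Hardware Level-1 decision (modelled as random sampling here)."""
    return random.random() < L1_ACCEPT

def hlt(event) -> bool:
    """HLT: fast partial reconstruction on a farm (also random here)."""
    return random.random() < HLT_ACCEPT

def simulate(n_collisions=1_000_000):
    to_hlt = to_tape = 0
    for i in range(n_collisions):
        if level1(i):
            to_hlt += 1
            if hlt(i):
                to_tape += 1
    return to_hlt, to_tape

if __name__ == "__main__":
    print(simulate())  # roughly (2500, 2-3) for a million toy collisions
```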
what we’re aiming for…

• data from online monitoring goes to a computer farm (HLT)
• lots of filtering
  • multivariate analysis, machine learning, derp-aderp, Kalman filtering… fitting and clustering
• analysis teams have to fight for their analysis priority — you can’t always get what you want

10
the life of LHC data

• information picked up by detectors
• data are clumped together to create events
• real-time hardware filtering — FIFO software filtering
• raw data are sent to the T0 center at CERN for archival (replicated at FNAL)
• transferred to T1 sites, reconstructed, filtered into primary datasets
• transferred to T2 sites, reconstructed, skimmed and filtered
  • accessed at this level by general users across the globe
• moved to laptops for direct analysis
• peer review of analysis, lots of liquor, an eventual paper

11
[The slide overlays two text excerpts about CMS Grid data distribution on the event data flow diagram; recoverable text, lightly repaired:]

“The CMS experiment has taken a novel approach to Grid data distribution. Instead of having a central processing component making global decisions on replica allocation, CMS has a data management layer composed of a series of collaborating agents; the agents are persistent, stateless processes which manage specific parts of replication operations at each site in the distribution network. The agents communicate asynchronously through a blackboard architecture; as a result the system is robust in the face of many local failures. CMS’ data distribution network is represented in a similar fashion to the internet. This means that the management of multi-hop transfers at dataset level is simple. This system allows CMS to seamlessly bridge the gap between traditional HEP ‘waterfall’ type data distribution and more dynamic Grid data distribution. Using this system CMS is able to reach transfer rates of 500 Mbps, and sustain transfers for long periods in tape-to-tape transfers. The system has been used to manage data in large scale Service Challenges for LCG this year.”

“Historically, High Energy Physics (HEP) experiments have been able to rely on manpower[-intensive] mechanisms for distributing data to geographically dispersed collaborators. The new CERN Large Hadron Collider (LHC) [1] … necessitates a move to more scalable …”

slide annotations:
• ~1 PB/week
• each center may be a bit different (think non-standard data centers)
• sites with 100’s TB of storage, 100s of worker nodes
• sites with 10,000’s TB of storage, 1000s of worker nodes
• why so many? politics, power budget, physical limitations, budgets
• (giant map reduce)
• driven in part by branches …

[Diagram (event data flow): Detector Facility → Online (HLT), ~10 online streams (RAW) → Tier 0 (first pass reconstruction, primary tape archive) and CMS-CAF (CERN Analysis Facility) → ~6 Tier-1 Centres (off-site, Regional Centres): ~50 datasets (RAW+RECO) shared amongst Tier 1’s, an average of ~8 datasets per Tier 1 (RAW+RECO), secondary tape archive, analysis, calibration, re-reconstruction, skim making → skims (RECO, AOD’s) → ~25 Tier-2 Centres and small sites. A toy sketch of the agent/blackboard transfer pattern follows this slide.]

12
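The excerpt above describes data placement handled by loosely coupled agents that communicate asynchronously through a shared “blackboard” rather than a central scheduler. A minimal sketch of that pattern; the agent names, site names, and transfer states are invented for illustration and are not the actual transfer system’s code:

```python
from collections import deque

# Blackboard: a shared record of requested replicas and their state.
# Agents read it, do one unit of work, and write the result back, so the
# system tolerates any single agent failing and simply retrying later.
blackboard = deque([
    {"block": "dataset-A/block-1", "dest": "T1_US_FNAL", "state": "requested"},
    {"block": "dataset-B/block-7", "dest": "T2_US_Florida", "state": "requested"},
])

def allocator_agent(task):
    """Pick a source replica for a requested block (illustrative rule)."""
    if task["state"] == "requested":
        task["source"] = "T0_CERN"
        task["state"] = "allocated"

def transfer_agent(task):
    """Pretend to perform the multi-hop copy once a source is allocated."""
    if task["state"] == "allocated":
        task["state"] = "transferred"

def run(agents, rounds=3):
    for _ in range(rounds):
        for task in blackboard:
            for agent in agents:
                agent(task)   # each agent acts independently of the others

run([allocator_agent, transfer_agent])
print(blackboard)
```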
data access

• users submit jobs through a custom-built interface (think EMR)
• jobs have parameters to take them to blacklisted/whitelisted tier-2 centers and to access datasets and simulation sets
• looks a lot like a traditional batch queue
• returns custom data structures
• managed from local or remote location
  (a hypothetical job description illustrating these knobs follows the slide)

[Figure 4.3: Schematics showing the hierarchical task queue architecture. Recoverable labels: UI submits to the CMS batch / Global Policy Task Queue; jobs are assigned and requested across the site boundary to a Local Policy Task Queue, the Site Batch MGR, and harvested Site Batch Slots.]

13
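A hypothetical job description illustrating the knobs mentioned above: which dataset to read, which tier-2 sites to whitelist or blacklist, and where output lands. The field names are made up for this sketch; the real submission tools of that era (e.g. CRAB) have their own configuration formats:

```python
from dataclasses import dataclass, field

@dataclass
class GridJobSpec:
    """Illustrative job description; field names are hypothetical, not CRAB's."""
    dataset: str
    executable: str
    site_whitelist: list = field(default_factory=list)   # only run at these T2s
    site_blacklist: list = field(default_factory=list)   # never run at these T2s
    output_storage: str = "T2_US_Florida"                 # where skims/ntuples land
    events_per_job: int = 50_000

# Example (dataset path and site names are placeholders):
job = GridJobSpec(
    dataset="/DoubleMu/Run2012A-PromptReco/AOD",
    executable="analysis_cfg.py",
    site_whitelist=["T2_US_Florida", "T2_CH_CERN"],
    site_blacklist=["T2_XX_Flaky"],
)
print(job)
```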
outputs and maintenance

• data reprocessing happens as software improves
• data format consistency is enforced in major versions: NO BROKEN CODE IN PRODUCTION… even though it is crowdsourced among 100s of physicists who are awful at coding
• incremental improvements are proposed, reviewed, and systematically merged
• read back data on remote storage areas on T2s and T3s or to scratch areas
• T0 and T1 centers are kept for privileged users: those people doing validation and data monitoring — not private analysis
• failed jobs can be re-submitted or continued as more data are available
the life of LHC data

• information picked up by detectors
• data are clumped together to create events
• real-time hardware filtering — FIFO buffering
• raw data are sent to the T0 center at CERN for archival (replicated at FNAL)
• transferred to T1 sites, reconstructed, filtered into primary datasets
• transferred to T2 sites, reconstructed, skimmed and filtered
  • accessed at this level by general users across the globe
• moved to laptops for direct analysis
• peer review of analysis, lots of liquor, an eventual paper

15
each analysis is sacred

• there are losses, systematic uncertainties, statistical uncertainties, biases

http://www.slac.stanford.edu/econf/C030908/papers/TUIT001.pdf

16
data structuring and the framework that binds

• not databases
• the CMS software framework is written as C++ libraries and bound together in a Python framework
• data are put together in events and stored in custom C-based data structures (streamed as nTuples)
• compile C++ code, load in specific libraries for the analysis from the framework as needed
• all parts are visible and configurable (no black boxes)
• processes, filters, new data constructions are controlled in Python and read out per “event” (see the configuration sketch below)
• compiled code is transported on-site when submitting jobs via the GRID

17
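A minimal sketch of the pattern just described: Python steers which compiled C++ modules run for each event. It is written in the style of a CMSSW job configuration and assumes a CMSSW environment where `FWCore.ParameterSet.Config` is available; the analyzer label, its parameters, and the input file name are placeholders, not modules from the actual release:

```python
# Python configuration steering compiled C++ modules, CMSSW-style.
# Module label, parameters, and file name below are placeholders.
import FWCore.ParameterSet.Config as cms

process = cms.Process("ANALYSIS")

# Read events from a skimmed dataset on local or Grid-accessible storage.
process.source = cms.Source("PoolSource",
    fileNames=cms.untracked.vstring("file:skim.root"))
process.maxEvents = cms.untracked.PSet(input=cms.untracked.int32(1000))

# A compiled C++ EDAnalyzer, configured entirely from Python (no black boxes).
process.myAnalyzer = cms.EDAnalyzer("MyDileptonAnalyzer",
    muonTag=cms.InputTag("muons"),
    minPt=cms.double(20.0))

# The path defines what runs, in order, for every event.
process.p = cms.Path(process.myAnalyzer)
```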
signal on the noise

[The slide reproduces excerpts from two analysis texts (a Z′ → ℓℓ dissertation and the CMS H → ZZ → 4ℓ paper), overlaid and only partly legible. Recoverable content, lightly repaired:]

• The Z′ width is taken from the Z′ model and is chosen as a reference because it provides a sufficiently narrow resonance such that detector resolution dominates; the channel is normalized to the number of observed events with dilepton mass below 200 GeV.
• In the H → ZZ → 4ℓ analysis, the other lepton pair is required to have a mass in the range 12–120 GeV; the ZZ background is evaluated from MC simulation. Two data-driven approaches estimate the reducible and instrumental backgrounds, both starting from a background control region well separated from the signal region (relaxed isolation and identification criteria for two same-flavour reconstructed leptons). In the first approach the additional lepton pair must have the same charge (to avoid signal contamination); in the second, two opposite-charge leptons failing the isolation and identification criteria are required, and a control region with three passing leptons and one failing lepton estimates contributions from backgrounds with three prompt leptons and one misidentified lepton. Control-region rates are extrapolated to the signal region using the probability, measured in an independent sample, for a reconstructed lepton to pass the isolation and identification requirements. Within uncertainties, both methods give comparable background estimates.
• The number of selected ZZ → 4ℓ candidates with 110 ≤ m4ℓ ≤ 160 GeV, the predicted backgrounds, and the expected signal from a SM Higgs boson with mH = 125 GeV are tabulated. The m4ℓ distribution (Fig. 4) shows a clear Z → 4ℓ peak, well reproduced by the background estimate, and an excess of events above the expected background around 125 GeV; the statistical analysis only includes events with m4ℓ > 100 GeV, and Fig. 5 shows the distribution of KD versus m4ℓ.
• The extended likelihood function is of the form
  L(m | R, M, Γ, σ_g, a, b, μ_B) = (μ^N e^(−μ) / N!) · Π_{i=1..N} [ (μ_S/μ) f_S(m_i) + (μ_B/μ) f_B(m_i) ],  with μ = μ_B + μ_S   (6–7)
  where m is the observable (the invariant mass), N is the total number of observed events, and μ is the Poisson mean of the distribution from which N is an observation. (A numeric toy of this form follows the slide.)
• (PDF here means probability density function, not to be confused with a Parton Density Function.)

• sometimes we measure to calibrate, discover new physics, find nothing
• at the end of the day, there will be a multivariate analysis

18
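The formula on the slide is an extended unbinned likelihood: a Poisson term for the total yield times a per-event mixture of signal and background mass shapes. A minimal numeric sketch with placeholder shapes (Gaussian signal on an exponential background); the shapes, parameter values, and toy masses are illustrative assumptions and require NumPy/SciPy:

```python
import numpy as np
from scipy.stats import norm, expon, poisson

def extended_nll(m, mu_s, mu_b, mass=125.0, width=2.0, slope=50.0):
    """-ln L for an extended unbinned fit:
    Poisson(N | mu_s + mu_b) * prod_i [ (mu_s/mu) f_S(m_i) + (mu_b/mu) f_B(m_i) ].
    Shapes and parameters are placeholders, not the actual analysis model."""
    mu = mu_s + mu_b
    f_s = norm.pdf(m, loc=mass, scale=width)      # signal mass shape (Gaussian)
    f_b = expon.pdf(m - 100.0, scale=slope)       # background shape above 100 GeV
    mixture = (mu_s * f_s + mu_b * f_b) / mu
    return -(poisson.logpmf(len(m), mu) + np.log(mixture).sum())

# Toy "data": a handful of four-lepton masses in GeV, purely illustrative.
masses = np.array([118.0, 124.5, 125.5, 131.0, 126.2, 142.7])
print(extended_nll(masses, mu_s=3.0, mu_b=4.0))
```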
workflow

Workflow Ladder (vertical axis: number of users):

• Large datasets (100 TB), complex computation
• Large datasets (100 TB), simple computation
• Shared datasets (500 GB), complex computation
  → Use Grid compute and storage exclusively

• Shared datasets (10-500 GB), complex computation
• Shared datasets (10-100 GB), simple computation
  → Work on departmental resources, store resulting datasets to Grid storage

• Shared datasets (0.1-10 GB), simple computation
• Private datasets (0.1-10 GB), simple computation
  → Work on laptop/desktop machine, store resulting datasets to Grid storage

(a small decision-rule sketch of this ladder follows)

19
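A small sketch of the ladder as a decision rule: pick the compute resource from the dataset size and whether the computation is complex. The size thresholds come from the slide; how the middle rungs group onto the three destinations is my reading of the grouping braces, so treat the boundaries as approximate:

```python
def choose_resource(dataset_gb: float, complex_computation: bool) -> str:
    """Map a workflow onto the ladder (thresholds per the slide; the exact
    grouping of the middle rungs is an interpretation)."""
    if dataset_gb >= 100_000 or (complex_computation and dataset_gb >= 500):
        return "Use Grid compute and storage exclusively"
    if dataset_gb >= 10 or complex_computation:
        return "Work on departmental resources, store resulting datasets to Grid storage"
    return "Work on laptop/desktop machine, store resulting datasets to Grid storage"

print(choose_resource(200_000, True))   # large dataset, complex -> Grid
print(choose_resource(50, False))       # shared 10-100 GB, simple -> departmental
print(choose_resource(2, False))        # private 0.1-10 GB, simple -> laptop
```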
the take away…

• particle physics has a special place in scale
• physicists are awful programmers — but no worse than most
• all machine learning gets reduced to being a “fit” of some sort
• a lot has been done since this was designed 15 years ago
• keep only what you can use — “skim data for fun and profit”
• humans are biased to find patterns — be careful and don’t trust yourself
• long history of real-time analysis, developed the proto-cloud
• exercise due diligence, test your algorithms, clean your data
• take time for Data Quality Monitoring (DQM)
• parallelize when you can

20
thanks!
@doomsuckle	

nickkypreos@gmail.com

