© 2009 IBM Corporation
Leveraging Information for Smarter Organizational Outcomes
IBM Confidential
June, 2013
Identity and Biometrics in the Big Data & Analytics Context
Dr. Charles Li
Analytics Solution Center
Washington, DC
Charles_Li@us.ibm.com
Topics
• ID Management, Identity & Biometrics
• Views on Biometrics Technology and System
• The Concept of Big Data, Analytics and Challenges
• Identity Establishment from All Sources
• Identity and Biometrics in the Cloud
• Identity and Biometrics Analytics in Near Real Time
• Summary
ID Management, Identity and Biometrics
Identity Elements
Players
Entitlement(s)
Actions
Identity
Trust (Rules)
Status (Environment)
Reputation (History)
Identity Management
Views on Biometrics Technology and System
What is missing?
Big Data Concept
Extract insight from a high volume, variety and velocity of data in a timely and cost-effective manner
• Variety: Data in many forms (structured, unstructured, text and multimedia)
• Velocity: Data in motion (analysis of streaming data to enable decisions within fractions of a second)
• Volume: Data at scale (from terabytes to zettabytes)
Analytics Concept
Structured Data & Unstructured Content
Made consumable and accessible to everyone
Descriptive Analytics
• What is happening?
• What exactly is the problem?
• How many, how often, where?
• What actions are needed?
Predictive Analytics
• Simulation: What could happen?
• Forecasting: What if these trends continue?
• Predictive Modelling: What will happen next if?
Prescriptive Analytics
• Optimisation: How can we achieve the best outcome?
• Stochastic Optimisation: How can we achieve the best outcome and address variability?
Content Analytics
• Extracting insight, concepts and relationships
Visual Analytics
• Deep insights to improve visualization and marketing interactions
Biometrics Data at Scale – Static & Single Instance
• 1 billion arrivals worldwide in 2012; United States: 100-200 million international arrivals in 2012: 1 Exabyte of travel data
• Unique Identification Authority of India (UIDAI) plans to enroll 1.2 billion citizens (UID Program; enroll million/day; half billion by 2014): 3-4 Exabytes of biometric and biographic data
• Prolific usage of mobile phones, 6 billion mobile phones: 6 Exabytes of behavior data
• ID cards / border crossings / benefits / multiple instances, 7,000,000,000 x (10 Print 0.5-1MB + Face 200KB + IRIS KB): 7 Exabytes
• EU VIS Biometrics Matching System (BMS) at 70 million individuals and 100K daily enrollments: ~100 Terabytes
• US DoS has in the range of 100 million faces & others: at least ~10-50 Terabytes
• DHS IDENT, over 150 million identities and 125,000 transactions daily: ~100-300 Terabytes
• FBI NGI, over 100 million fingerprints & more coming, plus faces/iris: ~100-200 Terabytes
Units
1 Gigabyte = 1000 MB
1 Terabyte = 1000 GB
1 Petabyte = 1000 TB
1 Exabyte = 1000 PB
1 Zettabyte = 1000 EB
1 Yottabyte = 1000 ZB
In reality there is also data from many instances, history, transactions, logs…
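The per-person arithmetic behind the 7,000,000,000 x (...) line can be checked with a short sketch. It assumes decimal units throughout, extending the slide's 1 GB = 1000 MB table down to 1 MB = 1000 KB. The slide leaves the iris size blank, so `iris_kb` is a caller-supplied parameter and the 0 used here is only a placeholder, not a value from the source. All function and variable names are illustrative.

```python
# Decimal units, extending the slide's table down to KB (an assumption).
UNITS = {"KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
         "PB": 1000**5, "EB": 1000**6, "ZB": 1000**7, "YB": 1000**8}


def total_bytes(people: int, ten_print_kb: int, face_kb: int, iris_kb: int) -> int:
    """Bytes needed to store one biometric record per person."""
    return people * (ten_print_kb + face_kb + iris_kb) * UNITS["KB"]


def to_unit(n_bytes: int, unit: str) -> float:
    """Convert a byte count to the given decimal unit."""
    return n_bytes / UNITS[unit]


PEOPLE = 7_000_000_000
IRIS_KB = 0  # placeholder: the slide does not give the iris size

low = total_bytes(PEOPLE, 500, 200, IRIS_KB)    # 10-print at 0.5 MB
high = total_bytes(PEOPLE, 1000, 200, IRIS_KB)  # 10-print at 1 MB
print(f"{to_unit(low, 'PB'):.1f} to {to_unit(high, 'PB'):.1f} PB")
```

With these inputs, one record per person comes to a few petabytes. Larger totals would have to come from the multiple instances, history, transactions and logs that the slide also mentions, which this sketch does not model.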
Big Data Sources
System Transaction, Log and Transition Data – Several Times More!
Other Big Data Examples
• 150 Exabytes: global size of “Big Data” in healthcare, growing between 1.2 and 2.4 EB / year
• For every session, the NY Stock Exchange captures 1 Terabyte of trade information
• AT&T transfers about 30 Petabytes of data through its network daily
• The Hadron Collider at CERN generates 40 Terabytes of usable data / day
• Facebook processes 500+ Terabytes of data daily
• Google processes > 24 Petabytes of data in a single day
• Twitter processes 12 Terabytes of data daily
• By 2016, annual Internet traffic will reach 1.3 Zettabytes
We don’t have the most challenging problem!
Biometric Performance at Giga Scale*
“Brute Force” De-Duplication
• Cumulative de-duplication: total number of checks = N(N-1)/2 (a “combination problem”)
• De-duplicating a 100 million population enrollment results in 4,999,999,950,000,000 checks!
• About 15 years to complete at 10 million matches per second
Biometric Accuracy Challenge
• FMR at 1 identification false match per million
• 500 false matches with a 1 million enrollment population
• 5 million false matches with a 100 million enrollment population
Prohibitive!
We have some unique challenges!
* Courtesy of Bojan Cukic
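The de-duplication figures above follow from the N(N-1)/2 pair count and can be reproduced with a minimal sketch. The function names are illustrative. The false-match estimate uses a deliberately simple model (expected false matches = comparisons x per-comparison false match rate), which is an assumption rather than something the slide states. The rate is left as a parameter; under this model a per-comparison rate of 1e-9 reproduces the slide's counts of 500 and 5 million.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def pair_count(n: int) -> int:
    """Number of pairwise checks for cumulative de-duplication: N(N-1)/2."""
    return n * (n - 1) // 2


def years_to_complete(n: int, matches_per_second: float) -> float:
    """Wall-clock years to run every pairwise check at a fixed match rate."""
    return pair_count(n) / matches_per_second / SECONDS_PER_YEAR


def expected_false_matches(n: int, fmr_per_comparison: float) -> float:
    """Simple model (assumption): every comparison errs independently."""
    return pair_count(n) * fmr_per_comparison


print(f"{pair_count(100_000_000):,} checks")
print(f"{years_to_complete(100_000_000, 10_000_000):.1f} years")
print(f"{expected_false_matches(1_000_000, 1e-9):.0f} false matches")
print(f"{expected_false_matches(100_000_000, 1e-9):,.0f} false matches")
```

Because the pair count grows quadratically, a population 100 times larger needs roughly 10,000 times as many checks, which is why brute force stops being practical at this scale.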
Face the Challenges
Identity Establishment with All Data Sources
- Leverage Entity Resolution Technologies
Biometrics Services in the Cloud
- Leverage Big Data Infrastructure, Platforms and Software Services
Identity and Biometrics Analytics in Motion
Identity Establishment with All Sources
• Biometrics (physical and behavioral)
• Biographic information
• Behavior data (social media usage)
• Travel data (API, PNR)
• Banking information
• Web or desktop usage behavior
  - Emails
  - Multimedia
• Spatial and temporal information
Entity / Identity Resolution with All Sources
Entity / Identity Resolution: a complex process involving the application of sophisticated algorithms across multiple heterogeneous data sources to resolve multiple records into a single fused view of an individual
• Reduce search space and computing resources
• Complement to low-quality images
• Cost and benefits tradeoff
• Systematic research necessary
• Successful programs
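The idea of resolving multiple records into a single fused view can be illustrated with a toy sketch. It is not the method behind any product named in this deck. It links records only when they share an exact identifier value and groups them with a union-find structure, whereas real entity resolution relies on far more sophisticated (often probabilistic or fuzzy) matching. The field names and sample records are hypothetical.

```python
def resolve_entities(records, match_keys):
    """Group records that share any value of any match key, then fuse each group."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (key, value) -> index of the first record carrying it
    for i, rec in enumerate(records):
        for key in match_keys:
            value = rec.get(key)
            if value is None:
                continue
            j = first_seen.setdefault((key, value), i)
            if j != i:
                parent[find(i)] = find(j)  # link the two groups

    fused = {}
    for i, rec in enumerate(records):
        view = fused.setdefault(find(i), {})
        for field, value in rec.items():
            view.setdefault(field, set()).add(value)
    return list(fused.values())


records = [  # hypothetical records from heterogeneous sources
    {"source": "travel", "passport": "P123", "name": "A. Smith"},
    {"source": "bank", "passport": "P123", "email": "a@example.com"},
    {"source": "web", "email": "a@example.com"},
    {"source": "travel", "passport": "P999", "name": "B. Jones"},
]
entities = resolve_entities(records, ["passport", "email"])
print(len(entities), sorted(entities[0]["source"]))
```

The first three records end up in one entity even though the travel and web records share no field directly: the bank record bridges them. This kind of transitive linking is what narrows the candidate set before any biometric matching is run.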
Biometrics Services in the Cloud - Leverage Big Data Infrastructure, Platform and Software Services
Infrastructure Platform
• Management and Administration
• Availability and Performance
• Security and Compliance
• Usage and Accounting
Application Services
• Application Lifecycle
• Application Resources
• Application Environments
• Application Management
• Integration
Cloud Services: Infrastructure and Platform as a Service
• Enterprise
• Enterprise+
Cloud Solutions: Software and Business Process as a Service
• Smarter Commerce
• Smarter Cities
• Social Business
• Business Analytics and Optimization
Service models
• Infrastructure (IaaS)
• Platform (PaaS)
• Software (SaaS)
• Business Process (BPaaS)
Deployment: Private, Public and Hybrid Models
Standard Interface
• Services, each with its own process and data: Enrolment Service, 1:1 Identification Service, ….
• Biometric data: Fingerprint, Iris, Face
Note: Cloud and Big Data are not the same
Progress
A Prototype - Leveraging the Cloud for Big Data Biometrics
• E. Kohlwey et al., “Leveraging the Cloud for Big Data Biometrics”, 2011
• A prototype system for generalized searching of cloud-scale biometric data, as well as an application of this system to the task of matching a collection of synthetic human iris images
• Implemented with Hadoop (Map/Reduce framework)
Successful deployment of identification algorithms for the India UID program
• Non-traditional matching vendor technologies
Biometrics as a Service
• Business process as a service
• Software as a service
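The map/reduce shape of a partitioned biometric search can be sketched in plain Python. This is neither the Hadoop API nor the cited prototype's code. It assumes iris templates are fixed-length bit codes stored as integers and compared by Hamming distance, and that the gallery is already split into partitions. The identifiers and bit patterns are made up for illustration.

```python
from functools import reduce


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two equal-length bit codes."""
    return bin(a ^ b).count("1")


def map_partition(probe: int, partition):
    """Map step: best (distance, identity) inside one gallery partition."""
    return min((hamming(probe, code), ident) for ident, code in partition)


def search(probe: int, partitions):
    """Reduce step: keep the closest of the per-partition winners."""
    local_best = (map_partition(probe, part) for part in partitions)
    return reduce(min, local_best)


partitions = [  # hypothetical gallery, split across two workers
    [("a", 0b11110000), ("b", 0b10101010)],
    [("c", 0b11110001), ("d", 0b00001111)],
]
print(search(0b11110011, partitions))
```

Each map call touches only its own partition, so the calls can run on separate machines. The reduce step only compares one small tuple per partition, which is what lets this pattern scale out.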
Challenges
Focus on Parallelism and Scalability
• Excellent research and testing areas
• Bring algorithms into operational environment
Explore defining biometrics as a service program: a new way of thinking about acquisition
• Business process as a service
• Software as a service
Encourage partnership among Big Data & Analytics developers and traditional biometrics solution providers
• Big Data and Analytics players
Big Data Appliance Examples
• IBM Netezza
• Oracle Exadata
• Teradata
• EMC Greenplum
• SAP HANA
• Schooner Appliance MySQL
Example (CBP): 40 TB of data per appliance (a few hundred cores each), hosted on a little more than a dozen appliances, supports 30-40% of DHS’s operations
Identity and Biometrics Analytics in Near Real Time
ROC curve calibration along the security vs. convenience trade-off
• Allow systems to dynamically change operating criteria based on the live situation
• This is a real challenge because of the ground truth needed…
Quality Feedback to the Collection
• Avoid collecting ‘bad’ data that degrades the system
Operating Metrics Monitoring
• Rates of enrollment, rejection, etc.
• Geo-location and temporal information
Fuse all data sources based on real-time feedback
• Dynamically allocate fusion algorithms and configurations
Provide controlled parallelism
• At the system and algorithm levels
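Moving the operating point along the ROC curve amounts to re-picking a decision threshold from labeled match scores. The minimal sketch below assumes such ground-truth scores are available, which the slide itself flags as the hard part. Scores are taken to be similarity values where higher means more alike. The function names and sample scores are illustrative.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FMR, FNMR) when scores >= threshold are declared a match."""
    fmr = sum(s >= threshold for s in impostor) / len(impostor)
    fnmr = sum(s < threshold for s in genuine) / len(genuine)
    return fmr, fnmr


def pick_threshold(genuine, impostor, max_fmr):
    """Lowest observed score whose FMR stays within max_fmr.

    FMR never rises as the threshold rises, so the first qualifying
    threshold in ascending order also gives the lowest FNMR (most convenient).
    """
    for t in sorted(set(genuine) | set(impostor)):
        fmr, fnmr = error_rates(genuine, impostor, t)
        if fmr <= max_fmr:
            return t, fmr, fnmr
    return None  # no observed score satisfies the cap


genuine = [0.9, 0.8, 0.7, 0.6]     # hypothetical same-person scores
impostor = [0.1, 0.3, 0.5, 0.65]   # hypothetical different-person scores
print(pick_threshold(genuine, impostor, max_fmr=0.25))
print(pick_threshold(genuine, impostor, max_fmr=0.0))
```

Tightening the false-match cap from 0.25 to 0 raises the threshold and starts rejecting genuine users, which is the security versus convenience trade-off in miniature.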
One Approach - Streams Technology at Work
Near Real Time on Big Data Platform
Continuous ingestion
Continuous analysis
• Transform
• Filter / Sample
• Classify
• Correlate
• Annotate
Achieve scale:
• By partitioning applications into software components
• By distributing across stream-connected hardware hosts
Infrastructure provides services for:
• Scheduling analytics across hardware hosts
• Establishing streaming connectivity
Where appropriate:
• Elements can be fused together for lower communication latency
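The operator chain described here (transform, filter, classify applied to a continuous feed) can be sketched with Python generators. This is not the API of any streams product. The record fields, thresholds and operator names are hypothetical. Composing the generators inside one process is only a loose analogue of fusing elements to avoid communication overhead.

```python
def transform(samples):
    """Transform: rescale a raw 0-100 score to the 0-1 range."""
    for s in samples:
        yield {**s, "score": s["raw_score"] / 100}


def filter_quality(samples, min_quality):
    """Filter: drop captures whose quality is too low to trust."""
    for s in samples:
        if s["quality"] >= min_quality:
            yield s


def classify(samples, threshold):
    """Classify: label each remaining sample as a match or not."""
    for s in samples:
        label = "match" if s["score"] >= threshold else "no_match"
        yield {**s, "label": label}


def pipeline(source, min_quality=0.5, threshold=0.8):
    """Chain the operators; each record flows through as soon as it arrives."""
    return classify(filter_quality(transform(source), min_quality), threshold)


feed = [  # stands in for a continuous, unbounded source
    {"id": 1, "raw_score": 91, "quality": 0.9},
    {"id": 2, "raw_score": 95, "quality": 0.2},
    {"id": 3, "raw_score": 40, "quality": 0.7},
]
results = [(r["id"], r["label"]) for r in pipeline(iter(feed))]
print(results)
```

Because generators are lazy, nothing is buffered between stages: each record is ingested and analyzed before the next one is read, which mirrors the continuous ingestion and analysis described above.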
Summary
Re-focus on Identity
• Biometrics as an enabling technology
Re-thinking on
• Open architecture
• Vendor agnostic solution via biometrics middleware
Big Impact by Big Data and Cloud Technologies
• Biometrics as a Service to Leverage Cloud Computing
Big Data Real Time Platform
• Near real time analytics requirements
A New Look - Identity and Biometrics Analytics
Stream in Parallel
Big Data Platform
Entity / Identity Resolution
Big Data Solution Pipeline
Identification Services Including many Models
Massively Parallel Processing
Real Time
High Volume
Travel Data
Banking Data
Spatial Data
Temporal Data
Real-time feeds
Biometrics Capture Data
Biographic Data
Unstructured data
Social Media
Info on Web
Behavioral data
Report – Descriptive Analytics
Predictive Models
Business Workflow Resolution
Visualization Analytics
Content Analytics

Identity and Biometrics in the Big Data & Analytics Context

  • 1. © 2009 IBM Corporation. Leveraging Information for Smarter Organizational Outcomes. IBM Confidential. June 2013. Identity and Biometrics in the Big Data & Analytics Context. Dr. Charles Li, Analytics Solution Center, Washington, DC. Charles _Li@us.ibm.com
  • 2. Topics: ID Management, Identity & Biometrics; Views on Biometrics Technology and System; The Concept of Big Data, Analytics and Challenges; Identity Establishment from All Sources; Identity and Biometrics in the Cloud; Identity and Biometrics Analytics in Near Real Time; Summary
  • 3. ID Management, Identity and Biometrics: Identity Elements; Players; Entitlement(s); Actions; Identity; Trust (Rules); Status (Environment); Reputation (History); Identity Management
  • 4. Views on biometrics technology and system. What is missing?
  • 5. Big Data Concept: extract insight from a high volume, variety and velocity of data in a timely and cost-effective manner
      - Variety: data in many forms (structured, unstructured, text and multimedia)
      - Velocity: data in motion (analysis of streaming data to enable decisions within fractions of a second)
      - Volume: data at scale (from terabytes to zettabytes)
  • 6. Analytics Concept. Structured Data & Unstructured Content. Made consumable and accessible to everyone.
      - Descriptive Analytics: What is happening? What exactly is the problem? How many, how often, where? What actions are needed?
      - Predictive Analytics: What could happen? (Simulation) What if these trends continue? (Forecasting) What will happen next if? (Predictive Modelling)
      - Prescriptive Analytics: How can we achieve the best outcome? (Optimisation) How can we achieve the best outcome and address variability? (Stochastic Optimisation)
      - Content Analytics: extracting insight, concepts and relationships
      - Visual Analytics: deep insights to improve visualization and marketing interactions
  • 7. Biometrics Data at Scale: Static & Single Instance
      - Arrivals: 1 billion worldwide in 2012; United States, 100-200 million international arrivals in 2012. 1 exabyte of travel data.
      - Unique Identification Authority of India (UIDAI) plans to enroll 1.2 billion citizens (UID Program; enroll million/day; half a billion by 2014). 3-4 exabytes of biometric and biographic data.
      - Prolific usage of mobile phones: 6 billion mobile phones. 6 exabytes of behavior data.
      - ID cards / border crossings / benefits / multiple instances: 7,000,000,000 x (10 print 0.5-1 MB + face 200 KB + iris KB). 7 exabytes.
      - EU VIS Biometrics Matching System (BMS): 70 million individuals and 100K daily enrollments. ~100 terabytes.
      - US DoS: in the range of 100 million faces and others. At least 10-50 terabytes.
      - DHS IDENT: over 150 million identities; 125,000 transactions daily. ~100-300 terabytes.
      - FBI NGI: over 100 million fingerprints, with more coming, plus faces and iris. ~100-200 terabytes.
      - Units: 1 gigabyte = 1000 MB; 1 terabyte = 1000 GB; 1 petabyte = 1000 TB; 1 exabyte = 1000 PB; 1 zettabyte = 1000 EB; 1 yottabyte = 1000 ZB
      - Data in reality: many instances, history, transactions, logs…
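The storage arithmetic on slide 7 can be checked with a short sketch. It uses the decimal units the slide defines and the per-record sizes the slide gives; the function names are illustrative, and the iris size is left at zero because the slide does not state it. Taken at face value, one record per person for 7 billion people comes out at a few petabytes, so the exabyte-scale figures presumably rely on the multiple instances, history and logs the slide mentions; that reading is an assumption, not something the slide states.

```python
# Decimal units as defined on the slide: each step is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: int) -> tuple[float, str]:
    """Return (value, unit) using the largest unit that keeps value >= 1."""
    index = 0
    while index < len(UNITS) - 1 and num_bytes >= 1000 ** (index + 1):
        index += 1
    return num_bytes / 1000 ** index, UNITS[index]


def single_instance_bytes(people: int, ten_print_bytes: int,
                          face_bytes: int, iris_bytes: int = 0) -> int:
    """Storage for exactly one biometric record per person.

    iris_bytes defaults to 0 because the slide leaves the iris size blank.
    """
    return people * (ten_print_bytes + face_bytes + iris_bytes)


# Lower and upper bounds from the slide: 10 print 0.5-1 MB, face 200 KB.
low = single_instance_bytes(7_000_000_000, 500_000, 200_000)
high = single_instance_bytes(7_000_000_000, 1_000_000, 200_000)
print(human_readable(low), human_readable(high))
```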
  • 8. Big Data Sources. System transaction, log and transition data: several times more!
  • 9. Other Big Data examples
      - Healthcare: 150 exabytes global size of “Big Data”, growing between 1.2 and 2.4 EB per year
      - For every session, the NY Stock Exchange captures 1 terabyte of trade information
      - AT&T transfers about 30 petabytes of data through its network daily
      - The Hadron Collider at CERN generates 40 terabytes of usable data per day
      - Facebook processes 500+ terabytes of data daily
      - Google processes more than 24 petabytes of data in a single day
      - Twitter processes 12 terabytes of data daily
      - By 2016, annual Internet traffic will reach 1.3 zettabytes
      - We don’t have the most challenging problem!
  • 10. Biometric Performance at Giga Scale (courtesy of Bojan Cukic)
      - “Brute Force” De-Duplication: cumulative de-duplication needs N(N-1)/2 checks in total (a “combination problem”). De-duplicating a 100 million population enrollment results in 4,999,999,950,000,000 checks: 15 years to complete at 10 million matches per second. Prohibitive!
      - Biometric Accuracy Challenge: FMR at 1 identification false match per million; 500 false matches with a 1 million enrollment population; 5 million false matches with a 100 million enrollment population.
      - We have some unique challenges!
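The figures on slide 10 follow from the N(N-1)/2 pair count, which the sketch below reproduces. The function names are illustrative. One caveat: under this pairwise model, the slide's false-match counts (500 and 5 million) come out when the false match rate per comparison is taken as one in a billion. That rate is inferred from the figures here and is not stated on the slide.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def pairwise_checks(n: int) -> int:
    """Number of comparisons to de-duplicate n enrollments: N(N-1)/2."""
    return n * (n - 1) // 2


def years_to_complete(n: int, matches_per_second: float) -> float:
    """Wall-clock years for a brute-force de-duplication at a fixed match rate."""
    return pairwise_checks(n) / matches_per_second / SECONDS_PER_YEAR


def expected_false_matches(n: int, fmr_per_comparison: float) -> float:
    """Expected false matches if every pair is compared once."""
    return pairwise_checks(n) * fmr_per_comparison


population = 100_000_000
print(pairwise_checks(population))                # pair count for 100 million
print(years_to_complete(population, 10_000_000))  # a little under 16 years
# Assumed per-comparison rate (inferred, see the note above).
print(expected_false_matches(1_000_000, 1e-9))
print(expected_false_matches(population, 1e-9))
```

Because the pair count grows with the square of the population, a 100-fold increase in enrollment multiplies both the run time and the expected false matches by roughly 10,000.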
  • 11. Face the Challenges
      - Identity Establishment with All Data Sources: leverage entity resolution technologies
      - Biometrics Services in the Cloud: leverage Big Data infrastructure, platforms and software services
      - Identity and Biometrics Analytics in Motion
  • 12. Identity Establishment with All Sources
      - Sources: biometrics (physical and behavioral); biographic information; behavior data (social media usage); travel data (API, PNR); banking information; web or desktop usage behavior (emails, multimedia); spatial and temporal information
      - Entity / Identity Resolution with All Sources: a complex process involving the application of sophisticated algorithms across multiple heterogeneous data sources to resolve multiple records into a single fused view of an individual
      - Reduces search space and computing resources
      - Complements low-quality images
      - Cost and benefit trade-off
      - Systematic research necessary
      - Successful programs
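The idea of resolving multiple records into a single fused view can be sketched with a union-find structure. This is a deliberately simplified illustration: the records, field names and the rule "link two records when they share an exact passport number or email" are all hypothetical, and real entity resolution relies on fuzzy and probabilistic matching rather than exact keys.

```python
from collections import defaultdict

# Hypothetical records from different sources; field names are illustrative.
RECORDS = [
    {"id": "bio-1", "source": "biographic", "passport": "X123", "email": None},
    {"id": "trv-7", "source": "travel", "passport": "X123", "email": "a@example.org"},
    {"id": "web-3", "source": "web", "passport": None, "email": "a@example.org"},
    {"id": "bnk-9", "source": "banking", "passport": "Y555", "email": None},
]
KEYS = ("passport", "email")


def resolve(records, keys=KEYS):
    """Group record ids that share any identifying key value (transitively)."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (key, value) -> id of the first record carrying it
    for r in records:
        for key in keys:
            value = r.get(key)
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                union(first_seen[token], r["id"])
            else:
                first_seen[token] = r["id"]

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r["id"])
    return sorted(sorted(c) for c in clusters.values())


print(resolve(RECORDS))
```

Here the biographic and web records never share a key directly; they end up in one identity only through the travel record, which is the kind of cross-source fusion the slide describes. Matching biometrics within a resolved cluster rather than against the whole gallery is one way the search space can shrink.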
  • 13. Biometrics Services in the Cloud - Leverage Big Data Infrastructure, Platform and Software Services
      - Infrastructure Platform: Management and Administration; Availability and Performance; Security and Compliance; Usage and Accounting
      - Enterprise Application Services: Application Lifecycle; Application Resources; Application Environments; Application Management; Integration
      - Cloud Services: Infrastructure and Platform as a Service
      - Smarter Commerce; Smarter Cities; Social Business; Business Analytics and Optimization; Enterprise+
      - Cloud Solutions: Software and Business Process as a Service
      - Infrastructure (IaaS); Platform (PaaS); Software (SaaS); Business Process (BPaaS)
      - Deployment: Private, Public and Hybrid Models
      - Standard Interface: Enrolment Service; 1:1; Identification Service; ….
      - Process Data; Biometric Data: Fingerprint, Iris, Face
      - Note: Cloud and Big Data are not the same
  • 14. Progress
      - A prototype, leveraging the cloud for Big Data biometrics: E. Kohlwey et al., “Leveraging the Cloud for Big Data Biometrics”, 2011. A prototype system for generalized searching of cloud-scale biometric data, as well as an application of this system to the task of matching a collection of synthetic human iris images. Implemented with Hadoop (Map/Reduce framework).
      - Successful deployment of identification algorithms for the India UID program: non-traditional matching vendor technologies
      - Biometrics as a Service: business process as a service; software as a service
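The Map/Reduce pattern mentioned on slide 14 can be illustrated in plain Python. This is not the Kohlwey et al. implementation and does not use Hadoop: the gallery is split into partitions, a map step finds the best candidate in each partition, and a reduce step keeps the best overall. Short integers stand in for iris codes, and Hamming distance is assumed as the comparison score; both are illustrative choices.

```python
from functools import reduce


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def map_partition(probe, partition):
    """Map step: best (distance, subject_id) within one non-empty partition."""
    return min((hamming(probe, code), subject_id) for subject_id, code in partition)


def reduce_best(left, right):
    """Reduce step: keep the candidate with the smaller distance."""
    return min(left, right)


def search(probe, partitions):
    """Identify a probe against a gallery split into independent partitions."""
    return reduce(reduce_best, (map_partition(probe, p) for p in partitions))


# Toy gallery: each partition could live on a different worker node.
partitions = [
    [("s1", 0b10110010), ("s2", 0b01001101)],
    [("s3", 0b10110110), ("s4", 0b11111111)],
]
probe = 0b10110111
print(search(probe, partitions))
```

The map step touches only its own partition, so partitions can be searched in parallel; only one small tuple per partition travels to the reduce step.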
  • 15. Challenges
      - Focus on parallelism and scalability: excellent research and testing areas; bring algorithms into the operational environment
      - Explore defining a biometrics-as-a-service program, a new way of thinking about acquisition: business process as a service; software as a service
      - Encourage partnership among Big Data & Analytics developers and traditional biometrics solution providers: Big Data and Analytics players
  • 16. Big Data Appliance Examples: IBM Netezza; Oracle Exadata; Teradata; EMC Greenplum; SAP HANA; Schooner Appliance (MySQL). Example (CBP): 40 TB of data per appliance (a few hundred cores each), hosted on a little more than a dozen appliances, supporting 30-40% of DHS’s operations.
  • 17. Identity and Biometrics Analytics in Near Real Time
      - ROC curve calibration along the security vs. convenience trade-off: allow systems to dynamically change operating criteria based on the live situation. This is a real challenge because of the ground truth it requires…
      - Quality feedback to the collection: avoid collecting ‘bad’ data that degrades the system
      - Operating metrics monitoring: rates of enrollment, rejection, etc.; geo-location and temporal information
      - Fuse all data sources based on real-time feedback: dynamically allocate fusion algorithms and configurations
      - Provide controlled parallelism: at system and algorithm levels
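Moving an operating point along the ROC curve amounts to picking a different decision threshold. The sketch below assumes similarity scores where higher means more alike, and assumes labeled genuine and impostor scores are available, which is exactly the ground truth the slide flags as hard to obtain. The scores and function names are made up for illustration.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FMR, FNMR) when scores >= threshold are declared matches."""
    fmr = sum(s >= threshold for s in impostor) / len(impostor)
    fnmr = sum(s < threshold for s in genuine) / len(genuine)
    return fmr, fnmr


def pick_threshold(genuine, impostor, max_fmr):
    """Lowest observed score usable as a threshold with FMR <= max_fmr.

    FMR never increases as the threshold rises, so the lowest qualifying
    threshold also gives the lowest FNMR. Returns None if none qualifies.
    """
    for t in sorted(set(genuine) | set(impostor)):
        if error_rates(genuine, impostor, t)[0] <= max_fmr:
            return t
    return None


# Made-up labeled scores standing in for ground truth.
genuine = [0.9, 0.8, 0.75, 0.6]
impostor = [0.1, 0.3, 0.5, 0.7]

convenient = pick_threshold(genuine, impostor, max_fmr=0.25)  # tolerate more false matches
secure = pick_threshold(genuine, impostor, max_fmr=0.0)       # tolerate none
print(convenient, error_rates(genuine, impostor, convenient))
print(secure, error_rates(genuine, impostor, secure))
```

A system that changes its operating criteria with the live situation would re-run a selection like this with a different `max_fmr` as the threat level or queue length changes.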
  • 18. Near Real Time on Big Data Platform. One Approach: Streams Technology at Work
      - Continuous ingestion, continuous analysis
      - Operators: Transform; Filter / Sample; Classify; Correlate; Annotate
      - Achieve scale by partitioning applications into software components and by distributing them across stream-connected hardware hosts
      - Infrastructure provides services for scheduling analytics across hardware hosts and establishing streaming connectivity
      - Where appropriate, elements can be fused together for lower communication latency
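The operator chain on slide 18 can be approximated with Python generators, where each stage consumes events one at a time and passes results downstream. This is only a single-process sketch of the idea, not the API of any streams product; the event fields, thresholds and site name are hypothetical.

```python
def filter_quality(stream, min_quality):
    """Filter operator: drop captures below a quality floor."""
    for event in stream:
        if event["quality"] >= min_quality:
            yield event


def classify(stream, threshold):
    """Classify operator: attach a match decision to each event."""
    for event in stream:
        decision = "match" if event["score"] >= threshold else "no-match"
        yield {**event, "decision": decision}


def annotate(stream, site):
    """Annotate operator: tag each event with where it was processed."""
    for event in stream:
        yield {**event, "site": site}


# Hypothetical capture events; in practice these would arrive continuously.
events = [
    {"id": 1, "quality": 0.9, "score": 0.8},
    {"id": 2, "quality": 0.2, "score": 0.95},
    {"id": 3, "quality": 0.7, "score": 0.4},
]

pipeline = annotate(classify(filter_quality(iter(events), 0.5), 0.6), "gate-A")
results = list(pipeline)
for row in results:
    print(row)
```

Each stage holds only one event at a time, so the same chain works on an unbounded feed. Chaining stages inside one process loosely mirrors the slide's point about fusing elements to lower communication latency.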
  • 19. Summary
      - Re-focus on identity: biometrics as an enabling technology
      - Re-think: open architecture; vendor-agnostic solutions via biometrics middleware
      - Big impact from Big Data and cloud technologies: Biometrics as a Service to leverage cloud computing
      - Big Data real-time platform: near-real-time analytics requirements
  • 20. 6/18/2013
  • 21. A New Look - Identity and Biometrics Analytics
      - Stream in Parallel; Big Data Platform; Entity / Identity Resolution; Big Data Solution Pipeline; Identification Services Including many Models; Massively Parallel Processing; Real Time; High Volume
      - Travel Data; Banking Data; Spatial Data; Temporal Data; Real-time feeds; Biometrics Capture Data; Biographic Data; Unstructured data; Social Media; Info on Web; Behavioral data
      - Report (Descriptive Analytics); Predictive Models; Business Workflow Resolution; Visualization Analytics; Content Analytics