This document discusses how mobile phone data can be used to understand and model cities. It gives examples of analysing aggregated and anonymised mobile phone data, such as map searches, map views and check-ins, to learn about a city's businesses, transportation patterns and usage over time. It argues that, given the vast amount of sensor and usage data that phones collect, they can act as "city samplers" that allow analysis of a city's "biology" without traditional surveys or models.
2. understanding systems
by making models
We have always tried to understand systems by creating models of them. We create rules that
match reality just closely enough that we can study reality by studying the model. MONIAC is
one such example, created at the London School of Economics in 1949 by Bill Phillips. It uses
fluid dynamics to model an economy, with the flow between water tanks standing in for the
monetary flow between the Treasury, Education and so forth.
3. “The more we learn about biology,
the further we find ourselves from
a model that can explain it.”
Chris Anderson, http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
“All models are wrong, but some are useful.” - George Box, Statistician, quoted in
http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
As our knowledge advances in a field like biology, our inaccurate models give us diminishing
returns. In “The End Of Theory”, Chris Anderson argues that the future of science is transitioning
to analysing empirical data gathered from observation of the world. He calls this The Petabyte
Age, pioneered by companies such as Google who created techniques for large-scale analysis of
data out of the necessity to analyse the whole internet.
Credit: http://www.flickr.com/photos/timo/851027757/
4. people are city biology
We can try to study cities with models. But human behaviour, the biology of the city, makes
cities too complex to model.
5. Recent visualisations of the movement of hire-bikes through London emphasise for me the
organic, biological nature of human city-data.
6. “We can’t see how the
street is immersed in a
twitching, pulsing cloud
of data.”
Dan Hill: http://www.cityofsound.com/blog/2008/02/the-street-as-p.html
Dan Hill continues, “This is over and above the well-established electromagnetic radiation,
crackles of static, radio waves conveying radio and television broadcasts in digital and
analogue forms, police voice traffic. This is a new kind of data, collective and individual,
aggregated and discrete, open and closed, constantly logging impossibly detailed patterns of
behaviour. The behaviour of the street.”
The data that flows through modern cities is not even visible to the human eye. We can’t
gather this data with interviews, surveys and clipboards.
7. city samplers
So at Nokia, we’ve been asking the question, can the phone be the entire source of data that
allows us to know our cities?
8. This is plausible because so many people carry a phone with them 24 hours a day, wherever
they go in the city. It’s also because the modern mobile phone is packed with sensors. Early
phones had a microphone and a radio. Phones today know which way up they are, where they
are in the world, can record images and video, and can sense the presence of many other
devices, networks and signals.
9. This brings the city into the Petabyte Age. What allows us to process the data is a technique
developed by Google and popularised as open source by the Hadoop project.
Map-Reduce is a system for specifying a data-processing algorithm that allows the work to
be split up and distributed to a network of computers to solve in pieces. It maps raw input
data to processed output data, then reduces the output data into final results.
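As a rough illustration of the map and reduce steps described above, here is a single-machine sketch in Python. The log records, the sample data and the names `map_search`, `reduce_counts` and `run_map_reduce` are all invented for illustration; a Hadoop job would spread the same two steps across many machines rather than run them in one process.

```python
from collections import defaultdict

# Hypothetical search-log records: (search term, district where the search was made).
SAMPLE_LOG = [
    ("ikea", "Tempelhof"),
    ("ikea", "Spandau"),
    ("cafe", "Tempelhof"),
    ("ikea", "Tempelhof"),
]

def map_search(record):
    """Map step: turn one raw record into (key, value) pairs."""
    term, district = record
    yield (term, district), 1

def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)

def run_map_reduce(records, mapper, reducer):
    """Run both steps in one process; a cluster would split each step into pieces."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)  # group the mapped values by key
    return dict(reducer(key, values) for key, values in grouped.items())

counts = run_map_reduce(SAMPLE_LOG, map_search, reduce_counts)
```

Because each record is mapped independently and each key is reduced independently, both steps can be handed to separate machines without changing the result.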
10. With map-reduce, we can run an algorithm on a rack of servers...
http://www.flickr.com/photos/johnseb/3425464/
11. ... or a corridor full of racks of servers ...
12. ... or a data-centre full of corridors full of racks of servers.
We can start small and scale up our processing capability to keep pace with the scale of our
data. It sidesteps the limit we hit with traditional single-machine analytics, when we can no
longer process 24 hours of data in 24 hours of CPU time.
13. learning from search
My first example shows what we can learn by looking at what people search for on a map,
and where they are when they search.
14. Ikea Spandau
Ikea Schoenefeld
Ikea Tempelhof
Ikea geo-searches bounded to Berlin
Thursday, January 27, 2011
This map of Berlin (made by Nokia’s Josh Devins) aggregates searches made over the last
four months for the word “Ikea”. It clearly shows that people all over Berlin look for Ikea, but
that there are obvious clusters near the 3 Berlin Ikea stores.
can we make any assumptions about what the actual locations are?
kind of, but not much data here
clearly there is a Tempelhof cluster but the others are not very evident
certainly shows the relative popularity of all the locations
Ikea Lichtenberg was not open yet during this time frame
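One simple way to look for clusters like these is to snap each search location to a grid cell and count searches per cell. The sketch below assumes that approach; it is not necessarily how the Berlin map was made. The cell size, the function names and the coordinates are all made up for illustration.

```python
import math
from collections import Counter

CELL_DEGREES = 0.01  # assumed grid cell size, in degrees of latitude/longitude

def grid_cell(lat, lon, cell=CELL_DEGREES):
    """Snap a coordinate to the integer indices of its grid cell."""
    return (math.floor(lat / cell), math.floor(lon / cell))

def busiest_cells(points, top=3):
    """Count searches per grid cell and return the most-searched cells."""
    counts = Counter(grid_cell(lat, lon) for lat, lon in points)
    return counts.most_common(top)

# Invented search locations: three close together, two isolated.
SAMPLE_SEARCHES = [
    (52.4705, 13.3655),
    (52.4708, 13.3651),
    (52.4702, 13.3659),
    (52.5305, 13.2005),
    (52.5403, 13.4102),
]

top_cells = busiest_cells(SAMPLE_SEARCHES)
```

With so few points the ranking means little, which mirrors the caveat above that a small data set only hints at where the real locations are.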
15. Prenzl Berg Yuppies
Ikea Spandau
Ikea Schoenefeld
Ikea Tempelhof
The fourth obvious cluster is a demographic - the young middle-class families who tend to
live in the Prenzlauer Berg district of Berlin.
Incidentally, the data also shows that people don’t search for Ikea on a Sunday as much as
the rest of the week. This is because Germany still has Sunday-closing laws and even Ikea is
not open on Sundays.
16. learning from maps
We can learn plenty about a city just from looking at its maps, and the places on the map.
17. The “Starbucks Index”, invented by designer Tom Coates, is calculated from the number of
Starbucks cafes per square kilometre of the city. By analysing Nokia’s places registry, we can
show the difference between different cities, or different parts of a city, by looking at what
companies choose to base themselves there. We could equally well calculate a McDonalds
index, or an Italian food index, or a public parks index.
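An index of this kind is just a count of matching places divided by area. The sketch below assumes a places registry can be read as a list of records with a `name` field; that record shape, the function name `places_index` and the sample entries are hypothetical and say nothing about how Nokia's registry is actually structured.

```python
def places_index(places, brand, area_km2):
    """Return the number of places whose name contains `brand`, per square kilometre."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    matches = sum(1 for place in places if brand.lower() in place["name"].lower())
    return matches / area_km2

# Invented registry entries for a district assumed to cover 4 square kilometres.
SAMPLE_PLACES = [
    {"name": "Starbucks Main Street"},
    {"name": "Starbucks Station Square"},
    {"name": "Corner Bakery"},
    {"name": "City Park"},
]

starbucks_index = places_index(SAMPLE_PLACES, "starbucks", area_km2=4.0)
```

Swapping the `brand` argument gives the other indices mentioned above, although matching on a category field would be more robust than matching on names.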
18. Searches are goal-driven user behaviour - someone typed something into a search box on a
phone. But we can even learn from activity that isn’t so explicit.
When someone views a Nokia Ovi map on the web or phone, the visuals for the map are
served up in square “tiles” from our servers. We can analyse the number of requests made for
each tile and take it as a measure of interest or attention in that part of the world.
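Counting tile requests can be sketched as follows. This assumes each request is logged as a (zoom, x, y) triple and that tiles form a quadtree pyramid, with every tile splitting into four at the next zoom level. Both are assumptions, since the talk does not describe the Ovi Maps tile scheme, and the function names and sample log are invented.

```python
from collections import Counter

def parent_tile(zoom, x, y, target_zoom):
    """Return the ancestor of a tile at a coarser zoom level (quadtree assumption)."""
    if target_zoom > zoom:
        raise ValueError("target_zoom must not exceed the tile's zoom")
    shift = zoom - target_zoom
    return (target_zoom, x >> shift, y >> shift)

def attention_by_tile(requests, target_zoom):
    """Roll requests up to one zoom level and count them per tile."""
    counts = Counter()
    for zoom, x, y in requests:
        if zoom >= target_zoom:  # coarser tiles cannot be assigned to a single finer tile
            counts[parent_tile(zoom, x, y, target_zoom)] += 1
    return counts

# Invented request log of (zoom, x, y) triples.
SAMPLE_REQUESTS = [
    (12, 700, 1635),
    (12, 701, 1635),
    (11, 350, 817),
    (11, 100, 200),
    (10, 5, 5),
]

attention = attention_by_tile(SAMPLE_REQUESTS, target_zoom=11)
```

Rolling everything up to one zoom level keeps views at different zooms of the same place from being counted as unrelated tiles.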
20. LA attention heatmap
This is the attention map of Los Angeles, California. We can clearly see several important
hotspots such as Downtown, Hollywood and LAX airport.
21. LA driving heatmap
If we turn to the navigation logs, we get another map of Los Angeles. This data is recorded
whenever someone requests a car route from one place to another. You can clearly see the
roads, and it heavily emphasises major roads because that’s what is favoured by route-
planning algorithms. It’s also a map made by people who don’t know where they’re going - if
they knew exactly what route to take, they wouldn’t be using navigation on their phones.
22. business perspective
City data also reflects business activity. In Berlin our local coffee shop owner uses pen and
paper to record every sale he makes. He uses this to optimise his pricing and the kinds of
coffee he sells. We can do some of the same analysis on a larger scale.
23. business context
Looking at the check-in and search patterns around coffee shops, we made this map of the
San Francisco Dolores Park area. Red circles are coffee shops, and blue circles are other
businesses. The larger the circle, the more popular the location is to visit.
24. usage patterns
We discovered we could deduce more than just business information from this data. When we
looked at one specific venue, Dolores Park itself, we could tell that San Francisco is cold at
night. No matter the time of year, check-ins at the park are much lower in the evening and
night than in daytime.
When we looked at the day of the week that people visit the park, we thought we had a bug in
our data collection. Why would Thursday be different from other days for popularity of parks?
When we cross-referenced the data with weather records, we realised that this particular
Thursday was wet and cold.
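Both observations come down to counting check-ins by hour and by day. A minimal sketch, assuming check-ins are available as plain timestamps: the 50% threshold, the function names and the sample data are invented, and flagging a day only says it is unusual, not why. Explaining it still takes a cross-reference such as the weather records.

```python
from collections import Counter
from datetime import datetime
from statistics import median

def checkins_by_hour(timestamps):
    """Count check-ins for each hour of the day."""
    return Counter(ts.hour for ts in timestamps)

def unusual_days(timestamps, threshold=0.5):
    """Return dates whose check-in count falls below threshold * the median daily count.

    Days with no check-ins at all do not appear in the input and are not flagged.
    """
    per_day = Counter(ts.date() for ts in timestamps)
    typical = median(per_day.values())
    return sorted(day for day, count in per_day.items() if count < threshold * typical)

# Invented check-ins: two ordinary days and one quiet day.
SAMPLE_CHECKINS = (
    [datetime(2011, 1, 11, hour) for hour in (10, 12, 14, 16)]
    + [datetime(2011, 1, 12, hour) for hour in (11, 13, 15, 20)]
    + [datetime(2011, 1, 13, 13)]
)

hourly = checkins_by_hour(SAMPLE_CHECKINS)
quiet_days = unusual_days(SAMPLE_CHECKINS)
```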
Like many other examples in this presentation, we were excited by the fact that we can find
verifiable real-world information in pure data, without any human guidance.
25. “Information is
quickly becoming
a material to
design with.”
Mike Kuniavsky: http://orangecone.com/archives/2010/08/information_is_.html
In his recent book “Smart Things”, Mike Kuniavsky compares information to traditional
materials such as wood and rubber. It has now become a material that we can build with in
the real world, to connect the physical and the digital worlds together.
26. [nod to Matt Jones, for many conversations we had about cities while working together at
Dopplr]
27. Thank you.
Matt Biddulph
@mattb | matthew.biddulph@nokia.com
After the talk, there were questions from the audience...
28. Audience question
What about individual privacy, and the ethics of profiting from
individual user data?
1. We only ever analyse the aggregate, anonymised set of all users’ data. We didn’t track any
individuals in any part of this work.
2. I believe that profiting from analysing user data is only unethical if you don’t return
some value to those users by making them a useful, desirable product.
29. Audience question
I’m not uncomfortable with services analysing my data, but I am
unhappy if I feel like I don’t own my personal data.
In my personal opinion, individual data belongs to the individual. Putting your data into a
large service gives you access to economies of scale, allowing it to do useful analysis of the
aggregate data that you couldn’t achieve with your data alone. You benefit from this when
the service gets better the more you use it.
A company you deposit data with should act like a bank: hold it in trust, generate some
benefit, give it back when you ask.