The document provides an overview of the Swift River platform, an initiative by Ushahidi to aggregate and filter large amounts of real-time data from various sources. Some key points:
- Swift River aggregated over 100,000 reports from sources like Twitter and SMS in the first four days of the 2010 Haiti earthquake.
- It aims to surface important information while suppressing noise, identify authoritative sources and users, and curate discussions around topics or events.
- The platform's goals are to help users more easily find, understand, and manage large amounts of unstructured data in real-time from around the world.
- It utilizes various techniques like natural language processing, location services, duplication filtering, and reputation
4. HAITI EARTHQUAKE
• 20,000 reports in the first few hours
•100,000 reports in the first four days
•Average of 25,000 per day
•Real-time Reports from around the world
•Conflicting statements
•From Twitter users, text messages, email,
and the web
•Each report verified by humans
5. What if we listened to the crowd?
What if the crowd was also the filter?
Our problem then becomes who in the
crowd do we listen to?
13. PLATFORM GOALS
•Save time when sweeping through large,
unstructured datasets.
•Surface signal, suppress noise (irrelevant
discussions, inaccuracies, duplicates)
•Identify relevant sources and rate them
•Identify and rate authoritative users
•Curate discussion and media around any topic
or event
•Put the tools into the user’s hands (think Google
Reader vs. Google News)
14. WHAT IS SWIFT FOR?
• Improving information findability
• Surfacing content you didn't know you were looking for
• Automatically sorting and structuring unstructured data
• Understanding media from other parts of the world (translation)
• Making urgent data more discoverable
• Leveraging eyewitness accounts
• Using location as context
• Expanding the grassroots reporting network
• Preserving information (archiving)
• Improving data management within Ushahidi
15. Example Use Cases
• Journalists / Newsrooms (aggregating thousands of feeds)
• Emergency Response Organizations (aggregating reports)
• Election Monitoring (curating trusted sources)
• Research and Information Curation (collecting info from around the web)
• Real-Time News Gathering (filter by conditional rules)
• Online Brand Monitoring (monitoring keywords)
16. TERMINOLOGY
•Channels - methods of delivery (email, SMS,
Twitter, YouTube, RSS/XML etc.)
•Sources - the source producing aggregated
content
•Content Items - individual items coming from
sources
•Dashboard - the central window for processing
the various types of data aggregated by Swift
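The terms above can be read as a small data model. The following is a minimal sketch of that reading, not SwiftRiver's actual schema: the class names, fields, and the group_by_channel helper are all hypothetical and chosen only to mirror the slide's vocabulary.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    """Delivery methods named on the slide (hypothetical enum, not SwiftRiver's)."""
    EMAIL = "email"
    SMS = "sms"
    TWITTER = "twitter"
    YOUTUBE = "youtube"
    RSS = "rss"


@dataclass(frozen=True)
class Source:
    """Whoever or whatever produces content on a given channel."""
    name: str
    channel: Channel


@dataclass
class ContentItem:
    """A single item (tweet, text message, feed entry) coming from a source."""
    source: Source
    text: str


def group_by_channel(items):
    """Group content items by their source's channel, as a dashboard view might."""
    grouped = {}
    for item in items:
        grouped.setdefault(item.source.channel, []).append(item)
    return grouped
```

In this sketch a dashboard would simply iterate over the grouped items; how the real dashboard organizes them is not stated in the slides.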
17. SwiftRiver Web Services
• SiLCC - NLP for SMS and Twitter
• SULSa - Location Services
• SiCDS - Duplication Filtering
• River ID - Distributed Reputation
• Reverberations - Measures influence of online content
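The slides do not describe how any of these services work internally. As an illustration of what duplication filtering can mean in practice, here is a minimal sketch that drops reports whose normalized text has already been seen. The normalize-then-hash approach and the function names fingerprint and filter_duplicates are assumptions for this example, not SiCDS's documented algorithm or API.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing punctuation and whitespace."""
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_duplicates(texts):
    """Return texts in their original order, skipping repeated fingerprints."""
    seen = set()
    unique = []
    for text in texts:
        key = fingerprint(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

Exact-match fingerprints like this only catch near-verbatim repeats (for example, retweets with different capitalization); reworded reports of the same event would need a fuzzier comparison.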
18. ROADMAP
• March 31 v0.0.9 Rumba *
  • First deployment
  • Proof of concept
• April 30 v0.1.0 Apala *
  • New Core, API, Plugin Architecture
  • Set up EC2 environment for hosted services
  • SMS and Email
• May 31 v0.2.0 Batuque *
  • Integrate SiLCC, SULSa
  • Modules for WordPress
• June 30 v0.3.0 Benge
  • User profiles, SiCDS
• July 30 v0.4.0 Bikutsi
  • jQuery? Tornado for PHP?
  • River ID
  • Reverberations
  • Drupal
• August 30 v0.5.0 Jazz
  • Feed Recommendations
  • MHI