Data Collaboration | Manas.Tech

Client

Although development of Riff (in partnership with InSTEDD) initially focused on health-related detection scenarios, the underlying system is a general collaboration environment for content creation, social metadata annotation, and automated analysis, with potential applications in a wide range of fields. Several organizations are exploring Riff for purposes as varied as humanitarian crisis reporting and early conflict warning. One organization, for example, recently began training Riff’s integrated support vector machine (SVM) engine to identify hate speech and other potential indicators of geopolitical deterioration in news reports.

Approach

Riff consists of several high-level modules, including:

Data aggregation and gathering
Automatic feature extraction, data classification and tagging
Human input, hypothesis generation and review
Predictions and alerts output
Field confirmation and feedback

The data aggregation and gathering module lets users collect information from many sources, that is, extract, transform, and load it (ETL): SMS messages (e.g., Geochat), RSS feeds, email lists (e.g., ProMED, Veratect, HealthMap, Biocaster, EpiSpider), OpenROSA, Map Sync, Epi Info™, documents, web pages, electronic medical records (e.g., OpenMRS), animal disease data (e.g., OIE, AVRI hotline), environmental feeds, NASA remote sensing data, and more.
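The ETL step can be illustrated with a minimal sketch. It assumes a hypothetical common `Item` record and two hypothetical adapters, `from_sms` and `from_rss`; Riff's actual schema and connectors are not described on this page, and the input shapes below are assumptions for illustration only (Python 3.9+).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Item:
    """Hypothetical common record that every source is transformed into."""
    source: str
    text: str
    received_at: datetime  # always timezone-aware, so items from any source sort together
    tags: set[str] = field(default_factory=set)


def from_sms(sender: str, body: str, ts: float) -> Item:
    # Transform a raw SMS: trim whitespace, convert epoch seconds to a UTC datetime.
    return Item(source=f"sms:{sender}", text=body.strip(),
                received_at=datetime.fromtimestamp(ts, tz=timezone.utc))


def from_rss(feed_url: str, entry: dict) -> Item:
    # Assumed entry shape: "title", optional "summary", and an ISO-8601 "published"
    # timestamp that includes a UTC offset.
    text = f"{entry['title']}. {entry.get('summary', '')}".strip()
    return Item(source=f"rss:{feed_url}", text=text,
                received_at=datetime.fromisoformat(entry["published"]))


def load(items: list[Item], space: list[Item]) -> None:
    # Load into a collaborative space, keeping the newest items first.
    space.extend(items)
    space.sort(key=lambda item: item.received_at, reverse=True)


# Usage with made-up inputs.
space: list[Item] = []
load([
    from_sms("+85512345678", "  Fever cases reported in village  ", 1236000000),
    from_rss("https://example.org/feed", {
        "title": "Respiratory illness cluster",
        "published": "2009-03-01T08:00:00+00:00",
    }),
], space)
print([item.source for item in space])  # the SMS (2009-03-02) sorts before the RSS entry
```

Normalizing every source into one record shape early means the downstream tagging and review steps never need to know where an item came from.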

The automatic feature extraction, data classification and tagging module is architecturally extensible: it allows machine learning algorithms (e.g., Bayesian classifiers, SVMs) to be plugged in. These components extract and augment features (tags or metadata) from multiple data streams, such as source and target geolocation, time, and route of transmission (e.g., person-to-person, waterborne). They also help detect relationships between the extracted features, within a collaborative space or across different collaborative spaces. Furthermore, with human input, these components can suggest possible events or event types (e.g., at the earliest stages of a disease outbreak: “there is an unknown respiratory event, transmitted person-to-person, detected in location X, and with a certain spatio-temporal pattern”).
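A minimal sketch of a pluggable extraction layer follows. It assumes that each extractor is simply a function from text to a set of tags, and it uses keyword matching in place of a trained model; the lexicon, the gazetteer, and the names `register`, `extract`, and `co_occurrences` are all hypothetical. Relationships between features are approximated by counting how often two tags appear on the same item.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Callable

# An extractor maps raw text to a set of tags. A trained classifier could be
# plugged in later by registering another function with the same shape.
Extractor = Callable[[str], set[str]]
EXTRACTORS: list[Extractor] = []


def register(fn: Extractor) -> Extractor:
    EXTRACTORS.append(fn)
    return fn


# Hypothetical cue words per route of transmission (not Riff's vocabulary).
ROUTES = {
    "person-to-person": {"cough", "contact", "household", "person-to-person"},
    "waterborne": {"water", "well", "river", "diarrhea"},
}
PLACES = {"phnom penh", "vientiane", "hanoi"}  # hypothetical gazetteer


@register
def route_of_transmission(text: str) -> set[str]:
    words = set(re.findall(r"[a-z-]+", text.lower()))
    return {f"route:{route}" for route, cues in ROUTES.items() if words & cues}


@register
def location(text: str) -> set[str]:
    lower = text.lower()
    return {f"loc:{place}" for place in PLACES if place in lower}


def extract(text: str) -> set[str]:
    # Union of the tags produced by every registered extractor.
    tags: set[str] = set()
    for fn in EXTRACTORS:
        tags |= fn(text)
    return tags


def co_occurrences(items: list[set[str]]) -> Counter:
    # Count tag pairs that appear on the same item; frequent pairs hint at related features.
    pairs: Counter = Counter()
    for tags in items:
        pairs.update(combinations(sorted(tags), 2))
    return pairs


reports = [
    "Household cluster with cough reported in Hanoi",
    "Diarrhea after river water use near Vientiane",
    "Cough and fever among household contacts, Hanoi",
]
tagged = [extract(r) for r in reports]
pairs = co_occurrences(tagged)
print(pairs.most_common(1))
```

Keeping extractors behind one function signature is what makes the layer extensible: swapping the keyword matcher for a Bayesian or SVM model would not change `extract` or `co_occurrences`.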

The human input and review module provides a set of functions that let users comment on, tag, and semantically rank items (positive, neutral, or negative). Users can also generate and test multiple hypotheses in parallel, collect and rank sets of related items as evidence, and model them against baseline information (for cyclical or known events). The system maintains a list of ongoing possible threats, allowing domain experts to focus their field information gathering and either confirm or reject the hypotheses created. This feedback then flows back into the system to increase or decrease the reliability of the sources and the credibility of the users in light of their inferences or decisions.
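One way the feedback loop could work is sketched below. It assumes a Beta-Bernoulli scheme, in which each source or user keeps counts of confirmed and rejected hypotheses; this is a swapped-in technique, as the page does not say how Riff computes reliability or credibility. The hypothesis shape and the helpers `record_field_outcome` and `weighted_rank` are hypothetical (Python 3.9+).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Score:
    """Beta(alpha, beta) pseudo-counts; Beta(1, 1) means no evidence yet."""
    alpha: float = 1.0
    beta: float = 1.0

    @property
    def mean(self) -> float:
        # Estimated probability that a hypothesis backed by this source/user is confirmed.
        return self.alpha / (self.alpha + self.beta)

    def update(self, confirmed: bool) -> None:
        if confirmed:
            self.alpha += 1
        else:
            self.beta += 1


source_reliability: defaultdict[str, Score] = defaultdict(Score)
user_credibility: defaultdict[str, Score] = defaultdict(Score)


def record_field_outcome(hypothesis: dict, confirmed: bool) -> None:
    # Assumed shape: {"sources": [...], "supporters": [...]}.
    for src in hypothesis["sources"]:
        source_reliability[src].update(confirmed)
    for user in hypothesis["supporters"]:
        user_credibility[user].update(confirmed)


def weighted_rank(votes: list[tuple[str, int]]) -> float:
    # Votes are (user, +1 / 0 / -1) for positive / neutral / negative, each
    # weighted by that user's current credibility.
    weights = [(user_credibility[user].mean, vote) for user, vote in votes]
    total = sum(w for w, _ in weights)
    return sum(w * v for w, v in weights) / total if total else 0.0


# Usage with made-up field outcomes.
record_field_outcome({"sources": ["rss:promed", "sms:+85512345678"],
                      "supporters": ["analyst_a"]}, confirmed=True)
record_field_outcome({"sources": ["sms:+85512345678"],
                      "supporters": ["analyst_b"]}, confirmed=False)
print(round(source_reliability["rss:promed"].mean, 3))  # 0.667
print(weighted_rank([("analyst_a", 1), ("analyst_b", -1)]))
```

Starting every score at Beta(1, 1) keeps a new source or user at 0.5 until field confirmations accumulate, so a single outcome shifts the estimate without dominating it.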

Results

In the Public Health and Biosurveillance domain, Riff helps synthesize health-related event indicators from a wide variety of information sources (structured and unstructured) into a consolidated picture for analysis, maintenance of “community-wide coherence”, and collaboration. Current automatic classification includes seven syndromes, ten transmission modes, more than 100 infectious diseases, 180 microorganisms, 140 symptoms, and more than 50 chemicals. Presently, Riff is being piloted in the Mekong Basin region of Southeast Asia.
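Because the classification module accepts Bayesian algorithms, a report-to-syndrome classifier could look roughly like the multinomial naive Bayes sketch below. The two syndrome labels and the four training sentences are invented toy data, not Riff's taxonomy or training set, and the `NaiveBayes` class is a hypothetical illustration rather than Riff's engine.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)
        return self

    def predict(self, doc: str) -> str:
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total_docs)  # log prior
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data.
train = [
    ("fever cough shortness of breath", "respiratory"),
    ("cough sore throat pneumonia", "respiratory"),
    ("watery diarrhea vomiting dehydration", "gastrointestinal"),
    ("diarrhea abdominal cramps vomiting", "gastrointestinal"),
]
model = NaiveBayes().fit([d for d, _ in train], [l for _, l in train])
print(model.predict("patients with cough and fever"))
print(model.predict("vomiting and diarrhea in children"))
```

Log probabilities are summed rather than multiplying raw probabilities, which avoids numeric underflow on long reports.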