Unstructured Data Parsing

Seentags is a solution developed for extracting and sorting information from text reports

Jan 2010 - Oct 2011
InSTEDD, Data collection, Mobile, Open Source
3 min


Client

Developed together with InSTEDD to simplify data reporting from users in the field, Seentags is an open-source tool that can be plugged into any existing web application through simple HTTP requests, or used on its own to receive messages and store them in a database.


Approach

Most end users find it hard to remember and accurately type complex syntax when structured data reporting is required. Seentags deciphers incoming messages regardless of their conformance to syntax rules, eliminating the need for complex or consistent reporting structures in order for a message to be understood.
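To give a feel for what syntax-tolerant parsing can look like, here is a minimal Python sketch. It is not Seentags' actual algorithm: the function name `parse_report` and the rule "a word followed by a number forms a label/value pair" are assumptions made purely for illustration.

```python
import re

# Hypothetical illustration only; not the algorithm Seentags uses.
# A token is either a run of letters or a number (optionally with decimals).
TOKEN = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?")

def parse_report(message):
    """Pair each number with the word that precedes it, ignoring separators."""
    pairs = []
    label = None
    for tok in TOKEN.findall(message):
        if tok[0].isdigit():
            value = float(tok) if "." in tok else int(tok)
            pairs.append((label, value))
            label = None
        else:
            # Keep only the most recent word as the candidate label.
            label = tok.lower()
    return pairs

# The same report typed with different separators yields the same result.
print(parse_report("malaria: 5, dengue 3"))
print(parse_report("Malaria=5 dengue-3"))
```

Because punctuation is simply skipped during tokenization, the reporter does not have to remember whether the expected separator is a colon, a comma or a dash.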


Results

Seentags accepts incoming text messages and breaks the contents up into categories, without requiring the admin to tell the system in advance what the categories are. Once the information is in the Seentags system, the user can provide feedback to ‘train’ Seentags so that it learns to correctly interpret new report formats.

With a built-in pattern detector, the system only needs to be taught once which category a piece of text belongs to; it then applies that category automatically to all future messages and corrects all previous entries. Text separators such as commas, dashes or periods are no longer needed to define, categorize and organize incoming information. Seentags automatically puts the data into a CSV file that can be exported for future needs.
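The "teach once, apply everywhere" behaviour and the CSV export can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not Seentags' implementation: the class `LabelTrainer`, its methods and the label-to-category dictionary are all hypothetical names chosen for the example.

```python
import csv
import io

class LabelTrainer:
    """Hypothetical sketch: remembers which category a raw label belongs to."""

    def __init__(self):
        self.categories = {}  # raw label -> category taught by the user
        self.entries = []     # stored reports, each a list of (label, value)

    def add(self, pairs):
        """Store one incoming report as (label, value) pairs."""
        self.entries.append(list(pairs))

    def teach(self, raw_label, category):
        """Record, once, which category a raw label stands for."""
        self.categories[raw_label] = category

    def rows(self):
        # The mapping is applied when reading, so a single teach() call
        # affects earlier entries as well as later ones.
        return [
            {self.categories.get(label, label): value for label, value in entry}
            for entry in self.entries
        ]

    def to_csv(self):
        """Export all entries as CSV text, one column per category."""
        rows = self.rows()
        fields = sorted({key for row in rows for key in row})
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)  # missing columns are written as empty cells
        return buf.getvalue()

trainer = LabelTrainer()
trainer.add([("mal", 5)])                # stored before any training
trainer.teach("mal", "malaria")          # taught one time
trainer.add([("mal", 2), ("age", 30)])   # later message, same abbreviation
print(trainer.to_csv())
```

Applying the mapping at read time, rather than rewriting stored entries, is one simple way to make a correction reach both old and new messages; a real system could equally update the stored records.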

Deciphering process

[Diagram: deciphering process]

Open source

This project is open source. We invite you to collaborate and join us in the development of a better world through the use of technology.


https://bitbucket.org/instedd/seentags
 
