Last month I started working on my master's thesis in the context of Manas Research. The idea behind this research is to collaborate with Farm Radio International (an organization that works with many partners in Africa to deliver effective radio programs that serve smallholder farmers) by helping them with the problem of indexing content using speech recognition.

We plan to solve this problem with a technique called Keyword Spotting, which is used when a program has to recognize utterances of specific words in large amounts of recorded data.

One could argue that a general-purpose speech recognition system (based on Hidden Markov Models or similar techniques) would do for this task. However, the first constraint we are dealing with is the assumption that the target language lacks resources for speech processing (unlike Spanish or English, for which a lot of research and software has been developed), since the radio programs will be spoken in less-studied languages such as Swahili.

Therefore, our research will focus on finding other kinds of methods to tackle the problem.

Hopefully, this research will show that good results can be achieved in spite of these constraints, and we will then be able to implement full systems based on our studies and algorithms.

I'll keep you informed :).