Google partners with Zindi to increase awareness and skills for NLP in Africa
Earlier this year, Zindi and Google partnered to organize a 4-part Natural Language Processing (NLP) hackathon series, which included an African Automatic Speech Recognition (ASR) workshop and 3 challenges centred on speech recognition, sentiment analysis, and speech data collection.
The Google NLP Hack Series spanned two months and kicked off with a live interactive workshop in February. The goal of the workshop was to increase awareness and skills for NLP in Africa, especially among researchers, students, and data scientists who were new to NLP. The workshop aimed to lower the barrier to entry for NLP by providing a beginner-friendly introduction and encouraging participants to practise their skills in a competition.
In the workshop, teams from Zindi, Data Scientists Network, and Google talked about the challenges and progress in the African NLP space and about opportunities to get involved with data science. The workshop also gave an introduction to NLP and ASR, including topics such as “What makes a good dataset?” and “Speech model training using ELPIS”.
The live workshop and its recording reached almost 1,000 African data scientists. A majority of surveyed participants said the workshop sparked an interest in continuing to learn and practise NLP.
According to Ameck Dosseh, it was a great opportunity to learn and to help preserve his mother tongue for future generations.
“Taking part in the process of teaching our mother tongue to a computer is the new way of preserving our culture and its transmission to future generations. Thanks to Zindi and Google for organizing this workshop about ASR in Africa. I got the chance to learn a lot of ideas towards this common goal.”
Following the workshop, participants had the opportunity to compete in 3 challenges. The first was the West Africa ASR Challenge, which was organized in partnership with Data Scientists Network (DSN). DSN is a frontline Artificial Intelligence (AI) social enterprise committed to driving the solution-oriented application of AI in Nigeria, with a focus on building a world-class AI knowledge, research, and innovation ecosystem that delivers transformational research, business use applications, AI-first start-ups, employability, and social good use cases.
Tejumola Asanloko from DSN shared that the hackathon benefited participants by offering them real-world experience.
“With a vision to support the development of a million Artificial Intelligence/ digital technology talents in Africa and enabling the development of context-aware AI solutions to enhance the quality of life, we are very elated to be part of this partnership because it has provided a sizeable local text dataset that participants collected and used. The hackathon offered each participant a real-world, hands-on, and immersive experience to sharpen their skills as they learned to solve local problems.”
The 1-week West Africa ASR Challenge asked participants to train their own Automatic Speech Recognition (ASR) model using Mozilla Common Voice’s Hausa dataset. The challenge focused on ASR because speech technologies enable interaction with services and applications that would be difficult to access by means other than voice, yet these technologies are not widely available in African languages.
The second challenge, the Swahili Social Media Sentiment Analysis Challenge, was held as a virtual hackathon over a weekend in February 2022 across 5 East African countries: Tanzania, Malawi, Kenya, Rwanda, and Uganda. The task was to use machine learning to classify whether a Swahili tweet expressed positive, negative, or neutral sentiment. The tweet data was collected by 5 Zindi ambassadors.
This weekend hack saw 63 data scientists participate, many of them returning Zindians and some new faces. The event allowed East African data scientists to connect with one another and with their Zindi ambassadors in a supportive environment where they could ask questions and improve their machine learning and NLP skills.
The final challenge, the Intro to ASR Africa Challenge, gave participants 2 months to collect their own speech data for an African language of their choice and, as a bonus, to train a speech recognition model on that data. This challenge was the most important of the series: although there have been many advances in ASR modelling and data collection methodologies in recent years, the greatest problem remains that ASR requires large amounts of voice data, and such data is scarce for African languages.
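For readers new to NLP, the three-way classification task in this challenge can be illustrated with a very small sketch. The example below is not the method used by any participant and does not use the challenge data: it is a minimal bag-of-words Naive Bayes classifier written with only the Python standard library, and the class name `NaiveBayesSentiment`, the `tokenize` helper, and the six Swahili sentences are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real tweets would need more cleaning."""
    return text.lower().split()


class NaiveBayesSentiment:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of examples
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy sentences, NOT taken from the challenge dataset.
train_texts = [
    "ninapenda huduma hii",   # "I like this service"
    "chakula kizuri sana",    # "very good food"
    "huduma mbaya sana",      # "very bad service"
    "sipendi chakula hiki",   # "I do not like this food"
    "leo ni jumanne",         # "today is Tuesday"
    "mkutano ni kesho",       # "the meeting is tomorrow"
]
train_labels = [
    "positive", "positive",
    "negative", "negative",
    "neutral", "neutral",
]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("ninapenda chakula kizuri"))  # "I like good food"
```

A real entry would need thousands of labelled tweets, proper text cleaning, and a held-out set for evaluation; the sketch only shows the shape of the problem, which is mapping a piece of text to one of three sentiment labels.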
This challenge resulted in 7 winners collecting and open-sourcing 13.4 hours of recordings, covering these countries and languages: Malawi (Chichewa), Benin (Fongbe), Ivory Coast (Dendi and Baule), Senegal (Wolof), Sudan (Sudanese dialect), and Kenya (Swahili).
Check out this reflection on the Google NLP Hack Series for tips and lessons from the challenge winners, and to hear why they’re interested in NLP.
The Google NLP Hack Series was a great partnership between Google and Zindi. Special mention goes to the Google NLP team: Connie Tao, Clara Rivera, Sandy Ritchie, Alëna Aksënova, Kim Heiligenstein, Daan van Esch, Evan Crew, Yossi Matias, and Slav Petrov; the DSN team, who helped market the event: Tejumola Asanloko and Aanu Oyeniran; and the Zindi team: Amy Bray, Paul Kennedy, Cleo Henry, Delilah Gesicho, Celina Lee, Agnes Wanjiru, and Davis David.