According to a report from Rest of World, Lesan, a small AI startup from Ethiopia, is making impressive strides in machine translation, even outperforming tech giants like Google at translating Ethiopian languages.
- Unlike major tech companies, Lesan believes in creating specialized models for each language.
- General-purpose AI tools like ChatGPT often perform poorly on less-documented languages because they lack adequate training data, which increases the need for language-specific technologies.
- There is a growing pan-African AI movement in which developers and companies work together to address the language technology gaps for African languages.
In an arena dominated by tech heavyweights, Lesan’s approach deviates from the norm. While giants like Google and Facebook promote their universal translation models, Lesan’s CEO questions their efficacy.
“Google and Facebook overhype that they have built one single giant model to solve machine translation for hundreds of languages. If you put it on its face and compare it with smaller startups, like Lesan or Ghana NLP, the quality is actually low.”
The Need for Language-Specific Technologies
The surge in conversational artificial intelligence tools, such as ChatGPT, has only further highlighted the shortcomings in dealing with low-resource languages. As the Lesan CEO points out,
“If you ask ChatGPT in Tigrinya or Amharic the simplest and most frequently asked questions, it gives you gibberish, a mix of Tigrinya and Amharic, or even made-up words.”
This poor performance underscores the urgency of creating language-specific technologies, particularly for languages that lack sufficient data online.
Fostering a Pan-African AI Movement
Instead of adopting a competitive stance, Lesan aligns itself with a larger pan-African AI movement, aiming to address these language gaps together. The CEO proudly states,
“We’re actually meeting every other week with other language technology startups, discussing how we can come together and solve this problem our own way. We don’t have to repeat what Silicon Valley does for our languages. And you don’t have to have a one-player-takes-it-all mentality.”
The work of Lesan and similar startups could potentially shift the focus in AI development towards a more cooperative, culturally sensitive approach, putting lesser-known languages on the map and bridging the digital language divide.
Enhancing Translation Accuracy in Underrepresented Languages
It’s no secret that in machine translation, less-documented or low-resource languages often get the short end of the stick. However, there are steps we can take to improve the quality of translation in these languages, fostering inclusivity and providing a richer linguistic experience for all:
- Rather than trying to create a one-size-fits-all translation model, develop language-specific models. These specialized models can better account for the unique syntactical, morphological, and semantic characteristics of each language.
- Work with native speakers for data collection and model training. Their linguistic intuition and cultural understanding can greatly enhance translation accuracy.
- Gather more linguistic data for low-resource languages. This could involve online data mining, conducting linguistic surveys, or even crowdsourcing language data from native speakers.
- Join forces with other developers or organizations working on low-resource languages. Collective efforts can accelerate progress and create more comprehensive and accurate language models.
- Continuously test your models and gather feedback from users. Use this feedback to refine your models, making them more accurate over time.
These steps can help us bridge the gap in translation quality for less widely supported languages, ensuring every language has a place in our digital world.
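As a rough illustration of the testing step in the list above, the sketch below scores two sets of translations against reference translations using a simplified character n-gram F-score. It is loosely modeled on chrF-style metrics but is not the official implementation of any metric, and it is not Lesan's method. The function names, model labels, and placeholder strings are all assumptions made for this example. Character-level overlap is used here because it tends to be more forgiving than word-level matching for morphologically rich languages, though human review by native speakers would still be needed in practice.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def char_f_score(hypothesis: str, reference: str,
                 max_n: int = 3, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score in [0, 1].

    Precision and recall are averaged over n-gram orders 1..max_n;
    beta > 1 weights recall more heavily than precision.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


def compare_models(outputs_by_model: dict, references: list) -> dict:
    """Average the score of each model's outputs over a shared test set."""
    if not references:
        raise ValueError("references must not be empty")
    return {
        name: sum(char_f_score(h, r) for h, r in zip(outputs, references))
        / len(references)
        for name, outputs in outputs_by_model.items()
    }


if __name__ == "__main__":
    # Placeholder strings stand in for real test sentences (hypothetical data).
    references = ["good morning to you", "where is the market"]
    outputs = {
        "specialized_model": ["good morning to you", "where is the market"],
        "general_model": ["good mourning you", "what is market"],
    }
    for name, score in compare_models(outputs, references).items():
        print(f"{name}: {score:.3f}")
```

Re-running a comparison like this on a fixed test set after each model update, alongside user feedback, gives a simple way to see whether changes are helping or hurting.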