  • December 5, 2023
  • Anaranniya N
Revolutionizing Linguistic Inclusion: India Harnesses AI to Preserve and Empower 121 Languages

In a groundbreaking initiative, the Indian government is spearheading efforts to harness artificial intelligence (AI) for the benefit of millions who speak regional languages. In the state of Karnataka, villagers took part in a project to develop India's first AI-driven chatbot for tuberculosis, underscoring the importance of catering to the country's diverse linguistic landscape. Although Kannada has more than 40 million native speakers, it remains underrepresented in natural language processing (NLP), and initiatives like Bhashini and Karya are striving to bridge the gap.

Tech firm Karya is engaging speakers of various Indian languages to contribute to datasets, aiding giants like Microsoft and Google in creating AI models for sectors like education and healthcare. Simultaneously, the Indian government is leveraging Bhashini, an AI-driven translation system, to build open-source datasets for the development of AI tools. However, challenges persist, including the predominantly oral tradition of many Indian languages and the scarcity of electronic records. Despite these obstacles, datasets created through the government's push are already being used in translation tools for education, tourism, and legal proceedings.

As the demand for inclusive AI grows, the spotlight is on crowdsourcing as an effective means to collect diverse speech and language data. Notably, ethical considerations regarding gender, ethnicity, and socio-economic bias are emphasized by experts like Kalika Bali from Microsoft Research India. Crowdsourcing efforts must be conducted ethically, ensuring awareness, fair compensation for contributors, and a specific focus on collecting data from smaller languages. The overarching goal is to make AI tools accessible to every linguistic community, thereby unlocking economic opportunities and valuable information for millions across India.

Empowering Indian Languages Through Economic Incentives and Technological Innovation

As India strives to democratize access to AI tools, innovative approaches are emerging to address linguistic diversity and inclusion. Karya, collaborating with non-profit organizations, has adopted a model that pays contributors above the minimum wage and ensures they own a share of the data they generate. This economic incentive model benefits contributors and also holds potential for community-driven AI products in areas such as healthcare and farming.

With less than 11% of India's population proficient in English, speech-based AI models are gaining prominence. Initiatives like Google-funded Project Vaani and AI4Bharat's Jugalbandi chatbot are actively collecting speech data to break down language barriers and enhance accessibility. Furthermore, AI-based translation tools from organizations like EkStep Foundation are making strides in legal settings, exemplified by their use at the Supreme Courts of India and Bangladesh. These technological advancements not only mitigate language barriers but also empower marginalized communities by providing critical information at the grassroots level.

Individuals like Swarnalata Nayak from Raghurajpur village in Odisha are witnessing the positive impact of AI on their lives. Nayak's speech data work for Karya brings her much-needed additional income, highlighting the transformative potential of AI for economic empowerment in rural areas. The collaboration between technology firms, the government, and grassroots organizations reflects a concerted effort to ensure that AI advancements not only break linguistic barriers but also contribute to the socio-economic upliftment of the communities that need them most.