
Expanding Linguistic Diversity in AI Through Meta’s Language Technology Partner Program and Open Source Initiatives

At Meta, we are committed to advancing linguistic diversity and inclusivity in the digital world by supporting underserved languages in AI models. This effort is crucial not only for promoting global inclusivity but also for building intelligent systems capable of adapting to new situations and learning from experience. Today, we’re excited to share our latest initiatives, including the Language Technology Partner Program and an open source machine translation benchmark, both designed to empower speakers of underrepresented languages and enhance AI-driven language technologies.

Our collaboration with UNESCO underscores this mission as we work together to expand support for underserved languages, particularly indigenous ones. Achieving advanced machine intelligence (AMI) is a core focus for Meta’s Fundamental AI Research (FAIR) team. AMI refers to AI systems that can use human-like reasoning to perform cognitively demanding tasks such as translation. By developing models that work across multilingual contexts and prioritize underserved languages, we aim to create tools that benefit everyone while fostering linguistic diversity and inclusivity.

Today, we’re unveiling new programs, research, and models that align with this vision, and offering opportunities for collaborators to contribute to AI translation technologies that encompass a wide array of global languages and dialects. These efforts are part of our long-term commitment to ensuring no language is left behind in the digital age.

Through the Language Technology Partner Program, we are seeking partners to collaborate on advancing and broadening Meta’s open source language technologies, with a particular focus on underserved languages. This initiative supports UNESCO’s International Decade of Indigenous Languages and aims to integrate more languages into AI-driven speech recognition and machine translation models. Partners are
invited to contribute resources such as 10+ hours of speech recordings with transcriptions, large amounts of written text (200+ sentences), and sets of translated sentences in diverse languages. These contributions will help us train models that, when released, will be open sourced and freely available to the community.

In addition to contributing data, partners will gain access to technical workshops led by our research teams. These workshops will provide hands-on guidance on leveraging our open source models to build language technologies. We are thrilled to announce that the Government of Nunavut, Canada, has joined us in this initiative, collaborating to share data in the Inuit languages Inuktitut and Inuinnaqtun. To become a partner and contribute to this transformative program, please fill out our interest form.

To further support the development of robust machine translation systems, we are launching an open source machine translation benchmark. This benchmark, composed of sentences carefully crafted by linguistic experts, is designed to evaluate the performance of AI models in conducting translations. It showcases the diversity of human language and is available in seven languages. We invite researchers, developers, and linguists to access the benchmark, contribute translations, and help us build an unprecedented multilingual machine translation resource. By making this benchmark open source, we aim to foster innovation and ensure that advancements in translation technology are accessible to all. Access the benchmark here.

Our dedication to linguistic diversity is reflected in our ongoing projects and milestones. One of the most significant achievements in this area was the 2022 release of the No Language Left Behind (NLLB) project, a groundbreaking open source machine translation engine. This model marked the first neural machine translation system for many languages and laid the foundation for future research and development. Building on this success, we collaborated with UNESCO and Hugging Face
to create a language translator based on NLLB, which we announced during United Nations General Assembly week last September.

More recently, we introduced the Meta Massively Multilingual Speech (MMS) project to support digital empowerment, a key focus of the Global Action Plan of the International Decade of Indigenous Languages. The MMS project scales audio transcription capabilities to over 1,100 languages, significantly expanding the reach of speech recognition technologies. In 2024, we added zero-shot speech recognition, enabling the system to transcribe audio in languages it has never encountered before, without prior training. This breakthrough demonstrates our commitment to creating intelligent systems that can understand and respond to complex human needs, regardless of language or cultural background.

Ultimately, our goal is to build AI systems that are inclusive, adaptable, and capable of serving people worldwide, regardless of their linguistic background. Through initiatives like the Language Technology Partner Program and the open source machine translation benchmark, we are taking meaningful steps toward this vision. We invite researchers, organizations, and communities to join us in collaboratively enhancing and expanding machine translation and other language technologies. Together, we can ensure that every voice is heard and represented in the digital conversation. Explore our programs, contribute your expertise, and help us shape a more inclusive future powered by AI.



