Many people are excited about creating usable speech technology. However, most of the audio data used by large companies isn’t available to the majority of people, and that data is often biased in terms of language, accent, and gender. Jenny, Josh, and Remy from Mozilla join us to discuss Common Voice, Mozilla’s open-source voice database that anyone can use to build innovative apps for devices and the web. They also discuss efforts through Mozilla’s fellowship program to develop speech tech for African languages and to understand bias in data sets.
Featuring:
• Jenny Zhang
• Remy Muhire
• Josh Meyer
• Chris Benson
• Daniel Whitenack

Show Notes:
• Mozilla Common Voice
• Announcement of Josh and Remy’s fellowship work on speech tech for African languages
• Artie Bias Corpus
• Readings on Demographic Bias in ASR:
  • Voice recognition still has significant race and gender biases
  • Gender and Dialect Bias in YouTube’s Automatic Captions
  • Racial disparities in automated speech recognition
• Common Voice LREC Paper
• Common Voice + DeepSpeech collaborators for Low-resource languages:
  • Digital Umuganda
  • AI Lab, Makerere University
  • Language Technologies Unit, Bangor University
  • Linguistics Department, Indiana University Bloomington
• “under-sampled majority” is a quote from Joy Buolamwini (see this article)