Media Release

Howard University and Google Research Enhance A.I. Speech Recognition of African American English

Howard University and Google researchers release dataset of over 600 hours of African American English dialects to improve AI speech recognition


WASHINGTON, D.C. — Howard University and Google Research released data today which can be used by artificial intelligence developers to improve the experience of Black people using automatic speech recognition (ASR) technology. Through the partnership, Project Elevate Black Voices, researchers traveled across the United States to catalogue dialects and diction used frequently in Black communities but often not recognized or misconstrued by artificial intelligence-driven technologies, making it more difficult for many Black individuals to engage with the technology. 

African American English (AAE), also called African American Vernacular, Black English, Black talk, or Ebonics, is a rich language rooted in history and culture. Because of inherent bias in the development process, AI-driven technology sometimes generates incorrect results when Black users speak commands to it. Many Black users have needed to inauthentically change their voice patterns away from their natural accents to be understood by voice products.

“African American English has been at the forefront of United States culture since almost the beginning of the country,” said Gloria Washington, Ph.D., Howard University researcher and co-principal investigator of Project Elevate Black Voices. “Voice assistant technology should understand different dialects of all African American English to truly serve not just African Americans, but other persons who speak these unique dialects. It’s about time that we provide the best experience for all users of these technologies.”

Researchers collected 600 hours of data from speakers of different AAE dialects in an effort to address implicit barriers to improving ASR performance. Thirty-two states are represented in the dataset. The researchers found that natural AAE speech is scarce in existing speech data because Black users have been implicitly conditioned to change their voices when using ASR-based technology. Even when data is available, in-product AAE is difficult to leverage because of code-switching.

“Working with our outstanding partners at Howard University on Project Elevate Black Voices has been a tremendous and personal honor,” said Courtney Heldreth, Ph.D., co-principal investigator at Google Research. “It’s our mission at Google to make technology that’s useful and accessible, and I truly believe that our work here will allow more users to express themselves authentically when using smart devices.”

Howard University will retain ownership of the dataset and licensing, and serve as steward for its responsible use, ensuring the data benefits Black communities. Google can also use the dataset to improve its own products, ensuring that its tools work for more people. Google performs this type of model training work with all sorts of dialects, languages, and accents around the US and the world.

“As a community-based researcher, I wanted to carefully curate the community activations to be a safe and trusted space for members of the community to share their experiences about tech and AI and to also ask those uncomfortable questions regarding data privacy,” said Lucretia Williams, Ph.D., project lead and Howard University researcher. 

The project team adopted a community-centric approach to audio data collection by organizing curated events in several cities, centered on Black panelists who both live and work in those communities.

These panelists facilitated open and transparent discussions focused on Black culture, the intersection of technology and Black experiences, the growing presence of AI, and the importance of the Black community’s active participation in innovation. At the end of each event, the team introduced a three-week audio data collection initiative, inviting participants to sign up and contribute their voices and experiences to the project.


The Howard African American English Dataset 1.0 will initially be made available exclusively to researchers and institutions within historically Black colleges and universities to ensure that the data is employed in ways that reflect the interests and needs of marginalized communities, specifically African American communities whose linguistic practices have often been excluded or misrepresented in computational systems.  

The release of this dataset to entities outside the HBCU network will be held for consideration at a later date, with the intention of prioritizing those whose work aligns with the values of inclusivity, empowerment, and community-driven research. 

###

About Howard University 

Howard University, established in 1867, is a leading private research university based in Washington, D.C. Howard’s 14 schools and colleges offer 140 undergraduate, graduate, and professional degree programs and lead the nation in awarding doctoral degrees to African American students. Howard is the top-ranked historically Black college or university according to Forbes and is the only HBCU ranked among U.S. News & World Report’s Top 100 National Universities. Renowned for its esteemed faculty, high-achieving students, and commitment to excellence, leadership, truth and service, Howard produces distinguished alumni across all sectors, including the first Black U.S. Supreme Court justice and the first woman U.S. vice president; Schwarzman, Marshall, Rhodes and Truman Scholars; prestigious fellows; and over 165 Fulbright recipients. Learn more at www.howard.edu.