NLP for Audio Search

Services: Natural Language Processing

On air since 1954, KQED, the most-listened-to public radio station in the United States, shares valuable information with local communities daily. After each broadcast, KQED indexes and archives immense amounts of audio data. With an archive that spans more than 60 years and is still growing, KQED wanted a way to improve search and retrieval. Improving search and retrieval requires transcribing audio files into text and assigning metadata; doing so by hand, however, would be time-consuming and taxing. To reduce the manual work required, KQED partnered with Google to research the “Art of the Possible” in Machine Learning. As one of Google’s trusted AI partners, KUNGFU.AI was invited to guide research efforts.


Transcribing audio data into machine-workable text requires understanding context, content, and syntax while accounting for colloquialisms. Audio quality and grammar vary, and accents can be difficult to decipher. Speakers also often use shorthand or acronyms in lieu of complete phrases. For example, machines have a hard time recognizing that different phrases, such as “the Agency” and “the C.I.A.,” refer to the same entity. Machines are also less adept than humans at identifying and inferring the key elements of a sentence: who, what, when, where, and why.
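To make the “same entity, different phrase” problem concrete, here is a minimal Python sketch that maps surface forms found in a transcript to a single canonical name. It is an illustration only, not the approach used in this engagement: the alias table, the canonical name, and the `find_entities` helper are all hypothetical. A hand-built lookup like this also ignores context (“the Agency” could just as easily mean a different agency), which is why the problem is hard for machines in practice.

```python
import re

# Hypothetical alias table (an assumption for illustration):
# each lowercase surface form maps to one canonical entity name.
ALIASES = {
    "the agency": "Central Intelligence Agency",
    "the c.i.a.": "Central Intelligence Agency",
    "the cia": "Central Intelligence Agency",
}


def _alias_pattern(aliases):
    """Build one case-insensitive regex that matches any known alias."""
    # Try longer aliases first so overlapping forms prefer the longer match.
    forms = sorted(aliases, key=len, reverse=True)
    alternation = "|".join(re.escape(form) for form in forms)
    # The lookarounds stop an alias from matching inside a longer word.
    return re.compile(r"(?<!\w)(" + alternation + r")(?!\w)", re.IGNORECASE)


def find_entities(text, aliases=ALIASES):
    """Return (surface form, canonical name) pairs in order of appearance."""
    pattern = _alias_pattern(aliases)
    return [
        (match.group(0), aliases[match.group(0).lower()])
        for match in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "The Agency declined to comment, but the C.I.A. later confirmed the report."
    for surface, canonical in find_entities(sentence):
        print(f"{surface!r} -> {canonical}")
```

Both mentions in the sample sentence resolve to the same canonical name, so a search for either phrase could surface the same audio segment. A static table cannot tell when a phrase refers to something else, which is where context-aware models become relevant.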


Recognizing the scale of the challenges facing KQED, Google and KUNGFU.AI conducted a series of Subject Matter Expert (SME) interviews. The interviews gave insight into the challenges, data sources, and use cases predominant in the broadcasting space. One challenge was transcribing audio content into machine-workable text. To tackle transcription, KUNGFU.AI researched and prototyped models, leveraging Google’s Speech-to-Text feature to convert audio data into machine-readable text. Additionally, KUNGFU.AI explored the efficacy of recent advancements in transformers, a deep learning architecture that has proved promising in Natural Language Processing (NLP) tasks.


At the end of the Research Phase, KUNGFU.AI provided Google and KQED with a feasibility study and technical strategy. Our work outlined difficulties, pinpointed opportunities to leverage state-of-the-art NLP architectures, and provided guidance on short-term and long-term development sprints. We promptly demonstrated the efficacy of Machine Learning on audio data and outlined solution approaches for a variety of use cases that had previously been difficult to surface. KUNGFU.AI’s 4-week development effort ultimately saved 9-12 months of internal exploration and averted losses during development.