Khoury News
With this AI-driven tool, blind users can experience YouTube and TikTok videos, too
“Video is such a big part of how people experience the world now,” says Lana Do, a member of the YouDescribe research team. “Blind and low-vision people deserve access to that world, too.”
Every year, billions of videos are uploaded to platforms like YouTube and TikTok. For many viewers, they are a source of entertainment, news, and cultural conversation. But for people who are blind or have low vision, much of that content is inaccessible.
Professional audio descriptions, the narrated explanations of visual action that make films and television understandable without sight, are common on major streaming platforms. But user-generated videos almost never include them.
At Northeastern University’s Silicon Valley campus, a Khoury College research team led by Teaching Professor and Director of Computing Programs Ilmi Yoon is working to change that. Using artificial intelligence and a global community of volunteers, the group is building tools that automatically generate audio descriptions for online videos and allow users to refine them collaboratively.
The work is part of a long-running project called YouDescribe. This year, the researchers — Yoon, Align master’s alumna Lana Do, and master’s students Zhenzhen Qin, Hyunjoo Shim, Yue Liang, and Trung Kien Nguyen — received a $200,000 grant from the nonprofit Ability Central to expand the system and bring more students into the research effort.
YouDescribe began in 2013 at the Smith-Kettlewell Eye Research Institute in San Francisco as a crowdsourcing project for training sighted volunteers to create audio descriptions for YouTube videos. Volunteers could watch a video, record narration explaining the visual action, and upload the description so blind users could follow along.
Yoon joined the project nearly a decade ago, initially helping to develop the platform’s web and mobile infrastructure. At the time, audio descriptions were written entirely by humans, a slow and labor-intensive process, and blind community members wanted far more of the ever-growing stream of videos made accessible than volunteers could keep up with.
Over the years, students helped expand the platform. Today the site has more than 3,000 active users who volunteer to describe videos or request new ones. Even so, the backlog is enormous. Only a small fraction of requested videos currently have audio descriptions.
Recent advances in vision-language models — AI systems that can analyze images and video while generating natural language descriptions — have made it possible to automate part of the process.
“We use AI to generate the first draft [of the description],” Qin says. “Then volunteers can refine it before it’s published for blind and low-vision users.”

This is not simply a technical task. The goal is to strike a balance between automation and human judgment, which requires carefully choosing what to describe and when, without overwhelming listeners with unnecessary information.
“AI can be impressively good,” Liang says. “But humans are still better at deciding which details are important.”
One of the team’s newest features makes the experience more interactive. Rather than attempting to describe every visual element in a video, the system provides a concise baseline description. If a viewer wants more information, they can pause the video and ask a question.
Do, who completed her master’s in computer science last year, helped develop the feature.
“AI descriptions can sometimes be very long,” Do says. “We’re working on making them concise and contextual but also giving users the ability to ask for more details when they need them.”
For example, a viewer watching a cooking video could ask what ingredients are on the counter or what color a dish looks like. The system then generates additional descriptions in response, mimicking the experience of watching a video with a friend who can answer questions about what’s happening on-screen.
Much of the development work, including backend infrastructure and AI research, is done by Khoury graduate students in Silicon Valley, who bring a variety of backgrounds to the project. Liang previously worked as an auditor before becoming interested in generative AI through hackathons. Qin studied linguistics and worked in speech technology before transitioning to software engineering.
Kien, a graduate student in artificial intelligence, is developing tools to measure how much a human editor modifies an AI-generated description. The system calculates the edit distance between drafts to better understand how humans and AI collaborate.
“That helps us see how much of the final description comes from the AI and how much from volunteers,” Yoon says.
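Edit distance counts the minimum number of insertions, deletions, and substitutions needed to turn one text into another, which makes it a natural way to quantify how heavily a volunteer revised an AI draft. A minimal sketch of the idea, comparing drafts word by word (the function name, word-level granularity, and sample sentences are illustrative assumptions, not the project's actual code):

```python
def edit_distance(draft: str, revised: str) -> int:
    """Minimum number of word insertions, deletions, or
    substitutions needed to turn `draft` into `revised`."""
    a, b = draft.split(), revised.split()
    # prev[j] holds the distance from the current prefix of a
    # to the first j words of b (classic Levenshtein DP, one row at a time).
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if words match)
            ))
        prev = curr
    return prev[-1]

ai_draft = "a man walks into the kitchen"
human_edit = "a man hurries into the small kitchen"
print(edit_distance(ai_draft, human_edit))  # 2: one substitution, one insertion
```

A low score suggests the AI draft needed only light polish; a high score means the volunteer substantially rewrote it, a signal the team can use to study where the models fall short.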
Students also maintain the platform’s infrastructure, design user interfaces, and fix technical issues that arise when integrating with external video platforms.
“Using computing skills to impact someone else’s life is very rewarding,” Yoon says. “Many students tell me that working on this project fills something that was missing for them.”
The new grant from Ability Central will help stabilize the platform, expand its capabilities, support the student researchers, and increase testing with blind and low-vision users. The organization, which supports technology and services for people with disabilities, partnered with Yoon’s team to help bring the system to a wider audience.
To reach more people, the researchers are pursuing additional funding. Yoon is currently working with Ability Central on a proposal to the National Science Foundation’s Future of Core Technologies program. If awarded, the grant would bring roughly $1 million to the project and support deeper research into accessible AI systems.
It would also create more opportunities for students to participate in the work.
“We want the Silicon Valley campus to be a place where students can do meaningful research,” Yoon says.
Ultimately, the team hopes to build more than a tool. They envision an ecosystem where AI, volunteers, and blind users collaborate to make online video accessible.
“Video is such a big part of how people experience the world now,” Do says. “Blind and low-vision people deserve access to that world, too.”