TalkBack uses Gemini Nano to increase image accessibility for users with low vision


posted on Sep. 04, 2024 at 6:24 pm


Posted by Terence Zhang – Developer Relations Engineer and Lisie Lillianfeld – Product Manager

TalkBack is Android’s screen reader in the Android Accessibility Suite that describes text and images for Android users who have blindness or low vision. The TalkBack team is always working to make Android more accessible. Today, thanks to Gemini Nano with multimodality, TalkBack automatically provides users with blindness or low vision more vivid and detailed image descriptions to better understand the images on their screen.

Increasing accessibility using Gemini Nano with multimodality

Advancing accessibility is a core part of Google’s mission to build for everyone. That’s why TalkBack has a feature to describe images when developers didn’t include descriptive alt text. This feature was powered by a small ML model called Garcon. However, Garcon produced short, generic responses and couldn’t specify relevant details like landmarks or products.

The development of Gemini Nano with multimodality was the perfect opportunity to use the latest AI technology to increase accessibility with TalkBack. Now, when TalkBack users opt in on eligible devices, the screen reader uses Gemini Nano’s new multimodal capabilities to automatically provide users with clear, detailed image descriptions in apps including Google Photos and Chrome, even if the device is offline or has an unstable network connection.

“Gemini Nano helps fill in missing information,” said Lisie Lillianfeld, product manager at Google. “Whether it’s more details about what’s in a photo a friend sent or the style and cut of clothing when shopping online.”

Going beyond basic image descriptions

Here’s an example that illustrates how Gemini Nano improves image descriptions: When Garcon is presented with a panorama of the Sydney, Australia shoreline at night, it might read: “Full moon over the ocean.” Gemini Nano with multimodality can paint a richer picture, with a description like: “A panoramic view of Sydney Opera House and the Sydney Harbour Bridge from the north shore of Sydney, New South Wales, Australia.”

“It’s amazing how Nano can recognize something specific. For instance, the model will recognize not just a tower, but the Eiffel Tower,” said Lisie. “This kind of context takes advantage of the unique strengths of LLMs to deliver a helpful experience for our users.”

Using an on-device model like Gemini Nano was the only feasible way for TalkBack to provide detailed, automatically generated image descriptions even while the device is offline.

“The average TalkBack user comes across 90 unlabeled images per day, and those images weren’t as accessible before this new feature,” said Lisie. The feature has gained positive user feedback, with early testers writing that the new image descriptions are a “game changer” and that it’s “wonderful” to have detailed image descriptions built into TalkBack.

Gemini Nano with multimodality was critical to improving the experience for users with low vision. Providing detailed on-device image descriptions wouldn’t have been possible without it. — Lisie Lillianfeld, Product Manager at Google

Balancing inference verbosity and speed

One important decision the Android accessibility team made when implementing Gemini Nano with multimodality was the trade-off between inference verbosity and speed, which is partially determined by image resolution. Gemini Nano with multimodality currently accepts images at either 512-pixel or 768-pixel resolution.

“The 512-pixel resolution emitted its first token almost two seconds faster than 768 pixels, but the output wasn’t as detailed,” said Tyler Freeman, a senior software engineer at Google. “For our users, we decided a longer, richer description was worth the increased latency. We were able to hide the perceived latency a bit by streaming the tokens directly to the text-to-speech system, so users don’t have to wait for the full text to be generated before hearing a response.”
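The streaming idea in the quote above can be illustrated with a minimal sketch. It is not TalkBack's implementation: the `stream_to_speech` function, the `speak` callback, and the choice to flush at punctuation are all assumptions made for illustration. The point is only that speech can start as soon as the first phrase is complete, rather than after the whole description has been generated.

```python
from typing import Callable, Iterable, List

# Assumption: flushing at punctuation gives the speech engine natural phrases.
PHRASE_END = (".", "!", "?", ",", ";", ":")


def stream_to_speech(tokens: Iterable[str], speak: Callable[[str], None]) -> List[str]:
    """Forward model tokens to a speech callback one phrase at a time.

    `tokens` stands in for a model's streamed output and `speak` for a
    text-to-speech call; both are hypothetical placeholders.
    Returns the phrases in the order they were spoken.
    """
    spoken: List[str] = []
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush as soon as the buffered text ends a phrase.
        if buffer.rstrip().endswith(PHRASE_END):
            phrase = buffer.strip()
            speak(phrase)
            spoken.append(phrase)
            buffer = ""
    # Speak whatever remains when the stream ends without punctuation.
    tail = buffer.strip()
    if tail:
        speak(tail)
        spoken.append(tail)
    return spoken


if __name__ == "__main__":
    demo_tokens = ["A panoramic", " view of", " Sydney Opera House,",
                   " seen from", " the north shore."]
    stream_to_speech(demo_tokens, print)
```

With this shape, the listener hears the first phrase while later tokens are still being generated, which is one way the perceived latency of the higher-resolution path could be reduced.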

A hybrid solution using Gemini Nano and Gemini 1.5 Flash

TalkBack developers also implemented a hybrid AI solution using Gemini 1.5 Flash. With this server-based AI model, TalkBack can provide the best of on-device and server-based generative AI features to make the screen reader even more powerful.

When users want more details after hearing an automatically generated image description from Gemini Nano, TalkBack gives them the option to hear more by running the image through Gemini 1.5 Flash. With an image in focus, users can use a three-finger tap to open the TalkBack menu and select the “Describe Image” option, which sends the image to Gemini 1.5 Flash on the server and returns an even more detailed description.

By combining the advantages of Gemini Nano’s on-device processing with the full power of cloud-based Gemini 1.5 Flash, TalkBack provides blind and low-vision Android users a helpful and informative experience with images. The “Describe Image” feature powered by Gemini 1.5 Flash has launched to TalkBack users on more Android devices, so even more users can get detailed image descriptions.
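The hybrid flow described in this section can be sketched roughly as follows. This is an illustrative outline only: `HybridDescriber`, its method names, and the two callables are hypothetical stand-ins, not Gemini or TalkBack APIs, and returning `None` when offline is an assumption about how a client might handle a missing connection.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A describer takes image bytes and returns a text description.
Describer = Callable[[bytes], str]


@dataclass
class HybridDescriber:
    on_device: Describer  # hypothetical stand-in for the on-device model
    server: Describer     # hypothetical stand-in for the server-based model

    def on_focus(self, image: bytes) -> str:
        """Automatic description when an image gains focus; no network needed."""
        return self.on_device(image)

    def on_describe_image(self, image: bytes, online: bool) -> Optional[str]:
        """User-requested extra detail; only this path contacts the server."""
        if not online:
            # Assumption: without a connection there is nothing more to offer.
            return None
        return self.server(image)


if __name__ == "__main__":
    describer = HybridDescriber(
        on_device=lambda image: "Short on-device description.",
        server=lambda image: "Longer server-side description with more detail.",
    )
    print(describer.on_focus(b"image-bytes"))
    print(describer.on_describe_image(b"image-bytes", online=True))
```

Keeping the two paths separate means the automatic description keeps working offline, while the server is contacted only when the user explicitly asks for more.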

Animated UI example of TalkBack in action, describing a photo of a sunny view of Sydney Harbor, Australia, with the Sydney Opera House and Sydney Harbour Bridge in the frame.

Compact model, big impact

The Android accessibility team recommends that developers looking to use Gemini Nano with multimodality first prototype and test on a powerful, server-side model. There, developers can understand the UX faster, iterate on prompt engineering, and get a better idea of the highest quality possible using the most capable model available.

While Gemini Nano with multimodality can include missing context to improve image descriptions, it’s still best practice for developers to provide detailed alt text for all images on their apps or websites. If the alt text is not provided, TalkBack can help fill in the gaps.

The Android accessibility team’s goal is to create inclusive and accessible features, and using Gemini Nano with multimodality to provide vivid and detailed image descriptions automatically is a big step toward that. Their hybrid approach to AI, which combines the strengths of Gemini Nano on the device and Gemini 1.5 Flash on the server, showcases the potential of AI to promote inclusivity and accessibility and highlights Google’s ongoing commitment to building for everyone.

Get started

Learn more about Gemini Nano for app development.

This blog post is part of our series: Spotlight Week on Android 15, where we provide resources — blog posts, videos, sample code, and more — all designed to help you prepare your apps and take advantage of the latest features in Android 15. You can read more in the overview of Spotlight Week: Android 15, which will be updated throughout the week.
