Clarification & Understanding
What is the greater context and our motivation for doing this? Is there some org-wide strategic initiative we need to align with, or a specific goal? Let’s just focus on providing a really good experience to our users.
Is this a standalone offering or an extension of something Google currently offers? It can be either. What do we mean when we say app? Is there a specific type of app we should focus on: desktop, mobile, or web? If it’s up to us, let’s approach this as platform-agnostic, but I could see us leaning towards a mobile app since a lot of photo capturing, viewing, and sharing is done on mobile devices.
User Segments
This problem is still pretty ambiguous, so let’s try to break it down further by looking at some of the different user segments it could apply to. It is worth noting that there are varying levels of blindness:
- Completely Blind – Can’t see anything at all; effectively pitch black
- Partially Blind – Can still see shapes and colors
- Nearsighted – Functionally blind to objects far away, but may still be able to read text via a screen reader
- Not Blind – While the core users we are designing for have some level of blindness, sighted users, particularly friends and family, may still use the app to interact with those blind users.
Out of the above user segments, I’m going to suggest we focus on the completely blind. This is the most severe form of blindness, and if our solution works for someone who is completely blind it will also work for someone who is partially blind, whereas the opposite isn’t necessarily true.
User Needs / Pain Points
- Blind people have no way of knowing whether they successfully took a clear photo of what they were trying to capture
- Blind people need to know what is in the photo they are looking at
- Blind people need to know if their photo has been shared successfully
- Blind people can’t thumb or scroll through a photo album to find the photo they are looking for, so finding a specific photo is difficult
Out of the above pain points, I’m going to suggest we focus on pain point #2. This pain point is really central to the experience of a photo app: if users can’t understand what a photo shows, then we fundamentally aren’t providing a good experience.
Solutions
Now that we have a better understanding of some of the issues the user faces, let’s go ahead and brainstorm some solutions to help blind users understand what a photo is of:
- Description Prompt – When users share or send photos to a blind user of Google Photos, we could prompt them to type a short description of the image. When the visually impaired user receives the photo, or views it at a later date, the description would be read aloud.
- Audio Companion Files – When a user takes a photo, we could allow them to record an audio description of what they are photographing, which would then be attached to the photo.
- AI Descriptions – We could use machine learning to automatically analyze the contents of a photo and generate a verbal description that would then be read aloud to the user (a rough sketch of this flow follows this list).
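To make that last option concrete, here is a minimal sketch of how the AI Descriptions flow might hang together, assuming a generic image-captioning model and a text-to-speech hook. The names `caption_model`, `text_to_speech`, and `on_photo_viewed` are hypothetical stand-ins for illustration, not existing Google Photos APIs.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    path: str
    caption: str | None = None  # cached description, generated once per photo


def caption_model(path: str) -> str:
    """Hypothetical stand-in for a trained image-captioning model."""
    return "A golden retriever catching a frisbee on a beach."


def text_to_speech(text: str) -> None:
    """Hypothetical stand-in for the platform TTS / screen-reader hook."""
    print(f"[spoken] {text}")


def on_photo_viewed(photo: Photo) -> None:
    # Generate the description lazily and cache it so repeat views are instant.
    if photo.caption is None:
        photo.caption = caption_model(photo.path)
    text_to_speech(photo.caption)


on_photo_viewed(Photo("beach_day.jpg"))
```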
Prioritization
I’m assuming we won’t have the bandwidth to build out all three solutions in parallel, so let’s choose one to prioritize. Remember, our goal here is to provide a really good experience to our users. To help us do this, I’m going to use the comparison matrix below:
| Solution | Ease of Implementation | User Satisfaction |
| --- | --- | --- |
| 1. Description Prompt | A | B |
| 2. Audio Companion Files | B | B- |
| 3. AI Descriptions | C+ | A |
I’m going to suggest we prioritize building out solution #3 first. While this is the hardest solution to implement, I think it is the one users will find most useful. The first two solutions are a little too niche in the sense that they each apply to an individual scenario: either someone sending the blind user a photo, or the blind user taking a photo themselves. Both require someone to manually describe the contents of a photo, and that is going to be difficult to scale. The AI Descriptions solution would not be constrained by requiring anyone to do something manually.
The AI Descriptions could of course be leveraged by any sort of photo-specific application, but I think they can be useful to our users beyond that. If incorporated as a general utility into audio-based screen readers, visually impaired users would be able to understand the contents of photos they encounter elsewhere online, for example an attachment in an email or an image embedded in a news article they’re reading.
While all of the above sounds great, let’s not lose sight of the fact that this is going to be quite difficult to implement. A very rudimentary version could read aloud information like the capture time and location from the image’s EXIF data, as well as detect standard objects present, like “person smiling”. Future versions of the model could be both more specific and tailored to the individual user. For example, instead of detecting “person smiling” it could detect “your granddaughter Sophie smirking while eating Halloween candy”. Facebook’s auto-tagging capability is proof that this is doable, but at the same time they have a very large and rich data set of photos.
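As a rough illustration of that rudimentary first version, here is a hedged sketch assuming Pillow for EXIF access. The `detect_objects` and `read_aloud` functions are hypothetical placeholders for an object-detection model and a TTS/screen-reader hook, not real APIs.

```python
from PIL import Image


def detect_objects(image: Image.Image) -> list[str]:
    """Hypothetical stand-in for an off-the-shelf object/scene detector."""
    return ["person smiling"]  # e.g. the top labels returned by a vision model


def read_aloud(text: str) -> None:
    """Hypothetical stand-in for handing text to TTS / the device screen reader."""
    print(text)


def describe_photo(path: str) -> None:
    image = Image.open(path)
    exif = image.getexif()

    parts = []
    capture_time = exif.get(306)  # EXIF tag 306 = DateTime
    if capture_time:
        parts.append(f"Photo taken {capture_time}.")

    labels = detect_objects(image)
    if labels:
        parts.append("It appears to show: " + ", ".join(labels) + ".")

    read_aloud(" ".join(parts) or "No description is available for this photo.")


describe_photo("vacation.jpg")  # expects a local JPEG with EXIF metadata
```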
Summary
In order to provide a really good product experience to users who are completely blind, we are going to build an AI-based application that will automatically analyze a photo and read an audio description of its contents aloud.
Metrics
Right off the top of my head, I think there are two points that are important to measure:
1. Does our product accurately describe the contents of a photo?
2. Do our users find our descriptions of the photos useful?
Let’s focus on picking a metric that accurately reflects our ability to do #2. I think #2 encapsulates #1: if our product can’t accurately describe the contents of photos, then our users aren’t going to find the descriptions useful. So let’s focus on question #2.
To assess whether our users find our descriptions of photos useful, we should focus on monitoring our number of daily active users. This is the type of application that is meant to be used day-to-day, so daily active users makes more sense than monthly active users. If users aren’t finding our product useful, they won’t use it.
One potential complication: if our AI Descriptions are bundled into something like a general audio-based screen reader as a standard offering, then users could be de facto exposed to our descriptions even if they didn’t find them useful. In that situation, we could offer a small subset of users the ability to skip the reading of a description and monitor how often they do so as a proxy for how useful they find our AI Descriptions.
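A hedged sketch of that skip-rate proxy follows, assuming we log a "description_started" and a "description_skipped" event per playback. The event schema and field names are illustrative, not an existing Google Photos logging format.

```python
from collections import defaultdict


def skip_rate_by_user(events: list[dict]) -> dict[str, float]:
    """Fraction of AI description playbacks each user chose to skip."""
    started = defaultdict(int)
    skipped = defaultdict(int)
    for event in events:
        if event["event"] == "description_started":
            started[event["user_id"]] += 1
        elif event["event"] == "description_skipped":
            skipped[event["user_id"]] += 1
    # A persistently high skip rate suggests the descriptions aren't useful.
    return {u: skipped[u] / started[u] for u in started if started[u] > 0}


sample = [
    {"user_id": "u1", "event": "description_started"},
    {"user_id": "u1", "event": "description_skipped"},
    {"user_id": "u2", "event": "description_started"},
]
print(skip_rate_by_user(sample))  # {'u1': 1.0, 'u2': 0.0}
```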