Design a photo app for the blind.

  Google
Answers (3)

Clarifying questions:

1. What is the photo app used for? Is it for a specific use case or for general use?

2. Is the photo app specifically for blind people, or can it be used by anyone?

3. Are there any technical constraints to take into consideration?

4. By "blind," do we mean both full blindness and low vision?

5. Is there a specific language the app should support, or should it adapt to multiple languages?

6. Is there a specific timeline we are looking at?

7. Should the photo app handle only static images, or dynamic images/videos as well?

So the question becomes: design a photo app that helps blind people go about their day-to-day activities. There are no technical constraints, the first MVP is in English, it is to be released in 6 months, and it can be used by everyone (fully blind, low vision, and sighted) with both static and dynamic images.

Personas:

The personas are already defined: fully blind people and people with low vision.

User Journey and the associated pain points:

| User Journey | Pain Point |
| --- | --- |
| Visually impaired person gets up | Understanding the time |
| Completes morning activities and would like to read | Has to find the braille section to read |
| Gets ready and wants to start walking | Needs regular identification of location, and GPS |
| Reaches the office, completes work | There could have been an obstacle on the way |
| While returning from the office, feels the cool breeze | No way to share the same experience with friends or ask them about it |
| Reaches home, in need of some entertainment | Finding books or videos, but not in the mood to read |

Solutions: The solutions can be categorized under different buckets:

  • Community: Click a “static” photo and send it to friends (both visually impaired and sighted) to share the experience, have doubts clarified, and/or ask volunteers for help.
  • Travel & GPS: A “static” or dynamic photo app that identifies the current location and tells the user the distance and time to reach the destination.
  • Experience & entertainment: A photo app or camera app that
    • Reads normal text and braille aloud
    • Reads movie dialogue aloud
    • Works with headphones for an immersive experience
    • Offers recommendations
  • Obstacle finder: The photo app senses obstacles and sends vibration or haptic feedback; this also aids the immersive experience.
  • General identifier: A photo app that can identify anything it is pointed at, read the name aloud, and describe the object as needed. It could identify currency, a blanket in front of the user, or a vase on the table (a rough sketch follows this list).
  • Low vision:
    • People with low vision could be assisted by a magnifier lens (to enlarge text)
    • People with low color vision could be assisted by a color sharpener
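As a rough illustration of the general identifier bucket, here is a hypothetical sketch that points an off-the-shelf object-detection model at a saved camera frame and announces the top result. The model choice and file names are assumptions for illustration, not a committed design.

```python
# Hypothetical sketch of the "general identifier": detect whatever the
# camera is pointed at with an off-the-shelf object-detection model and
# announce the most confident result.
from transformers import pipeline

detector = pipeline("object-detection", model="facebook/detr-resnet-50")


def identify(image_path: str) -> str:
    """Name the most prominent object in the frame, or say nothing was found."""
    detections = detector(image_path)
    if not detections:
        return "I could not identify anything in front of you."
    best = max(detections, key=lambda d: d["score"])
    return f"This looks like a {best['label']}."


# A single camera frame saved to disk; in the app this string would feed a
# text-to-speech engine rather than print().
print(identify("frame.jpg"))
```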
Since the time to build is 6 months, I would prefer the solution that is easy to build and has the highest impact on the community. For the short-term MVP, I would pick GPS and Community. For long-term feature development, I would look at the General identifier, which could cover almost 80% of the use cases.
Success metrics:
# of app downloads
# of sign-ups
Average # of queries
Volunteer ratings
NPS
# of recommendations
Caveats: A photo app could be expanded to cover almost anything by pointing the camera at it; however, this also blurs the definition of taking a photo versus understanding it further.
In summary, my photo app for the blind, to be released in 6 months, would begin with features for community, social networking, and mobility, thereby reducing the gap between blind people and sighted people and letting them enjoy the world.

Clarify

  • Google or another company? [Google]
  • Can you share insights driving the need for the product? [a large and growing potential population of users that we are not serving today]
  • Platform? [up to me – choose mobile]
  • Geography – OK to focus on the US due to market size? [yes]
  • Any resource constraints? [no]
Goal
  • Potential: A/A/R/M (acquisition, activation, retention, monetization) – we know Google doesn’t try to monetize these apps initially. Suggest focusing on driving engagement and retention first. If the product is engaging, we can recruit additional users to the platform by partnering with marketing teams.
  • Focus – Engagement/retention
Users
  • Users are visually impaired – that implies other senses may be more finely tuned; assume that verbal navigation is something they are comfortable with.
  • Need for photos – a stronger desire, and an opportunity to enable deeper customer engagement, since they may be restricted from this today
Typical user journey for someone taking photos – I will highlight pain points (PP) for the visually impaired:
  • take a picture on your phone
  • store the photo
    • PP: so many photos that users rarely end up going back to retrieve them
      • Med
  • view the photo
    • PP: difficult to navigate a list of photos if you cannot see them
      • High
  • share the photo
    • PP: difficult to share if you cannot see them
      • High
  • create an album
    • PP: need to rely on others to create an album
      • High
Focus on viewing, sharing, and album creation as the key PPs to address.
Sound OK?
Solutions
  • Assistant search – “find photos of grandpa” – “I have found 100 pictures of grandpa” (a minimal sketch follows this list)
    • Helps with the pain point of searching for a photo
  • Assistant describe – “tell me about this photo” – date, who is in the photo, and the photo’s location; sentiment analysis – happy/sad/etc.; Google Lens to identify details – “in the background there is a palm tree,” etc.
    • Helps with the PP of viewing photos
  • Living photo book – create a “memory” of grouped photos, such as of a person, and the assistant can collect photos and add them to a live album – Nia growing up: here is her first step, first word, first day of school, etc. The book can be played visually and grow over time – similar to a playlist on Spotify, it is a curated and narrated group
  • Smart photo book – enables a cut of a group of photos with audio details
  • (Moonshot) – a way to incorporate smell into photos – a photo of a rose – to bring it to life in a new dimension?
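Here is a minimal sketch of how assistant search could work, assuming photos have already been tagged with labels and people by an earlier recognition step. All names and data structures below are hypothetical, purely for illustration.

```python
# Hypothetical sketch of "assistant search": photos are assumed to already
# carry labels and people tags from an earlier recognition step.
from dataclasses import dataclass, field


@dataclass
class Photo:
    path: str
    labels: set[str] = field(default_factory=set)   # e.g. {"beach", "sunset"}
    people: set[str] = field(default_factory=set)   # e.g. {"grandpa"}


def assistant_search(photos: list[Photo], query: str) -> list[Photo]:
    """Return photos whose labels or tagged people match any query word."""
    words = {w.lower() for w in query.split()}
    return [p for p in photos if words & (p.labels | p.people)]


library = [
    Photo("IMG_001.jpg", labels={"beach"}, people={"grandpa"}),
    Photo("IMG_002.jpg", labels={"birthday", "cake"}),
]
matches = assistant_search(library, "find photos of grandpa")
# The voice assistant would then answer along the lines of:
print(f"I have found {len(matches)} pictures of grandpa.")
```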
Rating
  • Assistant search
    • Impact – H
    • Effort – L
    • Overall – Must have
  • Assistant describe
    • Impact – H
    • Effort – H
    • Overall – Must have
  • Living photo book
    • Impact – M
    • Effort – H
    • Overall – Nice to have
  • Smart photo book
    • Impact – L
    • Effort – L
    • Overall – Nice to have
  • Addition of Smell dimension
    • Impact – H
    • Effort – H
    • Overall – Should have (nice to have, but would highly differentiate the product and delight users)
Prioritized features – assistant search, assistant describe, and the addition of smell (likely starting as a research area), due to their combination of impact and effort
Metrics
  • Primary – Usage – DAU/WAU/MAU (a quick sketch of this metric follows below)
  • Secondary – # of pictures stored, engagement with the new assistant features, # of downloads, interval between logins, churn
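As a concrete illustration of the primary metric, here is a minimal sketch that computes DAU/MAU "stickiness" from a raw log of login events. The event format and function name are assumptions for illustration, not an existing pipeline.

```python
# Sketch of the primary usage metric: DAU/MAU "stickiness" computed from a
# raw log of (user_id, login_date) events.
from datetime import date, timedelta

events = [
    ("u1", date(2024, 3, 1)), ("u1", date(2024, 3, 2)),
    ("u2", date(2024, 3, 2)), ("u3", date(2024, 2, 15)),
]


def stickiness(events: list[tuple[str, date]], as_of: date) -> float:
    """DAU/MAU on a given day: daily actives divided by 30-day actives."""
    dau = {u for u, d in events if d == as_of}
    mau = {u for u, d in events if as_of - timedelta(days=29) <= d <= as_of}
    return len(dau) / len(mau) if mau else 0.0


print(stickiness(events, date(2024, 3, 2)))  # 2 daily / 3 monthly -> ~0.67
```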
Summary
– Designed a photo app for the visually impaired, described the user journey, rated pain points, and identified 5 potential solutions including a moonshot. Prioritized the top features and identified key success metrics.

Clarification & Understanding

What is the greater context and our motivation for doing this? Is there some org-wide strategic initiative we need to align with, or a specific goal? Let’s just focus on providing a really good experience to our users.

Is this a standalone offering or an extension of something Google currently offers? It can be either. What do we mean when we say “app”? Is there a specific type of app we should focus on: desktop, mobile, or web? If it’s up to us, let’s approach this as platform agnostic, though I could see us leaning toward a mobile app, as a lot of photo capturing, viewing, and sharing is done on mobile devices.

 

User Segments

This problem is still pretty ambiguous so let’s try to break it down further by looking at some of the different user segments this could apply to. It is worth noting that there are varying levels of blindness:

  1. Completely Blind – can’t see anything at all; everything is pitch black
  2. Partially Blind – can still see shapes and colors
  3. Near-Sighted – functionally blind to objects far away, but might be able to read text via a screen reader
  4. Not Blind – while the core users we are designing for have some level of blindness, non-blind users, particularly friends or family, may still use the app to interact with those blind users.

Out of the above user segments, I’m going to suggest we focus on the completely blind. This is the most severe form of blindness, and if our solution works for someone who is completely blind it will also work for someone who is partially blind, whereas the opposite isn’t necessarily true.

 

User Needs / Pain Points

  1. Blind people have no way of knowing whether they successfully took a clear photo of their intended subject
  2. Blind people need to know what is in the photo they are viewing
  3. Blind people need to know whether their photo has been shared successfully
  4. Blind people can’t thumb or scroll through a photo album, so finding a specific photo is difficult

Out of the above pain points, I’m going to suggest we focus on pain point #2. This pain point is central to the experience of a photo app: if users can’t understand what a photo is of, then we fundamentally aren’t providing a good experience.

 

Solutions

Now that we have a better understanding of some of the issues the user faces, let’s go ahead and brainstorm some solutions to help blind users understand what a photo is of:

  1. Description Prompt – When users share or send photos to a blind user of Google Photos, we could prompt them to type a short description of the image. When the visually impaired user receives the photo, or views it at a later date, the description would be read aloud.
  2. Audio Companion Files – When a user takes a photo, we could let them record an audio description of what they are photographing, which would then be attached to the photo.
  3. AI Descriptions – We could use machine learning to automatically analyze the contents of a photo and generate a verbal description that is read aloud to the user (a rough prototype is sketched after this list).
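To make solution #3 concrete, here is a rough prototype sketch assuming an off-the-shelf image-captioning model. The specific model and the pyttsx3 speech engine are illustrative choices, not a committed architecture.

```python
# Rough prototype of the "AI Descriptions" idea: generate a caption for a
# photo with an off-the-shelf image-captioning model, then read it aloud.
from transformers import pipeline
import pyttsx3

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")


def describe_photo(image_path: str) -> str:
    """Return a one-sentence natural-language description of the photo."""
    result = captioner(image_path)
    return result[0]["generated_text"]


def read_aloud(text: str) -> None:
    """Speak the description so a blind user can hear it."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


description = describe_photo("photo.jpg")  # e.g. "a person smiling on a beach"
read_aloud(description)
```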

 

Prioritization

I’m assuming we won’t have the bandwidth necessary to build all three solutions in parallel, so let’s choose one to prioritize. Remember, our goal here is to provide a really good experience to our users. To help us do this, I’m going to use the comparison matrix below:

| Solution | Ease of Implementation | User Satisfaction |
| --- | --- | --- |
| 1. Description Prompt | A | B |
| 2. Audio Companion Files | B | B- |
| 3. AI Descriptions | C+ | A |

I’m going to suggest we prioritize building out solution #3 first. While this is the hardest solution to implement, I think it is the one users will find most useful. The first two solutions are a little too niche in the sense that they apply to individual scenarios: either someone sending the blind user a photo, or the blind user taking a photo themselves. Both require someone to manually describe the contents of the photo, and that is going to be difficult to scale. The AI Descriptions would not be constrained by requiring users to do something manually.

The AI Description of photos could of course be leveraged by any sort of photo-specific application, but I think it can be useful to our users beyond that. If incorporated as a general utility into audio-based screen readers, visually impaired users would be able to understand the contents of photos they encounter elsewhere online – for example, an attachment in an email or an image embedded in a news article they’re reading.

While all of the above sounds great, let’s not lose sight of the fact that this is going to be quite difficult to implement. A very rudimentary version could read aloud information like the capture time and location from the image’s EXIF data, as well as detect standard objects like “person smiling” (a sketch of this follows below). Future versions of the model could be both more specific and tailored to the individual user. For example, instead of detecting “person smiling,” it could detect “your granddaughter Sophie smirking while eating Halloween candy.” Facebook’s auto-tagging capability is proof that this is doable, but they also have a very large and rich dataset of photos to work with.
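Here is a minimal sketch of that rudimentary version, assuming the photo carries EXIF metadata; the object label is hard-coded as a stand-in for a real detection step.

```python
# Minimal sketch of the rudimentary version: read the capture time from
# the photo's EXIF data and fold it into a spoken-style description.
# GPS parsing and real object detection are omitted for brevity.
from PIL import Image
from PIL.ExifTags import TAGS


def capture_time(image_path: str) -> str | None:
    """Return the EXIF capture timestamp if the photo carries one."""
    exif = Image.open(image_path).getexif()
    for tag_id, value in exif.items():
        if TAGS.get(tag_id) == "DateTime":
            return str(value)  # e.g. "2023:10:31 18:42:05"
    return None


def basic_description(image_path: str, detected: str = "person smiling") -> str:
    """Combine EXIF metadata with a detected-object label into one sentence."""
    taken = capture_time(image_path)
    when = f"taken on {taken}" if taken else "with an unknown capture time"
    return f"A photo {when}, showing a {detected}."


# In the real product this string would be sent to a screen reader.
print(basic_description("photo.jpg"))
```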

 

Summary

In order to provide a really good product experience to users who are completely blind, we are going to build an AI-based application that automatically analyzes the contents of a photo and reads an audio description of its contents aloud.

 

Metrics

Right off the top of my head, I think there are two points that are important to measure:

1. Does our product accurately describe the contents of a photo?

2. Do our users find our descriptions of the photos useful?

 

Let’s focus on picking a metric that accurately reflects our ability to do #2. I think #2 encapsulates #1, in the sense that if our product can’t accurately describe the contents of photos, then our users aren’t going to find the descriptions useful, so let’s focus on question #2.

To assess whether our users find our descriptions of photos useful, we should monitor our number of daily active users. This is the type of application meant to be used day to day, so daily active users makes more sense than monthly active users. If users aren’t finding our product useful, they won’t use it.

One potential complication here is if our AI Descriptions are leveraged by something like a general audio based screen reader as a standard offering then users could be de facto using our AI descriptions even if they didn’t find them useful. In that situation maybe offering a small subset of users the ability to skip the reading of our descriptions could be useful just to monitor if they do it or not as a proxxy for how useful they find our AI Descriptions.