How can we improve Microsoft’s “Seeing AI” application?

Jan 11, 2021

A cat captured with “Seeing AI” during the test sessions, which the app described (correctly) as a cat lying on the bed

In this article, I summarize an independent accessibility study that I have been working on since December 2020. The story actually started with the Accessibility Design course I took in winter 2019. I worked with participants who are visually impaired and conducted usability research on “Be My Eyes”, an application designed for the blind community. Working on an accessibility project gave me so much passion and purpose in my work that I decided to continue working in accessibility design.

This year, I decided to commit my time to evaluating the usability of Microsoft’s Seeing AI app. Seeing AI uses the power of the cloud and artificial intelligence (AI) to identify people and objects and audibly describes them for people with visual impairments. I downloaded the app and started to explore its user task flows. The more I observed, the more I wanted to talk with actual users to understand their current experience.

Over two weeks, I recruited 6 participants who were either blind or had low vision. They were ready and excited to share their insights with me over Zoom sessions, and I am grateful to them. They shared their phone screen and audio with me through Zoom’s iPhone screen share feature, and as they went through the tasks I had prepared, I gained a true understanding of their experience with the product. I gave them 4 tasks, with pre- and post-interview questions. I chose the task scenarios below because some of them were likely reasons to use the app during Christmas time, and the others covered the top features available at the time of the sessions.

Reading the expiration date on a kitchen item
Describing the visual figures on a holiday card
Describing people in an image
Reading a document

I transcribed all my interviews and conducted a qualitative analysis using a coding technique and an affinity diagram. For the coding, I tagged participant quotes with keywords that I came up with, and later put those keywords and some of the important quotes on sticky notes. I followed Microsoft’s AI research guidelines and matched my codes to the “Guidelines for Human-AI Interaction”. This helped me synthesize my findings and produce results that are aligned with Microsoft’s own sources.

Interview Insights in an Affinity Diagram: interview notes written on post-its and matched with the Human-AI guideline cards (for the diagram details, please email me at bengisudost@gmail.com)

The Results

Guideline 1 - Make clear what the system can do

This is where we set expectations when designing AI systems. For example, none of my participants knew that they could have Seeing AI describe photos from their camera roll. When I asked them to use the “Browse Photos” feature, which was hidden under the Menu, they tried it on their own photos and were fascinated by the descriptions of their loved ones and pets. In fact, the “Browse Photos” feature gave better descriptions than the “Scene” feature for certain tasks. I therefore believe users should be made aware of this option so they can use it for better results.

Potential Solution

Move the “Browse Photos” feature up to the main channel bar.

Guideline 2 - Make clear how well the system can do what it can do

This is where we explain the limits of AI to users. None of the participants was able to read an expiration date on a kitchen item during our testing sessions. They tried a milk carton, a mini milk bottle, a box of chocolate bars, and a coffee creamer. One user thought that text on a rounded surface is hard to read. Most of them had no idea where the expiration date would be printed on those items. Even though most of them said the “Short Text” channel was their favorite and easiest one, this task was extremely difficult for them.

“It’s a little embarrassing that I couldn’t find the expiration date on the creamer” — Participant 3
“I would consider myself extremely lucky to find something like that (exp date)” — Participant 2

Potential Solution

At the beginning of the channel, let users know how well the app can read, such as which types of print or font sizes it supports. This prevents users from doubting or feeling bad about themselves over tasks that are outside the app’s intended use or limits.

Three of my participants also mentioned that they wished the system could read from digital screens, such as blood pressure devices or digital stove displays, or recognize a person on a computer screen. Over time, through trial and error, they learned that those functions were beyond Seeing AI’s limits.

Potential Solution

Either state at the beginning of the channel which common tasks are not supported, or brainstorm ideas to bring those new features to life.

Guideline 4 - Show contextually relevant information & Guideline 10 - Scope services when in doubt

During the study sessions, while users were going through tasks such as describing holiday cards and images, I discovered that both the accuracy of the results and the way accuracy information was conveyed were causing problems. According to these guidelines, the AI system should display information relevant to the user’s task and environment, and should express itself accurately when it is in doubt. For example, I observed 9 attempts in which object or people recognition failed; in only 2 of them did the AI hedge its description with a probability statement such as “probably…”. Surprisingly, in another attempt, where a user asked for a description of a boy in an image, the AI described him perfectly but still put a probability at the beginning: “probably a smiling boy with a flower pattern on his shirt.” I believe users should be informed of these probabilities more consistently and accurately, so that they know when the AI is actually in doubt.

Potential Solution

Reevaluate, restructure, and refine the confidence levels the AI attaches to descriptions.
Or design a new category scale for the AI’s confidence level that gives users more clarity, e.g. “Most likely” > “Probably” > “Might be”.
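To make the category scale above more concrete, here is a minimal Python sketch of how a confidence score could be turned into one of the three spoken prefixes. It is only an illustration: the function name `hedge_description`, the idea that the recognizer exposes a score between 0 and 1, and the cut-off values 0.8 and 0.5 are all my assumptions, not how Seeing AI actually works.

```python
# Hypothetical cut-offs, highest first. Real values would need to be
# tuned against user testing; these numbers are placeholders.
CONFIDENCE_PHRASES = [
    (0.8, "Most likely"),
    (0.5, "Probably"),
    (0.0, "Might be"),
]


def hedge_description(description: str, confidence: float) -> str:
    """Prefix a description with a phrase that reflects the confidence score."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Return the phrase of the first (highest) threshold the score reaches.
    for threshold, phrase in CONFIDENCE_PHRASES:
        if confidence >= threshold:
            return f"{phrase} {description}"
    # Unreachable because the last threshold is 0.0, kept for clarity.
    return description


if __name__ == "__main__":
    print(hedge_description("a cat lying on the bed", 0.92))
    print(hedge_description("a dog sitting on the grass", 0.35))
```

The point of a fixed scale like this is that the same phrase always means the same level of doubt, so a user can learn over time how much to trust each one.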

Accuracy Table for Task 2 and Task 3 Results: an Excel sheet with 5 columns (Participant #, Task Number, AI Description, Actual Case Description, Accuracy Percentage). For more details, please email me at bengisudost@gmail.com

Some of the objects and photos used during the interviews: a collage of participants’ phone screenshots taken during the sessions

Guideline 5 - Match relevant social norms

This guideline is where, as researchers, we ensure the experience is delivered in line with users’ social and cultural expectations. I discovered that pets (including guide dogs) carry specific cultural importance for the blind community. For example, most participants’ camera rolls included many dog and cat pictures. Unfortunately, Seeing AI seemed to have a hard time describing their pets, while users expected the app to describe them better.

“I got a lot of cat pictures. That’s what I usually use my phone for. Yeah, I send kitty pictures to their mama.” — P5
“…how the app is picking up both dogs would be important for me to know, because a lot of times I use these photos to send puppy raisers. So these people need to raise these puppies and then give them back to the school to give to people who are blind and visually impaired. And I have my own guide dog. I want to be able to differentiate their photos before sending.” — P6

Potential Solution

The participants mentioned that they would love the app to better describe their pets’ color, size, and quantity with the “Scene” and “Browse Photos” features. Let’s see how far we can get on this!

Guideline 9 - Support efficient correction

4 out of 6 participants mentioned that, in a perfect world, they would like a perfect color identifier. They said that even among paid apps there is no successful color identifier so far. This is mostly because the camera light interferes or because the camera’s field of view is too wide to capture a specific spot.

Potential Solution

Narrow the camera’s field of view for the color identifier (this could also help with picking out specific things in the environment, such as restaurant names in open spaces).

Another area for improvement I discovered is that the document feature does not work well on text-heavy documents with multiple columns. It does not identify the locations of columns and their corresponding headlines. For example, if the scanned document is an article with 3 columns, it reads the first line across all three columns, then goes to the second line, and so on.

“It goes straight across in cooking instruction columns such as oven, microwave or stove options. It mixes up the cooking instructions.” — P3

Potential Solution

The way the AI reads a document should depend on the type of document. For example, in an article, you read the whole first column and then the second column. On a bill, however, you might need to read across each row with the corresponding headlines included.
Learn from competitors that provide good results on documents, such as VoiceDream Reader and KNFB Reader.

“If I am going to read something columnar, I probably would use VoiceDream Scanner. I have VoiceDream reader and the two integrate and it’s just a really wonderful reader.” — P4
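The difference between the two reading orders suggested above can be sketched in a few lines of Python. This is a simplified illustration, not Seeing AI’s implementation: the `TextBlock` type and the `read_article` and `read_table` functions are hypothetical names, and I am assuming a layout step has already assigned each piece of text a column and a row index.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """A piece of recognized text with its (assumed known) grid position."""
    text: str
    column: int
    row: int


def read_article(blocks: list[TextBlock]) -> list[str]:
    """Article order: finish the whole first column, then move to the next."""
    ordered = sorted(blocks, key=lambda b: (b.column, b.row))
    return [b.text for b in ordered]


def read_table(blocks: list[TextBlock]) -> list[str]:
    """Bill or table order: go across each row, announcing the column headline.

    Row 0 is assumed to hold the headlines.
    """
    headers = {b.column: b.text for b in blocks if b.row == 0}
    cells = sorted((b for b in blocks if b.row > 0), key=lambda b: (b.row, b.column))
    return [f"{headers.get(b.column, 'Unknown')}: {b.text}" for b in cells]


if __name__ == "__main__":
    bill = [
        TextBlock("Service", 0, 0), TextBlock("Amount", 1, 0),
        TextBlock("TV", 0, 1), TextBlock("$50", 1, 1),
    ]
    print(read_table(bill))
    print(read_article(bill))
```

The hard part in practice is detecting the columns and the document type in the first place; once that information exists, choosing the reading order is a small step.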

Document samples used in the study sessions: first, Verizon TV bill results; second, a college yearbook page; third, a recipe book page

In Conclusion

Seeing AI is a great tool for the visually impaired community. All of my study participants stated that they like this app and appreciate all the effort that Microsoft employees put into it. They know it's hard to master all the functions the app has, but they also believe Seeing AI has good stuff in it and is worth working towards mastering. Visual impairment is not going away; it is actually rising as the population ages. I believe we have to be responsible and take action to tackle these issues with the vision of mastery.

What have I learned during this study?

Don't talk at the same time as the screen reader, because participants can’t listen to both of you :) Make sure you hear what they hear during the test sessions.
Practice VoiceOver and join the blind community’s world. As one of my participants said, “do it as if you needed to”, and I totally agree!
I appreciate working with diverse users and I will always be committed to bringing their voices to the table.

About me

I am a recent Human-Computer Interaction graduate and certified in Human Subject Research. I have found my niche in conducting effective research and information gathering with the purpose of creating practical tools and processes that optimize efficiency and the perceived user experience of end users. I am passionate about accessibility design projects because working with diverse users energizes me and I love bringing their voices to the table!

To keep this article short, I only provided some of my findings. If you want to learn more about my study, please contact me at bengisudost@gmail.com

Connect with me on LinkedIn! https://www.linkedin.com/in/bengisudost/

Written by Bengisu Dost