• Time
  • Oct.20 2017 - Nov.17 2017
  • Team
  • Luna Ouyang, Zheru Jiang
  • My Role
  • I am one of the two Chief Designers on this project. My partner Luna Ouyang and I came up with the original idea, and together we carried the design from task and need analysis through to interaction design. Our work as designers was closely intertwined, and we both devoted a great deal of passion to this project.

  • Brief
  • Voice Emoji brings users a seamless emotion-sharing experience by allowing them to send emoji via voice and hear them. A specific sound effect is paired with each emoji to convey a more accurate and more engaging meaning.

Problem Statement

Audio messages are better than text messages in that they are faster to compose and represent emotion more accurately and directly.

When users record an audio message, emotion tools like emoji are hard to use. Users need to stop recording, switch back to the keyboard, and select the intended emoji from a cluttered panel. The task becomes even harder when users' eyes are busy or when they are on the move.

Target User

Our target users are people who frequently use emoji when sending electronic messages; they are more likely to be from younger generations. Children with limited language skills and seniors with visual impairments are also potential users.

According to the 2015 Emoji Report, emoji were used by 92% of the online population at that time. 78% of women and 60% of men were frequent emoji users (using emoji several times per week).

We drafted the following user personas to reflect the characteristics and needs of our target users.


Audio messages are convenient in the following situations:
• When the user's eyes are busy;
• When the user's hands are busy;
• When physical limitations exist;
• When mobility is required.

Design Alternatives

To solve the problem raised above, we enable speech-based input on our interface. In addition, we decided to pair emoji with audio output to enhance the emotion-sharing experience. We brainstormed the following three design alternatives to address these design concepts.

In all three design concepts, we made the task of sending emoji easier to complete. Users no longer need to select an emoji from a cluttered panel; they simply utter their expression, and the system recognizes the utterance and displays suggested emoji for selection.

Design Decision

We carefully evaluated the three concepts by referring back to our persona, usage context, and taking into consideration the audio recognition technology to be used in the context. We decided to continue with concept 3 as our design direction for the following reasons:

•   When users are recording an audio message, concept 1 has a higher risk of accidentally capturing the wake word and command utterance as part of the message;

•   Concept 2 is less efficient. The separate buttons for voice messages and voice emoji increase action time. Besides, it is less intuitive if users need to press different buttons for different parts of a voice message;

•   Concept 3 associates emoji audio input with gesture control and integrates it well into the current messaging ecosystem. The same finger that presses and holds to record a voice message can switch to "recognize emoji" mode. This not only saves time but also lets users send emoji without looking at the screen, reducing the error rate;

•   Users slide up and down to switch between recording a message and selecting emoji. The mode switching in concept 3 greatly decreases the chance of mistakenly recording the command utterance as part of the audio message;

•   Concept 3 has lower requirements for natural language processing technology, so it can be adopted by various instant messaging platforms without demanding rigorous technical support.


We drafted simple wireframes to demonstrate the visual interface of the interaction flow, and invested more time in the audio dialogue of the interaction design.

Information Architecture

Each emoji will be paired with a unique sound that represents its meaning. We mapped a series of associational words to each emoji. These words consist of:

1) Names of emoji      2) Emotions related to the emoji     3) Features of the emoji     4) Cultural interpretations of the emoji.
These associational words are used by speech recognition to search for the emoji users intend to send. We referred to Emojipedia to gain an accurate understanding of each emoji's meaning, context of use, and how people remember and describe it.
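The word-to-emoji search can be sketched as a simple keyword lookup. This is a minimal illustration only; the emoji names and associational words below are hypothetical examples, not our actual mapping.

```python
# Hypothetical associational-word mapping: emoji name -> trigger words.
ASSOCIATIONAL_WORDS = {
    "face_with_tears_of_joy": {"lol", "laughing", "funny", "tears of joy"},
    "red_heart": {"heart", "love", "romance"},
    "thumbs_up": {"thumbs up", "approve", "well done"},
}

def find_emoji(utterance: str) -> list[str]:
    """Return emoji whose associational words appear in the utterance."""
    text = utterance.lower()
    return [
        emoji
        for emoji, words in ASSOCIATIONAL_WORDS.items()
        if any(word in text for word in words)
    ]

print(find_emoji("that's so funny, I love it"))
```

In a real system the recognized utterance would come from the speech recognizer, and the matched emoji would populate the suggestion list for the user to confirm.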

Due to limited time and resources, we weren't able to design the whole set of voice emoji, so we decided to start with a few of the most popular ones. We gathered the most frequently used emoji from 2015 to 2017 on the most popular social network platforms, then selected a subset that remained consistently popular across those years.

Interaction Design

Generally, a dialogue with an audio interface starts with a wake word, followed by a launch command, an invocation name, and an utterance. Our system is simpler: since wake-up and launch are achieved by the gesture, the main body of our dialogue consists only of an invocation name and an utterance.

Based on different types of user intents, we designed three categories of commands for our system:

1) Adding an emoji
      "Repeat" can be enabled to quickly add the same emoji multiple times in one recognition.

2) Canceling a recognized emoji

3) Replacing an emoji
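The three command categories above can be illustrated as a small utterance parser. The command phrasings here ("cancel", "replace … with …", "… repeat n") are illustrative assumptions, not our final dialogue grammar.

```python
import re

def parse_command(utterance: str):
    """Classify an utterance into one of the three command categories."""
    text = utterance.lower().strip()
    # 3) Replace emoji: "replace <old> with <new>"
    m = re.match(r"replace (.+) with (.+)", text)
    if m:
        return ("replace", m.group(1), m.group(2))
    # 2) Cancel the last recognized emoji
    if text.startswith("cancel"):
        return ("cancel",)
    # 1) Add an emoji, with optional repetition: "<name> repeat <n>"
    m = re.match(r"(.+?) repeat (\d+)", text)
    if m:
        return ("add", m.group(1), int(m.group(2)))
    return ("add", text, 1)
```

For example, "laughing repeat 3" would add the same emoji three times in one recognition, matching command category 1.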

Tools: Sketch, Principle

We created a partially functional prototype to demo the interaction. This prototype allows users to complete a series of core tasks.

>Download interactive prototype for mac<
Thanks for reading