This project is an information visualization of diversity data from tech companies headquartered in Silicon Valley. The data are based on the companies' 2016 EEO-1 reports (a compliance survey that requires company employment data to be categorized by race/ethnicity, gender, and job category). We focus on revealing the leadership distribution in the IT industry and enabling comparison between companies.

My Role
User Experience Designer

I analyzed our data set and defined the goals of visualization.
I designed the interactions to help answer the questions users may have.
I chose what views and charts to use throughout the project.

Visual/Information Designer

I encoded the data using existing data visualization techniques and charts.
I created different design alternatives with sketches and UI mockups.
I designed the final UI and interactions.
I chose the color scheme and font.

Web Developer

I implemented 50% of the website using HTML, CSS, and D3.js.

- Preface -

The purpose of this visualization is to show how race and gender figure into the job distribution at tech companies. We chose this topic because we want to bring more attention to gender and racial bias in the IT industry. We hope our project will encourage more people to advocate for equal employment rights.

Understanding the Data

We found the original data at https://github.com/cirlabs/Silicon-Valley-Diversity-Data. The dataset contains workforce distributions by job category and race for 177 of the largest tech companies headquartered in Silicon Valley. We processed these data by compiling several spreadsheets, and kept the data cases for the 24 identified companies that have publicly released their data within the otherwise anonymized dataset.

Each figure in the dataset represents the percentage or count of a job category that is made up of employees with a given race/gender combination, and is based on each company's 2016 EEO-1 report.

Figure: sample data
I communicated my understanding of the data with my team

Each data case has 4 categorical attributes. For each combination of demographic category and job category, there are two continuous variables (a percentage and a count).

We decided to organize our data in this tree structure
Job Category Description
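
As a rough illustration of the tree organization, flat rows can be nested by their categorical attributes, with the two continuous values stored at the leaves. This is only a sketch: the field names, the sample values, and the nesting order (company, then job category, then race, then gender) are assumptions for illustration, not the project's actual schema.

```javascript
// Hypothetical flat records; field names and values are illustrative only.
const records = [
  { company: "Company A", jobCategory: "Executives", race: "White", gender: "male", count: 8, percentage: 40 },
  { company: "Company A", jobCategory: "Executives", race: "Asian", gender: "female", count: 2, percentage: 10 },
  { company: "Company B", jobCategory: "Professionals", race: "Asian", gender: "male", count: 30, percentage: 25 },
];

// Nest rows as company -> job category -> race -> gender -> { count, percentage }.
function buildTree(rows) {
  const tree = {};
  for (const row of rows) {
    let node = tree;
    for (const key of [row.company, row.jobCategory, row.race]) {
      if (node[key] === undefined) node[key] = {};
      node = node[key];
    }
    // The leaf holds the two continuous variables for this combination.
    node[row.gender] = { count: row.count, percentage: row.percentage };
  }
  return tree;
}

const tree = buildTree(records);
console.log(tree["Company A"]["Executives"]["White"]["male"]);
```

With this shape, any single value can be looked up by walking the four categorical attributes in order.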

Thanks to Reveal from The Center for Investigative Reporting and The Center for Employment Equity, who provided this open data source, we had good data to start with. In order to build better visualizations, we needed to understand what people might use them for. We felt the need to align our understanding of the target audience within our team.

Target Audience

Job seekers who take into consideration the diversity in workplaces
Minority candidates who want to know which company offers more opportunities
People who want to have a better understanding of EEO-1 reports
Advocates of employment equity

Our target audience is a mixture of people who only want to see the big picture and people who want to retrieve a specific number. To cater to both groups, we faced the challenge of balancing macro and micro views.

User Goals

We then identified the goals that users may want to accomplish while exploring the visualization.

Identify the most and least represented demographic groups among IT practitioners
Explore the relationship between demographic features and positions
Compare the diversity between different companies
Find out which companies may offer more employment opportunities to a certain group

Design Process

In order to create an understandable, engaging experience for our audience, I came up with a strategy of gradually increasing the level of exploration throughout the experience. We start with explanatory data presentation and transition to self-exploratory views as users become more familiar with the data.

I drafted some sketches to communicate the initial design to my team. The design can be broken into two distinct parts based on the communication goals and the techniques applied.

Part 1 - Scrollytelling
Part 2 - Dashboard

In the first part, we focus on presenting the overview by explaining a confirmed hypothesis through the technique of scrollytelling. Following that, we will display multiple glyphs and apply interactions to support self-exploration tasks.

Design Alternatives

We experimented with different kinds of charts and visualization techniques. I created some medium-fidelity design alternatives to help us make design choices.

Scrollytelling Design

Animated bar chart

This design is a single bar chart that changes in response to scroll events. The top rectangle represents all employees as a whole, with the number of employees mapped to the area of the chart. When the user scrolls down, it breaks into 24 pieces, one for each company. The areas in each bar represent percentages of employees in that company.

The purpose of this design is to communicate the makeup of employees in our data set. Given multiple categorical attributes, we focus on communicating one aspect at a time. The first part presents the gender ratio, and the second part emphasizes ethnicity makeup.

Scrollytelling Design

Dots Plot

The second overview design focuses on presenting the leadership distribution in these tech companies. Every dot in the graph represents 1,000 people and is bound to demographic data and a job category.

By grouping and positioning the dots, we can communicate the leadership distribution and how it is affected by gender and race. Most of the information is conveyed through spatial relationships (proximity, distance) and color.
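
The dot encoding can be sketched as a small conversion from head counts to dot records, where each dot carries the attributes later used for grouping and coloring. The group names, the counts, and the choice to round to the nearest whole dot are assumptions for illustration; the original does not say how remainders were handled.

```javascript
// One dot stands for this many people (taken from the design description).
const PEOPLE_PER_DOT = 1000;

// Hypothetical aggregated groups; the counts are made up for illustration.
const groups = [
  { gender: "male", jobCategory: "Executives", count: 3400 },
  { gender: "female", jobCategory: "Executives", count: 1600 },
];

// Expand each group into dots, rounding to the nearest whole dot (an assumption).
function toDots(rows, peoplePerDot = PEOPLE_PER_DOT) {
  const dots = [];
  for (const row of rows) {
    const n = Math.round(row.count / peoplePerDot);
    for (let i = 0; i < n; i++) {
      dots.push({ gender: row.gender, jobCategory: row.jobCategory });
    }
  }
  return dots;
}

const dots = toDots(groups);
console.log(dots.length);
```

Because every dot keeps its gender and job category, the same array can be regrouped or recolored between scroll steps without recomputing the counts.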

Dashboard Design

Small Multiples

In the second part of the visualization, we want to display more details about each company. The technique I used in this design is "small multiples". For each company, we use a combination of charts to show different aspects of its diversity. The big circle in the center shows the race and gender distribution across job categories, where each ring represents one job category. The 5 small glyphs represent the job distribution of each race.

Dashboard Design

Dense Pixel Display and multiple views

In this design, I combined several kinds of charts as well as views on the same dashboard. I kept a dots plot on this dashboard to show where these companies stand on gender and racial diversity. The dense pixel grid provides a clear overview of each company's racial diversity. When the user selects one dense pixel grid, the company's diversity information will expand on the top left with more details.

Dashboard Design

Small multiples plus zoom interaction

I came up with this design after exploring the other alternatives. It borrows from the Nightingale rose chart, which I altered to fit our data. Each segment (wedge) represents one job category and each ring represents one race (identified by color). Values are mapped to the area, not the radius, of each ring. With this view, we were able to encode multi-dimensional categorical data without relying on interactions.
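
Mapping values to area rather than radius means the ring boundaries grow with the square root of the cumulative value, because the area of a ring segment in a wedge is proportional to the difference of the squared radii. A minimal sketch of that calculation follows; the function name, parameters, and sample numbers are hypothetical.

```javascript
// Compute inner/outer radii for stacked rings in one wedge so that each
// ring's AREA (not its radial thickness) is proportional to its value.
// maxTotal is the largest wedge total in the chart and maps to maxRadius.
function ringRadii(values, maxTotal, maxRadius) {
  let cumulative = 0;
  return values.map((value) => {
    const inner = maxRadius * Math.sqrt(cumulative / maxTotal);
    cumulative += value;
    const outer = maxRadius * Math.sqrt(cumulative / maxTotal);
    return { inner, outer };
  });
}

// Two rings with values 1 and 3 in a wedge whose total equals maxTotal.
const rings = ringRadii([1, 3], 4, 100);

// Ring area is proportional to outer^2 - inner^2 for a fixed wedge angle.
const ringAreas = rings.map((r) => r.outer * r.outer - r.inner * r.inner);
console.log(rings, ringAreas);
```

In this example the second ring covers three times the area of the first, even though both rings are equally thick along the radius.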

We presented the above design alternatives to the class and obtained valuable feedback.

On the positive side, by combining the two parts, our visualization communicates both the overview and the details well. But given a data set that cries out for comparison, the current designs do not support comparison tasks well.
When we tried to illustrate employee distribution across 3 categorical variables (race, gender, job category) in one chart, the chart turned out to be difficult to understand.
Users can get lost with so much information on one page. A good visualization promotes a dialog between viewers and the data, spurring new discoveries.

Improved Design

Based on the feedback, we selected the views that performed better and built our final design. I improved the layout and chart design with several modifications.

Session 1

The leadership overview

Session 2

How many of them are...

Session 3

Look up details about each company

Try our visualization here

Tackling the Challenges

01|Challenge: Facilitate Comparison

A new session was added to enable comparisons. We use mosaic plots to display the proportion of any combination of gender and race.

Given a set of data cases, rank them according to some ordinal metric.

Users can easily find the median and the extremes by toggling the sort button. Sorting also makes it easier to compare values.
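
The ranking behind the sort button can be sketched as a sort on a chosen metric, after which the extremes sit at the two ends of the list and the median in the middle. The company names, the `femalePct` field, and the helper names are hypothetical.

```javascript
// Hypothetical per-company values; not taken from the real dataset.
const companies = [
  { name: "Company A", femalePct: 30 },
  { name: "Company B", femalePct: 45 },
  { name: "Company C", femalePct: 25 },
];

// Return a sorted copy so the original order is kept for the unsorted view.
function rankBy(items, metric, descending = true) {
  const sorted = [...items].sort((a, b) => metric(a) - metric(b));
  return descending ? sorted.reverse() : sorted;
}

// After ranking, the extremes are at the ends and the median is in the middle
// (the upper of the two middle items when the list has an even length).
function summarize(ranked) {
  return {
    first: ranked[0],
    middle: ranked[Math.floor((ranked.length - 1) / 2)],
    last: ranked[ranked.length - 1],
  };
}

const ranked = rankBy(companies, (c) => c.femalePct);
const summary = summarize(ranked);
console.log(ranked.map((c) => c.name), summary.middle.name);
```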

02|Challenge: Illustrate the relationship among multiple categorical attributes
Brushing and Linking

Brushing and linking refers to the connection of two or more views of the same data.

We explored using different views to illustrate the 3 categorical attributes (gender, race, job category). But when we tried to aggregate them in one view, the chart became too complex to understand. In order to reflect all three aspects of the data and still keep it understandable, we decided to use a two-part display while keeping the two parts connected.


Selecting a company in one view highlights it in the other, connected view.

On the left side is a ranked bar chart that displays the gender ratio for each company. The view on the right displays job distribution of different races in 3 job categories.
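
One common way to wire up this kind of linking is a small shared selection object that both views subscribe to. The sketch below uses plain JavaScript with stand-in view objects; the names are hypothetical and this is not necessarily how the project implemented it (on a D3 page, `d3.dispatch` could play a similar role).

```javascript
// Shared selection state that any number of views can subscribe to.
function createSelection() {
  let selected = null;
  const listeners = [];
  return {
    subscribe(listener) {
      listeners.push(listener);
    },
    select(company) {
      selected = company;
      // Notify every linked view about the new selection.
      listeners.forEach((listener) => listener(selected));
    },
    current() {
      return selected;
    },
  };
}

// Two stand-in "views" that only remember which company to highlight.
const barChartView = { highlighted: null };
const jobDistributionView = { highlighted: null };

const selection = createSelection();
selection.subscribe((company) => { barChartView.highlighted = company; });
selection.subscribe((company) => { jobDistributionView.highlighted = company; });

// Simulate the user picking a company in the ranked bar chart.
selection.select("Company A");
console.log(barChartView.highlighted, jobDistributionView.highlighted);
```

Keeping the selection outside both views means neither chart needs to know about the other, so further linked views can be added by subscribing one more listener.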

03|Challenge: Promote a dialogue between the user and the data

Not only do we want to answer users' questions, but we also want to spur the generation of new questions. To accomplish that, we designed many dialog interactions. By posing questions to users first, we hope to guide them to explore the data and to develop an initial understanding of, and interest in, it.

The first session starts with a question and then uses the data presentation to help users form and test their hypotheses.


Given some concrete conditions on attribute values, find data cases satisfying those conditions.

Gender * Race

Through the interactive capabilities of this session, we allow a diverse and flexible set of questions to be asked and answered about the data collection.
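
The filter task quoted above amounts to keeping only the data cases whose attributes match every selected condition, for example a gender and race pair. A minimal sketch with hypothetical field names and made-up counts:

```javascript
// Hypothetical data cases; field names and counts are illustrative only.
const cases = [
  { company: "Company A", gender: "female", race: "Asian", jobCategory: "Professionals", count: 120 },
  { company: "Company A", gender: "male", race: "Asian", jobCategory: "Professionals", count: 300 },
  { company: "Company B", gender: "female", race: "Asian", jobCategory: "Managers", count: 15 },
  { company: "Company B", gender: "female", race: "White", jobCategory: "Managers", count: 40 },
];

// Keep the rows whose attributes equal every value in `conditions`.
function filterCases(rows, conditions) {
  return rows.filter((row) =>
    Object.entries(conditions).every(([key, value]) => row[key] === value)
  );
}

// "How many of them are Asian women?" expressed as a gender * race condition.
const asianWomen = filterCases(cases, { gender: "female", race: "Asian" });
const asianWomenTotal = asianWomen.reduce((sum, row) => sum + row.count, 0);
console.log(asianWomen.length, asianWomenTotal);
```

Because the conditions are passed as an object, the same helper covers any combination of attributes a user might pick.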


After finishing development, we invited other people to try out our visualization and give feedback. We recruited through convenience sampling: most of our participants were Georgia Tech alumni, and some were data visualization experts.

Usability Metrics

Thanks to the effort we put into designing dialog interactions and balancing macro and micro views, the overall learnability was rated high. People found the dots plot very intuitive and effective in telling the overview story. But we received some feedback that the rose chart takes extra time to understand. An annotated diagram explaining how to read it would likely help.


All the participants were able to recall the meaning of the data attributes throughout the exploration. After completing the exploration, they were able to name a few insights obtained from the project. However, they found it hard to recall the color-and-race pairs without hovering to bring up the legend. Although we used consistent color coding for the races, the dashboard view should include a legend as well.


Due to the use of scrollytelling on this visualization, users found it difficult to quickly go to a target session. To solve this problem, we may implement a navigation section to help people jump between sessions more efficiently.

Utility Metrics

The visualization allows viewers to easily find answers to the questions identified in the earlier needs analysis. Furthermore, some participants reported serendipitous discoveries.


Some interesting insights obtained through this fruitful experience are:

  1. Male employees hold more than 50% of positions in all the identified tech companies.
  2. The least represented groups are African-Americans and other minorities.
  3. Most Asian employees work as professionals in these tech companies.
  4. White employees dominate the executive positions.
  5. To our surprise, the supportive roles are mostly held by white people, and that group is less diverse than the professionals.


Task-driven data visualization design

Data visualization design is not just a fancy representation of Excel sheets. Just like UX design, it should help users accomplish their goals. We defined user goals at the very beginning and used those tasks to guide our design process. The user-centered design philosophy is applicable to information design as well.

Visualization is more than just answering a specific question

It is also about the investigative analysis process, which helps us learn about, develop awareness of, and build trust in the data and its context.

A good representation captures the essential elements of the event, deliberately leaving out the rest

The critical trick is to get the abstractions right, to represent the important aspects and not the unimportant. I tried very hard to work out a design that preserved every detail of the data until I realized that this is not the purpose of data visualization. A good data visualization communicates a clear idea and tells a story that users care about.