Android App Market Analysis — Data Visualization Case Study

Vivi Shin
11 min read · Aug 20, 2020

Android App Market Analysis is a data visualisation project that presents an overview of collated data from 10k apps on Google Play Store and gives a brief insight into strategies for pricing and sizing of applications.

View Dashboard →

The project was undertaken after I completed a beginner/intermediate-level Tableau course hosted by Planit. Using navigation tabs between two main sets of data, together with filter options, I created two interactive dashboards that let users select and access the specific data they want. This work visualizes the published data of approximately 10,000 major apps in the Google Play Store.

RAW DATA & DATA SET

Utilising Kaggle community

Kaggle, a community of data scientists and machine learners, provides user-generated datasets that are freely accessible to the wider community, including individuals such as myself. Each uploaded dataset is assigned a rating from 0 to 10 based on its reliability and machine-readability: a score of 0 indicates very unreliable data, while the highest score of 10 indicates very reliable, machine-readable and well-organised data.

Defining a subject: Google Play Store App, Android Market Analysis

Using Kaggle, I was able to locate a reliable data source for analysing the Android app market. The data in this set was scraped from more than 10,000 major apps in the Google Play Store and had been updated six months before I accessed it, so reasonably recent information was available for analysis.

Kaggle allows one to preview data sources in advance. This allowed me to brainstorm what insights I wanted to generate before committing to a particular dataset.

DATA CLEANING

Cleaning data using Tableau

Initially, I inspected the raw dataset to make sure it could be cleaned and would suit my intention of presenting it visually to give users greater insight into the Android app market.

Data can be refined with tools such as R, Python, Hadoop, etc. However, given my minimal experience with data manipulation of this nature and with the languages these tools use, I stuck to Tableau’s built-in functions as much as possible to refine the data.

The Google Play Store Apps data sourced from Kaggle had a usability (machine-readability) rating of 7.1. Given that this rating implied acceptable but imperfect data, the following corrections, as shown in the dialogue box below, needed attention.

① Eliminate unnecessary characters ‘M’, ‘,’ & ‘+’ for app size and number of installations
② Integrate KB into MB per app size
③ Removing decimal point from App Price
④ Other: Removing data backlogs, error names, etc.

Tasks 1, 2 and 3 were handled by adding functions via calculated fields within Tableau. Task 4 was done manually in Excel (only a small amount of data needed manipulation, so this was easily achievable).

At the time of this project, my knowledge of Excel was limited to calculating the sum or average of selected cells and organising number formats. To get the most out of the project and present a visual insight into the chosen dataset, I needed to turn to self-directed learning on the web.

① Eliminate unnecessary characters

After the raw data was imported into an Excel spreadsheet, the cells of the “Size” and “Installs” columns contained unnecessary characters, as shown by the red boxes in the figure on the left below. If a cell contains a number joined to a letter or symbol, the data cannot be recognized as a continuous number or measurement. As shown in the Tableau figure on the right below, columns containing measurable values are marked with a “#” symbol, whereas columns made up of text strings are given an “Abc” heading. These strings need to be refined before they can be treated as measurable values.

Left) Raw data imported into Excel; Right) Raw data imported into Tableau

Executing a Google search with the terms “Remove special characters in data tableau” led me to the Tableau Forum. This forum suggested I utilise the REPLACE function to rid the data of irrelevant letters and symbols.

In the Dimensions section, right-click and select “Create Calculated Field”.

Name the new calculated field and enter a formula of the following form in the dialogue box: REPLACE([the corresponding column title], “symbol or text to be replaced”, “the replacement symbol, number, text or blank”). To remove more than one character, the calls can be nested, as in REPLACE(REPLACE([column title], …), …). The double quotes, e.g. “_”, indicate that the character within them is a string (letter, symbol), and a string can only be replaced by another string (another letter, symbol or blank). I removed unnecessary characters by replacing them with a blank (no other letters or symbols), e.g. a “+” was removed by replacing it with a blank using “ “. The replacement only works if the Tableau wizard identifies the formula as valid.

Left) Number of installations, Right) Refining equation for app Size
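As a rough illustration only (this is not the Tableau formula used in the project), the same character removal can be sketched in Python. The function name `strip_characters` and the sample values "10,000+" and "19M" are assumptions made for the example, and the sketch replaces each character with an empty string so that the result can be converted to a number.

```python
def strip_characters(value, unwanted=("+", ",")):
    """Remove each unwanted character, similar in spirit to nested REPLACE calls."""
    for character in unwanted:
        # str.replace returns a new string with every occurrence removed
        value = value.replace(character, "")
    return value


# Hypothetical sample values in the style of the "Installs" and "Size" columns
installs = int(strip_characters("10,000+"))
size_text = strip_characters("19M", unwanted=("M",))
```

Once the extra characters are gone, `int()` can turn the remaining text into a number, which is what allows the column to be treated as a measure rather than a string.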

② Integrate KB into MB per app size

Through Task 1, the string ‘M’ was removed from the app size. However, sizes given in KB cannot be treated as MB values, so simply eliminating the letter ‘K’ is not enough.

While struggling with what to search for to deal with this task, I discovered Kernels on Kaggle, which helped me solve the issue. (Kernels are explained in more detail in a later section.)

Notebooks on Kernels let me view in-depth case studies by people who had already analysed and visualised, in R or Python, the same raw data I intended to use.

In one of the examples below, I found a data analyst who seemed to have used Python to fix the very problem I wanted to correct. All I could understand at the time was Replace(‘k’, ‘’) and the ‘if’ statement. To solve this task on my own, I needed to learn the basics of Python.

Fortunately, w3schools has a clearly explained Python tutorial, and after about half an hour I could roughly decipher the code. Python had always been a foreign language to me, so it was rewarding to understand the basics after such a short study.

The Tableau wizard required the formula to follow an ‘if-then-else-end’ structure. Since I only needed to divide by 1000 the values that contained ‘K’ in the raw data, I entered that condition in the ‘if’ clause. From the Python tutorial, I learned that ‘float()’ returns a floating point number from a number or a string, and that ‘else’ catches anything not caught by the preceding conditions. With this formula, I solved the task.
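The if/else logic described above can be sketched in Python as follows. This is a minimal sketch under assumptions, not the notebook code or the Tableau formula from the project: the function name `size_in_mb` and the sample values are hypothetical, and sizes are assumed to end in either ‘k’ or ‘M’.

```python
def size_in_mb(raw_size):
    """Convert a size string such as '201k' or '19M' to megabytes."""
    text = raw_size.strip()
    if text.lower().endswith("k"):
        # Kilobyte values: drop the unit and divide by 1000 to get MB
        return float(text[:-1]) / 1000
    else:
        # Everything else is assumed to be in MB already: just drop the 'M'
        return float(text.replace("M", ""))


# Hypothetical sample values in the style of the "Size" column
sizes = [size_in_mb(value) for value in ["19M", "500k", "8.7M"]]
```

In this sketch, a value in any other format would make `float()` raise a `ValueError`, so such rows would need to be handled separately.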

③ Removing decimal point from App Price

This is a simple change of number format, which is also available as a basic feature in Excel. To remove the decimal point from the price, set the number of decimal places to zero in the number format settings.

GAINING INSPIRATION

Using Kernels at Kaggle and the Tableau Public Gallery to gain inspiration for visualisation.

Just as most designers gather artistic inspiration from Behance or Pinterest before diving into work, it is important in data visualisation to prepare sufficient references. Given my minimal experience in data visualisation, it was difficult for me to determine which charts would present the data effectively. Thankfully, there is active communication within Kernels and Kaggle’s open community, so I was able to teach myself enough.

What are Kernels?

As mentioned earlier, I made use of Kaggle’s Kernels. According to Wikipedia, a kernel is a computer program at the core of a computer’s operating system with complete control over everything in the system. On Kaggle, however, the term means something different: Kernels (also called Notebooks) are shared notebooks of data analysis and machine learning code, so they can be thought of as a research space and community within Kaggle.

My data visualization dashboard was inspired by Lavanya Gupta’s notebook below, which is the highest-rated project. She visualized the same raw data I used and drew conclusions from it in multiple ways.

Tableau Public Gallery

There are countless visualizations in the Tableau Public Gallery to use as references. Helpfully, if a workbook is open to download, I can examine it for inspiration. On the left is Ivy Brewer’s dashboard, which makes enjoyable use of smartphone graphics and expresses the number of apps in each category with circles. On the right is Lin (Jamie) LAN’s dashboard, which organizes apps by Top 10 ratings and number of installations using various shapes.

DATA VISUALIZATION

Wireframe

Before UX designers can move on to anything else, we must think holistically about the structure of our project, determining how every element, no matter how small, fits into the overall experience. In data visualisation, wireframing is the key step for picturing how the raw data will be expressed, so I created a rapid wireframe using Figma.

Interactive Data Visualization

Using the ‘story’ function within Tableau, I was able to combine several dashboards or sheets through captions (buttons). I decided to build my dashboard into two sheets: Overview and Strategy.

Stats summary

I inserted a summary figure next to the title, which displays the total number of apps, average price, and average rating when a category is clicked. This was simple to implement by inserting and editing the corresponding values in the tooltip within Tableau.
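For readers who prefer code, the three summary values can be sketched in Python. This is only an illustration of the arithmetic, not how Tableau computes it internally; the `summarise` function, the field names and the sample rows are all hypothetical.

```python
from statistics import mean

# Hypothetical sample rows; the real dataset has roughly 10,000 apps
apps = [
    {"category": "FAMILY", "price": 0.0, "rating": 4.0},
    {"category": "FAMILY", "price": 2.0, "rating": 5.0},
    {"category": "GAME", "price": 0.0, "rating": 4.0},
]


def summarise(rows, category):
    """Return total apps, average price and average rating for one category."""
    selected = [row for row in rows if row["category"] == category]
    return {
        "total_apps": len(selected),
        "average_price": mean(row["price"] for row in selected),
        "average_rating": mean(row["rating"] for row in selected),
    }


family_summary = summarise(apps, "FAMILY")
```

Note that `mean` raises an error for an empty selection, so a category with no apps would need its own handling.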

Top 10 Category

After watching an educational clip on Planit’s YouTube channel, I was able to visualise the top 10 categories with a donut chart using a dual axis. I intentionally showed only the top 10 in order to deliver the primary information among the 25 categories in the raw data.

Insights

1. The top categories of apps are Family (Education, Entertainment), Game, Medical and Business.

2. On the Google Play Store, there are 1,899 apps in the Family category, which accounts for the highest percentage.

3. By inspecting this chart, anyone who wants to launch an app can gauge beforehand how many competitors exist and which category is the most popular in the Android market.
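The ranking behind a top 10 chart like this one can be sketched in Python with `collections.Counter`. This is a minimal sketch under assumptions: the `top_categories` function and the sample category list are hypothetical and far smaller than the real data.

```python
from collections import Counter


def top_categories(categories, limit=10):
    """Return (category, count, share of all apps) for the most common categories."""
    counts = Counter(categories)
    total = sum(counts.values())
    return [
        (category, count, count / total)
        for category, count in counts.most_common(limit)
    ]


# Hypothetical sample: one category label per app
sample = ["FAMILY", "GAME", "FAMILY", "MEDICAL", "GAME", "FAMILY"]
ranking = top_categories(sample, limit=2)
```

The share value corresponds to the slice of the donut that each category would occupy.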

Top 10 Installed App with mobile UI

Users can view selected data about the desired category through the filters on the left. Inspired by Ivy Brewer’s dashboard introduced above, I also placed a chart inside a mobile screen image. Additionally, I implemented pricing and rating filters within the screen so that users can check the figures they want. The advantage of Tableau is that it lets users interact with the dashboard and control it themselves.

By employing Tableau’s tooltip feature, the name of the app, its category, price and rating are shown in a description when a circle in the chart is clicked.

App Pricing across Category

The second dashboard, which has a more strategic theme, shows the price distribution of apps by category and the ratio of free to paid apps. When a user selects a category, the selection acts as a filter that displays the ratio of paid apps in that category.

Insights

1. The overwhelming majority of apps are free.

2. Users can check the price range of apps in any category they choose.

3. The Family and Finance categories also contain apps with extreme prices ($400).
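The free versus paid ratio per category can be sketched in Python as below. This is an illustrative sketch, not the Tableau calculation used in the dashboard; the `paid_share_by_category` function and the sample rows are hypothetical, and an app is assumed to be paid when its price is above zero.

```python
from collections import defaultdict


def paid_share_by_category(apps):
    """Return the fraction of paid apps (price > 0) for each category."""
    totals = defaultdict(int)
    paid = defaultdict(int)
    for category, price in apps:
        totals[category] += 1
        if price > 0:
            paid[category] += 1
    return {category: paid[category] / totals[category] for category in totals}


# Hypothetical sample rows of (category, price in dollars)
sample_apps = [
    ("FAMILY", 0.0),
    ("FAMILY", 399.99),
    ("FINANCE", 0.0),
    ("FINANCE", 0.0),
]
shares = paid_share_by_category(sample_apps)
```

Selecting a category in the dashboard is comparable to looking up a single key in the resulting dictionary.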

Pricing & Sizing Strategy

Tableau can establish relationships between charts, so that clicking a category in the figure below applies it as a filter to the other charts.

Insights

1. $1 to $30 could be an appropriate price range when an app is not launched for free.

2. Users prefer smaller apps, around 2 to 40 MB in size.

What I’ve learnt

Reflecting on this project, I feel I am gradually getting used to working with Tableau. The more I learn, the more I want to learn and the more interested I become in visualizing data. This is because I found that the working process for a data project is similar to that of UX/UI and product design.

Both product design and data case studies run from research and discovery through to the execution of the final output. However, data visualization seems more attractive to me in that it involves extracting insight from ‘raw data’ and cleaning an imperfect dataset. This project complemented my product design approach with a data-centric mindset, enabling me to make better use of both design and data.

Thank you for reading my article :). I would appreciate any comments and feedback below, or feel free to contact me via hello@vivishin.com.

☝️ You can also find this article on my portfolio.

✌️ Korean Article is here.
