Solving Business Problems using SQL¶

In this notebook, we'll be working with the Chinook database. This database contains information about a fictional digital music store.

By querying this database using sqlite, we'll attempt to obtain data driven answers to the following business questions:

  • Which genres are most popular? Should we invest in recruiting upcoming artists of a particular genre?
  • Are certain employees generating more revenue than others? If so, why is this the case?
  • Should marketing campaigns aimed towards any particular country be launched?
  • Would it be valuable to our customers to add an "album purchase" option to our website?

Summary of Results¶

After querying the database to generate tables answering the above questions, we concluded that:

  • The rock genre is far and away the most popular. However, it may be more valuable to invest in upcoming artists from the Metal and Alternative/Punk genres.
  • All employees seem to be generating a similar amount of revenue. Any differences are likely accounted for by time spent at the company.
  • A few countries show potential for specific marketing campaigns, but this is based on a small amount of data. Smaller campaigns should first be launched targetting these countries in an effort to increase sample size before making a bigger decision.
  • About 20% of invoices correspond to album purchases. This seems like a significant enough percentage to suggest adding an "album purchase" feature to the website.

The Chinook Database¶

The following Schema diagram for the Chinook database was obtained from DataQuest.

Initial Exploration¶

To begin, we'll setup sqlite to run properly in this notebook.

In [ ]:
%%capture
%load_ext sql
%sql sqlite:///data/chinook.db

Now, we write a short query that gives us the name and type of any table or view in our database. Notice that before running this query, we drop two views (which are created later on in this notebook). This will solve any issues that may arise from running certain code-cells multiplie times.

In [ ]:
%%sql
DROP VIEW IF EXISTS order_by_country;
DROP VIEW IF EXISTS stats_by_country;
SELECT
    name,
    type
FROM sqlite_master
WHERE type IN ("table","view");
 * sqlite:///data/chinook.db
Done.
Done.
Done.
Out[ ]:
name type
album table
artist table
customer table
employee table
genre table
invoice table
invoice_line table
media_type table
playlist table
playlist_track table
track table

We see the name of each tabe. We can check our Schema diagram above that all tables are accounted for. Now, we can begin writing some queries to answer our questions.

Most Popular Genres¶

In the following query, we generate a table which answers the following questions:

  • How many tracks of each genre were sold?
  • What percentage of the total number of tracks sold does this account for?
In [ ]:
%%sql

/* Use a with clause to generate a temprorary table
contains genre, dollar amount of tracks_sold */

WITH tracks_by_genre AS 
    (
    SELECT 
        g.name AS genre,
        CAST(SUM(il.quantity) AS float) AS tracks_sold /* cast as float for future percentage calculation */
    FROM invoice_line AS il

    /* join genre to invoice_line through track table (see Schema) */

    INNER JOIN 
        track AS t 
        ON t.track_id = il.track_id
    INNER JOIN 
        genre AS g 
        ON g.genre_id = t.genre_id
    GROUP BY genre
    )
    
/* Main query */

SELECT
    genre,
    CAST(tracks_sold AS int) as tracks_sold,
    ROUND(100*tracks_sold/(SELECT SUM(tracks_sold) 
        FROM tracks_by_genre)
          ,2) AS percentage
FROM tracks_by_genre
ORDER BY tracks_sold DESC
 * sqlite:///data/chinook.db
Done.
Out[ ]:
genre tracks_sold percentage
Rock 2635 55.39
Metal 619 13.01
Alternative & Punk 492 10.34
Latin 167 3.51
R&B/Soul 159 3.34
Blues 124 2.61
Jazz 121 2.54
Alternative 117 2.46
Easy Listening 74 1.56
Pop 63 1.32
Electronica/Dance 55 1.16
Classical 47 0.99
Reggae 35 0.74
Hip Hop/Rap 33 0.69
Heavy Metal 8 0.17
Soundtrack 5 0.11
TV Shows 2 0.04
Drama 1 0.02

Here, we see that Rock is by far the most popular genre, accounting for over 50% of the total number of tracks sold. Since a high percentage of these purchases likely correspond to older tracks, we reccomend searching for upcoming artists in the second & third most popular genres of Metal and Alternative/Punk.

Next, we'll take a look at how emplyee perforamnce can be analyzed by looking at the employee, customer, and invoice tables.

Analyzing Employee Performance¶

Using the Schema, we can relate each customer to an employee who represents him/her. Hence, after relating each invoice to the customer who made the purchase, we can sum the total number of revenue associated with each sales representative. We do this with the following query.

In [ ]:
%%sql
SELECT 
    e.first_name || " " || e.last_name AS employee_name,
    SUM(i.total) AS total_sales, /* total from invoices (to be grouped by employee name) */
    e.hire_date
FROM 
    invoice AS i
LEFT JOIN
    customer AS c
    ON c.customer_id = i.customer_id
LEFT JOIN 
    employee AS e
    ON e.employee_id = c.support_rep_id /* left join each customer to their employee representative */
GROUP BY employee_name;
 * sqlite:///data/chinook.db
Done.
Out[ ]:
employee_name total_sales hire_date
Jane Peacock 1731.5100000000039 2017-04-01 00:00:00
Margaret Park 1584.0000000000034 2017-05-03 00:00:00
Steve Johnson 1393.920000000002 2017-10-17 00:00:00

From here we see that total sales is similar among each employee. Jane Peacock has the most, but she was also hired the earliest. After correcting for time spent at the company, it seems each employee is performing similarly.

Analyzing Customers by Country¶

To investigate if any countries should be specifically marketed towards, we write a query which tells us number of customers, sales, as well as other factors, for each country.

To begin, we'll create two views:

  • one view which keeps track of the total dollar amount, customer, and country associated to each invoice
  • another view (using the first view) which computes some customer and sales statistics for each country
In [ ]:
%%sql
DROP VIEW IF EXISTS order_by_country;
CREATE VIEW order_by_country AS
    SELECT
        i.total,
        c.customer_id,
        c.first_name || " " || c.last_name AS customer_name,
        c.country
    FROM 
        invoice AS i
    LEFT JOIN
        customer AS c
        ON c.customer_id = i.customer_id;
    
DROP VIEW IF EXISTS stats_by_country;   
CREATE VIEW stats_by_country AS 
    SELECT
        country, 
        COUNT(DISTINCT customer_id) AS number_of_customers,
        ROUND(SUM(total),2) AS total_sales,
        COUNT(customer_id) AS number_of_orders,   
        ROUND(SUM(total)/(SELECT CAST(
                    COUNT(DISTINCT customer_id) AS float)
                FROM order_by_country
                GROUP BY country),2)
    AS sales_per_customer
    FROM order_by_country
    GROUP BY country;
 * sqlite:///data/chinook.db
Done.
Done.
Done.
Done.
Out[ ]:
[]

With these views created, we can retrieve the data we need. A quick look at the above views reveal that many countries only have a single customer. We want to group all countries with only 1 customer into their own category, "other."

In the following query, we retrieve the information we need, while grouping countries with only a single customer into their own row.

In [ ]:
%%sql
SELECT
    country,
    number_of_customers,
    total_sales,
    sales_per_customer,
    sales_per_order
FROM 
    (
    SELECT 
        country,
        number_of_customers,
        total_sales,
        total_sales/number_of_customers AS sales_per_customer, 
        ROUND(total_sales/number_of_orders,2) AS sales_per_order,
        "A" AS sorting
    FROM stats_by_country
    WHERE number_of_customers > 1

    UNION 

    SELECT
        "Other",
        SUM(number_of_customers),
        SUM(total_sales),
        SUM(total_sales)/SUM(number_of_customers), 
        SUM(ROUND(total_sales/number_of_orders,2)) AS sales_per_order,
        "B" AS sorting
    FROM stats_by_country
    WHERE number_of_customers = 1
    )
ORDER BY sorting, total_sales DESC
 * sqlite:///data/chinook.db
Done.
Out[ ]:
country number_of_customers total_sales sales_per_customer sales_per_order
USA 13 1040.49 80.03769230769231 7.94
Canada 8 535.59 66.94875 7.05
Brazil 5 427.68 85.536 7.01
France 5 389.07 77.814 7.78
Germany 4 334.62 83.655 8.16
Czech Republic 2 273.24 136.62 9.11
United Kingdom 3 245.52 81.84 8.77
Portugal 2 185.13 92.565 6.38
India 2 183.15 91.575 8.72
Other 15 1094.94 72.99600000000001 111.66

We see that a plurality of sales come from the "Other" category. This suggests that marketing towards any specific country may not be beneficial. However, Czech Republic, Portugal, and India have high values of sales_per_customer.

Since the sample size for each country is so small, we recommend to not allocate too many resources towards targetting these countries specifically. Running smaller campaigns for these countries may allow a slight increase in sample size, which would give us a better idea as to whether or not larger campaigns would be useful.

Detecting Album Purchases¶

To determine whether the company should add an album purchase feature, we would like to know the percentage of invoices which correspond to an album purchase. To do this, we use the following method:

  • For each row of invoice_line, store the track_id and album_id corresponding to the track purchased
  • For each invoice, compute the number of distinct albums purchased from, and the number of distinct tracks purchased
  • Album purchases correspond to invoices for which the number of distinct album_id's is 1, and the number of distinct tracks purchased is equal to the album length.

This idea is implemented in the following query.

In [ ]:
%%sql
WITH 
    invoice_album AS
        (
        SELECT il.invoice_id,
               il.invoice_line_id,
               il.track_id,
               t.album_id
        FROM 
               invoice_line il 
        LEFT JOIN
               track t
               ON t.track_id = il.track_id
        ),
        
    album_lengths AS 
        (
        SELECT album_id, 
               COUNT(DISTINCT track_id) AS album_length
        FROM track
        GROUP BY album_id
        )

SELECT album_purchase, 
       COUNT(invoice_id) AS number_of_invoices, 
       ROUND(
           CAST(
               COUNT(
                   invoice_id) AS float)/(SELECT 
                                          COUNT(invoice_id) FROM invoice),3) AS percent  
FROM 
    (
    SELECT ia.invoice_id,
       COUNT(DISTINCT ia.album_id) AS num_albums, 
       COUNT(DISTINCT ia.track_id) AS distinct_songs,
       MAX(ia.album_id) AS album_id,
       al.album_length,
       "yes" AS album_purchase
    FROM invoice_album ia
    LEFT JOIN album_lengths al
          ON al.album_id = ia.album_id 
    GROUP BY invoice_id
    HAVING (num_albums = 1) AND (distinct_songs = album_length)

    UNION 

    SELECT ia.invoice_id,
       COUNT(DISTINCT ia.album_id) AS num_albums, 
       COUNT(DISTINCT ia.track_id) AS distinct_songs,
       MAX(ia.album_id) AS album_id,
       al.album_length,
       "no" AS album_purchase
    FROM invoice_album ia
    LEFT JOIN album_lengths al
          ON al.album_id = ia.album_id 
    GROUP BY invoice_id
    HAVING (num_albums != 1) OR (distinct_songs != album_length)
)
GROUP BY album_purchase
 * sqlite:///data/chinook.db
Done.
Out[ ]:
album_purchase number_of_invoices percent
no 500 0.814
yes 114 0.186

Based on this data, it seems that adding an album purchase option would be a worthwhile investment since it would make the purchasing process for 1/5 of all purchases much simpler.