2.10. Extra Practice

This page is meant to help you practise the same core skills you developed in the previous exercises. These exercises are optional and are only meant to provide a little extra practice if you want it.

2.10.1. Set up Python Libraries

As usual, you will need to run this code block to import the relevant Python libraries.

# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas as pd
import seaborn as sns
sns.set_theme(style='white')
import statsmodels.api as sm
import statsmodels.formula.api as smf
import warnings 
warnings.simplefilter('ignore', category=FutureWarning)

2.10.2. Import a dataset to work with

Here we will read in a dataset covering a wide range of variables related to Taylor Swift’s discography. Each row of the dataset represents a song, and the columns include both musical features (derived from Spotify’s audio analysis) and metadata such as the song title, album, release date, and popularity score. Here are some key variables, but feel free to explore the dataset further:

  • name : Title of the song

  • album : Name of the album the song appears on

  • release_date : Date the song was released

  • popularity : Spotify popularity score (0–100)

  • duration_ms : Length of the song in milliseconds

  • danceability : How suitable the track is for dancing (0–1)

  • energy : Intensity and activity level of the track (0–1)

  • acousticness : Degree of acoustic sound (0–1)

  • valence : Positivity or happiness of the musical content (0–1)

  • tempo : Estimated tempo in beats per minute (BPM)

  • loudness : Overall loudness of the track in decibels (dB)

taytay = pd.read_csv("https://raw.githubusercontent.com/SageBoettcher/StatsCourseBook_2026/main/data/taylor_swift_spotify.csv")
display(taytay)
| | Unnamed: 0 | name | album | release_date | track_number | id | uri | acousticness | danceability | energy | instrumentalness | liveness | loudness | speechiness | tempo | valence | popularity | duration_ms |
| 0 | 0 | Fortnight (feat. Post Malone) | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 1 | 6dODwocEuGzHAavXqTbwHv | spotify:track:6dODwocEuGzHAavXqTbwHv | 0.50200 | 0.504 | 0.386 | 0.000015 | 0.0961 | -10.976 | 0.0308 | 192.004 | 0.281 | 82 | 228965 |
| 1 | 1 | The Tortured Poets Department | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 2 | 4PdLaGZubp4lghChqp8erB | spotify:track:4PdLaGZubp4lghChqp8erB | 0.04830 | 0.604 | 0.428 | 0.000000 | 0.1260 | -8.441 | 0.0255 | 110.259 | 0.292 | 79 | 293048 |
| 2 | 2 | My Boy Only Breaks His Favorite Toys | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 3 | 7uGYWMwRy24dm7RUDDhUlD | spotify:track:7uGYWMwRy24dm7RUDDhUlD | 0.13700 | 0.596 | 0.563 | 0.000000 | 0.3020 | -7.362 | 0.0269 | 97.073 | 0.481 | 80 | 203801 |
| 3 | 3 | Down Bad | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 4 | 1kbEbBdEgQdQeLXCJh28pJ | spotify:track:1kbEbBdEgQdQeLXCJh28pJ | 0.56000 | 0.541 | 0.366 | 0.000001 | 0.0946 | -10.412 | 0.0748 | 159.707 | 0.168 | 82 | 261228 |
| 4 | 4 | So Long, London | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 5 | 7wAkQFShJ27V8362MqevQr | spotify:track:7wAkQFShJ27V8362MqevQr | 0.73000 | 0.423 | 0.533 | 0.002640 | 0.0816 | -11.388 | 0.3220 | 160.218 | 0.248 | 80 | 262974 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 577 | 577 | Our Song | Taylor Swift (Deluxe Edition) | 2006-10-24 | 11 | 1j6gmK6u4WNI33lMZ8dC1s | spotify:track:1j6gmK6u4WNI33lMZ8dC1s | 0.11100 | 0.668 | 0.672 | 0.000000 | 0.3290 | -4.931 | 0.0303 | 89.011 | 0.539 | 64 | 201106 |
| 578 | 578 | I'm Only Me When I'm With You | Taylor Swift (Deluxe Edition) | 2006-10-24 | 12 | 7CzxXgQXurKZCyHz9ufbo1 | spotify:track:7CzxXgQXurKZCyHz9ufbo1 | 0.00452 | 0.563 | 0.934 | 0.000807 | 0.1030 | -3.629 | 0.0646 | 143.964 | 0.518 | 56 | 213053 |
| 579 | 579 | Invisible | Taylor Swift (Deluxe Edition) | 2006-10-24 | 13 | 1k3PzDNjg38cWqOvL4M9vq | spotify:track:1k3PzDNjg38cWqOvL4M9vq | 0.63700 | 0.612 | 0.394 | 0.000000 | 0.1470 | -5.723 | 0.0243 | 96.001 | 0.233 | 54 | 203226 |
| 580 | 580 | A Perfectly Good Heart | Taylor Swift (Deluxe Edition) | 2006-10-24 | 14 | 0YgHuReCSPwTXYny7isLja | spotify:track:0YgHuReCSPwTXYny7isLja | 0.00349 | 0.483 | 0.751 | 0.000000 | 0.1280 | -5.726 | 0.0365 | 156.092 | 0.268 | 53 | 220146 |
| 581 | 581 | Teardrops on My Guitar - Pop Version | Taylor Swift (Deluxe Edition) | 2006-10-24 | 15 | 1hxLyjC9D9Jpw6EAPKqWv4 | spotify:track:1hxLyjC9D9Jpw6EAPKqWv4 | 0.04020 | 0.459 | 0.753 | 0.000000 | 0.0863 | -3.827 | 0.0537 | 199.997 | 0.483 | 55 | 179066 |

582 rows × 18 columns
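As a quick sanity check on the variable list, the sketch below rebuilds the ten rows displayed above as a small DataFrame (so it runs without downloading anything) and checks that the 0–1 features really fall in that range and that the song title lives in the `name` column. The name `sample` is just an illustrative stand-in; on the full data you would run the same checks on `taytay`.

```python
import pandas as pd

# The ten rows shown in the display() output above (a subset of the 582 songs)
sample = pd.DataFrame({
    "name": ["Fortnight (feat. Post Malone)", "The Tortured Poets Department",
             "My Boy Only Breaks His Favorite Toys", "Down Bad", "So Long, London",
             "Our Song", "I'm Only Me When I'm With You", "Invisible",
             "A Perfectly Good Heart", "Teardrops on My Guitar - Pop Version"],
    "acousticness": [0.50200, 0.04830, 0.13700, 0.56000, 0.73000,
                     0.11100, 0.00452, 0.63700, 0.00349, 0.04020],
    "danceability": [0.504, 0.604, 0.596, 0.541, 0.423, 0.668, 0.563, 0.612, 0.483, 0.459],
    "energy": [0.386, 0.428, 0.563, 0.366, 0.533, 0.672, 0.934, 0.394, 0.751, 0.753],
    "valence": [0.281, 0.292, 0.481, 0.168, 0.248, 0.539, 0.518, 0.233, 0.268, 0.483],
    "popularity": [82, 79, 80, 82, 80, 64, 56, 54, 53, 55],
})

unit_scale = ["acousticness", "danceability", "energy", "valence"]

# For each 0-1 feature: are all values between 0 and 1 (inclusive)?
in_range = sample[unit_scale].apply(lambda col: col.between(0, 1).all())
print(in_range)

# Popularity is documented as a 0-100 score
print(sample["popularity"].between(0, 100).all())

# Count, mean, std, min, quartiles and max for each numeric column
print(sample[unit_scale + ["popularity"]].describe())
```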

2.10.3. Part 1: Distributions

Let’s have an initial peek into the dataset:

a. Have a look at the distribution of the variable popularity. Can you find any songs that you suspect might be “outliers”?

# Your code here

Your text here

b. Have a look at the distribution of the variable duration_ms. Does the distribution look skewed?

# Your code here

Your text here
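Alongside a histogram, two quick numeric checks for skew are comparing the mean with the median and computing pandas' `Series.skew()`: a mean noticeably above the median and a positive skew both point to a longer right tail. The sketch below uses the ten durations displayed when the dataset was loaded and also converts them to minutes for readability (the `duration_min` column is an illustrative addition, not part of the dataset); on the full `taytay` data the numbers will differ.

```python
import pandas as pd

# duration_ms for the ten rows shown when the dataset was loaded
sample = pd.DataFrame({
    "duration_ms": [228965, 293048, 203801, 261228, 262974,
                    201106, 213053, 203226, 220146, 179066],
})

# Milliseconds are hard to read, so also express duration in minutes
sample["duration_min"] = sample["duration_ms"] / 60_000

mean_ms = sample["duration_ms"].mean()
median_ms = sample["duration_ms"].median()
skewness = sample["duration_ms"].skew()  # > 0 suggests a longer right tail

print(f"mean = {mean_ms:.0f} ms, median = {median_ms:.0f} ms, skew = {skewness:.2f}")
print(sample["duration_min"].describe())
```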

c. Have a look at the distribution of the variable danceability.

# Your code here

d. Let’s add a new variable that classifies each song as either a dance song or not.

To do this, we will check whether each song is above or below the median danceability. Think! What percentage of songs should be in each group?

# True if a song's danceability is above the median, otherwise False
taytay['dancey_song'] = taytay.danceability > taytay.danceability.median()
taytay
| | Unnamed: 0 | name | album | release_date | track_number | id | uri | acousticness | danceability | energy | instrumentalness | liveness | loudness | speechiness | tempo | valence | popularity | duration_ms | dancey_song |
| 0 | 0 | Fortnight (feat. Post Malone) | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 1 | 6dODwocEuGzHAavXqTbwHv | spotify:track:6dODwocEuGzHAavXqTbwHv | 0.50200 | 0.504 | 0.386 | 0.000015 | 0.0961 | -10.976 | 0.0308 | 192.004 | 0.281 | 82 | 228965 | False |
| 1 | 1 | The Tortured Poets Department | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 2 | 4PdLaGZubp4lghChqp8erB | spotify:track:4PdLaGZubp4lghChqp8erB | 0.04830 | 0.604 | 0.428 | 0.000000 | 0.1260 | -8.441 | 0.0255 | 110.259 | 0.292 | 79 | 293048 | True |
| 2 | 2 | My Boy Only Breaks His Favorite Toys | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 3 | 7uGYWMwRy24dm7RUDDhUlD | spotify:track:7uGYWMwRy24dm7RUDDhUlD | 0.13700 | 0.596 | 0.563 | 0.000000 | 0.3020 | -7.362 | 0.0269 | 97.073 | 0.481 | 80 | 203801 | True |
| 3 | 3 | Down Bad | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 4 | 1kbEbBdEgQdQeLXCJh28pJ | spotify:track:1kbEbBdEgQdQeLXCJh28pJ | 0.56000 | 0.541 | 0.366 | 0.000001 | 0.0946 | -10.412 | 0.0748 | 159.707 | 0.168 | 82 | 261228 | False |
| 4 | 4 | So Long, London | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 5 | 7wAkQFShJ27V8362MqevQr | spotify:track:7wAkQFShJ27V8362MqevQr | 0.73000 | 0.423 | 0.533 | 0.002640 | 0.0816 | -11.388 | 0.3220 | 160.218 | 0.248 | 80 | 262974 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 577 | 577 | Our Song | Taylor Swift (Deluxe Edition) | 2006-10-24 | 11 | 1j6gmK6u4WNI33lMZ8dC1s | spotify:track:1j6gmK6u4WNI33lMZ8dC1s | 0.11100 | 0.668 | 0.672 | 0.000000 | 0.3290 | -4.931 | 0.0303 | 89.011 | 0.539 | 64 | 201106 | True |
| 578 | 578 | I'm Only Me When I'm With You | Taylor Swift (Deluxe Edition) | 2006-10-24 | 12 | 7CzxXgQXurKZCyHz9ufbo1 | spotify:track:7CzxXgQXurKZCyHz9ufbo1 | 0.00452 | 0.563 | 0.934 | 0.000807 | 0.1030 | -3.629 | 0.0646 | 143.964 | 0.518 | 56 | 213053 | False |
| 579 | 579 | Invisible | Taylor Swift (Deluxe Edition) | 2006-10-24 | 13 | 1k3PzDNjg38cWqOvL4M9vq | spotify:track:1k3PzDNjg38cWqOvL4M9vq | 0.63700 | 0.612 | 0.394 | 0.000000 | 0.1470 | -5.723 | 0.0243 | 96.001 | 0.233 | 54 | 203226 | True |
| 580 | 580 | A Perfectly Good Heart | Taylor Swift (Deluxe Edition) | 2006-10-24 | 14 | 0YgHuReCSPwTXYny7isLja | spotify:track:0YgHuReCSPwTXYny7isLja | 0.00349 | 0.483 | 0.751 | 0.000000 | 0.1280 | -5.726 | 0.0365 | 156.092 | 0.268 | 53 | 220146 | False |
| 581 | 581 | Teardrops on My Guitar - Pop Version | Taylor Swift (Deluxe Edition) | 2006-10-24 | 15 | 1hxLyjC9D9Jpw6EAPKqWv4 | spotify:track:1hxLyjC9D9Jpw6EAPKqWv4 | 0.04020 | 0.459 | 0.753 | 0.000000 | 0.0863 | -3.827 | 0.0537 | 199.997 | 0.483 | 55 | 179066 | False |

582 rows × 19 columns
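One way to check your answer to the “Think!” question is `value_counts(normalize=True)`, which returns the proportion of rows in each group. The sketch below applies the same median rule to the ten danceability values displayed when the dataset was loaded. Because the comparison is strict (`>`), any song exactly at the median is labelled False, so with an odd number of songs or with ties the split can be slightly uneven. Note also that the median here comes from only ten rows, so one label differs from the full-data output above: “I'm Only Me When I'm With You” (0.563) is False there but True in this sample.

```python
import pandas as pd

# danceability for the ten rows shown when the dataset was loaded
sample = pd.DataFrame({
    "danceability": [0.504, 0.604, 0.596, 0.541, 0.423,
                     0.668, 0.563, 0.612, 0.483, 0.459],
})

# Same rule as above: strictly greater than the median means "dancey"
sample["dancey_song"] = sample["danceability"] > sample["danceability"].median()

# Proportion of songs in each group
shares = sample["dancey_song"].value_counts(normalize=True)
print(shares)
```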

2.10.4. Part 2: Counts

a. How many songs does Taylor have on each of her albums?

Hint: if you are having trouble reading the x-axis labels, try using the command plt.xticks(rotation=90).

# Your code here
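A minimal sketch of one approach: `value_counts()` gives the number of rows per album, and `sns.countplot` shows the same information as a bar chart. It runs on the album labels of the ten rows displayed when the dataset was loaded, which is why each album shows exactly five songs here; the real counts come from running it on `taytay`. `TTPD` and `DEBUT` are just convenience names introduced for this example.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Convenience names for the two album labels visible in the output above
TTPD = "THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY"
DEBUT = "Taylor Swift (Deluxe Edition)"
sample = pd.DataFrame({"album": [TTPD] * 5 + [DEBUT] * 5})

# Number of rows (songs) per album, largest first
album_counts = sample["album"].value_counts()
print(album_counts)

# Same information as a bar chart; rotate the long album names so they are readable
sns.countplot(data=sample, x="album", order=album_counts.index)
plt.xticks(rotation=90)
plt.show()
```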

b. Does Taylor usually balance the number of dancey songs on her albums?

# Your code here
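Since `dancey_song` is True/False, its mean within a group is the proportion of dancey songs, so a grouped mean is one compact way to answer this. The sketch below uses the album labels and the `dancey_song` values printed in the output above (computed with the full-dataset median) for the ten displayed rows; `TTPD` and `DEBUT` are convenience names for this example only.

```python
import pandas as pd

TTPD = "THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY"
DEBUT = "Taylor Swift (Deluxe Edition)"
sample = pd.DataFrame({
    "album": [TTPD] * 5 + [DEBUT] * 5,
    # dancey_song labels as printed in the output above
    "dancey_song": [False, True, True, False, False, True, False, True, False, False],
})

# Songs per album and the share of them that are "dancey" (mean of a boolean = proportion True)
summary = sample.groupby("album")["dancey_song"].agg(n_songs="size", share_dancey="mean")
print(summary)
```

A share near 0.5 for every album would suggest balance; on the full data, albums with only a few songs can swing a long way from 0.5 by chance.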

2.10.5. Part 3: Comparing Across Variables

a. Do Taylor’s dancey songs tend to be more popular?

# Your code here
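A sketch of one approach: summarise popularity separately for dancey and non-dancey songs, and draw a boxplot so you compare whole distributions rather than just averages. It uses the `dancey_song` labels and popularity scores of the ten rows printed in the output above; with so few songs any difference here says little, so run it on `taytay` for a real answer.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sample = pd.DataFrame({
    # dancey_song labels as printed in the output above
    "dancey_song": [False, True, True, False, False, True, False, True, False, False],
    "popularity": [82, 79, 80, 82, 80, 64, 56, 54, 53, 55],
})

# Count, mean and median popularity for dancey (True) vs non-dancey (False) songs
pop_by_dance = sample.groupby("dancey_song")["popularity"].agg(["count", "mean", "median"])
print(pop_by_dance)

# Boxplot of popularity in each group
sns.boxplot(data=sample, x="dancey_song", y="popularity")
plt.show()
```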

b. What is Taylor’s most popular album?

  • Does this change if you use mean, median, or max song popularity?

# Your code here
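One way to compare the three summaries side by side is to pass a list of statistics to `agg` after grouping by album, then use `idxmax()` to find the top album under each one. The sketch runs on the ten rows displayed when the dataset was loaded, where all three statistics agree because only two albums are present; on the full `taytay` data they may not, which is the point of the question. `TTPD` and `DEBUT` are convenience names for this example.

```python
import pandas as pd

TTPD = "THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY"
DEBUT = "Taylor Swift (Deluxe Edition)"
sample = pd.DataFrame({
    "album": [TTPD] * 5 + [DEBUT] * 5,
    "popularity": [82, 79, 80, 82, 80, 64, 56, 54, 53, 55],
})

# One row per album, one column per summary statistic
album_pop = sample.groupby("album")["popularity"].agg(["mean", "median", "max"])
print(album_pop.sort_values("mean", ascending=False))

# The album with the highest value under each statistic
top_album = album_pop.idxmax()
print(top_album)
```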

c. Does Taylor tend to place her most popular tracks in a certain position on an album (track_number)?

# Your code here
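A possible approach is to average popularity at each `track_number`. One complication is that track position is tangled up with album: some albums are far more popular overall, and longer editions are the only ones with high track numbers. The sketch below therefore also computes popularity relative to each album's own mean (the `pop_vs_album` column is an illustrative choice, not a method from the course). It uses the ten rows displayed when the dataset was loaded.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sample = pd.DataFrame({
    "album": ["THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY"] * 5
             + ["Taylor Swift (Deluxe Edition)"] * 5,
    "track_number": [1, 2, 3, 4, 5, 11, 12, 13, 14, 15],
    "popularity": [82, 79, 80, 82, 80, 64, 56, 54, 53, 55],
})

# Raw average popularity at each track position
pop_by_track = sample.groupby("track_number")["popularity"].mean()
print(pop_by_track)

# Popularity relative to the album's own average, so one very popular
# album does not dominate the comparison between positions
album_mean = sample.groupby("album")["popularity"].transform("mean")
sample["pop_vs_album"] = sample["popularity"] - album_mean
pos_effect = sample.groupby("track_number")["pop_vs_album"].mean()
print(pos_effect)

sns.barplot(data=sample, x="track_number", y="pop_vs_album")
plt.show()
```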

d. Is there a relationship between valence and popularity?

# Your code here
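A sketch of one approach: a scatter plot with a fitted regression line, plus the Pearson correlation from `scipy.stats.pearsonr` (the `stats` module imported in the set-up cell). It runs on the ten rows displayed when the dataset was loaded; a correlation from ten songs is not meaningful on its own, so treat this as a template for `taytay`.

```python
import pandas as pd
import scipy.stats as stats
import seaborn as sns
import matplotlib.pyplot as plt

sample = pd.DataFrame({
    "valence": [0.281, 0.292, 0.481, 0.168, 0.248, 0.539, 0.518, 0.233, 0.268, 0.483],
    "popularity": [82, 79, 80, 82, 80, 64, 56, 54, 53, 55],
})

# Scatter plot with a least-squares regression line
sns.regplot(data=sample, x="valence", y="popularity")
plt.show()

# Pearson correlation coefficient and its p-value
r, p = stats.pearsonr(sample["valence"], sample["popularity"])
print(f"r = {r:.3f}, p = {p:.3f}")
```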

2.10.6. Part 4: Timeseries

Let’s start by adding a new variable called release_year to our dataset, based on release_date.

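# Take the first four characters of release_date (e.g. '2024') and convert them to an integer year
taytay['release_year'] = taytay['release_date'].str[:4].astype(int)
taytay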
| | Unnamed: 0 | name | album | release_date | track_number | id | uri | acousticness | danceability | energy | instrumentalness | liveness | loudness | speechiness | tempo | valence | popularity | duration_ms | dancey_song | release_year |
| 0 | 0 | Fortnight (feat. Post Malone) | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 1 | 6dODwocEuGzHAavXqTbwHv | spotify:track:6dODwocEuGzHAavXqTbwHv | 0.50200 | 0.504 | 0.386 | 0.000015 | 0.0961 | -10.976 | 0.0308 | 192.004 | 0.281 | 82 | 228965 | False | 2024 |
| 1 | 1 | The Tortured Poets Department | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 2 | 4PdLaGZubp4lghChqp8erB | spotify:track:4PdLaGZubp4lghChqp8erB | 0.04830 | 0.604 | 0.428 | 0.000000 | 0.1260 | -8.441 | 0.0255 | 110.259 | 0.292 | 79 | 293048 | True | 2024 |
| 2 | 2 | My Boy Only Breaks His Favorite Toys | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 3 | 7uGYWMwRy24dm7RUDDhUlD | spotify:track:7uGYWMwRy24dm7RUDDhUlD | 0.13700 | 0.596 | 0.563 | 0.000000 | 0.3020 | -7.362 | 0.0269 | 97.073 | 0.481 | 80 | 203801 | True | 2024 |
| 3 | 3 | Down Bad | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 4 | 1kbEbBdEgQdQeLXCJh28pJ | spotify:track:1kbEbBdEgQdQeLXCJh28pJ | 0.56000 | 0.541 | 0.366 | 0.000001 | 0.0946 | -10.412 | 0.0748 | 159.707 | 0.168 | 82 | 261228 | False | 2024 |
| 4 | 4 | So Long, London | THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY | 2024-04-19 | 5 | 7wAkQFShJ27V8362MqevQr | spotify:track:7wAkQFShJ27V8362MqevQr | 0.73000 | 0.423 | 0.533 | 0.002640 | 0.0816 | -11.388 | 0.3220 | 160.218 | 0.248 | 80 | 262974 | False | 2024 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 577 | 577 | Our Song | Taylor Swift (Deluxe Edition) | 2006-10-24 | 11 | 1j6gmK6u4WNI33lMZ8dC1s | spotify:track:1j6gmK6u4WNI33lMZ8dC1s | 0.11100 | 0.668 | 0.672 | 0.000000 | 0.3290 | -4.931 | 0.0303 | 89.011 | 0.539 | 64 | 201106 | True | 2006 |
| 578 | 578 | I'm Only Me When I'm With You | Taylor Swift (Deluxe Edition) | 2006-10-24 | 12 | 7CzxXgQXurKZCyHz9ufbo1 | spotify:track:7CzxXgQXurKZCyHz9ufbo1 | 0.00452 | 0.563 | 0.934 | 0.000807 | 0.1030 | -3.629 | 0.0646 | 143.964 | 0.518 | 56 | 213053 | False | 2006 |
| 579 | 579 | Invisible | Taylor Swift (Deluxe Edition) | 2006-10-24 | 13 | 1k3PzDNjg38cWqOvL4M9vq | spotify:track:1k3PzDNjg38cWqOvL4M9vq | 0.63700 | 0.612 | 0.394 | 0.000000 | 0.1470 | -5.723 | 0.0243 | 96.001 | 0.233 | 54 | 203226 | True | 2006 |
| 580 | 580 | A Perfectly Good Heart | Taylor Swift (Deluxe Edition) | 2006-10-24 | 14 | 0YgHuReCSPwTXYny7isLja | spotify:track:0YgHuReCSPwTXYny7isLja | 0.00349 | 0.483 | 0.751 | 0.000000 | 0.1280 | -5.726 | 0.0365 | 156.092 | 0.268 | 53 | 220146 | False | 2006 |
| 581 | 581 | Teardrops on My Guitar - Pop Version | Taylor Swift (Deluxe Edition) | 2006-10-24 | 15 | 1hxLyjC9D9Jpw6EAPKqWv4 | spotify:track:1hxLyjC9D9Jpw6EAPKqWv4 | 0.04020 | 0.459 | 0.753 | 0.000000 | 0.0863 | -3.827 | 0.0537 | 199.997 | 0.483 | 55 | 179066 | False | 2006 |

582 rows × 20 columns
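The string slice above works because every date shown starts with a four-digit year. An alternative is to parse the column with `pd.to_datetime` and take `.dt.year`; the sketch below checks that the two approaches agree on the release dates of the ten displayed rows. One design difference: with an explicit `format="%Y-%m-%d"`, any entry not in full year-month-day form would raise an error, whereas the slice would still succeed for any value beginning with a four-digit year.

```python
import pandas as pd

# release_date values of the ten rows shown in the output above
sample = pd.DataFrame({"release_date": ["2024-04-19"] * 5 + ["2006-10-24"] * 5})

# Approach used above: slice the first four characters and convert to int
year_from_string = sample["release_date"].str[:4].astype(int)

# Alternative: parse the dates properly, then take the year component
parsed = pd.to_datetime(sample["release_date"], format="%Y-%m-%d")
year_from_datetime = parsed.dt.year

print((year_from_string == year_from_datetime).all())
```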

a. Now let’s track how Taylor’s songs have changed over the years. Start by tracking the popularity of her songs across the years.

#Your code here

b. Okay, now the valence.

#Your code here
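When comparing yearly averages, it helps to see how many songs and how much spread sit behind each mean, since a year with few songs can have an unstable average. A minimal sketch on the valence values of the ten rows displayed earlier:

```python
import pandas as pd

sample = pd.DataFrame({
    "release_year": [2024] * 5 + [2006] * 5,
    "valence": [0.281, 0.292, 0.481, 0.168, 0.248, 0.539, 0.518, 0.233, 0.268, 0.483],
})

# Number of songs, mean valence and standard deviation for each release year
valence_by_year = sample.groupby("release_year")["valence"].agg(["count", "mean", "std"])
print(valence_by_year)
```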

c. And how about the danceability?

#Your code here
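For danceability you can look at the raw score and at the `dancey_song` flag created earlier. The sketch below uses pandas named aggregation to get both the mean danceability and the share of dancey songs per year, then a strip plot to show every song. It uses the ten rows displayed earlier, with `dancey_song` taken from the output above; on `taytay` you would get one row per release year.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sample = pd.DataFrame({
    "release_year": [2024] * 5 + [2006] * 5,
    "danceability": [0.504, 0.604, 0.596, 0.541, 0.423, 0.668, 0.563, 0.612, 0.483, 0.459],
    # dancey_song labels as printed in the output above
    "dancey_song": [False, True, True, False, False, True, False, True, False, False],
})

# Mean danceability and share of above-median ("dancey") songs per year
dance_by_year = sample.groupby("release_year").agg(
    mean_danceability=("danceability", "mean"),
    share_dancey=("dancey_song", "mean"),
)
print(dance_by_year)

# One point per song, so the spread within each year is visible
sns.stripplot(data=sample, x="release_year", y="danceability")
plt.show()
```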

2.10.7. Part 5: Open Exploration

Are there any other interesting variables we should consider?
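One possible starting point for open exploration is a correlation matrix of the numeric features, which highlights pairs of variables worth a closer look (for example, whether energy and loudness move together). The sketch below builds it from the ten rows displayed when the dataset was loaded, so the coefficients themselves should not be interpreted; on the full data you would pass the numeric columns of `taytay` instead.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Numeric features of the ten rows shown when the dataset was loaded
sample = pd.DataFrame({
    "acousticness": [0.50200, 0.04830, 0.13700, 0.56000, 0.73000,
                     0.11100, 0.00452, 0.63700, 0.00349, 0.04020],
    "danceability": [0.504, 0.604, 0.596, 0.541, 0.423, 0.668, 0.563, 0.612, 0.483, 0.459],
    "energy": [0.386, 0.428, 0.563, 0.366, 0.533, 0.672, 0.934, 0.394, 0.751, 0.753],
    "loudness": [-10.976, -8.441, -7.362, -10.412, -11.388,
                 -4.931, -3.629, -5.723, -5.726, -3.827],
    "tempo": [192.004, 110.259, 97.073, 159.707, 160.218,
              89.011, 143.964, 96.001, 156.092, 199.997],
    "valence": [0.281, 0.292, 0.481, 0.168, 0.248, 0.539, 0.518, 0.233, 0.268, 0.483],
    "popularity": [82, 79, 80, 82, 80, 64, 56, 54, 53, 55],
})

# Pairwise Pearson correlations between all columns
corr = sample.corr()

# Heatmap of the matrix; annot=True writes each coefficient in its cell
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.show()
```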