1.11. Extra Practice#

These exercises let you practise the same core skills you developed in the previous exercises. Completing them is optional; they are only meant to provide a little extra practice if you want it.

1.11.1. Set up Python Libraries#

As usual, you will need to run this code block to import the relevant Python libraries.

# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas as pd
import seaborn as sns
sns.set_theme(style='white')
import statsmodels.api as sm
import statsmodels.formula.api as smf
import warnings 
warnings.simplefilter('ignore', category=FutureWarning)

1.11.2. Import a dataset to work with#

Here we will read in a data set which covers a wide range of variables related to sleep and daily habits.

  • Person ID: An identifier for each individual.

  • Gender: The preferred gender identity of the person.

  • Age: The age of the person in years.

  • Occupation: The occupation or profession of the person.

  • Sleep Duration (hours): The number of hours the person sleeps per day.

  • Quality of Sleep (scale: 1-10): A subjective rating of the quality of sleep, ranging from 1 to 10.

  • Physical Activity Level (minutes/day): The number of minutes the person engages in physical activity daily.

  • Stress Level (scale: 1-10): A subjective rating of the stress level experienced by the person, ranging from 1 to 10.

  • BMI Category: The BMI category of the person (e.g., Underweight, Normal, Overweight).

  • Blood Pressure (systolic/diastolic): The blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure.

  • Heart Rate (bpm): The resting heart rate of the person in beats per minute.

  • Daily Steps: The number of steps the person takes per day.

  • Sleep Disorder: The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea).

sleep = pd.read_csv("https://raw.githubusercontent.com/SageBoettcher/StatsCourseBook_2026/main/data/sleep_health_data.csv")
display(sleep)
     PersonID  Gender  Age  Occupation            SleepDuration  QualityofSleep  PhysicalActivityLevel  StressLevel  BMICategory  BloodPressure  HeartRate  DailySteps  SleepDisorder
0    1         Male    27   Software Engineer     6.1            6               42                     6            Overweight   126/83         77         4200        NaN
1    2         Male    28   Doctor                6.2            6               60                     8            Normal       125/80         75         10000       NaN
2    3         Male    28   Doctor                6.2            6               60                     8            Normal       125/80         75         10000       NaN
3    4         Male    28   Sales Representative  5.9            4               30                     8            Obese        140/90         85         3000        Sleep Apnea
4    5         Male    28   Sales Representative  5.9            4               30                     8            Obese        140/90         85         3000        Sleep Apnea
...  ...       ...     ...  ...                   ...            ...             ...                    ...          ...          ...            ...        ...         ...
369  370       Female  59   Nurse                 8.1            9               75                     3            Overweight   140/95         68         7000        Sleep Apnea
370  371       Female  59   Nurse                 8.0            9               75                     3            Overweight   140/95         68         7000        Sleep Apnea
371  372       Female  59   Nurse                 8.1            9               75                     3            Overweight   140/95         68         7000        Sleep Apnea
372  373       Female  59   Nurse                 8.1            9               75                     3            Overweight   140/95         68         7000        Sleep Apnea
373  374       Female  59   Nurse                 8.1            9               75                     3            Overweight   140/95         68         7000        Sleep Apnea

374 rows × 13 columns
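
Before starting the exercises, it can help to check what was actually loaded. The sketch below is one way to do that. It uses a small hypothetical stand-in DataFrame called `toy` (made-up values, a few of the same column names) so that it runs without the download; the same calls should work on `sleep`. In the preview above, SleepDisorder shows NaN for some participants, which presumably corresponds to the "None" category in the variable list. Counting missing values per column makes this visible.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `sleep` DataFrame (values are made up)
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "SleepDuration": [6.1, 7.5, 8.0, 5.9],
    "SleepDisorder": [np.nan, "Insomnia", np.nan, "Sleep Apnea"],
})

n_rows, n_cols = toy.shape   # number of rows and columns
missing = toy.isna().sum()   # count of missing values in each column

# Frequency of each category, keeping NaN as its own row
counts = toy["SleepDisorder"].value_counts(dropna=False)

print(n_rows, n_cols)
print(missing)
print(counts)
```

If you would rather treat the missing entries as an explicit category, `fillna` with a label of your choice is one option, but that is an assumption about what NaN means in this data set.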

1.11.3. Exercises#

In the following questions, you’ll use descriptive statistics and indexing to explore questions about sleep and health.

When you are asked to calculate a value (for example, a mean or standard deviation) rather than produce a full table, you should report your answer in words in the text box below the code block. This is exactly how you would do it in a written report.
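
As a rough illustration of going from a calculated value to a written sentence, here is a minimal sketch using made-up numbers. The variable names (`scores`, `mean_score`, `sd_score`) are hypothetical and not part of the exercises.

```python
import pandas as pd

# Hypothetical scores, made up for illustration
scores = pd.Series([4.0, 6.0, 8.0])

mean_score = scores.mean()  # arithmetic mean
sd_score = scores.std()     # sample standard deviation (pandas uses ddof=1 by default)

# Turn the numbers into a sentence, rounded to one decimal place
report = f"The mean score was {mean_score:.1f} (SD = {sd_score:.1f})."
print(report)
```

Rounding to a sensible number of decimal places, and stating the units where there are any, keeps the written answer readable.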

When the question asks you to “comment”, you are being asked to interpret the data. That is, explain what you notice, what patterns stand out, or what the numbers might mean in context. Use plain English and discuss your ideas with your tutor and classmates. Developing the skill of turning numbers into insight is one of the most important parts of learning data analysis.

Part 1: Sleep Duration#

a. What is the average sleep duration across all participants?

# Your code here

your text here

b. Compare the mean sleep duration across genders (the Gender column).

# Your code here

your text here

c. Comment on your findings.

Part 2: Stress and Activity#

a. What are the average physical activity level, sleep duration, and daily steps at each stress level?

# Your code here
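
If you need a reminder of how to average several columns within groups, the sketch below shows one common pattern on a hypothetical toy table. The column names `group`, `minutes` and `steps` are made up; it is not the answer to the question.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "minutes": [30, 50, 60, 80],
    "steps": [3000, 5000, 7000, 9000],
})

# Mean of the selected columns within each level of "group"
group_means = toy.groupby("group")[["minutes", "steps"]].mean()
print(group_means)
```

The result has one row per group and one column per averaged variable, which makes it easy to scan for trends across groups.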

b. Split the data set into high-stress and low-stress individuals.

# Your code here
#highstress = 
#lowstress = 
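
The exercise does not say where to draw the line between high and low, so that choice is yours to make and justify. As a reminder of the indexing pattern only, here is a minimal sketch on hypothetical toy data. The column names and the cut-off of 5 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration
toy = pd.DataFrame({
    "rating": [3, 8, 5, 7, 4],
    "minutes": [75, 30, 60, 45, 90],
})

cutoff = 5  # hypothetical cut-off; pick and justify your own

# Boolean indexing: keep only the rows where the condition is True
high = toy[toy["rating"] > cutoff]
low = toy[toy["rating"] <= cutoff]

print(len(high), len(low))
print(high["minutes"].mean(), low["minutes"].mean())
```

Checking that the two group sizes add up to the total number of rows is a quick way to confirm that nobody was dropped or counted twice.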

c. Which group is more physically active?

# Your code here

Part 3: Age and Sleep#

a. What is the relationship between age and sleep (for example, sleep duration or quality of sleep)?

# Your code here
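
One possible way to summarise a relationship between two numeric variables is a correlation coefficient. The sketch below uses hypothetical toy columns `x` and `y` with made-up values; whether a correlation is the right summary for this question is for you to judge.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2.1, 3.9, 6.2, 7.8, 10.1],
})

# Pearson correlation between the two columns (the pandas default method)
r = toy["x"].corr(toy["y"])
print(round(r, 3))
```

A scatter plot, for example with `sns.scatterplot`, is a useful companion here, because a single coefficient can hide a curved relationship or the influence of a few unusual points.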

b. What sort of things might explain this relationship?

your text here

Part 4: Open Exploration#

What other relationships might be interesting to explore?