1.11. Extra Practice#
These exercises are meant to help you practise the same core skills you developed in the previous exercises. Completing them is optional; they are only here to provide a little extra practice if you want it.
1.11.1. Set up Python Libraries#
As usual, you will need to run this code block to import the relevant Python libraries.
# Set-up Python libraries - you need to run this but you don't need to change it
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import pandas as pd
import seaborn as sns
sns.set_theme(style='white')
import statsmodels.api as sm
import statsmodels.formula.api as smf
import warnings
warnings.simplefilter('ignore', category=FutureWarning)
1.11.2. Import a dataset to work with#
Here we will read in a data set which covers a wide range of variables related to sleep and daily habits.
Person ID: An identifier for each individual.
Gender: The preferred gender identity of the person.
Age: The age of the person in years.
Occupation: The occupation or profession of the person.
Sleep Duration (hours): The number of hours the person sleeps per day.
Quality of Sleep (scale: 1-10): A subjective rating of the quality of sleep, ranging from 1 to 10.
Physical Activity Level (minutes/day): The number of minutes the person engages in physical activity daily.
Stress Level (scale: 1-10): A subjective rating of the stress level experienced by the person, ranging from 1 to 10.
BMI Category: The BMI category of the person (e.g., Underweight, Normal, Overweight).
Blood Pressure (systolic/diastolic): The blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure.
Heart Rate (bpm): The resting heart rate of the person in beats per minute.
Daily Steps: The number of steps the person takes per day.
Sleep Disorder: The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea).
sleep = pd.read_csv("https://raw.githubusercontent.com/SageBoettcher/StatsCourseBook_2026/main/data/sleep_health_data.csv")
display(sleep)
| | PersonID | Gender | Age | Occupation | SleepDuration | QualityofSleep | PhysicalActivityLevel | StressLevel | BMICategory | BloodPressure | HeartRate | DailySteps | SleepDisorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 27 | Software Engineer | 6.1 | 6 | 42 | 6 | Overweight | 126/83 | 77 | 4200 | NaN |
| 1 | 2 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN |
| 2 | 3 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN |
| 3 | 4 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| 4 | 5 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 369 | 370 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 370 | 371 | Female | 59 | Nurse | 8.0 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 371 | 372 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 372 | 373 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 373 | 374 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
374 rows × 13 columns
1.11.3. Exercises#
In the following questions, you’ll use descriptive statistics and indexing to explore questions about sleep and health.
When you are asked to calculate a value (for example, a mean or standard deviation) rather than produce a full table, you should report your answer in words in the text box below the code block. This is exactly how you would do it in a written report.
When the question asks you to “comment”, you are being asked to interpret the data. That is, explain what you notice, what patterns stand out, or what the numbers might mean in context. Use plain English and discuss your ideas with your tutor and classmates. Developing the skill of turning numbers into insight is one of the most important parts of learning data analysis.
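As a reminder of the descriptive-statistics tools mentioned above, here is a minimal sketch on a small made-up data frame. The name `toy`, its columns `Group` and `Score`, and all of its values are invented for illustration and are not taken from the sleep data set. It shows a column mean, a mean within each level of a categorical variable, and a standard deviation.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    "Group": ["A", "A", "B", "B"],
    "Score": [2.0, 4.0, 6.0, 10.0],
})

# Mean of a single column
overall_mean = toy["Score"].mean()

# Mean of that column within each level of a categorical variable
group_means = toy.groupby("Group")["Score"].mean()

# Sample standard deviation (pandas divides by n-1 by default)
overall_sd = toy["Score"].std()

print(overall_mean)
print(group_means)
print(overall_sd)
```

The same pattern should carry over to the `sleep` data frame once you substitute the relevant column names.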
Part 1: Sleep Duration#
a. What is the average sleep duration across all participants?
# Your code here
your text here
b. Compare the mean sleep duration across genders.
# Your code here
your text here
c. Comment on your findings.
Part 2: Stress and Activity#
a. What are the average physical activity level, sleep duration, and daily steps for participants at each stress level?
# Your code here
b. Split the data set into high-stress and low-stress individuals.
# Your code here
#highstress =
#lowstress =
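One possible way to split a data frame into two groups is boolean indexing with a cutoff. The sketch below uses an invented data frame (`toy`, with made-up columns `Rating` and `Minutes`) and an arbitrary cutoff of 5; where to draw the line between "high" and "low" in the real data is a judgment call that you should make and justify yourself.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    "Rating": [2, 7, 5, 9, 3],
    "Minutes": [60, 30, 45, 20, 75],
})

cutoff = 5  # arbitrary threshold chosen for this example

# Boolean indexing: keep only the rows where the condition is True
high = toy[toy["Rating"] > cutoff]
low = toy[toy["Rating"] <= cutoff]

# Compare a second variable between the two subsets
high_mean = high["Minutes"].mean()
low_mean = low["Minutes"].mean()

print(len(high), len(low))
print(high_mean, low_mean)
```

Using `>` for one group and `<=` for the other ensures that every row ends up in exactly one of the two subsets.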
c. Which group is more physically active?
# Your code here
Part 3: Age and Sleep#
a. What is the relationship between Age and Sleep?
# Your code here
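One common way to summarise the relationship between two numeric variables is a correlation coefficient. The sketch below computes Pearson's r on invented data (the data frame `toy` and its columns `x` and `y` are hypothetical); it is only one option, and a scatter plot, for example with `sns.scatterplot`, is a natural companion because it shows whether the relationship looks linear at all.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 5, 4, 5],
})

# Pearson correlation between two numeric columns (the pandas default)
r = toy["x"].corr(toy["y"])

print(r)
```

A value of r near +1 or -1 suggests a strong linear relationship, and a value near 0 suggests little or no linear relationship.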
b. What sort of things might explain this relationship?
your text here
Part 4: Open Exploration#
What other relationships might be interesting to explore?