Skills lab 05: Data viz and t-test
Setup
Packages and data
Load the necessary packages:
library(tidyverse)
library(ggrain)
Data
Load the data and save it into an object called smarvus_tib:
smarvus_tib <- readr::read_csv("data/smarvus_data.csv")
Codebook
ricomisc::rstudio_viewer("smarvus_codebook.html", "data")
Today’s Task
You are working on an ongoing research project, perhaps on a placement year or for your final-year project. The project is investigating accessibility at university, specifically how a diagnosis of a specific learning difference (SpLD) affects students’ experiences. A key research question is whether students with and without an SpLD diagnosis differ in their average levels of adverse experiences, such as anxiety, worry, and fear of being perceived negatively.
The previous research assistant has already created a plot, but it’s not the best 😬 and once again, they haven’t documented their code! They’ve also run the analysis, but their time on the project ended before they got around to the formal write-up.
Our tasks are:
- Recreate the plot.
- Make improvements to the plot, with clearly documented code.
- Recreate the analysis.
- Report the analysis in full, in APA style.
Task 1: Plot
Look at the plot that the previous RA created.
- What are the variables? What levels or values do they have?
- What information is represented on the plot?
- Is there any information that should be added or removed?
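To explore the variables before looking at the plot, one option is to count how often each level of `spld` occurs (including missing values) and summarise `bfne`. The sketch below runs this on `toy_tib`, a small made-up tibble standing in for `smarvus_tib`; its values are invented, so run the same calls on the real data to see the actual levels and counts.

```r
library(dplyr)

# Hypothetical stand-in for smarvus_tib: values are invented for illustration
toy_tib <- tibble::tibble(
  spld = c("No", "Yes", "No", NA, "Yes", "No"),
  bfne = c(3.2, 3.5, 2.9, 3.1, NA, 3.4)
)

# Count each level of spld; missing values appear as their own NA row
spld_counts <- toy_tib |> dplyr::count(spld)
spld_counts

# Minimum, quartiles, mean, maximum and number of NAs for the outcome
summary(toy_tib$bfne)
```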
# Take the smarvus_tib dataset, and then
smarvus_tib |>
  # set up a plot: spld on the x-axis, bfne on the y-axis, a different colour for each level of spld
  ggplot(aes(x = spld, y = bfne, colour = spld)) +
  # draw a raincloud plot
  ggrain::geom_rain() +
  # add a summary statistic element of means and CIs, in black
  stat_summary(fun.data = mean_cl_normal, color = "black") +
  # relabel the x and y axes
  labs(x = "Diagnosis of Specific Learning Difference", y = "Mean Score (BFNE)") +
  # apply a theme
  theme_bw()
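The black point and line added by `stat_summary(fun.data = mean_cl_normal)` show each group's mean with a 95% confidence interval based on the t distribution. `mean_cl_normal` is ggplot2's wrapper around `Hmisc::smean.cl.normal()`, so the Hmisc package needs to be installed for it to work. As a sketch of the arithmetic only, here is the same calculation done by hand in base R for a made-up vector of scores (not SMARVUS data).

```r
# Hypothetical BFNE scores for one group (invented values)
scores <- c(3.0, 3.4, 2.8, 3.6, 3.2)

n <- length(scores)
m <- mean(scores)
se <- sd(scores) / sqrt(n)          # standard error of the mean
t_crit <- qt(0.975, df = n - 1)     # critical t value for a 95% interval

# Mean plus lower and upper 95% CI limits, named as mean_cl_normal names them
ci <- c(y = m, ymin = m - t_crit * se, ymax = m + t_crit * se)
round(ci, 2)
```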
Task 2: Doing Better
To-Do List
- Remove NAs
- We can’t know why these values are missing, so NA isn’t a meaningful group to compare
- Remove superfluous legend
- Change colours
- Keep accessible colours in mind!
- One light, one dark; avoid contrasting red and green
Implementation
smarvus_tib |>
  # Temporarily remove NAs. Note that this change is NOT assigned to the dataset (yet!)
  dplyr::filter(!is.na(spld)) |>
  ggplot(aes(x = spld, y = bfne, colour = spld)) +
  ggrain::geom_rain() +
  stat_summary(fun.data = mean_cl_normal, color = "black") +
  labs(x = "Diagnosis of Specific Learning Difference", y = "Mean Score (BFNE)") +
  # Add custom colours from the Sussex palette using hex codes
  scale_color_manual(values = c("#003B4A",  # Sussex: Ocean Teal
                                "#F5C48A")  # Sussex: Shell Peach
                     ) +
  theme_bw() +
  # Remove the unneeded legend
  theme(legend.position = "none")
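Because the filtered data are piped straight into `ggplot()`, `smarvus_tib` itself still contains the rows with missing `spld`. The sketch below contrasts the two cases on `toy_tib`, a made-up tibble standing in for the real data: only reassigning with `<-` actually changes the object. Whether, and when, to overwrite the dataset is a decision for your own analysis.

```r
library(dplyr)

# Hypothetical data standing in for smarvus_tib (values invented)
toy_tib <- tibble::tibble(
  spld = c("No", "Yes", NA, "Yes"),
  bfne = c(3.0, 3.6, 2.8, 3.3)
)

# Filtering inside a pipe, as in the plot code, only passes a filtered copy onward
n_filtered <- toy_tib |> dplyr::filter(!is.na(spld)) |> nrow()
n_filtered        # 3
nrow(toy_tib)     # still 4: toy_tib itself is unchanged

# Assigning the result back with <- is what makes the change stick
toy_tib <- toy_tib |> dplyr::filter(!is.na(spld))
nrow(toy_tib)     # 3
```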
We could improve this plot further with a few more tweaks. Some things to consider are:
- Changing the order of categories; it feels more natural, and matches the colours better, to have Yes before No
- Be careful: this is more than just relabeling the axes!
- Changing the appearance of the points
- Using the fill argument to make the plot look nicer
Some (but not all!) of these changes have hints in the tutorial, but there are no solutions for this task. What do you think makes a good plot?
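As a general hint about ordering (not a solution to the plot): the order of categories on a discrete axis, and which colour each category receives from unnamed `values` in `scale_color_manual()`, follows the order of the factor levels, not the axis labels. If `spld` is stored as text (which is what `readr::read_csv()` does by default), the levels are sorted alphabetically, so No comes first. The sketch below uses a made-up vector to show how setting `levels` changes the order without changing the data values.

```r
# Hypothetical vector standing in for the spld column
spld <- c("No", "Yes", "No", "Yes")

# Text converted to a factor gets alphabetically sorted levels
levels(factor(spld))          # "No"  "Yes"

# Setting the levels explicitly puts Yes first; the values themselves are unchanged
spld_reordered <- factor(spld, levels = c("Yes", "No"))
levels(spld_reordered)        # "Yes" "No"
as.character(spld_reordered)  # "No"  "Yes" "No"  "Yes"
```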
Task 3: Doing the Analysis
The previous research assistant’s notes just have an image of this output:
Let’s first try to recreate the results.
t.test(bfne ~ spld, data = smarvus_tib)
Welch Two Sample t-test
data: bfne by spld
t = -0.4567, df = 339.15, p-value = 0.6482
alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
95 percent confidence interval:
-0.1734689 0.1080943
sample estimates:
mean in group No mean in group Yes
3.226893 3.259580
- Why weren’t NAs a problem this time? 🤔
- What’s the mean difference between the two spld conditions?
- Is this difference statistically significant?
round(3.226893 - 3.259580, 2)
[1] -0.03
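Rather than copying numbers from the printed output, the mean difference can be computed from the `t.test()` result itself, which is a list with elements such as `estimate` (the group means), `statistic`, `parameter` (the df), `p.value` and `conf.int`. The sketch below uses invented scores in `toy_df`, so its numbers will not match the SMARVUS output above.

```r
# Hypothetical data: invented BFNE scores for two groups
toy_df <- data.frame(
  spld = rep(c("No", "Yes"), each = 5),
  bfne = c(3.0, 3.4, 2.8, 3.6, 3.2, 3.1, 3.5, 3.3, 2.9, 3.7)
)

res <- t.test(bfne ~ spld, data = toy_df)

# Group means in level order: "mean in group No", then "mean in group Yes"
res$estimate

# Mean difference in the same direction as the t-test: No minus Yes
mean_diff <- unname(res$estimate[1] - res$estimate[2])
round(mean_diff, 2)   # -0.1 for these invented scores
```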
A complete writeup of the results should include:
- A brief description of the design and the analysis that was conducted (one sentence)
- A plain-language interpretation of the statistical results, along with the statistical reporting
- The key descriptive information, for example the means or counts in each group.
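For the descriptive part of the write-up, a grouped summary gives the count, mean and standard deviation of BFNE in each spld group. The sketch below assumes the same column names as `smarvus_tib` but runs on `toy_tib`, a small invented tibble; it drops incomplete rows first, so the counts describe only participants with both values.

```r
library(dplyr)

# Hypothetical data standing in for smarvus_tib (values invented, with some NAs)
toy_tib <- tibble::tibble(
  spld = c("No", "No", "No", "Yes", "Yes", NA),
  bfne = c(3.0, 3.4, NA, 3.1, 3.5, 2.7)
)

descriptives <- toy_tib |>
  # Keep complete cases only, so n counts people with both spld and bfne
  dplyr::filter(!is.na(spld), !is.na(bfne)) |>
  dplyr::group_by(spld) |>
  dplyr::summarise(
    n = dplyr::n(),
    mean_bfne = mean(bfne),
    sd_bfne = sd(bfne)
  )

descriptives
```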
For the analysis, the statistical reporting will look like:
test_statistic_name(degrees_of_freedom) = test_statistic_value, p = p_value, Mdiff = difference_in_means, 95% CI [CI_lower, CI_upper]
For this analysis, that gives:
t(339.15) = -0.46, p = .648, Mdiff = -0.03, 95% CI [-0.17, 0.11]
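The reporting template can also be filled in directly from the `t.test()` result with `sprintf()`, which avoids transcription errors. This is a sketch, not a standard function: `format_p()` is a small hypothetical helper written here to drop the leading zero from p (APA style) and to report very small values as p < .001. It runs on invented data (`toy_df`); applied to the SMARVUS result, the same formatting should reproduce the line above.

```r
# Hypothetical data: invented BFNE scores for two groups
toy_df <- data.frame(
  spld = rep(c("No", "Yes"), each = 5),
  bfne = c(3.0, 3.4, 2.8, 3.6, 3.2, 3.1, 3.5, 3.3, 2.9, 3.7)
)
res <- t.test(bfne ~ spld, data = toy_df)

# Hypothetical helper: APA drops the leading zero for p, and floors tiny values
format_p <- function(p) {
  if (p < .001) return("< .001")
  paste("=", sub("^0", "", sprintf("%.3f", p)))
}

# Mean difference in the same direction as the t-test: No minus Yes
mean_diff <- unname(res$estimate[1] - res$estimate[2])

# %% in sprintf prints a literal percent sign
apa_string <- sprintf(
  "t(%.2f) = %.2f, p %s, Mdiff = %.2f, 95%% CI [%.2f, %.2f]",
  res$parameter, res$statistic, format_p(res$p.value),
  mean_diff, res$conf.int[1], res$conf.int[2]
)
apa_string
```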
Have a go writing up the rest of the results. Use the lecture for more help, and feel free to check it with the teaching team in a practical or drop-in.
Render!
Render your document!
References
- A fun meta-research paper discussing why bar charts are often not a great idea: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002128