Skills lab 05: Data viz and t-test

Setup

Packages and data

Load the necessary packages:

library(tidyverse)
library(ggrain)

Data

Load the data and save it into an object called smarvus_tib:

smarvus_tib <- readr::read_csv("data/smarvus_data.csv")

Codebook

ricomisc::rstudio_viewer("smarvus_codebook.html", "data")

Today’s Task

You are working on an ongoing research project, perhaps on a placement year or for your final-year project. The project is investigating accessibility at university, specifically how a diagnosis of a specific learning difference (SpLD) affects students’ experiences. A key research question is whether students with and without a diagnosis of SpLD differ in their average levels of adverse experiences, such as anxiety, worry, and fear of being perceived negatively.

The previous research assistant has already created a plot, but it’s not the best 😬 and once again, they haven’t documented their code! They’ve also run the analysis, but their time on the project ended before they got around to the formal write-up.

Our tasks are:

Recreate the plot.
Make improvements to the plot, with clearly documented code.
Recreate the analysis.
Report the analysis in full, in APA style.

Task 1: Plot

Look at the plot that the previous RA created.

What are the variables? What levels or values do they have?
What information is represented on the plot?
- Is there any information that should be added, or removed?

smarvus_tib |>  # 1
  ggplot(aes(x = spld, y = bfne, colour = spld)) +  # 2
  ggrain::geom_rain() +  # 3
  stat_summary(fun.data = mean_cl_normal, color = "black") +  # 4
  labs(x = "Diagnosis of Specific Learning Difference", y = "Mean Score (BFNE)") +  # 5
  theme_bw()  # 6

1: Take the smarvus_tib dataset, and then
2: Set up a plot, using the aesthetics (variables) spld on the x-axis, bfne on the y-axis, and different colours for different levels in spld
3: Draw a raincloud plot
4: Add a summary statistic element of means and CIs, in black
5: Relabel the x and y axes
6: Apply a theme

Task 2: Doing Better

To-Do List

Remove NAs
- We cannot know why they are missing, so they do not make a good comparison group
Remove superfluous legend
Change colours
- Keep accessible colours in mind!
- One light, one dark; avoid contrasting red and green

Implementation

smarvus_tib |>
  dplyr::filter(!is.na(spld)) |>  # 1
  ggplot(aes(x = spld, y = bfne, colour = spld)) +
  ggrain::geom_rain() +
  stat_summary(fun.data = mean_cl_normal, color = "black") +
  labs(x = "Diagnosis of Specific Learning Difference", y = "Mean Score (BFNE)") +
  scale_color_manual(  # 2
    values = c(
      "#003B4A",  # Sussex: Ocean Teal
      "#F5C48A"   # Sussex: Shell Peach
    )
  ) +
  theme_bw() +
  theme(legend.position = "none")  # 3

1: Temporarily remove NAs. Note that this change is NOT assigned to the dataset (yet!)
2: Add custom colours from the Sussex palette using hex codes
3: Remove unneeded legend

OPTIONAL: Further Changes

We could improve this plot even more with some more tweaks. Some things to consider are:

Changing the order of categories; it feels more natural, and matches the colours better, to have Yes before No
- Be careful: this is more than just relabeling the axes!
Changing the appearance of the points
Using the fill argument to make the plot look nicer

Some - but not all! - of these changes have some hints in the tutorial, but there are no solutions for this task. What do you think makes a good plot?
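As one more hint rather than a solution: the order in which categories are drawn follows the order of the factor levels, not the axis labels. The sketch below uses a small made-up data frame (toy_tib is a hypothetical stand-in, not the real smarvus_tib) to show one way of setting that order with base R.

```r
# Toy data standing in for smarvus_tib (made-up values, for illustration only)
toy_tib <- data.frame(
  spld = c("No", "Yes", "No", "Yes", "No"),
  bfne = c(3.1, 3.4, 2.9, 3.6, 3.0)
)

# Relabelling an axis only changes the printed text;
# setting the factor levels changes the order of the groups themselves
toy_tib$spld <- factor(toy_tib$spld, levels = c("Yes", "No"))

# Check the new order of levels
levels(toy_tib$spld)
```

The values in each row are untouched; only the order of the levels changes, which is what a plot uses to decide which category comes first.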

Task 3: Doing the Analysis

The previous research assistant’s notes just have an image of this output:

Let’s first try to recreate the results.

t.test(bfne ~ spld, data = smarvus_tib)


    Welch Two Sample t-test

data:  bfne by spld
t = -0.4567, df = 339.15, p-value = 0.6482
alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
95 percent confidence interval:
 -0.1734689  0.1080943
sample estimates:
 mean in group No mean in group Yes 
         3.226893          3.259580

Why weren’t NAs a problem this time? 🤔
What’s the mean difference between the two spld conditions?
Is this difference statistically significant?

round(3.226893 - 3.259580, 2)

[1] -0.03
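Retyping the means by hand works, but it is easy to mistype a digit. As a rough sketch, the same difference can be pulled out of the object that t.test() returns. The data frame below is made up for illustration (toy_tib, toy_test and toy_diff are hypothetical names, not part of the lab materials).

```r
# Toy data standing in for smarvus_tib (made-up values, for illustration only)
toy_tib <- data.frame(
  spld = c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
  bfne = c(3.0, 3.5, 2.5, 3.2, 3.4, 3.1, 3.8, 2.9)
)

# Save the test result instead of only printing it
toy_test <- t.test(bfne ~ spld, data = toy_tib)

# $estimate holds the two group means, in the order No then Yes
toy_means <- unname(toy_test$estimate)

# Difference in means, in the same direction as the printed output (No minus Yes)
toy_diff <- toy_means[1] - toy_means[2]
round(toy_diff, 2)

# The confidence interval is stored in the same object
toy_test$conf.int
```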

Writing Up

A complete writeup of the results should include:

A brief description of the design and the analysis that was conducted (one sentence)
A plain-language interpretation of the statistical results, along with the statistical reporting
The key descriptive information, for example the means or counts in each group.
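For the descriptive information, one possible approach is to summarise by group with dplyr. This is a minimal sketch on made-up data (toy_tib and toy_desc are hypothetical names); the real dataset and your preferred set of descriptives may differ.

```r
library(dplyr)

# Toy data standing in for smarvus_tib (made-up values, for illustration only)
toy_tib <- data.frame(
  spld = c("No", "No", "No", "Yes", "Yes", "Yes", NA),
  bfne = c(3.0, 3.5, 2.5, 3.4, 3.1, 3.7, 4.0)
)

toy_desc <- toy_tib |>
  dplyr::filter(!is.na(spld)) |>   # drop cases with no recorded SpLD status
  dplyr::group_by(spld) |>         # one row of output per group
  dplyr::summarise(
    n = dplyr::n(),                # count of cases in each group
    mean_bfne = mean(bfne, na.rm = TRUE),
    sd_bfne = sd(bfne, na.rm = TRUE)
  )

toy_desc
```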

For the analysis, the statistical reporting will look like:

test_statistic_name(degrees_of_freedom) = test_statistic_value, p = p_value, M_diff = difference_in_means, 95% CI [CI_lower, CI_upper]

Which for the analysis is:

t(339.15) = -0.46, p = .648, M_diff = -0.03, 95% CI [-0.17, 0.11]
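If you would rather not round each number by hand, the report string can be assembled with sprintf(). The sketch below copies the values from the output above; the object names (t_value, df_value, and so on) are made up for this illustration.

```r
# Values copied from the t-test output above
t_value  <- -0.4567
df_value <- 339.15
p_value  <- 0.6482
m_diff   <- 3.226893 - 3.259580
ci       <- c(-0.1734689, 0.1080943)

# APA style drops the leading zero for p, because p cannot exceed 1
p_text <- sub("^0", "", sprintf("%.3f", p_value))

# %.2f rounds to two decimal places; %% prints a literal percent sign
apa_text <- sprintf(
  "t(%.2f) = %.2f, p = %s, M_diff = %.2f, 95%% CI [%.2f, %.2f]",
  df_value, t_value, p_text, m_diff, ci[1], ci[2]
)
apa_text
```

This reproduces the reporting line shown above. Note that the simple sub() trick would need extra handling for very small p-values, which APA style reports as p < .001.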

Have a go at writing up the rest of the results. Use the lecture for more help, and feel free to check your write-up with the teaching team in a practical or drop-in.

Render!

Render your document!

References

A fun meta-research paper discussing why bar charts are often not a great idea: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002128