Skills Lab 06: Sample Take-Away Paper

Author

Skills Lab 06

Important Points

Use what you know: The TAP is about what we have taught you about working with, presenting, and analysing data (like it says on the tin!). Use previous Skills Labs, tutorials, and worksheets to solve the tasks.
Read the academic misconduct info carefully: You must work independently, without help/input from others. Use of ChatGPT or other programmes to generate or check code or answers is NOT permitted.
Don’t panic: If you hit a technical problem, come to a regular or tech help drop-in (see the Module Contacts page on Canvas).
KNOW YOUR DEADLINE: You must submit before the deadline or you will receive a 0. Make sure you know YOUR deadline and get it done well before!

FAQs

The following questions have come up previously; the answers are included here in case they are useful to anyone.

What can we do to firmly secure a mark in the 70-80 band?

You should demonstrate the techniques, functions, skills, and concepts that we have taught you on this module.
You should ensure that your answers demonstrate a clear connection between what you say you are going to do and what you actually produce with your code.
Your answers, both written and code, should be clear, accurate, relevant, and without errors.
Render your document early and often, and compare carefully to the Study Brief.

For the tasks where we need to recreate output, does that include how the table/graph visually looks or just outputting the same raw data?

It includes everything about the presentation of the table/graph, both the content and formatting, as it appears in the Study Brief as compared to your rendered document. Make sure you check your rendered document frequently as you are working.

Can we use colour etc. or do we have to stick to black-and-white/APA style?

For tasks where you may be asked to “suggest improvements” to e.g. tables and graphs, what you produce must be accurate, easy to read/understand, fit for purpose (that is, it represents the data and relationships of interest clearly and transparently), and accessible. If you would like to add more colour, flair, or visual interest, please do - you do not have to only stick to basic formatting or greyscale. However, if your aesthetic choices interfere with the key basic characteristics of your output (i.e. accuracy, ease of reading, fit for purpose, accessible), this may have a negative impact on your mark.

How much is the TAP worth?

The TAP is worth 25% of your overall module mark on Analysing Data.

Important

As I said previously, we will not release complete solutions to the sample TAP. The code included below is only what was covered in the live Skills Lab session, with some extra tips to help you succeed.

Setup

Data

Run the code chunk below by clicking the green ▶️ button. This will read in the dataset to your Environment.

syn_data <- readr::read_csv("data/syn_data.csv")

Libraries

Load any necessary packages here.

library(tidyverse)

Task 1: Cleaning and Preparation

1.1

Inspect your dataset, and compare it to the Study Brief, particularly the Codebook. Identify any discrepancies between the two, using the Study Brief as a guide for what your dataset should look like, how it needs to be cleaned, and what variables it should contain.

Then, make a list below of the steps you must take to clean or change your dataset so that it matches the information in the Study Brief.

Relabel gender
Create syn_type

Tip

For this task especially, it is imperative that you read the Study Brief and Codebook carefully. Your answer here should be a short list of the steps that you need to take.
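As a starting point for the inspection step, here is a minimal sketch of how you might look over a dataset before comparing it to a codebook. The toy dataset and all of its values are invented for illustration (they are NOT from syn_data), so treat this as a pattern rather than an answer.

```r
# Toy data standing in for a real dataset (all names and values are invented)
toy <- tibble::tibble(
  id     = 1:5,
  gender = c(1, 2, 2, 3, 1),
  score  = c(2.5, 3.1, NA, 4.0, 3.3)
)

# Overview of every variable: its name, type, and first few values
dplyr::glimpse(toy)

# How many rows fall into each category of a variable?
toy_counts <- dplyr::count(toy, gender)
toy_counts

# How many missing values does a variable have?
n_missing <- sum(is.na(toy$score))
n_missing
```

Checking variable names, types, and category labels like this makes it easier to spot where the data and the codebook disagree.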

1.2

Complete the steps you have listed in task 1.1 to prepare your dataset for analysis, using the coding and analysis techniques we have covered on the module thus far.

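syn_data <- syn_data |>
  dplyr::mutate(
    # case_when() checks conditions in order and uses the first one that is TRUE,
    # so the most specific category ("both") comes first
    syn_type = dplyr::case_when(
      gc_score <= 1.43 & syn_graph_col == "Yes" & syn_seq_space == "Yes" ~ "both",
      gc_score <= 1.43 & syn_graph_col == "Yes" ~ "grapheme_colour",
      syn_seq_space == "Yes" ~ "sequence_space",
      # anyone not matched by the conditions above
      .default = "non_syn"
    )
  )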
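The order of the conditions in the code above matters. The sketch below uses an invented toy variable, x, to show what happens when a more general condition is listed before a more specific one; the cut-offs are made up purely for illustration.

```r
# Toy data (invented values)
toy <- tibble::tibble(x = c(5, 15, 25))

# Most specific condition first: each row gets the label we intend
specific_first <- toy |>
  dplyr::mutate(size = dplyr::case_when(
    x > 20 ~ "large",
    x > 10 ~ "medium",
    .default = "small"
  ))

# General condition first: x > 10 also catches the rows where x > 20,
# so the "large" label is never used
general_first <- toy |>
  dplyr::mutate(size = dplyr::case_when(
    x > 10 ~ "medium",
    x > 20 ~ "large",
    .default = "small"
  ))

specific_first$size
general_first$size
```

This is why "both" is listed before the single-type categories when creating syn_type.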

Note

Note that this is NOT a complete answer to this task. We did not cover gender relabelling in the Skills Lab, but if you can do the recoding above, it’s the same idea.
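To illustrate the "same idea" in general terms, here is a hedged sketch of relabelling a coded categorical variable with case_when(). The codes (1, 2, 3) and the labels are placeholders I have made up; the real values and labels are whatever the Codebook in the Study Brief says they are.

```r
# Toy data: a variable stored as numeric codes (codes are invented)
toy <- tibble::tibble(gender = c(1, 2, 3, 2, NA))

toy <- toy |>
  dplyr::mutate(
    gender_label = dplyr::case_when(
      gender == 1 ~ "label_a",   # placeholder labels, not the real ones
      gender == 2 ~ "label_b",
      gender == 3 ~ "label_c",
      .default = NA_character_   # keep anything unmatched (including NA) as missing
    )
  )

toy
```

Here I created a new variable, gender_label, so the original codes are still there to check against; overwriting the original variable instead is also a reasonable choice.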

Task 2: Summary Table

The previous research assistant produced a summary table, which you can find in the Output section of the Study Brief.

2.1

Write your own code to reproduce the summary table exactly as it appears in the Study Brief, and print it out below.

Tip

Make sure you compare the Study Brief output to how this table appears in your rendered document, not just when you run it in RStudio.
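For orientation only, the general shape of a grouped summary table might look like the sketch below. It uses invented toy data, and knitr::kable() is just one possible way to format a table for a rendered document (an assumption on my part, not necessarily what the Study Brief output used). The column names and caption are placeholders.

```r
# Toy data (invented values)
toy <- tibble::tibble(
  group = c("a", "a", "b", "b", "b"),
  score = c(2, 4, 3, 5, NA)
)

# One row per group, with the number of rows, mean, and SD
toy_summary <- toy |>
  dplyr::group_by(group) |>
  dplyr::summarise(
    n          = dplyr::n(),
    mean_score = mean(score, na.rm = TRUE),
    sd_score   = sd(score, na.rm = TRUE)
  )

# Format for the rendered document: readable headers, rounded values
knitr::kable(
  toy_summary,
  digits    = 2,
  col.names = c("Group", "N", "Mean", "SD"),
  caption   = "Placeholder caption"
)
```

Note that n() counts all rows in each group, including rows where score is missing, which is one of the details worth checking against the output you are trying to reproduce.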

2.2

Evaluate the table from 2.1 in the context of the Study Brief and good practice we have discussed on the module thus far. How, if at all, can this table be improved? Make a list of recommendations and briefly explain why you have suggested each point.

2.3

Implement the improvements you have suggested in 2.2 to make an improved version of the same table.

Task 3: Visualisation

The previous research assistant also produced a data visualisation, which you can find in the Output section of the Study Brief.

3.1

Write your own code to reproduce this visualisation exactly as it appears in the Study Brief, and print it out below.

Tip

Make sure you compare the Study Brief output to how this visualisation appears in your rendered document, not just when you run it in RStudio.

3.2

Evaluate the visualisation from 3.1 in the context of the Study Brief and good practice we have discussed on the module thus far. How, if at all, can this visualisation be improved? Make a list of recommendations and briefly explain why you have suggested each point.

3.3

Implement the improvements you have suggested in 3.2 to make an improved version of the same visualisation.

Tip

If you want to add colour or use other options to make your visualisation look good, creativity is great! You do not have to stick to only greyscale/APA style. However, your visualisation must still be accurate, easy to read/understand, fit for purpose, and accessible.
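As one hedged example of adding colour without losing accessibility, the sketch below uses the viridis palette built into ggplot2, which is designed to remain distinguishable for readers with common forms of colour blindness. The toy data, plot type, and axis labels are all invented placeholders, not the visualisation from the Study Brief.

```r
library(ggplot2)

# Toy data (invented values)
toy <- data.frame(
  group = c("a", "a", "b", "b", "c", "c"),
  score = c(2.1, 3.4, 3.0, 4.2, 2.8, 3.9)
)

p <- ggplot(toy, aes(x = group, y = score, fill = group)) +
  geom_boxplot() +
  scale_fill_viridis_d() +             # colourblind-friendly discrete palette
  labs(x = "Group", y = "Score") +     # clear, human-readable axis labels
  theme_minimal() +
  theme(legend.position = "none")      # legend repeats the x-axis, so drop it

p
```

Whatever palette you choose, check the result in your rendered document to make sure the colours support the message rather than distract from it.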

Task 4: Analysis

Finally, the previous research assistant also produced some analysis output, which you can find in the Output section of the Study Brief.

4.1

Write your own code to reproduce the test output exactly as it appears in the Study Brief, and print it out below.

t.test(scsq_organise ~ syn_type, data = syn_data)

Error in t.test.formula(scsq_organise ~ syn_type, data = syn_data): grouping factor must have exactly 2 levels

Tip

Make sure you read the error carefully - it’s very informative here! It tells us that we need exactly 2 levels (that is, unique values) in our grouping variable, which is syn_type. However, when we created it above, we gave it four: both, grapheme_colour, sequence_space, and non_syn. That was correct, but the Study Brief tells us which two categories we want to compare. Before we can run the test, we first need to keep only the rows for the participants we want to compare.
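A quick way to confirm what the error is telling you is to look at the unique values in the grouping variable before running the test. The sketch below does this on invented toy data with three groups, and uses try() so the error is captured rather than stopping the code.

```r
# Toy data with THREE groups (invented values)
toy <- data.frame(
  group = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  score = c(2.1, 3.4, 2.9, 3.0, 4.2, 3.7, 2.8, 3.9, 3.1)
)

# Which values does the grouping variable contain, and how many are there?
unique(toy$group)
n_groups <- length(unique(toy$group))
n_groups

# With more than two groups, t.test() stops with an error;
# try() captures it here so we can inspect it
result <- try(t.test(score ~ group, data = toy), silent = TRUE)
inherits(result, "try-error")
```

If the number of groups is anything other than 2, the data need to be narrowed down before the t-test can run.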

4.2

Evaluate the test output from 4.1 in the context of the Study Brief. Does this output adequately address the research question? Explain your reasoning and, if necessary, produce improved test output below.

syn_data_t <- syn_data |> 
  dplyr::filter(syn_type %in% c("grapheme_colour", "sequence_space"))

## IMPORTANT: update the data argument
## or you'll still have the same error as before!
t.test(scsq_organise ~ syn_type, data = syn_data_t)


    Welch Two Sample t-test

data:  scsq_organise by syn_type
t = -0.68232, df = 30.413, p-value = 0.5002
alternative hypothesis: true difference in means between group grapheme_colour and group sequence_space is not equal to 0
95 percent confidence interval:
 -0.5826208  0.2906861
sample estimates:
mean in group grapheme_colour  mean in group sequence_space 
                     3.029545                      3.175513

Tip

Here, I chose to create a new dataset, syn_data_t, which keeps only the grapheme-colour and sequence-space synaesthetes. I could instead have overwritten the whole dataset: syn_data <- syn_data |> ... Which to do is a matter of style/preference, but I recommend making a new dataset in this situation. If I overwrite syn_data, only the grapheme-colour and sequence-space synaesthetes are left in it, so to do anything else with the rest of the data I would have to rerun my earlier code to get them back. By creating syn_data_t specifically for the t-test, I still have the whole dataset available in syn_data.
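The difference between the two approaches is easy to check by counting rows. This is a minimal sketch on invented toy data; the names toy and toy_t are placeholders that mirror syn_data and syn_data_t.

```r
# Toy data with three groups (invented values)
toy <- tibble::tibble(
  group = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  score = c(2.1, 3.4, 2.9, 3.0, 4.2, 3.7, 2.8, 3.9, 3.1)
)

# Filter into a NEW object, leaving the original untouched
toy_t <- toy |>
  dplyr::filter(group %in% c("a", "b"))

nrow(toy)    # the full dataset is still available
nrow(toy_t)  # only the two groups being compared

# The test now runs, because the filtered data has exactly two groups
t_out <- t.test(score ~ group, data = toy_t)
t_out
```

Because toy still has all of its rows, any later table or plot that needs every group can use it without rerunning earlier code.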