1. Vowel reduction in Russian

Pavel Duryagin ran an experiment on the perception of vowel reduction in Russian. The dataset shva includes the following variables:

1.0 Read the data from the file into the variable shva.

1.1 Make a scatterplot of f1 and f2 using ggplot().

Design it to look like the following:

1.2 Plot the boxplots of f1 and f2 for each vowel using ggplot().
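A minimal sketch of per-group boxplots with ggplot2. Because shva is not loaded here, the built-in iris data stands in for it (Species plays the role of vowel; Sepal.Length and Sepal.Width play the roles of f1 and f2). Stacking both measurements into one long data frame lets a single ggplot() call draw both variables, one panel each.

```r
library(ggplot2)

# iris stands in for shva: Species ~ vowel, Sepal.Length / Sepal.Width ~ f1 / f2
long <- rbind(
  data.frame(group = iris$Species, measure = "Sepal.Length", value = iris$Sepal.Length),
  data.frame(group = iris$Species, measure = "Sepal.Width",  value = iris$Sepal.Width)
)

# One boxplot per group, one panel per measurement; free y axes because scales differ
p <- ggplot(long, aes(x = group, y = value)) +
  geom_boxplot() +
  facet_wrap(~ measure, scales = "free_y") +
  labs(x = "group (vowel in shva)", y = "value")
p
```

With the real data you would replace the iris columns with shva's f1, f2 and vowel columns (the exact column names are assumed here, since the variable list is not reproduced above).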

1.3 Which f1 values can be considered outliers within each vowel?

We treat as outliers those observations that lie more than 1.5 * IQR below the 1st quartile or more than 1.5 * IQR above the 3rd quartile, where IQR, the ‘interquartile range’, is the difference between the 3rd and the 1st quartile (the 75th and 25th percentiles).
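The 1.5 * IQR rule can be sketched as a small base-R function and applied per group with split(). As before, iris stands in for shva (Species ~ vowel, Sepal.Width ~ f1); with the real data the call would look like lapply(split(shva$f1, shva$vowel), iqr_outliers), where the column name vowel is an assumption.

```r
# Return the values lying more than 1.5 * IQR below Q1 or above Q3
iqr_outliers <- function(x) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.75))   # default quantile type 7
  iqr <- unname(q[2] - q[1])
  lower <- unname(q[1]) - 1.5 * iqr
  upper <- unname(q[2]) + 1.5 * iqr
  x[x < lower | x > upper]
}

# Per-group use; iris stands in for shva (Species ~ vowel, Sepal.Width ~ f1)
outliers_by_group <- lapply(split(iris$Sepal.Width, iris$Species), iqr_outliers)
outliers_by_group
```

Note that boxplot.stats(), which geom_boxplot() mirrors conceptually, computes the hinges with fivenum() rather than quantile(), so on small groups it may flag slightly different points.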

1.4 Calculate Pearson’s correlation of f1 and f2 (all data)

1.5 Calculate Pearson’s correlation of f1 and f2 for each vowel
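Both correlations can be computed with cor(); the per-group version just repeats it inside each split of the data. iris again stands in for shva (Sepal.Length ~ f1, Sepal.Width ~ f2, Species ~ vowel).

```r
# Pearson correlation on all the data
r_all <- cor(iris$Sepal.Length, iris$Sepal.Width, method = "pearson")

# The same correlation computed separately within each group (Species ~ vowel)
r_by_group <- sapply(split(iris, iris$Species),
                     function(d) cor(d$Sepal.Length, d$Sepal.Width, method = "pearson"))

r_all
r_by_group
```

In iris the pooled correlation is slightly negative while every within-species correlation is positive, a reminder that pooled and per-group correlations can disagree. cor.test() returns the same coefficient together with a significance test.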

1.6 Use a linear regression model to predict f2 from f1.

1.6.1 Provide the resulting regression formula

1.6.2 Provide the adjusted R\(^2\)

1.6.3 Add the regression line to the scatterplot from 1.1
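A minimal sketch of fitting the model with lm() and pulling out the two quantities asked for. iris stands in for shva (Sepal.Length ~ f1 as predictor, Sepal.Width ~ f2 as response); the formula string format is just one way of writing the result.

```r
# Simple linear regression: response ~ predictor
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)

b <- coef(fit)                                    # "(Intercept)" and slope
formula_text <- sprintf("y = %.3f + %.3f * x", b[1], b[2])
adj_r2 <- summary(fit)$adj.r.squared              # adjusted R^2

formula_text
adj_r2
```

To draw the line on the 1.1 plot, geom_abline(intercept = b[1], slope = b[2]) adds exactly this pooled line. geom_smooth(method = "lm") also works, but if the plot maps colour to vowel it will fit a separate line per vowel instead.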

1.7 Use a mixed-effects model to predict f2 from f1, with a random intercept for vowel

1.7.1 Provide the fixed effects formula

1.7.2 Provide the variance of the random intercept for vowel

1.7.3 Add the regression line to the scatterplot from 1.1
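A sketch using lme4 (the package is an assumption; the task does not name one). The term (1 | Species) gives each group its own intercept; iris stands in for shva with Species ~ vowel.

```r
library(lme4)

# Random-intercept model (REML fit by default); iris stands in for shva
m <- lmer(Sepal.Width ~ Sepal.Length + (1 | Species), data = iris)

fe <- fixef(m)                                   # fixed intercept and slope
vc <- as.data.frame(VarCorr(m))                  # one row per variance component
intercept_var <- vc$vcov[vc$grp == "Species"]    # variance of the random intercept

fe
intercept_var
```

geom_abline(intercept = fe[1], slope = fe[2]) draws the population-level line; predict(m) gives fitted values that include each group's own intercept. With only a handful of groups the intercept variance is estimated imprecisely.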

2. English Lexicon Project data

880 nouns, adjectives and verbs from the English Lexicon Project data (Balota et al. 2007).

Source: http://elexicon.wustl.edu/WordStart.asp

Data from Natalya Levshina’s RLing package, available here: https://raw.githubusercontent.com/agricolamz/2018-MAG_R_course/master/data/ELP.csv

2.0 Read the data from the file into the variable elp.

2.1 Which two variables have the highest Pearson’s correlation value?
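One way to answer this is to compute the full correlation matrix of the numeric columns and pick the largest off-diagonal entry. This sketch assumes "highest" means largest in absolute value; drop the abs() calls if only positive correlations count. The helper name top_cor_pair is hypothetical, and mtcars is used in the usage line because elp is not loaded here.

```r
# Find the pair of numeric columns with the largest |Pearson r|
top_cor_pair <- function(df) {
  num <- df[sapply(df, is.numeric)]                 # drop Word, POS and other non-numeric columns
  r <- cor(num, use = "pairwise.complete.obs")
  r[lower.tri(r, diag = TRUE)] <- NA                # keep each pair once, ignore self-correlations
  idx <- which(abs(r) == max(abs(r), na.rm = TRUE), arr.ind = TRUE)[1, ]
  list(vars = c(rownames(r)[idx[1]], colnames(r)[idx[2]]),
       r = r[idx[1], idx[2]])
}

top_cor_pair(mtcars)   # with the real data: top_cor_pair(elp)
```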

2.2 Group the data by part of speech and make a scatterplot of SUBTLWF and Mean_RT.

For the colour scale I used scale_color_continuous(low = "lightblue", high = "red").
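A sketch of one reading of this task: one panel per part of speech via facet_wrap(), with points coloured by a continuous variable. Which variable the original plot coloured is not stated, so mapping colour to Petal.Length here is a hypothetical choice, and iris stands in for elp (Species ~ POS, Sepal.Length ~ SUBTLWF, Sepal.Width ~ Mean_RT). The colour scale is written with scale_colour_gradient(), which takes the same low/high arguments.

```r
library(ggplot2)

# iris stands in for elp; the colour mapping is a hypothetical choice
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Petal.Length)) +
  geom_point() +
  facet_wrap(~ Species) +                                # one panel per group (POS)
  scale_colour_gradient(low = "lightblue", high = "red")
p
```

Word frequencies are usually strongly right-skewed, so on the real data adding scale_x_log10() may make the SUBTLWF axis easier to read; it also matches the log(SUBTLWF) predictor used in 2.3.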

2.3 Use a linear regression model to predict Mean_RT from log(SUBTLWF) and POS.

2.3.1 Provide the resulting regression formula

2.3.2 Provide the adjusted R\(^2\)

2.3.3 Add the regression line to the scatterplot from 2.2
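A sketch of a model with a log-transformed numeric predictor plus a categorical one. R turns the factor into treatment contrasts, so the model has one shared slope and one intercept shift per non-reference level, which yields parallel lines. iris stands in for elp (Sepal.Width ~ Mean_RT, Sepal.Length ~ SUBTLWF, Species ~ POS).

```r
# Numeric predictor on the log scale plus a factor
fit <- lm(Sepal.Width ~ log(Sepal.Length) + Species, data = iris)

coef(fit)                    # intercept, log-slope, one offset per non-reference level
summary(fit)$adj.r.squared   # adjusted R^2

# Fitted values on a grid: one line per group, sharing the same slope
grid <- expand.grid(
  Sepal.Length = seq(min(iris$Sepal.Length), max(iris$Sepal.Length), length.out = 50),
  Species = levels(iris$Species)
)
grid$fit <- predict(fit, newdata = grid)
head(grid)
```

On a plot faceted by the group variable, geom_line(data = grid, aes(x = Sepal.Length, y = fit), inherit.aes = FALSE) places each fitted line in its own panel; inherit.aes = FALSE avoids inheriting a colour mapping the grid does not contain. If any SUBTLWF value is 0, log() returns -Inf and lm() will fail, so check for zeros first.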

2.4 Use a mixed-effects model to predict Mean_RT from log(SUBTLWF), with a random intercept for POS

2.4.1 Provide the fixed effects formula

2.4.2 Provide the variance of the random intercept for POS

2.4.3 Add the regression line to the scatterplot from 2.2
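A sketch of getting lines from a random-intercept model with lme4 (assumed package). predict() with re.form = NA ignores the random effects and returns the population-level line; the default includes each group's estimated intercept. iris stands in for elp (Sepal.Width ~ Mean_RT, Sepal.Length ~ SUBTLWF, Species ~ POS).

```r
library(lme4)

# Random intercept per group, log-transformed numeric predictor
m <- lmer(Sepal.Width ~ log(Sepal.Length) + (1 | Species), data = iris)

fixef(m)                     # fixed intercept and slope
as.data.frame(VarCorr(m))    # the random-intercept variance is in column vcov

grid <- expand.grid(
  Sepal.Length = seq(min(iris$Sepal.Length), max(iris$Sepal.Length), length.out = 50),
  Species = levels(iris$Species)
)
grid$fixed_only <- predict(m, newdata = grid, re.form = NA)  # population-level line
grid$by_group   <- predict(m, newdata = grid)                # adds each group's intercept
```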

3. Dutch causative constructions

A data set with examples of two Dutch periphrastic causatives from newspaper corpora.

A data frame with 100 observations on the following 7 variables.

Data from Natalya Levshina’s RLing package, available here: https://raw.githubusercontent.com/agricolamz/2018-MAG_R_course/master/data/dutch_causatives.csv

3.0 Read the data from the file into the variable d_caus.

3.1 We are going to test whether the association between Aux and each of the other categorical variables (Aux ~ CrSem, Aux ~ CeSem, etc.) is statistically significant. For which variable should the association with Aux be analysed using Fisher’s exact test rather than Pearson’s chi-squared test? Is this association statistically significant?
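A common way to decide is to inspect the expected counts under independence: the chi-squared approximation becomes unreliable when they are small. This sketch uses the simple rule "any expected count below 5"; that threshold is an assumption, and Cochran's stricter-worded variant allows up to 20% of cells below 5 as long as none is below 1. The helper needs_fisher is hypothetical, and mtcars stands in for d_caus (am ~ Aux).

```r
# Rule of thumb (assumed): use Fisher's exact test if any expected count is below 5
needs_fisher <- function(tab) {
  expected <- suppressWarnings(chisq.test(tab)$expected)
  any(expected < 5)
}

# mtcars stands in for d_caus: am ~ Aux, other columns ~ candidate predictors
checks <- sapply(c("vs", "gear", "carb"),
                 function(v) needs_fisher(table(mtcars$am, mtcars[[v]])))
checks

# fisher.test() also handles 2 x k tables
fisher.test(table(mtcars$am, mtcars$carb))$p.value
```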

3.2 Test the hypothesis that Aux and EPTrans are not independent using Pearson’s chi-squared test.

3.3 Provide the expected values from Pearson’s chi-squared test of Aux and EPTrans.
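chisq.test() returns the statistic, degrees of freedom, p-value and the expected counts in one object. mtcars stands in for d_caus here (am ~ Aux, vs ~ EPTrans, both binary).

```r
# mtcars stands in for d_caus: am ~ Aux, vs ~ EPTrans
tab <- table(am = mtcars$am, vs = mtcars$vs)

res <- chisq.test(tab)   # for 2 x 2 tables Yates' continuity correction is on by default
res$statistic            # X-squared
res$parameter            # degrees of freedom
res$p.value
res$expected             # row total * column total / n for each cell
```

Pass correct = FALSE to get the uncorrected Pearson statistic; whichever you use, say so when reporting.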

3.4 Calculate the odds ratio.
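For a 2 x 2 table with cells n11, n12, n21, n22 the sample odds ratio is (n11 * n22) / (n12 * n21). The helper odds_ratio below is hypothetical; mtcars stands in for d_caus.

```r
# Sample odds ratio of a 2 x 2 table
odds_ratio <- function(tab) {
  stopifnot(all(dim(tab) == c(2, 2)))
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

tab <- table(am = mtcars$am, vs = mtcars$vs)   # mtcars stands in for d_caus
odds_ratio(tab)
fisher.test(tab)$estimate   # conditional MLE: close to, but not identical with, the sample OR
```

The value depends on which level comes first in each variable (swapping rows gives the reciprocal), and a zero cell makes the sample odds ratio 0 or infinite.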

3.5 Calculate the effect size for this test using Cramér’s V (for a 2 x 2 table this equals the absolute value of phi).
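Cramér’s V rescales the uncorrected chi-squared statistic by the sample size n and the smaller table dimension (r rows, c columns):

```latex
V = \sqrt{\frac{\chi^2}{n \,(\min(r, c) - 1)}}
```

For a 2 x 2 table, min(r, c) - 1 = 1, so V = sqrt(chi^2 / n) = |phi|. In R, one way is sqrt(chisq.test(tab, correct = FALSE)$statistic / sum(tab)) for a 2 x 2 table; vcd::assocstats(tab) also reports phi and Cramér’s V directly.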

3.6 Report the results of the independence test using the following template:

3.7 Visualize the distribution using a mosaic plot.

Use the mosaic() function from the vcd package.

Below is an example of how to use mosaic() with three variables.

vcd::mosaic(~ Aux + CrSem + Country, data=d_caus, shade=TRUE, legend=TRUE)

3.9 Provide a short text (300 words) describing the hypothesis of this study and the results of your analysis.