Pavel Duryagin ran an experiment on the perception of vowel reduction in Russian. The dataset shva
includes the following variables:
time1 - reaction time 1
duration - duration of the vowel in the stimuli (in milliseconds, ms)
time2 - reaction time 2
f1, f2, f3 - the 1st, 2nd and 3rd formants of the vowel, measured in Hz (for a short introduction to formants, see here)
vowel - vowel classified according to a 3-fold classification: A - a under stress; a - a/o as in the first syllable before the stressed one; y (stands for shva) - a/o as in the second etc. syllable before the stressed one or after the stressed syllable, cf. g[y]g[a]t[A]l[y] gogotala ‘guffawed’.
shva.
f1 and f2 using ggplot(). Design it to look like the following:
f1 and f2 for each vowel using ggplot().
f1 can be considered outliers in a vowel? We assume outliers to be those observations that lie more than 1.5 * IQR below the 1st quartile or above the 3rd quartile, where IQR, the interquartile range, is the difference between the 1st and the 3rd quartile (the 25th and 75th percentiles).
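The 1.5 * IQR rule described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame toy and its values are invented (they are not the shva data), and is_outlier is a hypothetical helper name, not a function from any package.

```r
# Toy data: invented f1 values (Hz) for two vowel classes, NOT the real shva data
toy <- data.frame(
  vowel = rep(c("A", "y"), each = 6),
  f1 = c(600, 620, 640, 660, 680, 1200,  # 1200 is a planted extreme value
         380, 400, 420, 440, 460, 480)
)

# Hypothetical helper: TRUE for values more than 1.5 * IQR
# below the 1st quartile or above the 3rd quartile
is_outlier <- function(x) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Apply the rule separately within each vowel class;
# ave() returns the results in the original row order (as 0/1, hence as.logical)
toy$outlier <- as.logical(ave(toy$f1, toy$vowel, FUN = is_outlier))

# Rows flagged as outliers
toy[toy$outlier, ]
```

Computing the quartiles per vowel rather than over all rows matters here, because each vowel class has its own typical f1 range. Note that quantile() uses its default interpolation method, so the thresholds can differ slightly from other quartile definitions.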
f1 and f2 (all data)
f1 and f2 for each vowel
f2 by f1.
f2 by f1 using vowel intercept as a random effect
880 nouns, adjectives and verbs from the English Lexicon Project data (Balota et al. 2007).
Format – A data frame with 880 observations on the following 5 variables.
Word – a factor with lexical stimuli.
Length – a numeric vector with word lengths.
SUBTLWF – a numeric vector with frequencies in film subtitles.
POS – a factor with levels JJ (adjective), NN (noun), VB (verb).
Mean_RT – a numeric vector with mean reaction times in a lexical decision task.
Source: http://elexicon.wustl.edu/WordStart.asp
Data from Natalya Levshina’s RLing package, available here: https://raw.githubusercontent.com/agricolamz/2018-MAG_R_course/master/data/ELP.csv
elp.
I’ve used scale_color_continuous(low = "lightblue", high = "red")
Mean_RT by log(SUBTLWF) using POS intercept as a random effect
A data set with examples of two Dutch periphrastic causatives from newspaper corpora.
A data frame with 100 observations on the following 7 variables.
Cx – a factor with levels doen_V and laten_V.
CrSem – a factor that contains the semantic class of the Causer with levels Anim (animate) and Inanim (inanimate).
CeSem – a factor that describes the semantic class of the Causee with levels Anim (animate) and Inanim (inanimate).
CdEv – a factor that describes the semantic domain of the caused event expressed by the Effected Predicate. The levels are Ment (mental), Phys (physical) and Soc (social).
Neg – a factor with levels No (absence of negation) and Yes (presence of negation).
Coref – a factor with levels No (no coreferentiality) and Yes (coreferentiality).
Poss – a factor with levels No (no overt expression of possession) and Yes (overt expression of possession).
Data from Natalya Levshina’s RLing package, available here: https://raw.githubusercontent.com/agricolamz/2018-MAG_R_course/master/data/dutch_causatives.csv
d_caus.
Aux and other categorical variables (Aux ~ CrSem, Aux ~ CeSem, etc.) is statistically significant. The association with which variable should be analysed using Fisher’s Exact Test and not using Pearson’s Chi-squared Test? Is this association statistically significant?
Aux and EPTrans are not independent with the help of Pearson’s Chi-squared Test.
Aux and EPTrans variables.
Use the mosaic() function from the vcd library.
Below is an example of how to use mosaic() with three variables.
vcd::mosaic(~ Aux + CrSem + Country, data=d_caus, shade=TRUE, legend=TRUE)
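
For the question above about when to use Fisher’s Exact Test rather than Pearson’s Chi-squared Test, one common rule of thumb (an assumption here, not something this assignment states) is to prefer Fisher’s test when any expected cell count is below 5. A minimal base R sketch on an invented 2 x 2 table follows; the counts and the level labels are hypothetical and are not taken from d_caus.

```r
# Toy contingency table: invented counts, NOT taken from d_caus
tab <- matrix(c(12, 3,
                 2, 8),
              nrow = 2, byrow = TRUE,
              dimnames = list(Aux = c("level_1", "level_2"),    # hypothetical labels
                              Other = c("No", "Yes")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Rule of thumb (assumption): use Fisher's exact test
# if any expected count is below 5, otherwise Pearson's chi-squared test
use_fisher <- any(expected < 5)
result <- if (use_fisher) fisher.test(tab) else chisq.test(tab)

# p-value of whichever test was selected
result$p.value
```

With real data the table would come from something like table(d_caus$Aux, d_caus$CrSem), repeated for each categorical variable. chisq.test() also warns when its approximation may be inaccurate, which is another signal that Fisher’s test may be the better choice.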