Durational differences of homophonous suffixes emerge from the lexicon: Three Analyses
This script documents the analysis of ENGS production data with measures of Linear Discriminative Learning (LDL; e.g. Baayen et al., 2018; Chuang et al., 2020). This is a work in progress.
The idea of this LDL implementation is the following: networks trained on real word data and their transformation matrices are used to create semantic matrices for pseudowords, following the implementation by Chuang et al. (2020). The underlying assumption is that speakers who encounter pseudowords do so with their knowledge of previously encountered words, i.e. their lexicon. Thus, pseudowords are not semantically empty shells but carry some sort of meaning.
The following script includes a reproduction of most steps found in the three previous scripts. Newly added is the updated pseudoword S matrix creation as well as a more thorough modelling procedure.
Note: In the remainder of this document, the following naming conventions are used:
- real word cue matrices are called \(mald.C\), \(mald.Cmat\), \(mald.cues\), and \(C_{realwords}\) interchangeably
- pseudoword cue matrices are called \(engs.C\), \(engs.Cmat\), \(engs.cues\), and \(C_{pseudowords}\) interchangeably
- real word semantic matrices are called \(mald.S\), \(mald.Smat\) and \(S_{realwords}\) interchangeably
- pseudoword semantic matrices are called \(engs.S\), \(engs.Smat\) and \(S_{pseudowords}\) interchangeably
1 Data
1.1 MALD data
Data on real words is taken from the Massive Auditory Lexical Decision (MALD) Database (Tucker et al., 2018). It is a subset of the complete MALD database, containing 8362 words of which 2120 words contain one of 36 different affixes.
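The loading chunk is folded in the rendered version; a minimal sketch of what it does (the file name is hypothetical):

## load real words (MALD); the file name is hypothetical
mald.data <- read.csv("data/MALD_data.csv")
dim(mald.data)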
[1] 8362 4
1.2 ENGS data
Pseudoword data is taken from the ENGS production experiment (Schmitz et al., 2020). In total, 48 pseudowords are contained within this data set, with half of them representing monomorphemic words (e.g. singular bloufs) and half of them representing plural wordforms (e.g. plural bloufs). As a number of pseudowords showed more than one consistent pronunciation, these are contained more than twice, e.g. prups.
## load pseudowords (ENGS)
engs.data <- read.csv("data666/final_S_data_LDL_types.csv")
dim(engs.data)
[1] 78 4
1.3 COMB data
The combined data set contains data on both real and nonce words as introduced in 1.1 and 1.2. In total, 8440 words and 36 different affixes are part of this data set.
## create a combined data set
comb.data <- rbind(mald.data, engs.data)
## sort combined data set alphabetically
comb.data <- comb.data[order(comb.data$Word),]
dim(comb.data)
[1] 8440 4
2 A matrix
We can create an \(A\) matrix for real words using the semantic vectors constructed from the TASA corpus. The \(A\) matrix contains the semantic vectors of all lexomes in orthographic form (rows), with the lexome dimensions of the semantic space as columns. Note: Inflectional and derivational lexomes are represented by abbreviations, e.g. PL for plural.
lexome | meat | mechanic | mechanical | mechanism | medal |
---|---|---|---|---|---|
eat | 0.0118417 | -0.0001872 | 0.0002885 | -0.0002331 | 1.98e-05 |
echo | 0.0005983 | -0.0000384 | -0.0000922 | -0.0000899 | -2.05e-05 |
eclipse | 0.0000350 | -0.0003962 | -0.0000451 | -0.0000278 | -9.70e-06 |
## load TASA vectors
load("data/wL2Ld.rda")
## find all unique MALD (= real word) lexomes
mald.subset.allLexomes = unique(c(mald.data$Word, mald.data$Affix))
mald.subset.allLexomes = mald.subset.allLexomes[!is.na(mald.subset.allLexomes)]
## find vectors for all unique lexomes > this then is the A matrix
mald.Amat = wL2Ld[mald.subset.allLexomes, ]
3 C matrices
Cue matrices indicate, for all words in phonological form (rows), which of the triphones of a data set (columns) they contain, using binary coding.
word | #@b | @b{ | b{n | {nd | nd@ | d@n | @nm | nm@ | m@n | @nt | nt# | b{S | {S# | #{b | {bI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@b{nd@nm@nt | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
@b{S | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
{bI | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
{b@t | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
{bdIk1t | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
3.1 real words
The cue matrix for real words \(mald.C\) is computed based on the MALD data set and \(Word+Affix\), i.e. words and affixes are considered.
mald.cues = make_cue_matrix(data = mald.data, formula = ~ Word + Affix, grams = 3, wordform = "DISC")
dim(mald.cues$matrices$C)
# 8362 7610
3.2 pseudowords
The cue matrix for pseudowords \(engs.C\) is computed based on the ENGS data set and \(Word+Affix\), i.e. words and affixes are considered.
engs.cues = make_cue_matrix(data = engs.data, formula = ~ Word + Affix, grams = 3, wordform = "DISC")
dim(engs.cues$matrices$C)
# 78 78
3.3 combined
The cue matrix for real and nonce words \(comb.C\) is computed based on the combined data set and \(Word+Affix\), i.e. words and affixes are considered.
comb.cues = make_cue_matrix(data = comb.data, formula = ~ Word + Affix, grams = 3, wordform = "DISC")
dim(comb.cues$matrices$C)
# 8440 7610
4 S matrices
Semantic matrices contain the semantic vectors of all word forms of a data set in phonological form (rows), i.e. their pertinent values on all lexome dimensions (columns).
word | meat | mechanic | mechanical | mechanism | medal |
---|---|---|---|---|---|
it | 0.0118417 | -0.0001872 | 0.0002885 | -0.0002331 | 1.98e-05 |
Ek5 | 0.0005983 | -0.0000384 | -0.0000922 | -0.0000899 | -2.05e-05 |
IklIps | 0.0000350 | -0.0003962 | -0.0000451 | -0.0000278 | -9.70e-06 |
4.1 real words
The semantic matrix for real words \(mald.S\) is computed based on the MALD data set, the MALD \(A\) matrix, and \(Word+Affix\), i.e. words and affixes are considered.
mald.Smat = make_S_matrix(data = mald.data, formula = ~ Word + Affix, inputA = mald.Amat, grams = 3, wordform = "DISC")
dim(mald.Smat)
# 8362 5487
4.2 pseudowords
We can create a semantic matrix for pseudowords \(engs.S\) based on the semantics of real words by solving the following equation
\[engs.Smat = engs.C * F\] with \(engs.Smat\) being the semantic matrix for pseudowords, \(engs.C\) being the cue matrix of pseudowords, and \(F\) being the transformation matrix for mapping real word cues onto real word semantics.
To obtain this real word transformation matrix \(F\) we first need to learn comprehension of real words.
Then, we can extract \(F\) from mald.comp.
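Neither chunk is rendered above; a minimal sketch of both steps, assuming the WpmWithLdl function learn_comprehension and that the resulting comprehension object stores the transformation matrix as F:

## learn comprehension of real words (learn_comprehension is assumed from WpmWithLdl)
mald.comp = learn_comprehension(cue_obj = mald.cues, S = mald.Smat)
## extract the transformation matrix (the accessor name is an assumption)
F = mald.comp$F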
The following toy example illustrates the use of the \(F\) transformation matrix in mapping \(C\) onto \(S\), i.e. solving \[mald.C * F = predicted.mald.Smat\] The lexicon of this toy example includes the words cat, bus, and eel. Thus, its cue matrix \(C\) looks as follows:
For the same toy lexicon, suppose that the semantic vectors for these three words are the row vectors of the following semantic matrix \(S\):
Then, we are interested in a transformation matrix \(F\), such that
\[C*F=S\] The transformation matrix \(F\) is straightforward to obtain. Let \(C'\) denote the Moore-Penrose generalized inverse of \(C\). Then,
\[F=C'S\]
Thus, for the toy lexicon example, \(F\) is obtained directly from \(C'\) and \(S\), with \(CF\) being exactly equal to \(S\) in this simple toy example.
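As the toy matrices themselves are not rendered above, here is a self-contained sketch with made-up triphones and semantic values:

library(MASS) # for ginv(), the Moore-Penrose generalized inverse

## toy cue matrix C: rows = words, columns = triphones
C <- rbind(cat = c(1, 1, 1, 0, 0, 0, 0, 0),
           bus = c(0, 0, 0, 1, 1, 1, 0, 0),
           eel = c(0, 0, 0, 0, 0, 0, 1, 1))
colnames(C) <- c("#ka", "kat", "at#", "#bu", "bus", "us#", "#il", "il#")

## toy semantic matrix S: rows = words, columns = semantic dimensions
S <- rbind(cat = c(1.0, 0.1),
           bus = c(0.2, 0.9),
           eel = c(0.8, 0.3))

## F = C'S, with C' the Moore-Penrose generalized inverse of C
F <- ginv(C) %*% S

## mapping C onto S: CF reproduces S exactly in this simple case
round(C %*% F, 10)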
Additionally, for matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the second matrix (e.g. Nykamp, 2020). While this is easily achieved, one must also consider the order of columns. This is easily illustrated by the following example:
\[ \left(\begin{array}{cc} A & B\\ C & D \end{array}\right) \left(\begin{array}{cc} \alpha & \beta \\ \gamma & \delta \end{array}\right) = \left(\begin{array}{cc} A\alpha+B\gamma & A\beta+B\delta \\ C\alpha+D\gamma & C\beta+D\delta \end{array}\right) \] Let the first matrix be a simplified version of \(C_{pseudowords}\) and the second matrix a simplified version of \(F\). The resulting matrix then is a simplified version of \(\hat{S}\). However, if the order of columns (i.e. triphones) in \(C_{realwords}\), from which \(F\) is derived, is different to the order of columns (i.e. triphones) in \(C_{pseudowords}\), multiplication is possible, yet different (wrong) columns and rows are multiplied, as shown in the following example:
\[ \left(\begin{array}{cc} B & A \\ D & C \end{array}\right) \left(\begin{array}{cc} \alpha & \beta \\ \gamma & \delta \end{array}\right) = \left(\begin{array}{cc} B\alpha+A\gamma & B\beta+A\delta \\ D\alpha+C\gamma & D\beta+C\delta \end{array}\right) \]
Thus, we must make sure to use a pseudoword cue matrix \(C_{pseudowords}\) with the same order of columns as the real word cue matrix \(C_{realwords}\). This, in turn, also requires expanding the pseudoword cue matrix \(C_{pseudowords}\): to contain all column names of the real word cue matrix \(C_{realwords}\), it must have an identical number of columns.
Using two functions of the LDLConvFunctions package, this is easily done. First, find which column in the real word cue matrix \(C_{realwords}\) corresponds to which column in the pseudoword cue matrix \(C_{pseudowords}\):
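A sketch of this step, assuming the package's find_triphones function (its output is the found_triphones object used below):

found_triphones <- find_triphones(pseudo_C_matrix = engs.cues, real_C_matrix = mald.cues)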
Then, use this information to create a new pseudoword cue matrix \(C\) which a) has the same number of columns as the real word cue matrix, and b) the same order of columns (i.e. triphones) as the real word cue matrix:
pseudo.C.new <- reorder_pseudo_C_matrix(pseudo_C_matrix = engs.cues, real_C_matrix = mald.cues, found_triphones = found_triphones)
Finally, we can solve the aforementioned equation \(engs.Smat = engs.C * F\).
engs.Smat = pseudo.C.new %*% F
dim(engs.Smat)
# 78 5487
# change rownames to pseudoword rownames (instead of numbers)
rownames(engs.Smat) <- engs.data$DISC
In this semantic matrix, 48 pseudowords (i.e. 78 pronunciations) and their semantic vectors are contained. However, all of our pseudoword pairs are homophones; thus, they show identical semantic vectors even though half of them contain a suffix, e.g. the semantic vectors for bloufs (non-morphemic S) and bloufs (plural S) are identical.
word | fierce | fifteen | fifteenth | fifth | fifty |
---|---|---|---|---|---|
bl6fs | -0.0001453 | 0.0002496 | 0.0002782 | 0.0006957 | -0.0001303 |
bl6fs | -0.0001453 | 0.0002496 | 0.0002782 | 0.0006957 | -0.0001303 |
bl6ks | 0.0000021 | -0.0006015 | 0.0000090 | -0.0012144 | 0.0003275 |
bl6ks | 0.0000021 | -0.0006015 | 0.0000090 | -0.0012144 | 0.0003275 |
We “fix” this by adding the “PL” semantic vector values to the semantic vectors of plural pseudowords (idea by Yu-Ying).
Thus, we first need to extract the “PL” semantic vector contained in the general \(A\) matrix.
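A minimal sketch of this extraction, assuming the plural lexome is stored under the row name "PL" (cf. the note in Section 2):

## extract the "PL" semantic vector from the A matrix
PL.vec <- mald.Amat["PL", ]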
Taking a closer look at the pseudoword semantic matrix \(engs.S\), we find that every second row corresponds to a plural pseudoword, i.e. this is where we add the “PL” vector.
Taking this plural subset, we add the “PL” semantic vector to all rows.
We then recombine the original monomorphemic semantic matrix and the modified plural semantic matrix, and recreate the original row order.
engs.Smat.old <- engs.Smat                        # keep a copy to restore the original row order later
engs.nm = engs.Smat[seq(1, nrow(engs.Smat), 2), ] # odd rows: non-morphemic pseudowords
engs.pl = engs.Smat[seq(2, nrow(engs.Smat), 2), ] # even rows: plural pseudowords
engs.pl.added <- sweep(engs.pl, 2, PL.vec, "+")   # add the "PL" semantic vector to each plural row
engs.Smat <- rbind(engs.nm, engs.pl.added)
sort.rows <- row.names(engs.Smat.old)
engs.Smat <- engs.Smat[order(match(rownames(engs.Smat), sort.rows)), , drop = FALSE]
Now, homophonous pseudowords have differing semantic vectors.
word | fierce | fifteen | fifteenth | fifth | fifty |
---|---|---|---|---|---|
bl6fs | -0.0001453 | 0.0002496 | 0.0002782 | 0.0006957 | -0.0001303 |
bl6fs | -0.0000796 | 0.0001372 | 0.0003686 | 0.0003187 | -0.0002868 |
bl6ks | 0.0000021 | -0.0006015 | 0.0000090 | -0.0012144 | 0.0003275 |
bl6ks | 0.0000678 | -0.0007139 | 0.0000994 | -0.0015913 | 0.0001709 |
4.3 combined
Finally, we can create one combined semantic matrix \(comb.S\) for real and nonce words.
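A minimal sketch of this step, assuming simple row-binding of the two semantic matrices:

## combine real word and pseudoword semantic matrices
comb.Smat <- rbind(mald.Smat, engs.Smat)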
However, WpmWithLdl functions in R are sensitive to row order. Thus, we need to recreate the original row order as found in the combined cue matrix \(comb.C\).
all(rownames(comb.cues$matrices$C) == rownames(comb.Smat))
# FALSE
comb.Smat2 <- comb.Smat[rownames(comb.cues$matrices$C),,drop=FALSE]
all(rownames(comb.cues$matrices$C) == rownames(comb.Smat2))
# TRUE
5 Comprehension
Using \(comb.C\) and \(comb.S\), we can now learn and evaluate comprehension of real and nonce words in one go, i.e. as if they are encountered by a person with their individual lexicon.
First, comprehension is learned.
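The learning step itself is not rendered above; a minimal sketch, assuming the WpmWithLdl function learn_comprehension:

comb.comp = learn_comprehension(cue_obj = comb.cues, S = comb.Smat2)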
Second, the accuracy of this comprehension is evaluated, i.e. how well form (the cue matrix) can be mapped onto meaning (the semantic matrix). The result of this mapping, a predicted semantic matrix, is obtained as \(\hat{S}\).
comb.comp_acc = accuracy_comprehension(m = comb.comp, data = comb.data, wordform = "DISC", show_rank = TRUE, neighborhood_density = TRUE)
comb.comp_acc$acc # 0.7440758
We find that comprehension accuracy is about \(74\%\).
6 Production
Using \(comb.C\) and \(comb.S\) as well as the comprehension results, we now are able to learn production and evaluate its accuracy.
First, production is learned.
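Again, the learning step is not rendered above; a minimal sketch, assuming the WpmWithLdl function learn_production and its comp argument:

comb.prod = learn_production(cue_obj = comb.cues, S = comb.Smat2, comp = comb.comp)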
Second, the accuracy of this production is evaluated, i.e. how well meaning (the semantic matrix) can be mapped onto form (the cue matrix). The result of this mapping, a predicted cue matrix, is obtained as \(\hat{C}\). Note: \(C\) and \(\hat{C}\) are sometimes referred to as \(T\) and \(\hat{T}\) when talking about production (T as in target matrix).
comb.prod_acc = accuracy_production(m = comb.prod, data = comb.data, wordform = "DISC", full_results = TRUE, return_triphone_supports = TRUE)
comb.prod_acc$acc # 0.9727488
We find that production accuracy is about \(97\%\).
7 Measures
7.1 Comprehension
Extracting and saving comprehension measures is straightforward.
comb.comp_measures = comprehension_measures(comp_acc = comb.comp_acc, Shat = comb.comp$Shat, S = comb.Smat2, A = mald.Amat, affixFunctions = comb.data$Affix)
A brief explanation for each measure is given here:
cor_max
The maximum of correlations between a given word’s predicted semantics \(\hat{s}\) and all other words’ semantics in \(comb.Smat\). For this, correlations between a word’s predicted semantic vector \(\hat{s}\) and all semantic vectors of \(comb.Smat\) are computed. Then, the highest of the correlation values is taken as cor_max. Higher values of this variable indicate a close semantic neighbour, while lower values indicate a larger distance to a word’s nearest semantic neighbour.
cor_target
cor_target describes the correlation between a word’s predicted semantic vector \(\hat{s}\) in \(comb.\hat{S}\) and its targeted semantic vector \(s\) in \(comb.S\). Higher values indicate a good mapping from form to meaning, i.e. a higher semantic certainty.
l1norm
The l1norm is the sum of the absolute values of the vector elements of a given word’s predicted semantic vector \(\hat{s}\), i.e. its city-block distance. Higher values imply strong links to many other lexomes. Thus, this measure may be interpreted as semantic activation diversity.
l2norm
The l2norm is the square root of the sum of the squared values of a given word’s predicted vector \(\hat{s}\), i.e. its Euclidean distance. Higher values imply strong links to many other lexomes. Thus, this measure may be interpreted as semantic activation diversity.
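A toy illustration of the two norms for a short made-up vector:

s_hat <- c(0.3, -0.2, 0.1) # made-up predicted semantic vector
sum(abs(s_hat))            # l1norm (city-block length): 0.6
sqrt(sum(s_hat^2))         # l2norm (Euclidean length): ~0.374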
rank
Given the correlation values of a word’s predicted semantic vector \(\hat{s}\) and its eight nearest neighbours’ semantic vectors \(s_{n1}...s_{n8}\), ordinal values from \(1\) to \(1+n\) are distributed, with a value of \(1\) given to the word with the highest correlation value. That is, if a word’s predicted semantic vector \(\hat{s}\) is most highly correlated with the targeted word’s semantic vector \(s\), its rank will be \(1\). If a word’s predicted semantic vector \(\hat{s}\) is most highly correlated with any other but the targeted word’s semantic vector, its rank will be \(1+n\). Thus, a value of \(1\) indicates that a predicted semantic vector is closest to its targeted semantic vector, while values of \(1+n\) indicate a mismatch, i.e. a comprehension failure. This measure can be taken as a goodness of comprehension indicator.
density
The correlation values of a word’s predicted semantic vector \(\hat{s}\) and its eight nearest neighbours’ semantic vectors \(s_{n1}...s_{n8}\) are taken into consideration. The mean of these eight correlation values is given in density. Thus, higher values indicate a denser semantic neighbourhood.
cor_affix
This variable describes the correlation of a word’s estimated semantic vector \(\hat{s}\) with the vector of its affix \(s_{affix}\) in \(S\). That is, only words containing affixes are ascribed a cor_affix value. Higher values indicate a higher correlation of estimated word semantics and affix semantics, thus acting as a measure of semantic transparency.
7.2 Production
Extracting and saving production measures runs into problems. It appears that there is 1 word for which comb.prod_acc does not contain all necessary information. Why? I have no clue.
I worked around this problem by removing the word from full, i.e. the list in which all necessary info is contained within comb.prod_acc.
Using the summary() function, the case with missing info is identified. I call this newly created data frame nineortwelve, as complete entries consist of 12 entries per word, while incomplete entries consist of only 9 entries per word.
Checking this data frame, the following entry can be found with only 9 individual entries instead of 12. Thus, it is removed from full2.
Fun fact: entry 3725 belongs to the word inauguration.
This reduced full2 then replaces the original full in a copy of comb.prod_acc.
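A sketch of this workaround; the accessor comb.prod_acc$full and the use of the Length column are assumptions:

full <- comb.prod_acc$full                   # list with all necessary info per word
nineortwelve <- as.data.frame(summary(full)) # Length column: 12 = complete, 9 = incomplete
which(nineortwelve$Length == 9)              # returns 3725, i.e. inauguration
full2 <- full[-3725]                         # remove the incomplete entry
comb.prod_acc2 <- comb.prod_acc              # copy of comb.prod_acc...
comb.prod_acc2$full <- full2                 # ...with the reduced list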
Only then can production measures be obtained.
comb.prod_measures = production_measures(prod_acc = comb.prod_acc2, S = comb.Smat2, prod = comb.prod)
A brief explanation for each measure is given here:
correlations
Given all candidate forms of a word, i.e. all word forms found as candidate productions by the production model, and their estimated semantic vectors \(\hat{s}_{candidate1}...\hat{s}_{candidate1+n}\), correlation values between these estimated semantic vectors and a word’s semantic vector \(s\) are computed. The highest of these correlation values is taken as the value of correlations. Similar to rank for comprehension, this may be interpreted as a measure of goodness of production.
path_counts
path_counts describes the number of paths detected for the production of a word by the production model. path_counts may be interpreted as a measure of phonological activation diversity, as higher values indicate the existence of multiple candidates (and thus paths) in production.
path_sum
path_sum describes the summed support of paths for a predicted form. path_sum may be interpreted as a measure of phonological certainty, with higher values indicating a higher certainty in the candidate form.
path_entropies
path_entropies contains the Shannon entropy values which are calculated over the path supports of the predicted form in \(\hat{C}\). Thus, path_entropies may be interpreted as a measure of phonological uncertainty, with higher values indicating a higher level of disorder, i.e. uncertainty.
lwlr
The length-weakest link ratio, lwlr, is computed by taking the number of path nodes divided by the value of the weakest link of that path. Higher values of lwlr indicate less support for a predicted form \(\hat{c}\). Thus, lwlr may be interpreted as a measure of phonological uncertainty, with higher values indicating less certainty.
7.3 LDLConvFunctions
A third set of measures consists of the variables introduced by Chuang et al. (2020). To extract these measures, we can use the LDLConvFunctions package.
# remove 'inauguration' from data set
comb.data2 <- subset(comb.data, Word != "inauguration")
# load package
library(LDLConvFunctions)
# compute measures
ALCframe <- ALC(pseudo_S_matrix = engs.Smat, real_S_matrix = mald.Smat, pseudo_word_data = engs.data)
ALDCframe <- ALDC(prod_acc = comb.prod_acc2, data = comb.data2)
EDNNframe <- EDNN(comprehension = comb.comp, data = comb.data)
NNCframe <- NNC(pseudo_S_matrix = engs.Smat, real_S_matrix = comb.Smat2, pseudo_word_data = engs.data, real_word_data = mald.data)
SCPPframe <- SCPP(prod_measures = comb.prod_measures, data = comb.data2)
A brief explanation for each measure is given here:
ALC
The Average Lexical Correlation is the mean value of all correlation values of a pseudoword’s estimated semantic vector as contained in \(engs.S\) with each of the real word semantic vectors contained in \(mald.S\). Higher ALC values indicate that a pseudoword’s semantics are part of a denser semantic neighbourhood. Thus, ALC may be interpreted as a measure of semantic activation diversity for pseudowords.
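As a toy illustration of the definition (not the package’s implementation), ALC for the first pseudoword could be computed as:

## correlate pseudoword 1's semantic vector with every real word semantic vector
cors <- apply(mald.Smat, 1, function(s) cor(engs.Smat[1, ], s))
ALC1 <- mean(cors) # Average Lexical Correlation of pseudoword 1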
ALDC
The Average Levenshtein Distance of all Candidate productions is the mean of all Levenshtein distances of a word and its candidate forms. That is, for a word with only one candidate form, the Levenshtein distance between that word and its candidate form is its ALDC. For words with multiple candidates, the mean of the individual Levenshtein distances between candidates and targeted form constitutes the ALDC. Thus, higher values indicate that a word’s candidate forms are very different from the intended pronunciation. ALDC may be interpreted as a measure of phonological neighbourhood density, i.e. large values indicate sparse neighbourhoods.
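A toy illustration of the underlying Levenshtein computation in base R (the candidate forms are made up):

## target bl6fs with two hypothetical candidate productions
mean(adist(c("bl6fs", "bl6f"), "bl6fs")) # distances 0 and 1, i.e. ALDC = 0.5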
EDNN
This variable describes the Euclidean Distance between a word’s estimated semantic vector \(\hat{s}\) and its Nearest semantic Neighbour. Thus, higher values indicate a larger distance to the nearest semantic neighbour. EDNN may be regarded as a measure of semantic neighbourhood density.
NNC
The Nearest Neighbour Correlation is computed by taking a pseudoword’s estimated semantic vector as given in \(engs.S\) and checking it for the highest correlation value against all real word semantic vectors as given in \(mald.S\). This highest correlation value is taken as NNC value. Thus, higher values indicate that a pseudoword is semantically close to a real word. Additionally, one can tell which real word a pseudoword’s semantics are closest to. This measure may be interpreted as certainty in pseudoword semantics (?).
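Analogously, a sketch of NNC for the same pseudoword, reusing the cors vector from the ALC sketch above:

NNC1 <- max(cors)                    # highest correlation with any real word
rownames(mald.Smat)[which.max(cors)] # the real word this pseudoword is closest to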
SCPP
The Semantic Correlation of Predicted Production describes the highest correlation between the estimated semantic vector \(\hat{s}\) of a pseudoword and any of the semantic vectors \(\hat{s}_{candidate1}...\hat{s}_{candidate1+n}\) generated from that word’s candidate forms. Thus, this measure is identical to the correlations measure introduced above as part of the WpmWithLdl package.
8 Analyses
In total, three separate analyses are presented. First, an analysis based on the WpmWithLdl measures is conducted. Second, a similar analysis with LDLConvFunctions measures is done. Third, all measures are combined for a final analysis.
The data set used in the remainder of this script contains the following parts:
- ENGS production data (Schmitz et al., 2020)
- LDL measures as extracted by the WpmWithLdl package
- LDL measures as extracted by the LDLConvFunctions package
The data set contains 52 columns and 653 rows, i.e. 653 observations of 52 variables.
9 WpmWithLdl measures
WpmWithLdl includes the following measures via the comprehension_measures and production_measures functions (see Sections 7.1 and 7.2 above).
Without discussing the underlying implications of the individual measures, let’s see whether some of them are correlated. If so, we need to take precautions before including them in any models.
9.1 Correlations
Correlations of all WpmWithLdl measures:
pairscor.fnc(data[,c("cor_max", "cor_target", "l1norm", "l2norm", "density", "cor_affix",
"correlations", "path_counts", "path_sum", "path_entropies", "lwlr")])
Note: rank is not considered as a predictive variable as its value is \(1\) for all observations. Affix is not correlated (\(-0.08<\rho<0.16\)) with any of the included measures.
We find the following highly correlated (\(\rho\ge0.5\)) variable combinations:
- cor_max vs. cor_target, \(\rho=1\)
- cor_max vs. correlations, \(\rho=1\)
- cor_target vs. correlations, \(\rho=1\)
- l1norm vs. l2norm, \(\rho=0.94\)
- density vs. cor_affix, \(\rho=0.84\)
- path_counts vs. path_entropies, \(\rho=0.95\)
- path_sum vs. lwlr, \(\rho=-0.76\)
Thus, we need to take precautions to avoid problems of collinearity later on.
9.2 PCA
To avoid collinearity issues when modelling, we combine all highly correlated variables in a principal component analysis.
PCA Computation
First, we extract the highly correlated variables and merge them to form a new data frame.
## extract the highly correlated variables into one data frame;
## cor_affix is dropped as it only contains data on PL cases
pca.data <- data[, c("cor_max", "cor_target", "correlations",
                     "l1norm", "l2norm", "density",
                     "path_counts", "path_entropies", "path_sum", "lwlr")]
Then, we compute a principal component analysis using the princomp function:
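The call itself is not rendered above; a sketch, with cor = TRUE since the eigenvalues below sum to the number of variables, i.e. the PCA runs on the correlation matrix:

pc <- princomp(pca.data, cor = TRUE)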
In order to decide which components to keep, we follow three criteria:
- The Eigenvalue-criterion: Any component that displays an eigenvalue greater than 1.00 is accounting for a greater amount of variance than had been contributed by one variable. Such a component is therefore accounting for a meaningful amount of variance, and is worthy of being retained.
- The scree-criterion: Plot the eigenvalues associated with each component and look for a “break” between the components with relatively large eigenvalues and those with small eigenvalues. The components that appear before the break are assumed to be meaningful and are retained (Cattell, 1966).
- The proportion-of-variance-criterion: Retain enough components so that the cumulative percent of variance accounted for is equal to some minimal value, i.e. 80 percent.
# load package to extract Eigenvalues
library("factoextra")
# extract Eigenvalues
get_eigenvalue(pc)
eigenvalue variance.percent cumulative.variance.percent
Dim.1 3.155143e+00 3.155143e+01 31.55143
Dim.2 2.782508e+00 2.782508e+01 59.37650
Dim.3 2.002288e+00 2.002288e+01 79.39938
Dim.4 1.111486e+00 1.111486e+01 90.51424
Dim.5 6.368246e-01 6.368246e+00 96.88248
Dim.6 2.192438e-01 2.192438e+00 99.07492
Dim.7 6.284442e-02 6.284442e-01 99.70337
Dim.8 2.966341e-02 2.966341e-01 100.00000
Dim.9 7.170136e-17 7.170136e-16 100.00000
Dim.10 0.000000e+00 0.000000e+00 100.00000
Taking all three criteria into consideration, we find that Components 1, 2, and 3 show an Eigenvalue of \(v\ge1\), are followed by a break in Eigenvalues with the Eigenvalue of Component 3 being \(v=2\) and the Eigenvalue of Component 4 being \(v=1.1\), and that the cumulative percentage of variance accounted for by all three components is \(\sigma^2=79.5\%\), i.e. close to our target level of \(\sigma^2=80\%\).
Thus, we add components 1, 2, and 3 to our data frame.
pcframe <- as.data.frame(pc$scores)
pcframe <- pcframe[1:3]
names(pcframe)[names(pcframe) == "Comp.1"] <- "WWL_Comp.1"
names(pcframe)[names(pcframe) == "Comp.2"] <- "WWL_Comp.2"
names(pcframe)[names(pcframe) == "Comp.3"] <- "WWL_Comp.3"
data <- cbind(data, pcframe)
PCA Loadings
Now we can take a closer look at the individual components and their loadings.
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
cor_max 0.462 0.339 0.349
cor_target 0.462 0.339 0.465
correlations 0.462 0.339 -0.814
l1norm 0.211 -0.179 -0.579 0.179 -0.247 -0.509 0.484
l2norm 0.213 -0.170 -0.592 -0.242 -0.158 0.501 -0.483
density -0.246 0.682 0.665 0.120
path_counts 0.296 -0.413 -0.396 0.283 -0.505 -0.499
path_entropies 0.279 -0.408 -0.465 0.190 0.131 0.474 0.510
path_sum -0.208 0.297 -0.377 -0.272 0.516 -0.609 0.119
lwlr 0.229 -0.328 0.403 0.219 -0.224 -0.755
Comp.10
cor_max 0.738
cor_target -0.671
correlations
l1norm
l2norm
density
path_counts
path_entropies
path_sum
lwlr
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
SS loadings 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Proportion Var 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
Cumulative Var 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Comp.10
SS loadings 1.0
Proportion Var 0.1
Cumulative Var 1.0
But what do these loadings conceptually mean?
Component 1
Component 1 is rather highly correlated with cor_max, cor_target, and correlations.
For cor_max, higher values indicate a close semantic neighbour.
For cor_target, higher values indicate a good mapping from form to meaning, i.e. a higher semantic certainty.
For correlations, higher values indicate a higher goodness of production by means of fitting semantics.
Note: What is correlations again? Given all candidate forms of a word, i.e. all word forms found as candidate productions by the production model, and their estimated semantic vectors \(\hat{s}_{candidate1}...\hat{s}_{candidate1+n}\), correlation values between these estimated semantic vectors and a word’s semantic vector \(s\) are computed. The highest of these correlation values is taken as the value of correlations.
Thus, for Component 1, high values indicate higher semantic certainty, while lower values indicate higher semantic uncertainty.
Component 2
Component 2 is rather highly negatively correlated with path_counts and path_entropies.
For path_counts, higher values indicate the existence of multiple candidates (and thus paths) in production. This measure may be interpreted as a measure of phonological activation diversity.
For path_entropies, higher values indicate a higher level of disorder, i.e. uncertainty.
In sum, Component 2 reflects the certainty in production: given the negative loadings of path_counts and path_entropies, high component values indicate high certainty.
Component 3
Component 3 is rather highly negatively correlated with l1norm and l2norm.
For l1norm, higher values imply strong links to many other lexomes.
For l2norm, higher values imply strong links to many other lexomes.
Both l1norm and l2norm may be interpreted as measures of semantic activation diversity.
In sum, Component 3 reflects semantic activation diversity, i.e. high component values indicate low semantic activation diversity.
9.3 GAMM
Let’s see whether the PCA components are significant predictors for word-final /s/ duration in singular and plural pseudowords. We will use Generalized Additive Mixed Models (GAMM) to model /s/ durations.
Modelling
Step 0
First, create a model with all relevant variables, including control and PCA variables, and speaker as random effect:
gam.WpmWithLdl.1 <- bam(sDurLog ~
Affix +
s(WWL_Comp.1) +
s(WWL_Comp.2) +
s(WWL_Comp.3) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
Checking the anova() function’s output, we now exclude non-significant variables in a stepwise fashion. After each potential exclusion, compareML() is used to check whether the exclusion is justified in terms of model fit.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(WWL_Comp.1) + s(WWL_Comp.2) + s(WWL_Comp.3) +
SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 30.020 6.29e-08
SylPerMin 1 817.699 < 2e-16
pauseBin 1 39.214 7.22e-10
folType 4 4.915 0.000662
preC 3 1.658 0.174939
monoMultilingual 1 0.452 0.501848
Approximate significance of smooth terms:
edf Ref.df F p-value
s(WWL_Comp.1) 1.000 1.001 0.012 0.914079
s(WWL_Comp.2) 3.105 3.943 1.424 0.222711
s(WWL_Comp.3) 3.182 3.926 5.524 0.000316
s(speaker) 31.041 38.000 4.992 < 2e-16
Step 1
Exclusion of WWL_Comp.1.
gam.WpmWithLdl.2 <- bam(sDurLog ~
Affix +
#s(WWL_Comp.1) +
s(WWL_Comp.2) +
s(WWL_Comp.3) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.WpmWithLdl.1, gam.WpmWithLdl.2)
gam.WpmWithLdl.1: sDurLog ~ Affix + s(WWL_Comp.1) + s(WWL_Comp.2) + s(WWL_Comp.3) +
SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
gam.WpmWithLdl.2: sDurLog ~ Affix + s(WWL_Comp.2) + s(WWL_Comp.3) + SylPerMin +
pauseBin + folType + preC + monoMultilingual + s(speaker,
bs = "re")
Model gam.WpmWithLdl.2 preferred: lower REML score (3.083), and lower df (2.000).
-----
Model Score Edf Difference Df
1 gam.WpmWithLdl.1 -148.2247 19
2 gam.WpmWithLdl.2 -151.3075 17 -3.083 2.000
AIC difference: 4.61, model gam.WpmWithLdl.2 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(WWL_Comp.2) + s(WWL_Comp.3) + SylPerMin +
pauseBin + folType + preC + monoMultilingual + s(speaker,
bs = "re")
Parametric Terms:
df F p-value
Affix 1 28.318 1.45e-07
SylPerMin 1 819.008 < 2e-16
pauseBin 1 38.521 1.00e-09
folType 4 4.645 0.00106
preC 3 1.439 0.23027
monoMultilingual 1 0.430 0.51207
Approximate significance of smooth terms:
edf Ref.df F p-value
s(WWL_Comp.2) 1.000 1.001 2.644 0.104453
s(WWL_Comp.3) 3.093 3.891 4.927 0.000684
s(speaker) 31.068 38.000 5.007 < 2e-16
Step 2
Exclusion of WWL_Comp.2.
gam.WpmWithLdl.3 <- bam(sDurLog ~
Affix +
#s(WWL_Comp.1) +
#s(WWL_Comp.2) +
s(WWL_Comp.3) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.WpmWithLdl.2, gam.WpmWithLdl.3)
gam.WpmWithLdl.2: sDurLog ~ Affix + s(WWL_Comp.2) + s(WWL_Comp.3) + SylPerMin +
pauseBin + folType + preC + monoMultilingual + s(speaker,
bs = "re")
gam.WpmWithLdl.3: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
preC + monoMultilingual + s(speaker, bs = "re")
Model gam.WpmWithLdl.3 preferred: lower REML score (2.707), and lower df (2.000).
-----
Model Score Edf Difference Df
1 gam.WpmWithLdl.2 -151.3075 17
2 gam.WpmWithLdl.3 -154.0144 15 -2.707 2.000
AIC difference: -1.38, model gam.WpmWithLdl.2 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
preC + monoMultilingual + s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 26.856 2.99e-07
SylPerMin 1 816.208 < 2e-16
pauseBin 1 38.939 8.21e-10
folType 4 4.850 0.000742
preC 3 1.368 0.251565
monoMultilingual 1 0.445 0.505057
Approximate significance of smooth terms:
edf Ref.df F p-value
s(WWL_Comp.3) 3.185 4.002 4.819 0.000775
s(speaker) 30.943 38.000 4.915 < 2e-16
Step 3
Exclusion of monoMultilingual.
gam.WpmWithLdl.4 <- bam(sDurLog ~
Affix +
#s(WWL_Comp.1) +
#s(WWL_Comp.2) +
s(WWL_Comp.3) +
SylPerMin+
pauseBin+
folType+
preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.WpmWithLdl.3, gam.WpmWithLdl.4)
gam.WpmWithLdl.3: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
preC + monoMultilingual + s(speaker, bs = "re")
gam.WpmWithLdl.4: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
preC + s(speaker, bs = "re")
Model gam.WpmWithLdl.4 preferred: lower REML score (2.044), and lower df (1.000).
-----
Model Score Edf Difference Df
1 gam.WpmWithLdl.3 -154.0144 15
2 gam.WpmWithLdl.4 -156.0587 14 -2.044 1.000
AIC difference: 0.33, model gam.WpmWithLdl.4 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
preC + s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 26.825 3.03e-07
SylPerMin 1 831.268 < 2e-16
pauseBin 1 38.776 8.88e-10
folType 4 4.838 0.000757
preC 3 1.353 0.256172
Approximate significance of smooth terms:
edf Ref.df F p-value
s(WWL_Comp.3) 3.191 4.009 4.829 0.000752
s(speaker) 31.659 39.000 4.842 < 2e-16
Step 4
Exclusion of preC.
gam.WpmWithLdl.5 <- bam(sDurLog ~
Affix +
#s(WWL_Comp.1) +
#s(WWL_Comp.2) +
s(WWL_Comp.3) +
SylPerMin+
pauseBin+
folType+
#preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.WpmWithLdl.4, gam.WpmWithLdl.5)
gam.WpmWithLdl.4: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
preC + s(speaker, bs = "re")
gam.WpmWithLdl.5: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Model gam.WpmWithLdl.5 preferred: lower REML score (7.281), and lower df (3.000).
-----
Model Score Edf Difference Df
1 gam.WpmWithLdl.4 -156.0587 14
2 gam.WpmWithLdl.5 -163.3395 11 -7.281 3.000
AIC difference: 1.40, model gam.WpmWithLdl.5 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 26.473 3.61e-07
SylPerMin 1 850.839 < 2e-16
pauseBin 1 39.885 5.19e-10
folType 4 5.143 0.000442
Approximate significance of smooth terms:
edf Ref.df F p-value
s(WWL_Comp.3) 3.151 3.964 4.636 0.00106
s(speaker) 31.592 39.000 4.785 < 2e-16
Summary
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.8722959 0.0473099 -18.438 < 2e-16 ***
AffixPL -0.0754263 0.0146596 -5.145 3.61e-07 ***
SylPerMin -0.5032766 0.0172537 -29.169 < 2e-16 ***
pauseBinpause 0.1122957 0.0177810 6.315 5.19e-10 ***
folTypeF -0.0185203 0.0550083 -0.337 0.736472
folTypeN 0.0002509 0.0211516 0.012 0.990540
folTypeP 0.0095421 0.0185635 0.514 0.607421
folTypeV -0.0672956 0.0188033 -3.579 0.000372 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(WWL_Comp.3) 3.151 3.964 4.636 0.00106 **
s(speaker) 31.592 39.000 4.785 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.804 Deviance explained = 81.7%
-REML = -163.34 Scale est. = 0.029512 n = 653
Summary:
* Affix : factor; set to the value(s): PL.
* SylPerMin : numeric predictor; set to the value(s): 2.336339423.
* pauseBin : factor; set to the value(s): no_pause.
* folType : factor; set to the value(s): APP.
* WWL_Comp.3 : numeric predictor; with 30 values ranging from -3.513786 to 3.834930.
* speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.)
* NOTE : The following random effects columns are canceled: s(speaker)
As we only have a few values to the far left, but a clearly linear slope to the right for a majority of data points, these results may indicate a rather linear relationship. That is, higher WWL_Comp.3 values lead to longer /s/ durations, i.e. a lower semantic activation diversity leads to longer /s/ durations.
9.4 LMER
Let’s see whether a linear mixed effects regression model does a similarly good job as compared to the previous GAMM model.
Modelling
First, create a model with all relevant variables, including control and PCA variables, and speaker and monoMultilingual as random effects:
lmer.1 <- lmer(sDurLog ~
Affix +
WWL_Comp.1 +
WWL_Comp.2 +
WWL_Comp.3 +
SylPerMin +
pauseBin +
folType +
preC +
(1 | monoMultilingual) +
(1 | speaker),
data=data, REML=T)
anova(lmer.1)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Affix 0.8772 0.8772 1 627.56 29.3486 8.626e-08 ***
WWL_Comp.1 0.0094 0.0094 1 611.01 0.3131 0.575983
WWL_Comp.2 0.0724 0.0724 1 610.91 2.4219 0.120166
WWL_Comp.3 0.2908 0.2908 1 607.50 9.7290 0.001900 **
SylPerMin 24.3084 24.3084 1 524.79 813.2841 < 2.2e-16 ***
pauseBin 1.1917 1.1917 1 634.01 39.8705 5.105e-10 ***
folType 0.5508 0.1377 4 610.60 4.6074 0.001135 **
preC 0.1804 0.0601 3 607.23 2.0114 0.111149
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using the step() function, we exclude all non-significant predictors:
Backward reduced random-effect table:
Eliminated npar logLik AIC LRT Df Pr(>Chisq)
<none> 17 144.931 -255.86
(1 | monoMultilingual) 1 16 144.931 -257.86 0.000 1 1
(1 | speaker) 0 15 96.682 -163.37 96.498 1 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Backward reduced fixed-effect table:
Degrees of freedom method: Satterthwaite
Eliminated Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
WWL_Comp.1 1 0.0094 0.0094 1 611.01 0.3131 0.5759828
preC 2 0.1711 0.0570 3 608.37 1.9105 0.1266333
WWL_Comp.2 3 0.0708 0.0708 1 616.28 2.3597 0.1250230
Affix 0 0.8281 0.8281 1 633.30 27.5107 2.133e-07 ***
WWL_Comp.3 0 0.2137 0.2137 1 612.02 7.0983 0.0079192 **
SylPerMin 0 24.9558 24.9558 1 532.92 829.1163 < 2.2e-16 ***
pauseBin 0 1.2666 1.2666 1 639.57 42.0816 1.754e-10 ***
folType 0 0.6187 0.1547 4 615.86 5.1387 0.0004454 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model found:
sDurLog ~ Affix + WWL_Comp.3 + SylPerMin + pauseBin + folType +
(1 | speaker)
This leaves us with the following final model:
lmer.2 <- lmer(sDurLog ~
Affix +
WWL_Comp.3 +
SylPerMin +
pauseBin +
folType +
(1 | speaker),
data=data, REML=T)
anova(lmer.2)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Affix 0.8281 0.8281 1 633.30 27.5107 2.133e-07 ***
WWL_Comp.3 0.2137 0.2137 1 612.02 7.0983 0.0079192 **
SylPerMin 24.9558 24.9558 1 532.92 829.1163 < 2.2e-16 ***
pauseBin 1.2666 1.2666 1 639.57 42.0816 1.754e-10 ***
folType 0.6187 0.1547 4 615.86 5.1387 0.0004454 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Trimming
Trimming appears to be a good idea according to the residual plots. However, for some unknown reason, the trimming procedure cannot be shown in R Markdown. Thus, the detailed procedure is skipped here. Trimming results in the removal of \(n=9\) (i.e. 1.4%) data points.
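A minimal sketch of such a residual-based trimming step (the 2.5 SD cut-off is an assumption, chosen only for illustration):

## keep only data points whose standardised model residuals lie within 2.5 SD
keep <- abs(scale(resid(lmer.2))) < 2.5
data.trim <- data[as.vector(keep), ]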
lmer.final <- lmer(sDurLog ~
Affix +
WWL_Comp.3 +
SylPerMin +
pauseBin +
folType +
(1 | speaker),
data=data.trim, REML=T)
anova(lmer.final)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Affix 0.6726 0.6726 1 621.87 25.0318 7.349e-07 ***
WWL_Comp.3 0.2631 0.2631 1 602.56 9.7893 0.001840 **
SylPerMin 25.1170 25.1170 1 542.38 934.7171 < 2.2e-16 ***
pauseBin 1.0618 1.0618 1 627.72 39.5127 6.102e-10 ***
folType 0.5285 0.1321 4 605.72 4.9174 0.000659 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: sDurLog ~ Affix + WWL_Comp.3 + SylPerMin + pauseBin + folType +
(1 | speaker)
Data: data.trim
REML criterion at convergence: -379.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.62309 -0.65373 -0.00777 0.68864 2.36020
Random effects:
Groups Name Variance Std.Dev.
speaker (Intercept) 0.008885 0.09426
Residual 0.026871 0.16392
Number of obs: 644, groups: speaker, 40
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -0.832181 0.046440 389.439546 -17.920 < 2e-16 ***
AffixPL -0.070555 0.014102 621.865569 -5.003 7.35e-07 ***
WWL_Comp.3 0.014533 0.004645 602.558403 3.129 0.001840 **
SylPerMin -0.519794 0.017002 542.380437 -30.573 < 2e-16 ***
pauseBinpause 0.107248 0.017062 627.724677 6.286 6.10e-10 ***
folTypeF -0.026878 0.052474 608.948469 -0.512 0.608684
folTypeN 0.003267 0.020308 600.757647 0.161 0.872248
folTypeP 0.010356 0.017887 602.370095 0.579 0.562842
folTypeV -0.061733 0.018072 609.657388 -3.416 0.000678 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) AffxPL WWL_C. SylPrM psBnps flTypF flTypN flTypP
AffixPL -0.073
WWL_Comp.3 0.106 0.081
SylPerMin -0.892 -0.091 -0.121
pauseBinpas -0.450 -0.131 -0.115 0.422
folTypeF -0.083 0.060 -0.021 0.017 0.001
folTypeN -0.124 0.098 0.025 -0.044 -0.076 0.136
folTypeP -0.066 0.032 -0.016 -0.121 -0.094 0.153 0.409
folTypeV 0.062 -0.015 0.024 -0.244 -0.230 0.139 0.415 0.489
9.5 LMER vs. GAMM
Let’s compare the results of both the LMER and the GAMM model.
As we have seen already, the GAMM model (ignoring the left hand side with only few data points) predicts higher sDurLog values for higher values of WWL_Comp.3:
Summary:
* Affix : factor; set to the value(s): PL.
* SylPerMin : numeric predictor; set to the value(s): 2.336339423.
* pauseBin : factor; set to the value(s): no_pause.
* folType : factor; set to the value(s): APP.
* WWL_Comp.3 : numeric predictor; with 30 values ranging from -3.513786 to 3.834930.
* speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.)
* NOTE : The following random effects columns are canceled: s(speaker)
Taking into account the low number of data points, we checked whether an LMER model comes to a similar result:
Indeed, the LMER model predicts higher sDurLog values for higher values of WWL_Comp.3 as well.
Thus, both models indicate that the duration of /s/ increases with higher values of WWL_Comp.3, i.e. with lower semantic activation diversity (i.e. lower values of l1norm and l2norm).
10 LDLConvFunctions measures
LDLConvFunctions includes the following measures as implemented in Chuang et al. (2020): ALC, ALDC, EDNN, NNC, and SCPP (see Section 7.3 above).
Without discussing the underlying implications of the individual measures, let’s see whether some of them are correlated. If so, we need to take precautions before including them in any models.
10.1 Correlations
Correlations of all LDLConvFunctions measures and Affix:
We find the following highly correlated (\(\rho\ge0.5\)) variable combination:
- EDNN vs. SCPP, \(\rho=-0.86\)
- Affix vs. NNC, \(r_{pb}=-0.55\) (point-biserial correlation)
Thus, we need to take precautions to avoid problems of collinearity later on.
10.2 PCA
To avoid collinearity issues when modelling, we combine all highly correlated variables in a principal component analysis. However, as we must include Affix as a categorical variable, we cannot use the princomp function. Instead, we make use of the PCAmix function of the PCAmixdata package. This function uses multiple correspondence analysis (MCA) to include categorical data in PCAs.
PCA Computation
First, we extract the highly correlated variables and merge them to form two new data frames: one for numerical data, one for categorical data.
# numerical data
pca.data1 <- data[, c("EDNN", "SCPP", "NNC")]
# categorical data
pca.data2 <- data[colnames(data) == "Affix"]
Then, we compute a principal component analysis using the PCAmix function:
library(PCAmixdata)
res.pcamix <- PCAmix(X.quanti = pca.data1, X.quali = pca.data2, rename.level = TRUE, graph = FALSE)
In order to decide which components to keep, we follow three criteria:
- The Eigenvalue-criterion: Any component that displays an eigenvalue greater than 1.00 is accounting for a greater amount of variance than had been contributed by one variable. Such a component is therefore accounting for a meaningful amount of variance, and is worthy of being retained.
- The scree-criterion: Plot the eigenvalues associated with each component and look for a “break” between the components with relatively large eigenvalues and those with small eigenvalues. The components that appear before the break are assumed to be meaningful and are retained (Cattell, 1966).
- The proportion-of-variance-criterion: Retain enough components so that the cumulative percent of variance accounted for is equal to some minimal value, i.e. 80 percent.
Eigenvalue Proportion Cumulative
dim 1 1.8785047 46.962617 46.96262
dim 2 1.5394156 38.485390 85.44801
dim 3 0.4485290 11.213226 96.66123
dim 4 0.1335507 3.338767 100.00000
Taking all three criteria into consideration, we find that Components 1 and 2 show an Eigenvalue of \(v\ge1\), are followed by a break in Eigenvalues with the Eigenvalue of Component 2 being \(v=1.5\) and the Eigenvalue of Component 3 being \(v=0.4\), and that the cumulative percentage of variance accounted for by Components 1 and 2 is \(\sigma^2=85\%\), i.e. above our target level of \(\sigma^2=80\%\).
Thus, we add components 1 and 2 to our data frame.
pcframe <- as.data.frame(res.pcamix$scores)
pcframe <- pcframe[1:2]
names(pcframe)[names(pcframe) == "dim 1"] <- "LCF_Comp.1b"
names(pcframe)[names(pcframe) == "dim 2"] <- "LCF_Comp.2b"
data <- cbind(data, pcframe)
PCA Loadings
Now we can take a closer look at the individual components and their loadings.
dim 1 dim 2 dim 3 dim 4
EDNN 0.90763095 0.02444891 0.001449271 0.066470876
SCPP 0.88329422 0.04403071 0.008175566 0.064499500
NNC 0.06501819 0.71454314 0.218583906 0.001854766
Affix 0.02256131 0.75639285 0.220320282 0.000725555
But what do these loadings conceptually mean?
Component 1
Component 1 is rather highly correlated with EDNN and SCPP.
Following the correlation circle, we find that Component 1 is positively correlated with EDNN, while it is negatively correlated with SCPP.
For EDNN, higher values indicate a larger distance to the nearest semantic neighbour. Thus, EDNN may be regarded as a measure of semantic activation diversity.
For SCPP, higher values indicate a higher goodness of production by means of fitting semantics.
Note: SCPP is identical to the correlations measure of the WpmWithLdl package.
Thus, for Component 1, high values indicate less semantic activation diversity, while lower values indicate more semantic activation diversity.
Component 2
Component 2 is rather highly correlated with Affix and NNC.
Following the correlation circle, we find that Component 2 is positively correlated with NNC.
For NNC, higher values indicate a close real word semantic neighbour, i.e. a probably denser neighbourhood/higher activation diversity.
As Affix is a categorical variable, its interpretation is somewhat different. According to Chavent et al. (2017), coordinates on the level map are barycenters of PC scores. That is, we get some sort of an idea of how the levels of Affix behave. However, we cannot tell much more (I think).
Thus, for Component 2, high values indicate a close real word semantic neighbour, while also containing some sort of information on Affix.
10.3 GAMM
Let’s see whether the PCA components and the remaining LCF measures are significant predictors for word-final /s/ duration in singular and plural pseudowords. We will use Generalized Additive Mixed Models (GAMM) to model /s/ durations.
Modelling
Step 0
First, create a model with all relevant variables, including control and PCA variables, and speaker as random effect:
gam.LCF.1 <- bam(sDurLog ~
s(LCF_Comp.1b) +
s(LCF_Comp.2b) +
s(ALC) +
s(ALDC, k=3) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
Checking the anova() function’s output, we now exclude non-significant variables in a stepwise fashion. After each potential exclusion, compareML() is used to check whether the exclusion is justified in terms of model fit.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ s(LCF_Comp.1b) + s(LCF_Comp.2b) + s(ALC) + s(ALDC,
k = 3) + SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
Parametric Terms:
df F p-value
SylPerMin 1 814.815 < 2e-16
pauseBin 1 41.425 2.5e-10
folType 4 4.840 0.000755
preC 3 1.452 0.226538
monoMultilingual 1 0.406 0.524330
Approximate significance of smooth terms:
edf Ref.df F p-value
s(LCF_Comp.1b) 1.000 1.000 0.827 0.363355
s(LCF_Comp.2b) 2.572 3.103 6.102 0.000392
s(ALC) 2.487 3.108 5.307 0.001105
s(ALDC) 1.000 1.000 0.201 0.654484
s(speaker) 31.317 38.000 5.207 < 2e-16
Step 1
Exclusion of ALDC.
gam.LCF.2 <- bam(sDurLog ~
s(LCF_Comp.1b) +
s(LCF_Comp.2b) +
s(ALC) +
#s(ALDC, k=3) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.LCF.1, gam.LCF.2)
gam.LCF.1: sDurLog ~ s(LCF_Comp.1b) + s(LCF_Comp.2b) + s(ALC) + s(ALDC,
k = 3) + SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
gam.LCF.2: sDurLog ~ s(LCF_Comp.1b) + s(LCF_Comp.2b) + s(ALC) + SylPerMin +
pauseBin + folType + preC + monoMultilingual + s(speaker,
bs = "re")
Model gam.LCF.2 preferred: lower REML score (3.850), and lower df (2.000).
-----
Model Score Edf Difference Df
1 gam.LCF.1 -147.7378 20
2 gam.LCF.2 -151.5880 18 -3.850 2.000
AIC difference: 2.01, model gam.LCF.2 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ s(LCF_Comp.1b) + s(LCF_Comp.2b) + s(ALC) + SylPerMin +
pauseBin + folType + preC + monoMultilingual + s(speaker,
bs = "re")
Parametric Terms:
df F p-value
SylPerMin 1 818.173 < 2e-16
pauseBin 1 41.391 2.54e-10
folType 4 4.855 0.000736
preC 3 1.390 0.244925
monoMultilingual 1 0.408 0.523430
Approximate significance of smooth terms:
edf Ref.df F p-value
s(LCF_Comp.1b) 1.000 1.001 0.973 0.324097
s(LCF_Comp.2b) 2.585 3.120 6.474 0.000228
s(ALC) 2.660 3.310 5.008 0.001355
s(speaker) 31.329 38.000 5.218 < 2e-16
Step 2
Exclusion of LCF_Comp.1b.
gam.LCF.3 <- bam(sDurLog ~
#s(LCF_Comp.1b) +
s(LCF_Comp.2b) +
s(ALC) +
#s(ALDC, k=3) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.LCF.2, gam.LCF.3)
gam.LCF.2: sDurLog ~ s(LCF_Comp.1b) + s(LCF_Comp.2b) + s(ALC) + SylPerMin +
pauseBin + folType + preC + monoMultilingual + s(speaker,
bs = "re")
gam.LCF.3: sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType +
preC + monoMultilingual + s(speaker, bs = "re")
Model gam.LCF.3 preferred: lower REML score (3.495), and lower df (2.000).
-----
Model Score Edf Difference Df
1 gam.LCF.2 -151.5880 18
2 gam.LCF.3 -155.0833 16 -3.495 2.000
AIC difference: -0.07, model gam.LCF.2 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType +
preC + monoMultilingual + s(speaker, bs = "re")
Parametric Terms:
df F p-value
SylPerMin 1 817.979 < 2e-16
pauseBin 1 41.930 1.96e-10
folType 4 5.011 0.000559
preC 3 1.630 0.181189
monoMultilingual 1 0.410 0.522315
Approximate significance of smooth terms:
edf Ref.df F p-value
s(LCF_Comp.2b) 2.604 3.149 6.112 0.000367
s(ALC) 2.674 3.326 5.429 0.000701
s(speaker) 31.265 38.000 5.178 < 2e-16
Step 3
Exclusion of monoMultilingual.
gam.LCF.4 <- bam(sDurLog ~
#s(LCF_Comp.1b) +
s(LCF_Comp.2b) +
s(ALC) +
#s(ALDC, k=3) +
SylPerMin+
pauseBin+
folType+
preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.LCF.3, gam.LCF.4)
gam.LCF.3: sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType +
preC + monoMultilingual + s(speaker, bs = "re")
gam.LCF.4: sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType +
preC + s(speaker, bs = "re")
Model gam.LCF.4 preferred: lower REML score (2.044), and lower df (1.000).
-----
Model Score Edf Difference Df
1 gam.LCF.3 -155.0833 16
2 gam.LCF.4 -157.1269 15 -2.044 1.000
AIC difference: 0.28, model gam.LCF.4 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType +
preC + s(speaker, bs = "re")
Parametric Terms:
df F p-value
SylPerMin 1 832.927 < 2e-16
pauseBin 1 41.771 2.11e-10
folType 4 4.998 0.000572
preC 3 1.627 0.181921
Approximate significance of smooth terms:
edf Ref.df F p-value
s(LCF_Comp.2b) 2.608 3.154 6.111 0.000366
s(ALC) 2.677 3.330 5.428 0.000700
s(speaker) 31.986 39.000 5.097 < 2e-16
Step 4
Exclusion of preC.
gam.LCF.5 <- bam(sDurLog ~
#s(LCF_Comp.1b) +
s(LCF_Comp.2b) +
s(ALC) +
#s(ALDC, k=3) +
SylPerMin+
pauseBin+
folType+
#preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.LCF.4, gam.LCF.5)
gam.LCF.4: sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType +
preC + s(speaker, bs = "re")
gam.LCF.5: sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Model gam.LCF.5 preferred: lower REML score (6.892), and lower df (3.000).
-----
Model Score Edf Difference Df
1 gam.LCF.4 -157.1269 15
2 gam.LCF.5 -164.0189 12 -6.892 3.000
AIC difference: 0.55, model gam.LCF.5 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Parametric Terms:
df F p-value
SylPerMin 1 851.499 < 2e-16
pauseBin 1 42.215 1.7e-10
folType 4 5.227 0.000382
Approximate significance of smooth terms:
edf Ref.df F p-value
s(LCF_Comp.2b) 2.503 3.016 6.280 0.000331
s(ALC) 2.731 3.392 4.996 0.001534
s(speaker) 31.932 39.000 5.044 < 2e-16
Summary
Family: gaussian
Link function: identity
Formula:
sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.9169296 0.0472791 -19.394 < 2e-16 ***
SylPerMin -0.5017847 0.0171959 -29.180 < 2e-16 ***
pauseBinpause 0.1144248 0.0176111 6.497 1.7e-10 ***
folTypeF -0.0272402 0.0548033 -0.497 0.619331
folTypeN 0.0009119 0.0210794 0.043 0.965510
folTypeP 0.0092602 0.0184728 0.501 0.616350
folTypeV -0.0675720 0.0187290 -3.608 0.000334 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(LCF_Comp.2b) 2.503 3.016 6.280 0.000331 ***
s(ALC) 2.731 3.392 4.996 0.001534 **
s(speaker) 31.932 39.000 5.044 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.806 Deviance explained = 81.9%
-REML = -164.02 Scale est. = 0.029191 n = 653
Summary:
* SylPerMin : numeric predictor; set to the value(s): 2.336339423.
* pauseBin : factor; set to the value(s): no_pause.
* folType : factor; set to the value(s): APP.
* LCF_Comp.2b : numeric predictor; set to the value(s): -0.433978743638555.
* ALC : numeric predictor; with 30 values ranging from -0.013297 to 0.019239.
* speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.)
* NOTE : The following random effects columns are canceled: s(speaker)
The model predicts a rather unclear picture for the influence of ALC. That is, lower ALC appears to lead to longer /s/ durations, while after a certain point ALC no longer seems to show a clear effect.
Summary:
* SylPerMin : numeric predictor; set to the value(s): 2.336339423.
* pauseBin : factor; set to the value(s): no_pause.
* folType : factor; set to the value(s): APP.
* LCF_Comp.2b : numeric predictor; with 30 values ranging from -3.977563 to 1.311103.
* ALC : numeric predictor; set to the value(s): 0.008344338.
* speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.)
* NOTE : The following random effects columns are canceled: s(speaker)
LCF_Comp.2b also shows a rather unclear picture. While higher values overall seemingly lead to longer /s/ durations, values somewhere between -1 and 0.5 lead to shorter /s/ durations; values above 0.5 again lead to longer /s/ durations.
Given this unclear picture, LMER models appear to be a justified alternative.
10.4 LMER
Let’s see whether a linear mixed-effects regression model does a similarly good job as the previous GAMM model.
Modelling
First, create a model with all relevant variables, including control and PCA variables, and speaker and monoMultilingual as random effects:
lmer.1 <- lmer(sDurLog ~
LCF_Comp.1b +
LCF_Comp.2b +
ALC +
ALDC +
SylPerMin+
pauseBin+
folType+
preC+
(1 | monoMultilingual) +
(1 | speaker),
data=data, REML=T)
anova(lmer.1)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
LCF_Comp.1b 0.0001 0.0001 1 612.09 0.0048 0.9446279
LCF_Comp.2b 0.1756 0.1756 1 621.73 5.8536 0.0158308 *
ALC 0.7639 0.7639 1 611.99 25.4697 5.934e-07 ***
ALDC 0.0785 0.0785 1 610.34 2.6172 0.1062261
SylPerMin 23.9691 23.9691 1 537.11 799.1533 < 2.2e-16 ***
pauseBin 1.2166 1.2166 1 633.28 40.5615 3.663e-10 ***
folType 0.6448 0.1612 4 609.93 5.3742 0.0002942 ***
preC 0.1676 0.0559 3 607.22 1.8631 0.1346015
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using the step() function, we exclude all non-significant predictors:
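A minimal sketch of the presumed call (assuming the lmerTest package is attached, so that step() dispatches to lmerModLmerTest objects):
## backward elimination of random and fixed effects (lmerTest)
step(lmer.1)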
Backward reduced random-effect table:
Eliminated npar logLik AIC LRT Df Pr(>Chisq)
<none> 17 149.046 -264.09
(1 | monoMultilingual) 1 16 149.046 -266.09 0.00 1 1
(1 | speaker) 0 15 98.682 -167.36 100.73 1 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Backward reduced fixed-effect table:
Degrees of freedom method: Satterthwaite
Eliminated Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
LCF_Comp.1b 1 0.0001 0.0001 1 612.09 0.0048 0.944628
preC 2 0.1710 0.0570 3 608.42 1.9033 0.127817
ALDC 3 0.0519 0.0519 1 614.45 1.7242 0.189640
LCF_Comp.2b 0 0.2468 0.2468 1 626.78 8.1856 0.004363 **
ALC 0 0.6714 0.6714 1 619.08 22.2688 2.933e-06 ***
SylPerMin 0 24.7776 24.7776 1 540.90 821.8333 < 2.2e-16 ***
pauseBin 0 1.2308 1.2308 1 638.75 40.8227 3.213e-10 ***
folType 0 0.6826 0.1707 4 615.30 5.6603 0.000177 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model found:
sDurLog ~ LCF_Comp.2b + ALC + SylPerMin + pauseBin + folType +
(1 | speaker)
This leaves us with the following final model:
lmer.2 <- lmer(sDurLog ~
LCF_Comp.2b +
ALC +
SylPerMin +
pauseBin +
folType +
(1 | speaker),
data=data, REML=T)
anova(lmer.2)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
LCF_Comp.2b 0.2468 0.2468 1 626.78 8.1856 0.004363 **
ALC 0.6714 0.6714 1 619.08 22.2688 2.933e-06 ***
SylPerMin 24.7776 24.7776 1 540.90 821.8333 < 2.2e-16 ***
pauseBin 1.2308 1.2308 1 638.75 40.8227 3.213e-10 ***
folType 0.6826 0.1707 4 615.30 5.6603 0.000177 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Trimming
Trimming appears to be a good idea according to the residual plots. However, for some unknown reason, the trimming procedure cannot be shown in R Markdown. Thus, the detailed procedure is skipped here. Trimming results in the removal of \(n=10\) (i.e. 1.5%) data points.
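For transparency, here is a minimal sketch of a standard residual-based trimming step; the 2.5 SD cutoff is an assumption, as the exact criterion is not shown in the rendered document.
## keep only data points whose standardized residuals lie within 2.5 SD
## (assumed cutoff; the original criterion is not shown here)
keep <- abs(scale(resid(lmer.2))) < 2.5
data.trim <- data[as.vector(keep), ]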
lmer.final <- lmer(sDurLog ~
LCF_Comp.2b +
ALC +
SylPerMin +
pauseBin +
folType +
(1 | speaker),
data=data.trim, REML=T)
anova(lmer.final)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
LCF_Comp.2b 0.1368 0.1368 1 615.43 5.1128 0.024098 *
ALC 0.6190 0.6190 1 607.32 23.1385 1.904e-06 ***
SylPerMin 24.8506 24.8506 1 546.76 928.9257 < 2.2e-16 ***
pauseBin 0.9860 0.9860 1 626.43 36.8587 2.202e-09 ***
folType 0.6255 0.1564 4 604.27 5.8449 0.000128 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: sDurLog ~ LCF_Comp.2b + ALC + SylPerMin + pauseBin + folType +
(1 | speaker)
Data: data.trim
REML criterion at convergence: -390
Scaled residuals:
Min 1Q Median 3Q Max
-2.57729 -0.69695 -0.00471 0.68707 2.62291
Random effects:
Groups Name Variance Std.Dev.
speaker (Intercept) 0.009019 0.09497
Residual 0.026752 0.16356
Number of obs: 643, groups: speaker, 40
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -0.842383 0.047433 406.509041 -17.760 < 2e-16 ***
LCF_Comp.2b 0.012681 0.005608 615.429693 2.261 0.024098 *
ALC -4.196156 0.872336 607.321860 -4.810 1.9e-06 ***
SylPerMin -0.517208 0.016970 546.763337 -30.478 < 2e-16 ***
pauseBinpause 0.102981 0.016962 626.430656 6.071 2.2e-09 ***
folTypeF -0.027539 0.052405 607.773113 -0.526 0.599426
folTypeN 0.004413 0.020288 599.542730 0.218 0.827886
folTypeP 0.014218 0.017876 601.661616 0.795 0.426715
folTypeV -0.065205 0.018070 607.789600 -3.609 0.000333 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) LCF_C. ALC SylPrM psBnps flTypF flTypN flTypP
LCF_Comp.2b -0.153
ALC -0.213 0.113
SylPerMin -0.896 0.145 0.076
pauseBinpas -0.466 0.139 0.042 0.421
folTypeF -0.082 -0.030 0.064 0.018 0.004
folTypeN -0.114 -0.080 0.035 -0.043 -0.070 0.137
folTypeP -0.056 -0.025 0.011 -0.129 -0.094 0.153 0.413
folTypeV 0.053 0.000 0.002 -0.240 -0.229 0.142 0.420 0.495
10.5 LMER vs. GAMM
Let’s compare the results of both the LMER and the GAMM model.
As we have seen already, the GAMM model apparently predicts longer /s/ durations for low ALC values, while higher ALC values show almost no influence on /s/ duration:
Summary:
* SylPerMin : numeric predictor; set to the value(s): 2.336339423.
* pauseBin : factor; set to the value(s): no_pause.
* folType : factor; set to the value(s): APP.
* LCF_Comp.2b : numeric predictor; set to the value(s): -0.433978743638555.
* ALC : numeric predictor; with 30 values ranging from -0.013297 to 0.019239.
* speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.)
* NOTE : The following random effects columns are canceled: s(speaker)
Taking into account the unclear picture, we checked whether an LMER model comes to a similar result:
The LMER model predicts shorter /s/ durations for higher ALC values. Given the rather low edf value (\(edf=2.731\)) of the ALC smooth term in the GAMM model, one may prefer the LMER result.
For LCF_Comp.2b, the GAMM model again shows an unclear picture.
Summary:
* SylPerMin : numeric predictor; set to the value(s): 2.336339423.
* pauseBin : factor; set to the value(s): no_pause.
* folType : factor; set to the value(s): APP.
* LCF_Comp.2b : numeric predictor; with 30 values ranging from -3.977563 to 1.311103.
* ALC : numeric predictor; set to the value(s): 0.008344338.
* speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.)
* NOTE : The following random effects columns are canceled: s(speaker)
The LMER model, however, shows a clear increase of /s/ duration with increasing LCF_Comp.2b values:
Thus, it appears that higher LCF_Comp.2b values lead to longer /s/ durations.
In summary, higher ALC values lead to shorter /s/ durations. That is, a denser semantic neighbourhood (higher ALC values), i.e. a higher semantic activation diversity, leads to shorter /s/ durations.
Moreover, higher LCF_Comp.2b values lead to longer /s/ durations. As NNC is part of LCF_Comp.2b, with high values indicating a close semantic real word neighbour, one may interpret this as an effect of semantic certainty. As for the Affix part of this variable, one can observe that non-morphemic /s/ shows higher mean values of LCF_Comp.2b than plural /s/ (\(1.17\) vs. \(-0.99\), \(p<0.001\)). Thus, higher values for non-morphemic /s/ cases go together with higher values of NNC, and of LCF_Comp.2b in general, leading to longer /s/ durations in monomorphemic pseudowords.
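The reported comparison of means can presumably be obtained with a simple two-sample test along these lines (a sketch, assuming Affix is a two-level factor distinguishing non-morphemic and plural /s/):
## compare mean LCF_Comp.2b values of non-morphemic vs. plural /s/
t.test(LCF_Comp.2b ~ Affix, data = data)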
11 All measures
Taking into consideration all measures presented in sections 9 and 10, a final analysis is conducted.
Without discussing the underlying implications of the individual measures, let’s check whether some of them are correlated. If so, we need to take precautions before including them in any models.
11.1 Correlations
Correlations of all measures:
pairscor.fnc(data[,c("Affix",
"cor_max", "cor_target", "l1norm", "l2norm", "density", "cor_affix",
"correlations", "path_counts", "path_sum", "path_entropies", "lwlr",
"ALC", "ALDC", "EDNN", "NNC")])
Note: Without rank, as it is always \(1\), and without SCPP, as it is identical to correlations.
We find the following highly correlated (\(\rho\ge0.5\)) variable combinations:
- cor_max vs. cor_target, \(\rho=1\)
- cor_max vs. correlations, \(\rho=1\)
- cor_target vs. correlations, \(\rho=1\)
- l1norm vs. l2norm, \(\rho=0.94\)
- density vs. cor_affix, \(\rho=0.84\)
- path_counts vs. path_entropies, \(\rho=0.95\)
- path_sum vs. lwlr, \(\rho=-0.76\)
- cor_max vs. EDNN, \(\rho=-0.86\)
- cor_target vs. EDNN, \(\rho=-0.86\)
- correlations vs. EDNN, \(\rho=-0.86\)
- density vs. ALC, \(\rho=0.75\)
- cor_affix vs. ALC, \(\rho=0.72\)
- cor_affix vs. NNC, \(\rho=0.84\)
- path_counts vs. ALDC, \(\rho=0.89\)
- path_entropies vs. ALDC, \(\rho=0.9\)
- Affix vs. NNC, \(r_{pb}=-0.55\) (point-biserial correlation; see the sketch below)
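Since Affix is a binary factor, its point-biserial correlation with NNC is simply Pearson’s \(r\) between the numerically coded factor and the continuous variable; a minimal sketch:
## point-biserial correlation: Pearson's r with the binary factor coded as 1/2
## (assuming Affix is a two-level factor)
cor(as.numeric(data$Affix), data$NNC)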
Thus, we need to take precautions to avoid problems of collinearity later on.
11.2 PCA
To avoid collinearity issues when modelling, we combine all highly correlated variables in a principal component analysis. However, as we must include Affix as a categorical variable, we cannot use the princomp function. Instead, we make use of the PCAmix function of the PCAmixdata package (Chavant et al., 2017). This function uses multiple correspondence analysis (MCA) to include categorical data in PCAs.
PCA Computation
First, we extract the highly correlated variables and merge them to form a new data frame.
pca.data1 <- data[colnames(data)=="cor_max"]
pca.data2 <- data[colnames(data)=="cor_target"]
pca.data3 <- data[colnames(data)=="correlations"]
pca.data4 <- data[colnames(data)=="l1norm"]
pca.data5 <- data[colnames(data)=="l2norm"]
pca.data6 <- data[colnames(data)=="density"]
pca.data7 <- data[colnames(data)=="cor_affix"] # NAs
pca.data8 <- data[colnames(data)=="path_counts"]
pca.data9 <- data[colnames(data)=="path_entropies"]
pca.data10 <- data[colnames(data)=="path_sum"]
pca.data11 <- data[colnames(data)=="lwlr"]
pca.data12 <- data[colnames(data)=="EDNN"]
pca.data13 <- data[colnames(data)=="ALC"]
pca.data14 <- data[colnames(data)=="ALDC"]
pca.data15 <- data[colnames(data)=="NNC"]
pca.data16 <- data[colnames(data)=="Affix"]
# numerical data
pca.data1 <- cbind(pca.data1,
pca.data2,
pca.data3,
pca.data4,
pca.data5,
pca.data6,
#pca.data7, NAs
pca.data8,
pca.data9,
pca.data10,
pca.data11,
pca.data12,
pca.data13,
pca.data14,
pca.data15)
# categorical data
pca.data2 <- cbind(pca.data16)
Then, we compute a principal component analysis using the PCAmix function:
library(PCAmixdata)
res.pcamix <- PCAmix(X.quanti = pca.data1, X.quali = pca.data2, rename.level = TRUE, graph = FALSE)
In order to decide which components to keep, we follow three criteria:
- The Eigenvalue-criterion: Any component that displays an eigenvalue greater than 1.00 is accounting for a greater amount of variance than had been contributed by one variable. Such a component is therefore accounting for a meaningful amount of variance, and is worthy of being retained.
- The scree-criterion: Plot the eigenvalues associated with each component and look for a “break” between the components with relatively large eigenvalues and those with small eigenvalues. The components that appear before the break are assumed to be meaningful and are retained (Cattell, 1966).
- The proportion-of-variance-criterion: Retain enough components so that the cumulative percent of variance accounted for is equal to some minimal value, i.e. 80 percent.
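The eigenvalue table below can presumably be obtained from the PCAmix output, and a scree plot for the second criterion can be drawn from the same object; a minimal sketch:
## eigenvalues, proportions, and cumulative percentages of all components
res.pcamix$eig
## scree plot of the eigenvalues
plot(res.pcamix$eig[, "Eigenvalue"], type = "b",
     xlab = "Component", ylab = "Eigenvalue")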
Eigenvalue Proportion Cumulative
dim 1 3.89282845 25.9521896 25.95219
dim 2 3.83148447 25.5432298 51.49542
dim 3 2.21067273 14.7378182 66.23324
dim 4 1.62920303 10.8613535 77.09459
dim 5 1.59939209 10.6626139 87.75721
dim 6 0.89564655 5.9709770 93.72818
dim 7 0.27558824 1.8372550 95.56544
dim 8 0.23485797 1.5657198 97.13116
dim 9 0.16680812 1.1120541 98.24321
dim 10 0.10689236 0.7126157 98.95583
dim 11 0.08452222 0.5634815 99.51931
dim 12 0.04753395 0.3168930 99.83620
dim 13 0.02456981 0.1637988 100.00000
Taking all three criteria into consideration, we find that Components 1, 2, 3, and 4 show an Eigenvalue of \(v\ge1\), are followed by a small break in Eigenvalues with the Eigenvalue of Component 4 being \(v=1.63\) and the Eigenvalue of Component 5 being \(v=1.60\), and that the cumulative percentage of variance accounted for by all four components is \(σ^2=77.1\%\), i.e. almost at our target level of \(σ^2=80\%\).
Thus, we add components 1, 2, 3, and 4 to our data frame.
pcframe <- as.data.frame(res.pcamix$scores)
pcframe <- pcframe[1:4]
names(pcframe)[names(pcframe) == "dim 1"] <- "ALL_Comp.1b"
names(pcframe)[names(pcframe) == "dim 2"] <- "ALL_Comp.2b"
names(pcframe)[names(pcframe) == "dim 3"] <- "ALL_Comp.3b"
names(pcframe)[names(pcframe) == "dim 4"] <- "ALL_Comp.4b"
data <- cbind(data, pcframe)
PCA Loadings
Now we can take a closer look at the individual components and their loadings.
dim 1 dim 2 dim 3 dim 4 dim 5
cor_max 0.637722065 0.326920348 0.005879550 0.0120559346 0.0038315277
cor_target 0.637722065 0.326920348 0.005879550 0.0120559346 0.0038315277
correlations 0.637722065 0.326920348 0.005879550 0.0120559346 0.0038315277
l1norm 0.108952652 0.145102711 0.596151554 0.0003336438 0.0082385537
l2norm 0.110055547 0.138399299 0.599856157 0.0020516355 0.0266775599
density 0.015951067 0.252396450 0.027885544 0.3248052927 0.2542507614
path_counts 0.349562685 0.448228572 0.055724227 0.0088127215 0.0851677391
path_entropies 0.310867526 0.426202300 0.069083373 0.0294327025 0.1245838013
path_sum 0.118532333 0.106756317 0.213682472 0.0184313105 0.2653051450
lwlr 0.154372616 0.164872517 0.286525573 0.0089769820 0.1831666121
EDNN 0.449785343 0.419112785 0.005859934 0.0008826359 0.0003926486
ALC 0.017413225 0.279537745 0.006706018 0.5230282496 0.0057341834
ALDC 0.296892353 0.317874615 0.056139063 0.0254351727 0.1938145659
NNC 0.037602170 0.143869002 0.268593767 0.1350932734 0.2417072964
Affix 0.009674736 0.008371119 0.006826399 0.5157516060 0.1988586395
But what do these loadings conceptually mean?
Component 1
Component 1 is rather highly correlated with cor_max, cor_target, and correlations. This component is identical to WWL.Comp_1.
For cor_max, higher values indicate a close semantic neighbour.
For cor_target, higher values indicate a good mapping from form to meaning, i.e. a higher semantic certainty.
For correlations, higher values indicate a higher goodness of production by means of fitting semantics.
Note: What is correlations again? Given all candidate forms of a word, i.e. all word forms found as candidate productions by the production model, and their estimated semantic vectors \(s_{candidate_1} \ldots s_{candidate_n}\), correlation values between these estimated semantic vectors and a word’s semantic vector \(s\) are computed. The highest of these correlation values is taken as the value of correlations.
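Schematically, this amounts to the following sketch, in which Shat.candidates (a matrix with one estimated semantic vector per row) and s (the word’s semantic vector) are hypothetical placeholder objects:
## correlate each candidate's estimated semantic vector with the target
## vector s; the maximum correlation is the 'correlations' value
## (Shat.candidates and s are hypothetical placeholders)
cand.cors <- apply(Shat.candidates, 1, function(s.hat) cor(s.hat, s))
max(cand.cors)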
Thus, for Component 1, high values indicate higher semantic certainty, while lower values indicate higher semantic uncertainty.
Component 2
Component 2 is rather highly negatively correlated with path_counts and path_entropies. This component is identical to WWL.Comp_2.
For path_counts, higher values indicate the existence of multiple candidates (and thus paths) in production; this may be interpreted as a measure of phonological activation diversity.
For path_entropies, higher values indicate a higher level of disorder, i.e. uncertainty.
In sum, Component 2 reflects the certainty in production, i.e. high values indicate high certainty.
Component 3
Component 3 is rather highly correlated with l1norm and l2norm. This component is almost identical to WWL.Comp_3, though for WWL.Comp_3 the measures are negatively correlated.
For l1norm, higher values imply stronger links to many other lexomes.
For l2norm, higher values imply stronger links to many other lexomes.
Both l1norm and l2norm may be interpreted as measures of semantic activation diversity.
In sum, Component 3 reflects semantic activation diversity, i.e. high component values indicate low semantic activation diversity.
Component 4
Component 4 is rather highly correlated with ALC and rather highly negatively correlated with Affix.
For ALC, higher values indicate that a pseudoword’s semantics are part of a denser real word semantic neighbourhood. This may be interpreted as semantic activation diversity, but also as certainty in pseudoword semantics.
Thus, Component 4 reflects semantic activation diversity, i.e. high component values indicate a dense semantic neighbourhood.
11.3 GAMM
Let’s see whether the PCA components are significant predictors of word-final /s/ duration in singular and plural pseudowords. We will use Generalized Additive Mixed Models (GAMMs) to model /s/ durations.
Modelling
Step 0
First, create a model with all relevant variables, including control and PCA variables, and speaker as random effect:
gam.ALL.1 <- bam(sDurLog ~
s(ALL_Comp.1b) +
s(ALL_Comp.2b) +
s(ALL_Comp.3b) +
s(ALL_Comp.4b) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
Checking the anova() function’s output, we now exclude non-significant variables in a stepwise fashion. After each potential exclusion, compareML() is used to check whether the exclusion is justified in terms of model fit.
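The output below presumably stems from a call along the lines of:
## parametric and smooth term significance for the full model
anova(gam.ALL.1)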
Family: gaussian
Link function: identity
Formula:
sDurLog ~ s(ALL_Comp.1b) + s(ALL_Comp.2b) + s(ALL_Comp.3b) +
s(ALL_Comp.4b) + SylPerMin + pauseBin + folType + preC +
monoMultilingual + s(speaker, bs = "re")
Parametric Terms:
df F p-value
SylPerMin 1 847.659 < 2e-16
pauseBin 1 36.267 2.99e-09
folType 4 4.760 0.000869
preC 3 1.541 0.202805
monoMultilingual 1 0.337 0.561827
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ALL_Comp.1b) 1.103 1.195 0.078 0.77292
s(ALL_Comp.2b) 1.006 1.012 2.309 0.12749
s(ALL_Comp.3b) 4.480 5.465 3.681 0.00208
s(ALL_Comp.4b) 1.848 2.301 17.796 6.66e-09
s(speaker) 31.440 38.000 5.333 < 2e-16
Step 1
Exclusion of ALL_Comp.1b.
gam.ALL.2 <- bam(sDurLog ~
#s(ALL_Comp.1b) +
s(ALL_Comp.2b) +
s(ALL_Comp.3b) +
s(ALL_Comp.4b) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.ALL.1, gam.ALL.2)
gam.ALL.1: sDurLog ~ s(ALL_Comp.1b) + s(ALL_Comp.2b) + s(ALL_Comp.3b) +
s(ALL_Comp.4b) + SylPerMin + pauseBin + folType + preC +
monoMultilingual + s(speaker, bs = "re")
gam.ALL.2: sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) +
SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
Model gam.ALL.2 preferred: lower REML score (3.942), and lower df (2.000).
-----
Model Score Edf Difference Df
1 gam.ALL.1 -151.5819 20
2 gam.ALL.2 -155.5238 18 -3.942 2.000
AIC difference: 5.53, model gam.ALL.2 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) +
SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
Parametric Terms:
df F p-value
SylPerMin 1 851.200 < 2e-16
pauseBin 1 36.188 3.11e-09
folType 4 4.824 0.000778
preC 3 1.830 0.140468
monoMultilingual 1 0.330 0.565763
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ALL_Comp.2b) 1.000 1.000 3.957 0.0471
s(ALL_Comp.3b) 4.419 5.394 3.656 0.0023
s(ALL_Comp.4b) 1.797 2.235 18.580 4.23e-09
s(speaker) 31.438 38.000 5.335 < 2e-16
Step 2
Exclusion of monoMultilingual.
gam.ALL.3 <- bam(sDurLog ~
#s(ALL_Comp.1b) +
s(ALL_Comp.2b) +
s(ALL_Comp.3b) +
s(ALL_Comp.4b) +
SylPerMin+
pauseBin+
folType+
preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.ALL.2, gam.ALL.3)
gam.ALL.2: sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) +
SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
gam.ALL.3: sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) +
SylPerMin + pauseBin + folType + preC + s(speaker, bs = "re")
Model gam.ALL.3 preferred: lower REML score (2.080), and lower df (1.000).
-----
Model Score Edf Difference Df
1 gam.ALL.2 -155.5238 18
2 gam.ALL.3 -157.6034 17 -2.080 1.000
AIC difference: 0.14, model gam.ALL.3 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) +
SylPerMin + pauseBin + folType + preC + s(speaker, bs = "re")
Parametric Terms:
df F p-value
SylPerMin 1 866.290 < 2e-16
pauseBin 1 36.063 3.3e-09
folType 4 4.812 0.000793
preC 3 1.818 0.142592
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ALL_Comp.2b) 1.000 1.000 3.943 0.04748
s(ALL_Comp.3b) 4.422 5.398 3.670 0.00223
s(ALL_Comp.4b) 1.783 2.217 18.735 3.97e-09
s(speaker) 32.154 39.000 5.239 < 2e-16
Step 3
Exclusion of preC.
gam.ALL.4 <- bam(sDurLog ~
#s(ALL_Comp.1b) +
s(ALL_Comp.2b) +
s(ALL_Comp.3b) +
s(ALL_Comp.4b) +
SylPerMin+
pauseBin+
folType+
#preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.ALL.3, gam.ALL.4)
gam.ALL.3: sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) +
SylPerMin + pauseBin + folType + preC + s(speaker, bs = "re")
gam.ALL.4: sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) +
SylPerMin + pauseBin + folType + s(speaker, bs = "re")
Model gam.ALL.4 preferred: lower REML score (6.690), and lower df (3.000).
-----
Model Score Edf Difference Df
1 gam.ALL.3 -157.6034 17
2 gam.ALL.4 -164.2930 14 -6.690 3.000
AIC difference: -0.87, model gam.ALL.3 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) +
SylPerMin + pauseBin + folType + s(speaker, bs = "re")
Parametric Terms:
df F p-value
SylPerMin 1 886.415 < 2e-16
pauseBin 1 38.013 1.28e-09
folType 4 5.042 0.000529
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ALL_Comp.2b) 1.315 1.548 1.231 0.19749
s(ALL_Comp.3b) 5.209 6.296 3.859 0.00077
s(ALL_Comp.4b) 2.023 2.520 16.619 7.26e-09
s(speaker) 32.117 39.000 5.214 < 2e-16
Step 4
Exclusion of ALL_Comp.2b.
gam.ALL.5 <- bam(sDurLog ~
#s(ALL_Comp.1b) +
#s(ALL_Comp.2b) +
s(ALL_Comp.3b) +
s(ALL_Comp.4b) +
SylPerMin+
pauseBin+
folType+
#preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.ALL.4, gam.ALL.5)
gam.ALL.4: sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) +
SylPerMin + pauseBin + folType + s(speaker, bs = "re")
gam.ALL.5: sDurLog ~ s(ALL_Comp.3b) + s(ALL_Comp.4b) + SylPerMin + pauseBin +
folType + s(speaker, bs = "re")
Model gam.ALL.5 preferred: lower REML score (2.805), and lower df (2.000).
-----
Model Score Edf Difference Df
1 gam.ALL.4 -164.293 14
2 gam.ALL.5 -167.098 12 -2.805 2.000
AIC difference: 0.97, model gam.ALL.5 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ s(ALL_Comp.3b) + s(ALL_Comp.4b) + SylPerMin + pauseBin +
folType + s(speaker, bs = "re")
Parametric Terms:
df F p-value
SylPerMin 1 885.121 < 2e-16
pauseBin 1 39.169 7.35e-10
folType 4 5.233 0.000378
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ALL_Comp.3b) 5.356 6.456 4.204 0.000309
s(ALL_Comp.4b) 2.194 2.728 15.414 8.11e-09
s(speaker) 31.993 39.000 5.121 < 2e-16
Summary
Family: gaussian
Link function: identity
Formula:
sDurLog ~ s(ALL_Comp.3b) + s(ALL_Comp.4b) + SylPerMin + pauseBin +
folType + s(speaker, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.900214 0.046968 -19.167 < 2e-16 ***
SylPerMin -0.508560 0.017094 -29.751 < 2e-16 ***
pauseBinpause 0.109416 0.017483 6.259 7.35e-10 ***
folTypeF -0.001866 0.054333 -0.034 0.972609
folTypeN 0.004629 0.020793 0.223 0.823915
folTypeP 0.009948 0.018280 0.544 0.586516
folTypeV -0.065820 0.018568 -3.545 0.000423 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ALL_Comp.3b) 5.356 6.456 4.204 0.000309 ***
s(ALL_Comp.4b) 2.194 2.728 15.414 8.11e-09 ***
s(speaker) 31.993 39.000 5.121 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.81 Deviance explained = 82.4%
-REML = -167.1 Scale est. = 0.028569 n = 653
Summary:
* SylPerMin : numeric predictor; set to the value(s): 2.336339423.
* pauseBin : factor; set to the value(s): no_pause.
* folType : factor; set to the value(s): APP.
* ALL_Comp.3b : numeric predictor; with 30 values ranging from -4.311999 to 2.983377.
* ALL_Comp.4b : numeric predictor; set to the value(s): 0.0515270814629842.
* speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.)
* NOTE : The following random effects columns are canceled: s(speaker)
The model predicts a rather unclear picture for the influence of ALL_Comp.3b.
Summary:
* SylPerMin : numeric predictor; set to the value(s): 2.336339423.
* pauseBin : factor; set to the value(s): no_pause.
* folType : factor; set to the value(s): APP.
* ALL_Comp.3b : numeric predictor; set to the value(s): 0.0225544158487821.
* ALL_Comp.4b : numeric predictor; with 30 values ranging from -3.315741 to 1.854101.
* speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.)
* NOTE : The following random effects columns are canceled: s(speaker)
ALL_Comp.4b shows an almost linear effect.
Thus, it appears that LMER models may be a justified alternative.
11.4 LMER
Let’s see whether a linear mixed-effects regression model does a similarly good job as the previous GAMM model.
Modelling
First, create a model with all relevant variables, including control and PCA variables, and speaker and monoMultilingual as random effects:
lmer.ALL.1 <- lmer(sDurLog ~
ALL_Comp.1b +
ALL_Comp.2b +
ALL_Comp.3b +
ALL_Comp.4b +
SylPerMin+
pauseBin+
folType+
preC+
(1 | monoMultilingual) +
(1 | speaker),
data=data, REML=T)
anova(lmer.ALL.1)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ALL_Comp.1b 0.0004 0.0004 1 611.25 0.0121 0.912600
ALL_Comp.2b 0.1744 0.1744 1 610.87 6.0023 0.014567 *
ALL_Comp.3b 0.3149 0.3149 1 605.90 10.8341 0.001054 **
ALL_Comp.4b 1.2835 1.2835 1 619.06 44.1650 6.626e-11 ***
SylPerMin 24.6608 24.6608 1 535.94 848.5651 < 2.2e-16 ***
pauseBin 1.0320 1.0320 1 633.09 35.5107 4.212e-09 ***
folType 0.5823 0.1456 4 609.58 5.0095 0.000560 ***
preC 0.2661 0.0887 3 606.43 3.0522 0.028047 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using the step() function, we exclude all non-significant predictors:
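Again, a sketch of the presumed call (assuming lmerTest is attached):
## backward elimination of random and fixed effects (lmerTest)
step(lmer.ALL.1)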
Backward reduced random-effect table:
Eliminated npar logLik AIC LRT Df Pr(>Chisq)
<none> 17 151.258 -268.52
(1 | monoMultilingual) 1 16 151.258 -270.52 0.0 1 1
(1 | speaker) 0 15 98.957 -167.91 104.6 1 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Backward reduced fixed-effect table:
Degrees of freedom method: Satterthwaite
Eliminated Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ALL_Comp.1b 1 0.0004 0.0004 1 611.25 0.0121 0.9125999
ALL_Comp.2b 0 0.1750 0.1750 1 612.06 6.0326 0.0143211 *
ALL_Comp.3b 0 0.3171 0.3171 1 606.77 10.9299 0.0010021 **
ALL_Comp.4b 0 1.2852 1.2852 1 620.27 44.2940 6.221e-11 ***
SylPerMin 0 24.7441 24.7441 1 537.71 852.7706 < 2.2e-16 ***
pauseBin 0 1.0317 1.0317 1 634.14 35.5571 4.114e-09 ***
folType 0 0.5844 0.1461 4 610.60 5.0354 0.0005349 ***
preC 0 0.2855 0.0952 3 607.69 3.2802 0.0206276 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model found:
sDurLog ~ ALL_Comp.2b + ALL_Comp.3b + ALL_Comp.4b + SylPerMin +
pauseBin + folType + preC + (1 | speaker)
This leaves us with the following final model:
lmer.ALL.2 <- lmer(sDurLog ~
ALL_Comp.2b +
ALL_Comp.3b +
ALL_Comp.4b +
SylPerMin +
pauseBin +
folType +
preC +
(1 | speaker),
data=data, REML=T)
anova(lmer.ALL.2)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ALL_Comp.2b 0.1750 0.1750 1 612.06 6.0326 0.0143211 *
ALL_Comp.3b 0.3171 0.3171 1 606.77 10.9299 0.0010021 **
ALL_Comp.4b 1.2852 1.2852 1 620.27 44.2940 6.221e-11 ***
SylPerMin 24.7441 24.7441 1 537.71 852.7706 < 2.2e-16 ***
pauseBin 1.0317 1.0317 1 634.14 35.5571 4.114e-09 ***
folType 0.5844 0.1461 4 610.60 5.0354 0.0005349 ***
preC 0.2855 0.0952 3 607.69 3.2802 0.0206276 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Trimming
Trimming appears to be a good idea according to the residual plots. However, for some unknown reason, the trimming procedure cannot be shown in R Markdown. Thus, the detailed procedure is skipped here. Trimming results in the removal of \(n=7\) (i.e. 1.1%) data points.
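As above, a minimal sketch of the assumed residual-based trimming step (the 2.5 SD cutoff is an assumption):
## keep only data points whose standardized residuals lie within 2.5 SD
## (assumed cutoff; the original criterion is not shown here)
keep <- abs(scale(resid(lmer.ALL.2))) < 2.5
data.trim <- data[as.vector(keep), ]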
lmer.final <- lmer(sDurLog ~
ALL_Comp.2b +
ALL_Comp.3b +
ALL_Comp.4b +
SylPerMin +
pauseBin +
folType +
preC +
(1 | speaker),
data=data.trim, REML=T)
anova(lmer.final)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ALL_Comp.2b 0.1543 0.1543 1 604.57 5.8258 0.0160893 *
ALL_Comp.3b 0.3601 0.3601 1 600.02 13.5939 0.0002475 ***
ALL_Comp.4b 1.0439 1.0439 1 612.56 39.4113 6.503e-10 ***
SylPerMin 25.0421 25.0421 1 538.90 945.4032 < 2.2e-16 ***
pauseBin 0.8957 0.8957 1 625.73 33.8158 9.665e-09 ***
folType 0.5697 0.1424 4 603.42 5.3772 0.0002931 ***
preC 0.2770 0.0923 3 600.75 3.4856 0.0156299 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: sDurLog ~ ALL_Comp.2b + ALL_Comp.3b + ALL_Comp.4b + SylPerMin +
pauseBin + folType + preC + (1 | speaker)
Data: data.trim
REML criterion at convergence: -362.8
Scaled residuals:
Min 1Q Median 3Q Max
-2.51964 -0.63451 0.01489 0.70055 2.54587
Random effects:
Groups Name Variance Std.Dev.
speaker (Intercept) 0.009063 0.0952
Residual 0.026488 0.1628
Number of obs: 646, groups: speaker, 40
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -0.829145 0.048025 420.881076 -17.265 < 2e-16 ***
ALL_Comp.2b -0.008318 0.003446 604.567104 -2.414 0.016089 *
ALL_Comp.3b -0.017038 0.004621 600.018486 -3.687 0.000247 ***
ALL_Comp.4b -0.033745 0.005375 612.562512 -6.278 6.50e-10 ***
SylPerMin -0.524868 0.017070 538.898703 -30.747 < 2e-16 ***
pauseBinpause 0.098413 0.016924 625.729472 5.815 9.66e-09 ***
folTypeF -0.012088 0.052199 606.447567 -0.232 0.816951
folTypeN 0.013395 0.020195 598.930413 0.663 0.507405
folTypeP 0.017635 0.017776 600.826520 0.992 0.321552
folTypeV -0.058057 0.017995 607.711315 -3.226 0.001321 **
preCk -0.031337 0.019489 599.938538 -1.608 0.108381
preCp -0.031340 0.019345 600.318088 -1.620 0.105745
preCt -0.062844 0.019479 601.786821 -3.226 0.001322 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
11.5 LMER vs. GAMM
Let’s compare the results of both the LMER and the GAMM model.
As we have seen already, the GAMM model predicts a rather unclear effect of ALL_Comp.3b:
Summary:
* SylPerMin : numeric predictor; set to the value(s): 2.336339423.
* pauseBin : factor; set to the value(s): no_pause.
* folType : factor; set to the value(s): APP.
* ALL_Comp.3b : numeric predictor; with 30 values ranging from -4.311999 to 2.983377.
* ALL_Comp.4b : numeric predictor; set to the value(s): 0.0515270814629842.
* speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.)
* NOTE : The following random effects columns are canceled: s(speaker)
Considering the unclear picture, we checked an LMER model:
The LMER model predicts shorter /s/ durations for higher ALL_Comp.3b values.
For ALL_Comp.4b, both models assume a linear effect pointing in the same direction.
Summary:
* SylPerMin : numeric predictor; set to the value(s): 2.336339423.
* pauseBin : factor; set to the value(s): no_pause.
* folType : factor; set to the value(s): APP.
* ALL_Comp.3b : numeric predictor; set to the value(s): 0.0225544158487821.
* ALL_Comp.4b : numeric predictor; with 30 values ranging from -3.315741 to 1.854101.
* speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.)
* NOTE : The following random effects columns are canceled: s(speaker)
Thus, it appears that higher ALL_Comp.4b values lead to shorter /s/ durations.
Lastly, the LMER model also contains ALL_Comp.2b as a significant predictor of /s/ duration.
It appears that higher ALL_Comp.2b values lead to shorter /s/ durations.
In sum, higher ALL_Comp.2b and ALL_Comp.3b values lead to shorter /s/ durations. That is, higher semantic activation diversity leads to shorter /s/ durations.
Additionally, higher ALL_Comp.4b values lead to shorter /s/ durations. That is, pseudowords which are part of a denser semantic neighbourhood, i.e. which show higher values of ALC, show shorter /s/ durations. As for the Affix part of this variable, one can observe that non-morphemic /s/ shows lower mean values of ALL_Comp.4b than plural /s/ (\(-0.99\) vs. \(0.85\), \(p<0.001\)). Thus, higher values for plural /s/ cases go together with higher values of ALC, and of ALL_Comp.4b in general, leading to shorter /s/ durations in plural pseudowords.
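As before, this comparison of means can presumably be checked with a simple two-sample test (a sketch):
## compare mean ALL_Comp.4b values of non-morphemic vs. plural /s/
t.test(ALL_Comp.4b ~ Affix, data = data)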
12 Summary
- Lower semantic activation diversity leads to longer /s/ durations (WWL_Comp.3; ALL_Comp.2b; ALL_Comp.3b).
- A denser semantic neighbourhood leads to shorter /s/ durations (ALC).
- Higher semantic certainty leads to longer /s/ durations (LCF_Comp.2b; ALL_Comp.4b).
References
Baayen, R. H., Chuang, Y. Y., & Blevins, J. P. (2018). Inflectional morphology with linear mappings. The Mental Lexicon, 13(2), 232-270. doi.org/10.1075/ml.18010.baa
Baayen, R. H., Chuang, Y. Y., Shafaei-Bajestan, E., & Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, 2019, 1-39. doi.org/10.1155/2019/4895891
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245-276.
Chavant, M., Kuentz, V., Labenne, A., Liquet, B., & Saracco, J. (2017). PCAmixdata: Multivariate Analysis of Mixed Data. R package version 3.1. https://CRAN.R-project.org/package=PCAmixdata
Chuang, Y.-Y., Vollmer, M.-L., Shafaei-Bajestan, E., Gahl, S., Hendrix, P., & Baayen, R. H. (2020). The processing of pseudoword form and meaning in production and comprehension: A computational modeling approach using Linear Discriminative Learning. Behavior Research Methods, 1-51. doi.org/10.3758/s13428-020-01356-w
Nykamp, D. (2020, November 10). Multiplying matrices and vectors. Math Insight. mathinsight.org/matrix_vector_multiplication
Schmitz, D., Baer-Henney, D., & Plag, I. (submitted). The duration of word-final /s/ differs across morphological categories in English: Evidence from pseudowords.
Tucker, B. V., Brenner, D., Danielson, D. K., Kelley, M. C., Nenadic, F., & Sims, M. (2018). The massive auditory lexical decision (MALD) database. Behavior Research Methods, 1-18. doi.org/10.3758/s13428-018-1056-1