Durational differences of homophonous suffixes emerge from the lexicon: After Workshop
This script documents the analysis of ENGS production data and MALD corpus data with measures of Linear Discriminative Learning (LDL; e.g. Baayen et al., 2018). This is a work in progress.
The idea of this LDL implementation is the following: Networks trained on real word data and their transformation matrices are used to create semantic matrices for pseudowords, following the implementation by Chuang et al. (2020). The underlying assumption is that speakers who face pseudowords do so with their knowledge on previously encountered words, i.e. their lexicon. Thus, pseudowords are not semantically empty shells but contain some sort of meaning.
The following script includes a reproduction of most steps found in the two previous scripts, here and here. Newly added is the updated pseudoword S matrix creation as well as a more thorough modelling procedure. [to be continued]
1 Data
1.1 MALD data
Data on real words is taken from the Massive Auditory Lexical Decision (MALD) Database (Tucker et al., 2018). It is a subset of the complete MALD database, containing 8362 words of which 2120 words contain one of 36 different affixes.
[1] 8362 4
1.2 ENGS data
Pseudoword data is taken from the ENGS production experiment (Schmitz et al., 2020). In total, 48 pseudowords are contained within this data set, with half of them representing monomorphemic words (e.g. singular bloufs) and half of them representing plural wordforms (e.g. plural bloufs). As a number of pseudowords showed more than one consistent pronunciation, these are contained more than twice, e.g. prups.
## load pseudowords (ENGS)
engs.data <- read.csv("data666/final_S_data_LDL_types.csv")
dim(engs.data)
[1] 78 4
1.3 COMB data
The combined data set contains both data on real and on nonce words as introduced in 1.1 and 1.2. In total, 8440 words and 36 different affixes are part of this data set.
## create a combined data set
comb.data <- rbind(mald.data, engs.data)
## sort combined data set alphabetically
comb.data <- comb.data[order(comb.data$Word),]
dim(comb.data)
[1] 8440 4
2 A matrix
We can create an A matrix for real words using the semantic vectors constructed of the TASA corpus. The A matrix consists of semantic vectors (columns) for all words in orthographic form (rows).
X | meat | mechanic | mechanical | mechanism | medal |
---|---|---|---|---|---|
eat | 0.0118417 | -0.0001872 | 0.0002885 | -0.0002331 | 1.98e-05 |
echo | 0.0005983 | -0.0000384 | -0.0000922 | -0.0000899 | -2.05e-05 |
eclipse | 0.0000350 | -0.0003962 | -0.0000451 | -0.0000278 | -9.70e-06 |
## load TASA vectors
load("data_simon/wL2Ld.rda")
## find all unique MALD (= real word) lexomes
mald.subset.allLexomes = unique(c(mald.data$Word, mald.data$Affix))
mald.subset.allLexomes = mald.subset.allLexomes[!is.na(mald.subset.allLexomes)]
## find vectors for all unique lexomes > this then is the A matrix
mald.Amat = wL2Ld[mald.subset.allLexomes, ]
3 C matrices
C matrices contain all triphones of a data set (columns) which are coded binary for all words in phonological form (rows).
X | X..b | X.b. | b.n | X.nd | nd. | d.n | X.nm | nm. | m.n | X.nt | nt. | b.S | X.S. | X..b.1 | X.bI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@b{nd@nm@nt | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
@b{S | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
{bI | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
{b@t | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
{bdIk1t | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
3.1 real words
The C matrix for real words is computed based on the MALD data set and \(Word+Affix\), i.e. word forms and affixes are considered.
mald.cues = make_cue_matrix(data = mald.data, formula = ~ Word + Affix, grams = 3, wordform = "DISC")
dim(mald.cues$matrices$C)
# 8328 7587
4 S matrices
S matrices contain all semantic vectors of a data set (columns) and their pertinent values for all words in phonological form (rows).
X | meat | mechanic | mechanical | mechanism | medal |
---|---|---|---|---|---|
it | 0.0118417 | -0.0001872 | 0.0002885 | -0.0002331 | 1.98e-05 |
Ek5 | 0.0005983 | -0.0000384 | -0.0000922 | -0.0000899 | -2.05e-05 |
IklIps | 0.0000350 | -0.0003962 | -0.0000451 | -0.0000278 | -9.70e-06 |
4.1 real words
The S matrix for real words is computed based on the MALD data set, the MALD A matrix, and \(Word+Affix\), i.e. word forms and affixes are considered.
mald.Smat = make_S_matrix(data = mald.data, formula = ~ Word + Affix, inputA = mald.Amat, grams = 3, wordform = "DISC")
dim(mald.Smat)
# 8328 5487
4.2 pseudowords
We can create a semantic matrix for pseudoword based on the semantics of real words solving the following equation
\[engs.Smat = engs.C * F\] with engs.Smat
being the semantic matrix for pseudowords, engs.C
being the C matrix contained within the cue matrix of pseudowords, and F
being the transformation matrix for mapping real word cues onto real word semantics.
To obtain this real word transformation matrix F
(Hp
in Chuang et al., 2020) we first need to learn comprehension of real words.
Then, we can extract F
from mald.comp
.
The following equation illustrates the use of the F
transformation matrix in solving \(mald.C * F = estimated.mald.Smat\):
\[
\left(\begin{array}{cc}
A & B & C\\
D & E & F\\
G & H & I
\end{array}\right)
\left(\begin{array}{cc}
K & L & M\\
N & O & P\\
Q & R & S
\end{array}\right)
=
\left(\begin{array}{cc}
AK+BN+CQ & AL+BO+CR & AM+BP+CS\\
DK+EN+FQ & DL+EO+FR & DM+EP+FS\\
GK+HN+IQ & GL+HN+IR & GM+HP+IS
\end{array}\right)
\] That is, the F
matrix corresponds to the order of columns and rows found in its original computation (i.e. in learn_comprehension
). Thus, one must consider this order when using the F matrix in further calculations. Otherwise, wrong rows will be multiplied with wrong columns, as is shown in the following examples.
\[
\left(\begin{array}{cc}
A & \alpha & B\\
D & \beta & E\\
G & \gamma & H
\end{array}\right)
\left(\begin{array}{cc}
K & L & M\\
N & O & P\\
Q & R & S
\end{array}\right)
=
\left(\begin{array}{cc}
AK+\alpha N+BQ & AL+\alpha O+BR & AM+\alpha P+BS\\
DK+\beta N+EQ & DL+\beta O+ER & DM+\beta P+ES\\
GK+\gamma N+HQ & GL+\gamma N+HR & GM+\gamma P+HS
\end{array}\right)
\] Let’s assume the \((\alpha, \beta, \gamma)\) column represents the triphone pru and the \((B, E, H)\) column represents the triphone pli. In the above example, \((\alpha, \beta, \gamma)\) is multiplied with the \((N, O, P)\) row of the F
matrix. However, in the real word C matrix, the second column is \((B, E, H)\). Thus, \((B, E, H)\) should be multiplied with \((N, O, P)\) instead. This is shown below:
\[ \left(\begin{array}{cc} A & B & \alpha \\ D & E & \beta \\ G & H & \gamma \end{array}\right) \left(\begin{array}{cc} K & L & M\\ N & O & P\\ Q & R & S \end{array}\right) = \left(\begin{array}{cc} AK+BN+\alpha Q & AL+BO+\alpha R & AM+BP+\alpha S\\ DK+EN+\beta Q & DL+EO+\beta R & DM+EP+\beta S\\ GK+HN+\gamma Q & GL+HN+\gamma R & GM+HP+\gamma S \end{array}\right) \]
Addtionally, for matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the second matrix (e.g. Nykamp, 2020).
Thus, we must make sure to use a pseudoword C matrix with the same order of columns as the real word C matrix. This, in turn, also expands the pseudoword C matrix, i.e. to entertain all column names of the real word C matrix we must have an identical number of columns.
Using two functions of the LDLConvFunctions
package, this is easily done. First, find which column in the real word C matrix corresponds to which column in the pseudoword C matrix:
Then, use this information to create a new pseudoword C matrix which a) has the same number of columns as the real word C matrix, and b) the same order of columns (i.e. triphones) as the real word C matrix:
pseudo.C.new <- reorder_pseudo_C_matrix(pseudo_C_matrix = engs.cues, real_C_matrix = mald.cues, found_triphones = found_triphones)
Now, we can solve the aforementioned equation \(engs.Smat = engs.C * F\).
engs.Smat = pseudo.C.new %*% F
dim(engs.Smat)
# 78 5487
# change rownames to pseudoword rownames (instead of numbers)
rownames(engs.Smat) <- engs.data$DISC
In this S matrix, 48 pseudowords (i.e. 78 pronunciations) and their semantic vectors are contained. However, all of our pseudoword pairs are homophones, thus, they show identical semantic vectors even though half of them contain a suffix, e.g. the semantic vectors for bloufs (non-morphemic S) and bloufs (plural S) are identical.
X | fierce | fifteen | fifteenth | fifth | fifty |
---|---|---|---|---|---|
bl6fs | -0.0001453 | 0.0002496 | 0.0002782 | 0.0006957 | -0.0001303 |
bl6fs | -0.0001453 | 0.0002496 | 0.0002782 | 0.0006957 | -0.0001303 |
bl6ks | 0.0000021 | -0.0006015 | 0.0000090 | -0.0012144 | 0.0003275 |
bl6ks | 0.0000021 | -0.0006015 | 0.0000090 | -0.0012144 | 0.0003275 |
We “fix” this by adding the “PL” semantic vector values to the semantic vectors of plural pseudowords (idea by Yu-Ying).
Thus, we first need to extract the “PL” semantic vector contained in the general A matrix.
Taking a closer look at the pseudoword S matrix, we find that every second row corresponds to a plural pseudoword, i.e. this is where we add the PL vector.
Taking this plural subset, we add the “PL” semantic vector to all rows.
We then recombine the original monomorphemic S matrix and the modified plural S matrix, and recreate the original row order.
engs.nm = engs.Smat[seq(1, nrow(engs.Smat), 2), ]
engs.Smat <- rbind(engs.nm, engs.pl.added)
sort.rows <- row.names(engs.Smat.old)
engs.Smat <- engs.Smat[order(match(rownames(engs.Smat), sort.rows)), , drop = FALSE]
Now, homophonous pseudowords have differing semantic vectors.
X | fierce | fifteen | fifteenth | fifth | fifty |
---|---|---|---|---|---|
bl6fs | -0.0001453 | 0.0002496 | 0.0002782 | 0.0006957 | -0.0001303 |
bl6fs | -0.0000796 | 0.0001372 | 0.0003686 | 0.0003187 | -0.0002868 |
bl6ks | 0.0000021 | -0.0006015 | 0.0000090 | -0.0012144 | 0.0003275 |
bl6ks | 0.0000678 | -0.0007139 | 0.0000994 | -0.0015913 | 0.0001709 |
4.3 combined
Finally, we can create one combined S matrix for real and nonce words.
However, LDL commands in R are sensitive to row order. Thus, we need to recreate the original row order as found in the combined C matrix.
all(rownames(comb.cues$matrices$C) == rownames(comb.Smat))
# FALSE
comb.Smat2 <- comb.Smat[rownames(comb.cues$matrices$C),,drop=FALSE]
all(rownames(comb.cues$matrices$C) == rownames(comb.Smat2))
# TRUE
5 Comprehension
Using the combined C and S matrices, we can now learn and evaluate comprehension of real and nonce words in one go, i.e. as if they are encountered by a person with their pertinent lexicon.
First, comprehension is learned.
Second, the accuracy of this comprehension is evaluated. That is, how good can form (C matrix) be mapped onto meaning (S matrix).
comb.comp_acc = accuracy_comprehension(m = comb.comp, data = comb.data, wordform = "DISC", show_rank = TRUE, neighborhood_density = TRUE)
comb.comp_acc$acc # 0.7440758
We find that comprehension accuracy is about \(74\%\).
6 Production
Using the combined C and S matrices as well as the comprehension results, we now are able to learn production and evaluate its accuracy.
First, production is learned.
Second, the accuracy of this production is evaluated. That is, how good can meaning (S matrix) be mapped onto form (C matrix).
comb.prod_acc = accuracy_production(m = comb.prod, data = comb.data, wordform = "DISC", full_results = TRUE, return_triphone_supports = TRUE)
comb.prod_acc$acc # 0.9729858
We find that production accuracy is about \(97\%\).
7 Measures
7.1 Comprehension
Extracting and saving comprehension measures is straightforward.
comb.comp_measures = comprehension_measures(comp_acc = comb.comp_acc, Shat = comb.comp$Shat, S = comb.Smat2, A = mald.Amat, affixFunctions = comb.data$Affix)
- recognition: whether the correct form was recognized
- cor_max: the correlation between the pertinent vector in Shat and the semantic vector of its predicted form in S
- cor_target: the correlation between the pertinent vector in Shat and the semantic vector of its targeted form in S
- l1norm: city block distance / Manhattan distance of Shat
- l2norm: euclidian (direct) distance / length of Shat
- rank: the rank of correlation between s-hat and the semantic vector of its targeted form
- density: the mean correlation of s-hat with the semantic vectors of its top 8 neighbors
- cor_affix: the correlation of s-hat and the semantic vector of its affix function; more or less a measure of transparency
7.2 Production
Extracting and saving production measures runs into problems. It appears that there is 1 word for which comb.prod_acc
does not contain all necessary information. Why? I have no clue.
I worked around this problem by removing the word from full
, i.e. the list in which all necessary info is contained within comb.prod_acc
.
Using the summary()
function, the case with missing info is identified. I call this newly created data frame nineortwelve
as complete entries consist of 12 entries per word, while incomplete entries consist of only 9 entries per word.
Checking this data frame, the following entry can be found with only 9 individual entries instead of 12. Thus, it is removed from full2
.
Fun fact: entry 3725 belongs to the word inauguration.
This reduced full2
then replaces the original full
in a copy of comb.prod_acc
.
Only then production measures can be obtained.
comb.prod_measures = production_measures(prod_acc = comb.prod_acc2, S = comb.Smat2, prod = comb.prod)
- correlations: the correlation of the predicted path with the targeted semantic vector
- path_counts: the number of paths detected
- path_sum: the summed support for the predicted path, could be read as the uncertainty about how to pronounce the word due to competing activation of articulatory motor plans
- path_entropies: Shannon entropy calculated over the path supports
- lwlr: Simon: “The length-weakest link ratios of the predicted path (also sometimes equal to word length). When the length of a word’s path, as measured in number of nodes, becomes higher, while the weakest link between nodes stays the same, uncertainty becomes greater. If triphone support is high, i.e. when all the strengths of the links are almost equal to one, the value will be almost equal to the number of segments in the word. This may not be the case in other models, where the triphone support may be lower.”
8 Duration Data
The data set used in the remainder of this script contains the following parts:
- ENGS production data (Schmitz et al., 2020)
- LDL measures as extracted by the
WpmWithLdl
package - LDL measures as extracted by the
LDLConvFunctions
package
The data set contains 52 columns and 653 rows, i.e. 653 observations of 52 variables:
9 WpmWithLdl measures
WpmWithLdl
includes the following measures via the comprehension_measures
and production_measures
functions: click here.
Without discussing the underlying implications of the individual measures, let’s see whether some of them are correlated. If so, we need to take some measures before including them in any models.
9.1 Correlations
Correlations of all WpmWithLdl
measures:
pairscor.fnc(data[,c("cor_max", "cor_target", "l1norm", "l2norm", "density", "cor_affix",
"correlations", "path_counts", "path_sum", "path_entropies", "lwlr")])
Note: rank
is not considered as a predictive variable as its value is \(1\) for all observations.
We find the following highly correlated (\(\rho\ge0.5\)) variable combinations:
- cor_max vs. cor_target, \(\rho=1\)
- cor_max vs. correlations, \(\rho=1\)
- cor_target vs. correlations, \(\rho=1\)
- l1norm vs. l2norm, \(\rho=0.95\)
- density vs. cor_affix, \(\rho=0.84\)
- path_counts vs. path_entropies, \(\rho=0.95\)
- path_sum vs. lwlr, \(\rho=-0.76\)
Thus, we need to take some measures before we run into problems of colinearity later on.
9.2 PCA
To avoid colinearity issues when modelling, we combine all highly correlated variables in a principal component analysis.
PCA Computation
First, we extract the highly correlated variables and merge them to form a new data frame.
pca.data1 <- data[colnames(data)=="cor_max"]
pca.data2 <- data[colnames(data)=="cor_target"]
pca.data3 <- data[colnames(data)=="correlations"]
pca.data4 <- data[colnames(data)=="l1norm"]
pca.data5 <- data[colnames(data)=="l2norm"]
pca.data6 <- data[colnames(data)=="density"]
pca.data7 <- data[colnames(data)=="cor_affix"] # this variable is dropped as it only contains data on PL cases
pca.data8 <- data[colnames(data)=="path_counts"]
pca.data9 <- data[colnames(data)=="path_entropies"]
pca.data10 <- data[colnames(data)=="path_sum"]
pca.data11 <- data[colnames(data)=="lwlr"]
pca.data <- cbind(pca.data1,
pca.data2,
pca.data3,
pca.data4,
pca.data5,
pca.data6,
#pca.data7,
pca.data8,
pca.data9,
pca.data10,
pca.data11)
Then, we compute a principal component analysis using the princomp
function:
In order to decide which components to keep, we follow three criteria:
- The Eigenvalue-criterion: Any component that displays an eigenvalue greater than 1.00 is accounting for a greater amount of variance than had been contributed by one variable. Such a component is therefore accounting for a meaningful amount of variance, and is worthy of being retained.
- The scree-criterion: Plot the eigenvalues associated with each component and look for a “break” between the components with relatively large eigenvalues and those with small eigenvalues. The components that appear before the break are assumed to be meaningful and are retained (Cattell, 1966).
- The proportion-of-variance-criterion: Retain enough components so that the cumulative percent of variance accounted for is equal to some minimal value, i.e. 80 percent.
# load package to extract Eigenvalues
library("factoextra")
# extract Eigenvalues
get_eigenvalue(pc)
eigenvalue variance.percent cumulative.variance.percent
Dim.1 3.155143e+00 3.155143e+01 31.55143
Dim.2 2.782508e+00 2.782508e+01 59.37650
Dim.3 2.002288e+00 2.002288e+01 79.39938
Dim.4 1.111486e+00 1.111486e+01 90.51424
Dim.5 6.368246e-01 6.368246e+00 96.88248
Dim.6 2.192438e-01 2.192438e+00 99.07492
Dim.7 6.284442e-02 6.284442e-01 99.70337
Dim.8 2.966341e-02 2.966341e-01 100.00000
Dim.9 7.170136e-17 7.170136e-16 100.00000
Dim.10 0.000000e+00 0.000000e+00 100.00000
Taking all three criteria into consideration, we find that Components 1, 2, and 3 show an Eigenvalue of \(v\ge1\), are followed by a break in Eigenvalues with the Eigenvalue of Component 3 being \(v=2\) and the Eigenvalue of Component 4 being \(v=1.1\), and that the cumulative percentage of variance accounted for by all three components is \(σ^2=79.4\%\), i.e. close to our target level of \(σ^2=80\%\).
Thus, we add components 1, 2, and 3 to our data frame.
pcframe <- as.data.frame(pc$scores)
pcframe <- pcframe[1:3]
names(pcframe)[names(pcframe) == "Comp.1"] <- "WWL_Comp.1"
names(pcframe)[names(pcframe) == "Comp.2"] <- "WWL_Comp.2"
names(pcframe)[names(pcframe) == "Comp.3"] <- "WWL_Comp.3"
data <- cbind(data, pcframe)
PCA Loadings
Now we can take a closer look at the individual components and their loadings.
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
cor_max 0.462 0.339 0.349
cor_target 0.462 0.339 0.465
correlations 0.462 0.339 -0.814
l1norm 0.211 -0.179 -0.579 0.179 -0.247 -0.509 0.484
l2norm 0.213 -0.170 -0.592 -0.242 -0.158 0.501 -0.483
density -0.246 0.682 0.665 0.120
path_counts 0.296 -0.413 -0.396 0.283 -0.505 -0.499
path_entropies 0.279 -0.408 -0.465 0.190 0.131 0.474 0.510
path_sum -0.208 0.297 -0.377 -0.272 0.516 -0.609 0.119
lwlr 0.229 -0.328 0.403 0.219 -0.224 -0.755
Comp.10
cor_max 0.738
cor_target -0.671
correlations
l1norm
l2norm
density
path_counts
path_entropies
path_sum
lwlr
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
SS loadings 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Proportion Var 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
Cumulative Var 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Comp.10
SS loadings 1.0
Proportion Var 0.1
Cumulative Var 1.0
But what do these loadings conceptually mean?
Component 1
Component 1
is rather highly correlated with cor_max
, cor_target
, and correlations
.
For cor_max
, high values indicate a high correlation between the estimated semantic vector and the semantic vector of another word.
For cor_target
, high values indicate a high correlation between the predicted and targeted semantic vector.
For correlations
, high values indicate a high correlation of the predicted path [and its predicted semantics] with the targeted semantic vector.
Note: Where does correlations
come from? All candidate forms of a word have their own predicted semantic vectors; the value of correlations
is the correlation value of the chosen candidate’s semantic vector and the semantic vector that is to be realized in production.
Thus, for Component 1
, high values indicate higher semantic certainty, while lower values indicate higher semantic uncertainty.
Component 2
Component 1
is rather highly negatively correlated with path_counts
and path_entropies
.
For path_counts
, high values indicate a high number of paths detected.
For path_entropies
, high values indicate a high Shannon entropy calculated over the path supports.
In sum, Component 2
reflects the certainty in production, i.e. high values indicate low certainty.
Component 3
Component 3
is rather highly negatively correlated with l1norm
and l2norm
.
For l1norm
, higher values imply more strong links to many other lexomes.
For l2norm
, higher values imply more strong links to many other lexomes.
In sum, Component 3
reflects semantic certainty, i.e. higher values indicate higher certainty.
9.3 GAMMs
Let’s see whether the PCA components are significant predictors for word-final /s/ duration in singular and plural pseudowords. We will use Generalized Additive Mixed Models (GAMMs) to model /s/ durations.
Modelling
Step 0
First, create a model with all relevant variables, including control and PCA variables, and speaker
as random effect:
gam.WpmWithLdl.1 <- bam(sDurLog ~
Affix +
s(WWL_Comp.1) +
s(WWL_Comp.2) +
s(WWL_Comp.3) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
Checking the anova()
function’s output, we now exclude non-significant variables in a stepwise fashion. After each potential exclusion, compareML()
is used to check whether the exclusion is justified in terms of model fit.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(WWL_Comp.1) + s(WWL_Comp.2) + s(WWL_Comp.3) +
SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 30.020 6.29e-08
SylPerMin 1 817.699 < 2e-16
pauseBin 1 39.214 7.22e-10
folType 4 4.915 0.000662
preC 3 1.658 0.174939
monoMultilingual 1 0.452 0.501848
Approximate significance of smooth terms:
edf Ref.df F p-value
s(WWL_Comp.1) 1.000 1.001 0.012 0.914079
s(WWL_Comp.2) 3.105 3.943 1.424 0.222711
s(WWL_Comp.3) 3.182 3.926 5.524 0.000316
s(speaker) 31.041 38.000 4.992 < 2e-16
Step 1
Exclusion of WWL_Comp.1
.
gam.WpmWithLdl.2 <- bam(sDurLog ~
Affix +
#s(WWL_Comp.1) +
s(WWL_Comp.2) +
s(WWL_Comp.3) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.WpmWithLdl.1, gam.WpmWithLdl.2)
gam.WpmWithLdl.1: sDurLog ~ Affix + s(WWL_Comp.1) + s(WWL_Comp.2) + s(WWL_Comp.3) +
SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
gam.WpmWithLdl.2: sDurLog ~ Affix + s(WWL_Comp.2) + s(WWL_Comp.3) + SylPerMin +
pauseBin + folType + preC + monoMultilingual + s(speaker,
bs = "re")
Model gam.WpmWithLdl.2 preferred: lower REML score (3.083), and lower df (2.000).
-----
Model Score Edf Difference Df
1 gam.WpmWithLdl.1 -148.2247 19
2 gam.WpmWithLdl.2 -151.3075 17 -3.083 2.000
AIC difference: 4.61, model gam.WpmWithLdl.2 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(WWL_Comp.2) + s(WWL_Comp.3) + SylPerMin +
pauseBin + folType + preC + monoMultilingual + s(speaker,
bs = "re")
Parametric Terms:
df F p-value
Affix 1 28.318 1.45e-07
SylPerMin 1 819.008 < 2e-16
pauseBin 1 38.521 1.00e-09
folType 4 4.645 0.00106
preC 3 1.439 0.23027
monoMultilingual 1 0.430 0.51207
Approximate significance of smooth terms:
edf Ref.df F p-value
s(WWL_Comp.2) 1.000 1.001 2.644 0.104453
s(WWL_Comp.3) 3.093 3.891 4.927 0.000684
s(speaker) 31.068 38.000 5.007 < 2e-16
Step 2
Exclusion of WWL_Comp.2
.
gam.WpmWithLdl.3 <- bam(sDurLog ~
Affix +
#s(WWL_Comp.1) +
#s(WWL_Comp.2) +
s(WWL_Comp.3) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.WpmWithLdl.2, gam.WpmWithLdl.3)
gam.WpmWithLdl.2: sDurLog ~ Affix + s(WWL_Comp.2) + s(WWL_Comp.3) + SylPerMin +
pauseBin + folType + preC + monoMultilingual + s(speaker,
bs = "re")
gam.WpmWithLdl.3: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
preC + monoMultilingual + s(speaker, bs = "re")
Model gam.WpmWithLdl.3 preferred: lower REML score (2.707), and lower df (2.000).
-----
Model Score Edf Difference Df
1 gam.WpmWithLdl.2 -151.3075 17
2 gam.WpmWithLdl.3 -154.0144 15 -2.707 2.000
AIC difference: -1.38, model gam.WpmWithLdl.2 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
preC + monoMultilingual + s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 26.856 2.99e-07
SylPerMin 1 816.208 < 2e-16
pauseBin 1 38.939 8.21e-10
folType 4 4.850 0.000742
preC 3 1.368 0.251565
monoMultilingual 1 0.445 0.505057
Approximate significance of smooth terms:
edf Ref.df F p-value
s(WWL_Comp.3) 3.185 4.002 4.819 0.000775
s(speaker) 30.943 38.000 4.915 < 2e-16
Step 3
Exclusion of monoMultilingual
.
gam.WpmWithLdl.4 <- bam(sDurLog ~
Affix +
#s(WWL_Comp.1) +
#s(WWL_Comp.2) +
s(WWL_Comp.3) +
SylPerMin+
pauseBin+
folType+
preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.WpmWithLdl.3, gam.WpmWithLdl.4)
gam.WpmWithLdl.3: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
preC + monoMultilingual + s(speaker, bs = "re")
gam.WpmWithLdl.4: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
preC + s(speaker, bs = "re")
Model gam.WpmWithLdl.4 preferred: lower REML score (2.044), and lower df (1.000).
-----
Model Score Edf Difference Df
1 gam.WpmWithLdl.3 -154.0144 15
2 gam.WpmWithLdl.4 -156.0587 14 -2.044 1.000
AIC difference: 0.33, model gam.WpmWithLdl.4 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
preC + s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 26.825 3.03e-07
SylPerMin 1 831.268 < 2e-16
pauseBin 1 38.776 8.88e-10
folType 4 4.838 0.000757
preC 3 1.353 0.256172
Approximate significance of smooth terms:
edf Ref.df F p-value
s(WWL_Comp.3) 3.191 4.009 4.829 0.000752
s(speaker) 31.659 39.000 4.842 < 2e-16
Step 4
Exclusion of preC
.
gam.WpmWithLdl.5 <- bam(sDurLog ~
Affix +
#s(WWL_Comp.1) +
#s(WWL_Comp.2) +
s(WWL_Comp.3) +
SylPerMin+
pauseBin+
folType+
#preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.WpmWithLdl.4, gam.WpmWithLdl.5)
gam.WpmWithLdl.4: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
preC + s(speaker, bs = "re")
gam.WpmWithLdl.5: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Model gam.WpmWithLdl.5 preferred: lower REML score (7.281), and lower df (3.000).
-----
Model Score Edf Difference Df
1 gam.WpmWithLdl.4 -156.0587 14
2 gam.WpmWithLdl.5 -163.3395 11 -7.281 3.000
AIC difference: 1.40, model gam.WpmWithLdl.5 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 26.473 3.61e-07
SylPerMin 1 850.839 < 2e-16
pauseBin 1 39.885 5.19e-10
folType 4 5.143 0.000442
Approximate significance of smooth terms:
edf Ref.df F p-value
s(WWL_Comp.3) 3.151 3.964 4.636 0.00106
s(speaker) 31.592 39.000 4.785 < 2e-16
Step 5
Is the smooth for WWL_Comp.3
necessary?
gam.WpmWithLdl.6 <- bam(sDurLog ~
Affix +
#s(WWL_Comp.1) +
#s(WWL_Comp.2) +
WWL_Comp.3 +
SylPerMin+
pauseBin+
folType+
#preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.WpmWithLdl.5, gam.WpmWithLdl.6)
gam.WpmWithLdl.5: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
gam.WpmWithLdl.6: sDurLog ~ Affix + WWL_Comp.3 + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Chi-square test of REML scores
-----
Model Score Edf Difference Df p.value Sig.
1 gam.WpmWithLdl.6 -159.2081 10
2 gam.WpmWithLdl.5 -163.3395 11 4.131 1.000 0.004 **
AIC difference: -9.43, model gam.WpmWithLdl.5 has lower AIC.
Yes, let’s keep the smooth.
Summary
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.8722959 0.0473099 -18.438 < 2e-16 ***
AffixPL -0.0754263 0.0146596 -5.145 3.61e-07 ***
SylPerMin -0.5032766 0.0172537 -29.169 < 2e-16 ***
pauseBinpause 0.1122957 0.0177810 6.315 5.19e-10 ***
folTypeF -0.0185203 0.0550083 -0.337 0.736472
folTypeN 0.0002509 0.0211516 0.012 0.990540
folTypeP 0.0095421 0.0185635 0.514 0.607421
folTypeV -0.0672956 0.0188033 -3.579 0.000372 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(WWL_Comp.3) 3.151 3.964 4.636 0.00106 **
s(speaker) 31.592 39.000 4.785 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.804 Deviance explained = 81.7%
-REML = -163.34 Scale est. = 0.029512 n = 653
Summary:
* Affix : factor; set to the value(s): PL.
* SylPerMin : numeric predictor; set to the value(s): 2.336339423.
* pauseBin : factor; set to the value(s): no_pause.
* folType : factor; set to the value(s): APP.
* WWL_Comp.3 : numeric predictor; with 30 values ranging from -3.513786 to 3.834930.
* speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.)
* NOTE : The following random effects columns are canceled: s(speaker)
However, as we only have a few values to the far left, but a clearly linear slope to the right for a multitude of data points, these results may indicate a rather linear relationship. That is, higher WWL_Comp.3
values lead to longer /s/ durations, i.e. a higher semantic certainty leads to longer /s/ durations.
9.4 LMER
Let’s see whether a linear mixed effects regression model does a similarly good job as compared to the previous GAMMs model.
Modelling
First, create a model with all relevant variables, including control and PCA variables, and speaker
as random effect:
lmer.1 <- lmer(sDurLog ~
Affix +
WWL_Comp.1 +
WWL_Comp.2 +
WWL_Comp.3 +
SylPerMin +
pauseBin +
folType +
preC +
(1 | monoMultilingual) +
(1 | speaker),
data=data, REML=T)
anova(lmer.1)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Affix 0.8772 0.8772 1 627.56 29.3486 8.626e-08 ***
WWL_Comp.1 0.0094 0.0094 1 611.01 0.3131 0.575983
WWL_Comp.2 0.0724 0.0724 1 610.91 2.4219 0.120166
WWL_Comp.3 0.2908 0.2908 1 607.50 9.7290 0.001900 **
SylPerMin 24.3084 24.3084 1 524.79 813.2841 < 2.2e-16 ***
pauseBin 1.1917 1.1917 1 634.01 39.8705 5.105e-10 ***
folType 0.5508 0.1377 4 610.60 4.6074 0.001135 **
preC 0.1804 0.0601 3 607.23 2.0114 0.111149
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using the step() function, we exclude all non-significant predictors:
Backward reduced random-effect table:
Eliminated npar logLik AIC LRT Df Pr(>Chisq)
<none> 17 144.931 -255.86
(1 | monoMultilingual) 1 16 144.931 -257.86 0.000 1 1
(1 | speaker) 0 15 96.682 -163.37 96.498 1 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Backward reduced fixed-effect table:
Degrees of freedom method: Satterthwaite
Eliminated Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
WWL_Comp.1 1 0.0094 0.0094 1 611.01 0.3131 0.5759828
preC 2 0.1711 0.0570 3 608.37 1.9105 0.1266333
WWL_Comp.2 3 0.0708 0.0708 1 616.28 2.3597 0.1250230
Affix 0 0.8281 0.8281 1 633.30 27.5107 2.133e-07 ***
WWL_Comp.3 0 0.2137 0.2137 1 612.02 7.0983 0.0079192 **
SylPerMin 0 24.9558 24.9558 1 532.92 829.1163 < 2.2e-16 ***
pauseBin 0 1.2666 1.2666 1 639.57 42.0816 1.754e-10 ***
folType 0 0.6187 0.1547 4 615.86 5.1387 0.0004454 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model found:
sDurLog ~ Affix + WWL_Comp.3 + SylPerMin + pauseBin + folType +
(1 | speaker)
This leaves us with the following final model:
lmer.2 <- lmer(sDurLog ~
Affix +
WWL_Comp.3 +
SylPerMin +
pauseBin +
folType +
(1 | speaker),
data=data, REML=T)
anova(lmer.2)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Affix 0.8281 0.8281 1 633.30 27.5107 2.133e-07 ***
WWL_Comp.3 0.2137 0.2137 1 612.02 7.0983 0.0079192 **
SylPerMin 24.9558 24.9558 1 532.92 829.1163 < 2.2e-16 ***
pauseBin 1.2666 1.2666 1 639.57 42.0816 1.754e-10 ***
folType 0.6187 0.1547 4 615.86 5.1387 0.0004454 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
LMER vs. GAMM
Let’s compare the results of both, LMER and GAMM, models:
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: sDurLog ~ Affix + WWL_Comp.3 + SylPerMin + pauseBin + folType +
(1 | speaker)
Data: data
REML criterion at convergence: -318.4
Scaled residuals:
Min 1Q Median 3Q Max
-2.8969 -0.5968 -0.0009 0.6709 3.3861
Random effects:
Groups Name Variance Std.Dev.
speaker (Intercept) 0.00846 0.09198
Residual 0.03010 0.17349
Number of obs: 653, groups: speaker, 40
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -0.886456 0.047377 391.229475 -18.711 < 2e-16 ***
AffixPL -0.077320 0.014741 633.298586 -5.245 2.13e-07 ***
WWL_Comp.3 0.013042 0.004895 612.024712 2.664 0.007919 **
SylPerMin -0.497273 0.017270 532.920182 -28.794 < 2e-16 ***
pauseBinpause 0.116108 0.017898 639.573724 6.487 1.75e-10 ***
folTypeF -0.022973 0.055467 619.793165 -0.414 0.678891
folTypeN -0.001099 0.021344 610.959988 -0.051 0.958951
folTypeP 0.008526 0.018726 612.168568 0.455 0.649052
folTypeV -0.068649 0.018978 619.630061 -3.617 0.000322 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) AffxPL WWL_C. SylPrM psBnps flTypF flTypN flTypP
AffixPL -0.086
WWL_Comp.3 0.094 0.077
SylPerMin -0.893 -0.080 -0.108
pauseBinpas -0.449 -0.124 -0.109 0.418
folTypeF -0.087 0.059 -0.021 0.020 0.002
folTypeN -0.136 0.091 0.026 -0.035 -0.072 0.135
folTypeP -0.075 0.029 -0.012 -0.117 -0.093 0.153 0.409
folTypeV 0.045 -0.016 0.021 -0.230 -0.227 0.139 0.413 0.489
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.8722959 0.0473099 -18.438 < 2e-16 ***
AffixPL -0.0754263 0.0146596 -5.145 3.61e-07 ***
SylPerMin -0.5032766 0.0172537 -29.169 < 2e-16 ***
pauseBinpause 0.1122957 0.0177810 6.315 5.19e-10 ***
folTypeF -0.0185203 0.0550083 -0.337 0.736472
folTypeN 0.0002509 0.0211516 0.012 0.990540
folTypeP 0.0095421 0.0185635 0.514 0.607421
folTypeV -0.0672956 0.0188033 -3.579 0.000372 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(WWL_Comp.3) 3.151 3.964 4.636 0.00106 **
s(speaker) 31.592 39.000 4.785 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.804 Deviance explained = 81.7%
-REML = -163.34 Scale est. = 0.029512 n = 653
10 LDLConvFunctions measures
LDLConvFunctions
includes the following measures as implemented in Chuang et al. (2020): click here.
Without discussing the underlying implications of the individual measures, let’s see whether some of them are correlated. If so, we need to take some measures before including them in any models.
10.1 Correlations
Correlations of all LDLConvFunctions
measures:
We find the following highly correlated (\(\rho\ge0.5\)) variable combination:
- EDNN vs. SCPP, \(\rho=1\)
Thus, we need to take some measures before we run into problems of colinearity later on.
10.2 PCA
To avoid colinearity issues when modelling, we combine all highly correlated variables in a principal component analysis.
PCA Computation
First, we extract the highly correlated variables and merge them to form a new data frame.
pca.data1 <- data[colnames(data)=="EDNN"]
pca.data2 <- data[colnames(data)=="SCPP"]
pca.data <- cbind(pca.data1,
pca.data2)
Then, we compute a principal component analysis using the princomp
function:
In order to decide which components to keep, we follow three criteria:
- The Eigenvalue-criterion: Any component that displays an eigenvalue greater than 1.00 is accounting for a greater amount of variance than had been contributed by one variable. Such a component is therefore accounting for a meaningful amount of variance, and is worthy of being retained.
- The scree-criterion: Plot the eigenvalues associated with each component and look for a “break” between the components with relatively large eigenvalues and those with small eigenvalues. The components that appear before the break are assumed to be meaningful and are retained (Cattell, 1966).
- The proportion-of-variance-criterion: Retain enough components so that the cumulative percent of variance accounted for is equal to some minimal value, i.e. 80 percent.
# load package to extract Eigenvalues
library("factoextra")
# extract Eigenvalues
get_eigenvalue(pc)
eigenvalue variance.percent cumulative.variance.percent
Dim.1 1.8592701 92.963503 92.9635
Dim.2 0.1407299 7.036497 100.0000
Taking all three criteria into consideration, we find that Component 1 shows an Eigenvalue of \(v\ge1\), is followed by a break in Eigenvalues with the Eigenvalue of Component 1 being \(v=1.9\) and the Eigenvalue of Component 2 being \(v=0.1\), and that the cumulative percentage of variance accounted for by Component 1 is \(σ^2=93\%\), i.e. above our target level of \(σ^2=80\%\).
Thus, we add component 1 to our data frame.
pcframe <- as.data.frame(pc$scores)
pcframe <- pcframe[1]
names(pcframe)[names(pcframe) == "Comp.1"] <- "LCF_Comp.1"
data <- cbind(data, pcframe)
PCA Loadings
Now we can take a closer look at the component’s loadings.
Loadings:
Comp.1 Comp.2
EDNN 0.707 0.707
SCPP -0.707 0.707
Comp.1 Comp.2
SS loadings 1.0 1.0
Proportion Var 0.5 0.5
Cumulative Var 0.5 1.0
But what do these loadings conceptually mean?
Component 1
Component 1
is rather highly correlated with EDNN
and rather highly negatively correlated with SCPP
.
For EDNN
, high values indicate that a word is more distant from its nearest word neighbour.
For SCPP
, high values indicate a high correlation of the predicted path [and its predicted semantics] with the targeted semantic vector.
Note: SCPP
is identical to the correlations
measure of the WpmWithLdl
package.
Thus, for Component 1
, high values indicate less activation diversity, while lower values indicate more activation diversity.
10.3 GAMMs
Let’s see whether PCA component and LCF measures are significant predictors for word-final /s/ duration in singular and plural pseudowords. We will use Generalized Additive Mixed Models (GAMMs) to model /s/ durations.
Modelling
Step 0
First, create a model with all relevant variables, including control and PCA variables, and speaker
as random effect:
gam.LCF.1 <- bam(sDurLog ~
Affix +
s(LCF_Comp.1) +
s(ALC) +
s(ALDC, k=3) +
s(DRC) +
s(NNC) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
Checking the anova()
function’s output, we now exclude non-significant variables in a stepwise fashion. After each potential exclusion, compareML()
is used to check whether the exclusion is justified in terms of model fit.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(LCF_Comp.1) + s(ALC) + s(ALDC, k = 3) + s(DRC) +
s(NNC) + SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 18.228 2.28e-05
SylPerMin 1 826.702 < 2e-16
pauseBin 1 40.703 3.53e-10
folType 4 4.794 0.000819
preC 3 2.527 0.056543
monoMultilingual 1 0.352 0.553418
Approximate significance of smooth terms:
edf Ref.df F p-value
s(LCF_Comp.1) 1.000 1.000 0.339 0.5606
s(ALC) 2.334 2.917 3.856 0.0104
s(ALDC) 1.000 1.000 0.031 0.8602
s(DRC) 1.443 1.746 0.996 0.4691
s(NNC) 1.647 2.031 2.820 0.0585
s(speaker) 31.391 38.000 5.275 <2e-16
Step 1
Exclusion of ALDC
.
gam.LCF.2 <- bam(sDurLog ~
Affix +
s(LCF_Comp.1) +
s(ALC) +
#s(ALDC, k=3) +
s(DRC) +
s(NNC) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.LCF.1, gam.LCF.2)
gam.LCF.1: sDurLog ~ Affix + s(LCF_Comp.1) + s(ALC) + s(ALDC, k = 3) + s(DRC) +
s(NNC) + SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
gam.LCF.2: sDurLog ~ Affix + s(LCF_Comp.1) + s(ALC) + s(DRC) + s(NNC) +
SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
Model gam.LCF.2 preferred: lower REML score (3.869), and lower df (2.000).
-----
Model Score Edf Difference Df
1 gam.LCF.1 -145.5339 23
2 gam.LCF.2 -149.4032 21 -3.869 2.000
AIC difference: 2.76, model gam.LCF.2 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(LCF_Comp.1) + s(ALC) + s(DRC) + s(NNC) +
SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 18.548 1.93e-05
SylPerMin 1 830.084 < 2e-16
pauseBin 1 40.760 3.44e-10
folType 4 4.803 0.000806
preC 3 2.506 0.058130
monoMultilingual 1 0.352 0.553339
Approximate significance of smooth terms:
edf Ref.df F p-value
s(LCF_Comp.1) 1.000 1.001 0.383 0.53643
s(ALC) 2.389 2.980 3.920 0.00896
s(DRC) 1.505 1.831 1.174 0.40004
s(NNC) 1.624 2.003 2.874 0.05689
s(speaker) 31.408 38.000 5.287 < 2e-16
Step 2
Exclusion of LCF_Comp.1
.
gam.LCF.3 <- bam(sDurLog ~
Affix +
#s(LCF_Comp.1) +
s(ALC) +
#s(ALDC, k=3) +
s(DRC) +
s(NNC) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.LCF.2, gam.LCF.3)
gam.LCF.2: sDurLog ~ Affix + s(LCF_Comp.1) + s(ALC) + s(DRC) + s(NNC) +
SylPerMin + pauseBin + folType + preC + monoMultilingual +
s(speaker, bs = "re")
gam.LCF.3: sDurLog ~ Affix + s(ALC) + s(DRC) + s(NNC) + SylPerMin + pauseBin +
folType + preC + monoMultilingual + s(speaker, bs = "re")
Model gam.LCF.3 preferred: lower REML score (3.834), and lower df (2.000).
-----
Model Score Edf Difference Df
1 gam.LCF.2 -149.4032 21
2 gam.LCF.3 -153.2369 19 -3.834 2.000
AIC difference: 1.70, model gam.LCF.3 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(ALC) + s(DRC) + s(NNC) + SylPerMin + pauseBin +
folType + preC + monoMultilingual + s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 18.619 1.86e-05
SylPerMin 1 831.416 < 2e-16
pauseBin 1 40.924 3.17e-10
folType 4 4.911 0.000667
preC 3 2.770 0.040903
monoMultilingual 1 0.347 0.555956
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ALC) 2.383 2.973 4.280 0.00554
s(DRC) 1.562 1.910 1.116 0.39325
s(NNC) 1.633 2.015 3.027 0.04825
s(speaker) 31.383 38.000 5.281 < 2e-16
Step 3
Exclusion of DRC
.
gam.LCF.4 <- bam(sDurLog ~
Affix +
#s(LCF_Comp.1) +
s(ALC) +
#s(ALDC, k=3) +
#s(DRC) +
s(NNC) +
SylPerMin+
pauseBin+
folType+
preC+
monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.LCF.3, gam.LCF.4)
gam.LCF.3: sDurLog ~ Affix + s(ALC) + s(DRC) + s(NNC) + SylPerMin + pauseBin +
folType + preC + monoMultilingual + s(speaker, bs = "re")
gam.LCF.4: sDurLog ~ Affix + s(ALC) + s(NNC) + SylPerMin + pauseBin + folType +
preC + monoMultilingual + s(speaker, bs = "re")
Model gam.LCF.4 preferred: lower REML score (3.158), and lower df (2.000).
-----
Model Score Edf Difference Df
1 gam.LCF.3 -153.2369 19
2 gam.LCF.4 -156.3945 17 -3.158 2.000
AIC difference: 2.49, model gam.LCF.4 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(ALC) + s(NNC) + SylPerMin + pauseBin + folType +
preC + monoMultilingual + s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 21.522 4.29e-06
SylPerMin 1 829.083 < 2e-16
pauseBin 1 41.677 2.21e-10
folType 4 4.840 0.000756
preC 3 2.173 0.090067
monoMultilingual 1 0.378 0.538845
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ALC) 2.653 3.304 5.397 0.000872
s(NNC) 1.326 1.573 4.464 0.012851
s(speaker) 31.367 38.000 5.273 < 2e-16
Step 4
Exclusion of monoMultilingual
.
gam.LCF.5 <- bam(sDurLog ~
Affix +
#s(LCF_Comp.1) +
s(ALC) +
#s(ALDC, k=3) +
#s(DRC) +
s(NNC) +
SylPerMin+
pauseBin+
folType+
preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.LCF.4, gam.LCF.5)
gam.LCF.4: sDurLog ~ Affix + s(ALC) + s(NNC) + SylPerMin + pauseBin + folType +
preC + monoMultilingual + s(speaker, bs = "re")
gam.LCF.5: sDurLog ~ Affix + s(ALC) + s(NNC) + SylPerMin + pauseBin + folType +
preC + s(speaker, bs = "re")
Model gam.LCF.5 preferred: lower REML score (2.056), and lower df (1.000).
-----
Model Score Edf Difference Df
1 gam.LCF.4 -156.3945 17
2 gam.LCF.5 -158.4508 16 -2.056 1.000
AIC difference: 0.37, model gam.LCF.5 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(ALC) + s(NNC) + SylPerMin + pauseBin + folType +
preC + s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 21.459 4.43e-06
SylPerMin 1 844.014 < 2e-16
pauseBin 1 41.531 2.37e-10
folType 4 4.828 0.000771
preC 3 2.174 0.089918
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ALC) 2.657 3.309 5.392 0.000872
s(NNC) 1.338 1.593 4.423 0.012882
s(speaker) 32.088 39.000 5.190 < 2e-16
Step 5
Exclusion of preC
.
gam.LCF.6 <- bam(sDurLog ~
Affix +
#s(LCF_Comp.1) +
s(ALC) +
#s(ALDC, k=3) +
#s(DRC) +
s(NNC) +
SylPerMin+
pauseBin+
folType+
#preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.LCF.5, gam.LCF.6)
gam.LCF.5: sDurLog ~ Affix + s(ALC) + s(NNC) + SylPerMin + pauseBin + folType +
preC + s(speaker, bs = "re")
gam.LCF.6: sDurLog ~ Affix + s(ALC) + s(NNC) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Model gam.LCF.6 preferred: lower REML score (6.212), and lower df (3.000).
-----
Model Score Edf Difference Df
1 gam.LCF.5 -158.4508 16
2 gam.LCF.6 -164.6630 13 -6.212 3.000
AIC difference: 2.93, model gam.LCF.6 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(ALC) + s(NNC) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 23.657 1.47e-06
SylPerMin 1 862.936 < 2e-16
pauseBin 1 41.921 1.96e-10
folType 4 5.055 0.000517
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ALC) 2.762 3.436 4.627 0.00241
s(NNC) 1.000 1.000 7.142 0.00772
s(speaker) 32.011 39.000 5.111 < 2e-16
Step 6
Remove smooth of NNC
.
gam.LCF.7 <- bam(sDurLog ~
Affix +
#s(LCF_Comp.1) +
s(ALC) +
#s(ALDC, k=3) +
#s(DRC) +
NNC +
SylPerMin+
pauseBin+
folType+
#preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.LCF.6, gam.LCF.7)
gam.LCF.6: sDurLog ~ Affix + s(ALC) + s(NNC) + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
gam.LCF.7: sDurLog ~ Affix + s(ALC) + NNC + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Model gam.LCF.7 preferred: lower REML score (3.184), and lower df (1.000).
-----
Model Score Edf Difference Df
1 gam.LCF.6 -164.6630 13
2 gam.LCF.7 -167.8472 12 -3.184 1.000
AIC difference: 0.02, model gam.LCF.7 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + s(ALC) + NNC + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 23.660 1.47e-06
NNC 1 7.145 0.007717
SylPerMin 1 862.936 < 2e-16
pauseBin 1 41.921 1.96e-10
folType 4 5.055 0.000517
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ALC) 2.762 3.436 4.627 0.00241
s(speaker) 32.011 39.000 5.111 < 2e-16
Step 7
Remove smooth of ALC
.
gam.LCF.8 <- bam(sDurLog ~
Affix +
#s(LCF_Comp.1) +
ALC +
#s(ALDC, k=3) +
#s(DRC) +
NNC +
SylPerMin+
pauseBin+
folType+
#preC+
#monoMultilingual+
s(speaker, bs="re"),
data=data, method = "REML")
compareML(gam.LCF.7, gam.LCF.8)
gam.LCF.7: sDurLog ~ Affix + s(ALC) + NNC + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
gam.LCF.8: sDurLog ~ Affix + ALC + NNC + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Model gam.LCF.8 preferred: lower REML score (2.857), and lower df (1.000).
-----
Model Score Edf Difference Df
1 gam.LCF.7 -167.8472 12
2 gam.LCF.8 -170.7037 11 -2.857 1.000
AIC difference: -4.35, model gam.LCF.7 has lower AIC.
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + ALC + NNC + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Parametric Terms:
df F p-value
Affix 1 24.728 8.59e-07
ALC 1 9.188 0.002539
NNC 1 5.866 0.015729
SylPerMin 1 852.004 < 2e-16
pauseBin 1 42.316 1.62e-10
folType 4 5.131 0.000452
Approximate significance of smooth terms:
edf Ref.df F p-value
s(speaker) 31.82 39.00 4.959 <2e-16
Summary
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + ALC + NNC + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.353527 0.215506 -1.640 0.10143
AffixPL -0.091180 0.018336 -4.973 8.59e-07 ***
ALC -2.961153 0.976879 -3.031 0.00254 **
NNC -0.502288 0.207393 -2.422 0.01573 *
SylPerMin -0.502371 0.017211 -29.189 < 2e-16 ***
pauseBinpause 0.114892 0.017662 6.505 1.62e-10 ***
folTypeF -0.034296 0.054918 -0.624 0.53254
folTypeN -0.003295 0.021097 -0.156 0.87594
folTypeP 0.008877 0.018509 0.480 0.63170
folTypeV -0.067817 0.018782 -3.611 0.00033 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(speaker) 31.82 39 4.959 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.805 Deviance explained = 81.7%
-REML = -170.7 Scale est. = 0.029397 n = 653
However, as we only have linear predictors in our final model, doing GAMMs appears to be rather unnecessary. Thus, let’s see what LMER models can do.
10.4 LMER
Let’s see whether a linear mixed effects regression model does a similarly good job as compared to the previous GAMMs model.
Modelling
First, create a model with all relevant variables, including control and PCA variables, and speaker
as random effect:
lmer.1 <- lmer(sDurLog ~
Affix +
LCF_Comp.1 +
ALC +
ALDC +
DRC +
NNC +
SylPerMin+
pauseBin+
folType+
preC+
(1 | monoMultilingual) +
(1 | speaker),
data=data, REML=T)
anova(lmer.1)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Affix 0.6109 0.6109 1 618.06 20.9409 5.727e-06 ***
LCF_Comp.1 0.0076 0.0076 1 609.27 0.2612 0.6094903
ALC 0.2621 0.2621 1 609.61 8.9846 0.0028334 **
ALDC 0.0128 0.0128 1 610.09 0.4389 0.5079133
DRC 0.0396 0.0396 1 603.60 1.3568 0.2445494
NNC 0.0959 0.0959 1 604.22 3.2866 0.0703435 .
SylPerMin 24.2848 24.2848 1 536.13 832.4275 < 2.2e-16 ***
pauseBin 1.1868 1.1868 1 631.01 40.6805 3.467e-10 ***
folType 0.5527 0.1382 4 608.14 4.7359 0.0009061 ***
preC 0.2271 0.0757 3 604.96 2.5952 0.0516470 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using the step() function, we exclude all non-significant predictors:
Backward reduced random-effect table:
Eliminated npar logLik AIC LRT Df Pr(>Chisq)
<none> 19 157.21 -276.41
(1 | monoMultilingual) 1 18 157.21 -278.41 0.0 1 1
(1 | speaker) 0 17 105.46 -176.92 103.5 1 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Backward reduced fixed-effect table:
Degrees of freedom method: Satterthwaite
Eliminated Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
LCF_Comp.1 1 0.0076 0.0076 1 609.27 0.2612 0.6094904
ALDC 2 0.0166 0.0166 1 612.23 0.5697 0.4506751
DRC 3 0.0627 0.0627 1 605.03 2.1512 0.1429746
preC 4 0.2058 0.0686 3 608.03 2.3513 0.0712936 .
Affix 0 0.7270 0.7270 1 625.06 24.7284 8.538e-07 ***
ALC 0 0.2701 0.2701 1 615.70 9.1884 0.0025378 **
NNC 0 0.1724 0.1724 1 611.25 5.8657 0.0157289 *
SylPerMin 0 25.0468 25.0468 1 541.64 852.0041 < 2.2e-16 ***
pauseBin 0 1.2440 1.2440 1 637.49 42.3161 1.571e-10 ***
folType 0 0.6034 0.1508 4 614.43 5.1310 0.0004516 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model found:
sDurLog ~ Affix + ALC + NNC + SylPerMin + pauseBin + folType +
(1 | speaker)
This leaves us with the following final model:
lmer.2 <- lmer(sDurLog ~
Affix +
ALC +
NNC +
SylPerMin +
pauseBin +
folType +
(1 | speaker),
data=data, REML=T)
anova(lmer.2)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Affix 0.7270 0.7270 1 625.06 24.7284 8.538e-07 ***
ALC 0.2701 0.2701 1 615.70 9.1884 0.0025378 **
NNC 0.1724 0.1724 1 611.25 5.8657 0.0157289 *
SylPerMin 25.0468 25.0468 1 541.64 852.0041 < 2.2e-16 ***
pauseBin 1.2440 1.2440 1 637.49 42.3161 1.571e-10 ***
folType 0.6034 0.1508 4 614.43 5.1310 0.0004516 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
LMER vs. GAMM
Let’s compare the results of both, LMER and GAMM, models:
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: sDurLog ~ Affix + ALC + NNC + SylPerMin + pauseBin + folType +
(1 | speaker)
Data: data
REML criterion at convergence: -341.4
Scaled residuals:
Min 1Q Median 3Q Max
-2.6783 -0.6057 -0.0213 0.7076 3.3649
Random effects:
Groups Name Variance Std.Dev.
speaker (Intercept) 0.00892 0.09445
Residual 0.02940 0.17146
Number of obs: 653, groups: speaker, 40
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -0.353527 0.215506 627.518776 -1.640 0.10141
AffixPL -0.091180 0.018336 625.056760 -4.973 8.54e-07 ***
ALC -2.961153 0.976879 615.698014 -3.031 0.00254 **
NNC -0.502288 0.207393 611.245425 -2.422 0.01573 *
SylPerMin -0.502371 0.017211 541.640869 -29.189 < 2e-16 ***
pauseBinpause 0.114892 0.017662 637.486511 6.505 1.57e-10 ***
folTypeF -0.034296 0.054918 618.211537 -0.624 0.53253
folTypeN -0.003295 0.021097 609.693974 -0.156 0.87594
folTypeP 0.008877 0.018509 610.753479 0.480 0.63170
folTypeV -0.067817 0.018782 618.064091 -3.611 0.00033 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) AffxPL ALC NNC SylPrM psBnps flTypF flTypN flTypP
AffixPL -0.568
ALC 0.246 -0.364
NNC -0.975 0.574 -0.294
SylPerMin -0.297 -0.013 0.050 0.104
pauseBinpas -0.153 -0.071 0.038 0.057 0.415
folTypeF -0.042 0.050 0.045 0.022 0.024 0.004
folTypeN -0.013 0.056 0.025 -0.018 -0.031 -0.069 0.136
folTypeP -0.026 0.031 -0.005 0.011 -0.117 -0.095 0.153 0.409
folTypeV 0.058 -0.045 0.021 -0.050 -0.232 -0.228 0.138 0.413 0.489
Family: gaussian
Link function: identity
Formula:
sDurLog ~ Affix + ALC + NNC + SylPerMin + pauseBin + folType +
s(speaker, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.353527 0.215506 -1.640 0.10143
AffixPL -0.091180 0.018336 -4.973 8.59e-07 ***
ALC -2.961153 0.976879 -3.031 0.00254 **
NNC -0.502288 0.207393 -2.422 0.01573 *
SylPerMin -0.502371 0.017211 -29.189 < 2e-16 ***
pauseBinpause 0.114892 0.017662 6.505 1.62e-10 ***
folTypeF -0.034296 0.054918 -0.624 0.53254
folTypeN -0.003295 0.021097 -0.156 0.87594
folTypeP 0.008877 0.018509 0.480 0.63170
folTypeV -0.067817 0.018782 -3.611 0.00033 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(speaker) 31.82 39 4.959 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.805 Deviance explained = 81.7%
-REML = -170.7 Scale est. = 0.029397 n = 653
Interpretation
For ALC
, higher values indicate that a pseudoword’s semantic vector is part of a denser semantic neighbourhood.
For NNC
, higher values indicate that a pseudoword’s semantics are close to a real word’s semantics.
References
Baayen, R. H., Chuang, Y. Y., and Blevins, J. P. (2018). Inflectional morphology with linear mappings. The Mental Lexicon, 13 (2), 232-270. doi.org/10.1075/ml.18010.baa
Baayen, R. H., Chuang, Y. Y., Shafaei-Bajestan, E., & Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, 2019, 1-39. doi.org/10.1155/2019/4895891
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245-276.
Chuang, Y-Y., Vollmer, M-l., Shafaei-Bajestan, E., Gahl, S., Hendrix, P., & Baayen, R. H. (2020). The processing of pseudoword form and meaning in production and comprehension: A computational modeling approach using Linear Discriminative Learning. Behavior Research Methods, 1-51. doi.org/10.3758/s13428-020-01356-w
Nykamp, D. (2020, November 10). Multiplying matrices and vectors. Math Insight. mathinsight.org/matrix_vector_multiplication
Schmitz, D., Baer-Henney, D., & Plag, I. (submitted). The duration of word-final /s/ differs across morphological categories in English: Evidence from pseudowords.
Tucker, B. V., Brenner, D., Danielson, D. K., Kelley, M. C., Nenadic, F., & Sims, M. (2018). The massive auditory lexical decision (MALD) database. Behavior Research Methods, 1-18. doi.org/10.3758/s13428-018-1056-1