This script documents the analysis of ENGS production data with measures of Linear Discriminative Learning (LDL; e.g. Baayen et al., 2018; Chuang et al., 2020). This is a work in progress.

The idea of this LDL implementation is the following:
Networks trained on real word data and their transformation matrices are used to create semantic matrices for pseudowords, following the implementation by Chuang et al. (2020). The underlying assumption is that speakers approach pseudowords with their knowledge of previously encountered words, i.e. their lexicon. Thus, pseudowords are not semantically empty shells but carry some sort of meaning.

The following script includes a reproduction of most steps found in the three previous scripts, here, here, and here. Newly added is the updated pseudoword S matrix creation as well as a more thorough modelling procedure.


Note: In the remainder of this document, the following naming conventions are used:

1 Data

1.1 MALD data

Data on real words is taken from the Massive Auditory Lexical Decision (MALD) Database (Tucker et al., 2018). It is a subset of the complete MALD database, containing 8362 words of which 2120 words contain one of 36 different affixes.

## load real words (MALD)
mald.data <- read.csv("data666/mald.subset.dfr.csv")[2:5]

dim(mald.data)
[1] 8362    4

1.2 ENGS data

Pseudoword data is taken from the ENGS production experiment (Schmitz et al., 2020). In total, 48 pseudowords are contained within this data set, with half of them representing monomorphemic words (e.g. singular bloufs) and half of them representing plural word forms (e.g. plural bloufs). As a number of pseudowords showed more than one consistent pronunciation, these appear more than twice in the data set, e.g. prups.

## load pseudowords (ENGS)
engs.data <- read.csv("data666/final_S_data_LDL_types.csv")

dim(engs.data)
[1] 78  4

1.3 COMB data

The combined data set contains both data on real and on nonce words as introduced in 1.1 and 1.2. In total, 8440 words and 36 different affixes are part of this data set.

## create a combined data set
comb.data <- rbind(mald.data, engs.data)

## sort combined data set alphabetically
comb.data <- comb.data[order(comb.data$Word),]

dim(comb.data)
[1] 8440    4

2 A matrix

We can create an \(A\) matrix for real words using the semantic vectors constructed from the TASA corpus. The \(A\) matrix consists of semantic vectors (columns) for all lexomes in orthographic form (rows). Note: Inflectional and derivational lexomes are represented by abbreviations, e.g. PL for plural.

              meat   mechanic mechanical  mechanism     medal
eat      0.0118417 -0.0001872  0.0002885 -0.0002331  1.98e-05
echo     0.0005983 -0.0000384 -0.0000922 -0.0000899 -2.05e-05
eclipse  0.0000350 -0.0003962 -0.0000451 -0.0000278 -9.70e-06
## load TASA vectors
load("data/wL2Ld.rda")

## find all unique MALD (= real word) lexomes
mald.subset.allLexomes = unique(c(mald.data$Word, mald.data$Affix))
mald.subset.allLexomes = mald.subset.allLexomes[!is.na(mald.subset.allLexomes)]

## find vectors for all unique lexomes > this then is the A matrix
mald.Amat = wL2Ld[mald.subset.allLexomes, ]

3 C matrices

Cue matrices contain all triphones of a data set (columns), binary-coded for all words in phonological form (rows).

        X..b X.b. b.n X.nd nd. d.n X.nm nm. m.n X.nt nt. b.S X.S. X..b.1 X.bI
@b{        1    1   1    1   1   1    1   1   1    1   1   0    0      0    0
@b{S       1    1   0    0   0   0    0   0   0    0   0   1    1      0    0
{bI        0    0   0    0   0   0    0   0   0    0   0   0    0      1    1
{          0    0   0    0   0   0    0   0   0    0   0   0    0      1    0
{bdIk1t    0    0   0    0   0   0    0   0   0    0   0   0    0      1    0
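
To illustrate how such triphone cues can be derived, here is a minimal sketch (assuming "#" as word-boundary symbol; the actual WpmWithLdl implementation may differ): each word form is padded with boundary symbols and split into overlapping trigrams.

## sketch: extract triphone cues from a DISC transcription
triphones <- function(word, boundary = "#") {
  chars <- strsplit(paste0(boundary, word, boundary), "")[[1]]
  sapply(1:(length(chars) - 2),
         function(i) paste(chars[i:(i + 2)], collapse = ""))
}

triphones("@b{S")
## "#@b" "@b{" "b{S" "{S#"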

3.1 real words

The cue matrix for real words \(mald.C\) is computed based on the MALD data set and \(Word+Affix\), i.e. words and affixes are considered.

mald.cues = make_cue_matrix(data = mald.data, formula = ~ Word + Affix, grams = 3, wordform = "DISC")

dim(mald.cues$matrices$C)
# 8362 7610

3.2 pseudowords

The cue matrix for pseudowords \(engs.C\) is computed based on the ENGS data set and \(Word+Affix\), i.e. words and affixes are considered.

engs.cues = make_cue_matrix(data = engs.data, formula = ~ Word + Affix, grams = 3, wordform = "DISC")

dim(engs.cues$matrices$C)
# 78 78

3.3 combined

The cue matrix for real and nonce words \(comb.C\) is computed based on the combined data set and \(Word+Affix\), i.e. words and affixes are considered.

comb.cues = make_cue_matrix(data = comb.data, formula = ~ Word + Affix, grams = 3, wordform = "DISC")

dim(comb.cues$matrices$C)
# 8440 7610

4 S matrices

Semantic matrices contain semantic vectors (columns) for all word forms of a data set in phonological form (rows).

             meat   mechanic mechanical  mechanism     medal
it      0.0118417 -0.0001872  0.0002885 -0.0002331  1.98e-05
Ek5     0.0005983 -0.0000384 -0.0000922 -0.0000899 -2.05e-05
IklIps  0.0000350 -0.0003962 -0.0000451 -0.0000278 -9.70e-06

4.1 real words

The semantic matrix for real words \(mald.S\) is computed based on the MALD data set, the MALD \(A\) matrix, and \(Word+Affix\), i.e. words and affixes are considered.

mald.Smat = make_S_matrix(data = mald.data, formula = ~ Word + Affix, inputA = mald.Amat, grams = 3, wordform = "DISC")

dim(mald.Smat)
# 8362 5487

4.2 pseudowords

We can create a semantic matrix for pseudowords \(engs.S\) based on the semantics of real words solving the following equation

\[engs.Smat = engs.C * F\] with \(engs.Smat\) being the semantic matrix for pseudowords, \(engs.C\) being the cue matrix of pseudowords, and \(F\) being the transformation matrix for mapping real word cues onto real word semantics.

To obtain this real word transformation matrix \(F\) we first need to learn comprehension of real words.

mald.comp = learn_comprehension(cue_obj = mald.cues, S = mald.Smat)

Then, we can extract \(F\) from mald.comp.

F = mald.comp$F

dim(F) 
# 7610 5487

The following toy example illustrates the use of the \(F\) transformation matrix in mapping \(C\) onto \(S\), i.e. solving \[mald.C * F = predicted.mald.Smat\] The lexicon of this toy example includes the words cat, bus, and eel. Thus, its cue matrix \(C\) looks as follows:

For the same toy lexicon, suppose that the semantic vectors for these three words are the row vectors of the following semantic matrix \(S\):

Then, we are interested in a transformation matrix \(F\), such that

\[C*F=S\] The transformation matrix \(F\) is straightforward to obtain. Let \(C'\) denote the Moore-Penrose generalized inverse of \(C\). Then,

\[F=C'S\]

Thus, for the toy lexicon example,

with \(CF\) being exactly equal to \(S\) in this simple toy example.
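
In R, this toy computation can be sketched with the Moore-Penrose generalized inverse provided by the MASS package. The matrix values below are made up, as the toy matrices themselves are not shown here, and Fmat is used instead of F to avoid clashing with R's shorthand for FALSE:

library(MASS)  # ginv()

## toy cue matrix C (rows: words, columns: cues)
C <- diag(3)
rownames(C) <- c("cat", "bus", "eel")

## toy semantic matrix S (rows: words, columns: semantic dimensions)
S <- matrix(rnorm(3 * 4), nrow = 3,
            dimnames = list(c("cat", "bus", "eel"), NULL))

## transformation matrix F = C'S, with C' the generalized inverse of C
Fmat <- ginv(C) %*% S

## in this simple case, C %*% Fmat reproduces S exactly
all.equal(C %*% Fmat, S, check.attributes = FALSE)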

Additionally, for matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the second matrix (e.g. Nykamp, 2020). While this is easily achieved, one must also consider the order of columns, as the following example illustrates:

\[ \left(\begin{array}{cc} A & B\\ C & D \end{array}\right) \left(\begin{array}{cc} \alpha & \beta \\ \gamma & \delta \end{array}\right) = \left(\begin{array}{cc} A\alpha+B\gamma & A\beta+B\delta \\ C\alpha+D\gamma & C\beta+D\delta \end{array}\right) \] Let the first matrix be a simplified version of \(C_{pseudowords}\) and the second matrix a simplified version of \(F\). The resulting matrix then is a simplified version of \(\hat{S}\). However, if the order of columns (i.e. triphones) in \(C_{realwords}\), from which \(F\) is derived, differs from the order of columns (i.e. triphones) in \(C_{pseudowords}\), multiplication is still possible, yet the wrong columns and rows are multiplied, as shown in the following example:

\[ \left(\begin{array}{cc} B & A \\ D & C \end{array}\right) \left(\begin{array}{cc} \alpha & \beta \\ \gamma & \delta \end{array}\right) = \left(\begin{array}{cc} B\alpha+A\gamma & B\beta+A\delta \\ D\alpha+C\gamma & D\beta+C\delta \end{array}\right) \]

Thus, we must make sure to use a pseudoword cue matrix \(C_{pseudowords}\) with the same order of columns as the real word cue matrix \(C_{realwords}\). This, in turn, also expands the pseudoword cue matrix \(C_{pseudowords}\): to contain all column names of the real word cue matrix \(C_{realwords}\), it must have an identical number of columns.

Using two functions of the LDLConvFunctions package, this is easily done. First, find which column in the real word cue matrix \(C_{realwords}\) corresponds to which column in the pseudoword cue matrix \(C_{pseudowords}\):

found_triphones <- find_triphones(pseudo_C_matrix = engs.cues, real_C_matrix = mald.cues)

Then, use this information to create a new pseudoword cue matrix \(C\) which a) has the same number of columns as the real word cue matrix, and b) the same order of columns (i.e. triphones) as the real word cue matrix:

pseudo.C.new <- reorder_pseudo_C_matrix(pseudo_C_matrix = engs.cues, real_C_matrix = mald.cues, found_triphones = found_triphones)
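
Conceptually, the reordering amounts to building a zero matrix with the real word column set and filling in the pseudoword cues by column name. A base R sketch of the idea (not the LDLConvFunctions code):

## sketch: expand and reorder a pseudoword C matrix to the real word column set
expand_C <- function(pseudo_C, real_colnames) {
  out <- matrix(0, nrow = nrow(pseudo_C), ncol = length(real_colnames),
                dimnames = list(rownames(pseudo_C), real_colnames))
  shared <- intersect(colnames(pseudo_C), real_colnames)
  out[, shared] <- pseudo_C[, shared]
  out
}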

Finally, we can solve the aforementioned equation \(engs.Smat = engs.C * F\).

engs.Smat = pseudo.C.new %*% F

dim(engs.Smat)
# 78 5487

# change rownames to pseudoword rownames (instead of numbers)
rownames(engs.Smat) <- engs.data$DISC

In this semantic matrix, 48 pseudowords (i.e. 78 pronunciations) and their semantic vectors are contained. However, all of our pseudoword pairs are homophones; thus, they show identical semantic vectors even though half of them contain a suffix, e.g. the semantic vectors for bloufs (non-morphemic S) and bloufs (plural S) are identical.

          fierce    fifteen fifteenth      fifth      fifty
bl6fs -0.0001453  0.0002496 0.0002782  0.0006957 -0.0001303
bl6fs -0.0001453  0.0002496 0.0002782  0.0006957 -0.0001303
bl6ks  0.0000021 -0.0006015 0.0000090 -0.0012144  0.0003275
bl6ks  0.0000021 -0.0006015 0.0000090 -0.0012144  0.0003275

We “fix” this by adding the “PL” semantic vector values to the semantic vectors of plural pseudowords (idea by Yu-Ying).

Thus, we first need to extract the “PL” semantic vector contained in the general \(A\) matrix.

PL.vec <- mald.Amat[rownames(mald.Amat) == "PL", ]

length(PL.vec)
# 5487

Taking a closer look at the pseudoword semantic matrix \(engs.S\), we find that every second row corresponds to a plural pseudoword, i.e. this is where we add the “PL” vector.

engs.pl = engs.Smat[seq(2, nrow(engs.Smat), 2), ]

Taking this plural subset, we add the “PL” semantic vector to all rows.

engs.pl.added <- engs.pl + rep(PL.vec, each = nrow(engs.pl))
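
The rep() call exploits R's column-major matrix storage: with each = nrow(engs.pl), element \(j\) of PL.vec is repeated once per row before the addition moves on to column \(j+1\), so every row of engs.pl receives the full "PL" vector. A perhaps more transparent equivalent uses sweep():

## equivalent: add PL.vec to every row of the plural subset
engs.pl.added <- sweep(engs.pl, 2, PL.vec, "+")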

We then recombine the original monomorphemic semantic matrix and the modified plural semantic matrix, and recreate the original row order.

engs.nm = engs.Smat[seq(1, nrow(engs.Smat), 2), ]

## keep a copy of the original matrix to restore its row order later
engs.Smat.old <- engs.Smat

engs.Smat <- rbind(engs.nm, engs.pl.added)

sort.rows <- row.names(engs.Smat.old)
engs.Smat <- engs.Smat[order(match(rownames(engs.Smat), sort.rows)), , drop = FALSE]

Now, homophonous pseudowords have differing semantic vectors.

          fierce    fifteen fifteenth      fifth      fifty
bl6fs -0.0001453  0.0002496 0.0002782  0.0006957 -0.0001303
bl6fs -0.0000796  0.0001372 0.0003686  0.0003187 -0.0002868
bl6ks  0.0000021 -0.0006015 0.0000090 -0.0012144  0.0003275
bl6ks  0.0000678 -0.0007139 0.0000994 -0.0015913  0.0001709

4.3 combined

Finally, we can create one combined semantic matrix \(comb.S\) for real and nonce words.

comb.Smat <- rbind(mald.Smat, engs.Smat)

However, WpmWithLdl functions in R are sensitive to row order. Thus, we need to recreate the original row order as found in the combined cue matrix \(comb.C\).

all(rownames(comb.cues$matrices$C) == rownames(comb.Smat))
# FALSE

comb.Smat2 <- comb.Smat[rownames(comb.cues$matrices$C),,drop=FALSE]

all(rownames(comb.cues$matrices$C) == rownames(comb.Smat2))
# TRUE

5 Comprehension

Using \(comb.C\) and \(comb.S\), we can now learn and evaluate comprehension of real and nonce words in one go, i.e. as if they were encountered by a speaker with their individual lexicon.

First, comprehension is learned.

comb.comp = learn_comprehension(cue_obj = comb.cues, S = comb.Smat2)

Second, the accuracy of this comprehension is evaluated. That is, how well form (cue matrix) can be mapped onto meaning (semantic matrix). The result of this mapping, i.e. a predicted semantic matrix, is obtained as \(\hat{S}\).

comb.comp_acc = accuracy_comprehension(m = comb.comp, data = comb.data, wordform = "DISC", show_rank = TRUE, neighborhood_density = TRUE)

comb.comp_acc$acc # 0.7440758

We find that comprehension accuracy is about \(74\%\).

6 Production

Using \(comb.C\) and \(comb.S\) as well as the comprehension results, we now are able to learn production and evaluate its accuracy.

First, production is learned.

comb.prod = learn_production(cue_obj = comb.cues, S = comb.Smat2, comp = comb.comp)

Second, the accuracy of this production is evaluated. That is, how well meaning (semantic matrix) can be mapped onto form (cue matrix). The result of this mapping, i.e. a predicted cue matrix, is obtained as \(\hat{C}\).
Note: \(C\) and \(\hat{C}\) are sometimes referred to as \(T\) and \(\hat{T}\) when talking about production (i.e. \(T\) as in target matrix).

comb.prod_acc = accuracy_production(m = comb.prod, data = comb.data, wordform = "DISC", full_results = TRUE, return_triphone_supports = TRUE)

comb.prod_acc$acc # 0.9727488

We find that production accuracy is about \(97\%\).

7 Measures

7.1 Comprehension

Extracting and saving comprehension measures is straightforward.

comb.comp_measures = comprehension_measures(comp_acc = comb.comp_acc, Shat = comb.comp$Shat, S = comb.Smat2, A = mald.Amat, affixFunctions = comb.data$Affix)
write.csv(comb.comp_measures, "comb.comp_measures.csv")

A brief explanation for each measure is given here:

cor_max

The maximum of correlations between a given word’s predicted semantics \(\hat{s}\) and all other words’ semantics in \(comb.Smat\). For this, correlations between a word’s predicted semantic vector \(\hat{s}\) and all semantic vectors of \(comb.Smat\) are computed. Then, the highest of the correlation values is taken as cor_max.
Higher values of this variable indicate a close semantic neighbour, while lower values indicate a larger distance to a word’s nearest semantic neighbour.
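
A minimal sketch of this computation for the \(i\)-th word, assuming comb.comp$Shat as used in the comprehension_measures() call above (the WpmWithLdl internals may differ):

## sketch: cor_max for the i-th word
cors    <- cor(comb.comp$Shat[i, ], t(comb.Smat2))  # 1 x n correlation matrix
cor_max <- max(cors)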

cor_target

cor_target describes the correlation between a word’s predicted semantic vector \(\hat{s}\) in \(comb.\hat{S}\) and its targeted semantic vector \(s\) in the \(comb.S\).
Higher values indicate a good mapping from form to meaning, i.e. a higher semantic certainty.

l1norm

The l1norm is the sum of the absolute values of vector elements of a given word’s predicted semantic vector \(\hat{s}\), i.e. its city-block distance.
Higher values imply more and stronger links to other lexomes. Thus, this measure may be interpreted as semantic activation diversity.

l2norm

The l2norm is the square root of the sum of the squared values of a given word's predicted vector \(\hat{s}\), i.e. its Euclidean length.
Higher values imply more and stronger links to other lexomes. Thus, this measure may be interpreted as semantic activation diversity.
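
Both norms are easy to compute directly for one predicted vector (a sketch; \(i\) is a row index into \(\hat{S}\)):

## sketch: l1norm and l2norm of a predicted semantic vector
s_hat  <- comb.comp$Shat[i, ]
l1norm <- sum(abs(s_hat))     # city-block length
l2norm <- sqrt(sum(s_hat^2))  # Euclidean length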

rank

Given the correlation values of a word's predicted semantic vector \(\hat{s}\) and its eight nearest neighbours' semantic vectors \(s_{n1}...s_{n8}\), ordinal values from \(1\) to \(1+n\) are assigned, with a value of \(1\) given to the word with the highest correlation value. That is, if a word's predicted semantic vector \(\hat{s}\) is most highly correlated with the targeted word's semantic vector \(s\), its rank will be \(1\). If a word's predicted semantic vector \(\hat{s}\) is most highly correlated with any word other than the target, its rank will be \(1+n\).
Thus, a value of \(1\) indicates that a predicted semantic vector is closest to its targeted semantic vector, while values of \(1+n\) indicate a mismatch, i.e. a comprehension failure. This measure can be taken as a goodness of comprehension indicator.

density

The correlation values of a word’s predicted semantic vector \(\hat{s}\) and its eight nearest neighbours’ semantic vectors \(s_{n1}...s_{n8}\) are taken into consideration. The mean of these eight correlation values is given in density.
Thus, higher values indicate a denser semantic neighbourhood.
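
Reusing the correlations from the cor_max sketch above, density can be sketched as:

## sketch: mean correlation with the eight nearest neighbours
## (whether the target itself is excluded depends on the implementation)
density <- mean(sort(cors, decreasing = TRUE)[1:8])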

cor_affix

This variable describes the correlation of a word’s estimated semantic vector \(\hat{s}\) with the vector of its affix \(s_{affix}\) in \(S\). That is, only words containing affixes are ascribed a cor_affix value.
Higher values indicate a higher correlation of estimated word semantics and affix semantics, thus acting as a measure of semantic transparency.

7.2 Production

Extracting and saving production measures runs into problems. It appears that there is one word for which comb.prod_acc does not contain all necessary information. Why? I have no clue.

I worked around this problem by removing the word from full, i.e. the list in which all necessary info is contained within comb.prod_acc.

full <- comb.prod_acc[["full"]]

Using the summary() function, the case with missing info is identified. I call the resulting data frame nineortwelve, as complete cases consist of 12 entries per word, while incomplete ones consist of only 9.

nineortwelve <- as.data.frame(summary(full))

Checking this data frame, one entry is found with only 9 individual entries instead of 12. Thus, it is removed from a copy of full, named full2.

full2 <- full

full2[[3725]] <- NULL

Fun fact: entry 3725 belongs to the word inauguration.

This reduced full2 then replaces the original full in a copy of comb.prod_acc.

comb.prod_acc2 <- comb.prod_acc

comb.prod_acc2$full <- full2

Only then production measures can be obtained.

comb.prod_measures = production_measures(prod_acc = comb.prod_acc2, S = comb.Smat2, prod = comb.prod)
write.csv(comb.prod_measures, "comb.prod_measures.csv")

A brief explanation for each measure is given here:

correlations

Given all candidate forms of a word, i.e. all word forms found as candidate productions by the production model, and their estimated semantic vectors \(\hat{s}_{candidate1}...\hat{s}_{candidate1+n}\), correlation values between these estimated semantic vectors and a word's semantic vector \(s\) are computed. The highest of these correlation values is taken as the value of correlations.
Similar to rank for comprehension, this may be interpreted as a measure of goodness of production.

path_counts

path_counts describes the number of paths detected for the production of a word by the production model.
path_counts may be interpreted as a measure of phonological activation diversity, as higher values indicate the existence of multiple candidates (and thus paths) in production.

path_sum

path_sum describes the summed support of paths for a predicted form.
path_sum may be interpreted as a measure of phonological certainty, with higher values indicating a higher certainty in the candidate form.

path_entropies

path_entropies contains the Shannon entropy values which are calculated over the path supports of the predicted form in \(\hat{C}\).
Thus, path_entropies may be interpreted as a measure of phonological uncertainty, with higher values indicating a higher level of disorder, i.e. uncertainty.
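
A sketch of the entropy computation over one word's vector of path supports (supports is a hypothetical numeric vector here; the package may use a different log base):

## sketch: Shannon entropy over path supports
p <- supports / sum(supports)
H <- -sum(p * log2(p))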

lwlr

The length-weakest link ratio, lwlr, is computed by taking the number of path nodes divided by the value of the weakest link of that path. Higher values of lwlr indicate less support for a predicted form \(\hat{c}\).
Thus, lwlr may be interpreted as a measure of phonological uncertainty, with higher values indicating less certainty.
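
As a one-line sketch (path_nodes and link_supports are hypothetical stand-ins for one path's nodes and link supports):

## sketch: length-weakest link ratio
lwlr <- length(path_nodes) / min(link_supports)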

7.3 LDLConvFunctions

A third set of measures consists of the variables introduced by Chuang et al. (2020). To extract these measures, we can use the LDLConvFunctions package.

# remove 'inauguration' from data set 
comb.data2 <- subset(comb.data, Word != "inauguration")

# load package
library(LDLConvFunctions)

# compute measures
ALCframe <- ALC(pseudo_S_matrix = engs.Smat, real_S_matrix = mald.Smat, pseudo_word_data = engs.data)

ALDCframe <- ALDC(prod_acc = comb.prod_acc2, data = comb.data2)

EDNNframe <- EDNN(comprehension = comb.comp, data = comb.data)

NNCframe <- NNC(pseudo_S_matrix = engs.Smat, real_S_matrix = comb.Smat2, pseudo_word_data = engs.data, real_word_data = mald.data)

SCPPframe <- SCPP(prod_measures = comb.prod_measures, data = comb.data2)

A brief explanation for each measure is given here:

ALC

The Average Lexical Correlation is the mean value of all correlation values of a pseudoword’s estimated semantic vector as contained in \(engs.S\) with each of the real word semantic vectors contained in \(mald.S\).
Higher ALC values indicate that a pseudoword’s semantics are part of a denser semantic neighbourhood. Thus, ALC may be interpreted as a measure of semantic activation diversity for pseudowords.
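
A sketch of the computation for the \(i\)-th pseudoword (the LDLConvFunctions internals may differ):

## sketch: ALC for the i-th pseudoword
ALC <- mean(cor(engs.Smat[i, ], t(mald.Smat)))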

ALDC

The Average Levenshtein Distance of all Candidate productions is the mean of all Levenshtein distances of a word and its candidate forms. That is, for a word with only one candidate form, the Levenshtein distance between that word and its candidate form is its ALDC. For words with multiple candidates, the mean of the individual Levenshtein distances between candidates and targeted form constitutes the ALDC.
Thus, higher values indicate that a word’s candidate forms are very different from the intended pronunciation. ALDC may be interpreted as a measure of phonological neighbourhood density, i.e. large values indicate sparse neighbourhoods.

EDNN

This variable describes the Euclidean Distance between a word's estimated semantic vector \(\hat{s}\) and its Nearest semantic Neighbour.
Thus, higher values indicate a larger distance to the nearest semantic neighbour. EDNN may be regarded as a measure of semantic neighbourhood density.

NNC

The Nearest Neighbour Correlation is computed by taking a pseudoword’s estimated semantic vector as given in \(engs.S\) and checking it for the highest correlation value against all real word semantic vectors as given in \(mald.S\). This highest correlation value is taken as NNC value.
Thus, higher values indicate that a pseudoword is semantically close to a real word. Additionally, one can tell which real word a pseudoword’s semantics are closest to. This measure may be interpreted as certainty in pseudoword semantics (?).
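
A sketch, reusing the real word correlations from the ALC sketch above:

## sketch: NNC and the nearest real word neighbour
cors <- cor(engs.Smat[i, ], t(mald.Smat))
NNC  <- max(cors)
colnames(cors)[which.max(cors)]  # the real word this pseudoword is closest to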

SCPP

The Semantic Correlation of Predicted Production describes the highest correlation between the estimated semantic vector \(\hat{s}\) of a pseudoword and any of the semantic vectors \(\hat{s}_{candidate1}...\hat{s}_{candidate1+n}\) generated from that word's candidate forms. Thus, this measure is identical to the correlations measure of the WpmWithLdl package introduced above.

8 Analyses

In total, three separate analyses are presented. First, an analysis based on the WpmWithLdl measures is conducted. Second, a similar analysis with LDLConvFunctions measures is done. Third, all measures are combined for a final analysis.

The data set used in the remainder of this script contains the following parts:

  • ENGS production data (Schmitz et al., 2020)
  • LDL measures as extracted by the WpmWithLdl package
  • LDL measures as extracted by the LDLConvFunctions package

The data set contains 52 columns and 653 rows, i.e. 653 observations of 52 variables.

9 WpmWithLdl measures

WpmWithLdl includes the following measures via the comprehension_measures and production_measures functions (see Sections 7.1 and 7.2 above).

Without discussing the underlying implications of the individual measures, let’s see whether some of them are correlated. If so, we need to take some measures before including them in any models.

9.1 Correlations

Correlations of all WpmWithLdl measures:

pairscor.fnc(data[,c("cor_max", "cor_target", "l1norm", "l2norm", "density", "cor_affix",
                     "correlations", "path_counts", "path_sum", "path_entropies", "lwlr")])

Note: rank is not considered as a predictive variable as its value is \(1\) for all observations. Affix is not correlated (\(-0.08<\rho<0.16\)) with any of the included measures.

We find the following highly correlated (\(\rho\ge0.5\)) variable combinations:

  • cor_max vs. cor_target, \(\rho=1\)
  • cor_max vs. correlations, \(\rho=1\)
  • cor_target vs. correlations, \(\rho=1\)
  • l1norm vs. l2norm, \(\rho=0.94\)
  • density vs. cor_affix, \(\rho=0.84\)
  • path_counts vs. path_entropies, \(\rho=0.95\)
  • path_sum vs. lwlr, \(\rho=-0.76\)

Thus, we need to take some measures before we run into problems of collinearity later on.

9.2 PCA

To avoid collinearity issues when modelling, we combine all highly correlated variables in a principal component analysis.

PCA Computation

First, we extract the highly correlated variables and merge them to form a new data frame.

## extract the highly correlated variables
## (cor_affix is dropped as it only contains data on PL cases)
pca.data <- data[, c("cor_max", "cor_target", "correlations",
                     "l1norm", "l2norm", "density",
                     "path_counts", "path_entropies",
                     "path_sum", "lwlr")]

Then, we compute a principal component analysis using the princomp function:

pc <- princomp(pca.data, cor=TRUE, score=TRUE)

In order to decide which components to keep, we follow three criteria:

  1. The Eigenvalue-criterion: Any component that displays an eigenvalue greater than 1.00 is accounting for a greater amount of variance than had been contributed by one variable. Such a component is therefore accounting for a meaningful amount of variance, and is worthy of being retained.
  2. The scree-criterion: Plot the eigenvalues associated with each component and look for a “break” between the components with relatively large eigenvalues and those with small eigenvalues. The components that appear before the break are assumed to be meaningful and are retained (Cattell, 1966).
  3. The proportion-of-variance-criterion: Retain enough components so that the cumulative percent of variance accounted for is equal to some minimal value, i.e. 80 percent.
# load package to extract Eigenvalues
library("factoextra")

# extract Eigenvalues
get_eigenvalue(pc)
         eigenvalue variance.percent cumulative.variance.percent
Dim.1  3.155143e+00     3.155143e+01                    31.55143
Dim.2  2.782508e+00     2.782508e+01                    59.37650
Dim.3  2.002288e+00     2.002288e+01                    79.39938
Dim.4  1.111486e+00     1.111486e+01                    90.51424
Dim.5  6.368246e-01     6.368246e+00                    96.88248
Dim.6  2.192438e-01     2.192438e+00                    99.07492
Dim.7  6.284442e-02     6.284442e-01                    99.70337
Dim.8  2.966341e-02     2.966341e-01                   100.00000
Dim.9  7.170136e-17     7.170136e-16                   100.00000
Dim.10 0.000000e+00     0.000000e+00                   100.00000
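
For the scree-criterion, a scree plot can be drawn directly from the princomp object, e.g. with factoextra:

## scree plot for criterion 2
fviz_eig(pc, addlabels = TRUE)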

Taking all three criteria into consideration, we find that Components 1, 2, and 3 show an Eigenvalue of \(v\ge1\), are followed by a break in Eigenvalues with the Eigenvalue of Component 3 being \(v=2\) and the Eigenvalue of Component 4 being \(v=1.1\), and that the cumulative percentage of variance accounted for by all three components is \(σ^2=79.5\%\), i.e. close to our target level of \(σ^2=80\%\).

Thus, we add components 1, 2, and 3 to our data frame.

pcframe <- as.data.frame(pc$scores)
pcframe <- pcframe[1:3]

names(pcframe)[names(pcframe) == "Comp.1"] <- "WWL_Comp.1"
names(pcframe)[names(pcframe) == "Comp.2"] <- "WWL_Comp.2"
names(pcframe)[names(pcframe) == "Comp.3"] <- "WWL_Comp.3"

data <- cbind(data, pcframe)

PCA Loadings

Now we can take a closer look at the individual components and their loadings.

pc$loadings

Loadings:
               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
cor_max         0.462  0.339                                            0.349
cor_target      0.462  0.339                                            0.465
correlations    0.462  0.339                                           -0.814
l1norm          0.211 -0.179 -0.579  0.179 -0.247        -0.509  0.484       
l2norm          0.213 -0.170 -0.592        -0.242 -0.158  0.501 -0.483       
density               -0.246         0.682  0.665  0.120                     
path_counts     0.296 -0.413        -0.396  0.283        -0.505 -0.499       
path_entropies  0.279 -0.408        -0.465  0.190  0.131  0.474  0.510       
path_sum       -0.208  0.297 -0.377 -0.272  0.516 -0.609         0.119       
lwlr            0.229 -0.328  0.403  0.219 -0.224 -0.755                     
               Comp.10
cor_max         0.738 
cor_target     -0.671 
correlations          
l1norm                
l2norm                
density               
path_counts           
path_entropies        
path_sum              
lwlr                  

               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
SS loadings       1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0
Proportion Var    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1
Cumulative Var    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
               Comp.10
SS loadings        1.0
Proportion Var     0.1
Cumulative Var     1.0

But what do these loadings conceptually mean?

Component 1

Component 1 is rather highly correlated with cor_max, cor_target, and correlations.

For cor_max, higher values indicate a close semantic neighbour.

For cor_target, higher values indicate a good mapping from form to meaning, i.e. a higher semantic certainty.

For correlations, higher values indicate a higher goodness of production by means of fitting semantics.

Note: What is correlations again? Given all candidate forms of a word, i.e. all word forms found as candidate productions by the production model, and their estimated semantic vectors \(\hat{s}_{candidate1}...\hat{s}_{candidate1+n}\), correlation values between these estimated semantic vectors and a word's semantic vector \(s\) are computed. The highest of these correlation values is taken as the value of correlations.

Thus, for Component 1, high values indicate higher semantic certainty, while lower values indicate higher semantic uncertainty.

Component 2

Component 2 is rather highly negatively correlated with path_counts and path_entropies.

For path_counts, higher values indicate the existence of multiple candidates (and thus paths) in production. This measure may be interpreted as a measure of phonological activation diversity.

For path_entropies, higher values indicate a higher level of disorder, i.e. uncertainty.

In sum, Component 2 reflects the certainty in production, i.e. high values indicate low certainty.

Component 3

Component 3 is rather highly negatively correlated with l1norm and l2norm.

For l1norm, higher values imply more strong links to many other lexomes.

For l2norm, higher values imply more strong links to many other lexomes.

Both, l1norm and l2norm, may be interpreted as measures of semantic activation diversity.

In sum, Component 3 reflects semantic activation diversity, i.e. high component values indicate low semantic activation diversity.

9.3 GAMM

Let’s see whether the PCA components are significant predictors for word-final /s/ duration in singular and plural pseudowords. We will use Generalized Additive Mixed Models (GAMM) to model /s/ durations.

Modelling

Step 0

First, create a model with all relevant variables, including control and PCA variables, and speaker as random effect:

## load packages for GAMMs and model comparison
library(mgcv)     # bam()
library(itsadug)  # compareML()

gam.WpmWithLdl.1 <- bam(sDurLog ~
                          Affix +
                          s(WWL_Comp.1) +
                          s(WWL_Comp.2) +
                          s(WWL_Comp.3) +
                          SylPerMin+
                          pauseBin+
                          folType+
                          preC+
                          monoMultilingual+
                          s(speaker, bs="re"),
                        data=data, method = "REML")

Checking the anova() function’s output, we now exclude non-significant variables in a stepwise fashion. After each potential exclusion, compareML() is used to check whether the exclusion is justified in terms of model fit.

anova(gam.WpmWithLdl.1)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ Affix + s(WWL_Comp.1) + s(WWL_Comp.2) + s(WWL_Comp.3) + 
    SylPerMin + pauseBin + folType + preC + monoMultilingual + 
    s(speaker, bs = "re")

Parametric Terms:
                 df       F  p-value
Affix             1  30.020 6.29e-08
SylPerMin         1 817.699  < 2e-16
pauseBin          1  39.214 7.22e-10
folType           4   4.915 0.000662
preC              3   1.658 0.174939
monoMultilingual  1   0.452 0.501848

Approximate significance of smooth terms:
                 edf Ref.df     F  p-value
s(WWL_Comp.1)  1.000  1.001 0.012 0.914079
s(WWL_Comp.2)  3.105  3.943 1.424 0.222711
s(WWL_Comp.3)  3.182  3.926 5.524 0.000316
s(speaker)    31.041 38.000 4.992  < 2e-16

Step 1

Exclusion of WWL_Comp.1.

gam.WpmWithLdl.2 <- bam(sDurLog ~
                          Affix +
                          #s(WWL_Comp.1) +
                          s(WWL_Comp.2) +
                          s(WWL_Comp.3) +
                          SylPerMin+
                          pauseBin+
                          folType+
                          preC+
                          monoMultilingual+
                          s(speaker, bs="re"),
                        data=data, method = "REML")

compareML(gam.WpmWithLdl.1, gam.WpmWithLdl.2)
gam.WpmWithLdl.1: sDurLog ~ Affix + s(WWL_Comp.1) + s(WWL_Comp.2) + s(WWL_Comp.3) + 
    SylPerMin + pauseBin + folType + preC + monoMultilingual + 
    s(speaker, bs = "re")

gam.WpmWithLdl.2: sDurLog ~ Affix + s(WWL_Comp.2) + s(WWL_Comp.3) + SylPerMin + 
    pauseBin + folType + preC + monoMultilingual + s(speaker, 
    bs = "re")

Model gam.WpmWithLdl.2 preferred: lower REML score (3.083), and lower df (2.000).
-----
             Model     Score Edf Difference    Df
1 gam.WpmWithLdl.1 -148.2247  19                 
2 gam.WpmWithLdl.2 -151.3075  17     -3.083 2.000

AIC difference: 4.61, model gam.WpmWithLdl.2 has lower AIC.
anova(gam.WpmWithLdl.2)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ Affix + s(WWL_Comp.2) + s(WWL_Comp.3) + SylPerMin + 
    pauseBin + folType + preC + monoMultilingual + s(speaker, 
    bs = "re")

Parametric Terms:
                 df       F  p-value
Affix             1  28.318 1.45e-07
SylPerMin         1 819.008  < 2e-16
pauseBin          1  38.521 1.00e-09
folType           4   4.645  0.00106
preC              3   1.439  0.23027
monoMultilingual  1   0.430  0.51207

Approximate significance of smooth terms:
                 edf Ref.df     F  p-value
s(WWL_Comp.2)  1.000  1.001 2.644 0.104453
s(WWL_Comp.3)  3.093  3.891 4.927 0.000684
s(speaker)    31.068 38.000 5.007  < 2e-16

Step 2

Exclusion of WWL_Comp.2.

gam.WpmWithLdl.3 <- bam(sDurLog ~
                          Affix +
                          #s(WWL_Comp.1) +
                          #s(WWL_Comp.2) +
                          s(WWL_Comp.3) +
                          SylPerMin+
                          pauseBin+
                          folType+
                          preC+
                          monoMultilingual+
                          s(speaker, bs="re"),
                        data=data, method = "REML")

compareML(gam.WpmWithLdl.2, gam.WpmWithLdl.3)
gam.WpmWithLdl.2: sDurLog ~ Affix + s(WWL_Comp.2) + s(WWL_Comp.3) + SylPerMin + 
    pauseBin + folType + preC + monoMultilingual + s(speaker, 
    bs = "re")

gam.WpmWithLdl.3: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType + 
    preC + monoMultilingual + s(speaker, bs = "re")

Model gam.WpmWithLdl.3 preferred: lower REML score (2.707), and lower df (2.000).
-----
             Model     Score Edf Difference    Df
1 gam.WpmWithLdl.2 -151.3075  17                 
2 gam.WpmWithLdl.3 -154.0144  15     -2.707 2.000

AIC difference: -1.38, model gam.WpmWithLdl.2 has lower AIC.
anova(gam.WpmWithLdl.3)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType + 
    preC + monoMultilingual + s(speaker, bs = "re")

Parametric Terms:
                 df       F  p-value
Affix             1  26.856 2.99e-07
SylPerMin         1 816.208  < 2e-16
pauseBin          1  38.939 8.21e-10
folType           4   4.850 0.000742
preC              3   1.368 0.251565
monoMultilingual  1   0.445 0.505057

Approximate significance of smooth terms:
                 edf Ref.df     F  p-value
s(WWL_Comp.3)  3.185  4.002 4.819 0.000775
s(speaker)    30.943 38.000 4.915  < 2e-16

Step 3

Exclusion of monoMultilingual.

gam.WpmWithLdl.4 <- bam(sDurLog ~
                          Affix +
                          #s(WWL_Comp.1) +
                          #s(WWL_Comp.2) +
                          s(WWL_Comp.3) +
                          SylPerMin+
                          pauseBin+
                          folType+
                          preC+
                          #monoMultilingual+
                          s(speaker, bs="re"),
                        data=data, method = "REML")

compareML(gam.WpmWithLdl.3, gam.WpmWithLdl.4)
gam.WpmWithLdl.3: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType + 
    preC + monoMultilingual + s(speaker, bs = "re")

gam.WpmWithLdl.4: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType + 
    preC + s(speaker, bs = "re")

Model gam.WpmWithLdl.4 preferred: lower REML score (2.044), and lower df (1.000).
-----
             Model     Score Edf Difference    Df
1 gam.WpmWithLdl.3 -154.0144  15                 
2 gam.WpmWithLdl.4 -156.0587  14     -2.044 1.000

AIC difference: 0.33, model gam.WpmWithLdl.4 has lower AIC.
anova(gam.WpmWithLdl.4)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType + 
    preC + s(speaker, bs = "re")

Parametric Terms:
          df       F  p-value
Affix      1  26.825 3.03e-07
SylPerMin  1 831.268  < 2e-16
pauseBin   1  38.776 8.88e-10
folType    4   4.838 0.000757
preC       3   1.353 0.256172

Approximate significance of smooth terms:
                 edf Ref.df     F  p-value
s(WWL_Comp.3)  3.191  4.009 4.829 0.000752
s(speaker)    31.659 39.000 4.842  < 2e-16

Step 4

Exclusion of preC.

gam.WpmWithLdl.5 <- bam(sDurLog ~
                          Affix +
                          #s(WWL_Comp.1) +
                          #s(WWL_Comp.2) +
                          s(WWL_Comp.3) +
                          SylPerMin+
                          pauseBin+
                          folType+
                          #preC+
                          #monoMultilingual+
                          s(speaker, bs="re"),
                        data=data, method = "REML")

compareML(gam.WpmWithLdl.4, gam.WpmWithLdl.5)
gam.WpmWithLdl.4: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType + 
    preC + s(speaker, bs = "re")

gam.WpmWithLdl.5: sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType + 
    s(speaker, bs = "re")

Model gam.WpmWithLdl.5 preferred: lower REML score (7.281), and lower df (3.000).
-----
             Model     Score Edf Difference    Df
1 gam.WpmWithLdl.4 -156.0587  14                 
2 gam.WpmWithLdl.5 -163.3395  11     -7.281 3.000

AIC difference: 1.40, model gam.WpmWithLdl.5 has lower AIC.
anova(gam.WpmWithLdl.5)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType + 
    s(speaker, bs = "re")

Parametric Terms:
          df       F  p-value
Affix      1  26.473 3.61e-07
SylPerMin  1 850.839  < 2e-16
pauseBin   1  39.885 5.19e-10
folType    4   5.143 0.000442

Approximate significance of smooth terms:
                 edf Ref.df     F p-value
s(WWL_Comp.3)  3.151  3.964 4.636 0.00106
s(speaker)    31.592 39.000 4.785 < 2e-16

Summary

summary.gam(gam.WpmWithLdl.5)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ Affix + s(WWL_Comp.3) + SylPerMin + pauseBin + folType + 
    s(speaker, bs = "re")

Parametric coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -0.8722959  0.0473099 -18.438  < 2e-16 ***
AffixPL       -0.0754263  0.0146596  -5.145 3.61e-07 ***
SylPerMin     -0.5032766  0.0172537 -29.169  < 2e-16 ***
pauseBinpause  0.1122957  0.0177810   6.315 5.19e-10 ***
folTypeF      -0.0185203  0.0550083  -0.337 0.736472    
folTypeN       0.0002509  0.0211516   0.012 0.990540    
folTypeP       0.0095421  0.0185635   0.514 0.607421    
folTypeV      -0.0672956  0.0188033  -3.579 0.000372 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                 edf Ref.df     F p-value    
s(WWL_Comp.3)  3.151  3.964 4.636 0.00106 ** 
s(speaker)    31.592 39.000 4.785 < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.804   Deviance explained = 81.7%
-REML = -163.34  Scale est. = 0.029512  n = 653
[Figure: partial effect of WWL_Comp.3 on sDurLog, plotted across 30 values from -3.51 to 3.83; other predictors held constant (Affix = PL, SylPerMin = 2.34, pauseBin = no_pause, folType = APP), random effect s(speaker) excluded.]
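
A plot like the one summarised above can presumably be produced with itsadug's plot_smooth (the exact call is not shown in the source):

## partial effect of WWL_Comp.3, random effects removed
plot_smooth(gam.WpmWithLdl.5, view = "WWL_Comp.3", rm.ranef = TRUE)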

As there are only a few data points to the far left, but a clearly linear slope to the right for the majority of data points, these results may indicate a rather linear relationship. That is, higher WWL_Comp.3 values lead to longer /s/ durations, i.e. a lower semantic activation diversity leads to longer /s/ durations.

9.4 LMER

Let’s see whether a linear mixed effects regression model does a similarly good job as compared to the previous GAMM model.

Modelling

First, create a model with all relevant variables, including control and PCA variables, and speaker and monoMultilingual as random effects:

## load package for mixed models (lmerTest extends lme4 with Satterthwaite tests)
library(lmerTest)

lmer.1 <- lmer(sDurLog ~
                 Affix +
                 WWL_Comp.1 +
                 WWL_Comp.2 +
                 WWL_Comp.3 +
                 SylPerMin +
                 pauseBin +
                 folType +
                 preC +
                 (1 | monoMultilingual) +
                 (1 | speaker),
               data=data, REML=T)

anova(lmer.1)
Type III Analysis of Variance Table with Satterthwaite's method
            Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
Affix       0.8772  0.8772     1 627.56  29.3486 8.626e-08 ***
WWL_Comp.1  0.0094  0.0094     1 611.01   0.3131  0.575983    
WWL_Comp.2  0.0724  0.0724     1 610.91   2.4219  0.120166    
WWL_Comp.3  0.2908  0.2908     1 607.50   9.7290  0.001900 ** 
SylPerMin  24.3084 24.3084     1 524.79 813.2841 < 2.2e-16 ***
pauseBin    1.1917  1.1917     1 634.01  39.8705 5.105e-10 ***
folType     0.5508  0.1377     4 610.60   4.6074  0.001135 ** 
preC        0.1804  0.0601     3 607.23   2.0114  0.111149    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Using the step() function, we exclude all non-significant predictors:

step(lmer.1)
Backward reduced random-effect table:

                       Eliminated npar  logLik     AIC    LRT Df Pr(>Chisq)    
<none>                              17 144.931 -255.86                         
(1 | monoMultilingual)          1   16 144.931 -257.86  0.000  1          1    
(1 | speaker)                   0   15  96.682 -163.37 96.498  1     <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Backward reduced fixed-effect table:
Degrees of freedom method: Satterthwaite 

           Eliminated  Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
WWL_Comp.1          1  0.0094  0.0094     1 611.01   0.3131 0.5759828    
preC                2  0.1711  0.0570     3 608.37   1.9105 0.1266333    
WWL_Comp.2          3  0.0708  0.0708     1 616.28   2.3597 0.1250230    
Affix               0  0.8281  0.8281     1 633.30  27.5107 2.133e-07 ***
WWL_Comp.3          0  0.2137  0.2137     1 612.02   7.0983 0.0079192 ** 
SylPerMin           0 24.9558 24.9558     1 532.92 829.1163 < 2.2e-16 ***
pauseBin            0  1.2666  1.2666     1 639.57  42.0816 1.754e-10 ***
folType             0  0.6187  0.1547     4 615.86   5.1387 0.0004454 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model found:
sDurLog ~ Affix + WWL_Comp.3 + SylPerMin + pauseBin + folType + 
    (1 | speaker)

This leaves us with the following final model:

lmer.2 <- lmer(sDurLog ~
                 Affix +
                 WWL_Comp.3 +
                 SylPerMin +
                 pauseBin +
                 folType +
                 (1 | speaker),
               data=data, REML=T)

anova(lmer.2)
Type III Analysis of Variance Table with Satterthwaite's method
            Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
Affix       0.8281  0.8281     1 633.30  27.5107 2.133e-07 ***
WWL_Comp.3  0.2137  0.2137     1 612.02   7.0983 0.0079192 ** 
SylPerMin  24.9558 24.9558     1 532.92 829.1163 < 2.2e-16 ***
pauseBin    1.2666  1.2666     1 639.57  42.0816 1.754e-10 ***
folType     0.6187  0.1547     4 615.86   5.1387 0.0004454 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Trimming

Trimming appears to be a good idea according to the residual plots. However, for some unknown reason, the trimming procedure cannot be shown in R Markdown. Thus, the detailed procedure is skipped here. Trimming results in the removal of \(n=9\) (i.e. 1.4%) data points.
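
For reference, a common trimming approach (a sketch; the exact procedure used here is not shown) removes data points whose standardized residuals exceed 2.5 standard deviations:

## sketch: trim data points with |standardized residual| > 2.5
res       <- resid(lmer.2)
keep      <- abs(res - mean(res)) / sd(res) < 2.5
data.trim <- data[keep, ]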

lmer.final <- lmer(sDurLog ~
                     Affix +
                     WWL_Comp.3 +
                     SylPerMin +
                     pauseBin +
                     folType +
                     (1 | speaker),
                   data=data.trim, REML=T)

anova(lmer.final)
Type III Analysis of Variance Table with Satterthwaite's method
            Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
Affix       0.6726  0.6726     1 621.87  25.0318 7.349e-07 ***
WWL_Comp.3  0.2631  0.2631     1 602.56   9.7893  0.001840 ** 
SylPerMin  25.1170 25.1170     1 542.38 934.7171 < 2.2e-16 ***
pauseBin    1.0618  1.0618     1 627.72  39.5127 6.102e-10 ***
folType     0.5285  0.1321     4 605.72   4.9174  0.000659 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lmer.final)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: sDurLog ~ Affix + WWL_Comp.3 + SylPerMin + pauseBin + folType +  
    (1 | speaker)
   Data: data.trim

REML criterion at convergence: -379.9

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.62309 -0.65373 -0.00777  0.68864  2.36020 

Random effects:
 Groups   Name        Variance Std.Dev.
 speaker  (Intercept) 0.008885 0.09426 
 Residual             0.026871 0.16392 
Number of obs: 644, groups:  speaker, 40

Fixed effects:
                Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)    -0.832181   0.046440 389.439546 -17.920  < 2e-16 ***
AffixPL        -0.070555   0.014102 621.865569  -5.003 7.35e-07 ***
WWL_Comp.3      0.014533   0.004645 602.558403   3.129 0.001840 ** 
SylPerMin      -0.519794   0.017002 542.380437 -30.573  < 2e-16 ***
pauseBinpause   0.107248   0.017062 627.724677   6.286 6.10e-10 ***
folTypeF       -0.026878   0.052474 608.948469  -0.512 0.608684    
folTypeN        0.003267   0.020308 600.757647   0.161 0.872248    
folTypeP        0.010356   0.017887 602.370095   0.579 0.562842    
folTypeV       -0.061733   0.018072 609.657388  -3.416 0.000678 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) AffxPL WWL_C. SylPrM psBnps flTypF flTypN flTypP
AffixPL     -0.073                                                 
WWL_Comp.3   0.106  0.081                                          
SylPerMin   -0.892 -0.091 -0.121                                   
pauseBinpas -0.450 -0.131 -0.115  0.422                            
folTypeF    -0.083  0.060 -0.021  0.017  0.001                     
folTypeN    -0.124  0.098  0.025 -0.044 -0.076  0.136              
folTypeP    -0.066  0.032 -0.016 -0.121 -0.094  0.153  0.409       
folTypeV     0.062 -0.015  0.024 -0.244 -0.230  0.139  0.415  0.489

9.5 LMER vs. GAMM

Let’s compare the results of both the LMER and the GAMM model.

As we have seen already, the GAMM model (ignoring the left-hand side with its few data points) predicts higher sDurLog values for higher values of WWL_Comp.3 (see the partial-effect plot in Section 9.3).


Taking into account the low number of data points, we checked whether an LMER model comes to a similar result.

Indeed, the LMER model predicts higher sDurLog values for higher values of WWL_Comp.3 as well.

Thus, both models indicate that the duration of /s/ increases with higher values of WWL_Comp.3, i.e. with lower semantic activation diversity (i.e. lower values of l1norm and l2norm).

10 LDLConvFunctions measures

LDLConvFunctions includes the following measures as implemented in Chuang et al. (2020) (see Section 7.3 above).

Without discussing the underlying implications of the individual measures, let’s see whether some of them are correlated. If so, we need to take some measures before including them in any models.

10.1 Correlations

Correlations of all LDLConvFunctions measures and Affix:

pairscor.fnc(data[,c("Affix", "ALC", "ALDC", "EDNN", "NNC", "SCPP")])

We find the following highly correlated (\(\rho\ge0.5\)) variable combinations:

  • EDNN vs. SCPP, \(\rho=-0.86\)
  • Affix vs. NNC, \(r_{pb}=-0.55\) (point-biserial correlation)

Thus, we need to take some measures before we run into problems of collinearity later on.

10.2 PCA

To avoid collinearity issues when modelling, we combine all highly correlated variables in a principal component analysis. However, as we must include Affix as a categorical variable, we cannot use the princomp function. Instead, we make use of the PCAmix function of the PCAmixdata package. This function uses multiple correspondence analysis (MCA) to include categorical data in PCAs.

PCA Computation

First, we extract the highly correlated variables and merge them to form two new data frames: one for numerical data, one for categorical data.

## numerical data
pca.data1 <- data[, c("EDNN", "SCPP", "NNC")]

## categorical data
pca.data2 <- data[, "Affix", drop = FALSE]

Then, we compute a principal component analysis using the PCAmixdata function:

library(PCAmixdata)

res.pcamix <- PCAmix(X.quanti = pca.data1, X.quali = pca.data2, rename.level = TRUE, graph = FALSE)

In order to decide which components to keep, we follow three criteria:

  1. The Eigenvalue-criterion: Any component that displays an eigenvalue greater than 1.00 is accounting for a greater amount of variance than had been contributed by one variable. Such a component is therefore accounting for a meaningful amount of variance, and is worthy of being retained.
  2. The scree-criterion: Plot the eigenvalues associated with each component and look for a “break” between the components with relatively large eigenvalues and those with small eigenvalues. The components that appear before the break are assumed to be meaningful and are retained (Cattell, 1966).
  3. The proportion-of-variance-criterion: Retain enough components so that the cumulative percent of variance accounted for is equal to some minimal value, i.e. 80 percent.
# extract Eigenvalues
res.pcamix$eig
      Eigenvalue Proportion Cumulative
dim 1  1.8785047  46.962617   46.96262
dim 2  1.5394156  38.485390   85.44801
dim 3  0.4485290  11.213226   96.66123
dim 4  0.1335507   3.338767  100.00000

Taking all three criteria into consideration, we find that Components 1 and 2 show an Eigenvalue of \(v\ge1\), are followed by a break in Eigenvalues with the Eigenvalue of Component 2 being \(v=1.5\) and the Eigenvalue of Component 3 being \(v=0.4\), and that the cumulative percentage of variance accounted for by Components 1 and 2 is \(σ^2=85\%\), i.e. above our target level of \(σ^2=80\%\).

Thus, we add components 1 and 2 to our data frame.

pcframe <- as.data.frame(res.pcamix$scores)
pcframe <- pcframe[1:2]


names(pcframe)[names(pcframe) == "dim 1"] <- "LCF_Comp.1b"
names(pcframe)[names(pcframe) == "dim 2"] <- "LCF_Comp.2b"


data <- cbind(data, pcframe)

PCA Loadings

Now we can take a closer look at the components’ loadings.

res.pcamix$sqload
           dim 1      dim 2       dim 3       dim 4
EDNN  0.90763095 0.02444891 0.001449271 0.066470876
SCPP  0.88329422 0.04403071 0.008175566 0.064499500
NNC   0.06501819 0.71454314 0.218583906 0.001854766
Affix 0.02256131 0.75639285 0.220320282 0.000725555

But what do these loadings conceptually mean?

Component 1

Component 1 is rather highly correlated with EDNN and SCPP.

Following the correlation circle, we find that Component 1 is positively correlated with EDNN, while it is negatively correlated with SCPP.

For EDNN, higher values indicate a larger distance to the nearest semantic neighbour. Thus, EDNN may be regarded as a measure of semantic activation diversity.

For SCPP, higher values indicate a higher goodness of production by means of fitting semantics.

Note: SCPP is identical to the correlations measure of the WpmWithLdl package.

Thus, for Component 1, high values indicate less semantic activation diversity, while lower values indicate more semantic activation diversity.

Component 2

Component 2 is rather highly correlated with Affix and NNC.

Following the correlation circle, we find that Component 2 is positively correlated with NNC.

For NNC, higher values indicate a close real word semantic neighbour, i.e. a probably denser neighbourhood/higher activation diversity.


As Affix is a categorical variable, its interpretation is somewhat different. According to Chavent et al. (2017), coordinates on the level map are barycenters of PC scores. That is, we get some sort of an idea of how the levels of Affix behave. However, we cannot tell much more (I think).

Thus, for Component 2, high values indicate a close real word semantic neighbour, while also containing some sort of information on Affix.

10.3 GAMM

Let’s see whether the PCA components and LCF measures are significant predictors for word-final /s/ duration in singular and plural pseudowords. We will use Generalized Additive Mixed Models (GAMM) to model /s/ durations.

Modelling

Step 0

First, create a model with all relevant variables, including control and PCA variables, and speaker as random effect:

gam.LCF.1 <- bam(sDurLog ~
                   s(LCF_Comp.1b) +
                   s(LCF_Comp.2b) +
                   s(ALC) +
                   s(ALDC, k=3) +
                   SylPerMin+
                   pauseBin+
                   folType+
                   preC+
                   monoMultilingual+
                   s(speaker, bs="re"),
                 data=data, method = "REML")

Checking the anova() function’s output, we now exclude non-significant variables in a stepwise fashion. After each potential exclusion, compareML() is used to check whether the exclusion is justified in terms of model fit.

anova(gam.LCF.1)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ s(LCF_Comp.1b) + s(LCF_Comp.2b) + s(ALC) + s(ALDC, 
    k = 3) + SylPerMin + pauseBin + folType + preC + monoMultilingual + 
    s(speaker, bs = "re")

Parametric Terms:
                 df       F  p-value
SylPerMin         1 814.815  < 2e-16
pauseBin          1  41.425  2.5e-10
folType           4   4.840 0.000755
preC              3   1.452 0.226538
monoMultilingual  1   0.406 0.524330

Approximate significance of smooth terms:
                  edf Ref.df     F  p-value
s(LCF_Comp.1b)  1.000  1.000 0.827 0.363355
s(LCF_Comp.2b)  2.572  3.103 6.102 0.000392
s(ALC)          2.487  3.108 5.307 0.001105
s(ALDC)         1.000  1.000 0.201 0.654484
s(speaker)     31.317 38.000 5.207  < 2e-16

Step 1

Exclusion of ALDC.

gam.LCF.2 <- bam(sDurLog ~
                   s(LCF_Comp.1b) +
                   s(LCF_Comp.2b) +
                   s(ALC) +
                   #s(ALDC, k=3) +
                   SylPerMin+
                   pauseBin+
                   folType+
                   preC+
                   monoMultilingual+
                   s(speaker, bs="re"),
                 data=data, method = "REML")

compareML(gam.LCF.1, gam.LCF.2)
gam.LCF.1: sDurLog ~ s(LCF_Comp.1b) + s(LCF_Comp.2b) + s(ALC) + s(ALDC, 
    k = 3) + SylPerMin + pauseBin + folType + preC + monoMultilingual + 
    s(speaker, bs = "re")

gam.LCF.2: sDurLog ~ s(LCF_Comp.1b) + s(LCF_Comp.2b) + s(ALC) + SylPerMin + 
    pauseBin + folType + preC + monoMultilingual + s(speaker, 
    bs = "re")

Model gam.LCF.2 preferred: lower REML score (3.850), and lower df (2.000).
-----
      Model     Score Edf Difference    Df
1 gam.LCF.1 -147.7378  20                 
2 gam.LCF.2 -151.5880  18     -3.850 2.000

AIC difference: 2.01, model gam.LCF.2 has lower AIC.
anova(gam.LCF.2)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ s(LCF_Comp.1b) + s(LCF_Comp.2b) + s(ALC) + SylPerMin + 
    pauseBin + folType + preC + monoMultilingual + s(speaker, 
    bs = "re")

Parametric Terms:
                 df       F  p-value
SylPerMin         1 818.173  < 2e-16
pauseBin          1  41.391 2.54e-10
folType           4   4.855 0.000736
preC              3   1.390 0.244925
monoMultilingual  1   0.408 0.523430

Approximate significance of smooth terms:
                  edf Ref.df     F  p-value
s(LCF_Comp.1b)  1.000  1.001 0.973 0.324097
s(LCF_Comp.2b)  2.585  3.120 6.474 0.000228
s(ALC)          2.660  3.310 5.008 0.001355
s(speaker)     31.329 38.000 5.218  < 2e-16

Step 2

Exclusion of LCF_Comp.1b.

gam.LCF.3 <- bam(sDurLog ~
                   #s(LCF_Comp.1b) +
                   s(LCF_Comp.2b) +
                   s(ALC) +
                   #s(ALDC, k=3) +
                   SylPerMin+
                   pauseBin+
                   folType+
                   preC+
                   monoMultilingual+
                   s(speaker, bs="re"),
                 data=data, method = "REML")

compareML(gam.LCF.2, gam.LCF.3)
gam.LCF.2: sDurLog ~ s(LCF_Comp.1b) + s(LCF_Comp.2b) + s(ALC) + SylPerMin + 
    pauseBin + folType + preC + monoMultilingual + s(speaker, 
    bs = "re")

gam.LCF.3: sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType + 
    preC + monoMultilingual + s(speaker, bs = "re")

Model gam.LCF.3 preferred: lower REML score (3.495), and lower df (2.000).
-----
      Model     Score Edf Difference    Df
1 gam.LCF.2 -151.5880  18                 
2 gam.LCF.3 -155.0833  16     -3.495 2.000

AIC difference: -0.07, model gam.LCF.2 has lower AIC.
anova(gam.LCF.3)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType + 
    preC + monoMultilingual + s(speaker, bs = "re")

Parametric Terms:
                 df       F  p-value
SylPerMin         1 817.979  < 2e-16
pauseBin          1  41.930 1.96e-10
folType           4   5.011 0.000559
preC              3   1.630 0.181189
monoMultilingual  1   0.410 0.522315

Approximate significance of smooth terms:
                  edf Ref.df     F  p-value
s(LCF_Comp.2b)  2.604  3.149 6.112 0.000367
s(ALC)          2.674  3.326 5.429 0.000701
s(speaker)     31.265 38.000 5.178  < 2e-16

Step 3

Exclusion of monoMultilingual.

gam.LCF.4 <- bam(sDurLog ~
                   #s(LCF_Comp.1b) +
                   s(LCF_Comp.2b) +
                   s(ALC) +
                   #s(ALDC, k=3) +
                   SylPerMin+
                   pauseBin+
                   folType+
                   preC+
                   #monoMultilingual+
                   s(speaker, bs="re"),
                 data=data, method = "REML")

compareML(gam.LCF.3, gam.LCF.4)
gam.LCF.3: sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType + 
    preC + monoMultilingual + s(speaker, bs = "re")

gam.LCF.4: sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType + 
    preC + s(speaker, bs = "re")

Model gam.LCF.4 preferred: lower REML score (2.044), and lower df (1.000).
-----
      Model     Score Edf Difference    Df
1 gam.LCF.3 -155.0833  16                 
2 gam.LCF.4 -157.1269  15     -2.044 1.000

AIC difference: 0.28, model gam.LCF.4 has lower AIC.
anova(gam.LCF.4)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType + 
    preC + s(speaker, bs = "re")

Parametric Terms:
          df       F  p-value
SylPerMin  1 832.927  < 2e-16
pauseBin   1  41.771 2.11e-10
folType    4   4.998 0.000572
preC       3   1.627 0.181921

Approximate significance of smooth terms:
                  edf Ref.df     F  p-value
s(LCF_Comp.2b)  2.608  3.154 6.111 0.000366
s(ALC)          2.677  3.330 5.428 0.000700
s(speaker)     31.986 39.000 5.097  < 2e-16

Step 4

Exclusion of preC.

gam.LCF.5 <- bam(sDurLog ~
                   #s(LCF_Comp.1b) +
                   s(LCF_Comp.2b) +
                   s(ALC) +
                   #s(ALDC, k=3) +
                   SylPerMin+
                   pauseBin+
                   folType+
                   #preC+
                   #monoMultilingual+
                   s(speaker, bs="re"),
                 data=data, method = "REML")

compareML(gam.LCF.4, gam.LCF.5)
gam.LCF.4: sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType + 
    preC + s(speaker, bs = "re")

gam.LCF.5: sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType + 
    s(speaker, bs = "re")

Model gam.LCF.5 preferred: lower REML score (6.892), and lower df (3.000).
-----
      Model     Score Edf Difference    Df
1 gam.LCF.4 -157.1269  15                 
2 gam.LCF.5 -164.0189  12     -6.892 3.000

AIC difference: 0.55, model gam.LCF.5 has lower AIC.
anova(gam.LCF.5)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType + 
    s(speaker, bs = "re")

Parametric Terms:
          df       F  p-value
SylPerMin  1 851.499  < 2e-16
pauseBin   1  42.215  1.7e-10
folType    4   5.227 0.000382

Approximate significance of smooth terms:
                  edf Ref.df     F  p-value
s(LCF_Comp.2b)  2.503  3.016 6.280 0.000331
s(ALC)          2.731  3.392 4.996 0.001534
s(speaker)     31.932 39.000 5.044  < 2e-16

Summary

summary.gam(gam.LCF.5)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ s(LCF_Comp.2b) + s(ALC) + SylPerMin + pauseBin + folType + 
    s(speaker, bs = "re")

Parametric coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -0.9169296  0.0472791 -19.394  < 2e-16 ***
SylPerMin     -0.5017847  0.0171959 -29.180  < 2e-16 ***
pauseBinpause  0.1144248  0.0176111   6.497  1.7e-10 ***
folTypeF      -0.0272402  0.0548033  -0.497 0.619331    
folTypeN       0.0009119  0.0210794   0.043 0.965510    
folTypeP       0.0092602  0.0184728   0.501 0.616350    
folTypeV      -0.0675720  0.0187290  -3.608 0.000334 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                  edf Ref.df     F  p-value    
s(LCF_Comp.2b)  2.503  3.016 6.280 0.000331 ***
s(ALC)          2.731  3.392 4.996 0.001534 ** 
s(speaker)     31.932 39.000 5.044  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.806   Deviance explained = 81.9%
-REML = -164.02  Scale est. = 0.029191  n = 653
Summary:
    * SylPerMin : numeric predictor; set to the value(s): 2.336339423. 
    * pauseBin : factor; set to the value(s): no_pause. 
    * folType : factor; set to the value(s): APP. 
    * LCF_Comp.2b : numeric predictor; set to the value(s): -0.433978743638555. 
    * ALC : numeric predictor; with 30 values ranging from -0.013297 to 0.019239. 
    * speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.) 
    * NOTE : The following random effects columns are canceled: s(speaker)
 

The model predicts a rather unclear picture for the influence of ALC. That is, lower ALC values appear to lead to longer /s/ durations, while beyond a certain point ALC no longer shows a clear effect.


Summary:
    * SylPerMin : numeric predictor; set to the value(s): 2.336339423. 
    * pauseBin : factor; set to the value(s): no_pause. 
    * folType : factor; set to the value(s): APP. 
    * LCF_Comp.2b : numeric predictor; with 30 values ranging from -3.977563 to 1.311103. 
    * ALC : numeric predictor; set to the value(s): 0.008344338. 
    * speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.) 
    * NOTE : The following random effects columns are canceled: s(speaker)
 

LCF_Comp.2b also shows a rather unclear picture. While overall higher values appear to lead to longer /s/ durations, values between roughly \(-1\) and \(0.5\) lead to shorter /s/ durations, and values above \(0.5\) again lead to longer /s/ durations.

Given this unclear picture, LMER models appear to be a justified alternative.

10.4 LMER

Let’s see whether a linear mixed effects regression model does a similarly good job as compared to the previous GAMM model.
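
The LMER models below are fitted with lmer() as provided by the lmerTest package, which also supplies the Satterthwaite-based anova() and step() methods used in this section; the package is assumed to have been loaded earlier.

library(lmerTest)  ## lmer() with Satterthwaite p-values, step()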

Modelling

First, create a model with all relevant variables, including control and PCA variables, and speaker and monoMultilingual as random effects:

lmer.1 <- lmer(sDurLog ~
                 LCF_Comp.1b +
                 LCF_Comp.2b +
                 ALC +
                 ALDC +
                 SylPerMin+
                 pauseBin+
                 folType+
                 preC+
                 (1 | monoMultilingual) +
                 (1 | speaker),
               data=data, REML=T)

anova(lmer.1)
Type III Analysis of Variance Table with Satterthwaite's method
             Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
LCF_Comp.1b  0.0001  0.0001     1 612.09   0.0048 0.9446279    
LCF_Comp.2b  0.1756  0.1756     1 621.73   5.8536 0.0158308 *  
ALC          0.7639  0.7639     1 611.99  25.4697 5.934e-07 ***
ALDC         0.0785  0.0785     1 610.34   2.6172 0.1062261    
SylPerMin   23.9691 23.9691     1 537.11 799.1533 < 2.2e-16 ***
pauseBin     1.2166  1.2166     1 633.28  40.5615 3.663e-10 ***
folType      0.6448  0.1612     4 609.93   5.3742 0.0002942 ***
preC         0.1676  0.0559     3 607.22   1.8631 0.1346015    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Using the step() function, we exclude all non-significant predictors:

step(lmer.1)
Backward reduced random-effect table:

                       Eliminated npar  logLik     AIC    LRT Df Pr(>Chisq)    
<none>                              17 149.046 -264.09                         
(1 | monoMultilingual)          1   16 149.046 -266.09   0.00  1          1    
(1 | speaker)                   0   15  98.682 -167.36 100.73  1     <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Backward reduced fixed-effect table:
Degrees of freedom method: Satterthwaite 

            Eliminated  Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
LCF_Comp.1b          1  0.0001  0.0001     1 612.09   0.0048  0.944628    
preC                 2  0.1710  0.0570     3 608.42   1.9033  0.127817    
ALDC                 3  0.0519  0.0519     1 614.45   1.7242  0.189640    
LCF_Comp.2b          0  0.2468  0.2468     1 626.78   8.1856  0.004363 ** 
ALC                  0  0.6714  0.6714     1 619.08  22.2688 2.933e-06 ***
SylPerMin            0 24.7776 24.7776     1 540.90 821.8333 < 2.2e-16 ***
pauseBin             0  1.2308  1.2308     1 638.75  40.8227 3.213e-10 ***
folType              0  0.6826  0.1707     4 615.30   5.6603  0.000177 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model found:
sDurLog ~ LCF_Comp.2b + ALC + SylPerMin + pauseBin + folType + 
    (1 | speaker)

This leaves us with the following final model:

lmer.2 <- lmer(sDurLog ~ 
                 LCF_Comp.2b + 
                 ALC + 
                 SylPerMin + 
                 pauseBin + 
                 folType + 
                 (1 | speaker),
               data=data, REML=T)

anova(lmer.2)
Type III Analysis of Variance Table with Satterthwaite's method
             Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
LCF_Comp.2b  0.2468  0.2468     1 626.78   8.1856  0.004363 ** 
ALC          0.6714  0.6714     1 619.08  22.2688 2.933e-06 ***
SylPerMin   24.7776 24.7776     1 540.90 821.8333 < 2.2e-16 ***
pauseBin     1.2308  1.2308     1 638.75  40.8227 3.213e-10 ***
folType      0.6826  0.1707     4 615.30   5.6603  0.000177 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Trimming

Trimming appears to be a good idea according to the residual plots. However, for some unknown reason, the trimming procedure cannot be shown in R Markdown. Thus, the detailed procedure is skipped here. Trimming results in the removal of \(n=10\) (i.e. 1.5%) data points.
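
A minimal sketch of what such a trimming step may look like is given below; the 2.5 SD cutoff is an assumption for illustration, as the actual criterion is not rendered here.

## hypothetical trimming sketch: keep only data points whose
## standardized residuals lie within 2.5 SD (assumed cutoff)
data.trim <- data[abs(scale(resid(lmer.2)))[, 1] < 2.5, ]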

lmer.final <- lmer(sDurLog ~ 
                     LCF_Comp.2b + 
                     ALC + 
                     SylPerMin + 
                     pauseBin + 
                     folType + 
                     (1 | speaker),
                   data=data.trim, REML=T)

anova(lmer.final)
Type III Analysis of Variance Table with Satterthwaite's method
             Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
LCF_Comp.2b  0.1368  0.1368     1 615.43   5.1128  0.024098 *  
ALC          0.6190  0.6190     1 607.32  23.1385 1.904e-06 ***
SylPerMin   24.8506 24.8506     1 546.76 928.9257 < 2.2e-16 ***
pauseBin     0.9860  0.9860     1 626.43  36.8587 2.202e-09 ***
folType      0.6255  0.1564     4 604.27   5.8449  0.000128 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lmer.final)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: sDurLog ~ LCF_Comp.2b + ALC + SylPerMin + pauseBin + folType +  
    (1 | speaker)
   Data: data.trim

REML criterion at convergence: -390

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.57729 -0.69695 -0.00471  0.68707  2.62291 

Random effects:
 Groups   Name        Variance Std.Dev.
 speaker  (Intercept) 0.009019 0.09497 
 Residual             0.026752 0.16356 
Number of obs: 643, groups:  speaker, 40

Fixed effects:
                Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)    -0.842383   0.047433 406.509041 -17.760  < 2e-16 ***
LCF_Comp.2b     0.012681   0.005608 615.429693   2.261 0.024098 *  
ALC            -4.196156   0.872336 607.321860  -4.810  1.9e-06 ***
SylPerMin      -0.517208   0.016970 546.763337 -30.478  < 2e-16 ***
pauseBinpause   0.102981   0.016962 626.430656   6.071  2.2e-09 ***
folTypeF       -0.027539   0.052405 607.773113  -0.526 0.599426    
folTypeN        0.004413   0.020288 599.542730   0.218 0.827886    
folTypeP        0.014218   0.017876 601.661616   0.795 0.426715    
folTypeV       -0.065205   0.018070 607.789600  -3.609 0.000333 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) LCF_C. ALC    SylPrM psBnps flTypF flTypN flTypP
LCF_Comp.2b -0.153                                                 
ALC         -0.213  0.113                                          
SylPerMin   -0.896  0.145  0.076                                   
pauseBinpas -0.466  0.139  0.042  0.421                            
folTypeF    -0.082 -0.030  0.064  0.018  0.004                     
folTypeN    -0.114 -0.080  0.035 -0.043 -0.070  0.137              
folTypeP    -0.056 -0.025  0.011 -0.129 -0.094  0.153  0.413       
folTypeV     0.053  0.000  0.002 -0.240 -0.229  0.142  0.420  0.495

10.5 LMER vs. GAMM

Let’s compare the results of both the LMER and the GAMM model.

As we have seen already, the GAMM model apparently predicts longer /s/ durations for low ALC values, while higher ALC values show almost no influence on /s/ duration:

Summary:
    * SylPerMin : numeric predictor; set to the value(s): 2.336339423. 
    * pauseBin : factor; set to the value(s): no_pause. 
    * folType : factor; set to the value(s): APP. 
    * LCF_Comp.2b : numeric predictor; set to the value(s): -0.433978743638555. 
    * ALC : numeric predictor; with 30 values ranging from -0.013297 to 0.019239. 
    * speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.) 
    * NOTE : The following random effects columns are canceled: s(speaker)
 

Taking into account the unclear picture, we checked whether an LMER model comes to a similar result:

The LMER model predicts shorter /s/ durations for higher ALC values. Taking into account the low edf-value (\(edf=2.731\)) of the ALC smooth term in the GAMM model, one should prefer the LMER result (I think).


For LCF_Comp.2b, the GAMM model again shows an unclear picture.

Summary:
    * SylPerMin : numeric predictor; set to the value(s): 2.336339423. 
    * pauseBin : factor; set to the value(s): no_pause. 
    * folType : factor; set to the value(s): APP. 
    * LCF_Comp.2b : numeric predictor; with 30 values ranging from -3.977563 to 1.311103. 
    * ALC : numeric predictor; set to the value(s): 0.008344338. 
    * speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.) 
    * NOTE : The following random effects columns are canceled: s(speaker)
 

The LMER model, however, shows a clear increase of /s/ duration with increasing LCF_Comp.2b values:

Thus, it appears that higher LCF_Comp.2b values lead to longer /s/ durations.


In summary, higher ALC values lead to shorter /s/ durations. That is, a denser semantic neighbourhood (higher ALC values), i.e. a higher semantic activation diversity, leads to shorter /s/ durations.

Moreover, higher LCF_Comp.2b values lead to longer /s/ durations. As NNC is part of LCF_Comp.2b, i.e. high values indicate a close semantic real word neighbour, one may interpret this as an effect of semantic certainty.
As for the Affix part of this variable, one can observe that non-morphemic /s/ shows higher mean values of LCF_Comp.2b than plural /s/ (\(1.17\) vs. \(-0.99\), \(p<0.001\)).
Higher values of NNC, and of LCF_Comp.2b in general, for non-morphemic /s/ thus go together with longer /s/ durations in monomorphemic pseudowords.
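
The reported comparison of means could be obtained along the following lines; the Welch two-sample t-test is an assumption for illustration, as the original test is not shown.

## hypothetical check: mean LCF_Comp.2b by affix category
tapply(data$LCF_Comp.2b, data$Affix, mean)
t.test(LCF_Comp.2b ~ Affix, data = data)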

11 All measures

Taking into consideration all measures presented in sections 9 and 10, we now perform a final analysis.

Without discussing the underlying implications of the individual measures, let’s see whether some of them are correlated. If so, we need to take precautions before including them in any models.

11.1 Correlations
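
The pairscor.fnc() function used below is provided by the languageR package, which is assumed to have been loaded earlier.

library(languageR)  ## pairscor.fnc()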

Correlations of all measures:

pairscor.fnc(data[,c("Affix", 
                     "cor_max", "cor_target", "l1norm", "l2norm", "density", "cor_affix",
                     "correlations", "path_counts", "path_sum", "path_entropies", "lwlr",
                     "ALC", "ALDC", "EDNN", "NNC")])

Note: Without rank as it is always \(1\), and without SCPP as it is identical to correlations.

We find the following highly correlated (\(|\rho|\ge0.5\)) variable combinations:

  • cor_max vs. cor_target, \(\rho=1\)
  • cor_max vs. correlations, \(\rho=1\)
  • cor_target vs. correlations, \(\rho=1\)
  • l1norm vs. l2norm, \(\rho=0.94\)
  • density vs. cor_affix, \(\rho=0.84\)
  • path_counts vs. path_entropies, \(\rho=0.95\)
  • path_sum vs. lwlr, \(\rho=-0.76\)
  • cor_max vs. EDNN, \(\rho=-0.86\)
  • cor_target vs. EDNN, \(\rho=-0.86\)
  • correlations vs. EDNN, \(\rho=-0.86\)
  • density vs. ALC, \(\rho=0.75\)
  • cor_affix vs. ALC, \(\rho=0.72\)
  • cor_affix vs. NNC, \(\rho=0.84\)
  • path_counts vs. ALDC, \(\rho=0.89\)
  • path_entropies vs. ALDC, \(\rho=0.9\)
  • Affix vs. NNC, \(r_{pb}=-0.55\) (point-biserial correlation)

Thus, we need to take precautions to avoid problems of collinearity later on.

11.2 PCA

To avoid collinearity issues when modelling, we combine all highly correlated variables in a principal component analysis. However, as we must include Affix as a categorical variable, we cannot use the princomp function. Instead, we make use of the PCAmix function of the PCAmixdata package. This function uses multiple correspondence analysis (MCA) to include categorical data in PCAs.

PCA Computation

First, we extract the highly correlated variables and merge them to form a new data frame.

## numerical variables; cor_affix is excluded as it contains NAs
pca.data1 <- data[, c("cor_max", "cor_target", "correlations",
                      "l1norm", "l2norm", "density",
                      "path_counts", "path_entropies", "path_sum",
                      "lwlr", "EDNN", "ALC", "ALDC", "NNC")]

## categorical variable
pca.data2 <- data[, "Affix", drop = FALSE]

Then, we compute a principal component analysis using the PCAmix function:

library(PCAmixdata)

res.pcamix <- PCAmix(X.quanti = pca.data1, X.quali = pca.data2, rename.level = TRUE, graph = FALSE)

In order to decide which components to keep, we follow three criteria:

  1. The Eigenvalue-criterion: Any component that displays an eigenvalue greater than 1.00 is accounting for a greater amount of variance than had been contributed by one variable. Such a component is therefore accounting for a meaningful amount of variance, and is worthy of being retained.
  2. The scree-criterion: Plot the eigenvalues associated with each component and look for a “break” between the components with relatively large eigenvalues and those with small eigenvalues. The components that appear before the break are assumed to be meaningful and are retained (Cattell, 1966); a plotting sketch is given after this list.
  3. The proportion-of-variance-criterion: Retain enough components so that the cumulative percent of variance accounted for is equal to some minimal value, i.e. 80 percent.
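
For criterion 2, the eigenvalues stored in the PCAmix result can be plotted directly; a minimal sketch:

## scree plot of the eigenvalues (criterion 2)
plot(res.pcamix$eig[, "Eigenvalue"], type = "b",
     xlab = "Component", ylab = "Eigenvalue")
abline(h = 1, lty = 2)  ## Eigenvalue-criterion threshold (criterion 1)
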
res.pcamix$eig
       Eigenvalue Proportion Cumulative
dim 1  3.89282845 25.9521896   25.95219
dim 2  3.83148447 25.5432298   51.49542
dim 3  2.21067273 14.7378182   66.23324
dim 4  1.62920303 10.8613535   77.09459
dim 5  1.59939209 10.6626139   87.75721
dim 6  0.89564655  5.9709770   93.72818
dim 7  0.27558824  1.8372550   95.56544
dim 8  0.23485797  1.5657198   97.13116
dim 9  0.16680812  1.1120541   98.24321
dim 10 0.10689236  0.7126157   98.95583
dim 11 0.08452222  0.5634815   99.51931
dim 12 0.04753395  0.3168930   99.83620
dim 13 0.02456981  0.1637988  100.00000

Taking all three criteria into consideration, we find that Components 1, 2, 3, and 4 show Eigenvalues of \(v\ge1\), that they are followed by only a small break in Eigenvalues (Component 4: \(v=1.63\); Component 5: \(v=1.60\)), and that the cumulative percentage of variance accounted for by all four components is \(σ^2=77.1\%\), i.e. almost at our target level of \(σ^2=80\%\).

Thus, we add components 1, 2, 3, and 4 to our data frame.

pcframe <- as.data.frame(res.pcamix$scores)
pcframe <- pcframe[1:4]


names(pcframe)[names(pcframe) == "dim 1"] <- "ALL_Comp.1b"
names(pcframe)[names(pcframe) == "dim 2"] <- "ALL_Comp.2b"
names(pcframe)[names(pcframe) == "dim 3"] <- "ALL_Comp.3b"
names(pcframe)[names(pcframe) == "dim 4"] <- "ALL_Comp.4b"


data <- cbind(data, pcframe)

PCA Loadings

Now we can take a closer look at the individual components and their loadings.

res.pcamix$sqload
                     dim 1       dim 2       dim 3        dim 4        dim 5
cor_max        0.637722065 0.326920348 0.005879550 0.0120559346 0.0038315277
cor_target     0.637722065 0.326920348 0.005879550 0.0120559346 0.0038315277
correlations   0.637722065 0.326920348 0.005879550 0.0120559346 0.0038315277
l1norm         0.108952652 0.145102711 0.596151554 0.0003336438 0.0082385537
l2norm         0.110055547 0.138399299 0.599856157 0.0020516355 0.0266775599
density        0.015951067 0.252396450 0.027885544 0.3248052927 0.2542507614
path_counts    0.349562685 0.448228572 0.055724227 0.0088127215 0.0851677391
path_entropies 0.310867526 0.426202300 0.069083373 0.0294327025 0.1245838013
path_sum       0.118532333 0.106756317 0.213682472 0.0184313105 0.2653051450
lwlr           0.154372616 0.164872517 0.286525573 0.0089769820 0.1831666121
EDNN           0.449785343 0.419112785 0.005859934 0.0008826359 0.0003926486
ALC            0.017413225 0.279537745 0.006706018 0.5230282496 0.0057341834
ALDC           0.296892353 0.317874615 0.056139063 0.0254351727 0.1938145659
NNC            0.037602170 0.143869002 0.268593767 0.1350932734 0.2417072964
Affix          0.009674736 0.008371119 0.006826399 0.5157516060 0.1988586395

But what do these loadings conceptually mean?

Component 1

Component 1 is rather highly correlated with cor_max, cor_target, and correlations. This component is identical to WWL.Comp_1.

For cor_max, higher values indicate a close semantic neighbour.

For cor_target, higher values indicate a good mapping from form to meaning, i.e. a higher semantic certainty.

For correlations, higher values indicate a higher goodness of production by means of fitting semantics.

Note: What is correlations again? Given all candidate forms of a word, i.e. all word forms found as candidate productions by the production model, and their estimated semantic vectors \(s_{candidate_1}, \dots, s_{candidate_n}\), correlation values between these estimated semantic vectors and the word’s semantic vector \(s\) are computed. The highest of these correlation values is taken as the value of correlations.
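
Schematically, this measure could be computed as follows; Shat (a matrix of estimated semantic vectors, one candidate form per row) and s (the word’s semantic vector) are hypothetical objects for illustration.

## sketch of the correlations measure (hypothetical objects)
## Shat: estimated semantic vectors of all candidate forms (rows)
## s:    the word's semantic vector
correlations <- max(apply(Shat, 1, function(cand) cor(cand, s)))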

Thus, for Component 1, high values indicate higher semantic certainty, while lower values indicate higher semantic uncertainty.

Component 2

Component 2 is rather highly negatively correlated with path_counts and path_entropies. This component is identical to WWL.Comp_2.

For path_counts, higher values indicate the existence of multiple candidates (and thus paths) in production. May be interpreted as a measure of phonological activation diversity.

For path_entropies, higher values indicate a higher level of disorder, i.e. uncertainty.

In sum, Component 2 reflects the certainty in production, i.e. high values indicate high certainty.

Component 3

Component 3 is rather highly correlated with l1norm and l2norm. This component is almost identical to WWL.Comp_3; the difference is that for WWL.Comp_3 the measures are negatively correlated.

For l1norm, higher values imply stronger links to many other lexomes.

For l2norm, higher values likewise imply stronger links to many other lexomes.

Both l1norm and l2norm may be interpreted as measures of semantic activation diversity.

In sum, Component 3 reflects semantic activation diversity, i.e. high component values indicate low semantic activation diversity.

Component 4

Component 4 is rather highly correlated with ALC and rather highly negatively correlated with Affix.

For ALC, higher values indicate that a pseudoword’s semantics are part of a denser real word semantic neighbourhood. This may be interpreted as semantic activation diversity, but also as certainty in pseudoword semantics.

Thus, Component 4 reflects semantic activation diversity, i.e. high component values indicate a dense semantic neighbourhood.

11.3 GAMM

Let’s see whether the PCA components are significant predictors of word-final /s/ duration in singular and plural pseudowords. We will again use GAMMs to model /s/ durations.

Modelling

Step 0

First, create a model with all relevant variables, including control and PCA variables, and speaker as random effect:

gam.ALL.1 <- bam(sDurLog ~
                   s(ALL_Comp.1b) +
                   s(ALL_Comp.2b) +
                   s(ALL_Comp.3b) +
                   s(ALL_Comp.4b) +
                   SylPerMin+
                   pauseBin+
                   folType+
                   preC+
                   monoMultilingual+
                   s(speaker, bs="re"),
                 data=data, method = "REML")

Checking the anova() function’s output, we now exclude non-significant variables in a stepwise fashion. After each potential exclusion, compareML() is used to check whether the exclusion is justified in terms of model fit.

anova(gam.ALL.1)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ s(ALL_Comp.1b) + s(ALL_Comp.2b) + s(ALL_Comp.3b) + 
    s(ALL_Comp.4b) + SylPerMin + pauseBin + folType + preC + 
    monoMultilingual + s(speaker, bs = "re")

Parametric Terms:
                 df       F  p-value
SylPerMin         1 847.659  < 2e-16
pauseBin          1  36.267 2.99e-09
folType           4   4.760 0.000869
preC              3   1.541 0.202805
monoMultilingual  1   0.337 0.561827

Approximate significance of smooth terms:
                  edf Ref.df      F  p-value
s(ALL_Comp.1b)  1.103  1.195  0.078  0.77292
s(ALL_Comp.2b)  1.006  1.012  2.309  0.12749
s(ALL_Comp.3b)  4.480  5.465  3.681  0.00208
s(ALL_Comp.4b)  1.848  2.301 17.796 6.66e-09
s(speaker)     31.440 38.000  5.333  < 2e-16

Step 1

Exclusion of ALL_Comp.1b.

gam.ALL.2 <- bam(sDurLog ~
                   #s(ALL_Comp.1b) +
                   s(ALL_Comp.2b) +
                   s(ALL_Comp.3b) +
                   s(ALL_Comp.4b) +
                   SylPerMin+
                   pauseBin+
                   folType+
                   preC+
                   monoMultilingual+
                   s(speaker, bs="re"),
                 data=data, method = "REML")

compareML(gam.ALL.1, gam.ALL.2)
gam.ALL.1: sDurLog ~ s(ALL_Comp.1b) + s(ALL_Comp.2b) + s(ALL_Comp.3b) + 
    s(ALL_Comp.4b) + SylPerMin + pauseBin + folType + preC + 
    monoMultilingual + s(speaker, bs = "re")

gam.ALL.2: sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) + 
    SylPerMin + pauseBin + folType + preC + monoMultilingual + 
    s(speaker, bs = "re")

Model gam.ALL.2 preferred: lower REML score (3.942), and lower df (2.000).
-----
      Model     Score Edf Difference    Df
1 gam.ALL.1 -151.5819  20                 
2 gam.ALL.2 -155.5238  18     -3.942 2.000

AIC difference: 5.53, model gam.ALL.2 has lower AIC.
anova(gam.ALL.2)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) + 
    SylPerMin + pauseBin + folType + preC + monoMultilingual + 
    s(speaker, bs = "re")

Parametric Terms:
                 df       F  p-value
SylPerMin         1 851.200  < 2e-16
pauseBin          1  36.188 3.11e-09
folType           4   4.824 0.000778
preC              3   1.830 0.140468
monoMultilingual  1   0.330 0.565763

Approximate significance of smooth terms:
                  edf Ref.df      F  p-value
s(ALL_Comp.2b)  1.000  1.000  3.957   0.0471
s(ALL_Comp.3b)  4.419  5.394  3.656   0.0023
s(ALL_Comp.4b)  1.797  2.235 18.580 4.23e-09
s(speaker)     31.438 38.000  5.335  < 2e-16

Step 2

Exclusion of monoMultilingual.

gam.ALL.3 <- bam(sDurLog ~
                   #s(ALL_Comp.1b) +
                   s(ALL_Comp.2b) +
                   s(ALL_Comp.3b) +
                   s(ALL_Comp.4b) +
                   SylPerMin+
                   pauseBin+
                   folType+
                   preC+
                   #monoMultilingual+
                   s(speaker, bs="re"),
                 data=data, method = "REML")

compareML(gam.ALL.2, gam.ALL.3)
gam.ALL.2: sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) + 
    SylPerMin + pauseBin + folType + preC + monoMultilingual + 
    s(speaker, bs = "re")

gam.ALL.3: sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) + 
    SylPerMin + pauseBin + folType + preC + s(speaker, bs = "re")

Model gam.ALL.3 preferred: lower REML score (2.080), and lower df (1.000).
-----
      Model     Score Edf Difference    Df
1 gam.ALL.2 -155.5238  18                 
2 gam.ALL.3 -157.6034  17     -2.080 1.000

AIC difference: 0.14, model gam.ALL.3 has lower AIC.
anova(gam.ALL.3)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) + 
    SylPerMin + pauseBin + folType + preC + s(speaker, bs = "re")

Parametric Terms:
          df       F  p-value
SylPerMin  1 866.290  < 2e-16
pauseBin   1  36.063  3.3e-09
folType    4   4.812 0.000793
preC       3   1.818 0.142592

Approximate significance of smooth terms:
                  edf Ref.df      F  p-value
s(ALL_Comp.2b)  1.000  1.000  3.943  0.04748
s(ALL_Comp.3b)  4.422  5.398  3.670  0.00223
s(ALL_Comp.4b)  1.783  2.217 18.735 3.97e-09
s(speaker)     32.154 39.000  5.239  < 2e-16

Step 3

Exclusion of preC.

gam.ALL.4 <- bam(sDurLog ~
                   #s(ALL_Comp.1b) +
                   s(ALL_Comp.2b) +
                   s(ALL_Comp.3b) +
                   s(ALL_Comp.4b) +
                   SylPerMin+
                   pauseBin+
                   folType+
                   #preC+
                   #monoMultilingual+
                   s(speaker, bs="re"),
                 data=data, method = "REML")

compareML(gam.ALL.3, gam.ALL.4)
gam.ALL.3: sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) + 
    SylPerMin + pauseBin + folType + preC + s(speaker, bs = "re")

gam.ALL.4: sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) + 
    SylPerMin + pauseBin + folType + s(speaker, bs = "re")

Model gam.ALL.4 preferred: lower REML score (6.690), and lower df (3.000).
-----
      Model     Score Edf Difference    Df
1 gam.ALL.3 -157.6034  17                 
2 gam.ALL.4 -164.2930  14     -6.690 3.000

AIC difference: -0.87, model gam.ALL.3 has lower AIC.
anova(gam.ALL.4)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) + 
    SylPerMin + pauseBin + folType + s(speaker, bs = "re")

Parametric Terms:
          df       F  p-value
SylPerMin  1 886.415  < 2e-16
pauseBin   1  38.013 1.28e-09
folType    4   5.042 0.000529

Approximate significance of smooth terms:
                  edf Ref.df      F  p-value
s(ALL_Comp.2b)  1.315  1.548  1.231  0.19749
s(ALL_Comp.3b)  5.209  6.296  3.859  0.00077
s(ALL_Comp.4b)  2.023  2.520 16.619 7.26e-09
s(speaker)     32.117 39.000  5.214  < 2e-16

Step 4

Exclusion of ALL_Comp.2b.

gam.ALL.5 <- bam(sDurLog ~
                   #s(ALL_Comp.1b) +
                   #s(ALL_Comp.2b) +
                   s(ALL_Comp.3b) +
                   s(ALL_Comp.4b) +
                   SylPerMin+
                   pauseBin+
                   folType+
                   #preC+
                   #monoMultilingual+
                   s(speaker, bs="re"),
                 data=data, method = "REML")

compareML(gam.ALL.4, gam.ALL.5)
gam.ALL.4: sDurLog ~ s(ALL_Comp.2b) + s(ALL_Comp.3b) + s(ALL_Comp.4b) + 
    SylPerMin + pauseBin + folType + s(speaker, bs = "re")

gam.ALL.5: sDurLog ~ s(ALL_Comp.3b) + s(ALL_Comp.4b) + SylPerMin + pauseBin + 
    folType + s(speaker, bs = "re")

Model gam.ALL.5 preferred: lower REML score (2.805), and lower df (2.000).
-----
      Model    Score Edf Difference    Df
1 gam.ALL.4 -164.293  14                 
2 gam.ALL.5 -167.098  12     -2.805 2.000

AIC difference: 0.97, model gam.ALL.5 has lower AIC.
anova(gam.ALL.5)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ s(ALL_Comp.3b) + s(ALL_Comp.4b) + SylPerMin + pauseBin + 
    folType + s(speaker, bs = "re")

Parametric Terms:
          df       F  p-value
SylPerMin  1 885.121  < 2e-16
pauseBin   1  39.169 7.35e-10
folType    4   5.233 0.000378

Approximate significance of smooth terms:
                  edf Ref.df      F  p-value
s(ALL_Comp.3b)  5.356  6.456  4.204 0.000309
s(ALL_Comp.4b)  2.194  2.728 15.414 8.11e-09
s(speaker)     31.993 39.000  5.121  < 2e-16

Summary

summary.gam(gam.ALL.5)

Family: gaussian 
Link function: identity 

Formula:
sDurLog ~ s(ALL_Comp.3b) + s(ALL_Comp.4b) + SylPerMin + pauseBin + 
    folType + s(speaker, bs = "re")

Parametric coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -0.900214   0.046968 -19.167  < 2e-16 ***
SylPerMin     -0.508560   0.017094 -29.751  < 2e-16 ***
pauseBinpause  0.109416   0.017483   6.259 7.35e-10 ***
folTypeF      -0.001866   0.054333  -0.034 0.972609    
folTypeN       0.004629   0.020793   0.223 0.823915    
folTypeP       0.009948   0.018280   0.544 0.586516    
folTypeV      -0.065820   0.018568  -3.545 0.000423 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                  edf Ref.df      F  p-value    
s(ALL_Comp.3b)  5.356  6.456  4.204 0.000309 ***
s(ALL_Comp.4b)  2.194  2.728 15.414 8.11e-09 ***
s(speaker)     31.993 39.000  5.121  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =   0.81   Deviance explained = 82.4%
-REML = -167.1  Scale est. = 0.028569  n = 653
Summary:
    * SylPerMin : numeric predictor; set to the value(s): 2.336339423. 
    * pauseBin : factor; set to the value(s): no_pause. 
    * folType : factor; set to the value(s): APP. 
    * ALL_Comp.3b : numeric predictor; with 30 values ranging from -4.311999 to 2.983377. 
    * ALL_Comp.4b : numeric predictor; set to the value(s): 0.0515270814629842. 
    * speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.) 
    * NOTE : The following random effects columns are canceled: s(speaker)
 

The model predicts a rather unclear picture for the influence of ALL_Comp.3b.


Summary:
    * SylPerMin : numeric predictor; set to the value(s): 2.336339423. 
    * pauseBin : factor; set to the value(s): no_pause. 
    * folType : factor; set to the value(s): APP. 
    * ALL_Comp.3b : numeric predictor; set to the value(s): 0.0225544158487821. 
    * ALL_Comp.4b : numeric predictor; with 30 values ranging from -3.315741 to 1.854101. 
    * speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.) 
    * NOTE : The following random effects columns are canceled: s(speaker)
 

ALL_Comp.4b shows an almost linear effect.

Thus, it appears that LMER models may be a justified alternative.

11.4 LMER

Let’s see whether a linear mixed effects regression model does a similarly good job as compared to the previous GAMM model.

Modelling

First, create a model with all relevant variables, including control and PCA variables, and speaker and monoMultilingual as random effects:

lmer.ALL.1 <- lmer(sDurLog ~
                 ALL_Comp.1b +
                 ALL_Comp.2b +
                 ALL_Comp.3b +
                 ALL_Comp.4b +
                 SylPerMin+
                 pauseBin+
                 folType+
                 preC+
                 (1 | monoMultilingual) +
                 (1 | speaker),
               data=data, REML=T)

anova(lmer.ALL.1)
Type III Analysis of Variance Table with Satterthwaite's method
             Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
ALL_Comp.1b  0.0004  0.0004     1 611.25   0.0121  0.912600    
ALL_Comp.2b  0.1744  0.1744     1 610.87   6.0023  0.014567 *  
ALL_Comp.3b  0.3149  0.3149     1 605.90  10.8341  0.001054 ** 
ALL_Comp.4b  1.2835  1.2835     1 619.06  44.1650 6.626e-11 ***
SylPerMin   24.6608 24.6608     1 535.94 848.5651 < 2.2e-16 ***
pauseBin     1.0320  1.0320     1 633.09  35.5107 4.212e-09 ***
folType      0.5823  0.1456     4 609.58   5.0095  0.000560 ***
preC         0.2661  0.0887     3 606.43   3.0522  0.028047 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Using the step() function, we exclude all non-significant predictors:

step(lmer.ALL.1)
Backward reduced random-effect table:

                       Eliminated npar  logLik     AIC   LRT Df Pr(>Chisq)    
<none>                              17 151.258 -268.52                        
(1 | monoMultilingual)          1   16 151.258 -270.52   0.0  1          1    
(1 | speaker)                   0   15  98.957 -167.91 104.6  1     <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Backward reduced fixed-effect table:
Degrees of freedom method: Satterthwaite 

            Eliminated  Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
ALL_Comp.1b          1  0.0004  0.0004     1 611.25   0.0121 0.9125999    
ALL_Comp.2b          0  0.1750  0.1750     1 612.06   6.0326 0.0143211 *  
ALL_Comp.3b          0  0.3171  0.3171     1 606.77  10.9299 0.0010021 ** 
ALL_Comp.4b          0  1.2852  1.2852     1 620.27  44.2940 6.221e-11 ***
SylPerMin            0 24.7441 24.7441     1 537.71 852.7706 < 2.2e-16 ***
pauseBin             0  1.0317  1.0317     1 634.14  35.5571 4.114e-09 ***
folType              0  0.5844  0.1461     4 610.60   5.0354 0.0005349 ***
preC                 0  0.2855  0.0952     3 607.69   3.2802 0.0206276 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model found:
sDurLog ~ ALL_Comp.2b + ALL_Comp.3b + ALL_Comp.4b + SylPerMin + 
    pauseBin + folType + preC + (1 | speaker)

This leaves us with the following final model:

lmer.ALL.2 <- lmer(sDurLog ~ 
                 ALL_Comp.2b + 
                 ALL_Comp.3b + 
                 ALL_Comp.4b + 
                 SylPerMin + 
                 pauseBin + 
                 folType + 
                 preC + 
                 (1 | speaker),
               data=data, REML=T)

anova(lmer.ALL.2)
Type III Analysis of Variance Table with Satterthwaite's method
             Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
ALL_Comp.2b  0.1750  0.1750     1 612.06   6.0326 0.0143211 *  
ALL_Comp.3b  0.3171  0.3171     1 606.77  10.9299 0.0010021 ** 
ALL_Comp.4b  1.2852  1.2852     1 620.27  44.2940 6.221e-11 ***
SylPerMin   24.7441 24.7441     1 537.71 852.7706 < 2.2e-16 ***
pauseBin     1.0317  1.0317     1 634.14  35.5571 4.114e-09 ***
folType      0.5844  0.1461     4 610.60   5.0354 0.0005349 ***
preC         0.2855  0.0952     3 607.69   3.2802 0.0206276 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Trimming

Trimming appears to be a good idea according to the residual plots. However, for some unknown reason, the trimming procedure cannot be shown in R Markdown. Thus, the detailed procedure is skipped here. Trimming results in the removal of \(n=7\) (i.e. 1.1%) data points.
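
As in section 10.4, a minimal sketch of such a step under the same assumed 2.5 SD cutoff:

data.trim <- data[abs(scale(resid(lmer.ALL.2)))[, 1] < 2.5, ]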

lmer.final <- lmer(sDurLog ~ 
                     ALL_Comp.2b + 
                     ALL_Comp.3b + 
                     ALL_Comp.4b + 
                     SylPerMin + 
                     pauseBin + 
                     folType + 
                     preC + 
                     (1 | speaker),
                   data=data.trim, REML=T)

anova(lmer.final)
Type III Analysis of Variance Table with Satterthwaite's method
             Sum Sq Mean Sq NumDF  DenDF  F value    Pr(>F)    
ALL_Comp.2b  0.1543  0.1543     1 604.57   5.8258 0.0160893 *  
ALL_Comp.3b  0.3601  0.3601     1 600.02  13.5939 0.0002475 ***
ALL_Comp.4b  1.0439  1.0439     1 612.56  39.4113 6.503e-10 ***
SylPerMin   25.0421 25.0421     1 538.90 945.4032 < 2.2e-16 ***
pauseBin     0.8957  0.8957     1 625.73  33.8158 9.665e-09 ***
folType      0.5697  0.1424     4 603.42   5.3772 0.0002931 ***
preC         0.2770  0.0923     3 600.75   3.4856 0.0156299 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lmer.final)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: sDurLog ~ ALL_Comp.2b + ALL_Comp.3b + ALL_Comp.4b + SylPerMin +  
    pauseBin + folType + preC + (1 | speaker)
   Data: data.trim

REML criterion at convergence: -362.8

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.51964 -0.63451  0.01489  0.70055  2.54587 

Random effects:
 Groups   Name        Variance Std.Dev.
 speaker  (Intercept) 0.009063 0.0952  
 Residual             0.026488 0.1628  
Number of obs: 646, groups:  speaker, 40

Fixed effects:
                Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)    -0.829145   0.048025 420.881076 -17.265  < 2e-16 ***
ALL_Comp.2b    -0.008318   0.003446 604.567104  -2.414 0.016089 *  
ALL_Comp.3b    -0.017038   0.004621 600.018486  -3.687 0.000247 ***
ALL_Comp.4b    -0.033745   0.005375 612.562512  -6.278 6.50e-10 ***
SylPerMin      -0.524868   0.017070 538.898703 -30.747  < 2e-16 ***
pauseBinpause   0.098413   0.016924 625.729472   5.815 9.66e-09 ***
folTypeF       -0.012088   0.052199 606.447567  -0.232 0.816951    
folTypeN        0.013395   0.020195 598.930413   0.663 0.507405    
folTypeP        0.017635   0.017776 600.826520   0.992 0.321552    
folTypeV       -0.058057   0.017995 607.711315  -3.226 0.001321 ** 
preCk          -0.031337   0.019489 599.938538  -1.608 0.108381    
preCp          -0.031340   0.019345 600.318088  -1.620 0.105745    
preCt          -0.062844   0.019479 601.786821  -3.226 0.001322 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

11.5 LMER vs. GAMM

Let’s compare the results of both the LMER and the GAMM model.

As we have seen already, the GAMM model predicts a somewhat unclear effect of ALL_Comp.3b:

Summary:
    * SylPerMin : numeric predictor; set to the value(s): 2.336339423. 
    * pauseBin : factor; set to the value(s): no_pause. 
    * folType : factor; set to the value(s): APP. 
    * ALL_Comp.3b : numeric predictor; with 30 values ranging from -4.311999 to 2.983377. 
    * ALL_Comp.4b : numeric predictor; set to the value(s): 0.0515270814629842. 
    * speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.) 
    * NOTE : The following random effects columns are canceled: s(speaker)
 

Considering this unclear picture, we checked whether an LMER model comes to a similar result:

The LMER model predicts shorter /s/ durations for higher ALL_Comp.3b values.


For ALL_Comp.4b, both models assume a linear effect pointing in the same direction.

Summary:
    * SylPerMin : numeric predictor; set to the value(s): 2.336339423. 
    * pauseBin : factor; set to the value(s): no_pause. 
    * folType : factor; set to the value(s): APP. 
    * ALL_Comp.3b : numeric predictor; set to the value(s): 0.0225544158487821. 
    * ALL_Comp.4b : numeric predictor; with 30 values ranging from -3.315741 to 1.854101. 
    * speaker : factor; set to the value(s): 2. (Might be canceled as random effect, check below.) 
    * NOTE : The following random effects columns are canceled: s(speaker)
 

Thus, it appears that higher ALL_Comp.4b values lead to shorter /s/ durations.


Lastly, the LMER model also contains ALL_Comp.2b as a significant predictor for /s/ duration.

It appears that higher ALL_Comp.2b values lead to shorter /s/ durations.


In sum, higher ALL_Comp.2b and ALL_Comp.3b values lead to shorter /s/ durations. That is, higher semantic activation diversity leads to shorter /s/ durations.

Additionally, higher ALL_Comp.4b values lead to shorter /s/ durations. That is, pseudowords which are part of a denser semantic neighbourhood, i.e. higher values of ALC, show shorter /s/ durations.
As for the Affix part of this variable, one can observe that non-morphemic /s/ shows lower mean values of ALL_Comp.4b than plural /s/ (\(-0.99\) vs. \(0.85\), \(p<0.001\)).
Higher values of ALC, and of ALL_Comp.4b in general, for plural /s/ thus go together with shorter /s/ durations in plural pseudowords.
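
Again, the reported comparison of means could be checked along the following lines, assuming a Welch two-sample t-test for illustration.

t.test(ALL_Comp.4b ~ Affix, data = data)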

12 Summary

Lower semantic activation diversity leads to longer /s/ durations.
WWL.Comp_3; ALL_Comp.2b; ALL_Comp.3b

A denser semantic neighbourhood leads to shorter /s/ durations.
ALC

Higher semantic certainty leads to longer /s/ durations.
LCF_Comp.2b; ALL_Comp.4b

References

Baayen, R. H., Chuang, Y.-Y., & Blevins, J. P. (2018). Inflectional morphology with linear mappings. The Mental Lexicon, 13(2), 232-270. doi.org/10.1075/ml.18010.baa

Baayen, R. H., Chuang, Y.-Y., Shafaei-Bajestan, E., & Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, 2019, 1-39. doi.org/10.1155/2019/4895891

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245-276.

Chavent, M., Kuentz, V., Labenne, A., Liquet, B., & Saracco, J. (2017). PCAmixdata: Multivariate Analysis of Mixed Data. R package version 3.1. https://CRAN.R-project.org/package=PCAmixdata

Chuang, Y.-Y., Vollmer, M.-L., Shafaei-Bajestan, E., Gahl, S., Hendrix, P., & Baayen, R. H. (2020). The processing of pseudoword form and meaning in production and comprehension: A computational modeling approach using Linear Discriminative Learning. Behavior Research Methods, 1-51. doi.org/10.3758/s13428-020-01356-w

Nykamp, D. (2020, November 10). Multiplying matrices and vectors. Math Insight. mathinsight.org/matrix_vector_multiplication

Schmitz, D., Baer-Henney, D., & Plag, I. (submitted). The duration of word-final /s/ differs across morphological categories in English: Evidence from pseudowords.

Tucker, B. V., Brenner, D., Danielson, D. K., Kelley, M. C., Nenadic, F., & Sims, M. (2018). The massive auditory lexical decision (MALD) database. Behavior Research Methods, 1-18. doi.org/10.3758/s13428-018-1056-1