This RScript details how to perform an RSiena analysis for network and behavior co-evolution. In this example we will analyse how friendship and alcohol consumption co-evolve over time. The data comes from a Scottish school cohort collected by Lynn Michell and Patrick West of the Medical Research Council / Medical Sociology Unit of the University of Glasgow in the school years 1994-95 until 1996-97.
If not done yet, install the necessary packages.
install.packages("RSiena", repos = "http://cran.us.r-project.org")
install.packages("network", repos = "http://cran.us.r-project.org")
install.packages("sna", repos = "http://cran.us.r-project.org")
To be sure, let us check the package version of RSiena. It should be 1.2-9 or higher.
packageVersion("RSiena")
## [1] '1.2.12'
Let us load the packages so that we can use them.
library(RSiena)
library(sna)
library(network)
The s50 data set is part of the RSiena package you can use ?s50 to get some more information.
?s50
Let’s find out the number of actors in this data set:
numberActors <- nrow(s501)
numberActors
## [1] 50
s50a contains the alcohol consumption s50s contains the smoking behavior means of demographic variables:
table(s50a[,1])
##
## 1 2 3 4 5
## 5 16 12 14 3
table(s50a[,2])
##
## 1 2 3 4 5
## 3 16 12 11 8
table(s50a[,3])
##
## 1 2 3 4 5
## 1 9 17 17 6
table(s50s[,1])
##
## 1 2 3
## 38 5 7
table(s50s[,2])
##
## 1 2 3
## 33 8 9
table(s50s[,3])
##
## 1 2 3
## 31 2 17
Let’s take a look at the network change between observations, e.g., between the first two waves:
table(s501,s502,useNA = 'always')
## s502
## s501 0 1 <NA>
## 0 2328 59 0
## 1 56 57 0
## <NA> 0 0 0
We can compute Hamming distances and Jaccard indices from such tables. Unfortuantely these functions are not implemented in the packages we loaded (maybe should change that…), but it is straightforward to write a function yourself in R:
Hamming <- function(changetable) {
return(changetable[2,1] + changetable[1,2])
}
Jaccard <- function(changetable) {
return(changetable[2,2]/(changetable[1,2] + changetable[2,1] + changetable[2,2]))
}
Now that we have the functions, let’s apply them.
Hamming(table(s501,s502))
## [1] 115
Hamming(table(s502,s503))
## [1] 106
Jaccard(table(s501,s502))
## [1] 0.3313953
Jaccard(table(s502,s503))
## [1] 0.3837209
And now overall change:
Hamming(table(s501,s503))
## [1] 133
Jaccard(table(s501,s503))
## [1] 0.2771739
First, let us identify the dependent network variable for the analysis:
friendship <- sienaDependent(array(
c(s501,s502,s503),
dim = c(numberActors,numberActors,3)))
You can use ?sienaDependent to see the function arugments.
?sienaDependent
Now we identify the dependent behavior variable:
drinking <- sienaDependent(s50a,type = "behavior")
Let’s bind the data together for the RSiena analysis:
CoEvolutionData <- sienaDataCreate(friendship,drinking)
CoEvolutionData
## Dependent variables: friendship, drinking
## Number of observations: 3
##
## Nodeset Actors
## Number of nodes 50
##
## Dependent variable friendship
## Type oneMode
## Observations 3
## Nodeset Actors
## Densities 0.046 0.047 0.05
##
## Dependent variable drinking
## Type behavior
## Observations 3
## Nodeset Actors
## Range 1 - 5
We can generate an initial descriptive outputfile:
print01Report(CoEvolutionData,modelname = 'netDynamics-descriptive')
File “netDynamics-descriptive.out” was created in the working directory. You can use getwd() to find your current directory. But I recommend to use Rprojects in RStudio.
First, we need to create an RSiena effects object. This is basically just a table that contains all the effects we want to use in our model and all the special options you have specified for your effect (e.g. initial values of parameters).
CoEvolutionModel <- getEffects(CoEvolutionData)
How does thís ‘empty’ model specification look like?
print(CoEvolutionModel)
## name effectName include fix test
## 1 friendship constant friendship rate (period 1) TRUE FALSE FALSE
## 2 friendship constant friendship rate (period 2) TRUE FALSE FALSE
## 3 friendship outdegree (density) TRUE FALSE FALSE
## 4 friendship reciprocity TRUE FALSE FALSE
## 5 drinking rate drinking (period 1) TRUE FALSE FALSE
## 6 drinking rate drinking (period 2) TRUE FALSE FALSE
## 7 drinking drinking linear shape TRUE FALSE FALSE
## 8 drinking drinking quadratic shape TRUE FALSE FALSE
## initialValue parm
## 1 4.69604 0
## 2 4.32885 0
## 3 -1.46770 0
## 4 0.00000 0
## 5 0.70571 0
## 6 0.84939 0
## 7 0.32237 0
## 8 0.00000 0
As we can see, by default RSiena adds a few pre-selected effects. If necessary you can remove these effects from your model with the setEffect() function.
You can check which effects are available for adding to the model specification by using
effectsDocumentation(CoEvolutionModel)
However, these are usually quite a lot and it might not be the easiest option to look through this document. We strongly recommend to have a close look at the RSiena manual. Section 12 lists all effects programmed in RSiena, including the exact mathematical formulation used.
In this model we have two dependent variables, the network and the behavior. We start with specifying the network effects:
Now let’s include some effects. We start with sender, receiver and homophily effects for the drinking behavior.
CoEvolutionModel <- includeEffects(CoEvolutionModel,
egoX,altX,simX, interaction1 = 'drinking')
## effectName include fix test initialValue parm
## 1 drinking alter TRUE FALSE FALSE 0 0
## 2 drinking ego TRUE FALSE FALSE 0 0
## 3 drinking similarity TRUE FALSE FALSE 0 0
Lastly, we want to add some structural effects. First we add an effect for transitive closure. Here we do not choose the normal transitive closure effect (although available in RSiena, but we choose a geometrically weighted option. Geometrically weighted parameters for triadic closure often lead to better fit, and they usually fit better with theory and common sense. The idea is that every additional friend that actors i and j have in common only adds diminishign returns to the probabilty of i sending a tie to j.
CoEvolutionModel <- includeEffects(CoEvolutionModel, gwespFF)
## effectName include fix test initialValue parm
## 1 GWESP I -> K -> J (#) TRUE FALSE FALSE 0 69
From the session about network evolution we know that we need some more effects for a better fit:
CoEvolutionModel <- includeEffects(CoEvolutionModel, inPopSqrt, outActSqrt,outPopSqrt,nbrDist2,
reciAct,gwespBB)
## effectName include fix test initialValue parm
## 1 number of actors at dist 2 TRUE FALSE FALSE 0 0
## 2 GWESP I <- K <- J (#) TRUE FALSE FALSE 0 69
## 3 indegree - popularity (sqrt) TRUE FALSE FALSE 0 0
## 4 outdegree - popularity (sqrt) TRUE FALSE FALSE 0 0
## 5 outdegree - activity (sqrt) TRUE FALSE FALSE 0 1
## 6 rec.degree^(1/#) - activity TRUE FALSE FALSE 0 1
Now that we have specified the network model we can specify the behavior model.
First, we can test if friends effect each other in their alcohol consumption.
CoEvolutionModel <- includeEffects(CoEvolutionModel,
avSim, interaction1 = "friendship", name = "drinking")
## effectName include fix test initialValue parm
## 1 drinking average similarity TRUE FALSE FALSE 0 0
Here, instead of ‘avSim’ also could have used ‘totSim’, ‘avAlt’, or ‘totAlt’. All of these are influence effects, but each of them estimates a slightly different effect. Your theory should tell you which is the best choice for you.
We can also test other hypotheses. For instance, maybe students isolated in the network are more likely to increase their alcohol consumption.
CoEvolutionModel <- includeEffects(CoEvolutionModel,
isolate, interaction1 = "friendship", name = "drinking")
## effectName include fix test initialValue parm
## 1 drinking isolate TRUE FALSE FALSE 0 0
Let’s have a look at our full model:
print(CoEvolutionModel)
## name effectName include fix test
## 1 friendship constant friendship rate (period 1) TRUE FALSE FALSE
## 2 friendship constant friendship rate (period 2) TRUE FALSE FALSE
## 3 friendship outdegree (density) TRUE FALSE FALSE
## 4 friendship reciprocity TRUE FALSE FALSE
## 5 friendship number of actors at dist 2 TRUE FALSE FALSE
## 6 friendship GWESP I -> K -> J (#) TRUE FALSE FALSE
## 7 friendship GWESP I <- K <- J (#) TRUE FALSE FALSE
## 8 friendship indegree - popularity (sqrt) TRUE FALSE FALSE
## 9 friendship outdegree - popularity (sqrt) TRUE FALSE FALSE
## 10 friendship outdegree - activity (sqrt) TRUE FALSE FALSE
## 11 friendship rec.degree^(1/#) - activity TRUE FALSE FALSE
## 12 friendship drinking alter TRUE FALSE FALSE
## 13 friendship drinking ego TRUE FALSE FALSE
## 14 friendship drinking similarity TRUE FALSE FALSE
## 15 drinking rate drinking (period 1) TRUE FALSE FALSE
## 16 drinking rate drinking (period 2) TRUE FALSE FALSE
## 17 drinking drinking linear shape TRUE FALSE FALSE
## 18 drinking drinking quadratic shape TRUE FALSE FALSE
## 19 drinking drinking average similarity TRUE FALSE FALSE
## 20 drinking drinking isolate TRUE FALSE FALSE
## initialValue parm
## 1 4.69604 0
## 2 4.32885 0
## 3 -1.46770 0
## 4 0.00000 0
## 5 0.00000 0
## 6 0.00000 69
## 7 0.00000 69
## 8 0.00000 0
## 9 0.00000 0
## 10 0.00000 1
## 11 0.00000 1
## 12 0.00000 0
## 13 0.00000 0
## 14 0.00000 0
## 15 0.70571 0
## 16 0.84939 0
## 17 0.32237 0
## 18 0.00000 0
## 19 0.00000 0
## 20 0.00000 0
Before we can start to estimate our model we have to specify some additional options for tuning the estimation algorithm:
CoEvolutionOptions <- sienaAlgorithmCreate(useStdInits = FALSE,
projname = 'CoEvol-results', seed = 786840)
Now let us estimate.
siena07() has the arguments useCluster and nbrNodes. If your PC/Mac/Laptop has multiple cores (which nearly all have nowadays), you can use multiple ones for the estimation and simulation. The simulation phase is completely parallelizable. This means that in Phase 3 RSiena will use all the cores you give it, this can be 1, 2, 10… Phase 2, however, can only be parallelized over the number of periods. In our example this would be 2. Usually phase 2 is the most time consuming so it makes sense to use a number of cores equal to the number of waves, or if you have too many waves, it is advisable to pick a number of cores that is a factor of the number of periods. However, most importantly, never use all cores of your computer. Otherwise it might not react to your commands anymore until it is done with the estimation. To use multiple cores (e.g. 2) set useClsuter = TRUE and nbrNodes = 2.
CoEvolutionResults <- siena07(CoEvolutionOptions,
data = CoEvolutionData,
effects = CoEvolutionModel,
batch = FALSE, verbose = FALSE,
useCluster = TRUE, nbrNodes = 2,
returnDeps = TRUE)
The model is converged. Let us extract the parameters and calcualte some approximate p-values:
parameter <- CoEvolutionResults$effects$effectName
estimate <- CoEvolutionResults$theta
st.error <- sqrt(diag(CoEvolutionResults$covtheta))
normal.variate <- estimate/st.error
p.value.2sided <- 2*pnorm(abs(normal.variate),lower.tail = FALSE)
(results.table <- data.frame(parameter,
estimate = round(estimate,3),
st.error = round(st.error,3),
normal.variate = round(normal.variate,2),
p.value = round(p.value.2sided,4)
))
In order to safeguard that these results are not artefacts of model misspecification, we continue with an investigation of goodness of fit.
Note: convergence is not the same as fit!
Now assess network goodness of fit on several dimensions:
Goodness of fit for indegree distribution:
gof.indegrees <- sienaGOF(CoEvolutionResults,IndegreeDistribution,
verbose = TRUE,varName = "friendship",cumulative = FALSE)
## Detected 1000 iterations and 1 group.
## Calculating auxiliary statistics for periods 1 2 .
## Period 1
## > Completed 100 calculations
> Completed 200 calculations
> Completed 300 calculations
> Completed 400 calculations
> Completed 500 calculations
> Completed 600 calculations
> Completed 700 calculations
> Completed 800 calculations
> Completed 900 calculations
> Completed 1000 calculations
> Completed 1000 calculations
## Period 2
## > Completed 100 calculations
> Completed 200 calculations
> Completed 300 calculations
> Completed 400 calculations
> Completed 500 calculations
> Completed 600 calculations
> Completed 700 calculations
> Completed 800 calculations
> Completed 900 calculations
> Completed 1000 calculations
> Completed 1000 calculations
plot(gof.indegrees)
This fits well.
Goodness of fit for outdegree distribution:
gof.outdegrees <- sienaGOF(CoEvolutionResults,OutdegreeDistribution,
verbose = TRUE,varName = "friendship",cumulative = FALSE)
## Detected 1000 iterations and 1 group.
## Calculating auxiliary statistics for periods 1 2 .
## Period 1
## > Completed 100 calculations
> Completed 200 calculations
> Completed 300 calculations
> Completed 400 calculations
> Completed 500 calculations
> Completed 600 calculations
> Completed 700 calculations
> Completed 800 calculations
> Completed 900 calculations
> Completed 1000 calculations
> Completed 1000 calculations
## Period 2
## > Completed 100 calculations
> Completed 200 calculations
> Completed 300 calculations
> Completed 400 calculations
> Completed 500 calculations
> Completed 600 calculations
> Completed 700 calculations
> Completed 800 calculations
> Completed 900 calculations
> Completed 1000 calculations
> Completed 1000 calculations
plot(gof.outdegrees)
This, too, fits well.
Goodness of fit for triad census:
gof.triads <- sienaGOF(CoEvolutionResults,TriadCensus,
verbose = TRUE,varName = "friendship")
## Detected 1000 iterations and 1 group.
## Calculating auxiliary statistics for periods 1 2 .
## Period 1
## > Completed 100 calculations
> Completed 200 calculations
> Completed 300 calculations
> Completed 400 calculations
> Completed 500 calculations
> Completed 600 calculations
> Completed 700 calculations
> Completed 800 calculations
> Completed 900 calculations
> Completed 1000 calculations
> Completed 1000 calculations
## Period 2
## > Completed 100 calculations
> Completed 200 calculations
> Completed 300 calculations
> Completed 400 calculations
> Completed 500 calculations
> Completed 600 calculations
> Completed 700 calculations
> Completed 800 calculations
> Completed 900 calculations
> Completed 1000 calculations
> Completed 1000 calculations
plot(gof.triads,center = TRUE,scale = TRUE)
This fits well!
Again, looks good!
Goodness of fit for geodesic distances:
Goodness of fit for geodesic distances:
We first define a auxiliary fit function to calculate geodesics making use of the packages ‘network’ and ‘sna’:
GeodesicDistribution <- function(i, data, sims, period, groupName,
varName, levls = c(1:5,Inf), cumulative= TRUE, ...) {
x <- networkExtraction(i, data, sims, period, groupName, varName)
require(sna)
a <- sna::geodist(symmetrize(x))$gdist
if (cumulative)
{
gdi <- sapply(levls, function(i){ sum(a <= i) })
}
else
{
gdi <- sapply(levls, function(i){ sum(a == i) })
}
names(gdi) <- as.character(levls)
return(gdi)
}
gof1.geodesic <- sienaGOF(CoEvolutionResults,GeodesicDistribution,
verbose = TRUE,varName = "friendship",cumulative = FALSE)
## Detected 1000 iterations and 1 group.
## Calculating auxiliary statistics for periods 1 2 .
## Period 1
## > Completed 100 calculations
> Completed 200 calculations
> Completed 300 calculations
> Completed 400 calculations
> Completed 500 calculations
> Completed 600 calculations
> Completed 700 calculations
> Completed 800 calculations
> Completed 900 calculations
> Completed 1000 calculations
> Completed 1000 calculations
## Period 2
## > Completed 100 calculations
> Completed 200 calculations
> Completed 300 calculations
> Completed 400 calculations
> Completed 500 calculations
> Completed 600 calculations
> Completed 700 calculations
> Completed 800 calculations
> Completed 900 calculations
> Completed 1000 calculations
> Completed 1000 calculations
plot(gof1.geodesic)
Not great again…
Goodness of fit of the overall behaviour distribution:
gof.behaviour <- sienaGOF(CoEvolutionResults,BehaviorDistribution,
verbose = TRUE,join = TRUE,varName = "drinking",
cumulative = FALSE)
## Detected 1000 iterations and 1 group.
## Calculating auxiliary statistics for periods 1 2 .
## Period 1
## > Completed 100 calculations
> Completed 200 calculations
> Completed 300 calculations
> Completed 400 calculations
> Completed 500 calculations
> Completed 600 calculations
> Completed 700 calculations
> Completed 800 calculations
> Completed 900 calculations
> Completed 1000 calculations
> Completed 1000 calculations
## Period 2
## > Completed 100 calculations
> Completed 200 calculations
> Completed 300 calculations
> Completed 400 calculations
> Completed 500 calculations
> Completed 600 calculations
> Completed 700 calculations
> Completed 800 calculations
> Completed 900 calculations
> Completed 1000 calculations
> Completed 1000 calculations
plot(gof.behaviour) # fits very well+
This fits well.
It is usually a good idea to save the R-workspace after an RSiena estiamtion. It can take a long time to estiamte a model and you do not want to lose the results to an accident.
save.image("co-evolution lab results.RData")