This R-script guides you through an RSiena analysis of network evolution with a dynamic, school cohort friendship network from Scottland; three yearly waves of data were collected by Lynn Michell and Patrick West of the Medical Research Council / Medical Sociology Unit of the University of Glasgow in the school years 1994-95 until 1996-97.

(1) Preparatory steps in R

If not done yet, install the necessary packages.

install.packages(RSiena, repos = "https://cran.rstudio.com")
install.packages(network, repos = "https://cran.rstudio.com")
install.packages(sna, repos = "https://cran.rstudio.com")

Let us load the packages so that we can use them.

library(RSiena)
library(sna)
library(network)

To be sure, let us check the package version of RSiena. It should be 1.2-9 or higher.

packageVersion("RSiena")
## [1] '1.2.12'

(2) Inspection of the data

The s50 data set is part of the RSiena package you can use ?s50 to get some more information.

?s50

Let’s find out the number of actors in this data set:

numberActors <- nrow(s501)
numberActors
## [1] 50

s50a contains the alcohol consumption and s50s contains the smoking behavior. Let’s have a look at the data:

table(s50a[,1])
## 
##  1  2  3  4  5 
##  5 16 12 14  3
table(s50a[,2])
## 
##  1  2  3  4  5 
##  3 16 12 11  8
table(s50a[,3])
## 
##  1  2  3  4  5 
##  1  9 17 17  6
table(s50s[,1])
## 
##  1  2  3 
## 38  5  7
table(s50s[,2])
## 
##  1  2  3 
## 33  8  9
table(s50s[,3])
## 
##  1  2  3 
## 31  2 17

Next we take a look at the network change between observations:

table(s501,s502,useNA = 'always')
##       s502
## s501      0    1 <NA>
##   0    2328   59    0
##   1      56   57    0
##   <NA>    0    0    0

We can compute Hamming distances and Jaccard indices from such tables. Unfortuantely these functions are not implemented in the packages we loaded (maybe should change that…), but it is straightforward to write a function yourself in R:

Hamming <- function(changetable) {
    return(changetable[2,1] + changetable[1,2])
}
Jaccard <- function(changetable) {
    return(changetable[2,2]/(changetable[1,2] + changetable[2,1] + changetable[2,2]))
}

Now that we have the functions, let’s apply them.

Hamming(table(s501,s502))
## [1] 115
Hamming(table(s502,s503))
## [1] 106
Jaccard(table(s501,s502))
## [1] 0.3313953
Jaccard(table(s502,s503))
## [1] 0.3837209

And now overall change:

Hamming(table(s501,s503))
## [1] 133
Jaccard(table(s501,s503))
## [1] 0.2771739

(3) preparing variables for RSiena

First, let us identify the dependent network variable for the analysis:

friendship <- sienaDependent(array(
    c(s501,s502,s503),
    dim = c(numberActors,numberActors,3)))

You can use ?sienaDependent to see the function arugments.

?sienaDependent

Let us now identify actor-level predictor variable for the analysis:

smoke <- varCovar(s50s[,1:2],centered = TRUE)
alcohol <- varCovar(s50a[,1:2],centered = TRUE)

There are 2 different functions to add covariates in RSiena, coCovar() and varCovar. coCovar() adds a constant covariate and varCovar() adds varying/changing covariate. This varying covariate is an exogenous variable. This means that the change in the varCovar() variable is not modelled by RSiena. We get to non-network dependent variables, usually referred to as “behavior”, later today. The first observation of a varying covariate is used in the first estimation period (wave 1 to wave 2), the second observation for the 2nd period and so on. This means with 3 waves in our data we can only use the covariate observations from wave 1 and wave 2 for our varCovar variable (hence s50a[,1:2]). To get more information on coCovar() and varCovar() see the help functions.

?coCovar
?varCovar

Now we need to combine the identified data for the analysis:

netDynamics <- sienaDataCreate(friendship,smoke,alcohol)

How does the object look like when called in R?

netDynamics
## Dependent variables:  friendship 
## Number of observations: 3 
## 
## Nodeset                  Actors 
## Number of nodes              50 
## 
## Dependent variable friendship      
## Type               oneMode         
## Observations       3               
## Nodeset            Actors          
## Densities          0.046 0.047 0.05
## 
## Changing covariates:  smoke, alcohol

We can generate an initial descriptive outputfile:

print01Report(netDynamics,modelname = 'netDynamics-descriptive')

File “netDynamics-descriptive.out” was created in the working directory. You can use getwd() to find your current directory. But I recommend to use Rprojects in RStudio.


(4) Model specification / selection of effects

First, we need to create an RSiena effects object. This is basically just a table that contains all the effects we want to use in our model and all the special options you have specified for your effect (e.g. initial values of parameters).

first.model <- getEffects(netDynamics)

How does thís ‘empty’ model specification look like?

print(first.model)
##   effectName                          include fix   test  initialValue
## 1 constant friendship rate (period 1) TRUE    FALSE FALSE    4.69604  
## 2 constant friendship rate (period 2) TRUE    FALSE FALSE    4.32885  
## 3 outdegree (density)                 TRUE    FALSE FALSE   -1.46770  
## 4 reciprocity                         TRUE    FALSE FALSE    0.00000  
##   parm
## 1 0   
## 2 0   
## 3 0   
## 4 0
You can also have a look at the table directly. Just click on the object in your environment (if you are using RStudio).

As we can see, by default RSiena adds a few pre-selected effects. If necessary you can remove these effects from your model with the setEffect() function.

You can check which effects are available for adding to the model specification by using

effectsDocumentation(first.model)

However, these are usually quite a lot and it might not be the easiest option to look through this document. We strongly recommend to have a close look at the RSiena manual. Section 12 lists all effects programmed in RSiena, including the exact mathematical formulation used.

Now let’s include some effects. We start with sender, receiver and homophily effects of smoking and alcohol consumption.

first.model <- includeEffects(first.model,
    egoX,altX,simX, interaction1 = 'smoke')
##   effectName       include fix   test  initialValue parm
## 1 smoke alter      TRUE    FALSE FALSE          0   0   
## 2 smoke ego        TRUE    FALSE FALSE          0   0   
## 3 smoke similarity TRUE    FALSE FALSE          0   0
first.model <- includeEffects(first.model,
    egoX,altX,simX, interaction1 = 'alcohol')
##   effectName         include fix   test  initialValue parm
## 1 alcohol alter      TRUE    FALSE FALSE          0   0   
## 2 alcohol ego        TRUE    FALSE FALSE          0   0   
## 3 alcohol similarity TRUE    FALSE FALSE          0   0

We further want to add an effect for transitive closure. Here we do not choose the normal transitive closure effect (although available in RSiena), but we choose a geometrically weighted option. Geometrically weighted parameters for triadic closure often lead to better fit, and they usually fit better with theory and common sense. The idea is that every additional friend that actors i and j have in common only adds diminishign returns to the probabilty of i sending a tie to j.

first.model <- includeEffects(first.model, gwespFF)
##   effectName            include fix   test  initialValue parm
## 1 GWESP I -> K -> J (#) TRUE    FALSE FALSE          0   69

(5) Model estimation

Before we can start to estimate our model we have to specify some additional options for tuning the estimation algorithm:

estimation.options <- sienaAlgorithmCreate(
    useStdInits = FALSE,projname = 'netDynamics-estimates',
    cond = FALSE,seed = 1234567)

Now let us estimate the first model:

(first.results <- siena07(estimation.options,
    data = netDynamics,
    effects = first.model,
    batch = FALSE,verbose = FALSE,
    returnDeps = TRUE))
## Estimates, standard errors and convergence t-ratios
## 
##                                                Estimate   Standard   Convergence 
##                                                             Error      t-ratio   
##    1. rate constant friendship rate (period 1)  6.7375  ( 1.3197   )    0.0555   
##    2. rate constant friendship rate (period 2)  5.3590  ( 0.8804   )    0.0216   
##    3. eval outdegree (density)                 -2.8642  ( 0.1411   )    0.0648   
##    4. eval reciprocity                          2.4626  ( 0.1971   )    0.0509   
##    5. eval GWESP I -> K -> J (69)               1.5905  ( 0.1760   )    0.0690   
##    6. eval smoke alter                          0.1176  ( 0.1205   )   -0.0143   
##    7. eval smoke ego                            0.0636  ( 0.1327   )   -0.0155   
##    8. eval smoke similarity                     0.3963  ( 0.2262   )   -0.0345   
##    9. eval alcohol alter                       -0.0222  ( 0.0752   )    0.0808   
##   10. eval alcohol ego                          0.0454  ( 0.0818   )   -0.0005   
##   11. eval alcohol similarity                   0.7531  ( 0.3141   )   -0.0012   
## 
## Overall maximum convergence ratio:    0.2072 
## 
## 
## Total of 2808 iteration steps.

Results are written to file “netDynamics-estimates.out” in the working directory. Sidenote: The function to estimate SAOMs with RSiena is called siena07 based on tradition. Before the SIENA algorithm was implemented in R in the package RSiena it was available as standalone software and also facilitated other network models (e.g. ergms). Back then, these functions were numbered. Only siena07() and siena08() have survived the transition.

OPTIONAL: If estimation results should show insufficient convergence (i.e., the convergence t-ratios are not all between -0.1 and +0.1 for the estimated(!) parameters, and the overall maximum convergence ratio is below .25), another run with better starting values may help. The option ‘prevAns’ makes it possible to use the last available estimates as starting values for a new run:

first.results <- siena07(estimation.options,
    data = netDynamics,
    effects = first.model,
    batch = FALSE,verbose = FALSE,
    returnDeps = TRUE,
    prevAns = first.results)

Convergence statistics should have improved and be okay before proceeding. Note that new results will also be appended to the existing file “netDynamics-estimates.out”, so scroll down there for the latest estimates.

We can now proceed to preliminary interpretation of the results. The following lines make a table with approximate p-values:

parameter <- first.results$effects$effectName
estimate <- first.results$theta
st.error <- sqrt(diag(first.results$covtheta))
normal.variate <- estimate/st.error
p.value.2sided <- 2*pnorm(abs(normal.variate),lower.tail = FALSE)
(results.table <- data.frame(parameter,
                      estimate = round(estimate,3),
                      st.error = round(st.error,3),
                      normal.variate = round(normal.variate,2),
                      p.value = round(p.value.2sided,4)))

(7) Assessing model fit

Convergence is not the same as fit!
Goodness of fit should be reasonable also on network dimensions not explicitly included in the model specification. Otherwise, there is the danger of “omitted variable bias” in the parameter estimates.

We now assess goodness of fit on several network dimensions:

Goodness of fit for the indegree distribution:

gof1.indegrees <- sienaGOF(first.results,IndegreeDistribution,
                           varName = "friendship",cumulative = FALSE)
##   > Completed  1000  calculations
##   > Completed  1000  calculations
plot(gof1.indegrees) 

This fits reasonably well.

Goodness of fit for the outdegree distribution:

gof1.outdegrees <- sienaGOF(first.results,OutdegreeDistribution,
                                varName = "friendship",cumulative = FALSE)
##   > Completed  1000  calculations
##   > Completed  1000  calculations
plot(gof1.outdegrees) 

Goodness of fit for triad census:

gof1.triads <- sienaGOF(first.results,TriadCensus,varName = "friendship")
##   > Completed  1000  calculations
##   > Completed  1000  calculations
plot(gof1.triads,center = TRUE,scale = TRUE)

This fits quite badly!
The model generates too few triads of type 120C and too many of types 111U and 030C. This should be improved!

Goodness of fit for geodesic distances:
We first define a auxiliary fit function to calculate geodesics making use of the packages ‘network’ and ‘sna’:

GeodesicDistribution <- function(i, data, sims, period, groupName,
    varName, levls = c(1:5,Inf), cumulative= TRUE, ...) {
    x <- networkExtraction(i, data, sims, period, groupName, varName)
    require(sna)
    a <- sna::geodist(symmetrize(x))$gdist
    if (cumulative)
    {
        gdi <- sapply(levls, function(i){ sum(a <= i) })
    }
    else
    {
        gdi <- sapply(levls, function(i){ sum(a == i) })
    }
    names(gdi) <- as.character(levls)
    return(gdi)
}

gof1.geodesic <- sienaGOF(first.results,GeodesicDistribution,
                           varName = "friendship",cumulative = FALSE)
##   > Completed  1000  calculations
##   > Completed  1000  calculations
plot(gof1.geodesic) 

This fit is comparatively bad. Geodesic fit cannot easily be improved - due to the randomness in stochastic network models. These models tend to predict shorter distances than many data sets show. This is almost inevitable, as Watts & Strogatz show in their “small worlds” paper.

We conclude from the above plots that we need to do something about the triad census. Because the worst-fitting triad 120C contains reciprocity as well as transitivity.
We could add the interaction between gwespFF and reciprocity, but maybe it is simpler if we first add a term for 3-cycles, because 120C also contains a 3-cycle. Again we add the geometrically weighted term, here it is gwespBB. This also helps to better fit triad 030C:

second.model <- includeEffects(first.model,gwespBB, parameter = 69)
##   effectName            include fix   test  initialValue parm
## 1 GWESP I <- K <- J (#) TRUE    FALSE FALSE          0   69  
## Warning: argument 'parameter' has no effect in includeEffects; use setEffect.

We also add the interaction between reciprocity and activity to better account for triad 111U:

second.model <- includeEffects(second.model,reciAct)
##   effectName                  include fix   test  initialValue parm
## 1 rec.degree^(1/#) - activity TRUE    FALSE FALSE          0   1

It might also be a good idea to add effects for the degree distribution.

second.model <- includeEffects(second.model,inPopSqrt, outActSqrt,
                               outPopSqrt,nbrDist2)
##   effectName                    include fix   test  initialValue parm
## 1 number of actors at dist 2    TRUE    FALSE FALSE          0   0   
## 2 indegree - popularity (sqrt)  TRUE    FALSE FALSE          0   0   
## 3 outdegree - popularity (sqrt) TRUE    FALSE FALSE          0   0   
## 4 outdegree - activity (sqrt)   TRUE    FALSE FALSE          0   1

Again we do not add the simple effect of in- and out-degree but we add the square root term. The reason for this is the same as for the geometrically weighted parameters.

Now let’s estimate the model:

second.results <- siena07(estimation.options,
                          data = netDynamics,
                          effects = second.model,
                          batch = FALSE,verbose = FALSE,
                          prevAns = first.results,
                          returnDeps = TRUE) 

The model is converged, so let us check the goodness of fit:

gof2.indegrees <- sienaGOF(second.results,IndegreeDistribution,
                            varName = "friendship",cumulative = FALSE)
##   > Completed  1000  calculations
##   > Completed  1000  calculations
plot(gof2.indegrees) 

gof2.outdegrees <- sienaGOF(second.results,OutdegreeDistribution,
                             varName = "friendship",cumulative = FALSE)
##   > Completed  1000  calculations
##   > Completed  1000  calculations
plot(gof2.outdegrees) 

gof2.triads <- sienaGOF(second.results,TriadCensus,
                         varName = "friendship")
##   > Completed  1000  calculations
##   > Completed  1000  calculations
plot(gof2.triads,center = TRUE,scale = TRUE) 

gof2.geodesic <- sienaGOF(second.results,GeodesicDistribution,
                           varName = "friendship",cumulative = FALSE)
##   > Completed  1000  calculations
##   > Completed  1000  calculations
plot(gof2.geodesic) 

It is usually a good idea to save the R-workspace after an RSiena estimation. It can take a long time to estimate a model and you do not want to lose the results to an accident.

save.image("network dynamics lab results.RData")