This RScript details how to perform an RSiena analysis for network and behavior co-evolution. In this example we will analyse how friendship and alcohol consumption co-evolve over time. The data comes from a Scottish school cohort collected by Lynn Michell and Patrick West of the Medical Research Council / Medical Sociology Unit of the University of Glasgow in the school years 1994-95 until 1996-97.

(1) Preparatory steps in R

If not done yet, install the necessary packages.

install.packages("RSiena", repos = "http://cran.us.r-project.org")
install.packages("network", repos = "http://cran.us.r-project.org")
install.packages("sna", repos = "http://cran.us.r-project.org")

To be sure, let us check the package version of RSiena. It should be 1.2-9 or higher.

packageVersion("RSiena")

## [1] '1.2.12'

Let us load the packages so that we can use them.

library(RSiena)
library(sna)
library(network)

(2) Inspection of the data

The s50 data set is part of the RSiena package you can use ?s50 to get some more information.

?s50

Let’s find out the number of actors in this data set:

numberActors <- nrow(s501)
numberActors

## [1] 50

s50a contains the alcohol consumption s50s contains the smoking behavior means of demographic variables:

table(s50a[,1])

## 
##  1  2  3  4  5 
##  5 16 12 14  3

table(s50a[,2])

## 
##  1  2  3  4  5 
##  3 16 12 11  8

table(s50a[,3])

## 
##  1  2  3  4  5 
##  1  9 17 17  6

table(s50s[,1])

## 
##  1  2  3 
## 38  5  7

table(s50s[,2])

## 
##  1  2  3 
## 33  8  9

table(s50s[,3])

## 
##  1  2  3 
## 31  2 17

Let’s take a look at the network change between observations, e.g., between the first two waves:

table(s501,s502,useNA = 'always')

##       s502
## s501      0    1 <NA>
##   0    2328   59    0
##   1      56   57    0
##   <NA>    0    0    0

We can compute Hamming distances and Jaccard indices from such tables. Unfortuantely these functions are not implemented in the packages we loaded (maybe should change that…), but it is straightforward to write a function yourself in R:

Hamming <- function(changetable) {
    return(changetable[2,1] + changetable[1,2])
}
Jaccard <- function(changetable) {
    return(changetable[2,2]/(changetable[1,2] + changetable[2,1] + changetable[2,2]))
}

Now that we have the functions, let’s apply them.

Hamming(table(s501,s502))

## [1] 115

Hamming(table(s502,s503))

## [1] 106

Jaccard(table(s501,s502))

## [1] 0.3313953

Jaccard(table(s502,s503))

## [1] 0.3837209

And now overall change:

Hamming(table(s501,s503))

## [1] 133

Jaccard(table(s501,s503))

## [1] 0.2771739

(3) preparing variables for RSiena

First, let us identify the dependent network variable for the analysis:

friendship <- sienaDependent(array(
    c(s501,s502,s503),
    dim = c(numberActors,numberActors,3)))

You can use ?sienaDependent to see the function arugments.

?sienaDependent

Now we identify the dependent behavior variable:

drinking <- sienaDependent(s50a,type = "behavior")

Let’s bind the data together for the RSiena analysis:

CoEvolutionData <- sienaDataCreate(friendship,drinking)
CoEvolutionData

## Dependent variables:  friendship, drinking 
## Number of observations: 3 
## 
## Nodeset                  Actors 
## Number of nodes              50 
## 
## Dependent variable friendship      
## Type               oneMode         
## Observations       3               
## Nodeset            Actors          
## Densities          0.046 0.047 0.05
## 
## Dependent variable drinking
## Type               behavior
## Observations       3       
## Nodeset            Actors  
## Range              1 - 5

We can generate an initial descriptive outputfile:

print01Report(CoEvolutionData,modelname = 'netDynamics-descriptive')

File “netDynamics-descriptive.out” was created in the working directory. You can use getwd() to find your current directory. But I recommend to use Rprojects in RStudio.

(4) Model specification / selection of effects

First, we need to create an RSiena effects object. This is basically just a table that contains all the effects we want to use in our model and all the special options you have specified for your effect (e.g. initial values of parameters).

CoEvolutionModel <- getEffects(CoEvolutionData)

How does thís ‘empty’ model specification look like?

print(CoEvolutionModel)

##   name       effectName                          include fix   test 
## 1 friendship constant friendship rate (period 1) TRUE    FALSE FALSE
## 2 friendship constant friendship rate (period 2) TRUE    FALSE FALSE
## 3 friendship outdegree (density)                 TRUE    FALSE FALSE
## 4 friendship reciprocity                         TRUE    FALSE FALSE
## 5 drinking   rate drinking (period 1)            TRUE    FALSE FALSE
## 6 drinking   rate drinking (period 2)            TRUE    FALSE FALSE
## 7 drinking   drinking linear shape               TRUE    FALSE FALSE
## 8 drinking   drinking quadratic shape            TRUE    FALSE FALSE
##   initialValue parm
## 1    4.69604   0   
## 2    4.32885   0   
## 3   -1.46770   0   
## 4    0.00000   0   
## 5    0.70571   0   
## 6    0.84939   0   
## 7    0.32237   0   
## 8    0.00000   0

As we can see, by default RSiena adds a few pre-selected effects. If necessary you can remove these effects from your model with the setEffect() function.

You can check which effects are available for adding to the model specification by using

effectsDocumentation(CoEvolutionModel)

However, these are usually quite a lot and it might not be the easiest option to look through this document. We strongly recommend to have a close look at the RSiena manual. Section 12 lists all effects programmed in RSiena, including the exact mathematical formulation used.

In this model we have two dependent variables, the network and the behavior. We start with specifying the network effects:

Network Effects

Now let’s include some effects. We start with sender, receiver and homophily effects for the drinking behavior.

CoEvolutionModel <- includeEffects(CoEvolutionModel,
    egoX,altX,simX, interaction1 = 'drinking')

##   effectName          include fix   test  initialValue parm
## 1 drinking alter      TRUE    FALSE FALSE          0   0   
## 2 drinking ego        TRUE    FALSE FALSE          0   0   
## 3 drinking similarity TRUE    FALSE FALSE          0   0

Lastly, we want to add some structural effects. First we add an effect for transitive closure. Here we do not choose the normal transitive closure effect (although available in RSiena, but we choose a geometrically weighted option. Geometrically weighted parameters for triadic closure often lead to better fit, and they usually fit better with theory and common sense. The idea is that every additional friend that actors i and j have in common only adds diminishign returns to the probabilty of i sending a tie to j.

CoEvolutionModel <- includeEffects(CoEvolutionModel, gwespFF)

##   effectName            include fix   test  initialValue parm
## 1 GWESP I -> K -> J (#) TRUE    FALSE FALSE          0   69

From the session about network evolution we know that we need some more effects for a better fit:

CoEvolutionModel <- includeEffects(CoEvolutionModel, inPopSqrt, outActSqrt,outPopSqrt,nbrDist2,
                                   reciAct,gwespBB)

##   effectName                    include fix   test  initialValue parm
## 1 number of actors at dist 2    TRUE    FALSE FALSE          0    0  
## 2 GWESP I <- K <- J (#)         TRUE    FALSE FALSE          0   69  
## 3 indegree - popularity (sqrt)  TRUE    FALSE FALSE          0    0  
## 4 outdegree - popularity (sqrt) TRUE    FALSE FALSE          0    0  
## 5 outdegree - activity (sqrt)   TRUE    FALSE FALSE          0    1  
## 6 rec.degree^(1/#) - activity   TRUE    FALSE FALSE          0    1

Behavior Dynamics

Now that we have specified the network model we can specify the behavior model.

First, we can test if friends effect each other in their alcohol consumption.

CoEvolutionModel <- includeEffects(CoEvolutionModel,
                        avSim, interaction1 = "friendship", name = "drinking")

##   effectName                  include fix   test  initialValue parm
## 1 drinking average similarity TRUE    FALSE FALSE          0   0

Here, instead of ‘avSim’ also could have used ‘totSim’, ‘avAlt’, or ‘totAlt’. All of these are influence effects, but each of them estimates a slightly different effect. Your theory should tell you which is the best choice for you.

We can also test other hypotheses. For instance, maybe students isolated in the network are more likely to increase their alcohol consumption.

CoEvolutionModel <- includeEffects(CoEvolutionModel,
    isolate, interaction1 = "friendship", name = "drinking")

##   effectName       include fix   test  initialValue parm
## 1 drinking isolate TRUE    FALSE FALSE          0   0

Let’s have a look at our full model:

print(CoEvolutionModel)

##    name       effectName                          include fix   test 
## 1  friendship constant friendship rate (period 1) TRUE    FALSE FALSE
## 2  friendship constant friendship rate (period 2) TRUE    FALSE FALSE
## 3  friendship outdegree (density)                 TRUE    FALSE FALSE
## 4  friendship reciprocity                         TRUE    FALSE FALSE
## 5  friendship number of actors at dist 2          TRUE    FALSE FALSE
## 6  friendship GWESP I -> K -> J (#)               TRUE    FALSE FALSE
## 7  friendship GWESP I <- K <- J (#)               TRUE    FALSE FALSE
## 8  friendship indegree - popularity (sqrt)        TRUE    FALSE FALSE
## 9  friendship outdegree - popularity (sqrt)       TRUE    FALSE FALSE
## 10 friendship outdegree - activity (sqrt)         TRUE    FALSE FALSE
## 11 friendship rec.degree^(1/#) - activity         TRUE    FALSE FALSE
## 12 friendship drinking alter                      TRUE    FALSE FALSE
## 13 friendship drinking ego                        TRUE    FALSE FALSE
## 14 friendship drinking similarity                 TRUE    FALSE FALSE
## 15 drinking   rate drinking (period 1)            TRUE    FALSE FALSE
## 16 drinking   rate drinking (period 2)            TRUE    FALSE FALSE
## 17 drinking   drinking linear shape               TRUE    FALSE FALSE
## 18 drinking   drinking quadratic shape            TRUE    FALSE FALSE
## 19 drinking   drinking average similarity         TRUE    FALSE FALSE
## 20 drinking   drinking isolate                    TRUE    FALSE FALSE
##    initialValue parm
## 1     4.69604    0  
## 2     4.32885    0  
## 3    -1.46770    0  
## 4     0.00000    0  
## 5     0.00000    0  
## 6     0.00000   69  
## 7     0.00000   69  
## 8     0.00000    0  
## 9     0.00000    0  
## 10    0.00000    1  
## 11    0.00000    1  
## 12    0.00000    0  
## 13    0.00000    0  
## 14    0.00000    0  
## 15    0.70571    0  
## 16    0.84939    0  
## 17    0.32237    0  
## 18    0.00000    0  
## 19    0.00000    0  
## 20    0.00000    0

(5) Model Estimation

Before we can start to estimate our model we have to specify some additional options for tuning the estimation algorithm:

CoEvolutionOptions <- sienaAlgorithmCreate(useStdInits = FALSE,
    projname = 'CoEvol-results', seed = 786840)

Now let us estimate.
siena07() has the arguments useCluster and nbrNodes. If your PC/Mac/Laptop has multiple cores (which nearly all have nowadays), you can use multiple ones for the estimation and simulation. The simulation phase is completely parallelizable. This means that in Phase 3 RSiena will use all the cores you give it, this can be 1, 2, 10… Phase 2, however, can only be parallelized over the number of periods. In our example this would be 2. Usually phase 2 is the most time consuming so it makes sense to use a number of cores equal to the number of waves, or if you have too many waves, it is advisable to pick a number of cores that is a factor of the number of periods. However, most importantly, never use all cores of your computer. Otherwise it might not react to your commands anymore until it is done with the estimation. To use multiple cores (e.g. 2) set useClsuter = TRUE and nbrNodes = 2.

CoEvolutionResults <- siena07(CoEvolutionOptions,
    data = CoEvolutionData,
    effects = CoEvolutionModel,
    batch = FALSE, verbose = FALSE,
    useCluster = TRUE, nbrNodes = 2,
    returnDeps = TRUE)

The model is converged. Let us extract the parameters and calcualte some approximate p-values:

parameter <- CoEvolutionResults$effects$effectName
estimate <- CoEvolutionResults$theta
st.error <- sqrt(diag(CoEvolutionResults$covtheta))
normal.variate <- estimate/st.error
p.value.2sided <- 2*pnorm(abs(normal.variate),lower.tail = FALSE)
(results.table <- data.frame(parameter,
    estimate = round(estimate,3),
    st.error = round(st.error,3),
    normal.variate = round(normal.variate,2),
    p.value = round(p.value.2sided,4)
))

In order to safeguard that these results are not artefacts of model misspecification, we continue with an investigation of goodness of fit.

(6) Goodness of fit

Network fit

Note: convergence is not the same as fit!

Now assess network goodness of fit on several dimensions:

Goodness of fit for indegree distribution:

gof.indegrees <- sienaGOF(CoEvolutionResults,IndegreeDistribution,
    verbose = TRUE,varName = "friendship",cumulative = FALSE)

## Detected 1000 iterations and 1 group.
## Calculating auxiliary statistics for periods 1 2 .
##   Period  1 
##   > Completed  100  calculations
  > Completed  200  calculations
  > Completed  300  calculations
  > Completed  400  calculations
  > Completed  500  calculations
  > Completed  600  calculations
  > Completed  700  calculations
  > Completed  800  calculations
  > Completed  900  calculations
  > Completed  1000  calculations
  > Completed  1000  calculations
##   Period  2 
##   > Completed  100  calculations
  > Completed  200  calculations
  > Completed  300  calculations
  > Completed  400  calculations
  > Completed  500  calculations
  > Completed  600  calculations
  > Completed  700  calculations
  > Completed  800  calculations
  > Completed  900  calculations
  > Completed  1000  calculations
  > Completed  1000  calculations

plot(gof.indegrees)

This fits well.

Goodness of fit for outdegree distribution:

gof.outdegrees <- sienaGOF(CoEvolutionResults,OutdegreeDistribution,
    verbose = TRUE,varName = "friendship",cumulative = FALSE)

## Detected 1000 iterations and 1 group.
## Calculating auxiliary statistics for periods 1 2 .
##   Period  1 
##   > Completed  100  calculations
  > Completed  200  calculations
  > Completed  300  calculations
  > Completed  400  calculations
  > Completed  500  calculations
  > Completed  600  calculations
  > Completed  700  calculations
  > Completed  800  calculations
  > Completed  900  calculations
  > Completed  1000  calculations
  > Completed  1000  calculations
##   Period  2 
##   > Completed  100  calculations
  > Completed  200  calculations
  > Completed  300  calculations
  > Completed  400  calculations
  > Completed  500  calculations
  > Completed  600  calculations
  > Completed  700  calculations
  > Completed  800  calculations
  > Completed  900  calculations
  > Completed  1000  calculations
  > Completed  1000  calculations

plot(gof.outdegrees)

This, too, fits well.

Goodness of fit for triad census:

gof.triads <- sienaGOF(CoEvolutionResults,TriadCensus,
    verbose = TRUE,varName = "friendship")

## Detected 1000 iterations and 1 group.
## Calculating auxiliary statistics for periods 1 2 .
##   Period  1 
##   > Completed  100  calculations
  > Completed  200  calculations
  > Completed  300  calculations
  > Completed  400  calculations
  > Completed  500  calculations
  > Completed  600  calculations
  > Completed  700  calculations
  > Completed  800  calculations
  > Completed  900  calculations
  > Completed  1000  calculations
  > Completed  1000  calculations
##   Period  2 
##   > Completed  100  calculations
  > Completed  200  calculations
  > Completed  300  calculations
  > Completed  400  calculations
  > Completed  500  calculations
  > Completed  600  calculations
  > Completed  700  calculations
  > Completed  800  calculations
  > Completed  900  calculations
  > Completed  1000  calculations
  > Completed  1000  calculations

plot(gof.triads,center = TRUE,scale = TRUE)

This fits well!

Again, looks good!
Goodness of fit for geodesic distances:

Goodness of fit for geodesic distances:
We first define a auxiliary fit function to calculate geodesics making use of the packages ‘network’ and ‘sna’:

GeodesicDistribution <- function(i, data, sims, period, groupName,
    varName, levls = c(1:5,Inf), cumulative= TRUE, ...) {
    x <- networkExtraction(i, data, sims, period, groupName, varName)
    require(sna)
    a <- sna::geodist(symmetrize(x))$gdist
    if (cumulative)
    {
        gdi <- sapply(levls, function(i){ sum(a <= i) })
    }
    else
    {
        gdi <- sapply(levls, function(i){ sum(a == i) })
    }
    names(gdi) <- as.character(levls)
    return(gdi)
}

gof1.geodesic <- sienaGOF(CoEvolutionResults,GeodesicDistribution,
    verbose = TRUE,varName = "friendship",cumulative = FALSE)

## Detected 1000 iterations and 1 group.
## Calculating auxiliary statistics for periods 1 2 .
##   Period  1 
##   > Completed  100  calculations
  > Completed  200  calculations
  > Completed  300  calculations
  > Completed  400  calculations
  > Completed  500  calculations
  > Completed  600  calculations
  > Completed  700  calculations
  > Completed  800  calculations
  > Completed  900  calculations
  > Completed  1000  calculations
  > Completed  1000  calculations
##   Period  2 
##   > Completed  100  calculations
  > Completed  200  calculations
  > Completed  300  calculations
  > Completed  400  calculations
  > Completed  500  calculations
  > Completed  600  calculations
  > Completed  700  calculations
  > Completed  800  calculations
  > Completed  900  calculations
  > Completed  1000  calculations
  > Completed  1000  calculations

plot(gof1.geodesic)

Not great again…

Behavior fit

Goodness of fit of the overall behaviour distribution:

gof.behaviour <- sienaGOF(CoEvolutionResults,BehaviorDistribution,
    verbose = TRUE,join = TRUE,varName = "drinking",
    cumulative = FALSE)

## Detected 1000 iterations and 1 group.
## Calculating auxiliary statistics for periods 1 2 .
##   Period  1 
##   > Completed  100  calculations
  > Completed  200  calculations
  > Completed  300  calculations
  > Completed  400  calculations
  > Completed  500  calculations
  > Completed  600  calculations
  > Completed  700  calculations
  > Completed  800  calculations
  > Completed  900  calculations
  > Completed  1000  calculations
  > Completed  1000  calculations
##   Period  2 
##   > Completed  100  calculations
  > Completed  200  calculations
  > Completed  300  calculations
  > Completed  400  calculations
  > Completed  500  calculations
  > Completed  600  calculations
  > Completed  700  calculations
  > Completed  800  calculations
  > Completed  900  calculations
  > Completed  1000  calculations
  > Completed  1000  calculations

plot(gof.behaviour) # fits very well+