- Top page
- Analyses on Ecological Assemblages
- Cluster and Indicator Species
Analyses on Ecological Assemblages
Cluster and Indicator Species
Here, we learn methods to construct clusters of assemblages according to their similarity and to recognize thier indicator species.
Dissimilarity of assemblges and hierarchical cluster
Load packages by require(vegan)
and require(cluster)
.
Read a assemblage data matrix from a csv file. An argument header=TRUE
sets species names as column names, and row.names=1
sets site names as row names.
dat <- read.csv("spcdat.csv", header = TRUE, row.names=1)
To evaluate differencies between assemblages, a distance matrix is calculated by a function vegdist()
in the package vegan. Morisita-Horn dissimilarity can be obtained by an argument method="horn"
. If you would like to get the Bray-Curtis dissimilarity, set method="bray"
.
dis.mh <- vegdist(dat, method="horn")
dis.bc <- vegdist(dat, method="bray")
Using the distance matrix, a function hclust()
in the pacage cluster produces a hierarchical cluster. There are several methods to make clusters. The following example show one of the popular methods: the Ward method method="ward.D2"
.
resmh.ward <- hclust(dis.mh, method="ward.D2")
plot(resmh.ward)
If we would like to use the group-average method, another popular method, set method="average"
. In the example above, the site Ako1 is an outlier but this is included in a group by a superior grouping power of the Ward method.
After inspecting this dendrogram, we may think that the whole sits should be grouped into three groups. Then, use a function cutree()
and set an argument k=3
.
resmh.ward.g <- cutree(resmh.ward, k=3)
resmh.ward.g
## Ako1 Ako2 Ako3 Nisio1 Nisio2 Nisio3 Nakatu1 Nakatu2 Nakatu3
## 1 2 2 1 1 2 3 1 1
## Banzu1 Banzu2 Banzu3 Uto1 Uto2 Uto3
## 2 2 2 3 3 1
We can decide the number of groups more objctivly by using a silhouette value.
asil <- numeric(nrow(dat))
for (i in 2:(nrow(dat)-1)){
sil <- silhouette(cutree(resmh.ward, k=i), dis.mh)
asil[i] <- mean(sil[,3])
}
asil
## [1] 0.00000000 0.27436092 0.37874725 0.41602592 0.43325536 0.41073129
## [7] 0.34779031 0.32437756 0.28972742 0.23109473 0.24977641 0.22718512
## [13] 0.09599773 0.06826518 0.00000000
Larger silhouette value indicates better grouping numbers. In this example, five groups gives the maximum silhouette 0.4332554. Let's see the 5 grouping and its silhouette plot.
resmh.ward.g <- cutree(resmh.ward, k=which.max(asil))
resmh.ward.g
## Ako1 Ako2 Ako3 Nisio1 Nisio2 Nisio3 Nakatu1 Nakatu2 Nakatu3
## 1 2 2 3 3 2 4 5 5
## Banzu1 Banzu2 Banzu3 Uto1 Uto2 Uto3
## 2 2 2 4 4 3
sil <- silhouette(resmh.ward.g, dis.mh)
rownames(sil) <- row.names(dat)
plot(sil)
Non-hierarchical cluster: K-medoids
Depending on characteristics of data and/or purpose of study, non-hierarchical clustur is a better option to group assemblages.
K-means method is one of the best known methods for hierarchical clusters, that has an assumption of Euclidean distance. But most of the Ecologically useful dissimilarities, including Bray-Curtis and Morisita-Horn, do not fit the Euclidean distance. The following example shows K-medoids method, by using a function pam()
, that is known to be more robust to violation of the Euclidean distance.
asilmk <- numeric(nrow(dat))
for (i in 2:(nrow(dat)-1)){
asilmk[i] <- pam(dis.mh, k=i, diss=TRUE)$silinfo$avg.width
}
asilmk
## [1] 0.00000000 0.21662533 0.32562256 0.36650728 0.42029671 0.38719369
## [7] 0.38006387 0.34541372 0.28972742 0.23109473 0.21786278 0.08558420
## [13] 0.09599773 0.06826518 0.00000000
Again, five groups gives the maximum silhouette 0.4202967. Let's see the 5 grouping and its silhouette plot.
resmh.pam <- pam(dis.mh, k=which.max(asilmk), diss=TRUE)
summary(resmh.pam)
## Medoids:
## ID
## [1,] "1" "Ako1"
## [2,] "10" "Banzu1"
## [3,] "15" "Uto3"
## [4,] "9" "Nakatu3"
## [5,] "13" "Uto1"
## Clustering vector:
## Ako1 Ako2 Ako3 Nisio1 Nisio2 Nisio3 Nakatu1 Nakatu2 Nakatu3
## 1 2 2 3 4 2 5 4 4
## Banzu1 Banzu2 Banzu3 Uto1 Uto2 Uto3
## 2 2 2 5 5 3
## Objective function:
## build swap
## 0.2251717 0.2217902
##
## Numerical information per cluster:
## size max_diss av_diss diameter separation
## [1,] 1 0.0000000 0.00000000 0.0000000 0.7825552
## [2,] 6 0.4932491 0.21581136 0.7984151 0.5436638
## [3,] 2 0.6921108 0.34605540 0.6921108 0.6004724
## [4,] 3 0.5783309 0.36657508 0.9096986 0.5536461
## [5,] 3 0.1268453 0.08004966 0.1640254 0.5436638
##
## Isolated clusters:
## L-clusters: character(0)
## L*-clusters: [1] 5
##
## Silhouette plot information:
## cluster neighbor sil_width
## Ako1 1 5 0.00000000
## Banzu1 2 5 0.67233690
## Banzu3 2 5 0.62128995
## Ako2 2 5 0.56283193
## Banzu2 2 5 0.52444944
## Nisio3 2 4 0.39476346
## Ako3 2 3 0.35069105
## Nisio1 3 4 0.14466868
## Uto3 3 5 0.07093218
## Nakatu3 4 3 0.33196280
## Nakatu2 4 5 0.20581221
## Nisio2 4 3 -0.04255231
## Uto1 5 3 0.86179224
## Uto2 5 1 0.81415322
## Nakatu1 5 2 0.79131885
## Average silhouette width per cluster:
## [1] 0.0000000 0.5210605 0.1078004 0.1650742 0.8224214
## Average silhouette width of total data set:
## [1] 0.4202967
##
## Available components:
## [1] "medoids" "id.med" "clustering" "objective" "isolation"
## [6] "clusinfo" "silinfo" "diss" "call"
plot(resmh.pam)
Indicator species of clusters: IndVal method
According to the clusters of the assemblages, we would like to know characteristic species of each cluster. We can select indicator species by IndVal method.
Here, we use the result of K-medoids clusters. First, load labdsv package require(labdsv)
to use a function indval()
.
iva <- indval(dat, resmh.pam$clustering, numitr=5000)
gr <- iva$maxcls[iva$pval<=0.05]
iv <- iva$indcls[iva$pval<=0.05]
pv <- iva$pval[iva$pval<=0.05]
fr <- apply(dat>0,2,sum)[iva$pval<=0.05]
fdig <- data.frame(group=gr, indval=iv, pvalue=pv, freq=fr)
(fdig <- fdig[order(fdig$group, -fdig$indval),])
## group indval pvalue freq
## Musculista_senhousia 1 0.9926342 0.0292 9
## Nectoneanthes_latipoda 1 0.9795918 0.0360 2
## Aonides_oxycephala 1 0.9354839 0.0370 6
## Spio_sp 3 0.8571429 0.0158 5
## Synidotea_sp 4 1.0000000 0.0040 3
## Mactra_veneriformis 5 0.8463950 0.0026 9
Now, we have a list of indicator species that shows significantly (p<0.05) large value of indval. Group 2 does not have any indicator species.
References
- Borcard D, Gille F, Legendre P (2011) Numerical Ecology with R. Springer
- Legendre P, Legendre L (2012) Numerical Ecology. Elsevier