Influence of Coral Cover on Indonesian Coral Reefs

Abstract

Coral reefs are among the most important ecosystems on Earth, and the condition of their hard corals is central to reef health. This analysis is motivated by a desire to understand what influences the health of coral reefs so that future conservation efforts can be better informed. In this study, data collected from three regions of Indonesia were used to test what influences hard coral cover on coral reefs. The analysis used two categorical and two numeric variables. A one-way ANOVA and an ANCOVA were used to identify the factors that are significantly associated with coral cover. The results showed that coral cover differed significantly among the regions (at least one regional mean differed from the others), and that the dominant trophic level at a site and human population density are predictors of hard coral cover.

Introduction

This dataset was gathered by researchers at the University of Rhode Island and Bogor Agricultural University to determine whether fishing and habitat differences among coral reefs affect the size spectra slopes of the fish living there. The documentation accompanying the data describes what each variable represents, but it does not explain how each variable was collected. All of the data were taken from sites within three regions of Indonesia that span a range of fishing pressures and habitat conditions. Rather than analyzing the size spectra slopes of fish, this report uses the data to examine what may affect the percent cover of hard coral on a reef.

Coral reef ecosystems are among the most biodiverse on Earth while occupying less than 0.01% of the Earth’s surface. They are also highly vulnerable, because hard corals are sensitive to multiple stressors that can cause coral bleaching and, if the corals do not recover, death. Coral reefs provide many benefits, such as medicines and fisheries. With climate change, hard corals on reefs are bleaching at faster rates than ever, because multiple stressors together reduce the resilience (the ability to recover) of the hard corals. Hard coral cover influences the health of the reef as a whole: a shift to an algae-dominated landscape is a regime shift, and it is extremely difficult for the reef to shift back to a coral-dominated landscape. For these reasons, studies of coral reefs are urgent, and as conservation efforts are implemented, it is important to understand which factors can cause hard coral cover to decrease in the first place.

Several variables were collected in this dataset. For this report, region, the dominant trophic level of fish at each site (carnivore vs. herbivore), and a human population density metric for each site were used to determine which factors are significantly associated with hard coral cover on these reefs in Indonesia. I hypothesized that at least two of the regions would differ significantly in mean hard coral cover. Additionally, because human activity imposes anthropogenic stressors, I expected human population density and the dominant trophic level of fish at a site to be related to hard coral cover as well.

Exploratory Data Analysis

This report focuses on the factors that influence the percent cover of hard corals on coral reefs. The percent cover data were slightly right skewed. After trying logarithmic and square root transformations, the square-root-transformed data were the closest to normal, with a visually normal histogram (Figure 1) and a Shapiro-Wilk test p-value of 0.7999, which means that the null hypothesis that the data are normally distributed is not rejected. The other numeric variable, the human population density metric, was not normally distributed and had a heavy right skew (Figure 2). Logarithmic, square root, and exponential transformations were applied to this variable, but its distribution remained right skewed.
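As a rough illustration of how candidate transformations can be compared, the following Python sketch computes the moment skewness of a right-skewed variable before and after square root and logarithmic transformations; a value closer to zero indicates a more symmetric distribution. The values are made up for illustration and are not the study data, the helper name skewness is hypothetical, and the sketch does not reproduce the Shapiro-Wilk test used in this report (the references suggest the analysis itself was run in R).

```python
import math

def skewness(values):
    """Moment skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical percent-cover values (illustrative only, not the study data).
cover = [4, 9, 12, 16, 18, 22, 25, 27, 30, 36, 41, 49, 64, 81]

skew_by_transform = {
    "raw": skewness(cover),
    "sqrt": skewness([math.sqrt(v) for v in cover]),
    "log": skewness([math.log(v) for v in cover]),
}

for name, value in skew_by_transform.items():
    print(f"{name:>4}: skewness = {value:+.3f}")
```

A transformation that brings the skewness nearer to zero is a reasonable candidate, but a formal normality check (such as Shapiro-Wilk) and a histogram are still needed before relying on it.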

Figure 1. Histogram of the square root of hard coral cover. The data are consistent with a normal distribution (Shapiro-Wilk test p-value of 0.7999).

Figure 2. Histogram of the human population density metric. The data are heavily right skewed, and none of the transformations tried made them normally distributed.

The data in this dataset were collected from fifty-seven sites in Indonesia, which fall into three regions. Treating each region as a separate sample and grouping the square root of percent hard coral cover by region shows that the range of cover varies from region to region. In Lombok, the mean of the square root of percent coral cover appears to be lower than in the other two regions. Raja Ampat and Wakatobi have similar means, but Wakatobi has a wider range of values than Raja Ampat. Lombok also appears to have a single outlier above the rest of its data.

Figure 3. Boxplot of the square root of hard coral cover per region

The human population density metric was also collected per site. Figure 4 shows the human population density metric per region, and the pattern appears to be the opposite of that for the square root of percent coral cover. Sites within Lombok tend to have a higher human population density, while the other two regions, Raja Ampat and Wakatobi, have low values. Visually, Figure 3 and Figure 4 appear to be inverses of each other. For Lombok, there appear to be two outliers, one above and one below the rest of the data.

Figure 4. Boxplot of the human population density metric per region

Many fish were sampled at each site, and the trophic level of each fish was recorded in the dataset. Hundreds of fish were sampled per site, which yielded approximately 18,000 data points for trophic level (carnivore or herbivore), while the other variables, such as percent coral cover and the human population density metric, had only fifty-seven data points. To reconcile this discrepancy, the “COUNTIF” function in Microsoft Excel was used to count which type of fish (carnivore or herbivore) was more numerous at each site, and the dominant type was added to the dataset as one value per site. To visualize the number of carnivore-dominated and herbivore-dominated sites per region, a grouped bar graph was created (Figure 5).
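The per-site reduction described above was done with COUNTIF in Excel. The same idea can be sketched in Python as follows. The site names and records are hypothetical, the function name dominant_trophic_group is an assumption made for this illustration, and the treatment of ties is also an assumption, since the report does not say how ties were handled.

```python
from collections import Counter, defaultdict

# Hypothetical fish records as (site, trophic group) pairs; illustrative only.
fish_records = [
    ("site_A", "carnivore"), ("site_A", "herbivore"), ("site_A", "carnivore"),
    ("site_B", "herbivore"), ("site_B", "herbivore"), ("site_B", "carnivore"),
    ("site_C", "carnivore"),
]

def dominant_trophic_group(records):
    """Return {site: most frequent trophic group} from per-fish records."""
    counts = defaultdict(Counter)
    for site, group in records:
        counts[site][group] += 1
    # most_common(1) gives the (group, count) pair with the highest count.
    # Assumption: a tie resolves to whichever group was seen first at the site.
    return {site: c.most_common(1)[0][0] for site, c in counts.items()}

dominant = dominant_trophic_group(fish_records)
print(dominant)
```

This collapses thousands of per-fish rows into one categorical value per site, which matches the granularity of the coral cover and population density variables.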

Figure 5. Bar plot showing the number of sites within each region that have a dominant herbivore or carnivore population

Statistical Methods

ANOVA

An ANOVA (“analysis of variance”) tests whether the variation between the means of the treatment groups (in this case the “treatment” is region) is large relative to the variation within the groups. It is used to compare the means of three or more samples and to determine whether at least one mean is significantly different from the others. A one-way ANOVA was used to determine whether the mean of the square root of hard coral cover differed significantly among the three regions in Indonesia from which the data were collected. Running this test reveals whether coral cover differs significantly between areas. A significant result would indicate that coral cover differs regionally, and it would motivate further exploration of which specific factors and qualities of certain regions are related to these differences.

The ANOVA yields a p-value that is compared to an alpha level of 0.05. If the p-value is less than 0.05, the null hypothesis is rejected, while a p-value greater than 0.05 means that we fail to reject the null hypothesis. The null hypothesis states that the means of all of the samples are equal, while the alternative hypothesis states that at least one of the means differs from the others. In this case, a p-value of less than 0.05 would mean that at least one of the regions has a significantly different mean square root of percent hard coral cover than the other regions. A p-value over 0.05 would mean that there is not sufficient evidence that the mean square root of percent hard coral cover differs among the three regions. The ANOVA can reveal whether at least one sample mean is different, but it does not show which specific samples differ from one another.
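To make the comparison of between-group and within-group variation concrete, here is a minimal Python sketch of the one-way ANOVA F statistic. The regional values are invented for illustration and are not the study data, and the function name one_way_anova_f is hypothetical. The sketch stops at the F statistic; converting it to a p-value requires the F distribution, which statistical software provides (for example, aov in R or scipy.stats.f_oneway in SciPy).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical square-root cover values per region (not the study data).
regions = {
    "Lombok": [3.1, 3.6, 4.0, 3.4, 4.4],
    "Raja Ampat": [5.2, 5.8, 5.5, 6.1, 5.4],
    "Wakatobi": [4.6, 6.3, 5.1, 6.8, 5.7],
}

f_stat, df_b, df_w = one_way_anova_f(list(regions.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means are spread out relative to the scatter within groups, which is what leads to a small p-value.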

To conduct an ANOVA, certain assumptions must be met. First, the data must be a random sample. Although it is not explicitly stated whether this dataset was collected randomly, it is assumed that it was, because the dataset was previously used for other statistical tests that require random sampling. Additionally, the residuals must be normally distributed, and the variances must be equal among all the samples being compared. Variance is a measure of how spread out the data are, while the residuals are the differences between the observed values and the values fitted by the model (here, the group means).

To test the last two assumptions, a Q-Q plot and a residuals versus fitted plot were used. In the Q-Q plot, if the data points follow the line tightly, the residuals are normally distributed. In the residuals versus fitted plot, a relatively straight red trendline with points spread roughly evenly around it, without any obvious “cone shapes” or patterns, indicates homogeneity of variance among the samples. In Figure 6, the red trendline of the residuals versus fitted plot is relatively straight, and the points are evenly distributed above and below the line. The Q-Q plot in this figure also shows that the residuals are normally distributed.

ANCOVA

An ANCOVA (“analysis of covariance”) is a form of multiple regression. In a simple linear regression, a model is built in which one explanatory variable predicts the value of a response variable. A multiple regression also produces a prediction model, but it uses several explanatory variables instead of one. Because one of the explanatory variables in this dataset is categorical, an ANCOVA is used, which accommodates both categorical and continuous numeric explanatory variables. Using the dominant trophic level of fish at a site and the human population density metric as explanatory variables, the ANCOVA reveals whether these factors can help predict the square root of percent cover of hard coral on a coral reef. The results can help reveal which characteristics of a region explain differences in the square root of percent coral cover, and the model can also be used to predict the square root of percent coral cover in other regions.

ANCOVA also has assumptions that must be met. First, the data need to be randomly sampled, and the residuals must be normally distributed and have equal variance. As stated for the ANOVA, it is assumed that the data were randomly sampled. The residuals versus fitted plot was again used to check the equality of variance, and the Q-Q plot was used to check the normality of the residuals. In Figure 7, the residuals versus fitted plot and the Q-Q plot show that the homogeneity of variance and normality of residuals assumptions are met. It is also important to test for collinearity among the explanatory variables, which means that there is a strong relationship between them. If collinearity occurs, one of the variables needs to be removed from the model. In appendix 4, it is evident that there is no collinearity between the explanatory variables. If there is an interaction between the categorical and continuous explanatory variables, it needs to be accounted for in the model as well. To determine whether there is an interaction, a plot is created that includes all of the variables; if the fitted lines for the groups cross rather than run parallel, there is an interaction.
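The visual interaction check described above amounts to comparing the slope of the response against the continuous predictor within each category. A minimal Python sketch under that reading follows; the site records are invented for illustration, and the helper name ols_slope_intercept is hypothetical. Clearly different slopes are only a descriptive hint; the formal check is whether an interaction term in the fitted model is significant.

```python
from collections import defaultdict

def ols_slope_intercept(x, y):
    """Least-squares slope and intercept for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical site records: (density metric, dominant group, sqrt cover).
sites = [
    (10, "herbivore", 6.0), (50, "herbivore", 5.6),
    (100, "herbivore", 5.1), (200, "herbivore", 4.1),
    (20, "carnivore", 5.0), (60, "carnivore", 5.1),
    (120, "carnivore", 5.2), (180, "carnivore", 5.3),
]

# Split the records by dominant trophic group.
by_group = defaultdict(lambda: ([], []))
for density, group, sqrt_cover in sites:
    by_group[group][0].append(density)
    by_group[group][1].append(sqrt_cover)

# One slope per group; non-parallel lines suggest a possible interaction.
slopes = {g: ols_slope_intercept(x, y)[0] for g, (x, y) in by_group.items()}
for group, slope in slopes.items():
    print(f"{group}: slope = {slope:+.4f}")
```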

Three regression models were created to test which one fit best. All of the models used the square root of percent hard coral cover as the response variable, but their explanatory variables were: the human population density metric alone; the population density metric and the dominant trophic level; and both of the previous variables plus percent cover of algae (these three models were labeled fit1, fit2, and fit3 respectively). To determine which model was the best fit, Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were calculated for each. A lower score indicates a better balance between goodness of fit and model complexity. Additionally, the R² values were compared; the R² value describes the proportion of the variance in the response that the model explains.
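For reference, AIC and BIC for a linear model with normally distributed errors can be computed from the residual sum of squares. The Python sketch below uses the common simplified forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n), which omit an additive constant, so the values will differ from those reported by statistical software even though the ranking of models fit to the same data is unchanged. The RSS values and parameter counts are invented for illustration; only n = 57 comes from this report. The function and model names are hypothetical.

```python
import math

def aic_bic(rss, n, k):
    """AIC and BIC for a Gaussian linear model, up to an additive constant.

    rss: residual sum of squares
    n:   number of observations
    k:   number of estimated coefficients, including the intercept
    """
    fit_term = n * math.log(rss / n)
    return fit_term + 2 * k, fit_term + k * math.log(n)

n_sites = 57  # number of sites in the report

# Hypothetical (RSS, k) pairs for three candidate models; not the real fits.
candidates = {
    "model_a": (40.0, 2),
    "model_b": (36.0, 3),
    "model_c": (35.5, 4),
}

scores = {name: aic_bic(rss, n_sites, k) for name, (rss, k) in candidates.items()}
for name, (aic, bic) in scores.items():
    print(f"{name}: AIC = {aic:.2f}, BIC = {bic:.2f}")

best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print("lowest AIC:", best_by_aic, "| lowest BIC:", best_by_bic)
```

Both criteria reward a smaller RSS and penalize extra coefficients; BIC penalizes them more heavily once n exceeds about 7, because ln(n) is then greater than 2.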

The R² value and the p-value produced by the chosen model are the quantities of interest. The R² value describes how well the model explains the variance of the response. A p-value below the alpha level of 0.05 means that the null hypothesis that the slope coefficients of the regression are zero is rejected, meaning that there is a relationship and the model has predictive value. If the p-value is greater than 0.05, we fail to reject the null hypothesis. In this case, the null hypothesis is that the human population density metric and the dominant trophic level at a site cannot predict the square root of percent hard coral cover on a coral reef in Indonesia, and the alternative hypothesis is that they can.
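As a small illustration of what the R² value measures, the following Python sketch computes R² as one minus the ratio of the residual sum of squares to the total sum of squares, along with the adjusted R² that penalizes additional predictors. The observed and predicted values are invented for illustration, and the function names are hypothetical.

```python
def r_squared(observed, predicted):
    """Proportion of variance in the observed values explained by the model."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (no intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical observed and model-predicted values (not the study data).
observed = [3.2, 4.1, 5.0, 5.6, 6.3, 4.8]
predicted = [3.6, 4.0, 4.7, 5.4, 5.9, 5.2]

r2 = r_squared(observed, predicted)
adj_r2 = adjusted_r_squared(r2, n=len(observed), p=2)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

Because plain R² cannot decrease when a predictor is added, the adjusted value is the fairer one to use when comparing models with different numbers of predictors.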

Results

ANOVA

The one-way ANOVA resulted in a p-value of 0.00146. Because this p-value is lower than the alpha level of 0.05, the null hypothesis that the mean square root of percent hard coral cover is equal across the regions in Indonesia is rejected. This means that the mean square root of percent hard coral cover of at least one region is significantly different from that of the other regions. However, determining how many regions, and which ones, differ significantly would require a post-hoc analysis such as the Tukey-Kramer test. Because this test was used only to determine whether there were any differences between the regions, identifying which regions differ was not necessary here. The distribution of the square root of percent hard coral cover per region can be seen in Figure 3.

Figure 6. Analysis of Assumptions for ANOVA. This figure shows the residuals vs. fitted plot that is used to analyze the homogeneity of variance as well as the Q-Q plot that is used to determine the normality of residuals.

ANCOVA

The AIC and BIC values for each linear model were tabulated along with the R² and adjusted R² values. Lower AIC and BIC scores usually indicate a better model. In this case, fit2, which has trophic dominance and human population density as predictors, had slightly higher AIC and BIC scores, but its R² value was higher, which means that it explains a larger share of the variance of the response variable. For these reasons, fit2 was chosen as the linear model. Figure 9 shows that there is an interaction between the dominant trophic level and the human population density metric because the fitted lines cross, so it is important to account for this interaction when conducting the ANCOVA.

The p-value yielded was 8.049e-05, which is smaller than the alpha level of 0.05, so the null hypothesis is rejected. The null hypothesis states that the slope coefficients are equal to 0, which would mean that the dominant trophic level of fish and the human population density metric do not predict the square root of hard coral cover in Indonesia. Because this null hypothesis is rejected, the model has statistically significant predictive value for the square root of coral cover on a reef in Indonesia. The R² value of this model is 0.3369, which means that approximately 34% of the variance in the response is explained by this model. Figure 10 displays the predicted and actual data together.

Figure 7. Analysis of Assumptions for ANCOVA. This figure shows the residuals vs. fitted plot that is used to analyze the homogeneity of variance as well as the Q-Q plot that is used to determine the normality of residuals.

Figure 8. This table displays the AIC and BIC values for each model as well as the R² values.

Figure 9. This graph displays dominant trophic level and human population density metric as predictors of the square root of the percent cover of hard coral. This shows that there is interaction between the variables.

Figure 10. This plot shows how well the data fit the model. The pink points are the predicted data while the black points are the actual data.

Discussion

Coral cover on coral reefs is vital for the health of the ecosystem. With a changing climate and changing human activity, it is important to understand the basics of hard corals on coral reefs and which factors may influence their cover in order to understand the loss of hard coral. This study provided only surface-level ideas of how coral cover can be influenced. With more time and resources, it would have been beneficial to delve deeper into other factors that may influence hard coral cover on coral reefs and how these factors directly affect individual corals.

The results of the ANOVA suggest that the square root of percent hard coral cover differs among regions in Indonesia. Because the square root is a monotonically increasing transformation, a higher square root of percent coral cover also means a higher untransformed percent coral cover. It is plausible, though not tested here, that coral reefs in different oceans and countries also differ from one another in hard coral cover. Additionally, the results of the ANCOVA suggest that the trophic level of the fish that dominate an area, as well as its interaction with human population density, can help predict the square root of coral cover in Indonesia. All of the data were taken from Indonesia, so the results are specific to these regions. Even so, the results can help scientists identify patterns to investigate elsewhere and conduct further studies in other coral reef systems. Such studies could help predict what will happen to coral cover if human population density increases or decreases in an area, or if invasive herbivores or carnivores are introduced to a coral reef and alter its species composition.

Although the regression model was able to predict the percent of hard coral cover on reefs in Indonesia to a statistically significant degree, the R² value was still relatively low at approximately 0.34. This may be because hard corals have many known stressors, and only two were used in this model. Temperature and the prevalence of coral diseases are just two of the many other factors that may affect coral cover, and adding factors such as these may increase the model’s ability to predict coral cover.

When the human population density metric is analyzed on its own as a predictor of hard coral cover, the slope is -0.0065 with a p-value of 0.00925 (see appendix 4), suggesting that denser human populations are significantly associated with lower coral cover. With this knowledge, future studies can investigate which aspects of human activity may be linked to the degradation of corals, which could inform future conservation efforts and help educate people about their effects on the coral reef ecosystem.

The tests in this study addressed only the percent of hard coral cover on reefs, not coral health. The health of hard corals is usually correlated with total percent coral cover, but this relationship was not analyzed here. To understand these coral reef systems better, and to understand why coral cover changes with these factors, it is important to study the direct effects that these factors have on individual coral organisms so that conservation efforts can be tailored to specific problems.

Overall, studies like this one can help us better understand what influences the health of coral reefs and which conservation efforts will have the greatest impact.

References

Carvalho, P., et al. (2021). Fishing and habitat condition differentially affect size spectra slopes of coral reef fishes. Dryad, Dataset. https://doi.org/10.7291/D1DM42

Fox, J. and Weisberg, S. (2019). An R Companion to Applied Regression, Third Edition. Thousand Oaks CA: Sage. https://socialsciences.mcmaster.ca/jfox/Books/Companion/

R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Xie, Y. (2014). knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595

Hope Hahn