Delta and Omicron are the most recent variants so far during the COVID-19 pandemic. Knowing the difference in the death rate between Delta and Omicron can help us to develop more knowledge for the COVID-19 virus. This study targets the population worldwide. We compare the death rate of Delta and Omicron with vaccination rate by each country. After the Box-Cox transformation of the variable death rate, the assumptions for missing value, outliers, normality, independent and constant variance are all met. The two-way ANOVA is the model that fits best for the study. The result we found is that Delta is more deadly than Omicron. There is no effect of variants that depend on vaccination rate/level.
The COVID-19 virus has been with us for more than two years. During these two years, the virus has developed into a few different variants which are Alpha, Beta, Gamma, Delta and Omicron. The last two recent variants Delta and Omicron were considered as Variants of concern that were named by monitoring organizations, such as the CDC. In addition, once Omicron arrived, getting infected became very common. Even people who were wearing masks and were fully vaccinated still got infected from the virus. According to “Is Omicron really mild?” from Michigan Health, the infections are mild in most people. Because people recover more easily, getting infected is not the primary concern. The primary concern is the death rate. Therefore, in this project, one of the questions of interest is if there is any difference in death rate between Delta and Omicron. If so, which variant has a higher death rate? Another question of interest is if the death rate of Delta and Omicron depend on vaccination rate/level.
This project targets the population all over the world. There are three data sets that will be used on this project.
One set of data is called “WHO COVID-19 global table data” from WHO. This data set will be used to find the death rate for Delta and Omicron. The Delta variant was first found in India in October 2020 before vaccination began. However, it was not designated as a VOC(Variant of Concern) until May 11, 2021. Omicron was first found in South Africa on 11/24/2021. Delta and Omicron are the only two VOC found through out the COVID-19 pandemic. The article “Omicron Is Driving Delta Into the Ground” from The Atlantic stated at late November 2021 that Omicron over took Delta. In addition, the numbers of new cases became lower on 11/24/2021. Therefore, Omicron became the main variant over Delta on 11/24/2021. To generate the death rate data for Delta and Omicron, the data collected from 5/11/2021 to 11/23/2021 was assigned to Delta. The data collected on and after 11/24/2021 was assigned to Omicron.
The other set of data is a vaccination data also from WHO. This data includes the total number of people who were one-plus dose vaccinated and the total number of people who were fully vaccinated. It can be used to show the relationship between one-plus dose vaccinated and fully vaccinated.
Since this vaccination data from the WHO only provides a total number that doesn’t work for model building in this study, I used another vaccination data from Our World In Data. This data contains vaccination information by date. According to “How long does it take for the COVID-19 vaccine to work?” from The Ohio State University, it takes two weeks for the vaccine to work. Therefore, the vaccine rate on 11/9/2021 was assigned to Delta, and the vaccine rate till the latest date in the data set was assigned to Omicron.
Based on my questions of interest, a new data set needed to be created. A list of variables are listed as follow.
Relationship between vaccination
The correlation between One-plus dose and fully vaccinated is 0.9821619, which is close to 1. From the plot we can also see they are very highly correlated to each other. Therefore, in this project, it is not necessary to use both variables to represent vaccination. It is enough to just use one of these two variables in the model.
Missing values
## death_rate variant v_level
## Min. :0.000000 Delta :215 0-20% : 85
## 1st Qu.:0.002684 Omicron:219 21-40%: 63
## Median :0.006916 41-60%: 87
## Mean :0.012295 61-80%:133
## 3rd Qu.:0.016633 80%+ : 34
## Max. :0.191942 NA's : 32
The variable vaccination level tell us that there are 32 missing values. Before merging the death rate from the “WHO COVID-19 global table data” with vaccination level from the vaccination data, some countries in the vaccination data are in the “WHO COVID-19 global table data”. Some countries in the “WHO COVID-19 global table data” are not in the vaccination data. This is the reason why there are 32 missing values. These missing values need to be removed.
Summary table of the data
## death_rate variant v_level
## Min. :0.000000 Delta :198 0-20% : 85
## 1st Qu.:0.002791 Omicron:204 21-40%: 63
## Median :0.007327 41-60%: 87
## Mean :0.012797 61-80%:133
## 3rd Qu.:0.016892 80%+ : 34
## Max. :0.191942
After removing the missing values, we can see there is at least one country that has a 0% death rate. The mean death rate is 1.28%, the median is 0.73%, and the maximum death rate is 19.19%. The numbers of observations for Delta is 198, and for Omicron is 204. The numbers of observations for vaccination level 0-20% is 85, for 21-40% is 63, for 41-60% is 87, for 61-80% is 133, and for 80% and up is 34.
##
## 0-20% 21-40% 41-60% 61-80% 80%+
## Delta 42 31 43 65 17
## Omicron 43 32 44 68 17
This table tells us that the number of each cell is not equal. The model will be an unbalanced two-way ANOVA model.
Visualizations of the data
Histogram of the Response Variable
The histogram of the variable death rate shows the data is right skewed. In order to have a normal distributed data, a Box-Cox transformation is needed. Meanwhile, we see there is a death rate around 0.2% which seems like an outlier. We can take a close look of this country.
The country Yemen (country code: YEM) has the highest death rate 19.19424%. From the data above, we see that this death rate is for the Delta variant. Also, their vaccination rate is only 1.2% which means most of the population of the country has not been vaccinated. This might be the main reason that their death rate is so high.
Side-by-side Boxplots
Comparing Delta and Omicron, Delta has a higher death rate than Omicron.Comparing vaccination levels, a higher vaccination level tends to have a lower death rate except for vaccination level 0-20% and 21-40%. These two vaccination rates have a similar death rate.
Base on this box plot, for Delta, a higher vaccination level has a lower death rate except for 0-20% and 21-40% vaccination level. Their death rate is similar. For omicron, a higher vaccination level has a lower death rate.
Interaction plot
The red and yellow lines look parallel which indicates the absence of an interaction effect. The effect of variant does not depend on the vaccination level between 0-20% and 21-40%. The green and black lines are not parallel which means that they may have some interaction effect. The effect of variant may depend on the vaccination level between 61-80% and 80%+.
A two-way ANOVA model is suitable for this study because we need to estimate how the death rate changes according to the levels of two categorical variables, variant and vaccination level.
I have fit two two-way ANOVA models. one(model1) is with interaction, and the other one (model2) is without interaction.
For model1, the model is \[ Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i+\beta_j + (\alpha\beta)_{ij}+\epsilon_{ijk}, \ k=1,\ldots, n, j=1,\ldots, b, i=1,\ldots, a, \]
with constrains \[\begin{align} \sum_{i=1}^a \alpha_i & = \sum_{j=1}^b \beta_j=0\\ \sum_{i=1}^a (\alpha\beta)_{ij} & =\sum_{j=1}^b (\alpha\beta)_{ij} =0 \end{align}\]
For model2, the model is \[Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i+\beta_j +\epsilon_{ijk}, \ k=1,\ldots, n, j=1,\ldots, b, i=1,\ldots, a\]
with constrains \[\begin{align} \sum_{i=1}^a \alpha_i & = \sum_{j=1}^b \beta_j=0 \end{align}\]
where \(\{\epsilon_{ijk}\}\) are i.i.d. \(N(0,\sigma^2)\). The \(\ Y_{ijk}\) represents the death rate. \[ \mu_{\cdot \cdot} =\sum_{i=1}^a \sum_{j=1}^ b \mu_{ij}/(ab), \ \mu_{i\cdot} = \sum_{j=1}^b \mu_{ij} /b, \ \mu_{\cdot j}=\sum_{i=1}^a \mu_{ij}/a. \] \[ \alpha_i=\mu_{i\cdot} - \mu_{\cdot \cdot},\ \beta_j=\mu_{\cdot j}- \mu_{\cdot\cdot} \]
\(\alpha_i\) is variant, where the index \(i\) represents the type of variant: Delta (\(i=1\)) and Omicron (\(i=2\)). \(\beta_j\) is vaccination level where the index \(j=1,2...5\) represents the level of the vaccination: 0-20% (\(j=1\)), 21-40% (\(j=2\)),41-60% (\(j=3\)),61-80% (\(j=4\)), and 80%+ (\(j=5\)). \(\epsilon_{ijk}\) is i.i.d. \(N(0,\sigma^2)\).
The assumptions of a two-way ANOVA model (Please see the sensitivity analysis session below):
After checking the above assumptions, we found out we need to do a transformation for the response variable to meet the normality assumption and the equal variance assumption. Other than that, the data met all the other assumptions. For the model fitting, the Box-Cox transformed data was used.
Model 1: Fitting an two-way ANOVA model with interaction:
F-test for the interaction term \((\alpha\beta)_{ij}\): the null and alternative hypotheses. \[ H_0: (\alpha\beta)_{ij}=0 \ \forall i, j \ \ {\rm v.s.} \ \ \ H_1:\ {\rm not \ all \ } (\alpha\beta)_{ij} \ {\rm are \ zero}. \]
F-test for \(\alpha_{i}\): the null and alternative hypotheses. \[ H_0: \alpha_{i}=0 \ \ \forall i \ \ \ {\rm v.s.}\ \ \ H_1:\ {\rm not \ all \ } \alpha_i \ {\rm are \ zero}. \]
F-test for \(\beta_{j}\): the null and alternative hypotheses. \[ H_0: \beta_{j}=0 \ \ \forall j \ \ \ {\rm v.s.}\ \ \ H_1:\ {\rm not \ all \ } \beta_j \ {\rm are \ zero}. \]
According to the information of the ANOVA table above, we failed to reject the null hypotheses for the interaction term because the p-value of the F-test is \(0.9492\) at the level of 0.05, and we concluded that the effect of the variants do not depend on vaccination rate.
For variants, we reject the null hypotheses because the p-value of the F-test for variant is almost equal to zero at the level of 0.05, and we concluded that the death rate for Delta and for Omicron are not all the same.
For vaccination levels, we reject the null hypotheses because the p-value of the F-test for schools is almost equal to zero at the level of 0.05, and we concluded that the death rate from different vaccination levels are not all the same.
Model 2: Fitting a two-way ANOVA model without interaction:
F-test for \(\alpha_i\): the null and alternative hypotheses. \[ H_0: \alpha_{i}=0 \ \ \forall i \ \ \ {\rm v.s.}\ \ \ H_1:\ {\rm not \ all \ } \alpha_i \ {\rm are \ zero}. \]
F-test for \(\beta_{j}\): the null and alternative hypotheses. \[ H_0: \beta_{j}=0 \ \ \forall j \ \ \ {\rm v.s.}\ \ \ H_1:\ {\rm not \ all \ } \beta_j \ {\rm are \ zero}. \]
According to the information of the ANOVA table above, we reject the null hypotheses for the variants variable because the p-value of the F-test is almost equal to zero at the level of 0.05, and we concluded that the death rate for Delta and for Omicron are not the same.
We also reject the null hypotheses for the vaccination levels because the p-value of the F-test is almost equal to zero at the level of 0.05, and we concluded that the death rate from different vaccination levels are not all the same.
Comparing to the result from the model1 and model2, since the interaction term is not significant, I used model2, the model without interaction, to find the 95% confidence interval for death rate difference between Delta and Omicron.
The Tukey-Kramer testing:
For the Post-Hoc test, the Tukey-Kramer testing is suitable because the sample size for each group is not equal. Delta has 215 observations, and Omicron has 219 observations. The the null and alternative hypotheses for this Tukey-Kramer testing is \[H_0: \mu_{Delta}=\mu_{Omicron} \\\ H_1:\ \mu_{Delta} \not= \mu_{Omicron} \]
## diff lwr upr p adj
## Omicron-Delta -0.264917 -0.3134021 -0.2164319 0
Based on the results from the Tukey-Kramer testing, at the level of 0.05, comparing the mean death rate between Delta and Omicron, we rejected the null hypotheses, and concluded that the mean death rate is different between Delta and Omicron.
The Tukey-kramer test result tells us that Delta is more deadly than Omicron because “Omicron-Delta” is a negative number. The 95% confidence interval for the death rate, which was transformed to \(\frac{(\ Death \ Rate) ^ \lambda-1}{\lambda}\), where \(\lambda\) is = 0.3030303, tells us that the difference between Delta and Omicron is (-0.3134021, -0.2164319)
Assumptions check before fitting models
All required assumptions for two-way ANOVA have to be met before fitting the model. As mentioned above, the assumptions of a two-way ANOVA model are as follow.
Missing values
All missing values have been removed due to lack of information or un-matched country code.
Outliers
Outlying Y observations:
In order to identify outlying Y cases, we need to find the studentized deleted residuals, and then to see if any of these residuals exceed Bonferroni’s threshold, which is \(\ t(1-\frac{\alpha}{2n};n-p-1)\) at the level of \(\alpha\).
\(p\) is number of predictors. \(n\) is numbers of total observations.
The studentized deleted residuals are the deleted residuals divided by their standard errors: \[t_{i}=\frac{d_i}{S(d_i)} \ , \ d_i= \frac{e_i}{1-h_{ii}} \]
\(e_i\) is ordinary residuals. \(h_{ii}\) is the hat matrix.
## 212 58 431 171 155 407
## 12.931474 6.971129 5.733834 4.575430 4.064502 2.735246
There are 5 residuals that exceed Bonferroni’s threshold, which is 3.6969952. These 5 data points are considered as outlying Y observations.
Outlying X observations
Outlying X observations are identified through examining the diagonal elements \(h_{ii}\) of the hat matrix, a.k.a., leverage. A large value of \(h_{ii}\) indicates that the X values of the i-th case is far away from the center of the X observations when taking into account the shape of the data.
We can use \(h_{ii}>2pn\) as a criteria to identify the i-th case as outlying with regard to its X values.
## 74 290
## 0.02313785 0.02293260
There are two data points that are considered as outlying X.
The Cook’s distance
Now we need to see if any of these data points (point 212,58,431,171,155,407,74 and 290) have aggregate influence. We can use the Cook’s distance to do that. If the value is larger than 1, this point is considered as influential which needs further examination.
\[D_i := \frac{\sum_{j=1}^n (\hat{Y}_j-\hat{Y}_{j(i)})^2}{p \cdot MSE}\]
Here, \(\hat{Y}_j\) is the fitted value for the \(j^{th}\) case when all cases are used in fitting the model and \(\hat{Y}_{j(i)}\) is the fitted value for the \(j^{th}\) case when the \(i^{th}\) case is excluded from the fitting.
According to the Cook’s distance Plot above, Since point 212 is standing out, we can do an investigation to this point.
As we discussed above in the histogram of the death rate variable, Yemen is the country who has the highest death rate. The point 212 is not considered as an influential outlier because it makes scene that a lower vaccination rate tends to a higher death rate. In addition, according to the result of the Cook’s distance, none of the values is larger than 1. Therefore, there is no influential outliers in the data set. All outliers can be retained.
Normality
Recall that the response variable death rate is right skewed. A Box-cox transformation is needed.
Since 1 does not include in the 95% confidence interval, a transformation is needed. \(\lambda\) is = 0.3030303. Hence, the transformed death rate is \(\frac{(\ Death \ Rate) ^ \lambda-1}{\lambda}\).
The histogram on the left is the death rate distribution before the transformation. The histogram on the right is the death rate distribution after the transformation. It is obvious to see that the response variable is normal distributed after the transformation.
Independence
A Durbin-Watson Test can be used to check if the residuals are independent.
The null hypothesis: There is no correlation among the residuals.
The alternative hypothesis: The residuals are auto correlated.
## lag Autocorrelation D-W Statistic p-value
## 1 -0.01596649 2.031333 0.748
## Alternative hypothesis: rho != 0
From the output we can see that the test statistic is 2.031333 and the corresponding p-value is 0.82. Since this p-value is larger than 0.05, we failed to reject the null hypothesis and conclude that there is no correlation among the residuals. The residuals are independent.
Equal variances
Equal variances can be tested by using the Levene test.
The null hypothesis: The population variances are equal
The alternative hypothesis: The population variances are not equal
## [1] "Levene test using original data"
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 9 5.259 8.923e-07 ***
## 392
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Levene test using transformed data"
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 9 1.4705 0.1568
## 392
Based on the Levene test result for the data before transformation, we have to reject the \(H_0\) and conclude the population variances are not equal because the p-value is 8.923e-07 which is less than 0.05.
Based on the Levene test result for the data after transformation, we failed to reject the \(H_0\) and conclude the population variances are equal because the p-value is 0.1568 which is larger than 0.05.
Assumption plots after transformation
The Box-cox transformation helped to improve the distribution of the response variable from not normal distributed to normal distributed. It also helped to improve the residuals variance from not equal to equal. Therefore, using a transformed data is necessary.
The plots above are from transformed data. From the residuals vs. fitted value plot, the expected value of the residuals is approximately 0, and the variance is approximately constant. Assumption of constant variance is met. The line in the Q-Q plot looks like a straight line. Assumption of normality is met. There is no obvious outliers in the Residuals vs Leverage plot.
The aim for this project is to compare which variant is more deadly. We also want to see if the death rate of Delta and Omicron depend on the vaccination rate. The ANOVA table for the model with interaction showed the interaction term was not significant at the level of 0.05 because the p-value is 0.9492, which is larger than 0.05. Because of this, the death rate of Delta and the death rate of Omicron do not depend on the vaccination rate. Since the interaction term is not significant, we fit another two-way ANOVA model without interaction. The ANOVA table showed that the death rate between Delta and Omicron are different at the level of 0.05. The Tukey-kramer test result showed that Delta is more deadly than Omicron. The 95% confidence interval for the death rate, which was transformed to \(\frac{(\ Death \ Rate) ^ \lambda-1}{\lambda}\), where \(\lambda\) is = 0.3030303, tells us that the difference between Delta and Omicron is (-0.3134021, -0.2164319).
This study is not suitable for causal inference. One of the assumptions for causal inference is called “stable unit treatment value assumption”, which means that the person who is in the treatment group cannot interact with the person who is in the control group. To apply this assumption to this COVID-19 project, it would be impossible to prevent interaction between a person who was vaccinated and a person who was not vaccinated. In addition, the other assumption is called ignorability. This assumption requires that there is no other unknown confounders. In this COVID-19 study, Delta happened during the summer and Omicron happened during fall and winter. Therefore, temperature is a confounder since it may have effect the death rate for Delta and Omicron. Other than temperature, there are other factors that are also confounders, such as masks and social distancing policies for each country, population distribution for each country, etc. Since the COVID-19 pandemic is a global event, there must be many confounders that we are not aware of. Therefore, this study is not suitable for causal inference.
Tracking sars-COV-2 variants - world health organization. (n.d.). Retrieved March 4, 2022, from https://www.who.int/health-topics/typhoid/tracking-SARS-CoV-2-variants
Malcom, K. (2022, January 20). Is Omicron really mild? Health & Wellness Topics, Health Tips & Disease Prevention. Retrieved March 4, 2022, from https://healthblog.uofmhealth.org/wellness-prevention/omicron-really-mild#:~:text=Washer%3A%20For%20many%20people%2C%20especially,is%20relatively%20common%E2%80%94and%20headaches.
Wu, K. J. (2022, January 28). Will Delta Survive the Omicron Wave? The Atlantic. https://www.theatlantic.com/science/archive/2022/01/delta-omicron-showdown/621380/
Author: Nora Colburn, MD, MPH. (2021, January 25). How long does it take for the COVID-19 vaccine to work? How Long Does It Take for the COVID-19 Vaccine to Work? | Ohio State Medical Center. https://wexnermedical.osu.edu/blog/how-long-for-covid-vaccine-to-work
sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] car_3.1-1 carData_3.0-5 MASS_7.3-58.1 hrbrthemes_0.8.0
## [5] plotly_4.10.1 dplyr_1.1.0 ggplot2_3.4.1 countrycode_1.4.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.10 tidyr_1.3.0 digest_0.6.31
## [4] utf8_1.2.3 mime_0.12 R6_2.5.1
## [7] evaluate_0.20 highr_0.10 httr_1.4.4
## [10] pillar_1.8.1 gdtools_0.3.1 rlang_1.0.6
## [13] lazyeval_0.2.2 curl_5.0.0 rstudioapi_0.14
## [16] data.table_1.14.8 extrafontdb_1.0 jquerylib_0.1.4
## [19] rmarkdown_2.20 labeling_0.4.2 extrafont_0.19
## [22] htmlwidgets_1.6.1 munsell_0.5.0 shiny_1.7.4
## [25] compiler_4.2.2 httpuv_1.6.9 xfun_0.37
## [28] pkgconfig_2.0.3 systemfonts_1.0.4 gfonts_0.2.0
## [31] htmltools_0.5.4 tidyselect_1.2.0 tibble_3.1.8
## [34] fontBitstreamVera_0.1.1 httpcode_0.3.0 fansi_1.0.4
## [37] viridisLite_0.4.1 crayon_1.5.2 withr_2.5.0
## [40] later_1.3.0 crul_1.3 grid_4.2.2
## [43] jsonlite_1.8.4 xtable_1.8-4 Rttf2pt1_1.3.12
## [46] gtable_0.3.1 lifecycle_1.0.3 magrittr_2.0.3
## [49] scales_1.2.1 cli_3.6.0 cachem_1.0.6
## [52] farver_2.1.1 promises_1.2.0.1 bslib_0.4.2
## [55] ellipsis_0.3.2 generics_0.1.3 vctrs_0.5.2
## [58] tools_4.2.2 glue_1.6.2 fontquiver_0.2.1
## [61] purrr_1.0.1 abind_1.4-5 fastmap_1.1.0
## [64] yaml_2.3.7 colorspace_2.1-0 fontLiberation_0.1.0
## [67] memoise_2.0.1 knitr_1.42 sass_0.4.5