This project is based on a pilot study of 32 patients with acute traumatic brain injury (TBI) conducted at the University of Pennsylvania and University of Alabama Birmingham Hospital. The goal of the pilot study was to determine the change in plasma von Willebrand factor (VWF) antigen levels.

In this project, the goal is to determine whether variables other than VWF, such as ADAMTS13 or non-molecular clinical factors (e.g. hospital length of stay), are predictive of outcomes such as mortality, modified Rankin Scale score, or neurosurgery.

Based on the `.xlsx` extension, we use the `readxl` package to load the data, then check basic statistics using numerical and visual means.

```
### Load data
### here our syntax will be such that we will load libraries as-needed
library(readxl)
df <- read_xlsx("~/Documents/Projects/Huy/tbi_analysis/data/kumar_tbi_final111717.xlsx", sheet = 3)
```

The dataset contains 79 variables. Initial variable creation, merging, and manipulation were done previously; for further information please consult Dr. Monisha Kumar. The first few columns (i.e. **vwfag_D[0-5]**) are clinical parameters measured for each individual at 5 different times. Variables ending in `_avg` are the average of those values, computed over the non-missing measurements.

Here we dichotomize the `dc_mrs` variable so that values >= 3 are coded as 1 and values < 3 as 0. Binary variables are also converted to factors to simplify the analysis.

```
library(dplyr) # for the pipe and mutate_at()
# dichotomize using ifelse(); a missing dc_mrs stays NA
df$dc_mrs_bin <- ifelse(df$dc_mrs >= 3, 1, 0)
# convert the binary indicators to factors
df <- df %>%
  mutate_at(c("sex_binary", "race_binary", "surgery", "dc_mrs_bin", "mortality_atdischarge"), as.factor)
```
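As a quick sanity check on the cut point, the same rule can be tried on a handful of made-up scores. This is only a sketch in base R: the vector below is hypothetical and is not taken from the study data.

```r
# Toy stand-in for df$dc_mrs (hypothetical values, NOT study data); mRS ranges 0-6.
dc_mrs <- c(0, 2, 3, 6, NA, 1, 4)

# ifelse() propagates NA, so a missing score stays missing instead of becoming 0.
dc_mrs_bin <- ifelse(dc_mrs >= 3, 1, 0)

# Explicit levels keep both categories even if one is empty in a subset.
dc_mrs_bin <- factor(dc_mrs_bin, levels = c(0, 1))

# Cross-tabulate raw score against the binary version to confirm the cut at 3.
print(table(score = dc_mrs, binary = dc_mrs_bin, useNA = "ifany"))
```

Fixing the factor levels up front is a small design choice that avoids a one-level factor when a stratum happens to contain only low or only high scores.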

Below we (selectively) display a numeric summary of 5 variables from the sample.

| | nbr.val | nbr.null | nbr.na | min | max | range | sum | median | mean | SE.mean | CI.mean.0.95 | var | std.dev | coef.var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tbitbsubjectid | 32 | 0 | 0 | 28 | 78 | 50 | 1646 | 46.5 | 51.44 | 2.949 | 6.014 | 278.3 | 16.68 | 0.3243 |
| vwfag_D0 | 10 | 0 | 22 | 0.9598 | 6.633 | 5.673 | 28.29 | 2.452 | 2.829 | 0.5085 | 1.15 | 2.586 | 1.608 | 0.5685 |
| vwfag_D1 | 16 | 0 | 16 | 1.001 | 5.928 | 4.927 | 49.29 | 2.947 | 3.081 | 0.3557 | 0.7581 | 2.024 | 1.423 | 0.4618 |
| vwfag_D2 | 20 | 0 | 12 | 1.817 | 15.27 | 13.45 | 87.34 | 3.598 | 4.367 | 0.6544 | 1.37 | 8.565 | 2.927 | 0.6702 |
| vwfag_D3 | 21 | 1 | 11 | 0 | 17.45 | 17.45 | 105.3 | 4.233 | 5.012 | 0.9429 | 1.967 | 18.67 | 4.321 | 0.8621 |
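The columns above can be reproduced with a few lines of base R. The helper below is a hypothetical sketch (the name `describe_numeric` and the toy data are not from the original analysis); it assumes `nbr.null` counts zero values and that `CI.mean.0.95` is the half-width of a t-based confidence interval, which agrees with the `vwfag_D0` row (n = 10, SE = 0.5085, half-width about 1.15).

```r
# Hypothetical helper: descriptive statistics for one numeric vector,
# using the same column names as the summary table.
describe_numeric <- function(x, conf = 0.95) {
  n_na <- sum(is.na(x))          # missing values
  x    <- x[!is.na(x)]           # everything below uses non-missing values only
  n    <- length(x)
  se   <- sd(x) / sqrt(n)        # standard error of the mean
  c(nbr.val  = n,
    nbr.null = sum(x == 0),      # assumption: "null" means a value equal to zero
    nbr.na   = n_na,
    min = min(x), max = max(x), range = max(x) - min(x),
    sum = sum(x), median = median(x), mean = mean(x),
    SE.mean = se,
    CI.mean = qt(1 - (1 - conf) / 2, df = n - 1) * se,  # CI half-width
    var = var(x), std.dev = sd(x), coef.var = sd(x) / mean(x))
}

# Toy data frame (hypothetical values) with one row of output per variable.
toy <- data.frame(a = c(1, 2, 3, 4, NA), b = c(0, 5, 5, NA, NA))
print(round(t(sapply(toy, describe_numeric)), 3))
```

Reporting `nbr.na` next to every mean matters here because the day-specific VWF columns are missing for a third to two thirds of the 32 patients.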

Before checking every single variable, we will focus mostly on the average values, demographic variables, and clinical outcome indicators. We will describe them in more detail as we go.

A few variables appear close to normally distributed, while a few others show high kurtosis and skewness (both are quantified below).

| variable | skewness | kurtosis |
|---|---|---|
| VWFAg avg | 2.562 | 6.488 |
| VWFAc avg | 0.9384 | 1.136 |
| Ratio avg | -0.0002568 | -1.217 |
| A13 average | 0.1052 | -0.7049 |
| A13VWFratio average | 2 | 4.195 |
| HNP average | 1.401 | 1.897 |
| CCI score | 1.582 | 2.212 |
| dc_mrs | 0.2244 | -1.417 |
| hosp_los | 1.572 | 2.22 |
| icu_los | 0.4971 | -0.8989 |
| pt | 0.244 | -0.8114 |
| inr | 0.255 | -0.9175 |
| ptt | 0.6453 | 0.9709 |

A few variables are clearly highly skewed, but most appear approximately normally distributed. The kurtosis values point to similar departures from a normal distribution.
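The report does not say which estimator or package produced these numbers, so the following is only a sketch using the plain moment-based definitions. The function name and the toy vectors are hypothetical. Kurtosis is returned on the excess scale (normal = 0), which is presumably the scale used above, since several of the reported values are negative.

```r
# Hypothetical helper: moment-based skewness and excess kurtosis.
moment_shape <- function(x) {
  x  <- x[!is.na(x)]
  n  <- length(x)
  d  <- x - mean(x)
  m2 <- sum(d^2) / n   # second central moment (population variance)
  m3 <- sum(d^3) / n   # third central moment
  m4 <- sum(d^4) / n   # fourth central moment
  c(skewness = m3 / m2^1.5,
    excess_kurtosis = m4 / m2^2 - 3)
}

# Toy vectors (hypothetical): one symmetric, one with a long right tail.
symmetric    <- c(-2, -1, 0, 1, 2)
right_skewed <- c(1, 1, 2, 2, 3, 20)

print(round(rbind(symmetric    = moment_shape(symmetric),
                  right_skewed = moment_shape(right_skewed)), 3))
```

Other estimators apply small-sample corrections and will give somewhat different values at n around 26 to 32, so the exact figures in the table may not be reproduced by this version.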

The next report will go through some options for setting up a prediction model for the outcomes of interest. The focus is on mortality at discharge and `dc_mrs`, the modified Rankin Scale score, which has been dichotomized. We will explore dimension reduction strategies, variable selection, and resampling strategies.

Table 1 stratifies the analytical cohort by mortality at discharge. In the table below, 1 indicates subjects who died and 0 indicates those who did not.

| | level | 0 | 1 | p | test |
|---|---|---|---|---|---|
| n | | 23 | 3 | | |
| VWFAg_avg (mean (sd)) | | 4.13 (3.17) | 8.52 (6.45) | 0.056 | |
| VWFAc_avg (mean (sd)) | | 2.90 (1.13) | 4.70 (1.83) | 0.023 | |
| Ratio_avg (mean (sd)) | | 0.81 (0.24) | 0.63 (0.25) | 0.239 | |
| A13_avg (mean (sd)) | | 0.77 (0.32) | 0.75 (0.12) | 0.922 | |
| A13VWFratio (mean (sd)) | | 0.25 (0.17) | 0.11 (0.05) | 0.175 | |
| HNP_avg (mean (sd)) | | 26.69 (23.49) | 20.83 (12.47) | 0.679 | |
| mechanismofinjury (mean (sd)) | | 2.57 (1.16) | 2.33 (1.15) | 0.748 | |
| CCI (mean (sd)) | | 1.22 (1.83) | 1.33 (1.53) | 0.918 | |
| age (mean (sd)) | | 44.04 (22.94) | 39.00 (15.39) | 0.717 | |
| sex_binary (%) | 0 | 3 (13.0) | 1 (33.3) | 0.408 | exact |
| | 1 | 20 (87.0) | 2 (66.7) | | |
| race_binary (%) | 0 | 12 (52.2) | 1 (33.3) | 1.000 | exact |
| | 1 | 11 (47.8) | 2 (66.7) | | |
| surgery (%) | 0 | 13 (56.5) | 1 (33.3) | 0.887 | |
| | 1 | 10 (43.5) | 2 (66.7) | | |
| hosp_los (mean (sd)) | | 16.70 (13.47) | 27.67 (27.21) | 0.248 | |
| gcs_adm (mean (sd)) | | 8.22 (5.47) | 4.33 (2.31) | 0.242 | |
| pt (mean (sd)) | | 14.24 (1.41) | 13.60 (1.47) | 0.469 | |
| inr (mean (sd)) | | 1.19 (0.14) | 1.13 (0.12) | 0.522 | |
| ptt (mean (sd)) | | 30.70 (5.90) | 34.23 (3.61) | 0.326 | |
| dc_mrs_bin (%) | 0 | 13 (56.5) | 0 (0.0) | 0.220 | exact |
| | 1 | 10 (43.5) | 3 (100.0) | | |
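If the continuous rows were compared with two-sample t-tests, the p column can be roughly re-derived from the printed means, SDs, and group sizes. The helper below is a hypothetical sketch (not the code that produced Table 1) and works from rounded summaries, so small discrepancies are expected. It computes both the pooled (equal-variance) and Welch (unequal-variance) versions for the `VWFAc_avg` row.

```r
# Hypothetical helper: two-sample t-test from summary statistics.
t_from_summary <- function(m1, s1, n1, m2, s2, n2, var_equal = FALSE) {
  if (var_equal) {
    # Pooled variance, df = n1 + n2 - 2
    sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
    se  <- sqrt(sp2 * (1 / n1 + 1 / n2))
    df  <- n1 + n2 - 2
  } else {
    # Welch: separate variances, Satterthwaite degrees of freedom
    v1 <- s1^2 / n1
    v2 <- s2^2 / n2
    se <- sqrt(v1 + v2)
    df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  }
  t_stat <- (m1 - m2) / se
  c(t = t_stat, df = df, p = 2 * pt(-abs(t_stat), df))
}

# VWFAc_avg row of Table 1: 2.90 (1.13) with n = 23 vs. 4.70 (1.83) with n = 3
pooled <- t_from_summary(2.90, 1.13, 23, 4.70, 1.83, 3, var_equal = TRUE)
welch  <- t_from_summary(2.90, 1.13, 23, 4.70, 1.83, 3, var_equal = FALSE)
print(round(rbind(pooled, welch), 3))
```

With these inputs the pooled version lands close to the tabulated 0.023, while the Welch version gives a much larger p-value. That suggests the p column was computed with the equal-variance form, which is worth confirming against the original table-generating code.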

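Apart from VWFAc_avg (p = 0.023) and a borderline result for VWFAg_avg (p = 0.056), there is no statistically significant difference between the two groups based on two-sample t-tests for the continuous variables (the tabulated p-values are consistent with the pooled, equal-variance form) and Fisher’s exact test (used in place of the \(\chi^2\) test for small samples). With only 3 deaths, these comparisons have little power. Let’s look at the differences between the modified Rankin Scale groups. In Table 2 below, 1 indicates a high modified Rankin Scale score (mRS >= 3) and 0 indicates a low score.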

| | level | 0 | 1 | p | test |
|---|---|---|---|---|---|
| n | | 13 | 13 | | |
| VWFAg_avg (mean (sd)) | | 3.15 (1.42) | 6.13 (4.77) | 0.041 | |
| VWFAc_avg (mean (sd)) | | 2.45 (0.95) | 3.76 (1.34) | 0.009 | |
| Ratio_avg (mean (sd)) | | 0.83 (0.24) | 0.74 (0.25) | 0.341 | |
| A13_avg (mean (sd)) | | 0.79 (0.36) | 0.75 (0.25) | 0.758 | |
| A13VWFratio (mean (sd)) | | 0.30 (0.20) | 0.16 (0.07) | 0.024 | |
| HNP_avg (mean (sd)) | | 26.99 (29.29) | 25.04 (13.59) | 0.829 | |
| mechanismofinjury (mean (sd)) | | 2.69 (0.95) | 2.38 (1.33) | 0.502 | |
| CCI (mean (sd)) | | 1.46 (1.81) | 1.00 (1.78) | 0.518 | |
| age (mean (sd)) | | 46.54 (23.98) | 40.38 (20.35) | 0.487 | |
| sex_binary (%) | 0 | 2 (15.4) | 2 (15.4) | 1.000 | exact |
| | 1 | 11 (84.6) | 11 (84.6) | | |
| race_binary (%) | 0 | 8 (61.5) | 5 (38.5) | 0.434 | exact |
| | 1 | 5 (38.5) | 8 (61.5) | | |
| mortality_atdischarge (%) | 0 | 13 (100.0) | 10 (76.9) | 0.220 | exact |
| | 1 | 0 (0.0) | 3 (23.1) | | |
| surgery (%) | 0 | 12 (92.3) | 2 (15.4) | <0.001 | |
| | 1 | 1 (7.7) | 11 (84.6) | | |
| hosp_los (mean (sd)) | | 9.00 (6.87) | 26.92 (16.17) | 0.001 | |
| gcs_adm (mean (sd)) | | 11.23 (5.12) | 4.31 (2.63) | <0.001 | |
| pt (mean (sd)) | | 14.23 (1.44) | 14.10 (1.41) | 0.817 | |
| inr (mean (sd)) | | 1.18 (0.15) | 1.18 (0.12) | 0.886 | |
| ptt (mean (sd)) | | 30.41 (6.54) | 31.81 (4.99) | 0.545 | |
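The categorical rows of Table 2 can be re-derived directly from the printed counts. The sketch below assumes that rows marked `exact` come from Fisher’s exact test and that the unmarked `surgery` row comes from a \(\chi^2\) test with continuity correction (the `chisq.test()` default for 2x2 tables); the second part is an assumption, since the report does not name the test.

```r
# Counts copied from Table 2; columns are dc_mrs_bin = 0 and dc_mrs_bin = 1.
# matrix() fills column by column, so each pair is one dc_mrs_bin group.
surgery_by_mrs <- matrix(c(12, 1, 2, 11), nrow = 2,
                         dimnames = list(surgery = c("0", "1"),
                                         dc_mrs_bin = c("0", "1")))
mortality_by_mrs <- matrix(c(13, 0, 10, 3), nrow = 2,
                           dimnames = list(mortality = c("0", "1"),
                                           dc_mrs_bin = c("0", "1")))

# Chi-squared test (Yates continuity correction applied by default for 2x2).
surgery_p <- chisq.test(surgery_by_mrs)$p.value

# Fisher's exact test, two-sided.
mortality_p <- fisher.test(mortality_by_mrs)$p.value

print(c(surgery_chisq = surgery_p, mortality_fisher = mortality_p))
```

The Fisher p-value for mortality matches the tabulated 0.220, and the chi-squared p-value for surgery falls below 0.001, in line with the table. Since all expected counts in the surgery table are at least 6, the chi-squared approximation is not unreasonable there.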

With the dichotomized modified Rankin Scale score as the outcome, we see clear differences between those with higher and lower scores (0-2 vs. 3-6): VWFAg_avg, VWFAc_avg, A13VWFratio, surgery, hosp_los, and gcs_adm all have p < 0.05. It would be prudent to look at results for this outcome carefully. Let’s also get some univariate results for the difference between those who receive neurosurgery and those who do not (in Table 3, surgery = 1 and no surgery = 0).

| | level | 0 | 1 | p | test |
|---|---|---|---|---|---|
| n | | 13 | 13 | | |
| VWFAg_avg (mean (sd)) | | 3.15 (1.42) | 6.13 (4.77) | 0.041 | |
| VWFAc_avg (mean (sd)) | | 2.45 (0.95) | 3.76 (1.34) | 0.009 | |
| Ratio_avg (mean (sd)) | | 0.83 (0.24) | 0.74 (0.25) | 0.341 | |
| A13_avg (mean (sd)) | | 0.79 (0.36) | 0.75 (0.25) | 0.758 | |
| A13VWFratio (mean (sd)) | | 0.30 (0.20) | 0.16 (0.07) | 0.024 | |
| HNP_avg (mean (sd)) | | 26.99 (29.29) | 25.04 (13.59) | 0.829 | |
| mechanismofinjury (mean (sd)) | | 2.69 (0.95) | 2.38 (1.33) | 0.502 | |
| CCI (mean (sd)) | | 1.46 (1.81) | 1.00 (1.78) | 0.518 | |
| age (mean (sd)) | | 46.54 (23.98) | 40.38 (20.35) | 0.487 | |
| sex_binary (%) | 0 | 2 (15.4) | 2 (15.4) | 1.000 | exact |
| | 1 | 11 (84.6) | 11 (84.6) | | |
| race_binary (%) | 0 | 8 (61.5) | 5 (38.5) | 0.434 | exact |
| | 1 | 5 (38.5) | 8 (61.5) | | |
| mortality_atdischarge (%) | 0 | 13 (100.0) | 10 (76.9) | 0.220 | exact |
| | 1 | 0 (0.0) | 3 (23.1) | | |
| surgery (%) | 0 | 12 (92.3) | 2 (15.4) | <0.001 | |
| | 1 | 1 (7.7) | 11 (84.6) | | |
| hosp_los (mean (sd)) | | 9.00 (6.87) | 26.92 (16.17) | 0.001 | |
| gcs_adm (mean (sd)) | | 11.23 (5.12) | 4.31 (2.63) | <0.001 | |
| pt (mean (sd)) | | 14.23 (1.44) | 14.10 (1.41) | 0.817 | |
| inr (mean (sd)) | | 1.18 (0.15) | 1.18 (0.12) | 0.886 | |
| ptt (mean (sd)) | | 30.41 (6.54) | 31.81 (4.99) | 0.545 | |

In addition to the VWF and A13 variables, hospital and ICU length of stay appear to be univariately associated with the surgery outcome. The next step is to examine how well each of the 3 outcomes can be classified.
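Table 3 as shown carries the same figures as Table 2, including a `surgery` row, so it may be worth regenerating it with `surgery` as the stratifying variable. The sketch below shows one way to build such rows in base R. Everything in it is hypothetical: the data are simulated (not the study data), the helper name is made up, and the pooled t-test is an assumption about how the continuous rows are compared.

```r
set.seed(42)

# Simulated stand-in data (hypothetical values, NOT the study data).
toy <- data.frame(
  surgery  = factor(rep(c(0, 1), times = c(14, 12))),
  hosp_los = c(rexp(14, rate = 1 / 10), rexp(12, rate = 1 / 25)),
  gcs_adm  = sample(3:15, 26, replace = TRUE)
)

# Hypothetical helper: one "mean (sd)" row per continuous variable,
# split by a two-level stratifying factor.
stratify_row <- function(data, var, strata) {
  groups <- split(data[[var]], data[[strata]])
  cells  <- vapply(groups, function(g) {
    sprintf("%.2f (%.2f)", mean(g, na.rm = TRUE), sd(g, na.rm = TRUE))
  }, character(1))
  # Assumption: pooled (equal-variance) two-sample t-test.
  p <- t.test(groups[[1]], groups[[2]], var.equal = TRUE)$p.value
  c(variable = var, cells, p = formatC(p, format = "f", digits = 3))
}

table3_sketch <- do.call(rbind, lapply(c("hosp_los", "gcs_adm"),
                                       stratify_row,
                                       data = toy, strata = "surgery"))
print(table3_sketch, quote = FALSE)
```

Passing the stratifying variable as an argument keeps the same helper usable for all 3 outcomes (`mortality_atdischarge`, `dc_mrs_bin`, `surgery`), which reduces the chance of one table being printed with the wrong strata.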

For the next step (i.e. Statistical Analysis) click here.