For our survival modeling analyses, we use publicly available clinical data from the TCGA-BRCA Pan-Cancer Atlas (2018). The file name is brca_tcga_pan_can_atlas_2018_clinical_data.tsv. This dataset, widely used in cancer research, includes comprehensive clinical annotations for breast cancer patients.
In the dataset, there are 1,085 patients, of whom 145 experienced the event of interest (Progression) and 940 were censored (i.e., did not experience progression during the observation period).

Our platform focuses on a subset of clinically and genomically relevant variables to enhance both the interpretability and predictive performance of survival models. By concentrating on these features, we reduce noise and ensure that the model outputs are meaningful for clinical decision-making. The variables considered include:

- Diagnosis Age: Age at diagnosis (continuous).
- Aneuploidy Score: Measurement of chromosomal instability.
- Buffa Hypoxia Score: Quantifies tumor hypoxia levels.
- Time Since Initial Pathologic Diagnosis: Duration since diagnosis (continuous).
- Fraction Genome Altered: Proportion of the genome with alterations.
- MSI Mantis Score: Microsatellite instability measured via the MANTIS tool.
- MSIsensor Score: Microsatellite instability measured via the MSIsensor tool.
- Mutation Count: Total number of mutations detected in the tumor.
- Progression-Free Survival (months): Time the patient remains progression-free.
- Ragnum Hypoxia Score: Additional hypoxia-related metric.
- Sex: Categorical variable.
- Tumor Break Load: Measure of genomic breaks in tumor DNA.
- TMB (Nonsynonymous): Tumor mutation burden considering nonsynonymous mutations.
- Winter Hypoxia Score: Additional hypoxia metric to capture tumor oxygenation levels.

By focusing on these clinically and genomically informative features, the platform balances model complexity with interpretability, enabling robust and actionable predictions of patient-specific outcomes.
The model includes these covariates representing clinical or biological measurements, estimated simultaneously for their effect on survival (hazard):
The model estimates log hazard ratios (β) and corresponding hazard ratios (HR = exp(β)) indicating how a 1-unit increase affects risk:
| Variable | Coef (β) | HR (exp(β)) | 95% CI HR Lower | 95% CI HR Upper | p-value | Interpretation |
|---|---|---|---|---|---|---|
| Buffa Hypoxia Score | -0.0443 | 0.957 | 0.922 | 0.993 | 0.0186 | Each unit ↑ decreases hazard by 4.3% (protective) |
| Last Communication Contact Time | -0.0039 | 0.996 | 0.995 | 0.997 | 2.2e-19 | Longer time since diagnosis strongly lowers hazard |
| Fraction Genome Altered | -3.3045 | 0.037 | 0.0043 | 0.316 | 0.0026 | High fraction altered strongly protective |
| Progress Free Survival (months) | -0.0606 | 0.941 | 0.925 | 0.957 | 4.1e-12 | Longer PFS time decreases hazard |
| Ragnum Hypoxia Score | 0.0418 | 1.043 | 1.005 | 1.082 | 0.0249 | Higher score raises hazard by ~4.3% |
| Winter Hypoxia Score | 0.0466 | 1.048 | 1.015 | 1.081 | 0.0037 | Higher score raises hazard by ~4.8% |
Non-significant covariates (p > 0.05): Diagnosis age, aneuploidy score, MSI mantis score, MSIsensor score, mutation count, sex, tumor break load, TMB (nonsynonymous).
P-values test the null hypothesis β = 0 (no effect). Variables with p < 0.05 are statistically significant. The model contains highly significant predictors (e.g., last communication contact, p < 1e-18), some borderline variables (diagnosis age, p = 0.066), and clearly non-significant variables (sex, p = 0.44).
Confidence intervals (CI) provide plausible effect ranges. Significant variables have CIs for HR that do not cross 1, reinforcing the effect. For example, Buffa hypoxia score HR CI: 0.922 – 0.993 (protective). Non-significant covariates often have CIs including 1.
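The relationship between a coefficient, its hazard ratio, its confidence interval, and its p-value can be sketched in a few lines. The example below is a minimal illustration that assumes the reported intervals are Wald intervals (β ± 1.96 × SE on the log scale); the function name `wald_summary` is hypothetical and not part of the platform. Because the table values are rounded, the recovered p-value will only approximately match the reported one.

```python
import math

def wald_summary(beta, hr_lower, hr_upper, z_crit=1.96):
    """Recover SE, z, and a two-sided p-value from beta and its 95% HR interval.

    Assumes a Wald interval: log(HR) +/- z_crit * SE.
    """
    se = (math.log(hr_upper) - math.log(hr_lower)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return {
        "hr": math.exp(beta),
        "pct_change": 100 * (math.exp(beta) - 1),  # % change in hazard per unit
        "se": se,
        "z": z,
        "p": p,
        "ci_excludes_one": hr_lower > 1 or hr_upper < 1,
    }

# Buffa Hypoxia Score row from the Cox table above
buffa = wald_summary(beta=-0.0443, hr_lower=0.922, hr_upper=0.993)
print(buffa)
```

An interval that excludes 1 and a p-value below 0.05 are two views of the same Wald test, which is why they agree for every row in the table.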
The Cox model is semi-parametric and estimates hazard multiplicatively relative to baseline hazard. Although not directly reported, key functions include:
The baseline survival curve is available and visualized below.
To validate model assumptions and improve reliability, consider:
You can predict survival probabilities or hazard for new patients by:
1. Calculating the linear predictor: LP = β₁x₁ + β₂x₂ + ... + βₖxₖ for the new covariate values.
2. Computing the relative hazard: HR = exp(LP).
3. Using the baseline survival function S₀(t) (estimated from training data) scaled by the relative hazard to get predicted survival: S(t|x) = S₀(t)^{HR}.
4. Reading S₀(t) from the baseline survival plot (survival_curve.png) or retrieving the baseline hazard from the fitted model object in code.

In practice, software packages (e.g., lifelines, R's survival) provide functions for prediction, including confidence intervals.
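The prediction steps for the Cox model can be sketched as follows. This is a minimal illustration, not the platform's implementation: the coefficients are three rows taken from the Cox table, while the patient values and the baseline survival values are hypothetical placeholders. It also assumes the baseline curve corresponds to all covariates equal to zero; if the fitting software centers covariates, the new values must be centered the same way before computing LP.

```python
import math

def cox_predict_survival(betas, x_new, baseline_survival):
    """Return S(t|x) = S0(t) ** exp(LP) for each time in the baseline curve."""
    lp = sum(betas[name] * x_new[name] for name in betas)  # linear predictor
    hr = math.exp(lp)                                       # relative hazard
    return {t: s0 ** hr for t, s0 in baseline_survival.items()}

# Coefficients from the Cox table (subset, for illustration only)
betas = {"buffa": -0.0443, "ragnum": 0.0418, "winter": 0.0466}

# Hypothetical baseline survival at 12, 36, and 60 months
baseline = {12: 0.98, 36: 0.93, 60: 0.88}

# Hypothetical new patient
patient = {"buffa": 10, "ragnum": 5, "winter": 8}

predicted = cox_predict_survival(betas, patient, baseline)
for month, surv in predicted.items():
    print(month, round(surv, 3))
```

A patient with LP = 0 reproduces the baseline curve exactly; a positive LP pushes every point of the curve downward.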
The log-normal survival model assumes the log of survival time follows a normal distribution with mean μ and standard deviation σ. The location parameter μ is modeled as:
The scale parameter is modeled separately by the log of σ:
Each coefficient βᵢ represents the effect of one unit increase in the covariate on the log of survival time. A positive coefficient indicates longer expected survival time; a negative coefficient indicates shorter survival.
Exponentiated coefficients exp(β) are interpreted as multiplicative effects on survival time (not hazards). Values > 1 indicate longer survival time; < 1 indicate shorter.
Example: Diagnosis age has exp(β) = 0.9934, meaning each additional year of age multiplies expected survival time by 0.9934 (a decrease of about 0.66% per year).
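Because the effect is multiplicative on the time scale, the effect of a larger covariate difference compounds rather than adds. A small sketch, using the diagnosis age coefficient from the log-normal table (the helper name `time_ratio` is hypothetical):

```python
import math

def time_ratio(beta, delta=1.0):
    """Multiplicative change in expected survival time for a covariate change of delta."""
    return math.exp(beta * delta)

age_beta = -0.0066
print(round(time_ratio(age_beta), 4))       # one additional year of age
print(round(time_ratio(age_beta, 10), 4))   # ten additional years of age
```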
Significant predictors (p < 0.05) include:
All other covariates are not statistically significant predictors of survival time.
For significant variables, 95% CIs for coefficients exclude zero and for exp(β) exclude one, confirming significance.
The survival function S(t) gives the probability of surviving beyond time t, computed from the normal distribution of log(t) with mean μ and standard deviation σ.
The hazard function h(t) represents instantaneous risk at time t. For log-normal, hazard is non-monotonic, rising then falling.
The cumulative hazard function is the integral of the hazard over time (not shown explicitly).
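The three functions can be written out directly from the normal distribution of log(t). The sketch below is illustrative only: it uses the two intercepts from the log-normal table (μ = 4.289, σ = exp(-0.864)) with all covariates set to zero, which is a hypothetical reference patient rather than a real one.

```python
import math

def lognormal_functions(t, mu, sigma):
    """Return (S(t), h(t), H(t)) for a log-normal survival model."""
    z = (math.log(t) - mu) / sigma
    survival = 0.5 * math.erfc(z / math.sqrt(2))  # S(t) = 1 - Phi(z)
    density = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))
    hazard = density / survival                   # h(t) = f(t) / S(t)
    cum_hazard = -math.log(survival)              # H(t) = -log S(t)
    return survival, hazard, cum_hazard

mu, sigma = 4.289, math.exp(-0.864)
for t in (10, 100, 1000):
    s, h, big_h = lognormal_functions(t, mu, sigma)
    print(t, s, h, big_h)
```

Evaluating the hazard at an early, intermediate, and late time point shows the rise-then-fall shape described above.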
The scale parameter σ controls spread of log survival times and is modeled by the intercept under 'sigma_'.
Log-normal models do not have a separate shape parameter like Weibull models.
These indicate the model fits significantly better than a null model, but fit should be compared to alternative models.
No residuals (deviance, Cox-Snell, Martingale, Schoenfeld) or proportional hazards tests are reported.
Proportional hazards assumption is not relevant for this parametric log-normal model.
No diagnostics for linearity or time-dependent effects were performed.
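The model's c-index on the test data is 0.040, far below the 0.5 expected from random ranking. A value this close to 0 does not indicate uninformative predictions; it indicates that the predicted ordering is almost exactly reversed (equivalent to about 0.96 with the score direction flipped), which usually points to a sign or direction mismatch in how predictions were passed to the concordance calculation. The value should be rechecked before drawing conclusions about discrimination.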
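To make the meaning of the c-index concrete, the sketch below implements a simple version of Harrell's concordance on toy data (the data and the function are hypothetical and unrelated to the TCGA results). It shows that reversing the direction of the scores turns a c-index of C into 1 - C, which is why the direction of the score matters: predicted survival times and predicted risks rank patients in opposite orders.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable if patient i had an observed event before
    time j. It is concordant if patient i also has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties count as half
    return concordant / comparable

# Toy data: higher risk score should mean earlier event
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [4, 3, 2, 1]

c_correct = concordance_index(times, events, risk)
c_flipped = concordance_index(times, events, [-r for r in risk])
print(c_correct, c_flipped)
```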
| Variable | Coefficient (β) | exp(β) | p-value | Interpretation |
|---|---|---|---|---|
| Diagnosis age | -0.0066 | 0.9934 | 0.0188 | Each additional year reduces expected survival time by ~0.66%. |
| Last communication contact | 0.000138 | 1.00014 | 0.00319 | Longer follow-up associated with slightly longer survival. |
| MSI mantis score | -2.123 | 0.120 | 0.047 | Higher score strongly associated with shorter survival (negative coefficient). |
| Progress free survival (months) | 0.0158 | 1.0159 | 8.7e-16 | Longer progression-free survival strongly predicts longer overall survival. |
| Intercept (mu_) | 4.289 | 72.92 | ~0 | Baseline log survival time. |
| Intercept (sigma_) | -0.864 | 0.421 | ~0 | Scale parameter (log survival time variability). |
μ_new = β₀ + Σ(βᵢ × xᵢ_new)
σ = exp(γ₀)
S(t) = 1 - Φ[(log(t) - μ_new) / σ]
where Φ is the standard normal CDF.
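These formulas translate directly into code. The sketch below is illustrative: it uses the two intercepts and two of the coefficients from the log-normal table (diagnosis age and progression-free survival), so it is not the full fitted model, and the patient values are hypothetical.

```python
import math

def lognormal_survival(t, x_new, betas, intercept_mu, intercept_log_sigma):
    """S(t) = 1 - Phi((log t - mu_new) / sigma) for a new patient."""
    mu_new = intercept_mu + sum(betas[k] * x_new[k] for k in betas)
    sigma = math.exp(intercept_log_sigma)          # sigma = exp(gamma_0)
    z = (math.log(t) - mu_new) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))       # equals 1 - Phi(z)

# Subset of coefficients from the log-normal table (illustration only)
betas = {"age": -0.0066, "pfs_months": 0.0158}
intercept_mu = 4.289
intercept_log_sigma = -0.864

# Hypothetical new patient
patient = {"age": 60, "pfs_months": 24}

for t in (12, 36, 60, 120):
    s = lognormal_survival(t, patient, betas, intercept_mu, intercept_log_sigma)
    print(t, round(s, 3))
```

The predicted median survival time is exp(μ_new), since S(t) = 0.5 exactly when log(t) = μ_new.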
Coefficients represent effects on the scale parameter (λ). Positive β → increased hazard (shorter survival), negative β → decreased hazard (longer survival).
| Covariate | β (Coef.) | HR = exp(β) | 95% CI for HR | p-value | Interpretation |
|---|---|---|---|---|---|
| Diagnosis age | -0.0074 | 0.993 | (0.988, 0.997) | 0.00257 | Older age significantly lowers hazard (longer survival). |
| Buffa hypoxia score | 0.00637 | 1.006 | (1.0001, 1.013) | 0.045 | Weak significant increase in hazard with higher hypoxia. |
| Last communication contact | 0.000246 | 1.00025 | - | ≪ 0.001 | Very significant hazard increase with follow-up time. |
| MSI Mantis score | -1.703 | 0.182 | (0.034, 0.965) | 0.045 | High MSI Mantis score strongly reduces hazard. |
| Progress free survival (months) | (Not listed explicitly) | (Not listed) | - | ≈ 4.7e-18 | Strong positive association with hazard. |
| Ragnum hypoxia score | (Not listed explicitly) | (Not listed) | - | 0.046 | Weakly significant negative association (lower hazard). |
Other covariates such as aneuploidy score, fraction genome altered, sex, mutation count, etc., were not statistically significant (p > 0.05).
To predict survival probabilities or hazard for new patients:
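1. Compute the linear predictor: LP = β₁x₁ + β₂x₂ + ... + βₙxₙ.
2. Compute the scale parameter: λ = exp(LP) for the Weibull distribution.
3. Use the shape parameter ρ (estimated intercept, ~1.619).
4. Compute the survival probability at time t: S(t) = exp[-(λ * t)^ρ].
5. Compute the hazard at time t: h(t) = λ^ρ * ρ * t^(ρ-1).

Software packages like lifelines in Python or survival in R can automate these calculations.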
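The Weibull steps above can be sketched as follows. The code implements the formulas exactly as stated in this section, with ρ = 1.619 taken from the text; the linear predictor value is a hypothetical placeholder, since it depends on the full coefficient set and the patient's covariates.

```python
import math

def weibull_predict(t, lp, rho):
    """Return (S(t), h(t)) using S(t) = exp(-(lam*t)**rho) with lam = exp(lp)."""
    lam = math.exp(lp)
    survival = math.exp(-((lam * t) ** rho))
    hazard = (lam ** rho) * rho * t ** (rho - 1)
    return survival, hazard

rho = 1.619   # shape parameter reported in the text
lp = -4.0     # hypothetical linear predictor for a new patient

for t in (12, 36, 60):
    s, h = weibull_predict(t, lp, rho)
    print(t, round(s, 3), round(h, 5))
```

Under this parameterization, a one-unit increase in a covariate multiplies λ by exp(β) and the hazard by exp(ρβ). Other software may parameterize the Weibull differently (lifelines' WeibullAFTFitter, for instance, uses S(t) = exp[-(t/λ)^ρ]), so it is worth confirming the convention in use before reading the sign of a coefficient.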