METODI STATISTICI PER L'ANALISI E LA PREVISIONE DELLA MORTALITA' PER TUMORE

Rosso, T.

doi:10.13130/rosso-tiziana_phd2015-12-11

The introduction of time series modeling techniques has made it possible to analyze the different factors underlying changes in mortality and incidence rates over time, for both analytic and predictive purposes. Age-period-cohort analyses contribute to the etiologic purpose of descriptive epidemiology by making inference from the group to the individual possible. The term refers to a family of statistical techniques that study the temporal trends of outcomes, such as mortality and incidence, in terms of three temporal variables: the subject's age, the calendar period and the subject's birth cohort. Useful as it is, the age-period-cohort model is marred by a structural identifiability problem: the variables age, period and cohort are exactly linearly dependent, i.e. "age = period - cohort". Predicting a future event is a complex and insidious process; nevertheless, it is a useful endeavor in most human activities. Information on probable future trends, even if unreliable or imprecise, is highly valuable. Predicted future cancer incidence and mortality rates are essential tools for both epidemiology and health planning. Numerous methods for carrying out age-period-cohort analysis are described in the literature; three of them are illustrated in detail and compared by applying them to real data (WHO mortality database): a method based on penalized likelihood, one using generalized additive models (GAM) and one based on partial least squares (PLS) techniques. Predictive analysis techniques are also presented and compared using observed mortality data. Short-term age-period prediction methods based on joinpoint analysis and Bayesian modelling, and a long-term technique that uses a Bayesian age-period-cohort model, are reviewed. In detail, predictions with the age-period method based on joinpoint analysis are carried out by applying linear, Poisson and log-linear regression models.
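The identifiability problem mentioned above can be made concrete with a small numerical sketch. The 9 x 9 grid of age-group and period indices below is a hypothetical layout chosen for illustration, not the data structure used in the thesis; the point is only that a design matrix with linear age, period and cohort terms loses one rank.

```python
import numpy as np

# Hypothetical 9 x 9 grid of age-group and period indices (assumed 5-year bands).
age_idx, period_idx = np.meshgrid(np.arange(9), np.arange(9), indexing="ij")
age = age_idx.ravel().astype(float)
period = period_idx.ravel().astype(float)
cohort = period - age  # the exact linear dependence "age = period - cohort"

# Design matrix with an intercept and one linear term per time scale.
X = np.column_stack([np.ones_like(age), age, period, cohort])

# Four columns, but only three independent directions.
rank = np.linalg.matrix_rank(X)

# Adding any multiple of this vector to the coefficients (intercept, age,
# period, cohort) leaves the linear predictor X @ beta unchanged.
null_vector = np.array([0.0, 1.0, -1.0, 1.0])
max_change = np.abs(X @ null_vector).max()

print(rank, max_change)
```

Because the fitted values cannot distinguish between coefficient vectors that differ along this direction, any age-period-cohort method has to add a constraint or a penalty to select one solution.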
In the comparison of age-period-cohort analyses, the penalized likelihood and GAM methods produce similar results, while the effect estimates from the PLS model are noticeably different. These differences can be explained by looking at how the three models resolve the perfect collinearity between the age, period and cohort parameters. On the one hand, the penalized likelihood and GAM methods use different techniques to distribute the linear drift between the period and cohort effects. The PLS method, on the other hand, solves the identifiability problem through a generalized inverse, minimizing the variance-covariance matrix of the estimated parameters. Without a formal simulation analysis, comments are limited to stating that the two models based on distributing the linear drift are more suitable for epidemiological comparisons, where the effects of age are well defined (as in the case of cancer mortality) and the major problems lie in disentangling the period and cohort effects. The PLS model, on the other hand, may hypothetically prove to be a useful method for predicting future trends. Age-period-cohort analysis is thus an extremely useful tool in the study of mortality data, particularly for the analysis of cohort effects, but it should be used with due caution, since it is relatively easy to draw erroneous conclusions. The comparison of predictive methods shows that estimates from the different models are similar, especially for the Poisson and log-linear models. However, the linear model tends to underestimate, while the other models considered seem to overestimate, particularly as the forecasting horizon grows longer. Overall, the Bayesian age-period model seems to be less suitable for short- and medium-term mortality predictions, while the other models do not show large differences in performance.
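How a generalized inverse singles out one solution among the equivalent ones can be sketched as follows. This is not the PLS procedure compared in the thesis: it is a minimal illustration using the Moore-Penrose inverse on a hypothetical index grid with made-up coefficients, and it only shows that coefficient vectors differing by a linear drift fit identically while the pseudo-inverse returns the one of minimum norm.

```python
import numpy as np

# Same hypothetical 9 x 9 index grid as a stand-in for an age-by-period table.
age_idx, period_idx = np.meshgrid(np.arange(9), np.arange(9), indexing="ij")
age = age_idx.ravel().astype(float)
period = period_idx.ravel().astype(float)
X = np.column_stack([np.ones_like(age), age, period, period - age])

# Made-up coefficients (intercept, age, period, cohort) generating log rates.
beta_a = np.array([-8.0, 0.08, 0.01, -0.02])
log_rate = X @ beta_a

# A second coefficient vector that differs by a linear drift but fits identically.
drift = np.array([0.0, 1.0, -1.0, 1.0])
beta_b = beta_a + 0.05 * drift

# The Moore-Penrose inverse returns the minimum-norm member of this family.
# An explicit cutoff keeps the numerically zero singular value from being inverted.
beta_min = np.linalg.pinv(X, rcond=1e-10) @ log_rate

same_fit = np.allclose(X @ beta_b, log_rate) and np.allclose(X @ beta_min, log_rate)
print(same_fit, np.linalg.norm(beta_min) <= np.linalg.norm(beta_a))
```

Methods that distribute the drift between period and cohort choose a different member of the same family, which is one way to see why their effect estimates can differ from those of a generalized-inverse approach even when the fitted rates agree.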
From these limited tests, the linear model and the Bayesian age-period-cohort model seem to provide better estimates when mortality values are low, whereas for larger numbers the Poisson and log-linear models seem to be better choices. Ultimately, the unknown shape of the distribution underlying the analyzed data determines which model predicts more successfully. However, all the models studied are appropriate for predicting data over short periods (up to 5 years), while none of them performs well over the medium term. Prediction of future trends will always be a complex and insidious exercise, albeit an extremely useful one; the estimates obtained should therefore be taken with caution and regarded only as a general indication of potential interest for epidemiology and health planning.
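The difference between linear and log-linear extrapolation can be illustrated with a small sketch. The rates below are synthetic values generated for this example, not WHO data, and the fit uses ordinary least squares on a single trend segment rather than the joinpoint procedure of the thesis.

```python
import numpy as np

# Synthetic rates per 100,000 declining by about 3% per year (illustrative only).
years = np.arange(10)
rates = 20.0 * np.exp(-0.03 * years)
horizon = 14  # five years beyond the last observed year

# Linear model: rate = a + b * year.
b_lin, a_lin = np.polyfit(years, rates, 1)
pred_linear = a_lin + b_lin * horizon

# Log-linear model: log(rate) = a + b * year.
b_log, a_log = np.polyfit(years, np.log(rates), 1)
pred_loglinear = np.exp(a_log + b_log * horizon)

print(round(pred_linear, 2), round(pred_loglinear, 2))
```

On a declining, convex series such as this one, the straight line extrapolates below the log-linear curve, and the gap widens with the horizon. This agrees in direction with the tendency described above, but it is a property of the synthetic series and does not reproduce the thesis's comparison.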

METODI STATISTICI PER L'ANALISI E LA PREVISIONE DELLA MORTALITA' PER TUMORE / T. Rosso ; tutor: A. Decarli ; coordinator: A. Decarli. DIPARTIMENTO DI SCIENZE CLINICHE E DI COMUNITA', 2015 Dec 11. 28. ciclo, Anno Accademico 2015. [10.13130/rosso-tiziana_phd2015-12-11].