Graphical models (GMs) for categorical data represent conditional independencies through graphs. Variables are represented by vertices, and the relationships among them by the presence or absence of edges; for details see Lauritzen (1996). Chain graphs (CGs) are graphs that can represent complex independence structures by grouping the variables into components. There are four types of GMs associated with CGs; see Drton (2009). In this work we analyse the GMs of type II (GMs II), proposed by Andersson, Madigan and Perlman (2001). We choose them for three reasons. First, grouping the variables into components makes it possible to split them into "purely explanatory" variables, "purely response" variables and "intervening" variables. Second, in GMs II the relationship between a variable and its explanatory variables is considered marginally with respect to the other variables in the same component. Finally, GMs II model the association among the variables within the same component using a log-linear approach. These features make GMs II among the most easily interpretable models. Unfortunately, Drton (2009) showed that these models are not always smooth. Because parametric marginal models for categorical data have properties that are useful for the asymptotic behaviour of the ML estimators, as shown by Bergsma and Rudas (2002), we study which GMs of type II can be parametrized as marginal models. We present a subclass of smooth GMs II with this property, obtained by applying Theorem 1 of Bergsma, Rudas and Németh (2011). Marginal models, obtained by parametrizing sets of marginal probability functions with log-linear parameters, are also useful because they can describe relationships among variables by constraining certain parameters to zero. To illustrate the main results on GMs II, we analyse data from the European Values Study (EVS, 2008).
The EVS is a research project on human values in Europe. In particular, it investigates how Europeans think about family, work, religion, politics and society. From this dataset we build several subsets, each collecting the observations on a different set of variables, in order to investigate different problems. In every subset we divide the variables into two or three groups. The first group contains the variables concerning the personal data of the respondents (e.g. sex, age range, country). The second group, when present, contains variables about the achievements of the respondents (e.g. education level, home ownership, employment, children). The last group contains the variables that record the opinions of the respondents on the main topics cited above (family, work, religion, politics and society). Each group of variables is represented by a component in the graphs. For every subset we propose several graphical models in order to find the most representative one. Applying this method to both the national datasets and the European dataset, we highlight some interesting trends in the opinions of European citizens. The analysis is carried out in the statistical software R, using the package "hmmm" (available from the Comprehensive R Archive Network at http://cran.r-project.org/web/packages/hmmm) for testing the marginal models and estimating the parameters, and the packages "gRbase" (http://cran.r-project.org/web/packages/gRbase) and "RBGL" (http://www.bioconductor.org/packages/release/bioc/html/RBGL.html) for the part concerning the graphs. The work is structured in two sections. In the first we give the basic concepts of the methodology: graphical models for chain graphs, marginal models, and the subclass of GMs II that is used. In the second we introduce the different datasets and present the applications to them, together with the main findings.
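The grouping of variables into components described above can be sketched in a few lines of Python. This is only an illustration, not the analysis of the paper (which uses the R packages named above): the function `classify_components`, the component names, the variable names and the arrows between components are all hypothetical choices made for this example. A component with outgoing arrows only is labelled purely explanatory, one with incoming arrows only purely response, and one with both intervening.

```python
from typing import Dict, List, Set, Tuple


def classify_components(
    components: Dict[str, List[str]],
    arrows: Set[Tuple[str, str]],
) -> Dict[str, str]:
    """Label each chain component by the role of its variables.

    components: component name -> variables in that component
    arrows: directed edges (parent component, child component)
    """
    roles = {}
    for name in components:
        # A component has parents if some arrow points into it,
        # and children if some arrow leaves it.
        has_parents = any(child == name for _, child in arrows)
        has_children = any(parent == name for parent, _ in arrows)
        if has_parents and has_children:
            roles[name] = "intervening"
        elif has_children:
            roles[name] = "purely explanatory"
        elif has_parents:
            roles[name] = "purely response"
        else:
            roles[name] = "isolated"
    return roles


# Hypothetical three-group layout mirroring the description in the text.
components = {
    "personal": ["sex", "age_range", "country"],
    "achievements": ["education", "house_owner", "employed", "children"],
    "opinions": ["family", "work", "religion", "politics", "society"],
}
# Assumed direction of dependence between the groups (not taken from the paper).
arrows = {
    ("personal", "achievements"),
    ("personal", "opinions"),
    ("achievements", "opinions"),
}

roles = classify_components(components, arrows)
for name, role in roles.items():
    print(f"{name}: {role}")
```

With only two groups (no achievements component), the same function would label the personal component purely explanatory and the opinions component purely response.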

Smooth Graphical models of type II: link with marginal models / F. Nicolussi - In: Proceedings of the 28th international workshop on statistical modelling. 2 / [edited by] M.R. V. Muggeo. - Palermo : Istituto poligrafico europeo, 2013. - ISBN 9788896251492. (Paper presented at the 28th International Workshop on Statistical Modelling, held in Palermo in 2013.)

Smooth Graphical models of type II: link with marginal models

F. Nicolussi
2013

Categorical data; chain graphs; EVS; hierarchical and complete marginal parametrization; log-linear parameters; Markov properties; smoothness
Sector SECS-S/01 - Statistics
Book Part (author)
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/576121