Parametric and nonparametric two-sample tests for feature screening in class comparison: a simulation study / E. Landoni, F. Ambrogi, L. Mariani, R. Miceli. - In: EPIDEMIOLOGY BIOSTATISTICS AND PUBLIC HEALTH. - ISSN 2282-0930. - 13:2(2016), pp. e11808.1-e11808.11. [10.2427/11808]

Parametric and nonparametric two-sample tests for feature screening in class comparison: a simulation study

E. Landoni; F. Ambrogi; L. Mariani; R. Miceli
2016

Abstract

Background: Identifying a test that is sensitive to location, scale and shape, in order to detect differentially expressed features between two comparison groups, is a key point in high-dimensional studies. The most commonly used tests target differences in location, but general distributional discrepancies might be important for revealing differential biological processes.

Methods: A simulation study was conducted to compare the performance of a set of two-sample tests under different distributional patterns: Student's t, Welch's t, Wilcoxon-Mann-Whitney (WMW), Podgor-Gastwirth PG2, Cucconi, Kolmogorov-Smirnov (KS), Cramér-von Mises (CvM), Anderson-Darling (AD) and Zhang tests (ZK, ZC and ZA). The same tests were applied to a real data example.

Results: The AD, CvM, ZA and ZC tests proved to be the most sensitive in mixture distribution patterns, while still maintaining high power in normal distribution patterns. At best, the AD test showed a power loss of ~2% in the comparison of two normal distributions, but a gain of ~32% with mixture distributions, with respect to the parametric tests. Accordingly, the AD test detected the greatest number of differentially expressed features in the real data application.

Conclusion: The tests for the general two-sample problem introduce a more general concept of 'differential expression', thus overcoming the limitations of the other tests, which are restricted to specific moments of the feature distributions. In particular, the AD test should be considered a powerful alternative to the parametric tests for feature screening, in order to keep as many discriminative features as possible for the class prediction analysis.
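
The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes SciPy as the software, covers only the tests that SciPy provides (Student's t, Welch's t, WMW, KS, CvM and AD; the PG2, Cucconi and Zhang tests are omitted), and uses hypothetical sample sizes, mixture parameters and replicate counts chosen purely for illustration.

```python
import warnings

import numpy as np
from scipy import stats


def p_values(x, y):
    """P-values of several two-sample tests for samples x and y."""
    with warnings.catch_warnings():
        # anderson_ksamp warns when its interpolated p-value is capped
        # (SciPy restricts it to the range [0.001, 0.25]).
        warnings.simplefilter("ignore")
        _, _, ad_p = stats.anderson_ksamp([x, y])
    return {
        "Student t": stats.ttest_ind(x, y, equal_var=True).pvalue,
        "Welch t": stats.ttest_ind(x, y, equal_var=False).pvalue,
        "WMW": stats.mannwhitneyu(x, y, alternative="two-sided").pvalue,
        "KS": stats.ks_2samp(x, y).pvalue,
        "CvM": stats.cramervonmises_2samp(x, y).pvalue,
        "AD": ad_p,
    }


def simulate_power(draw_x, draw_y, n_sim=200, alpha=0.05, seed=0):
    """Estimate power as the fraction of replicates with p < alpha."""
    rng = np.random.default_rng(seed)
    rejections = {}
    for _ in range(n_sim):
        for name, p in p_values(draw_x(rng), draw_y(rng)).items():
            rejections[name] = rejections.get(name, 0) + int(p < alpha)
    return {name: count / n_sim for name, count in rejections.items()}


def normal(mean, sd, n):
    """Sampler for n draws from a normal distribution."""
    return lambda rng: rng.normal(mean, sd, n)


def mixture(n, frac, shift):
    """Sampler where a fraction `frac` of draws is shifted by `shift`."""
    def draw(rng):
        shifted = rng.random(n) < frac
        return rng.normal(0.0, 1.0, n) + shift * shifted
    return draw


# Hypothetical scenarios: a pure location shift, and a group in which
# only a subset of samples is shifted (a two-component mixture).
scenarios = {
    "normal shift": (normal(0.0, 1.0, 30), normal(0.8, 1.0, 30)),
    "mixture": (normal(0.0, 1.0, 30), mixture(30, 0.3, 3.0)),
}
for label, (draw_x, draw_y) in scenarios.items():
    print(label, simulate_power(draw_x, draw_y))
```

Because SciPy caps the AD p-value at 0.001 and 0.25, this sketch is only suitable for rejection decisions at conventional significance levels such as 0.05, not for ranking features by very small p-values.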
Keywords: high-dimensional data; class comparison; location-scale problem; general two-sample problem; mixtures
Sector MED/01 - Medical Statistics
2016
Article (author)
Files in this record:
Parametric and nonparametric two-sample tests for feature screening in class comparison.pdf

Open access

Type: Publisher's version/PDF
Size: 479.66 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/637766