NOVEL METHODS TO EXTRACT COSMOLOGICAL INFORMATION FROM GALAXY REDSHIFT SURVEYS / M. S. Cagliari ; tutor: B. R. Granett, L. Guzzo ; coordinator: R. Vecchi. Dipartimento di Fisica, Università degli Studi di Milano, 2024 Mar 27. 36th cycle, Academic Year 2022/2023.

NOVEL METHODS TO EXTRACT COSMOLOGICAL INFORMATION FROM GALAXY REDSHIFT SURVEYS

M.S. Cagliari
2024

Abstract

After decades of successes, the ΛCDM standard cosmological model is facing the first cracks in its structure. The nature of the two most abundant components of the Universe, dark energy and dark matter, still eludes our understanding, and persistent discrepancies have emerged between early- and late-time measurements of some cosmological parameters. To clarify whether these tensions point to a deeper problem in the ΛCDM model, and hopefully to understand the meaning of its key ingredients, a new generation of cosmological surveys has just started. A key probe of the cosmological model is the large-scale distribution of structures in the Universe. The cosmic web carries information on late-time parameters, such as the cosmological constant or the equation of state of dark energy, and provides the means to determine, among others, the matter fraction of the Universe, the amplitude of the linear matter power spectrum, and the neutrino mass. For this reason, the amount of data available for large-scale structure studies has steadily increased since the 1980s, and it is now about to make a further leap forward thanks to fourth-generation galaxy surveys such as Euclid, the Dark Energy Spectroscopic Instrument (DESI), and the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). Compared with previous surveys, these experiments will observe larger volumes and measure photometric and spectroscopic information for an unprecedented number of galaxies. Standard analysis methods will become sub-optimal in terms of data management, in both memory and time, data modelling, and information extraction. To achieve such ambitious goals, it is mandatory to develop new methods to study the data and to improve their management at all levels of the analysis pipelines. In particular, to meet the requirements on the precision and accuracy of cosmological parameters, we need to efficiently select the samples to be analysed, to measure redshifts with high confidence, and to correctly model summary statistics at all scales.

The primary interest of my work is the development of alternative algorithms to improve the extraction of scientific information from large-scale galaxy surveys. The focus is on machine learning-based models, but I also study the potential of more standard methods, such as optimal quadratic estimators. In the first part of this thesis, I develop and discuss two algorithms that exploit galaxy photometric information to measure redshifts and to select samples for clustering analyses. First, I present a novel method that exploits the angular correlation of galaxies to improve photometric redshift measurements. We developed a graph neural network that classifies close angular pairs of galaxies, based on their photometric properties, as true or false physical neighbours. The algorithm is especially useful when the spectroscopic information of one of the galaxies in the pair is known: in this case, the graph neural network helps identify catastrophic errors in the redshift measurements, reducing the dispersion of the final photometric sample by a factor of 2 and the fraction of catastrophic errors by a factor of ∼4. The method is complementary to traditional techniques based on spectral energy distribution fitting, and it also helps break the degeneracies in colour-redshift space to which standard algorithms are prone.
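To make the pair-classification idea concrete, the sketch below builds close angular pairs with a k-d tree and trains a simple classifier on the concatenated photometry of each pair. It is a toy stand-in, not the thesis implementation: a plain per-pair MLP replaces the full graph neural network, and the mock data, the neighbour criteria, and the network size are illustrative assumptions.

# Toy sketch: classify close angular pairs of galaxies as true/false
# physical neighbours from photometric features (illustrative only).
import numpy as np
import torch
import torch.nn as nn
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
n_gal, n_bands = 1000, 5
ra_dec = rng.uniform(0.0, 1.0, size=(n_gal, 2))   # mock angular positions [deg]
photometry = rng.normal(size=(n_gal, n_bands))    # mock magnitudes/colours
z_true = rng.uniform(0.5, 1.5, size=n_gal)        # mock true redshifts

# Close angular pairs (graph edges), flat-sky approximation with a toy radius.
pairs = cKDTree(ra_dec).query_pairs(r=0.02, output_type="ndarray")
# Toy label: a pair is a true physical neighbour if the redshift separation is small.
labels = (np.abs(z_true[pairs[:, 0]] - z_true[pairs[:, 1]]) < 0.05).astype(np.float32)

# Pair classifier: an MLP over the concatenated photometry of the two galaxies.
# (A graph neural network would additionally aggregate information from neighbouring pairs.)
edge_features = np.concatenate([photometry[pairs[:, 0]], photometry[pairs[:, 1]]], axis=1)
model = nn.Sequential(nn.Linear(2 * n_bands, 64), nn.ReLU(),
                      nn.Linear(64, 1))           # logit of P(true neighbour)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

x = torch.tensor(edge_features, dtype=torch.float32)
y = torch.tensor(labels).unsqueeze(1)
for epoch in range(100):
    optimiser.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimiser.step()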
Secondly, I explore the efficiency of machine learning classifiers for galaxy photometric selection tasks. The aim of this work is to improve the purity and completeness of the Euclid galaxy clustering spectroscopic sample using photometric information. I compare the performance of six machine learning classifiers against traditional photometric selection methods based on colour and magnitude cuts. The results show that machine learning algorithms, in particular neural networks and support vector classifiers, can identify more intricate boundaries in the multidimensional colour-magnitude space than standard techniques. Combining the spectroscopic selection with the neural network photometric selection improves the redshift purity of the final sample by approximately 20% when using Euclid photometry alone and by approximately 50% when Euclid photometry is combined with ground-based photometry.

In the second part of the thesis, I report my work on cosmological parameter measurements with galaxy clustering data, presenting two alternatives to traditional approaches. I first present my work on the optimal quadratic estimator of the signal of local primordial non-Gaussianity (PNG), parameterised by f_NL, from the large-scale structure of the Universe. The analysis makes use of optimal redshift weights that maximise the response of the tracers to the possible presence of non-zero PNG. Analysing the power spectrum monopole of the quasar sample of the latest data release of the extended Baryon Oscillation Spectroscopic Survey (eBOSS), I obtain one of the most stringent constraints on local PNG from large-scale structure data to date. This method not only mitigates the bias in the results but also yields more precise bounds, with an estimated error on f_NL of σ_fNL ∼ 16, corresponding to an improvement of approximately 13% over the standard approach. In scenarios where quasars exhibit a lower response to local PNG, the optimal constraint gives σ_fNL ∼ 21, an improvement of around 30% over standard analyses. This work is a first step towards high-precision f_NL measurements from large-scale structure data, which will enable us to better understand the dynamics of inflation.

Finally, I discuss a preliminary study on the application of convolutional neural networks to a field-level analysis of large-scale structure data. The investigation is currently confined to dark matter halo distributions; however, it applies a realistic survey geometry to generate the training data and uses observational information, such as halo angular positions and redshifts, to construct the network inputs. A novelty is that the training data for the convolutional neural network are generated with a third-order Lagrangian perturbation theory (3LPT) code, which produces halo catalogues much faster than an N-body simulation. I assess the network performance on both 3LPT and N-body simulations to determine its ability to generalise across simulation types. Preliminary findings indicate that, in both real and redshift space, with a field pixelisation of ∼10 Mpc/h, the convolutional neural network produces comparable results for 3LPT and N-body simulations. The possibility of training machine learning algorithms for field-level analyses with fast simulations is of major importance: it would greatly reduce the computational cost of these methods, making them a competitive alternative to traditional approaches.
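As an illustration of what a field-level network could look like, the sketch below regresses cosmological parameters from a pixelised halo density field with a 3D convolutional network in PyTorch. The grid size, channel counts, and the choice of two target parameters are illustrative assumptions, not the architecture used in the thesis.

# Toy sketch: 3D CNN mapping a pixelised halo density field to cosmological parameters.
import torch
import torch.nn as nn

class FieldLevelCNN(nn.Module):
    def __init__(self, n_params: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),               # summary independent of the exact grid size
        )
        self.head = nn.Linear(64, n_params)        # e.g. (Omega_m, sigma_8), an assumed target set

    def forward(self, density):
        # density: (batch, 1, Nx, Ny, Nz) halo counts in ~10 Mpc/h cells, built from
        # angular positions and redshifts assuming a fiducial cosmology.
        return self.head(self.features(density).flatten(1))

# Toy usage: a single mock field on a 64^3 grid.
field = torch.randn(1, 1, 64, 64, 64)
params = FieldLevelCNN()(field)

The adaptive pooling layer makes the pooled summary independent of the exact grid dimensions, so the same network can in principle be applied to fields built from different survey footprints.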
29 March 2024
Academic discipline: FIS/05 - Astronomy and Astrophysics
cosmology; large-scale structure of the Universe; galaxy surveys; methods: statistical; methods: data analysis; galaxies: distances and redshifts; cosmological parameters from LSS
GRANETT, BENJAMIN R.
VECCHI, ROBERTA
Doctoral Thesis
Files in this record:
phd_unimi_R12808.pdf (open access) - Description: thesis full text - Type: other - Format: Adobe PDF - Size: 15.8 MB

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1039931