CSGL: Chemical Synthesis Graph Learning for Molecule Representation / A. Li, E. Casiraghi, J. Rousu. - In: BIOINFORMATICS. - ISSN 1367-4811. - (2025), pp. 1-14. [Epub ahead of print] [10.1093/bioinformatics/btaf355]
CSGL: Chemical Synthesis Graph Learning for Molecule Representation
A. Li, E. Casiraghi, J. Rousu
2025
Abstract
Motivation: Molecule Representation Learning (MRL) maps molecules into a real vector space, and the resulting embeddings serve as input to downstream tasks in biology, chemistry, and computer science. This paper introduces a Chemical Synthesis Graph Learning (CSGL) framework, which enhances MRL by considering both the atomic structures of molecules and their roles in chemical reactions through a hierarchical graph representation. Specifically, molecules are first modeled based on their molecular graphs, which capture atomic-level structural information. They are then further refined using a Chemical Synthesis Graph, where nodes represent reactant and product molecule sets, and edges encode chemical transformations between reactants and products (e.g., changes in molecular structures). CSGL optimizes the molecular embeddings of reactant and product nodes so that they conform to a chemical balance constraint.

Results: Experimental results show that CSGL achieves strong performance on a variety of tasks, including product prediction, reaction classification, and molecular property prediction.

Availability: https://github.com/li-2023/CSGL

Contact: anchen.li@aalto.fi

Supplementary Information: Supplementary data are available at Bioinformatics online.

| File | Access | Type | Size | Format |
|---|---|---|---|---|
| CSJL_Anchen_Bioinformatics_btaf355.pdf | Open access | Post-print, accepted manuscript, etc. (version accepted by the publisher) | 3.72 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.




