CSGL: Chemical Synthesis Graph Learning for Molecule Representation / A. Li, E. Casiraghi, J. Rousu. - In: BIOINFORMATICS. - ISSN 1367-4811. - (2025), pp. 1-14. [Epub ahead of print] [10.1093/bioinformatics/btaf355]

CSGL: Chemical Synthesis Graph Learning for Molecule Representation

E. Casiraghi (penultimate author); 2025

Abstract

Motivation: Molecule Representation Learning (MRL) translates molecules into a real vector space, serving as input to downstream tasks in biology, chemistry, and computer science. This paper introduces a Chemical Synthesis Graph Learning (CSGL) framework, which enhances MRL by considering both the atomic structures of molecules and their roles in chemical reactions through a hierarchical graph representation. Specifically, molecules are first modeled based on their molecular graphs, which capture atomic-level structural information. They are then further refined using a Chemical Synthesis Graph, where nodes represent reactant and product molecule sets, and edges encode chemical transformations between reactants and products (e.g., changes in molecular structures). CSGL optimizes molecular embeddings of reactant and product nodes in a fashion that ensures the embeddings conform to a chemical balance constraint. Results: Experimental results show that our method CSGL achieves strong performance on a variety of tasks, including product prediction, reaction classification, and molecular property prediction. Availability: https://github.com/li-2023/CSGL. Contact: anchen.li@aalto. Supplementary Information: Supplementary data are available at Bioinformatics online.
Sector INFO-01/A - Computer Science
Sector CHEM-04/A - Industrial Chemistry
2025
17 Jun 2025
Article (author)
Files in this item:
File: CSJL_Anchen_Bioinformatics_btaf355.pdf
Access: open access
Type: Post-print, accepted manuscript, etc. (version accepted by the publisher)
Size: 3.72 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1172715