Rethinking Molecule Synthesizability with Chain-of-Reaction

Published: arXiv: 2509.16084v1
Authors

Seul Lee, Karsten Kreis, Srimukh Prasad Veccham, Meng Liu, Danny Reidenbach, Saee Paliwal, Weili Nie, Arash Vahdat

Abstract

A well-known pitfall of molecular generative models is that they are not guaranteed to generate synthesizable molecules. There have been considerable attempts to address this problem, but given the exponentially large combinatorial space of synthesizable molecules, existing methods have shown limited coverage of the space and poor molecular optimization performance. To tackle these problems, we introduce ReaSyn, a generative framework for synthesizable projection where the model explores the neighborhood of given molecules in the synthesizable space by generating pathways that result in synthesizable analogs. To fully utilize the chemical knowledge contained in the synthetic pathways, we propose a novel perspective that views synthetic pathways akin to reasoning paths in large language models (LLMs). Specifically, inspired by chain-of-thought (CoT) reasoning in LLMs, we introduce the chain-of-reaction (CoR) notation that explicitly states reactants, reaction types, and intermediate products for each step in a pathway. With the CoR notation, ReaSyn can get dense supervision in every reaction step to explicitly learn chemical reaction rules during supervised training and perform step-by-step reasoning. In addition, to further enhance the reasoning capability of ReaSyn, we propose reinforcement learning (RL)-based finetuning and goal-directed test-time compute scaling tailored for synthesizable projection. ReaSyn achieves the highest reconstruction rate and pathway diversity in synthesizable molecule reconstruction and the highest optimization performance in synthesizable goal-directed molecular optimization, and significantly outperforms previous synthesizable projection methods in synthesizable hit expansion. These results highlight ReaSyn's superior ability to navigate combinatorially-large synthesizable chemical space.

Paper Summary

Problem
Molecular generative models are not guaranteed to produce synthesizable molecules: many of their outputs have no practical synthetic route, which limits their use in real-world applications such as drug discovery. Existing approaches to the problem cover only a small part of the combinatorially large space of synthesizable molecules and perform poorly at molecular optimization.
Key Innovation
ReaSyn is a generative framework for synthesizable projection: given a molecule, it generates synthetic pathways that lead to synthesizable analogs in that molecule's neighborhood. Its central idea is to treat a synthetic pathway like a chain-of-thought (CoT) reasoning path in an LLM. The chain-of-reaction (CoR) notation explicitly states the reactants, reaction type, and intermediate product of every step in a pathway, so the model receives dense supervision at each step, learns chemical reaction rules during supervised training, and reasons step by step. Reinforcement learning (RL)-based finetuning and goal-directed test-time compute scaling further strengthen this reasoning.
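To make the CoR idea concrete, here is a minimal sketch of a pathway in which every step records its reactants, reaction type, and product. The class name `ReactionStep`, the helper functions, the `[START]`/`[RXN:...]`/`[END]` token format, and the example chemistry are all assumptions made for illustration; the paper's actual notation and vocabulary are not given in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReactionStep:
    """One step of a pathway (hypothetical structure, not the paper's format)."""
    reactants: List[str]  # SMILES strings of the reactants
    reaction: str         # reaction-type label (illustrative naming)
    product: str          # SMILES of the intermediate or final product


def to_chain_of_reaction(steps: List[ReactionStep]) -> List[str]:
    """Flatten a pathway into a token sequence that spells out every step."""
    tokens = ["[START]"]
    for step in steps:
        tokens.extend(step.reactants)
        tokens.append(f"[RXN:{step.reaction}]")
        tokens.append(step.product)  # intermediate product is stated explicitly
    tokens.append("[END]")
    return tokens


def is_connected(steps: List[ReactionStep]) -> bool:
    """Check that each step consumes the product of the previous step."""
    return all(prev.product in cur.reactants for prev, cur in zip(steps, steps[1:]))


# Illustrative two-step pathway: amide formation, then aromatic bromination.
pathway = [
    ReactionStep(["CC(=O)O", "Nc1ccccc1"], "amide_coupling", "CC(=O)Nc1ccccc1"),
    ReactionStep(["CC(=O)Nc1ccccc1", "BrBr"], "aromatic_bromination",
                 "CC(=O)Nc1ccc(Br)cc1"),
]

tokens = to_chain_of_reaction(pathway)
print(" ".join(tokens))
```

Because the intermediate product appears after each reaction token, a sequence model trained on such a string has a prediction target at every step, not only at the final molecule. That is the sense in which the abstract describes the supervision as dense.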
Practical Impact
Because ReaSyn generates a synthetic pathway along with each molecule, the molecules it proposes are more likely to be makeable in practice, which could shorten the drug discovery cycle. The paper reports the highest reconstruction rate and pathway diversity in synthesizable molecule reconstruction, the highest optimization performance in synthesizable goal-directed molecular optimization, and significantly better results than previous synthesizable projection methods in hit expansion. The same approach may also carry over to other fields that rely on molecular generation, such as materials science.
Analogy / Intuitive Explanation
Think of a complex puzzle that can only be solved by making moves in a specific order, where each move builds on the result of the one before. CoR notation treats chemical synthesis the same way: it breaks a synthesis into individual reaction steps and records the result of each, so ReaSyn can reason through a pathway one step at a time instead of jumping straight to the final molecule.
Paper Information
Categories: cs.LG
arXiv ID: 2509.16084v1