Informing Zeolite Synthesis Enabled by Natural Language Processing


In this contribution, we have developed methods to accelerate the successful synthesis of new catalysts. We have built an automated method to extract and combine body text and table information from published literature on the synthesis of zeolites, an industrially significant class of catalyst materials.
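
As a minimal sketch of what combining body text and table information can mean, the Python below attaches a synthesis condition found in one sentence of body text to every row of a composition table from the same article. Everything in it is an assumption for illustration: the table contents, the sample sentence, the function names (`table_to_records`, `extract_temperature`, `merge_text_into_table`), and the use of a single regular expression, which stands in for the natural language processing models a real pipeline would use. It is not the extractor released at synthesisproject.org.

```python
import re
from typing import Dict, List, Optional

# Hypothetical, invented inputs: one parsed table and one body-text sentence
# from the same imaginary article. H2O/T is water per tetrahedral atom.
TABLE_HEADER = ["Sample", "Si/Ge", "H2O/T", "Product"]
TABLE_ROWS = [
    ["1", "2", "10", "framework A"],
    ["2", "5", "15", "framework B"],
]
BODY_TEXT = "All gels were heated at 175 °C for 7 days."


def table_to_records(header: List[str], rows: List[List[str]]) -> List[Dict[str, str]]:
    """Turn each table row into a dict keyed by its column header."""
    return [dict(zip(header, row)) for row in rows]


def extract_temperature(text: str) -> Optional[float]:
    """Return the first temperature written as '<number> °C', or None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*°C", text)
    return float(match.group(1)) if match else None


def merge_text_into_table(records: List[Dict[str, str]], text: str) -> List[Dict[str, object]]:
    """Attach a condition stated once in the text to every row of the table."""
    temperature = extract_temperature(text)
    merged = []
    for record in records:
        row: Dict[str, object] = dict(record)
        row["Si/Ge"] = float(record["Si/Ge"])    # numeric columns become floats
        row["H2O/T"] = float(record["H2O/T"])
        row["temperature_C"] = temperature        # shared, text-derived condition
        merged.append(row)
    return merged


if __name__ == "__main__":
    for row in merge_text_into_table(table_to_records(TABLE_HEADER, TABLE_ROWS), BODY_TEXT):
        print(row)
```

In this sketch a condition stated once in the prose is copied onto every table row, so each row becomes a self-contained synthesis record; a real pipeline would also have to decide which rows a given sentence actually refers to.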

These tools are important because they move the field closer to predicting and designing synthesis routes for zeolites. Researchers have previously estimated the energetic feasibility of several million unique hypothetical zeolite structures, but only about 200 have been synthesized, and far fewer are commercially available.

By developing machine learning models on this automatically extracted data, we identify key variables for synthesizing low-framework-density zeolites, including the Si/Ge ratio, the gel concentration, and the volume of the organic templating molecule. This work is enabled by linkages between theory, synthesis, and machine learning.
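
One simple, interpretable way to see which synthesis variables separate outcomes is to score each variable by how well a single threshold on it classifies the products. The Python sketch below does this with decision stumps on a tiny invented dataset. The variable names (`si_ge`, `h2o_t`, `osda_volume`), the label `low_fd` (1 for a low-framework-density product), the values, and the stump-based ranking are all illustrative assumptions, not the models or data of this work.

```python
from typing import Dict, List, Tuple

# Invented toy data (not from the literature): three synthesis variables and a
# binary label, 1 if the product is a low-framework-density zeolite.
TOY_ROWS: List[Dict[str, float]] = [
    {"si_ge": 1, "h2o_t": 5, "osda_volume": 200, "low_fd": 1},
    {"si_ge": 2, "h2o_t": 20, "osda_volume": 150, "low_fd": 1},
    {"si_ge": 10, "h2o_t": 8, "osda_volume": 250, "low_fd": 0},
    {"si_ge": 20, "h2o_t": 15, "osda_volume": 120, "low_fd": 0},
]


def best_stump_accuracy(values: List[float], labels: List[int]) -> float:
    """Best accuracy of a one-threshold rule on a single variable.

    Rule A predicts 1 when value <= threshold; rule B is its complement,
    so its accuracy is 1 minus rule A's accuracy.
    """
    n = len(values)
    best = 0.0
    for threshold in sorted(set(values)):
        correct = sum((v <= threshold) == bool(y) for v, y in zip(values, labels))
        best = max(best, correct / n, (n - correct) / n)
    return best


def rank_features(rows: List[Dict[str, float]], label_key: str) -> List[Tuple[str, float]]:
    """Rank every non-label variable by its best single-threshold accuracy."""
    labels = [int(row[label_key]) for row in rows]
    features = [key for key in rows[0] if key != label_key]
    scores = [(f, best_stump_accuracy([row[f] for row in rows], labels)) for f in features]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_features(TOY_ROWS, "low_fd"):
        print(f"{name}: {score:.2f}")
```

A single stump ignores interactions between variables; tree ensembles such as random forests extend the same idea by combining many deeper splits.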

We have made the data and tools developed for this work fully open source and available on our website, synthesisproject.org.

The release includes the table extractor, along with a tutorial and description, as well as the data extracted to build the models described here. Other researchers across a broad range of disciplines can adapt these tools to extract information automatically from the literature. In this way, we are enabling broader use of the information contained in published scientific manuscripts.

Related Software Innovation Tool: The Synthesis Project 

Designing Materials to Revolutionize and Engineer our Future (DMREF)