Voxel Image Representation Learning of Crystalline Materials for Formation Energy Prediction

Jun 20, 2025
Figure. The overall design of the deep convolutional neural network and the fully connected neural network used in this study. The crystal structures are digitized into 3D colored sparse voxel images, which are input to a deep convolutional neural network. The network consists of 7 residual blocks arranged in sequence, combined with merging and pooling layers. The architecture of each residual block is shown in the inset; it includes a skip connection that carries the output of the previous block past the block's layers to the next. The latent features learned by the convolutional neural network are flattened and input to a fully connected neural network, which performs the final prediction of the formation energy.
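
The skip connection described in the caption can be illustrated with a minimal sketch. It is not the network from the study: it uses 1D signals instead of 3D voxel images, plain Python instead of a deep learning framework, and an identity skip with ReLU activations, all of which are assumptions made for brevity. The function names (`conv1d_same`, `residual_block`, `stack_blocks`) are hypothetical.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def conv1d_same(signal, kernel):
    """Zero-padded 1D cross-correlation that preserves the input length.

    Assumes an odd-length kernel so the padding is symmetric.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def residual_block(x, kernel_a, kernel_b):
    """Two convolutions plus an identity skip: out = relu(x + F(x))."""
    hidden = relu(conv1d_same(x, kernel_a))
    transformed = conv1d_same(hidden, kernel_b)
    # The skip connection adds the block input back onto the transformed signal.
    return relu([xi + ti for xi, ti in zip(x, transformed)])


def stack_blocks(x, kernel_a, kernel_b, n_blocks=7):
    """Apply n_blocks residual blocks in sequence (7, as in the caption).

    Sharing one kernel pair across blocks is a simplification of this sketch.
    """
    for _ in range(n_blocks):
        x = residual_block(x, kernel_a, kernel_b)
    return x
```

With all-zero kernels the convolutional path contributes nothing, so a non-negative input passes through every block unchanged. This is the property that lets a residual block fall back to the identity mapping.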

We demonstrate the use of visual image representation and deep convolutional neural networks (CNNs) in material property prediction. While machine learning methods have been used in materials research, the application of computer vision techniques remains relatively unexplored, and this study addresses that gap. By encoding crystalline materials as visual images and learning complex features with deep CNNs, our model accurately predicts the formation energy, a crucial material property.
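
As a rough illustration of what encoding a crystal as a sparse colored voxel image can look like, the sketch below maps atoms at fractional coordinates onto a cubic grid and stores only the occupied voxels. The grid size, the one-voxel-per-atom placement, and the three "color" channels (atomic number, group, period) are assumptions for illustration, not details confirmed by this summary. The function name `voxelize` is hypothetical.

```python
def voxelize(atoms, grid_size=32):
    """Map atoms onto a sparse voxel grid.

    atoms: iterable of (frac_x, frac_y, frac_z, channels), where the
           coordinates are fractional positions in the unit cell and
           channels is a tuple of per-atom "color" values (assumed here).
    Returns a dict {(i, j, k): channels} holding only occupied voxels.
    """
    voxels = {}
    for fx, fy, fz, channels in atoms:
        # Wrap coordinates into [0, 1) to respect periodicity, then bin them.
        # The final modulo guards against rounding up to grid_size.
        index = tuple(
            int((f % 1.0) * grid_size) % grid_size for f in (fx, fy, fz)
        )
        # Atoms that land in the same voxel overwrite each other in this sketch.
        voxels[index] = tuple(channels)
    return voxels


# Example: a two-atom cell with Na at the origin and Cl at the body center.
# Channels are (atomic number, group, period), an assumed encoding.
cell = [
    (0.0, 0.0, 0.0, (11, 1, 3)),
    (0.5, 0.5, 0.5, (17, 17, 3)),
]
image = voxelize(cell, grid_size=32)
```

Storing only occupied voxels in a dictionary keeps the representation sparse. A dense array of shape (grid, grid, grid, channels) could be built from it when a CNN framework requires one.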

Our research demonstrates that the underlying physicochemical information of materials can be learned solely from their visual representation, without additional physical descriptors. This finding highlights the untapped potential of computer vision techniques in materials science and opens new possibilities for applying machine learning in the field. Our work also suggests prospects for generative machine learning: with the visual image representation introduced here, materials with target properties could be reverse engineered.

Authors

Sara Kadkhodaei (University of Illinois-Chicago)

Additional Materials

Designing Materials to Revolutionize and Engineer our Future (DMREF)