
[Analysis Example] Estimating Physical Properties Using σ-Profiles as Descriptors in Machine Learning

Predicting Glass Transition Temperature Using Machine-Learning-Estimated σ-Profiles as Descriptors

Objectives and Methods

The σ-profile describes the surface charge distribution of a molecule and is used in COSMO-RS [1] and COSMO-SAC [2]. Although these models can predict physical properties such as solubility and phase equilibria with high accuracy, generating a σ-profile requires a quantum chemical calculation, which is computationally expensive.
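To make "surface charge distribution" concrete, the minimal sketch below bins hypothetical surface segments (each with a screening charge density σ and an area) into a σ-profile. It assumes a 51-point grid from -0.025 to +0.025 e/Å², chosen to match the 51-dimensional σ-profile mentioned later in this example, and splits each segment's area between the two nearest grid points by linear interpolation. It omits the charge-averaging step applied in COSMO-SAC, so it is a simplified illustration, not the procedure used by J-OCTA or Gaussian; the function and data are hypothetical.

```python
import numpy as np

# Assumed sigma grid: 51 points from -0.025 to +0.025 e/Å^2 (step 0.001)
SIGMA_GRID = np.linspace(-0.025, 0.025, 51)

def sigma_profile(sigmas, areas, grid=SIGMA_GRID):
    """Bin surface segments into a sigma-profile p(sigma) (area per grid point).

    Each segment's area is split between the two nearest grid points by
    linear interpolation: the closer grid point receives the larger share."""
    sigmas = np.clip(np.asarray(sigmas, dtype=float), grid[0], grid[-1])
    areas = np.asarray(areas, dtype=float)
    step = grid[1] - grid[0]
    pos = (sigmas - grid[0]) / step                  # fractional grid index
    lower = np.minimum(np.floor(pos).astype(int), len(grid) - 2)
    frac = np.clip(pos - lower, 0.0, 1.0)            # share for the upper neighbour
    profile = np.zeros_like(grid)
    np.add.at(profile, lower, areas * (1.0 - frac))  # accumulate repeated indices
    np.add.at(profile, lower + 1, areas * frac)
    return profile

# Hypothetical segments: sigma in e/Å^2, area in Å^2 (-0.03 is clipped to the grid edge)
p = sigma_profile([0.0, 0.0105, -0.03], [2.0, 1.0, 0.5])
print(p.sum())  # total area is conserved by the interpolation
```

Linear interpolation, rather than assigning each segment to its nearest bin, keeps the profile smooth when a segment's σ falls between grid points.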

J-OCTA's descriptor calculation function estimates σ-profiles quickly using machine learning. Because data sets can then be built easily, σ-profiles become practical candidates for descriptors in property estimation.

In this case study, the descriptor calculation function is used to infer the σ-profile, which is then employed as a feature in machine learning to predict physical properties.

1 Prediction of σ-Profile Using GCN

The σ-profile is a spectrum-like distribution and is used to predict various physical properties. Obtaining it normally requires a quantum chemical calculation, but J-OCTA uses a machine learning model, a graph convolutional network (GCN), to estimate it rapidly from SMILES, a line notation for molecules [3].
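As an illustration of how a GCN can map a molecular graph to a σ-profile, the following NumPy sketch runs the forward pass of a simple graph convolutional network: symmetric-normalized adjacency with self-loops, ReLU, sum pooling over atoms, and a linear readout to 51 bins. The architecture, layer sizes, atom features, and function names are assumptions for illustration, not J-OCTA's model. The weights are random rather than trained, and the heavy-atom graph of acetamide is written by hand instead of being parsed from SMILES.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_sigma_profile(adj, features, weights, readout):
    """Hypothetical GCN forward pass: graph convolutions with ReLU,
    sum pooling over atoms, then a linear readout to 51 sigma bins."""
    a_norm = normalized_adjacency(adj)
    h = features
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)      # aggregate neighbours, transform, ReLU
    pooled = h.sum(axis=0)                       # order-independent molecule vector
    return np.maximum(pooled @ readout, 0.0)     # clip to non-negative bin areas

# Acetamide (CH3-C(=O)-NH2), heavy atoms only:
# 0 = CH3 carbon, 1 = carbonyl carbon, 2 = O, 3 = N
adj = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (1, 3)]:
    adj[i, j] = adj[j, i] = 1.0
x = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=float)  # one-hot [C, N, O]

rng = np.random.default_rng(0)                   # untrained, random weights
weights = [rng.normal(size=(3, 16)), rng.normal(size=(16, 16))]
readout = rng.normal(size=(16, 51))
profile = gcn_sigma_profile(adj, x, weights, readout)
print(profile.shape)
```

Because the atom embeddings are summed before the readout, renumbering the atoms leaves the predicted profile unchanged, which is a desirable property for a molecule-level descriptor. In practice the weights would be fitted to σ-profiles from quantum chemical calculations.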

We compared σ-profiles obtained from quantum chemical calculations with those estimated by the GCN. The quantum chemical calculations were performed with Gaussian.

Result

Figure 1 shows the results. The σ-profile estimated by the GCN agrees well with the one obtained from the quantum chemical calculation.
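The agreement in Figure 1 is judged visually. One simple way to quantify it, assuming both σ-profiles are sampled on the same grid, is to compute the root-mean-square error (RMSE) and the cosine similarity between them. The function name and the toy arrays below are hypothetical; they are not data from Figure 1.

```python
import numpy as np

def profile_agreement(p_ref, p_pred):
    """Return (RMSE, cosine similarity) between two sigma-profiles on one grid."""
    p_ref = np.asarray(p_ref, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    rmse = float(np.sqrt(np.mean((p_ref - p_pred) ** 2)))
    cosine = float(p_ref @ p_pred / (np.linalg.norm(p_ref) * np.linalg.norm(p_pred)))
    return rmse, cosine

# Toy profiles standing in for a quantum chemical reference and a GCN estimate
p_qc = np.array([0.0, 1.0, 3.0, 1.0, 0.0])
p_gcn = np.array([0.0, 1.2, 2.8, 1.0, 0.0])
rmse, cosine = profile_agreement(p_qc, p_gcn)
print(f"RMSE = {rmse:.3f}, cosine similarity = {cosine:.4f}")
```

RMSE reports the error in the profile's own units (area per bin), while cosine similarity compares only the shape and ignores an overall scale difference.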

Figure 1: (a) Acetamide and (b) σ-profile of acetamide.
The blue line represents quantum chemical calculation, while the orange line represents machine learning prediction.

2 Prediction of Glass Transition Temperature Using XGBoost

We present an example in which the σ-profile, a descriptor of the COSMO method, is used as a machine learning feature to predict a physical property. The J-OCTA machine learning function (MI-Suite) was used to prepare the features and build a model that predicts the glass transition temperature (Tg). The Tg data set was taken from [4] and contains the SMILES string and Tg value of each compound.

The procedure for learning and predicting Tg is as follows:

  1. The descriptor calculation function was used to estimate the σ-profile of each compound from its SMILES.
  2. A model was trained with the estimated σ-profile as the input and Tg as the target. The training settings were as follows:
    • XGBoost, a gradient-boosting method, was selected from the learning methods supported by MI-Suite's learning function.
    • The data were split into training and test sets in a ratio of 8:2.
  3. The estimated σ-profiles were mapped using UMAP [5] and color-coded by Tg value.
    • UMAP is a dimensionality reduction method that projects high-dimensional data into a low-dimensional space, making it easier to visualize and interpret.
    • In this case, the 51-dimensional σ-profile was mapped to a two-dimensional space.
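The training step in the procedure above can be sketched as follows. The sketch assumes the estimated σ-profiles are already stacked into a matrix `X` (one 51-value row per compound) with Tg in `y`. Because the XGBoost package may not be installed, a closed-form ridge regression stands in for XGBoost; it is a different, linear model, used here only to show the 8:2 split and the R² evaluation. The data are synthetic, and all function and class names are hypothetical.

```python
import numpy as np

def train_test_split_82(X, y, seed=0):
    """Shuffle rows and split them 80 % training / 20 % test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(round(0.8 * len(y)))
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], X[te], y[tr], y[te]

def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

class RidgeStandIn:
    """Closed-form ridge regression, used here in place of XGBoost."""
    def __init__(self, alpha=1e-3):
        self.alpha = alpha

    def fit(self, X, y):
        Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
        reg = self.alpha * np.eye(Xb.shape[1])
        reg[-1, -1] = 0.0                           # do not penalize the bias
        self.coef_ = np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)
        return self

    def predict(self, X):
        return np.hstack([X, np.ones((len(X), 1))]) @ self.coef_

# Synthetic stand-ins: 200 "sigma-profiles" and a linear "Tg" (K) with noise
rng = np.random.default_rng(42)
X = rng.random((200, 51))
y = 300.0 + 100.0 * X @ rng.normal(size=51) + rng.normal(scale=1.0, size=200)

X_tr, X_te, y_tr, y_te = train_test_split_82(X, y)
model = RidgeStandIn().fit(X_tr, y_tr)
print("train R2:", r2_score(y_tr, model.predict(X_tr)))
print("test  R2:", r2_score(y_te, model.predict(X_te)))
```

The stand-in follows the same `fit`/`predict` pattern as `xgboost.XGBRegressor`, so that class could be substituted when the package is available; its hyperparameters would then need tuning on the training set only.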

Result

Figure 2 shows the training results. Measured by the coefficient of determination, the prediction accuracy was R² = 0.999 for the training data and R² = 0.942 for the test data. The mapping results are shown in Figure 3. Tg varies along the x-axis, indicating that the feature captured by the x-axis influences Tg, whereas the y-axis captures another feature that has little relation to Tg. Comparing the σ-profiles of points that differ mainly in their x value shows that the x-axis primarily reflects molecular polarity.
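One way to check the polarity interpretation numerically is to compute, for each σ-profile, the fraction of surface area whose |σ| exceeds a hydrogen-bonding cutoff, and then correlate that fraction with the x coordinates of the map. The minimal sketch below assumes a 51-point grid from -0.025 to +0.025 e/Å² and uses the cutoff of 0.0084 e/Å² from the original COSMO-SAC model. This polar-area fraction is a hypothetical proxy introduced for illustration; it is not the quantity that the UMAP x-axis encodes.

```python
import numpy as np

SIGMA_GRID = np.linspace(-0.025, 0.025, 51)  # assumed 51-bin grid, e/Å^2
SIGMA_HB = 0.0084                            # COSMO-SAC hydrogen-bonding cutoff, e/Å^2

def polar_area_fraction(profile, grid=SIGMA_GRID, cutoff=SIGMA_HB):
    """Fraction of the molecular surface area whose |sigma| exceeds the cutoff."""
    profile = np.asarray(profile, dtype=float)
    polar = np.abs(grid) > cutoff             # bins counted as strongly polar
    return float(profile[polar].sum() / profile.sum())

# Toy profiles: all area at sigma = 0 versus half of it at sigma = +0.015
nonpolar = np.zeros(51)
nonpolar[25] = 10.0
polar = nonpolar.copy()
polar[40] = 10.0
print(polar_area_fraction(nonpolar), polar_area_fraction(polar))
```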

Figure 2: Tg prediction results.
The vertical axis represents predicted values, while the horizontal axis represents experimental values.

Figure 3: Mapping results and σ-profiles corresponding to each point.
Each point represents a substance, and the color indicates the respective Tg value. The x-axis and y-axis represent different characteristics. In this figure, the Tg value varies along the x-axis, suggesting that the characteristics represented by the x-axis are influential. The σ-profiles of the three substances are shown, and each σ-profile exhibits a characteristic shape.
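The mapping step can be illustrated with a short sketch. UMAP itself is provided by the umap-learn package (`umap.UMAP(n_components=2).fit_transform(X)`); to keep this example dependency-free, it substitutes principal component analysis (PCA) computed via SVD. PCA is linear and generally produces a different layout from UMAP, so the sketch shows only the shape of the step: a 51-column matrix of σ-profiles in, two coordinates per compound out. The data and function name are hypothetical.

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto their first two principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                          # center each sigma bin
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T                             # coordinates in the top-2 PC basis

rng = np.random.default_rng(1)
profiles = rng.random((30, 51))                      # synthetic stand-ins for 30 sigma-profiles
coords = pca_2d(profiles)                            # shape (30, 2)
print(coords.shape)
```

The resulting coordinates could then be drawn as a scatter plot colored by each compound's Tg, as in Figure 3.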
