
A paper our research group published in NAR this year (IF: 16.971):

ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties

Paper link:

Access URL:

1. Overview

Since undesirable pharmacokinetics and toxicity of candidate compounds are the main reasons for failure in drug development, it is widely accepted that absorption, distribution, metabolism, excretion, and toxicity (ADMET) should be evaluated as early as possible. In silico ADMET evaluation models have been developed as auxiliary tools to help medicinal chemists design and optimize leads. Here we announce the release of ADMETlab 2.0, a completely redesigned version of the widely used ADMETlab web server for predicting pharmacokinetic and toxicity properties of chemicals, in which the number of supported ADMET-related endpoints is roughly twice that of the previous version: 17 physicochemical properties, 13 medicinal-chemistry properties, 23 ADME properties, 27 toxicity endpoints, and 8 toxicophore rules. A multi-task graph attention framework (MGA) was employed to develop robust and accurate models in ADMETlab 2.0. A batch-computation module is provided in response to user requests, and the presentation of results has been further optimized.

Figure 1. Workflow scheme of ADMETlab 2.0

2. New Developments

1. Comprehensively enhanced ADMET profiles

In this update, the available ADMET profile is extended to 88 related characteristics spanning 7 categories, roughly twice the number in its predecessor. Compared with the initial version, the number of entries used for model training has almost tripled.

2. Re-engineered modules and batch evaluation support

The functional modules were re-engineered and optimized to improve the user experience. An independent module has been added to support batch uploading and downloading. Users can define their own criteria to screen for promising and desirable molecules, as in the sketch below.
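
For instance, applying user-defined criteria to a downloaded batch result could look like the following pandas sketch; the column names and thresholds here are illustrative assumptions, not the server's actual export schema.

```python
# A minimal sketch (not the server's actual code) of applying user-defined
# screening criteria to a batch-result CSV downloaded from ADMETlab 2.0.
# Column names such as "LogP" and "hERG Blockers" are assumptions for
# illustration; check the header of your own download.
import pandas as pd

results = pd.read_csv("admetlab2_batch_results.csv")

# Example user-defined criteria: drug-like lipophilicity and low hERG risk.
criteria = results["LogP"].between(0, 3) & (results["hERG Blockers"] < 0.5)

promising = results[criteria]
promising.to_csv("promising_molecules.csv", index=False)
print(f"{len(promising)} of {len(results)} molecules pass the criteria")
```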

3. Robust and accurate MGA models

The MGA framework was employed to develop classification and regression predictors simultaneously. Deep learning makes multi-task learning natural, and the combination leads to improved performance for many of the modeled endpoints.

4. Practical explanation and guidance

A detailed explanation and the optimal range of each property are provided to help users get a complete ADMET picture of an input molecule. The empirically based decision state of each property is visually represented by a colored dot (green: excellent; yellow: medium; red: poor), as in the sketch below.
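
A minimal sketch of this traffic-light convention, mapping a predicted probability of an undesirable outcome to a colored dot; the 0.3 and 0.7 cutoffs are illustrative assumptions, not the thresholds ADMETlab 2.0 actually uses.

```python
# Map a predicted probability (for an undesirable outcome) to a colored dot.
# Cutoffs of 0.3 and 0.7 are illustrative assumptions only.
def decision_dot(probability: float) -> str:
    if probability < 0.3:
        return "green"   # excellent
    if probability < 0.7:
        return "yellow"  # medium
    return "red"         # poor

print(decision_dot(0.12))  # green
print(decision_dot(0.85))  # red
```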

3. Program Description

1. Model data

Table 1. Dataset information for the 53 predictive models

Property Total (Positive/Negative) Training set (Positive/Negative) Test set (Positive/Negative) Validation set (Positive/Negative)
LogS 4797 3836 480 481
LogD7.4 10370 8296 1036 1038
LogP 12682 10145 1270 1267
Caco-2 Permeability 2464 1970 247 247
MDCK Permeability 1140 912 114 114
Pgp-inhibitor 2209 (1315/894) 1764 (1051/713) 222 (132/90) 223 (132/91)
Pgp-substrate 1185 (586/599) 949 (471/478) 118 (58/60) 118 (57/61)
HIA 1160 (1022/138) 927 (818/109) 116 (101/15) 117 (103/14)
F20% 992 (753/239) 794 (602/192) 98 (75/23) 100 (76/24)
F30% 992 (666/326) 793 (532/261) 99 (67/32) 100 (67/33)
PPB 4712 3771 479 480
VD 1086 872 107 107
BBB Penetration 2865 (1651/1254) 2324 (1321/1003) 290 (165/125) 291 (165/126)
Fu 2575 2059 258 258
CYP1A2 inhibitor 12635 (5876/6759) 10111 (4702/5425) 1261 (588/673) 1263 (586/677)
CYP1A2 substrate 366 (176/190) 292 (140/152) 37 (18/19) 37 (18/19)
CYP2C19 inhibitor 12611 (5770/6841) 10096 (4618/5478) 1257 (577/680) 1258 (575/683)
CYP2C19 substrate 258 (107/151) 206 (85/121) 26 (11/15) 26 (11/15)
CYP2C9 inhibitor 12111 (4017/8094) 9686 (3213/6473) 1213 (402/811) 1212 (402/810)
CYP2C9 substrate 811 (325/486) 647 (259/388) 82 (33/49) 82 (33/49)
CYP2D6 inhibitor 13073 (2535/10538) 10471 (2032/8439) 1304 (255/1051) 1298 (250/1048)
CYP2D6 substrate 877 (435/442) 703 (347/356) 85 (44/41) 89 (44/45)
CYP3A4 inhibitor 12339 (5092/7247) 9880 (4074/5806) 1232 (510/722) 1227 (508/719)
CYP3A4 substrate 979 (497/482) 786 (397/389) 97 (49/48) 96 (51/45)
CL 831 666 81 84
T1/2 1219 (500/719) 973 (399/574) 124 (51/73) 122 (50/72)
hERG Blockers 13845 (6922/6923) 11076 (5538/5538) 1384 (692/692) 1385 (692/693)
H-HT 2304 (1299/1005) 1850 (1044/806) 227 (128/99) 227 (127/100)
DILI 467 (235/232) 373 (187/186) 47 (24/23) 47 (24/23)
AMES Toxicity 7575 (4222/3353) 6071 (3389/2682) 751 (416/335) 753 (417/336)
Rat Oral Acute Toxicity 7327 (2799/4528) 5862 (2240/3622) 733 (280/453) 732 (279/453)
FDAMDD 1197 (561/636) 957 (448/509) 120 (56/64) 120 (57/63)
Skin Sensitization 405 (274/131) 324 (219/105) 40 (27/13) 41 (28/13)
Carcinogenicity 1041 (516/525) 832 (413/419) 104 (51/53) 105 (52/53)
Bioconcentration Factor 676 540 68 68
IGC50 1787 1429 179 179
LC50FM 816 652 82 82
LC50DM 347 277 35 35
Eye Corrosion 2298 (886/1412) 1838 (709/1129) 230 (89/141) 230 (84/142)
Eye Irritation 5219 (3874/1345) 4176 (3099/1077) 522 (388/134) 521 (387/134)
Respiratory Toxicity 1388 (835/553) 1109 (666/443) 139 (84/55) 140 (85/55)
NR-AR 7312 (266/7046) 5853 (213/5640) 726 (26/700) 733 (27/706)
NR-AR-LBD 6862 (233/6629) 5493 (186/5307) 688 (23/665) 681 (24/657)
NR-AhR 6603 (763/5840) 5285 (610/4675) 657 (77/580) 661 (76/585)
NR-Aromatase 5887 (256/5631) 4711 (205/4506) 588 (25/563) 588 (26/562)
NR-ER 6166 (669/5497) 4935 (536/4399) 616 (66/550) 615 (67/548)
NR-ER-LBD 7052 (342/6710) 5643 (274/5369) 701 (33/668) 708 (35/673)
NR-PPAR-gamma 6586 (197/6389) 5266 (158/5108) 661 (19/642) 659 (20/639)
SR-ARE 5652 (865/4787) 4521 (691/3830) 564 (87/477) 567 (87/480)
SR-ATAD5 7170 (249/6921) 5736 (199/5537) 718 (25/693) 716 (25/691)
SR-HSE 6319 (360/5959) 5059 (289/4770) 630 (35/595) 630 (36/594)
SR-MMP 5913 (892/5021) 4735 (713/4022) 592 (91/501) 586 (88/498)
SR-p53 6915 (456/6459) 5543 (364/5179) 692 (46/646) 680 (46/634)

2. MGA framework

An overview of the Multi-task Graph Attention (MGA) framework is shown in Figure 2: MGA is composed of the input, relational graph convolutional network (RGCN) layers, an attention layer, and fully connected (FC) layers. In the input, each node carries the information of one atom; after passing through the RGCN layers, each node encodes general features of the circular substructure centered on that atom. RGCN extends the standard graph convolutional network (GCN) by introducing edge features to enrich the messages used to update the hidden states in the network. The propagation rule for each node in an RGCN layer is calculated via

$h_{v}^{(l+1)}=\sigma\left(\sum_{r \in R} \sum_{u \in N_{v}^{r}} W_{r}^{(l)} h_{u}^{(l)}+W_{0}^{(l)} h_{v}^{(l)}\right)$

where $h_{v}^{(l+1)}$ is the state vector of target node v after l+1 iterations, and $N_{v}^{r}$ denotes the neighbors of node v under relation (edge type) $r \in R$. $W_{r}^{(l)}$ is the weight for a neighbor node u connected to node v by an edge of relation $r \in R$, and $W_{0}^{(l)}$ is the weight for target node v itself. As can be seen above, the edge information is explicitly incorporated into an RGCN through the relation $r \in R$. The weight $W_{r}^{(l)}$ is a linear combination of basis transformations.
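
A minimal dense PyTorch sketch of this propagation rule, for illustration only: real implementations (e.g. dgl.nn.RelGraphConv) use sparse message passing and basis decomposition, and σ is taken as ReLU here by assumption.

```python
# A dense-adjacency sketch of one RGCN layer: sum_r sum_u W_r h_u + W_0 h_v,
# followed by a nonlinearity (ReLU here, as an assumption).
import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, num_relations: int):
        super().__init__()
        # One weight matrix W_r per edge relation, plus W_0 for the self-loop.
        self.w_rel = nn.Parameter(torch.randn(num_relations, in_dim, out_dim) * 0.01)
        self.w_self = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h:   (N, in_dim) node states h_u^(l)
        # adj: (R, N, N) adjacency; adj[r, v, u] = 1 if u -> v under relation r
        msg = torch.einsum("rvu,uf,rfo->vo", adj, h, self.w_rel)
        return torch.relu(msg + self.w_self(h))
```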

As shown in Figure 2B, attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The attention weights and customized fingerprints are generated as follows:

$\omega_{v}=\sigma\left(W \cdot h_{v}+\mathrm{bias}\right)$
$CFP=\sum_{v=1}^{N} \omega_{v} \cdot h_{v}$

where W and bias are the parameters of attention layers learned in model training, N is the number of nodes (substructures), $\omega_{v}$ is the attention weight of node (substructure) v, and $h_{v}$ is the general feature of node (substructure) v.
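
A minimal PyTorch sketch of this readout, taking σ as the logistic sigmoid (an assumption; the text does not pin down the nonlinearity here):

```python
# Per-node attention weights from a shared linear layer, then a weighted
# sum of node features yields the customized fingerprint (CFP) for a task.
import torch
import torch.nn as nn

class AttentiveReadout(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)  # learns W and bias

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (N, feat_dim) general node (substructure) features
        weights = torch.sigmoid(self.attn(h))   # (N, 1), omega_v per node
        return (weights * h).sum(dim=0)         # (feat_dim,), the CFP
```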

As shown in Figure 2A, fully connected (FC) layers predict the corresponding tasks based on the customized fingerprints. The classification and regression tasks adopt different loss functions (loss_c and loss_r) as follows:

$\text{loss}_{c}=\sum_{n=1}^{N} \sum_{c=1}^{C}-\left[p_{c}\, y_{n, c} \log \sigma\left(x_{n, c}\right)+\left(1-y_{n, c}\right) \log \left(1-\sigma\left(x_{n, c}\right)\right)\right]$
$\text{loss}_{r}=\sum_{n=1}^{N} \sum_{r=1}^{R}\left(x_{n, r}-y_{n, r}\right)^{2}$

where $x_{n, c}$ is the predicted value of molecule n for classification task c, $y_{n, c}$ is the true value of molecule n for classification task c, $p_{c}$ is the weight of positive samples, $x_{n, r}$ is the predicted value of molecule n for regression task r, $y_{n, r}$ is the true value of molecule n for regression task r, N is the number of molecules, C is the number of classification tasks, and R is the number of regression tasks.

The loss function of MGA is a combination of loss_c and loss_r:

$\text{loss}=\text{loss}_{c}+\text{loss}_{r}$
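
In PyTorch this combined objective can be sketched as follows (ignoring the per-task label masking that a real multi-task dataset with missing labels would require):

```python
# Weighted binary cross-entropy on logits for classification tasks plus
# squared error for regression tasks, summed as loss = loss_c + loss_r.
import torch
import torch.nn.functional as F

def mga_loss(x_cls, y_cls, pos_weight, x_reg, y_reg):
    # x_cls, y_cls: (N, C) logits and 0/1 labels; pos_weight: (C,) = p_c
    # x_reg, y_reg: (N, R) predictions and targets
    loss_c = F.binary_cross_entropy_with_logits(
        x_cls, y_cls, pos_weight=pos_weight, reduction="sum")
    loss_r = F.mse_loss(x_reg, y_reg, reduction="sum")
    return loss_c + loss_r
```
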
Figure 2. An overview of the Multi-task Graph Attention framework.

3. Model performance

Table 2. Predictive performance of regression models

Property Test set (R2, RMSE, MAE) Validation set (R2, RMSE, MAE) Training set (R2, RMSE, MAE)
LogS 0.854 0.850 0.588 0.871 0.814 0.555 0.967 0.399 0.287
LogD7.4 0.892 0.462 0.347 0.901 0.457 0.345 0.950 0.305 0.236
LogP 0.957 0.357 0.256 0.957 0.387 0.261 0.980 0.257 0.193
Caco-2 Permeability 0.746 0.307 0.222 0.786 0.296 0.203 0.943 0.152 0.117
MDCK Permeability 0.731 0.291 0.199 0.662 0.301 0.233 0.934 0.140 0.105
PPB 0.733 0.135 0.083 0.744 0.155 0.091 0.961 0.054 0.037
VD 0.782 0.670 0.457 0.785 0.637 0.409 0.895 0.492 0.330
Fu 0.763 0.367 0.263 0.778 0.354 0.258 0.861 0.268 0.197
CL 0.678 3.375 2.240 0.692 2.956 1.883 0.977 0.740 0.556
Bioconcentration Factor 0.786 0.603 0.435 0.779 0.641 0.508 0.929 0.365 0.280
IGC50 0.723 0.496 0.335 0.860 0.356 0.270 0.920 0.305 0.232
LC50FM 0.745 0.863 0.643 0.660 0.693 0.536 0.918 0.423 0.324
LC50DM 0.524 0.994 0.692 0.909 0.496 0.386 0.950 0.398 0.319

Table 3. Predictive performance of classification models

Property Test set (AUC, ACC, SP, SEN, MCC) Validation set (AUC, ACC, SP, SEN, MCC) Training set (AUC, ACC, SP, SEN, MCC)
Pgp-inhibitor 0.922 0.867 0.844 0.882 0.723 0.912 0.836 0.769 0.882 0.657 1.000 0.994 0.993 0.994 0.987
Pgp-substrate 0.840 0.768 0.705 0.828 0.538 0.901 0.840 0.853 0.828 0.680 1.000 1.000 1.000 1.000 1.000
HIA 0.866 0.924 0.800 0.942 0.687 0.944 0.949 0.867 0.961 0.785 1.000 0.988 1.000 0.987 0.950
F20% 0.833 0.750 0.680 0.773 0.414 0.905 0.842 0.760 0.868 0.599 1.000 0.995 1.000 0.993 0.987
F30% 0.848 0.802 0.794 0.806 0.580 0.797 0.800 0.727 0.836 0.555 1.000 0.998 1.000 0.996 0.994
BBB Penetration 0.908 0.862 0.824 0.891 0.718 0.920 0.852 0.810 0.885 0.698 0.992 0.957 0.948 0.964 0.912
CYP1A2 inhibitor 0.928 0.852 0.848 0.857 0.704 0.948 0.886 0.876 0.896 0.771 0.972 0.914 0.898 0.932 0.828
CYP1A2 substrate 0.737 0.649 0.632 0.667 0.298 0.842 0.816 0.800 0.833 0.632 0.985 0.936 0.942 0.929 0.871
CYP2C19 inhibitor 0.913 0.839 0.813 0.869 0.679 0.925 0.854 0.825 0.889 0.712 0.952 0.877 0.845 0.916 0.758
CYP2C19 substrate 0.758 0.654 0.667 0.636 0.300 0.926 0.741 0.688 0.818 0.497 0.974 0.928 0.894 0.977 0.859
CYP2C9 inhibitor 0.919 0.841 0.823 0.878 0.671 0.905 0.820 0.792 0.876 0.635 0.960 0.880 0.849 0.942 0.755
CYP2C9 substrate 0.725 0.707 0.776 0.606 0.386 0.785 0.744 0.816 0.636 0.461 0.967 0.904 0.911 0.894 0.801
CYP2D6 inhibitor 0.892 0.824 0.823 0.828 0.558 0.882 0.809 0.816 0.780 0.515 0.973 0.884 0.866 0.958 0.715
CYP2D6 substrate 0.847 0.775 0.733 0.818 0.553 0.775 0.663 0.600 0.727 0.330 0.947 0.893 0.849 0.937 0.788
CYP3A4 inhibitor 0.921 0.832 0.825 0.841 0.659 0.921 0.842 0.824 0.869 0.683 0.960 0.891 0.869 0.922 0.781
CYP3A4 substrate 0.776 0.713 0.820 0.608 0.437 0.802 0.753 0.760 0.745 0.505 0.948 0.887 0.920 0.855 0.776
T1/2 0.801 0.727 0.658 0.827 0.478 0.822 0.744 0.750 0.736 0.481 0.948 0.869 0.822 0.938 0.746
hERG Blockers 0.943 0.889 0.869 0.909 0.778 0.947 0.889 0.866 0.912 0.778 0.984 0.936 0.919 0.954 0.873
H-HT 0.814 0.720 0.814 0.650 0.461 0.750 0.675 0.735 0.630 0.362 0.975 0.895 0.976 0.835 0.802
DILI 0.924 0.894 0.826 0.958 0.793 0.849 0.708 0.583 0.833 0.430 0.998 0.981 0.984 0.979 0.963
AMES Toxicity 0.902 0.807 0.732 0.865 0.606 0.876 0.797 0.753 0.831 0.586 0.976 0.917 0.869 0.955 0.832
ROA Toxicity 0.853 0.778 0.769 0.793 0.549 0.846 0.795 0.826 0.744 0.567 0.986 0.936 0.923 0.957 0.868
FDAMDD 0.804 0.736 0.734 0.737 0.471 0.869 0.787 0.766 0.810 0.575 0.986 0.946 0.926 0.970 0.894
Skin Sensitization 0.707 0.775 0.539 0.889 0.462 0.901 0.854 0.692 0.929 0.652 0.991 0.966 0.952 0.973 0.923
Carcinogenicity 0.788 0.731 0.623 0.843 0.476 0.694 0.619 0.566 0.673 0.240 0.974 0.909 0.876 0.942 0.817
Eye Corrosion 0.983 0.957 0.965 0.944 0.908 0.982 0.965 0.958 0.977 0.928 1.000 0.995 0.995 0.994 0.989
Eye Irritation 0.982 0.952 0.918 0.964 0.876 0.963 0.931 0.904 0.941 0.825 0.996 0.974 0.983 0.971 0.834
Respiratory Toxicity 0.828 0.764 0.732 0.786 0.514 0.906 0.850 0.836 0.859 0.689 0.989 0.956 0.960 0.954 0.909
NR-AR 0.886 0.890 0.896 0.731 0.348 0.778 0.881 0.898 0.444 0.201 0.991 0.911 0.908 0.986 0.506
NR-AR-LBD 0.915 0.936 0.942 0.783 0.472 0.967 0.948 0.952 0.833 0.545 0.996 0.962 0.960 0.995 0.666
NR-AhR 0.943 0.862 0.858 0.896 0.573 0.873 0.828 0.840 0.737 0.435 0.975 0.891 0.882 0.962 0.655
NR-Aromatase 0.852 0.849 0.859 0.615 0.264 0.895 0.888 0.898 0.654 0.340 0.985 0.914 0.910 0.995 0.552
NR-ER 0.771 0.815 0.845 0.567 0.320 0.781 0.847 0.877 0.603 0.394 0.946 0.885 0.889 0.853 0.587
NR-ER-LBD 0.850 0.903 0.918 0.618 0.364 0.832 0.892 0.907 0.600 0.340 0.987 0.915 0.911 0.993 0.572
NR-PPAR-gamma 0.893 0.896 0.901 0.750 0.344 0.957 0.884 0.887 0.800 0.345 0.989 0.918 0.916 0.994 0.495
SR-ARE 0.863 0.827 0.850 0.701 0.469 0.852 0.841 0.871 0.678 0.483 0.954 0.891 0.888 0.905 0.675
SR-ATAD5 0.874 0.919 0.929 0.640 0.361 0.882 0.913 0.923 0.640 0.348 0.991 0.936 0.934 0.995 0.573
SR-HSE 0.907 0.868 0.875 0.750 0.393 0.855 0.885 0.898 0.667 0.384 0.985 0.908 0.903 0.990 0.582
SR-MMP 0.927 0.897 0.908 0.835 0.660 0.933 0.880 0.896 0.791 0.607 0.979 0.924 0.918 0.957 0.766
SR-p53 0.881 0.841 0.849 0.723 0.365 0.889 0.844 0.846 0.809 0.411 0.982 0.885 0.878 0.995 0.566

Table 4. Results of leave-cluster-out validation of regression models

Property R2 MAE RMSE
LogS 0.826 0.654 0.855
LogD7.4 0.873 0.409 0.537
LogP 0.961 0.295 0.387
Caco-2 Permeability 0.613 0.343 0.464
MDCK Permeability 0.424 0.415 0.494
PPB 0.769 8.577 0.134
VD 0.392 0.783 1.371
Fu 0.720 0.281 0.368
CL 0.301 3.034 4.437
BCF 0.368 0.789 1.06
IGC50 0.743 0.402 0.549
LC50FM 0.710 0.641 0.892
LC50DM 0.719 0.772 1.006

Table 5. Results of leave-cluster-out validation of classification models

Property ACC AUC MCC
Pgp-inhibitor 0.871 0.939 0.719
Pgp-substrate 0.808 0.887 0.618
HIA 0.935 0.959 0.753
F20% 0.811 0.757 0.392
F30% 0.770 0.763 0.355
BBB Penetration 0.809 0.886 0.573
CYP1A2 inhibitor 0.900 0.96 0.800
CYP1A2 substrate 0.722 0.796 0.438
CYP2C19 inhibitor 0.869 0.920 0.693
CYP2C19 substrate 0.657 0.706 0.333
CYP2C9 inhibitor 0.875 0.908 0.608
CYP2C9 substrate 0.621 0.651 0.248
CYP2D6 inhibitor 0.843 0.907 0.561
CYP2D6 substrate 0.716 0.788 0.436
CYP3A4 inhibitor 0.862 0.936 0.704
CYP3A4 substrate 0.787 0.868 0.58
T1/2 0.665 0.729 0.334
hERG Blockers 0.892 0.957 0.754
H-HT 0.625 0.677 0.232
DILI 0.767 0.877 0.556
AMES Toxicity 0.748 0.829 0.497
ROA Toxicity 0.774 0.825 0.509
FDAMDD 0.751 0.846 0.520
Skin Sensitization 0.577 0.689 0.247
Carcinogenicity 0.550 0.594 0.128
Eye Corrosion 0.881 0.960 0.759
Eye Irritation 0.953 0.970 0.721
Respiratory Toxicity 0.839 0.904 0.661
NR-AR 0.847 0.925 0.508
NR-AR-LBD 0.912 0.955 0.575
NR-AhR 0.785 0.906 0.532
NR-Aromatase 0.758 0.841 0.250
NR-ER 0.742 0.641 0.147
NR-ER-LBD 0.782 0.813 0.242
NR-PPAR-gamma 0.898 0.897 0.283
SR-ARE 0.782 0.809 0.351
SR-ATAD5 0.844 0.836 0.196
SR-HSE 0.814 0.841 0.332
SR-MMP 0.798 0.880 0.477
SR-p53 0.778 0.849 0.358
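
Leave-cluster-out validation withholds entire clusters of structurally similar molecules at test time, so no near neighbor of a test compound appears in training. A minimal RDKit sketch of such a split (our own illustration, not the authors' published splitting code; the 0.6 distance cutoff is an assumption):

```python
# Cluster molecules by fingerprint similarity (Butina clustering), then hold
# out whole clusters as the test fold.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from rdkit.ML.Cluster import Butina

def leave_cluster_out_split(smiles_list, cutoff=0.6):
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, 2048) for m in mols]
    # Condensed distance matrix (1 - Tanimoto) expected by Butina.ClusterData.
    dists = []
    for i in range(1, len(fps)):
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
        dists.extend(1.0 - s for s in sims)
    clusters = Butina.ClusterData(dists, len(fps), cutoff, isDistData=True)
    # Hold out the largest cluster as the test fold, train on the rest.
    test_idx = set(max(clusters, key=len))
    train_idx = [i for i in range(len(fps)) if i not in test_idx]
    return train_idx, sorted(test_idx)
```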

4. Implementation

ADMETlab 2.0 was built using the Django Python web framework and deployed on an elastic compute service from Aliyun running Ubuntu Linux. Web access is enabled via the Nginx web server, and the interactions between Django and the proxy server are handled by uWSGI. The application follows the Model-View-Template (MVT) pattern. The model layer maps the business objects to the database objects. The view layer is the business logic layer, responsible for accessing the deep learning models, delivering the data to be shown to the template layer, and handling the upload and download of files; a view of this kind is sketched below. The template layer provides the visualization of results, page rendering, integration of documentation, etc. The uploaded and downloaded files, pre-trained models, and model predictions are stored on the server. The prediction models were built in Python, using the deep learning packages PyTorch and DGL; the RDKit package provides various cheminformatics support. The server has been successfully tested on recent versions of Mozilla Firefox, Google Chrome, and Apple Safari.
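
A hedged sketch of a view in the MVT layout described above: the view receives a SMILES string, calls into the model layer, and hands the predictions to a template for rendering. All names here (predict_view, run_mga_model, "result.html") are illustrative assumptions, not ADMETlab 2.0's actual code.

```python
from django.shortcuts import render

def run_mga_model(smiles: str) -> dict:
    # Placeholder for the call into the pre-loaded PyTorch/DGL predictor.
    return {}

def predict_view(request):
    # Business-logic layer: take the submitted molecule, run the model,
    # and pass the results to the template layer for rendering.
    smiles = request.POST.get("smiles", "")
    predictions = run_mga_model(smiles)
    return render(request, "result.html",
                  {"smiles": smiles, "predictions": predictions})
```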

Table 6. The development environment of ADMETlab 2.0

Third party library Version
rdkit 2019.03.1
django 2.2
dgl 0.5.2
dgllife 0.2.5
pytorch 1.6.0
torchvision 0.7.0
pyecharts 1.8.1

5. Browser Compatibility

OS Version Chrome Edge Firefox Safari
Linux Ubuntu 18.04.5 LTS 87.0.4280.141 n/a 82.0.2 n/a
MacOS Catalina 10.15.6 87.0.4280.141 n/a 84.0.1 13.1.2
Windows 10 88.0.4324.104 88.0.705.53 84.0.2 n/a

4. References

  1. Xiong, G., Wu, Z., Yi, J., Fu, L., Yang, Z., Hsieh, C., Yin, M., Zeng, X., Wu, C., Lu, A., Chen, X., Hou, T., & Cao, D. ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Res, 2021, doi: 10.1093/nar/gkab255.
