ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties

A paper our group published in NAR (Nucleic Acids Research) this year, IF: 16.971

Paper link:

ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties

https://academic.oup.com/nar/article/49/W1/W5/6249611

Web server:

ADMETlab 2.0

https://admetmesh.scbdd.com/

1. Overview

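Because undesirable pharmacokinetics and toxicity of candidate compounds are the main reasons for failure in drug development, absorption, distribution, metabolism, excretion and toxicity (ADMET) should be evaluated as early as possible. In silico ADMET evaluation models have been developed as supplementary tools to help medicinal chemists design and optimize leads. Here, we announce the release of ADMETlab 2.0, a completely redesigned version of the widely used ADMETlab web server for predicting the pharmacokinetic and toxicity properties of chemicals. It supports roughly twice as many ADMET-related endpoints as the previous version: 17 physicochemical properties, 13 medicinal chemistry properties, 23 ADME properties, 27 toxicity endpoints and 8 toxicophore rules. A multi-task graph attention (MGA) framework was employed to develop robust and accurate models in ADMETlab 2.0. A batch computation module was added in response to user requests, and the presentation of the results was further optimized.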

Figure 1. Workflow scheme of ADMETlab 2.0

2. New Developments

1. Comprehensively enhanced ADMET profiles

In this update, the ADMET profile has been extended to 88 properties spanning 7 categories, roughly twice as many as in the previous version. The number of entries used for model training has also nearly tripled compared with the initial version.

2. Re-engineered modules and batch evaluation support

The functional modules were re-engineered and optimized to improve the user experience. A separate module was added to support batch uploading and downloading, and users can define their own criteria for selecting promising, desirable molecules.
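The sketch below shows what a user-defined screening criterion could look like. It assumes the batch predictions are available as a property-to-value mapping for each molecule. The `criteria` format and the `passes`/`screen` helpers are hypothetical and are not ADMETlab's actual interface. The molecule IDs and values are made up; only the property names are taken from Table 1.

```python
def passes(pred, criteria):
    """Return True if every constrained property lies within its (low, high) range.

    pred:     dict property name -> predicted value for one molecule
    criteria: dict property name -> (low, high); None leaves that side unbounded
    """
    for prop, (low, high) in criteria.items():
        value = pred.get(prop)
        if value is None:
            return False  # missing prediction counts as a failure (an assumption)
        if (low is not None and value < low) or (high is not None and value > high):
            return False
    return True


def screen(batch, criteria):
    """Keep the IDs of molecules whose predictions satisfy all user-defined criteria."""
    return [mol_id for mol_id, pred in batch if passes(pred, criteria)]


# Made-up batch results (molecule ID, predicted properties), not real ADMETlab output
batch = [
    ("mol-1", {"LogP": -0.1, "LogS": 0.8}),
    ("mol-2", {"LogP": 2.1, "LogS": -1.9}),
    ("mol-3", {"LogP": 7.9, "LogS": -7.5}),
]
criteria = {"LogP": (None, 5.0), "LogS": (-4.0, None)}
print(screen(batch, criteria))  # ['mol-1', 'mol-2']
```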

3. Robust and accurate MGA models

The MGA framework was employed to develop classification and regression predictors simultaneously. Deep learning lends itself naturally to multitask learning, and combining the two improved performance on many of the modeled endpoints.

4. Practical explanation and guidance

A detailed explanation and the optimal range of each property are provided to help users build a complete ADMET picture of the input molecule. The empirical decision state of each property is shown as a colored dot (green: excellent; yellow: medium; red: poor).
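The colored-dot display can be sketched as a mapping from a predicted value to a decision state. The ranges used below are invented for illustration. ADMETlab's actual empirical thresholds differ per property and are given in its per-property explanations, so `decision_state` and its arguments should be read as hypothetical.

```python
def decision_state(value, excellent, medium):
    """Map a predicted value to a colored decision state.

    excellent, medium: (low, high) ranges; None leaves that side unbounded.
    Inside `excellent` -> green, else inside `medium` -> yellow, else red.
    """
    def inside(rng):
        low, high = rng
        return (low is None or value >= low) and (high is None or value <= high)

    if inside(excellent):
        return "green"   # excellent
    if inside(medium):
        return "yellow"  # medium
    return "red"         # poor


# Invented ranges for illustration only, not ADMETlab's published thresholds
for v in (2.5, 4.2, 6.0):
    print(v, decision_state(v, excellent=(0.0, 3.0), medium=(-1.0, 5.0)))
# 2.5 green
# 4.2 yellow
# 6.0 red
```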

3. Program Description

1. Model data

Table 1. Data information of 53 predictive models

Property	Total (positive/negative)	Training set (positive/negative)	Test set (positive/negative)	Validation set (positive/negative)
LogS	4797	3836	480	481
LogD7.4	10370	8296	1036	1038
LogP	12682	10145	1270	1267
Caco-2 Permeability	2464	1970	247	247
MDCK Permeability	1140	912	114	114
Pgp-inhibitor	2209 (1315/894)	1764 (1051/713)	222 (132/90)	223 (132/91)
Pgp-substrate	1185 (586/599)	949 (471/478)	118 (58/60)	118 (57/61)
HIA	1160 (1022/138)	927 (818/109)	116 (101/15)	117 (103/14)
F20%	992 (753/239)	794 (602/192)	98 (75/23)	100 (76/24)
F30%	992 (666/326)	793 (532/261)	99 (67/32)	100 (67/33)
PPB	4712	3771	479	480
VD	1086	872	107	107
BBB Penetration	2865 (1651/1254)	2324 (1321/1003)	290 (165/125)	291 (165/126)
Fu	2575	2059	258	258
CYP1A2 inhibitor	12635 (5876/6759)	10111 (4702/5425)	1261 (588/673)	1263 (586/677)
CYP1A2 substrate	366 (176/190)	292 (140/152)	37 (18/19)	37 (18/19)
CYP2C19 inhibitor	12611 (5770/6841)	10096 (4618/5478)	1257 (577/680)	1258 (575/683)
CYP2C19 substrate	258 (107/151)	206 (85/121)	26 (11/15)	26 (11/15)
CYP2C9 inhibitor	12111 (4017/8094)	9686 (3213/6473)	1213 (402/811)	1212 (402/810)
CYP2C9 substrate	811 (325/486)	647 (259/388)	82 (33/49)	82 (33/49)
CYP2D6 inhibitor	13073 (2535/10538)	10471 (2032/8439)	1304 (255/1051)	1298 (250/1048)
CYP2D6 substrate	877 (435/442)	703 (347/356)	85 (44/41)	89 (44/45)
CYP3A4 inhibitor	12339 (5092/7247)	9880 (4074/5806)	1232 (510/722)	1227 (508/719)
CYP3A4 substrate	979 (497/482)	786 (397/389)	97 (49/48)	96 (51/45)
CL	831	666	81	84
T1/2	1219 (500/719)	973 (399/574)	124 (51/73)	122 (50/72)
hERG Blockers	13845 (6922/6923)	11076 (5538/5538)	1384 (692/692)	1385 (692/693)
H-HT	2304 (1299/1005)	1850 (1044/806)	227 (128/99)	227 (127/100)
DILI	467 (235/232)	373 (187/186)	47 (24/23)	47 (24/23)
AMES Toxicity	7575 (4222/3353)	6071 (3389/2682)	751 (416/335)	753 (417/336)
Rat Oral Acute Toxicity	7327 (2799/4528)	5862 (2240/3622)	733 (280/453)	732 (279/453)
FDAMDD	1197 (561/636)	957 (448/509)	120 (56/64)	120 (57/63)
Skin Sensitization	405 (274/131)	324 (219/105)	40 (27/13)	41 (28/13)
Carcinogenicity	1041 (516/525)	832 (413/419)	104 (51/53)	105 (52/53)
Bioconcentration Factor	676	540	68	68
IGC50	1787	1429	179	179
LC50FM	816	652	82	82
LC50DM	347	277	35	35
Eye Corrosion	2298 (886/1412)	1838 (709/1129)	230 (89/141)	230 (84/142)
Eye Irritation	5219 (3874/1345)	4176 (3099/1077)	522 (388/134)	521 (387/134)
Respiratory Toxicity	1388 (835/553)	1109 (666/443)	139 (84/55)	140 (85/55)
NR-AR	7312 (266/7046)	5853 (213/5640)	726 (26/700)	733 (27/706)
NR-AR-LBD	6862 (233/6629)	5493 (186/5307)	688 (23/665)	681 (24/657)
NR-AhR	6603 (763/5840)	5285 (610/4675)	657 (77/580)	661 (76/585)
NR-Aromatase	5887 (256/5631)	4711 (205/4506)	588 (25/563)	588 (26/562)
NR-ER	6166 (669/5497)	4935 (536/4399)	616 (66/550)	615 (67/548)
NR-ER-LBD	7052 (342/6710)	5643 (274/5369)	701 (33/668)	708 (35/673)
NR-PPAR-gamma	6586 (197/6389)	5266 (158/5108)	661 (19/642)	659 (20/639)
SR-ARE	5652 (865/4787)	4521 (691/3830)	564 (87/477)	567 (87/480)
SR-ATAD5	7170 (249/6921)	5736 (199/5537)	718 (25/693)	716 (25/691)
SR-HSE	6319 (360/5959)	5059 (289/4770)	630 (35/595)	630 (36/594)
SR-MMP	5913 (892/5021)	4735 (713/4022)	592 (91/501)	586 (88/498)
SR-p53	6915 (456/6459)	5543 (364/5179)	692 (46/646)	680 (46/634)
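The counts in Table 1 are consistent with a roughly 8:1:1 split into training, test and validation sets. The positive/negative ratio also stays nearly constant across the three sets. The section does not say how the split was made, so the sketch below assumes a stratified random split. `stratified_split` is a hypothetical helper, not ADMETlab code.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/test/validation lists, preserving class ratios.

    labels:    list of class labels (e.g. 1 = positive, 0 = negative)
    fractions: (train, test, validation) proportions, assumed to sum to 1
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    rng = random.Random(seed)
    train, test, valid = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_test = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        test += idxs[n_train:n_train + n_test]
        valid += idxs[n_train + n_test:]  # remainder goes to validation
    return train, test, valid


labels = [1] * 60 + [0] * 40
train, test, valid = stratified_split(labels)
print(len(train), len(test), len(valid))  # 80 10 10
```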

2. MGA framework

An overview of the Multi-task Graph Attention (MGA) framework is shown in Figure 2. MGA consists of an input layer, relational graph convolutional network (RGCN) layers, an attention layer and fully-connected (FC) layers. In the input graph, each node represents the information of one atom. After passing through the RGCN layers, each node represents general features of the circular substructure centered on that atom. RGCN extends the standard graph convolutional network (GCN) by introducing edge features, which enrich the messages used to update the hidden states of the network. The propagation rule for each node in an RGCN layer is

$h_{v}^{(l+1)}=\sigma\left(\sum_{r \in R} \sum_{u \in N_{v}^{r}} W_{r}^{(l)} h_{u}^{(l)}+W_{0}^{(l)} h_{v}^{(l)}\right)$

where $h_{v}^{(l+1)}$ is the state vector of target node v after l+1 iterations, $\sigma$ is a nonlinear activation function, and $N_{v}^{r}$ denotes the neighbors of node v under the relation (edge type) $r \in R$. $W_{r}^{(l)}$ is the weight applied to a neighbor u connected to node v by an edge with relation $r \in R$, and $W_{0}^{(l)}$ is the weight applied to the target node v itself. Edge information is therefore incorporated explicitly through the relation $r \in R$. Each weight $W_{r}^{(l)}$ is a linear combination of shared basis transformations.
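The propagation rule can be sketched in plain Python on a toy graph. This is an illustration, not the DGL/dgllife layer that ADMETlab uses. Several choices here are assumptions:

- relations are taken to be bond types;
- σ is taken to be ReLU;
- the two-dimensional atom features and the basis coefficients are invented.

Following the formula above, no per-relation normalization constant is applied. `basis_weight` builds $W_r$ as a linear combination of basis matrices, and the self term uses the self-connection weight $W_0$ defined in the text.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def basis_weight(coeffs, bases):
    """Basis decomposition: W_r = sum_b a_rb * V_b."""
    n_rows, n_cols = len(bases[0]), len(bases[0][0])
    return [[sum(a * V[i][j] for a, V in zip(coeffs, bases)) for j in range(n_cols)]
            for i in range(n_rows)]


def relu(z):
    return max(0.0, z)


def rgcn_layer(h, edges, W_rel, W0, act=relu):
    """One RGCN propagation step following the rule above.

    h:      dict node -> feature vector h_v^(l)
    edges:  list of directed edges (u, v, r): neighbor u sends a message to v under relation r
    W_rel:  dict relation r -> weight matrix W_r^(l)
    W0:     self-connection weight matrix W_0^(l)
    act:    the nonlinearity sigma (ReLU here is an assumption)
    """
    new_h = {}
    for v, h_v in h.items():
        total = matvec(W0, h_v)  # self term W_0 h_v
        for u, target, r in edges:
            if target == v:  # u belongs to N_v^r
                msg = matvec(W_rel[r], h[u])
                total = [t + m for t, m in zip(total, msg)]
        new_h[v] = [act(z) for z in total]
    return new_h


# Toy 3-atom chain C-C-O with one relation ("single" bond); features are invented one-hots
ident = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
W_single = basis_weight([1.0, 0.5], [ident, swap])  # [[1.0, 0.5], [0.5, 1.0]]
h0 = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
edges = [(0, 1, "single"), (1, 0, "single"), (1, 2, "single"), (2, 1, "single")]
h1 = rgcn_layer(h0, edges, {"single": W_single}, W0=ident)
print(h1)  # {0: [2.0, 0.5], 1: [2.5, 1.5], 2: [1.0, 1.5]}
```

The middle atom (node 1) receives messages from both neighbors, so its updated features combine the most information after one step.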

As shown in Figure 2B, attention layers can assign different attention weights to different substructures, and then generate the customized fingerprints (CFP) from the general features for a specific task. The attention weights and customized fingerprints are generated as follows:

$\omega_{v}=\sigma\left(W \cdot h_{v}+\mathrm{bias}\right)$

$\mathrm{CFP}=\sum_{v=1}^{N} \omega_{v} \cdot h_{v}$

where W and bias are the parameters of attention layers learned in model training, N is the number of nodes (substructures), $\omega_{v}$ is the attention weight of node (substructure) v, and $h_{v}$ is the general feature of node (substructure) v.
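Below is a plain-Python sketch of the attention readout defined above. It assumes σ is the logistic sigmoid and that each task has its own attention parameters `w` and `bias`, so each task gets its own CFP. The function name and the toy inputs are illustrative only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def attention_readout(node_feats, w, bias):
    """Attention-weighted readout producing a task-specific fingerprint (CFP).

    node_feats: list of general feature vectors h_v, one per node (substructure)
    w:          attention weight vector W for one task; bias: scalar
    Returns (omega, cfp): per-node weights and the weighted sum of node features.
    """
    omega = [sigmoid(sum(w_k * h_k for w_k, h_k in zip(w, h)) + bias) for h in node_feats]
    dim = len(node_feats[0])
    cfp = [sum(o * h[k] for o, h in zip(omega, node_feats)) for k in range(dim)]
    return omega, cfp


# With w = 0 and bias = 0 every node gets weight sigmoid(0) = 0.5
omega, cfp = attention_readout([[2.0, 0.0], [0.0, 4.0]], w=[0.0, 0.0], bias=0.0)
print(omega, cfp)  # [0.5, 0.5] [1.0, 2.0]
```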

As shown in Figure 2A, the fully-connected (FC) layers predict each task from the corresponding customized fingerprint. Classification and regression tasks use different loss functions (loss_c and loss_r):

$\mathrm{loss}_{c}=\sum_{n=1}^{N} \sum_{c=1}^{C}-\left[p_{c} y_{n, c} \log \sigma\left(x_{n, c}\right)+\left(1-y_{n, c}\right) \log \left(1-\sigma\left(x_{n, c}\right)\right)\right]$

$\mathrm{loss}_{r}=\sum_{n=1}^{N} \sum_{r=1}^{R}\left(x_{n, r}-y_{n, r}\right)^{2}$

where $x_{n, c}$ is the predicted value of molecule n for classification task c, $y_{n, c}$ is the true label of molecule n for classification task c, $p_{c}$ is the weight of positive samples for task c, $x_{n, r}$ is the predicted value of molecule n for regression task r, $y_{n, r}$ is the true value of molecule n for regression task r, N is the number of molecules, C is the number of classification tasks, and R is the number of regression tasks.

The loss function of MGA is the sum of loss_c and loss_r:

$\mathrm{loss}=\mathrm{loss}_{c}+\mathrm{loss}_{r}$

Figure 2. An overview of the Multi-task Graph Attention framework.
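The MGA loss terms defined above can be written out directly. This sketch assumes dense label matrices with no missing entries. Real multitask ADMET data would typically need missing labels masked out, and that step is omitted here. For numerical stability, the code uses the identities $-\log\sigma(x)=\mathrm{softplus}(-x)$ and $-\log(1-\sigma(x))=\mathrm{softplus}(x)$. The function names are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def loss_c(logits, labels, pos_weight):
    """Weighted binary cross-entropy summed over molecules n and classification tasks c.

    logits[n][c] = x_{n,c}, labels[n][c] = y_{n,c} in {0, 1}, pos_weight[c] = p_c.
    """
    total = 0.0
    for x_row, y_row in zip(logits, labels):
        for c, (x, y) in enumerate(zip(x_row, y_row)):
            total += pos_weight[c] * y * softplus(-x) + (1 - y) * softplus(x)
    return total


def loss_r(preds, targets):
    """Sum of squared errors over molecules n and regression tasks r."""
    return sum((x - y) ** 2
               for x_row, y_row in zip(preds, targets)
               for x, y in zip(x_row, y_row))


def mga_loss(logits, labels, pos_weight, preds, targets):
    """Total loss = loss_c + loss_r."""
    return loss_c(logits, labels, pos_weight) + loss_r(preds, targets)


# Two molecules, one classification task and two regression tasks (toy numbers)
total = mga_loss(logits=[[0.0], [0.0]], labels=[[1], [0]], pos_weight=[1.0],
                 preds=[[1.0, 2.0], [0.0, 0.0]], targets=[[0.0, 0.0], [0.0, 0.0]])
print(total)  # 2 * log(2) + 5, about 6.386
```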

3. Model performance

Table 2. Predictive performance of regression models

Property	Test R2	Test RMSE	Test MAE	Validation R2	Validation RMSE	Validation MAE	Training R2	Training RMSE	Training MAE
LogS	0.854	0.850	0.588	0.871	0.814	0.555	0.967	0.399	0.287
LogD7.4	0.892	0.462	0.347	0.901	0.457	0.345	0.950	0.305	0.236
LogP	0.957	0.357	0.256	0.957	0.387	0.261	0.980	0.257	0.193
Caco-2 Permeability	0.746	0.307	0.222	0.786	0.296	0.203	0.943	0.152	0.117
MDCK Permeability	0.731	0.291	0.199	0.662	0.301	0.233	0.934	0.140	0.105
PPB	0.733	0.135	0.083	0.744	0.155	0.091	0.961	0.054	0.037
VD	0.782	0.670	0.457	0.785	0.637	0.409	0.895	0.492	0.330
Fu	0.763	0.367	0.263	0.778	0.354	0.258	0.861	0.268	0.197
CL	0.678	3.375	2.240	0.692	2.956	1.883	0.977	0.740	0.556
Bioconcentration Factor	0.786	0.603	0.435	0.779	0.641	0.508	0.929	0.365	0.280
IGC50	0.723	0.496	0.335	0.860	0.356	0.270	0.920	0.305	0.232
LC50FM	0.745	0.863	0.643	0.660	0.693	0.536	0.918	0.423	0.324
LC50DM	0.524	0.994	0.692	0.909	0.496	0.386	0.950	0.398	0.319
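The three regression metrics in Table 2 can be computed as shown below. The section does not define R2, so this sketch assumes the coefficient of determination ($1 - SS_{res}/SS_{tot}$), which can differ from a squared correlation coefficient.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired lists of true and predicted values.

    R2 is computed as the coefficient of determination 1 - SS_res / SS_tot.
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


r2, rmse, mae = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(r2, rmse, mae)  # R2 0.8, RMSE 0.5, MAE 0.25
```

MAE can never exceed RMSE on the same predictions. The PPB row of Table 4 (MAE 8.577, RMSE 0.134) therefore cannot be internally consistent as printed.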

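Table 3. Predictive performance of classification models (AUC: area under the ROC curve; ACC: accuracy; SP: specificity; Sen: sensitivity; MCC: Matthews correlation coefficient)

Property	Test AUC	Test ACC	Test SP	Test Sen	Test MCC	Validation AUC	Validation ACC	Validation SP	Validation Sen	Validation MCC	Training AUC	Training ACC	Training SP	Training Sen	Training MCC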
Pgp-inhibitor	0.922	0.867	0.844	0.882	0.723	0.912	0.836	0.769	0.882	0.657	1.000	0.994	0.993	0.994	0.987
Pgp-substrate	0.840	0.768	0.705	0.828	0.538	0.901	0.840	0.853	0.828	0.680	1.000	1.000	1.000	1.000	1.000
HIA	0.866	0.924	0.800	0.942	0.687	0.944	0.949	0.867	0.961	0.785	1.000	0.988	1.000	0.987	0.950
F20%	0.833	0.750	0.680	0.773	0.414	0.905	0.842	0.760	0.868	0.599	1.000	0.995	1.000	0.993	0.987
F30%	0.848	0.802	0.794	0.806	0.580	0.797	0.800	0.727	0.836	0.555	1.000	0.998	1.000	0.996	0.994
BBB Penetration	0.908	0.862	0.824	0.891	0.718	0.920	0.852	0.810	0.885	0.698	0.992	0.957	0.948	0.964	0.912
CYP1A2 inhibitor	0.928	0.852	0.848	0.857	0.704	0.948	0.886	0.876	0.896	0.771	0.972	0.914	0.898	0.932	0.828
CYP1A2 substrate	0.737	0.649	0.632	0.667	0.298	0.842	0.816	0.800	0.833	0.632	0.985	0.936	0.942	0.929	0.871
CYP2C19 inhibitor	0.913	0.839	0.813	0.869	0.679	0.925	0.854	0.825	0.889	0.712	0.952	0.877	0.845	0.916	0.758
CYP2C19 substrate	0.758	0.654	0.667	0.636	0.300	0.926	0.741	0.688	0.818	0.497	0.974	0.928	0.894	0.977	0.859
CYP2C9 inhibitor	0.919	0.841	0.823	0.878	0.671	0.905	0.820	0.792	0.876	0.635	0.960	0.880	0.849	0.942	0.755
CYP2C9 substrate	0.725	0.707	0.776	0.606	0.386	0.785	0.744	0.816	0.636	0.461	0.967	0.904	0.911	0.894	0.801
CYP2D6 inhibitor	0.892	0.824	0.823	0.828	0.558	0.882	0.809	0.816	0.780	0.515	0.973	0.884	0.866	0.958	0.715
CYP2D6 substrate	0.847	0.775	0.733	0.818	0.553	0.775	0.663	0.600	0.727	0.330	0.947	0.893	0.849	0.937	0.788
CYP3A4 inhibitor	0.921	0.832	0.825	0.841	0.659	0.921	0.842	0.824	0.869	0.683	0.960	0.891	0.869	0.922	0.781
CYP3A4 substrate	0.776	0.713	0.820	0.608	0.437	0.802	0.753	0.760	0.745	0.505	0.948	0.887	0.920	0.855	0.776
T1/2	0.801	0.727	0.658	0.827	0.478	0.822	0.744	0.750	0.736	0.481	0.948	0.869	0.822	0.938	0.746
hERG Blockers	0.943	0.889	0.869	0.909	0.778	0.947	0.889	0.866	0.912	0.778	0.984	0.936	0.919	0.954	0.873
H-HT	0.814	0.720	0.814	0.650	0.461	0.750	0.675	0.735	0.630	0.362	0.975	0.895	0.976	0.835	0.802
DILI	0.924	0.894	0.826	0.958	0.793	0.849	0.708	0.583	0.833	0.430	0.998	0.981	0.984	0.979	0.963
AMES Toxicity	0.902	0.807	0.732	0.865	0.606	0.876	0.797	0.753	0.831	0.586	0.976	0.917	0.869	0.955	0.832
ROA Toxicity	0.853	0.778	0.769	0.793	0.549	0.846	0.795	0.826	0.744	0.567	0.986	0.936	0.923	0.957	0.868
FDAMDD	0.804	0.736	0.734	0.737	0.471	0.869	0.787	0.766	0.810	0.575	0.986	0.946	0.926	0.970	0.894
Skin Sensitization	0.707	0.775	0.539	0.889	0.462	0.901	0.854	0.692	0.929	0.652	0.991	0.966	0.952	0.973	0.923
Carcinogenicity	0.788	0.731	0.623	0.843	0.476	0.694	0.619	0.566	0.673	0.240	0.974	0.909	0.876	0.942	0.817
Eye Corrosion	0.983	0.957	0.965	0.944	0.908	0.982	0.965	0.958	0.977	0.928	1.000	0.995	0.995	0.994	0.989
Eye Irritation	0.982	0.952	0.918	0.964	0.876	0.963	0.931	0.904	0.941	0.825	0.996	0.974	0.983	0.971	0.834
Respiratory Toxicity	0.828	0.764	0.732	0.786	0.514	0.906	0.850	0.836	0.859	0.689	0.989	0.956	0.960	0.954	0.909
NR-AR	0.886	0.890	0.896	0.731	0.348	0.778	0.881	0.898	0.444	0.201	0.991	0.911	0.908	0.986	0.506
NR-AR-LBD	0.915	0.936	0.942	0.783	0.472	0.967	0.948	0.952	0.833	0.545	0.996	0.962	0.960	0.995	0.666
NR-AhR	0.943	0.862	0.858	0.896	0.573	0.873	0.828	0.840	0.737	0.435	0.975	0.891	0.882	0.962	0.655
NR-Aromatase	0.852	0.849	0.859	0.615	0.264	0.895	0.888	0.898	0.654	0.340	0.985	0.914	0.910	0.995	0.552
NR-ER	0.771	0.815	0.845	0.567	0.320	0.781	0.847	0.877	0.603	0.394	0.946	0.885	0.889	0.853	0.587
NR-ER-LBD	0.850	0.903	0.918	0.618	0.364	0.832	0.892	0.907	0.600	0.340	0.987	0.915	0.911	0.993	0.572
NR-PPAR-gamma	0.893	0.896	0.901	0.750	0.344	0.957	0.884	0.887	0.800	0.345	0.989	0.918	0.916	0.994	0.495
SR-ARE	0.863	0.827	0.850	0.701	0.469	0.852	0.841	0.871	0.678	0.483	0.954	0.891	0.888	0.905	0.675
SR-ATAD5	0.874	0.919	0.929	0.640	0.361	0.882	0.913	0.923	0.640	0.348	0.991	0.936	0.934	0.995	0.573
SR-HSE	0.907	0.868	0.875	0.750	0.393	0.855	0.885	0.898	0.667	0.384	0.985	0.908	0.903	0.990	0.582
SR-MMP	0.927	0.897	0.908	0.835	0.660	0.933	0.880	0.896	0.791	0.607	0.979	0.924	0.918	0.957	0.766
SR-p53	0.881	0.841	0.849	0.723	0.365	0.889	0.844	0.846	0.809	0.411	0.982	0.885	0.878	0.995	0.566
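The threshold-based metrics in Table 3 (ACC, SP, Sen, MCC) all derive from the confusion matrix. AUC can be computed as the probability that a randomly chosen positive is scored above a randomly chosen negative. The sketch assumes binary labels with 1 = positive and a 0.5 decision threshold; the labels and scores are toy values.

```python
import math


def classification_metrics(y_true, y_pred):
    """Return (ACC, SP, Sen, MCC) for binary labels and binary predictions (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    sp = tn / (tn + fp)    # specificity: true negative rate
    sen = tp / (tp + fn)   # sensitivity: true positive rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return acc, sp, sen, mcc


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(classification_metrics(y_true, y_pred))  # ACC 4/6, SP 0.75, Sen 0.5, MCC 0.25
print(roc_auc(y_true, scores))                 # 0.875
```

Some endpoints are heavily imbalanced, such as NR-AR or SR-ATAD5 (see the positive/negative counts in Table 1). On these, ACC and SP can stay high while Sen and MCC are much lower, and that pattern is visible in Table 3.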

Table 4. Results of leave-cluster-out validation of regression models

Property	R2	MAE	RMSE
LogS	0.826	0.654	0.855
LogD7.4	0.873	0.409	0.537
LogP	0.961	0.295	0.387
Caco-2 Permeability	0.613	0.343	0.464
MDCK Permeability	0.424	0.415	0.494
PPB	0.769	8.577	0.134
VD	0.392	0.783	1.371
Fu	0.720	0.281	0.368
CL	0.301	3.034	4.437
Bioconcentration Factor	0.368	0.789	1.06
IGC50	0.743	0.402	0.549
LC50FM	0.710	0.641	0.892
LC50DM	0.719	0.772	1.006

Table 5. Results of leave-cluster-out validation of classification models

Property	ACC	AUC	MCC
Pgp-inhibitor	0.871	0.939	0.719
Pgp-substrate	0.808	0.887	0.618
HIA	0.935	0.959	0.753
F20%	0.811	0.757	0.392
F30%	0.770	0.763	0.355
BBB Penetration	0.809	0.886	0.573
CYP1A2 inhibitor	0.900	0.96	0.800
CYP1A2 substrate	0.722	0.796	0.438
CYP2C19 inhibitor	0.869	0.920	0.693
CYP2C19 substrate	0.657	0.706	0.333
CYP2C9 inhibitor	0.875	0.908	0.608
CYP2C9 substrate	0.621	0.651	0.248
CYP2D6 inhibitor	0.843	0.907	0.561
CYP2D6 substrate	0.716	0.788	0.436
CYP3A4 inhibitor	0.862	0.936	0.704
CYP3A4 substrate	0.787	0.868	0.58
T1/2	0.665	0.729	0.334
hERG Blockers	0.892	0.957	0.754
H-HT	0.625	0.677	0.232
DILI	0.767	0.877	0.556
AMES Toxicity	0.748	0.829	0.497
ROA Toxicity	0.774	0.825	0.509
FDAMDD	0.751	0.846	0.520
Skin Sensitization	0.577	0.689	0.247
Carcinogenicity	0.550	0.594	0.128
Eye Corrosion	0.881	0.960	0.759
Eye Irritation	0.953	0.970	0.721
Respiratory Toxicity	0.839	0.904	0.661
NR-AR	0.847	0.925	0.508
NR-AR-LBD	0.912	0.955	0.575
NR-AhR	0.785	0.906	0.532
NR-Aromatase	0.758	0.841	0.250
NR-ER	0.742	0.641	0.147
NR-ER-LBD	0.782	0.813	0.242
NR-PPAR-gamma	0.898	0.897	0.283
SR-ARE	0.782	0.809	0.351
SR-ATAD5	0.844	0.836	0.196
SR-HSE	0.814	0.841	0.332
SR-MMP	0.798	0.880	0.477
SR-p53	0.778	0.849	0.358
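Leave-cluster-out validation holds out whole clusters of molecules, typically clusters of structurally similar compounds. Test compounds are therefore less similar to the training set than in a random split. This is consistent with several endpoints scoring lower in Table 4 than in Table 2, for example VD, CL and MDCK Permeability. The section does not say how clusters were formed or how many folds were used. The sketch below therefore assumes precomputed cluster labels and a k-fold scheme over clusters; `leave_cluster_out_folds` is a hypothetical helper.

```python
from collections import defaultdict


def leave_cluster_out_folds(cluster_ids, n_folds=5):
    """Yield (train_idx, test_idx) so that each cluster falls entirely in one test fold.

    cluster_ids: one cluster label per molecule (how the clusters were built,
    e.g. by scaffold or fingerprint similarity, is an assumption left to the caller).
    Clusters are assigned greedily, largest first, to the currently smallest fold.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    folds = [[] for _ in range(n_folds)]
    for cid in sorted(members, key=lambda c: len(members[c]), reverse=True):
        min(folds, key=len).extend(members[cid])

    all_idx = set(range(len(cluster_ids)))
    for test_idx in folds:
        yield sorted(all_idx - set(test_idx)), sorted(test_idx)


cluster_ids = ["a", "a", "b", "b", "b", "c", "d"]
for train_idx, test_idx in leave_cluster_out_folds(cluster_ids, n_folds=3):
    print(test_idx)
# [2, 3, 4]
# [0, 1]
# [5, 6]
```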

4. Implementation

ADMETlab 2.0 was built with the Python web framework Django and deployed on an Elastic Compute Service (ECS) instance from Aliyun running Ubuntu Linux. Web access is provided by the Nginx web server, and uWSGI handles communication between Django and the proxy server. The application follows the Model-View-Template (MVT) pattern. The model layer maps business objects to database objects. The view layer holds the business logic: it calls the deep learning models, passes the data to be displayed to the template layer, and handles file uploads and downloads. The template layer handles result visualization, page rendering, integration of documentation, etc. Uploaded and downloaded files, pre-trained models and model predictions are stored on the server. The prediction models were built in Python using the deep learning packages PyTorch and DGL, and the RDKit package provides cheminformatics support. The server has been tested on recent versions of Mozilla Firefox, Google Chrome and Apple Safari.

Table 6. The development environment of ADMETlab 2.0

Third party library	Version
rdkit	2019.03.1
django	2.2
dgl	0.5.2
dgllife	0.2.5
pytorch	1.6.0
torchvision	0.7.0
pyecharts	1.8.1

5. Browser Compatibility

OS	Version	Chrome	Edge	Firefox	Safari
Linux	Ubuntu 18.04.5 LTS	87.0.4280.141	n/a	82.0.2	n/a
macOS	Catalina 10.15.6	87.0.4280.141	n/a	84.0.1	13.1.2
Windows	10	88.0.4324.104	88.0.705.53	84.0.2	n/a

4. References

Xiong, G., Wu, Z., Yi, J., Fu, L., Yang, Z., Hsieh, C., Yin, M., Zeng, X., Wu, C., Lu, A., Chen, X., Hou, T., & Cao, D. ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Res, 2021, doi: 10.1093/nar/gkab255.