Abstract
A 5'-leader, originally known as the 5'-untranslated region, exists in multiple isoforms due to alternative splicing (aS) and alternative transcription start sites (aTSS). A representative 5'-leader is therefore needed to examine the embedded RNA regulatory elements that control translation efficiency. Here, we develop a ranking algorithm and a deep-learning model to annotate representative 5'-leaders for five plant species. We rank the intra-sample and inter-sample frequency of aS-mediated transcript isoforms using a Kruskal-Wallis test-based algorithm and identify the representative aS-5'-leader. To further assign a representative 5'-end, we train the deep-learning model 5'leaderP to learn aTSS-mediated 5'-end distribution patterns from cap analysis of gene expression (CAGE) data. The model accurately predicts the 5'-end, as confirmed experimentally in Arabidopsis and rice. Gene models containing the representative 5'-leaders, together with 5'leaderP, can be accessed at RNAirport (http://www.rnairport.com/leader5P/). The Stage 1 annotation of 5'-leaders records 5'-leader diversity and will pave the way to Ribo-Seq open-reading frame annotation, similar to the project recently initiated by human GENCODE.
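The abstract gives no implementation details for the ranking step, so the following is only a minimal sketch of how a Kruskal-Wallis-style ranking of isoform usage could look. Everything here is an assumption made for illustration: the input layout (one list of per-sample read fractions for each isoform of a gene), the function names, and the rule of choosing the isoform with the highest mean rank. It is not the authors' algorithm. The H statistic is computed without tie correction.

```python
from itertools import chain


def average_ranks(values):
    """Rank values ascending (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H statistic without tie correction, mean rank per group)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    mean_ranks = []
    start = 0
    for group in groups:
        group_ranks = ranks[start:start + len(group)]
        start += len(group)
        rank_sum = sum(group_ranks)
        mean_ranks.append(rank_sum / len(group))
        total += rank_sum ** 2 / len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, mean_ranks


def representative_isoform(usage):
    """usage: {isoform_id: [fraction of the gene's reads in each sample]}.

    Hypothetical selection rule: pick the isoform with the highest mean rank
    across all samples, and report H as a measure of how strongly the
    isoforms differ in usage.
    """
    ids = sorted(usage)
    h, mean_ranks = kruskal_wallis_h([usage[i] for i in ids])
    best = max(range(len(ids)), key=lambda k: mean_ranks[k])
    return ids[best], h


# Illustrative (made-up) usage fractions for three isoforms in three samples.
usage = {
    "iso_A": [0.7, 0.8, 0.6],
    "iso_B": [0.2, 0.1, 0.3],
    "iso_C": [0.1, 0.1, 0.1],
}
best, h = representative_isoform(usage)
print(best, round(h, 3))
```

A rank-based statistic is a natural fit here because isoform usage fractions are rarely normally distributed; in practice H would be compared against a chi-squared distribution with (number of isoforms - 1) degrees of freedom to judge significance.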
Funding
National Key R&D Program of China (2023ZD04073)
Major Project of Hubei Hongshan Laboratory (2022hszd016)
Key Research and Development Program of Hubei Province (2022BFE003)
National Natural Science Foundation of China (32070284)