Prediction of Anti-CRISPR Proteins Based on Feature Information Fusion
The CRISPR-Cas system is a natural immune system found in bacteria and archaea. It has been extensively studied as a gene editing tool in various fields, including cancer treatment. However, CRISPR-Cas gene editing technology is associated with off-target effects. Research has shown that Anti-CRISPR proteins can modulate the activity of the CRISPR-Cas system. These proteins can reduce off-target effects and other adverse impacts without compromising targeted gene editing, thereby improving the efficiency and safety of gene editing techniques. Studying Anti-CRISPR proteins is therefore important for understanding the functionality of the CRISPR-Cas system and bacterial-viral interactions. In this study, an Anti-CRISPR protein dataset was constructed, and six features were extracted: amino acid composition, dipeptide composition, g-gap dipeptide composition, protein secondary structure, auto-covariance average chemical shift, and protein blocks. A support vector machine (SVM) was employed to predict Anti-CRISPR proteins. The highest accuracy achieved by an individual feature was 93.50% with the Jackknife test. Dimensionality reduction was performed on the high-dimensional dipeptide composition and g-gap dipeptide composition, and the highest accuracy of 95.10% was obtained when g was set to 3. Further analysis found that 16 g-gap dipeptide compositions, corresponding to 11 amino acids, are highly relevant to the prediction of Anti-CRISPR proteins. Finally, the highest accuracy of the combined features was 96.07% with the Jackknife test.
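To make the g-gap dipeptide composition feature concrete, the following is a minimal Python sketch, not the implementation used in this study. It assumes that g counts the residues lying between the two amino acids of a pair (so g = 0 reduces to the ordinary dipeptide composition) and that frequencies are normalized by the number of valid pairs; the function name `g_gap_dipeptide_composition` and the handling of non-standard residues are illustrative choices.

```python
from itertools import product

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(sequence, g=0):
    """Return the 400 g-gap dipeptide frequencies of a protein sequence.

    Assumption: g is the number of residues between the two positions of a
    pair, so positions i and i + g + 1 are paired. Pairs containing a
    non-standard residue are skipped (an illustrative choice).
    """
    seq = sequence.upper()
    pairs = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    step = g + 1
    total = 0
    for i in range(len(seq) - step):
        pair = seq[i] + seq[i + step]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): return an all-zero vector.
        return {p: 0.0 for p in pairs}
    return {p: c / total for p, c in counts.items()}


if __name__ == "__main__":
    features = g_gap_dipeptide_composition("ACACAC", g=1)
    # Print only the non-zero components of the 400-dimensional vector.
    print({p: f for p, f in features.items() if f > 0})
```

Under these assumptions each sequence yields a 400-dimensional vector per value of g, which is why a dimensionality reduction step is applied before the feature is passed to the SVM.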