Estimation of the Number of Binding Sites for Six Transcription Factors in the Human Genome
Transcription factors are proteins that bind to specific DNA sequences, thereby regulating the transcription and expression of genes. When theoretical methods are used to predict cell line-specific or tissue-specific transcription factor binding sites, the size and selection of the negative set often affect the performance evaluation of the prediction model. Therefore, an accurate estimate of the number of transcription factor binding sites in the human genome can help evaluate the performance of prediction models more accurately. In this study, using ChIP-Seq data for six transcription factors (CTCF, POLR2A, EZH2, REST, MAX, RAD21) in different cell lines, we performed polynomial fitting to estimate the number of their binding sites in the human genome. These results provide a reference for the selection of negative sets when constructing prediction models of transcription factor binding sites.
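The abstract does not state which quantity was fitted or which polynomial degree was used, so the following is only a minimal sketch of one plausible reading. It assumes that the cumulative number of unique binding sites is recorded as cell lines are added one by one, that a quadratic is fitted to this curve, and that the maximum of the fitted curve is taken as the estimate. The function name `estimate_plateau`, the degree, and the data are hypothetical; the counts are synthetic and are not results from the study.

```python
import numpy as np


def estimate_plateau(n_cell_lines, cumulative_sites):
    """Fit a quadratic to a saturation curve and return its maximum.

    Assumption (not from the source): the curve is cumulative unique
    binding sites versus number of cell lines, and the vertex of a
    concave quadratic fit serves as the estimate of the total.
    Returns (x_peak, plateau).
    """
    # np.polyfit returns coefficients from highest to lowest degree.
    a, b, c = np.polyfit(n_cell_lines, cumulative_sites, deg=2)
    if a >= 0:
        raise ValueError("fitted curve is not concave; no maximum exists")
    x_peak = -b / (2.0 * a)  # vertex of the parabola
    plateau = a * x_peak**2 + b * x_peak + c
    return x_peak, plateau


# Synthetic illustration only: counts generated from an exact quadratic.
x = np.arange(1, 11, dtype=float)        # 1 to 10 cell lines
y = 50000.0 + 12000.0 * x - 500.0 * x**2  # made-up cumulative site counts

x_peak, plateau = estimate_plateau(x, y)
print(f"fitted maximum at about {x_peak:.1f} cell lines: {plateau:.0f} sites")
```

A quadratic is used here only because its maximum has a closed form. Real saturation curves are noisy, and a polynomial extrapolated beyond the observed range can behave poorly, so the degree and the fitted range would need to be checked against the data.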