
Research on the Search Engine Cache Replacement Strategy Based on Multiple Query Attributes

Caching is a key technology in search engines: it significantly reduces query-processing computation, shortens query response time, and improves system throughput. It has drawn attention from academia and is widely used in industry. Current search engine cache replacement strategies do not make full use of the multiple access features of queries or of the query distribution, and traditional replacement strategies show various shortcomings when applied to search engines. To address these problems, this paper studies the distribution characteristics of query requests and analyzes the shortcomings of existing cache replacement strategies. Based on query access features, it proposes an integrated value function model that represents the future heat value of a query. Fine-grained statistical analysis of search engine query logs yields a detailed daily record of each query's access features, and the unknown parameters of the value function model are computed by multiple regression analysis using the least squares method. A query result cache management strategy is then designed that combines a query's current dynamic access features with its future heat value. Tests on real query records measure the hit rate of the cache system under different replacement-area sizes, and the comparison shows that the proposed strategy achieves a significantly higher hit rate than traditional replacement strategies.
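The approach summarized in the abstract (fit a value function by least squares, then evict the cached query with the lowest predicted value) can be illustrated with a minimal sketch. It is not the paper's implementation. The feature layout, the names fit_weights and ValueCache, the linear form of the value function, and the rule of always admitting a missed query are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Sequence


def fit_weights(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    X holds one row of access features per observation (hypothetical layout);
    y holds the observed future heat values.
    """
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


class ValueCache:
    """Result cache that evicts the entry with the lowest predicted value."""

    def __init__(self, capacity: int, weights: Sequence[float]):
        self.capacity = capacity
        self.weights = list(weights)
        self.entries: Dict[str, List[float]] = {}  # query -> latest features
        self.hits = 0
        self.lookups = 0

    def value(self, features: Sequence[float]) -> float:
        # Assumed linear value function: weighted sum of access features.
        return sum(w * f for w, f in zip(self.weights, features))

    def access(self, query: str, features: Sequence[float]) -> bool:
        """Look up a query, record its current features, and report a hit."""
        self.lookups += 1
        hit = query in self.entries
        if hit:
            self.hits += 1
        elif len(self.entries) >= self.capacity:
            # Cache is full: evict the entry with the lowest predicted value.
            victim = min(self.entries, key=lambda q: self.value(self.entries[q]))
            del self.entries[victim]
        self.entries[query] = list(features)
        return hit

    def hit_rate(self) -> float:
        return self.hits / self.lookups if self.lookups else 0.0
```

In this sketch, weights fitted by fit_weights on logged features would be passed to ValueCache, and replaying a query log through access while varying capacity gives the hit rate at each cache size.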

Keywords: Search Engine; Cache Replacement Strategy; Integrated Value; Multiple Query Attributes; Regression Analysis

房耘耘


College of Electronics and Information Engineering, Tongji University, Shanghai 201804


2015

Modern Computer (Popular Edition)
Sun Yat-sen University


Impact factor: 0.202
ISSN: 1007-1423
Year, volume (issue): 2015.(8)