A coral-reef approach to extract information from HTML tables
Original source: NSTL, Elsevier
This article presents Coraline, a new table-understanding proposal. Its novelty lies in a coral-reef optimisation algorithm that performs feature selection in synchrony with a clustering technique and some custom heuristics, which together help extract information in a totally unsupervised manner. Our experimental analysis was performed on a large collection of tables with a variety of layouts, encoding problems, and formatting alternatives. Coraline achieved an F1 score as high as 0.90 and took 7.07 CPU seconds per table, which improves on the best supervised proposal by 6.67% in effectiveness and 40.54% in efficiency; it also improves on the best unsupervised proposal by 11.11% in effectiveness while remaining very competitive in efficiency. © 2021 Elsevier B.V. All rights reserved.
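To make the idea of coral-reef optimisation for feature selection more concrete, the following is a minimal sketch of a generic coral-reef search over binary feature masks. It is not Coraline's implementation: the abstract does not give the operators, parameters, or fitness function, so the function name `coral_reef_select`, every parameter value, and the `toy_fitness` function are assumptions chosen for illustration. In a setting like the one described, the fitness would presumably score how well the selected features support the clustering step.

```python
import random


def coral_reef_select(n_features, fitness, reef_size=20, occupied=0.6,
                      broadcast_frac=0.8, depredation_frac=0.1,
                      depredation_prob=0.1, attempts=3,
                      generations=40, seed=0):
    """Toy coral-reef search over binary feature masks (maximises fitness).

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)

    def new_mask():
        return tuple(rng.randint(0, 1) for _ in range(n_features))

    # Each reef slot holds a feature mask (a coral) or None (an empty slot).
    reef = [new_mask() if rng.random() < occupied else None
            for _ in range(reef_size)]
    if all(slot is None for slot in reef):
        reef[0] = new_mask()
    best = max((c for c in reef if c is not None), key=fitness)

    for _ in range(generations):
        corals = [c for c in reef if c is not None]
        rng.shuffle(corals)
        n_broadcast = int(len(corals) * broadcast_frac)
        larvae = []
        # Broadcast spawning: pairs of corals produce a larva by uniform crossover.
        for a, b in zip(corals[0:n_broadcast:2], corals[1:n_broadcast:2]):
            larvae.append(tuple(rng.choice(pair) for pair in zip(a, b)))
        # Brooding: each remaining coral produces a larva by flipping one bit.
        for c in corals[n_broadcast:]:
            i = rng.randrange(n_features)
            larvae.append(c[:i] + (1 - c[i],) + c[i + 1:])
        # Settlement: a larva takes a random slot if it is empty or holds a
        # weaker coral; after a few failed attempts the larva is discarded.
        for larva in larvae:
            for _ in range(attempts):
                slot = rng.randrange(reef_size)
                if reef[slot] is None or fitness(larva) > fitness(reef[slot]):
                    reef[slot] = larva
                    break
        # Depredation: each of the weakest corals may be removed, freeing its slot.
        ranked = sorted((i for i in range(reef_size) if reef[i] is not None),
                        key=lambda i: fitness(reef[i]))
        for i in ranked[:int(len(ranked) * depredation_frac)]:
            if rng.random() < depredation_prob:
                reef[i] = None
        survivors = [c for c in reef if c is not None]
        best = max(survivors + [best], key=fitness)
    return best


# Hypothetical fitness: reward three "informative" features, penalise the rest.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(bit for i, bit in enumerate(mask) if i in INFORMATIVE)
    noise = sum(bit for i, bit in enumerate(mask) if i not in INFORMATIVE)
    return hits - 0.5 * noise


best_mask = coral_reef_select(8, toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

The search keeps the best mask seen so far, so the returned solution never gets worse from one generation to the next; whether it reaches the optimum depends on the seed and the parameter values.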
Keywords: HTML tables; Information extraction; Coral-reef optimisation; Feature selection; Clustering; WEB DATA EXTRACTION; OPTIMIZATION; ALGORITHM
Jimenez, Patricia; Corchuelo, Rafael; Roldan, Juan C.