Recent Advances in Non-Relational Table Understanding

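Table understanding is the process by which computers automatically recognize, parse, and apply the tables found widely on the Internet and in vertical domains. Tables fall broadly into relational and non-relational tables. The former resemble relational database tables: their structure is fixed and easy for machines to parse, and they have long been studied. The latter typically have variable layouts and flexible syntax and show more pronounced linguistic characteristics, which makes parsing and applying them very challenging for computers. Non-relational table understanding is an important emerging field at the multimodal intersection of natural language processing and computer vision. With the widespread adoption of deep learning in recent years, research on non-relational tables has advanced considerably in table recognition, semantic analysis, and innovative applications. This paper describes the structural characteristics of non-relational tables, explains the distinctive challenges they pose for research, briefly reviews recent progress along the three directions of table recognition, semantic analysis, and innovative applications, catalogs the relevant datasets, and finally summarizes the problems that most urgently need to be solved in non-relational table understanding and outlines future research directions.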
A Survey on Non-Relational Table Understanding
Table understanding is the process of automatically recognizing, parsing, and applying tables that are widely available on the Internet and in vertical domains. Tables can be broadly classified into relational and non-relational tables. The former are similar to relational database tables, with a fixed structure that is easy for machines to parse. The latter are usually more flexible in layout and syntax and have more pronounced linguistic features, which makes them very challenging for computers to parse. Non-relational table understanding is one of the important emerging areas at the intersection of natural language processing and computer vision. With the spread of deep learning in recent years, non-relational table understanding has advanced considerably in several directions, including recognition, semantic analysis, and application. This paper introduces the characteristics of non-relational tables, then systematically reviews recent developments in the field along these three research directions. It also summarizes the public datasets related to non-relational tables, identifies the open problems in non-relational table understanding, and ends with possible future research directions.

table intelligence; deep learning; multimodal natural language processing

Luo Ping (罗平), Yang Qingping (杨清平), Cao Yixuan (曹逸轩), Cao Rongyu (曹荣禹), He Qing (何清)

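Key Laboratory of Intelligent Information Processing of the Chinese Academy of Sciences, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190

University of Chinese Academy of Sciences, Beijing 100049

Peng Cheng Laboratory, Shenzhen, Guangdong 518066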

National Natural Science Foundation of China (62076231, U1811461, 62206265); China Postdoctoral Science Foundation (2021M703271)

2024

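Journal of Chinese Information Processing (中文信息学报)
Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences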

CSTPCD; CHSSCD; Peking University Core Journals (北大核心)
Impact factor: 0.8
ISSN:1003-0077
Year, volume (issue): 2024, 38(5)