Interactive Tuples Extraction from Semi-Structured Data

Authors: Gilleron, Rémi; Marty, Patrick; Torre, Fabien; Tommasi, Marc
Source:
Web Intelligence ; https://inria.hal.science/inria-00581253 ; Web Intelligence, Dec 2006, Hong Kong, China
Subject:
[INFO.INFO-WB]Computer Science [cs]/Web; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
Record type:
conference object
Language:
English

Additional information
- Contributors:
  Laboratoire d'Informatique Fondamentale de Lille (LIFL); Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS); Modeling Tree Structures, Machine Learning, and Information Extraction (MOSTRARE); Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Inria Lille - Nord Europe; Institut National de Recherche en Informatique et en Automatique (Inria); Groupe de Recherche en Apprentissage Automatique (GRAppA - LIFL); Université de Lille, Sciences et Technologies-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)
- Publication information:
  HAL CCSD
- Publication year:
  2006
- Collection:
  Université de Lille 3 - Sciences Humaines et Sociales: HAL
- Subject (geographic):
  Hong Kong; China
- Abstract:
  International audience ; This paper studies from a machine learning viewpoint the problem of extracting tuples of a target n-ary relation from tree-structured data like XML or XHTML documents. Our system can extract, without any post-processing, tuples for all data structures including nested, rotated and cross tables. The wrapper induction algorithm we propose is based on two main ideas. It is incremental: partial tuples are extracted by increasing length. It is based on a representation-enrichment procedure: partial tuples of length i are encoded with the knowledge of extracted tuples of length i − 1. The algorithm is then set in a friendly interactive wrapper induction system for Web documents. We evaluate our system on several information extraction tasks over corporate Web sites. It achieves state-of-the-art results on simple data structures and succeeds on complex data structures where previous approaches fail. Experiments also show that our interactive framework significantly reduces the number of user interactions needed to build a wrapper.
- Relation:
  inria-00581253; https://inria.hal.science/inria-00581253; https://inria.hal.science/inria-00581253/document; https://inria.hal.science/inria-00581253/file/WI2006.pdf
- Online access:
  https://inria.hal.science/inria-00581253
  https://inria.hal.science/inria-00581253/document
  https://inria.hal.science/inria-00581253/file/WI2006.pdf
- Rights:
  info:eu-repo/semantics/OpenAccess
- Accession number:
  edsbas.9792018E
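
Illustrative note on the abstract: the two ideas it names (growing partial tuples one component at a time, and deciding on component i with knowledge of the components already extracted) can be pictured with a toy sketch. This is not the authors' algorithm. In the paper the decisions are learned by wrapper induction; here the hypothetical hand-written predicates `is_name` and `is_price_of` stand in for learned classifiers, and the sample document, the relation (name, price), and the function `extract_tuples` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the target relation is (name, price).
DOC = """
<table>
  <tr><td class="name">Widget</td><td class="price">4.50</td></tr>
  <tr><td class="name">Gadget</td><td class="price">9.00</td></tr>
</table>
"""


def extract_tuples(root, accepts):
    """Grow partial tuples by increasing length.

    accepts[i](node, partial, parent_of) decides whether `node` may become
    component i, given the components already stored in `partial`.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Candidate nodes: elements without child elements.
    leaves = [node for node in root.iter() if len(node) == 0]

    partials = [()]  # start from the empty partial tuple
    for accept in accepts:
        partials = [
            partial + (node,)
            for partial in partials
            for node in leaves
            if accept(node, partial, parent_of)
        ]
    return [tuple(node.text for node in full) for full in partials]


def is_name(node, partial, parent_of):
    # Stand-in for a learned classifier for the first component.
    return node.get("class") == "name"


def is_price_of(node, partial, parent_of):
    # Stand-in for a classifier over an enriched representation: the
    # decision looks at the previously extracted component (same row).
    return (
        node.get("class") == "price"
        and parent_of[node] is parent_of[partial[0]]
    )


root = ET.fromstring(DOC)
tuples = extract_tuples(root, [is_name, is_price_of])
print(tuples)
```

Because the second predicate sees the first component, each price is paired only with the name in its own row, without a separate post-processing step to assemble tuples.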
