
Robust Detection of Synthetic Tabular Data under Schema Variability

  • Additional information
    • Contributors:
      Orange Labs Lannion; France Télécom; Apprentissage Automatique avec Contraintes Temporelles (MALT); Université de Rennes 2 (UR2)-Centre Inria de l'Université de Rennes; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-GESTION DES DONNÉES ET DE LA CONNAISSANCE (IRISA-D7); Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA); Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes); Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique); Institut Mines-Télécom Paris (IMT)-Institut Mines-Télécom Paris (IMT)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes); Institut Mines-Télécom Paris (IMT)-Institut Mines-Télécom Paris (IMT)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA); Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique (IMT Atlantique); Institut Mines-Télécom Paris (IMT)-Institut Mines-Télécom Paris (IMT); Université de Rennes (UR); ANR-23-IACL-0009,SequoIA,IA, trust, security(2023)
    • Publication information:
      CCSD
    • Publication year:
      2026
    • Collection:
      Archive Ouverte de l'Université Rennes (HAL)
    • Abstract:
      The rise of powerful generative models has sparked concerns over data authenticity. While detection methods have been extensively developed for images and text, tabular data, despite its ubiquity, has been largely overlooked. Yet detecting synthetic tabular data is especially challenging because of its heterogeneous structure and because the formats encountered at test time may be unseen. We address the underexplored task of detecting synthetic tabular data "in the wild", i.e., when the detector is deployed on tables with variable and previously unseen schemas. We introduce a novel datum-wise transformer architecture that significantly outperforms the only previously published baseline, improving both AUC and accuracy by 7 points. By incorporating a table-adaptation component, our model gains an additional 7 accuracy points, demonstrating enhanced robustness. This work provides the first strong evidence that detecting synthetic tabular data in real-world conditions is feasible, and demonstrates substantial improvements over previous approaches. Following acceptance of the paper, we are finalizing the administrative and licensing procedures necessary for releasing the source code. This extended version will be updated as soon as the release is complete.
    • Relation:
      info:eu-repo/semantics/altIdentifier/arxiv/2509.00092; ARXIV: 2509.00092
    • Online access:
      https://hal.science/hal-05012441
      https://hal.science/hal-05012441v3/document
      https://hal.science/hal-05012441v3/file/extended_version.pdf
    • Rights:
      https://creativecommons.org/licenses/by/4.0/ ; info:eu-repo/semantics/OpenAccess
    • Accession number:
      edsbas.C03A4FAC