Toddler-inspired embodied vision for learning object representations

  • Additional Information
    • Contributors:
      Institut Pascal (IP); Centre National de la Recherche Scientifique (CNRS)-Université Clermont Auvergne (UCA)-Institut national polytechnique Clermont Auvergne (INP Clermont Auvergne); Université Clermont Auvergne (UCA)-Université Clermont Auvergne (UCA); Frankfurt Institute for Advanced Studies (FIAS)
    • Publication Information:
      HAL CCSD
    • Publication Year:
      2022
    • Collection:
      HAL Clermont Auvergne (Université Blaise Pascal Clermont-Ferrand / Université d'Auvergne)
    • Subject:
    • Abstract:
      Recent time-contrastive learning approaches manage to learn invariant object representations without supervision. This is achieved by mapping successive views of an object onto close-by internal representations. When considering this learning approach as a model of the development of human object recognition, it is important to consider what visual input a toddler would typically observe while interacting with objects. First, human vision is highly foveated, with high resolution only available in the central region of the field of view. Second, objects may be seen against a blurry background due to toddlers' limited depth of field. Third, during object manipulation a toddler mostly observes close objects filling a large part of the field of view due to their rather short arms. Here, we study how these effects impact the quality of visual representations learnt through time-contrastive learning. To this end, we let a visually embodied agent "play" with objects in different locations of a near photo-realistic flat. During each play session the agent views an object in multiple orientations before turning its body to view another object. The resulting sequence of views feeds a time-contrastive learning algorithm. Our results show that visual statistics mimicking those of a toddler improve object recognition accuracy in both familiar and novel environments. We argue that this effect is caused by the reduction of features extracted in the background, a neural network bias for large features in the image and a greater similarity between novel and familiar background regions. The results of our model suggest that several influences on toddlers' visual input statistics support their unsupervised learning of object representations.
    • Relation:
      hal-03838291; https://hal.science/hal-03838291; https://hal.science/hal-03838291/document; https://hal.science/hal-03838291/file/ICDL_2022.pdf
    • Online Access:
      https://hal.science/hal-03838291
      https://hal.science/hal-03838291/document
      https://hal.science/hal-03838291/file/ICDL_2022.pdf
    • Rights:
      info:eu-repo/semantics/OpenAccess
    • Accession Number:
      edsbas.DF3CA906
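
The abstract describes time-contrastive learning as mapping successive views of an object onto close-by internal representations. The sketch below illustrates that general idea only; it is not the authors' implementation, and this record does not specify their loss or architecture. It assumes an InfoNCE-style objective, and the function name `time_contrastive_loss`, the `temperature` parameter, and the pairing of frames (0,1), (2,3), ... are all hypothetical choices made for illustration.

```python
import numpy as np

def time_contrastive_loss(embeddings, temperature=0.1):
    """Illustrative InfoNCE-style loss over temporally adjacent frames.

    Assumption: frames are grouped into non-overlapping consecutive pairs
    (0,1), (2,3), ...; within a pair the second frame is the positive for
    the first, and the second frames of all other pairs act as negatives.
    Embeddings are assumed to be non-zero vectors.
    """
    z = np.asarray(embeddings, dtype=float)
    # Normalize so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = (len(z) // 2) * 2            # drop a trailing unpaired frame
    anchors, positives = z[0:n:2], z[1:n:2]
    logits = anchors @ positives.T / temperature
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching positive for anchor i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))

# Toy sequences: in `coherent`, successive frames share a representation;
# in `incoherent`, each frame resembles the other pair's successor instead.
coherent = [[1, 0], [1, 0], [0, 1], [0, 1]]
incoherent = [[1, 0], [0, 1], [0, 1], [1, 0]]
loss_coherent = time_contrastive_loss(coherent)
loss_incoherent = time_contrastive_loss(incoherent)
```

Under these assumptions, the loss is lower when successive views lie close together in representation space, which is the property the abstract attributes to this family of methods.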