
Multimodal Food Image Classification with Large Language Models

  • Additional information
    • Publication information:
      Multidisciplinary Digital Publishing Institute
    • Publication year:
      2024
    • Collection:
      MDPI Open Access Publishing
    • Abstract:
      In this study, we leverage advancements in large language models (LLMs) for fine-grained food image classification. We achieve this by integrating textual features extracted from images using an LLM into a multimodal learning framework. Specifically, semantic textual descriptions generated by the LLM are encoded and combined with image features obtained from a transformer-based architecture to improve food image classification. Our approach employs a cross-attention mechanism to effectively fuse visual and textual modalities, enhancing the model’s ability to extract discriminative features beyond what can be achieved with visual features alone.
    • File Description:
      application/pdf
    • Relation:
      Computer Science & Engineering; https://dx.doi.org/10.3390/electronics13224552
    • Identifier:
      10.3390/electronics13224552
    • Online access:
      https://doi.org/10.3390/electronics13224552
    • Rights:
      https://creativecommons.org/licenses/by/4.0/
    • Identifier:
      edsbas.8843ED87
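
The abstract describes fusing visual and textual features with a cross-attention mechanism. The record gives no implementation details, so the following is only a minimal sketch of generic scaled dot-product cross-attention, not the authors' method. It assumes image tokens act as queries and text tokens as keys and values, omits learned projection matrices, and adds a residual connection. The function names (`cross_attention`, `fuse`) and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    Assumption: no learned projections; inputs are used as given.
    Returns the attended vectors and the attention weights.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights


def fuse(image_tokens, text_tokens):
    """Hypothetical fusion step: image tokens query the text tokens,
    and the attended text features are added back residually."""
    attended, weights = cross_attention(image_tokens, text_tokens, text_tokens)
    fused = [
        [x + a for x, a in zip(img, att)]
        for img, att in zip(image_tokens, attended)
    ]
    return fused, weights


# Toy inputs: three image tokens and two text tokens, all 4-dimensional.
image_tokens = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
text_tokens = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
fused, weights = fuse(image_tokens, text_tokens)
```

Each row of `weights` sums to 1 and shows how strongly one image token draws on each text token. A real model would apply learned query, key, and value projections and typically use several attention heads.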