Explicit Image Caption Reasoning: Generating Accurate and Informative Captions for Complex Scenes with LMM.

Authors: Cui M; Li C; Yang Y
Source:
Sensors (Basel, Switzerland) [Sensors (Basel)] 2024 Jun 13; Vol. 24 (12). Date of Electronic Publication: 2024 Jun 13.
Publication Type:
Journal Article
Language:
English

Additional Information
- Source:
  Publisher: MDPI; Country of Publication: Switzerland; NLM ID: 101204366; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1424-8220 (Electronic); Linking ISSN: 14248220; NLM ISO Abbreviation: Sensors (Basel); Subsets: PubMed not MEDLINE; MEDLINE
- Publication Information:
  Original Publication: Basel, Switzerland : MDPI, c2000-
- Abstract:
  The rapid advancement of sensor technologies and deep learning has significantly advanced the field of image captioning, especially for complex scenes. Traditional image captioning methods are often unable to handle the intricacies and detailed relationships within complex scenes. To overcome these limitations, this paper introduces Explicit Image Caption Reasoning (ECR), a novel approach that generates accurate and informative captions for complex scenes captured by advanced sensors. ECR employs an enhanced inference chain to analyze sensor-derived images, examining object relationships and interactions to achieve deeper semantic understanding. We implement ECR using the optimized ICICD dataset, a subset of the sensor-oriented Flickr30K-EE dataset containing comprehensive inference chain information. This dataset enhances training efficiency and caption quality by leveraging rich sensor data. We create the Explicit Image Caption Reasoning Multimodal Model (ECRMM) by fine-tuning TinyLLaVA with the ICICD dataset. Experiments demonstrate ECR's effectiveness and robustness in processing sensor data, outperforming traditional methods.
- Contributed Indexing:
  Keywords: explicit image caption; image caption; large multimodal model; prompt engineering
- Subject:
  Date Created: 20240627 Latest Revision: 20240629
- Subject:
  20250114
- Identifier:
  PMC11207553
- Identifier:
  10.3390/s24123820
- Identifier:
  38931605
