Abstract: Video is increasingly becoming the most common means of communication, driven by the spread of affordable video recording devices and social networks such as TikTok, Instagram, and others. The most common ways of searching for videos on these social networks, as well as on search portals, rely on metadata attached to videos through keywords and prior classifications. However, keyword searches depend on knowing exactly what one is looking for and are not necessarily efficient for finding a particular video from a description, superficial or not, of a particular scene, which can lead to frustrating search results. The objective of this work is to find a particular video within a list of available videos from a textual description in natural language, based only on the content of its scenes and without relying on previously cataloged metadata. Using a dataset of videos, each with a defined number of descriptions of its scenes, a Siamese network with a triplet loss function was modeled to identify, in a high-dimensional space, the similarities between two different modalities: information extracted from a video and information extracted from a natural-language text. The final architecture of the model and the values of its parameters were chosen through tests, keeping the configurations that gave the best results. Because videos are not classified into groups or classes, and because the triplet loss function is based on an anchor text and two video examples, one positive and one negative, selecting the negative examples needed for model training proved difficult. Methods for choosing negative videos for training were therefore also tested: a random choice and a directed choice based on the distances between the available descriptions of the videos in the training phase. Of the two, the random choice was the most effective.
At the end of the tests, the exact searched video was present in 10.67% of the cases in the top ...
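
The triplet objective and the two negative-selection strategies mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Euclidean distance, the margin of 0.2, and the reading of "directed choice" as picking the video whose description lies closest to the anchor description are all assumptions made for illustration, and the embeddings are plain lists of floats rather than outputs of a trained network.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor_text, positive_video, negative_video, margin=0.2):
    """Hinge-style triplet loss on one (text, video+, video-) triple.

    The loss is zero once the positive video is closer to the anchor text
    than the negative video by at least `margin` (an assumed value).
    """
    d_pos = euclidean(anchor_text, positive_video)
    d_neg = euclidean(anchor_text, negative_video)
    return max(0.0, d_pos - d_neg + margin)


def random_negative(positive_index, num_videos, rng):
    """Random choice: any video other than the positive one."""
    candidates = [i for i in range(num_videos) if i != positive_index]
    return rng.choice(candidates)


def directed_negative(positive_index, description_embeddings):
    """Directed choice (assumed variant): the other video whose description
    embedding is nearest to the description of the positive video."""
    anchor = description_embeddings[positive_index]
    candidates = [i for i in range(len(description_embeddings)) if i != positive_index]
    return min(candidates, key=lambda i: euclidean(anchor, description_embeddings[i]))
```

In a training loop under these assumptions, each anchor description would be paired with its own video as the positive example and with a video returned by `random_negative` or `directed_negative` as the negative example, and `triplet_loss` would be averaged over the batch.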