
Generative Adversarial Networks for Diverse and Explainable Text-to-Image Generation

  • Authors: Zhang, Zhenxing
  • Source:
    Zhang, Z 2023, 'Generative Adversarial Networks for Diverse and Explainable Text-to-Image Generation', Doctor of Philosophy, University of Groningen, [Groningen]. https://doi.org/10.33612/diss.507581028
  • Record type:
    book
  • Language:
    English
  • Additional information
    • Publication details:
      University of Groningen
    • Subject:
      2023
    • Collection:
      University of Groningen research database
    • Abstract:
      This thesis focuses on algorithms for text-to-image generation, which aim to yield photo-realistic and semantically consistent pictures from a natural-language description. Chapter 1 provides a brief general introduction to research on image synthesis based on linguistic (textual) descriptions. In Chapter 2, we propose the Dual-Attention Generative Adversarial Network (DTGAN), which can produce perceptually plausible pictures from given natural-language descriptions while employing only a single generator/discriminator pair. Chapter 3 addresses the lack-of-diversity issue present in current single-stage text-to-image generation models. To tackle this problem, we improve on DTGAN with an efficient and effective single-stage framework (DiverGAN) that yields diverse, photo-realistic and semantically related images from a single natural-language description combined with different latent vectors. In Chapter 4, we construct novel ‘Good vs Bad’ data sets consisting of successful as well as unsuccessful synthesized samples of birds and of human faces. On these, dedicated classifiers are trained to verify that generated images are natural, realistic and believable. In Chapters 5 and 6, we investigate the latent space and the linguistic space of a conditional text-to-image GAN model to improve the explainability of the generation process. More specifically, we explore the relationship between the latent control space and the resulting image variation by applying an independent-component analysis algorithm to the pretrained weights of the generator. Furthermore, we qualitatively analyze the roles played by ‘linguistic’ embeddings in the synthetic-image semantic space, using linear and triangular interpolation between keywords. (A minimal illustrative sketch of these two analyses follows this record.)
    • File Description:
      application/pdf
    • Relation:
      https://research.rug.nl/en/publications/1f4cf491-3f28-4974-a8ee-07279a32128f
    • Identifier:
      10.33612/diss.507581028
    • Rights:
      info:eu-repo/semantics/openAccess
    • Identifier:
      edsbas.584C033B
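
The abstract above names two concrete explainability analyses: independent-component analysis (ICA) applied to pretrained generator weights to recover latent control directions, and linear/triangular interpolation between keyword embeddings. The Python sketch below illustrates both under stated assumptions; the weight file name, the latent dimensionality, the edit strength, and the helper names are hypothetical placeholders, not the thesis's actual code.

  import numpy as np
  from sklearn.decomposition import FastICA

  # Assumption: saved weight matrix of the generator layer that consumes the
  # latent code z, of shape (out_features, latent_dim).
  W = np.load("generator_first_layer_weight.npy")

  # Treat each output row of W as an observation over the latent dimensions;
  # the recovered independent components then act as directions in latent space.
  ica = FastICA(n_components=10, random_state=0)
  ica.fit(W)
  directions = ica.components_                     # shape (10, latent_dim)
  directions /= np.linalg.norm(directions, axis=1, keepdims=True)

  z = np.random.randn(W.shape[1])                  # a sampled latent code
  z_edit = z + 3.0 * directions[0]                 # move along one component
  # Rendering G(z) and G(z_edit) with the pretrained generator would expose
  # the image variation tied to that single latent direction.

  def linear_interp(a, b, t):
      # Point at fraction t on the segment between keyword embeddings a and b.
      return (1.0 - t) * a + t * b

  def triangular_interp(a, b, c, u, v):
      # Barycentric blend of three keyword embeddings (u, v >= 0, u + v <= 1).
      return (1.0 - u - v) * a + u * b + v * c

Feeding interpolated embeddings back through the generator's conditioning pathway traces how the synthetic-image semantic space shifts between two (or among three) keywords, which is the kind of qualitative analysis the final chapter describes.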