- Patent Number:
12,124,537
- Appl. No:
17/567,718
- Application Filed:
January 03, 2022
- Abstract:
A computer-implemented method according to one embodiment includes causing an environment generator of a Generative Adversarial Network (GAN) to generate realistic training environments, and causing a first discriminator of the GAN to determine whether the realistic training environments are real or fake. In response to a determination that an accuracy of the first discriminator at determining whether the realistic training environments are real or fake is within a predetermined range, the environment generator is caused to generate a first realistic environment. The method further includes causing the first realistic environment to be shared with an agent of a reinforcement learning (RL) algorithm and a second discriminator, and receiving, from the agent of the RL algorithm and the second discriminator, feedback associated with the first realistic environment. The environment generator is caused to generate a second realistic environment based on the feedback associated with the first realistic environment.
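In rough outline, the abstract describes a two-phase procedure: first train the GAN until the first discriminator's real/fake accuracy falls within a predetermined range, then have the generator emit environments that an RL agent and a second discriminator critique. Below is a minimal Python sketch of that flow; the class names, the accuracy curve, and the 0.45-0.55 band are illustrative assumptions and do not come from the patent.

```python
# Minimal sketch of the abstract's two-phase flow, not the patented
# implementation; class names, the accuracy curve, and the 0.45-0.55
# band are illustrative assumptions.
import random

class EnvironmentGenerator:
    """Stand-in for the GAN's environment generator."""
    def generate(self) -> dict:
        # A "realistic environment" is reduced to a single difficulty knob.
        return {"difficulty": random.random()}

def discriminator_accuracy(epoch: int) -> float:
    """Stand-in for the first discriminator's real/fake accuracy."""
    return max(0.5, 0.95 - 0.05 * epoch)  # decays as the generator improves

# Phase 1: keep training until the first discriminator's accuracy falls
# within a predetermined range (near chance, i.e. it can no longer reliably
# distinguish generated environments from real ones).
epoch = 0
while not 0.45 <= discriminator_accuracy(epoch) <= 0.55:
    epoch += 1

# Phase 2: the trained generator emits a first realistic environment, which
# is shared with the RL agent and a second discriminator for feedback.
generator = EnvironmentGenerator()
first_environment = generator.generate()
print(f"converged after {epoch} epochs; first environment: {first_environment}")
```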
- Assignees:
International Business Machines Corporation (Armonk, NY, US)
- Claim:
1. A computer-implemented method, comprising: causing an environment generator of a Generative Adversarial Network (GAN) to generate realistic training environments; causing a first discriminator of the GAN to determine whether the realistic training environments are real or fake to train the environment generator to generate realistic environments; in response to a determination that an accuracy of the first discriminator at determining whether the realistic training environments are real or fake is within a predetermined range, causing the environment generator to generate a first realistic environment; causing the first realistic environment to be shared with an agent of a reinforcement learning (RL) algorithm and a second discriminator that is a different discriminator than the first discriminator; receiving, from the agent of the RL algorithm and the second discriminator, feedback associated with the first realistic environment, wherein the feedback includes a confidence score that includes a numerical score of difficulty incorporated into the first realistic environment; and causing the environment generator to generate a second realistic environment based on the feedback associated with the first realistic environment.
- Claim:
2. The computer-implemented method of claim 1, comprising: causing the agent to perform at least one action of a plurality of predetermined actions using the first realistic environment to obtain an actual reward.
- Claim:
3. The computer-implemented method of claim 2, wherein the agent does not receive a reward as a result of performing the at least one action, wherein generating the second realistic environment includes: adding a predetermined degree of difficulty to the first realistic environment in response to a determination that the feedback indicates that the agent did not receive a reward as a result of performing the at least one action.
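Claims 2 and 3 tie the generator's next output to the agent's reward signal: the agent performs one of a plurality of predetermined actions, and if no reward results, a predetermined degree of difficulty is added. A minimal sketch of that rule, assuming a toy action set, reward function, and step size that are not specified in the patent:

```python
# Illustrative reading of claims 2 and 3; the action set, reward rule, and
# DIFFICULTY_STEP are assumptions, not taken from the patent.
import random

DIFFICULTY_STEP = 0.1  # the "predetermined degree of difficulty"
ACTIONS = ["move_left", "move_right", "jump"]  # plurality of predetermined actions

def perform_action(environment: dict, action: str) -> float:
    """Toy reward: the agent succeeds only in sufficiently easy environments."""
    return 1.0 if environment["difficulty"] < 0.7 else 0.0

first_environment = {"difficulty": 0.8}
reward = perform_action(first_environment, random.choice(ACTIONS))

if reward == 0.0:
    # Claim 3: the agent obtained no reward, so the second environment adds a
    # predetermined degree of difficulty to the first.
    second_environment = {"difficulty": first_environment["difficulty"] + DIFFICULTY_STEP}
else:
    second_environment = dict(first_environment)
```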
- Claim:
4. The computer-implemented method of claim 1, wherein the feedback is based on an actual reward obtained by the agent of the RL algorithm, wherein the confidence score includes a determination of whether a degree of difficulty incorporated into the first realistic environment is correct.
- Claim:
5. The computer-implemented method of claim 4, wherein generating the second realistic environment includes: adding a predetermined degree of difficulty to the first realistic environment in response to a determination that the confidence score indicates that the degree of difficulty incorporated into the first realistic environment is incorrect and/or the numerical score of difficulty falls within a predetermined range of values.
- Claim:
6. The computer-implemented method of claim 4, wherein generating the second realistic environment includes: subtracting a predetermined degree of difficulty from the first realistic environment in response to a determination that the confidence score indicates that the degree of difficulty incorporated into the first realistic environment is incorrect and/or the numerical score of difficulty falls within a predetermined range of values.
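Claims 4 through 6 make the confidence score the control signal: it carries a correctness judgment and a numerical difficulty score, and difficulty is added or subtracted when the judgment is negative and/or the score falls in a predetermined range. One plausible reading, with field names, the threshold, and the easy/hard interpretation all assumed rather than taken from the patent:

```python
# One plausible decision rule for claims 4-6; the ConfidenceScore fields, the
# 0.5 threshold, and the easy/hard interpretation are assumptions.
from dataclasses import dataclass

DIFFICULTY_STEP = 0.1  # the "predetermined degree of difficulty"

@dataclass
class ConfidenceScore:
    difficulty_score: float   # numerical score of difficulty incorporated
    difficulty_correct: bool  # whether that degree of difficulty is correct

def adjust_difficulty(current: float, score: ConfidenceScore) -> float:
    if score.difficulty_correct:
        return current  # difficulty judged appropriate: leave it unchanged
    # Difficulty judged incorrect: the numerical score picks the direction.
    if score.difficulty_score <= 0.5:
        return current + DIFFICULTY_STEP  # claim 5: too easy, add difficulty
    return current - DIFFICULTY_STEP      # claim 6: too hard, subtract difficulty

# Example: a score flagging the first environment as incorrectly easy.
print(adjust_difficulty(0.4, ConfidenceScore(0.3, False)))  # -> 0.5
```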
- Claim:
7. The computer-implemented method of claim 1, comprising: causing the second realistic environment to be shared with the agent of the RL algorithm and the second discriminator; receiving, from the agent of the RL algorithm and the second discriminator, feedback associated with the second realistic environment; and causing the environment generator to generate a third realistic environment based on the feedback associated with the second realistic environment.
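Claim 7 closes the loop: each generated environment is shared with the agent and the second discriminator, and their feedback drives the next generation. A self-contained sketch of that iteration, with stand-in generator and feedback functions that are purely illustrative:

```python
# Sketch of the iteration in claim 7: each environment is shared with the agent
# and second discriminator, and their combined feedback shapes the next one.
# The generator and feedback stand-ins are purely illustrative.
import random

def generate(difficulty: float) -> dict:
    return {"difficulty": round(difficulty, 2)}  # stand-in environment generator

def get_feedback(environment: dict) -> float:
    # Stand-in for the agent's reward signal plus the second discriminator's
    # confidence score, collapsed into one difficulty adjustment.
    return random.uniform(-0.1, 0.1)

difficulty = 0.5
for n in (1, 2, 3):  # first, second, and third realistic environments
    environment = generate(difficulty)
    difficulty += get_feedback(environment)
    print(f"environment {n}: {environment}")
```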
- Claim:
8. A computer program product, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions readable and/or executable by a computer to cause the computer to: cause, by the computer, an environment generator of a Generative Adversarial Network (GAN) to generate realistic training environments; cause, by the computer, a first discriminator of the GAN to determine whether the realistic training environments are real or fake to train the environment generator to generate realistic environments; in response to a determination that an accuracy of the first discriminator at determining whether the realistic training environments are real or fake is within a predetermined range, cause, by the computer, the environment generator to generate a first realistic environment; cause, by the computer, the first realistic environment to be shared with an agent of a reinforcement learning (RL) algorithm and a second discriminator; receive, by the computer, feedback associated with the first realistic environment, wherein the feedback includes a confidence score that includes a numerical score of difficulty incorporated into the first realistic environment; and cause, by the computer, the environment generator to generate a second realistic environment based on the feedback associated with the first realistic environment.
- Claim:
9. The computer program product of claim 8, the program instructions readable and/or executable by the computer to cause the computer to: cause, by the computer, the agent to perform at least one action of a plurality of predetermined actions using the first realistic environment to obtain an actual reward.
- Claim:
10. The computer program product of claim 9, wherein the agent does not receive a reward as a result of performing the at least one action, wherein generating the second realistic environment includes: adding a predetermined degree of difficulty to the first realistic environment in response to a determination that the feedback indicates that the agent did not receive a reward as a result of performing the at least one action.
- Claim:
11. The computer program product of claim 8, wherein the feedback is received, by the computer, from the agent of the RL algorithm and the second discriminator and is based on an actual reward obtained by the agent of the RL algorithm, wherein the confidence score includes a determination of whether a degree of difficulty incorporated into the first realistic environment is correct.
- Claim:
12. The computer program product of claim 11, wherein generating the second realistic environment includes: adding a predetermined degree of difficulty to the first realistic environment in response to a determination that the confidence score indicates that the degree of difficulty incorporated into the first realistic environment is incorrect and/or the numerical score of difficulty falls within a predetermined range of values.
- Claim:
13. The computer program product of claim 11, wherein generating the second realistic environment includes: subtracting a predetermined degree of difficulty from the first realistic environment in response to a determination that the confidence score indicates that the degree of difficulty incorporated into the first realistic environment is incorrect and/or the numerical score of difficulty falls within a predetermined range of values.
- Claim:
14. The computer program product of claim 8, the program instructions readable and/or executable by the computer to cause the computer to: cause, by the computer, the second realistic environment to be shared with the agent of the RL algorithm and the second discriminator; receive, by the computer, from the agent of the RL algorithm and the second discriminator, feedback associated with the second realistic environment; and cause, by the computer, the environment generator to generate a third realistic environment based on the feedback associated with the second realistic environment.
- Claim:
15. A system, comprising: a processor; and logic integrated with the processor, executable by the processor, or integrated with and executable by the processor, the logic being configured to: cause an environment generator of a Generative Adversarial Network (GAN) to generate realistic training environments; cause a first discriminator of the GAN to determine whether the realistic training environments are real or fake to train the environment generator to generate realistic environments; in response to a determination that an accuracy of the first discriminator at determining whether the realistic training environments are real or fake is within a predetermined range, cause the environment generator to generate a first realistic environment; cause the first realistic environment to be shared with an agent of a reinforcement learning (RL) algorithm and a second discriminator; receive, from the agent of the RL algorithm and the second discriminator, feedback associated with the first realistic environment, wherein the feedback includes a confidence score that is based on an actual reward obtained by the agent of the RL algorithm, wherein the confidence score includes a numerical score of difficulty incorporated into the first realistic environment and a determination of whether a degree of difficulty incorporated into the first realistic environment is correct; and cause the environment generator to generate a second realistic environment based on the feedback associated with the first realistic environment.
- Claim:
16. The system of claim 15, the logic being configured to: cause the agent to perform at least one action of a plurality of predetermined actions using the first realistic environment to obtain an actual reward.
- Claim:
17. The system of claim 16, wherein the agent does not receive a reward as a result of performing the at least one action, wherein generating the second realistic environment includes: adding a predetermined degree of difficulty to the first realistic environment in response to a determination that the feedback indicates that the agent did not receive a reward as a result of performing the at least one action.
- Claim:
18. The system of claim 15, wherein generating the second realistic environment includes: adding a predetermined degree of difficulty to the first realistic environment in response to a determination that the confidence score indicates that the degree of difficulty incorporated into the first realistic environment is incorrect and/or the numerical score of difficulty falls within a predetermined range of values.
- Claim:
19. The system of claim 15, wherein generating the second realistic environment includes: subtracting a predetermined degree of difficulty from the first realistic environment in response to a determination that the confidence score indicates that the degree of difficulty incorporated into the first realistic environment is incorrect and/or the numerical score of difficulty falls within a predetermined range of values.
- Patent References Cited:
20190050726 February 2019 Azaria
20190126472 May 2019 Tunyasuvunakool et al.
20200065673 February 2020 Huang et al.
20200234144 July 2020 Such et al.
20220366264 November 2022 Moradi
20230214725 July 2023 Hu
2020216431 October 2020
- Other References:
Sarmad et al., “RL-GAN-Net: A Reinforcement Learning Agent Controlled GAN Network for Real-Time Point Cloud Shape Completion,” arXiv, 2019, 21 pages, retrieved from https://arxiv.org/pdf/1904.12304.pdf. cited by applicant
Zhao et al., “Simulating User Feedback for Reinforcement Learning Based Recommendations,” arXiv, 2019, 10 pages, retrieved from https://arxiv.org/pdf/1906.11462.pdf. cited by applicant
Florensa et al., “Automatic Goal Generation for Reinforcement Learning Agents,” Proceedings of the 35th International Conference on Machine Learning, 2018, 14 pages, retrieved from https://arxiv.org/pdf/1705.06366.pdf. cited by applicant
Kim et al., “Learning to Simulate Dynamic Environments with GameGAN,” arXiv, 2020, 16 pages, retrieved from https://arxiv.org/pdf/2005.12126.pdf. cited by applicant
- Primary Examiner:
Hicks, Austin
- Attorney, Agent or Firm:
Zilka-Kotab, P.C.
- Identifier:
edspgr.12124537