
METHOD AND APPARATUS FOR INCREMENTAL LEARNING

  • Publication Date:
    May 5, 2022
  • Additional Information
    • Document Number:
      20220138633
    • Appl. No:
      17/317421
    • Application Filed:
      May 11, 2021
    • Abstract:
      An electronic device and method for performing class-incremental learning are provided. The method includes designating a pre-trained first model for at least one past data class as a first teacher; training a second model; designating the trained second model as a second teacher; performing dual-teacher information distillation by maximizing mutual information at intermediate layers of the first teacher and second teacher; and transferring the information to a combined student model.
    • Claim:
      1. A method of performing class-incremental learning, the method comprising: designating a pre-trained first model for at least one past data class as a first teacher; training a second model; designating the trained second model as a second teacher; performing dual-teacher information distillation by maximizing mutual information at intermediate layers of the first teacher and second teacher; and transferring the information to a combined student model.
    • Claim:
      2. The method of claim 1, further comprising: training at least one of a first conditional generator or a second conditional generator to generate synthetic data, given the first model or the second model, without using any stored training data, wherein the synthetic data is configured to mimic training data used to train the first teacher or the second teacher.
    • Claim:
      3. The method of claim 2, further comprising: determining a cross-entropy loss between a label input into the conditional generator and a value output from the first teacher or the second teacher; determining a batch-normalization statistics loss by matching mean and variance variables stored in batch-normalization layers of the first teacher or the second teacher with mean and variance variables computed at the same batch-normalization layers of the first teacher or the second teacher for information output from the conditional generator; and incrementally adjusting the conditional generator to account for the cross-entropy loss and the batch-normalization statistics loss.
    • Claim:
      4. The method of claim 1, wherein the first model designated as the first teacher is updated using weight imprinting by accessing stored training data.
    • Claim:
      5. The method of claim 1, wherein the trained second model designated as the second teacher is trained by using a “none” class in response to training data not being accessible.
    • Claim:
      6. The method of claim 1, wherein performing the dual-teacher information distillation further comprises: applying data-free generative replay to generate a first set of synthetic samples with a first conditional generator for a first class at a first time; applying data-free generative replay to generate a second set of synthetic samples with a second conditional generator for a second class at a second time, wherein the second time is after the first time; determining a dual-teacher information distillation loss based on the first set of synthetic samples and the second set of synthetic samples; and accounting for the dual-teacher information distillation loss when performing dual-teacher information distillation.
    • Claim:
      7. The method of claim 2, wherein training the first conditional generator or the second conditional generator further comprises using a pre-trained model to generate the synthetic data that is used to train the first conditional generator or the second conditional generator without using any stored training data.
    • Claim:
      8. The method of claim 1, wherein the second model designated as the second teacher is trained with new data for each new class that is introduced.
    • Claim:
      9. The method of claim 1, wherein data output from the second teacher and data output from the first teacher are applied to the combined student model to perform dual-teacher information distillation.
    • Claim:
      10. An electronic device for performing class-incremental learning, the electronic device comprising a non-transitory computer readable memory and a processor, wherein the processor, upon executing instructions stored in the non-transitory computer readable memory, is configured to: designate a pre-trained first model for at least one past data class as a first teacher; train a second model; designate the trained second model as a second teacher; perform dual-teacher information distillation by maximizing mutual information at intermediate layers of the first teacher and second teacher; and transfer the information to a combined student model.
    • Claim:
      11. The electronic device of claim 10, wherein the processor, upon executing the instructions stored in the non-transitory computer readable memory, is further configured to: train at least one of a first conditional generator or a second conditional generator to generate synthetic data, given the first model or the second model, without using any stored training data, wherein the synthetic data is configured to mimic training data used to train the first teacher or the second teacher.
    • Claim:
      12. The electronic device of claim 11, wherein the processor, upon executing the instructions stored in the non-transitory computer readable memory, is further configured to: determine a cross-entropy loss between a label input into the conditional generator and a value output from the first teacher or the second teacher; determine a batch-normalization statistics loss by matching mean and variance variables stored in batch-normalization layers of the first teacher or the second teacher with mean and variance variables computed at the same batch-normalization layers of the first teacher or the second teacher for information output from the conditional generator; and incrementally adjust the conditional generator to account for the cross-entropy loss and the batch-normalization statistics loss.
    • Claim:
      13. The electronic device of claim 10, wherein the first model designated as the first teacher is updated using weight imprinting by accessing stored training data.
    • Claim:
      14. The electronic device of claim 10, wherein the trained second model designated as the second teacher is trained by using a “none” class in response to training data not being accessible.
    • Claim:
      15. The electronic device of claim 10, wherein performing the dual-teacher information distillation further comprises: applying data-free generative replay to generate a first set of synthetic samples with a first conditional generator for a first class at a first time; applying data-free generative replay to generate a second set of synthetic samples with a second conditional generator for a second class at a second time, wherein the second time is after the first time; determining a dual-teacher information distillation loss based on the first set of synthetic samples and the second set of synthetic samples; and accounting for the dual-teacher information distillation loss when performing dual-teacher information distillation.
    • Claim:
      16. The electronic device of claim 11, wherein training the first conditional generator or the second conditional generator further comprises using a pre-trained model to generate the synthetic data that is used to train the first conditional generator or the second conditional generator without using any stored training data.
    • Claim:
      17. The electronic device of claim 10, wherein the second teacher is trained with new data for each new class that is introduced.
    • Claim:
      18. The electronic device of claim 10, wherein data output from the second teacher and data output from the pre-trained first teacher are applied to the combined student model to perform dual-teacher information distillation.
    • Current International Class:
      06; 06
    • Accession Number:
      edspap.20220138633
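
The abstract and claims 1, 6, and 9 describe distilling a frozen past-class teacher and a frozen new-class teacher into a combined student over a batch that mixes replayed old-class samples and new-class samples. The minimal PyTorch sketch below is illustrative only: the student interface (returning intermediate features and logits), the projection heads proj_old and proj_new, the ordering of old and new logits in the student head, and the use of an MSE feature-alignment term as a stand-in for the claimed mutual-information maximization at intermediate layers are all assumptions, since the record does not specify an estimator.

```python
# Hypothetical sketch of the dual-teacher distillation step (claims 1, 6, 9).
# The mutual-information term at intermediate layers is approximated by a
# feature-alignment (MSE) loss through small projection heads -- an assumption.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher_old, teacher_new,
                      proj_old, proj_new,              # assumed heads mapping student features to each teacher's feature space
                      x_old_synth, x_new, optimizer,   # optimizer covers student + projection-head parameters
                      temperature=2.0, mi_weight=1.0):
    teacher_old.eval()
    teacher_new.eval()

    x = torch.cat([x_old_synth, x_new], dim=0)         # claim 6: replayed old-class samples plus new-class samples
    s_feat, s_logits = student(x)                      # assumed to return (intermediate features, logits over old+new classes)

    with torch.no_grad():
        t_old_feat, t_old_logits = teacher_old(x)
        t_new_feat, t_new_logits = teacher_new(x)

    # Claim 9: outputs of both teachers are applied to the combined student.
    kd_old = F.kl_div(
        F.log_softmax(s_logits[:, :t_old_logits.size(1)] / temperature, dim=1),
        F.softmax(t_old_logits / temperature, dim=1), reduction="batchmean")
    kd_new = F.kl_div(
        F.log_softmax(s_logits[:, -t_new_logits.size(1):] / temperature, dim=1),
        F.softmax(t_new_logits / temperature, dim=1), reduction="batchmean")

    # Stand-in for "maximizing mutual information at intermediate layers":
    # align projected student features with each teacher's intermediate features.
    mi_old = F.mse_loss(proj_old(s_feat), t_old_feat)
    mi_new = F.mse_loss(proj_new(s_feat), t_new_feat)

    loss = kd_old + kd_new + mi_weight * (mi_old + mi_new)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```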
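Claims 2-3 (and 11-12) describe training a label-conditioned generator without any stored data, using a cross-entropy loss between the conditioning label and the frozen teacher's prediction on the synthetic sample, plus a batch-normalization statistics loss that matches each BN layer's stored running mean and variance against the statistics the synthetic batch produces at that layer. A minimal sketch follows; the generator(z, labels) interface, the hook-based capture of BN inputs, and the bn_weight coefficient are illustrative assumptions.

```python
# Hypothetical sketch of the conditional-generator losses in claims 2-3.
import torch
import torch.nn as nn
import torch.nn.functional as F

def bn_statistics_loss(teacher: nn.Module, activations: dict) -> torch.Tensor:
    """Match per-layer batch statistics to the BN layers' stored running statistics."""
    loss = 0.0
    for name, module in teacher.named_modules():
        if isinstance(module, nn.BatchNorm2d) and name in activations:
            x = activations[name]                        # input to this BN layer
            batch_mean = x.mean(dim=(0, 2, 3))
            batch_var = x.var(dim=(0, 2, 3), unbiased=False)
            loss = loss + F.mse_loss(batch_mean, module.running_mean)
            loss = loss + F.mse_loss(batch_var, module.running_var)
    return loss

def train_generator_step(generator, teacher, optimizer, batch_size, num_classes,
                         z_dim=128, bn_weight=1.0):
    teacher.eval()                                       # the teacher stays frozen
    z = torch.randn(batch_size, z_dim)
    labels = torch.randint(0, num_classes, (batch_size,))

    # Capture the input of every BN layer with forward hooks.
    activations, hooks = {}, []
    for name, module in teacher.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            hooks.append(module.register_forward_hook(
                lambda m, inp, out, name=name: activations.__setitem__(name, inp[0])))

    synthetic = generator(z, labels)                     # label-conditioned synthetic samples
    logits = teacher(synthetic)
    ce_loss = F.cross_entropy(logits, labels)            # claim 3: cross-entropy term
    bn_loss = bn_statistics_loss(teacher, activations)   # claim 3: BN-statistics term
    loss = ce_loss + bn_weight * bn_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    for h in hooks:
        h.remove()
    return loss.item()
```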
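Claims 4 and 13 update the first teacher by weight imprinting when stored training data is accessible. A common form of weight imprinting sets the classifier row for a class to the normalized mean embedding of that class's stored samples; the sketch below assumes a bias-free linear classifier over L2-normalized features, which is an assumption rather than a detail taken from the record.

```python
# Hypothetical sketch of the weight-imprinting update in claims 4 and 13.
import torch
import torch.nn.functional as F

@torch.no_grad()
def imprint_class_weights(feature_extractor, classifier, stored_x, class_index):
    """classifier is assumed to be a bias-free nn.Linear applied to normalized features."""
    feature_extractor.eval()
    emb = feature_extractor(stored_x)                  # (N, D) embeddings of stored samples
    emb = F.normalize(emb, dim=1)
    prototype = F.normalize(emb.mean(dim=0), dim=0)    # class prototype on the unit sphere
    classifier.weight[class_index] = prototype         # imprint the row for this class
```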
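Claims 5 and 14 train the second teacher with a "none" class when training data for past classes is not accessible. A minimal sketch, assuming the new-class model simply carries one extra output that absorbs inputs outside the new classes; the loss composition and the source of the "unknown" inputs are assumptions for illustration.

```python
# Hypothetical sketch of the "none" class in claims 5 and 14.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_NEW_CLASSES = 5
NONE_CLASS = NUM_NEW_CLASSES                           # index of the extra "none" output

def new_teacher_loss(model: nn.Module, x_new, y_new, x_unknown):
    """Cross-entropy over the new classes, plus the 'none' class for unknown inputs."""
    logits_new = model(x_new)                          # labeled new-class samples
    logits_unknown = model(x_unknown)                  # samples outside the new classes
    y_none = torch.full((x_unknown.size(0),), NONE_CLASS, dtype=torch.long)
    return F.cross_entropy(logits_new, y_new) + F.cross_entropy(logits_unknown, y_none)
```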