Feature prediction for minority class data augmentation

Publication Date:
February 04, 2025

Additional Information
- Patent Number:
  12,217,875
- Appl. No:
  17/978009
- Application Filed:
  October 31, 2022
- Abstract:
  A method for generating synthetic training records for use in training a model to predict low-incidence events. A synthetic training record is generated from a minority-class training record by substituting a different value for a feature in the minority-class training record, where the probability of the different value occurring in the minority-class training record exceeds a probability threshold. Also disclosed are a non-transitory storage medium comprising minority-class training records and synthetic training records, and a method of training a machine-learning model using training records augmented with synthetic training records. An exemplary synthetic training record is a synthetic medical record for use in training a model to predict drug overdoses.
- Inventors:
  Pulselight Holdings, Inc. (Austin, TX, US)
- Assignees:
  Pulselight Holdings, Inc. (Austin, TX, US)
- Claim:
  1. A computer-implemented method of generating synthetic minority-class training records for machine learning, the method performed by a computer system, said computer system comprising one or more processors and computer-usable non-transitory storage media operationally coupled to the one or more processors, comprising: storing in the non-transitory storage media a plurality of original minority-class training records, including a first minority-class training record, wherein each of the plurality of original minority-class training records is labeled with a same first label and comprises a feature value for each of a plurality of features, including a first feature, and wherein the first minority-class training record comprises a first feature value for the first feature; using a computational process performed by the one or more processors executing software instructions stored in the computer-usable non-transitory storage media, determining that the probability of the first feature having a different second feature value in the first minority-class training record exceeds a pre-determined probability threshold; and generating a first synthetic minority-class training record from the first minority-class training record, comprising changing the feature value of the first feature in the first minority-class training record from the first feature value to the second feature value, and storing the modified version of the first minority-class training record as the first synthetic minority-class training record in the non-transitory storage media, thereby augmenting the plurality of original minority-class training records with the first synthetic minority-class training record.
- Claim:
  2. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the first feature can be measured with respect to health.
- Claim:
  3. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the original minority-class training records comprise medical records.
- Claim:
  4. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the original minority-class training records are high-dimensional.
- Claim:
  5. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the first label describes a low incidence event.
- Claim:
  6. The computer-implemented method of claim 5 of generating synthetic minority-class training records for machine learning, wherein the first label describes a health care incident.
- Claim:
  7. The computer-implemented method of claim 6 of generating synthetic minority-class training records for machine learning, wherein the first label describes a medication incident.
- Claim:
  8. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the computational process comprises logistic regression.
- Claim:
  9. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the probability threshold is within the range 0.175 to 0.7.
- Claim:
  10. The computer-implemented method of claim 9 of generating synthetic minority-class training records for machine learning, wherein the probability threshold is 0.2.
- Claim:
  11. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the first feature is binary.
- Claim:
  12. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the first feature is categorical.
- Claim:
  13. A computer-usable non-transitory storage medium comprising an augmented plurality of minority-class training records for machine learning, comprising: a plurality of original minority-class training records wherein each of the plurality of original minority-class training records is labeled with a same first label and comprises a feature value for each of a plurality of features; and one or more synthetic minority-class training records having the same first label, wherein each of the one or more synthetic minority-class training records has been generated using a computational method implemented by one or more processors executing software instructions, the computational method comprising: changing the feature value of a first feature in an original minority-class training record from a first feature value to a different second feature value, wherein the probability of the first feature having the different second feature value in the original minority-class training record exceeds a pre-determined probability threshold.
- Claim:
  14. The computer-usable non-transitory storage medium comprising an augmented plurality of minority-class training records for machine learning of claim 13, wherein the first feature can be measured with respect to health.
- Claim:
  15. The computer-usable non-transitory storage medium comprising an augmented plurality of minority-class training records for machine learning of claim 13, wherein the first label describes a low-incidence event.
- Claim:
  16. A computer-implemented method of training a machine learning model to predict a low-incidence event, the method performed by a computer system, said computer system comprising one or more processors and computer-usable non-transitory storage media operationally coupled to the one or more processors, comprising: storing in the non-transitory storage media a plurality of machine-learning training records, the plurality of machine-learning training records comprising: a plurality of majority-class training records; a plurality of original minority-class training records, each original minority-class training record having a same minority-class label describing a low-incidence event; and one or more synthetic minority-class training records having the same minority-class label, wherein each of the one or more synthetic minority-class training records has been generated using a computational method, the method comprising: changing the feature value of a first feature in an original minority-class training record from a first feature value to a different second feature value, wherein the probability of the first feature having the different second feature value in the original minority-class training record exceeds a pre-determined probability threshold; and using a computational machine-learning method performed by one or more processors executing software instructions stored in the computer-usable non-transitory storage media, training a machine learning model with the plurality of machine-learning training records to predict the low-incidence event.
- Claim:
  17. The computer-implemented method of training a machine learning model to predict a low-incidence event of claim 16, wherein the first feature can be measured with respect to health.
- Claim:
  18. The computer-implemented method of training a machine learning model to predict a low-incidence event of claim 16, wherein each of the plurality of minority-class training records comprises medical records.
- Claim:
  19. The computer-implemented method of training a machine learning model to predict a low-incidence event of claim 16, wherein the low-incidence event comprises a health care incident.
- Claim:
  20. The computer-implemented method of training a machine learning model to predict a low-incidence event of claim 19, wherein the low-incidence event comprises a medication incident.
- Patent References Cited:
  11488723 November 2022 Mugan
  11977991 May 2024 Mugan
  2017/0330109 November 2017 Maughan
  2018/0052961 February 2018 Shrivastava
  WO-2016016459 February 2016
- Other References:
  Li, X., et al. “Using machine learning to predict opioid overdoses among prescription opioid users.” Value in Health 21 (2018): S245. (Year: 2018). cited by examiner
- Assistant Examiner:
  Siozopoulos, Constantine
- Primary Examiner:
  Dunham, Jason B
- Attorney, Agent or Firm:
  Williams, Jr., J. Roger
- Accession Number:
  edspgr.12217875
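
The augmentation step recited in claims 1 and 8 through 11 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes binary features, and it assumes that the probability of a feature value is estimated by a logistic regression fitted on the minority-class records themselves, using the remaining features as predictors; the record does not say what data that model is fitted on. The function names (`fit_logistic`, `predict_one`, `augment`), the gradient-descent settings, and the sample records are all hypothetical. The default threshold of 0.2 is the value recited in claim 10.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic-regression weights (bias stored last) by batch gradient descent.

    Hypothetical helper; lr and epochs are illustrative assumptions.
    """
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for row, target in zip(X, y):
            err = predict_one(w, row) - target
            for k, xi in enumerate(row):
                grad[k] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def predict_one(w, row):
    # Estimated probability that the target feature equals 1 for this row.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + w[-1])


def augment(records, feature, threshold=0.2):
    """Return synthetic copies of minority-class records with `feature` flipped.

    A copy is emitted only when the estimated probability of the flipped
    value exceeds `threshold`. Assumes every feature is binary (0 or 1).
    """
    others = [[v for k, v in enumerate(r) if k != feature] for r in records]
    targets = [r[feature] for r in records]
    w = fit_logistic(others, targets)
    synthetic = []
    for rec, row in zip(records, others):
        p_one = predict_one(w, row)
        new_value = 1 - rec[feature]
        p_new = p_one if new_value == 1 else 1.0 - p_one
        if p_new > threshold:
            clone = list(rec)
            clone[feature] = new_value
            synthetic.append(clone)
    return synthetic


# Hypothetical minority-class records, each with three binary features.
records = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
synthetic = augment(records, feature=2)
augmented = records + synthetic  # original records plus synthetic ones
```

Raising the threshold admits fewer synthetic records, because a flipped value must then be more probable given the rest of the record. Under the same assumptions, a categorical feature (claim 12) would need a multiclass probability estimate in place of the binary one used here.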