- Patent Number:
12,217,875
- Appl. No:
17/978009
- Application Filed:
October 31, 2022
- Abstract:
A method for generating synthetic training records for use in training a model to predict low-incidence events. A synthetic training record is generated from a minority-class training record by substituting a different value for a feature in the minority-class training record, where the probability of the different value occurring in the minority-class training record exceeds a probability threshold. Also disclosed are a non-transitory storage medium comprising minority-class training records and synthetic training records and a method of training a machine-learning model using training records augmented with synthetic training records. An exemplary synthetic training record is a synthetic medical record for use in training a model to predict drug overdoses.
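As a rough illustration of the substitution step the abstract describes, the following Python sketch copies a minority-class record and swaps one feature to a different value when that value's estimated probability exceeds a threshold. Everything in it is an assumption made for illustration: the function names `value_probabilities` and `synthesize`, the toy records and feature names, the 0.25 threshold, and in particular the use of a value's empirical frequency among the minority-class records as the probability estimate, which the abstract does not specify.

```python
from collections import Counter

def value_probabilities(records, feature):
    """Empirical frequency of each value of `feature` across `records`.

    Assumption: a plain frequency stands in for the probability estimate;
    the abstract does not say how the probability is computed.
    """
    counts = Counter(r[feature] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def synthesize(records, feature, threshold):
    """Return synthetic copies of `records` in which `feature` is replaced
    by a different value whose estimated probability exceeds `threshold`."""
    probs = value_probabilities(records, feature)
    synthetic = []
    for record in records:
        for value, p in probs.items():
            if value != record[feature] and p > threshold:
                new_record = dict(record)  # copy, so the original is untouched
                new_record[feature] = value
                synthetic.append(new_record)
    return synthetic

# Hypothetical minority-class records, all carrying the same label.
minority = [
    {"age_band": "40-49", "prior_er_visit": 1, "label": "overdose"},
    {"age_band": "40-49", "prior_er_visit": 0, "label": "overdose"},
    {"age_band": "50-59", "prior_er_visit": 1, "label": "overdose"},
    {"age_band": "30-39", "prior_er_visit": 1, "label": "overdose"},
    {"age_band": "40-49", "prior_er_visit": 1, "label": "overdose"},
]

synthetic = synthesize(minority, "age_band", threshold=0.25)
augmented = minority + synthetic  # originals plus synthetic records
```

With these toy records only the value "40-49" (frequency 0.6) clears the 0.25 threshold, so the two records with other age bands each yield one synthetic copy and the original records are left unchanged.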
- Inventors:
Pulselight Holdings, Inc. (Austin, TX, US)
- Assignees:
Pulselight Holdings, Inc. (Austin, TX, US)
- Claim:
1. A computer-implemented method of generating synthetic minority-class training records for machine learning, the method performed by a computer system, said computer system comprising one or more processors and computer-usable non-transitory storage media operationally coupled to the one or more processors, comprising: storing in the non-transitory storage media a plurality of original minority-class training records, including a first minority-class training record, wherein each of the plurality of original minority-class training records is labeled with a same first label and comprises a feature value for each of a plurality of features, including a first feature, and wherein the first minority-class training record comprises a first feature value for the first feature; using a computational process performed by the one or more processors executing software instructions stored in the computer-usable non-transitory storage media, determining that the probability of the first feature having a different second feature value in the first minority-class training record exceeds a pre-determined probability threshold; and generating a first synthetic minority-class training record from the first minority-class training record, comprising changing the feature value of the first feature in the first minority-class training record from the first feature value to the second feature value, and storing the modified version of the first minority-class training record as the first synthetic minority-class training record in the non-transitory storage media, thereby augmenting the plurality of original minority-class training records with the first synthetic minority-class training record.
- Claim:
2. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the first feature can be measured with respect to health.
- Claim:
3. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the original minority-class training records comprise medical records.
- Claim:
4. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the original minority-class training records are high-dimensional.
- Claim:
5. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the first label describes a low incidence event.
- Claim:
6. The computer-implemented method of claim 5 of generating synthetic minority-class training records for machine learning, wherein the first label describes a health care incident.
- Claim:
7. The computer-implemented method of claim 6 of generating synthetic minority-class training records for machine learning, wherein the first label describes a medication incident.
- Claim:
8. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the computational process comprises logistic regression.
- Claim:
9. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the probability threshold is within the range 0.175 to 0.7.
- Claim:
10. The computer-implemented method of claim 9 of generating synthetic minority-class training records for machine learning, wherein the probability threshold is 0.2.
- Claim:
11. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the first feature is binary.
- Claim:
12. The computer-implemented method of claim 1 of generating synthetic minority-class training records for machine learning, wherein the first feature is categorical.
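Claims 8 through 11 name logistic regression, a probability threshold of 0.2, and a binary first feature. One way these might fit together is sketched below in dependency-free Python. The sketch rests on assumptions the claims do not state: the regression is fit on the minority-class records themselves, it predicts the target feature from the remaining features, and it is trained by plain batch gradient descent. The names `fit_logistic`, `predict_proba`, and `flip_feature`, along with the toy records, are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that the target feature equals 1 given x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def flip_feature(records, target, threshold=0.2):
    """Emit a synthetic copy of each record whose binary feature `target`
    has an estimated probability above `threshold` of taking the other value.

    Assumption: the model is fit on `records` itself, using every other
    feature as a predictor of the target feature.
    """
    X = [[v for j, v in enumerate(r) if j != target] for r in records]
    y = [r[target] for r in records]
    w, b = fit_logistic(X, y)
    synthetic = []
    for r, x in zip(records, X):
        p_one = predict_proba(w, b, x)
        p_other = p_one if r[target] == 0 else 1.0 - p_one
        if p_other > threshold:
            s = list(r)  # copy, so the original record is untouched
            s[target] = 1 - r[target]
            synthetic.append(s)
    return synthetic

# Hypothetical minority-class records with three binary features each.
records = [
    [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 0],
    [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 1, 1],
]
loose = flip_feature(records, target=2, threshold=0.2)
strict = flip_feature(records, target=2, threshold=0.5)
```

On this toy data the fitted probability of the third feature being 1 is roughly 0.75 when the first feature is 1 and roughly 0.25 otherwise. At the 0.2 threshold every record therefore yields a synthetic copy, while at 0.5 only the two records whose third feature disagrees with the fitted trend do. A lower threshold produces more synthetic records at the cost of less plausible substitutions.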
- Claim:
13. A computer-usable non-transitory storage medium comprising an augmented plurality of minority-class training records for machine learning, comprising: a plurality of original minority-class training records wherein each of the plurality of original minority-class training records is labeled with a same first label and comprises a feature value for each of a plurality of features; and one or more synthetic minority-class training records having the same first label, wherein each of the one or more synthetic minority-class training records has been generated using a computational method implemented by one or more processors executing software instructions, the computational method comprising: changing the feature value of a first feature in an original minority-class training record from a first feature value to a different second feature value, wherein the probability of the first feature having the different second feature value in the original minority-class training record exceeds a pre-determined probability threshold.
- Claim:
14. The computer-usable non-transitory storage medium comprising an augmented plurality of minority-class training records for machine learning of claim 13, wherein the first feature can be measured with respect to health.
- Claim:
15. The computer-usable non-transitory storage medium comprising an augmented plurality of minority-class training records for machine learning of claim 13, wherein the first label describes a low-incidence event.
- Claim:
16. A computer-implemented method of training a machine learning model to predict a low-incidence event, the method performed by a computer system, said computer system comprising one or more processors and computer-usable non-transitory storage media operationally coupled to the one or more processors, comprising: storing in the non-transitory storage media a plurality of machine-learning training records, the plurality of machine-learning training records comprising: a plurality of majority-class training records; a plurality of original minority-class training records, each original minority-class training record having a same minority-class label describing a low-incidence event; and one or more synthetic minority-class training records having the same minority-class label, wherein each of the one or more synthetic minority-class training records has been generated using a computational method, the method comprising: changing the feature value of a first feature in an original minority-class training record from a first feature value to a different second feature value, wherein the probability of the first feature having the different second feature value in the original minority-class training record exceeds a pre-determined probability threshold; and using a computational machine-learning method performed by one or more processors executing software instructions stored in the computer-usable non-transitory storage media, training a machine learning model with the plurality of machine-learning training records to predict the low-incidence event.
- Claim:
17. The computer-implemented method of training a machine learning model to predict a low-incidence event of claim 16, wherein the first feature can be measured with respect to health.
- Claim:
18. The computer-implemented method of training a machine learning model to predict a low-incidence event of claim 16, wherein each of the plurality of minority-class training records comprises medical records.
- Claim:
19. The computer-implemented method of training a machine learning model to predict a low-incidence event of claim 16, wherein the low-incidence event comprises a health care incident.
- Claim:
20. The computer-implemented method of training a machine learning model to predict a low-incidence event of claim 19, wherein the low-incidence event comprises a medication incident.
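Claims 16 through 20 describe training a model on majority-class records, original minority-class records, and synthetic minority-class records together. The claims do not name a model type, so the sketch below uses a small Bernoulli naive Bayes classifier purely as a stand-in. The function names `train_bernoulli_nb` and `predict`, the toy binary records, and the labels 0 (majority) and 1 (low-incidence event) are assumptions for illustration. The synthetic record is written by hand here, standing in for the output of the generation method.

```python
import math
from collections import defaultdict

def train_bernoulli_nb(records, labels):
    """Fit class priors and per-feature Bernoulli rates (Laplace-smoothed).

    Stand-in model: the claims only require "a computational
    machine-learning method", not naive Bayes specifically.
    """
    by_class = defaultdict(list)
    for x, y in zip(records, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        prior = len(rows) / len(records)
        rates = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[y] = (prior, rates)
    return model

def predict(model, x):
    """Return the label with the highest log posterior for binary vector x."""
    def score(item):
        prior, rates = item[1]
        return math.log(prior) + sum(
            math.log(r if v else 1.0 - r) for r, v in zip(rates, x))
    return max(model.items(), key=score)[0]

# Hypothetical training data: label 0 is the majority class, label 1 the
# low-incidence event.
majority = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0]]
minority = [[1, 1, 1], [1, 1, 0]]
synthetic = [[1, 0, 1]]  # [1, 1, 1] with its second feature changed

records = majority + minority + synthetic
labels = [0] * len(majority) + [1] * (len(minority) + len(synthetic))
model = train_bernoulli_nb(records, labels)

likely_event = predict(model, [1, 1, 1])
likely_normal = predict(model, [0, 0, 0])
```

Adding the synthetic record raises the minority-class prior from 2/8 to 3/9 and widens the set of feature combinations the model associates with the low-incidence label, which is the effect the augmentation is meant to have on an imbalanced training set.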
- Patent References Cited:
11488723 November 2022 Mugan
11977991 May 2024 Mugan
2017/0330109 November 2017 Maughan
2018/0052961 February 2018 Shrivastava
WO-2016016459 February 2016
- Other References:
Li, X., et al. “Using machine learning to predict opioid overdoses among prescription opioid users.” Value in Health 21 (2018): S245. (Year: 2018). cited by examiner
- Assistant Examiner:
Siozopoulos, Constantine
- Primary Examiner:
Dunham, Jason B
- Attorney, Agent or Firm:
Williams, Jr., J. Roger
- Accession Number:
edspgr.12217875