- Document Number:
20110178964
- Appl. No:
12/690976
- Application Filed:
January 21, 2010
- Abstract:
The present invention solves the problems of cold start, first rater, sparsity and scalability in recommendation. A recommendation system according to the present invention finds association rules through data mining, then integrates a rough-set algorithm with a statistical analysis prediction. The final recommendation is selected dynamically between the result of the rough-set algorithm and the result of the statistical analysis prediction, with a standard deviation set as the threshold.
- Inventors:
Tseng, Shin-Mu (Tainan City, TW); Su, Ja-Hwung (Qiaotou Shiang, TW); Hsaio, Chin Yuan (Tainan City, TW)
- Assignees:
NATIONAL CHENG KUNG UNIVERSITY (Tainan City, TW)
- Claim:
1. A recommendation method using rough-set and multiple features mining integrally, said method comprising a training session and a prediction session, said training session building association rules, user clusters and rating tables, said training session comprising steps of: (a) providing data including user profiles, user rating logs and item contents; (b) pre-processing said data to obtain a transaction table; (c) associating transactions in said transaction table to obtain a plurality of associations to further obtain a plurality of association rules through data mining to be saved in an association rule database; (d) obtaining said user rating logs to divide users in said user rating logs into user clusters through a clustering algorithm to be saved in a user cluster database; and (e) analyzing said transactions in said transaction table to re-symbolize items into item categories through a statistical analysis and reorganizing user rating logs to obtain rating averages of said item categories and to further obtain a rating table of said re-symbolized item categories, said prediction session applying rough-set and statistical analysis prediction to obtain predicted rating values from said user rating logs, said prediction session comprising steps of: (f) finding a user cluster of related users to a target user from said user clusters to obtain a rating table of said related users and said target user; (g) based on said association rules, predicting unknown values in said rating table other than rating value of a target item of said target user to obtain a complete sub-matrix; (h) obtaining a class item, a referred item and a plurality of item sets in said sub-matrix, obtaining a plurality of first elementary sets by dividing said users with said class item, obtaining a plurality of second elementary sets by dividing said users with said item sets, and comparing said first elementary sets and said second elementary sets to obtain a lower approximation through a rough-set algorithm using a user cardinality constraint and an item cardinality constraint to further obtain a predicted rating value of said target item of said target user; (i) obtaining predicted rating values of said item categories in said rating table obtained through said statistical analysis prediction in said training session to further obtain another predicted rating value of said target item of said target user; and (j) obtaining a final predicted rating value of said target item of said target user through a switch-based mixing, wherein a first standard deviation is pre-set as a threshold; wherein said predicted rating value obtained through said statistical analysis prediction is obtained as said final predicted rating value of said target item on obtaining a second standard deviation bigger than said threshold, said second standard deviation being a standard deviation of past rating values of the same item category as that of said target item; and wherein said predicted rating value obtained through said rough-set algorithm is obtained as said final predicted rating value on obtaining said second standard deviation not bigger than said threshold.
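The switch-based mixing of step (j) can be illustrated with a minimal Python sketch. The function and parameter names are hypothetical, and the use of the population standard deviation is an assumption, since this record does not reproduce the patent's own formula.

```python
from statistics import pstdev


def switch_based_rating(rsp_prediction, sap_prediction, category_ratings, threshold):
    """Choose the final predicted rating in the manner of step (j).

    rsp_prediction   -- rating predicted by the rough-set algorithm
    sap_prediction   -- rating predicted by the statistical analysis prediction
    category_ratings -- past ratings in the target item's category (non-empty)
    threshold        -- the pre-set "first standard deviation"
    """
    # "Second standard deviation": spread of past ratings in the same category.
    # Assumption: population standard deviation (pstdev), not the sample one.
    spread = pstdev(category_ratings)
    if spread > threshold:
        # Spread bigger than the threshold: use the statistical prediction.
        return sap_prediction
    # Spread not bigger than the threshold: use the rough-set prediction.
    return rsp_prediction
```

For example, with a threshold of 1.0, past category ratings of `[1, 5, 1, 5]` have a spread of 2.0 and select the statistical prediction, while `[4, 4, 4, 4]` have a spread of 0 and select the rough-set prediction.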
- Claim:
2. The recommendation method according to claim 1, wherein, in step (d), a Pearson correlation coefficient is used in said clustering algorithm to divide users into said user clusters based on similarities of said users to said target user.
- Claim:
3. The recommendation method according to claim 1, wherein said clustering algorithm is a K-means algorithm.
- Claim:
4. The recommendation method according to claim 1, wherein, in step (f), similarities of centers of said user clusters to said target user on rating logs are obtained through a distance formula of Pearson correlation coefficient to obtain a plurality of said user clusters having a nearest distance to said target user.
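A rough Python sketch of the cluster selection in claim 4 follows. It uses the conventional Pearson correlation coefficient computed over co-rated items; the patent's own distance formula is not reproduced in this record, so treating "highest correlation" as "nearest distance" is an assumption, as are the names `pearson` and `nearest_clusters`.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two rating dicts (item -> rating) over co-rated items."""
    common = [item for item in a if item in b]
    n = len(common)
    if n < 2:
        return 0.0  # assumption: too little overlap counts as no correlation
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant ratings: correlation is undefined, return 0
    return cov / sqrt(var_a * var_b)


def nearest_clusters(target, centers, k=1):
    """Return ids of the k cluster centers most correlated with the target user."""
    ranked = sorted(centers, key=lambda cid: pearson(target, centers[cid]), reverse=True)
    return ranked[:k]
```

Here `centers` maps a cluster id to the rating profile of that cluster's center, in the same item-to-rating form as the target user's rating log.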
- Claim:
5. The recommendation method according to claim 1, wherein, in step (g), a distance formula of Pearson correlation coefficient is used to obtain a similarity of each item in said sub-matrix to said target item and said distance formula is as follows: [mathematical expression included]
- Claim:
6. The recommendation method according to claim 1, wherein said user cardinality constraint is used to limit a size of said lower approximation.
- Claim:
7. The recommendation method according to claim 1, wherein said item cardinality constraint is used to limit a number of said item sets on building said second elementary sets.
- Claim:
8. The recommendation method according to claim 1, wherein step (h) further comprises steps of: (h1) obtaining said sub-matrix built in step (g); (h2) obtaining similarities of items in said sub-matrix to said target item and obtaining an item having the highest similarity as a class item; (h3) obtaining similarities of items in said sub-matrix other than said class item to said target item to obtain a referred item by setting an item cardinality constraint in said rough-set algorithm and to obtain item sets most related to said target item through sorting; (h4) based on said class item, dividing said users into user clusters to obtain a plurality of first elementary sets and, based on said item sets, dividing said users other than said target user having rating logs of equivalent class into user clusters to obtain a plurality of second elementary sets; (h5) comparing said first elementary sets and said second elementary sets to obtain items completely included in both groups of sets as a lower approximation; (h6) obtaining related items in the lower approximation by setting a user cardinality constraint in said rough-set algorithm, wherein the method returns to step (h3) when the number of said related items in the lower approximation is bigger than said user cardinality constraint; and (h7) based on said lower approximation, obtaining predicted rating value of said target item of said target user while rating value of said target item is similar to rating value of said item set.
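Steps (h4) and (h5) correspond to the standard rough-set notion of a lower approximation, which can be sketched in Python as below. This is a simplified illustration only: it omits the item and user cardinality constraints and the return to step (h3), and the names `partition` and `lower_approximation` and the data layout are hypothetical.

```python
from collections import defaultdict


def partition(ratings, items):
    """Group users into elementary sets: users agreeing on every item in `items`."""
    groups = defaultdict(set)
    for user, row in ratings.items():
        groups[tuple(row[i] for i in items)].add(user)
    return list(groups.values())


def lower_approximation(target_class, condition_sets):
    """Union of the condition elementary sets lying entirely inside target_class."""
    result = set()
    for block in condition_sets:
        if block <= target_class:  # block is a subset of the target class
            result |= block
    return result


# Toy sub-matrix: "class" is the class item, "i1" and "i2" form the item set.
ratings = {
    "u1": {"class": 5, "i1": 4, "i2": 3},
    "u2": {"class": 5, "i1": 4, "i2": 3},
    "u3": {"class": 2, "i1": 1, "i2": 3},
    "u4": {"class": 5, "i1": 1, "i2": 3},
}
first_sets = partition(ratings, ["class"])      # first elementary sets
second_sets = partition(ratings, ["i1", "i2"])  # second elementary sets
lower = lower_approximation({"u1", "u2", "u4"}, second_sets)
```

In this toy data, u3 and u4 agree on the item set but disagree on the class item, so their elementary set is excluded and only u1 and u2 remain in the lower approximation of the class rated 5.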
- Claim:
9. The recommendation method according to claim 1, wherein said standard deviation used in said switch-based mixing has a formula as follows: [mathematical expression included]
- Claim:
10. A recommendation system using rough-set and multiple features mining integrally, comprising: a user-and-item module, said user-and-item module providing user profiles, user rating logs and item contents; a data integration module, said data integration module receiving said user profiles, said user rating logs and said item contents from said user-and-item module to be pre-processed to obtain a transaction table by integrating said user rating logs with said user profiles and said item contents; an association mining module, said association mining module receiving said transaction table from said data integration module to obtain associations in said transaction table, wherein said associations are saved as association rules in an association rule database; a user clustering module, said user clustering module receiving said user rating logs to divide users into user clusters and saving said user clusters in a user cluster database; a statistical analysis prediction (SAP) module, said SAP module receiving said transaction table to process a statistical analysis to each transaction to be summarized by category to obtain a rating table of each re-symbolized item category and obtaining predicted rating values of said item categories of each user according to said rating table; a user-cluster selection module, said user-cluster selection module obtaining a user cluster of related users other than said target user from said user clusters in said user cluster database based on rating logs of said target user and obtaining a rating table of all items of said target user and said related users; a data matrix module, said data matrix module receiving said rating table to predict unknown values of items other than that of said target item of said target user according to said association rules in said association rule database to obtain a complete sub-matrix; a rough-set prediction (RSP) module, said RSP module receiving said sub-matrix to compare first elementary sets and second elementary sets to obtain a lower approximation to further obtain a predicted rating value of said target item of said target user, wherein said first elementary sets are obtained through dividing said sub-matrix into equivalent classes with a class item; and wherein said second elementary sets are obtained through dividing said sub-matrix into equivalent classes by item sets; and a deviation decision module, said deviation decision module receiving said predicted rating value from said SAP module and said predicted rating value from said RSP module to dynamically obtain a final predicted rating value by setting a threshold.
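The rating table built by the SAP module can be pictured with a small Python sketch. It assumes the item-to-category mapping is already available (the claim does not say how items are re-symbolized) and that the table holds each user's average rating per category; the function name and the data layout are hypothetical.

```python
from collections import defaultdict


def category_rating_table(rating_log, item_category):
    """Return {user: {category: average rating}} from (user, item, rating) records."""
    collected = defaultdict(lambda: defaultdict(list))
    for user, item, rating in rating_log:
        # Re-symbolize the item as its category (mapping assumed to be given).
        collected[user][item_category[item]].append(rating)
    return {
        user: {cat: sum(vals) / len(vals) for cat, vals in cats.items()}
        for user, cats in collected.items()
    }


# Toy rating log and category mapping, for illustration only.
log = [("u1", "m1", 4), ("u1", "m2", 2), ("u1", "m3", 5), ("u2", "m1", 3)]
categories = {"m1": "action", "m2": "action", "m3": "drama"}
table = category_rating_table(log, categories)
```

Under these assumptions, a user's category average could serve as the SAP module's predicted rating for an unrated item in that category, which the deviation decision module then weighs against the RSP module's prediction.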
- Current U.S. Class:
706/12
- Current International Class:
06; 06
- Accession Number:
edspap.20110178964