نبذة مختصرة : Integrating or linking data from different sources is an increasingly important task in the preprocessing stage of many data mining projects. The aim of such linkages is to merge all records relating to the same entity, such as a patient or a customer. If no common unique entity identifiers (keys) are available in all data sources, the linkage needs to be performed using the available identifying attributes, like names and addresses. Data confidentiality often limits or even prohibits successful data linkage, as either no consent can be gained (for example in biomedical studies) or the data holders are not willing to release their data for linkage by other parties. We present methods for confidential data linkage based on hash encoding, public key encryption and n-gram similarity comparison techniques, and show how blind data linkage can be performed.
No Comments.