Anonymization of Sensitive Quasi-Identifiers for l-Diversity and t-Closeness

ABSTRACT:

A number of studies on privacy-preserving data mining have been proposed. Most of them assume that they can separate quasi-identifiers (QIDs) from sensitive attributes. For instance, they assume that address, job, and age are QIDs but are not sensitive attributes and that a disease name is a sensitive attribute but is not a QID. However, all of these attributes can have features that are both sensitive attributes and QIDs in practice. In this paper, we refer to these attributes as sensitive QIDs and we propose novel privacy models, namely, (l1, …, lq)-diversity and (t1, …, tq)-closeness, and a method that can treat sensitive QIDs. Our method is composed of two algorithms: an anonymization algorithm and a reconstruction algorithm. The anonymization algorithm, which is conducted by data holders, is simple but effective, whereas the reconstruction algorithm, which is conducted by data analyzers, can be conducted according to each data analyzer’s objective. Our proposed method was experimentally evaluated using real data sets.
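
To illustrate the t-closeness side of these models, the sketch below checks one equivalence class against the whole table for a single categorical sensitive attribute, using the equal-distance (variational) form of the Earth Mover's Distance. It is an illustrative sketch only, not the authors' implementation; the class name, sample values, and threshold are assumptions.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a t-closeness check for one categorical sensitive
// attribute, using the equal-distance (variational) form of the Earth Mover's
// Distance. Class name, sample values, and threshold are assumptions.
public class TClosenessCheck {

    // Relative frequency of each value in the given column.
    static Map<String, Double> distribution(List<String> values) {
        Map<String, Double> dist = new HashMap<>();
        for (String v : values) {
            dist.merge(v, 1.0, Double::sum);
        }
        dist.replaceAll((value, count) -> count / values.size());
        return dist;
    }

    // An equivalence class satisfies t-closeness if the distance between its
    // sensitive-value distribution and the whole table's distribution is <= t.
    static boolean satisfiesTCloseness(List<String> classValues,
                                       List<String> tableValues, double t) {
        Map<String, Double> p = distribution(classValues);
        Map<String, Double> q = distribution(tableValues);
        double l1 = 0.0;
        for (String value : q.keySet()) {   // class values are drawn from the table
            l1 += Math.abs(p.getOrDefault(value, 0.0) - q.get(value));
        }
        return 0.5 * l1 <= t;               // variational distance
    }

    public static void main(String[] args) {
        List<String> table = List.of("Flu", "Flu", "Cancer", "Flu", "HIV", "Flu");
        List<String> group = List.of("Flu", "Cancer", "Flu");
        System.out.println(satisfiesTCloseness(group, table, 0.2));  // true
    }
}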

EXISTING SYSTEM:

Many studies on anonymizing databases of personal information have been proposed. Most existing methods assume that the data holder has a database whose attributes fall into explicit identifiers, quasi-identifiers (QIDs), and sensitive attributes, where explicit identifiers are attributes that explicitly identify individuals (e.g., name), QIDs are attributes that could potentially be combined with external data sources to identify individuals (e.g., zip code and age), and sensitive attributes are personal attributes of a private nature (e.g., disease and salary).
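
As a minimal illustration of this split, a record might be modeled as below. The field and attribute names are hypothetical and the representation is not tied to any particular system.

import java.util.Map;

// Minimal sketch with hypothetical field names: one record of a personal-data
// table split into explicit identifiers, QIDs, and sensitive attributes.
public class PersonRecord {
    String name;                    // explicit identifier: removed before release
    Map<String, String> quasiIds;   // QIDs, e.g., zip code, age, job
    Map<String, String> sensitive;  // sensitive attributes, e.g., disease, salary

    PersonRecord(String name, Map<String, String> quasiIds,
                 Map<String, String> sensitive) {
        this.name = name;
        this.quasiIds = quasiIds;
        this.sensitive = sensitive;
    }

    public static void main(String[] args) {
        PersonRecord r = new PersonRecord(
                "Alice",
                Map.of("zip", "13210", "age", "28", "job", "engineer"),
                Map.of("disease", "Flu"));
        // The data holder drops the explicit identifier and anonymizes the QIDs;
        // the sensitive attributes are what l-diversity and t-closeness protect.
        System.out.println(r.quasiIds + " -> " + r.sensitive);
    }
}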

DISADVANTAGES OF EXISTING SYSTEM:

The values of Disease are protected by frequency l-diversity, but other attributes are not protected. In practice, the age, address, and job of a person might also be considered private information; in this case, these attributes have features of both QIDs and sensitive attributes. Moreover, k-anonymity cannot protect against “attribute disclosure.”
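
As an illustration of the frequency l-diversity condition mentioned above, the sketch below checks that no single sensitive value accounts for more than a 1/l share of an equivalence class. This is an illustrative reading of the definition, not the authors' code; the class name and sample values are hypothetical.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative reading of frequency l-diversity: an equivalence class passes
// if no single sensitive value accounts for more than a 1/l share of its records.
public class FrequencyLDiversity {

    static boolean satisfies(List<String> sensitiveValues, int l) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : sensitiveValues) {
            counts.merge(v, 1, Integer::sum);
        }
        int max = counts.values().stream().max(Integer::compare).orElse(0);
        // The most frequent value must not exceed a 1/l share of the class.
        return max * l <= sensitiveValues.size();
    }

    public static void main(String[] args) {
        // "Flu" has a 2/3 share, which exceeds 1/3: the class fails 3-diversity.
        System.out.println(satisfies(List.of("Flu", "Flu", "Cancer"), 3)); // false
        System.out.println(satisfies(List.of("Flu", "Cancer", "HIV"), 3)); // true
    }
}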

PROPOSED SYSTEM:

Our contributions are as follows: (1) we propose new privacy models, namely, (l1, …, lq)-diversity and (t1, …, tq)-closeness, which can treat databases containing several sensitive QIDs; (2) we propose a simple but effective general anonymization algorithm for (l1, …, lq)-diversity and (t1, …, tq)-closeness, which is conducted by data holders; and (3) we propose a novel reconstruction algorithm that can decrease the reconstruction error between the reconstructed and the original values according to each data analyzer’s purpose.
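
To illustrate the per-attribute idea behind (l1, …, lq)-diversity, the following sketch checks each sensitive QID against its own diversity level within one group. It is an illustrative reading of the model, not the authors' anonymization algorithm; the attribute names and levels are hypothetical.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of an (l1, ..., lq)-diversity check over q sensitive QIDs: within one
// group, every sensitive QID must satisfy its own frequency lj-diversity level.
public class MultiAttributeDiversity {

    // Frequency l-diversity for one attribute: no value exceeds a 1/l share.
    static boolean frequencyLDiverse(List<String> values, int l) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        int max = counts.values().stream().max(Integer::compare).orElse(0);
        return max * l <= values.size();
    }

    // columns: sensitive QID name -> its values within one group
    // levels:  sensitive QID name -> required level lj for that attribute
    static boolean satisfiesAll(Map<String, List<String>> columns,
                                Map<String, Integer> levels) {
        for (Map.Entry<String, Integer> e : levels.entrySet()) {
            if (!frequencyLDiverse(columns.get(e.getKey()), e.getValue())) {
                return false;  // this sensitive QID violates its required level
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> group = Map.of(
                "disease", List.of("Flu", "Cancer", "HIV", "Flu"),
                "job", List.of("teacher", "teacher", "engineer", "nurse"));
        Map<String, Integer> levels = Map.of("disease", 2, "job", 2);
        System.out.println(satisfiesAll(group, levels));  // true
    }
}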

ADVANTAGES OF PROPOSED SYSTEM:

Our approach can reduce information loss even when the number of attributes is large, because the randomization of attribute values is executed for each attribute independently. Furthermore, we can ensure that, even if the data analyzer knows all of the sensitive QID values in the database except for one record, the sensitive QID values of that record remain protected.
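
The sketch below illustrates the kind of independent per-attribute randomization and frequency reconstruction described above, using a generic retain-or-replace (randomized-response-style) scheme. It is not the authors' exact mechanism; the retain probability p and the domain values are assumptions.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.stream.Collectors;

// Generic retain-or-replace sketch: each sensitive QID is randomized
// independently by the data holder, and the data analyzer later inverts the
// randomization to estimate value frequencies.
public class PerAttributeRandomization {

    static final Random RNG = new Random();

    // Data-holder side: keep the true value with probability p, otherwise
    // publish a value drawn uniformly from the attribute's domain.
    static String randomize(String trueValue, List<String> domain, double p) {
        if (RNG.nextDouble() < p) {
            return trueValue;
        }
        return domain.get(RNG.nextInt(domain.size()));
    }

    // Analyzer side: unbiased estimate of the original frequency of each value
    // from the published (randomized) column. Estimates are noisy for small n.
    static Map<String, Double> reconstructFrequencies(List<String> published,
                                                      List<String> domain, double p) {
        Map<String, Double> observed = new HashMap<>();
        for (String v : published) {
            observed.merge(v, 1.0, Double::sum);
        }
        int n = published.size();
        double noise = (1.0 - p) / domain.size();  // uniform replacement mass
        Map<String, Double> estimated = new HashMap<>();
        for (String v : domain) {
            double observedFreq = observed.getOrDefault(v, 0.0) / n;
            estimated.put(v, (observedFreq - noise) / p);
        }
        return estimated;
    }

    public static void main(String[] args) {
        List<String> domain = List.of("Flu", "Cancer", "HIV");
        List<String> original = List.of("Flu", "Flu", "Cancer", "Flu", "HIV", "Flu");
        double p = 0.6;
        List<String> published = original.stream()
                .map(v -> randomize(v, domain, p))
                .collect(Collectors.toList());
        System.out.println(reconstructFrequencies(published, domain, p));
    }
}

Because every published value may have been replaced at random, knowing all other records does not reveal the remaining record's true values, while aggregate frequencies can still be reconstructed by the analyzer.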

The dominant approach to anonymizing databases for l-diversity and t-closeness is based on generalization. The generalization approach is easily understandable for data analyzers. Moreover, because the truthfulness of each record is preserved, some information can still be obtained from each record.
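
For contrast, here is a minimal sketch of generalization on two common QIDs (zip code and age). The generalization hierarchies are hypothetical and not taken from the paper.

// Minimal sketch of attribute generalization with hypothetical hierarchies:
// QID values are replaced with coarser but still truthful values so that
// records fall into larger, less identifying equivalence classes.
public class Generalization {

    // Generalize a zip code by masking its last `level` digits.
    static String generalizeZip(String zip, int level) {
        int keep = Math.max(0, zip.length() - level);
        return zip.substring(0, keep) + "*".repeat(Math.min(level, zip.length()));
    }

    // Generalize an exact age into a range of the given width.
    static String generalizeAge(int age, int width) {
        int lower = (age / width) * width;
        return lower + "-" + (lower + width - 1);
    }

    public static void main(String[] args) {
        System.out.println(generalizeZip("13210", 2));  // 132**
        System.out.println(generalizeAge(28, 10));      // 20-29
    }
}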

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS: 

·         System : Pentium Dual Core

·         Hard Disk : 120 GB

·         Monitor : 15" LED

·         Input Devices : Keyboard, Mouse

·         RAM : 1 GB

SOFTWARE REQUIREMENTS: 

·         Operating System : Windows 7

·         Coding Language : Java/J2EE

·         Tool : NetBeans 7.2.1

·         Database : MySQL

REFERENCE:

Yuichi Sei, Hiroshi Okumura, Takao Takenouchi, and Akihiko Ohsuga, “Anonymization of Sensitive Quasi-Identifiers for l-Diversity and t-Closeness,” IEEE Transactions on Dependable and Secure Computing, 2017.