ABSTRACT:
A number of studies on privacy-preserving data mining have been
proposed. Most of them assume that they can separate quasi-identifiers (QIDs)
from sensitive attributes. For instance, they assume that address, job, and age
are QIDs but are not sensitive attributes and that a disease name is a
sensitive attribute but is not a QID. However, in practice, any of
these attributes can have the features of both a sensitive attribute and a QID.
In this paper, we refer to such attributes as sensitive QIDs, and
we propose novel privacy models, namely, (l1, …, lq)-diversity and (t1, …, tq)-closeness,
and a method that can treat sensitive QIDs. Our method is composed of two
algorithms: an anonymization algorithm and a
reconstruction algorithm. The anonymization
algorithm, which is conducted by data holders, is simple but effective, whereas
the reconstruction algorithm, which is conducted by data analyzers, can be
conducted according to each data analyzer’s objective. Our proposed method was
experimentally evaluated using real data sets.
EXISTING SYSTEM:
Many studies regarding anonymized databases of personal information have been
proposed. Most existing methods assume that the data holder has a database
whose attributes are divided into explicit identifiers, quasi-identifiers
(QIDs), and sensitive attributes, where explicit identifiers are attributes
that explicitly identify individuals (e.g., name), QIDs are attributes that
could potentially be combined with other data sources to identify individuals
(e.g., zip code and age), and sensitive attributes are personal attributes of
a private nature (e.g., disease and salary).
DISADVANTAGES OF EXISTING SYSTEM:
The values of Disease are protected by frequency l-diversity, but the other
attributes are not protected. In practice, the age, address, and job of a
person might also be considered private information. In this case, we should
consider that these attributes have the features of both QIDs and sensitive
attributes. Moreover, k-anonymity cannot protect against “attribute
disclosure.”
PROPOSED SYSTEM:
Our contributions are as follows: (1) we propose new privacy models, namely,
(l1, …, lq)-diversity and (t1, …, tq)-closeness, which can treat databases
containing several sensitive QIDs; (2) we propose a simple but effective
general anonymization algorithm for (l1, …, lq)-diversity and
(t1, …, tq)-closeness, which is conducted by data holders; and (3) we propose
a novel reconstruction algorithm that can decrease the reconstruction error
between the reconstructed and the original values according to each data
analyzer’s purpose.
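The idea behind (l1, …, lq)-diversity can be illustrated with a small check: for a group of records that share the same generalized QIDs, the i-th sensitive QID column must contain at least l_i distinct values. The sketch below, in the project's coding language, is only an illustration; the class and method names, and the example attribute values, are hypothetical and not taken from the paper.

```java
import java.util.*;

// Hypothetical sketch: check whether one group of records (an equivalence
// class sharing the same generalized QIDs) satisfies (l1, ..., lq)-diversity,
// i.e., the i-th sensitive QID column has at least l[i] distinct values.
public class DiversityCheck {
    static boolean satisfiesDiversity(List<String[]> group, int[] l) {
        for (int i = 0; i < l.length; i++) {
            Set<String> distinct = new HashSet<>();
            for (String[] record : group) {
                distinct.add(record[i]); // collect values of column i
            }
            if (distinct.size() < l[i]) {
                return false; // column i is not l_i-diverse
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two sensitive QID columns: generalized age range and disease.
        List<String[]> group = Arrays.asList(
            new String[]{"20-29", "Flu"},
            new String[]{"30-39", "Cancer"},
            new String[]{"20-29", "Flu"});
        System.out.println(satisfiesDiversity(group, new int[]{2, 2})); // true
        System.out.println(satisfiesDiversity(group, new int[]{2, 3})); // false
    }
}
```

An anonymization algorithm would generalize or randomize values until every group passes a check of this form for the chosen thresholds l1, …, lq.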
ADVANTAGES OF PROPOSED SYSTEM:
Our approach can reduce information loss, even when the number of attributes
is large, because the randomization of attribute values is executed for each
attribute independently. Furthermore, we can ensure that, even if the data
analyzer knows all the sensitive QID values of the whole database except for
one record, that record’s sensitive QID values remain protected.
The dominant approach to anonymizing databases for l-diversity and
t-closeness is based on generalization. The generalization approach is easily
understandable for data analyzers. Moreover, because the truthfulness of each
record is preserved, we can obtain some information from each record.
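Per-attribute randomization, as mentioned above, can be sketched as follows: each attribute of a record is perturbed independently, keeping the true value with some probability p and otherwise replacing it with a value drawn uniformly from that attribute's domain. This is only an illustrative sketch; the class name, the choice of p, and the example domains are assumptions, not details from the paper.

```java
import java.util.*;

// Hypothetical sketch: randomize each sensitive QID independently.
// With probability p the true value is kept; otherwise a value is drawn
// uniformly from that attribute's domain.
public class PerAttributeRandomizer {
    static String[] randomize(String[] record, String[][] domains,
                              double p, Random rnd) {
        String[] out = new String[record.length];
        for (int i = 0; i < record.length; i++) {
            if (rnd.nextDouble() < p) {
                out[i] = record[i]; // keep the true value
            } else {
                String[] dom = domains[i];
                out[i] = dom[rnd.nextInt(dom.length)]; // uniform replacement
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] domains = {
            {"20-29", "30-39", "40-49"},   // age range domain
            {"Flu", "Cancer", "Diabetes"}  // disease domain
        };
        String[] record = {"30-39", "Flu"};
        String[] noisy = randomize(record, domains, 0.8, new Random());
        System.out.println(Arrays.toString(noisy));
    }
}
```

Because each attribute is perturbed on its own, the information loss does not compound across attributes the way a single joint generalization would, which is the advantage claimed for large attribute counts.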
SYSTEM ARCHITECTURE:
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
· System : Pentium Dual Core
· Hard Disk : 120 GB
· Monitor : 15’’ LED
· Input Devices : Keyboard, Mouse
· RAM : 1 GB
SOFTWARE REQUIREMENTS:
· Operating System : Windows 7
· Coding Language : JAVA/J2EE
· Tool : NetBeans 7.2.1
· Database : MySQL
REFERENCE:
Yuichi Sei, Hiroshi Okumura, Takao Takenouchi, and Akihiko Ohsuga,
“Anonymization of Sensitive Quasi-Identifiers for l-Diversity and
t-Closeness,” IEEE Transactions on Dependable and Secure Computing, 2017.