APPLICATION OF COMPUTER SIMULATION TO THE TASK OF PERSONAL DATA DEPERSONALIZATION. MODEL AND ALGORITHM FOR DEPERSONALIZATION BY SYNTHESIS
- Authors: Borisov S.A.1, Bosov A.A.1, Ivanov D.E.1
- Affiliations:
- 1 Federal Research Center "Computer Science and Control" RAS
- Issue: No 5 (2023)
- Pages: 19-34
- Section: DATA ANALYSIS
- URL: https://journals.rcsi.science/0132-3474/article/view/141779
- DOI: https://doi.org/10.31857/S0132347423050023
- EDN: https://elibrary.ru/ZXUVBM
- ID: 141779
Abstract
This paper presents the second part of a study on automated depersonalization of personal data. The earlier review and analysis of research prospects is supplemented here by a practical result. A model of the depersonalization process is proposed that reduces the task of ensuring the anonymity of personal data to manipulating samples of different types of random elements. Accordingly, the key idea for transforming data to ensure anonymity while preserving utility is to apply the synthesis method, i.e., complete replacement of all unpublished data with synthetic values. The proposed model identifies a set of element types for which synthesis templates are proposed. Together, these templates make up the algorithm for depersonalization by synthesis. Methodologically, each template is based on a standard statistical tool: frequency estimates of probabilities, Rosenblatt-Parzen kernel density estimates, or statistical means and covariances. The application of the algorithm is illustrated by a simple example from civil air transportation.
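To make the synthesis idea concrete, the following is a minimal sketch, not the authors' algorithm. It assumes two hypothetical element types: a categorical attribute synthesized from frequency estimates of probabilities, and a continuous attribute synthesized by sampling from a Gaussian-kernel Rosenblatt-Parzen density estimate. The function names, the toy records, and the normal-reference bandwidth rule are all assumptions made for illustration; the templates based on means and covariances are not covered.

```python
import random
import statistics
from collections import Counter


def synthesize_categorical(values, n, rng):
    """Draw n synthetic labels according to the empirical frequency estimates."""
    counts = Counter(values)
    labels = list(counts)
    weights = [counts[label] for label in labels]  # proportional to frequencies
    return rng.choices(labels, weights=weights, k=n)


def rule_of_thumb_bandwidth(values):
    """Normal-reference bandwidth for a Gaussian kernel (an assumed choice)."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)


def synthesize_continuous(values, n, rng, bandwidth=None):
    """Sample n values from a Gaussian kernel density estimate of the data.

    Sampling from a kernel density estimate amounts to picking an observed
    point uniformly at random and adding kernel-distributed noise to it.
    """
    h = rule_of_thumb_bandwidth(values) if bandwidth is None else bandwidth
    return [rng.choice(values) + rng.gauss(0.0, h) for _ in range(n)]


rng = random.Random(0)

# Hypothetical toy records, loosely in the spirit of an air-travel example.
cabin_class = ["economy", "economy", "business", "economy", "first", "business"]
ticket_price = [120.0, 135.5, 410.0, 99.9, 980.0, 450.0]

synthetic_class = synthesize_categorical(cabin_class, 1000, rng)
synthetic_price = synthesize_continuous(ticket_price, 1000, rng)
```

In this sketch every synthetic label comes from the observed label set, and each synthetic continuous value is a perturbed copy of an observed one. The bandwidth therefore controls the trade-off between utility (closeness to the original distribution) and the distance of synthetic values from real records.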
About the authors
S. A. Borisov
Federal Research Center "Computer Science and Control" RAS
Author for correspondence.
Email: aborisov@ipiran.ru
Russia, Moscow
A. A. Bosov
Federal Research Center "Computer Science and Control" RAS
Author for correspondence.
Email: avbosov@ipiran.ru
Russia, Moscow
D. E. Ivanov
Federal Research Center "Computer Science and Control" RAS
Author for correspondence.
Email: aivanov@ipiran.ru
Russia, Moscow
References
- Borisov A.V., Bosov A.V., Ivanov A.V. Application of computer simulation to the task of personal data depersonalization. State of the art and key points // Programmirovanie, 2023. № 4. P. 58–74.
- Aggarwal C.C., Yu P.S. On Privacy-Preservation of Text and Sparse Binary Data with Sketches // SIAM Conference on Data Mining, 2007.
- Sweeney L. k-anonymity: a model for protecting privacy // International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002. V. 10. № 5. P. 557–570.
- Samarati P., Sweeney L. Generalizing Data to Provide Anonymity when Disclosing Information (Abstract) // Proc. of ACM Symposium on Principles of Database Systems, 1998. P. 188.
- Samarati P. Protecting Respondents’ Identities in Microdata Release // IEEE Trans. Knowl. Data Eng., 2001. V. 13. № 6. P. 1010–1027.
- Bayardo R.J., Agrawal R. Data Privacy through Optimal k-Anonymization // Proceedings of the ICDE Conference, 2005. P. 217–228.
- Fung B., Wang K., Yu P. Top-Down Specialization for Information and Privacy Preservation // ICDE Conference, 2005.
- Wang K., Yu P., Chakraborty S. Bottom-Up Generalization: A Data Mining Solution to Privacy Protection // ICDM Conference, 2004.
- Domingo-Ferrer J., Mateo-Sanz J. Practical data-oriented micro-aggregation for statistical disclosure control // IEEE TKDE, 2002. V. 14. № 1.
- Winkler W. Using simulated annealing for k-anonymity // Technical Report 7, US Census Bureau, Washington D.C. 20233, 2002.
- Iyengar V.S. Transforming Data to Satisfy Privacy Constraints // KDD Conference, 2002.
- Lakshmanan L., Ng R., Ramesh G. To Do or Not To Do: The Dilemma of Disclosing Anonymized Data // ACM SIGMOD Conference, 2005.
- Aggarwal C.C., Yu P.S. On Variable Constraints in Privacy-Preserving Data Mining // SIAM Conference, 2005.
- Aggarwal C.C. On k-anonymity and the curse of dimensionality // VLDB Conference, 2005.
- Machanavajjhala A., Gehrke J., Kifer D., Venkitasubramaniam M. L-Diversity: Privacy Beyond k-Anonymity // ICDE Conference, 2006.
- Rosenblatt M. Remarks on Some Nonparametric Estimates of a Density Function // Ann. Math. Statist., 1956. V. 27. № 3. P. 832–837.
- Parzen E. On Estimation of a Probability Density Function and Mode // Ann. Math. Statist., 1962. V. 33. № 3. P. 1065–1076.
- Silverman B.W. Density Estimation for Statistics and Data Analysis. London: Chapman & Hall/CRC, 1986.
- Kullback S., Leibler R.A. On information and sufficiency // Ann. Math. Statist., 1951. V. 22. № 1. P. 79–86.