Data Shuffling: A New Masking Approach for Numerical Data
Krishnamurty Muralidhar,
Rathindra Sarathy
Gatton College of Business and Economics, University of Kentucky, Lexington, Kentucky 40506
Spears School of Business, Oklahoma State University, Stillwater, Oklahoma 74078
krishm{at}uky.edu
sarathy{at}okstate.edu
This study discusses a new procedure for masking confidential numerical data, a procedure called data shuffling, in which the values of the confidential variables are "shuffled" among observations. The shuffled data provides a high level of data utility and minimizes the risk of disclosure. From a practical perspective, data shuffling overcomes reservations about using perturbed or modified confidential data because it retains all the desirable properties of perturbation methods and performs better than other masking techniques in both data utility and disclosure risk. In addition, data shuffling can be implemented using only rank-order data, and thus provides a nonparametric method for masking. We illustrate the applicability of data shuffling for small and large data sets.
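To make the rank-order idea concrete, the following is a minimal sketch of one way a rank-based reassignment could look, not the procedure as specified in the paper. It assumes that a set of perturbed values is already available for each record; here a simple additive-noise stand-in is used purely for illustration, and the function name `rank_shuffle` and the sample data are hypothetical. Each record receives the original value whose rank matches the rank of that record's perturbed value, so the released values are exactly the original values, reassigned among observations.

```python
import random


def rank_shuffle(original, perturbed):
    """Reassign the original values so that their ranks follow the
    ranks of the perturbed values (hypothetical helper, illustration only)."""
    if len(original) != len(perturbed):
        raise ValueError("inputs must have the same length")
    # Original values in ascending order (the order statistics).
    sorted_original = sorted(original)
    # Record indices ordered by perturbed value; sorted() is stable for ties.
    order = sorted(range(len(perturbed)), key=lambda i: perturbed[i])
    shuffled = [None] * len(original)
    for rank, record in enumerate(order):
        # The record holding the rank-th smallest perturbed value
        # receives the rank-th smallest original value.
        shuffled[record] = sorted_original[rank]
    return shuffled


# Hypothetical confidential values, for illustration only.
salaries = [52.0, 38.5, 91.0, 47.25, 60.0]

# Assumption: plain additive noise stands in for whatever perturbation
# model would generate the perturbed values in practice.
rng = random.Random(0)
noisy = [v + rng.gauss(0, 15) for v in salaries]

masked = rank_shuffle(salaries, noisy)
```

Because only the ordering of the perturbed values is used, the sketch never depends on their magnitudes, and the marginal distribution of the released values is identical to that of the originals.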
Key Words: camouflage; confidentiality; data masking; data swapping; obfuscation; privacy; perturbation
History: Received: August 26, 2004;
Copyright © 2006 by INFORMS.