Estimation of treatment effects from combined data: identification versus data security

Citation

Komarova, Tatiana; Nekipelov, Denis; & Yakovlev, Evgeny (2013). Estimation of treatment effects from combined data: identification versus data security.

Abstract

The security of sensitive individual data is a subject of indisputable importance. One of the major threats to sensitive data arises when one can link sensitive information and publicly available data. In this paper we demonstrate that even if the sensitive data are never publicly released, the point estimates from the empirical model estimated from the combined public and sensitive data may lead to a disclosure of individual information. Our theory builds on the work in Komarova, Nekipelov and Yakovlev (2011) where we analyze the individual disclosure that arises from the releases of marginal empirical distributions of individual data. The disclosure threat in that case is posed by the possibility of a linkage between the released marginal distributions. In this paper, we analyze a different type of disclosure. Namely, we use the notion of the risk of statistical partial disclosure to measure the threat from the inference on sensitive individual attributes from the released empirical model that uses the data combined from the public and private sources. As our main example we consider a treatment effect model in which the treatment status of an individual constitutes sensitive information.

URL

http://www.nber.org/chapters/c12998.pdf

Reference Type

Conference Paper

Year Published

2013

Author(s)

Komarova, Tatiana
Nekipelov, Denis
Yakovlev, Evgeny