Synthetic data is artificial data generated by statistical and machine learning algorithms. It is more than masked or de-identified real data. The artificial data is carefully checked to make sure that it is statistically similar to the real data, while the individual data points are different. Because the records in a synthetic data set do not correspond to any real person, synthetic data is a great way to open access to health data without risking the exposure of private health information.
As part of the WA Health Data Linkage Strategy 2022 – 2024, the Department of Health used statistics-based and generative AI-based data engines to create three synthetic data sets. Extensive post-processing was applied to ensure that these synthetic data sets are representative of the real data.
The representative synthetic data sets that the Department of Health will be releasing for the WA Health Hackathon 2023 have passed all quality checks on how well they represent the real data sets.
There is no private information about any WA patient in these synthetic data sets. The Department of Health has encrypted any potentially personal information. For example, the field ‘person_ID’ in the synthetic data does not correspond to any ‘person_ID’ that you may have seen in data sets elsewhere, so the records in the synthetic data sets do not match any real patients. Additionally, to protect institution-level privacy, attributes like hospital IDs were also encrypted.
Synthetic data sets are important for health data for two reasons:
However, to ensure the privacy of patients, we have excluded rare diseases and any occurrence that was unusually infrequent. Because these rare diseases do not occur frequently, the synthesised data sets will still be very valuable for data analysis and the development of new algorithms and applications.
No. There is nothing you need to change in how you conduct your analysis. However, please keep in mind that the data sets being released for the WA Health Hackathon 2023 do not have a linking key between them. They are separate data sets and cannot be joined to one another.
We would like to acknowledge and pay respect to the traditional owners of the land on which the WADSIH office is located, the Whadjuk people of the Nyungar Nation.