Anonymisation using the example of the DZHW graduate panel

Using the example of the DZHW Graduate Panel 2009, we show which anonymisation measures we use and how they relate to the different modes of access.

Procedure

In a first step, all direct identifiers (such as names or address data) are deleted. Afterwards, quasi-identifiers, also called indirect identifiers, are determined. These are characteristics such as place of residence, age, field of study or university that, taken together or combined with external data sources, could lead to the identification of a person. They are therefore either removed or changed to such an extent (e.g. by aggregation) that identification of the persons can be excluded. Finally, particularly sensitive information that cannot be disclosed is identified.
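The steps above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used for the panel: the column names, the `anonymise` function and the five-year age bands in `age_band` are all hypothetical and chosen only to make the three steps visible.

```python
import secrets

# Hypothetical column names, used only for illustration.
DIRECT_IDENTIFIERS = {"name", "address"}
SENSITIVE = {"health"}


def age_band(age):
    """Coarsen an exact age to a five-year band (an illustrative aggregation)."""
    lower = (age // 5) * 5
    return f"{lower}-{lower + 4}"


def anonymise(records, aggregators):
    """Return anonymised copies of the records.

    aggregators maps a quasi-identifier column to a function that
    coarsens its value.
    """
    result = []
    used_ids = set()
    for record in records:
        # Step 1: replace direct identifiers with a random, unique ID.
        new_id = secrets.token_hex(8)
        while new_id in used_ids:
            new_id = secrets.token_hex(8)
        used_ids.add(new_id)
        clean = {"id": new_id}
        for column, value in record.items():
            # Steps 1 and 3: drop direct identifiers and sensitive columns.
            if column in DIRECT_IDENTIFIERS or column in SENSITIVE:
                continue
            # Step 2: aggregate quasi-identifiers where a rule is given.
            if column in aggregators:
                value = aggregators[column](value)
            clean[column] = value
        result.append(clean)
    return result


sample = [{
    "name": "A. Example",
    "address": "Example Street 1",
    "age": 27,
    "field_of_study": "Materials Engineering",
    "health": "not disclosed",
}]
anonymised = anonymise(sample, {"age": age_band})
```

In this sketch, the resulting record keeps only a random `id`, the coarsened `age` and the unchanged `field_of_study`.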

Examples of anonymisation: DZHW Graduate Panel

The following table gives an overview of the characteristics that were either released, aggregated or deleted in the DZHW Graduate Panel 2009 within the framework of anonymisation, depending on the mode of access.

MEASURES OF STATISTICAL ANONYMISATION OF THE DATA OF THE DZHW GRADUATE PANEL 2009 BY MODE OF ACCESS

 
Characteristic | On-Site-SUF | Remote-Desktop-SUF | Download-SUF | Download-CUF (subsample)
Direct Identifiers | Deletion and assignment of a random ID | Deletion and assignment of a random ID | Deletion and assignment of a random ID | Deletion and assignment of a random ID
Receipt of Questionnaire | Release | Deletion | Deletion | Deletion
Field of Study | Release | Aggregation to study area | Aggregation to study area | Aggregation to subject groups
University | Aggregation to type of university and location of university to NUTS 2 | Aggregation to type of university and location of university to federal state | Aggregation to type of university and location of university to old/new federal state | Aggregation to type of university and location of university to old/new federal state
Further Academic Qualification (State) | Release | Release | Aggregation to Germany/abroad | Aggregation to Germany/abroad
Working Location (Federal State/abroad) | Release | Release | Aggregation to federal states/abroad | Aggregation to new/old federal states and abroad
Working Location (Postal Code) | Release | Aggregation to NUTS 2 | Aggregation to NUTS 2 | Deletion
(further) ... | | | |
Age | Release | Release | Release | Top-coding
Health Characteristics | Deletion | Deletion | Deletion |

The first column names the characteristic; the other columns give the anonymisation measure for each mode of access. The DZHW Graduate Panel 2009 is offered as a Scientific Use File (SUF) via all three modes of access and additionally as a Campus Use File (CUF) via the download mode of access. Release means that the characteristic is made available via the corresponding mode of access as it was collected and prepared. For example, via the on-site mode of access, data users can view the postal code of the place where the respondent obtained the entitlement to study. If, on the other hand, a characteristic is deleted, we do not disclose it via the respective mode of access. This is the case, for example, for the postal code of the place of residence in the download-CUF.
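One way to read the table is as a lookup from characteristic and mode of access to a measure. The sketch below encodes three rows of the table in that form; the identifiers (`MODES`, `MEASURES`, `measure_for`, `available_characteristics`) and the lowercase mode labels are hypothetical and serve only as an illustration.

```python
# Modes of access in the column order of the table (labels are hypothetical).
MODES = ("on-site-suf", "remote-desktop-suf", "download-suf", "download-cuf")

# Three rows taken from the table above, one measure per mode of access.
MEASURES = {
    "receipt_of_questionnaire": ("release", "deletion", "deletion", "deletion"),
    "working_location_postal_code": ("release", "aggregation", "aggregation", "deletion"),
    "age": ("release", "release", "release", "top-coding"),
}


def measure_for(characteristic, mode):
    """Look up the anonymisation measure for a characteristic and a mode."""
    return MEASURES[characteristic][MODES.index(mode)]


def available_characteristics(mode):
    """Characteristics disclosed in some form (i.e. not deleted) under a mode."""
    return sorted(name for name in MEASURES if measure_for(name, mode) != "deletion")
```

With these three rows, `available_characteristics("download-cuf")` contains only `age`, while the on-site mode of access exposes all three characteristics.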

 

THREE EXAMPLES OF ANONYMISATION MEASURES

Field of study

First, we look at the respondents' field of study, which has been aggregated to varying degrees and can thus be provided via different modes of access.

Characteristic | On-Site-SUF | Remote-Desktop-SUF | Download-SUF | Download-CUF (subsample)
Field of Study | Release | Aggregation to study area | Aggregation to study area | Aggregation to subject group

The on-site mode of access is the most technically controlled, which is why we release the field of study there without aggregation. For the remote desktop and download modes of access, which are less technically controlled, the field of study has been aggregated to study areas. For this purpose, we used the key index of the student and examination statistics for the winter semester 2008/09 and the summer semester 2009 from Destatis. For example, instead of the field of study Materials Engineering, the study area Mechanical/Process Engineering is published. In the case of the download-CUF, which can be applied for without concluding a data usage agreement, we aggregate more strongly than for the SUF. Here, subject groups are published instead of fields of study: in our example, the subject group Engineering Sciences.
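The stepwise aggregation described above can be sketched as two lookups along the subject hierarchy. Only the single example from the text is filled in; the actual Destatis key index is much larger, and the mode labels and function name here are hypothetical.

```python
# Only the example from the text; the real key index covers all subjects.
STUDY_AREA = {"Materials Engineering": "Mechanical/Process Engineering"}
SUBJECT_GROUP = {"Mechanical/Process Engineering": "Engineering Sciences"}

# Level of detail per mode of access (mode labels are hypothetical).
LEVEL_BY_MODE = {
    "on-site-suf": "field",
    "remote-desktop-suf": "study_area",
    "download-suf": "study_area",
    "download-cuf": "subject_group",
}


def field_of_study_for(mode, field):
    """Return the field of study at the level of detail allowed for the mode."""
    level = LEVEL_BY_MODE[mode]
    if level == "field":
        return field
    area = STUDY_AREA[field]
    if level == "study_area":
        return area
    return SUBJECT_GROUP[area]
```

For Materials Engineering, this yields the unchanged field of study on-site, Mechanical/Process Engineering via remote desktop and download-SUF, and Engineering Sciences in the download-CUF.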

University

A second example is the characteristic university, which is never released without aggregation.

Characteristic | On-Site-SUF | Remote-Desktop-SUF | Download-SUF | Download-CUF (subsample)
University | Aggregation to type of university and location of university to NUTS 2 | Aggregation to type of university and location of university to federal states | Aggregation to type of university and location of university to new/old federal states | Aggregation to type of university and location of university to new/old federal states

The university location is aggregated to a different level for each mode of access, as the following figures illustrate. Via the on-site mode of access, the university locations are provided in the greatest detail, at NUTS 2 level (38 regions), and thus offer the highest potential for analysis. Data users who use the remote desktop mode of access receive the university location at federal state level. Both the download-SUF and the download-CUF only indicate whether the university is located in one of the new or one of the old federal states.

Figure: NUTS 2 regions (map of Germany with outlines of the NUTS 2 regions)

 

Figure: Federal states (map of Germany with outlines of the 16 federal states)

 

Figure: Old/new federal states (map of Germany with outlines of the old and new federal states, without West Berlin)
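The three levels of regional detail nest inside each other, which the following sketch illustrates. It relies on the structure of NUTS codes, where the first three characters of a German NUTS 2 code identify the federal state (for example `DED2` for Dresden lies in `DED`, Saxony). The function names and mode labels are hypothetical, and Berlin is deliberately left unassigned because the text does not say how it is classified.

```python
# NUTS 1 codes of the new federal states: Brandenburg, Mecklenburg-Vorpommern,
# Saxony, Saxony-Anhalt and Thuringia.
NEW_FEDERAL_STATES = {"DE4", "DE8", "DED", "DEE", "DEG"}
BERLIN = "DE3"


def federal_state_code(nuts2_code):
    """Derive the federal state (NUTS 1) code from a German NUTS 2 code."""
    if len(nuts2_code) != 4 or not nuts2_code.startswith("DE"):
        raise ValueError(f"not a German NUTS 2 code: {nuts2_code!r}")
    return nuts2_code[:3]


def old_or_new(nuts2_code):
    """Classify a region as lying in the old or the new federal states."""
    state = federal_state_code(nuts2_code)
    if state == BERLIN:
        # Assumption: the treatment of Berlin is not specified, so it is left open.
        raise ValueError("Berlin is left unassigned in this sketch")
    return "new" if state in NEW_FEDERAL_STATES else "old"


def university_location_for(mode, nuts2_code):
    """Return the university location at the level released for the mode."""
    if mode == "on-site-suf":
        return nuts2_code
    if mode == "remote-desktop-suf":
        return federal_state_code(nuts2_code)
    if mode in ("download-suf", "download-cuf"):
        return old_or_new(nuts2_code)
    raise ValueError(f"unknown mode of access: {mode!r}")
```

Because each coarser level can be derived from the finer one, only the NUTS 2 code needs to be stored in this sketch; the other two levels are computed when a file for a given mode of access is prepared.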

 

Age

Another example is the age of the respondents. Since our example is a survey of a university graduate cohort, the respondents are approximately the same age, and people who completed their studies relatively late are quite rare. In the SUF, age is released via all modes of access. In the CUF, on the other hand, we apply top-coding. This means that age information beyond a certain limit, in this case a year of birth of 1959 or earlier, is aggregated into one category.

Characteristic | On-Site-SUF | Remote-Desktop-SUF | Download-SUF | Download-CUF (subsample)
Age | Release | Release | Release | Top-coding
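Top-coding as described above can be sketched in a few lines. The limit of 1959 comes from the text; the function name and the category label are hypothetical.

```python
def top_code_birth_year(year, limit=1959):
    """Collapse all years of birth up to and including the limit into one category."""
    if year <= limit:
        return f"{limit} or earlier"
    return str(year)
```

A respondent born in 1952 and one born in 1959 therefore fall into the same category, while a year of birth of 1984 is kept as it is.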

The anonymisation procedure including an overview table is explained in detail for each study in a separate chapter of the data and methods report.

The legal background for anonymisation is the EU General Data Protection Regulation (EU-GDPR) together with the German Federal Data Protection Act in its revised version of 30 June 2017, which stipulates that data from scientific research projects must be anonymised before transfer to third parties in such a way that no reference to a person can be established.

We ensure this through a combination of statistical measures and technical access restrictions. The following applies: the more strongly data access is technically controlled, the lower the risk of de-anonymisation, the less information has to be removed from the data by statistical measures, and the greater the remaining analysis potential.