Data Mining Chilean Civil Service

Researcher: Bastián González-Bustamante

Assistants: Matías Astete and Berenice Orvenes

Project Status: Completed

Creation of the Dataset

The data come from two releases by the Chilean National Civil Service Directorate (DNSC) in response to requests AE004T0000240 and AE004T0000484, filed under the Law on Access to Public Information on December 26, 2016, and April 26, 2018, respectively. From the first request, we built a database of senior public managers for the period 2009-2015 (N = 391; see González-Bustamante, 2020).

Combining this database with the second request, we identified 452 top-level managers for 2009-2017. We then gathered 1,396 public documents, including appointment decrees, minutes of contests, institutional news, and similar documents.

The documents were uploaded to the Open Science Framework (OSF) platform, where each was assigned a unique permalink. The permalinks allowed us to apply an optical character recognition (OCR) routine programmed for this purpose. As a first step, the PDF documents were converted to PNG images, which were uploaded to the project repository on GitHub (surv-civil-servants, currently private and soon available for public consultation). The repository is connected to OSF.
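The PDF-to-PNG-to-OCR step can be illustrated with a minimal Python sketch. It is not the project's own routine, which is not reproduced here. It assumes the third-party packages pdf2image and pytesseract (with Poppler, Tesseract, and Spanish language data installed), and the function names, file-naming scheme, and 300 dpi setting are hypothetical choices made only for illustration.

```python
from pathlib import Path


def page_png_path(out_dir: Path, doc_id: str, page: int) -> Path:
    """Build a predictable PNG file name for one page of one document.

    The naming scheme is a hypothetical convention, not the project's.
    """
    return out_dir / f"{doc_id}_p{page:03d}.png"


def pdf_to_text(pdf_path: Path, out_dir: Path, lang: str = "spa") -> str:
    """Render each PDF page to PNG, run OCR on it, and return the joined text."""
    # Third-party imports are kept local so the helper above works without them.
    from pdf2image import convert_from_path  # assumption: Poppler is installed
    import pytesseract  # assumption: Tesseract and its Spanish data are installed

    out_dir.mkdir(parents=True, exist_ok=True)
    pages = convert_from_path(str(pdf_path), dpi=300)  # one PIL image per page

    texts = []
    for number, image in enumerate(pages, start=1):
        # Keep the PNG on disk so it can be archived alongside the text.
        image.save(page_png_path(out_dir, pdf_path.stem, number), "PNG")
        texts.append(pytesseract.image_to_string(image, lang=lang))
    return "\n".join(texts)
```

Saving one PNG per page before running OCR keeps an inspectable intermediate file for every document, which makes it easier to re-check a page by eye when the recognized text looks wrong.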

The images were then converted to machine-readable text so that each document could be matched to, and checked against, the identified cases. This matching allowed us to validate the cases.
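One way to picture the matching step is the sketch below, which uses only the Python standard library. It assumes that a case counts as matched when every token of the manager's name appears in the OCR text after accents and case are stripped. That rule, the function names, and the sample names and documents are all invented for illustration and do not come from the project's data or code.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def name_in_document(full_name: str, ocr_text: str) -> bool:
    """True if every token of the name occurs as a whole word in the OCR text."""
    doc_tokens = set(normalize(ocr_text).split())
    name_tokens = normalize(full_name).split()
    return bool(name_tokens) and all(tok in doc_tokens for tok in name_tokens)


def validate_cases(cases: dict, documents: dict) -> dict:
    """Map each case id to the ids of documents that mention the manager's name."""
    return {
        case_id: [
            doc_id
            for doc_id, text in documents.items()
            if name_in_document(name, text)
        ]
        for case_id, name in cases.items()
    }


# Invented sample data, for illustration only.
cases = {"c1": "María Pérez Soto", "c2": "Juan Rojas"}
documents = {
    "d1": "DECRETO: Nómbrase a doña MARIA PEREZ SOTO en el cargo de Director.",
    "d2": "Acta del concurso para proveer el cargo.",
}
matches = validate_cases(cases, documents)
print(matches)  # {'c1': ['d1'], 'c2': []}
```

Cases left with an empty list, such as c2 here, are the ones that would still need a manual check. Token matching tolerates accent and capitalization differences introduced by OCR, but it can miss names with recognition errors and can produce false positives for common surnames.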


Artwork by DALL·E in an Impressionist style.
Diagram by González-Bustamante, Astete and Orvenes (2020).
Last updated: August 23, 2021.