Exploring the Mexican Census 2020

Aarón Hernández Arcique
7 min read · Mar 26, 2021

How to access the database and use it to analyze the results

Hernández-Arcique A. and Camacho-Pérez E.

The National Institute of Statistics and Geography (INEGI) released the results of the 2020 Population and Housing Census, which was carried out from March 2 to 27, 2020. The purpose of the 2020 Census is to produce information on the volume, structure, and spatial distribution of the population and its primary demographic, socioeconomic, and cultural characteristics.

This tutorial shows how to access the INEGI data and demonstrates some of the most common data manipulation and visualization tools in data science, such as Pandas, Matplotlib, Folium, and Plotly.

Getting Census data

First option: API calls

a) Request a token from INEGI. To make API calls, you need a token (the secret key that identifies your requests). On the developers page, look for the banner “El token puede obtenerse aquí” under the Constructor de consultas menu.

b) You’ll be redirected to a page asking for an email address to register. Shortly afterwards, you will receive an email containing a long string of letters and numbers, which is your token. The relevant links are [1] and [2].

c) Find the right data with the Indicators method. Its parameters are sent directly in the URL, for example:

Syntax: `https://www.inegi.org.mx/app/api/indicadores/desarrolladores/jsonxml/INDICATOR/[IdIndicator]/[Idiom]/[Geographic area]/[Recent]/[Source]/[Version]/[Token]?type=[Format]`

URL: `https://www.inegi.org.mx/app/api/indicadores/desarrolladores/jsonxml/INDICATOR/1002000001/es/00/true/BISE/2.0/[HERE THE TOKEN]?type=json`

Input Parameters

  • IdIndicator: The first step in obtaining information from the API is to select the indicator and identify its key. This can be done by consulting the “Query Builder”.
  • Idiom: The information is available in Spanish [es] and English [en].
  • Geographic area: National [00], by federal entity [99], or by municipality [999], depending on the indicator.
  • Most recent data or historical series: Either only the most recent data [true] or the complete historical series [false] can be consulted.
  • Data source: The dissemination source, [BISE] or [BIE], from which the consulted data will be obtained.
  • Version: Identifies the edition [2.0] of the data provision service.
  • Token: To use the API, a valid token must be sent; it can be obtained from the link above. The token does not appear to expire; we found no mention of a lifetime limit.
  • Format: The information is offered in three formats: JSON [json], JSONP [jsonp], or XML [xml].
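
The parameters above can be assembled into a request URL with a small helper. This is a minimal sketch: the function name `build_indicator_url` and its default values are illustrative choices rather than part of the INEGI API, and `YOUR_TOKEN` is a placeholder for the token received by email.

```python
BASE_URL = (
    "https://www.inegi.org.mx/app/api/indicadores/"
    "desarrolladores/jsonxml/INDICATOR"
)


def build_indicator_url(indicators, token, area="00", lang="es",
                        recent=True, source="BISE", version="2.0",
                        fmt="json"):
    """Assemble an Indicators-method URL from its parameters.

    `indicators` may be a single key or a list of keys; several keys
    are joined with commas.
    """
    if isinstance(indicators, str):
        indicators = [indicators]
    ids = ",".join(indicators)
    recent_flag = "true" if recent else "false"
    return (
        f"{BASE_URL}/{ids}/{lang}/{area}/{recent_flag}/"
        f"{source}/{version}/{token}?type={fmt}"
    )


# Reproduces the example URL above, with a placeholder token.
url = build_indicator_url("1002000001", token="YOUR_TOKEN")
print(url)
```

Keeping the URL construction in one function makes it easy to switch between the most recent value (`recent=True`) and the full historical series (`recent=False`).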

Output Structure

The results of an API request have the following general structure:

{"Header":{"Name":"Datos compactos BISE","Email":"atencion.usuarios@inegi.org.mx"},"Series":[{"INDICADOR":"1002000001","FREQ":"7","TOPIC":"123","UNIT":"96","NOTE":"1398","SOURCE":"2,3,343,487,510,1714,2960","LASTUPDATE":null,"STATUS":"3","OBSERVATIONS":[{"TIME_PERIOD":"2020","OBS_VALUE":"126014024.00000000000000000000","OBS_EXCEPTION":null,"OBS_STATUS":"3","OBS_SOURCE":"","OBS_NOTE":"","COBER_GEO":"0700"}]}]}
  • Go: Identifier of the selected method ["INDICATOR"].
  • AgencyID: Identification of the agency that provides the data ["INEGI"].
  • Version: Version of the method used to obtain the data [2.0].
  • Lang: Key of the language in which the data is provided ["es"].
  • CODE: The array of data returned. Each element has two attributes: Value, an identifier of the datum, and Description, its description.
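
As a sketch of how a response like the sample shown above could be loaded into Pandas, the snippet below flattens the `Series` and `OBSERVATIONS` arrays into one row per observation. The sample is a trimmed copy of that response; the helper name `observations_to_frame` and the output column names are assumptions made for illustration.

```python
import json

import pandas as pd

# Trimmed copy of the sample response shown above.
SAMPLE = """
{"Header": {"Name": "Datos compactos BISE"},
 "Series": [{"INDICADOR": "1002000001",
             "OBSERVATIONS": [{"TIME_PERIOD": "2020",
                               "OBS_VALUE": "126014024.00000000000000000000",
                               "COBER_GEO": "0700"}]}]}
"""


def observations_to_frame(payload):
    """Flatten every observation of every series into one row."""
    rows = [
        {
            "indicator": series["INDICADOR"],
            "area": obs["COBER_GEO"],
            "period": obs["TIME_PERIOD"],
            # OBS_VALUE arrives as a string; empty or null values stay None.
            "value": float(obs["OBS_VALUE"]) if obs["OBS_VALUE"] else None,
        }
        for series in payload["Series"]
        for obs in series["OBSERVATIONS"]
    ]
    return pd.DataFrame(rows)


df = observations_to_frame(json.loads(SAMPLE))
print(df)
```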

Second option: Download all Open Data files

a) Navigate to the “Datos” page and you should see “Datos abiertos” and “Descargar todos los archivos”.

b) Unzip the downloaded .zip file, which contains three files:

  • DescargaMasivaApp.exe
  • DescargaMasivaOD.xml
  • Leeme.txt

c) Run DescargaMasivaApp.exe to start downloading the files automatically to the path you select. It is important to stay connected to the Internet while the transfer is taking place. The download pages are available in Spanish [3] and English [4].

Data Analysis demonstration

Population growth in Mexico

This analysis will begin by showing the growth of the population in Mexico since 1910, for which the following URL will be used:

URL: `https://www.inegi.org.mx/app/api/indicadores/desarrolladores/jsonxml/INDICATOR/1002000001/es/00000/false/BISE/2.0/[TOKEN]?type=json`
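
One way to turn this request into a time series is sketched below, using only the standard library for the download. The helper names `fetch_payload` and `population_by_year` are hypothetical, and the code assumes that each `TIME_PERIOD` holds a plain census year. `[TOKEN]` must be replaced with a valid token before the request is made.

```python
import json
from urllib.request import urlopen

import pandas as pd

# URL from the text; [TOKEN] is a placeholder.
URL = (
    "https://www.inegi.org.mx/app/api/indicadores/desarrolladores/"
    "jsonxml/INDICATOR/1002000001/es/00000/false/BISE/2.0/[TOKEN]?type=json"
)


def fetch_payload(url):
    """Download one API response and decode its JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def population_by_year(payload):
    """Return the first series as a pandas Series indexed by year."""
    observations = payload["Series"][0]["OBSERVATIONS"]
    series = pd.Series(
        # Assumes TIME_PERIOD is a year such as "2020".
        {int(obs["TIME_PERIOD"]): float(obs["OBS_VALUE"])
         for obs in observations},
        name="population",
    )
    return series.sort_index()


# Typical use (needs a valid token and network access):
# population = population_by_year(fetch_payload(URL))
# population.plot(marker="o")  # plotting requires matplotlib
```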

Population growth by gender

Next, the population is split into men and women to show how each group has grown.

Indicators

  • Total male population: 1002000002
  • Total female population: 1002000003
URL: `https://www.inegi.org.mx/app/api/indicadores/desarrolladores/jsonxml/INDICATOR/1002000002,1002000003/es/0700/false/BISE/2.0/[TOKEN]?type=json`

Between 1910 and 1950, the population grew steadily. From 1960 onward, growth accelerates sharply. Note that in 1960 and 1970, the percentage difference between men and women decreased.

With this code you can build the following DataFrame and graph:
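
A minimal sketch of how such a DataFrame might be built from the two-indicator response follows. The helper name `gender_frame` and the column names are illustrative, and the percentage difference is defined here, as an assumption, as the surplus of women over men divided by the total population.

```python
import pandas as pd

# Indicator keys listed above.
LABELS = {"1002000002": "men", "1002000003": "women"}


def gender_frame(payload):
    """One row per census year, one column per group, plus the gap in %."""
    rows = [
        {
            "year": int(obs["TIME_PERIOD"]),
            "group": LABELS[series["INDICADOR"]],
            "population": float(obs["OBS_VALUE"]),
        }
        for series in payload["Series"]
        for obs in series["OBSERVATIONS"]
    ]
    wide = pd.DataFrame(rows).pivot(
        index="year", columns="group", values="population"
    )
    # Assumed definition: (women - men) as a share of the total population.
    total = wide["women"] + wide["men"]
    wide["difference_pct"] = (wide["women"] - wide["men"]) / total * 100
    return wide.sort_index()


# Typical use, where `payload` is the decoded JSON response for the URL above:
# table = gender_frame(payload)
# table[["men", "women"]].plot()  # plotting requires matplotlib
```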

The percentage difference between men and women is plotted to make it easier to see. It decreases between the 1960s and 1970s and increases from the 1980s onward, but women predominate in all cases.

Percentage of population aged 12 years and older economically active by state

As an example of manipulating economic data for each of Mexico’s states, the percentage of the population aged 12 and older that is economically active will be displayed.

Note: Mexico’s states have an abbreviation that identifies them, organized in a specific order implemented by the National Population Registry (RENAPO).

To obtain data from multiple states, the different identifiers for each state can be concatenated.
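
The concatenation can be sketched as follows. Note the assumptions: the state-level area codes are taken to be the national code `0700` followed by `00` and the two-digit state number (for example `07000031`), and the codes are joined with commas. Both should be checked against the Query Builder. `[IdIndicator]` and `[TOKEN]` are placeholders.

```python
BASE_URL = (
    "https://www.inegi.org.mx/app/api/indicadores/"
    "desarrolladores/jsonxml/INDICATOR"
)


def state_area_codes(prefix="070000"):
    """Build one area code per state (states are numbered 01 to 32).

    ASSUMPTION: the prefix "070000" is not confirmed by the text;
    verify the code format in the Query Builder before relying on it.
    """
    return [f"{prefix}{number:02d}" for number in range(1, 33)]


def multi_area_url(indicator, area_codes, token):
    """Request the most recent value of one indicator for several areas."""
    areas = ",".join(area_codes)  # assumed separator: comma
    return f"{BASE_URL}/{indicator}/es/{areas}/true/BISE/2.0/{token}?type=json"


codes = state_area_codes()
url = multi_area_url("[IdIndicator]", codes[:3], "[TOKEN]")
print(url)
```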

To show how these values are distributed geographically, we will use the folium module. On the resulting map, the country’s central area is very homogeneous compared to the rest of the country.

The average schooling level of the population 15 years of age and older

The average schooling level of the population aged 15 and older will be analyzed, both historically and for the latest census. However, the 2020 data for this item is not available in the API, so the API request covers 2000 to 2015, and the 2020 data will be taken from the CSV provided at the following link: [5]

Education in Mexico is divided into four main stages: elementary (6 years long), middle school (3 years long), high school (3 years long), and college (3 to 5 years long).

The way in which information can be obtained from an INEGI CSV file is presented below:

In the data dictionary provided at the link above, we can look up the mnemonic of the indicator. In this case, we will use GRAPROES, which stands for “Grado promedio de escolaridad”, or average school grade.
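
Reading that column with Pandas might look like the sketch below. The column layout (`ENTIDAD`, `NOM_ENT`, `MUN`, `LOC`) and the convention that state totals have `MUN` and `LOC` equal to 0 are assumptions about the file, and the rows in `CSV_TEXT` are made-up placeholders. For a real run, pass the path of the downloaded CSV instead of the in-memory text.

```python
import io

import pandas as pd

# Placeholder rows imitating the assumed layout; the numbers are made up.
CSV_TEXT = """ENTIDAD,NOM_ENT,MUN,LOC,GRAPROES
0,Total nacional,0,0,9.0
1,Placeholder state,0,0,10.0
1,Placeholder state,1,0,8.5
"""


def state_average_schooling(csv_source):
    """Return GRAPROES for each state-level total row, indexed by name."""
    df = pd.read_csv(csv_source, dtype=str)
    # ASSUMPTION: MUN == 0 and LOC == 0 mark state totals, and
    # ENTIDAD == 0 marks the national total, which is excluded here.
    is_state_total = (
        (df["ENTIDAD"].astype(int) != 0)
        & (df["MUN"].astype(int) == 0)
        & (df["LOC"].astype(int) == 0)
    )
    totals = df[is_state_total].copy()
    # Any non-numeric marker in the column becomes NaN instead of raising.
    totals["GRAPROES"] = pd.to_numeric(totals["GRAPROES"], errors="coerce")
    return totals.set_index("NOM_ENT")["GRAPROES"]


schooling = state_average_schooling(io.StringIO(CSV_TEXT))
print(schooling)
```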

As can be seen in the graph below, an average value just above nine years (six of elementary plus three of middle school) means that the majority of Mexicans have completed middle school.

A simple linear extrapolation can be used to predict the average school grade in 2025 and 2030.
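
The extrapolation can be sketched with a least-squares line fitted by `numpy.polyfit`. The values below are placeholders chosen to lie on a straight line so the arithmetic is easy to follow; they are not the census figures, which would come from the API and the CSV.

```python
import numpy as np

# Placeholder values on a straight line; NOT census figures.
years = np.array([2000, 2005, 2010, 2015, 2020])
avg_grade = np.array([7.5, 8.0, 8.5, 9.0, 9.5])

# Fit grade = slope * year + intercept by least squares.
slope, intercept = np.polyfit(years, avg_grade, deg=1)


def predict(year):
    """Evaluate the fitted line at the given year."""
    return slope * year + intercept


forecast = {year: round(float(predict(year)), 2) for year in (2025, 2030)}
print(forecast)
```

A straight line assumes the recent rate of increase continues unchanged, so the result is a rough projection rather than a forecast with error bounds.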

According to the linear prediction, by 2025, most Mexicans over 15 are expected to have completed at least the first year of high school.

The average level of schooling of the population 15 years of age and older (By state)

To see how average school grades are distributed across the country’s states, a map with the level of each state is presented below.

It would be interesting to observe how the average school grade has increased in every state, to find out whether the educational programs implemented have influenced each state’s current position. We will take the historical data from the API.

Ideally, a single request to the API would return the values for all 32 states. However, a request can include at most 30 IDs. Therefore, this time an individual request will be made for each state, which also makes it easier to integrate the data into the DataFrame.
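
The per-state loop might be organized as in the sketch below, where the download step is passed in as a function so the table-building logic can be tried without network access. `collect_states` and `fake_fetch` are hypothetical names, the area codes `"01"` and `"02"` are placeholders, and `fake_fetch` returns made-up numbers in place of a real HTTP request.

```python
import pandas as pd


def collect_states(fetch, area_codes):
    """Call `fetch(area_code)` once per state and build a state x year table."""
    rows = []
    for code in area_codes:
        payload = fetch(code)  # one request per state
        for obs in payload["Series"][0]["OBSERVATIONS"]:
            rows.append({
                "area": code,
                "year": int(obs["TIME_PERIOD"]),
                "value": float(obs["OBS_VALUE"]),
            })
    long_table = pd.DataFrame(rows)
    return long_table.pivot(index="area", columns="year", values="value")


def fake_fetch(code):
    """Stand-in for a real API request; returns made-up numbers."""
    return {"Series": [{"OBSERVATIONS": [
        {"TIME_PERIOD": "2010", "OBS_VALUE": "8.0"},
        {"TIME_PERIOD": "2020", "OBS_VALUE": "9.0"},
    ]}]}


table = collect_states(fake_fetch, ["01", "02"])
print(table)
```

In a real run, `fetch` would build the URL for the given area code, download it, and return the decoded JSON.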

The bar plot shows the five states with the highest average school grades, comparing the increase between 2010 and 2020.

As can be seen in the graph, Querétaro had the largest increase.

A new column is created to show the increment in the average number of school years between 2010 and 2020.
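
Creating that column and selecting the top five states can be sketched as follows. The state labels and values are placeholders, not census figures, and the column name `increment` is an illustrative choice.

```python
import pandas as pd

# Placeholder values; not census figures.
grades = pd.DataFrame(
    {2010: [9.0, 10.5, 8.0], 2020: [10.0, 11.0, 9.5]},
    index=["State A", "State B", "State C"],
)

# New column: change in average school years between the two censuses.
grades["increment"] = grades[2020] - grades[2010]

# Up to five states with the highest 2020 average
# (only three rows exist in this tiny sample).
top = grades.nlargest(5, 2020)
print(top)
```

Calling `top[[2010, 2020]].plot.bar()` would then draw the comparison as a grouped bar plot (this requires matplotlib).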

Population affiliated to health services

The API service only gives the population affiliated to health services, whereas the CSV file with 2020 data gives both the population that has health services and the population that does not.

There are four geographic levels from which information can be obtained: national, state, municipal, and locality. Below is a map showing the percentage of population per municipality with social security in Yucatán.

The data will be obtained from the CSV file of the 2020 census.
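
A sketch of the municipal calculation follows. It assumes that the CSV uses the columns `ENTIDAD`, `MUN`, `NOM_MUN`, `LOC`, `POBTOT` (total population), and `PDER_SS` (population affiliated to health services), and that municipal totals are the rows with `LOC` equal to 0 and `MUN` different from 0. These names should be checked against the data dictionary. The rows in `CSV_TEXT` are made-up placeholders; 31 is the state code of Yucatán.

```python
import io

import pandas as pd

# Placeholder rows imitating the assumed layout; the numbers are made up.
CSV_TEXT = """ENTIDAD,MUN,NOM_MUN,LOC,POBTOT,PDER_SS
31,0,Total de la entidad,0,1000,800
31,1,Placeholder municipality,0,200,150
31,1,Placeholder municipality,1,50,40
30,1,Municipality in another state,0,300,100
"""


def municipal_coverage(csv_source, state_code=31):
    """Percentage of affiliated population for each municipality of a state."""
    df = pd.read_csv(csv_source)
    # ASSUMPTION: municipal totals have MUN != 0 and LOC == 0.
    is_municipal_total = (
        (df["ENTIDAD"] == state_code) & (df["MUN"] != 0) & (df["LOC"] == 0)
    )
    rows = df[is_municipal_total].copy()
    rows["pct_affiliated"] = rows["PDER_SS"] / rows["POBTOT"] * 100
    return rows[["MUN", "NOM_MUN", "pct_affiliated"]]


coverage = municipal_coverage(io.StringIO(CSV_TEXT))
print(coverage)
```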

Conclusion

We hope that this abbreviated tutorial on the data from the Mexican Census will help readers who want to learn more about the country. To see the full tutorial, visit: https://github.com/aarondiuz/Proyecto_Censo

We can also draw two important lessons from this data project:

  • Data manipulation is important

Most of the time, it will be necessary to clean the data, filter it, and even mix it with another database to understand the information better.

  • The power of visualizations

Once the data is available, the next step is to choose the form that best conveys the information we want to transmit. As shown here, there are many options and many libraries available. A great display can captivate the audience and ensure the right message gets delivered.

References

[1] https://www.inegi.org.mx/servicios/api_indicadores.html

[2] https://www.inegi.org.mx/app/api/indicadores/interna_v1_1/tokenVerify.aspx

[3] https://www.inegi.org.mx/programas/ccpv/2020/#Datos_abiertos

[4] http://en.www.inegi.org.mx/programas/ccpv/2020/#Open_data

[5] https://www.inegi.org.mx/programas/ccpv/2020/#Datos_abiertos
