• Published: 29 Mar 2018

  • Filed under: datascience, r-english

inegiR v2

After a lot of slacking around, I finally finished the upgraded version of the inegiR package, now on CRAN. This version bundles quite a few changes, which I explain in this post.

New language

The biggest change upfront is the migration to English in both function names and documentation. The rationale is to make the package more accessible to developers around the world (I have received a few emails asking for translations). Also, the non-ASCII characters were not helpful. For Mexican users, I assume that if you know R, you can probably find your way around an English document.

To avoid breaking existing workflows, I left the legacy functions intact, except that they now warn you to use the English version instead. For example, the commercial growth rate functions are:

# English (new)
rate_commerce()
# Spanish (legacy)
tasa_comercio()
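Keeping a legacy name working while nudging users toward the new one is a common R pattern. The sketch below shows one way to do it with base R's .Deprecated(); the names rate_demo() and tasa_demo() and the growth-rate body are hypothetical stand-ins, not inegiR's actual source, and the package's real wrappers may differ.

```r
# Hypothetical new-style (English) function: period-over-period growth rate.
# Illustrative body only; not necessarily what inegiR's rate_* functions compute.
rate_demo <- function(x, lag = 1) {
  n <- length(x)
  if (n <= lag) return(numeric(0))
  x[(lag + 1):n] / x[1:(n - lag)] - 1
}

# Hypothetical legacy (Spanish) alias: emits a deprecation warning,
# then forwards every argument to the new function unchanged.
tasa_demo <- function(...) {
  .Deprecated("rate_demo")
  rate_demo(...)
}

# The new name runs silently; the old name still works but warns.
rate_demo(c(100, 110, 121))
suppressWarnings(tasa_demo(c(100, 110, 121)))
```

Forwarding with ... means the legacy name accepts exactly the same arguments as the new one, so old scripts keep running while the warning points users to the replacement.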

Route API

With some help from Arturo Cárdenas and a revamp of INEGI's Sakbé API, I was able to add functions to access route information.

The two main ones are:

# search for a destination ID
inegi_destiny()
# get route information
inegi_route()

The first thing to understand is that INEGI identifies sites in Mexico by a "destination ID". For example, the International Airport in Mexico City is destination ID 57. The inegi_destiny() function helps you find a destination ID from a text search, sort of like googling a place and getting its address. Here is an example with a plaza in Monterrey:

# install from CRAN, or the newest dev version from GitHub (if CRAN hasn't accepted it yet)
# install.packages("inegiR")
# or...
# devtools::install_github("eflores89/inegiR")
library(inegiR)
library(knitr)
# search for the Macroplaza destination ID
token <- "mytoken"
destiny1 <- inegi_destiny("Macroplaza", token = token)
kable(destiny1)
| ID | ID_DEST | STATE | NAME | GEO_STRING | TYPE | LAT | LONG |
|:---|---:|:---|:---|:---|:---|---:|---:|
| destino | 6940 | N.L. | Macroplaza, Monterrey | {"type":"Point","coordinates":[-100.309991587,25.668862054]} | Point | -100.3100 | 25.66886 |
| destino | 20237 | B.C. | Macroplaza del Valle, Mexicali | {"type":"Point","coordinates":[-115.50790804,32.62128025]} | Point | -115.5079 | 32.62128 |
| destino | 17891 | Coah. | Macroplaza, Acuña | {"type":"Point","coordinates":[-100.978421457,29.3299882860001]} | Point | -100.9784 | 29.32999 |
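The GEO_STRING column is a GeoJSON Point, and GeoJSON lists coordinates as [longitude, latitude]. Comparing the strings with the table, the LAT column appears to hold the longitude and LONG the latitude, so it is worth checking the order before mapping. Below is a minimal base-R sketch for pulling the two numbers out of such a string; parse_point() is a hypothetical helper, and a proper JSON parser (for example the jsonlite package) would be the more robust choice.

```r
# Hypothetical helper: extract c(lon, lat) from a GeoJSON Point string.
# GeoJSON (RFC 7946) orders positions as longitude first, then latitude.
parse_point <- function(geo_string) {
  # Drop everything up to "[" and from "]" onward, leaving "lon,lat"
  inside <- gsub(".*\\[|\\].*", "", geo_string)
  coords <- as.numeric(strsplit(inside, ",", fixed = TRUE)[[1]])
  c(lon = coords[1], lat = coords[2])
}

geo <- '{"type":"Point","coordinates":[-100.309991587,25.668862054]}'
parse_point(geo)
```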

Once you know two destination IDs, you can use inegi_route() to learn about possible routes between them. It returns a list with two objects: a data.frame of route information (kilometers, toll cost, etc.) and a data.frame with all the coordinates along the route. If you join all the dots, you can clearly see the route you would take.

To illustrate, I'll take the first result and see what the route would be from there to the U.S. border (destination ID 7426) in a regular car, taking toll highways. The documentation explains the parameter names and options in more detail.

route <- inegi_route(from = 6940, to = 7426, token = token, pref = 1, vehicle = 1)
str(route)
# List of 2
#  $ ROUTE          :'data.frame':	1 obs. of  6 variables:
#   ..$ KMS       : num 222
#   ..$ TIME_MINS : num 151
#   ..$ TIME_HRS  : num 2.52
#   ..$ HAS_TOLL  : logi TRUE
#   ..$ TOLL_COST : num 364
#   ..$ TOTAL_COST: logi NA
#  $ COORDINATE_PATH:'data.frame':	1176 obs. of  2 variables:
#   ..$ V1: num [1:1176] -100 -100 -100 -100 -100 ...
#   ..$ V2: num [1:1176] 25.7 25.7 25.7 25.7 25.7 ...

As you can see, the returned object is a list of two data.frames. The first gives basic statistics about the route.

kable(route$ROUTE)
| KMS | TIME_MINS | TIME_HRS | HAS_TOLL | TOLL_COST | TOTAL_COST |
|---:|---:|---:|:---|---:|---:|
| 222.36 | 151.11 | 2.5185 | TRUE | 364 | NA |

The total cost is NA because the calc_cost parameter defaults to FALSE. When it is set to TRUE, the function also looks up the price of gasoline in the Sakbé API and estimates the fuel cost of the trip; any tolls are then added to produce a total cost. Be warned: this is very experimental and only a rule of thumb (see the documentation for details). To enable it, just change the parameter:

route2 <- inegi_route(from = 6940, to = 7426, token = token, pref = 1, vehicle = 1, 
                      calc_cost = TRUE)
kable(route2$ROUTE)
| KMS | TIME_MINS | TIME_HRS | HAS_TOLL | TOLL_COST | TOTAL_COST |
|---:|---:|---:|:---|---:|---:|
| 222.36 | 151.11 | 2.5185 | TRUE | 364 | 757.1729 |

All prices are reported in Mexican pesos.
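Since the total is fuel cost plus tolls, the two outputs above imply a fuel component of 757.1729 - 364 = 393.1729 pesos for this trip. The sketch below writes that rule of thumb out as a formula; estimate_trip_cost() and its fuel-efficiency and price inputs are hypothetical placeholders, since the exact figures inegiR uses are described in its documentation and not reproduced here.

```r
# Rule-of-thumb trip cost in pesos: fuel (distance / efficiency * price) plus tolls.
# km_per_litre and price_per_litre are hypothetical inputs, not inegiR's values.
estimate_trip_cost <- function(kms, toll_cost, km_per_litre, price_per_litre) {
  litres <- kms / km_per_litre
  litres * price_per_litre + toll_cost
}

# Fuel component implied by the route2 output above (total minus tolls)
fuel_component <- 757.1729 - 364
fuel_component / 222.36  # implied fuel cost per km, in pesos
```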

The second element in the list is the data.frame containing all point references in the route. As I said before, just connect the dots. Here is a preview:

kable(head(route$COORDINATE_PATH))
| LONGITUD | LATITUD | INDEX |
|---:|---:|---:|
| -100.3125 | 25.66238 | 1 |
| -100.3125 | 25.66231 | 2 |
| -100.3124 | 25.66225 | 3 |
| -100.3124 | 25.66222 | 4 |
| -100.3124 | 25.66220 | 5 |
| -100.3124 | 25.66215 | 6 |
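Connecting the dots can also be done numerically: summing the great-circle distances between consecutive points gives a path length that should be in the same ballpark as the KMS figure when run on the full path. The sketch below uses the haversine formula on a sphere with a mean Earth radius of 6,371 km, a standard approximation; haversine_km() and path_length_km() are hypothetical helpers, and the six preview points are rounded by kable, so the result on them is only approximate and covers just the first few metres of the trip.

```r
# Great-circle distance (km) between points in decimal degrees,
# using the haversine formula on a sphere of mean radius 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(pmin(1, a)))  # pmin guards against rounding above 1
}

# Hypothetical helper: total length of a path, summing consecutive segments.
path_length_km <- function(lon, lat) {
  n <- length(lon)
  if (n < 2) return(0)
  sum(haversine_km(lon[-n], lat[-n], lon[-1], lat[-1]))
}

# The six preview points of route$COORDINATE_PATH shown above
path <- data.frame(
  LONGITUD = c(-100.3125, -100.3125, -100.3124, -100.3124, -100.3124, -100.3124),
  LATITUD  = c(25.66238, 25.66231, 25.66225, 25.66222, 25.66220, 25.66215)
)
path_length_km(path$LONGITUD, path$LATITUD)
# On the full data.frame:
# path_length_km(route$COORDINATE_PATH$LONGITUD, route$COORDINATE_PATH$LATITUD)
```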

For this particular route, I plotted the dots in Google Maps to show this better.

New GDP catalog

Another big issue users reported was finding the relevant indicator IDs on the INEGI website. As experienced users know, every economic data series has a unique ID in the API. However, there is no catalog that lets you look up the IDs you need. I have petitioned INEGI multiple times, with no success.

My personal workaround was to look up the series in the BIE application (a web browser version of the API), download the data as an .iqy file, and then dig through the file to find the unique IDs being called. Very time-consuming and error-prone.

So, to help each other out in this endeavour, I created a catalog of IDs. This version covers all the sub-levels of GDP (down to the 4th level of disaggregation), and I plan to extend the catalog on a rolling basis. Any help is appreciated.

You can see the catalog by calling the dataset like this:

data("inegi_catalog")
kable(head(inegi_catalog[,1:7]))
# for more rows, see docs!
| NAME | LEVEL_2 | LEVEL_3 | LEVEL_4 | UNITS | BASE | FREQUENCY |
|:---|:---|:---|:---|:---|:---|:---|
| PIB | TOTAL | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
| PIB - IMPUESTOS A PRODUCTOS NETOS | IMPUESTOS A PRODUCTOS NETOS | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
| PIB - VALOR AGREGADO BRUTO | VALOR AGREGADO BRUTO | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
| PIB - ACTIVIDADES PRIMARIAS | ACTIVIDADES PRIMARIAS | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
| PIB - ACTIVIDADES PRIMARIAS - AGRICULTURA | ACTIVIDADES PRIMARIAS | AGRICULTURA | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
| PIB - ACTIVIDADES SECUNDARIAS | ACTIVIDADES SECUNDARIAS | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
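Because the catalog is a plain data.frame, base R subsetting is enough to narrow it down. The sketch below rebuilds a few of the preview rows by hand (so it runs without the package) and filters them by level and by keyword; on the real dataset you would run the same subset() calls on inegi_catalog and read off the indicator ID column, which the preview does not show, so check the docs for its name.

```r
# Hand-built copy of a few preview rows (columns NAME to LEVEL_4 only)
catalog_preview <- data.frame(
  NAME = c("PIB",
           "PIB - ACTIVIDADES PRIMARIAS",
           "PIB - ACTIVIDADES PRIMARIAS - AGRICULTURA",
           "PIB - ACTIVIDADES SECUNDARIAS"),
  LEVEL_2 = c("TOTAL", "ACTIVIDADES PRIMARIAS", "ACTIVIDADES PRIMARIAS",
              "ACTIVIDADES SECUNDARIAS"),
  LEVEL_3 = c("TOTAL", "TOTAL", "AGRICULTURA", "TOTAL"),
  LEVEL_4 = c("TOTAL", "TOTAL", "TOTAL", "TOTAL"),
  stringsAsFactors = FALSE
)

# Every series under primary activities, at any depth
primarias <- subset(catalog_preview, LEVEL_2 == "ACTIVIDADES PRIMARIAS")

# Case-insensitive keyword search on the full series name
subset(catalog_preview, grepl("agricultura", NAME, ignore.case = TRUE))
```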

Compact metadata and series helper

Two other common headaches came up with past versions. First, inegi_series() only accepted the full URL, even though most of the time the only thing that changed between calls was the indicator ID. So I added a simple function, inegi_code(), that builds the entire URL string for the API call from the ID.

GDP_ID <- 381016
inegi_code(GDP_ID)
# "http://www3.inegi.org.mx/sistemas/api/indicadores/v1//Indicador/381016/00000/es/false/xml/"
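The helper amounts to string concatenation around the ID. Below is a hypothetical re-creation, build_indicator_url(), that reproduces the URL pattern printed above; it is a sketch of the idea rather than inegiR's actual code, and the fixed path segments are copied verbatim from that output.

```r
# Hypothetical re-creation of the URL pattern printed by inegi_code() above.
# Only the indicator ID varies; the other path segments are copied verbatim.
build_indicator_url <- function(id) {
  paste0("http://www3.inegi.org.mx/sistemas/api/indicadores/v1//Indicador/",
         id, "/00000/es/false/xml/")
}

build_indicator_url(381016)
```

Because paste0() is vectorized, passing several IDs at once returns one URL per ID.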

The second headache had to do with downloading multiple IDs. The list returned by inegi_series() with the metadata parameter set to TRUE is a bit clunky to use in a loop or an apply function. So I added compact_inegi_series(), which returns all the information in a single tidy data.frame:

token_inegi <- "mytoken"
df <- compact_inegi_series(inegi_code(381016), token_inegi)
kable(head(df))
| Values | Dates | Name | Update | Region | Units | Indicator | Frequency |
|---:|:---|:---|:---|:---|:---|---:|:---|
| 7945204 | 1993-01-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
| 7939362 | 1993-04-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
| 7954943 | 1993-07-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
| 8268036 | 1993-10-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
| 8210538 | 1994-01-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
| 8413362 | 1994-04-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
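Because every call returns a data.frame with the same columns, downloading several IDs reduces to lapply() plus do.call(rbind, ...). In the sketch below, fetch_stub() is a hypothetical offline stand-in so the code runs without a token; with the package you would swap in function(id) compact_inegi_series(inegi_code(id), token_inegi). The second ID is a made-up placeholder.

```r
# Hypothetical offline stand-in for compact_inegi_series(inegi_code(id), token),
# returning a data.frame with a subset of the columns shown above.
fetch_stub <- function(id) {
  data.frame(
    Values = c(100, 101),
    Dates = as.Date(c("1993-01-01", "1993-04-01")),
    Indicator = id,
    stringsAsFactors = FALSE
  )
}

ids <- c(381016, 999999)  # 999999 is a made-up placeholder ID
all_series <- do.call(rbind, lapply(ids, fetch_stub))

# One long data.frame: one row per (indicator, date) pair
table(all_series$Indicator)
```

Stacking into one long table keeps the Indicator column as the key, which makes later grouping or reshaping straightforward.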

I hope this update is useful to everyone doing data science with Mexican statistics. New suggestions or questions are welcome via Twitter or a GitHub issue.