---
title: "Introduction to the `hicp`-package"
author: "Sebastian Weinand"
date: "`r Sys.setlocale('LC_TIME', 'English'); format(Sys.Date(),'%d %B %Y')`"
output:
  rmarkdown::html_vignette:
    toc: true
    # number_sections: true
bibliography: references.bib
vignette: >
  %\VignetteIndexEntry{Introduction to the hicp-package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
Sys.setenv(LANGUAGE="en")
# set cores for testing on CRAN via devtools::check_rhub()
library(restatapi)
options(restatapi_cores=1)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
The Harmonised Index of Consumer Prices (HICP) is the key economic indicator for measuring inflation in the euro area. The methodology underlying the HICP is documented in the HICP Methodological Manual [@Eurostat2024]. Based on this manual, the `hicp`-package provides functions for data users to work with publicly available HICP price indices and weights (*upper-level aggregation*). This vignette highlights the main package features. It contains three sections on data access, the Classification of Individual Consumption by Purpose (COICOP) underlying the HICP, and index aggregation.
```{r setup, message=FALSE}
# load package:
library(hicp)
# load additional packages:
library(data.table)
# set global options:
options(hicp.coicop.version="ecoicop-hicp") # the coicop version to be used
options(hicp.unbundle=TRUE) # treatment of coicop bundle codes like 08X
options(hicp.all.items.code="00") # internal code for the all-items index
```
# HICP data
The `hicp`-package offers easy access to HICP data from Eurostat's public [database](https://ec.europa.eu/eurostat/data/database). For that purpose, it uses the download functionality provided by Eurostat's [`restatapi`](https://CRAN.R-project.org/package=restatapi)-package. This section shows how to list, filter and retrieve HICP data using the functions `hicp.datasets()`, `hicp.datafilters()`, and `hicp.dataimport()`.
## Step 1: Available datasets
Eurostat's database contains various datasets of different statistics. All datasets are classified by topic and can be accessed via a navigation tree. HICP data can be found under "Economy and finance / Prices". An even simpler solution that does not require visiting Eurostat's database is provided by the function `hicp.datasets()`, which lists all available HICP datasets with corresponding metadata (e.g., number of observations, last update).
```{r warning=FALSE}
dtd <- hicp.datasets()
dtd[1:5, list(title, code, lastUpdate, values)]
```
The output above shows the first five HICP datasets. As can be seen, a short description of each dataset and some metadata are provided. The variable `code` is the dataset identifier, which is needed to filter and download data.
## Step 2: Allowed data filters
The HICP is compiled each month in each member state of the European Union (EU) for various items. Its compilation started in 1996. Therefore, the dataset of price indices is relatively large. Sometimes, however, data users only need the price indices of certain years or specific countries. Eurostat's API and, thus, the `restatapi`-package allow filters to be applied to each data request, e.g., to download only the price indices of the euro area for the all-items HICP. The filtering options can differ between datasets. The function `hicp.datafilters()` therefore returns the allowed filtering options for a given dataset.
```{r warning=FALSE}
# dataset 'prc_hicp_inw':
dtf <- hicp.datafilters(id="prc_hicp_inw")
# allowed filters:
unique(dtf$concept)
# allowed filter values:
dtf[1:5,]
```
The output above shows that the dataset `prc_hicp_inw` on item weights can be filtered according to `freq`, `coicop`, and `geo`. The table `dtf` contains for each filter the allowed values, e.g., `CP011` for `coicop` and `A` for `freq`. These filters can be integrated in the data download as explained in the following subsection.
## Step 3: Data download
Applying a filter to a data request can noticeably reduce the downloading time, particularly for bigger datasets. Function `hicp.dataimport()` can be used to download a specific dataset.
```{r warning=FALSE}
# download item weights for the euro area, Germany, and France from 2015 on:
item.weights <- hicp.dataimport(id="prc_hicp_inw", filters=list("geo"=c("EA","DE","FR")), date.range=c("2015", NA), flags=TRUE)
# inspect data:
item.weights[1:5, ]
nrow(item.weights) # number of observations
unique(item.weights$geo) # only EA, DE, and FR
range(item.weights$time) # since 2015
```
The object `item.weights` contains the item weights for the euro area, Germany, and France since 2015. If the whole dataset were needed, the request would simplify to `hicp.dataimport(id="prc_hicp_inw")`.
# HICP and COICOP
HICP item weights and price indices are classified according to the European COICOP (ECOICOP-HICP). This COICOP version is used by default (`options(hicp.coicop.version="ecoicop-hicp")`), but others are available in the package as well. The all-items HICP includes twelve item divisions, which are further broken down by consumption purpose. The lowest level of subclasses (5-digit codes) provides the finest differentiation of items for which weights are available, e.g., *rice* (01111) or *bread* (01113). Both rice and bread belong to the same class, *bread and cereals (0111)*, and, at higher levels, to the same group *food (011)* and division *food and non-alcoholic beverages (01)*. Hence, ECOICOP and thus also the HICP follow a pre-defined hierarchical tree, where the item weights of the all-items HICP add up to 1000. This section shows how to work with the COICOP codes, for example to derive the lowest level of items that form the all-items HICP.
## COICOP codes, bundles, and relatives
**COICOP codes and bundles.** The COICOP codes underlying the HICP ([ECOICOP](https://op.europa.eu/de/web/eu-vocabularies/dataset/-/resource?uri=http://publications.europa.eu/resource/dataset/ecoicop-hicp)) consist of numbers. The code `00` is used in this package for the all-items HICP although it is not an official COICOP code (see `options(hicp.all.items.code="00")`). The codes of the twelve divisions below it are `01, 02,..., 12`. At the lowest level of subclasses, the codes consist of 5 digits. The function `is.coicop()` checks whether a code is a valid COICOP code. This includes bundle codes like `082_083`, which violate the standard COICOP code pattern but can be found in HICP data. Bundle codes can generally be detected using `is.bundle()` and unbundled using `unbundle()`.
```{r warning=FALSE}
# example codes:
ids <- c("00","CP00","13","08X")
# check for bundle codes:
is.bundle(id=ids)
# unbundle any bundle codes into their components:
unbundle(id=ids)
# check if valid ECOICOP code including bundle codes:
is.coicop(id=ids, settings=list(unbundle=TRUE))
# check if valid ECOICOP code excluding bundle codes:
is.coicop(id=ids, settings=list(unbundle=FALSE))
# games of chance have a valid ECOICOP code:
is.coicop("0943", settings=list(coicop.version="ecoicop"))
# but not in the ECOICOP-HICP:
is.coicop("0943", settings=list(coicop.version="ecoicop-hicp"))
```
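The code structure described above (two digits for divisions, up to five digits for subclasses) can be illustrated with a few lines of base R. This is only a sketch: `sketch_level()` is a hypothetical helper that is not part of the `hicp`-package, and it looks at the number of digits only. Unlike `is.coicop()`, it does not check whether a code actually exists in a given COICOP version.

```{r}
# Sketch (base R only): infer the COICOP level from the number of digits.
# 'sketch_level' is a hypothetical helper, not a function of the hicp-package.
sketch_level <- function(id) {
  lv <- c("division", "group", "class", "subclass")
  # plain codes have 2 to 5 digits; anything else (e.g., bundles) gives NA
  ok <- grepl("^[0-9]{2,5}$", id)
  out <- rep(NA_character_, length(id))
  out[ok] <- lv[nchar(id[ok]) - 1L]
  out[id == "00"] <- "all-items" # package convention, no official code
  out
}

sketch_level(c("00", "01", "011", "0111", "01111", "08X"))
```

The bundle code `08X` is returned as `NA` here because it does not match the plain digit pattern, which is the reason why the package treats bundles separately.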
**COICOP relatives.** COICOP codes available in the data downloaded from Eurostat's database should generally be valid (except for the prefix "CP"). More relevant is thus the detection of child and parent codes in the data. Children are those codes that belong to the same higher-level code (or parent). Such relations can be direct (e.g., `01->011`) or indirect (e.g., `01->0111`). It is important to note that each child has exactly one parent, while a parent may have multiple children. This can be seen in the example below.
```{r warning=FALSE}
# example codes:
ids <- c("00","01","011","01111","01112")
# no direct parent for 01111 and 01112:
parent(id=ids, flag=FALSE, direct=TRUE)
# indirect parent available:
parent(id=ids, flag=FALSE, direct=FALSE)
# 011 has two (indirect) children:
child(id=ids, flag=FALSE, direct=FALSE)
```
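The idea of a direct parent can be sketched in base R by dropping the last digit of a code. The helper `sketch_parent()` below is hypothetical and not the package's implementation: it assumes plain numeric codes without bundles and assumes that the divisions sit directly below the all-items code `00`.

```{r}
# Sketch (base R only): direct parent of a plain numeric code, obtained by
# dropping the last digit. 'sketch_parent' is hypothetical, ignores bundles.
sketch_parent <- function(id, all.items = "00") {
  out <- substr(id, 1L, nchar(id) - 1L)
  out[nchar(id) == 2L] <- all.items     # assumption: divisions sit below "00"
  out[id == all.items] <- NA_character_ # the top level has no parent
  out
}

codes <- c("00", "01", "011", "01111", "01112")
p <- sketch_parent(codes)
# keep a parent only if it is present among the codes themselves:
p_direct <- ifelse(p %in% codes, p, NA_character_)
p_direct
```

In this sketch, `01111` and `01112` have no direct parent among the given codes because the class `0111` is missing, which corresponds to the comment in the example above.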
## Deriving the COICOP tree for index aggregation
The functions `child()` and `parent()` may be useful for various reasons. To derive the composition of COICOP codes at the lowest possible level, however, the function `tree()` is better suited. For the HICP, the derivation of this composition can be done separately for each reporting month and country. Consequently, the selection of COICOP codes may differ across space and time. If needed, however, the argument `by` of `tree()` can be used to merge the composition of COICOP codes at the lowest possible level, e.g., to obtain a unique selection of the same COICOP codes over time. Because the derivation of COICOP codes searches the whole COICOP tree, the resulting composition of COICOP codes is also denoted as the *COICOP tree* in this package.
```{r warning=FALSE}
# subset and adjust item weights table:
item.weights <- item.weights[grepl("^CP", coicop),]
item.weights[, "coicop":=gsub(pattern="^CP", replacement="", x=coicop)]
# derive separate trees for each time period and country:
item.weights[, "t1" := tree(id=coicop, w=values, settings=list(w.tol=0.1)), by=c("geo","time")]
item.weights[t1==TRUE,
list("n"=uniqueN(coicop), # varying coicops over time and space
"w"=sum(values, na.rm=TRUE)), # weight sums should equal 1000
by=c("geo","time")]
# derive merged trees over time, but not across countries:
item.weights[, "t2" := tree(id=coicop, by=time, w=values, settings=list(w.tol=0.1)), by="geo"]
item.weights[t2==TRUE,
list("n"=uniqueN(coicop), # same selection over time in a country
"w"=sum(values, na.rm=TRUE)), # weight sums should equal 1000
by=c("geo","time")]
# derive merged trees over countries and time:
item.weights[, "t3" := tree(id=coicop, by=paste(geo,time), w=values, settings=list(w.tol=0.1))]
item.weights[t3==TRUE,
list("n"=uniqueN(coicop), # same selection over time and across countries
"w"=sum(values, na.rm=TRUE)), # weight sums should equal 1000
by=c("geo","time")]
```
All three COICOP trees in the example above can be used to aggregate the all-items HICP in a single aggregation step, as the item weights add up to 1000 in each case. While the selection of COICOP codes varies over time and across countries for `t1`, it is the same over time within each country for `t2`, and the same over time and across countries for `t3`.
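The notion of a composition at the lowest possible level can be illustrated with a small base R sketch on made-up weights. It simply selects the codes that have no lower-level code in the data and checks that their weights add up to 1000. This is a simplification and not the algorithm of `tree()`, which additionally takes the weights and a tolerance (`w.tol`) as inputs; the data `wts` are hypothetical.

```{r}
# Sketch (base R only) with made-up weights; 'wts' is hypothetical data.
wts <- data.frame(
  coicop = c("00", "01", "011", "012", "02"),
  weight = c(1000, 600, 350, 250, 400),
  stringsAsFactors = FALSE
)

# a code is a leaf if no other code in the data starts with it
# (the all-items code "00" is treated as the parent of everything):
is_leaf <- vapply(wts$coicop, function(z) {
  if (z == "00") return(nrow(wts) == 1L)
  !any(startsWith(wts$coicop, z) & wts$coicop != z)
}, logical(1), USE.NAMES = FALSE)

wts$coicop[is_leaf]      # "011" "012" "02"
sum(wts$weight[is_leaf]) # 1000
```

Because the leaf weights sum to the all-items weight, the three selected codes would be sufficient to aggregate the all-items index in one step.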
# Index aggregation, rates of change, and contributions
The HICP is a chain-linked Laspeyres-type index [@EU2016]. The (unchained) price indices in each calendar year refer to December of the previous year, which is the *price reference period*. These price indices are chain-linked to the existing index using December to obtain the HICP. The HICP indices currently refer to the *index reference period* 2015=100. Monthly and annual change rates can be derived from the price indices. The contributions of the price changes of individual items to the annual rate of change can be quantified using the so-called "Ribe contributions". More details can be found in @Eurostat2024[, chapter 8].
## Index aggregation
The all-items index is a weighted average of the items' subindices. However, because the HICP is a chain index, the subindices cannot simply be aggregated. They first need to be unchained, i.e., expressed relative to December of the previous year. These unchained indices can then be aggregated as a weighted average. Since the Laspeyres-type index is *consistent in aggregation*, the aggregation can be done stepwise from the bottom level to the top or directly in one step.
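The three steps (unchaining, aggregation, chain-linking) can be traced by hand on a toy example in base R. All numbers below are made up, and the unchained indices are scaled to December = 100, which is an assumption of this sketch rather than a statement about the package's internal scaling.

```{r}
# Sketch (base R only) with made-up numbers for a single year.
# chained indices in Dec 2020, Jan 2021, Feb 2021 for two items:
idx_a <- c(110, 112.2, 113.3)
idx_b <- c(100,  99.0, 101.0)
w_a <- 600; w_b <- 400 # item weights for 2021, summing to 1000
all_dec <- 105         # chained all-items index in Dec 2020

# 1. unchain: express each index relative to December of the previous year
un_a <- idx_a / idx_a[1] * 100
un_b <- idx_b / idx_b[1] * 100

# 2. aggregate the unchained indices as a weighted average
un_all <- (w_a * un_a + w_b * un_b) / (w_a + w_b)

# 3. chain-link to the December level of the existing all-items index
ch_all <- all_dec * un_all / 100
round(ch_all, 2) # 105.00 105.84 107.31
```

Averaging the chained indices `idx_a` and `idx_b` directly would give a different result, since their levels reflect the price developments and weights of all previous years.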
In the following example, the euro area HICP is computed directly in one step and also stepwise through all higher-level indices. For that purpose, the monthly price indices are first downloaded from Eurostat's database and unchained using the function `unchain()`. They are then merged with the item weights downloaded above. Based on the derived ECOICOP tree, the unchained price indices are aggregated in one step using the Laspeyres-type index, `chain()`ed, and finally `rebase()`d to the index reference period 2015. A comparison to the published all-items index values shows only small differences, which are due to rounding (the published index numbers in Eurostat's database are rounded and not available with all decimals).
```{r warning=FALSE, fig.width=7, fig.align="center"}
# import monthly price indices:
prc <- hicp.dataimport(id="prc_hicp_midx",
                       filters=list(unit="I15", geo="EA"),
                       date.range=c("2014-12", NA))
prc[, "time":=as.Date(paste0(time, "-01"))]
prc[, "year":=as.integer(format(time, "%Y"))]
prc[, "coicop" := gsub(pattern="^CP", replacement="", x=coicop)]
setnames(x=prc, old="values", new="index")
# unchain price indices:
prc[, "dec_ratio" := unchain(x=index, t=time), by="coicop"]
# import item weights:
inw <- item.weights[geo=="EA", list(coicop,geo,time,values)]
inw[, "time":=as.integer(time)]
setnames(x=inw, old=c("time","values"), new=c("year","weight"))
# derive coicop tree:
inw[ , "tree":=tree(id=coicop, w=weight, settings=list(w.tol=0.1)), by=c("geo","year")]
# merge price indices and item weights:
hicp.data <- merge(x=prc, y=inw, by=c("geo","coicop","year"), all.x=TRUE)
hicp.data <- hicp.data[year <= year(Sys.Date())-1,]
# compute all-items HICP in one aggregation step:
hicp.own <- hicp.data[tree==TRUE,
list("laspey"=laspeyres(x=dec_ratio, w0=weight)),
by="time"]
setorderv(x=hicp.own, cols="time")
hicp.own[, "chain_laspey" := chain(x=laspey, t=time, by=12)]
hicp.own[, "chain_laspey_15" := rebase(x=chain_laspey, t=time, t.ref="2015")]
# add published all-items HICP for comparison:
hicp.own <- merge(x=hicp.own,
y=hicp.data[coicop=="00", list(time, index)],
by="time",
all.x=TRUE)
plot(index-chain_laspey_15~time,
data=hicp.own, type="l",
xlab="Time", ylab="Difference (in index points)")
title("Difference between published index and own calculations")
abline(h=0, lty="dashed")
```
Similarly, the (unchained) price indices are also `aggregate()`d stepwise, which produces in addition to the all-items index all higher-level subindices. A comparison to the all-items index that has been computed in one step shows no differences. This highlights the consistency in aggregation of the indices. User-defined functions can be passed to `aggregate()` as well, which allows aggregation using any weighted or unweighted index formula.
```{r warning=FALSE, fig.width=7, fig.align="center"}
# compute all-items HICP stepwise through all higher-levels:
hicp.own.all <- hicp.data[is.coicop(coicop),
aggregate(x=dec_ratio, w0=weight, grp=coicop, index=laspeyres),
by="time"]
setorderv(x=hicp.own.all, cols="time")
hicp.own.all[, "chain_laspey" := chain(x=laspeyres, t=time, by=12), by="grp"]
hicp.own.all[, "chain_laspey_15" := rebase(x=chain_laspey, t=time, t.ref="2015"), by="grp"]
# compare all-items HICP from direct and step-wise aggregation:
agg.comp <- merge(x=hicp.own.all[grp=="00", list(time, "index_stpwse"=chain_laspey_15)],
y=hicp.own[, list(time, "index_direct"=chain_laspey_15)],
by="time")
# no differences -> consistent in aggregation:
nrow(agg.comp[abs(index_stpwse-index_direct)>1e-4,])
```
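Consistency in aggregation can also be verified by hand on a toy example. The sketch below uses made-up unchained indices and weights for three lowest-level codes and compares the direct weighted average with a two-step aggregation via the division `01`. It uses base R only and does not call `aggregate()` or `laspeyres()`.

```{r}
# Sketch (base R only) with made-up unchained indices and weights.
un <- c("011" = 102, "012" = 98, "02" = 101)
wt <- c("011" = 350, "012" = 250, "02" = 400)

# direct: all lowest-level codes in one step
direct <- sum(wt * un) / sum(wt)

# stepwise: first the division 01, then 01 together with 02
w01 <- sum(wt[c("011", "012")])
i01 <- sum(wt[c("011", "012")] * un[c("011", "012")]) / w01
stepwise <- (w01 * i01 + wt[["02"]] * un[["02"]]) / (w01 + wt[["02"]])

all.equal(direct, stepwise)
```

Both routes give the same all-items value because the weight of the intermediate aggregate `01` is the sum of the weights of its components.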
## Rates of change and contributions
The indices show the price change between a comparison period and the index reference period. However, data users are more often interested in monthly and annual rates of change. Monthly change rates are computed by dividing the index in the current period by the index one month before, while annual change rates are derived by comparing the index in the current month to the index in the same month one year before. Both can be easily derived using the function `rates()`. Contributions of the price changes of individual items to the annual rate of change can be computed as Ribe or Kirchner contributions, both of which are implemented in the function `contrib()`.
```{r warning=FALSE, fig.width=7, fig.align="center"}
# compute annual rates of change for all indices:
hicp.data[, "ar" := rates(x=index, t=time, type="annual"), by=c("geo","coicop")]
# add all-items hicp:
hicp.data <- merge(x=hicp.data,
y=hicp.data[coicop=="00", list(geo,time,index,weight)],
by=c("geo","time"), all.x=TRUE, suffixes=c("","_all"))
# ribe decomposition:
hicp.data[, "ribe" := contrib(x=index, w=weight, t=time, x.all=index_all, w.all=weight_all), by="coicop"]
# annual change rates over time:
plot(ar~time, data=hicp.data[coicop=="00",],
type="l", xlab="Time", ylab="", ylim=c(-2,12))
lines(ribe~time, data=hicp.data[coicop=="01"], col="red")
title("Contributions of food to overall inflation")
legend("topleft", col=c("black","red"), lty=1, bty="n",
       legend=c("Overall inflation (in %)", "Contributions of food (in percentage points)"))
```
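The definitions of monthly and annual rates of change given in this section can be reproduced in base R on a made-up index series. The sketch assumes a regular monthly series without gaps and relies on the position of the values only, whereas `rates()` takes the time variable `t` as an argument.

```{r}
# Sketch (base R only): 14 made-up monthly index values, oldest first.
idx_m <- c(100.0, 100.2, 100.5, 100.4, 100.9, 101.0, 101.3,
           101.1, 101.6, 101.8, 102.0, 102.4, 103.0, 103.2)
n <- length(idx_m)

# monthly rate: current index relative to the index one month before
mr <- c(NA, (idx_m[-1] / idx_m[-n] - 1) * 100)

# annual rate: current index relative to the same month one year before
ar12 <- c(rep(NA, 12), (idx_m[13:n] / idx_m[1:(n - 12)] - 1) * 100)

round(tail(ar12, 2), 2) # 3.00 2.99
```

The first twelve annual rates are missing because no index value from one year earlier is available for these months.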
# References