When forecasting inflation, economists have explored a wide array of both simple and complicated models. One of the most well-known macroeconomic models for forecasting inflation is the unemployment rate Phillips curve. This analysis details models that employ a Phillips curve specification using different economic variables. Additionally, we combine the models together to test an ensemble approach. Then, the results of the individual models are compared to the ensemble model, an average of the four estimates. Findings showed that of the five estimated models, the Cleveland Fed’s expected rate of inflation was the most reliable metric for forecasting year ahead inflation.
Introduction
Inflation, or the rate of increase in prices over time, is a closely monitored economic variable. The ability to forecast changes in prices with any reliability is an exceedingly useful tool. Conventional macroeconomics teaches the Phillips curve, a model that illustrates an inverse relationship between the unemployment rate and the rate of inflation. This analysis seeks to answer two key questions:
Which economic variables serve as the best predictors for the rate of inflation using a Phillips curve specification?
Does an ensemble approach using the average of the models produce better forecasts than any individual model?
This analysis uses a time-series tools from Forecasting: Principles and Practice written by Rob J Hyndman and George Athanasopoulos.[1]
Data
Code
# Vector of variables to be gathered:vars <-c("PCEPI","UNRATE","EXPINF1YR","MICH","INDPRO")# Use tidyquant function tq_get to gather economic data from FRED and format as a tsibblefred_data <-tq_get(vars,get ="economic.data",from ="1982-01-01",to ="2022-04-01") %>%mutate(Month =yearmonth(date), value = price) %>%select(-c(date, price)) %>%as_tsibble(index = Month, key = symbol)# Pivot the data to be in a more conventional wide formatfred_dataw <- fred_data %>%pivot_wider(names_from = symbol, values_from = value) %>%as_tsibble()# Transform variablestdata <- fred_dataw %>%select(c(PCEPI, UNRATE, EXPINF1YR, MICH, INDPRO)) %>%# transformed inflationmutate(infl =1200*log(PCEPI/lag(PCEPI))) %>%# differenced inflationmutate(dinfl = infl -lag(infl,1)) %>%# differenced inflation 12mutate(dinfl12 =100*log(PCEPI/lag(PCEPI,12)) -lag(infl,12)) %>%# differenced unratemutate(unrate = UNRATE -lag(UNRATE)) %>%# differenced expected infmutate(expinf1yr = EXPINF1YR -lag(EXPINF1YR)) %>%# differenced michmutate(mich = MICH -lag(MICH)) %>%# transformed indpromutate(indpro =1200*log(INDPRO/lag(INDPRO))) %>%# keep only transformed variablesselect(-c(PCEPI, UNRATE, EXPINF1YR, MICH, INDPRO)) %>%drop_na()# Split the data into training and testing groupstrain_data <- tdata %>%filter_index(~"2018-12")test_data <- tdata %>%filter_index("2019-01"~ .)# Compare ACF plots before and after transformation# fred_dataw %>% ACF(MICH) %>%# autoplot()# tdata %>% ACF(mich) %>%# autoplot()
The data used in this analysis was gathered from FRED, the Federal Reserve’s Economic Database. The time series spans from January 1982 to April 2022.
Variables
Refer to the table for a list of the variables used in the analysis and their attributes.
Source: Federal Reserve Economic Data (FRED)
Variables
Description
Units
PCEPI
Personal Consumption Expenditures: Chain-type Price Index
Index 2012=100, Seasonally Adjusted
UNRATE
Unemployment Rate
Percent, Seasonally Adjusted
EXPINF1YR
1-Year Expected Inflation
Percent, Not Seasonally Adjusted
MICH
University of Michigan: Inflation Expectation
Percent, Not Seasonally Adjusted
INDPRO
Industrial Production: Total Index
Index 2017=100, Seasonally Adjusted
Visualize
See below for the raw time series for each variable pulled from FRED.
Code
# Use facet grid to visualize raw data for variablesfred_data %>%ggplot(aes(x = Month, y = value)) +geom_line() +facet_grid(vars(symbol), scales ="free_y") +labs(y =" ") +theme_bw()
Transformations
As is consistent with standard specification, the variables PCEPI and UNRATE will be used in the initial estimation of the unemployment rate Phillips curve. PCEPI is a seasonally adjusted monthly price index of personal consumption expenditures and will serve as a measure of the percent increase or decrease in prices. Further, UNRATE is the seasonally adjusted monthly unemployment rate. In addition to the two conventional variables for forecasting inflation, three other variables were gathered as potential predictors of the change in prices in the economy. These include a calculation of the monthly expected inflation rate by the Federal Reserve Bank of Cleveland (EXPINF1YR), a calculation of median monthly expected price changes by the University of Michigan (MICH), and a monthly index of real production for the manufacturing, mining, and electric and gas industries in the United States (INDPRO). These three variables, like unemployment rate, should be predictors of inflation and will each be modeled with the same specifications.
Before any analysis or forecasting, the data must first be transformed into a stationary time-series. A stationary time series is defined as one whose properties are constant over the entire observed length of time. Stationary time series have constant means and variance and exhibit no clear trend or seasonality. PCEPI required a log transformation and differencing to stabilize the mean and variance, and the resulting data for this variable was converted to monthly percent change. UNRATE, EXPINF1YR, and MICH also required differencing to make the time series stationary and reduce the trend and seasonality present in the data. Lastly, a log transformation was used to convert INDPRO into monthly percentage and to stabilize the variance of the time series.
Checks on the plots and autocorrelation functions confirmed that the raw data was stationary and appeared to mostly distributed as white noise following the transformations. Some extreme points due to the COVID-19 pandemic are persistent in both unemployment rate and industrial production.
Code
#Observe the plots of the transformed variablestdatam <-melt(tdata, "Month")ggplot(tdatam, aes(Month, value)) +geom_line() +facet_wrap(~variable, scales ="free", ncol =2) +theme_bw()
The first model is the unemployment rate Phillips curve, the baseline model for the analysis. According to economic theory, this model is stable and robust, so it will be used as a means of comparison when estimating models with the other variables of interest. The following three models will use the same specifications as the standard Phillips curve, instead replacing the unemployment rate with calculated inflation expectations or measures of industrial production. This paper uses the Stock and Watson (1999) specification of the Phillips curve below for all the estimated models.[2]
This specification allows for extrapolation of the steady state levels of inflation (\(\bar{\pi}\)) and unemployment rate (\(\bar{u}\)). In the long-run steady state, the equilibrium between inflation and unemployment rate can be used to find a fixed level of \(\bar{u}\), the natural rate of unemployment. This is convenient and shows the empirical applications of the Phillips curve. Note that instead of testing for the optimal number of lags, the lags of the models were specified to be consistent with economic theory.
In addition to the four models outlined above, an ensemble model was estimated using the average of the four Phillips curve models. The intuition behind an amalgamation approach is that the ensemble model should have lower variance and be a better predictor than any of the individual models. The ensemble model specification is outlined below.
\[
me=(m1+m2+m3+m4)/4
\]
Code
# Create the forecast using the combination modelforecast <- fit_combo %>%forecast(new_data = test_data)# Assess in-sample forecast accuracyinsample <-accuracy(fit_combo)# Assess out-of-sample forecast accuracyoutsample <-accuracy(forecast, tdata)
Estimation and Results
Before estimating, the data was split into train and test datasets. The models were estimated over the training dataset, which encompasses data up to December 2018. Thus, the testing dataset consists of data from January 2019 onward and will be used as the main assessment of the forecast accuracy. The plot of the one year forecast below shows the performance of the models both in-sample and out-of-sample.
Code
#Plot the estimated models#Plot the estimated modelsforecast %>%autoplot(filter(tdata, year(Month) >2016), level =c(95)) +labs(x ="Month", y ="Percent Inflation") +ggtitle("Predicted Monthly Inflation Rate Using 5 Models") +theme_bw()
The mean absolute percentage error (MAPE) will be used to compare the estimated models and their forecast accuracy. MAPE is the most widely used measure of forecast accuracy and is the average percentage error of the difference between actual and forecasted values. The mean absolute percent error is calculated using the formula below:
\[
p_t=100e_t/y_t
\]
\[
MAPE = mean(\left| p_t \right|)
\]
Code
# Create a table to display in-sample accuracy between modelsinsample %>%select(c(".model", ".type", "MAPE")) %>%arrange(MAPE) %>%kbl(col.names =c("Model", "Type", "MAPE"), align ="ccc") %>%kable_styling(bootstrap_options =c("striped", "hover")) %>%row_spec(1,bold=T,hline_after = T)
Model
Type
MAPE
mINDPRO
Training
174.8688
mUN
Training
216.4131
ensem
Training
246.9985
mEXPINF
Training
269.7534
mMICH
Training
438.5753
After estimating, none of the models fit the data particularly well. As outlined in the table above, the industrial production model had the lowest MAPE over the training dataset. This suggests that in-sample, industrial production was the best predictor of inflation. Interestingly, the ensemble model was outperformed by both the industrial production and unemployment rate models over the training period.
Code
# Create a table to display out-of-sample accuracy between modelsoutsample %>%select(c(".model", ".type", "MAPE")) %>%arrange(MAPE) %>%kbl(col.names =c("Model", "Type", "MAPE"), align ="ccc") %>%kable_styling(bootstrap_options =c("striped", "hover")) %>%row_spec(1,bold=T,hline_after = T)
Model
Type
MAPE
mINDPRO
Test
308.5174
ensem
Test
381.2155
mEXPINF
Test
418.6000
mMICH
Test
434.3837
mUN
Test
503.3144
The table above outlines the MAPE of each model when estimated over the test period. Even out-of-sample, the industrial production model again performed the best. It was followed by the ensemble model. Though the models are similar in fit and specification, the industrial production index seems to be the most reliable metric for forecasting inflation based on these results.
Summary
Overall, using the Phillips curve specification, the industrial production model and ensemble models performed better than the inflation expectations, unemployment rate, and Michigan expectations models. This makes economic sense, as changes in industrial production tend to lead changes in the overall price level. Increases in input costs, capacity utilization, and supply chain are all components of industrial production and can precede changes in prices. The most interesting conclusion from the analysis is that the ensemble model did not perform as well as predicted. This is surprising as amalgamation models are generally more accurate than any individual forecast. Perhaps using a weighted average instead of a simple average when specifying the ensemble model would have been more appropriate.
References
[1] Hyndman, Rob J. “Forecasting: Principles and Practice (3rd Ed).” OTexts. Accessed April 10, 2022. https://otexts.com/fpp3/.