Sunday, May 31, 2015

Alternate Policy Scenario Setting and Forecasting

ISLM Macroeconomic Model Part-V
(Scenario Setting and Forecasting)

(What happens if a contractionary monetary policy is exercised?)


You can find the EViews file (here). You should be familiar with the previous posts: find Part-I (here), Part-II (here), Part-III (here) and Part-IV (here). By now we have performed a dynamic stochastic forecast, assuming that government expenditure follows a seasonal ARMA process and that the money supply is fixed at 970 units.

Now, in this post, I will answer what happens when the scenario changes, say, when the money supply is reduced to 900 units as a contractionary monetary policy.

Here is my video post.




Click ModelA, then click Scenario, then Scenario 1 (if Scenario 1 is not given, click Create New Scenario), then click Overrides.



Inside Overrides, type M and click OK, because we are going to alter the values of the money supply.



Then, to double-check, go to ModelA, then View, then Variables; there you should see the money supply variable colored red. Right-click that m and go to Properties; you will see the overridden exogenous series listed as M_1.


Next, you need to create the override series M_1. In the command window, write series M_1=M.


Now double-click M_1, jump to the 2000Q1 cell, click Edit, and change the values to 900 for all cells up to 2005Q4. We do this for our assumption of a contractionary monetary policy in which the money supply is reduced from 970 units to 900 units.
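For readers who prefer the command window, the same steps can be sketched in EViews commands. The scenario and override procs below are as I recall them from the EViews 9 User's Guide, so treat this as a sketch to verify rather than an exact recipe:

modela.scenario "Scenario 1"   ' make Scenario 1 the active scenario
modela.override m              ' Scenario 1 now reads m from the override series m_1
smpl @all
series m_1 = m                 ' start the override series at the baseline values
smpl 2000q1 2005q4
m_1 = 900                      ' contractionary policy: money supply cut to 900
smpl @all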


Now once again open ModelA, click Solve, and set everything up as follows.



Now, once again open ModelA, click Proc, then Make Graph, and use the following settings. Make sure you tick the comparison of Baseline and Active: Scenario 1, so that the baseline (M fixed at 970) is compared with Scenario 1 (M reduced to 900). Also set the graph sample to 1996Q1 2005Q4, just to make the graph easier to view.




You will then see the following graph, which shows the effect of the two policies, with the money supply fixed at 970 versus 900.


Compared to the baseline scenario (money supply at 970), when the money supply is reduced to 900 units (Scenario 1) the interest rate rises: money is scarcer, and a lower money supply raises the price of money (the interest rate). Due to the higher interest rate, investment falls (the blue line is below the green line), and lower investment means lower absorption (Y) and lower consumption.

At the end, I must say these forecasts are still not quite appropriate for policy implications, because the sharp dip at the start of the out-of-sample period in each graph makes the forecasts questionable. However, there are two ways to address the problem.

Method 1: As we saw in the dynamic forecasting post (here), there was a large gap in investment; to deal with this problem we can include lags in the investment equation.

Method 2: A better alternative would be to modify the variables in the equation so that it can provide some explanation for the sharp rise in investment during the 1990s. An alternative approach is to leave the equation as it is, but to include an add factor in the equation so that we can model the path of the residual by hand (EViews 9, User's Guide II).


In the next post, I will show how we can improve the forecasts by including an add factor in the equation.

References:
EViews 9, User's Guide II.

Saturday, May 30, 2015

Dynamic Stochastic Forecasting

ISLM Macroeconomic Model Part-IV 

(Dynamic Stochastic Forecasting)


You can find the EViews file (here). You should be familiar with the previous posts: find Part-I (here), Part-II (here) and Part-III (here). Here is the video post.



As we found, both the static and dynamic forecasts fitted the general trend of the data. Let's now perform a dynamic stochastic forecast for the out-of-sample period from 2000Q1 to 2005Q4. First, open your main workfile, double-click Range at the top left corner, write 2005Q4 as the End date, and click OK. Another window will open for confirmation; click OK.



Now once again, open the main workfile, double-click Sample, and write 1947Q1 1999Q4 as the sample range pairs. Then click OK.



Let's recall once again that consumption, investment, the interest rate and absorption are the endogenous variables, while government expenditure and money supply are exogenous. Hence, before forecasting the endogenous variables we need values for the exogenous variables over the forecast period: government expenditure (which we will forecast) and money supply (which we will hold fixed).

Let's click Quick, Estimate Equation, paste d(g) c ar(1) ar(2) ar(3) sma(4) sma(8), and click OK (I have shown in the video how and why I estimated this equation). This will give you the estimates. Save the equation as eq04G. I ran Automatic ARIMA Forecasting in EViews and found the above specification to be the best model.




Now click eq04G, then the Forecast tab, then change the forecast sample to 2000Q1 2005Q4, because we want to forecast government expenditure from 2000 through 2005. The forecast values will be saved in the gf series in the main workfile.



Now open GF, select the forecast values from 2000Q1 to 2005Q4 and copy them (Ctrl+C), then open G, select the 2000Q1 cell, and paste (Ctrl+V).

Now, let's suppose the money supply stays at its last observed value, i.e. the value at 1999Q4 (we are not forecasting the money supply here). So double-click m, click Edit, select the 1999Q4 value and copy it (Ctrl+C), then select the remaining cells from 2000Q1 to 2005Q4 and paste (Ctrl+V).
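For reference, the exogenous-variable preparation above can be sketched in EViews commands. The recursive assignment m = m(-1) fills each quarter with the previous value, carrying 1999Q4 forward; the forecast proc is assumed to be set to forecast the level g rather than d(g), as in the dialog, so verify this step in your version:

smpl 1947q1 1999q4
equation eq04g.ls d(g) c ar(1) ar(2) ar(3) sma(4) sma(8)   ' seasonal ARMA for d(g)
smpl 2000q1 2005q4
eq04g.forecast gf    ' dynamic forecast of g, saved as gf
g = gf               ' splice the forecasts into g
m = m(-1)            ' hold money supply at its 1999Q4 value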





Now we have values for government expenditure and money supply; let's forecast. Click ModelA, then click Solve, select Stochastic, then Dynamic solution, tick Std. Dev., and change the solution sample to 2000Q1 2005Q4. This will solve the model and generate the 95% CI.
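The same solve can be issued from the command window (a sketch; the solveopt option codes are as I recall them from the EViews 9 User's Guide, so verify them in your version):

smpl 2000q1 2005q4
modela.solveopt(s=s, d=d)   ' stochastic, dynamic; collect means and std. devs.
modela.solve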




Now, once again open ModelA, click Proc, then select Make Graph, select the Endogenous variables, select Mean +/- 2 Std. Dev., tick Actuals, and set the graph sample to 2000Q1 2005Q4. This will generate the dynamic solution from the first quarter of 2000 through the last quarter of 2005 along with the 95% upper and lower confidence bounds.



This will generate the following Graph.



The actual values are shown in blue and the forecast values in green, along with the 95% CI. We can infer that if government spending follows the specified trend and the money supply is kept fixed at 970 units, then the possible dynamic future path, with its stochastic bounds, is as shown in the graph above.


In the next post, I will show how to build alternative scenarios by altering the baseline, and the effect of a policy intervention such as a contractionary monetary policy.

Wednesday, May 27, 2015

Dynamic Forecasting of ISLM Macroeconomic Model

ISLM Macroeconomic Model Part-III
(Dynamic Forecasting of ISLM Macroeconomic Model)

You can find the EViews file (here). You should be familiar with the previous posts: find Part-I (here) and Part-II (here).

Dynamic forecasting examines how the model performs when used to forecast many periods into the future. In this post, let's answer how the model would have performed if it had been used back in 1985 to forecast ahead to 1999.

Open ModelA, then click Solve. A new window will open; choose Deterministic, Dynamic solution, write 1985q1 1999q4 as the solution sample, and click OK.
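As a command-line sketch (again assuming the solveopt option codes from the EViews 9 User's Guide: s=d for deterministic, d=d for dynamic):

smpl 1985q1 1999q4
modela.solveopt(s=d, d=d)   ' deterministic, dynamic solution
modela.solve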

Now click Proc in the ModelA window, then click Make Graph. A new window will open; choose Endogenous variables, tick Actuals, set the graph sample from 1985q1 to 1999q4, then click OK.

This will give you graphs of all the endogenous variables.




The graphs show that if we had used the given model to forecast from 1985 to 1999, the forecasts would be the green lines, while the actuals are in blue. All the series follow the general trend.

Here is the Video Post.




From the static and dynamic solutions, we can see that the forecast values were close to the actual values and followed the general trend. Therefore, we can conclude that the model can be used to forecast out of sample. In the next post, I will forecast up to 2005Q4.

References:
EViews 9, User's Guide II.

Static Forecasting

ISLM Macroeconomic Model Part-II

(Static Forecasting of ISLM Macroeconomic Model)

You can find the EViews file (here). You should be familiar with the previous post; find Part-I (here).

Let's recall the ISLM equations.

C(t) = b1 + b2*Y(t) + b3*C(t-1)                                                          (i)
I(t) = b4 + b5*(Y(t-1) - Y(t-2)) + b6*Y(t) + b7*r(t-4)                                   (ii)
r(t) = b8 + b9*Y(t) + b10*(Y(t) - Y(t-1)) + b11*(m(t) - m(t-1)) + b12*(r(t-1) + r(t-2))  (iii)
Y(t) = C(t) + I(t) + G(t)                                                                (iv)

Based on the data set of the US economy from 1947Q1 to 1999Q4, we estimated equations (i), (ii) and (iii), while equation (iv) is just an accounting identity. Static forecasting examines the ability of the model to provide one-period-ahead forecasts of the endogenous variables. Here is the video post.




In EViews, we can build a model object. Just right-click anywhere in the workfile, select New Object, then Model, assign the name ModelA, and click OK.


Now double-click ModelA, then drag and drop eq01, eq02 and eq03 into the ModelA window.


Now, in ModelA, right-click anywhere and select Insert.

Now write the identity equation (iv), i.e. y = cn + i + g, there and click OK.
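The same model can also be assembled from the command window (a sketch, assuming the equation objects eq01, eq02 and eq03 estimated in Part-I):

model modela                   ' create the model object
modela.merge eq01              ' link the consumption equation
modela.merge eq02              ' link the investment equation
modela.merge eq03              ' link the interest rate equation
modela.append y = cn + i + g   ' add identity (iv)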

Now select the Solve tab. It will open a new window; select Deterministic, Static solution, write 1960q1 1999q4 as the solution sample, and click OK. The reason for this sample is that the money supply variable is only available from 1959q1, and we are allowing extra periods for the lagged variables. EViews will take the actual data for EQ01, EQ02, EQ03 and the identity equation and forecast one period ahead.
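A command-line sketch of the static solve (solveopt option codes assumed as in the EViews 9 User's Guide: s=d deterministic, d=s static):

smpl 1960q1 1999q4
modela.solveopt(s=d, d=s)   ' deterministic, static (one-step-ahead) solution
modela.solve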

Now, in the ModelA window, click View, then Ctrl+click to select the variables colored in blue (the endogenous variables CN, I, R and Y), then click the Proc tab and select Make Graph.

In the new window, you just need to tick Actuals and click OK. This will produce the static-forecast graphs. The graphs show the in-sample one-step-ahead predictions against the actual values: the blue line shows the actuals and the green line shows the baseline, i.e. the static forecasts. Visually, the model performs well one step ahead and all the variables follow the general trend; however, for investment, especially after 1985, the baseline fails to track the actuals closely.

References:
EViews 9, User's Guide II.

Formulation and Estimation of ISLM Macroeconomic Equations

 ISLM Macroeconomic Model Part-I
(Formulation and Estimation of Equations for Simple ISLM Macroeconomic Model)


Understanding the Theory of Model



For this post, we will consider a data set for the US economy from 1947Q1 to 1999Q4. You can download the EViews file (here). The data for the money supply are available from 1959Q1 to 1999Q4.


Let's consider the national accounting identity: GDP (GDP(t)) is the sum of consumption (C(t)), investment (I(t)), government expenditure (G(t)) and net exports (NX(t)).

GDP(t) = C(t) + I(t) + G(t) + NX(t)

Let's suppose C(t) is a linear function of its own one-period lag (C(t-1)) and Y(t); then we can write:

C(t) = b1 + b2*Y(t) + b3*C(t-1)          (i)

Let's suppose I(t) is a linear function of the lagged change in absorption (Y(t-1) - Y(t-2)), current absorption Y(t), and the interest rate four quarters earlier (r(t-4)). This equation can be represented as:

I(t) = b4 + b5*(Y(t-1) - Y(t-2)) + b6*Y(t) + b7*r(t-4)          (ii)

Again, we assume the interest rate (r(t)) is a linear function of the level of absorption Y(t), the change in absorption (Y(t) - Y(t-1)), the change in money supply (m(t) - m(t-1)), and the sum of the two previous interest rates (r(t-1) + r(t-2)).

r(t) = b8 + b9*Y(t) + b10*(Y(t) - Y(t-1)) + b11*(m(t) - m(t-1)) + b12*(r(t-1) + r(t-2))          (iii)

The sum of C(t), I(t) and G(t) is absorption, denoted Y(t):

Y(t) = C(t) + I(t) + G(t)          (iv)

so that GDP(t) = Y(t) + NX(t).

Examining the above equations, consumption, investment, interest rate and absorption are the endogenous variables, while government expenditure and money supply are the exogenous variables.

Estimating Above Equation in Eviews


To see the results, check the makingEQNs page of the EViews file. You can implement the following specifications in EViews by clicking Quick, then Estimate Equation. The results are given in Table-1 to Table-3.

cn c y cn(-1)
i c (y(-1)-y(-2)) y r(-4)
r c y (y-y(-1)) (m-m(-1)) (r(-1)+r(-2))
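Equivalently, the three equations can be estimated and saved as named objects directly from the command window; the names eq01 to eq03 below are the objects used when the model is built in Part-II:

equation eq01.ls cn c y cn(-1)
equation eq02.ls i c (y(-1)-y(-2)) y r(-4)
equation eq03.ls r c y (y-y(-1)) (m-m(-1)) (r(-1)+r(-2))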

Table-1: Estimate of Equation-i

Table-2: Estimate of Equation-ii

Table-3: Estimate of Equation-iii

In Table-3, while estimating the interest rate equation the sample is adjusted to 1959Q2 to 1999Q4, since money supply data begin in 1959Q1.

Here is the Video Post.



In the next post, I will show how to develop the model, link the equations and include the GDP identity. Then I will perform the static and dynamic solutions of the model. Later, I will perform dynamic stochastic forecasts.

References:
EViews 9, User's Guide II.

Tuesday, May 26, 2015

Seasonal Adjustment

Seasonality Part-III (Seasonal Adjustment)


Once again, I will use the time series datasets made available by Prof. Rob Hyndman (here), and you can download the code that I have used for this post (here). I will use Avril Coghlan's PDF booklet, “A Little Book of R For Time Series”; you can download the booklet (here).

Before we jump into the topic, you must be familiar with additive and multiplicative seasonality; check the previous posts (here for Part-I and Part-II). Let's first understand the concept of seasonal adjustment in the additive and multiplicative models.

Any time series Y(t) can be decomposed into three components: the trend (long term direction), the seasonal (systematic, calendar related movements) and the irregular (unsystematic, short term fluctuations). Lets denote the trend by T(t), seasonal by S(t) and irregular component by e(t).

Seasonal Adjustment for Additive Model


For an additive model the decomposition can be expressed as:

Y(t) = T(t) + S(t) + e(t)          (i)

Once we remove the seasonal component from the original series, we get the seasonally adjusted series. Therefore, the seasonally adjusted series for an additive model is

SA(t) = Y(t) - S(t), i.e. SA(t) = T(t) + e(t)          (ii)

Seasonal Adjustment for Multiplicative Model

For a multiplicative model the decomposition can be expressed as:

Y(t) = T(t) * S(t) * e(t)          (iii)

The seasonally adjusted series for a multiplicative model is

SA(t) = Y(t) / S(t), i.e. SA(t) = T(t) * e(t)          (iv)


Here is the seasonally adjusted series for the additive model, birthstimeseries.

setwd("D:/RTutorial")
library(TTR)
library(forecast)

par(mfrow=c(1,2))

#Additive Seasonality
births = scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
birthstimeseries = ts(births, frequency=12, start=c(1946,1))
plot.ts(birthstimeseries, main="Plot of birthstimeseries")

birthstimeseriescomponents = decompose(birthstimeseries, type="additive")
#plot(birthstimeseriescomponents)
birthstimeseriesseasonallyadjusted = birthstimeseries - birthstimeseriescomponents$seasonal
plot(birthstimeseriesseasonallyadjusted)

Here is the seasonally adjusted series for the multiplicative model, souvenirtimeseries.

#Multiplicative Seasonality
souvenir = scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
souvenirtimeseries = ts(souvenir, frequency=12, start=c(1987,1))
plot.ts(souvenirtimeseries, main="Plot of Sales at Souvenir Shop")


souvenirtimeseriescomponents = decompose(souvenirtimeseries, type="multiplicative")
#plot(souvenirtimeseriescomponents)
souvenirtimeseriesseasonallyadjusted = souvenirtimeseries / souvenirtimeseriescomponents$seasonal  # divide, per equation (iv)
plot(souvenirtimeseriesseasonallyadjusted)

par(mfrow=c(1,1))

Interestingly, with the seasonality removed, the two series still look like additive and multiplicative models respectively, only now without the seasonal effects.
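A quick sanity check on the adjustment: plotting the month-by-month subseries of the adjusted data with base R's monthplot should show roughly flat within-month means once the seasonal component is gone.

# the horizontal mean lines should now be roughly level across months
monthplot(birthstimeseriesseasonallyadjusted, main="Adjusted births by month")
monthplot(souvenirtimeseriesseasonallyadjusted, main="Adjusted sales by month")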

Visual Detection of Additive and Multiplicative Seasonality

Seasonality Part-II
(Visual Detection of Additive & Multiplicative Seasonality - Part-II)

Once again, I will use the time series datasets made available by Prof. Rob Hyndman (here), and you can download the code that I have used for this post (here). I will use Avril Coghlan's PDF booklet, “A Little Book of R For Time Series”; you can download the booklet (here).

I will focus on visual detection of multiplicative and additive seasonality based upon the decomposition of time series. In this post, I will try to shed some light on:

1. Visual detection of seasonal and random component for both additive and multiplicative times series.

2. Further, I will answer, what happens to seasonal and random component of additive model while wrongly specified as multiplicative one and vice versa.

First, let's set up the working directory and load the TTR package.

setwd("D:/RTutorial")
library(TTR)  

Once again, load the datasets, convert them into proper time series objects, and plot them in the same panel.

par(mfrow=c(1,2))
births = scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
birthstimeseries = ts(births, frequency=12, start=c(1946,1))
plot.ts(birthstimeseries)

souvenir = scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
souvenirtimeseries = ts(souvenir, frequency=12, start=c(1987,1))
plot.ts(souvenirtimeseries)
par(mfrow=c(1,1))


Figure-1: Plot of Both Time Series

We can now perform a basic visual check for additive versus multiplicative seasonality. Let's move ahead and perform a time series decomposition. Any time series Y(t) can be decomposed into three components: the trend (long-term direction), the seasonal (systematic, calendar-related movements) and the irregular (unsystematic, short-term fluctuations). Let's denote the trend by T(t), the seasonal by S(t) and the irregular component by e(t).

For an additive model the decomposition can be expressed as:

Y(t) = T(t) + S(t) + e(t)          (i)

For a multiplicative model the decomposition can be expressed as:

Y(t) = T(t) * S(t) * e(t)          (ii)

The trend component is extracted by a moving average. Then, the seasonal figure is computed by averaging, for each time unit, over all periods. The seasonal figure is then centered. Finally, the error component is determined by removing trend and seasonal figure (recycled as needed) from the original time series.
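To make that recipe concrete, here is a rough sketch of the same steps in R for the additive birthstimeseries (simplified: decompose() handles end effects and the exact centering internally, so the numbers will differ slightly):

# 1. trend: centered 2x12 moving average
trend <- stats::filter(birthstimeseries, filter = c(0.5, rep(1, 11), 0.5) / 12)
# 2. seasonal figure: average the detrended series by calendar month, then center it
detrended <- birthstimeseries - trend
seasonal <- tapply(detrended, cycle(birthstimeseries), mean, na.rm = TRUE)
seasonal <- seasonal - mean(seasonal)
# 3. irregular: remove trend and the (recycled) seasonal figure from the original
irregular <- birthstimeseries - trend - seasonal[cycle(birthstimeseries)]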

Now, let's see how a multiplicative model can be turned into an additive one by taking logs. Consider equation (ii) and take the log of both sides:

Ln[Y(t)] = Ln[T(t) * S(t) * e(t)]

or, Ln[Y(t)] = Ln[T(t)] + Ln[S(t)] + Ln[e(t)]          (iii)    [because Ln(a*b) = Ln(a) + Ln(b)]

Therefore, taking the log converts a multiplicative model into an additive one. Let's take the log of souvenirtimeseries (a multiplicative model) and plot it to check whether it becomes additive.

par(mfrow=c(1,2))
plot.ts(souvenirtimeseries, main="Plot of Sales at Souvenir Shop")
plot.ts(log(souvenirtimeseries), main="Plot of Log conversion")
par(mfrow=c(1,1))

Figure-2: Comparison of Level and Log Conversion of souvenirtimeseries

Now, let's consider birthstimeseries, which is an additive model; let's decompose it and analyze the decomposition plot.

birthstimeseriescomponents = decompose(birthstimeseries, type="additive")
plot(birthstimeseriescomponents)

Figure-3: The Decomposition of Additive Time Series

Here I want to clarify graphically that in an additive model, the amplitude of both the seasonal and irregular variations does not vary as the level of the trend rises or falls. The point becomes much clearer if we decompose the multiplicative series with an additive decomposition: then both the seasonal and irregular variations do vary with the level of the trend. So here I answer what happens if we wrongly specify a multiplicative model as an additive one, and we expect to see the seasonal and irregular variations vary as the level of the trend rises or falls.

false.souvenirtimeseriescomponents = decompose(souvenirtimeseries, type="additive")
plot(false.souvenirtimeseriescomponents)


Figure-4: The Decomposition of Multiplicative Model as an Additive Model



Compare the seasonal and irregular (random) components in Figure-3 and Figure-4. The irregular component is genuinely irregular when the model is correctly specified in Figure-3, while in Figure-4 the irregular component is not irregular, or random, at all. This shows that in Figure-4 we have wrongly treated a multiplicative model as an additive one. So, by plotting the decomposition, we can also determine whether a series Y(t) is additive or multiplicative.

Now, since we know souvenirtimeseries is multiplicative in nature, let's decompose it as a multiplicative model.

souvenirtimeseriescomponents = decompose(souvenirtimeseries, type="multiplicative")
plot(souvenirtimeseriescomponents)

Figure-5: The Decomposition of Multiplicative Time Series

Now, in Figure-5, where we correctly decompose the multiplicative model as multiplicative, the irregular and seasonal components do not vary with the level of the trend.

We have answered what happens when a multiplicative model is specified as additive (Figure-4). Now let's answer what happens when an additive model is falsely specified as multiplicative.

false.birthstimeseriescomponents = decompose(birthstimeseries, type="multiplicative")
plot(false.birthstimeseriescomponents)

Figure-6: The Decomposition of an Additive Model as a Multiplicative Model

Interestingly, comparing Figure-6 with Figure-3, neither the seasonal nor the random component varies much when an additive model is wrongly specified as multiplicative.

Finally, let's decompose the multiplicative model as an additive one by taking logs. The commands are given below.

log.souvenirtimeseriescomponents = decompose(log(souvenirtimeseries), type="additive")
plot(log.souvenirtimeseriescomponents)

Figure-7: The Decomposition of a Multiplicative Model as an Additive Model by Taking Log

In the next post, I will show how to generate seasonally adjusted data for both multiplicative and additive models.

Visual Detection of Additive and Multiplicative Seasonality

Seasonality Part-I
(Visual Detection of Additive & Multiplicative Seasonality - Part-I)


Today there is no original contribution from my side. I will use the time series datasets made available by Prof. Rob Hyndman (here) and Avril Coghlan's PDF booklet, “A Little Book of R For Time Series”. You can download the booklet (here) and the code that I have used for this post (here).

I will mainly focus on multiplicative and additive seasonality, and on visual detection rather than statistical definitions and formulas. The motive of this post is to develop intuition only. In the next post, I will show the decomposition of time series based on the code given by Coghlan (2015). Later, I will focus on the formulas and their meaning.

In an additive model, the amplitude of both the seasonal and irregular variations does not vary as the level of the trend rises or falls (see Figure-1). In a multiplicative model, the amplitude of both the seasonal and irregular variations increases as the level of the trend rises (see Figure-2).

Setting up Working Directory and Loading Required Packages.

At first, set your working directory to the RTutorial folder on the D drive, as we have done previously, by implementing the following command.

setwd("D:/RTutorial")

Now, let's install the required packages with the following commands (you can skip this step if you already have the forecast and TTR packages).

install.packages("forecast")
install.packages("TTR")

Now, load the required packages with following commands.

library(forecast)
library(TTR)

Additive Seasonality


Now let’s load the required data from http://robjhyndman.com/tsdldata/data/nybirths.dat

births = scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")

This data set is the number of births per month in New York City from January 1946 to December 1959 (originally collected by Newton). Because the data are monthly, we set frequency=12 (for quarterly data, frequency=4), and start=c(1946,1), where 1 represents the first month, January. We can read the data into R and store it as a time series object by typing:

birthstimeseries = ts(births, frequency=12, start=c(1946,1))

Lets plot the dataset birthstimeseries using following command.

plot.ts(birthstimeseries)

Figure-1: Plot of Births per Month in New York City from Jan 1946 to Dec 1959



This time series visually looks like an additive model because the “seasonal fluctuations are roughly constant in size over time and do not seem to depend on the level of the time series, and the random fluctuations also seem to be roughly constant in size over time” (Coghlan, 2015).

Multiplicative Seasonality 


Let's load the file from http://robjhyndman.com/tsdldata/data/fancy.dat . This dataset contains monthly sales for a souvenir shop at a beach resort town in Queensland, Australia, for January 1987 to December 1993 (original data from Wheelwright and Hyndman, 1998). We can read the data into R by typing:

souvenir = scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
souvenirtimeseries = ts(souvenir, frequency=12, start=c(1987,1))
souvenirtimeseries
plot.ts(souvenirtimeseries)


Figure-2: Plot of Monthly Sales at Souvenir Shop in Level Data

This plot shows that the “seasonal fluctuations and random fluctuations seem to increase with the level of the time series” (Coghlan, 2015). I would say there is heteroskedasticity, and one way to deal with it is to transform the data. A log transformation can help; let's see.

plot.ts(log(souvenirtimeseries))


Figure-3: Plot of Monthly Sales at Souvenir Shop in Log Conversion of Data

Now, after the log transformation, the series appears to have additive seasonality.


Season Plot

Another interesting plot is the season plot. Instead of plotting the data as a time series, we can stack the data by season. birthstimeseries and souvenirtimeseries are both monthly, so here I have stacked the data using the seasonplot command from the forecast package.

seasonplot(birthstimeseries, col=rainbow(12), year.labels=TRUE, lwd=2)
seasonplot(souvenirtimeseries, col=rainbow(12), year.labels=TRUE, lwd=2)
seasonplot(log(souvenirtimeseries), col=rainbow(12), year.labels=TRUE, lwd=2)


I will only interpret the season plot of birthstimeseries.

Figure-4: Season Plot of Births per Month in New York City from Jan 1946 to Dec 1959

Here, in Figure-4, we can see that births peak from July to October and trough in February in almost all years.

Here is my video post.



References:
Coghlan, A. (2015). A Little Book of R For Time Series. http://doi.org/10.1080/0963828031000137072




Sunday, May 24, 2015

Exploiting Statistical Properties Rather than Economic Theory

ARIMA Forecasting Part-VII
(Exploiting Statistical Properties Rather than Economic Theory)


In the previous posts, we dealt with forecasting; see (here) and (here).

In Figure-1, let me show you the point forecasts from ARIMA(2,2,1) (blue line) along with their 80% confidence interval (CI, light blue), their 95% CI (light grey), and the actual values (dashed line).

Figure-1: Actual and Forecast Values from 2009:2013

Recall from previous posts that the ARIMA(2,2,1) model: is well behaved in terms of residual diagnostics; has in-sample forecast ability (MAPE below 10% in-sample); keeps the actual values within the 95% CI of the forecasts; and has good out-of-sample performance (MAPE below 10% out-of-sample).

However, there is still a gap between the forecast and actual values (the out-of-sample gap, or error). In two videos here, I will show how we can exploit the statistical properties of the time series to correct such a gap.




The forecast values from ARIMA(2,2,1) follow the trend. But the government continuously intervenes with policy to boost growth, and many external factors can affect growth. We saw a positive gap between the actual and forecast values, which suggests that government policy and external factors had a positive effect; hence the actual values rose in the subsequent periods.

Here, in the next video, I will show which CI of the forecast values best represents actual growth over the period 2009:2013. I will also show how to perform a multi-period forecast (which we have been doing since the earlier posts) and a one-step-ahead forecast. At the end of the video, I forecast the period 2014:2018.
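For concreteness, the two kinds of forecast are typically produced like this with the forecast package (a sketch; trainingdata and testdata are the series split used in the earlier posts):

library(forecast)
fit <- Arima(trainingdata, order = c(2, 2, 1))  # the ARIMA(2,2,1) model
multi <- forecast(fit, h = length(testdata))    # multi-period forecast over the test period
refit <- Arima(testdata, model = fit)           # apply the same coefficients to the new data
onestep <- fitted(refit)                        # one-step-ahead forecasts over the test period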



By now, you have taken in a lot of information, so in the following video I summarize it for you.



In an upcoming post, I will show how to build a seasonal ARIMA model (make sure you watch the video on the theory here), and after that I will cover ARIMAX/SARIMAX models, in which we follow economic theory rather than exploiting statistical properties.

Quest of Better Models

ARIMA Forecasting Part-VI (Quest of Better Models)


So, which model is the best? How do we assess the quality of the various models? Most often, information criteria play the key role in model selection. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), or Schwarz criterion (also SBC, SBIC), are the most widely used. Find the data (here) and the code (here).

A model always fits better (in terms of R-squared) as we add more parameters. But the information criteria try to answer: does an overfitted model really retain the information (the pattern that the data hold)?

As we add more parameters the model fits better in sample, but it loses the pattern that the data hold, and with it the ability to forecast out of sample. Information criteria deal with this tradeoff between overfitting and information loss by penalizing each extra parameter added to the model.

In our model we used the auto.arima command of the forecast package. It allows selecting the model based on AIC, BIC or AICc (AIC corrected for finite samples).

The best model according to each criterion can be selected with the following commands.

fit11 <- auto.arima(trainingdata, trace=TRUE, test="adf", ic="aic")   # select by AIC
fit12 <- auto.arima(trainingdata, trace=TRUE, test="adf", ic="aicc")  # select by AICc
fit13 <- auto.arima(trainingdata, trace=TRUE, test="adf", ic="bic")   # select by BIC

As per AIC and AICc, ARIMA(2,2,1) is selected as the best model, but as per BIC, ARIMA(0,2,2) is selected. So which is the best model?

But first, let's answer which criterion to choose.

The BIC penalizes each extra parameter more heavily than the AIC, so BIC is more restrictive and, as a general intuition, AIC overfits relative to BIC. With a large sample, allowing overfitting risks information loss, so BIC is preferable; with a small sample, AIC is preferable. The model suggested by BIC is therefore typically simpler than the one suggested by AIC. In our study we should prefer AIC to BIC, as we have only 39 observations in the training data set. Use the length(trainingdata) command to see the number of observations.
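The penalty difference is visible in the standard formulas, where k is the number of estimated parameters, n the sample size and L the maximized likelihood:

AIC  = 2k - 2*Ln(L)
BIC  = k*Ln(n) - 2*Ln(L)
AICc = AIC + 2k(k+1)/(n - k - 1)

Since Ln(n) > 2 once n > 7, BIC charges more per extra parameter than AIC for any realistic sample, which is why it selects the smaller ARIMA(0,2,2) here.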

Let's compare model fit11 with fit13 using summary(fit11) and summary(fit13), first in terms of coefficient significance.

Table-1: Comparison of significance of coefficients.

ARIMA(2,2,1) as per AIC:
                 ar1       ar2       ma1
Coefficient   -0.5273   -0.4655   -0.5143
s.e.           0.1824    0.1653    0.1616
t-value       -2.8909   -2.8161   -3.1826

ARIMA(0,2,2) as per BIC:
                 ma1       ma2
Coefficient   -1.2729    0.5400
s.e.           0.1828    0.1582
t-value       -6.9634    3.4134


In Table-1, the t-values (coefficient values / s.e.) are more significant for the model selected by BIC than for the model selected by AIC. The BIC model is simpler as well.

Table-2: Comparison in terms of in-sample forecast accuracy.

                                  ME    RMSE    MAE    MPE   MAPE   MASE   ACF1
ARIMA(2,2,1) selected by AIC    2.58    7.19   5.36   0.86   1.81   0.38  -0.20
ARIMA(0,2,2) selected by BIC    2.42    7.32   5.65   0.82   1.87   0.40  -0.04


Table-2 shows the in-sample forecast accuracy of both models. In terms of RMSE, MAE, MAPE and MASE, the model selected by AIC is better, while the model selected by BIC looks better in terms of ME, MPE and ACF1. Since RMSE and MAPE are the most widely used measures of forecast accuracy, the model selected by AIC appears better for our dataset.

Table-3: Comparison in terms of normality and autocorrelation of residuals.

ARIMA(2,2,1) selected by AIC:
  jarque.bera.test(fit11$residuals):  Jarque Bera Test, X-squared = 2.8819, df = 2, p-value = 0.2367
  Box.test(fit11$residuals, lag=10, type="Ljung-Box"):  Box-Ljung test, X-squared = 7.8292, df = 10, p-value = 0.6455

ARIMA(0,2,2) selected by BIC:
  jarque.bera.test(fit13$residuals):  Jarque Bera Test, X-squared = 2.4786, df = 2, p-value = 0.2896
  Box.test(fit13$residuals, lag=10, type="Ljung-Box"):  Box-Ljung test, X-squared = 10.9366, df = 10, p-value = 0.3625


Table-3 compares the two models in terms of normality and autocorrelation of the residuals. Both models fail to reject the null hypotheses of residual normality and of no residual autocorrelation. However, the model selected by AIC is more robust to autocorrelation, as its Box-Ljung p-value is higher, while the residuals of the model selected by BIC are closer to normal. In time series work, we prefer non-autocorrelated residuals over normal ones.

Table-4: Comparison in terms of out-of-sample forecast accuracy.

                                   ME    RMSE     MAE    MPE   MAPE   MASE
ARIMA(2,2,1) selected by AIC    23.77   25.47   23.77   3.06   3.06   1.67
ARIMA(0,2,2) selected by BIC    25.86   27.00   25.86   3.34   3.34   1.81



In Table-4, it is clearly visible that the model selected by AIC outperforms the model selected by BIC on every out-of-sample accuracy measure. Therefore, the model selected by AIC is better in all respects here; the reason is the small sample size.
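The accuracy measures in Tables 2 and 4 can be reproduced with forecast::accuracy(), which returns the training-set (in-sample) and test-set (out-of-sample) rows in one call (a sketch; testdata is the hold-out series from the earlier posts):

accuracy(forecast(fit11, h = length(testdata)), testdata)  # ARIMA(2,2,1), AIC pick
accuracy(forecast(fit13, h = length(testdata)), testdata)  # ARIMA(0,2,2), BIC pick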

Let's perform a visual inspection of the out-of-sample forecasts.

Figure-1: Model Selected by AIC

Figure-2: Model Selected by BIC

In the figures, the actual values (dashed line) lie above the confidence intervals (dark grey and light grey for the 80% and 95% CI) for the model selected by BIC. Graphically, we can infer that BIC is more reluctant to overfit and penalizes extra parameters more heavily.