The whole purpose of this blog is to visualize the AR(1) time series process ${{Y}_{t}}=\rho {{Y}_{t-1}}+{{e}_{t}}$ for various values of $\rho $ and of the variance of ${{e}_{t}}$.

The AR(1) process is given by ${{Y}_{t}}=\rho {{Y}_{t-1}}+{{e}_{t}}$. The ${{e}_{t}}$ is assumed to be white noise with the following properties, which I have previously discussed (here).

$E({{e}_{t}})=0\to (i)$

$E(e_{t}^{2})={{\sigma }^{2}}\to (ii)$

$E({{e}_{i}}{{e}_{j}})=0$ for $i\ne j\to (iii)$

I have put up an interactive diagram with a random seed of 1 for 100 periods, in which we can play with the value of $\rho $ from -1.1 to 1.1 and the variance of ${{e}_{t}}$ from zero to 1. We know that if $\left| \rho \right|<1$ the series will be stationary, if $\left| \rho \right|=1$ the series will be non-stationary, and if $\left| \rho \right|>1$ the series will be explosive. If the following interactive diagram fails to work, then click (here) for the external link.

I have also made a tutorial video on the ADF unit root test. In the video I generated data, performed the unit root test, and showed how the ADF test can behave when its assumptions are mis-specified.

Among the many equations in time series analysis, the following three are important. These equations are given as:

In all these equations the nature of the time series differs significantly depending upon the value of $\rho $: if $\left| \rho \right|<1$ the series will be stationary, if $\left| \rho \right|=1$ it will be non-stationary, and if $\left| \rho \right|>1$ it will be explosive.
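To build intuition, here is a minimal base-R sketch (assuming seed 1 and 100 periods, as in the interactive diagram) that simulates the AR(1) process for a stationary, a unit-root, and an explosive value of $\rho $:

```r
# Simulate Y_t = rho * Y_{t-1} + e_t for several values of rho
set.seed(1)
n <- 100
e <- rnorm(n, mean = 0, sd = 1)   # white noise with zero mean, unit variance
rhos <- c(0.5, 1, 1.05)           # stationary, non-stationary, explosive
Y <- sapply(rhos, function(rho) {
  y <- numeric(n)
  y[1] <- e[1]
  for (t in 2:n) y[t] <- rho * y[t - 1] + e[t]
  y
})
matplot(Y, type = "l", lty = 1, xlab = "Period", ylab = "Y",
        main = "AR(1) with rho = 0.5, 1 and 1.05")
legend("topleft", legend = paste("rho =", rhos), col = 1:3, lty = 1)
```

The stationary series keeps reverting to zero, the unit-root series wanders, and the explosive series quickly leaves the plot range.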

Here is my video post in which I have shown the exact process to perform an ADF unit root test with a real dataset. I have also shown what happens to the ADF values when we mis-judge the assumptions. Further below, I have shown the step-wise process with a flowchart diagram.

Exact process for an ADF Unit Root Test

A.1 Run the ADF test with trend and intercept and see its p-value.

A.2 If the p-value is less than the level of significance (usually 5%), reject the null hypothesis of a unit root: the data is stationary. STOP the process.

B. If the p-value is greater than the 5% level of significance,

B.1 then see the p-value of the estimated trend.

B.2 If the trend's p-value < 5%, the data has a unit root. Stop the process and repeat it from the initial step on the differenced data.

C. If the trend's p-value > 5%, the trend is insignificant; re-run the ADF test with intercept only.

C.1 See the p-value of this ADF test.

C.2 If the p-value is less than the level of significance (usually 5%), reject the null hypothesis of a unit root: the data is stationary. STOP the process.

D. If the p-value is greater than the 5% level of significance,

D.1 then see the p-value of the estimated intercept.

D.2 If the intercept's p-value < 5%, the data has a unit root. Stop the process and repeat it from the initial step on the differenced data.

E. If the intercept's p-value > 5%, the intercept is insignificant; re-run the ADF test with neither trend nor intercept.

E.1 See the p-value of this ADF test.

E.2 If the p-value is less than the level of significance (usually 5%), reject the null hypothesis of a unit root: the data is stationary. STOP the process.

F. If the p-value is greater than the 5% level of significance, the data is non-stationary. Stop the process and repeat it from the initial step on the differenced data.
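The three stages of this procedure can be sketched with the urca package (an assumption on my part; urca's ur.df() reports the test statistic and critical values rather than a p-value, but it lets us choose the deterministic terms used in stages A, C and E):

```r
library(urca)             # ur.df() allows trend/drift/none variants
set.seed(1)
y <- cumsum(rnorm(100))   # an illustrative random-walk series

# Stages A/B: ADF regression including trend and intercept
summary(ur.df(y, type = "trend", selectlags = "AIC"))

# Stages C/D: intercept only, used when the trend is insignificant
summary(ur.df(y, type = "drift", selectlags = "AIC"))

# Stages E/F: neither trend nor intercept
summary(ur.df(y, type = "none", selectlags = "AIC"))
```

Each summary also reports the t-values of the estimated trend and intercept needed for steps B.1 and D.1.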

Here, in a diagram, I have shown a stylized picture of this process.

Source: Pfaff, B. (2008) Analysis of Integrated and Cointegrated Time Series with R. Second Edition. Springer, New York

At first let’s define white noise. The expectation or mean of white noise is zero with some finite variance, and the ${{i}^{th}}$ and ${{j}^{th}}$ observations are uncorrelated for $i\ne j$. Hence white noise needs to have the following properties.

$E({{e}_{t}})=0\to (i)$

$E(e_{t}^{2})={{\sigma }^{2}}\to (ii)$

$E({{e}_{i}}{{e}_{j}})=0$ for $i\ne j\to (iii)$

If this white noise follows a normal distribution then it is known as normal or Gaussian white noise, written as ${{e}_{t}}\sim N(0,{{\sigma }^{2}})$, and also termed a normally distributed random variable.

Further, $(iii)$ can be replaced with the stronger assumption of independence, which gives independent white noise; if it also follows a normal distribution it is known as Gaussian independent white noise, or a normally distributed independent random variable. One note: independence implies uncorrelatedness, but not vice versa.

Let’s generate 200 normally distributed random numbers with zero mean and unit variance. To make your results consistent with mine we need to use a random seed; let’s use a seed of 1. If you don’t know about random seeds then see my previous blog (here).

rm(list = ls())
set.seed(1)
n = 200 # For 200 random numbers
e = rnorm(n, mean = 0, sd = 1) # 200 normally distributed random numbers with zero mean and unit variance
plot(e, type = "l", main = "200 normally distributed random numbers with zero mean and unit variance")
hist(e, main = "Histogram of Gaussian white noise with zero mean and unit variance")

Here is the histogram:

shapiro.test(e) #Testing if e is normal or not
library(tseries)
jarque.bera.test(e) # Jarque-Bera test for normality

t.test(e, mu = 0) #Testing if mean of e is zero or not

acf(e, main="ACF of residuals")

library(forecast)
Box.test(e, lag=10, type = "Box-Pierce")
Box.test(e,lag=10, fitdf=0, type="Ljung-Box")

#test for stationary
kpss.test(e) #Null hypothesis is Data has no unit root or data is stationary
adf.test(e) #Null hypothesis is Data has unit root or data is non-stationary

You can see the video here and explanation after this video.

Let’s consider series ${{\varepsilon }_{0}}$ and ${{\varepsilon }_{1}}$, each with 100 observations drawn from a normal distribution with zero mean and unit standard deviation. You can use the following commands in R to generate the data.

rm(list = ls())
#Let's generate 100 random numbers and save them in object e0 and e1
n = 100
e0 = rnorm(n, mean = 0, sd = 1)
e1 = rnorm(n, mean = 0, sd = 1)

Then let's find the mean and standard deviation.

#Let's find the mean of series e0 and e1
mean(e0)
mean(e1)
sd(e0)
sd(e1)

You will find that the mean and standard deviation both come close to the values specified above, but not exactly (your values may also differ from mine; see the video). Further, we can plot both series (your plot may differ from mine; see the video).

#Let's plot e0 and e1 at same panel
layout(1:2)
plot(e0, type="l", main = "my first plot of normally distributed 100 number")
plot(e1, type="l", main = "my second plot of normally distributed 100 number")
layout(1:1)

However, you must have noticed that the plots of e0 and e1 are similar in nature but differ in detail (yours can differ from mine as well). Further, the means of e0 and e1 are statistically zero, as the t-test fails to reject the null hypothesis of zero mean. Similarly, the standard deviations of e0 and e1 are statistically one.

But you can make your replication consistent with mine. For that we can use a random seed. Let me use a random seed of 1 and generate another series $e$ with 200 observations, zero mean and standard deviation of one (the default setting of rnorm). A random seed (or seed state, or just seed) is a number (or vector) used to initialize a pseudorandom number generator. Then let’s see the plot as well.

#Now, to make your result consistent with mine we need to set a random number seed. Let’s set the random seed
set.seed(1)

#Now let’s generate 200 normally distributed random numbers
e <- rnorm(200)
plot(e, type="l")

If we closely observe the series $e$, we can see that it reverts to its mean of zero and its variance is stable. Thus $e$ is a stationary process (I will blog on stationary processes later, but for general intuition see here).

Gross domestic product/income or expenditure in producer's prices (GDP/GDI/GDE)

Income method (earnings or cost approach)

Method-1: GDI = total factor income (TFI) + taxes less subsidies on products

Total factor income (TFI), or GDP at factor cost/basic prices, comprises:

Compensation of Employees (COE): the total remuneration to employees for work done. It includes wages and salaries, as well as employer contributions to social security and other such programs.

Gross Operating Surplus (GOS): the surplus due to owners of incorporated businesses. Often called profits, although only a subset of total costs is subtracted from gross output to calculate GOS.

Gross Mixed Income (GMI): the same measure as GOS, but for unincorporated businesses. This often includes most small businesses.

To replicate, you can download the data (here). In this blog, I will discuss the stationarity condition of an $AR(p)$ process for ${{y}_{t}}$.

Let’s define an $AR(p)$ process for ${{y}_{t}}$, which is given as:
${{y}_{t}}={{\phi }_{1}}{{y}_{t-1}}+{{\phi }_{2}}{{y}_{t-2}}+\cdots +{{\phi }_{p}}{{y}_{t-p}}+\mu +{{v}_{t}}$

This equation can be expressed as:
${{y}_{t}}-{{\phi }_{1}}{{y}_{t-1}}-{{\phi }_{2}}{{y}_{t-2}}-\cdots -{{\phi }_{p}}{{y}_{t-p}}=\mu +{{v}_{t}}$

Now, let’s use a backshift or lag operator i.e. $(B{{y}_{t}}={{y}_{t-1}},{{B}^{2}}{{y}_{t}}={{y}_{t-2}},\cdots {{B}^{p}}{{y}_{t}}={{y}_{t-p}})$.

Then the above expression can be written as:
${{y}_{t}}-{{\phi }_{1}}B{{y}_{t}}-{{\phi }_{2}}{{B}^{2}}{{y}_{t}}-\cdots -{{\phi }_{p}}{{B}^{p}}{{y}_{t}}=\mu +{{v}_{t}}$

Now, we can take ${{y}_{t}}$ common,
$(1-{{\phi }_{1}}B-{{\phi }_{2}}{{B}^{2}}-\cdots -{{\phi }_{p}}{{B}^{p}}){{y}_{t}}=\mu +{{v}_{t}}$

Now, we can write
${{\phi }_{p}}(B){{y}_{t}}=\mu +{{v}_{t}}$

For the stationarity condition:
${{y}_{t}}$ is stationary when all roots of the characteristic polynomial ${{\phi }_{p}}(z)$ lie outside the unit circle, i.e. the modulus of every root satisfies $|{{z}_{i}}|>1$.

Let’s discuss the stationarity condition of an $AR(p)$ process for ${{y}_{t}}$ with a practical example. You can download the data (here). At first let’s load the monthly data on the US interest rate from January 1960 till December 1998 and save it to an object called “r”. Now let’s fit an $AR(4)$ process to “r” and save it to a new object called “ar_r”.
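In R this could look like the following sketch (the file and column names are hypothetical; I fit the AR(4) with arima(), which also estimates the intercept):

```r
# Hypothetical file/column names for the monthly US interest-rate data
r <- read.csv("us_interest_rate.csv")$rate

lag_r <- 4                                # order of the AR process
ar_r <- arima(r, order = c(lag_r, 0, 0))  # AR(4) with an intercept term
ar_r$coef                                 # phi_1, ..., phi_4 and the intercept
```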

Hence, our equation can be given as:
$\begin{align}
& {{r}_{t}}=1.417{{r}_{t-1}}-0.587{{r}_{t-2}}+0.125{{r}_{t-3}}+0.024{{r}_{t-4}}+6.161+{{v}_{t}} \\
& or,{{r}_{t}}-1.417{{r}_{t-1}}+0.587{{r}_{t-2}}-0.125{{r}_{t-3}}-0.024{{r}_{t-4}}=6.161+{{v}_{t}} \\
\end{align}$

Now, when we use the lag operator,
$(1-1.417B+0.587{{B}^{2}}-0.125{{B}^{3}}-0.024{{B}^{4}}){{r}_{t}}=6.161+{{v}_{t}}$

Now we can solve for the values of $z$ in the characteristic equation $1-1.417z+0.587{{z}^{2}}-0.125{{z}^{3}}-0.024{{z}^{4}}=0$.

To do such in R, let’s extract the coefficients from the object “ar_r” and save that to object called “phi”.

phi <- ar_r$coef

Then concatenate 1 with the AR coefficients with their signs reversed and save the result in an object called “c” (here lag_r is the order of the AR process, i.e. 4).
c <- c(1, -phi[1:lag_r])

Finally, we can find the roots of the polynomial using the following command. Let’s save these roots in an object called “rt”.
rt <- polyroot( c )
rt
Alternatively,
Mod(polyroot(c))

See the following diagram.

Here, the polynomial $(1-1.417z+0.587{{z}^{2}}-0.125{{z}^{3}}-0.024{{z}^{4}})$ has two real roots and two complex roots, given as follows.

Let’s represent these in modulus terms using the “complex” command with “modulus = Mod(rt)” and save the result in the “zz.shift” object. For example, the modulus of the complex root ${{z}_{3}}=1.285+1.716i$ is $\sqrt{{{(1.285)}^{2}}+{{(1.716)}^{2}}}=2.141$.

zz.shift <- complex(modulus = Mod(rt))
zz.shift

The moduli of roots are given as:
$\begin{align}
& |{{z}_{1}}|=1.028; \\
& |{{z}_{2}}|=|{{z}_{3}}|=2.141; \\
& |{{z}_{4}}|=8.753 \\
\end{align}$

Since the moduli of the roots of ${{\phi }_{p}}(B)$ all satisfy $|{{z}_{i}}|>1$, i.e. all roots lie outside the unit circle, this $AR(4)$ process for “r” is stationary. A plot can be developed with the following code.

# Plotting the roots in a unit circle
x <- seq(-1, 1, length = 1000)
y1 <- sqrt(1- x^2)
y2 <- -sqrt(1- x^2)
plot(c(x, x), c(y1, y2), xlab='Real part',
ylab='Complex part', type='l',
main='Unit Circle', ylim=c(-2, 2), xlim=c(-2, 2))
abline(h=0)
abline(v=0)
points(Re(polyroot(c)),
Im(polyroot(c)), pch=19)
legend(-1.5, -1.5, legend="Roots of AR(4)", pch=19)

References:
Pfaff, B. (2008) Analysis of Integrated and Cointegrated Time Series with R. Second Edition. Springer, New York.
Martin, V., Hurn, S. and Harris, D. (2013) Econometric Modelling with Time Series: Specification, Estimation and Testing. Themes in Modern Econometrics. Cambridge University Press, New York.

Download the data (here), Rcodes (here) to replicate the results of the video of this blog. The video is given at the end of the blog.

When we talk about economic forecasting, a-theoretical models (models with no economic theory) often perform better than theoretical models. However, the data in an a-theoretical model is pre-processed differently from the data in a theoretical model. An economist can learn many lessons from machine learning algorithms and can gain a real advantage by using them when forecasting. The entire process of forecasting can be roughly represented in the following diagram.

At first, I would like to discuss data pre-processing. Data pre-processing techniques generally refer to the addition, deletion, or transformation of training set data. The data can be pre-processed in the following steps.

1. Transformation (centering, scaling, skewness transformations, transformation to resolve the outliers)
2. Dealing with Missing Variables via data imputation
3. Data Reductions and Feature Extractions
4. Filtering (Removal of Redundant Predictors)
5. Binning Predictors (Development of Dummy Variables)

In this blog, I would like to show you how we can pre-process the data. The dimension of the tutorial data is 190 x 118, i.e. there are 190 rows for 118 variables, with no missing data. Now let’s transform the data to make it more normal, then center and scale it, then filter the redundant variables, and finally perform feature extraction. For this we will use the “caret” and “corrplot” packages.

At First lets load the data and load the required packages:
index <- read.csv("Data Preprocessing.csv")
library("caret")
library("corrplot")

The data can be transformed using the “BoxCox” or “YeoJohnson” method. These transformations correct the skewness of the data and can make it look more normal. But unlike the “BoxCox” method, which requires all data to be strictly positive, the “YeoJohnson” method handles data with positive as well as negative values; since our data ranges over both, we will use it.

After we make the data look more normal, we will center and scale it. To center the data, we simply subtract the average of each predictor from all of its values, i.e. $\left( {{X}_{i}}-\overline{X} \right)$. After centering, each predictor will have zero mean. Similarly, to scale the data, each value of a variable is divided by the standard deviation of that variable, i.e. $\left( \frac{{{X}_{i}}}{\sqrt{\sum\limits_{i=1}^{n}{{{({{X}_{i}}-\overline{X})}^{2}}/N}}} \right)$. After scaling, each predictor will have unit standard deviation. Hence, centering and scaling transform each variable to zero mean and unit standard deviation. After this, the usual step is to deal with missing values; fortunately we have none. One common approach is a K-nearest-neighbor model, in which a missing entry is imputed by finding the samples “closest” to it and averaging these nearby points to fill in the value. However, one should ask “why” the data is missing, and whether such imputation is really necessary.

The “YeoJohnson” transformation, centering and scaling of data can be done with one single “preProcess“ command from the “caret” package. Then such transformation can be imposed in data using “predict” command of same package.

trans <- preProcess(index, method = c("YeoJohnson", "center", "scale"))
trans
trans.data <- predict(trans, index)

After this, to reduce the computational burden, one can filter out and eliminate redundant variables. If two variables are highly correlated, then both are measuring almost the same underlying information. Removing one doesn’t compromise the performance of the model and might lead to a more parsimonious and interpretable model. This also suppresses the multicollinearity problem.

At first let’s develop a correlation matrix object called “correlations” using the “cor” command; then let’s find the variables with absolute pair-wise correlations above 0.75 using the “findCorrelation” command of the “caret” package and save them in an object called “highCorr”. Then, at last, let’s remove those variables.
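A sketch of this filtering step (assuming the trans.data object produced by preProcess above):

```r
# Correlation matrix of the transformed predictors
correlations <- cor(trans.data)

# Columns with absolute pair-wise correlations above 0.75 (caret)
highCorr <- findCorrelation(correlations, cutoff = 0.75)

# Drop the flagged columns
filtered.data <- trans.data[, -highCorr]
dim(filtered.data)
```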

Now we can also do feature extraction by implementing principal component analysis. PCA is a data reduction technique that generates a smaller set of predictors which seek to capture a majority of the information in the original variables. This method seeks to find linear combinations of the predictors, known as principal components (PCs), which capture the most possible variance.

In the example of this blog, the initial dimension of the data was 190 x 118; we then performed the “YeoJohnson” transformation and centered and scaled the variables. Further, we removed the redundant variables to suppress the multicollinearity problem, after which the dimension of the filtered data was 190 x 59. Finally, we performed PCA and found that 36 PCs can explain 95% of the variability of the data, so the dimension of the final data was 190 x 36.

Instead of first removing the redundant variables, try to transform, center, scale and extract the PCs all at once with the following command. You will see that it also extracts 36 PCs.
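That all-at-once command could look like this sketch (thresh = 0.95 asks preProcess to keep enough PCs for 95% of the variance; index is the data object loaded earlier):

```r
# Transform, center, scale and extract principal components in one call
trans.pca <- preProcess(index,
                        method = c("YeoJohnson", "center", "scale", "pca"),
                        thresh = 0.95)    # PCs explaining 95% of variance
pca.data <- predict(trans.pca, index)
dim(pca.data)
```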

Finally, to generate the correlation plot use the “corrplot” command, and to find the variable loadings, in which the rows correspond to predictor variables and the columns are associated with the components, use the trans.pca$rotation command.

Here, $g$ is an unknown function which has to be estimated from the data. We will try to fit a spline. A spline is a numeric function that is piecewise-defined by polynomial functions and that possesses a high degree of smoothness at the places where the polynomial pieces connect (known as knots). A spline of order (the degree of the piecewise polynomial) 3 is a cubic spline (see here for more).

First, let’s find the best order of spline based on AIC and BIC. You can find my blog (here) on where to use AIC and BIC. We will use the bs() command for the spline, with its order given as df. This command is available in the splines package; when we load the AER package, the splines package is automatically loaded.

First, let’s generate an object called “spline.reg” which holds the fitted linear models with splines of degree 3 to 20, using the lapply() command. Then let’s extract the AIC and BIC of each model using sapply(spline.reg, AIC) and sapply(spline.reg, BIC), and save them in matrix form in objects called “Best.AIC” and “Best.BIC”.

Then we can find the minimum AIC and BIC values using the Best.AIC[which.min(Best.AIC)] and Best.BIC[which.min(Best.BIC)] commands. Finally, we can see the order of the best model. Since the matrix covers df values from 3 to 20, make sure to add 2 to the index position to recover the best order.
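The spline.reg object could be built as follows (a sketch; the wage equation matches the model used later in this blog):

```r
library(AER)        # provides the CPS1988 data
library(splines)    # bs() for B-splines

data("CPS1988")

# One fitted model per spline degree of freedom, df = 3, ..., 20
spline.reg <- lapply(3:20, function(d)
  lm(log(wage) ~ bs(experience, df = d) + education + ethnicity,
     data = CPS1988))
```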

Best.AIC=as.matrix(sapply(spline.reg, AIC))

Best.AIC[which.min(Best.AIC)]

Best.AIC

Best.BIC=as.matrix(sapply(spline.reg, BIC))

Best.BIC[which.min(Best.BIC)]

Best.BIC

As per AIC the best spline should have df = 15, and as per BIC the best spline should have df = 5. Let’s develop the partial regressions “p.reg1” and “p.reg2” as follows:
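A sketch of those two models (assuming the CPS1988 data from the AER package and bs() from splines):

```r
# Spline regressions at the AIC-chosen (df = 15) and BIC-chosen (df = 5) orders
p.reg1 <- lm(log(wage) ~ bs(experience, df = 15) + education + ethnicity,
             data = CPS1988)
p.reg2 <- lm(log(wage) ~ bs(experience, df = 5) + education + ethnicity,
             data = CPS1988)
```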

The model has the log of the real wage as the dependent variable and experience, education and ethnicity as independent variables. To plot the log of wage against experience we can use the plot command. However, to visualize the quadratic relation, we will take experience from 0 to 60, and for education we will use the mean education of the Caucasian ethnicity in the CPS1988 data. This data subset is stored in an object called “data”.

Now, let’s predict the fitted values based on the “reg1”, “p.reg1” and “p.reg2” models using the “data” object.

data$yhat1=predict(reg1, newdata=data)

data$yhat2=predict(p.reg1, newdata=data)

data$yhat3=predict(p.reg2, newdata=data)

Now let's plot the log of wage against experience using the CPS1988 data. We will use the jitter command on experience for a sense of density, plot in grey, and use alpha blending (transparent or opaque depending upon the value of alpha).

plot(log(wage) ~ jitter(experience, factor = 3), pch = 19, col = rgb(0.5, 0.5, 0.5, alpha = 0.02), data = CPS1988)

Now, on this plot, we can draw the quadratic relation obtained from the “reg1” model using the data frame called “data”. Similarly, we can also draw the partial regressions obtained from the “p.reg1” and “p.reg2” models. Let’s do that:

lines(yhat1 ~ experience, data = data, lty = 1, col = 1, lwd = 2)

lines(yhat2 ~ experience, data = data, lty = 1, col = 2, lwd = 1)

lines(yhat3 ~ experience, data = data, lty = 1, col = 4, lwd = 3 )

legend("topleft", c("quadratic", "spline with df = 5", "spline with df = 15"),

       lty = c(1, 1, 1), col = c(1, 4, 2), lwd = c(2, 3, 1),

       bty = "n")

Interpretation

The plot of “reg1” shows only the quadratic relation, whereas “p.reg1” (df = 15) overfits the data and is far from real-life application. The “p.reg2” model, however, shows that for roughly the first 2-3 years of experience the wage rises quickly, by 7-8 years of experience the wage increases more slowly, the wage rises only sluggishly from 8-30 years of experience, and after 30 years of experience the wage tends to fall.

Reference:

Kleiber, C. and Zeileis, A. (2008) Applied Econometrics with R. Springer, New York.

Let’s install the package called AER using the code install.packages("AER"), then load it into R using library(AER). Now let’s use the CPS1988 data. This data comes from the Current Population Survey (CPS) conducted by the US Census Bureau. It is a cross-section of males aged 18 to 70 with positive annual income greater than US$50 in 1992 who are not self-employed or working without pay. Wages are deflated by the deflator of personal consumption expenditures for 1992. A summary of the data set can be obtained using summary(CPS1988).

Try to answer the following question (I have answered all these questions in the video below):
Which variables are quantitative and which are categorical?

The wage is the wage in dollars per week, education and experience are measured in years, and ethnicity is a factor with levels Caucasian ("cauc") and African-American ("afam"). There are three further factors, smsa, region, and parttime, indicating residence in a standard metropolitan statistical area (SMSA), the region within the United States of America, and whether the individual works part-time.

Let’s use the model: \[\log (wage)={{\beta }_{1}}+{{\beta }_{2}}experience+{{\beta }_{3}}experienc{{e}^{2}}+{{\beta }_{4}}education+{{\beta }_{5}}ethnicity+\varepsilon\]

The model is in semilogarithmic form and also has a squared regressor, i.e. $experienc{{e}^{2}}$. Let’s use the lm() command and store the result in the reg1 object. Then, to see the summary of that reg1 object, let’s use the summary(reg1) command.
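In code this could look like the sketch below; the I() wrapper is needed so that the square is computed inside the formula:

```r
library(AER)
data("CPS1988")

# Semilogarithmic wage equation with a quadratic in experience
reg1 <- lm(log(wage) ~ experience + I(experience^2) + education + ethnicity,
           data = CPS1988)
summary(reg1)
```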

Try to answer the following questions:
Are the signs of all the regressors as expected?
Why did we specify the $experienc{{e}^{2}}$ term?
What does the negative sign of $I(experienc{{e}^{2}})$ represent?
Which coefficient shows the return on education, and what does that mean?
How should this coefficient be interpreted?

ANOVA
Let’s do an ANOVA on reg1 using the command anova(reg1). Try to answer the following:
Where can you get the total variance?
Of that total variance, how much is explained by the variables of the model and how much by the residual?
How is the F-ratio calculated?

Comparison of Model by evaluating ANOVA
Let’s make another model called reg2, the same as reg1 except that this time we will drop the ethnicity variable, using the following command:
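One way to write that command is with update(), a sketch of which is:

```r
# Refit reg1 with ethnicity removed from the right-hand side
reg2 <- update(reg1, . ~ . - ethnicity)
```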

The expression . ~ . - ethnicity specifies to take the LHS and RHS in the formula (signaled by the “.”), only removing ethnicity on the RHS. You can also find ANOVA using anova(reg2).

Now let’s compare model reg2 with reg1 using the command anova(reg1, reg2). The model with the smaller RSS fits better. Try to answer the following questions:

Which model is better?
Where can you find the RSS of each model?

Comparison of models with the Wald test
Now let’s install the lmtest package with install.packages("lmtest"), then load it with library(lmtest). Then let’s perform a Wald test on reg1, dropping the ethnicity variable.

waldtest(reg1, . ~ . - ethnicity)

Try to answer following question:
Even though the Wald test here gave the same result as the previous ANOVA comparison, why is the Wald test sometimes preferred over the ANOVA comparison?

By the way, the plot of the log of wage against experience looks like the following:

Here is the video in which I have answered all those questions:

Here in the table, there are $n=25$ observations of variable $Y$ and variable $X$.

Say we state that $Y$ depends on $X$ and in functional form it can be written as: $Y=f(X)$. Say, such function is linear then: ${{Y}_{i}}={{\beta }_{0}}+{{\beta }_{1}}{{X}_{i}}$. The true values of Y-intercept ${{\beta }_{0}}$ and slope ${{\beta }_{1}}$ are only known by GOD.

A man, however, can estimate them. The estimate of $Y$ is $\hat{Y}$, with ${{\hat{Y}}_{i}}={{\hat{\beta }}_{0}}+{{\hat{\beta }}_{1}}{{X}_{i}}$, where ${{\hat{\beta }}_{0}}$ is the estimate of ${{\beta }_{0}}$ and ${{\hat{\beta }}_{1}}$ is the estimate of ${{\beta }_{1}}$. For the estimate, a man could try to make the sum of deviations between ${{Y}_{i}}$ and ${{\hat{Y}}_{i}}$ zero, i.e. $\sum\limits_{i=1}^{n}{\left( {{Y}_{i}}-{{{\hat{Y}}}_{i}} \right)}=0$. However, positive and negative deviations cancel, so this zero cannot be easily interpreted; hence he tries another way: minimizing the sum of squared deviations. Here note that ${{Y}_{i}}={{\hat{\beta }}_{0}}+{{\hat{\beta }}_{1}}{{X}_{i}}+{{\varepsilon }_{i}}$.

Hence the function to minimize, the sum of squared deviations $S$, can be written as:
\[\begin{align}
& S=\sum\limits_{i=1}^{n}{{{({{Y}_{i}}-\hat{Y})}^{2}}} \\
& or,S=\sum\limits_{i=1}^{n}{{{({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})}^{2}}}\to (i) \\
\end{align}\]
For such, we can take first order derivatives w.r.t ${{\hat{\beta }}_{0}}$and ${{\hat{\beta }}_{1}}$ and set them to zero. At first let’s take first order derivatives w.r.t ${{\hat{\beta }}_{0}}$

\[\begin{align}
& \frac{\partial S}{\partial {{{\hat{\beta }}}_{0}}}=\frac{\partial \sum\limits_{i=1}^{n}{{{({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})}^{2}}}}{\partial {{{\hat{\beta }}}_{0}}} \\
& or,\frac{\partial S}{\partial {{{\hat{\beta }}}_{0}}}=\frac{\partial \sum\limits_{i=1}^{n}{{{({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})}^{2}}}}{\partial ({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})}\frac{\partial ({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})}{\partial {{{\hat{\beta }}}_{0}}} \\
& or,\frac{\partial S}{\partial {{{\hat{\beta }}}_{0}}}=2\sum\limits_{i=1}^{n}{({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})}(-1) \\
& or,\frac{\partial S}{\partial {{{\hat{\beta }}}_{0}}}=-2\sum\limits_{i=1}^{n}{({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})} \\
\end{align}\]
Let’s set the above equation to zero
\[\begin{align}
& 0=-2\sum\limits_{i=1}^{n}{({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})} \\
& or,\sum\limits_{i=1}^{n}{({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})}=0 \\
& or,\sum\limits_{i=1}^{n}{{{Y}_{i}}}-\sum\limits_{i=1}^{n}{{{{\hat{\beta }}}_{0}}-}\sum\limits_{i=1}^{n}{{{{\hat{\beta }}}_{1}}{{X}_{i}}}=0 \\
& \therefore \sum\limits_{i=1}^{n}{{{Y}_{i}}}=n{{{\hat{\beta }}}_{0}}+{{{\hat{\beta }}}_{1}}\sum\limits_{i=1}^{n}{{{X}_{i}}}\to (ii) \\
\end{align}\]
Let’s take first order derivatives w.r.t ${{\hat{\beta }}_{1}}$
\[\begin{align}
& \frac{\partial S}{\partial {{{\hat{\beta }}}_{1}}}=\frac{\partial \sum\limits_{i=1}^{n}{{{({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})}^{2}}}}{\partial {{{\hat{\beta }}}_{1}}} \\
& or,\frac{\partial S}{\partial {{{\hat{\beta }}}_{1}}}=\frac{\partial \sum\limits_{i=1}^{n}{{{({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})}^{2}}}}{\partial ({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})}\frac{\partial ({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})}{\partial {{{\hat{\beta }}}_{1}}} \\
& or,\frac{\partial S}{\partial {{{\hat{\beta }}}_{1}}}=2\sum\limits_{i=1}^{n}{({{Y}_{i}}-{{{\hat{\beta }}}_{0}}-{{{\hat{\beta }}}_{1}}{{X}_{i}})}(-{{X}_{i}}) \\
& or,\frac{\partial S}{\partial {{{\hat{\beta }}}_{1}}}=-2\sum\limits_{i=1}^{n}{({{Y}_{i}}{{X}_{i}}-{{{\hat{\beta }}}_{0}}{{X}_{i}}-{{{\hat{\beta }}}_{1}}X_{i}^{2})} \\
\end{align}\]
Let’s set the above equation to zero
\[\begin{align}
& 0=-2\sum\limits_{i=1}^{n}{({{Y}_{i}}{{X}_{i}}-{{{\hat{\beta }}}_{0}}{{X}_{i}}-{{{\hat{\beta }}}_{1}}X_{i}^{2})} \\
& or,\sum\limits_{i=1}^{n}{({{Y}_{i}}{{X}_{i}}-{{{\hat{\beta }}}_{0}}{{X}_{i}}-{{{\hat{\beta }}}_{1}}X_{i}^{2})}=0 \\
& or,\sum\limits_{i=1}^{n}{{{Y}_{i}}{{X}_{i}}}-\sum\limits_{i=1}^{n}{{{{\hat{\beta }}}_{0}}{{X}_{i}}}-\sum\limits_{i=1}^{n}{{{{\hat{\beta }}}_{1}}X_{i}^{2}}=0 \\
& \therefore \sum\limits_{i=1}^{n}{{{Y}_{i}}{{X}_{i}}}=\sum\limits_{i=1}^{n}{{{{\hat{\beta }}}_{0}}{{X}_{i}}}+\sum\limits_{i=1}^{n}{{{{\hat{\beta }}}_{1}}X_{i}^{2}}\to (iii) \\
\end{align}\]
Let’s consider equation (ii) and (iii) and expressed them in matrix form:
\[\begin{align}
& \sum\limits_{i=1}^{n}{{{Y}_{i}}}=n{{{\hat{\beta }}}_{0}}+{{{\hat{\beta }}}_{1}}\sum\limits_{i=1}^{n}{{{X}_{i}}} \\
& \sum\limits_{i=1}^{n}{{{Y}_{i}}{{X}_{i}}}=\sum\limits_{i=1}^{n}{{{{\hat{\beta }}}_{0}}{{X}_{i}}}+\sum\limits_{i=1}^{n}{{{{\hat{\beta }}}_{1}}X_{i}^{2}} \\
& or,\left[ \begin{matrix}
\sum\limits_{i=1}^{n}{{{Y}_{i}}} \\
\sum\limits_{i=1}^{n}{{{Y}_{i}}{{X}_{i}}} \\
\end{matrix} \right]=\left[ \begin{matrix}
n{{{\hat{\beta }}}_{0}}+{{{\hat{\beta }}}_{1}}\sum\limits_{i=1}^{n}{{{X}_{i}}} \\
\sum\limits_{i=1}^{n}{{{{\hat{\beta }}}_{0}}{{X}_{i}}}+\sum\limits_{i=1}^{n}{{{{\hat{\beta }}}_{1}}X_{i}^{2}} \\
\end{matrix} \right] \\
& \therefore \left[ \begin{matrix}
\sum\limits_{i=1}^{n}{{{Y}_{i}}} \\
\sum\limits_{i=1}^{n}{{{Y}_{i}}{{X}_{i}}} \\
\end{matrix} \right]=\left[ \begin{matrix}
n & \sum\limits_{i=1}^{n}{{{X}_{i}}} \\
\sum\limits_{i=1}^{n}{{{X}_{i}}} & \sum\limits_{i=1}^{n}{X_{i}^{2}} \\
\end{matrix} \right]\left[ \begin{matrix}
{{{\hat{\beta }}}_{0}} \\
{{{\hat{\beta }}}_{1}} \\
\end{matrix} \right]\to (iv) \\
\end{align}\]
Now we can solve equation (iv), which is known as the normal equation. This is the traditional algebraic method and can be solved directly. However, I wish to introduce the matrix form here, so let’s develop some matrix notation.

Let’s consider matrix now.

Y: ${{\left[ \begin{matrix}
10 \\
11 \\
12 \\
\vdots \\
12 \\
\end{matrix} \right]}_{25*1}}={{\left[ \begin{matrix}
{{Y}_{1}} \\
{{Y}_{2}} \\
{{Y}_{3}} \\
\vdots \\
{{Y}_{25}} \\
\end{matrix} \right]}_{25*1}}$. This has 25 rows and 1 column. Say, the matrix of ones can be given as I =${{\left[ \begin{matrix}
1 \\
1 \\
1 \\
\vdots \\
1 \\
\end{matrix} \right]}_{25*1}}$. For matrix of X we will column wise concatenate the matrix I with observations of X. X =${{\left[ \begin{matrix}
\begin{matrix}
1 \\
1 \\
\vdots \\
1 \\
1 \\
\end{matrix} & \begin{matrix}
35 \\
29 \\
\vdots \\
33 \\
28 \\
\end{matrix} \\
\end{matrix} \right]}_{25*2}}={{\left[ \begin{matrix}
\begin{matrix}
1 \\
1 \\
\vdots \\
1 \\
1 \\
\end{matrix} & \begin{matrix}
{{X}_{1}} \\
{{X}_{2}} \\
\vdots \\
{{X}_{24}} \\
{{X}_{25}} \\
\end{matrix} \\
\end{matrix} \right]}_{25*2}}$. The matrix of estimates can be given as: B = ${{\left[ \begin{matrix}
{{{\hat{\beta }}}_{0}} \\
{{{\hat{\beta }}}_{1}} \\
\end{matrix} \right]}_{2*1}}$ and matrix of errors can be represented as: Є =${{\left[ \begin{matrix}
{{\varepsilon }_{1}} \\
{{\varepsilon }_{2}} \\
\vdots \\
{{\varepsilon }_{24}} \\
{{\varepsilon }_{25}} \\
\end{matrix} \right]}_{25*1}}$. Then the equation ${{Y}_{i}}={{\hat{\beta }}_{0}}+{{\hat{\beta }}_{1}}{{X}_{i}}+{{\varepsilon }_{i}}$ can be expressed compactly as Y = XB + Є. Note that XB is not the same as BX, since matrix multiplication is not commutative.

The normal equations in matrix form are X’Y = X’XB. Provided X’X is invertible, we can solve for B as B = (X’X)^{-1} X’Y,
where (X’X)^{-1} is the inverse matrix of X’X.

Interestingly, Ŷ = XB, and substituting the value of B we get Ŷ = X(X’X)^{-1} X’Y. Let H = X(X’X)^{-1} X’, known as the hat matrix. The dimension of H is n*n, so Ŷ = HY.
The hat matrix H is symmetric and idempotent, i.e. HH = H.

The residuals Є, like the fitted values Ŷ, can be expressed as linear combinations of the observed responses Y.

Є = Y - Ŷ = Y – HY = (I – H)Y.

Now, the matrix normal equation can be derived by minimizing S = Є’Є = (Y - XB)'(Y - XB) with respect to B.
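As a quick numerical check, the closed-form solution and the hat matrix can be verified in a few lines of base R. This is only an illustrative sketch with simulated data (the variable names and values are made up):

```r
# Illustrative sketch: OLS via the normal equations and the hat matrix.
set.seed(1)
n <- 25
X <- cbind(1, rnorm(n, mean = 30, sd = 5))   # column of ones plus one regressor
Y <- 2 + 0.5 * X[, 2] + rnorm(n)             # simulated response

B <- solve(t(X) %*% X) %*% t(X) %*% Y        # B = (X'X)^{-1} X'Y
H <- X %*% solve(t(X) %*% X) %*% t(X)        # hat matrix H = X(X'X)^{-1} X'
Yhat <- H %*% Y                              # fitted values, Yhat = HY

# H is symmetric and idempotent, and B matches lm()'s coefficients:
all.equal(H, t(H))                                       # TRUE
all.equal(H %*% H, H)                                    # TRUE
all.equal(as.numeric(B), unname(coef(lm(Y ~ X[, 2]))))   # TRUE
```

The agreement with `lm()` confirms that the matrix formula and the least-squares fit are the same computation.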

In this blog I will show you GARCH model estimation, backtesting of the risk model, and forecasting. Here is my video:

STEP-1: Installing required packages and loading them as library in R

We will need the following packages; if you have not installed them yet, use the following command.
install.packages(c("tseries", "zoo", "forecast", "FinTS", "rugarch"))

Now, load these packages in your R.
library("tseries")
library("zoo")
library("forecast")
library("FinTS")
library("rugarch")

STEP-2: Downloading the Data

Let’s download the daily adjusted close data of the Dow Jones Industrial Average (^DJI) from the 1st of January 2010 till the 23rd of July 2015 (today is the 24th of July) from Yahoo Finance and store it as a zoo-class object named DJI.data.
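The download command itself is not shown above; one way to do it, sketched here with the get.hist.quote() function from the tseries package (which returns a zoo object), could look like this — the dates follow the text:

```r
# Sketch: fetch DJIA adjusted close prices from Yahoo Finance as a zoo object.
# Requires library(tseries); needs an internet connection.
DJI.data <- get.hist.quote(instrument = "^DJI",
                           start = "2010-01-01", end = "2015-07-23",
                           quote = "AdjClose", provider = "yahoo")
```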

You can use following command to plot that DJI.data object.
plot(DJI.data, main = "DJIA Closing Prices", ylab = "Price (USD)", xlab = "Date")

Fig-1

STEP-3: Getting the Returns of the Data

Let’s take the log returns of DJI.data, name the object DJI.ret, and plot the data. You can see in Fig-2 that highly volatile days tend to be followed by highly volatile days, and calm days by calm days.

DJI.ret <- diff(log(DJI.data)) * 100
plot(DJI.ret, main = "Daily Compound Returns", xlab = "Date", ylab = "Return in percent")

Fig-2

STEP-4: Finding Best Mean Model Using ARIMA

Now, let’s try to fit the best ARIMA (p,d,q) model based upon the Bayesian Information Criterion. Use the following command. Note that returns always revert to some mean return, hence you should expect the returns to be I(0); the ARIMA is then actually only an ARMA (p,q) process. Get the intuition on stationarity (here).
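The command itself does not appear above; with the forecast package loaded, a BIC-based search could be sketched as follows. The object name DJI.arima is my own illustrative choice, and DJI.ret is the return series from Step 3:

```r
# Sketch: pick the best ARMA(p,q) for the returns by BIC using auto.arima()
# from the forecast package; coredata() strips the zoo index.
DJI.arima <- auto.arima(coredata(DJI.ret), ic = "bic", stationary = TRUE)
summary(DJI.arima)
```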

The above command gave us an ARIMA(1,0,1), i.e. ARMA(1,1), model. Usually, high-frequency data fails the basic econometric assumptions of normality, no autocorrelation, and homoskedasticity.

STEP-5: Test for ARCH Effect

However, to proceed with a GARCH model, the residuals of the ARIMA must exhibit an ARCH effect. To test for ARCH, let's perform the Ljung-Box test on the first 12 lags of the squared residuals of the best ARIMA model, under the null hypothesis of no ARCH effects.

If the p-value of the Ljung-Box test is smaller than the 5% level of significance, then an ARCH effect exists, which gives the green light to proceed to GARCH.
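In code, this is a Ljung-Box test on the squared residuals. The sketch below is self-contained: it uses simulated white noise as a stand-in for the ARIMA residuals (in practice you would pass the residuals of the model fitted in Step 4 instead):

```r
# Sketch: Ljung-Box test for ARCH effects on squared residuals.
set.seed(1)
res <- rnorm(1000)   # stand-in for the residuals of the fitted ARIMA model
lb <- Box.test(res^2, lag = 12, type = "Ljung-Box")
lb$p.value           # a p-value below 0.05 would indicate ARCH effects
```

For pure white noise, as here, the test should not reject; on real daily returns it typically rejects strongly.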

STEP-6: Developing a GARCH Model

Instead of fitting a general GARCH(P,Q), in this blog I will only fit a GARCH(1,1) for the sake of simplicity. For GARCH theory, check my blog (here). Let's specify an object called res_garch11_spec in which we develop a GARCH(1,1) on the ARIMA(1,0,1), in our case ARMA(1,1).
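The specification command is not shown above; with rugarch, a GARCH(1,1) variance model on an ARMA(1,1) mean equation can be sketched as:

```r
# Sketch: GARCH(1,1) variance model with an ARMA(1,1) mean equation (rugarch).
res_garch11_spec <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model     = list(armaOrder = c(1, 1), include.mean = TRUE))
```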

res_garch11_fit <- ugarchfit(spec = res_garch11_spec, data = DJI.ret)
res_garch11_fit

STEP-7: Backtesting the risk model

Once we have fitted the GARCH, which models the risk, we would like to check the model's performance with a historical backtest. For that, we can compare the estimated VaR (value at risk) with the actual return over the period. If the return is more negative than the VaR, we have a VaR exceedance. In our case, a VaR exceedance should occur in only 1% of the cases if we specified a 99% confidence level: the 1% VaR is the loss threshold that should be exceeded with only 1% probability.

For that, let's implement a rolling window (refit.window = "moving") with a backtest start period of 120 (n.start = 120), re-estimated each day (refit.every = 1), using the hybrid solver (solver = "hybrid"), and calculate the VaR (calculate.VaR = TRUE) at the 99% VaR tail level (VaR.alpha = 0.01). Finally, let's keep the coefficients (keep.coef = TRUE). The model may fail to converge, so let's define ctrl = list(tol = 1e-7, delta = 1e-9) before estimating with the rolling window.
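Putting those options together, the rolling backtest can be sketched as follows (argument names follow the rugarch ugarchroll() interface; ctrl is passed as the solver control list):

```r
# Sketch: rolling daily re-estimation with VaR at the 1% tail (99% level).
ctrl <- list(tol = 1e-7, delta = 1e-9)
res_garch11_roll <- ugarchroll(res_garch11_spec, DJI.ret,
                               n.start = 120, refit.every = 1,
                               refit.window = "moving", solver = "hybrid",
                               calculate.VaR = TRUE, VaR.alpha = 0.01,
                               keep.coef = TRUE, solver.control = ctrl)
```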

STEP-8: Kupiec's Unconditional Coverage and the Christoffersen Test

We can examine the backtesting report using the report function. By specifying the type argument as VaR, the function executes the unconditional and conditional coverage tests for exceedances. VaR.alpha is the tail probability and conf.level is the confidence level on which the conditional coverage hypothesis test will be based.

report(res_garch11_roll, type = "VaR", VaR.alpha = 0.01, conf.level = 0.99)

To compare the number of expected versus actual exceedances given the tail probability of VaR, we can use Kupiec's unconditional coverage test, while the Christoffersen test is a joint test of the unconditional coverage and the independence of the exceedances. If the actual exceedances are significantly greater than the expected exceedances, then we reject the null hypothesis that the exceedances are correct and independent.

STEP-9: Plotting the VaR Limits

A plot of the backtesting performance can be generated easily using the following command and selecting 2.

plot(res_garch11_fit)
selection: 2

Fig-3

Here, in the figure, the returns of the data (blue) hit the 1% VaR (red) 47 times, compared to the 13 times expected. Hence we reject the null hypothesis that the exceedances are correct and independent. The p-value of Kupiec's unconditional coverage test is also less than 5%, suggesting the same. Hence the model shouldn't be used in real life, and forecasting with it is hazardous; however, we will forecast for the sake of learning. To mitigate the problem we should probably search for the best GARCH(P,Q) model rather than GARCH(1,1), or perhaps model other extensions of GARCH.

STEP-10: Forecasting Risk and VaR

For forecasting, we can implement the following command. Here we forecast just 12 time periods ahead.
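The forecasting command is not shown above; with rugarch it can be sketched as:

```r
# Sketch: forecast conditional volatility 12 periods ahead from the fitted model.
res_garch11_fcst <- ugarchforecast(res_garch11_fit, n.ahead = 12)
res_garch11_fcst   # prints the sigma (volatility) forecasts for T+1 ... T+12
```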

The forecasts of the volatility are given by sigma, and the 1% VaR is given alongside for each period. For period T+3, the expected volatility is 0.0665 and the 1% VaR is 0.7609.

Reference:
I used and modified the R codes from the book "Introduction to R for Quantitative Finance", jointly written by Gergely Daróczi, Michael Puhle, Edina Berlinger, Péter Csóka, Dániel Havran, Márton Michaletzky, Zsolt Tulassay, Kata Váradi and Agnes Vidovics-Dancs.

High-frequency time series generally exhibit volatility clustering. In econometrics, volatility in general means risk and is synonymous with the standard deviation of the data. To model the volatility of a high-frequency time series, a Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) process is used. Here is a figure of such volatility.

To run a GARCH, first develop the best ARIMA process. For all the tutorial blogs on ARIMA see (here). In ARIMA, we first make the data stationary (find the intuition of stationarity (here)), then we find an AR process (to capture the habit persistence of the time series) and an MA process (to capture the persistence of any errors left after the AR process).

An ARIMA (p,d,q) process is given as: φp(B)Δdyt = θq(B)εt (see my previous post on ARIMA (here)). Say a series is ${{Y}_{t}}$ and let it be transformed or differenced appropriately to be stationary; call this stationary version of ${{Y}_{t}}$ ${{R}_{t}}$. The ARMA $(p,q)$ process of ${{R}_{t}}$ can then be written as:
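The equation itself appears to be missing here; for a stationary series ${{R}_{t}}$, the ARMA $(p,q)$ process referred to in the text is conventionally written as:

$${{R}_{t}}={{\phi }_{1}}{{R}_{t-1}}+\cdots +{{\phi }_{p}}{{R}_{t-p}}+{{\varepsilon }_{t}}+{{\theta }_{1}}{{\varepsilon }_{t-1}}+\cdots +{{\theta }_{q}}{{\varepsilon }_{t-q}}$$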

Now, once we fit the best ARIMA (on what "best" means, see (here, here, here)), we are left with the residuals of the ARIMA, i.e. ${{\varepsilon }_{t}}$. Unconditionally, the error term (${{\varepsilon }_{t}}$) is a zero-mean white noise process. The conditional distribution of ${{\varepsilon }_{t}}$ is normal, $N(0,{{h}_{t}})$. The residuals should be free from non-normality, autocorrelation and heteroskedasticity, and they mostly become well behaved if we find the best ARIMA.

However, for high-frequency data the Autoregressive Conditional Heteroskedasticity (ARCH) effect persists, because highly volatile periods are followed by highly volatile periods and calm periods by calm periods. Hence the second step is to check whether such an ARCH effect persists in the residuals of the ARIMA. If ARCH persists, we move ahead to a GARCH process. A GARCH $(P,Q)$ can then be modeled with the $N(0,{{h}_{t}})$ white-noise error term (${{\varepsilon }_{t}}$).
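The variance equation seems to be missing here; the standard GARCH $(P,Q)$ conditional variance, consistent with the constraints stated below, is:

$${{h}_{t}}={{\alpha }_{0}}+\sum\limits_{m=1}^{P}{{{\alpha }_{m}}\varepsilon _{t-m}^{2}}+\sum\limits_{n=1}^{Q}{{{\beta }_{n}}{{h}_{t-n}}}$$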

where ${{\alpha }_{0}}>0$, ${{\alpha }_{m}},{{\beta }_{n}}\ge 0$ for all $m$ and $n$, and $\sum\limits_{m=1}^{P}{{{\alpha }_{m}}}+\sum\limits_{n=1}^{Q}{{{\beta }_{n}}}<1$.

We have a hypothetical economy in which two industries produce two commodities. These industries employ labor and capital as factors of production, which the household provides; in return, the firms pay factor incomes to the household. The household consumes all of its income in this hypothetical economy and demands the commodities produced by the firms. The household's utility is Cobb-Douglas, and the firms' production functions are also modeled as Cobb-Douglas.

These relationships in values are given in following IO table.

We develop the Header Array file based on this IO table. See the previous blog (here). The mathematical relationships are discussed and derived (here). The programming of such mathematical relation is modeled (here).

Now, in this blog I want to show you a simulation on this hypothetical economy. Specifically, what happens to this economy when there is a labor market shock? Say labor is expected to grow by 10% or shrink by 5%: what happens to the total income of the household? What happens to the prices of the factors of production? Which industry will specialize (or grow) and which will fade out? For the details, see this tutorial video.

Well, it took me some time to write this blog. Anyway, let's start with the logic of GEMPACK programming. To understand this blog you should understand all my previous blogs on CGE; therefore, this blog can be a bit rigorous.

I have explained about these codes and their logic in the video below.

I suggest you watch this video in full screen and in HD. I also request that you write all these codes on your own, but for those who are lazy, here is the file (here).

In the programming we:
1. define sets and subsets
2. load the input-output table or HAR file
3. define the variables
4. define the coefficients
5. read the data from the HAR file
6. define the formulas
7. define the equations

Points to be noted:
1. To save the file you have to use quotation marks. Say you want to save the file as ASDF.tab; you have to type: "ASDF.tab".

2. To check whether your codes have syntax errors, use the TABLO Check option.

3. Make sure you end each line of code with a semicolon.

In the next blog I will develop a command file to impose a shock on this hypothetical economy.
See you.