## Monday, August 22, 2011

### SVAR Estimation

For the next phase of my GSoC project, I integrated SVAR estimation with restrictions on the within period effects and shock identification. This follows the method outlined in section 11.6 of Hamilton (1994) [1]. In order to show how the system works, I will work through an example.

Structural VARs of the type I will be positing here are best suited to situations where impulse response function shocks need to be identified, but there is little to no theoretical justification for the within period dynamics required for orthogonalization. While this allows greater flexibility to the researcher in specifying the model, it also leaves greater room for error, especially when specifying what parameters can be assumed zero and what should be estimated, and whether or not this specification meets the criteria for shock identification.

Given time series data, SVAR class, initiation requires (at a minimum):

1) svar_type
2) A, B matrices marking with an 'E' or 'e' where parameters are unknown

This must be A,B, or AB. An 'A' system assumed that the matrix premulitplying the error matrix is the identity matrix and vice-versa for the 'B' system. In an 'AB' system, elements in both the A and B matrix need to be estimated. The system can be summarized as:

$Ay_t = c + A_1y_{t-1} + A_2y_{t-2} + \cdots + A_py_{t-p} + B\varepsilon_t$

We estimate A, B in a second stage after A_1-A_n, /Sigma_u are estimated via OLS, by maximizing the log-likelihood:

$\dpi{80} \log{L} = - \frac{Tn}{2}\log(2π) + \frac{T}{2}\log|A|^2 - \frac{T}{2}\ln|B|^2 - \frac{T}{2}trace(A'B'^{-1}B^{-1}A\Sigma_u)$

We could set up an example system as:

In [1]: import numpy as np

In [2]: A = np.array([[1, 'E', 0], [1, 'E', 'E'], [0, 0, 1]])

In [3]: A
Out[3]:
array([['1', 'E', '0'],
['1', 'E', 'E'],
['0', '0', '1']],
dtype='|S1')

In [4]: B = np.array([['E', 0, 0], [0, 'E', 0], [0, 0, 'E']])

In [5]: B
Out[5]:
array([['E', '0', '0'],
['0', 'E', '0'],
['0', '0', 'E']],
dtype='|S1')



In order to aid numerical maximum likelihood estimation, the SVAR class fit can also be passed guess matrices for both A and B parameters.

Building upon the previously used three variable macro system, a simple example of an AB system estimation is as follows:

import numpy as np
import scikits.statsmodels.api as sm
from scikits.statsmodels.tsa.api import VAR, SVAR
import matplotlib.pyplot as plt

mdata = mdata[['realgdp','realcons','realinv']]
names = mdata.dtype.names
data = mdata.view((float,3))
data = np.diff(np.log(data), axis=0)

#define structural inputs
A = np.asarray([[1, 0, 0],['E', 1, 0],['E', 'E', 1]])
B = np.asarray([['E', 0, 0], [0, 'E', 0], [0, 0, 'E']])
A_guess = np.asarray([0.5, 0.25, -0.38])
B_guess = np.asarray([0.5, 0.1, 0.05])
mymodel = SVAR(data, svar_type='AB', A=A, B=B, names=names)
res = mymodel.fit(maxlags=3, maxiter=10000, maxfun=10000, solver='bfgs')
res.irf(periods=30).plot(impulse='realgdp', plot_stderr=False)



From here, we can access the estimates of both A and B:

In [2]: res.A
Out[2]:
array([[ 1.        ,  0.        ,  0.        ],
[-0.50680204,  1.        ,  0.        ],
[-5.53605672,  3.04117688,  1.        ]])

In [3]: res.B
Out[3]:
array([[ 0.00757566,  0.        ,  0.        ],
[ 0.        ,  0.00512051,  0.        ],
[ 0.        ,  0.        ,  0.02070894]])



This also produces a SVAR irf, resulting from a impulse shock to realgdp:

We can check our work by comparing our work to comparable packages (see below) and also performing some simple calculations. If we calculated A and B correctly, then:

$\Sigma_u = A^{-1}BB(A^{-1})'$

We find:

In [10]: P = np.dot(npl.inv(res.A), res.B)

In [11]: np.dot(P,P.T)
Out[11]:
array([[  5.73906806e-05,   2.90857141e-05,   2.29263262e-04],
[  2.90857141e-05,   4.09603703e-05,   3.64524316e-05],
[  2.29263262e-04,   3.64524316e-05,   1.58721639e-03]])

In [12]: res.sigma_u
Out[12]:
array([[  5.73907395e-05,   2.90857557e-05,   2.29263451e-04],
[  2.90857557e-05,   4.09604397e-05,   3.64524456e-05],
[  2.29263451e-04,   3.64524456e-05,   1.58721761e-03]])



Through much trial, it seems like the 'bfgs' estimator is the best for solving the maximum likelihood problem. I would like to investigate in the future why this may be. Estimating the likelihood was aided greatly by bringing in the likelihood methods already present in the base component of the statsmodels package.

This equivalent system can be estimated and plotted in the R [2] script:

library("vars")

names <- colnames(data)
data <- log(data[c('realgdp','realcons','realinv')])
data <- sapply(data, diff)
data = ts(data, start=c(1959,2), frequency=4)

amat <- matrix(0,3,3)
amat[1,1] <- 1
amat[2,1] <- NA
amat[3,1] <- NA
amat[2,2] <- 1
amat[3,2] <- NA
amat[3,3] <- 1
bmat <- diag(3)
diag(bmat) <- NA
svar <- SVAR(var, estmethod = 'scoring', Bmat=bmat, Amat=amat)