Monday, August 22, 2011

SVAR Estimation

For the next phase of my GSoC project, I integrated SVAR estimation with restrictions on the within-period effects and identification of the structural shocks. This follows the method outlined in section 11.6 of Hamilton (1994) [1]. To show how the system works, I will work through an example.

Structural VARs of the type I will be positing here are best suited to situations where the shocks behind the impulse response functions need to be identified, but there is little to no theoretical justification for the within-period dynamics required for orthogonalization. While this allows the researcher greater flexibility in specifying the model, it also leaves greater room for error, especially in deciding which parameters can be assumed to be zero and which should be estimated, and whether or not that specification meets the criteria for shock identification.

Given time series data, initializing the SVAR class requires (at a minimum):

1) svar_type
2) A and/or B matrices, with an 'E' or 'e' marking the parameters that are unknown and must be estimated

The svar_type must be 'A', 'B', or 'AB'. An 'A' system assumes that the matrix premultiplying the structural errors (B) is the identity matrix, and vice versa for a 'B' system, in which A is the identity. In an 'AB' system, elements of both the A and B matrices need to be estimated. The system can be summarized as:

y_t = A_1 y_{t-1} + ... + A_n y_{t-n} + u_t
A u_t = B \epsilon_t,    \epsilon_t ~ (0, I_K)

where u_t are the reduced-form residuals and \epsilon_t are the orthogonal structural shocks.
We estimate A and B in a second stage, after A_1, ..., A_n and \Sigma_u have been estimated via OLS, by maximizing the (concentrated) log-likelihood:

ln L(A, B) = -(KT/2) ln(2\pi) + (T/2) ln|det(A)|^2 - (T/2) ln|det(B)|^2 - (T/2) tr(A' B'^{-1} B^{-1} A \hat{\Sigma}_u)

where K is the number of variables and T the number of observations.
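As a rough sketch of what that second stage computes (a hypothetical helper, not the statsmodels implementation; sigma_u and nobs stand in for the OLS residual covariance and the sample size), the concentrated log-likelihood above could be evaluated as:

import numpy as np
import numpy.linalg as npl

def ab_loglike(A, B, sigma_u, nobs):
    # Concentrated log-likelihood of the AB model A u_t = B e_t,
    # evaluated at candidate A and B given the OLS residual
    # covariance sigma_u and the sample size nobs.
    K = sigma_u.shape[0]
    B_inv = npl.inv(B)
    # tr(A' B'^{-1} B^{-1} A Sigma_u)
    trace_term = np.trace(np.dot(np.dot(np.dot(A.T, B_inv.T),
                                        np.dot(B_inv, A)), sigma_u))
    return (-K * nobs / 2. * np.log(2 * np.pi)
            + nobs / 2. * np.log(npl.det(A) ** 2)
            - nobs / 2. * np.log(npl.det(B) ** 2)
            - nobs / 2. * trace_term)

The solver maximizes this function over the free ('E') elements of A and B.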
We could set up an example system as:

In [1]: import numpy as np

In [2]: A = np.array([[1, 'E', 0], [1, 'E', 'E'], [0, 0, 1]])

In [3]: A
Out[3]:
array([['1', 'E', '0'],
       ['1', 'E', 'E'],
       ['0', '0', '1']],
      dtype='|S1')

In [4]: B = np.array([['E', 0, 0], [0, 'E', 0], [0, 0, 'E']])

In [5]: B
Out[5]:
array([['E', '0', '0'],
       ['0', 'E', '0'],
       ['0', '0', 'E']],
      dtype='|S1')

To aid the numerical maximum likelihood estimation, the SVAR class's fit method can also be passed starting guesses for the free A and B parameters (A_guess and B_guess in the example below).

Building on the three-variable macro system used previously, a simple example of estimating an AB system is as follows:

import numpy as np
import scikits.statsmodels.api as sm
from scikits.statsmodels.tsa.api import VAR, SVAR
import matplotlib.pyplot as plt

mdata = sm.datasets.macrodata.load().data
mdata = mdata[['realgdp', 'realcons', 'realinv']]
names = mdata.dtype.names
data = mdata.view((float, 3))
data = np.diff(np.log(data), axis=0)

# define structural inputs: 'E' marks the parameters to be estimated
A = np.asarray([[1, 0, 0], ['E', 1, 0], ['E', 'E', 1]])
B = np.asarray([['E', 0, 0], [0, 'E', 0], [0, 0, 'E']])

# starting guesses for the free parameters
A_guess = np.asarray([0.5, 0.25, -0.38])
B_guess = np.asarray([0.5, 0.1, 0.05])

mymodel = SVAR(data, svar_type='AB', A=A, B=B, names=names)
res = mymodel.fit(A_guess=A_guess, B_guess=B_guess, maxlags=3,
                  maxiter=10000, maxfun=10000, solver='bfgs')
res.irf(periods=30).plot(impulse='realgdp', plot_stderr=False)
plt.show()

From here, we can access the estimates of both A and B:

In [2]: res.A
Out[2]:
array([[ 1.        ,  0.        ,  0.        ],
       [-0.50680204,  1.        ,  0.        ],
       [-5.53605672,  3.04117688,  1.        ]])

In [3]: res.B
Out[3]:
array([[ 0.00757566,  0.        ,  0.        ],
       [ 0.        ,  0.00512051,  0.        ],
       [ 0.        ,  0.        ,  0.02070894]])

This also produces the SVAR impulse response functions resulting from an impulse shock to realgdp:

[Figure: SVAR impulse responses of realgdp, realcons, and realinv to a shock to realgdp]

We can check our work by comparing it to comparable packages (see below) and also by performing some simple calculations. If we have estimated A and B correctly, then:

A^{-1} B (A^{-1} B)' = \hat{\Sigma}_u

i.e., with P = A^{-1} B, the product P P' should equal the reduced-form residual covariance matrix.

We find:

In [9]: import numpy.linalg as npl

In [10]: P = np.dot(npl.inv(res.A), res.B)

In [11]: np.dot(P, P.T)
Out[11]:
array([[  5.73906806e-05,   2.90857141e-05,   2.29263262e-04],
       [  2.90857141e-05,   4.09603703e-05,   3.64524316e-05],
       [  2.29263262e-04,   3.64524316e-05,   1.58721639e-03]])

In [12]: res.sigma_u
Out[12]:
array([[  5.73907395e-05,   2.90857557e-05,   2.29263451e-04],
       [  2.90857557e-05,   4.09604397e-05,   3.64524456e-05],
       [  2.29263451e-04,   3.64524456e-05,   1.58721761e-03]])
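As a small extra check (an addition, not part of the original session), the agreement can be confirmed programmatically; given the values printed above, the default tolerances of numpy's allclose are met:

# compare the covariance implied by the structural estimates with the OLS estimate
np.allclose(np.dot(P, P.T), res.sigma_u)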

Through much trial and error, the 'bfgs' solver seems to be the best choice for the maximum likelihood problem; I would like to investigate in the future why this is. Estimating the likelihood was aided greatly by reusing the likelihood methods already present in the base component of the statsmodels package.
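One quick way to probe this, sketched below under the assumption that the other standard statsmodels solver names ('nm' for Nelder-Mead and 'powell') are accepted by fit, is to continue from the script above, re-estimate the system with each solver, and compare the structural estimates they converge to:

# Re-estimate the AB system with several likelihood solvers and
# compare the A and B estimates they converge to.
for solver in ['bfgs', 'nm', 'powell']:
    res = mymodel.fit(A_guess=A_guess, B_guess=B_guess, maxlags=3,
                      maxiter=10000, maxfun=10000, solver=solver)
    print(solver)
    print(res.A)
    print(res.B)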

An equivalent system can be estimated and plotted with the following R [2] script:

library("vars") #data <- read.csv("/home/skipper/statsmodels/statsmodels-skipper/scikits/statsmodels/datasets/macrodata/macrodata.csv") data <- read.csv("/home/bart/statsmodels/scikits/statsmodels/datasets/macrodata/macrodata.csv") names <- colnames(data) data <- log(data[c('realgdp','realcons','realinv')]) data <- sapply(data, diff) data = ts(data, start=c(1959,2), frequency=4) amat <- matrix(0,3,3) amat[1,1] <- 1 amat[2,1] <- NA amat[3,1] <- NA amat[2,2] <- 1 amat[3,2] <- NA amat[3,3] <- 1 bmat <- diag(3) diag(bmat) <- NA svar <- SVAR(var, estmethod = 'scoring', Bmat=bmat, Amat=amat) plot(irf(svar, n.ahead=30, impulse = 'realgdp')) #myirf <- plot(irf(myvar, impulse = "realgdp", response = c("realgdp", "realcons", "realinv"), boot=TRUE, n.ahead=30, ci=0.95)) #plot.irf()

This is the end of my Google Summer of Code project. In the future, I hope to continue work on SVAR, bring in long-run restrictions à la Blanchard and Quah, and further test the solvers and their performance. I have benefited a lot from this project and would like to sincerely thank my mentors: Skipper Seabold, Josef Perktold, and Alan Isaac. They have been a great support and have answered my questions swiftly and completely. All blame for failing to complete the goals I set for myself at the beginning of the summer rests on me alone. Not only have I learned a lot about time series econometrics, but also quite a bit about how community software development works, and especially about realistic timelines. This has been an invaluable experience and I plan to further improve my contributions to the project in the coming year.

Bart Baker

[1] Hamilton, James. 1994. Time Series Analysis. Princeton, NJ: Princeton University Press.
[2] R Development Core Team. 2011. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.