All mistakes are mine...
Given an AR(1) process, y_t = rho * y_{t-1} + e_t, we can estimate the autoregressive coefficients of many simulated series at once by stacking them into a single sparse, block-diagonal least-squares problem.
In [1]: import numpy as np
In [2]: np.set_printoptions(threshold=25)
In [3]: np.random.seed(1)
In [4]: # make 1000 autoregressive series of length 100
In [5]: # y_0 = 0 by assumption
In [6]: samples = np.zeros((100,1000))
In [7]: for i in range(1,100):
...: error = np.random.randn(1,1000)
...: samples[i] = .9 * samples[i-1] + .01 * error
...:
In [8]: from scipy import sparse
In [9]: from scipy.sparse import linalg as spla
In [10]: # make block diagonal sparse matrix of y_t-1
In [11]: # recommended to build as a linked list
In [12]: spX = sparse.lil_matrix((99*1000,1000))
In [13]: for i in range(1000):
....: spX[i*99:(i+1)*99,i] = samples[:-1,i][:,None]
....:
In [14]: spX = spX.tocsr() # convert it to csr for performance
In [15]: # do the least squares
In [16]: retval = spla.lsqr(spX, samples[1:,:].ravel('F'), calc_var=True)
In [17]: retval[0]
Out[17]:
array([ 0.88347438, 0.8474124 , 0.85282674, ..., 0.91019165,
0.89698465, 0.76895806])
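As a sanity check, the same 1000 coefficients can be recovered one series at a time with ordinary dense least squares; here's a minimal sketch reusing the samples array from the session above (for a single regressor with no constant, the OLS slope is just a ratio of sums):
ylag = samples[:-1, :]   # y_{t-1} for each series, shape (99, 1000)
ycur = samples[1:, :]    # y_t for each series, shape (99, 1000)
# no-constant OLS slope per column: sum(y_{t-1} * y_t) / sum(y_{t-1}**2)
rho_hat = (ylag * ycur).sum(axis=0) / (ylag ** 2).sum(axis=0)
# rho_hat should agree with retval[0] up to the solver's tolerance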
I'm curious if there's any downside to using sparse least squares whenever the RHS of a least squares can be written in block diagonal form.
Hi, Skipper! Could you explain what you mean by "whenever the RHS of a least squares can be written in block diagonal form"?
Sure. The form comes up from time to time while I'm working on some estimators: the right-hand side (the design matrix) is block diagonal and the left-hand side is a long column vector. To be a little more concrete, consider the following toy example.
import scikits.statsmodels as sm
from scipy.linalg import block_diag
import numpy as np
data = sm.datasets.longley.Load()
#Fair warning, the design of the datasets objects may change soon!
exog = sm.add_constant(data.exog, prepend=True)
endog = data.endog
#Now we can do
X = block_diag(*[exog]*3)
Y = np.tile(endog, 3)  # stack endog three times to match the 48 rows of X
results = sm.OLS(Y,X).fit()
results.params.reshape(3,-1)
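Since each of the three blocks uses the same Longley data, the stacked fit should simply reproduce the single-equation estimates; as a quick check (continuing the toy script above):
single = sm.OLS(endog, exog).fit()
# each of the three rows of the stacked parameter array should match the single fit
np.allclose(results.params.reshape(3, -1), single.params)  # expected: True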
In this case, it's really just a way to do many OLS regressions at the same time. I was doing some Monte Carlo experiments on the model Y = XB, where Y was a 98*1000 x 1 vector and X was a big block-diagonal matrix, and instead of looping over OLS fits I just did one big sparse least squares. Building the sparse block-diagonal matrix was kind of slow, though, and I wonder if it couldn't be optimized.
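On the construction speed: the slow part is filling the lil_matrix one block at a time. In the AR example above every block is a single column, so the CSC arrays can be written down directly; here's a rough sketch under that assumption (newer SciPy releases also provide scipy.sparse.block_diag for the general case):
import numpy as np
from scipy import sparse
nobs, nseries = 99, 1000
# column i holds samples[:-1, i] in rows i*nobs .. (i+1)*nobs - 1
data = samples[:-1, :].ravel('F')                 # values, column by column
indices = np.arange(nobs * nseries)               # row index of each value
indptr = np.arange(0, nobs * nseries + 1, nobs)   # where each column starts
spX_fast = sparse.csc_matrix((data, indices, indptr),
                             shape=(nobs * nseries, nseries))
# spX_fast should contain exactly the same entries as the spX built above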