Monday, March 8, 2010

Sparse Least Squares Solver

I have a homework doing some monte carlo experiments of an autoregressive process of order 1, and I thought I would use it as an example to demonstrate the sparse least squares solver that Stefan committed to scipy revision 6251.

All mistakes are mine...

Given an AR(1) process

$y_{t}=.9y_{t-1} + .01\epsilon_{t} \newline \newline \text{where} \newline \newline \epsilon_{t} \sim \boldsymbol{\text{NID} }\left(0,1\right)$

We can estimate the following autoregressive coefficients.

In [1]: import numpy as np

In [2]: np.set_printoptions(threshold=25)

In [3]: np.random.seed(1)

In [4]: # make 1000 autoregressive series of length 100

In [5]: # y_0 = 0 by assumption

In [6]: samples = np.zeros((100,1000))

In [7]: for i in range(1,100):
...: error = np.random.randn(1,1000)
...: samples[i] = .9 * samples[i-1] + .01 * error
...:

In [8]: from scipy import sparse

In [9]: from scipy.sparse import linalg as spla

In [10]: # make block diagonal sparse matrix of y_t-1

In [11]: # recommended to build as a linked list

In [12]: spX = sparse.lil_matrix((99*1000,1000))

In [13]: for i in range(1000):
....: spX[i*99:(i+1)*99,i] = samples[:-1,i][:,None]
....:

In [14]: spX = spX.tocsr() # convert it to csr for performance

In [15]: # do the least squares

In [16]: retval = spla.isolve.lsqr(spX, samples[1:,:].ravel('F'), calc_var=True)
In [17]: retval[0]
Out[17]:
array([ 0.88347438, 0.8474124 , 0.85282674, ..., 0.91019165,
0.89698465, 0.76895806])

I'm curious if there's any downside to using sparse least squares whenever the RHS of a least squares can be written in block diagonal form.