Monday, March 8, 2010

Sparse Least Squares Solver

I have a homework assignment involving some Monte Carlo experiments on an autoregressive process of order 1, and I thought I would use it as an example to demonstrate the sparse least squares solver that Stefan committed to scipy in revision 6251.

All mistakes are mine...

Given an AR(1) process

    y_t = \rho \, y_{t-1} + \varepsilon_t, \qquad y_0 = 0, \quad \varepsilon_t \sim N(0, \sigma^2)

we can estimate the following autoregressive coefficients.
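For a single series of length T (here T = 100), the least squares estimate is just the usual no-intercept regression slope of y_t on y_{t-1}; this is the quantity the stacked problem below recovers once per series:

    \hat{\rho} = \frac{\sum_{t=1}^{T-1} y_{t-1}\, y_t}{\sum_{t=1}^{T-1} y_{t-1}^2}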

In [1]: import numpy as np

In [2]: np.set_printoptions(threshold=25)

In [3]: np.random.seed(1)

In [4]: # make 1000 autoregressive series of length 100

In [5]: # y_0 = 0 by assumption

In [6]: samples = np.zeros((100,1000))

In [7]: for i in range(1,100):
   ...:     error = np.random.randn(1,1000)
   ...:     samples[i] = .9 * samples[i-1] + .01 * error
   ...:

In [8]: from scipy import sparse

In [9]: from scipy.sparse import linalg as spla

In [10]: # make block diagonal sparse matrix of y_t-1

In [11]: # recommended to build as a linked list

In [12]: spX = sparse.lil_matrix((99*1000,1000))

In [13]: for i in range(1000):
   ....:     spX[i*99:(i+1)*99,i] = samples[:-1,i][:,None]
   ....:

In [14]: spX = spX.tocsr() # convert it to csr for performance

In [15]: # do the least squares

In [16]: retval = spla.isolve.lsqr(spX, samples[1:,:].ravel('F'), calc_var=True)
In [17]: retval[0]
Out[17]:
array([ 0.88347438, 0.8474124 , 0.85282674, ..., 0.91019165,
        0.89698465, 0.76895806])
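To convince myself that the stacked sparse solve agrees with running each regression on its own, here is a quick sanity check (a sketch of my own; it loops over the series with np.linalg.lstsq, so it is slow but independent of the sparse machinery):

# Sanity check (sketch): estimate each series separately with dense least squares
# and compare against the stacked sparse solution computed above.
import numpy as np

rho_dense = np.empty(1000)
for i in range(1000):
    X = samples[:-1, i][:, None]   # regressor y_{t-1} for series i, shape (99, 1)
    y = samples[1:, i]             # response  y_t    for series i, shape (99,)
    rho_dense[i] = np.linalg.lstsq(X, y)[0][0]

print(np.allclose(rho_dense, retval[0]))   # True, up to the iterative solver's tolerance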

I'm curious whether there's any downside to using the sparse least squares solver whenever the design matrix of a least squares problem can be written in block diagonal form.
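One clear upside, at least, is memory: the stacked design matrix is 99000 x 1000, but only 99 entries per column are nonzero. A rough back-of-the-envelope comparison (my own sketch, counting 8 bytes per float64 and ignoring the sparse index overhead):

# Rough memory comparison (sketch): dense stacked design matrix vs. its nonzero entries only.
n_series, n_obs = 1000, 99
dense_bytes  = (n_obs * n_series) * n_series * 8   # 99000 x 1000 float64 entries
sparse_bytes = n_obs * n_series * 8                # only the 99000 nonzeros
print("dense: %.0f MB, sparse: %.2f MB (data only)" % (dense_bytes / 1e6, sparse_bytes / 1e6))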