Monday, May 31, 2010

Week 1 GSoC Update

Last week was the first week of the Google Summer of Code. I spent most of it in a Bayesian econometrics class led by John Geweke and studying for a comprehensive exam that I take this week, so progress on statsmodels was rather slow. That said, I have been able to take care of some low-hanging fruit.

There are a few name changes:

statsmodels/family -> statsmodels/families
statsmodels/lib/ -> statsmodels/iolib/

Also, Vincent has done a good bit of work on improving our output using the SimpleTable class from econpy. I will post some examples over the coming weeks, but SimpleTable provides an easy way to make tables in ASCII text, HTML, or LaTeX. The SimpleTable class has been moved:

statsmodels/sandbox/ -> statsmodels/iolib/
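To give a rough idea of what "one table, three output formats" means in practice, here is a minimal standard-library sketch. It is not SimpleTable's actual API: the helper names render_text, render_html, and render_latex are hypothetical, and the numbers in the table are made up for illustration.

```python
from html import escape


def render_text(headers, rows):
    """Render a table as right-aligned ASCII text."""
    cells = [list(headers)] + [[str(v) for v in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]
    rule = "=" * (sum(widths) + 2 * (len(widths) - 1))
    lines = [rule]
    for i, row in enumerate(cells):
        lines.append("  ".join(c.rjust(w) for c, w in zip(row, widths)))
        if i == 0:
            # Separate the header row from the data rows.
            lines.append("-" * len(rule))
    lines.append(rule)
    return "\n".join(lines)


def render_html(headers, rows):
    """Render the same table as a bare HTML <table>."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def render_latex(headers, rows):
    """Render the same table as a LaTeX tabular (no escaping of special characters)."""
    lines = [r"\begin{tabular}{" + "r" * len(headers) + "}", r"\hline"]
    lines.append(" & ".join(headers) + r" \\")
    lines.append(r"\hline")
    for row in rows:
        lines.append(" & ".join(str(v) for v in row) + r" \\")
    lines.append(r"\hline")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


# Made-up values, only to have something to print.
headers = ["coef", "std err"]
rows = [[1.96, 0.05], [0.22, 0.17]]

print(render_text(headers, rows))
print(render_html(headers, rows))
print(render_latex(headers, rows))
```

The point of the design is that the data and headers are stored once and each output format is just a different renderer over the same cells.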

Beyond the renames, I have removed the soft dependency on RPy for running our tests in favor of hard-coded results, refactored our tests, and added a few additional ones along the way.
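The idea behind testing against hard-coded results can be sketched as follows. This is only an illustration, not code from the statsmodels test suite: fit_line is a hypothetical stand-in for an estimator, and the reference values here were worked out by hand for this tiny data set rather than taken from R.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for a single regressor; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Illustrative data and reference values stored with the test, so that
# no external package has to be installed to run the check.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.2, 4.1, 6.2, 7.9, 10.1]
EXPECTED_INTERCEPT = 0.22
EXPECTED_SLOPE = 1.96


def test_fit_line_matches_reference():
    intercept, slope = fit_line(X, Y)
    assert math.isclose(slope, EXPECTED_SLOPE, rel_tol=1e-9)
    assert math.isclose(intercept, EXPECTED_INTERCEPT, rel_tol=1e-9)


test_fit_line_matches_reference()
```

In a real suite the expected numbers would come from a one-time run of the reference software and be saved alongside the tests, which is what makes the optional dependency unnecessary at test time.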

We are also making an effort to keep our online documentation synced with the current trunk. The biggest change to our documentation is the addition of a developer's page for those who might like to get involved. As always, please report problems with the docs on the scipy-user list, or join in the discussions of statsmodels, pandas, larry, and other topics on statistics and Python at the pystatsmodels Google group.

Saturday, May 1, 2010

Plans for the Summer

A quick update on the plans for statsmodels over the next few months.

I have been accepted for my second Google Summer of Code, which means that we will have a chance to make a big push to get a lot of our work out of the sandbox, tested, and included in the main code base.

You can see the roadmap on Google's GSoC site here. You might have to log in to view it.

The quick version follows. As for general issues, I will be getting the code ready for Python 3 and focusing on some design issues, including an improved generic maximum likelihood framework, post-estimation testing, variable name handling, and output in text tables, LaTeX, and HTML. I will then be working to get a lot of our code out of the sandbox. This includes time series convenience functions and models such as GARCH, VARMA, the Hodrick-Prescott filter, and a state space model that uses the Kalman filter. I will be polishing the systems of equations framework and the panel (longitudinal) data estimators. We have also been working on some nonparametric estimators, including univariate kernel density estimators and kernel regression estimators. Finally, as part of my coursework I have been working toward (generalized) maximum entropy models that I hope to include, as well as some work on the scipy.maxentropy module.

I will give a quick talk on the project at the SciPy Conference in Austin.

It looks like we are set to make a good deal of progress on the code this summer.