Monday, August 31, 2009

scikits.statsmodels Release Announcement

We have been working hard to get the statsmodels code ready for general consumption, and we're happy to announce that a (very) beta release is ready.


The statsmodels code was started by Jonathan Taylor and was formerly included as part of scipy. It was taken up to be tested, corrected, and extended as part of the Google Summer of Code 2009.

What it is

We are now releasing the efforts of the last few months under the scikits namespace as scikits.statsmodels. Statsmodels is a pure Python package that requires NumPy and SciPy. It offers a convenient interface for fitting parameterized statistical models, with growing support for displaying univariate and multivariate summary statistics, regression summaries, and (postestimation) statistical tests.

Main Features

* regression: Generalized least squares (including weighted least squares and least squares with autoregressive errors), ordinary least squares.
* glm: Generalized linear models with support for all of the one-parameter exponential family distributions.
* rlm: Robust linear models with support for several M-estimators.
* datasets: Datasets to be distributed and used for examples and in testing.
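
To make the regression bullet a little more concrete, here is a minimal NumPy sketch of the estimates that ordinary and weighted least squares compute. It deliberately does not use the scikits.statsmodels API: the helper names `ols_fit` and `wls_fit` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def ols_fit(X, y):
    """Ordinary least squares: minimize ||y - X b||^2 (hypothetical helper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def wls_fit(X, y, weights):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i b)^2.

    Scaling each row by sqrt(w_i) reduces the problem to plain OLS.
    """
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    return ols_fit(X * sqrt_w[:, None], y * sqrt_w)


# Toy data (made up): y = 1 + 2*x with no noise.
x = np.arange(5, dtype=float)
X = np.column_stack([np.ones_like(x), x])  # constant column plus one regressor
y = 1.0 + 2.0 * x

beta_ols = ols_fit(X, y)
beta_wls = wls_fit(X, y, weights=[1.0, 2.0, 1.0, 0.5, 4.0])
```

Because the toy data lie exactly on a line, both fits should recover the same intercept and slope regardless of the weights; with noisy data the weights would shift the estimate toward the observations that are trusted more.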

There is also a sandbox which contains code for generalized additive models, mixed effects models, and the Cox proportional hazards model (all untested; the latter two still depend on the nipy formula framework), as well as for generating descriptive statistics and printing table output to ASCII, LaTeX, and HTML. None of this code is considered "production ready".

Where to get it

Development branches will be on LaunchPad. This is where to go to get the most up to date code in the trunk branch. Experimental code will also be hosted here in different branches.

Source download of stable tags will be on SourceForge.




License

Simplified BSD

Documentation

The official documentation is hosted on SourceForge.

The Sphinx docs are currently undergoing a lot of work. They are not yet comprehensive but should get you started.

This blog will continue to be updated as we make progress on the code.

Discussion and Development

All chatter will take place on the or scipy-user mailing list. We are very interested in receiving feedback about usability, suggestions for improvements, and bug reports via the mailing list or the bug tracker at

Thursday, August 27, 2009

GSoC Is Over

Whoa, where did the last month go? The Google Summer of Code 2009 officially ended this Monday. Though I haven't taken a breath to update the blog, we (Josef and I) have been hard at work on the models code.

We have working and tested versions of Generalized Least Squares, Weighted Least Squares, Ordinary Least Squares, Robust Linear Models with several M-estimators, and Generalized Linear Models with support for all (almost all?) one-parameter exponential family distributions. We have also provided some more convenience functions, created a standalone Python package for the models code, and obtained permission to distribute a few more datasets. Due to a lack of time, there is only experimental (read: untested) support for autoregressive models, mixed effects models, generalized additive models, and convenience functions for returning strings (possibly HTML and LaTeX output as well) with regression results and descriptive statistics. I will continue to work on these as I find time.
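
For readers who have not met M-estimators before, the idea behind a robust linear model can be sketched as iteratively reweighted least squares with Huber weights. This is only an illustration under stated assumptions, not the statsmodels implementation: the function names, the tuning constant default of 1.345, the MAD-based scale estimate, and the toy data are all choices made for this example.

```python
import numpy as np


def huber_weights(scaled_resid, t=1.345):
    """Huber weights: 1 for |r| <= t, t/|r| beyond that."""
    abs_r = np.abs(scaled_resid)
    return np.where(abs_r <= t, 1.0, t / np.maximum(abs_r, 1e-12))


def rlm_huber(X, y, t=1.345, max_iter=50, tol=1e-8):
    """Hypothetical helper: Huber M-estimate via iteratively reweighted LS."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS fit
    for _ in range(max_iter):
        resid = y - X @ beta
        # Robust scale: normalized median absolute deviation of the residuals.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break  # residuals are (almost) all identical; nothing to reweight
        sqrt_w = np.sqrt(huber_weights(resid / scale, t))
        new_beta = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Toy data (made up): y = 1 + 2*x plus small deterministic wiggle,
# with one gross outlier in the last observation.
x = np.arange(20, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.1 * np.sin(3.0 * x)
y[-1] += 30.0

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_rlm = rlm_huber(X, y)
```

On data like this, the OLS slope is pulled noticeably toward the outlier, while the Huber fit downweights that observation and stays close to the underlying slope of 2.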

I will soon post a note on the progress that was made in the robust linear models code. Also, look out for a (semi-)official release of the code in the next few days. We have decided to name the project statsmodels and distribute it as a scikit. We need to finalize the documentation (should be ready to go in the next day or so... I am back to taking courses) and clean up some of the usage examples, so people can jump right in and use the code, give feedback, and hopefully contribute extensions and new models.

As for the future of statsmodels, over the next few weeks we will be discussing the immediate extensions that we would like to make. It's looking like I will be wearing my microeconometrician hat this semester in my own coursework. More specifically, I will probably be working with cross-sectional and panel data models for household survey data in my own research and finding some time for time series models as part of my teaching assistantship. Josef has also mentioned wanting to work more with time series models.

If anyone (especially those from other disciplines) would like to contribute or see some extensions (my apologies to those who have made requests that I haven't yet been able to accommodate), feel free to post to the scipy-dev mailing list. I'm more than happy to discuss/debate with users and potential developers the design decisions that have been made, as I think the code is still in an unsettled enough state to merit some discussion.