Wednesday, July 1, 2009

Econometrics with Python

There is as yet no equivalent of R in applied econometrics. Therefore, the econometric community can still decide to go along the Python path.

That is Drs. Christine Choirat and Raffaello Seri writing in the April issue of the Journal of Applied Econometrics. They have been kind enough to provide me with an ungated copy of their review, "Econometrics with Python." Pointing to the, quite frankly, redundant general-purpose programming functions and tools that had to be implemented for R, the authors make a nice case for Python as the programming language of choice for applied econometrics. The article gives a quick overview of some of the advantages of using Python and its many built-in libraries, extensions, and tools, offers some speed comparisons, and mentions a few of the many econometrics tools available in the Python community, including RPy (RPy2 is now available) and, of course, NumPy and SciPy. Having spent the last week or more trying to master the basic syntax and usage of R, I very much sympathize with this position. The one complaint I hear most often from my fellow students is that Python is not an industry standard. I hope this can change and is changing, because Python is much more of a pleasure to work with than the alternatives, and that makes for increased productivity.


  1. Thanks for your post! The reference you mentioned seems very interesting, as I would love to switch from R to Python, preferably without RPy.

    A library that I found very interesting for Econometrics is called 'Apophenia'. It is a C library, written by Ben Klemens. Maybe it would be a good idea to create a good interface between Python and Apophenia, as it is quite powerful and fast (and ready!). However, it depends on the GNU Scientific Library. Anyway, just thoughts.

  2. The book doesn't have any formal econometrics, just graphs. It is about correlations and stuff, but the relations are very visible and highly suggestive. Of course a more thorough econometric investigation would be interesting, but I don't think the lack thereof invalidates the results.

  3. What do people dislike about R the language? I am an avid user of Python, but I must admit there are many things in R that I am jealous of. Its lists are so much more convenient (combining properties of Python's lists, dicts and objects), with the sexy keyword syntax (list(a=1, b=2, c=3) allows for attribute access, slicing and name lookup. To do this in Python we would need to change the way keyword arguments are passed to functions/methods, which is not going to happen anytime soon).

    R also has a much more powerful argument/keyword syntax for functions (the arguments are evaluated lazily, and it has keyword-only arguments, which we won't get until Python 3000 is the standard). Finally, generic functions seem to make a lot more sense in many scientific libraries (instead of just classes/single dispatch), and once again Python is thinking of adding this, but it is currently a dead PEP.

    So in short, what do people see Python having that R does not for scientific programming? Is it simply the standard library? If so, I don't see what the big wins are, as R has regex, powerful string manipulation and growing database support. GUIs are a win for Python, but I find this to be niche for the people I work with.

  4. "There is as yet no equivalent of R in applied econometrics. Therefore, the econometric community can still decide to go along the Python path."

    Amen. Here's hoping it happens.


  5. For some reason, I never saw these comments. To explain a little about what I prefer in Python over R: the switch to Python 3 is coming sooner rather than later, so that addresses a few of your comments. I find the OOP in R to be ad hoc and confusing compared to Python (maybe it's just more nuanced, but I don't see any advantages here). For your list(a=1, b=2, c=3), you can do this with NumPy's structured arrays right now, and there are a few other projects working on more statistically convenient data structures, notably pandas and larry, along with collections.namedtuple in Python 2.6. Python just seems *much* more general than R. The syntax is nicer. It can be a replacement for anything from Matlab to GAMS. The database handling and large data handling are already there. It's used by so many more people than just statisticians that every conceivable use case is well thought out, and this makes the language more dynamic. Further, I find the learning curve to be much less steep, so it can be used as a teaching tool. Then there's the fact that you can do function modelling, symbolic algebra, visualization, machine learning, optimization, TeX integration, web integration, etc. all in one stop. And for free (I tend to favor the more BSD-style licenses over GPL-style). I also find it pretty easy to integrate lower-level languages into Python, and there are several ways to do this. Most of all, Python is fun! And I can also hack away on my system with Python. I've written several GUIs and convenient little scripts that do what I need. In short, since the audience tends to be economists, I think there are plenty of positive externalities and larger increasing returns to using Python than R.

  6. I think it is useful to learn both:
    Python as a more general programming language,
    and R for its strengths in graphs and its huge library for applied statistics and econometrics.
    RPy makes life easier by allowing R objects to be called from Python.

    R's competitive advantage for statistics (and econometrics) is its community. That is the only thing that prevents another (perhaps) 'stronger' language from becoming a convenient choice.

    R also has several limitations for applied work. In my opinion, the most important is the way it handles big databases, which is far from optimal.

  7. Hi everybody,

    If we use RPy to call R econometric packages from Python, will it still be possible to handle a huge amount of data as if the package were written directly in Python?
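
As a rough illustration of two points from the thread (the collections.namedtuple suggestion in comment 5 as a stand-in for R's list(a=1, b=2, c=3), and the keyword-only arguments raised in comment 3), here is a minimal sketch using only the standard library. The names Params and mean_rounded are made up for this example, and it assumes Python 3, since keyword-only arguments are Python 3 syntax. NumPy's structured arrays, also mentioned in comment 5, are left out to keep the sketch dependency-free.

```python
from collections import namedtuple

# Hypothetical record type standing in for R's list(a=1, b=2, c=3).
Params = namedtuple("Params", ["a", "b", "c"])
p = Params(a=1, b=2, c=3)

by_attribute = p.a          # attribute access, roughly p$a in R
by_slice = p[0:2]           # positional slicing returns a plain tuple
by_name = getattr(p, "b")   # lookup by a name held in a string
as_dict = p._asdict()       # mapping from field names to values


def mean_rounded(values, *, digits=2):
    """Return the mean of values, rounded to `digits` decimal places.

    The bare `*` makes `digits` keyword-only: it cannot be passed
    positionally, so call sites have to spell out digits=...
    """
    return round(sum(values) / len(values), digits)


if __name__ == "__main__":
    print(by_attribute, by_slice, by_name, dict(as_dict))
    print(mean_rounded([1, 2, 3, 4]))
    print(mean_rounded([1, 2, 4], digits=1))
```

Unlike an R list, a namedtuple is immutable and has a fixed set of fields, so this covers only part of what comment 3 describes. Calling mean_rounded([1, 2], 1) raises a TypeError, which is the point of making the argument keyword-only.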