This is an archival dump of old wiki content --- see for current material


The goal of this page is to gather information on useful datasets related to scipy, including (but not limited to) datasets for machine learning and statistical analysis. Hopefully, we will be able to include some of them in scipy, as R does (e.g. its datasets package, which consists only of data). One really important piece of information is the license/copyright of the data, so please include it if you can find it.

Information on license

According to R. Kern:

"IANAL, but my approach would be to get in touch with the original source of the data if possible, and ask. The biggest problem you'll face is that few of those sources have ever thought about their datasets in terms of copyright licenses, particularly *software* copyright licenses that permit modification to their precious data. If it's an American source and the data appears to be freely distributed, as in the UCI database, I would probably just take it as public domain according to US law.

But of course, I'm in the US. There is a *tiny* possibility that although the "author" of the data is inside the US, too, he intends to pursue copyright outside of the US. However, if it's on something as visible as the UCI site, this possibility is really tiny."

Datasets for statistics

Geophysical datasets

Weather forecast

SciPy: DataSets (last edited 2015-10-24 17:48:23 by anonymous)