The goal of this page is to gather information on useful datasets related to scipy, including (but not limited to) datasets for machine learning and statistical analysis. Hopefully, we will be able to include some of them in scipy, as R does (e.g. its datasets package, which consists only of data). One really important piece of information is the license/copyright of the data, so please include it if you can find it.
Information on license
According to R. Kern (http://projects.scipy.org/pipermail/scipy-dev/2007-May/007215.html):
"IANAL, but my approach would be to get in touch with the original source of the data if possible, and ask. The biggest problem you'll face is that few of those sources have ever thought about their datasets in terms of copyright licenses, particularly *software* copyright licenses that permit modification to their precious data. If it's an American source and the data appears to be freely distributed, as in the UCI database, I would probably just take it as public domain according to US law.
But of course, I'm in the US. There is a *tiny* possibility that although the "author" of the data is inside the US, too, he intends to pursue copyright outside of the US. However, if it's on something as visible as the UCI site, this possibility is really tiny."
Links to Datasets
Datasets for statistics
http://lib.stat.cmu.edu/ (no license information found)
http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/datasets-package.html (the package is licensed under the GPL, but the license of the data themselves is not clear: they come from copyrighted works, mainly books on statistics)