This project aims to gather attractive data sets from various fields and include them in a SciPy branch. The goal is to write examples and unit tests that rely on those data sets, and to give new users the preformatted data they need to try out SciPy without bothering with data collection and formatting.
Data Sets
Most of the data will reside in the data/ directory, outside the trunk and tarballs, to keep them light. Some small data sets will be included in the trunk for testing purposes.
Guidelines
Data set files should be named after their content, and their variables should have distinct, descriptive names. That is, the import statement should speak for itself:
    from data import sunspots
    print(sunspots.date, sunspots.size, sunspots.duration)
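As a rough illustration of how a data set could expose its columns as distinctly named attributes, here is a minimal sketch using only the standard library. The helper name load_data_set, the CSV layout, and the placeholder rows are all assumptions made for this example; they are not part of SciPy or of any existing data package, and the values are not real sunspot measurements.

```python
import csv
import io
from types import SimpleNamespace

# Placeholder rows for illustration only; these are NOT real sunspot data.
_PLACEHOLDER_CSV = """date,size,duration
2000-01-01,1.0,2.0
2000-01-02,3.0,4.0
"""


def load_data_set(text):
    """Parse CSV text with a header row into an object with one attribute per column.

    Hypothetical helper: the name and behaviour are assumptions for this sketch.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    # Convert a column to floats when every entry parses; otherwise keep strings.
    for name, values in list(columns.items()):
        try:
            columns[name] = [float(v) for v in values]
        except ValueError:
            pass
    return SimpleNamespace(**columns)


# With a real package layout this object would live in data/sunspots.py.
sunspots = load_data_set(_PLACEHOLDER_CSV)
print(sunspots.date, sunspots.size, sunspots.duration)
```

Keeping one attribute per column means the names used in examples (sunspots.date, sunspots.size, sunspots.duration) match the header of the data file directly.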
Files and Links
Attach data files here with a short description of the data, its meaning and provenance (to include in a README).
Support Vector Machines: The libsvm authors did a pretty decent job of collecting quite a few data sets. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
UN Demographic Data: This is financial and demographic data rather than strictly scientific data, but perhaps the sheer volume and quality of the data outweigh the lack of direct scientific applicability. Note: free except for commercial use; must be cited if used. http://data.un.org/
Freakonomics: The book by Levitt and Dubner has a number of really interesting cases in economics that would make fun tutorials for statistical functions. I contacted one of the authors and he gave me the link to some of the data sets relating abortion to crime rates. http://islandia.law.yale.edu/donohue/pubsdata.htm There is also some data here on the link between car seat usage and deaths. http://www.freakonomics.com/times0710.php
Examples
Example files should be named with a number followed by the function at the heart of the example (ex01_spline_interpolation.py). If an example generates figures, they should follow a similar naming convention (fig01_1.png, fig01_2.png). Examples should be written so users can simply import them or run them from the console, with some explanation of what is going on printed on screen. If a figure is created, a message saying so should be printed on screen:
    print('Plotting figure fig01_2.png...')
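The naming and messaging conventions above can be sketched as a small skeleton for an example file. The helper names (example_filename, figure_filename, announce_figure, run) are hypothetical, and the two-digit zero padding is an assumption inferred from the names ex01 and fig01; the actual computation and plotting are left as a comment.

```python
def example_filename(number, function_name):
    """Return a name such as 'ex01_spline_interpolation.py' (two-digit padding assumed)."""
    return "ex%02d_%s.py" % (number, function_name)


def figure_filename(example_number, figure_number):
    """Return a name such as 'fig01_2.png' (two-digit example number assumed)."""
    return "fig%02d_%d.png" % (example_number, figure_number)


def announce_figure(example_number, figure_number):
    """Print the on-screen notice for a figure and return its file name."""
    name = figure_filename(example_number, figure_number)
    print("Plotting figure %s..." % name)
    return name


def run():
    """Run the example, explaining each step on screen."""
    print("Example 01: spline interpolation (explanatory text goes here).")
    # The computation and the plotting calls would go here; this sketch
    # only shows the messages and file names that accompany them.
    announce_figure(1, 1)
    announce_figure(1, 2)


if __name__ == "__main__":
    run()
```

Saved as ex01_spline_interpolation.py, the file can be run from the console directly, or imported and started with ex01_spline_interpolation.run().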
Attach examples here.