How to use this data

(Note: This message is updated periodically with new info.)

The FLOSSmole project provides data about:

(a) all projects on Sourceforge
(b) all developers on Sourceforge
(c) all projects on Sourceforge AND who is developing for them, their roles, whether they are an administrator, etc.
(d) all Sourceforge projects and their programming languages, operating systems, user interfaces, end user audience, registration dates, etc (new: donations!)
(e) Edit, Oct-2005: also, much of the above for Freshmeat
(f) Edit, Jul-2006: also, Rubyforge
(g) Edit, Jul-2006: also, Objectweb
(h) Edit, Jan-2007: also, Free Software Foundation directory
(i) Edit, Feb-2007: also, SourceKibitzer donates data

We have done runs on Sourceforge starting in early 2004 and we have received donated Sourceforge data for December 2004 from Dawid Weiss in Poland.

We have also begun scraping Freshmeat, Rubyforge, and Objectweb, and we receive data from SourceKibitzer. Get the complete list of data sources here. (This is a list of each of our scrapes with its date and its "datasource" ID.) The abbreviations for the forges are RF (Rubyforge), SF (Sourceforge), FM (Freshmeat), OW (Objectweb), FSF (Free Software Foundation Directory), SK (SourceKibitzer).

We are now collecting information from Sourceforge every 60 days, and from Freshmeat/Rubyforge/ObjectWeb/FSF/SK every 30 days (monthly).

You can get all the raw data files from the project file release system.
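
Since the raw files are distributed as gzip-compressed text (judging by the .txt.gz file names readers mention in the comments), it can be worth confirming that a download is intact before working with it. The following is only a minimal sketch in Python, not part of the FLOSSmole tooling; the function name gzip_is_readable is made up for illustration, and it checks only that the file decompresses cleanly, not that its contents are what you expect.

```python
import gzip
import zlib


def gzip_is_readable(path, chunk_size=1 << 20):
    """Return True if the whole file at `path` decompresses cleanly.

    Hypothetical helper, not part of FLOSSmole. It reads the file to the
    end so that the gzip trailer (CRC and length) is also checked.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Discard the data; we only care whether reading succeeds.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers "not a gzip file" and CRC failures,
        # EOFError covers truncated downloads,
        # zlib.error covers damaged compressed data.
        return False
    return True
```

Calling gzip_is_readable on the path of a downloaded file returns False for a truncated or damaged download, in which case downloading the file again is usually the first thing to try.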

In addition to the text files (database dumps), we have a basic query tool. Details and tips for using the query tool are available here.

Hope this helps, and please contact me at any time (megan AT elon DOT edu) to discuss the data or what is missing, what you'd like to do with it, etc.

Comments

Do we have the self-reported skills information for each developer? I know that as a developer it is 'private', but is it publicly available? -monte

I can't find download stats for projects anywhere in the data. Is it really not there?

Hi all: I am a university researcher studying how FLOSS teams work. I downloaded the data collected by FLOSSmole, and I have to say it is impressive and very helpful. Just to let you know, two files are corrupted: sfProjectList01-Oct-2005.txt.gz and sfRawDeveloperProjectData04-Jul-2005.txt.gz. Can someone look into it? Thanks a million! -yanfeng

Hi Yanfeng, I will take a look at those two files and re-create them. -megan

Hi, I downloaded both files and they opened fine. I wonder if the problem is a Mac/PC thing? (I am on a Mac.) Can you please send me an email so I have your address? I will email you a link where you can d/l the files again, or we can troubleshoot that way. Thanks, -megan (mconklin AT elon DOT edu)