Submitted by megan on January 25, 2007 - 3:35pm
You can browse the SQL version of the FLOSSmole database schema in CVS on Sourceforge, OR you can check the SchemaSpy-generated view of the schema. The CVS version is updated less often than the SchemaSpy version. I'll try to run a new SchemaSpy view every few months and overwrite the old one.
Submitted by megan on January 15, 2007 - 5:03am
FLOSSmole now includes data from the FSF (Free Software Foundation) directory (original directory link).
The flat files containing the data can be found on our FSF Sourceforge file release page.
Some facts of note:
--the FSF directory contains 5226 projects
--the FSF directory allows projects whose names differ only in letter case, e.g. ANT and ant are considered different projects
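If you load the FSF names into a store that compares names case-insensitively (for example, a column with a case-insensitive collation), pairs like ANT and ant can collide. Here's a minimal sketch for finding such pairs before loading, assuming you already have the project names as a list of strings; the function name and sample names are hypothetical.

```python
from collections import defaultdict


def case_collisions(names):
    """Group project names that are identical except for letter case.

    Returns a dict mapping each case-folded name to the sorted list of its
    distinct original spellings, keeping only groups with 2+ spellings.
    """
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the names from the FSF flat file.
    sample = ["ANT", "ant", "emacs", "gcc", "GCC", "bash"]
    print(case_collisions(sample))  # {'ant': ['ANT', 'ant'], 'gcc': ['GCC', 'gcc']}
```

Exact duplicates (the same spelling twice) are not reported, since they are a different problem from case-only differences.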
Submitted by megan on January 5, 2007 - 2:48pm
We've released the January 2007 files for Freshmeat, Objectweb, and Rubyforge.
In February, look forward to the next Sourceforge release, as well as a new feature: "All Time Stats" for Sourceforge!
Get the latest files here
Submitted by megan on December 3, 2006 - 1:56am
December data has been released. But first, some informational updates.
1. Project_indexes (the table where Sourceforge web pages are stored) now uses the UTF-8 encoding. This addresses concerns about character sets and corruption in our stored copies of Sourceforge project home pages.
2. The "forges" table is now in use. We have 5 forges, as follows:
0 - TE - test
1 - SF - Sourceforge
2 - FM - Freshmeat
3 - RF - Rubyforge
4 - OW - Objectweb
The datasources table now shows which forge the datasource is pulled from. The join column between the "forges" and "datasources" tables is "forge_id".
3. December data has been released for Sourceforge (SF), Objectweb (OW), Rubyforge (RF), and Freshmeat (FM). Get the files on Sourceforge as follows:
-- All projects, SF, FM, OW, RF: link
-- Sourceforge, December: link (Datasource_id = 38)
-- Freshmeat, December: link (Datasource_id = 41)
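To see how the forge_id join ties a datasource back to its forge, here's a small sketch using Python's built-in sqlite3 module as a stand-in for the real database. Only forge_id, datasource_id, the five forge codes, and datasources 38 and 41 come from this post; the other column names (forge_abbr, forge_name, description) are hypothetical, and the actual FLOSSmole schema may differ.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory toy version of the forges/datasources tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE forges (
            forge_id   INTEGER PRIMARY KEY,
            forge_abbr TEXT,  -- hypothetical column name
            forge_name TEXT   -- hypothetical column name
        );
        CREATE TABLE datasources (
            datasource_id INTEGER PRIMARY KEY,
            forge_id      INTEGER REFERENCES forges(forge_id),
            description   TEXT  -- hypothetical column name
        );
    """)
    conn.executemany(
        "INSERT INTO forges VALUES (?, ?, ?)",
        [(0, "TE", "test"), (1, "SF", "Sourceforge"), (2, "FM", "Freshmeat"),
         (3, "RF", "Rubyforge"), (4, "OW", "Objectweb")],
    )
    conn.executemany(
        "INSERT INTO datasources VALUES (?, ?, ?)",
        [(38, 1, "Sourceforge, December 2006"),
         (41, 2, "Freshmeat, December 2006")],
    )
    return conn


def forge_for_datasource(conn, datasource_id):
    """Return (forge_abbr, forge_name) for a datasource, or None if unknown."""
    return conn.execute(
        """SELECT f.forge_abbr, f.forge_name
           FROM datasources AS d
           JOIN forges AS f ON f.forge_id = d.forge_id
           WHERE d.datasource_id = ?""",
        (datasource_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    print(forge_for_datasource(conn, 38))  # ('SF', 'Sourceforge')
    print(forge_for_datasource(conn, 41))  # ('FM', 'Freshmeat')
```

The same JOIN ... ON f.forge_id = d.forge_id pattern should carry over to the real tables once the column names are adjusted.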
Submitted by megan on October 24, 2006 - 2:16am
There are 2 types of "home pages" for projects on Sourceforge:
1. A project's summary page. This is not a real home page, but sometimes people call it one. It lives on the SF servers and has the URL format http://sf.net/projects/projectname, where "projectname" is replaced by the actual name of the project. In our system, we call this address "url", and it's stored in the projects table.
2. A project's "real" home page. A request came in this week for us to parse these out. There are 2 types:
a. Home pages that live on the shell.sf.net servers and give a project a URL like this: http://projectname.sf.net
b. Home pages that live on some other server and are not hosted by SF in any way.
I wrote a parser for these "real urls" today and created a new column in the projects table called "real_url" to hold this data. I then released files in the sfRawData package for August 2006 and October 2006 showing these "real urls". Remember that real urls are reported by the project administrators. For the vast majority of projects, the URL is of type "a" above. But for projects whose home page is of type "b", this column may help in tracking down additional information about them.
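If you want to split the real_url values into the two types above, here's a rough sketch. It assumes SF-hosted pages have a host ending in .sf.net or .sourceforge.net (the .sourceforge.net form is an assumption, not stated in this post); the function names are hypothetical, and this is not the parser used to build the real_url column.

```python
from urllib.parse import urlparse


def summary_url(project_name):
    """Build the project summary page address (the "url" column format)."""
    return f"http://sf.net/projects/{project_name}"


def real_url_type(real_url):
    """Return "a" for SF-hosted home pages and "b" for externally hosted ones.

    Assumption: SF-hosted pages use hosts like <projectname>.sf.net or
    <projectname>.sourceforge.net.
    """
    # Admin-entered URLs may lack a scheme; without one, urlparse puts the
    # host in .path instead of .hostname, so add a default scheme first.
    if "://" not in real_url:
        real_url = "http://" + real_url
    host = urlparse(real_url).hostname or ""  # .hostname is lowercased
    if host.endswith((".sf.net", ".sourceforge.net")):
        return "a"
    return "b"


if __name__ == "__main__":
    print(summary_url("projectname"))              # http://sf.net/projects/projectname
    print(real_url_type("http://projectname.sf.net"))  # a
    print(real_url_type("http://www.example.org/"))    # b
```

Hosts like shell.sf.net itself would also be labeled "a" here, so a real pass over the data may need extra rules.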
Submitted by megan on October 14, 2006 - 1:41am
It's my Fall Break, so you know that means the October releases are finally here! (This includes Sourceforge releases, yay.)
Get the text files here.
1- FRESHMEAT fmProjectInfo (fmProjectInfo2006-Oct)
2- RUBYFORGE rfRawData (rfRawData2006-Oct)
3- SOURCEFORGE sfProjectInfo (sfProjectInfo01-oct-2006); sfRawData (sfRawData01-Oct-2006); sfRawDeveloperData (sfRawDeveloperData01-Oct-2006)
4- OBJECTWEB owRawData (osRawData2006-Oct)
Submitted by megan on September 15, 2006 - 9:09pm
The September 2006 data was released today for:
--Freshmeat
--RubyForge
--ObjectWeb
You can pick up those data files here on our FLOSSmole Files Page on Sourceforge.
Enjoy!
Submitted by megan on August 9, 2006 - 5:39pm
Go to our file release page on Sourceforge to get the latest files for August.
What's included here:
1- FRESHMEAT fmProjectInfo (fmProjectInfo2006-Aug)
2- RUBYFORGE rfRawData (rfRawData2006-Aug)
3- SOURCEFORGE sfProjectInfo (sfProjectInfo01-Aug-2006); sfRawData (sfRawData01-Aug-2006); sfRawDeveloperData (sfRawDeveloperData01-Aug-2006)
4- OBJECTWEB owRawData (osRawData2006-Aug)
Submitted by megan on July 18, 2006 - 3:25pm
Hello moles, and happy summer! I've just released Rubyforge data from July, 2006.
Now granted, Rubyforge is not as large as Sourceforge. But it has considerable "buzz", for what that's worth. And since Ruby is a relatively new language and Rubyforge a relatively new forge, I figure it's worth watching, especially considering how easy it is to collect their data! (They put out an XML file with some of the data in it, and with only 1700 or so projects, the rest is much easier to scrape than on CERTAIN OTHER FORGES. Thank you for that, Rubyforge!)
Rubyforge files are available here.
Submitted by megan on July 10, 2006 - 2:07am
UPDATE (2006-Jul-11):
As far as I can tell, the problem below has been fixed and the "number of downloads" files are all set for you to use! Enjoy.
UPDATE (2006-Jul-10):
Today, an alert user pointed out a problem with the number-of-downloads data that I released yesterday. Sure enough, errant commas (thousands separators) appeared in numeric values greater than 999. This caused the SQL sum() to add values incorrectly for projects with large numbers of downloads. New files are being generated now, and they'll be posted shortly! Thanks for your patience. (I've removed the bad files, so for the time being the links below won't work.)
Original posting:
=================
From the Sourceforge stats page, you can get a variety of measures for a particular project, such as number of downloads, rank, etc.
I have begun releasing these measures (summed per project over the 60 days between SF scrapes) as Raw Downloads under the sfRawData package. Here are the links, retroactive back to December 2005:
June 2006 (link to release)
April 2006 (link to release)
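The comma problem from the July 10 update is easy to reproduce: a count scraped as "1,234" is not a plain number, so it has to be cleaned before summing. Here's a minimal sketch of cleaning the counts and summing them per project, assuming the raw data arrives as (project, count-string) pairs, one per day in the window between scrapes; the function names, row shape, and sample values are hypothetical.

```python
from collections import defaultdict


def parse_count(raw):
    """Convert a scraped count such as "1,234" to the integer 1234.

    Python's int("1,234") raises ValueError, and some SQL engines silently
    truncate such a string at the comma, so strip the separators first.
    """
    return int(raw.strip().replace(",", ""))


def downloads_per_project(rows):
    """Sum cleaned download counts per project.

    rows: iterable of (project_name, raw_count) pairs, e.g. one pair per day
    in the window between two scrapes (a hypothetical input shape).
    """
    totals = defaultdict(int)
    for project, raw in rows:
        totals[project] += parse_count(raw)
    return dict(totals)


if __name__ == "__main__":
    sample = [("project_a", "1,204"), ("project_a", "998"), ("project_b", "3")]
    print(downloads_per_project(sample))  # {'project_a': 2202, 'project_b': 3}
```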