freshmeat

Freshmeat becomes Freecode, and how our data is affected

Three things happened recently that affect our Freshmeat collection:

1. Freshmeat announced a name change to Freecode.
2. We have an issue (issue #43) noting that the trove definitions for Freshmeat are out of date.
3. Freshmeat replaced trove with tagging, and we missed the memo.

What I've done is as follows:

For item 1: I decided not to rename our abbreviation for Freshmeat. It will remain "FM".

For items 2 and 3: I added a new table, called fm_project_tags, to hold the tags associated with each project.

CREATE TABLE IF NOT EXISTS `fm_project_tags` (
`project_id` int(11) NOT NULL DEFAULT '0',
`datasource_id` int(11) NOT NULL DEFAULT '0',
`tag_name` varchar(50) NOT NULL DEFAULT '0',
`timestamp` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`project_id`,`datasource_id`,`tag_name`),
KEY `datasource_id_index18` (`datasource_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
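The composite primary key means that each (project, datasource, tag) combination can be stored only once, while the same tag can reappear in a later datasource. A minimal sketch of that behaviour, using Python's built-in sqlite3 module as a stand-in for MySQL; the column types are simplified, and the datasource ids 1 and 2 and the tag "alpha" are made up for illustration:

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table (types simplified).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fm_project_tags (
        project_id    INTEGER NOT NULL DEFAULT 0,
        datasource_id INTEGER NOT NULL DEFAULT 0,
        tag_name      VARCHAR(50) NOT NULL DEFAULT '0',
        timestamp     TEXT,
        PRIMARY KEY (project_id, datasource_id, tag_name)
    )
""")


def add_tag(project_id, datasource_id, tag_name):
    """Insert one tag row; return False if that exact key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO fm_project_tags (project_id, datasource_id, tag_name) "
                "VALUES (?, ?, ?)",
                (project_id, datasource_id, tag_name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical tag for project 78922 in hypothetical datasources 1 and 2.
first = add_tag(78922, 1, "alpha")
dupe = add_tag(78922, 1, "alpha")      # same key: rejected
other_ds = add_tag(78922, 2, "alpha")  # new datasource: allowed
print(first, dupe, other_ds)  # True False True
```

In other words, re-running a collection into the same datasource cannot double-count a project's tags.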

I also added a new release file to hold the data from this table, called fmProjectTags2011-Nov.txt. I did not remove trove; we are still collecting the trove data. However, there is no longer any "trove definition" list (that I know of) describing what each trove number means, so the trove numbers are not as useful as the tags. I'm leaving them in the database for historical purposes.
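If you work from the release file rather than the database, grouping the rows by project gives you each project's tag list. A minimal sketch, assuming (this is not confirmed above) that the file is tab-delimited and starts with a header row naming the fm_project_tags columns; the sample rows and tag names are hypothetical:

```python
import csv
import io
from collections import defaultdict


def tags_by_project(lines):
    """Group tag names by project_id from a delimited tags release file.

    Assumes a tab-delimited file whose first row is a header with
    project_id and tag_name columns; check the real file first.
    """
    tags = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        tags[int(row["project_id"])].add(row["tag_name"])
    return dict(tags)


# Tiny hypothetical sample standing in for fmProjectTags2011-Nov.txt.
sample = io.StringIO(
    "project_id\tdatasource_id\ttag_name\ttimestamp\n"
    "78922\t1\talpha\t2011-11-01 00:00:00\n"
    "78922\t1\tbeta\t2011-11-01 00:00:00\n"
    "12345\t1\tgamma\t2011-11-01 00:00:00\n"
)
result = tags_by_project(sample)
print(sorted(result[78922]))  # ['alpha', 'beta']
```

For the real file you would pass an open file handle instead, e.g. `open("fmProjectTags2011-Nov.txt", newline="", encoding="latin-1")`; the latin-1 encoding is a guess based on the table's latin1 charset.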

Here is a shot of the tags page for a sample project on Freshmeat, called amms.

Here is a shot of the way the tags look now in our release files (or database table) for that same project (#78922)

How has the use of "Free" and "Open" in project names grown by year?

Description

This chart shows the number of new projects in each repository that use the words "Free" and "Open" in project names through 2010.

Visualization

SQL Script

Freshmeat:

SELECT YEAR( date_added ) , COUNT( DISTINCT project_id ) AS Count
FROM fm_projects
WHERE projectname_full LIKE "%free%"
AND datasource_id = <current>
GROUP BY YEAR( date_added )
ORDER BY YEAR( date_added );


SELECT YEAR( date_added ) , COUNT( DISTINCT project_id ) AS Count
FROM fm_projects
WHERE projectname_full LIKE "%open%"
AND datasource_id = <current>
GROUP BY YEAR( date_added )
ORDER BY YEAR( date_added );
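The two Freshmeat queries above differ only in the LIKE pattern, so they can also be folded into one pass with conditional counting. A minimal sketch, run against SQLite through Python's sqlite3 module with made-up rows; SQLite has no YEAR(), so strftime('%Y', ...) stands in for it, and datasource 1 plays the role of `<current>`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fm_projects (project_id INTEGER, datasource_id INTEGER, "
    "projectname_full TEXT, date_added TEXT)"
)
# Hypothetical rows; datasource 1 plays the role of <current>.
conn.executemany(
    "INSERT INTO fm_projects VALUES (?, ?, ?, ?)",
    [
        (1, 1, "FreeFoo", "2009-03-01"),
        (2, 1, "OpenBar", "2009-07-15"),
        (3, 1, "Free and Open Baz", "2010-01-10"),
        (4, 1, "Qux", "2010-02-02"),
        (5, 2, "FreeOld", "2008-05-05"),  # other datasource: excluded
    ],
)

# Count distinct matching ids per year for both words at once.
# SQLite's LIKE is case-insensitive for ASCII, much like MySQL's
# default latin1 collation.
QUERY = """
SELECT strftime('%Y', date_added) AS yr,
       COUNT(DISTINCT CASE WHEN projectname_full LIKE '%free%'
                           THEN project_id END) AS free_count,
       COUNT(DISTINCT CASE WHEN projectname_full LIKE '%open%'
                           THEN project_id END) AS open_count
FROM fm_projects
WHERE datasource_id = ?
GROUP BY yr
ORDER BY yr
"""
rows = conn.execute(QUERY, (1,)).fetchall()
print(rows)  # [('2009', 1, 1), ('2010', 1, 1)]
```

One difference from the separate queries: the combined version also returns years in which neither word appears (with zero counts), which can be handy when charting both series on the same axis.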

Rubyforge:

SELECT YEAR( date_registered ) , COUNT( DISTINCT proj_unixname ) AS Count
FROM rf_projects
WHERE proj_unixname LIKE "%free%"
AND datasource_id = <current>
GROUP BY YEAR( date_registered )
ORDER BY YEAR( date_registered );


SELECT YEAR( date_registered ) , COUNT( DISTINCT proj_unixname ) AS Count
FROM rf_projects
WHERE proj_unixname LIKE "%open%"
AND datasource_id = <current>
GROUP BY YEAR( date_registered )
ORDER BY YEAR( date_registered );

Savannah:

SELECT YEAR( registration_date ) , COUNT( DISTINCT project_name ) AS Count
FROM sv_projects
WHERE project_name LIKE "%free%"
AND datasource_id = <current>
GROUP BY YEAR( registration_date )
ORDER BY YEAR( registration_date );


SELECT YEAR( registration_date ) , COUNT( DISTINCT project_name ) AS Count
FROM sv_projects
WHERE project_name LIKE "%open%"
AND datasource_id = <current>
GROUP BY YEAR( registration_date )
ORDER BY YEAR( registration_date );
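All six queries in this section share one shape; only the table, id, name, and date columns change. A minimal sketch of generating them from one mapping, assuming you run them from Python through a MySQL driver that uses the `%s` parameter style (MySQLdb, for example); the helper name `name_word_query` is hypothetical:

```python
# Table, id column, name column and date column for each repository,
# as used in the queries above.
REPOS = {
    "Freshmeat": ("fm_projects", "project_id", "projectname_full", "date_added"),
    "Rubyforge": ("rf_projects", "proj_unixname", "proj_unixname", "date_registered"),
    "Savannah": ("sv_projects", "project_name", "project_name", "registration_date"),
}


def name_word_query(repo):
    """Build the 'word in project name, by year' query for one repository.

    Table and column names come only from the fixed REPOS mapping; the
    LIKE pattern and datasource id are left as %s driver parameters.
    """
    table, id_col, name_col, date_col = REPOS[repo]
    return (
        f"SELECT YEAR( {date_col} ) , COUNT( DISTINCT {id_col} ) AS Count\n"
        f"FROM {table}\n"
        f"WHERE {name_col} LIKE %s\n"
        "AND datasource_id = %s\n"
        f"GROUP BY YEAR( {date_col} )\n"
        f"ORDER BY YEAR( {date_col} );"
    )


sql = name_word_query("Savannah")
print(sql)
# The word pattern and id are passed separately, e.g. ("%open%", datasource_id).
```

Identifiers cannot be bound as parameters, which is why they come from a fixed dictionary while the pattern and id go through the driver.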

How many projects at each repository share identical short project names?

Description

This chart shows, for each pair of repositories, how many projects share the same short project name, as of May 2011.

Visualization

SQL Script

RF-FM

SELECT COUNT( f.projectname_short_fixed )
FROM fm_projects f, rf_projects r
WHERE f.projectname_short_fixed = r.proj_unixname
AND f.datasource_id = <current>
AND r.datasource_id = <current>;

RF-OW

SELECT COUNT( r.proj_unixname )
FROM rf_projects r, ow_projects o
WHERE r.proj_unixname = o.proj_unixname
AND r.datasource_id = <current>
AND o.datasource_id = <current>;

FM-OW

SELECT COUNT( f.projectname_short_fixed )
FROM fm_projects f, ow_projects o
WHERE f.projectname_short_fixed = o.proj_unixname
AND f.datasource_id = <current>
AND o.datasource_id = <current>;

RF-FSF

SELECT COUNT( f.proj_unixname )
FROM fsf_projects f, rf_projects r
WHERE f.proj_unixname = r.proj_unixname
AND f.datasource_id = <current>
AND r.datasource_id = <current>;

FM-FSF

SELECT COUNT( f.proj_unixname )
FROM fsf_projects f, fm_projects fm
WHERE f.proj_unixname = fm.projectname_short_fixed
AND f.datasource_id = <current>
AND fm.datasource_id = <current>;

OW-FSF

SELECT COUNT( f.proj_unixname )
FROM fsf_projects f, ow_projects o
WHERE f.proj_unixname = o.proj_unixname
AND f.datasource_id = <current>
AND o.datasource_id = <current>;
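Note that COUNT over a join may count a name more than once if it appears in more than one row of either table; `COUNT( DISTINCT ... )` would count each shared name once. A minimal sketch of the distinct version for every pair at once, using Python sets; the short names are hypothetical, and lower-casing is an assumption meant to mimic MySQL's case-insensitive default collation:

```python
from itertools import combinations


def shared_name_counts(names_by_repo):
    """Count distinct short names shared by each pair of repositories."""
    # Sets drop duplicates within a repository; lower() mimics a
    # case-insensitive collation.
    sets = {
        repo: {name.lower() for name in names}
        for repo, names in names_by_repo.items()
    }
    return {
        (a, b): len(sets[a] & sets[b])
        for a, b in combinations(sorted(sets), 2)
    }


# Hypothetical short names; real input would come from the
# projectname_short_fixed / proj_unixname columns for <current>.
sample = {
    "FM": ["foo", "bar", "baz"],
    "RF": ["qux", "bar", "bar"],
    "OW": ["Foo", "quux"],
}
counts = shared_name_counts(sample)
print(counts)  # {('FM', 'OW'): 1, ('FM', 'RF'): 1, ('OW', 'RF'): 0}
```

Here the join-based FM-RF query would return 2 (RF lists "bar" twice), while the set version reports 1.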

How have projects in each repository grown by year?

Description

This chart shows the number of NEW projects added to each repository by month/year.

Visualization

Notes: RF had 697 projects without a project start date. OW had one project started in 1970.

SQL Script

Freshmeat:

SELECT MONTH( date_added ) , YEAR( date_added ) , COUNT( DISTINCT project_id )
FROM fm_projects
WHERE datasource_id = <current>
GROUP BY YEAR( date_added ) , MONTH( date_added )
ORDER BY YEAR( date_added ) , MONTH( date_added );

Rubyforge:

SELECT MONTH( date_registered ) , YEAR( date_registered ) , COUNT( DISTINCT proj_unixname )
FROM rf_projects
WHERE datasource_id = <current>
GROUP BY YEAR( date_registered ) , MONTH( date_registered )
ORDER BY YEAR( date_registered ) , MONTH( date_registered );

Savannah:

SELECT MONTH( registration_date ) , YEAR( registration_date ) , COUNT( DISTINCT project_name )
FROM sv_projects
WHERE datasource_id = <current>
GROUP BY YEAR( registration_date ) , MONTH( registration_date )
ORDER BY YEAR( registration_date ) , MONTH( registration_date );
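In MySQL, YEAR(NULL) and MONTH(NULL) return NULL, so undated projects (like the 697 RF projects mentioned in the notes) end up in a NULL group of their own. A minimal sketch that drops them explicitly, run against SQLite through Python's sqlite3 module with made-up rows; strftime() stands in for MySQL's MONTH() and YEAR():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rf_projects (proj_unixname TEXT, datasource_id INTEGER, "
    "date_registered TEXT)"
)
# Hypothetical rows; datasource 1 plays the role of <current>.
conn.executemany(
    "INSERT INTO rf_projects VALUES (?, ?, ?)",
    [
        ("alpha", 1, "2008-01-15"),
        ("beta", 1, "2008-01-20"),
        ("gamma", 1, "2008-03-02"),
        ("delta", 1, None),  # no registration date
    ],
)

QUERY = """
SELECT CAST(strftime('%m', date_registered) AS INTEGER) AS reg_month,
       CAST(strftime('%Y', date_registered) AS INTEGER) AS reg_year,
       COUNT(DISTINCT proj_unixname)
FROM rf_projects
WHERE datasource_id = ?
  AND date_registered IS NOT NULL  -- drop undated projects explicitly
GROUP BY reg_year, reg_month
ORDER BY reg_year, reg_month
"""
rows = conn.execute(QUERY, (1,)).fetchall()
print(rows)  # [(1, 2008, 2), (3, 2008, 1)]
```

A lower bound on the date in the same WHERE clause would similarly keep an outlier such as the 1970 OW project off the chart.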

How many projects are listed in each repository?

Description

This chart shows the number of projects that FLOSSmole most recently collected from each repository.

Visualization

Project Count Chart


SQL Script

Google Code:

SELECT COUNT( DISTINCT proj_name )
FROM gc_projects
WHERE datasource_id = <current>;

Freshmeat:

SELECT COUNT( DISTINCT project_id )
FROM fm_projects
WHERE datasource_id = <current>;

Launchpad:

SELECT COUNT( DISTINCT project_name )
FROM lp_projects
WHERE datasource_id = <current>;

Rubyforge:

SELECT COUNT( DISTINCT proj_unixname )
FROM rf_projects
WHERE datasource_id = <current>;

Free Software Foundation:

SELECT COUNT( DISTINCT proj_num )
FROM fsf_projects
WHERE datasource_id = <current>;

Savannah:

SELECT COUNT( DISTINCT project_name )
FROM sv_projects
WHERE datasource_id = <current>;

Tigris:

SELECT COUNT( DISTINCT unixname )
FROM tg_projects
WHERE datasource_id = <current>;

Objectweb:

SELECT COUNT( DISTINCT proj_unixname )
FROM ow_projects
WHERE datasource_id = <current>;
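The eight counts can also be fetched in one round trip with UNION ALL, one branch per repository, each with its own datasource id. A minimal sketch, assuming Python; the helper name `union_count_query` is hypothetical, and the demo runs against SQLite with made-up rows, using the May 2011 ids for Freshmeat (270) and Objectweb (267) as sample parameters:

```python
import sqlite3

# (table, counted column) pairs taken from the queries above.
COUNT_SPECS = [
    ("gc_projects", "proj_name"),
    ("fm_projects", "project_id"),
    ("lp_projects", "project_name"),
    ("rf_projects", "proj_unixname"),
    ("fsf_projects", "proj_num"),
    ("sv_projects", "project_name"),
    ("tg_projects", "unixname"),
    ("ow_projects", "proj_unixname"),
]


def union_count_query(specs, placeholder="?"):
    """Build one UNION ALL query returning (table, project count) rows.

    Each branch gets its own datasource_id parameter. Use "?" for
    sqlite3 and "%s" for MySQLdb-style drivers.
    """
    branches = [
        f"SELECT '{table}' AS source, COUNT(DISTINCT {col}) AS projects "
        f"FROM {table} WHERE datasource_id = {placeholder}"
        for table, col in specs
    ]
    return "\nUNION ALL\n".join(branches) + "\nORDER BY source"


# Small SQLite demo with two of the tables and hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fm_projects (project_id INTEGER, datasource_id INTEGER)")
conn.execute("CREATE TABLE ow_projects (proj_unixname TEXT, datasource_id INTEGER)")
conn.executemany("INSERT INTO fm_projects VALUES (?, ?)",
                 [(1, 270), (2, 270), (2, 270), (3, 1)])
conn.executemany("INSERT INTO ow_projects VALUES (?, ?)", [("a", 267), ("b", 1)])

demo_specs = [("fm_projects", "project_id"), ("ow_projects", "proj_unixname")]
rows = conn.execute(union_count_query(demo_specs), (270, 267)).fetchall()
print(rows)  # [('fm_projects', 2), ('ow_projects', 1)]
```

The parameters must be supplied in the same order as the branches in the spec list.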

May 2011 Data Released

May 2011 data has been released to Google Code and uploaded into Data Central at Teragrid.

Datasources:
263 2011-Mar UDD bugfix replaces 262
264 2011-Mar UDD bugfix replaces 263
265 2011-May UDD May 2011 UDD donation
266 Rubyforge 2011-May Rubyforge 2011-May
267 Objectweb 2011-May Objectweb 2011-May
268 FSF 2011-May Free Software Foundation 2011-May
269 Savannah 2011-May Savannah 2011-May
270 2011-May FM May 2011 Freshmeat
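The example queries on this site use `<current>` as a stand-in for a repository's latest datasource_id. A minimal sketch that binds the May 2011 ids from the list above instead of pasting them into the SQL text, assuming you run the queries from Python through a MySQL driver with the `%s` parameter style (MySQLdb, for example); the helper name `fill_current` is hypothetical:

```python
# May 2011 datasource ids, copied from the list above.
MAY_2011 = {
    "UDD": 265,
    "Rubyforge": 266,
    "Objectweb": 267,
    "FSF": 268,
    "Savannah": 269,
    "Freshmeat": 270,
}


def fill_current(sql, *forges):
    """Replace each <current> placeholder, in order, with a %s parameter.

    Pass one forge name per placeholder, since the cross-repository
    queries compare two different datasources. Literal % signs (as in
    LIKE "%free%") are doubled, because %s-style drivers such as
    MySQLdb treat % as a format character when parameters are given.
    """
    if sql.count("<current>") != len(forges):
        raise ValueError("need exactly one forge per <current> placeholder")
    query = sql.replace("%", "%%").replace("<current>", "%s")
    return query, tuple(MAY_2011[f] for f in forges)


free_sql = ('SELECT YEAR( date_added ) , COUNT( DISTINCT project_id ) AS Count '
            'FROM fm_projects WHERE projectname_full LIKE "%free%" '
            'AND datasource_id = <current>')
query, params = fill_current(free_sql, "Freshmeat")
print(params)  # (270,)

join_sql = ('SELECT COUNT( f.proj_unixname ) FROM fsf_projects f, rf_projects r '
            'WHERE f.proj_unixname = r.proj_unixname '
            'AND f.datasource_id = <current> AND r.datasource_id = <current>')
join_query, join_params = fill_current(join_sql, "FSF", "Rubyforge")
print(join_params)  # (268, 266)
```

The resulting pair would then go to the driver as `cursor.execute(query, params)`.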

Status of other collectors:
Launchpad - parsing problem
Tigris - mailing list collector problem
Github - collection problem
Google Code - still running (it will be about a month until these are out)

Link to FLOSSmole files on Google Code
Link to instructions for how to access FLOSSmole db at Teragrid

January file releases

Just released data files for the following forges. You can head over to the FLOSSmole data downloads page at Google Code to download any of these files, or wait for them to be released to the Teragrid for live querying (shortly!)

datasource_id, forge_id, abbreviation, name
237 2 FM Freshmeat
238 3 RF Rubyforge
239 4 OW ObjectWeb
240 5 FSF Free Software Foundation
241 10 SV Savannah
243 12 GC Google Code
244 13 TG Tigris

Still running...
245 14 LP Launchpad

Broken...
242 11 GH Github

May 2010 Data released

May 2010 data has been released for some forges.

-Freshmeat (datasource 218)
-Rubyforge (datasource 219)
-ObjectWeb (datasource 220)
-Free Software Foundation (datasource 221)
-Google Code (datasource 222) - list of projects only

Our collectors for Savannah, Sourceforge, Github, Tigris, and Launchpad are all undergoing maintenance at the moment.

UPDATE May 28, 2010
-Savannah data has been released (datasource 224)

Link to download the FLOSSmole data on Google Code.

February 2010 Data Released

Lots of new data for you to peruse out on our FLOSSmole Data Downloads Page.

Here's what's out there, recently added:

Google Code, March 2010 (GC) - list of all GC projects donated by Audris Mockus (HUGE THANK YOU TO AUDRIS FOR THIS!!)
Freshmeat, February 2010 (FM)
Objectweb, February 2010 (OW)
Rubyforge, February 2010 (RF)
Github, February 2010 (GH)
Free Software Foundation, February 2010 (FSF)
Savannah, February 2010 (SV)
and Sourceforge from December 2009 (SF)

We have another set of bugs to fix in the Sourceforge collection this year (2010), but fixes are forthcoming. I'm running a collection now. Hopefully the data will be good. We may even have stats this time. Hallelujah.

Also, thanks to my phenomenal undergraduate superstar Steven Norris, Tigris is coming soon, with Debian after that! We are rocking the repository collection...

December 2009 data released

December data has been released for the following forges:

(datasource-abbreviation-full name)
200-fm-freshmeat
201-rf-rubyforge
202-ow-objectweb
203-fsf-free software foundation
204-sv-savannah
205-gh-github

Sourceforge is in progress... it will be datasource_id=206.

Get the data here:
http://code.google.com/p/flossmole/downloads/list

Remember that the files marked "DM" are SQL files (MySQL), while the files ending in .txt are delimited flat text files.
