sourceforge

February 2010 Data Released

Lots of new data for you to peruse on our FLOSSmole Data Downloads Page.

Here's what's out there, recently added:

Google Code, March 2010 (GC) - list of all GC projects donated by Audris Mockus (HUGE THANK YOU TO AUDRIS FOR THIS!!)
Freshmeat, February 2010 (FM)
Objectweb, February 2010 (OW)
Rubyforge, February 2010 (RF)
Github, February 2010 (GH)
Free Software Foundation, February 2010 (FSF)
Savannah, February 2010 (SV)
and Sourceforge from December 2009 (SF)

We have another set of bugs to fix in the Sourceforge collection for 2010, and those fixes are forthcoming. I'm running a collection now. Hopefully the data will be good. We may even have stats this time. Hallelujah.

Also, thanks to my phenomenal undergraduate superstar Steven Norris, Tigris is coming soon, and Debian after that! We are rocking the repository collection...

December Sourceforge Data released

After a long delay, the December Sourceforge data has been released. You may recall that over summer 2009, SF redesigned their web site, which broke many of our crawlers and all of our parsers.

We have rewritten these and, with only a few exceptions, have pretty much the same data as we always had.

Here are some release notes:

1. The datasource_id for this release is 206.
2. Donors data is not available in the Dec 2009 release. Donors were moved to their own page, so we have to add this to the collection for next time.
3. Statistics data is not available in the Dec 2009 release. We accidentally collected the wrong stats pages, so we had to throw these out and re-write for next time.
4. Status data (alpha, beta, mature, etc) is not available in the Dec 2009 release. This information is still being collected and kept by SF, but we can't find where it's being reported on their web site. If you have any ideas, send them to the mailing list (ossmole-discuss@lists.sourceforge.net).

Files are located at our Google Code page: http://code.google.com/p/flossmole/downloads/list

For those of you with database access on the sdsc server, I'll get these files over there ASAP.

As of June 2009, what are the top programming languages used by projects listed in Sourceforge?

Description

This chart shows the top programming languages used by projects in Sourceforge.

Visualization

Sourceforge Programming Language Chart

SQL Script

SELECT ppl.description, count(DISTINCT ppl.proj_unixname) AS lang
FROM project_programming_language ppl
WHERE ppl.datasource_id = <current>
GROUP BY ppl.description
ORDER BY lang DESC;
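
The <current> placeholder in these scripts stands for the datasource_id of the collection being charted. Here is a minimal sketch of how the language query might be run with that value bound as a parameter. It uses Python's built-in sqlite3 module as a stand-in for the real database; the table and column names come from the script above, while the sample rows, the datasource_id values 998 and 999, and the helper name top_languages are made up for illustration.

```python
import sqlite3

def top_languages(conn, datasource_id):
    """Count distinct projects per programming language for one datasource."""
    sql = """
        SELECT ppl.description, count(DISTINCT ppl.proj_unixname) AS lang
        FROM project_programming_language ppl
        WHERE ppl.datasource_id = ?
        GROUP BY ppl.description
        ORDER BY lang DESC
    """
    return conn.execute(sql, (datasource_id,)).fetchall()

# Tiny in-memory stand-in for the real table; every row here is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project_programming_language "
    "(proj_unixname TEXT, description TEXT, datasource_id INTEGER)"
)
conn.executemany(
    "INSERT INTO project_programming_language VALUES (?, ?, ?)",
    [
        ("alpha", "Java", 999),
        ("beta", "Java", 999),
        ("beta", "C++", 999),
        ("gamma", "Java", 998),  # belongs to a different collection, so it is filtered out
    ],
)

rows = top_languages(conn, 999)
print(rows)
```

Binding the datasource_id as a parameter means the same query text can be reused for each new collection without editing the script by hand.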

As of June 2009, what are the top operating systems used by projects listed in Sourceforge?

Description

This chart shows the top operating systems used by projects in Sourceforge.

Visualization

Sourceforge Operating System Chart

SQL Script

SELECT pop.description, count(DISTINCT pop.proj_unixname) AS os_count
FROM project_operating_system pop
WHERE pop.datasource_id = <current>
GROUP BY pop.description
ORDER BY os_count DESC;

As of June 2009, how many projects of each team size are listed in Sourceforge?

Description

This chart shows the number of projects of each team size listed in Sourceforge.

Visualization

Projects listed as having NULL or 0 developers were disregarded (1432 and 1478 projects, respectively).

Sourceforge Developer Count Chart

SQL Script

SELECT DISTINCT dev_count, count(DISTINCT proj_unixname) AS count
FROM projects
WHERE datasource_id = <current>
GROUP BY dev_count
ORDER BY count DESC , dev_count;
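
The note above says projects with NULL or 0 developers were disregarded, but the script as written still returns those two groups, so presumably they were dropped afterwards. As a sketch only, here is one way the filter could be folded into the query itself. It runs against an in-memory SQLite table with invented rows; the helper name team_size_counts and the datasource_id values are hypothetical.

```python
import sqlite3

def team_size_counts(conn, datasource_id):
    """Projects per team size, skipping NULL and 0 developer counts."""
    sql = """
        SELECT dev_count, count(DISTINCT proj_unixname) AS project_count
        FROM projects
        WHERE datasource_id = ?
          AND dev_count IS NOT NULL  -- explicit for readability; "> 0" alone also drops NULLs
          AND dev_count > 0
        GROUP BY dev_count
        ORDER BY project_count DESC, dev_count
    """
    return conn.execute(sql, (datasource_id,)).fetchall()

# Invented sample rows standing in for the real projects table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects "
    "(proj_unixname TEXT, dev_count INTEGER, datasource_id INTEGER)"
)
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [
        ("a", 1, 999),
        ("b", 1, 999),
        ("c", 2, 999),
        ("d", 0, 999),     # disregarded: zero developers
        ("e", None, 999),  # disregarded: NULL developers
        ("f", 1, 998),     # different collection
    ],
)

sizes = team_size_counts(conn, 999)
print(sizes)
```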

As of June 2009, how has the use of "Free" and "Open" in project names grown by year?

Description

This chart shows the number of new projects in each repository that use the words "Free" and "Open" in project names. (We ran the queries for this chart in June 2009, so the year was only partly over. That explains the apparent drop-off in the 2009 numbers.)

Visualization

Freshmeat Free & Open Count Chart

SQL Script

Sourceforge:

SELECT year(date_registered) , count(DISTINCT proj_unixname) FROM projects
WHERE proj_unixname LIKE "%free%"
AND datasource_id = <current>
GROUP BY year(date_registered)
ORDER BY year(date_registered) ;


SELECT year(date_registered) , count(DISTINCT proj_unixname) FROM projects
WHERE proj_unixname LIKE "%open%"
AND datasource_id = <current>
GROUP BY year(date_registered)
ORDER BY year(date_registered) ;

Freshmeat:

SELECT year(date_added), count(DISTINCT project_id) FROM fm_projects
WHERE projectname_full LIKE "%free%"
AND datasource_id = <current>
GROUP BY year(date_added)
ORDER BY year(date_added);


SELECT year(date_added), count(DISTINCT project_id) FROM fm_projects
WHERE projectname_full LIKE "%open%"
AND datasource_id = <current>
GROUP BY year(date_added)
ORDER BY year(date_added);

Rubyforge:

SELECT year(date_registered), count(DISTINCT proj_unixname) FROM rf_projects
WHERE proj_unixname LIKE "%free%"
AND datasource_id = <current>
GROUP BY year(date_registered)
ORDER BY year(date_registered);


SELECT year(date_registered), count(DISTINCT proj_unixname) FROM rf_projects
WHERE proj_unixname LIKE "%open%"
AND datasource_id = <current>
GROUP BY year(date_registered)
ORDER BY year(date_registered);
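
The "free" and "open" queries differ only in the search word, so they could share one parameterized helper. The sketch below shows the idea for the Sourceforge projects table, using an in-memory SQLite database with invented rows. Two assumptions to flag: SQLite has no year() function, so strftime('%Y', ...) is swapped in and returns the year as text, and the dates are assumed to be stored as YYYY-MM-DD strings. The helper name name_growth is hypothetical.

```python
import sqlite3

def name_growth(conn, word, datasource_id):
    """Count new projects per year whose unix name contains the given word."""
    sql = """
        SELECT strftime('%Y', date_registered) AS yr,
               count(DISTINCT proj_unixname)
        FROM projects
        WHERE proj_unixname LIKE ?
          AND datasource_id = ?
        GROUP BY yr
        ORDER BY yr
    """
    return conn.execute(sql, (f"%{word}%", datasource_id)).fetchall()

# Invented sample rows standing in for the real projects table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE projects "
    "(proj_unixname TEXT, date_registered TEXT, datasource_id INTEGER)"
)
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [
        ("freething", "2000-01-15", 999),
        ("openfoo", "2000-05-01", 999),
        ("libfree", "2001-03-02", 999),
        ("opensesame", "2001-07-07", 999),
        ("openbar", "2001-09-09", 999),
        ("other", "2001-01-01", 999),  # matches neither word
    ],
)

free_by_year = name_growth(conn, "free", 999)
open_by_year = name_growth(conn, "open", 999)
print(free_by_year)
print(open_by_year)
```

Note that a substring match like this also counts names where the word is embedded in a longer one, which is the same behavior as the LIKE "%free%" scripts above.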

As of June 2009, how many projects at each repository share URLs?

Description

This chart shows the number of projects at each repository that share URLs.

Visualization

Number of Projects at each Repository that List a Home Page at Another Repository

Shared URLs Table

Shared URLs Chart

Matching projects by URL has two possibilities: projects listed on different forges might both display the same external URL, or projects on one forge might actually list the project site on a competing forge as the home page of record. The diagram shown in the figure above depicts each forge/directory in FLOSSmole and how many of its projects list another forge as the actual hosting home page. For example, in the diagram, the topmost arrow shows 11,229 projects on Freshmeat that actually have Sourceforge listed as the home page. The arrow notation shows the direction of the relationship (e.g. 11,229 Freshmeat projects show a home page on Sourceforge, but only 10 Sourceforge projects list a Freshmeat home page). Pairs of forges with no URLs in common do not show an arrow. (No Rubyforge projects list ObjectWeb URLs, and vice versa.)

For more information on matching project names and URLs, see:

Squire, M. (2009). Integrating projects from multiple open source code forges. International Journal of Open Source Software & Processes, 1(1). January-March 2009. pp. 46-57.

SQL Script

RF-SF

SELECT count(r.proj_unixname) FROM rf_projects r
WHERE (r.real_url like "%sourceforge%"
OR r.real_url like "%sf.net%")
AND datasource_id= <current>;

RF-FM

SELECT count(r.proj_unixname) FROM rf_projects r
WHERE r.real_url like "%freshmeat%"
AND datasource_id= <current>;

RF-OW

SELECT count(r.proj_unixname) FROM rf_projects r
WHERE r.real_url like "%objectweb%"
AND datasource_id= <current>;

FM-RF

SELECT count(f.project_id) FROM fm_project_homepages f
WHERE f.real_url_homepage like "%rubyforge%"
AND datasource_id= <current>;

FM-SF

SELECT count(f.project_id) FROM fm_project_homepages f
WHERE (f.real_url_homepage like "%sourceforge%"
OR f.real_url_homepage like "%sf.net%")
AND datasource_id= <current>;

FM-OW

SELECT count(f.project_id) FROM fm_project_homepages f
WHERE f.real_url_homepage like "%objectweb%"
AND datasource_id= <current>;

SF-FM

SELECT count(p.proj_unixname) FROM projects p
WHERE p.real_url like "%freshmeat%"
AND datasource_id= <current>;

SF-OW

SELECT count(p.proj_unixname) FROM projects p
WHERE p.real_url like "%objectweb%"
AND datasource_id= <current>;

SF-RF

SELECT count(p.proj_unixname) FROM projects p
WHERE p.real_url like "%rubyforge%"
AND datasource_id= <current>;

OW-SF

SELECT count(o.proj_unixname) FROM ow_projects o
WHERE (o.real_url like "%sourceforge%" or o.real_url like "%sf.net%")
AND datasource_id= <current>;

OW-RF

SELECT count(o.proj_unixname) FROM ow_projects o
WHERE o.real_url like "%rubyforge%"
AND datasource_id= <current>;

OW-FM

SELECT count(o.proj_unixname) FROM ow_projects o
WHERE o.real_url like "%freshmeat%"
AND datasource_id= <current>;
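
The twelve scripts above all follow one pattern: count the rows in one forge's table whose home page URL contains another forge's domain. As a rough sketch, the whole set could be generated in a loop. The table, key, and URL column names are taken from the scripts above; the in-memory SQLite database, the sample rows, the datasource_id values, and the helper name cross_forge_counts are invented for illustration.

```python
import sqlite3

# Table, key column, and URL column for each forge, as used in the scripts above.
FORGES = {
    "RF": ("rf_projects", "proj_unixname", "real_url"),
    "FM": ("fm_project_homepages", "project_id", "real_url_homepage"),
    "SF": ("projects", "proj_unixname", "real_url"),
    "OW": ("ow_projects", "proj_unixname", "real_url"),
}
# LIKE patterns that identify a home page hosted on each forge.
PATTERNS = {
    "RF": ["%rubyforge%"],
    "FM": ["%freshmeat%"],
    "SF": ["%sourceforge%", "%sf.net%"],
    "OW": ["%objectweb%"],
}

def cross_forge_counts(conn, datasource_ids):
    """Return {(source, target): count} for every ordered pair of forges."""
    counts = {}
    for src, (table, key, url) in FORGES.items():
        for dst, patterns in PATTERNS.items():
            if src == dst:
                continue
            where = " OR ".join(f"{url} LIKE ?" for _ in patterns)
            sql = (f"SELECT count({key}) FROM {table} "
                   f"WHERE ({where}) AND datasource_id = ?")
            params = (*patterns, datasource_ids[src])
            counts[(src, dst)] = conn.execute(sql, params).fetchone()[0]
    return counts

# Invented sample data; the real tables have more columns than these three.
conn = sqlite3.connect(":memory:")
for table, key, url in FORGES.values():
    conn.execute(
        f"CREATE TABLE {table} ({key} TEXT, {url} TEXT, datasource_id INTEGER)"
    )
conn.executemany("INSERT INTO rf_projects VALUES (?, ?, ?)", [
    ("a", "http://a.sourceforge.net", 1),
    ("b", "http://b.sf.net", 1),
    ("c", "http://c.example.org", 1),
])
conn.executemany("INSERT INTO fm_project_homepages VALUES (?, ?, ?)", [
    ("10", "http://sourceforge.net/projects/x", 2),
    ("11", "http://rubyforge.org/projects/y", 2),
])
conn.execute(
    "INSERT INTO projects VALUES (?, ?, ?)",
    ("p", "http://freshmeat.net/projects/p", 3),
)
conn.execute(
    "INSERT INTO ow_projects VALUES (?, ?, ?)",
    ("o", "http://o.example.org", 4),
)

datasource_ids = {"RF": 1, "FM": 2, "SF": 3, "OW": 4}
counts = cross_forge_counts(conn, datasource_ids)
print(counts[("RF", "SF")], counts[("FM", "SF")], counts[("SF", "FM")])
```

The table and column names are interpolated from the fixed FORGES dictionary rather than from user input, and only the patterns and datasource_id are bound as parameters. Pairs with a count of 0 correspond to the missing arrows in the diagram.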

As of June 2009, how many projects at each repository share identical short project names?

Description

This chart shows the number of projects at each repository that share project names.

Visualization

Number of Projects at each Repository that Share an Identical Short Project Name

Shared Short Names Table

Shared Short Names Chart

This graph shows the number of short project names shared between each pair of repositories. For instance, starfish is a project listed on both Sourceforge and Rubyforge. On Rubyforge, it is described as a "tool to make programming ridiculously easy", but on Sourceforge the starfish project is described as a password management application. There are 1367 projects with shared names on Rubyforge and Sourceforge.

For more information on matching project names and URLs, see:

Squire, M. (2009). Integrating projects from multiple open source code forges. International Journal of Open Source Software & Processes, 1(1). January-March 2009. pp. 46-57.

SQL Script

RF-SF

SELECT count(p.proj_unixname) FROM projects p, rf_projects r
WHERE p.proj_unixname = r.proj_unixname
AND p.datasource_id= <current>
AND r.datasource_id= <current>;

RF-FM

SELECT count(f.projectname_short_fixed) FROM fm_projects f, rf_projects r
WHERE f.projectname_short_fixed = r.proj_unixname
AND f.datasource_id = <current> 
AND r.datasource_id = <current>;

FM-SF

SELECT count(f.projectname_short_fixed) FROM fm_projects f, projects p
WHERE f.projectname_short_fixed = p.proj_unixname
AND f.datasource_id = <current> 
AND p.datasource_id = <current>;

SF-OW

SELECT count(p.proj_unixname) FROM projects p, ow_projects o
WHERE p.proj_unixname = o.proj_unixname
AND p.datasource_id= <current> 
AND o.datasource_id= <current>;

RF-OW

SELECT count(r.proj_unixname) FROM rf_projects r, ow_projects o
WHERE r.proj_unixname = o.proj_unixname
AND r.datasource_id= <current> 
AND o.datasource_id= <current>;

FM-OW

SELECT count(f.projectname_short_fixed) FROM fm_projects f, ow_projects o
WHERE f.projectname_short_fixed = o.proj_unixname
AND f.datasource_id = <current>
AND o.datasource_id = <current>;
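
The scripts above return only a count. When checking whether two same-named projects are really the same project (as with starfish), it may help to list the shared names instead. Here is a minimal sketch for the Sourceforge and Rubyforge pair, written with an explicit JOIN and run against an in-memory SQLite database. The rows other than the starfish name, the datasource_id values, and the helper name shared_short_names are made up for illustration.

```python
import sqlite3

def shared_short_names(conn, sf_id, rf_id):
    """List short names that appear in both the SF and RF project tables."""
    sql = """
        SELECT p.proj_unixname
        FROM projects p
        JOIN rf_projects r ON p.proj_unixname = r.proj_unixname
        WHERE p.datasource_id = ?
          AND r.datasource_id = ?
        ORDER BY p.proj_unixname
    """
    return [row[0] for row in conn.execute(sql, (sf_id, rf_id))]

# Invented sample rows; only the name columns matter for this match.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (proj_unixname TEXT, datasource_id INTEGER)")
conn.execute("CREATE TABLE rf_projects (proj_unixname TEXT, datasource_id INTEGER)")
conn.executemany("INSERT INTO projects VALUES (?, ?)", [
    ("starfish", 1),
    ("onlyonsf", 1),
])
conn.executemany("INSERT INTO rf_projects VALUES (?, ?)", [
    ("starfish", 2),
    ("onlyonrf", 2),
])

names = shared_short_names(conn, 1, 2)
print(names)
```

A matching name is only a hint. As the starfish example shows, the descriptions still need to be compared before treating two entries as one project.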

As of June 2009, how have projects in each repository grown by year?

Description

This chart shows the number of NEW projects added to each repository by year.

Visualization

Project Growth Chart

SQL Script

Sourceforge:

SELECT year(date_registered) , count(DISTINCT proj_unixname) FROM projects
WHERE datasource_id = <current>
GROUP BY year(date_registered)
ORDER BY year(date_registered);

Freshmeat:

SELECT year(date_added), count(DISTINCT project_id) FROM fm_projects
WHERE datasource_id= <current>
GROUP BY year(date_added)
ORDER BY year(date_added);

Rubyforge:

SELECT year(date_registered), count(DISTINCT proj_unixname) FROM rf_projects
WHERE datasource_id= <current>
GROUP BY year(date_registered)
ORDER BY year(date_registered);

Objectweb:

SELECT year(date_registered), count(DISTINCT proj_unixname) FROM ow_projects
WHERE datasource_id= <current>
GROUP BY year(date_registered)
ORDER BY year(date_registered);

As of June 2009, how many projects are listed in each repository?

Description

This chart shows the number of projects that FLOSSmole most recently collected from each repository.

Visualization

Project Count Chart

SQL Script

SELECT count(DISTINCT proj_unixname) FROM projects
WHERE datasource_id= <current>;


SELECT count(DISTINCT project_id) FROM fm_projects
WHERE datasource_id= <current>;


SELECT count(DISTINCT proj_unixname) FROM rf_projects
WHERE datasource_id= <current>;


SELECT count(DISTINCT proj_unixname) FROM ow_projects
WHERE datasource_id= <current>;


SELECT count(DISTINCT proj_num) FROM fsf_projects
WHERE datasource_id= <current>;
