Freshmeat Data Debuts

We have been collecting Freshmeat data in the database for a long time, and I have finally made a release available for general consumption via the Sourceforge file release system. Get the Freshmeat March 2006 files now! (April is coming soon.)

Included in this release are:

--fmRawProjectAuthors (authors/roles for each project)
--fmRawProjectDesc (textual description of each project)
--fmRawProjectInfo (general information on each project, such as registration dates and URLs)
--fmRawProjectLicense (the license for each project)
--fmRawProjectStats (statistics generated by Freshmeat for each project)

April Sourceforge data released

The April Sourceforge data has been released. You can pick up the files from our Sourceforge file release page.

Here's what's included:
Package: sfProjectInfo
Release: sfProjectInfo02-Apr-2006
Files:
--ProjectList02-Apr-2006.csv.bz2: list of just project names
--ProjectInfo02-Apr-2006.csv.bz2: list of all basic project info (e.g. number of developers, registration dates)
--ProjectDescriptions02-Apr-2006.csv.bz2: project names and their text descriptions (this file is quite large)

Package: sfRawDeveloperData
Release: sfRawDeveloperData02-Apr-2006
Files:
--sfRawDevelopers02-Apr-2006.csv.bz2: list of all developers
--sfRawDevProjects02-Apr-2006.csv.bz2: list of which projects are worked on by which developers

Package: sfRawData
Release: sfRawData02-Apr-2006
Files:
--sfRawDbEnvData02-Apr-2006.csv.bz2: list of projects and their database environments
--sfRawDonorData02-Apr-2006.csv.bz2: list of projects and their donors
--sfRawIntAudData02-Apr-2006.csv.bz2: list of projects and their intended audiences
--sfRawLicenseData02-Apr-2006.csv.bz2: list of projects and their open source licenses
--sfRawOpSysData02-Apr-2006.csv.bz2: list of projects and their operating systems
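
The released files are bz2-compressed text, so they can be previewed without unpacking them first. Here is a minimal Python sketch of one way to do that. The function name, the comma delimiter, and the UTF-8 encoding are assumptions, not something the release notes specify, so check them against the actual file before relying on the output.

```python
import bz2
import csv
from itertools import islice


def preview_bz2_csv(path, limit=5, delimiter=","):
    """Return the first `limit` rows of a bz2-compressed delimited text file.

    The delimiter and encoding are assumptions; adjust them to the real file.
    """
    # Text-mode bz2.open decompresses on the fly, so the archive never has
    # to be unpacked to disk or read into memory in one piece.
    with bz2.open(path, mode="rt", encoding="utf-8",
                  errors="replace", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        # islice stops after `limit` rows, which matters for the large files.
        return list(islice(reader, limit))
```

For example, preview_bz2_csv("ProjectList02-Apr-2006.csv.bz2") would return the first five rows of the project list as lists of strings, assuming the file has been downloaded to the current directory.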

Social Network analysis over time using FLOSSmole data

Just sent off the camera-ready version of a paper built on data available in the tracker tables of the FLOSSmole database.

Howison, J., Inoue, K., and Crowston, K. (2006). Social dynamics of free and open source team communications. In Proceedings of the IFIP 2nd International Conference on Open Source Software, Lake Como, Italy. Available from: http://floss.syr.edu/publications/howison_dynamic_sna_intoss_ifip_short.pdf

This paper furthers inquiry into the social structure of free and open source software (FLOSS) teams by undertaking social network analysis across time. Contrary to expectations, we confirmed earlier findings of a wide distribution of centralizations even when examining the networks over time. The paper also provides empirical evidence that while change at the center of FLOSS projects is relatively uncommon, participation across the project communities is highly skewed, with many participants appearing for only one period. Surprisingly, large project teams are not more likely to undergo change at their centers.

February 2006 files released

Sourceforge data has been released for February 2006. Get the files from our Sourceforge file release page.

What's included in this release:

Package: sfProjectInfo
Release: sfProjectInfo02-Feb-2006
Files:
--ProjectList02-Feb-2006.csv.bz2: list of just project names
--ProjectInfo02-Feb-2006.csv.bz2: list of all basic project info
--ProjectDescriptions02-Feb-2006.csv.bz2: project names and their text descriptions (this file is quite large)

Package: sfRawDeveloperData
Release: sfRawDeveloperData02-Feb-2006
Files:
--developers02-Feb-2006.csv.bz2: list of all developers
--developer_projects02-Feb-2006.csv.bz2: list of which projects are worked on by which developers

Package: sfRawData
Release: sfRawData02-Feb-2006
Files:
--project_dbenv02-Feb-2006.csv.bz2: list of projects and their database environments
--project_donors02-Feb-2006.csv.bz2: list of projects and their donors
--project_intaud02-Feb-2006.csv.bz2: list of projects and their intended audiences
--project_licenses02-Feb-2006.csv.bz2: list of projects and their open source licenses
--project_opsys02-Feb-2006.csv.bz2: list of projects and their operating systems

tips for using the query tool

NOTE: This message describes the old query tool, which has since been replaced. The new tool is located here: New Query Tool


Original message:
If you use the query tool, be aware that the amount of data in some of our tables is truly immense.

Tips:

1. do a "describe" on each table to see what's in there first:

"describe fm_projects"

This will tell you the structure of the table.

2. If you want to see a few sample rows and feel you simply MUST do a "select *", at least add a MySQL-style "limit" clause, like this:

"select * from fm_projects limit 25"

3. If you get an error mentioning something like a "timeout", your query was probably just too large. Email us, or chat with us on IRC or AIM, and we can figure out what went wrong or find a way to optimize the query.

4. Use the text files - many of the queries you want are the same queries that everyone wants! So we've taken the liberty of making text files of these items for your convenience.
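
Tips 1 and 2 boil down to two statements you run before anything else on an unfamiliar table. As a small Python sketch (the helper name and the table-name check are my own, not part of the query tool), this builds both statements for a given table so the "limit" never gets forgotten:

```python
import re

# Plain identifiers only, so a typo or stray text cannot turn into extra SQL.
_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def exploratory_queries(table, limit=25):
    """Return the 'describe' and capped 'select *' statements for a table."""
    if not _TABLE_NAME.fullmatch(table):
        raise ValueError(f"not a plain table name: {table!r}")
    if int(limit) < 1:
        raise ValueError("limit must be at least 1")
    return [
        f"describe {table}",                          # tip 1: see the structure
        f"select * from {table} limit {int(limit)}",  # tip 2: a few sample rows
    ]


if __name__ == "__main__":
    for statement in exploratory_queries("fm_projects"):
        print(statement)
```

The printed statements are the same two shown in the tips above and can be pasted into the query tool as-is.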

tidbit: freshmeat and sourceforge

Freshmeat (FM) describes itself thusly: "freshmeat maintains the Web's largest index of Unix and cross-platform software, themes and related 'eye-candy', and Palm OS software."

And Sourceforge (SF) is, of course, "the world's largest Open Source software development web site, hosting more than 100,000 projects and over 1,000,000 registered users with a centralized resource for managing projects, issues, communications, and code."

Here at FLOSSmole, we keep tabs on Freshmeat AND Sourceforge projects. Some of the projects listed on Freshmeat are also listed in Sourceforge, and some of them are not. One way to tell which SF projects are listed on FM is to query our Freshmeat tables and ask which Freshmeat projects resolve to a "sf.net" or "sourceforge" URL:

SELECT count(*)
FROM fm_project_homepages
WHERE datasource_id=18
AND (real_url_homepage LIKE "%sourceforge%"
OR real_url_homepage LIKE "%sf.net")


For the March 5 data (datasource_id=18), this query returns a count of 10278.
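
One thing to watch in a query like this: in SQL, AND binds more tightly than OR, so the datasource_id filter only applies to both URL patterns when they are grouped in parentheses. Here is a minimal Python sketch using the standard sqlite3 module and a toy table. The rows are invented for illustration, and SQLite stands in for MySQL here. Only the table name, column names, and LIKE patterns come from the query above.

```python
import sqlite3

# Invented sample rows: (datasource_id, real_url_homepage).
TOY_ROWS = [
    (18, "http://sourceforge.net/projects/alpha"),
    (18, "http://beta.sf.net"),
    (18, "http://example.org/gamma"),
    (15, "http://delta.sf.net"),
    (15, "http://sourceforge.net/projects/epsilon"),
]

UNGROUPED = (
    "SELECT count(*) FROM fm_project_homepages "
    "WHERE datasource_id = ? "
    "AND real_url_homepage LIKE '%sourceforge%' "
    "OR real_url_homepage LIKE '%sf.net'"
)

GROUPED = (
    "SELECT count(*) FROM fm_project_homepages "
    "WHERE datasource_id = ? "
    "AND (real_url_homepage LIKE '%sourceforge%' "
    "OR real_url_homepage LIKE '%sf.net')"
)


def count_both_ways(rows, datasource_id):
    """Return (ungrouped_count, grouped_count) for one datasource."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE fm_project_homepages "
            "(datasource_id INTEGER, real_url_homepage TEXT)"
        )
        conn.executemany(
            "INSERT INTO fm_project_homepages VALUES (?, ?)", rows
        )
        ungrouped = conn.execute(UNGROUPED, (datasource_id,)).fetchone()[0]
        grouped = conn.execute(GROUPED, (datasource_id,)).fetchone()[0]
        return ungrouped, grouped
    finally:
        conn.close()


if __name__ == "__main__":
    # Without parentheses, the sf.net row from datasource 15 is counted too.
    print(count_both_ways(TOY_ROWS, 18))  # (3, 2)
```

On the toy rows, the ungrouped form counts the "sf.net" homepage from datasource 15 as well, while the grouped form counts only datasource 18.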

Other things we track about Freshmeat projects are the authors, the dependencies (what other software is this software dependent upon?), and how the project is classified in the trove.

The tables you'll be interested in are:

fm_project_authors
fm_project_dependencies
fm_project_homepages

Some pretty pictures to amuse you...

...while we get February's data parsed and loaded.

These graphs showing December 2005 trends were made by FLOSSmole's newest developer.

Most connected developers - I really like this chart because it shows that the most connected developer on Sourceforge (i.e. member of the most projects) is a graphic designer! How cool is that? This makes perfect sense when you think about it for a second, but it wouldn't have been MY first guess.
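
If you want to reproduce a ranking like this from the developer-to-project data, counting memberships per developer is enough. This is a rough Python sketch, not the script that produced the chart. It assumes you have already loaded the data as (developer, project) pairs, and the names in the example are made up.

```python
from collections import Counter


def most_connected(memberships, top_n=3):
    """Rank developers by the number of distinct projects they belong to."""
    # set() drops repeated (developer, project) pairs so nobody is
    # counted twice for the same project.
    counts = Counter(developer for developer, _project in set(memberships))
    # Sort by count (descending), then by name, so ties come out in a
    # stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    sample = [
        ("ann", "p1"), ("ann", "p2"), ("ann", "p3"),
        ("bob", "p1"), ("bob", "p2"),
        ("cy", "p1"),
        ("ann", "p1"),  # duplicate pair, ignored
    ]
    print(most_connected(sample))  # [('ann', 3), ('bob', 2), ('cy', 1)]
```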

Here are some older charts, similar to things we have run before - these graphs show the kinds of reports you can run using FLOSSmole data:

Database Environments

SF project descriptions

We got a request for Sourceforge project descriptions. These are the little paragraphs that the project owners write to describe a given project. I've parsed out the descriptions and put them in this file release. Also, I created a new table called project_description to hold this information if you're using the query tool.

freshmeat dec and jan

December and January Freshmeat files have been added as datasource_ids 14 (Dec) and 15 (Jan). Use the Query Tool to explore the fm_* tables (these are the tables that hold the Freshmeat data).

december 2005 data

We've run the December 2005 Sourceforge data. The raw HTML has been stored as datasource_id 13 if you're using the query tool; otherwise, the text files are available on our Sourceforge project page.

We've got the usual stuff, all the Sourceforge project names, all project data, developer counts, who is working on what projects, what programming languages are being used, operating system counts, all that good stuff. Have fun!

Current status: found something new to add: donors. This could be interesting for an SNA (social network analysis). I'll write a script to parse donors and make a new table, and I'll post here when I'm done.