Submitted by megan on November 30, 2010 - 2:26pm
Whew! Google Code is collected, parsed, and released. Backup to Teragrid is happening now (just as soon as I solve this little issue of "disk quota exceeded" - fun!). In the meantime, go to the FLOSSmole Google Code Downloads Page and get your hot fresh data.
Remember, the files marked "datamarts" are SQL code you can use to make your own version of the database. The raw delimited files are marked .txt.bz2. The datasource_id for Google Code this release is 235.
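If you would rather peek inside one of the raw .txt.bz2 files before building a database, here is a minimal Python sketch for streaming rows out of it. A few assumptions to flag: the tab delimiter is a guess (check the file you downloaded), the helper name read_delimited_bz2 is made up for this example, and the file name in the usage note is hypothetical.

```python
import bz2
import csv
from typing import Iterator, List


def read_delimited_bz2(path: str, delimiter: str = "\t") -> Iterator[List[str]]:
    """Yield each row of a bz2-compressed delimited text file as a list of strings.

    The tab default is an assumption; pass the real delimiter if it differs.
    """
    # Open in text mode so the file is decompressed and decoded on the fly,
    # without unpacking the whole archive to disk first.
    with bz2.open(path, mode="rt", encoding="utf-8", errors="replace", newline="") as handle:
        # QUOTE_NONE treats quote characters as ordinary data, which is the
        # safer choice for raw dumps that were not written as quoted CSV.
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row
```

For example, for row in read_delimited_bz2("some_google_code_file.txt.bz2"): print(row) would print one list per line of the file, reading it a row at a time rather than loading it all into memory.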
Submitted by megan on September 21, 2010 - 10:47am
Just released Github data for September today. This took about 10 days to collect, parse and release. You can download the files here (along with Freshmeat, Rubyforge, Objectweb, Free Software Foundation, Savannah, Tigris, etc), or wait for the next Teragrid backup if you want direct database access. (Google Code is collecting now, and upon completion of that, I'll do a final September Teragrid backup.)
Submitted by megan on September 1, 2010 - 4:38pm
Hello moles,
We have new data backed up to Teragrid. Here is what is included:
Google Code - metadata on projects, developers, issues, etc. Plus HTML.
Github - metadata. Plus XML.
Launchpad - metadata on projects, groups, developers, wiki. Plus HTML.
Tigris - discussions! messages! project metadata. Plus HTML.
Plus all the other forges you've known and loved for so many years: fm, ow, rf, fsf, sv, etc etc.
Submitted by megan on August 12, 2010 - 11:06am
Hello moles! I've released a new set of Google Code project data to our own downloads page (on Google Code, no less!) - the datasource_id is 226.
This data took over a month to collect. Included are the following:
Submitted by megan on August 5, 2010 - 4:50pm
Interesting article by some folks at 80Legs about crawling the web versus using an API to gather data. On several occasions we've chosen to use an API rather than crawling. This pretty much summarizes the limitations around that choice.
Submitted by megan on June 18, 2010 - 9:59pm
Introducing a new data source: Launchpad data. In this collection, Launchpad has about 19k projects in it and about 34k developers.
mysql> select count(*) from lp_projects where datasource_id=227;
18956
mysql> select count(distinct dev_loginname) from lp_developer_indexes where datasource_id=227;
34051
Available on our Google Code downloads page: Launchpad data
Submitted by megan on June 6, 2010 - 11:11am
Submitted by megan on June 4, 2010 - 2:18pm
The database schema page here on flossmole.org has been updated. I've got a single-page PNG of the schema, and an MWB file for those of you who like MySQL Workbench.
Submitted by megan on May 30, 2010 - 7:32am
I've decided to start using the wiki on our Google Code site to mark changes as they are made to the project. You can see what we're working on and where we left off on a particular project. This is as much for me as it is for you! Sometimes I forget where I was in a particular project and this will help me.
Here is a link to the wiki of the May 2010 Changelog on Google Code.
Submitted by megan on May 30, 2010 - 7:28am
It's been a long time since we shipped new data to Teragrid. I apologize for that oversight.
The new connection info is bebop.sdsc.edu and then the mysql port and your username/password.
Differences/shortcomings/things to know:
1) We are still collecting data all the time, so the Teragrid is always a bit behind the master collection at Elon and the project file downloads at Google Code.
2) This includes datasources up to 222, although 223 is listed in the datasources table.