Microsoft CodePlex data

CodePlex was Microsoft's open source code forge. It began in 2006 and shut down in 2017. We collected the data at the time of shutdown and have provided it here at FLOSSmole for anyone to use.

Data is available in raw format or in the FLOSSmole database.

Sample graphics

SELECT create_date, count(*)
FROM cp_projects
GROUP BY 1
ORDER BY 1 ASC;
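
The query above can be tried out without access to the FLOSSmole database. What follows is only a minimal sketch, assuming an in-memory SQLite table that holds just the proj_name and create_date columns. The three rows are made-up placeholders, not real CodePlex data, and the 'YYYY-MM' text format for create_date is an assumption.

```python
import sqlite3

# In-memory stand-in for the FLOSSmole database (assumption: SQLite, not the real server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cp_projects (proj_name TEXT, create_date TEXT)")

# Made-up placeholder rows; the 'YYYY-MM' date format is assumed for illustration.
conn.executemany(
    "INSERT INTO cp_projects VALUES (?, ?)",
    [("alpha", "2006-05"), ("beta", "2006-05"), ("gamma", "2007-01")],
)

# Same query as above: number of projects per creation date, oldest first.
projects_per_month = conn.execute(
    """
    SELECT create_date, count(*)
    FROM cp_projects
    GROUP BY 1
    ORDER BY 1 ASC
    """
).fetchall()

for create_date, n in projects_per_month:
    print(create_date, n)
```

Each returned row pairs a creation date with the number of projects created on that date, which is the kind of series a chart of projects over time would plot.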

Data Model

cp_projects table:
This table holds 400k records of projects that were, at some point, created on CodePlex, including "spam" projects and removed projects. Not every project has many details filled in, because the details come from the project's "index" page (see next table) and we were not able to get an index page for about 300k of the spam/canceled projects.

proj_name: the short name of the project
datasource_id: the collection number (70910 in this case)
proj_long_name: the long name of the project
proj_url: the project's URL on CodePlex
description: the long description of the project (some of these are quite long)
current_dl_version: the latest version number of the software that you can download
current_dl_date: the date that the current downloadable version was posted
download_count: the count of downloads of the latest version
proj_status: the last state the project was in (alpha, beta, etc)
proj_license: the license the project uses
create_date: the date the project was first created (month/year only - comes from the history page)
last_updated: the last datetime that this record was updated in our database

cp_projects_indexes table (database only):
This table holds 100k records of home page, history page, and developers page content, one for every non-canceled project on CodePlex. NOTE: even though some spam projects were removed from the 400k list above, many projects in this table could still be considered "spam", since they are gibberish, hold no real code or data, and are being used for advertising.

proj_name: the short name of the project
datasource_id: the collection number (70910)
home_html: the CodePlex home page for the project
history_html: the CodePlex history page for the project (this is where cp_projects.create_date comes from)
people_html: the CodePlex developer page where the developers for each project are listed
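
Because project details come from the index pages, a left join between the two tables is one way to see which projects have no collected pages. This is a minimal sketch, assuming an in-memory SQLite database and assuming that proj_name together with datasource_id identifies a project in both tables (the keys are not stated here). All rows are made-up placeholders.

```python
import sqlite3

# In-memory stand-in with only the columns needed for the join (assumed types).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cp_projects (proj_name TEXT, datasource_id INTEGER);
    CREATE TABLE cp_projects_indexes (
        proj_name TEXT, datasource_id INTEGER, home_html TEXT
    );
    -- Made-up rows: only 'alpha' has an index page.
    INSERT INTO cp_projects VALUES ('alpha', 70910), ('beta', 70910), ('gamma', 70910);
    INSERT INTO cp_projects_indexes VALUES ('alpha', 70910, '<html>...</html>');
    """
)

# Projects in cp_projects with no matching row in cp_projects_indexes.
missing_index = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.proj_name
        FROM cp_projects p
        LEFT JOIN cp_projects_indexes i
          ON i.proj_name = p.proj_name
         AND i.datasource_id = p.datasource_id
        WHERE i.proj_name IS NULL
        ORDER BY p.proj_name
        """
    )
]
print(missing_index)
```

Swapping the WHERE clause for `i.proj_name IS NOT NULL` would instead keep only the projects whose pages were collected.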

cp_project_history table:
Lists the 864k unique history events (such as page changes) recorded for the projects in each collection (70910 in this case). A project can have several occurrences of the same event during a given period.

history_id: a unique identifier for each history event (primary key)
proj_name: the short name for a project
datasource_id: the collection number
month, year: when the history event took place
page_name: the name of the page that was updated
page_url: the URL of the page that was updated
author, author_url: the person who updated the page, and that author's URL
last_updated: the timestamp for when this record was last updated
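
Since a project can have several history events in the same period, a common first step is to count events per project per month. This is a minimal sketch, assuming an in-memory SQLite table, integer month and year columns (the actual column types are not stated here), and made-up placeholder rows.

```python
import sqlite3

# In-memory stand-in for cp_project_history with a subset of its columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cp_project_history (
        history_id INTEGER PRIMARY KEY,
        proj_name TEXT,
        datasource_id INTEGER,
        month INTEGER,
        year INTEGER,
        page_name TEXT
    )
    """
)

# Made-up events; 'alpha' has the same page updated twice in May 2006.
conn.executemany(
    "INSERT INTO cp_project_history "
    "(proj_name, datasource_id, month, year, page_name) VALUES (?, ?, ?, ?, ?)",
    [
        ("alpha", 70910, 5, 2006, "Home"),
        ("alpha", 70910, 5, 2006, "Home"),
        ("alpha", 70910, 6, 2006, "Documentation"),
        ("beta", 70910, 1, 2007, "Home"),
    ],
)

# Number of history events per project, year, and month for one collection.
events_per_month = conn.execute(
    """
    SELECT proj_name, year, month, count(*) AS events
    FROM cp_project_history
    WHERE datasource_id = 70910
    GROUP BY proj_name, year, month
    ORDER BY proj_name, year, month
    """
).fetchall()

for row in events_per_month:
    print(row)
```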

cp_project_people table:
Lists the 49k individual people who were working on any CodePlex project.

username: the CodePlex system username of the person
datasource_id: the collection number
personal_statement: typed by the person, tells something about themselves
member_since: the date the person joined
last_visit: the last date they used the site
user_html: the home page for the user on CodePlex
last_updated: the timestamp for when this record was last updated

cp_project_people_roles table:
Lists the person, project, and what role they had on that project at the time of collection. A person can only have a single role on a given project at one time.

proj_name: the short name of the project
datasource_id: the collection number
username: the person's CodePlex username
role: the role the person held on this project
project_member_since: when they joined the project
last_updated: the timestamp for when this record was last updated
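
The roles table links people to projects, so joining it to cp_project_people gives each project's team along with details about each member. This is a minimal sketch, assuming an in-memory SQLite database and assuming that username plus datasource_id is the join key (not stated here). The usernames, role names, and dates are made-up placeholders.

```python
import sqlite3

# In-memory stand-in with a subset of the columns from both tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cp_project_people (
        username TEXT, datasource_id INTEGER, member_since TEXT
    );
    CREATE TABLE cp_project_people_roles (
        proj_name TEXT, datasource_id INTEGER, username TEXT, role TEXT
    );
    -- Made-up people and roles, for illustration only.
    INSERT INTO cp_project_people VALUES
        ('ann', 70910, '2008-01-15'),
        ('bob', 70910, '2010-03-02');
    INSERT INTO cp_project_people_roles VALUES
        ('alpha', 70910, 'ann', 'Coordinator'),
        ('alpha', 70910, 'bob', 'Developer'),
        ('beta', 70910, 'ann', 'Developer');
    """
)

# One row per (project, person): the role plus when the person joined the site.
team_members = conn.execute(
    """
    SELECT r.proj_name, r.username, r.role, p.member_since
    FROM cp_project_people_roles r
    JOIN cp_project_people p
      ON p.username = r.username
     AND p.datasource_id = r.datasource_id
    ORDER BY r.proj_name, r.username
    """
).fetchall()

for row in team_members:
    print(row)
```

A person appears once per project they belong to, so 'ann' shows up under both placeholder projects with a different role in each.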

Attachments:
withoutspam.png (31.32 KB)
withspam.png (24.66 KB)
Data Resources: