The Sunlight Foundation today published a post defending their claim that anti-net neutrality activists "dominated" the second round of the FCC’s Open Internet proceeding, based on data released by the FCC. However, numerous problems with the data and their methodology make it impossible to support their conclusions.
There are three huge problems.
First, we know the data Sunlight used excluded many of our comments. Today we confirmed with the FCC that at least 244,000 pro-net neutrality comments were not processed correctly due to an error on their end, and were missing from the data they released to Sunlight. This alone is enough to tip the scales in favor of net neutrality activists, if going purely by the numbers, and the true number of missing comments could be much higher.
Second, the Sunlight Foundation knew they were missing over 800,000 comments, and they didn’t try to figure out whether the data they analysed was a representative sample. The group that "dominated" the second round of comments (in Sunlight’s words) could have simply been the one organization that, due to the technique it used for submitting, didn’t get all its submissions garbled. In fact, it looks like that’s what happened. American Commitment’s own reported numbers are actually a little lower than their total in Sunlight’s report (probably due to a final burst of paid advertising before the deadline). Pro-net neutrality comments got lost. Anti-net neutrality comments didn’t. The sample wasn’t representative of the whole.
The third problem in Sunlight’s report is their methodology, as they themselves describe it. Sunlight has publicly acknowledged a huge difference in how they counted some comments from pro-net neutrality groups like Free Press versus comments from American Commitment, the one (shady) anti-net neutrality group.
All groups were effectively collecting signatures on a letter. American Commitment submitted them as a barrage of identical comments, while groups like Free Press submitted them as signatures on a single letter. The FCC says it recognizes and counts both. But Sunlight Foundation admits they chose to treat them differently, excluding multiple signatures on a single letter from the count.
We can’t see any basis for this, other than convenience. In both cases, an individual member of the American public is taking a moment to say "Hey FCC: I agree with the following statement." It just happens that two groups submitted that sentiment in different ways. It would be one thing if the FCC treated such comments differently, but they don’t! The FCC has said signatures count the same as individual comments.
Sunlight expresses it in neutral tones, saying "This isn’t to suggest that signature-only submissions shouldn’t be counted, but the focus of our report meant that we discarded them." The thing is, that’s an arbitrary choice, at odds with both the intent of the people who commented and the FCC’s own criteria. And it has a huge impact on the results!
By making this choice, Sunlight knows they are excluding many pro-net neutrality comments while including *every* anti-net neutrality comment from American Commitment. In other words, they know their sample is skewed.
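To make the effect of that choice concrete, here is a minimal sketch of the two counting rules. The group names and numbers are made up purely for illustration; they are not figures from the FCC, Sunlight, or any campaign. The first rule ignores extra signatures on a letter, which is our rough reading of Sunlight's approach. The second counts every signer, which is how the FCC says it treats them.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    group: str       # hypothetical group label, not a real organization
    signatures: int  # number of people who put their name to this filing


def count_by_filing(submissions):
    """Count each filing once, ignoring how many people signed it."""
    totals = {}
    for s in submissions:
        totals[s.group] = totals.get(s.group, 0) + 1
    return totals


def count_by_signer(submissions):
    """Count every person who signed, however the filing was packaged."""
    totals = {}
    for s in submissions:
        totals[s.group] = totals.get(s.group, 0) + s.signatures
    return totals


# Toy data only: group_a files five identical one-person comments,
# group_b files one letter carrying five signatures.
toy = [Submission("group_a", 1) for _ in range(5)] + [Submission("group_b", 5)]

print(count_by_filing(toy))  # group_a appears to lead
print(count_by_signer(toy))  # the two groups are tied
```

The same five people spoke up on each side in this toy example. Only the packaging differs, and only the first rule turns that packaging into an apparent lead.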
Finally, we’re bothered by how Sunlight handled the correction. Our CTO was up all night with the FCC data they used, comparing it to ours, and found some serious issues. We told them this. We urged them to work with us today to figure out what went wrong.
Instead, they worked on a response in silence, simply justifying their work without examining their exclusion of signatures. This would be bad even if American Commitment wasn’t engaged in a cynical attempt to manipulate the public conversation around an extremely important issue. But they are. Read about it. Sunlight knows this context, but instead of approaching the data carefully and working with others to get the answer right, they’re playing right into the scam.
The headline of their first post was (and remains at the time of writing) "One group dominates the second round of net neutrality comments." If you arbitrarily ignore a subset of commenters (the signers) and base your conclusions on data everyone agrees is incomplete and broken, that’s true. The problem is, their provocative headline doesn’t include that disclaimer.
We still hope Sunlight will do the right thing, acknowledge the mistakes they and the FCC made, and correct the record. That headline is the first thing they should fix.
A note on duplicates: Sunlight writes that, of the limited number of comments from Battle for the Net that actually made it into the FCC’s data, many were duplicates and thus excluded from the study. We verified that we sent at least 526,657 unique CSV comments to the FCC by examining reference numbers we attached to them. Today the FCC acknowledged that there were technical problems with their ability to process the CSVs, and they are missing at least 244,000 pro-net neutrality comments from Battle for the Net. If the comments that actually made it into Sunlight’s data were duplicates of one another, then the number of comments the FCC lost or garbled in its release is even greater than the 244,000 confirmed so far.
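For readers who want to run a similar check on their own submissions, this is roughly what counting unique comments by reference number can look like. It is a sketch, not the script we used: the column name `reference_number` and the sample rows are assumptions made up for the example.

```python
import csv
import io


def count_unique_references(csv_file, column="reference_number"):
    """Return how many distinct, non-empty reference numbers appear in a CSV.

    `column` is a hypothetical header name; adjust it to match the real file.
    """
    reader = csv.DictReader(csv_file)
    seen = set()
    for row in reader:
        ref = (row.get(column) or "").strip()
        if ref:  # skip rows with no reference number
            seen.add(ref)
    return len(seen)


# Made-up sample: three rows, one of which repeats a reference number.
sample = io.StringIO(
    "reference_number,comment\n"
    "r1,Please protect net neutrality\n"
    "r2,Please protect net neutrality\n"
    "r1,Please protect net neutrality\n"
)
print(count_unique_references(sample))
```

Comparing this count for what was sent against the same count for what appears in the released data shows how many distinct comments went missing, regardless of whether their text is identical.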