Breaking Update: the FCC has now acknowledged to Fight for the Future in an email that there is a discrepancy in their data and they dropped at least 244,881 pro-net neutrality comments: https://twitter.com/fightfortheftr/status/545318660387377152
Update 2: Sunlight has responded. To read our analysis of their response and our rebuttal to the Sunlight Foundation’s report, click here: http://tumblr.fightforthefuture.org/post/105475259503/why-is-sunlight-foundation-playing-into-american
WASHINGTON, DC – Several media outlets have run headlines based on a Sunlight Foundation study that relies on faulty data and drastically underrepresents the number of pro-net neutrality comments the FCC received during its second comment period.
Based on a combination of errors by Sunlight and the FCC, it appears that Sunlight’s report undercounts the number of comments submitted through Battle for the Net (a collaboration of Fight for the Future, Demand Progress, and Free Press) by at least 500,000. This alone undermines Sunlight’s claim that anti-net neutrality comments dominated the reply-comment period, but there are likely additional errors as well.
Based on an initial look at the data by Fight for the Future’s technologists, it appears that there are two major issues:
The FCC failed to register a significant number of pro-net neutrality comments that were sent. We’ve thus far identified at least 150,798 comments that were missing from the FCC’s data dump, and ongoing analysis of their data suggests that this number is in fact much higher. This alone is enough to completely unseat the conclusion that anti-net neutrality commenters "dominated" the second comment period.
The Sunlight Foundation’s analysis used a flawed data set that it misleadingly characterized as representative of the full set of comments. By Sunlight’s own admission, it ignored one third of the FCC’s data release, close to 800,000 comments, because of difficulty processing those comments. The data Sunlight used cannot be assumed to be "reasonably representative" of all the comments. Comments could be submitted to the FCC through several methods, and this led to inconsistencies in the FCC’s release of the data, so it is an error for Sunlight to infer that the excluded comments had the same distribution of pro- vs. anti-net neutrality submissions as the data Sunlight did consider. In particular, while pro-net neutrality comments were vastly undercounted by Sunlight, it appears that nearly all of American Commitment’s comments were counted: the organization claims to have generated 800,000 comments, and Sunlight claims to have counted 800,000 comments from them. As a result, the Sunlight Foundation’s finding that anti-net neutrality groups "dominated" the second-round comment period is completely unfounded.
Fight for the Future co-founder Tiffiniy Cheng said, "Millions of people have spoken out in support of net neutrality, and their voices matter. Getting these numbers right is important. The FCC and the Sunlight Foundation need to act immediately to correct the record, and media outlets that have run stories based on the faulty data should publish prominent corrections."
"Sunlight applied a flawed sampling methodology to a flawed set of data, and drew conclusions that are impossible to make with any ‘reasonably representative’ certainty," said Jeff Lyon, Fight for the Future’s Chief Technical Officer. "Sunlight’s approach is like trying to draw conclusions about the average income in Massachusetts by only surveying people in Boston." Lyon provided the following explanation of the serious errors he was able to identify in the FCC’s data and Sunlight’s analysis of it:
There are two major problems with the data the FCC released and the resulting study:
The Sunlight Foundation based its study on the data released by the FCC in this October 22 blog post by Gigi Sohn. However, the FCC failed to register hundreds of thousands of pro-net neutrality comments from Battle for the Net, and perhaps from other organizations.
Sunlight’s methodology was flawed. Sunlight was unable to parse all of the data released by the FCC. According to the FCC, there were 2.4 million comments in the data, but Sunlight was only able to read 1.6 million comments. Sunlight’s study is therefore based on a subset that misses one third of the FCC’s data dump; this subset is not reasonably representative of the big picture and in fact consisted mainly of one set of comments. Furthermore, Sunlight significantly underreported the number of comments from Battle for the Net that the FCC actually recorded.
In actuality, there were at least 998,498 comments sent from Battle for the Net, but between the FCC not recording them and Sunlight applying a flawed methodology to analyze what little data there actually was, the end result was completely distorted.
The FCC failed to register our comments:
In the FCC’s release of the data, Ms. Sohn reports that the FCC received 725,169 comments through ECFS and CSV uploads during the second comment period from July 19th to September 15th.
However, just between September 12th and September 15th, Battle for the Net sent 527,953 comments through CSV uploads alone. We also submitted 470,596 more comments via ECFS and email. Battle for the Net’s numbers alone are far higher than the numbers reported by Ms. Sohn.
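The comparison above is simple arithmetic, sketched here in Python as a quick check. The figures are the ones quoted in this release; the variable names are ours and purely illustrative, and the sketch assumes the FCC's reported total and our own submission counts cover comparable sets of comments.

```python
# Figures quoted in this release (variable names are illustrative only).
fcc_reported_total = 725_169  # FCC: ECFS + CSV comments, July 19 to September 15
bftn_csv = 527_953            # Battle for the Net CSV uploads, September 12 to 15
bftn_ecfs_email = 470_596     # Battle for the Net comments via ECFS and email

# Total sent by Battle for the Net, and how far it exceeds the FCC's figure.
bftn_total = bftn_csv + bftn_ecfs_email
excess_over_fcc = bftn_total - fcc_reported_total

print(f"Battle for the Net submissions: {bftn_total:,}")
print(f"Exceeds the FCC's reported total by: {excess_over_fcc:,}")
```

With these inputs the two submission channels sum to 998,549, which is 273,380 more than the total the FCC reported for every commenter combined.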
Given that numerous other individuals and organizations were submitting net neutrality comments during the same period, at best the FCC is severely underreporting the number of comments sent by pro-net neutrality activists.
To verify this, we downloaded and analyzed the data dump of all comments received by the FCC during the second commenting period, and compared our data to the FCC’s. Please note that we have thus far only analyzed the 527,953 comments sent via CSV, and we are still processing reports on the data submitted by ECFS and email.
Total number of comments we submitted via CSV: 527,953
Almost all of these submissions used an open letter by Senator Angus King, with each participant signing on. As a sanity check, we searched our CSV data for two phrases from the letter:
Number of occurrences of phrase: ‘These principles of fairness and openness’ in our CSV comments: 525,189 (this number may be lower than actual due to aggressive deduplication)
Number of occurrences of phrase: ‘We are writing to urge you to implement’ in our CSV comments: 525,189 (this number may be lower than actual due to aggressive deduplication)
Next, we scanned the data from the dump of FCC’s ECFS comments from the second commenting period.
Number of occurrences of phrase: ‘These principles of fairness and openness’ in FCC’s data dump of ECFS comments: 374,421
Number of occurrences of phrase: ‘We are writing to urge you to implement’ in FCC’s data dump of ECFS comments: 374,391
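A phrase count of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the script we actually ran: the column name `comment`, the function name, and the three sample rows are made up for the example, and it counts rows that contain a phrase rather than every repeated occurrence within a row.

```python
import csv
import io

# The two phrases from the open letter that we searched for.
PHRASES = (
    "These principles of fairness and openness",
    "We are writing to urge you to implement",
)

def count_rows_with_phrases(csv_file, text_column="comment"):
    """Return, for each phrase, how many rows contain it in text_column."""
    counts = {phrase: 0 for phrase in PHRASES}
    for row in csv.DictReader(csv_file):
        text = row.get(text_column) or ""
        for phrase in PHRASES:
            if phrase in text:
                counts[phrase] += 1
    return counts

# Tiny made-up sample standing in for a real CSV export.
sample = io.StringIO(
    "name,comment\n"
    'A,"We are writing to urge you to implement strong rules. '
    'These principles of fairness and openness matter."\n'
    'B,"I have a different view."\n'
    'C,"We are writing to urge you to implement strong rules."\n'
)

counts = count_rows_with_phrases(sample)
for phrase, n in counts.items():
    print(f"{phrase!r}: {n}")
```

Running the same function over both our own CSV files and the FCC's data dump would give directly comparable counts, assuming both expose the comment text in a single column.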
We identified 525,189 CSV comments, and found that the FCC recorded at most 374,421 of them. From this basic analysis alone, it is clear that, at best, the FCC missed a huge number of the comments we submitted via CSV. But we also sent 470,596 more comments via email and through the FCC’s ECFS site (before it broke under the load we put on it). Initial results indicate that a large number of these comments submitted through email and ECFS were also not recorded by the FCC, but we are still generating reports to more precisely quantify those numbers.
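The size of the gap follows directly from the phrase counts. The short Python sketch below, with variable names of our own choosing, subtracts each FCC phrase count from the count in our CSV data:

```python
# Phrase matches in our own CSV comments (after deduplication).
submitted_matches = 525_189

# Phrase matches found in the FCC's data dump of ECFS comments.
fcc_matches = {
    "These principles of fairness and openness": 374_421,
    "We are writing to urge you to implement": 374_391,
}

# Comments unaccounted for, depending on which phrase is used as the marker.
missing_by_phrase = {
    phrase: submitted_matches - found for phrase, found in fcc_matches.items()
}

for phrase, missing in missing_by_phrase.items():
    print(f"{phrase!r}: {missing:,} comments unaccounted for")
```

Depending on which phrase is used, the gap comes to 150,768 or 150,798 comments; the latter matches the 150,798 figure cited earlier in this release.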
We are running a more thorough analysis of the data to identify all the individuals whose comments were not recorded by the FCC, but crunching through all of this data will take several hours.
Sunlight’s methodology was not "reasonably representative".
Sunlight was unable to parse all of the data released by the FCC. According to the FCC, there were 2.4 million comments in the data, but Sunlight was only able to read 1.6 million comments. They chose to base their conclusions on a subset of the data that may not be representative of the big picture. According to Sunlight’s own admission:
"Clearly, 1.67 million documents is far short of 2.5 million (the number reported in the commission’s blog post). We spent enough time with these files that we’re reasonably sure that the FCC’s comment counts are incorrect and that our analysis is reasonably representative of what’s there, but the fact that it’s impossible for us to know for sure is problematic."
Sunlight also significantly under-reported the number of comments that came from Battle for the Net commenters, estimating this at 271,608. When we pointed out how easily we identified at least 367,460 of our own comments in the data, they acknowledged their error. However, this margin alone could have been enough to tip their conclusions in favor of net neutrality activists.
Furthermore, the FCC confirmed that people who signed petitions would be counted as individual commenters. Many net neutrality activist organizations submitted their petition signatures as PDFs attached to single ECFS filings. Sunlight was unable to parse these PDFs and chose to simply exclude them from its sample pool, ignoring perhaps hundreds of thousands of pro-net neutrality comments. On the other hand, Sunlight was able to easily read all of American Commitment’s comments, further distorting its results in favor of anti-net neutrality commenters.
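To see why excluding a non-random portion of the data can flip a conclusion, consider the toy example below. Every number in it is hypothetical and chosen only for illustration; none of it comes from the FCC's or Sunlight's data. It assumes a comment pool split across two submission channels, only one of which the analyst can parse.

```python
# HYPOTHETICAL counts, for illustration only (not real FCC or Sunlight data).
channels = {
    "parsable":   {"pro": 700, "anti": 900},  # the part the analyst can read
    "unparsable": {"pro": 700, "anti": 100},  # the part that gets excluded
}

def pro_share(groups):
    """Fraction of pro comments across the given channel groups."""
    groups = list(groups)
    pro = sum(g["pro"] for g in groups)
    total = sum(g["pro"] + g["anti"] for g in groups)
    return pro / total

sample_share = pro_share([channels["parsable"]])  # what the analyst sees
full_share = pro_share(channels.values())         # what was actually sent

print(f"Pro share in the parsable subset: {sample_share:.1%}")
print(f"Pro share in the full pool:       {full_share:.1%}")
```

In this made-up pool the readable subset shows a minority of pro comments while the full pool shows a majority, which is the kind of distortion that arises when the excluded comments do not share the distribution of the ones that were analyzed.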
Sunlight applied a flawed sampling methodology to a flawed set of data, and drew conclusions that are impossible to make with any "reasonably representative" certainty.