Filtering software companies, given their limited resources, do not attempt to index or classify all of the billions of pages that exist on the Web. Instead, the set of pages that they attempt to examine and classify is restricted to a small portion of the Web. The companies use a variety of automated and manual methods to identify a universe of Web sites and pages to "harvest" for classification. These methods include: entering certain key words into search engines; following links from a variety of online directories (e.g., generalized directories like Yahoo or various specialized directories, such as those that provide links to sexually explicit content); reviewing lists of newly-registered domain names; buying or licensing lists of URLs from third parties; "mining" access logs maintained by their customers; and reviewing other submissions from customers and the public. The goal of each of these methods is to identify as many URLs as possible that are likely to contain content that falls within the filtering companies' category definitions.
The first method, entering certain keywords into commercial search engines, suffers from several limitations. First, the Web pages that may be "harvested" through this method are limited to those pages that search engines have already identified. However, as noted above, a substantial portion of the Web is not even theoretically indexable (because it is not linked to by any previously known page), and only approximately 50% of the pages that are theoretically indexable have actually been indexed by search engines. We are satisfied that the remainder of the indexable Web, and the vast "Deep Web," which cannot currently be indexed, include materials that meet CIPA's categories of visual depictions that are obscene, child pornography, and harmful to minors. These portions of the Web cannot presently be harvested through the methods that filtering software companies use (except through reporting by customers or by observing users' log files), because they are not linked to other known pages. A user can, however, gain access to a Web site in the unindexed Web or the Deep Web if the Web site's proprietor or some other third party informs the user of the site's URL. Some Web sites, for example, send out mass email advertisements containing the site's URL through the spamming process we have described above. Second, the search engines that software companies use for harvesting are able to search text only, not images. This is of critical importance, because CIPA, by its own terms, covers only "visual depictions." 20 U.S.C. Sec. 9134(f)(1)(A)(i); 47 U.S.C. Sec. 254(h)(5)(B)(i). Image recognition technology is immature, ineffective, and unlikely to improve substantially in the near future. None of the filtering software companies deposed in this case employs image recognition technology when harvesting or categorizing URLs. Because of this reliance on automated text analysis and the absence of image recognition technology, a Web page with sexually explicit images and no text cannot be harvested using a search engine. The problem is compounded by the fact that Web site publishers may use image files rather than text to represent words, i.e., they may use a file that computers understand to be a picture, such as a photograph of a printed word, rather than regular text, making automated review of their textual content impossible. For example, if the Playboy Web site displays its name using a logo rather than regular text, a search engine would not see or recognize the Playboy name in that logo.
In addition to collecting URLs through search engines and Web directories (particularly those specializing in sexually explicit sites or other categories relevant to one of the filtering companies' category definitions), and by mining user logs and collecting URLs submitted by users, the filtering companies expand their list of harvested URLs by using "spidering" software that can "crawl" the lists of pages produced by the previous four methods, following their links downward to bring back the pages to which they link (and the pages to which those pages link, and so on, but usually down only a few levels). This spidering software uses the same type of technology that commercial Web search engines use. While useful in expanding the number of relevant URLs, the ability to retrieve additional pages through this approach is limited by an architectural feature of the Web: page-to-page links tend to converge rather than diverge. That means that the more pages from which one spiders downward through links, the smaller the proportion of new sites one will uncover; if spidering the links of 1000 sites retrieved through a search engine or Web directory turns up 500 additional distinct adult sites, spidering an additional 1000 sites may turn up, for example, only 250 additional distinct sites, and the proportion of new sites uncovered will continue to diminish as more pages are spidered. These limitations on the technology used to harvest a set of URLs for review will necessarily lead to substantial underblocking of material with respect to both the category definitions employed by filtering software companies and CIPA's definitions of visual depictions that are obscene, child pornography, or harmful to minors.
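The mechanics of this kind of link-following can be shown in a short sketch. The fragment below is offered for illustration only: it is written in Python, starts from a placeholder seed URL, and follows links breadth-first down a fixed number of levels; it does not reproduce any vendor's actual crawler.

```python
# Illustrative sketch only: a minimal breadth-limited "spider" of the kind
# described above, which follows links from seed pages down a fixed number of
# levels. The seed URL is a placeholder, not a site from the record.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href targets of anchor tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def spider(seed_urls, max_depth=2, max_pages=100):
    """Breadth-first crawl from seed_urls, at most max_depth levels deep."""
    seen = set(seed_urls)
    queue = deque((url, 0) for url in seed_urls)
    harvested = []
    while queue and len(harvested) < max_pages:
        url, depth = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except Exception:
            continue  # unreachable pages are simply skipped
        harvested.append(url)
        if depth >= max_depth:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return harvested


if __name__ == "__main__":
    # Placeholder seed; a vendor would start from search-engine and directory results.
    print(spider(["https://example.com"], max_depth=1))
```

Because such a crawler revisits nothing it has already seen, the diminishing returns described above appear quickly: each additional seed contributes progressively fewer URLs that are not already in the "seen" set.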
2. The "Winnowing" or Categorization Phase
Once the URLs have been harvested, some filtering software companies use automated key word analysis tools to evaluate the content and/or features of Web sites or pages accessed via a particular URL and to tentatively prioritize or categorize them. This process may be characterized as "winnowing" the harvested URLs. The automated systems currently used by filtering software vendors to prioritize and to categorize, or tentatively categorize, the content and/or features of a Web site or page accessed via a particular URL operate by means of (1) simple key word searching, and (2) statistical algorithms that rely on the frequency and structure of various linguistic features in a Web page's text. The automated systems used to categorize pages do not include image recognition technology. All of the filtering companies deposed in this case also employ human review of some or all collected Web pages at some point during the process of categorizing Web pages. As with the harvesting process, each technique employed in the winnowing process is subject to limitations that can result in both overblocking and underblocking.
First, simple key-word-based filters are subject to the obvious limitation that no string of words can identify all sites that contain sexually explicit content, and most strings of words are likely to appear in Web sites that are not properly classified as containing sexually explicit content. As noted above, filtering software companies also use more sophisticated automated classification systems for the statistical classification of texts. These systems assign weights to words or other textual features and use algorithms to determine whether a text belongs to a certain category. These algorithms sometimes make reference to the position of a word within a text or its relative proximity to other words. The weights are usually determined by machine learning methods (often described as "artificial intelligence"). In this procedure, which resembles an automated form of trial and error, a system is given a "training set" consisting of documents preclassified into two or more groups, along with a set of features that might be useful in classifying the documents. The system then "learns" rules that assign weights to those features according to how well they work in classification, and assigns each new document to a category with a certain probability. Notwithstanding their "artificial intelligence" description, automated text classification systems are unable to grasp many distinctions between types of content that would be obvious to a human. And of critical importance, no presently conceivable technology can make the judgments necessary to determine whether a visual depiction fits the legal definitions of obscenity, child pornography, or harmful to minors. Finally, all the filtering software companies deposed in this case use some form of human review in their process of winnowing and categorizing Web pages, although one company admitted to categorizing some Web pages without any human review. SmartFilter states that "the final categorization of every Web site is done by a human reviewer." Another filtering company asserts that of the 10,000 to 30,000 Web pages that enter the "work queue" to be categorized each day, two to three percent are automatically categorized by its PornByRef system (which applies only to materials classified in the pornography category), and the remainder are categorized by human review. SurfControl also states that no URL is ever added to its database without human review.
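The general approach described above, learning word weights from a pre-classified training set and then scoring new documents, can be illustrated with a toy example. The sketch below is a minimal Naive Bayes text classifier written for illustration only; the four training phrases are invented placeholders, and no filtering company's actual system, categories, or data are represented.

```python
# Illustrative sketch only: a toy bag-of-words Naive Bayes classifier of the
# general kind described above (weights learned from a pre-classified
# "training set"). The training examples are invented placeholders.
import math
from collections import Counter, defaultdict

TRAINING_SET = [
    ("explicit adult content free pics", "blocked"),
    ("hot adult videos join now", "blocked"),
    ("breast cancer screening guidelines", "allowed"),
    ("city council meeting agenda", "allowed"),
]


def train(examples):
    """Learn per-category word counts (the 'weights') from labeled text."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    vocabulary = set()
    for text, category in examples:
        category_counts[category] += 1
        for word in text.split():
            word_counts[category][word] += 1
            vocabulary.add(word)
    return word_counts, category_counts, vocabulary


def classify(text, word_counts, category_counts, vocabulary):
    """Score each category for the text and return the most probable one."""
    total_docs = sum(category_counts.values())
    scores = {}
    for category, doc_count in category_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.split():
            # Laplace smoothing so unseen words do not zero out the score.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[category] = score
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    model = train(TRAINING_SET)
    print(classify("free adult pics", *model))              # likely "blocked"
    print(classify("school board meeting notes", *model))   # likely "allowed"
```

Even in this toy form, the limitation noted above is visible: the classifier sees only word statistics, so a page whose sexually explicit content is conveyed entirely in images, or a medical page that happens to share vocabulary with pornographic sites, can easily be scored on the wrong side of the line.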
Human review of Web pages has the advantage of allowing more nuanced, if not more accurate, interpretations than automated classification systems are capable of making, but it suffers from its own sources of error. The filtering software companies involved here have limited staffs, of between eight and a few dozen people, available for hand-reviewing Web pages. The reviewers employed by these companies base their categorization decisions on both the text and the visual depictions that appear on the sites or pages they are assigned to review. Human reviewers generally focus on English-language Web sites and are generally not required to be multi-lingual. Given the speed at which human reviewers must work to keep up with even a fraction of the approximately 1.5 million pages added to the publicly indexable Web each day, human error is inevitable. Errors are likely to result from boredom or lack of attentiveness, overzealousness, or a desire to "err on the side of caution" by screening out material that might be offensive to some customers, even if it does not fit within any of the company's category definitions. None of the filtering companies trains its reviewers in the legal definitions concerning what is obscene, child pornography, or harmful to minors, and none instructs reviewers to take community standards into account when making categorization decisions.
Perhaps because of limitations on the number of human reviewers and because of the large number of new pages that are added to the Web every day, filtering companies also widely engage in the practice of categorizing entire Web sites at the "root URL," rather than engaging in a more fine-grained analysis of the individual pages within a Web site. For example, the filtering software companies deposed in this case all categorize the entire Playboy Web site as Adult, Sexually Explicit, or Pornography. They do not differentiate between pages within the site that contain sexually explicit images or text and pages that contain no sexually explicit content, such as the text of interviews with celebrities or politicians. If the "root" or "top-level" URL of a Web site is given a category tag, then access to all content on that Web site will be blocked if the assigned category is enabled by a customer. In some cases, whole Web sites are blocked because the filtering companies focus only on the content of the home page that is accessed by entering the root URL. Entire Web sites containing multiple Web pages are commonly categorized without human review of each individual page on that site. Web sites that may contain multiple Web pages and that require authentication or payment for access are commonly categorized based solely on a human reviewer's evaluation of the pages that may be viewed prior to reaching the authentication or payment page.
Because there may be hundreds or thousands of pages under a root URL, filtering companies make it their primary mission to categorize the root URL, and categorize subsidiary pages if the need arises or if there is time. This form of overblocking is called "inheritance," because lower-level pages inherit the categorization of the root URL without regard to their specific content. In some cases, "reverse inheritance" also occurs, i.e., parent sites inherit the classification of pages in a lower level of the site. This might happen when pages with sexual content appear in a Web site that is devoted primarily to non-sexual content. For example, N2H2's Bess filtering product classifies every page in the Salon.com Web site, which contains a wide range of news and cultural commentary, as "Sex, Profanity," based on the fact that the site includes a regular column that deals with sexual issues. Blocking by domain name and by IP address is another practice in which filtering companies engage; it is a function both of the architecture of the Web and of the exigencies of dealing with the rapidly expanding number of Web pages. The category lists maintained by filtering software companies can include URLs in their human-readable domain name form, their numeric IP address form, or both. Through "virtual hosting" services, hundreds of thousands of Web sites with distinct domain names may share a single numeric IP address. To the extent that filtering companies block the IP addresses of virtual hosting services, they will necessarily block a substantial amount of content without reviewing it, and will likely overblock a substantial amount of content.
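The consequence of blocking at the IP level can be illustrated with a short sketch. The fragment below, offered for illustration only and using placeholder domain names, simply groups domains by the address to which they resolve; any block applied at the IP level would reach every domain in a group, whether or not its content was ever reviewed.

```python
# Illustrative sketch only: grouping domain names by the IP address they resolve
# to, to show how an IP-level block sweeps in every virtually hosted site that
# shares that address. The domain names below are placeholders, not sites named
# in the record.
import socket
from collections import defaultdict

DOMAINS = ["example.com", "example.net", "example.org"]


def group_by_ip(domains):
    """Map each resolvable domain to its IPv4 address and group shared addresses."""
    by_ip = defaultdict(list)
    for domain in domains:
        try:
            ip = socket.gethostbyname(domain)
        except socket.gaierror:
            continue  # skip domains that do not resolve
        by_ip[ip].append(domain)
    return by_ip


if __name__ == "__main__":
    for ip, names in group_by_ip(DOMAINS).items():
        # Blocking this single IP address would block every domain listed for it.
        print(ip, names)
```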
Another technique that filtering companies use in order to deal with a structural feature of the Internet is blocking the root-level URLs of so-called "loophole" Web sites. These are Web sites that provide access to a particular Web page, but display in the user's browser a URL that is different from the URL with which the particular page is usually associated. Because of this feature, they provide a "loophole" that can be used to get around filtering software, i.e., they display a URL that is different from the one that appears on the filtering company's control list. "Loophole" Web sites include caches of Web pages that have been removed from their original location, "anonymizer" sites, and translation sites. Caches are archived copies that some search engines, such as Google, keep of the Web pages they index. The cached copy stored by Google will have a URL that is different from the original URL. Because Web sites often change rapidly, caches are the only way to access pages that have been taken down or revised, or whose URLs have changed. For example, a magazine might place its current stories under a given URL, and replace them monthly with new stories. If a user wanted to find an article published six months ago, he or she would be unable to access it were it not for Google's cached version.
Some sites on the Web serve as a proxy or intermediary between a user and another Web page. When using a proxy server, a user does not access the page from its original URL, but rather from the URL of the proxy server. One type of proxy service is an "anonymizer." Users may access Web sites indirectly via an anonymizer when they do not want the Web site they are visiting to be able to determine the IP address from which they are accessing the site, or to leave "cookies" on their browser. Other proxy servers can be used to attempt to translate Web page content from one language to another. Rather than directly accessing the original Web page in its original language, users can instead indirectly access the page via a proxy server offering translation features. As noted above, filtering companies often block loophole sites, such as caches, anonymizers, and translation sites. The practice of blocking loophole sites necessarily results in a significant amount of overblocking, because the vast majority of the pages that are cached, for example, do not contain content that would match a filtering company's category definitions. Filters that do not block these loophole sites, however, may enable users to access any URL on the Web via the loophole site, thus resulting in substantial underblocking.
3. The Process for "Re-Reviewing" Web Pages After Their Initial Categorization
Most filtering software companies do not engage in subsequent reviews of categorized sites or pages on a scheduled basis. Priority is placed on reviewing and categorizing new sites and pages, rather than on re-reviewing already categorized sites and pages. Typically, a filtering software vendor's previous categorization of a Web site is not re-reviewed for accuracy when new pages are added to the Web site. To the extent the Web site was previously categorized as a whole, the new pages added to the site usually share the categorization assigned by the blocking product vendor. This necessarily results in both over- and underblocking, because, as noted above, the content of Web pages and Web sites changes relatively rapidly.
In addition to the content on Web sites or pages changing rapidly, Web sites themselves may disappear and be replaced by sites with entirely different content. If an IP address associated with a particular Web site is blocked under a particular category and the Web site goes out of existence, then the IP address likely would be reassigned to a different Web site, either by an Internet service provider or by a registration organization, such as the American Registry for Internet Numbers, see http://www.arin.net. In that case, the site that received the reassigned IP address would likely be miscategorized. Because filtering companies do not engage in systematic re-review of their category lists, such a site would likely remain miscategorized unless someone submitted it to the filtering company for re-review, increasing the incidence of over- and underblocking. This failure to re-review Web pages primarily increases a filtering company's rate of overblocking. However, if a filtering company does not re-review Web pages after it determines that they do not fall into any of its blocking categories, then that would result in underblocking (because, for example, a page might add sexually explicit content).
3. The Inherent Tradeoff Between Overblocking and Underblocking
There is an inherent tradeoff between any filter's rate of overblocking and its rate of underblocking, a tradeoff that information scientists describe in terms of "precision" and "recall": a filter's rate of overblocking is the complement of its precision, and its rate of underblocking is the complement of its recall. Precision, which determines the rate of overblocking, is the proportion of the items that a classification system assigns to a certain category that are appropriately classified. The plaintiffs' expert, Dr. Nunberg, provided the hypothetical example of a classification system that is asked to pick out pictures of dogs from a database consisting of 1000 pictures of animals. If it returned 100 hits, of which 80 were in fact pictures of dogs, and the remaining 20 were pictures of cats, horses, and deer, we would say that the system identified dog pictures with a precision of 80%. This would be analogous to a filter that overblocked at a rate of 20%. The recall measure involves determining what proportion of the actual members of a category the classification system has been able to identify; a low recall corresponds to a high rate of underblocking. For example, if the hypothetical animal-picture database contained a total of 200 pictures of dogs, and the system identified 80 of them and failed to identify 120, it would have performed with a recall of 40%. This would be analogous to a filter that underblocked 60% of the material in a category. In automated classification systems, there is always a tradeoff between precision and recall. In the animal-picture example, the recall could be improved by using a looser criterion to identify the dog pictures, such as treating any animal with four legs as a dog; all the dogs would then be identified, but cats and other animals would also be included, with a resulting loss of precision. The same tradeoff exists between rates of overblocking and underblocking in filtering systems that use automated classification systems. For example, an automated system that classifies any Web page that contains the word "sex" as sexually explicit will underblock much less, but overblock much more, than a system that classifies any Web page containing the phrase "free pictures of people having sex" as sexually explicit.
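These definitions can be stated compactly. Using the figures in Dr. Nunberg's example (100 pictures returned, 80 of them actually dogs, out of 200 dog pictures in the database), the standard formulas reproduce the percentages quoted above; the notation below is merely a restatement of that example, not additional evidence.

```latex
\begin{align*}
\text{precision} &= \frac{\text{correctly identified}}{\text{all items identified}}
                  = \frac{80}{100} = 80\%,
 & \text{overblocking rate} &= 1 - \text{precision} = 20\%,\\
\text{recall}    &= \frac{\text{correctly identified}}{\text{all items in the category}}
                  = \frac{80}{200} = 40\%,
 & \text{underblocking rate} &= 1 - \text{recall} = 60\%.
\end{align*}
```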
This tradeoff between overblocking and underblocking applies not just to automated classification systems, but also to filters that use only human review. Given the approximately two billion pages that exist on the Web, the 1.5 million new pages that are added daily, and the rate at which content on existing pages changes, if a filtering company blocks only those Web pages that have been reviewed by humans, it will be impossible, as a practical matter, to avoid vast amounts of underblocking. Techniques used by human reviewers, such as blocking at the IP address level, domain name level, or directory level, reduce the rates of underblocking, but necessarily increase the rates of overblocking, as discussed above. To use a simple example, it would be easy to design a filter intended to block sexually explicit speech that completely avoids overblocking. Such a filter would have only a single sexually explicit Web site on its control list, which could be re-reviewed daily to ensure that its content does not change. While there would be no overblocking problem with such a filter, it would have a severe underblocking problem, as it would fail to block all the sexually explicit speech on the Web other than the one site on its control list. Similarly, it would also be easy to design a filter intended to block sexually explicit speech that completely avoids underblocking. Such a filter would operate by permitting users to view only a single Web site, e.g., the Sesame Street Web site. While there would be no underblocking problem with such a filter, it would have a severe overblocking problem, as it would block access to millions of non-sexually explicit sites on the Web other than the Sesame Street site.
While it is thus quite simple to design a filter that does not overblock, and equally simple to design a filter that does not underblock, it is currently impossible, given the Internet's size, rate of growth, rate of change, and architecture, and given the state of the art of automated classification systems, to develop a filter that neither underblocks nor overblocks a substantial amount of speech. The more effective a filter is at blocking Web sites in a given category, the more the filter will necessarily overblock. Any filter that is reasonably effective in preventing users from accessing sexually explicit content on the Web will necessarily block substantial amounts of non-sexually explicit speech.
4. Attempts to Quantify Filtering Programs' Rates of Over- and Underblocking
The government presented three studies that attempt to quantify the over- and underblocking rates of five different filtering programs: two were prepared by expert witnesses, and one by a librarian fact witness who conducted a study using Internet use logs from his own library. The plaintiffs presented one expert witness who attempted to quantify the rates of over- and underblocking for various programs. Each of these attempts to quantify rates of over- and underblocking suffers from various methodological flaws.
The fundamental problem with calculating over- and underblocking rates is selecting a universe of Web sites or Web pages to serve as the set to be tested. The studies that the parties submitted in this case took two different approaches to this problem. Two of the studies, one prepared by the plaintiffs' expert witness Chris Hunter, a graduate student at the University of Pennsylvania, and the other prepared by the defendants' expert, Chris Lemmons of eTesting Laboratories, in Research Triangle Park, North Carolina, approached this problem by compiling two separate lists of Web sites: one of URLs that they deemed should be blocked according to the filters' criteria, and another of URLs that they deemed should not be blocked according to the filters' criteria. They compiled these lists by choosing Web sites from the results of certain key word searches. The problem with this selection method is that it is neither random nor necessarily representative of the universe of Web pages that library patrons visit.
The two other studies, one by David Biek, head librarian at the Tacoma Public Library's main branch, and one by Cory Finnell of Certus Consulting Group, of Seattle, Washington, chose actual logs of Web pages visited by library patrons during specific time periods as the universe of Web pages to analyze. This method, while surely not as accurate as a truly random sample of the indexed Web would be (assuming it would be possible to take such a sample), has the virtue of using the actual Web sites that library patrons visited during a specific period. Because library patrons selected the universe of Web sites that Biek's and Finnell's studies analyzed, this method removes the possibility of bias resulting from the study authors' selection of the universe of sites to be reviewed. We find that the Lemmons and Hunter studies are of little probative value because of the methodology used to select the sample universe of Web sites to be tested. We will therefore focus on the studies conducted by Finnell and Biek in trying to ascertain estimates of the rates of over- and underblocking that take place when filters are used in public libraries. The government hired expert witness Cory Finnell to study the Internet logs compiled by the public library systems in Tacoma, Washington; Westerville, Ohio; and Greenville, South Carolina. Each of these libraries uses filtering software that keeps a log of information about individual Web site requests made by library patrons. Finnell, whose consulting firm specializes in data analysis, has substantial experience evaluating Internet access logs generated on networked systems. He spent more than a year developing a reporting tool for N2H2, and, in the course of that work, acquired a familiarity with the design and operation of Internet filtering products.
The Tacoma library uses Cyber Patrol filtering software, and logs information only on sites that were blocked. Finnell worked from a list of all sites that were blocked in the Tacoma public library in the month of August 2001. The Westerville library uses the Websense filtering product, and logs information on both blocked sites and non-blocked sites. When the logs reach a certain size, they are overwritten by new usage logs. Because of this overwriting feature, logs were available to Finnell only for the relatively short period from October 1, 2001 to October 3, 2001. The Greenville library uses N2H2's filtering product and logs both blocked sites and sites that patrons accessed. The logs contain more than 500,000 records per day. Because of the volume of the records, Finnell restricted his analysis to the period from August 2, 2001 to August 15, 2001.
Finnell calculated an overblocking rate for each of the three libraries by examining the host Web site containing each of the blocked pages. He did not employ a sampling technique, but instead examined each blocked Web site. If the contents of a host Web site or the pages within the Web site were consistent with the filtering product's definition of the category under which the site was blocked, Finnell considered it to be an accurate block. Finnell and three others, two of whom were temporary employees, examined the Web sites to determine whether they were consistent with the filtering companies' category definitions. Their review was, of course, necessarily limited by: (1) the clarity of the filtering companies' category definitions; (2) Finnell's and his employees' interpretations of the definitions; and (3) human error. The study's reliability is also undercut by the fact that Finnell failed to archive the blocked Web pages as they existed either at the point that a patron in one of the three libraries was denied access or when Finnell and his team reviewed the pages. It is therefore impossible for anyone to check the accuracy and consistency of Finnell's review team, or to know whether the pages contained the same content when the block occurred as they did when Finnell's team reviewed them. This is a key flaw, because the results of the study depend on individual determinations as to overblocking and underblocking, in which Finnell and his team were required to compare what they saw on the Web pages that they reviewed with standard definitions provided by the filtering company.
Tacoma library's Cyber Patrol software blocked 836 unique Web sites during the month of August. Finnell determined that 783 of those blocks were accurate and that 53 were inaccurate. The error rate for Cyber Patrol was therefore estimated to be 6.34%, and the true error rate was estimated with 95% confidence to lie within the range of 4.69% to 7.99%. Finnell and his team reviewed 185 unique Web sites that were blocked by Westerville Library's Websense filter during the logged period and determined that 158 of them were accurate and that 27 of them were inaccurate. He therefore estimated the Websense filter's overblocking rate at 14.59%, with a 95% confidence interval of 9.51% to 19.68%. Additionally, of the 1,674 unique Web sites that were blocked by the Greenville Library's N2H2 filter during the relevant period, Finnell was able to evaluate 1,607; he determined that 1,520 were accurate and that 87 were inaccurate. This yields an estimated overblocking rate of 5.41% and a 95% confidence interval of 4.33% to 6.55%. Finnell's methodology was materially flawed in that it understates the rate of overblocking, for the following reasons. First, patrons from the three libraries knew that the filters were operating, and may have been deterred from attempting to access Web sites that they perceived to be "borderline" sites, i.e., those that may or may not have been appropriately filtered according to the filtering companies' category definitions. Second, in their cross-examination of Finnell, the plaintiffs offered screen shots of a number of Web sites that, according to Finnell, had been appropriately blocked, but that Finnell admitted contained only benign materials. Finnell's explanation was that the Web sites must have changed between the time when he conducted the study and the time of the trial, but because he did not archive the images as they existed when his team reviewed them for the study, there is no way to verify this. Third, because of the way in which Finnell counted blocked Web sites (if separate patrons attempted to reach the same Web site, or one or more patrons attempted to access more than one page on a single Web site, Finnell counted these attempts as a single block, see supra note 10), his results necessarily understate the number of times that patrons were erroneously denied access to information.
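Finnell's reported intervals are consistent with the standard normal approximation for a binomial proportion; assuming that approximation was the method used (the study itself does not say), the Tacoma figures (53 inaccurate blocks out of 836) reproduce the interval quoted above:

```latex
\begin{align*}
\hat{p} &= \frac{53}{836} \approx 6.34\%, \\
\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{836}}
        &\approx 6.34\% \pm 1.65\% \;=\; 4.69\% \text{ to } 7.99\%.
\end{align*}
```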
At all events, there is no doubt that Finnell's estimated rates of overblocking, which are based on the filtering companies' own category definitions, significantly understate the rate of overblocking with respect to CIPA's category definitions for filtering for adults. The filters used in the Tacoma, Westerville, and Greenville libraries were configured to block, among other things, images of full nudity and sexually explicit materials. There is no dispute, however, that these categories are far broader than CIPA's categories of visual depictions that are obscene, or child pornography, the two categories of material that libraries subject to CIPA must certify that they filter during adults' use of the Internet. Finnell's study also calculated underblocking rates with respect to the Westerville and Greenville Libraries (both of which logged not only their blocked sites, but all sites visited by their patrons), by taking random samples of URLs from the list of sites that were not blocked. The study used a sample of 159 sites that were accessed by Westerville patrons and determined that only one of them should have been blocked under the software's category definitions, yielding an underblocking rate of 0.6%. Given the size of the sample, the 95% confidence interval is 0% to 1.86%. The study examined a sample of 254 Web sites accessed by patrons in Greenville and found that three of them should have been blocked under the filtering software's category definitions. This results in an estimated underblocking rate of 1.2% with a 95% confidence interval ranging from 0% to 2.51%.
We do not credit Finnell's estimates of the rates of underblocking in the Westerville and Greenville public libraries for several reasons. First, Finnell's estimates likely understate the actual rate of underblocking because patrons, who knew that filtering programs were operating in the Greenville and Westerville Libraries, may have refrained from attempting to access sites with sexually explicit materials or other content that they knew would probably fall within a filtering program's blocked categories. Second, and most importantly, we think that the formula that Finnell used to calculate the rate of underblocking in these two libraries is not as meaningful as the formula that information scientists typically use to calculate a rate of recall, which we describe above in Subsection II.E.3. As Dr. Nunberg explained, the standard method that information scientists use to calculate a rate of recall is to sort a set of items into two groups, those that fall into a particular category (e.g., those that should have been blocked by a filter) and those that do not. The rate of recall is then calculated by dividing the number of items that the system correctly identified as belonging to the category by the total number of items in the category.
In the example above, we discussed a database that contained 1000 photographs of animals. Assume that 200 of these photographs were pictures of dogs. If, for example, a classification system designed to identify pictures of dogs identified 80 of the dog pictures and failed to identify 120, it would have performed with a recall rate of 40%. This would be analogous to a filter that underblocked at a rate of 60%. To calculate the recall rate of the filters in the Westerville and Greenville public libraries in accordance with the standard method described above, Finnell should have taken a sample of sites from the libraries' Internet use logs (including both sites that were blocked and sites that were not), and divided the number of sites in the sample that the filter incorrectly failed to block by the total number of sites in the sample that should have been blocked. What Finnell did instead was to take a sample of sites that were not blocked, and divide the number of sites in that sample that should have been blocked by the total number of sites in the sample. This made the denominator that Finnell used much larger than it would have been had he used the standard method for calculating recall, and consequently made the underblocking rate that he calculated much lower than it would have been under the standard method.
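The difference between the two calculations can be made concrete with a purely hypothetical sample; the numbers below are invented for illustration and are not drawn from the Westerville or Greenville logs. Suppose a sample of 200 sites drawn from a usage log contains 10 sites that should have been blocked, of which the filter caught 7 and missed 3. The standard method and Finnell's method then yield very different figures:

```latex
\begin{align*}
\text{standard underblocking rate} &= \frac{\text{sites missed}}{\text{sites that should have been blocked}}
                                    = \frac{3}{10} = 30\%,\\
\text{Finnell's rate} &= \frac{\text{sites missed}}{\text{all unblocked sites sampled}}
                       = \frac{3}{193} \approx 1.6\%.
\end{align*}
```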
Moreover, despite the relatively low rates of underblocking that Finnell's study found, librarians from several of the libraries proffered by defendants that use blocking products, including Greenville, Tacoma, and Westerville, testified that there are instances of underblocking in their libraries. No quantitative evidence was presented comparing the effectiveness of filters and other alternative methods used by libraries to prevent patrons from accessing visual depictions that are obscene, child pornography, or in the case of minors, harmful to minors. Biek undertook a similar study of the overblocking rates that result from the Tacoma Library's use of the Cyber Patrol software. He began with the 3,733 individual blocks that occurred in the Tacoma Library in October 2000 and drew from this data set a random sample of 786 URLs. He calculated two rates of overblocking: one with respect to the Tacoma Library's policy on Internet use (which provides that the pictorial content of a site may not include "graphic materials depicting full nudity and sexual acts which are portrayed obviously and exclusively for sensational or pornographic purposes"), and the other with respect to Cyber Patrol's own category definitions. He estimated that Cyber Patrol overblocked 4% of all Web pages in October 2000 with respect to the definitions of the Tacoma Library's Internet Policy and 2% of all pages with respect to Cyber Patrol's own category definitions.
It is difficult to determine how reliable Biek's conclusions are, because he did not keep records of the raw data that he used in his study; nor did he archive images of the Web pages as they looked when he made the determination whether they were properly classified by the Cyber Patrol program. Without this information, it is impossible to verify his conclusions (or to undermine them). And Biek's study certainly understates Cyber Patrol's overblocking rate for some of the same reasons that Finnell's study likely understates the true rates of overblocking in the libraries that he studied. We also note that Finnell's study, which analyzed a set of Internet logs from the Tacoma Library during a period when the same filtering program was operating with the same set of blocking categories enabled, found a significantly higher rate of overblocking than the Biek study did. Biek found a rate of overblocking of approximately 2%, while the Finnell study estimated a 6.34% rate of overblocking. At all events, the category definitions employed by CIPA, at least with respect to adult use (visual depictions that are obscene or child pornography), are narrower than the materials prohibited by the Tacoma Library policy, and therefore Biek's study understates the rate of overblocking with respect to CIPA's definitions for adults. In sum, we think that Finnell's study, while we do not credit its estimates of underblocking, is useful because it states lower bounds for the rates of overblocking that occurred when the Cyber Patrol, Websense, and N2H2 filters were operating in public libraries. While these rates are substantial (between nearly 6% and 15%), we think, for the reasons stated above, that they greatly understate the actual rates of overblocking that occur, and therefore they cannot be considered anything more than minimum estimates of the rates of overblocking that occurs in all filtering programs.
5. Methods of Obtaining Examples of Erroneously Blocked Web Sites
The plaintiffs assembled a list of several thousand Web sites that they contend were, at the time of the study, likely to have been erroneously blocked by one or more of four major commercial filtering programs: SurfControl Cyber Patrol 6.0.1.47, N2H2 Internet Filtering 2.0, Secure Computing SmartFilter 3.0.0.01, and Websense Enterprise 4.3.0. They compiled this list using a two-step process. First, Benjamin Edelman, an expert witness who testified before us, compiled a list of more than 500,000 URLs and devised a program to feed them through all four filtering programs in order to compile a list of URLs that might have been erroneously blocked by one or more of the programs. Second, Edelman forwarded subsets of the list that he compiled to librarians and professors of library science whom the plaintiffs had hired to review the blocked sites for suitability in the public library context. Edelman assembled the list of URLs by compiling Web pages that were blocked by the following categories in the four programs: Cyber Patrol: Adult/Sexually Explicit; N2H2: Adults Only, Nudity, Pornography, and Sex, with "exceptions" engaged in the categories of Education, For Kids, History, Medical, Moderated, and Text/Spoken Only; SmartFilter: Sex, Nudity, Mature, and Extreme; Websense: Adult Content, Nudity, and Sex.
Edelman then assembled a database of Web sites for possible testing. He derived this list by automatically compiling URLs from the Yahoo index of Web sites, drawing them from Yahoo categories that differed significantly from the classifications that he had enabled in each of the blocking programs (for example, Web sites from Yahoo's "Government" category). He then expanded this list by entering URLs taken from the Yahoo index into the Google search engine's "related" search function, which provides the user with a list of similar sites. Edelman also included and excluded specific Web sites at the request of the plaintiffs' counsel.
Taking the list of more than 500,000 URLs that he had compiled, Edelman used an automated system that he had developed to test whether particular URLs were blocked by each of the four filtering programs. This testing took place between February and October 2001. He recorded the specific dates on which particular sites were blocked by particular programs, and, using commercial archiving software, archived the contents of the home page of the blocked Web sites (and in some instances the pages linked to from the home page) as it existed when it was blocked. Through this process, Edelman, whose testimony we credit, compiled a list of 6,777 URLs that were blocked by one or more of the four programs. Because these sites were chosen from categories in the Yahoo directory that were unrelated to the filtering categories that were enabled during the test (e.g., "Government" as opposed to "Nudity"), he reasoned that they were likely erroneously blocked. As explained in the margin, Edelman repeated his testing and discovered that Cyber Patrol had unblocked most of the pages on the list of 6,777 after he had published the list on his Web site. His records indicate that an employee of SurfControl (the company that produces Cyber Patrol software) accessed his site and presumably reviewed the URLs on the list; the subsequent unblocking of most of those URLs confirms Edelman's judgment that the majority of URLs on the list were erroneously blocked. Edelman forwarded the list of blocked sites to Dr. Joseph Janes, an Assistant Professor in the Information School of the University of Washington who also testified at trial as an expert witness. Janes reviewed the sites that Edelman compiled to determine whether they are consistent with library collection development, i.e., whether they are sites to which a reference librarian would, consistent with professional standards, direct a patron as a source of information.
Edelman forwarded Janes a list of 6,775 Web sites, almost the entire list of blocked sites that he collected, from which Janes took a random sample of 859 using the SPSS statistical software package. Janes indicated that he chose a sample size of 859 because it would yield a 95% confidence interval of plus or minus 2.5%. Janes recruited a group of 16 reviewers, most of whom were current or former students at the University of Washington's Information School, to help him identify which sites were appropriate for library use. We describe the process that he used in the margin. Due to the inability of a member of Janes's review team to complete the reviewing process, Janes had to cut 157 Web sites out of the sample, but because the Web sites were randomly assigned to reviewers, it is unlikely that these sites differed significantly from the rest of the sample. That left the sample size at 699, which widened the 95% confidence interval to plus or minus 2.8%.
Of the total 699 sites reviewed, Janes's team concluded that 165 of them, or 23.6% of the sample, were not of any value in the library context (i.e., no librarian would, consistent with professional standards, refer a patron to these sites as a source of information). They were unable to find 60 of the Web sites, or 8.6% of the sample. Therefore, they concluded that the remaining 474 Web sites, or 67.8% of the sample, were examples of overblocking with respect to materials that are appropriate sources of information in public libraries. Applying a 95% confidence interval of plus or minus 2.8%, the study concluded that we can be 95% confident that the actual percentage of sites in the list of 6,775 sites that are appropriate for use in public libraries is somewhere between 65.0% and 70.6%. In other words, we can be 95% certain that the actual number of sites out of the 6,775 that Edelman forwarded to Janes that are appropriate for use in public libraries (under Janes's standard) is somewhere between 4,403 and 4,783.
The government raised some valid criticisms of Janes's methodology, attacking in particular the fact that, while sites that received two "yes" votes in the first round of voting were determined to be of sufficient interest in a library context to be removed from further analysis, sites receiving one or two "no" votes were sent to the next round. The government also correctly points out that the results of Janes's study can be generalized only to the population of 6,775 sites that Edelman forwarded to Janes. Even taking these criticisms into account, and discounting Janes's numbers appropriately, we credit Janes's study as confirming that Edelman's set of 6,775 Web sites contains at least a few thousand URLs that were erroneously blocked by one or more of the four filtering programs that he used, whether judged against CIPA's definitions, the filters' own category criteria, or the standard that the Janes study used. Edelman tested only 500,000 unique URLs, roughly one in 4,000 of the approximately two billion pages estimated to exist in the indexable Web. Even assuming that Edelman chose the URLs that were most likely to be erroneously blocked by commercial filtering programs, we conclude that many times the number of pages that Edelman identified are erroneously blocked by one or more of the filtering programs that he tested. Edelman's and Janes's studies provide numerous specific examples of Web pages that were erroneously blocked by one or more filtering programs. The Web pages that were erroneously blocked by one or more of the filtering programs do not fall into any neat patterns; they range widely in subject matter, and it is difficult to tell why they may have been overblocked. The list that Edelman compiled, for example, contains Web pages relating to religion, politics and government, health, careers, education, travel, sports, and many other topics. In the next section, we provide examples from each of these categories.
6. Examples of Erroneously Blocked Web Sites
Several of the erroneously blocked Web sites had content relating to churches, religious orders, religious charities, and religious fellowship organizations. These included the following Web sites: the Knights of Columbus Council 4828, a Catholic men's group associated with St. Patrick's Church in Fallon, Nevada, http://msnhomepages.talkcity.com/SpiritSt/kofc4828, which was blocked by Cyber Patrol in the "Adult/Sexually Explicit" category; the Agape Church of Searcy, Arkansas, http://www.agapechurch.com, which was blocked by Websense as "Adult Content"; the home page of the Lesbian and Gay Havurah of the Long Beach, California Jewish Community Center, http://www.compupix.com/gay/havurah.htm, which was blocked by N2H2 as "Adults Only, Pornography," by Smartfilter as "Sex," and by Websense as "Sex"; Orphanage Emmanuel, a Christian orphanage in Honduras that houses 225 children, http://home8.inet.tele.dk/rfb_viva, which was blocked by Cyber Patrol in the "Adult/Sexually Explicit" category; Vision Art Online, which sells wooden wall hangings for the home that contain prayers, passages from the Bible, and images of the Star of David, http://www.visionartonline.com, which was blocked in Websense's "Sex" category; and the home page of Tenzin Palmo, a Buddhist nun, which contained a description of her project to build a Buddhist nunnery and international retreat center for women, http://www.tenzinpalmo.com, which was categorized as "Nudity" by N2H2.
Several blocked sites also contained information about governmental entities or specific political candidates, or contained political commentary. These included: the Web site for Kelley Ross, a Libertarian candidate for the California State Assembly, http://www.friesian.com/ross/ca40, which N2H2 blocked as "Nudity"; the Web site for Bob Coughlin, a town selectman in Dedham, Massachusetts, http://www.bobcoughlin.org, which was blocked under N2H2's "Nudity" category; a list of Web sites containing information about government and politics in Adams County, Pennsylvania, http://www.geocities.com/adamscopa, which was blocked by Websense as "Sex"; the Web site for Wisconsin Right to Life, http://www.wrtl.org, which N2H2 blocked as "Nudity"; a Web site that promotes federalism in Uganda, http://federo.com, which N2H2 blocked as "Adults Only, Pornography"; "Fight the Death Penalty in the USA," a Danish Web site dedicated to criticizing the American system of capital punishment, http://www.fdp.dk, which N2H2 blocked as "Pornography"; and "Dumb Laws," a humor Web site that makes fun of outmoded laws, http://www.dumblaws.com, which N2H2 blocked under its "Sex" category.
Erroneously blocked Web sites relating to health issues included the following: a guide to allergies, http://www.x-sitez.com/allergy, which was categorized as "Adults Only, Pornography" by N2H2; a health question and answer site sponsored by Columbia University, http://www.goaskalice.com.columbia.edu, which was blocked as "Sex" by N2H2, and as "Mature" by Smartfilter; the Western Amputee Support Alliance Home Page, http://www.usinter.net/wasa, which was blocked by N2H2 as "Pornography"; the Web site of the Willis-Knighton Cancer Center, a Shreveport, Louisiana cancer treatment facility, http://cancerftr.wkmc.com, which was blocked by Websense under the "Sex" category; and a site dealing with halitosis, http://www.dreamcastle.com/tungs, which was blocked by N2H2 as "Adults, Pornography," by Smartfilter as "Sex," by Cyber Patrol as "Adult/Sexually Explicit," and by Websense as "Adult Content."
The filtering programs also erroneously blocked several Web sites having to do with education and careers. The filtering programs blocked two sites that provide information on home schooling. "HomEduStation: the Internet Source for Home Education," http://www.perigee.net/~mcmullen/homedustation/, was categorized by Cyber Patrol as "Adult/Sexually Explicit." Smartfilter blocked "Apricot: A Web site made by and for home schoolers," http://apricotpie.com, as "Sex." The programs also miscategorized several career-related sites. "Social Work Search," http://www.socialworksearch.com/, is a directory for social workers that Cyber Patrol placed in its "Adult/Sexually Explicit" category. The "Gay and Lesbian Chamber of Southern Nevada," http://www.lambdalv.com, "a forum for the business community to develop relationships within the Las Vegas lesbian, gay, transsexual, and bisexual community," was blocked by N2H2 as "Adults Only, Pornography." A site for aspiring dentists, http://www.vvm.com/~bond/home.htm, was blocked by Cyber Patrol in its "Adult/Sexually Explicit" category.
The filtering programs erroneously blocked many travel Web sites, including: the Web site for the Allen Farmhouse Bed & Breakfast of Alleghany County, North Carolina, http://planet-nc.com/Beth/index.html, which Websense blocked as "Adult Content"; Odysseus Gay Travel, a travel company serving gay men, http://www.odyusa.com, which N2H2 categorized as "Adults Only, Pornography"; Southern Alberta Fly Fishing Outfitters, http://albertaflyfish.com, which N2H2 blocked as "Pornography"; and "Nature and Culture Conscious Travel," a tour operator in Namibia, http://www.trans-namibia-tours.com, which was categorized as "Pornography" by N2H2.
The filtering programs also miscategorized a large number of sports Web sites. These included: a site devoted to Willie O'Ree, the first African-American player in the National Hockey League, http://www.missioncreep.com/mw/oree.html, which Websense blocked under its "Nudity" category; the home page of the Sydney University Australian Football Club, http://www.tek.com.au/suafc, which N2H2 blocked as "Adults Only, Pornography," Smartfilter blocked as "Sex," Cyber Patrol blocked as "Adult/Sexually Explicit," and Websense blocked as "Sex"; and a fan's page devoted to the Toronto Maple Leafs hockey team, http://www.torontomapleleafs.atmypage.com, which N2H2 blocked under the "Pornography" category.
7. Conclusion: The Effectiveness of Filtering Programs
Public libraries have adopted a variety of means of dealing with problems created by the provision of Internet access. The large amount of sexually explicit speech that is freely available on the Internet has, to varying degrees, led to patron complaints about such matters as unsought exposure to offensive material, incidents of staff and patron harassment by individuals viewing sexually explicit content on the Internet, and the use of library computers to access illegal material, such as child pornography. In some libraries, youthful library patrons have persistently attempted to use the Internet to access hardcore pornography.
Those public libraries that have responded to these problems by using software filters have found such filters to provide a relatively effective means of preventing patrons from accessing sexually explicit material on the Internet. Nonetheless, out of the entire universe of speech on the Internet falling within the filtering products' category definitions, the filters will incorrectly fail to block a substantial amount of speech. Thus, software filters have not completely eliminated the problems that public libraries have sought to address by using the filters, as evidenced by frequent instances of underblocking. Nor is there any quantitative evidence of the relative effectiveness of filters and the alternatives to filters that are also intended to prevent patrons from accessing illegal content on the Internet. Even more importantly (for this case), although software filters provide a relatively cheap and effective, albeit imperfect, means for public libraries to prevent patrons from accessing speech that falls within the filters' category definitions, we find that commercially available filtering programs erroneously block a huge amount of speech that is protected by the First Amendment. Any currently available filtering product that is reasonably effective in preventing users from accessing content within the filter's category definitions will necessarily block countless thousands of Web pages, the content of which does not match the filtering company's category definitions, much less the legal definitions of obscenity, child pornography, or harmful to minors. Even Finnell, an expert witness for the defendants, found that between 6% and 15% of the blocked Web sites in the public libraries that he analyzed did not contain content that meets even the filtering products' own definitions of sexually explicit content, let alone CIPA's definitions.
This phenomenon occurs for a number of reasons explicated in the more detailed findings of fact supra. These include limitations on filtering companies' ability to: (1) harvest Web pages for review; (2) review and categorize the Web pages that they have harvested; and (3) engage in regular re-review of the Web pages that they have previously reviewed. The primary limitations on filtering companies' ability to harvest Web pages for review are that a substantial majority of pages on the Web are not indexable using the spidering technology that Web search engines use, and that, together, search engines have indexed only around half of the Web pages that are theoretically indexable. The fast rate of growth in the number of Web pages also limits filtering companies' ability to harvest pages for review. These shortcomings necessarily result in significant underblocking. Several limitations on filtering companies' ability to review and categorize the Web pages that they have harvested also contribute to over- and underblocking. First, automated review processes, even those based on "artificial intelligence," are unable with any consistency to distinguish accurately material that falls within a category definition from material that does not. Moreover, human review of URLs is hampered by filtering companies' limited staff sizes, and by human error or misjudgment. In order to deal with the vast size of the Web and its rapid rates of growth and change, filtering companies engage in several practices that are necessary to reduce underblocking, but inevitably result in overblocking. These include: (1) blocking whole Web sites even when only a small minority of their pages contain material that would fit under one of the filtering company's categories (e.g., blocking the Salon.com site because it contains a sex column); (2) blocking by IP address (because a single IP address may contain many different Web sites and many thousands of pages of heterogeneous content); and (3) blocking loophole sites such as translator sites and cache sites, which archive Web pages that have been removed from the Web by their original publisher.
Finally, filtering companies' failure to engage in regular re-review of Web pages that they have already categorized (or that they have determined do not fall into any category) results in a substantial amount of over- and underblocking. For example, Web publishers change the contents of Web pages frequently. The problem also arises when a Web site goes out of existence and its domain name or IP address is reassigned to a new Web site publisher. In that case, a filtering company's previous categorization of the IP address or domain name would likely be incorrect, potentially resulting in the over- or underblocking of many thousands of pages. The inaccuracies that result from these limitations of filtering technology are quite substantial. At least tens of thousands of pages of the indexable Web are overblocked by each of the filtering programs evaluated by experts in this case, even when considered against the filtering companies' own category definitions. Many erroneously blocked pages contain content that is completely innocuous for both adults and minors, and that no rational person could conclude matches the filtering companies' category definitions, such as "pornography" or "sex."
The number of overblocked sites is, of course, much higher with respect to the definitions of obscenity and child pornography that CIPA employs for adults, since the filtering products' category definitions, such as "sex" and "nudity," encompass a vast number of Web pages that are neither child pornography nor obscene. Thus, the number of pages of constitutionally protected speech blocked by filtering products far exceeds the many thousands of pages that are overblocked by reference to the filtering products' category definitions.
No presently conceivable technology can make the judgments necessary to determine whether a visual depiction fits the legal definitions of obscenity, child pornography, or harmful to minors. Given the state of the art in filtering and image recognition technology, and the rapidly changing and expanding nature of the Web, we find that the shortcomings of filtering products will not be cured by a technical solution in the foreseeable future. In sum, filtering products are currently unable to block only visual depictions that are obscene, child pornography, or harmful to minors (or, only content matching a filtering product's category definitions) while simultaneously allowing access to all protected speech (or, all content not matching the filtering product's category definitions). Any software filter that is reasonably effective in blocking access to Web pages that fall within its category definitions will necessarily block a substantial number of Web pages that do not fall within its category definitions.

2. Analytic Framework for the Opinion: The Centrality of Dole and the Role of the Facial Challenge
Both the plaintiffs and the government agree that, because this case involves a challenge to the constitutionality of the conditions that Congress has set on state actors' receipt of federal funds, the Supreme Court's decision in South Dakota v. Dole, 483 U.S. 203 (1987), supplies the proper threshold analytic framework. The constitutional source of Congress's spending power is Article I, Sec. 8, cl. 1, which provides that "Congress shall have Power . . . to pay the Debts and provide for the common Defence and general Welfare of the United States." In Dole, the Court upheld the constitutionality of a federal statute requiring the withholding of federal highway funds from any state with a drinking age below 21. Id. at 211-12. In sustaining the provision's constitutionality, Dole articulated four general constitutional limitations on Congress's exercise of the spending power.
First, "the exercise of the spending power must be in pursuit of 'the general welfare.'" Id. at 207. Second, any conditions that Congress sets on states' receipt of federal funds must be sufficiently clear to enable recipients "to exercise their choice knowingly, cognizant of the consequences of their participation." Id. (internal quotation marks and citation omitted). Third, the conditions on the receipt of federal funds must bear some relation to the purpose of the funding program. Id. And finally, "other constitutional provisions may provide an independent bar to the conditional grant of federal funds." Id. at 208. In particular, the spending power "may not be used to induce the States to engage in activities that would themselves be unconstitutional. Thus, for example, a grant of federal funds conditioned on invidiously discriminatory state action or the infliction of cruel and unusual punishment would be an illegitimate exercise of the Congress' broad spending power." Id. at 210.
Plaintiffs do not contend that CIPA runs afoul of the first three limitations. However, they do allege that CIPA is unconstitutional under the fourth prong of Dole because it will induce public libraries to violate the First Amendment. Plaintiffs therefore submit that the First Amendment "provide[s] an independent bar to the conditional grant of federal funds" created by CIPA. Id. at 208. More specifically, they argue that by conditioning public libraries' receipt of federal funds on the use of software filters, CIPA will induce public libraries to violate the First Amendment rights of Internet content-providers to disseminate constitutionally protected speech to library patrons via the Internet, and the correlative First Amendment rights of public library patrons to receive constitutionally protected speech on the Internet. The government concedes that under the Dole framework, CIPA is facially invalid if its conditions will induce public libraries to violate the First Amendment. The government and the plaintiffs disagree, however, on the meaning of Dole's "inducement" requirement in the context of a First Amendment facial challenge to the conditions that Congress places on state actors' receipt of federal funds. The government contends that because plaintiffs are bringing a facial challenge, they must show that under no circumstances is it possible for a public library to comply with CIPA's conditions without violating the First Amendment. The plaintiffs respond that even if it is possible for some public libraries to comply with CIPA without violating the First Amendment, CIPA is facially invalid if it "will result in the impermissible suppression of a substantial amount of protected speech."
Because it was clear in Dole that the states could comply with the challenged conditions that Congress attached to the receipt of federal funds without violating the Constitution, the Dole Court did not have occasion to explain fully what it means for Congress to use the spending power to "induce [recipients] to engage in activities that would themselves be unconstitutional." Dole, 483 U.S. at 210; see id. at 211 ("Were South Dakota to succumb to the blandishments offered by Congress and raise its drinking age to 21, the State's action in so doing would not violate the constitutional rights of anyone."). Although the proposition that Congress may not pay state actors to violate citizens' First Amendment rights is unexceptionable when stated in the abstract, it is unclear what exactly a litigant must establish to facially invalidate an exercise of Congress's spending power on this ground. In general, it is well-established that a court may sustain a facial challenge to a statute only if the plaintiff demonstrates that the statute admits of no constitutional application. See United States v. Salerno, 481 U.S. 739, 745 (1987) ("A facial challenge to a legislative Act is, of course, the most difficult challenge to mount successfully, since the challenger must establish that no set of circumstances exists under which the Act would be valid."); see also Bowen v. Kendrick, 487 U.S. 589, 612 (1988) ("It has not been the Court's practice, in considering facial challenges to statutes of this kind, to strike them down in anticipation that particular applications may result in unconstitutional use of funds.") (internal quotation marks and citation omitted).
First Amendment overbreadth doctrine creates a limited exception to this rule by permitting facial invalidation of a statute that burdens a substantial amount of protected speech, even if the statute may be constitutionally applied in particular circumstances. "The Constitution gives significant protection from overbroad laws that chill speech within the First Amendment's vast and privileged sphere. Under this principle, [a law] is unconstitutional on its face if it prohibits a substantial amount of protected expression." Ashcroft v. Free Speech Coalition, 122 S. Ct. 1389, 1399 (2002); see also Broadrick v. Oklahoma, 413 U.S. 601, 612 (1973). This more liberal test of a statute's facial validity under the First Amendment stems from the recognition that where a statute's reach contemplates a number of both constitutional and unconstitutional applications, the law's sanctions may deter individuals from challenging the law's validity by engaging in constitutionally protected speech that may nonetheless be proscribed by the law. Without an overbreadth doctrine, "the contours of regulation would have to be hammered out case by case and tested only by those hardy enough to risk criminal prosecution to determine the proper scope of regulation." Dombrowski v. Pfister, 380 U.S. 479, 487 (1965); see also Brockett v. Spokane Arcades, Inc., 472 U.S. 491, 503 (1985) ("[A]n individual whose own speech or expressive conduct may validly be prohibited or sanctioned is permitted to challenge a statute on its face because it also threatens others not before the court -- those who desire to engage in legally protected expression but who may refrain from doing so rather than risk prosecution or undertake to have the law declared partially invalid.").
Plaintiffs argue that the overbreadth doctrine is applicable here, since CIPA "threatens to chill free speech because it will censor a substantial amount of protected speech, because it is vague, and because the law creates a prior restraint . . . ." Unlike the statutes typically challenged as facially overbroad, however, CIPA does not impose criminal penalties on those who violate its conditions. Cf. Free Speech Coalition, 122 S. Ct. at 1398 ("With these severe penalties in force, few legitimate movie producers or book publishers, or few other speakers in any capacity, would risk distributing images in or near the uncertain reach of this law."). Thus, the rationale for permitting facial challenges to laws that may be constitutionally applied in some instances is less compelling in cases such as this, which involve challenges to Congress's exercise of the spending power, than in challenges to criminal statutes. Nonetheless, "even minor punishments can chill protected speech," id., and absent the ability to challenge CIPA on its face, public libraries that depend on federal funds may decide to comply with CIPA's terms, thereby denying patrons access to substantial amounts of constitutionally protected speech, rather than refusing to comply with CIPA's terms and consequently losing the benefits of federal funds. See 47 C.F.R. Sec. 54.520(e)(1) ("A school or library that knowingly fails to ensure the use of computers in accordance with the certifications required by this section, must reimburse any funds and discounts received under the federal universal service support mechanism for schools and libraries for the period in which there was noncompliance."). Even in cases where the only penalty for failure to comply with a statute is the withholding of federal funds, the Court has sustained facial challenges to Congress's exercise of the spending power. See, e.g., Legal Servs. Corp. v. Velazquez, 531 U.S. 533 (2001) (declaring unconstitutional on its face a federal statute restricting the ability of legal services providers who receive federal funds to engage in activity protected by the First Amendment).
The Court's unconstitutional conditions cases, such as Velazquez, are not strictly controlling, since they do not require a showing that recipients who comply with the conditions attached to federal funding will, as state actors, violate others' constitutional rights, as is the case under the fourth prong of Dole. However, they are highly instructive. The Supreme Court's pronouncements in the unconstitutional conditions cases on what is necessary for a plaintiff to mount a successful First Amendment facial challenge to an exercise of Congress's spending power have not produced a seamless web. For example, in Rust v. Sullivan, 500 U.S. 173 (1991), the Court rejected a First Amendment facial challenge to federal regulations prohibiting federally funded healthcare clinics from providing counseling concerning the use of abortion as a method of family planning, explaining that:

Petitioners are challenging the facial validity of the regulations. Thus, we are concerned only with the question whether, on their face, the regulations are both authorized by the Act and can be construed in such a manner that they can be applied to a set of individuals without infringing upon constitutionally protected rights. Petitioners face a heavy burden in seeking to have the regulations invalidated as facially unconstitutional. . . . The fact that the regulations might operate unconstitutionally under some conceivable set of circumstances is insufficient to render them wholly invalid.
Id. at 183 (internal quotation marks, alterations, and citation omitted). In contrast, NEA v. Finley, 524 U.S. 569 (1998), which also involved a facial First Amendment challenge to an exercise of Congress's spending power, articulated a somewhat more liberal test of facial validity than Rust, explaining that "[t]o prevail, respondents must demonstrate a substantial risk that application of the provision will lead to the suppression of speech." Id. at 580. Against this background, it is unclear to us whether, to succeed in facially invalidating CIPA on the grounds that it will "induce the States to engage in activities that would themselves be unconstitutional," Dole, 483 U.S. at 210, plaintiffs must show that it is impossible for public libraries to comply with CIPA's conditions without violating the First Amendment, or rather simply that CIPA will effectively restrict library patrons' access to substantial amounts of constitutionally protected speech, thereby causing many libraries to violate the First Amendment. However, we need not resolve this issue. Rather, we may assume without deciding, for purposes of this case, that a facial challenge to CIPA requires plaintiffs to show that any public library that complies with CIPA's conditions will necessarily violate the First Amendment, and, as explained in detail below, we believe that CIPA fails even under this more restrictive test of facial validity urged on us by the government. Because of the inherent limitations in filtering technology, public libraries can never comply with CIPA without blocking access to a substantial amount of speech that is constitutionally protected and that fails to meet even the filtering companies' own blocking criteria. We turn first to the governing legal principles to be applied to the facts in order to determine whether the First Amendment permits a library to use the filtering technology mandated by CIPA.

3. Level of Scrutiny Applicable to Content-based Restrictions on Internet Access in Public Libraries
In analyzing the constitutionality of a public library's use of Internet filtering software, we must first identify the appropriate level of scrutiny to apply to this restriction on patrons' access to speech. While plaintiffs argue that a public library's use of such filters is subject to strict scrutiny, the government maintains that the applicable standard is rational basis review. If strict scrutiny applies, the government must show that the challenged restriction on speech is narrowly tailored to promote a compelling government interest and that no less restrictive alternative would further that interest. United States v. Playboy Entm't Group, Inc., 529 U.S. 803, 813 (2000). In contrast, under rational basis review, the challenged restriction need only be reasonable; the government interest that the restriction serves need not be compelling; the restriction need not be narrowly tailored to serve that interest; and the restriction "need not be the most reasonable or the only reasonable limitation." Cornelius v. NAACP Legal Def. & Educ. Fund, 473 U.S. 788, 808 (1985).
Software filters, by definition, block access to speech on the basis of its content, and content-based restrictions on speech are generally subject to strict scrutiny. See Playboy, 529 U.S. at 813 ("[A] content-based speech restriction . . . can stand only if it satisfies strict scrutiny."). Strict scrutiny does not necessarily apply to content-based restrictions on speech, however, where the restrictions apply only to speech on government property, such as public libraries. "[I]t is . . . well settled that the government need not permit all forms of speech on property that it owns and controls." Int'l Soc'y for Krishna Consciousness, Inc. v. Lee, 505 U.S. 672, 678 (1992). We perforce turn to a discussion of public forum doctrine.

1. Overview of Public Forum Doctrine

The government's power to restrict speech on its own property is not unlimited. Rather, under public forum doctrine, the extent to which the First Amendment permits the government to restrict speech on its own property depends on the character of the forum that the government has created. See Cornelius v. NAACP Legal Def. & Educ. Fund, Inc., 473 U.S. 788 (1985). Thus, the First Amendment affords greater deference to restrictions on speech in those areas considered less amenable to free expression, such as military bases, see Greer v. Spock, 424 U.S. 828 (1976), jail grounds, see Adderley v. Florida, 385 U.S. 39 (1966), or public airport terminals, see Int'l Soc'y for Krishna Consciousness, Inc. v. Lee, 505 U.S. 672 (1992), than to restrictions on speech in state universities, see Rosenberger v. Rector & Visitors of Univ. of Va., 515 U.S. 819 (1995), or streets, sidewalks and public parks, see Frisby v. Schultz, 487 U.S. 474 (1988); Hague v. CIO, 307 U.S. 496 (1939). The Supreme Court has identified three types of fora for purposes of identifying the level of First Amendment scrutiny applicable to content-based restrictions on speech on government property: traditional public fora, designated public fora, and nonpublic fora. Traditional public fora include sidewalks, squares, and public parks:

[S]treets and parks . . . have immemorially been held in trust for the use of the public and, time out of mind, have been used for purposes of assembly, communicating thoughts between citizens, and discussing public questions. Such use of the streets and public places has, from ancient times, been a part of the privileges, immunities, rights, and liberties of citizens.
Hague, 307 U.S. at 515. "In these quintessential public forums, . . . [f]or the State to enforce a content-based exclusion it must show that its regulation is necessary to serve a compelling state interest and that it is narrowly drawn to achieve that end." Perry Educ. Ass'n v. Perry Local Educators' Ass'n, 460 U.S. 37, 45 (1983); see also Int'l Soc'y for Krishna Consciousness, 505 U.S. at 678 ("[R]egulation of speech on government property that has traditionally been available for public expression is subject to the highest scrutiny."); Frisby, 487 U.S. at 480 ("[W]e have repeatedly referred to public streets as the archetype of a traditional public forum."). A second category of fora, known as designated (or limited) public fora, "consists of public property which the State has opened for use by the public as a place for expressive activity." Perry, 460 U.S. at 46. Whereas any content-based restriction on the use of traditional public fora is subject to strict scrutiny, the state is generally permitted, as long as it does not discriminate on the basis of viewpoint, to limit a designated public forum to certain speakers or the discussion of certain subjects. See Perry, 460 U.S. at 45 n.7. Once it has defined the limits of a designated public forum, however, "[r]egulation of such property is subject to the same limitations as that governing a traditional public forum." Int'l Soc'y for Krishna Consciousness, 505 U.S. at 678. Examples of designated fora include university meeting facilities, see Widmar v. Vincent, 454 U.S. 263 (1981), school board meetings, see City of Madison Joint School Dist. v. Wisc. Employment Relations Comm'n, 429 U.S. 167 (1976), and municipal theaters, see Southeastern Promotions, Ltd. v. Conrad, 420 U.S. 546 (1975).
The third category, nonpublic fora, consists of all remaining public property. "Limitations on expressive activity conducted on this last category of property must survive only a much more limited review. The challenged regulation need only be reasonable, as long as the regulation is not an effort to suppress the speaker's activity due to disagreement with the speaker's view." Int'l Soc'y for Krishna Consciousness, 505 U.S. at 679.

2. Contours of the Relevant Forum: the Library's Collection as a Whole or the Provision of Internet Access?
To apply public forum doctrine to this case, we must first determine whether the appropriate forum for analysis is the library's collection as a whole, which includes both print and electronic resources, or the library's provision of Internet access. Where a plaintiff seeks limited access, for expressive purposes, to governmentally controlled property, the Supreme Court has held that the relevant forum is defined not by the physical limits of the government property at issue, but rather by the specific access that the plaintiff seeks:

Although . . . as an initial matter a speaker must seek access to public property or to private property dedicated to public use to evoke First Amendment concerns, forum analysis is not completed merely by identifying the government property at issue. Rather, in defining the forum we have focused on the access sought by the speaker. When speakers seek general access to public property, the forum encompasses that property. In cases in which limited access is sought, our cases have taken a more tailored approach to ascertaining the perimeters of a forum within the confines of the government property.

Cornelius v. NAACP Legal Def. & Educ. Fund, Inc., 473 U.S. 788, 801 (1985).
Thus, in Cornelius, where the plaintiffs were legal defense and political advocacy groups seeking to participate in the Combined Federal Campaign charity drive, the Court held that the relevant forum, for First Amendment purposes, was not the entire federal workplace, but rather the charity drive itself. Id. at 801. Similarly, in Perry Education Association v. Perry Local Educators' Association, 460 U.S. 37 (1983), which addressed a union's right to access a public school's internal mail system and teachers' mailboxes, the Court identified the relevant forum as the school's mail system, not the public school as a whole. In Widmar v. Vincent, 454 U.S. 263 (1981), in which a student group challenged a state university's restrictions on use of its meeting facilities, the Court identified the relevant forum as the meeting facilities to which the plaintiffs sought access, not the state university generally. And in Christ's Bride Ministries, Inc. v. SEPTA, 148 F.3d 242 (3d Cir. 1998), involving a First Amendment challenge to the removal of advertisements from subway and commuter rail stations, the Third Circuit noted that the forum at issue was not the rail and subway stations as a whole, but rather the advertising space within the stations. Id. at 248. Although these cases dealt with the problem of identifying the relevant forum where speakers are claiming a right of access, we believe that the same approach applies to identifying the relevant forum where the parties seeking access are listeners or readers.
In this case, the patron plaintiffs are not asserting a First Amendment right to compel public libraries to acquire certain books or magazines for their print collections. Nor are the Web site plaintiffs claiming a First Amendment right to compel public libraries to carry print materials that they publish. Rather, the right at issue in this case is the specific right of library patrons to access information on the Internet, and the specific right of Web publishers to provide library patrons with information via the Internet. Thus, the relevant forum for analysis is not the library's entire collection, which includes both print and electronic media, such as the Internet, but rather the specific forum created when the library provides its patrons with Internet access. Although a public library's provision of Internet access does not resemble the conventional notion of a forum as a well-defined physical space, the same First Amendment standards apply. See Rosenberger v. Rector & Visitors of Univ. of Va., 515 U.S. 819, 830 (1995) (holding that a state university's student activities fund "is a forum more in a metaphysical than a spatial or geographic sense, but the same principles are applicable"); see also Cornelius, 473 U.S. at 801 (identifying the Combined Federal Campaign charity drive as the relevant unit of analysis for application of public forum doctrine).

3. Content-based Restrictions in Designated Public Fora
Unlike nonpublic fora such as airport terminals, see Int'l Soc'y for Krishna Consciousness, Inc. v. Lee, 505 U.S. 672 (1992), military bases, see Greer v. Spock, 424 U.S. 828 (1976), jail grounds, see Adderley v. Florida, 385 U.S. 39 (1966), the federal workplace, see Cornelius v. NAACP Legal Def. & Educ. Fund, 473 U.S. 788, 805 (1985), and public transit vehicles, see Lehman v. City of Shaker Heights, 418 U.S. 298 (1974), the purpose of a public library in general, and the provision of Internet access within a public library in particular, is "for use by the public . . . for expressive activity," Perry Educ. Ass'n v. Perry Local Educators' Ass'n, 460 U.S. 37, 45 (1983), namely, the dissemination and receipt by the public of a wide range of information. We are satisfied that when the government provides Internet access in a public library, it has created a designated public forum. See Mainstream Loudoun v. Bd. of Trustees of the Loudoun County Library, 24 F. Supp. 2d 552, 563 (E.D. Va. 1998); cf. Kreimer v. Bureau of Police, 958 F.2d 1242, 1259 (3d Cir. 1992) (holding that a public library is a limited public forum). Relying on those cases that have recognized that government has leeway, under the First Amendment, to limit use of a designated public forum to narrowly specified purposes, and that content-based restrictions on speech that are consistent with those purposes are subject only to rational basis review, the government argues for application of rational basis review to public libraries' decisions about which content to make available to their patrons via the Internet. See Rosenberger, 515 U.S. at 829 ("The necessities of confining a forum to the limited and legitimate purposes for which it was created may justify the State in reserving it for certain groups or for the discussion of certain topics."); Perry, 460 U.S. at 46 n.7 ("A public forum may be created for a limited purpose such as use by certain groups . . . or for the discussion of certain subjects.").