(Reader advisory: this post contains 10,379 words, it’s long, very long, easily two cups of tea and a kitkat long)
Much as I’m sure many people will sling various insults at you regarding your interview in the Telegraph about age classifications for websites, you’ll find that here at Penguin Central we like to take a more mild-mannered and restrained attitude towards political discourse, unless the subject is a Tory of course.
So, Andy, you’re the Secretary of State for Culture, Media and Sport. As such you’ve presumably got within your brief a remit over facets of the internet, or shall we start off by getting the technical terms correct, the World Wide Web. Just for your information, the internet is everything that constitutes an inter-connected network of computers that spans the globe. The World Wide Web is the little bit of it that you see when you type www.something.com into Internet Explorer 6 from one of your Parliamentary computers.
As such, the World Wide Web, all those pretty graphical sites that you see in Internet Explorer 6, constitutes a relatively small amount of the data on the internet. The vast majority of information and data available is contained on FTP servers around the globe, available either openly or restricted to certain users depending on the nature or ownership of said data, or on websites that make up the ‘Deep Web’, as in sites that are not readily found, either by design or through non-registration with services such as search engines.
So before we proceed, a quick question: are we talking the World Wide Web or the Internet? If the former then the task is simply impossible. If however we’re talking the latter, then as much of this information sits there without any publicly available indication of its origin, i.e. a ‘link’ on a website pointing to it, how exactly does anyone, including any regulatory body, think it is going to assess and classify it?
On to the bare bones of this issue now. We’ll assume we’re not talking about the whole Internet but the sub-section of it that we know as the World Wide Web, and the non ‘deep web’ element of it at that. It’s notable that you were talking largely in terms of English language sites, so that’s what we’ll concentrate on.
Is it possible to classify and control access to the entire internet/World Wide Web? Well, nothing’s impossible but here are a few statistics and technological facts to think about, oh, and some financial costs because these things cost money.
You have spoken of a government regulatory body. Given the parallels with implementing a system of classification, we’ll use the British Board of Film Classification as our comparable organisation, even though strictly speaking it is a non-governmental body with no specific powers as such: the final decision on the release, content, cutting and the like of films actually falls to local authorities.
Last year (actually these are the figures for 2008 but it’s almost last year) the BBFC classified 639 films and incidentally cut 7 of them. Now despite its name, the BBFC isn’t just about films, they do videos and video games too. In 2008 the BBFC classified 11,439 videos (cut 411), 966 video trailers (cut 5), 638 film advertisements (cut 1), 13 video advertisements (cut 0) and 242 video games & other interactive entertainment.
A quick tot up time. In 2008, the BBFC classified 13,937 works of one nature or another. There’s not been a great deal of difference in the overall figures of classifications in recent years so we’re going to assume that’s a rough figure of their general workload.
A few World Wide Web figures now. Firstly though, let’s just mention that there are no definitive figures and they are growing rather quickly. However, according to the Wikipedia page on the ‘Internet’ we’ve got:
“According to a 2001 study, there were massively more than 550 billion documents on the Web, mostly in the invisible Web, or deep Web. A 2002 survey of 2,024 million Web pages determined that by far the most Web content was in English: 56.4%; next were pages in German (7.7%), French (5.6%), and Japanese (4.9%). A more recent study, which used Web searches in 75 different languages to sample the Web, determined that there were over 11.5 billion Web pages in the publicly indexable Web as of the end of January 2005. As of June 2008, the indexable web contains at least 63 billion pages. On July 25, 2008, Google software engineers Jesse Alpert and Nissan Hajaj announced that Google Search had discovered one trillion unique URLs.
Over 100.1 million websites operated as of March 2008. Of these 74% were commercial or other sites operating in the .com generic top-level domain.”
So let’s just toss a couple of figures about. In 2002 an estimated 56.4% of content was in English. Let’s assume that with the increase in activity in non-English speaking countries of late, that figure has dropped to 50%. This July Google announced they had discovered 1 trillion unique URLs and don’t forget, that’s stuff that is in the readily available top bit of the World Wide Web, which isn’t the majority and generally isn’t the really nasty stuff like child porn and torture sites anyway.
So with the nice round figure of a trillion in mind and the current annual rate of classification by the BBFC of 13,937, and assuming that no other works are added to the internet in the meantime, that equals, at the current level of work undertaken by the BBFC, 71,751,452 years’ worth of work.
OK, there are a lot of variables here because odds on you could probably classify a single page of a website quicker than a feature length movie, so let’s assume our classifiers are really fast, can interpret and decide on an individual page in 1 minute and we won’t have them doing any form filling or bureaucracy to waste time as well. So a trillion minutes, say. That would be: 16,666,666,666 hours. Or: 694,444,444 days. Or: 1,902,587 years. Or: roughly 300 times the amount of time our species has had the ability to formulate written language. Quick note, that does assume that someone is working 24 hours a day. If we were to factor in a normal working day of 8 hours we can treble that figure.
So that’s web pages. Let’s now assume we’re not talking web pages but individual sites as a whole. Returning to the figures from Wikipedia as a good indication, we’ve got 100.1 million websites, and let’s knock that down to 50 million as being in English. The figures look a little more palatable now. So let’s see how we get on with individual websites.
Using the same methodology, that would be: 3,587 years of work for the BBFC or, only about half of all the recorded period for which our species has had written language. Let’s see how that goes with the minute per site with someone working 24 hours a day shall we? 833,333 hours, 34,722 days or 95 years. So assuming that not a single site is added onto the World Wide Web from now on and the person doing the classification can decide on a site in a minute and they can work 24 hours a day we could get the job done in a century’s worth of man hours.
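For anyone who’d rather check the sums than take my word for it, here’s a minimal sketch in Python of the back-of-envelope arithmetic above. The function names are made up for illustration; the inputs are the 2008 BBFC tallies, the trillion URLs and the 50 million English language sites already quoted, and a year is assumed to be 365 days.

```python
# BBFC classifications for 2008, as quoted earlier in this post.
BBFC_WORKS_2008 = {
    "films": 639,
    "videos": 11_439,
    "video trailers": 966,
    "film advertisements": 638,
    "video advertisements": 13,
    "video games and other interactive": 242,
}


def years_at_bbfc_rate(items):
    """Years needed if items are classified at the BBFC's 2008 annual rate."""
    return items / sum(BBFC_WORKS_2008.values())


def years_at_one_minute_each(items, hours_per_day=24):
    """Years needed at one minute per item, working hours_per_day every day."""
    minutes_per_year = 60 * hours_per_day * 365  # assumption: 365-day year
    return items / minutes_per_year


if __name__ == "__main__":
    for label, items in (("pages", 10**12), ("sites", 50_000_000)):
        print(
            label,
            "| BBFC rate:", int(years_at_bbfc_rate(items)), "years",
            "| 1 min each, 24h/day:", int(years_at_one_minute_each(items)), "years",
        )
```

Passing `hours_per_day=8` to `years_at_one_minute_each` gives the "normal working day" version of each figure.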
There are a few problems with this approach. Unless you classify every page on a website then you cannot fully associate any meaningful age classification for it. It would be the equivalent of classifying the Texas Chainsaw Massacre on the basis of a few seconds clip which may be as innocuous as a skyline picture across Texas or a scene of someone being disemboweled with a chainsaw, it just doesn’t work and to prove it, here’s an example. Let’s take my humble little domain: www.politicalpenguin.org.uk.
If you were to take the content available on it from say November 2008 then it’s fairly benign. I don’t do swearing or pictures of beheadings, the glorification of violence or kiddie porn. However if you were to have a little rummage back in my archives, from when I first started blogging, you’d find I experimented with different styles and to be fair swore a bit. In the end I decided it wasn’t my style but there’s a fair number of naughty words knocking about deep in the archive.
Equally I chose November as an example deliberately because the other week I penned this article on web censorship which I’d advise you also read as a practical example of the problems relating to this issue. Were the current front page of my site to be audited in some way then would my article on child pornography, as reasoned and detached as it is warrant a rating that it is not suitable for a certain age group?
The next important issue to consider is subdomains. If you’re not familiar with these, here’s how domain structures work. There’s the top level domain, think .com, .uk. In some territories like the UK there are sub top level domains often to denote the nature of the site so think .ac.uk for academic institutions, .gov.uk for governmental institutions etc. Below that is what we consider the domain name so the ‘politicalpenguin’ element of politicalpenguin.org.uk. Now I don’t have any subdomains operating from this URL but I could have any number of them such as ‘me.politicalpenguin.org.uk’ where we can see the domain is once again separated from the next level of the URL by the use of a full stop.
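To make the structure concrete, here’s a rough sketch in Python of how a hostname breaks down into those levels. The `split_hostname` function and the tiny `KNOWN_SUFFIXES` set are hypothetical and hand-picked for this example only; a real tool would need the full Public Suffix List to know where the registrable domain starts.

```python
# Hypothetical, hand-picked suffix list for illustration only.
KNOWN_SUFFIXES = {"com", "uk", "org.uk", "ac.uk", "gov.uk", "co.uk"}


def split_hostname(hostname):
    """Split a hostname into subdomain, domain name and top level suffix."""
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest possible suffix to the shortest, so that
    # "org.uk" wins over plain "uk".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in KNOWN_SUFFIXES:
            if i == 0:
                raise ValueError(f"{hostname!r} is only a suffix")
            return {
                "subdomain": ".".join(labels[: i - 1]),
                "domain": labels[i - 1],
                "suffix": suffix,
            }
    raise ValueError(f"no known suffix in {hostname!r}")


if __name__ == "__main__":
    for host in ("me.politicalpenguin.org.uk", "sadiestavern.blogspot.com"):
        print(host, "->", split_hostname(host))
```

The point to notice is that the subdomain part can be anything at all, and whoever controls the domain can create as many as they like without registering a thing.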
This is important when considering quite a lot of online community sites. Let’s take two examples: WordPress.com and Blogspot.com. Each is a service run and freely available for anyone wanting to set up and run a blog. However, to differentiate between individual blogs a subdomain is introduced. So for example Hopi Sen’s blog is www.hopisen.wordpress.com and Sadie’s Tavern is www.sadiestavern.blogspot.com, both of which, if you’re not aware, are a thoroughly good read.
Now unless you then add in every single subdomain (and let’s not forget that the 100.1 million registered domains only account for the main domain names, under each of which a practically unlimited number of subdomains can be created) you’re not going to be able to regulate any content with any degree of effectiveness.
You also risk tarring the good with the same brush as the bad. Much as the people behind WordPress.com and Blogspot.com (owned by Google btw) are good at taking down content that crops up which is offensive, you cannot reasonably expect to regulate unless you take down the main domain name. Incidentally, WordPress.com alone has 5,096,001 registered blogs, all with their own subdomain.
This leads us on to another important factor. The World Wide Web in its current 2.0 fashion irrespective of where it might be heading is not a static entity. Unlike films that are distributed the dynamics are inherently different. Here’s how it goes with old media. Person makes film, film studio distribute a set version of that film which is assessed and classified and the end consumer buys it or pays to see it. Short of the odd re-release and directors cut version when the industry is trying to cream an extra few dollars from a production, the end content remains static to the end user. Similarly with the majority of video games. Despite their interactive nature and user manipulation, the media and ultimate total number of variables are set and can therefore be classified.
The World Wide Web works a bit differently. Just because a website displays certain content at a certain time is no guarantee that it will always do so. The domain name may change hands, so whereas (and these are fictitious examples) www.hotwheels.com may at some point be a commercial car retailer, at another point in time, perhaps if said company were to go bust, the domain may be bought up and used by someone else to display naked women masturbating over Ferraris.
In its previous guise, that website could have been classified and deemed acceptable for a young audience but you probably wouldn’t want your 8 year old viewing its latter form.
The essential difference here is that old media, films, videos, video games (not that they’re old but their distribution system is) and even books, were physical entities whose cost of reproduction was a barrier to entry. The capital cost of the machinery and distribution network meant that unless you were either very rich or had a viable business model for making good profits, you were excluded from publishing content that could be available on a wide scale.
This is no longer the case. A computer, maybe a mobile phone with a camera and a broadband connection coupled with simplistic online tools allows even the most technically challenged to post or release content in a variety of formats, be it written, audio or video.
I think we can now establish that in terms of content control and/or classification of what is currently available, irrespective of future content, there is no practical method by which all the content on the World Wide Web could be classified by a governmental or non-governmental authority. It is simply impossible. Equally, short of banning individuals from owning various pieces of currently relatively cheap equipment, you cannot control the content that is uploaded to websites either.
We shall move on from this approach to look at other options.
The automated approach:
Are there bits of kit out there capable of assessing websites and drawing up some sort of classification structure? Yes and no. Yes there are and they can, no, they can’t do it very well. Here’s how they work.
These solutions can be either hardware or software and I won’t bore you with the different makes and models, but Cisco do a nice line in the hardware department and apparently they’re very popular with the Chinese Government. There are effectively two methodologies here: the interception approach and the blacklist approach.
Let’s deal with interception first. This is where the hardware approach generally applies and is a favourite of the Chinese Government. End user requests a website by typing in the URL or clicking on a hyperlink to it from another site or search engine. Before that page is served up on their computer at home it passes through the ISP, where a bit of hardware scans the page looking for whatever it’s been programmed to look for. If the page passes, it’s served; if the hardware detects something it doesn’t like, it isn’t.
There’s a big problem with this approach. Much as we all love technology and computers, they’re stupid. Let’s take your own example of a beheading. We’ll deal with images later but at a cruder level let’s deal with text. Hardware scans site. Site mentions the word beheading, site or individual page is blocked. Congratulations, you’ve just blocked content from the BBC, CNN and pretty much any other news agency that might report on a hostage being beheaded by a terrorist group in Iraq or wherever.
There is of course a way around this. You combine such hardware with a whitelist. So it’s OK to allow content to pre-approved sites even if they contain certain keywords because they’re ‘reputable’. There’s only one slight hitch with this approach. You may have allowed access to the BBC or CNN but you’ve now blocked this article which contains that keyword even though this article is concerned with a legitimate debate which is inherent in a democratic country and as such, even in this article, constitutes a small example amidst a much wider issue.
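As a rough illustration of the last three paragraphs, here’s a toy keyword filter with an optional whitelist, sketched in Python. Everything in it is an assumption for the sake of the example: the keyword set, the whitelisted hostnames and the `is_blocked` function are invented, and real interception kit is far more elaborate than a substring match.

```python
from urllib.parse import urlparse

# Hypothetical configuration, purely for illustration.
BLOCKED_KEYWORDS = {"beheading"}
WHITELISTED_HOSTS = {"www.bbc.co.uk", "www.cnn.com"}


def is_blocked(url, page_text, use_whitelist=True):
    """Return True if the toy filter would refuse to serve this page."""
    host = urlparse(url).hostname or ""
    if use_whitelist and host in WHITELISTED_HOSTS:
        return False  # 'reputable' sites are waved through unread
    text = page_text.lower()
    return any(keyword in text for keyword in BLOCKED_KEYWORDS)


if __name__ == "__main__":
    news = ("http://www.bbc.co.uk/news/story", "Hostage beheading reported in Iraq")
    blog = ("http://www.politicalpenguin.org.uk/post", "Why blocking the word beheading does not work")
    print("news, no whitelist:", is_blocked(*news, use_whitelist=False))
    print("news, whitelist:   ", is_blocked(*news))
    print("blog, whitelist:   ", is_blocked(*blog))
```

Without the whitelist the news report is blocked; with it the news report gets through but the blog post discussing the very same word still doesn’t, because the filter matches strings and has no notion of context.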
Let’s get down and diggy with images now. Computers are good at words, even if they’re rubbish at understanding context. Images, however, are technically a lot more difficult as things currently stand. Computers don’t understand objects and images. At best, utilising the latest in current Artificial Intelligence we can just about teach them to recognise an apple. Not much use when trying to differentiate someone’s holiday snaps from Majorca they’ve uploaded to Flickr from a three-way gangbang session on a porn site.
That said, there have been experimentations in this area which are at best crude. Let’s put aside images of torture for a moment because you’re completely lost in that area and think images of naked people doing naughties.
There are some systems that have been used in this area. They work on a fairly simple procedure of scanning images and working out the proportion of skin coloured pixels. If it appears too high, the system assumes it’s a naughty photo of naked people and blocks it. Apart from the slight problem that people aren’t all white (it’s even more rubbish at spotting naked black people), the proportion of skin coloured pixels can’t tell said naked people doing naughties apart from Sandra in her bikini on holiday in Majorca, or from quite legitimate sites that may for example sell swimwear or lingerie, or medical sites that deal with anatomy.
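Here’s a minimal sketch in Python of that sort of skin-proportion check, to show how little it actually knows. The RGB rule in `looks_like_skin` is one crude rule of thumb with illustrative thresholds and the 40% cut-off is invented; I’m not claiming this is what any particular commercial product does.

```python
def looks_like_skin(r, g, b):
    """Crude RGB rule of thumb for 'skin coloured'; thresholds are illustrative."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_ratio(pixels):
    """Fraction of (r, g, b) pixels that the rule counts as skin."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if looks_like_skin(*p)) / len(pixels)


def is_flagged(pixels, threshold=0.4):
    """Flag the image if too much of it is 'skin' (hypothetical threshold)."""
    return skin_ratio(pixels) > threshold


if __name__ == "__main__":
    # Stand-ins for real images: flat lists of pixels.
    bikini_on_beach = [(220, 170, 140)] * 60 + [(30, 100, 200)] * 40
    darker_skin_tone = [(70, 45, 35)] * 90 + [(20, 20, 20)] * 10
    print("holiday snap flagged:", is_flagged(bikini_on_beach))
    print("darker skin flagged: ", is_flagged(darker_skin_tone))
```

With these made-up pixels the holiday snap gets flagged and the image made up mostly of a darker skin tone sails straight through, which is both failure modes in one go.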
Staying with the images problem (and we’re not even going to visit video content because if even the best systems currently available can’t do static images, they haven’t got a hope in hell of understanding moving ones), there is the issue of using a textual approach. Here’s where we cross over into software analysis and consider bots. These are nice little creatures that inhabit the Internet, although they mainly stick to the open elements of the World Wide Web, and are used by search engines such as Google to index sites. Some, like Googlebot, are quite sophisticated, but again they’re really not that clever and are relatively easy to manipulate if you know what you’re doing. This is the problem. Putting aside textual interpretations because the arguments against that are the same as with a hardware interception approach, if you ever try to use Google’s Image search you’ll find that on the whole it’s OK, but it’s not really that good.
The reason for this is that it relies not on actually looking at images but on details like their filename, alternative text etc to determine what they are. Great if you call your image “man-being-beheaded.jpg” and set its alternative text to “man being beheaded by terrorist group in iraq”, but the problem is that if you’re filtering content and the people who produce the images know this, they’re not going to tag and label the file that way. They’ll call it “fluffy-rabbits.jpg” and give it an alt description of “us playing with fluffy rabbits in corfu” or some such.
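A toy version of that metadata-only approach, sketched in Python, shows how trivially it is dodged. The `FLAGGED_TERMS` set and both functions are hypothetical; the only thing taken from the text above is the idea of judging an image by its filename and alt text rather than its pixels.

```python
import re

# Hypothetical list of terms a filter might look for.
FLAGGED_TERMS = {"beheaded", "beheading"}


def metadata_terms(filename, alt_text):
    """Collect the lowercase words found in an image's filename and alt text."""
    return set(re.findall(r"[a-z]+", f"{filename} {alt_text}".lower()))


def is_flagged(filename, alt_text):
    """Flag an image purely on its metadata, never on what it depicts."""
    return bool(metadata_terms(filename, alt_text) & FLAGGED_TERMS)


if __name__ == "__main__":
    honest = ("man-being-beheaded.jpg", "man being beheaded by terrorist group in iraq")
    relabelled = ("fluffy-rabbits.jpg", "us playing with fluffy rabbits in corfu")
    print("honestly labelled:", is_flagged(*honest))
    print("relabelled:       ", is_flagged(*relabelled))
```

The same picture passes or fails depending entirely on what its uploader chose to call it.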
The problem you have with this approach is that by trying to protect young people from harmful images you end up doing the opposite. Your daughter or son would only have come across images of beheadings by specifically looking for them, which odds on they’re unlikely to do anyway. Now, when your daughter decides to Google Image for pictures of fluffy bunny rabbits, she might just get a slightly different image than she anticipated and you’ve proverbially shot yourself in the foot.
Have any of these methodologies of filtering the World Wide Web been tried and how did they work out?
Well, there’s China of course, that well known promoter of freedoms and democracy and strangely enough a rather recent example from an English speaking democracy. Australia to be precise.
You may wish to have a word with Michael Malone, the boss of iiNet who are Australia’s largest ISP. He’s getting his company to participate in the tests on net filtering for the sole purpose of trying to show how completely impossible the scheme is.
As far as can be told, the Australian methodology relies on the use of a simple blacklist: a set number of sites that are deemed to contain inappropriate material and are therefore blocked. To a large extent the UK already has an industry led scheme that uses blacklists and is implemented by the Internet Watch Foundation. See the aforementioned article on child pornography for possible problems with this approach. To be fair to the IWF they generally do a good job, but the methods by which ISPs block content are laughably easy to circumvent, which has been the same criticism of the Australian system.
I’ll digress into a little personal experience here. A few years back I had the responsibility for running a suite of computers used by young people between the ages of 8 and 18. Given that I had no budget to procure filtering systems and management didn’t understand the technical challenges, I adopted the simple approach of having completely unfiltered access to the web available. I simply made sure that I kept an eye on what was being viewed in a supervisory role and inspected the log files while knowing who was on which computer at which time. In the whole period I only had one instance of a child viewing inappropriate material, a beheading ironically enough. They knew the rules and they were banned for good.
I mention this because the kids in question also had access to the World Wide Web at school and, as you do, you get talking to them. Now despite the age range, the bulk of the kids were circa 12 years old, and as time passed they built up respect for me because unlike their school, I didn’t treat them like children and try to limit their usage by filtering the content. They repaid that trust by not doing or viewing anything wrong, but what struck me most was a conversation with a few of them one day.
The gist of which was that they reckoned I was cool because I trusted them and they didn’t have to use web proxies like they did at school to look at various sites which although not offensive, were deemed non-educational and off-limits by their school.
It wasn’t the fact that they’d paid me compliments but that 12 year olds knew about and how to use web-proxies to circumvent the restrictions imposed and Andy, if 12 year olds knew this stuff years ago, the following generations are just going to get smarter because much as adults may wish to try, children’s natural curiosity will win through in the end because they’ll find a way.
Think of it a bit like sex education in schools. I don’t know how things are now but way back when I went through the education system there wasn’t any, apart from one 2 hour session I recall at about the age of 14. I was bored stupid because I’d already been down the library looking all these naughties up in the Encyclopaedia Britannica and I’ll guess a fair number of my peers did likewise, or possibly not given the teenage pregnancy rate at my old school.
The point being that the World Wide Web isn’t something particularly new in terms of the content that it allows children to obtain, it’s just easier. You and I are of a generation whose childhoods didn’t have such an opportunity, so we read books and looked things up in dictionaries and encyclopaedias, but the point is that we wanted to look up that particular topic in the first place, so there was a conscious decision to look for it.
In the case of the World Wide Web it is similarly so. Just because various material may be available, doesn’t necessarily follow that people of a young age will go looking for it. If they are looking for it then no amount of filtering or blocking will stop them because unless their parents are as technically savvy as myself the kids are likely to know more than their parents, much the same as my parents never managed to understand how to programme a VCR but to me it was simple.
So where are we back to? The technical angle. Let’s do blacklists versus whitelists now.
There is a very simple method by which you could enforce such a system and that would be to institute a whitelist. So instead of blocking content deemed unsuitable, you simply block everything until say the author or owner of a site applies for a license to allow their site to be viewed, it can be assessed, classified and then unblocked.
There’s the whole proxy argument that could be used to circumvent this approach, assuming you’re not blocking proxies, many of which run through academic institutions around the world to promote freedom of speech and access for people who live in dictatorial countries where their governments heavily censor the internet. That’s China, North Korea, Iran and maybe, who knows, Australia one day.
If you are going to block said institutions then the UK would be joining a rather interesting group of countries that usually we’re not too keen to be associated with, at least in terms of issues surrounding freedoms anyway.
So let’s assume you can do this. You’ve shifted all responsibility for registration and presumably cost over to the website publishers. This is indeed the current model with the BBFC. They don’t receive public funding but derive their revenue from charging publishers of whatever media to review their content and classify it.
What’s the problem with this approach?
Apart from effectively being a tax on anyone who dares to publish content of any kind on the World Wide Web, we return to the impracticality of it on the basis that we’re dealing with a dynamic medium. Does every post on every web forum constitute new content and if so, is it the responsibility of the owner of the site or the commentator to seek classification on every instance, and at what cost?
I don’t like to use ‘it’s the end of the world as we know it’ analogies but put simply, this would kill the World Wide Web as it currently stands overnight, along with all the investment, innovation and discourse that goes with it. Not to mention that the UK’s a bit of a melting pot for these new innovations on the old Web and we earn a nice bit of money in exports from it, which is rather handy considering we don’t earn much from manufacturing and for some reason financial services aren’t in demand these days like they used to be.
So whitelists, apart from again being technically impossible would kill the World Wide Web along with a large number of jobs and possibly leave us with some sort of on-demand one way content from major corporations like the film and music industry and that would be pretty much our lot.
On to blacklists now and they’re the most popular in terms of implementation and feature in a number of countries including the UK.
As mentioned before, our blacklist system is run by the industry itself through the IWF, who work on a complaint basis. If someone complains about content, they investigate it, consider the content according to the current legislation and make a decision as to whether to blacklist the page. Note that they blacklist individual pages, as was the case with the recent Wikipedia article on the Virgin Killer album by the Scorpions. ISPs who are signed up to this voluntary system then redirect requests for these pages to proxy servers that display a simple 404 error message.
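The difference between the two list approaches boils down to what happens to anything nobody has looked at yet. Here’s a minimal sketch in Python; the function names and the example.org style addresses are placeholders invented for illustration, not how the IWF or any ISP actually implements its list.

```python
from urllib.parse import urlsplit


def page_key(url):
    """Identify a single page by hostname and path, ignoring scheme and query."""
    parts = urlsplit(url)
    return (parts.hostname or "", parts.path or "/")


def allowed_by_blacklist(url, blacklisted_pages):
    """Default allow: only pages someone has complained about are blocked."""
    return page_key(url) not in blacklisted_pages


def allowed_by_whitelist(url, licensed_hosts):
    """Default deny: nothing is served until its site has been licensed."""
    return (urlsplit(url).hostname or "") in licensed_hosts


if __name__ == "__main__":
    blacklist = {("example.org", "/bad-page")}
    whitelist = {"example.org"}
    for url in (
        "http://example.org/bad-page",
        "http://example.org/fine-page",
        "http://brand-new-blog.example/",
    ):
        print(url, allowed_by_blacklist(url, blacklist), allowed_by_whitelist(url, whitelist))
```

Under the blacklist the brand new blog is visible until someone objects; under the whitelist it is invisible until someone approves it, and so is every other new site on the planet.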
Without going into the ins and outs of the system we have in the UK, nor the patently easy methods with which it can be circumvented by anyone with a modest amount of technical ability, which even 12 year olds I’ve come across in my time have, there is the issue of who decides. In our case in the UK it is effectively an unaccountable, unelected industry body and there are many problems with this approach, but to be fair to the IWF, with the exception of the Wikipedia case they have been doing a good job up until now of preventing inadvertent viewing by the general public of content that is deemed illegal according to UK law.
Of our fellow European Union countries, the only one I will confess to knowing details about blacklisting is Finland who similarly have a system although in their case the decision falls to an individual law enforcement officer. He’s not a popular chappy and there have been severe criticisms over the decisions taken to blacklist certain sites. Most notably lapsiporno.info which was set up by a hacker called Matti Nikki. To be fair, you really shouldn’t call your site childporno (lapsi is Finnish for child btw) but irrespective of the registered domain name, it is banned, seemingly not because of any content that contravenes the law, but because the site was established to criticise the nature of censorship in Finland.
I don’t use phraseology such as slippery slope but I’m sure you can understand the problems of where such censorship may lead to. Openness and freedom of speech are fundamental to a functioning democracy as indeed is dissent. If websites are blocked, not because they contain actual content that breaches the law but because they criticise governments then we lose a part of our democratic heritage that so many have died to achieve.
So where are we at now?
Well, content filtering isn’t going to work. Blacklists and whitelists don’t work. Registration of every single website would destroy the Web as we know it and along with it a lot of export money and jobs. Forget that any attempt to implement such a system will bring you personal ridicule. Forget that the time-scales of trying to classify what is currently available run into millennia. Let’s pretend we can and see how much this might all cost.
We’ll return to our man hours on the basis of 1 minute per page because if we’re going to do this right, it has to be per page, not domain. Let’s get our imaginary classifying person on a normal working day of 8 hours and be a bad employer and not give him a lunch hour, holiday leave or the weekend off. I’m not sure what the going rate for such a job should be but let’s say something to the tune of a bog-standard local government administrative post and say £15,000 per annum.
So a trillion pages at a minute each, around the clock, gives us 1,902,587 years of work. Let’s factor in the 8 hour working day, which takes us up to roughly 5,707,762 working years. Let’s times that by our annual salary cost and we get about £85,616,430,000. Let’s not forget, that’s a pretty quick scan of webpages, some are long and might take a bit more than a minute to read and interpret, not to mention there would probably be more time taken to do the paperwork.
So if that’s our approach, a rough figure of at least £85 billion. I’m not sure this is the best time for looking for additional funding for a Government department but I’d rather you than me run that one past Alastair at the next spending review stage.
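If you’d like to play with the assumptions yourself, here’s a minimal sketch of the costing in Python. The `classification_cost` function is hypothetical and every default in it is one of the guesses made above: a trillion pages, a minute a page, an 8 hour day with no days off and a £15,000 salary.

```python
def classification_cost(
    pages,
    minutes_per_page=1,
    hours_per_day=8,
    days_per_year=365,       # assumption: no weekends or holidays off
    salary_per_year=15_000,  # assumption: pounds per classifier per year
):
    """Return (working years, total salary bill in pounds) for classifying pages."""
    working_minutes_per_year = 60 * hours_per_day * days_per_year
    working_years = pages * minutes_per_page / working_minutes_per_year
    return working_years, working_years * salary_per_year


if __name__ == "__main__":
    years, cost = classification_cost(10**12)
    print(f"{years:,.0f} working years, £{cost:,.0f}")
```

Give the poor classifier weekends off (`days_per_year=260`, say) or two minutes a page and the bill only goes one way.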
Let’s look at an automated system. Well, actually we can’t because there’s not a single piece of information available on the projected implementations and costs of such a system, nor do the various manufacturers seem particularly open about how much their little boxes cost. However the unofficial word on the grapevine is that the going rate for individual filter boxes can be around $25,000 and you’re going to need lots and lots of these.
Is there any good quality review material out there I can point you in the direction of?
There’s a bit of a problem here because there’s not really much freely available information to go on, because those countries that tend to go in for all this net-filtering also tend not to be the most open and transparent in terms of their governance. Nor have any generally considered liberal democracies decided that this is a particularly good idea, apart from, of course, Australia.
In the case of Australia there have been a couple of reports done over time and in particular two this year. I’ll point you in the direction of the first one, which was published in June and commissioned by your opposite number Stephen Conroy, Minister for Broadband, Communications and the Digital Economy.
It makes for interesting reading and on the surface appears to suggest a strong argument in favour of automated content filtering. It notes that the technology has come a long way since a previous report was commissioned in 2005 and tests a number of products aimed at ISP level, in this case tier 3 level filtering.
They’re a mixed bag in terms of differential effect to network traffic flow. Ranging from (and we’re talking in active not passive mode here) nearly no degradation for one product up to over 75% degradation for others.
Forgive me at this juncture because the report doesn’t list exact figures in tables but uses un-gridded bar charts so if I’m a few percent out in my reading then I apologise but the results are interesting for the different categories of content.
For those not inclined to read the whole report, category 1 content is the really naughty stuff that’s already on a blacklist like child porn, category 2 is content that might not be deemed suitable for kids, so violence and general pornographic material and category 3 is innocuous content to test primarily for false positives.
On an overall view it’s interesting to note that in general, the products tested that caused the most amount of degradation to network traffic were the more accurate so there seems to be a correlation between accuracy and disruption to service which is not unexpected.
Looking at them, the product ‘Delta’ that caused the least amount of degradation only managed to identify and block around 88% of category 1 content. However product ‘Alpha’ which caused a 75% degradation in traffic catches what looks like 98% of category 1 content.
So yes, if you’re happy to have your broadband connection slashed down to 25% of what you get now then perhaps it’s possible to block 98% of all the really nasty stuff that’s out there.
Looking at category 2 data and this really is the area we’re interested in because the issue isn’t outright banning of illegal content because we effectively have that already but being able to restrict data that may be deemed unsuitable to people based on their age group.
Here there is a wide range of variation. Best performing were products ‘Omega’ and ‘Beta’ at 98%, however ‘Beta’ is once again a heavy degrader of service whereas ‘Omega’ runs at a 20% drop in speed.
On to category 3 content which is innocuous so shouldn’t really be blocked. Here we have anything from a blocking rate of 2% to 10%, the most inaccurate being product ‘Delta’, the one that doesn’t degrade speed as much, while ‘Gamma’, although the most accurate in this category, seems to have the highest detrimental effect on speed at about 85% degradation.
So what can we conclude from this study?
There’s a payoff between accuracy and speed. If we want our internet connections to run circa 15-20% of their current speeds we could with the latest equipment get accuracy of blocking well into the 90% bracket and false positives as low as 2%.
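For anyone who prefers their trade-offs written down, here’s a rough sketch in Python of the figures quoted above. The numbers are the approximate readings given in this post, not exact values from the report. The dictionary layout and the function name remaining_speed are purely illustrative, and treating Delta’s ‘nearly no degradation’ as zero is an assumption.

```python
# Approximate figures quoted in this post (read off bar charts, so not exact).
# Only the values actually mentioned in the text are included.
PRODUCTS = {
    "Alpha": {"degradation": 0.75, "category_1_blocked": 0.98},
    # "Nearly no degradation" is treated as 0.0 here, which is an assumption.
    "Delta": {"degradation": 0.00, "category_1_blocked": 0.88},
    "Omega": {"degradation": 0.20, "category_2_blocked": 0.98},
    "Gamma": {"degradation": 0.85, "category_3_blocked": 0.02},
}


def remaining_speed(degradation):
    """Fraction of the original network speed left after filtering."""
    return 1.0 - degradation


for name, figures in PRODUCTS.items():
    left = remaining_speed(figures["degradation"])
    print(f"{name}: about {left:.0%} of the original speed remains")
```

Nothing clever going on here, it simply makes the point that the products with the best blocking figures leave you with the smallest slice of your connection.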
However, this is an entirely controlled environment with a selection of URL’s numbering just above 3,000. How such systems would integrate into real ISP’s with vast arrays of different equipment and how they would cope with potentially a trillion web pages is anyone’s guess because it’s not been tried.
Here’s where we come right up to date because even a few days ago this issue has been hitting the headlines in Australia because there’s another report part produced by Professor Bjorn Landfeldt of the University of Sydney.
It was published in February 2008 but it’s not been made public by the Australian Government. The reason may be that it is highly critical of the practicalities of implementing such web filtering schemes and as we can’t get a hold of the actual report, here’s an article from the Brisbane Times for a bit of background information.
Getting away from the technical aspects now and on to the more political, economic and societal implications of such a scheme.
We’re into simple cost benefit analysis here so let’s start with a focus on economics. Are there any economic benefits from such a scheme?
For this section we’re going to assume some sort of automated approach to net filtering at ISP level akin to the approach being taken by the Australian Government and discard the idea of individually assessing every single bit of English language content on the web by hand because I think it’s fair to say that such a methodology is completely impractical.
Short of limited employment benefits as regards needing people to install and implement such a system, there aren’t many economic benefits from this scheme. There are however a few rather large economic costs to be considered.
At present there is a desire by the Government to encourage ISP’s to invest in next generation delivery as regards improved bandwidth, the laying of fibre to the box if not fibre to home to bring us up to the competitive level already enjoyed by countries such as Japan and South Korea. Fundamental to our future competitiveness as a country is the installation of these newer network grids and equally the costs run into billions.
As it stands, the position of the Government is that ISP’s should be responsible for implementing and absorbing the costs of such upgrades. This sits alongside the previous Government plans to make ISP’s responsible for enforcing content restrictions around various material, in particular potentially copyright infringing music shared via P2P connections, despite this being in contradiction to Article 12 of the European Directive on Electronic Commerce (Directive 2000/31/EC), by the way.
None of these schemes effectively mandated upon ISP’s are cheap to implement and ultimately the cost will be passed on to the consumer in the end. Relatively speaking in the UK we have a very competitive and cheap to enter marketplace for broadband internet connections perhaps represented in our relatively high broadband penetration rates compared to other countries. This is a good thing and in particular allows even those on lower incomes to participate in a public sphere and gain access to information to which previously there were discriminatory barriers.
Apart from the inherent cost implications of roll-out there are also the more significant effects that such a system will have on the speed of the UK’s broadband network as a whole. If we take the Australian study and forget for the moment any criticisms that it was prepared for a Minister whose policy it is to implement such a system, we can see that, irrespective of the fail rates, which are unacceptable across all the products tested, the ones that were the most accurate effectively quartered the speed across the network.
Returning to global competitiveness, even if we were to establish a nationwide fibre optics network that currently would put us in the region of speeds up to 100Mbits per second, if we were to factor in anything around a 75% degradation of service we would be held back to speeds of around 25Mbits per second which are already reasonably widely available through ADSL2+ services offered by a number of ISP’s.
If such degradations in services are to be expected then there are consequences for the consumer, and I will give you a personal example. I currently pay £7.50 a month for an 8Mbit connection (it may be something a little cheaper this month with the VAT cut) and fair play to my ISP because I always get my full speed that I pay for which is very nice compared to some of the poor services from others. If such a system were to be introduced and I wished to continue to receive speeds of 8Mbits per second then I would have to upgrade my package. At present my ISP offers a top speed of 24Mbits per second which means that even if I paid the full price for this, which would double my monthly bill, I might only be likely to receive around 6Mbits per second.
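The sums here are simple enough to check on the back of an envelope, but here’s a minimal sketch in Python all the same. It assumes the worst case from the Australian tests, a flat 75% degradation applied to whatever speed is advertised, which is a simplification, and the function names effective_speed and required_package are made up for illustration.

```python
def effective_speed(advertised_mbit, degradation):
    """Speed actually received once filtering takes its share."""
    return advertised_mbit * (1.0 - degradation)


def required_package(target_mbit, degradation):
    """Advertised speed needed to still receive target_mbit after filtering."""
    return target_mbit / (1.0 - degradation)


DEGRADATION = 0.75  # assumed worst case from the figures quoted in this post

print(effective_speed(100, DEGRADATION))  # nationwide fibre at 100Mbit
print(effective_speed(24, DEGRADATION))   # the top 24Mbit package
print(required_package(8, DEGRADATION))   # package needed to keep 8Mbit
```

On those assumptions the 100Mbit fibre line delivers 25Mbit, the 24Mbit package delivers 6Mbit, and keeping a real 8Mbit would need a 32Mbit package, which is more than the top speed on offer.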
That’s an example on a practical consumer perspective scale but equally if I were to be paying for an 8Mbit connection and my ISP still delivered this speed then I would be receiving a service that at current cost ratio should be more and if that cost were to be absorbed by my ISP then they are going to have to pass it on at some point in the future. Either way, the consumer will be the ultimate loser.
To be fair, there is an aspect that has to be considered here. ISP’s continue to upgrade their networks constantly, introducing new equipment that increase speeds and performance. It could be argued that over a set period of time if ISP’s are due to introduce new equipment that will upgrade their speeds and performance that degradation even of a high level could be off-set.
However in this scenario, let’s just imagine that over a 5 year period, equipment introduced by an ISP is going to double the speed of the network. So if a content filtering system is introduced which degrades the system by 50% in a 5 year period the end user will notice no difference to their current speed and presumably no increased cost because of increased efficiency.
If we were to go down that route in the UK then perhaps it is palatable to the end consumer. But if, and let’s pick a country at random, France introduces these new technologies without any kind of content filtering then within that same 5 year period they will have an internet infrastructure twice as fast as our own.
We already struggle in some respects against competitors in the far east as regards network speeds but to handicap an entire industry at a time when the UK is a pretty exciting place to be would be silly. If we don’t continue to be competitive in this area then those very same people who chose to be based in the UK may decide that other countries offer them better opportunities and because of the nature of the industry, these people are extremely mobile. In terms of their work they can be anywhere on the planet as long as it has a reasonably good internet connection.
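To put that comparison into numbers, here’s a small Python sketch of the hypothetical above. The doubling over 5 years and the 50% degradation are the imagined figures from this scenario, not forecasts, and the function name relative_speed is invented for illustration.

```python
def relative_speed(growth_factor, degradation):
    """Speed after upgrades and filtering, relative to today's speed (1.0)."""
    return growth_factor * (1.0 - degradation)


GROWTH_OVER_5_YEARS = 2.0  # hypothetical: new equipment doubles network speed

uk = relative_speed(GROWTH_OVER_5_YEARS, degradation=0.5)      # with filtering
france = relative_speed(GROWTH_OVER_5_YEARS, degradation=0.0)  # without filtering

print(f"UK: {uk:.1f}x today's speed")
print(f"France: {france:.1f}x today's speed")
print(f"France ends up {france / uk:.1f}x faster than the UK")
```

So the filtered network stands still while the unfiltered one doubles, which is the whole competitive problem in three lines of arithmetic.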
Just to prove this point. I’m penning this post, not from my little house in the West Midlands where I’m normally based but in Brandenburg in Germany, about 30km from the Polish border to be precise. I can look out the window and there’s not another single house as far as the eye can see. I am pretty much on the edge of nowhere (there’s the village in the opposite direction) but in terms of large swathes of the work that I do, I’m just as effective sitting here as I am at home. Please bear this in mind. These relatively young people in an industry that the UK’s one of the world leaders in are mobile and highly sensitive to restrictions. They’ll earn us lots of money over the years, we should try to keep them, not drive them away. That is not to mention the increased demands on bandwidth that will be required in the near future to support video on demand services like the BBC iPlayer, an area where we lead the world and of which we should be rightfully proud.
Political and societal implications:
I’m sure there will be plenty of response to your interview involving phraseology such as ‘Big Brother’ and ‘police state’. I’ll refrain because, as seems to be the vogue at the moment, such phrases are over used without due diligence to their real meaning.
However there are civil liberties implications to such a system and I’m by no means a no-holds-barred libertarian type but they are important. Questions on such a system arise along a number of themes, some of which have no precedent in human history.
We will start with the who chooses argument. I am going to assume that we would be talking an automated system as the human involved one would be infinitely too costly or time consuming. As such a BBFC style classification system would not be applicable. However it is important to note how the BBFC works and where ultimate responsibility lies.
As mentioned before, the BBFC is non-governmental but ultimately it does not have arbitrary powers over censorship in the UK. That legal responsibility lies with local authorities who are in one way or another answerable to the electorate via the ballot box.
As unlikely as it is that an issue over film censorship would feature in a local election campaign, there is democratic accountability within the current censorship system in the UK relating to films, videos and video games.
I can from memory only recollect one film in recent times where there was an issue of dispute between the BBFC and local authorities and that related to the 1996 film ‘Crash’ by David Cronenberg which depicted scenes of people deriving sexual stimulation from car accidents or something like that. I did only watch the film once and it wasn’t really my cup of tea. However at the time the Daily Mail newspaper ran a campaign calling for it to be banned. The result was that after a number of specialist reports the BBFC passed the film uncut for screenings. However Westminster City Council refused to allow it to be screened along with a small number of other local authorities around the country.
Whatever you may think about that individual film or the motivation of certain local authorities, or indeed the credence that moralistic crusades by the Daily Mail newspaper should be given, the fact remains that democratic institutions had the final say.
Here is where we have a problem with content filtering the World Wide Web. There is (assuming an ISP implemented automated approach) no human judgement involved. We would be handing over decision making to machines whose algorithms are complex enough that they are unlikely to be understood by anyone but the most accomplished mathematicians.
There is nothing necessarily wrong in delegating various processes to machines. It happens in many fields and industries from auto-pilots in aeroplanes to robotic arms in factories. However in those circumstances the programming behind the process only needs to deal with relatively simple tasks and interpret a small number of variables or information given to it from other sources.
In the case of judging suitability of content, in particular images and video just according to a set legal criteria, irrespective of what should be considered suitable for different age groups, the levels of subjectiveness and nuances are outside of the scope of capability of any machine currently or likely to be available any time soon.
As such, we should be very concerned about the automation of censorship. Even within the controlled environment tests conducted in Australia there were false positives and that on a small selected batch of URL’s.
What structure could be put in place to compensate the owner for loss of business when ‘www.mybikinishop.com’ gets barred because it contains images of women modelling underwear for sale? Would the responsibility fall on the ISP’s? Could the owner sue for loss of business or some form of associated defamation for being included on a blacklist along with child porn? This is important and again runs parallel to the desire of the music industry to mandate ISP’s to identify and ‘inform’ users about copyright infringement.
ISP’s should not be in a position where they as private enterprises enforce the law. We do not have someone from British Telecom listening in to every conversation made across a landline in the UK to step in and tell people that they shouldn’t be using the telephone to organise a bank robbery and that cross analogy of transmission service provider should be the same for internet traffic as much as it is for British Telecom and the Royal Mail.
We should just note that there is nothing wrong for the Royal Mail, British Telecom or ISP’s in cases where they suspect that illegal activity may be going on to inform the proper legal authorities but ultimately, action and retribution should lie within the legal judicial system because fundamental to liberal democracies is the concept of rule of law, not some form of quasi-corporate legal enforcement.
Does this mean there isn’t a place for industry? No, it doesn’t and as mentioned before, through the IWF we already have a system whereby ISP’s block access to pages that contain content that is deemed illegal in the UK. However if part of this desire to restrict content on the World Wide Web is to block this material then the system is already in place and on the whole works but equally like any system, it’s one that a 12 year old could circumvent with relative ease.
The state versus family argument:
As I mentioned, I’m not an all out libertarian type, I don’t necessarily accept that the parent is always the right person to make choices for their child because although I’d place myself in the reasonably well informed and educated parent bracket, there are plenty of ignorant parents out there whose actions work to the detriment of both the development and health of their children. I’m sure that statement might put me at odds with many people who defend the ultimate right of the parent but when you’ve witnessed parents shoving McDonald’s french fries down the throats of their obviously under 1 year old children at Wolverhampton bus station then you might just get my drift as it were.
So we do have good parents who will take an interest in what their children do online and we’ll have parents who won’t give a damn, we’re not going to escape that position. The question is then, to what extent the state should interfere within this particular context of the family?
I’m not against the state interfering in family life specifically, but it should only do so where there is compelling and documented evidence that by doing so, it would be of benefit to the children. I’m going to return to actual documented evidence a bit later. So to give an example, I wouldn’t have a problem with a scheme such as compulsory school meals (preferably free) whereby children have at least one healthy meal a day.
There is overwhelmingly compelling empirical evidence relating to diet and the effects it has on a child’s future health prospects. There is a current and growing documented problem with obesity in the UK and despite the fact that I only ever had one school meal (in my day the choice was chips, or chips, or chips hence why I came home for lunch) myself and as a parent would make sure my own children eat healthily I wouldn’t have a problem with such a mandatory scheme.
However in the case of some form of age rated World Wide Web content filtering, classification or censorship there are a great deal more problems and these also relate to how we govern. Is this a case of perceived threat versus actual reality? It is indicative, and probably a case of those in charge of the media being of a different generation that doesn’t quite understand, but here’s an example.
Every few weeks at least, there will be something on the news relating to the horrors of the internet. These can be anything from someone committing suicide, to bullying, or sexual abuse of a minor. The issue isn’t really important. What is important is that if for example the people in question were perhaps discussing suicide in a chatroom, or using Facebook to meet up to plan a bullying or found in possession of pictures of naked children downloaded from the web, there is an inference that the ‘internet’ is somehow a causal element.
The internet is a network for the transmission of information, simply and purely that. The media never reports that the person who committed suicide was reading a lot of dark and depressing novels at the time so therefore it’s the fault of ‘books’. Or that the gang of bullies co-ordinated their actions by texting each other via mobile phones so mobiles are to blame. (Note, when the proliferation of mobile phones first started to the point where large numbers of children had them, elements of the media did report in this fashion but as we’ve now got the ‘internet’ there’s a new scary bogeyman in town to blame so they leave the poor old mobiles alone these days). Or that the person who sexually abused a minor was on a secret mailing list (as in post) for hard copies of child pornographic imagery as I assume is how it used to be done before the internet came along, therefore it’s all the fault of the Royal Mail.
What we find ourselves in now is a period of admittedly rapid change and probably a paradigm shift in terms of our species. I find it all rather exciting and in many respects I wish I were circa 10 years younger to fully appreciate it but I fully relish seeing how my kids grow up surrounded by the ability to have almost unlimited information at their fingertips.
For my generation we had to trundle off to the library and were limited by what was in stock. We had no opportunity to contribute towards the discourse of our society and this for the vast majority of people continued throughout their lives. Now almost everyone has that ability and far from being afraid of it, we should embrace its possibilities because they are truly staggering.
That is of course not to close one’s eyes to the nasty side of what is out there. The question is, as a parent, what do you do to make sure this does not have a detrimental effect on your own children’s development? I have deliberately refrained from using the word ‘protect’ in this context.
Most parents want to ‘protect’ their children from harm and suffering (some don’t and, sadly, are even contributing factors in harming them). That’s why on the whole most parents get their kids inoculated against various diseases, and why we teach them how to safely traverse the streets, not walk in the road and look and listen before crossing. Mine is the ‘Green Cross Code’ generation, whereas my parents grew up in a time where motor vehicles weren’t streaming constantly up and down their road. For my generation the proliferation of ownership and use of cars brought new dangers that needed training to avoid, and for my own children yes, there are new dangers, although hopelessly over-estimated, that come from using different facets of the internet, but censorship is not the answer.
Information and education is and always will be the answer in these cases. For my generation it was the green and white dressed bloke with the lollipop and latterly, although on a different topic, images of tombstones falling with the word AIDS emblazoned on them.
Public information campaigns don’t always work and with the proliferation of media are more difficult to deliver than in the past. You can’t just mandate a slot just after Eastenders and know you have half the population of the country watching because they don’t anymore. You have to be clever about the delivery and ironically in this context embrace the new media and understand how it works to get the message across.
Another personal example if I may as one parent to another. My son is two. At somewhere just past one he got into DVD’s, he still is although he’s moved on to other things which constitute a far larger amount of his time these days.
Like most small children he likes animated films although he’s rather partial to Star Wars as well (likes Yoda). One of the first films he really liked was the Disney animation Lilo and Stitch, you’ll probably know it. However after a couple of weeks we noticed a distinct deterioration in his behaviour. Tantrums and the like which seemed out of kilter to how well behaved he was before. After a few days of me and missus scratching our heads I finally realised that if you observe the behaviour of Lilo in the film, her screaming and tantrums then I proposed that he may be copying that behaviour. The DVD mysteriously vanished after that and hasn’t been seen since and his behaviour returned to normal.
The important thing to note is that in the UK Lilo and Stitch has a ‘Universal’ rating meaning suitable for all. Yet and take this as anecdotal evidence if you will, that film induced copycat behavioural issues with my son that would be considered a problem. Perhaps I should start a campaign to have it reclassified to PG but this instance illustrates where both the problem and solution lies.
Perhaps we shouldn’t have introduced him to the film but it was rated as ‘Universal’ so there shouldn’t be a problem with it according to the official BBFC rating. However as parents who aren’t overbearing but keep a distinct eye on how our children develop we acted as the censors in this case by removing access to the media. We had a slightly smaller problem, again with another Universally rated Disney animation, Ratatouille which induced hair pulling. However in that instance he was a little older, understood language better and we could then explain that it was wrong and he understood.
Another distinctly non-new media example. Our son again has developed an obsession with jigsaw puzzles. He loves them, he’s really fast and worryingly for me, sometimes faster than I am. Most toys, jigsaws included, have age suggestions on them. He’s doing jigsaws that are apparently for 5+ these days. Same would be true for Lego. He loves it but the age recommendations say he really should be sticking to Duplo. However he’s costing me a fortune in the normal little Lego at the moment building up a collection which might soon surpass the amount I had by the time I was 10 and I won a load in a Lego building competition when I was four to help build up my collection.
The rationale behind these examples is that kids do develop differently and at different rates. If we as parents are to allow them the freedom to learn at their own pace, take interest in the things that they do and even allow them to take quantifiable risks because that is the way we learn, then we have to be the ones making those judgements.
An age rating system whereby a website may carry a certain age classification rating is only as good as the parent standing behind the child monitoring the online activity in the first place and if the adult is already there then they can decide what is suitable for their child.
Where any problems occur is when the child is completely unsupervised. However if we are talking some form of content filtering then the solutions are already in place. There is a plethora of software available for parents to download whereby they can restrict what their children see. The problem is that they don’t.
This is where you end up with people proposing arbitrary content filtering at ISP level but there lies a fundamental problem. You either restrict content across the board for everyone including adults, which is outright censorship and unless you’re China, North Korea or Iran probably isn’t where you want to be, or you end up with a similar scenario where the ISP’s offer some sort of staggered filtering system whereby the parent has to proactively enable, disable and basically do something to make it work.
If you’re not at the stage where the parent is aware of the importance of monitoring their children’s online activity, and if they lack the technical ability or the desire to do so pro-actively, then what is the likelihood they’re going to make use of such a system anyway?
If the desire is to protect children from perceived or real harm through online activity then you must first start with educating the parents as to its importance.
I’d like to point you in the direction of a post I penned back in February 2007 which highlights a campaign about online child safety I picked up from German TV (the advert in question is embedded into the post from YouTube, it’s good, I’d advise a peek). I’m not aware if we’ve had any comparable campaigns in the UK regarding this issue as we don’t generally watch British television at home but it was part of a pan-European EU backed campaign which I think represents a better approach, that of public awareness for parents.
This is a purely generational problem in that the vast bulk of parents who are in the 20-30′s age groups are unlikely to have grown up with the access to the internet at home or school. With the exception of parents in their lower 20′s and teens they will not have grown up with access to the internet and their knowledge and understanding if they have it will have primarily come from university, work or subsequent home usage as adults.
I’m going to wind up this post here as it’s now longer than my bachelor’s degree dissertation and I was going to try and highlight any correlation of actual risk from empirical studies as opposed to real risks that affect children like being hit by cars but while trying to find any figures or studies for actual harm caused to children by going online I got round to reading the Byron Review.
Here’s the link. I’d advise reading it, it’s very good and talks specifically about effectively what I have as regards parental awareness and the need to allow parents the choice and opportunity to use home based filtering systems. I think she talks a lot of sense and the recommendations of the report seem very practical. In particular the report accepts that the IWF does a good job in blocking illegal content which in general I would agree with and interestingly enough concludes that ISP level blocking should not be used for pretty much the same reasons I’ve outlined and that some sort of mandatory classification system for websites would be ineffective, not feasible and at best a partial solution.
On a personal note, here’s an open invitation. If you would like to have a chat about anything I’d be happy to pop down to London to discuss things with you. I’ll even shout you a pint at a local watering hole if you fancy.
With best wishes,
A belated happy Christmas and a happy New Year.