Today, we are announcing that beta testing is complete for PageScan, our proprietary approach to building a “Blacklist that Learns.” As any user of a web filter knows, these products maintain their blacklists in one of two ways:
- Maintain a static list of websites that is updated periodically.
- Parse HTTP responses as they arrive to detect bad content on unknown sites.
The first approach leaves you vulnerable to newer websites that may not be part of your blacklist. The second approach has severe performance limitations.
PageScan is able to accurately block new content and meet the strict performance constraints of modern K-12 networks. We accomplish this using the following approach:
- The first time we come across a site that is not in our database, we let it through.
- We then fetch it offline and scan the response content for keywords indicative of inappropriate content.
- Our first pass is deliberately over-inclusive, so it produces false positives. We narrow things down with a second pass: an API call to a third-party service that we have identified as “best-of-breed”.
- If the site is identified to be inappropriate for kids, we add it to our blacklist.
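The steps above can be sketched roughly as follows. This is a minimal illustration, not PageScan's actual code: the keyword list, the `LearningBlacklist` class, and the `fetch` and `second_pass` callables (stand-ins for the offline fetch and the third-party API call) are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set

# Hypothetical keyword list; the real first-pass vocabulary is not published.
KEYWORDS = ("casino", "poker", "proxy")


def first_pass(page_text: str, keywords: Iterable[str] = KEYWORDS) -> bool:
    """Cheap, deliberately over-inclusive keyword check."""
    lowered = page_text.lower()
    return any(word in lowered for word in keywords)


@dataclass
class LearningBlacklist:
    fetch: Callable[[str], str]         # assumed: returns page text for a host
    second_pass: Callable[[str], bool]  # assumed: wraps the third-party check
    blocked: Set[str] = field(default_factory=set)
    known: Set[str] = field(default_factory=set)
    pending: List[str] = field(default_factory=list)

    def allow(self, host: str) -> bool:
        """Request-time decision: only a set lookup, no page scanning."""
        if host in self.blocked:
            return False
        if host not in self.known:
            # First sighting: let it through and queue it for offline review.
            self.known.add(host)
            self.pending.append(host)
        return True

    def scan_pending(self) -> None:
        """Offline step: fetch, keyword scan, then confirm with second pass."""
        while self.pending:
            host = self.pending.pop()
            text = self.fetch(host)
            if first_pass(text) and self.second_pass(host):
                self.blocked.add(host)
```

Because `allow` never fetches or parses anything, the request path stays fast; the more expensive scanning happens later in `scan_pending`, and the second pass only runs for sites the keyword check has already flagged.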
With this approach, we can recognize an inappropriate site accessed from a school in the UK and use that intelligence to benefit a school in Texas. We are able to identify, with high accuracy, sites belonging to the following categories: porn, drugs, gambling, and proxies. Our beta tests have shown an extremely low false positive rate.
We have decided to release this at the end of September for a simple reason: we would like to get through the start of school while making only minimal changes to our core system. The way we handle our blacklist is fundamental to how our system works, and the risks of changing it early may outweigh the benefits. As always, should you have any questions, please reach out to firstname.lastname@example.org.