Inrupt and Solid

Tim Berners-Lee (the inventor of the World Wide Web) is distressed by several aspects of the modern web. For one thing, privacy has all but disappeared as a concept. For another, your data is not your own; it’s controlled by someone else. Additionally, the huge gatekeepers of the Internet control the web in a way never intended when the web was first invented.

His answer to this was to take a sabbatical from his job at MIT and form a new entity and technology to put your data and your privacy under your control. He appears to believe that when this comes to fruition, it will revolutionize the web. The entity is Inrupt, and the technology is Solid.

Here’s my preliminary take on this.

First, let me say that I am about the opposite of a hero worshiper. When the web was created, Tim Berners-Lee came up with some deeply innovative ideas which have led to the web as it is today. However, I’ve learned that just because a guy does one great thing doesn’t mean everything else he does is also great. Just look at musical one-hit wonders. A musician does an album with ten songs on it, but history only remembers the one song that played constantly on the radio. So, give Tim Berners-Lee his due, but that doesn’t mean anything else he comes up with will be great.

It is devilishly hard to determine the gritty details of what Berners-Lee is actually doing. The Inrupt website is long on marketese and short on details. I see a lot of, “We want to take the power of the Internet back from the tech giants and return it to the people.” What I don’t see is anything specific and revolutionary which will do this.

Apparently, in order to take advantage of this, you have to create a repository for your data (called a “pod”) at Inrupt. You sign up with them, and you get a WebID which appears to be just a URL to your site. This is a preliminary arrangement. It looks like in the future, other entities will offer space for repositories. But for now, you do it with them. Oddly, they don’t say anything about this new service being distributed, which it really ought to be in order to proof it against abuse.

As an aside, Berners-Lee seems preoccupied with your privacy, but I don’t see anything anywhere which indicates that your data is encrypted. Seems like if your goal was privacy, that would be one of the things you’d make plain. Maybe I missed the memo.

Berners-Lee seems to eschew the use of databases to house your data. Instead, your data is stored in documents at your site. Having worked with documents and databases for decades, I can tell you that storing data in documents is slow and needlessly complex. I don’t know if he intends data to be stored free-form or as XML or what. Apparently, there’s some sort of index that tells you where each piece of your data is. But although this type of scheme sounds great in theory, it doesn’t live up to the promise.

When HTML was first created, no one much thought about being able to index the data. HTML was a language meant to determine how data would be presented, not to index or categorize it. Years later, XHTML was created, and the idea was to make the data on the web indexable and categorizable, to make searching and using it easier.

That didn’t work out so well. Instead, web data is more or less just free form text. To compensate, we improved our search engines, which can now take natural language queries.

If you’ve ever looked into XML documents, they’re like gobbledygook. The promise of XML was that you could build documents and anyone else could read them. Each field or bit of data was preceded with some identifier which told you what it was, and everyone would understand the identifiers because they were plain English. Didn’t happen. XML documents are not interchangeable because no one uses the same identifiers for their data. There is no standard for how to identify data in XML. (Contrast EDI, which is more or less standardized. Problem is, a copy of the standard for any document type costs a lot of money.)
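To make the identifier problem concrete, here’s a minimal sketch in Python. The two “vendors” and every tag name in it are made up for illustration; the point is only that a reader written against one set of identifiers comes up empty on a document that uses another.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vendors describing the same payment with different tag names.
VENDOR_A = "<payment><amount>125.50</amount><currency>USD</currency></payment>"
VENDOR_B = "<txn><total>125.50</total><curr>USD</curr></txn>"


def read_amount(xml_text, tag="amount"):
    """Return the text of the first child element named `tag`, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(tag)
    return None if node is None else node.text


print(read_amount(VENDOR_A))               # the tag name the reader expects
print(read_amount(VENDOR_B))               # same data, different tag name
print(read_amount(VENDOR_B, tag="total"))  # works only once you know vendor B's name
```

Both documents are perfectly valid XML. The reader still needs a per-vendor mapping before it can find the amount, and that is the interchange problem in a nutshell.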

I worked with an application a few years ago which stored financial transaction data in XML. Slow as hell. You have to parse each individual chunk of data and examine each identifier to find out if it’s the particular data you want. Parsing text is a slow process, no matter how many CPU cores you point at it. By contrast, database access is considerably faster. You want your data fast? Database.
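Here’s a rough sketch of the difference, using Python’s standard library. The transaction layout, tag names and table schema are all invented for the example, and I’m not timing anything here. It just shows the shape of the work: the XML lookup has to parse the document and walk the elements, while the database lookup is a single keyed query.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical transaction data: (id, account, amount). All names are made up.
ROWS = [(i, f"ACCT-{i % 100:03d}", round(i * 1.25, 2)) for i in range(1, 1001)]


def build_xml(rows):
    """Serialize the rows into one XML document."""
    root = ET.Element("transactions")
    for txn_id, account, amount in rows:
        txn = ET.SubElement(root, "transaction", id=str(txn_id))
        ET.SubElement(txn, "account").text = account
        ET.SubElement(txn, "amount").text = str(amount)
    return ET.tostring(root, encoding="unicode")


def find_in_xml(xml_text, txn_id):
    """Parse the whole document, then scan elements until the id matches."""
    root = ET.fromstring(xml_text)
    for txn in root.iter("transaction"):
        if txn.get("id") == str(txn_id):
            return float(txn.find("amount").text)
    return None


def build_db(rows):
    """Load the same rows into an in-memory SQLite table keyed on id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (id INTEGER PRIMARY KEY, account TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    return conn


def find_in_db(conn, txn_id):
    """Look the row up by primary key."""
    row = conn.execute(
        "SELECT amount FROM transactions WHERE id = ?", (txn_id,)
    ).fetchone()
    return None if row is None else row[0]


XML_TEXT = build_xml(ROWS)
CONN = build_db(ROWS)
print(find_in_xml(XML_TEXT, 742), find_in_db(CONN, 742))
```

Both lookups return the same amount. How much slower the XML path is will depend on document size and the parser, so treat the speed claim as my experience rather than a benchmark.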

So your “pod” at inrupt.com will house documents and an index. Again, I’m not sure what makes this “pod” system better than Azure or AWS. I guess there must be some sugar in there to make it double nifty.

How do you access your data? What brilliant new technology does Berners-Lee have in mind to allow you to access and manipulate your data? It’s called Solid. There’s an SDK for it, but it’s still beta. From a quick glance, it appears to be mostly JavaScript libraries.

If you’re gonna revolutionize the Internet, don’t put all your eggs in the JavaScript basket. My attitude on JavaScript has softened a bit, but if this is how you’re going to revolutionize things, you really need to provide a multi-language approach. Maybe this is in the works somewhere, but they don’t say anything about it.

Now let’s talk about revolutionizing the Internet, in particular snatching control of the Internet from the tech giants.

Endless billions of dollars are tied up in the business of the Internet. Microsoft, Amazon, Facebook, Google, YouTube, Twitter and the rest. These folks keep, mine, sell and monitor your data. Moreover, these folks provide you with services you actually want in a way that’s easy to use and consume. You can talk to your friends and see what they’re up to on Facebook. You can catch real-time comments and news on Twitter. You can watch videos from friends and people you like and admire on YouTube. You can buy almost anything you want on Amazon. You can search for almost any piece of human knowledge on Google. And you want to wrest the Internet away from these people, so you can have privacy and control of your own data?

I have to say Tim Berners-Lee is a little late to the party. The time to implement something like this was in 2000, not twenty years on, when the Internet is owned and controlled by Google, Facebook and the rest. Don’t get me wrong. Like a lot of people, I’m not satisfied with the stewardship of the tech giants over our data and the Internet. I appreciate what they offer us; I just don’t like the amount of control they have and the way they choose to use their power.

I’m all for an Internet revolution. I’m just not sure Berners-Lee’s vision is the way to make it happen. A web hosting service and a JavaScript API just don’t seem beefy enough to defeat the giants of Silicon Valley.

Of course, I’m the guy who said the World Wide Web would never catch on, so what do I know?