The Semantic Web is an ideal in which every piece of content comes with a meta description about what it is. But even 25 years after the invention of the WWW, the Semantic Web is still far away. The blockchain opens a new chance to implement the dream of Tim Berners-Lee, the inventor of the Web.
Do you remember how hard it was to get meaningful information in the time before the Internet? You had to order a book, usually not knowing if what arrived days later had the answer to your question. Or you visited a library – for millennials: that’s a public building with many books that you can borrow. The books you wanted usually weren’t available.
What a relief the invention of the Internet was. Or was it? Let’s say you want to find out how to launch a successful crowdfunding campaign (an ICO) for a new cryptocurrency. You ask the almighty oracle Google: “How to make an ICO”.
A lot of what you get is:
- Multiple, slightly differently worded copies of the initial opinion
- Advertising disguised as an opinion disguised as information
- Misleading opinions
- Answers to questions you didn’t ask
- Outdated or undated information in an ever-changing environment
What you don’t learn is how to make an ICO. You might find all the raw data but you are starting with a huge handicap:
You don’t know the right questions. So you won’t find the right answers.
It’s a long and steep learning curve to identify which questions to ask. You will also have a hard time qualifying the findings.
We still have the half-knowledge we had until the 1990s, but this time we have the illusion of knowledge.
We could all be smart.
“A new form of web content that is meaningful to computers will unleash a revolution of new possibilities”. Thus spoke Tim Berners-Lee in 2001. He should know since he invented the Web. But why would we want computers to find content meaningful?
“On the Internet, nobody knows you’re a dog”, says a famous cartoon from the stone age of web-based communication. When we enter the term “ACME running shoes” into Google, what do we get? Is it the ACME shop or Amazon, or an infomercial about the shoe? Is it a runner’s blog, or is it even an article on the bad conditions of workers in the ACME factory? Google can guess, but without some sort of metadata, it can’t know.
Metadata is data about data, like the information in your passport describing your name, features, and nationality. People can be classified and so can every piece of data in the world. Google should know what a particular piece of content is, and we humanoids should have a language to ask for what we want.
But it ain’t that easy. If you want to receive the critical journalistic article on the ACME shoe factory, you have to make a lucky guess at the words such an article might use. Google offers you an “I’m Feeling Lucky” button that takes you straight to the top result. Very often, though, our searches feel more random than lucky.
What is behind the term “Semantic Web” is the solution to the randomness: all content is classified. Users (as well as computers) have access to the taxonomy and know how to search. Looking for the critical article:
“Who: ACME running shoe company;
type of content: journalistic article;
topic: labour laws”.
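The query above can be sketched in a few lines of Python. This is only an illustration of the idea, not an existing search API: the `Document` class, its three metadata fields, the `semantic_search` function, and the example URLs are all made up for this post, and a real taxonomy would be far richer than three exact-match strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A piece of content plus hypothetical metadata describing what it is."""
    url: str
    who: str
    content_type: str
    topic: str


def semantic_search(docs, **criteria):
    """Return the documents whose metadata matches every given field exactly."""
    return [
        doc for doc in docs
        if all(getattr(doc, field) == value for field, value in criteria.items())
    ]


# Three invented pages that a keyword search for "ACME running shoes" might mix up.
CATALOG = [
    Document("https://shop.example/acme", "ACME running shoe company",
             "shop", "products"),
    Document("https://blog.example/run", "ACME running shoe company",
             "blog post", "product review"),
    Document("https://news.example/acme-factory", "ACME running shoe company",
             "journalistic article", "labour laws"),
]

hits = semantic_search(
    CATALOG,
    who="ACME running shoe company",
    content_type="journalistic article",
    topic="labour laws",
)
print([doc.url for doc in hits])
```

All three pages are about the same company, but only the news article matches the content type and topic, so it is the only result. Note that the sketch works only if the metadata is honest.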
Then you can feel lucky because you get what you were looking for. But we don’t search this way. So we can deduce that 25 years after the invention of the WWW and 17 years after Berners-Lee’s statement, the Semantic Web has not come around yet. It remains the holy grail of efficient data management, and King Arthur is still out looking.
Author Cory Doctorow has an explanation for why we still search in the dark. “Why would I create truthful metadata if it doesn’t benefit me?” Lying is cheaper than telling the truth. Everybody has the highest incentive to look good on the Internet. For this we pretend, exaggerate, lie, steal, embezzle. Or we work the machine.
As Cory Doctorow writes: “Observational metadata is far more reliable than the stuff that human beings create for the purposes of having their documents found. It cuts through the marketing bullshit, the self-delusion, and the vocabulary collisions.”
Have you ever wondered why the Internet is a cradle of mediocrity? The reason is search engine optimization: the technique of predicting the keywords people might use to find something and then writing a text that is more keyword-friendly than useful.
If a system requires everybody to play ball and be nice, that system will have a hard time.
The truth and nothing but the truth:
Semantic data forces content to be truthful about itself. If it lies, it can be penalized. Mind you, it doesn’t prevent the author from lying in the text. But with semantic data, at least you know what you are getting. Exciting times have come in the decades-long fight for a smart web: machine learning can help to auto-classify content. Combined with the immutability and incorruptibility of the blockchain, this can create a new standard for global data exchange.
We at Pacio are doing this in the field of management data. But more about that in a different post.