Will Cogito Make The Semantic Web A Reality?

There have been outbreaks of murmuring for quite a while now about “The Semantic Web”. If you don’t know what the Semantic Web is, think of it as the same Internet that you already know and love, but with a semantic layer placed over it so that you can know it a whole lot better and love it to death.

What would the semantic layer do, that would make such a difference?

It would be able to understand the meaning of all the billions of documents and web pages out there, and consequently it could find the information and web sites you were seeking with far greater accuracy than Google-on-a-good-day-traveling-downhill-with-the-wind-behind-it.

I’m about to claim that a product called Cogito from an Italian company called Expert System can go a long way towards doing that, but before making the claim let me first explain, in linguistic terms, the problem Cogito solves:

Most human languages are what Noam Chomsky called “Type 0” languages. Generally, this means that in an English sentence there are no hard and fast rules for “targeting” (i.e. modifying). Any word or collection of words may in various situations modify the meaning of another word in the sentence. Consider the sentence:

He hit the man with the broken stick.

He may have used the broken stick to hit the man, or he may have hit the man who happened to have a broken stick. It can mean either. It’s easy to come up with other ambiguities. A famous one is:

Fruit flies like a banana.

It may refer to “fruit flies” liking a banana, or possibly to being ordered to like a banana (imperative form). It could be describing fruit flies as being like a banana, or it may be making a claim about the aerodynamic properties of fruit. When you combine individual words that have multiple meanings with variety in the way targeting occurs, and you happen to be a computer, you find it hard to pin meaning down.

Finally, let me introduce my favorite sentence ever, which was created by Noam Chomsky:

Colorless green ideas sleep furiously.

There’s no ambiguity here, but there is a mass of “illegal targeting”. Ideas, being conceptual rather than physical, cannot have color, so both adjectives here, “colorless” and “green”, are invalid when used with this noun. The two adjectives are also contradictory and thus cannot target the same noun, even when the noun is a valid target. The noun “ideas” cannot target the verb “sleep”. Ideas don’t sleep, and even if they did, sleep isn’t something that can happen “furiously”. So in this sentence we have a very highly condensed lack of meaning.

What’s a computer to make of it?

But, wait a minute. Let me force some meaning into this sentence, without changing a word, just by applying metaphor.

Let’s say that I’m assembling a collection of ideas and suggestions on how to improve the environment (green ideas). I note that some of the ideas have attracted a great deal of attention in the media and sparked enthusiasm (huge windmills to draw power from hurricanes), but the less colorful ones have not (collect egg shells to build miniature methane farms). There’s no doubt that, given the lack of enthusiasm for the colorless ideas, they will not be implemented, even though many of them would be very effective. Hence they sleep, awaiting perceptive individuals to adopt them. And it’s an angry sleep. Damn it! If only these ideas were implemented, greenhouse gases would diminish, glaciers would grow in size and polar bears would gambol across the Arctic ice floes in much larger numbers. Clearly, colorless green ideas sleep furiously.

That, pretty much, illustrates the problem of meaning, but it also illustrates the solution. What I did, to give the meaningless sentence meaning, was to give it a great deal more context.

What computer software can do to identify meaning is accumulate as much data as it can about ambiguities and multiple meanings and resolve those by using context, before attempting to analyze the meaning of an item. Words that are found together give meaning to each other and they also help to resolve ambiguities.

Expert System’s Luca Scagliarini demonstrated to me how Cogito, his company’s software, resolved the meaning of the sentence:

The car eats gas if you put your foot on the gas.

The first “gas” is estimated to mean gasoline because of the word “car” and the verb “eats” (used here metaphorically). The second “gas” is estimated to mean the accelerator, again because of context.
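To make the idea of context-based disambiguation concrete, here is a minimal sketch of a simplified Lesk-style approach: each sense of an ambiguous word carries a set of cue words, and the sense whose cues overlap most with the neighboring words wins. This is not how Cogito works internally (its method isn't described here); the sense inventory, cue words and the `disambiguate` function are hypothetical, and a real semantic net would be vastly larger.

```python
# Hypothetical miniature sense inventory: each sense of "gas" lists cue words
# that tend to appear near it. A real semantic net would be far richer.
SENSES = {
    "gas": {
        "gasoline": {"car", "eats", "tank", "fuel", "fill", "mileage", "engine"},
        "accelerator": {"foot", "put", "pedal", "press", "step", "floor"},
        "vapor": {"cloud", "toxic", "leak", "smell", "oxygen"},
    }
}


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return [w.strip(".,!?;:").lower() for w in text.split()]


def disambiguate(tokens, index, window=4):
    """Pick the sense of tokens[index] whose cue words overlap most with the
    words within `window` positions on either side (simplified Lesk)."""
    word = tokens[index]
    lo, hi = max(0, index - window), index + window + 1
    context = set(tokens[lo:index] + tokens[index + 1:hi])
    scores = {sense: len(cues & context) for sense, cues in SENSES[word].items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    tokens = tokenize("The car eats gas if you put your foot on the gas.")
    for i, w in enumerate(tokens):
        if w in SENSES:
            print(i, w, disambiguate(tokens, i))
    # prints "3 gas gasoline" and then "11 gas accelerator"
```

The first “gas” sees “car” and “eats” nearby, so gasoline scores highest; the second sees “foot”, so accelerator wins. A fixed word window is the crudest possible notion of context; the point is only that neighboring words vote on meaning.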

Cogito includes a full semantic net which covers all words in the English language and the contexts in which they have been used. Because it has this semantic net, Cogito is capable of doing some very useful things:

It can classify blocks of text under specific headings according to any given taxonomy.
It can search an extensive text for specific items of meaning rather than just words.
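The difference between searching for words and searching for meaning can be sketched in a few lines: map each word to a concept, then match documents on concepts rather than spellings. The lexicon, concept IDs and `concept_search` function below are hypothetical illustrations, not Cogito's API, and unlike a real semantic net this sketch does not resolve ambiguous words first.

```python
# Hypothetical lexicon mapping surface words to concept IDs. A real semantic
# net would be far larger and would disambiguate words before mapping them.
LEXICON = {
    "gasoline": "FUEL", "petrol": "FUEL", "fuel": "FUEL",
    "car": "AUTOMOBILE", "automobile": "AUTOMOBILE", "auto": "AUTOMOBILE",
    "cost": "PRICE", "price": "PRICE", "prices": "PRICE",
}


def concepts(text):
    """Return the set of concept IDs mentioned in text (unknown words ignored)."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return {LEXICON[w] for w in words if w in LEXICON}


def concept_search(query, documents):
    """Return the documents that mention every concept in the query,
    even when they use different words for those concepts."""
    wanted = concepts(query)
    if not wanted:
        return []  # the query contains no known concepts
    return [doc for doc in documents if wanted <= concepts(doc)]


docs = [
    "Petrol prices rose again this week.",
    "The automobile show opens on Friday.",
    "Gasoline cost more in July than in June.",
]

if __name__ == "__main__":
    # Matches the "petrol prices" and "gasoline cost" sentences, although
    # neither contains both of the query's words "price" and "gasoline".
    print(concept_search("price of gasoline", docs))
```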

It can even answer questions from text. Luca asked Cogito a question and had it look for an answer in Wikipedia. The question was:

Why is the sun yellow?

And to my surprise Cogito produced a relevant answer.

So I asked it “What causes the Aurora Borealis?” and it pulled a scientific explanation from Wikipedia. (Of course, all of us except Wikipedia know that the Aurora Borealis is caused by the Goddess of the Dawn dancing with the God of the North Wind.)

Luca also pulled up some CNN articles at random and had Cogito pull out the “3 most important sentences”. This amounts to automatic summarization.
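Cogito's summarization method isn't described here, so as a point of comparison here is a minimal sketch of a classic, purely statistical alternative: a word-frequency (Luhn-style) extractive summarizer that scores each sentence by how frequent its words are across the whole text and keeps the top three in their original order. The stop-word list and function names are hypothetical, and a semantic approach would score concepts rather than raw words.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list; real summarizers use much larger ones.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is",
              "was", "it", "that", "for", "with", "as", "at", "by", "this"}


def split_sentences(text):
    """Naively split on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased words of the sentence, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def summarize(text, n=3):
    """Return the n sentences with the highest average word frequency,
    kept in the order they appear in the text."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]


if __name__ == "__main__":
    article = ("Solar wind particles strike the atmosphere. "
               "The particles excite oxygen atoms. "
               "Excited oxygen atoms emit green light. "
               "My cat likes tuna. "
               "The green light forms the aurora.")
    for sentence in summarize(article):
        print(sentence)
```

The off-topic sentence about the cat shares no words with the rest of the text, so it scores lowest and is dropped. The weakness is equally plain: the summarizer has no idea what any sentence means, which is exactly the gap a semantic net is meant to close.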

Luca has promised me a copy of Cogito to play with, so I’ll have more to say about the product in future. Right now, though, I’m convinced that this is the most important product I’ve seen in the area of semantics. It’s way ahead of anything else I’ve looked at. It may even be the most important product I’ve looked at in the last year.

It could make the semantic web a reality, and that would change the Internet (and the world).

Categories: Briefings