This is a rough start at a tutorial on RDFa. It is incomplete. The idea is to introduce RDFa's basic processing model, rather than starting with markup. It needs finishing with examples. It would also be good to explain how HTML5 Microdata uses essentially the same model but differs in some detail. Danbri.org 08:39, 17 January 2010 (UTC)
Draft Intro to RDFa
Most Web pages are trying to tell you something. Many try to sell you something, convince you of something, or give you links to more information. RDFa is a technology for making sense of all this. It provides a simplified version of the basic factual claims that a page makes, stripped of flashy colours, pretty pictures or seductive language.
RDFa makes it easier to compare, integrate and summarise information from diverse sources. To do this, it uses some techniques for simplifying the information in a Web page into a form that is easier for computers to deal with, but still useful for humans.
When people read Web pages, they can usually understand most of the words used by the authors and publishers. But in practice on the Web, they'll often skip and skim across a page, looking for the main points. Web users are notorious for not reading everything. Computers lack common sense, so they can't really understand anything they read. Nevertheless, RDFa gives them something a little like skimming: it defines a way for computers to quickly scan through a document, picking out the main points. Since computers are way too dumb to understand human language, RDFa works as a set of extensions to HTML, the 'markup language' Web pages are published in.
Just as HTML already defines markup for saying which bits of a document are links, where images should go, and how things should be presented, RDFa goes a little further. RDFa lets a publisher add a few extra hints for computer readers of a document.
RDFa lets you specify what the current focus of a phrase or paragraph is, and it lets you annotate text and Web links to say something about the relationship between the current focus of the text and that bit of information. So for example, a paragraph about a DVD might mention the director of the movie by name, or by linking to their blog; or a sentence about a city might mention its current population or its Mayor. Lots of stuff on the Web fits this pattern: you have some text talking about something, and then some information about that thing.
These three ideas are at the heart of RDFa:
- at any point in the document, the text can have a 'focus', i.e. the thing (real or online) that the document is about at that point.
- at any point in the document, links or bits of text can tell us about the relationship between that main thing and other things; or about properties of that thing.
- at any point in the document, the focus can switch, so that the text is now about something else.
So when a document is seen from RDFa's 'point of view', it looks something like this:
BLAH BLAH BLAH BLAH Bristol blah blah blah Lord Mayor blah blah Chris Davis blah blah blah 421,300 blah blah blah.
Computers don't have the smarts to figure out what we're on about, whether we write in English, Japanese, Croatian or any of the other thousands of human languages used online. It sometimes makes sense to help them with little clues: hints to tell them what each paragraph or even sentence is actually about. At any point in the flow of words, RDFa lets us drop in one of these aboutness hints, informing computers when the topic of conversation has changed. Is 421,300 something to do with Chris Davis? Or the city? Or the country? In RDFa, properties and relationships are never left floating around freely; they are always attached to something: the thing that the markup at that point was about.
Maybe in one sentence our focus is the country called 'the United Kingdom'; a few paragraphs later, the text becomes more specific, and is about the British city Bristol for a while. And perhaps it then becomes still more specific, talking now about the Lord Mayor of that city, before mentioning the population of the city. Computers could easily get lost without some help!
An RDFa tool skimming our page throws away everything except for extracted links and properties attached to a list of things described in the page. Since RDFa is a Web technology, it uses Web links (URLs, informally) for most of this.
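As a rough illustration of this skimming model (not of real RDFa markup), here is a minimal Python sketch. It assumes the page has already been reduced to a flat list of events; the event names, the `extract_facts` function and the example.org URLs are all invented for this example. It also assumes that, before any aboutness hint is seen, the focus is the page itself, and it ignores the way a focus can revert when a nested section ends.

```python
# Hypothetical identifiers, made up for this sketch.
PAGE = "http://example.org/page"
BRISTOL = "http://example.org/place/bristol"
MAYOR = "http://example.org/people/chris-davis"


def extract_facts(events, page_url):
    """Skim a flat event stream, keeping only facts attached to a focus."""
    focus = page_url  # assumption: the page itself is the starting focus
    facts = []
    for event in events:
        kind = event[0]
        if kind == "focus":
            # An aboutness hint: the topic of conversation has changed.
            focus = event[1]
        elif kind in ("property", "link"):
            # Properties and links are always attached to the current focus.
            _, name, value = event
            facts.append((focus, name, value))
        # Plain "text" events are thrown away.
    return facts


events = [
    ("text", "BLAH BLAH BLAH BLAH"),
    ("focus", BRISTOL),
    ("text", "Bristol blah blah blah"),
    ("link", "lordMayor", MAYOR),
    ("text", "blah blah Chris Davis blah blah blah"),
    ("property", "population", "421,300"),
    ("text", "blah blah blah."),
]

facts = extract_facts(events, PAGE)
for fact in facts:
    print(fact)
```

Everything that survives is a short list of (thing, property or relationship, value) entries; the blah is gone.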
RDFa Markup
This section can be skipped by people who don't care about the details of HTML document markup.
When you read about the markup rules for RDFa (or its close cousin, Microdata), bear in mind this description and it may help you understand what's going on.
Each bit of RDFa markup is either an aboutness hint, to remind computers which thing we're talking about, or a way of decorating existing markup to make its meaning clearer to our simple-minded computer friends.
All RDFa is expressed using attributes on existing HTML (or XHTML) elements.
aboutness hints
- if you see about=, then the focus has switched to be about something different.
- the value of the about attribute gives a Web ID for that thing.
- if instead you see typeof=, the focus has switched but we didn't have a handy Web identifier for the new thing we're talking about.
- the value of the typeof attribute carries one or more Web IDs for categories (like Person, Film, DVD, City etc.) that our new unidentified focal thing falls into.
properties and relationships
- if you see property=, the markup inside this element is telling you about some specific property of our current thing-we're-talking-about.
- the property attribute contains one or more Web IDs for specific properties: age, height, population etc.
- if you see rel=, you should find an href= link nearby; this annotation tells us more about the relationship between our main thing and some other linked thing.
- ...
(to be completed :)
Markup stuff is from memory; I might be remembering earlier versions of RDFa. Needs checking...
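The attribute rules above can be tried out with a small Python sketch built on the standard library's `html.parser`. This is a deliberately simplified skimmer, not a conforming RDFa processor: it only looks at about, typeof, property, rel and href, it does not expand prefixes such as ex:, and it assumes well-nested markup. The sample markup, the example.org URLs, the ex: names and the `FocusSkimmer` / `skim_rdfa` helpers are all invented for illustration.

```python
from html.parser import HTMLParser

# Invented sample markup: the URLs and ex: names are placeholders.
SAMPLE = """
<div about="http://example.org/place/bristol">
  <p>Bristol has a population of
  <span property="ex:population">421,300</span>. Its Lord Mayor is
  <a rel="ex:lordMayor" href="http://example.org/people/chris-davis">Chris Davis</a>.</p>
  <p typeof="ex:Person">An unnamed resident, known as
  <span property="ex:nickname">Ginger</span>.</p>
</div>
"""

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class FocusSkimmer(HTMLParser):
    """Tracks the current focus and collects (focus, name, value) triples."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.stack = []  # one frame per open element
        self.triples = []
        self.blank_count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Inherit the focus from the enclosing element, or from the page.
        focus = self.stack[-1]["focus"] if self.stack else self.page_url
        if attrs.get("about") is not None:
            focus = attrs["about"]  # new focus with a Web ID
        elif "typeof" in attrs:
            self.blank_count += 1
            focus = "_:b%d" % self.blank_count  # new focus without a Web ID
        for category in (attrs.get("typeof") or "").split():
            self.triples.append((focus, "rdf:type", category))
        if attrs.get("rel") and attrs.get("href"):
            for name in attrs["rel"].split():
                self.triples.append((focus, name, attrs["href"]))
        if tag not in VOID_TAGS:
            props = (attrs.get("property") or "").split()
            self.stack.append(
                {"tag": tag, "focus": focus, "props": props, "text": []}
            )

    def handle_data(self, data):
        # Text counts towards every open element that carries property=.
        for frame in self.stack:
            if frame["props"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        # Closing an element restores the focus of its parent.
        if self.stack and self.stack[-1]["tag"] == tag:
            frame = self.stack.pop()
            value = " ".join("".join(frame["text"]).split())
            for name in frame["props"]:
                self.triples.append((frame["focus"], name, value))


def skim_rdfa(html, page_url="http://example.org/page"):
    skimmer = FocusSkimmer(page_url)
    skimmer.feed(html)
    skimmer.close()
    return skimmer.triples


for triple in skim_rdfa(SAMPLE):
    print(triple)
```

The skimmer keeps a stack of open elements so that a focus set by about= or typeof= only lasts until that element closes; after that, the enclosing element's focus applies again.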
To Do
- real example
- images
- larson 'what a dog hears' cartoon? blah blah ginger blah blah