
FOAF in 2009

This document describes a personal view of some of the key issues around FOAF in 2009.

Work-in-Progress Draft for discussion, Dan Brickley, 12th January 2009.

Project History, Goals & Direction

The project began in May 2000 as an "experimental linked information system" called RDFWebRing. This was soon changed to RDFWeb, and gradually the project's evocatively named starter vocabulary, "FOAF", became the main emphasis and the de facto project name. As of ~2006, everything falls under the Friend of a Friend project banner. Project discussions have always been informal and conducted in public email, IRC, weblog, and wiki systems. The main mailing list is foaf-dev (formerly rdfweb-dev), and the initial announcement and TODO list are worth re-reading in 2009.

The original goals document still provides a good introduction to the project.

The basic idea behind FOAF is simple: the Web is all about making connections between things. FOAF provides some basic machinery to help us tell the Web about the connections between the things that matter to us. Thousands of people already do this on the Web by describing themselves and their lives on their home page. Using FOAF, you can help machines understand your home page, and through doing so, learn about the relationships that connect people, places and things described on the Web. FOAF uses W3C's RDF technology to integrate information from your home page with that of your friends, and the friends of your friends, and their friends...

From the outset, this was described as a "linked information system", taking the phrase from the original document proposing the WWW. In 2009 these issues are more often discussed as "linked data", following a more recent note from Tim Berners-Lee.

The phrase 'linked information' suits FOAF well, since FOAF is naturally concerned with information as well as disinformation, partial information, etc.

From the original introduction:

Consider a Web of inter-related home pages, each describing things of interest to a group of friends. Each new home page that appears on the Web tells the world something new, providing factoids and gossip that make the Web a mine of disconnected snippets of information. FOAF provides a way to make sense of all this.

In some ways, FOAF and RDFWeb were a thinly veiled wrapper around RDF advocacy and deployment efforts, but they also helped focus the RDF and Semantic Web developer communities on a number of practical problems:

* provenance (who said what, where we found the data, and how we can be sure of its source)
* identity reasoning (aka 'smushing'): how reference-by-description strategies can help clarify what our data is describing
* practical use of ontology languages - FOAF used DAML+OIL and OWL from the outset
* scalability - FOAF put millions of real-world RDF claims into the public Web, in the days before our tools scaled 
* vocabulary management and evolution: how to refine a vocabulary that is in continuous, world-wide use
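To make 'smushing' concrete, here is a minimal Python sketch, assuming data held as plain (subject, predicate, object) tuples rather than in any real RDF toolkit. It treats foaf:mbox, foaf:mbox_sha1sum and foaf:homepage as inverse functional properties (as the FOAF schema declares them) and merges any nodes that share a value for one of them, using a small union-find structure. The function name and the sample data (including the placeholder hash value) are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
# Properties whose values uniquely identify the thing they describe.
IFPS = {FOAF + "mbox", FOAF + "mbox_sha1sum", FOAF + "homepage"}

def smush(triples):
    """Merge nodes that share a value for an inverse functional property."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # deterministic representative

    first_seen = {}
    for s, p, o in triples:
        if p in IFPS:
            key = (p, o)
            if key in first_seen:
                union(first_seen[key], s)  # same IFP value: same thing
            else:
                first_seen[key] = s
    # Rewrite subjects (and node-valued objects) to their representative node.
    return {(find(s), p, find(o) if o in parent else o) for s, p, o in triples}

# Hypothetical data from two documents describing the same person.
data = [
    ("_:a", FOAF + "name", "Dan Brickley"),
    ("_:a", FOAF + "mbox_sha1sum", "abc123"),  # placeholder, not a real hash
    ("_:b", FOAF + "mbox_sha1sum", "abc123"),
    ("_:b", FOAF + "homepage", "http://danbri.org/"),
    ("_:c", FOAF + "knows", "_:b"),
]
for triple in sorted(smush(data)):
    print(triple)
```

Real aggregators also have to cope with bad data (for example, a shared or mistyped mailbox merging two different people), which this sketch makes no attempt to detect.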

The Web in 2000

When FOAF was created:

* "social network" sites as we now know them barely existed, although we had SixDegrees.com and LiveJournal
* FOAF co-founder DanBri had worked on a social-networking-esque site, the Social Science Research Grapevine, which had adopted a centralised approach to data creation. This was largely unsuccessful, hence the interest in distributed alternatives. Research into quality ratings and trust in RDF provided more background.
* W3C's initial RDF/XML recommendation was just over a year old.
* There were hardly any RDF/XML parsers, or database systems, or crawlers, or (apart from Dublin Core) vocabularies.

FOAF in 2009

This is not intended to be a complete "state of the nation" overview of FOAF deployment experiences. Rather, it is a checklist of some core concerns that need attention during 2009.

Revisiting Goals

The project has traditionally had loosely articulated goals, perhaps reflecting the loosely-coupled collaborations made possible by RDF's "open world" technology design. People have used the FOAF vocabulary in many ways. Many of the initial topics sketched in the project announcement message have received a lot of attention over the years.

Technical concerns:

* provenance, who-said-what, practical uses of RDF, Web crawling, information linking
* how to handle non-public circulation of FOAF descriptions (XMPP, PGP/GPG, XML Signature, S/MIME, FOAF+SSL, OpenID, OAuth, ...).
* practicalities of RDF/OWL management and vocabulary evolution

Descriptive concerns:

* how to describe people in an open-ended, privacy-friendly, respectful fashion
* how to integrate large-scale vocabularies (Wordnet, DMoz, Wikipedia, tagging ...)
* how to say 'X' about people/groups/organizations
* how to better integrate FOAF descriptions with vCard/LDAP/hCard/XFN/OpenSocial/PortableContacts, ...

The FOAF project was last truly active as a communal venture between 2000 and 2006/7. Since then we have let things go quiet, although experimentation and collaboration have continued, and there are now many dozens (even hundreds) of research papers addressing FOAF and related topics. This is in addition to practical tools, support from search engines (Yandex in Russia, Yahoo! SearchMonkey, Google Social Graph API, Garlik's Qdos, Sindice, ...) and cross-referencing from many other RDF vocabularies and data sets, particularly those in the "linked data" community.

During 2008, "Web 2.0" tools, standards and attitude became increasingly focussed on interop and data exchange in the social networking space. The data portability initiative, Microformats community, Google's OpenSocial effort, RDFa, XMPP, Portable Contacts, OAuth and OpenID projects all contributed to a strong sense that a distributed and deeply social Web was possible. Dozens of technologies and projects are working in this direction, and the patterns in which they'll be combined are far from clear.

In this context, FOAF retains a somewhat distinct character. Firstly, it inherits from RDF an approach to scoping and extensibility. RDF and FOAF are by nature very open-ended: they encourage you to write, publish, aggregate and consume descriptions that may be gappy and incomplete, or that are extended with descriptive terms drawn from other useful projects nearby. FOAF therefore emphasises an evidence-centric approach to describing friendship networks. Although you can assert in RDF that person X is a friend (or colleague, lover, buddy or employer) of person Y, the FOAF design and FOAF community have generally taken another direction. Instead, we have focussed on describing less emotive things: the evidence that friendship leaves in the world, such as photos, meetings, events and documents. We work in the expectation that data aggregators will emerge that sift amongst such evidence to figure out what the public record (and the not-so-public record) is telling us.
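As a rough illustration of the evidence-centric approach, the sketch below counts how often pairs of people are depicted in the same image, using the foaf:depicts property over plain (subject, predicate, object) tuples. A high count is evidence an aggregator might weigh, not an assertion of friendship; the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

FOAF = "http://xmlns.com/foaf/0.1/"

def codepiction_counts(triples):
    """Count how often each pair of people appears together in the same image."""
    depicted = {}
    for s, p, o in triples:
        if p == FOAF + "depicts":
            depicted.setdefault(s, set()).add(o)
    pairs = Counter()
    for people in depicted.values():
        # Sorting gives each unordered pair one canonical (a, b) key.
        for a, b in combinations(sorted(people), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical photo metadata gathered from several sites.
photos = [
    ("photo1.jpg", FOAF + "depicts", "alice"),
    ("photo1.jpg", FOAF + "depicts", "bob"),
    ("photo2.jpg", FOAF + "depicts", "bob"),
    ("photo2.jpg", FOAF + "depicts", "alice"),
    ("photo2.jpg", FOAF + "depicts", "carol"),
]
print(codepiction_counts(photos).most_common())
```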

The remainder of this document outlines briefly some areas proposed for attention from the FOAF and SW community during 2009.

Things To Work On

Core project

The first distinction to draw here is between the minimally staffed effort of the core project and the research and development efforts of the wider community. Most of my (Dan Brickley's) time for FOAF this last year has gone on basic infrastructural issues rather than on the vocabulary or technology efforts.

Current Infrastructure:

www.foaf-project.org

Drupal 6.2 since late 2008 (thanks to Stéphane Corlosquet)

wiki.foaf-project.org

MediaWiki 1.10.2 (2007) with OpenID and anti-spam extensions. HELP NEEDED upgrading and exploring Semantic MediaWiki extensions.

lists.foaf-project.org

Email lists, thanks to Edd Dumbill (active: foaf-dev, foaf-protocols, expertfinder-dev; new: foaf4lib libraries list). HELP NEEDED getting RSS/Atom views of these lists.

weblog

No active FOAF blog as such, but the new Drupal site aggregates feeds from contributors; TODO: create feed list in wiki. HELP NEEDED. FOAF News on the homepage is maintained by Libby Miller and Dan Brickley via Delicious tagging ('foafnews') of relevant content. Old blog posts on rdfweb.org and danbri.org are still online.


IRC logs

IRC discussion is in #foaf and nearby #swig on irc.freenode.net thanks to Freenode, with 24x7 Web-based IRC logs thanks to Dave Beckett.

Version Control

svn.foaf-project.org; two repositories hosted by DanBri at DreamHost (alongside other core sites). One for vocab [1], another for related collaborations.

Vocabulary management tools


The FOAF spec is generated by a tool called SpecGen. The original Ruby version, specgen.rb, was written in 2003; it was converted to Redland-based Python by Chris Schmidt and is currently being reworked to be pure Python. HELP NEEDED here. Many versions of specgen.py now exist and collaborations are forming; until the tool is stable, the FOAF spec is frozen.

DNS

Gandi is the registrar for xmlns.com and foaf-project.org. Domains are owned by Dan Brickley currently, although a non-profit org is under discussion (but not documented here).


Preservation & Archival

Long-term preservation / archival of namespace documents. This has been discussed with Dublin Core and W3C contacts but no conclusions reached.


Sysadmin

HELP NEEDED here. Getting some basic funded or volunteer support for core systems infrastructure is a goal for 2009.

Core Vocabulary

The FOAF vocabulary has not changed greatly for a while. This was fine while implementation experience accumulated, but there is now a substantial collection of Open Issues [2] that need attention. Furthermore, the landscape surrounding FOAF is very different now from the Web of May 2000.

User privacy

FOAF was initially promoted and designed around the idea of the 'machine-readable homepage', in a setting where homepages were largely created and maintained by hand or with basic HTML tools. Since 2003-2004, many thousands of FOAF documents have been published from 'social Web' databases, i.e. by Web sites on behalf of their users. This situation deserves careful evaluation. For example, the foaf:mbox_sha1sum construct was initially created for users who were knowingly describing their mailbox (foaf:mbox) but wanted to obfuscate the raw address. It may not be appropriate for sites to publish hashed mailbox information for their users unless it is made clear to those users exactly what the implications are.
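The privacy concern is easy to demonstrate. foaf:mbox_sha1sum holds the hex-encoded SHA1 of a 'mailto:' URI, so anyone holding a list of candidate addresses (an address book, a spam list) can recompute the hashes and look for matches. A minimal Python sketch; the function names and addresses are illustrative, and how an address is normalised before hashing (case, whitespace) is an assumption that changes the result.

```python
import hashlib

def mbox_sha1sum(address):
    """Hex SHA1 of the 'mailto:' URI for an address, as used by foaf:mbox_sha1sum."""
    return hashlib.sha1(("mailto:" + address).encode("utf-8")).hexdigest()

def reverse_lookup(digest, candidate_addresses):
    """Dictionary attack: return the first candidate whose hash matches, else None."""
    for address in candidate_addresses:
        if mbox_sha1sum(address) == digest:
            return address
    return None

# A site publishes only the hash...
published = mbox_sha1sum("danbri@example.org")
# ...but anyone with a list of likely addresses can recover the original.
print(reverse_lookup(published, ["alice@example.org", "danbri@example.org"]))  # prints danbri@example.org
```

The hash hides an address only from someone who has never seen it; it does nothing against an aggregator that already holds large address lists.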

Advocacy / outreach role

Historically, sites have exposed FOAF data in an ad hoc way, without any interaction with the core project. This may on occasion endanger user privacy. Should 'we' take a more active role, enter into dialog with publishers, aggregators, and re-aggregators? What 'best practice' documentation is appropriate at this time?

What should users (direct or indirect) of the Google Social Graph API know about the FOAF data it aggregates? What 'take down' tools, or accompanying documentation, are available? What can we expect if FOAF data (accurate or otherwise) shows up increasingly in tools like Yahoo! SearchMonkey?

Short version: for years our main audience was developers. Increasingly, FOAF and similar systems are deployed on a scale where end users will be affected. It is not clear yet what the best strategy is here, other than to be aware that these tools and technologies are no longer 'in the lab'.

User Incentives and expectations

Sometime in 2008, Tribe.net turned off their FOAF feeds. This happened because Tribe's userbase discovered that their profiles were being copied wholesale to another site (explode.us). Furthermore, it may be that Tribe was exposing more in their FOAF data than in their HTML (violating the largely unwritten guideline that public HTML and FOAF should carry more or less the same information, to avoid surprising users like this).

There are various semi-public discussions about how to improve this situation, for example with the MIT DIG group (e.g. see the reciprocal privacy proposal). One possibility is a better vocabulary for describing users' expectations of how their data can be re-used (something close to Creative Commons). A more ambitious elaboration would explore including advertising or similar material amongst the options a user can demand from aggregators.

Technology and Vocabulary Development

There are many technical directions to explore. The following are suggestions for discussion.

Accommodate vCard / Portable Contacts

The recently developed Portable Contacts schema is a best-of-breed XML and JSON vocabulary for 'contacts data'. FOAF has always been weak on these details, and a better treatment of family / given names, phone numbers, fax numbers, addresses and so on is long overdue. It would be natural to base this on Portable Contacts, and the possibility of integrating FOAF and PC via GRDDL is being discussed on the PC list. Portable Contacts is also now the base schema for the OpenSocial project, somewhat obsoleting work in 2008 to express this in RDF.
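A GRDDL transform would typically be written in XSLT, but the field mapping it needs can be sketched first. The Python below maps a Portable Contacts-style entry (displayName, name.givenName, name.familyName, emails, phoneNumbers) to FOAF triples using the existing foaf:name, foaf:givenname, foaf:family_name, foaf:mbox and foaf:phone terms. Which FOAF property each field should map to is exactly what is under discussion, so treat these pairings, the sample contact, and the naive tel: URI construction as assumptions.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def poco_to_foaf(contact, subject="_:person"):
    """Map one Portable Contacts-style entry to FOAF (s, p, o) triples."""
    triples = [(subject, RDF_TYPE, FOAF + "Person")]
    if "displayName" in contact:
        triples.append((subject, FOAF + "name", contact["displayName"]))
    name = contact.get("name", {})
    if "givenName" in name:
        triples.append((subject, FOAF + "givenname", name["givenName"]))
    if "familyName" in name:
        triples.append((subject, FOAF + "family_name", name["familyName"]))
    for email in contact.get("emails", []):
        triples.append((subject, FOAF + "mbox", "mailto:" + email["value"]))
    for phone in contact.get("phoneNumbers", []):
        # Naive tel: URI that only strips spaces; real numbers need proper normalisation.
        triples.append((subject, FOAF + "phone", "tel:" + phone["value"].replace(" ", "")))
    return triples

# Hypothetical contact, shaped like a Portable Contacts JSON entry.
entry = {
    "displayName": "Dan Brickley",
    "name": {"givenName": "Dan", "familyName": "Brickley"},
    "emails": [{"value": "danbri@example.org", "type": "work"}],
    "phoneNumbers": [{"value": "+44 20 7946 0000", "type": "work"}],
}
for triple in poco_to_foaf(entry):
    print(triple)
```

Fields such as addresses and fax numbers have no obvious FOAF home yet, which is one reason basing new terms on Portable Contacts looks attractive.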

OpenID

FOAF has had basic support for declaring OpenIDs since 2007, through the foaf:openid property.

There are several other potential interactions.

Pages exposed as OpenIDs (i.e. pages whose owners can easily demonstrate control over them) might expose RDFa/FOAF, e.g. expressing karma-like data such as that published in Advogato's foaf:Group trust rankings.

FOAF properties could be mapped individually to OpenID Attribute Exchange (AX) fields.

FOAF 'as a blob' could be expressed as a single AX field.
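Both options can be sketched in a few lines of Python. The type URIs below come from the axschema.org namespace commonly used with Attribute Exchange, but the particular FOAF-to-AX pairings are an illustrative assumption rather than an agreed mapping, and the 'blob' type URI is purely hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Illustrative pairings only; not an agreed FOAF/AX correspondence.
FOAF_TO_AX = {
    FOAF + "name": "http://axschema.org/namePerson",
    FOAF + "nick": "http://axschema.org/namePerson/friendly",
    FOAF + "mbox": "http://axschema.org/contact/email",
    FOAF + "homepage": "http://axschema.org/contact/web/default",
}

# Hypothetical AX type URI for carrying a whole FOAF document as one value.
FOAF_BLOB_TYPE = "http://example.org/ax/foaf-rdfxml"

def to_ax_fields(triples, subject):
    """Option 1: map individual FOAF properties of `subject` to AX type URIs."""
    fields = {}
    for s, p, o in triples:
        if s != subject or p not in FOAF_TO_AX:
            continue
        if p == FOAF + "mbox" and o.startswith("mailto:"):
            o = o[len("mailto:"):]  # assumed: AX email values are bare addresses
        fields.setdefault(FOAF_TO_AX[p], []).append(o)
    return fields

def to_ax_blob(rdfxml):
    """Option 2: carry the whole FOAF document as a single AX value."""
    return {FOAF_BLOB_TYPE: [rdfxml]}

me = "http://danbri.org/foaf.rdf#danbri"
profile = [
    (me, FOAF + "name", "Dan Brickley"),
    (me, FOAF + "mbox", "mailto:danbri@example.org"),
    (me, FOAF + "knows", "_:someone"),  # no AX equivalent, so it is dropped
]
print(to_ax_fields(profile, me))
```

The trade-off is visible even here: per-field mapping loses anything without an AX equivalent (such as foaf:knows), while the blob keeps everything but leaves the relying party to parse RDF.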

OAuth

OAuth is an API permissioning system. It is heavily used in the Portable Contacts protocol, by OpenSocial, and by many Social Web sites that want a framework for asking users' permission to expose data. Garlik and (?) OpenLink have implementation experience, amongst others from the RDF world.

OAuth relates to FOAF both as a way of limiting the visibility of FOAF/RDF data and as a way of mediating write/publish access.

This is a long-standing problem for FOAF-creation tools. We can use foaf-a-matic to generate RDF/XML for someone, but how do we get it onto their website without asking them to trust us with their password (aka the password 'anti-pattern')?

With OAuth, we could perhaps negotiate a token for posting via WebDAV, AtomPub or similar.
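In the posting scenario above, the part of OAuth (Core 1.0, HMAC-SHA1 method) that replaces the password is request signing: the tool signs each request with secrets it was issued and never sees the user's password. A simplified Python sketch of the signing step follows; the endpoint, keys and secrets are hypothetical, token negotiation itself is not shown, and details such as repeated parameters and URL normalisation are glossed over.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def oauth_escape(value):
    """OAuth 1.0 percent-encoding: only A-Z, a-z, 0-9, '-', '.', '_' and '~' stay literal."""
    return quote(value, safe="~")

def signature_base_string(method, base_url, params):
    """Join method, base URL and sorted, encoded parameters into the string to sign.

    Simplified: assumes base_url is already normalised (no query string)
    and that no parameter name is repeated.
    """
    pairs = sorted((oauth_escape(k), oauth_escape(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), oauth_escape(base_url), oauth_escape(normalized)])

def hmac_sha1_signature(base_string, consumer_secret, token_secret=""):
    """HMAC-SHA1 over the base string, keyed by the escaped consumer and token secrets."""
    key = f"{oauth_escape(consumer_secret)}&{oauth_escape(token_secret)}"
    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")

# Hypothetical request: a FOAF tool posting to a user's AtomPub collection.
params = {
    "oauth_consumer_key": "example-foaf-tool",
    "oauth_token": "example-access-token",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1231718400",
    "oauth_nonce": "example-nonce",
    "oauth_version": "1.0",
}
base = signature_base_string("POST", "http://example.org/atompub/foaf", params)
signature = hmac_sha1_signature(base, "consumer-secret", "token-secret")
print(signature)
```

The server recomputes the same signature from its copy of the secrets; if the user later revokes the token, the tool loses access without anyone changing a password.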

foaf+ssl

Henry Story and others on the foaf-protocols list have been very active here, both in exploring the use of FOAF over https:, where X.509 certificates control what we can see and do, and in thinking about how FOAF and PGP-style decentralisation can be re-invented in an X.509-based setting.
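The core foaf+ssl check, as it has been discussed, can be sketched as follows: the client presents a (possibly self-signed) X.509 certificate whose subjectAltName carries a URI for the person; the TLS handshake proves the client holds the matching private key; the server then dereferences that URI and accepts the claim only if the FOAF profile lists the same RSA public key. In this Python sketch the certificate is assumed to be already parsed, fetch_profile_keys is a hypothetical stand-in for fetching and querying the profile, and the key values are toys.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set

@dataclass(frozen=True)
class RSAPublicKey:
    modulus: int
    exponent: int

def foaf_ssl_accept(claimed_uri: Optional[str], cert_key: RSAPublicKey,
                    fetch_profile_keys: Callable[[str], Set[RSAPublicKey]]) -> bool:
    """Accept the certificate's claimed identity only if its FOAF profile lists the same key."""
    if not claimed_uri:
        return False  # no subjectAltName URI, so nothing to check against
    # Assumes the TLS layer has already verified possession of the private key.
    return cert_key in fetch_profile_keys(claimed_uri)

# Hypothetical profile data standing in for dereferencing and parsing FOAF.
PROFILES = {
    "http://example.org/people/henry#me": {RSAPublicKey(modulus=0xC0FFEE, exponent=65537)},
}

def fetch_from_table(uri):
    return PROFILES.get(uri, set())

print(foaf_ssl_accept("http://example.org/people/henry#me",
                      RSAPublicKey(0xC0FFEE, 65537), fetch_from_table))
```

Note that no certificate authority appears anywhere: trust comes from the person controlling the document their URI points to, which is what makes the approach resemble PGP-style decentralisation.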

GPG / PGP, XML Signature etc.

We have been experimenting with FOAF and PGP since the beginning. It is time to progress this work, and to link it to more recent developments, such as SPARQL's Named Graph mechanism for indicating provenance, and to other crypto techniques.
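The named-graph idea can be sketched without any particular RDF store: keep each statement together with the URL of the document it was found in (its graph), so that 'who said what' becomes a query. The class below is a hypothetical in-memory stand-in for what SPARQL exposes through FROM NAMED and GRAPH; 'verified' simply marks graphs whose signature has been checked by some other means (PGP, XML Signature, ...).

```python
class QuadStore:
    """Minimal in-memory store of (subject, predicate, object, graph) quads."""

    def __init__(self):
        self.quads = set()

    def add(self, s, p, o, graph):
        self.quads.add((s, p, o, graph))

    def sources(self, s, p, o):
        """Which documents (graphs) make this claim?"""
        return {g for (qs, qp, qo, g) in self.quads if (qs, qp, qo) == (s, p, o)}

    def claims_in(self, graph):
        """Everything a given document says."""
        return {(s, p, o) for (s, p, o, g) in self.quads if g == graph}

    def supported_by(self, s, p, o, verified_graphs):
        """True if at least one verified (e.g. signature-checked) document makes the claim."""
        return bool(self.sources(s, p, o) & set(verified_graphs))

# Hypothetical claims gathered from two documents.
store = QuadStore()
store.add("danbri", "foaf:knows", "libby", "http://danbri.org/foaf.rdf")
store.add("danbri", "foaf:knows", "libby", "http://example.org/gossip.rdf")
store.add("libby", "foaf:name", "Libby Miller", "http://example.org/gossip.rdf")
print(store.sources("danbri", "foaf:knows", "libby"))
```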

Edd Dumbill's foafbot was the first tool to do anything really interesting with PGP-signed FOAF data. See Edd's writeup of the initial FOAF community practice.

Other unexplored avenues include DanBri's early effort to represent in RDF the 'who signed whose key' trust chains from PGP (example).
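Once 'who signed whose key' is available as data, the simplest question to ask of it is whether a chain of signatures connects one key to another. The Python sketch below does a breadth-first search over (signer, signee) pairs and returns the shortest chain. The key IDs are hypothetical, and real PGP web-of-trust validation also weighs owner-trust levels and validity thresholds, which this ignores.

```python
from collections import deque

def trust_path(signatures, start, target):
    """Shortest chain of key signatures from `start` to `target`, or None."""
    graph = {}
    for signer, signee in signatures:
        graph.setdefault(signer, []).append(signee)
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        key = queue.popleft()
        if key == target:
            path = []
            while key is not None:  # walk back to the start key
                path.append(key)
                key = previous[key]
            return path[::-1]
        for nxt in graph.get(key, []):
            if nxt not in previous:
                previous[nxt] = key
                queue.append(nxt)
    return None

# Hypothetical signatures: KEY_A signed KEY_B's key, and so on.
signatures = [("KEY_A", "KEY_B"), ("KEY_B", "KEY_C"), ("KEY_C", "KEY_D"), ("KEY_A", "KEY_E")]
print(trust_path(signatures, "KEY_A", "KEY_D"))
```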

With the foaf+ssl work, the growing deployment of XML Signature (including in Java 1.6), and the W3C recommendation for RDFa in XHTML, many new possibilities present themselves. There is also active work around W3C on signing for HTML and Widgets that we should connect with.

Trust, provenance and information linking

It is time to revisit the Semantic Web layercake and rethink the trust piece. RDF gives us a model for representing claims about the world, but we have yet to really explore how inter-document connections relate to our ability to trust (or believe) those claims. Google's Social Graph API is perhaps the first mainstream exploration of these issues that does anything with RDF at Web scale.

The SGAPI now uses Redland's Raptor parser against all the FOAF in Google's main Web index. It merges this with XFN microformat data, and follows identity reasoning chains that are based on the notion of reciprocation between pages. If danbri.org claims (via XFN rel='me') http://identi.ca/danbri/ and vice versa, this is more robust than a one-way claim.
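The reciprocation heuristic is simple to sketch. Given directed rel='me' claims between pages, the Python below keeps only pairs where each page points at the other, then groups pages joined by such two-way links into identity clusters. This illustrates the idea described above; it is not the Social Graph API's actual algorithm, and the impostor URL is hypothetical.

```python
def reciprocated_identities(me_links):
    """Group pages into identity clusters using only two-way rel='me' links."""
    links = set(me_links)
    mutual = {}
    for a, b in links:
        if a != b and (b, a) in links:
            mutual.setdefault(a, set()).add(b)
    clusters, seen = [], set()
    for page in sorted(mutual):
        if page in seen:
            continue
        # Depth-first walk over mutual links collects one connected cluster.
        stack, component = [page], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(mutual[node] - component)
        seen |= component
        clusters.append(component)
    return clusters

links = [
    ("http://danbri.org/", "http://identi.ca/danbri/"),
    ("http://identi.ca/danbri/", "http://danbri.org/"),
    ("http://danbri.org/", "http://example.com/impostor"),  # one-way claim, ignored
]
print(reciprocated_identities(links))
```

A one-way claim is still information (and might be kept with lower weight), but only the reciprocated pair is treated as established here.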

These ideas merit further exploration over general RDF data. A simple use case: proving that somebody worked at an organization. What do we need to know before trusting an RDFa claim in http://www.w3.org/People/Alumni that says DanBri is a former W3C staff member?

HELP NEEDED here - student project material maybe?

Crawler statistics and REST APIs

We now have several large aggregated bodies of FOAF or general RDF data. There are no general REST APIs yet agreed, beyond SPARQL (expensive at such scale). We also have no way to collect statistics. This is unfortunate since such stats can guide vocab development.
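Even without an agreed API, basic vocabulary statistics need only a single pass over the aggregated data. Assuming a crawler keeps (subject, predicate, object, source-document) quads, the Python below counts, for each FOAF property, how many statements use it and how many distinct documents do; the function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"

def property_usage(quads, namespace=FOAF):
    """Map each term in `namespace` to (statement count, distinct document count)."""
    uses = Counter()
    documents = defaultdict(set)
    for _s, p, _o, source in quads:
        if p.startswith(namespace):
            term = p[len(namespace):]
            uses[term] += 1
            documents[term].add(source)
    return {term: (uses[term], len(documents[term])) for term in uses}

# Hypothetical crawl output from two documents.
crawl = [
    ("_:a", FOAF + "name", "A", "doc1"),
    ("_:b", FOAF + "name", "B", "doc1"),
    ("_:a", FOAF + "knows", "_:b", "doc1"),
    ("_:c", FOAF + "name", "C", "doc2"),
    ("_:c", "http://purl.org/dc/elements/1.1/title", "x", "doc2"),
]
print(property_usage(crawl))
```

Counting documents as well as statements matters for vocabulary decisions: a term used heavily by one large site looks very different from one used lightly by thousands of independent publishers.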


SearchMonkey

Yahoo SearchMonkey deserves more attention.

Dan has a half-finished prototype showing integration of Google SGAPI and SearchMonkey. To accomplish this, it includes a PHP proxy that makes the Google SGAPI JSON interfaces look like the Atom+RDFa interfaces preferred by SearchMonkey.
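The proxy's job is essentially a format translation. Dan's prototype is in PHP; the Python below only sketches the same idea, and it assumes an SGAPI-style JSON response of the form {"nodes": {url: {"nodes_referenced": {url: {"types": [...]}}}}}. That input shape, and rendering each node as an Atom entry whose XHTML content carries rel-annotated links (to which RDFa markup could be added), are assumptions for illustration; the output also omits some elements a fully valid Atom feed requires, such as author.

```python
import json
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"

def sgapi_json_to_atom(payload, feed_id, updated="2009-01-12T00:00:00Z"):
    """Render an SGAPI-style JSON response as an Atom feed (assumed input shape)."""
    data = json.loads(payload)
    # Declaring xmlns as a plain attribute keeps the output free of ns0: prefixes.
    feed = ET.Element("feed", xmlns=ATOM)
    ET.SubElement(feed, "id").text = feed_id
    ET.SubElement(feed, "title").text = "Social graph results"
    ET.SubElement(feed, "updated").text = updated
    for url, node in sorted(data.get("nodes", {}).items()):
        entry = ET.SubElement(feed, "entry")
        ET.SubElement(entry, "id").text = url
        ET.SubElement(entry, "title").text = url
        ET.SubElement(entry, "updated").text = updated
        content = ET.SubElement(entry, "content", type="xhtml")
        div = ET.SubElement(content, "div", xmlns=XHTML)
        for target, ref in sorted(node.get("nodes_referenced", {}).items()):
            # Each outgoing edge becomes a link whose rel lists the edge types (e.g. "me").
            link = ET.SubElement(div, "a", rel=" ".join(ref.get("types", [])), href=target)
            link.text = target
    return ET.tostring(feed, encoding="unicode")

# Hypothetical response describing one reciprocal rel='me' edge.
sample = json.dumps({"nodes": {"http://danbri.org/": {
    "nodes_referenced": {"http://identi.ca/danbri/": {"types": ["me"]}}}}})
print(sgapi_json_to_atom(sample, "urn:example:sgapi-feed"))
```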

As of this writing, this is of rather academic interest, since SGAPI uses a version of Raptor that doesn't handle RDFa, while SearchMonkey handles RDFa but not RDF/XML. So the overlap is tiny (2 or 3 pages); if RDFa becomes more widely used, this demo could be rather nice.

Conclusion / summary

FOAF-based explorations can go in many many directions. The duty of the core project is to facilitate these, without attempting too much itself.

To make the best of 2009 we need:

- more sysadmin help (eg. mediawiki)
- regular meetings (f2f where feasible and regular online meetings)
- a decision process / calendar for the core vocab
- finish rewriting specgen.py so the spec can be rev'd

These are relatively modest steps, but are the minimum to get things moving again at a healthy pace. Beyond the minimum, collaborations could take us in many directions. My bet is that more attention on provenance, sourcing, and trust is timely, alongside a renewed attention to users (documentation, incentives, privacy issues, ...) as a group who are increasingly encountering this technology.