ScutterStrategies

From FOAF

Strategies for focused, subject-oriented and/or limited but open scuttering...

With LiveJournal adding FOAF output to their setup, the estimated number of public documents containing statements that use the FOAF vocabulary approaches 3 million, totalling something like 6 billion (6,000 million) triples. (These numbers are approximate as of March 1, 2004. For comparison, the Redland MySQL backend stores approximately 7 million triples per GB, depending on graph structure.) Even without counting RSS and the results of transformations from various other syndication formats, a simple scutter is no longer able to find, retrieve and store everything.
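To put the figures above in perspective, the storage needed for the whole corpus can be estimated with simple arithmetic. This is only a back-of-the-envelope sketch in Python: both constants are the approximate numbers quoted in the text, and the function name is made up for illustration.

```python
# Approximate figures quoted in the text (March 2004).
TOTAL_TRIPLES = 6_000_000_000   # about 6 billion triples
TRIPLES_PER_GB = 7_000_000      # Redland MySQL backend, depends on graph structure


def storage_gb(triples, triples_per_gb=TRIPLES_PER_GB):
    """Rough storage requirement in GB for a given number of triples."""
    return triples / triples_per_gb


estimate = storage_gb(TOTAL_TRIPLES)
print(f"about {estimate:.0f} GB")
```

Under these assumptions the estimate comes out at roughly 857 GB.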

As such, strategies for limiting scutter activity are needed: following every link (every rdfs:seeAlso) is no longer possible or desirable.

Naïve Scuttering

Follow every link.

Personal Scuttering

Start with one's own FOAF file and do a breadth-first scutter N levels deep.

This type of scuttering is best suited to determining the network around a specific user. It allows you to build a network of relationships. This is the method that TouchGraph uses to build graphs: first load one user and all the data around that user, then, upon request, load the same for others. It is good for building informative networks, including communities grouped around a central point.
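A depth-limited breadth-first scutter could be sketched roughly as follows. This is a minimal illustration, not a real scutter: `get_links` is a hypothetical callback standing in for "fetch the document and return its rdfs:seeAlso targets", and the toy link table uses made-up file names.

```python
from collections import deque


def personal_scutter(start_url, get_links, max_depth):
    """Breadth-first crawl from start_url, at most max_depth links away.

    get_links is a hypothetical callback returning the link targets
    found in the document at a URL. Returns a dict mapping each visited
    URL to the depth at which it was first reached.
    """
    depth_of = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = depth_of[url]
        if depth == max_depth:
            continue  # documents at the depth limit are not expanded
        for target in get_links(url):
            if target not in depth_of:
                depth_of[target] = depth + 1
                queue.append(target)
    return depth_of


# Toy link graph standing in for real FOAF documents (made-up names).
TOY_LINKS = {
    "me.rdf": ["alice.rdf", "bob.rdf"],
    "alice.rdf": ["carol.rdf", "me.rdf"],
    "bob.rdf": [],
    "carol.rdf": ["dave.rdf"],
}


def toy_get_links(url):
    return TOY_LINKS.get(url, [])


network = personal_scutter("me.rdf", toy_get_links, max_depth=2)
```

With N = 2, the toy crawl reaches carol.rdf but stops before dave.rdf, which is three links away from the starting file.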

...

Member Scuttering

Follow links from the FOAF files of a community's members, recursing N times from "inside" links.
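The description above leaves room for interpretation. One possible reading, sketched below, treats every member's file as "inside" (depth 0) and follows links until a document is more than N links away from the nearest member file. The function name, the `get_links` callback and the toy data are all assumptions made for illustration.

```python
from collections import deque


def member_scutter(member_urls, get_links, max_outside_depth):
    """Crawl outwards from all member files at once.

    Assumed reading: member documents are "inside" and count as depth 0;
    other documents are kept only if they lie within max_outside_depth
    links of some member document.
    """
    seeds = sorted(set(member_urls))
    depth_of = {url: 0 for url in seeds}
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if depth_of[url] == max_outside_depth:
            continue
        for target in get_links(url):
            if target not in depth_of:
                depth_of[target] = depth_of[url] + 1
                queue.append(target)
    return depth_of


# Made-up community: two members, one chain of outside documents.
COMMUNITY_LINKS = {
    "member-a.rdf": ["x.rdf"],
    "member-b.rdf": ["member-a.rdf"],
    "x.rdf": ["y.rdf"],
    "y.rdf": ["z.rdf"],
}

reached = member_scutter(
    ["member-a.rdf", "member-b.rdf"],
    lambda url: COMMUNITY_LINKS.get(url, []),
    max_outside_depth=2,
)
```

Because every member file is a seed, a link from one member to another never uses up any of the N levels.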

...

Link Weighted Scuttering

Assign weights to links, and only scutter links with a certain weight.
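As a rough sketch, "a certain weight" is read here as "at least some minimum weight". How a link gets its weight is left open by the text, so `get_weighted_links` is a hypothetical callback that returns (target, weight) pairs, and the numbers in the toy table are invented.

```python
def link_weighted_scutter(start_url, get_weighted_links, min_weight):
    """Follow only links whose weight is at least min_weight.

    get_weighted_links is a hypothetical callback returning
    (target_url, weight) pairs for the document at a URL.
    Returns the set of documents reached.
    """
    visited = {start_url}
    pending = [start_url]
    while pending:
        url = pending.pop()
        for target, weight in get_weighted_links(url):
            if weight >= min_weight and target not in visited:
                visited.add(target)
                pending.append(target)
    return visited


# Invented weights for illustration only.
WEIGHTED_LINKS = {
    "me.rdf": [("alice.rdf", 0.9), ("spam.rdf", 0.1)],
    "alice.rdf": [("carol.rdf", 0.6)],
    "spam.rdf": [("junk.rdf", 1.0)],
}

kept = link_weighted_scutter(
    "me.rdf", lambda url: WEIGHTED_LINKS.get(url, []), min_weight=0.5
)
```

Note that junk.rdf is never reached even though its link is heavy, because the only path to it goes through a link below the threshold.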

...

Note that link weighted scuttering could be combined with document weights, by making the document-specific weight of a link depend on the document itself.
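One simple way to make a link's weight depend on its document is to scale the link's own weight by the document weight. The multiplication below is purely an assumption chosen for illustration (the text does not prescribe a formula), and both function names are hypothetical.

```python
def effective_link_weight(link_weight, document_weight):
    """Scale a link's own weight by the weight of its containing document.

    Multiplication is an assumption; other combinations are possible.
    """
    return link_weight * document_weight


def links_worth_following(weighted_links, document_weight, min_weight):
    """Return the targets whose combined weight reaches min_weight."""
    return [
        target
        for target, weight in weighted_links
        if effective_link_weight(weight, document_weight) >= min_weight
    ]


# Invented example: the same two links seen in a good and a poor document.
links = [("a.rdf", 0.9), ("b.rdf", 0.4)]
from_good_document = links_worth_following(links, document_weight=1.0, min_weight=0.5)
from_poor_document = links_worth_following(links, document_weight=0.5, min_weight=0.5)
```

With this choice, a link that would be followed from a well-regarded document can fall below the threshold when it appears in a poor one.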

Document Weighted Scuttering

Assign weights to documents, and only follow links from documents with a certain weight.
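A sketch of this strategy, again reading "a certain weight" as a minimum threshold. A document has to be fetched before it can be weighed, so low-weight documents are still retrieved here; only their links are ignored. `get_links` and `get_document_weight` are hypothetical callbacks, and the toy data is made up.

```python
from collections import deque


def document_weighted_scutter(start_url, get_links, get_document_weight, min_weight):
    """Expand links only from documents weighing at least min_weight.

    Both callbacks are hypothetical stand-ins for fetching a document
    and scoring it. Returns the set of documents fetched.
    """
    fetched = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if get_document_weight(url) < min_weight:
            continue  # the document is kept, but its links are not followed
        for target in get_links(url):
            if target not in fetched:
                fetched.add(target)
                queue.append(target)
    return fetched


# Invented links and weights for illustration only.
DOC_LINKS = {
    "me.rdf": ["alice.rdf", "bulk.rdf"],
    "alice.rdf": ["carol.rdf"],
    "bulk.rdf": ["x.rdf", "y.rdf"],
}
DOC_WEIGHTS = {"me.rdf": 1.0, "alice.rdf": 0.8, "bulk.rdf": 0.1, "carol.rdf": 0.7}

docs = document_weighted_scutter(
    "me.rdf",
    lambda url: DOC_LINKS.get(url, []),
    lambda url: DOC_WEIGHTS.get(url, 0.0),
    min_weight=0.5,
)
```

In the toy data, bulk.rdf is fetched but scores too low, so x.rdf and y.rdf are never visited.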

...

Document Weight

Document weight should be a number expressing the quality of the document. The notion of quality is subjective, but the following evaluation points for a document could be a starting point:

Good

Statements about the document itself:: This includes foaf:topic, foaf:PersonalProfileDocument, foaf:primaryTopic, foaf:maker and the like.
Statements about the (primary) topic of the document:: To encourage the use of foaf:topic and foaf:primaryTopic.
Document signatures:: To promote trust and verification, documents with attached WOT signatures should be preferred over others.
Depictions:: Statements involving foaf:depiction, foaf:depicts, foaf:img and similar constructs (co-creation, co-location) should get a boost, to reflect that implicit connections are better than explicit ones.

...

Bad

rdfs:seeAlso's:: Links themselves don't add value.
Descriptions of other persons:: Documents with a bunch of explicit foaf:knows statements don't present any intrinsic value. These statements are typically not authoritative anyway, and decreasing their importance makes it possible to avoid LiveJournal to some degree.
Anonymous relations:: Resources without a URI or IFP are of no use when trying to make connections, so these should be discouraged. As with explicit foaf:knows, this should take care of some LiveJournal issues, as the interest specifications only have a title.
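The good and bad evaluation points above could be turned into a number along the following lines. This is only a sketch under several assumptions: triples are plain (subject, predicate, object) string tuples with prefixed names, blank nodes are written as "_:name", signature checking is reduced to a `signed` flag, and every numeric bonus or penalty is invented. The set of inverse functional properties is limited to three that FOAF defines.

```python
ABOUT_DOCUMENT = {"foaf:topic", "foaf:primaryTopic", "foaf:maker"}
DEPICTIONS = {"foaf:depiction", "foaf:depicts", "foaf:img"}
DISCOURAGED = {"rdfs:seeAlso", "foaf:knows"}
IFPS = {"foaf:mbox", "foaf:mbox_sha1sum", "foaf:homepage"}


def document_weight(doc_url, triples, signed=False):
    """Score a document; all bonuses and penalties are illustrative."""
    primary_topics = {
        o for s, p, o in triples if s == doc_url and p == "foaf:primaryTopic"
    }
    identified = {s for s, p, o in triples if p in IFPS}
    blank_nodes = set()
    score = 3.0 if signed else 0.0  # bonus for an attached signature
    for s, p, o in triples:
        is_profile_type = p == "rdf:type" and o == "foaf:PersonalProfileDocument"
        if p in DISCOURAGED:
            score -= 0.5  # bare links and explicit foaf:knows
        elif p in DEPICTIONS:
            score += 1.0  # implicit connections
        elif s == doc_url and (p in ABOUT_DOCUMENT or is_profile_type):
            score += 2.0  # statements about the document itself
        elif s in primary_topics:
            score += 1.0  # statements about the primary topic
        blank_nodes.update(n for n in (s, o) if n.startswith("_:"))
    # Anonymous resources with no IFP cannot be connected to anything.
    score -= 1.0 * len(blank_nodes - identified)
    return score


DOC = "http://example.org/foaf.rdf"
ME = "http://example.org/foaf.rdf#me"
EXAMPLE_TRIPLES = [
    (DOC, "rdf:type", "foaf:PersonalProfileDocument"),
    (DOC, "foaf:primaryTopic", ME),
    (ME, "foaf:name", "Example Person"),
    (ME, "foaf:img", "http://example.org/me.jpg"),
    (ME, "foaf:knows", "_:friend"),
    ("_:friend", "foaf:name", "A Friend"),
]

weight = document_weight(DOC, EXAMPLE_TRIPLES)
```

In the example, giving the anonymous friend a foaf:mbox_sha1sum would remove the penalty for that node, which is the behaviour the "anonymous relations" point asks for.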

...