ExpertFinder/EFW2007/Slot2_BenchmarkDatasets

From FOAF


ExpertFinder Breakout Session 16-Jan-2007 Berlin

Attending: Dan Brickley, Andreas Harth, Aidan Hogan, Tereza Iofciu

This group discussed how to acquire and use benchmark datasets for expert finding. A complete crawl of the RDF web would be a good starting point for creating a benchmark dataset. The crawled dataset could also be used to compute statistics on the usage of the FOAF vocabulary, which would help estimate how much damage a proposed fix or change to the vocabulary would cause. Changing the FOAF vocabulary amounts to changing the meaning of potentially hundreds of thousands of files on the Web. Analysing the dataset would make it possible to "fix the boat while sailing" with a minimal amount of damage inflicted on existing data files.
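As a minimal sketch of the vocabulary-usage statistics described above, the following Python counts, for each FOAF property, how many distinct crawled files use it. That count is a rough proxy for how many files a change to that property's meaning would affect. It assumes, hypothetically, that the crawl has been flattened into `(source_file, subject, predicate, object)` tuples with full URIs; the function name `property_file_counts` is illustrative and not part of any existing tool.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"


def property_file_counts(quads):
    """Map each FOAF property URI to the number of distinct files using it.

    `quads` is an iterable of (source_file, subject, predicate, object)
    tuples: a hypothetical flat export of the crawl.
    """
    files_per_property = defaultdict(set)
    for source, _subject, predicate, _obj in quads:
        # Only predicates in the FOAF namespace are of interest here.
        if predicate.startswith(FOAF):
            files_per_property[predicate].add(source)
    return {prop: len(files) for prop, files in files_per_property.items()}


if __name__ == "__main__":
    quads = [
        ("http://a.example/foaf.rdf", "_:me", FOAF + "name", "Alice"),
        ("http://a.example/foaf.rdf", "_:me", FOAF + "knows", "_:bob"),
        ("http://a.example/foaf.rdf", "_:bob", FOAF + "name", "Bob"),
        ("http://b.example/me.rdf", "_:me", FOAF + "name", "Bob"),
    ]
    # Most widely used properties first: changing these affects the most files.
    for prop, n in sorted(property_file_counts(quads).items(), key=lambda kv: -kv[1]):
        print(n, prop)
```

Counting distinct files rather than triples matters here: a file that uses foaf:name a thousand times is still only one file whose meaning changes.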

Some thoughts about the analysis that could be carried out:

* basic statistics: how many files have both a first_name and a lastName property and also foaf:name?
* identify FOAF data from different sources, e.g. LiveJournal, Tribe, Ecademy, Vox
* find deployment patterns (how often are RDF files linked from within HTML pages via the link rel tag, how often via the rdfs:seeAlso property, etc.)
* how often do people describe themselves vs. other people in RDF files?
* what makes a "FOAF file"? The ratio of FOAF triples to total triples in a file? A better term would be "FOAF description".
* what does foaf:interest link to: Wikipedia URLs? papers? other documents? dmoz categories?
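Several of the analyses listed above reduce to per-file predicate counts. The sketch below, again assuming a hypothetical `(source_file, subject, predicate, object)` layout for the crawl, computes the share of FOAF triples in each file and whether the file uses a first/last-name pair alongside foaf:name. Mapping the notes' "first_name + lastName" to foaf:firstName and foaf:lastName is an assumption; FOAF data in the wild also uses other name properties (e.g. foaf:surname), so the URIs are kept as constants that are easy to change. The 0.5 cut-off for calling a file a "FOAF description" is arbitrary and only for illustration.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"
# Assumed URIs for the notes' "first_name + lastName" pair; adjust as needed.
FIRST_NAME = FOAF + "firstName"
LAST_NAME = FOAF + "lastName"
NAME = FOAF + "name"
# Arbitrary illustrative threshold for calling a file a "FOAF description".
FOAF_RATIO_THRESHOLD = 0.5


def per_file_stats(quads):
    """Return {source_file: (foaf_ratio, has_first_and_last, has_name)}."""
    total = defaultdict(int)       # all triples per file
    foaf_count = defaultdict(int)  # triples whose predicate is in the FOAF namespace
    predicates = defaultdict(set)  # distinct predicates seen per file
    for source, _subject, predicate, _obj in quads:
        total[source] += 1
        predicates[source].add(predicate)
        if predicate.startswith(FOAF):
            foaf_count[source] += 1

    stats = {}
    for source, n in total.items():
        preds = predicates[source]
        has_first_and_last = FIRST_NAME in preds and LAST_NAME in preds
        stats[source] = (foaf_count[source] / n, has_first_and_last, NAME in preds)
    return stats


def summarize(stats):
    """Count files with first+last name AND foaf:name, and files above the ratio threshold."""
    both_name_styles = sum(1 for _r, split, full in stats.values() if split and full)
    foaf_descriptions = sum(1 for r, _s, _f in stats.values() if r >= FOAF_RATIO_THRESHOLD)
    return both_name_styles, foaf_descriptions


if __name__ == "__main__":
    DC_TITLE = "http://purl.org/dc/elements/1.1/title"
    quads = [
        ("http://a.example/foaf.rdf", "_:me", FIRST_NAME, "Alice"),
        ("http://a.example/foaf.rdf", "_:me", LAST_NAME, "Smith"),
        ("http://a.example/foaf.rdf", "_:me", NAME, "Alice Smith"),
        ("http://a.example/foaf.rdf", "_:doc", DC_TITLE, "My page"),
        ("http://b.example/page.rdf", "_:doc", DC_TITLE, "Notes"),
        ("http://b.example/page.rdf", "_:bob", NAME, "Bob"),
        ("http://b.example/page.rdf", "_:doc", DC_TITLE, "More notes"),
    ]
    print(summarize(per_file_stats(quads)))  # → (1, 1)
```

The name check is done per file, matching the question as phrased in the notes; checking per person (per subject) would be a stricter variant.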

What is the relation between Recommender Systems and Expert Finding? A list of experts can be used as an input to collaborative filtering systems. So determining experts on a topic could be one ingredient in a recommender system.
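One simple way a list of experts could feed a recommender, sketched here under our own assumptions rather than as something the group specified, is to rank items by how the identified experts rated them. This is an expert-filtered average, not full collaborative filtering; representing `ratings` as `(user, item, score)` tuples and the name `recommend_from_experts` are hypothetical.

```python
from collections import defaultdict


def recommend_from_experts(ratings, experts, top_n=3):
    """Rank items by their mean rating among a given set of experts.

    ratings: iterable of (user, item, score) tuples.
    experts: set of users judged to be experts on the topic, e.g. the
             output of an expert-finding step.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for user, item, score in ratings:
        # Ratings from non-experts are ignored entirely in this sketch.
        if user in experts:
            sums[item] += score
            counts[item] += 1
    means = {item: sums[item] / counts[item] for item in sums}
    # Highest mean first; ties broken alphabetically for a stable order.
    return sorted(means, key=lambda item: (-means[item], item))[:top_n]


if __name__ == "__main__":
    ratings = [
        ("alice", "paperA", 5),
        ("alice", "paperB", 3),
        ("bob", "paperB", 5),
        ("carol", "paperC", 5),
    ]
    print(recommend_from_experts(ratings, experts={"alice", "bob"}))
```

A real collaborative filtering system would more likely use expert status as a weight on user similarity rather than as a hard filter.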

What is the relation between ExpertFinder and Google for locating people? When would you need to locate people? For example:

* to find staff for a company/department
* to find experts in academia, medical experts, lawyers, etc.
* to get a landscape of important people in a given area, e.g. to find collaboration partners

Issue of different languages: English Wikipedia vs. Chinese Wikipedia.

Responsible to report: Andreas Harth

(Note from Axel: the TREC Enterprise track data might be interesting in this context.) Yes, a good starting point. However, the TREC Enterprise track dataset mainly consists of HTML pages, whereas we would like to focus on more structured data.