Restarting Linked Data from scratch, part 2

This post is part of a series, following on from my earlier post Restarting Linked Data from scratch, part 1. In this post I'm going to take the first step by trying to approach publishing and exposing linked data RESTfully.

I'm assuming that if you are reading this, you know what linked data is, and what REST is as per the dissertation of Roy T. Fielding. If not, go do some reading :)

Interface Constraints

REST is defined by four interface constraints:

  1. identification of resources
  2. manipulation of resources through representations
  3. self-descriptive messages
  4. hypermedia as the engine of application state.

From here I'll look at each of these four constraints and build up the approach as I go.

What a resource is

Quoting extensively from REST 5.2.1.1 Resources and Resource Identifiers:

The key abstraction of information in REST is a resource. Any information that can be named can be a resource: a document or image, a temporal service (e.g. "today's weather in Los Angeles"), a collection of other resources, a non-virtual object (e.g. a person), and so on. In other words, any concept that might be the target of an author's hypertext reference must fit within the definition of a resource...

A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time...

The values in the set may be resource representations and/or resource identifiers...

A resource can map to the empty set, which allows references to be made to a concept before any realization of that concept exists...

The only thing that is required to be static for a resource is the semantics of the mapping, since the semantics is what distinguishes one resource from another...

What a representation is

Again, quoting extensively from REST 5.2.1.2 Representations:

... A representation is a sequence of bytes, plus representation metadata to describe those bytes. Other commonly used but less precise names for a representation include: document, file, and HTTP message entity, instance, or variant...

If the value set of a resource at a given time consists of multiple representations, content negotiation may be used to select the best representation for inclusion in a given message...

The data format of a representation is known as a media type.

Identification of resources

To do this properly I need to identify some resources, so for this I'm going to work with "Something" :)

  • "Something" - a resource, a non-virtual object

At any point in time I have a description of Something which has multiple representations in different mediatypes, all semantically matching or equivalent:

  • "something.rdf" - representation of Something with mediatype application/rdf+xml
  • "something.n3" - representation of Something with mediatype RDF+N3
  • "something.en.html" - representation of Something, in English, with mediatype text/html
  • "something.de.html" - representation of Something, in German, with mediatype text/html

Each one of those representations is also a resource, because each can be the target of a hyperlink. Of course by resource I mean a conceptual mapping to each of the things listed; I haven't assigned URIs yet, but will.

To make this set of representations manageable, and to indicate that they form a set, I'm going to add another resource: a collection of resources, which holds these equivalent representations of Something at a fixed point in time. For the purpose of this exercise, that point in time is today.

  • "Something-20100311" - a resource which is a collection of equivalent representations of Something on the 11th March 2010.

Additionally, for the sake of argument, I'm going to say that a new set of representations (or version) is added every day. To handle this I need one more resource: a collection whose members are themselves collections of equivalent representations, such as the example "Something-20100311". This gives me a conceptual mapping which covers time, and therefore everything I could need.

  • "Somethings" - a resource which is a collection of resources, see above for full description!

Finally, I'm going to add in two shortcut resources which have no representation and are simply conceptual maps to the first and most current sets of representations.

  • "first" - a resource which always maps to the first collection of representations of Something.
  • "latest" - a resource which maps to the most recent collection of representations of Something.

Giving the resources URIs

Now to assign some URIs. For this use case there is no set structure and I'm not going to define one, because it is up to each server (or its manager) to control its own URI space, but for the sake of this exercise I'll define mine as follows:


base: http://data.webr3.org
...
/d/Something
/rg/Somethings
/rg/Somethings/first
/rg/Somethings/latest
/rg/Somethings/Something-20100311
/rg/Somethings/Something-20100311/something.rdf
/rg/Somethings/Something-20100311/something.n3
/rg/Somethings/Something-20100311/something.en.html
/rg/Somethings/Something-20100311/something.de.html
...
/rg/Somethings/Something-20100305
/rg/Somethings/Something-20100305/something.rdf
/rg/Somethings/Something-20100305/something.n3
/rg/Somethings/Something-20100305/something.en.html
/rg/Somethings/Something-20100305/something.de.html
...

From the above you can see that every possible representation has its own URI. In addition, every collection of equivalent representations has its own URI, as does the collection of all those collections, and so does "Something", our non-virtual object.
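As a rough sketch of how that URI space could be generated, assuming one collection per day and the same four file names in each: the helper names `collection_name` and `uri_space` are made up for illustration, and the layout is just my example layout, not a required structure.

```python
from datetime import date

BASE = "http://data.webr3.org"

# File names taken from the example listing; assumed identical in every collection.
REPRESENTATIONS = [
    "something.rdf",
    "something.n3",
    "something.en.html",
    "something.de.html",
]


def collection_name(day: date) -> str:
    """Name of the collection of equivalent representations for one day."""
    return f"Something-{day:%Y%m%d}"


def uri_space(days):
    """List every URI in the example layout, newest collection first."""
    uris = [
        f"{BASE}/d/Something",            # the non-virtual object
        f"{BASE}/rg/Somethings",          # collection of collections
        f"{BASE}/rg/Somethings/first",    # shortcut, no representation
        f"{BASE}/rg/Somethings/latest",   # shortcut, no representation
    ]
    for day in sorted(days, reverse=True):
        collection = f"{BASE}/rg/Somethings/{collection_name(day)}"
        uris.append(collection)
        uris.extend(f"{collection}/{name}" for name in REPRESENTATIONS)
    return uris


if __name__ == "__main__":
    for uri in uri_space([date(2010, 3, 5), date(2010, 3, 11)]):
        print(uri)
```

Only the server needs to know this structure; clients should still treat each URI as opaque.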

Also, we've exposed multiple resources which could serve as RESTful CRUD access points operating on an AtomPub-style protocol. Small sentence, big potential; I will cover approaches and protocols in later posts.

The Key resource

The most important thing, which I haven't yet covered, is that we've exposed a key resource, namely /rg/Somethings. This is a resource at the top of the representation chain which can be used to expose content negotiation, be it server or agent driven (or a mix of both), and regardless of the mappings and levels of collection further down the line this can always be a single point of entry to get representations.

I'll cover just how in a moment, but for now something important.

Important

I've had to give a fixed example just to make some progress, but we have to remember that every system has different needs. In some cases there may be only a single fixed representation for a resource, whilst in others each strand of representation (like something.de.html) may take its own versioning / temporal path. This could indicate that a structure such as the following may be in order:

...
/d/Something
/rg/Somethings
/rg/Somethings/first
/rg/Somethings/latest
/rg/Somethings/Something-20100311
/rg/Somethings/Something-20100305
/rg/Somethings/Something-rdf
/rg/Somethings/Something-rdf/20100311.rdf
/rg/Somethings/Something-rdf/20100305.rdf
/rg/Somethings/Something-html-en
/rg/Somethings/Something-html-en/20100311.html
/rg/Somethings/Something-html-en/20100305.html
/rg/Somethings/Something-html-de
/rg/Somethings/Something-html-de/20100308.html
/rg/Somethings/Something-html-de/20100303.html

The above highlights that whilst we may have added more resources, the core resources are still the same. Remember that they are "conceptual maps": Something-20100311 may "map" to the version of en-html from the 11th and the version of de-html from the 8th, because the de version was written first, then translated to English, and from there turned into rdf and so forth. They are all semantically equivalent, containing the same information even though they were created at different times.
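To make that mapping concrete, here is a minimal sketch that resolves a dated collection to the newest version of each strand dated on or before that day. The `STRANDS` table and the `snapshot` function are hypothetical; the dates and paths come from the listing above, and "newest on or before" is my assumption about how a server might pick members.

```python
from bisect import bisect_right

# (date stamp, file name) per strand, oldest first; values from the example listing.
STRANDS = {
    "Something-rdf": [("20100305", "20100305.rdf"), ("20100311", "20100311.rdf")],
    "Something-html-en": [("20100305", "20100305.html"), ("20100311", "20100311.html")],
    "Something-html-de": [("20100303", "20100303.html"), ("20100308", "20100308.html")],
}


def snapshot(day: str) -> dict:
    """Map a day (YYYYMMDD) to the newest version of each strand on or before it."""
    members = {}
    for strand, versions in STRANDS.items():
        stamps = [stamp for stamp, _ in versions]
        # YYYYMMDD strings sort chronologically, so plain string comparison works.
        index = bisect_right(stamps, day)
        if index:  # skip strands that had no version yet on that day
            members[strand] = f"/rg/Somethings/{strand}/{versions[index - 1][1]}"
    return members


if __name__ == "__main__":
    for strand, path in snapshot("20100311").items():
        print(strand, "->", path)
```

The dates live in the server's own table rather than being parsed back out of URIs, so the URIs themselves stay opaque.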

The Conceptual Maps are as follows; from what I can tell, these should cover any use-case, no matter how complex.


Thing 1-1 CollectionOfCollections
CollectionOfCollections 1-* CollectionOfEquivalentRepresentations
CollectionOfEquivalentRepresentations 1-* Representation

Aside: at times like this I wish I'd had a chance to study computer science so that I could express these things formally, so you'll have to make sense of it as best you can :( sorry.
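One informal way to write the three maps down is as plain data classes. This is only a sketch: the class names follow the maps above, but the field names and the `first` / `latest` helpers are my own assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Representation:
    uri: str
    media_type: str
    language: Optional[str] = None


@dataclass
class CollectionOfEquivalentRepresentations:
    uri: str
    # 1-*: one collection holds many equivalent representations
    representations: List[Representation] = field(default_factory=list)


@dataclass
class CollectionOfCollections:
    uri: str
    # 1-*: one entry per version, kept oldest first
    collections: List[CollectionOfEquivalentRepresentations] = field(default_factory=list)

    def first(self) -> CollectionOfEquivalentRepresentations:
        return self.collections[0]

    def latest(self) -> CollectionOfEquivalentRepresentations:
        return self.collections[-1]


@dataclass
class Thing:
    uri: str
    # 1-1: each thing has exactly one collection of collections
    versions: CollectionOfCollections


def example() -> Thing:
    """Build the Something example with two daily collections."""
    base = "http://data.webr3.org"
    days = []
    for stamp in ("20100305", "20100311"):
        uri = f"{base}/rg/Somethings/Something-{stamp}"
        days.append(CollectionOfEquivalentRepresentations(uri, [
            Representation(f"{uri}/something.rdf", "application/rdf+xml"),
            Representation(f"{uri}/something.en.html", "text/html", "en"),
            Representation(f"{uri}/something.de.html", "text/html", "de"),
        ]))
    somethings = CollectionOfCollections(f"{base}/rg/Somethings", days)
    return Thing(f"{base}/d/Something", somethings)
```

Here `first()` and `latest()` play the role of the two shortcut resources: they have no data of their own and simply point into the collection.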

Exposing via Content Negotiation

In my research so far, I've been able to figure out how to expose all of the aforementioned via HTTP, RESTfully, using content negotiation in a manner which seems to be transparent to existing web browsers, but which exposes all the information needed in a manner that is visible to machines, without using any additional extension headers. As follows:

1 The client does a normal GET request on our "Something", notice that no content negotiation is happening yet, we are simply asserting via a 303 "that the requested resource does not have a representation of its own that can be transferred by the server over HTTP."

#Request
GET /d/Something HTTP/1.1
Host: data.webr3.org
Accept: text/html;q=0.5, application/rdf+xml


#Response
HTTP/1.1 303 See Other
Location: http://data.webr3.org/rg/Somethings

2 The client does a GET on the URI we specified in the Location field, namely to our key resource that can be used for content negotiation over all the representations.

#Request
GET /rg/Somethings HTTP/1.1
Host: data.webr3.org
Accept: text/html;q=0.5, application/rdf+xml


#Response
HTTP/1.1 300 Multiple Choices
Location: http://data.webr3.org/rg/Somethings/latest
Content-Type: application/xhtml+xml
Content-Length: 17400


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
version="XHTML+RDFa 1.0" xml:lang="en">
...

Here's where it gets interesting and clients can take different routes; first the route of the typical user agent:

User Agent Route


#Request
GET /rg/Somethings/latest HTTP/1.1
Host: data.webr3.org
Accept: text/html;q=0.5, application/rdf+xml


#Response
HTTP/1.1 307 Temporary Redirect
Location: http://data.webr3.org/rg/Somethings/Something-20100311


#Request
GET /rg/Somethings/Something-20100311 HTTP/1.1
Host: data.webr3.org
Accept: text/html;q=0.5, application/rdf+xml


#Response
HTTP/1.1 200 OK
Vary: Accept
ETag: W/"xyzzy"
Last-Modified: Thu, 11 Mar 2010 12:45:26 GMT
Content-Type: application/xhtml+xml
Content-Length: 17400
Content-Language: en
Content-Location: http://data.webr3.org/rg/Somethings/Something-20100311/something.en.html


<!DOCTYPE html...

First you can see that the user agent simply goes straight through to the most recent content, which is what they expect to see, with server-driven content negotiation selecting the representation along the way.
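For what it's worth, here is a minimal sketch of how a server might do that selection on the Accept header alone. It is a simplification under stated assumptions: `parse_accept`, `quality` and `choose` are hypothetical helpers, only q-values are handled (no other media type parameters), language is ignored, and the variant list is cut down to two of the example representations.

```python
def parse_accept(header: str) -> dict:
    """Parse an Accept header into {media range: q-value}."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_range] = q
    return prefs


def quality(media_type: str, prefs: dict) -> float:
    """Quality for one media type: exact match, then type/*, then */*."""
    if media_type in prefs:
        return prefs[media_type]
    wildcard = media_type.split("/")[0] + "/*"
    if wildcard in prefs:
        return prefs[wildcard]
    return prefs.get("*/*", 0.0)


def choose(header: str, variants):
    """Pick the (media type, name) variant with the highest quality, or None."""
    prefs = parse_accept(header)
    best = max(variants, key=lambda variant: quality(variant[0], prefs))
    return best if quality(best[0], prefs) > 0 else None


# Two of the example representations in Something-20100311 (assumed subset).
VARIANTS = [
    ("text/html", "something.en.html"),
    ("application/rdf+xml", "something.rdf"),
]

if __name__ == "__main__":
    print(choose("text/html;q=0.5, application/rdf+xml", VARIANTS))
    print(choose("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8", VARIANTS))
```

Note that under this simple rule the Accept header used in the requests above would select the RDF/XML variant, since it carries the higher q-value; a browser-style header that ranks text/html highest gets something.en.html.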

Further, we can see that the caching metadata is in there (Vary, ETag and Last-Modified), which as we know speeds up the net, and further still we have a rather nifty "weak" entity tag. This entity tag is shared by all representations which are semantically equal, and asserts that they are equal. It's also worth noting that you could add this entity tag to your RDF graphs and further assert provenance, which could come in very handy down the line for POST and PUT implementations.
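A small sketch of what "weak" means in practice, using HTTP's weak and strong comparison rules for entity tags. The function names are mine; the tag `W/"xyzzy"` is the one from the response above, and `W/"plugh"` is a made-up second tag.

```python
def opaque_tag(etag: str) -> str:
    """Strip the weakness marker, leaving the quoted opaque tag."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def is_weak(etag: str) -> bool:
    return etag.strip().startswith("W/")


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque tags are equal, weak or not."""
    return opaque_tag(a) == opaque_tag(b)


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag is weak and both are identical."""
    return not is_weak(a) and not is_weak(b) and a.strip() == b.strip()


if __name__ == "__main__":
    rdf_tag = 'W/"xyzzy"'   # e.g. sent with something.rdf
    html_tag = 'W/"xyzzy"'  # e.g. sent with something.en.html
    print(weak_match(rdf_tag, html_tag))    # semantically equivalent
    print(strong_match(rdf_tag, html_tag))  # not byte-for-byte identical
```

So two representations sharing the weak tag compare as equivalent, but never as byte-for-byte identical.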

To recap, common user agents just go straight through to the expected resource via server driven content negotiation and can take full advantage of cache / control data.
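The hops of that route can be sketched as a tiny lookup table, with no real networking involved. The `REDIRECTS` table just restates the status codes and Location values from the exchanges above, and `follow` is a hypothetical helper that mimics a user agent which always takes the Location it is given (for the 300, following the server's preferred choice automatically is optional for the client).

```python
# path -> (status code, Location) as shown in the example exchanges
REDIRECTS = {
    "/d/Something": (303, "/rg/Somethings"),
    "/rg/Somethings": (300, "/rg/Somethings/latest"),
    "/rg/Somethings/latest": (307, "/rg/Somethings/Something-20100311"),
}


def follow(path: str, limit: int = 10):
    """Follow Location values until a path has none; return (hops, final path)."""
    hops = []
    while path in REDIRECTS and len(hops) < limit:  # limit guards against loops
        status, location = REDIRECTS[path]
        hops.append((status, path))
        path = location
    return hops, path


if __name__ == "__main__":
    hops, final = follow("/d/Something")
    for status, path in hops:
        print(status, path)
    print("negotiated at", final)
```

The final path is the collection of equivalent representations, which is where the server-driven negotiation on the Accept header takes place.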

The Machine Route

Back at step 2 the server returned a 300 Multiple Choices as soon as /rg/Somethings was requested. All-important was that the entity returned was XHTML+RDFa (although this could have been Atom or similar), which means we can give both a human and machine readable list of all our various representations; the "machine" can then select which one it finds most fitting. The choices could be expressed using any suitable ontology, and further, both Alternates and Link headers could be added if publishers wished.
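As an illustration of the Link header option only, here is a sketch that builds one Link header value listing some of the representations as alternates. The `link_header` helper is hypothetical, and the choice of `rel="alternate"` with `type` and `hreflang` parameters is my assumption about how a publisher might describe the choices, not something the approach requires.

```python
COLLECTION = "http://data.webr3.org/rg/Somethings/Something-20100311"

# (file name, media type, language) for some of the example representations
ALTERNATIVES = [
    ("something.rdf", "application/rdf+xml", None),
    ("something.en.html", "text/html", "en"),
    ("something.de.html", "text/html", "de"),
]


def link_header(collection: str, alternatives) -> str:
    """Build a single Link header value with one link per representation."""
    links = []
    for name, media_type, language in alternatives:
        params = ['rel="alternate"', f'type="{media_type}"']
        if language:
            params.append(f'hreflang="{language}"')
        links.append(f"<{collection}/{name}>; " + "; ".join(params))
    return ", ".join(links)


if __name__ == "__main__":
    print("Link:", link_header(COLLECTION, ALTERNATIVES))
```

A machine client could then pick a target from the header without parsing the XHTML+RDFa body at all.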

I think that covers it all, if there are any errors or things I've missed please do let me know asap; but for now that'll do me - it's verbose, but I like verbose - prove it works then optimise it later :)

Printed from: http://webr3.org/blog/linked-data/restarting-linked-data-from-scratch-part-2/ .

1 Comment   »

  • Hi Nathan!

    I don't think there are any errors here, but I do have a couple suggestions:

    * Keep in mind TBL's URI Axioms, especially...

    Axiom: Opacity of URIs
    The only thing you can use an identifier for is to refer to an object. When you are not dereferencing, you should not look at the contents of the URI string to gain other information.

    * As a sanity-check, consider how one would configure Apache to implement these; see e.g. Apache's docs on Content Negotiation.

    * I think you over-emphasize the "user" in User Agent, but keep in mind I've been yelling at people for talking about "clicking on DOIs" for fifteen years, too! But seriously, browsers are merely "machines" operating on the user's behalf, playing their role in the HTTP transaction based on (a) hard coding, (b) configuration and (c) user response. So I think less emphasis on human interaction, perhaps a single unified user agent section, would make the discussion less confusing ;)

    John
