Maybe we don't need Named Graphs

In this post I'll put forward an argument that perhaps the "web of linked data", and thus RDF(2)/OWL(2), doesn't need any concept of Named Graphs.

This is quite a dry subject, and I could be wrong (in fact, in some ways I want to be proved wrong, since that is how we learn), but do read on if you're interested.

Example

Over the past few months I've run into a number of situations where I was convinced I needed Named Graphs in order to address the task at hand.

A notable example is access control: using WebAccessControl and the ACL ontology, a system has to work out who should be granted access to a resource, and who should be denied.

In this example I'll cover the notion of ACL for "groups" in a linked data world.

The task at hand is to allow access if:
the graph serialized within the document obtained by dereferencing the URI of the group states that <webid#me> is a member.

Otherwise written as:
If we dereference <groups#admin>, does the returned graph include the triple { <groups#admin> sioc:has_member <webid#me> }?

Or in SPARQL:

PREFIX sioc: <http://rdfs.org/sioc/ns#>
ASK {
  GRAPH <groups> {
    <groups#admin> sioc:has_member <webid#me>
  }
}

In this example we *do not* want to dereference the user's WebID to see whether the returned graph states { <webid#me> sioc:member_of <groups#admin> }, nor to consider the open-world possibility that another, as yet unknown, graph could assert that the user is a member of our admin group, as that would breach security.

The ACL

To proceed with the example, consider the following ACL:

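@prefix acl: <http://www.w3.org/ns/auth/acl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix : <#> .  # assumed: :mygroup is defined in this ACL document

[] a acl:Authorization ;
	acl:accessTo <https://example.org/sensitive> ;
	acl:agentClass :mygroup ;
	acl:mode acl:Read .

:mygroup owl:equivalentClass [
	a owl:Restriction ;
	owl:hasValue <groups#admin> ;
	owl:onProperty [ owl:inverseOf sioc:has_member ]
	] .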

The Problem

The problem posed by this ACL is that any of the following four sets of triples would entail that <webid#me> is an instance of :mygroup (or a member of <groups#admin>, if you prefer).

  • <webid#me> sioc:member_of <groups#admin> .
  • <webid#me> _:x <groups#admin> .
    _:x owl:inverseOf sioc:has_member .
  • <groups#admin> sioc:has_member <webid#me> .
  • <groups#admin> _:y <webid#me> .
    _:y owl:inverseOf sioc:member_of .
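To make the problem concrete, here is a minimal Python sketch of a naive membership check that merges triples from every source and accepts any of the four patterns above. The string tuples, the `is_member_naive` and `inverse_pairs` helpers, and treating SIOC's has_member/member_of pair as built-in inverses are all assumptions for illustration, not a real OWL reasoner.

```python
# Property names as they appear in the post.
HAS_MEMBER = "sioc:has_member"
MEMBER_OF = "sioc:member_of"
INVERSE_OF = "owl:inverseOf"


def inverse_pairs(triples):
    """Property pairs known to be inverses: SIOC's own pair (assumed built in)
    plus any owl:inverseOf declarations found in the merged triples."""
    pairs = {(HAS_MEMBER, MEMBER_OF), (MEMBER_OF, HAS_MEMBER)}
    for s, p, o in triples:
        if p == INVERSE_OF:
            pairs.add((s, o))
            pairs.add((o, s))
    return pairs


def is_member_naive(triples, agent, group):
    """True if ANY of the four patterns occurs, regardless of its source."""
    inv = inverse_pairs(triples)
    for s, p, o in triples:
        # <group> has_member <agent>, or via a property inverse to member_of
        if s == group and o == agent and (p == HAS_MEMBER or (p, MEMBER_OF) in inv):
            return True
        # <agent> member_of <group>, or via a property inverse to has_member
        if s == agent and o == group and (p == MEMBER_OF or (p, HAS_MEMBER) in inv):
            return True
    return False


# The user asserts membership in their own, untrusted, document.
claimed = {("<webid#me>", MEMBER_OF, "<groups#admin>")}
print(is_member_naive(claimed, "<webid#me>", "<groups#admin>"))  # True
```

Because nothing records where a triple came from, a claim published in the user's own document is enough to get in.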

In other words, the ACL does not specify a "Named Graph" to query, and at the moment there is no way, in either OWL or RDF, to specify which "Named Graph" to query or trust.

This, case in point, is one example where I saw the need for Named Graphs in RDF and OWL.

Another way of looking at it

You will have noticed the notion of "Named Graphs" creeping in above. It seems a logical thing to say, especially when you consider that to process this ACL and grant access you would probably use SPARQL and specify a Named Graph to query over. However, much of what follows arose because I had decided not to use SPARQL, and instead to code an ACL processor in my preferred language.

Consider the situation: the ACL processor, which decides whether access should be granted, must implicitly "trust" the document containing the serialized ACL graph. That is to say, by extension it must trust any resources pointed to by that ACL; if it doesn't, the ACL isn't fit for purpose.

It's also important to note that "trust" is context-specific: in this case we trust the resources pointed to by the ACL for the purpose of WebAccessControl.

One could then quite quickly conclude that in this scenario the ACL processor already knows how to process the ACL: it must only use resources it trusts, and therefore it must only allow access if the graph serialized within the document obtained by dereferencing the URI of the group states that <webid#me> is a member (because <groups#admin> is specified in the ACL and is thus, by extension, trusted).
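A minimal Python sketch of that processor logic follows. It assumes a hypothetical in-memory `WEB` dictionary in place of real HTTP requests; the URIs and the `dereference` and `may_access` helpers are illustrative only. The one graph consulted is the one obtained by dereferencing the group URI named in the ACL.

```python
from urllib.parse import urldefrag

HAS_MEMBER = "sioc:has_member"

# Hypothetical stand-in for the web: document URI -> triples a GET would return.
# A real processor would make an HTTP request and parse the response instead.
WEB = {
    "https://example.org/groups": {
        ("https://example.org/groups#admin", HAS_MEMBER, "https://alice.example/webid#me"),
    },
    "https://mallory.example/webid": {
        # Mallory claims membership in her own document; it must not count.
        ("https://example.org/groups#admin", HAS_MEMBER, "https://mallory.example/webid#me"),
    },
}


def dereference(uri):
    """Return the graph served for the document part of `uri` (fragment dropped)."""
    document, _fragment = urldefrag(uri)
    return WEB.get(document, set())


def may_access(agent, group):
    """Grant access only if the group's own document lists the agent.

    The group URI comes from the (trusted) ACL, so its document is the only
    source consulted; no other graph is read or merged.
    """
    return (group, HAS_MEMBER, agent) in dereference(group)


print(may_access("https://alice.example/webid#me", "https://example.org/groups#admin"))    # True
print(may_access("https://mallory.example/webid#me", "https://example.org/groups#admin"))  # False
```

Mallory's claim lives in her own document, which this processor never reads, so it has no effect.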

Named Graphs in SPARQL

The aforementioned logic would also apply if I were using SPARQL to process the ACL; it would equate to the ACL processor asking:

PREFIX sioc: <http://rdfs.org/sioc/ns#>
ASK {
  GRAPH <groups> {
    <groups#admin> sioc:has_member <webid#me>
  }
}

But again, this is very specific to the context of the example. Let's consider for a moment that the URI for the group could have been a non-fragment URI, for example <groups/admin>.

This leads us to an important problem. When we dereference <groups/admin>, it would have to 303 See Other through to a different URI, let's say <data/groups/admin>, which would mean that the Named Graph to be used was <data/groups/admin>. That URI, you may note, is one we do not know when we are writing our ACL; so if we ASKed the above SPARQL, the results would always come back negative, since the data would be stored under <data/groups/admin> rather than under the graph name we queried.

The Named Graph URI issue is compounded by modern web servers and publishing practices, because <data/groups/admin> could easily be content-negotiated (or rewritten), giving various final URIs such as <data/groups/admin>, <data/groups/admin.rdf>, <data/groups/admin.ttl>, <data/groups/admin.n3> and so forth. One can quite easily (and often does) end up with the same graph repeated multiple times within a quad store, all under "different" "Named Graphs".
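The sketch below imitates that behaviour under stated assumptions: the `REDIRECTS` and `CONNEG` tables describe a hypothetical server, and the hypothetical `load_naively` helper names each graph after the URL a GET finally resolved to. Loading the same group twice with different `Accept` headers yields two graph names with identical content, neither of which is the URI written in the ACL.

```python
# Hypothetical server behaviour, for illustration only:
# <groups/admin> answers 303 See Other -> <data/groups/admin>, which is then
# content negotiated (or rewritten) to a format-specific URL.
REDIRECTS = {"https://example.org/groups/admin": "https://example.org/data/groups/admin"}
CONNEG = {"application/rdf+xml": ".rdf", "text/turtle": ".ttl"}
GROUP_TRIPLES = frozenset({
    ("https://example.org/groups/admin", "sioc:has_member", "https://alice.example/webid#me"),
})


def fetch(uri, accept):
    """Return (final_url, triples) as a redirect-following client would see them."""
    url = REDIRECTS.get(uri, uri)
    final_url = url + CONNEG.get(accept, "")
    return final_url, GROUP_TRIPLES


def load_naively(store, uri, accept):
    """Name the graph after the URL the GET finally resolved to."""
    final_url, triples = fetch(uri, accept)
    store.setdefault(final_url, set()).update(triples)


store = {}
load_naively(store, "https://example.org/groups/admin", "text/turtle")
load_naively(store, "https://example.org/groups/admin", "application/rdf+xml")
print(sorted(store))
# prints ['https://example.org/data/groups/admin.rdf', 'https://example.org/data/groups/admin.ttl']
```

A `GRAPH <groups/admin>` pattern would match neither of these names.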

I'll expand on a possible way of addressing this problem further on.

Directionality

Previously I mentioned that the ACL processor had no problem with the above ACL because, by nature, it trusted all resources mentioned in the ACL graph. However, again, this is very context-specific.

Let's consider for a moment an inverted ACL, where we want to allow access if:
the graph serialized within the document obtained by dereferencing the URI of the user's WebID states that <webid#me> is a sioc:member_of <groups#admin>.

We don't know the user's WebID ahead of time when we write the ACL, so again we have no way of stating which resource to trust. It is critical to note that even if RDF(2) did support the concept of Named Graphs, it still wouldn't address the situation, because we wouldn't know the Named Graph ahead of time in order to trust it!

If we now consider the following ACL:

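@prefix acl: <http://www.w3.org/ns/auth/acl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix : <#> .  # assumed: :mygroup is defined in this ACL document

[] a acl:Authorization ;
	acl:accessTo <https://example.org/sensitive> ;
	acl:agentClass :mygroup ;
	acl:mode acl:Read .

:mygroup owl:equivalentClass [
	a owl:Restriction ;
	owl:hasValue <groups#admin> ;
	owl:onProperty sioc:member_of
	] .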

Following our previous logic, we would again conclude that we should be querying the "trusted" resource <groups#admin>, which gives us another problem: that's not the resource we want to be asking in this scenario.

The only thing that remains, and I'll later argue the only thing that ever matters in a web of linked data, is direction.

If we analyse the first ACL more closely, we can see that we ultimately used the direction implied by owl:inverseOf to place <groups#admin> in the subject position, rather than in the value/object position suggested by owl:hasValue (bear with me).

In this example, we can use the strong semantics of owl:hasValue (and the lack of owl:inverseOf) to place <groups#admin> in the value/object position, and thus our ACL processor can reach the outcome we want: look for a triple with the meaning { <webid#me> sioc:member_of <groups#admin> }, which means dereferencing the URI in the subject position. In other words, we ask whether the graph serialized in the document returned by GETting <webid> contains such a triple.
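The directional reading described here can be sketched in a few lines of Python. The `Restriction` class is a hypothetical in-memory form of the owl:Restriction in each ACL, and the rule it applies (owl:inverseOf puts the owl:hasValue URI in the subject position; its absence puts the agent there) is the heuristic argued for in this post, not something OWL itself specifies.

```python
from dataclasses import dataclass
from urllib.parse import urldefrag


@dataclass(frozen=True)
class Restriction:
    """Hypothetical in-memory form of the owl:Restriction in an ACL."""
    on_property: str  # e.g. "sioc:has_member" or "sioc:member_of"
    inverse: bool     # True if the property is wrapped in [ owl:inverseOf ... ]
    has_value: str    # the owl:hasValue URI, e.g. the group


def triple_to_look_for(restriction, agent):
    """Orient the triple: with owl:inverseOf the hasValue URI is the subject,
    otherwise the agent is."""
    if restriction.inverse:
        return (restriction.has_value, restriction.on_property, agent)
    return (agent, restriction.on_property, restriction.has_value)


def document_to_dereference(triple):
    """Whatever sits in the subject position decides which document to GET."""
    return urldefrag(triple[0]).url


agent = "https://alice.example/webid#me"
group = "https://example.org/groups#admin"
first = Restriction("sioc:has_member", True, group)    # the first ACL
second = Restriction("sioc:member_of", False, group)   # the inverted ACL
for r in (first, second):
    triple = triple_to_look_for(r, agent)
    print(triple, "->", document_to_dereference(triple))
```

For the first ACL this dereferences the group's document; for the inverted ACL it dereferences the user's WebID document, which is the direction the scenario calls for.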

I've applied an understanding to OWL that quite simply isn't there, though: as I stated earlier, both ACL examples could easily equate to looking for any one of those four sets of triples.

However, this is the point: machine understanding of data is in the domain of the machine, the application doing the processing, and "truth" or "trust" is entirely context-specific.

I'm increasingly convinced that the combined context of the data in a graph and the context in which that graph is being queried specifies (or implies) the direction in which you want to read, and that with Linked Data, directionality is determined by dereferencing whichever URI you place on the left, in the subject position.

I recently found that Tim Berners-Lee wrote about this in a blog post entitled "Backward and Forward links in RDF just as important":

One meme of RDF ethos is that the direction one choses for a given property is arbitrary: it doesn't matter whether one defines "parent" or "child"; "employee" or "employer". This philosophy (from the Enquire design of 1980) is that one should not favor one way over another. One day, you may be interested in following the link one way, another day, or somene else, the other way.

Key here is the sentence "One day, you may be interested in following the link one way, another day, or somene else, the other way." That is exactly what all these examples are doing: following a link one way, or the other way.

To conclude this part: in every scenario thus far where I've thought I needed Named Graphs, it turned out that I in fact needed directionality. Because I'm dealing with Linked Data, whatever I place in the subject position defines the URI I need to dereference, and ultimately the graph(s) considered when resolving the answer to the question being ASKed.

I'd thus suggest that "Named Graphs" do not exist in a web of data. They are needed in N3 and when using rules, because all the data is often in a single file; that is not the case for Linked Data, where we dereference.

Back to SPARQL and Named Graphs

Previously I mentioned the complications with the way we currently use Named Graphs in SPARQL and in our quad stores, where the URI we end up using could be literally anything, and we often get duplicate data under different graphs.

To address this, I'd suggest that what we store as the graph ?g value should not be some made-up "named graph", but rather the dereferenced URI we initially requested:

  • in the case of <groups#admin> this would be <groups>;
  • in the case of <groups/admin> this would be <groups/admin>.

To clarify: *never* the URI that a GET request finally resolves to, and *always* the URI we initially requested and dereferenced.
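A minimal sketch of this naming rule, assuming graphs are kept in a plain Python dictionary; `graph_name` and `load` are hypothetical helpers. The graph name is derived from the URI we asked for, with any fragment removed, so whatever redirects or content negotiation happen afterwards cannot change it.

```python
from urllib.parse import urldefrag


def graph_name(requested_uri):
    """Name a graph after the URI we asked for, minus any fragment.

    The URL the GET finally resolved to (after a 303 or content negotiation)
    is deliberately never used.
    """
    return urldefrag(requested_uri).url


def load(store, requested_uri, triples):
    """Add fetched triples to the graph named by the requested URI."""
    store.setdefault(graph_name(requested_uri), set()).update(triples)


print(graph_name("https://example.org/groups#admin"))  # https://example.org/groups
print(graph_name("https://example.org/groups/admin"))  # https://example.org/groups/admin

store = {}
triples = {("https://example.org/groups/admin", "sioc:has_member",
            "https://alice.example/webid#me")}
# Even if these two fetches ended at different final URLs (say .ttl and .rdf),
# both land in the same graph, because only the requested URI is used.
load(store, "https://example.org/groups/admin", triples)
load(store, "https://example.org/groups/admin", triples)
print(list(store))  # ['https://example.org/groups/admin']
```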

The above would ensure that we never again have duplicate data in our quad stores, that SPARQL queries including a FROM clause always dereference, and that publishers and web server administrators are free to relocate and restructure their data; ultimately it would make for a much nicer, healthier web of data.

Cool URIs don't change, and these wouldn't: just because the final document serializing a graph moves to a different URI doesn't mean the original URI has to change.

Conclusion

Apologies for the length of the post, but I figured everything needed to be covered, in context. Simply put, we need to focus less on Named Graphs (which IMHO aren't needed) and more on directionality. Every problem I've encountered thus far is covered by what Tim said years ago: "One day, you may be interested in following the link one way, another day, or somene else, the other way."

Comments?

Printed from: http://webr3.org/blog/semantic-web/maybe-we-dont-need-named-graphs/ .
© Your Name Here 2012.


8 Comments

  • Henry Story says:

    If you are still wondering whether you need named graphs, then you should stop, and simplify your task a lot. You cannot merge all the information in the web without just getting junk.

    See:
    "Keeping track of context in life and on the web"
    http://blogs.sun.com/bblfish/entry/it_s_all_about_context

    "Beatnik: change your mind"
    http://blogs.sun.com/bblfish/entry/beatnik_change_your_mind

    "Are OO Languages Autistic"
    http://blogs.sun.com/bblfish/entry/are_oo_languages_autistic

    Just ask yourself what would happen if you believed everything everyone told you. You believed what the Scientologists told you, and what the Jehovah's Witnesses told you, and the scientific community, and the politicians of all parties. None of them would say that you understood what they were saying in the end.

  • Hi Nathan,

    Just got the time to read your post. My first reading is that we should decouple the ability to name a graph and the ability to identify the source of a graph.

    My initial mistake in RDF/XML source declaration was to treat provenance and names as interchangeable notions, but they are not. For instance, we should be able to name a graph distributed over several sources, and we should be able to have several graphs in one source (say, several graphs in one file).

    The second important point, in my opinion, is that we should distinguish between a graph and the document or store that records it. Using the same URI to name the graph and the document that contains it leads to properties of the document (e.g. encoding) being applied to the graph and vice versa; the content is not the file.

    For this reason, what I really want when I argue for named graphs is a mechanism that allows us to explicitly name RDF graphs, not a default system using sources, file names, etc.: a model of the RDF graph and a syntax that allow us to precisely name graphs and choose the triples that belong to them.

    Cheers,

  • Seth Russell says:

    Your ACL example does use named graphs. In your example the name of the graph *is* the document URI. Now, it may just be that the only practical way to do ACL today is to trust only the triples asserted within one particular document. But I don't follow how that proves that we don't need a standard way to establish a URI to name a collection of triples. If you can *trust* all the triples that obtain when you dereference the URI of the context, whether that is a document URI or not, then you can securely grant access. That we might not yet know how to do that, unless the context is completely contained within one document, is beside the point.
