Reading List : Web, Linked Data, REST, Semantic Web

Personally, I have two types of reading: the posts etc. that I "tweet", and the heavier reading I do over time. This is a list of the latter for the past month; hopefully it'll help somebody who's looking for the same kind of info I have been.

I've grouped all the links into two main sections, and then sub-grouped them by how they make sense in my head! :)

Web, HTTP and REST

Roy T. Fielding Dissertation - Architectural Styles and the Design of Network-based Software Architectures. Of particular relevance and note are chapters 4-6 (many only ever read chapter 5 and miss the context + summary *needed* in chapters 4 and 6!)

Roy T. Fielding - REST APIs must be hypertext-driven
Discussion on HTML5 and RESTful HTTP in browsers
Discussion on URIs Resources and Switching content types w/ REST angle (v good)

RFC 2616 HTTP/1.1 and the HTTPbis Working Group HTTP/1.1 update in parts:

  1. Messaging
  2. Semantics
  3. Payload
  4. Conditional
  5. Range
  6. Cache
  7. Authentication

Linked Data and the Semantic Web

Linking Open Data Community Project
Linked Data Applications
Equivalence Mining and Matching Frameworks
Linked Data Browsers, Mashups and other Client Applications

Dataset Dynamics - On the Dynamics of Linked Datasets
Realizing a write-enabled Web of Data
Web Access Control (WAC) - a decentralized system for allowing different users and groups various forms of access to resources where users and groups are identified by HTTP URIs.
Discussion of the WAC vocabulary
Socially Aware Cloud Storage Design Note
Distributed Social Networking through Socially Aware Cloud Storage from TimBL
AWWSW - "Architecture of the World Wide Semantic Web" Task Force

SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs
A Linked Data Frontend for SPARQL Endpoints
RAP - RDF API for PHP V0.9.6
Inav the Terrible - An idea for posting RDF through HTTP.

Talis Changesets
Triplify Update Vocabulary

skos as atom

RFC 4287 - The Atom Syndication Format
RFC 5023 - The Atom Publishing Protocol
AtomPub Tombstones - The Atom "deleted-entry" Element
RFC 5005 - Feed Paging and Archiving
Versioning Link Relations - Link Relation Types for Simple Version Navigation between Web Resources

Named Graphs, Provenance and Trust
Accessing Site-Specific APIs Through Write-Wrappers From The Web of Data
Provenance Information in the Web of Data - LDOW 2009 paper
Using Reification To Extend RDF (historical reification approach)
RDF Policy-based URI Access Control for Content Authoring
The Open Provenance Model Core Specification (v1.1)
W3C Provenance Incubator Group

History of the Web 1945, 1980 through 1997 on W3
LEIRI - Legacy extended IRIs for XML resource identification (the type of "URI" used in xml:base)
CSHALS 2010 W3C Semantic Web Tutorial
Mindswap online RDF Converter
W3 online RDF Validator


Human Flesh Search a catalyst for Linked Data generation?

Earlier today I saw an interesting slideshow from James A Hendler which had some focus on Human Flesh Search.

Personally I find this somewhat inspiring; out of all of its features, the thing that I find most interesting is using human power to rapidly "link" data together.

Here's a very quick look at the human process involved:

  1. Assign the subject of the Human Flesh Search.
  2. Send out humans to find data about that subject, on both the web and in the physical world
    • When the information is on the web link to it and perhaps highlight some key points in an abstract
    • When the information is in the physical world identify it, and describe it.
  3. Correlate all data together and reference it to the subject.
  4. Manually cleanse the data to weed out incorrect / wrongly placed data

If this isn't a huge human powered linked data generating and cleansing machine then what is?

IMHO an implementation to cater for this (in early stages) would be extremely simple, for example consider the following:

  • The subject of the flesh search is assigned an Identifier (perhaps under the guise of a tag - eg flesh:the_subject)
  • A simple interface is made, perhaps a tiny browser extension or toolbar button, which when clicked associates the current web page with the flesh:subject (no doubt by quickly sending the URI of the page as a GET parameter to a server, at which point a single triple flesh:the_subject predicate uri is stored)
  • For offline resources a simple UI is made to enter in some data (back end obviously stores data cleansed and associated w/ flesh:the_subject)
  • Another simple UI to expand or remove information and "links".

Certainly running this as social style web app would be in order, but would require minimal work from what I can see, and serve as a model for a human powered mass linked data generating machine.

nb: you could go wild with this, sponging data and URIs submitted through various web services, but now is a good time to stop I think.

This may be overly obvious, but just in case, I thought it needed said :)


Are GET and PUT symmetrical? (Linked Data, CONNEG)

John S. Erickson, Ph.D. left a rather interesting question on my earlier post HTTP RFC paraphrased for the Web of Data, quoted here in full:

Here's a question: are GET and PUT symmetrical? Perhaps my question is a bit out-of-band, because the context of my question is really about the symmetry of content negotiation. Implicit in RFC 2616 section 14 is the existence of a (potentially) wide variety of alternate representations that may be accessed via the HTTP/1.1 request header fields; in the world of linked data these are useful for differentiating between text/html, application/rdf+xml, text/rdf+n3, etc.

So maybe the heart of my question is this: in a linked data world where conneg is alive and well (and assumed), what should be the correct interpretation of RFC 2616's (if) the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server? If I can GET a resource to the resolution of a MIME type, should I not expect to PUT to that same resolution as well? Or, if during a GET the server responds with a list of Alternates, shouldn't I expect to PUT using any of those choices?

Readers familiar with MIME-typed disseminators in the digital repository world (think: Fedora) might think of this as a question about MIME-typed "ingestors"; it is a similar issue: instead of requesting a particular manifestation of the object, this is about updating a specific manifestation (possibly updating the whole).

It's a very good question, and I'm perhaps a bit out of my rank in answering, but to me it seems easy (perhaps too easy) to answer... here goes:

You can request a GET on anything with an HTTP URI.

HTTP URIs can identify anything, however for the context of this let's limit URIs to identifying three distinct things:

  1. Stored Hypermedia (digital documents, audio, video, text files ... any "file" physically stored on the server and web accessible.)
  2. Web Services (a data-accepting process, a gateway to some other protocol, a separate entity that accepts annotations ... Nota bene if you are still reading then you know what I'm referring to so no need to be pedantic on my choice of terminology)
  3. Things (the sky, dave's car, me ... and in this context not hypermedia or web application endpoints)

Let's look at what we can successfully PUT. RFC 2616 section 9.6 tells us:

The PUT method requests that the enclosed entity be stored under the supplied Request-URI ... the URI in a PUT request identifies the entity enclosed with the request ...

We can derive from the above that only "Stored Hypermedia" can successfully be PUT; and thus we cannot successfully PUT a "Web Service" or a "Thing".

Nota bene: I'm assuming here that you (the reader) already know all about CONNEG, Linked Data and the multiple questions which have arisen on how to handle content negotiation, especially with regards to RDF and alternative representations.

In the Linked Data world we use the RDF data model extensively. We describe things with statements; each statement is a subject-predicate-object expression which we refer to as a triple. Strap enough of these triples together and you have a description of a thing. This is our RDF.

To pass these descriptions around we can serialize our RDF statements in a variety of machine readable formats: application/rdf+xml and text/rdf+n3 amongst others.

An RDF Document is where we persist one of these serializations in a document, id est Stored Hypermedia.

So to store an RDF document we would logically use PUT, and then GET it later, or perhaps DELETE it. We certainly wouldn't POST to an RDF Document, any more than we would POST to a GIF image! When thinking about an RDF Document think of it as Hypermedia, exempli gratia an AVI, JPEG or WAV, and treat it as such; a single document, in a single format.
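To make the "single document, in a single format" point concrete, here's a minimal sketch, assuming a purely in-memory store keyed by URI. The class name, the `/docs/me.rdf` path and the empty RDF/XML body are all hypothetical; the only thing it tries to show is that what you GET is exactly what was PUT.

```python
class DocumentStore:
    """In-memory stand-in for Stored Hypermedia: one document, in one format, per URI."""

    def __init__(self):
        self._documents = {}  # uri -> (media_type, body)

    def put(self, uri, media_type, body):
        # The URI identifies the enclosed entity, so it is stored as-is.
        self._documents[uri] = (media_type, body)

    def get(self, uri):
        # Returns exactly what was PUT, or None if nothing is stored there.
        return self._documents.get(uri)

    def delete(self, uri):
        self._documents.pop(uri, None)


# Usage: PUT an RDF Document, then GET the very same bytes back.
documents = DocumentStore()
rdf_xml = b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
documents.put("/docs/me.rdf", "application/rdf+xml", rdf_xml)
fetched = documents.get("/docs/me.rdf")
```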

RDF is not one of its serializations, and it most certainly isn't an "RDF Document".

A major factor which compounds confusion in regards to content negotiation is the Web community's fixation on thinking about the response to a GET as if it were Stored Hypermedia, especially when the mime type associated with the returned entity is something like text/html or application/rdf+xml.

In common language what you are reading is a "web page", implying that it is a Document and Stored Hypermedia.

In reality what you are viewing is an entity returned by a GET request to a Web Service, and that entity has a mime type of text/html. What you are reading is actually just some data I input in to a database via a POST to a CMS (wordpress); there are other things here too, presentation data, navigation on the right and so forth. Leave a comment and the data I entered will stay the same but the entity returned when you next GET this URI will be different. This certainly is not Stored Hypermedia (as we defined it above).

Considering the conneg questions, ask yourself "are these Alternatives Stored Hypermedia or simply entities returned by a web service?". I'm willing to take a punt here and say that your answer will be the latter. Which means that you'd be getting the data in to your system via a POST to a web service, and not a PUT.

Further, if you think about the "RDF" you would no doubt be posting to create these alternatives, it may well make sense to think of it as POSTing RDF in a serialized form which the Web Service accepts. The Web Service will then (all going well) persist this RDF, or the triples, in something like a QuadStore. This serialized RDF is not an RDF Document, it's just serialized RDF in a request entity, just like any other data. Thus think of serialized RDF like JSON, base64 encoded strings, the egg in ?name=egg&val=shell; it's "just" data.
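Here's a minimal sketch of that idea, assuming a toy Web Service that accepts very simple N-Triples in a POST body, keeps them as quads in memory, and builds the entity for each GET on demand. The class, the graph name and the example.org URIs are invented for illustration, and the parser handles only the simplest one-statement-per-line input.

```python
import html


def parse_ntriples(text):
    """Parse very simple N-Triples: one '<s> <p> object .' statement per line."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not line.endswith("."):
            raise ValueError(f"not an N-Triples statement: {line!r}")
        subject, predicate, obj = line[:-1].strip().split(None, 2)
        triples.append((subject, predicate, obj))
    return triples


class DescriptionService:
    """Hypothetical Web Service: POST serialized RDF in, GET generated entities out."""

    def __init__(self):
        self.quads = set()  # (subject, predicate, object, graph)

    def post(self, graph, body):
        # The body is "just" data; only the triples inside it are persisted.
        before = len(self.quads)
        for s, p, o in parse_ntriples(body):
            self.quads.add((s, p, o, graph))
        return len(self.quads) - before

    def get(self, subject, media_type="text/plain"):
        # The entity is built per request from whatever is stored right now.
        rows = sorted(q for q in self.quads if q[0] == subject)
        if media_type == "text/html":
            items = "".join(f"<li>{html.escape(p + ' ' + o)}</li>" for _, p, o, _ in rows)
            return f"<ul>{items}</ul>"
        return "\n".join(f"{s} {p} {o} ." for s, p, o, _ in rows)


# Usage: two POSTs, and the entity returned by GET changes in between.
service = DescriptionService()
me = "<http://example.org/me>"
graph = "<http://example.org/graph>"
service.post(graph, f'{me} <http://example.org/vocab#name> "Example" .')
first_entity = service.get(me)
service.post(graph, f'{me} <http://example.org/vocab#likes> "Linked Data" .')
second_entity = service.get(me)
```

Nothing that was POSTed is ever handed back as a stored document; each GET response is assembled fresh, which is why the two entities differ.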

All of the above is driven home when you consider HTTP URIs that identify "Things". You can't send the Thing in an entity, thus if you get any response from a server when you HTTP GET the URI for a thing, then that URI must be serviced by a Web Service (as defined above).

And now that you know it's a Web Service, logic would dictate that you should POST back to it...

The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. POST is designed to allow a uniform method to cover the following functions: - Annotation of existing resources; ...

Answering the questions!
After all of that, on to the questions.

  • are GET and PUT symmetrical?
    Only in the context where the URI requested identifies Stored Hypermedia.
  • If I can GET a resource to the resolution of a MIME type, should I not expect to PUT to that same resolution as well?
    No, as by this point you know that the URI/MIME combo you requested definitely does not represent Stored Hypermedia (a Document).
  • Or, if during a GET the server responds with a list of Alternates, shouldn't I expect to PUT using any of those choices?
    Not expect, as you have no way of knowing whether each URI returned identifies Stored Hypermedia, a Web Service or another Thing. With every new URI the cycle starts again.

Please don't take any of this as gospel, it's all IMO. I do at least hope it makes sense though, and if it doesn't please do correct me asap :)


HTTP RFC paraphrased for the Web of Data

This post is all about gleaning as much useful information as possible from the HTTP Protocol RFC 2616 in order to answer simple and complex Web of Data related questions.

I've chosen the rather old RFC 2616 (1999!) at this time rather than the upcoming HTTPbis because I feel it's important to know where you are coming from, and whilst many things about the Web of Data feel new, they are really age old principles and technologies which have never been used to their full potential. Further, you won't be able to appreciate the refinements in HTTPbis if you don't know what it's refining.

Virtually everything from here on is just a snippet/quote or paraphrase of the RFC. Let's start with a simple one:

Why use HTTP?

HTTP is an application-level protocol for distributed, collaborative, hypermedia information systems. ... HTTP allows an open-ended set of methods and headers that indicate the purpose of a request. ... HTTP is also used as a generic protocol for communication between user agents and proxies/gateways to other Internet systems ... HTTP allows basic hypermedia access to resources available from diverse applications. source

I do fully recommend reading the entire RFC and the new HTTPbis, most questions can be answered by returning to these documents and reading what they say (it's all in the detail); here's some more info gleaned from the RFC:

The difference between POST and PUT, URIs as Identifiers, and URIs to identify more than just documents.

The fundamental difference between the POST and PUT requests is reflected in the different meaning of the Request-URI. The URI in a POST request identifies the resource that will handle the enclosed entity. That resource might be a data-accepting process, a gateway to some other protocol, or a separate entity that accepts annotations. In contrast, the URI in a PUT request identifies the entity enclosed with the request -- the user agent knows what URI is intended .. source

Using POST RESTfully for more than just form data.

The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line. POST is designed to allow a uniform method to cover the following functions: Annotation of existing resources; ... Extending a database through an append operation. The actual function performed by the POST method is determined by the server and is usually dependent on the Request-URI. The posted entity is subordinate to that URI in the same way that ... a record is subordinate to a database. source

What to do if something is created as a result of a POST request.

If a resource has been created on the origin server, the response SHOULD be 201 (Created) and contain an entity which describes the status of the request and refers to the new resource, and a Location header. source
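A minimal sketch of that response, assuming a hypothetical collection resource that mints a new URI for each POSTed entity. The class, the counter-based URI scheme and the example.org base URI are illustrative choices, not anything the RFC prescribes.

```python
import itertools


class Collection:
    """Hypothetical resource that accepts POSTed entities as new subordinates."""

    def __init__(self, base_uri):
        self.base_uri = base_uri.rstrip("/")
        self.items = {}
        self._ids = itertools.count(1)

    def post(self, entity):
        # Mint a URI for the new subordinate resource and store the entity under it.
        new_uri = f"{self.base_uri}/{next(self._ids)}"
        self.items[new_uri] = entity
        # 201 (Created), an entity describing the result, and a Location header.
        return 201, {"Location": new_uri}, f"Created {new_uri}"


# Usage
comments = Collection("http://example.org/comments/")
status, headers, body = comments.post("a new annotation")
```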

When to use a PUT request?

The PUT method requests that the enclosed entity be stored under the supplied Request-URI. source

How to handle a PUT

If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server.

If the Request-URI does not point to an existing resource, and that URI is capable of being defined as a new resource by the requesting user agent, the origin server can create the resource with that URI. If a new resource is created, the origin server MUST inform the user agent via the 201 (Created) response. If an existing resource is modified, either the 200 (OK) or 204 (No Content) response codes SHOULD be sent. source

if extra headers were sent?

Unless otherwise specified for a particular entity-header, the entity-headers in the PUT request SHOULD be applied to the resource created or modified by the PUT. source

and what if I want to save it somewhere other than the URI specified by the client?

If the server desires that the request be applied to a different URI, it MUST send a 301 (Moved Permanently) response. source

and if the PUT can't be done..

If the resource could not be created or modified with the Request-URI, an appropriate error response SHOULD be given that reflects the nature of the problem. source
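The PUT outcomes quoted above can be sketched as one small decision function. This is only an illustration under some assumptions: resources live in a dict, `redirects` and `can_create` are hypothetical hooks I've invented to stand in for server policy, and 403 is just one possible "appropriate error response".

```python
def handle_put(resources, uri, entity, redirects=None, can_create=lambda uri: True):
    """Return (status, headers) for a PUT, loosely following RFC 2616 section 9.6."""
    redirects = redirects or {}
    if uri in redirects:
        # The server wants the request applied to a different URI.
        return 301, {"Location": redirects[uri]}
    if uri in resources:
        # Existing resource: the entity is a modified version of it.
        resources[uri] = entity
        return 204, {}
    if can_create(uri):
        # New resource defined by the requesting user agent.
        resources[uri] = entity
        return 201, {}
    # Could not be created or modified; 403 is an assumption made for this sketch.
    return 403, {}


# Usage
resources = {}
first = handle_put(resources, "/docs/a.rdf", b"v1")
second = handle_put(resources, "/docs/a.rdf", b"v2")
moved = handle_put(resources, "/old/a.rdf", b"v3", redirects={"/old/a.rdf": "/docs/a.rdf"})
refused = handle_put(
    resources, "/readonly/b.rdf", b"v1",
    can_create=lambda uri: not uri.startswith("/readonly/"),
)
```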

can I use PUT with server side versioning?

A single resource MAY be identified by many different URIs. For example, an article might have a URI for identifying "the current version" which is separate from the URI identifying each particular version ... a PUT request on a general URI might result in several other URIs being defined by the origin server. source

how would I let a client know I implement server side versioning when they PUT?

If an existing resource is modified, either the 200 (OK) or 204 (No Content) response codes SHOULD be sent.. source

200 indicates a message body in the response ;)
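As a sketch of how that message body might be used, assume a server that keeps every PUT entity as a new version and answers with a body listing the version URIs. The `/versions/N` URI layout and the dict-shaped body are my own invention for the example; a real server might describe the versions in RDF instead.

```python
class VersionedStore:
    """Hypothetical store where a PUT on a general URI also defines a version URI."""

    def __init__(self):
        self.versions = {}  # general uri -> list of entities, oldest first

    def put(self, uri, entity):
        history = self.versions.setdefault(uri, [])
        created = not history
        history.append(entity)
        version_uris = [f"{uri}/versions/{n}" for n in range(1, len(history) + 1)]
        # 201 for a brand new resource; 200 with a body telling the client about versions.
        status = 201 if created else 200
        body = {"current": uri, "stored_as": version_uris[-1], "versions": version_uris}
        return status, body


# Usage
store = VersionedStore()
first_status, first_body = store.put("/articles/42", "draft")
second_status, second_body = store.put("/articles/42", "final")
```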

and DELETE?
well it's short so you may as well read it all..

The DELETE method requests that the origin server delete the resource identified by the Request-URI. This method MAY be overridden by human intervention (or other means) on the origin server. The client cannot be guaranteed that the operation has been carried out, even if the status code returned from the origin server indicates that the action has been completed successfully. However, the server SHOULD NOT indicate success unless, at the time the response is given, it intends to delete the resource or move it to an inaccessible location.

A successful response SHOULD be 200 (OK) if the response includes an entity describing the status, 202 (Accepted) if the action has not yet been enacted, or 204 (No Content) if the action has been enacted but the response does not include an entity. source

Note that a response can be 200, meaning you can return a response message (like "I have X other versions here [list], or delete them all by clicking here [form input which POSTs to a service]"), or an RDF response that can be interpreted by a client to do the aforementioned :)
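The choice between the three success codes for DELETE can be sketched as a tiny function. The function name and its two parameters are hypothetical; it only encodes the rule quoted above from the RFC.

```python
def delete_response(enacted, status_entity=None):
    """Pick the success status for a DELETE, following the RFC 2616 wording."""
    if not enacted:
        # Accepted, but the deletion has not been carried out yet.
        return 202, status_entity
    if status_entity is not None:
        # Enacted, and the response carries an entity describing the status.
        return 200, status_entity
    # Enacted, with nothing to say about it.
    return 204, None


# Usage: a 200 whose body tells the client about the remaining versions.
with_message = delete_response(True, "2 other versions remain")
silent = delete_response(True)
queued = delete_response(False)
```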

but can't I tunnel all actions through GET?

Safe Method .. GET .. SHOULD NOT have the significance of taking an action other than retrieval! source

edit: removed small section about URI vs URL! Do however see the comment from Michael which links to more information on the subject.

Thanks for reading :)
There is much more information in the RFC, but those were some nicer points I found useful and relevant to current Web of Data topics & discussions.


Why ActionScript's poor performance is a good thing

ActionScript gives web developers an almost unique chance to grow their skills; out of the common languages in a web developer's repertoire, ActionScript is almost the only one where a developer's code is going to be run 30-100 times per second, with common methods being run literally thousands of times per second.

Further, the code isn't running on a solid server where the environment is controlled and where you can simply throw more processors and memory at a problem to make it go away; the code is being run on differently specified machines almost every run, in a different client and in different versions of the player.

To compound this unique situation, the virtual machine's (relatively) poor performance comes in to play.

Why is this good?

It has spurred a good percentage of ActionScript developers on to look at more advanced levels of code optimisation, to consider factors which are simply ignored and overlooked in other languages, to work with (and create) other technologies in order to push performance to the max; indeed even to hack the virtual machine to bits then create custom compilers and optimisers.

Libs and frameworks from PV3D through to Flex, projects such as haXe, Apparat and Alchemy; not to mention the wide array of technologies from AMF to RTMFP - this list goes on (and on).

There are 3 types of coders: bad coders, normal coders and ultra coders whose skillsets and level of understanding are far and beyond that of the normal coder. What I'm trying to get at here is that the ActionScript challenges outlined above create an environment where a higher percentage of coders can grow in to ultra coders than in any other language I know.

The Flash / ActionScript pair gives both designers and developers the notion that anything is possible - and this spurs them on to push the boundaries of what can be done; that kind of thinking sticks with people, they carry it on to other languages, platforms and communities; bringing inspiration and forward thinking to the web of humans.

Context: Really enjoyed reading this post: The Advantage Of ActionScript from Joa, in particular I'm glad that he mentioned "Poor Performance" as an advantage. The above post leads on from Joa's to give some of my own thoughts on why we all owe ActionScript and Flash a little something.



Virtuoso 6, SPARQL + GEO, Sample Queries

Alongside a whole host of improvements, the latest version of Virtuoso (Virtuoso 6) has added support for Geo data! One small sentence, one huge leap for mankind; it's vastly important IMHO because it brings a new kind of link to Linked Data: a location based one.

Very brief intro: SPARQL is a fantastic query language which works over RDF and thus Linked Data. Virtuoso, amongst other things, has a powerful QuadStore which can be queried via SPARQL, and Virtuoso's implementation of SPARQL plus the extensive suite of extensions they have implemented makes it the most usable and powerful query language available (again, in my honest opinion). In short, this combination was enough to make me drop normal RDBMS systems and never look back.

Rather than rambling on about how fantastic it is though, here are some Virtuoso specific sample SPARQL (+GEO) queries, which should hopefully whet your appetite and give you some inclination of what can be done.

Basic Geo Lookups

Things within 20km of New York City : RESULTS
SELECT DISTINCT ?resource ?label ?location
WHERE
{
<http://dbpedia.org/resource/New_York_City> geo:geometry ?sourceGEO .
?resource geo:geometry ?location ; rdfs:label ?label .
FILTER( bif:st_intersects( ?location, ?sourceGEO, 20 ) ) .
FILTER( lang(?label) = "en" )
}
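
If you'd rather send a query like the one above from a script than paste it into a web form, here's a minimal sketch of building the HTTP request in Python. It uses the `query` parameter from the SPARQL Protocol and asks for JSON results via the Accept header. The endpoint URL is an assumption (any Virtuoso 6 SPARQL endpoint holding this data would do), and the sketch relies on the endpoint predefining the geo:, rdfs: and bif: prefixes, as the queries in this post do.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The "Things within 20km of New York City" query from above, unchanged.
QUERY = """SELECT DISTINCT ?resource ?label ?location
WHERE
{
<http://dbpedia.org/resource/New_York_City> geo:geometry ?sourceGEO .
?resource geo:geometry ?location ; rdfs:label ?label .
FILTER( bif:st_intersects( ?location, ?sourceGEO, 20 ) ) .
FILTER( lang(?label) = "en" )
}"""


def build_sparql_request(endpoint, query):
    """Build a GET request carrying the query, asking for JSON result bindings."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Assumed endpoint; passing the request to urllib.request.urlopen() would send it.
request = build_sparql_request("http://dbpedia.org/sparql", QUERY)
```

The request is only built, not sent, so the sketch runs without network access.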

Distance between New York City and London, England : RESULTS
SELECT (bif:st_distance(?nyl,?ll)) as ?distanceBetweenNewYorkCityAndLondon
WHERE
{
<http://dbpedia.org/resource/New_York_City> geo:geometry ?nyl .
<http://dbpedia.org/resource/London> geo:geometry ?ll .
}

Querying Time and Space

All Educational Institutions within 5km of Oxford, UK; ordered by date of establishment : RESULTS
SELECT DISTINCT ?thing as ?uri ?thingLabel as ?name ?date as ?established ?matchGEO as ?location
WHERE
{
<http://dbpedia.org/resource/Oxford> geo:geometry ?sourceGEO .
?resource geo:geometry ?matchGEO .
FILTER( bif:st_intersects( ?matchGEO, ?sourceGEO, 5 ) ) .
?thing ?somelink ?resource ; <http://dbpedia.org/ontology/established> ?date ; rdfs:label ?thingLabel . FILTER( lang(?thingLabel) = "en" )
} ORDER BY asc( ?date )

Historical cross section of events related to Edinburgh and the surrounding area (within 30km) during the 19th century : RESULTS
SELECT DISTINCT ?thing ?thingLabel ?dateMeaningLabel ?date ?matchGEO WHERE {
{
SELECT DISTINCT ?thing ?matchGEO
WHERE
{
<http://dbpedia.org/resource/Edinburgh> geo:geometry ?sourceGEO .
?resource geo:geometry ?matchGEO .
FILTER( bif:st_intersects( ?matchGEO, ?sourceGEO, 30 ) ) .
?thing ?somelink ?resource
}
}
{?property rdf:type owl:DatatypeProperty ; rdfs:range xsd:date } .
?thing ?dateMeaning ?date . FILTER( ?dateMeaning in( ?property ) ) . FILTER( ?date >= xsd:gYear("1800") && ?date <= xsd:gYear("1900") )
?dateMeaning rdfs:label ?dateMeaningLabel . FILTER( lang(?dateMeaningLabel) = "en" ) .
?thing rdfs:label ?thingLabel . FILTER( lang(?thingLabel) = "en" )
} ORDER BY asc( ?date )

Transitivity and Inference (v5 compatible)

Finding the shortest route between two "things" (HTML and XML in the example) : RESULTS
SELECT ?route ?jump WHERE
{
{ SELECT ?x ?y WHERE { ?x foaf:page ?xpage ; ?predicate ?y . filter( isURI(?y) ) } }
OPTION ( TRANSITIVE, T_DISTINCT, T_SHORTEST_ONLY, t_in(?x), t_out(?y), t_max(10), t_step('path_id') as ?path, t_step(?x) as ?route, t_step('step_no') AS ?jump )
. FILTER ( ?y = <http://dbpedia.org/resource/HTML> && ?x = <http://dbpedia.org/resource/XML> )
}
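
For anyone wondering what "shortest route" means here, a conceptual sketch in Python: a breadth-first search over (subject, object) links, returning each hop with its step number, capped at a maximum depth much like t_max(10) above. This is not how Virtuoso implements its transitive option, and the edges below are invented for the example rather than taken from DBpedia.

```python
from collections import deque


def shortest_route(edges, start, goal, max_steps=10):
    """Breadth-first search over (subject, object) links; returns a path or None."""
    neighbours = {}
    for subject, obj in edges:
        neighbours.setdefault(subject, []).append(obj)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_steps:
            continue  # do not extend paths beyond the step limit
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Invented links, purely to exercise the function.
edges = [
    ("XML", "SGML"), ("SGML", "HTML"),
    ("XML", "W3C"), ("W3C", "CSS"), ("CSS", "HTML"),
]
route = shortest_route(edges, "XML", "HTML")
# Pair each hop with its step number, like the ?route and ?jump columns.
steps = list(enumerate(route))
```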

..and all routes between the two "things" : RESULTS
SELECT ?route ?path ?jump WHERE
{
{ SELECT ?x ?y WHERE { ?x foaf:page ?xpage ; ?predicate ?y . filter( isURI(?y) ) } }
OPTION ( TRANSITIVE, T_NO_CYCLES, t_in(?x), t_out(?y), t_max(5), t_step('path_id') as ?path, t_step(?x) as ?route, t_step('step_no') AS ?jump )
. FILTER ( ?y = <http://dbpedia.org/resource/HTML> && ?x = <http://dbpedia.org/resource/XML> )
}

Traversing Ontologies and (Sub)Classes; all subclasses of Person down the hierarchy : RESULTS
SELECT DISTINCT ?x WHERE
{
{ SELECT ?x ?y WHERE { ?x rdfs:subClassOf ?y } }
OPTION ( TRANSITIVE, T_DISTINCT, t_in(?x), t_out(?y), t_step('path_id') as ?path, t_step(?x) as ?route, t_step('step_no') AS ?jump, T_DIRECTION 2 )
FILTER ( ?y = <http://dbpedia.org/ontology/Person> )
}

Free text search, scores and IRI Ranks (v5 compatible)

Searching over labels, with text match scores and additional ranks for each iri / resource : RESULTS
SELECT ?s ?page ?label ?textScore (<LONG::IRI_RANK>(?s)) as ?iriRank WHERE {
?s foaf:page ?page ; rdfs:label ?label . FILTER( lang(?label) = "en" ) .
?label bif:contains 'adobe and flash' option (score ?textScore ) .
}

Virtuoso 6.1 (Open Source Edition) released. For features & bug fix details see: link



How to fix a noisy computer or graphics card fan

Background:

A few months ago I upgraded my PC, part of which included adding an Asus GeForce 9600 GT; shortly after installing it, the fan on the graphics card started to make a most irritating grinding noise. I found that if I knocked and tilted my case a few times I could get it to stop grinding and quieten down (the hit-it fix). Over time this grinding has become more frequent, to the point that it's almost non-stop and vibrates the case so that you can hear it all through the house. It's been driving me insane because every time I get it to stop, the *slightest* of movements starts it going again, which includes me moving, somebody walking past, in fact anything. This weekend it's been especially bad, in fact I've felt like throwing the f'ing thing out the window; finally this morning I'd had enough and thought I have to get a new fan for this thing.

Searching the internet found me no replacement fans, which meant I was looking at sending the card back to get it fixed (and thus missing out on work), buying a full new cooling set, or buying a new card - the last two options would have led to pc-murder as seriously this noise goes right through you. Thus, after running out of options, I tried to fix it.

How to fix a noisy computer fan:

  1. Remove the fan from the computer (in my case this meant removing the graphics card and unscrewing the plastic cover and fan).
  2. Once you have the fan removed, remove the stickers from it.
  3. On one side you'll notice a small recess with the end of a small metal pin on it (the bit that holds the fan together, and which the fan spins on)
  4. Take a small, toy plastic soldier and chew off the arm (or rifle butt if it has one).
  5. Take the cap off a bottle of Filippo Berio Extra Virgin Olive Oil and fill said cap approx half full with said olive oil.
  6. Dip one end of the plastic arm / rifle in to the olive oil, then dab the smallest of drops on to the small metal pin in the recess of the fan (from step 3)
  7. As you do step 6, gently pull the two parts of the fan about 0.5mm apart and turn the fan clockwise and anti-clockwise.
  8. Repeat steps 6 & 7 until you feel the oil has worked its way in and loosened it, "oiled the fan".
  9. Be very careful not to swamp the thing with oil, it's an electronic part and you'll break it.
  10. Reassemble what you took apart in step 1.

I do believe the specific brand of olive oil, or even the fact it's olive oil isn't of significant importance, likewise the soldier arm / rifle could be anything small that's good for poking.

As an extra bonus, my graphics card is now running 20 degrees cooler when idle :)

Short version:

  1. Oil it

Disclaimer:

This worked for me, I'm most pleased, if it doesn't work for you or you break something, don't blame me - but it might be worth a try.

Regards!
