NOTIFY

Subscribe and pull are pretty handy, but push (especially asynchronous) is just as useful, and often seems to be missing. So where is it at web level?

Let's say I have something named/identified with a URI, like myself. Every time that URI is mentioned anywhere on the web I'd like to know about it (okay, not all the time, but it should at least be possible). Insert multiple scenarios here, and conclude that any notification solution needs to be as generalized as possible.

So, I simply propose 3 basic things:

NOTIFY - a new HTTP verb

Why a new verb? To prevent collisions with existing usage of POST, and to leverage the already existing design of HTTP, especially things like Accept and OPTIONS. (It will need to have properties similar to POST.)

notify - a new link relation

To be used in HTML and with the Link header; this allows resources to specify where notifications should be sent.

x:notify - a URI identifying the new link relation

It's just the rel, but as a full URI for compatibility with the semantic web.

That's it. I'm quite sure one can build a lot of things on top of that.
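
To make the three pieces a little more concrete, here is a minimal sketch of how a client might discover a notify link and prepare a NOTIFY request. Nothing here is specified anywhere: the function names, the example.org URIs and the JSON payload shape are all assumptions made purely for illustration, and the Link header parsing is deliberately simplified.

```javascript
// Extract the targets of rel="notify" links from an HTTP Link header value.
// Simplified sketch: assumes no commas or semicolons inside quoted parameter values.
function findNotifyTargets(linkHeader) {
  const targets = [];
  const linkPattern = /<([^>]*)>([^,]*)/g;
  let match;
  while ((match = linkPattern.exec(linkHeader)) !== null) {
    const [, uri, params] = match;
    const rel = /;\s*rel\s*=\s*"?([^";]*)"?/i.exec(params);
    // rel may hold several space separated relation types
    if (rel && rel[1].trim().split(/\s+/).includes("notify")) {
      targets.push(uri);
    }
  }
  return targets;
}

// Describe (but do not send) a NOTIFY request.
// The payload shape { about, source } is an assumption, not part of the proposal.
function buildNotifyRequest(target, mentionedUri, mentionedIn) {
  return {
    method: "NOTIFY",
    url: target,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ about: mentionedUri, source: mentionedIn }),
  };
}

// Hypothetical Link header returned when dereferencing a URI that got mentioned.
const header =
  '<http://example.org/inbox>; rel="notify", <http://example.org/page/2>; rel="next"';

const requests = findNotifyTargets(header).map((target) =>
  buildNotifyRequest(target, "http://example.org/bob#me", "http://example.com/post/1")
);
console.log(requests);
```

The request is only described as a plain object here, so the sketch stays independent of whichever HTTP client ends up sending it.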


Uniform Data

Why a need for uniform data?

a) The web is currently converging around web applications and mobile devices, and a lot of focus is being placed on sensor networks, the internet of things, and augmented reality to display information. Simply put, how can these applications readily make use of published data from multiple sources if that data is not in a uniform standard?

b) The core web which people use on a daily basis is ever more silo focussed, and the size of those silos is ever increasing. The social sector is a great example of this: whilst there are core movements to create a more federated and distributed social web, a key blockage in the way is a lack of uniform data. Often new formats are being developed, or poorly modelled application specific (rather than domain specific) models are making it out on to the web, and interoperability is several times harder than it would be given the presence of uniform data. This has significant social and economic repercussions.

c) Time. A significant amount of time is invested daily by thousands (if not millions) in re-solving the same old problems: creating a schema for this, a model for that, learning the same lessons countless people have learned before them. Often the learning curve spans several years. A standard way to publish and share reusable model specific schemas (/not/ format specific like XML Schema and JSON Schema) would save vast amounts of developer time per annum. In addition to having significant economic impacts this would also lead to far more innovation (since there is more time free to innovate!) within an already important and innovative sector.

Why not "plain" RDF?

RDF has failed to be understood, adopted or loved by the general masses of the web. Even many who use RDF often do not fully understand it and have many issues. Adoption has been... let's just say not good.

There are 3196 APIs on ProgrammableWeb, out of those:

  • 2152 produce XML
  • 1255 produce JSON
  • 36 produce RDF

Perhaps more indicative though, is that those 36 are spread over 6 years, with only 1 updated so far this year. Meanwhile there have been 58 new JSON based APIs in the last month alone.

Over on Stack Overflow, there have been 1,569,512 questions asked; 273 of them (that's 0.017%) are RDF related.

The numbers are pretty clear: for all RDF's merits, and the countless benefits of its uniformity, it's just not being adopted.

To use RDF correctly requires RDF tooling, and not just tooling to parse the data (like JSON, and common usage of XML), but to use the data, to handle triples and graphs and queries, all of which requires significant investment in skills, time, and deployable technologies.

Furthermore, RDF data published using multiple different ontologies is difficult for people to use. The infrastructure and tooling simply doesn't exist to follow one's nose around the web and make practical use of several thousand different ontologies; that level of understanding is a good generation away, and for now all it does is serve as a blockage to adoption, and primarily as a blockage to people actually using or presenting the data. Time and time again we have seen a rallying around core ontologies, with successful mixing and matching happening more at the ontology level than the data level. For now applications will be looking for mentions of Classes and Properties they "understand" (have a hard coded usage for).

Additionally, these difficulties in usage have led to a second layer of centralization on the web, one which was born from RDF, and rather ironically many of the architectural benefits of uniformity and universality are being lost. That layer is SPARQL: we are seeing a huge increase in SPARQL enabled datastores on the web, each of which holds a specific set of data, and each of which has key resource limitations. Practically this means that:
- clients are tightly coupled to servers
- all processing and storage weight is being handled by the servers
- data on the wire is non uniform
- clients are not using the web of data, rather they are using a datasource on the web, a datasilo.
This is a pattern which is not optimized for anybody: servers, clients, developers, data, the web, the network.

The core benefits of a web of linked data have not been realized; RDF has failed to deliver them, primarily due to complexity and tooling requirements. SPARQL (positioned on the server/silo) is only compounding matters. That's not to say it cannot deliver them, or that these technologies are bad, only that they have not delivered the core benefits, yet.

Perhaps another way to put it is that if you break things like RDBMS and Classes and Objects down you get to triples of some sort (EAV, RDF, or atomic relations / predicate based logic), and RDF did just this. However, it was done in such a way that the data format (RDF) required a full new stack of technologies to /use/ the data, rather than being a uniform data format acting as a bridge between, say, classes and objects and RDBMS: a webized data model. That is to say, you can't really use "it" (RDF, the model people don't really speak of) with 95% of the deployed technology out there. You can provide an RDF view of the data from that technology, map it to RDF, but you cannot easily pull it back in and use it, and unusable data isn't much use. There are many shades of grey in between, but it's certainly more at the unusable end of the spectrum.

What can we do?

If we look at what people already do, a large proportion of web developers (most) continue to publish data via web services as XML and JSON. The common process is simple: create a schema, document it somewhere out of band (perhaps call it API documentation), and publish data using that schema in some arbitrary way as XML and JSON. On the client side the same process continues: find a new API, get an XML or JSON parser, map the data as described by the API to some classes and start using it. All of this is needless work, yet they are showing us what works, what they can do, and how they can work with data easily. Tersely, they are missing the benefits of Uniform Data.

We can bring the benefits of uniform data to the current web 2.0, classes and objects, RDBMS, XML and JSON focussed web.

We can not only address these core issues, and bring the benefits of linked data and the semantic web to the general developer population, but we can also:
- ensure it's RDF and traditional semantic web compatible (giving "us" mountains of useful every-day data)
- provide that clear migration path to the "full" semantic web that's missing now.
- increase semantic web adoption exponentially, bringing big benefits without the high cost.

Approaches

There are two key approaches I can personally see to this:

  1. Webize Classes and Objects (Java style POJOs, Data Objects, subset of UML)
  2. Provide a Classes and Objects view over RDF

The first of these approaches - providing an abstract syntax for classes and objects and then defining mappings for that to XML and JSON - would bring the benefits of OWL 2 and XSD to schemas, and the benefits of "linked data" to both the schemas (/class blueprints) and instance data. It would allow data validation rules to be added on from sources external to the schema, it could be codified in libraries across multiple languages, and it could also serve as a translation layer between Classes and Objects, NoSQL, RDBMS, and other formats such as CSV. Additionally it would lend itself to each schema being openly mapped to vendor specific databases, as well as to vendor neutral schemas such as ANSI SQL. Furthermore, it would also lead to innovation in each layer: for example, standardized queries for each kind of data could be created, with translations of those to each specific vendor or to well defined standardized languages, and even codified to work in memory in libraries (for example within instance methods, or to run on GPU enabled hardware and languages). Many benefits could come from webizing what the masses already do. Other examples include providing an opportunity to refine the core datatypes on the web in a serialization agnostic way (think XSD types merged with WebIDL types), ensuring the correct entailments for equality are baked in to the core, providing first level support for things like lists and sets, providing a foundation upon which diff, patch and versioning can all be accomplished, providing canonicalized forms so that encryption and data signing can be accomplished... and more I'm sure.

The second of these approaches has fewer wide scale benefits, but would provide a more usable abstraction layer on top of RDF, which is currently (dare I say painfully) missing. This would ultimately make working with data more familiar; a codified example could be:


var person = new Class('foaf:Person');      // external class definitions loaded from the web
person.load('http://example.org/bob#me');   // instance data loaded
print(person.name);                         // simple access to pre-known properties
person.validate();                          // built-in validation from OWL 2
                                            // and XSD data type restrictions

// work with a schema class at a time..

var man = new Class('gender:Male');         // different class for different data
man.load('http://example.org/bob#me');      // same data
print(man.wife);                            // different, domain specific properties
man.expand();                               // full entailment regimes support to get
                                            // the most from schema definitions 
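
To show that the idea does not need much machinery, here is a minimal runnable sketch of a classes-and-objects view over triples. It is not the API imagined above: the in-memory triple list, the blueprints table and the loadAs function are all hypothetical stand-ins, and nothing is fetched from the web or validated.

```javascript
// Hypothetical in-memory triples standing in for instance data loaded from the web.
const triples = [
  ["http://example.org/bob#me", "rdf:type", "foaf:Person"],
  ["http://example.org/bob#me", "foaf:name", "Bob"],
  ["http://example.org/bob#me", "gender:wife", "http://example.org/alice#me"],
];

// Hypothetical class blueprints: the properties each class "understands",
// mapped to the predicate that carries the value.
const blueprints = {
  "foaf:Person": { name: "foaf:name" },
  "gender:Male": { wife: "gender:wife" },
};

// Build a plain object view of one subject, seen through one class at a time.
function loadAs(className, subject) {
  const view = {};
  for (const [property, predicate] of Object.entries(blueprints[className])) {
    const match = triples.find(([s, p]) => s === subject && p === predicate);
    if (match) view[property] = match[2];
  }
  return view;
}

const person = loadAs("foaf:Person", "http://example.org/bob#me");
const man = loadAs("gender:Male", "http://example.org/bob#me");
console.log(person.name); // "Bob"
console.log(man.wife);    // "http://example.org/alice#me"
```

The same data yields different objects depending on the class it is viewed through, which is the property the example above relies on.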

The best approach will become clear as time progresses; for now I'm keen and happy to work on either or both.

Just some musings..


I live in the UK and...

I work on the web, I work from home, my clients are all in different countries around the world, so are the people I communicate with on a daily basis, my friends and associates. One could say that I'm almost detached from the everyday society that goes on around me, I pretty much work and live autonomously; other than close family and a select few people I see regularly when I'm outside that is. The same is true for my better half Rachel.

A side effect of this is that it lets me sit back and look at the country in which I live. The UK (or Great Britain, or whatever you call it). You probably know what I'm about to say, but I don't see anybody else saying it - granted, I haven't looked, but the way I see it is, if this isn't coming through my social stream then it isn't being said - so let me say it.

If the UK was embodied in a person, it would be a desensitized homeless addict without any morals, who'd taken advantage of and harmed every person they knew, and who was currently in mid air, about to hit the ground, after jumping off a bridge. I'm probably being too kind there though. From here on, this post is just going to be a collection of fragment paragraphs which hopefully illustrate the point.

Japan has just been hit by so many disasters it's unreal. I can say without a shadow of a doubt though, that if they had happened in the UK the effect would have been several orders of magnitude more devastating. The infrastructure would have crumbled, there would have been next to no warning bar the web's social media streams (that is to say, I'd probably have had to run down the street and warn the neighbours as they wouldn't have had a clue), and the death toll would have been unimaginable. The response? Can you really see our government (country wide or local) doing anything to respond that amounted to anything of consequence? Do you think there's a plan for any such event (of any kind on a similar scale)? Do you think there's even a plan, a clue, any preparedness at all? Can you see our prime minister stepping up to the plate and doing anything other than hide or squirm? Let's not gloss over things here, we would be completely and utterly screwed.

The middle east, uprisings everywhere, the people have a drive and are prepared to at least try and do something. Now, can you see that happening here? Our economy is gone, there are no jobs, people are being made redundant at a scary rate with no more jobs for them to walk in to. The country produces nothing, has no industries, there are no jobs for our children, well over half the population is paid out of the tax of the other half of the population, and the half that pays the taxes are dropping like flies and struggling for work or to make a wage. Here's the horrible bit though: some of the people who are out of work just accept it and do nothing, others work themselves to death (literally, I knew several people who did everything they could to provide for their families after losing their careers, and sadly are no longer with us; overwork and over stress, they just died, unceremoniously, at a young age - really good men, the best kind), yet others are busy scraping themselves through life and in to an early grave. Some, the over 40s, are just without hope. There's nothing for them now: they had a career and were the foundations of the companies and organizations they worked in, they have nothing now, can't get any kind of job doing what they did (even though the country needs their skills!), they can wallow or possibly get unskilled work. They can kiss goodbye to their pensions though, nothing to look forward to there; if they're lucky they've already bought their homes and can.. oh wait, no, the housing market is gone and they don't have a job, so they probably have to sell their homes just to survive and go live in a small flat somewhere, scrap that thought.

The other horrible bit? The rest of the country who are working are too distracted by a combination of money, stress and distractions (tv, material goods, fake lives, striving after the wind) to do or say anything about it; they just get on with their own lives until it happens to them, and then nobody listens because they're too distracted with (repeat). The nation is too demotivated to do anything, frankly.

Ahh, but these other countries are full of corrupt people stealing their money and wasting it.. yes, and we've legalized that process and made it law, the norm. What percentage of your wages goes to tax and national insurance? Then what percentage gets taxed on everything you spend? Do the maths: take a wage of an easy figure, say £1000, take off the tax and national insurance, then spend the rest and take off the amount of that which is taxed, then pay one other person with what's left that wasn't taxed and repeat. How much is left after two circulations through the economy? What percentage of that money was trimmed off as tax? Almost all of it. It makes you wonder how the country even functions. So where does that money go? In to our crumbling infrastructure? Schooling? Health? General services? No chance in hell, there is no way that much money can produce so few results - it just does not add up, especially when you couple that to the debt we have as a country.

I spoke to a taxi driver last night. He owns the business; it's his retirement you see, and the only way he can actually do any work and make a living. A skilled worker all his life, that's all he's got left now, and he's one of the lucky ones. Anyway, he told me that the night before he'd picked up a local bank manager, a bit tipsy (drunk that is), who said that he doesn't think he can do his job any longer, because he sold mortgages to a load of people who wanted new build properties in a nice area of town, and now not one of them (in the entire scheme) can handle the mortgage payments. Some of them are homeless, a couple of them have recently committed suicide, and he feels guilty because he saw it coming.

It all interconnects. This guy who owns the taxi company has a new driver called Mo. Mo is a few years older than me and has two young children. He works two jobs, starts one at 8am (the telesales job, although Mo is a qualified computer/network engineer and mathematician) and finishes driving the taxi at 3am, 6 or 7 days a week. He's working himself in to an early grave, but has no choice. Mo recently joined that company because, two weeks before, the company he previously worked for was put up for sale (£50k price tag). Mo wanted to buy it but the bank said no, and there was nobody else around to help him, so he's stuck. The company was put up for sale by the previous owner's widow, because her husband, who I also knew, was working two jobs, the taxi business and working on the oil rigs. He was a brilliant man, a kindly man in his late 30s, who dropped down dead from overwork and stress last year: he dropped me off down the street, went home, picked up his equipment for the oil rig job, drove to the airport, went to step on the helicopter and dropped dead from a heart attack. That, you see, is why his widow was selling the business, why Mo had to leave for this other company, and how I came to be speaking last night to the taxi driver I mentioned at the start (because I wanted to give Mo my business, but Mo wasn't on).

Coincidentally, the other driver who's having to leave, Bill, is awaiting retirement to go and live with his wife in Taiwan. He can't leave the country yet because he won't get his full pension if he does, so he's stuck here twiddling his thumbs for a couple of years first, with no work of course, even though he's incredibly skilled at what he does. Ahh yes, and the company that's getting sold, well that's just been bought by somebody who's just been made redundant from the local council and can't get another job. It was, literally, his only choice, because he has to pay for private health care for his wife, who's ill and needs an operation for which the waiting list is years. I find that all quite sad; these are just the normal people I see on a daily basis.

I'll stop there I think, and get back to my web bubble where I can work, can communicate, and do get to see at least some of what's happening in the world, I'm protected in a way I guess. I do feel guilty for that protection though, when my friends who I grew up with, who have traditional trades, are all pretty much screwed.

The problem though, is that I have kids, so what do I do to guarantee them at least a fighting chance, a job, and maybe a place where priorities are a little more in order (like a formerly third world country perhaps)?


The simplest view possible of httpRange-14.

Here's an even simpler way of looking at this..

  • a URI is associated with a thing by a group of agents/people as a name for that thing.
  • some URIs are also associated with a set of representations over time by the dereferencing process.

Why were the representations made available for that URI?

  • because somebody made a web page and then needed a URI to refer to it.
  • because somebody named something with a URI and then wanted to provide information about it.
  • because somebody made a web page about one specific thing and then needed a URI to refer to it, and then the URI became commonly used to refer to the thing named.

That's a really minimal set of the different ways of looking at it, without getting in to any technical details at all. All three are really common cases of how people use URIs; the in-fighting is just people who've picked one of the three as being gospel, or as technically required to make things work.

It's a social problem.

The httpRange-14 resolution picked the first of the above reasons as being the norm, and as requiring the least technical trade-offs. The resolution also accounted for the second case, with precedence given to the importance of having distinct names, rather than network performance or ease of implementation, again simply a design trade-off, one which prioritizes humans over machines.

I definitely cannot explain it any simpler than that.


A simple overview of httpRange-14

Complicated issue eh? It's certainly consumed a great deal of my time for over a year.

So, here's a simple-ish summary of the problem - disclaimer, all IMHO of course:

Outline:

    each URI <u> is bound to a thing T by a set of agents SA (this is the naming process)
    <u> refers to T

    some URIs are bound to a set of representations SR over time by the dereferencing process (i.e. GET a URI)
    <u> refers to T, SR

Let's assign the name XT to this class of URIs which are bound to both a T and an SR. (things which name something, and which you can GET content+meta by dereferencing)

Where T == SR this forms a subclass of XT which we'll call YT (just means that the URI is used to refer to the thing you GET, like a web page or an image)

The opposing views:

  1. for all <u> in XT, T == SR
    (all members of XT are members of YT, doesn't account for T != SR - this is an information resource theory)

  2. SR is bound to T not <u>
    (means T == SR && T != SR - this is the "content+meta gives information about the thing" theory, where slash URIs can name anything)

  3. <u> is bound to T, and T != SR
    (SR is unbound to any name, or bound to "some other name" - this is the <u> :retrieved_from "u" approach, or content-location = graph uri)

  4. <u> is bound to SR, and T != SR
    (T is unbound to any name, or bound to "some other name" - can't see what it equates to but it's a theory)

  5. <u> is bound to T, SR and T != SR
    (can't use deref URIs as names - URI collision, or the chimera theory, this is where information about <u> consists of info about both the information and the real world thing named. Practically it's the same as view 2)

Summary:

View 1 is the httpRange-14 solution (accounting for view 3 with the 303 solution too). View 2 is implied by the REST dissertation (actually so are 1 and 3..). View 3 is the "we don't normally talk about the document" view. All of them have issues for somebody.
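
For anyone who prefers code to prose, this is roughly how I read the status code side of the httpRange-14 resolution, as a minimal sketch. The function name and the returned labels are my own invention, and the last branch covers responses the resolution simply says nothing about.

```javascript
// What a GET on <u> lets a client conclude about T, per my reading of the resolution.
function whatDoesTheUriName(status) {
  if (status >= 200 && status < 300) {
    return "information resource"; // T == SR, i.e. <u> is in YT
  }
  if (status === 303) {
    return "any resource"; // T may be anything; SR lives at the "see other" URI
  }
  if (status >= 400 && status < 500) {
    return "unknown"; // the nature of T cannot be determined
  }
  return "not covered by the resolution";
}

for (const status of [200, 303, 404, 500]) {
  console.log(status, whatDoesTheUriName(status));
}
```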


What does a URI name? agree?

Here is an extract from an email conversation I had earlier, where U is a URI.

Likewise, Moby Dick is not a function, however X being a function from requests to responses, and the function's instantiation as a locus of computation could well be correct.

U can still identify Moby Dick, and all you do is request that X gives you a representation of Moby Dick bearing Y characteristics (content type etc), where U can be resolved to an address for X.

Bearing in mind of course that U can be dereferenced in any number of ways, and doesn't always "point to"/"address" X (stick U in a SPARQL query, or the Wayback Machine for instance).

The common theme though, which I have to agree with, is that we always use a URI to refer to a source of information (static or computed), whether it's modifying it, or getting some version (representation?) of it.

A representation of information, and U refers to that information (format agnostic) and we retrieve that information by using U as name for the purpose of referencing.

So U is always used to refer to information about a thing, and that thing can be anything.

I'm interested to know if anybody disagrees.

ps: please don't quote any specification when replying, if the specs had the answers we wouldn't have questions.


The Social Graph, the other Graphs, and the pot of gold you're sitting on.

me: have you all moved on from the #socialgraph meme to focus on more specific graphs? idea graphs? innovation graphs? convergence graphs?, reply: pls explain how2 define the other graphs you mention - We're curious

You've probably heard of the Social Graph before, that obscure abstraction that holds the relationships between humans. It's locked up in the cloud somewhere inside corporations like Facebook, Google, LinkedIn and Twitter, with tantalizing bits exposed to you - it's the thing which some people understand, and which some people mine to do everything from social networking and promotion through to highly targeted marketing.

What you may not realize though, is that the social graph concept is incredibly simple, that there's not just a social graph, one graph, but an infinite number of incredibly interesting graphs, and that each graph holds a world full of unrealized, valuable and useful information, knowledge even. Let's cut to the point here: 99.99999999% of these graphs are locked away, and most people don't even know that they exist, let alone that they're sitting on one of these gold mines. The graphs are locked away unrealized in almost every application and website you use, hidden in databases, and if you're a developer, programmer, db admin or website owner, then you definitely have one or more of these graphs under your control.

This is one of those things I can't stress enough, it's the same as having a pot of gold buried in your back garden, you may not be able to see it, but it's there, and all you need to do is grab a tool and dig it out.

Why Graphs are valuable

To understand, let's look at (and explain) the Social Graph. People use a website like Twitter and make connections to other people, they follow people, and each "follow" is saved as a very simple link between the two people: "X follows Y". That's it. If you took 20 of these follows and drew them on a piece of paper you'd see a graph, nodes and edges, people linked together, and this is where things get really interesting and where all the golden information comes from. You see, if you know that "X follows Y" then you can say that "Y is followed by X", you get the link in the other direction for free. And if you take all the people that "follow bob" and see who else they follow, then take the top 5 of that list, you can pretty safely infer that those 5 people are like bob. This is how "you might like to follow" works, those magical suggestions made by computers, LinkedIn being a great example of this. By exposing even the simplest of relationships, humans and machines can look at those relationships and infer new ones; indeed it's magical, but really it's simple.
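
Here's a minimal sketch of that inference, just to show how little is involved. The follow links and names are made up, and real recommendation systems will certainly do more than count, so treat this as an illustration of the idea rather than how any particular site works.

```javascript
// Hypothetical "X follows Y" links, one [follower, followed] pair per edge.
const follows = [
  ["alice", "bob"], ["alice", "carol"],
  ["dave", "bob"], ["dave", "carol"], ["dave", "erin"],
  ["frank", "bob"], ["frank", "erin"],
];

// The link in the other direction comes for free: who is `person` followed by?
function followersOf(person) {
  return follows
    .filter(([, followed]) => followed === person)
    .map(([follower]) => follower);
}

// Count who else the followers of `person` follow, and take the top of that list.
function similarTo(person, limit = 5) {
  const fans = new Set(followersOf(person));
  const counts = new Map();
  for (const [follower, followed] of follows) {
    if (fans.has(follower) && followed !== person) {
      counts.set(followed, (counts.get(followed) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // by count, then name
    .slice(0, limit)
    .map(([name]) => name);
}

console.log(followersOf("bob")); // [ 'alice', 'dave', 'frank' ]
console.log(similarTo("bob"));   // [ 'carol', 'erin' ]
```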

This process of inferring new relationships between things by looking at existing relationships between things, is how the world ticks over. Another example is Amazon (and any marketing really): if you buy books x, y and z, then Amazon can look at other people who bought those books, see the most common other purchases they've made, and suggest them to you; odds are high you'll want one of them.

However, these kinds of relationships are everywhere, and there's value in every one of them. For example take this blog: each post is tagged with a few terms, which means that each term is linked to another by the post(s) it's tagged to, and each post is linked to me because I'm the author. So if I took all of those tags and saw how they relate to each other and how popular they were, I'd get a pretty strong indication of my technical interests, and if I linked them to me and plotted them over time, I'd essentially see a really interesting view of "me" and my interests: how one thing has led to another, what things have died out, the introduction of one topic in my life, branching, strengthening and converging with others over time. Personally I'd find that really, really interesting; it's invaluable information in fact, which is just locked away in this blog, waiting to be exposed. That's an example of one of these unnamed and unrealized graphs. It's there already, I just haven't dug it up and received the benefits yet.
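
As a rough sketch of digging that graph out: take every post, link each pair of tags that appear together on it, and count how often each link turns up. The posts and tags below are made up for the example, and the "a|b" key format is just a convenient assumption.

```javascript
// Hypothetical posts with their tags.
const posts = [
  { title: "NOTIFY", tags: ["http", "push"] },
  { title: "Uniform Data", tags: ["rdf", "json"] },
  { title: "httpRange-14", tags: ["http", "rdf", "uri"] },
  { title: "What does a URI name?", tags: ["rdf", "http"] },
];

// Every pair of tags on the same post is a link; the weight is how many posts share it.
function tagLinks(posts) {
  const weights = new Map();
  for (const { tags } of posts) {
    const sorted = [...new Set(tags)].sort(); // de-duplicate and fix the pair order
    for (let i = 0; i < sorted.length; i++) {
      for (let j = i + 1; j < sorted.length; j++) {
        const key = `${sorted[i]}|${sorted[j]}`;
        weights.set(key, (weights.get(key) || 0) + 1);
      }
    }
  }
  return weights;
}

const links = tagLinks(posts);
console.log(links.get("http|rdf")); // 2
```

Add a date to each post and group the same counts by month, and you have the "interests over time" view described above.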

And this, what I outlined above, is just the tip of the iceberg. Say I pulled in all the tweets I've sent and added them in to the equation, say I also pulled in the tweets from my friends, and the tags from the posts I'd read, and ran them through the same simple process of naming links and looking at the graphs they create. I'd have a wealth of information about almost everything, I'd be able to see where my ideas came from, see that Jeff and Kingsley both mentioned something 6 months ago that I've taken and mixed with my own ideas to create something new. Then I'd be able to analyse information as it came in too, and have my machines filter and suggest not only what I'm interested in, what information I want to consume, but also suggest things I don't know: that because I'm interested in X & Y and talk to Melvin and Manu, I'll probably want to talk to David and read Z.

If all this sounds ultra complex, it's not, just remember it all comes down to a simple little link between two things, and the graph of those things plotted out, with Y to Z added in to the mix.

The point is, that all of these relationships that are hidden away, the unrealized graphs, are exponentially more valuable and interesting than the boring data we think we have, they contain new knowledge that we didn't know, they allow us to realize things we'd never guess, they contain insight++.

The Three Steps

  1. Find the relationships.
  2. Name the relationships.
  3. Expose your Graph(s).

Finding and Naming Relations: We're at the tail end of the RDBMS generation, so odds are very high that your database is in an RDBMS like MySQL. If your database is fully normalised then you'll probably already have link tables, and every one of those will contain the relationships you're looking for; odds are you've just given them a generic name like UserPosts or suchlike, thinking that the relationships aren't that important (speaking from experience here!). If your database isn't normalised then you'll be looking for IDs from one table mentioned in another, UserID columns and similar, or perhaps foreign keys. Remember there are also natural links between bits of data: any time you have a unique value like an email address, a town name, a postcode, a price, a sku, an order number, it can be the hook for a relation. This user and that order have the same email address; these customers and these suppliers are in the same town, so might they deliver directly to save on postage and packing and speed up delivery times? When you find, name and expose these relationships you get your Graph(s), and from there you can start inferring things, answering questions, and gaining knowledge you never knew you had, no matter how big or small your datasets are, even if it's in a CSV or a generic format.
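
A minimal sketch of the "natural link" idea, using the customers and suppliers example: index one table by the shared column, then emit a named link for every match. The rows, the column name and the relation name are all made up, and in a real RDBMS you'd do the same thing with a join.

```javascript
// Hypothetical rows, as they might come out of two unrelated tables or CSV files.
const customers = [
  { id: "c1", town: "Leeds" },
  { id: "c2", town: "York" },
];
const suppliers = [
  { id: "s1", town: "York" },
  { id: "s2", town: "Leeds" },
  { id: "s3", town: "Bath" },
];

// Find rows that share a value in `column` and name the relationship between them.
function linkBySharedValue(left, right, column, relation) {
  const index = new Map();
  for (const row of right) {
    if (!index.has(row[column])) index.set(row[column], []);
    index.get(row[column]).push(row);
  }
  const links = [];
  for (const row of left) {
    for (const other of index.get(row[column]) || []) {
      links.push([row.id, relation, other.id]); // a simple named link between two things
    }
  }
  return links;
}

console.log(linkBySharedValue(customers, suppliers, "town", "inSameTownAs"));
// [ [ 'c1', 'inSameTownAs', 's2' ], [ 'c2', 'inSameTownAs', 's1' ] ]
```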

Exposing Graphs: Regardless of how you expose these new graphs of relations you've found, just expose them. I'm very pro linked data for multiple reasons, but if you just expose these relations in any way you can, then others (like me) can come along and make relationships between your data and other people's data, and those relationships are more valuable and interesting than I can ever describe. For example, just think how much easier your life would get if somebody linked up Twitter usernames to Facebook usernames to email addresses: you'd practically have one single contact list for everybody you knew, and multiple ways to contact them, and that's just a tiny, simple, essentially boring little link between datasets.


