Virtuoso 6, SPARQL + GEO, Sample Queries

Alongside a whole host of improvements, the latest version of Virtuoso (Virtuoso 6) has added support for Geo data! One small sentence, one huge leap for mankind; it's vastly important IMHO because it brings a new kind of link to Linked Data: a location-based one.

Very brief intro: SPARQL is a fantastic query language which works over RDF and thus Linked Data. Virtuoso, amongst other things, has a powerful QuadStore which can be queried via SPARQL, and Virtuoso's implementation of SPARQL plus the extensive suite of extensions they have implemented makes it the most usable and powerful query language available (again, in my honest opinion). In short, this combination was enough to make me drop normal RDBMS systems and never look back.

Rather than rambling on about how fantastic it is though, here are some Virtuoso-specific sample SPARQL (+GEO) queries, which should hopefully whet your appetite and give you some indication of what can be done.

Basic Geo Lookups

Things within 20km of New York City : RESULTS
SELECT DISTINCT ?resource ?label ?location
WHERE
{
<http://dbpedia.org/resource/New_York_City> geo:geometry ?sourceGEO .
?resource geo:geometry ?location ; rdfs:label ?label .
FILTER( bif:st_intersects( ?location, ?sourceGEO, 20 ) ) .
FILTER( lang(?label) = "en" )
}

Distance between New York City and London, England : RESULTS
SELECT (bif:st_distance(?nyl,?ll)) as ?distanceBetweenNewYorkCityAndLondon
WHERE
{
<http://dbpedia.org/resource/New_York_City> geo:geometry ?nyl .
<http://dbpedia.org/resource/London> geo:geometry ?ll .
}
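For a rough sanity check of the figure the distance query returns, here is a minimal Python sketch of a great-circle (haversine) distance. The coordinates are approximate city-centre values I've assumed rather than the geometries stored in DBpedia, and I'm not claiming this is exactly how Virtuoso computes bif:st_distance; it's just a quick way to see what ballpark the answer should be in.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate city-centre coordinates (assumed, not taken from DBpedia).
NEW_YORK = (40.7128, -74.0060)
LONDON = (51.5074, -0.1278)

distance = haversine_km(*NEW_YORK, *LONDON)
print(f"{distance:.0f} km")
```

If the number from the SPARQL query is wildly different from this, the first thing to check is which geometry each resource actually carries.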

Querying Time and Space

All Educational Institutions within 5km of Oxford, UK; ordered by date of establishment : RESULTS
SELECT DISTINCT ?thing as ?uri ?thingLabel as ?name ?date as ?established ?matchGEO as ?location
WHERE
{
<http://dbpedia.org/resource/Oxford> geo:geometry ?sourceGEO .
?resource geo:geometry ?matchGEO .
FILTER( bif:st_intersects( ?matchGEO, ?sourceGEO, 5 ) ) .
?thing ?somelink ?resource ; <http://dbpedia.org/ontology/established> ?date ; rdfs:label ?thingLabel . FILTER( lang(?thingLabel) = "en" )
} ORDER BY asc( ?date )

Historical cross section of events related to Edinburgh and the surrounding area (within 30km) during the 19th century : RESULTS
SELECT DISTINCT ?thing ?thingLabel ?dateMeaningLabel ?date ?matchGEO WHERE {
{
SELECT DISTINCT ?thing ?matchGEO
WHERE
{
<http://dbpedia.org/resource/Edinburgh> geo:geometry ?sourceGEO .
?resource geo:geometry ?matchGEO .
FILTER( bif:st_intersects( ?matchGEO, ?sourceGEO, 30 ) ) .
?thing ?somelink ?resource
}
}
{?property rdf:type owl:DatatypeProperty ; rdfs:range xsd:date } .
?thing ?dateMeaning ?date . FILTER( ?dateMeaning in( ?property ) ) . FILTER( ?date >= xsd:gYear("1800") && ?date <= xsd:gYear("1900") )
?dateMeaning rdfs:label ?dateMeaningLabel . FILTER( lang(?dateMeaningLabel) = "en" ) .
?thing rdfs:label ?thingLabel . FILTER( lang(?thingLabel) = "en" )
} ORDER BY asc( ?date )

Transitivity and Inference (v5 compatible)

Finding the shortest route between two "things" (HTML and XML in the example) : RESULTS
SELECT ?route ?jump WHERE
{
{ SELECT ?x ?y WHERE { ?x foaf:page ?xpage ; ?predicate ?y . filter( isURI(?y) ) } }
OPTION ( TRANSITIVE, T_DISTINCT, T_SHORTEST_ONLY, t_in(?x), t_out(?y), t_max(10), t_step('path_id') as ?path, t_step(?x) as ?route, t_step('step_no') AS ?jump )
. FILTER ( ?y = <http://dbpedia.org/resource/HTML> && ?x = <http://dbpedia.org/resource/XML> )
}

..and all routes between the two "things" : RESULTS
SELECT ?route ?path ?jump WHERE
{
{ SELECT ?x ?y WHERE { ?x foaf:page ?xpage ; ?predicate ?y . filter( isURI(?y) ) } }
OPTION ( TRANSITIVE, T_NO_CYCLES, t_in(?x), t_out(?y), t_max(5), t_step('path_id') as ?path, t_step(?x) as ?route, t_step('step_no') AS ?jump )
. FILTER ( ?y = <http://dbpedia.org/resource/HTML> && ?x = <http://dbpedia.org/resource/XML> )
}
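To make the idea behind the shortest route query a bit more concrete, here is a small Python sketch of a breadth-first search over a link graph. The graph below is entirely made up for illustration (it is not DBpedia data), and this is only my mental model of what a transitive shortest-path lookup does, not Virtuoso's implementation; the max_hops argument plays a similar role to t_max.

```python
from collections import deque

# Hypothetical link graph: resource -> resources it links to.
LINKS = {
    "XML": ["SGML", "XHTML"],
    "SGML": ["HTML"],
    "XHTML": ["HTML", "XML"],
    "HTML": [],
}


def shortest_route(graph, start, goal, max_hops=10):
    """Breadth-first search; returns the first (shortest) path found, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


route = shortest_route(LINKS, "XML", "HTML")
for jump, resource in enumerate(route):
    print(jump, resource)
```

Each printed line pairs a step number with a resource, much like the ?jump and ?route columns in the query results.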

Traversing Ontologies and (Sub)Classes; all subclasses of Person down the hierarchy : RESULTS
SELECT DISTINCT ?x WHERE
{
{ SELECT ?x ?y WHERE { ?x rdfs:subClassOf ?y } }
OPTION ( TRANSITIVE, T_DISTINCT, t_in(?x), t_out(?y), t_step('path_id') as ?path, t_step(?x) as ?route, t_step('step_no') AS ?jump, T_DIRECTION 2 )
FILTER ( ?y = <http://dbpedia.org/ontology/Person> )
}
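The subclass traversal boils down to walking rdfs:subClassOf links downwards from a root class until nothing new turns up. A minimal Python sketch of that walk, using a handful of invented class names (assumptions for illustration, not the real DBpedia ontology):

```python
# Hypothetical rdfs:subClassOf pairs as (child, parent).
SUBCLASS_OF = [
    ("Artist", "Person"),
    ("Athlete", "Person"),
    ("MusicalArtist", "Artist"),
    ("SoccerPlayer", "Athlete"),
    ("City", "Place"),
]


def all_subclasses(pairs, root):
    """Return every class below root, following subclass links transitively."""
    children = {}
    for child, parent in pairs:
        children.setdefault(parent, []).append(child)
    found = set()
    stack = [root]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


person_subclasses = all_subclasses(SUBCLASS_OF, "Person")
print(sorted(person_subclasses))
```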

Free text search, scores and IRI Ranks (v5 compatible)

Searching over labels, with text match scores and additional ranks for each iri / resource : RESULTS
SELECT ?s ?page ?label ?textScore (<LONG::IRI_RANK>(?s)) as ?iriRank WHERE {
?s foaf:page ?page ; rdfs:label ?label . FILTER( lang(?label) = "en" ) .
?label bif:contains 'adobe and flash' option (score ?textScore ) .
}

Virtuoso 6.1 (Open Source Edition) released. For features & bug fix details see: link


How to fix a noisy computer or graphics card fan

Background:

A few months ago I upgraded my PC, part of which included adding an Asus GeForce 9600 GT; shortly after installing it, the fan on the graphics card started to make a most irritating grinding noise. I found that if I knocked and tilted my case a few times I could get it to stop grinding and quieten down (the hit-it fix). Over time this grinding has become more frequent, to the point that it's almost non-stop, and it vibrates the case so that you can hear it all through the house. It's been driving me insane because every time I get it to stop, the *slightest* of movements starts it going again, which includes me moving, somebody walking past, in fact anything. This weekend it's been especially bad, in fact I've felt like throwing the f'ing thing out the window; finally this morning I had enough and thought I have to get a new fan for this thing.

Searching the internet found me no replacement fans, which meant I was looking at sending the card back to get it fixed (and thus missing out on work), buying a full new cooling set, or buying a new card - the last two options would have led to pc-murder as seriously this noise goes right through you. Thus, after running out of options, I tried to fix it.

How to fix a noisy computer fan:

  1. Remove the fan from the computer (in my case this meant removing the graphics card and unscrewing the plastic cover and fan).
  2. Once you have the fan removed, remove the stickers from it.
  3. On one side you'll notice a small recess with the end of a small metal pin on it (the bit that holds the fan together, and which the fan spins on)
  4. Take a small, toy plastic soldier and chew off the arm (or the rifle butt, if it has one).
  5. Take the cap off a bottle of Filippo Berio Extra Virgin Olive Oil and fill said cap approx half full with said olive oil.
  6. Dip one end of the plastic arm / rifle in to the olive oil, then dab the smallest of drops on to the small metal pin in the recess of the fan (from step 3)
  7. As you do step 6, gently pull the two parts of the fan about 0.5mm apart and turn the fan clockwise and anti-clockwise.
  8. Repeat steps 6 & 7 until you feel the oil has worked its way in and loosened it, "oiled the fan".
  9. Be very careful not to swamp the thing with oil, it's an electronic part and you'll break it.
  10. Reassemble what you took apart in step 1.

I do believe the specific brand of olive oil, or even the fact it's olive oil isn't of significant importance, likewise the soldier arm / rifle could be anything small that's good for poking.

As an extra bonus, my graphics card is now running 20 degrees cooler when idle :)

Short version:

  1. Oil it

Disclaimer:

This worked for me, I'm most pleased, if it doesn't work for you or you break something, don't blame me - but it might be worth a try.

Regards!



Preparing Yourself for Web 3.0, LOD and 2010+

If you work on the net then you'll have probably heard of the "semantic web"; it's nice, but you can ignore it and get along just fine. However, "Linked Open Data" (LOD) is now upon us and it's one of those things that can't be ignored, no matter which sector of the internet you work in; if you do ignore it you'll probably become extinct (career-wise) pretty soon.

Sounds melodramatic, but the whole point of this text is to explain in real terms the effect it'll have on the everyday web worker: the web developer, web designer, SEO expert, internet marketer etc. I wrote it so that you, my current or future friends and associates, still have a job in a couple of years; and I researched it so that I would still have a job in a few years (+ because I love this stuff!)

A bit about Linked Open Data (LOD).

LOD can easily seem a huge, scary and new thing, overwhelming in so many ways with all this talk of a cloud, billions of bits of information in some part of the web that is separate to "us"; take one look at the diagrams of the linked open data cloud and you'll see those academic acronyms of scientific organisations, future-thinking global entities publishing their specialised data - nothing about you and me with our little blogs, and moreover nothing about our clients' websites.

Sure it's about getting massive amounts of data on the web, linked and open for use, but it's different to how you expect :)

Linked Open Data is simply about making the info we already put on the net (like this post) machine readable as well as human readable.

It *IS NOT* about creating some system to dump everything from our database in some weird format for a machine to read somewhere.

It *IS* about wrapping the data on a normal page in a bit of markup so that a computer knows what it is.

If you're writing about London you simply add a tiny bit of markup that says 'about="http://en.wikipedia.org/wiki/London"' - honestly that's it in real terms; the user reading your page knows it's about London, England - and now a system like Google knows that it's definitely about London, England too. In most cases though it's simpler; it's a case of saying this article is titled "x" and made by person "y" - that alone makes a huge difference to the net.

How LOD will change the web.

More Links! We're currently in the age of search, if you want something you search for it, to get more info you search again, and again and so on.

Link trust was at an all-time low a few years ago; sure, you'd click a navigation link on a site but not a link in a document, because it was probably to a popup, an advert, something you didn't want. Not now though; mainly thanks to bloggers with their in-text links to other pages, the world has grown to trust the link again.

Linked Open Data will spawn a massive increase in related data on page, related resources, articles, images, videos and more. And thus many, many more links.

This means that people will search less, and explore more; ever increasingly.

It's unavoidable, and even if a website isn't enhanced with all this extra linked data, odds are the user will have a browser extension or app running that will show all the related information anyway - these technologies are already here and used - adoption *will* grow, no way out of it, change happens.

Info for specific sectors

This isn't meant to be a full introduction or all-encompassing, in fact nowhere near it - if you want the ins and outs of LOD then look elsewhere. This info is for the everyday Web Developer, Web Designer and SEO Specialist.

Web Designers (+ those who work with html)

To be honest I think this change might hit you guys hardest; you see XHTML+RDFa is already here, and it'll be massive soon (and don't go thinking HTML5 will get you out of it, RDFa will be in there too). In short XHTML+RDFa is XHTML as you know it, but with support for embedded RDF information; really it means a few new attributes on elements that let you say what they are, using vocabularies such as FOAF, Dublin Core (DC) and the like. Any further description is outside the scope of this document ;)

What this means for you is that as well as having potentially a lot more to display on page (linked data) and lots of UI challenges, you also now have to cater for this RDFa data in your templates. It's not like other W3C stuff which you can ignore, and it's different to cross browser compatibility: if you leave it out or skip the RDFa stuff then the site will potentially be outside of the LOD network, traffic will drop and ultimately the site may as well not be "in" the web (might be a few years before that though) - so in many ways the end of ignorance and excuses.

You can currently slap out some HTML4, change the doctype, stick on jquery and make it "look" web 2.0 - and people will think it's web 2.0; with web 3.0 (the linked open data based net) you can't do that, it either is web 3.0 or isn't; there isn't a "web 3.0" look, just web 3.0 source.

Drupal 7 has RDFa support out of the box; within the year I bet every CMS & Blog will too; and if you make a new template with the RDFa cut out because you "don't know it", then I'm pretty sure it won't be long before your clients or employers cut you out; and we don't want that.

Further, if you don't - developers will be on your back big time & changing your source; or worse the SEO guys will be ;)

Web Developers
All of you need to know what triples are (subject-predicate-object), and URIs and CURIEs (not your normal URIs, URIs as Identifiers).

If you're going to be exposing data in your systems then you need to get used to mapping database properties through to RDF triples; that a user is a foaf:Person with a foaf:name; that tags are ctags and dc:subjects, and that articles have a dc:title (keeping it simple for this).
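To give a feel for that mapping, here is a minimal Python sketch that turns a database row into (subject, predicate, object) triples. The example.org base URIs, the row layout and the function names are all assumptions made up for illustration; the FOAF and Dublin Core namespaces are the real ones. In practice you would probably hand these triples to an RDF library rather than keep them as plain tuples.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def user_to_triples(row, base="http://example.org/user/"):
    """Map a user row to triples: the user is a foaf:Person with a foaf:name."""
    subject = f"{base}{row['id']}"
    return [
        (subject, RDF_TYPE, FOAF + "Person"),
        (subject, FOAF + "name", row["name"]),
    ]


def article_to_triples(row, base="http://example.org/article/"):
    """Map an article row to triples: one dc:title plus a dc:subject per tag."""
    subject = f"{base}{row['id']}"
    triples = [(subject, DC + "title", row["title"])]
    triples += [(subject, DC + "subject", tag) for tag in row["tags"]]
    return triples


# Hypothetical rows, as they might come back from a database query.
user_triples = user_to_triples({"id": 7, "name": "Nathan"})
article_triples = article_to_triples(
    {"id": 42, "title": "Preparing Yourself for Web 3.0", "tags": ["lod", "rdfa"]}
)

for triple in user_triples + article_triples:
    print(triple)
```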

If you're going to be consuming LOD data then you need to learn a bit more: RDF, SPARQL, OWL, ontologies and so on.

And if you want to get "in to" LOD in a big way, then go do it.

SEO Specialists
You need to know what the designers know, and you'll be changing from SEO specialists to data exploration optimizers or suchlike, focus will be on how you can make the data machine readable and get it linked in by the right services.. should be fun!

Further, you'll need to watch for how to get traffic to the sites; as mentioned, search engine traffic will drop slowly over the next few months and years, with more focus going on "links" from related pages. As for the diggs & reddits, who knows how it'll affect traffic from them.

Summary

IMHO it's in all of our best interests to just get on with this; it will happen, and the sooner YOU do it and convince your employers you have to make this move the better. Companies can easily lose clients too if your competition is offering "web 3" and you aren't.

At no point have I seen a tech hit the web which could literally leave people behind if they don't jump on board; it's happened in other industries and now ours (remember VHS?).

The two questions most people / companies / clients will immediately raise..

1] We don't want to expose all our data for reasons X,Y&Z!

LOD isn't about exposing all your data on the internet; it's about exposing the data you've already put on the internet in a more granular fashion, and about making that data machine readable.

Presently you may have an article on a page with a title and author credit in HTML. In the future you would still have the same author and title, however they would be wrapped in markup that allows a machine to understand that "Joe Blogs" is a person who is the author of the article, and that the article's title is "I'm scared of exposing my data".

If you consider your public-facing web pages, everything on those pages is already exposed; all we're doing here is describing what each bit of data is in a way we can all use.

2] Trust & Junk

One common misconception is that you have no control over the source of the data you pull from the "cloud", and that it could essentially be junk. However this couldn't be further from the truth, what we do is to find a source of data we trust that has their data exposed in a machine readable format, then query it for the exact information we want, and finally include or display it in our own system.

To illustrate, consider you wanted to reference the countries of the world with population in your system. Currently you would have to build a database table, populate that table with country name and country population, then write some code to display that data. In this scenario you'd probably get the population data from a credible source such as Wikipedia (copy and paste it in to your own database).

By using linked open data, you could treat the machine readable version of Wikipedia (DBpedia) as your database table, query it instead and again write some code to display the data on your own site.

You're displaying the same data, from the same trusted source; and you've selected which source you trust; it's not a case of just querying some cloud of data; it's a case of choosing which source(s) you want / trust and querying them.

As an additional bonus you don't need to worry about your information going out of date, as you're getting the data straight from source, the population of each country is updated on your site whenever it's updated on wikipedia.

Further, you don't need to worry about maintaining that list of countries, as in a single query you can pull out a list of all countries with each one's population; as the world grows and changes, so does your data.

Further still! Once you've made the move to using some linked open data, all the data you could want is at your fingertips. Let's say a decision is made to include 30 different bits of information about each country in your system. Consider that task for a minute - full system change, finding, collating and entering all that data; let alone maintaining it! Well, I'm sure you can guess the next bit: using LOD we can simply expand our original query to include the other bits of information we want, then display it - job done.
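As a rough illustration of "expanding the original query", here is a small Python sketch that builds the SPARQL text for the countries-with-population lookup and lets you bolt on extra properties. The property names (dbo:populationTotal, dbo:capital) are my assumptions about the DBpedia ontology and should be checked against the live data; the helper only builds a query string and doesn't talk to any endpoint.

```python
PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def country_query(extra=None):
    """Build a SPARQL query for countries and populations.

    extra maps a variable name to a property, e.g. {"capital": "dbo:capital"};
    each entry becomes one more OPTIONAL pattern and one more selected variable.
    """
    extra = extra or {}
    variables = ["?country", "?name", "?population"] + [f"?{v}" for v in extra]
    patterns = [
        "?country a dbo:Country .",
        "?country rdfs:label ?name .",
        "?country dbo:populationTotal ?population .",
    ]
    patterns += [
        f"OPTIONAL {{ ?country {prop} ?{var} }}" for var, prop in extra.items()
    ]
    patterns.append('FILTER( lang(?name) = "en" )')
    body = "\n".join("  " + p for p in patterns)
    return f"{PREFIXES}SELECT DISTINCT {' '.join(variables)}\nWHERE\n{{\n{body}\n}}"


basic = country_query()
extended = country_query({"capital": "dbo:capital"})
print(extended)
```

Going from the basic query to the extended one is a one-line change, which is really the whole point: no new tables, no data entry.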

That's it.

Good Luck!

nathan



The end of Search? Linked Data, Semantic Web & thoughts.

Earlier today I was reading an interesting post by Georgi Kobilarov entitled "What’s wrong with the Linked Data world, part 1 - Keyword Search"; this particularly sparked my interest because in all honesty "search" had never come in to my vision of the semantic web / linked data world.

To me, the draw of linked data and the semantic web has always been exploration; the notion that even the most unskilled of publishers should be able to enrich their content via semi-automated software to the standards of a near perfect wikipedia article has always been the driving force. Additionally, content classification, relation, linkage, data centralization and the like are all major benefits which will make a vast difference to the usability of the web.

Search will always be a major part of the internet, at the moment we use search to find content on a specific subject, then search again to find more, and search again to find related or expanded info, help, facts, answers, whatever; however, in the future I hope to see search move to a less prominent role, one where we use search to find the most suitable "entry point" in to the web of linked data - and from there every other piece of related / expanded information is either on page, or a click (link) away.

Some major hurdles need to be jumped before we can get to that stage though, both through lack of organization and lack of appropriate software. Personally I have a mental blueprint / overview of what's needed (imho), and some very specific ideas on the software side, with any luck I'll get a chance to contribute + build some of this, we'll see.

Some thoughts of what's needed from my little brain.

Linked Data Ping
A central service API which is pinged by all software as it publishes information with machine accessible content. (Needed way before (x)HTML+RDFa takes off). Provides a stream of all recent pings to be consumed (xmpp pub-sub?).

Clustered Servers holding a centralized data GUID lookup and proxy.
In essence all resources on the net should be a linked pair of GUID to endpoint, each endpoint should contain a reference to the GUID, and each GUID should be a URI which redirects to the endpoint, endpoints change GUID/URI stays unique. In an ideal scenario when somebody creates a link to X resource or Y document, the publishing/controlling software should replace the endpoint with the GUID instead. This would also enable multiple other services such as centralized pingback, references, statistics etc.

Machine Readable Data Cache.
Together with the aforementioned services a high availability database of cached information should exist; in principle this would work by reading the stream of "Linked Data Pings", getting the GUID for the content and then retrieving all machine readable data and caching it. Much like the RDF data exposed through dbpedia, however for everything. Even if only a predefined subset of the common RDF vocabularies was stored and exposed it'd be enough to start; from there all other domain specific ontologies could be retrieved by reading the endpoint itself.

Semantic CMS
Ideally we need a new breed of CMS, one that not only has simple FOAF and Dublin Core (~Drupal 7), but also support for full content enrichment using the aforementioned machine services; and provides a simple UI for manually exposing entities, events, facts etc. (Think highlight name in text, mark as Person with Name, system finds guid and builds relevant RDFa and we have another triple of linked data.)

The possibilities from this point are endless; if you're reading this document after all this has been made, then you'd see a whole host of in text links through to more information on each keyphrase, person, entity etc; you'd be aided by auto injection of sources, related reading, comments, further documents discussing the content here, in short you'd be exploring the net one click at a time, linked data all the way; not searching.

In summary (and very much imho), Linked Data is not for searching, it's for linking data - search was invented to address the issue that everything isn't linked, when it is then the link takes precedence again.

My only worry in all of this, is the idea that all rdf triples are fact, and true - already the major search engines are exposing rdfa data in summaries, 5* ratings on products and suchlike, the room for abuse will only get worse.

Thanks Georgi for placing the spark that clarified my current thoughts.

Finally, this isn't a biased opinion in any way, or an endorsement, but to me OpenLink Virtuoso, DBpedia, Zemanta and OpenCalais are leading the way and enabling all of this; together with the hard working folks contributing to the various linked W3C projects and specs. If only DBpedia/Zemanta/Calais would unify their URIs/GUIDs/endpoints we'd be a lot further along.. (well I would ;).

Regards!



Developer Cry

Whilst in my spare time I'm literally cramming to take in all the new technologies on the web and learn how to use them all, in working hours even the meatiest of projects involves old skills learned years ago; and worse, the common projects listed everywhere are little more than install X CMS, with Y Plugins and Z Template + a couple of tweaks.

Much of the work has been dumbed down to repetitive, narrow scoped projects that really take little to no "developer knowledge".

I'm sitting here racking my brain: how to find a way to spend 2010 learning further and putting all these good things in to practice? I'd love to spend a good chunk of the year piecing together all these new techs to create something great; and preferably with some of the ultra skilled developers I know - the scope is incredible, the boundaries are virtually limitless, and moreover, the technologies are there for the picking!

Surely, somewhere, there is a business man sitting racking his brain, fed up with the same old sites everyone has, aware of all these new techs, great APIs, and with the notion that something great can be created; posting job offers and hunting down the right developers for the task?

I just can't grasp that with all the forward thinking companies and people out there pushing boundaries, that nobody is wanting to take advantage of that and tie them all together.

So here's the problem; how to pair the two? how do we match up to the right companies & businesses. In fact.. how do I find the right people who want to exploit all of this?

Link up the goodies available from the semantic web, with the tech advances in streaming technologies, via a whole host of APIs, using the latest languages to target multiple platforms; polish it all off with the latest in graphical interfaces and develop a realtime monster application to be proud of.

I'd love to list the techs involved, but the list would be 100+; and more to the point - without a set project description to mesh this all together, we just don't know. But.. anything is possible.

Developers can you think of a way to break the masses in to the next generation of the internet?

Companies & business men, can you think of some projects to get us there?

The next stage of the net is sooo close, yet so far; all we need to do is grab it and go - I know there's a lot of forward thinkers that read this blog.. rack your brains for the good of us all!

Regards :)


Forced Coding


update:
I've been off for a week after finishing a project and moving house, in that time this domain dropped and just prior to that appeared to get 20k+ reads which is a bit omg. Anyways many comments all over the net, and certainly on reddit (comments here) - and thus just to clarify, the entire content of this post is purely a note to myself, and to help me in those times when you can't get going in the morning or such like.. in no way am I suggesting you don't plan or do things properly whenever you can - this is literally just some ways of getting through the day. Thought I'd made that clear and most people got it ;)
end update:

This one is more for my own reference, but sharing anyways as it may help others. In numerous scenarios it is really hard to "get going" when you're trying to code, particularly under the following circumstances:

* You haven't started coding till late(r) in the day
* Emails, Blogs, Social networking have taken up more time than expected and/or distracted more than anticipated
* You've just completed a deliverable, milestone or task.
* You're tired!

All of these are virtually daily occurrences in the coding world, and here are the methods I've used to get going again.. no particular order, just a list.

Under all circumstances, avoid planning!
Planning is one of those things you can't do unless you are already in the flow, whether this is because you've just had a client meeting, a long discussion, or read a full spec - it's not the thing to do to get your flow going; all you'll do is plan nothing, plan badly, or stare blankly at the screen / paper.

Don't read related material to get you in the flow.
This will purely serve to distract you, make you think about doing things differently, doubt what you've done or worse throw you in to planning mode - fact is you won't be planning "your" app though, you'll be planning "some" ideal app or scenario.

Pick the smallest task, whether complex or not, and just do it!
Doesn't matter what it is, so long as it's coding a little part of the app, or modifying part of it, then it'll do. It could be adding an extra field to an object or table, popping in some validation, anything small and simple. It really doesn't matter if you do it right or wrong; you're not doing it to sign off a task, you're doing it to re-acquaint yourself with your system. By the time you've been through X lines of code you'll be back in work mode and firing on all cylinders, well on your way to getting zoned.

Music, Headphones, Repeat.
You'll know the genre that suits you, personally I find repeating an album or even song fades me in to the zone and keeps me there. The repetitiveness of the tune keeps you there, because just as a phone call can distract you, so can a change in tune to something at a different tempo or worse a completely different genre.

Don't cram!
If you've only got 15 minutes before the next sizable interruption, forget it, don't do anything, just chill - make a coffee, smoke, whatever. You're not wasting time, you're saving your zone; you can only get zoned a couple of times a day, so don't get zoned for only 15 minutes - save it and get zoned for longer later on.

Speedcode
Why not? As Nike say, "just do it"; if the code has 10 bugs but is finished in half the time then you've done good, that gives you loads of time to fix the bugs, and more importantly you get to those moments where you realise x,y&z need to be changed much quicker. Not only that, but would you rather have a week to go and have a list of 80 bugs, or a week to go and 2 major deliverables a week overdue..

Communicate for no reason.
Often a major focus is simply talking to somebody else on the same project as you, whether it's the client or a workmate, and the more stressed they are the better; they'll not only blast you with things but their urgency / stress will often convey straight over to you and focus / zone you instantly.

Do the thing you know, not the thing you don't.
In line with "don't plan" and "pick the smallest task", always pick something you can already do (if possible); as with everything else, the things you don't know are much easier when you're already zoned. Not only that, but you'll be more focussed when doing the thing you don't know, so less likely to over spec / over code it.

Don't code other things!
Nothing on earth will kill your project like working on something else; every minute you spend on another script or app is like an hour lost on what you should be doing, and with every minute that passes you're getting closer and closer to utter project failure - hence why most open source projects are dead.

Remember, the key to forced coding is just to get you in the zone, and ultimately boils down to just getting stuck in there with some lines of code on your project.

Works for me anyways [most of the time] - if anybody has anything to add (constructive) then please do!

Regards - nathan
