<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>webr3.org &#187; internet</title>
	<atom:link href="http://webr3.org/blog/category/internet/feed/" rel="self" type="application/rss+xml" />
	<link>http://webr3.org/blog</link>
	<description>brain&#039;s on fire!</description>
	<lastBuildDate>Tue, 19 Jul 2011 15:38:29 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>linked data extractor prototype details</title>
		<link>http://webr3.org/blog/experiments/linked-data-extractor-prototype-details/</link>
		<comments>http://webr3.org/blog/experiments/linked-data-extractor-prototype-details/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 18:53:43 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[experiments]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[linked data]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[virtuoso]]></category>
		<category><![CDATA[Computing]]></category>
		<category><![CDATA[DBpedia]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[extractor]]></category>
		<category><![CDATA[Open access]]></category>
		<category><![CDATA[World Wide Web]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=308</guid>
		<description><![CDATA[I recently released a prototype linked data semantic extraction demo which combines OpenCalais, Zemanta and OpenLink Virtuoso to effectively categorize and work out what a given piece of text / document is about.
OpenCalais and Zemanta usage details and service comparison.
The demo leverages OpenCalais in order to pick up references to things, which are returned in [...]]]></description>
			<content:encoded><![CDATA[<p>I recently released a <a href="http://extractor.data.fm/?test">prototype linked data semantic extraction</a> demo which combines <a href="http://www.opencalais.com/">OpenCalais</a>, <a href="http://developer.zemanta.com/">Zemanta</a> and <a href="http://virtuoso.openlinksw.com/">OpenLink Virtuoso</a> to effectively categorize and work out what a given piece of text / document is about.</p>
<h3>OpenCalais and Zemanta usage details and service comparison.</h3>
<p>The demo leverages OpenCalais to pick up references to things, which in most cases are returned as string literals; OpenCalais can also be configured to return SocialTags, which give a broad-strokes idea of what the document is about, again as string-literal "tags". Although the references OpenCalais returns (semantic metadata: Entities, Facts, Events etc.) are generally string literals, they also come with vital Type and Relevance information, so in the case of "London" it will also assert that London is a City. Even when it has never come across a thing before, it can work out that, say, "Frank Neverbeenheardofbefore" is a Person.</p>
<p>Zemanta is also leveraged. The primary difference between Zemanta and OpenCalais (and thus the need for both services) is that Zemanta focuses more on accurate tagging of text. Zemanta tags (again string literals) are meaningful, commonly known tags which are linked both to existing Linked Data identifiers such as http://dbpedia.org/resource/London and to further information about the tag (or thing): in the case of London, it will often also provide links to the Wikipedia page for London, the official homepage of the city of London and a Google Maps link showing the position of London.</p>
<p>I should point out that OpenCalais increasingly returns Linked Data as well; for instance, London has been given an HTTP URI which can be dereferenced to retrieve more information about London. At a very crude estimate I would suggest that (depending on the subject matter) OpenCalais returns Linked Data URIs for about 15% of all the references it finds to well-known "things".</p>
<p>Weighing up the two services, I couldn't say that one is better than the other: both have advantages and disadvantages, and the only way to get a decent overall picture is to use both. For the benefit of feedback to both of these great services, though, here is a general comparison:</p>
<p>Note: none of these figures come from exact tests; they come from extensive developer usage of both services, as I've used them both since they were made public.</p>
<p>Zemanta is generally 2x as fast for average texts (the size of this post, for instance) and as much as 5x as fast for longer texts. Typical response times are 0.7 to 2 seconds for Zemanta and 1.5 to 10 seconds for OpenCalais. It may also be worth noting that Zemanta's availability is somewhat higher than that of OpenCalais: perhaps 1 in 250 calls to OpenCalais will fail.</p>
<p>OpenCalais does a lot more heavy work than Zemanta though, and *really* semantically analyzes the text to figure out a wealth of information. In this respect the tables are completely turned: Zemanta consistently provides a few high-quality known tags, whereas OpenCalais often provides at least 10x as much information about a given text, including relevance and type as mentioned before. OpenCalais also extracts Facts / Events, and further it can figure out that "Jim" is also "Jim Bob", and that Jim said X about Y on date D.</p>
<p>Generally you can trust 99% of the data from Zemanta, as it deals with "known" things; because of this, however, very new topics (such as the iPad for the first few days after its announcement) can remain unknown. Because OpenCalais deals with the unknown, you need to take more time to verify what it has returned; however, when OpenCalais assigns a Linked Data identifier to something or provides more information, you can be 99.99% sure that it is entirely accurate.</p>
<p>It's worth noting that the two services do different things, and both do them extremely well: Zemanta "tags" and OpenCalais "semantically extracts information". In some respects I was hesitant about comparing the two, as in the context of what I'm doing both are needed and both are equal; in other contexts, however, they do different jobs and people need to select one over the other.</p>
<p>Out of all the competition, though, I would highly recommend both Zemanta and OpenCalais over their respective competitors, and I do hope that neither of these great services ever decides to target the other's market (they complement each other well, and both do so well because they stick to what they are good at).</p>
<h3>extractor.data.fm details</h3>
<p>This demo deals primarily with figuring out what a document is about; it aims to return a list of:</p>
<ul>
<li><strong>Categories</strong><br />A list of 1-5 DBpedia (and therefore Wikipedia) categories under which the provided document would be categorized if it were a Wikipedia article categorized by a human knowledgeable in the subject domain(s) of the text.</li>
<li><strong>General Topics</strong><br />A short list of the general, broad-strokes subjects covered by the document. These are distinct from both the primary specific subjects and the categories, and in many ways can be seen as the most common intersections between the primary specific subjects discussed.</li>
<li><strong>Primary Subjects</strong><br />These are the specific subjects covered in the document, not just the things mentioned, but the things actively discussed within the document, the primary subject matter as it were.</li>
<li><strong>Related or Mentioned Subjects</strong><br />Whilst I've termed them "related" as in dcterms:related, these are simply things which have been detected in the document or text and which are not primary subjects; in many ways "mentions" may be a more appropriate term.</li>
</ul>
<p>Out of the above list, the two services do the heavy lifting to give the demo its Primary Subjects and Related Subjects. In short, OpenCalais' SocialTags and Zemanta's Tags give us our Primary Subjects, whilst OpenCalais' semantic extraction provides the Related Subjects: namely, all those extracted semantics which have the Type of a real thing (not an IndustryTerm or Event) and which are not already a Primary Subject. Additionally, extracted semantics which are not tags but have a relevance higher than a certain score are boosted up to be Primary Subjects too.</p>
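<p>As a rough illustration of the rules above, here is a minimal Python sketch. It assumes each OpenCalais extraction has already been reduced to a name, a Type and a relevance score between 0 and 1; the <code>Extraction</code> class, the <code>assign_subjects</code> function and the 0.6 promotion threshold are hypothetical stand-ins rather than the demo's actual code or settings, and it reads the text as promoting any sufficiently relevant extraction regardless of Type.</p>

```python
from dataclasses import dataclass

# Hypothetical, simplified shape of one OpenCalais extraction.
@dataclass
class Extraction:
    name: str
    type: str         # e.g. "City", "Person", "IndustryTerm"
    relevance: float  # assumed to be a score between 0 and 1

# Types that are not "real things", taken from the examples in the text.
NON_THING_TYPES = {"IndustryTerm", "Event"}
# Hypothetical cut-off above which an extraction is promoted to Primary.
PROMOTE_THRESHOLD = 0.6

def assign_subjects(social_tags, zemanta_tags, extractions):
    """Return (primary, related) sets of subject names."""
    # Tags from both services form the initial Primary Subjects.
    primary = set(social_tags) | set(zemanta_tags)
    related = set()
    for e in extractions:
        if e.name in primary:
            continue  # already a Primary Subject
        if e.relevance > PROMOTE_THRESHOLD:
            primary.add(e.name)  # boosted up to a Primary Subject
        elif e.type not in NON_THING_TYPES:
            related.add(e.name)  # a real thing that is merely mentioned
    # An item promoted later in the loop must not stay in both sets.
    return primary, related - primary

primary, related = assign_subjects(
    ["Climate change"],
    ["London"],
    [
        Extraction("London", "City", 0.9),
        Extraction("Met Office", "Organization", 0.8),
        Extraction("Thames", "NaturalFeature", 0.2),
        Extraction("forecasting", "IndustryTerm", 0.3),
    ],
)
print(sorted(primary), sorted(related))
```

<p>With the sample input, London is skipped because it is already a tag, Met Office is promoted on relevance, Thames becomes a Related Subject and the IndustryTerm is dropped.</p>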
<p>A primary and initial function of the demo is to associate the tags returned by both services and figure out when each is talking about the same thing. This is handled first through the Linked Data they return: where both services are talking about the same thing, you know this unambiguously, thanks to the nature of HTTP URIs and the two being "sameAs" each other. After this, two chunks of unhandled data remain: Zemanta tags which have not been determined to be the same as OpenCalais ones, and OpenCalais semantics for which we have a string-literal name and a type.</p>
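<p>One way to do the "same thing" grouping, assuming the sameAs assertions have already been collected as pairs of URIs, is a small union-find over those URIs. This is a sketch of the idea rather than the demo's implementation; the example.org URIs in the usage are placeholders.</p>

```python
def merge_same_as(pairs):
    """Group URIs into sets that denote the same thing, given (uri, uri) sameAs pairs."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps trees shallow
            uri = parent[uri]
        return uri

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())

groups = merge_same_as([
    ("http://dbpedia.org/resource/London", "http://example.org/calais/london"),
    ("http://example.org/zemanta/london", "http://dbpedia.org/resource/London"),
    ("http://dbpedia.org/resource/Paris", "http://example.org/calais/paris"),
])
print(groups)
```

<p>Because sameAs is transitive, the Zemanta and OpenCalais URIs for London end up in one group even though they are never directly linked to each other.</p>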
<p>In steps <a href="http://virtuoso.openlinksw.com/">OpenLink Virtuoso</a> 6.1 (<a href="http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSIndex">open source edition</a>!), with most of DBpedia 3.4 loaded, to do the heavy lifting from here on. Virtuoso is a really powerful bit of kit and has replaced MySQL/SQL Server/Postgres, the RDF store and the WebDAV server in my typical server stack. The public LOD and DBpedia endpoints really don't do justice to just how powerful and fast Virtuoso is: queries which take a few seconds on the public endpoint return in hundredths of a second on my local (low-spec) server, and its performance compared to the aforementioned RDBMS solutions is not to be sniffed at.</p>
<p>To handle the typed string literals from OpenCalais, I built a custom DBpedia lookup service (using SPARQL over the aforementioned Virtuoso + DBpedia setup) which tries to unambiguously determine the identifier for a string literal, if it is known; the results are pretty good and I'd safely say that it gets it right in 98% of cases. This essentially turns the remaining unknown string literals into known Linked Data URIs, and as a side benefit gives the correct full name for the thing identified, with the correct casing, and of course much more Linked Data.</p>
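<p>The lookup service itself isn't published, so the following is only a hypothetical sketch of the kind of SPARQL such a service might issue: match an exact English <code>rdfs:label</code> and, where the OpenCalais type can be mapped, restrict by a DBpedia ontology class. The <code>TYPE_TO_CLASS</code> mapping and the function names are assumptions, and a real lookup would also need to handle redirects, disambiguation pages and ranking.</p>

```python
# Hypothetical mapping from OpenCalais entity types to DBpedia ontology classes.
TYPE_TO_CLASS = {
    "City": "http://dbpedia.org/ontology/City",
    "Person": "http://dbpedia.org/ontology/Person",
    "Company": "http://dbpedia.org/ontology/Company",
}

def escape_literal(text):
    """Escape backslashes and double quotes for a double-quoted SPARQL literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def build_lookup_query(name, calais_type, lang="en", limit=5):
    """Build a SPARQL SELECT for resources with an exact label (and class, if known)."""
    cls = TYPE_TO_CLASS.get(calais_type)
    type_pattern = f"  ?s a <{cls}> .\n" if cls else ""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?s WHERE {\n"
        f'  ?s rdfs:label "{escape_literal(name)}"@{lang} .\n'
        f"{type_pattern}"
        f"}}\nLIMIT {limit}"
    )

print(build_lookup_query("London", "City"))
```

<p>Matching the exact label is a strict choice: it will miss variants such as abbreviations, but the candidates it does return are less likely to be ambiguous.</p>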
<p>The demo is now left with a few OpenCalais semantics which are still unknown, but for which we know the Type and have a name; and as URIs are given to things that can be named, I simply mint my own URIs for these and specify the OpenCalais identifier as a "seeAlso" (to be future-compatible with a time when they associate their own hash URIs through to Linked Data).</p>
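<p>Minting a URI for an unknown thing can be as simple as combining a base namespace with a slug of the name, then recording the type, the label and the OpenCalais identifier as <code>rdfs:seeAlso</code>. The sketch below writes those statements as N-Triples; the <code>http://example.org/id/</code> base, the slug scheme and the placeholder type and OpenCalais URIs are all assumptions for illustration.</p>

```python
import re

# Hypothetical base namespace for locally minted identifiers.
BASE = "http://example.org/id/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

def slugify(name):
    """Lower-case the name and collapse runs of other characters into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def mint(name, type_uri, calais_id):
    """Mint a local URI for an unknown thing and describe it as N-Triples."""
    uri = BASE + slugify(name)
    label = name.replace("\\", "\\\\").replace('"', '\\"')
    triples = [
        f"<{uri}> <{RDF_TYPE}> <{type_uri}> .",
        f'<{uri}> <{RDFS_LABEL}> "{label}" .',
        f"<{uri}> <{RDFS_SEE_ALSO}> <{calais_id}> .",  # link back to OpenCalais
    ]
    return uri, "\n".join(triples)

uri, ntriples = mint(
    "Frank Neverbeenheardofbefore",
    "http://example.org/types/Person",  # placeholder type URI
    "http://example.org/calais/123",    # placeholder OpenCalais identifier
)
print(ntriples)
```

<p>A production scheme would also need to deal with slug collisions (two different people with the same name) and with non-ASCII names, whose characters this slug function simply drops.</p>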
<p>At this point the demo has determined all of the Primary Subjects and Related Subjects and, where possible, linked them through to additional Linked Data and human-readable web documents about the subjects.</p>
<h4>Categorization</h4>
<p>This is where the script comes into its own and really leverages Virtuoso; up to this point it's all been about cleaning, validating, looking up, associating and suchlike.</p>
<p>Given that we now have Linked Data HTTP URIs for all the subjects we are dealing with, and DBpedia URIs for every Primary Subject, the demo can start to use some of Virtuoso's more powerful features. The first port of call is to get the category intersection of all Primary Subjects (including the inferred categories!) via a slightly complex transitive SPARQL query over the DBpedia dataset. From here the demo calculates a set of primary categories which the text is related to, then finds the general category intersection (again including inferred categories) between the primary categories and the Primary Subjects. The results carry a wealth of numerical information which the demo duly considers, and from which it can infer the General Subjects and the Categories for the text.</p>
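<p>The transitive query isn't given here, so as a stand-in, the sketch below shows the same counting idea in plain Python over an in-memory copy of the category graph: each subject's direct categories are expanded through broader-category links up to a depth limit, and every category (direct or inferred) is counted once per subject that falls under it. The dictionaries, function names and depth limit of 3 are assumptions for illustration; on a real DBpedia load this work would be pushed into the SPARQL query instead.</p>

```python
from collections import Counter, deque

def ancestors(category, broader, max_depth=3):
    """Return the category plus every broader category within max_depth hops."""
    seen = {category}  # also guards against cycles in the category graph
    queue = deque([(category, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # stop expanding beyond the depth limit
        for parent in broader.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return seen

def category_counts(subject_categories, broader, max_depth=3):
    """Count how many subjects fall under each direct or inferred category."""
    counts = Counter()
    for categories in subject_categories.values():
        covered = set()
        for category in categories:
            covered |= ancestors(category, broader, max_depth)
        counts.update(covered)  # each subject counts once per category
    return counts

subject_categories = {
    "London": ["Cities_in_England"],
    "Manchester": ["Cities_in_England"],
    "River_Thames": ["Rivers_of_England"],
}
broader = {
    "Cities_in_England": ["Populated_places_in_England"],
    "Populated_places_in_England": ["Geography_of_England"],
    "Rivers_of_England": ["Geography_of_England"],
}
print(category_counts(subject_categories, broader))
```

<p>With the sample data, Geography_of_England is the only category shared by all three subjects, which is the kind of signal that points towards a General Topic; the category names and links are invented purely to show the counting.</p>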
<p>At some point I'll cover this part of the script in more detail and give some Virtuoso-specific transitive SPARQL queries for you to use in your own such creations, but for now the above will have to do.</p>
<h3>Conclusion</h3>
<p>This extractor demo is something I've been working on and trying to achieve for about 5 years, and whilst it is still early days it's the first time the technologies have been available to both make it possible, and to utilize the results correctly to achieve what I'm aiming for overall.</p>
<p>The overall goal is to create a system which allows users to simply drop in content, and the system "files" it in the correct categories, lists it under the correct subjects and interlinks it with other resources via typed links such as "related resources" and looser resource lists of "also mentioned here". Further benefits of such a system are that you can accurately figure out what readers are interested in and promote new content to them, and you can give users the option of content streams where they can watch specific subjects, or combinations of subjects, to be notified of their "ideal" reading. On the flip side, you can also identify users' and contributors' interests and expertise, and correlate these (with geo-location) to suggest other users they may wish to collaborate with, other organisations doing the same work in the same fields, and many such uses.</p>
<p>In reality I have much of this implemented in a site I've been working on for the last year, which is just being rolled out again, and the system works extremely well, with huge benefits to all involved. The site deals with climate adaptation: it provides a service to the general adaptation community, where they can share and find knowledge, and, more importantly, it serves organisations working on critical issues by letting them see which people / organisations / projects are doing what, and where. This allows them to co-ordinate efforts and, perhaps more importantly, to avoid duplicating efforts and wasting resources where it counts most. This has a positive impact on the world's poorest nations and the suffering people these organisations are trying to work with and help.</p>
<p>Back to the demo: with the context described, the extractor.data.fm demo is a quick UI around an API which is in many ways the backbone of the aforementioned system. The API is used in a semi-automated way: the data it returns is verified in a UI by the content authors / admins, who remove any ambiguous data and then hit save; from there everything is automated again and the system functions as above.</p>
<p>I'm unsure whether this kind of system will ever be able to be fully automated (or whether it's wise to allow this), as certain scenarios just can't be covered yet; a real-life example is an initiative called "TEA". Ambiguity at this level, with entities which are unknown to systems or even to the web of data, will always be an issue at some point. As things progress it may be that such entities are only ambiguous once, on their first discovery, but that is still once; hence this may always have to be a semi-automated process.</p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/experiments/linked-data-extractor-prototype-details/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Reading List : Web, Linked Data, REST, Semantic Web</title>
		<link>http://webr3.org/blog/internet/reading-list-web-linked-data-rest-semantic-web/</link>
		<comments>http://webr3.org/blog/internet/reading-list-web-linked-data-rest-semantic-web/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 02:03:17 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[internet]]></category>
		<category><![CDATA[1.1 Uniform HTTP Protocol]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[Atom Publishing Protocol]]></category>
		<category><![CDATA[Computing]]></category>
		<category><![CDATA[HTTP]]></category>
		<category><![CDATA[Knowledge representation]]></category>
		<category><![CDATA[linked data]]></category>
		<category><![CDATA[Paging]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Query languages]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[Reification]]></category>
		<category><![CDATA[Resource]]></category>
		<category><![CDATA[Roy T. Fielding]]></category>
		<category><![CDATA[Roy T. Fielding Dissertation]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[Simple Knowledge Organization System]]></category>
		<category><![CDATA[Software engineering]]></category>
		<category><![CDATA[SPARQL]]></category>
		<category><![CDATA[Technology/Internet]]></category>
		<category><![CDATA[URIs Resources]]></category>
		<category><![CDATA[Web Access Control]]></category>
		<category><![CDATA[Web Architecture]]></category>
		<category><![CDATA[Web Resources     Named Graphs]]></category>
		<category><![CDATA[Web services]]></category>
		<category><![CDATA[Web Tutorial   Mindswap]]></category>
		<category><![CDATA[World Wide Web]]></category>
		<category><![CDATA[write-enabled Web]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=264</guid>
		<description><![CDATA[Personally, I have two types of reading: the posts etc. that I "tweet", and the heavier reading I do over time. This is a list of the latter for the past month; hopefully it'll help somebody who's looking for the same kind of info I have been.
I've grouped all the links into [...]]]></description>
			<content:encoded><![CDATA[<p>Personally, I have two types of reading: the posts etc. that I "tweet", and the heavier reading I do over time. This is a list of the latter for the past month; hopefully it'll help somebody who's looking for the same kind of info I have been.</p>
<p>I've grouped all the links into two main sections, and then sub-grouped them by how they make sense in my head! :)</p>
<h3>Web, HTTP and REST</h3>
<p>Roy T. Fielding Dissertation - <a href="http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm" target="_blank">Architectural Styles and the Design of Network-based Software Architectures</a>. Of particular relevance and note are chapters 4-6 (many only ever read chapter 5 and miss the context and summary *needed* in chapters 4 and 6!)</p>
<ul>
<li><a href="http://www.ics.uci.edu/~fielding/pubs/dissertation/web_arch_domain.htm" target="_blank"> Chapter 4 - Designing the Web Architecture: Problems and Insights</a></li>
<li><a href="http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm" target="_blank"> Chapter 5 - Representational State Transfer (REST)</a></li>
<li><a href="http://www.ics.uci.edu/~fielding/pubs/dissertation/evaluation.htm" target="_blank"> Chapter 6 - Experience and Evaluation</a></li>
</ul>
<p><a href="http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven" target="_blank"> Roy T. Fielding - REST APIs must be hypertext-driven</a><br />
<a href="http://www.mail-archive.com/whatwg@lists.whatwg.org/msg12443.html" target="_blank"> Discussion on HTML5 and RESTful HTTP in browsers</a><br />
<a href="http://tech.groups.yahoo.com/group/rest-discuss/message/5168" target="_blank"> Discussion on URIs Resources and Switching content types w/ REST angle (v good)</a></p>
<p><a href="http://www.w3.org/Protocols/rfc2616/rfc2616.html" target="_blank">RFC 2616 HTTP/1.1</a> and the HTTPbis Working Group HTTP/1.1 update in parts:</p>
<ol>
<li><a href="http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging" target="_blank">Messaging</a></li>
<li><a href="http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics" target="_blank">Semantics</a></li>
<li><a href="http://tools.ietf.org/html/draft-ietf-httpbis-p3-payload" target="_blank">Payload</a></li>
<li><a href="http://tools.ietf.org/html/draft-ietf-httpbis-p4-conditional" target="_blank">Conditional</a></li>
<li><a href="http://tools.ietf.org/html/draft-ietf-httpbis-p5-range" target="_blank">Range</a></li>
<li><a href="http://tools.ietf.org/html/draft-ietf-httpbis-p6-cache" target="_blank">Cache</a></li>
<li><a href="http://tools.ietf.org/html/draft-ietf-httpbis-p7-auth" target="_blank">Authentication</a></li>
</ol>
<h3>Linked Data and the Semantic Web</h3>
<p><a href="http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData" target="_blank"> Linking Open Data Community Project</a><br />
<a href="http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/Applications" target="_blank"> Linked Data Applications</a><br />
<a href="http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/EquivalenceMining" target="_blank"> Equivalence Mining and Matching Frameworks</a><br />
<a href="http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/SemWebClients" target="_blank"> Linked Data Browsers, Mashups and other Client Applications</a></p>
<p><a href="http://esw.w3.org/topic/DatasetDynamics" target="_blank"> Dataset Dynamics - On the Dynamics of Linked Datasets</a><br />
<a href="http://esw.w3.org/topic/WriteWebOfData" target="_blank"> Realizing a write-enabled Web of Data</a><br />
<a href="http://esw.w3.org/topic/WebAccessControl" target="_blank"> Web Access Control (WAC)</a>  - a decentralized system for allowing different users and groups various forms of access to resources where users and groups are identified by HTTP URIs.<br />
<a href="http://esw.w3.org/topic/WebAccessControl/Vocabulary" target="_blank"> Discussion of the WAC vocabulary</a><br />
<a href="http://www.w3.org/DesignIssues/CloudStorage.html" target="_blank"> Socially Aware Cloud Storage Design Note</a><br />
<a href="http://www.w3.org/2010/Talks/0303-socialcloud-tbl/" target="_blank"> Distributed Social Networking through Socially Aware Cloud Storage from TimBL</a><br />
<a href="http://esw.w3.org/topic/AwwswHome" target="_blank"> AWWSW - "Architecture of the World Wide Semantic Web" Task Force</a></p>
<p><a href="http://www.w3.org/TR/sparql11-http-rdf-update/" target="_blank"> SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs</a><br />
<a href="http://www4.wiwiss.fu-berlin.de/pubby/" target="_blank"> A Linked Data Frontend for SPARQL Endpoints</a><br />
<a href="http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html" target="_blank"> RAP - RDF API for PHP V0.9.6</a><br />
<a href="http://buzzword.org.uk/2009/posted-data/" target="_blank"> Inav the Terrible - An idea for posting RDF through HTTP.</a></p>
<p><a href="http://n2.talis.com/wiki/Changesets" target="_blank"> Talis Changesets</a><br />
<a href="http://triplify.org/vocabulary/update" target="_blank"> Triplify Update Vocabulary</a></p>
<p><a href="http://inkdroid.org/journal/2009/11/04/skos-as-atom/" target="_blank"> skos as atom</a></p>
<p><a href="http://tools.ietf.org/html/rfc4287" target="_blank"> RFC 4287 - The Atom Syndication Format</a><br />
<a href="http://tools.ietf.org/html/rfc5023" target="_blank"> RFC 5023 - The Atom Publishing Protocol</a><br />
<a href="http://ietfreport.isoc.org/all-ids/draft-snell-atompub-tombstones-06.txt" target="_blank"> AtomPub Tombstones - The Atom "deleted-entry" Element</a><br />
<a href="http://tools.ietf.org/html/rfc5005" target="_blank"> RFC 5005 - Feed Paging and Archiving</a><br />
<a href="http://tools.ietf.org/html/draft-brown-versioning-link-relations-07" target="_blank"> Versioning Link Relations - Link Relation Types for Simple Version Navigation between Web Resources</a></p>
<p><a href="http://www2005.org/cdrom/docs/p613.pdf" target="_blank"> Named Graphs, Provenance and Trust</a><br />
<a href="http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-521/paper1.pdf" target="_blank"> Accessing Site-Specific APIs Through Write-Wrappers From The Web of Data</a><br />
<a href="http://events.linkeddata.org/ldow2009/papers/ldow2009_paper18.pdf" target="_blank"> Provenance Information in the Web of Data - LDOW 2009 paper</a><br />
<a href="http://www.w3.org/2002/Talks/0910-rdf-reification/Overview.html" target="_blank"> Using Reification To Extend RDF</a> (historical reification approach)<br />
<a href="http://dig.csail.mit.edu/2009/presbrey/UAP.pdf" target="_blank"> RDF Policy-based URI Access Control for Content Authoring</a><br />
<a href="http://eprints.ecs.soton.ac.uk/18332/1/opm.pdf" target="_blank"> The Open Provenance Model Core Specification (v1.1)</a><br />
<a href="http://www.w3.org/2005/Incubator/prov/wiki/Main_Page" target="_blank"> W3C Provenance Incubator Group</a></p>
<p><a href="http://www.w3.org/History/" target="_blank"> History of the Web 1945, 1980 through 1997 on W3</a><br />
<a href="http://www.w3.org/TR/leiri/" target="_blank"> LEIRI - Legacy extended IRIs for XML resource identification</a> The type of "URI" used in xml:base<br />
<a href="http://www.slideshare.net/LeeFeigenbaum/cshals-2010-w3c-semanic-web-tutorial" target="_blank"> CSHALS 2010 W3C Semanic Web Tutorial</a><br />
<a href="http://www.mindswap.org/2002/rdfconvert/" target="_blank">Mindswap online RDF Converter</a><br />
<a href="http://www.w3.org/RDF/Validator/" target="_blank">W3 online RDF Validator</a></p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/internet/reading-list-web-linked-data-rest-semantic-web/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Preparing Yourself for Web 3.0, LOD and 2010+</title>
		<link>http://webr3.org/blog/featured/preparing-yourself-for-web-3-0-lod-and-2010/</link>
		<comments>http://webr3.org/blog/featured/preparing-yourself-for-web-3-0-lod-and-2010/#comments</comments>
		<pubDate>Fri, 30 Oct 2009 00:08:51 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[RDFa]]></category>
		<category><![CDATA[featured]]></category>
		<category><![CDATA[general]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[linked data]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[author]]></category>
		<category><![CDATA[everyday Web Developer]]></category>
		<category><![CDATA[FOAF]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[London]]></category>
		<category><![CDATA[mentioned search engine traffic]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[public facing web pages]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[same author]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[Technology/Internet]]></category>
		<category><![CDATA[United Kingdom]]></category>
		<category><![CDATA[Web 2.0]]></category>
		<category><![CDATA[Web Designer and SEO Specialist]]></category>
		<category><![CDATA[Web Developers]]></category>
		<category><![CDATA[XHTML]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=172</guid>
		<description><![CDATA[If you work on the net then you'll probably have heard of the "semantic web"; it's nice, but you can ignore it and get along just fine. However, "Linked Open Data" (LOD) is now upon us and it's one of those things that can't be ignored, no matter which sector of the internet you work [...]]]></description>
			<content:encoded><![CDATA[<p>If you work on the net then you'll probably have heard of the "semantic web"; it's nice, but you can ignore it and get along just fine. However, "Linked Open Data" (LOD) is now upon us and it's one of those things that can't be ignored, no matter which sector of the internet you work in: if you do ignore it, you'll probably become extinct (career-wise) pretty soon.</p>
<p>Sounds melodramatic, but the whole point of this text is to explain in real terms the effect it'll have on the everyday web worker: the web developer, web designer, SEO expert, internet marketer etc. The aim is that you, my current or future friends and associates, still have a job in a couple of years; and I researched it so that I would still have a job in a few years (+ because I love this stuff!)</p>
<h2><strong>A bit about Linked Open Data (LOD).</strong></h2>
<p>LOD can easily seem a huge, scary, new thing, overwhelming in so many ways with all this talk of a cloud and billions of bits of information in some part of the web that is separate from "us". Take one look at the diagrams of the Linked Open Data cloud and you'll see the academic acronyms of scientific organisations and forward-thinking global entities publishing their specialised data; nothing about you and me with our little blogs, and moreover nothing about our clients' websites.</p>
<p>Sure, it's about getting massive amounts of data on the web, linked and open for use, but it's different from what you expect :)</p>
<p>Linked Open Data is simply about making the info we already put on the net (like this post) machine readable as well as human readable.</p>
<p>It *IS NOT* about creating some system to dump everything from our database in some weird format for a machine to read somewhere.</p>
<p>It *IS* about wrapping the data on a normal page in a bit of markup so that a computer knows what it is.</p>
<p>If you're writing about London you simply add a tiny bit of markup that says 'about="http://en.wikipedia.org/wiki/London"'; honestly, that's it in real terms. The user reading your page knows it's about London, England, and now a system like Google knows that it's definitely about London, England too. In most cases, though, it's simpler still: it's a case of saying this article is titled "x" and made by person "y", and that alone makes a huge difference to the net.</p>
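<p>To make the "machine readable" point concrete, the sketch below uses Python's standard <code>html.parser</code> to pull the <code>about</code> value out of a marked-up snippet, the same piece of information a consuming system would pick up. It is a toy: a real RDFa processor also handles <code>property</code>, <code>typeof</code>, CURIEs and the way child elements inherit their subject, and the sample markup is made up.</p>

```python
from html.parser import HTMLParser

class AboutCollector(HTMLParser):
    """Collect (tag, about) pairs for elements that carry an RDFa about attribute."""

    def __init__(self):
        super().__init__()
        self.subjects = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag
        for name, value in attrs:
            if name == "about":
                self.subjects.append((tag, value))

snippet = (
    '<div about="http://en.wikipedia.org/wiki/London">'
    "<p>Notes from a week in London.</p>"
    "</div>"
)
collector = AboutCollector()
collector.feed(snippet)
print(collector.subjects)
```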
<h2><strong>How LOD will change the web.</strong></h2>
<p>More Links! We're currently in the age of search, if you want something you search for it, to get more info you search again, and again and so on.</p>
<p>Link trust was at an all-time low a few years ago: sure, you'd click a navigation link on a site, but not a link in a document, because it was probably to a popup, an advert, something you didn't want. Not now though; mainly thanks to bloggers with their in-text links to other pages, the world has grown to trust the link again.</p>
<p>Linked Open Data will spawn a massive increase in on-page related data: related resources, articles, images, videos and more. And thus many, many more links.</p>
<p>This means that people will search less and explore more, increasingly so.</p>
<p>It's unavoidable, and even if a website isn't enhanced with all this extra linked data, odds are the user will have a browser extension or app running that will show all the related information anyway - these technologies are already here and used - adoption *will* grow, no way out of it, change happens.</p>
<h2><strong>Info for specific sectors</strong></h2>
<p>This isn't meant to be a full or all-encompassing introduction, in fact nowhere near it; if you want the ins and outs of LOD then look elsewhere. This info is for the everyday Web Developer, Web Designer and SEO Specialist.</p>
<p><strong>Web Designers (+ those who work with html)</strong></p>
<p>To be honest I think this change might hit you guy's hardest; you see XHTML+RDFa is already here, it'll be massive soon (and don't go thinking HTML5 will get you out of it, RDFa will be in there too). In short XHTML+RDFa is xhtml as you know it, but with support for embedded RDF information, really it means a few new properties on elements that let you say what they are; in place FOAF, Dublin Core (DC) and the like. Any further description is outside the scope of this document ;)</p>
<p>What this means for you is that, as well as potentially having a lot more to display on the page (linked data) and lots of UI challenges, you now also have to cater for RDFa data in your templates. It's not like other W3C stuff you can ignore, and it's different from cross-browser compatibility: if you leave out or skip the RDFa, the site will potentially sit outside the LOD network, traffic will drop, and ultimately the site may as well not be "in" the web (though that might be a few years off). In many ways, it's the end of ignorance and excuses.</p>
<p>Currently you can slap out some HTML4, change the doctype, stick on jQuery and make it "look" web 2.0, and people will think it's web 2.0. With web 3.0 (the linked-open-data-based net) you can't do that: a site either is web 3.0 or it isn't. There isn't a "web 3.0" look, just web 3.0 source.</p>
<p>Drupal 7 has RDFa support out of the box, and I bet every CMS &amp; blog will within the year. If you make a new template with the RDFa cut out because you "don't know it", I'm pretty sure it won't be long before your clients or employers cut you out, and we don't want that.</p>
<p>Further, if you don't, developers will be on your back big time and changing your source; or worse, the SEO guys will be ;)</p>
<p><strong>Web Developers</strong><br />
All of you need to know what triples are (subject-predicate-object), and what URIs and CURIEs are (not URIs as you normally use them, but URIs as identifiers).</p>
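<p>As a minimal sketch (Python; the <code>PREFIXES</code> table and the example person URI are assumptions for illustration), a CURIE is just shorthand: a prefix mapped to a namespace URI, plus a local name. Expand it and you get the full URI that serves as the predicate of a subject-predicate-object triple. In XHTML+RDFa those prefix mappings are declared in the page itself with <code>xmlns:</code> attributes.</p>

```python
# Namespace URIs for a few common vocabularies, keyed by their usual prefixes.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a CURIE such as 'foaf:name' into a full URI."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in CURIE {curie!r}")
    return prefixes[prefix] + reference


def triple(subject, predicate_curie, obj):
    """Return a (subject, predicate, object) tuple with the predicate expanded."""
    return (subject, expand_curie(predicate_curie), obj)


# A URI used as an identifier: by convention this one names a person, not a web page.
joe = "http://example.com/people/joe#me"
print(triple(joe, "foaf:name", "Joe Blogs"))
# ('http://example.com/people/joe#me', 'http://xmlns.com/foaf/0.1/name', 'Joe Blogs')
```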
<p>If you're going to be exposing data from your systems, you need to get used to mapping database properties to RDF triples: that a user is a foaf:Person with a foaf:name, that tags are ctag:Tags and dc:subjects, and that articles have a dc:title (keeping it simple here).</p>
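<p>Here's a minimal Python sketch of that mapping. The row dictionaries, their column names (<code>id</code>, <code>name</code>, <code>title</code>, <code>author_id</code>, <code>tags</code>) and the <code>http://example.com</code> base URI are all hypothetical; the vocabulary terms are the real RDF, FOAF and Dublin Core ones. Literal values are wrapped in a small <code>Literal</code> type so they can't be confused with URIs, and the Common Tag (ctag) part is left out to keep it short.</p>

```python
from typing import NamedTuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"


class Literal(NamedTuple):
    """A plain text value, as opposed to a URI."""
    value: str


def user_to_triples(base, row):
    """Map a (hypothetical) users-table row to a foaf:Person with a foaf:name."""
    subject = f"{base}/users/{row['id']}"
    return [
        (subject, RDF + "type", FOAF + "Person"),
        (subject, FOAF + "name", Literal(row["name"])),
    ]


def article_to_triples(base, row):
    """Map a (hypothetical) articles-table row to dc:title, dc:creator and dc:subject."""
    subject = f"{base}/articles/{row['id']}"
    triples = [
        (subject, DC + "title", Literal(row["title"])),
        # Point at the author's URI rather than repeating their name.
        (subject, DC + "creator", f"{base}/users/{row['author_id']}"),
    ]
    # One dc:subject triple per tag.
    triples += [(subject, DC + "subject", Literal(tag)) for tag in row["tags"]]
    return triples
```

<p>Each row becomes a handful of triples; a serializer (RDFa in your templates, N-Triples, RDF/XML and so on) would then publish them.</p>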
<p>If you're going to be consuming LOD, you need to learn a bit more: RDF, SPARQL, OWL, ontologies and so on.</p>
<p>And if you want to get "into" LOD in a big way, go do it.</p>
<p><strong>SEO Specialists</strong><br />
You need to know what the designers know, and you'll be changing from SEO specialists into data exploration optimizers or suchlike. The focus will be on making the data machine-readable and getting it linked in by the right services. Should be fun!</p>
<p>Further, you'll need to watch how traffic gets to your sites. As mentioned, search engine traffic will drop slowly over the next few months and years, with more of it coming via "links" from related pages. As for the Diggs &amp; Reddits, who knows how it'll affect traffic from them.</p>
<h2><strong>Summary</strong></h2>
<p>IMHO it's in all our best interests to just get on with this. It will happen, and the sooner YOU do it and convince your employers that you have to make this move, the better. Companies can easily lose clients, too, if the competition is offering "web 3" and they aren't.</p>
<p>I've never before seen a technology hit the web that could literally leave people behind if they don't jump on board; it's happened in other industries, and now it's happening in ours (remember VHS?).</p>
<h2><strong>The two questions most people, companies and clients will immediately raise</strong></h2>
<p><strong>1] We don't want to expose all our data for reasons X, Y &amp; Z!</strong></p>
<p>LOD isn't about exposing all of your data on the internet; it's about presenting the data you've already exposed on the internet in a more granular fashion, and making that data machine-readable.</p>
<p>Presently you may have an article on a page with a title and author credit in HTML. In the future you would still have the same author and title, but they would be wrapped in markup that lets a machine understand that "Joe Blogs" is a person who is the author of the article, and that the article's title is "I'm scared of exposing my data".</p>
<p>If you consider your public-facing web pages, everything on them is already exposed; all we're doing here is describing what each bit of data is in a way we can all use.</p>
<p><strong>2] Trust &amp; Junk</strong></p>
<p>One common misconception is that you have no control over the source of the data you pull from the "cloud", and that it could essentially be junk. This couldn't be further from the truth: what we do is find a data source we trust that exposes its data in a machine-readable format, query it for exactly the information we want, and finally include or display that in our own system.</p>
<p>To illustrate, suppose you wanted to reference the countries of the world, with their populations, in your system. Currently you would have to build a database table, populate it with each country's name and population, then write some code to display that data. In this scenario you'd probably get the population data from a credible source such as Wikipedia (copying and pasting it into your own database).</p>
<p>By using linked open data, you could treat the machine-readable version of Wikipedia (DBpedia) as your database table, query it instead, and again write some code to display the data on your own site.</p>
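<p>A minimal sketch of the query side, in Python with only the standard library. It assumes the public DBpedia SPARQL endpoint at <code>https://dbpedia.org/sparql</code> and the DBpedia ontology terms <code>dbo:Country</code> and <code>dbo:populationTotal</code>. The endpoint's availability, and exactly which resources are typed as countries (historical states can show up too), are outside our control, so check the results before relying on them. The request follows the SPARQL protocol: the query goes in a <code>query</code> parameter, and JSON results are requested via the <code>Accept</code> header.</p>

```python
import json
import urllib.parse
import urllib.request

# Assumed public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Every resource typed dbo:Country that has a dbo:populationTotal value.
COUNTRY_POPULATION_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?country ?population WHERE {
  ?country a dbo:Country ;
           dbo:populationTotal ?population .
}
"""


def build_request(query, endpoint=DBPEDIA_ENDPOINT):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def fetch_rows(query, endpoint=DBPEDIA_ENDPOINT, timeout=30):
    """Run the query against the endpoint (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

<p><code>fetch_rows(COUNTRY_POPULATION_QUERY)</code> needs network access and returns a list of dictionaries with <code>country</code> and <code>population</code> keys. Values come back as strings, so convert populations with <code>int()</code> before doing any arithmetic.</p>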
<p>You're displaying the same data from the same trusted source, and you've chosen which source you trust. It's not a case of just querying some cloud of data; it's a case of choosing which source(s) you want and trust, and querying them.</p>
<p>As an additional bonus, you don't need to worry about your information going out of date: because you're getting the data straight from the source, each country's population is updated on your site whenever it's updated at the source (for DBpedia, once the change has been extracted from Wikipedia).</p>
<p>Further, you don't need to worry about maintaining that list of countries, as a single query can pull out all countries with each one's population; as the world grows and changes, so does your data.</p>
<p>Further still! Once you've made the move to using some linked open data, all the data you could want is at your fingertips. Let's say a decision is made to include 30 different bits of information about each country in your system. Consider that task for a minute: a full system change, plus finding, collating and entering all that data, let alone maintaining it! Well, I'm sure you can guess the next bit: using LOD, we can simply expand our original query to include the other bits of information we want, then display it. Job done.</p>
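<p>To sketch that "expand the query" step: the function below (a hypothetical helper, not part of any library) builds a DBpedia query from a list of DBpedia ontology property names, wrapping each in <code>OPTIONAL</code> so a country that lacks one value isn't dropped from the results. The property names in the example call are assumptions; check them against the DBpedia ontology before relying on them.</p>

```python
def build_country_query(properties):
    """Build a SPARQL SELECT over dbo:Country, one OPTIONAL clause per property.

    `properties` are local names in the DBpedia ontology (dbo:) namespace,
    e.g. "populationTotal"; each one becomes a result variable of the same name.
    """
    for name in properties:
        # Guard against anything that isn't a simple name being pasted into the query.
        if not name.isidentifier():
            raise ValueError(f"not a simple property name: {name!r}")
    variables = " ".join("?" + name for name in properties)
    optionals = "\n".join(
        f"  OPTIONAL {{ ?country dbo:{name} ?{name} . }}" for name in properties
    )
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        f"SELECT ?country {variables} WHERE {{\n"
        "  ?country a dbo:Country .\n"
        f"{optionals}\n"
        "}"
    )


# Adding another column of country data is now a one-word change.
print(build_country_query(["populationTotal", "capital", "areaTotal", "currency"]))
```

<p>One caveat with this design: if a country has several values for more than one of the optional properties, the rows multiply (every combination is returned), so with many properties it can be cleaner to run a few smaller queries or to aggregate the results.</p>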
<p><strong>That's it.</strong></p>
<p>Good Luck!</p>
<p>nathan</p>
<p><img class="alignnone size-full wp-image-173" title="future" src="http://webr3.org/blog/wp-content/uploads/2009/10/future.jpg" alt="future" width="600" height="250" /></p>
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/featured/preparing-yourself-for-web-3-0-lod-and-2010/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>What to do when a continent wipes out your domain's DNS [seriously]</title>
		<link>http://webr3.org/blog/gotcha/what-to-do-when-a-continent-wipes-out-your-domains-dns-seriously/</link>
		<comments>http://webr3.org/blog/gotcha/what-to-do-when-a-continent-wipes-out-your-domains-dns-seriously/#comments</comments>
		<pubDate>Mon, 20 Jul 2009 10:55:58 +0000</pubDate>
		<dc:creator>nathan</dc:creator>
				<category><![CDATA[gotcha]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[Distributed denial of service attacks on root nameservers]]></category>
		<category><![CDATA[DNS]]></category>
		<category><![CDATA[dns tools]]></category>
		<category><![CDATA[Domain name system]]></category>
		<category><![CDATA[Europe]]></category>
		<category><![CDATA[Human Interest]]></category>
		<category><![CDATA[ISP]]></category>
		<category><![CDATA[Major]]></category>
		<category><![CDATA[Name server]]></category>
		<category><![CDATA[Root nameserver]]></category>
		<category><![CDATA[Technology/Internet]]></category>

		<guid isPermaLink="false">http://webr3.org/blog/?p=105</guid>
		<description><![CDATA[
This weekend "the internet" played its biggest gotcha on me ever; I'll keep it short, and it's worth a read just so you're a bit more gotcha-aware.
The Problem: Every DNS server at every major ISP on a continent (Europe) decided to throw back a "SERVFAIL" response for one of my domains. Thus [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-106" title="bollocks" src="http://webr3.org/blog/wp-content/uploads/2009/07/bollocks.jpg" alt="bollocks" width="600" height="250" /></p>
<p>This weekend "the internet" played its biggest gotcha on me <span style="text-decoration: line-through;">in 10 years</span> ever; I'll keep it short, and it's worth a read just so you're a bit more gotcha-aware.</p>
<p><strong>The Problem:</strong> Every DNS server at every major ISP on a continent (Europe) decided to throw back a "SERVFAIL" response for one of my domains. As a result, the domain was unknown throughout Europe!</p>
<p><strong>The Weirdness:</strong> All DNS records for the domain were correct (and hosted with a third party, GoDaddy), every online DNS tool showed no problem, and all your own servers showed the domain and its DNS as okay.</p>
<p><strong>The Bigger Problem:</strong> DNS is fine on the nameservers and fine all around the world, but on one continent it's stuffed. The problem isn't with your domain registrar, your ISP, your hosting company or your server; in other words, it is completely outwith your control. Who do you phone? When the "bit in the middle", the thing we call the internet, breaks, there is no one person or company that can fix it.</p>
<p><strong>The Problem Doesn't fix itself:</strong> 36 hours on, the problem is still there and it's not getting better.</p>
<p><strong>The (only?) Fix: </strong>After trying everything possible, here's the only thing that fixed it: <em>change the nameservers</em>. This forces the DNS caches around the internet to update, and within 24 hours the record for your domain is restored worldwide. Thank fk for that!</p>
<p><strong>Conclusion:</strong> This is honestly the strangest and most frustrating "bug" I've ever found. Nobody can help you when a few thousand servers all decide your domain is no good, so here's a friendly heads-up: you may never see this error, but if you do, you know what to do!</p>
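<p>If you ever need to confirm this kind of symptom yourself, the useful test is to ask specific resolvers (your ISP's, say) directly and look at the response code: SERVFAIL is RCODE 2 in the DNS header (RFC 1035). Here's a minimal Python sketch using only the standard library. It is an illustrative addition, not the method used in the investigation above; it sends one UDP query for an A record and reports the RCODE. A tool such as <code>dig @resolver example.com</code> shows the same information in its <code>status:</code> field.</p>

```python
import random
import socket
import struct

# DNS response codes from the low 4 bits of the header flags (RFC 1035, section 4.1.1).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(domain, query_id=None, qtype=1):
    """Build a standard recursive query (RD bit set) for `domain`; qtype 1 is an A record."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def response_rcode(response, query_id):
    """Return the RCODE name from a raw DNS response, checking it answers our query."""
    response_id, flags = struct.unpack("!HH", response[:4])
    if response_id != query_id:
        raise ValueError("response ID does not match the query")
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE {code}")


def check_resolver(domain, resolver_ip, timeout=3.0):
    """Ask one resolver about `domain` over UDP port 53 and report its RCODE."""
    query_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(domain, query_id), (resolver_ip, 53))
        response, _ = sock.recvfrom(512)
    return response_rcode(response, query_id)
```

<p>Call it as <code>check_resolver("example.com", "192.0.2.53")</code>, substituting your own domain and the resolver addresses you want to test (192.0.2.53 is a documentation placeholder). Run it against several resolvers and compare: "SERVFAIL" from some and "NOERROR" from others is exactly the split described in this post.</p>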
]]></content:encoded>
			<wfw:commentRss>http://webr3.org/blog/gotcha/what-to-do-when-a-continent-wipes-out-your-domains-dns-seriously/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

