Sharing Knowledge, Sharing Data

For the purposes of this article:

  • Knowledge is the theoretical or practical understanding of a subject.
  • Information describes (and gives structure to) knowledge.
  • Data are the abstract facts that can be distilled and derived from information, and that can be stored on, and manipulated by, a computer.

Sharing knowledge is the most important thing we can do.

In order to share our knowledge effectively, GrubClub will adopt Semantic Web standards throughout. The best way to describe why this is a good idea is to give an example of the achievements and (current) limitations of another social project; and there’s no bigger or better example than Wikipedia.

Wikipedia’s two greatest assets are (1) its content, and (2) the users that manage and maintain it. The majority of the organisation on Wikipedia is done by a minority of its users. The information in Wikipedia is organised for humans to consume; the organisation is a manual process and herein lies its main limitation. There’s a lot of administration work that eats up time but doesn’t open up the information for new uses. For example:

If there is a page detailing the top ten foods consumed in your country last year, then building a similar table for this year would require a lot of the work to be repeated to create a similar page.  If we wanted to replicate the table for all countries then extra work would be necessary to generate the extra pages.

Whilst the diligent managers of Wikipedia’s pages work hard to organise the information, there is notably little “distilling” of it to create reusable data. If data (rather than pages of information) are available, they can be queried and combined with other data, so:

  1. Similar pages for any country could be automatically generated.
  2. Novel perspectives could be derived, offering alternative insight into the original knowledge. For example:

    What if countries are not the segregator you’re looking for and instead you’d like to see the information organised by distance from the equator (separated into 100-mile-wide hoops that surround the Earth)? If we have latitude data for countries then such a page can be automatically generated.

    What if you’d like to see the same data segregated by annual rainfall, mean population age, or by some other socio-economic criterion?

    Unless a Wikipedia editor (or you) spends the time to create every one of the pages you want to use, you’re going to be unlucky; and even if you or someone else does the work to make the pages, it’s time that could (and probably should) be better spent distilling the data and working out how to automatically generate the pages you’re interested in, so that the next person who wants to query something similar has less distilling to do.
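As a rough illustration of the point above, here is a minimal Python sketch in which one small data set produces both a per-country table and a per-latitude-hoop table. Everything in it is an assumption made for the example: the country names, the consumption figures, the latitudes, and the function names are all invented, and the conversion of roughly 69 miles per degree of latitude is an approximation.

```python
from collections import defaultdict

# Approximation: one degree of latitude is roughly 69 miles.
MILES_PER_DEGREE_LATITUDE = 69.0
HOOP_WIDTH_MILES = 100

# Invented records for illustration: (country, food, amount consumed).
CONSUMPTION = [
    ("Country A", "rice", 500),
    ("Country A", "bread", 300),
    ("Country B", "rice", 200),
    ("Country B", "potatoes", 400),
    ("Country C", "bread", 100),
]

# Invented latitudes in degrees (negative means south of the equator).
LATITUDE = {"Country A": 1.0, "Country B": 1.2, "Country C": -45.0}


def hoop_for(country):
    """Return the index of the 100-mile-wide hoop containing the country."""
    miles_from_equator = abs(LATITUDE[country]) * MILES_PER_DEGREE_LATITUDE
    return int(miles_from_equator // HOOP_WIDTH_MILES)


def top_foods(records, group_key, limit=10):
    """Total consumption per food within each group, largest first."""
    totals = defaultdict(lambda: defaultdict(int))
    for country, food, amount in records:
        totals[group_key(country)][food] += amount
    return {
        group: sorted(foods.items(), key=lambda item: (-item[1], item[0]))[:limit]
        for group, foods in totals.items()
    }


# The same data, organised two different ways, with no pages written by hand.
by_country = top_foods(CONSUMPTION, group_key=lambda country: country)
by_hoop = top_foods(CONSUMPTION, group_key=hoop_for)
```

Grouping by rainfall or mean population age would only need another lookup table and another `group_key` function; the consumption data itself would not change.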

There is always organisational overhead, but as time is spent organising data, much more time is saved by avoiding the generation and maintenance of pages that are useful only to humans. The crux of the matter is this: at the time knowledge is captured we are unlikely to know how it might be used, or how it may be combined in new and novel ways, so the key to effective knowledge sharing is the storage of data that is freely accessible to as many humans and as many machines as possible.

To this end we should strive to:

  1. Distill information into data that has semantic markup – small indicators that enrich and clarify what the data is and what it means.
  2. Avoid locking knowledge into inflexible structures and applications. Instead we should favour flexible, open and integrable standards that allow us to describe what we know, but do not limit us in the way we interpret that description. Structured data is good, and some data cannot exist without structure, but there’s never just one correct way of describing anything; if we are flexible with the data, we create opportunities to squeeze knowledge from it.
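To make the two aims above a little more concrete, here is a small Python sketch of data held as subject, predicate, object statements, which is the general shape Semantic Web data takes. It is plain Python rather than a real RDF library, and the `ex:` identifiers, the values, and the `match` function are all hypothetical names chosen for the example.

```python
# Each statement is (subject, predicate, object). The predicate acts as the
# "small indicator" that says what the value means. All names are invented.
TRIPLES = {
    ("ex:CountryA", "rdf:type", "ex:Country"),
    ("ex:CountryA", "ex:latitude", 1.0),
    ("ex:CountryA", "ex:annualRainfallMm", 2400),
    ("ex:CountryB", "rdf:type", "ex:Country"),
    ("ex:CountryB", "ex:latitude", 52.0),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the statements that fit the pattern; None matches anything."""
    found = [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]
    # Sort for stable output; objects may be mixed types, so compare as text.
    return sorted(found, key=lambda t: (t[0], t[1], str(t[2])))


# Two different questions asked of the same statements.
countries = [s for s, _, _ in match(TRIPLES, predicate="rdf:type", obj="ex:Country")]
about_b = match(TRIPLES, subject="ex:CountryB")
```

Nothing in the stored statements dictates which questions may be asked of them, and a new kind of fact (mean population age, say) can be added as further statements without restructuring what is already there.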

Sharing knowledge is the most important thing we can do, and the technologies that will allow us to be most flexible and open in sharing this knowledge are Semantic Web technologies.
