10/09/2006

regarding web two point zero

i've been looking at new additions to google and amazon that seem to be pushing the web 2.0 model of user supplied and managed content. firstly, there's google base, a new database of user supplied and annotated content that is indexed, searched and published by google. if you have a google services account, you can easily add items, either one at a time or in bulk by submitting them as XML. there are a bunch of pre-defined item types or categories, such as Blogs, Jobs, Podcasts, Reviews, Recipes, Products or Reference articles, each with its own set of default attributes/meta-data.

you can also post items in your own categories, and add arbitrary new attributes. attributes are just name/value pairs, where the value is either a plain text string or one of several pre-defined types like numbers, dates (or date ranges), URLs or locations (for google maps). these are displayed at the top of an item's display page. additionally you can add up to ten labels, which are similar to tags or keywords. these labels are used to group items and for browsing, much like categories, except that an item can carry several labels but belong to only one category.
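as a rough sketch of that data model, here is how an item might look in python. the class, field names and placeholder url are my own invention for illustration, not google base's actual schema or api; only the "one category, name/value attributes, at most ten labels" rules come from the description above.

```python
from dataclasses import dataclass, field

MAX_LABELS = 10  # the limit of ten labels described above


@dataclass
class Item:
    """hypothetical model of a base item; not google's actual schema."""
    title: str
    category: str  # exactly one category per item
    attributes: dict = field(default_factory=dict)  # name -> value pairs
    labels: set = field(default_factory=set)  # an item may carry many labels

    def add_label(self, label: str) -> None:
        # re-adding an existing label is harmless; only new labels hit the cap
        if label not in self.labels and len(self.labels) >= MAX_LABELS:
            raise ValueError("an item can have at most ten labels")
        self.labels.add(label)


review = Item(title="mind performance hacks review", category="Reviews")
review.attributes["url"] = "http://example.com/review"  # placeholder url
review.add_label("books")
review.add_label("psychology")
```

the point of the sketch is just the asymmetry: `category` is a single value, while `labels` is a set that can grow until it hits the cap.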

although i like the idea of submitting your own content to be hosted by google, with tags and semantic info for indexing, it appears that most of the information in the base is auto submitted from other sites, as a link to the item page plus some meta-data. unfortunately, items like books, cds, dvds and other physical objects are sold by many online retailers, so there are many copies of the meta-data for the same item, sometimes conflicting, and no way of determining the definitive identity of an item. this is a shame, because a database like this would be a good basis for some of the semantic web projects.

i'm not sure how google will rank the information though, since people can obviously submit anything - the wikipedia problem, basically, which wikipedia admittedly seems to have solved. also, there aren't really any links to or from the google hosted content (yet), which makes it hard to calculate a pagerank equivalent. interestingly, you can see recent searches on the base front page, which can be odd! but they could use some of the search data to determine which items people looked at most, and have this as part of the ranking data.
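to make the last idea concrete, here is a minimal sketch of ranking items by how often searchers looked at them. the view log and item ids are made up, and i have no idea whether google does anything like this; it's just the simplest version of "most looked at comes first".

```python
from collections import Counter

# hypothetical log of which item each searcher clicked through to
views = ["item-a", "item-b", "item-a", "item-c", "item-a", "item-b"]


def rank_by_views(view_log):
    """order item ids from most viewed to least viewed."""
    return [item for item, _ in Counter(view_log).most_common()]


ranking = rank_by_views(views)
```

a real ranking would presumably mix this with other signals, since raw view counts are easy to inflate.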

there are also vocabularies to describe links and relationships. for instance, functional requirements for bibliographic records (FRBR) is a vocabulary that describes the relationships between works, such as parodyOf, excerptFrom, originalWork, reviewOf and so on. sites like IMDb provide a unique namespace for referencing movies, each of which can be entered into base with the relevant meta-data. then any reviews, parodies or whatever can be easily linked to the unique identity of the original work.
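one simple way to picture this is as a list of (subject, relationship, object) triples. in the sketch below the relationship names are the ones mentioned above, but the identifiers are made-up placeholders rather than real IMDb ids or base item urls, and the helper function is hypothetical.

```python
from collections import defaultdict

# (subject, relationship, object) triples; identifiers are placeholders
links = [
    ("review:1", "reviewOf", "movie:original"),
    ("movie:spoof", "parodyOf", "movie:original"),
    ("clip:7", "excerptFrom", "movie:original"),
]


def related_to(work, triples):
    """group everything that points at `work` by relationship name."""
    found = defaultdict(list)
    for subject, relation, target in triples:
        if target == work:
            found[relation].append(subject)
    return dict(found)


related = related_to("movie:original", links)
```

because every triple points at the same identifier for the original work, finding all of its reviews and parodies is a single pass over the list.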

i have submitted a copy of my mind performance hacks review as one of my items, to see how the data entry works, as well as data for my weblog. as mentioned previously, there aren't many google hosted items at the moment, although the people profiles category is google hosted, and has some special search settings. this part works like a personal ad database, really, although it could eventually evolve into a directory for identity information, like a white pages.

the second user generated content system is on amazon, namely their addition of wiki pages to all book information, called ProductWiki (product information from our customers). this allows any customer to contribute relevant information as freeform text and links, not necessarily in the form of a product review: for instance, links to source code download sites for technical books, or to online discussion forums about the characters for fiction. at the moment, uptake seems slow for this feature, but since the wikis allow easy cross-referencing between books, this could grow into a hypertext literary database. i have edited and created content on the wikipedia encyclopaedia site, as well as on friends' private wikis, and have used wikis at work for recording information that changes often, like network configurations. i really like the concept. hopefully user contributions will eventually make amazon's wiki a useful resource.

