I like David Peterson’s blog: yesterday I read an interesting article on how structured data (such as embedded meta or microformats) might take on Google.
While the usage of such data is pretty impressive and its rendering fits perfectly with our expectations of “data-gadgetry”, structured data are still very sparse.
My understanding of Yahoo’s Microsearch is not […]
Feb 24, 2008 | By: Stephane | 1 Comment
Big news for the Semantic Web! While it is in itself nothing too fancy, and falls short of revolutionizing the whole information scene, Reuters’ new acquisition (ClearForest) has released the OpenCalais API. It turns “supposedly” unstructured data into that infamous RDF format.
Feb 15, 2008 | By: Stephane | No Comments
When analyzing unstructured or dirty data, it is sometimes hard to pick out strings that are not actual words (garbage, tags, typos, …). In such cases, an automated method to differentiate actual words from garbage is of great help. While several approaches exist, I will demonstrate two methods available at no cost with Open Source Software:
Feb 15, 2008 | By: Stephane | No Comments
Here we go. This is my usual introductory speech when talking about semantics:
NO, the semantic web has NOTHING to do with semantics.
It is interesting to see how the term “semantic” has been abused here (search Google for semantic web: very few of the 5M+ results have anything to do with the meaning of information).
Feb 13, 2008 | By: Stephane | 1 Comment
A common problem when working with internet data is “What can I do about all these tags?” Cleaning HTML can be a daunting task. A simple workaround is to use a text-based web browser.
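As a rough illustration of the text-browser idea, here is a minimal Python sketch. It assumes the browser is `lynx` (the excerpt does not name one) and pipes HTML through `lynx -dump`. The function names `html_to_text` and `strip_tags_fallback` are hypothetical, and the fallback is a different, much cruder technique (a standard-library tag stripper) used only when `lynx` is not installed.

```python
import shutil
import subprocess
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join text nodes with spaces and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_tags_fallback(html):
    """Crude stdlib tag stripper (not a text browser; no layout awareness)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def html_to_text(html):
    """Render HTML to plain text with lynx if available, else fall back."""
    lynx = shutil.which("lynx")  # assumption: lynx is the text browser
    if lynx is None:
        return strip_tags_fallback(html)
    result = subprocess.run(
        [lynx, "-dump", "-nolist", "-stdin", "-force_html"],
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(html_to_text("<h1>Title</h1><p>Some <b>tagged</b> text.</p>"))
```

The appeal of the browser route is that it applies real rendering rules (block breaks, lists, tables), whereas the fallback only drops tags and may glue or split words at inline tag boundaries.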
Feb 12, 2008 | By: Stephane | No Comments