Information Classification

SALSA was not intended to be a classification system; many other tools exist that do a fair job. Yet as a side effect of its mental semantic representation, SALSA is capable of classifying unstructured data by itself. Not only is this a nice feature, but its computation is light, which makes SALSA a potential technology for massive document categorization.

Here is a preview of what the classification looks like within SALSA’s mental representation. I will try to set up a public web service for those interested.

SALSA’s classification

2 Responses to “Information Classification”

  1. 1
    Christopher Kotfila Says:

    Hey man,

    Can you contextualize this graph a little more for me? I understand what you’re driving at, but what is your corpus, what kind of algorithms are you using to produce the data, and is there any indication that what you’ve put up has inter-reliability with a human’s judgment of these classifications?

    word
    /Chris

  2. 2
    Stephane Says:

    Hey Chris,

    While I cannot answer direct questions with regard to the technology (look at the latest publication regarding VGEM and Dan’s work at the cogworks for an insight), the inter-reliability question is a very good one.

    There are two sides to this question:
    - Are the data correlated with human judgment?
    - What threshold of correlation with human data must an information system reach in order to be usable?

    I will dedicate a complete post to these. In the meantime, a public version of the service will be made available so that anyone can test it and get a feel for SALSA’s results.

    Cheers Chris and thank you for stopping by :-)

    _Stephane
