#501: Filterclicks

There are lots of programs available which do Bayesian analysis of text and thus claim, bogusly, to have extracted the semantics of the content. These are nonetheless often applied successfully as automatic spam blockers or email classifiers, amongst other things.

(Say I get lots of emails. Say 200 contain the word ‘girlfriend’. Of those messages containing ‘girlfriend’, I decide that 190 are spam. The estimated chance that an incoming email containing the word ‘girlfriend’ is spam is then 190/200, or 95%. I could use this to throw away 190 out of every 200 messages received containing the ‘g word’, although in practice it’s more useful, with spam, to ditch everything scoring above, say, 80%. Over time, this estimate will change if I can be bothered to label as spam anything that slips through the filter. It may be less useful if, for example, I join an online dating agency…perish the thought.)
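The arithmetic in the ‘girlfriend’ example can be sketched in a few lines. This is a minimal illustration, assuming a simple per-word count; the class and function names are hypothetical, and it computes only the single-word conditional probability, not a full Bayesian spam filter that combines many words.

```python
class WordSpamEstimate:
    """Hypothetical running estimate of P(spam | message contains a word)."""

    def __init__(self, spam: int = 0, total: int = 0) -> None:
        self.spam = spam    # messages containing the word that were labelled spam
        self.total = total  # all messages seen that contain the word

    def probability(self) -> float:
        # Fraction of messages with this word that were spam; 0.0 if none seen yet.
        return self.spam / self.total if self.total else 0.0

    def label(self, is_spam: bool) -> None:
        # User feedback on one more message containing the word updates the estimate.
        self.total += 1
        if is_spam:
            self.spam += 1


def should_discard(estimate: WordSpamEstimate, threshold: float = 0.8) -> bool:
    # Ditch messages whose estimated spam probability exceeds the assumed 80% cut-off.
    return estimate.probability() > threshold


girlfriend = WordSpamEstimate(spam=190, total=200)
print(girlfriend.probability())    # 0.95
print(should_discard(girlfriend))  # True
```

Calling label(True) or label(False) on further messages nudges the estimate up or down, which is the ‘if I can be bothered’ feedback loop described above.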

Today’s invention is to apply this logic to what some people regard as the uncontrollable growth in the number of messages in their inboxes. Email software already offers all sorts of rule-of-thumb filters, but under this regime the user, on opening a message, can click on a selection of words within it (one click = ‘this word makes the message more important’, two clicks = ‘this word makes the message less important’, or the like). After a period of labelling individual words like this, the email client can extract the various probabilities and automatically order one’s messages by importance. It might then insist that the most important be dealt with first.
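The click-labelling scheme could be sketched as follows. This is a hypothetical illustration, not a description of any real email client: it assumes clicks are recorded per word, smooths each word’s counts with add-one (Laplace) smoothing, and combines the words in a message naive-Bayes style by summing their log-odds. That combination rule is an assumption here; the idea as stated does not say how the per-word probabilities should be merged.

```python
import math
import re
from collections import defaultdict


class ImportanceModel:
    """Hypothetical word-level importance model trained by the user's clicks."""

    def __init__(self) -> None:
        # word -> [times marked important, times marked unimportant]
        self.counts = defaultdict(lambda: [0, 0])

    def record_clicks(self, word: str, clicks: int) -> None:
        # Assumed convention: one click = more important, two clicks = less important.
        if clicks == 1:
            self.counts[word.lower()][0] += 1
        elif clicks == 2:
            self.counts[word.lower()][1] += 1
        else:
            raise ValueError("expected 1 or 2 clicks")

    def word_probability(self, word: str) -> float:
        # Add-one smoothed P(important | word); words never clicked come out at 0.5.
        important, unimportant = self.counts.get(word.lower(), (0, 0))
        return (important + 1) / (important + unimportant + 2)

    def score(self, message: str) -> float:
        # Sum of per-word log-odds: positive leans important, negative unimportant.
        log_odds = 0.0
        for word in set(re.findall(r"\w+", message.lower())):
            p = self.word_probability(word)
            log_odds += math.log(p / (1 - p))
        return log_odds

    def order_inbox(self, messages: list[str]) -> list[str]:
        # Most important first, as the client might insist.
        return sorted(messages, key=self.score, reverse=True)


model = ImportanceModel()
model.record_clicks("deadline", 1)
model.record_clicks("deadline", 1)
model.record_clicks("newsletter", 2)
inbox = ["Weekly newsletter", "Project deadline moved", "Lunch on Friday?"]
print(model.order_inbox(inbox))
```

Here the message mentioning the twice-promoted ‘deadline’ sorts to the top and the demoted ‘newsletter’ to the bottom, while a message made only of unclicked words scores as neutral. Summing log-odds rather than converting back to a probability keeps the ordering identical and avoids overflow on long messages.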
