06 October 2008

Filtermath




i think my complaint
about blogging's dark age
can be boiled down to a handful
of formulae and statistics

eg for flickr (simplest case)
c. 4000 new pix per minute
maybe 0.1% of which (4/min) i'd fave
if i had a way to find them
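
(spelling that arithmetic out
as a rough python sketch:
both inputs are the guesses above,
not measured figures,
and the function name is made up)

```python
# back-of-envelope numbers from the post; both are guesses, not measurements
NEW_PIX_PER_MINUTE = 4000
FAVE_RATE = 0.001  # 0.1% of new uploads i'd fave

def potential_faves(per_minute, rate):
    """Scale a per-minute upload rate and a hit rate up to hours and days."""
    per_min = per_minute * rate
    return {
        "per_minute": per_min,
        "per_hour": per_min * 60,
        "per_day": per_min * 60 * 24,
    }

rates = potential_faves(NEW_PIX_PER_MINUTE, FAVE_RATE)
print(rates)
```

at those guessed rates
that's 240 potential faves an hour
or 5760 a day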

the full database of a billion pix
waiting to be filtered
a checklist that should always remember
which you've already viewed

i subscribe to about 250 photographers' uploads
and 50-more's favorites
plus a single tag ('nude')
all via rss, which lets me quickly scan
mediumsized copies
of hundreds of pix per day

maybe 10% of which i fave
creating a veryhighquality stream
that should be subscribed to by thousands
(current subscriptions unknown/zero?)

but missing at least 1000 potential faves/day
because i can't find them

simple strategy:
ruby/python/perl script
builds mirror of personal faves
and for each of these faves
the list of all flickrers who've faved it

and then calculates which flickrers
have faves that overlap mine most
(without excessive burden of
boring/offensive faves)
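
(a minimal python sketch of the overlap step,
assuming the mirror is already built:
rank_favers, fave_counts and min_overlap
are hypothetical names,
the data is a toy example,
and the score (overlap times the share
of their faves that match mine)
is just one plausible way
to penalise the boring-fave burden)

```python
from collections import Counter

def rank_favers(my_faves, fave_counts, min_overlap=2):
    """Rank other flickrers by how well their faves overlap mine.

    my_faves:    {photo_id: set of users who also faved that photo}
    fave_counts: {user: total number of faves that user has made}
    Returns a list of (user, overlap, precision), best first.
    """
    overlap = Counter()
    for favers in my_faves.values():
        overlap.update(favers)  # one point per shared fave

    ranked = []
    for user, shared in overlap.items():
        total = fave_counts.get(user)
        if shared < min_overlap or not total:
            continue  # too little evidence, or no total known
        precision = shared / total  # share of their faves that i also faved
        ranked.append((user, shared, precision))

    # reward big overlaps, punish huge undiscriminating fave-sets
    ranked.sort(key=lambda r: (r[1] * r[2], r[1]), reverse=True)
    return ranked

# toy data: bob overlaps most but faves everything in sight
my_faves = {
    "p1": {"ann", "bob"},
    "p2": {"ann", "bob", "cat"},
    "p3": {"ann", "bob"},
    "p4": {"bob", "cat"},
}
fave_counts = {"ann": 4, "bob": 400, "cat": 10}

ranked = rank_favers(my_faves, fave_counts)
for user, shared, precision in ranked:
    print(user, shared, round(precision, 3))
```

in the toy data bob shares the most faves with me
but ranks last
because his 400 faves would bury them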

(flickr needs a stat
hinting how many pix each favoriter winnows
on average)

when you discover promising new fave-sets
you need to queue/ration
the process of sifting them
(the best ones might better
be added en masse
with the exceptions deleted afterwards)
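
(one way the queueing/rationing might look,
as a hedged python sketch:
SiftQueue, daily_budget and the scores
are all invented for illustration,
nothing flickr provides)

```python
import heapq

class SiftQueue:
    """Hold candidate pix and hand out only a fixed number per sitting."""

    def __init__(self, daily_budget):
        self.daily_budget = daily_budget
        self._heap = []       # entries are (-score, photo_id), so best pops first
        self._queued = set()  # every id ever added, so nothing is queued twice

    def add(self, photo_id, score):
        if photo_id in self._queued:
            return
        self._queued.add(photo_id)
        heapq.heappush(self._heap, (-score, photo_id))

    def todays_batch(self):
        """Pop up to daily_budget of the highest-scored candidates."""
        batch = []
        while self._heap and len(batch) < self.daily_budget:
            _, photo_id = heapq.heappop(self._heap)
            batch.append(photo_id)
        return batch

queue = SiftQueue(daily_budget=2)
queue.add("a", 0.1)
queue.add("b", 0.9)
queue.add("c", 0.5)
queue.add("b", 0.9)  # duplicate, ignored

first = queue.todays_batch()
second = queue.todays_batch()
print(first, second)
```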

flickr should recommend photographers and faves
based on your known preferences

should always offer a stream
of likeliest-to-fave's
it knows you haven't seen

algorithmically tweaking its choices
as you surf
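
(a toy python sketch of such a stream,
not anything flickr actually does:
FaveStream, the starting weights
and the multiplicative step
are my own assumptions.
each pic is scored by the summed weight
of the people who faved it,
and each yes/no nudges those weights)

```python
class FaveStream:
    """Serve the likeliest unseen pic and adjust faver weights on feedback."""

    def __init__(self, faver_weights):
        self.weights = dict(faver_weights)  # {user: how much i trust their faves}
        self.seen = set()                   # photo ids already shown

    def score(self, favers):
        return sum(self.weights.get(user, 0.0) for user in favers)

    def next_photo(self, candidates):
        """candidates: {photo_id: set of favers}. Best unseen id, or None."""
        unseen = {p: f for p, f in candidates.items() if p not in self.seen}
        if not unseen:
            return None
        return max(unseen, key=lambda p: self.score(unseen[p]))

    def feedback(self, photo_id, favers, faved, step=0.5):
        """Mark the pic as seen; scale its favers' weights up or down."""
        self.seen.add(photo_id)
        factor = 1 + step if faved else 1 - step
        for user in favers:
            self.weights[user] = self.weights.get(user, 0.1) * factor

candidates = {"p1": {"ann"}, "p2": {"bob"}, "p3": {"cat"}}
stream = FaveStream({"ann": 1.0, "bob": 0.8, "cat": 0.3})

shown = []
for liked in (False, True, True):
    pic = stream.next_photo(candidates)
    shown.append(pic)
    stream.feedback(pic, candidates[pic], faved=liked)

print(shown, stream.next_photo(candidates))
```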





