:: scentric
audio player. music manager. light weight. sexy.



The plan - executive summary

This project was conceived because managing playlists is a very tedious business, and because discovering new artists who make music that I like is also more work than I can be bothered with.

The project has several goals: firstly, program a song-centric playlist manager that allows people to sort songs into user-created categories. Categories can be hierarchical, so that a parent automatically inherits songs in child-categories. Each category constitutes a playlist.
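To make the inheritance idea concrete, here is a minimal sketch of a category tree in Python. The `Category` class, its fields and the song identifiers are all made up for illustration; nothing here is the actual scentric design.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A node in the user's category tree (hypothetical structure)."""
    name: str
    songs: set = field(default_factory=set)        # songs assigned directly
    children: list = field(default_factory=list)   # child Category objects

    def add_child(self, name: str) -> "Category":
        """Create a sub-category and return it."""
        child = Category(name)
        self.children.append(child)
        return child

    def playlist(self) -> set:
        """All songs in this category, including those inherited from children."""
        result = set(self.songs)
        for child in self.children:
            result |= child.playlist()
        return result


jazz = Category("Jazz")
acid_jazz = jazz.add_child("Acid Jazz")
jazz.songs.add("song-a")        # assigned to the parent only
acid_jazz.songs.add("song-b")   # assigned to the child, inherited by the parent

print(sorted(jazz.playlist()))       # ['song-a', 'song-b']
print(sorted(acid_jazz.playlist()))  # ['song-b']
```

The point of computing the playlist on demand is that a song only ever needs to be filed once, in its most specific category.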

Secondly, I want to be able to click on a song title or artist and find other song titles or artists that I don't have yet and that are similar to the selected title or artist.

The first stage is a stand-alone local application; the second stage would involve aggregating song meta-information in a decentralised peer-to-peer network (note: I am not talking about being able to download any songs, just finding titles and artists!). The grade of similarity of any two songs depends on the number of other songs that are in the same category (we would take the category that has both songs in it and, if there are several of those, the one with the lowest number of other songs). This grade of similarity would then need to be aggregated across the network.
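As a rough illustration of that rule, the sketch below picks the smallest category containing both songs and counts the other songs in it. The plan only says the grade "depends on" that count, so the `1 / (1 + n)` mapping to a number between 0 and 1 is purely my assumption, as are the function names and the dict-of-sets layout.

```python
from typing import Optional


def others_in_smallest_shared_category(categories, song_a, song_b) -> Optional[int]:
    """Count the *other* songs in the smallest category holding both songs.

    `categories` maps a category name to the set of songs in it.
    Returns None if no category contains both songs.
    """
    shared_sizes = [
        len(songs) - 2
        for songs in categories.values()
        if song_a in songs and song_b in songs
    ]
    return min(shared_sizes) if shared_sizes else None


def similarity_grade(categories, song_a, song_b) -> float:
    """Map the count to 0..1; the 1 / (1 + n) formula is an assumption."""
    others = others_in_smallest_shared_category(categories, song_a, song_b)
    return 0.0 if others is None else 1.0 / (1 + others)


categories = {
    "Jazz": {"s1", "s2", "s3", "s4"},
    "Acid Jazz": {"s1", "s2"},
    "Funk": {"s4", "s5"},
}
print(similarity_grade(categories, "s1", "s2"))  # 1.0 (alone together in 'Acid Jazz')
print(similarity_grade(categories, "s1", "s5"))  # 0.0 (no shared category)
```

Two songs that share only the big 'Jazz' category (say `s1` and `s3`) get a lower grade, because two other songs sit in there with them.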


What's the problem with playlists?

Nothing. The problem is mine. It's just that I don't think in terms of playlists, I think in terms of songs.

Let me explain: Currently most music players make you think in terms of playlists. You start a playlist, let's call it the 'Acid Jazz' playlist. Now you've got an empty playlist. Now you need to think about your music collection and think about which songs in your collection you would like to have on this playlist. So you go to the 'add songs' dialog and add songs. Then you make another playlist and call it 'Funk'. Again, you go to the 'add songs' dialog and add some songs. Now you create a playlist 'Jazz' and again add some songs. So far so good. Now let's see what the problem is:

Problem 1: You get a couple of new Acid Jazz songs. Obviously, they go into the 'Acid Jazz' playlist and into the 'Jazz' playlist and possibly also into the 'Funk' playlist. So what do you do? You open the 'Acid Jazz' playlist and add your new songs. You open the 'Jazz' playlist and add your new songs. You open the 'Funk' playlist and add new songs.

Now this plainly sucks, because you know perfectly well that most of your 'Acid Jazz' songs also go into 'Jazz' and/or 'Funk'. Why do you need to open three playlists and add those songs, if simply categorising them as 'Acid Jazz' could be enough, given the knowledge that the other two are supra-categories? Equally, if you know that your girlfriend loves Acid Jazz and Funk, why do you need to manually add all those new songs to 'her' playlist as well after getting them?

(note: this is just an example, not an attempt at a genealogy of music genres)

Problem 2: Those songs were in your audio-downloads folder, where you temporarily store songs to find out whether you want to keep them or not. The folder tends to fill up and you move the songs that you definitely want to keep to your shiny new 50 Terabyte hard drive. Now that the location of the files has changed, you will need to put them in your playlists again.

Problem 3: ... there are more, but I'll spare you ;) (I could go on about this forever)


So what's the plan then?

The overall goal is to ...

  • be able to manage my song collection in a way that doesn't make me hate computers, and
  • establish a peer-to-peer system that indexes the aggregate 'similarity' of songs based on the categorisations of thousands of people. I want to click on a song that I like and the system should spit out a list of 100 songs that are probably similar according to the categorisations of other people (songs I don't have yet of course).

There. Now you know.

Note that this system is based on the underlying categorisation data and not on listening habits (like audioscrobbler, for example).

The goals to be achieved are divided into different 'stages', after each of which the program will be useful in itself, regardless of whether subsequent stages are ever implemented. The point of this is to increase the chances of producing something useful at some point. If the goals are set too high, chances are that the project will never see the light of day, which would be a shame in my admittedly somewhat biased opinion. So, first things first:


Stage 1

Write a song-centric audio collection manager. Frankly, I have so many songs in my collection that I don't even know all of them by name or artist any more (yes, I have a pathetic memory). But if I play a song, I know how I would personally classify it, and I can do it right away. Here's the idea in bullet points:

  • Let the user specify all directories and directory trees that contain audio files
  • Hash those files (i.e. read them and produce a unique 160-bit number that identifies a file), so identical files can be recognised even after they have been moved to a different location in the filesystem, or after they have been removed temporarily and restored (e.g. from CD).
  • When hashing the audio files, skip the various descriptive headers that might change, like artist and title etc. (id3 tags et al.)
  • Users can define a tree of categories and subcategories.
  • Users can assign songs to a category. They will automatically be part of all parent categories.
  • Users can define 'virtual' playlists via rules (e.g. 'Happy Music' = 'Funk + HipHop + Jazz - AcidJazz')
  • Each 'category' represents a playlist.
  • Make sure the user doesn't need to know Scheme to be able to use or understand the above.
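The hashing bullets above could look roughly like the following sketch. It assumes SHA-1 (simply because it produces 160 bits) and only handles the two most common tag layouts, a leading ID3v2 tag and a trailing ID3v1 tag; other formats would need their own handling. The function names are hypothetical.

```python
import hashlib


def strip_id3(data: bytes) -> bytes:
    """Return the audio payload without a leading ID3v2 or trailing ID3v1 tag."""
    start, end = 0, len(data)
    # ID3v2 header: "ID3", version (2 bytes), flags (1 byte), synchsafe size (4 bytes).
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)   # 7 useful bits per byte
        start = 10 + size
        if data[5] & 0x10:                       # footer flag: 10 more bytes
            start += 10
    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]


def song_hash(data: bytes) -> str:
    """160-bit SHA-1 digest (as hex) of the audio payload only."""
    return hashlib.sha1(strip_id3(data)).hexdigest()


# Fake "audio" bytes, once bare and once wrapped in tags.
audio = b"\xff\xfb" + bytes(200)
tag_v2 = b"ID3\x03\x00\x00" + bytes([0, 0, 0, 5]) + b"TITLE"
tag_v1 = b"TAG" + bytes(125)

print(song_hash(audio) == song_hash(tag_v2 + audio + tag_v1))  # True
```

For a real file one would call something like `song_hash(pathlib.Path(p).read_bytes())`; reading the whole file into memory is fine for a sketch but a streaming version would be kinder to large collections.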


Stage 2

Develop nifty algorithms to estimate the similarity of songs, based on categories and meta tag information (and possibly audio analysis, but let's not get carried away), and more nifty algorithms that index songs redundantly based on not 100% reliable meta tag information (e.g. wrong spelling, too much information, etc.) so that songs that are probably the same songs can be identified even though they are different versions of the file. Ideas in bullet points:

  • Songs in the same category are probably very similar.
  • If there are only very few songs in the same category, those songs are probably more similar than if there are hundreds of songs in that same category.
  • The more parent categories (leaves to tree root) there are, and the more sibling categories (leaves of immediate parent) there are, the more similar the songs in a given category are.
  • Songs with similar meta tag information are probably similar (e.g. same artist, same song). Use junk filters and phonetic transcription algorithms.
  • Songs that are not in the same category, but whose category name is identical, are probably similar (e.g. /music/80s/rock and /music/70s/rock) to some degree (this condition needs more thought - what about /music/rock/50s and /music/blues/50s).
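For the meta tag bullet above, here is a small sketch of what a 'junk filter' plus fuzzy comparison might look like. Note that it does not do phonetic transcription: as a stand-in it uses plain string similarity from Python's `difflib`, and the list of junk patterns is just a guess at what tends to pollute tags and file names.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical junk: bracketed remarks, leading track numbers, file extensions.
_JUNK = re.compile(r"\(.*?\)|\[.*?\]|^\d+\s*[-.]\s*|\.(mp3|ogg|flac)$", re.IGNORECASE)


def normalise(text: str) -> str:
    """Lower-case, strip accents and junk, and collapse punctuation to spaces."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = _JUNK.sub(" ", text.lower())
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def tag_similarity(a: str, b: str) -> float:
    """Similarity in 0..1 between two tag strings after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


print(normalise("01 - Björk - Jóga (live).mp3"))   # bjork joga
print(tag_similarity("Jamiroquai", "Jamiroqai") > 0.9)  # True: a typo stays close
```

Songs whose normalised artist and title score above some threshold could then be treated as 'probably the same song'; where to put that threshold is something only real data can answer.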


Stage 3

Develop nifty ways of aggregating and indexing this kind of 'similarity data' from multiple persons in a meaningful and efficient way and make it searchable (e.g. given one song with certain meta data, what could be similar songs according to the database).
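A toy version of such an index, just to show the shape of the problem: every person contributes their categories, every shared category adds a bit of weight to each pair of songs in it, and a query returns the heaviest neighbours. The `1 / (size - 1)` weighting (small categories count for more) and all names are assumptions, and generating every pair is quadratic in category size, so this would not scale as written.

```python
from collections import defaultdict
from itertools import combinations


def aggregate(collections):
    """Sum per-pair weights over many people's categories.

    `collections` holds one {category name: set of song keys} dict per person.
    """
    index = defaultdict(lambda: defaultdict(float))
    for categories in collections:
        for songs in categories.values():
            if len(songs) < 2:
                continue
            weight = 1.0 / (len(songs) - 1)   # assumed weighting
            for a, b in combinations(sorted(songs), 2):
                index[a][b] += weight
                index[b][a] += weight
    return index


def similar_to(index, song, owned=frozenset(), limit=100):
    """Up to `limit` songs most similar to `song`, skipping ones already owned."""
    ranked = sorted(
        ((other, score) for other, score in index.get(song, {}).items()
         if other not in owned),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return [other for other, _ in ranked[:limit]]


alice = {"Acid Jazz": {"s1", "s2"}, "Jazz": {"s1", "s2", "s3"}}
bob = {"chill": {"s1", "s3", "s6"}, "work": {"s2", "s4", "s5"}}
index = aggregate([alice, bob])

print(similar_to(index, "s1"))                # ['s2', 's3', 's6']
print(similar_to(index, "s1", owned={"s2"}))  # ['s3', 's6']
```

In practice the song keys would have to be something that survives different rips of the same song, e.g. normalised meta data rather than file hashes.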


Stage 4

You guessed it - put in some networking capabilities and make the whole thing into a decentralised peer-to-peer network where the 'similarity data' is indexed and can be queried. I want to click on a song that I like and the system should spit out a list of 100 songs that are probably similar according to the categorisations of other people (songs I don't have yet of course).


Stage 5

World domination, I guess.


Questions and Answers

How is this new? Why don't you just query http://www.gnoosic.com or similar engines?

Firstly, I want to manage my music collection into playlists etc. Gnoosic and friends can't do that for me. Secondly, I think that the problem with Gnoosic and friends is that they tend to make connections between 'well-known' bands rather than find the rare jewels that no one knows about (or no one bothers to type in).

Cool. So is this going to be Napster 2?

No. This project has nothing to do with file-sharing. It's all about peer-to-peer though. Think hyperlinks. Think hyperlinks created by aggregating individual data on the relationship of songs.

But most people's taste in music sucks, and they will only have two categories - 'Rock' and 'Pop'!?

It won't matter. Obviously, songs in the same category are more similar if 100 songs are spread over 20 different categories than if they are spread over only 2. But it won't matter, as long as there are at least a few people who take care of their music collection and categorise songs in a sophisticated way. Also, are people with only a couple of songs likely to use this program in any case? Unlikely. Playlists will probably do for them.

People will not know how to sort songs properly! I mean, not everyone can tell the difference between downtempo, electronic dub, ambient, chilled beat, abstract hip hop, ... !?

It won't matter. See above. The system will stand if you aggregate the data, as long as there are a few freaks like you who actually know the difference (I certainly don't). The plebs will go unnoticed ;) Apart from that, it simply doesn't matter how people categorise their music. Obviously, music can be categorised by genre, but it can also be categorised by anything else (e.g. /music/reading /music/sad /music/having_sex /music/work /music/cooking /music/christmas /music/chill). People will most likely come up with categories that make sense in some way or another, and make some songs more similar to others in some respects than other songs. Have a little faith in humanity here ;)

What makes you think this will actually work?

The same reason why Napster worked: people would download music for themselves, and then it would automatically be shared out and made available for others to download. So the overall added value to the whole network is created by purely egoistic behaviour.
The same will be true for this project (scentric). People will use it primarily as a tool to organise their own music collection. The 'song similarity data' that, once aggregated, will provide value to the community is generated as a side-effect.

I think your program is deeply postmodern I might say.

erm, yes, sure, Axel.


scentric is powered by gstreamer

web design based on mark_olson's industrofunk design.