Thursday, October 15, 2009

I just had to get this out of me

It is Thursday evening. An evening after a day when nothing has worked. And even though I would have preferred writing this with a nice 10-year-old Scotch whisky in my company, I will not.

On the topic of a structure database

In an already too distant past a project for a molecular structure database was started. It got the name StructureDB, for lack of a better name, and StructureDB it has remained. From the beginning the idea was to have a system that was easy to get started with: no fancy installation and stuff, just fire it up and start playing around. So we settled for HSQLDB. However, we also wanted a server version, which we thought we were gonna use MySQL for. So far so good. Then we started designing a fancy model with auditing and annotations and stuff, because those things are a must-have for a big system. In order to do auditing we needed users, different users that is -- which were to log in to the system. So the model turned out to be something like this:

Fast forward to today. As I said, I spent the whole day struggling with things that didn't work. Now, struggling with things that don't work is business as usual. What was different today was that the things that didn't work were things that had been working fine up until today. At least that is what I thought. One of the things I was fighting with was creating a default 'local' user for each new database instance, and how to keep the auditing correct with regard to who created this user. I was trying to make it so that it created itself. This had been working fine earlier but seemed to be misbehaving in some cases, and while I was messing around, exploring possibilities for how to solve it, the whole thing literally came crashing down upon me. The last thing that happened before I went home was that it somehow used up 500MB of memory while loading a 5MB file into memory.

Anyway, I gave up and thought something like "it's clearly a bad day and I'd better sleep on this". However, during the bike ride home a voice in my head told me:
'-You are doing it wrong!'
'Why?' I asked the voice. What was I doing wrong?

Have you spotted it yet? Well, I will tell you now. StructureDB today is a one-user system running locally on one client. There is absolutely no need for users, and no need for auditing. There simply is no point in being able to go: "This is wrong, who did this?" to the system, because the answer is always gonna be: "You did it!". Furthermore, the fancy ChoiceAnnotation based on predefined values is probably not the way we want to work with molecules either. Normally we import the molecules from something like a huge SDF file and then we do searches on them. Maybe we calculate some properties for them and store them. But there is no reason not to simply use text fields or number fields for this. We don't need a predefined enumeration of valid property values.
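To make the "just text fields and number fields" idea a bit more concrete, here is a minimal sketch of what such a stripped-down model could look like. It is not the actual StructureDB model: the table names, column names and helper functions are all made up for illustration, and it uses Python's built-in sqlite3 as a stand-in for an embedded database like HSQLDB.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical tables: no users, no auditing, no predefined choices.
    conn.executescript("""
        CREATE TABLE structure (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            molfile TEXT NOT NULL
        );
        CREATE TABLE text_property (
            structure_id INTEGER NOT NULL REFERENCES structure(id),
            name         TEXT NOT NULL,
            value        TEXT NOT NULL
        );
        CREATE TABLE number_property (
            structure_id INTEGER NOT NULL REFERENCES structure(id),
            name         TEXT NOT NULL,
            value        REAL NOT NULL
        );
    """)


def add_structure(conn, name, molfile, properties):
    # Store one structure; numbers go to number_property, the rest to text_property.
    cur = conn.execute(
        "INSERT INTO structure (name, molfile) VALUES (?, ?)", (name, molfile)
    )
    structure_id = cur.lastrowid
    for key, value in properties.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        table = "number_property" if is_number else "text_property"
        conn.execute(
            f"INSERT INTO {table} (structure_id, name, value) VALUES (?, ?, ?)",
            (structure_id, key, value if is_number else str(value)),
        )
    return structure_id


def find_by_number(conn, prop, minimum):
    # Names of structures whose numeric property is at least `minimum`.
    rows = conn.execute(
        "SELECT s.name FROM structure s "
        "JOIN number_property p ON p.structure_id = s.id "
        "WHERE p.name = ? AND p.value >= ? ORDER BY s.name",
        (prop, minimum),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    add_structure(conn, "ethanol", "(molfile text)", {"source": "import.sdf", "mass": 46.07})
    add_structure(conn, "water", "(molfile text)", {"mass": 18.02})
    print(find_by_number(conn, "mass", 40))
```

The point of the sketch is only that importing, storing calculated properties and searching need nothing more than a structure table plus free-form text and number values.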

The only reason for keeping the user system was for that distant day when we were going to run this all on a MySQL installation and share data among many clients. But that day is looking more and more distant to me.

So I need to bring out the red pen and cross out some things from that diagram:

Now this should make a lot of things a lot easier. Whenever I find the time to get back to this, that is...

...and who knows, it wouldn't surprise me if it actually brings that MySQL day a bit closer too!

1 comment:

  1. Hmm, just got to think about this article which you might find interesting:

    Not in the short term, but seems to be interesting times now when many people try different strategies, so maybe soon databases will work quite differently.

    In this perspective I also found this proof of concept interesting (actually inspired by the above article):
    AFAIS it would go under the "graph database" category mentioned in the article.