
The State of Solid State Devices for Databases #

Recently I read a good article on AnandTech about Solid State Devices (SSDs). It certainly blew away many misconceptions I had about SSDs.

From a professional point of view, my main interest would be how databases are affected by the following characteristics of SSD:

  • Both sequential and random reads are fast. Reads and writes operate at a granularity of 4K pages; in other words, to modify 1 byte, you still have to write 4K.
  • Writes require the block to be erased first. A block is typically 512K. That means if there are no erased (also called trimmed) blocks available, you need to erase one first, and erasing is a s-l-o-w operation.
  • You can only erase a block 10,000 times before it stops working.
  • Good quality SSD controllers with onboard caches and highly parallel architectures make a big difference.

From this summary, it appears that SSDs are ideal for relatively static data, and we can selectively put certain parts of the database on SSD. Examples include:

  1. The typical publishing web-site, where articles and messages are rarely edited more than a few times.
  2. Systems with large amounts of static data, e.g. multi-player online games such as the EVE Online case study.

For transaction processing systems it depends on the usage. Assume a table receives 10,000 updates a day, the updates are distributed evenly across all records, and the table fits into 100 512K blocks. Then each block is erased about 100 times a day, so at 10,000 erase cycles per block, the SSD lifetime for those blocks would be about 100 days (this might be acceptable if SSDs were sufficiently cheap). And even for data warehouse applications with relatively static data, B-tree rebalancing will cause frequent rewrites of index blocks.
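
The arithmetic above can be sketched as a simple worst-case wear estimate (the function name is mine; this assumes perfectly even wear, whereas real controllers use wear-levelling to spread erases across the whole drive, so actual lifetimes should be longer):

```php
// Back-of-envelope SSD wear estimate for the scenario above.
// Assumptions (from the text): each block survives 10,000 erase
// cycles, and updates are spread evenly across the table's blocks.
function ssd_block_lifetime_days($updatesPerDay, $blocks, $eraseCycles = 10000)
{
    $erasesPerBlockPerDay = $updatesPerDay / $blocks; // 10000/100 = 100
    return $eraseCycles / $erasesPerBlockPerDay;      // 10000/100 = 100 days
}

echo ssd_block_lifetime_days(10000, 100), " days\n"; // prints "100 days"
```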

It also appears that operations characterized by sequential writes, such as transaction logging, should continue to be placed on hard disks.

For some information on potential database performance, see these articles testing SSDs with MySQL, DB2 and PostgreSQL. Also see Windows 7 Support and Q&A for Solid-State Drives.

Memories are made of Squid... #

Apache is not a particularly fast web server. A single Apache server doesn't handle a mix of static and dynamic data particularly well. Ideally, static data such as gifs, pngs and html pages should be cached in memory for quick access; Apache doesn't do this. And the prefork design of Apache (where a simple, reliable parent process manages multiple worker processes that do the real work) makes it exceptionally robust, but the overhead of these parent and child processes slows things down.

Squid is a web proxy accelerator. What it does is make Apache look good -- downloads magically become faster because of the caching of static files, and the overhead of connection setup and keep-alives is offloaded from Apache. When a request for a .php page is made, Squid will pass the request to Apache and return the results.

Since 2004, we have had a customer running an Intranet system with Apache 1.3, PHP 4.3, Oracle and Squid. This system runs on a 16-CPU Sun E20K server and has over 3000 users logged in every day. A few months ago, we upgraded them to PHP 5.2. While planning for the upgrade, I did some benchmarks and found that PHP 5.2 was about 30-40% faster than PHP 4.3. So I confidently recommended that we disable Squid...

On the morning of the rollout, I watched in horror as CPU utilisation surged from 50%, to 60%, to 70%, until it hit 98% -- and it stayed there! Only then did I remember that in my testing back in 2004, Squid had improved performance significantly more than an upgrade to PHP 5 ever would. At eleven, we started Squid and modified Apache to listen on port 3000 again. CPU utilisation dropped from 98% to 40%. Squid had saved the day again.

The moral of the story: never underestimate the power of a good squid.


The Server setup is as below, all software running on a single E20K Solaris Server:


  Squid listens on port 80 ---- Apache on port 3000 (http) ---- Apache Children  ---- Oracle 10g
                                 and port 443 (https)            running PHP 5.2

When web browsers use https for login authentication, they connect directly to Apache. When users are accessing normal data on port 80, Squid forwards the request to Apache, which is listening on port 3000. Squid is set up to cache .png, .gif, .jpg, .js, .htm, .html and .css files.
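
A minimal squid.conf sketch of such an accelerator setup might look like the following (this uses Squid 2.6+ accelerator syntax; the hostname and the exact directives are illustrative, not the customer's actual configuration):

```
# Squid answers on port 80 and forwards cache misses to Apache on port 3000
http_port 80 accel defaultsite=intranet.example.com
cache_peer 127.0.0.1 parent 3000 0 no-query originserver name=apache

# Cache only static files; pass everything else (e.g. .php) through to Apache
acl static urlpath_regex -i \.(png|gif|jpg|js|html?|css)$
cache allow static
cache deny all
```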

ADOdb Active Record and the art of redesign #

Merry Christmas and Happy New Year everyone. Looking forward to the new year as I expect to be a father in January :)

Let's now talk a little bit about parenting and the past: since 2006, ADOdb has supported Active Record, the object-oriented paradigm for processing records using SQL. One of the most powerful features of Active Record is the ability to define parent-child relationships. The old way was:

$person = new person();
$person->HasMany('children', 'person_id');
$person->Load("id=1");

Where "persons" is the parent table, "children" is the child table and "children.person_id" is a field in "children" pointing to "persons". All the children of person with id=1 would be dynamically loaded into the array $person->children when the property was accessed (lazy loading).

This was confusing for the programmer and had many limitations, as Arialdo Martini pointed out in this post.

Firstly, it was confusing to the programmer. Should HasMany() be called every time you create a new person()? The answer was no; the relationship is global, but the implementation made it look like it was local to the instance. HasMany really should be defined statically, before new person() is ever used.

Another problem was that you could not override the class of the child objects, so you couldn't easily modify the behaviour of the children.

My objective was to fix all this, while keeping backward compatibility so your old code continued to work. The good news is that all the metadata used to keep track of the object-table relationships could still be reused. The problem was a weak API; the internals were sound. The solution implemented in ADOdb 5.07 was to create a new set of static functions that override the default behaviour:

ClassHasMany

The new way defines the relationship in a static function, which makes it clearer that it only needs to be called once in your init php code:

class person extends ADOdb_Active_Record{}
ADODB_Active_Record::ClassHasMany('person', 'children','person_id');
$person = new person();
$person->Load("id=1");

TableHasMany

One of the things that I try to do in ADOdb is maintain backward compatibility. You are able to override the defaults of Active Record (id is the primary key; the name of the table is the plural version of the class name). So if the table name of the parent is not "persons" but "people", you can use:

ADODB_Active_Record::TableHasMany('people', 'children','person_id');
$person = new person();
$person->Load("id=1");

TableKeyHasMany

The default primary key name is "id". You can override it (say "pid" is used) using:

ADODB_Active_Record::TableKeyHasMany('people', 'pid', 'children', 'person_id');

Child Class Definition

Formerly, the child class was always an ADODB_Active_Record instance. Now you can derive the class of the children like this:

class childclass extends ADODB_Active_Record { /* ... */ }
ADODB_Active_Record::ClassHasMany('person', 'children','person_id', 'childclass');

This works the same way with TableHasMany().

Belongs To

Analogously, the functions ClassBelongsTo, TableBelongsTo and TableKeyBelongsTo define the child-to-parent relationship.
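
For example, navigating from child to parent might be sketched like this (I am assuming ClassBelongsTo mirrors the ClassHasMany signature, and the class name "child" and the id value are illustrative; check the Active Record docs below for the definitive API):

```php
class child extends ADODB_Active_Record { /* ... */ }
// "person" becomes a lazy-loaded property on each child instance,
// resolved through children.person_id (signature assumed to mirror
// ClassHasMany).
ADODB_Active_Record::ClassBelongsTo('child', 'person', 'person_id');

$c = new child();
$c->Load("id=100");    // illustrative child record
$parent = $c->person;  // the parent row, loaded on first access
```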


Download ADOdb   ADOdb Active Record docs