so many trails ... so little time: 2008

Sunday, November 02, 2008

Old challenges, new synthax ...

I've blogged before about the new SQL synthax which is becoming available in databases and how it helps solving questions which are increasingly common.

Now it's time for another example, something which doesn't come up often in a reporting environment as most tools have this feature, but can be a problem if you're building your output with a scipting language.

Adding a "Total" row at the end of your tabular output, that's it! Here you can find the Devshed forum post that prompted this summary.

Aside from the classical solution using a UNION and an ORDER BY query, I suggested using the new WITH ROLLUP clause, which is implemented by many commercial databases and MySQL (available from 4.1).

Unfortunately MySQL's implementations poses some limitations to the ORDER BY clause, actually it's impossible to directly order by the aggregate column, which is a quite common requirement, on with the main problem using the Sakila sample database.

An example requirement could be showing all sales by category and a row holding the grand total.

In Sakila the basic query would be:

SELECT
 c.name AS category,
 SUM(p.amount) AS total_sales
FROM
 sakila.payment p
  JOIN sakila.rental r
   ON p.rental_id = r.rental_id
  JOIN sakila.inventory i
   ON r.inventory_id = i.inventory_id
  JOIN sakila.film f
   ON i.film_id = f.film_id
  JOIN sakila.film_category fc
   ON f.film_id = fc.film_id
  JOIN sakila.category c
   ON fc.category_id = c.category_id
GROUP BY
 c.name;

As I said a requirement based on ordering by total_sales in MySQL can't be satisfied without using a trick, like nesting selects, and so I'll skip it (BTW: this made me notice a bad practice left around in Sakila, usage of order by in views, see "sales_by_film_category") .

The traditional solution for adding a total row would be:

SELECT
 c.name AS category,
 SUM(p.amount) AS total_sales
FROM
 sakila.payment p
  JOIN sakila.rental r
   ON p.rental_id = r.rental_id
  JOIN sakila.inventory i
   ON r.inventory_id = i.inventory_id
  JOIN sakila.film f
   ON i.film_id = f.film_id
  JOIN sakila.film_category fc
   ON f.film_id = fc.film_id
  JOIN sakila.category c
   ON fc.category_id = c.category_id
GROUP BY
 c.name
UNION ALL
SELECT
 NULL,
 SUM(p.amount) AS total_sales
FROM
 sakila.payment p
  JOIN sakila.rental r
   ON p.rental_id = r.rental_id
  JOIN sakila.inventory i
   ON r.inventory_id = i.inventory_id
  JOIN sakila.film f
   ON i.film_id = f.film_id
  JOIN sakila.film_category fc
   ON f.film_id = fc.film_id
  JOIN sakila.category c
   ON fc.category_id = c.category_id
ORDER
    BY CASE WHEN category IS NULL
            THEN 'last'
            ELSE 'first' END
     , category

As you can see it can be quite hard to read ... the order by trick is last.

The explain plan can become quite nasty too ... (MySQLPerformanceBlog wrote about this earlier), now have a look at the more modern solution:

SELECT
 c.name AS category,
 SUM(p.amount) AS total_sales
FROM
 sakila.payment p
  JOIN sakila.rental r
   ON p.rental_id = r.rental_id
  JOIN sakila.inventory i
   ON r.inventory_id = i.inventory_id
  JOIN sakila.film f
   ON i.film_id = f.film_id
  JOIN sakila.film_category fc
   ON f.film_id = fc.film_id
  JOIN sakila.category c
   ON fc.category_id = c.category_id
GROUP BY
 c.name
WITH rollup;

Much simpler, isn't it?

But what about efficiency?
Let's look at explain plans for those queries:

First one, the classic way:

mysql> EXPLAIN EXTENDED select
->  c.name AS category,
->  sum(p.amount) AS total_sales
-> from
->  sakila.payment p
->   join sakila.rental r
->    on p.rental_id = r.rental_id
->   join sakila.inventory i
->    on r.inventory_id = i.inventory_id
->   join sakila.film f
->    on i.film_id = f.film_id
->   join sakila.film_category fc
->    on f.film_id = fc.film_id
->   join sakila.category c
->    on fc.category_id = c.category_id
-> group by
->  c.name
-> union all
-> select
->  null,
->  sum(p.amount) AS total_sales
-> from
->  sakila.payment p
->   join sakila.rental r
->    on p.rental_id = r.rental_id
->   join sakila.inventory i
->    on r.inventory_id = i.inventory_id
->   join sakila.film f
->    on i.film_id = f.film_id
->   join sakila.film_category fc
->    on f.film_id = fc.film_id
->   join sakila.category c
->    on fc.category_id = c.category_id
-> ORDER
->     BY CASE WHEN category IS NULL
->             THEN 'last'
->             ELSE 'first' END
->      , category;
+----+--------------+------------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+---------------------------------+
| id | select_type  | table      | type   | possible_keys                     |key                       | key_len | ref                   | rows | filtered |Extra                           |
+----+--------------+------------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+---------------------------------+
|  1 | PRIMARY      | c          | ALL    | PRIMARY                           |NULL                      | NULL    | NULL                  |   16 |   100.00 |Using temporary; Using filesort |
|  1 | PRIMARY      | fc         | ref    | PRIMARY,fk_film_category_category |fk_film_category_category | 1       | sakila.c.category_id  |    9 |   100.00 |Using index                     |
|  1 | PRIMARY      | f          | eq_ref | PRIMARY                           |PRIMARY                   | 2       | sakila.fc.film_id     |    1 |   100.00 |Using index                     |
|  1 | PRIMARY      | i          | ref    | PRIMARY,idx_fk_film_id            |idx_fk_film_id            | 2       | sakila.fc.film_id     |    2 |   100.00 |Using index                     |
|  1 | PRIMARY      | r          | ref    | PRIMARY,idx_fk_inventory_id       |idx_fk_inventory_id       | 3       | sakila.i.inventory_id |    1 |   100.00 |Using index                     |
|  1 | PRIMARY      | p          | ref    | fk_payment_rental                 |fk_payment_rental         | 5       | sakila.r.rental_id    |    1 |   100.00 |Using where                     |
|  2 | UNION        | c          | index  | PRIMARY                           |PRIMARY                   | 1       | NULL                  |   16 |   100.00 |Using index                     |
|  2 | UNION        | fc         | ref    | PRIMARY,fk_film_category_category |fk_film_category_category | 1       | sakila.c.category_id  |    9 |   100.00 |Using index                     |
|  2 | UNION        | f          | eq_ref | PRIMARY                           |PRIMARY                   | 2       | sakila.fc.film_id     |    1 |   100.00 |Using index                     |
|  2 | UNION        | i          | ref    | PRIMARY,idx_fk_film_id            |idx_fk_film_id            | 2       | sakila.fc.film_id     |    2 |   100.00 |Using index                     |
|  2 | UNION        | r          | ref    | PRIMARY,idx_fk_inventory_id       |idx_fk_inventory_id       | 3       | sakila.i.inventory_id |    1 |   100.00 |Using index                     |
|  2 | UNION        | p          | ref    | fk_payment_rental                 |fk_payment_rental         | 5       | sakila.r.rental_id    |    1 |   100.00 |Using where                     |
| NULL | UNION RESULT |  | ALL    | NULL| NULL                      | NULL    | NULL                  | NULL |     NULL| Using filesort                  |
+----+--------------+------------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+---------------------------------+
13 rows in set, 1 warning (0.00 sec)

mysql>

Pretty bad isn't it? Two queries are executed, then glued together.

Now the second one WITH ROLLUP:

mysql> EXPLAIN EXTENDED select
->  c.name AS category,
->  sum(p.amount) AS total_sales
-> from
->  sakila.payment p
->   join sakila.rental r
->    on p.rental_id = r.rental_id
->   join sakila.inventory i
->    on r.inventory_id = i.inventory_id
->   join sakila.film f
->    on i.film_id = f.film_id
->   join sakila.film_category fc
->    on f.film_id = fc.film_id
->   join sakila.category c
->    on fc.category_id = c.category_id
-> group by
->  c.name
-> with rollup;
+----+-------------+-------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+----------------+
| id | select_type | table | type   | possible_keys                     | key                    | key_len | ref                   | rows | filtered | Extra         |
+----+-------------+-------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+----------------+
|  1 | SIMPLE      | c     | ALL    | PRIMARY                           | NULL                    | NULL    | NULL                  |   16 |   100.00 | Using filesort |
|  1 | SIMPLE      | fc    | ref    | PRIMARY,fk_film_category_category | fk_film_category_category | 1       | sakila.c.category_id  |    9 |   100.00 | Using index    |
|  1 | SIMPLE      | f     | eq_ref | PRIMARY                           | PRIMARY                   | 2       | sakila.fc.film_id     |    1 |   100.00 | Using index    |
|  1 | SIMPLE      | i     | ref    | PRIMARY,idx_fk_film_id            | idx_fk_film_id            | 2       | sakila.fc.film_id     |    2 |   100.00 | Using index    |
|  1 | SIMPLE      | r     | ref    | PRIMARY,idx_fk_inventory_id       | idx_fk_inventory_id       | 3       | sakila.i.inventory_id |    1 |   100.00 | Using index    |
|  1 | SIMPLE      | p     | ref    | fk_payment_rental                 | fk_payment_rental         | 5       | sakila.r.rental_id    |    1 |   100.00 | Using where    |
+----+-------------+-------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+----------------+
6 rows in set, 1 warning (0.00 sec)

mysql>

A lot better, isn't it? And faster ;)

BTW: More about EPLAIN EXTENDED here.

Wednesday, October 01, 2008

Adding an OpenLDAP server as SQLServer linked server

... and querying it.

You can link to an OpenLDAP server from SQLServer by:

1. Running the following stored procedures:

Use master
go
exec sp_addlinkedserver
'ADSI_OpenLDAP', --name of the linked server
'My OpenLDAP Linked server', --description
'ADSDSOObject',
'adsdatasource'
go
exec sp_addlinkedserverlogin
'ADSIOpenLDAP',
False,
'sa',
'your_root_dn',
'your_root_dn_password'
go

Note that this skips each and every proper security management practice.

Now you can query the linked server this way:

select
*
from
OpenQuery(ADSI_OpenLDAP,
'SELECT
*
FROM
"LDAP://localhost/dc=mydomain,dc=com"
WHERE
objectClass="organizationalUnit"')

which will return all organizational units.
Or:

select
*
from
OpenQuery(ADSI_OpenLDAP,
'SELECT
*
FROM
"LDAP://localhost/ou=myou,dc=mydomain,dc=com"
WHERE
objectClass="InetOrgPerson"')

which will return all InetOrgPerson (i.e. users) from the specified organizationalUnit (ou)
here called "myou".
This is also available as a PDF file on my old site

Monday, September 08, 2008

Random selection, with a bias ...

Say you want to randomly select your employee of the month, but not so randomly, better, you'd like to give your best employees a bigger chance to be selected based on their rating.
This is just an example, you could be randomly displaying ads from your customers, but giving an higher chance to be displayed to those who are paying more, there can be a million other example, but I hope you got the sense of this.
In other terms this means to add some skew to your data, so that it's distribution is no more uniform.
Apart from the statistical implications of this, let's see the sql in action, I'll be using Firebird's Employee sample database, first of all we'll add a column to the employee table, holding the employee rating, then set the rating to 5 for employee 2, 10 for employee 4, 15 for employee 5, 1 for all other employees.
Now that we have rated each employee we need the magic to have them randomly selected with a bias ...
This is accomplished with a "sequence table" which will do just one thing, hold a list of numbers starting from the minimum rating value to the max rating (you'll probably want to preload it with a lot of numbers starting from 1).
Here is the sql:

CREATE TABLE CONSECUTIVE_NUMBER(
NUM Integer NOT NULL,
CONSTRAINT PK_CONSECUTIVE_NUMBER_1 PRIMARY KEY (NUM)
);

And load it:

insert Into Consecutive_number (num) Values (1);
Insert Into Consecutive_number (num) Values (2);
Insert Into Consecutive_number (num) Values (3);
Insert Into Consecutive_number (num) Values (4);
Insert Into Consecutive_number (num) Values (5);
Insert Into Consecutive_number (num) Values (6);
Insert Into Consecutive_number (num) Values (7);
Insert Into Consecutive_number (num) Values (8);
Insert Into Consecutive_number (num) Values (9);
Insert Into Consecutive_number (num) Values (10);
Insert Into Consecutive_number (num) Values (11);
Insert Into Consecutive_number (num) Values (12);
Insert Into Consecutive_number (num) Values (13);
Insert Into Consecutive_number (num) Values (14);
Insert Into Consecutive_number (num) Values (15);
Insert Into Consecutive_number (num) Values (16);
Insert Into Consecutive_number (num) Values (17);
Insert Into Consecutive_number (num) Values (18);
Insert Into Consecutive_number (num) Values (19);
Insert Into Consecutive_number (num) Values (20);

Now we are ready for the magic, we'll be joining our employee table with this sequence table, generating a specific cross join which will hold as many rows for each employee as it's rating, thus, when doing a random select on this data there will be a bigger chance to choose some employees, i.e. employee number 5 will have 15 times the chance to be selected than employee number 8 and 3 times the chances of employee number 2.
Here is the query, embedded in a view:

CREATE VIEW
RANDOM_SKEWED_EMPLOYEE (EMP_NO)
AS
/* write select statement here */
SELECT
FIRST 1
a.EMP_NO
FROM
EMPLOYEE a
INNER JOIN
CONSECUTIVE_NUMBER c
ON
c.NUM BETWEEN 1 AND a.RATING
ORDER BY
rand();

The join clause does the magic, as I said.
Now extract a small sample:

Let's see it's distribution, even on such a small sample you'll see that those 3 employees appear more often than others and with more or less the expected relative ratio:

Emp_no Times
2 5
4 9
5 15
9 4
14 1
20 1
28 2
29 2
36 1
37 1
52 2
71 1
72 1
83 1
113 1
114 1
118 2
121 1
127 1
138 2
144 1

Of course a larger sample will be closer to the desired result.
This example is buit on Firebird, but it should work in any database.
HTH

Note that PostgreSQL's generate_series() instruction might help a lot ;)

Tuesday, July 29, 2008

Setting up ColdFusion and Firebird

After a long struggle on Devshed forums I decided to try on my own, after all I know Firebird and don't see why it and ColdFusion can't play nice together ... infact they did, and easily.
Already having installed on my PC the database (Firebird 2.5 Alpha1) and a client tool (Flamerobin 0.8.6) I went for the server, downloading and installing the free developer version of ColdFusion 8 from Adobe's site.
Install went smooth and I just confirmed every default option but the drive D: in my case instead of the standard C:
After the install finished I logged into the admin console at the default address and opened the Server Setting Summary:

The important things are highlighted:

ColdFusion server version: 8,0,1,195765
Edition: Developer
Java VM Version: 1.6.0_04

With this infos at hand I went to Firebird site to download the appropriate driver version:

I chose the lates driver available, JayBird 2.1.6, for my JVM which is 1.6:

The driver downloads as a zip file, unzip it, extract the driver:

And place it along the other JDBC drivers in the ColdFusion server (the server config can tell you where they are), in my case D:\ColdFusion8\lib and restarted the server:

A quick check confirmed that the driver is in the server classpath:

Now I set up an alias for my Firebird database, just to have a simpler connectionstring, open up the file aliases.conf and add a row for your db, from now on you'll be able to refer to the alias instead of the full path to the db:

Don't forget to restart the Firebird server.

Ok, time to go back to ColdFusion and set up a Data Source:

Name it "LocalFirebird" and chose driver "other" from the drop down list, the click "Add".
After clicking "Add" a new form opens, asking for some other details:

Values are:

JDBC URL: jdbc:firebirdsql:127.0.0.1/3050:pippo (note the alias, no need for the full path)
Driver Class: org.firebirdsql.jdbc.FBDriver
Driver Name: JayBird
Username: your user, standard admin name is SYSDBA
Password: your password, standard password for SYSDBA is masterkey

Now click "Submit" and you'll go back to the Data Sources list, click "Verify all connection" and wait to see if an "OK" appears for the new connection:

After confirming the datasource is OK you can go on and put up a quick test page.

A Devshed forum use has put up a quick tutorial some time ago, kudos to him but it looks like it's a bit old (2003) and "poor in graphics" ;), you can find it here, I added some notes there which correspond to the content of this post.

Sunday, July 27, 2008

The new MINI Clubman

I'm the happy owner of a New MINI ONE and had a chance to try the New MINI Cooper Clubman, which, apart being the bigger sister of my car, is also a second generation New Mini, which means new engine, new steering and some other things ... unfortunately not all of them are nice.
First of all, the new engine seems to be better than the old one, with a nice sound and good power.
The new steering is good, very light when parking and with a nice, heavy feeling at speed.
Car dynamics are great as usual, a pleasure to drive.
Now for the "not nice" things, rear visibility is close to zero:

The rear booth is bigger than the standard Mini (not too difficult ;)) but overall smaller than what I expected:

and the rear doors don't help when parked in tight quarters.

Other than that I didn't like some of the plastics used inside the car

and the awful "Goldrake" look of the central instruments, see it for
yourself:

Basically the Mini didn't change, so nice to drive that you'll forget it's price tag and everything else, but I must say that I like the first generation new Mini better than it's successor.

Tuesday, July 22, 2008

PostgreSQL - Firebird comparison

Google led me to a comparison sheet about PostgreSQL and Firebird by AMSoftwareDesign, as it looks a bit outdated I decided to add some infos about the latest Firebird release, see it in action:

	PostgreSQL vs Firebird feature comparison
Feature	PostgreSQL 8.2.x	Firebird 2.0.x	Firebird 2.5 Alpha
MVCC	Yes	Yes	Yes
Row level Locking Available	Yes	Yes	Yes
Max Database Size	Unlimited*	Unlimited*	Unlimited*
Max table Size	32 TB	~32 TB	~32 TB
Max Row Size	1.6 TB	64 KB	65 KB
Max Rows per Table	Unlimited*	> 16 Billion	> 16 Billion
Max Columns Per Table	50 - 1600 depending on column types	Depends on data types used.	Depends on data types used.
Max Indexes Per Table	Unlimited*	65,535	66,535
Max SQL statement size	Unlimited*	64kb	64kb
Multi Threaded Architecture Available?	No (see "Features we do NOT want" in the TODO list)	Yes (super server) and No (classic server)	Yes (super server), architectures will be unified with full SMP support in 3.0
Ability to re-order table columns without re-creating	No	Yes	Yes
Stores Transaction Information in same file as data	No	Yes	Yes
Auto Increment Columns	Yes (serial type that uses sequences)	Yes (must use a generator and a trigger)	Yes (must use a generator and a trigger)
True Boolean column type	Yes	No	No
Table Inheritance	Yes	No	No
Domains	Yes	Yes	Yes
Table Partitioning	Yes (basic)	No	No
Updateable Views	No (workaround available via rules system)	Yes	Yes
Event/Notification System	Yes	Yes	Yes
Temporary Tables	Yes	No	Yes
Rich Built in Functions	Yes	Yes	Yes
Multi Lang Stored Procedures	Yes (PLPGSQL,PLPerl,PlJava etc)	No	No (support for Java stored procedures is scheduled for 3.0)
Compiled External Function (UDF) Support	Yes	Yes	Yes
Exception handling in stored procedures	Yes	Yes	Yes
2 Phased Commit	Yes	Yes	Yes
Native SSL support	Yes	No	No
Multiple auth methods (i.e. LDAP)	Yes	No	Yes (database auth or integrated windows auth)
Compound Indexes	Yes	Yes	Yes
Unique Indexes	Yes	Yes	Yes
Partial Indexes	Yes	Yes	Yes
Functional Indexes	Yes	Yes	Yes
Multiple Index Storage Types	Yes (btree,hash etc)	No	No
Point in Time Recovery	Yes	No	No
Schema Support	Yes	No	No
Conforms to ANSI-SQL 92/99	Yes	Yes	Yes
Limit/Offset support	Yes	Yes	Yes
Create user defined types	Yes	No	No
Create user defined operators	Yes	No	No
Create user defined Aggregates	Yes	No	No
Log Shipping (for Point In Time Recovery and Log Shipping)	Partial	No	No
Write ahead logging	Yes	No	No
Tablespaces	Yes	No	No
Save Points	Yes	Yes	Yes
Open Source Async Replication	Yes (Slony )	No (Commercial solutions available. Database shadowing is also present.)	No (Commercial solutions available. Database shadowing is also present.)
Online/Hot Backups	Yes	Yes	Yes
File System based backups possible	Yes (Postmaster must be stopped)	Yes	Yes
Require backup/restore to compact	No	Yes	Yes
Fully ACID Compliant	Yes	Yes	Yes
Native Win32 Port	Yes	Yes	Yes
Text/Memo field type	Yes	Yes	Yes
BLOB support	Yes (limited to the max field size of 1 GB)	Yes (Can be up to 32GB)	Yes (Can be up to 32GB)
UTF8 support	Yes	Yes	Yes
Define charactersets/collations per database (default)	Partial (PostgreSQL can also define a characterset for the entire database cluster during the initdb process, it is not recommend to run databases in different encodings than the encoding chosen at initdb time)	Yes	Yes
Define charactersets and collations on a per column level	No	Yes	Yes
Foreign Keys	Yes	Yes	Yes
Check Constraints	Yes	Yes	Yes
Unique Constraints	Yes	Yes	Yes
Not Null Constraints	Yes	Yes	Yes
Multiple Transaction Isolation levels	Yes	Yes	Yes
Fully relational System Catalogs	Yes	Yes	Yes
Information Schema	Yes	No (no schema support) but equivalent system tables	No (no schema support) but equivalent system tables
Native GIS support via GIST or other native means	Yes (PostGIS )	No	No
Open Source Full Text Search	Yes	No	No
Use POSIX Regular Expressions in queries	Yes	No	Yes, through standard predicate SIMILAR TO
Database Monitoring	Yes	No	Yes, through system tables and triggers
Ability to query databases on other servers local or remote.	Yes (Dblink )	No	Yes (through EXECUTE STATEMENT)
Ability to query other databases	Yes (DBI-Link ,DBLink-TDS )	No	No
Read Only Databases	No	Yes	Yes
Regular Version Updates	Yes	Yes	Yes

* Unlimited but still restricted by system resources.

so many trails ... so little time