Sunday, November 02, 2008

Old challenges, new syntax ...

I've blogged before about the new SQL syntax that is becoming available in databases and how it helps solve questions that are increasingly common.

Now it's time for another example: something that doesn't come up often in a reporting environment, as most tools have this feature built in, but that can be a problem if you're building your output with a scripting language.
Adding a "Total" row at the end of your tabular output, that's it! Here you can find the Devshed forum post that prompted this summary.
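When the output really is built by hand in a scripting layer, the total row can of course be appended there; a minimal Python sketch, with made-up category names and figures:

```python
# Hypothetical report rows as (category, amount) tuples;
# names and figures are illustrative, not from any real schema.
rows = [("Action", 4375.85), ("Animation", 4656.30), ("Children", 3655.55)]

# Compute the grand total client-side and append it as a final
# "Total" row, mimicking what reporting tools do automatically.
total = sum(amount for _, amount in rows)
report = rows + [("Total", round(total, 2))]

for category, amount in report:
    print(f"{category:<12}{amount:>10.2f}")
```

This works, but it pushes presentation logic into every script that renders the query; the SQL-side solutions below keep it in one place.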

Aside from the classical solution using a UNION and an ORDER BY, I suggested using the newer WITH ROLLUP clause, which is implemented by many commercial databases and by MySQL (available from 4.1).
Unfortunately MySQL's implementation imposes some limitations on the ORDER BY clause: it's impossible to directly order by the aggregate column, which is a quite common requirement. On with the main problem, using the Sakila sample database.

An example requirement could be showing all sales by category and a row holding the grand total.
In Sakila the basic query would be:

SELECT
    c.name AS category,
    SUM(p.amount) AS total_sales
FROM
    sakila.payment p
    JOIN sakila.rental r ON p.rental_id = r.rental_id
    JOIN sakila.inventory i ON r.inventory_id = i.inventory_id
    JOIN sakila.film f ON i.film_id = f.film_id
    JOIN sakila.film_category fc ON f.film_id = fc.film_id
    JOIN sakila.category c ON fc.category_id = c.category_id
GROUP BY
    c.name;


As I said, a requirement based on ordering by total_sales can't be satisfied in MySQL without a trick, like nesting selects, so I'll skip it. (BTW: this made me notice a bad practice left in Sakila: the use of ORDER BY in views, see "sales_by_film_category".)

The traditional solution for adding a total row would be:

SELECT
    c.name AS category,
    SUM(p.amount) AS total_sales
FROM
    sakila.payment p
    JOIN sakila.rental r ON p.rental_id = r.rental_id
    JOIN sakila.inventory i ON r.inventory_id = i.inventory_id
    JOIN sakila.film f ON i.film_id = f.film_id
    JOIN sakila.film_category fc ON f.film_id = fc.film_id
    JOIN sakila.category c ON fc.category_id = c.category_id
GROUP BY
    c.name
UNION ALL
SELECT
    NULL,
    SUM(p.amount) AS total_sales
FROM
    sakila.payment p
    JOIN sakila.rental r ON p.rental_id = r.rental_id
    JOIN sakila.inventory i ON r.inventory_id = i.inventory_id
    JOIN sakila.film f ON i.film_id = f.film_id
    JOIN sakila.film_category fc ON f.film_id = fc.film_id
    JOIN sakila.category c ON fc.category_id = c.category_id
ORDER BY
    CASE WHEN category IS NULL THEN 'last' ELSE 'first' END,
    category

As you can see, it can be quite hard to read; the ORDER BY trick comes last.
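The shape of the trick can be tried in isolation on a toy table; here is a sketch using Python's sqlite3 with made-up figures. Note that SQLite rejects bare expressions in a compound ORDER BY, so the CASE sort key is applied from an outer query instead:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payment (category TEXT, amount REAL)")
con.executemany("INSERT INTO payment VALUES (?, ?)",
                [("Action", 10.0), ("Action", 5.0), ("Comedy", 7.5)])

# Per-category totals plus a grand-total row glued on with UNION ALL;
# the CASE in the ORDER BY forces the NULL (total) row to sort last.
rows = con.execute("""
    SELECT category, total_sales FROM (
        SELECT category, SUM(amount) AS total_sales
        FROM payment
        GROUP BY category
        UNION ALL
        SELECT NULL, SUM(amount) FROM payment
    )
    ORDER BY CASE WHEN category IS NULL THEN 1 ELSE 0 END, category
""").fetchall()
```

The last row comes back with NULL in the category column and the grand total in the aggregate column.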

The explain plan can become quite nasty too (MySQLPerformanceBlog wrote about this earlier). Now have a look at the more modern solution:

SELECT
    c.name AS category,
    SUM(p.amount) AS total_sales
FROM
    sakila.payment p
    JOIN sakila.rental r ON p.rental_id = r.rental_id
    JOIN sakila.inventory i ON r.inventory_id = i.inventory_id
    JOIN sakila.film f ON i.film_id = f.film_id
    JOIN sakila.film_category fc ON f.film_id = fc.film_id
    JOIN sakila.category c ON fc.category_id = c.category_id
GROUP BY
    c.name
WITH ROLLUP;

Much simpler, isn't it?
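WITH ROLLUP marks the grand-total row with NULL in the grouping column, so a scripting layer only has to relabel it before display; a small sketch of that post-processing in Python, with invented figures:

```python
# Rows as a script might receive them from the WITH ROLLUP query:
# the final row carries NULL (None) in the grouping column.
rows = [("Action", 4375.85), ("Comedy", 4002.48), (None, 8378.33)]

# All the scripting layer has to do is relabel the rollup row:
report = [("Total" if category is None else category, amount)
          for category, amount in rows]
```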

But what about efficiency?
Let's look at explain plans for those queries:

First one, the classic way:

mysql> EXPLAIN EXTENDED select
-> c.name AS category,
-> sum(p.amount) AS total_sales
-> from
-> sakila.payment p
-> join sakila.rental r
-> on p.rental_id = r.rental_id
-> join sakila.inventory i
-> on r.inventory_id = i.inventory_id
-> join sakila.film f
-> on i.film_id = f.film_id
-> join sakila.film_category fc
-> on f.film_id = fc.film_id
-> join sakila.category c
-> on fc.category_id = c.category_id
-> group by
-> c.name
-> union all
-> select
-> null,
-> sum(p.amount) AS total_sales
-> from
-> sakila.payment p
-> join sakila.rental r
-> on p.rental_id = r.rental_id
-> join sakila.inventory i
-> on r.inventory_id = i.inventory_id
-> join sakila.film f
-> on i.film_id = f.film_id
-> join sakila.film_category fc
-> on f.film_id = fc.film_id
-> join sakila.category c
-> on fc.category_id = c.category_id
-> ORDER
-> BY CASE WHEN category IS NULL
-> THEN 'last'
-> ELSE 'first' END
-> , category;
+----+--------------+------------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys |key | key_len | ref | rows | filtered |Extra |
+----+--------------+------------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+---------------------------------+
| 1 | PRIMARY | c | ALL | PRIMARY |NULL | NULL | NULL | 16 | 100.00 |Using temporary; Using filesort |
| 1 | PRIMARY | fc | ref | PRIMARY,fk_film_category_category |fk_film_category_category | 1 | sakila.c.category_id | 9 | 100.00 |Using index |
| 1 | PRIMARY | f | eq_ref | PRIMARY |PRIMARY | 2 | sakila.fc.film_id | 1 | 100.00 |Using index |
| 1 | PRIMARY | i | ref | PRIMARY,idx_fk_film_id |idx_fk_film_id | 2 | sakila.fc.film_id | 2 | 100.00 |Using index |
| 1 | PRIMARY | r | ref | PRIMARY,idx_fk_inventory_id |idx_fk_inventory_id | 3 | sakila.i.inventory_id | 1 | 100.00 |Using index |
| 1 | PRIMARY | p | ref | fk_payment_rental |fk_payment_rental | 5 | sakila.r.rental_id | 1 | 100.00 |Using where |
| 2 | UNION | c | index | PRIMARY |PRIMARY | 1 | NULL | 16 | 100.00 |Using index |
| 2 | UNION | fc | ref | PRIMARY,fk_film_category_category |fk_film_category_category | 1 | sakila.c.category_id | 9 | 100.00 |Using index |
| 2 | UNION | f | eq_ref | PRIMARY |PRIMARY | 2 | sakila.fc.film_id | 1 | 100.00 |Using index |
| 2 | UNION | i | ref | PRIMARY,idx_fk_film_id |idx_fk_film_id | 2 | sakila.fc.film_id | 2 | 100.00 |Using index |
| 2 | UNION | r | ref | PRIMARY,idx_fk_inventory_id |idx_fk_inventory_id | 3 | sakila.i.inventory_id | 1 | 100.00 |Using index |
| 2 | UNION | p | ref | fk_payment_rental |fk_payment_rental | 5 | sakila.r.rental_id | 1 | 100.00 |Using where |
| NULL | UNION RESULT | | ALL | NULL| NULL | NULL | NULL | NULL | NULL| Using filesort |
+----+--------------+------------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+---------------------------------+
13 rows in set, 1 warning (0.00 sec)

mysql>

Pretty bad, isn't it? Two queries are executed, then glued together.

Now the second one WITH ROLLUP:

mysql> EXPLAIN EXTENDED select
-> c.name AS category,
-> sum(p.amount) AS total_sales
-> from
-> sakila.payment p
-> join sakila.rental r
-> on p.rental_id = r.rental_id
-> join sakila.inventory i
-> on r.inventory_id = i.inventory_id
-> join sakila.film f
-> on i.film_id = f.film_id
-> join sakila.film_category fc
-> on f.film_id = fc.film_id
-> join sakila.category c
-> on fc.category_id = c.category_id
-> group by
-> c.name
-> with rollup;
+----+-------------+-------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+----------------+
| 1 | SIMPLE | c | ALL | PRIMARY | NULL | NULL | NULL | 16 | 100.00 | Using filesort |
| 1 | SIMPLE | fc | ref | PRIMARY,fk_film_category_category | fk_film_category_category | 1 | sakila.c.category_id | 9 | 100.00 | Using index |
| 1 | SIMPLE | f | eq_ref | PRIMARY | PRIMARY | 2 | sakila.fc.film_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | i | ref | PRIMARY,idx_fk_film_id | idx_fk_film_id | 2 | sakila.fc.film_id | 2 | 100.00 | Using index |
| 1 | SIMPLE | r | ref | PRIMARY,idx_fk_inventory_id | idx_fk_inventory_id | 3 | sakila.i.inventory_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | p | ref | fk_payment_rental | fk_payment_rental | 5 | sakila.r.rental_id | 1 | 100.00 | Using where |
+----+-------------+-------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+----------------+
6 rows in set, 1 warning (0.00 sec)

mysql>

A lot better, isn't it? And faster ;)

BTW: more about EXPLAIN EXTENDED here.

Wednesday, October 01, 2008

Adding an OpenLDAP server as SQLServer linked server

... and querying it.

You can link to an OpenLDAP server from SQLServer by running the following stored procedures:

Use master
go
exec sp_addlinkedserver
'ADSI_OpenLDAP', --name of the linked server
'My OpenLDAP Linked server', --description
'ADSDSOObject',
'adsdatasource'
go
exec sp_addlinkedserverlogin
'ADSI_OpenLDAP', --must match the linked server name created above
False,
'sa',
'your_root_dn',
'your_root_dn_password'
go

Note that this skips each and every proper security management practice.

Now you can query the linked server this way:

select
*
from
OpenQuery(ADSI_OpenLDAP,
'SELECT
*
FROM
"LDAP://localhost/dc=mydomain,dc=com"
WHERE
objectClass="organizationalUnit"')

which will return all organizational units.
Or:

select
*
from
OpenQuery(ADSI_OpenLDAP,
'SELECT
*
FROM
"LDAP://localhost/ou=myou,dc=mydomain,dc=com"
WHERE
objectClass="InetOrgPerson"')

which will return all InetOrgPerson entries (i.e. users) from the specified organizationalUnit (ou), here called "myou".
This is also available as a PDF file on my old site.

Monday, September 08, 2008

Random selection, with a bias ...

Say you want to randomly select your employee of the month, but not quite randomly: you'd like to give your best employees a bigger chance of being selected, based on their rating.
This is just an example: you could be randomly displaying ads from your customers while giving a higher chance of being displayed to those who pay more; there could be a million other examples, but I hope you get the idea.
In other terms, this means adding some skew to your data, so that its distribution is no longer uniform.
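In a scripting language this kind of bias takes only a few lines; a sketch using Python's random.choices, with invented employee numbers and ratings:

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the sketch is repeatable

employees = [2, 4, 5, 8]   # hypothetical employee numbers
ratings = [5, 10, 15, 1]   # their ratings, used as selection weights

# Draw many biased "employee of the month" picks and count them:
# higher-rated employees should come up proportionally more often.
picks = random.choices(employees, weights=ratings, k=3100)
counts = Counter(picks)
```

The counts come out roughly proportional to the ratings; the rest of this post shows how to get the same effect in pure SQL.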
Apart from the statistical implications, let's see the SQL in action. I'll be using Firebird's Employee sample database. First of all we'll add a column to the employee table holding the employee rating, then set the rating to 5 for employee 2, 10 for employee 4, 15 for employee 5, and 1 for all other employees.
Now that we have rated each employee, we need the magic to have them randomly selected with a bias.
This is accomplished with a "sequence table" which does just one thing: hold a list of numbers running from the minimum rating value to the maximum (you'll probably want to preload it with a lot of numbers starting from 1).
Here is the SQL:

CREATE TABLE CONSECUTIVE_NUMBER(
NUM Integer NOT NULL,
CONSTRAINT PK_CONSECUTIVE_NUMBER_1 PRIMARY KEY (NUM)
);

And load it:

Insert Into Consecutive_number (num) Values (1);
Insert Into Consecutive_number (num) Values (2);
Insert Into Consecutive_number (num) Values (3);
Insert Into Consecutive_number (num) Values (4);
Insert Into Consecutive_number (num) Values (5);
Insert Into Consecutive_number (num) Values (6);
Insert Into Consecutive_number (num) Values (7);
Insert Into Consecutive_number (num) Values (8);
Insert Into Consecutive_number (num) Values (9);
Insert Into Consecutive_number (num) Values (10);
Insert Into Consecutive_number (num) Values (11);
Insert Into Consecutive_number (num) Values (12);
Insert Into Consecutive_number (num) Values (13);
Insert Into Consecutive_number (num) Values (14);
Insert Into Consecutive_number (num) Values (15);
Insert Into Consecutive_number (num) Values (16);
Insert Into Consecutive_number (num) Values (17);
Insert Into Consecutive_number (num) Values (18);
Insert Into Consecutive_number (num) Values (19);
Insert Into Consecutive_number (num) Values (20);

Now we are ready for the magic: we'll join our employee table with this sequence table, generating a specific cross join which holds as many rows for each employee as its rating. Thus, when doing a random select on this data, some employees have a bigger chance of being chosen; e.g. employee number 5 has 15 times the chance of being selected of employee number 8, and 3 times the chance of employee number 2.
Here is the query, embedded in a view:

CREATE VIEW
RANDOM_SKEWED_EMPLOYEE (EMP_NO)
AS
SELECT
FIRST 1
a.EMP_NO
FROM
EMPLOYEE a
INNER JOIN
CONSECUTIVE_NUMBER c
ON
c.NUM BETWEEN 1 AND a.RATING
ORDER BY
rand();

The join clause does the magic, as I said.
Now extract a small sample:

24
5
36
85
5
2
12
24
144
5
5
20
94
5
4
127
5
94
85
4
2
34
136
4
9

Let's see its distribution; even on such a small sample you'll see that those 3 employees appear more often than others, with more or less the expected relative ratio:
Emp_no Times
2 5
4 9
5 15
9 4
14 1
20 1
28 2
29 2
36 1
37 1
52 2
71 1
72 1
83 1
113 1
114 1
118 2
121 1
127 1
138 2
144 1

Of course a larger sample will be closer to the desired result.
This example is built on Firebird, but it should work in any database.
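Indeed, the same mechanics can be checked quickly outside Firebird; here is a sketch using Python's sqlite3 with toy data (table and column names mirror the example above, ratings are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (emp_no INTEGER, rating INTEGER)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [(2, 5), (4, 10), (5, 15), (8, 1)])
con.execute("CREATE TABLE consecutive_number (num INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO consecutive_number VALUES (?)",
                [(n,) for n in range(1, 21)])

# The BETWEEN join repeats each employee once per rating point ...
expanded = [r[0] for r in con.execute("""
    SELECT e.emp_no
    FROM employee e
    JOIN consecutive_number c ON c.num BETWEEN 1 AND e.rating
""")]

# ... so picking one row of the join uniformly at random
# is a rating-biased pick.
winner = con.execute("""
    SELECT e.emp_no
    FROM employee e
    JOIN consecutive_number c ON c.num BETWEEN 1 AND e.rating
    ORDER BY random()
    LIMIT 1
""").fetchone()[0]
```

Employee 5 occupies 15 of the 31 joined rows, so it wins roughly half the draws.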
HTH

Note that PostgreSQL's generate_series() function might help a lot ;)
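Where recursive CTEs are available, the preload also shrinks to a single statement; a sqlite3 sketch of filling the same sequence table (generate_series(1, 20) would play this role in PostgreSQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE consecutive_number (num INTEGER PRIMARY KEY)")

# Fill 1..20 with a recursive CTE instead of twenty INSERT statements.
con.execute("""
    WITH RECURSIVE seq(n) AS (
        SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 20
    )
    INSERT INTO consecutive_number (num)
    SELECT n FROM seq
""")
```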

Tuesday, July 29, 2008

Setting up ColdFusion and Firebird

After a long struggle on the Devshed forums I decided to try on my own; after all, I know Firebird and don't see why it and ColdFusion can't play nice together ... in fact they did, and easily.
Already having the database (Firebird 2.5 Alpha 1) and a client tool (FlameRobin 0.8.6) installed on my PC, I went for the server, downloading and installing the free developer version of ColdFusion 8 from Adobe's site.
The install went smoothly and I just confirmed every default option, except for the install drive: D: in my case instead of the standard C:.
After the install finished I logged into the admin console at the default address and opened the Server Setting Summary:

The important things are highlighted:
  1. ColdFusion server version: 8,0,1,195765
  2. Edition: Developer
  3. Java VM Version: 1.6.0_04
With this info at hand I went to the Firebird site to download the appropriate driver version:

I chose the latest driver available, JayBird 2.1.6, for my JVM, which is 1.6:


The driver downloads as a zip file; unzip it and extract the driver:

Place it alongside the other JDBC drivers in the ColdFusion server (the server config can tell you where they are), in my case D:\ColdFusion8\lib, and restart the server:


A quick check confirmed that the driver is in the server classpath:

Now I set up an alias for my Firebird database, just to have a simpler connection string: open the file aliases.conf and add a row for your db; from now on you'll be able to refer to the alias instead of the full path to the db:


Don't forget to restart the Firebird server.

Ok, time to go back to ColdFusion and set up a Data Source:


Name it "LocalFirebird" and choose driver "other" from the drop-down list, then click "Add".
After clicking "Add" a new form opens, asking for some other details:

Values are:

  1. JDBC URL: jdbc:firebirdsql:127.0.0.1/3050:pippo (note the alias, no need for the full path)
  2. Driver Class: org.firebirdsql.jdbc.FBDriver
  3. Driver Name: JayBird
  4. Username: your user, standard admin name is SYSDBA
  5. Password: your password, standard password for SYSDBA is masterkey
Now click "Submit" and you'll go back to the Data Sources list; click "Verify all connection" and wait to see if an "OK" appears for the new connection:

After confirming the datasource is OK you can go on and put up a quick test page.

A Devshed forum user put up a quick tutorial some time ago; kudos to him, but it looks like it's a bit old (2003) and "poor in graphics" ;). You can find it here; I added some notes there which correspond to the content of this post.

Sunday, July 27, 2008

The new MINI Clubman

I'm the happy owner of a new MINI One and had a chance to try the new MINI Cooper Clubman, which, apart from being the bigger sister of my car, is also a second-generation new MINI, which means a new engine, new steering and some other things ... unfortunately not all of them are nice.
First of all, the new engine seems to be better than the old one, with a nice sound and good power.
The new steering is good, very light when parking and with a nice, heavy feeling at speed.
Car dynamics are great as usual, a pleasure to drive.
Now for the "not nice" things, rear visibility is close to zero:

The rear boot is bigger than in the standard MINI (not too difficult ;)) but overall smaller than I expected:

and the rear doors don't help when parked in tight quarters.

Other than that, I didn't like some of the plastics used inside the car, and the awful "Goldrake" look of the central instruments; see it for yourself:

Basically the MINI didn't change: so nice to drive that you'll forget its price tag and everything else, but I must say that I like the first-generation new MINI better than its successor.

Tuesday, July 22, 2008

PostgreSQL - Firebird comparison

Google led me to a comparison sheet about PostgreSQL and Firebird by AMSoftwareDesign; as it looks a bit outdated, I decided to add some info about the latest Firebird release. See it in action:


PostgreSQL vs Firebird feature comparison
(each feature: PostgreSQL 8.2.x / Firebird 2.0.x / Firebird 2.5 Alpha)

MVCC: Yes / Yes / Yes
Row level locking available: Yes / Yes / Yes
Max database size: Unlimited* / Unlimited* / Unlimited*
Max table size: 32 TB / ~32 TB / ~32 TB
Max row size: 1.6 TB / 64 KB / 64 KB
Max rows per table: Unlimited* / > 16 billion / > 16 billion
Max columns per table: 50 - 1600 depending on column types / Depends on data types used / Depends on data types used
Max indexes per table: Unlimited* / 65,535 / 65,535
Max SQL statement size: Unlimited* / 64 KB / 64 KB
Multi-threaded architecture available: No (see "Features we do NOT want" in the TODO list) / Yes (super server) and No (classic server) / Yes (super server); architectures will be unified with full SMP support in 3.0
Ability to re-order table columns without re-creating: No / Yes / Yes
Stores transaction information in same file as data: No / Yes / Yes
Auto increment columns: Yes (serial type that uses sequences) / Yes (must use a generator and a trigger) / Yes (must use a generator and a trigger)
True boolean column type: Yes / No / No
Table inheritance: Yes / No / No
Domains: Yes / Yes / Yes
Table partitioning: Yes (basic) / No / No
Updateable views: No (workaround available via rules system) / Yes / Yes
Event/notification system: Yes / Yes / Yes
Temporary tables: Yes / No / Yes
Rich built-in functions: Yes / Yes / Yes
Multi-language stored procedures: Yes (PL/pgSQL, PL/Perl, PL/Java etc.) / No / No (support for Java stored procedures is scheduled for 3.0)
Compiled external function (UDF) support: Yes / Yes / Yes
Exception handling in stored procedures: Yes / Yes / Yes
Two-phase commit: Yes / Yes / Yes
Native SSL support: Yes / No / No
Multiple auth methods (i.e. LDAP): Yes / No / Yes (database auth or integrated Windows auth)
Compound indexes: Yes / Yes / Yes
Unique indexes: Yes / Yes / Yes
Partial indexes: Yes / Yes / Yes
Functional indexes: Yes / Yes / Yes
Multiple index storage types: Yes (btree, hash etc.) / No / No
Point-in-time recovery: Yes / No / No
Schema support: Yes / No / No
Conforms to ANSI SQL 92/99: Yes / Yes / Yes
Limit/offset support: Yes / Yes / Yes
Create user-defined types: Yes / No / No
Create user-defined operators: Yes / No / No
Create user-defined aggregates: Yes / No / No
Log shipping (for point-in-time recovery): Partial / No / No
Write-ahead logging: Yes / No / No
Tablespaces: Yes / No / No
Save points: Yes / Yes / Yes
Open source async replication: Yes (Slony) / No (commercial solutions available; database shadowing is also present) / No (commercial solutions available; database shadowing is also present)
Online/hot backups: Yes / Yes / Yes
File-system-based backups possible: Yes (postmaster must be stopped) / Yes / Yes
Requires backup/restore to compact: No / Yes / Yes
Fully ACID compliant: Yes / Yes / Yes
Native Win32 port: Yes / Yes / Yes
Text/memo field type: Yes / Yes / Yes
BLOB support: Yes (limited to the max field size of 1 GB) / Yes (can be up to 32 GB) / Yes (can be up to 32 GB)
UTF8 support: Yes / Yes / Yes
Define charactersets/collations per database (default): Partial (PostgreSQL can also define a characterset for the entire database cluster during the initdb process; running databases in encodings different from the one chosen at initdb time is not recommended) / Yes / Yes
Define charactersets and collations on a per-column level: No / Yes / Yes
Foreign keys: Yes / Yes / Yes
Check constraints: Yes / Yes / Yes
Unique constraints: Yes / Yes / Yes
Not-null constraints: Yes / Yes / Yes
Multiple transaction isolation levels: Yes / Yes / Yes
Fully relational system catalogs: Yes / Yes / Yes
Information schema: Yes / No (no schema support) but equivalent system tables / No (no schema support) but equivalent system tables
Native GIS support via GiST or other native means: Yes (PostGIS) / No / No
Open source full-text search: Yes / No / No
Use POSIX regular expressions in queries: Yes / No / Yes, through the standard predicate SIMILAR TO
Database monitoring: Yes / No / Yes, through system tables and triggers
Ability to query databases on other servers, local or remote: Yes (dblink) / No / Yes (through EXECUTE STATEMENT)
Ability to query other databases: Yes (DBI-Link, DBLink-TDS) / No / No
Read-only databases: No / Yes / Yes
Regular version updates: Yes / Yes / Yes



* Unlimited but still restricted by system resources.