Search This Blog

19 December 2007

These past weeks

Ever since I submitted the "External Language Stored Procedures" project which Eric and myself have been working on to be listed on FreshMeat, it has gained a small number of interested users and we have just started to get some feedback. Yay! Always nice to know when people are using your code. Okay, so one of the emails contained a feature request: Support for PHP stored procedures. I think I shall play with that during my Christmas vacation next week. The weekend refresh took a little longer than expected to complete due to changes inside the mysqld code but I managed to test and push the changes before midnight.

Sometimes a task morphs gradually as requirements creep in... WL#3771 is one of those where even the title and direction has changed completely through its evolution.. Last week and this, I have completed a draft of the code in the task's current incarnation and hope it will be deemed acceptable and pushable soon.

A few days ago, I had just completed porting the Falcon storage engine to Sparc and PowerPC (again). Verified that all tests which aren't disabled complete successfully. I have been told that Jim will review the patch and may push portions vaguely based upon my patch sometime in the hand-wavy future.

06 December 2007

Vulnerable PHP application?

I seem to be getting a lot of requests on my webserver of the form: "GET /<directory-path>/pmapper-3.2-beta3/incphp/globals.php?__SESSION[PM_INCPHP]=<some-url>?" where replace the <directory-path> and the <some-url> with a path on your server and some random website respectively. It appears to be a probe to test for some vulnerability but I don't have the full request logged, only the request up to the question mark on the GET request so the payload isn't logged. The changing URLs are probably to throw people off the scent.

I must confess, I had no idea what this "pmapperr" application was before I observed these hits and nor do I have PHP installed on my webserver but I figure if I mention it on this blog, an audience who may be able to investigate can do something.

Oh well, I have done my part.

30 November 2007

Stuff since the last update

The Thanksgiving break was a welcome change of pace... and on the idle driving up and down the Californian coastline, I had a few moments to work on the MySQL External Stored Procedures project. I think I pretty much have Perl functioning fully and a couple of minor problems were resolved.
Yesterday, I completed the first draft of the all-revised-from-scratch WL#3771, now titled as the Plugable Audit Interface. Moving on to my other worklog task - Falcon on big-endian architectures, again, from scratch.

27 November 2007

Organic vs Directed design

Trying to make a good first few implementation iterations is hard and is sometimes made harder by specifications which are not minutely detailed. When the specification is vague, organic development sometimes seems to excel, partly because there is little or no worries about refactoring or redesigning as necessary. Trying to make something absolutely perfect on the first iteration is difficult, especially when attempting to ensure that the code is maintainable and extensible.
Still, not bad progress given that this is only the 2nd day of solid work since Thanksgiving.
Enough of that, time to go to bed.

10 November 2007

10 Minute HOW-TO

This question seems to crop up again and again... In the words of one comment to my blog: "I've downloaded your External Language Stored Procedures package and am interested in trying out the JAVA part. Would appreciate if you could explain how use the downloaded stuff."

Ok, on my download page, I have provided 2 different downloads which are automatically created from our source repository. A source patch which may be applied to a current mysql-6.0 tree and a tarball snapshot of the work repository. The source snapshot is not as friendly to build as the official MySQL source downloads as it is somewhat raw and unfinished - you will need certain GNU tools installed on your computer to make use of it. As a minimum, you will need the following:
  • GNU autoconf version 2.52 - I have version 2.61
  • GNU automake 1.08 - I have version 1.10
  • GNU libtool 1.4 - I have version 1.5.24
  • GNU bison 1.8 - I have version 2.3
  • GNU make - I have version 3.81

The simplest first and only step you may use is the BUILD scripts ... locate a build script for you and simply execute it such as:
$ ./BUILD/compile-ppc-debug-max-no-ndb
If you have any specific configure requirements, you may do:
$ ./BUILD/autorun.sh
$ ./configure <your configure options here>
There are also options available in most of the standard compile scripts too. For example "--just-print" would display the commands which the script would execute or "--just-configure" would stop the build immediately after completing the configure script. Almost always, I would use one of the standard scripts.

Now on to the minimum requirements of the External Language Stored Procedures work which Eric and myself are hacking with... There is a small wiki page available at MySQL Forge but it needs further embellishment:
  • The Java plug-in probably requires JDK 1.4 however JDK 1.3 may work. Supported JDKs include JDKs from Apple, Sun, IBM and Blackdown. Please send an email to Eric or myself if you experience any problems on your setup.
  • The XML-RPC plug-in requires the xmlrpc-c library available from their Sourceforge site. As a minimum, you should use no less than version 1.03 ... version 1.06 is better but I have not tried their newer alphas.
  • The Perl plug-in will require a properly configured Perl runtime with multithreading configured.... If you want to check, run "perl -V" and check what it says about usethreads, useithreads and usemultiplicity. For example, if a line like the following appears, you are ok:
        usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    However, if you see a line like the next line, the Perl plugin should not even attempt to compile:
        usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    Unfortunately, most Linux distros will compile your Perl without support for threads.
    I should mention that the code for the Perl plug-in is incomplete so any attempt to actually use it will crash for now.

Eric and I have submitted a talk on our External Language Stored Procedures work for the MySQL Users Conference 2008 so if we are all lucky, there may be a presentation about this project. They have received more than 300 presentation proposals and I am sure that almost all of them are excellent presentations from seasoned presenters, so... Fingers crossed!

For all those curious and adventurous, please do download the code and have fun!

24 October 2007

Proposed a talk for UC2008

Well, we did it. In between his SCRUM meeting and my compiling MySQL on Windows, Eric and I bashed out a proposal for the MySQL Users Conference 2008 and have submitted it. The content of the talk, well... It is entitled "Tour of External language stored procedures for MySQL" and we're going to talk about our little project.
On Kaj's blog, he suggests that submissions should be creative, descriptive and specific. I think we have all three bases covered.
Fingers crossed!

23 October 2007

The context of external language procedure's queries

Perhaps it wasn't entirely clear what the last posting meant by "reentrant". It means that the external language stored procedure can issue a query within the same thread context that executed the stored procedure.

Thanks to the question posed by Bill Karwin, I have tested this scenarios to demonstrate:
mysql> lock table mysql.proc write;
Query OK, 0 rows affected (0.00 sec)

mysql> call test2('select distinct name from mysql.proc');
+--------------------+
| name |
+--------------------+
| test1 |
| test2 |
| xml_current_time |
| xml_get_state |
| xml_list_methods |
| xml_list_signature |
| xml_method_help |
| xml_sum_and_diff |
+--------------------+
8 rows in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)
As you can see, the exclusive write lock on the PROC table has not prevented the inline query from being able to read the table as it truly is executed within the same thread.
A second example to demonstrate that the stored procedure is executed from within the same transaction context:
mysql> create table foo (a int primary key) engine=innodb;
Query OK, 0 rows affected (0.19 sec)

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into foo values (1),(2),(4);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> call test2('commit');
Query OK, 0 rows affected (0.01 sec)

mysql> insert into foo values (8),(12),(14);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> call test2('rollback');
Query OK, 0 rows affected (0.00 sec)

mysql> select * from foo;
+---+
| a |
+---+
| 1 |
| 2 |
| 4 |
+---+
3 rows in set (0.00 sec)

I hope that clarifies things a little.

Reentrant MySQL with External Stored Procedures

Ok, this is a late late blog post because the code was done and pushed into the work tree not last weekend, but the weekend before. I think it is a pretty significant feature to blog about but for some strange reason, it completely slipped my mind.

Check this out:
mysql> call test2('select count(*) from mysql.user; select count(*) from mysql.proc');
+----------+
| count(*) |
+----------+
| 3 |
+----------+
1 row in set (0.00 sec)

+----------+
| count(*) |
+----------+
| 8 |
+----------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)
Not impressed yet? How about this:
mysql> set max_sp_recursion_depth=2;
Query OK, 0 rows affected (0.00 sec)

mysql> call test2('select count(*) from mysql.proc;call test2("select count(*) from mysql.db;call test2(\'select count(*) from mysql.user\')")');
+----------+
| count(*) |
+----------+
| 8 |
+----------+
1 row in set (0.00 sec)

+----------+
| count(*) |
+----------+
| 2 |
+----------+
1 row in set (0.00 sec)

+----------+
| count(*) |
+----------+
| 3 |
+----------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)
And here was how the procedure was declared:

CREATE PROCEDURE test2(query text)
READS SQL DATA
LANGUAGE deep_thought
DYNAMIC RESULT SETS 5
EXTERNAL NAME 'loopback';
Maybe now you're curious how the language plugin performs this magic? Well, the psm_example.so plugin is actually linked against the MySQL client libraries, albeit a craftily modified version of it: Less than 10 lines of code within the client library was added. Then the stored routine was able to reenter the mysql server simply by using the mysql client library and execute statements within the same context by having code somewhat like the following:

mysql_options(&mysql, MYSQL_OPT_USE_INLINE_CONNECTION, NULL);
if (!mysql_real_connect(&mysql, 0, 0, 0, 0, 0, 0, CLIENT_MULTI_STATEMENTS))
{ ... }

Pretty nifty, innit?

07 October 2007

Update on XML_RPC

Wow, it actually worked with only minor changes to the code. Check this out:

mysql> create function xml_get_state(id int) returns text
-> no sql language xmlrpc external name
-> 'xmlrpc://betty.userland.com/RPC2;examples.getStateName';
Query OK, 0 rows affected (0.00 sec)

mysql> select xml_get_state(40);
+-------------------+
| xml_get_state(40) |
+-------------------+
| South Carolina |
+-------------------+
1 row in set (0.42 sec)

Now I just need to check that it works for more complex resultsets... Trawl the web for open-access trivial XML-RPC services or build a test XML-RPC server?

A few fun ideas... XML-RPC as a Stored Procedure.

Wouldn't it be fun to be able to embed web-service routines into MySQL as if they are stored procedures?
CREATE FUNCTION xml_current_time() RETURNS DATETIME NO SQL
LANGUAGE XMLRPC
EXTERNAL NAME "xmlrpc://time.userland.com/RPC2;currentTime.getCurrentTime";

Perhaps not a completely useful example... but an interesting goal to achieve. I have written up some code and pushed it to the experimental repository. I haven't tested it yet but I may do later today if I don't get too preoccupied catching up watching NBC's "Heroes". Snapshots created automagically at midnight, Pacific time.

05 October 2007

External Language Stored Procedures

Yay! It's the weekend!

Why does the weekend make me cheerful? I get time to spend on personal projects, although my sweet wife does complain that my 'personal projects' don't look much different than my day job.

This project, I have been doodling with for quite a while. It spawned from a little project that Eric Herman and myself started to look at 2 years ago with what we called "MySQL UDF v2". At the time, it included a whole plug-in and unit testing framework. Since we have to work with what is in the server, we dropped our plug-in and unit test framework after the current MySQL 5.1 plug-in system suddenly appeared: Somehow, I ended up maintaining it but that's another story.

A more modern and sane system for user-defined functions is still our goal. One part of our concept is to minimize the amount of 'heavy lifting' that needs to be done in C in order to create a new language plug-in. Eric being the happy Java guy that he is, is working on the Java stuff and for some crazy reason I have decided to take on Perl as an interesting challenge.

Alas, we never get much solid time to work on this project - The bulk of the effort is done during MySQL developer conferences, the most recent was the Heidelberg conference which finished only a few days ago. Heidelberg was significant because Eric and I successfully demonstrated Java UDFs working with a SQL standards compliant CREATE declaration and the Java function demonstrated was a simple string reverse method, although it is capable of much more.

We have every plan to continue working on this project - it is fun, especially now that we can demonstrate it. I have thrown together a cron job which will make a snapshot of our work repository and publish it on my personal website.

Words of encouragement or even code are all greatly appreciated.

Re: Gemini, Forgotten about this one aye?

Brian mentioned Gemini on his blog... As many people remember, there was a stunned silence for a while when Oracle purchased InnoBase OY only 2 years ago... Very similar to the stunned silence about 6 years ago when the suits at NuSphere tried to do a SCO. Back on to the topic at hand... Gemini. About two years ago, Brian fed me a copy of the Gemini engine saying that "Hey, you remember Gemini? Here's the code. Get it working. Don't talk about it and call it 'Amira'.". Perhaps not exactly his words but close enough.

So the Amira project had a shaky beginning... I ported the code to work on MySQL 5.0 and started examining it, removing platform-specific code, and improving the performance of it. From the first working run to the last, I think performance improved by about a factor of 10 but I don't remember specifics. The Amira project was cancelled 2Q2006. I suppose the reason why would be due to the problem of ownership - the code may be GPL but it is still copyrighted by NuSphere. No better position than the situation with InnoBase's GPL storage engine.

Now Brian has released the original code. I suppose I can commit to his repository a updated Gemini/Amira which works as a storage engine plug-in for MySQL 5.1... And if I have time, I could work on additional performance improvements which was planned but never implemented.

24 August 2007

Testing MySQL on a multicore Mac

Making my Quad G5 Mac work a little harder... Now that it has 4G of RAM, it doesn't seem to swap as much and I can happily build up to 2 MySQL source repositories and a couple of concurrent mysql-test-run.
I am currently experimenting with a RAM drive for handling the test run data, using something like the following for creating a 512M RAM disk.

diskutil eraseVolume UFS RAMDisk `hdid -nomount ram://1048576`
sudo chmod g+rwt /Volumes/RAMDisk

export MTR_BUILD_THREAD=auto
export MTR_MEM=/Volumes/RAMDisk

So far, looks pretty good.

09 July 2007

It's not exactly serious work but I found this post on Slashdot hilarious.
Definitely mod as funny.

03 July 2007

FUD and Mirrors

To some, this will be a repost as I originally wrote it for an internal MySQL mailing list but I have had much positive replies to the posting, I thought I should share it with a wider audience.
Windows Vista will be successful irrespective of any comments Microsoft
makes about Linux and Open Source software purely because of a few
simple facts.

  1. Most people do not care about Linux/OSS/GPL/etc. To them, their
    computer is a single monolithic device - they do not separate the
    hardware from the software - it all came together in the one box. It
    never occurs to them that the operating system can be changed or that
    there is an alternative to Microsoft Office - in fact to them, all
    software is Windows. It was common to hear people discuss that they have
    Windows 97 PC or XP 2003 computer when they really meant that they had
    "Windows with MS Office 97/2003 on their PC from Dell/Acer/etc".

  2. Most people want what everyone else has. Why do you think
    advertisements focus on how popular a product is as a reason why you
    should get it too. Think about car advertisements saying that XYZ is
    voted "Car of the Year 2008" etc. The best way to promote any product is
    by saying how popular it is. I am sure that MySQL benefits from this
    effect too - we promote ourselves as the most popular free database and
    people are drawn to that.

  3. Games + Drivers. Like as like not, all the game publishers and all
    the hardware vendors will fall over each other to make sure that their
    stuff will work for the majority of people.

Most importantly - the average consumer does not hear any of this news
about the patent posturing that MS is making with regard to Linux/OSS.
It will not affect their purchasing decisions - partly because they
don't know that there is a decision to be made nor will they choose
otherwise if the option was presented to them.

In my opinion - Microsoft's "237 Patents infringed by OSS" statement is
not aimed at the Open Source community, nor is it aimed at the
consumers. It is targeted for the ears and eyes of the ISVs and IHVs to
help ensure that they do not stray from the fold and dip their toes in
supporting alternatives to the Microsoft offerings.

Windows Vista will be hugely successful. More PCs are sold every month
today than 5 years ago and they need an operating system. I would be
very shocked if Vista's launch sales did not outstrip XP's launch.

Like as like not, we will have to ensure that we play well in a MS Vista
world. We would benefit by being the "Most popular database" on the
"Most popular operating system".

Just my 2¢...

28 June 2007

Quick musing on the "Queue" engine.

Brian postulates the idea of a "Queue" engine and why it is impractical. I agree that a queue storage engine doesn't really work too well. What would work better would be either a special 'queue' object type which may use any user accessible storage engine or perhaps a non-standard extension to the SQL syntax:


DEQUEUE message FROM queue_table LIMIT 1;


This can be done in (perhaps) the MySQL Proxy, where it is translated into a SELECT/DELETE combination or it can be added to the server, and should not cause much parser complications.


Then any storage engine may be used for a queue, purely dependent upon the user's requirements as some people will be satisfied with a non-persistent MEMORY based queue, MYISAM queues will be adequate for many and for a few who need transactions, there is INNODB and FALCON.

My 5+ Wish List?

Okay... I had originally stated that I would not publish my wish-list but since Jay Pipes has said that he was collating these ideas, I'll play along hoping that there is a positive influence on the future of MySQL. Here are my five, plus a few extra.... (out of list of many others)

  1. Modular Architecture

    I know that many people have put this on their wish list and I am sure that many people would probably blame me for some of the current architecture woes. Honestly, I have an excuse: I am trying to evolve the plug-in system into something more usable. As Eric Herman can testify, we actually had an alternative plug-in architecture under development as a small skunk works project, which in my opinion was superior than what we have today but we permanently shelved it after the current plug-in system appeared in the tree.

    A good modular architecture would allow parts to be plugged together to make a whole and would allow infinite extensibility and predicable degradation of services should modules be removed. The removal of the ACL module would predicatably remove all ACL checking. Multiple ACL modules would extend capabilities and make it more fine grained as their capabilities merge. In another senerio, we could have a module which acts as a filter to another module - they combine at run time, for example: A filter module could combine with a storage engine module to compress or encrypt blobs or specific data columns - suppose an environment which needs to store credit card details or other personal identifiers which require the persisted data encrypted transparently. There should not be any limits imposed upon the quantity and combination of modules to create plug-in classes except that obviously (and detectably) bogus combinations should be emit errors safely and predictably.

    Example of syntax:


    CREATE LIBRARY myutillib EXTERNAL NAME 'myutils.so';
    CREATE LIBRARY mystorelib EXTERNAL NAME 'mystore.so';
    CREATE STORAGE ENGINE mystore1 USING LIBRARY mystorelib EXTERNAL NAME 'mystore_engine';
    CREATE STORAGE ENGINE bar USING CHAIN OF
    LIBRARY myutillib EXTERNAL NAME 'bzipfield',
    LIBRARY myutillib EXTERNAL NAME 'cryptfield',
    LIBRARY mystorelib EXTERNAL NAME 'mystore_engine';

    -- create a table, the column `secretblob` is compressed with bzip
    -- before being encrypted with AES using a shared secret (stored by mysql table def)
    -- the ssn field is encrypted with triple-DES using a secret stored in a varable secretvar
    CREATE TABLE foouser (
    ...
    ) ENGINE=bar
    bzip_fields="secretblob,otherblob"
    crypt_fields="secretblob:aes:mysecret123,ssn:3des:@secretvar";

    SET secretvar='mysecretsecret';

  2. libmysys as a separate project

    This is an old idea but one which I think holds significant value and would ease maintenance of the server. I believe we would be served well to have libmysys and other non-core common components as a separate project outside of the mysqld server code. Then all the different versions of mysqld which we develop can use a common foundation and abstraction layer. The abstraction layer should not care for what version of mysqld is being built on top of it. Both shared and static libraries, debug and non-debug builds could be made available with the simple caveat that a debug build of mysqld should be built on a debug build of libmysys. However, I think that a non-debug build of mysqld should be made to work on top of a debug build of libmysys. As to what components should be moved into the libmysys project? Basically, the contents of the mysys/ strings/ dbug/ vio/ zlib/ regex/ directories and their relevant headers from include/ should be transplanted.

  3. New/modular Parser

    There are simple things which can be done to make the parser better and more customizable. As part of this, multiple parsers should be supported. "Why multiple parsers?" readers may ask.... Simply because only a limited subset of the possible accepted syntax are used the overwhelming majority of the time. I would imagine that the parser could be split up to be
    • Data access and manipulation - SELECT INSERT UPDATE DELETE and also SETCALLPREPARE EXECUTE

    • Privilege manipulation - GRANT REVOKE

    • Schema manipulation - ALTER CREATE DROP

    • Stored procedure - invoked by CREATE PROCEDURE or CREATE FUNCTION


  4. Unit tests for server components

    I like unit tests. When done correctly, they should test components for proper operation within specifications and for graceful failure outside of specification. Many significant parts of the server should be possible to unit-test. Full integration testing (á la mysql-test-run) should not be used to substitute for the lack of unit testing. Components which in my opinion should be subject to unit testing include each individual storage engine (within some form of storage engine test harness), the parser (test that it generates the correct trees), the ACL subsystem, the optimizer, each Item class and spatial functions, logging and replication, etc... and lets not forget unit tests for the whole of the aforementioned libmysys

  5. Aggregate Stored Functions and External Stored Procedures

    This is a small detail but it would be nice to have aggregates it completes the stored procedure implementation so that any old-world UDF can now be defined under the newer stored procedure code. This should not be too complicated to do, the complex part is how to declare the aggregate function in a way which makes sense (since the SQL2003:PSM spec does not encompass declaring aggregate functions). It would also be nice to have an uniform extensible external stored procedure implementation. Eric Herman and I actually have a skunk works project for this - the basic framework already works. When we have time, which is unfortunately not very often nowadays. We're trying to develop the plug-ins components for handling Java and Perl external stored procedures. This skunk works project is the descendant of the earlier 'plug-in' skunk work project we had, except with all the plug-in management stuff and elegant unit-test suite removed. The SQL syntax conforms to the SQL2003 specification, which is similarly closely adhered to by IBM's DB/2



Yeah I know this is supposed to be just five items but there are a few others which I believe needs to be considered:

  • A new Recursive Descent parser

    Yeah yeah yeah... broken record... etc. The SQL syntax lends itself for parsing by recursive descent much more easily than using Yacc style parsers. Wether it be hand-written or use some tool similar to PCCTS is not too important. If a tool such as PCCTS is used, the generated code could then be maintained by hand if desired, using the tool only as an initial 'bootstrap'.

  • Abstract Syntax Trees

    Wouldn't it be nice if the parser was to only generate abstract trees which could then be processed later more optimally? The parser would simply parse the keywords, turning the identifiers into atoms which could later be compared with a simple integer compare operation - no need for laborious string comparisons. SELECT statements could be handled more efficiently as the joins can be evaluated first before examining the expressions to emit which would eliminate the extra pass which is done to resolve field names to their tables. It would mean that column types would be known at the time of parsing the expressions which would allow, perhaps, the overloading of functions and more.

    I believe that the extra AST stage would incur less cost than the current cost of all the string comparisons currently done resolving identifiers which is O(n*m/2).

  • SCTP and/or link aggregation

    It would be good/useful to be able to bundle multiple connections down a single network link. This would reduce the number of concurrent sockets which mysqld handles when communicating with (for example) a J2EE application server. SCTP performs the multiplexing within the operating system's network stack, although I would imagine that it would eventually be optimized to be handled within user-land.

  • Parsing within the client

    The statements could be parsed within the client library and an optimal bytestream which represents the AST could be transmitted instead. This lowers the network overhead and will lift some burden of parsing from the server. It will also allow the client to rapidly receive error messages should there be some error in the syntax. This will not bypass any security as the AST tree should only be a binary representation of the SQL statement and should be completely reversible, except that stuff such as comments would be lost. On the plus side, the query-cache's job would be easier.

  • Integrated federation

    Best illustrated by example:


    CREATE FEDERATION odbc USING LIBRARY 'ha_odbc.so' EXTERNAL NAME 'odbc_engine';
    CREATE FEDERATION google USING LIBRARY 'ha_google.so' EXTERNAL NAME 'google_engine';

    CREATE CONNECTION fooserver USING "mysql://fooserver/db";
    CREATE CONNECTION barserver USING "odbc://otheroradsn";
    CREATE CONNECTION google USING "google:///";

    -- select against the bar table on fooserver
    SELECT foo.*, bar.* FROM foo@fooserver JOIN bar@barserver USING (id);
    SELECT * FROM `site:mysql.com +engine +archive`@google;

  • Elimination of .frm files

    I think Jay Pipes and Ronald Bradford have already heard this idea from me. Each 'schema' database directory should have a simple text file called 'schema.sql' and within it, it contains a list of CREATE TABLE statements for each table within that directory. When someone performs an ALTER or DROP operation, the statement is initially simply appended to the text file. At some later flush or shutdown, a new file is created called 'schema.new' which is simply the current state of play for the directory which only has CREATE TABLE or CREATE VIEW and similar declarations within it. When it is finished creating the new file, it is renamed as the new schema file.

    Why is this good? It means that all the table declarations are there in a nice human understandable format. During run time, it is trivial to simply append to the file. It will be quick to parse when the schema is first accessed. When the schema is garbage-collected, perhaps by ref count, it is trivial to flush out all the declarations. It will improve performance of mysqld where there are many tables as all the table definitions for a schema are loaded at once. There are plenty of benefits. And in the case of failure/crash, it is quite recoverable: If the table definition after the ALTER doesn't match the table, the table definition before it may be used - in the case where an incomplete ALTER operation has occurred.

  • Elimination of errmsg.sys

    It would be much nicer if all the error messages for all supported languages was stored within a system table within mysql. This would allow different connections to have different languages for their error messages and would even allow the creation of a special localization for machine parsing - for use in cases where the error message needs to be processed further for use in the client. This may be useful to throw exceptions with specific properties, parsed from the error message, or for use with Federation where the federated server needs to parse the error message to make a sane local error message. Also, I would extend the syntax used by the error message templates such that positional modifiers can be used, similar to Java localization. This would be useful because some error message may be strained/contrived to fit the same argument order for all languages. A column can also be used for additional descriptive and help information so that a user may ask:

    HELP ERROR 1234


Woah, that ended up being perhaps my longest blog posting.... It's amazing what you end up typing and copy/pasting while waiting for repositories to clone.

21 June 2007

Wish lists?

It seems that its very popular right now for people to say what their top 5 or 10 wishes are for MySQL Server. Many people who have chatted with me are well aware of a small collection of mine. Guess what? I have no plans to iterate through them here.
I personally don't believe it is particularly productive. MySQL is a company with a lot of smart people and almost every one of them will have their own distinct and unique wish list. For a few to have their wishes set above the rest could drive a wedge in the works. Of course, a small number of senior officers of the company to broadcast their opinions (MÃ¥rten, Monty, etc) is perfectly reasonable - they are expected to have a public vision - wether it be purely their own or distilled from the collective of the company. So where should we espouse our heart felt opinions? The WorkLog RawIdeas acts as a voracious black hole from which few ideas ever escape to fruition. Perhaps the internal Wiki is a good place and that everyone should be encouraged to start a page of their own wish list, perhaps linked with appropriate WorkLog entries and a grand index of pages and perhaps of ideas, where like-minded staffers can indicate which wishes they agree upon.

08 June 2007

What a surprisingly busy week.

Spent quite a lot of time tracking down a linking problem on Itanium. It appears that GNU ld 2.14 has some serious deficiencies. GNU ld 2.17 (the current stable release) does fix the problem. Unfortunately, the last supported version of GNU ld on RedHat AS 3 for Itaniam is version 2.14.90.0.4.20030529 - which clearly cannot contain the fix which was written in November of 2003. I suppose what to do is now a decision to be made by someone else.

Also spent some time with Federated - Bulk insert support within the storage engine and also how to make Federated play better with transactions and network overhead. Have to put together a new WorkLog entry if I cannot find a relevant one. I need to work on writing better changeset comments. Ingo says, and I have to admit that I agree a bit, that my comments assume that the reader is familiar with the code and bug in question. I need to make them a little more verbose so that anyone can work out what is being fixed without them having to dig through the bug database.
I had a few interesting chats/email discussions to discover that perhaps why there are relatively few complaints about Federated is because while it works for simple cases, there are lot of areas where it doesn't work for users and so it quickly gets abandoned and no bug report is filed, which is a shame as Federated can be a really useful tool.

I have removed the my Sparc and Power Falcon trees. I don't really have time to maintain them anymore: It has been nearly two weeks since I last merged, built and tested them and with all the new tablespace work going in to the tree, future merges are likely to be problematic. Doing this will liberate a couple of hours a week so that I can examine other interesting stuff or perhaps get some more bugs fixed.

30 May 2007

Book Review: Understanding MySQL Internals.

I actually received this book about a month ago but from necessity, I had to put it to one side and deal with more important issues. It is not a big book, only 235 pages, but it has an easy to read conversational style, unlike similarly titled technical books on the internals of other software.

If you're looking for a book about database theory, this book is not for you. It conveys to the reader a good overview as how MySQL functions without delving into the theories as to how the algorithms work or the rationale behind their design. The book does a good job of at least giving a drive-by look at most of the significant components of MySQL. libmysys, MySQL's platform library which allows MySQL to work on a very wide range of operating systems is mentioned only under the subsection titles 'Core API' and 'Utility API' and only lists about 30 of the functions available: libmysys provides the important glue between MySQL and the operating system, comparable with Netscape's NSPR or Apache's APR. The only omission within this book is information of MySQL's string-handling and character-set functions and data structures.

As the author acknowledged in his introduction, the bleeding-edge of MySQL development does move quite fast - so for a reader to make use of this book, I would recommend using the MySQL 4.1 or 5.0 source code for reference. There are many significant changes within the 5.1 beta source code which are not covered - which is perfectly reasonable as 5.1 is still in beta. I do hope that Sasha Pachev has a "2nd Edition" in the works which will cover 5.1 when it is finally released as stable and the plug-ins infrastructure as that would lie the greatest avenue for users and 3rd party ISVs to develop extensions for MySQL. There are also several exciting new components, such as the Falcon storage engine, which would be worthy of mention and perhaps an overview of their operation.

This book would be valuable to anyone unfamiliar with the MySQL source code and provides a foundation which can be built upon. It is informative and does not dazzle the reader with techo-jargon and theories.

If you're interested in this book, it is published by O'Reilly.
http://www.oreilly.com/catalog/9780596009571/index.html

22 May 2007

Been busy,

I have been very busy this past months, with both personal crisis and mega-patches. This is my MySQL blog so lets keep it to topic.
A couple of weeks ago, the plugins mega-patch was finally pushed and was released to the world in version 5.1.18. Thanks to Ingo and Sergei for keeping up the effort while I was occupied. As part of the code push, InnoDB code was stripped out of the main mysqld code and stored where it belongs, within Innobase's code. This is an exciting development as it now makes it easier than ever for plugin developers to create plugins without having to modify any existing code within mysqld. It has taken many months to get to this point but we are finally there. Anyone remember our CEO announcing the plugin initutive back when Innobase was acquired by Oracle? October 2005, I believe.

Since that push, I am back on bug-fixing... I have an annoying bug on Itanic with Intel's Itanic Compiler. (Yeah, I know the name is Itanium) The ELF fixups required for the embedded library are not being generated correctly because it uses a 22bit word as an offset and the code footprint is more than 4MB. It is quite likely that the fault lies in the version of Binutils used but for some reason, a recent binutils is not supported by the particular license of RedHat AS we have installed there.

On the bright side, I had some encouraging chats with Eric Herman... and after an excitingly short evening of coding, I have developed a PoC of external stored procedures support in mysqld. All that needs to be done now are the actual plugins to support particular types of external procedures - Eric wants to carry the baton for Java... I think I shall play with C. That should prove interesting as it will require using libtcc to create on-the-fly trampoline code so that native C functions can be called.

Something which deserves a minor mention: I have successfully compiled and tested Jim Starkey's Falcon Storage Engine for MySQL on a Quad PowerPC G5 Mac running Mac OS/X 10.4.9.

08 March 2007

It has been very productive this week working at Brian's place in Seattle. No sightseeing this trip but had an enjoyable experience talking with other developers and users at the MySQL meetup session.
I have been working on developing a flexible Log Plugin system which hopefully does not have the drawbacks of the existing system. The new code seems to be working well now although it is not tied into any system variables in a way to show off what it can do. My patch earlier on Wednesday replaced about 2500 LOC with about 500. Even after the code written today, there is still a significant reduction in the overall code size. Now, all I need is my plugin server variables patch pushed...

04 March 2007

An interesting but ambitious idea would be to refactor the mysqld code in a major way: Not just a minor tweek but would take a lot of the concepts, libraries and design of mysqld and put them together in a cleaner way with the benefit of hindsight, wishlists and the goals of mysqld. Very possible to do but not at all easy.

27 January 2007

I find myself thinking about Federated and about how to improve it's connection pooling. Whatever is done should be made safe for transctions. Unfortunately, we cannot suspend and join XA transactions which is a pity as it would make it much easier.

25 January 2007

Cool! I can now generate Xcode project files and build all of mysql (except NDB) from within it. I think it appears to compile faster. Now I can try to see if I can be more productive using Xcode... starting with some debugging.

23 January 2007

A couple of bug fixes pushed into the team trees. It appears that Valgrind can have a race when signalling condition variables while not holding the mutex. I should get some rest - having to stay up for European hours can be quite exausting. The company-wide conf call was mostly uneventful except for a small number of people who seemed unable to mute their connection. Hopefully, next time MGM will use the "lecture" mode and ensure that all other participants are quiet: It was quite hard to concentrate with the errant echos and other noises.

07 January 2007

My main development work of November/December is still under review by Sergei. Just working on fixing bugs and right now looking at a facinating bug in the Heap/Memory storage engine...