Wednesday, 29 April 2009

Gambit MySQL client 0.2

After a few weeks wrestling with the MySQL protocol, I've finally managed to produce a reasonable attempt at a client for Gambit (and probably other Scheme's).

This is a pure Scheme implementation rather than using Gambit's C interface to glue the existing MySQL C API into Gambit. The reason for this is that Gambit's power, and in my view fantastic potential as a web platform, comes from it's lightweight threads which allows Gambit to process thousands of concurrent requests in a single OS thread. This is all acheived by an internal schedular, continuations and very careful use of non-blocking calls on all I/O functions. Using the C API effectively takes the OS thread out of Gambit and into the C code and therefore all of those lovely threads stop executing until your call is completed - seriously bad news for a web application.

This library is now around 80% complete and supports the following features :-
  • Multiple concurrent connections to MySQL,
  • Authentication using both the old and new authentication methods,
  • Switching Schema's,
  • Dynamic queries, updates, deletes etc.
  • Prepared statement creation,
  • Executing prepared statements (queries and updates) with bind variables for most datatypes,
  • Closing prepared statements,
  • Closing connections.
Additionally, I've also introduced a SQL Abstraction Layer. This is equivalent to JDBC in Java which defines a few generic SQL types and provides a common API across database vendors. The idea behind this is that if someone else provides another database client, they can use the same abstraction layer and then any programs can switch between databases without having to rewrite their code (hopefully).

There are a few limitations with the driver, i.e. features that aren't implemented yet :-
  • Not all datatypes are supported yet in prepared statements e.g. floats, doubles, sets, enums and one or two others. Some of these can easily be avoided e.g. by using DECIMAL instead of FLOAT or DOUBLE.
  • BLOB's and large character fields are not yet supported in either dynamic or prepared statements.
  • Large result sets requiring supplementary fetches aren't supported as I haven't implemented the additional fetch function.
Hopefully I'll add these things over the coming weeks and months. There are likely to be a few bugs in there, so if you find one or you are desperate for a feature that hasn't been implemented yet then please email me.

The library is in the Gambit Dumping Grounds. Here's an example of how you use it :-

(load "sql")
(load "mysql")

(with-connection (sql-connect "localhost" 3306 "test" "fred" "fred")

;;(mysql-execute "drop table test")
(mysql-execute "create table test (id integer PRIMARY KEY AUTO_INCREMENT, name varchar(50), inserted datetime, amount decimal(10,2) DEFAULT -4.5)")
(pp (sql-execute "select * from test"))

;; Prepared Statements
(let ((stmt1 (sql-prepare-statement "insert into test (name, amount, inserted) values (?, ?, sysdate())"))
(stmt2 (sql-prepare-statement "select id, name, amount, inserted from test where amount > ? order by id")))

(pp (sql-execute-prepared-statement stmt1 '("Fred" 10.5)))
(pp (sql-execute-prepared-statement stmt1 '("Joe" 5.5)))
(sql-close-statement stmt1)

(pp (sql-execute-prepared-statement stmt2 '(4.5)))
(sql-close-statement stmt2))

;; Close connection

Result sets are returned as a list of lists i.e. a list of column values for each row in a list for the result set. The first row in the list is the column headings. For dynamic statements, the column data is always a string regardless of the SQL type because this is the way that MySQL sends it and I haven't implemented any type conversions yet. Prepared statements return binary types so I've mapped some of these e.g. strings and numbers into their respective Scheme types.

I would strongly recommend using prepared statements rather than dynamic statements for performance reasons and also for their natural resistance to SQL injection attacks in web applications.

Once the library is finished, I'll create a persistence layer that builds on top of the prepared statements to give a framework similar to ActiveRecord in Rails or Hibernate in Java.

Tuesday, 21 April 2009

MySQL Interface for Gambit

As you've probably already realised, I'm a huge fan of the Gambit Scheme implementation which is highly performant and compiles into a standalone executable.

Gambit has also been proven to support tens of thousands of concurrent lightweight threads which helps make it an ideal platform for web applications. This coupled with it's rare ability to serialize continuations places it as a front runner for web applications in the 21st century.

Unfortunately there is little infrastructure there at present to support web applications ... or so I thought. It turns out that a tremendous amount of effort has gone into developing a framework called 'Black Hole' by Per Eckerdal and Mikael More. This is a continuations based framework that is built on top of a module system for Gambit. I look forward to seeing this being released in the very near future.

To support this effort, I have developed a pure Scheme MySQL interface for Gambit that is currently in the Gambit Dumping Grounds. The advantage of a pure Scheme implementation rather than a C binding is that it won't block your thousands of lightweight threads whilst your OS thread is stuck in C land.

Currently the implementation supports logging on with new and old authentication methods, switching schema's and DML and DDL using dynamic queries. I'm currently adding support for prepared statements which I would consider essential before building a persistence layer for Black Hole or anything else.

I'd love to do the same for Oracle but unfortunately Oracle don't publish the protocol for SQL*Net so this is not possible. Hopefully Gambit will devise a way to call C with an OS thread so as not to block the lightweight threads and then we can use the OCI interface.

Functional Programming Languages

We’ve had Structured Programming, we’ve got Object Orientation, could Functional Programming be the next big thing ?

In the 1950’s high level programming languages were limited to Fortran, Algol, Cobol and Lisp. Most of the languages that we use today are derived from Algol, an imperative language. An imperative language is one which is built of statements that change state, named after the imperative mood in natural languages which expresses commands to take action. Our structured programming in C created structures or records containing data that we modified through functions or procedures – an example of an imperative style. With the advent of rich GUI applications in the 1990’s, this approach became untidy because although the structures were similar for different GUI components, the behaviour was different. This led to the much needed development of object orientation with C++. Object orientation is further refinement and development of the imperative style.

One of the most popular languages of recent times, Java, is a near perfect example of an OO language since it was primarily designed for developing portable GUI applications – Applets. It seems strangely ironic, therefore, that very few GUI applications are developed with Java and instead we use it mainly for developing server-side web applications. There are a wealth of web frameworks available in Java but despite this, Java it is still not the most productive language or the least error prone language for this purpose. Web applications are highly concurrent and shared state implicit in an imperative programming style can create many problems.

We have all seen the obvious failures where the wrong data is revealed to the wrong customer because the code has failed to do adequate checking but these are only the cases where the data has made it on to the web pages. For every one of these, there will be countless other bugs where incorrect concurrent data access causes subtle error conditions resulting in random error pages rather than such obvious failings. These failings can sometimes be difficult to recreate and may not be in your own code, these could be bugs in the web framework that you are using, the application server or even the JVM itself - all written in an imperative style. Adding controls such as synchronisation around data to prevent concurrent modification necessarily introduces performance bottlenecks. With multiple CPU’s and increasing numbers of cores per CPU these problems will get greater rather than lesser over the coming years.

In a concurrent world, imperative is the wrong default !” – Tim Sweeney, Epic Games

So if imperative programming is such a bad fit for our modern concurrent world, what’s the alternative ?

In the 1950’s, Lisp had a functional approach where there was little or no shared state and instead data is passed as arguments, often in a list, in function calls i.e. on a stack rather than a shared heap. This seemingly innocuous difference leads to some major differences in program structure and language features. Some of these features such as closures and continuations have been adopted by some of the newer dynamic but essentially imperative languages like Ruby and have been attributed with increasing productivity and expressiveness. Lisp survives on today in the form of Common Lisp and Scheme and although not widely used outside academia, both are enjoying renewed interest in recent years. Scheme, in particular, has seen an explosion of interest from developers becoming disillusioned with the mainstream commercial imperative languages. Scheme has been used as a teaching language at some of the prestige technical universities for many years but is now finding its way into niche commercial and open source applications, one example being the GIMP image manipulation software.

Another branch of functional languages derived from the 1970’s language ML include OCAML, Erlang and Haskell which again have seen a big growth in interest in recent years. OCAML has gained popularity in military and investment banking applications. Microsoft is about to release its own functional language based on ML called F#.

Lisp suffered a big drop in popularity when the AI research of the 1970’s failed to deliver on earlier promises and at around the same time people were beginning to do great things with C. C, with its simpler requirements for compiler technology, was perceived as fast and efficient and its imperative style suited the single-threaded applications being developed at the time. By comparison, Lisp was perceived as slow and bloated and difficult to work with. Nowadays, in Scheme’s case, the core language has been simplified and streamlined and compiler technology has improved enormously and the best Scheme and OCAML compilers can produce single-threaded code that runs at similar speeds or even slightly faster than C code. Parallel code often runs many times faster because there is little need for synchronisation. Similarly, large tasks can be effortlessly broken into many pieces and executed in parallel on many cores or distributed over many machines – a task that is not easy to achieve with Java or C#.

The benefits of a functional approach are not just increased performance and the removal of concurrency errors though; one functional language feature is particularly suited to web applications: continuations.

The stateless nature of the web introduces its own problems. Business processes are inherently stateful and yet the HTTP protocol is inherently stateless. This means that web frameworks have to implement ‘sessions’ either in memory or a database and track them through cookie values or rewritten URL’s. Business processes then need to be broken down into various discrete steps, each one corresponding with a HTTP request / response pair, and the application then needs to retrieve the state from the previous steps, perform the current step and then persist the data ready for the next step. This extra complexity is necessary because of the discontinuous nature imposed by the protocol.

Ideally, an application would start the process and then ‘call out’ to a user to grab the data needed for the next step and then continue the process within the same procedure so all the local state is still in scope i.e. turning the protocol inside out. Continuations permit this approach.

It's amazing that this single higher abstraction can make such a tremendous difference in your Web programming, but it really does. If the Java language had native support for continuations, we'd all be coding this way.” – Bruce Tate, IBM

Most languages support continuations of some form, e.g. the humble ‘return’ or ‘break’ statement present in most imperative languages is an example of an ‘escaping continuation’ that can be used to prematurely exit from a block of code. First class continuations present in Scheme allow continuations to be used not only for escapes but also for arbitrary jumps (with state) and they are reusable i.e. they can be invoked many times. Some Scheme implementations even allow continuations to be serialised to disk or a database and then can be later invoked even though the process that they once existed in has long since gone. This is only possible because all the required state existed on the local stack at the time the continuation was captured and is not reliant on other unknown data existing in the process heap space – the essence of a functional style.

These two reasons alone show how any functional language that supports first class continuations would be a far better choice for web applications than our current imperative languages but there is another, rarely mentioned, problem particularly with Java that could be addressed: poor productivity.

It seems strange that we have gone from writing business applications in dedicated and productive 4GL’s like PowerBuilder, Oracle Forms, etc a few years ago to using a general purpose 3GL like Java. It is probably no coincidence that there is now a shortage of developers and development projects often overrun on time and cost. Yet nobody in the lucrative software development industry talks about lost productivity, they merely claim that projects are more complex now.

Scheme and Common Lisp have two other rare features that makes them compelling and could help recover the lost productivity that we have seen in the move to Java and distributed systems : Homoiconicity and Metalinguistic Abstraction.

Homoiconicity is a word coined to describe the fact that the language source code is itself a data structure within the language or in other words, there is uniformity between code and data. Code can become data and data can become code. This means that programs can not only manipulate data (as you’d expect) but they can also manipulate code.

This opens up the opportunity for metalinguistic abstraction or creating abstractions above the base language that then become part of the language i.e. syntactic extension. In simple terms, this means that macros can be used to extend the syntax of the language and create meta-languages. This allows you to readily create a domain specific language (DSL) just for web applications or any other purpose. In fact most Lisp programs end up being a DSL for the problem being solved. This approach magnifies the reuse potential of language features since anywhere where syntax is repeated, an abstraction can be created and reused without the fear of introducing new bugs. Most languages allow you to extend and reuse data types but Lisp based languages are some of the very few that allow you to do the same for syntax.

Metalinguistic abstraction is certainly not unknown in Java applications, most web frameworks rely on XML to create the abstractions that can’t be done in Java. In fact XML has a very similar structure to the nested lists of Lisp S-expressions. The disadvantages of using XML for abstraction is that it is not integrated with the language so you can’t mix XML tags with java code and, of course, your application becomes messy because the logic spans numerous Java classes and XML files. Being able to create those abstractions in the core language would certainly improve readability, reduce defects and improve productivity. Interestingly Ruby on Rails does away with XML files by using Ruby mixins to create abstractions – few would disagree that Rails is more productive than a comparable Java framework. Macros in Scheme or Lisp go further still and produce very concise web specific code particularly when combined with a continuations based web framework. And unlike Ruby this is compiled code running at native code speeds and can safely be highly parallel in nature.

Functional languages are slowly emerging from niche areas but given the utter domination of imperative languages and the enormous established commercial interests around them, business adoption is likely to be slow initially. Given the advantages offered though, early adopters could gain significant commercial advantages over their competitors albeit at the probable cost of some early pain. Functional programming is a paradigm shift and requires developers to almost learn programming again from basic principles.

Hybrid languages like Scala for the JVM have recently emerged that allow functional programming to be mixed with imperative and build upon the existing tools and libraries already available for the Java platform. Whilst Scala pushes JVM development forward, there is still a danger that developers will continue using an imperative style and miss the benefits being offered.