Tuesday 21 April 2009

Functional Programming Languages

We’ve had Structured Programming, we’ve got Object Orientation, could Functional Programming be the next big thing ?

In the 1950’s high level programming languages were limited to Fortran, Algol, Cobol and Lisp. Most of the languages that we use today are derived from Algol, an imperative language. An imperative language is one which is built of statements that change state, named after the imperative mood in natural languages which expresses commands to take action. Our structured programming in C created structures or records containing data that we modified through functions or procedures – an example of an imperative style. With the advent of rich GUI applications in the 1990’s, this approach became untidy because although the structures were similar for different GUI components, the behaviour was different. This led to the much needed development of object orientation with C++. Object orientation is further refinement and development of the imperative style.

One of the most popular languages of recent times, Java, is a near perfect example of an OO language since it was primarily designed for developing portable GUI applications – Applets. It seems strangely ironic, therefore, that very few GUI applications are developed with Java and instead we use it mainly for developing server-side web applications. There are a wealth of web frameworks available in Java but despite this, Java it is still not the most productive language or the least error prone language for this purpose. Web applications are highly concurrent and shared state implicit in an imperative programming style can create many problems.

We have all seen the obvious failures where the wrong data is revealed to the wrong customer because the code has failed to do adequate checking but these are only the cases where the data has made it on to the web pages. For every one of these, there will be countless other bugs where incorrect concurrent data access causes subtle error conditions resulting in random error pages rather than such obvious failings. These failings can sometimes be difficult to recreate and may not be in your own code, these could be bugs in the web framework that you are using, the application server or even the JVM itself - all written in an imperative style. Adding controls such as synchronisation around data to prevent concurrent modification necessarily introduces performance bottlenecks. With multiple CPU’s and increasing numbers of cores per CPU these problems will get greater rather than lesser over the coming years.

In a concurrent world, imperative is the wrong default !” – Tim Sweeney, Epic Games

So if imperative programming is such a bad fit for our modern concurrent world, what’s the alternative ?

In the 1950’s, Lisp had a functional approach where there was little or no shared state and instead data is passed as arguments, often in a list, in function calls i.e. on a stack rather than a shared heap. This seemingly innocuous difference leads to some major differences in program structure and language features. Some of these features such as closures and continuations have been adopted by some of the newer dynamic but essentially imperative languages like Ruby and have been attributed with increasing productivity and expressiveness. Lisp survives on today in the form of Common Lisp and Scheme and although not widely used outside academia, both are enjoying renewed interest in recent years. Scheme, in particular, has seen an explosion of interest from developers becoming disillusioned with the mainstream commercial imperative languages. Scheme has been used as a teaching language at some of the prestige technical universities for many years but is now finding its way into niche commercial and open source applications, one example being the GIMP image manipulation software.

Another branch of functional languages derived from the 1970’s language ML include OCAML, Erlang and Haskell which again have seen a big growth in interest in recent years. OCAML has gained popularity in military and investment banking applications. Microsoft is about to release its own functional language based on ML called F#.

Lisp suffered a big drop in popularity when the AI research of the 1970’s failed to deliver on earlier promises and at around the same time people were beginning to do great things with C. C, with its simpler requirements for compiler technology, was perceived as fast and efficient and its imperative style suited the single-threaded applications being developed at the time. By comparison, Lisp was perceived as slow and bloated and difficult to work with. Nowadays, in Scheme’s case, the core language has been simplified and streamlined and compiler technology has improved enormously and the best Scheme and OCAML compilers can produce single-threaded code that runs at similar speeds or even slightly faster than C code. Parallel code often runs many times faster because there is little need for synchronisation. Similarly, large tasks can be effortlessly broken into many pieces and executed in parallel on many cores or distributed over many machines – a task that is not easy to achieve with Java or C#.

The benefits of a functional approach are not just increased performance and the removal of concurrency errors though; one functional language feature is particularly suited to web applications: continuations.

The stateless nature of the web introduces its own problems. Business processes are inherently stateful and yet the HTTP protocol is inherently stateless. This means that web frameworks have to implement ‘sessions’ either in memory or a database and track them through cookie values or rewritten URL’s. Business processes then need to be broken down into various discrete steps, each one corresponding with a HTTP request / response pair, and the application then needs to retrieve the state from the previous steps, perform the current step and then persist the data ready for the next step. This extra complexity is necessary because of the discontinuous nature imposed by the protocol.

Ideally, an application would start the process and then ‘call out’ to a user to grab the data needed for the next step and then continue the process within the same procedure so all the local state is still in scope i.e. turning the protocol inside out. Continuations permit this approach.

It's amazing that this single higher abstraction can make such a tremendous difference in your Web programming, but it really does. If the Java language had native support for continuations, we'd all be coding this way.” – Bruce Tate, IBM

Most languages support continuations of some form, e.g. the humble ‘return’ or ‘break’ statement present in most imperative languages is an example of an ‘escaping continuation’ that can be used to prematurely exit from a block of code. First class continuations present in Scheme allow continuations to be used not only for escapes but also for arbitrary jumps (with state) and they are reusable i.e. they can be invoked many times. Some Scheme implementations even allow continuations to be serialised to disk or a database and then can be later invoked even though the process that they once existed in has long since gone. This is only possible because all the required state existed on the local stack at the time the continuation was captured and is not reliant on other unknown data existing in the process heap space – the essence of a functional style.

These two reasons alone show how any functional language that supports first class continuations would be a far better choice for web applications than our current imperative languages but there is another, rarely mentioned, problem particularly with Java that could be addressed: poor productivity.

It seems strange that we have gone from writing business applications in dedicated and productive 4GL’s like PowerBuilder, Oracle Forms, etc a few years ago to using a general purpose 3GL like Java. It is probably no coincidence that there is now a shortage of developers and development projects often overrun on time and cost. Yet nobody in the lucrative software development industry talks about lost productivity, they merely claim that projects are more complex now.

Scheme and Common Lisp have two other rare features that makes them compelling and could help recover the lost productivity that we have seen in the move to Java and distributed systems : Homoiconicity and Metalinguistic Abstraction.

Homoiconicity is a word coined to describe the fact that the language source code is itself a data structure within the language or in other words, there is uniformity between code and data. Code can become data and data can become code. This means that programs can not only manipulate data (as you’d expect) but they can also manipulate code.

This opens up the opportunity for metalinguistic abstraction or creating abstractions above the base language that then become part of the language i.e. syntactic extension. In simple terms, this means that macros can be used to extend the syntax of the language and create meta-languages. This allows you to readily create a domain specific language (DSL) just for web applications or any other purpose. In fact most Lisp programs end up being a DSL for the problem being solved. This approach magnifies the reuse potential of language features since anywhere where syntax is repeated, an abstraction can be created and reused without the fear of introducing new bugs. Most languages allow you to extend and reuse data types but Lisp based languages are some of the very few that allow you to do the same for syntax.

Metalinguistic abstraction is certainly not unknown in Java applications, most web frameworks rely on XML to create the abstractions that can’t be done in Java. In fact XML has a very similar structure to the nested lists of Lisp S-expressions. The disadvantages of using XML for abstraction is that it is not integrated with the language so you can’t mix XML tags with java code and, of course, your application becomes messy because the logic spans numerous Java classes and XML files. Being able to create those abstractions in the core language would certainly improve readability, reduce defects and improve productivity. Interestingly Ruby on Rails does away with XML files by using Ruby mixins to create abstractions – few would disagree that Rails is more productive than a comparable Java framework. Macros in Scheme or Lisp go further still and produce very concise web specific code particularly when combined with a continuations based web framework. And unlike Ruby this is compiled code running at native code speeds and can safely be highly parallel in nature.

Functional languages are slowly emerging from niche areas but given the utter domination of imperative languages and the enormous established commercial interests around them, business adoption is likely to be slow initially. Given the advantages offered though, early adopters could gain significant commercial advantages over their competitors albeit at the probable cost of some early pain. Functional programming is a paradigm shift and requires developers to almost learn programming again from basic principles.

Hybrid languages like Scala for the JVM have recently emerged that allow functional programming to be mixed with imperative and build upon the existing tools and libraries already available for the Java platform. Whilst Scala pushes JVM development forward, there is still a danger that developers will continue using an imperative style and miss the benefits being offered.

1 comment: