Wednesday, 15 July 2009

Spread betting with genetic programs

I met up with an old friend a couple of weeks ago who mentioned he was interested in financial spread betting and the possibilities of making some money as he knew of people that made a full time living from it. I must admit, I'd often thought that it must be possible to use software to make money from the markets but hadn't realized that spread betting offered a low cost (albeit high risk) way in.

It sounds rather obvious but you should be successful if you win more on your winning bets than you lose on your losing bets even though you might lose more often than you win. So the emphasis should be on profit rather than the number of times you win or lose. Reading books like 'The Naked Trader', it seems the biggest danger is your emotions forcing you to throw good money after bad when you are losing and making you cash in your winning bets too soon when they're in profit rather than letting them grow to reach their full potential.

Using software should remove the emotions providing the user follows the orders exactly and is not tempted to intervene. So I set about writing some software, in Scheme obviously, to generate a single market order for a bet on the FTSE100 index the next day based on the previous day's data. Here's an example of the daily movement :-



The FTSE is an ideal target for spread betting because it has a large daily movement range and the spreads offered by most companies are very tight (e.g. IG Index offers only 2 points) and you can have a stop that is only 30 points away which means that can very cheaply place bets.

The order might say Buy at 4280 with a stop of 30 points, no limit and a rate of £1 per point. If the market is lower than 4280 at any point during the day, your bet will be placed. For every point that the market moves up, you will gain a pound. If the market then moves downwards you will lose £1 for every point until it reaches the 30 point stop in which case you've lost your bet and are £30 poorer. If on the other hand the market had climbed 100 points you'd be almost £100 richer (the difference is the spreads taken by the spread betting firm). If the market never came down to 4280, then your bet would not be placed and you wouldn't gain or lose anything.

So the first task was to build a simulator and get hold of some historic 'intraday' data i.e. minute by minute movements in the price going back over three months. The simulator takes a list of orders and runs them in a particular day to calculate the profit or loss for each order. Wrapped around this is a harness that for each day in the set will call out to a strategy function asking for orders based on the previous days statistics. The daily and weekly profitability is then calculated for the strategy so you can decide whether it's a winner or not. You can then use the strategy to generate orders for the next day i.e. tomorrow, hopefully with confidence that it will win more than it will lose.

Using the simulator, you can try out strategies that you think might work - here's one that's not bad :-

;; Stats used by strategy function
(define-type stats
prev-open
prev-close
prev-high
prev-low
)

;; A very simple but surprisingly effective strategy
(define (simple-strategy stats)
(let ((gain (- (stats-prev-close stats) (stats-prev-open stats)))
(range (- (stats-prev-high stats) (stats-prev-low stats)))
(mid (/ (+ (stats-prev-high stats) (stats-prev-low stats)) 2)))
(cond
((> gain 10) (list (make-order - (+ mid 5) 30 0 1.0)))
(else (list (make-order + (+ mid 10) 30 0 1.0))))))

The simulator shows that this very simple strategy generates around 500 points profit over the sample three month period. (NB. past performance is no gaurantee of future performance !). Make-order takes the direction (+ for buy order or - for sell), the opening (order limit) value, stop distance, the limit distance and price per point. Note that the only input data the strategy uses is the previous day's opening, closing, high and low price. The strategy works on the basis that if the FTSE gains one day then it'll probably come down the next. Obviously this isn't a universal rule and will very often be wrong but when you're wrong you will lose at most 30 points and when you're right you could gain in excess of 100 points.

Using manual analysis of the charts and computing some optimum values, I created a variation on the one above that managed to average around 1000 points profit for the period but I had no way of knowing if it was optimal or not.

Genetic Programming
Genetic programming is a technique for evolving programs to solve a problem. Here our problem is to come up with a trading strategy by creating program fragments that breed and mutate over time and are evaluated against a 'fitness function' the least fit are discarded and the most fit are encouraged to breed to produce stronger offspring. In our case the fitness function is the simulator - the total profit over the three months data is all that matters. If you make a good profit you can stay and breed, if you don't you're out. Breeding (crossover) takes complete elements at random from both parents. Mutation randomly and subtly modifies one of these elements to produce something different. After many generations you get a very optimal solution to the problem you are trying to solve.

Scheme is the perfect language for genetic programming because :-
  1. You can modify code easily by manipulating lists,
  2. You can evaluate your new code with eval.
So to solve our problem we need a 'trader' :-

(define-type trader
dir-func
start-func
stop-func
limit-func
profit)


Which basically is a collection of expressions that are evaluated to produce the corresponding values required for the order. The profit member is used to track which of the traders are successful.

There are a population of traders that exist in a list that are initialised with random expressions :-

(define (pick-from lst)
(list-ref lst (random-integer (length lst))))

(define (get-random-var)
(pick-from '(o c h l g r m)))

(define (get-random-constant x)
(let ((r (- x (random-integer (* x 2)))))
(if (= r 0) (get-random-constant x) r))) ;; Don't want any divide by zero errors

(define (make-random-trader)
(make-trader (pick-from '(+ -)) (get-mutated-term (get-random-var)) (get-mutated-term (get-random-constant 100)) (get-mutated-term (get-random-constant 100)) 0))


(define (initialise-traders num)
(let ((traders '()))
(let nxt ((x 0))
(if (< x num)
(begin
(set! traders (cons (make-random-trader) traders))
(nxt (+ x 1))
)))
traders))
Initialise-traders creates the population of size num using the make-random-trader functions. The expressions are based on random constants and random variables which are o - yesterdays opening value, c - yesterdays closing value, h - yesterdays high value etc. A useful function pick-from basically returns a random element from a specified list. Update: Subsequent studies have shown that it is useful to seed the initial population with some known good specimens e.g. the simple strategy above.

Breeding and Mutation
The following function breeds two traders by selecting features from each and then adding mutation :-

(define (breed t1 t2)
;; Create child from parents
(let ((t (make-trader
(pick-from (list (trader-dir-func t1) (trader-dir-func t2)))
(pick-from (list (trader-start-func t1) (trader-start-func t2)))
(pick-from (list (trader-stop-func t1) (trader-stop-func t2)))
(pick-from (list (trader-limit-func t1) (trader-limit-func t2)))
0 )))

;; Add mutation
(let ((r (random-integer 5))) ;; what to modify
(cond
((= r 1) (trader-dir-func-set! t (mutate (trader-dir-func t))))
((= r 2) (trader-start-func-set! t (mutate (trader-start-func t))))
((= r 3) (trader-stop-func-set! t (mutate (trader-stop-func t))))
((= r 4) (trader-limit-func-set! t (mutate (trader-limit-func t))))))
t))


You can see that one (or none) of the trader functions is selected for mutation which is done with these functions :-

(define (mutate-list l)
(pick-from (list
(map (lambda (x)
(if (> (random-integer 10) 2) (mutate x) x))
l)
(simplify-list l))))

(define (get-mutated-term t)
(pick-from (list
(get-random-constant 100)
(get-random-var)
`(+ ,t ,(get-random-constant 100))
`(- ,t ,(get-random-var))
`(* ,t ,(get-random-constant 100))
`(/ ,t ,(get-random-var))
`(if (> ,t ,(get-random-constant 100)) ,t ,(get-random-var)))))

(define (mutate f)
(cond
((list? f) (mutate-list f))
((number? f) (pick-from (list (get-mutated-term f) (+ f (get-random-constant 10)))))
((or (eq? f 'o) (eq? f 'c) (eq? f 'h) (eq? f 'l) (eq? f 'g) (eq? f 'r) (eq? f 'm)) (get-mutated-term f)) ;; Variable
((or (eq? f '+) (eq? f '-)) (pick-from (list '- '+ `(if (> ,(get-random-var) ,(get-random-constant 100)) + -))))
((or (eq? f '*) (eq? f '/)) (pick-from '(+ - / *))) ;; Arithmetic operator
((or (eq? f '>) (eq? f '<) (eq? f '=)) (pick-from '(> < =))) ;; Logical operator
(else f))) ;; No mutation - don't know what to do

The key function here is mutate which takes an expression term and mutates it using get-mutated-term.

The rest of the code is concerned with maintaining the population pool, evaluating the profitability of the members and then discarding the least profitable and breeding the most profitable.

The Results
The simple static example above achieved a total profit of 509 points over the 3 months data, the genetically bred trader achieved 1469 ! My efforts to hand optimise the simple trader resulted in a profitability of around 1100 but the genetic model has even exceeded this by 30%.

Update: I expanded the data available to the traders to include a long and short moving average and also the gain on the Dow Jones Industrial Average and re-ran the process. The winning trader had a profitability of around 1700 points but surprisingly hadn't chosen to use the Dow variable.

I've now created a separate blog to document my efforts to take this study forward to fully automated forex trading, please visit http://www.thegenetictrader.com to check on my progress.

13 comments:

  1. Great post! Using GP for spread betting is something I've wanted to do for a while, but haven't done anything about.
    I did implement both tree-based and linear genetic programming systems in Java for an animat problem, but they kind of sucked at the problem domain (which was very non-deterministic and extremely slow (~20s) to evaluate each solution).

    The mention of seeding the pool with some decent handmade solutions seems wise, since it can take an awful long time to get out of a local starting optima if all the solutions start out rubbish.

    ReplyDelete
  2. This does take quite a while to run - 1024 generations of a population of 1024 takes nearly 3 hours to run.

    Thanks for the comment.

    ReplyDelete
  3. Are you using the same dataset for training and for evaluation of your algorithm? Wouldn't overfitting be a problem?

    ReplyDelete
  4. Disappointingly when I evaluated the algorithms against a different data set, the profitability was quite poor. I put this down to the training set being too restrictive i.e. not representing enough different market conditions.

    I'm now accumulating more data and concentrating more on forex rather than indices as they tend to be more technical in nature and therefore more predictable.

    I do think that overfitting has been a problem so I'm looking for ways to avoid it.

    Thanks

    ReplyDelete
  5. Your approach is really interesting and I would love to see it work.

    Maybe some cross-validation could help you get over the problem of over-fitting. For example you could train a "model" using data from week 1 and evaluate on week 2. Then use data of week 2 to train and evaluate on week 3 etc. In the end you have n trained models that are maybe overfitted to their specific week but are evaluated on a week that they haven't "seen" yet.

    A mixture of these models could be used to predict for all the data with less overfitting. Maybe using a voting scheme.

    The same Anonymous...

    ReplyDelete
  6. Thnaks for a very interesting post. Being a rubyist myself, I would like to implement some of your ideas in my favorite language. I would be interested in knowing where historical intraday data could be found, and would be obliged if you could give some pointers.

    ReplyDelete
  7. Hi, Ruby would be tricky I'm afraid because of the complex syntax.
    My programs work because Scheme makes it very easy to write programs that write programs.

    Without this you would have to write an evaluation environment i.e. a language on top of Ruby!
    Look for a ready built library for GA's failing that I'd give up on Ruby.

    For historic data try http://www.fin-rus.com/analysis/export/_eng_/default.asp

    ReplyDelete
  8. Andrew, thanks for this nice post.
    Just curious about your thoughts on Python? Have you tried it? It allows functional programming constructs so you should be able to do what Scheme allows?

    Thanks, JB

    ReplyDelete
  9. Hi JB,

    I've never tried Python but read a lot of good things about it. It seems to have a great community and a lot of libraries which is a great advantage over Scheme. I do love Scheme though and the prefix notation makes it ideal for genetic programming.

    Andrew

    ReplyDelete
  10. Andrew,

    What simulator do you use for your testing?

    Cheers

    ReplyDelete
  11. Excellent post and follow ups!
    I've been deploying Genetic Algorithms for a year or two myself (on various problems - essentially to study the nuts and blts of the processe - see Flickr and search for 'Rat in Maze') I am particularly intersted in developing fast techniques which can operate GA's 'by hand' (the mathematics of what goes on n the GA 'computation' is the same irrespective of the problem & is basically 'stupid' grinding away that can be radically simplified/shortcutted).
    The method you are deploying here is one of computer program 'slime' from which an optimal solution evolves - see Koza, who is leading the field in the techniques.

    Regarding computer languages, you can use any: use the 'Spool' command in Basic etc. to write to a file (I actually filled up the hard disc on my7 machine last year in about 30 mins with one program populations!!!).

    ReplyDelete
  12. Hi Andrew,

    Did you continue with this research?
    ... or did you become so rich that you don't have time to come to the blog?

    JT

    ReplyDelete
  13. Yes I became so fabulously rich that I'm now working full time in a proper job. I rewrote the whole code in C and spent many months developing a very complex parallel genetic programming library. I learnt a great deal about using the right fitness functions (clue: profit isn't it) and about using different validation sets and complexity constraints to avoid overfitting. I evaluated around 7 billion candidate functions for trading EURUSD on the spot markets and found around 50 very promising functions which I traded live for 3 months and made a profit. Unfortunately the Euro crises hit and the whole thing became erratic and started losing money so I pulled the plug. I think the experience validated the approach but it was complex and still very risky and it's much easier to make money by working for somebody else.

    ReplyDelete