Introduction
Have you ever wondered how Policy Failover works in Opalis? Let’s take a look. For this test I am going to create a very simple workflow that sleeps for a minute and then sends out an email. I will simulate ‘killing’ the action server that the workflow is running on by forcing its service to stop, then see what happens to the policy.
So we see that the policy that was running on the stopped action server did indeed die and error out. The interesting thing is that the policy still shows as running (note the green arrow above). If we check the other action server, we see that a new PolicyModule has been spawned.
And look at that, I even got the email.
What does this mean?
So it would be all well and good to just say ‘great, policy failover works’, but what does this actually mean? The policies that fail do indeed come up on the failover action server; the problem is that they start over from the beginning. So, if we don’t design our workflows with this in mind, we could run into some unexpected logic paths. For example, think of an all-in-one policy that deploys virtual machines. If you had 20 builds going on at various steps in the process, they would all restart from the beginning! This means you could possibly get 10 additional virtual machines and have 10 virtual machines left in an intermediate, unfinished state. I have come up with some policy design ideas of my own for building policies that work correctly with the failover that Opalis does provide. See the Workflow Concepts series of posts for more details. The high-level overview post is available here: http://opalis.wordpress.com/2010/12/13/workflow-concepts-part-ii-%e2%80%93-segregation-of-work-high-level/.
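To make the restart problem concrete, here is a minimal sketch in plain Python. It is not Opalis code and it is not the design from the Workflow Concepts posts; every name in it (run_build, ActionServerStopped, the step names) is hypothetical. It only illustrates the general idea that a workflow which restarts from the beginning will repeat its side effects unless each step first checks some state kept outside the action server.

```python
# Illustrative only: plain Python, not Opalis. All names are hypothetical.

class ActionServerStopped(Exception):
    """Simulates the action server service being stopped mid-policy."""


STEPS = ["create_vm", "install_os", "send_email"]


def run_build(build_id, completed, log, fail_before=None):
    """Run every step from the beginning, skipping steps found in `completed`.

    `completed` stands in for state stored outside the action server
    (assumed here to be something durable, such as a database table).
    """
    for name in STEPS:
        if (build_id, name) in completed:
            continue  # already done before the failover, do not repeat it
        if name == fail_before:
            raise ActionServerStopped(name)
        log.append((build_id, name))     # the side effect, e.g. creating a VM
        completed.add((build_id, name))  # record that the step finished


def simulate(use_checkpoints):
    log, completed = [], set()
    try:
        # First attempt dies before the second step.
        run_build("build-1", completed, log, fail_before="install_os")
    except ActionServerStopped:
        pass
    if not use_checkpoints:
        completed = set()  # the restarted policy remembers nothing
    # Failover: the policy starts over from the beginning.
    run_build("build-1", completed, log)
    return log


naive = simulate(use_checkpoints=False)
safe = simulate(use_checkpoints=True)
print("create_vm runs without checkpoints:", naive.count(("build-1", "create_vm")))
print("create_vm runs with checkpoints:", safe.count(("build-1", "create_vm")))
```

Without the external record, the restarted run creates the virtual machine a second time; with it, the restart picks up at the first unfinished step. In a real workflow the side effect and the record would also need to be kept consistent with each other, which this sketch does not attempt.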