Payment by results schemes – advice for the Government and advice for contractors

Advice for the Government
I heard on the radio today that the Government will establish a payment by results scheme for a service to reduce recidivism among offenders released after a short (less than one year) jail sentence. Currently, approximately 60 per cent1 of such offenders are re-incarcerated within one year – the so-called ‘revolving door’2. Contractors will help released offenders find their way in life, for example by arranging accommodation with housing associations and providing other sources of support. These contractors will be remunerated in proportion to their success in reducing reoffending. However, the scientific evidence that this will work is not strong3 and there are a number of potential challenges to implementing such a scheme4. These include the potential for gaming the system and ‘cherry-picking’ certain cases to maximise returns; the difficulty of measuring outcomes that cannot easily be defined or evaluated; the question of where the payments would come from, since not all savings from a reduction in crime would be available as money, and those that are would accrue to both the public and private sectors; and the limited scale of change possible, as most successful interventions have produced only small changes in outcomes4,5.

More important, from the point of view of remuneration, is that the extent to which the scheme could work – the effect size – is poorly calibrated. This is because too few head-to-head trials of different interventions to reduce recidivism have been conducted. This places the taxpayer at considerable risk of either under- or over-paying for the service. The corollary is that payment by results schemes should only be introduced where there is a good way of calibrating the cause and effect consequences of the service. I know of what I speak, since I chair the scientific advisory committee for the payment by results scheme for the multiple sclerosis drugs. The idea here is that the drug companies would repay some of the costs of the drugs if they underperform, or the Treasury would provide a retrospective enhanced payment if the drugs worked better than expected. The problem is that the effect of the drugs has only been properly calibrated after two years of use, whereas the scheme runs for ten years and is concerned with longer term outcomes. So, we have to try to work out whether the drugs are working better or worse than expected, not by means of a proper experiment (head-to-head trial), but by simply observing how well people do on the medicine and comparing this with a retrospective cohort of patients. This is a very tricky and uncertain business. This problem, of working out how effective interventions are, leads me to advice for contractors.

Advice for contractors
As a contractor, I would choose my ground very carefully. I would try to provide services in situations where there is likely to be a positive underlying trend, that is, where reoffending is already falling for reasons that have nothing to do with my service. In that case, the underlying trend would contribute to my ‘results’. With the wind behind me I would have a very good chance of making a sturdy profit.
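
To put rough numbers on this, here is a minimal sketch in Python. Only the 60 per cent baseline comes from the figures quoted above; the size of the underlying trend, the true effect of the service, the cohort size and the fee are all invented for illustration, and the payment rule (a flat fee per reoffence "avoided" against a historical baseline) is my assumption, not the Government's actual formula.

```python
def naive_vs_attributable_payment(baseline_rate, underlying_trend,
                                  true_effect, cohort_size, fee_per_case):
    """Compare what is paid under a naive before/after rule with what the
    contractor's service actually earned.

    All rates are proportions (0.60 means 60 per cent). The payment rule is
    a hypothetical one: a flat fee for each reoffence 'avoided' relative to
    the historical baseline rate.
    """
    # The observed rate falls for two reasons: the trend and the service.
    observed_rate = baseline_rate - underlying_trend - true_effect
    # A naive scheme credits the whole fall to the contractor.
    naive_avoided = (baseline_rate - observed_rate) * cohort_size
    # Only this part is really due to the service.
    attributable_avoided = true_effect * cohort_size
    return naive_avoided * fee_per_case, attributable_avoided * fee_per_case


# Hypothetical inputs: 3-point underlying fall, 2-point true effect,
# 1,000 offenders, fee of 1,000 per reoffence avoided.
paid, earned = naive_vs_attributable_payment(0.60, 0.03, 0.02, 1000, 1000.0)
print(f"Paid under the naive rule: {paid:,.0f}")
print(f"Attributable to the service: {earned:,.0f}")
```

With these made-up inputs, more than half of the payment rewards the trend and not the service, which is exactly the wind a shrewd contractor would want behind them.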

A third way
There is of course an alternative proposition. This would be to bring in the payment by results scheme as part of a prospective, carefully designed study. For example, the intervention (payment by results) could be rolled out sequentially across different parts of the country, with the order determined at random – a so-called cluster stepped wedge design6. Such a study, if large enough, could be used not only to tell if the general idea of payment by results works, but also to determine which type of scheme is most effective. In other words, it would be possible to get a handle on which types of service provide the best outcomes. The Cabinet Office has advocated such experimental approaches to public policy7. I strongly urge the Government to look at its own excellent plan of making policy on the basis of empirical evidence.
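
For readers unfamiliar with the stepped wedge idea, the roll-out schedule can be sketched in a few lines of Python. This is only an illustration of the randomisation step: the region names, the number of steps and the function name are hypothetical, and a real trial would also need a sample size calculation and an analysis plan that allows for time trends.

```python
import random


def stepped_wedge_schedule(clusters, n_steps, seed=None):
    """Randomly assign each cluster to the step at which it starts the
    intervention.

    Returns a dict mapping cluster -> list of 0/1 flags, one per period.
    Period 0 is a baseline in which no cluster has the intervention; by the
    final period every cluster has it.
    """
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)  # the roll-out order is determined at random
    schedule = {}
    for position, cluster in enumerate(order):
        # Spread clusters as evenly as possible over steps 1..n_steps.
        start = 1 + position * n_steps // len(order)
        schedule[cluster] = [1 if period >= start else 0
                             for period in range(n_steps + 1)]
    return schedule


# Hypothetical example: six regions crossing over in three steps.
regions = ["Region A", "Region B", "Region C",
           "Region D", "Region E", "Region F"]
schedule = stepped_wedge_schedule(regions, n_steps=3, seed=2013)
for region in regions:
    print(region, schedule[region])
```

Each row switches from 0 to 1 once and never back, so every region eventually receives the scheme, while at any given time the regions still waiting act as controls for those that have already crossed over.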

1. Ministry of Justice. Table 19a: Adult proven re-offending data, by custodial sentence length, 2000, 2002 to March 2011 in Early estimates of proven re-offending: results from April 2011 to March 2012. 2012. Available from: http://www.justice.gov.uk/downloads/statistics/reoffending/proven-reoffending-apr10-mar11-tables.xls (accessed 9 May 2013).

2. Cuthbertson P. The failure of revolving door community sentencing. Centre for Crime Prevention. 2013. Available from: https://docs.google.com/file/d/0B25IaOtJKlvwYjkxVENsbi1TbTg/edit?usp=sharing (accessed 9 May 2013).

3. Nicholson C. Rehabilitation Works: Ensuring payment by results cuts reoffending. London. Centre Forum; 2011.

4. Fox C, Albertson K. Is payment by results the most efficient way to address the challenges faced by the criminal justice sector? Probation Journal. 2012; 59(4):355-73.

5. Fox C, Albertson K. Payment by results and social impact bonds in the criminal justice sector: New challenges for the concept of evidence-based policy? Criminology & Criminal Justice. 2011;11:395.

6. Brown CA, Lilford RJ. The stepped wedge trial design: a systematic review. BMC Medical Research Methodology. 2006;6:54.

7. Haynes L, Service O, Goldacre B, Torgerson D. Test, Learn, Adapt: Developing Public Policy with Randomised Controlled Trials. Cabinet Office. Available from https://www.gov.uk/government/publications/test-learn-adapt-developing-public-policy-with-randomised-controlled-trials (accessed 9 May 2013).


Using individual surgeons’ outcome data for quality control purposes

I have to start this blog with an anti-blog. Some of my “immense following” thought I was having a dig at qualitative researchers in general in my previous blog. On re-reading my blog, I can see how it could have been interpreted in this way, so let me hurry to reassure qualitative researchers that I did not mean to impugn the method; on the contrary, I’m an acolyte of qualitative research. My target was phenomenology and constructivism, and of course it is the ideas, not the people, that I wish to attack. Incidentally, Kuper said in her BMJ article on qualitative research that most qualitative researchers are constructivists.1 She produced no evidence to back up this assertion, which I think and hope is wrong.

But today’s blog is about the Government’s announcement that it will publish the results of individual surgeons. As always with data, making them available is one thing; what people do with them is another. As a libertarian I can find no grounds for suppressing the availability of data. But what are we to make of comparisons of the performance of different surgeons?

First of all, a lot will depend on the outcome in question. Below is a graph from my published work which examines standardised mortality ratios (SMRs) as a diagnostic test for preventable mortality.2 Even if risk adjustment can explain a massive 80 per cent of the variance between hospitals, you can see that SMRs are a rubbish diagnostic test; they are neither sensitive nor specific at hospital level because less than 20 per cent of mortality is preventable. False positives are not neutral, and no one who has been properly brought up would ever use a test this bad, not even for screening.


But hang about, these are hospital mortality rates, not the mortality rates of individual surgeons. Overall hospital mortality represents the interaction of tens or hundreds of different variables – different doctors, different nurses, different pharmacists. Surgical mortality is likely to be much more dependent on the individual surgeon, especially when it is a technically demanding operation, such as the management of leaking aneurysms and removal of inaccessible giblets, such as pancreas or oesophagus. However, we still need to proceed with great caution for the following reasons:

  1. Surgeons attract different case loads and those with a black belt are often sent the most tricky cases. And remember, case mix adjustment is an imperfect art. The technique can sometimes even exaggerate the very bias that it is designed to counteract.3
  2. The number of operations of a given type that a surgeon carries out can be rather small, yielding wide confidence limits. The data should be entered on a funnel plot, since studies by David Spiegelhalter and others show that this is better than other methods at helping people to understand natural variability.4
  3. As practice improves, so league tables will vitiate their own success and become less useful diagnostically. This is because the greater the variance between surgeons, the greater the signal in the noise and the steeper the slope of the curve in the above diagram.2 As surgeons with the worst rates improve or desist, so variance between them decreases and the information content of SMRs declines. The graph in the figure assumes that the coefficient of variation of the preventability rate is twice that of the outcome overall.
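
The point about small numbers and funnel plots can be illustrated with a short Python sketch. It uses a normal approximation to the binomial for the control limits, which is crude for small case loads (exact limits would be preferable in practice), and it ignores case mix altogether. The surgeons, their figures and the 5 per cent target mortality are invented; the 99.8 per cent limit corresponds roughly to the outer, three standard deviation line commonly drawn on funnel plots.

```python
from math import sqrt
from statistics import NormalDist


def funnel_limits(target_rate, n_cases, coverage=0.95):
    """Approximate funnel plot limits around a target mortality rate for a
    surgeon with n_cases operations (normal approximation to the binomial)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    half_width = z * sqrt(target_rate * (1 - target_rate) / n_cases)
    lower = max(0.0, target_rate - half_width)
    upper = min(1.0, target_rate + half_width)
    return lower, upper


def flag_outliers(surgeons, target_rate, coverage=0.998):
    """Return the names of surgeons whose observed mortality lies above the
    upper limit for their own case load."""
    flagged = []
    for name, deaths, cases in surgeons:
        _, upper = funnel_limits(target_rate, cases, coverage)
        if deaths / cases > upper:
            flagged.append(name)
    return flagged


# Hypothetical data: (name, deaths, operations). Both surgeons have the
# same crude mortality of 15 per cent, against a target of 5 per cent.
surgeons = [("Surgeon A", 3, 20), ("Surgeon B", 60, 400)]
for name, deaths, cases in surgeons:
    lower, upper = funnel_limits(0.05, cases, coverage=0.998)
    print(f"{name}: observed {deaths / cases:.1%}, "
          f"limits {lower:.1%} to {upper:.1%}")
print("Outside the 99.8% limits:", flag_outliers(surgeons, 0.05))
```

Both invented surgeons have the same crude mortality, yet only the one with 400 operations falls outside the limits; with 20 operations the funnel is so wide that a rate three times the target is still compatible with chance.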

Of course surgeons can be compared with respect to outcomes other than mortality; revision rates for hip replacement for instance. However, the above caveats, especially the one about the best surgeons attracting the most difficult cases, still apply.

As an aside, it is sometimes said that using individual surgeon outcomes in performance management does not result in cherry picking on the grounds that league tables in cardiac surgery have not resulted in selection of progressively lower risk cases over time. However, a moment’s thought shows that this is a logical fallacy because there is no counter-factual. The natural method of medical advancement is to progressively stretch indications as knowledge and experience accrue.

I do, however, understand the concern generated by high profile cases where internal hospital procedures have not spotted incompetent surgeons. My prescription would be “to investigate the investigators.” I would make it the medical director’s job to look at the figures and to probe the explanations. The medical director can triangulate the figures with other data. If a surgeon is an outlier and anaesthetists and theatre sisters corroborate technical incompetence, then that is one thing. However, if the outlier turns out to be an acknowledged virtuoso surgeon, who attracts the most difficult cases for that reason, then that is another thing altogether. The Care Quality Commission should not jump on the individual surgeon in a draconian fashion but check that the medical director is doing his or her investigative job. Indeed, this was precisely the route followed by my colleagues and me with respect to an outlier in a surgical trial, and I commended it to the Government as a suitable model for routine practice as well.5

01. Kuper A, Reeves S, Levinson W. An introduction to reading and appraising qualitative research. BMJ. 2008;337:404-7.

02. Girling AJ, Hofer TP, Wu J, Chilton PJ, Nicholl JP, Mohammed MA, Lilford RJ. Case-mix adjusted hospital mortality is a poor proxy for preventable mortality: a modelling study. BMJ Qual Saf. 2012. Available from: http://qualitysafety.bmj.com/content/early/2012/10/12/bmjqs-2012-001202.full

03. Mohammed MA, Deeks JJ, Girling A, Rudge G, Carmalt M, Stevens AJ, Lilford RJ. Evidence of methodological bias in hospital standardised mortality ratios: retrospective database study of English hospitals. BMJ 2009. Available from: http://www.bmj.com/content/338/bmj.b780

04. Spiegelhalter D. Funnel plots for comparing institutional performance. Stat Med 2005. Available from http://www.ncbi.nlm.nih.gov/pubmed/15568194.

05. Mason S, Nicholl J, Lilford R. What to do about poor clinical performance in clinical trials. BMJ 2002. Available from http://www.bmj.com/content/324/7334/419.