Santi Ruiz
Hi, I'm Santi Ruiz and this is Statecraft, an interview series about how policymakers get things done. There are a lot of forces in policymaking, and in our lives generally, that push us towards the short term: quarterly returns, the President's daily briefing, two-year election cycles. It's very hard for most politicians and most policymakers to break outside of that cycle. The pressures to get results, and to show results on a tight turnaround, in more and more domains are incredible. One of my interests on Statecraft for a while has been: how do you build a machine to get long term results? Whether it's a new agency or a new initiative, or just somebody thinking long term in their own individual career, how do you set up a structure so that you can actually work with a goal that's 10 years away, or 20 years away, or 50 years away, and protect that thing from the short term political pressures that almost everybody's under? Today's interviewee is Sir Rory Collins. He spent two decades building and leading one of the most important scientific resources in the world, the UK Biobank. The Biobank represents a fascinating case study in long term thinking. It's a database of half a million British participants whose health is being tracked longitudinally, that is, for the next 30 years. The Biobank was established in the knowledge that the upfront work and the spending required would only start to pay off 10 or 15 years later. When Sir Rory went in for the 10 year review with funders, they asked what had been achieved so far. Sir Rory said, nothing yet. It has since democratized access to population scale data for researchers worldwide, and it's already yielding amazing insights into the causes of and cures for disease. Sir Rory had this vision in the early 2000s of an institution that would pay off way past the political sell-by date. I wanted to understand how he built the UK Biobank and, just as importantly, how he managed to sustain it over a long period of time.
As a reminder, you can find the transcript for this conversation and for many others at www.statecraft.pub. That's statecraft.pub. If you like this episode, or even if you don't like it, leave us a review wherever you're listening to this podcast. One final note: I refer to Sir Rory Collins several times as Sir Collins, but I've been informed that as a knight, he is actually Sir Rory, whereas if he were a lord, he would be Lord Collins. I regret the error. Without further ado, Sir Rory Collins, welcome to the show.
Sir Rory Collins
Thank you very much for having me on.
Santi Ruiz
I'm very excited to have this conversation. I wanted to start with just some basics for those of us, especially on the American side, who may not be familiar with the biobank. And I was wondering if you could put some numbers on it for me. So there's 500,000 people in the biobank. What has happened to them?
Sir Rory Collins
So UK Biobank was set up at the beginning of the century. It initially came out of an idea from the Wellcome Trust charity, which funds a lot of medical research, and the Medical Research Council, our UK equivalent, essentially, of the National Institutes of Health for government funding. Really the concept, which I think came from a lot of epidemiologists, geneticists, health researchers, was to set up a large, what we call a prospective cohort. So we recruited half a million men and women who were aged 40 to 69 between 2006 and 2010. They're from all across the United Kingdom, from different socioeconomic statuses, urban and rural: individuals who came along, answered lots of questions about their lifestyle, their environment, their family and their own medical history, agreed to provide us with biological samples and to have physical measurements made, and, importantly, agreed to let us follow their health, which we've been doing over the last 20 years, largely through linkage to the National Health Service health record systems, but also by going back to participants themselves and asking them about different aspects of their health.
Santi Ruiz
And talk to me about the scientific value of the biobank. How many studies, what have we learned so far? What are we hoping to learn in the future?
Sir Rory Collins
The beauty of studying people in middle age, when they're relatively healthy, is you can find out about their exposures, not just their genetic makeup, but also the way in which they're living. You can measure other things in their blood, all of the things that predate the development of disease. Because obviously getting a disease can change your lifestyle, it can change other measures in your blood. The big advantage of having studied them then, and then following their health, is we can understand what are the causal determinants of disease. So not only what are the genetic determinants, but also the lifestyle, the environment, the proteins in their blood, the metabolites in their blood that subsequently lead to them developing some particular condition. And the important thing that the Medical Research Council and the Wellcome Trust required of us when we set it up was to make these data available to researchers around the world for any kind of health related research in the public interest. And that's the consent that these half million altruistic participants agreed to. So the data are being used by scientists everywhere to try to understand why it is that one person gets a particular disease and another doesn't. Is it their genetics, is it their lifestyle? And what are the pathways through which genes, environment and lifestyle might lead to getting a disease? So looking at proteins and metabolites. There are thousands of researchers around the world working on the data. Last year alone there were 5,000 peer reviewed publications based on UK Biobank. So it's unprecedented in terms of the scale of discovery and the range of discoveries that are emerging from UK Biobank, all down to the altruism of those half million participants. It's an interesting question as to exactly why there was a decision to set up UK Biobank when it was made. Now it looks prescient. At the time it was really a kind of risky decision.
So if you're interested in genetics, the great thing about genetics is your genes don't change. So in some respects the most efficient way of studying the genetic determinants of disease is to take people with a particular disease and compare them with those who don't have it, because the disease itself won't have changed your genetics. And therefore you can look to see what are the differences in the genetics of someone who has a disease as compared to those who don't. And that's of course how we've learned so much about genetic mutations that are really strongly causal of disease. From an epidemiological perspective, of course, it's much harder, because your lifestyle, your environment, is influenced by disease. So you want to study people when they're healthy, find out how they're living when they're healthy, and then follow them for a long period of time to understand what it was that led to the disease. So for example, you study people's smoking habits and then follow them, and you find that smoking causes lung cancer, many other cancers, cardiovascular disease, respiratory disease, et cetera. Of course, one of the best ways to get people to stop smoking is for them to have a heart attack. Then they stop smoking, so they change their environmental risk factor. And there was a lot of controversy at the beginning of the century as to whether a study like UK Biobank was the right thing to do if you were interested in genetics alone. But we're not. We're interested in how genes and environment and lifestyle interact. And for that you need to set up a large prospective study: large because you need enough people to develop any particular condition in order to have the statistical power to determine what are the causal factors, and prospective, with long term follow up, so that you are actually studying the causal associations. You can exclude events that occur quite soon, where they may have influenced your non-genetic risk factors.
It required a vision of the importance not just of genetics, but of all of the different drivers of disease, for the Medical Research Council and the Wellcome Trust to decide they were going to set this up, with a recognition that for the first 10 or even 15 years it was unlikely that it would actually produce anything of material value in terms of discoveries. Now, of course, it's a gold mine.
Santi Ruiz
What made you think this was even a feasible project to undertake?
Sir Rory Collins
Well, there were a number of epidemiologists who had been involved in setting up large prospective studies. The kind of classic in Britain was the so-called British Doctors Study, which studied doctors and their smoking habits and has followed them for a century. And so the ability to recruit large numbers of people was certainly there. But when one looked at the studies that had been done, they had largely looked at risk factors that were old risk factors, so blood pressure, body mass index, so obesity, cholesterol levels, where cholesterol had been measured and the samples hadn't been stored. A number of people around the world had seen the value in establishing a large prospective study with biological samples stored, and then just waiting until it was possible to assay those samples at that very large scale, hundreds of thousands of individuals, with markers that hadn't been studied so far. So we knew how to recruit large numbers of people. In the UK we had been running really quite large randomized trials, working with the National Health Service to recruit directly into studies. So it seemed to us that one could take that approach of working with the National Health Service to invite people to join the study and then, when they agreed to take part, to really look at how one could build a kind of factory production line where they would go through answering lots of questions. And we used touchscreen technology, which was relatively novel at that time, to get lots of questions answered, and had people moving through this production line of answering questions, having physical measurements, collecting biological samples. If you invited enough people and you had enough funding, then you really could recruit very large numbers, which is what we did.
Santi Ruiz
When you say that epidemiologists had put together very large prospective trials in the past, what's very large in that context?
Sir Rory Collins
Well, what became clear was that there had been a number of observational studies, so long term follow up studies of things like blood pressure, blood cholesterol, body mass index, with tens of thousands, a hundred thousand people. Perhaps the biggest was the so-called MRFIT screening study in the US, of a third of a million American men, that showed this very lovely association between cholesterol level and the risk of coronary artery disease. That showed that throughout the range, higher cholesterol was associated with higher risk, and there really was no normal level within the western range whereby lower cholesterol was not likely to be cardioprotective. And so these very large prospective studies, either individually or collectively through a meta-analysis, a kind of combination of the results from different studies, showed that if you had large numbers of individuals, hundreds of thousands of people in prospective studies, either individual ones or combined, then you could get really very clear signals about the strength of the relationship between certain risk factors, the risk factors that had been measured, and disease. And so that really generated the hypothesis that if we could create studies of hundreds of thousands of people, then we would be able to study many more risk factors if we stored the samples and then just waited until we could analyze them at scale.
Santi Ruiz
As you and others and other advocates for UK Biobank began to shop the idea around and build support, did you run into trouble with the long term nature of the project, that all of the costs are upfront and all of the benefits are out at least a decade?
Sir Rory Collins
I know that a number of individual researchers put in grant applications which were not supported. The International Agency for Research on Cancer, IARC, in Lyon set up a study of the dietary determinants of cancer where they collected biological samples, but it was really a study of studies. So in each country they did a study and then IARC in Lyon brought those together. They're all slightly different, but it did produce a large scale prospective study. But these were seen as studies that were of value for non-genetic risk factors. And when UK Biobank was being proposed by the Medical Research Council and the Wellcome Trust, I think that one of the drivers for it was the opportunity to study the genetic determinants of disease. And quite rightly people said, well, if you're interested in genetic determinants of disease, you can get quick answers relatively cost effectively by just studying people with disease and people without. And I think the messaging didn't really get across clearly enough that one wasn't interested solely in genetics, that one was interested in all of the different risk factors: genetic, environmental, lifestyle. And the beauty of having the genetics along with the environment and lifestyle is that one can actually unpick the causal aspects of environmental and lifestyle risk factors. This is so-called Mendelian randomization, where you use genetic changes in, say, body mass index to determine whether body mass index is actually causally related to disease. And so having all of these things in one study is incredibly powerful, not only for determining the strength of an association, but for determining its causal nature.
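The logic of Mendelian randomization described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not UK Biobank's analysis method: the variant frequency, the effect sizes, and the function names (`simulate`, `naive_slope`, `wald_ratio`) are all hypothetical. A genetic variant nudges an exposure such as body mass index, an unmeasured confounder affects both exposure and outcome, and the simple ratio estimate that uses the variant as an instrument is compared with a naive regression slope.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n=100_000, true_effect=0.3, seed=1):
    """Simulate genotype, exposure and outcome (all parameters hypothetical)."""
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        # Number of copies (0, 1 or 2) of a variant with allele frequency 0.3.
        g = (rng.random() < 0.3) + (rng.random() < 0.3)
        u = rng.gauss(0, 1)  # unmeasured confounder, e.g. some lifestyle factor
        x = 0.5 * g + u + rng.gauss(0, 1)  # exposure, e.g. body mass index
        y = true_effect * x + 2.0 * u + rng.gauss(0, 1)  # outcome
        genotype.append(g)
        exposure.append(x)
        outcome.append(y)
    return genotype, exposure, outcome


def naive_slope(exposure, outcome):
    """Ordinary regression slope of outcome on exposure (confounded)."""
    return covariance(exposure, outcome) / covariance(exposure, exposure)


def wald_ratio(genotype, exposure, outcome):
    """Ratio estimate: gene-outcome association over gene-exposure association."""
    return covariance(genotype, outcome) / covariance(genotype, exposure)


genotype, exposure, outcome = simulate()
print("naive slope:", round(naive_slope(exposure, outcome), 2))
print("ratio estimate:", round(wald_ratio(genotype, exposure, outcome), 2))
```

In a simulation set up this way, the confounder pulls the naive slope well above the true effect of 0.3, while the ratio estimate lands close to it, because the variant is allocated independently of lifestyle and environment.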
Santi Ruiz
Will you tell us a little bit more about those initial years in the early 2000s, when you were building support politically and institutionally for it? Were you going to Whitehall and convincing MPs? What was the day to day as you tried to put together a coalition for it?
Sir Rory Collins
No, I wasn't seriously involved in it until really around 2005. The decision was made by the Medical Research Council and the Wellcome Trust that they wanted to fund a large prospective study. So it really came from within those organizations. It's quite difficult to understand exactly why those decisions were made. I think partly they were a result of epidemiologists having argued the case for the value of having large prospective studies that could study newer risk factors, and the value of having much bigger studies than we'd had before. I think part of it was the excitement about the ability to study genetics at scale. And these kind of came together, in my view, to help the funders decide that they wanted to set up a large prospective study that would be able to look at the genetic determinants of risk, but also to look at those in the context of lifestyle and environment. And so they made an internal decision to fund this study. They set up a small group led by Tom Meade, an epidemiologist, to get input from scientists around the UK and beyond on what that study should be in terms of its scale. Looking at power calculations to decide what sort of differences in risk could be detected with studies of 100,000, 200,000, it came up with half a million individuals. The age range was selected on the basis of young enough that you would be studying people before disease was changing the risk factors, but old enough that the study would start to produce results within 10 or 15 years. So there was always an expectation that this was a long term commitment. I think only organizations, institutional organizations like the Medical Research Council and the Wellcome Trust, could make a decision whereby the first 10 years are really just waiting. I don't think they, or indeed any of us, thought it would become as productive as it has been. And the reason it has, I think, is because of access.
One of the things that the Wellcome Trust and the MRC decided at the very beginning was that this was not a resource that was being produced by researchers for themselves alone, that this was going to be a resource that was going to be made available as widely as possible in order that that investment, both the financial investment and the altruistic investment of the participants, actually generated as much knowledge about how to prevent and treat disease as possible. And that, I think, has been the transformative aspect of UK Biobank, the kind of model of accessibility.
Santi Ruiz
When you say that you and others didn't expect that long term value, why not? And what changed that made you update?
Sir Rory Collins
There was quite a lot of uncertainty about the quality of findings that would emerge if you just made data accessible to researchers around the world. There was also uncertainty about what kind of capacity there would be to turn the samples into data. To give you a sense of that, when it was decided to move forward, it was recognized that there would be half a million samples. The idea was to sub-aliquot those samples, that is, collect them into multiple tubes and sub-aliquot them, in order that you could retrieve particular samples from particular individuals. It was decided to build an automated archive, which was again state of the art and had never been done, to keep the samples at minus 80 but be able to track them and retrieve them when you wanted to. And that archive was built with a single robot, because it was expected that one would pull out samples from, say, 10,000 women who developed breast cancer after 10 years and 10,000 controls, to compare the people who do and the people who don't develop a particular cancer. That's the so-called case control approach, or a case cohort approach, where you might have the same controls for a number of different diseases. If what you're interested in is studying a particular disease, that again is very cost effective, because instead of analyzing all half a million people, you measure the people with the disease you're interested in and some matched controls. But after we had recruited all the participants and we were discussing this with our scientific advisory board, they said, well, if you're trying to build a resource for researchers around the world to use for all kinds of different research, then a strategy where you assay the samples as if you're doing it for a single researcher isn't actually the best approach.
So the thing to do is to wait until you can actually assay all of the samples for some particular type of assay, so genotype the whole cohort. Until you can do that, leave the samples alone, and again, wait until people have developed disease. During the first 10 years, when they're relatively healthy, there is no value in assaying the samples. So I think one of the things that we kind of really had as a watchword was to defer: don't do anything now that you could do better later. Rather than doing case control, better to wait until you can genotype the whole cohort. And that then turned a study of a particular disease into a resource for researchers to do all kinds of different research.
Santi Ruiz
How did you manage to build an institution that has defer as a watchword? Because I think many listeners will just be, I think, surprised and impressed at an institution that was set up with the ability to pass the marshmallow test, not to try and eat dessert first. I'm curious, are there things operationally that were built into UK Biobank that helped make it easier to make those decisions, to wait and defer and take more of the value down the road?
Sir Rory Collins
They say that the definition of an intelligent man or woman is one who can hold two opposing views at the same time, don't they? In a way that was the situation, I think, for all of us. We all want quick wins, but on the other hand, we know there are no quick wins in a prospective study. It was really kind of just reinforcing the messaging. I mean, I remember the 10 year renewal application. Every five years the funders say, well, what have you done? And of course the first 10 years we were building, we were recruiting, and then we were starting to turn samples into data. We were linking the people into their health record systems. But it was a very slow buildup of use of the resource. At the 10 year review, I was asked, you know, what had UK Biobank done for which it was set up? It was a one word answer: nothing. There was a kind of quiet pause at that point. But that was the truth. You know, it was a resource that was in gestation. I went on to say that it will be the next 5, 10, 15, 20 years where it really starts to deliver on the investment that has been made. The thing that really, I think, transformed UK Biobank, that made it move from being a UK study to being an international study, was when the UK government had some funding available and they offered that funding to UK Biobank to get the genotyping done. Affymetrix in California won the tender to do the genotyping. We were able to tell researchers that there was genotyping data available on half a million participants, and that's where we were moving to a scale that had never been done before. Researchers had been combining studies of a few hundred or a few thousand people to try to understand what were the genetic determinants of, say, blood pressure or height or cardiovascular disease. And here now was genotyping data on half a million people, and that was really what put Biobank on the researchers' map.
And we started to see an increase in use of the resource internationally, particularly moving from the UK, where of course a lot of researchers had been involved in designing and building it, to an increase in North America and in mainland Europe.
Santi Ruiz
As the Biobank got off the ground, especially in the second half of the 2000s, there was a lot of public scrutiny, a lot of public interest, as half a million people were recruited, obviously. Did that public scrutiny or parliamentary scrutiny lead to significant changes in the operations of the Biobank? How much did you adjust in response to feedback?
Sir Rory Collins
The thing that the Wellcome Trust and the MRC did before anything happened was really widespread consultation, consultation with the research community about what were the things that people wanted to study. So a lot of different working groups were set up to determine what questions or measurements one would make to be able to understand diet or activity. There were a whole set of pilot studies done by researchers to look at how you collect samples and assay those samples in the future. So, you know, how would you collect and process a sample so that when the technology caught up and you could do that at the scale of half a million people, your sample would be the right kind of sample. So a whole set of different groups piloted how to collect and process samples and, as I say, what questions to ask, what measurements to make. There was a lot of consultation with patient groups around the issues of things like feedback and no feedback. So in UK Biobank there is a policy of not feeding back information to participants. Now, that was something that went through a lot of iterations, because you imagine, well, you want me to join a study, you're going to make all these measurements, why aren't you going to tell me what those measurements are? Then you go, well, we don't know what these measurements mean. Do you still want us to tell you what they are, when they may or may not be relevant to your disease? They may be misleading, for example. So even things like single gene disorders that are identified in a population may have quite different relevance in a free living population. They may be much less predictive of disease. And we don't know these things. And so the whole purpose of UK Biobank is to learn. So the issue with feedback is it can actually cause harm, it can mislead. The feedback policy was discussed in great depth, and I think there are different positions on this.
The one thing you can say is that if you don't have feedback, you can guarantee you won't cause someone harm. But it is one where there was a lot of ethical, legal and participant engagement before that happened. So there had been very widespread consultation in that respect. I think one area where there was controversy was that setting up a study like this is very expensive. And so researchers were saying, well, that's going to take away funding that could be used for my research. Now, that is a perfectly reasonable concern. I think actually what happened was that Wellcome and the Medical Research Council put additional funding in to set it up. So I don't know that it really had that effect. The other controversy was around, is this what you would do if you were interested in genetics alone? And that was allowed to rumble on for longer than perhaps it should have done. It took a while to kind of address that. And I remember when I was asked to take on UK Biobank in 2005, the first thing I did was go and talk to the geneticists who had been saying, well, this isn't what you do if you want to set up a genetic study. I said, well, it really isn't a pure genetic study. This is an epidemiological study where we need to study people when they're healthy and we need to follow them up long term in order to look at the association between genes, environment and lifestyle. And they kind of went, oh, well, why didn't someone say that? The same with some of the people in government. The chair of the Health Select Committee had also raised concerns about this and again, I went and talked with him. So a lot of it was about addressing misunderstandings of what UK Biobank was about.
Santi Ruiz
To go back to the point about communicating to participants, these altruistic volunteers. Imagine I was someone who was coming into the NHS in 2009 and signing up for my data to be tracked anonymously in the UK Biobank. What information would I be told about my measurements and what would I not be told?
Sir Rory Collins
When we wrote out through the NHS to people who were living within about 50 miles of where we had recruitment centers, they were invited to participate. They were sent an information leaflet telling them what the study was about and that there wouldn't be feedback of any of the subsequent analyses of the samples. They were told that if they decided to come to the assessment center and anything was noticed during the visit, any kind of issue that was an incidental finding, they would be told about it, and that things like their smoking history and their body mass index would be recorded. They got about half a dozen pieces of information given to them at the end of the visit. That told them about the heel ultrasound density test that was done during the visit, which is a rather crude measure. If you have a very low density, it can be suggestive of osteoporosis. They were told their body mass index and where they were on the scale in terms of overweight, but that was about it. But it was very, very clear in the consent that this was not a health check and that they would not be getting feedback of their subsequent results, although we would be keeping them informed about what UK Biobank was doing in terms of the use of the data and research findings that emerged, and how they were contributing to helping to understand how to prevent and treat disease. We also had their agreement to get access to all of their medical and health related records and to get back in touch with them to ask for more information, provided that did not constitute feedback itself. So we wouldn't go back to somebody and say, you know, you have a particular genetic abnormality and we want to get more information from you.
Santi Ruiz
Despite that strong limitation on information sharing, it's my understanding you hit 500,000 participants earlier than expected, that there was more interest.
Sir Rory Collins
We set up a machine that worked. The IT team that set up the systems for running these assessment centers did a really good job. We were essentially, if you like, running an airline. We had 100 seats per day in each of the half dozen centres we had open. We ran them seven days a week. If we only got 90 in a day, we would run out of money. If we went to 110 a day, the system crashed. People ended up having to kind of hang around waiting and they were unhappy. The systems were put in place to monitor the process of inviting people, filling up the appointment slots, not overfilling them, monitoring to see whether there were any delays occurring in any of the centers on any day, as well as monitoring the quality of the data that was being collected. They did a very good job in managing that. So we were able to recruit on time, a bit ahead of time, and most importantly on budget.
Santi Ruiz
I'm curious. The Biobank was established as an independent charitable company. Of all the forms that it could have taken, was that ever in question? Were there ever other shapes proposed for the UK Biobank institutionally?
Sir Rory Collins
Not really. They didn't want it to be the study of a particular institution, so they didn't want it to be, you know, I don't know, an Oxford University or a Manchester University study. The MRC and the Wellcome Trust were keen for it to be seen as being a UK institution with no researcher ownership of it. Again, to stress this accessibility. And when I was made the principal investigator and chief executive, I kind of made the point that people could not collaborate with us on using the data. It was their data to use. So although I am a PI, I probably publish less on UK Biobank than almost anybody else. We wanted to kind of break the mold and make it clear that the data were for researchers anywhere in the world to use for health related research in the public interest. They didn't collaborate with us. They made an access application, they complied with the material transfer agreement for accessing those data, and they applied their imaginations to learn more about how to prevent and treat disease. I think the main driver from the Medical Research Council and the Wellcome Trust perspective was to reinforce this point that it was not a study that belonged to any researcher. It was a study that belonged to the research community to do health related research in the public interest.
Santi Ruiz
I'd love to get your thoughts on the various lessons from setting up the Biobank and from being the principal investigator over the last 20 years, for those who are listening to this and who are looking to establish new research institutions or new research enterprises, especially larger, more complex ones like the UK Biobank. Maybe let's start with institutional design lessons. What would you say to people who are trying to stand up an institution that can carry on complex research work in the long term?
Sir Rory Collins
I think the thing that we focused on was what do we need to do now, what do we need to put in place so that we can do what we want to do in the future, but don't try to do everything now? At the beginning of UK Biobank there was a lot of pressure to say, well, what's your access policy? And our point was we haven't got anything to access. What we're focusing on is getting something to access. And we're doing that in such a way that we will be able to allow the data to be used in the way that we want it to be used. But we're not going to develop the access policy. We want to focus on what do we consent to participants for? Do we get the information from the participants? Do we collect the samples in ways that can be used in 10 years time? 1 of the first things I did when I was made the CEO was to cancel an order for biochemistry analyzers. I said, let's recruit the people, let's collect the samples and we'll think about what we analyze when we've got the samples. It goes back to that thing about you defer what you don't need to do now. That means you can kind of focus your attention on the things that you do need to do now and you can focus your resource on the things you can do now. And hopefully in the future the things that you want to do will be easier to do and cheaper to do. If we had genotyped all the samples as we were collecting them, which some studies do, it would have been a lot more expensive to do them when we did do them. Perhaps more importantly, the quality would be worse because you would be collecting them in the order of the participants you collected. And if there were shifts in the type of participant and shift in the assay methodology, you'd find it very difficult to untangle those. Whereas if you wait until you've recruited everybody and then analyzed the samples in a kind of quasi random order, you get rid of any systematic difference between the participants. 
And you know, you're using better technology because it's later, you use a smaller volume, it's at lower cost. I think you focus on what you have to do and just make sure that you're planning things in a way that allows you to do the things you want to do in the future, but not necessarily do them. People struggle with that sometimes. You know, what's the quick win? The quick win is not doing something.
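The point about assaying in a quasi-random order rather than in recruitment order can be illustrated with a small simulation. This is only a sketch under made-up assumptions, not UK Biobank's procedure: the function name `apparent_wave_difference`, the two recruitment "waves", the biomarker scale, and the size of the assay drift are all hypothetical. Both waves are drawn from the same true distribution, so any difference between them in the measured values is an artifact of the assay drifting over the run.

```python
import random
import statistics


def apparent_wave_difference(order: str, n: int = 2000, seed: int = 0) -> float:
    """Late-wave minus early-wave mean of the measured values.

    order: "collection" assays samples in recruitment order;
           "randomized" assays them in a shuffled order.
    """
    rng = random.Random(seed)
    # Hypothetical cohort: first half recruited early (wave 0), second half late (wave 1).
    # True biomarker values come from one distribution, so there is no real wave effect.
    waves = [0 if i < n // 2 else 1 for i in range(n)]
    true_values = [rng.gauss(5.0, 0.5) for _ in range(n)]

    run_order = list(range(n))
    if order == "randomized":
        rng.shuffle(run_order)

    # Hypothetical assay drift: readings creep upward by 0.5 units across the whole run.
    measured = [0.0] * n
    for position, participant in enumerate(run_order):
        drift = 0.5 * position / (n - 1)
        measured[participant] = true_values[participant] + drift

    early = [m for m, w in zip(measured, waves) if w == 0]
    late = [m for m, w in zip(measured, waves) if w == 1]
    return statistics.mean(late) - statistics.mean(early)


if __name__ == "__main__":
    for order in ("collection", "randomized"):
        print(order, round(apparent_wave_difference(order), 3))
```

In recruitment order, the drift lines up with the recruitment wave and shows up as a spurious difference between early and late participants. In shuffled order, the same drift is spread across both waves and largely cancels out of the comparison.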
Santi Ruiz
That principle of deferring, which we already talked about a little bit. On the one hand, I understand that it's relatively easy to keep insisting on that principle in a longitudinal study because the study is set up such that you won't get the value for a while. But these things like choosing a data access policy, all these other things that you could be doing: how much political pressure was there, either from parliament or just from other actors who wanted to be seen doing something, getting these quick wins?
Sir Rory Collins
It's often easier to do things than not do them. The question is, what's the best strategy? And if you try to do too many things at the same time, then you risk not doing the ones that need to be done as well as they could be done. I think just focusing on what needs to be done ensures that it's done better, as does deferring things that don't need to be done until they can be done better, more efficiently, at lower cost. I think it's an argument one has to make repeatedly, but I think it is one that is worth considering.
Santi Ruiz
If you could go back to 2005, when you first became principal investigator and CEO, would you do anything differently this time around?
Sir Rory Collins
I'd try to work out how we could get a million people for the same money. That would be number one.
Santi Ruiz
What would that give you statistically that 500,000 people will not?
Sir Rory Collins
You get more people developing disease within a shorter period of time. It's the number of people who develop a condition that affects your power. I think the age range was the right age range. I think we got quite a lot of diversity for the UK. One of the arguments always made against the UK Biobank is there aren't a lot of people from Africa or from India or from Asia in UK Biobank, but at the time they weren't in the UK. If you want to study a wider range of peoples, then you need to set up studies like UK Biobank in carefully selected different parts of the world, and in fact we set up a parallel study to UK Biobank in China, the China Kadoorie Biobank, at the same time in order to address that. And we also worked with the Mexican Ministry of Health to support a large-scale study in Mexico. So you can't get some kinds of diversity in a population like the UK, but we have diversity in terms of socioeconomic status, in terms of rural, urban, et cetera. But larger numbers would give you greater power to study more conditions and rarer conditions. I wouldn't have done the biochemistry assays when we did them. We had to do haematology with fresh blood, but with the biochemistry there was a lot of pressure to do the assays of things we know about, like cholesterol and various cancer markers. But I actually think it was a lot of work, quite expensive at the time, used quite a bit of sample, and probably one could do that now with some of the newer technologies with less effort and less cost. Again, that was something we could have deferred, but in general I think it's worked out well. As I say, the access model has resulted in very much more widespread use of the resource and many more findings than I had anticipated at this pace.
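The link between cohort size, number of cases, and power can be made concrete with a back-of-envelope calculation. The sketch below uses Schoenfeld's standard approximation for the number of events needed to detect a given hazard ratio, which is a textbook formula brought in for illustration and not a method described in the interview. The function names, the hazard ratio of 1.5, and the annual incidence of 1 in 10,000 are hypothetical, and the calculation ignores ageing, drop-out, and changing incidence over follow-up.

```python
from math import ceil, log
from statistics import NormalDist


def events_needed(hazard_ratio: float, exposed_fraction: float = 0.5,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Schoenfeld approximation: events required for a two-sided test of a hazard ratio."""
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    denominator = exposed_fraction * (1 - exposed_fraction) * log(hazard_ratio) ** 2
    return ceil(z_total ** 2 / denominator)


def years_to_reach(events: int, cohort_size: int, annual_incidence: float) -> float:
    """Crude follow-up time needed, assuming a constant incidence and no attrition."""
    return events / (cohort_size * annual_incidence)


if __name__ == "__main__":
    needed = events_needed(hazard_ratio=1.5)
    for cohort_size in (500_000, 1_000_000):
        years = years_to_reach(needed, cohort_size, annual_incidence=1e-4)
        print(f"{cohort_size:>9,} participants: {needed} events after about {years:.1f} years")
```

Under these assumptions the required number of events is fixed by the effect size, so doubling the cohort halves the time needed to accumulate them, which is the sense in which a million participants would yield answers sooner, especially for rarer conditions.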
Santi Ruiz
Maybe this is a silly way of framing this question, but I'm curious: if I was thinking of setting up a US biobank, say 600,000 participants, what would the value of that American biobank be in a world where the UK Biobank already exists? How much of the value has already been built by an existing biobank?
Sir Rory Collins
Well, of course, under President Obama, the All of Us study was set up. I was one of the people asked to advise on that study, given our experience in the UK Biobank. And I think it is going to be a very important study because it includes a much wider range of people than UK Biobank, very large numbers of people from various ethnic groups that will be interestingly different. So I think genetically and also in terms of lifestyle and environment, they will be different. I think that it's really important to have a few studies in different parts of the world in order to understand the full range of genetic, lifestyle, and environmental risk factors and a wide range of different diseases. You go to different parts of the world, you have different rates of disease. And so studies that allow you to look at the widest range of exposures and the widest range of diseases can't be achieved in a single country. So I think what UK Biobank does is it demonstrates the value of having such studies and it demonstrates the value of making that data as widely available as possible. And I think our experience in UK Biobank of doing so has been positive in two ways. One, it showed that if you make data available to researchers, a lot of research findings come out. The second thing is that there's been a lot of external investment in making UK Biobank even more valuable. The UK government funded the genotyping, but it was actually industry that funded the exome sequencing and a large part of the whole genome sequencing. They're now funding proteomic assays on the data. Government supported imaging in UK Biobank; we've imaged 100,000 of the participants. But now industry and philanthropic organizations are funding repeat imaging. So every dollar of Wellcome Trust and Medical Research Council funding of UK Biobank has leveraged about $12 of external investment in enhancing the resource, in assays and in imaging.
And then those data being made available to that very wide range of researchers, that's allowed them to do even more research than otherwise would have been possible. So accessibility generated researchers using the data, which generated inward investment, which generated researchers being able to do even more with that enhanced data. And I think that that experience has helped other similar studies, like All of Us, consider the value of that kind of access model. The Mexican Ministry of Health study, the Mexico City Prospective Study, has just been made available to researchers around the world. It was exome sequenced by industry. Those data are now available to all researchers, having been first made available to Mexican researchers to, if you like, give them a head start for their contribution to having set it up in the first place.
Santi Ruiz
One of the overarching lessons that I think you're pointing to here is just that we underestimate the value of open scientific data infrastructure: data that's available to a broad range of researchers. Because although the UK Biobank cost a lot of money to set up, in relative terms, it's actually not that much. And as science and markets evolve, that infrastructure can become incredibly valuable in the future. Beyond infrastructure assets like the UK Biobank, what other kinds of scientific infrastructure would you want to see that could learn from these kinds of basic principles of very large sample size and open access for researchers?
Sir Rory Collins
Well, we've seen so much over the last 10 or 20 years of the value of data. But data are only valuable if people can use them. I think one of the issues is that people are often more concerned about the sins of commission than they are the sins of omission. What could go wrong if we make data accessible? The risks are given great emphasis, whereas the risks of not making data available are not considered seriously enough. And I think what UK Biobank does is it really helps to redress that: 5,000 peer-reviewed publications last year alone, by making the data available. I mean, the benefit of that is enormous. We're already seeing new targets for disease, new ways of identifying decades before disease develops who's at risk, and therefore identifying ways in which we could target intervention to prevent people from developing those diseases in the first place. Better ways of screening for disease than we do at the moment. So you can imagine kind of precision public health using things like breast cancer screening or colorectal screening or prostate cancer screening in a much more precise way, screening the people who really are at risk. Better ways of using cholesterol-lowering and other cardiovascular protective treatments in people at high risk based on their genetic and other risk factor data that are coming out of UK Biobank. So perhaps UK Biobank helps to redress that balance and understand the risks of not allowing data to be used, particularly in the health arena. The ability now to have data on secure platforms, which is where we've got the UK Biobank data now, and having researchers come to the data also democratizes access. So you don't need to have a big computer in your institution because you can just come to the data, which is sitting on a cloud-based platform, and you can get access to the compute you need when you need it, from wherever you are in the world.
The data are more secure, but I think even more importantly, the data are more accessible in terms of not only being able to get at them, but being able to have the compute to analyze them wherever you are. And one of the things that we've really been trying to do is help support researchers in less well-resourced parts of the world, in Africa and India, but also Eastern Europe and South America, to take advantage of their brains by making the data accessible to them and providing compute for them.
Santi Ruiz
Any other lessons that we should try and get to for an American policymaker audience? I ask because there's a fair amount of folks in D.C. who listen to this and if there's something we should say specifically to them, I would love to get it in.
Sir Rory Collins
I think that one has to have a very long view. You know, UK Biobank was set up 20 years ago and it's now a mature resource. It's now starting to generate important findings. There's real value in setting up studies now. They're not competing with UK Biobank, they're complementary to UK Biobank. They need to be interestingly different. I think the All of Us study is a very interestingly different population in the US that will really extend the range of genetic, lifestyle, and environmental risk factors that can be studied. It will really start to come into its own in the next 10 years or so. And the question is not where we want to be now, the question is where we want to be in the future. It's about thinking about a strategy of having a series of these large-scale studies and thinking about how they can be accessible and analyzed together to really understand better how to prevent and treat disease. People kind of say to me sometimes, what's the sunset clause for UK Biobank? And I say, this is a sun that never sets. This is a resource that gets more and more valuable as time goes by. And that will be true of studies that are set up now: they will become more and more valuable as time goes by. And the more such studies we have and the more we're able to assay in those samples... We're just touching the surface at the moment. Sequencing is just the beginning. The proteomic assays that we're doing in UK Biobank are just the beginning of the kinds of really detailed proteomic assays that will be possible in the next five or 10 years. I think that what we're going to see is a set of these studies that over the next 20 or 30 years will really help us to understand how to prevent and treat a huge range of diseases. And no single study is going to be sufficient.
Santi Ruiz
I want to close with just a personal question. You've spent two decades in this role as CEO of UK Biobank, which is, I think, a long time for anybody in any one role. I'm curious, as you reflect on those two decades, how has it been? Has it been as much fun as it seems like?
Sir Rory Collins
Yeah, it's been an absolute blast and I'm learning all the time. I find myself thinking not what have we done, but more what have we not done? What are the opportunities that we haven't yet taken and how do we take advantage of those to make the resource better? It's been a brilliant thing to be involved in. I work with a fantastic team. Just seeing how the research community has responded to having access to data has been really very interesting. We started out with this project where people said, you mean really we can access the data? Really, we can access the samples? And we would say, yes. And once they'd got past the "what's the catch?" element of it and started, we kind of found ourselves turning into a utility. And we have to learn how to function as a utility and think about how we can make UK Biobank even more valuable for the research community. And that's what we're focusing on now. There's always the next thing we should be doing.
Statecraft Podcast Episode Summary: "How the UK Biobank Was Built"
Hosted by Santi Ruiz | Released on June 19, 2025
In the episode titled "How the UK Biobank Was Built," host Santi Ruiz delves into the intricate process of establishing one of the world's most significant scientific resources—the UK Biobank. Joining Ruiz is Sir Rory Collins, the principal investigator and CEO of the UK Biobank, who shares his two-decade journey in creating and sustaining this monumental project. This conversation offers invaluable insights into long-term policy planning, institutional design, and the challenges of maintaining a research initiative beyond immediate political horizons.
Sir Rory Collins provides a foundational understanding of the UK Biobank's inception:
"[UK Biobank was] set up the beginning of the century... to set up a large, what we call a prospective cohort... recruited half a million men and women ... aged 40 to 69 between 2006 and 2010."
(02:56)
The Biobank was envisioned as a prospective cohort study, enrolling half a million participants to gather extensive health and lifestyle data. The primary goal was to create a longitudinal database that could yield insights into the causal determinants of various diseases over the next 30 years.
Ruiz probes the scientific impact of the Biobank, to which Sir Rory responds by highlighting its expansive utility:
"Last year alone there were 5,000 peer reviewed publications based on UK Biobank. So it's unprecedented in terms of the scale of discovery..."
(04:21)
The UK Biobank has facilitated thousands of studies globally, enabling researchers to explore genetic, environmental, and lifestyle factors affecting health. Notably, it employs a unique approach by studying participants when they are relatively healthy, allowing for the identification of pre-disease factors and causal relationships.
A significant emphasis of the discussion revolves around the foresight required to establish a long-term research institution resistant to short-term political pressures:
"I wanted to understand how he built the UK Biobank and just as importantly, how he managed to sustain it over a long period of time."
(00:04)
Sir Rory underscores the necessity of deferring immediate gains to achieve more substantial future benefits, a principle that guided the Biobank's development.
Ruiz inquires about the feasibility and initial hurdles faced during the Biobank's establishment. Sir Rory elaborates:
"It required a vision of the importance not just to genetics, but of all of the different drivers of disease."
(09:06)
Challenges included securing funding for long-term projects, convincing stakeholders of the Biobank's multifaceted value beyond mere genetic studies, and addressing initial skepticism regarding its broad scope.
A core theme is the strategic operational decisions that enabled the Biobank's success. Sir Rory shares:
"The thing that really I think transformed UK Biobank... was when the UK government has some funding available and they offered that funding to UK Biobank to get the genotyping done."
(18:08)
Key strategies included:
Deferred Analysis: Choosing to defer extensive assays and analyses until after comprehensive recruitment allowed for higher quality and cost-effective data processing.
Accessibility Model: Making data widely accessible to the global research community without exclusive collaborations ensured broad utilization and external investment.
Automated Systems: Implementing efficient recruitment and data collection systems akin to "running an airline" ensured timely and on-budget enrollment of participants.
"If you try to do too many things at the same time, then you risk not doing the ones that need to be done as well as they could be done."
(35:24)
The Biobank faced public and parliamentary scrutiny during its expansion:
"There had been very widespread consultation in that respect."
(23:21)
Issues such as data privacy, feedback of individual results to participants, and the ethical implications of large-scale data sharing were meticulously addressed through consultations and policy formulations. The decision to limit feedback to participants to avoid misinformation and potential harm was a pivotal policy stance.
Sir Rory imparts several lessons for future large-scale research initiatives:
Long-Term Commitment: Emphasizing the importance of sustained investment and vision beyond immediate political cycles.
Deferring Non-Essential Tasks: Focusing resources on essential tasks and deferring other activities to future points when they can be executed more efficiently.
Accessibility Enhances Value: Making data widely accessible not only maximizes research output but also attracts external funding and investment, exponentially increasing the project's value.
Inclusive Design: Ensuring diversity in participant selection and establishing parallel studies in different regions (e.g., the China Kadoorie Biobank) to enhance the Biobank's global applicability.
"UK Biobank helps to redress the balance and understand the risks of not allowing data to be used, particularly in the health arena."
(42:34)
Looking ahead, Sir Rory emphasizes the perpetual value of the Biobank:
"This is a sun that never sets. This is a resource that gets more and more valuable as time goes by."
(45:29)
He advocates for the establishment of complementary biobanks globally to capture diverse genetic and environmental data, thereby enriching the collective understanding of health determinants. The ongoing enhancements through technological advancements and external investments ensure the Biobank's continued relevance and utility.
Concluding the conversation, Sir Rory reflects on his two-decade tenure:
"It's been an absolute blast and I'm learning all the time... It's been a brilliant thing to be involved in."
(47:41)
His enthusiasm underscores the rewarding nature of leading a transformative and globally impactful research institution.
The episode "How the UK Biobank Was Built" offers a comprehensive exploration of the intricate planning, strategic decision-making, and unwavering commitment required to establish and sustain a long-term research initiative. Sir Rory Collins' insights provide a valuable blueprint for policymakers, researchers, and institutions aiming to create enduring scientific infrastructures that transcend immediate political and social landscapes.
For more detailed transcripts and additional episodes, subscribe at www.statecraft.pub.