Santi Ruiz
Hi, I'm Santi Ruiz and you're listening to Statecraft, an interview series about how policymakers get things done. Today's episode was originally recorded and published in November 2023, but we never published the audio until now. Jason Matheny, our interviewee today, is kind of D.C. royalty. He spent time in academia: Oxford, Princeton, the Johns Hopkins Applied Physics Laboratory. But from 2015 to 2018, he ran IARPA, the Intelligence Advanced Research Projects Activity. If you're familiar with the DARPA model, you've got a good idea of what IARPA is. Basically, they're the R&D lab for the intelligence community. Since then, he spent time in the Biden White House and he now runs the RAND Corporation, a major think tank, really one of the original think tanks. A couple years ago, Jason and I talked about how to predict the future. Jason's been really interested for a while in prediction markets, in predictive accuracy, and in Tetlock's work on human judgment. And a lot of his time at IARPA was spent not just on procuring cool new James Bond-style gadgets, but on trying to build tools so that intelligence analysts and operatives could make better calls. As he says in the interview, the wars in Iraq and Afghanistan were multitrillion-dollar decisions. If you could improve the accuracy of forecasting, even of forecasting individual strategies, by just a percentage point, that would be worth tens of billions of dollars.
Jason Matheny
Thanks for doing these. I think it's such a great service to society to try to record, you know, things that have worked and things that haven't worked in different kinds of policy efforts.
Santi Ruiz
I hope you're right. I think you're right. And it's a lot of fun too. So I'll get into kind of my formal question asking in a moment, but just for context, I would love to talk about the ARPA model generally, and about what made IARPA similar, different, unique. And I'd love to talk about forecasting work specifically at IARPA, because I know that was something that mattered a lot to you in your time there. In your view, what makes the ARPA model unique? Obviously it's been replicated a couple times in a couple different contexts, and you've spoken about and thought a lot about which parts can be replicated and which parts need to be adjusted to different contexts.
Jason Matheny
Yeah. So the basic ARPA model is having entrepreneurial program managers who are hired and given a budget and a lot of autonomy to pick research problems that are considered high risk, high reward. They select researchers to fund, either based on their individual judgment or the judgment of a small selection panel, and then have the researchers compete in parallel against a common set of metrics, usually with some disciplined approach to ending funding annually for teams that aren't succeeding. So I think maybe the three core elements then are the program managers, the sort of research tournament, and then the defunding of research that isn't succeeding. I think all three are pretty unusual and distinguish the ARPA model from the conventional model of funding science and technology efforts.
Santi Ruiz
Would the ARPA model work without really kind of stellar program managers? You said entrepreneurial as one of your kind of descriptors. What else makes a good program manager?
Jason Matheny
Yeah, so I mean, just to go a little bit over how program managers are recruited and selected: they're recruited by office directors, typically on the basis of technical knowledge, an entrepreneurial mindset, creativity, risk tolerance, and predicted ability to succeed in pitching a program and executing that program. The program manager candidates make a program pitch based on the Heilmeier questions. They go through multiple rounds of review, first with the office director, who sort of serves as a gatekeeper. There are other characteristics of program managers that are important. Typically they need to get a security clearance.
Santi Ruiz
Is that true across all of the ARPAs? I know it's true for IARPA.
Jason Matheny
Yeah, it's true for IARPA. I believe all of the DARPA PMs have security clearances. I don't know if it's true of ARPA Energy or of ARPA Health, but it's certainly true of everybody at IARPA, and I believe it's true of DARPA as well. I don't think there are exceptions. Now, the level of security clearance could vary, but that does mean that certain kinds of folks aren't qualified, in some ways that might even disadvantage research. You know, non-US citizens, which is 96% of the world, are disqualified. The program managers are given very little training. They're also given very little management. The main performance measure is whether you get your program approved. And that process is sort of like a cross between a dissertation defense and a pitch to a venture capitalist. You structure your pitch in answers to the Heilmeier questions. But the goal is ultimately to persuade this group of people that you have an idea that could be transformative if successful.
Santi Ruiz
Is there such a thing as too risky or too difficult a project to qualify, or is it just measured on impact if successful?
Jason Matheny
Yeah, great question. Some projects are too risky. That is, if the premise is, hey, if successful, this would be transformative, but in order to be successful, I have to violate the following laws of physics, or I have to solve one particular problem that Nobel laureates haven't managed to solve in the last 50 years, then I think that's going to be deemed too risky. There's probably some Goldilocks zone of appropriate probabilities there. It's probably under 50% and over 5%, something like that. What's interesting is that there are then problems that aren't risky enough, and that's seen as being not ARPA hard, even if those problems, if solved, could be transformative. And you know, it's often said that the goal of the ARPAs should be high-risk, high-reward research. Really, just from a policy point of view, what you would love is low-risk, high-reward research. I mean, you shouldn't disqualify a research project just because it turns out that the project would be easier to solve. Ultimately you should be optimizing based on the cost-benefit of a particular research investment. And that does mean then that the ARPAs do not tend to go after transformative but easy problems, even if the reason that they haven't been solved is that they're neglected or have just been kind of ignored. Part of the rationale for the ARPAs using their funding, as opposed to other agencies, is that the risk profile should be high risk, high reward.
Santi Ruiz
And just to flesh that out, is that because ARPAs are aimed to fill a hole in the funding space that wouldn't be filled otherwise?
Jason Matheny
That's right. So the sense is that, across the federal enterprise, or the sort of portfolio of federal agencies that are funding research and development, the niche for the ARPAs is to focus on higher-risk research, and that would be its comparative advantage. The model of having entrepreneurial program managers with relatively little oversight is meant to give them the ability to go after high-risk research. You don't need the ARPA model, maybe, if you're going after lower-risk research.
Santi Ruiz
Can you give me an example of research that fits in the target area for an ARPA? And then research that would be really transformative but doesn't quite fit?
Jason Matheny
Stealth technology, for example, would be in the high-risk, high-reward space. And that was, you know, a historic DARPA investment. It was also a US Air Force investment. And the riskiness of it was that we had models for how you could significantly reduce the radar cross section of an aircraft, but those models produced aircraft designs that we couldn't figure out how to fly.
Santi Ruiz
Sure.
Jason Matheny
And so that was technically risky, but the impact on military operations, if successful, was going to be transformative. It would mean that you could have aircraft that were penetrating air defense zones. And I think that's a pretty classic case of an ARPA-hard problem. High risk, high reward. One that benefited from having program managers who were constructive contrarians, I mean, despite being told, oh, it just won't be aerodynamically feasible to design an aircraft that has such a small cross section. An example of a project that isn't ARPA hard, or sorry, isn't a good fit for an ARPA: one would be something that doesn't seem risky enough. For instance, producing a specific app for a specific device that solves a problem that seems like the commercial sector is going to solve in a year anyway. You know, at this point in time, like a better camera for a cell phone. One, it's not clear that it's super risky, because advances are happening in these commercial cameras all the time. Second, it's not going to be transformative for missions. And the counterfactual of the world in which an ARPA doesn't invest in that is that ultimately those advances get made anyway. What might be another example, in the other direction? You know, faster-than-light travel. Like, maybe it's possible, maybe there are tachyons that are achieving faster-than-light speeds. But figuring out how to really leverage that is probably beyond the risk tolerance of an ARPA program.
Santi Ruiz
I want to get into kind of the bureaucratic history of IARPA, but first you just kind of mentioned counterfactuals and the amount of counterfactual thinking that goes into deciding what qualifies as a good ARPA project. Do the ARPAs have a framework for thinking through counterfactuals?
Jason Matheny
So I love the ARPA model. I mean, I think it's a great alternative to a lot of traditional science and technology funding. There's a lot of variation in this model, and there are also, I think, some blind spots. The Heilmeier questions don't, for instance, have a question about the counterfactual impact, or would this sort of work get done otherwise? There's a question about what difference it will make if you're successful, which I think partially gets at this. But the ARPAs tend not to do a real rigorous assessment of, okay, here are all the other funding streams going to solve this particular problem. Here's what we understand about the likelihood of success or failure of each of those alternative efforts. To what degree will this problem be obviated by some other development? We tend also not to think as much about strategic move and countermove. And particularly when you're working in an environment that's competitive, like national security, where you have different actors that are competing against one another in technology, we probably should be spending more time thinking about, well, if we develop this technology X, our competitor is likely to develop technology Y. Do we have a reason to believe that technology X will be asymmetrically advantageous to us, or will it simply be one that a competitor could copy, or could actually gain leverage out of by exploiting some vulnerability in technology X? So I think there is a kind of strategic analysis that isn't inherent in the DARPA Heilmeier questions or the ARPA Heilmeier questions. And, you know, there have been some efforts over the years to kind of append to the original Heilmeier questions. At IARPA, I would ask some questions about this kind of competitive aspect, and also about the way in which we need to think about baking security and safety into the technologies we create.
So, for example, are there ways of making a technology less prone to reverse engineering, to theft, to misuse? Because many of the things that the US government funds get stolen even when we classify them. And if you assign some probability to the theft of some classified piece of research, and I think it probably is prudent to assign at least a 10% probability to some exquisite technology that we're developing in a classified environment being stolen, then you should be thinking in advance, well, will I be better off or worse off having developed this technology if it's stolen? And you should plan ahead for that, and you should think, is there some way of baking into the technology itself some protection against the possibility that it's stolen, the possibility that it's stolen and used against me, the possibility that it's stolen and used against me not only by a state actor, but by a non-state actor? And so I think that kind of strategic analysis is something that we probably need to add on to the way in which the Heilmeier questions are used.
Santi Ruiz
Real quick for our listeners, will you just outline what the Heilmeier questions are?
Jason Matheny
Yeah, there's a few different versions of these: What are you trying to do? How is it done today, and what are the limits of current practice? What is new in your approach, and why do you think it will be successful? Who cares? What difference will it make? What are the risks? How much will it cost? How long will it take? And those questions are deceptively simple, I think, in answering those questions about a program that one is trying to propose to an ARPA's leadership... Oh, and actually, there's the last question, what are the midterm and final exams to check for success, which is an important one, which is just, you know, how do you evaluate whether you're succeeding or whether you're failing? But anyway, I think the questions help to guide one's approach to designing a program. They don't help as much in thinking about which programs are worth pursuing at all. That is, are we picking the problems that are going to matter the most? Because typically you're answering these questions for two or three concepts. You're not thinking, of all the possible problems that I can work on, how do I ensure that I'm working on the most important one? This is a sort of different set of questions. There's a few different formulations of that. One is that Richard Hamming at Bell Labs used to go around the Bell Labs lunchroom and invite himself to people's tables, probably in a way that was irritating to most staff. But he would ask them, what are the most important problems in your field and why? And then he would ask them what they were working on. And he would typically find that people were not working on the problems that they had just said were the most important ones. And then he would ask them why they weren't working on the problems that they said were most important. And one, it was surprising, I think, how often people just didn't think about ensuring that they were working on the most important problems.
And then second, there were the various barriers to working on those problems: either they didn't think that they would get support for it, or they weren't sure that they would be successful. So there is a sort of set of questions that needs to precede the Heilmeier questions, which is, are you really sure this is the most important problem that you should be spending your time on? Are there more important problems? Because there are all sorts of technical problems to solve, and the opportunity cost of your time is really high. If you take on a program at an ARPA, that's probably at least a quarter of your time. As a program manager, you can really concentrate on running probably three or four new programs. Beyond that, you're really delegating the management of the programs to the SETA.
Santi Ruiz
So the SETA, sorry, is?
Jason Matheny
Oh, the scientific and engineering technical assistance. So those are the on site contractors who help in running the programs.
Santi Ruiz
So in your time leading IARPA, you made some adjustments to the Heilmeier questions you guys were using. I'd love to hear both about those changes or additions and what your process was for making that bureaucratic change.
Jason Matheny
Internally, we continued to use the Heilmeier questions, but we also thought there are important questions to add that get at how competitors are going to respond to a technology being developed. Since we should be thinking about move and countermove and how different technology investments could either create or not create an asymmetric advantage, we should be thinking then about what happens after the technology is developed and is introduced to the world. So one of the questions is, what's your estimate of how long it would take a major nation competitor to weaponize this technology after they learn about it? What's your estimate for a non-state terrorist group with resources like those of Al Qaeda in the first decade of the century? Second, if the technology is leaked, stolen or copied, would we regret having developed it? What, if any, first-mover advantage is likely to endure after a competitor follows? Third, how could the program be misinterpreted by foreign intelligence? Do you have any suggestions for reducing that risk? Fourth, can we develop defensive capabilities before or alongside offensive ones? Fifth, can the technology be made less prone to theft, replication and mass production? What design features could create barriers to entry? And sixth, what red team activities could help answer these questions? Whose red team opinions would you particularly respect? On that last point, I do think that red teaming is a really important check on the strategic value of a particular technology, and it's not done frequently enough. I think we typically start research programs in the ARPAs without having somebody who's assigned a kind of red team role of thinking, okay, imagine we're successful. What are you going to do in response to this technology that we've just created? How are you going to counter it? If you manage to steal it, how are you going to use it? Right. That's just not something that we routinely do.
Santi Ruiz
Why not? Why is that not a common mode of analysis?
Jason Matheny
We don't do enough red teaming in general. It's interesting being at RAND, because RAND has a long history of doing war gaming, you know, since the '40s. Sometimes it's just really awkward, because you're trying to beat the stuff that you're investing a lot of money and time in building, and you're not highly motivated to see the ways that it breaks. So I think there are some kinds of institutional and even personal incentives not to explore the ways in which one's own investment is potentially vulnerable. But we know from red teams and pen tests in, say, the cyber domain that it is really important to do that kind of analysis. In fact, it's maybe one of our best protections against different kinds of vulnerabilities. It's our best way of identifying them early and then being able to create countermeasures. We don't do that enough, though, when we're developing technologies, I think partly because of the institutional disincentive, partly also because it takes time and investment, and that's trading off against other time or investment that is seen as more within the job jar of a research and development agency. So, you know, the ARPAs could say, hey, look, it's not our job to be figuring out how to break our own technology. But it's not clear that there'd be any agency better qualified to figure out how to break the technology than the ARPAs themselves. They just need to find folks who are really going to be sincere about trying to break it: you know, who aren't friends with the program manager, who are sufficiently separated as an honest broker that they can be a trusted third party.
Santi Ruiz
How do you institute that kind of honest broker red teams bureaucratically? Because I'm guessing, you know, if I sit at a lunch table with you every day, I may not be as good of a red teamer as somebody who, you know, has a neutral or even adversarial relationship with you.
Jason Matheny
Yeah, most of the ARPA programs will have testing and evaluation teams, but their role is to make sure, in a very honest, credible way, that the technology that's being developed is actually working and that the research is on track. Their role is not, hey, let me figure out how to defeat this technology. Now, the testing and evaluation role is typically played by an FFRDC, a federally funded research and development center, or a UARC, a university affiliated research center, which have these kind of privileged relationships with government in that they're able to leverage private sector expertise, but they have reliable enough funding that they don't have to worry about irritating a particular government program manager, so they can just be trusted to tell the truth. And they're also just kind of temperamentally selected to be places that will tell the truth no matter what, even if it's unpopular. And you know, RAND is an example of a place like that. IDA and the Center for Naval Analyses and others, I think, also have that kind of shared commitment. They're just nerd magnets, and they attract nerds who are honest to a fault. And so that's really, really good. The ARPAs then put the FFRDCs to work in evaluating whether the program is successful. And they are typically not put to work on this other side, which is, now pretend that you're our enemy and are figuring out how to counter this technology, or figure out, once you've stolen it, how you're going to put it to use. So I think the institutions already exist, potentially, to carry out that work. They just aren't typically asked to do it. Again, using the cyber realm as an example, we have organizations that focus on red teaming and pen tests, and then there are even competitions like DEF CON in which that kind of red team activity is celebrated.
And we don't do that as systematically in national security or even other parts of policy, where you'd have kind of a permanent red team activity. I mean, to give another example, in our foreign policy, there's no permanent red team at the National Security Council that tries to think, how is China going to react to this policy, that is deeply immersed in China thought and policy, and whose full-time job then is to play the role of China in anticipating how China will respond to a particular US action. And that might actually be an incredibly valuable function for a red team cell to serve in a place like the National Security Council.
Santi Ruiz
It's funny you mention this kind of institutional or bureaucratic weakness at creating versions of ourselves that try to think like adversaries. The name of this newsletter, Statecraft: I was staring at my bookshelf and looking at a book by Angelo Codevilla, Informing Statecraft. He is an intelligence community guy who was on the Senate Intelligence Committee staff for a while, and his big critique of the Cold War-era American intelligence community was that it was too bureaucratic in its thinking and unable to put itself in the shoes of the KGB and other adversaries. And maybe to zoom out a little bit, IARPA has kind of a particular relationship with the intelligence community. I'm curious if you can talk about what that relationship looks like, how IARPA and this range of other federal entities interact.
Jason Matheny
Yeah, so, and by the way, on the topic of red teams, there's a great book by Micah Zenko on the value of red teams and the history of red team activities that I recommend. So IARPA is similar to DARPA in that it has the same overall kind of model of program managers, programs that are run like research tournaments, and a rigorous process for down-selection, you know, for stopping funding for things that aren't working. The main difference compared to DARPA is that the primary end user of IARPA's research is the intelligence community, which is a collection of, I think now, 18 agencies that collect and analyze and interpret and protect information for US decision makers. And the goal then of IARPA is to be funding advanced research that ultimately improves decision advantage in one of those areas: in collecting information, analyzing information, protecting information. The model is quite similar to DARPA. It's a much smaller group of consumers. I mean, the intelligence community is maybe one tenth the size of the Defense Department. And because things don't need to be mass produced, I mean, because you don't need as many pieces of equipment for the intelligence community compared to the Defense Department, the problems that are worked on can be sort of one-offs. They can be more exquisite, in that there might be just a single device that you need to be able to build, or there might be a device that you're only going to need to make five of. And that means, I think, and because there's often a greater premium on secrecy, on both covert and clandestine uses of technologies, that there's a lot more thinking about the various kinds of security around technology efforts, and in some cases, how do you even prevent it being recognized as an area of research.
So those are some distinguishing features. The level of security often needs to be higher. The technology base is different. I mean, for DARPA there's a lot of established defense contractors. The intelligence community will often have contractors that are smaller and less well known for security reasons. What else is different? All the program managers require a TS/SCI clearance. That's not true of DARPA, where most of the program managers might have secret or collateral top secret.
Santi Ruiz
Let me go in here and ask about some of the kind of consequences of that smaller purchasing base of the intelligence community versus Defense and the lower manufacturing compared to most DARPA projects. Does that limit what IARPA can usefully work on or does it constrain it?
Jason Matheny
Well, I think in some ways it can open it up, because if you don't have to think about something that needs to be mass produced, then the technical space of options could offer more degrees of freedom. I think, also, because a lot of the intelligence community's problems come down to problems of human judgment, there's more room for investing in research on human judgment and cognitive psychology. That, I think, has historically been an area of significant investment by IARPA in a way that distinguishes it from the other ARPAs. The problem of collection is typically a problem of collecting electrons or photons. And so those end up being physics and electrical engineering problems. There are also chemistry problems, especially in the world today, where we're increasingly concerned about misuses of biology. So figuring out ways of analyzing chemical signatures of biological activity, or in some cases biological signatures of biological activity, is something that's increasingly important. But intelligence collection usually involves a lot of physics, a lot of electrical engineering. The analysis side certainly does involve things like statistics and computer science, machine learning research, but ultimately decision making is predominantly human. And the human psychology of cognitive biases and heuristics then is really important. And that has been an area of research at IARPA that I think has been pretty unusual compared to the other ARPAs. And then there's the problem of information protection. And there it's kind of back to the physics and engineering problems, in that the way in which you collect information is typically by breaking other people's efforts at protecting it. So on the protection side, it's sort of a cat-and-mouse game of trying to figure out how you would counter all of the clever things that you're doing to collect information.
Santi Ruiz
My impression of your time at IARPA is that you really championed the kind of focus on human judgment and tools to augment human forecasting. Was that an interest of yours before you, I mean, before you kind of entered the IARPA space?
Jason Matheny
I was first interested in research on human judgment after reading the book Expert Political Judgment by Phil Tetlock, which is, I think, one of the more ambitious pieces of research on the accuracy of human judgment about policymaking: over, you know, a 20-year period, evaluating the accuracy of forecasts from a bunch of different participants about world events, and just finding that the accuracy wasn't very good and in many cases wasn't substantially better than random chance. And it seemed like an important problem because the decisions that humans make can be extraordinarily costly. And improving the accuracy of those judgments then could be worth, in expectation, billions of dollars per percentage point improvement. I mean, if you think about a decision like the wars in Iraq and Afghanistan, those were multitrillion-dollar decisions. Think of improving the accuracy either of our forecasts about what the likelihood of success would be on different time horizons, or even of forecasts of the success of individual strategies being used in those two countries. If you can improve the accuracy by just a percentage point, that would be worth tens of billions of dollars. And yet society does not invest tens of billions of dollars in figuring out how to improve the accuracy of human judgment. That seems really odd: that as a society we have underinvested in something that's so consequential and probably amenable to improvement. You know, as the work at IARPA demonstrated, you really can substantially improve human judgment through a few different interventions that are pretty robust over different cohorts of people, over different periods of time, and over different kinds of analytic questions.
So part of the reason for working on that set of problems at IARPA was just that it seemed neglected relative to what the societal benefit could be for the United States, for decision making in national security and foreign policy.
Santi Ruiz
Did you encounter kind of pushback to making human judgment projects the level of priority that they were in your time? Were there different camps that thought that was not the most valuable use of resources?
Jason Matheny
I was lucky that I had great bosses who saw this as important. I don't think that they were completely convinced that the methods were necessarily going to work, but they recognized the value of methods that would work and were willing to take the risk. And we were lucky not just in the leadership at IARPA, but also in the leadership at the Office of the Director of National Intelligence. So, you know, having Jim Clapper as DNI and Robert Cardillo as Deputy DNI and Stephanie O'Sullivan as Principal Deputy DNI, I mean, the three of them were also supportive and protective of this kind of research. And I think there was also an approach that we took, which was not really picking which horse was going to win in advance, but instead setting up a repeatable process in which you could evaluate methods, and really creating an experimental test bed in which you could evaluate a bunch of different analytic methods for their accuracy. And you know, some of them could be based on human judgment, some of them could be based on machine learning, some of them could be based on some combination. And then you can run them all in a forecasting tournament to see whether they're making accurate forecasts or not about real-world events. The goal really was to create an observatory in which different kinds of methods could be compared to one another and tested for accuracy. I was lucky that I had top cover from my bosses to do that type of work. And we were lucky that we could get world-class researchers who worked on this problem.
Santi Ruiz
In at least one of the tournaments that you guys ran, and I think likely in more contexts, crowdsourced forecasts seemed to significantly outperform traditional statistical forecasts or experts. You seemed to find a lot of alpha in crowdsourcing. What's the explanation for that? To what extent can that be leveraged by the intelligence community? How useful is it to know that?
Jason Matheny
So I think the math intuition here is that if you think of a human judgment as including truth plus random error plus systematic bias, when you take a bunch of human judgments and take the average, you're tending to cancel out the random error, because if it's truly random, it's just as likely to be an overestimate as an underestimate. Just averaging over a large number of observations is going to cancel that out. And if those judgments that you're averaging across are diverse, if they're based on different sources of information and different mental models, that will tend to cancel out the systematic bias. And so your estimate from taking that average is going to get closer and closer to the truth. This sort of math reason for taking the average of a large number of crowdsourced judgments goes all the way back to Francis Galton and his experiments of getting crowds to estimate the weight of an ox. It's been supported by lots of research since then, and the work that we did at IARPA in the ACE program and in the forecasting tournaments that followed supported this notion of averaging large numbers of judgments. There is a way of improving on that, which is finding subsets of people who are consistently more accurate than others, superforecasters, and then taking the average of their judgments. It's still, though, taking the average of a relatively large number of judgments, as opposed to simply taking the judgment of the super-duper forecaster who's consistently number one. None of the methods of picking a single individual performed well, even if it was cherry-picking the single best individual for a particular field. That's a really important insight into analytic accuracy, and it's one that not many institutions have adopted.
And that's peculiar because it's a research finding that's been pretty robust across different subjects, and because there are lots of institutions on the planet that you would think are highly motivated to increase their accuracy. Why don't we see these crowdsourcing mechanisms more widely used? And why don't we see institutions making more regular use of research tournaments internally to find their own superforecasters? I don't have good answers to that question. It's something that Robin Hanson has, I think, a pretty cynical hypothesis about, which is that the leaders of most institutions aren't primarily motivated by increasing accuracy, but are more motivated by protecting their own special status or that of senior leaders. And crowdsourcing is inherently disruptive to leadership.
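The averaging intuition above (a judgment as truth plus random error plus systematic bias) can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `simulate_crowd`, the noise levels, and the split of bias into an individual part and a part shared by every judge are all hypothetical choices, not anything taken from the IARPA tournaments.

```python
import random
import statistics

def simulate_crowd(truth, n_judges, noise_sd, shared_bias, bias_sd, seed=0):
    """Return judgments modeled as truth + systematic bias + random error.

    Assumption: each judge's bias is a component shared by all judges
    plus an independent individual component (a stand-in for "diverse"
    sources and mental models).
    """
    rng = random.Random(seed)
    judgments = []
    for _ in range(n_judges):
        bias = shared_bias + rng.gauss(0, bias_sd)  # systematic bias
        error = rng.gauss(0, noise_sd)              # random error
        judgments.append(truth + bias + error)
    return judgments

truth = 1200.0  # hypothetical true quantity, e.g. a weight in pounds

# Diverse crowd: individual biases point in different directions.
judgments = simulate_crowd(truth, n_judges=800, noise_sd=80,
                           shared_bias=0.0, bias_sd=60)
crowd_error = abs(statistics.mean(judgments) - truth)
typical_error = statistics.mean([abs(j - truth) for j in judgments])

# Non-diverse crowd: everyone shares the same bias, so averaging cannot remove it.
biased = simulate_crowd(truth, n_judges=800, noise_sd=80,
                        shared_bias=50.0, bias_sd=60, seed=1)
biased_crowd_error = abs(statistics.mean(biased) - truth)

print(f"error of crowd average:        {crowd_error:.1f}")
print(f"typical individual error:      {typical_error:.1f}")
print(f"crowd error with shared bias:  {biased_crowd_error:.1f}")
```

In this toy model the crowd average lands much closer to the truth than a typical individual, but only the independent part of the bias cancels. A bias shared by everyone survives averaging, which is one way to read the point about needing diverse judgments.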
Santi Ruiz
This was Cody Villa's view about other failures of the intelligence community. It's hard to resist bureaucratic capture and to prevent these fundamental human challenges.
Jason Matheny
Yeah, I think you're right. And you do see other examples of this in institutions that collectively have reasons to be anxious about threats to either their credibility or their legitimacy. Having your homework graded is uncomfortable. Almost everybody would prefer to grade their own homework, or to determine how it's going to be graded. These kinds of research tournaments have the homework of senior leaders graded, in that the senior leaders are typically around a table in a BOGSAT process where they're deliberating and reaching some sort of conclusion. And most of what we know from research in cognitive psychology and human judgment over the last 50 years suggests that that is not the best way to be making judgments. It might be one of the worst ways of making judgments. And yet that is the norm in most institutions: this unstructured group deliberation.
Santi Ruiz
Maybe this is a sensitive question, but has IARPA had success at pushing the intelligence community to do less of the BOGSAT, bunch-of-guys-sitting-around-a-table judgment making, and more along these lines, leveraging crowds and averages and statistical insights?
Jason Matheny
For a while it did, for about nine or 10 years. There was a crowdsourced analysis effort that was started at IARPA and then became a kind of grassroots activity. There were thousands of intelligence analysts across all of those agencies who were using crowdsourced tools to make forecasts or analytic judgments. When was this? From 2019. But to my knowledge, there isn't an activity like that right now. There are activities like that in other intelligence communities. For example, the UK intelligence community has this effort called Cosmic Bazaar, which was prompted by the IARPA effort and I think is still going strong, and includes crowdsourced judgments. So I think it's the sort of activity that, unless you've got top cover and unless you have a budget that's really dedicated to this and protected for it, is going to fade away, because managers aren't necessarily highly motivated to keep this kind of activity going. Working level analysts love it. Our strongest proponents for this kind of activity were the folks who were working on a hard analytic question and wanted to see what other analysts across the entire intelligence community thought about it, and wanted to understand the reasoning. They wanted to see whether the probability assigned to conflict in Kashmir was going up or going down over the next month. And if some analysts thought it was going up, what evidence were they looking at? So you could turn it into an intelligence warning tool, a kind of heart rate monitor for the analytic community, to understand what was driving the anxiety of intel analysts. But that kind of tool is one that can be contrarian, right, especially if you're rewarding with bragging rights, through something like a leaderboard, folks who are accurate about things where the majority is wrong.
That is an important element of these kinds of crowdsourcing tools: you have to give bragging rights to folks who are correct about something that is unpopular. Because otherwise, if it's just a popularity index, then it's not going to be as useful analytically.
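One way to make the leaderboard idea above concrete is to rank forecasters by a proper scoring rule rather than by agreement with the crowd. The sketch below assumes a Brier score, a common rule for probability forecasts; the conversation does not say which rule the IARPA tools used, and the forecaster names, questions, and numbers are hypothetical.

```python
def brier_score(forecast, outcome):
    """Squared error between a probability forecast and a 0/1 outcome (lower is better)."""
    return (forecast - outcome) ** 2

def leaderboard(forecasts, outcomes):
    """Rank forecasters by mean Brier score across resolved questions, best first."""
    scores = {}
    for name, probs in forecasts.items():
        per_question = [brier_score(probs[q], outcomes[q]) for q in outcomes]
        scores[name] = sum(per_question) / len(per_question)
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical resolved questions: 1 = event happened, 0 = it did not.
outcomes = {"q1": 1, "q2": 0, "q3": 1}

# On q1 the consensus said "unlikely" and was wrong; the contrarian disagreed.
forecasts = {
    "consensus": {"q1": 0.2, "q2": 0.1, "q3": 0.8},
    "contrarian": {"q1": 0.8, "q2": 0.1, "q3": 0.8},
}

ranking = leaderboard(forecasts, outcomes)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the score depends only on what actually happened, a forecaster who was right when the majority was wrong rises to the top, which is the property that keeps the tool from becoming a popularity index.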
Santi Ruiz
Sure. You've talked a little bit about the bureaucratic disincentives to doing this kind of work, or to rewarding low status correct decisions. I've got a quote from an interview you did a few years ago where you talk about kind of the same thing as applied to IARPA: that it's hard to create an organization that rewards high risk activities and to sustain it. In particular, how do you go about sustaining an institution like this and stopping it from becoming another bureaucratically captured institution?
Jason Matheny
I mean, a lot of it is just hiring great people. In almost every enterprise, something like 90% of the variance is determined by the people who are hired, the people who are retained, the people who are promoted. And I think the ARPAs work primarily because of great hiring, and having unusual hiring authorities that allow them to optimize hiring for people who are going to be very sharp, creative, constructive contrarians and think differently about problems where there may already be a lot of inertia inside of a bureaucracy that you're going to have to work against. It helps to have Congress on your side, because Congress is ultimately appropriating funds for these agencies. And a track record really helps. DARPA, having played such an important role in stealth and GPS and the Internet, is going to have great congressional support. So having some important wins matters. And IARPA got some important wins, particularly in the classified domain, early on, which helped. The other important thing, I think, is being able to clearly communicate to leaders why it's important to have a cell of misfits. And I think a lot of leaders actually do get this. My favorite bosses have all gotten this: you want a group of misfits who you're sort of protecting and allowing to operate in a way that is different from other parts of an organization. I mean, it should obviously all be legal, it should all be ethical. But it could still be quirky. It could be one where people are wearing shorts and flip flops, where they don't have to worry as much about the kind of cultural conventions that are common in other parts of an organization. Because scientists and engineers tend to be pretty quirky. They tend to have a stronger kind of libertarian bent. And you want a place where eccentric scientists and engineers can feel comfortable and can work effectively. Because we need nerds and we need institutions that attract nerds.
Santi Ruiz
Is the fact that IARPA is working on often secret or covert or non public projects, does that help attract the right kind of talent? Is it a barrier to people who want their name to be on public research papers? How does that fact change hiring dynamics?
Jason Matheny
There is a kind of selection effect, in that the people for whom the most important thing is getting their name on journal articles are not going to be attracted to it. So the types of people who might come to IARPA are folks who were academics and were really disappointed by the rat race for tenure, or by the kinds of incentive systems within academic journals that prevented publication of work that was actually really important, and who want to solve problems that are really important to the country and the world, for which there might not be much academic reward. Or folks who are coming in from industry, people who didn't want to just develop the next form of ad targeting, like click-through rates, but instead wanted to work on something that could save the world. And so I think that tends to be a selection effect. The need for secrecy sometimes means that you're also picking people who might already not care as much about having a public profile. I think that can actually be quite a positive selection effect. That can tend to be finding people who are really mission focused and pretty humble, both by temperament and also epistemically humble and self critical. So I was really amazed by the quality of people at IARPA. I think it has a really strong self selection effect.
Santi Ruiz
Some IARPA projects happen in public and some of them presumably are not for public consumption. Given IARPA's focus on tools not being used by adversaries or stolen, how do you balance working in public and covert or secret projects?
Jason Matheny
Yeah, IARPA developed a process that we called Research Technology Protection, which is that before any program started, we would do a cost benefit analysis of making the program secret or not. And it included questions like: where are the researchers to solve this problem? Are they people who are in academia? In which case most of them don't have security clearances, so we wouldn't be able to access them if the program is secret. Does it require a mix of academics and cleared contractors? Does it require crowdsourcing itself to solve? I mean, is it the sort of problem that you want to pose as a prize challenge, where rather than giving it to, like, three teams, you give it to the world to solve? You want to figure that out in advance and decide whether the costs of secrecy exceed the benefits or the benefits of secrecy exceed the costs. And it really depends. There are some problems that I would have thought of as, oh yeah, this is much better to keep secret. But then you actually do the math on this and you think, actually the benefits of secrecy in this case aren't so significant, and we wouldn't be able to solve this problem if we tried to keep it secret. And then there are others where the opposite is true. I do think that the questions that we added to the Heilmeier Catechism are ones that bear on this question. There are certain kinds of research programs where, by making them known, you create a security dilemma in which you prompt other foreign intelligence services into thinking that you're doing something very dangerous and reckless and that you're planning to use some new offensive weapon. And you then need to be really thoughtful about what sort of information can be misinterpreted by others if they're aware of an investment. I think we also tend to overestimate our ability to keep secrets.
Santi Ruiz
Why is that?
Jason Matheny
I think, in part, because when you're operating in a classified environment, you're so aware of the number of protections that you're going through. The guards, guns, and gates at the front, the special rooms with air gapping and no electronic devices except those that have been specially created for this room. And so because you live in that environment, you assign a high level of confidence to it, in part because it creates so many inconveniences for somebody personally to be going into and out of that environment every day. They just sort of think, well, this is really tough. But there are so many historical instances of our protections failing in one way or another, and we should be more realistic about the base rate of things getting stolen. And if we're realistic about that, then I think that has implications for the kind of research that we pursue and how we pursue it. For example, if you develop a technology that would actually create an asymmetric disadvantage if used against you. Cyber weapons in general are a case of technology where the United States might be more vulnerable than some of our competitors, because we have a bigger attack surface or a more open society. And so you should be especially thoughtful before developing cyber weapons about how those tools can be used against us, because we might actually be more vulnerable in some ways than our competitors. The same might be true for advances in biology. The United States has a strong taboo against the use of biological weapons. We have a legal set of commitments against developing or using biological weapons. We had a unilateral moratorium on biological weapons even before there was an international treaty on it. And not every country believes they're taboo. Some countries have active, offensive biological weapons programs.
That's another one where the technologies that we're developing, in synthetic biology for example, can create risks that are asymmetric for us. We have a very open society, one that tends to be counter authoritarian in many ways. Consider a biological attack against the United States that required very rapid vaccination or antiviral distribution to the entire population. We saw what we went through with COVID, which was low rates of vaccine acceptance across many parts of the country. We're just asymmetrically vulnerable to infectious disease in the United States. And so that's another one where we should just be extremely cautious in the kinds of technologies that we develop that could be weaponized by others. So I think cyber and bio are two examples. In some ways they draw from the same set of vulnerabilities in open democratic societies, which is that we tend to have larger attack surfaces. Our degree of openness and interaction means that infectious agents, whether biological or digital, are ones that get transmitted quickly in our societies. And we're decentralized and counter authoritarian in ways that make the distribution of countermeasures quite difficult and inefficient. So those are examples.
Santi Ruiz
Can you give me the inverse? An example of the inverse of the kinds of things that we might be especially interested in developing because we're asymmetrically protected from them or adversaries are especially vulnerable.
Jason Matheny
Yeah, encryption or other privacy enhancing technologies. Democracies have an asymmetric advantage, I think, in developing and using those kinds of technologies: technologies that inherently involve some amount of dissent or loss of control. Large language models, I think, would be an example of that. The regulation around language models in China has been driven by an anxiety within the CCP about models spitting out anti-party rhetoric or providing historical references that are not approved. There are others. Immigration, I think, is a driver of immense innovation that democratic societies tend to have an asymmetric advantage in leveraging. First, because STEM immigrants overwhelmingly prefer to live in democracies. And second, because democracies tend to be more receptive to receiving immigrants and to integrating immigrants. Another might be international standards, in that the democratic process of those standards bodies is one that's more compatible with how democracies think about the development of standards domestically. And then there are a few areas of asymmetric advantage that probably aren't because of some inherent advantage of democracies, but are more because of historical inertia. So for example, the moats around semiconductor manufacturing are due to a series of historical events. The United States had a lead in semiconductors going all the way back to the 50s, and the design firms and the tool making firms grew up here. The manufacturing ended up moving to other democratic states, but still ones that were close allies and partners of the United States. So I think there are a variety of technologies that probably have some asymmetric advantage for democracies. I think this, by the way, is a really important and understudied question.
The Internet used to be one where folks would say, oh yeah, that would be an asymmetric advantage for democracy. Not necessarily. And I think it makes sense to treat this as a bigger research enterprise for technology strategy, for policymakers to think about which technologies, if we develop them, will at the margins give greater advantage to democracies than to autocracies. Yeah, I'd love to see more work on that.
Santi Ruiz
I would love to ask about your time at CSET. Obviously, you founded this think tank. And one thing we've been talking about recently internally at IFP is which kinds of information that we produce are most helpful for policymakers making decisions. One thing a team member noticed was that we don't often create PDFs, but the national security apparatus loves PDFs and shares them all the time. We are just not in the habit of making them. I'm curious, at whatever level of specificity you want to talk about it: how do you influence the national security apparatus? How do national security folks make decisions? What's the BOGSAT story for how these systems get made? And how do you play in that process most effectively?
Jason Matheny
First, I love your questions. All of your questions have been awesome. I'm so grateful that you're doing this work. I wish that 20 years ago your interviews had been available, because they would have been just so helpful to me. I was giving a talk recently in which I was giving advice about policy careers. And I think the thing that I found most surprising about the policy world is just how much is based on interpersonal trust, because there aren't figures of merit that are easily measurable in policy. For many policies, it's going to be years before you see whether they were successful. And it's a case of a group effort where the effect of an individual is difficult to isolate. So much of selection and advancement for personnel in the policy world is just based on interpersonal trust. And it's really important then to invest in trust and to try not to degrade it. And I think there are a few things that people who want to have an impact in policy and improve the world should probably assign more weight to than they would intuitively. One of them is to be kind. Because if you're in a stressful policy environment, your first instinct might not be to spend time being kind and making sure that your colleagues are doing okay, making sure that they're taking care of themselves, that they're sleeping, that they're eating, making sure you bring dark chocolate to meetings. But you should, first because these people almost certainly could be much wealthier doing something else. They could have more free time doing something else. So there's already a selection effect where the people doing this job tend to be people who have a level of altruism that's unusual. And just finding every opportunity to express appreciation for that is really important. But also, people really like being with kind people more than they do unkind people.
And there are some kinds of institutions where being a brilliant jerk is tolerated. I think industry and academia tend to have a greater tolerance for brilliant jerks, in part because you can measure their contribution and say, well, John's a jerk, but we can tell that he's an A-plus player, he's a 100x programmer, whatever. You can't really do that in the policy world. And so nobody's willing, or very rarely are they willing, to make the sacrifice to put up with somebody who's a jerk even if they appear to be brilliant. I know there are certainly counterexamples to this, of people in the policy world being known for being abrasive and toxic. But I think it's actually very unusual compared to industry or even academia. I've just found much less evidence for that style of personality succeeding in the policy world compared to those other parts of society. In general, my experience was that the policy world is much more like West Wing or Madam Secretary than it is like House of Cards. It tends to be much more pro-social and compassionate. So that's, I think, one thing: at CSET we definitely were thoughtful about hiring people who were not only brilliant and hardworking, but were also kind and compassionate and the sort of people that other folks are going to enjoy working with. A second feature of the way we hired at CSET was to pick people who really were theologically committed to finding facts and following them, and were really tenacious about getting evidence for or against a particular policy, and not having a strong default position on the policy question before they started working on it. And a third feature, I think, was finding people who believed that analysis would be improved by spending some amount of time in government.
So folks who were interested in being researchers, but were also interested in spending a year or two in government after a year or two of analysis and then coming back, who believed that the experience of spending time in government would in fact make them even better researchers. So I don't know, those were some of the characteristics that we looked for when we were hiring.
Santi Ruiz
And in terms of the product, the materials and the things that you're creating for the national security apparatus, any tips and tricks there for what the best outputs for a think tank are if you want to be effective?
Jason Matheny
Yeah, it can vary because the policy audience is so broad. Some of them want a one pager and you're not going to get them to read more than that. Others are going to be willing to really absorb a 200 page report, or their staff is going to be willing to absorb a 200 page report. And so knowing your audience is, I think, the important first step. Let me think. So first, picking the important problems. In the doc that I'll send you, I have at least a few heuristics that I try to follow. One is the Hamming question: what is the most important problem in your field? I think in general people underestimate their own potential to make contributions to the most important problems. They probably overestimate how many people are already working on the most important problems, and they underestimate the impact that they can have on the margins, because there are so many incredibly important problems that are just really neglected. For example, this question about asymmetric advantages for democracies from different technologies. At least, I haven't seen a really deep strategic analysis of that question. And there's something I call the Mummy G conjecture, which is named after my mother-in-law, because I was helping her check luggage once and she had this giant suitcase and I asked her, who's going to pick this thing up? And she said, if you don't know whose responsibility it is, then it's probably yours. And I think there are a lot of problems like that, where people just assume that there's somebody else working on them. If you can't figure out who after a few days' worth of homework, then it is probably a neglected problem, and so it's probably up to us to solve it.
And then there are just a lot of important problems that for different institutional reasons are neglected. In fact, a problem being neglected is a kind of Bayesian evidence that the problem is likely to be more cost effective to solve than problems that are popular. Because if a problem is popular, has lots of people trying to solve it, and hasn't already been solved, then it's probably really hard, and your impact on the margins is likely to be low. So anyway, I think problem selection really needs to be step one for figuring out what the product is. Step two, and this really should precede the choice of methods and everything else, is to figure out who the decision maker is for solving this problem, the one your analysis will ultimately need to reach, and how they receive information in a way that's useful. That includes not just the style of product or its length or its contents, like whether it's just a graph rather than text, or a dashboard, but also its timeline. So a project that takes two years, when a decision maker needs to reach closure on an executive order in three months, I mean, that project isn't going to be useful.
Santi Ruiz
Sure.
Jason Matheny
If it's delivered late. So the choice of audience needs to inform content and quality and style. It can also influence the level of certainty or rigor. There's kind of diminishing marginal utility in certainty or polish. As long as you appropriately caveat your conclusions by saying, I only have 70% confidence in this, and with another six months I could reach 80% confidence, I think decision makers in general prefer to receive information earlier with a trade-off in certainty, as long as they understand the trade-off.
Santi Ruiz
Sure.
Jason Matheny
And then.
Santi Ruiz
Sorry. No, no, please.
Jason Matheny
I was just gonna say, in terms of content, I do think there is a generational change that we were conscious of at CSET. In some ways the model that we had been operating under is the RAND model of producing reports. And we hired from RAND. Our head of analysis was from RAND, so the style of the reports looks like RAND. I think it's the same font.
Santi Ruiz
Sure.
Jason Matheny
But there is a generational change going on, which is that more policymakers grew up with phones and expect apps, they expect dashboards. They don't necessarily want a report as much as they want an annotated map, or a toggle that lets them look through a bunch of scenarios by changing an assumption and understanding its dynamic effects on an outcome. I think that is going to change the way that we produce analysis for policymakers. I think CSET was pretty early in seeing that shift and starting to respond to it by creating more dashboards, creating more of these tech observatories where you could actually dig into the data and run scenario analysis. I think the policy analysis world will probably need to be doing more of that over the years.
Santi Ruiz
Two last questions for you on forecasting specifically. One is, I just want to go back to your point about differential technologies, about which technologies may be asymmetrically helpful for democracies and which may not. How confident are you about this mode of analysis? You mentioned the case of the Internet, which everybody, I think, assumed was a boon to democracy and which now is a more complicated question. Given that kind of track record, how do we think about making these kinds of predictions when the possible effects 20 years down the line are very hard to know?
Jason Matheny
Yeah, I mean, I think we should treat it skeptically. I mean, we should make sure that we're really thinking from both sides of the game board. And I don't know if anybody, you know, at DARPA or elsewhere when we were thinking about the Internet sort of thought, okay, suppose I'm an authoritarian state. How am I going to use the Internet?
Santi Ruiz
Sure.
Jason Matheny
Suppose I'm a revanchist state that wants to disrupt some democratic power. How am I going to launch disinformation attacks? I don't know that anybody was actually given that analytic task. So I think this goes again to our discussion of red teaming. I think for a question like what technologies provide an asymmetric advantage for democracies, you should have somebody who's playing the role of the autocrat, who thinks about how they will use that technology in order to break you or in order to advance their own set of objectives. And that goes for differential technology development for safety and security too. We have some bets on which kinds of technologies are defense dominant. And we should have red teams that are really thinking through whether that defense dominance is real.
Santi Ruiz
But I guess to push on that: it's hard for me to imagine what would have been different about the development of the Internet, even if you'd known in advance that authoritarians will use this to spread disinformation and that it will be firewalled in autocratic regimes. I guess, what's the positive case for how that kind of red teaming would shape that technological flow?
Jason Matheny
Yeah, I think there have been a few papers written about, if we were designing the Internet protocols from scratch again, here's how we probably could have done better. For example, having packets that include headers that are not completely vulnerable to anonymization, to obfuscation, to misattribution. Having some way of assigning credibility to some nodes versus others. Having also a legal framework that would precede the development of opportunities for regulatory capture by large entities, so that you could have certain kinds of defenses against disinformation that would have preceded the case where you've got a handful of companies managing large information platforms that are no longer interested in preventing disinformation as long as that disinformation is generating advertising revenue. I think we really could have done a better job. I don't think we would have gotten it perfect. And I certainly don't think that there's a scenario where DARPA or NSF or the other federal funders would have said, eh, pass, we're not going to develop this, this looks too potentially toxic. But I do think that we could have baked in certain kinds of defenses against disinformation in the way that we designed the Internet and its early regulatory structure. The same thing, I think, is true of some technologies that we have now that are distributed. We should have been more thoughtful about DNA synthesis: if you're creating DNA synthesizers, each one of which has more destructive potential than a uranium enrichment facility, you probably want to figure out some way of creating some baked in level of security and safety. And we didn't do that.
And you know, now there's commercially available DNA synthesizers that somebody could use to recreate smallpox or something worse for, you know, under a hundred thousand dollars at this point. And I, I think we, we really didn't even ask the question at the time that DNA synthesizers were being developed: is there any way of putting some form of security or safety into this technology from the start? And you know, I think for things like AI, now we're having this discussion because it still feels like early days in the kinds of tools that we're making, to ask, well, you know, can you build alignment into models when they're being built? Can you do red teaming much earlier on? And now they're, I mean, I think this is actually a positive example of red teaming, is that you're seeing most of the major AI labs doing red teaming, like sincerely, I mean, really trying to figure out how the tools could be misused. So I think we're going to see out of that a lot of baked in security, I mean, reinforcement learning with human feedback, the way in which that's being applied is an example of trying to bake in guardrails from the start rather than, you know, kind of retrofit them.
Santi Ruiz
That's a really helpful answer for me. Will you send me any of those papers that you, you mentioned about differential development of the Internet as another path? I would love to. I'll, I'll follow up in an email. But last question for you before I let you go. If you had kind of full autonomy and authority over embedding forecasting work in the US Government, if you got to supercharge your IARPA stuff, what would be, what would be top of list? In an ideal world, how is the US Government using forecasting to make decisions?
Jason Matheny
I think ensuring that a place like the National Security Council has its key questions on, you know, a prediction market or some other sort of forecasting platform would be the thing I would focus on, having worked there and seeing the time pressure that folks are, are operating under. And you don't, you don't have time to do research. You don't have time to read very much. You're, you're operating under sleep deprivation and hunger, and with some of the most consequential decisions that the country faces. And you, you just don't have nearly as much time to read and collect information. So finding an efficient way of, of crowdsourcing forecasts that are conditional: if I pursue option or action A, this is likely to be the consequence. If I pursue action B, this is likely to be the consequence. I would have loved having access to a tool like that. It's also the case that the intelligence that reaches the NSC has been sort of whittled down and you don't have access to a broad view. You don't have time to ask for that coordinated analytic product like a National Intelligence Estimate, because that takes months. But what you want is really some, you know, kind of EKG for the analytic community about a particular problem at any given moment. And it's, it's also incredibly valuable in a place like the NSC to be given some indication when you might be making a catastrophic mistake. And it's, it's very hard for intelligence briefers, you know, to be collecting information from enough analysts to be able to indicate whether you could be making a catastrophic mistake. Sure. So anyway, that's, that's probably the place where, where I think having access to more accurate forecasts, more timely forecasts and ones where there's strong incentives for telling the truth, even when it's going to be extremely inconvenient, that's the place where it would be most valuable.
Santi Ruiz
Jason, thank you for giving us so much of your time. I know we're way over.
Jason Matheny
Thanks so much. And really, thanks again for doing this. It's really valuable to the world and getting folks to think about institutional design, incentive mechanisms, you know, heuristics that folks can use in order to make better policy. I think all that's really valuable.
Santi Ruiz
Yeah, it's, it's been a pleasure to do these kinds of interviews, including this one. So thanks for the opportunity, Jason.
Jason Matheny
Of course. Take care.
Statecraft Podcast Summary: "How to Predict the Future" Featuring Jason Matheny
Published on June 25, 2025
Introduction
In this episode of Statecraft, host Santi Ruiz engages in a deep conversation with Jason Matheny, a distinguished figure in the Washington D.C. policy sphere. Matheny's illustrious career spans academia at Oxford and Princeton, leadership at the Intelligence Advanced Research Projects Activity (IARPA) from 2015 to 2018, a stint in the Biden White House, and his current role as head of the RAND Corporation. The discussion delves into the intricacies of the ARPA model, the importance of forecasting in policy-making, and the challenges of institutional design within government agencies.
1. The ARPA Model: Features and Uniqueness
Matheny outlines the foundational elements that distinguish the ARPA (Advanced Research Projects Agency) model from conventional research funding frameworks:
Entrepreneurial Program Managers: These individuals are granted significant autonomy and budgets to pursue high-risk, high-reward research initiatives.
Research Tournaments: Researchers compete in parallel, adhering to a common set of metrics that emphasize transformative potential.
Defunding Unsuccessful Projects: Annually, projects that fail to meet predefined success criteria are defunded, ensuring resources are allocated to promising ventures.
Notable Quote:
"The three core elements are the program managers, the research tournament, and the defunding of research that isn't succeeding." [02:22]
2. Program Manager Qualities and Recruitment
Effective program managers are pivotal to the ARPA model's success. Matheny emphasizes the recruitment process, which seeks individuals with technical expertise, entrepreneurial spirit, creativity, and a high tolerance for risk. These managers undergo rigorous selection, including program pitches based on the Heilmeier questions, and must obtain security clearances—a requirement consistent across IARPA and DARPA.
Notable Quote:
"You structure your pitch in answers to the Heilmeier questions, but the goal is ultimately to persuade this group that you have an idea that could be transformative if successful." [04:36]
3. Balancing Risk in Research Projects
A critical aspect of ARPA's mission is identifying the right balance of risk:
Too Risky: Projects that require violating fundamental laws of physics or solving unprecedented problems are deemed excessively risky.
Not Risky Enough: Conversely, endeavors perceived as easily solvable or too incremental do not align with ARPA's high-risk, high-reward ethos.
Matheny introduces the concept of a "Goldilocks zone" for project risk, suggesting an optimal probability range for success that ensures transformative impact without untenable risk.
Notable Quote:
"There's a Goldilocks zone of appropriate probabilities—probably under 50% and over 5%." [06:06]
4. Enhancing the Heilmeier Questions and Strategic Analysis
While the Heilmeier questions provide a robust framework for evaluating research proposals, Matheny advocates for augmenting them with considerations of strategic impact and counterfactual scenarios. Specifically, he highlights the necessity of:
Counterfactual Impact: Assessing whether the research would proceed without ARPA's involvement and its broader implications.
Competitive Responses: Evaluating how adversaries might respond to the development and deployment of the technology.
Matheny suggests incorporating red teaming exercises to anticipate and mitigate potential vulnerabilities and misuse of technologies.
Notable Quote:
"We tend not to do a real rigorous assessment of... alternative efforts solving the problem." [11:43]
5. IARPA vs. DARPA: Differences and Intelligence Community Relationship
IARPA operates with a focus on the intelligence community—comprising 18 agencies tasked with information collection, analysis, and protection. Unlike DARPA, which caters to the Defense Department's broad needs, IARPA's projects are often more specialized and secretive, necessitating higher security clearances and fostering collaborations with smaller, security-focused contractors.
Notable Quote:
"The primary end user of IARPA's research is the intelligence community..." [26:40]
6. Focus on Human Judgment and Forecasting Accuracy
Matheny’s passion for improving human judgment stems from his engagement with Phil Tetlock's work on the limitations of expert predictions. At IARPA, initiatives like the ACE program focused on enhancing forecasting accuracy through crowdsourcing and identifying "superforecasters." He underscores the immense societal value of refining decision-making processes, particularly in high-stakes environments like national security.
Notable Quote:
"Improving the accuracy of those judgments could be worth billions of dollars per percentage point improvement." [05:58]
7. Institutional Challenges and Sustaining Innovative Organizations
Sustaining agencies like IARPA requires strategic hiring, protected autonomy, and congressional support. Matheny notes that the success of ARPA-like institutions largely hinges on:
Hiring Exceptional Talent: Program managers who are creative and contrarian thinkers.
Protecting Autonomy: Allowing for unconventional operations without bureaucratic interference.
Securing Congressional Backing: Demonstrating impactful achievements to maintain funding and support.
Notable Quote:
"Almost every enterprise like 90% of the variance is determined by the people who are hired." [46:49]
8. Public vs. Secret Projects in IARPA
IARPA employs a rigorous cost-benefit analysis to decide which projects should remain classified. Factors influencing this decision include the availability of researchers, the necessity of secrecy for national security, and the feasibility of maintaining confidentiality without hindering project success.
Notable Quote:
"If you develop a technology that would actually create an asymmetric disadvantage if used against you... you should be especially thoughtful before developing cyber weapons." [54:44]
9. Asymmetric Technological Advantages for Democracies
Matheny explores how certain technologies can confer asymmetrical benefits to democratic societies compared to autocracies. Examples include:
Encryption and Privacy Technologies: Strengthening democratic resilience against authoritarian surveillance.
Large Language Models: While beneficial, they also pose challenges like disinformation, necessitating built-in safeguards.
Immigration and Innovation: Democracies leverage diverse talent to spur technological advancements.
He emphasizes the need for proactive red teaming to foresee and mitigate potential misuses of emerging technologies.
Notable Quote:
"We really could have done a better job... in DNA synthesis by putting some form of security or safety into this technology from the start." [76:37]
10. Influence on Policy and Think Tank Best Practices
In discussing his experience founding CSET, Matheny highlights the importance of:
Building Interpersonal Trust: Success in policy relies heavily on trusted relationships rather than measurable achievements.
Hiring for Kindness and Objectivity: Recruiting individuals who are not only skilled but also compassionate and unbiased.
Adapting Communication Styles: Tailoring outputs—ranging from one-pagers to interactive dashboards—to meet policymakers' diverse needs.
Notable Quote:
"The policy world is much more like West Wing or Madam Secretary than it is like House of Cards." [68:27]
11. Forecasting in National Security Decision Making
Matheny advocates for embedding robust forecasting mechanisms within key national security bodies like the National Security Council (NSC). He envisions tools that offer real-time, crowdsourced predictions to inform high-stakes decisions and preempt catastrophic errors.
Notable Quote:
"What you want is really some kind of EKG for the analytic community about a particular problem at any given moment." [80:56]
Conclusion
The episode underscores the critical role of innovative institutional models like ARPA in advancing national security and policy-making. Jason Matheny's insights reveal the delicate balance between fostering creative, high-risk research and ensuring strategic, secure implementation. Emphasizing the enhancement of human judgment and the importance of proactive strategic analysis, Matheny advocates for a future where forecasting and collaborative intelligence play pivotal roles in shaping effective policies.
To explore more interview transcripts, subscribe at www.statecraft.pub.