
A
Hello and welcome to Sigma Nutrition Radio. This is episode 604 of the podcast. My name is Danny Lennon. You are very welcome to the show. Today we're going to be getting into one of my favorite things to talk about personally, and that is some of the aspects of interpreting nutrition science, some aspects related to research, critical appraisal of that, and how the field can improve going forward. So we're going to get into a lot of topics that might seem dry on the surface, but I think are really at the center of how we can interpret studies and understand if a particular study gives us information that is useful, or actually tells us what it claims it does, or can answer a particular research question that we want. And so these are concepts that are fundamental but are very often overlooked or poorly understood. And so to go through some of these concepts, I'm going to be talking with Dr. David Allison, who is Chief of Nutrition and director of the USDA's Children's Nutrition Research Center at Baylor College of Medicine, where he's done research in nutrition, obesity, rigor and reproducibility of scientific evidence. And he is one of the people whose perspectives on evaluating research and doing good, quality science I really, really value. And his work has been especially recognized in some of the areas we're going to be talking about today related to statistical reasoning, research methodology, improving transparency and trustworthiness in nutrition and health sciences. And as you will see during our conversation, he is someone that speaks with real precision and accuracy about some of the most fundamental things within the field. And I think if you're able to take a few of the ideas and concepts that he discusses today, it will massively improve your ability to critically appraise research and understand some of the errors that go on. This is one of the episodes where having maybe a couple of listens will be really, really useful.
Some of these concepts are certainly ones you want to maybe investigate a bit deeper afterwards and practice applying. Also, for those of you who are Sigma Nutrition Premium subscribers, you will of course get detailed study notes to accompany this episode, which I think will be particularly useful here, as well as an edited transcript and the Key Idea segment after the interview. For those of you who are in the public feed of the podcast and might want to take a look at our premium subscription, which gives you these extra educational tools to retain more information from your listening and to really use it as a learning tool, that will be linked up in the description box where you're listening right now. Check that out, see if it's for you. It's the direct way to support the podcast, so I very much appreciate anyone who does that. And that will all be linked up there for you or over on SigmaNutrition.com. Also linked up in the description box will be the episode page, which includes any resources we might mention throughout the conversation. So that is it. Please enjoy this conversation between myself and Dr. David Allison. A very big welcome to the podcast to Dr. David Allison. Thank you so much for taking the time to join me today.
B
Well, thank you, Danny. It's truly a pleasure to be here with you.
A
I'm really looking forward to this. I think, as I mentioned to you previously in some of our communications, I really like not only your thoughts and your insights that are based in your experiences, but how you address some of the, I think, most pertinent issues within nutrition science. And that's what I really want to get into here today. How can, going forward as a field, nutrition science do better? How can we get better quality answers to the questions we care about and have some degree of rigor in that? But before I get to my questions, maybe for people listening, can you very briefly give them an introduction into your work, your academic background, your interests, anything else that might be relevant to what we're going to get into today?
B
Sure, I'll be glad to. My original training is as a psychologist, but I've always been a scientist at heart. And by that I mean I'm someone who likes to look at things in the world and wonder about them. Wonder is a wonderful thing. Think about: how did they get there? What do they do? How does this work? Is what I'm seeing what's really there? And so on. And then start to figure out ways to try to answer those questions, whether it's by asking somebody else or collecting some data. And even as a kid I did that. And often when I would ask someone else and get an answer, I would follow up with a question that not everybody loves when a kid asks it, which is: how do you know? And are you sure? But those are the hallmarks of a scientist. And so I've sort of been a scientist for as long as I can remember. People often ask me, how did you choose to become a scientist? And I say, there was no choice in the matter, it's just what I am. And then I've sort of evolved to be a statistician because I'm a skeptic and I don't like taking anybody's word for anything. And so I felt I needed to understand statistics, so I kept studying statistics until people seemed to think I was a statistician. And then I stopped arguing and said, okay, if you think I'm a statistician, I am. So I'm a statistician, a psychologist, an obesity, nutrition, energetics, and aging researcher, someone focused on rigor, reproducibility, transparency and trustworthiness in science. And I have the privilege of leading the USDA Children's Nutrition Research Center at Baylor College of Medicine and Texas Children's Hospital.
A
We're obviously going to dive into a number of those elements you mentioned, related really to a more global sense of epistemology and how we apply that to nutrition. Not only what we know, but more importantly, how did we come to know that? And so before we get into maybe the specifics, let me lead off with a perhaps overly broad question, but feel free to go in whichever direction you wish. In the spirit of scientific inquiry, we want some degree of rigor in the field. And I think this is where many valid points are made around some of the problems that nutrition as a field faces. When we want to have a rigorous scientific inquiry in whatever field, and specifically for us that's nutrition, what should that mean? What things are we actually looking for? And how do you conceptualize that for people?
B
That's a very challenging question and it's one I struggle with regularly and think about often. There's a wonderful book by Atul Gawande called The Checklist Manifesto and I'm very enamored of it. It talks about using checklists in things like surgery to reduce the number of mistakes. And it seems to be very effective. And I've tried to think, well, what's the checklist for a research project? And I've struggled with it because research projects vary so much that it's hard to think of a single checklist for the astronomer and the cell biologist and the human clinical trialist and so on. And even within those domains it's hard to think about them all. But I think if we think about the scientific process very broadly, it seems to have a few major steps. And one is conceiving of a question or identifying a topic in which one wants to find something out, and then structuring that question to be precise and answerable, and asking oneself and checking: is it really answerable? Not all questions that can be grammatically posed really have meaning. And I think when we get into some things like the so-called debate of carbohydrate insulin model versus energy balance model, I think it may behoove us to ask, is there actually a question on the table that makes any logical sense? That's the first step and we can have some checklists there. The second one is, okay, now design the study. What are you going to do to collect some information that bears on that question? The next step is to execute the protocol. And you've got to execute it well. You've got to record your measurements properly and faithfully and so forth. The next step is you've got to analyze the data, and the next step is you've got to interpret the analyses, and then finally you've got to communicate what you found. So those are the steps involved. And at each point there are things that can go right and things that can go wrong, and things can be better or worse.
At every single one of those points we need to ask, are we doing this as thoroughly as possible? Are we checking for common mistakes? Are we documenting what we've done? Are we being transparent in what we've done? My own belief is that rigor is compromisable. We always want the most rigor we can get. But not every study deserves equal rigor. Some studies are very important. If it's, for example, a registry study of a clinical trial for a drug where safety and very serious matters are at hand, we probably want that to be the greatest rigor we can achieve. On the other hand, sometimes it's a very quick test of some observational question. Is A associated with B? That's actually not really of great importance. Or it's very preliminary and we might have less rigor. We might use self-reported heights and weights, for example. And I think there's nothing wrong with that. The key is to disclose it, so that if the reader knows that we did these things that were not so rigorous, then they can make their own judgment about how much confidence to put in that study. So that's, I think, a general approach.
A
A lot of us can focus on how a study was executed when it comes to interpreting that study, and we look at that. Whereas really, as you've alluded to there, we need to look at this at various stages all along. There's probably no way we can put specific numbers on this, but as a general point, how prevalent do you think it is in the field of nutrition that the problem with a particular study or set of studies is not in the execution in and of itself, but rather that the design or the analysis that was used, or whatever the case may be, was just incapable of answering the question we want it to answer, or that ideally it should be able to answer, if that question makes any sense at all?
B
Well, your question certainly makes sense. I have no numerical information on this. I can only sort of give you a gestalt gut feeling on it, for what that's worth. When I have the opportunity to get a view into those things for any particular study, I see, not in every paper or every study, but in most, some things that could have been done differently and perhaps better in every phase. Now in some cases the things that I think might have been done better or differently are relatively minor or modest. And I would still describe the study as having some validity, that is, not being completely invalidated by the error. And by an invalidating error, my group uses the term to mean an error which, if corrected, could either change the results and conclusions or would change the results and conclusions. So sometimes we don't know that it would, but we know that it could. And in either case we consider those invalidating errors. Even if it didn't change the results and conclusions, we still consider it an invalidating error if it could have. It's hard to know in all cases. But when we get a peek into it, we often see it. We see it at the level of the logic, and I've mentioned one example just now. We could talk about other things with ultra-processed foods or what have you, where the very logic of what's being asked is open to doubt. We see it in the premises, where the premises are not valid. We see it in the methods, where the measurements may not be valid. We see a great deal of that with food intake measurements. We also see it, well, not a great deal, but we see it in other areas. There was a very controversial, interesting case recently you probably know about involving so-called hyper-responders, and whether hyper-responders to a ketogenic diet who have high LDL cholesterol are actually at risk from that high LDL cholesterol for coronary artery plaques.
There was a challenge there with the measurements or the image analysis of the CAC measurements. And so that's a measurement validity problem. As we go into analyses, we see this very often. We see what's called a DINS error, which is difference in nominal significance. That's a term my group has introduced, in which one looks at the treatment group, for example in a randomized controlled study, this is one example, and says, ah, the treatment group changed and P is less than 0.05. Let's just say, for discussion's sake, 0.049. The control group didn't change, didn't lose weight or lower their cholesterol or whatever, and the p value was 0.051. So it was significant in the treatment group and not significant in the control group. Therefore, I declare that there was an effect. And that's statistical nonsense. You can easily mathematically prove that that's a grossly incorrect procedure, but that's still sometimes used. We also see with cluster randomized trials, which happen to be very popular in childhood obesity, community, school and other intervention programs, that very, very frequently those are misanalyzed to the point of being completely invalid. And there's unfortunately a great deal of plainly wrong literature purporting to show certain things are efficacious or effective for childhood obesity when the studies have not shown that. And then we get to interpretation. There's a great deal of confusion there, particularly in areas about treatment response heterogeneity. It links back to the design and sometimes the analysis. That is, there's a great deal of discussion by clinical investigators of observed treatment response heterogeneity. And what they're really observing is variability among people in outcomes, like weight loss, as an example, but not in response. And they conflate outcome with response. If outcomes were responses, we wouldn't need control groups in studies.
The very fact that we include these control groups recognizes that the response to treatment is not the same as the outcome achieved. But people seem to forget that when they go into treatment response heterogeneity, and they say, look, some people lost a lot of weight, some people lost a little weight, some people even gained weight. What great treatment response heterogeneity. In which case we have to say, no, there's no evidence for that at all yet. There might be treatment response heterogeneity, but that's not evidence for it. So you have an analytic and an interpretive problem. Many people misanalyze and misdesign their studies there and misinterpret them. And then in terms of dissemination, that's where it really gets squirrely. It is not at all uncommon to see in the results section of a paper by an astute set of investigators a very clear statement of something like, on our primary endpoint, there was no statistically significant difference between the treatment and control group. And then the abstract contains some other statement in the conclusion, like, this is a promising treatment with some apparent effectiveness in women, let's just say as an example, because later, as a post hoc analysis, they looked at women and men separately and found that it seemed to be effective in women, but not men. And it's not mentioned in the abstract that that was post hoc, or it's not mentioned that it wasn't on the primary outcome, it was on a secondary, or it wasn't on all subjects, it was only on subjects after they eliminated some subset. And then it's less tempered in the abstract. Then you go to the press release from the university and you get more spin. And then you go to the interview that's published in the newspaper or the online news article with the investigator, who suddenly has forgotten everything he or she said earlier about how it wasn't the primary outcome, or, in the observational association study, that it was an association, not necessarily causation.
And suddenly you hear words about causation, impact, large effects, outrage, miracle cure, et cetera. So these are all the problems we see.
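The DINS error described above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not from the conversation: 50 people per group, a standard deviation of 1, the same true pre-to-post change of 0.28 in both groups (so there is no treatment effect at all), and a normal approximation with a 1.96 cutoff in place of an exact t-test. The function names are hypothetical.

```python
import math
import random
import statistics


def within_group_significant(changes, z_crit=1.96):
    """Pre-vs-post test inside ONE group: is its mean change different from zero?"""
    se = statistics.stdev(changes) / math.sqrt(len(changes))
    return abs(statistics.mean(changes) / se) > z_crit


def between_group_significant(treat, ctrl, z_crit=1.96):
    """Proper comparison: is the mean change different BETWEEN the two groups?"""
    se = math.sqrt(statistics.variance(treat) / len(treat)
                   + statistics.variance(ctrl) / len(ctrl))
    return abs((statistics.mean(treat) - statistics.mean(ctrl)) / se) > z_crit


def simulate(n_trials=2000, n_per_group=50, true_change=0.28, seed=1):
    """Return (DINS false-positive rate, between-group false-positive rate)."""
    rng = random.Random(seed)
    dins_hits = 0
    proper_hits = 0
    for _ in range(n_trials):
        # Assumption: both groups share the SAME true change, so treatment does nothing.
        treat = [rng.gauss(true_change, 1.0) for _ in range(n_per_group)]
        ctrl = [rng.gauss(true_change, 1.0) for _ in range(n_per_group)]
        # DINS procedure: "significant in treatment, not significant in control".
        if within_group_significant(treat) and not within_group_significant(ctrl):
            dins_hits += 1
        # Correct procedure: test the difference between groups directly.
        if between_group_significant(treat, ctrl):
            proper_hits += 1
    return dins_hits / n_trials, proper_hits / n_trials


if __name__ == "__main__":
    dins_rate, proper_rate = simulate()
    print(f"DINS procedure declares an effect in {dins_rate:.1%} of null trials")
    print(f"Between-group test declares an effect in {proper_rate:.1%} of null trials")
```

Under these assumed settings each within-group test has roughly a coin-flip chance of reaching significance, so the DINS procedure declares a treatment effect far more often than the nominal 5%, while the direct between-group comparison stays close to it.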
A
That answer is so rich with a number of issues, and I think understanding that these issues arise, and being able to identify them, would eliminate so much of the confusion that we have around nutrition, even for people within the field and within academia. And so I'd love for us to start working through some of those that you've raised in a bit more detail, just to make sure that everyone is clear on examples of where these come up, but also exactly what is happening here and why it is so pertinent. If we start first with that DINS error, that's the difference in nominal significance. In a simplistic form, we're talking about situations where someone can see that in one group we see a significant finding from pre to post test, so at the end versus the beginning. And then in the other group we do a pre versus post and we see that there's no significant finding. The issue then comes when people start trying to make conclusions about the difference between groups based on this. As you've noted, this is a significant error that can lead us into huge problems. Can you maybe just restate that again? And then maybe to put a bit of color on this, why does this mistake actually remain so common? Because one thing is that it happens when people are interpreting studies. You see it all the time on social media. People look at a randomized controlled trial and are making conclusions of this type based on a significant finding in one group from the start to the end of the trial and maybe not in the other. However, this, as you noted, is something that is still happening within the studies and being reported by the study authors themselves. Why is this such a problem? And then why do you think this remains so common?
B
I think it's a problem for two reasons. One, you might say, is innocent, and the other is not so innocent. The innocent problem is that many non-statistician investigators don't understand statistics very well. Take the most common type of statistical inference we use, and inference meaning, you know, you're not just describing what you see in the sample data you have in your hand, but you're making an inference to what's the case in the population from which you've obtained that sample. That's what you really want to know about. And so the process of inference. One school is called the frequentist school and is by far the most common one used in the health sciences. And it involves the P values and the confidence intervals and things like that that you're used to seeing. The logic of frequentist statistics, I believe personally, is sound. Some people don't, but it's not intuitive. Most people want to know about the probability of the hypothesis they're testing, for example, being true. So, for example, if I say to you, well, what's your hypothesis here, Danny? And you say, well, my hypothesis is that the treatment group would lose more weight than the control group. Or better yet, better said, treatment causes weight loss. And I say, good, that's terrific, Danny, we just did the analysis for you and the P value is less than 0.05. Let's say it's 0.04. You say, oh great, David, so that means there's only a 4% chance that the treatment doesn't cause an effect and there's a 96% chance that it does. I say, no, no, no, Danny, sorry, you got it backwards. You've made what's called the prosecutor's fallacy. You're interpreting the probability of A given B as the probability of B given A.
You're interpreting the number I gave you, that P value, as the probability that the hypothesis is true or false, as opposed to the probability of obtaining the data you observed, or data more extreme in terms of their departure from what would be expected under the null hypothesis, if the null hypothesis were true. And then you might say, David, what the heck did you just say? That is a mouthful. I say, Danny, it's really this: we compute the probability of the data given that there was no effect. That's what 0.04 means, not the probability that there is no effect. And you say, well, David, I don't care about the probability of the data. The data are in my hand, I know what they are. I care about the probability that my hypothesis is true or not. And I say, well, we don't answer that, Danny. Sorry. We give you the answer to the question we want to answer. It does bear on it. If we say there is some chance that your hypothesis could be true, it's conceivable, and the probability, if there is no effect, of obtaining data that look like this is very low, it strengthens our belief, legitimately, in my opinion, that there is an effect, but it doesn't tell you what perhaps you really want to know. And to get to that, one has to adopt a different framework, usually called Bayesian. And that's a less common approach. And that has issues too. These things are very difficult concepts. For some, they're not at all intuitive. So I think that's a big part of it. Probability is just not intuitive, you know. The best example of all is the famous Monty Hall problem. And in the interest of time, perhaps, I mean, I can go through it if you want, but I suspect you don't want me to go through it. But the listener can look it up. It's a fun problem. And what's also fun is the social history of how many professional doctoral-level mathematics and statistics professors got it wrong, and how long it took them to concede it.
So probability is very counterintuitive and a lot of people make just innocent mistakes. And I think a big problem we have in the health sciences is we don't have enough professional statisticians involved. Too many investigators either believe they and their research team have the statistical skills to do the analysis and interpret it, because somebody took one master's-level course 20 years ago in statistics and knows how to turn the computer on, or just don't have access to a statistician because there aren't enough available or they don't have funding for it. That's a big problem. You know, we tell a joke sometimes in our group. It's not original to us, but someone from the surgery department, a surgeon, calls up the statistics professor and says, I'm going to do my own statistical analysis on this trial I did, so I don't really need your help. I'm just hoping you can recommend a good statistics textbook for me. And the statistics professor replies, oh, that's wonderful. I'm glad you called, because I also wanted to do some brain surgery, but I'd like to just do it myself. Can you recommend a textbook for me? And the point is, you know, we wouldn't trust the statistician to do that. You know, I know what Bernoulli's principle is, and I understand how a combustion engine works, but do you want me to personally be the one checking the 747 plane before you get on it? Probably not. You probably want a professional mechanical engineering expert. So that's a really big problem. But if everybody had a professional statistician analyzing every one of their studies, we wouldn't possibly, under current circumstances, have enough statisticians available. So we really need to think this through and figure out how to work it out. And I'm not sure I know the solution. The other cause is not so innocent. Many people want to publish a statistically significant result, either because they think it makes the paper more interesting or because they really believe the hypothesis.
They don't just want to tell you, I still believe my hypothesis that this dietary supplement, as an example, causes lower cholesterol or weight loss or more happiness or whatever you think it causes. And you're entitled to your belief, you can believe anything you want. But they want to be able to say, my evidence supports it. And maybe they don't get the statistically significant result on the proper analysis, but they see that if they commit this so-called DINS error, then they can claim it's effective. And I think that's nefarious, or malfeasance. And when you see a research plan published, or a protocol paper, or a clinical trials registry entry a priori, and it says we're going to do this proper analysis, but then the improper analysis is reported, you kind of have a good sense that this was not so innocent.
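The definition of the P value given above (the probability of data at least as extreme as those observed, if the null hypothesis were true) can be made concrete with a permutation test. This is a minimal sketch, not an analysis from the conversation: the weight-loss numbers are invented for illustration, and the function name and the add-one correction are choices made here.

```python
import random


def permutation_p_value(treat, ctrl, n_shuffles=10000, seed=0):
    """Estimate P(difference at least this extreme | group labels are irrelevant)."""
    rng = random.Random(seed)
    n_treat = len(treat)

    def abs_mean_diff(values):
        first, second = values[:n_treat], values[n_treat:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    pooled = list(treat) + list(ctrl)
    observed = abs_mean_diff(pooled)
    as_extreme = 0
    for _ in range(n_shuffles):
        # Shuffling the labels simulates a world in which the null hypothesis is true.
        rng.shuffle(pooled)
        if abs_mean_diff(pooled) >= observed - 1e-12:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (as_extreme + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Hypothetical kilograms lost per participant (illustrative values only).
    treatment = [6.1, 4.8, 7.2, 5.5, 3.9, 6.8, 5.0, 4.4]
    control = [3.2, 4.1, 2.5, 5.1, 3.0, 2.2, 4.6, 3.7]
    print(f"permutation p-value: {permutation_p_value(treatment, control):.4f}")
```

The number returned is a statement about the data under the assumption of no effect. Nothing in the calculation uses, or produces, a probability that the hypothesis itself is true, which is why reading 0.04 as "a 96% chance the treatment works" is a reversal of the conditional.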
A
And some of those nefarious drivers I certainly want to return to later. Some of them, as you mentioned, owe to the incentives at play that are related to maybe pressures within academia or just the current peer review publishing system that we have, amongst other things. And I also want to return to this central importance of statistics, to be able to not only interpret studies properly, but make sure that we're doing proper science. So maybe let me work through a couple of the things that you said. And given that many of us, including myself, are not statisticians, having a grasp of certain key concepts related to statistics can maybe head off some of these potential problems, or at least let us make some note of them, when someone is erroneously interpreting a study, either intentionally or not, or when spotting problems in actual papers. As you've mentioned, there's maybe confusion around what probability actually is, how to interpret P values, what we're actually testing. Certainly, at least from a frequentist perspective, we're really asking the question: if the null hypothesis were true, what is the probability of observing these particular data or data more extreme? That is as opposed to what people might typically tend to think, that we're testing the alternative hypothesis.
So there's a real key importance in understanding some of these concepts. One that I wanted to return to, that you mentioned a bit earlier, relates to heterogeneity in response. I see this come up quite a lot, where you sometimes have a researcher promoting the work, maybe on social media, or you have other people talking about a study, and they might hold up trial data and misinterpret some of these findings to say, well, look, we have these two different interventions, or we have these two different groups. They'll show all the individual data points from that study, start labeling those people as responders or non-responders based on this response, and then go on to make a further claim that this means that for some of these people this diet is better and the other one is worse, and for other people it's the converse. And they're making that claim based on this one particular trial, which is not the question it was set up to answer. So could you maybe, from your perspective, again explain some of this confusion people have when it comes to this heterogeneity, how maybe they're taking trial data and making claims about individualization of response, or drawing conclusions that these data aren't actually able to support?
B
This is one that's particularly irksome to me because so many methodologists have written about this. And yet it just doesn't seem to get to most investigators who are not statisticians. There seems to be this presumption that, gee, I see different people lose different amounts of weight or have different outcomes in terms of happiness or sobriety or muscle growth, or again, whatever somebody's studying, and therefore there's great heterogeneity of response. I've even heard it in FDA advisory board meetings, and I've heard the FDA, you know, look to the sponsor and say, can you tell us more about these non-responders? And, you know, I sometimes want to jump up and scream, there's no evidence that there are any non-responders. You're just seeing variability in outcomes. We don't know what that means. And the reason is, let's just take an example, and let's take weight loss. Let's suppose that you and I are both in a clinical trial and we're both in the treatment group. And on average, the treatment group loses 15 kilos and the control group loses 5 kilos. So now, assuming everything else is good with this study, we have very good justification for saying the treatment causes a 10 kilo weight loss on average. And we've estimated the average effect. We had to subtract the placebo weight loss out from the treatment mean weight loss. So now we know the mean effect, or at least we have a good estimate of it. And let's suppose, though, that in that trial, in which, as I said, the average weight loss in the treatment group is 15 kilos, I only lost 5 kilos and you lost 20 kilos. It's tempting to say you had a very good response, a better or higher weight loss response than average, and I had a lesser one. I'm a poor responder or a weak responder or whatever you want to call it, and you're a high or intense or good responder. But here's the problem.
If you and I were in the control group, we also might have lost different amounts of weight. Maybe in the control group, had you and I been in it, I would have in fact gained five kilos and you would have lost five kilos. So there were factors affecting our weight loss other than our responsivity to the treatment. Maybe during that interval you got a horrible case of the flu and it caused you to lose an extra five kilos and you hadn't yet fully recovered. And so we see the 20 kilo rather than 15 kilo weight loss for you. But really only 15 were due to the drug, or I should say 10, because we have to subtract out the placebo. Only 10 were due to the drug, and the remainder was due to the fact that you got this absolutely horrible case of the flu. In contrast, let's suppose that I moved next door to, or into an apartment right above, a donut shop. And I love donuts. And every day I smell the donuts when I walk into my building and I can't resist them. And I start eating a few donuts every day. And that's what counteracts some of the weight loss from the drug. That would have happened in the placebo group too. And so it's not differential response, it's just differential outcomes because of the slings and arrows of outrageous fortune, to quote Shakespeare. And that needs to be taken into account in the analysis. You can do that simply if you have observed things that you think moderated the response. So if you think, for example, of age or sex or starting BMI or geographic location, or whether you have low or high baseline insulin levels, or a particular genotype: if you measure those things, you can include them in your statistical model. And what you need to test for is an interaction between that and the treatment assignment.
What too many clinical investigators do, if they do that at all, is they just look at the treatment group, or they don't even do a controlled trial, they just treat some people, and then they say, I can predict outcome with whether you are a man or a woman, or in this geographic area or not, or have this baseline insulin level or not. And they mistake predicting the outcome for predicting the response. What you need to do is have the control group, and then you need to look at an interaction between those factors and treatment versus control assignment as a variable. And if you get that statistical interaction, and you've done everything else right, then you can claim that there's some treatment response heterogeneity with respect to those pre-randomization variables. If you want to look at all of the treatment response heterogeneity variance, which includes the things you know about and could measure and the things you didn't know about or didn't measure or didn't include in your statistical model, then you need a completely different approach. And often probably the only or the most valid one is something that involves a crossover design, but not a conventional crossover design. The ordinary 2 by 2, two treatments by two periods, crossover design won't do it. What you need is a multi-period crossover design with multiple sequences. And you need to have, for each treatment, at least two periods in which the treatment is applied. And people very rarely do that. We've found a few studies that have been set up like that, that we can analyze that way, even though the original investigators didn't plan it for that. But they used what are called factorial designs, and we're able to extract that analysis from the factorial design. And we have a paper that's in review on that, in which one of my postdocs, Dr. Reddy, is the lead author. And we find that in some things we get statistically significant evidence of treatment response heterogeneity, and in some we don't.
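The flu-and-donuts example can be turned into a small simulation in which every treated person has exactly the same response, yet outcomes still vary widely. This is a minimal sketch under assumed numbers: the 5 kilo average placebo loss and the 10 kilo drug effect come from the example above, while the 6 kilo standard deviation for "life events", the sample size, and the function names are invented for illustration.

```python
import random
import statistics


def simulate_trial(n_per_arm=500, drug_effect=-10.0, seed=7):
    """Return (control, treated) lists of weight changes in kilos (negative = loss)."""
    rng = random.Random(seed)

    def background_change():
        # Placebo-type change plus flu, donut shops and everything else
        # that has nothing to do with the treatment (assumed SD of 6 kg).
        return rng.gauss(-5.0, 6.0)

    control = [background_change() for _ in range(n_per_arm)]
    # Assumption: EVERY treated person gets exactly the same drug effect,
    # so there is zero treatment response heterogeneity by construction.
    treated = [background_change() + drug_effect for _ in range(n_per_arm)]
    return control, treated


if __name__ == "__main__":
    control, treated = simulate_trial()
    effect = statistics.mean(treated) - statistics.mean(control)
    lost_under_10 = sum(1 for change in treated if change > -10.0) / len(treated)
    print(f"estimated average effect: {effect:.1f} kg")
    print(f"SD of outcomes, treated: {statistics.stdev(treated):.1f} kg")
    print(f"SD of outcomes, control: {statistics.stdev(control):.1f} kg")
    print(f"treated people who lost under 10 kg: {lost_under_10:.0%}")
```

In this sketch a noticeable share of treated people lose less than 10 kilos and could be labeled "poor responders", even though each of them responded identically. The spread of outcomes in the treated arm matches the spread in the control arm, which is the pattern expected when outcome variability reflects background factors rather than differential response.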
We say, you know, it just looks, as far as we can tell, there's no strong evidence that people or mice respond differently to this treatment.
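Editor's sketch: the distinction between predicting the outcome and predicting the response can be illustrated with a small simulation. Everything here is hypothetical and chosen only for illustration: the variable names, the effect sizes, and the assumption that sex shifts weight change equally in both arms. The helper `ols` is a plain least-squares fit, not a method attributed to the speakers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Hypothetical trial: 'treat' is randomized, 'male' is a baseline covariate.
treat = rng.integers(0, 2, size=n)
male = rng.integers(0, 2, size=n)

# Assumed data-generating model: the drug removes 10 kg in everyone, and men
# lose 3 kg more than women in BOTH arms, so there is no true interaction.
weight_change = -10.0 * treat - 3.0 * male + rng.normal(0.0, 4.0, size=n)


def ols(columns, y):
    """Least-squares coefficients for an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Mistaken analysis: look only inside the treated arm.
treated = treat == 1
sex_slope_treated = ols([male[treated]], weight_change[treated])[1]

# Analysis described above: both arms, with a sex-by-treatment interaction.
interaction = ols([treat, male, treat * male], weight_change)[3]

print(f"sex slope, treated arm only: {sex_slope_treated:.2f} kg")
print(f"sex x treatment interaction: {interaction:.2f} kg")
```

Under these assumptions, sex clearly "predicts" weight change inside the treated arm, yet the interaction coefficient stays near zero because the same sex difference exists in the control arm. A real analysis would also need a standard error or test for the interaction term, which this sketch omits.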
A
On that, I do have a question maybe I'll return to a bit later related to the use of crossover studies in nutrition. But one thing, before I forget, that I wanted to pull back on, which you mentioned a bit earlier, Dr. Allison, was in relation to the sometimes nefarious, sometimes maybe just bad practice that can go on, where a bit of statistical understanding can help people realize why it's such an issue. And that relates to the use particularly of secondary outcomes. And so oftentimes, I think everyone is relatively familiar that we have this primary outcome that is the main subject of what we're trying to answer within a particular study. But there will be certain secondary outcomes listed as well. And sometimes you see studies where there's a whole range of secondary outcomes, a big long list of things. And I think someone who maybe is coming in without a background in this might say, well, what is the problem with that? If we have a study and someone is just going to go and measure everything they possibly can, because we have access to these participants right now, then after the fact we can see what comes up. Obviously there's a significant flaw in that, but I think there are also those who are very familiar with that and understand the limits of looking at secondary outcomes compared to a primary outcome, but who maybe don't take the time to actually think about the implications from a statistical standpoint. That is, if we think of the more secondary outcomes we can list, and we just keep testing more and more things, and then we think of the potential for false positives and false negatives within our data, with huge numbers of things that we're testing statistically, we can pretty easily demonstrate why this becomes a problem.
So with that ramble aside, can you maybe mention, for people who are not in the field, why it is so problematic if someone is just going to measure a whole bunch of different outcomes that are not the primary outcome, and then after the fact just see what ends up being a significant outcome? And I suppose the real thrust of the question is, if we do find significance within that particular secondary outcome, let's say, why does that not necessarily guarantee that's a trustworthy piece of data?
B
Okay, really good question. And it's a very challenging question because it brings up these philosophical issues as well as practical issues. The practical issues are socially difficult, but not intellectually difficult. And to me, the big practical issue is just disclosure. There's a wonderful book about statistics by Robert Abelson that one could probably read in two hours on a plane and that contains almost no equations. And the title of the book is Statistics as Principled Argument. And at one point in the book, I'm paraphrasing here, Abelson says something like, students and colleagues often come to me and they say, professor, can I do this in my design or analysis? And he responds, you can do anything you want as long as you are prepared to accept the epistemologic limits that that choice makes upon you. And I agree with him. But I would add further, you can do anything you want if you accept the limits and, here's the important "and", you disclose it to the reader. That's the issue of transparency. So you could do any good, bad, stupid, intelligent analysis you want, in my opinion, but just tell the reader what you did and then the reader can make their own judgment. That's the key practical thing. And if people don't do that, if you analyze a thousand endpoints and you just publish the one that came out significant, or you analyze your data a thousand ways and just publish the ones that came out significant, but you don't disclose that, then we can't properly interpret those statistics you give. Now let's jump back to the much harder issue, which is the philosophical issue. Let's suppose that what we're actually looking at is the effects of, let's say, some elements of self reported dietary intake on some outcomes of interest. And we're looking at a big data set and there are many, many things we might call outcomes. There are things like body composition measurements and metabolism measurements and health outcomes.
And many of those can be scaled different ways. So we've probably got thousands of outcomes we can find in a big data set like NHANES, for example, and we've got thousands of exposures: how much broccoli did you eat? Or how much cauliflower? Or what's the ratio of broccoli to cauliflower consumption? So there's a close to infinite number of things I can put in if I allow myself all these. If I do all of the analyses, thousands upon thousands of them, I easily get into the millions, right? If I have a thousand outcomes and a thousand exposures, I'm up to a million tests. This leads to the idea that you can just find any random nonsense and something will be statistically significant eventually by chance. And we shouldn't have very much confidence in that. And most people intuitively say, if I'm going to do something like that, I need to apply some correction. The most well known one is called the Bonferroni correction. I won't go through the details of the Bonferroni correction, but you basically apply a much more stringent test, right? So instead of saying a P value of less than 0.05 is good enough, I'm going to take 0.05 and I'm going to divide it by the number of tests we did. If that was a million tests, then it has to be a P value of less than 0.05 divided by a million before I get to declare it statistically significant. Now most people intuitively say, yeah, that kind of makes sense. Otherwise it just looks like the so-called Texas sharpshooter problem, which is to just shoot all over the side of the barn and then go find where there's a cluster of bullet holes and draw a target around it and say, look, I hit the bullseye. Most of us feel that doesn't make sense. And there's not a lot of argument about the need for, or the value of, some multiple testing correction here. But now let's take a different situation.
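Editor's sketch: the arithmetic behind the passage above, assuming a thousand outcomes crossed with a thousand exposures and the conventional 0.05 threshold. The function name `familywise_error_rate` is illustrative, and its formula additionally assumes the tests are independent and every null hypothesis is true, which real data sets rarely satisfy.

```python
n_outcomes = 1_000
n_exposures = 1_000
n_tests = n_outcomes * n_exposures  # one test per outcome-exposure pair

alpha = 0.05

# If every null hypothesis were true, this many tests would still be
# expected to come out "significant" at the uncorrected threshold.
expected_false_positives = alpha * n_tests

# Bonferroni: divide the threshold by the number of tests performed.
bonferroni_threshold = alpha / n_tests


def familywise_error_rate(alpha, m):
    """Chance of at least one false positive in m independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** m


print(f"tests: {n_tests:,}")
print(f"expected false positives at alpha={alpha}: {expected_false_positives:,.0f}")
print(f"Bonferroni threshold: {bonferroni_threshold:.1e}")
print(f"family-wise error rate for 20 tests: {familywise_error_rate(alpha, 20):.2f}")
```

Even with only 20 uncorrected tests, the chance of at least one spurious "hit" under these assumptions is well over one half, which is the sharpshooter problem in miniature.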
Now let's suppose that you and I are both interested in the effect of avocado consumption on mood. We both do a study of 100 subjects, 50 randomly assigned to eat an avocado for lunch and 50 assigned to eat something else. And then we measure mood using the exact same scale and we get the exact same result. And that is, the P value for the estimated effect of avocado on mood is 0.049. Now interestingly, in addition to the measure of mood that we both used, which was, let's just say, some self report thing, suppose you were ambitious and you said, well, I don't know if people are always going to report honestly or accurately. I'd like another look at this. So I'm going to have outside observers rate the apparent mood of the subject. So now you have two variables as an outcome. You have subject reported mood and observer reported mood. I only have one: subject reported mood. Now we say, well Danny, if you're going to analyze them both, you need to do a Bonferroni correction. So in order for you to declare that you have a statistically significant result, your P values need to be below 0.025. On the other hand, I only tested one outcome, so my P value only needs to be below 0.05. But we did the identical study, and with respect to self reported mood, we got the identical result. And by the rules of the game, I say my study shows to a reasonable degree of certainty that eating an avocado leads to this effect on mood, p less than 0.05. You say, by the rules of the game, my study does not show that eating an avocado has this effect on mood. P equals 0.049, p is greater than 0.025, not statistically significant. We have the identical result, but opposite or different conclusions, only because you were ambitious enough to include an extra measure in your study. That seems counterintuitive and silly, but it's no different than the big one with lots of testing. It's just smaller.
And many astute statisticians, like Dr. Kenneth Rothman, former editor of American Journal of Epidemiology, said we shouldn't use multiple testing corrections. There's no right answer to that. It's a matter of judgment. But I think what we want to do is be transparent with the reader. So we want to give the exact P value. So instead of you just saying not significant and me just saying significant, if we both say the p value is 0.049, then again, readers can judge for themselves. If someone wants to do a Bonferroni correction on yours, they can. If they don't, they don't have to. You've given them all the transparent information.
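Editor's sketch: the avocado example reduces to comparing the same p-value against two different thresholds. The function `is_significant` is a hypothetical helper that applies the Bonferroni rule as described in the conversation; it is not taken from any particular library.

```python
def is_significant(p_value, alpha=0.05, n_tests=1):
    """Bonferroni rule: compare p to alpha divided by the number of tests."""
    return p_value < alpha / n_tests


# Both hypothetical studies observe the same result for self-reported mood.
p_self_reported_mood = 0.049

# One study measured a single outcome; the other added observer-rated mood.
one_outcome_study = is_significant(p_self_reported_mood, n_tests=1)
two_outcome_study = is_significant(p_self_reported_mood, n_tests=2)

print("one-outcome study declares significance:", one_outcome_study)
print("two-outcome study declares significance:", two_outcome_study)
```

The data are identical and only the bookkeeping differs, which is why reporting the exact p-value alongside the number of outcomes tested lets readers apply whichever threshold they consider appropriate.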
A
I mean, that's a recurring theme, that we want this transparency in how things are going to be reported, but also that there's some thought gone into that as well. With respect to P values in general, never mind some of the corrections, there's obviously huge debate within the field of statistics around that. We could probably spend hours only focusing on that, but sadly we don't have time here. And to respect the time we do have, I'll maybe start trying to get to my final question or so. There's a whole host of things I could ask you about, David, but one thing that I think is worth coming to is in relation to nutrition broadly. There's obviously much debate about where the field is going, where it has been up to now, and what areas we can improve. There's a whole range of different perspectives on that, and one that particularly maybe applies in the case of nutrition epidemiology. But we could also think of this more broadly within the field of nutrition. Given the type of exposures we're looking at in nutrition, given the outcomes we're looking at, and all the factors that influence those that go outside of nutrition, in comparison to other fields we are oftentimes working with effect sizes that are relatively small, let's say. And so then when we start thinking about a number of the potential measurement errors that can happen, this starts to become tricky and is where we get into a lot of the discussion and perspectives that people have of where we need to go. So when you think about the general issue of this measurement error that is always going to come up, the effect sizes that we're working with within nutrition, and all the challenges that come with the type of exposures and outcomes we are looking at, and again at the risk of trying to get you to condense what could be hours worth of discussion into one particular answer: what are the most pressing issues that you think nutrition, as a field, could and/or should be doing much better now, that would give us the biggest bang for the buck in terms of answering those questions we care about?
B
I think I can provide a few things, in my personal view, and not everybody would agree, that we could do to make more rapid progress. One is we could do a stronger culling of research questions to decide which ones are really important, which ones are both important and answerable, and which ones are either less important or less answerable. And I'm thinking mainly about effects on health outcomes or other outcomes of interest in humans. And then for those that we think are answerable and important, we really dig in and we do much more powerful, better studies. For the others, we just do a little less of them and accept more uncertainty. That's the first. The second thing is I think we really need to work on our methods, and I think we need to stop spending so much time debating about things like the value of self reported food intake and trying to tweak it in tiny ways and make it better, or apologizing for it and thinking that by apologizing for its limits we made the limits go away. And we need to just say that's a terrible method, and let's really move a great deal of the investment from using those methods to coming up with better methods that are based on biochemical and other types of tests and will have more validity if we can work them out. The third thing I would say goes to sort of the social behaviors. And I think there are three things we can do, not just in nutrition but in science in general, nutrition in particular, that would go a very long way. The first is to get all trials and as many other types of studies as possible pre-registered publicly in a searchable database, including having a pre-specified data analysis plan. That's starting to happen. We're not completely there, but we've made a lot of progress on that.
The second thing is that upon publication of the study, all raw data and all statistical code for reproducing every number recorded in every line of text, every table, and every figure in that paper are made publicly available online immediately upon publication. None of this nonsense of data are available upon reasonable request. And then when you write to people, you find out that they don't consider your request reasonable. And then the third thing is, with exceptions, and there will always be exceptions, but exceptions that should be rare, publication of trials should be mandatory. That is, no matter how boring you think your result is, no matter how little you think it'll help you get your next grant, tenure, promotion, fame, you are obligated to publish it. And I think that should be a condition of IRB approval. It should be a condition of accepting the grant funding, assuming you accept grant funding for it. It should be a condition of working at a nonprofit institution like a university. And if people don't comply with it, then I think the IRB should say, we're not approving anything else for you until you do this. And the grant funding agency should say, we're not giving you any more grants and you're not eligible to apply unless you do this. And so on.
A
There's a whole host of things I had planned that I would love to talk to you about. I will leave you with the very final short question I always leave the podcast on, and it is simply if you could advise people to do one thing that would have a benefit for any aspect of their life each day, what might that one thing be?
B
Question everything.
A
That fits in perfectly with what we've discussed today and brings us full circle to, I think, your initial response to the very first question. So with that, Dr. David Allison, let me say thank you so much for giving up your time to come and talk to me today. I really, really appreciate it, more so for the thought that has gone into your answers and perspectives. I've learned a lot from your work, and hearing your perspectives on nutrition science as a field has really helped shape a number of things personally. So I appreciate that, and thanks for being a part of this.
B
Well, sir, I have learned and continue to learn so much from your podcast that I'm truly grateful. And if I've been able to repay that learning even a tiny bit today, I'm happy to have done so.
A
Before you go, I just wanted to remind you about Sigma Nutrition Premium. It was created with the goal of allowing you to more deeply understand the material you're hearing, to be able to retain more of that, and then be able to easily and efficiently revise so that in the future you can use the things that you have learned. So for full details on this, check out the link in the description box, wherever you're currently listening right now. Or just go to SigmaNutrition.com and you can see all the details there. And of course, your support is what keeps Sigma Nutrition going. We don't run ads, we don't sell supplements, anything like that. So your support is what allows me to continue to do this. So thank you for that. I hope you do come back for the next episode regardless. And until then, have a great week. Stay safe and take care.
Release Date: May 5, 2026
Host: Danny Lennon
Guest: Dr. David Allison, Chief of Nutrition & Director, USDA Children’s Nutrition Research Center, Baylor College of Medicine
This episode explores the principles of interpreting nutrition science, focusing on research rigor, statistical reasoning, transparency, and common errors that lead to misinterpretation. Host Danny Lennon welcomes Dr. David Allison, a leading researcher in nutrition, obesity, and research methodology, to unpack why missteps occur in nutrition science, how to recognize them, and the future direction the field should take to ensure better trust and validity in research conclusions.
(04:22) David Allison shares his scientific philosophy, rooted in skepticism, statistics, and a lifelong quest to question assumptions.
“I've sort of been a scientist as long as I can remember... I keep asking, how do you know? And are you sure? But those are the hallmarks of a scientist.” (04:43)
(06:44) Allison outlines a generalized structure for rigorous research:
Not all studies require the same rigor; what is critical is disclosure of methodological weaknesses so that readers can assess confidence.
“We always want the most rigor we can get. But not every study deserves equal rigor... The key is to disclose it...” (08:37)
(13:45) DINS Error (Difference In Nominal Significance):
Cluster Randomized Trials:
“If outcomes were responses, we wouldn't need control groups...” (16:17)
Innocent Misunderstanding: Lack of statistical training; frequentist p-values are counterintuitive.
“Probability is very counterintuitive and a lot of people make just innocent mistakes...” (22:12)
Nefarious Practice: Selective analysis or reporting to achieve publishable results.
“…they want to be able to say my evidence supports it...and I think that's nefarious...” (24:33)
(28:20) Variability in outcomes is often mistaken for true differences in treatment response.
“I've even heard [misinterpretation] in FDA advisory board meetings...there's no evidence that there are any non-responders...” (28:43)
Proper test: Need to analyze interaction effects between baseline characteristics and treatment, not just outcome differences.
True estimation of individual response heterogeneity requires advanced designs (multi-period, multi-sequence crossover with each treatment applied in at least two periods, not the conventional 2×2 crossover).
(34:42) Excessive secondary outcomes inflate risk of “false positives.”
(37:04) Core Issue: Researchers can test many variables and only report those with “significant” results, misleadingly inflating findings.
“If you analyze a thousand endpoints and you just publish the one that came out significant...we can't properly interpret those statistics you give.” (38:12)
Solution: Transparency—disclose all outcomes measured and all statistical analyses performed.
Cull research questions for importance and answerability; focus on robust studies for high-impact questions.
Better methods for dietary assessment: Move away from self-report toward objective (biomarker-based) methods.
“Stop spending so much time debating about the value of self reported food intake and...let's really move...investment from using those methods to coming up with better methods...” (47:31)
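Social and policy changes: public pre-registration of trials with a pre-specified analysis plan, release of all raw data and statistical code upon publication, and mandatory publication of trial results.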
“If people don't comply...the IRB should say, we're not approving anything else for you...” (49:40)
| Quote | Speaker | Timestamp |
| --- | --- | --- |
| “I've sort of been a scientist as long as I can remember... I keep asking, how do you know? And are you sure? But those are the hallmarks of a scientist.” | Dr. David Allison | 04:43 |
| “We always want the most rigor we can get. But not every study deserves equal rigor... The key is to disclose it...” | Dr. David Allison | 08:37 |
| “If outcomes were responses, we wouldn't need control groups... But people seem to forget that when they go into treatment response heterogeneity...” | Dr. David Allison | 16:17 |
| “Probability is very counterintuitive and a lot of people make just innocent mistakes.” | Dr. David Allison | 22:12 |
| “You can do anything you want as long as you are prepared to accept the epistemologic limits that that choice makes upon you... and you disclose it to the reader.” | Dr. David Allison, referencing Abelson | 37:39 |
| “Question Everything.” | Dr. David Allison | 50:27 |