
Formal verification and type systems - how do they relate? Niki Vazou is on a mission to bring better formal verification to the masses. I have done a couple of episodes about dependent types and my feeling is that dependent types are super powerful an...
Loading summary
A
Hello, this is Adam Gordon Bell. Join me as I learn about building software. This is code recursive.
B
It is a thing that languages like Haskell adapts 20 features. Maybe in five years like two of these features will become mainstream and will be ported to JavaScript.
A
That was Nikki Vazu. She wants to find ways to bring formal verification to mainstream programming. Today she teaches me about refinement types. Refinement types are available in some form in several languages. Haskell with Liquid Haskell, Scala with the Refined library, and perhaps even Ruby. Nikki is an assistant professor at IMDA Software Institute and the creator of Liquid Haskell. If you like to hear me ask simple questions to very smart people, you're gonna love this interview. We talk about type systems, formal verification, SMT solvers working with GHC and dependent types versus refinement types versus liquid types and so much more. Super fun interview. I hope you enjoy it. Nikki Bazu, welcome to Co Recursive.
B
Thank you for having me.
A
So I have a lot of questions to ask you about refinement types and proofs, software verifications. But I wanted to start with this quote I found on your website. It says my goal is to make theorem proving a useful part of mainstream programming. So what does that mean to you?
B
To me it means that where I see all these theorem proving and refinement types and general verification go is that in some year some developer is going to write code and then some verification will run on the bug that will not only detect errors, but also it can be used to suggest code or to make coding easier, faster and safer. But what is important is that the developers should not know exactly what is going on in the background. Because my impression is that up now people, especially mainstream developers, are all these scared of this formal verification and type checking and refinement types.
A
So what is formal verification?
B
That's a very broad research area I assume. But the high level idea is that you have a code, you have some specifications. So you start with some specifications that you have in your mind that you want your code to satisfy. Like a program will not crash or like the most simple one is that you will not divide something with zero. And then you find a way, like refinement types, to express these specifications formally using some syntactically formal language. And then formal verification is the process of trying to prove that the code satisfies the specifications that you wrote.
A
Okay, and you mentioned types. So I've had a number of episodes about types before. So I had an interview. Episode 6 I talked to Edwin Brady about Idris and episode 15 I talked to Stephanie Weyrich about dependent types. And episode 23 I talked about the PI language. So listeners should definitely check those out, but just to get like some basics so that we understand, like what is.
B
A type, what is a type? For me, a type is basically a set of expressions in the language. So the language I talk when I say programming is Haskell, if that's the case. So for example, in Haskell we have the type of integers that basically express all the potential integers, starting from 0, 1 minus 1, et cetera. So you group them all together and you call them integers. And then you have, for example, functions that are typed as a function from integer to integer. So you write your code and then you write the types that the code should satisfy. And then if the program satisfies this type or type check, then you, you know that certain errors are not going to happen. So you cannot, for example, apply an integer to an integer because the type checking performs some basic checks for you.
A
So types are then a form of software verification?
B
Definitely.
A
So types give us some limited form of verification. Right. So I'm trying to make sure I understand this. So if a function can only handle integers, then the static typing that happens during compile time in the language, it verifies that nothing's calling it with something that isn't an integer. Is that the sense in which it's.
B
Yes. So this is a very good example. And also what is interesting is that in what we say, strong type languages like Haskell, that makes sure that all your types are satisfied at compile time, then the compiler can use all this information that it knows about the types to, for example, allocate proper memory when it is to run your code, or it makes sure that it knows that your code satisfies the types, and then it can use this information to optimize the runtime and also to make sure that certain errors will not happen.
A
So what then are refinement types?
B
The standard types are, are like only depend on the structure of your expression. So as we said before, zero has a type int. So refinement types basically take the standard types and refine them with some logical predicates, and then we can have more refined sets that define types. So for example, again the standard example is division by zero. And the standard Haskell type for the Divisor operator says, give me two integers and I'm going to give you back an integer. So we can go and insert logic to this type and say, give me an integer, then give me an integer that is probably different than zero. And I'm going to give you back an integer. And then if you give this refinement type to your program, and if you run like a refinement type checker and make sure that all your program satisfies this type, then you have a proof that on runtime there is going to be no division by zero exception.
A
So if I have this divide function, it takes two integers. I want to make sure the bottom one's not zero, right? The second argument, the second argument is not zero. So refinement types let you. How does refinement type solve that?
B
So you can refine the standard types of your language with logical predicates. And at this case, the logical predicate says that V, where V is the value that I'm refining is different than zero. And you give this refinement to the second argument of the divisor operator. And now you have a specification for this division function. And then you run like the refinement type checking, which again, like the standard type checking, checks that all the types, the refinement now types that you write are satisfied by your program. So for the refinement type checking to pass, there needs to be a proof that for every time you apply the divisor operator, the second argument is provably different than zero.
A
So how I would solve this problem without refinement types, right, is I would use something to sort of wrap the result and maybe put a check in place, is how I'm thinking, right? So I would be like in my divide function, I would be like, if the second value is 0, return none or something, or some sentinel value, maybe. So how is refinement types differ from that solution?
B
Oh, so the refinement type checking happens on compile time. So from soundness in the sense that division with zero will never happen, these solutions are exactly the same. But your solution says that in the definition of the divisor operator, run one runtime check to check that the second argument satisfies some predicate. What refinement types do is basically at compile time trying to prove that this check always succeeds. So they basically make your program more efficient, faster, because you get rid of one runtime check that's interesting.
A
So there can be performance ramifications because we're moving things to compile time.
B
Exactly. One of the standard examples that I use is basically a simplification of the heartblet example that basically you have memory and you want to index it, and you need a runtime check to say that whenever you are indexing, you are Only indexing the parts of the memory that you have the ability to index and you don't go out of memory. And this check, as you suggested with the divisor operator, could be happen at runtime with an assertion or could be lifted to the type system and give like the indexing operator, a refinement type that says that the index should always be in bounds. And then at compile time again a refinement type checker is trying to prove that this indexing constraint is actually satisfied.
A
It seems like I could come up with a way to do this with normal types. So I'm thinking of wrapping them in something. So what I'm thinking is for your divide example, all I need is a type for numbers that aren't zero, like a natural number type or non negative or something, right?
B
Yeah, you need a refinement of an integer. By definement I mean like in general something that is if you see types of sets, you need a subset, right? And this is what refinement types give you. They take the types and they add logical predicates to define subset of the initial type that satisfies some predicates. So what is cool is that these logical predicates are expressed in a logic and then a refinement type checker is basically using SMT solvers to decide that all this logic that you are using to refine types is actually satisfied. Where SMT solver stands for this? Satisfiability Modular theory. And in my work I treat them as black boxes that I ask them questions like can you prove that in this case the second argument of the device is different than zero? And hopefully the SMT solver will come back and tell me yes, and then my program would type check.
A
So how do they do that? Let me narrow it down. What does the SMT solver do?
B
So SMT solver against stands for satisfiability modulo theory. Basically what they do is they take as input logic and they answer if logic that you give them is satisfiable. This is a mathematical term. It's again like we are informal verification. So we are using a lot of math in the back, some of which I don't even know exactly. But like, I mean that's a beauty, right? There is a lot of work on SMT solvers. Me as a user, I know how to use them, I don't know how to develop them exactly, but there is a lot of engineering work to make these things work. And again to briefly go back at the beginning of your first question, I mean this Is how I see it too, right? That me as a researcher in type theory, I want to build formal verification and type systems that the user will use without exactly knowing what happens here. So this SMT solver takes as input a logical predicate and checks if this is satisfiable. For example, if I tell you, like, can you check if a value is greater than zero is satisfiable? It will come back to me and will tell me yes. If you replace V with 1, then you have 1 greater than 0, and so your initial formula is satisfiable. Basically, if you negate what you ask it, you can ask it, can you check if not V greater than zero is satisfiable? And this will say, no, the negation is not satisfiable. Again, going back to the field of logic, we know that the negation of a formula is not satisfiable. Then this formula is valid, which means that it holds for a formula has variables. So if you take, if you substitute the variables with any value, then this formula holds. So we use smts as a black box to ask them, like, can you prove that this formula is valid? And the SMT can say yes or no.
A
The one usage of SMT solvers I'm familiar with is probably like a toy. But it's like Sudoku puzzles. If you give it all the rules, like, do you know these puzzles?
B
Yes.
A
If you give it all the rules, like I have a nine here and a seven there, and whatever, it will kind of infer what all the other values must be.
B
Yes, because it finds like, I guess, assignment to the values that satisfy the constraints. So I guess there. The idea is like, that you take the Sudoku rules, you translate it as an expression in the logic that is satisfiable, even only if the initial Sudoku is a valid solution. And then the SMT solver rule give you the assignment from the free variables to values. I mean, we do something like this in refinement types. You take the program, you take some specifications that you can see the specifications as, like, when is the Sudoku true? And then you translate this combination of program and specification into logical formulas. And then we give them to SMT and we ask them, like, is the formula valid? And if it's valid, then there are no refinement type errors. Otherwise we say that, you know, the SMT couldn't prove that this application is correct, so maybe something is wrong with your program.
A
So like in the example of divide by zero, if I say this has to be greater than zero, then the SMT solver has to look at all the places this is called and see if it can figure out the value of those kind of.
B
So this is like the goal of the type system. So for example, when you type check like pure Haskell or any programming language with type code, then there is an algorithm that decides if your code type checks or not. And we call this algorithm a type system. And it has some rules. So in the refinement type system there are again some rules. And this is exactly what they do. They take the specifications and the code and they basically red use it or translate it to SMT queries. So it's not that we give to the SMT the whole thing, we take the whole thing, we apply the refinement typing rules, and this is exactly the area that I'm working on. And then this refinement type checking asks like the SMT queries ask questions to check validity.
A
If I call the divide function and I give it a seven, right? So I want to divide by seven as the second argument, then it sees that seven and knows that it's decided that it fits the constraint.
B
Yes. And it will tell you, yes, everything is good.
A
So what if the value is passed through several layers of calls?
B
Then it needs to have a type that implies that it's different than zero. So for example, I get an input from the user and I transform it, I add it, I multiply it, I pass it through various checks. Finally, at some point I divide it by zero. All this transformation that I'm getting to the input need to have property type so that the end result that I pass to the divisor implies that it is different than zero. During the process, I can, for example, add some runtime checks to make sure that the final value that I pass to the division operator is greater than zero. So, for example, I take my input, I check if it is greater than 10, and then at runtime I make this check, and then I remove nine, and then I pass it to the divisor operator. And then the minus operation has a proper type to satisfy that the end result is greater than zero. And the idea is this, that instead of having this runtime check at the end division operator, then I have much fewer runtime checks at the beginning of my code where I take the input from the user for which I know nothing. Then at the beginning I make sufficient runtime checks to ensure that at compile time I can prove that all the entries, all the constraints are satisfied. Also, that is cool, because I make one runtime check at the Beginning if my program is to cross at some division or at some indexing, then I know early on on and not after I have done all the work at runtime.
A
So yeah, on a runtime, on a value that's supplied at runtime, there's no way to get around that you need to runtime check. But the refinement types will detect this runtime check and kind of make sure that that restriction, that constraint travels via the type system to where you have the restriction in place.
B
They will not detect the runtime check. It is more like by the refinement error methods that you get, maybe you will be helped to understand what runtime check you need. It is like type checking in Haskell. Like if there is an error you'll get a message and then the fix is up to you to understand the message and what you need to fix. And it's the same for refinement types. You can get a refinement type error. It will not tell you like add this runtime check at the beginning. But the type error will give you some information that might be help you to understand what runtime checks you need to put.
A
But I'm wondering if I add a runtime check, can it detect it? Or do I have to tell it somehow that oh, I've covered this condition?
B
No, it will detect the runtime check.
A
Can I also feed it my own kind of assumptions like don't worry.
B
Yes, yes, yes. So it understands, for example, if you're branching, if you have if boolean, then alternative, else other alternative, it will understand that the code that you run inside the then branch like the the if condition is satisfied. So you can add inside the ifs. You can add like arbitrary expressions and then the then and the else will be checked, assuming that the condition and respectively its negation hold.
A
I had a couple interviews about dependent types and they have some of the same use cases that will be talked about, right? Like using a list of a known length so that you know, if I take the head of it, then I know that its length is greater than zero, for instance. So how does refinement types compare?
B
So this is a very complicated question because I think you can heard many different opinions on these questions. I can tell you my opinion. So the first one is that basically dependent types, what they do is that they allow you for your type system to be more expressive. They allow you to express properties like the safe indexing, like the division by by jira that standard type system don't let you do. On the other hand, what refinement types do is they refine an existing type system. So you have a type system and you add this logical predicate. You cannot change the language or the existing type system. You start from a real thing and you refine the types. So basically you don't allow the types more expressive. On the other hand, you, you catch more errors. So we have this rule that you take a language and you add refinement types. So every program that passes the refinement types also passes the standard type checking of the language. So dependent types make extend the type system to accept more programs, while refinement types constrain the type system so that more programs are rejected. This is one thing and the thing is that all refinement type systems start from an existing language and refine it. And the other very important difference is that refinement types use an external solver. The most common thing is to use an SMT solver. But independent types, you write the proofs yourself or you have a tactic language, or you have some automation. But the proofs are things that exist in your language, while in the refinement types the proof happens externally from the solver. So you cannot have proof terms inside refinement types, which is good and bad. The way I see it, having this external solver is good because many things are proven automatically, so the user doesn't need to prove things. There is an opinion that it is bad because trusting a refinement type system makes you trust the way that refinement types is checked and also user trust the SMT solver or whatever external solver that you use. But for example, you do prove to Coke you have very fewer things to trust, but you have more things to prove yourself.
A
So it seems to relate to what you were talking about, practicality and mainstream programming. Do you think that refinement types they can be added on to existing mainstream languages? Like you said Haskell, I think that you even had something on your website about Ruby. Is this a technology that we can build add like a linter type of process on top of an existing language?
B
I do believe that, yes. Yeah, I do have a work on refinement types for Ruby and I am like actively working on Liquid Haskell, which is a refinement type checker for Haskell programs. So I think that in the Haskell community refinement, Haskell and refinement types are quite accepted because the Haskell users won like have realized that types are good. So I think that if you go all like the type checking of Haskell, then spending a little bit More time in refinement typing and Liquid Haskell is not bad, but for, I don't know, for dynamic language it's very difficult to persuade somebody, like if you don't understand. But yeah, so that is what I'm saying that like this is a goal to make all this typing and refinement typing so easy that like you don't actually have to persuade somebody. So I mean, I see it going there, but I don't know how long it will take.
A
So tell me, how was it working with Ruby and adding refinements to it?
B
So as I said before, I believe that this is where all this should go. This is where verification should go, this is where refinement should go. Because if you want verification to actually have impact and become mainstream, then like it is the popular programming languages that we should target. But the problem is that for me it's like much I find it more pleasing to work in Haskell. It is not only the programming language, it's like the community. It is that Haskell is of these languages that like, I mean, I feel that I know category theory without. No, without having read category theory. So it is the thing that it is challenging, it is a very nice language and it also has like a very enthusiastic community. So for me, ideally I would like to stay there and collaborate maybe with people from other languages. It is a thing that languages like Haskell adapts 20 features, maybe in five years, like two of these features will become mainstream and will be ported to JavaScript. So hopefully verification could be one of these features that ate years will be adapted at more mainstream programming languages. I don't know if it makes sense.
A
It does make sense. I mean, I get what you're saying, you're like, you don't want to work with JavaScript. I mean, I don't want to either. But the refinement types, because they sit outside of the language, they seem like particularly well suited to some of those problems.
B
Definitely.
A
So what languages have some sort of support for refinement types? Like obviously you primarily work with Liquid Haskell, but is there support in other languages that you know of?
B
So interestingly, all this started from OCaml. Yeah, so we work with Haskell and then there is some work for JavaScript that I know and we have like this work that you see on my website, we have support from, for Ruby and I don't know many other languages, but this doesn't mean that they don't exist.
A
So I know that there is like a refined library that maybe is slightly different because it's in the. For Scala. Yeah.
B
Yes, that is true.
A
And also for Haskell, I think if I understand they work differently because they use sort of macros to do it rather than something that sits outside the language.
B
So Scala is an interesting case. I mean, I guess this is happening in Haskell too, that Scala's type system is already semi dependent. I mean you can express a lot of properties because you have all this subtyping. Then like type checking by self is using a lot of reasoning to decide if it is like if a program type checks or not. And if I have understand it correctly, type checking depends on the refinements of vice versa. So it's not such clean separated as like in the literature that you say that refinement types just refine the existing types and it is done in two separate steps. First type checking and then refinement type checking. But there is some support and a lot of automation in Scala too.
A
Yeah, and there's some overlap like as you were saying with Haskell, like I think in Scala some of these issues people take a different approach to. Right. So like taking the head of an empty list, I mean the default solution to that I think is that either people will use something that uses some sort of option type or maybe you'd use like a non empty list wherein the list cannot, you know, it always has at least one element. It's interesting that you're adding refinement types to these languages with such rich type systems that, that some of the scenarios can be covered in different ways.
B
But again, so I think like the alternative of option types, again like every time you want to access it, you need to basically open it. So you're spending a lot of time, a lot of runtime on it.
A
Oh, that's a good point.
B
Yeah. And again like non empty list, again you need to check it. Basically the concept behind refinement types is like perform the most checks you can at compile time and using an smart dissolver as a black box that is there to do a lot of reasoning for you. But the other point is that I mean it's not exactly a coincidence that I work like I am extending Haskell with refinement types because first of all, like, I mean Haskell is pure. I don't know if I need to.
A
Explain it doesn't have side effects.
B
Yeah, you can easily map a Haskell program into this SMT logic that we said before without having to deal about side effects. So I don't have to reason about memory, I don't have to do a Lot of reasoning and the mapping to logic, it's much easier. That's the one thing like that these strongly typed languages offer like a very nice background so that you add logic on top of it. And then as I said before, it's like Haskell has a group of people that already like all this mathy logic type system that it is easier to try and experiment with refinement types too.
A
They're eager for it. Where when you go to Ruby and they're like, we don't even want static typing at all. Why would we want refinements to static typing?
B
Yeah, that's true. If you need to persuade somebody to use types, then it's much more difficult to persuade using refinements types.
A
So a previous guest I had, John De Goes, he said that he thought dependent types were very challenging to use for like a programmer, a professional programmer, not somebody like an academic like you. And the refinement types were like more approachable. Do you agree with that?
B
So refinement types are awesome, specifically when your program is correct. So it's like again, from the beginning, you're writing your program, you have some specifications in your mind, and you write down the specifications and there are two points. You can make an error. You can either have an error in your program, you can have an error in the specifications that you wrote, or you can have nothing. So most of the times, because there is this automation, if both the program and the specification is correct, and because we use this SMT solver, it's like everything works perfectly. Now if there is an error, it is much more or less like standard type checking. So you'll get an error, you'll get a location, but then like how you go from there to fixing the type of the specification, it's not a very easy task. So I think like that for me, refinement types are definitely easier than dependent types because there is a lot of automation and you don't need, in most of the cases, you don't need to explicitly proof things, but it is a little bit more difficult than type checking. Strongly typed language like Haskell, for example.
A
Yeah, because you're putting these predicates right into your type.
B
Yeah. And because the type checking algorithm on the back is a little bit more difficult. So basically error reporting is the process of trying to explain to a human why the algorithm failed. And the more difficult the algorithm is, the more messy the error message is because it is trying to abstract from everything that it did and explain you what went wrong. I think to make refinement types Even more approachable error reporting is a way to go. The other thing that we have mentioned is, I guess already many times, is to give more motivation. We said how refinement types can be used to speed up your code. But this is again, you can do that manually. But if, for example, the refinement. The compiler would see refinement types and use this information to speed up your code by removing runtime checks, then I think that this would be a very good motivation for people to use such systems.
A
The compiler, you lost me there because I thought the compiler isn't aware of these refinements because they're sitting outside.
B
Yeah, yeah, right now it is not aware. But what I said is that one future direction is for the compiler to know the refinement types and use these types to optimize runtimes or memory of the code.
A
Like for instance, if your integer is always greater than zero, then maybe memory doesn't need to be as large because it's actually just a positive value. It would take up half as much space or something, or at least I don't know how ints are represented in memory, but some less amount of space.
B
Something like that. Yeah, but we can stay in the present.
A
So the other thing that I found on the Liquid Haskell website is something mentioning proving laws by writing code. So what does that mean? And why would I want to do that?
B
Okay, so I think that why would you want to do that is easier. So we can start, start from that. The laws are basically some properties that class methods satisfy. For example, when you have monoid, you can say that it is associative and you can use these properties first of all to reason why your code is correct. And also maybe you could use these properties to again optimize your code. For example, a common optimization is I think map fusion that is actually expressed in one of Haskell Cloud's laws. So the question why prove this is functional correctness, that your function is doing more or less what you have in your mind. And second, maybe you want to use manually or even compiler optimization that use these laws.
A
Yes, if you have a type class and it says okay, this operation needs to be associative, so you can yes, apply it one way or the other. Like right now, at least in Scala, like that might be a comment or something that somebody puts in, but you're saying we could use refinement types to actually express that as a verifiable statement. And is that statement in the refinement code comments or where does it live? Like what does it look like, basically.
B
What we do this is like, I think it's been one year since we added this feature in Liquid Haskell is we use Haskell functions that return just a unit value and refine them with a logical predicate to express theorems like that the list append is associative. And the way we do it is we say like, okay, I'm going to define a function that, for example, take as input a list X's, a list Y's and a list as S and returns Haskell unit value. And this unit value is refined in my refinement logic to satisfy associativity of the input Xs, Ys and Z. So we say X's append to parenthesis Y's and Z is equal to parenthesis X is appendwise parenthesis. So we use Haskell functions that return unit usually because, like, these functions are never going to be used at runtime, we don't care. And we define them to express theorems about their inputs and other Haskell functions. And we do this because, for example, append associativity is not a property that SMT prover can automatically prove. But to prove this we need like to tell to the prover some information about how the functions that we want to prove properties about behave. So, for example, to prove associativity, basically the proof is like inductive proof. It has a base case, it has an inductive case that is using the inductive hypothesis. And all these things about like induction, we don't want to ask the SMT's to prove them automatically. So we use like some Haskell functions to provide some of this, like the structure of the inductive proof and some information that the SMT doesn't know. And then again, like, we give all this information to the SMT and we let them like figure out the details and conclude the proof. So in short, we use refined Haskell types to express theorems about functions. And then the body of this function basically provides a template for the proof of the theorem. And combined with smtn, the theory of refinement types, we prove these theorems.
A
I know that because I've done a bunch of episodes about types that I have some listeners who are interested in, like proofs and how they relate to types. Is Liquid Haskell and refinement types a way that you would recommend to approach learning about proofs, or is it more a side effect of its implementation?
B
You mean about formal proofs?
A
Yeah. Is Liquid Haskell a tool that people could use to learn about formal proofs.
B
Yes. Actually my last Haskell paper was named Theorem proving for all. And in this paper we basically claim that especially if you know Haskell, then it's very easy to learn using liquid Haskell, how to express formal reasoning and proving formal properties about your Haskell program. But again, like as we already mentioned, Haskell and logic are very close. Actually they are just like an error. Like the only thing that exists in Haskell and does not exist in logic is this error at the invergence. If you exclude that, then basically you can treat Haskell as your logic and there develop proofs like the standard inductive proofs about math properties.
A
What was the thing you said is missing?
B
So what exists in Haskell and there doesn't exist in the logic is divergence and runtime errors. So these are the only two effects that Haskell have. You can create divergent programs and then you can cross, you can give an error. And so actually when we use liquid Haskell to type proofs, we make sure that these two effects are excluded. So again, as I said, you can use refinement Haskell functions to express theorems. And then the proof is basically a Haskell function. And liquid Haskell will check that this Haskell function cannot diverge and cannot throw an error. So you can express this associativity property that takes three arguments and returns unit refined to satisfy associativity. And you could develop a proof that basically says give me three inputs and I'm going to return error. Error satisfies every type. So this would be a valid proof. So this is why we need to exclude errors. And also it's the same for divergence, like in Haskell we use a lot, it's undefined. Normally we use it to say I'm going to define it later. And the type of undefined is A for all A's, which in the refinement types translates like for every refinement. And its definition says basically undefined is equal to undefined. So divergence, again, like satisfies all types, it satisfies all theorems. So if you combine error and divergence, you cannot have a theorem prover, a formal tool that gives mathematical proofs. But liquid Haskell like excludes, it's having a termination checker, it's having totality checker, and it checks that your proofs are actually mathematical proofs.
A
So yeah, no error and no underscore for undefined. So you mentioned totality checking. Back when I interviewed Stephanie Weyrich, I asked her if there should be mainstream programming languages that do like totality checking. So why does liquid Haskell bring that to the table?
B
Why we have the totality checker.
A
Yeah. And how is that useful? How is it useful in fact in terms of not just proofs, but how is it useful in terms of verifying your code is performing correctly? Verification. What is totality checking? What's a total function?
B
So a total function is basically a function that we said before, a mathematical function, right? It cannot diverge, it doesn't call error, and all the cases are complete, like it is defined for all its inputs.
A
The example we used of taking the head of a list that is not total.
B
Yeah, this is where I was going, right, that I'm writing a module, after some calls I am calling head, right? And usually like it's a very natural specification to say that head is called a non empty list. But the question is like how did this specification came from? And checking that all your functions are total is basically what is giving you the specifications that are propagated. So head is not a total function because it's not defined for empty lists. But for example, you can use refinements to say that the input of head is a list whose length is always greater than zero, for example, and then the specification will be propagated.
A
And this is how the totality checking lets us do better verification, Right, because.
B
It is tracking the potential sources of error.
A
And then for the proofs that we were talking about, totality is also required.
B
Yes, because for example, then you could define inductive proof without the base case or without an inductive case. So if I was not checking totality, I could have the associativity proof only defined for the base case. And the thing that precludes me from that is that I need to check that it's defined for all potential inputs, both for empty lists and for non empty lists.
A
The other thing, what about infinite loops? What if I just write something that loops forever and claims to return something but never returns?
B
So that is a very interesting question. Yeah, so you could reason about that. And I think that many formal systems reason about infinite loops as long as they are. So I think infinite loops are good if they are what we call productive. So they may diverge, they may never finish, but they should always produce something like refinement types. Maybe I should talk about liquid types. Liquid Haskell are using this smt. So we map Haskell to logi, and in logic nothing diverge. The easiest thing is just to make sure that you don't have infinite loops. So what Liquid Haskell is doing is that it is checking that all your code is not diverging. And if it cannot prove that, it will tell you, oh, you know, like this function, I cannot prove that it terminates. So maybe something will go wrong in your logic. I mean, in practice, definitely. Divergence is something that we want for real programs. You can formally reason about that, but it is difficult.
A
So, yeah, we talked a lot about the refinement types and where they're useful and stuff, so. But, you know, your project is Liquid Haskell, so usually before I interview somebody, if they have like a. A paper or whatever, I'd like print it out. I guess it's your dissertation. It was too big. I didn't print it out. It's like 248 pages.
B
Yeah, it is like a bunch of Liquid Haskell papers altogether.
A
So how big is the Liquid Haskell code base? How long did it take to construct, and how big of a project is this?
B
So it's a big project. It's been some years now. It is my first and only project of my PhD, which started 2011. So I guess, like, I think it started 2012 or something like this. And I have to say that liquid types existed for OCaml at the beginning. And when I joined my PhD, my supervisor, Ranji Della, was at the point that he was porting Liquidite from ocamel to Haskell. And this is where we started. And it's been how many? 80 years now and we're still on that.
A
That's awesome.
B
But that's the thing. The Haskell community is so good that it accepts everything. I mean, I think if we stayed in OCaml, maybe we wouldn't continue that. I don't know.
A
Yeah, you're getting uptake from the community, which is awesome.
B
Yeah.
A
So do you have to. I'm doing an interview soon about building like a compiler. So do you have to do compiler type stuff in Liquid Haskell, like build an AST and somehow feed it to the solver? Does it end up having components like that?
B
Yes, we have an ast, but we have an AST only for types. So again, like what Liquid Haskell is doing, mostly it has a lot of weird engineering, but it takes Haskell code, some refinement types and reduces this to SMT solver language. And then if the SMT solver finds that something is not valid, it takes it maps back the result. So we have an AST for the types. And what is really cool is that we are using the as for expressions, we are using GHC's AST and this reduces a lot of work because instead of parsing the code, we compile it with ghc. GHC is doing a bunch of simplifications and then we ask GHC, give me the ASD of the program that you compile. And what is amazing is that the intermediate language of GHC is I think like seven alternatives. So it has variables, literals, case, and like, in total, like, I think you have seven cases, which simplifies a lot, the analysis of Haskell programs. And the cool thing is that if you ask GHC to give you this very short intermediate representation and you analyze this and you can target the whole Haskell. So all these class syntax that you write, all these guards, everything reduces to this very tiny intermediate representation.
A
That's very cool. Yeah, because it's such a large language with all the extensions and stuff. If you had to build your own like tokenizer and all this stuff, it would be a lot, right?
B
It has legdity, has this API that you can use and you can call it and you can ask it questions very.
A
That sounds like that would make your project a lot easier or that it does.
B
Yes it does, because like you can definitely restart assured that you target the whole Haskell. It's not that there is a hidden syntax or something that you haven't thought about. You really don't care what the user writes. As long as you can analyze this small intermediate language, you're fine.
A
So do you think that other people should be trying to build verification tools that sit outside of the language? Like maybe not refinement types, but just in general, like, I don't know, a tool to validate that my team's JavaScript doesn't hit certain conditions. Is that something that is under expl in the world of professional software development?
B
So I'm not sure I understand what you're asking. Is it like if we should use verification or if it should be part of the language or external to the language?
A
My question is sort of like, should people try to build verification tools on their own, like professional software developers, to kind of try to verify code? Should I try to build a verification tool that sits outside my language?
B
Yeah, I think, yeah. And I think actually they are doing it. So I know Facebook has a tool called Infer. Right. Especially the big companies are moving towards verification. I mean, even the fact that you're using like people are using in production languages like Haskell or Scala or this like, safer language means that people do care about verification and they are willing to invest maybe more time to learn to understand like a complicated but safer language. So I think, yeah, that adding verification in industrial products is definitely something that can happen. And I think that we are going there. I mean, you will spend a lot of time writing tests anyways, right? So I don't know if it is a small step to going from all these specifications and the tests, like what you have in your mind, into actual verification, but this is our goal for I guess the more academics to make it be a small step from the tests to actual verification.
A
A test only, like it can only check for the absence of a problem, right? It checks one specific case. What you're talking about is verifying like for all inputs, right?
B
Yes.
A
So the power is there.
B
It is.
A
Well, I think that's all my questions. Nikki, thank you so much for your time. Is there anything else you. You wanted to mention?
B
No, just thank you very much for this interview.
A
Awesome. It's been great fun.
B
Thank you.
A
That was the show. I hope you enjoyed it. I think it ties in with some of the conversations we've been having on this show's Slack channel about dependent types and constraint propagation and type level programming. If you have any feedback, let me know. Someone named Autotopism gave a great review on itunes and Uncarved Bitmap gave a great shout out on Twitter. Mark Weiss, the host of using Reflection, also wrote a great review of the podcast and my co worker Elliot mentioned the show on Twitter. So thank you so much everybody. I'm sure I missed a whole bunch of other people. Podcasting is a somewhat solitary endeavor and my enthusiasm for podcasting kind of lives and dies by the little dopamine rush I get every time someone use an itunes review or mentions me on Twitter or LinkedIn or when I get a Google alert saying that somebody mentioned me on some list of podcasts. So I really appreciate it everybody. Thank you very much. Until next time. Take care.
Episode: Refinement Types and Liquid Haskell With Niki Vazou
Host: Adam Gordon Bell
Guest: Niki Vazou, Assistant Professor at IMDA Software Institute and Creator of Liquid Haskell
Date: May 15, 2019
This episode delves into the world of refinement types, formal verification, and the journey toward making theorem proving a mainstream part of programming. Adam Gordon Bell interviews Niki Vazou, a key contributor to Liquid Haskell, discussing how refinement types can push software verification forward, what sets them apart from dependent types, the use of SMT solvers, and the challenges and opportunities for bringing advanced type systems to everyday programming in mainstream languages.
Formal Verification:
Types as Verification:
Refinement Types:
Int, specify that an integer is non-zero for a divisor function:
Compile-time vs. Runtime Checks:
None).Examples:
Role of SMT Solvers:
Analogy to Sudoku:
Interaction with Type Inference:
Handling User Input:
Type Error Reporting and Runtime Checks:
Liquid Haskell:
Other Language Support:
Mainstream Adoption:
Future Potential:
Proving Laws as Code:
Educational Tool:
Caveats:
Adoption in Industry:
Distinguishing from Tests:
| Timestamp | Segment / Theme | |-----------|------------------------------------------------------------| | 01:24 | Niki’s goal: theorem proving in mainstream programming | | 02:36 | Formal verification explained | | 05:39 | Introduction to refinement types | | 08:06 | Compile-time vs. runtime checks and performance | | 11:05 | SMT solvers and their role | | 13:19 | SMT solvers & Sudoku analogy | | 16:08 | Propagating runtime checks via types | | 19:47 | Dependent types vs. refinement types | | 22:31 | Bringing refinement types to mainstream languages | | 25:57 | Refinement types in Scala & other languages | | 28:15 | Haskell’s advantages for refinement types | | 30:43 | Error reporting and usability | | 34:05 | Proving laws by writing code in Liquid Haskell | | 36:54 | Liquid Haskell as an education tool | | 39:57 | Totality checking and its importance | | 42:07 | Handling divergence (infinite loops) | | 44:51 | Internals: using GHC’s AST | | 47:42 | Verification tools in industry; difference from tests |
This episode offers a comprehensive, approachable exploration of refinement types and formal verification for software developers. Niki Vazou demystifies how these tools can be both powerful and practical, their relationship to dependent types and language features, and the technical machinery (like SMT solvers) that make them possible. Throughout, there’s an optimistic vision: as tools like Liquid Haskell mature, formal methods might not just be for experts, but will become an invisible, reliable part of writing correct and efficient software everywhere.