![BHIS Podcast: Py2K20 - Transitioning from Python2 to Python3 — Talkin' Bout [Infosec] News cover](https://img.transistor.fm/AukI425sRBc3M3UIa9lVng7qjeNeYEQ8BZfzCEXhALs/rs:fill:0:0:1/w:1400/h:1400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8xZTA1/ZWZhNDcxZGM4ZTFj/ZGJhMTMwNmYzMmJj/ZjBkNi5wbmc.jpg)
In this podcast (originally recorded as a live webcast), we talk about the 2020 end of life for Python 2. We address what the short- and medium-term impacts will likely be. Key language differences will be highlighted with techniques to modify your code to
A
Hello from Spearfish, South Dakota, it's the Black Hills Information Security Podcast. This is the podcast version of our webcast, so some of the slides we reference will be missing, but you can find the whole episode on our YouTube page. This is Transitioning from Python 2 to Python 3 with Joff Thyer. Enjoy.
Thanks, everybody, for attending today. I was struggling with a title for this one, but I ended up landing on Py2K20, which is a nod to Y2K. I know most of you are probably not even old enough to remember that, but this is a fairly big transition from Python 2 to Python 3, people have been a little resistant to it, and I want to talk about some of the things that will change and what that's going to look like for you. Jason already said it, but who's this guy? Joff Thyer: pen tester, researcher, and developer at Black Hills. I am a SANS certified instructor. We've already talked about Security Weekly and the SANS material. If you want to look up the Automating Information Security with Python class, sans.org has everything you need: the course outline, the syllabus, and the various instructors who teach it. Mark Baggett is the author of the class, and Mark Baggett, Mike Moore, and I are the main instructors right now, although we are onboarding some other people.
So what's our agenda today? First, what's happening with the Linux distros and the potential impact. Then we'll talk about the `__future__` module with respect to Python 2.7.x. Then we'll step through a series of slides on the language differences between Python 2 and Python 3: keywords, division, print, format strings, input, string objects, some file handling, and some memory-saving enhancements. We'll talk about variable scope a little bit, and since that's a lot to take in, we'll wrap up with a summary and questions. Some of you will be focused on module differences, such as what's happening with urllib2, urllib3, urllib, and all that mess. I'm not going to dive into that; I'm going to focus on what these upcoming changes mean for the core language, with a little bit of opinion as well.
So Python 2 reaches end of life. What actually changes? First of all, PEP 394 is the Python Enhancement Proposal stating that the `python` command on a Linux system invokes Python 2 by default. That may change in the future, but not necessarily in 2020. It may stay that way, and frankly, on Linux it's really just a symbolic link on the operating system. What will change is that maintenance releases of Python 2.7 stop in 2020. One other thing to note about PEP 394: its recommendation is going to be periodically reviewed. According to the PEP itself, it will be updated when the core development team judges it appropriate, and, as a point of reference, regular maintenance releases for the Python 2.7 series will continue until at least 2020. So PEP 394 hasn't committed to changing it, but we need to keep an eye on it. Now, what will the Linux distributions do? This is a predictive question, but my opinion is that in the short term they will do nothing.
The main barrier to changing the `python` symbolic link to Python 3 is, of course, breakage of third-party scripts and packages. The classic example is print, which in Python 3 has become a function instead of a statement built into the language; that would break immediately. Avoiding such breakage of third-party scripts is precisely why the PEP recommends that `python` continue to refer to Python 2 for the time being. However, the Linux distributions are free to change that if they want to, so we do need to keep an eye on it. If they do change it, what does it really mean? Not a lot. In the Linux world, it means that your friendly system administrator, who might be yourself, does an `su` and relinks the `python` command to Python 2 instead of Python 3. So how much do we really care? Somewhat, because some package update, maybe a software update to Python itself, may change that symbolic link back again. So it's really just going to be a maintenance headache for those in the systems administration role on Linux distributions that actually make the move. How does this affect other operating systems? On macOS, I'd expect Apple to follow a very similar path to the Linux world: a symbolic link pointing to either Python 2 or Python 3, and you may run into the same issue. With a Windows installation of Python, it's really up to you, because the Windows installer creates python.exe, and whichever version you install determines whether you're running Python 2 or Python 3. Now, let's talk about the `__future__` module a little bit. In Python 2, a future statement lets us import features that only become standard in a later version of the language.
A classic example in the Python 2 to Python 3 transition is making our Python 2 scripts forward compatible with Python 3 using a statement such as `from __future__ import division`. (We call names with double underscores on each side "dunder" names.) Python 3 effectively ignores such a forward-compatibility statement, but Python 2.7.x processes it to change the division or print behavior. That's useful because it tells the Python compiler, or interpreter, that a particular module should use syntax or semantics that will be standard in a specified future release. So the future statement is a migration feature that, in this example, lets us write Python 3 syntax inside a Python 2 script. It's a very useful feature, and we'll dive into it as we go through the presentation today. Next, core language keywords. Python 2 has exactly 31 keywords, and these are the meat and potatoes of the actual interpreter. I used Python on my OS X machine to illustrate this, but you could use Python on a Linux machine, a Kali distribution, or whatever; you'll see the same thing. Python 3 changed the core keyword set so that there are now 33 keywords instead of 31. What does that mean to us? Really, not a lot. The main change is that True, False, and None became keywords (along with nonlocal), while print and exec stopped being keywords. In Python 2, True, False, and None were not keywords. So not really a big deal there.
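If you want to check the keyword list on your own interpreter, the standard-library `keyword` module exposes it. This is a small sketch for Python 3; the exact count depends on the 3.x release (Python 3.7 added `async` and `await`, for example), so treat 33 as the number for the interpreter used on the slide rather than a fixed constant.

```python
import keyword
import sys

# keyword.kwlist holds the reserved words of the running interpreter.
print("Python %d.%d has %d keywords"
      % (sys.version_info[0], sys.version_info[1], len(keyword.kwlist)))

# These became keywords in Python 3.
for word in ("True", "False", "None", "nonlocal"):
    print(word, keyword.iskeyword(word))   # True for each

# These were keywords in Python 2 but are ordinary built-in names in Python 3.
for word in ("print", "exec"):
    print(word, keyword.iskeyword(word))   # False for each
```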
Division, however, is a big deal. By division I mean mathematical division with the `/` operator. In Python 2, dividing one integer by another behaves like the floor operator: it performs integer division, and you will not get a floating-point result. You can change that behavior in Python 2 by making the dividend or the divisor a float, for example 5.0 divided by 3.
In Python 3, `/` always performs true (floating-point) division unless you explicitly use the floor operator, which is a double slash (`//`). Let's look at a couple of examples of this.
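Here is a minimal sketch of both operators. The `__future__` line is what you would put at the top of a Python 2.7 script to get this behavior; Python 3 accepts it and changes nothing. The output comments assume Python 3 (Python 2.7's print shows fewer digits for floats).

```python
from __future__ import division  # needed on Python 2.7; a no-op on Python 3

# True division: "/" always returns a float, even for two ints.
print(10 / 3)      # 3.3333333333333335

# Floor division: "//" gives the old Python 2 integer behavior explicitly.
print(10 // 3)     # 3

# "Floor" rounds toward negative infinity, which matters for negative numbers.
print(-7 // 2)     # -4

# With a float operand, "//" still floors but returns a float.
print(10.0 // 3)   # 3.0
```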
This is Python 2. Notice in the first example I'm dividing 10 by 3 in the Python interpreter: I have `10/3` typed out, and the interpreter gives me an integer result. Now, you all know that 10 divided by 3 is 3 and a third, not just the integer 3. If we want to force Python 2 to perform floating-point division, we can make the dividend or the divisor a float, and then we get 3.3333; in other words, a floating-point result. The other thing we can do in Python 2.7.x is import the division behavior from the future. If we do that, division in Python 2.7.x acts exactly the same as division in Python 3: `10/3` gives us a floating-point result, 3.3333 and so on. If we want integer division, we can say `10//3`, which mimics the original Python 2 behavior. So there are two operators here: the single slash for true division and the double slash for integer (floor) division. That's what we need to be thinking about going forward as we code in Python 3. This is a classic example of how the `__future__` module helps us out in Python 2. Of course, I have to show you the Python 3 example: if we divide 10 by 3, we get 3.333 and so on, and if we use the floor operator, we get the integer result. In other words, the Python 3 interpreter acts the same way as Python 2 after importing division from the future. Now, one of the changes that really trips people up is the print function. In Python 2, print was a keyword in the language, part of the core interpreter itself. In Python 3, print has become a function.
So what exactly does this mean? In Python 2, you can get away with the syntax `print` followed by a single- or double-quoted string, and it prints that output to the screen. In Python 3, that no longer works: you have to put parentheses around the arguments to the print function that you are trying to print to the screen.
So it's a bit of a challenge. We have to translate our scripts, because you'll get a syntax error if you try to use the original print statement without parentheses in a Python 3 script. Also, in Python 2, if you do use the parentheses, it will appear to work with a single argument, but with several comma-separated arguments Python 2 thinks you're printing a tuple, because parentheses around comma-separated elements also define a tuple. So Python 2 prints a tuple, and that isn't really forward compatible. Again, just like the division example, we can make print forward compatible by importing it from the future. My syntax on the slide is incorrect: I should have put double underscores around `future`, and I apologize. `from __future__ import print_function` means your Python 2 script can use the print function the same way Python 3 does, and you'll have compatible syntax. There is one other feature you need to be aware of. With the print statement in Python 2, you suppress the line ending by appending a comma to the end of the statement. Python 3 does not have that concept; instead, it takes it to the next level with a parameter to the function called `end`. If we say `end=""`, or maybe `end=" "` with a single space between the quotes, the end-of-line character for that print becomes whatever we specify inside those quotes: a space, a comma, or even nothing. By default, the end character is a line feed, so you don't need to specify it when you just want a line feed at the end of the line.
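A minimal Python 3 sketch of the print function and its `end` and `sep` keyword arguments. To make the output easy to inspect, it prints into an in-memory `io.StringIO` buffer via the `file` argument (that buffer is just a demo convenience); the `__future__` line is what a Python 2.7 script needs before it can call print this way.

```python
from __future__ import print_function  # required on Python 2.7; no-op on Python 3

import io

buf = io.StringIO()  # collect output in memory instead of on the screen

# Default end is "\n": each call finishes its own line.
print("hello", "world", file=buf)

# end=" " replaces Python 2's trailing-comma trick for suppressing the newline.
for key in ("a", "c", "d"):
    print(key, end=" ", file=buf)
print(file=buf)  # an empty print() just ends the line

# sep controls what goes between multiple arguments (a space by default).
print("10", "3", sep="/", file=buf)

print(buf.getvalue(), end="")
```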
A lot of us on the webcast have probably programmed in languages like C, C++, and Java that use format strings the way we've always known them: %d for decimal, %f for floating point, %x for hexadecimal, %s for strings, and so on. Python, at least version 2, inherited these format strings from C and C++. In the example code here, we've got a variable `a = "Joff Thyer"`, and underneath it a format string: `x = "Hello %s, what is your favorite color?" % (a)`. This creates a variable x in which the value of a is substituted into the string at the position of the %s characters, and then, of course, we can print it. Python 2-style format strings are something we all know and love if we've been programming in Python, and they will not go away in Python 3, which is a beautiful thing. However, Python 3 adopted a new style of format string that is a little more precise, and over time I've actually grown to love it and find it more useful. Python 3 adopts what I would think of as a C#-style format string syntax. Instead of %s, %d, %x, and so on, it uses curly braces containing an argument number, a colon, and then the fill character, alignment, length, and type. As it says on the slide, the argument number refers to the arguments you pass to the format method next to the format string. The fill character pads the string when you're doing alignment. The alignment is the caret, the less-than sign, or the greater-than sign, to align center, left, or right. The length, of course, is the width of the aligned field. And the type is the same as the Python 2 format string types.
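To make the two styles concrete, here is a small sketch comparing `%` formatting with `str.format`, including the `{argument:fill align width .precision type}` pieces just described. The variable names and strings are illustrative, modeled loosely on the slide examples.

```python
name = "Joff"
x = 10 / 3

# Old %-style formatting still works in Python 3.
old_style = "Hello %s, what is your favorite color?" % name

# New style: {0} refers to the first argument passed to .format().
new_style = "Hello {0}, what is your favorite color?".format(name)

# Argument 0, fill "0", right-align ">", width 6, 3 decimals, fixed-point "f".
ratio = "10/3 = {0:0>6.3f}".format(x)

# Argument 0, fill " ", center "^", width 8, string "s".
centered = "Hello [{0: ^8s}], how are you?".format(name)

# Doubled braces produce literal braces in the output.
literal = "{{not a field}} but {0} is".format(name)

for line in (old_style, new_style, ratio, centered, literal):
    print(line)
```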
So s for string, d for decimal, x for hexadecimal, and so forth. Here's an example of a Python 3-style format string. We have `x = 10/3`, and the very next statement in the interpreter is print, as a function: `"10/3 = "` followed by a format specifier in curly braces with argument 0, a colon, a 0 fill character, a width of 6, and `.3f` for floating point, and then we close the format string. That means I want to print a floating-point field six characters wide with three digits after the decimal point, so it prints `10/3 = 03.333`. It's a very precise way of doing format strings. Now, the other question that comes up is: what if I want curly braces inside my format string? You can do that by doubling the curly brace, and then you'll see a literal curly brace printed out rather than a format specifier. Another example is `name = "Joff"` and then a print of `Hello [...]`: inside the square brackets we have argument 0 in curly braces, a space as the fill character, centered, a field width of eight, and s for string. When you see the printout, you see "Hello [  Joff  ], how are you?", and notice how Joff is centered between the square brackets. The square brackets are in the format string just so you can see in the output that Joff has actually been centered. Any questions so far, Jason?
B
Joff, I think a lot of us, myself included, wondered why 10 divided by 3 came out as 3.333...35 and not 3.333...33.
A
Okay, that's largely due to the limitations of floating-point arithmetic. The computer stores the number in a limited amount of space (a Python float is a 64-bit IEEE 754 double), so when the entire floating-point field is used up, there's a rounding effect, because the real answer is 3.333 repeating forever. Computers are not infinite, so the precision has to stop at some point.
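A short Python 3 sketch of what is going on: `repr()` prints the shortest decimal string that maps back to the stored double, and the standard-library `fractions` and `decimal` modules can show that the stored value is close to, but not exactly, ten thirds.

```python
from decimal import Decimal
from fractions import Fraction

x = 10 / 3

# Shortest decimal string that round-trips to the same 64-bit double.
print(repr(x))                         # 3.3333333333333335

# The stored value is a binary fraction, so it is not exactly 10/3.
print(Fraction(x) == Fraction(10, 3))  # False

# Exact decimal expansion of the double that was actually stored.
print(Decimal(x))

# When exactness matters, use rational arithmetic explicitly.
print(Fraction(10, 3))                 # 10/3
```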
B
Thank you. I feel like there were a few people going, huh, what?
A
Yeah, right.
B
Then one last question. What about Splunk? Is it built on Python 2?
A
You would have to ask your Splunk representative. I would be guessing if I answered that, and I don't have a Splunk instance in front of me. I will say I suspect it's Python 2. You may want to ask them what the future holds for the product, but that should not stop you from using `from __future__` statements in your scripts.
B
Cool. That's it for now.
A
Okay, so what about keyboard input? Python 2 has a function called raw_input and also a function called input, and this has always been a bit of a source of confusion. The easiest way to say it is that raw_input was what everybody used in Python 2 to get input from the keyboard. If you used the input function in Python 2, it would work, but it actually evaluates whatever is typed in as a Python expression, and performing that evaluation leads to potential code injection vulnerabilities. So the input function in Python 2 is bad; it is a code injection issue. What a lot of people did in their Python 2 scripts was use raw_input. But raw_input goes away in Python 3, where input behaves like the old raw_input, so you immediately have a transition issue. One way to solve this, and I'd demo it if we had time, is a try/except block that reassigns the input symbol to the raw_input function when we detect that we're running Python 2. You could do that with a try/except block, or with an if statement that tests the Python version in the sys module. So there are a couple of ways around that.
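A minimal sketch of the two approaches just described: the try/except version rebinds `input` to `raw_input` only when `raw_input` exists (Python 2), and the version check uses `sys.version_info`. The name `read_line` is an illustrative choice, not anything standard. Either way you end up with a function that returns the typed text as a string instead of evaluating it.

```python
import sys

# Approach 1: try/except. On Python 2, make input() behave like raw_input()
# so typed text is returned as a string instead of being evaluated as code.
try:
    input = raw_input  # raises NameError on Python 3, where raw_input is gone
except NameError:
    pass  # Python 3's input() already returns the raw string

# Approach 2: an explicit version check.
if sys.version_info[0] >= 3:
    read_line = input
else:
    read_line = raw_input  # only reached on Python 2

# Usage (interactive): name = read_line("What is your name? ")
```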
Python 3 has made a major change with respect to strings. Strings in Python 2 were treated simply as sequences of bytes. That's wonderful, but we don't live in a byte-oriented world anymore. Python 3 treats all strings as Unicode text, and its default encoding is UTF-8. I think this is probably the number one issue that scares everybody and holds them back from adopting the new version, because it brings some baggage with it. Let's talk about that baggage a little bit. First of all, what is UTF-8? UTF-8 is a space-saving encoding for strings. We could encode every string with a fixed-width Unicode encoding, but then we'd be spending at least two bytes on every character (UTF-16), or even four bytes per character (UTF-32). UTF-8 saves space by interpreting the first few bits of each byte. If the first bit of a byte is a zero, that character is treated as 7-bit ASCII; in other words, it occupies a single byte. If the first three bits of the leading byte are 1, 1, 0, it's the start of a two-byte character, and the following continuation byte has to start with 1, 0. That leaves 11 bits to encode the character, so we can represent 2,048 code points (up to U+07FF), far more than the 128 characters in the ASCII table. In other words, we can represent Unicode characters well beyond ASCII, which is required because we live in a multilingual world these days, right?
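To see those bit patterns for yourself, this Python 3 sketch encodes one character from each UTF-8 length class and prints each byte in binary. The sample characters (A, é, €, and an emoji) are arbitrary choices, written as escapes so the script stays ASCII-only. You should see a leading 0 for the one-byte case, lead bytes starting 110, 1110, and 11110 for the longer ones, and continuation bytes starting 10.

```python
# One character per UTF-8 length class: A, e-acute, euro sign, grinning-face emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")          # str -> bytes
    lengths[ch] = len(encoded)
    # Show each byte as 8 binary digits so the prefix bits are visible.
    bits = " ".join(format(b, "08b") for b in encoded)
    print("U+%04X -> %d byte(s): %s" % (ord(ch), len(encoded), bits))

# Output:
# U+0041 -> 1 byte(s): 01000001
# U+00E9 -> 2 byte(s): 11000011 10101001
# U+20AC -> 3 byte(s): 11100010 10000010 10101100
# U+1F600 -> 4 byte(s): 11110000 10011111 10011000 10000000
```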
As Bart Simpson is writing on the board here: "I will go study encodings and properly use UTF-8." That's what you need to do when you're adopting Python 3: go study your encodings and properly adopt UTF-8. Now, if the first byte of a UTF-8 character starts with 1, 1, 1, 0, and the continuation bytes start with 1, 0, the character uses three bytes in total with 16 bits of payload, so we can represent up to 65,536 code points (U+0000 through U+FFFF); lead bytes starting with 1, 1, 1, 1, 0 begin four-byte sequences for everything beyond that. The reason I put the table in, and this is actually from the class, is that UTF-8 lets us flexibly represent lots of characters and save memory at the same time. Now, what does this mean with respect to Python? It means we need to translate between byte-oriented data and string-oriented data. First of all, Python 3 has an object type called bytes. Python 3 bytes are essentially what strings were in Python 2, because Python 2 represented strings as individual bytes. However, in Python 3, string objects hold Unicode text, so we need a way to convert between string objects and bytes objects as we make the transition. In Python 3, the encode method of a string converts it to bytes, and the decode method of a bytes object converts bytes back to a string. So we now have a bidirectional way of converting between the two data types, which is awesome. Another issue arises as we make the transition to Python 3, and that is codecs. Many of you have probably been used to using codecs in Python 2, where you take a string variable or string object, say `.encode('base64')`, for example, and get back a base64-encoded version of that string.
Well, in Python 3 they've basically hijacked the encode method, because it's now used to convert between string objects and bytes objects, so you can no longer do that. If you've been calling codecs from the string object in Python 2, you need to adopt the codecs module instead. The codecs module is available in both Python 2 and Python 3. We import it and then use codecs.encode and codecs.decode with various codecs such as base64, rot13, zip, and so forth. So we do have a way around that. The other thing this brings into play is some file handling differences as we read data from disk and write data to disk. When you're dealing with files in Python, the normal modes for opening files are r for read, w for write, a for append, and r+ for read and write. Python 3, though, has to recognize that it's dealing with Unicode text as well as bytes objects, so the t (text) and b (binary) modifiers now determine what kind of object you get back. If we open a file for reading in text mode, the data is decoded into Unicode strings, using the platform's default encoding, which is UTF-8 on most Linux and macOS systems. If we open it in binary mode, the data comes back as bytes objects. The other thing that happens with files is that Python 3 properly handles line-ending issues. This goes way back to when we first dealt with the differences between Unix, Linux, and Windows, and I bet everybody on this webcast has done this: you picked up a file from a Unix system (not Linux, Unix, because I'm old school), took it over to a Windows system, opened it in Notepad, and suddenly noticed that the text file has no line endings, right?
It's all mushed into one big blob of text, and you swear up and down and get very frustrated. That's a legacy of Windows expressing the end of line as backslash-r backslash-n (carriage return and line feed), versus Unix expressing it as just a line feed. Python 3 properly handles those line-ending translations when you're in text mode, which is awfully convenient. So, a couple of quick examples in Python 3. Here I decided to open the file /bin/bash, and notice how the second parameter to open has an r and a b in it, meaning we're opening it in read binary mode. Then we read 40 bytes from that file using the file handle's read method. If we look at the type of the object we get back, it's of class bytes, meaning Python 3 gave us a byte-oriented object to reflect the fact that this is a binary file. If we instead open another file, down toward the bottom of the slide, /etc/passwd for example, in read text mode and read some content, the type of object we get back is class str, a string object decoded with the default encoding (UTF-8 here). Any questions?
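Pulling this part together, here is a hedged Python 3 sketch: `str.encode`/`bytes.decode` for text, the `codecs` module for base64 and rot13 in place of Python 2's `.encode('base64')`, and text versus binary mode on a temporary file with Windows line endings (used instead of `/bin/bash` and `/etc/passwd` so it runs anywhere). The file name and contents are made up for the demo.

```python
import codecs
import os
import tempfile

# str <-> bytes: encode() produces bytes, decode() turns bytes back into str.
text = "caf\u00e9"                  # the word 'cafe' with an accented e
raw = text.encode("utf-8")          # b'caf\xc3\xa9' (the accent takes 2 bytes)
round_trip = raw.decode("utf-8")    # back to the original str

# Python 2's "hello".encode("base64") is gone; the codecs module does it instead.
b64 = codecs.encode(b"hello", "base64")   # b'aGVsbG8=\n'
rot = codecs.encode("hello", "rot13")     # 'uryyb'

# Text vs binary mode, using a throwaway file with Windows (CRLF) line endings.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")   # hypothetical demo file
    with open(path, "wb") as f:
        f.write(b"line one\r\nline two\r\n")

    with open(path, "rb") as f:              # binary: bytes, left untouched
        as_bytes = f.read()

    with open(path, "rt", encoding="utf-8") as f:  # text: str, CRLF -> "\n"
        as_text = f.read()

print(type(as_bytes).__name__, repr(as_bytes))
print(type(as_text).__name__, repr(as_text))
```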
B
Joff, when was `__future__` introduced in Python 2? Someone is wondering whether it might not be available on older systems with outdated Python 2 installations. Is `__future__` available in Python distributions like IronPython or CPython?
A
`__future__`, I believe, and if anybody looks it up and wants to correct me on this one, they're welcome to, was introduced right around Python 2.1 and has been available since then. Is it available in things such as IronPython? It should be. I haven't had direct experience testing that hypothesis. However, IronPython is based on Python 2 and should have the same features as the equivalent Python 2 interpreter. I don't think IronPython has a Python 3 version yet, although last I looked they were very, very close.
B
A few more questions. If you try to open a file in the wrong mode, does Python have a way of knowing, or does it allow it?
A
If you open a file in the "wrong" mode, Python doesn't know it, because open() doesn't inspect the contents; it simply allows it, and you get the data back in whatever mode you opened the file in. So Python is agnostic that way. (One caveat in Python 3: text mode decodes the data as it is read, so reading binary data that isn't valid in the chosen encoding raises a UnicodeDecodeError.) All right, some memory-saving features. This, to me, is very useful. In Python 2, various built-in functions returned a list as their result; examples are range and map. There are other examples, such as the dictionary methods I'll get to in a minute, that were also quite list-intensive. What do I mean by list-intensive? If you had a very large range of numbers, for example, and used the built-in range function, a list was created in memory, and that list consumed whatever memory the object representation of all those integers requires. In Python 3, many of the built-in functions return lazy, iterable objects rather than lists, and this is actually a good thing. These iterable objects let us loop across the result of the function rather than generating a brand-new list in memory. In other words, Python 3 has tried to improve memory efficiency in the interpreter by moving these functions away from building a list for their results, and range and map are classic examples. range generates the integers between a start value and an end value, with an optional step value. map is one of the very useful functions in Python that lets us apply either a built-in function or a self-defined function to each element of a list, or of multiple lists. In Python 3, both of these functions return an iterable range or map object instead of the list they returned in Python 2. Dictionaries have changed as well. A dictionary in Python is a key-value data structure; some people know dictionaries as hash tables or associative arrays (hashes, if you've used Perl). The key can be just about any hashable object in Python, and the value can be just about any other object. Think of associating names with their first initials: a points to Apple, b points to Bob, and so forth. To create a dictionary in Python, we can use `a = dict()` or `a = {}`. There are a number of methods associated with dictionaries; the three we're most familiar with are `a.keys()`, `a.values()`, and `a.items()`. In Python 2, all three of these methods returned lists, and you can imagine that with a really large dictionary in memory, creating a new list was quite memory intensive. Python 2 also had iterkeys, itervalues, and iteritems, which return iterators and do much the same job as Python 3's keys, values, and items methods, which in Python 3 return iterable view objects. I didn't say that very well, so it's probably easier to see as an example on screen. Here we've created a dictionary d using curly braces: the key a is associated with the value alpha, the key c with charlie, and the key d with the string delta.
If we call the keys method on that dictionary in Python 2 and wrap it in the type built-in function, we get back the type list. In other words, the keys method returns an actual Python list. If we instead use the iterkeys method in Python 2, it returns a dictionary-keyiterator object. This is our iterator. What is this iterator thing? It's an object we typically consume once, inside a loop. So here we've got `for i in d.iterkeys():`, printing each value of i and suppressing the newline with a trailing comma. This is Python 2 syntax, and notice how we print a, c, and d, corresponding to the keys a, c, and d. We could easily have written this for loop with just the keys method and seen exactly the same thing; the only difference is that keys creates a list in memory before iterating across it, so it consumes a little more memory. In Python 3, the first thing we observe if we try the same thing is that the iterkeys method is gone; it no longer exists. If we create our dictionary, d equals a: alpha, c: charlie, d: delta, and so forth, and try to call iterkeys, we get a traceback saying the 'dict' object has no attribute 'iterkeys'. Instead, we fetch a view of the keys in Python 3, which is an iterable object, using just the keys method. If we then iterate across that view, for example `for i in x`, where x holds the value returned from that method, we see the keys a, c, and d. In this case, I'm printing each key with an end character of a space in my print function.
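A short Python 3 sketch of the lazy objects and views described here: a large `range`, a `map` iterator (which, unlike a range, can only be consumed once), and a dictionary keys view that picks up a key added after the view was fetched, as the talk goes on to show. The dictionary contents mirror the slide; the other values are arbitrary.

```python
# range() returns a lazy range object; no list of 500,000 ints is built.
r = range(0, 1000000, 2)
print(type(r).__name__, len(r), r[10])   # range 500000 20

# map() returns an iterator; wrap it in list() when you really need a list.
m = map(str.upper, ["alpha", "charlie", "delta"])
print(type(m).__name__, list(m))         # map ['ALPHA', 'CHARLIE', 'DELTA']
print(list(m))                           # [] because the iterator is used up

# Dictionary methods return views instead of lists.
d = {"a": "alpha", "c": "charlie", "d": "delta"}
x = d.keys()
print(type(x).__name__)                  # dict_keys
print(hasattr(d, "iterkeys"))            # False: iterkeys is gone in Python 3

# A view is live: it reflects keys added after it was fetched.
d["z"] = "zulu"
print(sorted(x))                         # ['a', 'c', 'd', 'z']
```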
Now, one of the cool things about this view in Python 3 is that if I were to enter another key-value pair into the dictionary, such as this one right here, where I'm setting D with the key Z equal to zulu, and then iterate through the keys of the dictionary again, notice how the view has been updated. In other words, Z for zulu now shows up in the keys. I did not refetch the keys; I simply iterated over the X variable, which already contained the keys view. So, invisibly in the background, if you like, Python has updated the view of the keys for you, which is a really terrific feature. Another thing that has changed in Python 3 is variable scope in list comprehensions. This is a fairly minor change, but it may have an impact if you're using list comprehensions and depending on the variable still being in existence after the list comprehension is finished. What do I mean by this? Here's an example in Python 2. We're setting this x variable to the integer 999. Then we're setting a equal to 1, 2, 3, 4, 5 inside of a list, so we have a list of integers. The next statement is b equals x multiplied by 2 for x in a if x is less than 10, inside of square brackets. This is a classic list comprehension. List comprehensions have up to three components: an expression, then an iteration, which is the actual for loop in the middle here, and then an optional filter, which is the if test at the end of the list comprehension. And notice how b then has 2, 4, 6, 8 and 10 as a result, which is each of the elements from the a list multiplied by two. So list comprehensions allow us to create a new list using this expression, iteration and filter kind of syntax.
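In the talk's example every element of a is below 10, so the filter never actually drops anything. The sketch below adds two hypothetical elements (11 and 12) so the filter has something to remove, and shows the equivalent explicit loop for comparison.

```python
a = [1, 2, 3, 4, 5, 11, 12]

# expression: x * 2 | iteration: for x in a | filter (optional): if x < 10
b = [x * 2 for x in a if x < 10]
print(b)  # [2, 4, 6, 8, 10]

# The same result written as an explicit loop.
b_loop = []
for item in a:
    if item < 10:
        b_loop.append(item * 2)
print(b_loop == b)  # True

# Without the filter, every element is kept.
c = [x * 2 for x in a]
print(c)  # [2, 4, 6, 8, 10, 22, 24]
```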
The other thing to realize about list comprehensions in Python 2 is that if we look at the variable x after performing the list comprehension, it holds the last value of the iteration. In other words, x retained the number five because five was the very last element of the a list. In some cases this might be considered suboptimal, or even a bug; some people might consider it a feature. The thing you need to be aware of is that you cannot depend on it in Python 3. In Python 3, if we perform the very same list comprehension, the x variable that we use inside the list comprehension actually has its own new local scope. In other words, the list comprehension gets its own scope and gives us an x which is unique just for the duration of that list comprehension's execution. What do I mean by this? Well, look: we have x equals 999 and a equals 1, 2, 3, 4, 5 in square brackets. Here we perform the exact same list comprehension, we look at our b list, and we get 2, 4, 6, 8, 10. But if we go back and look at the original x variable, it has retained the value 999, which is the value of the globally scoped variable that was there to begin with. In my mind, from a pure computer science perspective, that is exactly what you would want it to do. So this is a subtle change, but it's one you ought to be aware of if you're depending on the variable used in a list comprehension living in the same scope as the surrounding code. And that is actually all I have for the webcast, because I do want to leave us some time for questions. Some helpful resources here. Of course, you can go look at the SANS course Automating Information Security with Python, and I highly recommend that you do. I'd love to have you in class with me. We have a lot of fun in the class.
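The scope change can be checked directly with the talk's own values. This sketch runs on Python 3; the comments note what Python 2 would show instead. The last_seen line is just one hypothetical way to make a dependency on the "last value" explicit rather than relying on the leak.

```python
x = 999
a = [1, 2, 3, 4, 5]
b = [x * 2 for x in a if x < 10]

print(b)  # [2, 4, 6, 8, 10]
print(x)  # 999 on Python 3; Python 2 prints 5 because x leaked out

# If code depended on the leaked value, state the intent explicitly instead.
last_seen = a[-1] if a else None
print(last_seen)  # 5

# Note: an ordinary for loop still binds its variable in the enclosing scope.
for n in a:
    pass
print(n)  # 5
```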
I think it's one of the best classes that we offer at SANS, but then again I'm a little biased. I absolutely and wholeheartedly recommend that you look at docs.python.org; there's also a porting section in the HOWTO area of docs.python.org that you can look at. There is a link for Python 3 for Scientists on readthedocs.io, which is a transition guide that mentions many of the same things I've mentioned in this webcast. There's another one on the Python wiki, Porting to Py3k: a bilingual quick ref. I will tell you that if you take care of your strings, your print statements and your input side of things, as well as leverage from __future__ imports, you've pretty much got 80 to 90% of the job done. The rest is going to be debugging if you hit some sort of feature that's buried deep in your code, but that should not be too difficult. Just a quick shout out to pythonconverter.com. It's an online resource (there are probably others you are aware of as well) that automatically translates Python 2 code to Python 3. I did a quick exercise while I was preparing for the webcast and pasted a script in. On the left-hand side here you can see that I've created a dictionary, then I have used iterkeys, then I have printed a variable out of the dictionary, and then I performed a quick division. Then I pressed the translate button and ended up with what pythonconverter.com thought was my Python 3 compatible code. It did correctly change iterkeys over to the keys method, and it did correctly translate my print statement. It did not change my division. However, I suspect that is quite correct the way it's written, because my division used a floating point number as one of the operands, which means it will act the same way in both languages in this particular case. And that's all I've got for now.
If you want to contact me or comment or otherwise, please avoid trolling. You can find me on the Twitters othyer and look at some of those additional resources. So let's open it up for questions at this time. Sure.
B
With the change in string syntax, can you still easily use a list or array of arguments and access them by their respective position, like argument zero? Or is this something that has changed?
A
Yeah, so I think what the person is asking is whether you can still use indexing and slicing on Python 3 strings, which I didn't talk about: picking out an individual part of the string by putting the index number in square brackets after it. And the answer is yes.
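A short Python 3 sketch of the indexing answer above. Strings index and slice as they always have; the one difference worth knowing is that indexing a bytes object now returns an integer. The argv list is a hypothetical stand-in for sys.argv, which is still an ordinary list indexed by position.

```python
s = "python3"
print(s[0])    # p
print(s[-1])   # 3
print(s[0:6])  # python

# bytes behave differently: a single index gives an int in Python 3.
raw = b"python3"
print(raw[0])    # 112, the byte value of "p"
print(raw[0:1])  # b'p' (a slice stays bytes)

# Hypothetical argument list standing in for sys.argv; position 0 is the script.
argv = ["script.py", "-v", "10.0.0.1"]
print(argv[0])   # script.py
```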
B
Reiterating and reformatting a previous question: have all or most of the red team Python tools or libraries you've been using for the past five years been migrated to Python 3?
A
So have all the modules? Is that the question?
B
Have all or most of the red team Python tools or libraries you've been using for the past five years been migrated to Python 3?
A
Yeah. So in my experience the answer is yes, for the most part. But be careful. If you're writing, for example, proof-of-concept code to perform some sort of injection attack against a network service, you're probably going to want to do that with bytes objects. That's one thing to look at. The other thing that frankly has become difficult is urllib, and somebody mentioned this earlier: urllib, urllib2, urllib3. I would still highly recommend that you look at the requests module when you're dealing with HTTP transactions, because that's really the way to go. Requests is just a much better module. I think the real answer, though, is that when you get down to the low level, you're going to be focusing on converting some of your scripts to use bytes objects, especially if you're doing things like bit masking and shifting bits around.
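A sketch of what "doing it as bytes objects" can look like in Python 3. The wire format here (a 2-byte big-endian length prefix followed by a body) and the XOR key are hypothetical, chosen only to show building a payload, keeping str and bytes apart, and bit masking on bytes.

```python
import struct

# Hypothetical wire format: 2-byte big-endian length, then the body.
body = b"HELLO"
payload = struct.pack(">H", len(body)) + body
print(payload)  # b'\x00\x05HELLO'

# Mixing str and bytes raises TypeError in Python 3 instead of silently working.
try:
    _ = b"GET " + "/index.html"
except TypeError:
    print("cannot concatenate bytes and str")

# Encode text explicitly before putting it on the wire.
request_line = b"GET " + "/index.html".encode("ascii") + b" HTTP/1.0\r\n"

# Iterating or indexing bytes yields ints, so bit operations work directly.
key = 0x5A  # hypothetical single-byte XOR key
masked = bytes(b ^ key for b in body)
restored = bytes(b ^ key for b in masked)
print(restored == body)  # True: XOR with the same key twice restores the data

high_nibble = (body[0] >> 4) & 0x0F
print(hex(high_nibble))  # 0x4, since "H" is 0x48
```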
B
How do you feel about IronPython?
A
I love IronPython. I think it's a great project. Anything that is able to access the .NET runtime in a different way, with different languages, gives us a tremendous ability to do more interesting things as pen testers, forensicators and defenders. And IronPython gives us direct access into the .NET Framework, which is really fantastic. I just hope that they go to Python 3, because that's, well, kind of important, in my opinion. If you are interested in IronPython, I highly recommend that you look up some of Marcello's work with SILENTTRINITY, and someone actually has a message on SILENTTRINITY here. And the actual question, which is a great segue for me: will SILENTTRINITY make its way into the SEC573 class? Jacob, I suggest that it probably will at some point. I'm not the course author, but I know that the course author knows Marcello and has been working with him. I'll give Marcello a quick shout out about potentially coming on board as a SANS instructor, so I think that might be a yes.
B
Another question, which may have been addressed before they joined the talk: will the changes in syntax change the course, the 573 course, significantly? Should they wait a year and take the updated course, or take it now?
A
Good question. I'll tell you right now that a fair amount of the material in this webcast has been rephrased and borrowed from 573. So in SEC573 we do have numerous shout outs to Python 2 to 3 conversions and scripting. Additionally, in SEC573 we teach both forms of the language throughout the week, and I also highly recommend that my students try their scripting out in both languages as we proceed through the week. So if you're thinking about taking it, I would highly encourage you to go ahead and do so. We're definitely Python 3 friendly.
B
Tim has a comment: yeah, with programming, learn early, repeat often. Programming is a journey. Smiley face.
A
Well said, Tim. That's absolutely the truth. For those that are not Python people, I would highly recommend the original design statements by Guido van Rossum and the crew. They really emphasized that Python should be an intuitive and easy language to learn. The goal was to make it fun, and honestly I think they achieved that goal. So for somebody new to programming or scripting, the transition into Python, if you've just sort of dabbled with things like Bash, is a really good path to take, and I'd highly recommend it.
B
Joff, how have you used the six library for writing portable code?
A
Have I used which library?
B
The six library, for writing portable code.
A
I have not used the six library for writing portable code. I don't have a reference to that one personally, but I might go look it up. So thank you.
B
Someone says: so global and local scope is like Perl or C?
A
Global and local scope like Perl or C? The answer would be yes, Python does have local scopes. In fact, Python has multiple scopes: local, enclosing, global and built-in, in that resolution order. It's very, very similar to C in that when you're in a function in Python, for example, it creates a stack frame, and the variables inside that function are part of that stack frame; they're local to it. You can reference global variables inside of functions in Python, but doing so is considered poor programming practice. That principle from the computer science of days of old still applies: if you're going to write functions, you should always pass your inputs in as parameters to the function and return your result back to the main calling program. But definitely there is global and local scope, just like in those other languages.
B
To wrap up, if you have ideas for future webcasts, we would love for you to leave them in the questions window. Like, hey, I really wish you guys would cover this, or I really wish you would talk about that. So feel free to add those, and we'd love to take your ideas and do future webcasts based on them. Joff, any final thoughts?
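A small sketch of the local, enclosing, global, built-in lookup order described in the answer, plus the "pass parameters in, return results out" style it recommends. All names here (counter, outer, add_one, bump) are hypothetical.

```python
counter = 0  # global scope

def outer():
    label = "outer"  # enclosing scope, as seen from inner()

    def inner():
        # Name lookup order: local, enclosing, global, built-in (LEGB).
        # label -> enclosing, counter -> global, len -> built-in.
        return label, counter, len("abc")

    return inner()

print(outer())  # ('outer', 0, 3)

# Preferred style: pass inputs as parameters and return the result.
def add_one(value):
    return value + 1

counter = add_one(counter)
print(counter)  # 1

# Rebinding a global inside a function needs an explicit declaration;
# without it, "counter += 1" would raise UnboundLocalError, because the
# assignment makes counter a local name for the whole function.
def bump():
    global counter
    counter += 1

bump()
print(counter)  # 2
```

Reaching for global like bump() does is exactly the practice the answer discourages; add_one() is the shape to prefer.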
A
Well, yes, and it's sort of piggybacking on what you just said. If folks have a more specific, focused Python example where they want to call out the differences between Python 2 and Python 3, maybe a real example they're grappling with, that might be a good suggestion for a webcast, and we can see about going on that journey together. I think that would be a good thing to do: a real example of working through some of the problems I've talked about today.
B
All right, and with that, thank you so much for tuning in for our webcast today. If you do need a pen test, red team or threat hunt, please consider Black Hills Information Security. Visit us at blackhillsinfosec.com. Joff, thank you so much for your time today. If you're listening to the recording, go ahead and look down below and you'll see the links to the slides. If you're listening live, the slides will be available when this goes up on YouTube. Thanks so much, everyone.
A
Thanks everybody.
Thanks for listening. Remember, if you enjoyed this podcast, leave us a positive review on your streaming service.
B
See, there's still some people here. I wonder if they're wondering if we're going to do something else.
A
Show's over. Go home, everybody.
Episode: BHIS Podcast: Py2K20 - Transitioning from Python2 to Python3
Host: Black Hills Information Security (BHIS), with lead speaker Joff Thyer
Date: May 31, 2019
This episode dives into the critical migration from Python 2 to Python 3, discussing technical distinctions, core language changes, migration strategies, and the implications for infosec practitioners and toolsets. Joff Thyer, a pen tester, developer, and SANS instructor, walks listeners through both the practical and theoretical aspects of transitioning code and workflows as Python 2 reaches end-of-life. The session focuses on core language updates rather than third-party modules and is packed with examples, tips, and a lively Q&A.
The end-of-life (EOL) for Python 2 means no more updates or support from the core Python team after 2020.
PEP 394 details how operating systems map the python command to specific Python versions, with Linux and Mac likely defaulting to Python 2 until it's safe to change due to widespread dependencies. For Windows, it's user-controlled and version-dependent.
The real impact is in maintenance headaches and script breakage, especially for sysadmins and teams relying on legacy scripts.
"The main barrier for changing the Python symbolic link to Python 3 is going to be breakage of course of third party scripts and packages." — Joff Thyer [05:44]
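Because what the bare python command points to varies by platform (the PEP 394 point above), one defensive habit is to name the interpreter explicitly in the shebang and check the running version at startup. This is a sketch: the python3 shebang is a common convention, and the (3, 6) minimum is an arbitrary assumption to adjust for your own script.

```python
#!/usr/bin/env python3
import sys

MIN_VERSION = (3, 6)  # hypothetical minimum; pick what your script needs

def check_version(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the running interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= minimum

if __name__ == "__main__":
    if not check_version():
        # sys.exit() with a string prints it to stderr and exits with status 1.
        sys.exit("This script requires Python %d.%d or later" % MIN_VERSION)
    print("Running on Python %d.%d" % tuple(sys.version_info[:2]))
```

The check itself uses only syntax that Python 2.7 also accepts, so an old interpreter reaches the friendly error message instead of failing with a SyntaxError.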
The from __future__ import ... statement in Python 2 allows developers to import features from Python 3 for forward compatibility.
Examples: division behavior, print function.
"The future statement is a migration feature that allows us to program in...Python 3 syntax within a Python 2 script, a very, very useful feature." — Joff Thyer [07:19]
Keywords [07:48]
True, False, and None are true language keywords in Python 3.
Division [08:36–11:11]
In Python 2, / is integer division unless either operand is a float; in Python 3, / always produces a float. Use // for integer (floor) division.
Migration tip: from __future__ import division in Python 2 to mimic Python 3.
"In Python 3, all division mathematically is going to be floating point unless you explicitly use the Floor operator, which is a double slash." — Joff Thyer [09:14]
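A quick sketch of the division rules above under Python 3. It also covers the talk's converter example: when one operand is already a float, / gives the same answer in both versions.

```python
# Python 3 semantics. On Python 2, putting "from __future__ import division"
# as the first statement of the file gives "/" the same behaviour.
true_div = 7 / 2     # 3.5 (plain Python 2 gives 3)
float_div = 7 / 2.0  # 3.5 in both versions: one operand is already a float
floor_div = 7 // 2   # 3: explicit floor division, same in both versions
neg_floor = -7 // 2  # -4: floor rounds toward negative infinity, not toward zero

print(true_div, float_div, floor_div, neg_floor)  # 3.5 3.5 3 -4
```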
Print Function [11:12–13:18]
Python 2 uses print as a statement; Python 3 requires parentheses: print().
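In Python 2 (without the __future__ import), print("a", "b") prints the tuple ('a', 'b') rather than a space-separated string, which leads to subtle bugs.
Newline suppression syntax changed: Python 2 used a trailing comma; Python 3 uses the end= parameter.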
"In Python 3, print has actually become a function. So...you actually have to use parentheses." — Joff Thyer [12:03]
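A sketch of the print-function points above, runnable on Python 3. It shows sep= and end=, the tuple pitfall Python 2 hits without the __future__ import, and the file= argument, which comes for free now that print is an ordinary function. The strings are illustrative.

```python
import io

# Python 2.6+ can use this same syntax after: from __future__ import print_function
print("alpha", "bravo")           # alpha bravo
print("alpha", "bravo", sep="-")  # alpha-bravo

# end=" " replaces Python 2's trailing comma for suppressing the newline.
for key in ["A", "C", "D"]:
    print(key, end=" ")
print()                           # finishes the line started above

# Python 2 pitfall (without the __future__ import): print("alpha", "bravo")
# is the print statement applied to a tuple, so it shows ('alpha', 'bravo').
print(("alpha", "bravo"))         # ('alpha', 'bravo'), what Python 2 displays

# Because print() is a function, its output can be redirected with file=.
buf = io.StringIO()
print("alpha", "bravo", sep="-", file=buf)
print("no newline", end="", file=buf)
captured = buf.getvalue()         # 'alpha-bravo\nno newline'
```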
Format Strings [14:21–17:43]
Both versions support % formatting; the .format() method with {} braces (also available from Python 2.6) is more explicit and flexible (alignment, fill character, number formatting).
Python 2: raw_input() is safe for user input, while input() evaluates what the user types and can result in code execution.
Python 3: raw_input() is gone; input() is safe and preferred.
Tip: use input = raw_input in Python 2 for compatibility.
Major shift: strings in Python 2 are bytes; in Python 3 they are Unicode text (with UTF-8 as the default encoding for str.encode() and source files).
Conversion between str and bytes is explicit; use .encode() and .decode().
Codec-related functions change; use the codecs module for things like base64 encoding.
File modes distinguish text ('t', the default) and binary ('b'): text mode reads and writes str, encoding and decoding as it goes, while binary mode reads and writes raw bytes.
Line ending conversions are handled automatically in text mode.
"Python 3 treats all strings by the default encoding, which is, you know, UTF8." — Joff Thyer [22:23]
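A combined Python 3 sketch of the format-string, input and string-encoding notes above. The sample values ("nmap", 254, "café") and the temporary file are illustrative. The raw_input shim only rebinds input on Python 2, and the file part relies on Python 3's open(..., encoding=).

```python
import base64
import codecs
import os
import tempfile

# Format strings: % still works; str.format() adds alignment, fill and number specs.
old_style = "%s scanned %d hosts" % ("nmap", 254)
new_style = "{} scanned {} hosts".format("nmap", 254)
aligned = "[{:>8}]".format("py")       # '[      py]'
money = "{:,.2f}".format(1234567.891)  # '1,234,567.89'
hexed = "{:#06x}".format(255)          # '0x00ff'

# input(): a common bilingual shim (Python 2's input() would eval() the text).
try:
    input = raw_input  # Python 2: use the safe raw_input under the Python 3 name
except NameError:
    pass               # Python 3: raw_input is gone and input() is already safe

# str (Unicode text) <-> bytes conversions are explicit.
text = "café"
data = text.encode("utf-8")        # b'caf\xc3\xa9'
round_trip = data.decode("utf-8")  # 'café'

# Base64 via the codecs module works on bytes, not str.
b64 = codecs.encode(b"hello", "base64")  # b'aGVsbG8=\n' (note trailing newline)
b64_plain = base64.b64encode(b"hello")   # b'aGVsbG8=' (no newline)

# Text mode reads/writes str with an encoding; binary mode reads/writes bytes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text + "\n")  # newline is translated to the platform ending
    with open(path, "rb") as fh:
        raw_bytes = fh.read()  # bytes, exactly as stored on disk
    with open(path, "r", encoding="utf-8") as fh:
        read_back = fh.read()  # str, line ending translated back to "\n"
```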
Python 2: range and map return lists; Python 3: they return range and map objects instead.
Python 2: dictionaries offer keys(), values(), and items() (which return lists) plus iterkeys(), itervalues(), and iteritems() (which return iterators).
Python 3: keys(), values(), and items() return dynamic, iterable view objects by default; iterkeys() and friends are gone.
Python 2: variables from list comprehensions leak into the containing scope.
Python 3: comprehension variables are local to the expression, preventing accidental variable reuse or overwrites.
"In Python 3...the x variable that we use inside the list comprehension actually has its own new local scope." — Joff Thyer [41:44]
On future adoption fears:
"I think this is probably the number one issue that scares everybody and holds them back from adopting the new version because this brings some baggage with it." — Joff Thyer on string encoding [22:38]
On the ease of migration:
"If you take care of your strings and your print statements and your input side of things...you've pretty much got 80 to 90% of the job done." — Joff Thyer [44:04]
On Python philosophy:
"Python should be an intuitive and easy language to learn...the goal was to make it fun, and honestly I think they achieved that goal." — Joff Thyer [50:22]
Floating point "weirdness" [18:59]
Input and security [20:23]
On Red Team Tools and Modules [46:43]
IronPython & .NET Integration [47:57]
Other Q&A topics: the __future__ module and migration practicality, the six library, and variable scope.
Use from __future__ imports liberally in Python 2 to prep code for the transition.
Prefer the requests library over urllib.
The transition from Python 2 to Python 3 is non-trivial but highly manageable, especially by tackling the common pitfalls outlined: print, input, encoding, and type distinctions. Start modernizing scripts now using the tips Joff shares, and leverage both the official docs and automated tools to ease the process.
"Programming is a journey—learn early, repeat often." — Listener Tim [50:09]