# DyingLoveGrape.

## Part 2, Section 3: A Bit of Programming.

### Assembly For Linux.

[This part is under construction!]

If you've made it this far, kudos; you've already passed the first step to becoming a programmer! If you'd like, you may stop at this point and play around with Python a bit more, or even venture into other languages like Java; the more "programming maturity" the better, but it isn't strictly necessary.

Here, we step back and ask, "That was certainly nice of Python to do all those things for me, but how did it do them?" You may recall from the first few lessons that computers generally work in 1's and 0's, yet Python is written almost entirely in English words! Moreover, we didn't need to worry about where things went when we defined or asked for variables or when we wanted to print something out. This is similar to having a bicycle and riding it everywhere but not knowing how any of the parts work. To remedy this, we're going to step back quite a bit and dig downwards into the depths to get as near as possible to the language of the computer itself. While it is possible to code directly in 1's and 0's, the closest reasonable way to look at the way computers process things is through assembly language.

Similar to how we programmed in Python, we will program in Assembly; though, because it is not especially important to us to make crazy complex programs in Assembly (we can already do this in Python!) we're going to make extremely simple programs and see how those work to give us a good idea of how the computer works in general.

For this section I've chosen to use the e-book "Programming from the Ground Up" (and not just because we both happened to use the phrase 'from the ground up'!). It is beginner-friendly, well-written, and it covers everything we'll need to know about Assembly. We will not be reading the entire book, though most of it is accessible for a beginner.

Warning: Make sure that your version of Ubuntu is 32-bit, or you will not be able to work along with the examples in the book.

• Skim through the first chapter; don't worry about the requirements, we have all that.

• Read the second chapter carefully; this information will reinforce some of the computer architecture things you've looked at in part 1.

• This book does have questions at the end of the chapters, which I encourage you to think about. I'll provide answers and hints to similar questions.

• Chapter 2 Exercises.


• What is a byte?

A byte is a digital information storage unit; it holds a number from 0 to 255. Why 255? In binary eight bits (a bit is a single 0 or 1) can fit into a byte, so, for example, $01001100$ or $11110111$ are a byte in size. Recall that in binary these numbers stand for a power of two each. For example, $11111111 = 2^{0} + 2^{1} + 2^{2} + \dots + 2^{7}$ $= 1 + 2 + 4 + \dots + 128 = 255$ So, since $00000000$ in binary is 0 in decimal and $11111111$ in binary is 255 in decimal we can store any number between 0 and 255 in a byte.
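If you'd like to check that sum without doing it by hand, here is a small Python sketch (Python being the language from the earlier sections). The variable names are my own; nothing here comes from the book.

```python
# Add up the place values of the eight bits in a byte: 2^0 + 2^1 + ... + 2^7.
place_values = [2 ** k for k in range(8)]
largest_byte = sum(place_values)

print(place_values)        # [1, 2, 4, 8, 16, 32, 64, 128]
print(largest_byte)        # 255

# int(text, 2) reads a string of 0's and 1's as a binary number.
print(int("11111111", 2))  # 255
print(int("00000000", 2))  # 0
print(int("01001100", 2))  # 76
```

The last line shows that one of the example bytes above, $01001100$, is just the number 76 written in binary.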

• Can the number 144 in binary fit into a byte? How is 144 represented in binary? Can 256 fit into a byte in binary? Take a guess as to how 256 is written in binary!

Since $144 = 2^{7} + 2^{4}$ we may represent this as a byte by expressing it as $10010000$. Unfortunately, 256 is too large to represent in a byte, since $256 = 2^{8}$. We would have to write it as $100000000$; notice there are nine digits in the binary representation, which is one too many to fit into a byte. Saddest.
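To double-check an answer like this one, you can ask Python for the binary digits directly. This is only a sketch; the helper name fits_in_byte is something I made up for illustration.

```python
def fits_in_byte(n):
    """Return True if the non-negative integer n can be stored in 8 bits."""
    return 0 <= n <= 255

for n in (144, 255, 256):
    bits = format(n, "b")  # the binary digits of n, as a string
    print(n, bits, len(bits), fits_in_byte(n))

# 144 10010000 8 True
# 255 11111111 8 True
# 256 100000000 9 False
```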

• What is a word in terms of the information unit? Where are the numbers 0 and 4294967295 coming from?

A word is four bytes, so it looks something like this: $00100100\,10000000\,10010001\,11100111$ where I've left spaces there to show you that it's four groups of eight numbers. In reality, there'd be no "space" between them, it'd be one really long binary number. Note also that: $00000000000000000000000000000000$ in binary is equal to 0 in decimal and $11111111111111111111111111111111$ in binary is equal to 4294967295 in decimal (why?). Hence, 4294967295 is the largest number we can represent in a word.
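Here is a hedged Python sketch of the "(why?)" above: thirty-two 1's in a row add up to $2^{0} + 2^{1} + \dots + 2^{31}$, which is one less than $2^{32}$. The constant names are my own, and the four-byte word is the size used in this section.

```python
BITS_PER_BYTE = 8
BYTES_PER_WORD = 4  # the word size used in this section

bits_per_word = BITS_PER_BYTE * BYTES_PER_WORD
largest_word = 2 ** bits_per_word - 1

print(bits_per_word)                                # 32
print(largest_word)                                 # 4294967295
print(int("1" * bits_per_word, 2) == largest_word)  # True

# The example word from the text, with the spaces removed before converting.
example = "00100100 10000000 10010001 11100111"
print(hex(int(example.replace(" ", ""), 2)))        # 0x248091e7
```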

• What's the difference between "regular" memory and registers? How big is a register?

A register is a "special purpose" piece of memory, located inside the processor itself, which stores the information you are currently working with. Registers can be a few different sizes: 1, 2, 4, or 8 bytes. Older computers (like those from the 1980's) used 1-byte registers (so each register had 8 bits of space). A bit later, registers were expanded to hold 2 bytes (which is 16 bits). Modern computers have registers which hold either 4 bytes or 8 bytes. Notice that 4 bytes is 32 bits and 8 bytes is 64 bits; this is a primary difference between a "32-bit" computer and a "64-bit" computer.

• Suppose we wanted to store in memory your first name, your last name, your age, and the name of the month you were born. How could this look in a block of memory? How could this look as a block of pointers? What's the difference?

We'd have to pick a limit (or else there could be disastrous results!) for each part. It takes 1 byte (8 bits) to represent a letter in ASCII, so let's say that we limit our first name to 25 characters; that's 25 bytes. Similarly, let's limit our last names to 50 characters; that's 50 bytes. Our ages can be stored pretty easily in 1 byte (how?) and our birth-month's name can have a 25 character (25 byte) limit. The chunk of memory would look like this (where "Start" is the beginning of the block of memory we're using): $\begin{array}{cc} \textbf{Item} & \textbf{Location in the Block}\\\hline \text{First Name} & \text{Start + 0 bytes}\\ \text{Last Name} & \text{Start + 25 bytes}\\ \text{Age} & \text{Start + 75 bytes}\\ \text{Month Name} & \text{Start + 76 bytes}\\ \end{array}$ For example, "First Name" begins at 0, which we call Start. "Last Name" starts at Start + 25 bytes because "First Name" began at 0 and took up 25 bytes. Likewise, "Age" starts at Start + 75 bytes because "Last Name" took up the next 50 bytes.

For pointers, it's a little easier. We store all of our information somewhere in memory. Each address in memory is 1 word (4 bytes) long, so we have: $\begin{array}{cc} \textbf{Pointer to Item} & \textbf{Location in the Block}\\\hline \text{Pointer to First Name} & \text{Start + 0 bytes}\\ \text{Pointer to Last Name} & \text{Start + 4 bytes}\\ \text{Pointer to Age} & \text{Start + 8 bytes}\\ \text{Pointer to Month Name} & \text{Start + 12 bytes}\\ \end{array}$ The difference is that the former stores all the information in one block of memory (we know exactly where the last name is if we know the location of the first name) whereas the latter stores information elsewhere in memory and simply gives us the addresses to these values.
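The two layouts can be sketched in Python. This is only an illustration under the limits chosen above (25, 50, 1, and 25 bytes, with 4-byte pointers); the person in the example is made up, and Python's standard struct module is used just to build a real block of bytes of the right shape.

```python
import struct

# Field sizes in bytes, taken from the limits chosen in the text.
fields = [("First Name", 25), ("Last Name", 50), ("Age", 1), ("Month Name", 25)]

# Layout 1: everything lives in one block, each item right after the previous one.
block_offsets = {}
offset = 0
for name, size in fields:
    block_offsets[name] = offset
    offset += size
block_size = offset

# Layout 2: the block only holds one 4-byte pointer per item.
POINTER_SIZE = 4
pointer_offsets = {name: i * POINTER_SIZE for i, (name, _) in enumerate(fields)}

print(block_offsets)    # {'First Name': 0, 'Last Name': 25, 'Age': 75, 'Month Name': 76}
print(block_size)       # 101
print(pointer_offsets)  # {'First Name': 0, 'Last Name': 4, 'Age': 8, 'Month Name': 12}

# Build an actual 101-byte block for a made-up person ("=" means no padding bytes).
record = struct.pack("=25s50sB25s", b"Ada", b"Lovelace", 36, b"December")
print(len(record))                    # 101
print(record[25:75].rstrip(b"\x00"))  # b'Lovelace'
print(record[75])                     # 36
```

Notice that in the first layout we can slice the last name straight out of the block because we know its offset; in the second layout the block would only tell us where to go looking.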

• Chapter 3 is somewhat long (around 25 pages), but do not skim it; if you need to, do a little each day. This is, in my opinion, the most important part of the book as it deals with the "meat" of an Assembly program.

• You'll be writing programs in this section. To do this on Linux, you'll be working in the terminal. You can either use nano (remembering that "save" is ctrl+O) or gedit. Gedit will be friendlier to use if you don't mind switching back and forth between gedit and the terminal.

• Actually write out the code. Seriously. Do it.
• Chapter 3 Exercises.


• What are the operands for the movl instruction?

There are two of them, 'source' and 'destination', and they are written as:

movl source, destination

where 'source' tells us what value we're going to put into the 'destination'.

• What is the instruction to move the number '1' into the ebx register? What about moving the contents of the eax register into the ebx register?

movl $1, %ebx

and

movl %eax, %ebx

• What are the names of the general-purpose registers (on the x86 processors, which we are using)?

%eax, %ebx, %ecx, %edx, %edi, %esi.

• What does the $ mean in front of the 1?

It means that we're using immediate mode: the 1 is taken as the literal number 1 instead of being interpreted as an address, a letter, etc.; the ways of addressing things were talked about a bit at the end of Chapter 2.
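If it helps, here is a toy Python model of the two kinds of movl from the answers above. It is not how an assembler works; the dictionary of registers, the starting value 7 in %eax, and the function itself are all made up for illustration, and it only understands immediate ($) and register (%) sources.

```python
# A pretend set of registers; the starting values are arbitrary.
registers = {"%eax": 7, "%ebx": 0}

def movl(source, destination):
    """Toy model of `movl source, destination` for two kinds of source only."""
    if destination not in registers:
        raise ValueError("this toy model only writes to known registers")
    if source.startswith("$"):
        value = int(source[1:], 0)   # immediate: use the number itself
    elif source.startswith("%"):
        value = registers[source]    # register: use the register's contents
    else:
        raise ValueError("this toy model does not handle memory addresses")
    registers[destination] = value

movl("$1", "%ebx")
print(registers["%ebx"])  # 1

movl("%eax", "%ebx")
print(registers["%ebx"])  # 7
```

The source is only read, never changed: after the second movl, %eax still holds 7.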

• What register does the system call number need to be placed in?

%eax

• What does the program

movl $1, %eax
movl $4, %ebx
int $0x80

do when it's run? What will typing echo $? into the terminal after running the program return? Try it!

This is just a program which exits with 'exit status' 4. If you type echo $? into the terminal after running it, it'll return '4', the exit status. Remember, the exit status doesn't mean anything by itself, it's just something programmers can utilize to say if things exited correctly or not.

• Does the program

movl $1, %eax
movl $4, %ebx

do anything when it's run? If so, what? If not, why not?

It doesn't do anything useful; the interrupt is missing! The registers get loaded, but without int $0x80 the kernel is never asked to exit the program (so it will typically just crash).

• How many bytes would the list .long 4, 5, 6, 7, 8, 999, 10 take up in memory?

Each long is 4 bytes long and there are 7 of them, so the entire list is 28 bytes.

• List the jump commands. Which one does what after a command like cmpl X, Y for some values X and Y?

Let's make a little chart (which is also available in the book).

$\begin{array}{c|c} \textbf{Jump Command} & \textbf{Following }\,\mathtt{cmpl\phantom{x}X, Y}\dots\\\hline \mathtt{je} & \text{Jump if }Y = X.\\ \mathtt{jg} & \text{Jump if }Y \gt X.\\ \mathtt{jge} & \text{Jump if }Y \geq X.\\ \mathtt{jl} & \text{Jump if }Y \lt X.\\ \mathtt{jle} & \text{Jump if }Y \leq X.\\ \mathtt{jmp} & \text{Jump no matter what.}\\ \end{array}$

• What does this code do? What happens when you run it? Read through it and think about it first, then copy and assemble it to see what happens!

It defines a variable my_number = 42, and then begins by moving the number 5 into the %eax register. It then adds %eax to my_number. (In general, add X, Y adds the value of X to the value of Y and stores the result in Y.) Now my_number = 47. It then compares the number 42 to my_number. If my_number is less than or equal to 42 (which it is not), the program jumps to the label less_equal; if my_number is greater than 42 (which it is), the program jumps to the label greater. Since 47 > 42, the program jumps to greater, which moves the value 8 into the %ebx register and then jumps to the exit_routine label, which exits the program as usual. If we were to run this program and then type echo $?, it would return the exit status in %ebx which was, in this case, 8.
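The order of X and Y in the jump chart is easy to mix up, so here is a small Python sketch of it. The table and the function jump_taken are my own illustration, not anything from the book; they simply encode "after cmpl X, Y, compare Y against X".

```python
import operator

# Each jump asks a question about Y relative to X, following `cmpl X, Y`.
JUMP_CONDITIONS = {
    "je": operator.eq,          # Y == X
    "jg": operator.gt,          # Y >  X
    "jge": operator.ge,         # Y >= X
    "jl": operator.lt,          # Y <  X
    "jle": operator.le,         # Y <= X
    "jmp": lambda y, x: True,   # jump no matter what
}

def jump_taken(jump, x, y):
    """After `cmpl X, Y`, report whether the given jump would be taken."""
    return JUMP_CONDITIONS[jump](y, x)

# The comparison described above: 42 is X, and my_number (now 47) is Y.
my_number = 42 + 5
print(jump_taken("jle", 42, my_number))  # False
print(jump_taken("jg", 42, my_number))   # True
```

This matches the walkthrough: the jle is skipped and the jg is taken, which is how the program ends up at the greater label.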

• [A Little Harder!] Without copying too much code from the book, create an Assembly program which finds the minimum number in a list of positive numbers. (This one is a bit harder; you're not going to want to end the list with 0 this time, or else what will happen?)

The code will be almost the same as the book's code for the maximum number. One way to do this is as follows. Look at the code in the 'start_loop' for the maximum-finding program. The comparison at the heart of the loop is

cmpl %ebx, %eax
jle start_loop

The command jle start_loop right after the comparison says, 'go to start_loop if Y is less than or equal to X'; here X is %ebx and Y is %eax, so the loop starts over whenever %eax is less than or equal to %ebx. If we want to find the minimum, we need to alter the program in the book to keep looping if %eax is greater than or equal to %ebx (why? Remember, %ebx is currently our smallest number!). Hence, we change that jump line in start_loop to be

jge start_loop

Now try to run the program. When we type in echo $? we are shocked (shocked!) to see that it returns 0. Of course, 0 is the smallest number, but we didn't mean for that to be in our list! Hence, we need to improvise a bit. There are a few things we can do, but for a quick fix we will instead end our list with a large number (I used 1000), and change the line at the beginning of 'start_loop' to be

cmpl $1000, %eax

(Why would we do that? What does it do?) Now, running the program should give the appropriate minimum value. Whew!
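To see why the end marker matters, here is a rough Python imitation of the modified loop. It is a sketch, not the Assembly program itself: the variables eax, ebx, and edi stand in for the registers, the comments show the instructions I'm assuming from the book's maximum-finding program, and the list of numbers is made up.

```python
def find_minimum(data_items, sentinel):
    """Imitate the modified loop and return what would end up in %ebx."""
    edi = 0                      # index of the current item
    eax = data_items[edi]        # the current item
    ebx = eax                    # the first item is the smallest so far
    while eax != sentinel:       # cmpl $sentinel, %eax / je loop_exit
        edi += 1                 # incl %edi
        eax = data_items[edi]    # load the next item into %eax
        if eax >= ebx:           # cmpl %ebx, %eax / jge start_loop
            continue
        ebx = eax                # found a new smallest value
    return ebx

numbers = [3, 67, 34, 222, 45, 2, 54]   # a made-up list

# Ending the list with 0: the end marker itself becomes the "minimum".
print(find_minimum(numbers + [0], 0))        # 0

# Ending the list with 1000: the end marker is never smaller than anything.
print(find_minimum(numbers + [1000], 1000))  # 2
```

The quick fix works only as long as every real number in the list is smaller than 1000, which is the same trade-off the Assembly version makes.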

• To do...

• Finish up chapter 3 questions, concepts. Make a program that has exit code 2 if the variable is 0, and 1 otherwise.

• Addressing modes, layout of registers, little and big endian.

• How much of chapter four? Recursive functions in Python?

• Anything else in the book? Find memory management, buffer overflow tutorial. See what else was covered in Vivek's videos.
