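Learning Assembly

The original project was created by its author as a way to teach themselves NASM; this project is a translation made by the translator for the same purpose.

Original project on GitHub | Translated version on GitHub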

Lesson 1

The obligatory 'Hello, world!'

Introduction to the Linux System Call Table. In this lesson we use software interrupts to request system functions from the kernel in order to print 'Hello World!' to the console.

Lesson 2

Proper program exit

A very brief lesson about memory addresses, sequential code execution and how to properly terminate a program without errors.

Lesson 3

Calculate string length

What if we wanted to output something that we don't know the length of, like user input? Learn about loops, labels and pointer arithmetic.

Lesson 4

Subroutines

Introduction to the stack and how to write clean, reusable code in assembly.

Lesson 5

External include files

To further simplify our code we can move our subroutines into an external include file.

Lesson 6

NULL terminating bytes

A quick lesson on how memory is handled. This lesson also fixes the duplication bug we added in lesson 5.

Lesson 7

Linefeeds

How you can use the stack to print linefeeds after strings and an introduction to the Extended Stack Pointer ESP.

Lesson 8

Passing arguments

Passing arguments to your program from the command line.

Lesson 9

User input

Introduction to the BSS section and how to trigger a call to sys_read to process user input.

Lesson 10

Count to 10

Introduction to numbers and counting in assembly.

Lesson 11

Count to 10 (itoa)

Introduction to ASCII and how to convert integers to their string representations in assembly.

Lesson 12

Calculator - addition

Introduction to calculating numbers in assembly. This tutorial describes a simple program to add two numbers together.

Lesson 13

Calculator - subtraction

Introduction to calculating numbers in assembly. This tutorial describes a simple program to subtract one number from another.

Lesson 14

Calculator - multiplication

Introduction to calculating numbers in assembly. This tutorial describes a simple program to multiply two numbers together.

Lesson 15

Calculator - division

Introduction to calculating numbers in assembly. This tutorial describes a simple program to divide one number by another.

Lesson 16

Calculator (atoi)

This program takes a series of passed string arguments, converts them to integers and adds them all together.

Lesson 17

Namespace

Introduction to how NASM handles namespace when it comes to global and local labels.

Lesson 18

Fizz Buzz

The Fizz Buzz programming challenge recreated in NASM.

Lesson 19

Execute Command

In this lesson we replace the currently running process with a new process that executes a command.

Lesson 20

Process Forking

In this lesson we create a new process that duplicates our current process.

Lesson 21

Telling the time

In this lesson we ask the kernel for the current unix timestamp.

Lesson 22

File Handling - Create

In this lesson we learn how to create a new file in Assembly.

Lesson 23

File Handling - Write

In this lesson we write content to a newly created text file.

Lesson 24

File Handling - Open

In this lesson we open a text file and print its file descriptor.

Lesson 25

File Handling - Read

In this lesson we read content from a newly created text file.

Lesson 26

File Handling - Close

In this lesson we close a newly created text file using its file descriptor.

Lesson 27

File Handling - Update

In this lesson we update the content of an included text file using seek.

Lesson 28

File Handling - Delete

In this lesson we learn how to delete a file.

Lesson 29

Sockets - Create

In this lesson we learn how to create a new socket in assembly and store its file descriptor.

Lesson 30

Sockets - Bind

In this lesson we learn how to bind a socket to an IP Address & Port Number.

Lesson 31

Sockets - Listen

In this lesson we learn how to make a socket listen for incoming connections.

Lesson 32

Sockets - Accept

In this lesson we learn how to make a socket accept incoming connections.

Lesson 33

Sockets - Read

In this lesson we learn how to read incoming requests on a socket.

Lesson 34

Sockets - Write

In this lesson we learn how to make a socket respond to incoming requests.

Lesson 35

Sockets - Close

In this lesson we learn how to shut down and close an open socket connection.

Lesson 36

Download a Webpage

In this lesson we're going to connect to a webserver and send an HTTP request for a webpage. We'll then print the server's response to our terminal.


Lesson 1

Hello, world!

First, some background

Assembly language is bare-bones. The only interface a programmer has above the actual hardware is the kernel itself. To build useful programs in assembly we need to use the Linux system calls provided by the kernel. These system calls are a library built into the operating system that provides functions such as reading input from a keyboard and writing output to the screen.

When you invoke a system call, the kernel immediately suspends execution of your program. It then contacts the drivers needed to perform the requested task on the hardware and afterwards returns control to your program.

Note: Drivers are called drivers because the kernel literally uses them to drive the hardware.

We can accomplish all of this in assembly by loading EAX with the number of the function we want to execute (its operation code, or OPCODE) and filling the remaining registers with the arguments we want to pass to the system call. A software interrupt is requested with the INT instruction; the kernel then takes over and calls the function from the library with our arguments. Simple.

For example, requesting an interrupt when EAX=1 will call sys_exit, and requesting an interrupt when EAX=4 will call sys_write instead. EBX, ECX & EDX will be passed as arguments if the function requires them. The original lesson links to an example of a Linux System Call Table and its corresponding OPCODES.

Writing our program

First we create a variable 'msg' in our .data section and assign it the string we want to output, in this case 'Hello World!'. In our .text section we tell the kernel where to begin execution by providing it with a global label _start: to denote the program's entry point.

We will be using the system call sys_write to output our message to the console window. This function is assigned OPCODE 4 in the Linux System Call Table for 32-bit x86. The function also takes 3 arguments, which are loaded into EDX, ECX and EBX before we request the software interrupt that performs the task.

The arguments passed are as follows:

  • EDX will be loaded with the length (in bytes) of the string.
  • ECX will be loaded with the address of our variable created in the .data section.
  • EBX will be loaded with the file descriptor we want to write to, in this case 1 for STDOUT.

The datatype and meaning of the arguments passed can be found in the function's definition. (Translator's note: man 2 write describes the call behind sys_write in detail.)

We compile, link and run the program using the commands below.

helloworld.asm
                            ; Hello World Program - asmtutor.com
                            ; Compile with: nasm -f elf helloworld.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld.o -o helloworld
                            ; Run with: ./helloworld

                            SECTION .data
                            msg     db      'Hello World!', 0Ah     ; declare the msg variable holding our message string

                            SECTION .text
                            global  _start

                            _start:

                                mov     edx, 13     ; number of bytes to write: one per character plus 0Ah (the linefeed)
                                mov     ecx, msg    ; move the address of our message string into ECX
                                mov     ebx, 1      ; write to STDOUT
                                mov     eax, 4      ; invoke SYS_WRITE (kernel opcode 4)
                                int     80h
                            
~$ nasm -f elf helloworld.asm
~$ ld -m elf_i386 helloworld.o -o helloworld
~$ ./helloworld
Hello World!
Segmentation fault

Error: Segmentation fault


Lesson 2

Proper program exit

Some more background

Having learned how to execute a system call in Lesson 1, we now need to learn about one of the most important system calls in the kernel, sys_exit.

Notice how we got a Segmentation fault after our 'Hello World!' program ran? Computer programs can be thought of as a long strip of instructions that are loaded into memory and divided up into sections (or segments). This general pool of memory is shared between all programs and can be used to store variables, instructions, other programs or anything really. Each segment is given an address so that information stored in that section can be found later.

To execute a program that is loaded in memory, we use the global label _start: to tell the operating system where in memory our program can be found and executed. Memory is then accessed sequentially, following the program logic, which determines the next address to be accessed. The kernel jumps to that address in memory and executes it.

It's important to tell the operating system exactly where it should begin execution and where it should stop. In Lesson 1 we didn't tell the kernel where to stop execution. So, after we called sys_write, the program continued sequentially executing whatever came next in memory, which could have been anything. We don't know what it tried to execute, but it caused the kernel to choke and terminate the process for us instead, leaving us with the error message 'Segmentation fault'. Calling sys_exit at the end of all our programs means the kernel knows exactly when to terminate the process and return its memory to the general pool, thus avoiding the error.

Writing our program

sys_exit has a simple function definition. In the Linux System Call Table for 32-bit x86 it is allocated OPCODE 1 and is passed a single argument through EBX: a number used as the program's exit status. (Translator's note: this corresponds to the return value of main in C.)

In order to execute this function all we need to do is:

  • Load EBX with 0 to pass zero to the function, meaning 'zero errors'.
  • Load EAX with 1 to call sys_exit.
  • Then request a software interrupt from the kernel using INT 80h.

We then compile, link and run it again.

Note: Only new code added in each lesson will be commented.

helloworld.asm
                            ; Hello World Program - asmtutor.com
                            ; Compile with: nasm -f elf helloworld.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld.o -o helloworld
                            ; Run with: ./helloworld

                            SECTION .data
                            msg     db      'Hello World!', 0Ah

                            SECTION .text
                            global  _start

                            _start:

                                mov     edx, 13
                                mov     ecx, msg
                                mov     ebx, 1
                                mov     eax, 4
                                int     80h

                                mov     ebx, 0      ; return 0 status on exit - 'No Errors'
                                mov     eax, 1      ; invoke SYS_EXIT (kernel opcode 1)
                                int     80h
                            
~$ nasm -f elf helloworld.asm
~$ ld -m elf_i386 helloworld.o -o helloworld
~$ ./helloworld
Hello World!
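
As a quick way to see the argument in EBX at work, here is a minimal sketch (not part of the original lesson) that passes a non-zero status to sys_exit. The file name exitcode.asm and the value 3 are arbitrary choices made for illustration. On a typical shell, running echo $? straight after the program should show the status that was passed.

```nasm
; Exit status sketch (illustrative, not from the original tutorial)
; Compile with: nasm -f elf exitcode.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 exitcode.o -o exitcode
; Run with: ./exitcode ; echo $?

SECTION .text
global  _start

_start:

    mov     ebx, 3      ; status handed to sys_exit (arbitrary non-zero value)
    mov     eax, 1      ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```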

Lesson 3

Calculate string length

Firstly, some background

Why do we need to calculate the length of a string?

sys_write requires that we pass it a pointer to the string in memory that we want to output, together with the number of bytes we want to print. If we were to modify our message string we would have to update the length in bytes that we pass to sys_write as well, otherwise it will not print correctly.

You can see what I mean using the program in Lesson 2. Modify the message string to say 'Hello, brave new world!' then compile, link and run the new program. The output will be 'Hello, brave ' (the first 13 characters) because we are still only passing 13 bytes to sys_write as its length. Calculating the length becomes particularly necessary when we want to print out user input. As we won't know the length of the data when we compile our program, we will need a way to calculate the length at runtime in order to successfully print it out.
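
When the string is fixed at assembly time, NASM itself can work out the length. The sketch below is an aside rather than the technique this lesson builds: it uses NASM's equ directive and the $ symbol (the current assembly position) to define a constant, here given the hypothetical name msglen. It does not help with user input, whose length is only known at runtime.

```nasm
; Assemble-time string length (illustrative aside, not from the original tutorial)
SECTION .data
msg     db      'Hello, brave new world!', 0Ah
msglen  equ     $ - msg         ; current position minus the start of msg (24 bytes here)

SECTION .text
global  _start

_start:

    mov     edx, msglen     ; length worked out by the assembler, not at runtime
    mov     ecx, msg
    mov     ebx, 1
    mov     eax, 4
    int     80h

    mov     ebx, 0
    mov     eax, 1
    int     80h
```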

Writing our program

To calculate the length of the string we will use a technique called pointer arithmetic. Two registers are initialised pointing to the same address in memory. One register (in this case EAX) will be incremented by one byte for each character in the output string until we reach the end of the string. The original pointer will then be subtracted from EAX. This works like subtracting two positions within the same array: the result is the number of elements between the two addresses. This result is then passed to sys_write, replacing our hard coded count.

The CMP instruction compares the left hand side against the right hand side and sets a number of flags that are used for program flow. The flag we're checking is the ZF or Zero Flag. When the byte that EAX points to is equal to zero, the ZF flag is set. We then use the JZ instruction to jump, if the ZF flag is set, to the point in our program labeled 'finished'. This breaks out of the nextchar loop so that the rest of the program can continue executing.

helloworld-len.asm
                            ; Hello World Program (Calculating string length)
                            ; Compile with: nasm -f elf helloworld-len.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld-len.o -o helloworld-len
                            ; Run with: ./helloworld-len

                            SECTION .data
                            msg     db      'Hello, brave new world!', 0Ah ; this message can be changed without updating anything else in the program

                            SECTION .text
                            global  _start

                            _start:

                                mov     ebx, msg        ; move the address of our message string into EBX
                                mov     eax, ebx        ; copy the address in EBX into EAX as well (both now point to the same place)

                            nextchar:
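                                cmp     byte [eax], 0   ; compare the byte EAX points to against zero (the end of string delimiter)
                                jz      finished        ; if they are equal the zero flag is set, so jump to the 'finished' label
                                inc     eax             ; otherwise move EAX on to the next byte
                                jmp     nextchar        ; jump back to the 'nextchar' label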

                            finished:
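                                sub     eax, ebx        ; subtract the address in EBX from the address in EAX
                                                        ; both registers started out pointing to the same address (see the two mov instructions under _start)
                                                        ; but EAX has been incremented by one byte for each character in the string
                                                        ; when you subtract one memory address from another of the same type
                                                        ; the result is the number of elements between them, in this case the number of bytes

                                mov     edx, eax        ; EAX now equals the length of the string
                                mov     ecx, msg        ; the rest of the code should be familiar by now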
                                mov     ebx, 1
                                mov     eax, 4
                                int     80h

                                mov     ebx, 0
                                mov     eax, 1
                                int     80h
                            
~$ nasm -f elf helloworld-len.asm
~$ ld -m elf_i386 helloworld-len.o -o helloworld-len
~$ ./helloworld-len
Hello, brave new world!

Lesson 4

Subroutines

Introduction to subroutines

Subroutines are functions. They are reusable pieces of code that can be called by your program to perform various repeatable tasks. Subroutines are declared using labels just like the ones we've used before (e.g. _start:), however we don't use the JMP instruction to get to them. Instead we use a new instruction, CALL. We also don't use the JMP instruction to return to our program after we have run the function. To return to our program from a subroutine we use the instruction RET instead.

Why don't we JMP to subroutines?

The great thing about writing a subroutine is that we can reuse it. If we wanted to use the subroutine from anywhere in the code using JMP, we would have to write some logic to determine where in the code we had jumped from and where we should jump back to. This would litter our code with unwanted labels. If we use CALL and RET, however, assembly handles this problem for us using something called the stack.

Introduction to the stack

The stack is a special type of memory. It's the same type of memory that we've used before, however it's special in how it is used by our program. The stack is what is called Last In First Out memory (LIFO). You can think of the stack like a stack of plates in your kitchen. The last plate you put on the stack is also the first plate you will take off the stack next time you use a plate.

The stack in assembly is not storing plates though, it's storing values. You can store a lot of things on the stack such as variables, addresses or other programs. We need to use the stack when we call subroutines to temporarily store values that will be restored later.

Any register that your function needs to use should have its current value put on the stack for safe keeping using the PUSH instruction. Then, after the function has finished its logic, these registers can have their original values restored using the POP instruction. This means that any values in the registers will be the same before and after you've called your function. If we take care of this in our subroutine we can call functions without worrying about what changes they're making to our registers.
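
The save-and-restore pattern described above can be sketched as follows. This is an illustration only: the subroutine name clobber and the values 7 and 99 are made up for the example. The registers are popped in the reverse order they were pushed, because the stack is Last In First Out. If the pattern works, the program exits with the caller's value still in EBX, so echo $? should report 7.

```nasm
; Preserving registers across a call (illustrative sketch, not from the original tutorial)
SECTION .text
global  _start

_start:

    mov     ebx, 7          ; a value the caller wants to keep
    call    clobber         ; the subroutine uses EBX and ECX internally
    mov     eax, 1          ; invoke SYS_EXIT; EBX (still 7) becomes the exit status
    int     80h

clobber:                    ; hypothetical subroutine name
    push    ebx             ; save EBX first
    push    ecx             ; then save ECX
    mov     ebx, 99         ; overwrite both registers while doing our work
    mov     ecx, 99
    pop     ecx             ; restore in reverse order: ECX was pushed last
    pop     ebx             ; EBX was pushed first, so it comes off last
    ret
```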

The CALL and RET instructions also use the stack. When you CALL a subroutine, the address of the instruction that follows the CALL (the return address) is pushed onto the stack. This address is then popped off the stack by RET and the program jumps back to that place in your code. This is why you should always JMP to labels but you should CALL functions.
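
To make the return address visible, here is a small sketch under the assumption of a 32-bit static build like the other examples. The labels peek, after_call, mismatch and done are hypothetical names. The subroutine reads the value on top of the stack, which CALL placed there, and the caller compares it with the address of the instruction after the CALL. The program should exit with status 0 if they match and 1 otherwise.

```nasm
; Inspecting the return address pushed by CALL (illustrative sketch)
SECTION .text
global  _start

_start:

    call    peek                ; pushes the address of after_call, then jumps to peek
after_call:
    cmp     eax, after_call     ; did peek find our return address on the stack?
    jne     mismatch
    mov     ebx, 0              ; yes: exit status 0
    jmp     done

mismatch:
    mov     ebx, 1              ; no: exit status 1

done:
    mov     eax, 1              ; invoke SYS_EXIT
    int     80h

peek:
    mov     eax, [esp]          ; the top of the stack holds the return address
    ret                         ; pops that address and jumps back to after_call
```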

helloworld-len.asm
                            ; Hello World Program (Subroutines)
                            ; Compile with: nasm -f elf helloworld-len.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld-len.o -o helloworld-len
                            ; Run with: ./helloworld-len

                            SECTION .data
                            msg     db      'Hello, brave new world!', 0Ah

                            SECTION .text
                            global  _start

                            _start:

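                                mov     eax, msg        ; move the address of our message string into EAX
                                call    strlen          ; call our function to calculate the length of the string

                                mov     edx, eax        ; our function leaves the result in EAX
                                mov     ecx, msg        ; the rest is the same as before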
                                mov     ebx, 1
                                mov     eax, 4
                                int     80h

                                mov     ebx, 0
                                mov     eax, 1
                                int     80h

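                            strlen:                     ; this is our string length function declaration
                                push    ebx             ; push the value in EBX onto the stack to preserve it while we use EBX
                                mov     ebx, eax        ; copy the address in EAX into EBX (both now point to the same place)

                            nextchar:                   ; the loop is the same as in Lesson 3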
                                cmp     byte [eax], 0
                                jz      finished
                                inc     eax
                                jmp     nextchar

                            finished:
                                sub     eax, ebx
                                pop     ebx             ; pop the value saved on the stack back into EBX
                                ret                     ; return to where the function was called
                            
~$ nasm -f elf helloworld-len.asm
~$ ld -m elf_i386 helloworld-len.o -o helloworld-len
~$ ./helloworld-len
Hello, brave new world!

Lesson 5

External include files

External include files allow us to move code out of our program and put it into separate files. This technique is useful for writing clean, easy to maintain programs. Reusable bits of code can be written as subroutines and stored in separate files called libraries. When you need a piece of logic you can include the file in your program and use it as if it were part of the same file.

In this lesson we will move our string length calculating subroutine into an external file. We will also turn our string printing logic and program exit logic into subroutines and move them into this external file. Once that's done our actual program will be clean and easier to read.

We can then declare another message variable and call our print function twice in order to demonstrate how we can reuse code.

Note: I won't be showing the code in functions.asm after this lesson unless it changes. It will just be included if needed.

functions.asm
                            ;------------------------------------------
                            ; int slen(String message)
                            ; String length calculation function
                            slen:
                                push    ebx
                                mov     ebx, eax

                            nextchar:
                                cmp     byte [eax], 0
                                jz      finished
                                inc     eax
                                jmp     nextchar

                            finished:
                                sub     eax, ebx
                                pop     ebx
                                ret


                            ;------------------------------------------
                            ; void sprint(String message)
                            ; String printing function
                            sprint:
                                push    edx
                                push    ecx
                                push    ebx
                                push    eax
                                call    slen

                                mov     edx, eax
                                pop     eax

                                mov     ecx, eax
                                mov     ebx, 1
                                mov     eax, 4
                                int     80h

                                pop     ebx
                                pop     ecx
                                pop     edx
                                ret


                            ;------------------------------------------
                            ; void quit()
                            ; Exit the program
                            quit:
                                mov     ebx, 0
                                mov     eax, 1
                                int     80h
                                ret


                            
helloworld-inc.asm
                            ; Hello World Program (External file include)
                            ; Compile with: nasm -f elf helloworld-inc.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld-inc.o -o helloworld-inc
                            ; Run with: ./helloworld-inc

                            %include        'functions.asm'                             ; include our external file

                            SECTION .data
                            msg1    db      'Hello, brave new world!', 0Ah              ; our first message string
                            msg2    db      'This is how we recycle in NASM.', 0Ah      ; our second message string

                            SECTION .text
                            global  _start

                            _start:

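                                mov     eax, msg1       ; move the address of our first message string into EAX
                                call    sprint          ; call our string printing function

                                mov     eax, msg2       ; move the address of our second message string into EAX
                                call    sprint          ; call our string printing function

                                call    quit            ; call our quit function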
                            
~$ nasm -f elf helloworld-inc.asm
~$ ld -m elf_i386 helloworld-inc.o -o helloworld-inc
~$ ./helloworld-inc
Hello, brave new world!
This is how we recycle in NASM.
This is how we recycle in NASM.

Error: Our second message is output twice. This is fixed in the next lesson.


Lesson 6

NULL terminating bytes

Ok, so why did our second message print twice when we only called our sprint function on msg2 once? Well, actually it did only print once. You can see what I mean if you comment out our second call to sprint: the output will still be both of our message strings.

But how is this possible?

What is happening is that we weren't properly terminating our strings. In assembly, variables are stored one after another in memory, so the last byte of our msg1 variable is right next to the first byte of our msg2 variable. We know our string length calculation is looking for a zero byte, so unless our msg2 variable starts with a zero byte it keeps counting as if it's the same string (and as far as assembly is concerned it is the same string). So we need to put a zero byte, or 0h, after our strings to let assembly know where to stop counting.

Note: In programming 0h denotes a null byte, and a null byte after a string tells assembly where it ends in memory.
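
The effect of the terminating byte can be checked with a short sketch. The strings 'Hi' and 'Yo' and the label names scan and done are made up for this illustration. The loop is the same zero-byte search used by slen; because msg1 now ends in 0h, the count stops at 3 (two letters plus the linefeed) instead of running on into msg2. The program exits with that count as its status, so echo $? should report 3.

```nasm
; Counting up to the null terminator (illustrative sketch, not from the original tutorial)
SECTION .data
msg1    db      'Hi', 0Ah, 0h       ; bytes in memory: 48h 69h 0Ah 00h
msg2    db      'Yo', 0Ah, 0h       ; stored directly after msg1

SECTION .text
global  _start

_start:

    mov     eax, msg1       ; start scanning at the first byte of msg1

scan:
    cmp     byte [eax], 0   ; have we reached the null byte?
    jz      done
    inc     eax
    jmp     scan

done:
    sub     eax, msg1       ; EAX = number of bytes before the null byte (3)
    mov     ebx, eax        ; use the length as the exit status
    mov     eax, 1          ; invoke SYS_EXIT
    int     80h
```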

helloworld-inc.asm
                            ; Hello World Program (NULL terminating bytes)
                            ; Compile with: nasm -f elf helloworld-inc.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld-inc.o -o helloworld-inc
                            ; Run with: ./helloworld-inc

                            %include        'functions.asm'

                            SECTION .data
                            msg1    db      'Hello, brave new world!', 0Ah, 0h          ; NOTE the null terminating byte
                            msg2    db      'This is how we recycle in NASM.', 0Ah, 0h  ; NOTE the null terminating byte

                            SECTION .text
                            global  _start

                            _start:

                                mov     eax, msg1
                                call    sprint

                                mov     eax, msg2
                                call    sprint

                                call    quit
                            
~$ nasm -f elf helloworld-inc.asm
~$ ld -m elf_i386 helloworld-inc.o -o helloworld-inc
~$ ./helloworld-inc
Hello, brave new world!
This is how we recycle in NASM.

Lesson 7

Linefeeds

Linefeeds are essential to console programs like our 'hello world' program. They become even more important once we start building programs that require user input. But linefeeds can be a pain to maintain. Sometimes you will want to include them in your strings and sometimes you will want to remove them. If we continue to hard code them in our variables by adding 0Ah after our declared message text, it will become a problem. If there's a place in the code where we don't want to print the linefeed for that variable, we will need to write some extra logic to remove it from the string at runtime.

It would be better for the maintainability of our program if we write a subroutine that prints our message and then prints a linefeed afterwards. That way we can just call this subroutine when we need the linefeed and call our current sprint subroutine when we don't.

A call to sys_write requires that we pass a pointer to the address in memory of the string we want to print, so we can't just pass a linefeed character (0Ah) to our print function. We also don't want to create another variable just to hold a linefeed character, so we will use the stack instead.

The way it works is by moving a linefeed character into EAX. We then push EAX onto the stack and get the address pointed to by the Extended Stack Pointer. ESP is another register. When you push items onto the stack, ESP is decremented to point to the address in memory of the last item and so it can be used to access that item directly from the stack. Since ESP points to an address in memory of a character, sys_write will be able to use it to print.

我们首先传入EAX一个换行符,接着我们将EAX入栈,通过ESP寄存器(扩展栈指针寄存器)获取到栈的地址。当你将数据入栈时,ESP寄存器会自动指向栈的末项因此ESP可以被用于直接找到栈里面的数据。因为ESP指向的是内存地址,所以sys_write可以将其输出。
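The stack trick can also be tried in isolation. The following is a minimal standalone sketch, not part of the lesson's functions.asm, and the file name is only an assumption for illustration. It pushes a linefeed and hands ESP straight to sys_write with a fixed length of 1, so neither slen nor a null terminating byte is involved.

```nasm
; linefeed-stack.asm (hypothetical file name, for illustration only)
; Compile with: nasm -f elf linefeed-stack.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 linefeed-stack.o -o linefeed-stack
; Run with: ./linefeed-stack

SECTION .text
global  _start

_start:
    push    dword 0Ah       ; push 0000000Ah; ESP now points at the byte 0Ah
    mov     edx, 1          ; write exactly one byte
    mov     ecx, esp        ; address of the linefeed on the stack
    mov     ebx, 1          ; write to the STDOUT file
    mov     eax, 4          ; invoke SYS_WRITE (kernel opcode 4)
    int     80h
    add     esp, 4          ; discard the pushed dword

    mov     ebx, 0          ; exit status 0
    mov     eax, 1          ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```

Running it should print a single empty line. The lesson's sprintLF goes through sprint instead, which is why it relies on the zero bytes that follow 0Ah on the stack.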

Note: The new code in functions.asm below is the sprintLF subroutine.

functions.asm
                            ;------------------------------------------
                            ; int slen(String message)
                            ; String length calculation function
                            slen:
                                push    ebx
                                mov     ebx, eax

                            nextchar:
                                cmp     byte [eax], 0
                                jz      finished
                                inc     eax
                                jmp     nextchar

                            finished:
                                sub     eax, ebx
                                pop     ebx
                                ret


                            ;------------------------------------------
                            ; void sprint(String message)
                            ; String printing function
                            sprint:
                                push    edx
                                push    ecx
                                push    ebx
                                push    eax
                                call    slen

                                mov     edx, eax
                                pop     eax

                                mov     ecx, eax
                                mov     ebx, 1
                                mov     eax, 4
                                int     80h

                                pop     ebx
                                pop     ecx
                                pop     edx
                                ret


                            ;------------------------------------------
                            ; void sprintLF(String message)
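                            ; String printing with line feed function
                            sprintLF:
                                call    sprint

                                push    eax         ; push eax onto the stack to preserve it while we use eax in this function
                                mov     eax, 0Ah    ; move 0Ah into eax - 0Ah is the ascii character for a linefeed
                                                    ; as eax is 4 bytes wide (32 bit), it now contains 0000000Ah
                                push    eax         ; push the linefeed onto the stack so we can get its address
                                                    ; x86 is little-endian, so the bytes of eax are stored lowest byte first
                                                    ; the stack memory therefore now holds 0Ah, 0h, 0h, 0h
                                                    ; which is exactly a linefeed followed by a null terminating byte
                                mov     eax, esp    ; move the address held in the stack pointer into eax for sprint
                                call    sprint      ; call our sprint function
                                pop     eax         ; remove our linefeed character from the stack
                                pop     eax         ; restore the value eax held when we pushed it
                                ret                 ; return to the caller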


                            ;------------------------------------------
                            ; void exit()
                            ; Exit program and restore resources
                            quit:
                                mov     ebx, 0
                                mov     eax, 1
                                int     80h
                                ret


                            
helloworld-lf.asm
                            ; Hello World Program (Print with line feed)
                            ; Compile with: nasm -f elf helloworld-lf.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld-lf.o -o helloworld-lf
                            ; Run with: ./helloworld-lf

                            %include        'functions.asm'

                            SECTION .data
                            msg1    db      'Hello, brave new world!', 0h          ; NOTE we have removed the line feed character 0Ah
                            msg2    db      'This is how we recycle in NASM.', 0h  ; the line feed character has been removed here as well

                            SECTION .text
                            global  _start

                            _start:

                                mov     eax, msg1
                                call    sprintLF    ; NOTE we are calling our new print with linefeed function

                                mov     eax, msg2
                                call    sprintLF    ; the second message is printed with the new function too

                                call    quit
                            
~$ nasm -f elf helloworld-lf.asm
~$ ld -m elf_i386 helloworld-lf.o -o helloworld-lf
~$ ./helloworld-lf
Hello, brave new world!
This is how we recycle in NASM.

Lesson 8

Passing arguments

Passing arguments to your program from the command line is as easy as popping them off the stack in NASM. When we run our program, any passed arguments are loaded onto the stack in reverse order. The name of the program is then loaded onto the stack and lastly the total number of arguments is loaded onto the stack. So when our program starts, the top two items on the stack are always the number of arguments followed by the name of the program. The count includes the program name itself, which is why the program name appears in the output below.

So all we have to do to use them is pop the number of arguments off the stack first, then iterate once for each argument and perform our logic. In our program that means calling our print function.

Note: We are using the ECX register as our counter for the loop. Although it's a general-purpose register, its original intention was to be used as a counter.

helloworld-args.asm
                            ; Hello World Program (Passing arguments from the command line)
                            ; Compile with: nasm -f elf helloworld-args.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld-args.o -o helloworld-args
                            ; Run with: ./helloworld-args

                            %include        'functions.asm'

                            SECTION .text
                            global  _start

                            _start:

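                                pop     ecx             ; first value on the stack is the number of arguments

                            nextArg:
                                cmp     ecx, 0h         ; check to see if we have any arguments left
                                jz      noMoreArgs      ; if zero flag is set jump to noMoreArgs label, ending the loop
                                pop     eax             ; pop the next argument off the stack
                                call    sprintLF        ; call our print with linefeed function
                                dec     ecx             ; decrease ecx (number of arguments left) by 1
                                jmp     nextArg         ; jump to nextArg label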

                            noMoreArgs:
                                call    quit
                            
~$ nasm -f elf helloworld-args.asm
~$ ld -m elf_i386 helloworld-args.o -o helloworld-args
~$ ./helloworld-args "This is one argument" "This is another" 101
./helloworld-args
This is one argument
This is another
101
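Since the first string on the stack is the program name, a program that only cares about the real arguments can pop it and throw it away before looping. The variation below is a sketch under that assumption; the file name is hypothetical and everything else reuses the lesson's functions.asm.

```nasm
; helloworld-args-skip.asm (hypothetical file name, for illustration only)
; Compile with: nasm -f elf helloworld-args-skip.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld-args-skip.o -o helloworld-args-skip
; Run with: ./helloworld-args-skip one two

%include        'functions.asm'

SECTION .text
global  _start

_start:
    pop     ecx             ; number of arguments, including the program name
    pop     eax             ; pointer to the program name, which we discard
    dec     ecx             ; one fewer string left to print

nextArg:
    cmp     ecx, 0h         ; any arguments left?
    jz      noMoreArgs
    pop     eax             ; pop the next argument off the stack
    call    sprintLF
    dec     ecx
    jmp     nextArg

noMoreArgs:
    call    quit
```

Run with the arguments `one two`, this version should print only those two lines and omit the program name.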

Lesson 9

User input

Introduction to the .bss section

So far we've used the .text and .data sections, so now it's time to introduce the .bss section. BSS stands for Block Started by Symbol. It is an area in our program that is used to reserve space in memory for uninitialised variables. We will use it to reserve some space in memory to hold our user input, since we don't know in advance how many bytes we'll need to store.

The syntax to declare variables is as follows:

.bss section example
                            SECTION .bss
                            variableName1:      RESB    1       ; reserve space for 1 byte (8 bits)
                            variableName2:      RESW    1       ; reserve space for 1 word (16 bits in NASM, whatever the CPU mode)
                            variableName3:      RESD    1       ; reserve space for 1 double word (32 bits)
                            variableName4:      RESQ    1       ; reserve space for 1 double precision float (quad word, 64 bits)
                            variableName5:      REST    1       ; reserve space for 1 extended precision float (10 bytes)
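The number after each RES directive is a count of units, not of bytes. As a rough sketch (the label names and file name here are made up for illustration), the program below reserves 64 bytes and 4 double words, lets the assembler compute the total size, and returns that size as its exit status so it can be inspected from the shell.

```nasm
; bss-size.asm (hypothetical file name, for illustration only)
; Compile with: nasm -f elf bss-size.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 bss-size.o -o bss-size
; Run with: ./bss-size ; echo $?

SECTION .bss
buffer:     resb    64              ; 64 units of 1 byte  = 64 bytes
counters:   resd    4               ; 4 units of 4 bytes  = 16 bytes
bssSize     equ     $ - buffer      ; assembler computes 64 + 16 = 80

SECTION .text
global  _start

_start:
    mov     ebx, bssSize    ; use the reserved size as the exit status
    mov     eax, 1          ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```

The shell should report an exit status of 80.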
                            
Writing our program

We will be using the system call sys_read to receive and process input from the user. This function is assigned OPCODE 3 in the Linux System Call Table for 32-bit x86. Just like sys_write, this function also takes 3 arguments, which will be loaded into EDX, ECX and EBX before requesting a software interrupt that will call the function.

The arguments passed are as follows:

  • EDX will be loaded with the maximum length (in bytes) of the space in memory.
  • ECX will be loaded with the address of our variable created in the .bss section.
  • EBX will be loaded with the file we want to read from – in this case STDIN.

As always, the datatype and meaning of the arguments passed can be found in the function's definition.

When sys_read detects a linefeed, control returns to the program and the user's input is located at the memory address you passed in ECX. The linefeed itself is stored in the buffer along with the rest of the input.

helloworld-input.asm
                            ; Hello World Program (Getting input)
                            ; Compile with: nasm -f elf helloworld-input.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld-input.o -o helloworld-input
                            ; Run with: ./helloworld-input

                            %include        'functions.asm'

                            SECTION .data
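                            msg1        db      'Please enter your name: ', 0h      ; message string asking user for input
                            msg2        db      'Hello, ', 0h                       ; message string to use after user has entered their name

                            SECTION .bss
                            sinput:     resb    255                                 ; reserve a 255 byte space in memory for the user's input string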

                            SECTION .text
                            global  _start

                            _start:

                                mov     eax, msg1
                                call    sprint

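                                mov     edx, 255        ; maximum number of bytes to read
                                mov     ecx, sinput     ; reserved space to store our input (known as a buffer)
                                mov     ebx, 0          ; read from the STDIN file (0)
                                mov     eax, 3          ; invoke SYS_READ (kernel opcode 3)
                                int     80h

                                mov     eax, msg2
                                call    sprint

                                mov     eax, sinput     ; move our buffer into eax (Note: the input contains a linefeed)
                                call    sprint          ; call our print function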

                                call    quit
                            
~$ nasm -f elf helloworld-input.asm
~$ ld -m elf_i386 helloworld-input.o -o helloworld-input
~$ ./helloworld-input
Please enter your name: Daniel Givney
Hello, Daniel Givney
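Because the linefeed typed by the user ends up in the buffer, anything printed after the name would start on a new line. One possible way around this, sketched below, relies on the fact that sys_read returns the number of bytes read in EAX: if the last byte read is a linefeed, it is overwritten with a null terminating byte. The file name, msg3 and the printGreeting label are assumptions made for this example.

```nasm
; helloworld-input-trim.asm (hypothetical file name, for illustration only)
; Compile with: nasm -f elf helloworld-input-trim.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld-input-trim.o -o helloworld-input-trim
; Run with: ./helloworld-input-trim

%include        'functions.asm'

SECTION .data
msg1        db      'Please enter your name: ', 0h
msg2        db      'Hello, ', 0h
msg3        db      '!', 0h                 ; printed after the name (not in the original lesson)

SECTION .bss
sinput:     resb    255

SECTION .text
global  _start

_start:
    mov     eax, msg1
    call    sprint

    mov     edx, 254        ; read at most 254 bytes so one zero byte always remains
    mov     ecx, sinput
    mov     ebx, 0          ; read from the STDIN file
    mov     eax, 3          ; invoke SYS_READ (kernel opcode 3)
    int     80h             ; eax = number of bytes read, or a negative error code

    cmp     eax, 1
    jl      printGreeting   ; nothing was read, so there is nothing to trim
    cmp     byte [sinput + eax - 1], 0Ah
    jne     printGreeting   ; last byte is not a linefeed, leave the buffer alone
    mov     byte [sinput + eax - 1], 0h     ; replace the linefeed with a terminator

printGreeting:
    mov     eax, msg2
    call    sprint
    mov     eax, sinput
    call    sprint
    mov     eax, msg3
    call    sprintLF
    call    quit
```

With the input `Daniel Givney` this should print `Hello, Daniel Givney!` on a single line.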

Lesson 10

Count to 10

Firstly, some background

Counting by numbers is not as straightforward as you would think in assembly. Firstly, we need to pass sys_write an address in memory, so we can't just load our register with a number and call our print function. Secondly, numbers and strings are very different things in assembly. Strings are represented by what are called ASCII values. ASCII stands for American Standard Code for Information Interchange. An ASCII table makes a good reference while reading this lesson. ASCII was created as a way to standardise the representation of strings across all computers.

Remember, we can't print a number - we have to print a string. In order to count to 10 we will need to convert our numbers from standard integers to their ASCII string representations. Have a look at an ASCII table and notice that the character '1' actually has the value 49 in ASCII. In fact, adding 48 to a single-digit number is all we have to do to convert it from an integer to its ASCII string representation.
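The constant 48 is simply the ASCII value of the character '0', and NASM accepts character constants wherever a number is expected. The standalone sketch below (hypothetical file name, not part of the lesson) converts the integer 7 by adding '0' and prints the resulting character directly with sys_write.

```nasm
; digit-char.asm (hypothetical file name, for illustration only)
; Compile with: nasm -f elf digit-char.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 digit-char.o -o digit-char
; Run with: ./digit-char

SECTION .text
global  _start

_start:
    mov     eax, 7          ; the integer 7
    add     eax, '0'        ; '0' assembles to 48, so eax = 55, the ASCII value of '7'
    push    eax             ; stack memory now holds 37h, 0h, 0h, 0h
    mov     edx, 1          ; write one byte
    mov     ecx, esp        ; address of the character on the stack
    mov     ebx, 1          ; write to the STDOUT file
    mov     eax, 4          ; invoke SYS_WRITE (kernel opcode 4)
    int     80h
    add     esp, 4          ; discard the pushed dword

    mov     ebx, 0
    mov     eax, 1          ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```

Writing `'0'` instead of 48 changes nothing in the assembled program; it only makes the intent of the addition easier to read.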

Writing our program

What we will do with our program is count from 1 to 10 using the ECX register. We will then add 48 to our counter to convert it from a number to its ASCII string representation. We will then push this value onto the stack and call our print function, passing ESP as the memory address to print from. Once we have finished counting to 10 we will exit our counting loop and call our quit function.

helloworld-10.asm
                            ; Hello World Program (Count to 10)
                            ; Compile with: nasm -f elf helloworld-10.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld-10.o -o helloworld-10
                            ; Run with: ./helloworld-10

                            %include        'functions.asm'

                            SECTION .text
                            global  _start

                            _start:

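                                mov     ecx, 0          ; ecx is initialised to zero

                            nextNumber:
                                inc     ecx             ; increment ecx

                                mov     eax, ecx        ; move the number to print into eax
                                add     eax, 48         ; add 48 to our number to convert from integer to ascii
                                push    eax             ; push eax onto the stack
                                mov     eax, esp        ; get the address of the character on the stack
                                call    sprintLF        ; call our print function

                                pop     eax             ; remove the printed character from the stack
                                cmp     ecx, 10         ; have we reached 10 yet?
                                jne     nextNumber      ; jump if not equal and keep counting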

                                call    quit
                            
~$ nasm -f elf helloworld-10.asm
~$ ld -m elf_i386 helloworld-10.o -o helloworld-10
~$ ./helloworld-10
1
2
3
4
5
6
7
8
9
:

Error: Our number 10 prints a colon (:) character instead. What's going on?


Lesson 11

Count to 10 (itoa)

So why did our program in Lesson 10 print out a colon character instead of the number 10? Well, let's have a look at our ASCII table. We can see that the colon character has an ASCII value of 58. We were adding 48 to our integers to convert them to their ASCII string representations, so for ten we passed sys_write the value 58. What we actually need to pass is the ASCII value for the digit 1 followed by the ASCII value for the digit 0: the two bytes 49 and 48 are the correct string representation for the number 10. So we can't simply add 48 to our numbers to convert them. We first have to divide them by 10, because each place value needs to be converted individually.

We will write 2 new subroutines in this lesson, 'iprint' and 'iprintLF'. These functions will be used when we want to print ASCII string representations of numbers. We achieve this by passing the number in EAX. We then initialise a counter in ECX. We will repeatedly divide the number by 10 and each time convert the remainder to a character by adding 48. We will then push this onto the stack for later use. Once the quotient reaches zero we will enter our second loop. In this print loop we will print the converted characters from the stack and pop them off. Because the remainders were pushed least significant digit first, they come off the stack most significant digit first, which is the order we want to print them in. Popping them off the stack moves ESP forward to the next item on the stack. Each time we print a value we will decrease our counter ECX. Once all digits have been converted and printed we will return to our program.

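How does the divide instruction work?

The DIV and IDIV instructions work by dividing whatever is in EAX by the value passed to the instruction. With a 32-bit operand the dividend is really the 64-bit value held in EDX:EAX, so EDX must be cleared (or, for IDIV on signed values, sign-extended from EAX) before a 32-bit number is divided. The quotient part of the value is left in EAX and the remainder part is put into EDX (originally called the data register). Translator's note: the operand of DIV and IDIV must be a register or a memory location; an immediate value is not accepted.

For example.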

IDIV instruction example
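                            mov     eax, 10         ; EAX = 10
                            mov     edx, 0          ; clear EDX, the high half of the dividend
                            mov     esi, 10         ; ESI = 10
                            idiv    esi             ; divide EDX:EAX by ESI (10): now EAX = 1, EDX = 0
                            idiv    esi             ; divide EDX:EAX by ESI (10) again: now EAX = 0, EDX = 1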
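For signed division the high half of the dividend has to carry the sign of EAX, which is what the CDQ instruction is for. The self-checking sketch below (hypothetical file name and labels) divides -7 by 2 and exits with status 0 only if the quotient is -3 and the remainder is -1; IDIV truncates towards zero and the remainder takes the sign of the dividend.

```nasm
; idiv-signed.asm (hypothetical file name, for illustration only)
; Compile with: nasm -f elf idiv-signed.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 idiv-signed.o -o idiv-signed
; Run with: ./idiv-signed ; echo $?

SECTION .text
global  _start

_start:
    mov     eax, -7         ; dividend
    cdq                     ; sign-extend eax into edx:eax (edx = 0FFFFFFFFh)
    mov     esi, 2          ; divisor
    idiv    esi             ; eax = -3 (quotient), edx = -1 (remainder)

    mov     ebx, 1          ; assume failure until both results are checked
    cmp     eax, -3
    jne     done
    cmp     edx, -1
    jne     done
    mov     ebx, 0          ; both values matched

done:
    mov     eax, 1          ; invoke SYS_EXIT (kernel opcode 1), status in ebx
    int     80h
```

The shell should report an exit status of 0.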
                            
If we are only storing the remainder won't we have problems?

No. Because these are integers, when you divide a number by a bigger number the quotient in EAX is 0 and the remainder is the number itself: the divisor goes into it zero times, leaving the original value as the remainder in EDX. The most significant digit therefore still ends up on the stack as a remainder, and nothing is lost by storing only remainders. How good is that?
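This behaviour is easy to check. In the sketch below (hypothetical file name), 7 is divided by 10 and the remainder is returned as the exit status.

```nasm
; div-small.asm (hypothetical file name, for illustration only)
; Compile with: nasm -f elf div-small.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 div-small.o -o div-small
; Run with: ./div-small ; echo $?

SECTION .text
global  _start

_start:
    mov     eax, 7          ; dividend, smaller than the divisor
    mov     edx, 0          ; clear the high half of the dividend
    mov     esi, 10         ; divisor
    div     esi             ; eax = 0 (quotient), edx = 7 (remainder)

    mov     ebx, edx        ; use the remainder as the exit status
    mov     eax, 1          ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```

The shell should report an exit status of 7, the original number.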

Note: Only the new functions iprint and iprintLF have comments.

functions.asm
                            ;------------------------------------------
                            ; void iprint(Integer number)
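                            ; Integer printing function (itoa)
                            iprint:
                                push    eax             ; preserve eax, ecx, edx and esi on the stack
                                push    ecx
                                push    edx
                                push    esi
                                mov     ecx, 0          ; counter of how many characters we need to print in the end

                            divideLoop:
                                inc     ecx             ; count each character to print
                                mov     edx, 0          ; empty edx before every divide
                                mov     esi, 10         ; move 10 into esi
                                idiv    esi             ; divide eax by esi
                                add     edx, 48         ; convert the remainder in edx to its ascii representation
                                push    edx             ; push the converted character onto the stack
                                cmp     eax, 0          ; can the integer be divided anymore?
                                jnz     divideLoop      ; jump if not zero to the label divideLoop

                            printLoop:
                                dec     ecx             ; count down each character that we put on the stack
                                mov     eax, esp        ; move the stack pointer into eax for printing
                                call    sprint          ; call our string print function
                                pop     eax             ; remove the last character from the stack to move esp forward
                                cmp     ecx, 0          ; have we printed all characters we pushed onto the stack?
                                jnz     printLoop       ; jump if not zero to the label printLoop

                                pop     esi             ; restore esi, edx, ecx and eax to the values pushed at the start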
                                pop     edx
                                pop     ecx
                                pop     eax
                                ret


                            ;------------------------------------------
                            ; void iprintLF(Integer number)
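                            ; Integer printing function with linefeed (itoa)
                            iprintLF:
                                call    iprint          ; call our integer printing function

                                push    eax             ; push eax onto the stack to preserve it while we use eax in this function
                                mov     eax, 0Ah        ; move 0Ah into eax - 0Ah is the ascii character for a linefeed
                                push    eax             ; push the linefeed onto the stack so we can get its address
                                mov     eax, esp        ; move the address held in the stack pointer into eax for sprint
                                call    sprint          ; call our sprint function
                                pop     eax             ; remove our linefeed character from the stack
                                pop     eax             ; restore the value of eax from before our function was called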
                                ret


                            ;------------------------------------------
                            ; int slen(String message)
                            ; String length calculation function
                            slen:
                                push    ebx
                                mov     ebx, eax

                            nextchar:
                                cmp     byte [eax], 0
                                jz      finished
                                inc     eax
                                jmp     nextchar

                            finished:
                                sub     eax, ebx
                                pop     ebx
                                ret


                            ;------------------------------------------
                            ; void sprint(String message)
                            ; String printing function
                            sprint:
                                push    edx
                                push    ecx
                                push    ebx
                                push    eax
                                call    slen

                                mov     edx, eax
                                pop     eax

                                mov     ecx, eax
                                mov     ebx, 1
                                mov     eax, 4
                                int     80h

                                pop     ebx
                                pop     ecx
                                pop     edx
                                ret


                            ;------------------------------------------
                            ; void sprintLF(String message)
                            ; String printing with line feed function
                            sprintLF:
                                call    sprint

                                push    eax
                                mov     eax, 0AH
                                push    eax
                                mov     eax, esp
                                call    sprint
                                pop     eax
                                pop     eax
                                ret


                            ;------------------------------------------
                            ; void exit()
                            ; Exit program and restore resources
                            quit:
                                mov     ebx, 0
                                mov     eax, 1
                                int     80h
                                ret


                            
helloworld-itoa.asm
                            ; Hello World Program (Count to 10 itoa)
                            ; Compile with: nasm -f elf helloworld-itoa.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 helloworld-itoa.o -o helloworld-itoa
                            ; Run with: ./helloworld-itoa

                            %include        'functions.asm'

                            SECTION .text
                            global  _start

                            _start:

                                mov     ecx, 0

                            nextNumber:
                                inc     ecx
                                mov     eax, ecx
                                call    iprintLF        ; NOTE call our new integer printing function (itoa)
                                cmp     ecx, 10
                                jne     nextNumber

                                call    quit
                            
~$ nasm -f elf helloworld-itoa.asm
~$ ld -m elf_i386 helloworld-itoa.o -o helloworld-itoa
~$ ./helloworld-itoa
1
2
3
4
5
6
7
8
9
10
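The lesson's iprint issues one sys_write per digit. A different approach, shown here only as a sketch and not as the tutorial's method, is to fill a small buffer from the end backwards and print it with a single system call. The file name, the digits buffer and the nextDigit label are assumptions made for this example, and the number is treated as unsigned.

```nasm
; itoa-buffer.asm (hypothetical file name, for illustration only)
; Compile with: nasm -f elf itoa-buffer.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 itoa-buffer.o -o itoa-buffer
; Run with: ./itoa-buffer

SECTION .bss
digits:     resb    11              ; up to 10 decimal digits for a 32 bit value, plus a linefeed

SECTION .text
global  _start

_start:
    mov     eax, 1234567            ; the number to print
    mov     edi, digits + 10        ; edi points at the last byte of the buffer
    mov     byte [edi], 0Ah         ; store the trailing linefeed first
    mov     esi, 10                 ; divisor

nextDigit:
    mov     edx, 0                  ; clear the high half of the dividend
    div     esi                     ; eax = quotient, edx = remainder (0-9)
    add     edx, 48                 ; convert the remainder to its ascii character
    dec     edi                     ; step back one byte in the buffer
    mov     [edi], dl               ; store the character
    cmp     eax, 0                  ; any digits left?
    jnz     nextDigit

    mov     edx, digits + 11        ; one past the end of the buffer
    sub     edx, edi                ; length = end - start of the digits
    mov     ecx, edi                ; address of the first digit
    mov     ebx, 1                  ; write to the STDOUT file
    mov     eax, 4                  ; invoke SYS_WRITE (kernel opcode 4)
    int     80h

    mov     ebx, 0
    mov     eax, 1                  ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```

This should print 1234567 followed by a linefeed. Since the digits are written right to left, no second loop is needed to reverse them.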

Lesson 12

Calculator - addition

In this program we will be adding the registers EAX and EBX together and we'll leave our answer in EAX. Firstly we use the MOV instruction to load EAX with an integer (in this case 90). We then MOV an integer into EBX (in this case 9). Now all we need to do is use the ADD instruction to perform our addition. EBX & EAX will be added together, leaving our answer in the leftmost register in this instruction (in our case EAX). Then all we need to do is call our integer printing function to complete the program.

calculator-addition.asm
                            ; Calculator (Addition)
                            ; Compile with: nasm -f elf calculator-addition.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 calculator-addition.o -o calculator-addition
                            ; Run with: ./calculator-addition

                            %include        'functions.asm'

                            SECTION .text
                            global  _start

                            _start:

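                                mov     eax, 90     ; move our first number into eax
                                mov     ebx, 9      ; move our second number into ebx
                                add     eax, ebx    ; add ebx to eax (EAX = EAX + EBX)
                                call    iprintLF    ; call our integer print with linefeed function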

                                call    quit
                            
~$ nasm -f elf calculator-addition.asm
~$ ld -m elf_i386 calculator-addition.o -o calculator-addition
~$ ./calculator-addition
99

Lesson 13

Calculator - subtraction

In this program we will be subtracting the value in the register EBX from the value in the register EAX. Firstly we load EAX and EBX with integers in the same way as Lesson 12. The only difference is we will be using the SUB instruction to perform our subtraction logic, leaving our answer in the leftmost register of this instruction (in our case EAX). Then all we need to do is call our integer printing function to complete the program.

calculator-subtraction.asm
                            ; Calculator (Subtraction)
                            ; Compile with: nasm -f elf calculator-subtraction.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 calculator-subtraction.o -o calculator-subtraction
                            ; Run with: ./calculator-subtraction

                            %include        'functions.asm'

                            SECTION .text
                            global  _start

                            _start:

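                                mov     eax, 90     ; move our first number into eax
                                mov     ebx, 9      ; move our second number into ebx
                                sub     eax, ebx    ; subtract ebx from eax (EAX = EAX - EBX)
                                call    iprintLF    ; call our integer print with linefeed function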

                                call    quit
                            
~$ nasm -f elf calculator-subtraction.asm
~$ ld -m elf_i386 calculator-subtraction.o -o calculator-subtraction
~$ ./calculator-subtraction
81
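If the operands are swapped the result is negative, and iprintLF, which clears EDX before dividing, would print 9 - 90 as the large positive number 4294967215. One possible way to handle this is sketched below: a hypothetical helper, iprintSignedLF, which is not part of the original functions.asm, prints a minus sign and negates the value before handing it to iprintLF.

```nasm
; calculator-subtraction-signed.asm (hypothetical file name, for illustration only)
; Compile with: nasm -f elf calculator-subtraction-signed.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 calculator-subtraction-signed.o -o calculator-subtraction-signed
; Run with: ./calculator-subtraction-signed

%include        'functions.asm'

SECTION .text
global  _start

_start:
    mov     eax, 9
    mov     ebx, 90
    sub     eax, ebx        ; eax = 9 - 90 = -81, stored as 0FFFFFFAFh
    call    iprintSignedLF
    call    quit

;------------------------------------------
; void iprintSignedLF(Integer number)
; Hypothetical helper: signed integer printing with linefeed
iprintSignedLF:
    push    eax             ; preserve the caller's eax
    cmp     eax, 0
    jge     .magnitude      ; non-negative numbers need no sign
    push    eax             ; keep the number while we print the sign
    push    dword '-'       ; stack bytes 2Dh, 0h, 0h, 0h form the string "-"
    mov     eax, esp
    call    sprint
    add     esp, 4          ; drop the "-" string from the stack
    pop     eax             ; get the number back
    neg     eax             ; make it positive

.magnitude:
    call    iprintLF        ; print the magnitude and a linefeed
    pop     eax             ; restore the caller's eax
    ret
```

This should print -81.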

Lesson 14

Calculator - multiplication

In this program we will be multiplying the value in EBX by the value present in EAX. Firstly we load EAX and EBX with integers in the same way as Lesson 12. This time though we will be calling the MUL instruction to perform our multiplication logic. The MUL instruction is different from many instructions in NASM, in that it only accepts one further argument. The MUL instruction always multiplies EAX by whatever value is passed after it. The low 32 bits of the answer are left in EAX and the high 32 bits in EDX, so whenever the result fits in 32 bits the answer is simply the value in EAX.

calculator-multiplication.asm
                            ; Calculator (Multiplication)
                            ; Compile with: nasm -f elf calculator-multiplication.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 calculator-multiplication.o -o calculator-multiplication
                            ; Run with: ./calculator-multiplication

                            %include        'functions.asm'

                            SECTION .text
                            global  _start

                            _start:

                                mov     eax, 90
                                mov     ebx, 9
                                mul     ebx         ; EAX = EAX * EBX
                                call    iprintLF

                                call    quit
                            
~$ nasm -f elf calculator-multiplication.asm
~$ ld -m elf_i386 calculator-multiplication.o -o calculator-multiplication
~$ ./calculator-multiplication
810
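When the product does not fit in 32 bits, MUL places the upper half in EDX and sets the carry and overflow flags. The sketch below (hypothetical file name and labels) multiplies 0FFFFFFFFh by 2 and uses the carry flag to report through the exit status whether the result overflowed EAX.

```nasm
; mul-overflow.asm (hypothetical file name, for illustration only)
; Compile with: nasm -f elf mul-overflow.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 mul-overflow.o -o mul-overflow
; Run with: ./mul-overflow ; echo $?

SECTION .text
global  _start

_start:
    mov     eax, 0FFFFFFFFh ; largest unsigned 32 bit value
    mov     ebx, 2
    mul     ebx             ; edx:eax = 00000001h:0FFFFFFFEh, carry flag is set
    jc      overflow        ; the product did not fit in eax alone

    mov     ebx, 0          ; exit status 0: result fits in eax
    jmp     done

overflow:
    mov     ebx, 1          ; exit status 1: upper half of the result is in edx

done:
    mov     eax, 1          ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```

The shell should report an exit status of 1. With the lesson's values of 90 and 9 the jump would not be taken.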

Lesson 15

Calculator - division

In this program we will be dividing the value in EAX by the value present in EBX. We've used division before in our integer print subroutine. Our program requires an extra string in order to print out the correct answer, but otherwise there's nothing complicated going on.

Firstly we load EAX and EBX with integers in the same way as Lesson 12. Division logic is performed using the DIV instruction. The DIV instruction always divides EAX by the value passed after it. It will leave the quotient part of the answer in EAX and put the remainder part in EDX (the original data register). We then MOV our strings and integers into EAX and call our print functions to print out the correct answer.

calculator-division.asm
                            ; Calculator (Division)
                            ; Compile with: nasm -f elf calculator-division.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 calculator-division.o -o calculator-division
                            ; Run with: ./calculator-division

                            %include        'functions.asm'

                            SECTION .data
                            msg1        db      ' remainder ', 0h  ; a message string to correctly output the result, with its null terminating byte

                            SECTION .text
                            global  _start

                            _start:

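                                mov     eax, 90     ; move our first number into eax
                                mov     ebx, 9      ; move our second number into ebx
                                mov     edx, 0      ; clear edx, the high half of the dividend
                                div     ebx         ; divide eax by ebx: new EAX * EBX + new EDX = old EAX
                                call    iprint      ; call our integer print function on the quotient
                                mov     eax, msg1   ; move our message string into eax
                                call    sprint      ; call our string print function
                                mov     eax, edx    ; move our remainder into eax
                                call    iprintLF    ; call our integer printing with linefeed function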

                                call    quit
                            
~$ nasm -f elf calculator-division.asm
~$ ld -m elf_i386 calculator-division.o -o calculator-division
~$ ./calculator-division
10 remainder 0
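
The division above leaves no remainder, so it can help to see one that does. The following is a minimal sketch, not part of the original lesson: the file name div-remainder.asm is hypothetical, and the program returns the remainder as its exit status instead of printing it so that it needs nothing from functions.asm.

```nasm
; Division with a remainder (hypothetical example file: div-remainder.asm)
; Compile with: nasm -f elf div-remainder.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 div-remainder.o -o div-remainder
; Run with: ./div-remainder ; echo $?

SECTION .text
global  _start

_start:
    mov     edx, 0          ; clear the high half of the dividend EDX:EAX
    mov     eax, 17         ; low half of the dividend
    mov     ebx, 5          ; divisor
    div     ebx             ; EAX = 3 (quotient), EDX = 2 (remainder)

    mov     ebx, edx        ; use the remainder as the exit status
    mov     eax, 1          ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```

Running the program and then `echo $?` should show 2, the remainder of 17 divided by 5.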

Lesson 16

Calculator (atoi)

基于文本转数字的计算器

Our program will take several command line arguments and add them together printing out the result in the terminal.

本程序将从命令行接受若干参数并将它们加到一起,最后将结果输出到终端。

Writing our program
写程序

Our program begins by using the POP instruction to get the number of passed arguments off the stack. This value is stored in ECX (originally known as the counter register). It will then POP the next value off the stack containing the program name and remove it from the number of arguments stored in ECX. It will then loop through the rest of the arguments popping each one off the stack and performing our addition logic. As we know, arguments passed via the command line are received by our program as strings. Before we can add the arguments together we will need to convert them to integers otherwise our result will not be correct. We do this by calling our Ascii to Integer function (atoi). This function will convert the ascii value into an integer and place the result in EAX. We can then add this value to EDX (originally known as the data register) where we will store the result of our additions. If the value passed to atoi is not an ascii representation of an integer our function will return zero instead. When all arguments have been converted and added together we will print out the result and call our quit function.

我们的程序先使用pop指令获取到栈顶的参数个数并储存于ECX寄存器中,接着将程序名弹出并将ECX减去这项。循环将栈中各项弹出并相加。然而,参数是以字符串的形式传递的,在我们将数字相加之前,我们需要有一个函数将字符串转换为数字(atoi)。这个函数将会通过ASCII码将数字转换为数值并储存在EAX,我们将这个数加到EDX中储存求和结果。如果传递进入函数的字符串并不是一个表示数字的字符串,那我们的函数返回0.当所有的参数都被转换求和后,输出总和并结束程序。

How does the atoi function work
文本转数字函数如何运作?

Converting an ascii string into an integer value is not a trivial task. We know how to convert an integer to an ascii string so the process should essentially work in reverse. Firstly we take the address of the string and move it into ESI (originally known as the source register). We will then move along the string byte by byte (think of each byte as being a single digit or decimal placeholder). For each digit we will check if its value is between 48 and 57 (the ascii values for the digits 0-9).

将文本转换为数字并不是一个容易事,我们知道如何转换一个数字到字符串,这个工作就需要反着来。首先我们获取到字符串的地址,并写入常被叫做源地址寄存器的ESI,接着在认为每个字符是一个单独的数字的情况下逐个移动字符,检查是否在ASCII码代表0到9的48到57的范围内。

Once we have performed this check and determined that the byte can be converted to an integer we will perform the following logic. We will subtract 48 from the value - converting the ascii value to its decimal equivalent. We will then add this value to EAX (the general purpose register that will store our result). We will then multiply EAX by 10 as each byte represents a decimal placeholder and continue the loop.

一旦检查通过,我们就可以执行接下来的转换逻辑:我们使用每一个值减去48,将ASCII码转换为相应的数字,加到一般作为结果寄存器的EAX中,接着让EAX乘10进一位继续循环。

When all bytes have been converted we need to do one last thing before we return the result. The last digit of any number represents a single unit (not a multiple of 10) so we have multiplied our result one too many times. We simply divide it by 10 once to correct this and then return. If no digits were converted however, we skip this divide instruction.

当所有字符都完成转换后我们还有一件事要做:最后一位数代表的是个位而非十位,所以结果多乘了一次10,只需要再除以10即可。不过如果一个数字字符都没有转换,就跳过这条除法指令。

What is the BL register
BL寄存器是什么?

You may have noticed that the atoi function references the BL register. So far in these tutorials we have been exclusively using 32bit registers. Parts of these 32bit general purpose registers can also be referenced under their own names. These parts are available in 16bits and 8bits. We wanted a single byte (8bits) because a byte is the size of memory that is required to store a single ascii character. If we had loaded from memory into the full 32bit register we would have copied four bytes at once, leaving the following three characters of the string as 'rubbish' in the upper bits - because only the lowest 8bits would be meaningful for our calculation.

你可能会注意到我们的字符串转数字函数使用到了BL寄存器。截至目前我们使用到的寄存器都是32位寄存器,而这些32位通用寄存器的一部分也可以用单独的名字来访问,有16位和8位两种大小。我们需要用到8位寄存器,因为储存单个ASCII字符正好需要一个字节。如果从内存直接读入完整的32位寄存器,就会一次复制四个字节,字符串中后面三个字符会作为“垃圾”留在高位上,因为我们的计算中只有最低的8位实际有用。

The EBX register is 32bits. EBX's 16 bit segment is referenced as BX. BX contains the 8bit segments BL and BH (Lower and Higher bits). We wanted the first 8bits (lower bits) of EBX and so we referenced that storage area using BL.

EBX寄存器是32位寄存器,EBX的低16位是BX寄存器。BX又可以分为低8位BL和高8位BH。我们需要使用EBX的第一个低8位,所以使用BL寄存器。
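
The relationship between EBX, BX, BH and BL can be checked with a few instructions. The following is a small sketch with a hypothetical file name; the MOVZX instruction it uses is not covered in this tutorial and is shown only as one way to copy a byte while clearing the upper bits.

```nasm
; Register parts (hypothetical example file: registers.asm)
; Compile with: nasm -f elf registers.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 registers.o -o registers
; Run with: ./registers ; echo $?

SECTION .text
global  _start

_start:
    mov     ebx, 12345678h  ; EBX = 12345678h
                            ; BX  =     5678h (bits 0-15)
                            ; BH  =     56h   (bits 8-15)
                            ; BL  =       78h (bits 0-7)
    mov     bl, 2Ah         ; writing BL changes only the lowest byte: EBX = 1234562Ah
    movzx   ebx, bl         ; zero-extend BL into all of EBX: EBX = 0000002Ah
    mov     eax, 1          ; invoke SYS_EXIT (kernel opcode 1) with EBX as the exit status
    int     80h
```

The exit status shown by `echo $?` should be 42 (2Ah), which confirms that only the lowest byte survived the zero extension.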

Almost every assembly language tutorial begins with a history of the registers, their names and their sizes. These tutorials however were written to provide a foundation in NASM by first writing code and then understanding the theory. The full story about the size of registers, their history and importance is beyond the scope of this tutorial but we will return to that story in later tutorials.

几乎所有的汇编教程都是以寄存器的历史、名称、大小开始的,那些教程是通过写代码和对理论的理解来提供一个NASM汇编的基础。在本教程中,关于寄存器大小、历史和其他重要的事情都会在后文慢慢叙述。

Note: Only the new function in this file 'atoi' is shown below.

functions.asm
                                ;------------------------------------------
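                                ; int atoi(String number)
                                ; Ascii to integer function (atoi)
                                atoi:
                                    push    ebx             ; preserve EBX, ECX, EDX and ESI on the stack
                                    push    ecx
                                    push    edx
                                    push    esi
                                    mov     esi, eax        ; move the string pointer passed in EAX into ESI
                                    mov     eax, 0          ; initialise EAX (result) and ECX (byte counter) to 0
                                    mov     ecx, 0

                                .multiplyLoop:
                                    xor     ebx, ebx        ; reset both the lower and upper bytes of EBX to 0 (translator's note: this is cheaper than mov ebx, 0)
                                    mov     bl, [esi+ecx]   ; read a single byte from memory into BL
                                    cmp     bl, 48          ; compare BL with ascii value 48 ('0')
                                    jl      .finished       ; JL (Jump if Less): jump if less than (translator's note: the NULL at the end of the string also triggers this jump)
                                    cmp     bl, 57          ; compare BL with ascii value 57 ('9')
                                    jg      .finished       ; JG (Jump if Greater): jump if greater than

                                    sub     bl, 48          ; subtract 48 to convert the character to its digit value
                                    add     eax, ebx        ; EAX = EAX + EBX; the upper bits of EBX are 0, so EBX equals BL
                                    mov     ebx, 10         ; EBX = 10
                                    mul     ebx             ; EAX = EAX * EBX (10); MUL cannot take an immediate operand
                                    inc     ecx             ; increment the byte counter ECX
                                    jmp     .multiplyLoop   ; continue the loop

                                .finished:
                                    cmp     ecx, 0          ; if ECX is 0 the first byte was already a non-digit and nothing was converted
                                    je      .restore        ; JE (Jump if Equal): skip the correction in that case
                                    mov     ebx, 10         ; EBX = 10
                                    div     ebx             ; EAX = EAX / EBX (10)

                                .restore:
                                    pop     esi             ; restore ESI, EDX, ECX and EBX
                                    pop     edx
                                    pop     ecx
                                    pop     ebx
                                    ret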
                            
calculator-atoi.asm
                            ; Calculator (ATOI)
                            ; Compile with: nasm -f elf calculator-atoi.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 calculator-atoi.o -o calculator-atoi
                            ; Run with: ./calculator-atoi 20 1000 317

                            %include        'functions.asm'

                            SECTION .text
                            global  _start

                            _start:

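                                pop     ecx             ; first value on the stack is the number of arguments
                                pop     edx             ; second value on the stack is the program name (discarded)
                                sub     ecx, 1          ; decrease ECX by 1 (number of arguments without the program name)
                                mov     edx, 0          ; initialise EDX to store our additions

                            nextArg:
                                cmp     ecx, 0h         ; check whether any arguments are left
                                jz      noMoreArgs      ; JZ (Jump if Zero): if none are left, leave the loop
                                pop     eax             ; pop the next argument off the stack
                                call    atoi            ; convert the ascii string to an integer
                                add     edx, eax        ; add it to the running total
                                dec     ecx             ; decrease the number of arguments left by 1
                                jmp     nextArg         ; loop

                            noMoreArgs:
                                mov     eax, edx        ; move the total into EAX for printing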
                                call    iprintLF
                                call    quit
                            
~$ nasm -f elf calculator-atoi.asm
~$ ld -m elf_i386 calculator-atoi.o -o calculator-atoi
~$ ./calculator-atoi 20 1000 317
1337
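
The atoi function in this lesson adds each digit and then multiplies by 10, which is why it needs one corrective division at the end. A common variant multiplies the running result first and adds the digit afterwards, so no correction is needed. The following is a sketch of that variant, not the tutorial's own function: the file name is hypothetical, it uses the three-operand form of IMUL (which, unlike MUL, accepts an immediate), and it returns the result as the exit status so it does not depend on functions.asm.

```nasm
; Ascii to integer, multiply-then-add variant (hypothetical example file: atoi-variant.asm)
; Compile with: nasm -f elf atoi-variant.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 atoi-variant.o -o atoi-variant
; Run with: ./atoi-variant ; echo $?

SECTION .data
number      db      '42', 0h    ; the string to convert

SECTION .text
global  _start

_start:
    mov     esi, number     ; ESI points at the current character
    mov     eax, 0          ; EAX holds the running result

.nextDigit:
    xor     ebx, ebx        ; clear all of EBX
    mov     bl, [esi]       ; read a single byte into BL
    cmp     bl, 48          ; below '0' (this includes the terminating NULL)?
    jl      .done
    cmp     bl, 57          ; above '9'?
    jg      .done
    sub     bl, 48          ; convert the character to its digit value
    imul    eax, eax, 10    ; shift the running result one decimal place to the left
    add     eax, ebx        ; then add the new digit as the units
    inc     esi             ; move on to the next character
    jmp     .nextDigit

.done:
    mov     ebx, eax        ; use the result as the exit status
    mov     eax, 1          ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```

For the string '42' the running result goes 0, 4, 42, and `echo $?` should show 42. Exit statuses only keep the lowest 8 bits, so this way of checking the result only works for values up to 255.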

Lesson 17

Namespace

命名空间

Namespace is a necessary construct in any software project that involves a codebase that is larger than a few simple functions. Namespace provides scope to your identifiers and allows you to reuse naming conventions to make your code more readable and maintainable. In assembly language where subroutines are identified by global labels, namespace can be achieved by using local labels.

命名空间是在任何软件工程中都很重要的一种结构,包括一些稍大于单个函数的代码库。命名空间可以划分独立的标识符空间,使得你可以重新使用相同的标识符名称,由此使得代码可读性和可维护性得到提升。在汇编语言中,子程序使用全局标签区分,命名空间使用本地标签(局部标签)来构成。

Up until the last few tutorials we have been using global labels exclusively. This means that blocks of logic that essentially perform the same task needed a label with a unique identifier. A good example would be our "finished" labels. These were global in scope meaning when we needed to break out of a loop in one function we could jump to a "finished" label. But if we needed to break out of a loop in a different function we would need to name this same task something else maybe calling it "done" or "continue". Being able to reuse the label "finished" would mean that someone reading the code would know that these blocks of logic perform almost the same task.

以上的这些教程我们已经使用过了很多全局标签了,这意味着执行相同任务的逻辑需要独特的标识符:对此的一个很好的例子是'finished'标签。如果finished标签是全局标签,那么我们每一个函数需要结束循环时都会跳转到同一个finished标签中。但是如果我们需要在别的函数中以不同的结束循环逻辑操作时,就不再能使用finished这个标识符,而可能需要叫做done或者continue等等。如果能够复用finished这个标识符,别人在读我们的代码时也能直接看出来这些代码块的逻辑是类似的。

Local labels are prepended with a "." at the beginning of their name for example ".finished". You may have noticed them appearing as our code base in functions.asm grew. A local label is given the namespace of the first global label above it. You can jump to a local label by using the JMP instruction and the assembler will work out which local label you are referencing by determining in what scope (based on the global labels above it) the instruction appears.

局部标签以英文句号.开头,例如'.finished'。你可能注意到我们已经使用过了。局部标签位于其上方最近的一个全局标签所构成的命名空间作用域中,你可以通过JMP指令跳转到局部标签。汇编器会自动在全局标签的命名空间作用域里面寻找你正在使用的局部标签。

Note: The file functions.asm was modified adding namespaces in all the subroutines. This is particularly important in the "slen" subroutine which contains a "finished" global label.

注意: 本课附件中的functions.asm已经修改为所有函数都加上命名空间了。这对于slen(strlen)子程序很重要。
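
NASM also lets code outside a scope reach a local label by writing its full name, the global label followed by the local one. The following is a small sketch of that with hypothetical label and file names; it reports which label was reached through its exit status so that it needs nothing from functions.asm.

```nasm
; Full names of local labels (hypothetical example file: local-labels.asm)
; Compile with: nasm -f elf local-labels.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 local-labels.o -o local-labels
; Run with: ./local-labels ; echo $?

SECTION .text
global  _start

_start:
    jmp     second.finished ; the full name reaches a local label from another scope

first:                      ; never executed, only here to create a second scope
    mov     ebx, 1
.finished:                  ; full name: first.finished
    jmp     exit

second:
    mov     ebx, 2          ; skipped by the jump above
.finished:                  ; full name: second.finished
    mov     ebx, 3

exit:
    mov     eax, 1          ; invoke SYS_EXIT (kernel opcode 1) with EBX as the exit status
    int     80h
```

Both scopes define a .finished label without clashing, and `echo $?` should show 3 because the jump landed on second.finished.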

namespace.asm
                            ; Namespace
                            ; Compile with: nasm -f elf namespace.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 namespace.o -o namespace
                            ; Run with: ./namespace

                            %include        'functions.asm'

                            SECTION .data
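                            msg1        db      'Jumping to finished label.', 0h        ; a message string
                            msg2        db      'Inside subroutine number: ', 0h        ; a message string
                            msg3        db      'Inside subroutine "finished".', 0h     ; a message string

                            SECTION .text
                            global  _start

                            _start:

                            subroutineOne:
                                mov     eax, msg1       ; move the address of msg1 into EAX
                                call    sprintLF        ; print the string with a linefeed
                                jmp     .finished       ; jump to the local label in subroutineOne's scope

                            .finished:
                                mov     eax, msg2       ; move the address of msg2 into EAX
                                call    sprint          ; print the string
                                mov     eax, 1          ; move 1 into EAX (we are in subroutine number one)
                                call    iprintLF        ; print the number with a linefeed

                            subroutineTwo:
                                mov     eax, msg1       ; move the address of msg1 into EAX
                                call    sprintLF        ; print the string with a linefeed
                                jmp     .finished       ; jump to the local label in subroutineTwo's scope

                            .finished:
                                mov     eax, msg2       ; move the address of msg2 into EAX
                                call    sprint          ; print the string
                                mov     eax, 2          ; move 2 into EAX (we are in subroutine number two)
                                call    iprintLF        ; print the number with a linefeed

                                mov     eax, msg1       ; move the address of msg1 into EAX
                                call    sprintLF        ; print the string with a linefeed
                                jmp     finished        ; jump to the global label finished

                            finished:
                                mov     eax, msg3       ; move the address of msg3 into EAX
                                call    sprintLF        ; print the string with a linefeed
                                call    quit            ; exit the program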
                            
~$ nasm -f elf namespace.asm
~$ ld -m elf_i386 namespace.o -o namespace
~$ ./namespace
Jumping to finished label.
Inside subroutine number: 1
Jumping to finished label.
Inside subroutine number: 2
Jumping to finished label.
Inside subroutine "finished".

Lesson 18

Fizz Buzz

Firstly, some background
一些背景信息

FizzBuzz is a group word game played in schools to teach children division. Players take turns to count aloud integers from 1 to 100 replacing any number divisible by 3 with the word "fizz" and any number divisible by 5 with the word "buzz". Numbers that are divisible by both 3 and 5 are replaced by the word "fizzbuzz". This children's game has also become a de facto interview screening question for computer programming jobs as it's thought to easily discover candidates that can't construct a simple logic gate.

FizzBuzz是学校里用来教孩子除法的一种多人文字游戏。玩家轮流从1数到100,遇到能被3整除的数要喊fizz,能被5整除的数要喊buzz。如果同时能被3和5整除,那就喊fizzbuzz。这个儿童游戏也已成为程序员面试中事实上的筛选题,因为人们认为它能轻易筛出那些连简单逻辑都写不出来的应聘者。

Writing our program
写程序

There are a number of code solutions to this simple game and some languages offer very trivial and elegant solutions. Depending on how you choose to solve it, the solution almost always involves an if statement and possibly an else statement depending whether you choose to exploit the mathematical property that anything divisible by 5 & 3 would also be divisible by 3 * 5. Being that this is an assembly language tutorial we will provide a solution that involves a structure of two cascading if statements to print the words "fizz" and/or "buzz" and an else statement in case these fail, to print the integer as an ascii value. Each iteration of our loop will then print a line feed. Once we reach 100 we call our program exit function.

有很多的解决方案能够实现这个简单的游戏,甚至一些编程语言能够给你不费吹灰之力且优雅的解决方案。取决于你想实现的方式,解决方案主要包括一个if语句和可能的else语句,依靠你能否恰当地利用数学运算来判断一个数字能否被5、3整除或者能被3*5整除。在本教程中,我们将会使用一种包括两个级联if语句和一个else语句的结构来完成这个工作。两个if语句判断是否应该输出fizz或者buzz,如果都失败了就输出原始的数值。循环中每一项结束后都会输出一个换行符,一旦我们达到100就调用结束函数。

fizzbuzz.asm
                            ; Fizzbuzz
                            ; Compile with: nasm -f elf fizzbuzz.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 fizzbuzz.o -o fizzbuzz
                            ; Run with: ./fizzbuzz

                            %include        'functions.asm'

                            SECTION .data
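                            fizz        db      'Fizz', 0h     ; a message string
                            buzz        db      'Buzz', 0h     ; a message string

                            SECTION .text
                            global  _start

                            _start:

                                mov     esi, 0          ; initialise ESI, which will hold the checkBuzz result
                                mov     edi, 0          ; initialise EDI, which will hold the checkFizz result
                                mov     ecx, 0          ; initialise our counter

                            nextNumber:
                                inc     ecx             ; increment our counter

                            .checkFizz:
                                mov     edx, 0          ; clear EDX, which will hold the remainder after division
                                mov     eax, ecx        ; move the current number into EAX as the dividend
                                mov     ebx, 3          ; move the divisor 3 into EBX
                                div     ebx             ; divide EAX by EBX
                                mov     edi, edx        ; move the remainder into EDI
                                cmp     edi, 0          ; a remainder of 0 means the number divides by 3
                                jne     .checkBuzz      ; JNE (Jump if Not Equal): not a multiple of 3, go straight to the Buzz check
                                mov     eax, fizz       ; a multiple of 3: move the address of the fizz string into EAX
                                call    sprint          ; print 'Fizz'

                            .checkBuzz:
                                mov     edx, 0          ; same logic as above, with the divisor 5
                                mov     eax, ecx
                                mov     ebx, 5
                                div     ebx
                                mov     esi, edx        ; move the remainder into ESI
                                cmp     esi, 0
                                jne     .checkInt
                                mov     eax, buzz
                                call    sprint

                            .checkInt:
                                cmp     edi, 0          ; EDI holds the checkFizz result
                                je     .continue        ; 0 means divisible by 3, so skip printing the integer
                                cmp     esi, 0          ; ESI holds the checkBuzz result
                                je     .continue        ; 0 means divisible by 5, so skip printing the integer
                                mov     eax, ecx        ; neither a multiple of 3 nor of 5: move the integer into EAX
                                call    iprint          ; print the integer

                            .continue:
                                mov     eax, 0Ah        ; print a linefeed via the stack
                                push    eax
                                mov     eax, esp
                                call    sprint
                                pop     eax
                                cmp     ecx, 100        ; has the counter reached 100?
                                jne     nextNumber      ; if not, continue with the next number

                                call    quit            ; otherwise exit the program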
                            
~$ nasm -f elf fizzbuzz.asm
~$ ld -m elf_i386 fizzbuzz.o -o fizzbuzz
~$ ./fizzbuzz
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
...
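
Division is only one way to detect multiples. As an alternative sketch (not the tutorial's solution), the program below keeps two countdown counters that reach zero on every third and every fifth number, so no DIV instruction is needed. The file name is hypothetical, and the code assumes that sprint, iprint and quit from functions.asm behave as in the earlier lessons and preserve EBX, ECX, ESI and EDI across calls.

```nasm
; Fizzbuzz without division (hypothetical example file: fizzbuzz-counters.asm)
; Compile with: nasm -f elf fizzbuzz-counters.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 fizzbuzz-counters.o -o fizzbuzz-counters
; Run with: ./fizzbuzz-counters

%include        'functions.asm'

SECTION .data
fizz        db      'Fizz', 0h
buzz        db      'Buzz', 0h

SECTION .text
global  _start

_start:
    mov     ecx, 0          ; the current number
    mov     esi, 3          ; counts down to the next multiple of 3
    mov     edi, 5          ; counts down to the next multiple of 5

nextNumber:
    inc     ecx             ; move on to the next number
    mov     ebx, 0          ; becomes 1 once a word has been printed for this number

.checkFizz:
    dec     esi             ; one step closer to the next multiple of 3
    jnz     .checkBuzz      ; not there yet
    mov     esi, 3          ; reset the countdown
    mov     eax, fizz
    call    sprint          ; print 'Fizz'
    mov     ebx, 1          ; remember that a word was printed

.checkBuzz:
    dec     edi             ; one step closer to the next multiple of 5
    jnz     .checkInt       ; not there yet
    mov     edi, 5          ; reset the countdown
    mov     eax, buzz
    call    sprint          ; print 'Buzz'
    mov     ebx, 1          ; remember that a word was printed

.checkInt:
    cmp     ebx, 0          ; was a word printed?
    jne     .continue       ; yes, so skip printing the integer
    mov     eax, ecx
    call    iprint          ; print the integer

.continue:
    mov     eax, 0Ah        ; print a linefeed via the stack
    push    eax
    mov     eax, esp
    call    sprint
    pop     eax
    cmp     ecx, 100        ; has the counter reached 100?
    jne     nextNumber      ; if not, continue with the next number

    call    quit
```

The trade-off is two extra registers of state in exchange for avoiding DIV, which is one of the slower integer instructions.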

Lesson 19

Execute Command

执行命令

Firstly, some background
一些背景知识

The EXEC family of functions replace the currently running process with a new process, that executes the command you specified when calling it. We will be using the SYS_EXECVE function in this lesson to replace our program's running process with a new process that will execute the linux program /bin/echo to print out “Hello World!”.

EXEC是一系列函数,用于将当前的进程替换为一个新进程,用以执行你在调用EXEC函数时指定的命令。在本课中,我们将使用SYS_EXECVE函数来替换我们程序的进程为/bin/echo以输出'Hello World!'

Naming convention
命名规范

The naming convention used for this family of functions is exec (execute) followed by one or more of the following letters.

  • e - An array of pointers to environment variables is explicitly passed to the new process image.
  • l - Command-line arguments are passed individually to the function.
  • p - Uses the PATH environment variable to find the file named in the path argument to be executed.
  • v - Command-line arguments are passed to the function as an array of pointers.

exec系列的函数使用了如下的命名规范

  • e - 一个指向环境变量的指针数组将会传递给新的进程.
  • l - 命令行参数将会在函数中独立传递
  • p - 使用PATH环境变量来寻找传入的文件
  • v - 命令行参数将会作为指针数组传入
Writing our program
写程序

The V & E at the end of our function name means we will need to pass our arguments in the following format: The first argument is a string containing the command to execute, then an array of arguments to pass to that command and then another array of environment variables that the new process will use. As we are calling a simple command we won't pass any special environment variables to the new process and instead will pass 0h (null).

我们调用的EXEC末尾的V和E意味着传递到参数遵循如下的原则:第一个参数是要执行的命令,接着是参数的数组。最后是新进程要使用到的环境变量。因为我们不需要传递什么环境变量,所以我们只传递一个NULL。

Both the command arguments and the environment arguments need to be passed as an array of pointers (addresses to memory). That's why we define our strings first and then define a simple null-terminated struct (array) of the variables names. This is then passed to SYS_EXECVE. We call the function and the process is replaced by our command and output is returned to the terminal.

命令行参数和环境变量都需要以指针数组的形式传递。这也是为什么我们先定义字符串,再定义一个简单的以NULL结束的数组传递给SYS_EXECVE。我们调用这个函数接着现在的进程会被我们的命令所替换,输出会返回到终端上。

execute.asm
                            ; Execute
                            ; Compile with: nasm -f elf execute.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 execute.o -o execute
                            ; Run with: ./execute

                            %include        'functions.asm'

                            SECTION .data
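                            command         db      '/bin/echo', 0h     ; command to execute
                            arg1            db      'Hello World!', 0h
                            arguments       dd      command
                                            dd      arg1                ; arguments to pass to the command line (in this case just one)
                                            dd      0h                  ; end the struct
                            environment     dd      0h                  ; environment variables to pass (in this case none), just the terminating NULL

                            SECTION .text
                            global  _start

                            _start:

                                mov     edx, environment    ; address of the environment variable array
                                mov     ecx, arguments      ; address of the command line argument array
                                mov     ebx, command        ; address of the file to execute
                                mov     eax, 11             ; invoke SYS_EXECVE (kernel opcode 11)
                                int     80h

                                call    quit                ; call our quit function (translator's note: only reached if SYS_EXECVE fails)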
                            
~$ nasm -f elf execute.asm
~$ ld -m elf_i386 execute.o -o execute
~$ ./execute
Hello World!

Note: Here are a couple other commands to try.

注意: 也有一些其他可以尝试的命令。

execute.asm
                            SECTION .data
                            command         db      '/bin/ls', 0h       ; command to execute
                            arg1            db      '-l', 0h
                            
execute.asm
                            SECTION .data
                            command         db      '/bin/sleep', 0h    ; command to execute
                            arg1            db      '5', 0h
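
The environment array works the same way as the argument array: a list of pointers to NAME=value strings, ended by a NULL. The following sketch passes one environment variable to the env program, which prints its environment when run without arguments. The file name, the variable name GREETING and the path /usr/bin/env are assumptions for illustration; adjust the path if env lives elsewhere on your system. The sketch also shows that execution only continues past the int 80h when SYS_EXECVE fails.

```nasm
; Execute with an environment variable (hypothetical example file: execute-env.asm)
; Compile with: nasm -f elf execute-env.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 execute-env.o -o execute-env
; Run with: ./execute-env

SECTION .data
command         db      '/usr/bin/env', 0h              ; assumed location of the env program
envvar          db      'GREETING=Hello World!', 0h     ; one NAME=value string
arguments       dd      command                         ; by convention the first argument is the program name
                dd      0h                              ; end of the argument array
environment     dd      envvar                          ; pointer to our single environment variable
                dd      0h                              ; end of the environment array

SECTION .text
global  _start

_start:
    mov     edx, environment    ; address of the environment variable array
    mov     ecx, arguments      ; address of the command line argument array
    mov     ebx, command        ; address of the file to execute
    mov     eax, 11             ; invoke SYS_EXECVE (kernel opcode 11)
    int     80h

    ; only reached when SYS_EXECVE failed; EAX then holds a negative error code
    mov     ebx, 1              ; report the failure with exit status 1
    mov     eax, 1              ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```

If the path is correct, the new process should print the single line GREETING=Hello World!, since that is the only variable in the environment we passed.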
                            

Lesson 20

Process Forking

进程分叉(子进程)

Firstly, some background
一些背景知识

In this lesson we will use SYS_FORK to create a new process that duplicates our current process. SYS_FORK takes no arguments - you just call fork and the new process is created. Both processes run concurrently. We can test the return value (in EAX) to tell whether we are currently in the parent or child process. In the parent process EAX holds the process ID of the child, a positive integer (a negative value would mean the fork failed). In the child process EAX is zero. This can be used to branch your logic between the parent and child.

本课中,我们将使用SYS_FORK这个系统调用来创建一个复制当前进程的新进程。SYS_FORK系统调用不需要任何参数,只需要调用它,就能直接创建出新的进程。这两个进程是并发执行的,我们可以通过存放在EAX寄存器里面的返回值来区分我们在父进程还是子进程里面。在父进程中,EAX是子进程的进程ID,是一个正整数(如果为负数则表示fork失败),而子进程的EAX是0。接下来可以以此通过分支来区分父进程和子进程。

In our program we exploit this fact to print out different messages in each process.

我们的程序将会揭示这一点并在不同的进程中输出不同的信息。

Note: Each process is responsible for safely exiting.

注意: 每一个进程都有责任安全地退出。

fork.asm
                            ; Fork
                            ; Compile with: nasm -f elf fork.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 fork.o -o fork
                            ; Run with: ./fork

                            %include        'functions.asm'

                            SECTION .data
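                            childMsg        db      'This is the child process', 0h     ; a message string
                            parentMsg       db      'This is the parent process', 0h    ; a message string

                            SECTION .text
                            global  _start

                            _start:

                                mov     eax, 2              ; invoke SYS_FORK (kernel opcode 2)
                                int     80h

                                cmp     eax, 0              ; if EAX is zero we are in the child process
                                jz      child               ; jump if zero

                            parent:
                                mov     eax, parentMsg      ; message printed in the parent process
                                call    sprintLF            ; print the string with a linefeed

                                call    quit                ; quit the parent process

                            child:
                                mov     eax, childMsg       ; message printed in the child process
                                call    sprintLF            ; print the string with a linefeed

                                call    quit                ; quit the child process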
                            
~$ nasm -f elf fork.asm
~$ ld -m elf_i386 fork.o -o fork
~$ ./fork
This is the parent process
This is the child process
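
Because both processes run concurrently, the order of the two messages is not guaranteed. The following sketch shows one way to make it predictable: the parent calls SYS_WAITPID (kernel opcode 7 on 32-bit x86) to wait for the child before printing. It also checks for a negative return value from SYS_FORK. The file name and the error message are hypothetical, and the code assumes sprintLF and quit from functions.asm behave as in the earlier lessons.

```nasm
; Fork and wait for the child (hypothetical example file: fork-wait.asm)
; Compile with: nasm -f elf fork-wait.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 fork-wait.o -o fork-wait
; Run with: ./fork-wait

%include        'functions.asm'

SECTION .data
childMsg        db      'This is the child process', 0h
parentMsg       db      'This is the parent process', 0h
errorMsg        db      'fork failed', 0h

SECTION .text
global  _start

_start:
    mov     eax, 2              ; invoke SYS_FORK (kernel opcode 2)
    int     80h

    cmp     eax, 0
    jl      .error              ; negative: the fork failed and there is no child
    je      .child              ; zero: we are in the child process

.parent:
    mov     ebx, eax            ; EAX holds the child's process ID
    mov     ecx, 0              ; NULL: we are not interested in the exit status
    mov     edx, 0              ; no options
    mov     eax, 7              ; invoke SYS_WAITPID (kernel opcode 7)
    int     80h                 ; returns once the child has exited

    mov     eax, parentMsg
    call    sprintLF
    call    quit

.child:
    mov     eax, childMsg
    call    sprintLF
    call    quit

.error:
    mov     eax, errorMsg
    call    sprintLF
    call    quit
```

With the wait in place the child's message should always appear before the parent's.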

Lesson 21

Telling the time

为您报时

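Generating a unix timestamp in NASM is easy with the SYS_TIME function of the Linux kernel. Simply pass OPCODE 13 to the kernel with EBX set to zero (SYS_TIME takes one optional argument, a pointer to a location where the result is also stored; zero means no pointer) and you are returned the Unix timestamp in the EAX register.

在NASM汇编中,你可以很轻易地使用Linux内核的SYS_TIME系统调用获取到一个Unix时间戳。只需要把EBX设为0(SYS_TIME有一个可选参数,是一个指针,指向同时用来存放结果的内存位置;0表示不传指针),再传递(在x86汇编中)编号为13的系统调用给内核,你就能在EAX中获取到一个Unix时间戳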

That is the number of seconds that have elapsed since January 1st 1970 UTC.

换句话说,就是从UTC(北京时间 减8小时) 1970年1月1日零点整到现在的秒数。

time.asm
                            ; Time
                            ; Compile with: nasm -f elf time.asm
                            ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 time.o -o time
                            ; Run with: ./time

                            %include        'functions.asm'

                            SECTION .data
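                            msg        db      'Seconds since Jan 01 1970: ', 0h     ; a message string

                            SECTION .text
                            global  _start

                            _start:

                                mov     eax, msg        ; move our message string into EAX for printing
                                call    sprint          ; call our string printing function

                                mov     ebx, 0          ; no pointer argument: the result is returned in EAX only
                                mov     eax, 13         ; invoke SYS_TIME (kernel opcode 13)
                                int     80h             ; call the kernel

                                call    iprintLF        ; print the integer in EAX with a linefeed
                                call    quit            ; exit the program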
                            
~$ nasm -f elf time.asm
~$ ld -m elf_i386 time.o -o time
~$ ./time
Seconds since Jan 01 1970: 1374995660
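
A timestamp in seconds is easy to turn into larger units with the DIV instruction from Lesson 15. The following sketch divides the timestamp by 86400 (the number of seconds in a day) to print the number of whole days since the epoch. The file name and message are hypothetical, and the code assumes sprint, iprintLF and quit from functions.asm behave as in the earlier lessons.

```nasm
; Days since the epoch (hypothetical example file: days.asm)
; Compile with: nasm -f elf days.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 days.o -o days
; Run with: ./days

%include        'functions.asm'

SECTION .data
msg        db      'Days since Jan 01 1970: ', 0h

SECTION .text
global  _start

_start:
    mov     eax, msg        ; print the message first
    call    sprint

    mov     ebx, 0          ; no pointer argument
    mov     eax, 13         ; invoke SYS_TIME (kernel opcode 13)
    int     80h             ; EAX = seconds since Jan 01 1970

    mov     edx, 0          ; clear the high half of the dividend EDX:EAX
    mov     ebx, 86400      ; 60 * 60 * 24 seconds in a day
    div     ebx             ; EAX = whole days, EDX = seconds left over in the current day

    call    iprintLF        ; print the number of days
    call    quit
```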

Lesson 22

File Handling - Create

文件句柄 - 创建

Firstly, some background
一些背景知识

File Handling in Linux is achieved through a small number of system calls related to creating, updating and deleting files. Most of these functions require a file descriptor which is a non-negative integer that identifies an open file within the calling process.

Linux中的文件处理是通过少量与文件创建、更新和删除相关的系统调用来实现的。这些函数大多需要一个文件描述符,它是一个非负整数,用来在调用进程内标识一个已打开的文件。

Writing our program
写程序

We begin the tutorial by creating a file using sys_creat. We will then build upon our program in each of the following file handling lessons, adding code as we go. Eventually we will have a full program that can create, update, open, close and delete files.

我们通过使用SYS_CREAT系统调用来创建文件。接下来关于文件处理的若干课程都是在本课程基础上新增代码实现的。最终我们要做出一个能够创建、更新、打开、关闭和删除文件的完整的应用程序。

sys_creat expects 2 arguments - the file permissions in ECX and the filename in EBX. The sys_creat opcode is then loaded into EAX and the kernel is called to create the file. The file descriptor of the created file is returned in EAX. This file descriptor can then be used for all other file handling functions.

SYS_CREAT系统调用需要两个参数,EBX存放文件名、ECX存放文件权限。将SYS_CREAT的操作数传入EAX调用内核触发系统调用,就可以创建文件了。被创建的文件的文件描述符会被作为返回值存放在EAX,接着可以用于其他那些操作文件的函数中。

create.asm
                                ; Create
                                ; Compile with: nasm -f elf create.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 create.o -o create
                                ; Run with: ./create

                                %include    'functions.asm'

                                SECTION .data
                                filename db 'readme.txt', 0h    ; the filename to create

                                SECTION .text
                                global  _start

                                _start:

                                    mov     ecx, 0777o          ; set all permissions to read, write and execute
                                    mov     ebx, filename       ; filename to create
                                    mov     eax, 8              ; invoke SYS_CREAT (kernel opcode 8)
                                    int     80h                 ; call the kernel

                                    call    quit                ; exit the program
                            
~$ nasm -f elf create.asm
~$ ld -m elf_i386 create.o -o create
~$ ./create

Note: The file 'readme.txt' will now have been created in the folder.

注意: 现在'readme.txt'就会被创建于程序的工作目录。
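
System calls report failure by returning a negative value in EAX, so the result of sys_creat can be checked before it is used as a file descriptor. The following is a minimal sketch with a hypothetical file name. It requests the more conservative permissions 0644o (read and write for the owner, read only for everyone else) and reports success or failure through its exit status so that it needs nothing from functions.asm.

```nasm
; Create with error check (hypothetical example file: create-check.asm)
; Compile with: nasm -f elf create-check.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 create-check.o -o create-check
; Run with: ./create-check ; echo $?

SECTION .data
filename db 'readme.txt', 0h    ; the filename to create

SECTION .text
global  _start

_start:
    mov     ecx, 0644o          ; owner may read and write, everyone else may read
    mov     ebx, filename       ; filename to create
    mov     eax, 8              ; invoke SYS_CREAT (kernel opcode 8)
    int     80h                 ; EAX = file descriptor, or a negative error code

    cmp     eax, 0
    jl      .failed             ; negative: the file could not be created

    mov     ebx, 0              ; exit status 0 for success
    jmp     .exit

.failed:
    mov     ebx, 1              ; exit status 1 for failure

.exit:
    mov     eax, 1              ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```

Running it in a directory where you are not allowed to create files should give exit status 1 instead of 0.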


Lesson 23

File Handling - Write

文件句柄 - 写入

Building upon the previous lesson we will now use sys_write to write content to a newly created file.

在上一课的基础上,我们现在将使用sys_write来向新创建的文件中写入内容。

sys_write expects 3 arguments - the number of bytes to write in EDX, the contents string to write in ECX and the file descriptor in EBX. The sys_write opcode is then loaded into EAX and the kernel is called to write the content to the file. In this lesson we will first call sys_creat to get a file descriptor which we will then load into EBX.

SYS_WRITE需要传入三个参数,分别EBX存放文件描述符,ECX存放要写入的字符串,EDX是要写入的字节数。将SYS_WRITE的操作数写入EAX再调用系统调用,就能够将内容写入到文件中。在本课程中,我们先调用SYS_CREAT获取到要写入的文件的描述符。

write.asm
                                ; Write
                                ; Compile with: nasm -f elf write.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 write.o -o write
                                ; Run with: ./write

                                %include    'functions.asm'

                                SECTION .data
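                                filename db 'readme.txt', 0h    ; the filename to create
                                contents db 'Hello world!', 0h  ; the contents to write

                                SECTION .text
                                global  _start

                                _start:

                                    mov     ecx, 0777o          ; code continues from the previous lesson
                                    mov     ebx, filename
                                    mov     eax, 8
                                    int     80h

                                    mov     edx, 12             ; number of bytes to write (translator's note: the earlier string length function could be reused here)
                                    mov     ecx, contents       ; address of the text to write
                                    mov     ebx, eax            ; after the call above EAX holds the new file's descriptor; pass it in EBX
                                    mov     eax, 4              ; invoke SYS_WRITE (kernel opcode 4)
                                    int     80h                 ; call the kernel

                                    call    quit                ; exit the program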
                            
~$ nasm -f elf write.asm
~$ ld -m elf_i386 write.o -o write
~$ ./write

Note: Open the newly created file 'readme.txt' in this folder and you will see the content 'Hello world!'.
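
The listing above hard-codes the byte count 12 in EDX, so the number has to be kept in sync with the string by hand. As a minimal sketch of one alternative, the assembler can compute the length itself using 'equ' and the '$' (current position) token. The file name write_len.asm and the label contentsLen are illustrative assumptions, and the program ends with an inline sys_exit rather than the tutorial's quit routine so that it stands alone.

```nasm
; write_len.asm (hypothetical file name)
; Compile with: nasm -f elf write_len.asm
; Link with:    ld -m elf_i386 write_len.o -o write_len

SECTION .data
filename    db  'readme.txt', 0h    ; NUL-terminated path for the kernel
contents    db  'Hello world!'      ; bytes to write (sys_write needs no terminator)
contentsLen equ $ - contents        ; assemble-time length of contents (12 here)

SECTION .text
global  _start

_start:
    mov     ecx, 0777o              ; permissions for the new file
    mov     ebx, filename
    mov     eax, 8                  ; sys_creat
    int     80h                     ; EAX = file descriptor (negative on error)

    mov     edx, contentsLen        ; byte count computed by the assembler
    mov     ecx, contents
    mov     ebx, eax                ; file descriptor returned by sys_creat
    mov     eax, 4                  ; sys_write
    int     80h

    mov     ebx, 0                  ; exit status 0
    mov     eax, 1                  ; sys_exit
    int     80h
```

Changing the string now changes the byte count automatically. A run-time length routine is still needed for strings whose size is only known while the program runs, such as user input.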


Lesson 24

File Handling - Open

Building on the previous lesson, we will now use sys_open to obtain the file descriptor of the newly created file. This file descriptor can then be passed to all other file handling functions.

sys_open expects 2 arguments: the filename in EBX and the access mode (table below) in ECX. The sys_open opcode is then loaded into EAX and the kernel is called to open the file; the file descriptor is returned in EAX.

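sys_open additionally accepts zero or more file creation flags and file status flags. These are bitwise ORed into ECX together with the access mode; EDX carries the permission bits (mode), which matter only when the call creates a file (for example with O_CREAT). See the open(2) manual page for more information about the access mode, file creation flags and file status flags.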

Flag        Description                         Value
O_RDONLY    open file in read only mode         0
O_WRONLY    open file in write only mode        1
O_RDWR      open file in read and write mode    2

Note: sys_open returns the file descriptor in EAX. On Linux this is a non-negative integer that is unique among the process's open files. We will print it using our integer printing function.

open.asm
                                ; Open
                                ; Compile with: nasm -f elf open.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 open.o -o open
                                ; Run with: ./open

                                %include    'functions.asm'

                                SECTION .data
                                filename db 'readme.txt', 0h    ; name of the file
                                contents db 'Hello world!', 0h  ; contents to write

                                SECTION .text
                                global  _start

                                _start:

                                    mov     ecx, 0777o          ; create file from lesson 22
                                    mov     ebx, filename
                                    mov     eax, 8
                                    int     80h

                                    mov     edx, 12             ; write contents to file from lesson 23
                                    mov     ecx, contents
                                    mov     ebx, eax
                                    mov     eax, 4
                                    int     80h

                                    mov     ecx, 0              ; flag for read only access mode (O_RDONLY)
                                    mov     ebx, filename       ; name of the file we created above
                                    mov     eax, 5              ; invoke SYS_OPEN (kernel opcode 5)
                                    int     80h                 ; call the kernel

                                    call    iprintLF            ; call our integer printing function
                                    call    quit                ; call our quit function
                            
~$ nasm -f elf open.asm
~$ ld -m elf_i386 open.o -o open
~$ ./open
4
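
The listing assumes that sys_open succeeds. When a system call made through int 80h fails, the kernel returns a negative error number in EAX instead of a descriptor, so a sign test is enough to tell the two cases apart. The following is a minimal sketch of that check; the file name open_check.asm, the messages and the label names are illustrative assumptions, and the program exits through an inline sys_exit so that it stands alone.

```nasm
; open_check.asm (hypothetical file name)
; Compile with: nasm -f elf open_check.asm
; Link with:    ld -m elf_i386 open_check.o -o open_check

SECTION .data
filename    db  'readme.txt', 0h
okMsg       db  'open succeeded', 0Ah
okLen       equ $ - okMsg
errMsg      db  'open failed', 0Ah
errLen      equ $ - errMsg

SECTION .text
global  _start

_start:
    mov     ecx, 0              ; O_RDONLY
    mov     ebx, filename
    mov     eax, 5              ; sys_open
    int     80h                 ; EAX = descriptor, or negative error number

    test    eax, eax            ; sets the sign flag when EAX is negative
    js      .failed

    mov     ecx, okMsg
    mov     edx, okLen
    xor     esi, esi            ; remember exit status 0
    jmp     .report

.failed:
    mov     ecx, errMsg
    mov     edx, errLen
    mov     esi, 1              ; remember exit status 1

.report:
    mov     ebx, 1              ; file descriptor 1 is standard output
    mov     eax, 4              ; sys_write
    int     80h

    mov     ebx, esi            ; exit status chosen above
    mov     eax, 1              ; sys_exit
    int     80h
```

Running it in a folder without 'readme.txt' should print 'open failed' and leave a non-zero exit status.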

Lesson 25

File Handling - Read

Building on the previous lessons, we will now use sys_read to read the content of a newly created and opened file. We will store the string we read in a variable.

sys_read expects 3 arguments: the file descriptor in EBX, the memory address of our variable in ECX and the number of bytes to read in EDX. We will use the previous lesson's sys_open code to obtain the file descriptor, which we then load into EBX. The sys_read opcode is then loaded into EAX and the kernel is called to read the file contents into our variable, which we then print to the screen.

Note: We will reserve 255 bytes in the .bss section to store the contents of the file. See Lesson 9 for more information on the .bss section.

read.asm
                                ; Read
                                ; Compile with: nasm -f elf read.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 read.o -o read
                                ; Run with: ./read

                                %include    'functions.asm'

                                SECTION .data
                                filename db 'readme.txt', 0h    ; name of the file
                                contents db 'Hello world!', 0h  ; contents to write

                                SECTION .bss
                                fileContents resb 255           ; variable to store file contents

                                SECTION .text
                                global  _start

                                _start:

                                    mov     ecx, 0777o          ; create file from lesson 22
                                    mov     ebx, filename
                                    mov     eax, 8
                                    int     80h

                                    mov     edx, 12             ; write contents to file from lesson 23
                                    mov     ecx, contents
                                    mov     ebx, eax
                                    mov     eax, 4
                                    int     80h

                                    mov     ecx, 0              ; open file from lesson 24
                                    mov     ebx, filename
                                    mov     eax, 5
                                    int     80h

                                    mov     edx, 12             ; number of bytes to read
                                    mov     ecx, fileContents   ; address of the variable that receives the data
                                    mov     ebx, eax            ; move the opened file descriptor into EBX
                                    mov     eax, 3              ; invoke SYS_READ (kernel opcode 3)
                                    int     80h                 ; call the kernel

                                    mov     eax, fileContents   ; point EAX at the string we just read
                                    call    sprintLF            ; call our string printing function

                                    call    quit                ; call our quit function
                            
~$ nasm -f elf read.asm
~$ ld -m elf_i386 read.o -o read
~$ ./read
Hello world!
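
The listing asks for exactly 12 bytes because it knows what it wrote. For a file of unknown size, sys_read reports in EAX how many bytes it actually delivered (0 at end of file, a negative error number on failure), and that count can be passed straight on to sys_write. The sketch below assumes an existing 'readme.txt' in the working directory; the file name read_count.asm and the label names are illustrative assumptions, and the program does not use functions.asm so that it stands alone.

```nasm
; read_count.asm (hypothetical file name)
; Compile with: nasm -f elf read_count.asm
; Link with:    ld -m elf_i386 read_count.o -o read_count

SECTION .data
filename     db  'readme.txt', 0h

SECTION .bss
fileContents resb 255               ; buffer for up to 255 bytes of the file

SECTION .text
global  _start

_start:
    mov     ecx, 0                  ; O_RDONLY
    mov     ebx, filename
    mov     eax, 5                  ; sys_open
    int     80h
    test    eax, eax
    js      .fail                   ; negative value: open failed

    mov     ebx, eax                ; file descriptor
    mov     edx, 255                ; read at most the size of the buffer
    mov     ecx, fileContents
    mov     eax, 3                  ; sys_read
    int     80h                     ; EAX = bytes read, 0 at end of file
    test    eax, eax
    js      .fail                   ; negative value: read failed

    mov     edx, eax                ; write exactly as many bytes as were read
    mov     ecx, fileContents
    mov     ebx, 1                  ; standard output
    mov     eax, 4                  ; sys_write
    int     80h

    xor     ebx, ebx                ; exit status 0
    jmp     .exit

.fail:
    mov     ebx, 1                  ; exit status 1

.exit:
    mov     eax, 1                  ; sys_exit
    int     80h
```

Because the byte count comes from the kernel, the buffer does not need to contain a terminating zero byte before it is printed.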

Lesson 26

File Handling - Close

Building on the previous lessons, we will now use sys_close to properly close an open file.

sys_close expects 1 argument: the file descriptor in EBX. We will use the previous lesson's code to obtain the file descriptor, which we then load into EBX. The sys_close opcode is then loaded into EAX and the kernel is called to close the file and remove the active file descriptor.

close.asm
                                ; Close
                                ; Compile with: nasm -f elf close.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 close.o -o close
                                ; Run with: ./close

                                %include    'functions.asm'

                                SECTION .data
                                filename db 'readme.txt', 0h    ; name of the file
                                contents db 'Hello world!', 0h  ; contents to write

                                SECTION .bss
                                fileContents resb 255           ; variable to store file contents

                                SECTION .text
                                global  _start

                                _start:

                                    mov     ecx, 0777o          ; create file from lesson 22
                                    mov     ebx, filename
                                    mov     eax, 8
                                    int     80h

                                    mov     edx, 12             ; write contents to file from lesson 23
                                    mov     ecx, contents
                                    mov     ebx, eax
                                    mov     eax, 4
                                    int     80h

                                    mov     ecx, 0              ; open file from lesson 24
                                    mov     ebx, filename
                                    mov     eax, 5
                                    int     80h

                                    mov     edx, 12             ; read file from lesson 25
                                    mov     ecx, fileContents
                                    mov     ebx, eax
                                    mov     eax, 3
                                    int     80h

                                    mov     eax, fileContents
                                    call    sprintLF

                                    mov     ebx, ebx            ; not needed, but shows that SYS_CLOSE takes the file descriptor in EBX
                                    mov     eax, 6              ; invoke SYS_CLOSE (kernel opcode 6)
                                    int     80h                 ; call the kernel

                                    call    quit                ; call our quit function
                            
~$ nasm -f elf close.asm
~$ ld -m elf_i386 close.o -o close
~$ ./close
Hello world!

Note: We have properly closed the file and removed the active file descriptor.
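
One way to observe that the descriptor really is gone is to close it a second time. The sketch below is an illustration under that assumption, not part of the original lesson: the second sys_close is expected to fail with EBADF (error number 9 on Linux), which the program turns into its exit status. The file name close_twice.asm is hypothetical.

```nasm
; close_twice.asm (hypothetical file name)
; Compile with: nasm -f elf close_twice.asm
; Link with:    ld -m elf_i386 close_twice.o -o close_twice

SECTION .data
filename    db  'readme.txt', 0h

SECTION .text
global  _start

_start:
    mov     ecx, 0              ; O_RDONLY
    mov     ebx, filename
    mov     eax, 5              ; sys_open
    int     80h

    mov     ebx, eax            ; file descriptor to close
    mov     eax, 6              ; sys_close
    int     80h                 ; first close releases the descriptor

    mov     eax, 6              ; EBX still holds the now stale descriptor
    int     80h                 ; second close fails with a negative error number

    neg     eax                 ; turn the negative error number into a positive one
    mov     ebx, eax            ; use it as the exit status
    mov     eax, 1              ; sys_exit
    int     80h
```

Running './close_twice; echo $?' should print 9, the value of EBADF ("bad file descriptor").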


Lesson 27

File Handling - Seek

In this lesson we will open a file, move to the end of it using sys_lseek and append new content there.

Using sys_lseek you can move the cursor within the file by an offset in bytes. The example below passes SEEK_END as the whence argument and 0 bytes as the offset, which places the cursor exactly at the end of the file (not beyond it), and then writes a string at that position. Try different values in ECX and EDX to write the content to different positions within the opened file.

sys_lseek expects 3 arguments: the file descriptor in EBX, the offset in bytes in ECX and the whence argument (table below) in EDX. The sys_lseek opcode is then loaded into EAX and we call the kernel to move the file pointer to the requested offset. We then use sys_write to update the content at that position.

Whence      Description             Value
SEEK_SET    beginning of the file   0
SEEK_CUR    current file offset     1
SEEK_END    end of the file         2

Note: A file 'readme.txt' has been included in the code folder for this lesson. This file will be updated after running the program.

seek.asm
                                ; Seek
                                ; Compile with: nasm -f elf seek.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 seek.o -o seek
                                ; Run with: ./seek

                                %include    'functions.asm'

                                SECTION .data
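                                filename db 'readme.txt', 0h    ; name of the file
                                contents db '-updated-', 0h     ; contents to write

                                SECTION .text
                                global  _start

                                _start:

                                    mov     ecx, 1              ; flag for write only access mode (O_WRONLY)
                                    mov     ebx, filename       ; name of the file to open
                                    mov     eax, 5              ; invoke SYS_OPEN (kernel opcode 5)
                                    int     80h                 ; call the kernel

                                    mov     edx, 2              ; whence argument (SEEK_END)
                                    mov     ecx, 0              ; move the cursor 0 bytes, i.e. stay exactly at the end of the file
                                    mov     ebx, eax            ; move the file descriptor returned by SYS_OPEN into EBX
                                    mov     eax, 19             ; invoke SYS_LSEEK (kernel opcode 19)
                                    int     80h                 ; call the kernel

                                    mov     edx, 9              ; number of bytes to write
                                    mov     ecx, contents       ; move the address of the text to write into ECX
                                    mov     ebx, ebx            ; not needed, but shows that SYS_WRITE takes the file descriptor in EBX
                                    mov     eax, 4              ; invoke SYS_WRITE (kernel opcode 4)
                                    int     80h                 ; call the kernel

                                    call    quit                ; call our quit function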
                            
~$ nasm -f elf seek.asm
~$ ld -m elf_i386 seek.o -o seek
~$ ./seek
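
As a sketch of the suggestion to try other values in ECX and EDX, the variant below seeks relative to the beginning of the file (SEEK_SET) with an offset of 6 bytes, so the write overwrites existing content instead of appending. The file name seek_set.asm and the label contentsLen are illustrative assumptions, and the program exits through an inline sys_exit so that it stands alone.

```nasm
; seek_set.asm (hypothetical file name)
; Compile with: nasm -f elf seek_set.asm
; Link with:    ld -m elf_i386 seek_set.o -o seek_set

SECTION .data
filename    db  'readme.txt', 0h
contents    db  '-updated-'
contentsLen equ $ - contents        ; 9 bytes

SECTION .text
global  _start

_start:
    mov     ecx, 1                  ; O_WRONLY
    mov     ebx, filename
    mov     eax, 5                  ; sys_open
    int     80h
    test    eax, eax
    js      .fail                   ; negative value: open failed

    mov     ebx, eax                ; file descriptor
    mov     edx, 0                  ; whence = SEEK_SET (beginning of the file)
    mov     ecx, 6                  ; offset of 6 bytes from the beginning
    mov     eax, 19                 ; sys_lseek
    int     80h                     ; EAX = resulting offset from the start of the file

    mov     edx, contentsLen
    mov     ecx, contents
    mov     eax, 4                  ; sys_write, EBX still holds the descriptor
    int     80h

    xor     ebx, ebx                ; exit status 0
    jmp     .exit

.fail:
    mov     ebx, 1                  ; exit status 1

.exit:
    mov     eax, 1                  ; sys_exit
    int     80h
```

If 'readme.txt' contains 'Hello world!' before the run, it should contain 'Hello -updated-' afterwards: the first 6 bytes are kept and the write continues past the old end of the file.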

Lesson 28

File Handling - Delete

Deleting a file on Linux is achieved by calling sys_unlink.

sys_unlink expects 1 argument: the filename in EBX. The sys_unlink opcode is then loaded into EAX and the kernel is called to delete the file.

Note: A file 'readme.txt' has been included in the code folder for this lesson. This file will be deleted after running the program.

unlink.asm
                                ; Unlink
                                ; Compile with: nasm -f elf unlink.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 unlink.o -o unlink
                                ; Run with: ./unlink

                                %include    'functions.asm'

                                SECTION .data
                                filename db 'readme.txt', 0h    ; name of the file to delete

                                SECTION .text
                                global  _start

                                _start:

                                    mov     ebx, filename       ; move the filename into EBX
                                    mov     eax, 10             ; invoke SYS_UNLINK (kernel opcode 10)
                                    int     80h                 ; call the kernel

                                    call    quit                ; call our quit function
                            
~$ nasm -f elf unlink.asm
~$ ld -m elf_i386 unlink.o -o unlink
~$ ./unlink
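
The program above prints nothing, so it is hard to tell whether the file was actually removed. As a minimal sketch, the result of sys_unlink (0 on success, a negative error number on failure) can be reported through the exit status. The file name unlink_check.asm is an illustrative assumption.

```nasm
; unlink_check.asm (hypothetical file name)
; Compile with: nasm -f elf unlink_check.asm
; Link with:    ld -m elf_i386 unlink_check.o -o unlink_check

SECTION .data
filename    db  'readme.txt', 0h    ; name of the file to delete

SECTION .text
global  _start

_start:
    mov     ebx, filename
    mov     eax, 10             ; sys_unlink
    int     80h                 ; EAX = 0 on success, negative error number on failure

    neg     eax                 ; make the error number positive (0 stays 0)
    mov     ebx, eax            ; use it as the exit status
    mov     eax, 1              ; sys_exit
    int     80h
```

Running './unlink_check; echo $?' should print 0 when the file was deleted and 2 (ENOENT, "no such file or directory") when there was no 'readme.txt' to delete.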

Lesson 29

Sockets - Create

Firstly, some background

Socket programming in Linux is achieved through the SYS_SOCKETCALL kernel function. SYS_SOCKETCALL is somewhat unique in that it encapsulates a number of different subroutines, all related to socket operations, within one function. By passing different integer values in EBX we can change the behaviour of this function to create, listen, send, receive, close and more. The full commented source code of the completed program is linked from the original tutorial.

Writing our program

We begin the tutorial by initializing some of the registers that we will use later to store important values. We will then create a socket using SYS_SOCKETCALL's first subroutine, which is called 'socket'. We will build upon this program in each of the following socket programming lessons, adding code as we go. Eventually we will have a full program that can create, bind, listen, accept, read, write and close sockets.

SYS_SOCKETCALL's subroutine 'socket' expects 2 arguments: the integer value 1 in EBX and a pointer to an array of arguments in ECX. The SYS_SOCKETCALL opcode is then loaded into EAX and the kernel is called to create the socket. Because everything in Linux is a file (translator's note: strictly speaking this is part of the Unix philosophy), we receive the file descriptor of the created socket back in EAX. This file descriptor can then be used for performing other socket programming functions.

Note: XORing a register with itself is an efficient way of ensuring the register is initialized with the integer value zero and doesn't contain an unexpected value that could corrupt your program.

socket.asm
                                ; Socket
                                ; Compile with: nasm -f elf socket.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 socket.o -o socket
                                ; Run with: ./socket

                                %include    'functions.asm'

                                SECTION .text
                                global  _start

                                _start:

                                    xor     eax, eax            ; initialize EAX to 0
                                    xor     ebx, ebx            ; initialize EBX to 0
                                    xor     edi, edi            ; initialize EDI to 0
                                    xor     esi, esi            ; initialize ESI to 0

                                _socket:

                                    push    byte 6              ; push 6 onto the stack (IPPROTO_TCP)
                                    push    byte 1              ; push 1 onto the stack (SOCK_STREAM)
                                    push    byte 2              ; push 2 onto the stack (PF_INET)
                                    mov     ecx, esp            ; move address of arguments into ecx (the array starts at the top of the stack)
                                    mov     ebx, 1              ; invoke subroutine SOCKET (1)
                                    mov     eax, 102            ; invoke SYS_SOCKETCALL (kernel opcode 102)
                                    int     80h                 ; call the kernel

                                    call    iprintLF            ; call our integer printing function to print the socket's file descriptor (a negative error number on failure)

                                _exit:

                                    call    quit                ; call our quit function
                            
~$ nasm -f elf socket.asm
~$ ld -m elf_i386 socket.o -o socket
~$ ./socket
3
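
The three pushes in the listing build the argument array on the stack. Each push stores 4 bytes and moves ESP downwards, so pushing 6, then 1, then 2 leaves the values 2, 1, 6 in ascending memory order starting at ESP. As a sketch of the same call with the array written out explicitly, the version below keeps the arguments in the .data section instead. The file name socket_args.asm and the label socketArgs are illustrative assumptions, and the program reports success or failure only through its exit status.

```nasm
; socket_args.asm (hypothetical file name)
; Compile with: nasm -f elf socket_args.asm
; Link with:    ld -m elf_i386 socket_args.o -o socket_args

SECTION .data
socketArgs  dd  2, 1, 6         ; PF_INET, SOCK_STREAM, IPPROTO_TCP in memory order

SECTION .text
global  _start

_start:
    mov     ecx, socketArgs     ; pointer to the argument array
    mov     ebx, 1              ; subroutine SOCKET (1)
    mov     eax, 102            ; SYS_SOCKETCALL
    int     80h                 ; EAX = file descriptor, or negative error number

    test    eax, eax
    js      .fail

    xor     ebx, ebx            ; exit status 0: socket created
    jmp     .exit

.fail:
    mov     ebx, 1              ; exit status 1: socket creation failed

.exit:
    mov     eax, 1              ; sys_exit
    int     80h
```

Building the array on the stack, as the lesson does, avoids reserving memory for values that are only needed for the duration of one call.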

Lesson 30

Sockets - Bind

Building on the previous lesson we will now associate the created socket with a local IP address and port which will allow us to connect to it. We do this by calling the second subroutine of SYS_SOCKETCALL which is called 'bind'.

We begin by storing the file descriptor we received in lesson 29 in EDI. EDI was originally called the Destination Index and is traditionally used in copy routines to hold the destination address.

SYS_SOCKETCALL's subroutine 'bind' expects 2 arguments - a pointer to an array of arguments in ECX and the integer value 2 in EBX. The SYS_SOCKETCALL opcode is then loaded into EAX and the kernel is called to bind the socket.

socket.asm
                                ; Socket
                                ; Compile with: nasm -f elf socket.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 socket.o -o socket
                                ; Run with: ./socket

                                %include    'functions.asm'

                                SECTION .text
                                global  _start

                                _start:

                                    xor     eax, eax            ; initialize some registers
                                    xor     ebx, ebx
                                    xor     edi, edi
                                    xor     esi, esi

                                _socket:

                                    push    byte 6              ; create socket from lesson 29
                                    push    byte 1
                                    push    byte 2
                                    mov     ecx, esp
                                    mov     ebx, 1
                                    mov     eax, 102
                                    int     80h

                                _bind:

                                    mov     edi, eax            ; move return value of SYS_SOCKETCALL into edi (file descriptor for new socket, or a negative error number on failure)
                                    push    dword 0x00000000    ; push 0 dec onto the stack IP ADDRESS (0.0.0.0)
                                    push    word 0x2923         ; push 9001 dec onto stack PORT (reverse byte order)
                                    push    word 2              ; push 2 dec onto stack AF_INET
                                    mov     ecx, esp            ; move address of the address structure just built on the stack into ecx
                                    push    byte 16             ; push 16 dec onto stack (length of the address structure)
                                    push    ecx                 ; push the address of arguments onto stack
                                    push    edi                 ; push the file descriptor onto stack
                                    mov     ecx, esp            ; move address of arguments into ecx
                                    mov     ebx, 2              ; invoke subroutine BIND (2)
                                    mov     eax, 102            ; invoke SYS_SOCKETCALL (kernel opcode 102)
                                    int     80h                 ; call the kernel

                                _exit:

                                    call    quit                ; call our quit function
                            
~$ nasm -f elf socket.asm
~$ ld -m elf_i386 socket.o -o socket
~$ ./socket
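
The constant 0x2923 in the listing is port 9001 (0x2329) with its two bytes swapped, because the port must be stored in network byte order (most significant byte first) while x86 stores words least significant byte first. As a sketch under stated assumptions, the version below lets the assembler do the swap and lays the address structure out in the .data section so that each field is visible. The file name bind_static.asm and the labels PORT, PORT_NBO, sockaddr, sockaddrLen, socketArgs and bindArgs are illustrative, not part of the original lesson.

```nasm
; bind_static.asm (hypothetical file name)
; Compile with: nasm -f elf bind_static.asm
; Link with:    ld -m elf_i386 bind_static.o -o bind_static

PORT        equ 9001
PORT_NBO    equ ((PORT & 0xFF) << 8) | (PORT >> 8)  ; bytes swapped: 0x2923

SECTION .data
socketArgs  dd  2, 1, 6         ; PF_INET, SOCK_STREAM, IPPROTO_TCP
sockaddr:
            dw  2               ; address family AF_INET
            dw  PORT_NBO        ; port 9001 in network byte order
            dd  0               ; IP address 0.0.0.0
            dd  0, 0            ; padding up to 16 bytes
sockaddrLen equ $ - sockaddr    ; 16

SECTION .bss
bindArgs    resd 3              ; descriptor, address pointer, address length

SECTION .text
global  _start

_start:
    mov     ecx, socketArgs
    mov     ebx, 1              ; subroutine SOCKET (1)
    mov     eax, 102            ; SYS_SOCKETCALL
    int     80h
    test    eax, eax
    js      .fail

    mov     [bindArgs], eax                 ; socket file descriptor
    mov     dword [bindArgs+4], sockaddr    ; pointer to the address structure
    mov     dword [bindArgs+8], sockaddrLen ; length of the address structure
    mov     ecx, bindArgs
    mov     ebx, 2              ; subroutine BIND (2)
    mov     eax, 102            ; SYS_SOCKETCALL
    int     80h
    test    eax, eax
    js      .fail               ; for example when the port is already in use

    xor     ebx, ebx            ; exit status 0: bound successfully
    jmp     .exit

.fail:
    mov     ebx, 1              ; exit status 1: socket or bind failed

.exit:
    mov     eax, 1              ; sys_exit
    int     80h
```

Changing PORT is then enough to bind to a different port; the byte swap follows automatically.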

Lesson 31

Sockets - Listen

In the previous lessons we created a socket and used the 'bind' subroutine to associate it with a local IP address and port. In this lesson we will use the 'listen' subroutine of SYS_SOCKETCALL to tell our socket to listen for incoming TCP requests. This will allow us to read and write to anyone who connects to our socket.

SYS_SOCKETCALL's subroutine 'listen' expects 2 arguments - a pointer to an array of arguments in ECX and the integer value 4 in EBX. The SYS_SOCKETCALL opcode is then loaded into EAX and the kernel is called. If successful the socket will begin listening for incoming requests.

socket.asm
                                ; Socket
                                ; Compile with: nasm -f elf socket.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 socket.o -o socket
                                ; Run with: ./socket

                                %include    'functions.asm'

                                SECTION .text
                                global  _start

                                _start:

                                    xor     eax, eax            ; initialize some registers
                                    xor     ebx, ebx
                                    xor     edi, edi
                                    xor     esi, esi

                                _socket:

                                    push    byte 6              ; create socket from lesson 29
                                    push    byte 1
                                    push    byte 2
                                    mov     ecx, esp
                                    mov     ebx, 1
                                    mov     eax, 102
                                    int     80h

                                _bind:

                                    mov     edi, eax            ; bind socket from lesson 30
                                    push    dword 0x00000000
                                    push    word 0x2923
                                    push    word 2
                                    mov     ecx, esp
                                    push    byte 16
                                    push    ecx
                                    push    edi
                                    mov     ecx, esp
                                    mov     ebx, 2
                                    mov     eax, 102
                                    int     80h

                                _listen:

                                    push    byte 1              ; push 1 onto stack (max queue length argument)
                                    push    edi                 ; push the file descriptor onto stack
                                    mov     ecx, esp            ; move address of arguments into ecx
                                    mov     ebx, 4              ; invoke subroutine LISTEN (4)
                                    mov     eax, 102            ; invoke SYS_SOCKETCALL (kernel opcode 102)
                                    int     80h                 ; call the kernel

                                _exit:

                                    call    quit                ; call our quit function
                            
~$ nasm -f elf socket.asm
~$ ld -m elf_i386 socket.o -o socket
~$ ./socket

Lesson 32

Sockets - Accept

In the previous lessons we created a socket and used the 'bind' subroutine to associate it with a local IP address and port. We then used the 'listen' subroutine of SYS_SOCKETCALL to tell our socket to listen for incoming TCP requests. Now we will use the 'accept' subroutine of SYS_SOCKETCALL to tell our socket to accept those incoming requests. Our socket will then be ready to read and write to remote connections.

SYS_SOCKETCALL's subroutine 'accept' expects 2 arguments - a pointer to an array of arguments in ECX and the integer value 5 in EBX. The SYS_SOCKETCALL opcode is then loaded into EAX and the kernel is called. The 'accept' subroutine will create another file descriptor, this time identifying the incoming socket connection. We will use this file descriptor to read and write to the incoming connection in later lessons.

Note: Run the program and use the command sudo netstat -plnt in another terminal to view the socket listening on port 9001.

socket.asm
                                ; Socket
                                ; Compile with: nasm -f elf socket.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 socket.o -o socket
                                ; Run with: ./socket

                                %include    'functions.asm'

                                SECTION .text
                                global  _start

                                _start:

                                    xor     eax, eax            ; initialize some registers
                                    xor     ebx, ebx
                                    xor     edi, edi
                                    xor     esi, esi

                                _socket:

                                    push    byte 6              ; create socket from lesson 29
                                    push    byte 1
                                    push    byte 2
                                    mov     ecx, esp
                                    mov     ebx, 1
                                    mov     eax, 102
                                    int     80h

                                _bind:

                                    mov     edi, eax            ; bind socket from lesson 30
                                    push    dword 0x00000000
                                    push    word 0x2923
                                    push    word 2
                                    mov     ecx, esp
                                    push    byte 16
                                    push    ecx
                                    push    edi
                                    mov     ecx, esp
                                    mov     ebx, 2
                                    mov     eax, 102
                                    int     80h

                                _listen:

                                    push    byte 1              ; listen socket from lesson 31
                                    push    edi
                                    mov     ecx, esp
                                    mov     ebx, 4
                                    mov     eax, 102
                                    int     80h

                                _accept:

                                    push    byte 0              ; push 0 dec onto stack (address length argument)
                                    push    byte 0              ; push 0 dec onto stack (address argument)
                                    push    edi                 ; push the file descriptor onto stack
                                    mov     ecx, esp            ; move address of arguments into ecx
                                    mov     ebx, 5              ; invoke subroutine ACCEPT (5)
                                    mov     eax, 102            ; invoke SYS_SOCKETCALL (kernel opcode 102)
                                    int     80h                 ; call the kernel

                                _exit:

                                    call    quit                ; call our quit function

                            
~$ nasm -f elf socket.asm
~$ ld -m elf_i386 socket.o -o socket
~$ ./socket

Lesson 33

Sockets - Read

When an incoming connection is accepted by our socket, a new file descriptor identifying the incoming socket connection is returned in EAX. In this lesson we will use this file descriptor to read the incoming request headers from the connection.

We begin by storing the file descriptor we received in lesson 32 in ESI. ESI was originally called the Source Index and is traditionally used in copy routines to hold the source address.

We will use the kernel function sys_read to read from the incoming socket connection. As we have done in previous lessons, we will create a variable to store the contents being read from the file descriptor. Our socket will be using the HTTP protocol to communicate. Parsing HTTP request headers to determine the length of the incoming message and accepted response formats is beyond the scope of this tutorial. We will instead just read up to the first 255 bytes and print them to standard output.

Once the incoming connection has been accepted, it is very common for webservers to spawn a child process to manage the read/write communication. The parent process is then free to return to the listening/accept state and accept any new incoming requests in parallel. We will implement this design pattern below using SYS_FORK and the JMP instruction prior to reading the request headers in the child process.

To generate valid request headers we will use the command line tool curl to connect to our listening socket. You can also use a standard web browser to connect in the same way.

sys_read expects 3 arguments - the number of bytes to read in EDX, the memory address of our variable in ECX and the file descriptor in EBX. The sys_read opcode is then loaded into EAX and the kernel is called to read the contents into our variable which is then printed to the screen.

Note: We will reserve 255 bytes in the .bss section to store the contents being read from the file descriptor. See Lesson 9 for more information on the .bss section.

Note: Run the program and use the command curl http://localhost:9001 in another terminal to view the request headers being read by our program.

socket.asm
                                ; Socket
                                ; Compile with: nasm -f elf socket.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 socket.o -o socket
                                ; Run with: ./socket

                                %include    'functions.asm'

                                SECTION .bss
                                buffer resb 255                 ; variable to store request headers

                                SECTION .text
                                global  _start

                                _start:

                                    xor     eax, eax            ; initialize some registers
                                    xor     ebx, ebx
                                    xor     edi, edi
                                    xor     esi, esi

                                _socket:

                                    push    byte 6              ; create socket from lesson 29
                                    push    byte 1
                                    push    byte 2
                                    mov     ecx, esp
                                    mov     ebx, 1
                                    mov     eax, 102
                                    int     80h

                                _bind:

                                    mov     edi, eax            ; bind socket from lesson 30
                                    push    dword 0x00000000
                                    push    word 0x2923
                                    push    word 2
                                    mov     ecx, esp
                                    push    byte 16
                                    push    ecx
                                    push    edi
                                    mov     ecx, esp
                                    mov     ebx, 2
                                    mov     eax, 102
                                    int     80h

                                _listen:

                                    push    byte 1              ; listen socket from lesson 31
                                    push    edi
                                    mov     ecx, esp
                                    mov     ebx, 4
                                    mov     eax, 102
                                    int     80h

                                _accept:

                                    push    byte 0              ; accept socket from lesson 32
                                    push    byte 0
                                    push    edi
                                    mov     ecx, esp
                                    mov     ebx, 5
                                    mov     eax, 102
                                    int     80h

                                _fork:

                                    mov     esi, eax            ; move return value of SYS_SOCKETCALL into esi (file descriptor for accepted socket, or -1 on error)
                                    mov     eax, 2              ; invoke SYS_FORK (kernel opcode 2)
                                    int     80h                 ; call the kernel

                                    cmp     eax, 0              ; if return value of SYS_FORK in eax is zero we are in the child process
                                    jz      _read               ; jmp in child process to _read

                                    jmp     _accept             ; jmp in parent process to _accept

                                _read:

                                    mov     edx, 255            ; number of bytes to read (we will only read the first 255 bytes for simplicity)
                                    mov     ecx, buffer         ; move the memory address of our buffer variable into ecx
                                    mov     ebx, esi            ; move esi into ebx (accepted socket file descriptor)
                                    mov     eax, 3              ; invoke SYS_READ (kernel opcode 3)
                                    int     80h                 ; call the kernel

                                    mov     eax, buffer         ; move the memory address of our buffer variable into eax for printing
                                    call    sprintLF            ; call our string printing function

                                _exit:

                                    call    quit                ; call our quit function
                            
~$ nasm -f elf socket.asm
~$ ld -m elf_i386 socket.o -o socket
~$ ./socket
GET / HTTP/1.1
Host: localhost:9001
User-Agent: curl/x.xx.x
Accept: */*

Lesson 34

Sockets - Write

When an incoming connection is accepted by our socket, a new file descriptor identifying the incoming socket connection is returned in EAX. In this lesson we will use this file descriptor to send our response to the connection.

We will use the kernel function sys_write to write to the incoming socket connection. As our socket will be communicating using the HTTP protocol, we will need to send some compulsory headers in order to allow HTTP-speaking clients to connect. We will send these following the formatting rules set out in the HTTP RFC standard.

sys_write expects 3 arguments - the number of bytes to write in EDX, the memory address of the response string in ECX and the file descriptor in EBX. The sys_write opcode is then loaded into EAX and the kernel is called to send our response back through our socket to the incoming connection.
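The byte count passed in EDX does not have to be counted by hand. As a minimal sketch, the assembler can compute it with equ and the current-position symbol $. The file name write-demo.asm and the name responseLen are hypothetical, and the sketch writes to standard output (file descriptor 1) instead of an accepted socket so that it can be run on its own.

```nasm
; Write demo (hypothetical file name: write-demo.asm)
; Compile with: nasm -f elf write-demo.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 write-demo.o -o write-demo
; Run with: ./write-demo

SECTION .data
; the same response text as in this lesson, without the terminating zero byte
response    db      'HTTP/1.1 200 OK', 0Dh, 0Ah, 'Content-Type: text/html', 0Dh, 0Ah, 'Content-Length: 14', 0Dh, 0Ah, 0Dh, 0Ah, 'Hello World!', 0Dh, 0Ah
responseLen equ     $ - response    ; length in bytes, computed by the assembler (78 here)

SECTION .text
global  _start

_start:

    mov     edx, responseLen    ; number of bytes to write
    mov     ecx, response       ; memory address of our response variable
    mov     ebx, 1              ; file descriptor to write to (STDOUT in this sketch)
    mov     eax, 4              ; invoke SYS_WRITE (kernel opcode 4)
    int     80h                 ; call the kernel

    mov     ebx, 0              ; exit status 0
    mov     eax, 1              ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```

Because equ is evaluated at assembly time, the length stays correct if the response text is edited later. The Content-Length header still has to be kept in step with the body by hand: it counts only the 14 bytes after the empty line ('Hello World!' plus the final carriage return and linefeed).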

Note: We will create a variable in the .data section to store the response we will write to the file descriptor. See Lesson 1 for more information on the .data section.

Note: Run the program and use the command curl http://localhost:9001 in another terminal to view the response sent via our socket. Or connect to the same address using any standard web browser.

socket.asm
                                ; Socket
                                ; Compile with: nasm -f elf socket.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 socket.o -o socket
                                ; Run with: ./socket

                                %include    'functions.asm'

                                SECTION .data
                                ; our response string
                                response db 'HTTP/1.1 200 OK', 0Dh, 0Ah, 'Content-Type: text/html', 0Dh, 0Ah, 'Content-Length: 14', 0Dh, 0Ah, 0Dh, 0Ah, 'Hello World!', 0Dh, 0Ah, 0h

                                SECTION .bss
                                buffer resb 255                 ; variable to store request headers

                                SECTION .text
                                global  _start

                                _start:

                                    xor     eax, eax            ; initialize some registers
                                    xor     ebx, ebx
                                    xor     edi, edi
                                    xor     esi, esi

                                _socket:

                                    push    byte 6              ; create socket from lesson 29
                                    push    byte 1
                                    push    byte 2
                                    mov     ecx, esp
                                    mov     ebx, 1
                                    mov     eax, 102
                                    int     80h

                                _bind:

                                    mov     edi, eax            ; bind socket from lesson 30
                                    push    dword 0x00000000
                                    push    word 0x2923
                                    push    word 2
                                    mov     ecx, esp
                                    push    byte 16
                                    push    ecx
                                    push    edi
                                    mov     ecx, esp
                                    mov     ebx, 2
                                    mov     eax, 102
                                    int     80h

                                _listen:

                                    push    byte 1              ; listen socket from lesson 31
                                    push    edi
                                    mov     ecx, esp
                                    mov     ebx, 4
                                    mov     eax, 102
                                    int     80h

                                _accept:

                                    push    byte 0              ; accept socket from lesson 32
                                    push    byte 0
                                    push    edi
                                    mov     ecx, esp
                                    mov     ebx, 5
                                    mov     eax, 102
                                    int     80h

                                _fork:

                                    mov     esi, eax            ; fork socket from lesson 33
                                    mov     eax, 2
                                    int     80h

                                    cmp     eax, 0
                                    jz      _read

                                    jmp     _accept

                                _read:

                                    mov     edx, 255            ; read socket from lesson 33
                                    mov     ecx, buffer
                                    mov     ebx, esi
                                    mov     eax, 3
                                    int     80h

                                    mov     eax, buffer
                                    call    sprintLF

                                _write:

                                    mov     edx, 78             ; move 78 dec into edx (length in bytes to write)
                                    mov     ecx, response       ; move address of our response variable into ecx
                                    mov     ebx, esi            ; move file descriptor into ebx (accepted socket id)
                                    mov     eax, 4              ; invoke SYS_WRITE (kernel opcode 4)
                                    int     80h                 ; call the kernel

                                _exit:

                                    call    quit                ; call our quit function
                            
~$ nasm -f elf socket.asm
~$ ld -m elf_i386 socket.o -o socket
~$ ./socket

New terminal window
~$ curl http://localhost:9001
Hello World!

Lesson 35

Sockets - Close

In this lesson we will use sys_close to properly close the active socket connection in the child process after our response has been sent. This will free up some resources that can be used to accept new incoming connections.

sys_close expects 1 argument - the file descriptor in EBX. The sys_close opcode is then loaded into EAX and the kernel is called to close the socket and remove the active file descriptor.
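One way to see what closing a file descriptor does is to close one and then try to use it. The following is a minimal sketch, not part of the lesson's program: it closes standard input (file descriptor 0) rather than a socket, the file name close-demo.asm and the message are made up, and it uses raw kernel calls so it can be built without functions.asm.

```nasm
; Close demo (hypothetical file name: close-demo.asm)
; Compile with: nasm -f elf close-demo.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 close-demo.o -o close-demo
; Run with: ./close-demo

SECTION .data
msg         db      'read failed: descriptor 0 is closed', 0Ah
msgLen      equ     $ - msg

SECTION .bss
buffer      resb    1           ; variable for the read attempt

SECTION .text
global  _start

_start:

    mov     ebx, 0              ; file descriptor to close (STDIN in this sketch)
    mov     eax, 6              ; invoke SYS_CLOSE (kernel opcode 6)
    int     80h                 ; call the kernel

    mov     edx, 1              ; try to read 1 byte
    mov     ecx, buffer
    mov     ebx, 0              ; from the descriptor we have just closed
    mov     eax, 3              ; invoke SYS_READ (kernel opcode 3)
    int     80h

    cmp     eax, 0              ; a negative return value is an error code
    jge     _exit               ; not expected here: the descriptor no longer exists

    mov     edx, msgLen         ; report the failed read
    mov     ecx, msg
    mov     ebx, 1              ; write to STDOUT
    mov     eax, 4              ; invoke SYS_WRITE (kernel opcode 4)
    int     80h

_exit:

    mov     ebx, 0              ; exit status 0
    mov     eax, 1              ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```

After the close, the kernel no longer knows descriptor 0 in this process, so the read returns a negative error value and the message is printed. The descriptor number is then free to be handed out again by a later call that creates a new descriptor.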

Note: Run the program and use the command curl http://localhost:9001 in another terminal or connect to the same address using any standard web browser.

socket.asm
                                ; Socket
                                ; Compile with: nasm -f elf socket.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 socket.o -o socket
                                ; Run with: ./socket

                                %include    'functions.asm'

                                SECTION .data
                                ; our response string
                                response db 'HTTP/1.1 200 OK', 0Dh, 0Ah, 'Content-Type: text/html', 0Dh, 0Ah, 'Content-Length: 14', 0Dh, 0Ah, 0Dh, 0Ah, 'Hello World!', 0Dh, 0Ah, 0h

                                SECTION .bss
                                buffer resb 255                 ; variable to store request headers

                                SECTION .text
                                global  _start

                                _start:

                                    xor     eax, eax            ; initialize some registers
                                    xor     ebx, ebx
                                    xor     edi, edi
                                    xor     esi, esi

                                _socket:

                                    push    byte 6              ; create socket from lesson 29
                                    push    byte 1
                                    push    byte 2
                                    mov     ecx, esp
                                    mov     ebx, 1
                                    mov     eax, 102
                                    int     80h

                                _bind:

                                    mov     edi, eax            ; bind socket from lesson 30
                                    push    dword 0x00000000
                                    push    word 0x2923
                                    push    word 2
                                    mov     ecx, esp
                                    push    byte 16
                                    push    ecx
                                    push    edi
                                    mov     ecx, esp
                                    mov     ebx, 2
                                    mov     eax, 102
                                    int     80h

                                _listen:

                                    push    byte 1              ; listen socket from lesson 31
                                    push    edi
                                    mov     ecx, esp
                                    mov     ebx, 4
                                    mov     eax, 102
                                    int     80h

                                _accept:

                                    push    byte 0              ; accept socket from lesson 32
                                    push    byte 0
                                    push    edi
                                    mov     ecx, esp
                                    mov     ebx, 5
                                    mov     eax, 102
                                    int     80h

                                _fork:

                                    mov     esi, eax            ; fork socket from lesson 33
                                    mov     eax, 2
                                    int     80h

                                    cmp     eax, 0
                                    jz      _read

                                    jmp     _accept

                                _read:

                                    mov     edx, 255            ; read socket from lesson 33
                                    mov     ecx, buffer
                                    mov     ebx, esi
                                    mov     eax, 3
                                    int     80h

                                    mov     eax, buffer
                                    call    sprintLF

                                _write:

                                    mov     edx, 78             ; write socket from lesson 34
                                    mov     ecx, response
                                    mov     ebx, esi
                                    mov     eax, 4
                                    int     80h

                                _close:

                                    mov     ebx, esi            ; move esi into ebx (accepted socket file descriptor)
                                    mov     eax, 6              ; invoke SYS_CLOSE (kernel opcode 6)
                                    int     80h                 ; call the kernel

                                _exit:

                                    call    quit                ; call our quit function
                            
~$ nasm -f elf socket.asm
~$ ld -m elf_i386 socket.o -o socket
~$ ./socket

New terminal window
~$ curl http://localhost:9001
Hello World!

Note: We have properly closed the socket connections and removed their active file descriptors.

Lesson 36

Download a Webpage

In the previous lessons we have been learning how to use the many subroutines of the SYS_SOCKETCALL kernel function to create, manage and transfer data through Linux sockets. We will continue that theme in this lesson by using the 'connect' subroutine of SYS_SOCKETCALL to connect to a remote webserver and download a webpage.

These are the steps we need to follow to connect a socket to a remote server:

  • Call SYS_SOCKETCALL's subroutine 'socket' to create an active socket that we will use to send outbound requests.
  • Call SYS_SOCKETCALL's subroutine 'connect' to connect our socket with a socket on the remote webserver.
  • Use SYS_WRITE to send an HTTP formatted request through our socket to the remote webserver.
  • Use SYS_READ to receive the HTTP formatted response from the webserver.
We will then use our string printing function to print the response to our terminal.

What is an HTTP Request

The HTTP specification has evolved through a number of standard versions including 1.0 in RFC1945, 1.1 in RFC2068 and 2.0 in RFC7540. Version 1.1 is still the most common today.

An HTTP/1.1 request is made up of 3 sections:

  1. A line containing the request method, request URL and HTTP version
  2. A section of request headers (most are optional, but HTTP/1.1 requires the Host header)
  3. An empty line that tells the remote server you have finished sending the request and will begin waiting for the response.

A typical HTTP request for the root document on this server would look like this:

                                GET / HTTP/1.1                  ; A line containing the request method, url and version
                                Host: asmtutor.com              ; A section of request headers
                                                                ; A required empty line
                            
Writing our program

This tutorial starts out like the previous ones by calling SYS_SOCKETCALL's subroutine 'socket' to initially create our socket. However, instead of calling 'bind' on this socket we will call 'connect' with an IP Address and Port Number to connect our socket to a remote webserver. We will then use the SYS_WRITE and SYS_READ kernel methods to transfer data between the two sockets by sending an HTTP request and reading the HTTP response.

SYS_SOCKETCALL's subroutine 'connect' expects 2 arguments - a pointer to an array of arguments in ECX and the integer value 3 in EBX. The SYS_SOCKETCALL opcode is then loaded into EAX and the kernel is called to connect to the socket.
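The array of arguments that ECX points to holds the socket file descriptor, a pointer to an address structure and the length of that structure. As a minimal sketch, the address structure can be written out byte by byte in the .data section, which makes the byte order easier to see than when it is pushed onto the stack. The file name connect-demo.asm and the names sockaddr and sockaddrLen are hypothetical, and exiting with status 1 when connect returns a negative value is an assumed way of reporting failure.

```nasm
; Connect demo (hypothetical file name: connect-demo.asm)
; Compile with: nasm -f elf connect-demo.asm
; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 connect-demo.o -o connect-demo
; Run with: ./connect-demo ; echo $?

SECTION .data
sockaddr:
            dw      2                   ; address family AF_INET
            db      0, 80               ; port 80, high byte first (the same two bytes as push word 0x5000)
            db      139, 162, 39, 66    ; IP address 139.162.39.66, first octet first
            times 8 db 0                ; padding up to 16 bytes
sockaddrLen equ     $ - sockaddr        ; 16

SECTION .text
global  _start

_start:

    push    byte 6              ; IPPROTO_TCP
    push    byte 1              ; SOCK_STREAM
    push    byte 2              ; PF_INET
    mov     ecx, esp            ; move address of arguments into ecx
    mov     ebx, 1              ; invoke subroutine SOCKET (1)
    mov     eax, 102            ; invoke SYS_SOCKETCALL (kernel opcode 102)
    int     80h                 ; call the kernel
    mov     edi, eax            ; keep the socket file descriptor in edi

    push    dword sockaddrLen   ; third argument: length of the address structure
    push    dword sockaddr      ; second argument: pointer to the address structure
    push    edi                 ; first argument: socket file descriptor
    mov     ecx, esp            ; move address of arguments into ecx
    mov     ebx, 3              ; invoke subroutine CONNECT (3)
    mov     eax, 102            ; invoke SYS_SOCKETCALL (kernel opcode 102)
    int     80h                 ; call the kernel

    xor     ebx, ebx            ; assume success: exit status 0
    cmp     eax, 0              ; connect returns zero on success
    jge     _exit
    mov     ebx, 1              ; negative return value: exit status 1

_exit:

    mov     eax, 1              ; invoke SYS_EXIT (kernel opcode 1)
    int     80h
```

The exit status printed by echo $? depends on whether the remote server can be reached from your machine, so either value may appear.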

Note: In Linux we can use the following command ./crawler > index.html to save the output of our program to a file instead.

crawler.asm
                                ; Crawler
                                ; Compile with: nasm -f elf crawler.asm
                                ; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 crawler.o -o crawler
                                ; Run with: ./crawler

                                %include    'functions.asm'

                                SECTION .data
                                ; our request string
                                request db 'GET / HTTP/1.1', 0Dh, 0Ah, 'Host: 139.162.39.66:80', 0Dh, 0Ah, 0Dh, 0Ah, 0h

                                SECTION .bss
                                buffer resb 2                   ; variable to store response (1 byte of data plus a zero byte so sprint knows where to stop)

                                SECTION .text
                                global  _start

                                _start:

                                    xor     eax, eax            ; init eax 0
                                    xor     ebx, ebx            ; init ebx 0
                                    xor     edi, edi            ; init edi 0

                                _socket:

                                    push    byte 6              ; push 6 onto the stack (IPPROTO_TCP)
                                    push    byte 1              ; push 1 onto the stack (SOCK_STREAM)
                                    push    byte 2              ; push 2 onto the stack (PF_INET)
                                    mov     ecx, esp            ; move address of arguments into ecx
                                    mov     ebx, 1              ; invoke subroutine SOCKET (1)
                                    mov     eax, 102            ; invoke SYS_SOCKETCALL (kernel opcode 102)
                                    int     80h                 ; call the kernel

                                _connect:

                                    mov     edi, eax            ; move return value of SYS_SOCKETCALL into edi (file descriptor for new socket, or -1 on error)
                                    push    dword 0x4227a28b    ; push 139.162.39.66 onto the stack IP ADDRESS (reverse byte order)
                                    push    word 0x5000         ; push 80 onto stack PORT (reverse byte order)
                                    push    word 2              ; push 2 dec onto stack AF_INET
                                    mov     ecx, esp            ; move address of stack pointer into ecx
                                    push    byte 16             ; push 16 dec onto stack (arguments length)
                                    push    ecx                 ; push the address of arguments onto stack
                                    push    edi                 ; push the file descriptor onto stack
                                    mov     ecx, esp            ; move address of arguments into ecx
                                    mov     ebx, 3              ; invoke subroutine CONNECT (3)
                                    mov     eax, 102            ; invoke SYS_SOCKETCALL (kernel opcode 102)
                                    int     80h                 ; call the kernel

                                _write:

                                    mov     edx, 42             ; move 42 dec into edx (length in bytes to write, excluding the terminating 0h)
                                    mov     ecx, request        ; move address of our request variable into ecx
                                    mov     ebx, edi            ; move file descriptor into ebx (created socket file descriptor)
                                    mov     eax, 4              ; invoke SYS_WRITE (kernel opcode 4)
                                    int     80h                 ; call the kernel

                                _read:

                                    mov     edx, 1              ; number of bytes to read (we will read 1 byte at a time)
                                    mov     ecx, buffer         ; move the memory address of our buffer variable into ecx
                                    mov     ebx, edi            ; move edi into ebx (created socket file descriptor)
                                    mov     eax, 3              ; invoke SYS_READ (kernel opcode 3)
                                    int     80h                 ; call the kernel

                                    cmp     eax, 0              ; if return value of SYS_READ in eax is zero, we have reached the end of the file
                                    jz      _close              ; jmp to _close if we have reached the end of the file (zero flag set)

                                    mov     eax, buffer         ; move the memory address of our buffer variable into eax for printing
                                    call    sprint              ; call our string printing function
                                    jmp     _read               ; jmp to _read

                                _close:

                                    mov     ebx, edi            ; move edi into ebx (connected socket file descriptor)
                                    mov     eax, 6              ; invoke SYS_CLOSE (kernel opcode 6)
                                    int     80h                 ; call the kernel

                                _exit:

                                    call    quit                ; call our quit function
                            
~$ nasm -f elf crawler.asm
~$ ld -m elf_i386 crawler.o -o crawler
~$ ./crawler
HTTP/1.1 200 OK
Content-Type: text/html

<!DOCTYPE html>
<html lang="en">
...
</html>