If Ever I See You Again Shelley Hack
The Basics of Hacking [An Introduction]
Today we will begin our journey into the basics of hacking. Let's not waste any time.
Here is the source code of a simple C program:
If we compile and execute the code it prints "Hello World" ten times:
It is worth emphasizing that C source code needs to be compiled into a binary file before it can be executed and do anything. By default, the binary file will be named "a.out". However, we can specify a name for the binary file by writing "-o [name]" as we did above.
If we were to simply open the binary file "first" with the Linux utility "less" we would see the following:
This is only a snippet of a long machine language file.
This file contains machine language, an elementary language that can be understood directly by the CPU. The role of the compiler is to translate the C source code into machine language that your processor architecture (the target architecture) can understand. In this article we are using an x86 processor architecture (specifically 64-bit x86-64, as the register names used later show).
Let's take a closer look at our compiled binary file "first" using the Linux utility "objdump": objdump -D -M intel first | grep -A20 main.:
Objdump converts machine language instructions into human-readable assembly instructions. Just as machine language instructions are unique to a given processor architecture, assembly language instructions are likewise unique to that architecture. The "-D" flag tells objdump to disassemble all sections. The "-M intel" flag tells objdump to use Intel syntax for the assembly language. Intel syntax (as opposed to AT&T syntax) does not contain the cacophony of % and $ signs, and I find it easier to read. We can also make Intel syntax the default in the gdb debugger by adding "set disassembly-flavor intel" to the .gdbinit file in our home directory.
The long number next to "<main>" is a memory address denoting where this function lives in the computer's memory. The column of numbers, each beginning with "4", are memory addresses which show us the location of each machine language instruction. The various numbers in the middle column (55 is on top) are machine language instructions that only the CPU can understand. The rightmost column contains assembly language instructions. Assembly is merely a way for programmers to represent the machine language instructions. Typically, Intel syntax follows a specific format:
<Operation> <Destination>, <Source>
Example:
mov rbp,rsp
The "destination" and "source" values will each be a register, a memory address, or a value.
Registers are like internal variables for the processor; they allow the processor to read and write data efficiently, do simple math, etc.
We can examine the values of the registers before the program runs using gdb (a portable debugger written in C):
The "-q" flag merely prevents gdb from printing the copyright information.
The first four registers: rax (accumulator), rbx (base), rcx (counter), and rdx (data) are general-purpose registers. They are mainly used as temporary variables by the CPU when executing machine language instructions.
The second four registers: rsi, rdi, rbp, and rsp are additional general-purpose registers but are sometimes known as pointers and indices. They stand for Source Index, Destination Index, Base Pointer, and Stack Pointer. RSP and RBP are called pointers because they store addresses. RSI and RDI are also pointers, used to point to the source and destination when data needs to be read from or written to memory.
RIP is the Instruction Pointer register, which points to the current instruction the processor is reading. RIP is very important. The remaining eflags register consists of several bit flags that are used for comparisons and memory segmentation.
Let's use the debugger to step through our hello world program:
Notice that we have access to the source code from inside the debugger. In order to do this, you must compile the source file with the "-g" flag. Being able to view the source code from within the debugger will help us keep track of things. We then disassemble the main function using "disass main" (we will use abbreviations when we can). Next we set a breakpoint at main() and run the program, which stops it before any of the instructions in the body of main() are actually run. We then use "i r rip" to examine the value of the Instruction Pointer. In this example, "i" is short for "info" and "r" is short for "registers."
Notice that rip contains a memory address that points to an instruction in the main() function's disassembly (the second mov instruction). All of the instructions before this are collectively known as the "function prologue" and are "generated by the compiler to set up memory for the rest of the main() function's local variables."
We can examine memory directly in gdb by typing "x" (for "examine") and specifying two arguments: the location of memory to examine, and how to display that memory. Four commonly used display formats are: o (octal), x (hexadecimal), u (unsigned, standard base-10), t (binary).
The default size of a single unit is a 4-byte word. The size of the display units for the examine command can be changed by adding a size letter to the end of the format letter. Size letters are as follows: b (a single byte), h (a halfword, two bytes), w (a word, four bytes), g (a giant word, eight bytes):
You may notice that the bytes are being reversed. This is because values are stored in little-endian order on the x86 processor, which means the least significant byte is stored first. In other words, "if four bytes are to be interpreted as a single value, the bytes must be used in reverse order" (Hacking: The Art of Exploitation, Jon Erickson).
The examine command can also display memory as disassembled assembly language instructions:
In this context, "i" stands for "instruction." This instruction moves the value 0 into the location in memory four bytes below the address stored in the register rbp:
If we disassemble the main function we see that we are on this instruction.
If we run the command "nexti", the current instruction will be executed and we will move on to the next instruction:
We are now at the jump instruction.
If we examine the value at the address rbp - 4, we will find that it has been zeroed out:
We zero out four bytes because an int in C occupies four bytes on this platform.
The next instructions make more sense to discuss as a group:
The current instruction is an unconditional jump to 0x40054c. This takes us to a "cmp" instruction which implements the loop condition (is i less than or equal to 9, which is the same test as i < 10). If it is, we jump to location 0x40053e, which is another "mov" instruction. Let's use the nexti command to execute the mov instruction and then inspect the edi register:
You may notice that each of these hexadecimal bytes appears in the ASCII table. We can look up a byte in the ASCII table by using the "c" format:
So, the mov instruction loads the address of the string "Hello, World" into edi so that the subsequent call instruction can print the string.
This concludes our first dive into the basics of hacking and examining memory.
Source: https://hackernoon.com/the-basics-of-hacking-an-introduction-iwft31rg