notes2

/*personal notes of renzo diomedi*/

~ 00000010 ~

HOW THE MACHINE ARRANGES THE BYTES:

Recall that in 0xnn each n is a nibble: 4 bits = one hex digit, with 2^4 = 16 possible values in [0,15]

dat:
.byte 0x88, 0x77 , 0x66, 0x55, 0x44, 0x33, 0x22, 0x11
generates &dat = 0x55667788 0x11223344 # 2 x 32 bits, each word read little-endian

dat:
.byte 0x11, 0x22 , 0x33, 0x44, 0x55, 0x66, 0x77, 0x88
generates &dat = 0x44332211 0x88776655

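If cmpxchg8b had stored the ECX:EBX register pair into the data memory location, the same rule would apply (low doubleword at the lower address):
&dat = ebx ecx (as printed by x/2x)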

Breakpoint 1, _start () at 8.s:9
9 movl $0x88776655 , %edx
(gdb) s
10 movl $0x44332211 , %eax
(gdb) s
11 movl $0x22222222 , %ecx
(gdb) s
12 movl $0x11111111 , %ebx
(gdb) s
13 cmpxchg8b dat
(gdb) x/2x &dat
0x804909c: 0x55667788 0x11223344
(gdb) x/2x &eax
No symbol "eax" in current context.
(gdb) x/2x $eax
0x44332211: Cannot access memory at address 0x44332211
(gdb) print $eax
$1 = 1144201745
(gdb) print/x $eax
$2 = 0x44332211
(gdb) print/x $edx
$3 = 0x88776655
(gdb) info registers
eax 0x44332211 1144201745
ecx 0x22222222 572662306
edx 0x88776655 -2005440939
ebx 0x11111111 286331153
esp 0xbffff2b0 0xbffff2b0
ebp 0x0 0x0
esi 0x0 0
edi 0x0 0
eip 0x8048089 0x8048089 <_start+21>
eflags 0x202 [ IF ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x0 0
(gdb) Quit
(gdb) Quit
(gdb)

Checked and verified: because the 64-bit value at the label dat is not equal to EDX:EAX (0x88776655:0x44332211), cmpxchg8b does not store ECX:EBX. Instead it loads the value stored at dat into EDX:EAX (high doubleword into EDX, low doubleword into EAX).
Verified again with different bytes in DAT:

#8b8
.section .data
DAT:
.byte 0x77, 0x55, 0x22, 0x23, 0x32, 0x98, 0x11, 0x45
.section .text
.globl _start
_start:
nop
movl $0x88776655, %edx
movl $0x44332211, %eax
movl $0x22222222, %ecx
movl $0x11111111, %ebx
cmpxchg8b DAT
movl $0x0 , %ebx
movl $0x1, %eax
int $0x80

(gdb) x/2x &DAT
0x402000 <_data_start__>: 0x23225577 0x45119832
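// Why not 0x45119832 0x23225577? Why are the two 32-bit groups in this order? What is the rule?
// x/2x prints 32-bit words in increasing address order, and each word is read little-endian: the bytes 77 55 22 23 at the lowest addresses become 0x23225577, the low doubleword, which cmpxchg8b loads into EAX; the next four bytes become 0x45119832, the high doubleword, which goes into EDX.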

(gdb) info registers
eax 0x23225577 589452663
ecx 0x22222222 572662306
edx 0x45119832 1158780978
ebx 0x11111111 286331153

The basic algorithm for a sort in the high-level language "c" is:

for (out = array_size - 1; out > 0; out--)
{
    for (in = 0; in < out; in++)
    {
        if (array[in] > array[in+1])
        {
            /* swap the adjacent pair */
            tmp = array[in];
            array[in] = array[in+1];
            array[in+1] = tmp;
        }
    }
}

There are two loops. The inner loop runs through the array, checking the adjacent array value to see which is larger. If a larger value is found in front of a smaller value, the two values are swapped in the array. This continues through to the end of the array.
When the first pass has completed, the largest value in the array should be at the end of the array, but the remaining values are not in any particular order. You must take N-1 passes through an array of N elements before all of the elements are in sorted order. The outer loop controls how many total passes of the inner loop are performed. For each new pass of the inner loop, there is one less element to check, as the last element of the previous pass should be in the proper order.

This algorithm is implemented in the assembly language program using a data array and two counters, EBX and ECX. The EBX counter is used for the inner loop, decreasing each time an array element is tested. When it reaches zero, the ECX counter is decreased, and the EBX counter is reset. This process continues until the ECX counter reaches zero. This indicates that all of the required passes have been completed.

This is the bubble sort algorithm for sorting an array of integers. It is not the most efficient sort method, but it is the easiest to understand and demonstrate.

# sort
.section .data
va:
.int 105, 235, 61, 315, 134, 221, 53, 145, 117, 5
.section .text
.globl _start
_start:
movl $va, %esi
movl $9, %ecx # 9 comparisons
movl $9, %ebx
loop:
movl (%esi), %eax
cmp %eax, 4(%esi)
jge skip
xchg %eax, 4(%esi)
movl %eax, (%esi)
skip:
add $4, %esi
dec %ebx
jnz loop
# when ebx reaches 0, ecx is decreased to 8; when ebx reaches 0 again, ecx is decreased to 7, and so on down to zero
dec %ecx /* decreased only when the ebx loop has ended */
jz end
movl $va, %esi
movl %ecx, %ebx /* ebx is now reset to 8; after the next pass it is reset to 7, then 6, ..., 1 */
jmp loop
end:
movl $1, %eax
movl $0, %ebx
int $0x80

The actual comparing and swapping of array values is done using indirect addressing. The ESI register is loaded with the memory address of the start of the data array. The ESI register is then used as a pointer to each array element during the comparison section:

movl (%esi), %eax
cmp %eax, 4(%esi)
jge skip
xchg %eax, 4(%esi)
movl %eax, (%esi)
skip:

First, the value in the first array element is loaded into the EAX register, and compared with the second array element (located 4 bytes from the first). If the second element is already larger than or equal to the first element, nothing happens and the program moves on to the next pair.
If the second element is less than the first element, the XCHG instruction is used to swap the first element (loaded into the EAX register) with the second element in memory. Next, the second element (now loaded into the EAX register) is then placed in the first element location in memory.
After this, the ESI register is incremented by 4 bytes, now pointing to the second element in the array. The process is then repeated, now using the second and third array elements. This continues until the end of the array is reached.
This simple sample program does not produce any output. Instead, to see if it really works, you can use the debugger and view the values array before and after the program is run. Here’s a sample output of the program in action:

C:\>gdb -q users\\rnz\desktop\sort.exe
Reading symbols from users\\rnz\desktop\sort.exe...done.
(gdb) break *end
Breakpoint 1 at 0x40102d: file users\rnz\desktop\sort.s, line 27.
(gdb) x/10d &va
0x402000 : 105 235 61 315
0x402010 : 134 221 53 145
0x402020 : 117 5
(gdb) run
Starting program: C:\users\rnz\desktop\sort.exe
[New Thread 14156.0x2b00]
Breakpoint 1, end () at users\rnz\desktop\sort.s:27
27 movl $1, %eax
(gdb) x/10d &va
0x402000 : 5 53 61 105
0x402010 : 117 134 145 221
0x402020 : 235 315
(gdb)

# sort.exp
.section .data
va:
.int 105, 235, 61, 315, 134, 221, 53, 145, 117, 5
.section .text
.globl _start
_start:
movl $va, %esi
movl $9, %ecx
movl $9, %ebx
loop:
movl (%esi), %eax
cmp %eax, 4(%esi)
jge skip
xchg %eax, 4(%esi)
movl %eax, (%esi)
skip:
add $4, %esi
dec %ebx
jnz loop
dec %ecx
jz end
movl $va, %esi
//movl %ecx, %ebx ############### NOTICE!!!
jmp loop
end:
movl $1, %eax
movl $0, %ebx
int $0x80

C:\>as -gstabs -o users\rnz\desktop\sort.exp.o users\rnz\desktop\sort.exp.s
C:\>ld -o users\rnz\desktop\sort.exp.exe users\rnz\desktop\sort.exp.o
C:\>gdb -q users\rnz\desktop\sort.exp.exe
Reading symbols from users\rnz\desktop\sort.exp.exe...done.
(gdb) break *end
Breakpoint 1 at 0x40102b: file users\rnz\desktop\sort.exp.s, line 27.
(gdb) x/10d &va
0x402000 : 105 235 61 315
0x402010 : 134 221 53 145
0x402020 : 117 5
(gdb) run
Starting program: C:\users\rnz\desktop\sort.exp.exe
[New Thread 10920.0x2b28]
Program received signal SIGSEGV, Segmentation fault.
loop () at users\rnz\desktop\sort.exp.s:15
15 xchg %eax, 4(%esi)
(gdb) x/10d &va
0x402000 : 61 105 134 221
0x402010 : 53 145 117 5
0x402020 : 235 0
(gdb)
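############## FAILED SORT: with movl %ecx, %ebx commented out, EBX is still 0 when the second pass starts, so dec %ebx wraps to 0xFFFFFFFF and the inner loop walks past the end of the array (note the 0 pulled into the last slot) until xchg faults.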

Stack

pushx source

pushl %ecx # puts the 32-bit value of the ECX register on the stack
pushw %cx # puts the 16-bit value of the CX register on the stack
pushl $100 # puts the value of 100 on the stack as a 32-bit integer value
pushl data # puts the 32-bit data value referenced by the data label
pushl $data # puts the 32-bit memory address referenced by the data label

Note the difference between using the label data versus the memory location $data.
The first format (without the dollar sign) places the data value contained in the memory location in the stack,
whereas the second format places the memory address referenced by the label in the stack.

popx destination

popl %ecx # place the next 32-bits in the stack in the ECX register
popw %cx # place the next 16-bits in the stack in the CX register
popl value # place the next 32-bits in the stack in the value memory location

Instruction     Description
PUSHA/POPA      Push or pop all of the 16-bit general-purpose registers
PUSHAD/POPAD    Push or pop all of the 32-bit general-purpose registers
PUSHF/POPF      Push or pop the lower 16 bits of the EFLAGS register
PUSHFD/POPFD    Push or pop the entire 32 bits of the EFLAGS register

The PUSHA and POPA instructions are great for quickly setting aside and retrieving the current state of all the general-purpose registers at once. The PUSHA instruction pushes AX, CX, DX, BX, the original SP, BP, SI, and DI, in that order, so reading from the top of the stack down they appear as DI, SI, BP, SP, BX, DX, CX, and finally AX. The PUSHAD instruction pushes the 32-bit counterparts of these registers in the same order. The POPA and POPAD instructions retrieve the registers in the reverse order they were pushed, discarding the saved SP/ESP value instead of loading it.
The behavior of the POPF and POPFD instructions varies depending on the processor mode of operation. When the processor is running in protected mode in ring 0 (the privileged mode), all of the nonreserved flags in the EFLAGS register can be modified, with the exception of the VIP, VIF, and VM flags. The VIP and VIF flags are cleared, and the VM flag is not modified.
When the processor is running in protected mode in a higher level ring (an unprivileged mode), the same results as the ring 0 mode are obtained, and the IOPL field is not allowed to be modified.

Optimizing Memory Access

Memory access is one of the slowest functions the processor performs. When writing assembly language programs that require high performance, it is best to avoid memory access as much as possible. Whenever possible, it is best to keep variables in registers on the processor. Register access is highly optimized for the processor, and is the quickest way to handle data.
When it is not possible to keep all of the application data in registers, you should try to optimize the memory access for the application. For processors that use data caching, accessing memory in sequential order helps increase cache hits, as blocks of memory are read into cache at one time. One other item to think about when using memory is how the processor handles memory reads and writes. Most processors (including those in the IA-32 family) are optimized to read and write memory locations in specific cache blocks, beginning at the start of the data section. On a Pentium 4 processor, the size of the cache block (cache line) is 64 bytes. If you define a data element that crosses a 64-byte block boundary, it will require two cache operations to retrieve or store the data element in memory.
To solve this problem, Intel suggests following these rules when defining data:

❑ Align 16-bit data so that it is contained within an aligned 4-byte word.
❑ Align 32-bit data so that its base address is a multiple of four.
❑ Align 64-bit data so that its base address is a multiple of eight.
❑ Avoid many small data transfers. Instead, use a single large data transfer.
❑ Avoid using larger data sizes (such as 80- and 128-bit floating-point values) in the stack.

Aligning data within the data section can be tricky. The order in which data elements are defined can be crucial to the performance of your application. If you have a lot of similarly sized data elements, such as integer and floating-point values, place them together at the beginning of the data section. This ensures that they will maintain the proper alignment. If you have a lot of odd-sized data elements, such as strings and buffers, place those at the end of the data section so they won’t throw off the alignment of the other data elements.
The gas assembler supports the .align directive, which is used to align defined data elements on specific memory boundaries. The .align directive is placed immediately before the data definition in the data section, instructing the assembler to position the data element on a memory boundary.
