Difference between revisions of "Floating-point Opcodes"
Revision as of 13:45, 15 August 2016
The FPU offers a lot of operations not available on the classic x86 CPU, like SIN, COS, TAN, EXP, SQRT, LN and so on. SIMPLY FPU (http://www.website.masmforum.com/tutorials/fptute/appen1.htm) by Raymond Filiatreault has a compact overview of all FPU instructions. Using and communicating with the FPU is a bit unusual and takes some getting used to. It's recommended to first read how the snippet we want to modify was created (see Output, section "Outputting in mode 13h (320x200)"). This is how it looks originally:
cwd ; "clear" DX for perfect alignment
mov al,0x13
X: int 0x10 ; set video mode AND draw pixel
mov ax,cx ; get column (CX) into AX
add ax,di ; offset by framecounter <-- REPLACE THIS WITH FPU CODE
xor al,ah ; the famous XOR pattern
and al,32+8 ; a more interesting variation of it
mov ah,0x0C ; set subfunction "set pixel" for int 0x10
loop X ; loop 65536 times
inc di ; increment framecounter
in al,0x60 ; check keyboard ...
dec al ; ... for ESC
jnz X ; rinse and repeat
ret ; quit program
And this is how it looks once the marked instruction is replaced with FPU code:
cwd ; "clear" DX for perfect alignment
mov al,0x13
X: int 0x10 ; set video mode AND draw pixel
mov ax,cx ; get column (CX) into AX
fninit ; init FPU first
mov [si],ax ; write first addend to a memory location
fild word [si] ; F(pu) I(nteger) L(oa)D: push a WORD from the memory location onto the FPU stack
mov [si],di ; write second addend to a memory location
fiadd word [si] ; add the word in the memory location directly to the top of the FPU stack
fist word [si] ; F(pu) I(nteger) ST(ore) the result into a memory location
mov ax,[si] ; Get the word from the memory location into AX
xor al,ah ; the famous XOR pattern
and al,32+8 ; a more interesting variation of it
mov ah,0x0C ; set subfunction "set pixel" for int 0x10
loop X ; loop 65536 times
inc di ; increment framecounter
in al,0x60 ; check keyboard ...
dec al ; ... for ESC
jnz X ; rinse and repeat
ret ; quit program
The usual interaction with the FPU is as follows:
- F(N)INIT: initialization of the FPU
- store register content in memory location(s)
- transfer from memory location onto FPU stack
- actual calculations on the FPU (more on this soon)
- transfer from FPU stack into memory location(s)
- get register from memory location
That would be a lot for a single integer addition, but once more complex floating-point operations are involved, it starts to pay off. For more advanced FPU operations, let's start from scratch with an unoptimized program that plots the distance of each pixel from the screen center as its color, in 49 bytes.
push 0a000h
pop es ; get start of video memory in ES
mov al,0x13 ; switch to video mode 13h
int 0x10 ; 320 * 200 in 256 colors
fninit ; -
; it's useful to comment what's on the
; stack after each FPU operation
; to not get lost ;) start is : empty (-)
X:
xor dx,dx ; reset the high word before division
mov bx,320 ; 320 columns
mov ax,di ; get screen pointer in AX
div bx ; split screen pointer into row and column: AX = Y (quotient), DX = X (remainder)
sub ax,100 ; subtract the origin
sub dx,160 ; = (160,100) ... center of 320x200 screen
mov [si],ax ; move Y into a memory location
fild word [si] ; Y
fmul st0 ; Y²
mov [si],dx ; move X into a memory location
fild word [si] ; X Y²
fmul st0 ; X² Y²
faddp st1,st0 ; Y²+X² (add and pop, so only the sum stays on the stack)
fsqrt ; R
fistp word [si] ; -
mov ax,[si] ; get the result from memory
stosb ; write to screen (DI) and increment DI
jmp short X ; next pixel
A few words on this:
- Depending on what you do, F(N)INIT can sometimes be omitted. Real hardware will refuse to work more often than emulators, but it's always worth a try.
- Accessing memory in a size-efficient way can be a real pain. The safest way is to reference absolute memory locations (e.g. [1234]), but that costs two bytes more per instruction than referencing memory with [BX], [SI], [BX+SI], [BP+DI], [BP+SI], [DI] or [BX+DI]. When working with the FPU and this classic approach to FPU communication, you have to design your code flow so that one or more of these locations are available.
- Memory accesses are always relative to the segment register DS unless you use segment overrides. When accessing memory with [BP+??], be aware that this accesses memory relative to the segment register SS (see http://www.oopweb.com/Assembly/Documents/ArtOfAssembly/Volume/Chapter_4/CH04-2.html, at 4.6.2.2 The Register Indirect Addressing Modes).
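
To make the size and segment remarks above a bit more concrete, here is a small sketch that is not part of the original snippets. It assembles the same FILD with different addressing modes. It assumes NASM producing a flat 16-bit binary; the address 0x1234 is just a placeholder, and the listing is meant for looking at the encodings, not for running.

```nasm
; Sketch only: compare the encoded size of one FPU load
; under different addressing modes (16-bit code, NASM syntax).
bits 16
org 0x100

fild word [si]        ; DF 04        2 bytes, register-indirect, uses DS
fild word [bx+di]     ; DF 01        2 bytes, uses DS
fild word [0x1234]    ; DF 06 34 12  4 bytes, absolute address (placeholder)
fild word [bp+si]     ; DF 02        2 bytes, but addresses SS:[BP+SI]
fild word [ds:bp+si]  ; 3E DF 02     3 bytes, segment override forces DS
```

Assembling this with something like nasm -f bin -l sizes.lst sizes.asm (the file names are made up) and reading the listing shows the byte counts. In a plain .COM program DS and SS normally point to the same segment, so the [BP+??] difference only starts to matter once one of the segment registers has been changed.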