Risc OS on ARM based CPUs
Revision as of 11:32, 16 June 2020
Why ARM and why on Risc OS ?
x86 and ARM are the two major CPU architectures of modern times, the latter especially for any kind of mobile device. Back in the 1980s Acorn developed the ARM architecture to power the successor of the BBC Micro. The best known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran Risc OS, a unique operating system for ARM CPUs.
Nowadays, thanks to the work of a few enthusiasts, Risc OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is especially recommended and cheap. The fastest CPU to run Risc OS natively at the time of writing is an overclocked RPi4 at 2147 MHz.
I'm not aware whether Android or any kind of Linux would be a better platform for sizecoding on ARM hardware. Feel free to prove us wrong and write to us about it.
What does ARM offer compared to x86 ?
If you come from x86, coding on ARM will be a very different experience, as that architecture never carried any legacy from an 8 or 16 Bit age. It was purely RISC and 32 Bit from the beginning regarding instruction set and register size. Over the years a lot of enhancements took place. In general you get:
- 16 full size 32-Bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)
- VFP/NEON(SIMD) units with 32 32-Bit single precision registers (s0...s31), 32 64-Bit multi purpose or double precision registers for SIMD (d0...d31), and 16 128-Bit multi purpose registers for SIMD (q0...q15). These registers overlap: each q register aliases two d registers, and s0...s31 alias d0...d15
- THUMB/THUMB-2 instruction set (especially useful regarding sizecoding)
And of course the individual instructions are very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set became quite huge. Nowadays there is a hardware integer divide and there are various SIMD approaches in either ARM or NEON instructions. Only the FPU still lacks trigonometric and other fancy instructions compared to x87. There is a so called FPEmulator in Risc OS to take care of that, but it is rather slow as it is implemented in software, and it is not available for THUMB/THUMB-2. It might still be an option for e.g. precalc when you use the BASIC Assembler from Risc OS.
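The overlap of the s, d and q register views mentioned in the list above can be sketched in a few lines of Python. This is only an illustration, assuming the usual VFPv3-D32/NEON layout in which s0...s31 cover d0...d15 only; the helper names `d_regs_of_q` and `s_regs_of_d` are made up for this sketch.

```python
# Sketch of how the VFP/NEON register views overlap.
# Assumption: a core with 32 d registers (VFPv3-D32/NEON layout).
# The helper names are hypothetical and only serve this illustration.

def d_regs_of_q(n):
    """Return the two d registers aliased by q<n> (n = 0...15)."""
    if not 0 <= n <= 15:
        raise ValueError("q register index out of range")
    return (f"d{2 * n}", f"d{2 * n + 1}")


def s_regs_of_d(n):
    """Return the two s registers aliased by d<n>, or () for d16...d31."""
    if not 0 <= n <= 31:
        raise ValueError("d register index out of range")
    if n >= 16:
        return ()  # the upper d registers have no single precision view
    return (f"s{2 * n}", f"s{2 * n + 1}")
```

So writing to q1 in NEON code also changes d2 and d3, and through them s4...s7, which matters when VFP and NEON code share registers.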
In ARM mode every instruction is 4 Bytes long. THUMB offers a limited instruction set with a length of 2 Bytes, and THUMB-2 mixes 2 and 4 Byte instructions.
That may sound like a bit of a handicap regarding sizecoding, and for some tasks that is definitely true. For others it's not, due to the things even a single instruction can do (e.g. conditional execution and shifts for free). The following shows an example:
ARM (8 Bytes)
cmp r0,r1 ;compare r0 with r1
addhi r0,r2,r3,lsl#2 ;if r0>r1 (unsigned) then r0 = r2 + (r3<<2)
x86 (11 Bytes)
cmp eax,ebx ;compare eax with ebx
jna skip ;skip if eax<=ebx (unsigned)
mov eax,edx
shl eax,2 ;eax = edx<<2
add eax,ecx ;eax = ecx + (edx<<2)
skip:
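To make the semantics of the two ARM instructions explicit, here is a small Python model. It is a sketch only: the function name `cmp_addhi` is made up, registers are treated as 32-Bit unsigned values, and the condition flags are not modelled beyond the HI (unsigned higher) test.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 Bit wide


def cmp_addhi(r0, r1, r2, r3):
    """Model of: cmp r0,r1 / addhi r0,r2,r3,lsl#2. Returns the new r0."""
    r0 &= MASK32
    r1 &= MASK32
    if r0 > r1:  # HI = unsigned higher
        shifted = (r3 << 2) & MASK32      # the shift is part of the add
        r0 = (r2 + shifted) & MASK32      # result wraps at 32 Bit
    return r0
```

For example `cmp_addhi(5, 3, 10, 2)` takes the branch-free add and yields 10 + (2<<2), while `cmp_addhi(3, 5, 10, 2)` leaves r0 untouched.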
What does Risc OS offer for sizecoding ?
- more or less easy access to common screen modes
- all screen modes have a linear frame buffer, no 16-Bit bank switching limit like on DOS
- convenient access to operating system/kernel routines (so called SWIs (SoftWare Interrupts), comparable to 'int' on x86)
- up-to-date 16-Bit sound system, e.g. for generating bytebeat based stuff
- built in BBC Basic including an Assembler
What does it lack (but mostly not relevant to tiny intro sizecoding) ?
- no multicore support
- no shader access or any kind of OpenGL or DirectX
- lack of software development in general, so web browsing is there but a bit limited
Code Examples - Simple sizecoding framework and output to screen
So what would a common intro framework look like ? For now we will use the GNU assembler to assemble our code, as the built in BASIC Assembler doesn't support THUMB code.
Before we start with the actual code it's best to define some of the mentioned SWI's for OS interaction by their number. Here's a list of some basic ones.
.set OS_ScreenMode, 0x65
.set OS_RemoveCursors, 0x36
.set OS_ReadVduVariables, 0x31
.set OS_ReadMonotonicTime, 0x42
.set OS_ReadEscapeState, 0x2c
.set OS_Exit, 0x11
A basic intro loop in THUMB-2 would then look like this:
.syntax unified
.thumb //assemble using thumb mode
movs r0,#0 //reason code to set screen mode by number
movs r1,#13 //screen mode 13 = 320x256 256 colours
swi OS_ScreenMode //set screen mode
adr.n r0,screen_address //address of input block to read screen mode address
movs r1,r0 //address of output block where screen mode address is stored
swi OS_ReadVduVariables //read and write screen mode address from/to blocks
mainloop:
ldr.n r7,screen_address //read screen address
swi OS_ReadMonotonicTime //get OS timer to r0
movs r2,#255 //screen y
yloop:
movs r1,#320 //screen x
xloop:
adds r3,r1,r0 //p = x+timer
eors r3,r3,r2 //p = (x+timer) xor y
strb r3,[r7],#1 //plot result as byte (with standard palette) and increment address
subs r1,r1,#1 //dec x
bne xloop
subs r2,r2,#1 //dec y
bge yloop
swi OS_ReadEscapeState //ESC pressed ?
bcc mainloop
swi OS_Exit //if yes exit to OS
.align 2 //align
screen_address:
.word 148 //input block to read screen address
.word -1 //request block needs to be terminated by -1
This assembles to 52 Bytes.
As you can see, for setting the screen mode you can rely on smaller old school modes with up to e.g. 800x600x256 colours by just choosing a mode by number (listed here: Screen Modes). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as that is not a fixed address. On one specific device it should work to read that address once and hardcode it into your code, but then of course you would be restricted to your device (e.g. a RPi4 shows different results than a RPi3 for the same screen mode).
An intro showing that technique is e.g. Exoticorn's Edgedancer
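The pattern drawn by the 256 colour loop above can be prototyped off-target before writing any assembly. The following Python sketch mirrors the loop counters of the listing (x counts from 320 down to 1, y from 255 down to 0); the function name `render_frame` is made up, and `timer` stands for the value returned by OS_ReadMonotonicTime.

```python
WIDTH, HEIGHT = 320, 256  # screen mode 13


def render_frame(timer):
    """Return one frame as bytes, in the order the loop stores them."""
    frame = bytearray()
    for y in range(HEIGHT - 1, -1, -1):      # r2 counts 255 down to 0
        for x in range(WIDTH, 0, -1):        # r1 counts 320 down to 1
            # p = (x + timer) xor y, strb keeps only the lowest byte
            frame.append(((x + timer) ^ y) & 0xFF)
    return bytes(frame)
```

Each call yields 320*256 = 81920 bytes, one palette index per pixel, so the result can be dumped into any raw image viewer to check an effect idea.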
If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old school screen modes with a mode string, using reason code 15 of the SWI (check out this link for further information). That would look like this code snippet:
.syntax unified
.thumb //assemble using thumb mode
movs r0,#15 //reason code to request screen mode by string
adr.n r1,mode_string //pointer to string
swi OS_ScreenMode //set screen mode
adr.n r0,screen_address //address of input block to read screen mode address
movs r1,r0 //address of output block where screen mode address is stored
swi OS_ReadVduVariables //read and write screen mode address from/to blocks
mainloop:
ldr.n r7,screen_address //read screen address
swi OS_ReadMonotonicTime //get OS timer to r0
movs r2,#255 //screen y
ands r0,r0,r2 //get lowest byte of timer
lsls r0,r0,#8 //create 'B' for RGB from timer
yloop:
lsls r4,r2,#16 //create 'R' for RGB from y
orrs r4,r4,r0 //combine 'R' and 'B'
movs r1,#320 //screen x
xloop:
lsrs r3,r1,#1 //x>>1 for 'G' as x>256
orrs r3,r3,r4 //finalize RGB value
stmia r7!,{r3} //store true colour pixel and increment address
subs r1,r1,#1 //dec x
bne xloop
subs r2,r2,#1 //dec y
bge yloop
swi OS_ReadEscapeState //ESC pressed ?
bcc mainloop
swi OS_Exit //if yes exit to OS
.align 2 //align
mode_string:
.string "13 C16M" //screen mode string (terminated by 0) => 13 = 320*256 C16M = true colour
screen_address:
.word 148 //input block to read screen address
.word -1 //request block needs to be terminated by -1
This assembles to 68 Bytes.
An intro showing that technique is e.g. Exoticorn's Elsecaller
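The 32-Bit value that the true colour loop above stores per pixel can be written down as a one-line formula. The Python sketch below only reproduces the bit layout built by the shifts and ORs in the listing; which byte is displayed as red, green or blue depends on the pixel format of the mode and is not modelled here. The name `pixel_word` is made up for the illustration.

```python
def pixel_word(x, y, timer):
    """32-Bit value stored for column counter x (320...1) and row counter y (255...0)."""
    low = x >> 1                    # lsrs r3,r1,#1  -> bits 0...7
    mid = (timer & 0xFF) << 8       # ands + lsls #8 -> bits 8...15
    high = y << 16                  # lsls r4,r2,#16 -> bits 16...23
    return high | mid | low
```

With 4 Bytes per pixel one frame takes 320*256*4 = 327680 Bytes, which is why the listing stores whole words with stmia instead of single bytes.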
Another approach is not to set the screen mode at all but to read the parameters of the current one, as most users run in 1920x1080x32Bit anyway. This also makes the intro independent of the resolution.
An intro showing that technique is e.g. Kuemmel's Risc OS 3dball. In a later upgrade to that intro you can also see the combined use of THUMB-2 and NEON within the code, which led to a reduction in code size of around 44 Bytes compared to the initial non-THUMB version. For more insights and requirements of the use of VFP/NEON check out the section below.
To trigger THUMB mode you can conveniently set the lowest Bit (bit 0) of the execution address of your file with the following command in Risc OS (&8000 is the general start address for executables in Risc OS, so the execution address becomes &8001). The best way to do so is to use a batch file, as shown in most of the above mentioned intros:
SYS "OS_File",1,"filename",&8000,&8001,,19
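The address arithmetic behind that command is just a matter of bit 0. As a tiny Python sketch (the helper names are made up, and it assumes, as described above, that bit 0 of the execution address is what selects THUMB):

```python
LOAD_ADDRESS = 0x8000  # general start address for executables in Risc OS


def exec_address(load_address, thumb):
    """Execution address to store for the file: bit 0 set means start in THUMB mode."""
    return (load_address | 1) if thumb else (load_address & ~1)


def starts_in_thumb(address):
    """True if the given execution address has bit 0 set."""
    return bool(address & 1)
```

`exec_address(LOAD_ADDRESS, True)` gives &8001, the value used as the fourth parameter of the command.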
Regarding THUMB mode on Risc OS in general there's a small thing to address. A very ancient module has to be removed from the OS, otherwise it crashes your code. To this day that bug is still not fixed. The module's name is "SpecialFX" and it needs to be removed by "rmkill SpecialFX" on the command line or by a batch file as shown in the intro links from above.
To exit your intro and go back to the desktop you simply use the shown SWI OS_Exit. If you didn't change the mode you have to use e.g. the SWI "OS_NewLine" to re-trigger the desktop redraw. Of course all of those can be omitted if your tiny intro compo rules allow you to...
Code Examples - Using VFP/NEON code
...work in progress
Code Examples - Sound output by interrupt driven bytebeat
...work in progress...
Resources
Links on the OS
Risc OS Open - Home of the current OS version and discussion forum
Links on ARM coding
Thumb 16-bit Instruction Set Quick Reference Card
ARM and Thumb-2 Instruction Set Quick Reference Card
Vector Floating Point Instruction Set Quick Reference Card
Coding for NEON - Part 1 - load and stores
Coding for NEON - Part 2 - dealing with leftovers
Coding for NEON - Part 3 - matrix multiplication
Coding for NEON - Part 4 - shifting left and right
Coding for NEON - Part 5 - rearranging vectors
Condition Codes 1: Condition Flags and Codes
Condition Codes 2: Conditional Execution