RISC OS on ARM-based CPUs

Why ARM and why on RISC OS?

x86 and ARM are the two major CPU architectures of modern times, the latter especially for mobile devices of any kind. Back in the 1980s the ARM processor was designed to power the successor of the BBC Micro. The best-known machines are probably the Acorn Archimedes range (1987) and the Acorn Risc PC. All those home computers ran RISC OS, a unique operating system for ARM CPUs.

Nowadays, thanks to the work of a few enthusiasts, RISC OS is still in development and you can run it on popular single-board computers. The Raspberry Pi range is cheap and especially recommended. At the time of writing, the fastest CPU that runs RISC OS natively is an overclocked RPi4 at 2147 MHz.

I'm not aware whether Android or some kind of Linux would be a better platform for sizecoding on ARM hardware. Just prove us wrong and write to us about it.

What does ARM offer compared to x86?

If you come from x86, coding on ARM will be a very different experience, as the architecture never carried any legacy from an 8- or 16-bit age. It was pure RISC and 32-bit from the beginning, in both instruction set and register size. Over the years a lot of enhancements were added. In general you get:

16 full-size 32-bit registers (r0...r15, usually: r13: stack pointer, r14: link register, r15: program counter)
VFP/NEON (SIMD) units with 32 32-bit single-precision registers (s0...s31), 32 64-bit multi-purpose or double-precision registers for SIMD (d0...d31), and 16 128-bit multi-purpose registers for SIMD (q0...q15). These registers overlap: each q register covers two d registers, and s0...s31 share their storage with d0...d15
THUMB/THUMB-2 instruction set (especially useful for sizecoding)
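
The register overlap is easy to get wrong when VFP and NEON code are mixed. The following is a small host-side Python sketch (not RISC OS code) of the usual ARMv7 aliasing rule. The helper name `q_aliases` is made up for illustration.

```python
def q_aliases(q):
    """Return the d and s register numbers that share storage with q<q>."""
    if not 0 <= q <= 15:
        raise ValueError("q register index must be 0..15")
    d_regs = (2 * q, 2 * q + 1)            # every q register spans two d registers
    # only d0...d15 (that is q0...q7) are also visible as s registers
    s_regs = tuple(range(4 * q, 4 * q + 4)) if q < 8 else ()
    return d_regs, s_regs


print(q_aliases(1))    # d and s registers overlapping q1
print(q_aliases(8))    # q8 has no single-precision view
```

In practice this means that writing to s4 also changes part of d2 and q1, so a value parked in a q register is not safe while scalar VFP code runs on the overlapping s registers.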

...and of course the individual instructions are very different from x86. Some things might be familiar, some not at all. Over the years the ARM instruction set has become quite large: nowadays there is hardware integer divide and there are various SIMD approaches in both ARM and NEON instructions. Only the FPU still lacks the trigonometric and other fancy instructions that x87 has. RISC OS has a so-called FPEmulator to take care of that, but it is rather slow because it is implemented in software, and it is not available for THUMB/THUMB-2. It might still be an option for e.g. precalculation when you use the BASIC Assembler from RISC OS.

A classic ARM instruction is always 4 bytes long. THUMB offers a limited instruction set with 2-byte instructions, and THUMB-2 extends it with additional 4-byte encodings.

That may sound like a bit of a handicap for sizecoding, and for some tasks it definitely is. For others it is not, because of how much even a single instruction can do (e.g. conditional execution and shifts for free). The following example shows this:

ARM (8 Bytes)

cmp   r0,r1            //compare r0 with r1
addhi r0,r2,r3,lsl#2   //if r0>r1 then r0 = r2 + r3<<2

x86 (11 Bytes)

cmp eax,ebx        ;compare eax with ebx
jna skip           ;skip unless eax>ebx (unsigned)
   mov eax,edx     ;eax = edx
   shl eax,2       ;eax = edx<<2
   add eax,ecx     ;eax = ecx + edx<<2
skip:
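
To make the semantics of the two-instruction ARM version explicit, here is a host-side Python sketch that models it on 32-bit values. It is only a model for checking your understanding, not code for the target. The function name `cmp_addhi` is invented for this example.

```python
MASK32 = 0xFFFFFFFF


def cmp_addhi(r0, r1, r2, r3):
    """Model of: cmp r0,r1 / addhi r0,r2,r3,lsl#2 on 32-bit registers."""
    r0 &= MASK32
    r1 &= MASK32
    if r0 > r1:                                    # 'hi' means unsigned higher
        r0 = (r2 + ((r3 << 2) & MASK32)) & MASK32  # shift, add, wrap to 32 bits
    return r0                                      # unchanged if the condition fails


print(cmp_addhi(5, 3, 10, 2))   # condition true: r0 becomes r2 + (r3 << 2)
print(cmp_addhi(3, 5, 10, 2))   # condition false: r0 keeps its value
```

Note that the comparison is unsigned, which matches the `jna` in the x86 version. A signed test would use `gt` on ARM and `jng` on x86.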

What does RISC OS offer for sizecoding?

more or less easy access to common screen modes
all screen modes have a linear frame buffer, with no 16-bit screen bank limit like on DOS
convenient access to operating system/kernel routines (so-called SWIs (SoftWare Interrupts), comparable to 'int' on x86)
up-to-date 16-bit sound system, e.g. for generating bytebeat-based stuff
built-in BBC BASIC including an assembler

What does it lack (but mostly not relevant to tiny intro sizecoding)?

no multicore support
no shader access or any kind of OpenGL or DirectX
little software development in general, so web browsing is there but a bit limited

Code Examples - Simple sizecoding framework and output to screen

So what would a common intro framework look like? For now we will use the GNU assembler to assemble our code, as the built-in BASIC Assembler doesn't support THUMB code.

Before we start with the actual code, it's best to define some of the mentioned SWIs for OS interaction by their number. Here's a list of some basic ones.

.set OS_ScreenMode, 0x65
.set OS_RemoveCursors, 0x36
.set OS_ReadVduVariables, 0x31
.set OS_ReadMonotonicTime, 0x42
.set OS_ReadEscapeState, 0x2c
.set OS_Exit, 0x11

A basic intro loop in THUMB-2 would then look like this:

.syntax unified
.thumb                   //assemble using thumb mode
movs r0,#0               //reason code to set screen mode by number
movs r1,#13              //screen mode 13 = 320x256 256 colours
swi OS_ScreenMode        //set screen mode 
adr.n r0,screen_address  //address of input block to read screen mode address
movs r1,r0               //address of output block where screen mode address is stored  
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks 

mainloop:
ldr.n r7,screen_address  //read screen address
swi OS_ReadMonotonicTime //get OS timer to r0
movs r2,#255             //screen y
yloop:
   movs r1,#320          //screen x
   xloop:
      adds r3,r1,r0      //p = x+timer
      eors r3,r3,r2      //p = (x+timer) xor y
      strb r3,[r7],#1    //plot result as byte (with standard palette)
      subs r1,r1,#1      //dec x 
   bne xloop
   subs r2,r2,#1         //dec y
bge yloop
swi OS_ReadEscapeState   //ESC pressed ?
bcc mainloop
swi OS_Exit              //if yes exit to OS

.align 2                 //align
screen_address:
.word 148                //input block to read screen address
.word -1                 //request block needs to be terminated by -1

This assembles to 52 Bytes.
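
If you want to try out a pixel formula before assembling it, a host-side model can help. The following Python sketch (not RISC OS code) reproduces the byte stream that the loop above writes for a given timer value, assuming the loop order and the 320x256 size from the listing. The name `xor_frame` is made up for this example.

```python
WIDTH, HEIGHT = 320, 256              # mode 13, as set in the listing


def xor_frame(timer):
    """Return the bytes of one frame in the order the listing stores them."""
    frame = bytearray()
    for y in range(HEIGHT - 1, -1, -1):        # r2 counts 255 down to 0
        for x in range(WIDTH, 0, -1):          # r1 counts 320 down to 1
            frame.append(((x + timer) ^ y) & 0xFF)   # strb keeps the low byte only
    return bytes(frame)


frame = xor_frame(timer=0)
print(len(frame), frame[0], frame[-1])
```

Because both counters run downwards while the address runs upwards, the first byte in memory belongs to x=320 and y=255. For a symmetric pattern like this one that hardly matters, but it is worth keeping in mind for effects with a clear orientation.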

As you can see, for setting the screen mode you can rely on the smaller old-school modes, with up to e.g. 800x600 in 256 colours, by just choosing a mode by number (listed here: Screen Modes). After you set the screen mode you have to read its start address with OS_ReadVduVariables, as it is not a fixed address. On one specific device it should work to read that address once and then hardcode it into your code, but then of course you would be restricted to that device (e.g. an RPi4 gives a different result than an RPi3 for the same screen mode).

An intro showing that technique is e.g. Exoticorn's Edgedancer

If you want to go for true colour it's a bit more complex. Probably the shortest way is to upgrade one of those old-school screen modes with a mode string, using reason code 15 of the SWI (check out this link for further information). That would look like this code snippet:

.syntax unified
.thumb                   //assemble using thumb mode
movs r0,#15              //reason code to request screen mode by string     
adr.n r1,mode_string     //pointer to string
swi OS_ScreenMode        //set screen mode 
adr.n r0,screen_address  //address of input block to read screen mode address
movs r1,r0               //address of output block where screen mode address is stored  
swi OS_ReadVduVariables  //read and write screen mode address from/to blocks 

mainloop:
ldr.n r7,screen_address  //read screen address
swi OS_ReadMonotonicTime //get OS timer to r0
movs r2,#255             //screen y
ands r0,r0,r2            //get lowest byte of timer
lsls r0,r0,#8            //create 'B' for RGB from timer
yloop:
   lsls r4,r2,#16        //create 'R' for RGB from y
   orrs r4,r4,r0         //combine 'R' and 'B'
   movs r1,#320          //screen x
   xloop:
      lsrs r3,r1,#1      //x>>1 for 'G' as x>256
      orrs r3,r3,r4      //finalize RGB value 
      stmia r7!,{r3}     //store true colour pixel and increment address
      subs r1,r1,#1      //dec x 
   bne xloop
   subs r2,r2,#1         //dec y
bge yloop
swi OS_ReadEscapeState   //ESC pressed ?
bcc mainloop
swi OS_Exit              //if yes exit to OS

.align 2                 //align
mode_string:
.string "13 C16M"        //screen mode string (terminated by 0) => 13 = 320*256 C16M = true colour
screen_address:
.word 148                //input block to read screen address
.word -1                 //request block needs to be terminated by -1

This assembles to 68 Bytes.
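
The way the three shifted values end up in one 32-bit pixel can be checked on a host machine. The Python sketch below (not RISC OS code) packs a pixel the same way as the inner loop above and shows the resulting bytes in memory order, assuming little-endian storage. The function name `true_colour_word` is invented for illustration, and which colour channel each byte drives depends on the pixel layout of the mode.

```python
import struct


def true_colour_word(x, y, timer):
    """Model of the pixel word built in the listing (y is 0..255, x is 1..320)."""
    timer_part = (timer & 0xFF) << 8      # ands/lsls on r0
    y_part = y << 16                      # lsls r4,r2,#16
    x_part = x >> 1                       # lsrs r3,r1,#1
    return y_part | timer_part | x_part   # the two orrs instructions


word = true_colour_word(x=320, y=255, timer=0)
print(hex(word))
print(struct.pack("<I", word).hex())      # bytes as stmia stores them
```

Since x only reaches 320, x>>1 stays below 256 and never spills into the neighbouring byte, which is why the listing gets away without a mask there.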

An intro showing that technique is e.g. Exoticorn's Elsecaller

Another approach is to read the properties of the current screen mode and not set a screen mode at all, as most users run in 1920x1080 at 32 bits per pixel anyway. This also makes the intro independent of the resolution.
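
For this approach the request block for OS_ReadVduVariables simply gets more entries before the terminating -1. The Python sketch below (host-side, not RISC OS code) builds such a block and decodes a made-up result. The variable numbers 11 and 12 for the window limits are an assumption that should be checked against the RISC OS documentation, and the sample result values are invented.

```python
import struct

X_WIND_LIMIT = 11     # assumed: screen width in pixels minus 1
Y_WIND_LIMIT = 12     # assumed: screen height in pixels minus 1
SCREEN_START = 148    # screen address, as used in the listings above


def vdu_request_block(*variables):
    """Pack variable numbers as 32-bit little-endian words, terminated by -1."""
    words = list(variables) + [-1]
    return struct.pack("<%di" % len(words), *words)


def decode_result(raw):
    """Turn the three result words into (width, height, screen address)."""
    x_limit, y_limit, start = struct.unpack("<3I", raw[:12])
    return x_limit + 1, y_limit + 1, start


block = vdu_request_block(X_WIND_LIMIT, Y_WIND_LIMIT, SCREEN_START)
sample = struct.pack("<3I", 1919, 1079, 0x30000000)   # invented result for illustration
print(block.hex(), decode_result(sample))
```

On the target, the three results would be loaded into registers and used as loop limits in place of the constants 320 and 255.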

An intro showing that technique is e.g. Kuemmel's Risc OS 3dball. A later upgrade of that intro also shows the combined use of THUMB-2 and NEON, which led to a reduction in code size of around 44 bytes compared to the initial non-THUMB version. For more insights into the use of VFP/NEON and its requirements, check out the section below.

To trigger THUMB mode you can conveniently set the lowest bit (bit 0) of the execution address of your file with the following call to the SWI OS_File, written here as a BBC BASIC SYS statement (&8000 is the general start address for executables in RISC OS, so the load address stays &8000 and the execution address becomes &8001). The best way to do so is to use a batch file for that, as shown in most of the intros mentioned above:

SYS "OS_File",1,"filename",&8000,&8001,,19
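
The relation between the two addresses in that call is just a single bit. As a host-side illustration (Python, not RISC OS code), the sketch below derives the execution address from the load address and splits it again, assuming bit 0 acts purely as a mode flag in the same way as for ARM interworking branches. Both function names are made up.

```python
LOAD_ADDRESS = 0x8000     # general start address for executables (from the text)


def thumb_exec_address(load_address=LOAD_ADDRESS):
    """Execution address that requests THUMB state: load address with bit 0 set."""
    return load_address | 1


def entry_state(exec_address):
    """Split an execution address into (instruction set, actual start address)."""
    mode = "THUMB" if exec_address & 1 else "ARM"
    return mode, exec_address & ~1


print(hex(thumb_exec_address()))
print(entry_state(0x8001), entry_state(0x8000))
```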

Regarding THUMB mode on RISC OS there's one small thing to address. A very ancient module has to be removed from the OS, otherwise your code crashes. To this day that bug has not been fixed. The module's name is "SpecialFX"; remove it with "rmkill SpecialFX" on the command line or from a batch file, as shown in the intros linked above.

To exit your intro and go back to the desktop you simply use the SWI OS_Exit shown above. If you didn't change the mode you have to use e.g. the SWI "OS_NewLine" to re-trigger the desktop redraw. Of course all of this can be omitted if your tiny intro compo rules allow it...