MUZAK58
MUZAK58 was created by wiRe/Napalm and is 58 bytes in size. Watch the video here. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page will describe how this player works and how it can be adopted for other releases. Feel free to reuse those ideas and techniques in your own sizecoding productions, but please give a credit to wiRe then. Any commercial use is not permitted.
Before you continue, make sure to read Bytebeat and Output#Producing_sound for the basics.
Contents
COVOX LPT-DAC
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data-out-lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion into an analog signal, why it was very cheap to build such hardware device on your own those days. Playing back an 8-bit sample, like it is the output of a bytebeat algorithm, through COVOX LPT DAC is a very easy task. Assuming the next sample value is inside register AL, then this is all you must do:
mov dx, 0378h ;BA7803 ;load LPT1 port address into DX
out dx, al ;EE ;send 8-bit sample data to COVOX device
HINT: It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.
HINT: It is also possible to play a bytebeat through the PC speaker at lower quality, as described here: Output#PC_Speaker_variant
But for a good audio playback quality, the time between two LPT1 writes should match the sampling rate pretty well. Also the bytebeat algorithm will require a time counter as input, which reflects the current sample number. This is why we need a good time source.
Time Source
For playing data through the COVOX LPTDAC, we need a pretty accurate timer. Typical sampling rate would be 8 kHz, but also higher values could be used. Lower values may also work in some special cases, but very Lo-Fi then. There are multiple possible options to get such a time counter:
- Timer Interrupt
- Poll BIOS Counter
- Poll PIT Counter
- Alternative Options
Timer Interrupt
As described here: Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt. While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (let's even ignore the backup and restore of the old handler), the overhead of the handler itself and the problem of exchanging any data between handler and non-interrupt code. All this will cost bytes. In most cases it will require less size if the timer is polled instead, like in all other variants described next. But it must be also clear that the polling approach will make it necessary to perform this task at a higher frequency than the actual samplingrate, i.e. 8kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update, which might eat up quite some performance.
Poll BIOS Counter
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically INT8 will run at a frequency of 18.2 Hz. On each call the default interrupt handler will increase this value by 1. Reusing this default handler will prevent the costs for setting up an own handler just to implement the counter increment logic. So, a simple solution to get a timer counter is to reconfigure the PIT for an 8kHz rate. This will trigger the default INT8 handler 8000 times per second. Then this counter can be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, also using this counter value as sample counter, and sent to LPT1. This could look like this:
mov al, 149 ;B095 ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)
out 40h, al ;E640 ;
salc ;D6 ; AL := 0 (if CF=0)
out 40h, al ;E640 ;
; ...
suplp: mov al, [046Ch] ;A0xxxx ;load LSB from BIOS timer, assuming DS=0
_tcmp: cmp al, 0x55 ;2C?? ;did timer value changed?
jz ntick ;74xx ; no -> skip audio
mov [_tcmp+1], al ;A2xxxx ;remember last BIOS timer value (selfmodifying code)
inc ebp ;6645 ;increment 32-bit timer counter
; ... set AL to next audio sample based on EBP ...
mov dx, 0378h ;BA7803 ;load LPT1 port address into DX
out dx, al ;EE ;send 8-bit sample data to COVOX device
ntick:
; ...
jmp short suplp
HINT: Instead incrementing your own 16- or 32-bit timer counter (EBP inside the above example) someone could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever fits better.
Poll PIT Counter
(...more soon...)
This solution may result in the shortest code. But one drawback is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.
Alternative Options
Another option to get an accurate time is to read the processor's time-stamp counter using the RDTSC instruction.
The Bytebeat
Bytebeat is simply spoken an 8-bit uncompressed audiowave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function f(t), where t is the time represented by the number of the sample, which is also equal to the byte offset of the stream. It will start generation of the first sample for t=0 and, in case of an 8kHz samplingrate, will play the sample f(8000) after exactly 1 second. Since this is actually a Softsynth (music synthesis done by software), in theory any sound or music can be aproximated this way. There are no limits except the increasing complexity of the resulting function.
In general, any bytebeat algorithm can be implemented now to present the next sample to be written to the COVOX LPT1. But in respect to the size, a bytebeat algorithm is better suited if it's formula is as simple as possible, implementation-wise. MUZAK58 is to some degree a generic or reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the source of the played music comes from sequence tables stored in memory. Changing those words results in an entirely new music tune being played. Also spending more words for this table will vary the tune, that it will not repeat as fast. The sqeuence table of this reference example is 10 bytes in total and looks like this:
seqtbl: dw 0x1413
dw 0x6C66
dw 0x2242
dw 0x6495
dw 0x4484
The method used to achieve a size-optimized, but still flexible bytebeat is described next.
Music Sequencer
As you can read in many Bytebeat tutorials, like Steady_On_Tim by Gasman or inside the paper published by viznut, the basic idea to generate a melody with a bytebeat is to modify some basic waveform oscillator function o(t), like saw-tooth, square, triangle or sine waveforms, by multiplying the time parameter t by a scale factor p: f(t) = o(t*p). This multiplication factor will modulate the pitch. If we then use a sequence table s(t) to replace p, that will change the pitch of our oscillator over time, then we can already play some simple melody using this formula: 'f(t) = o(t*s(t))'.
Accordingly, we implement a single pitch-modulated oscillator with saw-tooth waveform:
(t*[1,2,4,8,16,8,4,2][(t>>11)%8])&255
(listen to this bytebeat here)
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as this should be defined by a sequence table. This example demonstrates a sequence of 8 steps, with S=8 specifying the number of steps. Each step will change the pitch of the resulting saw-tooth waveform.
Replacing the trailing "&255" (implicit for a bytebeat) by "&128" would change the saw-tooth waveform into a square-wave function:
(t*[1,2,4,8,16,8,4,2][(t>>11)%8])&128
(listen to this bytebeat here)
Also other waveforms are possible here. Here we use the sine function:
sin(t*[1,2,4,8,16,8,4,2][(t>>11)%8]/14)*127+127
(listen to this bytebeat here)
Or distortion-like effects can be applied, like demonstrated here by using the XOR operator in the last step:
(t*[1,2,4,8,16,8,4,2][(t>>11)%8])^64
(listen to this bytebeat here)
HINT: Instead of using only one modulated waveform or one sequence, also 2 or more can be used and combined, like i.e.: f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2
So far, this are well known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with one sequence table that holds as many steps S as we need for our tune, or at least as many as we can effort due to size constraints. The more steps S we spent, the longer the tune will durate before it repeats. The larger the value of each sequence step could be, with a value range limited by log2(M) bits per step, the larger the range of notes we can use in the end. Both parameters S and M will define the final byte-size of our sequence table.
Cascaded Sequences
The issue we will face with this approach in size-coding is, that such a sequence table will quickly grow and consumes quite a lot of bytes in the end. Our reference example here, MUZAK58, spends 10 bytes for it's entire tune. If we take our knowledge at this point, then we would be able to use those 10 bytes to divide them into a sequence of 40 steps (S = 40), as long as the limited range per step given by 4 bits (M = 2^4 = 16) is enough for the music composition we have in mind. 40 steps is not less, but the heavily limited range of less than 1 octave will limit us to something that we would very likely call a children's song in the end. The reference tune instead sounds like being build out of at least a multiple of 32 steps, before it starts to repeat. And the octave range also does not appear to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table in this way?
The trick discovered by wiRe here is to cascade multiple sequencers and combine all their output into a single sequence of much longer sequence duration (before repetition) and of wider pitch range per sequence step: s(t) = (s0(t) * s1(t) * s2(t) * ...) / C
But this will limit the freedom of the composer, you could think now. True! But you will see that the results you can achieve this way are not that bad as you may expect first. Indeed, the resulting limitation can even turn out to give new impulses to creativeness; something we already know as the sizecoding effect.
Here is some attempt to visualize how such an cascaded sequence will develop over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequencer step count. O is the time divider to derive the step count stepcnt = t / O with O = log2(ticks_per_step) to avoid any integer division.
+--------+--------------+---------------------------------------------+
| stpcnt | (t>>O) | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |
+--------+--------------+---------------------------------------------+
| seq0ix | (t>>(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |
| seq1ix | (t>>(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |
| seq2ix | (t>>(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |
| seq3ix | (t>>(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |
| seq4ix | (t>>(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |
+--------+--------------+---------------------------------------------+
In combination with our oscillator function, the entire bytebeat will finally look like this: f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )
Final Bytebeat Implementation
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration to achieve the smallest possible size. If we decide to use a simple saw-tooth oscillator, we have an easy game with our oscillator function beeing as simple as o(t) = t. As we figured out, the function f(t) is then only comprised of N+1 terms, all multiplied together like this: f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C. On each loop iteration of the final bytebeat player, the current sequencer sN(t) is evaluated by looking up dtermining it's unique index and looking that up inside it's sequencer table. The value stored there for this step is then multiplied to the total result of f(t). If we keep M low, then even a 16-bit multiplication is enough here. The final scale factor C depends on the range of the values derived from the sequencer functions sN(t). Scaling happens as a shift-right-operation in the last step. And with some tweaking of the sequencer step values can even be forced to result in a shift by 8.
In the reference implementation, a total of 5 cascaded sequencers was used: N=5. Each sequencer's table was chosen to store 4 sequence steps: S=4. Which sequencer step to index is then based on 2-bits of parameter t. The fastest sequencer step time was chosen to be 2^10 timer ticks or samples, giving us O=10. This means the lookup index for each sequencer i with 0 <= i < N is derived by (t>>(O+i))%S, where O as start offset can be chosen different for each music tune. Each step value is limited by M=16. Putting this all together, we can now start composing one music tune this way:
static constexpr unsigned O = 10;
static constexpr unsigned N = 5;
static constexpr unsigned S = 4;
static constexpr unsigned A = 256;
static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };
uint8_t get_next_sample( uint16_t t ) {
for( unsigned i = 0; i < N; i++ ) t *= seqtbl[i][(t>>(O+i))%S];
return static_cast<uint8_t>(t / A);
}
(listen to this bytebeat here)
The Sourcecode
With all these parameters being chosen carefully, the final bytebeat implementation and sequence tables will become very small. Here is the documented source code of MUZAK58:
;-----------------------------------
; MUZAK58 by wiRe/NpM
;-----------------------------------
section .text
org 100h
;--------------------------------- ;---------- ;muzak sequence table
seqtbl: dw 0x1413 ;1314 ; t * [3,1,4,1][3&t>>10] ;! 1314 adc dx,[si]
dw 0x6C66 ;666C ; * [6,6,12,6][3&t>>11] ;! 666C o32 insb
dw 0x2242 ;4222 ; * [2,4,2,2][3&t>>12] ;! 42 inc dx
dw 0x6495 ;9564 ; * [5,9,4,6][3&t>>13] ;! 22956484 and dl,[di-0x7b9c]
dw 0x4484 ;8444 ; * [4,8,4,4][3&t>>14] >> 8 ;! 44 inc sp
mov al, 0b00010000 ;B010 ;write 8253/8254 PIT command/mode register: resets PIT channel #0
out 43h, al ;E643 ; [7:6] channel #0, [5:4] LSB only, [3:1] mode0 (one-shot), [0] 16-bit binary
;--------------------------------- ;---------- ;present next audio sample (DX:BX = 32-bit sample counter)
bbeat: add al, 149 ;04xx ; calculate new timer period (AL = 42..148)
out 40h, al ;E640 ; rearm timer
inc bx ;43 ; increment 16-bit timer counter
pusha ;60 ; store all registers
;mov si, seqtbl ;BExxxx ; load address of sequence table into SI (here SI already points to seqtbl by default)
mov dx, bx ;89DA ; load start value into DX
mov cl, 5 ;B1xx ; init index counter inside CX (CH must be zero already!)
bbeat_lp: push cx ;51 ; store CX counter
mov cl, 01100b ;B1xx ; get bit sequence from time into CL
and cl, bh ;20F9 ; CL := offset to 1 out of 4 entries
lodsw ;AD ; load next sequence table entry (AX := DS:[SI]; SI := SI + 2)
ror ax, cl ;D3C8 ; select sequence entry at bit-offset 0, 4, 8 or 12
and ax, 01111b ;83E00F ; each sequence entry is 4 bits only (AX &= 15)
mul dx ;F7E2 ; multiply (DX:AX := AX ∗ DX)
xchg ax, dx ;92 ; DX := updated 16-bit sample
pop cx ;59 ; restore CX counter
shr bx, 1 ;D1ED ; get next bit sequence from time
loop bbeat_lp ;E2xx ; loop until all bits are out
mov al, dh ;88F0 ; get sample data into AL
mov dx, 0378h ;BA7803 ; load LPT1 port address into DX
out dx, al ;EE ; send 8-bit sample data to COVOX device
popa ;61 ; restore all registers (especially BX, CX, DX, SI)
suplp: ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)
in al, 40h ;E440 ; read low-byte
cmp al, 148 ;3Cxx ; did timer counter overflowed to 149..0FFh?
jo bbeat ;71xx ; yes -> play
bbeat_sk: jmp short suplp ;75xx ; loop forever