Difference between revisions of "MUZAK58"
(→Music Sequence Table) |
m |
||
(9 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | [https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It | + | [https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by [https://www.pouet.net/user.php?who=106446 wiRe/Napalm] and is 58 bytes in size. You can watch the video [https://www.youtube.com/watch?v=AX5OYQzzi1g here]. It won 4th place at the Lovebyte 2024 demoscene party and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by it's inventor. This page describes how this player works and how it can be adopted for other releases. Feel free to use these ideas and techniques in your own sizecoding productions, but please give a credit to wiRe then. Commercial use is not allowed. |
+ | |||
+ | Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics. | ||
= COVOX LPT-DAC = | = COVOX LPT-DAC = | ||
− | The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data | + | The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through COVOX LPT DAC is a very simple task. Assuming the next sample value is in register AL, then this is all you need to do: |
<syntaxhighlight lang="nasm"> | <syntaxhighlight lang="nasm"> | ||
− | + | mov dx, 0378h ;BA7803 ;load LPT1 port address into DX | |
− | + | out dx, al ;EE ;send 8-bit sample data to COVOX device | |
</syntaxhighlight> | </syntaxhighlight> | ||
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel. | '''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel. | ||
− | But for a good audio playback quality, the time between two LPT1 writes should match the sampling rate | + | '''HINT:''' It is also possible to play a bytebeat over the PC speaker in lower quality, as described here: [[Output#PC_Speaker_variant]] |
+ | |||
+ | But for a good audio playback quality, the time between two LPT1 writes should match the sampling rate quite well. Also, the bytebeat algorithm needs a time counter as input that reflects the current sample number. Therefore we need a good time source. | ||
= Time Source = | = Time Source = | ||
− | + | To play data through the COVOX LPTDAC, we need a fairly accurate timer. A typical sample rate would be 8 kHz, but higher values can also be used. Lower values may also work in some special cases, but then very lo-fi. There are several ways to get such a timer: | |
* Timer Interrupt | * Timer Interrupt | ||
− | * Poll BIOS Counter | + | * Poll BIOS Counter |
* Poll PIT Counter | * Poll PIT Counter | ||
* Alternative Options | * Alternative Options | ||
Line 21: | Line 25: | ||
== Timer Interrupt == | == Timer Interrupt == | ||
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]]. | As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]]. | ||
− | While this is the most accurate way to drive the COVOX, it is | + | While this is the most accurate way to drive the COVOX, it is probably also the most expensive. Setting up the new interrupt handler (ignoring even the backup and restore of the old handler), the overhead of the handler itself and the problem of exchanging any data between the handler and the non-interrupt code. All of this will cost bytes. In most cases, it will take less size to poll the timer instead, as in all the other variants described next. But it must be also be clear that the polling approach makes it necessary to perform this task at a higher frequency than the actual sampling rate, i.e. 8kHz. This requires the polling to be done in an inner loop, e.g. after every pixel update, which can eat up quite a bit of performance. |
== Poll BIOS Counter == | == Poll BIOS Counter == | ||
− | The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically INT8 | + | The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically, INT8 runs at a frequency of 18.2 Hz. On each call the default interrupt handler increments this value by 1. Reusing this default handler avoids the cost of writing a custom handler just to implement the counter incrementing logic. So, a simple solution to get a timer counter is to reconfigure the PIT for an 8kHz rate. This will trigger the default INT8 handler 8000 times per second. Then this counter can be polled periodically inside the inner loop. As soon as its LSB changes, another sample must be generated, also using this counter value as sample counter, and sent to LPT1. This could look like this: |
<syntaxhighlight lang="nasm"> | <syntaxhighlight lang="nasm"> | ||
− | + | mov al, 149 ;B095 ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz) | |
− | + | out 40h, al ;E640 ; | |
− | + | salc ;D6 ; AL := 0 (if CF=0) | |
− | + | out 40h, al ;E640 ; | |
− | + | ; ... | |
− | suplp: mov al, [046Ch] ;A0xxxx ;load LSB from BIOS timer, assuming DS=0 | + | suplp: mov al, [046Ch] ;A0xxxx ;load LSB from BIOS timer, assuming DS=0 |
− | _tcmp: cmp al, 0x55 ;2C?? ;did timer value changed? | + | _tcmp: cmp al, 0x55 ;2C?? ;did timer value changed? |
− | + | jz ntick ;74xx ; no -> skip audio | |
− | + | mov [_tcmp+1], al ;A2xxxx ;remember last BIOS timer value (selfmodifying code) | |
− | + | inc ebp ;6645 ;increment 32-bit timer counter | |
− | + | ; ... set AL to next audio sample based on EBP ... | |
− | + | mov dx, 0378h ;BA7803 ;load LPT1 port address into DX | |
− | + | out dx, al ;EE ;send 8-bit sample data to COVOX device | |
− | ntick: | + | ntick: |
− | + | ; ... | |
− | + | jmp short suplp | |
</syntaxhighlight> | </syntaxhighlight> | ||
− | '''HINT:''' Instead incrementing your own 16- or 32-bit timer counter (EBP inside the above example) someone could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever | + | '''HINT:''' Instead incrementing your own 16- or 32-bit timer counter (EBP inside the above example) someone could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever suits you better. |
== Poll PIT Counter == | == Poll PIT Counter == | ||
+ | |||
(...more soon...) | (...more soon...) | ||
+ | |||
+ | [http://wiki.osdev.org/Programmable_Interval_Timer Programmable Interval Timer] | ||
+ | |||
+ | |||
+ | This solution may result in the shortest code. A disadvantage is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge. | ||
== Alternative Options == | == Alternative Options == | ||
− | Another | + | Another way to get an accurate time is to read the processor's timestamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction. |
= The Bytebeat = | = The Bytebeat = | ||
− | [[Bytebeat]] is simply | + | [[Bytebeat]] is simply an 8-bit uncompressed audio wave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the time represented by the number of the sample, which is also equal to the byte offset within the stream. It will start generating the first sample for ''t=0'' and will play the sample ''f(8000)'' after exactly 1 second if the sampling rate is 8kHz. Since this is actually a softsynth (music synthesis done by software), in theory any sound or music can be approximated in this way. There are no limits except the increasing complexity of the resulting function. |
− | In general, any bytebeat algorithm can be implemented | + | In general, any bytebeat algorithm can be implemented to generate the next sample to be written to the COVOX LPT1. But in terms of size, a bytebeat algorithm is better suited if it's formula can be implemented in as few bytes as possible. [https://www.pouet.net/prod.php?which=96071 MUZAK58] is to a certain extent a generic or reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the source of the music played comes from sequence tables stored in memory. Changing these words will result in completely new music being played. If you use more words for this table, the song becomes more complex so that it does not repeat itself so quickly. The sequence table of this reference example is 10 bytes long and looks like this: |
+ | <syntaxhighlight lang="nasm"> | ||
+ | seqtbl: dw 0x1413 | ||
+ | dw 0x6C66 | ||
+ | dw 0x2242 | ||
+ | dw 0x6495 | ||
+ | dw 0x4484 | ||
+ | </syntaxhighlight> | ||
− | + | The method used to achieve a size-optimized, yet flexible bytebeat is described next. | |
− | |||
− | + | == Music Sequencer == | |
+ | As you can read in many bytebeat tutorials, like [[Steady_On_Tim]] by Gasman or in the paper published by Viznut, the basic idea to generate a melody with a bytebeat is to modify some basic waveform oscillator function ''o(t)'', like sawtooth, square, triangle or sine waveforms, by multiplying the time parameter ''t'' by a scale factor ''p'': ''f(t) = o(t*p)''. This multiplication factor modulates the pitch. If we then use a sequence table ''s(t)'' to replace ''p'', which will change the pitch of our oscillator over time, we can already play a simple melody using this formula: ''f(t) = o(t*s(t))''. | ||
− | + | Accordingly, we implement a single pitch-modulated oscillator with sawtooth waveform: | |
+ | <syntaxhighlight lang="javascript"> | ||
+ | (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&255 | ||
+ | </syntaxhighlight> | ||
+ | ([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9473119236434ffff34df800 listen to this bytebeat here]) | ||
− | + | To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as it is defined by a sequence table. This example demonstrates a sequence of 8 steps, where ''S=8'' specifies the number of steps. Each step changes the pitch of the resulting sawtooth waveform. | |
− | + | Replacing the trailing "&255" (implicit for a bytebeat) by "&128" would change the sawtooth waveform to a square wave function: | |
+ | <syntaxhighlight lang="javascript"> | ||
+ | (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&128 | ||
+ | </syntaxhighlight> | ||
+ | ([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9dc1d9b391be7fffcfb76000 listen to this bytebeat here]) | ||
− | + | Other waveforms are also possible. Here we use the sine function: | |
+ | <syntaxhighlight lang="javascript"> | ||
+ | sin(t*[1,2,4,8,16,8,4,2][(t>>11)%8]/14)*127+127 | ||
+ | </syntaxhighlight> | ||
+ | ([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d000001002f0000000000000000399a4a1a8bae05d329e28520c901366398da262860ce3ea49cc63383ad4015395d56ced153c2b5712a75c831dca7c583fffcb53000 listen to this bytebeat here]) | ||
+ | Or distortion-like effects can be applied, as shown here using the XOR operator in the last step: | ||
+ | <syntaxhighlight lang="javascript"> | ||
+ | (t*[1,2,4,8,16,8,4,2][(t>>11)%8])^64 | ||
+ | </syntaxhighlight> | ||
+ | ([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d00000100240000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024f067719de4f113fffded5400 listen to this bytebeat here]) | ||
+ | |||
+ | '''HINT:''' Instead of using only one modulated oscillator or one sequence, also 2 or more can be used and combined, e.g.: ''f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2'' | ||
+ | |||
+ | So far, these are well known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with a sequence table containing as many steps ''S'' as we need for our composition, or at least as many as we can handle due to size constraints. The more steps ''S'' we use, the longer the song will last before it repeats. The larger the value of each sequence step can be, with a value range limited by ''log2(M)'' bits per step, the larger the range of notes we can end up using. Both parameters ''S'' and ''M'' will define the final byte size of our sequence table. | ||
+ | |||
+ | == Cascaded Sequences == | ||
+ | |||
+ | The problem we will face with this approach in sizecoding products is, that such a sequence table will grow quickly and end up consuming quite a few bytes. Our reference example [https://www.pouet.net/prod.php?which=96071 MUZAK58] uses 10 bytes for all it's song data. Using our knowledge at this point, we would be able to divide these 10 bytes into a sequence of 40 steps (''S = 40''), as long as the limited range per step given by 4 bits (''M = 2^4 = 16'') is sufficient for the music composition we have in mind. 40 steps is not less, but the severely limited range of less than 1 octave will limit us to what we would most likely end up calling a children's song. Instead, the reference muzak sounds like it is made up of at least a multiple of 32 steps before it starts to repeat. And the octave range does not seem to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table like this? | ||
+ | |||
+ | The trick wiRe discovered here is to cascade multiple sequencers and combine all their outputs into a single sequence with a much longer sequence duration (before repetition) and a wider pitch range per sequence step: ''s(t) = (s0(t) * s1(t) * s2(t) * ...) / C'' | ||
+ | |||
+ | But this limits the composer's freedom, you might think. This is true! But you will see that the results you get are not as bad as you might think at first, in fact the resulting limitation can even give new impulses to creativity; something we already know as the sizecoding effect. | ||
+ | |||
+ | Here is an attempt to visualize how such an cascaded sequence evolves over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequence step counter. ''O'' is the time divider to derive the step count ''stepcnt = t / O'' with ''O = log2(ticks_per_step)'' to avoid any integer division. | ||
+ | <syntaxhighlight lang="text"> | ||
+ | +--------+--------------+---------------------------------------------+ | ||
+ | | stpcnt | (t>>O) | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... | | ||
+ | +--------+--------------+---------------------------------------------+ | ||
+ | | seq0ix | (t>>(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... | | ||
+ | | seq1ix | (t>>(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... | | ||
+ | | seq2ix | (t>>(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... | | ||
+ | | seq3ix | (t>>(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... | | ||
+ | | seq4ix | (t>>(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... | | ||
+ | +--------+--------------+---------------------------------------------+ | ||
+ | with S=4 | ||
+ | </syntaxhighlight> | ||
+ | |||
+ | In combination with our oscillator function, the whole bytebeat will look like this: ''f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )'' | ||
+ | |||
+ | == Final Bytebeat Implementation == | ||
+ | |||
+ | The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration to achieve the smallest possible size. If we decide to use a simple sawtooth oscillator, we have an easy game with our oscillator function being as simple as ''o(t) = t''. As we found out, the function ''f(t)'' is then only comprised of ''N+1'' terms, all multiplied together like this: ''f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C''. On each loop iteration of the final bytebeat player, the current sequencer ''sN(t)'' is evaluated by calculating the current sequencer index and looking it up in the sequencer table. The value stored there for this step is then multiplied towards the total result of ''f(t)''. If we keep ''M'' low, then even a 16-bit multiplication is sufficient. The final scaling factor ''C'' depends on the range of the values derived from the sequencer functions ''sN(t)''. Scaling is done as a shift-right operation in the last step. And with some tweaking of the sequencer step values can even be forced to result in a shift-right by 8. | ||
+ | |||
+ | The reference implementation uses a total of 5 cascaded sequencers: ''N=5''. The table of each sequencer was chosen to store 4 sequence steps: ''S=4''. Which sequencer step to index is then based on 2 bits of the parameter ''t''. The shortest sequencer step time for this song was chosen to be ''2^10'' samples, which gives us ''O=10''. This means that the lookup index for each sequencer ''i'' with ''0 <= i < N'' is derived by ''(t>>(O+i))%S''. Each step value is limited by ''M=16''. Putting all this together, we can now start composing a song in this way: | ||
<syntaxhighlight lang="cpp"> | <syntaxhighlight lang="cpp"> | ||
static constexpr unsigned O = 10; | static constexpr unsigned O = 10; | ||
static constexpr unsigned N = 5; | static constexpr unsigned N = 5; | ||
static constexpr unsigned S = 4; | static constexpr unsigned S = 4; | ||
− | static constexpr unsigned | + | static constexpr unsigned C = 256; |
static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} }; | static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} }; | ||
− | uint8_t | + | uint8_t get_next_sample( uint16_t t ) { |
− | for( unsigned i = 0; i < N; i++ ) t *= seqtbl[i][(t>>(i | + | for( unsigned i = 0; i < N; i++ ) t *= seqtbl[i][(t>>(O+i))%S]; |
− | return static_cast<uint8_t>(t / | + | return static_cast<uint8_t>(t / C); |
} | } | ||
</syntaxhighlight> | </syntaxhighlight> | ||
+ | Or the same thing written in Javascript: | ||
+ | <syntaxhighlight lang="javascript"> | ||
+ | t | ||
+ | * [3,1,4,1][3&t>>10] | ||
+ | * [6,6,12,6][3&t>>11] | ||
+ | * [2,4,2,2][3&t>>12] | ||
+ | * [5,9,4,6][3&t>>13] | ||
+ | * [4,8,4,4][3&t>>14] >> 8 | ||
+ | </syntaxhighlight> | ||
+ | ([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&v=circles&bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 listen to this bytebeat here]) | ||
+ | |||
+ | = The Sourcecode = | ||
+ | |||
+ | With all these parameters carefully chosen, the final bytebeat implementation and sequence tables will be very small. Here is the commented source code of [https://www.pouet.net/prod.php?which=96071 MUZAK58]: | ||
+ | <syntaxhighlight lang="nasm"> | ||
+ | ;----------------------------------- | ||
+ | ; MUZAK58 by wiRe/NpM | ||
+ | ;----------------------------------- | ||
+ | section .text | ||
+ | org 100h | ||
+ | |||
+ | ;--------------------------------- ;---------- ;muzak sequence table | ||
+ | seqtbl: dw 0x1413 ;1314 ; t * [3,1,4,1][3&t>>10] ;! 1314 adc dx,[si] | ||
+ | dw 0x6C66 ;666C ; * [6,6,12,6][3&t>>11] ;! 666C o32 insb | ||
+ | dw 0x2242 ;4222 ; * [2,4,2,2][3&t>>12] ;! 42 inc dx | ||
+ | dw 0x6495 ;9564 ; * [5,9,4,6][3&t>>13] ;! 22956484 and dl,[di-0x7b9c] | ||
+ | dw 0x4484 ;8444 ; * [4,8,4,4][3&t>>14] >> 8 ;! 44 inc sp | ||
− | + | mov al, 0b00010000 ;B010 ;write 8253/8254 PIT command/mode register: resets PIT channel #0 | |
+ | out 43h, al ;E643 ; [7:6] channel #0, [5:4] LSB only, [3:1] mode0 (one-shot), [0] 16-bit binary | ||
− | (... | + | ;--------------------------------- ;---------- ;present next audio sample (DX:BX = 32-bit sample counter) |
+ | bbeat: add al, 149 ;04xx ; calculate new timer period (AL = 42..148) | ||
+ | out 40h, al ;E640 ; rearm timer | ||
+ | |||
+ | inc bx ;43 ; increment 16-bit timer counter | ||
+ | |||
+ | pusha ;60 ; store all registers | ||
+ | ;mov si, seqtbl ;BExxxx ; load address of sequence table into SI (here SI already points to seqtbl by default) | ||
+ | mov dx, bx ;89DA ; load start value into DX | ||
+ | mov cl, 5 ;B1xx ; init index counter inside CX (CH must be zero already!) | ||
+ | bbeat_lp: push cx ;51 ; store CX counter | ||
+ | mov cl, 01100b ;B1xx ; get bit sequence from time into CL | ||
+ | and cl, bh ;20F9 ; CL := offset to 1 out of 4 entries | ||
+ | lodsw ;AD ; load next sequence table entry (AX := DS:[SI]; SI := SI + 2) | ||
+ | ror ax, cl ;D3C8 ; select sequence entry at bit-offset 0, 4, 8 or 12 | ||
+ | and ax, 01111b ;83E00F ; each sequence entry is 4 bits only (AX &= 15) | ||
+ | mul dx ;F7E2 ; multiply (DX:AX := AX ∗ DX) | ||
+ | xchg ax, dx ;92 ; DX := updated 16-bit sample | ||
+ | pop cx ;59 ; restore CX counter | ||
+ | shr bx, 1 ;D1ED ; get next bit sequence from time | ||
+ | loop bbeat_lp ;E2xx ; loop until all bits are out | ||
+ | |||
+ | mov al, dh ;88F0 ; get sample data into AL | ||
+ | mov dx, 0378h ;BA7803 ; load LPT1 port address into DX | ||
+ | out dx, al ;EE ; send 8-bit sample data to COVOX device | ||
+ | popa ;61 ; restore all registers (especially BX, CX, DX, SI) | ||
+ | |||
+ | main_lp: ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000) | ||
+ | in al, 40h ;E440 ; read low-byte | ||
+ | cmp al, 148 ;3Cxx ; did timer counter overflowed to 149..0FFh? | ||
+ | jo short bbeat ;71xx ; yes -> play | ||
+ | |||
+ | ;... the place for your intro | ||
+ | |||
+ | jmp short main_lp ;75xx ; loop forever | ||
+ | </syntaxhighlight> |
Latest revision as of 02:06, 12 June 2024
MUZAK58 was created by wiRe/Napalm and is 58 bytes in size. You can watch the video here. It won 4th place at the Lovebyte 2024 demoscene party and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by it's inventor. This page describes how this player works and how it can be adopted for other releases. Feel free to use these ideas and techniques in your own sizecoding productions, but please give a credit to wiRe then. Commercial use is not allowed.
Before you continue, make sure to read Bytebeat and Output#Producing_sound for the basics.
Contents
COVOX LPT-DAC
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through COVOX LPT DAC is a very simple task. Assuming the next sample value is in register AL, then this is all you need to do:
mov dx, 0378h ;BA7803 ;load LPT1 port address into DX
out dx, al ;EE ;send 8-bit sample data to COVOX device
HINT: It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.
HINT: It is also possible to play a bytebeat over the PC speaker in lower quality, as described here: Output#PC_Speaker_variant
But for a good audio playback quality, the time between two LPT1 writes should match the sampling rate quite well. Also, the bytebeat algorithm needs a time counter as input that reflects the current sample number. Therefore we need a good time source.
Time Source
To play data through the COVOX LPTDAC, we need a fairly accurate timer. A typical sample rate would be 8 kHz, but higher values can also be used. Lower values may also work in some special cases, but then very lo-fi. There are several ways to get such a timer:
- Timer Interrupt
- Poll BIOS Counter
- Poll PIT Counter
- Alternative Options
Timer Interrupt
As described here: Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt. While this is the most accurate way to drive the COVOX, it is probably also the most expensive. Setting up the new interrupt handler (ignoring even the backup and restore of the old handler), the overhead of the handler itself and the problem of exchanging any data between the handler and the non-interrupt code. All of this will cost bytes. In most cases, it will take less size to poll the timer instead, as in all the other variants described next. But it must be also be clear that the polling approach makes it necessary to perform this task at a higher frequency than the actual sampling rate, i.e. 8kHz. This requires the polling to be done in an inner loop, e.g. after every pixel update, which can eat up quite a bit of performance.
Poll BIOS Counter
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically, INT8 runs at a frequency of 18.2 Hz. On each call the default interrupt handler increments this value by 1. Reusing this default handler avoids the cost of writing a custom handler just to implement the counter incrementing logic. So, a simple solution to get a timer counter is to reconfigure the PIT for an 8kHz rate. This will trigger the default INT8 handler 8000 times per second. Then this counter can be polled periodically inside the inner loop. As soon as its LSB changes, another sample must be generated, also using this counter value as sample counter, and sent to LPT1. This could look like this:
mov al, 149 ;B095 ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)
out 40h, al ;E640 ;
salc ;D6 ; AL := 0 (if CF=0)
out 40h, al ;E640 ;
; ...
suplp: mov al, [046Ch] ;A0xxxx ;load LSB from BIOS timer, assuming DS=0
_tcmp: cmp al, 0x55 ;2C?? ;did timer value changed?
jz ntick ;74xx ; no -> skip audio
mov [_tcmp+1], al ;A2xxxx ;remember last BIOS timer value (selfmodifying code)
inc ebp ;6645 ;increment 32-bit timer counter
; ... set AL to next audio sample based on EBP ...
mov dx, 0378h ;BA7803 ;load LPT1 port address into DX
out dx, al ;EE ;send 8-bit sample data to COVOX device
ntick:
; ...
jmp short suplp
HINT: Instead incrementing your own 16- or 32-bit timer counter (EBP inside the above example) someone could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever suits you better.
Poll PIT Counter
(...more soon...)
This solution may result in the shortest code. A disadvantage is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.
Alternative Options
Another way to get an accurate time is to read the processor's timestamp counter using the RDTSC instruction.
The Bytebeat
Bytebeat is simply an 8-bit uncompressed audio wave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function f(t), where t is the time represented by the number of the sample, which is also equal to the byte offset within the stream. It will start generating the first sample for t=0 and will play the sample f(8000) after exactly 1 second if the sampling rate is 8kHz. Since this is actually a softsynth (music synthesis done by software), in theory any sound or music can be approximated in this way. There are no limits except the increasing complexity of the resulting function.
In general, any bytebeat algorithm can be implemented to generate the next sample to be written to the COVOX LPT1. But in terms of size, a bytebeat algorithm is better suited if it's formula can be implemented in as few bytes as possible. MUZAK58 is to a certain extent a generic or reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the source of the music played comes from sequence tables stored in memory. Changing these words will result in completely new music being played. If you use more words for this table, the song becomes more complex so that it does not repeat itself so quickly. The sequence table of this reference example is 10 bytes long and looks like this:
seqtbl: dw 0x1413
dw 0x6C66
dw 0x2242
dw 0x6495
dw 0x4484
The method used to achieve a size-optimized, yet flexible bytebeat is described next.
Music Sequencer
As you can read in many bytebeat tutorials, like Steady_On_Tim by Gasman or in the paper published by Viznut, the basic idea to generate a melody with a bytebeat is to modify some basic waveform oscillator function o(t), like sawtooth, square, triangle or sine waveforms, by multiplying the time parameter t by a scale factor p: f(t) = o(t*p). This multiplication factor modulates the pitch. If we then use a sequence table s(t) to replace p, which will change the pitch of our oscillator over time, we can already play a simple melody using this formula: f(t) = o(t*s(t)).
Accordingly, we implement a single pitch-modulated oscillator with sawtooth waveform:
(t*[1,2,4,8,16,8,4,2][(t>>11)%8])&255
(listen to this bytebeat here)
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as it is defined by a sequence table. This example demonstrates a sequence of 8 steps, where S=8 specifies the number of steps. Each step changes the pitch of the resulting sawtooth waveform.
Replacing the trailing "&255" (implicit for a bytebeat) by "&128" would change the sawtooth waveform to a square wave function:
(t*[1,2,4,8,16,8,4,2][(t>>11)%8])&128
(listen to this bytebeat here)
Other waveforms are also possible. Here we use the sine function:
sin(t*[1,2,4,8,16,8,4,2][(t>>11)%8]/14)*127+127
(listen to this bytebeat here)
Or distortion-like effects can be applied, as shown here using the XOR operator in the last step:
(t*[1,2,4,8,16,8,4,2][(t>>11)%8])^64
(listen to this bytebeat here)
HINT: Instead of using only one modulated oscillator or one sequence, also 2 or more can be used and combined, e.g.: f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2
So far, these are well known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with a sequence table containing as many steps S as we need for our composition, or at least as many as we can handle due to size constraints. The more steps S we use, the longer the song will last before it repeats. The larger the value of each sequence step can be, with a value range limited by log2(M) bits per step, the larger the range of notes we can end up using. Both parameters S and M will define the final byte size of our sequence table.
Cascaded Sequences
The problem we will face with this approach in sizecoding products is, that such a sequence table will grow quickly and end up consuming quite a few bytes. Our reference example MUZAK58 uses 10 bytes for all it's song data. Using our knowledge at this point, we would be able to divide these 10 bytes into a sequence of 40 steps (S = 40), as long as the limited range per step given by 4 bits (M = 2^4 = 16) is sufficient for the music composition we have in mind. 40 steps is not less, but the severely limited range of less than 1 octave will limit us to what we would most likely end up calling a children's song. Instead, the reference muzak sounds like it is made up of at least a multiple of 32 steps before it starts to repeat. And the octave range does not seem to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table like this?
The trick wiRe discovered here is to cascade multiple sequencers and combine all their outputs into a single sequence with a much longer sequence duration (before repetition) and a wider pitch range per sequence step: s(t) = (s0(t) * s1(t) * s2(t) * ...) / C
But this limits the composer's freedom, you might think. This is true! But you will see that the results you get are not as bad as you might think at first, in fact the resulting limitation can even give new impulses to creativity; something we already know as the sizecoding effect.
Here is an attempt to visualize how such an cascaded sequence evolves over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequence step counter. O is the time divider to derive the step count stepcnt = t / O with O = log2(ticks_per_step) to avoid any integer division.
+--------+--------------+---------------------------------------------+
| stpcnt | (t>>O) | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |
+--------+--------------+---------------------------------------------+
| seq0ix | (t>>(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |
| seq1ix | (t>>(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |
| seq2ix | (t>>(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |
| seq3ix | (t>>(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |
| seq4ix | (t>>(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |
+--------+--------------+---------------------------------------------+
with S=4
In combination with our oscillator function, the whole bytebeat will look like this: f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )
Final Bytebeat Implementation
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration to achieve the smallest possible size. If we decide to use a simple sawtooth oscillator, we have an easy game with our oscillator function being as simple as o(t) = t. As we found out, the function f(t) is then only comprised of N+1 terms, all multiplied together like this: f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C. On each loop iteration of the final bytebeat player, the current sequencer sN(t) is evaluated by calculating the current sequencer index and looking it up in the sequencer table. The value stored there for this step is then multiplied towards the total result of f(t). If we keep M low, then even a 16-bit multiplication is sufficient. The final scaling factor C depends on the range of the values derived from the sequencer functions sN(t). Scaling is done as a shift-right operation in the last step. And with some tweaking of the sequencer step values can even be forced to result in a shift-right by 8.
The reference implementation uses a total of 5 cascaded sequencers: N=5. The table of each sequencer was chosen to store 4 sequence steps: S=4. Which sequencer step to index is then based on 2 bits of the parameter t. The shortest sequencer step time for this song was chosen to be 2^10 samples, which gives us O=10. This means that the lookup index for each sequencer i with 0 <= i < N is derived by (t>>(O+i))%S. Each step value is limited by M=16. Putting all this together, we can now start composing a song in this way:
static constexpr unsigned O = 10;
static constexpr unsigned N = 5;
static constexpr unsigned S = 4;
static constexpr unsigned C = 256;
static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };
uint8_t get_next_sample( uint16_t t ) {
for( unsigned i = 0; i < N; i++ ) t *= seqtbl[i][(t>>(O+i))%S];
return static_cast<uint8_t>(t / C);
}
Or the same thing written in Javascript:
t
* [3,1,4,1][3&t>>10]
* [6,6,12,6][3&t>>11]
* [2,4,2,2][3&t>>12]
* [5,9,4,6][3&t>>13]
* [4,8,4,4][3&t>>14] >> 8
(listen to this bytebeat here)
The Sourcecode
With all these parameters carefully chosen, the final bytebeat implementation and sequence tables will be very small. Here is the commented source code of MUZAK58:
;-----------------------------------
; MUZAK58 by wiRe/NpM
;-----------------------------------
section .text
org 100h
;--------------------------------- ;---------- ;muzak sequence table
seqtbl: dw 0x1413 ;1314 ; t * [3,1,4,1][3&t>>10] ;! 1314 adc dx,[si]
dw 0x6C66 ;666C ; * [6,6,12,6][3&t>>11] ;! 666C o32 insb
dw 0x2242 ;4222 ; * [2,4,2,2][3&t>>12] ;! 42 inc dx
dw 0x6495 ;9564 ; * [5,9,4,6][3&t>>13] ;! 22956484 and dl,[di-0x7b9c]
dw 0x4484 ;8444 ; * [4,8,4,4][3&t>>14] >> 8 ;! 44 inc sp
mov al, 0b00010000 ;B010 ;write 8253/8254 PIT command/mode register: resets PIT channel #0
out 43h, al ;E643 ; [7:6] channel #0, [5:4] LSB only, [3:1] mode0 (one-shot), [0] 16-bit binary
;--------------------------------- ;---------- ;present next audio sample (DX:BX = 32-bit sample counter)
bbeat: add al, 149 ;04xx ; calculate new timer period (AL = 42..148)
out 40h, al ;E640 ; rearm timer
inc bx ;43 ; increment 16-bit timer counter
pusha ;60 ; store all registers
;mov si, seqtbl ;BExxxx ; load address of sequence table into SI (here SI already points to seqtbl by default)
mov dx, bx ;89DA ; load start value into DX
mov cl, 5 ;B1xx ; init index counter inside CX (CH must be zero already!)
bbeat_lp: push cx ;51 ; store CX counter
mov cl, 01100b ;B1xx ; get bit sequence from time into CL
and cl, bh ;20F9 ; CL := offset to 1 out of 4 entries
lodsw ;AD ; load next sequence table entry (AX := DS:[SI]; SI := SI + 2)
ror ax, cl ;D3C8 ; select sequence entry at bit-offset 0, 4, 8 or 12
and ax, 01111b ;83E00F ; each sequence entry is 4 bits only (AX &= 15)
mul dx ;F7E2 ; multiply (DX:AX := AX ∗ DX)
xchg ax, dx ;92 ; DX := updated 16-bit sample
pop cx ;59 ; restore CX counter
shr bx, 1 ;D1ED ; get next bit sequence from time
loop bbeat_lp ;E2xx ; loop until all bits are out
mov al, dh ;88F0 ; get sample data into AL
mov dx, 0378h ;BA7803 ; load LPT1 port address into DX
out dx, al ;EE ; send 8-bit sample data to COVOX device
popa ;61 ; restore all registers (especially BX, CX, DX, SI)
main_lp: ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)
in al, 40h ;E440 ; read low-byte
cmp al, 148 ;3Cxx ; did timer counter overflowed to 149..0FFh?
jo short bbeat ;71xx ; yes -> play
;... the place for your intro
jmp short main_lp ;75xx ; loop forever