Difference between revisions of "MUZAK58"

From SizeCoding
Jump to: navigation, search
m
 
(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page will describe how this player works and how it can be adopted for other releases. Feel free to reuse those ideas and techniques in your own productions, but please give a credit to wiRe then. Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.
+
[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by [https://www.pouet.net/user.php?who=106446 wiRe/Napalm] and is 58 bytes in size. You can watch the video [https://www.youtube.com/watch?v=AX5OYQzzi1g here]. It won 4th place at the Lovebyte 2024 demoscene party and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by it's inventor. This page describes how this player works and how it can be adopted for other releases. Feel free to use these ideas and techniques in your own sizecoding productions, but please give a credit to wiRe then. Commercial use is not allowed.
 +
 
 +
Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.
  
 
= COVOX LPT-DAC =
 
= COVOX LPT-DAC =
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data-out-lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion into an analog signal, why it was very cheap to build such hardware device on your own those days. Playing back an 8-bit sample, like it is the output of a bytebeat algorithm, through COVOX LPT DAC is a very easy task. Assuming the next sample value is inside register AL, then this is all you must do:
+
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through COVOX LPT DAC is a very simple task. Assuming the next sample value is in register AL, then this is all you need to do:
 
<syntaxhighlight lang="nasm">
 
<syntaxhighlight lang="nasm">
          mov    dx, 0378h                ;BA7803    ;load LPT1 port address into DX
+
            mov    dx, 0378h                ;BA7803    ;load LPT1 port address into DX
          out    dx, al                    ;EE        ;send 8-bit sample data to COVOX device
+
            out    dx, al                    ;EE        ;send 8-bit sample data to COVOX device
 
</syntaxhighlight>
 
</syntaxhighlight>
  
 
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.
 
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.
  
But for a good audio playback quality, the time between two LPT1 writes should match the sampling rate pretty well. Also the bytebeat algorithm will require a time counter as input, which reflects the current sample number. This is why we need a good time source.
+
'''HINT:''' It is also possible to play a bytebeat over the PC speaker in lower quality, as described here: [[Output#PC_Speaker_variant]]
 +
 
 +
But for a good audio playback quality, the time between two LPT1 writes should match the sampling rate quite well. Also, the bytebeat algorithm needs a time counter as input that reflects the current sample number. Therefore we need a good time source.
  
 
= Time Source =
 
= Time Source =
For playing data through the COVOX LPTDAC, we need a pretty accurate timer. Typical sampling rate would be 8 kHz, but also higher values could be used. Lower values may also work in some special cases. There are multiple possible options to get such a time counter:
+
To play data through the COVOX LPTDAC, we need a fairly accurate timer. A typical sample rate would be 8 kHz, but higher values can also be used. Lower values may also work in some special cases, but then very lo-fi. There are several ways to get such a timer:
 
* Timer Interrupt
 
* Timer Interrupt
* Poll BIOS Counter  
+
* Poll BIOS Counter
 
* Poll PIT Counter
 
* Poll PIT Counter
 
* Alternative Options
 
* Alternative Options
Line 21: Line 25:
 
== Timer Interrupt ==
 
== Timer Interrupt ==
 
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].
 
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (let's even ignore the backup and restore of the old handler), the overhead of the handler itself and the problem of exchanging any counters between handler and firmware. All this will cost bytes. In most cases it will require less size if the timer is polled instead, like in all other variants described next. But it must be also clear that the polling approach will make it necessary to perform this task at a higher frequency than the actual samplingrate, i.e. 8kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update.
+
While this is the most accurate way to drive the COVOX, it is probably also the most expensive. Setting up the new interrupt handler (ignoring even the backup and restore of the old handler), the overhead of the handler itself and the problem of exchanging any data between the handler and the non-interrupt code. All of this will cost bytes. In most cases, it will take less size to poll the timer instead, as in all the other variants described next. But it must be also be clear that the polling approach makes it necessary to perform this task at a higher frequency than the actual sampling rate, i.e. 8kHz. This requires the polling to be done in an inner loop, e.g. after every pixel update, which can eat up quite a bit of performance.
  
 
== Poll BIOS Counter ==
 
== Poll BIOS Counter ==
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically INT8 will run at a frequency of 18.2 Hz. On each call the default interrupt handler will increase this value by 1. Reusing this default handler will prevent the costs for setting up an own handler just to implement the counter increment logic. So, a simple solution to get a timer counter is to reconfigure the PIT for an 8kHz rate. This will trigger the default INT8 handler 8000 times per second. Then this counter can be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, also using this counter value as sample counter, and sent to LPT1. This could look like this:
+
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically, INT8 runs at a frequency of 18.2 Hz. On each call the default interrupt handler increments this value by 1. Reusing this default handler avoids the cost of writing a custom handler just to implement the counter incrementing logic. So, a simple solution to get a timer counter is to reconfigure the PIT for an 8kHz rate. This will trigger the default INT8 handler 8000 times per second. Then this counter can be polled periodically inside the inner loop. As soon as its LSB changes, another sample must be generated, also using this counter value as sample counter, and sent to LPT1. This could look like this:
  
 
<syntaxhighlight lang="nasm">
 
<syntaxhighlight lang="nasm">
          mov    al, 149                  ;B095      ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)
+
            mov    al, 149                  ;B095      ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)
          out    40h, al                  ;E640      ;
+
            out    40h, al                  ;E640      ;
          salc                              ;D6        ;  AL := 0 (if CF=0)
+
            salc                              ;D6        ;  AL := 0 (if CF=0)
          out    40h, al                  ;E640      ;
+
            out    40h, al                  ;E640      ;
  
          ; ...
+
            ; ...
  
suplp:    mov    al, [046Ch]              ;A0xxxx    ;load LSB from BIOS timer, assuming DS=0
+
  suplp:    mov    al, [046Ch]              ;A0xxxx    ;load LSB from BIOS timer, assuming DS=0
_tcmp:    cmp    al, 0x55                  ;2C??      ;did timer value changed?
+
  _tcmp:    cmp    al, 0x55                  ;2C??      ;did timer value changed?
          jz      ntick                    ;74xx      ;  no -> skip audio
+
            jz      ntick                    ;74xx      ;  no -> skip audio
          mov    [_tcmp+1], al            ;A2xxxx    ;remember last BIOS timer value (selfmodifying code)
+
            mov    [_tcmp+1], al            ;A2xxxx    ;remember last BIOS timer value (selfmodifying code)
  
          inc    ebp                      ;6645      ;increment 32-bit timer counter
+
            inc    ebp                      ;6645      ;increment 32-bit timer counter
  
          ; ... set AL to next audio sample based on EBP ...
+
            ; ... set AL to next audio sample based on EBP ...
  
          mov    dx, 0378h                ;BA7803    ;load LPT1 port address into DX
+
            mov    dx, 0378h                ;BA7803    ;load LPT1 port address into DX
          out    dx, al                    ;EE        ;send 8-bit sample data to COVOX device
+
            out    dx, al                    ;EE        ;send 8-bit sample data to COVOX device
  
ntick:
+
  ntick:
          ; ...
+
            ; ...
  
          jmp    short suplp
+
            jmp    short suplp
 
</syntaxhighlight>
 
</syntaxhighlight>
  
'''HINT:''' Instead incrementing your own 16- or 32-bit timer counter (EBP inside the above example) someone could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever fits better.
+
'''HINT:''' Instead incrementing your own 16- or 32-bit timer counter (EBP inside the above example) someone could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever suits you better.
  
 
== Poll PIT Counter ==
 
== Poll PIT Counter ==
 +
 
(...more soon...)
 
(...more soon...)
 +
 +
[http://wiki.osdev.org/Programmable_Interval_Timer Programmable Interval Timer]
 +
 +
 +
This solution may result in the shortest code. A disadvantage is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.
  
 
== Alternative Options ==
 
== Alternative Options ==
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.
+
Another way to get an accurate time is to read the processor's timestamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.
  
 
= The Bytebeat =
 
= The Bytebeat =
[[Bytebeat]] is simply spoken an 8-bit uncompressed audiowave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the number of the sample. It will start generation the first sample for ''t=0'' and, in case of an 8kHz samplingrate, will reach the sample ''f(8000)'' after exactly 1 second.
+
[[Bytebeat]] is simply an 8-bit uncompressed audio wave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the time represented by the number of the sample, which is also equal to the byte offset within the stream. It will start generating the first sample for ''t=0'' and will play the sample ''f(8000)'' after exactly 1 second if the sampling rate is 8kHz. Since this is actually a softsynth (music synthesis done by software), in theory any sound or music can be approximated in this way. There are no limits except the increasing complexity of the resulting function.
 +
 
 +
In general, any bytebeat algorithm can be implemented to generate the next sample to be written to the COVOX LPT1. But in terms of size, a bytebeat algorithm is better suited if it's formula can be implemented in as few bytes as possible. [https://www.pouet.net/prod.php?which=96071 MUZAK58] is to a certain extent a generic or reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the source of the music played comes from sequence tables stored in memory. Changing these words will result in completely new music being played. If you use more words for this table, the song becomes more complex so that it does not repeat itself so quickly. The sequence table of this reference example is 10 bytes long and looks like this:
 +
<syntaxhighlight lang="nasm">
 +
  seqtbl:  dw      0x1413
 +
            dw      0x6C66
 +
            dw      0x2242
 +
            dw      0x6495
 +
            dw      0x4484
 +
</syntaxhighlight>
 +
 
 +
The method used to achieve a size-optimized, yet flexible bytebeat is described next.
 +
 
 +
== Music Sequencer ==
 +
As you can read in many bytebeat tutorials, like [[Steady_On_Tim]] by Gasman or in the paper published by Viznut, the basic idea to generate a melody with a bytebeat is to modify some basic waveform oscillator function ''o(t)'', like sawtooth, square, triangle or sine waveforms, by multiplying the time parameter ''t'' by a scale factor ''p'': ''f(t) = o(t*p)''. This multiplication factor modulates the pitch. If we then use a sequence table ''s(t)'' to replace ''p'', which will change the pitch of our oscillator over time, we can already play a simple melody using this formula: ''f(t) = o(t*s(t))''.
 +
 
 +
Accordingly, we implement a single pitch-modulated oscillator with sawtooth waveform:
 +
<syntaxhighlight lang="javascript">
 +
  (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&255
 +
</syntaxhighlight>
 +
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9473119236434ffff34df800 listen to this bytebeat here])
 +
 
 +
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as it is defined by a sequence table. This example demonstrates a sequence of 8 steps, where ''S=8'' specifies the number of steps. Each step changes the pitch of the resulting sawtooth waveform.
 +
 
 +
Replacing the trailing "&255" (implicit for a bytebeat) by "&128" would change the sawtooth waveform to a square wave function:
 +
<syntaxhighlight lang="javascript">
 +
  (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&128
 +
</syntaxhighlight>
 +
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9dc1d9b391be7fffcfb76000 listen to this bytebeat here])
 +
 
 +
Other waveforms are also possible. Here we use the sine function:
 +
<syntaxhighlight lang="javascript">
 +
  sin(t*[1,2,4,8,16,8,4,2][(t>>11)%8]/14)*127+127
 +
</syntaxhighlight>
 +
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d000001002f0000000000000000399a4a1a8bae05d329e28520c901366398da262860ce3ea49cc63383ad4015395d56ced153c2b5712a75c831dca7c583fffcb53000 listen to this bytebeat here])
 +
 
 +
Or distortion-like effects can be applied, as shown here using the XOR operator in the last step:
 +
<syntaxhighlight lang="javascript">
 +
  (t*[1,2,4,8,16,8,4,2][(t>>11)%8])^64
 +
</syntaxhighlight>
 +
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d00000100240000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024f067719de4f113fffded5400 listen to this bytebeat here])
 +
 
 +
'''HINT:''' Instead of using only one modulated oscillator or one sequence, also 2 or more can be used and combined, e.g.: ''f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2''
 +
 
 +
So far, these are well known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with a sequence table containing as many steps ''S'' as we need for our composition, or at least as many as we can handle due to size constraints. The more steps ''S'' we use, the longer the song will last before it repeats. The larger the value of each sequence step can be, with a value range limited by ''log2(M)'' bits per step, the larger the range of notes we can end up using. Both parameters ''S'' and ''M'' will define the final byte size of our sequence table.
 +
 
 +
== Cascaded Sequences ==
 +
 
 +
The problem we will face with this approach in sizecoding products is, that such a sequence table will grow quickly and end up consuming quite a few bytes. Our reference example [https://www.pouet.net/prod.php?which=96071 MUZAK58] uses 10 bytes for all it's song data. Using our knowledge at this point, we would be able to divide these 10 bytes into a sequence of 40 steps (''S = 40''), as long as the limited range per step given by 4 bits (''M = 2^4 = 16'') is sufficient for the music composition we have in mind. 40 steps is not less, but the severely limited range of less than 1 octave will limit us to what we would most likely end up calling a children's song. Instead, the reference muzak sounds like it is made up of at least a multiple of 32 steps before it starts to repeat. And the octave range does not seem to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table like this?
 +
 
 +
The trick wiRe discovered here is to cascade multiple sequencers and combine all their outputs into a single sequence with a much longer sequence duration (before repetition) and a wider pitch range per sequence step: ''s(t) = (s0(t) * s1(t) * s2(t) * ...) / C''
  
In general, any bytebeat algorithm can be implemented now to present the next sample inside register AL, as it is required by the previous example code. But in respect to the size, a bytebeat algorithm is better suited if it's formula is as simple as possible, implementation-wise. The method used to achieve a size-optimized, but still flexible bytebeat formula is described next.
+
But this limits the composer's freedom, you might think. This is true! But you will see that the results you get are not as bad as you might think at first, in fact the resulting limitation can even give new impulses to creativity; something we already know as the sizecoding effect.
  
== Music Sequence ==
+
Here is an attempt to visualize how such an cascaded sequence evolves over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequence step counter. ''O'' is the time divider to derive the step count ''stepcnt = t / O'' with ''O = log2(ticks_per_step)'' to avoid any integer division.
 +
<syntaxhighlight lang="text">
 +
  +--------+--------------+---------------------------------------------+
 +
  | stpcnt | (t>>O)      | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |
 +
  +--------+--------------+---------------------------------------------+
 +
  | seq0ix | (t>>(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |
 +
  | seq1ix | (t>>(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |
 +
  | seq2ix | (t>>(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |
 +
  | seq3ix | (t>>(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |
 +
  | seq4ix | (t>>(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |
 +
  +--------+--------------+---------------------------------------------+
 +
                                                              with S=4
 +
</syntaxhighlight>
  
(...more soon...)
+
In combination with our oscillator function, the whole bytebeat will look like this: ''f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )''
  
== Final Bytebeat ==
+
== Final Bytebeat Implementation ==
  
Putting this all together, we can now start composing one music tune this way:
+
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration to achieve the smallest possible size. If we decide to use a simple sawtooth oscillator, we have an easy game with our oscillator function being as simple as ''o(t) = t''. As we found out, the function ''f(t)'' is then only comprised of ''N+1'' terms, all multiplied together like this: ''f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C''. On each loop iteration of the final bytebeat player, the current sequencer ''sN(t)'' is evaluated by calculating the current sequencer index and looking it up in the sequencer table. The value stored there for this step is then multiplied towards the total result of ''f(t)''. If we keep ''M'' low, then even a 16-bit multiplication is sufficient. The final scaling factor ''C'' depends on the range of the values derived from the sequencer functions ''sN(t)''. Scaling is done as a shift-right operation in the last step. And with some tweaking of the sequencer step values can even be forced to result in a shift-right by 8.
  
 +
The reference implementation uses a total of 5 cascaded sequencers: ''N=5''. The table of each sequencer was chosen to store 4 sequence steps: ''S=4''. Which sequencer step to index is then based on 2 bits of the parameter ''t''. The shortest sequencer step time for this song was chosen to be ''2^10'' samples, which gives us ''O=10''. This means that the lookup index for each sequencer ''i'' with ''0 <= i < N'' is derived by ''(t>>(O+i))%S''. Each step value is limited by ''M=16''. Putting all this together, we can now start composing a song in this way:
 
<syntaxhighlight lang="cpp">
 
<syntaxhighlight lang="cpp">
 
   static constexpr unsigned O = 10;
 
   static constexpr unsigned O = 10;
 
   static constexpr unsigned N = 5;
 
   static constexpr unsigned N = 5;
 
   static constexpr unsigned S = 4;
 
   static constexpr unsigned S = 4;
   static constexpr unsigned A = 256;
+
   static constexpr unsigned C = 256;
  
 
   static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };
 
   static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };
  
   uint8_t get_sample( uint16_t t ) {
+
   uint8_t get_next_sample( uint16_t t ) {
 
     for( unsigned i = 0; i < N; i++ ) t *= seqtbl[i][(t>>(O+i))%S];
 
     for( unsigned i = 0; i < N; i++ ) t *= seqtbl[i][(t>>(O+i))%S];
     return static_cast<uint8_t>(t / A);
+
     return static_cast<uint8_t>(t / C);
 
   }
 
   }
 
</syntaxhighlight>
 
</syntaxhighlight>
 +
Or the same thing written in Javascript:
 +
<syntaxhighlight lang="javascript">
 +
t
 +
* [3,1,4,1][3&t>>10]
 +
* [6,6,12,6][3&t>>11]
 +
* [2,4,2,2][3&t>>12]
 +
* [5,9,4,6][3&t>>13]
 +
* [4,8,4,4][3&t>>14] >> 8
 +
</syntaxhighlight>
 +
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&v=circles&bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 listen to this bytebeat here])
 +
 +
= The Sourcecode =
 +
 +
With all these parameters carefully chosen, the final bytebeat implementation and sequence tables will be very small. Here is the commented source code of [https://www.pouet.net/prod.php?which=96071 MUZAK58]:
 +
<syntaxhighlight lang="nasm">
 +
          ;-----------------------------------
 +
          ; MUZAK58 by wiRe/NpM
 +
          ;-----------------------------------
 +
            section .text
 +
            org    100h
 +
 +
          ;--------------------------------- ;---------- ;muzak sequence table
 +
  seqtbl:  dw      0x1413                    ;1314      ;  t * [3,1,4,1][3&t>>10]      ;! 1314      adc dx,[si]
 +
            dw      0x6C66                    ;666C      ;    * [6,6,12,6][3&t>>11]      ;! 666C      o32 insb
 +
            dw      0x2242                    ;4222      ;    * [2,4,2,2][3&t>>12]      ;! 42        inc dx
 +
            dw      0x6495                    ;9564      ;    * [5,9,4,6][3&t>>13]      ;! 22956484  and dl,[di-0x7b9c]
 +
            dw      0x4484                    ;8444      ;    * [4,8,4,4][3&t>>14] >> 8  ;! 44        inc sp
 +
 +
            mov    al, 0b00010000            ;B010      ;write 8253/8254 PIT command/mode register: resets PIT channel #0
 +
            out    43h, al                  ;E643      ;  [7:6] channel #0, [5:4] LSB only, [3:1] mode0 (one-shot), [0] 16-bit binary
 +
 +
          ;--------------------------------- ;---------- ;present next audio sample (DX:BX = 32-bit sample counter)
 +
  bbeat:    add    al, 149                  ;04xx      ;  calculate new timer period (AL = 42..148)
 +
            out    40h, al                  ;E640      ;  rearm timer
 +
 +
            inc    bx                        ;43        ;  increment 16-bit timer counter
 +
 +
            pusha                            ;60        ;  store all registers
 +
          ;mov    si, seqtbl                ;BExxxx    ;  load address of sequence table into SI (here SI already points to seqtbl by default)
 +
            mov    dx, bx                    ;89DA      ;  load start value into DX
 +
            mov    cl, 5                    ;B1xx      ;  init index counter inside CX (CH must be zero already!)
 +
  bbeat_lp: push    cx                        ;51        ;  store CX counter
 +
            mov    cl, 01100b                ;B1xx      ;  get bit sequence from time into CL
 +
            and    cl, bh                    ;20F9      ;    CL := offset to 1 out of 4 entries
 +
            lodsw                            ;AD        ;  load next sequence table entry (AX := DS:[SI]; SI := SI + 2)
 +
            ror    ax, cl                    ;D3C8      ;  select sequence entry at bit-offset 0, 4, 8 or 12
 +
            and    ax, 01111b                ;83E00F    ;  each sequence entry is 4 bits only (AX &= 15)
 +
            mul    dx                        ;F7E2      ;  multiply (DX:AX := AX ∗ DX)
 +
            xchg    ax, dx                    ;92        ;    DX := updated 16-bit sample
 +
            pop    cx                        ;59        ;  restore CX counter
 +
            shr    bx, 1                    ;D1ED      ;  get next bit sequence from time
 +
            loop    bbeat_lp                  ;E2xx      ;  loop until all bits are out
  
The above code can be executed here live: [https://bytebeat.demozoo.org/#t=0&e=0&s=8000&v=circles&bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 bytebeat.demozoo.org]
+
            mov    al, dh                    ;88F0      ;  get sample data into AL
 +
            mov    dx, 0378h                ;BA7803    ;  load LPT1 port address into DX
 +
            out    dx, al                    ;EE        ;  send 8-bit sample data to COVOX device
 +
            popa                              ;61        ;  restore all registers (especially BX, CX, DX, SI)
  
(...more soon...)
+
  main_lp: ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)
 +
            in      al, 40h                  ;E440      ;  read low-byte
 +
            cmp    al, 148                  ;3Cxx      ;  did timer counter overflowed to 149..0FFh?
 +
            jo      short bbeat              ;71xx      ;    yes -> play
 +
 
 +
          ;... the place for your intro
 +
 
 +
            jmp    short main_lp            ;75xx      ;  loop forever
 +
</syntaxhighlight>

Latest revision as of 02:06, 12 June 2024

MUZAK58 was created by wiRe/Napalm and is 58 bytes in size. You can watch the video here. It won 4th place at the Lovebyte 2024 demoscene party and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by it's inventor. This page describes how this player works and how it can be adopted for other releases. Feel free to use these ideas and techniques in your own sizecoding productions, but please give a credit to wiRe then. Commercial use is not allowed.

Before you continue, make sure to read Bytebeat and Output#Producing_sound for the basics.

COVOX LPT-DAC

The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through COVOX LPT DAC is a very simple task. Assuming the next sample value is in register AL, then this is all you need to do:

            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device

HINT: It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.

HINT: It is also possible to play a bytebeat over the PC speaker in lower quality, as described here: Output#PC_Speaker_variant

But for a good audio playback quality, the time between two LPT1 writes should match the sampling rate quite well. Also, the bytebeat algorithm needs a time counter as input that reflects the current sample number. Therefore we need a good time source.

Time Source

To play data through the COVOX LPTDAC, we need a fairly accurate timer. A typical sample rate would be 8 kHz, but higher values can also be used. Lower values may also work in some special cases, but then very lo-fi. There are several ways to get such a timer:

  • Timer Interrupt
  • Poll BIOS Counter
  • Poll PIT Counter
  • Alternative Options

Timer Interrupt

As described here: Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt. While this is the most accurate way to drive the COVOX, it is probably also the most expensive. Setting up the new interrupt handler (ignoring even the backup and restore of the old handler), the overhead of the handler itself and the problem of exchanging any data between the handler and the non-interrupt code. All of this will cost bytes. In most cases, it will take less size to poll the timer instead, as in all the other variants described next. But it must be also be clear that the polling approach makes it necessary to perform this task at a higher frequency than the actual sampling rate, i.e. 8kHz. This requires the polling to be done in an inner loop, e.g. after every pixel update, which can eat up quite a bit of performance.

Poll BIOS Counter

The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically, INT8 runs at a frequency of 18.2 Hz. On each call the default interrupt handler increments this value by 1. Reusing this default handler avoids the cost of writing a custom handler just to implement the counter incrementing logic. So, a simple solution to get a timer counter is to reconfigure the PIT for an 8kHz rate. This will trigger the default INT8 handler 8000 times per second. Then this counter can be polled periodically inside the inner loop. As soon as its LSB changes, another sample must be generated, also using this counter value as sample counter, and sent to LPT1. This could look like this:

            mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)
            out     40h, al                   ;E640       ;
            salc                              ;D6         ;  AL := 0 (if CF=0)
            out     40h, al                   ;E640       ;

            ; ...

  suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0
  _tcmp:    cmp     al, 0x55                  ;2C??       ;did timer value changed?
            jz      ntick                     ;74xx       ;  no -> skip audio
            mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)

            inc     ebp                       ;6645       ;increment 32-bit timer counter

            ; ... set AL to next audio sample based on EBP ...

            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device

  ntick:
            ; ...

            jmp     short suplp

HINT: Instead incrementing your own 16- or 32-bit timer counter (EBP inside the above example) someone could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever suits you better.

Poll PIT Counter

(...more soon...)

Programmable Interval Timer


This solution may result in the shortest code. A disadvantage is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.

Alternative Options

Another way to get an accurate time is to read the processor's timestamp counter using the RDTSC instruction.

The Bytebeat

Bytebeat is simply an 8-bit uncompressed audio wave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function f(t), where t is the time represented by the number of the sample, which is also equal to the byte offset within the stream. It will start generating the first sample for t=0 and will play the sample f(8000) after exactly 1 second if the sampling rate is 8kHz. Since this is actually a softsynth (music synthesis done by software), in theory any sound or music can be approximated in this way. There are no limits except the increasing complexity of the resulting function.

In general, any bytebeat algorithm can be implemented to generate the next sample to be written to the COVOX LPT1. But in terms of size, a bytebeat algorithm is better suited if it's formula can be implemented in as few bytes as possible. MUZAK58 is to a certain extent a generic or reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the source of the music played comes from sequence tables stored in memory. Changing these words will result in completely new music being played. If you use more words for this table, the song becomes more complex so that it does not repeat itself so quickly. The sequence table of this reference example is 10 bytes long and looks like this:

  seqtbl:   dw      0x1413
            dw      0x6C66
            dw      0x2242
            dw      0x6495
            dw      0x4484

The method used to achieve a size-optimized, yet flexible bytebeat is described next.

Music Sequencer

As you can read in many bytebeat tutorials, like Steady_On_Tim by Gasman or in the paper published by Viznut, the basic idea to generate a melody with a bytebeat is to modify some basic waveform oscillator function o(t), like sawtooth, square, triangle or sine waveforms, by multiplying the time parameter t by a scale factor p: f(t) = o(t*p). This multiplication factor modulates the pitch. If we then use a sequence table s(t) to replace p, which will change the pitch of our oscillator over time, we can already play a simple melody using this formula: f(t) = o(t*s(t)).

Accordingly, we implement a single pitch-modulated oscillator with sawtooth waveform:

  (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&255

(listen to this bytebeat here)

To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as it is defined by a sequence table. This example demonstrates a sequence of 8 steps, where S=8 specifies the number of steps. Each step changes the pitch of the resulting sawtooth waveform.

Replacing the trailing "&255" (implicit for a bytebeat) by "&128" would change the sawtooth waveform to a square wave function:

  (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&128

(listen to this bytebeat here)

Other waveforms are also possible. Here we use the sine function:

  sin(t*[1,2,4,8,16,8,4,2][(t>>11)%8]/14)*127+127

(listen to this bytebeat here)

Or distortion-like effects can be applied, as shown here using the XOR operator in the last step:

  (t*[1,2,4,8,16,8,4,2][(t>>11)%8])^64

(listen to this bytebeat here)

HINT: Instead of using only one modulated oscillator or one sequence, also 2 or more can be used and combined, e.g.: f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2

So far, these are well known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with a sequence table containing as many steps S as we need for our composition, or at least as many as we can handle due to size constraints. The more steps S we use, the longer the song will last before it repeats. The larger the value of each sequence step can be, with a value range limited by log2(M) bits per step, the larger the range of notes we can end up using. Both parameters S and M will define the final byte size of our sequence table.

Cascaded Sequences

The problem we will face with this approach in sizecoding products is, that such a sequence table will grow quickly and end up consuming quite a few bytes. Our reference example MUZAK58 uses 10 bytes for all it's song data. Using our knowledge at this point, we would be able to divide these 10 bytes into a sequence of 40 steps (S = 40), as long as the limited range per step given by 4 bits (M = 2^4 = 16) is sufficient for the music composition we have in mind. 40 steps is not less, but the severely limited range of less than 1 octave will limit us to what we would most likely end up calling a children's song. Instead, the reference muzak sounds like it is made up of at least a multiple of 32 steps before it starts to repeat. And the octave range does not seem to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table like this?

The trick wiRe discovered here is to cascade multiple sequencers and combine all their outputs into a single sequence with a much longer sequence duration (before repetition) and a wider pitch range per sequence step: s(t) = (s0(t) * s1(t) * s2(t) * ...) / C

But this limits the composer's freedom, you might think. This is true! But you will see that the results you get are not as bad as you might think at first, in fact the resulting limitation can even give new impulses to creativity; something we already know as the sizecoding effect.

Here is an attempt to visualize how such an cascaded sequence evolves over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequence step counter. O is the time divider to derive the step count stepcnt = t / O with O = log2(ticks_per_step) to avoid any integer division.

  +--------+--------------+---------------------------------------------+
  | stpcnt | (t>>O)       | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |
  +--------+--------------+---------------------------------------------+
  | seq0ix | (t>>(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |
  | seq1ix | (t>>(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |
  | seq2ix | (t>>(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |
  | seq3ix | (t>>(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |
  | seq4ix | (t>>(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |
  +--------+--------------+---------------------------------------------+
                                                               with S=4

In combination with our oscillator function, the whole bytebeat will look like this: f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )

Final Bytebeat Implementation

The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration to achieve the smallest possible size. If we decide to use a simple sawtooth oscillator, we have an easy game with our oscillator function being as simple as o(t) = t. As we found out, the function f(t) is then only comprised of N+1 terms, all multiplied together like this: f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C. On each loop iteration of the final bytebeat player, the current sequencer sN(t) is evaluated by calculating the current sequencer index and looking it up in the sequencer table. The value stored there for this step is then multiplied towards the total result of f(t). If we keep M low, then even a 16-bit multiplication is sufficient. The final scaling factor C depends on the range of the values derived from the sequencer functions sN(t). Scaling is done as a shift-right operation in the last step. And with some tweaking of the sequencer step values can even be forced to result in a shift-right by 8.

The reference implementation uses a total of 5 cascaded sequencers: N=5. The table of each sequencer was chosen to store 4 sequence steps: S=4. Which sequencer step to index is then based on 2 bits of the parameter t. The shortest sequencer step time for this song was chosen to be 2^10 samples, which gives us O=10. This means that the lookup index for each sequencer i with 0 <= i < N is derived by (t>>(O+i))%S. Each step value is limited by M=16. Putting all this together, we can now start composing a song in this way:

  static constexpr unsigned O = 10;
  static constexpr unsigned N = 5;
  static constexpr unsigned S = 4;
  static constexpr unsigned C = 256;

  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };

  uint8_t get_next_sample( uint16_t t ) {
    for( unsigned i = 0; i < N; i++ ) t *= seqtbl[i][(t>>(O+i))%S];
    return static_cast<uint8_t>(t / C);
  }

Or the same thing written in Javascript:

t
* [3,1,4,1][3&t>>10]
* [6,6,12,6][3&t>>11]
* [2,4,2,2][3&t>>12]
* [5,9,4,6][3&t>>13]
* [4,8,4,4][3&t>>14] >> 8

(listen to this bytebeat here)

The Sourcecode

With all these parameters carefully chosen, the final bytebeat implementation and sequence tables will be very small. Here is the commented source code of MUZAK58:

           ;-----------------------------------
           ; MUZAK58 by wiRe/NpM
           ;-----------------------------------
            section .text
            org     100h

           ;--------------------------------- ;---------- ;muzak sequence table
  seqtbl:   dw      0x1413                    ;1314       ;  t * [3,1,4,1][3&t>>10]       ;! 1314       adc dx,[si]
            dw      0x6C66                    ;666C       ;    * [6,6,12,6][3&t>>11]      ;! 666C       o32 insb
            dw      0x2242                    ;4222       ;    * [2,4,2,2][3&t>>12]       ;! 42         inc dx
            dw      0x6495                    ;9564       ;    * [5,9,4,6][3&t>>13]       ;! 22956484   and dl,[di-0x7b9c]
            dw      0x4484                    ;8444       ;    * [4,8,4,4][3&t>>14] >> 8  ;! 44         inc sp

            mov     al, 0b00010000            ;B010       ;write 8253/8254 PIT command/mode register: resets PIT channel #0
            out     43h, al                   ;E643       ;  [7:6] channel #0, [5:4] LSB only, [3:1] mode0 (one-shot), [0] 16-bit binary

           ;--------------------------------- ;---------- ;present next audio sample (DX:BX = 32-bit sample counter)
  bbeat:    add     al, 149                   ;04xx       ;  calculate new timer period (AL = 42..148)
            out     40h, al                   ;E640       ;  rearm timer

            inc     bx                        ;43         ;  increment 16-bit timer counter

            pusha                             ;60         ;  store all registers
           ;mov     si, seqtbl                ;BExxxx     ;  load address of sequence table into SI (here SI already points to seqtbl by default)
            mov     dx, bx                    ;89DA       ;  load start value into DX
            mov     cl, 5                     ;B1xx       ;  init index counter inside CX (CH must be zero already!)
  bbeat_lp: push    cx                        ;51         ;  store CX counter
            mov     cl, 01100b                ;B1xx       ;  get bit sequence from time into CL
            and     cl, bh                    ;20F9       ;    CL := offset to 1 out of 4 entries
            lodsw                             ;AD         ;  load next sequence table entry (AX := DS:[SI]; SI := SI + 2)
            ror     ax, cl                    ;D3C8       ;  select sequence entry at bit-offset 0, 4, 8 or 12
            and     ax, 01111b                ;83E00F     ;  each sequence entry is 4 bits only (AX &= 15)
            mul     dx                        ;F7E2       ;  multiply (DX:AX := AX ∗ DX)
            xchg    ax, dx                    ;92         ;    DX := updated 16-bit sample
            pop     cx                        ;59         ;  restore CX counter
            shr     bx, 1                     ;D1ED       ;  get next bit sequence from time
            loop    bbeat_lp                  ;E2xx       ;  loop until all bits are out

            mov     al, dh                    ;88F0       ;  get sample data into AL
            mov     dx, 0378h                 ;BA7803     ;  load LPT1 port address into DX
            out     dx, al                    ;EE         ;  send 8-bit sample data to COVOX device
            popa                              ;61         ;  restore all registers (especially BX, CX, DX, SI)

  main_lp: ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)
            in      al, 40h                   ;E440       ;  read low-byte
            cmp     al, 148                   ;3Cxx       ;  did timer counter overflowed to 149..0FFh?
            jo      short bbeat               ;71xx       ;    yes -> play

           ;... the place for your intro

            jmp     short main_lp             ;75xx       ;  loop forever