Difference between revisions of "MUZAK58"

From SizeCoding
Jump to: navigation, search
m
 
(3 intermediate revisions by the same user not shown)
Line 1: Line 1:
[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by wiRe/Napalm and is 58 bytes in size. Watch the video [https://www.youtube.com/watch?v=AX5OYQzzi1g here]. It achieved 4th place at the Lovebyte 2024 demoparty and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by wiRe. This page will describe how this player works and how it can be adopted for other releases. Feel free to reuse those ideas and techniques in your own sizecoding productions, but please give a credit to wiRe then. Any commercial use is not permitted.
+
[https://www.pouet.net/prod.php?which=96071 MUZAK58] was created by [https://www.pouet.net/user.php?who=106446 wiRe/Napalm] and is 58 bytes in size. You can watch the video [https://www.youtube.com/watch?v=AX5OYQzzi1g here]. It won 4th place at the Lovebyte 2024 demoscene party and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by it's inventor. This page describes how this player works and how it can be adopted for other releases. Feel free to use these ideas and techniques in your own sizecoding productions, but please give a credit to wiRe then. Commercial use is not allowed.
  
 
Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.
 
Before you continue, make sure to read [[Bytebeat]] and [[Output#Producing_sound]] for the basics.
  
 
= COVOX LPT-DAC =
 
= COVOX LPT-DAC =
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data-out-lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion into an analog signal, why it was very cheap to build such hardware device on your own those days. Playing back an 8-bit sample, like it is the output of a bytebeat algorithm, through COVOX LPT DAC is a very easy task. Assuming the next sample value is inside register AL, then this is all you must do:
+
The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through COVOX LPT DAC is a very simple task. Assuming the next sample value is in register AL, then this is all you need to do:
 
<syntaxhighlight lang="nasm">
 
<syntaxhighlight lang="nasm">
 
             mov    dx, 0378h                ;BA7803    ;load LPT1 port address into DX
 
             mov    dx, 0378h                ;BA7803    ;load LPT1 port address into DX
Line 12: Line 12:
 
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.
 
'''HINT:''' It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.
  
'''HINT:''' It is also possible to play a bytebeat through the PC speaker at lower quality, as described here: [[Output#PC_Speaker_variant]]
+
'''HINT:''' It is also possible to play a bytebeat over the PC speaker in lower quality, as described here: [[Output#PC_Speaker_variant]]
  
But for a good audio playback quality, the time between two LPT1 writes should match the sampling rate pretty well. Also the bytebeat algorithm will require a time counter as input, which reflects the current sample number. This is why we need a good time source.
+
But for a good audio playback quality, the time between two LPT1 writes should match the sampling rate quite well. Also, the bytebeat algorithm needs a time counter as input that reflects the current sample number. Therefore we need a good time source.
  
 
= Time Source =
 
= Time Source =
For playing data through the COVOX LPTDAC, we need a pretty accurate timer. Typical sampling rate would be 8 kHz, but also higher values could be used. Lower values may also work in some special cases, but very Lo-Fi then. There are multiple possible options to get such a time counter:
+
To play data through the COVOX LPTDAC, we need a fairly accurate timer. A typical sample rate would be 8 kHz, but higher values can also be used. Lower values may also work in some special cases, but then very lo-fi. There are several ways to get such a timer:
 
* Timer Interrupt
 
* Timer Interrupt
 
* Poll BIOS Counter
 
* Poll BIOS Counter
Line 25: Line 25:
 
== Timer Interrupt ==
 
== Timer Interrupt ==
 
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].
 
As described here: [[Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt]].
While this is the most accurate way to drive the COVOX, it is very likely also the most expensive one. Setting up the new interrupt handler (let's even ignore the backup and restore of the old handler), the overhead of the handler itself and the problem of exchanging any data between handler and non-interrupt code. All this will cost bytes. In most cases it will require less size if the timer is polled instead, like in all other variants described next. But it must be also clear that the polling approach will make it necessary to perform this task at a higher frequency than the actual samplingrate, i.e. 8kHz. This requires the polling to happen inside an inner loop, e.g. after each pixel update, which might eat up quite some performance.
+
While this is the most accurate way to drive the COVOX, it is probably also the most expensive. Setting up the new interrupt handler (ignoring even the backup and restore of the old handler), the overhead of the handler itself and the problem of exchanging any data between the handler and the non-interrupt code. All of this will cost bytes. In most cases, it will take less size to poll the timer instead, as in all the other variants described next. But it must be also be clear that the polling approach makes it necessary to perform this task at a higher frequency than the actual sampling rate, i.e. 8kHz. This requires the polling to be done in an inner loop, e.g. after every pixel update, which can eat up quite a bit of performance.
  
 
== Poll BIOS Counter ==
 
== Poll BIOS Counter ==
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically INT8 will run at a frequency of 18.2 Hz. On each call the default interrupt handler will increase this value by 1. Reusing this default handler will prevent the costs for setting up an own handler just to implement the counter increment logic. So, a simple solution to get a timer counter is to reconfigure the PIT for an 8kHz rate. This will trigger the default INT8 handler 8000 times per second. Then this counter can be polled periodically inside the inner loop. Once its LSB changes, another sample must be generated, also using this counter value as sample counter, and sent to LPT1. This could look like this:
+
The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically, INT8 runs at a frequency of 18.2 Hz. On each call the default interrupt handler increments this value by 1. Reusing this default handler avoids the cost of writing a custom handler just to implement the counter incrementing logic. So, a simple solution to get a timer counter is to reconfigure the PIT for an 8kHz rate. This will trigger the default INT8 handler 8000 times per second. Then this counter can be polled periodically inside the inner loop. As soon as its LSB changes, another sample must be generated, also using this counter value as sample counter, and sent to LPT1. This could look like this:
  
 
<syntaxhighlight lang="nasm">
 
<syntaxhighlight lang="nasm">
Line 56: Line 56:
 
</syntaxhighlight>
 
</syntaxhighlight>
  
'''HINT:''' Instead incrementing your own 16- or 32-bit timer counter (EBP inside the above example) someone could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever fits better.
+
'''HINT:''' Instead incrementing your own 16- or 32-bit timer counter (EBP inside the above example) someone could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever suits you better.
  
 
== Poll PIT Counter ==
 
== Poll PIT Counter ==
Line 65: Line 65:
  
  
This solution may result in the shortest code. But one drawback is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.
+
This solution may result in the shortest code. A disadvantage is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.
  
 
== Alternative Options ==
 
== Alternative Options ==
Another option to get an accurate time is to read the processor's time-stamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.
+
Another way to get an accurate time is to read the processor's timestamp counter using the [https://www.felixcloutier.com/x86/rdtsc RDTSC] instruction.
  
 
= The Bytebeat =
 
= The Bytebeat =
[[Bytebeat]] is simply spoken an 8-bit uncompressed audiowave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the time represented by the number of the sample, which is also equal to the byte offset inside the stream. It will start generation of the first sample for ''t=0'' and, in case of an 8kHz samplingrate, will play the sample ''f(8000)'' after exactly 1 second. Since this is actually a Softsynth (music synthesis done by software), in theory any sound or music can be aproximated this way. There are no limits except the increasing complexity of the resulting function.
+
[[Bytebeat]] is simply an 8-bit uncompressed audio wave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function ''f(t)'', where ''t'' is the time represented by the number of the sample, which is also equal to the byte offset within the stream. It will start generating the first sample for ''t=0'' and will play the sample ''f(8000)'' after exactly 1 second if the sampling rate is 8kHz. Since this is actually a softsynth (music synthesis done by software), in theory any sound or music can be approximated in this way. There are no limits except the increasing complexity of the resulting function.
  
In general, any bytebeat algorithm can be implemented now to present the next sample to be written to the COVOX LPT1. But in respect to the size, a bytebeat algorithm is better suited if it's formula is as simple as possible, implementation-wise. [https://www.pouet.net/prod.php?which=96071 MUZAK58] is to some degree a generic or reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the source of the played music comes from sequence tables stored in memory. Changing those words results in an entirely new music being played. Also spending more words for this table will add ore variations to the song, that it will not repeat as fast. The sqeuence table of this reference example is 10 bytes in total and looks like this:
+
In general, any bytebeat algorithm can be implemented to generate the next sample to be written to the COVOX LPT1. But in terms of size, a bytebeat algorithm is better suited if it's formula can be implemented in as few bytes as possible. [https://www.pouet.net/prod.php?which=96071 MUZAK58] is to a certain extent a generic or reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the source of the music played comes from sequence tables stored in memory. Changing these words will result in completely new music being played. If you use more words for this table, the song becomes more complex so that it does not repeat itself so quickly. The sequence table of this reference example is 10 bytes long and looks like this:
 
<syntaxhighlight lang="nasm">
 
<syntaxhighlight lang="nasm">
 
   seqtbl:  dw      0x1413
 
   seqtbl:  dw      0x1413
Line 82: Line 82:
 
</syntaxhighlight>
 
</syntaxhighlight>
  
The method used to achieve a size-optimized, but still flexible bytebeat is described next.
+
The method used to achieve a size-optimized, yet flexible bytebeat is described next.
  
 
== Music Sequencer ==
 
== Music Sequencer ==
As you can read in many Bytebeat tutorials, like [[Steady_On_Tim]] by Gasman or inside the paper published by viznut, the basic idea to generate a melody with a bytebeat is to modify some basic waveform oscillator function ''o(t)'', like saw-tooth, square, triangle or sine waveforms, by multiplying the time parameter ''t'' by a scale factor ''p'': ''f(t) = o(t*p)''. This multiplication factor will modulate the pitch. If we then use a sequence table ''s(t)'' to replace ''p'', that will change the pitch of our oscillator over time, then we can already play some simple melody using this formula: ''f(t) = o(t*s(t))''.
+
As you can read in many bytebeat tutorials, like [[Steady_On_Tim]] by Gasman or in the paper published by Viznut, the basic idea to generate a melody with a bytebeat is to modify some basic waveform oscillator function ''o(t)'', like sawtooth, square, triangle or sine waveforms, by multiplying the time parameter ''t'' by a scale factor ''p'': ''f(t) = o(t*p)''. This multiplication factor modulates the pitch. If we then use a sequence table ''s(t)'' to replace ''p'', which will change the pitch of our oscillator over time, we can already play a simple melody using this formula: ''f(t) = o(t*s(t))''.
  
Accordingly, we implement a single pitch-modulated oscillator with saw-tooth waveform:
+
Accordingly, we implement a single pitch-modulated oscillator with sawtooth waveform:
 
<syntaxhighlight lang="javascript">
 
<syntaxhighlight lang="javascript">
 
   (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&255
 
   (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&255
Line 93: Line 93:
 
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9473119236434ffff34df800 listen to this bytebeat here])
 
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9473119236434ffff34df800 listen to this bytebeat here])
  
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as this should be defined by a sequence table. This example demonstrates a sequence of 8 steps, with ''S=8'' specifying the number of steps. Each step will change the pitch of the resulting saw-tooth waveform.
+
To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as it is defined by a sequence table. This example demonstrates a sequence of 8 steps, where ''S=8'' specifies the number of steps. Each step changes the pitch of the resulting sawtooth waveform.
  
Replacing the trailing "&255" (implicit for a bytebeat) by "&128" would change the saw-tooth waveform into a square-wave function:
+
Replacing the trailing "&255" (implicit for a bytebeat) by "&128" would change the sawtooth waveform to a square wave function:
 
<syntaxhighlight lang="javascript">
 
<syntaxhighlight lang="javascript">
 
   (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&128
 
   (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&128
Line 101: Line 101:
 
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9dc1d9b391be7fffcfb76000 listen to this bytebeat here])
 
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d00000100250000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024ed9dc1d9b391be7fffcfb76000 listen to this bytebeat here])
  
Also other waveforms are possible. Here we use the sine function:
+
Other waveforms are also possible. Here we use the sine function:
 
<syntaxhighlight lang="javascript">
 
<syntaxhighlight lang="javascript">
 
   sin(t*[1,2,4,8,16,8,4,2][(t>>11)%8]/14)*127+127
 
   sin(t*[1,2,4,8,16,8,4,2][(t>>11)%8]/14)*127+127
Line 107: Line 107:
 
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d000001002f0000000000000000399a4a1a8bae05d329e28520c901366398da262860ce3ea49cc63383ad4015395d56ced153c2b5712a75c831dca7c583fffcb53000 listen to this bytebeat here])
 
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d000001002f0000000000000000399a4a1a8bae05d329e28520c901366398da262860ce3ea49cc63383ad4015395d56ced153c2b5712a75c831dca7c583fffcb53000 listen to this bytebeat here])
  
Or distortion-like effects can be applied, like demonstrated here by using the XOR operator in the last step:
+
Or distortion-like effects can be applied, as shown here using the XOR operator in the last step:
 
<syntaxhighlight lang="javascript">
 
<syntaxhighlight lang="javascript">
 
   (t*[1,2,4,8,16,8,4,2][(t>>11)%8])^64
 
   (t*[1,2,4,8,16,8,4,2][(t>>11)%8])^64
Line 113: Line 113:
 
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d00000100240000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024f067719de4f113fffded5400 listen to this bytebeat here])
 
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&bb=5d00000100240000000000000000141d0145bdb13c9159728aa3da7e69b2fed6480708c016cc4525c68500003024f067719de4f113fffded5400 listen to this bytebeat here])
  
'''HINT:''' Instead of using only one modulated oscillator or one sequence, also 2 or more can be used and combined, like i.e.: ''f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2''
+
'''HINT:''' Instead of using only one modulated oscillator or one sequence, also 2 or more can be used and combined, e.g.: ''f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2''
  
So far, this are well known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with one sequence table that holds as many steps ''S'' as we need for our composition, or at least as many as we can effort due to size constraints. The more steps ''S'' we spent, the longer the song will durate before it repeats. The larger the value of each sequence step could be, with a value range limited by ''log2(M)'' bits per step, the larger the range of notes we can use in the end. Both parameters ''S'' and ''M'' will define the final byte-size of our sequence table.
+
So far, these are well known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with a sequence table containing as many steps ''S'' as we need for our composition, or at least as many as we can handle due to size constraints. The more steps ''S'' we use, the longer the song will last before it repeats. The larger the value of each sequence step can be, with a value range limited by ''log2(M)'' bits per step, the larger the range of notes we can end up using. Both parameters ''S'' and ''M'' will define the final byte size of our sequence table.
  
 
== Cascaded Sequences ==
 
== Cascaded Sequences ==
  
The issue we will face with this approach in size-coding is, that such a sequence table will quickly grow and consumes quite a lot of bytes in the end. Our reference example, [https://www.pouet.net/prod.php?which=96071 MUZAK58], spends 10 bytes for all it's song data. If we take our knowledge at this point, then we would be able to use those 10 bytes to divide them into a sequence of 40 steps (''S = 40''), as long as the limited range per step given by 4 bits (''M = 2^4 = 16'') is enough for the music composition we have in mind. 40 steps is not less, but the heavily limited range of less than 1 octave will limit us to something that we would very likely call a children's song in the end. The reference muzak instead sounds like being build out of at least a multiple of 32 steps, before it starts to repeat. And the octave range also does not appear to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table in this way?
+
The problem we will face with this approach in sizecoding products is, that such a sequence table will grow quickly and end up consuming quite a few bytes. Our reference example [https://www.pouet.net/prod.php?which=96071 MUZAK58] uses 10 bytes for all it's song data. Using our knowledge at this point, we would be able to divide these 10 bytes into a sequence of 40 steps (''S = 40''), as long as the limited range per step given by 4 bits (''M = 2^4 = 16'') is sufficient for the music composition we have in mind. 40 steps is not less, but the severely limited range of less than 1 octave will limit us to what we would most likely end up calling a children's song. Instead, the reference muzak sounds like it is made up of at least a multiple of 32 steps before it starts to repeat. And the octave range does not seem to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table like this?
  
The trick discovered by wiRe here is to cascade multiple sequencers and combine all their output into a single sequence of much longer sequence duration (before repetition) and of wider pitch range per sequence step: ''s(t) = (s0(t) * s1(t) * s2(t) * ...) / C''
+
The trick wiRe discovered here is to cascade multiple sequencers and combine all their outputs into a single sequence with a much longer sequence duration (before repetition) and a wider pitch range per sequence step: ''s(t) = (s0(t) * s1(t) * s2(t) * ...) / C''
  
But this will limit the freedom of the composer, you could think now. True! But you will see that the results you can achieve this way are not that bad as you may expect first. Indeed, the resulting limitation can even turn out to give new impulses to creativeness; something we already know as the sizecoding effect.
+
But this limits the composer's freedom, you might think. This is true! But you will see that the results you get are not as bad as you might think at first, in fact the resulting limitation can even give new impulses to creativity; something we already know as the sizecoding effect.
  
Here is some attempt to visualize how such an cascaded sequence will develop over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequencer step count. ''O'' is the time divider to derive the step count ''stepcnt = t / O'' with ''O = log2(ticks_per_step)'' to avoid any integer division.
+
Here is an attempt to visualize how such an cascaded sequence evolves over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequence step counter. ''O'' is the time divider to derive the step count ''stepcnt = t / O'' with ''O = log2(ticks_per_step)'' to avoid any integer division.
 
<syntaxhighlight lang="text">
 
<syntaxhighlight lang="text">
 
   +--------+--------------+---------------------------------------------+
 
   +--------+--------------+---------------------------------------------+
Line 139: Line 139:
 
</syntaxhighlight>
 
</syntaxhighlight>
  
In combination with our oscillator function, the entire bytebeat will finally look like this: ''f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )''
+
In combination with our oscillator function, the whole bytebeat will look like this: ''f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )''
  
 
== Final Bytebeat Implementation ==
 
== Final Bytebeat Implementation ==
  
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration to achieve the smallest possible size. If we decide to use a simple saw-tooth oscillator, we have an easy game with our oscillator function beeing as simple as ''o(t) = t''. As we figured out, the function ''f(t)'' is then only comprised of ''N+1'' terms, all multiplied together like this: ''f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C''. On each loop iteration of the final bytebeat player, the current sequencer ''sN(t)'' is evaluated by calculating the current sequencer index and looking that up inside the sequencer table. The value stored there for this step is then multiplied to the total result of ''f(t)''. If we keep ''M'' low, then even a 16-bit multiplication is enough here. The final scale factor ''C'' depends on the range of the values derived from the sequencer functions ''sN(t)''. Scaling happens as a shift-right-operation in the last step. And with some tweaking of the sequencer step values can even be forced to result in a shift by 8.
+
The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration to achieve the smallest possible size. If we decide to use a simple sawtooth oscillator, we have an easy game with our oscillator function being as simple as ''o(t) = t''. As we found out, the function ''f(t)'' is then only comprised of ''N+1'' terms, all multiplied together like this: ''f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C''. On each loop iteration of the final bytebeat player, the current sequencer ''sN(t)'' is evaluated by calculating the current sequencer index and looking it up in the sequencer table. The value stored there for this step is then multiplied towards the total result of ''f(t)''. If we keep ''M'' low, then even a 16-bit multiplication is sufficient. The final scaling factor ''C'' depends on the range of the values derived from the sequencer functions ''sN(t)''. Scaling is done as a shift-right operation in the last step. And with some tweaking of the sequencer step values can even be forced to result in a shift-right by 8.
  
In the reference implementation, a total of 5 cascaded sequencers was used: ''N=5''. Each sequencer's table was chosen to store 4 sequence steps: ''S=4''. Which sequencer step to index is then based on 2-bits of parameter ''t''. The fastest sequencer step time for this song was chosen to be ''2^10'' timer ticks or samples, giving us ''O=10''. This means the lookup index for each sequencer ''i'' with ''0 <= i < N'' is derived by ''(t>>(O+i))%S''. Each step value is limited by ''M=16''. Putting this all together, we can now start composing one song this way:
+
The reference implementation uses a total of 5 cascaded sequencers: ''N=5''. The table of each sequencer was chosen to store 4 sequence steps: ''S=4''. Which sequencer step to index is then based on 2 bits of the parameter ''t''. The shortest sequencer step time for this song was chosen to be ''2^10'' samples, which gives us ''O=10''. This means that the lookup index for each sequencer ''i'' with ''0 <= i < N'' is derived by ''(t>>(O+i))%S''. Each step value is limited by ''M=16''. Putting all this together, we can now start composing a song in this way:
 
<syntaxhighlight lang="cpp">
 
<syntaxhighlight lang="cpp">
 
   static constexpr unsigned O = 10;
 
   static constexpr unsigned O = 10;
Line 158: Line 158:
 
     return static_cast<uint8_t>(t / C);
 
     return static_cast<uint8_t>(t / C);
 
   }
 
   }
 +
</syntaxhighlight>
 +
Or the same thing written in Javascript:
 +
<syntaxhighlight lang="javascript">
 +
t
 +
* [3,1,4,1][3&t>>10]
 +
* [6,6,12,6][3&t>>11]
 +
* [2,4,2,2][3&t>>12]
 +
* [5,9,4,6][3&t>>13]
 +
* [4,8,4,4][3&t>>14] >> 8
 
</syntaxhighlight>
 
</syntaxhighlight>
 
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&v=circles&bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 listen to this bytebeat here])
 
([https://bytebeat.demozoo.org/#t=0&e=0&s=8000&v=circles&bb=5d000001007000000000000000003a028140b2901c8f2d314244236cb35b1c788f43a8bd95752d36006aa55dbc6cdcbeb9b5eebb4a5495e65c56d4efcd7d11ba349adaa5ca64f88abeeec07f8c411feb6be3fcc21580 listen to this bytebeat here])
Line 163: Line 172:
 
= The Sourcecode =
 
= The Sourcecode =
  
With all these parameters being chosen carefully, the final bytebeat implementation and sequence tables will become very small. Here is the commented source code of [https://www.pouet.net/prod.php?which=96071 MUZAK58]:
+
With all these parameters carefully chosen, the final bytebeat implementation and sequence tables will be very small. Here is the commented source code of [https://www.pouet.net/prod.php?which=96071 MUZAK58]:
 
<syntaxhighlight lang="nasm">
 
<syntaxhighlight lang="nasm">
 
           ;-----------------------------------
 
           ;-----------------------------------
Line 208: Line 217:
 
             popa                              ;61        ;  restore all registers (especially BX, CX, DX, SI)
 
             popa                              ;61        ;  restore all registers (especially BX, CX, DX, SI)
  
   suplp:   ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)
+
   main_lp: ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)
 
             in      al, 40h                  ;E440      ;  read low-byte
 
             in      al, 40h                  ;E440      ;  read low-byte
 
             cmp    al, 148                  ;3Cxx      ;  did timer counter overflowed to 149..0FFh?
 
             cmp    al, 148                  ;3Cxx      ;  did timer counter overflowed to 149..0FFh?
             jo      bbeat                     ;71xx      ;    yes -> play
+
             jo      short bbeat               ;71xx      ;    yes -> play
 +
 
 +
          ;... the place for your intro
  
  bbeat_sk: jmp    short suplp              ;75xx      ;  loop forever
+
            jmp    short main_lp            ;75xx      ;  loop forever
 
</syntaxhighlight>
 
</syntaxhighlight>

Latest revision as of 02:06, 12 June 2024

MUZAK58 was created by wiRe/Napalm and is 58 bytes in size. You can watch the video here. It won 4th place at the Lovebyte 2024 demoscene party and is a pure tech demo of a size-optimized bytebeat player for MSDOS and COVOX LPT-DAC, as also used in other sizecoding releases by it's inventor. This page describes how this player works and how it can be adopted for other releases. Feel free to use these ideas and techniques in your own sizecoding productions, but please give a credit to wiRe then. Commercial use is not allowed.

Before you continue, make sure to read Bytebeat and Output#Producing_sound for the basics.

COVOX LPT-DAC

The COVOX LPT-DAC, also called Disney Sound, is an 8-bit digital-to-analog converter (DAC) connected to the 8 data output lines of a parallel printer port (LPT). Typically it was assembled using a simple R-2R resistor ladder to perform the conversion to an analog signal, so it was very cheap to build such a hardware device on your own at that time. Playing back an 8-bit sample, such as the output of a bytebeat algorithm, through COVOX LPT DAC is a very simple task. Assuming the next sample value is in register AL, then this is all you need to do:

            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device

HINT: It is also possible to have 2 COVOX adapters, e.g. connected to LPT1 and LPT2, and send out two samples in parallel for stereo output, one sample for the left and one for the right channel.

HINT: It is also possible to play a bytebeat over the PC speaker in lower quality, as described here: Output#PC_Speaker_variant

But for a good audio playback quality, the time between two LPT1 writes should match the sampling rate quite well. Also, the bytebeat algorithm needs a time counter as input that reflects the current sample number. Therefore we need a good time source.

Time Source

To play data through the COVOX LPTDAC, we need a fairly accurate timer. A typical sample rate would be 8 kHz, but higher values can also be used. Lower values may also work in some special cases, but then very lo-fi. There are several ways to get such a timer:

  • Timer Interrupt
  • Poll BIOS Counter
  • Poll PIT Counter
  • Alternative Options

Timer Interrupt

As described here: Output#Advanced_PC_Speaker_and_COVOX_sound_via_interrupt. While this is the most accurate way to drive the COVOX, it is probably also the most expensive. Setting up the new interrupt handler (ignoring even the backup and restore of the old handler), the overhead of the handler itself and the problem of exchanging any data between the handler and the non-interrupt code. All of this will cost bytes. In most cases, it will take less size to poll the timer instead, as in all the other variants described next. But it must be also be clear that the polling approach makes it necessary to perform this task at a higher frequency than the actual sampling rate, i.e. 8kHz. This requires the polling to be done in an inner loop, e.g. after every pixel update, which can eat up quite a bit of performance.

Poll BIOS Counter

The DWORD at [0:0x046C] holds the number of BIOS timer ticks. Typically, INT8 runs at a frequency of 18.2 Hz. On each call the default interrupt handler increments this value by 1. Reusing this default handler avoids the cost of writing a custom handler just to implement the counter incrementing logic. So, a simple solution to get a timer counter is to reconfigure the PIT for an 8kHz rate. This will trigger the default INT8 handler 8000 times per second. Then this counter can be polled periodically inside the inner loop. As soon as its LSB changes, another sample must be generated, also using this counter value as sample counter, and sent to LPT1. This could look like this:

            mov     al, 149                   ;B095       ;program PIT #0 to ~8kHz (1.19318181818 MHz / 149 = 8007.93 Hz)
            out     40h, al                   ;E640       ;
            salc                              ;D6         ;  AL := 0 (if CF=0)
            out     40h, al                   ;E640       ;

            ; ...

  suplp:    mov     al, [046Ch]               ;A0xxxx     ;load LSB from BIOS timer, assuming DS=0
  _tcmp:    cmp     al, 0x55                  ;2C??       ;did timer value changed?
            jz      ntick                     ;74xx       ;  no -> skip audio
            mov     [_tcmp+1], al             ;A2xxxx     ;remember last BIOS timer value (selfmodifying code)

            inc     ebp                       ;6645       ;increment 32-bit timer counter

            ; ... set AL to next audio sample based on EBP ...

            mov     dx, 0378h                 ;BA7803     ;load LPT1 port address into DX
            out     dx, al                    ;EE         ;send 8-bit sample data to COVOX device

  ntick:
            ; ...

            jmp     short suplp

HINT: Instead incrementing your own 16- or 32-bit timer counter (EBP inside the above example) someone could also use the BIOS timer counter itself, located at DWORD [0:0x046C]. Whatever suits you better.

Poll PIT Counter

(...more soon...)

Programmable Interval Timer


This solution may result in the shortest code. A disadvantage is the very slow access to the PIT register. On modern chipsets the PIT 8254 is emulated by the southbridge.

Alternative Options

Another way to get an accurate time is to read the processor's timestamp counter using the RDTSC instruction.

The Bytebeat

Bytebeat is simply an 8-bit uncompressed audio wave stream at any fixed sampling rate, that is expressed by a single, more or less complex, mathematical function f(t), where t is the time represented by the number of the sample, which is also equal to the byte offset within the stream. It will start generating the first sample for t=0 and will play the sample f(8000) after exactly 1 second if the sampling rate is 8kHz. Since this is actually a softsynth (music synthesis done by software), in theory any sound or music can be approximated in this way. There are no limits except the increasing complexity of the resulting function.

In general, any bytebeat algorithm can be implemented to generate the next sample to be written to the COVOX LPT1. But in terms of size, a bytebeat algorithm is better suited if it's formula can be implemented in as few bytes as possible. MUZAK58 is to a certain extent a generic or reusable background music player. Of course it is also possible to modify the player itself to change the music style (more on that later), but the source of the music played comes from sequence tables stored in memory. Changing these words will result in completely new music being played. If you use more words for this table, the song becomes more complex so that it does not repeat itself so quickly. The sequence table of this reference example is 10 bytes long and looks like this:

  seqtbl:   dw      0x1413
            dw      0x6C66
            dw      0x2242
            dw      0x6495
            dw      0x4484

The method used to achieve a size-optimized, yet flexible bytebeat is described next.

Music Sequencer

As you can read in many bytebeat tutorials, like Steady_On_Tim by Gasman or in the paper published by Viznut, the basic idea to generate a melody with a bytebeat is to modify some basic waveform oscillator function o(t), like sawtooth, square, triangle or sine waveforms, by multiplying the time parameter t by a scale factor p: f(t) = o(t*p). This multiplication factor modulates the pitch. If we then use a sequence table s(t) to replace p, which will change the pitch of our oscillator over time, we can already play a simple melody using this formula: f(t) = o(t*s(t)).

Accordingly, we implement a single pitch-modulated oscillator with sawtooth waveform:

  (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&255

(listen to this bytebeat here)

To my knowledge, the above code is the simplest way to play a melody in a bytebeat, as long as it is defined by a sequence table. This example demonstrates a sequence of 8 steps, where S=8 specifies the number of steps. Each step changes the pitch of the resulting sawtooth waveform.

Replacing the trailing "&255" (implicit for a bytebeat) by "&128" would change the sawtooth waveform to a square wave function:

  (t*[1,2,4,8,16,8,4,2][(t>>11)%8])&128

(listen to this bytebeat here)

Other waveforms are also possible. Here we use the sine function:

  sin(t*[1,2,4,8,16,8,4,2][(t>>11)%8]/14)*127+127

(listen to this bytebeat here)

Or distortion-like effects can be applied, as shown here using the XOR operator in the last step:

  (t*[1,2,4,8,16,8,4,2][(t>>11)%8])^64

(listen to this bytebeat here)

HINT: Instead of using only one modulated oscillator or one sequence, also 2 or more can be used and combined, e.g.: f(t) = (o0(t*s0(t)) + o1(t*s1(t))) / 2

So far, these are well known techniques used in bytebeat algorithms. With this knowledge we can already start to implement a bytebeat player with a sequence table containing as many steps S as we need for our composition, or at least as many as we can handle due to size constraints. The more steps S we use, the longer the song will last before it repeats. The larger the value of each sequence step can be, with a value range limited by log2(M) bits per step, the larger the range of notes we can end up using. Both parameters S and M will define the final byte size of our sequence table.

Cascaded Sequences

The problem we will face with this approach in sizecoding products is, that such a sequence table will grow quickly and end up consuming quite a few bytes. Our reference example MUZAK58 uses 10 bytes for all it's song data. Using our knowledge at this point, we would be able to divide these 10 bytes into a sequence of 40 steps (S = 40), as long as the limited range per step given by 4 bits (M = 2^4 = 16) is sufficient for the music composition we have in mind. 40 steps is not less, but the severely limited range of less than 1 octave will limit us to what we would most likely end up calling a children's song. Instead, the reference muzak sounds like it is made up of at least a multiple of 32 steps before it starts to repeat. And the octave range does not seem to be limited to a single octave. What the hell is going on here? How is it possible to compress the sequence table like this?

The trick wiRe discovered here is to cascade multiple sequencers and combine all their outputs into a single sequence with a much longer sequence duration (before repetition) and a wider pitch range per sequence step: s(t) = (s0(t) * s1(t) * s2(t) * ...) / C

But this limits the composer's freedom, you might think. This is true! But you will see that the results you get are not as bad as you might think at first, in fact the resulting limitation can even give new impulses to creativity; something we already know as the sizecoding effect.

Here is an attempt to visualize how such an cascaded sequence evolves over time, showing the sequence table index of 5 cascaded sequencers in relation to the sequence step counter. O is the time divider to derive the step count stepcnt = t / O with O = log2(ticks_per_step) to avoid any integer division.

  +--------+--------------+---------------------------------------------+
  | stpcnt | (t>>O)       | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 ... |
  +--------+--------------+---------------------------------------------+
  | seq0ix | (t>>(O+0))%S | 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ... |
  | seq1ix | (t>>(O+1))%S | 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 ... |
  | seq2ix | (t>>(O+2))%S | 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 0 0 0 0 ... |
  | seq3ix | (t>>(O+3))%S | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 ... |
  | seq4ix | (t>>(O+4))%S | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 ... |
  +--------+--------------+---------------------------------------------+
                                                               with S=4

In combination with our oscillator function, the whole bytebeat will look like this: f(t) = o( (t * s0(t) * s1(t) * s2(t) * ...) / C )

Final Bytebeat Implementation

The basic idea is to design the bytebeat algorithm as a loop that performs the same operations on each iteration to achieve the smallest possible size. If we decide to use a simple sawtooth oscillator, we have an easy game with our oscillator function being as simple as o(t) = t. As we found out, the function f(t) is then only comprised of N+1 terms, all multiplied together like this: f(t) = (t * s0(t) * s1(t) * s2(t) * ... * sN-1(t)) / C. On each loop iteration of the final bytebeat player, the current sequencer sN(t) is evaluated by calculating the current sequencer index and looking it up in the sequencer table. The value stored there for this step is then multiplied towards the total result of f(t). If we keep M low, then even a 16-bit multiplication is sufficient. The final scaling factor C depends on the range of the values derived from the sequencer functions sN(t). Scaling is done as a shift-right operation in the last step. And with some tweaking of the sequencer step values can even be forced to result in a shift-right by 8.

The reference implementation uses a total of 5 cascaded sequencers: N=5. The table of each sequencer was chosen to store 4 sequence steps: S=4. Which sequencer step to index is then based on 2 bits of the parameter t. The shortest sequencer step time for this song was chosen to be 2^10 samples, which gives us O=10. This means that the lookup index for each sequencer i with 0 <= i < N is derived by (t>>(O+i))%S. Each step value is limited by M=16. Putting all this together, we can now start composing a song in this way:

  static constexpr unsigned O = 10;
  static constexpr unsigned N = 5;
  static constexpr unsigned S = 4;
  static constexpr unsigned C = 256;

  static constexpr uint8_t seqtbl[N][S] = { {3,1,4,1}, {6,6,12,6}, {2,4,2,2}, {5,9,4,6}, {4,8,4,4} };

  uint8_t get_next_sample( uint16_t t ) {
    for( unsigned i = 0; i < N; i++ ) t *= seqtbl[i][(t>>(O+i))%S];
    return static_cast<uint8_t>(t / C);
  }

Or the same thing written in Javascript:

t
* [3,1,4,1][3&t>>10]
* [6,6,12,6][3&t>>11]
* [2,4,2,2][3&t>>12]
* [5,9,4,6][3&t>>13]
* [4,8,4,4][3&t>>14] >> 8

(listen to this bytebeat here)

The Sourcecode

With all these parameters carefully chosen, the final bytebeat implementation and sequence tables will be very small. Here is the commented source code of MUZAK58:

           ;-----------------------------------
           ; MUZAK58 by wiRe/NpM
           ;-----------------------------------
            section .text
            org     100h

           ;--------------------------------- ;---------- ;muzak sequence table
  seqtbl:   dw      0x1413                    ;1314       ;  t * [3,1,4,1][3&t>>10]       ;! 1314       adc dx,[si]
            dw      0x6C66                    ;666C       ;    * [6,6,12,6][3&t>>11]      ;! 666C       o32 insb
            dw      0x2242                    ;4222       ;    * [2,4,2,2][3&t>>12]       ;! 42         inc dx
            dw      0x6495                    ;9564       ;    * [5,9,4,6][3&t>>13]       ;! 22956484   and dl,[di-0x7b9c]
            dw      0x4484                    ;8444       ;    * [4,8,4,4][3&t>>14] >> 8  ;! 44         inc sp

            mov     al, 0b00010000            ;B010       ;write 8253/8254 PIT command/mode register: resets PIT channel #0
            out     43h, al                   ;E643       ;  [7:6] channel #0, [5:4] LSB only, [3:1] mode0 (one-shot), [0] 16-bit binary

           ;--------------------------------- ;---------- ;present next audio sample (DX:BX = 32-bit sample counter)
  bbeat:    add     al, 149                   ;04xx       ;  calculate new timer period (AL = 42..148)
            out     40h, al                   ;E640       ;  rearm timer

            inc     bx                        ;43         ;  increment 16-bit timer counter

            pusha                             ;60         ;  store all registers
           ;mov     si, seqtbl                ;BExxxx     ;  load address of sequence table into SI (here SI already points to seqtbl by default)
            mov     dx, bx                    ;89DA       ;  load start value into DX
            mov     cl, 5                     ;B1xx       ;  init index counter inside CX (CH must be zero already!)
  bbeat_lp: push    cx                        ;51         ;  store CX counter
            mov     cl, 01100b                ;B1xx       ;  get bit sequence from time into CL
            and     cl, bh                    ;20F9       ;    CL := offset to 1 out of 4 entries
            lodsw                             ;AD         ;  load next sequence table entry (AX := DS:[SI]; SI := SI + 2)
            ror     ax, cl                    ;D3C8       ;  select sequence entry at bit-offset 0, 4, 8 or 12
            and     ax, 01111b                ;83E00F     ;  each sequence entry is 4 bits only (AX &= 15)
            mul     dx                        ;F7E2       ;  multiply (DX:AX := AX ∗ DX)
            xchg    ax, dx                    ;92         ;    DX := updated 16-bit sample
            pop     cx                        ;59         ;  restore CX counter
            shr     bx, 1                     ;D1ED       ;  get next bit sequence from time
            loop    bbeat_lp                  ;E2xx       ;  loop until all bits are out

            mov     al, dh                    ;88F0       ;  get sample data into AL
            mov     dx, 0378h                 ;BA7803     ;  load LPT1 port address into DX
            out     dx, al                    ;EE         ;  send 8-bit sample data to COVOX device
            popa                              ;61         ;  restore all registers (especially BX, CX, DX, SI)

  main_lp: ;--------------------------------- ;---------- ;read 8253/8254 PIT ch#0 counter value (ch#0 must be reconfigured to 0b00010000)
            in      al, 40h                   ;E440       ;  read low-byte
            cmp     al, 148                   ;3Cxx       ;  did timer counter overflowed to 149..0FFh?
            jo      short bbeat               ;71xx       ;    yes -> play

           ;... the place for your intro

            jmp     short main_lp             ;75xx       ;  loop forever