Note: This article was originally written by Jonathan Cauldwell and is reproduced here with permission.
Until now we have drawn all our graphics directly onto the screen, for reasons of speed and simplicity. However, there is one major disadvantage to this method: if the television scan line happens to be covering the particular screen area where we are deleting or redrawing our image then our graphics will appear to flicker. Unfortunately, on the Spectrum there is no easy way to tell where the scan line is at any given point so we have to find a way around this.
One method which works well is to delete and redraw all sprites immediately following a halt instruction, before the scan has a chance to catch up with the image being drawn. The disadvantage to this method is that our sprite code has to be pretty fast, and even then it is not advisable to delete and re-draw more than two sprites per frame because by then the scan will be over the top border and into the screen area. Of course, locating the status panel at the top of the screen might give a little more time to draw our graphics, and if the game is to run at 25 frames per second we could employ a second halt instruction and manoeuvre another couple of sprites immediately afterwards.
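As a rough illustration of this approach, a main loop might be arranged as below. This is only a sketch: delspr, drwspr and update are hypothetical routine names, shown here as empty stubs, and interrupts are assumed to be running in the default mode so that halt returns once per frame.

```asm
; Sketch only: delspr, drwspr and update are hypothetical stubs.
        ei                  ; halt only returns if interrupts are enabled.
mloop   halt                ; wait for the next frame interrupt.
        call delspr         ; erase sprites at their old positions first,
        call drwspr         ; then redraw them while the scan is in the border.
        call update         ; slower game logic can follow once drawing is done.
        jp mloop

delspr  ret                 ; stub: delete sprites.
drwspr  ret                 ; stub: draw sprites.
update  ret                 ; stub: move sprites, read keys etc.
```

For a game running at 25 frames per second, a second halt followed by another pair of delete and draw calls could be placed before the jump back to mloop.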
Ultimately, there comes a point where this breaks down. If our graphics are going to take a little longer to draw we need another way to hide the process from the player, and that means using a second screen as a buffer. All the work involved in drawing and undrawing graphics is then hidden from the player, and all that is visible is each finished frame once it has been drawn.
There are two ways of doing this on a Spectrum. One method will only work on a 128K machine, so we will put that to one side for the time being. The other method actually tends to be more complicated in practice but will work on any Spectrum.
Creating a Screen Buffer
The simplest way to implement double buffering on a 48K Spectrum is to set up a dummy screen elsewhere in RAM, and draw all our background graphics and sprites there. As soon as our screen is complete we copy this dummy screen to the physical screen at address 16384 thus:
           .
    ; code to draw all our sprites etc.
           .
           .
           .
           .
    ; now screen is drawn copy it to physical screen.
           ld hl,49152
           ld de,16384
           ld bc,6912
           ldir
While in theory this is perfect, in practice copying 6912 bytes of RAM (or 6144 bytes if we ignore the colour attributes) to the screen display every frame is too slow for arcade games. LDIR takes 21 T-states per byte, so 6912 bytes cost around 145,000 T-states, roughly two whole frames on a 48K machine, which has just under 70,000 T-states per frame. The secret is to reduce the amount of screen RAM we need to copy each frame, and to find a faster way than by transferring it with the LDIR instruction.
The first step is to decide how big our screen is going to be. Most games separate the screen into two areas: a status panel to display score, lives and other bits of information, and a window where all the action takes place. As we don’t need to update the status panel every frame our dummy screen only needs to be as big as the action window.
So if we were to have an 80×192 pixel status panel at the right edge of the screen, that would leave us a 176×192 pixel window, meaning our dummy screen would only need to be 22 chars wide by 192 pixels high, or 22×192=4224 bytes. Manually moving 4224 bytes from one part of RAM to another is far less painful than manipulating 6144 bytes. The trick is to find a size which is large enough not to restrict gameplay while being small enough to be manipulated quickly. Of course, we may also want to make our buffer a little larger around the edges. While these edges are not displayed on the screen they are useful if we wish to clip sprites as they move into the action window from the sides.
Once we have set our buffer size in stone we need to write a routine to transfer it to the physical display file one or two bytes at a time. While we are at it, we can also re-order our buffer screen to use a more logical layout than the one used by the physical screen. We can make allowances for the peculiar ordering of the Spectrum’s display file in our transfer routine, meaning any graphics routines which make use of our dummy screen buffer can be simplified.
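To show how much simpler a linear buffer makes things, here is a minimal sketch of an address calculation. It assumes a buffer 22 bytes wide, stored one pixel line after another from top to bottom at a hypothetical address BUFFER; the label names are made up for this example.

```asm
BUFFER  equ 49152           ; assumed buffer address, 22 bytes per pixel line.

; Entry: B = pixel line (0-191), C = character column (0-21).
; Exit:  HL = address of that byte in the buffer.
bufadr  ld l,b
        ld h,0              ; HL = y.
        ld d,h
        ld e,l              ; DE = y.
        add hl,hl           ; 2y.
        add hl,hl           ; 4y.
        add hl,de           ; 5y.
        add hl,hl           ; 10y.
        add hl,de           ; 11y.
        add hl,hl           ; 22y.
        ld e,c              ; D is still zero, so DE = x.
        add hl,de           ; 22y + x.
        ld de,BUFFER
        add hl,de           ; add the start of the buffer.
        ret
```

With this layout, moving down one pixel line is simply a matter of adding 22 to the address, with none of the bit shuffling the display file demands.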
There are two really quick ways of moving a dummy screen to the display screen. The first, and simplest, method is to use lots of unrolled LDI instructions. The second, and more complicated, method makes use of PUSH and POP to transfer the data.
Let us start with LDI. If our buffer is 22 chars wide we might transfer a single line from the buffer to the screen display with 22 consecutive LDI instructions. It is much quicker to use lots of LDI instructions than a single LDIR: each LDI takes 16 T-states per byte against 21 for LDIR. We could write a routine to transfer our data across a single line at a time, pointing HL to the start of each line of the buffer and DE to the line on the screen where it is needed, followed by 22 LDI instructions to move the data across. However, as each LDI instruction takes two bytes of code, it stands to reason that a fully unrolled routine would be at least twice the size of the buffer it moved, a considerable hit when dealing with a little over 40K of useful RAM. You may instead wish to move the LDI instructions to a subroutine which copies a pixel line, or perhaps a group of 8 pixel lines, at a time. This routine could then be called from within a loop, unrolled or not, which takes care of the HL and DE registers.
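The looped version might be sketched as follows. This is an illustration rather than a finished routine: it assumes a linear buffer 22 bytes wide and 192 lines high at a hypothetical address BUFFER, displayed from the left edge of the screen. The nxtlin subroutine steps DE down one pixel line in the display file.

```asm
BUFFER  equ 49152           ; assumed buffer address, 22 bytes per line.
SCREEN  equ 16384           ; start of the display file.

xfrbuf  ld hl,BUFFER        ; HL runs straight through the buffer.
        ld de,SCREEN        ; DE = first screen line of the window.
        ld b,192            ; number of pixel lines to copy.
xfrb0   push bc             ; LDI decrements BC, so save the counter.
        push de             ; remember the start of this screen line.
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi
        ldi                 ; 22 bytes moved.
        pop de              ; back to the start of the screen line.
        call nxtlin         ; DE = start of the line below.
        pop bc
        djnz xfrb0          ; repeat for every line.
        ret

; Move DE down one pixel line in the display file.
nxtlin  inc d               ; next pixel line within the character cell.
        ld a,d
        and 7
        ret nz              ; still inside the same character row.
        ld a,e
        add a,32            ; move to the next character row.
        ld e,a
        ret c               ; crossed into the next third, D is already right.
        ld a,d
        sub 8               ; same third, so undo the step into the next one.
        ld d,a
        ret
```

Only the transfer routine needs to know about the display file layout here; HL simply counts upwards through the buffer.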
The second method is to transfer the buffer to the screen using PUSH and POP instructions. While this does have the advantage of being the fastest way there is, there are drawbacks. You do need complete control of the stack pointer so you can’t have any interrupts occurring mid-way through the routine. The stack pointer must be stored away somewhere first, and restored immediately afterwards.
The Spectrum’s stack is usually located below your program code, but this method involves setting the stack to point to each part of the buffer in turn, and then using POP to copy the contents of the dummy screen buffer into each of the register pairs in turn. The stack pointer is then moved to the relevant point in the screen display RAM, before the registers are PUSHed into memory in the reverse order to that in which they were POPped. That is, values are POPped from the buffer going from the start of each line, and PUSHed to the screen in the reverse order, going from the end of the line to the beginning.
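Stripped down to its bare bones, the idea looks something like this. The sketch copies just six bytes and assumes a hypothetical buffer address BUFFER; the DI and EI around it are one way of keeping interrupts away while the stack pointer is borrowed.

```asm
BUFFER  equ 49152           ; assumed buffer address.
SCREEN  equ 16384           ; start of the display file.

xfer6   di                  ; no interrupts while SP points at our data.
        ld (stptr),sp       ; store stack pointer.
        ld sp,BUFFER        ; SP = first of six buffer bytes.
        pop bc              ; bytes 0 and 1.
        pop de              ; bytes 2 and 3.
        pop hl              ; bytes 4 and 5.
        ld sp,SCREEN+6      ; SP = just past the six destination bytes.
        push hl             ; bytes 4 and 5 go in first,
        push de             ; then bytes 2 and 3,
        push bc             ; and finally bytes 0 and 1.
        ld sp,(stptr)       ; restore stack pointer.
        ei
        ret

stptr   defw 0              ; storage for the real stack pointer.
```

Because POP reads the low byte first and PUSH writes the high byte first, the bytes arrive on screen in their original order.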
Below is the gist of the screen transfer routine from Rallybug. This used a buffer 30 characters wide, with 28 characters visible on screen. The remaining 2 characters were not displayed, so that sprites moved onto the screen gradually from the edge rather than suddenly appearing from nowhere. As the visible screen is 28 characters wide, copying a whole line in one go would require 14 16-bit register pairs. Obviously, the Z80A doesn’t have this many, even counting the alternate registers and IX and IY. As such, the Rallybug routine splits the display into two halves of 14 bytes each, requiring just 7 register pairs. The routine sets the stack pointer to the beginning of each buffer line in turn, then POPs 8 bytes into AF, BC, DE and HL. It then swaps BC, DE and HL into the alternate register set with EXX (AF is not affected by EXX, so it stays where it is), and POPs 6 more bytes into BC, DE and HL. These registers now need to be unloaded into the screen area, so the stack pointer is set to point to the end of the relevant screen line, and HL, DE and BC are PUSHed into position. A second EXX brings back the first set of registers, and HL, DE, BC and AF are PUSHed into position in that order. This is repeated over and over again for each half of each screen line, before the stack pointer is restored to its original position.
Complicated, yes. But incredibly fast.
    SEG1   equ 16514
    SEG2   equ 18434
    SEG3   equ 20482

    P0     equ 0
    P1     equ 256
    P2     equ 512
    P3     equ 768
    P4     equ 1024
    P5     equ 1280
    P6     equ 1536
    P7     equ 1792

    C0     equ 0
    C1     equ 32
    C2     equ 64
    C3     equ 96
    C4     equ 128
    C5     equ 160
    C6     equ 192
    C7     equ 224

    xfer   ld (stptr),sp       ; store stack pointer.

    ; Character line 0.

           ld sp,WINDOW        ; start of buffer line.
           pop af
           pop bc
           pop de
           pop hl
           exx
           pop bc
           pop de
           pop hl
           ld sp,SEG1+C0+P0+14 ; end of screen line.
           push hl
           push de
           push bc
           exx
           push hl
           push de
           push bc
           push af
           .
           .
           ld sp,WINDOW+4784   ; start of buffer line.
           pop af
           pop bc
           pop de
           pop hl
           exx
           pop bc
           pop de
           pop hl
           ld sp,SEG3+C7+P7+28 ; end of screen line.
           push hl
           push de
           push bc
           exx
           push hl
           push de
           push bc
           push af

    okay   ld sp,(stptr)       ; restore stack pointer.
           ret
Scrolling the Buffer
Now we have our dummy screen, we can do anything we like to it without the risk of flicker or other graphical anomalies, because we only transfer the buffer to the physical screen when we have finished building the picture. We can place sprites, masked or otherwise, anywhere we like and in any order we like. We can move the screen around, and animate the background graphics, and most importantly, we can now scroll in any direction.
Different techniques are required for different types of scrolling, although they all have one thing in common: as scrolling is a processor-intensive task, unrolled loops are the order of the day. The simplest type of scroll is a left/right single pixel scroll. A right single pixel scroll requires us to set the HL register pair to the start of the buffer, then execute the following two instructions over and over again until we reach the end of the buffer:
           rr (hl)             ; rotate carry flag and 8 bits right.
           inc hl              ; next buffer address.
Similarly, to execute a left single-pixel scroll we set HL to the last byte of the buffer and execute these two instructions until we reach the beginning of the buffer:
           rl (hl)             ; rotate carry flag and 8 bits left.
           dec hl              ; next buffer address.
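Wrapped up as subroutines which scroll a single buffer line, the two scrolls might look like the sketch below. The 22-byte line width and the label names are assumptions made for the example, and a simple DJNZ loop is used in place of unrolled code to keep the listing short. Clearing the carry flag first means a blank pixel enters at the edge.

```asm
; Scroll one buffer line right by a pixel.
; Entry: HL = first byte of the line.
scrlr   and a               ; clear carry so a blank pixel enters on the left.
        ld b,22             ; assumed line width in bytes.
scrlr0  rr (hl)             ; rotate carry flag and 8 bits right.
        inc hl              ; 16-bit increment leaves the carry flag alone.
        djnz scrlr0         ; djnz leaves the flags alone too.
        ret

; Scroll one buffer line left by a pixel.
; Entry: HL = last byte of the line.
scrll   and a               ; clear carry so a blank pixel enters on the right.
        ld b,22             ; assumed line width in bytes.
scrll0  rl (hl)             ; rotate carry flag and 8 bits left.
        dec hl              ; previous buffer address.
        djnz scrll0
        ret
```

On return the carry flag holds the pixel which dropped off the end of the line, should we wish to feed it back in at the other side.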
For most of the time, however, we can get away with only incrementing or decrementing the L register, instead of the HL pair, speeding up the routine even more. This does have the drawback of having to know exactly when the high order byte of the address changes. For this reason, I usually set my buffer address in stone right at the beginning of the project, often at the very top of RAM, so I don’t have to rewrite the scrolling routines when things get shifted around during the course of a project. As with the routine to transfer the buffer to the physical screen, a massive unrolled loop is very expensive in terms of RAM, so it is a good idea to write a smaller unrolled loop which scrolls, say, 256 bytes at a time, then call it 20 or so times, depending upon the chosen buffer size.
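A sketch of this arrangement follows. It assumes, purely for illustration, a buffer which starts on a 256-byte page boundary at a hypothetical address BUFFER and is exactly 16 pages long. It treats the buffer as one continuous stream, so a pixel leaving the end of one line enters the start of the next; with undisplayed edge columns this should not be visible.

```asm
BUFFER  equ 49152           ; assumed: low byte zero, 16 pages of 256 bytes.

scrrgt  ld hl,BUFFER        ; start of buffer, L = 0.
        ld c,16             ; number of 256-byte pages to scroll.
        and a               ; clear carry for the very first byte.
scrr0   call scr256         ; scroll one page, L wraps back to zero.
        inc h               ; next page; 8-bit inc leaves carry alone.
        dec c               ; 8-bit dec leaves carry alone as well.
        jr nz,scrr0
        ret

; Scroll 256 bytes right by a pixel, 8 bytes per pass.
scr256  ld b,32             ; 32 passes of 8 bytes.
scr0    rr (hl)
        inc l
        rr (hl)
        inc l
        rr (hl)
        inc l
        rr (hl)
        inc l
        rr (hl)
        inc l
        rr (hl)
        inc l
        rr (hl)
        inc l
        rr (hl)
        inc l
        djnz scr0
        ret
```

Neither the 8-bit INC and DEC nor DJNZ touch the carry flag, so the pixel in transit survives from one byte to the next, and from one page to the next.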
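In addition to scrolling one pixel at a time, we can scroll four pixels fairly quickly too. By replacing rl (hl) with rld in the left scroll, and rr (hl) with rrd in the right scroll, we can move the display four pixels at a time. These instructions rotate a 4-bit nibble rather than a single bit, with the low nibble of the accumulator doing the job the carry flag did before.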
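As a sketch, a four-pixel left scroll of a single buffer line could be written like this, again assuming a 22-byte line and a made-up label name:

```asm
; Scroll one buffer line left by four pixels.
; Entry: HL = last byte of the line.
scrl4   xor a               ; low nibble of A = 0, so blank pixels enter.
        ld b,22             ; assumed line width in bytes.
scrl40  rld                 ; A's low nibble goes into the byte's low nibble,
                            ; the byte's old high nibble drops into A.
        dec hl              ; previous buffer address.
        djnz scrl40
        ret
```

On return the low nibble of A holds the four pixels which fell off the left edge of the line.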
Vertical scrolling is done by shifting bytes around in RAM, in much the same way as the routine to transfer the dummy screen to the physical one. To scroll up one pixel, we set our FROM address to be the start of the second pixel line, the TO address to the address of the start of the buffer, then copy the data from the FROM address to the TO address until we reach the end of the buffer. To scroll down, we have to work in the opposite direction, so we set our FROM address to the end of the penultimate line of the buffer, our TO address to the end of the last line, and work backwards until we reach the start of the buffer. The added advantage of vertical scrolling is that we can scroll up or down by more than one line, simply by altering the addresses, and the routine will run just as quickly. Generally speaking, it isn’t a good idea to scroll by more than one pixel if your frame rate is lower than 25 frames per second, because the screen will appear to judder.
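The FROM and TO addresses for a one-line scroll might be set up as below. The sketch assumes a linear buffer 22 bytes wide and 192 lines high at a hypothetical address BUFFER, and uses LDIR and LDDR for brevity where a faster version would use unrolled LDI and LDD instructions.

```asm
BUFFER  equ 49152           ; assumed buffer address.
WIDTH   equ 22              ; assumed bytes per line.
HEIGHT  equ 192             ; assumed number of lines.

; Scroll the buffer up by one pixel line.
scrlup  ld hl,BUFFER+WIDTH  ; FROM: start of the second line.
        ld de,BUFFER        ; TO: start of the buffer.
        ld bc,WIDTH*(HEIGHT-1)
        ldir                ; work forwards through the buffer.
        ret

; Scroll the buffer down by one pixel line.
scrldn  ld hl,BUFFER+(WIDTH*(HEIGHT-1))-1 ; FROM: end of penultimate line.
        ld de,BUFFER+(WIDTH*HEIGHT)-1     ; TO: end of the last line.
        ld bc,WIDTH*(HEIGHT-1)
        lddr                ; work backwards through the buffer.
        ret
```

After either scroll the line at the trailing edge still holds its old contents, so it needs to be filled with new graphics. Scrolling by more lines is a matter of changing WIDTH in the FROM address, and the byte count, to a multiple of the line width.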
There is one other technique that can be employed with vertical scrolling, and it is one I employed when writing Megablast for Your Sinclair. This involves treating the dummy screen as wrap-around. In other words, you still use the same amount of RAM for the dummy buffer, but the part of the buffer from which you start copying to the top of the screen can change from one frame to the next. When you reach the end of the buffer, you skip back to the beginning. With this system, the routine to copy the buffer takes the address of the start of the buffer from a 16-bit pointer which could point to any line in the buffer, and copies the data to the physical screen line by line until it reaches the end of the buffer. At this point, the routine copies the data from the start of the buffer to the remainder of the physical screen. This makes the transfer routine a little slower, and complicates any other graphics routines – which also have to go back to the first line whenever they go beyond the last line in the buffer. It does, on the other hand, mean that no data needs to be shifted in order to scroll the screen. By changing the 16-bit pointer to the line which is first copied to the physical screen, scrolling is done automatically when the buffer is transferred.
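The pointer update at the heart of this technique might be sketched as follows. This is not the Megablast code, just an illustration which assumes a linear buffer 22 bytes wide and 192 lines high at a hypothetical address BUFFER, with topptr as a made-up name for the 16-bit pointer.

```asm
BUFFER  equ 49152           ; assumed buffer address.
WIDTH   equ 22              ; assumed bytes per line.
HEIGHT  equ 192             ; assumed number of lines.
BUFEND  equ BUFFER+(WIDTH*HEIGHT)

; Scroll up one line by moving the pointer to the first displayed line.
scrlwr  ld hl,(topptr)      ; line currently copied to the top of the screen.
        ld de,WIDTH
        add hl,de           ; point to the line below.
        ld de,BUFEND
        and a               ; clear carry for the subtraction.
        sbc hl,de           ; compare the pointer with the end of the buffer.
        add hl,de           ; restore it; carry set if pointer < BUFEND.
        jr c,scrlw0         ; still inside the buffer.
        ld hl,BUFFER        ; gone past the end, so wrap to the first line.
scrlw0  ld (topptr),hl
        ret

topptr  defw BUFFER         ; first buffer line to copy to the screen.
```

The transfer routine would then begin copying from the address held in topptr rather than from the start of the buffer.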