Making Games Load and Run Automatically
Note: This article was originally written by Jonathan Cauldwell and is reproduced here with permission.
While this is simple enough to achieve for an experienced Sinclair BASIC programmer, it is an area often overlooked. In particular, programmers migrating to the Spectrum from other machines will not be familiar with the way this is done.
In order to run a machine code routine, we have to start it from BASIC. This means writing a small BASIC loader program, which clears the space for the machine code, loads that code, and then runs it. The simplest sort of loader would be along these lines:
10 CLEAR 24575: LOAD ""CODE: RANDOMIZE USR 24576
The first command, CLEAR, sets RAMTOP below the area occupied by the machine code, so BASIC doesn’t overwrite it. It also clears the screen and moves the stack out of the way. The number that follows should usually be one byte below the first byte of your game. LOAD ""CODE loads the next code file on the tape, and RANDOMIZE USR effectively calls the machine code routine at the address specified, in this case 24576. This should be the entry point for your game. On a Spectrum, the ROM sits in the first 16K, and this is followed by various other things such as screen RAM, system variables and BASIC. A safe place for your code is above this area, all the way up to the top of RAM at address 65535. With just a short BASIC loader, a start address of 24576, or even 24000, will give you plenty of room for your game.
This loader program is then saved to tape using a command like this:
SAVE "name" LINE 10
LINE 10 indicates that on loading, the BASIC program is to auto-run from line 10.
After the BASIC loader comes the code file. You can save a code file like this:
SAVE "name" CODE 24576,40960
CODE tells the Spectrum to save a code file, as opposed to BASIC. The first number after this is the start address of the block of code, and the last number is its length.
That is simple enough, but what if we want to add a loading screen? Well, that is straightforward enough. We can load a screen using
LOAD ""SCREEN$
What this will do is load a block of code up to 6912 bytes long to the start of the screen display at address 16384. Getting the screen file onto tape is a bit trickier, because we cannot simply save out the screen as a file: the bottom two lines would be overwritten with the "Start tape, then press any key" message. So we load our picture into a point in RAM – say, 32768 – then use
SAVE "name" CODE 32768,6912
6912 is the size of the Spectrum’s display RAM. When we reload the block from tape using LOAD ""SCREEN$, we are specifying that we want to force the code file to be loaded into screen memory. Under these circumstances it doesn’t matter where the code file was located when it was saved.
Now we have another problem: wouldn’t the Bytes: name message that is printed up on loading the code block overwrite part of the screen? Well, yes it would. We can overcome this by poking the output stream.
POKE 23739,111
will do the trick for us. So our BASIC loader now looks like this:
10 CLEAR 24575: LOAD ""SCREEN$: POKE 23739,111: LOAD ""CODE: RANDOMIZE USR 24576
Setting up your own interrupts can be a nightmare the first time you try it, as it is a complicated business. With practice, it becomes a little easier. To make the Spectrum run our own interrupt routine, we have to tell it where the routine is, put the machine into interrupt mode 2, and ensure that interrupts are enabled. Sound simple enough? The tricky part is telling the Spectrum where our routine is located.
With the machine in mode 2, the Z80 uses the I register to determine the high byte of the address of the pointer to the interrupt service routine address. The low byte is supplied by the hardware. In practice, we never know what the low byte is going to be – so you see the problem? The low byte could be 0, it could be 255, or it could be anywhere in between. This means we need a whole block of 257 bytes consisting of pointers to the start address of our service routine. As the low byte supplied by the hardware could be odd or even, we have to make sure that the low byte and the high byte of the address of our service routine are identical. This seriously restricts where we can locate our routine.
We should also only locate our table of pointers and our routine in uncontended RAM. Do not place them below address 32768. Even paging in an uncontended RAM bank for the purpose, such as bank 1, will produce problems on certain models of Spectrum. Personally, I find bank 0 to be as good a place as any.
Let us say we choose address 51400 as the location of our interrupt routine. This is valid as both the high byte and low byte are 200, since 200*256+200 = 51400. We then need a table of 129 pointers all pointing to this address, or 257 instances of defb 200, located at the start of a 256-byte page boundary. Assuming we put it high up out of the way, we could start it at 254*256 = 65024.
We would do this:
       org 51400

int                        ; interrupt service routine.

       org 65024           ; pointers to interrupt routine.

       defb 200,200,200,200
       defb 200,200,200,200
       .
       .
       defb 200,200,200,200
       defb 200
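Typing out 257 defb values is tedious, so as an alternative (a sketch, not part of the original listing) the same table can be filled in at run time before interrupt mode 2 is selected. It assumes the table lives at 65024 and the routine at 51400, as in the example above; the label mktab is made up for illustration.

```asm
; Hypothetical helper: fill the 257-byte pointer table with the value 200.
mktab  ld hl,65024         ; first byte of table.
       ld de,65025         ; second byte of table.
       ld bc,256           ; 256 more bytes to fill, 257 in total.
       ld (hl),200         ; seed the first byte.
       ldir                ; copy it along the rest of the table.
       ret
```

The LDIR copies each byte to the one after it, so the single seeded value ripples through addresses 65024 to 65280 inclusive.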
Ugh! Still, now we come to our interrupt routine. Interrupts can occur during any period, so we have to preserve any registers we are likely to use, perform our code, optionally call the ROM service routine, restore the registers, re-enable interrupts, then return from the interrupt with a RETI. Our routine might resemble this:
int    push af             ; preserve registers.
       push bc
       push hl
       push de
       push ix
       call 49158          ; play music.
       rst 56              ; ROM routine, read keys and update clock.
       pop ix              ; restore registers.
       pop de
       pop hl
       pop bc
       pop af
       ei                  ; always re-enable interrupts before returning.
       reti                ; done.
If you are not reading the keyboard via the system variables you may wish to dispense with the RST 56. Doing so will free up the IY register. However, if your game’s timing counts the frames using the method described in the timing chapter, you will need to increment the timer yourself:
       ld hl,23672         ; frames counter.
       inc (hl)            ; move it along.
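Putting those two pieces together, a service routine that skips the ROM call entirely might look something like the sketch below. It only preserves the registers it actually touches; anything else you call from here (a music player, say) would need its own registers saved too.

```asm
; Sketch of a minimal service routine with no RST 56.
int    push af             ; preserve only the registers we touch.
       push hl
       ld hl,23672         ; frames counter.
       inc (hl)            ; move it along.
       pop hl              ; restore registers.
       pop af
       ei                  ; re-enable interrupts before returning.
       reti
```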
With all this in place, we are ready to set off our interrupts. We have to point the I register at the table of pointers and select interrupt mode 2. This code will do the job for us:
       di                  ; interrupts off as a precaution.
       ld a,254            ; high byte of pointer table location.
       ld i,a              ; set high byte.
       im 2                ; select interrupt mode 2.
       ei                  ; enable interrupts.
Music and AY Effects
Introduced with the 128K models and pretty much standard since then, the AY-3-8912 is a very popular sound chip, used on various other computers, not to mention video games and pinball machines. Basically, it has 14 registers, which can be written to, or read from, via in and out instructions.
The first six registers control the tone for each of the three channels, and are paired in the little-endian way we would expect, i.e. register 0 is channel A tone low, register 1 is channel A tone high, register 2 is channel B tone low, and so on. Register 6 controls the white noise period; values of 0 to 31 are valid, where 0 gives the highest frequency noise and 31 the lowest. Register 7 is the mixer control. Bits d0-d5 select white noise, tone, neither, or both. To enable tone or white noise, a bit must be reset, so 0 outputs tone and noise from all three channels, and 63 outputs nothing. Registers 8, 9 and 10 are envelope/amplitude controls. A value of 16 tells the chip to use the envelope generator, while 0-15 sets the volume for that channel directly.
In practice, you are better off controlling the volume yourself, along with the tone. By varying these from one frame to the next, it is possible to produce a variety of very good sound effects. Should you wish to use the envelope generator, the next two registers, 11 and 12, are paired to form the 16-bit period of the envelope, and register 13 determines the pattern. The 128K manual explains the full list of patterns, but I won’t cover them here as I have never found them particularly useful myself.
To read a register, we write the number of the register to port 65533, then immediately read that port. To write to a register, we again send the number of the register to port 65533, and then the value to 49149. To the uninitiated, Z80 opcodes don’t appear to be capable of writing to 16-bit port addresses. Don’t let that confuse you, it’s just that the way they are written is misleading. out (c),a actually means out (bc),a and out (n),a actually does out (a*256+n),a.
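As a small illustration of the read sequence just described, this sketch fetches the channel A volume from register 8. The label rdvol is made up, and masking with 15 assumes the envelope generator is not in use for that channel.

```asm
; Sketch: read AY register 8 (channel A amplitude) into the accumulator.
rdvol  ld bc,65533         ; register select port.
       ld a,8              ; register we want to read.
       out (c),a           ; select it.
       in a,(c)            ; read its value back from the same port.
       and 15              ; keep the four volume bits.
       ret
```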
Reading a sound chip register does have some uses. You might want to read the volume registers 8, 9 and 10 and display volume bars – I did something along those lines in Egghead 5. Also, believe it or not, the Sinclair light gun is read via sound chip register 14. Yes, really. It only actually yields two pieces of information, whether or not the trigger is pressed, and whether or not the gun is pointed at a bright part of the screen, or indeed any bright object. It is up to the programmer what he does with that information.
Here’s some rather basic code to write to the sound chip:
; Write the contents of our AY buffer to the AY registers.

w8912  ld hl,snddat        ; start of AY-3-8912 register data.
       ld e,0              ; start with register 0.
       ld d,14             ; 14 to write.
       ld c,253            ; low byte of port to write.
w8912a ld b,255            ; 255*256+253 = port 65533 = select soundchip register.
       out (c),e           ; tell chip which register we're writing.
       ld a,(hl)           ; value to write.
       ld b,191            ; 191*256+253 = port 49149 = write value to register.
       out (c),a           ; this is what we're putting there.
       inc e               ; next sound chip register.
       inc hl              ; next byte to write.
       dec d               ; decrement loop counter.
       jp nz,w8912a        ; repeat until done.
       ret

snddat defw 0              ; tone registers, channel A.
       defw 0              ; channel B tone registers.
       defw 0              ; as above, channel C.
sndwnp defb 0              ; white noise period.
sndmix defb 60             ; tone/noise mixer control.
sndv1  defb 0              ; channel A amplitude/envelope generator.
sndv2  defb 0              ; channel B amplitude/envelope.
sndv3  defb 0              ; channel C amplitude/envelope.
sndenv defw 600            ; duration of each note.
       defb 0
By calling w8912 once every iteration of the main loop, the sound is constantly updated. It is then up to you to update the buffer as each noise changes from one frame to the next. Think of it as “animating” sound. However, just because you stop updating the sound registers the sound won’t stop playing. The AY chip will keep playing your tone or noise until instructed to stop. A quick way to do this is to set the three amplitude registers to 0. In the example above, write a zero to sndv1, sndv2 and sndv3 then call w8912.
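The silencing step described above could be wrapped up as a small routine like this one. It is only a sketch built on the labels from the listing; the name silenc is invented here.

```asm
; Sketch: zero the three amplitude entries in the buffer and send it to the chip.
silenc xor a               ; volume of zero.
       ld (sndv1),a        ; channel A off.
       ld (sndv2),a        ; channel B off.
       ld (sndv3),a        ; channel C off.
       jp w8912            ; write buffer to chip and return from there.
```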
Using Music Drivers
Most 128K music drivers, and some 48K ones, have two entry points: an initialisation/termination routine which stops all sound and resets the driver to the beginning of the tune, and a service routine to be called repeatedly, usually 50 times per second. A good place to store music is usually 49152, the start of the switchable RAM bank area. If you know the start address of the driver, or are in a position to determine this yourself, the very beginning is usually the initialisation address which sets up your music at the start. More often than not, the code at this point either simply jumps to another address, or loads a register or register pair before jumping elsewhere. The service routine tends to immediately follow this jp instruction. If your driver has no other documentation, you may have to disassemble the code to find this address, usually 3-6 bytes in.
To use a music driver, call the initialisation address prior to starting, and also when you want to turn it off. Between these points, you need to call the service routine repeatedly. This can either be done manually, or by setting up an interrupt to do the job automatically. If you choose to do this manually, for example in menu code, bear in mind that clearing the screen and displaying a menu, high score table, instructions etc will take more than 1/50th of a second to do, so this will delay your routine and could sound odd. It might be better to write a routine to clear the screen over several frames with some sort of special effect, punctuated with halts and calls to the service routine every frame.
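Calling the driver manually might be sketched as follows. The two addresses are assumptions for illustration only (initialisation at 49152, service routine 6 bytes in at 49158); check them against your own driver. The loop simply plays the tune for 250 frames, about five seconds, and then shuts it off.

```asm
musini equ 49152           ; assumed initialisation/termination address.
musply equ 49158           ; assumed service routine address.

       call musini         ; set the tune up from the start.
       ld b,250            ; number of frames to play for.
mloop  push bc             ; driver may corrupt our loop counter.
       halt                ; wait for the next frame.
       call musply         ; service the music once per frame.
       pop bc              ; restore loop counter.
       djnz mloop          ; keep going.
       call musini         ; same entry point again to silence the music.
       ret
```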
Adding and subtracting is straightforward enough on the Spectrum’s CPU, we have an abundance of instructions to perform these tasks. But unlike some later processors in the series, the programmer of the Z80A has to do his own multiplication and division. While such calculations are rare, they have their uses in certain types of game, and until you have routines to do the job, certain things are very tricky to do. For example, without Pythagoras’ theorem, it can be difficult to program an enemy sprite to shoot at the player with any degree of accuracy.
Suppose sprite A needs to fire a shot at sprite B. We need to find the angle at which sprite A is to fire, and some trigonometry is necessary to do this. We know the coordinates of the sprites, Ax, Ay, Bx and By, and the distances between these, Bx-Ax and By-Ay, will give us the opposite and adjacent line lengths. Unfortunately, the only way to calculate the angle from the opposite and adjacent is to use arctangent, and as tangents are only suitable for certain angles, we are better off using sine or cosine instead. So in order to find the angle from sprite A to sprite B, we need to find the length of the hypotenuse.
The hypotenuse is calculated by squaring the x distance, adding it to the square of the y distance, then finding the square root. There are routines in the Sinclair ROM to do all of this, but there is one serious drawback: as anyone who has ever used Sinclair BASIC will tell you, the maths routines are incredibly slow for writing games. So we have to knuckle down and write our own.
Squaring our x and y distances means using a multiplication routine and multiplying the numbers by themselves. Thankfully, this part is relatively painless. Multiplication is achieved in the same way as you would perform long multiplication on paper, although this time we are working in binary. All that is required is shifting, testing bits, and adding. Where a bit exists in our first factor, we add the second factor to the total. Then we shift the second factor left, and test the next bit along in our first factor. The routine below, taken from Kuiper Pursuit, demonstrates the technique by multiplying H by D and returning the result in HL.
imul   ld e,d              ; HL = H * D
       ld a,h              ; make accumulator first multiplier.
       ld hl,0             ; zeroise total.
       ld d,h              ; zeroise high byte so de=multiplier.
       ld b,8              ; repeat 8 times.
imul1  rra                 ; rotate rightmost bit into carry.
       jr nc,imul2         ; wasn't set.
       add hl,de           ; bit was set, so add de.
       and a               ; reset carry.
imul2  rl e                ; shift de 1 bit left.
       rl d
       djnz imul1          ; repeat 8 times.
       ret
Now we need a square root, which is where our problems begin. Square roots are a lot more complicated. This means doing a lot of divisions, so first we need a division routine. This can be seen to work in the opposite way to multiplication, by shifting and subtracting. The next routine, also from Kuiper Pursuit, divides HL by D and returns the result in HL.
idiv   ld b,8              ; bits to check.
       ld a,d              ; number by which to divide.
idiv3  rla                 ; check leftmost bit.
       jr c,idiv2          ; no more shifts required.
       inc b               ; extra shift needed.
       cp h
       jr nc,idiv2
       jp idiv3            ; repeat.
idiv2  xor a
       ld e,a
       ld c,a              ; result.
idiv1  sbc hl,de           ; do subtraction.
       jr nc,idiv0         ; no carry, keep the result.
       add hl,de           ; restore original value of hl.
idiv0  ccf                 ; reverse carry bit.
       rl c                ; rotate in to ac.
       rla
       rr d                ; divide de by 2.
       rr e
       djnz idiv1          ; repeat.
       ld h,a              ; copy result to hl.
       ld l,c
       ret
In the same way that multiplication is made up of shifting and adding, and division is done via shifting and subtracting, so square roots can be calculated by shifting and dividing. We’re simply trying to find the “best fit” number which, when multiplied by itself, gives us the number with which we started. I won’t go into detailed explanation as to how the following routine works – if you really are that interested, follow my comments and step it through a debugger. Taken from Blizzard’s Rift, it returns the square root of HL in the accumulator.
isqr   ld (sqbuf0),hl      ; number for which we want to find square root.
       xor a               ; zeroise accumulator.
       ld (sqbuf2),a       ; result buffer.
       ld a,128            ; start division with highest bit.
       ld (sqbuf1),a       ; next divisor.
       ld b,8              ; 8 bits to divide.
isqr1  push bc             ; store loop counter.
       ld a,(sqbuf2)       ; current result.
       ld d,a
       ld a,(sqbuf1)       ; next bit to check.
       or d                ; combine with divisor.
       ld d,a              ; store low byte.
       xor a               ; HL = HL / D
       ld c,a              ; zeroise c.
       ld e,a              ; zeroise e.
       push de             ; remember divisor.
       ld hl,(sqbuf0)      ; original number.
       call idiv4          ; divide number by d.
       pop de              ; restore divisor.
       cp d                ; is divisor greater than result?
       jr c,isqr0          ; yes, don't store this bit then.
       ld a,d
       ld (sqbuf2),a       ; store new divisor.
isqr0  ld hl,sqbuf1        ; bit we tested.
       and a               ; clear carry flag.
       rr (hl)             ; next bit to right.
       pop bc              ; restore loop counter.
       djnz isqr1          ; repeat
       ld a,(sqbuf2)       ; return result in a.
       ret

sqbuf0 defw 0
sqbuf1 defb 0
sqbuf2 defb 0
With the length of the hypotenuse calculated, we can simply divide the opposite line by the hypotenuse to find the sine of the angle. A quick search of our sine table will then tell us what that angle is. Phew!
This is the entire calculation taken from Blizzard’s Rift. Note that it uses the adjacent line length rather than the opposite, so finds the arccosine instead of the arcsine. It is also only used when the ship is above the gun turret, giving the player the opportunity to sneak up and attack from underneath. Nevertheless, it demonstrates how a sprite can fire at another with deadly accuracy. If you have ever played Blizzard’s Rift, you will know exactly how lethal those gun turrets can be.
; Ship is above the gun so we can employ some basic trigonometry to aim it.
; We need to find the angle and to do this we divide the adjacent by
; the hypotenuse and find the arccosine.

; First of all we put the length of the opposite on the stack:

mgunx  ld a,(nshipy)       ; ship y coordinate.
       ld hl,guny          ; gun y coord.
       sub (hl)            ; find difference.
       jr nc,mgun0         ; result was positive.
       neg                 ; negative, make it positive.
mgun0  cp 5                ; y difference less than 5?
       jr c,mgunu          ; yes, point straight up.
       push af             ; place length of opposite on stack.

; Next we require the length of the hypotenuse and we can use good
; old Pythagoras' theorem for this.

       ld h,a              ; copy a to h.
       ld d,h              ; copy h to d.
       call imul           ; multiply integer parts to get 16-bit result.
       push hl             ; remember squared value.

       ld hl,nshipx        ; ship x coordinate.
       ld a,(gunx)         ; gun x coordinate.
       sub (hl)            ; find difference, will always be positive.
       ld h,a              ; put x difference in h.
       ld d,h              ; copy h to d.
       call imul           ; multiply h by d to get square.
       pop de              ; get last squared result.
       add hl,de           ; want the sum of the two.
       call isqr           ; find the square root, hypotenuse in a.
       pop de              ; opposite line now in d register.
       ld h,a              ; length of hypotenuse.
       ld l,0              ; no fraction or sign.
       ex de,hl            ; switch 'em.

; Opposite and hypotenuse are now in de and hl.
; We now divide the first by the second and find the arcsine.
; Remember - sine = opposite over hypotenuse.

       call div            ; division will give us the sine.
       ex de,hl            ; want result in de.
       call asn            ; get arcsine to find the angle.
       push af

; Okay, we have the angle but it's only 0 to half-pi radians (64 angles)
; so we need to make an adjustment based upon the quarter of the circle.
; We can establish which quarter of the circle our angle lies in by
; examining the differences between the ship and gun coordinates.

       ld a,(guny)         ; gun y position.
       ld hl,shipy         ; ship y.
       cp (hl)             ; is ship to the right?
       jr nc,mgun2         ; player to the left, angle in second quarter.

; Angle to player is in first quarter, so it needs subtracting from 64.

       ld a,64             ; pi/2 radians = 64 angles.
       pop bc              ; angle in b.
       sub b               ; do the subtraction.
       ld (ix+1),a         ; new angle.
       ret                 ; we have our angle.

; Second quarter - add literal 64 to our angle.

mgun2  pop af              ; original angle.
       add a,192           ; add pi/2 radians.
       ld (ix+1),a         ; new angle.
       ret                 ; job's a good 'un!
So far, we have moved sprites up, down, left and right by whole pixels. However, many games require more sophisticated sprite manipulation. Platform games require gravity, top-down racing games use rotational movement and others use inertia.
Jump and Inertia Tables
The simplest way of achieving gravity or inertia is to have a table of values. For example, the Egghead games make use of a jump table and maintain a pointer to the current position. Such a table might look like the one below.
; Jump table.
; Values >128 are going up,
With the pointer stored in jptr, we might do something like this:
       ld hl,(jptr)        ; fetch jump pointer.
       ld a,(hl)           ; next value.
       cp 128              ; reached end of table?
       jr nz,skip          ; no, we're okay.
       dec hl              ; back to maximum velocity.
       ld a,(hl)           ; fetch max speed.
skip   inc hl              ; move pointer along.
       ld (jptr),hl        ; set next pointer position.
       ld hl,verpos        ; player's vertical position.
       add a,(hl)          ; add relevant amount.
       ld (hl),a           ; set player's new position.
To initiate a jump, we would set jptr to jtabu. To start falling, we would set it to jtabd.
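For illustration only, a table that works with the routine above could be laid out along these lines. The values here are invented rather than taken from any of the Egghead games: bytes above 128 act as negative displacements (250 is -6, 255 is -1), bytes below 128 move the player down, and 128 marks the end.

```asm
; Hypothetical jump table, values chosen purely as an example.
jtabu  defb 250,251,252,253,254,254,255,255,0  ; going up, slowing to a stop.
jtabd  defb 1,1,2,2,3,4,5,6                    ; coming down, gathering speed.
       defb 128                                ; end of table marker.
jptr   defw jtabd                              ; pointer to current position.
```

Because the routine steps back one byte when it meets the end marker, the last entry before the 128 doubles as the maximum falling speed.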
Okay, so it’s a bit simplistic. In practice, we would usually use the value from the jump table as a loop counter, moving the player up or down one pixel at a time, checking for collisions with platforms, walls, deadly items etc as we go. We might also use the end marker (128) to signify that the player had fallen too far, and set a flag so that the next time the player hits something solid, he loses a life. That said, you get the picture.
If we want more sophisticated gravity, inertia, or rotational movement we need fractional coordinates. Up until now, with the Spectrum’s resolution at 256×192 pixels, we have only needed to use one byte per coordinate. If instead we use a two-byte register pair, the high byte for the integer and low byte for the fraction, we open up a whole new world of possibilities. This gives us 8 binary places of fraction, allowing very precise and subtle movements. With a coordinate in the HL pair, we can set up the displacement in DE, and add the two together. When plotting our sprites, we simply use the high bytes as our x and y coordinates for our screen address calculation, and discard the low bytes which hold the fractions. The effect of adding a fraction to a coordinate will not be visible every frame, but even the smallest fraction, 1/256, will slowly move a sprite over time.
Now we can take a look at gravity. This is a constant force, in practice it accelerates an object towards the ground at 9.8m/s^2. To simulate it in a Spectrum game, we set up our vertical coordinate as a 16-bit word. We then set up a second 16-bit word for our momentum. Each frame, we add a tiny fraction to the momentum, then add the momentum to the vertical position. For example:
       ld hl,(vermom)      ; momentum.
       ld de,2             ; tiny fraction, 1/128.
       add hl,de           ; increase momentum.
       ld (vermom),hl      ; store momentum.
       ld de,(verpos)      ; vertical position.
       add hl,de           ; add momentum.
       ld (verpos),hl      ; store new position.
       ret

verpos defw 0              ; vertical position.
vermom defw 0              ; vertical momentum.
Then, to plot our sprites, we simply take the high byte of our vertical position, verpos+1, to give us the number of pixels from the top of the screen. Different values of DE will vary the strength of the gravity. Indeed, we can even swap the direction by subtracting DE from HL, or by adding a negative distance (65536-distance). We can apply the same to the x coordinate too, and have the sprite subject to momentum in all directions. This is how we would go about writing a Thrust-style game.
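As a sketch of the "negative distance" idea, an upward thrust could be applied to the same momentum word like this. The routine name and the strength of 6/256 of a pixel per frame are assumptions made up for the example; vermom is the variable from the gravity listing.

```asm
; Sketch: push the sprite upwards by adding a negative displacement.
thrust ld hl,(vermom)      ; current vertical momentum.
       ld de,65536-6       ; -6 as a 16-bit value, i.e. 6/256 pixel upwards.
       add hl,de           ; the addition wraps round, so momentum falls by 6.
       ld (vermom),hl      ; store new momentum.
       ret
```

Calling this on the frames when the thrust key is held, alongside the gravity routine, would let the two forces fight each other.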
The other thing we might need for a Thrust game, top-down racers, or anything where circles or basic trigonometry is involved is a sine/cosine table. Mathematics isn’t everybody’s cup of tea, and if your trigonometry is a little rusty I suggest you read up on sines and cosines before continuing with the remainder of this chapter.
In mathematics, we can find the x and y distance from the centre of a circle given the radius and the angle by using sines and cosines. However, whereas in maths a circle is made up of either 360 degrees or 2 PI radians, it is more convenient for the Spectrum programmer to represent his angle as, say, an 8-bit value from 0 to 255, or even use fewer bits, depending on the number of positions the player sprite can take. He can then use this value to look up his 16-bit fractional values for the sine and cosine in a table. Assuming we have an 8-bit angle set up in the accumulator, and we wish to find the sine, we simply access the table in a manner similar to this:
       ld l,a              ; angle in low byte.
       ld h,0              ; zero displacement high byte.
       add hl,hl           ; double displacement as entries are 16-bit.
       ld de,sintab        ; address of sine table.
       add hl,de           ; add displacement for this angle.
       ld e,(hl)           ; fraction of sine.
       inc hl              ; point to second half.
       ld d,(hl)           ; integer part.
       ret                 ; return with sine in de.
Sinclair BASIC actually provides us with the values we require, with its SIN and COS functions. Using these, we can POKE the values returned into RAM and either save to tape, or save out the binary using an emulator such as SPIN. Alternatively, you may prefer to use another programming language on the PC to generate a table of formatted sine values to import into your source file, or include as a binary. For a sine table with 256 equally-spaced angles, we would need a total of 512 bytes, but we would need to be careful to convert the number returned by SIN into one our game will recognise. Multiplying the sine by 256 will give us our positive values, but where SIN returns a negative result, we might need to multiply the ABS value of the sine by 256, then either subtract that from 65536 or set bit d7 of the high byte to indicate that the number must be subtracted rather than added to our coordinate. With a sine table constructed in this manner, we don’t need a separate table for cosines, as we just add or subtract 64, or a quarter-turn, to the angle before looking up the value in our table. To move a sprite at an angle of A, we add the sine of A to one coordinate, and the cosine of A to the other coordinate. By changing whether we add or subtract a quarter turn to obtain the cosine, and which plane uses sines and which uses cosines, we can start our circle at any of the 4 main compass points, and make it go in a clockwise or anti-clockwise direction.
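To make that concrete, here is a rough sketch of moving a sprite one step along its current heading. It assumes the table stores negative values in the "subtract from 65536" form, so a plain 16-bit addition handles both directions. The variables angle, xpos and ypos, and the label sine given to the lookup routine, are names invented for this example, and sintab is the 512-byte table discussed above, which is not shown.

```asm
; Sketch: add sine of the angle to x, and cosine (angle + 64) to y.
move   ld a,(angle)        ; 8-bit heading, 0 to 255.
       call sine           ; de = sine of angle.
       ld hl,(xpos)        ; 16-bit fractional x coordinate.
       add hl,de           ; move along x.
       ld (xpos),hl
       ld a,(angle)        ; same heading again.
       add a,64            ; quarter turn on gives us the cosine.
       call sine           ; de = cosine of angle.
       ld hl,(ypos)        ; 16-bit fractional y coordinate.
       add hl,de           ; move along y.
       ld (ypos),hl
       ret

sine   ld l,a              ; angle in low byte.
       ld h,0              ; zero displacement high byte.
       add hl,hl           ; entries are 16-bit.
       ld de,sintab        ; address of sine table (assumed, not shown).
       add hl,de           ; point to entry for this angle.
       ld e,(hl)           ; fraction.
       inc hl
       ld d,(hl)           ; integer part.
       ret

angle  defb 0
xpos   defw 0
ypos   defw 0
```

The 8-bit addition of 64 wraps round past 255 on its own, which is one of the conveniences of using 256 angles to the circle.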
Until now we have drawn all our graphics directly onto the screen, for reasons of speed and simplicity. However, there is one major disadvantage to this method: if the television scan line happens to be covering the particular screen area where we are deleting or redrawing our image then our graphics will appear to flicker. Unfortunately, on the Spectrum there is no easy way to tell where the scan line is at any given point so we have to find a way around this.
One method which works well is to delete and redraw all sprites immediately following a halt instruction, before the scan has a chance to catch up with the image being drawn. The disadvantage to this method is that our sprite code has to be pretty fast, and even then it is not advisable to delete and re-draw more than two sprites per frame because by then the scan will be over the top border and into the screen area. Of course, locating the status panel at the top of the screen might give a little more time to draw our graphics, and if the game is to run at 25 frames per second we could employ a second halt instruction and manoeuvre another couple of sprites immediately afterwards.
Ultimately, there comes a point where this breaks down. If our graphics are going to take a little longer to draw we need another way to hide the process from the player and we need to employ the use of a second buffer screen. This means that all the work involved in drawing and undrawing graphics is hidden from the player and all that is visible is each finished frame once it has been drawn.
There are two ways of doing this on a Spectrum. One method will only work on a 128K machine, so we will put that to one side for the time being. The other method actually tends to be more complicated in practice but will work on any Spectrum.
Creating a Screen Buffer
The simplest way to implement double buffering on a 48K Spectrum is to set up a dummy screen elsewhere in RAM, and draw all our background graphics and sprites there. As soon as our screen is complete we copy this dummy screen to the physical screen at address 16384 thus:
; code to draw all our sprites etc.
       .
       .
       .
; now screen is drawn copy it to physical screen.
       ld hl,49152
       ld de,16384
       ld bc,6912
       ldir
While in theory this is perfect, in practice copying 6912 bytes of RAM (or 6144 bytes if we ignore the colour attributes) to the screen display every frame is too slow for arcade games. The secret is to reduce the amount of screen RAM we need to copy each frame, and to find a faster way than by transferring it with the LDIR instruction.
The first way is to decide how big our screen is going to be. Most games separate the screen into 2 areas: a status panel to display score, lives and other bits of information, and a window where all the action takes place. As we don’t need to update the status panel every frame our dummy screen only needs to be as big as the action window.
So if we were to have an 80×192 pixel status panel at the right edge of the screen, that would leave us a 176×192 pixel window, meaning our dummy screen would only need to be 22 chars wide by 192 pixels high, or 22×192=4224 bytes. Manually moving 4224 bytes from one part of RAM to another is far less painful than manipulating 6144 bytes. The trick is to find a size which is large enough not to restrict gameplay while being small enough to be manipulated quickly. Of course, we may also want to make our buffer a little larger around the edges. While these edges are not displayed on the screen they are useful if we wish to clip sprites as they move into the action window from the sides.
Once we have set our buffer size in stone we need to write a routine to transfer it to the physical display file one or two bytes at a time. While we are at it, we can also re-order our buffer screen to use a more logical display method than the one used by the physical screen. We can make allowances for the peculiar ordering of the Spectrum’s display file in our transfer routine, meaning any graphics routines which make use of our dummy screen buffer can be simplified.
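With a linear buffer, finding the start of any pixel line becomes a simple multiplication. The sketch below assumes a buffer at 49152 that is 22 bytes wide with lines stored one after another, and reuses the imul routine from the maths chapter; the label buflin is made up.

```asm
; Sketch: a = pixel line (0-191), returns hl = address of that line in buffer.
buflin ld h,a              ; line number is first factor.
       ld d,22             ; bytes per buffer line.
       call imul           ; hl = line * 22.
       ld de,49152         ; assumed start of dummy screen.
       add hl,de           ; hl = address of start of line.
       ret
```

In a real game a lookup table of line addresses would probably be quicker than multiplying every time, at the cost of 384 bytes.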
There are two really quick ways of moving a dummy screen to the display screen. The first, and most simple method, is to use lots of unrolled LDI instructions. The second, and more complicated method, makes use of PUSH and POP to transfer the data.
Let us start with LDI. If our buffer is 22 chars wide we might transfer a single line from the buffer to the screen display with 22 consecutive LDI instructions – it is much quicker to use lots of LDI instructions than to use a single LDIR. We could write a routine to transfer our data across a single line at a time, pointing HL to the start of each line of the buffer, DE to the line on the screen where it is needed, and then 22 LDI instructions to move the data across. However, as each LDI instruction takes two bytes of code, it stands to reason that such a routine would be at least twice the size of the buffer it moved. A considerable hit when dealing with a little over 40K of useful RAM. You may instead wish to move the LDI instructions to a subroutine which copies a pixel line, or perhaps a group of 8 pixel lines, at a time. This routine could then be called from within a loop – unrolled or not – which could take care of the HL and DE registers.
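The subroutine idea might be sketched like this for the 22-byte-wide buffer. The caller is assumed to set HL to the buffer line and DE to the matching screen line before each call, and should not rely on BC, which every LDI decrements.

```asm
; Sketch: copy one 22-byte line from (hl) to (de) with unrolled LDIs.
cpylin ldi                 ; bytes 1-4.
       ldi
       ldi
       ldi
       ldi                 ; bytes 5-8.
       ldi
       ldi
       ldi
       ldi                 ; bytes 9-12.
       ldi
       ldi
       ldi
       ldi                 ; bytes 13-16.
       ldi
       ldi
       ldi
       ldi                 ; bytes 17-20.
       ldi
       ldi
       ldi
       ldi                 ; bytes 21-22.
       ldi
       ret
```

On return HL already points at the next line of a linear buffer, so only DE needs recalculating for the display file's odd layout.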
The second method is to transfer the buffer to the screen using PUSH and POP instructions. While this does have the advantage of being the fastest way there is, there are drawbacks. You do need complete control of the stack pointer, so you can’t have any interrupts occurring mid-way through the routine: an interrupt would push its return address onto wherever the stack pointer happened to be pointing, corrupting the buffer or the screen. The stack pointer must be stored away somewhere first, and restored immediately afterwards.
The Spectrum’s stack is usually located below your program code, but this method involves setting the stack to point to each part of the buffer in turn, and then using POP to copy the contents of the dummy screen buffer into each of the register pairs in turn. The stack pointer is then moved to the relevant point in the screen display RAM, before the registers are PUSHed into memory in the reverse order to that in which they were POPped. That is, values are POPped from the buffer going from the start of each line, and PUSHed to the screen in the reverse order, going from the end of the line to the beginning.
Below is the gist of the screen transfer routine from Rallybug. This used a buffer 30 characters wide, with 28 characters visible on screen. The remaining 2 characters were not displayed so that sprites moved onto the screen slowly from the edge, rather than suddenly appearing from nowhere. As the visible screen is 28 characters wide, this requires 14 16-bit registers per line. Obviously, the Z80A doesn’t have this many, even counting the alternate registers and IX and IY. As such, the Rallybug routine splits the display into two halves of 14 bytes each, requiring just 7 register pairs. The routine sets the stack pointer to the beginning of each buffer line in turn, then POPs the data into AF, BC, DE and HL. It then swaps BC, DE and HL into the alternate register set with EXX (AF is not affected), and POPs 6 more bytes into BC, DE and HL. These registers now need to be unloaded into the screen area, so the stack pointer is set to point to the end of the relevant screen line, and HL, DE and BC are PUSHed into position. The alternate registers are then restored with a second EXX, and HL, DE, BC and AF are PUSHed into position in that order. This is repeated over and over again for each half of each screen line, before the stack pointer is restored to its original position.
Complicated, yes. But incredibly fast.
SEG1   equ 16514
SEG2   equ 18434
SEG3   equ 20482

P0     equ 0
P1     equ 256
P2     equ 512
P3     equ 768
P4     equ 1024
P5     equ 1280
P6     equ 1536
P7     equ 1792

C0     equ 0
C1     equ 32
C2     equ 64
C3     equ 96
C4     equ 128
C5     equ 160
C6     equ 192
C7     equ 224

xfer   ld (stptr),sp       ; store stack pointer.

; Character line 0.

       ld sp,WINDOW        ; start of buffer line.
       pop af
       pop bc
       pop de
       pop hl
       exx
       pop bc
       pop de
       pop hl
       ld sp,SEG1+C0+P0+14 ; end of screen line.
       push hl
       push de
       push bc
       exx
       push hl
       push de
       push bc
       push af
       .
       .
       ld sp,WINDOW+4784   ; start of buffer line.
       pop af
       pop bc
       pop de
       pop hl
       exx
       pop bc
       pop de
       pop hl
       ld sp,SEG3+C7+P7+28 ; end of screen line.
       push hl
       push de
       push bc
       exx
       push hl
       push de
       push bc
       push af

okay   ld sp,(stptr)       ; restore stack pointer.
       ret
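The two SP operands of each block follow a regular pattern, which the Python sketch below tries to reproduce. It rests on assumptions read off the listing rather than stated in it: that the window is 160 pixel lines (20 character rows) tall, that its first four rows sit in the top third of the screen at SEG1 with the rest filling SEG2 and SEG3, and that the elided blocks follow the same pattern as the two shown. The function name `sp_operands` is invented, and buffer positions are given as offsets from WINDOW.

```python
SEG1, SEG2, SEG3 = 16514, 18434, 20482   # values from the listing
BUFFER_WIDTH = 30                         # bytes per buffer line
HALF = 14                                 # bytes moved by one POP/PUSH block
LINES = 160                               # assumption: 20 character rows

def sp_operands(line, half):
    """(offset from WINDOW to POP from, screen address to PUSH down from)."""
    row, pixel = divmod(line, 8)          # character row in window, pixel line
    if row < 4:                           # assumption: 4 rows in the top third
        seg, c = SEG1, row
    else:
        seg, c = (SEG2, SEG3)[(row - 4) // 8], (row - 4) % 8
    pop_from = line * BUFFER_WIDTH + half * HALF
    push_to = seg + c * 32 + pixel * 256 + (half + 1) * HALF
    return pop_from, push_to

print(sp_operands(0, 0))          # first block in the listing
print(sp_operands(LINES - 1, 1))  # last block in the listing
```

Under those assumptions the first and last results agree with the operands WINDOW, SEG1+C0+P0+14, WINDOW+4784 and SEG3+C7+P7+28 in the listing, which suggests a generator along these lines could be used to emit the unrolled source rather than typing it by hand.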
Scrolling the Buffer
Now we have our dummy screen, we can do anything we like to it without the risk of flicker or other graphical anomalies, because we only transfer the buffer to the physical screen when we have finished building the picture. We can place sprites, masked or otherwise, anywhere we like and in any order we like. We can move the screen around, and animate the background graphics, and most importantly, we can now scroll in any direction.
Different techniques are required for different types of scrolling, although they all have one thing in common: as scrolling is a processor-intensive task, unrolled loops are the order of the day. The simplest type of scroll is a left/right single pixel scroll. A right single pixel scroll requires us to set the HL register pair to the start of the buffer, then execute the following two instructions over and over again until we reach the end of the buffer:
       rr (hl)             ; rotate carry flag and 8 bits right.
       inc hl              ; next buffer address.
Similarly, to execute a left single-pixel scroll we set hl to the last byte of the buffer and execute these two instructions until we reach the beginning of the buffer:
       rl (hl)             ; rotate carry flag and 8 bits left.
       dec hl              ; next buffer address, working backwards.
For most of the time, however, we can get away with only incrementing or decrementing the L register, instead of the HL pair, speeding up the routine even more. Neither form disturbs the carry flag, so the bit being passed from one byte to the next is safe. This does have the drawback of having to know exactly when the high order byte of the address changes. For this reason, I usually set my buffer address in stone right at the beginning of the project, often at the very top of RAM, so I don’t have to rewrite the scrolling routines when things get shifted around during the course of a project. As with the routine to transfer the buffer to the physical screen, a massive unrolled loop is very expensive in terms of RAM, so it is a good idea to write a smaller unrolled loop which scrolls, say, 256 bytes at a time, then call it 20 or so times, depending upon the chosen buffer size.
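The effect of the two rotate loops can be checked with a small Python model. This is only a simulation of what the instructions do to memory, with invented function names, and it assumes the carry flag is clear when the loop starts. As in the Z80 loops, the carry runs straight through the whole buffer, so a pixel leaving one line re-enters on the neighbouring line; with undisplayed edge columns that stray pixel stays out of sight.

```python
def scroll_right_1px(buf):
    """Model of 'rr (hl) / inc hl' repeated from the start of the buffer."""
    carry = 0                          # assumption: carry clear on entry
    for i in range(len(buf)):
        byte = buf[i]
        buf[i] = (carry << 7) | (byte >> 1)
        carry = byte & 1               # bit 0 moves on to the next byte

def scroll_left_1px(buf):
    """Model of 'rl (hl) / dec hl' repeated from the end of the buffer."""
    carry = 0                          # assumption: carry clear on entry
    for i in range(len(buf) - 1, -1, -1):
        byte = buf[i]
        buf[i] = ((byte << 1) & 0xFF) | carry
        carry = byte >> 7              # bit 7 moves on to the previous byte

line = bytearray([0b00000001, 0b00000000])
scroll_right_1px(line)
print([format(b, "08b") for b in line])
```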
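In addition to scrolling one pixel at a time, we can scroll four pixels fairly quickly too. By replacing rl (hl) with rld in the left scroll, and rr (hl) with rrd in the right scroll, we can move the buffer four pixels at a time. These instructions shift a whole nibble, passing it from one byte to the next through the low four bits of the accumulator instead of the carry flag.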
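A Python model of the two nibble rotates shows how the four pixels travel between bytes. The function names are made up for illustration, and the sketch assumes the accumulator holds zero when the loop starts so that blank pixels are shifted in.

```python
def rld(a, m):
    """Z80 RLD: returns (new A, new memory byte)."""
    return (a & 0xF0) | (m >> 4), ((m << 4) & 0xF0) | (a & 0x0F)

def rrd(a, m):
    """Z80 RRD: returns (new A, new memory byte)."""
    return (a & 0xF0) | (m & 0x0F), ((a & 0x0F) << 4) | (m >> 4)

def scroll_left_4px(buf):
    a = 0                                   # assumption: A cleared on entry
    for i in range(len(buf) - 1, -1, -1):   # rld / dec hl
        a, buf[i] = rld(a, buf[i])

def scroll_right_4px(buf):
    a = 0                                   # assumption: A cleared on entry
    for i in range(len(buf)):               # rrd / inc hl
        a, buf[i] = rrd(a, buf[i])

line = bytearray([0x12, 0x34])
scroll_left_4px(line)
print(line.hex())
```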
Vertical scrolling is done by shifting bytes around in RAM, in much the same way as the routine to transfer the dummy screen to the physical one. To scroll up one pixel, we set our FROM address to be the start of the second pixel line, the TO address to the address of the start of the buffer, then copy the data from the FROM address to the TO address until we reach the end of the buffer. To scroll down, we have to work in the opposite direction, so we set our FROM address to the end of the penultimate line of the buffer, our TO address to the end of the last line, and work backwards until we reach the start of the buffer. The added advantage of vertical scrolling is that we can scroll up or down by more than one line, simply by altering the addresses, and the routine will run just as quickly. Generally speaking, it isn’t a good idea to scroll by more than one pixel if your frame rate is lower than 25 frames per second, because the screen will appear to judder.
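The FROM and TO addresses can be sketched as offsets into the buffer. The Python model below uses invented function names and a buffer stored one line after another; on the Z80 the upward copy would run forwards through memory and the downward copy backwards, so that overlapping data is not overwritten before it has been read. The lines exposed by the scroll keep their old contents and still need redrawing.

```python
def scroll_up(buf, width, lines=1):
    """Move every line up by `lines`; FROM = start of line `lines`, TO = start."""
    shift = lines * width
    buf[:len(buf) - shift] = buf[shift:]

def scroll_down(buf, width, lines=1):
    """Move every line down by `lines`; the copy works back from the end."""
    shift = lines * width
    buf[shift:] = buf[:len(buf) - shift]

screen = bytearray([1, 1, 2, 2, 3, 3])   # three lines, two bytes wide
scroll_up(screen, 2)
print(list(screen))
```

Scrolling by two or more lines only changes `shift`; the number of bytes copied goes down slightly rather than up.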
There is one other technique that can be employed with vertical scrolling, and it is one I employed when writing Megablast for Your Sinclair. This involves treating the dummy screen as wrap-around. In other words, you still use the same amount of RAM for the dummy buffer, but the part of the buffer from which you start copying to the top of the screen can change from one frame to the next. When you reach the end of the buffer, you skip back to the beginning. With this system, the routine to copy the buffer takes the address of the start of the buffer from a 16-bit pointer which could point to any line in the buffer, and copies the data to the physical screen line by line until it reaches the end of the buffer. At this point, the routine copies the data from the start of the buffer to the remainder of the physical screen. This makes the transfer routine a little slower, and complicates any other graphics routines – which also have to go back to the first line whenever they go beyond the last line in the buffer. It does, on the other hand, mean that no data needs to be shifted in order to scroll the screen. By changing the 16-bit pointer to the line which is first copied to the physical screen, scrolling is done automatically when the buffer is transferred.
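The wrap-around idea can be modelled in a few lines of Python. This is a sketch of the principle only, not the Megablast routine: the names are invented, and the 16-bit pointer is represented as a line number rather than an address.

```python
def visible_lines(buf, width, first_line):
    """Return the buffer in the order it would be copied to the screen."""
    split = first_line * width
    # From the pointer to the end of the buffer, then wrap to the start.
    return buf[split:] + buf[:split]

def scroll(first_line, height, lines):
    """Scroll by moving the pointer; no buffer data is shifted."""
    return (first_line + lines) % height

buf = bytearray([1, 1, 2, 2, 3, 3])      # three lines, two bytes wide
first = scroll(0, 3, 1)
print(list(visible_lines(buf, 2, first)))
```

The cost moves from the scroll into every routine that touches the buffer, since each must apply the same wrap when stepping from one line to the next.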