This tutorial is by Derek M. Smith © 2005.
Sprite graphics are the backbone of arcade games. Put simply, a sprite is a moveable screen object, such as a spaceship, alien, or anything else you can imagine. Some computers come with built in sprites, such as the Commodore and Atari ranges. These machines could generate sprites without the need for complex programming, and on top of this, they used an additional processor, which left the main CPU free to get on with the rest of the program. The Spectrum sadly was never adorned with such luxuries and so programmers had to write their own sprite routines from scratch (a daunting task for the amateur).
Sloppy programming in this area could really spoil a game, either making it too slow, or through flickering graphics.
A simple sprite routine is often one of the first things a new programmer, after having mastered the basics of machine code, will have a go at.
At first it seems straight forward, but soon all sorts of hurdles appear. The first usually being the Spectrum’s unusual screen arrangement. More experienced programmers who have mastered the basic sprite handling routine will then seek ways to optimise it.
This tutorial is aimed at both beginners and the more experienced programmers. Beginners will learn the principles behind sprite programming and the experienced machine coders will learn to find ways of improving their routines. For simplicity, this tutorial assumes that the reader has a decent grasp of assembly language.
ZX Spectrum screen arrangement
“The display file stores the television picture. It is rather curiously laid out…” -Spectrum Manual ch24 p164
On the face of it the Spectrum’s screen arrangement is pretty strange. Just watch the screen of a game loading, and you’ll understand what I mean. Why does it skip lines like that? Sure it may be interesting to watch the picture being gradually built up, but it can be a real pain when it comes to writing a sprite routine (at least until you understand the principles behind its layout).
First off, the Spectrum screen has a resolution of 256 pixels across by 192 down, not including the border around it. It can display 8 colours (including black and white) with two levels of intensity (brightness).
The Memory Map for the screen starts at address 16384 and is 6912 bytes long.
It is split in to two halves with the first 6144 bytes containing the bit-map (or pixel map, if you wish) and the remaining 768 bytes containing the attribute-map.
Consider the bit-map first:-
Each line of 256 pixels is stored as 32 bytes: 32 x 8 = 256. So far so good.
Now you would think that each line would follow on from the one before in the pixel map, and most people (myself included) write their first sprite routine thinking this to be the case – only to find that when they execute the routine, with a shudder of anticipation, their sprite is spread all over the screen. At this point some give up (or decide to write adventure games instead) baffled by the Spectrum’s idiosyncrasies.
Let me say now that when you do grasp the screen layout and the techniques used for addressing it, I think you will be glad that the designers done it that way.
Type in the following and run it:
10 FOR S = 16384 TO 22527 20 POKE S, 255 30 NEXT S
This short BASIC program fills the screen. Although it is moving through memory sequentially, POKING 255 into each memory location, the screen fills up in a rather more esoteric manner. Run the program a few times and watch the pattern it traces. You’ll notice two things: One is the way it skips lines, another is that the screen is divided into three parts.
I said earlier that each line is stored as 32 bytes, with the first line beginning at 16384. Where does the second line start? Most of us assume it would start at 16384+32. In fact it starts at 16384+256 and 16384+32 takes us down to line 8. Remember that the screen is used to display text (characters) as well as graphics. So adding 32 actually takes us down one character row (8 screen lines).
In machine code 16bit addresses, such as 16384, are stored in high byte / low byte format. The high byte is equal to ADDRESS / 256 and the low byte is the remainder from the division, so 16384 in high/low byte format would be 64, 0 (16384 / 256 = 64 with no remainder). This means that when we increase the high byte of a 16 bit address by one, it is equivalent to adding 256. So when it comes to screen addressing all we need to do to move down a line is increase the high byte of the address. This can be done quite easily and quickly in machine code:
ld hl,16384 ;load the hl register pair with the address of the start of the display file inc h ;increment the high byte (4 tstates)
This is in fact much quicker than adding 32 where we would have to do the following:
ld hl,16384 ld a,l ;load the accumulator with the l register as we cannot add ;directly to the l register add a,32 ;add 32 ld l,a ;load the result back into the l register (total time 15 tstates)
The above works within a group of 8 lines, ie. 1 char line.
Now, as mentioned before, the other peculiar thing about the screen layout is that it is divided in to three parts – top, middle and bottom. Each third of the screen has 64 lines (or 8 character rows) and takes up 2048 bytes of memory. All that has been said so far applies only so long as we don’t cross from one third into another.
The whole matter becomes at good deal clearer if we look at the screen address in binary.
High Byte | Low Byte 0 1 0 T T L L L Cr Cr Cr Cc Cc Cc Cc Cc
I have used some abbreviations to make things a bit clearer:
T – these two bits refer to which third of the screen is being addressed: 00 – Top, 01 – Middle, 10 – Bottom
L – these three bits indicate which line is being addressed: from 0 – 7, or 000 – 111 in binary
Cr – these three bits indicate which character row is being addressed: from 0 – 7
Cc – these five bits refer to which character column is being addressed: from 0 – 31
The top three bits ( 010 ) of the high byte don’t change.
Calculating the screen address
The first task in putting a sprite on the screen is to translate the X,Y coords into a screen address.
There are two ways of doing this. One is to set up a look-up table which contains 192 addresses corresponding to each screen line. The other way is a bit more interesting and involves distilling the appropriate line, column and row bits that make up the address from the X and Y coords. Lets examine this way first.
Taking the Y coord first: This will be in the range 0 – 191, with 0 corresponding to the top of the screen. The lowest three bits indicate which line (within a character row) 0 – 7 we are dealing with. This is the same as the high byte of the screen address. The top two bits refer to which third of the screen we are dealing with:
Y Coord T T - - - L L L
Let us assume that the B & C registers contain our X & Y coords. First we need to isolate the lowest three bits of the Y coord (C reg.), as follows:
ld a,c and %00000111 ;% indicates that the number following it is in binary format
We will use the HL register pair for the screen address. So next we transfer these three bits to the high byte (H reg.)
In order to get the top two bits into the correct position we must shift them right three times, as follows:
ld a,c rra ;rotate right accumulator rra rra
This shifts all the bits to the right three times, with the highest bit being reset after each shift. All that’s left to do at this stage is to clear the bits we don’t need, as follows:
This must now be ORd with the H register.
Then the top three bits, which remain constant, are set as appropriate:
or %01000000 ld h,a ;load the result back into the H reg.
So now we have the high byte of the screen address.
The low byte of the address is composed of the following bits of both the X and Y coords:
X Coord | Y Coord Cc Cc Cc Cc Cc - - - - - Cr Cr Cr - - - Low Byte Cr Cr Cr Cc Cc Cc Cc Cc
The character column bits need to be shifted to the right three times.
ld a,b ;B reg. holds the X coord rra rra rra and %00011111 ; ensure the top 3 bits are clear ld l,a
Then the character row bits must be shifted left twice so that they correspond to the highest three bits of the low byte of the address (see earlier).
ld a,c ;C reg. holds the Y coord rla ;rotate left accumulator rla and %11100000 ;isolate the character row bits or l ;OR the result with the low byte ld l,a ;and place in L register
Right, that’s the difficult part done. We now have the screen address in HL. All that is left is to get the pixel position (0 – 7 from left to right) from the X coord (lowest three bits).
ld a,b and %00000111
So the complete routine is as follows:
On Entry: B reg = X coord, C reg = Y coord On Exit: HL = screen address, A = pixel position ; Calculate the high byte of the screen addressand store in H reg. ld a,c and %00000111 ld h,a ld a,c rra rra rra and %00011000 or h or %01000000 ld h,a ; Calculate the low byte of the screen address and store in L reg. ld a,b rra rra rra and %00011111 ld l,a ld a,c rla rla and %11100000 or l ld l,a ; Calculate pixel postion and store in A reg. ld a,b and %00000111
Using a Look-up Table
The other method of calculating the screen address is to use a table of pre-calculated addresses, and then use the Y coord to pick out the right one. This is a good deal quicker than the above method, albeit at the cost of 384 bytes of memory for the table. The look-up table contains the address of each line in the display file, 192 lines.
Routine to generate a Screen Address Table:
scradtab equ 64000 gentab ld de,16384 ld hl,scradtab ld b,192 lineloop ld (hl),e inc l ld (hl),d inc hl inc d ld a,d and 7 jr nz,nextline ld a,e add a,32 ld e,a jr c,nextline ld a,d sub 8 ld d,a nextline djnz lineloop ret
Assuming again that on entry the B and C registers contain our X & Y coords.
ld de,scradtab ;address of look-up table ld l,c ;Y coord in L register ld h,0 add hl,hl ;multiply by two, as each address is 2 bytes add hl,de ;add to the start address of the table. HL is now at the ;appropriate point in the table ld a,(hl) ;get low byte of screen address; store in A temporarily ;so as not to corrupt HL inc l ld h,(hl) ;get high byte ld l,a ;HL now contains the address of the start of the line ld a,b ;calculate character column 0 - 31 from X coord and %11111000 ;isolate appropriate bits rrca ;rotate right circular accumulator (faster than srl a) rrca rrca ;shifting three times is the same as dividing by eight add a,l ;add to low byte of address ld l,a ;put result back in L reg.
On exit HL will contain the screen address.
An even quicker way is to arrange the lookup table so that it is aligned with a 256 byte page boundary. This method separates the low byte and the high byte of the screen addresses, so in effect there are two tables one containing the low bytes of the address of each screen line and one containing the high bytes. Each must be aligned with a page boundary, ie. the start address of the table must be cleanly divisible by 256.
The above routine for generating the screen address table needs the following changes made to it:
lineloop ld (hl),e inc h ld (hl),d dec h inc l
The table would be filled as follows:
64000 - 64191 0....... 32....... 64....... 96....... 128....... Low bytes of screen addresses 64256 - 64447 64,65,66....71, 64,65,66....71, 64,65,66....71.... High bytes of screen addresses
Finding the start address of a particular line is then done as follows:
ld h,scradtab/256 ; high byte of start of table ld l,c ; C reg. contains the Y coord ld a,(hl) ; already we have found the low byte, so store in A temporarily inc h ; increasing high byte moves forward 256 bytes, and to the ld h,(hl) ; corresponding high byte of the screen address ld l,a ; HL now contains the screen address for the start of the line
All that is then needed is to add on the column position, which is calculated in the same way as the previous routine.
The main factors which need consideration when writing a sprite routine are:
1. Whether or not to use masked sprites
2. Whether or not to use pre-shifted sprites
3. What sizes of sprite to use and whether to use a different routine tailored for each size or a generic
routine which will produce various sizes of sprite
Writing a generic sprite routine which can handle many sizes and types of sprite, is generally shunned by most programmers, because the complexity of the routine has a negative impact on performance. It is perferable to write several routines tailored to specific sizes of sprite.
As per usual there is a trade-off between memory and speed, with the fastest routines using the most memory.
Masked sprites on average will take about 30% more time to process.
Pre-shifted sprites, which are sprites stored in 2, 4 or 8 different pixel positions, are by far the fastest type, but consume lots of memory.
An alternative to using pre-shifted sprites favored by some programmers, is to have a look-up table containing each bit pattern from 0 – 255 shifted into eight positions. In practice this takes up 4K of memory but this is usually less than is required when pre-shifting a large number of sprites. More about this later.
A Worked Example
It’s now time to take a look at how to code a basic sprite routine. The example below is for an 8 x 8 non-masked sprite. The sprite data (a small arrow pointer) is located at USR “A” – 65368. The routine shifts the data in real time. The sprite is XORed with the contents of the screen. You could also OR the sprite with the screen contents (see below for explanation of OR, XOR etc.) Real time rotation of sprite data is easiest acheived with small sprites (16 pixels or less wide), as the data can be stored in registers while being rotated.
The steps in the algorithm to draw the sprite could be stated as follows:
1. Retrieve X and Y coords.
2. Calculate Screen Address based on X,Y coords, using a look-up table
3. Calculate Bit Position from X coord (X coord AND 7 = bit position)
4. Retrieve a line of the sprite graphics data
5. Check if Bit Position is zero
6. If so there is no need to shift the sprite data, so skip the code which shifts the data (jump to step 8 )
7. Shift the sprite data according to its bit position
8. Put the line of sprite data on the screen
9. Adjust screen address for next line
10. Perform steps 4-9 until all lines have been drawn
ORG 50000 SPRITE DI ;Disable Interupts. Not strictly necessary in this ;example as we are not redirecting the Stack. LD BC,(XPOS) ; First off, get the X & Y coords ; and place them in B & C registers LD H,SCRADTAB/256 ; this next section calculates the LD L,B ; screen address using a lookup table LD A,(HL) ; as explained earlier in the tutorial INC H LD H,(HL) LD L,A LD A,C AND 248 RRCA RRCA RRCA ADD A,L LD L,A LD (SCRADD),HL ; store screen address for later ; as HL is needed again LD A,C ; find which pixel position the sprite AND 7 ; will be at and store it. We need this to LD (BITPOS),A ; know how many times to shift the sprite data LD HL,SPRGFX ; start address for the sprite graphic LD C,8 ; The sprite is 8 lines tall LINELOOP LD E,(HL) ; load the E reg with the first line of INC L ; sprite data, and move forward PUSH HL ; preserve this address for later LD A,(BITPOS) ; Retrieve the pixel position OR A ; Quick way of testing if A is zero - Note: A is unaffected JR Z,SKIPROTATE ; If zero then no shifting is needed LD B,A ; loop counter for number of times to ; rotate (shift) sprite data XOR A ; Quick way setting A to zero and ; clearing the Carry Flag. The Carry Flag must be reset ; as the rotate instructions will shift its contents ; into the sprite data. ROTATELOOP RR E ; An extra register is needed for the RRA ; shifted sprite data. After the RR E is executed ; rightmost bit of sprite data is shifted out to the Carry Flag. ; It is then shifted into the A reg by RRA. So no data is lost. DJNZ ROTATELOOP ; Loop back until shifting is complete SKIPROTATE LD D,A ; Store in D register, as A will be ; needed for another purpose. E & D regs now contain the ; shifted sprite data. LD HL,(SCRADD) ; Get back the screen address LD A,(HL) ; Actually put a line of the sprite on the screen XOR E LD (HL),A INC L LD A,(HL) XOR D LD (HL),A DEC L ; move back and down one line INC H LD A,H ; It is not necessary to recalculate AND 7 ; the screen address for each line of JR NZ,A1 ; the sprite. All that is needed is to LD A,L ; check if a char. or segment boundary ADD A,32 ; has been crossed and adjust address LD L,A ; accordingly. JR C,A1 LD A,H SUB 8 LD H,A A1 LD (SCRADD),HL ; store it again POP HL ; retrieve the address of the next line ; of sprite data DEC C JP NZ,LINELOOP ; loop back until all lines are drawn. EI ; Enable Interupts again RET XPOS DEFB 0 YPOS DEFB 0 SCRADD DEFW 0 BITPOS DEFB 0 SCRADTAB EQU 64000
On Exit : A, BC, DE, HL corrupt
As can be seen from the above routine it is not necessary to re-calculate the screen address for every line of a sprite. A couple of checks can be made to test if the next line to be drawn is in a new character row or screen third and adjust the address accordingly:
LD A,H ;HL contains screen address AND 7 JR NZ, REST_OF_PROGRAM LD A,L ADD A,32 LD L,A JR C, REST_OF PROGRAM LD A,H SUB 8 LD H,A Rest Of Program continues here...
Routine to generate a table of screen addresses
GENTAB LD DE,16384 LD HL,SCRADTAB LD B,192 LINELOOP LD (HL),E INC H LD (HL),D DEC H INC L INC D LD A,D AND 7 JR NZ,NEXTLINE LD A,E ADD A,32 LD E,A JR C,NEXTLINE LD A,D SUB 8 LD D,A NEXTLINE DJNZ LINELOOP RET
Sprite Graphics Data
ORG 65368 SPRGFX DEFB %00000000 DEFB %01111000 DEFB %01110000 DEFB %01111000 DEFB %01011100 DEFB %00001110 DEFB %00000100 DEFB %00000000
OR, XOR, AND
00111100 OR 00011000 = 00111100 00111100 XOR 00011000 = 00100100 00111100 AND 00011000 = 00011000
Masking is a technique used to blank out the area behind a sprite before drawing the sprite to the display. It’s slower than planar (non-masked) sprites, and uses twice as much memory but the final result is more pleasing to the eye. If your sprites are going to be moving over a patterned background its the best technique to use.
The way to draw a masked sprite is to AND the mask with background and then OR the actual sprite with the result. When you AND two bitmaps, bits in the BACKGROUND are reset by zeros in the MASK, and left as they are by ones. That’s probably not very easy to grasp so look at the following examples:
background mask 00111100 and 11111111 = 00111100 ; each bit in the data byte remains the same after the mask is applied 00111100 and 00000000 = 00000000 ; all the bits in the data byte are reset by the mask 11111111 and 00111100 = 00111100 ; the middle four bits remain unchanged and the rest are reset
The previous example showed how to code a very simple sprite routine. Lets now look at a more complex example. This routine will draw a 16 x 16 Masked Sprite, storing the background underneath in a buffer.
This is probably the largest size that can be conveniently shifted in real-time. Any larger and you will want to use a different technique.
The following routine uses a couple of non-standard programming techniques. These tricks help to speed up execution but make the code a bit more difficult to follow. One technique involves using the Stack to retrieve the sprite data, the other is self-modifying code, where the program pokes values into instructions that will be executed later on in the routine.
ORG 50000,50000 SPRITE DI ; Disable Interupts. LD (SPTEMP),SP ; Store Stack Pointer as it will be redirected ; by the routine LD BC,(XPOS) LD A,C AND 7 LD (BITPOS),A LD H,SCRADTAB/256 ; Calculate screen address as before LD L,B LD A,(HL) INC H LD H,(HL) LD L,A LD A,C AND 248 RRCA RRCA RRCA ADD A,L LD L,A LD (SCRADD),HL LD IX,BKGRNDBUF ; IX is used to point to a temporary buffer where the contents ; of the screen under the sprite are stored. LD SP,SPRGFX ; start address for the sprite graphic LD A,16 ; The sprite is 16 lines tall LD (LINECOUNT),A LINELOOP POP DE ; Using the stack to retrieve a line of sprite graphics POP HL ; E and D will contain the MASK as it is stored first ; L and H hold the actual sprite image LD A,255 ; When shifting, the A reg will be the extra register that the MASK ; is shifted into. It is loaded with 255 as that means all the bits are ; initially transparent. SCF ; The carry flag is set by this instruction so that bits shifted into the ; left side of the mask are set (transparent) EX AF,AF' ; We also need an extra register for the sprite image, so we will use ; the alternate or shadow A register. LD A,(BITPOS) ; Retrieve the pixel position OR A ; Quick way of testing if A is zero JR Z,SKIPROTATE ; If zero then no shifting is needed LD B,A ; loop counter for number of times to shift XOR A ; clear carry flag and set A reg. to zero, ; as the rotate instructions will shift its contents (carry flag's) ; into the sprite image data. This is the opposite to the MASK. ROTATELOOP EX AF,AF' ; Shift the Sprite Mask data RR E RR D RRA EX AF,AF' ; Shift the Sprite Image data RR L RR H RRA DJNZ ROTATELOOP ; Loop back until shifting is complete SKIPROTATE LD (DATA+1),A ; This is the piece of self-modifying code mentioned earlier. EX AF,AF' ; As the A reg will be needed and there are no other registers LD (MASK+1),A ; free, we will poke their data (the rightmost bytes of the sprite mask ; and image) into instructions further on in the routine. LD BC,(SCRADD) ; Get back the screen address LD A,(BC) ; Actually put a line of sprite data on the screen LD (IX),A ; Store what is under the sprite so that it can be erased ; without corrupting the background AND E ; This instruction applies the MASK OR L ; The sprite image is then ORed with the background LD (BC),A ; and the result copied back to the screen INC C INC IXL LD A,(BC) ; Put the second byte on the screen LD (IX),A ; As we are dealing with a 16 x 16 sprite AND D ; each line will be three bytes wide when shifted OR H LD (BC),A INC C INC IXL LD A,(BC) ; Put the third byte on the screen LD (IX),A MASK AND 255 ; These are the two instructions that were modified DATA OR 0 ; earlier. The data stored in the A reg. and its shadow ; were poked into the AND & OR instructions. LD (BC),A INC IXL DEC C DEC C ; move back and down one line INC B LD A,B ; It is not necessary to recalculate AND 7 ; the screen address for each line of JR NZ,A1 ; the sprite. All that is needed is to LD A,C ; check if a char. or segment (screen third) boundary ADD A,32 ; has been crossed and adjust the address LD C,A ; accordingly. JR C,A1 LD A,B SUB 8 LD B,A A1 LD (SCRADD),BC ; store it again LD HL,LINECOUNT DEC (HL) JP NZ,LINELOOP ; loop back until all lines are drawn. LD SP,(SPTEMP) ; Restore the Stack Pointer EI ; Enable Interupts again RET
In order to clear the sprite we need to copy back to the screen the Buffer holding the contents
of the screen under the sprite. The following routine will do this.
CLEARSPRITE DI LD BC,(OLDXPOS) LD H,SCRADTAB/256 LD L,B LD A,(HL) INC H LD H,(HL) LD L,A LD A,C AND 248 RRCA RRCA RRCA ADD A,L LD L,A LD DE, BKGRNDBUF LD C,16 CLOOP LD A,(DE) LD (HL),A INC L INC E LD A,(DE) LD (HL),A INC L INC E LD A,(DE) LD (HL),A INC E DEC L DEC L INC H LD A,H AND 7 JR NZ,A2 LD A,L ADD A,32 LD L,A JR C,A2 LD A,H SUB 8 LD H,A A2 DEC C JP NZ,CLOOP EI RET XPOS DEFB 0 YPOS DEFB 0 OLDXPOS DEFB 0 OLDYPOS DEFB 0 LINECOUNT DEFB 0 BITPOS DEFB 0 SCRADD DEFW 0 SPTEMP DEFW 0 SCRADTAB EQU 64000 BKGRNDBUF EQU 64512
; ASM source file created by SevenuP v1.12 ; SevenuP (C) Copyright 2002-2004 by Jaime Tejedor G¢mez, aka Metalbrain ;GRAPHIC DATA: ;Pixel Size: ( 16, 16) ;Char Size: ( 2, 2) ;Sort Priorities: X char, Char line, Y char, Mask ;Attributes: No attributes ;Mask: Yes, mask before SPRGFX DEFB 248, 31, 0, 0 DEFB 224, 7, 3,192 DEFB 192, 3, 15,240 DEFB 128, 1, 25,248 DEFB 128, 1, 51,252 DEFB 0, 0, 39,252 DEFB 0, 0,111,250 DEFB 0, 0,127,252 DEFB 0, 0,127,250 DEFB 0, 0,127,252 DEFB 0, 0, 63,248 DEFB 128, 1, 63,244 DEFB 128, 1, 31,232 DEFB 192, 3, 13, 80 DEFB 224, 7, 2,128 DEFB 248, 31, 0, 0
Addendum: There appears to be a minor bug in the original code by Derek Smith. Commentator uglifruit posted a fix in the comments section, which has now been integrated into the above code.