# Geometry Transformation Engine (GTE)

[GTE Overview](geometrytransformationenginegte.md#gte-overview)
[GTE Registers](geometrytransformationenginegte.md#gte-registers)
[GTE Saturation](geometrytransformationenginegte.md#gte-saturation)
[GTE Opcode Summary](geometrytransformationenginegte.md#gte-opcode-summary)
[GTE Coordinate Calculation Commands](geometrytransformationenginegte.md#gte-coordinate-calculation-commands)
[GTE General Purpose Calculation Commands](geometrytransformationenginegte.md#gte-general-purpose-calculation-commands)
[GTE Color Calculation Commands](geometrytransformationenginegte.md#gte-color-calculation-commands)
[GTE Division Inaccuracy](geometrytransformationenginegte.md#gte-division-inaccuracy)

## GTE Overview

#### GTE Operation
The GTE doesn't have any memory or I/O ports mapped to the CPU memory bus; instead, it's accessed solely via coprocessor opcodes:
```
  mov  cop0r12,rt          ;-enable/disable COP2 (GTE) via COP0 status register
  mov  cop2r0-63,rt        ;\write parameters to GTE registers
  mov  cop2r0-31,[rs+imm]  ;/
  mov  cop2cmd,imm25       ;-issue GTE command
  mov  rt,cop2r0-63        ;\read results from GTE registers
  mov  [rs+imm],cop2r0-31  ;/
  jt   cop2flg,dest        ;-jump never  ;\implemented (no exception), but,
  jf   cop2flg,dest        ;-jump always ;/flag seems to be always "false"
```
GTE (memory-?) load and store instructions have a delay of 2 instructions, for any GTE commands or operations accessing that register. Any? That's wrong!
GTE instructions and functions should not be used in
```
  - Delay slots of jumps and branches
  - Event handlers or interrupts (sounds like nonsense?) (need push/pop though)
```
If an instruction that reads a GTE register, or a new GTE command, is executed before the current GTE command is finished, the CPU will hold until the current command has finished. The number of cycles each GTE instruction takes is shown in the command list.

#### GTE Command Encoding (COP2 imm25 opcodes)
```
  31-25  Must be 0100101b for "COP2 imm25" instructions
  20-24  Fake GTE Command Number (00h..1Fh) (ignored by hardware)
  19     sf - Shift Fraction in IR registers (0=No fraction, 1=12bit fraction)
  17-18  MVMVA Multiply Matrix    (0=Rotation, 1=Light, 2=Color, 3=Reserved)
  15-16  MVMVA Multiply Vector    (0=V0, 1=V1, 2=V2, 3=IR/long)
  13-14  MVMVA Translation Vector (0=TR, 1=BK, 2=FC/Bugged, 3=None)
  11-12  Always zero                        (ignored by hardware)
  10     lm - Saturate IR1,IR2,IR3 result (0=To -8000h..+7FFFh, 1=To 0..+7FFFh)
  6-9    Always zero                        (ignored by hardware)
  0-5    Real GTE Command Number (00h..3Fh) (used by hardware)
```
The MVMVA bits are used only by the MVMVA opcode (the bits are zero for all other opcodes).
The "sf" and "lm" bits are usually fixed (either set or cleared, depending on the command). For MVMVA, both bits are variable, and "sf" can also be changed for some other commands like SQR. Although the bits are usually fixed for most other opcodes, changing them might have some effect on some/all opcodes?
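
The bit layout above can be sketched as a small packing helper. This is only an illustration of the field positions listed in the table; the function names `gte_imm25` and `gte_opcode` are hypothetical and not part of any devkit.

```python
COP2_IMM25 = 0b0100101 << 25  # fixed bits 31-25 of a "COP2 imm25" instruction


def gte_imm25(real, fake=0, sf=0, mx=0, v=0, cv=0, lm=0):
    """Pack the GTE command fields into the 25bit immediate (hypothetical helper)."""
    return ((fake & 0x1F) << 20    # bits 20-24: fake command number (ignored)
            | (sf & 1) << 19       # bit 19:     shift fraction
            | (mx & 3) << 17       # bits 17-18: MVMVA matrix
            | (v & 3) << 15        # bits 15-16: MVMVA vector
            | (cv & 3) << 13       # bits 13-14: MVMVA translation vector
            | (lm & 1) << 10       # bit 10:     IR saturation mode
            | (real & 0x3F))       # bits 0-5:   real command number


def gte_opcode(real, **fields):
    """Full 32bit instruction word: fixed COP2 prefix plus the immediate."""
    return COP2_IMM25 | gte_imm25(real, **fields)


print(hex(gte_imm25(0x01, fake=0x01, sf=1)))  # RTPS
print(hex(gte_imm25(0x06, fake=0x14)))        # NCLIP
```

The two printed immediates correspond to the RTPS (0180001h) and NCLIP (1400006h) values used later in this chapter.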

#### GTE Data Register Summary (cop2r0-31)
```
  cop2r0-1   3xS16 VXY0,VZ0              Vector 0 (X,Y,Z)
  cop2r2-3   3xS16 VXY1,VZ1              Vector 1 (X,Y,Z)
  cop2r4-5   3xS16 VXY2,VZ2              Vector 2 (X,Y,Z)
  cop2r6     4xU8  RGBC                  Color/code value
  cop2r7     1xU16 OTZ                   Average Z value (for Ordering Table)
  cop2r8     1xS16 IR0                   16bit Accumulator (Interpolate)
  cop2r9-11  3xS16 IR1,IR2,IR3           16bit Accumulator (Vector)
  cop2r12-15 6xS16 SXY0,SXY1,SXY2,SXYP   Screen XY-coordinate FIFO  (3 stages)
  cop2r16-19 4xU16 SZ0,SZ1,SZ2,SZ3       Screen Z-coordinate FIFO   (4 stages)
  cop2r20-22 12xU8 RGB0,RGB1,RGB2        Color CRGB-code/color FIFO (3 stages)
  cop2r23    4xU8  (RES1)                Prohibited
  cop2r24    1xS32 MAC0                  32bit Maths Accumulators (Value)
  cop2r25-27 3xS32 MAC1,MAC2,MAC3        32bit Maths Accumulators (Vector)
  cop2r28-29 1xU15 IRGB,ORGB             Convert RGB Color (48bit vs 15bit)
  cop2r30-31 2xS32 LZCS,LZCR             Count Leading-Zeroes/Ones (sign bits)
```

#### GTE Control Register Summary (cop2r32-63)
```
  cop2r32-36 9xS16 RT11RT12,..,RT33 Rotation matrix     (3x3)        ;cnt0-4
  cop2r37-39 3x 32 TRX,TRY,TRZ      Translation vector  (X,Y,Z)      ;cnt5-7
  cop2r40-44 9xS16 L11L12,..,L33    Light source matrix (3x3)        ;cnt8-12
  cop2r45-47 3x 32 RBK,GBK,BBK      Background color    (R,G,B)      ;cnt13-15
  cop2r48-52 9xS16 LR1LR2,..,LB3    Light color matrix source (3x3)  ;cnt16-20
  cop2r53-55 3x 32 RFC,GFC,BFC      Far color           (R,G,B)      ;cnt21-23
  cop2r56-57 2x 32 OFX,OFY          Screen offset       (X,Y)        ;cnt24-25
  cop2r58 BuggyU16 H                Projection plane distance.       ;cnt26
  cop2r59      S16 DQA              Depth cueing parameter A (coeff) ;cnt27
  cop2r60       32 DQB              Depth cueing parameter B (offset);cnt28
  cop2r61-62 2xS16 ZSF3,ZSF4        Average Z scale factors          ;cnt29-30
  cop2r63      U20 FLAG             Returns any calculation errors   ;cnt31
```

## GTE Registers

Note: in some functions, the format is different from the one that's given here.

#### Matrix Registers
```
  Rotation matrix (RT)   Light matrix (LLM)    Light Color matrix (LCM)
  cop2r32.lsbs=RT11      cop2r40.lsbs=L11      cop2r48.lsbs=LR1
  cop2r32.msbs=RT12      cop2r40.msbs=L12      cop2r48.msbs=LR2
  cop2r33.lsbs=RT13      cop2r41.lsbs=L13      cop2r49.lsbs=LR3
  cop2r33.msbs=RT21      cop2r41.msbs=L21      cop2r49.msbs=LG1
  cop2r34.lsbs=RT22      cop2r42.lsbs=L22      cop2r50.lsbs=LG2
  cop2r34.msbs=RT23      cop2r42.msbs=L23      cop2r50.msbs=LG3
  cop2r35.lsbs=RT31      cop2r43.lsbs=L31      cop2r51.lsbs=LB1
  cop2r35.msbs=RT32      cop2r43.msbs=L32      cop2r51.msbs=LB2
  cop2r36     =RT33      cop2r44     =L33      cop2r52     =LB3
```
Each element is 16bit (1bit sign, 3bit integer, 12bit fraction). Reading the last elements (RT33,L33,LB3) returns the 16bit value sign-expanded to 32bit.

#### Translation Vector (TR) (Input, R/W?)
```
  cop2r37 (cnt5) - TRX - Translation vector X (R/W?)
  cop2r38 (cnt6) - TRY - Translation vector Y (R/W?)
  cop2r39 (cnt7) - TRZ - Translation vector Z (R/W?)
```
Each element is 32bit (1bit sign, 31bit integer).
Used only for MVMVA, RTPS, RTPT commands.

#### Background Color (BK) (Input?, R/W?)
```
  cop2r45 (cnt13) - RBK - Background color red component
  cop2r46 (cnt14) - GBK - Background color green component
  cop2r47 (cnt15) - BBK - Background color blue component
```
Each element is 32bit (1bit sign, 19bit integer, 12bit fraction).

#### Far Color (FC) (Input?) (R/W?)
```
  cop2r53 (cnt21) - RFC - Far color red component
  cop2r54 (cnt22) - GFC - Far color green component
  cop2r55 (cnt23) - BFC - Far color blue component
```
Each element is 32bit (1bit sign, 27bit integer, 4bit fraction).

#### Screen Offset and Distance (Input, R/W?)
```
  cop2r56 (cnt24) - OFX - Screen offset X
  cop2r57 (cnt25) - OFY - Screen offset Y
  cop2r58 (cnt26) - H   - Projection plane distance
  cop2r59 (cnt27) - DQA - Depth cueing parameter A.(coeff.)
  cop2r60 (cnt28) - DQB - Depth cueing parameter B.(offset.)
```
The X and Y values are each 32bit (1bit sign, 15bit integer, 16bit fraction).
The H value is 16bit unsigned (0bit sign, 16bit integer, 0bit fraction). BUG: When reading the H register, the hardware accidentally sign-expands the unsigned 16bit value (ie. values +8000h..+FFFFh are returned as FFFF8000h..FFFFFFFFh). This bug applies only to "mov rd,cop2r58" opcodes; the actual calculations via RTPS/RTPT opcodes are working okay.
The DQA value is only 16bit (1bit sign, 7bit integer, 8bit fraction).
The DQB value is 32bit (1bit sign, 7bit integer, 24bit? fraction).
Used only for RTPS/RTPT commands.

#### Average Z Registers (ZSF3/ZSF4=Input, R/W?) (OTZ=Result, R)
```
  cop2r61 (cnt29) ZSF3 |           0|ZSF3 1,3,12| Z3 average scale factor (normally 1/3)
  cop2r62 (cnt30) ZSF4 |           0|ZSF4 1,3,12| Z4 average scale factor (normally 1/4)
  cop2r7          OTZ (R) |         |OTZ  0,15, 0| Average Z value (for Ordering Table)
```
Used only for AVSZ3/AVSZ4 commands.

#### Screen XYZ Coordinate FIFOs
```
  cop2r12 - SXY0 rw|SY0  1,15, 0|SX0  1,15, 0| Screen XY fifo (older)
  cop2r13 - SXY1 rw|SY1  1,15, 0|SX1  1,15, 0| Screen XY fifo (old)
  cop2r14 - SXY2 rw|SY2  1,15, 0|SX2  1,15, 0| Screen XY fifo (new)
  cop2r15 - SXYP rw|SYP  1,15, 0|SXP  1,15, 0| SXY2-mirror with move-on-write
  cop2r16 - SZ0  rw|           0|SZ0  0,16, 0| Screen Z fifo (oldest)
  cop2r17 - SZ1  rw|           0|SZ1  0,16, 0| Screen Z fifo (older)
  cop2r18 - SZ2  rw|           0|SZ2  0,16, 0| Screen Z fifo (old)
  cop2r19 - SZ3  rw|           0|SZ3  0,16, 0| Screen Z fifo (new)
```
SX,SY,SZ are used as Output for RTPS/RTPT. Additionally, SX,SY are used as Input for NCLIP, and SZ is used as Input for AVSZ3/AVSZ4.
The SZn Fifo has 4 stages (required for the AVSZ4 command). The SXYn Fifo has only 3 stages, plus a special mirrored register: SXYP is a mirror of SXY2. The difference is that writing to SXYP moves SXY2/SXY1 to SXY1/SXY0, whilst writing to SXY2 (or any other SXYn or SZn register) changes only the written register, and doesn't move any other Fifo entries.
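
The move-on-write behaviour of SXYP can be modelled in a few lines. This is a minimal sketch of the register semantics described above, as an emulator might implement them; the class name `SxyFifo` and the register indexing (0..2 for SXY0..SXY2, 3 for SXYP) are assumptions made for illustration.

```python
class SxyFifo:
    """Toy model of the 3-stage SXY FIFO plus the SXYP mirror (hypothetical)."""

    SXYP = 3  # index used here for the mirror register (assumption)

    def __init__(self):
        self.sxy = [0, 0, 0]  # SXY0 (older), SXY1 (old), SXY2 (new)

    def read(self, index):
        # Reading SXYP returns SXY2, since SXYP is a mirror of SXY2.
        return self.sxy[2] if index == self.SXYP else self.sxy[index]

    def write(self, index, value):
        if index == self.SXYP:
            # Move-on-write: SXY1 -> SXY0, SXY2 -> SXY1, then store new SXY2.
            self.sxy = [self.sxy[1], self.sxy[2], value]
        else:
            # Plain write: only the addressed stage changes.
            self.sxy[index] = value


fifo = SxyFifo()
for packed_xy in (0x00010001, 0x00020002, 0x00030003):
    fifo.write(SxyFifo.SXYP, packed_xy)   # push three vertices
fifo.write(2, 0x00090009)                 # overwrite SXY2 without shifting
print([hex(v) for v in fifo.sxy])
```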

#### 16bit Vectors (R/W)
```
  Vector 0 (V0)         Vector 1 (V1)       Vector 2 (V2)       Vector 3 (IR)
  cop2r0.lsbs - VX0     cop2r2.lsbs - VX1   cop2r4.lsbs - VX2   cop2r9  - IR1
  cop2r0.msbs - VY0     cop2r2.msbs - VY1   cop2r4.msbs - VY2   cop2r10 - IR2
  cop2r1      - VZ0     cop2r3      - VZ1   cop2r5      - VZ2   cop2r11 - IR3
```
All elements are signed 16bit. The IRn and VZn elements occupy a whole 32bit register; reading these registers returns the 16bit value sign-expanded to 32bit. Note: IRn can also be accessed indirectly via the IRGB/ORGB registers.

#### Color Register and Color FIFO
```
  cop2r6  - RGBC  rw|CODE |B    |G    |R    | Color/code
  cop2r20 - RGB0  rw|CD0  |B0   |G0   |R0   | Characteristic color fifo.
  cop2r21 - RGB1  rw|CD1  |B1   |G1   |R1   |
  cop2r22 - RGB2  rw|CD2  |B2   |G2   |R2   |
  cop2r23 - (RES1)  |                       | Prohibited
```
RES1 seems to be unused; it looks like an unused Fifo stage. RES1 is read/write-able, but unlike SXYP (for the SXYn Fifo) it does not mirror to RGB2, nor does it have a move-on-write function.

#### Interpolation Factor
```
  cop2r8 IR0 rw|Sign |IR0 1, 3,12| Intermediate value 0.
```
Used as Output for RTPS/RTPT, and as Input for various commands.

#### XX...
```
  cop2r24 MAC0 rw|MAC0 1,31,0 | Sum of products value 0
```

#### XX...
```
  cop2r25 MAC1 rw|MAC1 1,31,0 | Sum of products value 1
  cop2r26 MAC2 rw|MAC2 1,31,0 | Sum of products value 2
  cop2r27 MAC3 rw|MAC3 1,31,0 | Sum of products value 3
```

#### cop2r28 - IRGB - Color conversion Input (R/W)
Expands 5:5:5 bit RGB (range 0..1Fh) to 16:16:16 bit RGB (range 0000h..0F80h).
```
  0-4    Red   (0..1Fh) (R/W)  ;multiplied by 80h, and written to IR1
  5-9    Green (0..1Fh) (R/W)  ;multiplied by 80h, and written to IR2
  10-14  Blue  (0..1Fh) (R/W)  ;multiplied by 80h, and written to IR3
  15-31  Not used (always zero) (Read only)
```
After writing to IRGB, the result can be read from IR3 after TWO nop's, and from IR1,IR2 after THREE nop's (for uncached code, ONE nop would work). When using IR1,IR2,IR3 as parameters for GTE commands, similar timing restrictions might apply... depending on when the specific commands use the parameters?
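
As a quick illustration of the IRGB expansion, the following sketch computes the IR1,IR2,IR3 values that result from a 15bit write. It models only the arithmetic from the table above (not the nop timing); the function name `irgb_to_ir` is hypothetical.

```python
def irgb_to_ir(irgb):
    """Expand a 5:5:5 IRGB value into (IR1, IR2, IR3); bits 15-31 are ignored."""
    red = (irgb >> 0) & 0x1F
    green = (irgb >> 5) & 0x1F
    blue = (irgb >> 10) & 0x1F
    # Each 5bit component is multiplied by 80h, giving the range 0000h..0F80h.
    return red * 0x80, green * 0x80, blue * 0x80


print([hex(c) for c in irgb_to_ir(0x7FFF)])  # all components at maximum
```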

#### cop2r29 - ORGB - Color conversion Output (R)
Collapses 16:16:16 bit RGB (range 0000h..0F80h) to 5:5:5 bit RGB (range 0..1Fh). Negative values (8000h..FFFFh/80h) are saturated to 00h, large positive values (1000h..7FFFh/80h) are saturated to 1Fh; there are no overflow or saturation flags set in cop2r63 though.
```
  0-4    Red   (0..1Fh) (R)  ;IR1 divided by 80h, saturated to +00h..+1Fh
  5-9    Green (0..1Fh) (R)  ;IR2 divided by 80h, saturated to +00h..+1Fh
  10-14  Blue  (0..1Fh) (R)  ;IR3 divided by 80h, saturated to +00h..+1Fh
  15-31  Not used (always zero) (Read only)
```
Any changes to IR1,IR2,IR3 are reflected to this register (and, actually also to IRGB) (ie. ORGB is simply a read-only mirror of IRGB).
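
The ORGB collapse can be sketched the same way. The function below takes IR1,IR2,IR3 as signed Python integers and applies the divide-by-80h and saturation described above; the name `ir_to_orgb` is hypothetical, and no FLAG bits are modelled because none are affected.

```python
def ir_to_orgb(ir1, ir2, ir3):
    """Collapse signed 16bit IR1..IR3 into a 5:5:5 ORGB value (hypothetical helper)."""
    def component(ir):
        # Arithmetic shift by 7 divides by 80h; negative values saturate to 00h
        # and values of 1000h or more saturate to 1Fh.
        return max(0x00, min(0x1F, ir >> 7))

    return component(ir1) | component(ir2) << 5 | component(ir3) << 10


print(hex(ir_to_orgb(0x0F80, 0x0F80, 0x0F80)))
print(hex(ir_to_orgb(-1, 0x1000, 0x0080)))
```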

#### cop2r30 - LZCS - Count Leading Bits Source data (R/W)
#### cop2r31 - LZCR - Count Leading Bits Result (R)
Reading LZCR returns the leading 0 count of LZCS if LZCS is positive, and the leading 1 count of LZCS if LZCS is negative. The results are in range 1..32.

#### cop2r63 (cnt31) - FLAG - Returns any calculation errors.
See GTE Saturation chapter.
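
A minimal sketch of the LZCR computation, treating LZCS as a raw 32bit value. The function name `lzcr` is hypothetical; the logic simply counts how many leading bits equal the sign bit.

```python
def lzcr(lzcs):
    """Count leading zeroes (positive LZCS) or leading ones (negative LZCS)."""
    value = lzcs & 0xFFFFFFFF
    if value & 0x80000000:
        # Negative: invert so that leading ones become leading zeroes.
        value = ~value & 0xFFFFFFFF
    # bit_length() is the position of the highest set bit (0 for value 0).
    return 32 - value.bit_length()


for sample in (0x00000000, 0x00000001, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
    print(hex(sample), lzcr(sample))
```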

## GTE Saturation

Maths overflows are indicated in the FLAG register. In most cases, the result is saturated to MIN/MAX values (except MAC0,MAC1,MAC2,MAC3, which aren't saturated). For IR1,IR2,IR3, many commands allow selecting the MIN value via the "lm" bit of the GTE opcode (though not all commands; RTPS/RTPT always act as if lm=0).

#### cop2r63 (cnt31) - FLAG - Returns any calculation errors.
```
  31   Error Flag (Bit30..23, and 18..13 ORed together) (Read only)
  30   MAC1 Result larger than 43 bits and positive
  29   MAC2 Result larger than 43 bits and positive
  28   MAC3 Result larger than 43 bits and positive
  27   MAC1 Result larger than 43 bits and negative
  26   MAC2 Result larger than 43 bits and negative
  25   MAC3 Result larger than 43 bits and negative
  24   IR1 saturated to +0000h..+7FFFh (lm=1) or to -8000h..+7FFFh (lm=0)
  23   IR2 saturated to +0000h..+7FFFh (lm=1) or to -8000h..+7FFFh (lm=0)
  22   IR3 saturated to +0000h..+7FFFh (lm=1) or to -8000h..+7FFFh (lm=0)
  21   Color-FIFO-R saturated to +00h..+FFh
  20   Color-FIFO-G saturated to +00h..+FFh
  19   Color-FIFO-B saturated to +00h..+FFh
  18   SZ3 or OTZ saturated to +0000h..+FFFFh
  17   Divide overflow. RTPS/RTPT division result saturated to max=1FFFFh
  16   MAC0 Result larger than 31 bits and positive
  15   MAC0 Result larger than 31 bits and negative
  14   SX2 saturated to -0400h..+03FFh
  13   SY2 saturated to -0400h..+03FFh
  12   IR0 saturated to +0000h..+1000h
  0-11 Not used (always zero) (Read only)
```
Bit30-12 are read/write-able, ie. they can be set/reset by software; however, that's normally not required - all bits are automatically reset at the beginning of a new GTE command.
Bit31 is apparently intended for RTPS/RTPT commands, since it triggers only on flags that are affected by these two commands, but even for those commands it's totally useless since one could as well check if FLAG is nonzero.
Note: Writing 32bit values to 16bit GTE registers by software does not trigger any overflow/saturation flags (and does not do any saturation), eg. writing 12008900h (positive 32bit) to a signed 16bit register sets that register to FFFF8900h (negative 16bit).
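
The truncation example in the note can be reproduced with a short sketch. It assumes a signed 16bit register that is read back sign-expanded to 32bit (as described for the IRn/VZn registers); the function name is hypothetical.

```python
def write_then_read_s16(value32):
    """Value read back after writing a 32bit word to a signed 16bit GTE register."""
    low = value32 & 0xFFFF                        # upper 16 bits are discarded
    signed = low - 0x10000 if low & 0x8000 else low
    return signed & 0xFFFFFFFF                    # sign-expanded 32bit read value


print(hex(write_then_read_s16(0x12008900)))
```

No saturation takes place: the upper halfword is dropped, and bit15 of the remainder becomes the sign.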

## GTE Opcode Summary

#### GTE Command Summary (sorted by Real Opcode bits) (bit0-5)
```
  Opc  Name   Clk Expl.
  00h  -          N/A (modifies similar registers as RTPS...)
  01h  RTPS   15  Perspective Transformation single
  0xh  -          N/A
  06h  NCLIP  8   Normal clipping
  0xh  -          N/A
  0Ch  OP(sf) 6   Outer product of 2 vectors
  0xh  -          N/A
  10h  DPCS   8   Depth Cueing single
  11h  INTPL  8   Interpolation of a vector and far color vector
  12h  MVMVA  8   Multiply vector by matrix and add vector (see below)
  13h  NCDS   19  Normal color depth cue single vector
  14h  CDP    13  Color Depth Que
  15h  -          N/A
  16h  NCDT   44  Normal color depth cue triple vectors
  1xh  -          N/A
  1Bh  NCCS   17  Normal Color Color single vector
  1Ch  CC     11  Color Color
  1Dh  -          N/A
  1Eh  NCS    14  Normal color single
  1Fh  -          N/A
  20h  NCT    30  Normal color triple
  2xh  -          N/A
  28h  SQR(sf)5   Square of vector IR
  29h  DCPL   8   Depth Cue Color light
  2Ah  DPCT   17  Depth Cueing triple (should be fake=08h, but isn't)
  2xh  -          N/A
  2Dh  AVSZ3  5   Average of three Z values
  2Eh  AVSZ4  6   Average of four Z values
  2Fh  -          N/A
  30h  RTPT   23  Perspective Transformation triple
  3xh  -          N/A
  3Dh  GPF(sf)5   General purpose interpolation
  3Eh  GPL(sf)5   General purpose interpolation with base
  3Fh  NCCT   39  Normal Color Color triple vector
```
Unknown if/what happens when using the "N/A" opcodes?

#### GTE Command Summary (sorted by Fake Opcode bits) (bit20-24)
The fake opcode number in bit20-24 has absolutely no effect on the hardware; it seems to be solely used to (or not to) confuse developers. Having the opcodes sorted by their fake numbers gives a more or less well arranged list:
```
  Fake Name   Clk Expl.
  00h  -          N/A
  01h  RTPS   15  Perspective Transformation single
  02h  RTPT   23  Perspective Transformation triple
  03h  -          N/A
  04h  MVMVA  8   Multiply vector by matrix and add vector (see below)
  05h  -          N/A
  06h  DCPL   8   Depth Cue Color light
  07h  DPCS   8   Depth Cueing single
  08h  DPCT   17  Depth Cueing triple (should be fake=08h, but isn't)
  09h  INTPL  8   Interpolation of a vector and far color vector
  0Ah  SQR(sf)5   Square of vector IR
  0Bh  -          N/A
  0Ch  NCS    14  Normal color single
  0Dh  NCT    30  Normal color triple
  0Eh  NCDS   19  Normal color depth cue single vector
  0Fh  NCDT   44  Normal color depth cue triple vectors
  10h  NCCS   17  Normal Color Color single vector
  11h  NCCT   39  Normal Color Color triple vector
  12h  CDP    13  Color Depth Que
  13h  CC     11  Color Color
  14h  NCLIP  8   Normal clipping
  15h  AVSZ3  5   Average of three Z values
  16h  AVSZ4  6   Average of four Z values
  17h  OP(sf) 6   Outer product of 2 vectors
  18h  -          N/A
  19h  GPF(sf)5   General purpose interpolation
  1Ah  GPL(sf)5   General purpose interpolation with base
  1Bh  -          N/A
  1Ch  -          N/A
  1Dh  -          N/A
  1Eh  -          N/A
  1Fh  -          N/A
```
For the sort-effect, DPCT should use fake=08h, but Sony seems to have accidentally numbered it fake=0Fh in their devkit (giving it the same fake number as for NCDT). Also, "Wipeout 2097" accidentally uses 0140006h (fake=01h and distorted bit18) instead of 1400006h (fake=14h) for NCLIP.

#### Additional Functions
The LZCS/LZCR registers offer a Count-Leading-Zeroes/Leading-Ones function.
The IRGB/ORGB registers allow converting between 48bit and 15bit RGB colors.
These registers work without needing to send any COP2 commands. However, unlike for commands (which do automatically halt the CPU when needed), one must insert dummy opcodes between writing and reading the registers.

## GTE Coordinate Calculation Commands

#### COP2 0180001h - 15 Cycles - RTPS - Perspective Transformation (single)
#### COP2 0280030h - 23 Cycles - RTPT - Perspective Transformation (triple)
RTPS performs the final rotation, translation and perspective transformation on vertex V0. Before writing to the FIFOs, the older entries are moved one stage down. RTPT is the same as RTPS, but repeats for V1 and V2. The "sf" bit should usually be set.
```
  IR1 = MAC1 = (TRX*1000h + RT11*VX0 + RT12*VY0 + RT13*VZ0) SAR (sf*12)
  IR2 = MAC2 = (TRY*1000h + RT21*VX0 + RT22*VY0 + RT23*VZ0) SAR (sf*12)
  IR3 = MAC3 = (TRZ*1000h + RT31*VX0 + RT32*VY0 + RT33*VZ0) SAR (sf*12)
  SZ3 = MAC3 SAR ((1-sf)*12)                           ;ScreenZ FIFO 0..+FFFFh
  MAC0=(((H*20000h/SZ3)+1)/2)*IR1+OFX, SX2=MAC0/10000h ;ScrX FIFO -400h..+3FFh
  MAC0=(((H*20000h/SZ3)+1)/2)*IR2+OFY, SY2=MAC0/10000h ;ScrY FIFO -400h..+3FFh
  MAC0=(((H*20000h/SZ3)+1)/2)*DQA+DQB, IR0=MAC0/1000h  ;Depth cueing 0..+1000h
```
If the result of the "(((H\*20000h/SZ3)+1)/2)" division is greater than 1FFFFh, then the division result is saturated to +1FFFFh, and the divide overflow bit in the FLAG register gets set; that happens if the vertex is exceeding the "near clip plane", ie. if it is very close to the camera (SZ3\<=H/2), exactly at the camera position (SZ3=0), or behind the camera (negative Z coordinates are saturated to SZ3=0). For details on the division, see:
[GTE Division Inaccuracy](geometrytransformationenginegte.md#gte-division-inaccuracy)
For "far plane clipping", one can use the SZ3 saturation flag (MaxZ=FFFFh), or the IR3 saturation flag (MaxZ=7FFFh) (eg. used by Wipeout 2097), or one can compare the SZ3 value with any desired MaxZ value by software.
Note: The command does saturate IR1,IR2,IR3 to -8000h..+7FFFh (regardless of lm bit). When using RTP with sf=0, then the IR3 saturation flag (FLAG.22) gets set only if "MAC3 SAR 12" exceeds -8000h..+7FFFh (although IR3 is saturated when "MAC3" exceeds -8000h..+7FFFh).

#### COP2 1400006h - 8 Cycles - NCLIP - Normal clipping
```
  MAC0 = SX0*SY1 + SX1*SY2 + SX2*SY0 - SX0*SY2 - SX1*SY0 - SX2*SY1
```
The sign of the result indicates whether the polygon coordinates are arranged clockwise or anticlockwise (ie. whether the front side or backside is visible). If the result is zero, then it's neither one (ie. the vertices are all arranged in a straight line). Note: The GPU probably renders straight lines as invisible 0 pixel width lines?
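
The NCLIP formula can be tried out with a small sketch. It takes three screen coordinates as (x, y) tuples and returns MAC0; the function name `nclip` is hypothetical, and MAC0 overflow flags are not modelled.

```python
def nclip(sxy0, sxy1, sxy2):
    """Return MAC0 for three screen vertices given as (x, y) tuples."""
    (sx0, sy0), (sx1, sy1), (sx2, sy2) = sxy0, sxy1, sxy2
    return (sx0 * sy1 + sx1 * sy2 + sx2 * sy0
            - sx0 * sy2 - sx1 * sy0 - sx2 * sy1)


print(nclip((0, 0), (10, 0), (0, 10)))   # one winding order
print(nclip((0, 0), (0, 10), (10, 0)))   # same triangle, opposite winding
print(nclip((0, 0), (1, 1), (2, 2)))     # vertices on a straight line
```

Swapping two vertices flips the sign of the result, which is what makes the value usable for backface culling.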

#### COP2 158002Dh - 5 Cycles - AVSZ3 - Average of three Z values (for Triangles)
#### COP2 168002Eh - 6 Cycles - AVSZ4 - Average of four Z values (for Quads)
```
  MAC0 =  ZSF3*(SZ1+SZ2+SZ3)       ;for AVSZ3
  MAC0 =  ZSF4*(SZ0+SZ1+SZ2+SZ3)   ;for AVSZ4
  OTZ  =  MAC0/1000h               ;for both (saturated to 0..FFFFh)
```
Adds three or four Z values together and multiplies them by a fixed point value. The result can be used as index in the GPU's Ordering Table (OT).
[GPU Depth Ordering](graphicsprocessingunitgpu.md#gpu-depth-ordering)
The scaling factors would usually be ZSF3=N/30h and ZSF4=N/40h, where "N" is the number of entries in the OT (max 10000h). SZn and OTZ are unsigned 16bit values; for whatever reason, ZSFn registers are signed 16bit values (negative values would allow a negative result in MAC0, but would saturate OTZ to zero).
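
A minimal sketch of the AVSZ3/AVSZ4 arithmetic, assuming the division by 1000h behaves like an arithmetic shift right by 12. The function name `avsz` is hypothetical, and the MAC0 overflow and OTZ saturation flags are left out.

```python
def avsz(zsf, *sz):
    """Return (MAC0, OTZ) for AVSZ3 (three SZ values) or AVSZ4 (four SZ values)."""
    mac0 = zsf * sum(sz)
    # OTZ = MAC0/1000h, saturated to the unsigned 16bit range 0..FFFFh.
    otz = max(0x0000, min(0xFFFF, mac0 >> 12))
    return mac0, otz


ZSF3_ONE_THIRD = 0x1000 // 3   # 555h, ie. roughly 1/3 in 1.3.12 fixed point
print(avsz(ZSF3_ONE_THIRD, 0x300, 0x300, 0x300))
print(avsz(-ZSF3_ONE_THIRD, 0x300, 0x300, 0x300))  # negative factor: OTZ is 0
```

Because 555h is slightly less than an exact third, the average of three equal values 300h comes out one below 300h.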

## GTE General Purpose Calculation Commands

#### COP2 0400012h - 8 Cycles - MVMVA(sf,mx,v,cv,lm)
Multiply vector by matrix and vector addition.
```
  Mx = matrix specified by mx  ;RT/LLM/LCM - Rotation, light or color matrix
  Vx = vector specified by v   ;V0, V1, V2, or [IR1,IR2,IR3]
  Tx = translation vector specified by cv  ;TR or BK or Bugged/FC, or None
```
Calculation:
```
  MAC1 = (Tx1*1000h + Mx11*Vx1 + Mx12*Vx2 + Mx13*Vx3) SAR (sf*12)
  MAC2 = (Tx2*1000h + Mx21*Vx1 + Mx22*Vx2 + Mx23*Vx3) SAR (sf*12)
  MAC3 = (Tx3*1000h + Mx31*Vx1 + Mx32*Vx2 + Mx33*Vx3) SAR (sf*12)
  [IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
```
Multiplies a vector with either the rotation matrix, the light matrix or the color matrix and then adds the translation vector or background color vector.
The GTE also allows selection of the far color vector (FC), but this vector is not added correctly by the hardware: The return values are reduced to the last portion of the formula, ie. MAC1=(Mx13\*Vx3) SAR (sf\*12), and similar for MAC2 and MAC3; nevertheless, some bits in the FLAG register seem to be adjusted as if the full operation would have been executed. Setting Mx=3 selects a garbage matrix (with elements -60h, +60h, IR0, RT13, RT13, RT13, RT22, RT22, RT22).

#### COP2 0A00428h+sf\*80000h - 5 Cycles - SQR(sf) - Square vector
```
  [MAC1,MAC2,MAC3] = [IR1*IR1,IR2*IR2,IR3*IR3] SHR (sf*12)
  [IR1,IR2,IR3]    = [MAC1,MAC2,MAC3]    ;IR1,IR2,IR3 saturated to max 7FFFh
```
Calculates the square of a vector. The result is, of course, always positive, so the "lm" flag for negative saturation has no effect.

#### COP2 170000Ch+sf\*80000h - 6 Cycles - OP(sf,lm) - Outer product of 2 vectors
```
  [MAC1,MAC2,MAC3] = [IR3*D2-IR2*D3, IR1*D3-IR3*D1, IR2*D1-IR1*D2] SAR (sf*12)
  [IR1,IR2,IR3]    = [MAC1,MAC2,MAC3]                        ;copy result
```
Calculates the cross product of two signed 16bit vectors. Note: D1,D2,D3 are meant to be the RT11,RT22,RT33 elements of the RT matrix "misused" as vector. lm should usually be zero.

#### LZCS/LZCR registers - ? Cycles - Count-Leading-Zeroes/Leading-Ones
The LZCS/LZCR registers offer a Count-Leading-Zeroes/Leading-Ones function.

## GTE Color Calculation Commands

#### COP2 0C8041Eh - 14 Cycles - NCS - Normal color (single)
#### COP2 0D80420h - 30 Cycles - NCT - Normal color (triple)
#### COP2 108041Bh - 17 Cycles - NCCS - Normal Color Color (single vector)
#### COP2 118043Fh - 39 Cycles - NCCT - Normal Color Color (triple vector)
#### COP2 0E80413h - 19 Cycles - NCDS - Normal color depth cue (single vector)
#### COP2 0F80416h - 44 Cycles - NCDT - Normal color depth cue (triple vectors)
In: V0=Normal vector (for triple variants repeated with V1 and V2), BK=Background color, RGBC=Primary color/code, LLM=Light matrix, LCM=Color matrix, IR0=Interpolation value.
```
  [IR1,IR2,IR3] = [MAC1,MAC2,MAC3] = (LLM*V0) SAR (sf*12)
  [IR1,IR2,IR3] = [MAC1,MAC2,MAC3] = (BK*1000h + LCM*IR) SAR (sf*12)
  [MAC1,MAC2,MAC3] = [R*IR1,G*IR2,B*IR3] SHL 4          ;<--- for NCDx/NCCx
  [MAC1,MAC2,MAC3] = MAC+(FC-MAC)*IR0                   ;<--- for NCDx only
  [MAC1,MAC2,MAC3] = [MAC1,MAC2,MAC3] SAR (sf*12)       ;<--- for NCDx/NCCx
  Color FIFO = [MAC1/16,MAC2/16,MAC3/16,CODE], [IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
```

#### COP2 138041Ch - 11 Cycles - CC(lm=1) - Color Color
#### COP2 1280414h - 13 Cycles - CDP(...) - Color Depth Que
In: [IR1,IR2,IR3]=Vector, RGBC=Primary color/code, LCM=Color matrix, BK=Background color, and, for CDP, IR0=Interpolation value, FC=Far color.
```
  [IR1,IR2,IR3] = [MAC1,MAC2,MAC3] = (BK*1000h + LCM*IR) SAR (sf*12)
  [MAC1,MAC2,MAC3] = [R*IR1,G*IR2,B*IR3] SHL 4
  [MAC1,MAC2,MAC3] = MAC+(FC-MAC)*IR0                   ;<--- for CDP only
  [MAC1,MAC2,MAC3] = [MAC1,MAC2,MAC3] SAR (sf*12)
  Color FIFO = [MAC1/16,MAC2/16,MAC3/16,CODE], [IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
```

#### COP2 0680029h - 8 Cycles - DCPL - Depth Cue Color light
#### COP2 0780010h - 8 Cycles - DPCS - Depth Cueing (single)
#### COP2 0x8002Ah - 17 Cycles - DPCT - Depth Cueing (triple)
#### COP2 0980011h - 8 Cycles - INTPL - Interpolation of a vector and far color
In: [IR1,IR2,IR3]=Vector, FC=Far Color, IR0=Interpolation value, CODE=MSB of RGBC, and, for DCPL, R,G,B=LSBs of RGBC.
```
  [MAC1,MAC2,MAC3] = [R*IR1,G*IR2,B*IR3] SHL 4          ;<--- for DCPL only
  [MAC1,MAC2,MAC3] = [IR1,IR2,IR3] SHL 12               ;<--- for INTPL only
  [MAC1,MAC2,MAC3] = [R,G,B] SHL 16                     ;<--- for DPCS/DPCT
  [MAC1,MAC2,MAC3] = MAC+(FC-MAC)*IR0
  [MAC1,MAC2,MAC3] = [MAC1,MAC2,MAC3] SAR (sf*12)
  Color FIFO = [MAC1/16,MAC2/16,MAC3/16,CODE], [IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
```
DPCT executes thrice, and reads the R,G,B values from RGB0 (ie. reads from the Bottom of the Color FIFO, instead of from the RGBC register) (the CODE value is still read from RGBC as usual), so, after DPCT execution, the RGB0,RGB1,RGB2 Fifo entries are modified.

#### COP2 190003Dh - 5 Cycles - GPF(sf,lm) - General purpose Interpolation
#### COP2 1A0003Eh - 5 Cycles - GPL(sf,?) - General Interpolation with base
```
  [MAC1,MAC2,MAC3] = [0,0,0]                            ;<--- for GPF only
  [MAC1,MAC2,MAC3] = [MAC1,MAC2,MAC3] SHL (sf*12)       ;<--- for GPL only
  [MAC1,MAC2,MAC3] = (([IR1,IR2,IR3] * IR0) + [MAC1,MAC2,MAC3]) SAR (sf*12)
  Color FIFO = [MAC1/16,MAC2/16,MAC3/16,CODE], [IR1,IR2,IR3] = [MAC1,MAC2,MAC3]
```
Note: Although the SHL in GPL is theoretically undone by the SAR, 44bit overflows can occur internally when sf=1.

#### Details on "MAC+(FC-MAC)\*IR0"
```
  [IR1,IR2,IR3] = (([RFC,GFC,BFC] SHL 12) - [MAC1,MAC2,MAC3]) SAR (sf*12)
  [MAC1,MAC2,MAC3] = (([IR1,IR2,IR3] * IR0) + [MAC1,MAC2,MAC3])
```
Note: Above "[IR1,IR2,IR3]=(FC-MAC)" is saturated to -8000h..+7FFFh (ie. as if lm=0); further writes to [IR1,IR2,IR3] (within the same command) are saturated as usual (ie. depending on the lm setting).

#### Details on "(LLM\*V0) SAR (sf\*12)" and "(BK\*1000h + LCM\*IR) SAR (sf\*12)"
Works like the MVMVA command (see there), but with fixed Tx/Vx/Mx parameters. The sf/lm bits can be changed and do affect the results (although normally both bits should be set for use with color matrices).

#### Notes
The 8bit RGB values written to the top of the Color Fifo are the 32bit MACn values divided by 16, and saturated to +00h..+FFh, and of course, the older Fifo entries are moved downwards. Note that, at the GPU side, the meaning of the RGB values depends on whether or not texture blending is used (for untextured polygons FFh is max brightness) (for texture blending FFh is double brightness and 80h is normal brightness).
The 8bit CODE value is intended to contain a GP0(20h..7Fh) Rendering command, allowing to automatically merge the 8bit command number, with the 24bit color value.
The IRGB/ORGB registers allow to convert between 48bit and 15bit RGB colors.
Although the result of the commands in this chapter is written to the Color FIFO, some commands like GPF/GPL may be also used for other purposes (eg. to scale or scale/translate single vertices).

## GTE Division Inaccuracy

#### GTE Division Inaccuracy (for RTPS/RTPT commands)
Basically, the GTE division does (attempt to) work as so (using 33bit maths):
```
  n = (((H*20000h/SZ3)+1)/2)
```
Alternatively, below would give (almost) the same result (using 32bit maths):
```
  n = ((H*10000h+SZ3/2)/SZ3)
```
In both cases, the result is saturated about as so:
```
  if n>1FFFFh or division_by_zero then n=1FFFFh, FLAG.Bit17=1, FLAG.Bit31=1
```
However, the real GTE hardware is using a fast, but less accurate division mechanism (based on Unsigned Newton-Raphson (UNR) algorithm):
```
  if (H < SZ3*2) then                            ;check if overflow
    z = count_leading_zeroes(SZ3)                ;z=0..0Fh (for 16bit SZ3)
    n = (H SHL z)                                ;n=0..7FFF8000h
    d = (SZ3 SHL z)                              ;d=8000h..FFFFh
    u = unr_table[(d-7FC0h) SHR 7] + 101h        ;u=200h..101h
    d = ((2000080h - (d * u)) SHR 8)             ;d=10000h..0FF01h
    d = ((0000080h + (d * u)) SHR 8)             ;d=20000h..10000h
    n = min(1FFFFh, (((n*d) + 8000h) SHR 16))    ;n=0..1FFFFh
  else n = 1FFFFh, FLAG.Bit17=1, FLAG.Bit31=1    ;n=1FFFFh plus overflow flag
```
The GTE's unr\_table[000h..100h] consists of the following values:
```
  FFh,FDh,FBh,F9h,F7h,F5h,F3h,F1h,EFh,EEh,ECh,EAh,E8h,E6h,E4h,E3h ;\
  E1h,DFh,DDh,DCh,DAh,D8h,D6h,D5h,D3h,D1h,D0h,CEh,CDh,CBh,C9h,C8h ; 00h..3Fh
  C6h,C5h,C3h,C1h,C0h,BEh,BDh,BBh,BAh,B8h,B7h,B5h,B4h,B2h,B1h,B0h ;
  AEh,ADh,ABh,AAh,A9h,A7h,A6h,A4h,A3h,A2h,A0h,9Fh,9Eh,9Ch,9Bh,9Ah ;/
  99h,97h,96h,95h,94h,92h,91h,90h,8Fh,8Dh,8Ch,8Bh,8Ah,89h,87h,86h ;\
  85h,84h,83h,82h,81h,7Fh,7Eh,7Dh,7Ch,7Bh,7Ah,79h,78h,77h,75h,74h ; 40h..7Fh
  73h,72h,71h,70h,6Fh,6Eh,6Dh,6Ch,6Bh,6Ah,69h,68h,67h,66h,65h,64h ;
  63h,62h,61h,60h,5Fh,5Eh,5Dh,5Dh,5Ch,5Bh,5Ah,59h,58h,57h,56h,55h ;/
  54h,53h,53h,52h,51h,50h,4Fh,4Eh,4Dh,4Dh,4Ch,4Bh,4Ah,49h,48h,48h ;\
  47h,46h,45h,44h,43h,43h,42h,41h,40h,3Fh,3Fh,3Eh,3Dh,3Ch,3Ch,3Bh ; 80h..BFh
  3Ah,39h,39h,38h,37h,36h,36h,35h,34h,33h,33h,32h,31h,31h,30h,2Fh ;
  2Eh,2Eh,2Dh,2Ch,2Ch,2Bh,2Ah,2Ah,29h,28h,28h,27h,26h,26h,25h,24h ;/
  24h,23h,22h,22h,21h,20h,20h,1Fh,1Eh,1Eh,1Dh,1Dh,1Ch,1Bh,1Bh,1Ah ;\
  19h,19h,18h,18h,17h,16h,16h,15h,15h,14h,14h,13h,12h,12h,11h,11h ; C0h..FFh
  10h,0Fh,0Fh,0Eh,0Eh,0Dh,0Dh,0Ch,0Ch,0Bh,0Ah,0Ah,09h,09h,08h,08h ;
  07h,07h,06h,06h,05h,05h,04h,04h,03h,03h,02h,02h,01h,01h,00h,00h ;/
  00h    ;<-- one extra table entry (for "(d-7FC0h)/80h"=100h)    ;-100h
```
Above can be generated as "unr\_table[i]=max(0,(40000h/(i+100h)+1)/2-101h)".
Some special cases: NNNNh/0001h uses a big multiplier (d=20000h); in practice, this can occur only for 0000h/0001h and 0001h/0001h (due to the H\<SZ3\*2 overflow check).
The min(1FFFFh) limit is needed for cases like FE3Fh/7F20h, F015h/780Bh, etc. (these do produce UNR result 20000h, and are saturated to 1FFFFh, but without setting overflow FLAG bits).
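
The UNR division can be transcribed into a short runnable sketch, which is handy for checking emulator output against the special cases mentioned above. The names `UNR_TABLE` and `gte_divide` are hypothetical; the table is built with the generator formula rather than typed in, and the overflow flags are returned as a single boolean instead of being written to FLAG.

```python
# unr_table[000h..100h], generated as max(0,(40000h/(i+100h)+1)/2-101h)
UNR_TABLE = [max(0, (0x40000 // (i + 0x100) + 1) // 2 - 0x101)
             for i in range(0x101)]


def gte_divide(h, sz3):
    """Return (n, overflow) for unsigned 16bit H and SZ3 (hypothetical helper)."""
    if h < sz3 * 2:
        # SZ3 is nonzero in this branch, so bit_length() is 1..16.
        z = 16 - sz3.bit_length()          # count leading zeroes of 16bit SZ3
        n = h << z
        d = sz3 << z                       # normalized divisor, 8000h..FFFFh
        u = UNR_TABLE[(d - 0x7FC0) >> 7] + 0x101
        d = (0x2000080 - d * u) >> 8       # first refinement step
        d = (0x0000080 + d * u) >> 8       # reciprocal estimate
        return min(0x1FFFF, (n * d + 0x8000) >> 16), False
    # H >= SZ3*2 (includes SZ3=0): saturate and report divide overflow.
    return 0x1FFFF, True


print(gte_divide(0x1000, 0x1000))   # H == SZ3
print(gte_divide(0xFE3F, 0x7F20))   # saturated by min(), no overflow flag
print(gte_divide(0x1000, 0x0800))   # H >= SZ3*2, overflow
```

The results are 16.16 fixed point, so H equal to SZ3 yields 10000h.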