# CPU Specifications #### CPU [CPU Registers](cpuspecifications.md#cpu-registers)
[CPU Opcode Encoding](cpuspecifications.md#cpu-opcode-encoding)
[CPU Load/Store Opcodes](cpuspecifications.md#cpu-loadstore-opcodes)
[CPU ALU Opcodes](cpuspecifications.md#cpu-alu-opcodes)
[CPU Jump Opcodes](cpuspecifications.md#cpu-jump-opcodes)
[CPU Coprocessor Opcodes](cpuspecifications.md#cpu-coprocessor-opcodes)
[CPU Pseudo Opcodes](cpuspecifications.md#cpu-pseudo-opcodes)
#### System Control Coprocessor (COP0) [COP0 - Register Summary](cpuspecifications.md#cop0---register-summary)
[COP0 - Exception Handling](cpuspecifications.md#cop0---exception-handling)
[COP0 - Misc](cpuspecifications.md#cop0---misc)
[COP0 - Debug Registers](cpuspecifications.md#cop0---debug-registers)
## CPU Registers All registers are 32bit wide.
``` Name Alias Common Usage (R0) zero Constant (always 0) (this one isn't a real register) R1 at Assembler temporary (destroyed by some pseudo opcodes!) R2-R3 v0-v1 Subroutine return values, may be changed by subroutines R4-R7 a0-a3 Subroutine arguments, may be changed by subroutines R8-R15 t0-t7 Temporaries, may be changed by subroutines R16-R23 s0-s7 Static variables, must be saved by subs R24-R25 t8-t9 Temporaries, may be changed by subroutines R26-R27 k0-k1 Reserved for kernel (destroyed by some IRQ handlers!) R28 gp Global pointer (rarely used) R29 sp Stack pointer R30 fp(s8) Frame Pointer, or 9th Static variable, must be saved R31 ra Return address (used so by JAL,BLTZAL,BGEZAL opcodes) - pc Program counter - hi,lo Multiply/divide results, may be changed by subroutines ``` R0 is always zero.
R31 can be used as general purpose register, however, some opcodes are using it to store the return address: JAL, BLTZAL, BGEZAL. (Note: JALR can optionally store the return address in R31, or in R1..R30. Exceptions store the return address in cop0r14 - EPC).
#### R29 (SP) - Full Decrementing Wasted Stack Pointer The CPU doesn't explicitly have stack-related registers or opcodes, however, conventionally, R29 is used as stack pointer (SP). The stack can be accessed with normal load/store opcodes, which do not automatically increase/decrease SP, so the SP register must be manually modified to (de-)allocate data.
The PSX BIOS is using "Full Decrementing Wasted Stack".
Decrementing means that SP gets decremented when allocating data (that's common for most CPUs) - Full means that SP points to the first ALLOCATED word on the stack, so the allocated memory is at SP+0 and above, free memory at SP-1 and below, Wasted means that when calling a sub-function with N parameters, then the caller must pre-allocate N works on stack, and the sub-function may freely use and destroy these words; at [SP+0..N\*4-1].
For example, "push ra,r16,r17" would be implemented as:
``` sub sp,20h mov [sp+14h],ra mov [sp+18h],r16 mov [sp+1Ch],r17 ``` where the allocated 20h bytes have the following purpose:
``` [sp+00h..0Fh] wasted stack (may, or may not, be used by sub-functions) [sp+10h..13h] 8-byte alignment padding (not used) [sp+14h..1Fh] pushed registers ``` ## CPU Opcode Encoding #### Primary opcode field (Bit 26..31) ``` 00h=SPECIAL 08h=ADDI 10h=COP0 18h=N/A 20h=LB 28h=SB 30h=LWC0 38h=SWC0 01h=BcondZ 09h=ADDIU 11h=COP1 19h=N/A 21h=LH 29h=SH 31h=LWC1 39h=SWC1 02h=J 0Ah=SLTI 12h=COP2 1Ah=N/A 22h=LWL 2Ah=SWL 32h=LWC2 3Ah=SWC2 03h=JAL 0Bh=SLTIU 13h=COP3 1Bh=N/A 23h=LW 2Bh=SW 33h=LWC3 3Bh=SWC3 04h=BEQ 0Ch=ANDI 14h=N/A 1Ch=N/A 24h=LBU 2Ch=N/A 34h=N/A 3Ch=N/A 05h=BNE 0Dh=ORI 15h=N/A 1Dh=N/A 25h=LHU 2Dh=N/A 35h=N/A 3Dh=N/A 06h=BLEZ 0Eh=XORI 16h=N/A 1Eh=N/A 26h=LWR 2Eh=SWR 36h=N/A 3Eh=N/A 07h=BGTZ 0Fh=LUI 17h=N/A 1Fh=N/A 27h=N/A 2Fh=N/A 37h=N/A 3Fh=N/A ``` #### Secondary opcode field (Bit 0..5) (when Primary opcode = 00h) ``` 00h=SLL 08h=JR 10h=MFHI 18h=MULT 20h=ADD 28h=N/A 30h=N/A 38h=N/A 01h=N/A 09h=JALR 11h=MTHI 19h=MULTU 21h=ADDU 29h=N/A 31h=N/A 39h=N/A 02h=SRL 0Ah=N/A 12h=MFLO 1Ah=DIV 22h=SUB 2Ah=SLT 32h=N/A 3Ah=N/A 03h=SRA 0Bh=N/A 13h=MTLO 1Bh=DIVU 23h=SUBU 2Bh=SLTU 33h=N/A 3Bh=N/A 04h=SLLV 0Ch=SYSCALL 14h=N/A 1Ch=N/A 24h=AND 2Ch=N/A 34h=N/A 3Ch=N/A 05h=N/A 0Dh=BREAK 15h=N/A 1Dh=N/A 25h=OR 2Dh=N/A 35h=N/A 3Dh=N/A 06h=SRLV 0Eh=N/A 16h=N/A 1Eh=N/A 26h=XOR 2Eh=N/A 36h=N/A 3Eh=N/A 07h=SRAV 0Fh=N/A 17h=N/A 1Fh=N/A 27h=NOR 2Fh=N/A 37h=N/A 3Fh=N/A ``` #### Opcode/Parameter Encoding ``` 31..26 |25..21|20..16|15..11|10..6 | 5..0 | 6bit | 5bit | 5bit | 5bit | 5bit | 6bit | -------+------+------+------+------+--------+------------ 000000 | N/A | rt | rd | imm5 | 0000xx | shift-imm 000000 | rs | rt | rd | N/A | 0001xx | shift-reg 000000 | rs | N/A | N/A | N/A | 001000 | jr 000000 | rs | N/A | rd | N/A | 001001 | jalr 000000 | <-----comment20bit------> | 00110x | sys/brk 000000 | N/A | N/A | rd | N/A | 0100x0 | mfhi/mflo 000000 | rs | N/A | N/A | N/A | 0100x1 | mthi/mtlo 000000 | rs | rt | N/A | N/A | 0110xx | mul/div 000000 | rs | rt | rd | N/A | 10xxxx | alu-reg 000001 | rs | 00000| <--immediate16bit--> | bltz 000001 | rs | 00001| <--immediate16bit--> | bgez 000001 | rs | 10000| <--immediate16bit--> | bltzal 000001 | rs | 10001| <--immediate16bit--> | bgezal 00001x | <---------immediate26bit---------> | j/jal 00010x | rs | rt | <--immediate16bit--> | beq/bne 00011x | rs | N/A | <--immediate16bit--> | blez/bgtz 001xxx | rs | rt | <--immediate16bit--> | alu-imm 001111 | N/A | rt | <--immediate16bit--> | lui-imm 100xxx | rs | rt | <--immediate16bit--> | load rt,[rs+imm] 101xxx | rs | rt | <--immediate16bit--> | store rt,[rs+imm] x1xxxx | <------coprocessor specific------> | coprocessor (see below) ``` #### Coprocessor Opcode/Parameter Encoding ``` 31..26 |25..21|20..16|15..11|10..6 | 5..0 | 6bit | 5bit | 5bit | 5bit | 5bit | 6bit | -------+------+------+------+------+--------+------------ 0100nn |0|0000| rt | rd | N/A | 000000 | MFCn rt,rd_dat ;rt = dat 0100nn |0|0010| rt | rd | N/A | 000000 | CFCn rt,rd_cnt ;rt = cnt 0100nn |0|0100| rt | rd | N/A | 000000 | MTCn rt,rd_dat ;dat = rt 0100nn |0|0110| rt | rd | N/A | 000000 | CTCn rt,rd_cnt ;cnt = rt 0100nn |0|1000|00000 | <--immediate16bit--> | BCnF target ;jump if false 0100nn |0|1000|00001 | <--immediate16bit--> | BCnT target ;jump if true 0100nn |1| <--------immediate25bit--------> | COPn imm25 010000 |1|0000| N/A | N/A | N/A | 000001 | COP0 01h ;=TLBR 010000 |1|0000| N/A | N/A | N/A | 000010 | COP0 02h ;=TLBWI 010000 |1|0000| N/A | N/A | N/A | 000110 | COP0 06h ;=TLBWR 010000 |1|0000| N/A | N/A | N/A | 001000 | COP0 08h ;=TLBP 010000 |1|0000| N/A | N/A | N/A | 010000 | COP0 10h ;=RFE 1100nn | rs | rt | <--immediate16bit--> | LWCn rt_dat,[rs+imm] 1110nn | rs | rt | <--immediate16bit--> | SWCn rt_dat,[rs+imm] ``` #### Illegal Opcodes All opcodes that are marked as "N/A" in the Primary and Secondary opcode tables are causing a Reserved Instruction Exception (excode=0Ah).
The unused operand bits (eg. Bit21-25 for LUI opcode) should be usually zero, but do not necessarily trigger exceptions if set to nonzero values.
## CPU Load/Store Opcodes #### Load instructions ``` movbs rt,[imm+rs] lb rt,imm(rs) rt=[imm+rs] ;byte sign-extended movb rt,[imm+rs] lbu rt,imm(rs) rt=[imm+rs] ;byte zero-extended movhs rt,[imm+rs] lh rt,imm(rs) rt=[imm+rs] ;halfword sign-extended movh rt,[imm+rs] lhu rt,imm(rs) rt=[imm+rs] ;halfword zero-extended mov rt,[imm+rs] lw rt,imm(rs) rt=[imm+rs] ;word ``` Load instructions can read from the data cache (if the data is not in the cache, or if the memory region is uncached, then the CPU gets halted until it has read the data) (however, the PSX doesn't have a data cache).
#### Caution - Load Delay The loaded data is NOT available to the next opcode, ie. the target register isn't updated until the next opcode has completed. So, if the next opcode tries to read from the load destination register, then it would (usually) receive the OLD value of that register (unless an IRQ occurs between the load and next opcode, in that case the load would complete during IRQ handling, and so, the next opcode would receive the NEW value).
#### Store instructions ``` movb [imm+rs],rt sb rt,imm(rs) [imm+rs]=(rt AND FFh) ;store 8bit movh [imm+rs],rt sh rt,imm(rs) [imm+rs]=(rt AND FFFFh) ;store 16bit mov [imm+rs],rt sw rt,imm(rs) [imm+rs]=rt ;store 32bit ``` Store operations are passed to the write-buffer, so they can execute within a single clock cycle (unless the write-buffer was full, in that case the CPU gets halted until there's room in the buffer). But, the PSX doesn't have a writebuffer...?
#### Load/Store Alignment Halfword addresses must be aligned by 2, word addresses must be aligned by 4, trying to access mis-aligned addresses will cause an exception. There's no alignment restriction for bytes.
#### Unaligned Load/Store ``` lwr rt,imm(rs) load right bits of rt from memory (usually imm+0) lwl rt,imm(rs) load left bits of rt from memory (usually imm+3) swr rt,imm(rs) store right bits of rt to memory (usually imm+0) swl rt,imm(rs) store left bits of rt to memory (usually imm+3) ``` There's no delay required between lwl and lwr, so you can use them directly following eachother, eg. to load a word anywhere in memory without regard to alignment:
``` lwl r2,$0003(t0) ;\no delay required between these lwr r2,$0000(t0) ;/(although both access r2) nop ;-requires load delay HERE (before reading from r2) and r2,r2,0ffffh ;-access r2 (eg. reducing it to unaligned 16bit data) ``` #### Unaligned Load/Store (Details) LWR/SWR transfers the right (=lower) bits of Rt, up-to 32bit memory boundary:
``` lwr/swr [N*4+0] transfer whole 32bit of Rt to/from [N*4+0..3] lwr/swr [N*4+1] transfer lower 24bit of Rt to/from [N*4+1..3] lwr/swr [N*4+2] transfer lower 16bit of Rt to/from [N*4+2..3] lwr/swr [N*4+3] transfer lower 8bit of Rt to/from [N*4+3] ``` LWL/SWL transfers the left (=upper) bits of Rt, down-to 32bit memory boundary:
``` lwl/swl [N*4+0] transfer upper 8bit of Rt to/from [N*4+0] lwl/swl [N*4+1] transfer upper 16bit of Rt to/from [N*4+0..1] lwl/swl [N*4+2] transfer upper 24bit of Rt to/from [N*4+0..2] lwl/swl [N*4+3] transfer whole 32bit of Rt to/from [N*4+0..3] ``` The CPU has four separate byte-access signals, so, within a 32bit location, it can transfer all fragments of Rt at once (including for odd 24bit amounts). The transferred data is not zero- or sign-expanded, eg. when transferring 8bit data, the other 24bit of Rt and [mem] will remain intact.
Note: The aligned variant can also misused for blocking memory access on aligned addresses (in that case, if the address is known to be aligned, only one of the opcodes are needed, either LWL or LWR).... Uhhhhhhhm, OR is that NOT allowed... more PROBABLY that doesn't work?
## CPU ALU Opcodes #### arithmetic instructions ``` addt rd,rs,rt add rd,rs,rt rd=rs+rt (with overflow trap) add rd,rs,rt addu rd,rs,rt rd=rs+rt subt rd,rs,rt sub rd,rs,rt rd=rs-rt (with overflow trap) sub rd,rs,rt subu rd,rs,rt rd=rs-rt addt rt,rs,imm addi rt,rs,imm rt=rs+(-8000h..+7FFFh) (with ov.trap) add rt,rs,imm addiu rt,rs,imm rt=rs+(-8000h..+7FFFh) ``` The opcodes "with overflow trap" do trigger an exception (and leave rd unchanged) in case of overflows.
#### comparison instructions ``` setlt slt rd,rs,rt if rs The hardware does NOT generate exceptions on SHL overflows.
#### Multiply/divide ``` smul rs,rt mult rs,rt hi:lo = rs*rt (signed) umul rs,rt multu rs,rt hi:lo = rs*rt (unsigned) sdiv rs,rt div rs,rt lo = rs/rt, hi=rs mod rt (signed) udiv rs,rt divu rs,rt lo = rs/rt, hi=rs mod rt (unsigned) mov rd,hi mfhi rd rd=hi ;move from hi mov rd,lo mflo rd rd=lo ;move from lo mov hi,rs mthi rs hi=rs ;move to hi mov lo,rs mtlo rs lo=rs ;move to lo ``` The mul/div opcodes are starting the multiply/divide operation, starting takes only a single clock cycle, however, trying to read the result from the hi/lo registers while the mul/div operation is busy will halt the CPU until the mul/div has completed. For multiply, the execution time depends on rs (ie. "small\*large" can be much faster than "large\*small").
``` __umul_execution_time_____________________________________________________ Fast (6 cycles) rs = 00000000h..000007FFh Med (9 cycles) rs = 00000800h..000FFFFFh Slow (13 cycles) rs = 00100000h..FFFFFFFFh __smul_execution_time_____________________________________________________ Fast (6 cycles) rs = 00000000h..000007FFh, or rs = FFFFF800h..FFFFFFFFh Med (9 cycles) rs = 00000800h..000FFFFFh, or rs = FFF00000h..FFFFF801h Slow (13 cycles) rs = 00100000h..7FFFFFFFh, or rs = 80000000h..FFF00001h __udiv/sdiv_execution_time________________________________________________ Fixed (36 cycles) no matter of rs and rt values ``` For example, when executing "umul 123h,12345678h" and "mov r1,lo", one can insert up to six (cached) ALU opcodes, or read one value from PSX Main RAM (which has 6 cycle access time) between the "umul" and "mov" opcodes without additional slowdown.
The hardware does NOT generate exceptions on divide overflows, instead, divide errors are returning the following values:
``` Opcode Rs Rd Hi/Remainder Lo/Result udiv 0..FFFFFFFFh 0 --> Rs FFFFFFFFh sdiv 0..+7FFFFFFFh 0 --> Rs -1 sdiv -80000000h..-1 0 --> Rs +1 sdiv -80000000h -1 --> 0 -80000000h ``` For udiv, the result is more or less correct (as close to infinite as possible). For sdiv, the results are total garbage (about farthest away from the desired result as possible).
Note: After accessing the lo/hi registers, there seems to be a strange rule that one should not touch the lo/hi registers in the next 2 cycles or so... not yet understood if/when/how that rule applies...?
## CPU Jump Opcodes #### jumps and branches Note that the instruction following the branch will always be executed.
``` jmp dest j dest pc=(pc and F0000000h)+(imm26bit*4) call dest jal dest pc=(pc and F0000000h)+(imm26bit*4),ra=$+8 jmp rs jr rs pc=rs call rs,ret=rd jalr (rd,)rs(,rd) pc=rs, rd=$+8 ;see caution je rs,rt,dest beq rs,rt,dest if rs=rt then pc=$+4+(-8000h..+7FFFh)*4 jne rs,rt,dest bne rs,rt,dest if rs<>rt then pc=$+4+(-8000h..+7FFFh)*4 js rs,dest bltz rs,dest if rs<0 then pc=$+4+(-8000h..+7FFFh)*4 jns rs,dest bgez rs,dest if rs>=0 then pc=$+4+(-8000h..+7FFFh)*4 jgtz rs,dest bgtz rs,dest if rs>0 then pc=$+4+(-8000h..+7FFFh)*4 jlez rs,dest blez rs,dest if rs<=0 then pc=$+4+(-8000h..+7FFFh)*4 calls rs,dest bltzal rs,dest if rs<0 then pc=$+4+(..)*4, ra=$+8 callns rs,dest bgezal rs,dest if rs>=0 then pc=$+4+(..)*4, ra=$+8 ``` #### JALR cautions Caution: The JALR source code syntax varies (IDT79R3041 specs say "jalr rs,rd", but MIPS32 specs say "jalr rd,rs"). Moreover, JALR may not use the same register for both operands (eg. "jalr r31,r31") (doing so would destroy the target address; which is normally no problem, but it can be a problem if an IRQ occurs between the JALR opcode and the following branch delay opcode; in that case BD gets set, and EPC points "back" to the JALR opcode, so JALR is executed twice, with destroyed target address in second execution).
#### exception opcodes Unlike for jump/branch opcodes, exception opcodes are immediately executed (ie. without executing the following opcode).
``` syscall imm20 generates a system call exception break imm20 generates a breakpoint exception ``` The 20bit immediate doesn't affect the CPU (however, the exception handler may interprete it by software; by examing the opcode bits at [epc-4]).
## CPU Coprocessor Opcodes #### Coprocessor Instructions (COP0..COP3) ``` mov rt,cop#Rd(0-31) mfc# rt,rd ;rt = cop#datRd ;data regs mov rt,cop#Rd(32-63) cfc# rt,rd ;rt = cop#cntRd ;control regs mov cop#Rd(0-31),rt mtc# rt,rd ;cop#datRd = rt ;data regs mov cop#Rd(32-63),rt ctc# rt,rd ;cop#cntRd = rt ;control regs mov cop#cmd,imm25 cop# imm25 ;exec cop# command 0..1FFFFFFh mov cop#Rt(0-31),[rs+imm] lwc# rt,imm(rs) ;cop#dat_rt = [rs+imm] ;word mov [rs+imm],cop#Rt(0-31) swc# rt,imm(rs) ;[rs+imm] = cop#dat_rt ;word jf cop#flg,dest bc#f dest ;if cop#flg=false then pc=$+disp jt cop#flg,dest bc#t dest ;if cop#flg=true then pc=$+disp rfe rfe ;return from exception (COP0) tlb tlb ;virtual memory related (COP0) ``` Unknown if any tlb-opcodes (tlbr,tlbwi,tlbwr,tlbp) are implemented in the psx?
#### Caution - Load Delay When reading from a coprocessor register, the next opcode cannot use the destination register as operand (much the same as the Load Delays that occur when reading from memory; see there for details).
Reportedly, the Load Delay applies for the next TWO opcodes after coprocessor reads, but, that seems to be nonsense (the PSX does finish both COP0 and COP2 reads after ONE opcode).
#### Caution - Store Delay In some cases, a similar delay occurs when writing to a coprocessor register. COP0 is more or less free of store delays (eg. one can read from a cop0 register immediately after writing to it), the only known exception is the cop2 enable bit in cop0r12.bit30 (setting that cop0 bit acts delayed, and cop2 isn't actually enabled until after 2 clock cycles or so).
Writing to cop2 registers has a delay of 2..3 clock cycles. In most cases, that is probably (?) only 2 cycles, but special cases like writing to IRGB (which does additionally affect IR1,IR2,IR3) take 3 cycles until the result arrives in all registers).
Note that Store Delays are counted in numbers of clock cycles (not in numbers of opcodes). For 3 cycle delay, one must usually insert 3 cached opcodes (or one uncached opcode).
## CPU Pseudo Opcodes #### Pseudo instructions (native/spasm) ``` nop ;alias for sll r0,r0,0 move rd,rs ;alias for addu rd,rs,r0 la rx,imm32 ;load address (alias for lui rx / addiu rx) li rx,imm32 ;load immediate (alias for lui rx / ori rx) li rx,imm16 ;load immediate (alias for ori, range 0..FFFFh) li rx,-imm15 ;load immediate (alias for addiu, range -1..-8000h) li rx,imm16*10000h ;load immediate (alias for lui) lw rx,imm32 ;load from address (lui rx / lw rx,rx) sw rx,imm32 ;store to address (lui r1 / sw rx,r1) (destroys r1!) lb,lh,lwl,lwr,lbu,lhu;as above pseudo lw sb,sh,swl,swr ;as above pseudo sw (ie. also destroys r1!) alu rx,op ;alias for alu rx,rx,op alu(u) rx,rx,imm ;alias for alui(u) rx,rx,imm jalr rx ;alias for jalr (RA,)rx(,RA) subi(u) rt,rs,imm ;alias for addi(u) rt,rs,-imm beqz rx,dest ;alias for beq rx,r0,dest bnez rx,dest ;alias for bne rx,r0,dest b dest ;alias for beq r0,r0,dest (jump relative/spasm) bra dest ;alias for ...? (jump relative/gnu) bal dest ;alias for ...? (call relative/spasm) ``` #### Pseudo instructions (nocash/a22i) ``` mov rx,NNNN0000h ;alias for lui rx,NNNNh mov rx,0000NNNNh ;alias for or rx,r0,NNNNh ;max +FFFFh mov rx,-imm15 ;alias for add rx,r0,-NNNNh ;min -8000h mov rx,ry ;alias for or rx,ry,0 (or "addiu") nop ;alias for shl r0,r0,0 jrel dest ;alias for blez R0,dest ;relative jump crel dest ;alias for callns R0,dest ;relative call jz rx,dest ;alias for je rx,R0,dest jnz rx,dest ;alias for jne rx,R0,dest call rx ;alias for call rx,ret=RA ret ;alias for jmp ra subt rt,rs,imm ;alias for addt rt,rs,-imm sub rt,rs,imm ;alias for add rt,rs,-imm alu rx,op ;alias for alu rx,rx,op neg(t) rx,ry ;alias for sub(t) rx,R0,ry not rx,ry ;alias for nor rx,R0,ry neg(t)/not rx ;alias for neg(t)/not rx,rx setz rx,ry ;alias for setb rx,ry,1 (set if zero) setnz rx,ry ;alias for setb rx,R0,ry (set if nonzero) syscall/break ;alias for syscall/break 000000h ``` Below are pseudo instructions combined of two 32bit opcodes...
``` movp rx,imm32 ;alias for lui rx,imm16 -plus- or rx,rx,imm16) mov(bhs)p rx,[imm32] ;load from address (lui rx,imm16 / mov rx,[rx+imm16]) movu [rs+imm] ;alias for lwr/swr [rs+imm] plus lwl/swl [rs+imm+3] reti ;alias for jmp k0 plus rfe ``` Below are pseudo instructions combined of two or more 32bit opcodes...
``` push rlist ;alias for sub sp,n*4 -- mov [sp+(1..n)*4],r1..rn pop rlist ;alias for mov r1..rn,[sp+(1..n)*4] -- add sp,n*4 pop pc,rlist ;alias for pop ra,rlist -- jmp ra ``` #### Possible more Pseudos... ``` call x0000000h ;call y0000000h (could be half-working for mem mirrors?) setae,setge ;--> setb,setlt with swapped operands ``` #### Directives (nocash) ``` .mips ;select MIPS instruction set (alternately .hc05 for MC68HC05) .bios ;create a .ROM file (instead of .EXE) .auto_nop ;append NOPs to jumps ;unless next opcode starts with a + org imm ;assume following code to be originated at address "imm" db n(,n(..))) ;define 8bit data values(s) or quoted ASCII strings dw n(,n(..))) ;define 16bit data values(s) (not 32bit data!) dd n(,n(..))) ;define 32bit data values(s) .align imm 0 ;alias for immediate 0 and register R0 (whichever fits) ``` #### Directives (native) ``` org imm ;self-explaining (but, default=$80010000 for spasm!) align imm ;self-explaining (probably zeropadded?) db n(,n(..))) ;define 8bit data values(s) or quoted ASCII strings dh n(,n(..))) ;define 16bit data values(s) dw n(,n(..))) ;define 32bit data values(s) (not 16bit data!) dcb len,value ;fill bytes by (different as DCB on ARM CPUs) xyz ;define label "xyz" at current address (without colon) xyz equ n ;assign value n to xyz xyz = n ;probably same/sililar as "equ" ;xyz ;comments invoked with semicolon (spasm) incbin file.bin ;import binary file include file.asm ;import asm file zero ;alias for r0 >imm32 ;alias for (i-(i AND 8000h))/10000h, and/or i/10000h ? It uses "#" instead of ";" for comments.
It uses ":" for labels (fortunately).
The assembler has at least one directive: ".byte" (equivalent to "db" on other assemblers).
I've no clue which assembler is used for that syntax... could that be the Psy-Q assembler?
## COP0 - Register Summary #### COP0 Register Summary ``` cop0r0-r2 - N/A cop0r3 - BPC - Breakpoint on execute (R/W) cop0r4 - N/A cop0r5 - BDA - Breakpoint on data access (R/W) cop0r6 - JUMPDEST - Randomly memorized jump address (R) cop0r7 - DCIC - Breakpoint control (R/W) cop0r8 - BadVaddr - Bad Virtual Address (R) cop0r9 - BDAM - Data Access breakpoint mask (R/W) cop0r10 - N/A cop0r11 - BPCM - Execute breakpoint mask (R/W) cop0r12 - SR - System status register (R/W) cop0r13 - CAUSE - (R) Describes the most recently recognised exception cop0r14 - EPC - Return Address from Trap (R) cop0r15 - PRID - Processor ID (R) cop0r16-r31 - Garbage cop0r32-r63 - N/A - None such (Control regs) ``` ## COP0 - Exception Handling #### cop0r13 - CAUSE - (Read-only, except, Bit8-9 are R/W) Describes the most recently recognised exception
``` 0-1 - Not used (zero) 2-6 Excode Describes what kind of exception occured: 00h INT Interrupt 01h MOD Tlb modification (none such in PSX) 02h TLBL Tlb load (none such in PSX) 03h TLBS Tlb store (none such in PSX) 04h AdEL Address error, Data load or Instruction fetch 05h AdES Address error, Data store The address errors occur when attempting to read outside of KUseg in user mode and when the address is misaligned. (See also: BadVaddr register) 06h IBE Bus error on Instruction fetch 07h DBE Bus error on Data load/store 08h Syscall Generated unconditionally by syscall instruction 09h BP Breakpoint - break instruction 0Ah RI Reserved instruction 0Bh CpU Coprocessor unusable 0Ch Ov Arithmetic overflow 0Dh-1Fh Not used 7 - Not used (zero) 8-15 Ip Interrupt pending field. Bit 8 and 9 are R/W, and contain the last value written to them. As long as any of the bits are set they will cause an interrupt if the corresponding bit is set in IM. 16-27 - Not used (zero) 28-29 CE Contains the coprocessor number if the exception occurred because of a coprocessor instuction for a coprocessor which wasn't enabled in SR. 30 - Not used (zero) 31 BD Is set when last exception points to the branch instuction instead of the instruction in the branch delay slot, where the exception occurred. ``` #### cop0r12 - SR - System status register (R/W) ``` 0 IEc Current Interrupt Enable (0=Disable, 1=Enable) ;rfe pops IUp here 1 KUc Current Kernal/User Mode (0=Kernel, 1=User) ;rfe pops KUp here 2 IEp Previous Interrupt Disable ;rfe pops IUo here 3 KUp Previous Kernal/User Mode ;rfe pops KUo here 4 IEo Old Interrupt Disable ;left unchanged by rfe 5 KUo Old Kernal/User Mode ;left unchanged by rfe 6-7 - Not used (zero) 8-15 Im 8 bit interrupt mask fields. When set the corresponding interrupts are allowed to cause an exception. 16 Isc Isolate Cache (0=No, 1=Isolate) When isolated, all load and store operations are targetted to the Data cache, and never the main memory. (Used by PSX Kernel, in combination with Port FFFE0130h) 17 Swc Swapped cache mode (0=Normal, 1=Swapped) Instruction cache will act as Data cache and vice versa. Use only with Isc to access & invalidate Instr. cache entries. (Not used by PSX Kernel) 18 PZ When set cache parity bits are written as 0. 19 CM Shows the result of the last load operation with the D-cache isolated. It gets set if the cache really contained data for the addressed memory location. 20 PE Cache parity error (Does not cause exception) 21 TS TLB shutdown. Gets set if a programm address simultaneously matches 2 TLB entries. (initial value on reset allows to detect extended CPU version?) 22 BEV Boot exception vectors in RAM/ROM (0=RAM/KSEG0, 1=ROM/KSEG1) 23-24 - Not used (zero) 25 RE Reverse endianness (0=Normal endianness, 1=Reverse endianness) Reverses the byte order in which data is stored in memory. (lo-hi -> hi-lo) (Has affect only to User mode, not to Kernel mode) (?) (The bit doesn't exist in PSX ?) 26-27 - Not used (zero) 28 CU0 COP0 Enable (0=Enable only in Kernal Mode, 1=Kernal and User Mode) 29 CU1 COP1 Enable (0=Disable, 1=Enable) (none such in PSX) 30 CU2 COP2 Enable (0=Disable, 1=Enable) (GTE in PSX) 31 CU3 COP3 Enable (0=Disable, 1=Enable) (none such in PSX) ``` #### cop0r14 - EPC - Return Address from Trap (R) ``` 0-31 Return Address from Exception ``` This address is the instruction at which the exception took place, unless BD is set in CAUSE, then the instruction was at EPC+4.
Interrupts should always return to EPC+0, no matter of the BD flag. That way, if BD=1, the branch gets executed again, that's required because EPC stores only the current program counter, but not additionally the branch destination address.
Other exceptions may require to handle BD. In simple cases, when BD=0, the exception handler may return to EPC+0 (retry execution of the opcode), or to EPC+4 (skip the opcode that caused the exception). Note that jumps to faulty memory locations are executed without exception, but will trigger address errors and bus errors at the target location, ie. EPC (and BadVAddr, in case of address errors) point to the faulty address, not to the opcode that has jumped to that address).
#### Interrupts vs GTE Commands If an interrupt occurs "on" a GTE command (cop2cmd), then the GTE command is executed, but nethertheless, the return address in EPC points to the GTE command. So, if the exeception handler would return to EPC as usually, then the GTE command would be executed twice. In best case, this would be a waste of clock cycles, in worst case it may lead to faulty result (if the results from the 1st execution are re-used as incoming parameters in the 2nd execution). To fix the problem, the exception handler must do:
``` if (cause AND 7Ch)=00h ;if excode=interrupt if ([epc] AND FE000000h)=4A000000h ;and opcode=cop2cmd epc=epc+4 ;then skip that opcode ``` Note: The above exception handling is working only in newer PSX BIOSes, but in very old PSX BIOSes, it is only incompletely implemented (see "BIOS Patches" chapter for common workarounds; or write your own exception handler without using the BIOS).
Of course, the above exeption handling won't work in branch delays (where BD gets set to indicate that EPC was modified) (best workaround is not to use GTE commands in branch delays).
#### cop0cmd=10h - RFE opcode - Prepare Return from Exception The RFE opcode moves some bits in cop0r12 (SR): bit2-3 are copied to bit0-1, and bit4-5 are copied to bit2-3, all other bits (including bit4-5) are left unchanged.
The RFE opcode does NOT automatically jump to EPC. Instead, the exception handler must copy EPC into a register (usually R26 aka K0), and then jump to that address. Because of branch delays, that would look like so:
``` mov k0,epc ;get return address push k0 ;save epc in memory (if you expect nested exceptions) ... ;whatever (ie. process CAUSE) pop k0 ;restore from memory (if you expect nested exceptions) jmp k0 ;jump to K0 (after executing the next opcode) +rfe ;move SR bit4/5 --> bit2/3 --> bit0/1 ``` If you expect exceptions to be nested deeply, also push/pop SR. Note that there's no way to leave all registers intact (ie. above code destroys K0).
#### cop0r8 - BadVaddr - Bad Virtual Address (R) Contains the address whose reference caused an exception. Set on any MMU type of exceptions, on references outside of kuseg (in User mode) and on any misaligned reference. BadVaddr is updated ONLY by Address errors (Excode 04h and 05h), all other exceptions (including bus errors) leave BadVaddr unchanged.
#### Exception Vectors (depending on BEV bit in SR register) ``` Exception BEV=0 BEV=1 Reset BFC00000h BFC00000h (Reset) UTLB Miss 80000000h BFC00100h (Virtual memory, none such in PSX) COP0 Break 80000040h BFC00140h (Debug Break) General 80000080h BFC00180h (General Interrupts & Exceptions) ``` Note: Changing vectors at 800000xxh (kseg0) seems to be automatically reflected to the instruction cache without needing to flush cache (at least it worked SOMETIMES in my test proggy... but NOT always? ...anyways, it'd be highly recommended to flush cache when changing any opcodes), whilst changing mirrors at 000000xxh (kuseg) seems to require to flush cache.
The PSX uses only the BEV=0 vectors (aside from the reset vector, the PSX BIOS ROM doesn't contain any of the BEV=1 vectors).
#### Exception Priority ``` Reset At any time (highest) ;-reset AdEL Memory (Load instruction) ;\ AdES Memory (Store instruction) ; memory (data load/store) DBE Memory (Load or store) ;/ MOD ALU (Data TLB) ;\ TLBL ALU (DTLB Miss) ; none such TLBS ALU (DTLB Miss) ;/ Ovf ALU ;-overflow Int ALU ;-interrupt Sys RD (Instruction Decode) ;\ Bp RD (Instruction Decode) ; RI RD (Instruction Decode) ; CpU RD (Instruction Decode) ;/ TLBL I-Fetch (ITLB Miss) ;-none such AdEL IVA (Instruction Virtual Address) ;\memory (opcode fetch) IBE RD (end of I-Fetch, lowest) ;/ ``` ## COP0 - Misc #### cop0r15 - PRID - Processor ID (R) ``` 0-7 Revision 8-15 Implementation 16-31 Not used ``` For a Playstation with CXD8606CQ CPU, the PRID value is 00000002h.
Unknown if/which other Playstation CPU versions have other values...?
#### cop0r6 - JUMPDEST - Randomly memorized jump address (R) The is a rather strange totally useless register. After certain exceptions, the CPU does memorize a jump destination address in the register. Once when it has memorized an address, the register becomes locked, and the memorized value won't change until it becomes unlocked by a new exception. Exceptions that do unlock the register are Reset and Interrupts (cause.bit10). Exceptions that do NOT unlock the register are syscall/break opcodes, and software generated interrupts (eg. cause.bit8).
In the unlocked state, the CPU does more or less randomly memorize one of the next some jump destinations - eg. the destination from the second jump after reset, or from a jump that occured immediately before executing the IRQ handler, or from a jump inside of the IRQ handler, or even from a later jump that occurs shortly after returning from the IRQ handler.
The register seems to be read-only (although the Kernel initialization code writes 0 to it for whatever reason).
#### cop0r0..r2, cop0r4, cop0r10, cop0r32..r63 - N/A Registers 0,1,2,4,10 control virtual memory on some MIPS processors (but there's none such in the PSX), and Registers 32..63 (aka "control registers") aren't used in any MIPS processors. Trying to read any of these registers causes a Reserved Instruction Exception (excode=0Ah).
#### cop0cmd=01h,02h,06h,08h - TLBR,TLBWI,TLBWR,TLBP The PSX supports only one cop0cmd (cop0cmd=10h aka RFE). Trying to execute the TLBxx opcodes causes a Reserved Instruction Exception (excode=0Ah).
#### jf/jt cop0flg,dest - conditional cop0 jumps #### mov [mem],cop0reg / mov cop0reg,[mem] - coprocessor cop0 load/store Not supported by the CPU. Trying to execute these opcodes causes a Coprocessor Unusable Exception (excode=0Bh, ie. unlike above, not 0Ah).
#### cop0r16-r31 - Garbage Trying to read these registers returns garbage (but does not trigger an exception). When reading one of the garbage registers shortly after reading a valid cop0 register, the garbage value is usually the same as that of the valid register. When doing the read later on, the return value is usually 00000020h, or when reading much later it returns 00000040h, or even 00000100h. No idea what is causing that effect...?
Note: The garbage registers can be accessed (without causing an exception) even in "User mode with cop0 disabled" (SR.Bit1=1 and SR.Bit28=0); accessing any other existing cop0 registers (or executing the rfe opcode) in that state is causing Coprocessor Unusable Exceptions (excode=0Bh).
## COP0 - Debug Registers The COP0 debug registers seem to be PSX specific, normal R30xx CPUs like IDT's R3041 and R3051 don't have anything similar.
#### cop0r7 - DCIC - Breakpoint control (R/W) ``` 0 Automatically set by hardware upon Any break (R/W) 1 Automatically set by hardware upon BPC Code break (R/W) 2 Automatically set by hardware upon BDA Data break (R/W) 3 Automatically set by hardware upon BDA Data-Read break (R/W) 4 Automatically set by hardware upon BDA Data-Write break (R/W) 5 Automatically set by hardware upon any-jump break (R/W) 6-11 Not used (always zero) 12-13 Jump Redirection (0=Disable, 1..3=Enable) (see note) (R/W) 14-15 Unknown? (R/W) 16-22 Not used (always zero) 23 Super-Master Enable 1 for bit24-29 24 Execution breakpoint (0=Disabled, 1=Enabled) (see BPC, BPCM) 25 Data access breakpoint (0=Disabled, 1=Enabled) (see BDA, BDAM) 26 Break on Data-Read (0=No, 1=Break/when Bit25=1) 27 Break on Data-Write (0=No, 1=Break/when Bit25=1) 28 Break on any-jump (0=No, 1=Break on branch/jump/call/etc.) 29 Master Enable for bit28 (..and/or exec-break at address>=80000000h?) 30 Master Enable for bit24-27 31 Super-Master Enable 2 for bit24-29 ``` When a breakpoint address match occurs the PSX jumps to 80000040h (ie. unlike normal exceptions, not to 80000080h). The Excode value in the CAUSE register is set to 09h (same as BREAK opcode), and EPC contains the return address, as usually. One of the first things to be done in the exception handler is to disable breakpoints (eg. if the any-jump break is enabled, then it must be disabled BEFORE jumping from 80000040h to the actual exception handler).
#### cop0r7.bit12-13 - Jump Redirection Note If one or both of these bits are nonzero, then the PSX seems to check for the following opcode sequence,
``` mov rx,[mem] ;load rx from memory ... ;one or more opcodes that do not change rx jmp/call rx ;jump or call to rx ``` if it does sense that sequence, then it sets PC=[00000000h], but does not store any useful information in any cop0 registers, namely it does not store the return address in EPC, so it's impossible to determine which opcode has caused the exception. For the jump target address, there are 31 registers, so one could only guess which of them contains the target value; for "POP PC" code it'd be usually R31, but for "JMP [vector]" code it may be any register. So far the feature seems to be more or less unusable...?
#### cop0r5 - BDA - Breakpoint on Data Access Address (R/W) #### cop0r9 - BDAM - Breakpoint on Data Access Mask (R/W) Break condition is "((addr XOR BDA) AND BDAM)=0".
#### cop0r3 - BPC - Breakpoint on Execute Address (R/W) #### cop0r11 - BPCM - Breakpoint on Execute Mask (R/W) Break condition is "((PC XOR BPC) AND BPCM)=0".
#### Note (BREAK Opcode) Additionally, the BREAK opcode can be used to create further breakpoints by patching the executable code. The BREAK opcode uses the same Excode value (09h) in CAUSE register. However, the BREAK opcode jumps to the normal exception handler at 80000080h (not 80000040h).
#### Note (LibCrypt) The debug registers are mis-used by "Legacy of Kain: Soul Reaver" (and maybe also other games) for storing libcrypt copy-protection related values (ie. just as a "hidden" location for storing data, not for actual debugging purposes).
[CDROM Protection - LibCrypt](cdromdrive.md#cdrom-protection---libcrypt)
#### Note (Cheat Devices/Expansion ROMs) The Expansion ROM header supports only Pre-Boot and Post-Boot vectors, but no Mid-Boot vector. Cheat Devices are often using COP0 breaks for Mid-Boot Hooks, either with BPC=BFC06xxxh (break address in ROM, used in older cheat firmwares), or with BPC=80030000h (break address in RAM aka relocated GUI entrypoint, used in later cheat firmwares). Moreover, aside from the Mid-Boot Hook, the Xplorer cheat device is also supporting a special cheat code that uses the COP0 break feature.
#### Note (Datasheet) Note: COP0 debug registers are described in LSI's "L64360" datasheet, chapter 14. And in their LR33300/LR33310 datasheet, chapter 4.