psx-spx.github.io/cpuspecifications.md
Nicolas 'Pixel' Noble 9c394320d4 Deformat a bit.
2020-07-23 13:41:06 -07:00

868 lines
42 KiB
Markdown

# CPU Specifications
#### CPU
[CPU Registers](cpuspecifications.md#cpu-registers)<br/>
[CPU Opcode Encoding](cpuspecifications.md#cpu-opcode-encoding)<br/>
[CPU Load/Store Opcodes](cpuspecifications.md#cpu-loadstore-opcodes)<br/>
[CPU ALU Opcodes](cpuspecifications.md#cpu-alu-opcodes)<br/>
[CPU Jump Opcodes](cpuspecifications.md#cpu-jump-opcodes)<br/>
[CPU Coprocessor Opcodes](cpuspecifications.md#cpu-coprocessor-opcodes)<br/>
[CPU Pseudo Opcodes](cpuspecifications.md#cpu-pseudo-opcodes)<br/>
#### System Control Coprocessor (COP0)
[COP0 - Register Summary](cpuspecifications.md#cop0---register-summary)<br/>
[COP0 - Exception Handling](cpuspecifications.md#cop0---exception-handling)<br/>
[COP0 - Misc](cpuspecifications.md#cop0---misc)<br/>
[COP0 - Debug Registers](cpuspecifications.md#cop0---debug-registers)<br/>
## CPU Registers
All registers are 32bit wide.<br/>
```
Name Alias Common Usage
(R0) zero Constant (always 0) (this one isn't a real register)
R1 at Assembler temporary (destroyed by some pseudo opcodes!)
R2-R3 v0-v1 Subroutine return values, may be changed by subroutines
R4-R7 a0-a3 Subroutine arguments, may be changed by subroutines
R8-R15 t0-t7 Temporaries, may be changed by subroutines
R16-R23 s0-s7 Static variables, must be saved by subs
R24-R25 t8-t9 Temporaries, may be changed by subroutines
R26-R27 k0-k1 Reserved for kernel (destroyed by some IRQ handlers!)
R28 gp Global pointer (rarely used)
R29 sp Stack pointer
R30 fp(s8) Frame Pointer, or 9th Static variable, must be saved
R31 ra Return address (used so by JAL,BLTZAL,BGEZAL opcodes)
- pc Program counter
- hi,lo Multiply/divide results, may be changed by subroutines
```
R0 is always zero.<br/>
R31 can be used as general purpose register, however, some opcodes are using it
to store the return address: JAL, BLTZAL, BGEZAL. (Note: JALR can optionally
store the return address in R31, or in R1..R30. Exceptions store the return
address in cop0r14 - EPC).<br/>
#### R29 (SP) - Full Decrementing Wasted Stack Pointer
The CPU doesn't explicitly have stack-related registers or opcodes, however,
conventionally, R29 is used as stack pointer (SP). The stack can be accessed
with normal load/store opcodes, which do not automatically increase/decrease
SP, so the SP register must be manually modified to (de-)allocate data.<br/>
The PSX BIOS is using "Full Decrementing Wasted Stack".<br/>
Decrementing means that SP gets decremented when allocating data (that's common
for most CPUs) - Full means that SP points to the first ALLOCATED word on the
stack, so the allocated memory is at SP+0 and above, free memory at SP-1 and
below, Wasted means that when calling a sub-function with N parameters, then
the caller must pre-allocate N works on stack, and the sub-function may freely
use and destroy these words; at [SP+0..N\*4-1].<br/>
For example, "push ra,r16,r17" would be implemented as:<br/>
```
sub sp,20h
mov [sp+14h],ra
mov [sp+18h],r16
mov [sp+1Ch],r17
```
where the allocated 20h bytes have the following purpose:<br/>
```
[sp+00h..0Fh] wasted stack (may, or may not, be used by sub-functions)
[sp+10h..13h] 8-byte alignment padding (not used)
[sp+14h..1Fh] pushed registers
```
## CPU Opcode Encoding
#### Primary opcode field (Bit 26..31)
```
00h=SPECIAL 08h=ADDI 10h=COP0 18h=N/A 20h=LB 28h=SB 30h=LWC0 38h=SWC0
01h=BcondZ 09h=ADDIU 11h=COP1 19h=N/A 21h=LH 29h=SH 31h=LWC1 39h=SWC1
02h=J 0Ah=SLTI 12h=COP2 1Ah=N/A 22h=LWL 2Ah=SWL 32h=LWC2 3Ah=SWC2
03h=JAL 0Bh=SLTIU 13h=COP3 1Bh=N/A 23h=LW 2Bh=SW 33h=LWC3 3Bh=SWC3
04h=BEQ 0Ch=ANDI 14h=N/A 1Ch=N/A 24h=LBU 2Ch=N/A 34h=N/A 3Ch=N/A
05h=BNE 0Dh=ORI 15h=N/A 1Dh=N/A 25h=LHU 2Dh=N/A 35h=N/A 3Dh=N/A
06h=BLEZ 0Eh=XORI 16h=N/A 1Eh=N/A 26h=LWR 2Eh=SWR 36h=N/A 3Eh=N/A
07h=BGTZ 0Fh=LUI 17h=N/A 1Fh=N/A 27h=N/A 2Fh=N/A 37h=N/A 3Fh=N/A
```
#### Secondary opcode field (Bit 0..5) (when Primary opcode = 00h)
```
00h=SLL 08h=JR 10h=MFHI 18h=MULT 20h=ADD 28h=N/A 30h=N/A 38h=N/A
01h=N/A 09h=JALR 11h=MTHI 19h=MULTU 21h=ADDU 29h=N/A 31h=N/A 39h=N/A
02h=SRL 0Ah=N/A 12h=MFLO 1Ah=DIV 22h=SUB 2Ah=SLT 32h=N/A 3Ah=N/A
03h=SRA 0Bh=N/A 13h=MTLO 1Bh=DIVU 23h=SUBU 2Bh=SLTU 33h=N/A 3Bh=N/A
04h=SLLV 0Ch=SYSCALL 14h=N/A 1Ch=N/A 24h=AND 2Ch=N/A 34h=N/A 3Ch=N/A
05h=N/A 0Dh=BREAK 15h=N/A 1Dh=N/A 25h=OR 2Dh=N/A 35h=N/A 3Dh=N/A
06h=SRLV 0Eh=N/A 16h=N/A 1Eh=N/A 26h=XOR 2Eh=N/A 36h=N/A 3Eh=N/A
07h=SRAV 0Fh=N/A 17h=N/A 1Fh=N/A 27h=NOR 2Fh=N/A 37h=N/A 3Fh=N/A
```
#### Opcode/Parameter Encoding
```
31..26 |25..21|20..16|15..11|10..6 | 5..0 |
6bit | 5bit | 5bit | 5bit | 5bit | 6bit |
-------+------+------+------+------+--------+------------
000000 | N/A | rt | rd | imm5 | 0000xx | shift-imm
000000 | rs | rt | rd | N/A | 0001xx | shift-reg
000000 | rs | N/A | N/A | N/A | 001000 | jr
000000 | rs | N/A | rd | N/A | 001001 | jalr
000000 | <-----comment20bit------> | 00110x | sys/brk
000000 | N/A | N/A | rd | N/A | 0100x0 | mfhi/mflo
000000 | rs | N/A | N/A | N/A | 0100x1 | mthi/mtlo
000000 | rs | rt | N/A | N/A | 0110xx | mul/div
000000 | rs | rt | rd | N/A | 10xxxx | alu-reg
000001 | rs | 00000| <--immediate16bit--> | bltz
000001 | rs | 00001| <--immediate16bit--> | bgez
000001 | rs | 10000| <--immediate16bit--> | bltzal
000001 | rs | 10001| <--immediate16bit--> | bgezal
00001x | <---------immediate26bit---------> | j/jal
00010x | rs | rt | <--immediate16bit--> | beq/bne
00011x | rs | N/A | <--immediate16bit--> | blez/bgtz
001xxx | rs | rt | <--immediate16bit--> | alu-imm
001111 | N/A | rt | <--immediate16bit--> | lui-imm
100xxx | rs | rt | <--immediate16bit--> | load rt,[rs+imm]
101xxx | rs | rt | <--immediate16bit--> | store rt,[rs+imm]
x1xxxx | <------coprocessor specific------> | coprocessor (see below)
```
#### Coprocessor Opcode/Parameter Encoding
```
31..26 |25..21|20..16|15..11|10..6 | 5..0 |
6bit | 5bit | 5bit | 5bit | 5bit | 6bit |
-------+------+------+------+------+--------+------------
0100nn |0|0000| rt | rd | N/A | 000000 | MFCn rt,rd_dat ;rt = dat
0100nn |0|0010| rt | rd | N/A | 000000 | CFCn rt,rd_cnt ;rt = cnt
0100nn |0|0100| rt | rd | N/A | 000000 | MTCn rt,rd_dat ;dat = rt
0100nn |0|0110| rt | rd | N/A | 000000 | CTCn rt,rd_cnt ;cnt = rt
0100nn |0|1000|00000 | <--immediate16bit--> | BCnF target ;jump if false
0100nn |0|1000|00001 | <--immediate16bit--> | BCnT target ;jump if true
0100nn |1| <--------immediate25bit--------> | COPn imm25
010000 |1|0000| N/A | N/A | N/A | 000001 | COP0 01h ;=TLBR
010000 |1|0000| N/A | N/A | N/A | 000010 | COP0 02h ;=TLBWI
010000 |1|0000| N/A | N/A | N/A | 000110 | COP0 06h ;=TLBWR
010000 |1|0000| N/A | N/A | N/A | 001000 | COP0 08h ;=TLBP
010000 |1|0000| N/A | N/A | N/A | 010000 | COP0 10h ;=RFE
1100nn | rs | rt | <--immediate16bit--> | LWCn rt_dat,[rs+imm]
1110nn | rs | rt | <--immediate16bit--> | SWCn rt_dat,[rs+imm]
```
#### Illegal Opcodes
All opcodes that are marked as "N/A" in the Primary and Secondary opcode tables
are causing a Reserved Instruction Exception (excode=0Ah).<br/>
The unused operand bits (eg. Bit21-25 for LUI opcode) should be usually zero,
but do not necessarily trigger exceptions if set to nonzero values.<br/>
## CPU Load/Store Opcodes
#### Load instructions
```
movbs rt,[imm+rs] lb rt,imm(rs) rt=[imm+rs] ;byte sign-extended
movb rt,[imm+rs] lbu rt,imm(rs) rt=[imm+rs] ;byte zero-extended
movhs rt,[imm+rs] lh rt,imm(rs) rt=[imm+rs] ;halfword sign-extended
movh rt,[imm+rs] lhu rt,imm(rs) rt=[imm+rs] ;halfword zero-extended
mov rt,[imm+rs] lw rt,imm(rs) rt=[imm+rs] ;word
```
Load instructions can read from the data cache (if the data is not in the
cache, or if the memory region is uncached, then the CPU gets halted until it
has read the data) (however, the PSX doesn't have a data cache).<br/>
#### Caution - Load Delay
The loaded data is NOT available to the next opcode, ie. the target register
isn't updated until the next opcode has completed. So, if the next opcode tries
to read from the load destination register, then it would (usually) receive the
OLD value of that register (unless an IRQ occurs between the load and next
opcode, in that case the load would complete during IRQ handling, and so, the
next opcode would receive the NEW value).<br/>
#### Store instructions
```
movb [imm+rs],rt sb rt,imm(rs) [imm+rs]=(rt AND FFh) ;store 8bit
movh [imm+rs],rt sh rt,imm(rs) [imm+rs]=(rt AND FFFFh) ;store 16bit
mov [imm+rs],rt sw rt,imm(rs) [imm+rs]=rt ;store 32bit
```
Store operations are passed to the write-buffer, so they can execute within a
single clock cycle (unless the write-buffer was full, in that case the CPU gets
halted until there's room in the buffer). But, the PSX doesn't have a
writebuffer...?<br/>
#### Load/Store Alignment
Halfword addresses must be aligned by 2, word addresses must be aligned by 4,
trying to access mis-aligned addresses will cause an exception. There's no
alignment restriction for bytes.<br/>
#### Unaligned Load/Store
```
lwr rt,imm(rs) load right bits of rt from memory (usually imm+0)
lwl rt,imm(rs) load left bits of rt from memory (usually imm+3)
swr rt,imm(rs) store right bits of rt to memory (usually imm+0)
swl rt,imm(rs) store left bits of rt to memory (usually imm+3)
```
There's no delay required between lwl and lwr, so you can use them directly
following eachother, eg. to load a word anywhere in memory without regard to
alignment:<br/>
```
lwl r2,$0003(t0) ;\no delay required between these
lwr r2,$0000(t0) ;/(although both access r2)
nop ;-requires load delay HERE (before reading from r2)
and r2,r2,0ffffh ;-access r2 (eg. reducing it to unaligned 16bit data)
```
#### Unaligned Load/Store (Details)
LWR/SWR transfers the right (=lower) bits of Rt, up-to 32bit memory boundary:<br/>
```
lwr/swr [N*4+0] transfer whole 32bit of Rt to/from [N*4+0..3]
lwr/swr [N*4+1] transfer lower 24bit of Rt to/from [N*4+1..3]
lwr/swr [N*4+2] transfer lower 16bit of Rt to/from [N*4+2..3]
lwr/swr [N*4+3] transfer lower 8bit of Rt to/from [N*4+3]
```
LWL/SWL transfers the left (=upper) bits of Rt, down-to 32bit memory boundary:<br/>
```
lwl/swl [N*4+0] transfer upper 8bit of Rt to/from [N*4+0]
lwl/swl [N*4+1] transfer upper 16bit of Rt to/from [N*4+0..1]
lwl/swl [N*4+2] transfer upper 24bit of Rt to/from [N*4+0..2]
lwl/swl [N*4+3] transfer whole 32bit of Rt to/from [N*4+0..3]
```
The CPU has four separate byte-access signals, so, within a 32bit location, it
can transfer all fragments of Rt at once (including for odd 24bit amounts). The
transferred data is not zero- or sign-expanded, eg. when transferring 8bit
data, the other 24bit of Rt and [mem] will remain intact.<br/>
Note: The aligned variant can also misused for blocking memory access on
aligned addresses (in that case, if the address is known to be aligned, only
one of the opcodes are needed, either LWL or LWR).... Uhhhhhhhm, OR is that NOT
allowed... more PROBABLY that doesn't work?<br/>
## CPU ALU Opcodes
#### arithmetic instructions
```
addt rd,rs,rt add rd,rs,rt rd=rs+rt (with overflow trap)
add rd,rs,rt addu rd,rs,rt rd=rs+rt
subt rd,rs,rt sub rd,rs,rt rd=rs-rt (with overflow trap)
sub rd,rs,rt subu rd,rs,rt rd=rs-rt
addt rt,rs,imm addi rt,rs,imm rt=rs+(-8000h..+7FFFh) (with ov.trap)
add rt,rs,imm addiu rt,rs,imm rt=rs+(-8000h..+7FFFh)
```
The opcodes "with overflow trap" do trigger an exception (and leave rd
unchanged) in case of overflows.<br/>
#### comparison instructions
```
setlt slt rd,rs,rt if rs<rt then rd=1 else rd=0 (signed)
setb sltu rd,rs,rt if rs<rt then rd=1 else rd=0 (unsigned)
setlt slti rt,rs,imm if rs<(-8000h..+7FFFh) then rt=1 else rt=0 (signed)
setb sltiu rt,rs,imm if rs<(FFFF8000h..7FFFh) then rt=1 else rt=0(unsigned)
```
#### logical instructions
```
and rd,rs,rt and rd,rs,rt rd = rs AND rt
or rd,rs,rt or rd,rs,rt rd = rs OR rt
xor rd,rs,rt xor rd,rs,rt rd = rs XOR rt
nor rd,rs,rt nor rd,rs,rt rd = FFFFFFFFh XOR (rs OR rt)
and rt,rs,imm andi rt,rs,imm rt = rs AND (0000h..FFFFh)
or rt,rs,imm ori rt,rs,imm rt = rs OR (0000h..FFFFh)
xor rt,rs,imm xori rt,rs,imm rt = rs XOR (0000h..FFFFh)
```
#### shifting instructions
```
shl rd,rt,rs sllv rd,rt,rs rd = rt SHL (rs AND 1Fh)
shr rd,rt,rs srlv rd,rt,rs rd = rt SHR (rs AND 1Fh)
sar rd,rt,rs srav rd,rt,rs rd = rt SAR (rs AND 1Fh)
shl rd,rt,imm sll rd,rt,imm rd = rt SHL (00h..1Fh)
shr rd,rt,imm srl rd,rt,imm rd = rt SHR (00h..1Fh)
sar rd,rt,imm sra rd,rt,imm rd = rt SAR (00h..1Fh)
mov rt,i*10000h lui rt,imm rt = (0000h..FFFFh) SHL 16
```
Unlike many other opcodes, shifts use 'rt' as second (not third) operand.<br/>
The hardware does NOT generate exceptions on SHL overflows.<br/>
#### Multiply/divide
```
smul rs,rt mult rs,rt hi:lo = rs*rt (signed)
umul rs,rt multu rs,rt hi:lo = rs*rt (unsigned)
sdiv rs,rt div rs,rt lo = rs/rt, hi=rs mod rt (signed)
udiv rs,rt divu rs,rt lo = rs/rt, hi=rs mod rt (unsigned)
mov rd,hi mfhi rd rd=hi ;move from hi
mov rd,lo mflo rd rd=lo ;move from lo
mov hi,rs mthi rs hi=rs ;move to hi
mov lo,rs mtlo rs lo=rs ;move to lo
```
The mul/div opcodes are starting the multiply/divide operation, starting takes
only a single clock cycle, however, trying to read the result from the hi/lo
registers while the mul/div operation is busy will halt the CPU until the
mul/div has completed. For multiply, the execution time depends on rs (ie.
"small\*large" can be much faster than "large\*small").<br/>
```
__umul_execution_time_____________________________________________________
Fast (6 cycles) rs = 00000000h..000007FFh
Med (9 cycles) rs = 00000800h..000FFFFFh
Slow (13 cycles) rs = 00100000h..FFFFFFFFh
__smul_execution_time_____________________________________________________
Fast (6 cycles) rs = 00000000h..000007FFh, or rs = FFFFF800h..FFFFFFFFh
Med (9 cycles) rs = 00000800h..000FFFFFh, or rs = FFF00000h..FFFFF801h
Slow (13 cycles) rs = 00100000h..7FFFFFFFh, or rs = 80000000h..FFF00001h
__udiv/sdiv_execution_time________________________________________________
Fixed (36 cycles) no matter of rs and rt values
```
For example, when executing "umul 123h,12345678h" and "mov r1,lo", one can
insert up to six (cached) ALU opcodes, or read one value from PSX Main RAM
(which has 6 cycle access time) between the "umul" and "mov" opcodes without
additional slowdown.<br/>
The hardware does NOT generate exceptions on divide overflows, instead, divide
errors are returning the following values:<br/>
```
Opcode Rs Rd Hi/Remainder Lo/Result
udiv 0..FFFFFFFFh 0 --> Rs FFFFFFFFh
sdiv 0..+7FFFFFFFh 0 --> Rs -1
sdiv -80000000h..-1 0 --> Rs +1
sdiv -80000000h -1 --> 0 -80000000h
```
For udiv, the result is more or less correct (as close to infinite as
possible). For sdiv, the results are total garbage (about farthest away from
the desired result as possible).<br/>
Note: After accessing the lo/hi registers, there seems to be a strange rule
that one should not touch the lo/hi registers in the next 2 cycles or so... not
yet understood if/when/how that rule applies...?<br/>
## CPU Jump Opcodes
#### jumps and branches
Note that the instruction following the branch will always be executed.<br/>
```
jmp dest j dest pc=(pc and F0000000h)+(imm26bit*4)
call dest jal dest pc=(pc and F0000000h)+(imm26bit*4),ra=$+8
jmp rs jr rs pc=rs
call rs,ret=rd jalr (rd,)rs(,rd) pc=rs, rd=$+8 ;see caution
je rs,rt,dest beq rs,rt,dest if rs=rt then pc=$+4+(-8000h..+7FFFh)*4
jne rs,rt,dest bne rs,rt,dest if rs<>rt then pc=$+4+(-8000h..+7FFFh)*4
js rs,dest bltz rs,dest if rs<0 then pc=$+4+(-8000h..+7FFFh)*4
jns rs,dest bgez rs,dest if rs>=0 then pc=$+4+(-8000h..+7FFFh)*4
jgtz rs,dest bgtz rs,dest if rs>0 then pc=$+4+(-8000h..+7FFFh)*4
jlez rs,dest blez rs,dest if rs<=0 then pc=$+4+(-8000h..+7FFFh)*4
calls rs,dest bltzal rs,dest if rs<0 then pc=$+4+(..)*4, ra=$+8
callns rs,dest bgezal rs,dest if rs>=0 then pc=$+4+(..)*4, ra=$+8
```
#### JALR cautions
Caution: The JALR source code syntax varies (IDT79R3041 specs say "jalr rs,rd",
but MIPS32 specs say "jalr rd,rs"). Moreover, JALR may not use the same
register for both operands (eg. "jalr r31,r31") (doing so would destroy the
target address; which is normally no problem, but it can be a problem if an IRQ
occurs between the JALR opcode and the following branch delay opcode; in that
case BD gets set, and EPC points "back" to the JALR opcode, so JALR is executed
twice, with destroyed target address in second execution).<br/>
#### exception opcodes
Unlike for jump/branch opcodes, exception opcodes are immediately executed (ie.
without executing the following opcode).<br/>
```
syscall imm20 generates a system call exception
break imm20 generates a breakpoint exception
```
The 20bit immediate doesn't affect the CPU (however, the exception handler may
interprete it by software; by examing the opcode bits at [epc-4]).<br/>
## CPU Coprocessor Opcodes
#### Coprocessor Instructions (COP0..COP3)
```
mov rt,cop#Rd(0-31) mfc# rt,rd ;rt = cop#datRd ;data regs
mov rt,cop#Rd(32-63) cfc# rt,rd ;rt = cop#cntRd ;control regs
mov cop#Rd(0-31),rt mtc# rt,rd ;cop#datRd = rt ;data regs
mov cop#Rd(32-63),rt ctc# rt,rd ;cop#cntRd = rt ;control regs
mov cop#cmd,imm25 cop# imm25 ;exec cop# command 0..1FFFFFFh
mov cop#Rt(0-31),[rs+imm] lwc# rt,imm(rs) ;cop#dat_rt = [rs+imm] ;word
mov [rs+imm],cop#Rt(0-31) swc# rt,imm(rs) ;[rs+imm] = cop#dat_rt ;word
jf cop#flg,dest bc#f dest ;if cop#flg=false then pc=$+disp
jt cop#flg,dest bc#t dest ;if cop#flg=true then pc=$+disp
rfe rfe ;return from exception (COP0)
tlb<xx> tlb<xx> ;virtual memory related (COP0)
```
Unknown if any tlb-opcodes (tlbr,tlbwi,tlbwr,tlbp) are implemented in the psx?<br/>
#### Caution - Load Delay
When reading from a coprocessor register, the next opcode cannot use the
destination register as operand (much the same as the Load Delays that occur
when reading from memory; see there for details).<br/>
Reportedly, the Load Delay applies for the next TWO opcodes after coprocessor
reads, but, that seems to be nonsense (the PSX does finish both COP0 and COP2
reads after ONE opcode).<br/>
#### Caution - Store Delay
In some cases, a similar delay occurs when writing to a coprocessor register.
COP0 is more or less free of store delays (eg. one can read from a cop0
register immediately after writing to it), the only known exception is the cop2
enable bit in cop0r12.bit30 (setting that cop0 bit acts delayed, and cop2 isn't
actually enabled until after 2 clock cycles or so).<br/>
Writing to cop2 registers has a delay of 2..3 clock cycles. In most cases, that
is probably (?) only 2 cycles, but special cases like writing to IRGB (which
does additionally affect IR1,IR2,IR3) take 3 cycles until the result arrives in
all registers).<br/>
Note that Store Delays are counted in numbers of clock cycles (not in numbers
of opcodes). For 3 cycle delay, one must usually insert 3 cached opcodes (or
one uncached opcode).<br/>
## CPU Pseudo Opcodes
#### Pseudo instructions (native/spasm)
```
nop ;alias for sll r0,r0,0
move rd,rs ;alias for addu rd,rs,r0
la rx,imm32 ;load address (alias for lui rx / addiu rx)
li rx,imm32 ;load immediate (alias for lui rx / ori rx)
li rx,imm16 ;load immediate (alias for ori, range 0..FFFFh)
li rx,-imm15 ;load immediate (alias for addiu, range -1..-8000h)
li rx,imm16*10000h ;load immediate (alias for lui)
lw rx,imm32 ;load from address (lui rx / lw rx,rx)
sw rx,imm32 ;store to address (lui r1 / sw rx,r1) (destroys r1!)
lb,lh,lwl,lwr,lbu,lhu;as above pseudo lw
sb,sh,swl,swr ;as above pseudo sw (ie. also destroys r1!)
alu rx,op ;alias for alu rx,rx,op
alu(u) rx,rx,imm ;alias for alui(u) rx,rx,imm
jalr rx ;alias for jalr (RA,)rx(,RA)
subi(u) rt,rs,imm ;alias for addi(u) rt,rs,-imm
beqz rx,dest ;alias for beq rx,r0,dest
bnez rx,dest ;alias for bne rx,r0,dest
b dest ;alias for beq r0,r0,dest (jump relative/spasm)
bra dest ;alias for ...? (jump relative/gnu)
bal dest ;alias for ...? (call relative/spasm)
```
#### Pseudo instructions (nocash/a22i)
```
mov rx,NNNN0000h ;alias for lui rx,NNNNh
mov rx,0000NNNNh ;alias for or rx,r0,NNNNh ;max +FFFFh
mov rx,-imm15 ;alias for add rx,r0,-NNNNh ;min -8000h
mov rx,ry ;alias for or rx,ry,0 (or "addiu")
nop ;alias for shl r0,r0,0
jrel dest ;alias for blez R0,dest ;relative jump
crel dest ;alias for callns R0,dest ;relative call
jz rx,dest ;alias for je rx,R0,dest
jnz rx,dest ;alias for jne rx,R0,dest
call rx ;alias for call rx,ret=RA
ret ;alias for jmp ra
subt rt,rs,imm ;alias for addt rt,rs,-imm
sub rt,rs,imm ;alias for add rt,rs,-imm
alu rx,op ;alias for alu rx,rx,op
neg(t) rx,ry ;alias for sub(t) rx,R0,ry
not rx,ry ;alias for nor rx,R0,ry
neg(t)/not rx ;alias for neg(t)/not rx,rx
setz rx,ry ;alias for setb rx,ry,1 (set if zero)
setnz rx,ry ;alias for setb rx,R0,ry (set if nonzero)
syscall/break ;alias for syscall/break 000000h
```
Below are pseudo instructions combined of two 32bit opcodes...<br/>
```
movp rx,imm32 ;alias for lui rx,imm16 -plus- or rx,rx,imm16)
mov(bhs)p rx,[imm32] ;load from address (lui rx,imm16 / mov rx,[rx+imm16])
movu [rs+imm] ;alias for lwr/swr [rs+imm] plus lwl/swl [rs+imm+3]
reti ;alias for jmp k0 plus rfe
```
Below are pseudo instructions combined of two or more 32bit opcodes...<br/>
```
push rlist ;alias for sub sp,n*4 -- mov [sp+(1..n)*4],r1..rn
pop rlist ;alias for mov r1..rn,[sp+(1..n)*4] -- add sp,n*4
pop pc,rlist ;alias for pop ra,rlist -- jmp ra
```
#### Possible more Pseudos...
```
call x0000000h ;call y0000000h (could be half-working for mem mirrors?)
setae,setge ;--> setb,setlt with swapped operands
```
#### Directives (nocash)
```
.mips ;select MIPS instruction set (alternately .hc05 for MC68HC05)
.bios ;create a .ROM file (instead of .EXE)
.auto_nop ;append NOPs to jumps ;unless next opcode starts with a +
org imm ;assume following code to be originated at address "imm"
db n(,n(..))) ;define 8bit data values(s) or quoted ASCII strings
dw n(,n(..))) ;define 16bit data values(s) (not 32bit data!)
dd n(,n(..))) ;define 32bit data values(s)
.align imm
0 ;alias for immediate 0 and register R0 (whichever fits)
```
#### Directives (native)
```
org imm ;self-explaining (but, default=$80010000 for spasm!)
align imm ;self-explaining (probably zeropadded?)
db n(,n(..))) ;define 8bit data values(s) or quoted ASCII strings
dh n(,n(..))) ;define 16bit data values(s)
dw n(,n(..))) ;define 32bit data values(s) (not 16bit data!)
dcb len,value ;fill <len> bytes by <value> (different as DCB on ARM CPUs)
xyz ;define label "xyz" at current address (without colon)
xyz equ n ;assign value n to xyz
xyz = n ;probably same/sililar as "equ"
;xyz ;comments invoked with semicolon (spasm)
incbin file.bin ;import binary file
include file.asm ;import asm file
zero ;alias for r0
>imm32 ;alias for (i-(i AND 8000h))/10000h, and/or i/10000h ?
<imm32 ;alias for (i AND 0FFFFh), used for SW(+/-) and ORI(+)?
end ;N/A ;no "end" or ".end" directive needed/used by spasm
r1 aka at ;N/A ;some assemblers may (optionally) reject to use r1/at
```
#### Syntax for unknown assembler (for pad.s)
It uses "0x" for HEX values (but doesn't use "$" for registers).<br/>
It uses "#" instead of ";" for comments.<br/>
It uses ":" for labels (fortunately).<br/>
The assembler has at least one directive: ".byte" (equivalent to "db" on other
assemblers).<br/>
I've no clue which assembler is used for that syntax... could that be the Psy-Q
assembler?<br/>
## COP0 - Register Summary
#### COP0 Register Summary
```
cop0r0-r2 - N/A
cop0r3 - BPC - Breakpoint on execute (R/W)
cop0r4 - N/A
cop0r5 - BDA - Breakpoint on data access (R/W)
cop0r6 - JUMPDEST - Randomly memorized jump address (R)
cop0r7 - DCIC - Breakpoint control (R/W)
cop0r8 - BadVaddr - Bad Virtual Address (R)
cop0r9 - BDAM - Data Access breakpoint mask (R/W)
cop0r10 - N/A
cop0r11 - BPCM - Execute breakpoint mask (R/W)
cop0r12 - SR - System status register (R/W)
cop0r13 - CAUSE - (R) Describes the most recently recognised exception
cop0r14 - EPC - Return Address from Trap (R)
cop0r15 - PRID - Processor ID (R)
cop0r16-r31 - Garbage
cop0r32-r63 - N/A - None such (Control regs)
```
## COP0 - Exception Handling
#### cop0r13 - CAUSE - (Read-only, except, Bit8-9 are R/W)
Describes the most recently recognised exception<br/>
```
0-1 - Not used (zero)
2-6 Excode Describes what kind of exception occured:
00h INT Interrupt
01h MOD Tlb modification (none such in PSX)
02h TLBL Tlb load (none such in PSX)
03h TLBS Tlb store (none such in PSX)
04h AdEL Address error, Data load or Instruction fetch
05h AdES Address error, Data store
The address errors occur when attempting to read
outside of KUseg in user mode and when the address
is misaligned. (See also: BadVaddr register)
06h IBE Bus error on Instruction fetch
07h DBE Bus error on Data load/store
08h Syscall Generated unconditionally by syscall instruction
09h BP Breakpoint - break instruction
0Ah RI Reserved instruction
0Bh CpU Coprocessor unusable
0Ch Ov Arithmetic overflow
0Dh-1Fh Not used
7 - Not used (zero)
8-15 Ip Interrupt pending field. Bit 8 and 9 are R/W, and
contain the last value written to them. As long
as any of the bits are set they will cause an
interrupt if the corresponding bit is set in IM.
16-27 - Not used (zero)
28-29 CE Contains the coprocessor number if the exception
occurred because of a coprocessor instuction for
a coprocessor which wasn't enabled in SR.
30 - Not used (zero)
31 BD Is set when last exception points to the
branch instuction instead of the instruction
in the branch delay slot, where the exception
occurred.
```
#### cop0r12 - SR - System status register (R/W)
```
0 IEc Current Interrupt Enable (0=Disable, 1=Enable) ;rfe pops IUp here
1 KUc Current Kernal/User Mode (0=Kernel, 1=User) ;rfe pops KUp here
2 IEp Previous Interrupt Disable ;rfe pops IUo here
3 KUp Previous Kernal/User Mode ;rfe pops KUo here
4 IEo Old Interrupt Disable ;left unchanged by rfe
5 KUo Old Kernal/User Mode ;left unchanged by rfe
6-7 - Not used (zero)
8-15 Im 8 bit interrupt mask fields. When set the corresponding
interrupts are allowed to cause an exception.
16 Isc Isolate Cache (0=No, 1=Isolate)
When isolated, all load and store operations are targetted
to the Data cache, and never the main memory.
(Used by PSX Kernel, in combination with Port FFFE0130h)
17 Swc Swapped cache mode (0=Normal, 1=Swapped)
Instruction cache will act as Data cache and vice versa.
Use only with Isc to access & invalidate Instr. cache entries.
(Not used by PSX Kernel)
18 PZ When set cache parity bits are written as 0.
19 CM Shows the result of the last load operation with the D-cache
isolated. It gets set if the cache really contained data
for the addressed memory location.
20 PE Cache parity error (Does not cause exception)
21 TS TLB shutdown. Gets set if a programm address simultaneously
matches 2 TLB entries.
(initial value on reset allows to detect extended CPU version?)
22 BEV Boot exception vectors in RAM/ROM (0=RAM/KSEG0, 1=ROM/KSEG1)
23-24 - Not used (zero)
25 RE Reverse endianness (0=Normal endianness, 1=Reverse endianness)
Reverses the byte order in which data is stored in
memory. (lo-hi -> hi-lo)
(Has affect only to User mode, not to Kernel mode) (?)
(The bit doesn't exist in PSX ?)
26-27 - Not used (zero)
28 CU0 COP0 Enable (0=Enable only in Kernal Mode, 1=Kernal and User Mode)
29 CU1 COP1 Enable (0=Disable, 1=Enable) (none such in PSX)
30 CU2 COP2 Enable (0=Disable, 1=Enable) (GTE in PSX)
31 CU3 COP3 Enable (0=Disable, 1=Enable) (none such in PSX)
```
#### cop0r14 - EPC - Return Address from Trap (R)
```
0-31 Return Address from Exception
```
This address is the instruction at which the exception took place, unless BD is
set in CAUSE, then the instruction was at EPC+4.<br/>
Interrupts should always return to EPC+0, no matter of the BD flag. That way,
if BD=1, the branch gets executed again, that's required because EPC stores
only the current program counter, but not additionally the branch destination
address.<br/>
Other exceptions may require to handle BD. In simple cases, when BD=0, the
exception handler may return to EPC+0 (retry execution of the opcode), or to
EPC+4 (skip the opcode that caused the exception). Note that jumps to faulty
memory locations are executed without exception, but will trigger address
errors and bus errors at the target location, ie. EPC (and BadVAddr, in case of
address errors) point to the faulty address, not to the opcode that has jumped
to that address).<br/>
#### Interrupts vs GTE Commands
If an interrupt occurs "on" a GTE command (cop2cmd), then the GTE command is
executed, but nethertheless, the return address in EPC points to the GTE
command. So, if the exeception handler would return to EPC as usually, then the
GTE command would be executed twice. In best case, this would be a waste of
clock cycles, in worst case it may lead to faulty result (if the results from
the 1st execution are re-used as incoming parameters in the 2nd execution). To
fix the problem, the exception handler must do:<br/>
```
if (cause AND 7Ch)=00h ;if excode=interrupt
if ([epc] AND FE000000h)=4A000000h ;and opcode=cop2cmd
epc=epc+4 ;then skip that opcode
```
Note: The above exception handling is working only in newer PSX BIOSes, but in
very old PSX BIOSes, it is only incompletely implemented (see "BIOS Patches"
chapter for common workarounds; or write your own exception handler without
using the BIOS).<br/>
Of course, the above exeption handling won't work in branch delays (where BD
gets set to indicate that EPC was modified) (best workaround is not to use GTE
commands in branch delays).<br/>
#### cop0cmd=10h - RFE opcode - Prepare Return from Exception
The RFE opcode moves some bits in cop0r12 (SR): bit2-3 are copied to bit0-1,
and bit4-5 are copied to bit2-3, all other bits (including bit4-5) are left
unchanged.<br/>
The RFE opcode does NOT automatically jump to EPC. Instead, the exception
handler must copy EPC into a register (usually R26 aka K0), and then jump to
that address. Because of branch delays, that would look like so:<br/>
```
mov k0,epc ;get return address
push k0 ;save epc in memory (if you expect nested exceptions)
... ;whatever (ie. process CAUSE)
pop k0 ;restore from memory (if you expect nested exceptions)
jmp k0 ;jump to K0 (after executing the next opcode)
+rfe ;move SR bit4/5 --> bit2/3 --> bit0/1
```
If you expect exceptions to be nested deeply, also push/pop SR. Note that
there's no way to leave all registers intact (ie. above code destroys K0).<br/>
#### cop0r8 - BadVaddr - Bad Virtual Address (R)
Contains the address whose reference caused an exception. Set on any MMU type
of exceptions, on references outside of kuseg (in User mode) and on any
misaligned reference. BadVaddr is updated ONLY by Address errors (Excode 04h
and 05h), all other exceptions (including bus errors) leave BadVaddr unchanged.<br/>
#### Exception Vectors (depending on BEV bit in SR register)
```
Exception BEV=0 BEV=1
Reset BFC00000h BFC00000h (Reset)
UTLB Miss 80000000h BFC00100h (Virtual memory, none such in PSX)
COP0 Break 80000040h BFC00140h (Debug Break)
General 80000080h BFC00180h (General Interrupts & Exceptions)
```
Note: Changing vectors at 800000xxh (kseg0) seems to be automatically reflected
to the instruction cache without needing to flush cache (at least it worked
SOMETIMES in my test proggy... but NOT always? ...anyways, it'd be highly
recommended to flush cache when changing any opcodes), whilst changing mirrors
at 000000xxh (kuseg) seems to require to flush cache.<br/>
The PSX uses only the BEV=0 vectors (aside from the reset vector, the PSX BIOS
ROM doesn't contain any of the BEV=1 vectors).<br/>
#### Exception Priority
```
Reset At any time (highest) ;-reset
AdEL Memory (Load instruction) ;\
AdES Memory (Store instruction) ; memory (data load/store)
DBE Memory (Load or store) ;/
MOD ALU (Data TLB) ;\
TLBL ALU (DTLB Miss) ; none such
TLBS ALU (DTLB Miss) ;/
Ovf ALU ;-overflow
Int ALU ;-interrupt
Sys RD (Instruction Decode) ;\
Bp RD (Instruction Decode) ;
RI RD (Instruction Decode) ;
CpU RD (Instruction Decode) ;/
TLBL I-Fetch (ITLB Miss) ;-none such
AdEL IVA (Instruction Virtual Address) ;\memory (opcode fetch)
IBE RD (end of I-Fetch, lowest) ;/
```
## COP0 - Misc
#### cop0r15 - PRID - Processor ID (R)
```
0-7 Revision
8-15 Implementation
16-31 Not used
```
For a Playstation with CXD8606CQ CPU, the PRID value is 00000002h.<br/>
Unknown if/which other Playstation CPU versions have other values...?<br/>
#### cop0r6 - JUMPDEST - Randomly memorized jump address (R)
The is a rather strange totally useless register. After certain exceptions, the
CPU does memorize a jump destination address in the register. Once when it has
memorized an address, the register becomes locked, and the memorized value
won't change until it becomes unlocked by a new exception. Exceptions that do
unlock the register are Reset and Interrupts (cause.bit10). Exceptions that do
NOT unlock the register are syscall/break opcodes, and software generated
interrupts (eg. cause.bit8).<br/>
In the unlocked state, the CPU does more or less randomly memorize one of the
next some jump destinations - eg. the destination from the second jump after
reset, or from a jump that occured immediately before executing the IRQ
handler, or from a jump inside of the IRQ handler, or even from a later jump
that occurs shortly after returning from the IRQ handler.<br/>
The register seems to be read-only (although the Kernel initialization code
writes 0 to it for whatever reason).<br/>
#### cop0r0..r2, cop0r4, cop0r10, cop0r32..r63 - N/A
Registers 0,1,2,4,10 control virtual memory on some MIPS processors (but
there's none such in the PSX), and Registers 32..63 (aka "control registers")
aren't used in any MIPS processors. Trying to read any of these registers
causes a Reserved Instruction Exception (excode=0Ah).<br/>
#### cop0cmd=01h,02h,06h,08h - TLBR,TLBWI,TLBWR,TLBP
The PSX supports only one cop0cmd (cop0cmd=10h aka RFE). Trying to execute the
TLBxx opcodes causes a Reserved Instruction Exception (excode=0Ah).<br/>
#### jf/jt cop0flg,dest - conditional cop0 jumps
#### mov [mem],cop0reg / mov cop0reg,[mem] - coprocessor cop0 load/store
Not supported by the CPU. Trying to execute these opcodes causes a Coprocessor
Unusable Exception (excode=0Bh, ie. unlike above, not 0Ah).<br/>
#### cop0r16-r31 - Garbage
Trying to read these registers returns garbage (but does not trigger an
exception). When reading one of the garbage registers shortly after reading a
valid cop0 register, the garbage value is usually the same as that of the valid
register. When doing the read later on, the return value is usually 00000020h,
or when reading much later it returns 00000040h, or even 00000100h. No idea
what is causing that effect...?<br/>
Note: The garbage registers can be accessed (without causing an exception) even
in "User mode with cop0 disabled" (SR.Bit1=1 and SR.Bit28=0); accessing any
other existing cop0 registers (or executing the rfe opcode) in that state is
causing Coprocessor Unusable Exceptions (excode=0Bh).<br/>
## COP0 - Debug Registers
The COP0 debug registers seem to be PSX specific, normal R30xx CPUs like IDT's
R3041 and R3051 don't have anything similar.<br/>
#### cop0r7 - DCIC - Breakpoint control (R/W)
```
0 Automatically set by hardware upon Any break (R/W)
1 Automatically set by hardware upon BPC Code break (R/W)
2 Automatically set by hardware upon BDA Data break (R/W)
3 Automatically set by hardware upon BDA Data-Read break (R/W)
4 Automatically set by hardware upon BDA Data-Write break (R/W)
5 Automatically set by hardware upon any-jump break (R/W)
6-11 Not used (always zero)
12-13 Jump Redirection (0=Disable, 1..3=Enable) (see note) (R/W)
14-15 Unknown? (R/W)
16-22 Not used (always zero)
23 Super-Master Enable 1 for bit24-29
24 Execution breakpoint (0=Disabled, 1=Enabled) (see BPC, BPCM)
25 Data access breakpoint (0=Disabled, 1=Enabled) (see BDA, BDAM)
26 Break on Data-Read (0=No, 1=Break/when Bit25=1)
27 Break on Data-Write (0=No, 1=Break/when Bit25=1)
28 Break on any-jump (0=No, 1=Break on branch/jump/call/etc.)
29 Master Enable for bit28 (..and/or exec-break at address>=80000000h?)
30 Master Enable for bit24-27
31 Super-Master Enable 2 for bit24-29
```
When a breakpoint address match occurs the PSX jumps to 80000040h (ie. unlike
normal exceptions, not to 80000080h). The Excode value in the CAUSE register is
set to 09h (same as BREAK opcode), and EPC contains the return address, as
usually. One of the first things to be done in the exception handler is to
disable breakpoints (eg. if the any-jump break is enabled, then it must be
disabled BEFORE jumping from 80000040h to the actual exception handler).<br/>
#### cop0r7.bit12-13 - Jump Redirection Note
If one or both of these bits are nonzero, then the PSX seems to check for the
following opcode sequence,<br/>
```
mov rx,[mem] ;load rx from memory
... ;one or more opcodes that do not change rx
jmp/call rx ;jump or call to rx
```
if it does sense that sequence, then it sets PC=[00000000h], but does not store
any useful information in any cop0 registers, namely it does not store the
return address in EPC, so it's impossible to determine which opcode has caused
the exception. For the jump target address, there are 31 registers, so one
could only guess which of them contains the target value; for "POP PC" code
it'd be usually R31, but for "JMP [vector]" code it may be any register. So far
the feature seems to be more or less unusable...?<br/>
#### cop0r5 - BDA - Breakpoint on Data Access Address (R/W)
#### cop0r9 - BDAM - Breakpoint on Data Access Mask (R/W)
Break condition is "((addr XOR BDA) AND BDAM)=0".<br/>
#### cop0r3 - BPC - Breakpoint on Execute Address (R/W)
#### cop0r11 - BPCM - Breakpoint on Execute Mask (R/W)
Break condition is "((PC XOR BPC) AND BPCM)=0".<br/>
#### Note (BREAK Opcode)
Additionally, the BREAK opcode can be used to create further breakpoints by
patching the executable code. The BREAK opcode uses the same Excode value (09h)
in CAUSE register. However, the BREAK opcode jumps to the normal exception
handler at 80000080h (not 80000040h).<br/>
#### Note (LibCrypt)
The debug registers are mis-used by "Legacy of Kain: Soul Reaver" (and maybe
also other games) for storing libcrypt copy-protection related values (ie. just
as a "hidden" location for storing data, not for actual debugging purposes).<br/>
[CDROM Protection - LibCrypt](cdromdrive.md#cdrom-protection---libcrypt)<br/>
#### Note (Cheat Devices/Expansion ROMs)
The Expansion ROM header supports only Pre-Boot and Post-Boot vectors, but no
Mid-Boot vector. Cheat Devices are often using COP0 breaks for Mid-Boot Hooks,
either with BPC=BFC06xxxh (break address in ROM, used in older cheat
firmwares), or with BPC=80030000h (break address in RAM aka relocated GUI
entrypoint, used in later cheat firmwares). Moreover, aside from the Mid-Boot
Hook, the Xplorer cheat device is also supporting a special cheat code that
uses the COP0 break feature.<br/>
#### Note (Datasheet)
Note: COP0 debug registers are described in LSI's "L64360" datasheet, chapter
14. And in their LR33300/LR33310 datasheet, chapter 4.<br/>