2020-07-23 20:49:17 +02:00
# Graphics Processing Unit (GPU)
The GPU can render Polygons, Lines, or Rectangles to the Drawing Buffer, and
sends the Display Buffer to the Television Set. Polygons are useful for 3D
graphics (or rotated/scaled 2D graphics), Rectangles are useful for 2D graphics
and Text output.<br/>

[GPU I/O Ports, DMA Channels, Commands, VRAM](graphicsprocessingunitgpu.md#gpu-io-ports-dma-channels-commands-vram)<br/>
[GPU Render Polygon Commands](graphicsprocessingunitgpu.md#gpu-render-polygon-commands)<br/>
[GPU Render Line Commands](graphicsprocessingunitgpu.md#gpu-render-line-commands)<br/>
[GPU Render Rectangle Commands](graphicsprocessingunitgpu.md#gpu-render-rectangle-commands)<br/>
[GPU Rendering Attributes](graphicsprocessingunitgpu.md#gpu-rendering-attributes)<br/>
[GPU Memory Transfer Commands](graphicsprocessingunitgpu.md#gpu-memory-transfer-commands)<br/>
[GPU Other Commands](graphicsprocessingunitgpu.md#gpu-other-commands)<br/>
[GPU Display Control Commands (GP1)](graphicsprocessingunitgpu.md#gpu-display-control-commands-gp1)<br/>
[GPU Status Register](graphicsprocessingunitgpu.md#gpu-status-register)<br/>
[GPU Versions](graphicsprocessingunitgpu.md#gpu-versions)<br/>
[GPU Depth Ordering](graphicsprocessingunitgpu.md#gpu-depth-ordering)<br/>
[GPU Video Memory (VRAM)](graphicsprocessingunitgpu.md#gpu-video-memory-vram)<br/>
[GPU Texture Caching](graphicsprocessingunitgpu.md#gpu-texture-caching)<br/>
[GPU Timings](graphicsprocessingunitgpu.md#gpu-timings)<br/>
[GPU (MISC)](graphicsprocessingunitgpu.md#gpu-misc)<br/>



## GPU I/O Ports, DMA Channels, Commands, VRAM
#### GPU I/O Ports (1F801810h and 1F801814h in Read/Write Directions)
```
  Port            Name    Expl.
  1F801810h-Write GP0     Send GP0 Commands/Packets (Rendering and VRAM Access)
  1F801814h-Write GP1     Send GP1 Commands (Display Control) (and DMA Control)
  1F801810h-Read  GPUREAD Receive responses to GP0(C0h) and GP1(10h) commands
  1F801814h-Read  GPUSTAT Receive GPU Status Register
```
The GPU (possibly only the GP0 port) has a 64-byte (16-word) command FIFO buffer.<br/>
Optionally, Port 1F801810h (Read/Write) can also be accessed via DMA2.<br/>

#### GPU Timers / Synchronization
Most of the Timers are bound to GPU timings, see<br/>
[Timers](timers.md)<br/>
[Interrupts](interrupts.md)<br/>

#### GPU-related DMA Channels (DMA2 and DMA6)
```
  Channel                  Recommended for
  DMA2 in Linked Mode    - Sending rendering commands  ;GP0(20h..7Fh,E1h..E6h)
  DMA2 in Continuous Mode - VRAM transfers to/from GPU ;GP0(A0h,C0h)
  DMA6                   - Initializing the Link List  ;Main RAM
```
Note: Before using DMA2, set up the DMA Direction in GP1(04h).<br/>
DMA2 is equivalent to accessing Port 1F801810h (GP0/GPUREAD) by software.<br/>
DMA6 just initializes data in Main RAM (not physically connected to the GPU).<br/>

#### GPU Command Summary
Commands/Packets consist of an 8bit command number (MSBs) and a 24bit parameter
(LSBs), which are written as a 32bit value to GP0 or GP1.<br/>
```
  GP0(00h)                 - Nop?
  GP0(01h,02h,80h,A0h,C0h) - Direct VRAM Access
  GP0(03h)                 - Unknown (does take up FIFO space!!!)
  GP0(1Fh)                 - Interrupt Request (IRQ1)
  GP0(20h..3Fh)            - Render Polygons
  GP0(40h..5Fh)            - Render Lines
  GP0(60h..7Fh)            - Render Rectangles
  GP0(E1h..E6h)            - Rendering Attributes
  GP1(00h..09h,10h,20h)    - Display Control (these via GP1 register)
```
Some GP0 commands require additional parameters, which are written (following
the command) as further 32bit values to GP0. The execution of the command
starts when all parameters have been received (or, in case of Polygon/Line
commands, when the first 3/2 vertices have been received).<br/>

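The command word layout described above can be sketched in a few lines of Python. This is only an illustration of the bit packing; the helper names `gp_command` and `split_gp_command` are made up for this example and are not part of any real library.

```python
def gp_command(command, parameter=0):
    """Pack an 8bit command number (MSBs) and a 24bit parameter (LSBs)."""
    if not 0 <= command <= 0xFF:
        raise ValueError("command number must fit in 8 bits")
    return (command << 24) | (parameter & 0xFFFFFF)


def split_gp_command(word):
    """Split a 32bit GP0/GP1 word back into (command, parameter)."""
    return (word >> 24) & 0xFF, word & 0xFFFFFF


if __name__ == "__main__":
    # GP0(E1h) with an all-zero parameter
    print(hex(gp_command(0xE1)))
    # First word of a monochrome four-point polygon with color 112233h
    print(split_gp_command(0x28112233))
```

The resulting value is what would be written to Port 1F801810h (GP0) or 1F801814h (GP1).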
#### VRAM Overview / VRAM Addressing
VRAM is 1MByte (not mapped to the CPU bus) (it can be read/written only via I/O
or DMA). The memory is used for:<br/>
```
  Framebuffer(s)      ;Usually 2 buffers (Drawing Area, and Display Area)
  Texture Page(s)     ;Required when using Textures
  Texture Palette(s)  ;Required when using 4bit/8bit Textures
```
The 1MByte VRAM is organized as 512 lines of 2048 bytes. It is accessed via
coordinates, ranging from (0,0)=Upper-Left to (N,511)=Lower-Right.<br/>
```
  Unit  = 4bit  8bit  16bit  24bit   Halfwords   | Unit   = Lines
  Width = 4096  2048  1024   682.66  1024        | Height = 512
```
The horizontal coordinates address memory in
4bit/8bit/16bit/24bit/halfword units (depending on which data formats you are
using) (or a mixture thereof, eg. a halfword base address plus a 4bit texture
coordinate).<br/>



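As a rough illustration of the addressing scheme above, the following sketch converts a coordinate into a byte offset within the 1MByte VRAM. The function name `vram_byte_offset` is hypothetical, and the code says nothing about how sub-byte (4bit) units are ordered inside a byte.

```python
VRAM_LINE_BYTES = 2048   # each of the 512 lines is 2048 bytes wide
VRAM_LINES = 512


def vram_byte_offset(x, y, bits_per_unit=16):
    """Byte offset of the unit at (x, y); x is counted in bits_per_unit units."""
    if bits_per_unit not in (4, 8, 16, 24):
        raise ValueError("unit must be 4, 8, 16 or 24 bits")
    if not 0 <= y < VRAM_LINES:
        raise ValueError("y must be 0..511")
    bit_offset = x * bits_per_unit
    if not 0 <= bit_offset < VRAM_LINE_BYTES * 8:
        raise ValueError("x is outside of the 2048-byte line")
    return y * VRAM_LINE_BYTES + bit_offset // 8


if __name__ == "__main__":
    # Last halfword of VRAM: (1023,511) in 16bit units
    print(hex(vram_byte_offset(1023, 511, 16)))
```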
## GPU Render Polygon Commands
#### GP0(20h) - Monochrome three-point polygon, opaque
#### GP0(22h) - Monochrome three-point polygon, semi-transparent
#### GP0(28h) - Monochrome four-point polygon, opaque
#### GP0(2Ah) - Monochrome four-point polygon, semi-transparent
```
  1st  Color+Command     (CcBbGgRrh)
  2nd  Vertex1           (YyyyXxxxh)
  3rd  Vertex2           (YyyyXxxxh)
  4th  Vertex3           (YyyyXxxxh)
 (5th) Vertex4           (YyyyXxxxh) (if any)
```

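A minimal sketch of how the packet above could be assembled for a monochrome three-point polygon. The helper names are invented for this example, and the vertices are packed with 16bit sign-extended halves, which is an assumption based on the "usually sign-extension" remark in the Vertex attribute description.

```python
def vertex_word(x, y):
    """Pack a vertex as YyyyXxxxh (16bit two's complement halves)."""
    return ((y & 0xFFFF) << 16) | (x & 0xFFFF)


def mono_triangle(r, g, b, v1, v2, v3, semi_transparent=False):
    """Return the four GP0 words for GP0(20h) or GP0(22h)."""
    command = 0x22 if semi_transparent else 0x20
    color = (b << 16) | (g << 8) | r          # BbGgRr in the low 24 bits
    return [(command << 24) | color,
            vertex_word(*v1), vertex_word(*v2), vertex_word(*v3)]


if __name__ == "__main__":
    for word in mono_triangle(255, 0, 0, (0, 0), (10, 0), (0, 10)):
        print(f"{word:08X}h")
```

Each word of the returned list would be written to GP0 in order (or placed in a DMA2 linked-list entry).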
#### GP0(24h) - Textured three-point polygon, opaque, texture-blending
#### GP0(25h) - Textured three-point polygon, opaque, raw-texture
#### GP0(26h) - Textured three-point polygon, semi-transparent, texture-blending
#### GP0(27h) - Textured three-point polygon, semi-transparent, raw-texture
#### GP0(2Ch) - Textured four-point polygon, opaque, texture-blending
#### GP0(2Dh) - Textured four-point polygon, opaque, raw-texture
#### GP0(2Eh) - Textured four-point polygon, semi-transparent, texture-blending
#### GP0(2Fh) - Textured four-point polygon, semi-transparent, raw-texture
```
  1st  Color+Command     (CcBbGgRrh) (color is ignored for raw-textures)
  2nd  Vertex1           (YyyyXxxxh)
  3rd  Texcoord1+Palette (ClutYyXxh)
  4th  Vertex2           (YyyyXxxxh)
  5th  Texcoord2+Texpage (PageYyXxh)
  6th  Vertex3           (YyyyXxxxh)
  7th  Texcoord3         (0000YyXxh)
 (8th) Vertex4           (YyyyXxxxh) (if any)
 (9th) Texcoord4         (0000YyXxh) (if any)
```

#### GP0(30h) - Shaded three-point polygon, opaque
#### GP0(32h) - Shaded three-point polygon, semi-transparent
#### GP0(38h) - Shaded four-point polygon, opaque
#### GP0(3Ah) - Shaded four-point polygon, semi-transparent
```
  1st  Color1+Command    (CcBbGgRrh)
  2nd  Vertex1           (YyyyXxxxh)
  3rd  Color2            (00BbGgRrh)
  4th  Vertex2           (YyyyXxxxh)
  5th  Color3            (00BbGgRrh)
  6th  Vertex3           (YyyyXxxxh)
 (7th) Color4            (00BbGgRrh) (if any)
 (8th) Vertex4           (YyyyXxxxh) (if any)
```

#### GP0(34h) - Shaded Textured three-point polygon, opaque, texture-blending
#### GP0(36h) - Shaded Textured three-point polygon, semi-transparent, tex-blend
#### GP0(3Ch) - Shaded Textured four-point polygon, opaque, texture-blending
#### GP0(3Eh) - Shaded Textured four-point polygon, semi-transparent, tex-blend
```
  1st   Color1+Command    (CcBbGgRrh)
  2nd   Vertex1           (YyyyXxxxh)
  3rd   Texcoord1+Palette (ClutYyXxh)
  4th   Color2            (00BbGgRrh)
  5th   Vertex2           (YyyyXxxxh)
  6th   Texcoord2+Texpage (PageYyXxh)
  7th   Color3            (00BbGgRrh)
  8th   Vertex3           (YyyyXxxxh)
  9th   Texcoord3         (0000YyXxh)
 (10th) Color4            (00BbGgRrh) (if any)
 (11th) Vertex4           (YyyyXxxxh) (if any)
 (12th) Texcoord4         (0000YyXxh) (if any)
```

#### GP0(35h,37h,3Dh,3Fh) - Undocumented/Nonsense (Raw Texture + UNUSED shading)
These are undocumented inefficient nonsense commands: Parameters are the same as
for GP0(34h,36h,3Ch,3Eh), ie. with colors for all vertices, but without
actually using those colors. Instead, the commands render raw textures
without blending.<br/>
In other words, the commands have the same function as GP0(25h,27h,2Dh,2Fh), but
with additional/unused parameters (and possibly additional/unused internal
Gouraud shading calculations).<br/>
For whatever reason, Castlevania is actually using these nonsense commands,
namely GP0(3Dh) and GP0(3Fh).<br/>

#### GP0(21h,23h,29h,2Bh,31h,33h,39h,3Bh) - Undocumented/Nonsense
These commands have texture-blending disabled, which is nonsense because they
are using untextured polygons anyway, ie. they are probably the same as
GP0(20h,22h,28h,2Ah,30h,32h,38h,3Ah).<br/>

#### Notes
Polygons are displayed up to \<excluding\> their lower-right coordinates.<br/>
Four-point polygons are internally processed as two Three-point polygons, the
first consisting of Vertices 1,2,3, and the second of Vertices 2,3,4.<br/>
Within the Three-point polygons, the ordering of the vertices does not matter on
the GPU side (a front-back check, based on clockwise or anti-clockwise
ordering, can be implemented on the GTE side).<br/>
Dither enable (in Texpage command) affects ONLY polygons that do use Gouraud
Shading or Texture Blending.<br/>



## GPU Render Line Commands
#### GP0(40h) - Monochrome line, opaque
#### GP0(42h) - Monochrome line, semi-transparent
#### GP0(48h) - Monochrome Poly-line, opaque
#### GP0(4Ah) - Monochrome Poly-line, semi-transparent
```
  1st   Color+Command     (CcBbGgRrh)
  2nd   Vertex1           (YyyyXxxxh)
  3rd   Vertex2           (YyyyXxxxh)
 (...)  VertexN           (YyyyXxxxh) (poly-line only)
 (Last) Termination Code  (55555555h) (poly-line only)
```

#### GP0(50h) - Shaded line, opaque
#### GP0(52h) - Shaded line, semi-transparent
#### GP0(58h) - Shaded Poly-line, opaque
#### GP0(5Ah) - Shaded Poly-line, semi-transparent
```
  1st   Color1+Command    (CcBbGgRrh)
  2nd   Vertex1           (YyyyXxxxh)
  3rd   Color2            (00BbGgRrh)
  4th   Vertex2           (YyyyXxxxh)
 (...)  ColorN            (00BbGgRrh) (poly-line only)
 (...)  VertexN           (YyyyXxxxh) (poly-line only)
 (Last) Termination Code  (55555555h) (poly-line only)
```

#### Note
Lines are displayed up to \<including\> their lower-right coordinates (ie.
unlike for polygons, the lower-right coordinate is not excluded).<br/>
If dithering is enabled (via Texpage command), then both monochrome and shaded
lines are drawn with dithering (this differs from monochrome polygons and
monochrome rectangles).<br/>

#### Termination Codes for Poly-Lines (aka Linestrips)
The termination code should usually be 55555555h, however, Wild Arms 2 uses
50005000h (unknown which exact bits/values are relevant there).<br/>

#### Wire-Frame
Poly-Lines can be used (among others) to create Wire-Frame polygons (by setting
the last Vertex equal to Vertex 1).<br/>



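The wire-frame idea can be sketched as follows: the point list is closed by repeating the first vertex, and the packet ends with the usual termination code. The function name `wireframe_polyline` is hypothetical; only GP0(48h) and the 55555555h terminator are taken from the tables above.

```python
TERMINATION_CODE = 0x55555555


def wireframe_polyline(points, r, g, b):
    """Build a GP0(48h) monochrome opaque poly-line that outlines a polygon."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    closed = list(points) + [points[0]]       # last vertex equals Vertex 1
    words = [(0x48 << 24) | (b << 16) | (g << 8) | r]
    words += [((y & 0xFFFF) << 16) | (x & 0xFFFF) for x, y in closed]
    words.append(TERMINATION_CODE)
    return words


if __name__ == "__main__":
    for word in wireframe_polyline([(0, 0), (8, 0), (0, 8)], 0, 255, 0):
        print(f"{word:08X}h")
```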
## GPU Render Rectangle Commands
Rectangles are drawn much faster than polygons. Unlike for polygons, Gouraud
shading is not possible, dithering isn't applied, the rectangle always
has horizontal and vertical edges, textures cannot be rotated or scaled, and,
of course, the GPU does render Rectangles at once (without splitting them into
triangles).<br/>

#### GP0(60h) - Monochrome Rectangle (variable size) (opaque)
#### GP0(62h) - Monochrome Rectangle (variable size) (semi-transparent)
#### GP0(68h) - Monochrome Rectangle (1x1) (Dot) (opaque)
#### GP0(6Ah) - Monochrome Rectangle (1x1) (Dot) (semi-transparent)
#### GP0(70h) - Monochrome Rectangle (8x8) (opaque)
#### GP0(72h) - Monochrome Rectangle (8x8) (semi-transparent)
#### GP0(78h) - Monochrome Rectangle (16x16) (opaque)
#### GP0(7Ah) - Monochrome Rectangle (16x16) (semi-transparent)
```
  1st  Color+Command     (CcBbGgRrh)
  2nd  Vertex            (YyyyXxxxh)
 (3rd) Width+Height      (YsizXsizh) (variable size only) (max 1023x511)
```

#### GP0(64h) - Textured Rectangle, variable size, opaque, texture-blending
#### GP0(65h) - Textured Rectangle, variable size, opaque, raw-texture
#### GP0(66h) - Textured Rectangle, variable size, semi-transp, texture-blending
#### GP0(67h) - Textured Rectangle, variable size, semi-transp, raw-texture
#### GP0(6Ch) - Textured Rectangle, 1x1 (nonsense), opaque, texture-blending
#### GP0(6Dh) - Textured Rectangle, 1x1 (nonsense), opaque, raw-texture
#### GP0(6Eh) - Textured Rectangle, 1x1 (nonsense), semi-transp, texture-blending
#### GP0(6Fh) - Textured Rectangle, 1x1 (nonsense), semi-transp, raw-texture
#### GP0(74h) - Textured Rectangle, 8x8, opaque, texture-blending
#### GP0(75h) - Textured Rectangle, 8x8, opaque, raw-texture
#### GP0(76h) - Textured Rectangle, 8x8, semi-transparent, texture-blending
#### GP0(77h) - Textured Rectangle, 8x8, semi-transparent, raw-texture
#### GP0(7Ch) - Textured Rectangle, 16x16, opaque, texture-blending
#### GP0(7Dh) - Textured Rectangle, 16x16, opaque, raw-texture
#### GP0(7Eh) - Textured Rectangle, 16x16, semi-transparent, texture-blending
#### GP0(7Fh) - Textured Rectangle, 16x16, semi-transparent, raw-texture
```
  1st  Color+Command     (CcBbGgRrh) (color is ignored for raw-textures)
  2nd  Vertex            (YyyyXxxxh) (upper-left corner of the rectangle)
  3rd  Texcoord+Palette  (ClutYyXxh) (for 4bpp Textures Xxh must be even!)
 (4th) Width+Height      (YsizXsizh) (variable size only) (max 1023x511)
```
Unlike for Textured-Polygons, the "Texpage" must be set up separately for
Rectangles, via GP0(E1h). Width and Height can be up to 1023x511, however, the
maximum size of the texture window is 256x256 (so the source data will be
repeated when trying to use sizes larger than 256x256).<br/>

#### Texture Origin and X/Y-Flip
Vertex & Texcoord specify the upper-left corner of the rectangle. Normally,
screen coords and texture coords are both incremented while
rendering the rectangle pixels.<br/>
Optionally, X/Y-Flip bits can be set in Texpage.Bit12/13, these bits cause the
texture coordinates to be decremented (instead of incremented). The X/Y-Flip
bits affect only Rectangles (not Polygons, nor VRAM Transfers).<br/>
Caution: Reportedly, the X/Y-Flip feature isn't supported on old PSX consoles
(unknown which ones exactly, maybe those with PU-7 mainboards, and unknown how
to detect flipping support; except of course by reading VRAM).<br/>

#### Note
There are also two VRAM Transfer commands which work similarly to GP0(60h) and
GP0(65h). Possibly, those commands are even faster... although not sure
if they do use the Texture Cache?<br/>
The difference is that VRAM Transfers do not clip to the Drawing Area boundary,
do not support fully-transparent nor semi-transparent texture pixels, and do
not convert color depths (eg. without 4bit texture to 16bit framebuffer
conversion).<br/>



## GPU Rendering Attributes
#### Vertex (Parameter for Polygon, Line, Rectangle commands)
```
  0-10   X-coordinate (signed, -1024..+1023)
  11-15  Not used (usually sign-extension, but ignored by hardware)
  16-26  Y-coordinate (signed, -1024..+1023)
  27-31  Not used (usually sign-extension, but ignored by hardware)
```
Size Restriction: The maximum distance between two vertices is 1023
horizontally, and 511 vertically. Polygons and lines that exceed those
dimensions are NOT rendered. For example, a line from Y1=-300 to Y2=+300 is NOT
rendered, a line from Y1=-100 to Y2=+400 is rendered (as far as it is within
the drawing area).<br/>
If portions of the polygon/line/rectangle are located outside of the drawing
area, then the hardware renders only the portion that is inside of the drawing
area. Not sure if the hardware is skipping all clipped pixels at once (within a
single clock cycle), or if it's (slowly) processing them pixel by pixel?<br/>

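The Vertex layout and the size restriction can be illustrated with a short sketch. The function names are made up for this example; the 11bit sign extension and the 1023/511 limits follow the description above.

```python
def sign_extend_11(value):
    """Interpret the low 11 bits of value as a signed number (-1024..+1023)."""
    value &= 0x7FF
    return value - 0x800 if value & 0x400 else value


def decode_vertex(word):
    """Return (x, y) from a YyyyXxxxh vertex word; bits 11-15 and 27-31 are ignored."""
    return sign_extend_11(word), sign_extend_11(word >> 16)


def within_size_limit(vertices):
    """True if no two vertices are more than 1023 apart in X or 511 apart in Y."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return max(xs) - min(xs) <= 1023 and max(ys) - min(ys) <= 511


if __name__ == "__main__":
    print(decode_vertex(0x0002FFFF))                  # x=-1, y=2
    print(within_size_limit([(0, -300), (0, 300)]))   # the NOT rendered example
    print(within_size_limit([(0, -100), (0, 400)]))   # the rendered example
```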
#### Color Attribute (Parameter for all Rendering commands, except Raw Texture)
```
  0-7    Red   (0..FFh)
  8-15   Green (0..FFh)
  16-23  Blue  (0..FFh)
  24-31  Command (in first parameter) (don't care in further parameters)
```
Caution: For untextured graphics, 8bit RGB values of FFh are brightest.
However, for texture blending, 8bit values of 80h are brightest (values
81h..FFh are "brighter than bright", making textures up to about twice as
bright as they were originally stored in memory; of course the results
can't exceed the maximum brightness, ie. the 5bit values written to the
framebuffer are saturated to max 1Fh).<br/>

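The brightness behaviour of texture blending can be modelled per color channel as shown below. Note that the exact formula (multiply, then shift right by 7) is an assumption chosen to match the description above (80h leaves the texel unchanged, FFh roughly doubles it); the rounding used by the real hardware is not stated here.

```python
def blend_channel(texel5, color8):
    """Blend one 5bit texel channel with an 8bit color value (assumed formula)."""
    # 80h acts as factor 1.0, so divide the product by 128
    result = (texel5 * color8) >> 7
    # Results are saturated to the 5bit maximum written to the framebuffer
    return min(result, 0x1F)


if __name__ == "__main__":
    for color in (0x40, 0x80, 0xFF):
        print(f"texel 08h, color {color:02X}h -> {blend_channel(0x08, color):02X}h")
```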
#### Texpage Attribute (Parameter for Textured-Polygons commands)
```
  0-8    Same as GP0(E1h).Bit0-8 (see there)
  9-10   Unused (does NOT change GP0(E1h).Bit9-10)
  11     Same as GP0(E1h).Bit11  (see there)
  12-13  Unused (does NOT change GP0(E1h).Bit12-13)
  14-15  Unused (should be 0)
```
This attribute is used in all Textured-Polygons commands.<br/>

#### Clut Attribute (Color Lookup Table, aka Palette)
This attribute is used in all Textured Polygon/Rectangle commands. Of course,
it's relevant only for 4bit/8bit textures (don't care for 15bit textures).<br/>
```
  0-5    X coordinate X/16  (ie. in 16-halfword steps)
  6-14   Y coordinate 0-511 (ie. in 1-line steps)
  15     Unknown/unused (should be 0)
```
Specifies the location of the CLUT data within VRAM.<br/>

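A small sketch of how the Clut attribute could be computed from a VRAM position, and combined with a texture coordinate into the `ClutYyXxh` parameter word. Both function names are hypothetical helpers for this example.

```python
def clut_attribute(x, y):
    """Encode a CLUT position: X in 16-halfword steps (bits 0-5), Y in lines (bits 6-14)."""
    if x % 16 != 0 or not 0 <= x <= 1008:
        raise ValueError("x must be a multiple of 16 in the range 0..1008")
    if not 0 <= y <= 511:
        raise ValueError("y must be 0..511")
    return (y << 6) | (x // 16)


def texcoord_with_clut(u, v, clut):
    """Build the ClutYyXxh word used by textured polygon/rectangle commands."""
    return (clut << 16) | ((v & 0xFF) << 8) | (u & 0xFF)


if __name__ == "__main__":
    clut = clut_attribute(0, 480)
    print(f"{clut:04X}h", f"{texcoord_with_clut(0x12, 0x34, clut):08X}h")
```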
#### GP0(E1h) - Draw Mode setting (aka "Texpage")
```
  0-3   Texture page X Base   (N*64) (ie. in 64-halfword steps)    ;GPUSTAT.0-3
  4     Texture page Y Base   (N*256) (ie. 0 or 256)               ;GPUSTAT.4
  5-6   Semi Transparency     (0=B/2+F/2, 1=B+F, 2=B-F, 3=B+F/4)   ;GPUSTAT.5-6
  7-8   Texture page colors   (0=4bit, 1=8bit, 2=15bit, 3=Reserved);GPUSTAT.7-8
  9     Dither 24bit to 15bit (0=Off/strip LSBs, 1=Dither Enabled) ;GPUSTAT.9
  10    Drawing to display area (0=Prohibited, 1=Allowed)          ;GPUSTAT.10
  11    Texture Disable (0=Normal, 1=Disable if GP1(09h).Bit0=1)   ;GPUSTAT.15
          (Above might be chipselect for (absent) second VRAM chip?)
  12    Textured Rectangle X-Flip   (BIOS does set this bit on power-up...?)
  13    Textured Rectangle Y-Flip   (BIOS does set it equal to GPUSTAT.13...?)
  14-23 Not used (should be 0)
  24-31 Command  (E1h)
```
The GP0(E1h) command is required only for Lines, Rectangles, and
Untextured-Polygons (for Textured-Polygons, the data is specified in form of
the Texpage attribute; except that Bit9-10 can be changed only via GP0(E1h),
not via the Texpage attribute).<br/>
Texture page colors setting 3 (reserved) is the same as setting 2 (15bit).<br/>
Note: GP0(00h) seems to be often inserted between Texpage and Rectangle
commands, maybe it acts as a NOP, which may be required between those commands,
for timing reasons...?<br/>

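The Draw Mode bit fields can be assembled as in the sketch below. The function name and keyword arguments are invented for illustration; only the bit positions are taken from the table above.

```python
def draw_mode(page_x, page_y, semi_transparency=0, colors=0, dither=False,
              draw_to_display=False, texture_disable=False,
              x_flip=False, y_flip=False):
    """Build a GP0(E1h) word; page_x/page_y are VRAM coordinates of the texture page."""
    word = 0xE1 << 24
    word |= (page_x // 64) & 0xF            # bits 0-3: X base in 64-halfword steps
    word |= ((page_y // 256) & 1) << 4      # bit 4: Y base (0 or 256)
    word |= (semi_transparency & 3) << 5    # bits 5-6: 0=B/2+F/2, 1=B+F, 2=B-F, 3=B+F/4
    word |= (colors & 3) << 7               # bits 7-8: 0=4bit, 1=8bit, 2=15bit
    word |= int(dither) << 9
    word |= int(draw_to_display) << 10
    word |= int(texture_disable) << 11
    word |= int(x_flip) << 12
    word |= int(y_flip) << 13
    return word


if __name__ == "__main__":
    print(f"{draw_mode(640, 256, semi_transparency=1, colors=1, dither=True):08X}h")
```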
#### GP0(E2h) - Texture Window setting
```
  0-4    Texture window Mask X   (in 8 pixel steps)
  5-9    Texture window Mask Y   (in 8 pixel steps)
  10-14  Texture window Offset X (in 8 pixel steps)
  15-19  Texture window Offset Y (in 8 pixel steps)
  20-23  Not used (zero)
  24-31  Command  (E2h)
```
Mask specifies the bits that are to be manipulated, and Offset contains the new
values for these bits, ie. texture X/Y coordinates are adjusted as follows:<br/>
```
  Texcoord = (Texcoord AND (NOT (Mask*8))) OR ((Offset AND Mask)*8)
```
The area within a texture window is repeated throughout the texture page. The
data is not actually stored all over the texture page but the GPU reads the
repeated patterns as if they were there.<br/>

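The Texcoord formula above translates directly into code. The sketch below applies it to a single 8bit texture coordinate and also packs the GP0(E2h) word; the function names are hypothetical.

```python
def apply_texture_window(coord, mask, offset):
    """Texcoord = (Texcoord AND (NOT (Mask*8))) OR ((Offset AND Mask)*8)."""
    return ((coord & ~(mask * 8)) | ((offset & mask) * 8)) & 0xFF


def texture_window_word(mask_x, mask_y, offset_x, offset_y):
    """Build a GP0(E2h) word from four 5bit values (each in 8 pixel steps)."""
    return ((0xE2 << 24) | (mask_x & 0x1F) | ((mask_y & 0x1F) << 5)
            | ((offset_x & 0x1F) << 10) | ((offset_y & 0x1F) << 15))


if __name__ == "__main__":
    # Mask=1Ch replaces coordinate bits 5-7, so a 32 pixel wide area at X=20h repeats
    for u in (0x05, 0x45, 0xFF):
        print(f"{u:02X}h -> {apply_texture_window(u, 0x1C, 0x04):02X}h")
```

With Mask=1Ch and Offset=04h, every horizontal coordinate lands in the range 20h..3Fh, which is the repetition effect described above.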
#### GP0(E3h) - Set Drawing Area top left (X1,Y1)
#### GP0(E4h) - Set Drawing Area bottom right (X2,Y2)
```
  0-9    X-coordinate (0..1023)
  10-18  Y-coordinate (0..511)   ;\on Old 160pin GPU (max 1MB VRAM)
  19-23  Not used (zero)         ;/
  10-19  Y-coordinate (0..1023)  ;\on New 208pin GPU (max 2MB VRAM)
  20-23  Not used (zero)         ;/(retail consoles have only 1MB though)
  24-31  Command  (Exh)
```
Sets the drawing area corners. The Render commands GP0(20h..7Fh)
automatically clip any pixels that are outside of this region.<br/>

#### GP0(E5h) - Set Drawing Offset (X,Y)
```
  0-10   X-offset (-1024..+1023) (usually within X1,X2 of Drawing Area)
  11-21  Y-offset (-1024..+1023) (usually within Y1,Y2 of Drawing Area)
  22-23  Not used (zero)
  24-31  Command  (E5h)
```
If you have configured the GTE to produce vertices with coordinate "0,0" being
located in the center of the drawing area, then the Drawing Offset must be
"X1+(X2-X1)/2, Y1+(Y2-Y1)/2". Or, if coordinate "0,0" shall be the upper-left
of the Drawing Area, then Drawing Offset should be "X1,Y1". Where X1,Y1,X2,Y2
are the values defined with GP0(E3h-E4h).<br/>

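A sketch of the Drawing Area and Drawing Offset words, including the centered offset calculation described above. The helper names are hypothetical, and the Drawing Area helper assumes the old 160pin GPU layout (Y in bits 10-18).

```python
def drawing_area_word(command, x, y):
    """Build GP0(E3h) or GP0(E4h), assuming the old 160pin layout (Y in bits 10-18)."""
    if command not in (0xE3, 0xE4):
        raise ValueError("command must be E3h or E4h")
    return (command << 24) | ((y & 0x1FF) << 10) | (x & 0x3FF)


def drawing_offset_word(x, y):
    """Build GP0(E5h) from signed 11bit offsets."""
    return (0xE5 << 24) | ((y & 0x7FF) << 11) | (x & 0x7FF)


def centered_offset(x1, y1, x2, y2):
    """Offset that places coordinate 0,0 in the center of the drawing area."""
    return x1 + (x2 - x1) // 2, y1 + (y2 - y1) // 2


if __name__ == "__main__":
    x1, y1, x2, y2 = 0, 0, 319, 239
    words = [drawing_area_word(0xE3, x1, y1),
             drawing_area_word(0xE4, x2, y2),
             drawing_offset_word(*centered_offset(x1, y1, x2, y2))]
    print([f"{w:08X}h" for w in words])
```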
#### GP0(E6h) - Mask Bit Setting
```
  0     Set mask while drawing (0=TextureBit15, 1=ForceBit15=1)   ;GPUSTAT.11
  1     Check mask before draw (0=Draw Always, 1=Draw if Bit15=0) ;GPUSTAT.12
  2-23  Not used (zero)
  24-31 Command  (E6h)
```
When bit0 is off, the upper bit of the data written to the framebuffer is equal
to bit15 of the texture color (ie. it is set for colors that are marked as
"semi-transparent") (for untextured polygons, bit15 is set to zero).<br/>
When bit1 is on, any (old) pixels in the framebuffer with bit15=1 are
write-protected, and cannot be overwritten by (new) rendering commands.<br/>
The mask setting affects all rendering commands, as well as CPU-to-VRAM and
VRAM-to-VRAM transfer commands (where it acts on the separate halfwords, ie. as
for 15bit textures). However, Mask does NOT affect the Fill-VRAM command.<br/>

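The effect of the two mask bits on a single framebuffer halfword can be modelled as below. This is a simplified sketch (the function name `write_pixel` is made up), ignoring semi-transparency and everything else that happens before a pixel is written.

```python
def write_pixel(old, new, set_mask, check_mask):
    """Return the halfword left in the framebuffer after an attempted write."""
    # Bit1 of GP0(E6h): old pixels with bit15=1 are write-protected
    if check_mask and (old & 0x8000):
        return old
    # Bit0 of GP0(E6h): force bit15=1 in the written pixel
    if set_mask:
        new |= 0x8000
    return new & 0xFFFF


if __name__ == "__main__":
    print(hex(write_pixel(0x8000, 0x1234, set_mask=False, check_mask=True)))
    print(hex(write_pixel(0x0000, 0x1234, set_mask=True, check_mask=True)))
```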
#### Note
GP0(E3h..E5h) do not take up space in the FIFO, so they are probably executed
immediately (even if there are still other commands in the FIFO). Best use them
only if you are sure that the FIFO is empty (otherwise the new Drawing Area
settings might accidentally affect older Rendering Commands in the FIFO).<br/>



## GPU Memory Transfer Commands
#### GP0(01h) - Clear Cache
```
  1st  Command           (Cc000000h)
```
"Seems to be the same as the GP1 command." Uh, which GP1 command?<br/>
Before using GP0(A0h) or GP0(C0h) one should reportedly send:<br/>
Clear Cache (01000000h)<br/>
"Reset command buffer (write to GP1 or GP0)" Uh? Bullshit.<br/>
However, there \<may\> be some situations in which it is necessary to
flush the texture cache.<br/>

#### GP0(02h) - Fill Rectangle in VRAM
```
  1st  Color+Command     (CcBbGgRrh)  ;24bit RGB value (see note)
  2nd  Top Left Corner   (YyyyXxxxh)  ;Xpos counted in halfwords, steps of 10h
  3rd  Width+Height      (YsizXsizh)  ;Xsiz counted in halfwords, steps of 10h
```
Fills the area in the frame buffer with the value in RGB. Horizontally the
filling is done in 16-pixel (32-bytes) units (see below masking/rounding).<br/>
The "Color" parameter is a 24bit RGB value, however, the actual fill data is
16bit: The hardware automatically converts the 24bit RGB value to 15bit RGB
(with bit15=0).<br/>
Fill is NOT affected by the Mask settings (acts as if Mask.Bit0,1 are both
zero).<br/>

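The 24bit to 15bit conversion can be approximated as shown below. Two assumptions are made here that are not stated in the text above: the conversion simply drops the lower 3 bits of each channel, and the 15bit layout is Red in bits 0-4, Green in bits 5-9, Blue in bits 10-14.

```python
def rgb24_to_rgb15(color24):
    """Convert a BbGgRrh value to 15bit RGB with bit15=0 (assumed truncation)."""
    r = (color24 >> 3) & 0x1F           # upper 5 bits of the red byte
    g = (color24 >> 11) & 0x1F          # upper 5 bits of the green byte
    b = (color24 >> 19) & 0x1F          # upper 5 bits of the blue byte
    return (b << 10) | (g << 5) | r


if __name__ == "__main__":
    for color in (0x0000FF, 0x00FF00, 0xFF0000, 0xFFFFFF):
        print(f"{color:06X}h -> {rgb24_to_rgb15(color):04X}h")
```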
#### GP0(80h) - Copy Rectangle (VRAM to VRAM)
```
  1st  Command           (Cc000000h)
  2nd  Source Coord      (YyyyXxxxh)  ;Xpos counted in halfwords
  3rd  Destination Coord (YyyyXxxxh)  ;Xpos counted in halfwords
  4th  Width+Height      (YsizXsizh)  ;Xsiz counted in halfwords
```
Copies data within the framebuffer. The transfer is affected by the Mask setting.<br/>

#### GP0(A0h) - Copy Rectangle (CPU to VRAM)
```
  1st  Command           (Cc000000h)
  2nd  Destination Coord (YyyyXxxxh)  ;Xpos counted in halfwords
  3rd  Width+Height      (YsizXsizh)  ;Xsiz counted in halfwords
  ...  Data              (...)      <--- usually transferred via DMA
```
Transfers data from CPU to frame buffer. If the number of halfwords to be sent
is odd, an extra halfword should be sent (packets consist of 32bit units). The
transfer is affected by Mask setting.<br/>

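The padding rule for odd halfword counts can be sketched as follows. The helper names are hypothetical, and placing the first pixel in the lower halfword of each word is an assumption about the packing order rather than something stated above.

```python
def transfer_word_count(width, height):
    """Number of 32bit data words for a width*height halfword transfer."""
    return (width * height + 1) // 2      # round up to whole 32bit units


def pack_halfwords(pixels):
    """Pack 16bit pixels into 32bit words, padding an odd count with one extra halfword."""
    pixels = list(pixels)
    if len(pixels) % 2:
        pixels.append(0)                  # the extra (dummy) halfword
    return [pixels[i] | (pixels[i + 1] << 16) for i in range(0, len(pixels), 2)]


if __name__ == "__main__":
    print(transfer_word_count(3, 3))
    print([f"{w:08X}h" for w in pack_halfwords([0x0001, 0x0002, 0x0003])])
```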
#### GP0(C0h) - Copy Rectangle (VRAM to CPU)
```
  1st  Command           (Cc000000h) ;\
  2nd  Source Coord      (YyyyXxxxh) ; write to GP0 port (as usually)
  3rd  Width+Height      (YsizXsizh) ;/
  ...  Data              (...)       ;<--- read from GPUREAD port (or via DMA)
```
Transfers data from frame buffer to CPU. Wait for bit27 of the status register
to be set before reading the image data. When the number of halfwords is odd,
an extra halfword is read at the end (packets consist of 32bit units).<br/>

#### Masking and Rounding for FILL Command parameters
```
  Xpos=(Xpos AND 3F0h)                       ;range 0..3F0h, in steps of 10h
  Ypos=(Ypos AND 1FFh)                       ;range 0..1FFh
  Xsiz=((Xsiz AND 3FFh)+0Fh) AND (NOT 0Fh)   ;range 0..400h, in steps of 10h
  Ysiz=((Ysiz AND 1FFh))                     ;range 0..1FFh
```
Fill does NOT occur when Xsiz=0 or Ysiz=0 (unlike for Copy commands).
Xsiz=400h works only indirectly: Param=400h is handled as Xsiz=0, however,
Param=3F1h..3FFh is rounded-up and handled as Xsiz=400h.<br/>

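The FILL masking and rounding rules above, written out as a small function (the name `mask_fill_params` is made up for this example):

```python
def mask_fill_params(xpos, ypos, xsiz, ysiz):
    """Apply the GP0(02h) masking/rounding and return (xpos, ypos, xsiz, ysiz)."""
    xpos &= 0x3F0                              # 0..3F0h, in steps of 10h
    ypos &= 0x1FF                              # 0..1FFh
    xsiz = ((xsiz & 0x3FF) + 0x0F) & ~0x0F     # 0..400h, rounded up to steps of 10h
    ysiz &= 0x1FF                              # 0..1FFh
    return xpos, ypos, xsiz, ysiz


if __name__ == "__main__":
    # Param=400h is handled as Xsiz=0, while 3F1h is rounded up to 400h
    print(mask_fill_params(0, 0, 0x400, 1))
    print(mask_fill_params(0, 0, 0x3F1, 1))
```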
#### Masking for COPY Commands parameters
```
  Xpos=(Xpos AND 3FFh)                       ;range 0..3FFh
  Ypos=(Ypos AND 1FFh)                       ;range 0..1FFh
  Xsiz=((Xsiz-1) AND 3FFh)+1                 ;range 1..400h
  Ysiz=((Ysiz-1) AND 1FFh)+1                 ;range 1..200h
```
Parameters are just clipped to 10bit/9bit range, the only special case is that
Size=0 is handled as Size=max.<br/>

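The COPY masking rules can be expressed the same way; the sketch below shows in particular how a size of 0 ends up as the maximum size. The function name is hypothetical.

```python
def mask_copy_params(xpos, ypos, xsiz, ysiz):
    """Apply the GP0(80h/A0h/C0h) masking and return (xpos, ypos, xsiz, ysiz)."""
    xpos &= 0x3FF                       # 0..3FFh
    ypos &= 0x1FF                       # 0..1FFh
    xsiz = ((xsiz - 1) & 0x3FF) + 1     # 1..400h (0 becomes 400h)
    ysiz = ((ysiz - 1) & 0x1FF) + 1     # 1..200h (0 becomes 200h)
    return xpos, ypos, xsiz, ysiz


if __name__ == "__main__":
    print(mask_copy_params(0x400, 0x200, 0, 0))
```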
#### Notes
The coordinates for the above VRAM transfer commands are absolute framebuffer
addresses (not relative to Draw Offset, and not clipped to Draw Area).<br/>
Non-DMA transfers seem to be working at any time, but GPU-DMA Transfers seem to
be working ONLY during V-Blank (outside of V-Blank, portions of the data appear
to be skipped, and the following words arrive at wrong addresses), unknown if
it's possible to change that by whatever configuration settings...? That
problem appears ONLY for continuous DMA aka VRAM transfers (linked-list DMA aka
Ordering Table works even outside V-Blank).<br/>

#### Wrapping
If the Source/Dest starting points plus the width/height value exceed the
1024x512 pixel VRAM size, then the Copy/Fill operations wrap to the opposite
memory edge (without any carry-out from X to Y, nor from Y to X).<br/>



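The wrapping behaviour can be illustrated by enumerating the halfword coordinates that a transfer touches. This is only a model of the address arithmetic (the generator name is made up); it says nothing about the order in which the hardware actually processes pixels.

```python
def wrapped_coordinates(x, y, width, height):
    """Yield the (x, y) VRAM coordinates covered by a transfer, with wrapping."""
    for row in range(height):
        for col in range(width):
            # X and Y wrap independently: no carry-out from X to Y or from Y to X
            yield (x + col) & 0x3FF, (y + row) & 0x1FF


if __name__ == "__main__":
    # A 2x2 block starting in the lower-right corner of VRAM
    print(list(wrapped_coordinates(1023, 511, 2, 2)))
```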
## GPU Other Commands
#### GP0(1Fh) - Interrupt Request (IRQ1)
```
  1st  Command           (Cc000000h)                    ;GPUSTAT.24
```
Requests IRQ1. Can be acknowledged via GP1(02h). This feature is rarely used.<br/>
Note: The command is used by Blaze'n'Blade, but the game doesn't have IRQ1
enabled, and the written value (1F801810h) looks more like an I/O address,
rather than like a command, so not sure if it's done intentionally, or if it is
just a bug.<br/>

#### GP0(03h) - Unknown?
Unknown. Doesn't seem to be used by any games. Unlike the "NOP" commands,
GP0(03h) does take up space in FIFO, so it is apparently not a NOP.<br/>

#### GP0(00h) - NOP (?)
This command doesn't take up space in the FIFO (eg. even if a VRAM-to-VRAM
transfer is still busy, one can send dozens of GP0(00h) commands, without the
command FIFO becoming full). So, either the command is ignored, or, if it has a
function, it is executed immediately (even while the transfer is busy).<br/>
...<br/>
GP0(00h) unknown, used with parameter = 08A16Ch... or rather 08FDBCh ... the
written value seems to be a bios/ram memory address, anded with 00FFFFFFh...
maybe a bios bug?<br/>
GP0(00h) seems to be often inserted between Texpage and Rectangle commands,
maybe it acts as a NOP, which may be required between those commands, for timing
reasons...?<br/>

#### GP0(04h..1Eh,E0h,E7h..EFh) - Mirrors of GP0(00h) - NOP (?)
Like GP0(00h), these commands don't take up space in the FIFO. So, maybe, they
are the same as GP0(00h), however, the Drawing Area/Offset commands GP0(E3h..E5h)
don't take up FIFO space either, so not taking up FIFO space doesn't
necessarily mean that the command has no function.<br/>

#### GP0(81h..9Fh) - Mirror of GP0(80h) - Copy Rectangle (VRAM to VRAM)
#### GP0(A1h..BFh) - Mirror of GP0(A0h) - Copy Rectangle (CPU to VRAM)
#### GP0(C1h..DFh) - Mirror of GP0(C0h) - Copy Rectangle (VRAM to CPU)
Mirrors.<br/>



## GPU Display Control Commands (GP1)
GP1 Display Control Commands are sent by writing the 8bit Command number
(MSBs), and 24bit parameter (LSBs) to Port 1F801814h. Unlike GP0 commands, GP1
commands are passed directly to the GPU (ie. they can be sent even when the
FIFO is full).<br/>

#### GP1(00h) - Reset GPU
```
  0-23  Not used (zero)
```
Resets the GPU to the following values:<br/>
```
  GP1(01h)      ;clear fifo
  GP1(02h)      ;ack irq (0)
  GP1(03h)      ;display off (1)
  GP1(04h)      ;dma off (0)
  GP1(05h)      ;display address (0)
  GP1(06h)      ;display x1,x2 (x1=200h, x2=200h+256*10)
  GP1(07h)      ;display y1,y2 (y1=010h, y2=010h+240)
  GP1(08h)      ;display mode 320x200 NTSC (0)
  GP0(E1h..E6h) ;rendering attributes (0)
```
Accordingly, GPUSTAT becomes 14802000h. The x1,y1 values are too small, ie. the
upper-left corner isn't visible. Note that GP1(09h) is NOT affected by the reset
command.<br/>

#### GP1(01h) - Reset Command Buffer
```
  0-23  Not used (zero)
```
Resets the command buffer.<br/>

#### GP1(02h) - Acknowledge GPU Interrupt (IRQ1)
```
  0-23  Not used (zero)                                        ;GPUSTAT.24
```
Resets the IRQ flag in GPUSTAT.24. The flag can be set via GP0(1Fh).<br/>

#### GP1(03h) - Display Enable
|
|
|
|
```
|
|
|
|
0 Display On/Off (0=On, 1=Off) ;GPUSTAT.23
|
|
|
|
1-23 Not used (zero)
|
|
|
|
```
|
|
|
|
Turns display on/off. "Note that a turned off screen still gives the flicker of
|
|
|
|
NTSC on a PAL screen if NTSC mode is selected."<br/>
|
|
|
|
The "Off" setting displays a black picture (and still sends /SYNC signals to
the television set). (Unknown if it still generates vblank IRQs though?)<br/>
|
|
|
|
|
|
|
|
#### GP1(04h) - DMA Direction / Data Request
|
|
|
|
```
|
|
|
|
0-1 DMA Direction (0=Off, 1=FIFO, 2=CPUtoGP0, 3=GPUREADtoCPU) ;GPUSTAT.29-30
|
|
|
|
2-23 Not used (zero)
|
|
|
|
```
|
|
|
|
Notes: Manually sending/reading data by software (non-DMA) is ALWAYS possible,
|
|
|
|
regardless of the GP1(04h) setting. The GP1(04h) setting does affect the
|
|
|
|
meaning of GPUSTAT.25.<br/>
|
|
|
|
|
|
|
|
#### Display start/end
|
|
|
|
Specifies where the display area is positioned on the screen, and how much data
gets sent to the screen. The screen sizes of the display area are valid only if
the horizontal/vertical start/end values are default. By changing these you can
get bigger/smaller display screens. On most TVs there is some black around the
edge, which can be utilised by setting the start of the screen earlier and the
end later. The size of the pixels is NOT changed with these settings, the GPU
simply sends more data to the screen. Some monitors/TVs have a smaller display
area and the extended size might not be visible on those sets. "(Mine is
capable of about 330 pixels horizontal, and 272 vertical in 320\*240 mode)"<br/>
|
|
|
|
|
|
|
|
#### GP1(05h) - Start of Display area (in VRAM)
|
|
|
|
```
|
|
|
|
0-9 X (0-1023) (halfword address in VRAM) (relative to begin of VRAM)
|
|
|
|
10-18 Y (0-511) (scanline number in VRAM) (relative to begin of VRAM)
|
|
|
|
19-23 Not used (zero)
|
|
|
|
```
|
|
|
|
Upper/left Display source address in VRAM. The size and target position on
screen are set via Display Range registers; target=X1,Y1;
size=((X2-X1)/cycles\_per\_pix), (Y2-Y1).<br/>
|
|
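As a rough illustration of the GP1(05h) bit layout above, the sketch below packs a display start address. The function name is hypothetical, and the range checks simply follow the X (0-1023) and Y (0-511) ranges given in the table.<br/>

```python
def gp1_display_start(x: int, y: int) -> int:
    """Build the GP1(05h) word: X in bits 0-9, Y in bits 10-18."""
    if not 0 <= x <= 1023:
        raise ValueError("X must be 0..1023 (halfword address in VRAM)")
    if not 0 <= y <= 511:
        raise ValueError("Y must be 0..511 (scanline number in VRAM)")
    return (0x05 << 24) | (y << 10) | x


if __name__ == "__main__":
    # Second buffer placed directly below a 240-line first buffer
    print(f"{gp1_display_start(0, 240):08X}h")
```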
|
|
|
|
|
|
#### GP1(06h) - Horizontal Display range (on Screen)
|
|
|
|
```
|
|
|
|
0-11 X1 (260h+0) ;12bit ;\counted in 53.222400MHz units,
|
|
|
|
12-23 X2 (260h+320*8) ;12bit ;/relative to HSYNC
|
|
|
|
```
|
|
|
|
Specifies the horizontal range within which the display area is displayed. For
resolutions other than 320 pixels it may be necessary to fine-adjust the value
to obtain an exact match (eg. X2=X1+pixels\*cycles\_per\_pix).<br/>
The number of displayed pixels per line is "(((X2-X1)/cycles\_per\_pix)+2) AND
NOT 3" (ie. the hardware is rounding the width up/down to a multiple of 4
pixels).<br/>
|
|
|
|
Most games are using a width equal to the horizontal resolution (ie. 256, 320,
|
|
|
|
368, 512, 640 pixels). A few games are using slightly smaller widths (probably
|
|
|
|
due to programming bugs). Pandemonium 2 is using a bigger "overscan" width
|
|
|
|
(ensuring an intact picture without borders even on mis-calibrated TV sets).<br/>
|
|
|
|
The 260h value is the first visible pixel on normal TV Sets, this value is used
|
|
|
|
by MOST NTSC games, and SOME PAL games (see below notes on Mis-Centered PAL
|
|
|
|
games).<br/>
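The GP1(06h) packing and the displayed-width formula can be sketched as below. The function names are hypothetical. The cycles-per-pixel values for 368, 512 and 640 pixels (7, 5, 4) are assumptions taken from the usual dot clock divisors; this section itself only shows 256\*10 and 320\*8.<br/>

```python
# Video clock cycles per pixel for each horizontal resolution.
# 10 and 8 appear in the text above; 7, 5 and 4 are assumed divisors.
CYCLES_PER_PIX = {256: 10, 320: 8, 368: 7, 512: 5, 640: 4}


def gp1_horizontal_range(x1: int, x2: int) -> int:
    """Build the GP1(06h) word: X1 in bits 0-11, X2 in bits 12-23."""
    return (0x06 << 24) | ((x2 & 0xFFF) << 12) | (x1 & 0xFFF)


def displayed_width(x1: int, x2: int, hres: int) -> int:
    """Apply "(((X2-X1)/cycles_per_pix)+2) AND NOT 3" from the text."""
    return (((x2 - x1) // CYCLES_PER_PIX[hres]) + 2) & ~3


if __name__ == "__main__":
    x1 = 0x260
    x2 = x1 + 320 * 8
    print(f"{gp1_horizontal_range(x1, x2):08X}h", displayed_width(x1, x2, 320))
```

A range that is a few cycles too short still rounds to the same multiple of 4, which may be why slightly wrong widths in some games go unnoticed.<br/>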
|
|
|
|
|
|
|
|
#### GP1(07h) - Vertical Display range (on Screen)
|
|
|
|
```
|
|
|
|
  0-9   Y1 (NTSC=88h-(224/2), PAL=A3h-(264/2))   ;\scanline numbers on screen,
  10-19 Y2 (NTSC=88h+(224/2), PAL=A3h+(264/2))   ;/relative to VSYNC
|
|
|
|
20-23 Not used (zero)
|
|
|
|
```
|
|
|
|
Specifies the vertical range within which the display area is displayed. The
number of lines is Y2-Y1 (unlike the width, there's no rounding applied
to the height). If Y2 is set to a much too large value, then the hardware stops
generating vblank interrupts (IRQ0).<br/>
|
|
|
|
The 88h/A3h values are the middle-scanlines on normal TV Sets, these values are
|
|
|
|
used by MOST NTSC games, and SOME PAL games (see below notes on Mis-Centered
|
|
|
|
PAL games).<br/>
|
|
|
|
The 224/264 values are for fullscreen pictures. Many NTSC games display 240
|
|
|
|
lines (overscan with hidden lines). Many PAL games display only 256 lines
|
|
|
|
(underscan with black borders).<br/>
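A centered vertical range can be derived from the middle scanline and the wanted number of lines, as in the sketch below. The helper name and the "center plus/minus half the height" interface are assumptions for illustration; only the bit positions and the 88h/A3h centers come from the text.<br/>

```python
NTSC_CENTER = 0x88   # middle scanline on normal NTSC sets (from the text)
PAL_CENTER = 0xA3    # middle scanline on normal PAL sets (from the text)


def gp1_vertical_range(center: int, lines: int) -> int:
    """Build the GP1(07h) word: Y1 in bits 0-9, Y2 in bits 10-19."""
    y1 = center - lines // 2
    y2 = center + lines // 2
    return (0x07 << 24) | ((y2 & 0x3FF) << 10) | (y1 & 0x3FF)


if __name__ == "__main__":
    print(f"{gp1_vertical_range(NTSC_CENTER, 224):08X}h")  # NTSC fullscreen
    print(f"{gp1_vertical_range(PAL_CENTER, 264):08X}h")   # PAL fullscreen
```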
|
|
|
|
|
|
|
|
#### GP1(08h) - Display mode
|
|
|
|
```
|
|
|
|
0-1 Horizontal Resolution 1 (0=256, 1=320, 2=512, 3=640) ;GPUSTAT.17-18
|
|
|
|
2 Vertical Resolution (0=240, 1=480, when Bit5=1) ;GPUSTAT.19
|
|
|
|
3 Video Mode (0=NTSC/60Hz, 1=PAL/50Hz) ;GPUSTAT.20
|
|
|
|
4 Display Area Color Depth (0=15bit, 1=24bit) ;GPUSTAT.21
|
|
|
|
5 Vertical Interlace (0=Off, 1=On) ;GPUSTAT.22
|
|
|
|
6 Horizontal Resolution 2 (0=256/320/512/640, 1=368) ;GPUSTAT.16
|
|
|
|
7 "Reverseflag" (0=Normal, 1=Distorted) ;GPUSTAT.14
|
|
|
|
8-23 Not used (zero)
|
|
|
|
```
|
|
|
|
Note: Interlace must be enabled to see all lines in 480-lines mode (interlace
|
|
|
|
is causing ugly flickering, so a non-interlaced low resolution image is
|
|
|
|
typically having better quality than a high resolution interlaced image, a
|
|
|
|
pretty bad example are the intro screens shown by the BIOS). The Display Area
|
|
|
|
Color Depth does NOT affect the Drawing Area (the Drawing Area is
|
|
|
|
\<always\> 15bit).<br/>
|
|
|
|
When the "Reverseflag" is set, the display scrolls down 2 lines or so, and
|
|
|
|
colored regions are getting somehow hatched/distorted, but black and white
|
|
|
|
regions are still looking okay. Don't know what that's good for? Probably
|
|
|
|
relates to PAL/NTSC-Color Clock vs PSX-Dot Clock mismatches: Bit7=0 causes
|
|
|
|
Flickering errors (errors at different locations in each frame), and Bit7=1
|
|
|
|
causes Static errors (errors at same locations in all frames)?<br/>
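The GP1(08h) bit assignments can be combined as in the following sketch. The function and parameter names are hypothetical. The sketch sets Bit2 and Bit5 together for 480-line output, following the note that interlace must be enabled to see all lines, and it leaves the "Reverseflag" at zero.<br/>

```python
HRES1 = {256: 0, 320: 1, 512: 2, 640: 3}   # Horizontal Resolution 1, bits 0-1


def gp1_display_mode(hres: int, lines480: bool = False,
                     pal: bool = False, color24: bool = False) -> int:
    """Build a GP1(08h) word from a few high-level choices."""
    if hres == 368:
        bits = 1 << 6                  # Horizontal Resolution 2
    else:
        bits = HRES1[hres]
    if lines480:
        bits |= (1 << 2) | (1 << 5)    # 480 lines needs Vertical Interlace
    if pal:
        bits |= 1 << 3
    if color24:
        bits |= 1 << 4
    return (0x08 << 24) | bits


if __name__ == "__main__":
    print(f"{gp1_display_mode(320):08X}h")
```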
|
|
|
|
|
|
|
|
#### GP1(10h) - Get GPU Info
|
|
|
|
#### GP1(11h..1Fh) - Mirrors of GP1(10h), Get GPU Info
|
|
|
|
After sending the command, the result can be read (immediately) from GPUREAD
|
|
|
|
register (there's no NOP or other delay required) (namely GPUSTAT.Bit27 is used
|
|
|
|
only for VRAM-Reads, but NOT for GPU-Info-Reads, so do not try to wait for that
|
|
|
|
flag).<br/>
|
|
|
|
```
|
|
|
|
0-23 Select Information which is to be retrieved (via following GPUREAD)
|
|
|
|
```
|
|
|
|
On Old 160pin GPUs, the following values can be selected:<br/>
|
|
|
|
```
|
|
|
|
00h-01h = Returns Nothing (old value in GPUREAD remains unchanged)
|
|
|
|
02h = Read Texture Window setting ;GP0(E2h) ;20bit/MSBs=Nothing
|
|
|
|
03h = Read Draw area top left ;GP0(E3h) ;19bit/MSBs=Nothing
|
|
|
|
04h = Read Draw area bottom right ;GP0(E4h) ;19bit/MSBs=Nothing
|
|
|
|
05h = Read Draw offset ;GP0(E5h) ;22bit
|
|
|
|
06h-07h = Returns Nothing (old value in GPUREAD remains unchanged)
|
|
|
|
08h-FFFFFFh = Mirrors of 00h..07h
|
|
|
|
```
|
|
|
|
On New 208pin GPUs, following values can be selected:<br/>
|
|
|
|
```
|
|
|
|
00h-01h = Returns Nothing (old value in GPUREAD remains unchanged)
|
|
|
|
02h = Read Texture Window setting ;GP0(E2h) ;20bit/MSBs=Nothing
|
|
|
|
03h = Read Draw area top left ;GP0(E3h) ;20bit/MSBs=Nothing
|
|
|
|
04h = Read Draw area bottom right ;GP0(E4h) ;20bit/MSBs=Nothing
|
|
|
|
05h = Read Draw offset ;GP0(E5h) ;22bit
|
|
|
|
06h = Returns Nothing (old value in GPUREAD remains unchanged)
|
|
|
|
07h = Read GPU Type (usually 2) ;see "GPU Versions" chapter
|
|
|
|
08h = Unknown (Returns 00000000h) (lightgun on some GPUs?)
|
|
|
|
09h-0Fh = Returns Nothing (old value in GPUREAD remains unchanged)
|
|
|
|
10h-FFFFFFh = Mirrors of 00h..0Fh
|
|
|
|
```
|
|
|
|
The selected data is latched in GPUREAD, the same/latched value can be read
|
|
|
|
multiple times, but, the latch isn't automatically updated when changing GP0
|
|
|
|
registers.<br/>
|
|
|
|
|
|
|
|
#### GP1(09h) - New Texture Disable
|
|
|
|
```
|
|
|
|
0 Texture Disable (0=Normal, 1=Allow Disable via GP0(E1h).11) ;GPUSTAT.15
|
|
|
|
1-23 Unknown (seems to have no effect)
|
|
|
|
```
|
|
|
|
This feature seems to be intended for debugging purposes (most released games
|
|
|
|
do contain program code for disabling textures, but never execute it).<br/>
|
|
|
|
GP1(09h) seems to be supported only on New GPUs. Old GPUs don't support it at all,
|
|
|
|
and there seem to be some Special/Prototype GPUs that use GP1(20h) instead of
|
|
|
|
GP1(09h).<br/>
|
|
|
|
|
|
|
|
#### GP1(20h) - Special/Prototype Texture Disable
|
|
|
|
```
|
|
|
|
0-23 Unknown (501h=Texture Enable, 504h=Texture Disable, or so?)
|
|
|
|
```
|
|
|
|
Seems to be used only on some arcade/prototype GPUs. New GPUs are using
|
|
|
|
GP1(09h) instead of GP1(20h).<br/>
|
|
|
|
|
|
|
|
#### GP1(0Bh) - Unknown/Internal?
|
|
|
|
```
|
|
|
|
0-10 Unknown (GPU crashes after a while when set to 274h..7FFh)
|
|
|
|
11-23 Unknown (seems to have no effect)
|
|
|
|
```
|
|
|
|
The register doesn't seem to be used by any games.<br/>
|
|
|
|
|
|
|
|
#### GP1(0Ah,0Ch..0Fh,21h..3Fh) - N/A
|
|
|
|
Not used?<br/>
|
|
|
|
|
|
|
|
#### GP1(40h..FFh) - N/A (Mirrors)
|
|
|
|
Mirrors of GP1(00h..3Fh).<br/>
|
|
|
|
|
|
|
|
#### Mis-Centered PAL Games (wrong GP1(06h)/GP1(07h) settings)
|
|
|
|
NTSC games are typically well centered (using X1=260h, and Y1/Y2=88h+/-N).<br/>
|
|
|
|
PAL games should be centered as (X1=260h, and Y1/Y2=A3h+/-N) - these values
|
|
|
|
would be looking well on a Philips Philetta TV Set, and do also match up with
|
|
|
|
other common picture positions (eg. as used by Nintendo's SNES console).<br/>
|
|
|
|
However, most PAL games are using completely different "random" centering
|
|
|
|
values (maybe caused by different developers trying to match the centering to
|
|
|
|
the different TV Sets) (although it looks more as if the PAL developers just
|
|
|
|
went amok: Many PAL games are even using different centerings for their Intro,
|
|
|
|
Movie, and actual Game sequences).<br/>
|
|
|
|
As a result, most PAL games are looking like crap when playing them on a real
|
|
|
|
PSX. For PSX emulators it may be recommended to ignore the GP1(06h)/GP1(07h)
|
|
|
|
centering, and instead, apply auto-centering to PAL games.<br/>
|
|
|
|
For PAL game developers, it may be recommended to add a screen centering option
|
|
|
|
(as found in Tomb Raider 3, for example). Unknown if this is really required...
|
|
|
|
or if X1=260h, and Y1/Y2=A3h+/-N would work fine on most or all PAL TV Sets?<br/>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## GPU Status Register
|
|
|
|
#### 1F801814h - GPUSTAT - GPU Status Register (R)
|
|
|
|
```
|
|
|
|
0-3 Texture page X Base (N*64) ;GP0(E1h).0-3
|
|
|
|
4 Texture page Y Base (N*256) (ie. 0 or 256) ;GP0(E1h).4
|
|
|
|
5-6 Semi Transparency (0=B/2+F/2, 1=B+F, 2=B-F, 3=B+F/4) ;GP0(E1h).5-6
|
|
|
|
  7-8   Texture page colors   (0=4bit, 1=8bit, 2=15bit, 3=Reserved);GP0(E1h).7-8
|
|
|
|
9 Dither 24bit to 15bit (0=Off/strip LSBs, 1=Dither Enabled);GP0(E1h).9
|
|
|
|
10 Drawing to display area (0=Prohibited, 1=Allowed) ;GP0(E1h).10
|
|
|
|
11 Set Mask-bit when drawing pixels (0=No, 1=Yes/Mask) ;GP0(E6h).0
|
|
|
|
12 Draw Pixels (0=Always, 1=Not to Masked areas) ;GP0(E6h).1
|
|
|
|
13 Interlace Field (or, always 1 when GP1(08h).5=0)
|
|
|
|
14 "Reverseflag" (0=Normal, 1=Distorted) ;GP1(08h).7
|
|
|
|
15 Texture Disable (0=Normal, 1=Disable Textures) ;GP0(E1h).11
|
|
|
|
16 Horizontal Resolution 2 (0=256/320/512/640, 1=368) ;GP1(08h).6
|
|
|
|
17-18 Horizontal Resolution 1 (0=256, 1=320, 2=512, 3=640) ;GP1(08h).0-1
|
|
|
|
19 Vertical Resolution (0=240, 1=480, when Bit22=1) ;GP1(08h).2
|
|
|
|
20 Video Mode (0=NTSC/60Hz, 1=PAL/50Hz) ;GP1(08h).3
|
|
|
|
21 Display Area Color Depth (0=15bit, 1=24bit) ;GP1(08h).4
|
|
|
|
22 Vertical Interlace (0=Off, 1=On) ;GP1(08h).5
|
|
|
|
23 Display Enable (0=Enabled, 1=Disabled) ;GP1(03h).0
|
|
|
|
24 Interrupt Request (IRQ1) (0=Off, 1=IRQ) ;GP0(1Fh)/GP1(02h)
|
|
|
|
25 DMA / Data Request, meaning depends on GP1(04h) DMA Direction:
|
|
|
|
When GP1(04h)=0 ---> Always zero (0)
|
|
|
|
When GP1(04h)=1 ---> FIFO State (0=Full, 1=Not Full)
|
|
|
|
When GP1(04h)=2 ---> Same as GPUSTAT.28
|
|
|
|
When GP1(04h)=3 ---> Same as GPUSTAT.27
|
|
|
|
26 Ready to receive Cmd Word (0=No, 1=Ready) ;GP0(...) ;via GP0
|
|
|
|
27 Ready to send VRAM to CPU (0=No, 1=Ready) ;GP0(C0h) ;via GPUREAD
|
|
|
|
28 Ready to receive DMA Block (0=No, 1=Ready) ;GP0(...) ;via GP0
|
|
|
|
  29-30 DMA Direction (0=Off, 1=FIFO, 2=CPUtoGP0, 3=GPUREADtoCPU) ;GP1(04h).0-1
|
|
|
|
31 Drawing even/odd lines in interlace mode (0=Even or Vblank, 1=Odd)
|
|
|
|
```
|
|
|
|
In 480-lines mode, bit31 changes per frame. And in 240-lines mode, the bit
|
|
|
|
changes per scanline. The bit is always zero during Vblank (vertical retrace
|
|
|
|
and upper/lower screen border).<br/>
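To make the bit layout easier to check against real values, the sketch below splits a GPUSTAT word into a handful of named fields. The function name and the dictionary keys are assumptions for illustration, and only part of the register is decoded. The example value 14802000h is the post-reset GPUSTAT mentioned under GP1(00h).<br/>

```python
def decode_gpustat(stat: int) -> dict:
    """Split a 32bit GPUSTAT value into a few named fields."""
    def bit(n: int) -> bool:
        return bool((stat >> n) & 1)

    hres = 368 if bit(16) else (256, 320, 512, 640)[(stat >> 17) & 3]
    return {
        "hres": hres,
        "vres": 480 if bit(19) and bit(22) else 240,  # 480 only when Bit22=1
        "pal": bit(20),
        "color24": bit(21),
        "interlace": bit(22),
        "display_disabled": bit(23),
        "irq": bit(24),
        "ready_cmd": bit(26),
        "ready_vram_to_cpu": bit(27),
        "ready_dma_block": bit(28),
        "dma_direction": (stat >> 29) & 3,
    }


if __name__ == "__main__":
    for name, value in decode_gpustat(0x14802000).items():
        print(f"{name:18} {value}")
```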
|
|
|
|
|
|
|
|
#### Note
|
|
|
|
Further GPU status information can be retrieved via GP1(10h) and GP0(C0h).<br/>
|
|
|
|
|
|
|
|
#### Ready Bits
|
|
|
|
Bit28: Normally, this bit gets cleared when the command execution is busy (ie.
|
|
|
|
once when the command and all of its parameters are received), however, for
|
|
|
|
Polygon and Line Rendering commands, the bit gets cleared immediately after
|
|
|
|
receiving the command word (ie. before receiving the vertex parameters). The
|
|
|
|
bit is used as DMA request in DMA Mode 2, accordingly, the DMA would probably
|
|
|
|
hang if the Polygon/Line parameters are transferred in a separate DMA block
|
|
|
|
(ie. the DMA probably starts ONLY on command words).<br/>
|
|
|
|
Bit27: Gets set after sending GP0(C0h) and its parameters, and stays set until
|
|
|
|
all data words are received; used as DMA request in DMA Mode 3.<br/>
|
|
|
|
Bit26: Gets set when the GPU wants to receive a command. If the bit is cleared,
|
|
|
|
then the GPU does either want to receive data, or it is busy with a command
|
|
|
|
execution (and doesn't want to receive anything).<br/>
|
|
|
|
Bit25: This is the DMA Request bit, however, the bit is also useful for non-DMA
|
|
|
|
transfers, especially in the FIFO State mode.<br/>
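The way Bit25 follows the DMA Direction can be modelled as below. This is a sketch for emulator-style code, not a description of the actual hardware logic; the function name and the `fifo_not_full` argument are hypothetical (the FIFO fill state isn't visible in the other GPUSTAT bits, so it's passed in separately).<br/>

```python
def dma_request_bit(stat: int, fifo_not_full: bool) -> int:
    """Compute GPUSTAT.25 from the DMA Direction in GPUSTAT.29-30."""
    direction = (stat >> 29) & 3
    if direction == 0:
        return 0                      # Off: always zero
    if direction == 1:
        return int(fifo_not_full)     # FIFO State (0=Full, 1=Not Full)
    if direction == 2:
        return (stat >> 28) & 1       # same as GPUSTAT.28
    return (stat >> 27) & 1           # same as GPUSTAT.27


if __name__ == "__main__":
    print(dma_request_bit(0x50000000, fifo_not_full=True))
```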
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## GPU Versions
|
|
|
|
#### Summary of GPU Differences
|
|
|
|
```
|
|
|
|
Differences... Old 160pin GPU New 208pin GPU
|
|
|
|
GPU Chip CXD8514Q CXD8561Q/BQ/CQ/CXD9500Q
|
|
|
|
Mainboard EARLY-PU-8 and below LATE-PU-8 and up
|
|
|
|
Memory Type Dual-ported VRAM Normal DRAM
|
|
|
|
GPUSTAT.13 when interlace=off always 0 always 1
|
|
|
|
GPUSTAT.14 always 0 reverseflag
|
|
|
|
GPUSTAT.15 always 0 texture_disable
|
|
|
|
GP1(10h:index3..4) 19bit (1MB VRAM) 20bit (2MB VRAM)
|
|
|
|
GP1(10h:index7) N/A 00000002h version
|
|
|
|
GP1(10h:index8) mirror of index0 00000000h zero
|
|
|
|
GP1(10h:index9..F) mirror of index1..7 N/A
|
|
|
|
GP1(20h) whatever? used for detecting old gpu
|
|
|
|
GP0(E1h).bit12/13 without x/y-flip with x/y-flip
|
|
|
|
  GP0(03h)            N/A (not stored in fifo) unknown/unused command
|
|
|
|
Shaded Textures ((color/8)*texel)/2 (color*texel)/16
|
|
|
|
GP0(02h) FillVram xpos.bit0-3=0Fh=bugged xpos.bit0-3=ignored
|
|
|
|
dma-to-vram: doesn't work with blksiz>10h (new gpu works with blksiz=8C0h!)
|
|
|
|
dma-to-vram: MAYBE also needs extra software-handshake to confirm DMA done?
|
|
|
|
320*224 pix = 11800h pix = 8C00h words
|
|
|
|
GP0(80h) VramToVram works Freeze on large moves?
|
|
|
|
```
|
|
|
|
|
|
|
|
#### Shaded Textures
|
|
|
|
The Old GPU crops 8:8:8 bit gouraud shading color to 5:5:5 bit before
|
|
|
|
multiplying it with the texture color, resulting in rather poor graphics. For
|
|
|
|
example, the snow scene in the first level of Tomb Raider I looks a lot
|
|
|
|
smoother on New GPUs.<br/>
|
|
|
|
The cropped colors are looking a bit as if dithering would be disabled
|
|
|
|
(although, technically dithering works fine, but due to the crippled color
|
|
|
|
input, it's always using the same dither pattern per 8 intensities, instead of
|
|
|
|
using 8 different dither patterns).<br/>
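The two formulas from the summary table, ((color/8)\*texel)/2 and (color\*texel)/16, can be compared directly. In this sketch `color8` is one 8bit gouraud component and `texel5` is one 5bit texture component; the function names and the saturation to 255 are assumptions, not something stated in this chapter.<br/>

```python
def shade_old(color8: int, texel5: int) -> int:
    """Old GPU: crop the 8bit color to 5bit before multiplying."""
    return min(((color8 >> 3) * texel5) >> 1, 255)


def shade_new(color8: int, texel5: int) -> int:
    """New GPU: multiply with the full 8bit color."""
    return min((color8 * texel5) >> 4, 255)


if __name__ == "__main__":
    # Eight neighbouring shading intensities with the same texel value
    for color in range(0x80, 0x88):
        print(f"{color:02X}h old={shade_old(color, 10)} new={shade_new(color, 10)}")
```

With these formulas the Old GPU result stays the same across all eight inputs, while the New GPU result changes, which matches the "same dither pattern per 8 intensities" remark above.<br/>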
|
|
|
|
|
|
|
|
#### Memory/Rendering Timings
|
|
|
|
The Old GPU uses two Dual-ported VRAM chips (each with two 16bit databusses,
|
|
|
|
one for CPU/DMA/rendering access, and one for output to the video DAC). The New
|
|
|
|
GPU uses a normal DRAM chip (with single 32bit databus).<br/>
|
|
|
|
The exact timing differences are unknown, but the different memory types should
|
|
|
|
result in quite different timings:<br/>
|
|
|
|
The Old GPU might perform better on non-32bit aligned accesses, and on memory
|
|
|
|
accesses performed simultaneously with DAC output.<br/>
|
|
|
|
On the other hand, the New GPU's DRAM seems to be faster in some cases (for
|
|
|
|
example, during Vblank, it's fast enough to perform DMA's with blksiz\>10h,
|
|
|
|
which exceeds the GPU's FIFO size, and causes lost data on Old GPUs).<br/>
|
|
|
|
|
|
|
|
#### X/Y-Flip and 2MB Video RAM
|
|
|
|
The X/Y-flipping feature may be used by arcade games (provided that the arcade
|
|
|
|
board is fitted with New GPUs). The flipping feature does also work on retail
|
|
|
|
consoles with New GPUs, but PSX games should never use that feature (for
|
|
|
|
maintaining compatibility with older PSX consoles).<br/>
|
|
|
|
2Mbyte Video RAM is used on some arcade boards, whilst PSX retail consoles
always contain only 1MByte RAM, so the feature cannot be used even if the
|
|
|
|
console contains a New GPU. There's one special case: Some PSone consoles are
|
|
|
|
actually fitted with 2MB chips (maybe because smaller chips haven't been in
|
|
|
|
production anymore), but the chips are wired so that only half of the memory is
|
|
|
|
accessible (the extra memory could be theoretically unlocked with some minimal
|
|
|
|
hardware modification).<br/>
|
|
|
|
|
|
|
|
#### GPU Detection (and optional texture disable)
|
|
|
|
Below is a slightly customized GPU Detection function taken from Perfect Assassin
|
|
|
|
(the index7 latching works ONLY on New GPUs, whilst old GPUs would leave the
|
|
|
|
latched value unchanged; as a workaround, the index4 latching is used to ensure
|
|
|
|
that the latch won't contain 000002h on old GPUs, assuming that index4 is never
|
|
|
|
set to 000002h).<br/>
|
|
|
|
```
|
|
|
|
[1F801814h]=10000004h ;GP1(10h).index4 (latch draw area bottom right)
|
|
|
|
[1F801814h]=10000007h ;GP1(10h).index7 (latch GPU version, if any)
|
|
|
|
if ([1F801810h] AND 00FFFFFFh)=00000002h then goto @@gpu_v2
|
|
|
|
[1F801810h]=([1F801814h] AND 3FFFh) OR E1001000h ;change GPUSTAT via GP0(E1h)
|
|
|
|
dummy=[1F801810h] ;dummy read (unknown purpose)
|
|
|
|
if ([1F801814h] AND 00001000h) then goto @@gpu_v1 else goto @@gpu_v0
|
|
|
|
;---
|
|
|
|
@@gpu_v0: ;Old 160pin GPU (EARLY-PU-8)
|
|
|
|
return 0
|
|
|
|
;---
|
|
|
|
@@gpu_v1: ;unknown GPU type, maybe some custom arcade/prototype version ?
|
|
|
|
if want_tex_dis then [1F801814h]=20000504h ;GP1(20h)
|
|
|
|
return 1
|
|
|
|
;---
|
|
|
|
@@gpu_v2: ;New 208pin GPU (LATE-PU-8 and up)
|
|
|
|
if want_tex_dis then [1F801814h]=09000001h ;GP1(09h)
|
|
|
|
return 2
|
|
|
|
```
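The same detection sequence can be written against a pair of bus-access callbacks, which makes it testable on a host machine. This is only a transcription sketch of the pseudocode above: `read32` and `write32` are hypothetical callbacks supplied by the caller (for example an emulator's bus functions), and nothing here has been verified on real hardware.<br/>

```python
GP0 = 0x1F801810   # write: GP0 commands, read: GPUREAD
GP1 = 0x1F801814   # write: GP1 commands, read: GPUSTAT


def detect_gpu(read32, write32, want_tex_dis: bool = False) -> int:
    """Return 0 (Old 160pin GPU), 1 (unknown/prototype) or 2 (New 208pin GPU)."""
    write32(GP1, 0x10000004)          # latch draw area bottom right
    write32(GP1, 0x10000007)          # latch GPU version, if any
    if (read32(GP0) & 0x00FFFFFF) == 0x000002:
        if want_tex_dis:
            write32(GP1, 0x09000001)  # GP1(09h) texture disable
        return 2
    # Try to change GPUSTAT via GP0(E1h) and see whether bit 12 sticks
    write32(GP0, (read32(GP1) & 0x3FFF) | 0xE1001000)
    read32(GP0)                       # dummy read (unknown purpose)
    if read32(GP1) & 0x00001000:
        if want_tex_dis:
            write32(GP1, 0x20000504)  # GP1(20h) texture disable
        return 1
    return 0
```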
|
|
|
|
|
|
|
|
#### GP0(02h) FillVram
|
|
|
|
The FillVram command does normally ignore the lower 4bit of the x-coordinate
|
|
|
|
(and software should always set those bits to zero). However, if the 4bits are
|
|
|
|
all set, then the Old GPU writes every 2nd pixel to a wrong memory address.
For example, a 32x4 pixel fill produces the following results for x=0..1Fh:<br/>
|
|
|
|
```
|
|
|
|
0h 10h 20h 30h 40h
|
|
|
|
| | | | |
|
|
|
|
################################ ;\x=00h..0Eh
|
|
|
|
################################ ; and, x=0Fh
|
|
|
|
################################ ; on NEW GPU
|
|
|
|
################################ ;/
|
|
|
|
# # # # # # # ################## # # # # # # # ;\
|
|
|
|
# # # # # # # ################## # # # # # # # ; x=0Fh
|
|
|
|
# # # # # # # ################## # # # # # # # ; on OLD GPU
|
|
|
|
# # # # # # # ################## # # # # # # # ;/
|
|
|
|
################################ ;\x=10h..1Eh
|
|
|
|
################################ ; and, x=1Fh
|
|
|
|
################################ ; on NEW GPU
|
|
|
|
################################ ;/
|
|
|
|
# # # # # # # ################## # # # # # # # ;\
|
|
|
|
# # # # # # # ################## # # # # # # # ; x=1Fh
|
|
|
|
# # # # # # # ################## # # # # # # # ; on OLD GPU
|
|
|
|
# # # # # # # ################## # # # # # # # ;/
|
|
|
|
```
|
|
|
|
|
|
|
|
#### Arcade GPUs
|
|
|
|
Some arcade boards are using normal retail GPUs, however, there are also two
|
|
|
|
special non-retail 208pin GPUs which seem to be solely used on arcade boards:<br/>
|
|
|
|
```
|
|
|
|
IC21 - 208pin - "SONY CXD8538Q" ;seen on GP-11 (namco System 11) boards
|
|
|
|
IC103 - 208pin - "SONY CXD8654Q" ;seen on GP-15 (namco System 12) boards
|
|
|
|
```
|
|
|
|
The exact differences to retail GPUs are unknown. One of the special GPUs is
|
|
|
|
said to use entirely different command numbers for rendering commands (maybe
|
|
|
|
some old prototype variant, or maybe some protection against cloning arcade
|
|
|
|
boards with retail chips).<br/>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## GPU Depth Ordering
|
|
|
|
#### Absent Depth Buffer
|
|
|
|
The PlayStation's GPU stores only RGB colors in the framebuffer (ie. unlike
|
|
|
|
modern 3D processors, it's NOT buffering Depth values; leaving apart the Mask
|
|
|
|
bit, which could be considered as a tiny 1bit "Depth" or "Priority" value). In
|
|
|
|
fact, the GPU supports only X,Y coordinates, and it's totally unaware of Z
|
|
|
|
coordinates. So, when rendering a polygon, the hardware CANNOT determine which
|
|
|
|
of the new pixels are in front/behind of the old pixels in the buffer.<br/>
|
|
|
|
|
|
|
|
#### Simple Ordering
|
|
|
|
The rendering simply takes place in the order in which the data is sent to the GPU
(ie. the most distant objects should be sent first). For 2D graphics, it's
fairly easy to follow that order (eg. even multi-layer 2D graphics can be using
DMA2-continuous mode).<br/>
|
|
|
|
|
|
|
|
#### Depth Ordering Table (OT)
|
|
|
|
For 3D graphics, the ordering of the polygons may change more or less randomly
|
|
|
|
(eg. when rotating/moving the camera). To solve that problem, the whole
|
|
|
|
rendering data is usually first stored in a Depth Ordering Table (OT) in Main
|
|
|
|
RAM, and, once when all polygons have been stored in the OT, the OT is sent to
|
|
|
|
the GPU via "DMA2-linked-list" mode.<br/>
|
|
|
|
|
|
|
|
#### Initializing an empty OT (via DMA6)
|
|
|
|
DMA channel 6 can be used to set up an empty linked list, in which each entry
|
|
|
|
points to the previous:<br/>
|
|
|
|
```
|
|
|
|
DPCR - enable bits ;Example=x8xxxxxxh
|
|
|
|
D6_MADR - pointer to the LAST table entry ;Example=8012300Ch
|
|
|
|
D6_BCR - number of list entries ;Example=00000004h
|
|
|
|
D6_CHCR - control bits (should be 11000002h) ;Example=11000002h
|
|
|
|
```
|
|
|
|
Each entry has a size of 00h words (upper 8bit), and a pointer to the previous
|
|
|
|
entry (lower 24bit). With the above Example values, the generated table would
|
|
|
|
look like so:<br/>
|
|
|
|
```
|
|
|
|
[80123000h]=00FFFFFFh ;1st entry, points to end code (xxFFFFFFh)
|
|
|
|
[80123004h]=00123000h ;2nd entry, points to 1st entry
|
|
|
|
[80123008h]=00123004h ;3rd entry, points to 2nd entry
|
|
|
|
[8012300Ch]=00123008h ;last entry, points to 3rd entry (table entrypoint)
|
|
|
|
```
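The table that DMA6 generates can be reproduced in software, which is handy for emulators or for checking expectations. The sketch below models memory as a plain dictionary of word addresses; the function name is hypothetical, and it only mimics the result shown above, not the DMA timing.<br/>

```python
END_CODE = 0x00FFFFFF


def build_empty_ot(last_entry_addr: int, count: int) -> dict:
    """Return {address: word} for an empty OT, filled from the LAST entry down."""
    memory = {}
    addr = last_entry_addr
    for i in range(count):
        is_first_entry = (i == count - 1)
        # Upper 8bit = 00h words, lower 24bit = pointer to the previous entry
        memory[addr] = END_CODE if is_first_entry else (addr - 4) & 0xFFFFFF
        addr -= 4
    return memory


if __name__ == "__main__":
    for addr, word in sorted(build_empty_ot(0x8012300C, 4).items()):
        print(f"[{addr:08X}h]={word:08X}h")
```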
|
|
|
|
|
|
|
|
#### Inserting Entries (Passing GTE data to the OT) (by software)
|
|
|
|
The GTE commands AVSZ3 and AVSZ4 can be used to calculate the Average Z
|
|
|
|
coordinates of a polygon (based on its three or four Z coordinates). The result
|
|
|
|
is returned as a 16bit Z value in GTE register OTZ, the commands do also allow
|
|
|
|
to divide the result, to make it less than 16bit (the full 16bit would require
|
|
|
|
an OT of 256KBytes - for the EMPTY table, which would be a waste of memory, and
|
|
|
|
which would slow down the DMA2/DMA6 operations) (on the other hand, a smaller
|
|
|
|
table means less depth resolution).<br/>
|
|
|
|
```
|
|
|
|
[PacketAddr+0] = [80123000h+OTZ*4] + (N SHL 24) <--internal link chain
|
|
|
|
[PacketAddr+4..N*4] = GP0 Command(s) and Parameters <--data (send to GP0)
|
|
|
|
[80123000h+OTZ*4] = PacketAddr AND FFFFFFh <--internal link chain
|
|
|
|
```
|
|
|
|
If there's been already an entry (at the same OTZ index), then the new polygon
|
|
|
|
will be processed first (ie. it will appear "behind" the old entry).<br/>
|
|
|
|
Not sure if the packet size must be limited to max N=16 words (ie. as for the
|
|
|
|
DMA2-continuous block size) (due to GP0 FIFO size limits)?<br/>
|
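The three insertion steps, and the order in which the linked list is later walked, can be sketched as follows. Memory is modelled as a dictionary of word addresses. `OT_BASE` matches the example table address, while the function names and the assumption that all packets live at 80xxxxxxh addresses are made up for illustration.<br/>

```python
OT_BASE = 0x80123000   # address of the 1st OT entry (example value from above)
END_CODE = 0xFFFFFF


def insert_packet(memory: dict, packet_addr: int, words: list, otz: int) -> None:
    """Link a packet of GP0 words into the OT slot selected by OTZ."""
    slot = OT_BASE + otz * 4
    memory[packet_addr] = memory[slot] + (len(words) << 24)   # header word
    for i, word in enumerate(words):
        memory[packet_addr + 4 + 4 * i] = word                # data words
    memory[slot] = packet_addr & 0xFFFFFF                     # relink the slot


def walk(memory: dict, entry_addr: int):
    """Yield GP0 words in the order a linked-list transfer would send them."""
    link = entry_addr & 0xFFFFFF
    while link != END_CODE:
        addr = 0x80000000 | link          # assumes 80xxxxxxh addresses
        header = memory[addr]
        for i in range(header >> 24):
            yield memory[addr + 4 + 4 * i]
        link = header & 0xFFFFFF
```

Walking the list after two insertions at the same OTZ index yields the newer packet first, as described above.<br/>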
|
|
|
|
|
|
|
#### Sending the OT to the GPU (via DMA2-linked-list mode)
|
|
|
|
```
|
|
|
|
  1 - Wait until GPU is ready to receive commands ;GPUSTAT.28
  2 - Enable DMA channel 2                        ;DPCR
  3 - Set GPU to DMA cpu->gpu mode                ;[GP1]=04000002h aka GP1(04h)
  4 - Set D2_MADR to the start of the list        ;(LAST Entry) ;Example=80123010h
  5 - Set D2_BCR to zero                          ;(length unused, end at END-CODE)
  6 - Set D2_CHCR to link mode, mem->GPU and dma enable ;=01000401h
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## GPU Video Memory (VRAM)
|
|
|
|
#### Framebuffer
|
|
|
|
The framebuffer contains the image that is to be output to the Television Set.
|
|
|
|
The GPU supports 10 resolutions, with 16bit or 24bit per pixel.<br/>
|
|
|
|
```
|
|
|
|
Resolution 16bit 24bit | Resolution 16bit 24bit
|
|
|
|
256x240 120Kbytes 180Kbytes | 256x480 240Kbytes 360Kbytes
|
|
|
|
320x240 150Kbytes 225Kbytes | 320x480 300Kbytes 450Kbytes
|
|
|
|
368x240 xx0Kbytes xx0Kbytes | 368x480 xx0Kbytes xx0Kbytes
|
|
|
|
512x240 240Kbytes 360Kbytes | 512x480 480Kbytes 720Kbytes
|
|
|
|
640x240 300Kbytes 450Kbytes | 640x480 600Kbytes 900Kbytes
|
|
|
|
```
|
|
|
|
Note: In most cases, you'll need TWO framebuffers (one being displayed, and
one used as rendering target) (unless you are able to draw the whole new image
|
|
|
|
during vblank, or unless when using single-layer 2D graphics). So, resolutions
|
|
|
|
that occupy more than 512K would exceed the available 1MB VRAM when using 2
|
|
|
|
buffers. Also, high resolutions mean higher rendering load, and less texture
|
|
|
|
memory.<br/>
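The sizes in the table follow from width \* height \* bytes per pixel, so they can be recomputed for any mode, including the 368 pixel modes. The helper names below are hypothetical, and the double-buffer check simply compares two buffers against the 1MB of VRAM mentioned above, ignoring texture space.<br/>

```python
def framebuffer_kbytes(width: int, height: int, bits_per_pixel: int) -> float:
    """Size of one framebuffer in Kbytes (16bit = 2 bytes, 24bit = 3 bytes)."""
    bytes_per_pixel = {16: 2, 24: 3}[bits_per_pixel]
    return width * height * bytes_per_pixel / 1024


def fits_double_buffered(width: int, height: int, bits_per_pixel: int,
                         vram_kbytes: int = 1024) -> bool:
    """True if two such buffers fit into VRAM (leaving aside textures)."""
    return 2 * framebuffer_kbytes(width, height, bits_per_pixel) <= vram_kbytes


if __name__ == "__main__":
    for w in (256, 320, 368, 512, 640):
        for h in (240, 480):
            print(w, h, framebuffer_kbytes(w, h, 16), framebuffer_kbytes(w, h, 24))
```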
|
|
|
|
```
|
|
|
|
<B> 15bit Direct Display (default) (works with polygons, lines, rectangles)</B>
|
|
|
|
0-4 Red (0..31)
|
|
|
|
5-9 Green (0..31)
|
|
|
|
10-14 Blue (0..31)
|
|
|
|
15 Mask flag (0=Normal, 1=Do not allow to overwrite this pixel)
|
|
|
|
<B> 24bit Direct Display (works ONLY with direct vram transfers)</B>
|
|
|
|
0-7 Red (0..255)
|
|
|
|
8-15 Green (0..255)
|
|
|
|
16-23 Blue (0..255)
|
|
|
|
```
|
|
|
|
Note: The 24bit pixels occupy 3 bytes (not 4 bytes with unused MSBs), so each 6
|
|
|
|
bytes contain two 24bit pixels. The 24bit display mode works only with VRAM
|
|
|
|
transfer commands like GP0(A0h); the rendering commands GP0(20h..7Fh) cannot
|
|
|
|
output 24bit data. Ie. 24bit mode is used mostly for MDEC videos (and some 2D
|
|
|
|
games like Heart of Darkness).<br/>
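The two pixel formats can be illustrated with a pair of small helpers. The names are hypothetical. The 24bit helper assumes the bytes appear in memory as R,G,B (little-endian view of the bit numbers listed above), so that 6 bytes give two pixels.<br/>

```python
def pack_15bit(red: int, green: int, blue: int, mask: bool = False) -> int:
    """Pack 5bit components into a 16bit VRAM halfword (bit15 = Mask flag)."""
    for component in (red, green, blue):
        if not 0 <= component <= 31:
            raise ValueError("components must be 0..31")
    return red | (green << 5) | (blue << 10) | (0x8000 if mask else 0)


def unpack_24bit(data: bytes) -> list:
    """Split packed 24bit pixel data (3 bytes per pixel) into (r, g, b) tuples."""
    if len(data) % 3:
        raise ValueError("length must be a multiple of 3 bytes")
    return [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]


if __name__ == "__main__":
    print(f"{pack_15bit(31, 0, 0):04X}h")
    print(unpack_24bit(bytes([1, 2, 3, 4, 5, 6])))
```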
|
|
|
|
|
|
|
|
#### Texture Bitmaps
|
|
|
|
A texture is an image put on a polygon or sprite. The data of a texture can be
|
|
|
|
stored in 3 different modes:<br/>
|
|
|
|
```
|
|
|
|
<B> 16bit Texture (Direct Color) ;(One 256x256 page = 128Kbytes)</B>
|
|
|
|
0-4 Red (0..31) ;\Color 0000h = Fully-Transparent
|
|
|
|
5-9 Green (0..31) ; Color 0001h..7FFFh = Non-Transparent
|
|
|
|
10-14 Blue (0..31) ; Color 8000h..FFFFh = Semi-Transparent (*)
|
|
|
|
15 Semi Transparency Flag ;/(*) or Non-Transparent for opaque commands
|
|
|
|
<B> 8bit Texture (256 Color Palette) ;(One 256x256 page = 64Kbytes)</B>
|
|
|
|
0-7 Palette index for 1st pixel (left)
|
|
|
|
8-15 Palette index for 2nd pixel (right)
|
|
|
|
<B> 4bit Texture (16 Color Palette) ;(One 256x256 page = 32Kbytes)</B>
|
|
|
|
0-3 Palette index for 1st pixel (left)
|
|
|
|
4-7 Palette index for 2nd pixel (middle/left)
|
|
|
|
8-11 Palette index for 3rd pixel (middle/right)
|
|
|
|
12-15 Palette index for 4th pixel (right)
|
|
|
|
```
|
|
|
|
A Texture Page is a 256x256 texel region in VRAM (the Polygon rendering
|
|
|
|
commands are using Texcoords with 8bit X,Y coordinates, so polygons cannot use
|
|
|
|
textures bigger than 256x256) (the Rectangle rendering commands with
|
|
|
|
width/height parameters could theoretically use larger textures, but the
|
|
|
|
hardware clips their texture coordinates to 8bit, too).<br/>
|
|
|
|
The GP0(E2h) Texture Window (aka Texture Repeat) command can be used to reduce
|
|
|
|
the texture size to less than 256x256 texels.<br/>
|
|
|
|
The Texture Pages can be located in the frame buffer on X multiples of 64
|
|
|
|
halfwords and Y multiples of 256 lines.<br/>
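Locating a texel inside VRAM and pulling its value out of the halfword can be sketched as below, based on the bit layouts listed above. The function names are hypothetical, `page_x`/`page_y` are the texture page origin in VRAM halfwords/lines, and Texture Window effects are ignored.<br/>

```python
TEXELS_PER_HALFWORD = {4: 4, 8: 2, 16: 1}


def texel_location(page_x: int, page_y: int, u: int, v: int, depth: int) -> tuple:
    """Return (vram_x, vram_y, sub_index) for texel (u, v) of a texture page."""
    per_halfword = TEXELS_PER_HALFWORD[depth]
    return page_x + u // per_halfword, page_y + v, u % per_halfword


def extract_texel(halfword: int, sub_index: int, depth: int) -> int:
    """Extract a palette index (4bit/8bit) or the direct color (16bit)."""
    if depth == 16:
        return halfword
    return (halfword >> (sub_index * depth)) & ((1 << depth) - 1)


if __name__ == "__main__":
    # The leftmost pixel sits in the least significant bits
    print([extract_texel(0x4321, i, 4) for i in range(4)])
```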
|
|
|
|
|
|
|
|
#### Texture Palettes - CLUT (Color Lookup Table)
|
|
|
|
The CLUT is the table where the colors are stored for the image data in the
CLUT modes. The pixels of those images are used as indexes to this table. The
CLUT is arranged in the frame buffer as a 256x1 image for the 8bit CLUT mode,
and a 16x1 image for the 4bit CLUT mode.<br/>
|
|
|
|
```
|
|
|
|
0-4 Red (0..31) ;\Color 0000h = Fully-Transparent
|
|
|
|
5-9 Green (0..31) ; Color 0001h..7FFFh = Non-Transparent
|
|
|
|
10-14 Blue (0..31) ; Color 8000h..FFFFh = Semi-Transparent (*)
|
|
|
|
15 Semi Transparency Flag ;/(*) or Non-Transparent for opaque commands
|
|
|
|
```
|
|
|
|
The clut data can be arranged in the frame buffer at X multiples of 16
|
|
|
|
(X=0,16,32,48,etc) and anywhere in the Y range of 0-511.<br/>
|
|
|
|
|
|
|
|
#### Texture Color Black Limitations
|
|
|
|
On the PSX, texture color 0000h is fully-transparent, that means textures
|
|
|
|
cannot contain Black pixels. However, in some cases, Color 8000h (Black with
|
|
|
|
semi-transparent flag) can be used, depending on the rendering command:<br/>
|
|
|
|
```
|
|
|
|
opaque command, eg. GP0(24h) --> 8000h = Non-Transparent Black
|
|
|
|
semi-transp command, eg. GP0(26h) --> 8000h = Semi-Transparent Black
|
|
|
|
```
|
|
|
|
So, with semi-transparent rendering commands, it isn't possible to use
|
|
|
|
Non-Transparent Black pixels in textures, the only workaround is to use colors
|
|
|
|
like 0001h (dark red) or 0400h (dark blue). However, due to the PSX's rather
|
|
|
|
steeply increasing intensity ramp, these colors are clearly visible to be
|
|
|
|
brighter than black.<br/>
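The rules for color 0000h and for the semi-transparency flag can be summarised in a small classifier. The function name and the returned strings are made up for illustration; the rules themselves are the ones from the tables above.<br/>

```python
def texel_visibility(color: int, semi_transparent_command: bool) -> str:
    """Classify a 16bit texel/CLUT color for a given kind of rendering command."""
    if color == 0x0000:
        return "transparent"                 # never drawn
    if (color & 0x8000) and semi_transparent_command:
        return "semi-transparent"            # flag is honoured
    return "opaque"                          # flag ignored or not set


if __name__ == "__main__":
    for command_is_semi in (False, True):
        print(command_is_semi, texel_visibility(0x8000, command_is_semi))
```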
|
|
|
|
|
|
|
|
#### RGB Intensity Notes
|
|
|
|
The PlayStation's RGB values aren't linear to normal RGB values (as used on
|
|
|
|
PCs). The min/max values are of course the same, but the medium values differ:<br/>
|
|
|
|
```
|
|
|
|
Intensity PC PSX
|
|
|
|
Minimum 0 0
|
|
|
|
Medium (circa) 16 8
|
|
|
|
Maximum 31 31
|
|
|
|
```
|
|
|
|
Ie. on the PSX, the intensity increases steeply from 0 to 15, and less steeply
|
|
|
|
from 16 to 31.<br/>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## GPU Texture Caching
|
|
|
|
The GPU has a 2 Kbyte Texture Cache.<br/>
|
|
|
|
The Texture Cache is (maybe) also used for CLUT data - or is there a separate
|
|
|
|
CLUT Cache - or is the CLUT uncached - but that'd be trash?<br/>
|
|
|
|
|
|
|
|
If polygons with texture are displayed, the GPU needs to read these from the
|
|
|
|
frame buffer. This slows down the drawing process, and as a result the number
|
|
|
|
of polygons that can be drawn in a given timespan. To speed up this process the
|
|
|
|
GPU is equipped with a texture cache, so a given piece of texture need not
|
|
|
|
be read multiple times in succession.<br/>
|
|
|
|
The texture cache size depends on the color mode used for the textures.<br/>
|
|
|
|
In 4 bit CLUT mode it has a size of 64x64, in 8 bit CLUT mode it's 32x64, and in
15bit Direct mode it's 32x32. A general speed up can be achieved by setting up textures
|
|
|
|
according to these sizes. For further speed gain a more precise knowledge of
|
|
|
|
how the cache works is necessary.<br/>
|
|
|
|
|
|
|
|
#### Cache blocks
|
|
|
|
The texture page is divided into non-overlapping cache blocks, each of a unit
|
|
|
|
size according to color mode. These cache blocks are tiled within the texture
|
|
|
|
page.<br/>
|
|
|
|
```
|
|
|
|
  +-----+-----+-----+--
  |cache|     |     |
  |block|     |
  |    0|   1 |   2   ..
  +-----+-----+--
  |..   |     |
|
|
|
|
```
|
|
|
|
|
|
|
|
#### Cache entries
|
|
|
|
Each cache block is divided into 256 cache entries, which are numbered
sequentially, and are 8 bytes wide. So a cache entry holds 16 4bit clut pixels,
8 8bit clut pixels, or 4 15bitdirect pixels.<br/>
|
|
|
|
```
|
|
|
|
  4bit and 8bit clut:        15bitdirect:
  +----+----+----+----+      +----+----+----+----+----+----+----+----+
  |   0|   1|   2|   3|      |   0|   1|   2|   3|   4|   5|   6|   7|
  +----+----+----+----+      +----+----+----+----+----+----+----+----+
  |   4|   5|   6|   7|      |   8|   9|   a|   b|   c|   d|   e|   f|
  +----+----+----+----+      +----+----+----+----+----+----+----+----+
  |   8|   9|  ..            |  10|  11|  ..
  +----+----+--              +----+----+--
  |   c|  ..|                |  18|  ..|
  +----+--                   +----+--
  |  ..                      |  ..
|
|
|
|
```
|
|
|
|
The cache can hold only one cache entry with a given number, so if, for
example, a piece of texture spans multiple cache blocks and has data on entry 9
of block 1, but also on entry 9 of block 2, these cannot be in the cache at
once.<br/>
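The entry numbering can be sketched as follows. This assumes the row-major
numbering drawn in the diagram, and texel coordinates (u,v) counted relative to
the texture page origin; the function name `cache_entry` is hypothetical, and
the real hardware mapping may differ in details:<br/>

```python
# Texels per 8-byte cache entry, and cache block size in texels, per color mode.
TEXELS_PER_ENTRY = {"4bit": 16, "8bit": 8, "15bit": 4}
CACHE_BLOCK_SIZE = {"4bit": (64, 64), "8bit": (32, 64), "15bit": (32, 32)}


def cache_entry(mode, u, v):
    """Return (block_x, block_y, entry) for texel (u, v) within a texture page.

    Assumption: entries are numbered row by row inside each cache block,
    as drawn in the diagram above.
    """
    width, height = CACHE_BLOCK_SIZE[mode]
    texels = TEXELS_PER_ENTRY[mode]
    entries_per_row = width // texels
    block_x, block_y = u // width, v // height
    entry = (v % height) * entries_per_row + (u % width) // texels
    return block_x, block_y, entry


# Two texels in different blocks that map to the same entry number (9),
# and therefore cannot be cached at the same time.
print(cache_entry("4bit", 80, 2))
print(cache_entry("4bit", 144, 2))
```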
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## GPU Timings
|
|
|
|
#### Video Clock
|
|
|
|
The PSone/PAL video clock is the CPU clock multiplied by 11/7.<br/>
|
|
|
|
```
|
|
|
|
  CPU Clock   =  33.868800MHz (44100Hz*300h)
  Video Clock =  53.222400MHz (44100Hz*300h*11/7)
|
|
|
|
```
|
|
|
|
For other PSX/PSone PAL/NTSC variants, see:<br/>
|
|
|
|
[Pinouts - CLK Pinouts](pinouts.md#pinouts---clk-pinouts)<br/>
|
|
|
|
|
|
|
|
#### Vertical Timings
|
|
|
|
```
|
|
|
|
  PAL:  314 scanlines per frame (13Ah)
  NTSC: 263 scanlines per frame (107h)
|
|
|
|
```
|
|
|
|
Timer1 can use the hblank signal as input, which allows counting scanlines
(unless the display is configured to 0 pixels width, which would cause an
endless hblank). The hblank signal is generated even during vertical
blanking/retrace.<br/>
|
|
|
|
|
|
|
|
#### Horizontal Timings
|
|
|
|
```
|
|
|
|
  PAL:  3406 video cycles per scanline (or 3406.1 or so?)
  NTSC: 3413 video cycles per scanline (or 3413.6 or so?)
|
|
|
|
```
|
|
|
|
Dotclocks:<br/>
|
|
|
|
```
|
|
|
|
  PSX.256-pix Dotclock =  5.322240MHz (44100Hz*300h*11/7/10)
  PSX.320-pix Dotclock =  6.652800MHz (44100Hz*300h*11/7/8)
  PSX.368-pix Dotclock =  7.603200MHz (44100Hz*300h*11/7/7)
  PSX.512-pix Dotclock = 10.644480MHz (44100Hz*300h*11/7/5)
  PSX.640-pix Dotclock = 13.305600MHz (44100Hz*300h*11/7/4)
  Namco GunCon 385-pix =  8.000000MHz (from 8.00MHz on lightgun PCB)
|
|
|
|
```
|
|
|
|
Dots per scanline are, depending on horizontal resolution, and on PAL/NTSC:<br/>
|
|
|
|
```
|
|
|
|
  320pix/PAL: 3406/8  = 425.75 dots     320pix/NTSC: 3413/8  = 426.625 dots
  640pix/PAL: 3406/4  = 851.5 dots      640pix/NTSC: 3413/4  = 853.25 dots
  256pix/PAL: 3406/10 = 340.6 dots      256pix/NTSC: 3413/10 = 341.3 dots
  512pix/PAL: 3406/5  = 681.2 dots      512pix/NTSC: 3413/5  = 682.6 dots
  368pix/PAL: 3406/7  = 486.5714 dots   368pix/NTSC: 3413/7  = 487.5714 dots
|
|
|
|
```
|
|
|
|
Timer0 can use the dotclock as input; however, the Timer0 input "ignores" the
fractional portions. In most cases, the values are rounded down, ie. with 340.6
dots/line, the timer increments only 340 times/line; the only value that is
rounded up is 425.75 dots/line. Due to the rounding, the timer doesn't, for
example, run exactly twice as fast in 512pix/PAL mode as in 256pix/PAL mode.
The dotclock signal is generated even during horizontal/vertical
blanking/retrace.<br/>
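The rounding rule can be sketched as below. The special case for 320pix/PAL
simply encodes the exception stated above, and is not derived from any known
hardware mechanism; the function name `timer0_ticks_per_line` is
hypothetical:<br/>

```python
# Video cycles per scanline and video-clock dividers per horizontal resolution,
# taken from the tables above.
CYCLES_PER_SCANLINE = {"PAL": 3406, "NTSC": 3413}
DOTCLOCK_DIVIDER = {256: 10, 320: 8, 368: 7, 512: 5, 640: 4}


def timer0_ticks_per_line(standard, width):
    """Return how often Timer0 increments per scanline in dotclock mode."""
    # Documented exception: 425.75 dots/line (320pix/PAL) is rounded up.
    if standard == "PAL" and width == 320:
        return 426
    # All other values are rounded down (fractional portion ignored).
    return CYCLES_PER_SCANLINE[standard] // DOTCLOCK_DIVIDER[width]


# 512pix/PAL gives 681 ticks, which is not exactly 2 * 340 (256pix/PAL).
print(timer0_ticks_per_line("PAL", 256), timer0_ticks_per_line("PAL", 512))
```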
|
|
|
|
|
|
|
|
#### Frame Rates
|
|
|
|
```
|
|
|
|
  PAL:  53.222400MHz/314/3406 = ca. 49.76 Hz (ie. almost 50Hz)
  NTSC: 53.222400MHz/263/3413 = ca. 59.29 Hz (ie. almost 60Hz)
|
|
|
|
```
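The clock, dotclock, and frame rate figures in this chapter can be reproduced
with a few lines of arithmetic. This is only a sketch using the integer cycle
counts from the tables (ignoring the possible fractional cycles per scanline);
all names are illustrative:<br/>

```python
CPU_CLOCK_HZ = 44100 * 0x300               # 33868800 Hz
VIDEO_CLOCK_HZ = CPU_CLOCK_HZ * 11 // 7    # 53222400 Hz (divides exactly)

DOTCLOCK_DIVIDER = {256: 10, 320: 8, 368: 7, 512: 5, 640: 4}
SCANLINES_PER_FRAME = {"PAL": 314, "NTSC": 263}
CYCLES_PER_SCANLINE = {"PAL": 3406, "NTSC": 3413}


def dotclock_hz(width):
    """Dotclock for the given horizontal resolution."""
    return VIDEO_CLOCK_HZ / DOTCLOCK_DIVIDER[width]


def frame_rate_hz(standard):
    """Frame rate, including hidden scanlines and hidden video cycles."""
    cycles_per_frame = SCANLINES_PER_FRAME[standard] * CYCLES_PER_SCANLINE[standard]
    return VIDEO_CLOCK_HZ / cycles_per_frame


for standard in ("PAL", "NTSC"):
    print(standard, round(frame_rate_hz(standard), 2), "Hz")
```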
|
|
|
|
|
|
|
|
#### Note
|
|
|
|
The above values include "hidden" dots and scanlines (during horizontal and
vertical blanking/retrace).<br/>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## GPU (MISC)
|
|
|
|
#### GP0(20h..7Fh) - Render Command Bits
|
|
|
|
```
|
|
|
|
  0-23  Color for (first) Vertex                   (Not for Raw-Texture)
  24    Texture Mode      (0=Blended, 1=Raw)       (Textured-Polygon/Rect only)
  25    Semi Transparency (0=Off, 1=On)            (All Render Types)
  26    Texture Mapping   (0=Off, 1=On)            (Polygon/Rectangle only)
  27-28 Rect Size   (0=Var, 1=1x1, 2=8x8, 3=16x16) (Rectangle only)
  27    Num Vertices      (0=Triple, 1=Quad)       (Polygon only)
  27    Num Lines         (0=Single, 1=Poly)       (Line only)
  28    Shading           (0=Flat, 1=Gouraud)      (Polygon/Line only)
  29-31 Primitive Type    (1=Polygon, 2=Line, 3=Rectangle)
|
|
|
|
```
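A minimal sketch of decoding these bits from the first word of a GP0 render
command. The function name and the dictionary keys are illustrative only, and
the sketch doesn't validate commands outside of the 20h..7Fh range:<br/>

```python
RECT_SIZES = {0: None, 1: (1, 1), 2: (8, 8), 3: (16, 16)}  # None = variable size
PRIMITIVES = {1: "polygon", 2: "line", 3: "rectangle"}


def decode_render_command(word):
    """Decode the first 32bit word of a GP0(20h..7Fh) render command."""
    def bit(n):
        return bool((word >> n) & 1)

    primitive = PRIMITIVES.get((word >> 29) & 7)
    info = {
        "primitive": primitive,
        "color": word & 0xFFFFFF,        # bits 0-23
        "semi_transparent": bit(25),     # applies to all render types
    }
    if primitive == "polygon":
        info.update(raw_texture=bit(24), textured=bit(26),
                    vertices=4 if bit(27) else 3, gouraud=bit(28))
    elif primitive == "line":
        info.update(polyline=bit(27), gouraud=bit(28))
    elif primitive == "rectangle":
        info.update(raw_texture=bit(24), textured=bit(26),
                    size=RECT_SIZES[(word >> 27) & 3])
    return info


print(decode_render_command(0x26000000))
```

For example, GP0(26h) decodes as a textured, semi-transparent, flat-shaded
three-point polygon with texture blending.<br/>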
|
|
|
|
|
|
|
|
#### Perspective (in-)correct Rendering
|
|
|
|
The PSX doesn't support perspective correct rendering: Assume a polygon to be
|
|
|
|
rotated so that its right half becomes more distant from the camera, and its
|
|
|
|
left half becomes closer. Due to the GTE's perspective division, the right half
|
|
|
|
should appear smaller than the left half.<br/>
|
|
|
|
The GPU supports only linear interpolations for rendering - that is correct
|
|
|
|
concerning the X and Y screen coordinates (which are still linear to each
|
|
|
|
other, even after perspective division, since both are divided by the same
|
|
|
|
value).<br/>
|
|
|
|
However, texture coordinates (and Gouraud shaded colors) are NOT linear to the
screen coordinates, and so, the linearly interpolated PSX graphics often look
rather distorted, especially for textures that contain straight lines. For
color shading the problem is less obvious (since shading is kinda blurry
anyways).<br/>
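The size of the error can be illustrated by comparing linear interpolation with
the standard perspective correct formula (interpolating u/z and 1/z linearly in
screen space, then dividing). This is a generic sketch of that textbook
technique, not something the PSX hardware does; the function names and example
values are made up for illustration:<br/>

```python
def affine_u(u0, u1, t):
    """Linear interpolation in screen space, as done by the PSX GPU."""
    return u0 + (u1 - u0) * t


def perspective_u(u0, z0, u1, z1, t):
    """Perspective correct interpolation between two vertices.

    u0/u1 are texture coordinates, z0/z1 the vertex depths, and t is the
    position between both vertices in SCREEN space (0.0..1.0).
    """
    inv_z = (1 - t) / z0 + t / z1
    u_over_z = (1 - t) * u0 / z0 + t * u1 / z1
    return u_over_z / inv_z


# Edge from u=0 at depth 1 to u=64 at depth 3, sampled at the screen midpoint.
print(affine_u(0, 64, 0.5))              # texel used by linear interpolation
print(perspective_u(0, 1, 64, 3, 0.5))   # texel that should be used
```

In this example, the linear interpolation picks texel 32 at the screen
midpoint, while the perspective correct result is close to texel 16.<br/>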
|
|
|
|
|
|
|
|
#### Perspective correct Rendering
|
|
|
|
For perspective correct rendering, the polygon's Z-coordinates would need to be
passed from the GTE to the GPU, and the GPU would then need to use those
Z-coordinates to "undo" the perspective division for each pixel (that'd require
some additional memory, and especially a powerful division unit, which isn't
implemented in the hardware).<br/>
|
|
|
|
As a workaround, you can try to reduce the size of your polygons (the
|
|
|
|
interpolation errors increase in the center region of larger polygons).
|
|
|
|
Reducing the size would be only required for polygons that occupy a larger
|
|
|
|
screen region (which may vary depending on the distance to the camera).<br/>
|
|
|
|
Ie. you may check the size AFTER perspective division; if it's too large, then
|
|
|
|
break it into smaller parts (using the original coordinates, NOT the screen
|
|
|
|
coordinates), and then pass the fragments to the GTE another time.<br/>
|
|
|
|
Again, perspective correction would be relevant only for certain textures (not
|
|
|
|
for randomly dithered textures like sand, water, fire, grass, and not for
|
|
|
|
untextured polygons, and of course not for 2D graphics, so you may exclude
|
|
|
|
those from size reduction).<br/>
|
|
|
|
|
|
|
|
#### 24bit RGB to 15bit RGB Dithering (enabled in Texpage attribute)
|
|
|
|
For dithering, VRAM is divided into 4x4 pixel blocks. Depending on the location
within that 4x4 pixel region, the corresponding dither offset is added to the
8bit R/G/B values, the result is saturated to +00h..+FFh, and then divided by
8, resulting in the final 5bit R/G/B values.<br/>
|
|
|
|
```
|
|
|
|
  -4  +0  -3  +1   ;\dither offsets for first two scanlines
  +2  -2  +3  -1   ;/
  -3  +1  -4  +0   ;\dither offsets for next two scanlines
  +3  -1  +2  -2   ;/(same as above, but shifted two pixels horizontally)
|
|
|
|
```
|
|
|
|
POLYGONs (triangles/quads) are dithered ONLY if they do use gouraud shading or
|
|
|
|
texture blending.<br/>
|
|
|
|
LINEs are dithered (no matter if they are mono or do use gouraud shading).<br/>
|
|
|
|
RECTs are NOT dithered (no matter if they do use texture blending).<br/>
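The dithering steps above can be sketched as follows. The sketch assumes that
the table row is selected by the lower two bits of the VRAM Y coordinate, the
column by the lower two bits of the X coordinate, and that the 15bit pixel
stores red in the lowest 5 bits; the function names are illustrative:<br/>

```python
# Dither offsets for one 4x4 pixel block, indexed as DITHER_TABLE[y & 3][x & 3].
DITHER_TABLE = [
    [-4, +0, -3, +1],
    [+2, -2, +3, -1],
    [-3, +1, -4, +0],
    [+3, -1, +2, -2],
]


def dither_channel(value8, x, y):
    """Convert one 8bit color channel to 5bit, dithered for VRAM position (x, y)."""
    offset = DITHER_TABLE[y & 3][x & 3]
    saturated = min(max(value8 + offset, 0x00), 0xFF)
    return saturated >> 3  # divide by 8


def dither_rgb(r, g, b, x, y):
    """Return a 15bit pixel (red in bits 0-4, green in 5-9, blue in 10-14)."""
    return (dither_channel(r, x, y)
            | dither_channel(g, x, y) << 5
            | dither_channel(b, x, y) << 10)


# The same 8bit value yields different 5bit values on neighbouring pixels.
print(dither_channel(8, 0, 0), dither_channel(8, 1, 0))
```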
|
|
|
|
|
|
|
|
#### Shading information
|
|
|
|
"Texture RGB values control the brightness of the individual colors ($00-$7f).
|
|
|
|
A value of $80 in a color will take the former value as data." (What...?
|
|
|
|
probably means the "double brightness" effect... or does it want to tell that
|
|
|
|
ALL colors of 80h..FFh have only single brightness.. rather than reaching
|
|
|
|
double brightness at FFh...?)<br/>
|
|
|
|
|
|
|
|
#### Shading
|
|
|
|
The GPU has a shading function, which will scale the color of a primitive to a
|
|
|
|
specified brightness. There are 2 shading modes: Flat shading, and Gouraud
|
|
|
|
shading. Flat shading is the mode in which one brightness value is specified
|
|
|
|
for the entire primitive. In Gouraud shading mode, a different brightness value
|
|
|
|
can be given for each vertex of a primitive, and the brightness between these
|
|
|
|
points is automatically interpolated.<br/>
|
|
|
|
|
|
|
|
#### Semi Transparency
|
|
|
|
When semi transparency is set for a pixel, the GPU first reads the pixel it
|
|
|
|
wants to write to, and then calculates the color it will write from the 2
|
|
|
|
pixels according to the semitransparency mode selected. Processing speed is
|
|
|
|
lower in this mode because additional reading and calculating are necessary.
|
|
|
|
There are 4 semitransparency modes in the GPU.<br/>
|
|
|
|
```
|
|
|
|
  B=Back  (the old pixel read from the image in the frame buffer)
  F=Front (the new halftransparent pixel)
  * 0.5 x B + 0.5 x F    ;aka B/2+F/2
  * 1.0 x B + 1.0 x F    ;aka B+F
  * 1.0 x B - 1.0 x F    ;aka B-F
  * 1.0 x B +0.25 x F    ;aka B+F/4
|
|
|
|
```
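The four formulas can be sketched per 5bit color channel as below. The mode
numbers 0..3 simply follow the order of the list above, and the saturation to
0..31 and the integer truncation are assumptions (the exact rounding of the
hardware isn't specified here); the function name is hypothetical:<br/>

```python
def blend_channel(mode, back, front):
    """Blend one 5bit color channel (0..31) of the Back and Front pixels.

    mode 0: B/2+F/2, mode 1: B+F, mode 2: B-F, mode 3: B+F/4
    Assumption: results are truncated to integers and saturated to 0..31.
    """
    if mode == 0:
        result = back // 2 + front // 2
    elif mode == 1:
        result = back + front
    elif mode == 2:
        result = back - front
    elif mode == 3:
        result = back + front // 4
    else:
        raise ValueError("semi transparency mode must be 0..3")
    return min(max(result, 0), 31)


for mode in range(4):
    print(mode, blend_channel(mode, 20, 10))
```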
|
|
|
|
|
|
|
|
#### Draw to display enable
|
|
|
|
This will enable/disable any drawing to the area that is currently displayed.
|
|
|
|
Not sure yet WHY one should want to disable that?<br/>
|
|
|
|
Also not sure HOW and IF it works... the SIZE of the display area is implied by
|
|
|
|
the screen size - which is horizontally counted in CLOCK CYCLES, so, to obtain
|
|
|
|
the size in PIXELS, the hardware would need to divide that value by the
|
|
|
|
number of cycles per pixel, depending on the current resolution...?<br/>
|
|
|
|
|
|
|
|
|
|
|
|
|