Here’s what I believe happens when the cursor blinks: when it toggles, the OS reads the video contents under the cursor, modifies the affected bytes/words, and writes them back. If a single pixel in a byte/word is touched, it still has to load/store the whole thing. It may even stash the original data somewhere to restore on the next blink, or maybe it’s using a flash/toggle bit — I’m not 100% sure which. Net effect: it’s CPU-intensive and can show little glitches on the left/right edges. Even though it’s infrequent, it’s distracting. The entire process is inefficient, too.
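To make the read-modify-write cost concrete, here is a minimal C sketch of a software cursor toggle on packed 4 bpp data. It is only an illustration of the general technique, not a description of what the QL ROM or SMSQ/E actually does. The function name `xor_span_4bpp` and the nibble order (left pixel in the high nibble) are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR `width` pixels starting at pixel `x0` in one row of packed 4 bpp data.
 * ASSUMPTION: the left pixel of each pair lives in the high nibble.
 * Calling it twice with the same arguments restores the original row,
 * which is one way a blink could work without stashing the old bytes. */
static void xor_span_4bpp(uint8_t *row, size_t x0, size_t width, uint8_t mask)
{
    for (size_t x = x0; x < x0 + width; x++) {
        uint8_t byte = row[x / 2];                      /* load the whole byte  */
        unsigned shift = (x & 1u) ? 0u : 4u;            /* pick the nibble      */
        uint8_t bits = (uint8_t)((mask & 0x0Fu) << shift);
        row[x / 2] = (uint8_t)(byte ^ bits);            /* store the whole byte */
    }
}
```

Even a one-pixel-wide span costs a full byte load and store per row, and an odd start or end position means the edge bytes are shared with pixels that are not part of the cursor.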
Same basic story with mouse pointers on GUI systems, just more obvious: the pointer may need redrawing every frame, it’s larger, and it overlaps more bytes/words.
My video card handles this differently with hardware pointers. I can preload two pointers (A and B) into a small area of the frame buffer. They’re 4 bpp images: 1 bit is a mask (transparent), and 3 bits select color via a little LUT, so you can map them to any colors you like. Sizes are 16×16 or 32×32, and the pixel pitch matches whatever mode is active. You flip A/B on/off with a simple register write, move them by writing X/Y, and they’re not written into VRAM at all — they’re spliced in hardware into the video stream. So the OS basically says “put this here” and it happens immediately (dozen-ish instructions).
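As a rough illustration of the pointer format and the "put this here" step, here is a C sketch. Everything concrete in it is a placeholder: the bit positions inside the 4-bit pixel, the nibble order, the `ptr_regs` struct and its field names are assumptions for the example, not the card's real register map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED pixel layout: bit 3 = transparent mask, bits 2..0 = LUT index. */
#define PTR_TRANSPARENT 0x8u

/* HYPOTHETICAL register block; real names and offsets will differ. */
struct ptr_regs {
    volatile uint16_t x;      /* pointer X position                      */
    volatile uint16_t y;      /* pointer Y position                      */
    volatile uint16_t enable; /* bit 0 = pointer A on, bit 1 = pointer B */
};

/* Build one 4-bit pointer pixel from a transparency flag and a LUT index. */
static uint8_t ptr_pixel(int transparent, unsigned lut_index)
{
    assert(lut_index < 8);
    return (uint8_t)((transparent ? PTR_TRANSPARENT : 0u) | lut_index);
}

/* Pack size*size 4-bit pixels two per byte (left pixel in the high nibble,
 * which is an assumption). A 16x16 image packs into 128 bytes. */
static void ptr_pack(const uint8_t *pixels, size_t size, uint8_t *out)
{
    assert(size == 16 || size == 32);
    for (size_t i = 0; i < size * size; i += 2)
        out[i / 2] = (uint8_t)((pixels[i] << 4) | (pixels[i + 1] & 0x0Fu));
}

/* Move the pointer: two register writes and no VRAM traffic at all. */
static void ptr_move(struct ptr_regs *regs, uint16_t x, uint16_t y)
{
    regs->x = x;
    regs->y = y;
}
```

The packed image is uploaded once; after that, tracking the mouse is just `ptr_move()` each time the position changes.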
There’s also a 1-bit overlay bitmap (think OSD): any X×Y size, any on-screen position, any single color (or use it as a mask). That makes things like “PLAY>” overlays, menus, or a blinking text cursor trivial. Real-world example: I’ve used it to cover the screen in black while I redraw the frame buffer underneath, then unblank it. That keeps sync happy, doesn’t touch VRAM, and essentially emulates the 8301 screen-blanking behavior via a register.
The chip also has a hardware rectangle fill. You give it a rectangle and a color; it fills VRAM while the CPU carries on. That maps nicely to window clears and other chunky UI ops where software loops are slow. The catch: if the OS keeps a shadow copy of the frame buffer, you still need to mirror the change there, so the classic strategy is “do the hardware fill right now for instant visual response, then let the CPU update the shadow copy afterward.” You don’t keep the full throughput win, but you do keep the perception win.
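The "hardware fill first, shadow copy afterward" strategy could look something like the C sketch below. To keep it short it assumes an 8 bpp mode (one byte per pixel), and the `fill_regs` struct with its field names is purely hypothetical; only the ordering of the two steps is the point.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL fill-engine registers; the real block will look different. */
struct fill_regs {
    volatile uint16_t x, y, w, h; /* rectangle in pixels            */
    volatile uint8_t  color;      /* fill color                     */
    volatile uint8_t  start;      /* write 1 to kick off the fill   */
};

/* Fill a rectangle: start the hardware fill for the instant visual result,
 * then bring the OS's shadow copy up to date with the CPU.
 * ASSUMPTION: 8 bpp, so one byte per pixel and `pitch` bytes per row. */
static void fill_rect(struct fill_regs *regs, uint8_t *shadow, size_t pitch,
                      uint16_t x, uint16_t y, uint16_t w, uint16_t h,
                      uint8_t color)
{
    /* Step 1: hand the rectangle to the hardware and let it run. */
    regs->x = x;
    regs->y = y;
    regs->w = w;
    regs->h = h;
    regs->color = color;
    regs->start = 1;

    /* Step 2: mirror the change into the shadow buffer while VRAM fills. */
    for (uint16_t row = 0; row < h; row++)
        memset(shadow + (size_t)(y + row) * pitch + x, color, w);
}
```

The CPU still does the full software fill, but into ordinary RAM and after the screen has already started changing, which is where the perception win comes from.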
Some concrete numbers for a 960×540 screen at 4 bpp (2 pixels/byte): that’s 259,200 bytes to write for a full-screen solid fill. On a 7.5 MHz 68008 with an 8-bit bus and zero wait states, a very tight assembly loop takes about 0.32 s. The hardware rectangle fill does it in about 0.028 s. At 25 fps, that’s roughly 8 frames vs 0.7 frames. You can see the difference. With a 16-bit 68000 at 30 MHz this shrinks a lot, but then the video chip’s FIFO starts to become the bottleneck.
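For anyone who wants to check the arithmetic or plug in another mode, here it is as a few small C helpers (the function names are just for this example). The 0.32 s and 0.028 s timings are the figures quoted above, not something the code derives.

```c
#include <stdint.h>

/* Bytes needed to fill a width x height screen at `bpp` bits per pixel. */
static uint32_t fill_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
    return width * height * bpp / 8u;
}

/* How many display frames go by during `seconds` at a given frame rate. */
static double frames_elapsed(double seconds, double fps)
{
    return seconds * fps;
}

/* Effective fill throughput in bytes per second. */
static double bytes_per_second(uint32_t bytes, double seconds)
{
    return (double)bytes / seconds;
}
```

With these, `fill_bytes(960, 540, 4)` gives 259,200 bytes, `frames_elapsed(0.32, 25.0)` is 8 frames and `frames_elapsed(0.028, 25.0)` is 0.7 frames. The implied throughput is about 0.81 MB/s for the software loop against roughly 9.3 MB/s for the hardware fill, so a bit over 11× faster.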
I’m working on adding blit capabilities next: copying regions within the display and to/from off-screen VRAM. That would let me cache the background under windows and restore it instantly when they move/close. Conceptually this is cloning a VIDC20-ish feature set and extending it. Outputs today: VGA, plus a digital video port that looks a lot like HDMI (but isn’t called that).
OS integration: I think these are “relatively difficult” (extensive work!) to fold into SMSQ/E, and “harder” (almost but not quite impossible) to spoon into Minerva. A common driver for both would be lovely if doable. I’ve got Aurora-mode compatibility to the point where Aurora drivers/GD2 just work in 8 or 16 bpp modes. All non-packed-pixel formats work in any resolution. 800×600 @ 65 fps or 1024×768 @ 50 fps are nice. On an HD TV, 960×540 @ 25 fps upscales exactly 2:1 and looks incredible. Truly rock solid corner to corner.
To make this approachable without OS changes, I’m writing an assembly library and BASIC procedures/functions that wrap the features.
Hardware status: it already plugs in and makes pictures, and it can act as a second screen on a BBQL with some caveats. I page the 2 MB frame buffer in 32 KB windows into an expansion slot region, and I map the (extensive) register block into internal I/O. That paging dance is because the QL’s 68008 only has a 1 MB address space. That’s what originally pushed me toward a 68SEC000 to get a 4 MB map. As a standalone video card, it probably wants its own local CPU; once you do that, it goes much, much faster.
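The paging dance boils down to splitting a linear frame-buffer address into a page number and an offset inside the 32 KB window. A minimal C sketch, assuming a single byte-wide page-select register (the `fb_window` struct and its fields are hypothetical stand-ins for the real mapping):

```c
#include <assert.h>
#include <stdint.h>

#define FB_SIZE  (2u * 1024u * 1024u) /* 2 MB frame buffer            */
#define WIN_SIZE (32u * 1024u)        /* 32 KB window, so 64 pages    */

/* HYPOTHETICAL view of the hardware: one page-select register plus a
 * pointer to where the 32 KB window appears in the CPU's address map. */
struct fb_window {
    volatile uint8_t *page_reg;
    volatile uint8_t *window;
};

/* Which 32 KB page a linear frame-buffer address falls in (0..63). */
static uint8_t fb_page(uint32_t addr)
{
    return (uint8_t)(addr / WIN_SIZE);
}

/* Offset of that address inside the window (0..32767). */
static uint16_t fb_offset(uint32_t addr)
{
    return (uint16_t)(addr % WIN_SIZE);
}

/* Write one byte anywhere in the frame buffer through the window. */
static void fb_write8(struct fb_window *fb, uint32_t addr, uint8_t value)
{
    assert(addr < FB_SIZE);
    *fb->page_reg = fb_page(addr);       /* select the page       */
    fb->window[fb_offset(addr)] = value; /* then write through it */
}
```

In real code you would only rewrite the page register when the page actually changes, since a run of writes within one scanline usually stays inside the same 32 KB page.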
This design is meant to carry me from 68000 → 68030. It’ll “work” on 68040/060, but eventually the CPU outruns the video chip. I plan to add a 16-bit FIFO when that bottleneck shows up. Right now, writes are roughly 3× faster than reads.
A few things I’d love feedback on:
* Does anyone have a definitive description of how the QL cursor blink is actually implemented (byte vs word granularity, any “flash bit,” whether it stashes original bytes or repaints each toggle)?
* Best hook points in SMSQ/E and Minerva to drive: hardware pointer enable/move, overlay updates for OSD/menus/cursor, and rectangle fills (plus a lazy shadow-buffer update afterward) — without tearing up existing code paths. I suspect these are delegated to extension functions, which is ideal as it keeps SMSQ/E consistent across branches.
* Edge cases worth testing: mixed depths, odd widths, partial-byte cursor overlays, etc.
Thanks for reading this far!