Cursors and primitives and resolutions, oh my!

Post by **Dave** » Sun Sep 07, 2025 11:36 pm

The QL cursor is a bit of a beast.

Here’s what I believe happens when the cursor blinks: when it toggles, the OS reads the video contents under the cursor, modifies the affected bytes/words, and writes them back. If a single pixel in a byte/word is touched, it still has to load/store the whole thing. It may even stash the original data somewhere to restore on the next blink, or maybe it’s using a flash/toggle bit — I’m not 100% sure which. Net effect: it’s CPU-intensive and can show little glitches on the left/right edges. Even though it’s infrequent, it’s distracting. The entire process is inefficient, too.

Same basic story with mouse pointers on GUI systems, just more obvious: the pointer may need redrawing every frame, it’s larger, and it overlaps more bytes/words.

My video card handles this differently with hardware pointers. I can preload two pointers (A and B) into a small area of the frame buffer. They’re 4 bpp images: 1 bit is a mask (transparent), and 3 bits select color via a little LUT, so you can map them to any colors you like. Sizes are 16×16 or 32×32, and the pixel pitch matches whatever mode is active. You flip A/B on/off with a simple register write, move them by writing X/Y, and they’re not written into VRAM at all — they’re spliced in hardware into the video stream. So the OS basically says “put this here” and it happens immediately (dozen-ish instructions).
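To make the 4 bpp pointer format concrete, here is a minimal sketch of packing a pointer image. The bit layout is an assumption for illustration only (bit 3 as the transparency mask, bits 0-2 as the LUT index, left pixel in the high nibble); the real card may order things differently, and the function names are hypothetical.

```python
# Hypothetical layout, NOT confirmed for the real card:
# bit 3 = transparent mask, bits 0-2 = index into the pointer LUT,
# two pixels per byte with the left pixel in the high nibble.
TRANSPARENT = 0x8


def encode_pixel(lut_index, transparent=False):
    """Return one 4-bit pointer pixel."""
    if not 0 <= lut_index <= 7:
        raise ValueError("LUT index must fit in 3 bits")
    return (TRANSPARENT if transparent else 0) | lut_index


def pack_pointer(rows):
    """Pack rows of 4-bit pixels into bytes, two pixels per byte."""
    out = bytearray()
    for row in rows:
        if len(row) % 2:
            raise ValueError("row width must be even")
        for left, right in zip(row[0::2], row[1::2]):
            out.append((left << 4) | right)
    return bytes(out)


def make_wedge(size=16, lut_index=1):
    """Build a simple triangular pointer: opaque where x <= y."""
    return [
        [encode_pixel(lut_index) if x <= y else encode_pixel(0, transparent=True)
         for x in range(size)]
        for y in range(size)
    ]


pointer_16 = pack_pointer(make_wedge(16))
pointer_32 = pack_pointer(make_wedge(32))
print(len(pointer_16), len(pointer_32))
```

Under these assumptions a 16×16 pointer occupies 128 bytes and a 32×32 pointer 512 bytes, so both A and B fit comfortably in a small corner of the frame buffer.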

There’s also a 1-bit overlay bitmap (think OSD): any X×Y size, any on-screen position, any single color (or use it as a mask). That makes things like “PLAY>” overlays, menus, or a blinking text cursor trivial. Real-world example: I’ve used it to cover the screen in black while I redraw the frame buffer underneath, then unblank it. That keeps sync happy, doesn’t touch VRAM, and essentially emulates the 8301 screen-blanking behavior via a register.

The chip also has a hardware rectangle fill. You give it a rectangle and a color; it fills VRAM while the CPU carries on. That maps nicely to window clears and other chunky UI ops where software loops are slow. The catch: if the OS keeps a shadow copy of the frame buffer, you still need to mirror the change there, so the classic strategy is “do the hardware fill right now for instant visual response, then let the CPU update the shadow copy afterward.” You don’t keep the full throughput win, but you do keep the perception win.
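The "fill now, mirror later" strategy can be modelled in a few lines. This is only a simulation under stated assumptions: `FillModel`, `fill` and `sync_shadow` are hypothetical names, the `vram` list stands in for what the hardware fill would do, and the queue stands in for whatever deferred work mechanism the OS would really use.

```python
from collections import deque


class FillModel:
    """Toy model: instant 'hardware' fill plus a lazily updated shadow copy."""

    def __init__(self, width, height):
        self.vram = [[0] * width for _ in range(height)]
        self.shadow = [[0] * width for _ in range(height)]
        self.pending = deque()  # fills the shadow copy has not seen yet

    @staticmethod
    def _paint(buffer, x, y, w, h, colour):
        for row in range(y, y + h):
            buffer[row][x:x + w] = [colour] * w

    def fill(self, x, y, w, h, colour):
        # Stands in for the register writes that start the hardware fill.
        self._paint(self.vram, x, y, w, h, colour)
        # Remember the rectangle so the CPU can mirror it afterwards.
        self.pending.append((x, y, w, h, colour))

    def sync_shadow(self):
        # The slow CPU pass, run when there is time to spare.
        while self.pending:
            self._paint(self.shadow, *self.pending.popleft())


model = FillModel(8, 4)
model.fill(1, 1, 3, 2, 5)
visible_before_sync = model.vram[1][1]
shadow_before_sync = model.shadow[1][1]
model.sync_shadow()
```

The screen changes as soon as `fill` returns, while the shadow copy only catches up in `sync_shadow`, which is the perception win described above.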

Some concrete numbers for a 960×540 screen at 4 bpp (2 pixels/byte): that’s 259,200 bytes to write for a full-screen solid fill. On a 7.5 MHz 68008 with an 8-bit bus and zero wait states, a very tight assembly loop takes about 0.32 s. The hardware rectangle fill does it in about 0.028 s. At 25 fps, that’s roughly 8 frames vs 0.7 frames. You can see the difference. With a 16-bit 68000 at 30 MHz this shrinks a lot, but then the video chip’s FIFO starts to become the bottleneck.
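For anyone who wants to check the arithmetic, a short calculation reproduces the figures. The timings (0.32 s and 0.028 s) are taken from the paragraph above; the cycles-per-byte and speed-up values are simply derived from them, not measured.

```python
WIDTH, HEIGHT, BPP = 960, 540, 4
CPU_HZ = 7_500_000        # 7.5 MHz 68008
FPS = 25

SOFTWARE_FILL_S = 0.32    # quoted time for the tight assembly loop
HARDWARE_FILL_S = 0.028   # quoted time for the hardware rectangle fill

fill_bytes = WIDTH * HEIGHT * BPP // 8
cycles_per_byte = SOFTWARE_FILL_S * CPU_HZ / fill_bytes
frames_software = SOFTWARE_FILL_S * FPS
frames_hardware = HARDWARE_FILL_S * FPS
speedup = SOFTWARE_FILL_S / HARDWARE_FILL_S

print(f"bytes to fill:        {fill_bytes}")
print(f"CPU cycles per byte:  {cycles_per_byte:.2f}")
print(f"frames (software):    {frames_software:.1f}")
print(f"frames (hardware):    {frames_hardware:.1f}")
print(f"speed-up:             {speedup:.1f}x")
```

The software figure works out to a little over 9 CPU cycles per byte written, and the hardware fill is roughly 11 times faster.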

I’m working on adding blit capabilities next: copying regions within the display and to/from off-screen VRAM. That would let me cache the background under windows and restore it instantly when they move/close. Conceptually this is cloning a VIDC20-ish feature set and extending it. Outputs today: VGA, plus a digital video port that looks a lot like HDMI (but is not called that).

OS integration: I think these are “relatively difficult” (extensive work!) to fold into SMSQ/E, and “harder” (almost but not quite impossible) to spoon into Minerva. A common driver for both would be lovely if doable. I’ve got Aurora-mode compatibility to the point where Aurora drivers/GD2 just work in 8 or 16 bpp modes. All non-packed-pixel formats work in any resolution. 800x600 @ 65 fps or 1024x768 @ 50 fps are nice. On an HD TV, 960 x 540 @ 25 fps upscales exactly 2:1 and looks incredible. Truly rock solid corner to corner.

To make this approachable without OS changes, I’m writing an assembly library and BASIC procedures/functions that wrap the features.

Hardware status: it already plugs in and makes pictures, and it can act as a second screen on a BBQL with some caveats. I page the 2 MB frame buffer in 32 KB windows into an expansion slot region, and I map the (extensive) register block into internal I/O. That paging dance is because the QL’s 68008 only has a 1 MB address space. That’s what originally pushed me toward a 68SEC000 to get a 4 MB map. As a standalone video card, it probably wants its own local CPU; once you do that, it goes much, much faster.
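As a rough sketch of the paging dance: a linear frame-buffer offset splits into a page number and an offset inside the 32 KB window. The function names are hypothetical, and how the page number actually gets written to the card (which register, which address) is deliberately left out because it is not described here.

```python
FRAME_BUFFER_SIZE = 2 * 1024 * 1024   # 2 MB frame buffer
WINDOW_SIZE = 32 * 1024               # 32 KB window in the expansion region
PAGE_COUNT = FRAME_BUFFER_SIZE // WINDOW_SIZE


def locate(fb_offset):
    """Split a linear frame-buffer offset into (page, offset_in_window)."""
    if not 0 <= fb_offset < FRAME_BUFFER_SIZE:
        raise ValueError("offset outside the frame buffer")
    return divmod(fb_offset, WINDOW_SIZE)


def pixel_offset(x, y, width=960, bpp=4):
    """Byte offset of a pixel in a packed-pixel mode (illustrative only)."""
    return (y * width + x) * bpp // 8


last_row_page, last_row_offset = locate(pixel_offset(0, 539))
print(PAGE_COUNT, last_row_page, last_row_offset)
```

With these numbers there are 64 pages in total, and a 960×540 screen at 4 bpp spans pages 0 to 7, so a full-screen software redraw has to switch the window at least seven times.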

This design is meant to carry me from 68000 → 68030. It’ll “work” on 68040/060, but eventually the CPU outruns the video chip. I plan to add a 16-bit FIFO when that bottleneck shows up. Right now, writes are roughly 3× faster than reads.

A few things I’d love feedback on:

* Does anyone have a definitive description of how the QL cursor blink is actually implemented (byte vs word granularity, any “flash bit,” whether it stashes original bytes or repaints each toggle)?
* Best hook points in SMSQ/E and Minerva to drive: hardware pointer enable/move, overlay updates for OSD/menus/cursor, and rectangle fills (plus a lazy shadow-buffer update afterward) — without tearing up existing code paths. I suspect these are delegated to extension functions, which is ideal as it keeps SMSQ/E consistent across branches.
* Edge cases worth testing: mixed depths, odd widths, partial-byte cursor overlays, etc.

If anyone wants pseudocode of what I'm implementing, I can post that soon. So far I've been focusing on the bring-up register sequences for initial configuration, not on later processes like primitives or altering LUTs. I need to triple-check that and clean it up.

Thanks for reading this far!

Post by **NormanDunbar** » Mon Sep 08, 2025 7:23 am

Morning Dave,

Interesting post. And way above my knowledge, but I'll be following along.

Cheers,
Norm.

Post by **janbredenbeek** » Mon Sep 08, 2025 8:58 am

It's actually a scheduler task that flashes the cursor by printing and unprinting a rectangle on the current position using OVER -1 (XOR) impressing... nothing to do with hardware.
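The print-and-unprint trick works because XOR is its own inverse. A minimal model (a plain Python grid standing in for screen memory, not actual QDOS code; the grid size and cursor position are made up) shows the second XOR restoring exactly what was there:

```python
def xor_rect(screen, x, y, width, height, mask):
    """XOR a rectangle of pixels in place, as OVER -1 printing would."""
    for row in range(y, y + height):
        for col in range(x, x + width):
            screen[row][col] ^= mask


# Arbitrary 3-bit background pattern so the restore is easy to verify.
screen = [[(r * 7 + c * 3) % 8 for c in range(16)] for r in range(12)]
before = [row[:] for row in screen]

xor_rect(screen, 5, 1, 6, 10, 0b111)      # cursor "on": 6x10 block inverted
cursor_visible = [row[:] for row in screen]

xor_rect(screen, 5, 1, 6, 10, 0b111)      # cursor "off": same call again
after = [row[:] for row in screen]

print(after == before)
```

Nothing has to be saved between blinks, but every affected pixel is still read, modified and written back on each toggle.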

Minerva has fixed the edge glitches and allows the colour and shape of the cursor to be modified - e.g. POKE !124!51,76 gives you a flashing underscore.

Post by **tofro** » Mon Sep 08, 2025 11:00 am

SMSQ/E has a more sophisticated cursor handling:

* The cursor is no longer a simple rectangle, but rather a loadable sprite (i.e. free-form, as long as it's 6x10).
* Because it's a sprite, it can be more than just a simple red blob, but anything within the limits of screen depth (i.e. multi-colour).
* Cursors can be modified on a per-job basis.

Cursor sprite support can be switched off, so there is a fallback to "traditional" cursor handling.

Post by **ql_freak** » Tue Sep 09, 2025 12:01 am

XOR painting is great! I miss this in e.g. Python's Tkinter (at least I have not found a way to do it). On the QL it's relatively easy to paint a hair wire which you can move over the screen. First print it to show it (all pixels below it are inverted); to move it, print it again (which restores all pixels) and then print it one pixel to the left (right) and/or top (bottom).

Post by **Dave** » Tue Sep 09, 2025 8:11 pm

NormanDunbar wrote: And way above my knowledge, but I'll be following along.

My peak is quite tall but very narrow. I'd trade this for your broad deep pool of knowledge any day. The breadth of your knowledge is far more useful to you.

janbredenbeek wrote: It's actually a scheduler task that flashes the cursor by printing and unprinting a rectangle on the current position using OVER -1 (XOR) impressing... nothing to do with hardware.

Excellent poke! The aspect I am taking issue with is that the XOR (nice to know the mechanism) requires a read-modify-write cycle across the 6-pixel width of the cursor, which could be at any arbitrary co-ordinate. In mode 8 the cursor is three words wide and 10 rows deep, so that means at least 30 words of read-modify-write across the 8-bit bus. That's before the loop, counter and offset calculations.

The cursor is the lighter case. The pointer isn't redrawn every half second, but every time it moves. It isn't XOR'd; instead, a more complex series of cycles occurs: write stored copy, read new position, store copy, modify original, write back, return. This is by itself no problem, except that the code runs every time the mouse moves, which could be on every maintenance poll.

Now unroll the number of words/lines that need to be written, read and written again for an arbitrarily located sprite. That sprite could be up to 64x48 pixels: 33 words * 48 lines = 1584 words that need to be read and written, then the new location read, stored, mask-edited and rewritten. Every word takes two byte-sized transactions, so that is 3168 bytes experiencing multiple operations. That's quite expensive. I wrote a tight shortcut loop and cycle-counted it at 285,122 clocks, which equates to roughly 38.0 ms. That's just the read, write, modify, write cycles, without any location calculations or nested loops. I am allowing for the fact that if the pointer moves up or down, the newly covered rows need less processing than the rows that are altered again.
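The numbers above can be re-derived in a few lines. The word counts and the 285,122-clock figure are from the post; the 7.5 MHz clock is the one quoted earlier in the thread, and the 50 Hz poll rate is an assumption added here purely for comparison.

```python
CPU_HZ = 7_500_000          # 7.5 MHz 68008, as quoted earlier in the thread
POLL_HZ = 50                # ASSUMPTION: one maintenance poll per 50 Hz frame

cursor_words = 3 * 10       # mode 8 cursor: 3 words wide, 10 rows deep
sprite_words = 33 * 48      # worst-case pointer sprite
sprite_bytes = sprite_words * 2   # each word is two transfers on the 8-bit bus

measured_clocks = 285_122   # cycle count quoted for the tight loop
move_ms = measured_clocks / CPU_HZ * 1000
clocks_per_byte = measured_clocks / sprite_bytes
poll_budget_ms = 1000 / POLL_HZ

print(f"cursor words per blink: {cursor_words}")
print(f"sprite words / bytes:   {sprite_words} / {sprite_bytes}")
print(f"time per pointer move:  {move_ms:.1f} ms")
print(f"clocks per sprite byte: {clocks_per_byte:.1f}")
print(f"poll interval (assumed): {poll_budget_ms:.0f} ms")
```

If the 50 Hz assumption holds, a worst-case move costs nearly two poll intervals of CPU time, which is the motivation for replacing it with a couple of register writes.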

What I'm doing is a one time set-up of the cursor/pointer (writes only) and then moving it is as simple as two byte writes. Changing the pointer to the 2nd pointer is one byte write.

The loss for SMSQ/E would be the ability to have rainbow cursors/pointers but they would still be 3-colour, which most examples I've seen are anyway.

Now, I don't want this to sound like a rant. It goes on a bit and explains the inner workings of the comparison. This idea is just being floated as an option someone could use if they wanted to speed up, say, a pointer-using game. It becomes especially useful if the QL has dual screens side by side and you want to move the pointer between the two screens smoothly. By the way, a video of dual screens will probably be my first demo: BASIC programming in one display, with the results in the other display and the editor open the whole time, would be very, very cool. Especially as it's fairly easy to do this with only program support. It's purely a "wouldn't it be nice if..." What we have works and is ingenious. I'm not knocking it. There's just a leaner way to get a better-looking result for those who might want to use it.

I've been writing a game for several years with the premise that you're a zombie apocalypse survivor. It's multi-player, online. Each round, the players are zombies except the winner of the previous round. The zombie that catches the person is the person next round. To make things more dynamic, if there are more players then it increases the number of live people. 6+ players, 2 survivors, 9+ players, 3 survivors. Always 2x the zombies to survivors. I'm writing this to run on a few different retro platforms. It's the only way I can be sure there are enough players most of the time to make gaming possible, even enjoyable. The ability as a developer to code in a large window and compile/run the game on a separate display with the ideal resolution/colour depth is very appealing to me. The complexity of this is simply editing the frame buffer pointer between the two areas to gain write access to each. The OS doesn't even know. If the program doesn't detect the 2nd screen it knows it's in straight game mode and doesn't do any of the logging or 2nd screen window set-up.

tofro wrote: Mon Sep 08, 2025 11:00 am SMSQ/E has a more sophisticated cursor handling:

* The cursor is no longer a simple rectangle, but rather a loadable sprite (i.e. free-form, as long as it's 6x10).
* Because it's a sprite, it can be more than just a simple red blob, but anything within the limits of screen depth (i.e. multi-colour).
* Cursors can be modified on a per-job basis.

Cursor sprite support can be switched off, so there is a fallback to "traditional" cursor handling.

This is cool. This is all implemented in software. There is no sprite hardware in the QL. There might be on the Q68 or QIMSI, but I don't know. My implementation doesn't have sprites implemented. I do have early blit functionality started, and supporting pseudocode. This is powerful because I can preload squares of game icons into non-display frame buffer areas, then blit them into the visible area instantly with a few register writes. The writes are the same size and timing regardless of blit size. The video chip is running at 50 MHz so the blit will take the same time on any platform with any CPU speed. Right now it's core pseudocode and some buggy BASIC while I work out how it'll work. It has to be easy to use. It has to be callable from BASIC or assembly and I'll have to create a C library at minimum. And documentation.

Again, I'm not saying OMG BEST IDEA EVER WE MUST DO THIS! I'm just saying there's this approach I am starting to write into my code and it's noticeably quicker. It's a subjective feel thing. The slower the machine the more of a difference it makes.

Just collecting people's feedback on what makes sense to do, how things work, and possible other ideas of how it can be used. E.g. if there's just one good guy and one bad guy, the cursor and pointer can be used together, overlaying any arbitrary backdrop. I just can't put them in multiple positions unless I interlace one by moving it per frame to the various locations, and it honestly looks horrible.

I'm researching HDMI capture devices that are economical and work with a Mac. I'd like to get some very high quality screen captures when it's a bit further along to show how truly crisp it is. It's like cheating. It's just these damned bugs I need to iron out - mostly on the SGC clone side with some subtle timing issues I'm polishing off.

The Sinclair QL Forum
