I would agree if SLAVEing were an inherent part of sound generation. But it's not: with SLAVEing eliminated, sound generation is fine. E.g. it can be circumvented by using a ramdisk or FS.LOAD if the file is small enough, or by temporarily allocating memory and releasing it after the MIDI file has been played. tofro wrote: Tue Aug 19, 2025 12:06 pm Well, the exact reason isn't really relevant here, in my opinion: my point still stands: if sound output eats more than 50% of your CPU, there's only 50% left to generate sound. And somehow it must be generated; that's what people tend to forget.
Hardware programmable timers
Re: Hardware programmable timers
Re: Hardware programmable timers
This exposed an indeterminate behaviour in my implementation. I added a reset counter step to my Verilog so the behaviour will at least be consistent (a minimal C model of that fix appears at the end of this post). Nasta wrote: Tue Aug 19, 2025 6:09 pm The problem with such a system resource is, who gets to decide the correct divisor and what happens when it is changed while the internal division counter is in any given state - for instance, if it is a count-down type timer and someone changes the reset value (which is the divisor) from say 100 to 10 while it was at state 50, will it now count from 50 all the way to 255, loop back to 0 and then reset at the new value of 10?
System timers used for base timing are NEVER a 'public' resource, where any old job can change their frequency, rather they are always system only. That being said, any application specific hardware can implement their own timer, assuming there is an interrupt available to service it (although in theory it might be polled in a faster polling loop too...).
My take is that very few programs/tasks need a configurable timer and most will just trigger off the default interrupt. The likelihood of two different programs wanting the programmable interrupt/timer at the same time is low. A program can always tell whether a particular timer is already in use by its non-zero value. If someone writes/ports a sequencer and someone else writes a piece of software that might conflict with it, I'd consider that a huge success - and a problem for the programmers to get together and solve. If they wanted to, they could add their handler code to the end of the other program's handler code. It strikes me that if the situation were to arise, it would likely either be the same programmer, or two programmers who know each other well and have done for the last 40 years.
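To illustrate the "reset counter" fix mentioned at the top of this post, here is a minimal C model of the intended behaviour (names and widths are my own, purely illustrative): writing a new divisor also reloads the count, so a mid-count change never forces the counter to run all the way round.

#include <stdio.h>

struct divider {
    unsigned char reload;  /* programmed divisor */
    unsigned char count;   /* current down-counter state */
};

/* Writing the divisor also resets the counter - the added step. */
static void write_reload(struct divider *t, unsigned char value)
{
    t->reload = value;
    t->count  = value;
}

/* One timer clock; returns 1 on each terminal count. */
static int clock_tick(struct divider *t)
{
    if (t->count == 0) {
        t->count = t->reload;
        return 1;
    }
    t->count--;
    return 0;
}

int main(void)
{
    struct divider t;
    write_reload(&t, 100);
    for (int i = 0; i < 50; i++) clock_tick(&t);   /* counter is now at 50 */
    write_reload(&t, 10);                          /* change mid-count: restarts at 10, */
    return 0;                                      /* never counts through 255          */
}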

Still better than the current situation. The 68SEC000 in 16-bit mode at 30 MHz is theoretically 8x faster than the 68008. In the real world it comes in fairly close behind a SuperGoldCard and well ahead of a GoldCard - something like 6.5-7x in practice. A large part of the gain comes from the CPU and IO being decoupled from video generation, as is the intent of Issue 6 and 7. Nasta wrote: Tue Aug 19, 2025 6:09 pm So this is where we get to interrupt latency... which is btw a potential problem with vectored interrupts, as vectors are still shared between the available interrupt levels, so multiple interrupting devices on the same level but with different vectors still need prioritizing. More on that later...
This is for the 50 Hz basic services interrupt. Those calls are baked into that interrupt already. The notion of them being serviced at 4096 Hz is quite amusing to me. I'm already baking my noodle on how the network is supposed to work when not only is the CPU a different speed but it also changes speed. I suspect that a neat, crisp separate timer, with its own sparsely utilized interrupt level and solo handler code, is what's right for a Cubase port or similar. I'm sure there are tons of other uses, but again I can't see many of them being used at the same time. It also seems incumbent on any sequencing software to start its timer when play or record are asserted and to stop the timer when they are de-asserted. The timer being very easy to adjust and start/stop makes for a lot of utility. E.g. if the user wants to change the beats per minute, they can achieve this by setting the timer to, say, 128x or 256x the BPM rate (a small worked example follows below). Nasta wrote: This does not work in a multitasking environment where multiple programs may want to use the timer. One taking over another without the other being aware of that would potentially result in instability. The way this is normally done, and in fact is under QDOS, is that the (in this case auto-)vector is pointing to a linked list of interrupt service routines, which can be linked into by any program that wants to use the interrupt.
There is also the question of how exactly the interrupting hardware is serviced, which comes down to how the CPU implements interrupts. Again, more on that later. For now, the interrupt vector points to some code to properly address interrupt acknowledge, then jumps to the linked list of tasks the interrupt causes to be performed, which each are parts of code of other 'programs'.
From memory, new interrupt tasks are linked at the start of the list but since the list is in a known format any linking software can follow the list and link wherever. The order of course determines the priority of servicing.
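As a small worked example of the "128x the BPM rate" idea above (my own arithmetic, with a hypothetical timer clock): at 120 BPM and 128 ticks per beat you need 120/60 x 128 = 256 interrupts per second, so a 1 MHz timer clock would want a divisor of about 3906.

/* Hypothetical helper: divisor for a down-counting timer clocked at timer_clock_hz. */
unsigned long timer_divisor(double bpm, unsigned ticks_per_beat, double timer_clock_hz)
{
    double ticks_per_second = bpm / 60.0 * ticks_per_beat;            /* e.g. 120/60 * 128 = 256   */
    return (unsigned long)(timer_clock_hz / ticks_per_second + 0.5);  /* e.g. 1e6 / 256 ~ 3906     */
}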
Lovely wrinkle. Nasta wrote: Tue Aug 19, 2025 6:09 pm So imagine this scenario:
The IPC pulls low IPL02L, causing int 5, and then (since there is no way for the 8302 to know the status of the IPL02L pin) the 8302 pulls IPL1L low, so now what was supposed to be int 5 with int 2 pending has just become int 7, which has interrupted the int 5 servicing routine 'somewhere', and the int 7 routine is now supposed to untangle who actually caused what level interrupt and emulate the proper response, i.e. this is actually a false non-maskable interrupt. What is worse, both int 5 and/or int 2 could have been masked because other important processing is being done, and all of a sudden you have a non-maskable interrupt, which WILL be processed.
Short version: only level 2 is usable as it stands now, DO NOT USE the IPC pins to cause an interrupt directly, or for that matter the IPL lines on the J1 bus, you are in for a world of hurt.
As I have it currently, the interrupt is on the external card and doesn't affect the QL IO area, so it's not passed through. The QL's 50 Hz interrupt is passed through to the expansion card (which, remember, has a full 68000 and its own RAM). I suppose the 50 Hz interrupt could be remapped to a new priority level and have that reflected by editing the vector table. I am using a GreenPAK PLD as an interrupt handler. This does limit me to a 12-bit counter, however. Beyond prototyping I'm using a Tang 20K to implement video, IO and an MC68901-style interrupt handler/timer. I'm open to suggestions. Just trying to get the sound chip working for MIDI file playback first, trying different methods after finding jitter problematic.
I have scribbled this down. [Edit: delete comments that were private - forgot where I was typing. Note to self: this is not an email to Nasta!] Nasta wrote: Tue Aug 19, 2025 6:09 pm I even proposed some changes to the J1 bus along these lines - instead of exposing the IPL lines on the bus (which actually is NOT usable, for the same reasons as explained above with the IPC scenario), re-use them as inputs to the interrupt level encoder.
One possibility would be:
INT 2 is available through the EXTINTL pin
IPL1L can become an int 5 request
IPL02L can become an int 7 request.
[More deletion]
Anyhow, back to learning what works and doesn't regarding interrupts from people on the forum. I've got some good feedback that may well push me in a different direction.
Re: Hardware programmable timers
Hi!
Excellent thread and very helpful to know more about the low level architecture of the QL.
I am also working on my own sound card implementation for the QL, but from the POV of someone who doesn't know the architecture as you do, and who hasn't programmed the QL at a low level either. So my project is not only to provide a sound card for the QL, but also a way to get to know the system better.
That said, excuse me if I say something too dumb.
My approach is pretty different from what was written above. I rely on existing solutions applied to the ZX Spectrum by other Spanish developers.
What is the open question here? What would be more helpful for programmers using your development? Solving the clock issue that you have explained?
BTW, is your project OpenHardware?
Re: Hardware programmable timers
My project will be open source on release. Anything I do that isn't open source will become so on my death. 
The more the merrier. My sound card is a synth on a chip. It's not an 80s sound chip like on every computer in 1984 - instead, it's a late 80s chip from the heart of a Yamaha DX synthesizer. This was the foundation of General MIDI and the XG50 wavetable synthesis sound cards that became popular 1989 through 1996 or so. My sound chip has very strict timing requirements to sound right when replaying MIDI files. 80s sound chips just react "now" to their instruction "now", so they're ideal for game sounds and game music.
A sound card is a really good way to learn about the QL expansion system.

Re: Hardware programmable timers
Wouldn't it be possible to implement a hardware FIFO for the audio samples which can hold data for over 20 ms and feed the synthesizer directly at the intended sample rate? That would guarantee lowest jitter, eliminate the need for an extra timer interrupt and provide the best performance on a low-end machine. (Just asking, I know that it might be too complex without FPGA.)
Re: Hardware programmable timers
If you look at, for example, PC sound cards like the SoundBlaster, these provide two buffers where one can be clocked out by the sound card while the other is being filled by the CPU in parallel. An interrupt is generated when the SoundBlaster starts to operate on a new buffer (thus, the other one is done and can now be accessed by the CPU). Exact timing is guaranteed and handled by the sound sub-system itself, and the CPU timer has much looser requirements on jitter. Peter wrote: Thu Aug 21, 2025 9:52 am Wouldn't it be possible to implement a hardware FIFO for the audio samples which can hold data for over 20 ms and feed the synthesizer directly at the intended sample rate? That would guarantee lowest jitter, eliminate the need for an extra timer interrupt and provide the best performance on a low-end machine. (Just asking, I know that it might be too complex without FPGA.)
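For the record, a bare-bones sketch of that double-buffer scheme (names invented for illustration, not actual SoundBlaster programming): the card plays one half while the CPU refills the other, and the "buffer done" interrupt swaps the roles.

#define HALF 1024                       /* samples per half-buffer */

static short dma_buf[2][HALF];          /* the two halves the card cycles through */
static int   cpu_half;                  /* which half the CPU may write right now */

extern void fill_audio(short *dst, int n);   /* hypothetical: renders the next n samples */

/* Raised by the card each time it starts playing the other half. */
void buffer_done_interrupt(void)
{
    fill_audio(dma_buf[cpu_half], HALF);     /* refill the half the card just finished */
    cpu_half ^= 1;                           /* swap roles for the next period         */
    /* ...acknowledge the interrupt at the card here... */
}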
Re: Hardware programmable timers
Quick answer to the last two posts: this is true for reproducing audio samples. By the time PC plug-and-play came into the mainstream, practically all sound chips had some form of FIFO on the ADC and DAC, i.e. digital PCM audio data in or out.
MIDI is a 'bit' different, in that it can be used as an 'immediate event' serial stream - a 'note on' code received is immediately played. In most cases, the synthesizer core on a PC sound chip is capable of more or less interpreting MIDI data directly, with some repacking, plus some handshaking has to be done through the relevant registers to load the data. Timing, however, is generated/calculated from specialized data in the file, based on a time reference. This data has to be loaded into the relevant sound chip registers according to that timing, as exactly as possible, as it reflects what the chip is supposed to be doing from that moment on.
The way I understand what is being said, there is a need for a stable timing reference that would be able to generate a low-latency interrupt to load data retrieved from the MIDI file into the relevant synth chip control registers. Any FIFO action therefore actually happens before this point; what we are talking about is the output side of the FIFO.
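A rough sketch of that output side, i.e. the timer-driven delivery of file events to the synth registers (all names are illustrative, not an actual driver): the timer interrupt advances a tick counter and writes out every event whose time has come, which is exactly where jitter becomes audible.

struct midi_event {
    unsigned long tick;                 /* absolute time of the event, in timer ticks */
    unsigned char status, data1, data2; /* raw MIDI bytes to hand to the synth        */
};

extern void synth_write(unsigned char status, unsigned char d1, unsigned char d2);

/* Cursor into the parsed file; the list ends with a sentinel whose tick is (unsigned long)-1. */
static const struct midi_event *next_event;
static unsigned long song_tick;

/* Called on every timer interrupt while playing. */
void playback_tick(void)
{
    song_tick++;
    while (next_event->tick <= song_tick) {
        synth_write(next_event->status, next_event->data1, next_event->data2);
        next_event++;
    }
}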
Re: Hardware programmable timers
Yes. I have always implemented D2A channels with 2Kx9 FIFOs, using the 9th bit as a signal-present marker. If I implement analog audio on this board, that is how I would do it. 2Kx9 at 48 kHz allows the hardware to be serviced as little as 35-40 times per second. Peter wrote: Thu Aug 21, 2025 9:52 am Wouldn't it be possible to implement a hardware FIFO for the audio samples which can hold data for over 20 ms and feed the synthesizer directly at the intended sample rate?
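For the curious, the arithmetic behind those figures (assuming 2048 samples drained at 48 kHz): a full FIFO holds about 42.7 ms of audio, so topping it up 35-40 times a second (every 25-29 ms) leaves a comfortable margin.

#include <stdio.h>

int main(void)
{
    const double depth = 2048.0;      /* "2Kx9": 2048 samples, 9th bit = data-present flag */
    const double rate  = 48000.0;     /* samples drained per second                        */
    printf("full FIFO holds %.1f ms of audio\n", depth / rate * 1000.0);  /* ~42.7 ms                         */
    printf("service interval at 40 Hz: %.1f ms\n", 1000.0 / 40.0);        /* 25 ms < 42.7 ms, so no underrun  */
    return 0;
}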
The chip I am working with has a microphone/line input and headphone/line output but does not have onboard sample audio capabilities. It does have a [serial|parallel] port that can be read/written as straight MIDI data. Incoming data from MIDI IN is parallelized and passed thru the RX register to the computer for capture or replay back to the chip's TX port.
The MIDI data to the chip is handled instantly. It is this data that needs very low jitter.
If I DO implement analogue sound sampling/replay it will only be because I implement the MIDI thru and out channels with a $1 microcontroller. A2D and D2A audio would be implemented as spare capacity on that microcontroller. That's a stretch goal and not important to the basic module.
For clarity, the chip I am using is the SAM2695. It's a very powerful single chip MIDI synthesizer used in the Yamaha TX7. There are two makers of this, and I am using the French "Dream" version which correctly implements General MIDI, chorus/echo/reverb/pan effects, and XG50 secondary sound set.
Datasheet: https://docs.dream.fr/pdf/Serie2000/SAM ... AM2695.pdf
Re: Hardware programmable timers
So, a bit more on interrupts:
Again, some notes on how the 68k interrupt system works:
When a non-masked interrupt level is recognized on the IPL lines, the 68k will start exception processing (exceptions being a much wider category than interrupts, but in this case interrupts are what we are discussing).
An interrupt can be looked at as a 'jump to subroutine' caused by an external event, so the CPU has to do:
1) Acknowledge the interrupt and, through that mechanism, figure out which device caused it, so it knows which 'subroutine' to jump to, i.e. the one expected to handle the interrupt
2) Store the current CPU context (i.e. remember where it stopped normal program execution)
3) Jump to the interrupt handler 'subroutine'
4) On completion of the interrupt handler, restore the CPU context (i.e. go back to where it was interrupted).
For this particular point in the discussion, let's look closely at point (1):
To acknowledge an interrupt, the CPU executes a special read cycle, with a special encoding on the FC and address lines, the encoding also contains the interrupt level that is acknowledged. The interrupting device is expected to respond in one of two ways:
a) with a 1-byte exception vector, the value of which is multiplied by 4, and this number is taken as the address from which to fetch the address of the interrupt servicing routine.
b) with an 'autovector signal' (the implementation of which varies on different 68k CPU family members), which forces the CPU to calculate the vector from the interrupt level being serviced, as 24 + interrupt level. This number is again multiplied by 4 and the result is used as an address from which the address of the servicing routine is read.
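As a worked example of (b): a level-2 autovector is vector number 24 + 2 = 26, so the CPU fetches the handler address from location 26 x 4 = 104 = $68 - which matches the QL case mentioned below. In C terms:

/* Vector table address for a 68k autovectored interrupt of a given level (1..7). */
unsigned long autovector_address(unsigned level)
{
    unsigned vector = 24 + level;          /* autovectors are vector numbers 25..31 */
    return (unsigned long)vector * 4;      /* each vector table entry is 4 bytes    */
}
/* autovector_address(2) == 0x68, the level-2 entry that QDOS points at its handler. */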
There are many subtleties to the way the 68k handles exceptions in general, which also filter down to how interrupts are handled. For instance, what if no device responds, so no vector byte nor autovector signal? There are provisions to re-run the vector request or default to vector 24 (the spurious interrupt vector). Also, there is a particular quirk of the 68k: although vector numbers 64 to 255 are defined as user interrupt vectors, an interrupting device can pass any interrupt vector, even vector 0, which is actually the initial address of the supervisor stack pointer at reset. This 'feature' is however used by dedicated 68k peripheral chips, which normally contain a register for the interrupt vector to be loaded into, so when the chip causes an interrupt, this will be used for interrupt acknowledge - on devices that implement this, it is required that the initial value of the register is 15, so if a device causes an interrupt out of turn while not being fully or properly initialized, there is a default vector which an OS can 'catch' to signal a programming error.
It should be noted that since multiple levels of interrupt exist, not just one maskable and one non-maskable as on older CPUs, recognizing a higher level than the one currently being processed will result in the current interrupt being interrupted by the higher level one, unless the current interrupt servicing routine has masked higher level interrupts. This may turn out to be a problem when adding higher level interrupt support on the QL.
All of this means it is not exactly possible to make a dead simple 'Sinclair style' interrupt system on the 68k - external devices are expected to encode the interrupt level, and in case several devices use the same level with vectors, external hardware has to decide which of the devices causing the interrupt should respond with a vector when the interrupt is acknowledged, as only one can do so. This system can implement very complex interrupt structures.
However, since we are talking about a Sinclair product, it is simplified as far as it could have been. Any interrupt acknowledge is automatically made to request auto-vectoring. BTW this is possible even if devices capable of vectored interrupts are used to cause interrupts - it's just that detecting which one did it has to be done in software. Also, as I said in a previous post, since no interrupt priority encoder has been implemented, there is really just one interrupt level that can be used.
So, how are multiple interrupt sources handled in this maximally simplified system? This is basically the same as on any older system - when an interrupt occurs, it is auto-vectored to an address defined in the vector table, in this case vector 26, which on the QL is a fixed address residing in ROM. This jumps to the interrupt handler. It is up to this piece of software to go through all sources of interrupt and detect why the interrupt happened, and decide from that what to do about it.
On the bare QL this happens by reading the interrupt register, which physically sits inside the 8302 ULA. There are several bits that will be set by the hardware requesting an interrupt; the handler has to read the register and then, depending on which bits are set, act on each of them in turn based on a priority hardcoded in the actual code, i.e. the order in which it checks them and acts on them. Also, once it has acted on a bit being set, it has to reset that bit according to a protocol depending on the actual hardware, to signal that it has handled that particular interrupt source.
As long as any of the interrupt bits are set, the interrupt pin on the 8302 remains active, so if some are not handled and the handler returns from interrupt processing, the remaining set bits will just cause another interrupt and get right back to interrupt processing. This last bit would be quite inefficient as it means all the 4 steps above will happen again, which requires time. The time that passes between hardware causing an interrupt and the interrupt being handled is usually referred to as interrupt latency and is a very important and contentious issue, especially for a real time OS - which is appropriate since QDOS/SMSQ has a lot of characteristics of a RTOS.
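A skeletal C rendering of that read-test-act-clear loop (the address and bit names here are placeholders for illustration, not the real 8302 register map): read the pending bits once, handle them in a hardcoded priority order, and acknowledge each one as it is dealt with.

#define INTR_REG (*(volatile unsigned char *)0x18021)  /* placeholder address for the interrupt register */

#define BIT_FRAME  0x08   /* illustrative bit assignments only */
#define BIT_SERIAL 0x04
#define BIT_EXTINT 0x01

extern void service_frame(void);
extern void service_serial(void);
extern void service_external(void);

void level2_handler(void)
{
    unsigned char pending = INTR_REG;                                       /* one read of the latched sources */

    if (pending & BIT_FRAME)  { service_frame();    INTR_REG = BIT_FRAME;  }  /* act, then clear */
    if (pending & BIT_SERIAL) { service_serial();   INTR_REG = BIT_SERIAL; }
    if (pending & BIT_EXTINT) { service_external(); INTR_REG = BIT_EXTINT; }

    /* any source still pending when we return will simply re-assert the interrupt */
}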
What is not so obvious is that the simpler interrupt system described above can be faster than a vectored interrupt system, and in general simpler to manage. One reason is that it skips the inefficiency of CPU context switching every time an interrupt vector has to be resolved - for instance if many devices are using the same interrupt level but supply a different vector, even with external hardware priority arbitration, the only way for the CPU to get the next vector from a device that has an interrupt pending, is to return from exception processing and then let itself be interrupted again.
In a simple system, the steps above are usually done as 1, 2 (where the common interrupt handler is invoked), then as many instances of step 3 as are needed to service all interrupts - by software maintaining a table, a linked list, or simply a hard-coded order of servicing routines - and then finally step 4. Some further efficiency can be gained by storing all but a small number of registers with important pointers, saving the context of the CPU at the point of interrupt, then running as many interrupt servicing routines as needed, each of which can consider all other registers free to use; when all of them are done, restore the registers and end exception processing.
In a multi-vectored system each routine would have to store as many registers as it needs, then restore them before finishing, possibly only for the next interrupt to store them again, etc. On a slow CPU this is quite important, especially since we are talking about 4-byte wide registers on a CPU with an 8-bit bus.
On faster, more advanced 68k family members this may actually be even more important, as there is more CPU context to be stored on the stack; also, internal operations will be much faster than external reads and writes, so the latter will rely on caching. While the faster CPU may well reduce latency in terms of time, it may NOT reduce it in terms of CPU cycles performed.
Yes, it is possible to do the same in a vector based system, but then one is exactly emulating a non-vectored system, by reading the vector register and calculating the vector manually, fetching the address and jumping to the next handler in turn.
There is a better way:
I mentioned latency, and from there we quickly get to the concept of interrupt priority management. In case of a hardware based solution, unless it's quite complex hardware, you have what you have. In case of a simpler system implemented with linked lists, the list is handled in order of linking, first links get handled first. In other words, the linking order defines the priority and largely (* <- remember this!) latency. The linking order can therefore be used to define priority in software.
The way this is implemented in general is that (in the QL's case) an autovectored interrupt jumps to a sort of skeleton handler, which implements a 'preamble' of sorts that any interrupt handler would otherwise have to do before actually handling the interrupt - basically step 2 from the beginning of this post - presenting each of the linked handlers with the same 'software interface'. It then goes through the linked list of handlers, normally providing a pointer to each handler's private data (stored in the list). Each handler in turn is called as a subroutine (that would be step(s) 3 above), and can basically do whatever it needs, with the restriction that it has to keep the list pointer intact. When the list is exhausted, there is a 'postamble' which then implements step 4 and returns from exception handling.
It is up to each sub-handler to know what it has to do with its hardware, to properly acknowledge, handle, and clear the relevant interrupt, including from multiple sources.
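In C terms the skeleton might look roughly like this (the link layout is invented for illustration; QDOS's real external interrupt list differs in detail):

struct int_link {
    struct int_link *next;        /* next handler in the chain (link order = priority)  */
    void (*service)(void *priv);  /* the sub-handler entry point                        */
    void *priv;                   /* pointer to the handler's private data              */
};

static struct int_link *int_list; /* head of the chain; first linked is served first    */

/* Reached via the autovector. Preamble (context save) has already happened. */
void common_interrupt(void)
{
    struct int_link *l;
    for (l = int_list; l != NULL; l = l->next)
        l->service(l->priv);      /* each sub-handler checks and clears its own hardware */
    /* postamble: restore context and return from exception */
}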
So, finally we come to the (*) above.
One of the hallmarks of a real time OS is that interrupt latency and processing times are well defined (at least as a range of best to worst case timings). The above is all assumed not to have been interrupted by a higher level interrupt. In that case, if the linked list is properly ordered and the handlers themselves are properly done with respect to the actual hardware, there is very good predictability.
The 68k will automatically raise the internal interrupt mask to the level of the interrupt currently being processed, so only a higher level interrupt can interrupt lower level interrupt processing. That being said, once any exception processing is started (including interrupts) the CPU will automatically be in supervisor mode, and there is nothing preventing an interrupt handler from altering the interrupt level mask to prevent itself from being interrupted while doing time-critical operations, or operations that must be atomic (usually some sort of handshake procedure required for proper interrupting hardware operation). The problem is, if higher level interrupts are implemented, assuming higher priority and lower latency in their handling might not be a given, if every lower level handler decides the first thing it does is raise the interrupt mask to level 7.
There is really no way to get completely around this, especially as legacy software can actually do this.
Basically, any interrupt structure can be mishandled, and handlers can misbehave; there is no bulletproof way around it.
What can be done is prescribe and expect some discipline in how handling of interrupts is done depending on level. While higher levels will be prioritized, it is expected also that the actual handling of said interrupts is as quick as possible. This should really be something along the lines of 'check status register, if needed move (X amount of) data from X to Y, clear interrupt, return'.
So, back to system timers:
If there is a hardware timer that generates a periodic interrupt at the system level, then, in order to prevent the actual timer read from introducing extra latency, there is a software interrupt count. Incrementing the count is handled in the 'preamble' (step 2 above) of the main handler, BEFORE anything else, to ensure the count is accurate. Sub-handlers can then read the count and compare it with what they read the last time around (kept in their local data space), see how many interrupts have passed, and based on that decide if any actual handling is needed - if not, return and do not waste any more time. Local counters can be kept as well.
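A sketch of that shared count from a sub-handler's point of view (names hypothetical):

static volatile unsigned long tick_count;    /* incremented in the preamble, before anything else */

struct my_private { unsigned long last_seen; };

void my_subhandler(void *priv)
{
    struct my_private *p = priv;
    unsigned long elapsed = tick_count - p->last_seen;   /* interrupts since we last did real work */

    if (elapsed == 0)
        return;                        /* nothing for us this time - get out fast */

    p->last_seen = tick_count;
    /* ...do work proportional to 'elapsed' ticks... */
}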
There are also hardware assisted solutions:
Said hardware timer from above also has a hardware counter, which can then be (only) read by anyone, to see how much time has elapsed since the handler was last called. This is to a large extent independent of the actual interrupt system, because the counter counts regardless of any interrupt being handled or masked - though if used, this has to be taken into account when writing and linking the handler.
Also, a hardware timer can be a part of the hardware that needs it to be properly serviced, and thus private to the interrupt handler for that hardware. How it is used, as a counter and/or interrupt timer, is then up to the designer, but it is not generally available to other programs, and cannot be meddled with by other code.
Oh, and I keep forgetting:
Interrupt vectors use up 3/4 of a K of data in the ROM. I've known a certain Tony Tebby who wrote a rather complex piece of code in that size of space and called it bloatware...
So back to Sinclair ways... the OS was barely squeezed into the available space.