Regarding an SRAM internal expansion:
Two such RAM chips (if a PCB was made, an SMD case would be perfect for small size), or even one larger 1Mx8 chip (they do exist but are quite difficult to find), can be used to make a very clever and fast RAM expansion similar to the Trumpcard (but without the floppy controller). Using a GAL decoder, this could implement full shadowing of the on-board video RAM, which will speed it up quite a bit.
The trick is in the decoding logic for DSMCL and the two chip selects for the RAM chips.
Since two 512k RAM chips have a total capacity of 1M, which is equal to the full address space of the 68008 in the QL, some of the actual RAM will not be used as such, because parts of the address map have to be left as is to decode the on-board (EP)ROM, IO, ROM slot, and perhaps a 16k slot for IO expansion (for a floppy controller or IDE interface).
This is how the decoding logic would work:
Inputs:                    Outputs:
A19 A18 A17 A16 A15 RWL    DSMCL      RAMCS0L  RAMCS1L  EPROMCSL  Function
0   0   0   0   X   X      high-Z     H        H        L         System ROM (or EPROM)*
0   0   0   1   0   X      *          *        H        *         Extended ROM or emulation
0   0   0   1   1   X      high-Z     H        H        H         QL internal IO + spare
0   0   1   0   0   0      high-Z     L        H        H         Screen 0 write
0   0   1   0   0   1      H          L        H        H         Screen 0 read
0   0   1   0   1   0      high-Z**   L        H        H         Screen 1 write
0   0   1   0   1   1      H          L        H        H         Screen 1 read
0   0   1   1   X   X      H          L        H        H         replaces internal top 64k
0   1   X   X   X   X      H          L        H        H         implements 256k expansion
1   0   X   X   X   X      H          H        L        H         additional 256k expansion
1   1   0   X   X   X      H          H        L        H         additional 128k expansion
1   1   1   0   X   X      H          H        L        H         additional 64k expansion
1   1   1   1   0   X      H          H        L        H         additional 32k expansion
1   1   1   1   1   X      H***       H        H        H         top 32k of IO area is free
* can be used for ROM emulation on a Minerva
** depends on use of screen 1, see below.
*** has an interesting side effect when no IO expansion is present
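For anyone who wants to check the table before burning a GAL, here is a rough Python transcription of it. This is only a sketch of the table as printed, not GAL equations: the function name decode and the letters 'Z' (for high-Z) and '*' (for the option-dependent entries) are my own shorthand, and rdwl is the RWL column (1 = read, 0 = write).

```python
def decode(addr, rdwl):
    """Return (DSMCL, RAMCS0L, RAMCS1L, EPROMCSL) for a 20-bit address.

    rdwl is 1 for a read and 0 for a write (the RWL column of the table).
    'H' and 'L' are logic levels, 'Z' stands for high-Z and '*' for the
    option-dependent entries in the extended ROM row.
    """
    a = (addr >> 15) & 0x1F          # A19..A15 as one number, one step per 32k
    if a <= 0b00001:                 # 00000h..0FFFFh: system ROM (or EPROM)
        return ('Z', 'H', 'H', 'L')
    if a == 0b00010:                 # 10000h..17FFFh: extended ROM or emulation
        return ('*', '*', 'H', '*')
    if a == 0b00011:                 # 18000h..1FFFFh: QL internal IO + spare
        return ('Z', 'H', 'H', 'H')
    if a in (0b00100, 0b00101):      # 20000h..2FFFFh: screen 0 and screen 1
        # writes leave the internal RAM enabled, reads disable it
        return ('H' if rdwl else 'Z', 'L', 'H', 'H')
    if a <= 0b01111:                 # 30000h..7FFFFh: rest of the first chip
        return ('H', 'L', 'H', 'H')
    if a <= 0b11110:                 # 80000h..F7FFFh: second chip
        return ('H', 'H', 'L', 'H')
    return ('H', 'H', 'H', 'H')      # F8000h..FFFFFh: as listed (H*** row)


for addr, rdwl in [(0x00000, 1), (0x20000, 0), (0x20000, 1), (0xF7FFF, 1)]:
    print(f"{addr:05X}h rdwl={rdwl}", decode(addr, rdwl))
```

The screen rows are the only ones where RWL matters, which is why everything else ignores the rdwl argument.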
Now, there are more clever ways to set up the decoder, for instance so that one 512k and one 128k chip can be used. Address lines can also be jiggled around a bit to simplify the logic and possibly fit it into a smaller GAL while still including more options.
What does this decoder actually do?
Well, it maps the first 512k SRAM chip into the bottom 512k of the QL's memory map (addresses 00000h..7FFFFh) and the second 512k SRAM chip into the top 512k (addresses 80000h..FFFFFh), but neither in its entirety. It 'skips' the required QL internal bits and pieces, and does this with some cleverness regarding the internal RAM, to gain speed.
The second chip decoding (RAMCS1L) is easier to understand - it simply puts the chip in the top 512k of the entire QL address map except for the very top 32k at F8000h..FFFFFh, where the last two 16k IO expansion 'slots' are, so that something like a floppy interface and/or IDE/CF interface can be connected and mapped there. The SRAM chip is prevented from responding at those addresses by keeping its chip select signal inactive (high). However, there is a small catch regarding DSMCL which I will explain below. In any case the part of the SRAM chip that would map there is not used.
The first chip decoding (RAMCS0L) is a bit more involved, as more areas need to be left unmapped so that the required QL internal bits appear at the proper places. Although the SRAM chip maps into the bottom 512k of the address map, some addresses are not used: the chip is prevented from responding by keeping its chip select signal inactive, and the DSMCL signal is kept inactive when these addresses appear, leaving the internal QL hardware to respond in the usual manner. In particular, the system ROM area is left as is (for obvious reasons), although there is an EPROMCSL active low signal implemented for convenience should someone want to use an EPROM instead of the original ROMs. The EPROMCSL signal is also generated for both reading and writing so that a rewritable Flash ROM can be used; for this, an RDL (read active low) signal must be generated by the GAL by simply inverting RDWL. Extra options could be added to select use of the external ROM slot or use the top 16k of a 64k EPROM chip to emulate it - the table assumes EPROM emulation, but such options are easily added into the logic.
Interesting things happen in the next 32k up. This area is normally unused on QLs, but when Minerva is run, it will look for ROM images there, so it is possible to map some RAM into those addresses and load ROM code, so the system can recognize it as ROMs on the next reset. The caveat is that the ROM images are unprotected and can be corrupted by writing data to that area. A better use perhaps would be to couple this feature with the use of a Flash ROM or EPROM: a part of it could be mapped here and indeed contain additional ROM images for the system to detect.
The next 32k up contains the QL internal IO and yet another unused 16k; this, however, is left as is in the QL for simplicity.
The next 128k up (address 20000h and up) are occupied by the internal RAM and implement the two 32k screen RAM areas in the bottom half. Here a bit of cleverness is used to get around the problem of the internal RAM being slow. The whole top 64k of the internal RAM is replaced by the RAM in SRAM chip 1, by disabling the internal RAM via DSMCL and mapping the external SRAM instead. A bit more attention is needed for the screen, as the QL still has to write the contents of the screen RAM into the actual internal RAM, to have the 8301 ULA display it on the screen. To do this, the decoder does not disable the internal RAM for writing, but does select the external RAM, which has the consequence of writing data to both the internal and external RAM. The internal RAM dictates the speed by virtue of the ULA generating the DTACKL signal (this is not mentioned in the table but see below).
However, the internal RAM is disabled for reading so the data written will be read back from the copy in external RAM - at the increased speed of the external RAM, about twice as fast.
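To make the shadowing idea concrete, here is a toy Python model of the screen area. It is only an illustration of the behaviour described above, not a timing model: the class name ShadowedScreen and its two bytearrays are made up for the example, and the speed difference between the two RAMs is not simulated.

```python
class ShadowedScreen:
    """Toy model of the write-through shadow over the 64k screen area."""

    BASE = 0x20000      # start of screen 0
    SIZE = 0x10000      # screen 0 + screen 1, 32k each

    def __init__(self):
        self.internal = bytearray(self.SIZE)   # slow on-board RAM, displayed by the 8301
        self.external = bytearray(self.SIZE)   # fast SRAM copy

    def write(self, addr, value):
        off = addr - self.BASE
        self.internal[off] = value   # DSMCL not driven: on-board RAM takes the write
        self.external[off] = value   # RAMCS0L low: the SRAM stores the same byte

    def read(self, addr):
        # DSMCL high: on-board RAM is disabled, so the SRAM copy answers
        return self.external[addr - self.BASE]


screen = ShadowedScreen()
screen.write(0x20010, 0xAA)
print(hex(screen.read(0x20010)), hex(screen.internal[0x10]))
```

Since every write lands in both arrays, the two copies cannot drift apart as long as all writes go through the CPU.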
The only other thing left is to decide if we want to use screen 1, addresses 28000h..2FFFFh. As shown, the decoder treats both screen 0 and 1 the same, enabling use of both screen areas. The entire screen 1 area could be disabled and external RAM mapped into it instead with no ill effect except for gibberish on the screen if screen 1 is activated. The reason one would want this is that when 2 screens are not used, the screen 1 area of RAM contains quite a bit of the system's internal data structures, which are very frequently accessed, both for reads and writes. System variables and tables all start at the beginning of this area and extend to higher addresses depending on how much extra RAM there is. Using external fast RAM for this will get you the last few % in speed, but in general the shadowing mechanism used already increases speed quite a bit.
The rest of SRAM chip 1 is used to implement the first 256k of expansion RAM.
The table shows how decoding is put into terms of a decoding table - any signal with an X is not used to decode that particular area of the address map. Sometimes the notation looks a bit odd, for instance the addresses for SRAM chip 2 have to be expressed as a 256k + 128k + 64k + 32k = total 480k RAM expansion (on top of that implemented in SRAM chip 1) just to cater for the last 32k remaining free for IO expansion boards. Similarly, some 384k (128 + 256) of RAM are implemented in SRAM chip 1. The total is therefore 864k.
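The sums are easy to double-check from the address ranges that the table rows select. A small Python sketch, where the range lists are simply my reading of the A19..A15 bit patterns in the table:

```python
K = 1024

# (start, end inclusive) of each RAM region selected in the table, per chip
chip1 = [(0x20000, 0x3FFFF), (0x40000, 0x7FFFF)]     # 128k + 256k
chip2 = [(0x80000, 0xBFFFF), (0xC0000, 0xDFFFF),
         (0xE0000, 0xEFFFF), (0xF0000, 0xF7FFF)]     # 256k + 128k + 64k + 32k


def size_k(regions):
    """Total size of a list of inclusive address ranges, in kilobytes."""
    return sum(end - start + 1 for start, end in regions) // K


print(size_k(chip1), size_k(chip2), size_k(chip1) + size_k(chip2))
# prints: 384 480 864
```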
Three important things do not appear in the decoder table:
1) Either the CS signals for the EPROM and SRAM must only go low when DSL is low, or separate read and write active low signals must be generated by the GAL, using RDL and only active when DSL is low. In the first case the OE signals of the SRAM chips are connected to ground and WE of the SRAM chips is connected to RDWL. The EPROMCSL signal must also only appear when RDWL is high (only on read), or its OE pin must be connected to a read enable signal generated by the GAL by inverting RDWL. In the second case, the connections are more logical and the chip select signals go directly to the relevant chips, while a read enable goes to all OE pins, and write enable to all WE pins (if present - the EPROM only has OE so it will be read-only).
2) DTACKL must be handled for all addresses that the expansion hardware takes over. This is normally done by simply copying the DSL signal to DTACKL whenever the decoder generates DSMCL high (accounting for the fact that DTACKL should only pull low but remain high impedance or tri-state when it's supposed to be high, i.e. inactive). In the case of this hardware, there is an exception, and that is when the external SRAM is used to shadow a write to the screen area. In this case DSMCL is not generated, and neither is DTACKL, but the relevant chip select to the external RAM chip and its write enable are. This lets the QL's internals behave exactly as usual, enabling the external RAM to 'pick up' the write data and store a copy of it.
Using DSL to generate DTACKL with no delay results in the shortest and fastest access cycle. All SRAM chips of the required capacity are so fast that they could easily run at twice the speed of the QL or better, so running them slower than maximum significantly reduces power requirements (the chips use CMOS technology, where power consumption is proportional to speed), which is an added bonus.
3) There is an unexpected caveat to the way the OS scans for RAM which results in RAM being correctly detected up to the very top of the address map, including the 32k IO expansion area 'snippet' left free there. The reason for this is the QL's internal decoder only decoding 256k of addresses, so it simply sees the same actual internal hardware in each block of 256k within the 1M of the complete address map. Hence, the system ROM will appear at 00000h..0BFFFh, but also at 40000h..4BFFFh, 80000h..8BFFFh, C0000h..CBFFFh. The same happens with internal RAM - 20000h..3FFFFh, also at 60000h..7FFFFh, A0000h..BFFFFh, E0000h..FFFFFh. In our particular case it's the alias of the internal RAM at E0000h..FFFFFh that is of interest, in particular its last 32k at F8000h..FFFFFh, because all of the rest has been disabled by the decoder for our RAM expansion. However, the last 32k in the address map has not been disabled, and when scanning for RAM, the system will find an alias of the on-board RAM there, and in fact initialize it and use it as RAM, increasing the total system RAM to 896k. But hang on, isn't this the same as at addresses 38000h..3FFFFh where the system expects the original 128k of RAM? Well, no - because we have disabled that particular part (as well as all other aliases) with our decoder, so this is actually the only place where the CPU can now find the top 32k of the original slow 128k RAM, and in fact it will use it if no IO expansion is installed to take over those addresses as it would do normally.
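The aliasing is just the internal decoder ignoring A19 and A18, which can be written as a mask. A short Python sketch (the names INTERNAL_MASK, aliases and internal_address are mine, picked for the example):

```python
INTERNAL_MASK = 0x3FFFF      # the on-board decoder only looks at A17..A0 (256k)


def aliases(addr):
    """All four places in the 1M map where an on-board address shows up."""
    base = addr & INTERNAL_MASK
    return [base | (n << 18) for n in range(4)]


def internal_address(addr):
    """The on-board address that a 20-bit CPU address is folded onto."""
    return addr & INTERNAL_MASK


print([f"{a:05X}h" for a in aliases(0x38000)])
# prints: ['38000h', '78000h', 'B8000h', 'F8000h']
print(f"{internal_address(0xF8000):05X}h")
# prints: 38000h
```

So F8000h..FFFFFh folds onto 38000h..3FFFFh, the top 32k of the on-board RAM, which is exactly the piece the decoder no longer lets the CPU reach at its normal address.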
If one has read carefully, then one may notice there is a potential problem here, and indeed it would be the reason why there was never a 640k RAM expansion - except the one I made for myself and stumbled across this problem.
Here is what happens - such an expansion would use the internal RAM and then add 512k and a further 128k to make up 640k of expansion RAM or 768k total RAM. However, when the system checks for RAM, it finds it starting from the usual place at 20000h, and all the way up to DFFFFh, making 768k total. But then it encounters an alias of the original 128k RAM at E0000h, at which point it is again testing the first 128k of RAM and writing stuff into it. This was very odd to see - the screen would fill with the test pattern, then for a while nothing would happen while the extra RAM was being tested, then all of a sudden the screen would fill with the test pattern again and then the system would crash before the F1/F2 prompt - because the RAM test would actually overwrite the system variable area being built on the fly, at the start of screen 1, by writing to its alias.
So, if one wants a QL with 768k total RAM, there has to be a 'fake' or real IO expansion at E0000h just to stop the RAM test from crashing the system.
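The 768k case can be walked through with a small Python model. This is a simplified sketch and not the actual ROM test routine: it assumes the scan simply steps upward from 20000h, that the hypothetical 640k expansion answers at 40000h..DFFFFh, and that everything else falls through to the on-board 256k decoding. All names are made up for the example.

```python
INTERNAL_MASK = 0x3FFFF                  # on-board decoder sees only 256k
RAM_BASE = 0x20000                       # start of the on-board 128k RAM
EXP_BASE, EXP_TOP = 0x40000, 0xE0000     # hypothetical 640k expansion


def backing(addr):
    """Which physical store answers at addr in the 768k machine."""
    if EXP_BASE <= addr < EXP_TOP:
        return ('expansion', addr)       # expansion takes over via DSMCL
    folded = addr & INTERNAL_MASK        # otherwise on-board decoding applies
    if folded >= RAM_BASE:
        return ('internal', folded)
    return ('rom_or_io', folded)


def first_collision(step=0x8000):
    """First scanned address that lands on RAM the scan has already seen."""
    seen = set()
    for addr in range(RAM_BASE, 0x100000, step):
        cell = backing(addr)
        if cell[0] == 'rom_or_io':
            break                        # no RAM here, a scan would stop
        if cell in seen:
            return addr, cell
        seen.add(cell)
    return None


addr, cell = first_collision()
print(f"{addr:05X}h hits {cell[0]} RAM at {cell[1]:05X}h")
# prints: E0000h hits internal RAM at 20000h
```

In the same model, backing(0xE8000) comes out as internal RAM at 28000h, the start of screen 1, which is where the system variables were being built when the test ran over them.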
