Well, that's a rather involved question
Under QDOS the CPU is virtualized to each job (each sees it as completely available to itself to a very large extent), but memory is not in the strictest sense (that part is in a sense co-operative, as it relies on the CPU being able to address everything relative to a base address).
The virtualization is provided in a time-sliced manner under the control of a scheduler, which gives every job a certain length of time to run before it's suspended so others can run. The job itself does not know this, except through various system calls which can be time-limited (that's one of the RTOS elements - most of those were never used).
Now, in principle, it should not be particularly difficult for jobs to run on multiple CPUs, using a central scheduler (although that would actually be a set of schedulers connected by a central data structure) - this is how most current OSs work on multi-core CPUs (and those are actually a notional equivalent to multiple discrete CPUs).
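To make the central-scheduler idea concrete, here is a minimal sketch in C: a single shared run queue from which each core's scheduler takes the next job, requeuing the one it just suspended. All names and sizes are invented for illustration, and a real implementation would protect the queue with a lock or service it from one core only.

```c
#include <assert.h>

/* Hypothetical sketch: one central run queue shared by per-core
 * schedulers.  Single-threaded here; a real system would serialize
 * access to it. */

#define MAX_JOBS 8

typedef struct {
    int job_id[MAX_JOBS];   /* circular queue of runnable job IDs */
    int head, tail, count;
} run_queue;

void rq_init(run_queue *q) { q->head = q->tail = q->count = 0; }

int rq_push(run_queue *q, int id) {
    if (q->count == MAX_JOBS) return -1;
    q->job_id[q->tail] = id;
    q->tail = (q->tail + 1) % MAX_JOBS;
    q->count++;
    return 0;
}

/* Called by a core's scheduler at the end of a time slice: take the
 * next runnable job and requeue the one just suspended. */
int rq_next(run_queue *q, int suspended_id) {
    int next;
    if (q->count == 0) return suspended_id;  /* nothing else to run */
    next = q->job_id[q->head];
    q->head = (q->head + 1) % MAX_JOBS;
    q->count--;
    rq_push(q, suspended_id);
    return next;
}
```

The point of the shared structure is exactly what the text describes: each core runs its own scheduling pass, but all of them draw from one central pool of jobs.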
One thing that would be different is a consequence of non-virtualized memory - and in fact, to an extent, it's a simplification. Under QDOS, multiple CPUs would have a common memory, or at least a part of common memory. Among other reasons, this is because jobs communicate with each other and with the real world through areas in RAM. While jobs can ask the OS for memory for whatever purpose, and this could be RAM private to a given CPU, there is no easy way to tell in advance how that RAM will be used, and therefore whether it should be allocated from shared RAM (at least as far as I remember).
On the other side of all of this is the part of the OS that deals with the real world - which is not time-sliced and scheduled according to a priority system, but rather driven by real-world events and the protocols that arise from them. This is not easy to distribute among multiple CPUs; it can be done, but not completely automatically. It is also a completely different way to exploit parallelism, as handling IO has a lot to do with abstracting real hardware into data structures, and can be handled independently by dedicated CPUs which may not even be 68k. One consequence of this is that general multi-processing (or multi-core) solutions tend to have a rather involved interrupt system, mostly dedicating one CPU to most IO tasks, with most interrupts routed to it. This is also usually the 'main' or 'boot' core, which starts the whole computer, initially working as a single-CPU (or single-core) machine until everything is set up for the scheduler.
Finally, there is one aspect of modern CPUs that makes multiprocessing a bit more complex if it's not catered for in some way by the CPU and system design, in addition to handling interrupts (i.e. indirectly real-world events) and that's memory caching.
Under most multitasking OSs, and also under QDOS, it's possible to run multiple copies of a job using the same code. That code then has to be re-entrant, in most cases meaning no assumptions about data spaces and absolutely no self-modifying code (most memory-managed systems do not let this happen anyway). So extra steps have to be taken when a job is started and ended, to handle leftover copies of code in the caches as well as leftover copies of code in RAM no longer used by actual jobs.
And of course, all data structures accessible by multiple CPUs or cores (or indeed jobs, since in this context there is no direct way to know if a job is time-scheduled, core-scheduled, or both) must not be cached - or, in the more complex way to do things, must be cache-snooped (not always possible or feasible; this is when a CPU cache effectively becomes common RAM, so that when one CPU writes to common memory, the contents of all CPU caches that represent that memory are also updated). Given the way QDOS works, cache snooping would most likely not be used or required.
So, what would be required, assuming we can somehow tie multiple CPUs together?
On the OS side, the common mutex function the OS already provides to prevent deadlock by jobs handling resources would have to remain, which means that part is centralized - resource allocation would probably always run on the boot CPU (or core). In other words, some OS calls would have to be 'queued' and then run sequentially, one by one, on a single core in order to manage resources properly. IO becomes more complex, as some things can be distributed and others (for instance screen handling) can't. The scheduler would become more complex, but also faster once there are enough jobs to occupy the available cores, since scheduling overhead occurs on only one core (the others are told at interrupt time which job to run next, without needing to calculate whose turn it is). The OS also needs to be aware of common memory and private memory, though in the interest of simplicity the best approach would be to have common memory and rely on caching to give the CPUs faster access to commonly accessed data.
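The 'queued, then run one by one on a single core' idea can be sketched as follows. This is purely illustrative: the call-queue structure, the `alloc_pages` stand-in for a resource-allocating OS call, and all sizes are invented, and a real system would need atomic enqueueing from the other cores.

```c
#include <assert.h>

/* Hypothetical sketch: resource-allocating OS calls are queued by
 * any core, but executed in order on the boot core only, so
 * resource management stays effectively single-threaded. */

#define QLEN 16

typedef int (*os_call_fn)(int arg);
typedef struct { os_call_fn fn; int arg; } os_call;

static os_call call_q[QLEN];
static int q_count = 0;

/* Any core posts a call (would need to be atomic in reality)... */
int post_call(os_call_fn fn, int arg) {
    if (q_count == QLEN) return -1;
    call_q[q_count].fn = fn;
    call_q[q_count].arg = arg;
    q_count++;
    return 0;
}

/* ...but only the boot core drains the queue, sequentially. */
int drain_calls(void) {
    int i, last = 0;
    for (i = 0; i < q_count; i++)
        last = call_q[i].fn(call_q[i].arg);
    q_count = 0;
    return last;
}

/* Invented stand-in for a resource-allocating OS call. */
static int total_pages = 0;
int alloc_pages(int n) { total_pages += n; return total_pages; }
```

Because the calls execute strictly in posting order on one core, no two allocations can race each other - which is the whole reason for centralizing this part.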
On the hardware side, private vs common RAM has to be handled, but as I said, it is possible to make all of it look like (or indeed be) common RAM. That being said, in order to cache the contents of some parts of RAM while also being able to force non-caching, to guarantee that common data is always a real copy, the available RAM would have to have a cache-inhibited alias.
This was something I experimented with while planning the GF: the entire RAM had an alias, so that if one wanted to access it as a non-cached area, a fixed offset was added to the required address, and that address was then accessed directly, with no caching. This keeps the memory allocation mechanism the same, and the job, OS code, driver, or whatever could then decide as needed whether to use the memory cached or not, on the fly - making that distinction, in a manner of speaking, also co-operative between whatever needs the memory in question.
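The alias scheme amounts to simple address arithmetic. Here is a sketch under assumed constants - the RAM size and alias offset below are invented, not the GF's actual memory map:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the alias idea: the whole RAM appears in
 * the address map twice, once cacheable and once cache-inhibited
 * at a fixed offset.  Both constants are invented for illustration. */

#define RAM_SIZE       0x10000000u  /* 256M of real RAM (assumed)  */
#define UNCACHED_ALIAS 0x40000000u  /* fixed offset to the alias   */

/* Turn a normal (cacheable) address into its uncached alias. */
uint32_t as_uncached(uint32_t addr) {
    return addr + UNCACHED_ALIAS;
}

/* Map either form back to the physical RAM address - both views
 * reach the same memory cell. */
uint32_t to_physical(uint32_t addr) {
    if (addr >= UNCACHED_ALIAS)
        addr -= UNCACHED_ALIAS;
    return addr % RAM_SIZE;
}
```

Allocation hands out one address; the user of the memory adds the offset only when it needs a guaranteed-uncached access, which is why the allocator itself never has to care.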
======================
Re RAM limit for QDOS, that was only 2M, also nearing the limit for the slave block table

As far as I know, there was no direct limit and it could in principle use the entire 4G address space, but I have been warned that some parts of the OS treat address pointers as signed numbers, which would limit the available space to 2G.
That being said, there seems to be one application that lowers this limit further, and that is (to my knowledge) the Qliberator Basic compiler, which used the top 3 bits of 32-bit addresses for some sort of debug info, counting on those bits not being implemented as real address lines on the 68k CPUs available at the time. In other words, it expects the top 3 address bits (A29, A30, A31) to be 'don't care', which reduces the usable address space to 512M - and all 8 possible states of the top 3 bits need to alias to the same 512M.
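In practice, treating A29-A31 as 'don't care' is just a mask operation. A small sketch (the function names are mine, not Qliberator's):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the aliasing Qliberator relies on: the top 3 address
 * bits (A29-A31) carry tag/debug info, so all 8 tagged variants of
 * a pointer must resolve to the same real address within 512M. */

#define ADDR_MASK 0x1FFFFFFFu  /* keep A0-A28, drop A29-A31 */

/* The real address, as the hardware would see it with only 29
 * address lines wired. */
uint32_t strip_tag(uint32_t ptr) { return ptr & ADDR_MASK; }

/* The 3-bit tag hidden in the top of the pointer. */
uint32_t tag_bits(uint32_t ptr) { return ptr >> 29; }
```

On a CPU (or system) that decodes all 32 address lines, `strip_tag` is exactly the aliasing the hardware would have to provide for such programs to keep working.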
The OS and applications themselves certainly do not need a lot of RAM - until a high-resolution, high-color display is available and you want to run multiple programs under the PE. When a full screen uses up 32k of RAM, 4M is plenty; but try 1024x768 in 16-bit color, and the screen alone becomes 1.5M all of a sudden. Some expansions to the screen drivers, such as ProWess, also require a decent amount of RAM once the display goes to higher resolutions and deep color.
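The jump in screen RAM is straightforward to verify. Assuming 2 bits per pixel for the classic 512x256 QL mode:

```c
#include <assert.h>
#include <stdint.h>

/* Framebuffer size in bytes for a given mode.  The 512x256 figure
 * assumes the QL's 2-bit-per-pixel mode. */
uint32_t screen_bytes(uint32_t w, uint32_t h, uint32_t bits_per_pixel) {
    return w * h * bits_per_pixel / 8;
}
```

512x256 at 2bpp gives 32768 bytes (the 32k figure), while 1024x768 at 16bpp gives 1572864 bytes, i.e. exactly 1.5M - a 48-fold jump for one screen.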
One more avenue of making things quicker is with ColdFire V3 and V4 CPUs. The problem with those is incomplete 68k compatibility and the need to write an emulator for what is not supported. It has recently been brought to my attention that the MicroAPL 68k emulation has become freely available, so that may actually make it possible to use these as a faster 68k - that being said, ColdFire is also becoming a dead end as ARM is conquering everything.