Anne & Lynn Wheeler
2006-07-20 19:02:18 UTC
Might be relevant if Lynn Wheeler could expand on the unreleased VAMPS
microcode to speed up 370 SMP, and also provided logical processors
with similarities to those on current zSeries LPARs, although that may
just have dropped parts of 370 sequential code down into microcode.
so presumably this recent post vis-a-vis vamps and the later i432
http://www.garlic.com/~lynn/2006n.html#42 Why is zSeries so CPU poor?
misc. collected past vamps postings
http://www.garlic.com/~lynn/subtopic.html#bounce
early microcode effort was "VMA", originally for the 370/158, that helped
virtual machine performance. for subset of "supervisor" state
instructions, microcode was added to execute the instruction using
"virtual machine" rules (to avoid interrupting into the virtual
machine hypervisor where the instruction was simulated).
concurrent with VAMPS effort was "ECPS" for 370 138&148. ECPS did some
more stuff like VMA on the 158 (direct supervisor state instruction
execution) ... but it also identified parts of the hypervisor kernel
and moved that kernel code into microcode. the issue on 138&148
machines was that there was an avg. of 10:1 microcode instructions
executed for every 370 instruction. Much of the kernel code moved to
microcode on a straight 1:1 basis, resulting in a ten times performance
speed up. old posting identifying specific kernel code segments for
migrating into microcode.
http://www.garlic.com/~lynn/94.html#21 370 ECPS VM microcode assist
the VMA-related efforts eventually evolved into SIE ... where nearly
all supervisor state instructions had microcode enhancement for
directly executing with regard to virtual machine rules (avoiding a
lot of interruption into virtual machine hypervisor to simulate
supervisor state instructions). SIE was a state change instruction
that gathered up all the fields needed by various supervisor state
instructions to execute according to "virtual machine" rules. post of
old SIE discussion about implementation issue differences between 3081
and "trout" (3090)
http://www.garlic.com/~lynn/2006j.html#27 virtual memory
there were still things like page faults for the virtual machine that
resulted in interruptions into the hypervisor kernel for handling. a
special case was defined involving things like dedicated real storage
for a virtual machine ... eliminating need to interrupt into the
hypervisor kernel. This resulted in being able to operate a virtual
machine subset directly supported by hardware ... w/o the need for a
virtual machine kernel. This was called "PR/SM" ... and PR/SM
capability eventually evolved into the current LPARs (logical
partitions). a reference discussing some current LPAR and PR/SM
http://researchweb.watson.ibm.com/journal/rd/483/siegel.html
current machines can have a configurable limited number of LPARs ...
and it is possible to run a virtual machine hypervisor in an LPAR,
which in turn supports a much larger number of virtual machines. There
has been an evolution of the SIE support. Initially, SIE was not
virtualized but LPARs make use of SIE for support. That meant that a
virtual machine hypervisor running in an LPAR wouldn't have
performance assist of SIE for running its virtual machines (all
virtual machine supervisor instructions would interrupt into the
hypervisor for simulation). Enhancements were required to virtualize
SIE for at least one level (so it could be used both by LPAR function
and also by hypervisor running in an LPAR).
Since I was doing both VAMPS and ECPS ... I borrowed a lot of stuff
done for ECPS for doing VAMPS. However, for VAMPS, I wanted it
extended in a much more architected way ... rather than simply doing a
1-for-1 movement of existing kernel 370 code into microcode. VAMPS was
to have up to five processors ... and I defined a microcode hardware
queued work interface where the hypervisor put units of work on the
queued work interface (and the microcode took the queued work and
executed it on whatever processors were available). The hardware
microcode also placed queued work for the hypervisor to handle ...
like things that were i/o interrupts in traditional 370 or page fault
interrupts (from executing virtual machines), etc.
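a rough sketch of the two-way queued work interface described above, assuming python as a stand-in for what was really a microcode/hardware interface. the names (WorkUnit, QueuedWorkInterface, microcode_pass) and the "page_fault" event tuple are hypothetical, made up for illustration; only the five-processor limit and the two directions of queueing come from the description. each pass treats a work unit as running to completion, which is a simplification.

```python
from collections import deque
from dataclasses import dataclass

MAX_PROCESSORS = 5  # from the post: VAMPS was to have up to five processors


@dataclass
class WorkUnit:
    vm: str                    # hypothetical: name of the virtual machine to run
    page_faults: bool = False  # hypothetical: does running it hit a page fault?


class QueuedWorkInterface:
    """Hypothetical model: hypervisor and microcode talk only through queues."""

    def __init__(self, processors: int = MAX_PROCESSORS):
        self.processors = processors
        self.to_microcode = deque()   # hypervisor -> microcode: units of work
        self.to_hypervisor = deque()  # microcode -> hypervisor: events to handle

    def queue_work(self, unit: WorkUnit) -> None:
        """Hypervisor side: put a unit of work on the interface."""
        self.to_microcode.append(unit)

    def microcode_pass(self):
        """Microcode side: give queued work to whatever processors are free.

        Returns the (processor, vm) pairs placed in this pass. Anything the
        hypervisor has to deal with (here only page faults) is queued back
        to it rather than delivered as an interrupt.
        """
        placed = []
        for cpu in range(self.processors):
            if not self.to_microcode:
                break
            unit = self.to_microcode.popleft()
            placed.append((cpu, unit.vm))
            if unit.page_faults:
                self.to_hypervisor.append(("page_fault", unit.vm))
        return placed


# six units of work on five processors: the sixth waits for the next pass
qwi = QueuedWorkInterface()
for i in range(6):
    qwi.queue_work(WorkUnit(f"vm{i}", page_faults=(i == 2)))
first = qwi.microcode_pass()
second = qwi.microcode_pass()
```

the point of the sketch is only the shape: the hypervisor never picks a processor, and the microcode never calls into the hypervisor directly; both sides just add to and remove from queues.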
The VAMPS abstraction of queued work for multiprocessor environment
was somewhat akin to the definition found later in i432. Some of
the VAMPS abstraction for i/o work queueing was somewhat akin to what
showed up later for 370-xa i/o operations.
After VAMPS was killed, I adapted the multiprocessing microcode queued
processing to a software implementation. A lot of the SMP kernel
implementations used a single, global kernel SPIN lock to serialize
all kernel execution. This drastically minimized the amount of code
changes to adapt a single-processor operating system to support a
multiprocessor operation.
In adapting the VAMPS multiprocessing microcode support to software, I
took the equivalent kernel software functions (that had been moved to
microcode in VAMPS) and made them multiprocessing parallelized with
fine-grain locking. This amounted to the majority of the software
kernel execution time ... but a relatively small amount of the total
kernel instructions. The majority of the kernel instructions relied on
a somewhat traditional global kernel lock. However, whenever the
"parallelized" kernel code needed to transition into the "sequential"
kernel code ... rather than "spinning" on the global kernel lock
... it "bounced". If it obtained the global kernel lock, then it
proceeded as normal. If it couldn't obtain the global kernel lock, it
would queue a super lightweight work request ... and go off and look
for other "parallelized" work.
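a minimal sketch of the "bounce" idea in python, not the VM370 code. the class and method names are hypothetical. one assumption goes beyond the description above: it doesn't say who eventually runs the queued requests, so the sketch assumes whoever holds the global lock drains the queue before letting go, and rechecks after release so that a request queued at the last moment isn't stranded.

```python
import threading
from collections import deque


class BounceLock:
    """Hypothetical sketch: try the global kernel lock, never spin on it."""

    def __init__(self):
        self._lock = threading.Lock()   # stands in for the global kernel lock
        self._guard = threading.Lock()  # protects the pending queue only
        self._pending = deque()         # queued lightweight work requests

    def submit(self, work) -> bool:
        """Queue `work`, then try to run the queue under the global lock.

        Returns True if this caller got the lock (and so ran sequential
        kernel work), False if it "bounced" and is free to go look for
        other parallelized work.
        """
        with self._guard:
            self._pending.append(work)
        got_lock = False
        # non-blocking attempt: on failure we return instead of spinning
        while self._lock.acquire(blocking=False):
            got_lock = True
            try:
                while True:
                    with self._guard:
                        if not self._pending:
                            break
                        item = self._pending.popleft()
                    item()  # runs serialized under the global lock
            finally:
                self._lock.release()
            # assumption: recheck after release, in case another processor
            # queued a request between our last look and the release
            with self._guard:
                if not self._pending:
                    break
        return got_lock


log = []
kernel = BounceLock()
ran_a = kernel.submit(lambda: log.append("a"))  # lock free: runs at once

kernel._lock.acquire()                          # pretend another cpu holds it
ran_b = kernel.submit(lambda: log.append("b"))  # bounces, request stays queued
kernel._lock.release()

ran_c = kernel.submit(lambda: log.append("c"))  # runs queued "b", then "c"
```

the short guard lock around the queue is the only thing a bouncing processor ever waits on, and it is held for just a queue append or remove, which is what keeps the request "super lightweight" in this sketch.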
This approach obtained almost all the thruput benefit of having a
kernel fine-grain locking implementation, avoided the degradation of
a single kernel spin-lock implementation ... but the kernel code changes
were not significantly more than required for a single kernel
spin-lock implementation. This implementation shipped in VM370 release
four.