Discussion:
IPL times
Richard Schuh
2006-02-18 17:12:45 UTC
Permalink
Colin Allinson wrote:

>
> We run 6 LPARs on a busy 6-way 2084 and have a VERY large number of
> devices (mostly DASD).
>
> IPL times can be very long (90 minutes or more for a low-priority
> LPAR at busy times).
>
> My initial reaction was that this is caused by the time taken to sense
> devices and build the device blocks. The length of time a system will
> hang (more than a minute for 256 devices) when building a range of
> device blocks that are dynamically defined seems to support this.
>
> I have been trying to tune this using the DEVICES NOTACCEPT/ACCEPT
> lists in the SYSTEM CONFIG to eliminate unnecessary devices, but the
> results have been inconsistent and a little confusing.
>
> Last night I tried multiple IPLs on a small system in a medium/low
> priority LPAR. Before any tuning, this system IPL'd 2 weeks ago in 30
> minutes in the relatively low-load evening period I was using for my tests.
> For these tests I had set :-
> - Devices Notaccept 0000-FFFF
> - Devices Accept nnnn-nnnn (multiple specific ranges)
> - Devices Notsensed 0000-FFFF
> - Devices Sensed nnnn-nnnn (same multiple ranges as above)
> - Devices Online_at_IPL 0000-FFFF
> - SET RDEV for each range.
> There are just over 12,000 recognised devices (mostly DASD).
>
> During the tests I was watching the System Activity Display on the
> HMC. Throughout the test the overall cpu activity was between 60-80%
> and there was no excessive channel activity shown. However, the system
> being IPL'd did use a huge amount of processor resources (up to 360% of
> its allocated share).
>
> In the first test the IPL completed in 6 minutes - a great result, but
> some of my SET RDEV statements were in error and were ignored.
> The second IPL, with these corrected, took 26 min 35 sec.
> The third IPL, with the originally ignored RDEV statements removed,
> took 31 min 45 sec.
> Later, a larger system with many more devices and no tuning in the
> system config IPL'd in 20 minutes.
>
> My questions are :-
> - Am I just fooling myself that restricting the device range
> will help (i.e. the 6 mins was anomalous)?
> - Is the time taken to sense a device the same if it exists or
> not?
> - Are there any other suggestions of things I can do to speed
> up IPL time?
>
> Any advice would be much appreciated.
>
> Colin Allinson
> Amadeus Data Processing
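
For reference, the statements Colin lists would look roughly like this in a SYSTEM CONFIG (the device ranges below are hypothetical placeholders, not his actual configuration; as his usage implies, a later Accepted/Sensed range overrides an earlier blanket NotAccepted/NotSensed for the overlapping addresses):

```
/* Reject, and skip sensing for, everything by default           */
Devices ,
   NotAccepted   0000-FFFF ,
   NotSensed     0000-FFFF ,
   Online_at_IPL 0000-FFFF

/* Then accept and sense only the ranges this LPAR actually uses */
Devices ,
   Accepted      1000-1FFF ,
   Sensed        1000-1FFF

/* Predefine device characteristics so CP need not sense them   */
RDevice 1000-1FFF Type DASD
```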

There was a thread about this problem not too long ago. We have a large
number of DASD devices too, and do not have your problem. The 6-minute
figure is about normal for us. A good deal of that time is spent in
initialization after the IPL - spool (we have several thousand files in
spool at any given time), SFS, and VTAM are the major offenders. I have
configured many devices that are z/OS-only as Offline_at_IPL, but none are
ignored. I have specified 0000-FFFF as sensed and a large number of
Offline_at_ipl ranges. There are many more offline than are online, but
all are sensed.
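
A sketch of the approach Richard describes, with hypothetical ranges (the whole device space sensed, everything offline by default, and only VM's own ranges brought online):

```
/* Sense everything; default the whole space to offline       */
Devices ,
   Sensed         0000-FFFF ,
   Offline_at_IPL 0000-FFFF

/* Bring only VM's own device ranges online (examples only)   */
Devices ,
   Online_at_IPL  1000-13FF

Devices ,
   Online_at_IPL  2000-21FF
```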

What you are describing as hangs could be the system waiting for channel
time-out caused by improperly terminated channels. I don't know the
timeout interval for the newer systems, but it used to be 7 seconds per
device. The timeout would only occur for devices that are not present on
a channel. If you check the prior thread, there are other suggestions.
Probably you will find the answer there.
Ranga Nathan
2006-02-19 01:58:39 UTC
Permalink
I had a similar issue when I was trying to bring up a test z/VM on an
under-resourced LPAR with its CPU capped at 10%. It took a long time to IPL.
After restricting the devices, the IPL time came down to about a minute.

Colin Allinson wrote:
> [original post quoted in full; snipped]

--
__________________
Ranga Nathan
Work: 714-442-7591
Kris Buelens
2006-02-19 09:01:07 UTC
Permalink
During the tests for our migration from 2x 9672 to z9, I noted big
differences too, but I didn't time them. Our partitions have access to old
DASD and new DASD; both have about 2000 addresses, much less than you.
- when I have the old DASD all as NOTSENSED, the IPL is very fast
- with the old DASD sensed, the system indeed seems to hang quite a
while, much more than twice as long
The difference: the old DASD have one ESCON channel to each control unit,
while the new DASD have 4 FICON channels per control unit. I thought the
lack of multiple paths was causing the delays (when switching to production,
the old DASDs will get 5 ESCON channels).
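
In SYSTEM CONFIG terms, the split Kris describes would look something like this (ranges hypothetical):

```
/* Old DASD, one ESCON path per control unit: skip sensing    */
Devices ,
   NotSensed  8000-87FF

/* New DASD, 4 FICON paths per control unit: sensed as normal */
Devices ,
   Sensed     9000-97FF
```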

Kris,
IBM Belgium, VM customer support
Colin Allinson
2006-02-19 13:31:12 UTC
Permalink
I was not very clear. My timings are from pressing PF10 on the SAPL panel
to the first line of the IPL sequence appearing on the console - so this
excludes any time taken by spool initialisation.

The fact that you have a large config also and are not experiencing this
problem gives me some hope of a resolution.

I am not sure how improperly terminated channels work with ESCON & FICON,
as there is no chaining involved, but I will certainly check this out.

I will also try to search for a previous thread on this. Any ideas for a
keyword?

Colin Allinson

Richard Schuh wrote:

> [previous reply quoted in full; snipped]
RickE
2006-02-19 18:31:25 UTC
Permalink
Colin Allinson wrote:
> I am not sure how improperly terminated channels works with ESCON & FICON
> as there is no chaining involved but I will certainly check this out.

We recently removed some old 3480 tape drives that were attached to an
ESCON channel via a 9034 ESCON converter. The 3480s were not removed
from the IOCDS (an oversight), but this was not a problem as long as
the ESCON cable was still attached to the 9034. When the 9034 was
removed and the ESCON cable dropped under the floor, the next IPL of VM
experienced a long delay before messages started to appear on the
operator console. I didn't time it, but I'd guess that the delay was 6
or 7 minutes. Scanning the hardware messages gave plenty of clues as
to the source of the problem, and once the 3480 section was removed
from the IOCDS, the next VM IPL proceeded normally.
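
For illustration, the stale 3480 definitions would have been an IOCP section along these lines (the CHPID, control-unit number, and device addresses here are hypothetical); deleting the CNTLUNIT/IODEVICE pair and rebuilding the IOCDS is what cleared the delay:

```
* 3480 tapes behind a 9034 ESCON converter (since removed)
         CNTLUNIT CUNUMBR=0580,PATH=((CSS(0),40)),UNIT=3480,           X
               UNITADD=((00,8))
         IODEVICE ADDRESS=(0580,8),CUNUMBR=(0580),UNIT=3480
```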

Rick Ekblaw