Richard Schuh
2006-02-18 17:12:45 UTC
Colin Allinson wrote:
>
> We run 6 LPARS on a busy 6 way 2084 and have a VERY large number of
> devices (mostly DASD).
>
> IPL times can be very long ( 90 minutes, or more, for low priority
> LPAR at busy times).
>
> My initial reaction was that this is caused by the time taken to sense
> devices and build the device blocks. The length of time a system will
> hang, (more than a minute for 256 devices), when building a range of
> device blocks when they are dynamically defined seems to support this.
>
> I have been trying to tune this using devices NOTACCEPT/ACCEPT list in
> the SYSTEM CONFIG to eliminate unnecessary devices but the results
> have been inconsistent and a little confusing.
>
> Last night I tried multiple IPL's on a small system in a medium/low
> priority LPAR. Before any tuning this IPL'd 2 weeks ago in 30 minutes
> in the relatively low load evening period I was using for my tests.
> For these tests I had set :-
> - Devices Notaccept 0000-FFFF
> - Devices Accept nnnn-nnnn Multiple specific ranges
> - Devices Notsensed 0000-FFFF
> - Devices Sensed nnnn-nnnn Same multiple ranges
> as above
> - Devices Online_at_IPL 0000-FFFF
> - SET RDEV for each range.
> There are just over 12000 recognised devices (mostly DASD)
>
> During the tests I was watching the System Activity Display on the
> HMC. Throughout the test the overall cpu activity was between 60-80%
> and there was no excessive channel activity shown. However, the system
> being IPL'd did use a huge amount of processor resources (up to 360% of
> its allocated share).
>
> In the first test the IPL completed in 6 minutes - great result but
> some of my SET RDEV statements were in error and ignored.
> Second IPL with these corrected took 26mins 35secs
> Third IPL with the RDEV statements (that were originally ignored)
> removed took 31 mins 45 sec.
> Later on a larger system with many more devices and no tuning in the
> system config IPL'd in 20 minutes.
>
> My questions are :-
> - Am I just fooling myself that restricting the device range
> will help (i.e. the 6 mins was anomalous)?
> - Is the time taken to sense a device the same if it exists or
> not?
> - Are there any other suggestions of things I can do to speed
> up IPL time?
>
> Any advice would be much appreciated.
>
> Colin Allinson
> Amadeus Data Processing
There was a thread about this problem not too long ago. We also have a
large number of DASD devices and do not have your problem. The 6 minute
figure is about normal for us. A good deal of that time is spent in
initialization after the IPL: spool (we have several thousand files in
spool at any given time), SFS and VTAM are the major offenders. I have
configured many devices that are z/OS-only as Offline_at_IPL, but none
are ignored. I have specified 0000-FFFF as Sensed, plus a large number of
Offline_at_IPL ranges. Many more devices are offline than online, but
all are sensed.
What you are describing as hangs could be the system waiting for channel
time-outs caused by improperly terminated channels. I don't know the
time-out interval for the newer systems, but it used to be 7 seconds per
device. The time-out would occur only for devices that are not present
on a channel. The prior thread has other suggestions, and you will
probably find the answer there.
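
To get a feel for how quickly those time-outs could add up, here is a rough sketch. It assumes the old 7-second figure still applies and that each absent device is waited on one after another; both are assumptions, not measured behaviour of the newer systems, and the function name is made up for illustration.

```python
# Rough estimate only: assumes the historical 7-second per-device time-out
# and strictly serial waits. Newer hardware may behave differently.
TIMEOUT_SECONDS = 7


def timeout_wait_minutes(absent_devices: int,
                         timeout_seconds: float = TIMEOUT_SECONDS) -> float:
    """Minutes spent waiting if every absent device times out in turn."""
    return absent_devices * timeout_seconds / 60


if __name__ == "__main__":
    # Device counts here are illustrative, not taken from any real system.
    for count in (10, 256, 1000):
        minutes = timeout_wait_minutes(count)
        print(f"{count:5d} absent devices -> {minutes:6.1f} min")
```

On those assumptions, 256 devices that are defined but not present on a channel would cost roughly half an hour, so even a modest number of such devices would be worth tracking down.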