Platform LSF – Working with Hosts (bhosts, lsload, lsmon)


Host status

Host status describes the ability of a host to accept and run batch jobs in terms of daemon states, load levels, and administrative controls. The bhosts and lsload commands display host status.

 

1. bhosts
Displays the current batch status of each host. The STATUS column takes one of the following values:

STATUS      DESCRIPTION
ok          Host is available to accept and run new batch jobs.
unavail     Host is down, or LIM and sbatchd are unreachable.
unreach     LIM is running but sbatchd is unreachable.
closed      Host will not accept new jobs. Use bhosts -l to display the reasons.
unlicensed  Host does not have a valid license.
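To spot hosts that are not in the ok state, the STATUS column can be read programmatically. The following is a minimal Python sketch, assuming the default bhosts column layout shown in this post. The function names parse_bhosts and unavailable_hosts, and the sample host rows, are made up for illustration and are not part of LSF.

```python
def parse_bhosts(text):
    """Map each host name to its STATUS column from plain `bhosts` output."""
    statuses = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the HOST_NAME header row.
        if len(fields) < 2 or fields[0] == "HOST_NAME":
            continue
        statuses[fields[0]] = fields[1]
    return statuses


def unavailable_hosts(statuses):
    """Return hosts whose status is anything other than 'ok', sorted by name."""
    return sorted(host for host, status in statuses.items() if status != "ok")


# Hypothetical sample in the bhosts column layout; the rows are invented.
BHOSTS_SAMPLE = """\
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
comp027            ok              -     16      0      0      0      0      0
comp028            closed          -     16     16     16      0      0      0
comp029            unavail         -     16      0      0      0      0      0
"""

print(unavailable_hosts(parse_bhosts(BHOSTS_SAMPLE)))
```

On a real cluster the text could come from subprocess.run(["bhosts"], capture_output=True, text=True).stdout instead of the sample string.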

 

2. bhosts -l
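Displays the reasons a host is closed. A closed host does not accept new batch jobs. In the output below, the status closed_Adm means that an LSF administrator or root closed the host with badmin hclose: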

$ bhosts -l
HOST  node001
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
closed_Adm      60.00     -     16      0      0      0      0      0      -

CURRENT LOAD USED FOR SCHEDULING:
               r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem   root maxroot
Total           0.0   0.0   0.0    0%   0.0     0    0 28656  324G   16G   60G  3e+05   4e+05
Reserved        0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M    0.0     0.0

              processes clockskew netcard iptotal  cpuhz cachesize diskvolume
Total             404.0       0.0     2.0     2.0 1200.0     2e+04      5e+05
Reserved            0.0       0.0     0.0     0.0    0.0       0.0        0.0

              processesroot   ipmi powerconsumption ambienttemp cputemp
Total                 396.0   -1.0             -1.0        -1.0    -1.0
Reserved                0.0    0.0              0.0         0.0     0.0


              aa_r aa_r_dy aa_dy_p aa_r_ad aa_r_hpc fluentall fluent fluent_nox
Total         17.0    25.0   128.0    10.0    272.0      48.0   48.0       50.0
Reserved       0.0     0.0     0.0     0.0      0.0       0.0    0.0        0.0

              gambit geom_trans tgrid fluent_par
Total           50.0       50.0  50.0      193.0
Reserved         0.0        0.0   0.0        0.0
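The closed status in bhosts -l carries a suffix that names the reason. The sketch below pulls the host name and status out of bhosts -l text and looks the suffix up in a small table. The meanings listed are my reading of the LSF documentation and should be checked against the bhosts man page for your version. The names CLOSED_REASONS, host_states, and explain are hypothetical helpers, not LSF APIs.

```python
# Meanings as understood from the LSF documentation; verify for your version.
CLOSED_REASONS = {
    "closed_Adm": "closed by an administrator (badmin hclose)",
    "closed_Busy": "a load index exceeds its scheduling threshold",
    "closed_Excl": "running an exclusive job",
    "closed_Full": "configured job slot limit reached",
    "closed_LIM": "sbatchd is running but LIM is unavailable",
    "closed_Lock": "locked with lsadmin limlock",
    "closed_Wind": "closed by the host's dispatch window",
}


def host_states(text):
    """Yield (host, status) pairs from `bhosts -l` output."""
    host = None
    expect_status = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "HOST" and len(fields) >= 2:
            host = fields[1]
        elif fields[0] == "STATUS":
            # The next non-blank line holds the values for this header row.
            expect_status = True
        elif expect_status:
            yield host, fields[0]
            expect_status = False


def explain(status):
    """Return a short explanation for a status, or a fallback string."""
    return CLOSED_REASONS.get(status, "no explanation recorded here")


# Excerpt in the layout of the bhosts -l output shown in this post.
BHOSTS_L_SAMPLE = """\
HOST  node001
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
closed_Adm      60.00     -     16      0      0      0      0      0      -
"""

for name, status in host_states(BHOSTS_L_SAMPLE):
    print(name, status, explain(status))
```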

 

3. bhosts -X

Displays condensed host groups in an uncondensed format, listing each member host on its own line:

$ bhosts -X
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
comp027            ok              -     16      0      0      0      0      0
comp028            ok              -     16      0      0      0      0      0
comp029            ok              -     16      0      0      0      0      0
comp030            ok              -     16      0      0      0      0      0
comp031            ok              -     16      0      0      0      0      0
comp032            ok              -     16      0      0      0      0      0
comp033            ok              -     16      0      0      0      0      0
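A per-host listing like this makes it easy to estimate spare capacity. The sketch below treats MAX minus NJOBS as the number of free job slots on each ok host, which is a simplifying assumption: it ignores queue, user, and resource limits that may also restrict dispatch. The function name free_slots and the sample rows are invented for illustration.

```python
def free_slots(text):
    """Return {host: MAX - NJOBS} for 'ok' hosts that have a numeric MAX."""
    result = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "HOST_NAME":
            header = fields
            continue
        # Ignore rows seen before a header or with a different column count.
        if header is None or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        # MAX is printed as "-" when no slot limit is configured; skip those.
        if row["STATUS"] == "ok" and row["MAX"].isdigit() and row["NJOBS"].isdigit():
            result[row["HOST_NAME"]] = int(row["MAX"]) - int(row["NJOBS"])
    return result


# Hypothetical sample rows in the bhosts column layout.
SLOTS_SAMPLE = """\
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
comp027            ok              -     16      0      0      0      0      0
comp028            ok              -     16      4      4      0      0      0
comp029            closed          -     16     16     16      0      0      0
"""

slots = free_slots(SLOTS_SAMPLE)
print(slots, sum(slots.values()))
```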

 

4. bhosts -l hostID

Displays all information about a specific server host, such as the CPU factor and the load thresholds used to start, suspend, and resume jobs:

# bhosts -l comp067
HOST  comp067
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
ok              60.00     -     16      0      0      0      0      0      -

CURRENT LOAD USED FOR SCHEDULING:
               r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem   root maxroot
Total           0.0   0.0   0.0    0%   0.0     0    0 13032  324G   16G   60G  3e+05   4e+05
Reserved        0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M    0.0     0.0

              processes clockskew netcard iptotal  cpuhz cachesize diskvolume
Total             406.0       0.0     2.0     2.0 1200.0     2e+04      5e+05
Reserved            0.0       0.0     0.0     0.0    0.0       0.0        0.0

              processesroot   ipmi powerconsumption ambienttemp cputemp
Total                 399.0   -1.0             -1.0        -1.0    -1.0
Reserved                0.0    0.0              0.0         0.0     0.0

              aa_r aa_r_dy aa_dy_p aa_r_ad aa_r_hpc fluentall fluent fluent_nox
Total         18.0    25.0   128.0    10.0    272.0      47.0   47.0       50.0
Reserved       0.0     0.0     0.0     0.0      0.0       0.0    0.0        0.0

              gambit geom_trans tgrid fluent_par
Total           50.0       50.0  50.0      193.0
Reserved         0.0        0.0   0.0        0.0

LOAD THRESHOLD USED FOR SCHEDULING:
          r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
loadSched   -     -     -     -       -     -    -     -     -      -      -
loadStop    -     -     -     -       -     -    -     -     -      -      -

           root maxroot processes clockskew netcard iptotal   cpuhz cachesize
loadSched     -       -         -         -       -       -       -         -
loadStop      -       -         -         -       -       -       -         -

        diskvolume processesroot    ipmi powerconsumption ambienttemp cputemp
loadSched        -             -       -                -           -       -
loadStop         -             -       -                -           -       -

 

5. lsload

[user1@login1 ~]$ lsload
HOST_NAME       status  r15s   r1m  r15m   ut    pg  ls    it   tmp   swp   mem
login1              ok   0.0   0.0   0.0   1%   0.0  17     0  240G   16G   28G
login2              ok   0.0   0.0   0.0   0%   0.0   0  7040  242G   16G   28G
node1               ok   0.0   0.4   0.3   0%   0.0   0 31760  324G   16G   60G

The status column shows the current state of each host:

STATUS   DESCRIPTION
ok       Host is available to accept and run batch jobs and remote tasks.
-ok      LIM is running but RES is unreachable.
busy     Does not affect batch jobs; only used for remote task placement (for example, lsrun). The value of a load index exceeded a threshold (configured in lsf.cluster.cluster_name, displayed by lshosts -l). Indices that exceed thresholds are identified with an asterisk (*).
lockW    Does not affect batch jobs; only used for remote task placement (for example, lsrun). Host is locked by a run window (configured in lsf.cluster.cluster_name, displayed by lshosts -l).
lockU    Will not accept new batch jobs or remote tasks. An LSF administrator or root explicitly locked the host using lsadmin limlock, or an exclusive batch job (bsub -x) is running on the host. Running jobs are not affected. Use lsadmin limunlock to unlock LIM on the local host.
unavail  Host is down, or LIM is unavailable.
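Because a busy host marks the offending load indices with an asterisk, those columns can be picked out mechanically. The following is a rough Python sketch that assumes the lsload column layout shown above and that the asterisk is printed as part of the value field. The function name exceeded_indices and the busy row in the sample are invented for illustration.

```python
def exceeded_indices(text):
    """Return {host: [index names]} for load values marked with '*'."""
    header = None
    flagged = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "HOST_NAME":
            header = fields
            continue
        if header is None:
            continue
        # Columns 0 and 1 are the host name and status; the rest are indices.
        # zip() stops early for short rows, such as hosts with no load values.
        starred = [
            name
            for name, value in zip(header[2:], fields[2:])
            if "*" in value
        ]
        if starred:
            flagged[fields[0]] = starred
    return flagged


# Hypothetical sample; the busy row and its values are made up.
LSLOAD_SAMPLE = """\
HOST_NAME       status  r15s   r1m  r15m   ut    pg  ls    it   tmp   swp   mem
login1              ok   0.0   0.0   0.0   1%   0.0  17     0  240G   16G   28G
node1             busy   3.9  *4.2   0.3  95%   0.0   0     0  324G   16G   60G
node2          unavail
"""

print(exceeded_indices(LSLOAD_SAMPLE))
```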

 

6. lshosts -l
The lshosts -l command shows each host's static configuration together with its run windows and load thresholds:

$ lshosts -l
HOST_NAME:  comp001
type             model  cpuf ncpus ndisks maxmem maxswp maxtmp rexpri server nprocs ncores nthreads
X86_64     Intel_EM64T  60.0    16      1    63G    16G 352423M      0    Yes      2      8        1

RESOURCES: Not defined
RUN_WINDOWS:  (always open)

LOAD_THRESHOLDS:
  r15s   r1m  r15m   ut    pg    io   ls   it   tmp   swp   mem   root maxroot processes clockskew netcard iptotal  cpuhz cachesize diskvolume processesroot   ipmi powerconsumption ambienttemp cputemp
     -   3.5     -    -     -     -    -    -     -     -     -      -       -         -         -       -       -      -         -          -             -      -                -           -       -
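In a wide listing like this it is easy to miss which thresholds are actually set. The sketch below pairs the index names with the values on the following line and keeps only the entries that are not "-". It is an illustration under the assumption that the two lines after LOAD_THRESHOLDS: are the name row and the value row, and it reads only the first host in the text. The function name load_thresholds is hypothetical.

```python
def load_thresholds(text):
    """Return {index: threshold} for the first host's configured thresholds."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        if line.strip() == "LOAD_THRESHOLDS:" and i + 2 < len(lines):
            names = lines[i + 1].split()
            values = lines[i + 2].split()
            # Values stay as strings because some indices print units or '%'.
            return {n: v for n, v in zip(names, values) if v != "-"}
    return {}


# Shortened excerpt of lshosts -l output in the layout shown in this post.
LSHOSTS_SAMPLE = """\
HOST_NAME:  comp001
RUN_WINDOWS:  (always open)

LOAD_THRESHOLDS:
  r15s   r1m  r15m   ut    pg    io   ls   it   tmp   swp   mem
     -   3.5     -    -     -     -    -    -     -     -     -
"""

print(load_thresholds(LSHOSTS_SAMPLE))
```

For the excerpt above, only the r1m threshold is set, which matches the single non-dash value in the lshosts -l output.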

 

7. References:

  1. Platform – Working with hosts