Basic Tuning of RDMA Parameters for Spectrum Scale

If your cluster shows symptoms of overload and GPFS keeps reporting “overloaded” in its logs, as in the entries below, you may see long waiters and sometimes deadlocks.

Wed Apr 11 15:53:44.232 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 15:55:24.488 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 15:57:04.743 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 15:58:44.998 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 16:00:25.253 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 16:28:45.601 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 16:33:56.817 2018: [N] sdrServ: Received deadlock notification from

Increase scatterBufferSize to a Value that Matches the IB Fabric
One of the first parameters to tune is scatterBufferSize. According to the IBM Spectrum Scale wiki, FDR10 fabrics can be tuned to 131072 and FDR14 fabrics to 262144.

The default value of 32768 may perform OK. If the CPU utilization on the NSD IO servers is observed to be high and client IO performance is lower than expected, increasing the value of scatterBufferSize on the clients may improve performance.

# mmchconfig scatterBufferSize=131072

There are other parameters that can be tuned as well, but scatterBufferSize alone produced immediate results for me:
verbsRdmaSend
verbsRdmasPerConnection
verbsRdmasPerNode
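Before changing any of these, it can help to record the current values on a node. This is a sketch assuming a live Spectrum Scale cluster (the commands below only make sense on a GPFS node, so they are not something you can dry-run elsewhere):

```shell
# Show the currently configured RDMA-related values (requires a live GPFS node).
# mmlsconfig prints the committed configuration; mmfsadm dump config shows
# the values the daemon is actually running with.
mmlsconfig scatterBufferSize verbsRdmaSend verbsRdmasPerConnection verbsRdmasPerNode
mmfsadm dump config | grep -Ei 'scatterBufferSize|verbsRdma'
```

Comparing the two outputs also tells you whether a change has been committed but not yet picked up by the daemon.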

Disable verbsRdmaSend on the NSD Nodes

# mmchconfig verbsRdmaSend=no -N nsd1,nsd2

Verify the settings have taken effect

# mmfsadm dump config | grep verbsRdma

Increase verbsRdmasPerNode to 514 for NSD Nodes

# mmchconfig verbsRdmasPerNode=514 -N nsd1,nsd2

References:

  1. Best Practices RDMA Tuning

Cannot initialize RDMA protocol on Cluster with Platform LSF

If you encounter this issue during an application run and your scheduler is Platform LSF, there is a simple solution.

Symptoms

explicit_dp: Rank 0:13: MPI_Init_thread: didn't find active interface/port
explicit_dp: Rank 0:13: MPI_Init_thread: Can't initialize RDMA device
explicit_dp: Rank 0:13: MPI_Init_thread: Internal Error: Cannot initialize RDMA protocol
MPI Application rank 13 exited before MPI_Init() with status 1
mpirun: Broken pipe

Cause:
In this case the amount of locked memory was set to unlimited in /etc/security/limits.conf, but this was not sufficient:
the MPI jobs were started under LSF, and the LSF daemons themselves had been started with very small locked-memory limits, which the jobs inherited.

Solution:
Set the amount of locked memory to unlimited in /etc/init.d/lsf by adding the ‘ulimit -l unlimited’ command.

.....
.....
### END INIT INFO
ulimit -l unlimited
. /opt/lsf/conf/profile.lsf
.....
.....
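After restarting the LSF daemons, you can confirm they picked up the new limit by inspecting a running process via /proc. A sketch, with the current shell's PID standing in for an LSF daemon such as lim or sbatchd:

```shell
# Show the locked-memory limit of the current shell; substitute the PID of
# an LSF daemon (e.g. from `pgrep lim`) to check the daemons themselves.
grep "Max locked memory" /proc/$$/limits
ulimit -l   # the same limit as reported by the shell builtin
```

If the daemon line still shows a small value, the init script change has not taken effect for that process.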

References:

  1. HP HPC Linux Value Pack 3.1 – Platform MPI job failed

Disable SELinux in CentOS 7

1. Check the SELinux Status on CentOS 7

# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 28

2. Disable SELinux Temporarily

# setenforce 0

2a. Check Status

# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: permissive
Mode from config file: permissive
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 28

3. Disable SELinux Permanently

# vim /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted

3a. Check Status

# sestatus
SELinux status: disabled
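If you manage many nodes, a small helper to read the configured mode straight from the file can be handy. A sketch (the function name is mine; on a real CentOS 7 node pass /etc/selinux/config):

```shell
# Print the SELINUX= mode configured in a given file
# (on a real node, pass /etc/selinux/config).
selinux_mode() {
    sed -n 's/^SELINUX=//p' "$1"
}

# Demo against a sample file, since this machine may not run SELinux
printf 'SELINUX=disabled\nSELINUXTYPE=targeted\n' > /tmp/selinux-demo.conf
selinux_mode /tmp/selinux-demo.conf   # prints: disabled
```

Note this reports the mode configured for the next boot, not the running mode; use sestatus or getenforce for the latter.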

Setting up NTP in CentOS 7

Step 1: Yum Install

NTP can be installed from the CentOS repositories with yum

# yum install ntp

Step 2: Edit the Public Time Servers

Once you have installed the ntp package, go to the official NTP Public Pool Time Servers site. For Singapore, you can use the country-specific pool zone; add the following to your /etc/ntp.conf file:

server 0.sg.pool.ntp.org iburst
server 1.sg.pool.ntp.org iburst
server 2.sg.pool.ntp.org iburst
server 3.sg.pool.ntp.org iburst

Step 3: Allow the clients from the network to sync with this server

Restrict which clients on which network are allowed to query and sync time:

restrict 192.168.5.0 netmask 255.255.224.0 nomodify notrap
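One thing worth double-checking here: the address on a restrict line should be the network base for the given mask. With netmask 255.255.224.0 the base of 192.168.5.0 is actually 192.168.0.0, so either use 192.168.0.0 with that mask or a narrower mask such as 255.255.255.0. A small sketch (the function name is mine) to compute the base address:

```shell
# Compute the network base address for an IP/netmask pair,
# to sanity-check an ntp.conf "restrict" line.
network_addr() {
    ip=$1 mask=$2
    oldifs=$IFS; IFS=.
    set -- $ip;   i1=$1 i2=$2 i3=$3 i4=$4
    set -- $mask; m1=$1 m2=$2 m3=$3 m4=$4
    IFS=$oldifs
    echo "$((i1 & m1)).$((i2 & m2)).$((i3 & m3)).$((i4 & m4))"
}

network_addr 192.168.5.0 255.255.224.0   # prints: 192.168.0.0
```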

Step 4: Record NTP events in a dedicated log file by adding the following to /etc/ntp.conf

logfile /var/log/ntp.log

Step 5: Add Firewall Rule and Start Services

# firewall-cmd --add-service=ntp --permanent
# firewall-cmd --reload
# systemctl start ntpd
# systemctl enable ntpd
# systemctl status ntpd

Step 6: Verify Time Sync

# ntpq -p
# date -R

Or query a selected pool time server directly:

# ntpdate -q 0.sg.pool.ntp.org
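Once ntpd has been running for a few minutes, ntpq -p marks the peer it has selected with a leading `*`. A sketch that checks for one (the sample line below stands in for live ntpq output; pipe in `ntpq -pn` on a real node):

```shell
# Succeed if ntpq -p style output on stdin contains a selected peer
# (a line starting with '*').
has_selected_peer() {
    grep -q '^\*'
}

sample='*0.sg.pool.ntp.org  203.0.113.10  2 u  33  64  377  1.203  0.118  0.214'
printf '%s\n' "$sample" | has_selected_peer && echo "time is synchronized"
```

No `*` in the output usually just means ntpd has not finished selecting a peer yet; give it a few minutes before troubleshooting.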

References:

  1. Setting Up “NTP (Network Time Protocol) Server” in RHEL/CentOS 7 (by Tecmint.com)

Set hostname using hostnamectl for CentOS 7

1. Listing hostname using “hostnamectl” or “hostnamectl status”

[root@localhost ~]# hostnamectl
Static hostname: helloworld.com
Icon name: computer-server
Chassis: server
Machine ID: aaaaaaaaaaaaa
Boot ID: ddddddddddd
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-327.el7.x86_64
Architecture: x86-64

2. Setting the static hostname using hostnamectl

# hostnamectl set-hostname "helloworld.com" --static

3. Delete the static hostname using hostnamectl

# hostnamectl set-hostname "" --static
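hostnamectl will accept almost any string, so it can be worth validating a name before setting it. A loose RFC 1123-style check (the function name is mine, and this is a sketch, not a full DNS validator):

```shell
# Loosely validate a hostname: dot-separated labels of letters, digits
# and hyphens, each label starting and ending with a letter or digit.
valid_hostname() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?(\.[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*$'
}

valid_hostname helloworld.com && echo "ok to set"
```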

IBM Spectrum Scale Development Blogs for (Q1 2018)

Here is a list of development blogs for this quarter (Q1 2018). As discussed in the User Groups, passing it along:

GDPR Compliance and Unstructured Data Storage
https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/

IBM Spectrum Scale for Linux on IBM Z – Release 5.0 features and highlights
https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/

Management GUI enhancements in IBM Spectrum Scale release 5.0.0
https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/

IBM Spectrum Scale 5.0.0 – What's new in NFS?
https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/

Benefits and implementation of Spectrum Scale sudo wrappers
https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/

IBM Spectrum Scale: Big Data and Analytics Solution Brief
https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/

Variant Sub-blocks in Spectrum Scale 5.0
https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/

Compression support in Spectrum Scale 5.0.0
https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/

IBM Spectrum Scale Versus Apache Hadoop HDFS
https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/

ESS Fault Tolerance
https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/

Genomic Workloads – How To Get It Right From an Infrastructure Point Of View.
https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/

IBM Spectrum Scale On AWS Cloud: This video explains how to deploy IBM Spectrum Scale on AWS. This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale.

Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4
Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY