This guide details the installation and configuration of three Mellanox ConnectX-4 adapters across multiple servers. It covers verifying detection, loading drivers, setting up InfiniBand, managing the subnet, and configuring IP over InfiniBand (IPoIB) for connectivity testing in a lab environment.
RHEL9
Project “NVIDIA HPC Infiniband Homelab GPU Cluster”: Part 1: Project Overview
InfiniBand is a mature interconnect technology known for high bandwidth and low latency. It has long been used in supercomputing and HPC environments, and has also been deployed in certain storage and clustered infrastructure designs as an alternative to Fibre Channel. More recently, InfiniBand has seen strong continued adoption in large-scale AI and GPU …
Configuring LACP on TP-Link SX3008F for RHEL 9/10
This guide details setting up three LACP port-channels on a TP-Link SX3008F switch for RHEL 9/10 hosts, enabling 20 GbE connectivity for efficient NFS backups.
Dell OpenManage Server Administrator: Comprehensive Guide for Hardware Monitoring (RHEL)(Dell 12 Gen)
Dell OpenManage Server Administrator (OMSA) is Dell’s on-host hardware management and monitoring framework for PowerEdge servers. It runs inside the operating system and provides direct visibility into system hardware such as RAID controllers, physical and virtual disks, power supplies, fans, temperatures, memory, processors, and chassis health. OMSA communicates with the server’s iDRAC and hardware controllers …
Step-by-Step Nvidia Driver, CUDA Toolkit, & Container Toolkit Install for RHEL9
A step-by-step guide to installing the NVIDIA drivers, the NVIDIA CUDA Toolkit, and the NVIDIA Container Toolkit on Red Hat Enterprise Linux 9.