This post details the installation and configuration of three Mellanox ConnectX-4 adapters across multiple servers. It covers verifying that the adapters are detected, loading the drivers, setting up InfiniBand, running a subnet manager, and configuring IP over InfiniBand (IPoIB) for connectivity and testing in a lab environment.
Project “NVIDIA HPC Infiniband Homelab GPU Cluster”: Part 1: Project Overview
InfiniBand is a mature interconnect technology known for high bandwidth and low latency. It has long been used in supercomputing and HPC environments, and has also been deployed in certain storage and clustered infrastructure designs as an alternative to Fibre Channel. More recently, InfiniBand has seen strong continued adoption in large-scale AI and GPU …
How to Set Up NVIDIA CUDA and Container Toolkits on RHEL 10
This guide details installing the NVIDIA drivers, the CUDA Toolkit, and the NVIDIA Container Toolkit on RHEL 10.1, using simplified install methods and the new repositories to streamline setup and verification.
RHEL 10 – Enable Health Monitoring for NVIDIA GPUs Using DCGM Exporter
NVIDIA Data Center GPU Manager (DCGM) provides health monitoring, performance telemetry, and diagnostics for NVIDIA GPUs in servers. This post outlines installation and setup on RHEL 10, including driver validation and metric visualization.
Step-by-Step Nvidia Driver, CUDA Toolkit, & Container Toolkit Install for RHEL9
A step-by-step guide to installing the NVIDIA drivers, the NVIDIA CUDA Toolkit, and the NVIDIA Container Toolkit on Red Hat Enterprise Linux 9.
Resetting a Lost BMC Password with ipmitool
I recently got my hands on a couple of Gigabyte servers. These machines came preinstalled with Ubuntu 20.04. Credentials for an OS local user account were on a sticker on the machines. However, there was no indication of what the BMC credentials were. According to this document, there should be default credentials that we can …
Simple RAG with Ollama, OpenWebUI, and VectorDB on Ubuntu 22.04
Prerequisites: I have already installed the NVIDIA proprietary drivers and the NVIDIA CUDA Toolkit. I documented the CUDA Toolkit install in an older post, which can be found here. Since I have NVIDIA GPUs in my host system, and I intend to run some services in containers, I want to make sure that I …