NVIDIA Data Center GPU Manager (DCGM) provides health monitoring, performance telemetry, and diagnostics for NVIDIA GPUs in servers. This post outlines installation and setup on RHEL 10, including driver validation and metric visualization.
GPUs
Essential Commands to Monitor Nvidia GPUs in Linux
Identify Your GPU via the Linux CLI
Verify that the OS recognizes your card with hwinfo:
# hwinfo --gfxcard --short
graphics card:
  nVidia TU104GL [Tesla T4]
  nVidia TU104GL [Tesla T4]
  Matrox G200eR2
Primary display adapter: #58
Or you can see similar output with lshw:
# lshw -C display
*-display …
Ollama CLI Quick Start Guide and Tutorial for Beginners – Part 1
This two-part guide introduces beginners to using Ollama, focusing initially on installation and model pulling. It explains essential commands and service configurations, discusses memory management, and provides troubleshooting tips for ensuring successful model installation and interaction with Ollama through the command line interface.
Selecting a GPU for a Dell T620
This post discusses the top GPU options for the Dell T620 server, focusing on compatibility, power, and thermal considerations. It highlights GPUs like the NVIDIA Tesla T4 and Quadro series while offering tips on optimizing performance through cooling upgrades and PCIe configuration.
Installing the GPU Power Supply Expansion Board into the Dell T620
Introduction: I recently picked up a couple of used Dell T620s for my homelab for AI/ML project work. Dell tower form factor servers are very attractive to homelabbers because they are widely available, inexpensive, fairly quiet, and easily expandable. For example, …