Detailed Course Outline
Module 1: Install Cluster
- Describe HPCM features
- Define operating system slots
- Build cluster from ground up
- Provision node with GUI
- Provision node with command line
- Add nodes to the cluster
- Explore auto installation tools
Module 2: Discover
- Discover nodes
- Interpret cluster configuration files
- Review cluster services
Module 3: Data Networks
- Describe technologies
- Describe InfiniBand configuration
- Describe Intel Omni-Path configuration
- Describe software components
- Use diagnostic commands
Module 4: Manage Images
- Manage software repositories
- List software repositories
- Add software repositories
- Remove software repositories
- Create repository groups
- Customize an image by using RPM lists
- Create a compute node image
- Create an ICE-compute node image
- Manage image version control
- Check an image into version control
- Compare differences between two versions of an image
- List the versions of an image
- Deploy a specific version of an image
- Push an ICE-compute image to a rack
- Use parallel tools and built-in functionality to check differences between nodes
- Install batch scheduler server on a compute node
- Install batch scheduler client on a compute node and on an ICE-compute node
- Configure HPCM connectors to job schedulers
- Capture an image from a node (golden)
- Add RPMs to, remove RPMs from, and version control compute images
- Add RPMs to and remove RPMs from running compute nodes
- Clone an ICE-compute image
- Add RPMs to ICE-compute image
- Compare when and when not to use tmpfs root
- Determine which nodes use tmpfs root
- Configure nodes to use tmpfs root
- List tmpfs quota difference (rack leader quotas do not apply when ICE-compute nodes are in tmpfs)
- Set tmpfs mode
- Set disk mode
- Show which mode a node has booted with
- Show which mode a node is scheduled to boot into
Module 5: Automate Post Installation Tasks
- Review conf.d scripts
- Exclude a conf.d script
- Use
- Use
- Develop post-install and per-host customization scripts
Module 6: Configure Shared Filesystem, User Accounts, Applications, and Updates
- NFS export a filesystem on a compute node
- Mount an NFS filesystem and create a user on an ICE-compute node
- Manage user accounts
- Synchronize UIDs and GIDs, LDAP, etc.
- Run an application on compute and ICE-compute nodes
- Display BIOS settings
- Upgrade firmware
- Update kernel
- Update distribution
- Update HPCM
Module 7: Troubleshoot Cluster
- Back up cluster configuration
- Back up managed network switch configuration
- Use the central log repository
- Investigate log files
- Gather system information
- Interrogate iLOs and BMCs
- Confirm resources
- Create pdsh groups
- Investigate bond devices
- Inspect VLAN devices
- Capture a node crash dump
- Transfer an image from another slot or another system and confirm that the image can be used
- Inject faults