Intel-GPU
Prerequisites
- If Talos runs in a VM, the GPU is isolated on the host
- If Talos runs in a VM, the GPU is passed through to your Talos machine
Extensions for Talhelper/Clustertool
It's important to add the following extensions to your talconfig.yaml before bootstrapping:
schematic:
  customization:
    systemExtensions:
      officialExtensions:
        - siderolabs/i915
        - siderolabs/intel-ucode
        - siderolabs/mei
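For orientation, here is a rough sketch of where this block might sit in a talconfig.yaml. It assumes the talhelper layout, in which a schematic can be set under the global controlPlane section (and similarly under worker, or on an individual node). Your generated talconfig.yaml may be organised differently, so treat the surrounding key as an assumption and keep whatever structure your file already has.

```yaml
# Sketch only: the enclosing "controlPlane" key is an assumption based on the
# talhelper layout; the extension list itself is taken from the guide above.
controlPlane:
  schematic:
    customization:
      systemExtensions:
        officialExtensions:
          - siderolabs/i915         # Intel GPU driver/firmware extension
          - siderolabs/intel-ucode  # Intel CPU microcode
          - siderolabs/mei          # Intel Management Engine Interface
```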
Adding it to your cluster
If it's a fresh bootstrap, you can simply follow the clustertool guide on how to bootstrap your cluster.
If it is an existing cluster, you will need to run clustertool talos upgrade to add the extensions to your cluster.
Adding Intel Repo for required Charts
Add the following repo to your cluster if using fluxcd:
---
# yaml-language-server: $schema=https://kubernetes-schemas.pages.dev/source.toolkit.fluxcd.io/helmrepository_v1.json
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: intel
  namespace: flux-system
spec:
  interval: 2h
  url: https://intel.github.io/helm-charts
Add intel-device-plugin-operator
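Add the intel-device-plugin-operator to your cluster. The example below depends on node-feature-discovery in the kube-system namespace, so that release must already exist. Example HelmRelease configuration: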
---
# yaml-language-server: $schema=https://kubernetes-schemas.pages.dev/helm.toolkit.fluxcd.io/helmrelease_v2.json
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: intel-device-plugin-operator
  namespace: system
spec:
  interval: 30m
  chart:
    spec:
      chart: intel-device-plugins-operator
      version: 0.32.0
      sourceRef:
        kind: HelmRepository
        name: intel
        namespace: flux-system
  install:
    crds: CreateReplace
    remediation:
      retries: 3
  upgrade:
    cleanupOnFail: true
    crds: CreateReplace
    remediation:
      strategy: rollback
      retries: 3
  dependsOn:
    - name: node-feature-discovery
      namespace: kube-system
  values:
    controllerExtraArgs: |
      - --devices=gpu
Add intel-device-plugin-gpu
Add the intel-device-plugin-gpu to your cluster. Example HelmRelease configuration:
---
# yaml-language-server: $schema=https://kubernetes-schemas.pages.dev/helm.toolkit.fluxcd.io/helmrelease_v2.json
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: intel-device-plugin-gpu
  namespace: system
spec:
  interval: 30m
  chart:
    spec:
      chart: intel-device-plugins-gpu
      version: 0.32.0
      sourceRef:
        kind: HelmRepository
        name: intel
        namespace: flux-system
  install:
    remediation:
      retries: 3
  upgrade:
    cleanupOnFail: true
    remediation:
      strategy: rollback
      retries: 3
  dependsOn:
    - name: intel-device-plugin-operator
      namespace: system
  values:
    name: intel-gpu-plugin
    sharedDevNum: 5
    nodeFeatureRule: true
Check if GPU is schedulable
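Run the following command. Every node with a working GPU should report a non-empty i915 value:
kubectl get nodes -o=jsonpath="{range .items[*]}{.metadata.name}{'\n'}{' i915: '}{.status.allocatable.gpu\.intel\.com/i915}{'\n'}{end}"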
Example of GPU Assignment
The following shows an example of how to add the GPU to a chart. Depending on the chart, you may need to adapt the workload name.
resources:
  limits:
    gpu.intel.com/i915: 1
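Outside of a chart, the same resource limit can be tried on a plain Pod as a quick smoke test. The manifest below is a minimal sketch, not part of the original guide: the Pod name, the busybox image and the command are illustrative assumptions. If the device plugin is working, the container should be scheduled onto a GPU node and see device nodes under /dev/dri.

```yaml
# Hypothetical test Pod; name, image and command are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: intel-gpu-test
spec:
  restartPolicy: Never
  containers:
    - name: main
      image: busybox:1.36
      # List the GPU device nodes exposed to the container, then exit.
      command: ["sh", "-c", "ls -l /dev/dri"]
      resources:
        limits:
          # Same extended resource as in the chart example above.
          gpu.intel.com/i915: 1
```

Once the Pod has completed, kubectl logs intel-gpu-test shows what the container could see. Delete the Pod afterwards so it no longer holds a GPU share.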