Learn how to view Kubernetes costs and efficiency metrics in Vantage.
pvc:xyz for PVCs and node:xyz for Nodes.
node:xyz. Note that PVC Labels are not available for filtering on these reports. Kubernetes Efficiency Reports detail the usage, efficiency, and cost of the compute resources underlying Kubernetes workloads. They do not cover attached volume storage, which is where PVC Labels are applied. To report on both compute and storage resources, use Cost Reports.
__idle__ Namespace in Kubernetes Efficiency Reports
The __idle__ namespace represents the unallocated portion of nodes per hour, providing insight into overall cluster efficiency. The __idle__ namespace is included in total cluster costs.
The __idle__ namespace is enabled by default for new integrations. For an existing integration where it is not yet enabled, contact support@vantage.sh to have the namespace enabled.
To view __idle__ namespace costs, set the report’s Group By criteria to Namespace. In many cases, __idle__ ranks among the top namespaces by cost. It highlights unused capacity in your cluster and helps identify opportunities for workload optimization.
__idle__ is calculated as the difference between a node’s total capacity and the sum of allocated pod resources. For example, if a node has:
Total capacity: 8 CPU / 16 GB RAM
Allocated pod resources: 8 CPU / 6 GB RAM
then the resources attributed to __idle__ are:
0 CPU / 10 GB RAM
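The idle calculation described above can be sketched in a few lines of Python. This is only an illustration of the subtraction, not the agent's actual implementation; the Resources type and the idle_resources function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """A pair of compute quantities (hypothetical helper type)."""
    cpu: float     # CPU cores
    ram_gb: float  # memory in GB


def idle_resources(capacity: Resources, allocated: Resources) -> Resources:
    """Return capacity minus the sum of allocated pod resources."""
    return Resources(
        cpu=capacity.cpu - allocated.cpu,
        ram_gb=capacity.ram_gb - allocated.ram_gb,
    )


# Figures from the example: 8 CPU / 16 GB RAM of capacity,
# 8 CPU / 6 GB RAM allocated to pods.
capacity = Resources(cpu=8, ram_gb=16)
allocated = Resources(cpu=8, ram_gb=6)
idle = idle_resources(capacity, allocated)
print(f"idle: {idle.cpu} CPU / {idle.ram_gb} GB RAM")
```

With these figures, all CPU is allocated and 10 GB of RAM is left over, so only memory contributes to __idle__ for this node.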
Combined with the costs of all other namespaces, __idle__ costs should closely approximate total compute costs for the cluster. Minor discrepancies may occur because allocation is calculated hourly, for example when multiple pods run at different times within the same hour. In addition, if a node is fully allocated for a short period but mostly idle for the rest of the hour, __idle__ may not reflect the partial usage, leading to some variation in reported costs.
Set agent.gpu.usageMetrics to true in the agent’s values.yaml, or pass the equivalent Helm flag: --set agent.gpu.usageMetrics=true.
The agent also provides additional GPU configuration options, whose defaults match the operator’s defaults. Refer to the agent’s values.yaml for configuration details.
To configure dcgm-exporter to collect custom metrics, retrieve the metrics file and save it as dcgm-metrics.csv.
Add the DCGM_FI_DEV_FB_TOTAL memory metric to the metrics file.
Create a ConfigMap for the metrics file in the gpu-operator namespace.
Then point the exporter at the file with the following options:
--set dcgmExporter.config.name=metrics-config --set dcgmExporter.env[0].name=DCGM_EXPORTER_COLLECTORS --set dcgmExporter.env[0].value=/etc/dcgm-exporter/dcgm-metrics.csv
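As a rough sketch of the step that adds the DCGM_FI_DEV_FB_TOTAL memory metric to the metrics file, the Python snippet below appends the row only if it is not already present. It assumes dcgm-metrics.csv follows the exporter's "field, metric type, help text" row layout with # comment lines. The help text in FB_TOTAL_ROW and the ensure_metric function are assumptions made for illustration, not taken from the source.

```python
from pathlib import Path

# Assumed row in "field, metric type, help text" form; the help text is illustrative.
FB_TOTAL_ROW = "DCGM_FI_DEV_FB_TOTAL, gauge, Total framebuffer memory (in MiB)."


def ensure_metric(csv_path: Path, row: str = FB_TOTAL_ROW) -> bool:
    """Append `row` to the metrics file unless its field is already enabled.

    Returns True if the file was modified, False if the field was present.
    """
    field = row.split(",", 1)[0].strip()
    text = csv_path.read_text() if csv_path.exists() else ""

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out rows do not count as enabled
        if stripped.split(",", 1)[0].strip() == field:
            return False

    # Make sure the new row starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    csv_path.write_text(text + separator + row + "\n")
    return True


if __name__ == "__main__":
    changed = ensure_metric(Path("dcgm-metrics.csv"))
    print("metric added" if changed else "metric already present")
```

Running it twice leaves the file unchanged the second time, which keeps the step safe to repeat.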