..  SPDX-License-Identifier: BSD-3-Clause
    Copyright(c) 2010-2014 Intel Corporation.

VM Power Management Application
===============================


Applications running in virtual environments have an abstract view of
the underlying hardware on the host; in particular, applications cannot see
the binding of virtual to physical hardware.
When looking at CPU resourcing, the pinning of Virtual CPUs (vCPUs) to
Host Physical CPUs (pCPUs) is not apparent to an application,
and this pinning may change over time.
Furthermore, operating systems on virtual machines do not have the ability
to govern their own power policy; the Model Specific Registers (MSRs)
for enabling P-State transitions are not exposed to operating systems
running on Virtual Machines (VMs).

The Virtual Machine Power Management solution shows an example of
how a DPDK application can indicate its processing requirements, using only
VM-local information (vCPU/lcore, etc.), to a host-based Monitor. The Monitor
is responsible for accepting requests for frequency changes for a vCPU,
translating the vCPU to a pCPU via libvirt and effecting the change in frequency.


The solution consists of two high-level components:

#. Example Host Application

   A Command Line Interface (CLI) for VM->Host communication channel management
   allows adding channels to the Monitor, setting and querying the vCPU to pCPU
   pinning, and inspecting and manually changing the frequency of each CPU.
   The CLI runs on a single lcore while the thread responsible for managing
   VM requests runs on a second lcore.

   VM requests arriving on a channel for frequency changes are passed
   to the librte_power ACPI cpufreq sysfs based library.
   The Host Application relies on both qemu-kvm and libvirt to function.

   This monitoring application is responsible for:

   - Accepting requests from client applications: client applications can
     request frequency changes for a vCPU; the host application translates
     the vCPU to a pCPU via libvirt and effects the change in frequency.

   - Accepting policies from client applications: client applications can
     send a policy to the host application. The host application will then
     apply the rules of the policy independently of the application. For
     example, the policy can contain time-of-day information for busy/quiet
     periods, and the host application can scale up/down the relevant cores
     when required. See the details of the guest application below for more
     information on setting the policy values.

   - Out-of-band monitoring of workloads via the cores' hardware event counters:
     the host application can manage power for an application in a virtualised
     or non-virtualised environment by looking at the event counters of the
     cores and taking action based on the branch hit/miss ratio. See the host
     application ``--core-list`` command line parameter below.

#. librte_power for Virtual Machines

   Using an alternate implementation of the librte_power API, requests for
   frequency changes are forwarded to the host monitor rather than
   the ACPI cpufreq sysfs interface used on the host.

   The l3fwd-power application will use this implementation when deployed on a VM
   (see :doc:`l3_forward_power_man`).

.. _figure_vm_power_mgr_highlevel:

.. figure:: img/vm_power_mgr_highlevel.*


VM Power Management employs qemu-kvm to provide communication channels
between the host and VMs in the form of Virtio-Serial, which appears as
a paravirtualized serial device on a VM and can be configured to use
various backends on the host. For this example, each Virtio-Serial endpoint
on the host is configured as an AF_UNIX file socket, supporting poll/select
and epoll for event notification.
In this example, each channel endpoint on the host is monitored via
epoll for EPOLLIN events.
Each channel is specified as qemu-kvm arguments or as libvirt XML for each VM.
Each VM can have up to a maximum of 64 channels; in this example, each DPDK
lcore on a VM has exclusive access to a channel.

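
As an illustration only, the following C sketch shows how a host-side monitor
could connect to a channel socket and wait for EPOLLIN events on it. It is not
the vm_power_manager source: the helper names ``channel_connect``,
``channel_watch`` and ``channel_wait_readable`` are hypothetical, and the sketch
assumes qemu has bound each channel as an AF_UNIX stream socket at the configured path.

.. code-block:: c

  #include <string.h>
  #include <sys/epoll.h>
  #include <sys/socket.h>
  #include <sys/un.h>
  #include <unistd.h>

  /* Hypothetical helper: connect to a channel endpoint that qemu has bound
   * as an AF_UNIX stream socket, e.g. /tmp/powermonitor/{vm_name}.{channel_num}.
   * Returns the connected fd, or -1 on failure. */
  static int channel_connect(const char *path)
  {
      struct sockaddr_un addr;
      int fd;

      if (strlen(path) >= sizeof(addr.sun_path))
          return -1; /* path does not fit in sun_path */
      fd = socket(AF_UNIX, SOCK_STREAM, 0);
      if (fd < 0)
          return -1;
      memset(&addr, 0, sizeof(addr));
      addr.sun_family = AF_UNIX;
      strcpy(addr.sun_path, path);
      if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
          close(fd);
          return -1;
      }
      return fd;
  }

  /* Register a channel fd with an epoll instance for EPOLLIN events. */
  static int channel_watch(int epfd, int fd)
  {
      struct epoll_event ev;

      memset(&ev, 0, sizeof(ev));
      ev.events = EPOLLIN;
      ev.data.fd = fd;
      return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
  }

  /* Wait up to timeout_ms for a channel to become readable.
   * Returns the readable fd, or -1 on timeout or error. */
  static int channel_wait_readable(int epfd, int timeout_ms)
  {
      struct epoll_event ev;

      if (epoll_wait(epfd, &ev, 1, timeout_ms) == 1 && (ev.events & EPOLLIN))
          return ev.data.fd;
      return -1;
  }

A monitor thread built along these lines would call ``channel_watch()`` once per
connected channel, then loop on ``channel_wait_readable()`` and read one request
from each fd it returns.
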

To enable frequency changes from within a VM, a request made via the librte_power
interface is forwarded via Virtio-Serial to the host. Each request contains the vCPU
and the power command (scale up/down/min/max).
The API for host and guest librte_power is consistent across environments,
with the selection of the VM or host implementation determined automatically
at runtime based on the environment.

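
Because the guest only knows its own vCPU numbering, a request needs to carry
just the vCPU and the command. The sketch below shows one possible fixed-size
request layout with encode/decode helpers; the struct, enum and function names
are hypothetical and do not reproduce the librte_power wire format. Copying the
struct byte-for-byte assumes that host and guest share the same endianness and
struct layout.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* Hypothetical power commands mirroring scale up/down/min/max. */
  enum vm_power_cmd {
      VM_POWER_SCALE_UP = 1,
      VM_POWER_SCALE_DOWN,
      VM_POWER_SCALE_MIN,
      VM_POWER_SCALE_MAX
  };

  /* Hypothetical request: the guest only knows its own vCPU number. */
  struct vm_power_request {
      uint32_t vcpu; /* vCPU as seen inside the VM */
      uint32_t cmd;  /* one of enum vm_power_cmd */
  };

  /* Copy a request into buf, which must hold sizeof(struct vm_power_request)
   * bytes. Returns the number of bytes written. */
  static size_t vm_power_request_encode(const struct vm_power_request *req,
                                        uint8_t *buf)
  {
      memcpy(buf, req, sizeof(*req));
      return sizeof(*req);
  }

  /* Parse a received buffer. Returns 0 for a well-formed request,
   * -1 for a wrongly sized buffer or an unknown command. */
  static int vm_power_request_decode(const uint8_t *buf, size_t len,
                                     struct vm_power_request *req)
  {
      if (len != sizeof(*req))
          return -1;
      memcpy(req, buf, sizeof(*req));
      if (req->cmd < (uint32_t)VM_POWER_SCALE_UP ||
          req->cmd > (uint32_t)VM_POWER_SCALE_MAX)
          return -1;
      return 0;
  }
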

Upon receiving a request, the host translates the vCPU to a pCPU via
the libvirt API before forwarding it to the host librte_power.

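
libvirt reports vCPU pinning as a per-vCPU CPU map in which bit ``i % 8`` of
byte ``i / 8`` is set when the vCPU may run on pCPU ``i`` (for example, the maps
filled in by ``virDomainGetVcpuPinInfo()``). The hypothetical helper below only
sketches the last translation step, turning such a map into a pCPU number;
picking the lowest set pCPU when a vCPU is pinned to several is an assumption
of this sketch, not necessarily what the host application does.

.. code-block:: c

  #include <stddef.h>

  /* Hypothetical helper: return the lowest pCPU set in a libvirt-style
   * cpumap (bit i % 8 of byte i / 8 set => pCPU i allowed), or -1 if
   * no bit is set. */
  static int cpumap_first_pcpu(const unsigned char *cpumap, size_t maplen)
  {
      for (size_t byte = 0; byte < maplen; byte++)
          for (int bit = 0; bit < 8; bit++)
              if (cpumap[byte] & (1u << bit))
                  return (int)(byte * 8 + bit);
      return -1;
  }

For example, a two-byte map ``{ 0x00, 0x04 }`` has only bit 2 of byte 1 set,
which corresponds to pCPU 10.
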

.. _figure_vm_power_mgr_vm_request_seq:

.. figure:: img/vm_power_mgr_vm_request_seq.*

   VM request to scale frequency


Performance Considerations
~~~~~~~~~~~~~~~~~~~~~~~~~~

While the Haswell microarchitecture allows for independent power control of each core,
earlier microarchitectures do not offer such fine-grained control.
When deploying on pre-Haswell platforms, greater care must be taken in selecting
which cores are assigned to a VM; for instance, a core will not scale down
until its sibling is similarly scaled.

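
One way to picture the sibling constraint is a simplified model in which cores
sharing a frequency domain all run at the highest frequency any of them requests.
The C sketch below encodes only that assumed model (it does not query the
hardware); the function name is hypothetical.

.. code-block:: c

  #include <stddef.h>

  /* Simplified model (an assumption, not measured behaviour): cores that
   * share one frequency domain run at the highest frequency requested by
   * any of them. Returns that frequency in kHz, or 0 with no requests. */
  static unsigned int shared_domain_freq_khz(const unsigned int *requested_khz,
                                             size_t ncores)
  {
      unsigned int max = 0;

      for (size_t i = 0; i < ncores; i++)
          if (requested_khz[i] > max)
              max = requested_khz[i];
      return max;
  }

Under this model, requests of 1200000 kHz and 2300000 kHz leave the domain at
2300000 kHz; the shared frequency only drops once both cores request 1200000 kHz.
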

Enhanced Intel SpeedStep® Technology must be enabled in the platform BIOS
if the power management feature of DPDK is to be used.
Otherwise, the sysfs directory ``/sys/devices/system/cpu/cpu0/cpufreq`` will not exist,
and the CPU frequency-based power management cannot be used.
Consult the relevant BIOS documentation to determine how these settings
can be accessed.

132 ~~~~~~~~~~~~~~~~~~~~~
134 The DPDK Power Library can use either the *acpi_cpufreq* or *intel_pstate*
135 kernel driver for the management of core frequencies. In many cases
136 the *intel_pstate* driver is the default Power Management environment.
138 Should the *acpi-cpufreq* driver be required, the *intel_pstate* module must
139 be disabled, and *apci_cpufreq* module loaded in its place.
141 To disable *intel_pstate* driver, add the following to the grub Linux
144 .. code-block:: console
148 Upon rebooting, load the *acpi_cpufreq* module:
150 .. code-block:: console
152 modprobe acpi_cpufreq
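
To check which driver is active, and whether the cpufreq sysfs directory exists
at all, the current driver name can be read from
``/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver``. The helper below is a
hypothetical sketch for reading such a one-line sysfs attribute; a return value
of -1 for that path suggests that no cpufreq driver is available for cpu0.

.. code-block:: c

  #include <stdio.h>
  #include <string.h>

  /* Hypothetical helper: read a one-line sysfs attribute into buf and strip
   * the trailing newline. Returns 0 on success, or -1 if the file cannot be
   * read (for example because the cpufreq directory does not exist). */
  static int read_sysfs_line(const char *path, char *buf, size_t len)
  {
      FILE *f;

      if (len == 0)
          return -1;
      f = fopen(path, "r");
      if (f == NULL)
          return -1;
      if (fgets(buf, (int)len, f) == NULL) {
          fclose(f);
          return -1;
      }
      fclose(f);
      buf[strcspn(buf, "\n")] = '\0';
      return 0;
  }

After loading the *acpi_cpufreq* module, this attribute is expected to read
``acpi-cpufreq``; with the default driver active it typically reads ``intel_pstate``.
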

Hypervisor Channel Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Virtio-Serial channels are configured via libvirt XML:

.. code-block:: xml

  <name>{vm_name}</name>
  <controller type='virtio-serial' index='0'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
  </controller>
  <channel type='unix'>
    <source mode='bind' path='/tmp/powermonitor/{vm_name}.{channel_num}'/>
    <target type='virtio' name='virtio.serial.port.poweragent.{vm_channel_num}'/>
    <address type='virtio-serial' controller='0' bus='0' port='{N}'/>
  </channel>

Here a single controller of type *virtio-serial* is created. Up to 32 channels
can be associated with a single controller, and multiple controllers can be specified.
The convention is to use the name of the VM in the host path *{vm_name}* and
to increment *{channel_num}* for each channel; likewise, the port value *{N}*
must be incremented for each channel.


Each channel on the host will appear in *path*. The directory */tmp/powermonitor/*
must first be created and given qemu permissions:

.. code-block:: console

  mkdir /tmp/powermonitor/
  chown qemu:qemu /tmp/powermonitor

Note that files and directories within ``/tmp`` are generally removed upon
rebooting the host, so the above steps may need to be carried out after each reboot.

The serial device as it appears on a VM is configured with the *target* element attribute *name*
and must be in the form of *virtio.serial.port.poweragent.{vm_channel_num}*,
where *vm_channel_num* is typically the lcore channel to be used in DPDK VM applications.

Each channel on a VM will be present at */dev/virtio-ports/virtio.serial.port.poweragent.{vm_channel_num}*.

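
The naming convention above can be captured in two small helpers, shown here as
a hypothetical sketch rather than code taken from the application. The host path
uses the VM name and the host-side channel number, while the guest path uses the
in-VM channel number. Rejecting a VM name that contains ``/`` is an extra safety
check added by this sketch.

.. code-block:: c

  #include <stdio.h>
  #include <string.h>

  /* Hypothetical helper: host-side socket path for a channel, following the
   * /tmp/powermonitor/{vm_name}.{channel_num} convention. Returns 0 on
   * success, -1 if the name contains '/' or buf is too small. */
  static int host_channel_path(char *buf, size_t len, const char *vm_name,
                               unsigned int channel_num)
  {
      int n;

      if (strchr(vm_name, '/') != NULL)
          return -1; /* keep the socket inside /tmp/powermonitor */
      n = snprintf(buf, len, "/tmp/powermonitor/%s.%u", vm_name, channel_num);
      return (n < 0 || (size_t)n >= len) ? -1 : 0;
  }

  /* Hypothetical helper: guest-side device path for a channel. */
  static int guest_channel_path(char *buf, size_t len,
                                unsigned int vm_channel_num)
  {
      int n = snprintf(buf, len,
                       "/dev/virtio-ports/virtio.serial.port.poweragent.%u",
                       vm_channel_num);
      return (n < 0 || (size_t)n >= len) ? -1 : 0;
  }
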

Compiling and Running the Host Application
------------------------------------------

For information on compiling DPDK and the sample applications
see :doc:`compiling`.

The application is located in the ``vm_power_manager`` sub-directory.

To build just the ``vm_power_manager`` application using ``make``:

.. code-block:: console

  export RTE_SDK=/path/to/rte_sdk
  export RTE_TARGET=build
  cd ${RTE_SDK}/examples/vm_power_manager/
  make

The resulting binary will be ``${RTE_SDK}/build/examples/vm_power_manager``.

To build just the ``vm_power_manager`` application using ``meson/ninja``:

.. code-block:: console

  export RTE_SDK=/path/to/rte_sdk
  meson configure -Dexamples=vm_power_manager

The resulting binary will be ``${RTE_SDK}/build/examples/dpdk-vm_power_manager``.


The application does not require any command line options other than *EAL*:

.. code-block:: console

  ./build/vm_power_mgr [EAL options]

The application requires exactly two cores to run. One core is dedicated to the CLI,
while the other is dedicated to the channel endpoint monitor. For example, to run
on cores 0 and 1 on a system with 4 memory channels:

.. code-block:: console

  ./build/vm_power_mgr -l 0-1 -n 4

After successful initialization the user is presented with the VM Power Manager CLI:

.. code-block:: console

  vm_power>


Virtual Machines can now be added to the VM Power Manager:

.. code-block:: console

  vm_power> add_vm {vm_name}

When a {vm_name} is specified with the *add_vm* command, a lookup is performed
with libvirt to ensure that the VM exists. {vm_name} is used as a unique identifier
to associate channels with a particular VM and for executing operations on a VM within the CLI.
VMs do not have to be running in order to add them.


A number of commands can be issued via the CLI in relation to VMs:

Remove a Virtual Machine identified by {vm_name} from the VM Power Manager.

.. code-block:: console

Add communication channels for the specified VM. The virtio channels must be enabled
in the VM configuration (qemu/libvirt) and the associated VM must be active.
{list} is a comma-separated list of channel numbers to add; using the keyword 'all'
will attempt to add all channels for the VM:

.. code-block:: console

  add_channels {vm_name} {list}|all

Enable or disable the communication channels in {list} (comma-separated)
for the specified VM; alternatively, {list} can be replaced with the keyword 'all'.
Disabled channels will still receive packets on the host, however the commands
they specify will be ignored. Set the status to 'enabled' to begin processing requests again:

.. code-block:: console

  set_channel_status {vm_name} {list}|all enabled|disabled

Print to the CLI the information on the specified VM. The information
lists the number of vCPUs, the pinning to pCPU(s) as a bit mask, and
any communication channels associated with each VM, along with the status of each channel:

.. code-block:: console

  show_vm {vm_name}

Set the binding of a Virtual CPU on the VM with name {vm_name} to the Physical CPU mask:

.. code-block:: console

  set_pcpu_mask {vm_name} {vcpu} {pcpu}

Set the binding of a Virtual CPU on a VM to the Physical CPU:

.. code-block:: console

  set_pcpu {vm_name} {vcpu} {pcpu}

Enable or disable the query of physical core information from a VM:

.. code-block:: console

  set_query {vm_name} enable|disable

Manual control and inspection can also be carried out in relation to CPU frequency scaling:

Get the current frequency for each core specified in the mask:

.. code-block:: console

  show_cpu_freq_mask {mask}


Set the current frequency for the cores specified in {core_mask} by scaling each up/down/min/max:

.. code-block:: console

  set_cpu_freq_mask {core_mask} up|down|min|max


Get the current frequency for the specified core:

.. code-block:: console

  show_cpu_freq {core_num}

Set the current frequency for the specified core by scaling up/down/min/max:

.. code-block:: console

  set_cpu_freq {core_num} up|down|min|max


There are also some command line parameters for enabling the out-of-band
monitoring of the branch ratio on cores doing busy polling via PMDs:

.. code-block:: console

  --core-list {list of cores}

When this parameter is used, the cores in the list will be monitored for the ratio
between branch hits and branch misses. A tightly polling PMD thread will have a
very low branch ratio, so the core frequency will be scaled down to the minimum
allowed value. When packets are received, the code path will change, causing the
branch ratio to increase. When the ratio goes above the ratio threshold, the
core frequency will be scaled up to the maximum allowed value.

.. code-block:: console

  --branch-ratio {ratio}

The branch ratio is a floating point number that specifies the threshold at which
to scale up or down for the given workload. The default branch ratio is 0.01,
and it will need to be adjusted for different workloads.

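
The scaling rule described above can be summarised as a small decision function.
This is a sketch, not the application's code: it assumes the ratio is computed as
branch misses divided by branch hits over one sampling interval, and that
``threshold`` is the ``--branch-ratio`` value (default 0.01). How the counters are
sampled (for example with ``perf_event_open()`` and the
``PERF_COUNT_HW_BRANCH_INSTRUCTIONS`` / ``PERF_COUNT_HW_BRANCH_MISSES`` events)
is outside its scope.

.. code-block:: c

  #include <stdint.h>

  enum freq_action { FREQ_NO_CHANGE, FREQ_SCALE_MIN, FREQ_SCALE_MAX };

  /* Decide how to scale a core from branch counter deltas sampled over one
   * interval. ASSUMPTION: ratio = misses / hits; the real application may
   * define or scale the ratio differently. */
  static enum freq_action branch_ratio_action(uint64_t hits_delta,
                                              uint64_t misses_delta,
                                              double threshold)
  {
      double ratio;

      if (hits_delta == 0)
          return FREQ_NO_CHANGE; /* nothing sampled in this interval */
      ratio = (double)misses_delta / (double)hits_delta;
      /* tight polling loop: few mispredictions, low ratio -> minimum */
      return ratio > threshold ? FREQ_SCALE_MAX : FREQ_SCALE_MIN;
  }

For example, 100 misses over 1000000 hits gives a ratio of 0.0001, below the
default threshold, so the core would be scaled to its minimum frequency.
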

In addition to the command line interface for host commands and a virtio-serial
interface for VM power policies, there is also a JSON interface through which
power commands and policies can be sent. This functionality adds a dependency
on the Jansson library, and the Jansson development package must be installed
on the system before the JSON parsing functionality is included in the app.
For example, on Debian or Ubuntu systems:

.. code-block:: console

  apt-get install libjansson-dev

The command and package name may be different depending on your operating
system. Note that the app will build successfully without this
package present, but a warning is shown during compilation, and the JSON
parsing functionality will not be present in the app.


Sending a command or policy to the power manager application is achieved by
opening a fifo file, writing a JSON string to that fifo, and closing
the file. In the current implementation, every core has its own dedicated fifo,
fifo[0..n], where n is the number of the last available core.
Having a dedicated fifo file per core allows using standard filesystem permissions
to ensure a given container can only write JSON commands into the fifos it is allowed
to use.

The fifos are located at ``/tmp/powermonitor/fifo[0..n]``.

For example, all commands written to ``/tmp/powermonitor/fifo7`` will take
effect only on CPU[7].

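
Programmatically, sending a command amounts to opening the core's fifo, writing
the JSON string and closing it. The hypothetical helper below sketches this in C;
``dir`` would normally be ``/tmp/powermonitor``. On a real FIFO, ``open()`` with
``O_WRONLY`` blocks until the power manager has the FIFO open for reading.

.. code-block:: c

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  /* Hypothetical helper: write one JSON command to {dir}/fifo{core},
   * e.g. /tmp/powermonitor/fifo7 for core 7. Returns 0 if the whole
   * string was written, -1 otherwise. */
  static int send_json_to_core(const char *dir, unsigned int core,
                               const char *json)
  {
      char path[256];
      size_t len = strlen(json);
      ssize_t written;
      int fd, n;

      n = snprintf(path, sizeof(path), "%s/fifo%u", dir, core);
      if (n < 0 || (size_t)n >= sizeof(path))
          return -1;
      fd = open(path, O_WRONLY); /* blocks on a FIFO until a reader exists */
      if (fd < 0)
          return -1;
      written = write(fd, json, len);
      close(fd);
      return written == (ssize_t)len ? 0 : -1;
  }

POSIX guarantees that writes of up to ``PIPE_BUF`` bytes to a FIFO are atomic, so
keeping each JSON command below that size avoids interleaving with other writers.
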

The JSON string can be a policy or instruction, and takes the following
format:

.. code-block:: javascript

  {"packet_type": {
    "pair_1": value,
    "pair_2": value
  }}


The 'packet_type' header can contain one of two values, depending on
whether a policy or power command is being sent. The two possible values are
"policy" and "instruction", and the expected name-value pairs are different
depending on which type is being sent.

The pairs are in the format of standard JSON name-value pairs. The value type
varies between the different name-value pairs, and may be integers, strings,
arrays, etc. Examples of policies follow later in this document. The allowed
names and value types are as follows:


:Pair Name: "command"
:Description: The type of packet we're sending to the power manager. We can be
  creating or destroying a policy, or sending a direct command to adjust
  the frequency of a core, similar to the command line interface.
:Values:

  :CREATE: used when creating a new policy,
  :DESTROY: used when removing a policy,
  :POWER: used when sending an immediate command, max, min, etc.

:Example:

  .. code-block:: javascript

    "command": "CREATE"


:Pair Name: "policy_type"
:Description: Type of policy to apply. Please see the vm_power_manager documentation
  for more information on the types of policies that may be used.
:Values:

  :TIME: Time-of-day policy. Frequencies of the relevant cores are
    scaled up/down depending on busy and quiet hours.
  :TRAFFIC: This policy takes statistics from the NIC and scales up
    and down accordingly.
  :WORKLOAD: This policy looks at how heavily loaded the cores are,
    and scales up and down accordingly.
  :BRANCH_RATIO: This out-of-band policy can look at the ratio between
    branch hits and misses on a core, and is useful for detecting
    how much packet processing a core is doing.

:Required: only for CREATE/DESTROY command
:Example:

  .. code-block:: javascript

    "policy_type": "TIME"


:Pair Name: "busy_hours"
:Description: The hours of the day in which we scale up the cores for busy
  periods.
:Type: array of integers
:Values: array with list of hour numbers (0-23)
:Required: only for TIME policy
:Example:

  .. code-block:: javascript

    "busy_hours":[ 17, 18, 19, 20, 21, 22, 23 ]

:Pair Name: "quiet_hours"
:Description: The hours of the day in which we scale down the cores for quiet
  periods.
:Type: array of integers
:Values: array with list of hour numbers (0-23)
:Required: only for TIME policy
:Example:

  .. code-block:: javascript

    "quiet_hours":[ 2, 3, 4, 5, 6 ]


:Pair Name: "avg_packet_thresh"
:Description: Threshold below which the frequency will be set to min for
  the TRAFFIC policy. If the traffic rate is above this and below max, the
  frequency will be set to medium.
:Values: The number of packets below which the TRAFFIC policy applies the
  minimum frequency, or the medium frequency if between the avg and max thresholds.
:Required: only for TRAFFIC policy
:Example:

  .. code-block:: javascript

    "avg_packet_thresh": 100000

:Pair Name: "max_packet_thresh"
:Description: Threshold above which the frequency will be set to max for
  the TRAFFIC policy.
:Values: The number of packets per interval above which the TRAFFIC policy
  applies the maximum frequency.
:Required: only for TRAFFIC policy
:Example:

  .. code-block:: javascript

    "max_packet_thresh": 500000

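
Taken together, the two TRAFFIC thresholds define three frequency bands. The C
sketch below encodes that mapping as described above; the names are hypothetical,
and treating a packet count exactly equal to either threshold as the medium band
is an assumption, since the boundary behaviour is not specified here.

.. code-block:: c

  #include <stdint.h>

  enum traffic_level { TRAFFIC_FREQ_MIN, TRAFFIC_FREQ_MEDIUM, TRAFFIC_FREQ_MAX };

  /* Hypothetical helper: map a per-interval packet count to a frequency band
   * using avg_packet_thresh and max_packet_thresh. Counts equal to a
   * threshold fall in the medium band (an assumption). */
  static enum traffic_level traffic_policy_level(uint64_t pkts,
                                                 uint64_t avg_thresh,
                                                 uint64_t max_thresh)
  {
      if (pkts < avg_thresh)
          return TRAFFIC_FREQ_MIN;
      if (pkts > max_thresh)
          return TRAFFIC_FREQ_MAX;
      return TRAFFIC_FREQ_MEDIUM;
  }

With the example values of 100000 and 500000, a count of 200000 packets per
interval would select the medium frequency.
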

:Pair Name: "workload"
:Description: When our policy is of type WORKLOAD, we need to specify how
  heavy our workload is.
:Values:

  :HIGH: For cores running workloads that require high frequencies
  :MEDIUM: For cores running workloads that require medium frequencies
  :LOW: For cores running workloads that require low frequencies

:Required: only for WORKLOAD policy types
:Example:

  .. code-block:: javascript

    "workload": "MEDIUM"


:Pair Name: "mac_list"
:Description: When our policy is of type TRAFFIC, we need to specify the
  MAC addresses that the host needs to monitor.
:Values: array with a list of MAC address strings.
:Required: only for TRAFFIC policy types
:Example:

  .. code-block:: javascript

    "mac_list":[ "de:ad:be:ef:01:01", "de:ad:be:ef:01:02" ]


:Description: The type of power operation to apply in the command.
:Values:

  :SCALE_MAX: Scale frequency of this core to maximum
  :SCALE_MIN: Scale frequency of this core to minimum
  :SCALE_UP: Scale up frequency of this core
  :SCALE_DOWN: Scale down frequency of this core
  :ENABLE_TURBO: Enable Turbo Boost for this core
  :DISABLE_TURBO: Disable Turbo Boost for this core

:Required: only for POWER instruction

.. code-block:: javascript


Profile create example:

.. code-block:: javascript

  {"policy": {
    "command": "CREATE",
    "policy_type": "TIME",
    "busy_hours":[ 17, 18, 19, 20, 21, 22, 23 ],
    "quiet_hours":[ 2, 3, 4, 5, 6 ]
  }}


Profile destroy example:

.. code-block:: javascript

Power command example:

.. code-block:: javascript


To send a JSON string to the Power Manager application, paste the
example JSON string into a text file and cat it into the fifo of the target core:

.. code-block:: console

  cat file.json > /tmp/powermonitor/fifo{core_num}

The console of the Power Manager application should indicate the command that
was just received via the fifo.

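
Instead of writing the JSON by hand, a client could generate it. The sketch below
builds a TIME policy creation request using the pair names and values documented
above; the helper names are hypothetical, the output contains no whitespace (which
JSON permits), and it has not been checked against the application's parser, for
example regarding the letter case expected for values such as "CREATE".

.. code-block:: c

  #include <stdarg.h>
  #include <stddef.h>
  #include <stdio.h>

  /* Append formatted text at buf[*off]; returns -1 if it does not fit. */
  static int appendf(char *buf, size_t len, size_t *off, const char *fmt, ...)
  {
      va_list ap;
      int n;

      if (*off >= len)
          return -1;
      va_start(ap, fmt);
      n = vsnprintf(buf + *off, len - *off, fmt, ap);
      va_end(ap);
      if (n < 0 || (size_t)n >= len - *off)
          return -1;
      *off += (size_t)n;
      return 0;
  }

  /* Append "name":[h1,h2,...], rejecting hours outside 0-23. */
  static int append_hours(char *buf, size_t len, size_t *off, const char *name,
                          const int *hours, size_t n)
  {
      if (appendf(buf, len, off, "\"%s\":[", name) < 0)
          return -1;
      for (size_t i = 0; i < n; i++) {
          if (hours[i] < 0 || hours[i] > 23)
              return -1;
          if (appendf(buf, len, off, "%s%d", i ? "," : "", hours[i]) < 0)
              return -1;
      }
      return appendf(buf, len, off, "]");
  }

  /* Hypothetical helper: build a TIME policy creation request. */
  static int build_time_policy(char *buf, size_t len,
                               const int *busy, size_t nbusy,
                               const int *quiet, size_t nquiet)
  {
      size_t off = 0;

      if (appendf(buf, len, &off,
                  "{\"policy\":{\"command\":\"CREATE\",\"policy_type\":\"TIME\",") < 0 ||
          append_hours(buf, len, &off, "busy_hours", busy, nbusy) < 0 ||
          appendf(buf, len, &off, ",") < 0 ||
          append_hours(buf, len, &off, "quiet_hours", quiet, nquiet) < 0 ||
          appendf(buf, len, &off, "}}") < 0)
          return -1;
      return 0;
  }

With busy hours {17, 18} and quiet hours {2, 3}, the buffer holds
``{"policy":{"command":"CREATE","policy_type":"TIME","busy_hours":[17,18],"quiet_hours":[2,3]}}``,
which could then be written to the fifo of the target core.
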

Compiling and Running the Guest Applications
--------------------------------------------

l3fwd-power is one sample application that can be used with vm_power_manager.

A guest CLI is also provided for validating the setup.

For both l3fwd-power and the guest CLI, the channels for the VM must be monitored by the
host application using the *add_channels* command on the host. This typically uses
the following commands in the host application:

.. code-block:: console

  vm_power> add_vm vmname
  vm_power> add_channels vmname all
  vm_power> set_channel_status vmname all enabled
  vm_power> show_vm vmname


For information on compiling DPDK and the sample applications
see :doc:`compiling`.

For compiling and running l3fwd-power, see :doc:`l3_forward_power_man`.

The application is located in the ``guest_cli`` sub-directory under ``vm_power_manager``.

To build just the ``guest_vm_power_manager`` application using ``make``:

.. code-block:: console

  export RTE_SDK=/path/to/rte_sdk
  export RTE_TARGET=build
  cd ${RTE_SDK}/examples/vm_power_manager/guest_cli/
  make

The resulting binary will be ``${RTE_SDK}/build/examples/guest_cli``.


This sample application conditionally links in the Jansson JSON
library, so if you are using a multilib or cross-compile environment you
may need to set the ``PKG_CONFIG_LIBDIR`` environment variable to point to
the relevant pkgconfig folder so that the correct library is linked in.

For example, if you are building for a 32-bit target, you could find the
correct directory using the following ``find`` command:

.. code-block:: console

  # find /usr -type d -name pkgconfig
  /usr/lib/i386-linux-gnu/pkgconfig
  /usr/lib/x86_64-linux-gnu/pkgconfig

Then point ``PKG_CONFIG_LIBDIR`` at the 32-bit directory:

.. code-block:: console

  export PKG_CONFIG_LIBDIR=/usr/lib/i386-linux-gnu/pkgconfig

You then use the make command as normal, which should find the 32-bit
version of the library, if it is installed. If not, the application will
be built without the JSON interface functionality.


To build just the ``guest_vm_power_manager`` application using ``meson/ninja``:

.. code-block:: console

  export RTE_SDK=/path/to/rte_sdk
  meson configure -Dexamples=vm_power_manager/guest_cli

The resulting binary will be ``${RTE_SDK}/build/examples/guest_cli``.


The standard *EAL* command line parameters are required:

.. code-block:: console

  ./build/guest_vm_power_mgr [EAL options] -- [guest options]

The guest example uses a channel for each lcore enabled. For example,
to run on cores 0, 1, 2 and 3:

.. code-block:: console

  ./build/guest_vm_power_mgr -l 0-3

Optionally, the following command line parameters can be used should the user
wish to send a power policy down to the host application:

.. code-block:: console

  --vm-name {name of guest vm}

This parameter allows the user to change the Virtual Machine name passed down to the
host application via the power policy. The default is "ubuntu2".

.. code-block:: console

  --vcpu-list {list vm cores}

A comma-separated list of cores in the VM that the user wants the host application to
monitor. The list of cores in any VM starts at zero, and these are mapped to the
physical cores by the host application once the policy is passed down.
Valid syntax includes individual cores '2,3,4', a range of cores '2-4', or a
combination of both '1,3,5-7'.

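
The list syntax accepted for ``--vcpu-list`` ('2,3,4', '2-4', '1,3,5-7') maps
naturally onto a bitmask. The parser below is a hypothetical sketch, not the
guest application's own parser; limiting core numbers to 0-63 so that they fit
in a ``uint64_t`` is an assumption of the sketch.

.. code-block:: c

  #include <stdint.h>
  #include <stdlib.h>

  /* Hypothetical parser for lists such as "2,3,4", "2-4" or "1,3,5-7".
   * Sets bit N of *mask for every listed core N. Returns 0 on success,
   * -1 on a syntax error, a reversed range or a core number above 63. */
  static int parse_core_list(const char *s, uint64_t *mask)
  {
      uint64_t m = 0;

      if (*s == '\0')
          return -1;
      while (*s != '\0') {
          char *end;
          unsigned long lo, hi;

          if (*s < '0' || *s > '9')
              return -1;
          lo = strtoul(s, &end, 10);
          hi = lo;
          if (*end == '-') { /* range "lo-hi" */
              s = end + 1;
              if (*s < '0' || *s > '9')
                  return -1;
              hi = strtoul(s, &end, 10);
          }
          if (lo > hi || hi > 63)
              return -1;
          for (unsigned long c = lo; c <= hi; c++)
              m |= UINT64_C(1) << c;
          if (*end == ',' && end[1] != '\0')
              end++; /* move on to the next element */
          else if (*end != '\0')
              return -1; /* trailing comma or unexpected character */
          s = end;
      }
      *mask = m;
      return 0;
  }

For example, '1,3,5-7' yields the mask 0xEA (bits 1, 3, 5, 6 and 7 set).
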

.. code-block:: console

  --busy-hours {list of busy hours}

A comma-separated list of hours within which to set the core frequency to maximum.
Valid syntax includes individual hours '2,3,4', a range of hours '2-4', or a
combination of both '1,3,5-7'. Valid hours are 0 to 23.

.. code-block:: console

  --quiet-hours {list of quiet hours}

A comma-separated list of hours within which to set the core frequency to minimum.
Valid syntax includes individual hours '2,3,4', a range of hours '2-4', or a
combination of both '1,3,5-7'. Valid hours are 0 to 23.

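
With the busy and quiet hours held as 24-bit masks (bit *h* set for hour *h*),
the per-hour decision of a TIME policy reduces to two bit tests. The sketch below
is hypothetical: giving busy hours precedence when an hour appears in both lists
is an assumption, and the current hour is expected to be obtained elsewhere
(for example from ``localtime_r()``).

.. code-block:: c

  #include <stdint.h>

  enum time_action { TIME_NO_CHANGE, TIME_SCALE_MAX, TIME_SCALE_MIN };

  /* Hypothetical helper: decide the action for a given hour (0-23).
   * Busy hours scale to maximum, quiet hours to minimum; an hour in both
   * masks is treated as busy (an assumption). */
  static enum time_action time_policy_action(uint32_t busy_mask,
                                             uint32_t quiet_mask, int hour)
  {
      if (hour < 0 || hour > 23)
          return TIME_NO_CHANGE;
      if (busy_mask & (UINT32_C(1) << hour))
          return TIME_SCALE_MAX;
      if (quiet_mask & (UINT32_C(1) << hour))
          return TIME_SCALE_MIN;
      return TIME_NO_CHANGE;
  }
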

.. code-block:: console

  --policy {policy type}

The type of policy. This can be one of the following values:

- TRAFFIC - based on incoming traffic rates on the NIC.
- TIME - busy/quiet hours policy.
- BRANCH_RATIO - uses branch ratio counters to determine core busyness.

Not all parameters are needed for all policy types. For example, BRANCH_RATIO
only needs the vcpu-list parameter, not any of the hours.


After successful initialization the user is presented with the VM Power Manager Guest CLI:

.. code-block:: console

To change the frequency of an lcore, use the set_cpu_freq command,
where {core_num} is the lcore and channel to change frequency by scaling up/down/min/max:

.. code-block:: console

  set_cpu_freq {core_num} up|down|min|max

To query the available frequencies of an lcore, use the query_cpu_freq command,
where {core_num} is the lcore to query.
Before using this command, please enable responses via the set_query command on the host.

.. code-block:: console

  query_cpu_freq {core_num}|all

To query the capabilities of an lcore, use the query_cpu_caps command,
where {core_num} is the lcore to query.
Before using this command, please enable responses via the set_query command on the host.

.. code-block:: console

  query_cpu_caps {core_num}|all

To start the application, configure the power policy and send it to the host:

.. code-block:: console

  ./build/guest_vm_power_mgr -l 0-3 -n 4 -- --vm-name=ubuntu --policy=BRANCH_RATIO --vcpu-list=2-4

Once the VM Power Manager Guest CLI appears, issuing the 'send_policy now' command
will send the policy to the host:

.. code-block:: console

  send_policy now


Once the policy is sent to the host, the host application takes over the power monitoring
of the specified cores in the policy.