For more on Linux system power management, see: https://blog.csdn.net/u010936265/article/details/146436725?spm=1011.2415.3001.5331
1 Introduction
The CPUFreq subsystem lives in the drivers/cpufreq directory. It dynamically adjusts CPU frequency and voltage at run time, i.e. DVFS (Dynamic Voltage and Frequency Scaling).
        《Linux設備驅動開發詳解:基于最新的Linux4.0內核》, 19.2 CPUFreq Drivers
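Power management for a CPU in its working state is called the CPUFreq subsystem in the Linux kernel (some literature also calls it DVFS). It mainly targets scenarios where CPU utilization (per CPU core) varies dynamically between 5% and 100%. The basic methods are dynamic frequency scaling and dynamic voltage scaling.
        《用“芯”探核:基于龍芯的Linux內核探索解析》, 8.2 Runtime Power Management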
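The SoC CPUFreq driver only defines the CPU's frequency parameters and provides a way to set the frequency. It does not decide which frequency the CPU should actually run at. Which criteria drive the frequency, and how it changes, are determined entirely by the CPUFreq policy (governor).
The system state and the CPUFreq policy together determine the target of a frequency transition. The CPUFreq core then passes the target frequency to the SoC-specific CPUFreq driver, which programs the hardware to complete the change.
        《Linux設備驅動開發詳解:基于最新的Linux4.0內核》, 19.2.2 CPUFreq Policies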
2 cpufreq_driver
2.1 Introduction
Each SoC-specific CPUFreq driver only needs to provide the voltage/frequency table and carry out these changes at the hardware level.
        《Linux設備驅動開發詳解:基于最新的Linux4.0內核》, 19.2.1 Implementing an SoC CPUFreq Driver
2.2 Data structures
//include/linux/cpufreq.h
struct cpufreq_driver {
    char name[CPUFREQ_NAME_LEN];
    ......
    int (*target)(struct cpufreq_policy *policy,
                  unsigned int target_freq,
                  unsigned int relation);    /* Deprecated */
    int (*target_index)(struct cpufreq_policy *policy,
                        unsigned int index);
    ......
};
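target() and target_index()
        These are the interfaces that perform the actual frequency change. A driver can implement the change itself or call the clock (CLK) interface.
        This is the most important callback and is invoked whenever the frequency is switched. It sets the clock frequency of the current CPU core to the target frequency supplied by the CPUFreq policy.
        《SoC底層軟件低功耗系統設計與實現》, 13.1.4 Main Data Structures; 3. driver-related data structures
        《用“芯”探核:基于龍芯的Linux內核探索解析》, 8.2.1 Dynamic Frequency Scaling; (1) The mechanism part of CPUFreq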
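To make the role of target_index() concrete, here is a minimal user-space sketch, not kernel code. The types toy_freq_entry and toy_policy, the table values, and toy_target_index() are all hypothetical names made up for illustration. A real driver would program a clock or send a firmware message at the point where the sketch simply records the new frequency.

```c
#include <stddef.h>

/* Hypothetical stand-in for one frequency table row (value in kHz). */
struct toy_freq_entry {
    unsigned int frequency;
};

/* Hypothetical stand-in for the per-policy state a driver works on. */
struct toy_policy {
    const struct toy_freq_entry *freq_table;
    size_t table_len;
    unsigned int cur; /* current frequency in kHz */
};

/* Illustrative table; real values come from the platform. */
const struct toy_freq_entry toy_table[] = {
    { 575000 }, { 1150000 }, { 2300000 },
};

/*
 * Mimics the shape of cpufreq_driver.target_index(): the core passes an
 * index into the frequency table and the driver applies that entry.
 * Returns 0 on success, -1 if the index is out of range.
 */
int toy_target_index(struct toy_policy *policy, unsigned int index)
{
    if (index >= policy->table_len)
        return -1;
    /* A real driver would touch hardware here (clock API, SCPI, ...). */
    policy->cur = policy->freq_table[index].frequency;
    return 0;
}
```

With a policy initialised as { toy_table, 3, 0 }, calling toy_target_index(&policy, 2) leaves policy.cur at 2300000, while an index of 3 is rejected and the current frequency is left unchanged.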
register and unregister interfaces
int cpufreq_register_driver(struct cpufreq_driver *driver_data);
int cpufreq_unregister_driver(struct cpufreq_driver *driver);
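2.3 Case study: the cpufreq_driver for Phytium (ARM) platform CPUs
2.3.1 Overview of the relevant Phytium CPU features
Take the FT-2000/4 CPU of the Phytium platform as an example.
The FT-2000/4 supports several processor power management techniques, which are exposed to system power management software through the ARM-defined SCPI (System Control and Power Interface) [2] and PSCI (Power State Coordination Interface) [3] interfaces.
Core running frequency can be adjusted dynamically. Through the SCPI interface, software can query the set of frequency points the CPU supports and switch between them at run time.
        《FT-2000/4軟件編程手冊》(V1.4), 7.1 CPU Power Management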
2.3.2 Introduction to ARM SCP and SCPI
A System Control Processor (SCP) is a processor-based capability that provides a flexible and
extensible platform for provision of power management functions and services.
        《ARM Compute Subsystem SCP Message Interface Protocols》
        1.1 The System Control Processor
System Control and Power Interface (SCPI)
The SCPI is one of the primary interfaces to the SCP in an ARM CSS-based platform. It is used
to access many of the services that are exposed to the AP. The SCP is expected to be idle and
waiting for SCPI commands for most of the time after the system boot process completes.
        《ARM Compute Subsystem SCP Message Interface Protocols》
        Chapter 3 CSS System Control and Power Interface (SCPI)
2.3.3 Data structures
//drivers/cpufreq/scpi-cpufreq.c
static struct cpufreq_driver scpi_cpufreq_driver = {
    .name = "scpi-cpufreq",
    .flags = CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
             CPUFREQ_NEED_INITIAL_FREQ_CHECK |
             CPUFREQ_IS_COOLING_DEV,
    .verify = cpufreq_generic_frequency_table_verify,
    .attr = cpufreq_generic_attr,
    .get = scpi_cpufreq_get_rate,
    .init = scpi_cpufreq_init,
    .exit = scpi_cpufreq_exit,
    .target_index = scpi_cpufreq_set_target,
};
2.3.4 Rough code flow of scpi_cpufreq_init()
scpi_cpufreq_init();
    -> scpi_ops->add_opps_to_device(cpu_dev);
        -> scpi_dvfs_add_opps_to_device();
            -> scpi_dvfs_info();
                -> scpi_dvfs_get_info();
                    -> scpi_send_message(CMD_GET_DVFS_INFO, ...);
                    -> info->count = buf.opp_count;
                    -> opp->freq = le32_to_cpu(buf.opps[i].freq);
            -> dev_pm_opp_add();
    -> dev_pm_opp_init_cpufreq_table(); //create a cpufreq table for a device
scpi_cpufreq_init() uses the SCPI interface to obtain the CPU's frequency and voltage information, and then builds a struct cpufreq_frequency_table from it.
For details, see the Get DVFS Info command among the SCPI commands (《ARM Compute Subsystem SCP Message Interface Protocols》, 3.2.9 Get DVFS Info).
2.3.5 Flow for setting the frequency
scpi_cpufreq_set_target();
    -> clk_set_rate(priv->clk, rate);
        -> clk_core_set_rate_nolock();
            -> clk_change_rate();
                -> core->ops->set_rate();
                    -> scpi_clk_set_rate();
                        -> clk->scpi_ops->clk_set_val();
                            -> scpi_clk_set_val();
                                -> scpi_send_message(CMD_SET_CLOCK_VALUE, ...);
See 《ARM Compute Subsystem SCP Message Interface Protocols》, 3.2.15 Set Clock Value.
2.4 Checking which cpufreq_driver the system is using
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
or
cat /sys/devices/system/cpu/cpufreq/policy0/scaling_driver
3 CPUFreq governors
3.1 Introduction
The main principle of a CPUFreq policy (governor) is to select the most suitable frequency/voltage for the current system load.
        《用“芯”探核:基于龍芯的Linux內核探索解析》, 8.2 Runtime Power Management
3.2 Data structures
3.2.1 struct cpufreq_governor
//include/linux/cpufreq.h
struct cpufreq_governor {
    char name[CPUFREQ_NAME_LEN];
    int (*init)(struct cpufreq_policy *policy);
    void (*exit)(struct cpufreq_policy *policy);
    int (*start)(struct cpufreq_policy *policy);
    void (*stop)(struct cpufreq_policy *policy);
    void (*limits)(struct cpufreq_policy *policy);
    ssize_t (*show_setspeed) (struct cpufreq_policy *policy,
                              char *buf);
    int (*store_setspeed) (struct cpufreq_policy *policy,
                           unsigned int freq);
    /* For governors which change frequency dynamically by themselves */
    bool dynamic_switching;
    struct list_head governor_list;
    struct module *owner;
};
register and unregister functions:
int cpufreq_register_governor(struct cpufreq_governor *governor);
void cpufreq_unregister_governor(struct cpufreq_governor *governor);
3.2.2 struct dbs_governor
//drivers/cpufreq/cpufreq_governor.h
/* Common Governor data across policies */
struct dbs_governor {
    struct cpufreq_governor gov;
    struct kobj_type kobj_type;
    /*
     * Common data for platforms that don't set
     * CPUFREQ_HAVE_GOVERNOR_PER_POLICY
     */
    struct dbs_data *gdbs_data;
    unsigned int (*gov_dbs_update)(struct cpufreq_policy *policy);
    struct policy_dbs_info *(*alloc)(void);
    void (*free)(struct policy_dbs_info *policy_dbs);
    int (*init)(struct dbs_data *dbs_data);
    void (*exit)(struct dbs_data *dbs_data);
    void (*start)(struct cpufreq_policy *policy);
};
3.2.3 Linked list: cpufreq_governor_list
Holds all registered governor nodes.
//drivers/cpufreq/cpufreq.c
static LIST_HEAD(cpufreq_governor_list);

cpufreq_register_governor();
    -> list_add(&governor->governor_list, &cpufreq_governor_list);
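The registration step can be pictured with a small user-space sketch. It uses a hand-rolled singly linked list instead of the kernel's struct list_head, and every toy_* name is hypothetical. Rejecting a name that is already on the list is an assumption of the sketch, included because a governor is later selected by name.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NAME_LEN 16

/* Hypothetical, simplified governor node: just a name and a list link. */
struct toy_governor {
    char name[TOY_NAME_LEN];
    struct toy_governor *next;
};

/* Head of the list of registered governors (empty at start). */
static struct toy_governor *toy_governor_list;

/* Returns the registered governor with this name, or NULL if none. */
struct toy_governor *toy_find_governor(const char *name)
{
    struct toy_governor *g;

    for (g = toy_governor_list; g != NULL; g = g->next)
        if (strcmp(g->name, name) == 0)
            return g;
    return NULL;
}

/* Adds a governor at the head of the list; -1 if the name is taken. */
int toy_register_governor(struct toy_governor *gov)
{
    if (toy_find_governor(gov->name) != NULL)
        return -1;
    gov->next = toy_governor_list;
    toy_governor_list = gov;
    return 0;
}
```

Registering "performance" and "powersave" makes both findable by name, which is the lookup a write to scaling_governor ultimately relies on.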
3.3 Available governors
3.3.1 performance
This governor causes the highest frequency, within the scaling_max_freq policy limit, to be requested for that policy.
//drivers/cpufreq/cpufreq_performance.c
static struct cpufreq_governor cpufreq_gov_performance = {
    .name = "performance",
    .owner = THIS_MODULE,
    .limits = cpufreq_gov_performance_limits,
};

cpufreq_gov_performance_init();
    -> cpufreq_register_governor(&cpufreq_gov_performance);
3.3.2 powersave
This governor causes the lowest frequency, within the scaling_min_freq policy limit, to be requested for that policy.
//drivers/cpufreq/cpufreq_powersave.c
static struct cpufreq_governor cpufreq_gov_powersave = {
    .name = "powersave",
    .limits = cpufreq_gov_powersave_limits,
    .owner = THIS_MODULE,
};

cpufreq_gov_powersave_init();
    -> cpufreq_register_governor(&cpufreq_gov_powersave);
3.3.3 userspace
This governor does not do anything by itself. Instead, it allows user space to set the CPU frequency for the policy it is attached to by writing to the scaling_setspeed attribute of that policy.
//drivers/cpufreq/cpufreq_userspace.c
static struct cpufreq_governor cpufreq_gov_userspace = {
    .name = "userspace",
    .init = cpufreq_userspace_policy_init,
    .exit = cpufreq_userspace_policy_exit,
    .start = cpufreq_userspace_policy_start,
    .stop = cpufreq_userspace_policy_stop,
    .limits = cpufreq_userspace_policy_limits,
    .store_setspeed = cpufreq_set,
    .show_setspeed = show_speed,
    .owner = THIS_MODULE,
};

cpufreq_gov_userspace_init();
    -> cpufreq_register_governor(&cpufreq_gov_userspace);
3.3.4 schedutil
This governor uses CPU utilization data available from the CPU scheduler. It generally is regarded as a part of the CPU scheduler, so it can access the scheduler's internal data structures directly.
//kernel/sched/cpufreq_schedutil.c
struct cpufreq_governor schedutil_gov = {
    .name = "schedutil",
    .owner = THIS_MODULE,
    .dynamic_switching = true,
    .init = sugov_init,
    .exit = sugov_exit,
    .start = sugov_start,
    .stop = sugov_stop,
    .limits = sugov_limits,
};

sugov_register();
    -> cpufreq_register_governor(&schedutil_gov);
When the system load changes, the CPU frequency is adjusted accordingly. The flow is roughly as follows:
cpufreq_update_util();
    -> data->func();
        -> sugov_update_single();
            -> sugov_deferred_update();
                -> irq_work_queue(&sg_policy->irq_work);
                    -> sugov_irq_work();
                        -> sugov_work();
                            -> __cpufreq_driver_target();
                                -> cpufreq_driver->target();
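How the load turns into a frequency request can be sketched from the formula given in Documentation/admin-guide/pm/cpufreq.rst: f = 1.25 * f_0 * util / max, where util is the scheduler's utilization figure, max is its theoretical maximum, and f_0 is a reference frequency. The function below is a user-space illustration of that arithmetic only. The name toy_schedutil_next_freq() and the handling of max == 0 are assumptions of the sketch, not kernel behaviour.

```c
/*
 * Sketch of the schedutil target formula f = 1.25 * f0 * util / max,
 * done in integer arithmetic. All frequencies are in kHz.
 */
unsigned int toy_schedutil_next_freq(unsigned int f0_khz,
                                     unsigned long util,
                                     unsigned long max)
{
    /* f0 + f0/4 is 1.25 * f0 without floating point. */
    unsigned long long f = (unsigned long long)f0_khz + (f0_khz >> 2);

    if (max == 0)
        return f0_khz; /* assumption: no capacity info, keep f0 */
    return (unsigned int)(f * util / max);
}
```

For f_0 = 2000000 kHz and a utilization of half the maximum (512 out of 1024), the request is 1250000 kHz. At full utilization the raw result exceeds f_0, so it still has to be mapped onto a supported frequency within the policy limits.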
3.3.5 ondemand
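Ondemand policy: a CPU load threshold T is set. When the load is below T, the governor moves to the lowest frequency/voltage that just satisfies the current load. When the load is above T, it immediately jumps to the highest performance state.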
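The decision rule described above can be sketched in a few lines of user-space C. This is a toy model, not the kernel's od_dbs_update(): the function name, the ascending table, and the estimate "needed frequency = current frequency * load / 100" are all assumptions chosen to keep the example small.

```c
#include <stddef.h>

/*
 * Toy ondemand decision. table[] holds supported frequencies in kHz,
 * sorted in ascending order (an assumption of this sketch).
 * load_pct is the load measured at cur_khz, in percent.
 */
unsigned int toy_ondemand_pick(const unsigned int *table, size_t n,
                               unsigned int cur_khz,
                               unsigned int load_pct,
                               unsigned int threshold_pct)
{
    unsigned long long needed;
    size_t i;

    if (n == 0)
        return cur_khz;

    /* Above the threshold: go straight to the highest frequency. */
    if (load_pct > threshold_pct)
        return table[n - 1];

    /* Otherwise: lowest table entry that still covers the load. */
    needed = (unsigned long long)cur_khz * load_pct / 100;
    for (i = 0; i < n; i++)
        if (table[i] >= needed)
            return table[i];
    return table[n - 1];
}
```

With a table of 575000, 1150000 and 2300000 kHz and T = 80, a 30% load at 2300000 kHz needs about 690000 kHz and so selects 1150000 kHz, while a 90% load selects 2300000 kHz directly.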
//drivers/cpufreq/cpufreq_ondemand.c
static struct dbs_governor od_dbs_gov = {
    .gov = CPUFREQ_DBS_GOVERNOR_INITIALIZER("ondemand"),
    .kobj_type = { .default_attrs = od_attributes },
    .gov_dbs_update = od_dbs_update,
    .alloc = od_alloc,
    .free = od_free,
    .init = od_init,
    .exit = od_exit,
    .start = od_start,
};

cpufreq_gov_dbs_init();
    -> cpufreq_register_governor(CPU_FREQ_GOV_ONDEMAND);
3.3.6 conservative
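Conservative policy: similar to ondemand. A CPU load threshold T is set, and when the load is below T the governor moves to the lowest frequency/voltage that just satisfies the current load. When the load is above T, however, it does not jump straight to the highest performance state. It raises the frequency/voltage step by step instead.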
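The difference from ondemand lies in the above-threshold branch, which the following user-space sketch models on its own. It works on table indices rather than frequencies. The function name and the one-index step size are assumptions for illustration, and the below-threshold branch is deliberately left out.

```c
#include <stddef.h>

/*
 * Toy conservative step: given the current index into an ascending
 * frequency table of n entries, return the index to use next.
 * Only the "load above threshold" branch is modelled here.
 */
size_t toy_conservative_step(size_t cur_idx, size_t n,
                             unsigned int load_pct,
                             unsigned int threshold_pct)
{
    if (n == 0)
        return 0;
    if (cur_idx >= n)
        cur_idx = n - 1; /* clamp a bad index to the top entry */

    /* Above the threshold: climb one level, not straight to the top. */
    if (load_pct > threshold_pct && cur_idx + 1 < n)
        return cur_idx + 1;
    return cur_idx;
}
```

Under sustained 90% load with T = 80, a three-entry table is climbed as index 0, then 1, then 2 over successive samples, where the ondemand rule would have gone to index 2 at once.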
//drivers/cpufreq/cpufreq_conservative.c
static struct dbs_governor cs_governor = {
    .gov = CPUFREQ_DBS_GOVERNOR_INITIALIZER("conservative"),
    .kobj_type = { .default_attrs = cs_attributes },
    .gov_dbs_update = cs_dbs_update,
    .alloc = cs_alloc,
    .free = cs_free,
    .init = cs_init,
    .exit = cs_exit,
    .start = cs_start,
};

cpufreq_gov_dbs_init();
    -> cpufreq_register_governor(CPU_FREQ_GOV_CONSERVATIVE);
References
        Documentation/admin-guide/pm/cpufreq.rst
        《Linux設備驅動開發詳解:基于最新的Linux4.0內核》, 19.2.2 CPUFreq Policies
        《SoC底層軟件低功耗系統設計與實現》, 13.1.5 Implementation of the Main Functions; 4. ondemand governor
        《用“芯”探核:基于龍芯的Linux內核探索解析》, 8.2 Runtime Power Management
3.4 Configuring the current governor
List the governors currently supported:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
performance powersave
or
# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_governors
performance powersave
Set the current governor:
# echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
or
# echo powersave > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
4 Other data structures
4.1 struct cpufreq_frequency_table
The table of frequencies supported by the current CPU.
//include/linux/cpufreq.h
struct cpufreq_frequency_table {
    unsigned int flags;
    unsigned int driver_data; /* driver specific data, not used by core */
    unsigned int frequency;   /* kHz - doesn't need to be in ascending
                               * order */
};
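Because the table does not have to be sorted, finding an entry for a requested frequency means scanning all of it. The user-space sketch below picks the lowest frequency at or above the target. The struct toy_table_entry, the terminator TOY_TABLE_END and the function name are hypothetical stand-ins. The sketch only assumes that the table ends with a sentinel entry.

```c
/* Hypothetical end-of-table marker used by this sketch. */
#define TOY_TABLE_END (~1u)

/* Same three fields as the kernel struct, under a toy name. */
struct toy_table_entry {
    unsigned int flags;
    unsigned int driver_data;
    unsigned int frequency; /* kHz, not necessarily ascending */
};

/*
 * Returns the index of the lowest frequency that is >= target_khz,
 * or -1 if no entry is high enough. Scans the whole table because
 * the entries may appear in any order.
 */
int toy_table_lookup_l(const struct toy_table_entry *table,
                       unsigned int target_khz)
{
    int best = -1;
    int i;

    for (i = 0; table[i].frequency != TOY_TABLE_END; i++) {
        unsigned int f = table[i].frequency;

        if (f < target_khz)
            continue;
        if (best < 0 || f < table[best].frequency)
            best = i;
    }
    return best;
}
```

The returned index is the kind of value a target_index() callback receives, which is why drivers that provide a table do not need to search it themselves.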
4.2 struct cpufreq_policy
Every CPU core is governed by a control policy (cpufreq_policy). Cores that share a clock share one policy.
//include/linux/cpufreq.h
struct cpufreq_policy {
    /* CPUs sharing clock, require sw coordination */
    cpumask_var_t cpus;         /* Online CPUs only */
    cpumask_var_t related_cpus; /* Online + Offline CPUs */
    ......
    unsigned int min; /* in kHz */
    unsigned int max; /* in kHz */
    unsigned int cur; /* in kHz, only needed if cpufreq */
    ......
    struct cpufreq_governor *governor;
    ......
    struct cpufreq_frequency_table *freq_table; //frequency table supported by the current CPU
    ......
};
Notes on the structure members
<1> cpus and related_cpus
cpus and related_cpus describe the CPUs managed by this policy: cpus holds the CPUs that are currently online, while related_cpus holds all of them, online and offline.
View the values of cpus and related_cpus:
cat /sys/devices/system/cpu/cpufreq/policy0/affected_cpus
cat /sys/devices/system/cpu/cpufreq/policy0/related_cpus
<2> min/max/cur
min, max and cur are the minimum, maximum and current frequency of this policy.
View or set min/max:
/sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
/sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
View cur:
cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq
        《SoC底層軟件低功耗系統設計與實現》
        13.1.4 Main Data Structures; 1. The cpufreq_policy structure
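One practical consequence of min and max is that any requested frequency is confined to that range before it reaches the driver. A minimal user-space sketch of the clamping step follows. The function name is hypothetical and the values in the usage note are illustrative.

```c
/*
 * Confine a requested frequency (kHz) to the policy's [min, max] range.
 * Assumes min_khz <= max_khz.
 */
unsigned int toy_clamp_freq(unsigned int target_khz,
                            unsigned int min_khz,
                            unsigned int max_khz)
{
    if (target_khz < min_khz)
        return min_khz;
    if (target_khz > max_khz)
        return max_khz;
    return target_khz;
}
```

With scaling_min_freq at 575000 and scaling_max_freq at 1150000, a request for 2300000 kHz is reduced to 1150000 kHz, which is how writing a lower scaling_max_freq caps the CPU regardless of the governor in use.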
Initialization function: cpufreq_init_policy();
5 notifier
5.1 Introduction
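Two notifications are sent during a frequency change:
        CPUFREQ_PRECHANGE: the frequency change is about to start
        CPUFREQ_POSTCHANGE: the frequency change has completed
Data structures: these two notifications are sent on cpufreq_transition_notifier_list, an SRCU notifier chain. BLOCKING_NOTIFIER_HEAD(cpufreq_policy_notifier_list) is the separate chain used for policy notifications.
Code that sends the notifications:
srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
                         CPUFREQ_PRECHANGE, freqs);
srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
                         CPUFREQ_POSTCHANGE, freqs);
        《Linux設備驅動開發詳解:基于最新的Linux4.0內核》, 19.2.4 CPUFreq Notifications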
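The pre/post pattern is easy to see in a toy user-space model. Everything below is hypothetical (the toy_* names, the fixed-size callback array, the recording listener) and stands in for the kernel's notifier chain only to show the ordering: listeners are called once before the frequency is changed and once after.

```c
#include <stddef.h>

enum toy_event { TOY_PRECHANGE, TOY_POSTCHANGE };

/* Old and new frequency in kHz, passed to every listener. */
struct toy_freqs {
    unsigned int old_khz;
    unsigned int new_khz;
};

typedef void (*toy_notifier_fn)(enum toy_event ev,
                                const struct toy_freqs *f);

#define TOY_MAX_NOTIFIERS 8
static toy_notifier_fn toy_chain[TOY_MAX_NOTIFIERS];
static size_t toy_chain_len;

/* Adds a listener; returns 0 on success, -1 if the chain is full. */
int toy_register_notifier(toy_notifier_fn fn)
{
    if (toy_chain_len >= TOY_MAX_NOTIFIERS)
        return -1;
    toy_chain[toy_chain_len++] = fn;
    return 0;
}

/* Calls every registered listener with the given event. */
static void toy_call_chain(enum toy_event ev, const struct toy_freqs *f)
{
    size_t i;

    for (i = 0; i < toy_chain_len; i++)
        toy_chain[i](ev, f);
}

/* Changes *cur_khz, notifying listeners before and after. */
void toy_transition(unsigned int *cur_khz, unsigned int new_khz)
{
    struct toy_freqs f = { *cur_khz, new_khz };

    toy_call_chain(TOY_PRECHANGE, &f);
    *cur_khz = new_khz; /* stands in for the hardware change */
    toy_call_chain(TOY_POSTCHANGE, &f);
}

/* Example listener: records the order in which events arrive. */
enum toy_event toy_seen[4];
size_t toy_seen_len;

void toy_record_listener(enum toy_event ev, const struct toy_freqs *f)
{
    (void)f;
    if (toy_seen_len < 4)
        toy_seen[toy_seen_len++] = ev;
}
```

After registering toy_record_listener and performing one transition, toy_seen holds TOY_PRECHANGE followed by TOY_POSTCHANGE. A listener whose timing depends on the CPU clock would typically use the two events to prepare for and then adapt to the new frequency.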
6 Debugging
6.1 cpufreq-stats
cpufreq-stats is a driver that provides CPU frequency statistics for each CPU.
/sys/devices/system/cpu/cpu0/cpufreq/stats # ls -l
total 0
drwxr-xr-x 2 root root 0 May 14 16:06 .
drwxr-xr-x 3 root root 0 May 14 15:58 ..
--w------- 1 root root 4096 May 14 16:06 reset
-r--r--r-- 1 root root 4096 May 14 16:06 time_in_state
-r--r--r-- 1 root root 4096 May 14 16:06 total_trans
-r--r--r-- 1 root root 4096 May 14 16:06 trans_table
        Documentation/cpu-freq/cpufreq-stats.txt
Note:
        When the cpufreq_driver in use is intel_pstate, the stats/ directory does not exist.
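As a sketch of how these statistics can be used, the function below computes the share of time spent at one frequency from time_in_state-style text. It assumes the format described in the cpufreq-stats documentation as I understand it: one "<frequency> <time>" pair per line, frequency in kHz and time in 10 ms units. The function name and the sample numbers in the usage note are made up, and the input is a string rather than the sysfs file so the example runs anywhere.

```c
#include <stdio.h>

/*
 * Parses "<freq_khz> <ticks>" pairs from text and returns the
 * percentage of total time spent at freq_khz, or -1.0 if no pair
 * could be parsed.
 */
double toy_time_share(const char *text, unsigned int freq_khz)
{
    unsigned int freq;
    unsigned long long ticks, total = 0, match = 0;
    int consumed;

    /* %n stores how many characters were read so we can advance. */
    while (sscanf(text, "%u %llu%n", &freq, &ticks, &consumed) == 2) {
        total += ticks;
        if (freq == freq_khz)
            match += ticks;
        text += consumed;
    }
    if (total == 0)
        return -1.0;
    return 100.0 * (double)match / (double)total;
}
```

For the sample text "575000 300\n1150000 100\n2300000 600\n", the share for 2300000 kHz comes out at 60 percent. On a real system the same text would come from reading stats/time_in_state.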
6.2 /sys/kernel/debug/tracing/events/power/
cpu_frequency_limits
cpu_frequency
6.3 cpufreq-bench
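Tool source: <kernel_src>/tools/power/cpupower/bench/
cpufreq-bench works by simulating an "idle → busy → idle → busy" pattern at run time in order to trigger dynamic frequency changes. For governors such as ondemand, conservative and interactive, it then computes the ratio between the time needed to complete a task and the time needed for the same computation under the performance governor.
The usual goal is that, with CPUFreq dynamically adjusting frequency and voltage, performance should reach about 90% of what the performance governor delivers. That is considered a good result.
《Linux設備驅動開發詳解:基于最新的Linux4.0內核》, 19.2.3 CPUFreq Performance Testing and Tuning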
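The 90% figure is a ratio of run times, which the following sketch spells out. It is not taken from cpufreq-bench itself. The function name and the sample timings are hypothetical, and "relative performance" is taken here to mean the time under the performance governor divided by the time under the governor being tested.

```c
/*
 * Relative performance, in percent, of a governor compared with the
 * performance governor, from the time each needs for the same work.
 * Returns 0.0 if the tested governor's time is not positive.
 */
double toy_relative_performance(double perf_time_s, double gov_time_s)
{
    if (gov_time_s <= 0.0)
        return 0.0;
    return 100.0 * perf_time_s / gov_time_s;
}
```

If a task takes 9 s under performance and 10 s under ondemand, the result is 90 percent, which would just meet the goal stated above.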
6.4 cpupower frequency-info|frequency-set
cpupower frequency-info
A small tool which prints out cpufreq information helpful to developers and interested users.
cpupower frequency-set
cpupower frequency-set allows you to modify cpufreq settings without having to type e.g. "/sys/devices/system/cpu/cpu0/cpufreq/scaling_set_speed" all the time.
6.5 cpufreq-info and cpufreq-set
cpufreq-info
A small tool which prints out cpufreq information helpful to developers and interested users.
cpufreq-set
cpufreq-set allows you to modify cpufreq settings without having to type e.g. "/sys/devices/system/cpu/cpu0/cpufreq/scaling_set_speed" all the time.