
I am using multiprocessing.Pool.imap to run many independent jobs in parallel using Python 2.7 on Windows 7. With the default settings, my total CPU usage is pegged at 100%, as measured by Windows Task Manager. This makes it impossible to do any other work while my code runs in the background.

I've tried limiting the number of processes to the number of CPUs minus 1, as described in How to limit the number of processors that Python uses:

pool = Pool(processes=max(multiprocessing.cpu_count() - 1, 1))
for p in pool.imap(func, iterable):
    ...

This does reduce the total number of running processes. However, each process just takes up more cycles to make up for it. So my total CPU usage is still pegged at 100%.

Is there a way to directly limit the total CPU usage - NOT just the number of processes - or failing that, is there any workaround?

 Answers

24

The solution depends on what you want to do. Here are a few options:

Lower priorities of processes

You can nice the subprocesses. They will still eat 100% of the CPU, but when you start other applications, the OS gives preference to those applications. If you want to leave a work-intensive computation running in the background on your laptop and don't care about the CPU fan running all the time, then setting the nice value with psutil is your solution. The script below is a test script that runs on all cores long enough for you to see how it behaves.

from multiprocessing import Pool, cpu_count
import math
import psutil
import os

def f(i):
    return math.sqrt(i)

def limit_cpu():
    """Called at the start of every worker process."""
    p = psutil.Process(os.getpid())
    # set to lowest priority; this is Windows only, on Unix use p.nice(19)
    p.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)

if __name__ == '__main__':
    # start "number of cores" processes
    pool = Pool(None, limit_cpu)
    for p in pool.imap(f, range(10**8)):
        pass

The trick is that limit_cpu is run at the beginning of every worker process (see the initializer argument in the docs). Whereas Unix has nice levels from -20 (highest priority) to 19 (lowest priority), Windows has a few distinct priority classes. BELOW_NORMAL_PRIORITY_CLASS probably fits your requirements best; there is also IDLE_PRIORITY_CLASS, which tells Windows to run your process only when the system is idle.

You can view the priority if you switch to the details view in Task Manager and right-click on the process.


Lower number of processes

Although you have rejected this option, it might still be a good one: say you limit the number of subprocesses to half the CPU cores with pool = Pool(max(cpu_count()//2, 1)). The OS initially runs those processes on half the cores, while the others stay idle or just run the other applications currently running. After a short time, the OS reschedules the processes and may move them to other cores, and so on. Both Windows and Unix-based systems behave this way.
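For reference, here is a minimal sketch of that variant as a complete script (it reuses the f from the example above; the range is just an arbitrary workload):

from multiprocessing import Pool, cpu_count
import math

def f(i):
    return math.sqrt(i)

if __name__ == '__main__':
    # use half of the cores, but always at least one worker process
    pool = Pool(processes=max(cpu_count() // 2, 1))
    for _ in pool.imap(f, range(10**6)):
        pass
    pool.close()
    pool.join()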

Windows: running 2 processes on 4 cores (screenshot)

OSX: running 4 processes on 8 cores (screenshot)

You can see that both OSes balance the processes between the cores, although not evenly, so you still see a few cores with higher percentages than others.

Sleep

If you absolutely want to make sure that your processes never eat 100% of a certain core (e.g. if you want to prevent the CPU fan from spinning up), then you can sleep in your processing function:

import math
from time import sleep

def f(i):
    sleep(0.01)
    return math.sqrt(i)

This makes the OS "schedule out" your process for 0.01 seconds per computation and makes room for other applications. If there are no other applications, the CPU core is idle, so it will never reach 100%. You'll need to play around with different sleep durations; it will also vary from computer to computer. If you want to make it very sophisticated, you could adapt the sleep depending on what cpu_times() reports.
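For example, here is a rough sketch of such an adaptive sleep based on cpu_times(), assuming psutil is available; the 50% target per worker and the bookkeeping are arbitrary choices you would tune yourself:

import math
import os
import time
import psutil

TARGET = 0.5  # arbitrary: let each worker use roughly 50% of one core

_proc = psutil.Process(os.getpid())
_last_wall = time.time()
_last_cpu = sum(_proc.cpu_times()[:2])  # user + system seconds so far

def f(i):
    global _last_wall, _last_cpu
    result = math.sqrt(i)

    wall, cpu = time.time(), sum(_proc.cpu_times()[:2])
    elapsed, used = wall - _last_wall, cpu - _last_cpu
    # if this worker burnt more than TARGET of the elapsed wall time,
    # sleep long enough to bring its average back down to TARGET
    if elapsed > 0 and used > TARGET * elapsed:
        time.sleep(used / TARGET - elapsed)
    _last_wall, _last_cpu = time.time(), sum(_proc.cpu_times()[:2])
    return result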

answered June 16, 2021 by dmp
92

cat /proc/stat

http://www.linuxhowtos.org/System/procstat.htm

I agree with this answer above. The cpu line in this file gives the total number of "jiffies" your system has spent doing different types of processing.

What you need to do is take 2 readings of this file, separated by whatever interval of time you require. The numbers are increasing values (subject to integer rollover), so to get the %cpu you need to calculate how many jiffies have elapsed over your interval versus how many jiffies were spent doing work.

e.g. Suppose at 14:00:00 you have

cpu 4698 591 262 8953 916 449 531

total_jiffies_1 = (sum of all values) = 16400

work_jiffies_1 = (sum of user,nice,system = the first 3 values) = 5551

and at 14:00:05 you have

cpu 4739 591 289 9961 936 449 541

total_jiffies_2 = 17506

work_jiffies_2 = 5619

So the %cpu usage over this period is:

work_over_period = work_jiffies_2 - work_jiffies_1 = 68

total_over_period = total_jiffies_2 - total_jiffies_1 = 1106

%cpu = work_over_period / total_over_period * 100 = 6.1%
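In case it helps, here is a minimal sketch of that calculation in Python; the field layout of the cpu line is as documented at the link above (newer kernels append extra columns, which simply end up in the total):

import time

def read_cpu_jiffies():
    """Return (work_jiffies, total_jiffies) from the aggregate 'cpu' line."""
    with open('/proc/stat') as stat:
        fields = [float(column) for column in stat.readline().split()[1:]]
    return sum(fields[:3]), sum(fields)  # user + nice + system, and everything

work_1, total_1 = read_cpu_jiffies()
time.sleep(5)  # your measurement interval
work_2, total_2 = read_cpu_jiffies()

cpu_percent = 100.0 * (work_2 - work_1) / (total_2 - total_1)
print('%.1f%% cpu over the interval' % cpu_percent)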

Hope that helps a bit.

answered June 12, 2021 by ChronoFish
67

If a variable is assigned to anywhere in a function, it is treated as a local variable. shared_array += some_other_array is equivalent to shared_array = shared_array + some_other_array. Thus shared_array is treated as a local variable, which does not exist at the time you try to use it on the right-hand side of the assignment.

If you want to use the global shared_array variable, you need to explicitly mark it as global by putting a global shared_array in your function.

The reason you don't see the error with shared_array[i,:] = i is that this does not assign to the variable shared_array. Rather, it mutates that object, assigning to a slice of it. In Python, assigning to a bare name (e.g., shared_array = ...) is very different from any other kind of assignment (e.g., shared_array[...] = ...), even though they look similar.

Note, incidentally, that the error has nothing to do with multiprocessing.
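A minimal illustration of the difference, outside of multiprocessing; this assumes shared_array is a NumPy array, as the slicing syntax suggests:

import numpy as np

shared_array = np.zeros((3, 3))

def mutate(i):
    # assigns into the existing object: no 'global' statement needed
    shared_array[i, :] = i

def rebind_broken(other):
    # the assignment makes 'shared_array' local to this function, so reading
    # it on the right-hand side raises UnboundLocalError
    shared_array += other

def rebind_fixed(other):
    global shared_array  # explicitly refer to the module-level name
    shared_array += other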

answered August 7, 2021 by njai
62

This is not the correct way to use map.

  1. Using a global variable that way is absolutely wrong. Processes do not (normally) share the same memory, so every f will have its own copy of foo. To share a variable between different processes you should use a Manager.
  2. Functions passed to map are usually expected to return a value.

I suggest you read the documentation.

However here is a dummy example of how you could implement it:

from multiprocessing import Pool

foo = {1: []}

def f(x):
    return x

def main():
    pool = Pool()
    foo[1] = pool.map(f, range(100))
    pool.close()
    pool.join()
    print foo

if __name__ == '__main__':
    main()

You may also do something like pool.map(functools.partial(f, foo), range(100)) where foo is a Manager.
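For what it's worth, here is a hedged sketch of that last suggestion, with a Manager dict standing in for foo (what you actually want to share is an assumption here):

import functools
from multiprocessing import Pool, Manager

def f(shared, x):
    shared[x] = x * x  # every worker writes into the managed dict

def main():
    manager = Manager()
    shared = manager.dict()  # proxy object usable from all worker processes
    pool = Pool()
    pool.map(functools.partial(f, shared), range(100))
    pool.close()
    pool.join()
    print(dict(shared))

if __name__ == '__main__':
    main()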

answered August 12, 2021 by treeface
86

You'll need the psutil module.

import psutil
print(psutil.cpu_percent())
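Note that the first call with the default interval=None returns a meaningless 0.0, since it compares against the previous call; passing an interval makes the call block and measure over that window:

import psutil

# blocking call: system-wide CPU usage measured over a 1 second window
print(psutil.cpu_percent(interval=1))

# the same measurement, reported per core
print(psutil.cpu_percent(interval=1, percpu=True))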
answered August 19, 2021 by Elias Van Ootegem