
I have a few related questions regarding memory usage in the following example.

  1. If I run in the interpreter,

    foo = ['bar' for _ in xrange(10000000)]
    

    the real memory used on my machine goes up to 80.9mb. I then,

    del foo
    

    real memory goes down, but only to 30.4mb. The interpreter uses 4.4mb baseline so what is the advantage in not releasing 26mb of memory to the OS? Is it because Python is "planning ahead", thinking that you may use that much memory again?

  2. Why does it release 50.5mb in particular - what is the amount that is released based on?

  3. Is there a way to force Python to release all the memory that was used (if you know you won't be using that much memory again)?

NOTE: This question is different from "How can I explicitly free memory in Python?" because this question primarily deals with the increase of memory usage from baseline even after the interpreter has freed objects via garbage collection (with use of gc.collect or not).

 Answers

57

Memory allocated on the heap can be subject to high-water marks. This is complicated by Python's internal optimizations for allocating small objects (PyObject_Malloc) in 4 KiB pools, classed for allocation sizes at multiples of 8 bytes -- up to 256 bytes (512 bytes in 3.3). The pools themselves are in 256 KiB arenas, so if just one block in one pool is used, the entire 256 KiB arena will not be released. In Python 3.3 the small object allocator was switched to using anonymous memory maps instead of the heap, so it should perform better at releasing memory.
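
To make the pool and arena figures above concrete, here is a small arithmetic sketch. It only reproduces the numbers quoted in this answer (4 KiB pools, 256 KiB arenas, 8-byte size classes); the constant and function names are made up for illustration, pool header overhead is ignored, and other CPython versions may use different values.

```python
# Figures quoted in the answer above; not read from the running interpreter.
ARENA_SIZE = 256 * 1024   # bytes per arena
POOL_SIZE = 4 * 1024      # bytes per pool
ALIGNMENT = 8             # size classes are multiples of 8 bytes


def size_class(nbytes, limit=512):
    """Round a request up to its size class, or return None if the
    request is too large for the small object allocator.

    limit is 512 for Python 3.3 and 256 for earlier versions,
    per the answer above.
    """
    if nbytes <= 0 or nbytes > limit:
        return None
    return (nbytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT


pools_per_arena = ARENA_SIZE // POOL_SIZE

print(pools_per_arena)             # pools in one arena
print(size_class(20))              # a 20-byte request lands in the 24-byte class
print(size_class(300, limit=256))  # too big for the small object allocator in 2.x
```

Since one arena holds 64 pools, a single live block in any one of them is enough to keep all 256 KiB from being returned.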

Additionally, the built-in types maintain freelists of previously allocated objects that may or may not use the small object allocator. The int type maintains a freelist with its own allocated memory, and clearing it requires calling PyInt_ClearFreeList(). This can be called indirectly by doing a full gc.collect.

Try it like this (Python 2), and tell me what you get. It uses psutil.Process.memory_info to read the resident set size of the current process.

import os
import gc
import psutil

proc = psutil.Process(os.getpid())
gc.collect()
mem0 = proc.memory_info().rss

# create approx. 10**7 int objects and pointers
foo = ['abc' for x in range(10**7)]
mem1 = proc.memory_info().rss

# unreference, including x == 9999999
del foo, x
mem2 = proc.memory_info().rss

# collect() calls PyInt_ClearFreeList()
# or use ctypes: pythonapi.PyInt_ClearFreeList()
gc.collect()
mem3 = proc.memory_info().rss

pd = lambda x2, x1: 100.0 * (x2 - x1) / mem0
print "Allocation: %0.2f%%" % pd(mem1, mem0)
print "Unreference: %0.2f%%" % pd(mem2, mem1)
print "Collect: %0.2f%%" % pd(mem3, mem2)
print "Overall: %0.2f%%" % pd(mem3, mem0)

Output:

Allocation: 3034.36%
Unreference: -752.39%
Collect: -2279.74%
Overall: 2.23%

Edit:

I switched to measuring relative to the process VM size to eliminate the effects of other processes in the system.

The C runtime (e.g. glibc, msvcrt) shrinks the heap when contiguous free space at the top reaches a constant, dynamic, or configurable threshold. With glibc you can tune this with mallopt (M_TRIM_THRESHOLD). Given this, it isn't surprising if the heap shrinks by more -- even a lot more -- than the block that you free.
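
As a rough sketch of the mallopt tuning mentioned above, the call can be reached from Python through ctypes. This assumes a glibc-based system; the helper name set_trim_threshold is hypothetical, and the value of M_TRIM_THRESHOLD is copied from glibc's malloc.h. On other platforms the function simply reports failure. It only affects memory obtained through malloc, not arenas that the small object allocator maps directly.

```python
import ctypes
import ctypes.util

M_TRIM_THRESHOLD = -1  # parameter number from glibc's <malloc.h>


def set_trim_threshold(nbytes):
    """Ask glibc to trim the heap once nbytes of contiguous space are free
    at the top. Returns True on success, False if mallopt is unavailable
    or rejects the value."""
    name = ctypes.util.find_library("c")
    if name is None:
        return False
    try:
        libc = ctypes.CDLL(name)
        mallopt = libc.mallopt  # missing on non-glibc C libraries
    except (OSError, AttributeError):
        return False
    mallopt.argtypes = [ctypes.c_int, ctypes.c_int]
    mallopt.restype = ctypes.c_int
    return mallopt(M_TRIM_THRESHOLD, nbytes) == 1


# Example: request trimming once 128 KiB is free at the top of the heap.
print(set_trim_threshold(128 * 1024))
```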

In 3.x range doesn't create a list, so the test above won't create 10 million int objects. Even if it did, the int type in 3.x is basically a 2.x long, which doesn't implement a freelist.
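
The 3.x point above can be checked directly. This is a minimal sketch for Python 3 only; the variable names are illustrative.

```python
import sys

r = range(10**7)            # lazy sequence: no list of 10 million ints is built
small = list(range(1000))   # an explicit list holds one pointer per element

# The range object stays tiny regardless of its length.
print(type(r).__name__, len(r))
print(sys.getsizeof(r) < sys.getsizeof(small))

# There is a single int type in 3.x, with arbitrary precision like a 2.x long.
big = 2**100
print(type(big).__name__)
```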

Tuesday, June 1, 2021
 
Anand
answered 7 Months ago
45

I am a bit lost as to what the question is here...

But let's try to answer, at least part of it:

For a start, let's explain roughly what URL.createObjectURL(blob) does:

It creates a blob URI, which is a URI pointing to the Blob blob in memory, just as if it were in a reachable place (like a server).
This blob URI marks blob as un-collectable by the Garbage Collector (GC) for as long as it has not been revoked, so that you don't have to keep a live reference to blob in your script but can still use/load it.

URL.revokeObjectURL then breaks the link between the blob URI and the Blob in memory. It does not free the memory occupied by blob directly; it just removes its own protection against the GC [and the URI no longer points anywhere].
So if you have multiple blob URIs pointing to the same Blob object, revoking only one won't break the other blob URIs.

Now, the memory is freed only when the GC kicks in, and this is decided solely by the browser internals: when it thinks the time is right, or when it sees it has no other option (generally when it runs short of memory).

So it is quite normal that you don't see your memory being freed instantly. In my experience, FF doesn't mind using a lot of memory when it is available, so the GC doesn't kick in very often, which is good for user experience (frequent GC causes lag).


For your download question: indeed, web APIs don't provide a way to know whether a download succeeded or failed, or even whether it has finished.
For the revoking part, it really depends on when you do it.
If you do it directly in the click handler, the browser won't have made the pre-fetch request yet, so when the default action of the click (the download) happens, the URI will no longer point to anything.
Now, if you revoke the blob URI after the "save" prompt, the browser will have made a pre-fetch request, and thus might be able to mark by itself that the Blob resource should not be cleared. But I don't think this behavior is required by any spec, so it might be better to wait at least for the window's focus event, at which point the download of the resource should already have started.

// Markup: <a id="anchor" download="foo.txt">download</a>
const anchor = document.getElementById('anchor');
const blob = new Blob(['bar']);
const uri = URL.createObjectURL(blob);
anchor.href = uri;
anchor.onclick = e => {
  // Revoke only once the window regains focus, i.e. after the save prompt.
  window.addEventListener('focus', e => {
    URL.revokeObjectURL(uri);
    console.log("Blob URI revoked, you won't be able to download it anymore");
  }, {once: true});
};
Friday, August 6, 2021
 
neon29
answered 4 Months ago
99

I installed the latest svn version of numpy and the issue vanished. I assume it was inside one of the numpy functions. I never got a chance to dig further into it.

Friday, August 6, 2021
 
Yoshi
answered 4 Months ago
100

Are you remembering to close your figures when you are done with them? e.g.:

import matplotlib.pyplot as plt

fig = plt.figure()  # generate figure here
#...
plt.close(fig)  # release resources associated with fig
Wednesday, August 18, 2021
 
Rodrigo Vedovato
answered 4 Months ago
14

You can feed a Counter directly, which saves the memory of the intermediate lists (especially words_1800, which is as big as the file you’re reading):

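import re
from collections import Counter

common_words_1800 = Counter()

# Raw string, so the backslashes in the Windows path are not read as escapes.
with open(r'E:\Book\1800.txt', "r", encoding='ISO-8859-1') as File_1800:
    for line in File_1800:
        # \w+ matches each run of word characters, one word at a time.
        for match in re.finditer(r'\w+', line.lower()):
            word = match.group()
            if len(word) > 3:
                common_words_1800[word] += 1

print(common_words_1800.most_common(50))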
Saturday, August 28, 2021
 
Sufi
answered 3 Months ago