
We recently upgraded our message processing application from Java 7 to Java 8. Since the upgrade, we occasionally get an exception saying a stream has been closed while it is still being read. Logging shows that the finalizer thread is calling finalize() on the object that holds the stream (which in turn closes the stream).

The basic outline of the code is as follows:

MIMEWriter writer = new MIMEWriter( out );
InputStream in = new InflaterInputStream( databaseBlobInputStream );
MIMEBodyPart attachmentPart = new MIMEBodyPart( in );
writer.writePart( attachmentPart );

MIMEWriter and MIMEBodyPart are part of a home-grown MIME/HTTP library. MIMEBodyPart extends HTTPMessage, which has the following:

public void close() throws IOException
{
    if ( m_stream != null )
    {
        m_stream.close();
    }
}

// Safety net: closes the stream if the owner forgot to.
// This is what the finalizer thread ends up invoking.
protected void finalize()
{
    try
    {
        close();
    }
    catch ( final Exception ignored ) { }
}

The exception occurs in the invocation chain of MIMEWriter.writePart, which is as follows:

  1. MIMEWriter.writePart() writes the headers for the part, then calls part.writeBodyPartContent( this )
  2. MIMEBodyPart.writeBodyPartContent() calls our utility method IOUtil.copy( getContentStream(), out ) to stream the content to the output
  3. MIMEBodyPart.getContentStream() just returns the input stream passed into the constructor (see code block above)
  4. IOUtil.copy has a loop that reads an 8K chunk from the input stream and writes it to the output stream until the input stream is empty (a sketch of such a loop follows this list)
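
For reference, here is a minimal sketch of what a copy utility like that typically looks like (the real IOUtil isn't shown in the question, so treat the details as assumptions):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class IOUtil
{
    // Reads 8K chunks from the input and writes them to the output
    // until the input is exhausted. This is the loop that was mid-read
    // when the finalizer closed the underlying stream.
    public static void copy( final InputStream in, final OutputStream out )
        throws IOException
    {
        final byte[] buffer = new byte[ 8 * 1024 ];
        int count;
        while ( ( count = in.read( buffer ) ) != -1 )
        {
            out.write( buffer, 0, count );
        }
    }
}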

The MIMEBodyPart.finalize() is called while IOUtil.copy is running, and it gets the following exception:

java.io.IOException: Stream closed
    at java.util.zip.InflaterInputStream.ensureOpen(InflaterInputStream.java:67)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:142)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at com.blah.util.IOUtil.copy(IOUtil.java:153)
    at com.blah.core.net.MIMEBodyPart.writeBodyPartContent(MIMEBodyPart.java:75)
    at com.blah.core.net.MIMEWriter.writePart(MIMEWriter.java:65)

We put some logging in the HTTPMessage.close() method that logged the stack trace of the caller, and it proved that the finalizer thread really is invoking HTTPMessage.finalize() while IOUtil.copy() is running.

The MIMEBodyPart object is definitely reachable from the current thread's stack: it is the this reference in the stack frame for MIMEBodyPart.writeBodyPartContent. I don't understand why the JVM would call finalize().

I tried extracting the relevant code and running it in a tight loop on my own machine, but I cannot reproduce the problem. We can reliably reproduce the problem with high load on one of our dev servers, but any attempts to create a smaller reproducible test case have failed. The code is compiled under Java 7 but executes under Java 8. If we switch back to Java 7 without recompiling, the problem does not occur.

As a workaround, I've rewritten the affected code using the Java Mail MIME library and the problem has gone away (presumably Java Mail doesn't use finalize()). However, I'm concerned that other finalize() methods in the application may be called incorrectly, or that Java is trying to garbage-collect objects that are still in use.

I know that current best practice recommends against using finalize() and I will probably revisit this home-grown library to remove the finalize() methods. That being said, has anyone come across this issue before? Does anyone have any ideas as to the cause?

Answers

A bit of conjecture here. It is possible for an object to be finalized and garbage collected even if there are references to it in local variables on the stack, and even if there is an active call to an instance method of that object on the stack! The requirement is that the object be unreachable. Even if it's on the stack, if no subsequent code touches that reference, it's potentially unreachable.

See this other answer for an example of how an object can be GC'ed while a local variable referencing it is still in scope.

Here's an example of how an object can be finalized while an instance method call is active:

class FinalizeThis {
    @Override
    protected void finalize() {
        System.out.println("finalized!");
    }

    void loop() {
        System.out.println("loop() called");
        // Nothing below ever touches `this`, so the object is
        // unreachable for the entire duration of the loop.
        for (int i = 0; i < 1_000_000_000; i++) {
            if (i % 1_000_000 == 0)
                System.gc();
        }
        System.out.println("loop() returns");
    }

    public static void main(String[] args) {
        new FinalizeThis().loop();
    }
}

While the loop() method is active, there is no possibility of any code doing anything with the reference to the FinalizeThis object, so it's unreachable. And therefore it can be finalized and GC'ed. On JDK 8 GA, this prints the following:

loop() called
finalized!
loop() returns

every time.

Something similar might be going on with MIMEBodyPart. Is it being stored in a local variable? (It seems so, since the code appears to follow a convention of naming fields with an m_ prefix, and attachmentPart has none.)
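
If the home-grown library has to keep its finalizer for now, the usual cure is to keep the object strongly reachable until the copy completes. On Java 9 and later, java.lang.ref.Reference.reachabilityFence exists for exactly this purpose. Here's a hedged sketch of how writeBodyPartContent might use it, with the signature simplified to take an OutputStream (the question's real method receives the MIMEWriter):

import java.io.IOException;
import java.io.OutputStream;
import java.lang.ref.Reference;

public void writeBodyPartContent( final OutputStream out ) throws IOException
{
    try
    {
        IOUtil.copy( getContentStream(), out );
    }
    finally
    {
        // Keeps this (and therefore m_stream) strongly reachable, and
        // thus not finalizable, until the copy has finished. Java 9+.
        Reference.reachabilityFence( this );
    }
}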

UPDATE

In the comments, the OP suggested making the following change:

    public static void main(String[] args) {
        FinalizeThis finalizeThis = new FinalizeThis();
        finalizeThis.loop();
    }

With this change he didn't observe finalization, and neither do I. However, if this further change is made:

    public static void main(String[] args) {
        FinalizeThis finalizeThis = new FinalizeThis();
        for (int i = 0; i < 1_000_000; i++)
            Thread.yield();
        finalizeThis.loop();
    }

finalization once again occurs. I suspect the reason is that without the loop, the main() method is interpreted, not compiled. The interpreter is probably less aggressive about reachability analysis. With the yield loop in place, the main() method gets compiled, and the JIT compiler detects that finalizeThis has become unreachable while the loop() method is executing.

Another way of triggering this behavior is to use the -Xcomp option to the JVM, which forces methods to be JIT-compiled before execution. I wouldn't run an entire application this way -- JIT-compiling everything can be quite slow and take lots of space -- but it's useful for flushing out cases like this in little test programs, instead of tinkering with loops.
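
For example, to run the little test above with forced compilation (assuming FinalizeThis.class is in the current directory):

java -Xcomp FinalizeThis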

answered by ranhan, June 1, 2021

The problem is that type inference has been improved in Java 8. You have a method like

public <T extends Base> T get() {
    return (T) new Derived();
}

which basically says, “the caller can decide what subclass of Base I return”, which is obvious nonsense. Every compiler should give you an unchecked warning about your type cast (T) here.

Now you have a method call:

set(new Derived(), new Consumer().get());

Recall that your method Consumer.get() says “the caller can decide what I return”. So it’s perfectly correct to assume that there could be a type which extends Base and implements Collection at the same time. So the compiler says “I don’t know whether to call set(Base i, Derived b) or set(Derived d, Collection<? extends Consumer> o)”.
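
For concreteness, here is a hedged reconstruction of the whole setup, pieced together from the discussion rather than taken from the OP's real code, that makes the ambiguity reproducible:

import java.util.Collection;

class Base {}
class Derived extends Base {}

class Consumer {
    // "The caller can decide what subclass of Base I return":
    // the unchecked cast every compiler should warn about.
    @SuppressWarnings("unchecked")
    public <T extends Base> T get() {
        return (T) new Derived();
    }
}

class Demo {
    void set(Base i, Derived b) { }
    void set(Derived d, Collection<? extends Consumer> o) { }

    void test() {
        // Ambiguous under Java 8: T may be inferred as a type that
        // extends Base and implements Collection at the same time.
        set(new Derived(), new Consumer().get());
    }
}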

You can “fix” it by calling set(new Derived(), new Consumer().<Derived>get()); but to illustrate the madness of your method, note that you can also change it to

public <X extends Base&Collection<Consumer>> void test() {
    set(new Derived(), new Consumer().<X>get());
}

which will now call set(Derived d, Collection<? extends Consumer> o) without any compiler warning. The actual unsafe operation happened inside the get method.

So the correct fix would be to remove the type parameter from the get method and declare what it really returns, Derived.
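
In code, that fix would look something like:

public Derived get() {
    return new Derived();
}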


By the way, what irritates me is your claim that this code could be compiled under Java 7. Java 7’s limited type inference for nested method calls treats the get method in a nested invocation context as returning Base, which can’t be passed to a method expecting a Derived. As a consequence, trying to compile this code with a conforming Java 7 compiler will fail as well, just for different reasons.

answered by TheFrack, June 7, 2021

According to this simple test program, the JVM will still make its call to finalize() even if you explicitly called it:

public class FinalizeTest
{
    private static class Blah
    {
        @Override
        public void finalize() { System.out.println("finalizing!"); }
    }

    private static void f() throws Throwable
    {
        Blah blah = new Blah();
        blah.finalize(); // explicit call, runs like any ordinary method
    }

    public static void main(String[] args) throws Throwable
    {
        System.out.println("start");
        f();
        System.gc(); // blah is unreachable now, so the JVM finalizes it again
        System.out.println("done");
    }
}

The output is:

start
finalizing!
finalizing!
done

Every resource out there says never to call finalize() explicitly, and pretty much never even to implement the method, because there are no guarantees as to whether and when it will be called. You're better off just closing all of your resources manually.
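
For instance, a minimal sketch of the manual alternative using try-with-resources (the file name comes from the command line; any AutoCloseable works the same way):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CopyDemo
{
    public static void main( String[] args ) throws IOException
    {
        // The stream is closed deterministically when the block exits,
        // with no reliance on finalize() at all.
        try ( InputStream in = new FileInputStream( args[0] ) )
        {
            int b;
            while ( ( b = in.read() ) != -1 )
            {
                System.out.write( b );
            }
        }
        System.out.flush();
    }
}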

answered by Wickethewok, June 25, 2021

Actually, in JavaScript even primitives are stored in the heap rather than on a stack (though see the note below the break). When control enters a function, an execution context (an object) for that call to the function is created, which has a variable object. All vars and arguments to the function (plus a couple of other things) are properties of that anonymous variable object, exactly like other properties of named objects. A call stack is used, but the spec doesn't require that the stack be used for "local" variable storage, and JavaScript's closures would make using a stack à la C, C++, etc. for that impractical. Details in the spec.

Instead, a chain (linked list) is used. When you refer to an unqualified symbol, the interpreter checks the variable object for the current execution context to see if it has a property for that name. If so, it gets used; if not, the next variable object in the scope chain is checked (note that this is in the lexical order, not the call order like a call stack), and so on until the global execution context is reached (the global execution context has a variable object just like any other execution context does). The variable object for the global EC is the only one we can directly access in code: this points to it in global scope code (and in any function called without this being explicitly set). (On browsers, we have another way of accessing it directly: The global variable object has a property called window that it uses to point to itself.)

Re your question of why objects are stored in the heap: because they can be created and released independently of one another. C, C++, and others that use a stack for local variables can do so because variables can (and should) be destroyed when the function returns. A stack is a nice efficient way to do that. But objects aren't created and destroyed in that straightforward a way; three objects created at the same time can have radically different lifecycles, so a stack doesn't make sense for them. And since JavaScript's locals are stored on objects, and those objects have a lifecycle that's (potentially) unrelated to the function returning... well, you get the idea. :-) In JavaScript, the stack is pretty much just for return addresses.


However, it's worth noting that just because things are as described above conceptually, that doesn't mean that an engine has to do it that way under the hood. As long as it works externally as described in the spec, implementations (engines) are free to do what they like. I understand that V8 (Google's JavaScript engine, used in Chrome and elsewhere) does some very clever things, like for instance using the stack for local variables (and even local object allocations within the function) and then only copying those out into the heap if necessary (e.g., because the execution context or individual objects on it survive the call). You can see how in the majority of cases, this would minimize heap fragmentation and reclaim memory used for temporaries more aggressively and efficiently than relying on GC, because the execution context associated with most function calls doesn't need to survive the call. Let's look at an example:

function foo() {
    var n;

    n = someFunctionCall();
    return n * 2;
}

function bar() {
    var n;

    n = someFunction();
    setCallback(function() {
        if (n === 2) {
            doThis();
        }
        else {
            doThat();
        }
    });
}

In the above, an engine like V8 that aggressively optimizes can detect that the conceptual execution context for a call to foo never needs to survive when foo returns. So V8 would be free to allocate that context on the stack, and use a stack-based mechanism for cleanup.

In contrast, the execution context created for a call to bar has to stick around after bar returns, because there's a closure (the anonymous function we passed into setCallback) relying on it. So when compiling bar (because V8 compiles to machine code on-the-fly), V8 may well use a different strategy, actually allocating the context object in the heap.

(If either of the above had used eval in any way, by the way, it's likely V8 and other engines don't even attempt any form of optimization, because eval introduces too many optimization failure modes. Yet another reason not to use eval if you don't have to, and you almost never have to.)

But these are implementation details. Conceptually, things are as described above the break.

answered by Rocket, August 1, 2021

The reason why powers of 2 appear everywhere is that when numbers are expressed in binary (as they are in circuits), certain math operations on powers of 2 are simpler and faster to perform (just think about how easy math with powers of 10 is in the decimal system we use). For example, multiplication is not a very efficient process in computers - circuits use a method similar to the one you use when multiplying two numbers each with multiple digits. Multiplying or dividing by a power of 2 requires the computer to just shift bits to the left for multiplying or to the right for dividing.

And as for why 16 for HashMap? 10 is a commonly used default for dynamically growing structures (arbitrarily chosen), and 16 is not far off - but is a power of 2.

You can do modulus very efficiently for a power of 2. n % d = n & (d-1) when d is a power of 2, and modulus is used to determine which index an item maps to in the internal array - which means it occurs very often in a Java HashMap. Modulus requires division, which is also much less efficient than using the bitwise and operator. You can convince yourself of this by reading a book on Digital Logic.

The reason why bitwise and works this way for powers of two is that every power of 2 is expressed in binary as a single bit set to 1. Let's say that bit is at position t. When you subtract 1 from a power of 2, every bit below t becomes 1, bit t becomes 0, and every bit above t was already 0. Bitwise and therefore keeps the values of all bits of n below position t and sets the rest to 0 - which is exactly the remainder of dividing n by 2^t.

But how does that help us? Remember that when dividing by a power of 10, you can count the number of zeroes following the 1, and take that number of digits starting from the least significant of the dividend in order to find the remainder. Example: 637989 % 1000 = 989. A similar property applies to binary numbers with only one bit set to 1, and the rest set to 0. Example: 100101 % 001000 = 000101
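
If you want to convince yourself numerically instead, here is a small throwaway test (the class name is mine, not from any library):

public class PowerOfTwoMod
{
    public static void main( String[] args )
    {
        final int d = 16; // a power of 2, like HashMap's default capacity
        for ( final int n : new int[] { 5, 16, 37, 637989 } )
        {
            // For a power-of-2 divisor, n % d and n & (d - 1) always agree.
            System.out.println( n + " % " + d + " = " + ( n % d )
                + ", n & (d - 1) = " + ( n & ( d - 1 ) ) );
        }
    }
}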

answered by ysquared, November 20, 2021