Skip to main content

Is volatile expensive?



After reading http://gee.cs.oswego.edu/dl/jmm/cookbook.html about the implementation of volatile, especially section "Interactions with Atomic Instructions" I assume that reading a volatile variable without updating it needs a LoadLoad or a LoadStore barrier. Further down the page I see that LoadLoad and LoadStore are effectively no-ops on X86 CPUs. Does this mean that volatile read operations can be done without a explicit cache invalidation on x86, and is as fast a normal variable read (disregarding the reordering constraints of volatile)?





I believe I don't understand this correctly. Could someone care to enlighten me?





EDIT: I wonder if there are differences in multi-processor environments. On single CPU systems the CPU might look at it's own thread caches, as John V. states, but on multi CPU systems there must be some config option to the CPUs that this is not enough and main memory has to be hit, making volatile slower on multi cpu systems, right?





PS: On my way to learn more about this I stumbled about the following great articles, and since this question may be interesting to others, I'll share my links here:




Comments

  1. In the words of the Java Memory Model (as defined for Java 5+ in JSR 133), any operation -- read or write -- on a volatile variable creates a happens-before relationship with respect to any other operation on the same variable. This means that the compiler and JIT are forced to avoid certain optimisations such as reordering instructions within the thread or performing operations only within the local cache.

    Since some optimisations are not available, the resulting code is necessarily slower that it would have been, though probably not by very much.

    Nevertheless you shouldn't make a variable volatile unless you know that it will be accessed from multiple threads outside of synchronized blocks. Even then you should consider whether volatile is the best choice versus synchronized, AtomicReference and its friends, the explicit Lock classes, etc.

    ReplyDelete
  2. Accessing a volatile variable is in many ways similar to wrapping access to an ordinary variable in a synchronized block. For instance, access to a volatile variable prevents the CPU from re-ordering the instructions before and after the access, and this generally slows down execution (though I can't say by how much).

    More generally, on a multi-processor system I don't see how access to a volatile variable can be done without penalty -- there must be some way to ensure a write on processor A will be synchronized to a read on processor B.

    ReplyDelete
  3. Generally speaking, on most modern processors a volatile load is comparable to a normal load. A volatile store is about 1/3 the time of a montior-enter/monitor-exit. This is seen on systems that are cache coherent.

    To answer the OP's question, volatile writes are expensive while the reads usually are not.


    Does this mean that volatile read
    operations can be done without a
    explicit cache invalidation on x86,
    and is a fast a normal variable read
    (disregarding the reordering
    contraints of volatile)?


    Yes, sometimes when validating a field the CPU may not even hit main memory, instead spy on other thread caches and get the value from there (very general explanation).

    However, I second Neil's suggestion that if you have a field accessed by multiple threads you shold wrap it as an AtomicReference. Being an AtomicReference it executes roughly the same throughput for reads/writes but also is more obvious that the field will be accessed and modified by multiple threads.

    Edit to answer OP's edit:

    Cache coherence is a bit of a complicated protocol, but in short: CPU's will share a common cache line that is attached to main memory. If a CPU loads memory and no other CPU had it that CPU will assume it is the most up to date value. If another CPU tries to load the same memory location the already loaded CPU will be aware of this and actually share the cached reference to the requesting CPU - now the request CPU has a copy of that memory in its CPU cache. (It never had to look in main memory for the reference)

    There is quite a bit more of protocol involved but this gives an idea of what is going on. Also to answer your other question, with the absence of multiple processors, volatile reads/writes can in fact be faster then with multiple processors. There are some applications that would in fact run faster concurrently with a single CPU then multiple.

    ReplyDelete

Post a Comment

Popular posts from this blog

[韓日関係] 首相含む大幅な内閣改造の可能性…早ければ来月10日ごろ=韓国

div not scrolling properly with slimScroll plugin

I am using the slimScroll plugin for jQuery by Piotr Rochala Which is a great plugin for nice scrollbars on most browsers but I am stuck because I am using it for a chat box and whenever the user appends new text to the boxit does scroll using the .scrollTop() method however the plugin's scrollbar doesnt scroll with it and when the user wants to look though the chat history it will start scrolling from near the top. I have made a quick demo of my situation http://jsfiddle.net/DY9CT/2/ Does anyone know how to solve this problem?

Why does this javascript based printing cause Safari to refresh the page?

The page I am working on has a javascript function executed to print parts of the page. For some reason, printing in Safari, causes the window to somehow update. I say somehow, because it does not really refresh as in reload the page, but rather it starts the "rendering" of the page from start, i.e. scroll to top, flash animations start from 0, and so forth. The effect is reproduced by this fiddle: http://jsfiddle.net/fYmnB/ Clicking the print button and finishing or cancelling a print in Safari causes the screen to "go white" for a sec, which in my real website manifests itself as something "like" a reload. While running print button with, let's say, Firefox, just opens and closes the print dialogue without affecting the fiddle page in any way. Is there something with my way of calling the browsers print method that causes this, or how can it be explained - and preferably, avoided? P.S.: On my real site the same occurs with Chrome. In the ex