Asked  7 Months ago    Answers:  5   Viewed   30 times

This is straight from the Java Docs:

This class and its iterator implement all of the optional methods of the Collection and Iterator interfaces. The Iterator provided in method iterator() is not guaranteed to traverse the elements of the priority queue in any particular order. If you need ordered traversal, consider using Arrays.sort(pq.toArray()).

So basically, my PriorityQueue works fine, but printing it out to the screen using its own built in toString() method caused me to see this anomaly in action, and was wondering if someone could explain why it is that the iterator provided (and used internally) does not traverse the PriorityQueue in its natural order?

 Answers

85

Because the underlying data structure doesn't support it. A binary heap is only partially ordered, with the smallest element at the root. When you remove that, the heap is reordered so that the next smallest element is at the root. There is no efficient ordered traversal algorithm so none is provided in Java.

Tuesday, June 1, 2021
 
SubniC
answered 7 Months ago
44

A LinkedList is the worst choice. Either use an ArrayList (or, more generally, a RandomAccess implementor), or PriorityQueue. If you do use a list, sort it only before iterating over its contents, not after every insert.

One thing to note is that the PriorityQueue iterator does not provide the elements in order; you'll actually have to remove the elements (empty the queue) to iterate over its elements in order.

Friday, July 30, 2021
 
123r789
answered 5 Months ago
20

If you need to have an ordering according the insertion order you need to use an extra element for timestamp. I.e. on insertions and equal weight use timestamp to see which element was inserted first. So CustomObject should be something like:

class CustomObject {  
   int weight;  
   long timestamp;  
}

And the comparison should be:

public int compareTo (CustomObject o) {  
    int thisWeight = this.weight;  
    int thatWeight = o.weight;  
    if (thisWeight != thatWeight) {  
        return thisWeight - thatWeight;  
    }  
    else {  
        return this.timestamp - o.timestamp;  
    }  
}  

The smaller timestamp means it was inserted earlier so you keep in the insertion order.

You could also use a "logical" time by maintaining a counter that you update on each add or remove.

Tuesday, August 3, 2021
 
qwertylpc
answered 4 Months ago
35

You misunderstand what your code does.

Your code on line // 1 is free to not block at all. condition_variables can (and will!) have spurious wakeups -- they can wake up for no good reason at all.

You are responsible for checking if the wakeup is spurious.

Using a condition_variable properly requires 3 things:

  • A condition_variable
  • A mutex
  • Some data guarded by the mutex

The data guarded by the mutex is modified (under the mutex). Then (with the mutex possibly disengaged), the condition_variable is notified.

On the other end, you lock the mutex, then wait on the condition variable. When you wake up, your mutex is relocked, and you test if the wakeup is spurious by looking at the data guarded by the mutex. If it is a valid wakeup, you process and proceed.

If it wasn't a valid wakeup, you go back to waiting.

In your case, you don't have any data guarded, you cannot distinguish spurious wakeups from real ones, and your design is incomplete.

Not surprisingly with the incomplete design you don't see the reason why the mutex is relocked: it is relocked so you can safely check the data to see if the wakeup was spurious or not.

If you want to know why condition variables are designed that way, probably because this design is more efficient than the "reliable" one (for whatever reason), and rather than exposing higher level primitives, C++ exposed the lower level more efficient primitives.

Building a higher level abstraction on top of this isn't hard, but there are design decisions. Here is one built on top of std::experimental::optional:

template<class T>
struct data_passer {
  std::experimental::optional<T> data;
  bool abort_flag = false;
  std::mutex guard;
  std::condition_variable signal;

  void send( T t ) {
    {
      std::unique_lock<std::mutex> _(guard);
      data = std::move(t);
    }
    signal.notify_one();
  }
  void abort() {
    {
      std::unique_lock<std::mutex> _(guard);
      abort_flag = true;
    }
    signal.notify_all();
  }        
  std::experimental::optional<T> get() {
    std::unique_lock<std::mutex> _(guard);
    signal.wait( _, [this]()->bool{
      return data || abort_flag;
    });
    if (abort_flag) return {};
    T retval = std::move(*data);
    data = {};
    return retval;
  }
};

Now, each send can cause a get to succeed at the other end. If more than one send occurs, only the latest one is consumed by a get. If and when abort_flag is set, instead get() immediately returns {};

The above supports multiple consumers and producers.

An example of how the above might be used is a source of preview state (say, a UI thread), and one or more preview renderers (which are not fast enough to be run in the UI thread).

The preview state dumps a preview state into the data_passer<preview_state> willy-nilly. The renderers compete and one of them grabs it. Then they render it, and pass it back (through whatever mechanism).

If the preview states come faster than the renderers consume them, only the most recent one is of interest, so the earlier ones are discarded. But existing previews aren't aborted just because a new state shows up.


Questions where asked below about race conditions.

If the data being communicated is atomic, can't we do without the mutex on the "send" side?

So something like this:

template<class T>
struct data_passer {
  std::atomic<std::experimental::optional<T>> data;
  std::atomic<bool> abort_flag = false;
  std::mutex guard;
  std::condition_variable signal;

  void send( T t ) {
    data = std::move(t); // 1a
    signal.notify_one(); // 1b
  }
  void abort() {
    abort_flag = true;   // 1a
    signal.notify_all(); // 1b
  }        
  std::experimental::optional<T> get() {
    std::unique_lock<std::mutex> _(guard); // 2a
    signal.wait( _, [this]()->bool{ // 2b
      return data.load() || abort_flag.load(); // 2c
    });
    if (abort_flag.load()) return {};
    T retval = std::move(*data.load());
    // data = std::experimental::nullopt;  // doesn't make sense
    return retval;
  }
};

the above fails to work.

We start with the listening thread. It does step 2a, then waits (2b). It evaluates the condition at step 2c, but doesn't return from the lambda yet.

The broadcasting thread then does step 1a (setting the data), then signals the condition variable. At this moment, nobody is waiting on the condition variable (the code in the lambda doesn't count!).

The listening thread then finishes the lambda, and returns "spurious wakeup". It then blocks on the condition variable, and never notices that data was sent.

The std::mutex used while waiting on the condition variable must guard the write to the data "passed" by the condition variable (whatever test you do to determine if the wakeup was spurious), and the read (in the lambda), or the possibility of "lost signals" exists. (At least in a simple implementation: more complex implementations can create lock-free paths for "common cases" and only use the mutex in a double-check. This is beyond the scope of this question.)

Using atomic variables does not get around this problem, because the two operations of "determine if the message was spurious" and "rewait in the condition variable" must be atomic with regards to the "spuriousness" of the message.

Wednesday, August 4, 2021
 
moister
answered 4 Months ago
29

That's only one "extra" table.

It's because the same question may have many tags.

And because the same tag may be used by many questions.

You need somewhere to store (questionId, tagId) and to make sure there are no duplicates of that.


I haven't been following your questions on this topic, but it looks like there's some bad design here. I thought you only had one extra table because I assumed you had a sensible structure. You do not.

Why does Question-Tags have both the tag string and a tag id? That doesn't make much sense to me.


I don't want to go back through the sequence of questions. Still, I wanted to try to illustrate what I was talking about. So I created a very simple Object-Role Modeling model of this part of StackOverflow using the NORMA tool:

Simple model of StackOverflow

This generated the following ER diagram:

ER diagram

Note that the "extra" table is all we need to keep for tags, simply because there is no additional information kept about tags. Also, there is no need to store a tag id that is the foreign key to a Tags table, since the tag name is already unique. If we kept additional data about a tag then there would probably be a separate Tags table, with the primary key still being the tag name. That could be changed to use an integer id if it became a performance issue, in which case the tag name would still get a unique index over it.

Sunday, August 15, 2021
 
LunaLoveDove
answered 4 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :  
Share