Asked  7 Months ago    Answers:  5   Viewed   47 times

I need to take a C++ vector with potentially a lot of elements, erase duplicates, and sort it.

I currently have the below code, but it doesn't work.

vec.erase(
      std::unique(vec.begin(), vec.end()),
      vec.end());
std::sort(vec.begin(), vec.end());

How can I correctly do this?

Additionally, is it faster to erase the duplicates first (as in the code above) or to sort first? If I sort first, is the vector guaranteed to remain sorted after std::unique is executed?

Or is there another (perhaps more efficient) way to do all this?

 Answers

85

I agree with R. Pate and Todd Gardner; a std::set might be a good idea here. Even if you're stuck using vectors, if you have enough duplicates, you might be better off creating a set to do the dirty work.

Let's compare three approaches:

Just using vector, sort + unique

sort( vec.begin(), vec.end() );
vec.erase( unique( vec.begin(), vec.end() ), vec.end() );

Convert to set (manually)

set<int> s;
std::size_t size = vec.size();  // size() returns size_t; unsigned could truncate on very large vectors
for( std::size_t i = 0; i < size; ++i ) s.insert( vec[i] );
vec.assign( s.begin(), s.end() );

Convert to set (using a constructor)

set<int> s( vec.begin(), vec.end() );
vec.assign( s.begin(), s.end() );
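As a minimal, self-contained sketch of the approaches above (the function names are made up for illustration and int elements are assumed), the following also shows why the order used in the question fails: std::unique only collapses adjacent equal elements, so it has to run after the sort. Because std::unique keeps the surviving elements in their original relative order, a sorted range stays sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Order used in the question: duplicates that are not adjacent survive.
std::vector<int> unique_then_sort(std::vector<int> vec) {
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
    std::sort(vec.begin(), vec.end());
    return vec;
}

// Sort first so equal elements become adjacent, then erase the duplicates.
std::vector<int> sort_and_dedupe(std::vector<int> vec) {
    std::sort(vec.begin(), vec.end());
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
    // std::unique preserves relative order, so the result is still sorted.
    assert(std::is_sorted(vec.begin(), vec.end()));
    return vec;
}

// Let std::set do both the ordering and the de-duplication.
std::vector<int> dedupe_via_set(const std::vector<int>& vec) {
    std::set<int> s(vec.begin(), vec.end());
    return std::vector<int>(s.begin(), s.end());
}
```

For the input {3, 1, 3, 2, 1, 2, 2}, the first function leaves duplicates behind, while the other two both return {1, 2, 3}.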

Here's how these perform as the number of duplicates changes:

[Figure: comparison of vector and set approaches]

Summary: when the number of duplicates is large enough, it's actually faster to convert to a set and then dump the data back into a vector.

And for some reason, doing the set conversion manually seems to be faster than using the set constructor -- at least on the toy random data that I used.
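To try a comparison like this on your own data, a rough timing harness could look like the sketch below. It is not the benchmark behind the numbers above: make_input, time_ms, the fixed seed, and the via_* names are all assumptions made for illustration, and the manual set loop uses a range-based for rather than an index loop. Results will vary with compiler, optimization flags, and input distribution.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Hypothetical input generator: n values drawn uniformly from [0, distinct_values).
// A small distinct_values relative to n means many duplicates.
std::vector<int> make_input(std::size_t n, int distinct_values) {
    assert(distinct_values > 0);
    std::mt19937 gen(42);  // fixed seed so runs are repeatable
    std::uniform_int_distribution<int> dist(0, distinct_values - 1);
    std::vector<int> v(n);
    for (int& x : v) x = dist(gen);
    return v;
}

void via_sort_unique(std::vector<int>& vec) {
    std::sort(vec.begin(), vec.end());
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}

void via_set_loop(std::vector<int>& vec) {
    std::set<int> s;
    for (int x : vec) s.insert(x);
    vec.assign(s.begin(), s.end());
}

void via_set_ctor(std::vector<int>& vec) {
    std::set<int> s(vec.begin(), vec.end());
    vec.assign(s.begin(), s.end());
}

// Hypothetical timing helper: runs f on a private copy of input, returns milliseconds.
template <typename F>
double time_ms(F f, std::vector<int> input) {
    const auto start = std::chrono::steady_clock::now();
    f(input);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as time_ms(via_sort_unique, make_input(1000000, 100)) times one approach on a duplicate-heavy input; raising the second argument reduces the share of duplicates.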

Tuesday, June 1, 2021
 
TecHunter
answered 7 Months ago
79

Situations where you repeatedly insert a lot of items anywhere other than at the end of a sequence.

Check out the complexity guarantees for each different type of container:

What are the complexity guarantees of the standard containers?
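As a rough illustration of that kind of workload, the sketch below inserts a value after every matching element, once with std::list and once with std::vector. The function names are hypothetical. Each std::list::insert is constant time once you hold the iterator, while each std::vector::insert has to shift every element after the insertion point.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// std::list: each insertion is O(1) and leaves other iterators valid.
std::list<int> insert_after_each_list(std::list<int> seq, int key, int value) {
    for (auto it = seq.begin(); it != seq.end(); ++it) {
        if (*it == key) {
            // insert() returns an iterator to the new element; the loop's ++it steps past it.
            it = seq.insert(std::next(it), value);
        }
    }
    return seq;
}

// std::vector: each insertion is O(n) because the tail must be moved.
std::vector<int> insert_after_each_vector(std::vector<int> seq, int key, int value) {
    for (std::size_t i = 0; i < seq.size(); ++i) {
        if (seq[i] == key) {
            seq.insert(seq.begin() + static_cast<std::ptrdiff_t>(i) + 1, value);
            ++i;  // skip the element that was just inserted
        }
    }
    return seq;
}
```

Both functions produce the same sequence; only the cost per insertion differs.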

Tuesday, June 1, 2021
 
alez
answered 7 Months ago
56
>>> from itertools import groupby
>>> L = [0, 0, 0, 3, 3, 2, 5, 2, 6, 6]
>>> grouped_L = [(k, sum(1 for i in g)) for k,g in groupby(L)]
>>> # Or (k, len(list(g))), but that creates an intermediate list
>>> grouped_L
[(0, 3), (3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]

Batteries included, as they say.

Suggestion for using sum and generator expression from JBernardo; see comment.
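For readers following the C++ threads on this page, the same grouping of consecutive equal values can be sketched without any library support. The function name group_runs is made up for illustration, and int elements are assumed.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Collapse each run of consecutive equal values into a (value, run length) pair.
std::vector<std::pair<int, std::size_t>> group_runs(const std::vector<int>& values) {
    std::vector<std::pair<int, std::size_t>> runs;
    for (int v : values) {
        if (!runs.empty() && runs.back().first == v) {
            ++runs.back().second;     // same value as the current run: extend it
        } else {
            runs.emplace_back(v, 1);  // different value: start a new run
        }
    }
    return runs;
}
```

Like itertools.groupby, this only merges adjacent equal values, so the two separate runs of 2 in the list above stay separate.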

Sunday, June 6, 2021
 
Valdas
answered 7 Months ago
28

Unfortunately, I believe 1 is the best option. I suspect that most of your overhead compared to the iPhone comes from the cross-process IPC inherent in the content provider design.

Your analysis of 3 is correct.

There are options on rooted devices to go around the content provider but I doubt that is what you are looking for.

Wednesday, October 13, 2021
 
itowlson
answered 2 Months ago
73

From a purely philosophical point of view: yes, a string is a type of vector. It is a contiguous memory block that stores characters (a vector is a contiguous memory block that stores objects of arbitrary types). So, from this perspective, a string is a special kind of vector.

In terms of design and implementation of std::string and std::vector, they share some of the same interface elements (e.g. contiguous memory blocks, operator[]), but std::string does not derive from std::vector (side note: you should not publicly derive from standard containers as they are not designed to be base classes - e.g. they do not have virtual destructors), nor are they directly convertible to each other. That is, the following will not compile:

std::string s = "abc";
std::vector<char> v = s; // ERROR!

However, since they both have iterator support, you can convert a string to a vector:

std::string s = "abc";
std::vector<char> v(s.begin(), s.end()); // note that the vector will NOT include the '\0' character
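A small round-trip sketch of this iterator-based conversion, in both directions, might look like the following. The helper names are hypothetical.

```cpp
#include <string>
#include <vector>

// Copy the characters of a string into a vector; no terminating '\0' is added.
std::vector<char> string_to_vector(const std::string& s) {
    return std::vector<char>(s.begin(), s.end());
}

// Build a string from the characters stored in a vector.
std::string vector_to_string(const std::vector<char>& v) {
    return std::string(v.begin(), v.end());
}
```

Converting "abc" gives a vector of exactly three elements, and converting that vector back gives an equal string.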

std::string will no longer have a reference count (as of C++11) as the copy-on-write functionality that many implementations used was forbidden by the C++11 standard.

From a memory perspective, an instance of std::string will look very similar to a std::vector<char> (e.g. they both will have a pointer to their memory location, a size, a capacity), but the functionality of the two classes is different.

Monday, November 1, 2021
 
Mike
answered 1 Month ago