Asked  6 Months ago    Answers:  5   Viewed   75 times

Right, so I have an enumerable and wish to get distinct values from it.

Using System.Linq, there's, of course, an extension method called Distinct. In the simple case, it can be used with no parameters, like:

var distinctValues = myStringList.Distinct();

Well and good, but if I have an enumerable of objects for which I need to specify equality, the only available overload is:

var distinctValues = myCustomerList.Distinct(someEqualityComparer);

The equality comparer argument must be an instance of IEqualityComparer<T>. I can do this, of course, but it's somewhat verbose and, well, kludgy.

What I would have expected is an overload that would take a lambda, say a Func<T, T, bool>:

var distinctValues = myCustomerList.Distinct((c1, c2) => c1.CustomerId == c2.CustomerId);

Anyone know if some such extension exists, or some equivalent workaround? Or am I missing something?

Alternatively, is there a way of specifying an IEqualityComparer inline (embarrass me)?

Update

I found a reply by Anders Hejlsberg to a post in an MSDN forum on this subject. He says:

The problem you're going to run into is that when two objects compare equal they must have the same GetHashCode return value (or else the hash table used internally by Distinct will not function correctly). We use IEqualityComparer because it packages compatible implementations of Equals and GetHashCode into a single interface.

I suppose that makes sense.
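One way to get close to an inline comparer while respecting that constraint is to build the comparer from a key selector rather than from a Func<T, T, bool>, so that Equals and GetHashCode are both derived from the same key. The following is only a sketch: KeyEqualityComparer, the Customer class and ComparerDemo are hypothetical names made up for illustration, not part of the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the BCL): an IEqualityComparer<T> built
// from a key selector, so Equals and GetHashCode always agree with each other.
public sealed class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;
    private readonly IEqualityComparer<TKey> _keyComparer = EqualityComparer<TKey>.Default;

    public KeyEqualityComparer(Func<T, TKey> keySelector)
    {
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        _keySelector = keySelector;
    }

    public bool Equals(T x, T y)
    {
        // Two nulls are equal; a null never equals a non-null.
        if (x == null || y == null) return x == null && y == null;
        return _keyComparer.Equals(_keySelector(x), _keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        if (obj == null) return 0;
        TKey key = _keySelector(obj);
        return key == null ? 0 : _keyComparer.GetHashCode(key);
    }
}

// Hypothetical model type, assumed only for this example.
public sealed class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
}

public static class ComparerDemo
{
    public static List<string> DistinctNames()
    {
        var customers = new List<Customer>
        {
            new Customer { CustomerId = 1, Name = "Ann" },
            new Customer { CustomerId = 2, Name = "Bob" },
            new Customer { CustomerId = 1, Name = "Ann (duplicate)" },
        };

        // Customers sharing a CustomerId are treated as equal.
        return customers
            .Distinct(new KeyEqualityComparer<Customer, int>(c => c.CustomerId))
            .Select(c => c.Name)
            .ToList();
    }
}
```

A key selector is used here instead of a two-argument predicate because a predicate alone gives no way to compute a matching hash code, which is exactly the problem the quoted reply describes.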

 Answers

37
IEnumerable<Customer> filteredList = originalList
  .GroupBy(customer => customer.CustomerId)
  .Select(group => group.First());
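
If this pattern comes up often, it could be wrapped in an extension method. Below is a rough sketch that tracks seen keys in a HashSet instead of grouping; DistinctByKey and DistinctByKeyDemo are hypothetical names, and the sample data is made up. On .NET 6 and later the built-in Enumerable.DistinctBy covers the same need, so a helper like this would mainly matter on older targets.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctByKeyExtensions
{
    // Hypothetical extension method (named to avoid clashing with the built-in
    // Enumerable.DistinctBy in .NET 6+). Keeps the first element seen per key.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        return Iterate(source, keySelector);
    }

    // Separate iterator so argument checks above run eagerly, not on first MoveNext.
    private static IEnumerable<T> Iterate<T, TKey>(
        IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (T item in source)
        {
            // HashSet.Add returns false when the key was already present.
            if (seen.Add(keySelector(item)))
                yield return item;
        }
    }
}

public static class DistinctByKeyDemo
{
    public static List<string> Run()
    {
        // Made-up sample data; tuples stand in for a Customer type.
        var customers = new[]
        {
            (CustomerId: 1, Name: "Ann"),
            (CustomerId: 2, Name: "Bob"),
            (CustomerId: 1, Name: "Ann (duplicate)"),
        };

        return customers
            .DistinctByKey(c => c.CustomerId)
            .Select(c => c.Name)
            .ToList();
    }
}
```

Like the GroupBy version, this keeps the first element encountered for each key, but it streams results lazily rather than building all the groups first.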
Tuesday, June 1, 2021
 
hnkk
answered 6 Months ago
15

Use this:

with summed_sales_of_each_product as 
(
    select p.artist_name, p.product_id, sum(i.qty) as total
    from product p join order_item i 
    on i.product_id = p.product_id
    group by p.artist_name, p.product_id
),
each_artist_top_selling_product as
(
    select x_in.artist_name, x_in.product_id, x_in.total 
    from summed_sales_of_each_product x_in where total = 
        (select max(x_out.total) 
            from summed_sales_of_each_product x_out 
            where x_out.artist_name = x_in.artist_name)
)
select top 3
artist_name, product_id, total
from each_artist_top_selling_product
order by total desc

But you cannot stop at that query. What if two products by one artist are tied for highest selling? Data like this...

beatles  yesterday       1000
beatles  something       1000
elvis    jailbreak rock  800
nirvana  lithium         600
tomjones sexbomb         400

...will produce the following result with the above query:

beatles  yesterday       1000
beatles  something       1000
elvis    jailbreak rock  800

Which one to choose, yesterday or something? Since you cannot arbitrarily choose one over the other, you must list both. Also, what if the ten highest-selling products all belong to beatles and are tied, each with a quantity of 1000? Since reporting the same artist throughout the top 3 is the very thing you are trying to avoid, you have to amend the query so that the top 3 report looks like this:

beatles  yesterday       1000
beatles  something       1000
elvis    jailbreak rock  800
nirvana  lithium         600

To amend:

with summed_sales_of_each_product as 
(
    select p.artist_name, p.product_id, sum(i.qty) as total
    from product p join order_item i 
    on i.product_id = p.product_id
    group by p.artist_name, p.product_id
),
each_artist_top_selling_product as
(
    select x_in.artist_name, x_in.product_id, x_in.total 
    from summed_sales_of_each_product x_in 
    where x_in.total = 
        (select max(x_out.total) 
            from summed_sales_of_each_product x_out 
            where x_out.artist_name = x_in.artist_name)
),
top_3_total as
(    
    select distinct top 3 total 
    from each_artist_top_selling_product
    order by total desc
)
select artist_name, product_id, total 
from each_artist_top_selling_product
where total in (select total from top_3_total)
order by total desc

What if beatles has another product with a quantity of 900? Will the above query still work? Yes, it will, since the top_3_total CTE only considers the already filtered top quantity for each artist. So this source data...

beatles  yesterday       1000
beatles  something       1000
beatles  and i love her  900
elvis    jailbreak rock  800
nirvana  lithium         600
tomjones sexbomb         400

...will still produce the following result:

beatles  yesterday       1000
beatles  something       1000
elvis    jailbreak rock  800
nirvana  lithium         600
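
For anyone who wants to check the tie handling outside the database, the same three steps can be mirrored in memory with LINQ. This is only a sketch under the assumption that the per-product totals have already been summed; TopSellersDemo and the tuple field names are made up for illustration, and the data is the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopSellersDemo
{
    public static List<(string Artist, string Product, int Total)> TopThreeWithTies()
    {
        // Stands in for summed_sales_of_each_product (totals already computed).
        var totals = new[]
        {
            (Artist: "beatles",  Product: "yesterday",      Total: 1000),
            (Artist: "beatles",  Product: "something",      Total: 1000),
            (Artist: "beatles",  Product: "and i love her", Total: 900),
            (Artist: "elvis",    Product: "jailbreak rock", Total: 800),
            (Artist: "nirvana",  Product: "lithium",        Total: 600),
            (Artist: "tomjones", Product: "sexbomb",        Total: 400),
        };

        // Mirrors each_artist_top_selling_product: keep every product that
        // ties for its artist's maximum total.
        var topPerArtist = totals
            .GroupBy(t => t.Artist)
            .SelectMany(g =>
            {
                int max = g.Max(t => t.Total);
                return g.Where(t => t.Total == max);
            })
            .ToList();

        // Mirrors top_3_total: the three highest distinct totals.
        var topTotals = new HashSet<int>(
            topPerArtist
                .Select(t => t.Total)
                .Distinct()
                .OrderByDescending(total => total)
                .Take(3));

        // Final select: every top product whose total is one of those three.
        return topPerArtist
            .Where(t => topTotals.Contains(t.Total))
            .OrderByDescending(t => t.Total)
            .ToList();
    }
}
```

With the sample data this keeps both tied beatles products, drops the 900 row because it is not its artist's maximum, and returns four rows for the top three totals.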
Sunday, August 22, 2021
 
bruce
answered 4 Months ago
23

No, it's not possible. Extension methods can only be created for instances.

Friday, August 27, 2021
 
styvane
answered 3 Months ago
56

Performance:

Winner: GROUP BY

Some very rudimentary testing on a large table with unindexed columns showed that, at least in my case, the two queries generated completely different query plans. The one for PARTITION BY was significantly slower.

The GROUP BY query plan included only a table scan and an aggregation operation, while the PARTITION BY plan had two nested loop self-joins. The PARTITION BY query took about 2800 ms on the second run; the GROUP BY query took only 500 ms.

Readability / Maintainability:

Winner: GROUP BY

Based on the opinions of the commenters here, PARTITION BY is less readable for most developers, so it will probably also be harder to maintain in the future.

Flexibility:

Winner: PARTITION BY

PARTITION BY gives you more flexibility in choosing the grouping columns. With GROUP BY you can have only one set of grouping columns for all aggregated columns. With DISTINCT + PARTITION BY you can have different columns in each partition. Also, on some DBMSs you can choose from more aggregation/analytic functions in the OVER clause.

Sunday, September 5, 2021
 
John Oleynik
answered 3 Months ago
18

The nullary lambda equivalent would be () => 2.

Friday, September 24, 2021
 
matthy
answered 2 Months ago