Asked  6 Months ago    Answers:  5   Viewed   52 times

Anyone have a quick method for de-duplicating a generic List in C#?

 Answers

91

Perhaps you should consider using a HashSet.

From the MSDN link:

using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        HashSet<int> evenNumbers = new HashSet<int>();
        HashSet<int> oddNumbers = new HashSet<int>();

        for (int i = 0; i < 5; i++)
        {
            // Populate numbers with just even numbers.
            evenNumbers.Add(i * 2);

            // Populate oddNumbers with just odd numbers.
            oddNumbers.Add((i * 2) + 1);
        }

        Console.Write("evenNumbers contains {0} elements: ", evenNumbers.Count);
        DisplaySet(evenNumbers);

        Console.Write("oddNumbers contains {0} elements: ", oddNumbers.Count);
        DisplaySet(oddNumbers);

        // Create a new HashSet populated with even numbers.
        HashSet<int> numbers = new HashSet<int>(evenNumbers);
        Console.WriteLine("numbers UnionWith oddNumbers...");
        numbers.UnionWith(oddNumbers);

        Console.Write("numbers contains {0} elements: ", numbers.Count);
        DisplaySet(numbers);
    }

    private static void DisplaySet(HashSet<int> set)
    {
        Console.Write("{");
        foreach (int i in set)
        {
            Console.Write(" {0}", i);
        }
        Console.WriteLine(" }");
    }
}

/* This example produces output similar to the following:
 * evenNumbers contains 5 elements: { 0 2 4 6 8 }
 * oddNumbers contains 5 elements: { 1 3 5 7 9 }
 * numbers UnionWith oddNumbers...
 * numbers contains 10 elements: { 0 2 4 6 8 1 3 5 7 9 }
 */
Tuesday, June 1, 2021
 
conmen
answered 6 Months ago
35

yield

You can use a generator for an elegant solution. At each iteration, yield twice—once with the original element, and once with the element with the added suffix.

The generator will need to be exhausted; that can be done by tacking on a list call at the end.

def transform(l):
    for i, x in enumerate(l, 1):
        yield x
        yield f'{x}_{i}'  # {}_{}'.format(x, i)

You can also re-write this using the yield from syntax for generator delegation:

def transform(l):
    for i, x in enumerate(l, 1):
        yield from (x, f'{x}_{i}') # (x, {}_{}'.format(x, i))

out_l = list(transform(l))
print(out_l)
['a', 'a_1', 'b', 'b_2', 'c', 'c_3']

If you're on versions older than python-3.6, replace f'{x}_{i}' with '{}_{}'.format(x, i).

Generalising
Consider a general scenario where you have N lists of the form:

l1 = [v11, v12, ...]
l2 = [v21, v22, ...]
l3 = [v31, v32, ...]
...

Which you would like to interleave. These lists are not necessarily derived from each other.

To handle interleaving operations with these N lists, you'll need to iterate over pairs:

def transformN(*args):
    for vals in zip(*args):
        yield from vals

out_l = transformN(l1, l2, l3, ...)

Sliced list.__setitem__

I'd recommend this from the perspective of performance. First allocate space for an empty list, and then assign list items to their appropriate positions using sliced list assignment. l goes into even indexes, and l' (l modified) goes into odd indexes.

out_l = [None] * (len(l) * 2)
out_l[::2] = l
out_l[1::2] = [f'{x}_{i}' for i, x in enumerate(l, 1)]  # [{}_{}'.format(x, i) ...]

print(out_l)
['a', 'a_1', 'b', 'b_2', 'c', 'c_3']

This is consistently the fastest from my timings (below).

Generalising
To handle N lists, iteratively assign to slices.

list_of_lists = [l1, l2, ...]

out_l = [None] * len(list_of_lists[0]) * len(list_of_lists)
for i, l in enumerate(list_of_lists):
    out_l[i::2] = l

zip + chain.from_iterable

A functional approach, similar to @chrisz' solution. Construct pairs using zip and then flatten it using itertools.chain.

from itertools import chain
# [{}_{}'.format(x, i) ...]
out_l = list(chain.from_iterable(zip(l, [f'{x}_{i}' for i, x in enumerate(l, 1)]))) 

print(out_l)
['a', 'a_1', 'b', 'b_2', 'c', 'c_3']

iterools.chain is widely regarded as the pythonic list flattening approach.

Generalising
This is the simplest solution to generalise, and I suspect the most efficient for multiple lists when N is large.

list_of_lists = [l1, l2, ...]
out_l = list(chain.from_iterable(zip(*list_of_lists)))

Performance

Let's take a look at some perf-tests for the simple case of two lists (one list with its suffix). General cases will not be tested since the results widely vary with by data.

enter image description here

Benchmarking code, for reference.

Functions

def cs1(l):
    def _cs1(l):
        for i, x in enumerate(l, 1):
            yield x
            yield f'{x}_{i}'

    return list(_cs1(l))

def cs2(l):
    out_l = [None] * (len(l) * 2)
    out_l[::2] = l
    out_l[1::2] = [f'{x}_{i}' for i, x in enumerate(l, 1)]

    return out_l

def cs3(l):
    return list(chain.from_iterable(
        zip(l, [f'{x}_{i}' for i, x in enumerate(l, 1)])))

def ajax(l):
    return [
        i for b in [[a, '{}_{}'.format(a, i)] 
        for i, a in enumerate(l, start=1)] 
        for i in b
    ]

def ajax_cs0(l):
    # suggested improvement to ajax solution
    return [j for i, a in enumerate(l, 1) for j in [a, '{}_{}'.format(a, i)]]

def chrisz(l):
    return [
        val 
        for pair in zip(l, [f'{k}_{j+1}' for j, k in enumerate(l)]) 
        for val in pair
    ]
Thursday, July 15, 2021
 
Kenny
answered 5 Months ago
12

You can use java 8 Arrays stream.distinct() method to get distinct values from array and it will remain the input order only

public static void main(String[] args) {
    int[] input = {7,8,7,1,9,0,9,1,2,8};
    int[] output = Arrays.stream(input).distinct().toArray();
    System.out.println(Arrays.toString(output)); //[7, 8, 1, 9, 0, 2]
}
Tuesday, August 24, 2021
 
Steve
answered 3 Months ago
33

I'm assuming that by string, you mean std::string.

If it's only the last character of the string that needs removing you can do:

mystring.pop_back();

mystring.erase(mystring.size() - 1);

Edit: pop_back() is the next version of C++, sorry.

With some checking:

if (!mystring.empty() && mystring[mystring.size() - 1] == 'r')
    mystring.erase(mystring.size() - 1);

If you want to remove all r, you can use:

mystring.erase( std::remove(mystring.begin(), mystring.end(), 'r'), mystring.end() );
Thursday, October 28, 2021
 
RealHowTo
answered 1 Month ago
27

you can, but do not forget to append .ToList(); in the end. also you can call newset.AddRange(ListX); i think it is better in terms of performance

Sunday, November 7, 2021
 
Razvan N
answered 3 Weeks ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :
 
Share