Asked 6 months ago | Answers: 5 | Viewed 28 times

I've got a dict that has a whole bunch of entries. I'm only interested in a select few of them. Is there an easy way to prune all the other ones out?

 Answers

30

Constructing a new dict:

dict_you_want = { your_key: old_dict[your_key] for your_key in your_keys }

This uses a dictionary comprehension.

If you use a version that lacks them (i.e. Python 2.6 and earlier), write it as dict((your_key, old_dict[your_key]) for ...) instead. It's the same, though uglier.

Note that this, unlike jnnnnn's version, has stable performance for old_dicts of any size, in terms of both speed and memory: the cost depends only on the number of your_keys. It processes your_keys one at a time and never looks through all the items of old_dict.
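A minimal sketch of both forms, using made-up old_dict and your_keys values. Both build the same dict, both do work proportional to len(your_keys), and both raise KeyError if a wanted key is missing from old_dict.

```python
# Hypothetical sample data for illustration
old_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
your_keys = ['a', 'c']

# Dictionary comprehension (Python 2.7+ and 3.x)
dict_you_want = {your_key: old_dict[your_key] for your_key in your_keys}

# Equivalent dict() call over a generator expression (also works on Python 2.6)
dict_you_want_old_style = dict((your_key, old_dict[your_key]) for your_key in your_keys)

print(dict_you_want)            # {'a': 1, 'c': 3}
print(dict_you_want_old_style)  # {'a': 1, 'c': 3}
```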

Removing everything else in-place:

unwanted = set(your_dict) - set(your_keys)
for unwanted_key in unwanted: del your_dict[unwanted_key]
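A runnable sketch of the in-place version with made-up data. The unwanted keys are those present in the dict but not in your_keys; collecting them into a separate set first also means the loop never iterates over the dict while deleting from it, which would raise a RuntimeError.

```python
your_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}  # hypothetical sample data
your_keys = ['a', 'c']

# Keys that are in the dict but were not asked for
unwanted = set(your_dict) - set(your_keys)
for unwanted_key in unwanted:
    del your_dict[unwanted_key]

print(your_dict)  # {'a': 1, 'c': 3}
```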
Tuesday, June 1, 2021
 
MDDY
answered 6 Months ago
82
>>> {key:{k:v for k,v in dic.items() if 'customer' in k} for key,dic in clients.items()}
{'Shop': {'customer': 'cumoster_v'}, 'Gym': {'customer_2': 'customer_v2', 'customer_1': 'customer_v1'}, 'Bank': {'customer_3': 'customer_v3'}}
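A self-contained version of the nested comprehension above. The clients input was not shown, so the data here is hypothetical; note that 'customer' in k is a substring test on each key, which is why customer_1, customer_2 and so on are kept too.

```python
# Hypothetical input: each client maps to a dict with mixed keys
clients = {
    'Shop': {'customer': 'c_v', 'owner': 'o_v'},
    'Gym': {'customer_1': 'c_v1', 'customer_2': 'c_v2', 'manager': 'm_v'},
    'Bank': {'customer_3': 'c_v3', 'vault': 'v_v'},
}

# Outer comprehension walks the clients; inner one keeps keys containing 'customer'
filtered = {key: {k: v for k, v in dic.items() if 'customer' in k}
            for key, dic in clients.items()}
print(filtered)
```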
Thursday, August 26, 2021
 
Magn3s1um
answered 3 Months ago
49

I would use groupby and just pick the first one from each group:

1) First sort your list by id (to create the groups) and by descending count of non-empty values, so that within each id the dict with the fewest nulls (your stated goal) comes first:

>>> l2=sorted(l, key=lambda d: (d['id'], -sum(1 for v in d.values() if v)))

2) Then group the sorted list by id and take the first element, next(d), from each group iterator d that groupby yields:

>>> from itertools import groupby
>>> [next(d) for _,d in groupby(l2, key=lambda _d: _d['id'])]
[{'id': 'a', 'foo': 'bar', 'baz': 'bat'}, {'id': 'b', 'foo': 'bar', 'baz': 'bat'}]
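A self-contained version of the two steps above, with a made-up list l (empty strings stand in for nulls) and a hypothetical helper filled() that counts truthy values. groupby only merges adjacent items with equal keys, which is why the sort has to come first.

```python
from itertools import groupby

# Hypothetical sample records: several dicts per 'id', some with empty values
l = [
    {'id': 'a', 'foo': 'bar', 'baz': ''},
    {'id': 'a', 'foo': 'bar', 'baz': 'bat'},
    {'id': 'b', 'foo': '', 'baz': 'bat'},
    {'id': 'b', 'foo': 'bar', 'baz': 'bat'},
]

def filled(d):
    """Count the truthy (non-empty) values in d."""
    return sum(1 for v in d.values() if v)

# Sort by id, then most-filled first, so each group starts with its best dict
l2 = sorted(l, key=lambda d: (d['id'], -filled(d)))

# Take the first dict from each id group
best = [next(group) for _, group in groupby(l2, key=lambda d: d['id'])]
print(best)
```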

If you want a 'tie breaker' that selects the earlier dict when two dicts otherwise have the same null count, you can add an enumerate decorator:

>>> l2=sorted(enumerate(l), key=lambda t: (t[1]['id'], -sum(1 for v in t[1].values() if v), t[0]))
>>> [next(d)[1] for _,d in groupby(l2, key=lambda t: t[1]['id'])]

I doubt that additional step is actually necessary, though: Python's sort (and sorted) is stable, so dicts with equal sort keys (same id and same null count) keep their original list order. So use the first version unless you are sure you need the second.
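A small check of the stability argument on made-up data, using a hypothetical filled() helper: the two 'a' dicts are equally filled, and both versions keep the one that appears first in the list. In the enumerate version the index goes after the fill count in the key; placed before it, the unique index would decide every comparison on its own and the count would never be consulted.

```python
from itertools import groupby

# Hypothetical records: both 'a' dicts are equally filled, so only list order breaks the tie
l = [
    {'id': 'a', 'foo': 'bar', 'tag': 'first'},
    {'id': 'a', 'foo': 'baz', 'tag': 'second'},
    {'id': 'b', 'foo': '', 'tag': 'only'},
]

def filled(d):
    """Count the truthy (non-empty) values in d."""
    return sum(1 for v in d.values() if v)

# Version 1: no explicit tie-breaker; sorted() is stable, so 'first' stays ahead
l2 = sorted(l, key=lambda d: (d['id'], -filled(d)))
plain = [next(g) for _, g in groupby(l2, key=lambda d: d['id'])]

# Version 2: explicit tie-breaker via the original index, placed after the count
l3 = sorted(enumerate(l), key=lambda t: (t[1]['id'], -filled(t[1]), t[0]))
tied = [next(g)[1] for _, g in groupby(l3, key=lambda t: t[1]['id'])]

print([d['tag'] for d in plain])  # ['first', 'only']
print([d['tag'] for d in tied])   # ['first', 'only']
```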

Thursday, October 21, 2021
 
Andro Selva
answered 1 Month ago
90

You always clear, then append to, and then insert the same sectionList. That's why it always overwrites the entries: you told the program to.

Always remember: in Python, assignment never makes a copy!
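A tiny illustration of that rule, with made-up names: binding a second name to a list does not copy it, while list.copy() (Python 3.3+) creates a new, shallow copy.

```python
section_list = ['x', 'y']
alias = section_list           # assignment: a second name for the SAME list
alias.clear()
print(section_list)            # []  (the original was emptied too)
print(alias is section_list)   # True

snapshot = ['x', 'y']
copy_of = snapshot.copy()      # a new list object holding the same elements
copy_of.clear()
print(snapshot)                # ['x', 'y']  (unaffected)
```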

Simple fix

Just insert a copy:

formatDict[section] = sectionList.copy()    # changed here

Instead of inserting a reference:

formatDict[section] = sectionList  
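A minimal reproduction of the overwrite, assuming a loop shaped like the one described (clear, append, insert). The rows data and the build() function are hypothetical stand-ins, not the asker's code.

```python
# Hypothetical input: (section, items) pairs standing in for parsed lines
rows = [('AAA', ['1', '2']), ('BBB', ['3'])]

def build(make_copy):
    formatDict = {}
    sectionList = []
    for section, items in rows:
        sectionList.clear()                 # the same list object is reused
        for item in items:
            sectionList.append(item)
        if make_copy:
            formatDict[section] = sectionList.copy()   # store a snapshot
        else:
            formatDict[section] = sectionList          # store a reference
    return formatDict

print(build(make_copy=False))  # {'AAA': ['3'], 'BBB': ['3']}
print(build(make_copy=True))   # {'AAA': ['1', '2'], 'BBB': ['3']}
```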

Complicated fix

There are lots of things going on, and you could make the code "better" in several ways: use functions for subtasks like the grouping, open files with a with statement so that the file is closed automatically even if an exception occurs, and avoid while loops where the end is known.

Personally I would use code like this:

def groups(seq, width):
    """Group a sequence (seq) into width-sized blocks. The last block may be shorter."""
    length = len(seq)
    for i in range(0, length, width):   # range supports a step argument!
        yield seq[i:i+width]

# Printing the dictionary could be useful in other places as well -> so
# I also created a function for this.
def print_dict_line_by_line(dct):  
    """Print dictionary where each key-value pair is on one line."""
    for key, value in dct.items():
        print("for key =", key, "value =", value)

def mytask(filename, width):
    # width: the block size from your original code (not shown in the question);
    # the value passed in the call below is only a placeholder.
    formatDict = {}
    with open(filename) as formatFileHandle:
        # I don't "strip" each line (remove leading and trailing whitespaces/newlines)
        # but if you need that you could also use:
        # for usableLine in (line.strip() for line in formatFileHandle):
        # instead.
        for usableLine in formatFileHandle:
            section = usableLine[:3]
            sectionList = list(groups(usableLine[3:], width))
            formatDict[section] = sectionList
    # upon exiting the "with" scope the file is closed automatically!
    print_dict_line_by_line(formatDict)

if __name__ == '__main__':
    mytask('insert your filename here', width=3)  # hypothetical width; use your own
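A quick check of the groups helper on its own, with made-up inputs. Because it slices, it returns strings for a string and lists for a list; a line read from a file still ends with its newline unless you strip it, so that character can end up in the last block.

```python
def groups(seq, width):
    """Group a sequence (seq) into width-sized blocks. The last block may be shorter."""
    for i in range(0, len(seq), width):  # step through seq in increments of width
        yield seq[i:i + width]

print(list(groups('ABCDEFG', 3)))        # ['ABC', 'DEF', 'G']
print(list(groups([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```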
Sunday, October 24, 2021
 
Nasenbaer
answered 1 Month ago
73

You're right: your test passes as soon as any one of the conditions is true. You need all the conditions to be true.

You could use all() to get the proper behaviour:

{k: v for k, v in all_dict.items() if all(v[feature] == match_dict[feature] for feature in feature_list)}
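A runnable sketch of the all() filter; all_dict, match_dict and feature_list were not shown in the question, so the values here are hypothetical. Only item1 matches on both listed features, even though it differs from match_dict on 'shape', which is not in feature_list.

```python
# Hypothetical sample data
all_dict = {
    'item1': {'color': 'red', 'size': 'M', 'shape': 'round'},
    'item2': {'color': 'red', 'size': 'L', 'shape': 'round'},
    'item3': {'color': 'blue', 'size': 'M', 'shape': 'square'},
}
match_dict = {'color': 'red', 'size': 'M', 'shape': 'square'}
feature_list = ['color', 'size']

# all() is True only when EVERY listed feature matches
r = {k: v for k, v in all_dict.items()
     if all(v[feature] == match_dict[feature] for feature in feature_list)}
print(list(r))  # ['item1']
```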

Note that if match_dict's keys are the same as feature_list, it's even simpler; just compare the dictionaries:

r = {k: v for k, v in all_dict.items() if v == match_dict}

(Or first compute a filtered match_dict containing only the features you require; performance will be better.)
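One possible reading of these two suggestions, sketched with made-up all_dict, match_dict and feature_list values: build the filtered match dict once, then either test each required feature with all() or, when the inner dicts carry exactly those keys, compare whole dictionaries with ==. The names wanted and narrow_dict are hypothetical.

```python
# Hypothetical sample data
all_dict = {
    'item1': {'color': 'red', 'size': 'M', 'shape': 'round'},
    'item2': {'color': 'red', 'size': 'L', 'shape': 'round'},
    'item3': {'color': 'blue', 'size': 'M', 'shape': 'square'},
}
match_dict = {'color': 'red', 'size': 'M', 'shape': 'square'}
feature_list = ['color', 'size']

# Filter match_dict once, up front, down to the required features
wanted = {feature: match_dict[feature] for feature in feature_list}

r = {k: v for k, v in all_dict.items()
     if all(v[f] == val for f, val in wanted.items())}
print(list(r))  # ['item1']

# When every inner dict has exactly the keys of wanted, plain == is enough
narrow_dict = {
    'x': {'color': 'red', 'size': 'M'},
    'y': {'color': 'red', 'size': 'L'},
}
exact = {k: v for k, v in narrow_dict.items() if v == wanted}
print(list(exact))  # ['x']
```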

Tuesday, November 23, 2021
 
blacksite
answered 7 Days ago