# How does tuple comparison work in Python?

I have been reading the Core Python programming book, and the author shows an example like:

``````
(4, 5) < (3, 5)  # Equals false
``````

So, I'm wondering, how/why does it equal false? How does Python compare these two tuples?

Btw, it's not explained in the book.


Tuples are compared position by position: the first item of the first tuple is compared to the first item of the second tuple; if they are not equal (i.e. the first is greater or smaller than the second) then that's the result of the comparison, else the second item is considered, then the third and so on.
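The position-by-position rule can be sketched in plain Python. `tuple_less_than` is a hypothetical helper written only for illustration, not how CPython implements the comparison. It assumes the shorter tuple orders first when one tuple is a prefix of the other.

```python
def tuple_less_than(a, b):
    """Illustrative re-implementation of `a < b` for tuples (hypothetical helper)."""
    for x, y in zip(a, b):
        if x != y:
            # The first unequal pair decides the result; later items are ignored.
            return x < y
    # All shared positions are equal: the shorter tuple orders first.
    return len(a) < len(b)


print(tuple_less_than((4, 5), (3, 5)))  # False, because 4 < 3 is False
print((4, 5) < (3, 5))                  # False, the built-in operator agrees
```

Here the first items already differ (4 versus 3), so the second items are never examined.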

See Common Sequence Operations:

Sequences of the same type also support comparisons. In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length.

Also Value Comparisons for further details:

Lexicographical comparison between built-in collections works as follows:

• For two collections to compare equal, they must be of the same type, have the same length, and each pair of corresponding elements must compare equal (for example, `[1,2] == (1,2)` is false because the type is not the same).
• Collections that support order comparison are ordered the same as their first unequal elements (for example, `[1,2,x] <= [1,2,y]` has the same value as `x <= y`). If a corresponding element does not exist, the shorter collection is ordered first (for example, `[1,2] < [1,2,3]` is true).

If not equal, the sequences are ordered the same as their first differing elements. For example, `cmp([1,2,x], [1,2,y])` returns the same as `cmp(x,y)` (note that `cmp` exists only in Python 2; it was removed in Python 3). If the corresponding element does not exist, the shorter sequence is considered smaller (for example, `[1,2] < [1,2,3]` returns `True`).

Note 1: `<` and `>` do not mean "smaller than" and "greater than" but "is before" and "is after": so (0, 1) "is before" (1, 0).

Note 2: tuples must not be considered as vectors in an n-dimensional space, compared according to their length (magnitude).

Note 3: referring to question https://stackoverflow.com/questions/36911617/python-2-tuple-comparison: do not think that a tuple is "greater" than another only if any element of the first is greater than the corresponding one in the second.
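A few quick checks make the three notes concrete. The tuples are arbitrary values chosen for illustration, and the variable names are made up for this sketch.

```python
import math

# Note 1: ordering means "comes before", decided by the first differing position.
first_decides = (0, 1) < (1, 0)  # 0 < 1 settles it

# Note 2: magnitude plays no role. (1, 0) is the shorter vector,
# yet it comes after (0, 100) because 1 > 0 in the first position.
shorter_vector_is_after = (
    (1, 0) > (0, 100) and math.hypot(1, 0) < math.hypot(0, 100)
)

# Note 3: a larger element later in the tuple does not make the tuple "greater".
later_element_ignored = (1, 99) < (2, 0)

print(first_decides, shorter_vector_is_after, later_element_ignored)
```

All three expressions evaluate to `True`.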

Tuesday, June 1, 2021


Your guess was close. The LINQ to Objects `Except` extension method uses a `HashSet<T>` internally for the second sequence passed in. That lets it look up elements in O(1) while iterating over the first sequence to filter out elements that are contained in the second sequence. The overall effort is therefore O(n+m), where n and m are the lengths of the input sequences. This is the best you can hope for, since you have to look at each element at least once.

For a review of how this might be implemented I recommend Jon Skeet's EduLinq series. Here is part of its implementation of `Except`, along with the link to the full chapter:

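``````
private static IEnumerable<TSource> ExceptImpl<TSource>(
    IEnumerable<TSource> first,
    IEnumerable<TSource> second,
    IEqualityComparer<TSource> comparer)
{
    // Everything in the second sequence is banned from the output.
    HashSet<TSource> bannedElements = new HashSet<TSource>(second, comparer);
    foreach (TSource item in first)
    {
        // Add returns false if the item is already in the set, so banned
        // items and repeated items are both skipped.
        if (bannedElements.Add(item))
        {
            yield return item;
        }
    }
}
``````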

Your first implementation, on the other hand, compares each element in the first list to each element in the second list; it effectively computes a cross product. This requires n·m operations, so it runs in O(nm), which becomes prohibitively slow as n and m grow. (This solution is also wrong as written, since it will produce duplicate elements.)
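The difference between the two strategies can be sketched in Python. This is only an analogue of the idea, not the LINQ source, and `except_with_set` and `except_nested` are hypothetical names. The sketch assumes the items are hashable. Like LINQ's `Except`, both versions yield each distinct element once.

```python
def except_with_set(first, second):
    """Yield distinct items of `first` that are not in `second`: O(n + m) on average."""
    banned = set(second)          # one pass over `second`, O(m)
    for item in first:            # one pass over `first`, O(n)
        if item not in banned:    # average O(1) hash lookup
            banned.add(item)      # remember it so repeats are skipped
            yield item


def except_nested(first, second):
    """Same result, but scans `second` for every item of `first`: O(n * m)."""
    result = []
    for item in first:
        found = False
        for other in second:      # linear scan instead of a hash lookup
            if item == other:
                found = True
                break
        if not found and item not in result:
            result.append(item)
    return result


print(list(except_with_set([1, 2, 2, 3, 4], [2, 4])))  # [1, 3]
print(except_nested([1, 2, 2, 3, 4], [2, 4]))          # [1, 3]
```

Both functions return the same elements; only the amount of work per element of the first sequence differs.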

Tuesday, July 20, 2021


According to the C11 standard, the relational operators `<`, `<=`, `>`, and `>=` may only be used on pointers to elements of the same array or struct object. This is spelled out in section 6.5.8p5:

When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.

Note that any comparisons that do not satisfy this requirement invoke undefined behavior, meaning (among other things) that you can't depend on the results to be repeatable.

In your particular case, for both the comparison between the addresses of two local variables and between the address of a local and a dynamic address, the operation appeared to "work", however the result could change by making a seemingly unrelated change to your code or even compiling the same code with different optimization settings. With undefined behavior, just because the code could crash or generate an error doesn't mean it will.

As an example, an x86 processor running in 8086 real mode has a segmented memory model using a 16-bit segment and a 16-bit offset to build a 20-bit address. So in this case an address doesn't convert exactly to an integer.
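The real-mode arithmetic shows why: the linear address is the segment shifted left by 4 bits plus the offset, so several different segment:offset pairs name the same byte. Below is a small Python sketch of that calculation. The helper name `linear_address` and the sample values are illustrative, and the 20-bit wraparound models the original 8086.

```python
def linear_address(segment, offset):
    """8086 real mode: shift the 16-bit segment left by 4 bits, add the 16-bit offset."""
    return ((segment << 4) + offset) & 0xFFFFF  # keep 20 bits, as on the original 8086


a = linear_address(0x1234, 0x0010)
b = linear_address(0x1235, 0x0000)
print(hex(a), hex(b))  # 0x12350 0x12350
```

Treating the raw segment:offset pairs as plain integers would report these two pointers as different even though they refer to the same location.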

The equality operators `==` and `!=` however do not have this restriction. They can be used between any two pointers to compatible types or NULL pointers. So using `==` or `!=` in both of your examples would produce valid C code.

However, even with `==` and `!=` you could get some unexpected yet still well-defined results. See Can an equality comparison of unrelated pointers evaluate to true? for more details on this.

Regarding the exam question given by your professor, it makes a number of flawed assumptions:

• A flat memory model exists where there is a 1-to-1 correspondence between an address and an integer value.
• That the converted pointer values fit inside an integer type.
• That the implementation simply treats pointers as integers when performing comparisons without exploiting the freedom given by undefined behavior.
• That a stack is used and that local variables are stored there.
• That a heap is used to pull allocated memory from.
• That the stack (and therefore local variables) appears at a higher address than the heap (and therefore allocated objects).
• That string constants appear at a lower address than the heap.

If you were to run this code on an architecture and/or with a compiler that does not satisfy these assumptions then you could get very different results.

Both examples also exhibit undefined behavior when they call `strcpy`, since the right operand (in some cases) points to a single character and not a null-terminated string, resulting in the function reading past the bounds of the given variable.

Tuesday, August 3, 2021


`bisect` supports arbitrary sequences. If you need to use `bisect` with a key, instead of passing the key to `bisect`, you can build it into the sequence:

``````
class KeyList(object):
    # bisect doesn't accept a key function, so we build the key into our sequence.
    def __init__(self, l, key):
        self.l = l
        self.key = key
    def __len__(self):
        return len(self.l)
    def __getitem__(self, index):
        return self.key(self.l[index])
``````

Then you can use `bisect` with a `KeyList`, with O(log n) performance and no need to copy the `bisect` source or write your own binary search:

``````
bisect.bisect_right(KeyList(test_array, key=lambda x: x[0]), 5)
``````
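
A self-contained run of this approach is sketched below with made-up sample data. `records` is hypothetical and, as with any use of `bisect`, must already be sorted by the key.

```python
import bisect


class KeyList:
    """Read-only view that exposes key(item) for each item of a sorted list."""

    def __init__(self, items, key):
        self.items = items
        self.key = key

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.key(self.items[index])


# Hypothetical data, sorted by the first element of each pair.
records = [(1, "a"), (3, "b"), (5, "c"), (5, "d"), (8, "e")]

# Index just past the last record whose key is <= 5.
position = bisect.bisect_right(KeyList(records, key=lambda r: r[0]), 5)
print(position)  # 4
```

On Python 3.10 and later, the `bisect` functions accept a `key` argument directly, so the wrapper is not needed there.
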
Tuesday, August 24, 2021


Using a set lets you avoid creating a double loop; add items you haven't seen yet to a new list to avoid altering the list you are looping over (which will lead to skipped items):

``````
seen = set()
keep = []
for filename, filepath in file_info:
    if filename in seen:
        # Duplicate filename: report it and leave it out of the new list.
        print(filename, filepath)
    else:
        seen.add(filename)
        keep.append((filename, filepath))
file_info = keep
``````

If order doesn't matter and you don't have to print the items you removed, then another approach is to use a dictionary:

``````
file_info = list(dict(reversed(file_info)).items())
``````

Reversing the input list ensures that the first entry is kept rather than the last.
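A small check of that behavior with invented sample paths: when a key appears more than once, `dict()` keeps the last value it sees, so feeding it the reversed list leaves the first occurrence from the original order.

```python
# Hypothetical (filename, filepath) pairs; "a.txt" appears twice.
file_info = [
    ("a.txt", "/first/a.txt"),
    ("b.txt", "/only/b.txt"),
    ("a.txt", "/second/a.txt"),
]

# Later duplicates overwrite earlier ones, so reverse first to keep "/first/a.txt".
deduplicated = dict(reversed(file_info))
print(deduplicated["a.txt"])  # /first/a.txt
```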

If you needed all the full paths for files with duplicates, I'd build a dictionary with lists as values, then remove anything that has only one element:

``````
filename_to_paths = {}
for filename, filepath in file_info:
    filename_to_paths.setdefault(filename, []).append(filepath)
# Use .iteritems() instead of .items() on Python 2.
duplicates = {filename: paths for filename, paths in filename_to_paths.items() if len(paths) > 1}
``````

The `duplicates` dictionary now only contains filenames where you have more than 1 path in the `file_info` list.
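With a few invented entries, the grouping step looks like this. The sketch uses `collections.defaultdict` in place of `setdefault`, which is a stylistic substitution rather than part of the answer above, and it assumes Python 3.

```python
from collections import defaultdict

# Hypothetical (filename, filepath) pairs.
file_info = [
    ("a.txt", "/x/a.txt"),
    ("b.txt", "/x/b.txt"),
    ("a.txt", "/y/a.txt"),
]

filename_to_paths = defaultdict(list)
for filename, filepath in file_info:
    filename_to_paths[filename].append(filepath)

# Keep only the filenames that were seen with more than one path.
duplicates = {name: paths for name, paths in filename_to_paths.items() if len(paths) > 1}
print(duplicates)  # {'a.txt': ['/x/a.txt', '/y/a.txt']}
```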

Thursday, November 11, 2021