I'm very curious: why is stability important (or not) in sorting algorithms?
A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in the sorted output as they appear in the input array. Some sorting algorithms are stable by nature, like Insertion Sort, Merge Sort, and Bubble Sort; others are not, like Heap Sort and Quick Sort.
Background: a "stable" sorting algorithm keeps items with the same sorting key in their original relative order. Suppose we have a list of 5-letter words:
peach straw apple spork
If we sort the list by just the first letter of each word, a stable sort would produce:
apple peach straw spork
In an unstable sort algorithm, straw and spork may be interchanged, but in a stable one, they stay in the same relative positions (that is, since straw appears before spork in the input, it also appears before spork in the output).
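Python's built-in `sorted` is documented as stable, so it can reproduce the example above directly. The only assumption in this sketch is the key function, which looks at nothing but the first letter.

```python
words = ["peach", "straw", "apple", "spork"]

# Sort by the first letter only. sorted() is stable, so "straw" and "spork"
# (both keyed by "s") keep the order they had in the input.
by_first_letter = sorted(words, key=lambda w: w[0])
print(by_first_letter)  # ['apple', 'peach', 'straw', 'spork']
```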
We could sort the list of words using this algorithm: stable sort by column 5, then 4, then 3, then 2, then 1. In the end, it will be correctly sorted. Convince yourself of that. (By the way, that algorithm is called radix sort.)
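A minimal LSD radix sort sketch, assuming every word has the same length (5 in the example) and that characters compare by code point. Each pass distributes the words into per-character buckets; appending to a bucket preserves input order, which is exactly the stability the column-by-column approach depends on.

```python
def radix_sort_words(words):
    """LSD radix sort for strings that all have the same length (assumption)."""
    if not words:
        return []
    width = len(words[0])
    for col in range(width - 1, -1, -1):  # last column first
        buckets = {}
        for w in words:
            # Appending keeps words that share this character in input order: a stable pass.
            buckets.setdefault(w[col], []).append(w)
        # Concatenate the buckets in character order to form the next pass's input.
        words = [w for ch in sorted(buckets) for w in buckets[ch]]
    return words


print(radix_sort_words(["peach", "straw", "apple", "spork"]))
# ['apple', 'peach', 'spork', 'straw']
```

If any pass were unstable, the order established by the later columns could be lost, and the final list would not be fully sorted.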
Now, to answer your question, suppose we have a list of first and last names, and we are asked to sort "by last name, then by first". We could first sort (stable or unstable) by the first name, then stable sort by the last name. After these sorts, the list is primarily sorted by last name, and where last names are the same, the entries are ordered by first name.
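You can't stack unstable sorts in the same fashion, because the second sort is free to scramble the first-name order within each group of equal last names.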
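A small sketch of the stacked sorts, using Python's stable `sorted`; the names are made-up sample data. The two-pass result matches a single sort on the compound key (last name, first name).

```python
# Hypothetical (first, last) name pairs with repeated last names.
people = [("John", "Smith"), ("Alice", "Jones"), ("Bob", "Smith"),
          ("Carol", "Jones"), ("Adam", "Smith")]

by_first = sorted(people, key=lambda p: p[0])   # pass 1: any sort would do here
by_last = sorted(by_first, key=lambda p: p[1])  # pass 2: must be stable

# A single sort on the compound key gives the same order.
compound = sorted(people, key=lambda p: (p[1], p[0]))
print(by_last)
print(by_last == compound)  # True
```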
I came up with a solution which probably isn't the most efficient, but it works well enough. Basically:
1. Sort all the words by length, descending.
2. Take the first word and place it on the board.
3. Take the next word.
4. Search through all the words that are already on the board and see if there are any possible intersections (any common letters) with this word.
5. If there is a possible location for this word, loop through all the words that are on the board and check to see if the new word interferes.
6. If this word doesn't break the board, place it there and go to step 3; otherwise, continue searching for a place (step 4).
7. Continue this loop until all the words are either placed or unable to be placed.
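The greedy steps above can be sketched roughly as follows. This is an illustration under assumptions, not the original author's code: the board is a dict from (row, column) to letter, words only run across or down, "breaking the board" is taken to mean a letter clash, two words overlapping in the same direction, or a new letter touching a neighbour at its side, and words are placed in a single pass at the first valid spot. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]
Direction = Tuple[int, int]
ACROSS: Direction = (0, 1)
DOWN: Direction = (1, 0)
Placement = Tuple[str, Cell, Direction]


def fits(word: str, start: Cell, d: Direction,
         letters: Dict[Cell, str], dirs: Dict[Cell, Set[Direction]]) -> bool:
    """Step 5: True if `word` can go at `start` in direction `d` without breaking the board."""
    dr, dc = d
    r0, c0 = start
    # The cells directly before and after the word must be empty.
    if (r0 - dr, c0 - dc) in letters or (r0 + dr * len(word), c0 + dc * len(word)) in letters:
        return False
    crossings = 0
    for i, ch in enumerate(word):
        r, c = r0 + dr * i, c0 + dc * i
        if (r, c) in letters:
            # A shared cell must hold the same letter and must not already run in this direction.
            if letters[(r, c)] != ch or d in dirs[(r, c)]:
                return False
            crossings += 1
        elif (r + dc, c + dr) in letters or (r - dc, c - dr) in letters:
            # A new letter touching a neighbour at its side would create a stray word.
            return False
    return crossings > 0


def place(word: str, start: Cell, d: Direction,
          letters: Dict[Cell, str], dirs: Dict[Cell, Set[Direction]]) -> None:
    for i, ch in enumerate(word):
        cell = (start[0] + d[0] * i, start[1] + d[1] * i)
        letters[cell] = ch
        dirs.setdefault(cell, set()).add(d)


def find_spot(word: str, placements: List[Placement], letters: Dict[Cell, str],
              dirs: Dict[Cell, Set[Direction]]) -> Optional[Tuple[Cell, Direction]]:
    """Step 4: try every common letter with every word already on the board."""
    for placed, (pr, pc), pd in placements:
        nd = DOWN if pd == ACROSS else ACROSS  # cross at a right angle
        for j, pch in enumerate(placed):
            for i, ch in enumerate(word):
                if ch != pch:
                    continue
                # Shift the start so letter i of `word` lands on letter j of `placed`.
                start = (pr + pd[0] * j - nd[0] * i, pc + pd[1] * j - nd[1] * i)
                if fits(word, start, nd, letters, dirs):
                    return start, nd
    return None


def build_crossword(words: List[str]) -> Tuple[List[Placement], List[str]]:
    """Greedy generator; returns (placements, words that could not be placed)."""
    order = sorted(words, key=len, reverse=True)  # step 1: longest first
    letters: Dict[Cell, str] = {}
    dirs: Dict[Cell, Set[Direction]] = {}
    placements: List[Placement] = []
    unplaced: List[str] = []
    for k, word in enumerate(order):  # steps 2-3: take each word in turn
        spot = ((0, 0), ACROSS) if k == 0 else find_spot(word, placements, letters, dirs)
        if spot is None:
            unplaced.append(word)  # step 7: no valid place was found
        else:
            place(word, spot[0], spot[1], letters, dirs)  # step 6
            placements.append((word, spot[0], spot[1]))
    return placements, unplaced


print(build_crossword(["peach", "apple", "straw"]))
```

Because each word goes into the first acceptable spot, the result depends heavily on word order, which is one reason the output is often a poor crossword.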
This makes a working, yet often quite poor, crossword. I made a number of alterations to the basic recipe above to get a better result:
- At the end of generating a crossword, give it a score based on how many of the words were placed (the more the better), how large the board is (the smaller the better), and the ratio between height and width (the closer to 1 the better). Generate a number of crosswords and then compare their scores and choose the best one.
- Instead of running an arbitrary number of iterations, I decided to create as many crosswords as possible within a fixed amount of time. With a small word list, you'll get dozens of possible crosswords in 5 seconds; a larger crossword might only be chosen from 5-6 possibilities.
- When placing a new word, instead of placing it immediately upon finding an acceptable location, give that word location a score based on how much it increases the size of the grid and how many intersections there are (ideally you'd want each word to be crossed by 2-3 other words). Keep track of all the positions and their scores and then choose the best one.
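The whole-board scoring and the time-budget loop from the first two points can be sketched like this. The weights in `score_crossword` are arbitrary assumptions (the description only gives the direction of each preference), and `generate` stands for any hypothetical function that builds one crossword and returns its occupied cells and the number of words placed.

```python
import itertools
import time
from typing import Callable, Collection, Optional, Tuple

Cell = Tuple[int, int]
Candidate = Tuple[Collection[Cell], int]  # (occupied cells, number of words placed)


def score_crossword(cells: Collection[Cell], placed: int) -> float:
    """Higher is better: more words, smaller board, height/width ratio closer to 1."""
    if not cells:
        return float("-inf")
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    squareness = min(height, width) / max(height, width)  # 1.0 for a square board
    # Hypothetical weights: placed words dominate, then board area, then shape.
    return 100 * placed - height * width + 10 * squareness


def best_within(budget_s: float, generate: Callable[[], Candidate]) -> Tuple[Candidate, float]:
    """Generate candidates until the time budget runs out (at least one) and keep the best."""
    deadline = time.monotonic() + budget_s
    best: Optional[Candidate] = None
    best_score = float("-inf")
    while True:
        candidate = generate()
        s = score_crossword(*candidate)
        if best is None or s > best_score:
            best, best_score = candidate, s
        if time.monotonic() >= deadline:
            return best, best_score


# Stand-in generator that alternates between two fixed boards.
line = ({(0, c) for c in range(5)}, 1)                   # 1 x 5 board, 1 word
plus = ({(1, 0), (1, 1), (1, 2), (0, 1), (2, 1)}, 2)      # 3 x 3 board, 2 words
fake = itertools.cycle([line, plus])
best, best_score = best_within(0.05, lambda: next(fake))
print(best_score)  # 201.0
```

Using `time.monotonic()` rather than wall-clock time keeps the budget unaffected by system clock changes.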
Your assumptions are almost correct. Let's review those first.
- It assigns the return value of a self-executing function
This is called an Immediately Invoked Function Expression, or IIFE.
- It defines a local variable within this function
This is how you get a private variable here, since JavaScript has no private keyword or similar functionality for plain variables otherwise.
- It returns the actual function containing logic that makes use of the local variable.
Again, the main point is that this local variable is private.
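The examples here use Python, so this is only a rough Python analogue of the three points above, not the JavaScript from the question: a factory function is called once, immediately, and only the function it returns is kept. The local variable can't be reached by name from outside. The names are hypothetical.

```python
def _make_next_id():
    count = 0  # local variable: not reachable by name from outside

    def next_id() -> int:
        nonlocal count  # rebind the enclosing variable instead of creating a new local
        count += 1
        return count

    return next_id


# Call the factory immediately and keep only the returned function, as an IIFE does.
next_id = _make_next_id()
print(next_id(), next_id())  # 1 2
```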
Is there a name for this pattern?
AFAIK, this pattern is called the Module Pattern. Quoting:
The Module pattern encapsulates "privacy", state and organization using closures. It provides a way of wrapping a mix of public and private methods and variables, protecting pieces from leaking into the global scope and accidentally colliding with another developer's interface. With this pattern, only a public API is returned, keeping everything else within the closure private.
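As a Python approximation of the Module pattern (an analogy, not JavaScript's own mechanism), a factory can keep state and helper functions inside its closure and return only a public API. `SimpleNamespace` is used here simply as a convenient container for the exported functions; the account example is hypothetical.

```python
from types import SimpleNamespace


def _make_account() -> SimpleNamespace:
    balance = 0  # private state shared by the public functions

    def _require_positive(amount: int) -> None:  # private helper, never exported
        if amount <= 0:
            raise ValueError("amount must be positive")

    def deposit(amount: int) -> int:
        nonlocal balance
        _require_positive(amount)
        balance += amount
        return balance

    def withdraw(amount: int) -> int:
        nonlocal balance
        _require_positive(amount)
        if amount > balance:
            raise ValueError("insufficient funds")
        balance -= amount
        return balance

    def get_balance() -> int:
        return balance

    # Only this public API leaves the closure.
    return SimpleNamespace(deposit=deposit, withdraw=withdraw, balance=get_balance)


account = _make_account()
account.deposit(100)
account.withdraw(30)
print(account.balance())  # 70
```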
Comparing those two examples, my best guesses about why the first one is used are:
- It is implementing the Singleton design pattern.
- Using the first example, one can control how objects of a specific type are created. A close match to this point is static factory methods, as described in Effective Java.
- It's efficient if you need the same object state every time.
But if you just need the vanilla object every time, then this pattern will probably not add any value.
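Effective Java discusses this in Java terms; the sketch below shows the same idea in Python under stated assumptions. `Settings` and its default values are hypothetical, the factory is a classmethod that creates the single instance lazily, and nothing stops a caller from invoking `Settings(...)` directly, so the control here is by convention only. It is also not thread-safe.

```python
class Settings:
    """Hypothetical class whose creation goes through a static factory method."""

    _instance = None  # the single shared instance, created lazily

    def __init__(self, values: dict) -> None:
        self.values = values

    @classmethod
    def get_instance(cls) -> "Settings":
        # The factory decides how (and whether) a new object is created.
        if cls._instance is None:
            cls._instance = cls({"theme": "dark"})
        return cls._instance


a = Settings.get_instance()
b = Settings.get_instance()
print(a is b)  # True: every caller sees the same object and state
```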
Check this document out: The Dependency Inversion Principle.
It basically says:
- High-level modules should not depend upon low-level modules. Both should depend upon abstractions.
- Abstractions should never depend upon details. Details should depend upon abstractions.
As to why it is important, in short: changes are risky, and by depending on a concept instead of on an implementation, you reduce the need for change at call sites.
Effectively, the DIP reduces coupling between different pieces of code. The idea is that although there are many ways of implementing, say, a logging facility, the way you would use it should be relatively stable over time. If you can extract an interface that represents the concept of logging, this interface should be much more stable over time than its implementation, and call sites should be much less affected by changes you could make while maintaining or extending that logging mechanism.
By also making the implementation depend on an interface, you get the possibility to choose at run time which implementation is best suited to your particular environment. Depending on the case, this can be useful too.
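A minimal sketch of the logging example, assuming Python's `typing.Protocol` as the abstraction. `Logger`, `ConsoleLogger`, `MemoryLogger`, and `OrderService` are hypothetical names: the high-level `OrderService` knows only the `Logger` abstraction, the concrete loggers implement it, and the implementation is picked at run time without touching the call site.

```python
from typing import List, Protocol


class Logger(Protocol):
    """The abstraction that both high-level and low-level code depend on."""

    def log(self, message: str) -> None: ...


class ConsoleLogger:
    def log(self, message: str) -> None:
        print(f"[console] {message}")


class MemoryLogger:
    def __init__(self) -> None:
        self.messages: List[str] = []

    def log(self, message: str) -> None:
        self.messages.append(message)


class OrderService:
    """High-level module: depends on the Logger abstraction, not on a concrete logger."""

    def __init__(self, logger: Logger) -> None:
        self._logger = logger

    def place_order(self, item: str) -> None:
        self._logger.log(f"order placed: {item}")


# The implementation is chosen at run time; OrderService itself does not change.
memory = MemoryLogger()
OrderService(memory).place_order("book")
OrderService(ConsoleLogger()).place_order("pen")
print(memory.messages)  # ['order placed: book']
```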
In practice, you can first iterate through the array once and use a hash table to count the number of occurrences of the individual elements (this is O(n), where n = size of the list). Then take all the unique elements and sort them (this is O(k log k), where k = number of unique elements), and then expand this back to a list of n elements in O(n) steps, recovering the counts from the hash table. The total is O(n + k log k), so if k << n you save time compared with an O(n log n) comparison sort.
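A short sketch of the count-sort-expand approach, using `collections.Counter` as the hash table; the function name is hypothetical.

```python
from collections import Counter


def sort_few_distinct(items):
    counts = Counter(items)  # O(n): occurrences of each distinct value
    keys = sorted(counts)    # O(k log k): sort only the k distinct values
    # O(n): expand each distinct value back out according to its count
    return [key for key in keys for _ in range(counts[key])]


print(sort_few_distinct([3, 1, 3, 2, 1, 3]))  # [1, 1, 2, 3, 3, 3]
```

Because equal elements are rebuilt from a single key, this only makes sense when equal values are interchangeable. If records carry extra data beyond the key, a stable sort is still needed to keep their order.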