The Quickselect algorithm computes the \(k\)-th smallest element of a list in-place, i.e. it requires no additional space except pointers into the list. The algorithm consists of two parts: partition and selection.
In the partition part a list is reordered around a pivot element. More specifically, after executing this part all elements of the list which are smaller than the pivot element are on its left side and all others on its right side. Example (the pivot element is 9, the last element; the comparison performed at each step is shown in the last column):
3 | 1 | 12 | 7 | 18 | 22 | 14 | 0 | 9 | 3 < 9 → continue |
3 | 1 | 12 | 7 | 18 | 22 | 14 | 0 | 9 | 1 < 9 → continue |
3 | 1 | 12 | 7 | 18 | 22 | 14 | 0 | 9 | 12 ≥ 9 → swap |
3 | 1 | 9 | 7 | 18 | 22 | 14 | 0 | 12 | 7 < 9 → swap |
3 | 1 | 7 | 9 | 18 | 22 | 14 | 0 | 12 | 18 ≥ 9 → continue |
3 | 1 | 7 | 9 | 18 | 22 | 14 | 0 | 12 | 22 ≥ 9 → continue |
3 | 1 | 7 | 9 | 18 | 22 | 14 | 0 | 12 | 14 ≥ 9 → continue |
3 | 1 | 7 | 9 | 18 | 22 | 14 | 0 | 12 | 0 < 9 → swap 2x |
3 | 1 | 7 | 0 | 9 | 22 | 14 | 18 | 12 |
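The effect of the partition part (though not the exact swap sequence of the trace) can be sketched list-based in Haskell; the helper name is my own, and the order of the elements right of the pivot may differ from the trace above:

```haskell
import Data.List (partition)

-- Partition a list around its last element as pivot: all elements
-- smaller than the pivot end up left of it, all others right of it.
partitionAroundLast :: [Int] -> [Int]
partitionAroundLast [] = []
partitionAroundLast xs = smaller ++ [pivot] ++ rest
  where
    pivot           = last xs
    (smaller, rest) = partition (< pivot) (init xs)
```

For the example list this yields partitionAroundLast [3,1,12,7,18,22,14,0,9] = [3,1,7,0,9,12,18,22,14]: the same left part and pivot position as in the trace.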
The selection part works as follows.
First, a pivot element is selected according to a given rule; you may assume that the last element of the remaining list is always selected as pivot element. Then the partition part is executed. Let \(p\) be the index of the pivot element after partitioning, and call the \(k\)-th smallest element the desired element. If the desired element has index \(p\), it is the pivot element and we are done; if its index is smaller than \(p\), the selection continues on the part left of the pivot, otherwise on the part right of it.
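Putting both parts together, the algorithm can be sketched list-based in Haskell (a sketch of the idea, not of the required in-place implementation; names are my own):

```haskell
import Data.List (partition)

-- k-th smallest element (1-indexed) of a list without duplicates.
quickselect :: Int -> [Int] -> Int
quickselect k xs
  | k == p    = pivot                     -- pivot is the desired element
  | k <  p    = quickselect k smaller     -- continue left of the pivot
  | otherwise = quickselect (k - p) rest  -- continue right of the pivot
  where
    pivot           = last xs             -- pivot rule: last element
    (smaller, rest) = partition (< pivot) (init xs)
    p               = length smaller + 1  -- 1-indexed pivot position
```

For the example list, quickselect 4 [3,1,12,7,18,22,14,0,9] evaluates to 7.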
partition_list :: [Int] -> Int -> Int -> Int -> [Int]
partition_pivot :: [Int] -> Int -> Int -> Int -> Int
select :: Int -> [Int] -> Int
partition_list returns the list arg1 after the remaining list between index arg2 and index arg3, with the pivot element at index arg4, has been partitioned. partition_pivot returns the index of the pivot element in the list (partition_list arg1 arg2 arg3 arg4). select returns the arg1-th smallest element in the list arg2.
You may assume that 1 <= arg1 <= (length arg2) and that arg2 is non-empty and does not contain duplicates.

Complete the programs in the files AB5_1/partition.sp and AB5_1/select.sp such that they implement the algorithm described above. Do not add any additional variables to the programs. Also, all variables besides list and res may only assume values between \(0\) and \(n\), where \(n\) is the length of list. The variable res is not allowed to appear in any predicate or in any operation on the right-hand side.

Huffman coding is a method to assign code words of variable length to a finite set of source symbols \(\Sigma\). Code words are words over a given code alphabet \(\Gamma\). Source symbols which occur more frequently should be assigned shorter code words. The relative frequency \( p \colon \Sigma \rightarrow [0,1] \) of every source symbol is assumed to be given; furthermore, it is assumed that every source symbol occurs at least once, i.e. \(p(x) > 0\) holds for all \(x \in \Sigma\).
The Huffman encoding \(h \colon \Sigma \rightarrow \Gamma^* \) is constructed as follows. A set of trees is iteratively merged until only a single tree remains whose leaves correspond to the source symbols; the path to a leaf describes its code word. The nodes are labeled with subsets \(X\) of \(\Sigma\), and the cumulative relative frequency of such a set is \(p(X) = \sum_{x \in X} p(x)\).
In the beginning, a tree is created for each source symbol \(x\) which consists of a single node labeled with \(\{x\}\). Then the following step is repeated until only a single tree remains.
Choose the \(|\Gamma|\) trees which have the lowest relative frequencies. The relative frequency of a tree \(T\) is \(p(X)\) where \(X\) is the label of the root node of \(T\). If this selection criterion is ambiguous, prefer trees with smaller depth; if this is still ambiguous, prefer trees whose root node has a lexicographically smaller label. Merge these trees into a single one.
To ensure the unambiguity of this construction, append the selected trees to the new tree in lexicographic order w.r.t. the labels of their root nodes.
Example: given \(\Sigma = \{a,b,c,d,e\}\), \(\Gamma = \{0,1\}\) and \( p = \left(\begin{smallmatrix}a&b&c&d&e\\\frac{4}{20}&\frac{3}{20}&\frac{8}{20}&\frac{1}{20}&\frac{4}{20}\end{smallmatrix}\right)\), we get the following tree (nodes shown by their labels, children indented):

"abcde"
  "ae"
    "a"
    "e"
  "bcd"
    "bd"
      "b"
      "d"
    "c"

type CodeAlphabet = [Char]
type Probabilities = Map Char (Int,Int)
type HuffmanTree = (Tree [Char])
huffmanTree :: CodeAlphabet -> Probabilities -> HuffmanTree
huffmanEncode :: HuffmanTree -> CodeAlphabet -> String -> String
huffmanDecode :: HuffmanTree -> CodeAlphabet -> String -> String
huffmanTree returns the Huffman tree for a code alphabet arg1 and a probability distribution arg2. The set of source symbols is implicitly given as (map fst (toList arg2)).
ht1 = huffmanTree "01" (fromList [('a',(4,20)),('b',(3,20)),('c',(8,20)),('d',(1,20)),('e',(4,20))])
-- resulting tree (children indented):
--   "abcde"
--     "ae"
--       "a"
--       "e"
--     "bcd"
--       "bd"
--         "b"
--         "d"
--       "c"

ht2 = huffmanTree "01" (fromList [('a',(4,20)),('b',(8,40)),('c',(4,20)),('d',(8,40)),('e',(4,20))])
-- resulting tree:
--   "abcde"
--     "abe"
--       "ab"
--         "a"
--         "b"
--       "e"
--     "cd"
--       "c"
--       "d"

ht3 = huffmanTree "012" (fromList [('a',(6,24)),('b',(2,24)),('c',(6,24)),('d',(2,24)),('e',(5,24)),('f',(2,24)),('g',(1,24))])
-- resulting tree:
--   "abcdefg"
--     "a"
--     "bdefg"
--       "bdg"
--         "b"
--         "d"
--         "g"
--       "e"
--       "f"
--     "c"
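A Haskell sketch of the construction (the function names, the exact-rational weights and the candidate representation are my own; the sketch assumes, as in the examples, that the number of trees always allows a full \(|\Gamma|\)-ary merge without padding):

```haskell
import Data.List (sort, sortOn)
import Data.Ratio (Ratio)
import Data.Tree (Tree (..), rootLabel)

-- A candidate tree together with its weight and depth (for tie-breaking).
type Cand = (Ratio Integer, Int, Tree String)

depthOf :: Tree a -> Int
depthOf (Node _ []) = 1
depthOf (Node _ ts) = 1 + maximum (map depthOf ts)

-- Build a Huffman tree for a code alphabet of size g from
-- (symbol, relative frequency) pairs.
huffman :: Int -> [(Char, Ratio Integer)] -> Tree String
huffman g ps = go [ (w, 1, Node [c] []) | (c, w) <- ps ]
  where
    go [(_, _, t)] = t
    go cands =
      -- order by weight, then depth, then label of the root node
      let sorted         = sortOn (\(w, d, t) -> (w, d, rootLabel t)) cands
          (chosen, rest) = splitAt g sorted
          -- append subtrees in lexicographic order of their root labels
          children       = sortOn rootLabel [ t | (_, _, t) <- chosen ]
          merged         = Node (sort (concatMap rootLabel children)) children
          wSum           = sum [ w | (w, _, _) <- chosen ]
      in go ((wSum, depthOf merged, merged) : rest)
```

Running it on the distributions of ht1 and ht3 reproduces the trees listed above (compare via the pre-order Data.Tree.flatten).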
huffmanEncode converts arg3 into a code word (a string over the code alphabet). The \(i\)-th subtree of a node of arg1 should correspond to the \(i\)-th character in arg2. Examples:
huffmanEncode ht1 "01" "abc" = "0010011"
huffmanEncode ht1 "yx" "abc" = "yyxyyxx"
huffmanEncode ht1 "10" "d" = "010"
huffmanEncode ht3 "ABC" "acd" = "ACBAB"
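This traversal can be sketched in Haskell (codeTable, encode and the inlined tree are my own names; the tree used in the tests is ht1 from above):

```haskell
import Data.Tree (Tree (..))

-- Collect the code word of every source symbol: descending into the
-- i-th subtree appends the i-th character of the code alphabet.
codeTable :: Tree String -> String -> [(Char, String)]
codeTable (Node [c] []) _        = [(c, "")]
codeTable (Node _ ts)   alphabet =
  [ (c, g : code) | (g, t) <- zip alphabet ts
                  , (c, code) <- codeTable t alphabet ]

encode :: Tree String -> String -> String -> String
encode tree alphabet = concatMap look
  where
    table  = codeTable tree alphabet
    look c = maybe (error "symbol not in tree") id (lookup c table)
```

With ht1 built as a Data.Tree value, encode ht1 "01" "abc" yields "0010011", matching the first example.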
The function huffmanDecode decodes arg3 back into a string over the source alphabet, i.e. (huffmanDecode x y (huffmanEncode x y z)) == z.
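Decoding can be sketched as the inverse walk over the tree (a sketch under the assumption that the tree has at least two leaves; names are my own):

```haskell
import Data.List (elemIndex)
import Data.Tree (Tree (..))

-- Walk from the root: each code character selects the subtree at the
-- matching alphabet position; reaching a leaf emits its symbol and
-- restarts at the root.
decode :: Tree String -> String -> String -> String
decode tree alphabet = go tree
  where
    go (Node [c] []) cs     = c : if null cs then [] else go tree cs
    go _             []     = []          -- incomplete code word: stop
    go (Node _ ts) (g : cs) =
      case elemIndex g alphabet of
        Just i  -> go (ts !! i) cs
        Nothing -> error "character not in code alphabet"
```

For example, decoding "0010011" with ht1 and the alphabet "01" recovers "abc".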
Implement this function by completing the program in the file AB5_2/decode.sp. The variable input should be treated as an input stream and the variable output as an output stream.
This means the variable input may only be accessed via the operation o_next (reads the next character from the input stream) and the predicate p_endOfStream (true iff the input stream has reached its end, i.e. there are no more characters to be read). The variable output may only be accessed via the operation o_flush (buffer is appended to output).
The following questions are open-ended.
Follow-up question 1. Assume a German text \(T\) of length \(n\) consisting of the characters a through z, äöüß, whitespace and period (32 different characters) is given. The text \(T\) can be encoded using \(5n\) bits. How would you choose the set of source symbols \(\Sigma\) for \(T\) in order for the Huffman encoded version \(T_H\) of \(T\) to be as short as possible (\(\Gamma = \{0,1\}\))?
What would you expect the quotient \(|T_H|/5n\) to look like?
Follow-up question 2. How could Huffman coding be used to compress binary data (pictures, music, videos, applications, etc.)?