Introduction
This is the documentation for the embedded domain-specific language Shrimp which is based on Haskell. Visit the project website to download the newest version and for additional background info. If you have not installed it yet, please follow the instructions on the Install page.
It is assumed that you are familiar with the basics of Haskell. If this is not the case, check out the free online tutorial Learn You a Haskell, which is an excellent introduction for beginners. The first four chapters already suffice to get started with Shrimp.
First steps
In Haskell, functions are defined using expressions. For example, the line f x y = y + sum [1 .. x+y]
computes the pairing function. Sometimes, in order to define more complex functions the use of recursion is required and multiple functions that call each other. Shrimp provides a simple way to define such functions in terms of flow charts.
Consider the following problem. Given a string such as 'abbcccaa'
return [('a',1),('b',2),('c',3),('a',2)]
, i.e. how often the same character appears consecutively. The following Shrimp program (also called sp-program) describes an algorithm for this:
#NAME
countAlg
#FUNCTIONS
count input = result
#VARIABLES
input :: String --must be non-empty
result :: [(Char,Int)]
cur :: Char
n :: Int
#OPERATIONS
init:
cur' = first input
n' = 1
input' = removeFirst input
result' = []
same: --same character as before
n' = n + 1
input' = removeFirst input
new: --different character than before
cur' = first input
n' = 1
input' = removeFirst input
result' = append (cur,n) result
final:
result' = append (cur,n) result
#PREDICATES
hasNext = not (isEmpty input)
newChar = cur /= (first input)
#FLOW
init = (hasNext (newChar new same) final)
same = (hasNext (newChar new same) final)
new = (hasNext (newChar new same) final)
final = HALT
#VERBATIM
first = head --returns first element of a list
isEmpty = null --true iff list is empty
removeFirst = tail --returns list without first element
append x y = y++[x] --appends element x to end of list y
We will discuss this program in more detail later on. First, let us see how this sp-program can be executed.
Every sp-program must be included in a host Haskell source code file in order to be executed. Let us call this file Counter.hs
and let it have the following content:
import qualified Shrimp.Interpreter
sp__thisFileName = "Counter.hs"
--START OF PROGRAM countAlg
--END OF PROGRAM countAlg
The two comment lines state where spc
should insert the sp-program. Additionally, sp-programs referenced from a Haskell file need to be in the correct place in order for spc
to find them. In this case the correct directory structure is:
./Counter.hs
./Counter/countAlg.sp
The sp-program should have the same file name as what is written in the #NAME
-section plus the suffix .sp
and it should be in the directory that has the same basename (filename without extension part) as the Haskell source code file that references it.
Now, we can compile the file Counter.hs
. If you don't want to create these two files yourself, they are also included in Shrimp/Tests
. Open the terminal and navigate to the directory containing Counter.hs
and execute the command spc Counter.hs
. There should be no output on the terminal which means everything worked as expected. If you open Counter.hs
again, you see that a translated version of the sp-program was inserted between the two comment lines.
Open Counter.hs
with ghci
and type count "abbcccaa"
. It should return the correct result. Additionally, the trace produced by executing the sp-program is written in ./Counter/countAlg/1.csv
. If you open this in a spreadsheet program such as LibreOffice Calc then you should see a table like this:
The values in trace row number 0 describe the input values after calling the function count. The bottom symbol ⊥ indicates that a value has not been initialized (undefined). If a cell in a variable column is blank this means the value has not changed.
Note: If you have problems seeing the trace correctly, import it as csv-file with ;
as only delimiter and "
as string delimiter using UTF-8
encoding.
Some things you can try out:
- Figure out the meaning of each section of
countAlg
using the trace as help - What happens when
count ""
is called? - How could
countAlg
be modified so that it returns[]
for the input""
? If you changecountAlg.sp
simply executespc Counter.hs
again to recompile
Semantics
Let us go through each section of the sp-program countAlg
and see what it means. A section begins with a line that starts with #
and ends when the next section begins or the file ends.
#NAME
countAlg
In the #NAME
-section the name of the sp-program is defined. The name is used to reference the sp-program in the Haskell file (look for sp__countAlg
in Counter.hs
).
#VARIABLES
input :: String --must be non-empty
result :: [(Char,Int)]
cur :: Char
n :: Int
Program variables must be declared upfront in Shrimp in the #VARIABLES
-section. It is also possible to use type variables in the type expressions.
#FUNCTIONS
count input = result
A function in an sp-program specifies which program variables should be given as input and the value of which variable should be returned as result at the end of the execution.
When an sp-program is called via a function, all variables which are not part of the input are set to undefined
and using them before initializing them will cause an exception.
An sp-program can compute more than one function. Try adding the line count2 n input = cur
to the section and see how it behaves.
#OPERATIONS
init:
cur' = first input
n' = 1
input' = removeFirst input
result' = []
same: --same character as before
n' = n + 1
input' = removeFirst input
new: --different character than before
cur' = first input
n' = 1
input' = removeFirst input
result' = append (cur,n) result
final:
result' = append (cur,n) result
An operation defines how the values of the program variables can change in one execution step. To distinguish the new values from the old ones, an '
is appended to the variable name. For example, cur' = first input
means that the new value of cur
after executing the operation should be the first element of the old value of input
.
Program variables on the left-hand side must always have a '
at the end (because the old value can't be changed anymore). On the right-hand side both is possible. If the first assignment in init
is replaced with cur' = first input'
then the new value of cur
would be the second element of the old value of input
because that is the first element of input'
(assuming input
has enough elements).
All assignments of an operation must be more indented than the line where the operation starts. It is also possible that an operation contains no assignments. Multi-line assignments are allowed, provided the additional lines are more indented than the line where the assignment starts.
#PREDICATES
hasNext = not (isEmpty input)
newChar = cur /= (first input)
In order for the sp-program to know with what operation to continue or whether the execution should halt, it needs to consider if certain properties hold with respect to the values of the program variables. A predicate describes such a property. The predicate name is on the left-hand side. The variables in the expression on the right-hand refer to the values after the operation has been executed. For example, the predicate hasNext
is true afer the operation init
iff the input string has at least two characters.
Multi-line expressions can be used to define predicates, provided each additional line is more indented than the line where the predicate begins.
#FLOW
init = (hasNext (newChar new same) final)
same = (hasNext (newChar new same) final)
new = (hasNext (newChar new same) final)
final = HALT
After executing an operation the sp-program needs to know with what operation to continue. For each operation this information can be specified using a binary decision tree (BDT). The inner nodes of the tree are labeled with a predicate and the leaves with operations or HALT
. If a predicate is true then the first child is taken, otherwise the second. If the BDT of an operation consists of only one operation (or HALT
) then there should be no parentheses.
The BDT (hasNext (newChar new same) final)
is read as follows. If the predicate hasNext
is true then continue with checking the predicate newChar
. If newChar
is true then continue with the operation new
, otherwise with same
. If hasNext
is false then continue with the operation final
.
#VERBATIM
first = head --returns first element of a list
isEmpty = null --true iff list is empty
removeFirst = tail --returns list without first element
append x y = y++[x] --appends element x to end of list y
The #VERBATIM
-section can be used to include functions and constants which are added on the global level to the Haskell source code file.
Good to know
Additional functions:
spc
has some additional functions such as computing the code coverage given an sp-program and a directory of traces or comparing traces in various ways. See spc Usage for more information.
Tracing self-defined data types: In order for Shrimp to be able to generate traces for self-defined data types they must be a member of the type class Show
. Alternatively, you can also hide variables from traces by prefixing them with ?
, e.g. ?result :: [(Char,Int)]
.
Local variables:
It is possible to use variables which have not been declared in the #VARIABLES
-section; they are called local variables or helper variables. Simply choose a variable name not used by a program variable in an assignment on the left-hand side. Such variable is only visible within the operation where it is defined.
#OPERATIONS
opExample:
a' = c * c
b' = c + c
c = 123 --local variable
Phantom operations:
Reconsider the #FLOW
-section from the previous example.
#FLOW
init = (hasNext (newChar new same) final)
same = (hasNext (newChar new same) final)
new = (hasNext (newChar new same) final)
final = HALT
Notice how all operations but final
have the same BDT.
You can use a phantom operation to refactor this:
#FLOW
init = nop
same = nop
new = nop
.nop = (hasNext (newChar new same) final)
final = HALT
The .
in front of nop
means that it does not modify the contents of the variables and thus does not need to occur in the #OPERATIONS
-section. This can also be used if there is a sub-BDT that occurs repeatedly. The only difference in the traces are the additional rows with nop
as operation. If you don't want to see phantom operations in the traces, use the option MERGE_PHANTOM_OPERATIONS
(see below).
Options:
There exists an #OPTIONS
-section which can be used to modify certain aspects of how an sp-program is translated or executed.
For example, an sp-program can be translated in different modes (Production
, Interpreted
and Empty
). The default mode is Interpreted
which was used in the previous example. It causes traces to be written as files during execution. If you are certain your program works as intended and you want maximum speed use the Production
mode instead. You can do so by adding the following section right after the #NAME
-section.
#OPTIONS
TRANSLATION Production
Another useful option is MAX_STEPS 42
which limits the number of execution steps to 42 (or any other number you like). This only works when using the Interpreted
translation mode. A typical use-case for this option would be when a program's execution does not seem to terminate and you want a trace to see what is going on. For all possible options see the #OPTIONS
-section in Syntax & Semantics.