Syntax & Semantics

General Structure

A Shrimp program—just called program in the following—consists of sections that must appear in a specified order. Every section has a type and a section of every type may occur at most once. A section starts with a line containing only the name of the section type and possibly trailing white spaces (no leading white spaces allowed). The content of a section is defined as all the lines from the start of the section until the end of the section minus the first line, i.e. the one containting the section type name.

Every line starting with a # indicates the start of a new section and consequently must specify an existing section type. There are the following section types and the order in which they are listed reflects the order in which they must appear in a program:

  • #NAME
  • #OPTIONS (optional)
  • #FUNCTIONS (optional)
  • #VARIABLES
  • #CONSTRAINTS (optional)
  • #OPERATIONS
  • #PREDICATES (optional)
  • #FLOW
  • #VERBATIM (optional)

Section types not marked as optional have to appear in a program.

The lines before the #NAME-section in a program can contain anything (except a line indicating the start of a section) and will be treated as a comment. For example, it could be used to give a description of the program's underlying algorithm.

A program should not contain tabulator characters (replace with white spaces).

Comments

Comments are only allowed within a section. This means the line which indicates the start of a section (#...) cannot contain a comment. The syntax for comments is the same as in Haskell, i.e. -- for single line comments and {- -} for multi-line comments. For a multi-line comment it should hold that:

  • there are only white spaces (or nothing) before {- in the line containing {-
  • there are only white spaces (or nothing) after -} in the line containing -}

A legal identifier is a varid (see section 2.4 Identifiers and Operators in the Haskell Language Report 2010) with the following additional restrictions:

  • it may not begin with sp__
  • it may not begin with a small letter followed by __

The program name and the name of every variable, function, operation and predicate must be a legal identifier.

EBNF Notation

The extended Backus-Naur form is used to specify the syntax of certain sections. Non-terminals are written as nonTerminal and terminals as "str". Non-terminals used:

  • ws0 = {" "} (0 or more white spaces)
  • ws1 = " " ws0 (1 or more white spaces)
  • legalId = set of legal identifiers (see above)
  • legalId2 = ws0 legalId ws0

Indentation rule

The right-hand side (everything right of =) of

  • an assignment in an operation
  • a predicate
  • a binary decision tree (BDT)

can span multiple lines, provided every line after the one where it starts is more indented than the first.

Empty lines

For the specification of the content of the section types (except for the #VERBATIM-section) we assume that the program contains no comments and empty lines (lines only consisting of white spaces or nothing) without loss of generality.

#NAME-section

The content of this section is a single line containing the program name, which must be of the form:

  • legalId2

The name is used to give the program a name in the host Haskell source code file. For example, if the program name is exampleAlg then the function sp__exampleAlg in the host Haskell source code file contains the description of this program.

#OPTIONS-section

Each line in the content of this section is treated as an option. An option may affect how a program is translated or parsed. An option is a line of the form OPTIONNAME or OPTIONNAME $ARG1 $ARG2 ... Trailing white spaces are ignored. Unknown options or options with invalid arguments may be ignored by the compiler. The following options exist:

  • TRANSLATION $MODE where $MODE is one of the following: Empty, Interpreted, Production
  • ALLOW_LOCAL_VARIABLES_WITH_TRAILING_APOSTROPHE
  • MERGE_PHANTOM_OPERATIONS whenever a phantom operation is referenced in a BDT it is replaced with its BDT
  • EXCLUDE_FUNCTION_SIGNATURE $FUNC omits signature of $FUNC in the Haskell code

The following options only apply to the the Interpreted translation mode:

  • UNTRACED no trace files are written when the program is executed
  • UNTRACED_FUNCTION $FUNC same as above but only when the function $FUNC is called
  • MAX_STEPS $STEPS program is executed for at most $STEPS steps (e.g. MAX_STEPS 20)
  • MAX_STEPS_FUNCTION $FUNC $STEPS same as above but only when the function $FUNC is called
  • TRACE_DIRECTORY "$DIR" writes trace files to $DIR (must be escaped like a string in Haskell). The directory is relative to the one containing the host Haskell source code file
  • TRACE_DIRECTORY_FUNCTION $FUNC "$DIR" same as above but only when $FUNC is called

#FUNCTIONS-section

Each line in the content of this section must be of the following form:

  • functionName arguments "=" output

where

  • functionName = legalId2
  • arguments = {ws1,inputVariableName}
  • inputVariableName = legalId2
  • output = singleOutput | tupleOutput
  • singleOutput = legalId2
  • tupleOutput = ws0 "(" legalId2 "," legalId2 {"," legalId2} ")" ws0

Each line in this section describes an interface that allows one to execute the program. The function names are used in the host Haskell source code file. This interface specifies which variables are the input to the program and which variable(s) are returned as output at the end of the execution. All variables which are not part of the arguments of a function are set to undefined at the beginning of the program execution.

#VARIABLES-section

Each line in the content of this section must be of the following form:

  • ws0 ["?"] ["!"] legalId ws0 "::" ws0 type

where

  • type = any syntactically valid type expression in Haskell that may include type variables

Each line in this section describes a variable (name and type) used by the program. A ?-modifier indicates that the variable should not be included in the trace output. A !-modifier indicates that the variable should be evaluated strictly (not implemented in spc yet).

Note: it is only sensible that variables which are input variables, i.e. occur as input parameter of a function, to contain type variables.

#CONSTRAINTS-section

Each line in the content of this section is a type constraint on the type variables that occur in the #VARIABLES-section and is written as in Haskell, i.e. Show a.

#OPERATIONS-section

The content of this section consists of a sequence of operations and each operation may consist of a set of assignments. A line which indicates the start of an operation is of the form:

  • operationName ":" ws0 | operationName ws1 "SAMEAS" ws1 operationName

where

  • operationName = legalId2

All subsequent lines after a line starting a new operation are interpreted as assignment belonging to that operation if they are more indented than that starting line, except if the operation is of them form x SAMEAS y. Otherwise it is assumed that a line must be the start of a new operation. An assignment is of the form:

  • pattern "=" expression

where

  • pattern = any syntactically valid pattern in Haskell

An operation describes how the variables of the program can change in a single execution step.

The first operation in the section is the start operation of the program.

For an operation we distinguish between global variables (or program variables) and local variables (or helper variables). A global variable is one that is defined in the #VARIABLES-section. A local variable only occurs in the pattern of some assignment in the operation and is only visible within that operation.

Let varName be the name of a global variable. Within the assignments of an operation varName refers to the value of this variable before the operation was executed (old value) and varName' to the value after the operation was executed (new value). Consequently, the patterns on the left-hand side should only use the global variables names with a trailing apostrophe. It is recommended that local variables do not use a trailing apostrophe in order to avoid confusion with global variables. However, this can be disabled with the option ALLOW_LOCAL_VARIABLES_WITH_TRAILING_APOSTROPHE.

Note: due to the indentation rule it is possible that an assignment spans over multiple lines

#PREDICATES-section

Each line in the content of this section must be of the form:

  • predicateName "=" expression

where

  • predicateName = legalId2
  • expression = any syntactically valid expression in Haskell that may reference global variables

A predicate describes a boolean property in terms of the values of the variables. Predicates are only used in the #FLOW-section to describe the binary decision trees.

Note: due to the indentation rule it is possible that a predicate spans over multiple lines

#FLOW-section

Each line in the content of this section must be of the form:

  • ws0 ["."] operationName ws0 "=" bdt

where

  • operationName = legalId
  • predicateName = legalId2
  • bdt = ws0 (innerNode | leaveNode) ws0
  • innerNode = "(" predicateName ws1 bdt ws1 bdt ")"
  • leaveNode = "HALT" | operationName

A line in this section specifies a binary decision tree (BDT) that is associated with an operation. For every operation there must be an associated BDT. This BDT describes with what operation to continue or whether to halt the execution. The first child in an innerNode describes the branch that is taken when the predicate predicateName is true. Otherwise the second child is taken.

The modifier . indicates a phantom operation. This is an operation which does not change the contents of the global variables and thus does not have to (and should not) appear in the #OPERATIONS-section. It can be used to compress common sub-BDTs.

Note: due to the indentation rule it is possible that a BDT spans over multiple lines

#VERBATIM-section

The content of this section may contain anything but a line indicating the start of a new section. The content of this section is pasted as is in the Haskell source code file.

Program semantics

When a function from the #FUNCTIONS-section is called, the program variables which are arguments of that function are initialized with the given values. All other variables are initialized with undefined. Then the program begins execution with the start operation (the first operation that occurs in the #OPERATIONS-section).

When an operation is visited then the variables change their values according to the specified assignments of that operation. Then the associated binary decision tree (BDT) of that operation is considered to determine what operation to visit next or to halt the execution. The predicates are evaluted w.r.t. the new values of the variables.