Like most languages, OmniMark supports functions. Unlike many languages, an OmniMark program is not simply a hierarchy of functions. Rules are the principal structural element of OmniMark programs; functions are supplementary structures. Functions cannot contain rules (though they can invoke them through submit, do xml-parse, do sgml-parse, or do markup-parse). You can use functions to encapsulate code that you commonly use within different rule bodies.
There are three types of functions in OmniMark: functions that return a value, coroutine functions, and action functions.
A function that returns a value can be defined as follows:

  define integer function add
     (value integer x,
      value integer y)
  as
     return x + y
The return type of the function is declared following the define keyword. It may be any OmniMark data type, including record or opaque types. The value is returned using the return keyword, which also exits the function.
Here is how the add function can be called:

  process
     output "d" % add (2,3)
There are four types of coroutine functions in OmniMark: string source, string sink, markup source, and markup sink.
When a value-returning function is called, its caller suspends execution while the function executes. The caller can proceed only after the function terminates and returns its single result value at the same time.
Coroutines always execute in pairs. When a coroutine function is called, two coroutines are created: one that
executes the function, and its sibling coroutine that is specified along with the call. The caller suspends
until both coroutines terminate their execution. Here is an example of a coroutine function and its call:

  define string sink function add-all () as
     local integer result
     repeat scan #current-input
     match digit+ => n
        set result to result + n
     match white-space+
        ; EMPTY
     again
     put #main-output "d" % result

  process
     using output as add-all ()
     do
        output " 2"
        output " 3"
     done

In this example, the sibling coroutine for the function add-all is the do … done scope that outputs two numbers.
A coroutine function of string source or markup source type produces a continuous stream of values. As the function produces the stream, its sibling coroutine consumes it. The stream is terminated when the producing function terminates.
A coroutine function defined as string sink or markup sink consumes a stream produced by its sibling coroutine. When the consuming coroutine terminates, its producer sibling is forcibly terminated as well.
An action function does not return a value. Rather, it performs an action. Here is how an action function might be defined. Note that it has no return type in the definition and that no return is required:

  define function clear-flags (modifiable switch flags) as
     repeat over flags
        set flags to false
     again

This function clears all the switches on a switch shelf that is passed to it as a modifiable argument.
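As a rough illustration of how such a function might be called, the sketch below passes a global switch shelf to clear-flags and then reports the state of each item. The shelf name status-flags and its initial values are assumptions made for this example only. If the modifiable argument behaves as described above, every item should be reported as clear.

```omnimark
; Hypothetical shelf, introduced only for this illustration.
global switch status-flags variable initial {true, false, true}

define function clear-flags (modifiable switch flags) as
   repeat over flags
      set flags to false
   again

process
   ; The whole shelf is passed; the function changes the caller's items in place.
   clear-flags (status-flags)
   repeat over status-flags
      do when status-flags
         output "set%n"
      else
         output "clear%n"
      done
   again
```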
Action functions can generate output. The following function outputs characters in a specified range:

  define function output-character-range
     (value string start,
      value string end)
  as
     repeat for integer i from binary start to binary end
        output "b" % i
     again

  process
     output-character-range ("A", "M")

While this type of function is permitted, it is generally preferable to write such functions as string source functions. This will improve the readability of your code and increase the generality of the functions. The function is changed to a string source function simply by adding string source to the definition and changing the function name from output-character-range to character-range:

  define string source function character-range
     (value string start,
      value string end)
  as
     repeat for integer i from binary start to binary end
        output "b" % i
     again

  process
     output character-range ("A", "M")

Here the normal OmniMark keyword output can be used in the function call, enhancing the clarity of the program. In addition, the string source function can be used in a wider range of contexts such as:

  submit character-range ("A", "Z")

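Another context in which a string source function might be used is as the input to a repeat scan. The following is a minimal sketch, assuming the character-range function defined above; the pattern variable name c is chosen only for this illustration.

```omnimark
define string source function character-range
   (value string start,
    value string end)
as
   repeat for integer i from binary start to binary end
      output "b" % i
   again

process
   ; Scan the stream produced by the function, one character at a time.
   repeat scan character-range ("A", "C")
   match any => c
      output c || "%n"
   again
```

Because the function only produces a stream, the caller decides whether that stream is output, submitted, or scanned.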
You can also write functions that both return a value and do output:

  define integer function add
     (value integer x,
      value integer y)
  as
     output "I will add %d(x) and %d(y)%n"
     return x + y

  process
     local integer z
     set z to add (2,3)
     output "d" % z || "%n"

While it is certainly possible to program like this, we recommend that you avoid writing functions that both do output and return a value. Not only do they make it hard to follow your code, but they can have unexpected results. In particular, if the return value is directed to #current-output, you may not get the function's return values and output in the order you expected.
You can use switch-returning functions as pattern functions. You can also use string source functions or string-returning functions to dynamically define the text to be matched in a pattern.
Because record shelves are references, they behave slightly differently when passed to functions. In particular, the value of a record passed to a function as a value argument can be changed, since it is the reference that is passed by value, not the record itself. For the same reason, any such change is also visible in the calling environment, since the caller and the function refer to the same record.
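The following sketch illustrates this behavior. The record type point, its fields, and the function move-right are hypothetical names introduced for this example only. If records behave as described above, the change made inside the function should be visible in the process rule, so the output should be 2 rather than 1.

```omnimark
; Hypothetical record type, introduced only for this illustration.
declare record point
   field integer x
   field integer y

define function move-right (value point p) as
   ; p refers to the caller's record, so this assignment is visible to the caller.
   set p:x to p:x + 1

process
   local point a
   set a:x to 1
   move-right (a)
   output "d" % a:x || "%n"
```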
You can call OmniMark functions recursively. The following program calculates the factorial of a number using
a recursive function:

  define integer function factorial (value integer n) as
     do when n <= 0
        return 1
     else
        return n * factorial (n - 1)
     done

  process
     output "d" % factorial (7)

Functions can be overloaded. See Functions: overloaded for details.
The principal job of a function is to encapsulate a discrete operation. However, a function may have side effects on the global state of the program. While writing functions with side effects is appropriate in some situations, you should exercise caution when using this technique as it can lead to programs that are difficult to debug and hard to read and maintain.
Functions isolate sections of code, but don't isolate you from the current environment, in particular the current output scope. Output generated in an action function goes to the current output scope. If a function changes the destinations of the current output scope (with output-to), this carries over to the calling environment.
Similarly, accessing #current-input in a function can modify the current input being scanned by the calling environment, affecting whether patterns succeed or not. This can be desirable in some cases (e.g., pattern functions), but in other cases this can lead to programs that are difficult to understand and debug.
Function side effects can be particularly problematic with functions used in patterns and in the guards of rules. To allow for optimization of pattern matching routines, OmniMark does not define whether a pattern or the guard on a pattern is executed first (a pattern is itself a kind of guard on a statement, so this is sensible). You should never write a program that depends on the order in which a pattern and a guard on that pattern are executed.
Coroutine functions that modify the global state can also make programs hard to understand because the order of their execution, and therefore of their side effects, depends on the data flow. You can restrict the scope of side effects by using save and by declaring the relevant global shelves as domain-bound.
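As a minimal sketch of the first technique, the function below uses save so that its change to a global shelf lasts only for the duration of the function body. The shelf visit-count and the function show-visit are hypothetical names introduced for this illustration, and the sketch assumes that save may appear at the start of a function body. If save behaves as described, the value reported inside the function should be 1 and the value reported afterwards should be 0. The sketch uses an action function for brevity and does not cover domain-bound.

```omnimark
; Hypothetical global shelf, introduced only for this illustration.
global integer visit-count initial {0}

define function show-visit () as
   ; Changes to visit-count made in this scope are undone when the function exits.
   save visit-count
   set visit-count to visit-count + 1
   output "inside: " || "d" % visit-count || "%n"

process
   show-visit ()
   output "after: " || "d" % visit-count || "%n"
```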
In the case of patterns that fail, OmniMark does not guarantee that all parts of the pattern will be tried, or that the same parts will be tried in all circumstances. This allows OmniMark to optimize pattern matching. You should never write a program that depends on the side effects of a function called in a pattern that fails.