STRUCTORIZER User Guide

Specific aspects for ARM export

About the ARM Generator prototype

Under construction

Since version 3.32-02, Structorizer provides a (somewhat premature) prototypical generator for ARM assembler code, thanks to Alessandro Simonetta et al.

ARM (assembler) code is a mnemonic representation of machine code for ARM processors, so its abstraction level differs fundamentally from that of higher-level programming languages like Pascal, C, Java, etc.

Two use cases (or perspectives) have to be distinguished here:

1. ARM code generation from an arbitrary Nassi-Shneiderman diagram. This would require full compiler capability (possibly even breaking down floating-point arithmetic into sequences of byte and word operations). This cannot be the task of Structorizer at this early stage.
2. Conversion of algorithms formulated at the conceptual level of RISC processor capabilities from a structogram to ARM assembler code. From this perspective, it may not even be desirable to implement too much compiler intelligence. Even so, the conversion capabilities of this early prototype are still very limited. On the other hand, this use case required some additions that don't work or don't even make sense elsewhere, e.g. in Executor (for instance address retrieval for a variable, or direct memory access as if memory were an array). These additions are briefly explained below.

From version 3.32-05 on, Structorizer follows a two-fold strategy to combine both intentions. On the one hand, a very restrictive grammar check may be imposed via the Analyser Preferences, which complains about Element lines that go beyond the narrow ARM processor capabilities. Via a corresponding export option you may even have the ARM generator reject such rich instructions. On the other hand, the ARM generator will be enhanced to accept and compile more and more complex expressions and instructions in an evolutionary development process. To make use of these growing capabilities you should lift the restrictive options just mentioned. But please do not expect too much: you will simply have to check the generated code (in the code preview) to find out whether the ARM generator coped or not.

Some important facts in a nutshell:

The set of supported statements is very limited and the syntax may even differ from the Structorizer conventions (see Basic Concepts).
Certain variable names will be interpreted directly as machine registers, and there are some additional keywords or markers for certain machine-oriented aspects.
Array definitions differ somewhat from the usual conventions in Structorizer (see Arrays).
Records and enumerations are not supported at all in this context so far.
Strings can only be used in variable assignments (better: initialisations), in order to create an array of characters in the "memory".
If an array access is attempted via a copy of a variable or register that was associated with the address of an array or string, the generated code should not be expected to make sense.

Register mapping

Variable names R0, R1, etc. through R15 and, equivalently(!), r0, r1, ..., r15 are interpreted as registers of the ARM processor architecture. Other variables will be mapped to registers not explicitly referenced. Register name R15 (or r15) denotes the program counter and may only be used within a condition (comparison expression); it may not be set explicitly or used in other kinds of expression.

If more than 15 variables occur in a diagram, then the ARM generator will not be able to translate them sensibly (in future it is meant to manage them more or less intelligently in memory). If both the upper-case and the lower-case name of the same register (same number) occur in one diagram (e.g. R5 and r5), then the behaviour is undefined.

<identifier^R> thus denotes an identifier as described in Basic Categories where ARM register names are treated in a special way.

<register> denotes one of the register names R0, R1, ..., R14, or r0, r1, ..., r14.
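The naming rules above can be illustrated with a small Python sketch. It is not part of Structorizer; the function names and the assumption that register numbers are written without leading zeros are purely illustrative.

```python
import re

# Assumption: register names are exactly R0..R15 / r0..r15, no leading zeros.
_REGISTER = re.compile(r"[Rr](1[0-5]|[0-9])")


def classify_name(name):
    """Return 'pc', 'register' or 'variable' for a diagram identifier."""
    match = _REGISTER.fullmatch(name)
    if match is None:
        return "variable"
    # R15 / r15 is the program counter and is excluded from <register>.
    return "pc" if int(match.group(1)) == 15 else "register"


def has_case_conflict(names):
    """True if one register occurs in both spellings, e.g. R5 and r5."""
    seen = {}
    for name in names:
        if classify_name(name) == "variable":
            continue
        number = int(name[1:])
        # Remember the first spelling; a different one later is a conflict.
        if seen.setdefault(number, name) != name:
            return True
    return False
```

A diagram for which has_case_conflict reports True would fall under the undefined behaviour described above.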

Expression complexity

The manageable complexity of expressions is very low at the moment. Usually, only "flat" expressions using one kind of operator (e.g. addition or multiplication, not both) can be processed. Complex nesting is not supported, and parentheses will be ignored.

Next to the usual assignment operators, the only supported operator symbols (referred to as <operator> below) are:

+, -, *, &, |, and, &&, or, ||

Logical expressions (to be used in Alternatives, While and Repeat loops) may either be atomic or a series of one or more comparisons combined either by and (equivalently: && ) or by or (equivalently: || ), but not both. Do not rely on operator precedence, because parentheses will be eliminated internally. Atomic logical expressions may be variables or registers (which are then implicitly tested to be non-zero); a negation operator (not or !) may be applied. Note: In comparisons the left operand must always be a register or variable (such that e.g. 4 < R5 is not allowed, whereas R5 > 4 is).

Examples:

isNice
not R5
R4 < 17
R0 = 'b' or R1 >= R4 or R6 = 0x2e4
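As a rough illustration of these two restrictions (only one kind of connective, and a register or variable as left operand of every comparison), here is a minimal Python sketch. It is not the Analyser's actual check: the set of comparison symbols is an assumption, and quoted literals that happen to contain "or", "and" or comparison symbols are not handled.

```python
import re

_CONNECTIVE = re.compile(r"\band\b|\bor\b|&&|\|\|")
# Assumption: these are the comparison symbols that may occur.
_COMPARISON = re.compile(r"<=|>=|<>|!=|==|=|<|>")


def is_supported_condition(condition):
    """Check the 'one connective kind' and 'left operand' rules."""
    kinds = {"and" if c in ("and", "&&") else "or"
             for c in _CONNECTIVE.findall(condition)}
    if len(kinds) > 1:
        return False  # 'and' and 'or' are mixed
    for part in _CONNECTIVE.split(condition):
        match = _COMPARISON.search(part)
        if match is None:
            continue  # atomic expression such as 'isNice' or 'not R5'
        left = part[:match.start()].strip()
        if not left.isidentifier():
            return False  # e.g. '4 < R5': literal on the left-hand side
    return True
```

With this sketch, all four examples above are accepted, whereas 4 < R5 or a condition mixing and with || is rejected.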

To keep things simple, we will introduce a combined literal concept <int_literal> here, which is either an integral decimal <literal_int> or a hexadecimal literal <literal_hex> (see Basic Concepts):

<int_literal> ::= <literal_int> | <literal_hex>
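A minimal Python sketch of how such a combined literal could be recognised and evaluated is given below. The exact lexical rules (unsigned decimal digits, or 0x followed by hexadecimal digits) are an assumption here; the authoritative definitions are those in Basic Concepts.

```python
import re

# Assumed shapes of the two literal kinds.
_HEX = re.compile(r"0[xX][0-9A-Fa-f]+")
_DEC = re.compile(r"[0-9]+")


def parse_int_literal(text):
    """Return the value of a decimal or hexadecimal integer literal."""
    if _HEX.fullmatch(text):
        return int(text, 16)
    if _DEC.fullmatch(text):
        return int(text, 10)
    raise ValueError(f"not an <int_literal>: {text!r}")
```

For instance, parse_int_literal("0x2e4") yields 740, the value used in the last condition example above.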

Statements

Basic assignment

The basic assignments allow just Boolean literals, integral literals, variables or a single operation between two simple terms.

<identifier^R> ( <- | := ) (true | false)

<identifier^R> ( <- | := ) ( <identifier^R> | <int_literal> ) [ <operator> ( <identifier^R> | <int_literal> ) ]

Examples:

test ← false
count ← R3
R4 ← 0x6 + count
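The two grammar rules above can be approximated by a single regular expression, as in the following Python sketch. It is only an illustration under assumptions: identifiers are taken to be plain ASCII-style names, integer literals are unsigned, and the Unicode arrow shown in the examples is accepted alongside <- and :=.

```python
import re

IDENT = r"[A-Za-z_]\w*"
INT = r"(?:0[xX][0-9A-Fa-f]+|\d+)"
TERM = rf"(?:{IDENT}|{INT})"
# The supported operator symbols; multi-character ones are listed first.
OP = r"(?:&&|\|\||\+|-|\*|&|\||and\b|or\b)"
ASSIGN = r"(?:<-|:=|←)"

_BASIC_ASSIGNMENT = re.compile(
    rf"\s*{IDENT}\s*{ASSIGN}\s*"
    rf"(?:true|false|{TERM}(?:\s*{OP}\s*{TERM})?)\s*"
)


def is_basic_assignment(line):
    """True if the line has at most one operator between two simple terms."""
    return _BASIC_ASSIGNMENT.fullmatch(line) is not None
```

All three examples above match, whereas a line such as x <- a + b * c does not, because it contains two operators.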

Memory read and memory write operations

This is an alternative way to access the content of a declared and initialized array (as of version 3.32-03, other variables are not allocated in memory but rather mapped to registers).

<identifier^R> ( <- | := ) (memory | memoria) '[' <identifier^R> [ + <int_literal> ] ']'

(memory | memoria) '[' <identifier^R> [ + <int_literal> ] ']' ( <- | := ) <identifier^R>

Note: The <identifier^R> within the brackets may be a variable name or a register name, depending on how the array was declared (see Array support below). The given <int_literal> must be the actual address offset rather than an index: No automatic index transformation will be performed.

Examples:

R6 ← memoria[height]
R2 ← memory[R3 + 0x12]

memory[R3] ← R8
memoria[count + 4] ← r2
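Since the offset is an address offset and not an index, you have to multiply the element index by the element size yourself. The following Python sketch shows this arithmetic. The helper name is hypothetical, and the element sizes are an assumption based on the usual meaning of the corresponding GNU assembler directives; this guide only confirms 4 bytes for word (see Array support below).

```python
# Assumed element sizes in bytes; only 'word' = 4 is confirmed by this guide.
ELEMENT_BYTES = {"byte": 1, "hword": 2, "word": 4, "quad": 8, "octa": 16}


def memory_operand(base, index, element_type="word"):
    """Build the memory[...] expression addressing element 'index'."""
    offset = index * ELEMENT_BYTES[element_type]
    if offset == 0:
        return f"memory[{base}]"
    return f"memory[{base} + {offset}]"
```

Under these assumptions, the third element (index 2) of a word array whose address is held in R3 would be addressed as memory[R3 + 8].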

Address assignment

Assigns the address of some variable held in storage (i.e. an array) to a register. The right-hand side of the assignment resembles the call of a built-in function in syntax. The argument must not be a register name (if the array was declared with a register name then the address assignment will have been done automatically).

<register> ( <- | := ) (address | indirizzo) '(' <identifier> ')'

Examples:

R5 ← address(storage)
R2 ← indirizzo(count)

Character assignment and String initialization

Assigns a character or string literal:

<identifier^R> ( <- | := ) " <character>{<character>} "

<identifier^R> ( <- | := ) ' <character> '

Examples:

digit ← '3'
R9 ← "These are 4 silly words"

Remarks:

A string literal must not be empty!
A string initialization induces the memory allocation of an array of the contained characters, each represented by an entire ARM word (4 bytes), which is sufficient for UTF-32.
A character literal assignment, in contrast, is converted into an instruction loading the character code as a direct operand into the target register.
Consequently, a string cannot be extended in ARM code (as the memory reservation will exactly follow the length of the string literal). So the export of diagrams that use e.g. string concatenation will fail to produce usable ARM code.
Since version 3.32-04, character assignments require single quotes (') as delimiters of the character literal. String initializations, however, require the string literal to be delimited with double quotes (").
There will not be a terminating '\0' character at the end of the allocated string unless you switch on an ARM-specific export option "Store strings with 0-termination".
Non-ASCII and control characters will be expressed by their hexadecimal code point value in the exported code.
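The memory footprint implied by these remarks can be estimated with a few lines of Python. This is only a sketch of the arithmetic (one 4-byte word per character, an optional terminating 0 word); the function names are hypothetical and the actual layout of the exported data may differ.

```python
def string_words(text, zero_terminated=False):
    """Return the word values (code points) reserved for a string literal."""
    if not text:
        raise ValueError("a string literal must not be empty")
    words = [ord(ch) for ch in text]
    if zero_terminated:
        # Corresponds to the export option "Store strings with 0-termination".
        words.append(0)
    return words


def storage_bytes(text, zero_terminated=False):
    """Each character occupies an entire ARM word, i.e. 4 bytes."""
    return 4 * len(string_words(text, zero_terminated))
```

For the example string "These are 4 silly words" (23 characters) this gives 92 bytes, or 96 bytes with 0-termination.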

Array support

Arrays must first be initialized by a statement of the following form, which may or may not involve a declaration with a specific low-level data type (since version 3.32-04, the type description is required to be similar to C# or Java, i.e. an empty pair of brackets must follow the element type name):

[(byte | hword | word | quad | octa)'['']'] <identifier^R> ( <- | := ) '{' <int_literal> { , <int_literal> } '}'

If none of the types "byte[]", "hword[]", etc. is specified then "word[]" will be assumed, which designates a 32-bit value (4 bytes wide). Note that since version 3.32-03 an ARM-specific export option controls whether memory alignment to entire word addresses will automatically be performed.

Example: word[] array1 ← {56, 7, 98}

If <identifier^R> designates a register, then the register will automatically be associated with the address of the array, whereas the array itself will be placed under a generic label that is not directly accessible.

Then assignments of the subsequent kinds are meant to be accepted:

Read from an array:

<identifier^R> ( <- | := ) <identifier^R> '[' (<identifier^R> | <int_literal>) ']'

Example: c ← array1[R5]

Write to an array:

<identifier^R> '[' (<identifier^R> | <int_literal>) [ + (<identifier^R> | <int_literal>) ] ']' ( <- | := ) <identifier^R>

Example: array1[R2 + 7] ← R4

Remarks:

Be aware that, for now, an array element access still cannot be exported to ARM code if it is placed within other kinds of instruction or expression. For example, if you want to compare the content of an array element, you must first assign it to some variable or register and then compare that.
To allow successful export, value lists in traversing FOR loops (collection-controlled loops, FOR IN loops) must either be explicit (an array literal or a comma-separated list of integer literals) or be variables / registers that refer to a declared array of integers.

Input and output instructions

The export of input and output instructions is only supported if the GNU syntax mode is chosen in the ARM-specific export options. Input instructions require all their parameters to be simple variables (or registers), i.e., no array element access or record component access is syntactically supported. The expression list of an output instruction may comprise variable/register names and integer literals. The restrictive grammar check of the ARM instruction level approach (see use case 2 above) complains if there is no item in the instruction or if the input instruction contains a prompt string literal. The generator, in contrast, tolerates both; a prompt string in an input instruction is ignored.

<input_keyword> <identifier^R> { , <identifier^R> }

<output_keyword> ( <identifier^R> | <int_literal> ) { , ( <identifier^R> | <int_literal> ) }

Examples:

INPUT R3, number
OUTPUT number, -21, R3

ARM assembler instructions

Moreover, all instruction lines that start with one of the following ARM assembler mnemonics (case-insensitively) are considered ready-to-use ARM instructions. No further syntax analysis takes place; only variable names will be replaced by register names, and unprefixed integer literals will be prefixed with '#':
add, adc, adcs, and, asr, b, bic, bkpt, cdc, cdp, clz, cmn, cmp, cpsid, cpsie, cpy, eor, ldc, ldm, ldr, lsl, lsr, mcr, mla, mov, mrc, mrrc, mrs, msr, mul, orr, pkhbt, pkhtb, rev, rfe, ror, rrx, rsb, rsc, sel, setend, sbc, smla, smlsd, smmla, smmls, smuadx, srs, ssat, stc, stm, str, sub, swi, sxtab, sxtah, sxtb, sxth, teq, tst, usat, uxtab, uxtah, uxtb, uxth.
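The following Python sketch illustrates the recognition and the literal prefixing described above. It is not the generator's implementation: comparing only the first whitespace-delimited word, the treatment of signed and hexadecimal literals, and the function names are all assumptions, and the replacement of variable names by registers is left out.

```python
import re

MNEMONICS = frozenset("""
add adc adcs and asr b bic bkpt cdc cdp clz cmn cmp cpsid cpsie cpy eor ldc
ldm ldr lsl lsr mcr mla mov mrc mrrc mrs msr mul orr pkhbt pkhtb rev rfe ror
rrx rsb rsc sel setend sbc smla smlsd smmla smmls smuadx srs ssat stc stm str
sub swi sxtab sxtah sxtb sxth teq tst usat uxtab uxtah uxtb uxth
""".split())

# An integer literal not preceded by '#', '-' or a name character.
_BARE_INT = re.compile(r"(?<![#\w-])(-?(?:0[xX][0-9A-Fa-f]+|[0-9]+))\b")


def is_raw_arm_line(line):
    """True if the first word is a known mnemonic (case-insensitive)."""
    words = line.split()
    return bool(words) and words[0].lower() in MNEMONICS


def prefix_literals(line):
    """Prefix unprefixed integer literals with '#'."""
    return _BARE_INT.sub(r"#\1", line)
```

For example, prefix_literals("mov R0, 5") returns "mov R0, #5", while a line like "add r1, r2, #4" is left unchanged.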

Consequences of ARM syntax limitations

The following screenshot shows the consequences of the syntax limitations. The restrictive syntax check complains about the input instruction with a prompt string, whereas the code preview demonstrates that the ARM generator actually copes with it. Incidentally, it produces more efficient code than sequences of single input/output elements would, since repeated assignments to the address register can be omitted:

Syntax check and compilation result

The assignment expression with two operators that the check complained about above, however, indeed still cannot be converted.