S T R U C T O R I Z E R - User Guide
Import > Source Code

Source Code Import

Structorizer allows to derive a structogram from a given source code file (reverse engineering). By now, this import feature is only available for CLI Pascal, ANSI-C, and COBOL files, other programming languages(e.g. Java) are likely to follow.

Be aware that the grammars used by Structorizer to parse the source code are usually somewhat simplified, and you might face parser errors with some correct code samples, which are simply too complex for a reverse engineering or contain peculiarities Structorizer may not cope with anyway. In particular, Structorizer cannot sensibly import "spaghetti code", which makes use of GO TO instructions or other means of the source language that are incompatible with the idea and concepts of structured programming. For instance C files containing function pointers won't be accepted. Code with normal pointers will pass the syntax analysis but the resulting diagrams won't be executable because Executor doesn't support pointer types. You may have to pre-process such code files (e.g. cut some parts out, modify others) in order to be able to import at least the essential algorithmic structure.

In interactive mode, you can import code files of any supported programming language just by dragging the corresponding sources onto Structorizer. The respective parser will automatically be selected based on the file name extention. If the decision is ambiguous then you will be presented a choice menu to select the most appropriate parser.

Another way to achieve the same goal is to use the menu, i.e. "File => Import => Source Code ...".

In the file chooser dialog that will open you may select the appropriate file filter (combobox at the bottom of the dialog), both to restrict the search and to disambiguate the parser choice:

Import file filter choice in file dialog

The importer will parse the file according to a provided grammar and, if that has succeeded, synthesize control structures from the derived parse tree. Certain control keywords (or standard function names) of the source language recognized in the instruction texts may be replaced by the corresponding parser preferences for the same structures, as currently configured.

Syntactical Restrictions

Note that Pascal files to be imported must have a program, unit, package, or library header. If you want to convert a bare subroutine (procedure or function) definition you have to embed it in one of these sorts of compilable concepts before, e.g. in a unit or between

program dummy;

and

begin
end.

As mentioned above, C source files must not contain function pointers and should not make excessive use of externally defined preprocessor symbols like __stdcall, __thiscall, or __cdecl.

A file comprising several routine definitions will be converted into a set of separate diagrams, one for each of the routines, and - if not empty - another one for the main program. If you do the import within the application (rather than as batch job, see below) then the imported diagrams will be collected in the Arranger.
Be aware that subroutine calls (references to other functions) can only be identified as such by Structorizer if the corresponding routine definitions are also available in the imported code. Otherwise the respective lines of code will usually be inserted as ordinary Instruction elements but may manually be transmuted to equivalent CALL elements (at least if you intend to run the algorithm with Executor). This element transmutation, however, can be done by a simple mouse click. The capabilities of identifying standard routines or library functions for which there would be analogous built-in routines in Structorizer are still rather poor. (But improvements are planned.)

Type definitions (particularly for record/struct types) and constant definitions may be essential for the interpretability of expressions. Therefore they will be imported (since release 3.27). In the resulting diagram set, they will occupy Instruction elements, possibly in Includable diagrams, as they are usually needed globally. Constant definitions will be converted to assignment instructions (typically dyed in rosé and equipped with a comment "constant!" by the respective parser). Initialized variable declarations will also be imported as assignments.

The import of mere variable declarations (i.e. without initial value assignment)may be enabled via the Import Preferences dialog. Imported (local) declarations will typically be coloured in a faint green. (Since version 3.26-02, declarations in Pascal or VisualBasic style are tolerated as content of Instruction elements.)

The import of comments may also be enabled via the Import Preferences dialog. Structorizer tries its best to associate comments found in the source code to the closest element they may belong to.

Note: Some code files as exported from Structorizer might cause errors on re-import if they haven't been completed manually before. (Watch out for TODO or FIXME comments in the generated code.) Declaration sections e.g. in Pascal export frm earlier Structorizer versions might only contain comments, but Pascal source files are not allowed to contain empty declaration areas (e.g. a var keyword without variables being declared after it).

Note: Global declarations and initialisations (e.g. from C source files) will be placed into so called Includable diagrams, which are referenced in the "include list" of the main program diagram (if one emerged from the file) and those routine diagrams that refer to them. Additionally, global declarations will be presented in a light cyan background colour after import. The C import will only support a minimum subset of the C pre-processor (simple #defines without arguments); #include and #if in any variant may NOT be expected to work. If the code strongly depends on them then you may run the C source code through your compiler's pre-processor (for example gcc -E) and import the pre-processed source in order to compare the results or pass the parser restrictions.

Typical error display on parsing failure

If the code to be imported is not compatible with the used import grammar then you will be presented an error dialog as shown above where always the last 10 lines of code are shown for better orientation - for the line numbers might not exactly match those of the original code because usually some problematic pieces of the source may have been cut off in a preprocessing phase. In the last line, a little arrow symbol (») indicates the character or token where the parser detected a problem. As usual, the parsing failure may actually have been caused by preceding parts of the code. The message box also tells you what kind of symbols the parser had expected. For a deeper analysis you may inspect the import log file placed in the folder of the imported file. Its name is automatically derived from the source file itself.

Batch Import

Structorizer may also be used in batch mode to convert a source file (Pascal, C, or COBOL) into an NSD file. The command syntax is given below, where the underlined pseudo program name Structorizer is to be replaced with the respective batch or shell script name for the console environment:

  • structorizer.sh for Linux, UNIX, and the like;
  • Structorizer.bat for Windows.

The scripts can be found in the Structorizer installation directory; don't try with Structorizer.exe! The Java WebStart installation is not suited, either - you need the unzipped respective downloadable version.

Structorizer -p [-f] [-e encoding] [-v log-directory] [-o output-file] source-file ...

The options mean:

-p must be the first option and indicates the use as parser, i.e. for code import. Structorizer will conclude from the file name extensions what code parser is to be used. This holds on a file per file basis, i.e. the source-file list may even consist of heterogenous files, say Pascal, C, COBOL or whatever parsers will be available.

-e (followed by a charset name) is reserved for the choice of the source file character set (for Pascal import it's still rather irrelevant, though, because the used Pascal grammar doesn't cope with any non-ASCII characters, such that these are simply eliminated in a pre-parsing step).

-f forces overwriting an existing file with same name as the designated output file (see -o), otherwise an output file name modification by an added or incremented counter will be done instead (e.g. output.nsd -> output.0.nsd), thus preserving the existing file.

-o (followed by a file path or name) specifies the output file name. If not given, the output file name will be derived from the source file name by replacing the file name extension with ".nsd". The file name extension ".nsd" is ensured in either case. If several source files were given then without option -o the nsd file names will be derived from the corresponding source file names; with option -o, however, the name variation described for the absence of option -f would be used (creating files output-file.nsd, output-file.0.nsd, output-file.1.nsd etc.). If several diagrams emerge from a source file then the respective function signature will be appended to the base original file name, e.g. output-file.sub1-0.nsd, output-file.sub2-4.nsd etc.

-v (followed by a directory path) induces that for each imported file source-file a corresponding log file log-directory/source-file.log will be created in the specified folder, where preprocessor, parser and diagram builder write their log data into ("verbose mode"). These log files might help diagnosting parser trouble.

source-file (one or many file paths/names) the code files to be parsed and converted to Nassi-Shneiderman diagrams (nsd files).

Examples:

Structorizer.sh -p testprogram.pas

The above Linux/UNIX command imports file "testprogram.pas" from the current directory as Pascal source and will create the resulting nsd file with name "testprogram.nsd" (if it is a single diagram).

Structorizer.bat -p -e UTF-8 -o quicksort.nsd qsort.pas

This MSDOS command imports file "qsort.pas" (from the current directory) as UTF-8-encoded Pascal file and stores the resulting structogram in file "quicksort.nsd".

Structorizer.bat -p -e ISO-8859-1 -v . foo.c bar.pas

This MSDOS command parses the source files "foo.c" (as C file) and "bar.pas" (as Pascal file), assuming both to be encoded with ISO-8859-1 character set, storing the resulting diagrams by default as "foo.nsd" (plus possibly "foo.0.nsd", "foo.1.nsd", etc.) and "bar.nsd" (plus possibly "bar.0.nsd", "bar.1.nsd", ... etc.) in the current folder. It also writes log files "foo.c.log" and "bar.pas.log" to the current directory (option "-v .").