S T R U C T O R I Z E R - User Guide

Source Code Import

Structorizer allows you to derive a structogram from a given source code file (reverse engineering). So far, this import feature is only available for CLI Pascal, ANSI-C (two grammar versions), and COBOL files; other programming languages (e.g. Java) are likely to follow.

Be aware that the grammars Structorizer uses to parse source code are usually somewhat simplified. You might therefore face parser errors with some correct code samples that are simply too complex for reverse engineering or that contain peculiarities Structorizer cannot cope with. In particular, Structorizer cannot sensibly import so-called "spaghetti code", i.e. code that makes use of GO TO instructions or other features of the source language that are incompatible with the concepts of structured programming. For instance, C files containing function pointers may not be accepted. Code with normal pointers will pass the syntax analysis, but the resulting diagrams won't be executable because Executor doesn't support pointer types. You may have to experiment with some language-specific Import Preferences or manually pre-process such code files (e.g. cut some parts out, modify others) in order to import at least the essential algorithmic structure. See also section Troubleshooting.

In interactive mode, you can import code files of any supported programming language just by dragging the corresponding sources onto Structorizer. The respective parser will automatically be selected based on the file name extension. If the decision is ambiguous, you will be presented with a choice menu to select the most appropriate parser.

Another way to achieve the same goal is to use the menu, i.e. "File => Import => Source Code ...".

In the file chooser dialog that will open you may select the appropriate file filter (combobox at the bottom of the dialog), both to restrict the search and to disambiguate the parser choice:

Import file filter choice in file dialog

The importer will parse the file according to a provided grammar and, if that has succeeded, synthesize control structures from the derived parse tree. Certain control keywords (or standard function names) of the source language recognized in the instruction texts may be replaced by the corresponding parser preferences for the same structures, as currently configured.

From version 3.28-05 on, a code import monitor will show you which phase the import process is in and its rough progress:

Code Import Monitor (version 3.28-05), working

This is what it may look like on completion of an import:

Code Import Monitor (version 3.28-05), completed

The monitor allows you to abort the import process via the "Cancel" button:

Code Import Monitor (version 3.28-05), cancelled

If an error occurred, you will get a similar picture but with the information that errors occurred (after accepting, you will be shown the error description in a separate window, see further below):

Code Import Monitor (version 3.28-05), failed

The import of a single file may produce many diagrams (one for each function or routine plus some diagrams for shared definitions). These diagrams will be placed in the Arranger unless their number exceeds a configurable limit (see Import Preferences):

Arranger with imported diagrams

If the limit is exceeded, you will be offered to save the diagrams instead and only the assumed main or most important diagram will be displayed:

Offer to save the diagrams rather than to display them

Syntactical Restrictions

Note that Pascal files to be imported must have a program, unit, package, or library header. If you want to convert a bare subroutine (procedure or function) definition, you first have to embed it in one of these compilable constructs, e.g. in a unit or between

program dummy;

and

begin
end.
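
The embedding described above can be automated for bare subroutine files. The following is only a minimal sketch: the function name wrap_in_program and its parameters are hypothetical helpers for illustration and not part of Structorizer.

```python
def wrap_in_program(subroutine_source: str, program_name: str = "dummy") -> str:
    """Embed a bare procedure/function definition in a minimal Pascal program.

    The result has the shape described in the text:
    a program header, the subroutine, and an empty main block.
    """
    body = subroutine_source.strip()
    return f"program {program_name};\n\n{body}\n\nbegin\nend.\n"


if __name__ == "__main__":
    bare = (
        "function increment(x: integer): integer;\n"
        "begin\n"
        "  increment := x + 1;\n"
        "end;"
    )
    # Print the wrapped source; save it with a Pascal file name extension
    # before importing it.
    print(wrap_in_program(bare))
```

The wrapped text should be saved as a separate file so that the original source stays untouched.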

As mentioned above, the ANSI-C73 parser won't cope with C source files that contain function pointers. From version 3.28-05 on, however, you may use the ANSI-C99 parser instead (it is generally recommended since its grammar is less restricted). C files should not make excessive use of externally defined preprocessor symbols like __stdcall, __thiscall, or __cdecl. You may, however, declare such symbols as "redundant" defines in the Import Preferences such that they are removed automatically before the actual parsing begins. The legacy ANSI-C73 parser won't accept the word "unsigned" alone (without a subsequent scalar type name like "int" or "short") as a type name. You will have to place e.g. "int" after any lone "unsigned" in the source file (or a copy of it) or, better, use the new ANSI-C99 parser if available.
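
For the lone "unsigned" case, a copy of the source file can be patched mechanically. The sketch below is an assumption-laden illustration, not a Structorizer feature: the helper name fix_lone_unsigned is hypothetical, and the regular expression is deliberately naive (it also rewrites occurrences inside comments and string literals).

```python
import re

# "unsigned" that is NOT followed by one of the scalar type keywords.
_LONE_UNSIGNED = re.compile(r"\bunsigned\b(?!\s+(?:char|short|int|long)\b)")


def fix_lone_unsigned(c_source: str) -> str:
    """Insert "int" after every "unsigned" that stands alone as a type name."""
    return _LONE_UNSIGNED.sub("unsigned int", c_source)


if __name__ == "__main__":
    sample = "unsigned count;\nunsigned long total;\nx = (unsigned)y;\n"
    print(fix_lone_unsigned(sample))
```

Apply it to a copy of the file and inspect the result before importing, since the pattern does not understand C syntax.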

A file comprising several routine definitions will be converted into a set of separate diagrams, one for each of the routines, and - if not empty - another one for the main program. If you do the import within the application (rather than as batch job, see below) then the imported diagrams will be collected in the Arranger (if the configurable diagram number threshold as introduced with version 3.28-05 isn't exceeded).
Be aware that subroutine calls (references to other functions) can only be identified as such by Structorizer if the corresponding routine definitions are also available in the imported code. Otherwise the respective lines of code will usually be inserted as ordinary Instruction elements but may manually be transmuted to equivalent CALL elements (at least if you intend to run the algorithm with Executor). This element transmutation, however, can be done by a simple mouse click. The capabilities of identifying standard routines or library functions for which there would be analogous built-in routines in Structorizer are still rather poor. (But improvements are planned.)

Type definitions (particularly for record/struct types) and constant definitions may be essential for the interpretability of expressions. Therefore they will be imported (since release 3.27). In the resulting diagram set, they will occupy Instruction elements, possibly in Includable diagrams, as they are usually needed globally. Constant definitions will be converted to assignment instructions (typically dyed in rosé and equipped with a comment "constant!" by the respective parser). Initialized variable declarations will also be imported as assignments.

The import of mere variable declarations (i.e. without initial value assignment) may be enabled via the Import Preferences dialog. Imported (local) declarations will typically be coloured in a faint green. (Since version 3.26-02, declarations in Pascal or VisualBasic style are tolerated as content of Instruction elements.)

The import of comments may also be enabled via the Import Preferences dialog. Structorizer tries its best to associate comments found in the source code to the closest element they may belong to.

Note: Some code files exported from Structorizer might cause errors on re-import if they haven't been completed manually first. (Watch out for TODO or FIXME comments in the generated code.) Declaration sections, e.g. in Pascal export from earlier Structorizer versions, might only contain comments, but Pascal source files are not allowed to contain empty declaration areas (e.g. a var keyword without variables being declared after it).

Note: Global declarations and initialisations (e.g. from C source files) will be placed into so-called Includable diagrams, which are referenced in the "include list" of the main program diagram (if one emerged from the file) and of those routine diagrams that refer to them. Additionally, global declarations will be presented with a light cyan background colour after import. The C import only supports a minimum subset of the C pre-processor (simple #defines without arguments); #include and #if in any variant may NOT be expected to work. If the code strongly depends on them, you may run the C source code through your compiler's pre-processor (for example gcc -E) and import the pre-processed source in order to compare the results or to get past the parser restrictions.
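
The pre-processing step mentioned above could be scripted as follows. This is a sketch under assumptions: it presumes that gcc is installed and on the search path, and the helper names are hypothetical. The -P switch is a standard gcc option that suppresses linemarker lines in the output; whether the Structorizer parsers need it is not stated here, so it is left optional.

```python
import subprocess


def preprocess_command(source_file, output_file, compiler="gcc", strip_linemarkers=False):
    """Build the command line that runs only the C pre-processor (-E)."""
    cmd = [compiler, "-E"]
    if strip_linemarkers:
        cmd.append("-P")  # omit '# <line> "<file>"' marker lines from the output
    cmd += [source_file, "-o", output_file]
    return cmd


def preprocess(source_file, output_file, **options):
    """Run the pre-processor; raises CalledProcessError if the compiler fails."""
    subprocess.run(preprocess_command(source_file, output_file, **options), check=True)


if __name__ == "__main__":
    # Only prints the command; call preprocess(...) to actually execute it.
    print(" ".join(preprocess_command("foo.c", "foo_pre.c")))
```

Giving the output file a C file name extension keeps the automatic parser selection working on import.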

Typical error display on parsing failure

If the code to be imported is not compatible with the import grammar used, you will be presented with an error dialog as shown above. It always shows the last 10 lines of code for better orientation, because the line numbers might not exactly match those of the original code: some problematic pieces of the source may have been cut off in a preprocessing phase. In the last line, a little arrow symbol (») indicates the character or token where the parser detected a problem. As usual, the parsing failure may actually have been caused by preceding parts of the code. The message box also tells you what kind of symbols the parser had expected. For a deeper analysis you may inspect the import log file placed in the folder of the imported file. Its name is automatically derived from the name of the source file.

Often it is an iterative process to get complex source files imported where you may have to modify some import options step by step in order to overcome certain problems (and bump into others). Sometimes it may even be necessary to modify the source file.

Troubleshooting

If you always get errors on import of apparently correct source files, then the parser log file placed next to your source file (if enabled) may be helpful in the analysis. It contains the tokens read during preprocessing, possibly some token report from the actual parsing, and the reduction rules tried during the build phase. In case of an error, the error message will also be present here. The content may not tell you much, but if you request help from the developer team then the parser log file will be highly appreciated.

If the pre-processing succeeds but the parsing constantly fails, such that you suspect a defective preprocessing to be the cause, then the preprocessed intermediate source file can be very helpful. You may find it in your temp directory (the location is OS-dependent). It is named "Structorizer<cryptic_hex_number_sequence>.<extension>" where the extension is the source-file-specific file name extension (e.g. "c", "pas" or the like).
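
Locating such intermediate files can be done with a small script. The sketch below assumes that the temp directory reported by Python's tempfile module is the same one the Java runtime uses, which is usually but not necessarily the case; the function name is hypothetical.

```python
import glob
import os
import tempfile


def find_preprocessed_files(extension, temp_dir=None):
    """List files named Structorizer<something>.<extension>, newest first."""
    temp_dir = temp_dir or tempfile.gettempdir()
    pattern = os.path.join(glob.escape(temp_dir), f"Structorizer*.{extension}")
    return sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    for path in find_preprocessed_files("c"):
        print(path)
```

If nothing is found, pass the actual temp directory of your system as the second argument.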

A third helpful kind of file is the parse tree file, in case the parsing succeeded but the diagram builder causes trouble.

Last but not least, there is the general log file, situated in the .structorizer folder of your home directory. Look for the most recent log files. You may have to close Structorizer in order to obtain a flushed file.

See Import Preferences in order to find out where you may enable the respective logging.

Batch Import

Structorizer may also be used in batch mode to convert a source file (Pascal, C, or COBOL) into an NSD file. The command syntax is given below, where the pseudo program name Structorizer is to be replaced with the respective batch or shell script name for the console environment:

  • structorizer.sh for Linux, UNIX, and the like;
  • Structorizer.bat for Windows.

The scripts can be found in the Structorizer installation directory; don't try with Structorizer.exe! The Java WebStart installation is not suited, either - you need the unzipped respective downloadable version.

Structorizer -p [parser-name] [-f] [-e encoding] [-v log-directory] [-s setting-file] [-o output-file] source-file ...

The options mean:

-p must be the first option and indicates the use as parser, i.e. for code import. Unless you provide an explicit parser-name, Structorizer will conclude from the file name extensions which code parser is to be used. This holds on a file-by-file basis, i.e. the source-file list may even be heterogeneous, say contain mixed Pascal, C, and COBOL files, for whatever parsers are available. In case of ambiguity (in particular for C source and header files, where two parsers are available in version 3.28-05), Structorizer will interactively ask for your favourite parser among the applicable ones:
Console window with interactive parser menu
If you (optionally) specify a parser-name next to the -p switch, then this will override the automatic parser detection and try to parse all listed files with this parser, no matter what file name extensions they may have (versions ≥ 3.28-05). The currently available parameter values for parser-name are (where synonyms are separated with '|' within a single line):

  • Pascal | D7Parser
  • ANSI-C73 | CParser
  • ANSI-C99 | C99Parser
  • COBOL | COBOLParser
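
For scripts that wrap the batch import, the synonym list above can be kept as a lookup table. This is merely an illustrative sketch; the table and function names are hypothetical, and the matching is case-sensitive because the guide does not say whether Structorizer ignores case here.

```python
# Left: descriptive name, right: the synonym ending in "Parser" (from the list above).
PARSER_SYNONYMS = {
    "Pascal": "D7Parser",
    "ANSI-C73": "CParser",
    "ANSI-C99": "C99Parser",
    "COBOL": "COBOLParser",
}


def canonical_parser_name(name):
    """Map either synonym to the "...Parser" form; reject unknown names."""
    if name in PARSER_SYNONYMS:
        return PARSER_SYNONYMS[name]
    if name in PARSER_SYNONYMS.values():
        return name
    raise ValueError(f"unknown parser name: {name!r}")


if __name__ == "__main__":
    print(canonical_parser_name("ANSI-C99"))
```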

-e (followed by a charset name) is reserved for the choice of the source file character set (for Pascal import it's still rather irrelevant, though, because the used Pascal grammar doesn't cope with any non-ASCII characters, such that these are simply eliminated in a pre-parsing step).

-f forces overwriting an existing file with same name as the designated output file (see -o), otherwise an output file name modification by an added or incremented counter will be done instead (e.g. output.nsd -> output.0.nsd), thus preserving the existing file.

-o (followed by a file path or name) specifies the output file name. If not given, the output file name will be derived from the source file name by replacing the file name extension with ".nsd". The file name extension ".nsd" is ensured in either case. If several source files were given then without option -o the nsd file names will be derived from the corresponding source file names; with option -o, however, the name variation described for the absence of option -f would be used (creating files output-file.nsd, output-file.0.nsd, output-file.1.nsd etc.). If several diagrams emerge from a source file then the respective function signature will be appended to the base original file name, e.g. output-file.sub1-0.nsd, output-file.sub2-4.nsd etc.
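
The naming rules for -o and -f can be approximated by the following sketch. It is an interpretation of the description above, not Structorizer's actual code: the function name is hypothetical, it assumes that ".nsd" is appended when the given output name has a different extension, and it assumes that the counter takes the first free number starting at 0.

```python
import os


def derive_output_name(source_file, output_file=None, force=False, exists=os.path.exists):
    """Return the .nsd file name a batch import would presumably write to."""
    if output_file is None:
        base = os.path.splitext(source_file)[0]   # replace the source extension
    elif output_file.lower().endswith(".nsd"):
        base = output_file[:-4]
    else:
        base = output_file                        # assumption: ".nsd" is appended
    candidate = base + ".nsd"
    if force or not exists(candidate):
        return candidate
    counter = 0                                   # e.g. output.nsd -> output.0.nsd
    while exists(f"{base}.{counter}.nsd"):
        counter += 1
    return f"{base}.{counter}.nsd"


if __name__ == "__main__":
    print(derive_output_name("qsort.pas", "quicksort"))
```

The exists parameter only serves to try the logic against a simulated set of file names instead of the real file system.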

-s (followed by a text file path) specifies a settings-file other than the structorizer.ini file in your home directory, which is used as default for retrieval of parser-specific options for the import. The file must contain relevant key=value pairs, where the key is composed of the parser name and a corresponding import option name, both glued with a dot, e.g.:
COBOLParser.fixedColumnText=37
You may find the relevant keys in the structorizer.ini file, copy the lines to your import settings file, and modify the values there without changing structorizer.ini. (Usually, though, you will adhere to the settings in structorizer.ini, which is maintained via the Structorizer Import Preferences dialog.)
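
To pick the lines for one parser out of an existing ini file, something like the following sketch may help. It assumes plain key=value lines with '#' introducing comment lines; the function name is hypothetical, and any key other than COBOLParser.fixedColumnText in the demo is made up for illustration.

```python
def read_import_settings(ini_text, parser_name):
    """Collect the options of one parser from key=value lines.

    Keys have the form <parser name>.<option name>, as described above.
    """
    settings = {}
    for line in ini_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        prefix, dot, option = key.strip().partition(".")
        if dot and prefix == parser_name:
            settings[option] = value.strip()
    return settings


if __name__ == "__main__":
    demo = (
        "# demo content\n"
        "COBOLParser.fixedColumnText=37\n"
        "CParser.someOption=1\n"   # hypothetical key, for illustration only
    )
    print(read_import_settings(demo, "COBOLParser"))
```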

-v (followed by a directory path) has the effect that for each imported file source-file a corresponding log file log-directory/source-file.log will be created in the specified folder, to which preprocessor, parser and diagram builder write their log data ("verbose mode"). These log files might help in diagnosing parser trouble.

source-file (one or many file paths/names) specifies the code files to be parsed and converted to Nassi-Shneiderman diagrams (nsd files).

Examples:

structorizer.sh -p testprogram.pas

The above Linux/UNIX command imports file "testprogram.pas" from the current directory as Pascal source and will create the resulting nsd file with name "testprogram.nsd" (if it is a single diagram).

Structorizer.bat -p -e UTF-8 -o quicksort.nsd qsort.pas

This MSDOS command imports file "qsort.pas" (from the current directory) as UTF-8-encoded Pascal file and stores the resulting structogram in file "quicksort.nsd".

Structorizer.bat -p -e ISO-8859-1 -v . foo.c bar.pas

This MSDOS command parses the source files "foo.c" (as C file) and "bar.pas" (as Pascal file), assuming both to be encoded with ISO-8859-1 character set, storing the resulting diagrams by default as "foo.nsd" (plus possibly "foo.0.nsd", "foo.1.nsd", etc.) and "bar.nsd" (plus possibly "bar.0.nsd", "bar.1.nsd", ... etc.) in the current folder. It also writes log files "foo.c.log" and "bar.pas.log" to the current directory (option "-v .").
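
When the batch import is driven from another program, the argument list can be assembled in the option order of the syntax line given earlier. The sketch below only builds the list; the function and parameter names are hypothetical, while the switches are the ones documented in this section.

```python
def build_import_command(script, sources, parser_name=None, force=False,
                         encoding=None, log_dir=None, settings_file=None,
                         output_file=None):
    """Assemble: script -p [parser-name] [-f] [-e enc] [-v dir] [-s file] [-o file] source..."""
    if not sources:
        raise ValueError("at least one source file is required")
    cmd = [script, "-p"]              # -p must be the first option
    if parser_name:
        cmd.append(parser_name)
    if force:
        cmd.append("-f")
    if encoding:
        cmd += ["-e", encoding]
    if log_dir:
        cmd += ["-v", log_dir]
    if settings_file:
        cmd += ["-s", settings_file]
    if output_file:
        cmd += ["-o", output_file]
    return cmd + list(sources)


if __name__ == "__main__":
    print(build_import_command("Structorizer.bat", ["foo.c", "bar.pas"],
                               encoding="ISO-8859-1", log_dir="."))
```

The resulting list can be handed to subprocess.run, provided the script path points to the batch or shell script in the installation directory.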