STRUCTORIZER User Guide


Source Code Import

Structorizer allows you to derive a structogram from a given source code file (reverse engineering). So far, this import feature is only available for CLI Pascal, C (ANSI-C99), Java (SE 8), COBOL, and Processing files; other programming languages are likely to follow.

Be aware that the grammars used by Structorizer for parsing the source code are usually somewhat simplified, so you might face parser errors with some correct code samples that are simply too complex for reverse engineering or contain peculiarities Structorizer cannot cope with. In particular, Structorizer cannot sensibly import so-called "spaghetti code", i.e. code that makes use of GO TO instructions or other features of the source language that are incompatible with the ideas and concepts of structured programming. Code with pointers will pass the syntax analysis, but the resulting diagrams won't be executable because Executor doesn't support pointer types. You may have to experiment with some language-specific Import Preferences or manually pre-process such code files (e.g. cut some parts out, modify others) in order to be able to import at least the essential algorithmic structure. See also section Troubleshooting.

In interactive mode, you can import code files of any supported programming language just by dragging the corresponding sources onto Structorizer. The respective parser will automatically be selected based on the file name extension. If the decision is ambiguous then you will be presented with a choice menu to select the most appropriate parser.

Another way to achieve the same goal is to use the menu, i.e. "File › Import › Source Code ...".

In the file chooser dialog that will open you may select the appropriate file filter (combobox at the bottom of the dialog), both to restrict the search and to disambiguate the parser choice:

Import file filter choice in file dialog

If the name extension of the selected file does not match the file filter of any of the available parsers then a choice dialog will open requesting to associate the intended parser (via the related file filter) or to cancel:

Parser choice dialog for ambiguous import files

The importer will parse the file according to a provided grammar and, if that has succeeded, synthesize control structures from the derived parse tree. Certain control keywords (or standard function names) of the source language recognized in the instruction texts may be replaced by the corresponding parser preferences for the same structures, as currently configured.

A code import monitor shows you what phase the import process is working in and the rough progress:

Code Import Monitor (version 3.28-05), working

This is what it may look like on completion of an import:

Code Import Monitor (version 3.28-05), completed

The monitor allows you to abort the import process via the "Cancel" button:

Code Import Monitor (version 3.28-05), cancelled

If an error occurred then you will get a similar picture but with the information that errors occurred (after accepting you would be shown the error description in a separate window, see further below):

Code Import Monitor (version 3.28-05), failed

The import of a single file may produce many diagrams (one for each function or routine plus some diagrams for shared stuff). These diagrams will be poured into the Arranger unless their number exceeds a configurable limit (see Import Preferences):

Arranger with imported diagrams

If the limit is exceeded, you will be offered the option to save the diagrams instead, and only the assumed main or most important diagram will be displayed:

Offer to save the diagrams rather than to display them

When you select the target directory and accept the proposed name or specify a different name for the first of the files, you may opt for automatic acceptance of the name proposals for all remaining files via an "accessory" checkbox on the right-hand side of the file chooser dialog:

File chooser with accessory for SaveAll

If you would overwrite an existing file, you will still be warned and may freely decide to modify the name, to overwrite the old file, to skip this file, or to cancel the serial saving activity. (This opportunity will always occur if you use the "Save All" menu item or button with several files that have never been saved.)

Note: Since version 3.30-11, it is possible to disable all interactive code import opportunities (they won't even appear in the menu). To activate this mode, do the following:

Place a structorizer.ini file in the installation directory as predominant ini file. It must contain the line "noExportImport=1" (to be inserted manually, e.g. by means of a text editor).
Other settings in this file are not necessary unless they shall be predominant.
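
The preparation described above could be scripted roughly as follows. This is only a sketch: INSTALL_DIR is a hypothetical placeholder for your actual installation directory (a temporary directory stands in for it if the variable is unset), and only the key "noExportImport=1" is taken from the description above.

```shell
#!/bin/sh
# INSTALL_DIR is an assumption: point it at your real Structorizer installation directory.
# If it is unset, a throw-away temporary directory is used so that the sketch is harmless.
INSTALL_DIR="${INSTALL_DIR:-$(mktemp -d)}"
INI_FILE="$INSTALL_DIR/structorizer.ini"

# Create the file if necessary and append the line only if it is not there yet.
touch "$INI_FILE"
grep -qx 'noExportImport=1' "$INI_FILE" || printf '%s\n' 'noExportImport=1' >> "$INI_FILE"

# Show the resulting predominant ini file.
cat "$INI_FILE"
```

Running the script a second time leaves the file unchanged, because the line is only appended when it is missing.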

Syntactical Restrictions

Pascal Code Import

Note that Pascal files to be imported must have a program, unit, package, or library header. If you want to convert a bare subroutine (procedure or function) definition, you have to embed it in one of these compilable units first, e.g. in a unit or between

program dummy;

and

begin
end.
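
The embedding described above can be automated with a few shell lines. The following is a minimal sketch; the file names bare.pas and wrapped.pas and the function Twice are purely hypothetical, only the "program dummy;" header and the empty "begin ... end." body come from the text above.

```shell
#!/bin/sh
WORK_DIR="$(mktemp -d)"

# A hypothetical bare subroutine definition that cannot be imported on its own.
cat > "$WORK_DIR/bare.pas" <<'EOF'
function Twice(x: integer): integer;
begin
  Twice := 2 * x;
end;
EOF

# Embed it between a dummy program header and an empty main block.
{
  printf 'program dummy;\n\n'
  cat "$WORK_DIR/bare.pas"
  printf '\nbegin\nend.\n'
} > "$WORK_DIR/wrapped.pas"

cat "$WORK_DIR/wrapped.pas"
```

The resulting wrapped.pas is the file you would then hand to the import.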

Since version 3.32-18, even some ObjectPascal / Delphi 7 sources may be imported producing half-way sensible sets of diagrams. Because Nassi-Shneiderman diagrams were not designed for OOP, some dirty tricks had to be used to present classes and methods in a reasonable way. See Java import for the taken compromises, which apply here in a similar way. (Analyser will of course complain about almost everything in the resulting diagrams.)

C Code Import

C source files should not make excessive use of externally defined preprocessor symbols like __stdcall, __thiscall, or __cdecl. You may, however, declare such symbols as "redundant" defines in the Import Preferences such that they would be removed automatically before the actual parsing begins.
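
If you would rather strip such symbols from a copy of the source yourself, a plain sed call may be enough. This is a deliberately naive sketch and not what Structorizer does internally: it removes the three symbols named above wherever they occur as substrings, so it would also damage identifiers that merely contain them. The file names api.c and api_clean.c are hypothetical.

```shell
#!/bin/sh
WORK_DIR="$(mktemp -d)"

# Hypothetical C source using calling-convention symbols.
cat > "$WORK_DIR/api.c" <<'EOF'
int __stdcall add(int a, int b) { return a + b; }
void __cdecl reset(void) { }
EOF

# Remove each symbol together with the blanks that follow it.
sed -e 's/__stdcall *//g' \
    -e 's/__thiscall *//g' \
    -e 's/__cdecl *//g' \
    "$WORK_DIR/api.c" > "$WORK_DIR/api_clean.c"

cat "$WORK_DIR/api_clean.c"
```

Declaring the symbols as "redundant" defines in the Import Preferences remains the more convenient route, since it leaves the original file untouched.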

NOTE: Release 3.30 removed the deprecated ANSI-C73 parser, which had very inconvenient syntactical limitations, e.g. it did not accept the reserved word "unsigned" alone (without subsequent scalar type name like "int" or "short") as typename. Source code containing function pointers could not be parsed, either.

COBOL Code Import

COBOL file import is a somewhat delicate task. First make sure that the format (fixed-format or not) of the file is correctly chosen in the COBOL-specific Import Preferences. The syntax of the language is very peculiar, and some of the strangest constructs may not have been implemented in the import.

Certain types of statements may require manual or, optionally, automatic postprocessing, e.g. PERFORM THRU instructions are primarily converted to multi-line CALL elements (which cause Analyser warnings) and will therefore have to be split into single calls and cleaned up afterwards. By default this is done automatically after the import, but the COBOL-specific Import Preferences offer you the choice among three modes of assistance. If you decline automatic tidying then specifically inserted comments will suggest sensible steps for tidying up (since version 3.32-09):
Multi-line CALL resulting from COBOL import

The generated diagrams may also contain functionless (and therefore permanently disabled) marker elements (derived from the CALL element type, see there for examples) showing you e.g. the places of paragraph or section labels in the COBOL code (where subroutine code may have been extracted) or indicating COBOL statements that exceeded the capabilities of Structorizer import (the latter ones usually in signal red).

Another example of import deficiencies: Some COBOL expressions like X IS ALPHABETIC-LOWER without a direct functional equivalent in Structorizer will be transformed into expressions with function syntax but no executable implementation behind them (here: isString(x) and isAlphabetic_lower(x)) in order to convey the meaning.

Java Code Import (since release 3.31)

Most Java source files can be imported if they comply with Java SE 8 syntax and don't make use of lambda expressions. There are some syntactical pitfalls, however:

The closing angular brackets of nested type arguments used to cause syntax errors if there was no space between them (they were mistaken for shift operators), e.g. in
HashMap<String, Stack<Integer>> doesntWork.
Version 3.32-18 introduced an import option "Separate >> of type parameters to > >" to automatically insert spaces between right angular brackets where they belong (but sparing actual shift operators) in the file preprocessing phase. The option is active by default. In some cases, however, it might compromise >> or >>> operators. If this happens to be a show stopper then you should switch the option off and insert the blanks manually instead until the parsing succeeds.
Empty type parameter lists, as e.g. in new ArrayList<>(Arrays.asList(values)), will automatically be removed before the parsing. In certain cases the same may happen for unspecific type parameters like in Class<?>[].
Annotations are not accepted in all positions where the Java language specification allows them. As they are of no importance in Structorizer, you might simply remove them or comment them out before the import.
Anonymous inner classes like in the following code snippet will only be converted into diagram elements if the Java-specific import option "Dissect anonymous inner classes into diagrams" (introduced with version 3.32-17) is enabled. Otherwise the code would simply be placed as expression (i.e., more or less as is) into the surrounding context element (here an Instruction element):
BaseClass b = new BaseClass(){int doSomethingDifferent(int a){return a * 13;}};
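
For the first pitfall in the list above (closing angular brackets of nested type arguments), the manual insertion of blanks can be approximated with sed. The sketch below is intentionally crude and is not the algorithm behind the import option: it separates every pair of adjacent ">" characters, so it also breaks genuine shift operators and should only be applied to lines you have checked. The function name separate_angle_brackets is hypothetical.

```shell
#!/bin/sh
# Repeatedly split ">>" into "> >" until no adjacent pair is left on the line.
separate_angle_brackets() {
  sed -e ':again' -e 's/>>/> >/' -e 't again'
}

# The declaration from the list above becomes parseable ...
printf '%s\n' 'HashMap<String, Stack<Integer>> doesntWork;' | separate_angle_brackets
# prints: HashMap<String, Stack<Integer> > doesntWork;

# ... but a real shift operator is damaged as well (the known limitation).
printf '%s\n' 'int y = x >> 2;' | separate_angle_brackets
# prints: int y = x > > 2;
```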

You should not expect the diagrams resulting from Java code import to be executable in Structorizer: Java is too deeply OOP-based, and we can't provide a sufficient class context. Therefore only limited efforts were made to convert the Java source style to the syntactic preferences in Structorizer (in the Import Options, there is a Java-specific checkbox to configure the degree of syntactical conversions, however). More conversion would have alienated the content too much without significantly improving the chance to run or debug the diagrams. The major benefit of a Java import is assumed to be the graphical representation of the code structure, and this is what it delivers.

Java classes will be represented by Includable diagrams. They are put into the include lists of all member diagrams (classes or methods) to potentially allow them access to the fields declared as constants or variables in the class-representing Includables. In order to address the hierarchical nature of Java classes (with member and local classes etc.), a "namespace" attribute was added to the diagrams. On Java code import it is filled with the package path (on top level) or the respective class / method name prefix (on nested levels), such that the Arranger index can present the class path. If it is not empty, then even the diagram editor will show it (and allow you to manipulate it):

Demonstration of package / class paths after Java import

A checkbox menu item "Show qualifiers as prefix" in the context menu of the Arranger index allows you to switch off the display of the "class paths" (see screenshot for the Processing import below). Instead, the hierarchical relations between the diagrams would then be represented as a multi-level tree (which costs more update time, though):

Arranger index with multi-level tree representation of inner classes

In addition to the subroutine diagrams, which are representing methods of a class, the respective method diagram headers (the declarations) are also inserted as permanently disabled pseudo-CALL elements (if the general import option "Import variable (and method) declarations" is selected). In order to improve legibility the inner text areas of these diverted CALL elements will not be hatched (in contrast to usual disabling) since version 3.32-20. These pseudo-CALLs serve just as sort of links to the method diagrams — via the menu entry "Edit Sub-routine ..." you can summon the referenced diagram into an additional Structorizer window for inspection:

Class diagram with method declaration

As mentioned above, you may opt out of their import (together with mere variable declarations). You may of course also simply remove the declarations after having imported them, or you could hide them via display mode Hide mere declarations.

Processing Code Import (since release 3.31)

Processing source code is like Java code with an implicit class on top level and a set of built-in functions and variables, which may be regarded as methods and fields of that implicit outer class. Unlike Java, it is not an all-purpose language but dedicated to 2D and 3D graphic presentations. No main method is required to start. Instead, a setup() method is implicitly called as initialization. Then a method draw() is run in an implicit eternal loop after the initialization. On import, Structorizer reflects the usual Processing behaviour by placing the respective calls into the top-level Includable and a related program diagram (see screenshots below for an example from the language reference).

The top-level includable for a Processing example

Simulated main Processing program

Structorizer reflects some of the standard "Processing" constants in the Includable diagram (see above, way more of them since version 3.31-03, now placed in a separate Includable diagram). If the imported code contained individual classes, they will be represented in analogy to the Java import with hierarchy-reflecting qualifiers in the Arranger index. (Via the context menu, the qualifier prefix display may be switched off and give way to a deep tree representation instead):

Arranger index with imported Processing diagrams

The Processing parser still does not cope well with import directives at the beginning of the code. So better comment them out before you try to parse a pde file.

Subroutines / Methods

A file comprising several routine definitions (or a class with several methods) will be converted into a set of separate diagrams, one for each of the routines, and — if not empty — another one for the main program. If you do the import within the application (rather than as batch job, see below) then the imported diagrams will be collected in the Arranger (if the configurable diagram number threshold as introduced with version 3.28-05 isn't exceeded).
Be aware that subroutine calls (references to other functions) can only be identified as such by Structorizer if the corresponding routine definitions are also available in the imported code. Otherwise the respective lines of code will usually be inserted as ordinary Instruction elements but may manually be transmuted to equivalent CALL elements (at least if you intend to run the algorithm with Executor). This element transmutation, however, can be done by a simple mouse click. The capabilities of identifying standard routines or library functions for which there would be analogous built-in routines in Structorizer are still rather poor. (But improvements are planned.)

Definitions and Declarations

Type definitions (particularly for record/struct types) and constant definitions may be essential for the interpretability of expressions. Therefore they will be imported (since release 3.27). In the resulting diagram set, they will occupy Instruction elements, possibly in Includable diagrams, as they are usually needed globally. Constant definitions will be converted to assignment instructions (typically dyed in rosé and equipped with a comment "constant!" by the respective parser). Initialized variable declarations will also be imported as assignments.

The import of mere variable declarations (i.e. without initial value assignment) may be enabled via the Import Preferences dialog. Imported (local) declarations will typically be coloured in a faint green. (Since version 3.26-02, declarations in Pascal or VisualBasic style are tolerated as content of Instruction elements.)

When you import a class (Java or Processing) then method declarations may be added to the Includable diagram that will represent the imported class. These are permanently disabled elements shaped like a CALL and referencing the respective method diagrams. The creation of these method reference elements depends on the same import option as for variable declaration import.

Comments

The import of comments may also be enabled via the Import Preferences dialog. Structorizer tries its best to associate comments found in the source code to the closest element they may belong to.

Note: Some code files as exported from Structorizer might cause errors on re-import if they haven't been completed manually before. (Watch out for TODO or FIXME comments in the generated code.) Declaration sections, e.g. in Pascal exports from earlier Structorizer versions, might only contain comments, but Pascal source files are not allowed to contain empty declaration areas (e.g. a var keyword without variables being declared after it).

Global and Shared Stuff

Note: Global declarations and initialisations (e.g. from C source files) will be placed into so called Includable diagrams, which are referenced in the "include list" of the main program diagram (if one emerged from the file) and those routine diagrams that refer to them. Additionally, global declarations will be presented in a light cyan background colour after import. The C import will only support a minimum subset of the C pre-processor (simple #defines without arguments); #include and #if in any variant may NOT be expected to work. If the code strongly depends on them then you may run the C source code through your compiler's pre-processor (for example gcc -E) and import the pre-processed source in order to compare the results or pass the parser restrictions.
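
The pre-processing step mentioned above might look like the following sketch. The file names demo.c and demo_pp.c are hypothetical, and the additional gcc option -P (which suppresses the line markers gcc normally writes into pre-processed output) is not mentioned in the text above; it is added here on the assumption that a cleaner file is easier to import. The gcc call is skipped if no gcc is installed.

```shell
#!/bin/sh
WORK_DIR="$(mktemp -d)"

# Hypothetical source that depends on #if, which the C import does not support.
cat > "$WORK_DIR/demo.c" <<'EOF'
#define LIMIT 10
#if LIMIT > 5
int limit(void) { return LIMIT; }
#endif
EOF

# Let the compiler's pre-processor resolve the directives (only if gcc is available).
if command -v gcc >/dev/null 2>&1; then
  gcc -E -P "$WORK_DIR/demo.c" -o "$WORK_DIR/demo_pp.c"
  cat "$WORK_DIR/demo_pp.c"
fi
```

Note that pre-processing also expands all #include directives, so real-world files can grow considerably; you may have to cut the expanded header content out again before the import.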

Troubleshooting

Typical error display on parsing failure

If the code to be imported is not compatible with the import grammar used, you will be presented with an error dialog as shown above. It always shows the last 10 lines of code for better orientation, because the line numbers might not exactly match those of the original code (some problematic pieces of the source may have been cut off in a preprocessing phase). In the last line, a little arrow symbol (») indicates the character or token where the parser detected a problem. As usual, the parsing failure may actually have been caused by preceding parts of the code. The message box also tells you what kind of symbols the parser had expected. For a deeper analysis you may inspect the import log file placed in the folder of the imported file. Its name is automatically derived from the source file itself.

Often it is an iterative process to get complex source files imported where you may have to modify some import options step by step in order to overcome certain problems (and bump into others). Sometimes it may even be necessary to modify the source file.

If you always get errors on import of apparently correct source files then the parser log file placed next to your source file (if enabled) may be helpful in the analysis. It contains the tokens read during preprocessing, possibly a token report from the actual parsing, and the reduction rules tried during the build phase. In case of an error the error message will also be present here. The content may not tell you much, but if you request help from the developer team then the parser log file will be highly appreciated.

If the pre-processing succeeds but the parsing constantly fails, such that you assume that a defective preprocessing causes the troubles, then the preprocessed intermediate source file can be very helpful. You may find it in your temp directory (location is OS-dependent). It is named
"Structorizer<cryptic_hex_number_sequence>.<extension>"
where <extension> is the source-file-specific file name extension (e.g. "c", "pas" or the like). Date and time of the file may help to identify the relevant one.
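
To pick the most recent intermediate file of a given type, a small helper like the one below may save some searching. It is a sketch under assumptions: the function name newest_intermediate is hypothetical, the temp directory is guessed as $TMPDIR with /tmp as fallback (which only fits Linux/UNIX-like systems), and the -nt file comparison is a common shell extension rather than strict POSIX.

```shell
#!/bin/sh
# Print the most recently modified file named Structorizer*.<extension> in a directory.
# $1: directory to search, $2: file name extension (e.g. c or pas)
newest_intermediate() {
  newest=""
  for f in "$1"/Structorizer*."$2"; do
    [ -e "$f" ] || continue            # pattern did not match anything
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  if [ -n "$newest" ]; then
    printf '%s\n' "$newest"
  fi
}

# Typical call: look for the latest pre-processed C file.
newest_intermediate "${TMPDIR:-/tmp}" c
```

The helper prints nothing if no matching file exists.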

A third helpful kind of file is the parse tree file, in case the parsing succeeded but the diagram builder causes trouble.

Last but not least consider the general log file being situated in the .structorizer folder of your home directory. Look for the most recent (or least out-dated) log files. You may have to close Structorizer in order to obtain a flushed file.

See Import Preferences in order to find out where you may enable the respective logging.

Batch Import

Structorizer may also be used in batch mode to convert a source file (e.g. Pascal, C, or COBOL) into an NSD file or, more typically, into a set of NSD files or an arrangement archive. The command syntax is given below, where the underlined pseudo program name Structorizer is to be replaced with the respective batch or shell script name for the console environment:

structorizer.sh for Linux, UNIX, and the like;
Structorizer.bat for Windows.

The scripts can be found in the downloadable Structorizer zip packages (you always need these for batch command); don't try with Structorizer.exe!

Structorizer (-p|--parse) [parser-name] [-f] [-z] [-e encoding] [-l max-line-length] [-v log-directory] [-s settings-file] [-o output-file] source-file ...

The options mean:

-p (or, equivalently, --parse, with versions ≥ 3.32-23) must be the first option and indicates the use as parser, i.e. for code import. Unless you provide an explicit parser-name, Structorizer will conclude from the file name extensions what code parser is to be used. This holds on a file-per-file basis, i.e. the source-file list may even be heterogeneous; for example, it may contain mixed Pascal, C, and COBOL files, as long as the respective parsers are available.

If you (optionally) specify a parser-name next to the -p switch then this will override the automatic parser detection and try to parse all files listed with this parser, no matter what file name extensions they may have (since version 3.28-05). The currently available parameter values for parser-name are (where '|' separates synonyms; ANSI-C73 aka CParser was withdrawn by release 3.30, Java-SE8 and Processing were introduced with release 3.31):

Pascal | D7Parser
ANSI-C99 | C99Parser
COBOL | COBOLParser
Java-SE8 | JavaParser
Processing | ProcessingParser

-e (followed by a charset name) is reserved for the choice of the source file character set (for Pascal import it's still rather irrelevant, though, because the used Pascal grammar doesn't cope with any non-ASCII characters, such that these are simply eliminated in a pre-parsing step).

-f forces overwriting an existing file with same name as the designated output file (see -o), otherwise an output file name modification by an added or incremented counter will be done instead (e.g. output.nsd -> output.0.nsd), thus preserving the existing file.

-l (followed by a non-negative number) specifies after how many characters a text line should be broken (wrapped) in order to avoid too long lines. The line wrapping respects syntactical units like string literals etc. (it is sort of a word wrapping). If zero is specified then automatic line breaking will be suppressed. If this option is not specified then the line limit from the import preferences held in the structorizer.ini file will be used.

-o (followed by a file path or name) specifies the output file name. If not given, the output file name will be derived from the source file name by replacing the file name extension with ".nsd". The file name extension ".nsd" is ensured in either case. If several source files were given then without option -o the nsd file names will be derived from the corresponding source file names; with option -o, however, the name variation described for the absence of option -f would be used (creating files output-file.nsd, output-file.0.nsd, output-file.1.nsd etc.). If several diagrams emerge from a source file then the respective function signature will be appended to the base original file name, e.g. output-file.sub1-0.nsd, output-file.sub2-4.nsd etc.

-s (followed by a text file path) specifies a settings-file to be used for retrieval of general and parser-specific options for the import. (Without switch -s the application defaults would be used.) The file must contain relevant key=value pairs, where the keys for parser-specific options are composed of the parser name and a corresponding import option name, joined by a dot, whereas general import option keys start with "imp", e.g.:
C99Parser.definesToConstants=true
COBOLParser.fixedColumnText=37
impComments=true
Since version 3.29-12, you may configure import options in Structorizer GUI and save just these import options to a specific ini file, see Preferences export and import. So you won't any longer have to look for the relevant keys among the randomly ordered key-value pairs in the abundant structorizer.ini file like for a needle in a haystack and then copy the strewn lines to your import setting file. Of course you can modify the values with a text editor in the selectively saved ini file without changing your settings residing in structorizer.ini. (Usually, you will adhere to the settings held in structorizer.ini, though, which is maintained via the Structorizer Import Options dialog).

-v (followed by a directory path) causes a corresponding log file log-directory/source-file.log to be created in the specified folder for each imported file source-file. The preprocessor, parser and diagram builder write their log data into it ("verbose mode"). These log files might help diagnose parser trouble.

-z specifies that in case a source file induces more than one diagram file (typically if the source contains several routines) a compressed arrangement archive (with file name extension .arrz) is generated instead of loose files (since version 3.29-09). In this case, only the archive file will inherit the source file name (or the outfile name specified with option -o), the diagram files within the archive will have file names as proposed by the created diagrams, i.e. derived from the diagram name or subroutine signature.
If option -z is not specified then at least an arrangement list file (with the name of the source file or the out file and extension .arr) will be produced for each set of diagrams emerging from one source file (as far as they are more than one). This way, it is very convenient to load the connected diagrams at once into Structorizer or Arranger.

source-file (one or many file paths/names) stands for the code files to be parsed and converted to Nassi-Shneiderman diagrams (nsd files).
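
The naming scheme described for options -f and -o (output.nsd, then output.0.nsd, output.1.nsd etc. if the name is already taken) can be mimicked in a script, e.g. to predict which file a batch run without -f will write. The function below is only an illustration of that scheme as described above, not Structorizer's own code; the name next_output_name is hypothetical.

```shell
#!/bin/sh
# Print the first free file name following the counter scheme
# name.nsd, name.0.nsd, name.1.nsd, ...
# $1: desired output path ending in .nsd
next_output_name() {
  candidate="$1"
  base="${1%.nsd}"
  counter=0
  while [ -e "$candidate" ]; do
    candidate="$base.$counter.nsd"
    counter=$((counter + 1))
  done
  printf '%s\n' "$candidate"
}

# Demonstration in a scratch directory where output.nsd already exists.
DEMO_DIR="$(mktemp -d)"
touch "$DEMO_DIR/output.nsd"
next_output_name "$DEMO_DIR/output.nsd"   # prints the path ending in output.0.nsd
```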

Examples:

structorizer.sh -p testprogram.pas

The above Linux/UNIX command imports file "testprogram.pas" from the current directory as Pascal source and will create the resulting nsd file with name "testprogram.nsd" (if it is a single diagram).

Structorizer.bat -p -e UTF-8 -o quicksort.nsd qsort.pas

This MSDOS command imports file "qsort.pas" (from the current directory) as UTF-8-encoded Pascal file and stores the resulting structogram in file "quicksort.nsd".

Structorizer.bat -p -e ISO-8859-1 -v . foo.c bar.cob

This MSDOS command parses the source files "foo.c" (as C file) and "bar.cob" (as COBOL file), assuming both to be encoded with ISO-8859-1 character set, storing the resulting diagrams by default as "foo.nsd" (plus possibly "foo.0.nsd", "foo.1.nsd", etc.) and "bar.nsd" (plus possibly "bar.0.nsd", "bar.1.nsd", etc.) in the current folder. It also writes log files "foo.c.log" and "bar.cob.log" to the current directory (option "-v .").
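
A further Linux/UNIX example combines a settings file (option -s) with archive output (option -z) and logging (option -v). It is a sketch under assumptions: the file name import_settings.ini and the logs folder are hypothetical, the three settings are the sample keys quoted in the description of option -s, and foo.c and bar.cob are assumed to exist in the current directory. The import itself is only attempted if structorizer.sh is found on the PATH.

```shell
#!/bin/sh
WORK_DIR="$(mktemp -d)"
SETTINGS="$WORK_DIR/import_settings.ini"

# Settings file with two parser-specific keys and one general ("imp...") key.
cat > "$SETTINGS" <<'EOF'
C99Parser.definesToConstants=true
COBOLParser.fixedColumnText=37
impComments=true
EOF

# Folder for the per-file log files requested via -v.
mkdir -p "$WORK_DIR/logs"

# Options are given in the order of the syntax line above.
if command -v structorizer.sh >/dev/null 2>&1; then
  structorizer.sh -p -z -v "$WORK_DIR/logs" -s "$SETTINGS" foo.c bar.cob
fi
```

With -z, each source file that yields several diagrams should result in one .arrz archive instead of a set of loose nsd files.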