Training Module 22: Using Pipes and Tokens

The slides used in the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 22, Using Pipes and Tokens.

Slide 2

Upon completion of this module, you should be able to:

  • Describe uses for the SAFR piping feature
  • Read a Logic Table and Trace with piping
  • Describe uses for the SAFR token feature
  • Read a Logic Table and Trace with tokens
  • Debug piping and token views

Slide 3

This module covers three special logic table functions: the WRTK function, which writes a token; the RETK function, which reads a token; and the ET function, which signifies the end of a set of token views.  These functions are analogous to the WR functions, the RENX function, and the ES function used for non-token views.

Using SAFR Pipes does not involve special logic table functions.

Slide 4

This module gives an overview of the Pipe and Token features of SAFR.

SAFR views often read data from, or pass data to, other views, creating chains of processes, with each view performing one portion of the overall transformation.

Pipes and Tokens are ways to improve the efficiency with which data is passed from one view to another.

Pipes and Tokens are only useful to improve efficiency of SAFR processes; they do not perform any new data functions not available through other SAFR features.

Slide 5

A pipe is a virtual file passed in memory between two or more views. A pipe is defined as a Physical File of type Pipe in the SAFR Workbench.

Pipes avoid unnecessary input and output. If View 1 outputs a disk file and View 2 reads that file, then time is wasted writing output from View 1 and reading input into View 2. This configuration would typically require two passes of SAFR, one processing View 1 and a second pass processing View 2.

If a pipe is placed between View 1 and View 2, the records stay in memory, no I/O time is wasted, and both views are executed in a single pass.

The pipe consists of blocks of records in memory.
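
The behavior can be pictured as a producer/consumer exchange of record blocks between two threads. The following is a minimal sketch of that idea in Python, illustrative only and not SAFR code; the block size, record values, and queue depth are all invented for the example.

```python
# Minimal sketch of a pipe: blocks of records passed in memory between
# an asynchronous writer thread and reader thread (illustrative only,
# not SAFR code; block size and data are invented for the example).
import queue
import threading

BLOCK_SIZE = 4                      # records per pipe block (assumed)
pipe = queue.Queue(maxsize=2)       # the "pipe": a small set of in-memory blocks

def pipe_writer(view1_input):
    """View 1: write extracted records to the pipe, one block at a time."""
    block = []
    for record in view1_input:
        block.append(record)
        if len(block) == BLOCK_SIZE:
            pipe.put(block)         # hand a full block to the reading thread
            block = []
    if block:
        pipe.put(block)             # flush the final partial block
    pipe.put(None)                  # signal end of the virtual file

def pipe_reader():
    """View 2: read blocks from the pipe as they become available."""
    while (block := pipe.get()) is not None:
        for record in block:
            print("View 2 processed", record)

writer = threading.Thread(target=pipe_writer, args=(range(1, 13),))
reader = threading.Thread(target=pipe_reader)
writer.start()
reader.start()
writer.join()
reader.join()
```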

Slide 6

Similarly, a token is a named memory area. The name is used for communication between two or more views.

Like a pipe, it allows data to be passed in memory between two or more views.

But unlike pipes, which are virtual files and allow asynchronous processing, tokens pass one record at a time synchronously between the views.

Slide 7

Because pipes simply substitute a virtual file for a physical file, views writing to a pipe run in a separate thread from views reading the pipe.

Thus, for pipes, the two threads run asynchronously.

Tokens, though, pass data one record at a time between views.  The views writing the token and the views reading the token run in the same thread.

Thus, for tokens, there is only one thread, and the views run synchronously.
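
As a contrast to the pipe sketch above, the following minimal sketch models a token as a single named record area that is written and immediately read within one thread; it is illustrative only, and the view functions and record values are invented.

```python
# Minimal sketch of a token: a single named record area, processed
# synchronously in one thread (illustrative only, not SAFR code).
token = {}                          # the named memory area holds one record at a time

def token_reader_view2(record):
    print("View 2 read token:", record)

def token_reader_view3(record):
    print("View 3 read token:", record)

token_readers = [token_reader_view2, token_reader_view3]

def write_token(record):
    """Writing the token immediately calls every token-reading view."""
    token["record"] = record        # the previous record is overwritten
    for reader in token_readers:
        reader(token["record"])

# One thread drives everything: each input record is written to the token
# and fully consumed by the readers before the next record is handled.
for input_record in ["order-1", "order-2", "order-3"]:
    write_token(input_record)
```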

Slide 8

An advantage of pipes is that they introduce parallelism, which can decrease the elapsed time needed to produce an output.  In this example, View 2 runs in parallel to View 1. After View 1 has filled block 1 with records, View 2 can begin reading from block 1.  While View 2 is reading from block 1, View 1 is writing new records to block 2. This is one form of parallelism in SAFR which improves performance. Without it, View 2 would have to wait until all of View 1's processing was complete.

Slide 9

Pipes can be used to multiply input records, creating multiple output records in the piped file, all of which will be read by the views reading the pipe.  This can be used to perform an allocation-type process, where a value on one record is divided across multiple other records.

In this example, a single record in the input file is written by multiple views into the pipe.  Each of these records is then read by the second thread.
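
As an illustration of this allocation pattern, the hedged sketch below splits each input amount across several invented buckets before the records enter the pipe; the percentages and field names are assumptions, not SAFR definitions.

```python
# Minimal sketch of record multiplication through a pipe (illustrative only).
# Each input record is split into several allocated records; the percentages
# and field names are invented for the example.
ALLOCATION = {"dept_a": 0.50, "dept_b": 0.30, "dept_c": 0.20}

def allocate(record):
    """One input record becomes several pipe records."""
    for dept, share in ALLOCATION.items():
        yield {"order_id": record["order_id"],
               "dept": dept,
               "amount": round(record["amount"] * share, 2)}

pipe_records = []
for input_record in [{"order_id": 1, "amount": 100.00}]:
    pipe_records.extend(allocate(input_record))

# The views reading the pipe see three records for the single input record.
for rec in pipe_records:
    print(rec)
```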

Slide 10

Multiple views can write tokens, but because a token can only contain one record at a time, the token reader views must be executed after each token write.  Otherwise the token would be overwritten by a subsequent token-writing view.

In this example, on the left a single view, View 1, writes a token, and views 2 and 3 read the token. View 4 has nothing to do with the token, reading the original input record like View 1.

On the right, both views 1 and 4 write to the token.  After each writes to the token, all views which read the token are called. 

In this way, tokens can be used to “explode” input event files, like pipes.

Slide 11

Both pipes and tokens can be used to conserve or concentrate computing capacity.  For example, CPU-intensive operations such as lookups or lookup exits can be performed once in the views that write to pipes or tokens, and the results added to the record that is passed on to dependent views.  The dependent, reading views then do not have to perform these same functions.

In this example, the diagram on the left shows the same lookup performed in three different views.  The diagram on the right shows how, using piping or tokens, this lookup may be performed once, with the result stored on the event file record and used by subsequent views, reducing the total number of lookups performed.
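
The enrichment pattern can be sketched as follows: the writing view performs the lookup once and stores the result on the record, so the reading views simply reuse the stored value. The lookup table and field names are invented for the example; this is not SAFR code.

```python
# Minimal sketch of concentrating lookups in the writing view (illustrative
# only; the lookup table and field names are invented for the example).
customer_lookup = {"C001": "ACME Corp", "C002": "Globex"}   # expensive lookup source

def pipe_writer_view(record):
    """Perform the lookup once and store the result on the passed record."""
    record["customer_name"] = customer_lookup.get(record["customer_id"], "UNKNOWN")
    return record

def reading_view_a(record):
    # Uses the stored lookup result; no lookup of its own.
    print("View A:", record["customer_name"])

def reading_view_b(record):
    print("View B:", record["customer_name"])

for event in [{"order_id": 1, "customer_id": "C001"},
              {"order_id": 2, "customer_id": "C002"}]:
    enriched = pipe_writer_view(event)      # one lookup per event record
    reading_view_a(enriched)
    reading_view_b(enriched)                # downstream views reuse the result
```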

Slide 12

A limitation of pipes is the loss of visibility to the original input record read from disk; the asynchronous execution of thread 1 and thread 2 means the record being processed in thread 1 at any given moment is unpredictable.

There are instances where a view using the output of another view needs visibility to the original input record.  As will be discussed in a later module, the Common Key Buffering feature can at times allow use of records related to the input record, and exits can also be employed to preserve this type of visibility.  Token processing, because it is synchronous within the same thread, can provide this capability.

In this example, on the top the token View 2 can still access the original input record from the source file if needed, whereas the Pipe View 2 on the bottom cannot.

Slide 13

Piping can be used in conjunction with Extract Time Summarization, because the pipe writing views process multiple records before passing the buffer to the pipe reading views.  Because tokens process one record at a time, they cannot use extract time summarization.  A token can never contain more than one record.

In the example on the left, View 1, using Extract Time Summarization, collapses four records with like sort keys from the input file before writing this single record to the pipe buffer.  Token processing, on the right, passes each individual input record to all token-reading views.

Note that to use Extract Time Summarization  with Piping, the pipe reading views must process Standard Extract Format records, rather than data written by a WRDT function; summarization only occurs in CT columns, which are only contained in Standard Extract Format Records. 
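
A rough sketch of the difference follows: records with like sort keys are collapsed into a single record before being handed to the pipe buffer, while token processing passes every record through individually. The keys and amounts are invented for the illustration.

```python
# Minimal sketch of extract-time summarization ahead of a pipe
# (illustrative only; keys and amounts are invented for the example).
from collections import OrderedDict

input_records = [("K1", 10), ("K1", 5), ("K1", 7), ("K1", 3), ("K2", 9)]

# Piping with summarization: like sort keys are collapsed before the
# buffer is passed to the pipe-reading views.
summarized = OrderedDict()
for key, amount in input_records:
    summarized[key] = summarized.get(key, 0) + amount
pipe_buffer = list(summarized.items())
print("Pipe buffer:", pipe_buffer)          # [('K1', 25), ('K2', 9)]

# Token processing: every record is passed to the readers one at a time,
# so no summarization is possible.
for record in input_records:
    print("Token passes:", record)
```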

Slide 14

Workbench set-up for writing to, and reading from, a pipe is relatively simple.  It begins with creating a pipe physical file and a view which will write to the pipe.

Begin by creating a physical file with a File Type of “Pipe”.

Create a view to write to the Pipe, and within the View Properties:

  • Select the Output Format of Flat File, Fixed-Length Fields,
  • Then select the created physical file as the output file

Slide 15

The next step is to create the LR for the Pipe Reader Views to use.

The LR used for views which read from the pipe must match exactly the column outputs of the View or Views which will write to the Pipe.  This includes data formats, positions, lengths and contents.

In this example, Column 1 of the view, "Order ID", is a 10-byte field beginning at position 1.  In the Logical Record, this value will be found in the Order_ID field, beginning at position 1 for 10 bytes.

Preliminary testing of the new Logical Record is suggested: make the view write to a physical file first, inspect the view output, and ensure the data matches the Pipe Logical Record before changing the view to actually write to a pipe.  Examining an actual output file to check positions is easier than using the trace, because the trace only shows data referenced by a view, not the entire record produced by the pipe-writing view.
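
One way to check the alignment outside of SAFR is to compare the writing view's column layout with the pipe Logical Record's field layout, as in the hedged sketch below. Only the Order ID entry reflects the example above; the remaining fields are invented.

```python
# Minimal sketch of verifying that a pipe writer's column layout matches
# the pipe Logical Record (illustrative only; only the Order ID entry
# reflects the example above, the other fields are invented).
view_columns = [            # (name, start position, length) written by the view
    ("Order ID", 1, 10),
    ("Customer ID", 11, 10),
    ("Order Amount", 21, 12),
]
pipe_lr_fields = [          # (name, start position, length) in the pipe LR
    ("ORDER_ID", 1, 10),
    ("CUSTOMER_ID", 11, 10),
    ("ORDER_AMOUNT", 21, 12),
]

for (col, c_start, c_len), (fld, f_start, f_len) in zip(view_columns, pipe_lr_fields):
    if (c_start, c_len) != (f_start, f_len):
        print(f"Mismatch: column '{col}' is ({c_start},{c_len}) "
              f"but LR field '{fld}' is ({f_start},{f_len})")
    else:
        print(f"OK: '{col}' aligns with '{fld}' at position {f_start}, length {f_len}")
```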

Slide 16

The view which reads the Pipe uses the Pipe Logical Record.  It also must read from a Logical File which contains the Pipe Physical File written to by the pipe writer view.

In this example, the view reads the pipe Logical Record (LR ID 38) and the pipe Logical File (LF ID 42), which contains the Physical File of type Pipe written to by the pipe-writing view.  The view itself uses all three fields on the Pipe Logical Record.

Slide 17

This is the logic table generated for the piping views.  The pipe logic table contains no special logic table functions.  The only connection between the ES sets is the shared physical file entity.

The first RENX – ES set is reading Logical File 12.  This input event file is on disk.  The pipe writing view, view number 73, writes extracted records to Physical File Partition 51. 

The next RENX – ES set reads Logical File 42.  This Logical File contains the same Physical File partition (51) written to by the prior view.  The Pipe Reader view is view 74.

Slide 18

This is the trace for piped views.  In the example on the left, the thread reading the event file, with a DD name of ORDER001, begins processing input event records.  Because the pipe writer view, number 73, has no selection criteria, each record is written to the pipe by the WRDT function.  All 12 records in this small input file are written to the pipe.

When all 12 records have been processed, Thread 1 has reached the end of the input event file; it stops processing and turns the pipe buffer over to Thread 2 for processing.

Thread 2, the Pipe Reader thread, then begins processing these 12 records. All 12 records are processed by the pipe reader view, view number 74.

As shown on the right, if the input event file had contained more records, enough to fill multiple pipe buffers, the trace still begins with Thread 1 processing, but after the first buffer is filled, Thread 2, the pipe reading view, begins processing.  From this point, the printed trace rows for Thread 1 and Thread 2 may be interspersed as both threads process records in parallel against differing pipe buffers.  This interleaving is visible at each NV function executed against a new event file record.

Slide 19

Workbench set-up for writing to, and reading from a token is very similar to the set-up for pipes. It begins with creating a physical file with a type of token, and a view which will write to the token.

Begin by creating a physical file with a File Type of “Token”.

Create a view to write to the token, and within the View Properties, select the Output Format of Flat File, Fixed-Length Fields.

Then, in the properties for the Logical Record and Logical File to be read, select the output Logical File and the created Physical File as the destination for the view.

Slide 20

Like pipe usage, the next step is to create the Logical Record for the Token Reader Views to use. In our example, we have reused the same Logical Record used in the piping example.  Again, the logical record which describes the token must match the output from the view exactly, including data formats, positions, lengths and contents.

Slide 21

Again, the view which reads the Token uses this new Logical Record.  It also must read from a Logical File which contains the Token Physical File written to by the token writer view.

Note that the same Logical File must be used for both the token writer and token reader; using the same Physical File contained in two different Logical Files causes an error in processing. 

Slide 22

This is the logic table generated for the token views.

Any views which read a token are contained within an RETK – Read Next Token / ET – End of Token set at the top of the logic table.

The token writer view or views are listed below this set in the typical threads, according to the files they read; the only change in these views is the WRTK (Write Token) logic table function, shown at the bottom of the logic table in red.

Although the token reading views are listed at the top of the logic table, they are not executed first in the flow.  Rather, the regular threads process, and when they reach a WRTK Write Token logic table function, the views in the RETK thread are called and processed.  Thus the RETK – ET set is like a subroutine, executed at the end of the regular thread processing.

In this example, view 10462 is reading the token written by view 10461.  View 10462 is contained within the RETK – ET set at the top of the logic table.  View 10461 has a WRTK function to write the token.  After execution of this WRTK function, all views in the RETK – ET set are executed.

Slide 23

This is the Logic Table Trace for the Token Views.

Note that, unlike the pipe example, which has multiple event file DD names within the trace, tokens execute in one thread, so the event file DD name is the same throughout; in this example it is ORDER001.

The WRTK Write Token function, on view 10461 in this example, ends the standard thread processing and is immediately followed by the token-reading view or views, view 10462 in this example.  The only thing that makes these views look different from the views reading the input event file is that they process against a different Logical File and Logical Record, Logical File ID 2025 and Logical Record ID 38 in this example.

Slide 24

The trace on the right has been modified to show the effect of having multiple token writing views executing at the same time. 

Note the multiple executions of View 10462, the token-reading view, processing the same event record (Event Record 1) from the same event DD name, ORDER001, after each of the token-writing views, 10460 and 10461.

Slide 25

The GVBMR95 control report shows the execution results from running both the Pipe and Token views at the same time.

  • At the top of the report, the input files listed include:
    • the disk file, which is the event file for both the pipe- and token-writing views,
    • the token (for the token-reading views), and
    • the pipe (for the pipe-reading views).
  • The bottom of the report shows the final output disposition, showing that records were written to:
    • an extract file (shared by the pipe- and token-reading views),
    • the token (for the token-writing view), and
    • the pipe (for the pipe-writing view).
  • The center section shows more details about each of the views.

Slide 26

The top of this slide shows the final run statistics from the GVBMR95 control report.

Note that a common error message may be encountered, for either pipes or tokens, if view selection for the pass is not carefully constructed.  Execution of a pass which contains only pipe writers and no pipe readers will result in this message; the opposite, with only pipe readers and no pipe writers selected, is also true.  The same applies to token processes.  Pairs of views must be executed for either process, matched by Logical Files containing matching Physical Files, for the process to execute correctly.

Slide 27

This module described using piping and tokens. Now that you have completed this module, you should be able to:

  • Describe uses for the SAFR piping feature
  • Read a Logic Table and Trace with piping
  • Describe uses for the SAFR token feature
  • Read a Logic Table and Trace with tokens
  • Debug piping and token views

Slide 28

Additional information about SAFR is available at the web addresses shown here. This concludes Module 22, The SAFR Piping and Token Functions

Slide 29

Research on Next Generation SAFR: 2017

The following is an R&D effort to define the next generation of SAFR, composed by Kip Twitchell, in March 2017. It was not intended to be a formally released paper, but the ideas expressed herein have continued to resonate with later assessments of the tool direction.

Proposal for Next Generation SAFR Engine Architecture

Kip Twitchell

March 2017

Based upon my study over the last two months, in again considering the next generation SAFR architecture, the need to separate the application architecture from the engine architecture has become clear: the same basic combustion engine can be used to build a pump, a generator, a car, or a crane.  Those are applications, not engines.  In this document, I will attempt to outline a possible approach to the Next Generation SAFR engine architecture, and then give example application uses after that.

This document has been updated with the conclusions from a two-day workshop at SVL, in the section titled "Workshop Conclusions" following the proposal summary.

Background

This slide from Balancing Act is perhaps a simple introduction to the ideas in this document:

(from Balancing Act, page 107)

Key SAFR Capabilities

The SAFR architecture recognizes that building balances from transactions is critical to almost all quantitative analytical and reporting processes.  All financial and quantitative analytics begins with a balance, or position.  Creating these is the key SAFR feature.

Additionally, SAFR enables generation of temporal and analytical business events in the midst of the posting process, and posts these business events at the same time.  In other words, some business events are created from outstanding balances.  SAFR does this very effectively.

Third, the rules used to create balances and positions sometimes change.  SAFR’s scale means it can rebuild balances when rules change; for reclassification and reorganization problems, changes in required analyses, or new, unanticipated analyses.

Fourth, SAFR's high-speed join capabilities, especially date-effective joins, and its unique processing model for joins on high-cardinality, semi-normalized table structures create capabilities not available in any other tool.

Last, the ability to do all these things in one pass through the data, post initial transactions, generate new transactions, reclassify some balances or create new ones, and join all the dimensions together, along with generating multiple outputs, and do so efficiently, creates a unique set of functions.

This combined set of features is what is enabled by the bottom diagram.

Key SAFR Limitations

SAFR has often been associated with posting or ETL processes, which are typically more closely aligned with the tail end of the business event capture processes.  It also has an association with the report file generation processes at the other end of the data supply chain.  It has done so without fully embracing either end of this spectrum: it has attempted to provide a simplified user interface for reporting processes which was never fully a language for expressing posting processes, and it has not integrated with SQL, the typical language of reporting processes.

The idea behind this paper is that SAFR should more fully embrace the ends of this spectrum.  However, to be practical, it does not propose attempting to build a tool to cover the end-to-end data supply chain, the scope of which would be very ambitious. Covering the entire data supply chain would require building two major user interfaces, for business event capture and then for reporting or analysis.  The approach below contemplates allowing most of the work on either of these interfaces to be done by other tools, and allowing them to use the SAFR engine, to reduce the scope of the work involved.

This paper assumes an understanding of how SAFR works today, in order to keep the writing brief.  This document is intended for an internal SAFR discussion, not a general customer engagement.

Proposal in Summary

In summary, SAFR should be enhanced as follows:

Overall

1) The SAFR engine should be made a real-time engine, capable of real time update and query capabilities

2) It should also be configurable to execute in a batch mode if the business functions require (system limitations, posting process control limitations, etc.).

Posting Processes

3) The existing Workbench should be converted to or replaced with a language IDE, using a language such as Java, JavaScript, Scala, etc. The compile process for this language would only permit the subset of those languages' functions that fits in the SAFR processing construct.

4) When the engine is executed in “Compiled Code” mode, (either batch or real time), it would generally perform functions specified in this language, and the logic will be static.

5) Real-time mode application architecture would typically use SAFR against indexed access inputs and outputs, i.e., databases capable of delivering specific data for update, and applying updates to specific records.

Reporting Processes

6) SAFR should have a second logic interpreter in addition to the language IDE above, and accept SQL as well.  This would generally be used in an “Interpreted Logic” mode.

7) To resolve SQL, SAFR would be a “bypass” SQL interpreter.  If the logic specified in SQL can be resolved via SAFR, the SAFR engine should resolve the request. 

8) If SAFR cannot satisfy all the required elements, SAFR should pass the SQL to another SQL engine for resolution.  If the data is stored in sequential files and SAFR is the only resolution engine available, an error would be returned.

Aggregation and Allocation Functions

9) SAFR functionality should be enhanced to more easily perform aggregation and allocation functions. 

10) This may require embedded sort processes, when the aggregation key is not the natural sort order of the input file.

11) A more generalized framework for multiple level aggregation (subtotals) should be considered, but may not be needed in the immediate term.

12) Additionally, a generalized framework for hierarchy processes should also be contemplated.

Workshop Conclusions:

In the two-day workshop I made the point that the z/OS version of SAFR does not meet the requirements of this paper, and neither does the SAFR on Spark version.  Beyond this, the workshop concluded the following:

(1) The existing SAFR GUI and its associated metadata tables, language, and XML are all inadequate for the future problems, because the GUI is too simple for programming (thus the constant need for user exits) and too complex for business people (thus one customer built RMF);

(2) The database becomes a problem in the traditional architecture because there is an explosion of rows of data in the LR Buffer which then collapses down to the answer sets, but the database in the middle means it all has to be stored before it can be retrieved to be collapsed.

(3) Platform is not an issue considered in this paper, and the experience of both existing customers seems to demonstrate there is something unique in SAFR that other tools do not have.

(4) We agreed that, although the data explosion and the database in the middle are a problem, taking on the SQL interpretation mode was a bridge too far; we should let the database do the work of satisfying the queries.  SAFR can still produce views which select and aggregate data to be placed in the database for use, as it has historically.

(5) Stored procedures are just too simplistic for the kinds of complex logic in a posting process, so we need to build something outside the database which can access the database for inputs and for storing outputs.

(6) There is not a good reason to require the programmer to go clear to a procedural language to specify the SAFR process; the SAFR on Spark POC SQL is a declarative language that does not put as much burden on the developer.

(7) It is possible to use in-line Java code within the SAFR on Spark prototype when a procedural language is required.  This could eliminate simple user exits.

(8) Spark is a good framework for a distributed workload; z/OS manages the parallel processes very effectively, but requires a big server to do so.  Using another language on another platform would require building that distribution network, which is expensive.

(9) The real-time capabilities would simply require creation of a daemon that would execute the SAFR on Spark processes whenever needed.  This is nearly a trivial problem, performance and response time notwithstanding.

(10) We do not have enough performance characteristics to say if the Spark platform and the prototype will scale for any significant needs.

Real-time Engine and IO Models

The following table outlines likely processing scenarios:

                 Real-time Mode                                    Batch Mode
Input Types      Database or Messaging Queue                       Sequential files, Database, or Messaging Queue
Output Types     Database, Messaging Queue, or Sequential Files    Sequential Files

Sequential files in this table include native Hadoop structures.  Output files from batch mode could be loaded into databases or messaging queues beyond SAFR processing, as is typically done today with SAFR.

Recommendation:  SAFR today recognizes different IO routines, particularly on input.  It has IO routines for sequential files, DB2 via SQL, VSAM, and DB2 via VSAM, and it has had prototype IMS and IDMS drivers.  This same construct can be carried forward into the new version, continuing to keep the logic specification separate from the underlying data structures, both inbound and outbound.  New data structure accesses can be developed over time.

Enhancement:  An additional processing approach could be possible, but is less likely except in very high volume situations: real-time mode against input sequential files.  In this configuration, SAFR could continually scan the event files.  Each new request would jump into the ongoing search, noting the input record count of where the search begins.  It would continue to the end of the file, loop back to the start of the file, and continue until it reaches the starting record count again, and then end processing for that request.  This would only be useful if very high volumes of table scans and very high transaction volumes made the latency in response time acceptable.
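
Under the assumptions of this enhancement, a request would join the ongoing scan, note the record count where it starts, and stop once the scan wraps back around to that count. The sketch below illustrates that cycle; the event data and matching logic are invented.

```python
# Minimal sketch of a request joining a continuous scan of an event file
# (illustrative only; the event data and matching logic are invented).
import itertools

event_file = [f"record-{i}" for i in range(8)]    # stand-in for a sequential event file

# The scan cycles through the file continuously; enumerate supplies the
# running record count used to know where a request joined.
scan = itertools.cycle(enumerate(event_file))

def serve_request(predicate):
    """Join the ongoing scan, note the starting count, and stop after one full lap."""
    start_count, first_record = next(scan)        # where this request jumps in
    results = [first_record] if predicate(first_record) else []
    for _ in range(len(event_file) - 1):          # continue until we wrap back to start
        _, record = next(scan)
        if predicate(record):
            results.append(record)
    return start_count, results

start, hits = serve_request(lambda r: r.endswith(("3", "6")))
print(f"Request joined at record {start} and matched {hits}")
```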

Impediments:  The challenge to a real-time SAFR engine is all the code generation and interpretation processes in the Performance Engine.  Making these functions very fast from their current form would be difficult.  Using an IDE for logic specification has a much better chance of making the performance possible.  Performance for the interpreted mode would require significant speed in interpretation and generation of machine code from the SQL logic.

Compiled Code Mode

Background:  Major applications have been built using SAFR.  The initial user interface for SAFR (then called Geneva, some 25 years ago) was intended to be close to an end-user tool for business people creating reports.  The Workbench (created starting in 1999) moved more and more elements away from business users, and is now used only by IT professionals.

Additionally, because of the increasingly complex nature of the applications, additional features to manage SAFR views as source code have been developed in the workbench.  These include environment management, and attempts at source code control system integration.  These efforts are adequate for the time, but not highly competitive with other tools. 

SAFR's internal "language" was not designed to be a language, but rather only to provide logic for certain parts of a view.  The metadata constructs provide structures, linkage to tables or files, partitioning, and join logic.  The view columns provide structure for the fields or logic to be output.  The SAFR language, called Logic Text, only specifies logic within these limited column definitions and the general selection logic.

Recommendation:  Given the nature of the tool, it is recommended SAFR adopt a language for logic specification.  I would not recommend SQL as it is not a procedural language, and will inhibit new functionality over time.

The fundamental unit for compile would be a Pass or Execution, including the following components (a rough code sketch follows the list):

  • A Header structure, which defines the metadata available to all subprograms or "methods".  It would contain:
    • The Event and lookup LR structures
    • The common key definition
    • Join paths
    • Logical Files and Partitions
    • Available functions (exits)
  • A View Grouping structure,
    • Defining piping and token groups of views.
    • Tokens and similar (new) functions could potentially be defined as in-line (or view) function calls which can be used by other views.
  • Each view would be a method or subprogram that will be called and passed each event file record
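
As a way to visualize these components, the hedged sketch below expresses the proposed compile unit as plain data structures; the class names, fields, and example wiring are assumptions for illustration, not a defined SAFR API.

```python
# Hedged sketch of the proposed compile unit as plain data structures
# (illustrative only; names and fields are assumptions, not a defined API).
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Header:
    event_lrs: List[str]                 # Event and lookup LR structures
    common_key: List[str]                # common key definition
    join_paths: List[str]
    logical_files: Dict[str, List[str]]  # logical file -> partitions
    exits: Dict[str, Callable] = field(default_factory=dict)   # available functions

@dataclass
class ViewGroup:
    kind: str                            # "pipe" or "token"
    writer_views: List[str]
    reader_views: List[str]

@dataclass
class Execution:
    header: Header
    groups: List[ViewGroup]
    # Each view is a method/subprogram called once per event record.
    views: Dict[str, Callable[[dict], None]] = field(default_factory=dict)

    def run(self, event_records):
        for record in event_records:
            for view in self.views.values():
                view(record)

# Example wiring (entirely invented):
execution = Execution(
    header=Header(event_lrs=["ORDER_LR"], common_key=["ORDER_ID"],
                  join_paths=["CUSTOMER_from_ORDER"],
                  logical_files={"ORDER_LF": ["ORDER001"]}),
    groups=[ViewGroup(kind="token", writer_views=["V1"], reader_views=["V2"])],
    views={"V1": lambda r: print("V1 saw", r)},
)
execution.run([{"ORDER_ID": 1}])
```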

The final completed Execution code could be compiled, and the resulting executable managed like any other executable; the source code could be stored in any standard source code control library.  All IT control and security procedures could follow standard processes.  This recognizes the static nature of most SAFR application development processes.

Although this approach offloads some portion of the user interface, language definition, and source code management issues, the next gen SAFR code base would have to include compilers for each platform supported.  Additional SAFR executables would be IO modules, and other services like join, common key buffer management, etc. for each platform supported.

Impediments:  A standard IDE and language will likely not include the Header and Grouping structure constructs required.  Can a language and IDE be enhanced with the needed constructs? 

Interpreted Logic Mode

Background:  Certain SAFR processes have been dynamic over the years, in various degrees.  At no time, however, has the entire LR and meta data structure definition been dynamic. In the early days, the entire view was dynamic in that there were fairly simple on-line edits, but some views were disabled during the production batch run. In more recent years fewer pieces have been dynamic.  Reference data values have always been variable, and often been used to vary function results, in the last two decades always from externally maintained systems like ERP systems.  Most recently, new dynamic aspects have been bolted on through new interfaces to develop business rules facilities. 

The SAFR on Spark prototype application demonstrated the capability to turn SAFR logic (a VDP, or the VDP in XML format) into SQL statements quite readily.  It used a CREATE statement to specify the LR structure to be used.  It had to create a new language construct, called MERGE to represent the common key buffering construct.  SQL tends to be the language of choice for most dynamic processes.  SQL also tends to be used heavily in reporting applications of various kinds. 

Recommendation:  Develop a dynamic compile process to turn SAFR-specific SQL into SAFR execution code.  This will allow other reporting tools to gain the benefits of SAFR processes in reporting.  If the data to be accessed by SAFR is in a relational database (either as the only location or replicated there), SAFR could pass off the SQL to another resolution engine when non-SAFR SQL constructs are used.
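
One way to picture the "bypass" behavior is a thin dispatcher that attempts SAFR resolution first and otherwise hands the SQL to a conventional engine. The sketch below is purely illustrative; the resolver functions and the supported-subset check are invented placeholders, not real SAFR interfaces.

```python
# Hedged sketch of a "bypass" SQL dispatcher (illustrative only; the
# resolver functions are invented placeholders, not real SAFR APIs).
def safr_can_resolve(sql: str) -> bool:
    """Very rough stand-in for checking the SQL against SAFR's supported subset."""
    unsupported = {"RECURSIVE", "WINDOW", "OVER("}
    return not any(token in sql.upper() for token in unsupported)

def resolve_with_safr(sql: str):
    return f"[SAFR engine resolves] {sql}"

def pass_to_sql_engine(sql: str):
    return f"[Relational engine resolves] {sql}"

def bypass_interpreter(sql: str, data_is_relational: bool):
    if safr_can_resolve(sql):
        return resolve_with_safr(sql)
    if data_is_relational:
        return pass_to_sql_engine(sql)       # hand off non-SAFR constructs
    raise ValueError("SQL not supported by SAFR and no other engine available")

print(bypass_interpreter("SELECT id, SUM(amt) FROM orders GROUP BY id", True))
print(bypass_interpreter("SELECT RANK() OVER(ORDER BY amt) FROM orders", True))
```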

Impediments:  It is not clear how the MERGE parameters would be passed to the SAFR SQL Interpreter.  If sequential files are used for SAFR, would a set of dummy tables in a relational system provide a catalogue structure sufficient for the reporting tool to generate the appropriate reporting requirements?  Also, can the cost of creating two compilers be avoided?

Enhancement:  CUSTOMER's EMQ provides a worked example of a reporting application which resolves SQL queries more efficiently than most reporting tools.  It uses SQL SELECT statements for the general selection criteria, and the database functions for sort (ORDER BY) and some aggregation (GROUP BY) functions.  When considering SAFR working against an indexed access structure, this construct may be similar.  More consideration should perhaps be given.

Compile Code and Interpreted Logic Intersection

Background:  The newer dynamic, rules-based processes and the older, more static Workbench-developed processes cannot be combined into a single process very easily because of duplicated metadata.  This arbitrary division of functions to be performed in Compiled Code mode versus Interpreted Mode is not desirable; there may be instances where these two functions can be blended into a single execution.

Recommendation: One way of managing this would be to provide a stub method which allows for calling Interpreted processes under the Compiled object.  The IDE for development of the Interpreted Mode logic would be responsible for ensuring conformance with any of the information in the header or grouping structures.

Aggregation and Allocation Processes

Background: Today's SAFR engine's aggregation processes are isolated from the main extract engine, GVBMR95, in a separate program, GVBMR88.  Between these programs is a sort step, using a standard sort utility.  This architecture can complicate the actual posting processes, requiring custom logic to table records, sort them (in a sense, through hash tables), detect changed keys, and aggregate the last records before processing new records.

Additionally, today’s SAFR does not natively perform allocation processes, which require dividing transactions or balances by some set number of additional records or attributes, generating new records.  Also, SAFR has limited variable capabilities today. 

Recommendation:   Enhancing SAFR with these capabilities in designing the new engine is not considered an expensive development; it simply should be considered and developed from the ground up.

Next Steps

Feedback should be sought on this proposal from multiple team members, adjustments to it documented, and rough order of estimates of build cost developed.

Training Module 8: Common Performance Engine Errors

The slides used in this video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR. This is module 8: “Common Errors Encountered When Running the Performance Engine.”

Slide 2

This module provides you with an overview of common errors encountered when running the SAFR Performance Engine. Upon completion of this module, you should be able to:

  • Diagnose and resolve common Performance Engine errors,
  • Set output record limits to assist in debugging, and
  • Analyze the relationships between components using the Dependency Checker.

Slide 3

Under normal circumstances, the SAFR Performance Engine transforms data and provides useful outputs. However, if a Performance Engine program detects an unusual condition, it will display an error message. Errors can occur for many different reasons.

Recall that the SAFR Performance Engine consists of six phases. Each phase creates data sets that are read by subsequent phases. If the JCL that executes a program does not point to the data set that the program is expecting, or if it is missing a needed data set, an error is reported.

Errors can also occur if keywords in JCL parameters are missing or misspelled. On the following slides, we’ll cover the most common error conditions and describe how to remedy them.

Slide 4

Recall that the Select phase produces a control report named MR86RPT, or the MR86 Report. This control report contains useful statistics for the current run and, if an error occurs, an error message to help you diagnose the problem.

Slide 5

A view must be active and saved to the Metadata Repository to be selected by the MR86 program. If it is not, an error message is displayed in the MR86 Report, specifying the view number and name.

To remedy this problem, find the view in the Workbench, activate the view, save it, and then rerun the job.

Slide 6

The MR86 program looks for views only in the environment specified by the ENVIRONMENTID parameter in the MR86 configuration file. If the MR86 Report displays a message saying that no views can be found, the problem is often that the wrong environment has been specified there. This value must match the value that is displayed in the Workbench status line, which is the bottom line of the Workbench window. We can see here that the true environment ID is 7 and the value for ENVIRONMENTID in the JCL is 7777, which is a typographical error. In such cases, you correct the ENVIRONMENTID value in the configuration parameters and rerun the job.

Slide 7

For the MR86 program to find your views, the database connection parameters in the MR86 configuration file must be specified correctly. One connection parameter is the schema. The value of the schema parameter must match the value that is labeled as “database” in the Workbench status line. In this example, the schema value is SAFRTR01, but the SCHEMA parameter is SAFRTR01XXXX. Because these two values do not match, a database error is displayed in the MR86 Report. In such cases, you should correct the SCHEMA value in the configuration parameters and rerun the job.

Slide 8

In the Compile phase, the MR84 program reads the XML file created by the prior step and converts it to a form that can be executed by the Extract phase later. This process, called compiling, checks for syntax errors and other errors in the XML file and displays associated messages in the MR84 Report. This report seldom displays any error messages because under normal circumstances, views that are read by the MR84 program have already been compiled by the Workbench. If you encounter a problem in this phase, report it to SAFR Support.

Slide 9

In the Logic phase, the MR90 program reads the VDP file created by the prior step and converts it into data sets that are used later by the Reference phase and the Extract phase. Errors that occur in this phase are rare and are usually the result of database corruption. If you encounter a problem in this phase, report it to SAFR Support.

Slide 10

In the Reference phase, the MR95 program converts reference data into a form that is appropriate for lookup operations that are executed subsequently in the Extract phase. MR95 produces a control report known as the MR95 Report, which either indicates successful completion or displays errors that must be corrected before proceeding.

Slide 11

The view shown here reads the ORDER_001 logical file and displays data from two lookup fields: CUSTOMER_ID and NAME_ON_CREDIT_CARD.

Slide 12

We can see from the column source properties that the CUSTOMER_ID field receives its value from data accessed by the CUSTOMER_from_ORDER lookup path.

Slide 13

We can also see from the column source properties that the NAME_ON_CREDIT_CARD field receives its value through a different lookup path named CUSTOMER_STORE_CREDIT_CARD_INFO_from_ORDER.

Slide 14

If the JCL for the Reference job does not contain DD statements for all reference files needed to support the selected views, the job terminates with an abnormal end (or “abends”). The MR95 Report displays a message with the DD name of the missing file, which in this case is CUSTCRDT.

Slide 15

The error condition in this case can be remedied by adding the CUSTCRDT file to the JCL and rerunning the job.

Slide 16

The Reference Extract Header (or “REH”) file, with a DD name of GREFREH, is required for the Reference phase to function correctly. If the DD statement for the REH file is missing, an error message is displayed in the MR95 Report, saying that the DCB information for the file is missing. To remedy the problem, add the DD statement for GREFREH to the JCL and rerun the job.

Slide 17

Similarly, a Reference Extract Detail (or “RED”) file associated with every lookup path is required for the Reference phase to function correctly. DD names for RED files begin with the letters “GREF” and end with a 3-digit number. If the DD statement for a RED file is missing, an error message is displayed in the MR95 Report, saying that the DCB information for the file is missing. In this case, the GREF004 file is missing. To remedy the problem, add the DD statement for GREF004 to the JCL and rerun the job.

Slide 18

For reference files to function correctly in SAFR, there are two rules which must be followed:

  • First, records must have unique keys
  • And second, records must be sorted in key sequence.

If either of these rules is broken, a “lookup table key sequence” error message is displayed in the MR95 Report. To resolve the problem, we must first examine the error message and note the DD name associated with the reference work file in error. In this case, the DD name is GREF004. We can then use this to find the DD name of the source reference file. Scanning upwards in the MR95 Report, we can see that GREF004 is written using records read from the DD name CUSTCRDT.
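
The two rules can be pre-checked outside of SAFR before the Reference phase runs, as in the hedged sketch below; the sample records are invented, and the key length of 10 follows the customer credit card example discussed on the next slide.

```python
# Minimal sketch of checking reference data before the Reference phase
# (illustrative only; the records are invented and the key length of 10
# follows the customer credit card example).
KEY_LENGTH = 10

def check_reference_file(records):
    """Report duplicate keys and out-of-sequence records."""
    previous_key = None
    for line_no, record in enumerate(records, start=1):
        key = record[:KEY_LENGTH]
        if previous_key is not None:
            if key == previous_key:
                print(f"Record {line_no}: duplicate key {key!r}")
            elif key < previous_key:
                print(f"Record {line_no}: key {key!r} out of sequence after {previous_key!r}")
        previous_key = key

check_reference_file([
    "0000000001JONES     ",
    "0000000003SMITH     ",
    "0000000003SMITH     ",   # duplicate key -> breaks rule 1
    "0000000002BROWN     ",   # out of sequence -> breaks rule 2
])
```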

Slide 19

To research this problem further, we must find the length of the key associated with the logical record for customer credit card Info. In this case, it is 10. We must also find the data set associated with the DD name CUSTCRDT. From the JCL, we can see that it is GEBT.TRN.CUSTCRDT.

Slide 20

If we open the data set GEBT.TRN.CUSTCRDT on an ISPF Edit screen, we can see that records 5 through 9 all have the same key value. But rule #1 stated that each key must be unique. We can also see that the records are not in key sequence, which breaks rule #2. To solve our original problem, we remove the duplicate records and sort the records in key sequence. Doing this and then rerunning the job will remove the error condition.

Slide 21

Sometimes when you are debugging a problem, it is useful to limit the number of records written to an extract file. You can set such a limit in the Workbench on the Extract tab of the View Properties screen for a view. Processing for the view will stop after the specified number of records is written. Remember, you must rerun from the Select phase to incorporate the view changes into the Extract phase.

Slide 22

You can set a similar limit for the final output file that is written in the Format phase. You can set such a limit in the Workbench on the Format tab of the View Properties screen for a view. Processing for the view will stop after the specified number of view output records is written. Again, you must rerun from the Select phase to incorporate the view changes into the Format phase.

Slide 23

The Dependency Checker is a useful tool for debugging views. You can select it from the Reports menu of the Workbench.

Slide 24

In this example, you want to see all direct dependencies for view #71. First, select the appropriate environment, and then select the component type, which in this case is View. Then select view #71 from the list.

For debugging purposes, we normally select the Show direct dependencies only check box. Clearing the check box would cause indirect dependencies to be displayed, too, and this is typically useful only when an impact analysis is needed for a project.

Slide 25

After clicking Check, you see a hierarchical list of all the SAFR metadata components that are used by the view. This helps you determine whether any components that should not have been referenced are referenced and whether any components are missing. Portions of the list can be expanded by clicking the plus sign or collapsed by clicking the minus sign.

Slide 26

In the Extract phase, MR95 executes a series of operations that are defined in the Extract Logic Table, or XLT. The MR95 Trace function causes these operations to be written to a report to aid you in debugging a view. The Trace function can be selected and configured in MR95PARM, the MR95 parameter file.

Trace can be used to diagnose many SAFR problems:

  • LR problems
  • Lookup problems
  • View coding errors, and
  • Data errors

The Trace function will be described in detail in an advanced SAFR training module.

Slide 27

When you are developing a view, you should apply several guidelines to make your work more efficient.

Begin by coding a small subset of the requirements for a view, and then testing that. For example, you may choose to code a view with only a small number of columns or a smaller record filter. Once you have successfully gotten this view to work, you can add more complexity and continue until the view is complete.

It is useful to start testing with only a small number of records. In this way, you can more easily see the results of your view logic.

Verify that the output format for each column is correct. If it isn’t, review the column properties.

Check to make sure that the number of records written is what you expected. If it isn’t, check the record filters to verify that the logic is correct.

Finally, verify that the output value for each column is correct. If it isn’t, examine the column logic.

Slide 28

This module provided an overview of common errors encountered when running the SAFR Performance Engine. Now that you have completed this module, you should be able to:

  • Diagnose and resolve common Performance Engine errors
  • Set output record limits to assist in debugging, and
  • Analyze the relationships between components using the Dependency Checker

Slide 29

Additional information about SAFR is available at the web addresses shown here.

Slide 30

Slide 31

This concludes Module 8. Thank you for your participation.

Training Module 2: Performance Engine Overview

The following slides are used in this on-line video:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting (or SAFR). This is module 2, “Performance Engine Overview.”

Slide 2

This module provides you with an introduction to the SAFR Performance Engine and how to use it. By the end of this training, you should be able to:

  • Identify the phases of the Performance Engine
  • Identify the main inputs and outputs for each phase
  • Give a high-level overview of the reports from each phase, and
  • Run a Performance Engine job stream

Slide 3

As we noted in the “Introduction to SAFR Views” module of this training course, SAFR consists of two software components: a PC-based Workbench and a mainframe-based batch process known as the Performance Engine. Developers use the Workbench to build applications that are stored in a metadata repository in an IBM® DB2® database. These applications are then executed by the Performance Engine, which reads data from source files or databases, transforms it, and writes it to output files. In this sense, SAFR is an application development tool and is not fundamentally different from any other tool or language.

Slide 4

The Performance Engine comprises six phases:

  • The SELECT phase, where the desired views are selected from the Metadata Repository
  • The COMPILE phase, where the selected views are converted to an executable form
  • The LOGIC phase, where logic from multiple views is consolidated and optimized for execution
  • The REFERENCE phase, where values are retrieved from lookup tables and optimized for execution
  • The EXTRACT phase, where data from source files is retrieved, merged with lookup data, and transformed for output, and
  • The optional FORMAT phase, where data is sorted, summarized, and formatted if necessary

Slide 5

The primary SAFR program in the Select phase is GVBMR86 (The names of all programs in the SAFR Performance Engine start with the letters “GVB,” so it is common to refer to this program as MR86). MR86 reads a list of view numbers and finds the corresponding views in the SAFR Metadata Repository. It then processes these views and converts them to an Extensible Markup Language (or XML) file. It also produces a control report known as the MR86 Report, which indicates successful completion or displays errors that may have to be corrected before proceeding.

Slide 6

The view numbers for the desired views are placed in the JCL, using a DD name called VIEWLIST. In this example, we are specifying view 90, which is the view we created in the “Introduction to SAFR Views” module of this training course.

Slide 7

For a view to be selected, it must first be set to Active status in the Workbench. If the view is not active, the MR86 Report will contain the error message “View is inactive.” A view that is inactive cannot be run.

Slide 8

When errors occur, MR86 also issues a condition code of 8 for the job step, which typically causes the remaining job steps to be skipped.

Slide 9

Once the view is activated in the SAFR Workbench, MR86 can be rerun and the view should be successfully selected for processing. If no other errors are encountered, the MR86 Report does not contain the word “ERROR” at the beginning of any row, and MR86 returns a condition code of 0 for the job step, allowing remaining jobs steps to run.

Slide 10

The primary SAFR program in the Compile phase is GVBMR84. MR84 reads the XML file created by the previous step and converts it to a form that can be executed by the Extract phase later. This new file is called the View Definition Parameters (or VDP) file. Because multiple versions of the VDP file are used in the SAFR job stream, this one is referred to as the MR84 VDP to indicate the program that wrote it. MR84 also produces a control report known as the MR84 Report.

Slide 11

The MR84 Report indicates successful completion or displays XML syntax errors and view compile errors that have to be corrected before proceeding. If there are no errors, the process continues. 

The MR84 Report also shows how columns in the view are translated into a specialized SAFR construct known as a logic table.

Slide 12

The primary SAFR program in the Logic phase is GVBMR90. MR90 reads the MR84 VDP file created by the previous step and creates a new VDP and logic tables that are used by the next steps, the Reference and Extract phases. The new VDP is called the MR90 VDP to distinguish it from the input VDP file. The logic tables are known as the MR90 JLT (for “Join Logic Table”) and the MR90 XLT (for “the Extract Logic Table”).

The next programs in the Logic phase, which are not shown here, read these files and produce converted versions known as the MR77 VDP, the MR76 JLT, and the MR76 XLT.

Slide 13

The MR90 Report displays summary statistics for the Logic Table Creation process. In this case, the Extract Logic Table contains nine records. This view included no joins, so no records were written to the Join Logic Table. The report also displays, in detail format, the contents of the Extract Logic Table and the Join Logic Table if applicable.

Slide 14

The primary SAFR program in the Reference phase is GVBMR95. MR95 reads the MR77 VDP file and the MR76 JLT file created by earlier steps and one or more Reference Data files. It then produces a Reference Extract Header (or REH) that describes the format of the reference data being processed. It also creates one Reference Extract Detail (or RED) file for each Reference Data file read. These files contain only the data required to perform joins in the Extract phase. Finally, MR95 produces a control report known as the MR95 Report, which indicates successful completion or displays errors that may have to be corrected before proceeding.

Slide 15

If this job stream does not access any Reference Data files, the MR95 Report displays a warning message saying that there is an “empty or abbreviated logic table.” If you do not intend to access any Reference Data files, this is considered a normal condition and you may proceed to the next step.

To be processed correctly, reference data must be sorted in key sequence and have no records with duplicate keys.

Slide 16

The primary SAFR program in the Extract phase is GVBMR95, which is the same program that is run in the Reference phase. In reviewing the job stream output, you should take care not to confuse the outputs of the two executions of MR95.

MR95 reads the MR77 VDP file, the MR76 XLT file, and the REH, and RED files created by earlier steps, along with source data for the process (Source data is sometimes referred to as business event data or just event data).

MR95 executes various transformations and, depending on specifications in the selected views, produces one or more View Output files. It may also produce one or more Extract Work files, which are temporary files processed by the Format phase. Finally, it produces an MR95 Report, which presents summary statistics for the process. 

Slide 17

Before running the Extract phase job, you must make sure that DD statements for any required input files are included, containing the data to be scanned and extracted according to the view requirements. The DD names (such as CUSTOMER or ORDER001) must match those in the Workbench physical files referenced by the views being run.

Slide 18

Similarly, you also must ensure that DD statements for any required output files are included. These DD names are determined by view parameters in the Workbench.

Slide 19

The Extract phase MR95 Report displays statistics that are useful for audit trails and performance tuning. In this example, we can see that 12 records were read from the ORDER001 file, 12 records were processed by view 90, and 12 records were written to the OUTPUT01 file.

Slide 20

A secondary program in the Extract phase is GVBUT90, which produces the UT90 Report. This report displays information from the Extract Logic Table (XLT), which shows each step that is executed in a data transformation. This is useful information for debugging views. A more detailed discussion of the logic table is presented in the training module entitled “Basic Debugging.”

Slide 21

The primary SAFR program in the Format phase is GVBMR88. MR88 reads the MR77 VDP, REH, and RED files created by earlier steps, along with the temporary Extract Work files for the process. It then sorts, summarizes, and formats the data, producing one or more View Output files. Additional details will be provided in the training module entitled "Introduction to the Format Phase."

Slide 22

This module provided an overview of the SAFR Performance Engine. Now that you have completed this module, you should be able to:

  • Identify the phases of the Performance Engine
  • Identify the main inputs and outputs for each phase
  • Give a high-level overview of the reports from each phase, and
  • Run a Performance Engine job stream

Slide 23

Additional information about SAFR is available at the web addresses shown here.

Slide 24

Slide 25

This concludes Module 2. Thank you for your participation.

Technical Team Resources: 1994

In the development of the system, the team learned certain documents were handy to have around (before the Internet, these types of documents were the Internet). The following were used to debug the system during development, testing, implementation projects, and subsequent maintenance.

More about this time can be read in Balancing Act: A Practical Approach to Business Event Based Insights, Chapter 23. Development and Chapter 38. Abends.

Logic Table Codes

The following diagrams were carried by many of the support team so that they could understand the GenevaERS "p-code," if you will, called the Logic Table.

IBM s/390 “Yellow Card”

The "Yellow Card" was not really a "card" at all, but a booklet containing short descriptions of all the essential elements of the System/390 mainframe. It was carried to and from work by some of the support team, and was often used in understanding the generated machine code and subsequent problems. In these pages one can find:

– The op codes, or hex values, for the machine instructions, if one is looking at machine code and trying to decide what the code is attempting to tell the machine to do.

– The corresponding instruction names and the intent of those machine instructions.

– The various formats each of the S/390 assembler instructions takes.

– Other important things, like block sizes for various data storage devices.

S/360 Green Card

Although never used on the commercial GenevaERS system (but likely used in some manner in the original system development at the State of Alaska), some technical team members were proud to receive a reprinted S/360 "Green Card" (from which the "Yellow Card" received its name).

CICS and Other Price Waterhouse Technical Helps

The following were also used at times; these were produced by Price Waterhouse as part of its Mitas Technical Training Program.

Technical Diagrams–Internal Memory Structures 1994

The following diagrams were created by Doug Kunkel, probably in 1994, to explain to the maintenance team the internal memory structures and the pointers to them in the Extract Engine, MR95.

More about this time can be read in  Balancing Act: A Practical Approach to Business Event Based Insights, Chapter 38. Abends