Sign-up and Commit to the Spark-GenevaERS POC

Organization of the project continues, but much progress has been made. Check out the Community Repository on GitHub, and its Governance and Technical Steering Committee Checklist to see what’s been happening.

But don’t stop there….

Get Connected

The first Technical Steering Committee (TSC) Meeting, open to all, is scheduled for Tuesday, August 11, 2020 at 8 PM US CDT, Wednesday August 12, 10 AM WAST/HK time.

Project Email List: To join the meeting or to be kept up to date on project announcements, join the GenevaERS Email List. You’ll receive an invitation as part of the calendar system. You must be on the email list to join the meeting.

Slack Channel: Conversations are starting to happen on the Open Mainframe Project Slack workspace. Request access here, then when granted look for the GenevaERS Channel.

Spark POC Plans

The team continues to prepare the code for release. But that’s no impediment to our open source progress.

We’ve determined that the best way to help people understand the power of The Single Pass Optimization Engine is to contrast it–and begin to integrate it–with the better known Apache Spark.

Here’s what we plan to work on over the next several weeks, targeting completion before the inaugural Open Mainframe Summit, September 16 – 17, 2020.

This POC was designed using these ideas for the next version of GenevaERS.

The POC will be composed of various runs of Spark on z/OS using GenevaERS components in some, and ultimately full native GenevaERS.

GenevaERS Spark POC Approach

The configurations to be run include the following:

  1. The initial execution to produce the required outputs will be native Spark, on an open platform.
  2. Spark on z/OS, utilizing JZOS as the IO engine. JZOS is a set of Java Utilities to access z/OS facilities. They are C++ routines having Java wrappers, allowing them to be included easily in Spark. Execution of this process will all be on z/OS.
  3. The first set of code planned for release for GenevaERS is a set of utilities that perform GenevaERS encapsulated functions, such as GenevaERS overlapped BSAM IO and the unique GenevaERS join algorithm. As part of this POC, we’ll encapsulate these modules, written in z/OS Assembler, in Java wrappers to allow them to be called by Spark (see the sketch after this list).
  4. If time permits, we’ll switch out the standard Spark join processes for the GenevaERS utility module.
  5. The last test is native GenevaERS execution.
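
To make item 3 concrete, here is a minimal JNI wrapper sketch in Java. The class name, native method signature, and library name are assumptions for illustration only; the released GenevaERS utility interfaces may differ.

    // Hypothetical JNI wrapper sketch: the class name, native method signature, and
    // library name are assumptions, not the released GenevaERS utility interface.
    public class GvbJoinWrapper {
        static {
            // Load a native library built from the assembler routine plus a thin
            // C/JNI shim (the library name "gvbjoin" is an assumption).
            System.loadLibrary("gvbjoin");
        }

        // Native entry point implemented by the JNI shim around the assembler module.
        public native byte[] lookup(byte[] key);

        public static void main(String[] args) {
            byte[] result = new GvbJoinWrapper().lookup("CUST0001".getBytes());
            System.out.println("returned " + (result == null ? 0 : result.length) + " bytes");
        }
    }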

The results of this POC will start to contribute to this type of architectural choice framework, which will highlight where GenevaERS’s single pass optimization can enhance Apache Spark’s formidable capabilities.

Note that the POC scope will not test the efficiencies of the Map and Reduce functions directly, but will provide the basis upon which that work can be done.

Start Contributing

Open source is all about people making contributions. Expertise is needed in all types of efforts to carry out the POC.

  • Data preparation. We are working to build a data set that can be used on either z/OS or open systems to provide a fair comparison between platforms, with enough volume to test performance. Work here includes scripting, data analysis, and cross-platform capabilities and design.
  • Spark. We want good Spark code that uses the power of that system and makes the POC give real-world results (a sketch of the kind of job involved follows this list). Expertise needed includes writing Spark functions, data design, and tuning.
  • Java JNI. The Java Native Interface is the means by which Java, and thus Spark, calls routines written in other languages. Assistance is needed in encapsulating the GenevaERS utility function GVBUR20 to perform fast IO for our test.
  • GenevaERS. We hope to extract the configuration we create as GenevaERS VDP XML and provide it as a download for initial installation testing, with a similar goal for the sample JCL that will be provided. GenevaERS expertise in these areas is needed.
  • Documentation, Repository Work, and on and on and on. At the end of drafting this blog entry, facing the distinct chance it will be released with typos and other problems, we recognize we could use help in many more areas.
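
For the Spark item above, a minimal sketch of the kind of native Spark job the POC would compare is shown here; the file paths, column names, and the specific join and aggregation are assumptions for illustration, not the POC’s actual design.

    // A minimal native-Spark sketch of the shape of the POC job: read an event file,
    // join it to a reference file, and aggregate, all in one job. Paths and column
    // names are assumptions for illustration only.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.sum;

    public class PocSparkJob {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("GenevaERS-POC").getOrCreate();

            // Event (transaction) records and a reference (customer) table.
            Dataset<Row> events = spark.read().option("header", "true").csv("events.csv");
            Dataset<Row> customers = spark.read().option("header", "true").csv("customers.csv");

            // Join then aggregate: the join is the step a later configuration could
            // swap for the GenevaERS join utility to compare approaches.
            Dataset<Row> summary = events.join(customers, "customer_id")
                    .groupBy("state")
                    .agg(sum("amount").alias("total_amount"));

            summary.write().mode("overwrite").csv("summary_by_state");
            spark.stop();
        }
    }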

The focus for our work is this Spark-POC repository. Clone, fork, edit, commit, create pull request, repeat.

On a daily basis there is an open scrum call on these topics, held at this virtual meeting link at 4:00 PM US CDT. This call is open to anyone.

Contributions from all are welcome.

Proposed GenevaERS 5.0 Architectural Direction

The following are some initial thoughts on how the next version of GenevaERS as an Open Source project might go:

Workbench

Currently the only way to specify GenevaERS processes (called GenevaERS “views”) is through the Workbench, a structured environment allowing specification of column formats and the values used to populate those columns, including logic called GenevaERS Logic Text.

GenevaERS developers have known for years that in some cases a simple language would be easier to use. The structured nature of the Workbench is useful for simple views, but becomes more difficult to work with for more complex views.

Specific Proposals

In response, we propose enhancements to the front-end of GenevaERS for the following:

  • Open up a new method of specifying logic besides the Workbench, initially Java.
  • This language would be limited to a subset of all the Java functions, as supported by the extract engines.
  • The current Workbench compiler would be modified to produce a GenevaERS logic table from the Java code submitted to it (a sketch of what such logic might look like follows this list).
  • Develop plug-ins for major IDEs (Eclipse, IntelliJ) that highlight use of functions GenevaERS does not support in the language.
  • GenevaERS Performance Engine Processes should be able to construct a VDP (View Definition Parameter) file from a mix of input sources.
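
As a purely hypothetical illustration of what view logic expressed in such a Java subset might look like, consider the sketch below. The ViewLogic, Record, and Emitter types are invented here for illustration; they are not an existing GenevaERS API, and the real front-end could look quite different.

    // Purely hypothetical sketch: the ViewLogic, Record, and Emitter types are
    // invented to illustrate the proposal and are not an existing GenevaERS API.
    import java.math.BigDecimal;

    interface Record {
        String getString(String field);
        BigDecimal getDecimal(String field);
    }

    interface Emitter {
        void emit(Object... columns);   // one call per extract record written
    }

    interface ViewLogic {
        void process(Record in, Emitter out);
    }

    // The modified compiler described above would translate logic like this into a
    // GenevaERS logic table rather than executing the Java directly.
    class CustomerBalanceView implements ViewLogic {
        public void process(Record in, Emitter out) {
            // General selection, of the kind expressed today in Logic Text.
            if (!"CLOSED".equals(in.getString("ACCOUNT_STATUS"))) {
                out.emit(
                    in.getString("CUSTOMER_ID"),          // column 1
                    in.getString("POSTING_DATE"),         // column 2
                    in.getDecimal("AMOUNT").negate());    // column 3: a simple calculation
            }
        }
    }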

Benefits

Doing this would allow:

  • Storage of GenevaERS process logic in source code management systems, enabling all the benefits of change management
  • Opening up to other languages; often extracts from GenevaERS repositories are more SQL-like because of the data normalization that happens within the repository
  • Taking in logic specified for Spark and other execution engines which conform to GenevaERS syntax, providing higher performance throughput for those processes
  • Beginning to open the possibility of constructing “pass” specifications, rather than having them defined simply in execution scripting.
  • Perhaps creating in-line “exit”-like functionality wherein the specified logic can be executed directly (outside the Logic Table construct).

Performance Engine

The GenevaERS Performance Engine uses the Logic Table and VDP file to resolve the GenevaERS processes (GenevaERS “views”).

Specific Proposals

Proposed enhancements include:

  • Expanding today’s very efficient compiler to more functions, to support a greater range of constructs expressed in the languages.  This would include things like greater control over looping mechanisms, temporary variables, greater calculation potential, and execution of in-line code and called functions within a view under the GenevaERS execution engine.
  • If there is interest and capacity, we may even move in the longer term towards an alternative Java execution engine.  This would be a dynamic Java engine, similar to the GenevaERS extract engine today, not one statically created for specific business functions as discussed below.

Alternative Engines

Existing customers have expressed interest in alternative execution engines for SAFR logic, as demonstrated in the Initial Open Source Work: Metadata Translation. The following are approaches that can be taken to these requests:

View to Stand-alone Java Program: It is possible to consider creating utilities which translate GenevaERS metadata to another language, each view becoming a stand-alone program which could simply be maintained as a custom process. This would provide a migration path to other tooling for very simple GenevaERS processes, when performance is not important.

Multi-View Execution Java Program: A set of GenevaERS views (a VDP) could be converted to a single Java program which produces multiple outputs in a single pass of the input file, similar to what GenevaERS does today.  In other words, it is possible to look at how GenevaERS performs the one-pass architecture, isolates differing business logic constructs from each other, performs joins, etc., and write new code to do these functions.  This would also provide performance benefits from learning from the GenevaERS architecture.
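
A minimal sketch of that one-pass, multi-output pattern, assuming simple comma-separated records and two invented view definitions, might look like this:

    // One-pass, multi-view sketch: each input record is read once and offered to
    // every view, and each view writes to its own output file. The file names,
    // record layout, and views are assumptions for illustration.
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Function;
    import java.util.function.Predicate;

    public class SinglePassRunner {
        record View(String name, Predicate<String[]> filter, Function<String[], String> columns) {}

        public static void main(String[] args) throws IOException {
            List<View> views = List.of(
                new View("high_value", r -> Double.parseDouble(r[2]) > 1000, r -> r[0] + "," + r[2]),
                new View("by_state",   r -> true,                            r -> r[1] + "," + r[2]));

            Map<String, PrintWriter> outputs = new HashMap<>();
            for (View v : views) {
                outputs.put(v.name(), new PrintWriter(Files.newBufferedWriter(Path.of(v.name() + ".csv"))));
            }

            // One pass over the event file: every view sees every record.
            try (BufferedReader in = Files.newBufferedReader(Path.of("events.csv"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] record = line.split(",");
                    for (View v : views) {
                        if (v.filter().test(record)) {
                            outputs.get(v.name()).println(v.columns().apply(record));
                        }
                    }
                }
            }
            outputs.values().forEach(PrintWriter::close);
        }
    }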

Dynamic Java Program: Today’s GenevaERS compiler, which produces a Logic Table, could be converted to produce a Java (or other language) executable. This might add the benefit of making the processes dynamic, rather than static.  This can have benefits for changing rules and functions in the GenevaERS Workbench, some sense of consistent performance for those functions, and the potential benefit of a growing community of GenevaERS developers for new functions.
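
To illustrate only the “dynamic” aspect, the JDK’s built-in compiler can compile generated source at run time and load it immediately; the generated class below and its single method are invented for illustration, not a SAFR interface.

    // Sketch of run-time generation: build Java source from view logic, compile it
    // with the JDK compiler, and load it. Requires a JDK (not just a JRE).
    import java.lang.reflect.Method;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import javax.tools.ToolProvider;

    public class DynamicViewCompiler {
        public static void main(String[] args) throws Exception {
            String source =
                "public class GeneratedView {\n" +
                "    public static boolean select(double amount) { return amount > 1000; }\n" +
                "}\n";

            Path dir = Files.createTempDirectory("genview");
            Path file = dir.resolve("GeneratedView.java");
            Files.writeString(file, source);

            // Compile the generated source into the same temporary directory.
            int rc = ToolProvider.getSystemJavaCompiler().run(null, null, null, file.toString());
            if (rc != 0) throw new IllegalStateException("compile failed");

            // Load the compiled class and invoke the generated selection logic.
            try (URLClassLoader loader = new URLClassLoader(new URL[]{dir.toUri().toURL()})) {
                Class<?> view = loader.loadClass("GeneratedView");
                Method select = view.getMethod("select", double.class);
                System.out.println("select(1500.0) = " + select.invoke(null, 1500.0));
            }
        }
    }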

Next Steps

These ideas will be discussed at an upcoming GenevaERS community call to gauge interest and resources which might be applied.

Training Module 22: Using Pipes and Tokens

The slides used in the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 22, using Pipes and Tokens

Slide 2

Upon completion of this module, you should be able to:

  • Describe uses for the SAFR piping feature
  • Read a Logic Table and Trace with piping
  • Describe uses for the SAFR token feature
  • Read a Logic Table and Trace with tokens
  • Debug piping and token views

Slide 3

This module covers three special logic table functions: the WRTK function, which writes a token; the RETK function, which reads the token; and the ET function, which signifies the end of a set of token views.  These functions are like the WR functions, the RENX function, and the ES function for non-token views.

Using SAFR Pipes does not involve special logic table functions.

Slide 4

This module gives an overview of the Pipe and Token features of SAFR.

SAFR views often either read data from other views or pass data to other views, creating chains of processes, each view performing one portion of the overall process transformation.

Pipes and Tokens are ways to improve the efficiency with which data is passed from one view to another.

Pipes and Tokens are only useful to improve efficiency of SAFR processes; they do not perform any new data functions not available through other SAFR features.

Slide 5

A pipe is a virtual file passed in memory between two or more views. A pipe is defined as a Physical File of type Pipe in the SAFR Workbench.

Pipes save unnecessary input and output. If View 1 outputs a disk file and View 2 reads that file, then time is wasted for output from View 1 and input to View 2. This configuration would typically require two passes of SAFR, one processing view 1 and a second pass processing view 2.

If there is a pipe placed between View 1 and View 2, then the records stay in memory, and no time is wasted and both views are executed in this single pass.

The pipe consists of blocks of records in memory.

Slide 6

Similarly a token is a named memory area. The name is used for communication between two or more views.

Like a pipe, it allows passing data in memory between two or more views.

But unlike pipes which are virtual files and allow asynchronous processing, tokens operate one record at a time synchronously between the views.

Slide 7

Because Pipes simply substitute a virtual file for a physical file, views writing to a pipe run in an independent thread from views reading the pipe.

Thus for pipes both threads are asynchronous.

Tokens, though, pass data one record at a time between views.  The views writing the token and the views reading the token are in the same thread.

Thus for tokens there is only one thread, and both views run synchronously.
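
As an analogy only (not SAFR internals), the difference can be pictured in Java as a bounded queue of record blocks shared by two threads versus a direct per-record call within a single thread; the class and record names here are invented.

    // Pipe analogy: asynchronous, block-at-a-time, two threads sharing a bounded queue.
    // Token analogy: synchronous, one record at a time, within the writer's own thread.
    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class PipeVersusToken {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<List<String>> pipe = new ArrayBlockingQueue<>(4);
            Thread writer = new Thread(() -> {
                try {
                    pipe.put(List.of("rec1", "rec2", "rec3"));   // "View 1" fills a block
                    pipe.put(List.of());                         // empty block signals end of file
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            Thread reader = new Thread(() -> {
                try {
                    List<String> block;
                    while (!(block = pipe.take()).isEmpty()) {   // "View 2" reads whole blocks
                        block.forEach(r -> System.out.println("pipe reader saw " + r));
                    }
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            writer.start(); reader.start();
            writer.join(); reader.join();

            // Token analogy: the "reader" is called immediately for each record written.
            for (String record : List.of("rec1", "rec2", "rec3")) {
                readToken(record);
            }
        }

        static void readToken(String record) {
            System.out.println("token reader saw " + record);
        }
    }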

Slide 8

An advantage of pipes is that they include parallelism, which can decrease the elapsed time needed to produce an output.  In this example,  View 2 runs in parallel to View 1. After View 1 has filled block 1 with records, View 2 can begin reading from block 1.  While View 2 is reading from block 1, View 1 is writing new records to block 2. This advantage is one form of parallelism in SAFR which improves performance. Without this, View 2 would have to wait until all of View 1 is complete.

Slide 9

Pipes can be used to multiply input records, creating multiple output records in the piped file, all of which will be read by the views reading the pipe.  This can be used to perform an allocation type process, where a value on a record is divided into multiple other records.

In this example, a single record in the input file is written by multiple views into the pipe.  Each of these records is then read by the second thread.

Slide 10

Multiple views can write tokens, but because tokens can only contain one record at a time, token reader views must be executed after each token write.  Otherwise the token would be overwritten by a subsequent token write view.

In this example, on the left a single view, View 1, writes a token, and views 2 and 3 read the token. View 4 has nothing to do with the token, reading the original input record like View 1.

On the right, both views 1 and 4 write to the token.  After each writes to the token, all views which read the token are called. 

In this way, tokens can be used to “explode” input event files, like pipes.

Slide 11

Both pipes and tokens can be used to conserve or concentrate computing capacity.  For example, CPU-intensive operations like look-ups, or lookup exits, can be performed one time in views that write to pipes or tokens, and the results added to the record that is passed on to dependent views.  Thus the dependent, reading views will not have to perform these same functions.

In this example, the diagram on the left shows the same lookup is performed in three different views.  Using piping or tokens, the diagram on the right shows how this lookup may be performed once, the result of the lookup stored on the event file record and used by subsequent views, thus reducing the number of lookups performed in total.

Slide 12

A limitation of pipes is the loss of visibility to the original input record read from disk; because of the asynchronous execution of thread 1 and thread 2, the record being processed in thread one at any moment is unpredictable.

There are instances where it is important for the view using the output from another view to have visibility to the original input record.  As will be discussed in a later module, the Common Key Buffering feature can at times allow for use of records related to the input record.  Also, the use of exits can be employed to preserve this type of visibility.  Token processing, because it is synchronous within the same thread, can provide this capability.

In this example, on the top the token View 2 can still access the original input record from the source file if needed, whereas the Pipe View 2 on the bottom cannot.

Slide 13

Piping can be used in conjunction with Extract Time Summarization, because the pipe writing views process multiple records before passing the buffer to the pipe reading views.  Because tokens process one record at a time, they cannot use extract time summarization.  A token can never contain more than one record.

In the example on the left, View 1, using Extract Time Summarization, collapses four records with like sort keys from the input file before writing this single record to the Pipe Buffer.  Whereas Token processing on the right will pass each individual input record to all Token Reading views.

Note that to use Extract Time Summarization  with Piping, the pipe reading views must process Standard Extract Format records, rather than data written by a WRDT function; summarization only occurs in CT columns, which are only contained in Standard Extract Format Records. 
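
As an analogy for the summarization behavior (not SAFR internals), collapsing records with like sort keys before handing them to the pipe buffer can be pictured as follows; the field names and amounts are assumptions.

    // Extract Time Summarization analogy: records with like sort keys are collapsed,
    // summing the amount, so only the summarized record reaches the pipe buffer.
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class ExtractTimeSummarizationSketch {
        record Extract(String sortKey, long amount) {}

        public static void main(String[] args) {
            List<Extract> extracted = List.of(
                new Extract("CUST1", 100), new Extract("CUST1", 250),
                new Extract("CUST1", 50),  new Extract("CUST1", 75));

            // Four records with the same sort key become one summarized record.
            Map<String, Long> summarized = new LinkedHashMap<>();
            for (Extract e : extracted) {
                summarized.merge(e.sortKey(), e.amount(), Long::sum);
            }
            summarized.forEach((key, total) ->
                System.out.println("write to pipe buffer: " + key + " " + total));
        }
    }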

Slide 14

Workbench set-up for writing to, and reading from a pipe is relatively simple.  It begins with creating a pipe physical file and a view which will write to the pipe.

Begin by creating a physical file with a File Type of “Pipe”.

Create a view to write to the Pipe, and within the View Properties:

  • Select the Output Format of Flat File, Fixed-Length Fields,
  • Then select the created physical file as the output file

Slide 15

The next step is to create the LR for the Pipe Reader Views to use.

The LR used for views which read from the pipe must match exactly the column outputs of the View or Views which will write to the Pipe.  This includes data formats, positions, lengths and contents.

In this example, Column 1 of the View,  ”Order ID” is a 10 byte field, and begins at position 1.  In the Logical Record this value will be found in the Order_ID field, beginning at position 1 for 10 bytes.

Preliminary testing of the new Logical Record is suggested: make the view write to a physical file first, inspect the view output to ensure the data matches the Pipe Logical Record, and then change the view to write to a pipe.  Looking at an actual output file to examine positions is easier than using the trace to detect whether positions are correct, as in the trace the Pipe only shows data referenced by a view, not the entire written record produced by the pipe writing view.

Slide 16

The view which reads the Pipe uses the Pipe Logical Record.  It also must read from a Logical File which contains the Pipe Physical File written to by the pipe writer view.

In this example, the view reads the Pipe Logical Record (LR ID 38) and the Pipe Logical File (LF ID 42), which contains the Physical File of type Pipe written to by the pipe writing view.  The view itself uses all three of the fields on the Pipe Logical Record.

Slide 17

This is the logic table generated for the piping views.  The pipe logic table contains no special logic table functions.  The only connection between the ES sets is the shared physical file entity.

The first RENX – ES set is reading Logical File 12.  This input event file is on disk.  The pipe writing view, view number 73, writes extracted records to Physical File Partition 51. 

The next RENX – ES set reads Logical File 42.  This Logical File contains the same Physical File partition (51) written to by the prior view.  The Pipe Reader view is view 74.

Slide 18

This is the trace for piped views.  In the example on the left, the thread reading the event file, with a DD Name of ORDER001, begins processing input event records.  Because the pipe writer view, number 73, has no selection criteria, each record is written to the Pipe by the WRDT function.  All 12 records in this small input file are written to the Pipe.

When all 12 records have been processed and Thread 1 has reached the end of the input event file, it stops processing and turns the pipe buffer over to Thread 2 for processing.

Thread 2, the Pipe Reader thread, then begins processing these 12 records. All 12 records are processed by the pipe reader view, view number 74.

As shown on the right, if the input event file had contained more records, enough to fill multiple pipe buffers, the trace still begins with Thread 1 processing, but after the first buffer is filled, Thread 2, the pipe reading view, begins processing.  From this point, the printed trace rows for Thread 1 and Thread 2 may be interspersed as both threads process records in parallel against differing pipe buffers.  This randomness is highlighted by each NV function against a new event file record.

Slide 19

Workbench set-up for writing to, and reading from a token is very similar to the set-up for pipes. It begins with creating a physical file with a type of token, and a view which will write to the token.

Begin by creating a physical file with a File Type of “Token”.

Create a view to write to the token, and within the View Properties, select the Output Format of Flat File, Fixed-Length Fields.

Then in the properties for the Logical Record and Logical File to be read, select the output Logical File and created Physical File as the destination for the view.

Slide 20

Like pipe usage, the next step is to create the Logical Record for the Token Reader Views to use. In our example, we have reused the same Logical Record used in the piping example.  Again, the logical record which describes the token must match the output from the view exactly, including data formats, positions, lengths and contents.

Slide 21

Again, the view which reads the Token uses this new Logical Record.  It also must read from a Logical File which contains the Token Physical File written to by the token writer view.

Note that the same Logical File must be used for both the token writer and token reader; using the same Physical File contained in two different Logical Files causes an error in processing. 

Slide 22

This is the logic table generated for the token views.

Any views which read a token are contained within an RETK – Read Next Token / ET – End of Token set at the top of the logic table.

The token writer view or views are listed below this set in the typical threads according to the files they read; the only change in these views is the WRTK logic table function, which Writes a Token shown at the bottom of the logic table in red.

Although the token reading views are listed at the top of the Logic Table, they are not executed first in the flow.  Rather, the regular threads process, and when they reach a WRTK Write Token logic table function, the views in the RETK thread are called and processed.  Thus the RETK – ET set is like a subroutine, executed at the end of the regular thread processing.

In this example, view 10462 is reading the token written by view 10461.  View 10462 is contained within the RETK – ET set at the top of the logic table.  View 10461 has a WRTK function to write the token.  After execution of this WRTK function, all views in the RETK – ET set are executed.

Slide 23

This is the Logic Table Trace for the Token Views.

Note that, unlike the Pipe Example which has multiple Event File DD Names within the Trace, because Tokens execute in one thread, the Event File DD Name is the same throughout, ORDER001 in this example.

The WRTK Write Token function on view 10461 in this example, which ends the standard thread processing, is immediately followed by the Token Reading view or views, view 10462 in this example.  The only thing that makes these views look different from the views reading the input event file is that they are processing against a different Logical File and Logical Record, Logical File ID 2025 and Logical Record ID 38 in this example.

Slide 24

The trace on the right has been modified to show the effect of having multiple token writing views executing at the same time. 

Note the multiple executions of View 10462, the token reading view, processing the same Event Record 1 from the same Event DD Name ORDER001, after each of the token writing views, 10460 and 10461.

Slide 25

The GVBMR95 control report shows the execution results from running both the Pipe and Token views at the same time.

  • At the top of the report, the Input files listed include:
    • the Disk File, the event file for both the pipe and token writing views,
    • the Token (for the Token Reading views), and
    • the Pipe (for the Pipe Reading views).
  • The bottom of the report shows the final output disposition, showing records were written to:
    • an Extract File (shared by the pipe and token reading views),
    • the Token (for the Token Writing View), and
    • the Pipe (for the Pipe Writing View).
  • The center section shows more details about each one of the Views.

Slide 26

The top of this slide shows the final run statistics about the run from the GVBMR95 control report.

Note that a common error message may be encountered if view selection for the pass is not carefully constructed, for either Pipes or Tokens.  Execution of a pass which contains only  Pipe Writers and no Pipe Readers will result in this message.  The opposite is also true, with no Pipe Writers and only Pipe Readers selected.  The same is also true for token processes.  Pairs of views must be executed for either process, matched by Logical Files containing matching Physical Files for the process to execute correctly.

Slide 27

This module described using piping and tokens. Now that you have completed this module, you should be able to:

  • Describe uses for the SAFR piping feature
  • Read a Logic Table and Trace with piping
  • Describe uses for the SAFR token feature
  • Read a Logic Table and Trace with tokens
  • Debug piping and token views

Slide 28

Additional information about SAFR is available at the web addresses shown here. This concludes Module 22, The SAFR Piping and Token Functions

Slide 29

Research on Next Generation SAFR: 2017

The following is an R&D effort to define the next generation of SAFR, composed by Kip Twitchell, in March 2017. It was not intended to be a formally released paper, but the ideas expressed herein have continued to resonate with later assessments of the tool direction.

Proposal for Next Generation SAFR Engine Architecture

Kip Twitchell

March 2017

Based upon my study over the last two months, in again considering the next generation SAFR architecture, the need to separate the application architecture from the engine architecture has become clear: the same basic combustion engine can be used to build a pump, a generator, a car, or a crane.  Those are applications, not engines.   In this document, I will attempt to outline a possible approach to the Next Generation SAFR engine architecture, and then give example application uses after that.

This document has been updated with the conclusions from a two-day workshop at SVL, in the section titled “Workshop Conclusions” after the proposal in summary.

Background

This slide from Balancing Act is perhaps a simple introduction to the ideas in this document:

(from Balancing Act, page 107)

Key SAFR Capabilities

The SAFR architecture recognizes that building balances from transactions is critical to almost all quantitative analytical and reporting processes.  All financial and quantitative analytics begins with a balance, or position.  Creating these is the key SAFR feature.

Additionally, SAFR enables generation of temporal and analytical business events in the midst of the posting process, and posts these business events at the same time.  In other words, some business events are created from outstanding balances.  SAFR does this very effectively.

Third, the rules used to create balances and positions sometimes change.  SAFR’s scale means it can rebuild balances when rules change; for reclassification and reorganization problems, changes in required analyses, or new, unanticipated analyses.

Fourth, SAFR’s high speed join capabilities, especially date effective joins, and the unique processing model for joins on high cardinality, semi-normalized table structures creates capabilities not available in any other tool.

Last, the ability to do all these things in one pass through the data, post initial transactions, generate new transactions, reclassify some balances or create new ones, and join all the dimensions together, along with generating multiple outputs, and do so efficiently, creates a unique set of functions.

This combined set of features is what is enabled by the bottom diagram.

Key SAFR Limitations

SAFR has often been associated with posting or ETL processes, which are typically more closely aligned with the tail end of the business event capture processes.  It is also associated with the report file generation processes at the other end of the data supply chain.  It has done so without fully embracing either end of this spectrum: it has attempted to provide a simplified user interface for reporting processes which was never quite a language for expressing posting processes, and it has not integrated with SQL, the typical language of reporting processes.

The idea behind this paper is that SAFR should more fully embrace both ends of this spectrum.  However, to be practical, it does not propose attempting to build a tool to cover the end-to-end data supply chain, the scope of which would be very ambitious. Covering the entire data supply chain would require building two major user interfaces, for business event capture and then for reporting or analysis.   The approach below contemplates allowing most of the work on either of these interfaces to be done by other tools, and allowing those tools to use the SAFR engine, to reduce the scope of the work involved.

To keep the writing brief, this paper assumes an understanding of how SAFR processes data today.  This document is intended for an internal SAFR discussion, not a general customer engagement.

Proposal in Summary

In summary, SAFR should be enhanced as follows:

Overall

1) The SAFR engine should be made a real-time engine, capable of real time update and query capabilities

2) It should also be configurable to execute in a batch mode if the business functions require (system limitations, posting process control limitations, etc.).

Posting Processes

3) The existing workbench should be converted or replaced with a language IDE, such as Java, JavaScript, Scala, etc. The compile process for this language would only permit a subset of the total functions of those languages that fit in the SAFR processing construct.

4) When the engine is executed in “Compiled Code” mode, (either batch or real time), it would generally perform functions specified in this language, and the logic will be static.

5) Real-time mode application architecture would typically use SAFR against indexed access inputs and outputs, i.e., databases capable of delivering specific data for update, and applying updates to specific records.

Reporting Processes

6) SAFR should have a second logic interpreter in addition to the language IDE above, and accept SQL as well.  This would generally be used in an “Interpreted Logic” mode.

7) To resolve SQL, SAFR would be a “bypass” SQL interpreter.  If the logic specified in SQL can be resolved via SAFR, the SAFR engine should resolve the request. 

8) If SAFR cannot satisfy all the required elements, SAFR should pass the SQL to another SQL engine for resolution.  If the data is stored in sequential files and SAFR is the only resolution engine available, an error would be returned.

Aggregation and Allocation Functions

9) SAFR functionality should be enhanced to more easily perform aggregation and allocation functions. 

10) This may require embedded sort processes, when the aggregation key is not the natural sort order of the input file.

11) A more generalized framework for multiple level aggregation (subtotals) should be considered, but may not be needed in the immediate term.

12) Additionally, a generalized framework for hierarchy processes should also be contemplated.

Workshop Conclusions:

In the two-day workshop I made the point that the z/OS version of SAFR does not meet the requirements of this paper, and neither does the SAFR on Spark version.  Beyond this, the workshop concluded the following:

(1) The existing SAFR GUI and its associated meta data tables, language, and XML are all inadequate to the future problems, because the GUI is too simple for programming—thus the constant need for user exits—and it is too complex for business people—thus one customer built RMF;

(2) The database becomes a problem in the traditional architecture because there is this explosion of rows of data in the LR Buffer which then collapses down to the answer sets, but the database in the middle means it all has to get stored before it can be retrieved to be collapsed. 

(3) Platform is not an issue considered in this paper, and the experience of both existing customers seems to demonstrate there is something unique in SAFR that other tools do not have.

(4) We agreed that although the data explosion and the database in the middle are a problem, taking on the SQL interpretation mode was a bridge too far; we should let the database do the work of satisfying the queries.  SAFR can still produce views which select and aggregate data to be placed in the database for use, as it has historically.

(5) Stored procedures are just too simplistic for the kinds of complex logic in a posting process; so, we need to build something outside the database which can access the database for inputs and for storing outputs.

(6) There is not a good reason to require the programmer to go clear to a procedural language to specify the SAFR process; the SAFR on Spark POC SQL is a declarative language that does not put as much burden on the developer.

(7) It is possible to use in-line Java code within the SAFR on Spark prototype when a procedural language is required.  This could eliminate simple user exits.

(8) Spark is a good framework for a distributed work load; z/OS manages the parallel processes very effectively, but requires a big server to do so.  Using another language on another platform would require building that distribution network.  That is expensive.

(9) The real time capabilities would simply require creation of a daemon that would then execute the SAFR on Spark processes whenever needed.  This is nearly a trivial problem; performance and response time notwithstanding.

(10) We do not have enough performance characteristics to say if the Spark platform and the prototype will scale for any significant needs.

Real-time Engine and IO Models

The following table outlines likely processing scenarios:

                  Real-time Mode                                    Batch Mode
    Input Types   Database or Messaging Queue                       Sequential Files, Database, or Messaging Queue
    Output Types  Database, Messaging Queue, or Sequential Files    Sequential Files

Sequential files in this table include native Hadoop structures.  Output from batch mode files could be loaded into databases or messaging queues beyond SAFR processing, as is typically done today with SAFR.

Recommendation:  SAFR today recognizes different IO routines, particularly on input.  It has an IO routine for sequential files, DB2 via SQL, VSAM, DB2 via VSAM. It has had prototype IMS and IDMS drivers.  This same construct can be carried forward into the new version, continuing to keep separate the logic specification from the underlying data structures, both inbound and outbound.  New data structure accesses can be developed over time.

Enhancement:  An additional processing approach could be possible but less likely except in very high volume situations: Real-time mode against input sequential files.  In this configuration, SAFR could continually scan the event files.  Each new request would jump into the ongoing search, noting the input record count of where the search begins.  It would continue to the end of the file, and through the loop back to the start of the file, continuing until it reaches the starting record count again, and then end processing for that request.  This would only be useful if very high volumes of tables scans and very high transaction volumes made the latency in response time acceptable.
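
A minimal sketch of that wrap-around scan, with invented records and selection logic, might look like this:

    // Wrap-around scan sketch: a request joins the continuous scan at the current
    // record count and completes when the scan wraps back to that count.
    import java.util.List;

    public class ContinuousScanSketch {
        public static void main(String[] args) {
            List<String> eventFile = List.of("r0", "r1", "r2", "r3", "r4", "r5");

            int scanPosition = 4;              // wherever the ongoing scan happens to be
            int start = scanPosition;          // the request notes its starting record count
            long matches = 0;

            do {
                String record = eventFile.get(scanPosition);
                if (record.endsWith("2")) {    // the request's selection logic (invented)
                    matches++;
                }
                scanPosition = (scanPosition + 1) % eventFile.size();   // loop back to the start of the file
            } while (scanPosition != start);   // stop once back at the starting count

            System.out.println("request satisfied after one full cycle; matches = " + matches);
        }
    }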

Impediments:  The challenge to a real-time SAFR engine is all the code generation and interpretation processes in the Performance Engine.  Making these functions very fast from their current form would be difficult.  Using an IDE for logic specification has a much better chance of making the performance possible.  Performance for the interpreted mode would require significant speed in interpretation and generation of machine code from the SQL logic.

Compiled Code Mode

Background:  Major applications have been built using SAFR.  The initial user interface for SAFR (then called Geneva, 25 years ago) was intended to be close to an end-user tool for business people creating reports.  The Workbench (created starting in 1999) moved more and more elements away from business users, and is now used wholly by IT professionals.

Additionally, because of the increasingly complex nature of the applications, additional features to manage SAFR views as source code have been developed in the Workbench.  These include environment management and attempts at source code control system integration.  These efforts are adequate for the time being, but not highly competitive with other tools.

SAFR’s internal “language” was not designed to be a language, but rather to provide logic only for certain parts of a view.  The metadata constructs provide structures, linkage to tables or files, partitioning, and join logic.  The view columns provide structure for the fields or logic to be output.  The SAFR language, called Logic Text, only specifies logic within these limited column and general selection contexts.

Recommendation:  Given the nature of the tool, it is recommended SAFR adopt a language for logic specification.  I would not recommend SQL as it is not a procedural language, and will inhibit new functionality over time.

The fundamental unit for compile would be a Pass or Execution (sketched after the list below), including the following components:

  • A Header structure, which defines the meta data available to all subprograms or “method”.  It would contain:
    • The Event and lookup LR structures
    • The common key definition
    • Join paths
    • Logical Files and Partitions
    • Available functions (exits)
  • A View Grouping structure,
    • Defining piping and token groups of views.
    • Tokens and similar (new) functions could potentially be defined as in-line (or view) function calls (like tokens) which can be used by other views.
  • Each view would be a method or subprogram that will be called and passed each event file record
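
A purely hypothetical sketch of the shape of such a compile unit follows; every type and field name is invented to illustrate the structure and is not an existing SAFR interface.

    // Hypothetical shape of the proposed Pass/Execution compile unit.
    import java.util.List;

    class ExecutionHeader {
        List<String> logicalRecords;       // Event and lookup LR structures
        List<String> commonKeyFields;      // the common key definition
        List<String> joinPaths;            // join paths
        List<String> logicalFiles;         // Logical Files and Partitions
        List<String> availableFunctions;   // available functions (exits)
    }

    class ViewGroup {
        String name;                       // a piping or token group
        List<String> viewMethodNames;      // views executed together in this group
    }

    interface ExecutionUnit {
        ExecutionHeader header();
        List<ViewGroup> groups();
        // Each view is a method/subprogram called once per event file record.
        void view1(byte[] eventRecord);
        void view2(byte[] eventRecord);
    }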

The final completed Execution code could be compiled, and the resulting executable managed like any other executable; the source code could be stored in any standard source code control library.  All IT control and security procedures could follow standard processes.  This recognizes the static nature of most SAFR application development processes.

Although this approach offloads some portion of the user interface, language definition, and source code management issues, the next gen SAFR code base would have to include compilers for each platform supported.  Additional SAFR executables would be IO modules, and other services like join, common key buffer management, etc. for each platform supported.

Impediments:  A standard IDE and language will likely not include the Header and Grouping structure constructs required.  Can a language and IDE be enhanced with the needed constructs? 

Interpreted Logic Mode

Background:  Certain SAFR processes have been dynamic over the years, in various degrees.  At no time, however, has the entire LR and meta data structure definition been dynamic. In the early days, the entire view was dynamic in that there were fairly simple on-line edits, but some views were disabled during the production batch run. In more recent years fewer pieces have been dynamic.  Reference data values have always been variable, and often been used to vary function results, in the last two decades always from externally maintained systems like ERP systems.  Most recently, new dynamic aspects have been bolted on through new interfaces to develop business rules facilities. 

The SAFR on Spark prototype application demonstrated the capability to turn SAFR logic (a VDP, or the VDP in XML format) into SQL statements quite readily.  It used a CREATE statement to specify the LR structure to be used.  It had to create a new language construct, called MERGE to represent the common key buffering construct.  SQL tends to be the language of choice for most dynamic processes.  SQL also tends to be used heavily in reporting applications of various kinds. 

Recommendation:  Develop a dynamic compile process to turn SAFR-specific SQL into SAFR execution code.   This will allow other reporting tools to gain the benefits of SAFR processes in reporting.  If the data to be accessed by SAFR is in a relational database (either the only location or replicated there), SAFR could pass off the SQL to another resolution engine if non-SAFR SQL constructs are used.

Impediments:  It is not clear how the MERGE parameters would be passed to the SAFR SQL Interpreter.  If sequential files are used for SAFR, would a set of dummy tables in a relational system provide a catalogue structure sufficient for the reporting tool to generate the appropriate reporting requirements?  Also, can the cost of creating two compilers be avoided?

Enhancement:  CUSTOMER’s EMQ provides a worked example of a reporting application which resolves SQL queries more efficiently than most reporting tools.  It uses SQL SELECT statements for the general selection criteria, and the database functions for sort (ORDER BY) and some aggregation (GROUP BY).  When considering SAFR working against an indexed access structure, this construct may be similar.  More consideration should perhaps be given.

Compiled Code and Interpreted Logic Intersection

Background:  The newer dynamic, rules-based processes and the older, more static Workbench-developed processes cannot be combined into a single process very easily because of duplicated metadata.  This arbitrary division of functions to be performed in Compiled Code mode versus Interpreted Mode is not desirable; there may be instances where these two functions can be blended into a single execution.

Recommendation: One way of managing this would be to provide a stub method which allows for calling Interpreted processes under the Compiled object.  The IDE for development of the Interpreted Mode logic would be responsible for ensuring conformance with any of the information in the header or grouping structures.

Aggregation and Allocation Processes

Background: Today’s SAFR engine’s aggregation processes are isolated from the main extract engine, GVBMR95, in a separate program, GVBMR88.  Between these programs is a sort step, using a standard sort utility.  This architecture can complicate the actual posting processes, requiring custom logic to table records, sort them (in a sense, through hash tables), detect changed keys, and aggregate the last record before processing new records.

Additionally, today’s SAFR does not natively perform allocation processes, which require dividing transactions or balances by some set number of additional records or attributes, generating new records.  Also, SAFR has limited variable capabilities today. 

Recommendation:   Enhancing SAFR with these capabilities in designing the new engine is not considered an expensive development; it simply should be considered and developed from the ground up.

Next Steps

Feedback should be sought on this proposal from multiple team members, adjustments to it documented, and rough order of estimates of build cost developed.

Consolidated Data Supply Chain: 2016

The following slides were developed while consulting at 12 of the top 25 world banks about financial systems from 2013-2016. Although not specific to GenevaERS, they delineate the causes of the problems in our financial systems, and the potential impact of GenevaERS-supported systems and the Event Based Concepts.

Additional details can be read in this white paper: A Proposed Approach to Minimum Costs for Financial System Data Maintenance, and further thinking on the topic in this blog and video presentation from 2020: Sharealedger.org: My Report to Dr. Bill McCarthy on REA Progress.

Training Module 21: The SAFR Write Function

The slides used in the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 21, The SAFR Write Function

Slide 2

Upon completion of this module, you should be able to:

  • Describe uses for SAFR Write Function
  • Read a Logic Table and Trace with explicit Write Functions
  • Properly use Write Function parameters
  • Debug views containing Write Functions

Slide 3

This module covers the WR function codes in more detail.  These function codes were covered in training modules 11, 12, 17 and 18. 

Slide 4

Standard SAFR views always contain one of these four write functions, either (1) writing a copy of the input record, (2) writing the DT area of an extract record, (3) writing a standard extract record, or (4) writing a summarized extract record.  Typically views contain an implicit WR function in the Logic Table, almost always as the last function of the view.  This can be seen in the logic table on the left.

In constructing more complex SAFR applications, at times greater control is required over what type of records are written, where, how many, and when they are written.   The Write Function is a Logic Table verb which gives greater control over when WR functions are placed in the Logic Table. The Write Function, inserted into a view as logic text, creates additional WR functions, like the logic table shown on the right.

Let’s look at two examples of problems the Write Verb can help solve.

Slide 5

The Write Function can be used to “flatten” records that have embedded arrays, creating normalized records.  This is possible because for each record read by GVBMR95, multiple records can be written.  In this example, from Customer record 1, three records can be written, for products A, Q and C. 
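
A plain Java analogy of this flattening (not Logic Text) is shown below: one input record with an embedded product array is written out as one normalized record per array entry. The field names and layout are assumptions.

    // Flattening analogy: one "write" per array entry, like a Write Verb placed in
    // the column that completes each normalized record.
    import java.util.List;

    public class FlattenArraySketch {
        record Customer(String id, List<String> products) {}

        public static void main(String[] args) {
            Customer input = new Customer("Customer1", List.of("ProductA", "ProductQ", "ProductC"));

            for (String product : input.products()) {
                System.out.println(input.id() + "," + product);   // three output records from one input
            }
        }
    }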

Slide 6

Another typical use of the Write Verb is to split a file into multiple partitions.  Partitioning is important to be able to increase parallelism of the system in later steps. 

For example, suppose the day’s transactions are delivered in a single file.  Yet the SAFR environment has multiple partitions, a master file for each set of customers, perhaps by state, or for a range of customer IDs.  To be able to add the day’s transactions to each customer master file, the daily file must be split into multiple files, one to be used in each master file update process.

To do this, the Write Verb is inserted into Logic Text “If” statements, testing the Customer ID and writing the record to the appropriate file.

The Write Verb and associated parameters control all aspects of writing records, including specifying which record type to write, which file (DD Name) to write it to, and if a Write User Exit should be called.  Let’s first review how these parameters are controlled on standard views, when the Write Verb is not used. 

Slide 7

The following rules determine which Logic Table Function is generated when a view contains no Write Verb and calls no Write Exit: 

  • Views which use the Format Phase, either for File Format Output or for Hard Copy Reports, contain an implicit WRXT Functions to write standard extract records. 
  • Views which use Extract Time Summarization, which requires standard extract file format records, contain an implicit WRSU to write summarized extract records.
  • File output views which DO NOT use the Format Phase are Extract Only views.  Views which are Copy Input views—specified by selecting “Source-Record Structure”—contain WRIN Functions. All others contain WRDT functions to write the DT Data area of the extract record.

Almost all use of the Write Verb is for extract only views—views which do not use the Format Phase.  Because the WRXT and WRSU functions write standard extract file records (which are typically processed in the Format Phase), they are rarely used in conjunction with the Write Verb. 

Slide 8

The DD Name used for Extract Only views can be specified three ways, as reviewed on this slide, and the next. 

If an output physical file is specified in the View Properties, the Output DD Name associated with the physical file will be used.  In this example, the output physical file is OUTPUT01.  If we find the DD name for physical file OUTPUT01 by selecting Physical Files from the Navigator pane opening the OUTPUT01 physical file shows the output DD name of OUTPUT01.

Slide 9

If the physical file value is blank, a new DD name will be generated at runtime.  The most common generated name is the letter “F” followed by the last seven digits of the view number, using leading zeros if required.   In this left hand example, the DD Name is “F0000099”.

Alternatively, an Extract File Suffix can be used.  If the Extract File Number is assigned a value, the Extract Engine forms the DD Name by concatenating the value “EXTR” with three digits representing the Extract Work File Number, padded with leading zeros if necessary.  In the right-hand example, the extract records will be written to the file assigned to the DD Name EXTR001.
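
Expressed as simple formats for illustration, the two generated DD name rules described above look like this:

    // Generated DD name rules: "F" plus the last seven digits of the view number,
    // or "EXTR" plus a three-digit Extract Work File Number, both zero-padded.
    public class GeneratedDdNames {
        static String viewDdName(int viewNumber) {
            return String.format("F%07d", viewNumber % 10_000_000);
        }

        static String extractDdName(int extractFileNumber) {
            return String.format("EXTR%03d", extractFileNumber);
        }

        public static void main(String[] args) {
            System.out.println(viewDdName(99));       // F0000099, as in the left-hand example
            System.out.println(extractDdName(1));     // EXTR001, as in the right-hand example
        }
    }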

Slide 10

The last parameter the Write Verb controls is whether a Write Exit is called.  When the Write Verb is not used, write exits are assigned in view properties, in the Extract Phase tab.  The write exit is called as part of the implicit WR function generated for the view.

Slide 11

These are the elements of the Write Function syntax.  The Write verb keyword can be followed by three parameters which specify:

  • Source
  • Destination
  • User Exit

The source parameter can be used to specify three possible sources:

  • Input data, which means the view is a Copy Input view, the extract file containing a copy of the input record
  • Data, which means the DT area data is written to the output file
  • View, which means the standard view extract record is written to the output file.

The destination parameter allows modification of the target for the write:

  • The Extract parameter allows specifying the DD Name suffix of the extract file
  • The File parameter allows associating the write to a specific logical file

The user exit parameter allows specifying a user exit to be used

Slide 12

The following are examples of Write Functions.  Each of these example statements first tests field 4 to see if it is numeric.  If it is, then data will be written in some way:

  • In example 1, the input data will be copied to an assigned Copy Input (Source=Input) Physical File associated with a specific Logical File (Destination=File)
  • In example 2, the write verb causes a call to the DB2_Update User Exit (Userexit=DB2_Update).  It passes the exit the DT area data of the view. (Source=Data)
  • In example 3, the write verb writes a Standard Extract Record (Source=View) to DD Name EXTR003 (Destination=Extension=03)

Slide 13

The Write Verb can be used in either Record Level Filtering Logic, or Column Assignment Logic.  When a Write Verb is used in Record Filtering, the Select and Skip verbs cannot be used.  The Write Verb replaces the need for these keywords.

The logic shown here is Record Level Filtering logic, and could be used to partition a file containing multiple customers into individual files.  Record level filtering requires the use of the Source=Input parameter.  This parameter copies input records to the extract file.  This is required because no columns are created prior to executing Record Level Filtering logic; the only valid output is to copy the input record.

In this example, the logic in the record filter first tests the Customer ID.  If it is “Cust1,” then the record is copied to the DD Name EXTR001 because of the Destination=01 parameter.  The second If statement will copy Customer 4 records to the EXTR004.  Because the logic is mutually exclusive, (the Customer ID can only be either Cust1 or Cust4, not both), the total output records will equal the input records, except they will be in two different files.

Slide 14

Column Assignment Logic can use the Source=Input parameter if desired.  More frequently, it uses the Source=Data parameter.  This parameter writes all DT columns that have been built in the extract record to that point.  Thus logic can be used to select particular fields from the event file, and write a fixed length record to the output extract file.

Remember that the logic table creates fields in order by column numbers, from left to right.  This affects which column should contain the Write Verb. With a Write Verb in multiple columns, multiple records, of different lengths, will likely be written to the extract file. 

This example attempts to break a record with an array into multiple individual records.  The Write Verb in each column causes an extract record to be written with the DT data built to that point in the Logic Table.  As each field is added, the record becomes longer.  This is not typically a desired SAFR output.

Slide 15

Alternatively, the Write Verb is often placed only in the last column of the view, thus acting upon the extract record after all columns have been built.  In order to change previous columns, the logic text can reference those columns specifically.  Thus values in prior columns can be manipulated within the last column, prior to use of the Write Verb.

This example again attempts to break the array into multiple records.  Because logic text can refer to previous columns, logic can be placed in column four which changes the value of column 3.  Thus both the Product ID and the Amount fields are set in column 4, and then a record is written.  Then these values are overwritten by the next Product ID and Amount combination, and another record is written.  Finally the last two fields are moved, and the last record written.  This is a more common design of a view containing multiple Write Verbs.

Slide 16

Let’s take a moment to contrast the prior view, which uses multiple write statements in one view, with an approach using multiple views.  Because each view contains its own implicit Write function, by writing multiple views we could break apart the array.  To do so, the first view would reference the 1st array fields, Product and Amount 1; the second view the 2nd set of fields, and the third the 3rd set of fields.  Thus by reading one event file record, three extract records would be written.

The downside of this approach is that the logic to populate fields 1 and 2, the Customer ID and Tran Date, would be replicated in all three views; changes to these fields would need to be made in three places. 

Slide 17

Using the Write Function keywords, any of the WR functions can be generated in a logic table.

In these example logic tables, a standard extract only view without a Write Verb is shown on the left.  This view has one WR function, as the last function in the logic table.

The logic table on the right shows the write verb, with Source=Data, has been coded multiple times in logic text. 

Note that the sequence number is 001, meaning all these write verbs are contained in the logic text of column 1.  Each of the logic table functions immediately preceding these WRDT functions may modify the extracted data during the Extract Phase.  Each will result in a different extract record being written to the extract file.

Slide 18

This module described SAFR Write Function. Now that you have completed this module, you should be able to:

  • Describe uses for SAFR Write Function
  • Read a Logic Table and Trace with explicit Write Functions
  • Properly use Write Function parameters
  • Debug views containing Write Functions

Slide 19

Additional information about SAFR is available at the web addresses shown here. This concludes Module 21, The SAFR Write Function

Slide 20

Training Module 20: Overview of SAFR User-Exit Routines

The slides used in the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 20, Overview of SAFR User-Exit Routines

Slide 2

Upon completion of this module, you should be able to:

  • Describe uses for SAFR user-exit routines
  • Read a Logic Table and Trace which use exits
  • Explain the Function Codes used in the example
  • Debug views using exits

Slide 3

The prior modules covered the function codes highlighted in blue.  Those to be covered in this module are highlighted in yellow.  These include the RENX, LUEX and WREX functions.

Slide 4

Let’s briefly review the SAFR Performance Engine processes.

Slide 5

SAFR begins with developers creating metadata in the Workbench describing records, files, fields, relationships between files and fields, and user exits which might be called to perform processing.  They then create views, which specify the logic to be applied to the data through the scan processes.  The metadata and the views are stored in the SAFR View and Metadata Repository, awaiting the start of the Performance Engine.

Slide 6

The first parts of the Performance Engine perform the Select, Logic, and Reference File phases.  The Select Phase selects the views and associated metadata for those views from the View and Metadata Repository.  Alternatively, this phase can read data provided in a SAFR XML schema, which defines the metadata and functions to be performed against the data.  It bundles either of these into a VDP, the View Definition Parameter file.

The VDP is used in the Logic Phase to produce two Logic Tables, one for the Reference File Phase, and one for the Extract Phase.  The logic tables contain the function codes described in this and the prior modules.

The Extract Phase Logic Table file is used to process event files, but before that the Reference File Logic Table is used to extract a core image file from the reference files.  The Reference File Logic Table does not contain any selection logic, thus it does not remove any rows of data.  Rather, the core image file contains only those fields (columns) needed to be loaded into memory for joins, plus the keys required for the join processes and any effective dates.

Slide 7

The Extract Phase begins by loading the VDP, Extract Phase Logic Table and Reference Data from disk.  It then uses the logic table to generate machine code for each thread, one for each input file partition.  It then opens input and output files, and executes these threads according to the thread governor, in parallel.  Each event file record is read and processed through the logic table, performing joins, and writing selected data to output extract files.

The Extract Phase includes multiple types of user exits, including read, write and look-up exits, highlighted on this graphic by the green balls.

Slide 8

The format phase, which is optional depending on the view requirements, begins by sorting each extract file using a custom generated sort card based upon the sort criteria of the views written to that extract file.  During the final phase of sort processing—that which writes data to disk—the records are passed in memory to the format engine.  These records are formatted according to the VDP specifications.  The data is written to the final output file.

The Format Phase includes the Format Exit, shown here at the end of the Format process.  Sort utilities also allow creating sort input exits, shown here over the Sort process.

Let’s overview each of these user exits.

Slide 9

User exits, which can also be thought of as API points, are custom programs to perform functions SAFR does not do natively.  For example, SAFR does not perform an exponential mathematical calculation.  A customer could create a program to perform this function, and call it from SAFR to return the results of the calculation.

As noted, SAFR has four major points which can invoke a user exit: read, lookup, write and format exits.  The first three are Extract Phase exits, and are used much more frequently than the fourth, the Format Phase exit.  They are:

  • Read Exits stand between the actual input event file and the SAFR views.  These exits can modify input records to be presented to SAFR threads for processing.
  • Lookup Exits stand between SAFR views and the look-up data loaded into memory.   Lookup exits accept join parameters and return looked up records in response to individual joins. These exits can also be used as simple function calls which do not actually perform any look-up.  For example the exponential calculation discussed above could be written as a look-up exit.
  • Write Exits stand between SAFR and the extract files.  They receive extract records and can manipulate them before being written to extract files. 
  • Format Exits, the only GVBMR88 exit, accept summarized and formatted Format Phase output records prior to their being written to files. Format exits are very similar to write exits, except that the record used is the final output record, rather than the extract record.

At the end of this module we will touch on a non-SAFR exit, the Sort Input Exit.

Slide 10

As noted, read exits sit between the extract engine and the input event file.  The SAFR views, inside GVBMR95, are only aware of the virtual file which is output by the read exit.  Read Exits are assigned to a physical file, and are placed on the RENX logic table function.  The exit is called each time the RENX function is executed.

Examples of read exits that have been written for SAFR applications include:

  • Read a specialized database structure and extract every record to be passed to SAFR to allow SAFR to produce reports against those records
  • Merge multiple sequential files, comparing snapshot records with the history file, into a single master file which is then used to produce reports
  • Process sets of records and perform functions across the entire set, where one record can affect other records, either earlier or later in the set.  SAFR does not easily perform these functions in views.  After all effects are determined and applied, the file is passed to SAFR for report generation.

Read exits are typically the most complex to write, because they must perform IO of some kind.  They are further complicated because it is very inefficient to call a read exit for each record, so instead they are usually written to do block-level processing.
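
To make the block-level idea concrete, here is a minimal Java sketch, assuming a hypothetical BlockReader class rather than the actual SAFR read exit linkage (which is a parameter list described later in this module): the caller receives a whole block of records per call, so per-call overhead is paid once per block instead of once per record.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical illustration of block-level reading: each call returns up to
    // blockSize records, so the per-call overhead is amortized across the block.
    public class BlockReader {
        private final BufferedReader input;
        private final int blockSize;

        public BlockReader(Reader source, int blockSize) {
            this.input = new BufferedReader(source);
            this.blockSize = blockSize;
        }

        // Returns up to blockSize records, or an empty list at end of file.
        public List<String> nextBlock() throws IOException {
            List<String> block = new ArrayList<>(blockSize);
            String record;
            while (block.size() < blockSize && (record = input.readLine()) != null) {
                block.add(record);
            }
            return block;
        }
    }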

Slide 11

Look-up exits sit between the extract engine and a potential look-up file. The SAFR views are only aware of the virtual lookup record output by the lookup exit.  A sample application could be doing direct reads to a database table to retrieve a join value for processing.  However, most often look-up exits do not actually load any data from disk; rather they simply use input parameters passed to them by the views to do some function.  Thus the exit is basically a simple function call.

Look-up exits are the easiest type of exit to create. The parameters passed to the lookup exit are the values placed in the fields of the join key.  These can be constants, fields from the event file, or fields from another lookup, including calls to other exits.  The output from the lookup exit is a record that must match the LR for the “reference file” record it is to return.   Although it appears to a SAFR developer as if SAFR has taken the keys and performed a search of a reference table to find the appropriate record, the exit may have done no such thing.  In fact, it could do something as simple as reordering the fields passed to it and returning the record.
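
As a conceptual illustration only, and not the real SAFR exit linkage, a look-up “exit” that performs no IO could be pictured as the hypothetical Java sketch below: it takes the key values built by the look-up path and simply returns them, reordered, as the “looked-up” record.

    // Hypothetical sketch: a "look-up" that reads nothing from disk.  It simply
    // reorders the key fields passed to it and returns them as the record the
    // view expects, matching the layout of the target LR.
    public class ReorderLookup {
        // keys arrive in the order built by the look-up path; the returned
        // "record" presents them in the order defined by the target LR.
        public static String lookup(String[] keys) {
            return keys[2] + keys[0] + keys[1];
        }

        public static void main(String[] args) {
            // For example, a path that builds the keys "D", "0001" and "AB"
            System.out.println(lookup(new String[] {"D", "0001", "AB"}));  // prints ABD0001
        }
    }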

Lookup Exits are assigned to a target look-up LR.  When used, the typical LUSM functions are changed to LUEX functions.   The exit is called each time the LUEX function is executed. 

Certain modules are delivered with the product, called SAFR Standard Look-up Exits.  These perform a common set of functions not done by native SAFR.   For example:

  • GVBXLEXP: an exponential calculator; if passed a number, format and exponent, it will perform the calculation
  • GVBXLMD: accepts two dates and computes the days between those dates
  • GVBXLRUD: simply returns the SAFR Fiscal Year data, passed in through the VDP
  • GVBXLST: accepts strings, concatenates them, and returns a single string value
  • GVBXLTM: performs a trim function against passed string parameters

Slide 12

Write exits sit between the extract engine and a potential extract or output file. Each write exit is tightly coupled to its SAFR view, because the exit receives the view output.  Write exits are called whenever a view is to write an extract record.  In addition to the view’s extract record, the write exit is also passed, and can see, the event record.

Write Exits are associated with a view, since the view or a write statement within the view generates the WREX logic table function (rather than the WRIN, WRDT, or WRSU functions) and creates the extract record.

The write exit can tell GVBMR95 to do any one of the following:

  • Write a record the exit specifies (could be any record) and continue with event file processing
  • Skip this extract record and go on
  • Write a record the exit specifies (could be any record) and return to the exit to do more processing

The exit can manipulate the extract record, substitute a new record, table the extract record in some way and then dump the table at the end of event file processing, or do any number of other things.  Note, though, that unlike read exits, which do open and actually read files, write exits typically do not.  They return records to SAFR to write to the extract file.  They could do their own IO, but there is typically no benefit to doing so.  SAFR’s write routines are very efficient.
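
These three dispositions can be pictured as a small return-code protocol.  The Java sketch below is purely illustrative; the names and the skip-zero-amount rule are invented, and the actual codes are defined by the SAFR user exit guidelines.

    // Illustrative only: the three things a write exit can ask the extract
    // engine to do with the current extract record.  Names are invented.
    enum WriteDisposition {
        WRITE_AND_CONTINUE,   // write the record the exit supplies, keep processing
        SKIP_RECORD,          // drop this extract record and move on
        WRITE_AND_RETURN      // write the supplied record, then call the exit again
    }

    class WriteExitSketch {
        // A trivial exit body: skip zero-amount records, otherwise write as-is.
        static WriteDisposition onExtractRecord(byte[] extractRecord, long amount) {
            return (amount == 0) ? WriteDisposition.SKIP_RECORD
                                 : WriteDisposition.WRITE_AND_CONTINUE;
        }
    }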

Some examples of write exits include the following:

  • Table multiple records, summarize them if they have common sort keys, and write a record when the key changes.
  • Read a set of parameters giving scoring requirements, table multiple records and upon a key change, score records.  Create a completely new record with the scoring results, write this new record, and drop all the tabled records

Write exits fall between read and lookup exits in complexity.  This is mainly because of the complexity of dealing with extract records.  The exit must know what the extract record will look like for a particular view.  This might be easiest to determine by actually writing the view, and inspecting the extracted records to find positions and lengths.  Any change in the views can create a need to update the write exit.

Format exits are very much like write exits, except that they are called at the end of GVBMR88.  They are also dependent upon the view specification.  They are called for each format output record.  Like write exits, they are specified on each view.  They are specified only in the VDP; there is no logic table impact for them.

Slide 13

Using exits requires that they first be described in the User Exit Routine screens within the Workbench.  The name can be anything desired. The type can be Read, Look-up, Write, or Format.  The language and path are for documentation only.  The executable must match the load module name stored within an accessible loadlib for either GVBMR95 or GVBMR88.

Slide 14

The Optimizable flag is only applicable for look-up exits.  Remember that SAFR bypasses certain look-ups when the look-up has already been performed.  In these cases, the look-up exit would not be called for the subsequent look-ups.  If the look-up exit is stateless, in other words it does not function differently from one execution to another given the same input parameters, the exit can be optimized.  If the exit retains its state from one call to another, then it must be called each time and cannot be optimized.

For example, one exit was written to detect the first time it was called for a particular event file record.  In this case, it would return a return code of 0; every subsequent call would return a code of 1.  This exit cannot be optimized; each potential call must actually call the exit.  Otherwise the exit would always return a code of 0.
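
The distinction can be sketched in a few lines of Java; both classes below are hypothetical stand-ins for exit logic, not the real exit interface.

    // Stateless: the same inputs always give the same answer, so SAFR could
    // safely reuse a prior result instead of calling the exit again (Optimizable on).
    class StatelessTrim {
        static String lookup(String key) {
            return key.trim();
        }
    }

    // Stateful: the answer depends on how many times it has been called, so every
    // potential call must actually invoke the exit (Optimizable off).  Resetting
    // the flag for each new event record is omitted here for brevity.
    class FirstCallDetector {
        private boolean firstCall = true;

        int lookup(String key) {
            if (firstCall) {
                firstCall = false;
                return 0;   // first call
            }
            return 1;       // every subsequent call
        }
    }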

Slide 15

Once the user exits are entered into the User Exit table, they can be assigned to the other appropriate metadata components.  For Read Exits, the exit must be assigned to specific Physical File entities.  Remember that for each physical file read by views being processed, the Logic Table contains an RENX logic table row for that physical file.  By assigning a read exit to the physical file, the generated RENX entry will contain the exit to be called each time a new record or block of data is needed.

The data returned by the read exit must match the Logical Records that are assigned to this physical file.  Thus when a view accesses a field to perform, perhaps, a selection function, the data must match the logical record layout for the SAFR “physical” file entity, not the file read by the read exit.

Slide 16

Standard look-ups require the data to be joined to be defined by an LR.  For look-up exits, the logical record to be returned by the exit must match a specific LR (not the data read from file by the look-up exit, if there is any).  In the example shown here, the “phase code” value must be returned by the look-up exit in position 3 for a length of 2.

When defining the logical record to be returned by the exit, define any of the input parameters the exit will require as “keys,” as if the exit were going to search a reference file table to find the required answer.

Next define a path that will provide the needed inputs to the exit.  The values in the path can be provided as constants in the views, or in the path, or as values passed from the input file or looked up from another look-up table, requiring a multi-level look-up.

Slide 17

After defining the LR and the look-up path, assign a lookup exit to the Logical Record by selecting the LR Properties tab and then selecting the appropriate exit.  When a field from the target LR is used in a view, this exit will be called to return data in the format of the defined logical record.  Thus when the Logic Table is generated, the LUSM will be changed to LUEX.  This logic table function contains the user module name to be called.

Slide 18

Write Exits and Format Exits are assigned in view properties.  Write exits are assigned in the Extract Phase tab; Format Exits are assigned in the Format Phase tab.

Because both of these exits sit potentially at the end of the process, these exits do not return data to SAFR views; therefore no logical record defines the outputs from these exits.  Rather the view columns and format (file, hardcopy, etc.) define what these exits will receive.  Changes to the view layout will affect the exits.

Often the “Write DT Area” option is used with write exits, to eliminate the complexity of the extract record layout.  Only the column data is passed to the write exit.

Slide 19

Each exit has the potential to receive a fixed set of parameters upon start up.  These parameters are assigned for each instance that an exit is invoked. 

For example, a look-up exit may function differently depending on which LR it is supposed to return; perhaps data can be returned in compressed format in one instance, and not in another.  The LR for the compressed data may pass in a start-up parameter of “CMPSD” and the uncompressed LR would pass in a start-up parameter of “UNCMPSD”.  In this way the same exit program can be used, and which LR should be returned can be indicated as a parameter to the exit. 
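
A hedged Java sketch of that pattern, with all names invented, might parse the start-up parameter once at open time and branch on it for every subsequent call:

    // Hypothetical sketch: the same exit program serving two LRs, choosing its
    // behavior once, at start-up, from the start-up parameter.
    class CompressionAwareLookup {
        private final boolean compressed;

        CompressionAwareLookup(String startupParameter) {
            // "CMPSD" vs "UNCMPSD", as in the narration; parsed once at open time
            this.compressed = "CMPSD".equals(startupParameter);
        }

        byte[] lookup(byte[] record) {
            return compressed ? compress(record) : record;
        }

        private byte[] compress(byte[] record) {
            // placeholder for whatever compression the compressed LR expects
            return record;
        }
    }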

Slide 20

Read exits receive only the start-up parameters.  Write and Format exits also receive start-up parameters, along with the extracted record from the view.  Write exits additionally have visibility to the event file record.

Look-up exits, by contrast, receive start-up parameters; they also have visibility to the event file record.  Additionally, Look-up exits receive all the parameters built in the look-up path.  These values must match the required key for the logical record.

Note the difference between these two types of parameters:

  • The start-up parameters do not change throughout the entire run of SAFR.  They are constant.  They are typically only used by the exit at start up to determine which mode the program should function in.
  • The look-up key values can change based upon every event file record processed.  Customer ID 1 on the first record may become Customer ID 10,000 on the next record. 

Slide 21

We will use this view to show the GVBMR95 Logic Table and MR95 Trace.  Our example will use a look-up exit, which returns various thread parameters that can be of interest for technical reasons in a view.  The look-up exit is assigned on the LR and path we just examined.

Slide 22

This is the logic table for the example view.  Note that lookups, which would normally have the function code of LUSM, have been changed to the function code of LUEX.  Also, for each LUEX, the ID associated with the User Exit is assigned, in this case, ID 13.  This is the ID assigned to the module GVBXLENV. 

Our path required a single character value to be passed to the exit.  This “key” value, a constant of “D” built by the LKC function, will be passed to the exit as part of the lookup.

Note also that the view has no write exit, because the logic table function is WRXT, not WREX.  Also note there is no Exit ID assigned to the WRXT row.

Slide 23

This is the GVBMR95 Trace for the logic table. 

For event file record 1, the LKC function builds a key with a value “D” in it.  This value is passed to the exit GVBXLENV during the LUEX function.  The exit is called.  The view then uses the data, through a DTL function, placing a value 0001 into column 2.  GVBXLENV did not search any data.  It simply queried the SAFR thread number to return a value of 0001. 

Also, note the number of times the exit is called in this trace.  Each lookup actually required calling the exit.

Slide 24

This is the same view but a new trace.  In this trace the “Optimize” flag was turned on.  This means that only one LUEX function is executed for the entire first event file record.  Because the LKC value of “D” did not change between calls to the exit, the exit is not called again.

Slide 25

The difference between the traces for the optimized and non-optimized runs is very clear. The optimized trace on the right saves significant CPU time, including the overhead for linking to the user exit multiple times on each event file record.  Exits require CPU time by their nature, and the efficiency of the language run-time can also have an impact.  Efficiency should be carefully considered when creating any exit.

Slide 26

The trace shows the values placed in the key by the LKDT, LKC and other look-up key functions, the value of “D” in our example view.  These parameters are passed to the look-up exit.  It also shows the static parameters passed.  These are shown at the end of the LUEX function row, after the Look-up Exit module.  Similar parameters can be seen in the trace for Read and Write exits if start-up parameters are passed.

Slide 27

Exits must be written following the SAFR User Exit guidelines.   These specify a standard set of linkage parameters to interact with GVBMR95 and GVBMR88.  They include the following (a purely illustrative sketch of this lifecycle follows the list):

  • A standard set of pointers, used to access various data provided by SAFR, including the event record, the extract record, look-up keys, and a work area for maintaining working storage parameters for the exit.
  • Environment data, including the Phase Code (Open, Read, or Close), informing the exit of the status of processing.  Exits are called during the:
      ◦ Open phase, to prepare for processing,
      ◦ Read phase, as event file records are processed, and
      ◦ Close phase, to print control reports, flush final records, or clean up.
  • Return code values, informing SAFR of the results of processing.  These can include:
      ◦ a Found, Not Found, or Skip Event Record condition for a Look-up Exit,
      ◦ an End of File condition for a Read Exit, or
      ◦ for Write Exits, an instruction to write the standard extract record, write a different record and then return to the exit for more processing, or skip the extract record and continue processing.
  • All exits may signal view-level or process-level errors as well.
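
Expressed as a high-level pseudo-interface, a hedged sketch of that lifecycle follows; the real linkage is a parameter list of pointers, phase codes, and return codes, not a Java interface, and every name here is invented.

    // Invented names throughout: a conceptual view of the lifecycle the SAFR
    // linkage parameters imply, not the actual exit calling convention.
    interface UserExitLifecycle {
        void open(byte[] startupParameters);            // prepare for processing
        int read(byte[] eventRecord, byte[] workArea);  // called as event records are processed; returns a result code
        void close();                                   // flush final records, print control reports, clean up
    }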

Slide 28

As noted earlier this graphic shows the potential for a Sort Utility Input or Read Exit.  SAFR uses standard sort utilities provided with the operating system or otherwise.  These utilities typically allow for a read exit to the sort utility, a program which stands between the sort utility and the file to be sorted.  Procedures for writing these exits depend upon the sort utility used.  Refer to the sort utility documentation for instructions.

As an example of this type of exit, SAFR provides a module GVBSR02.  This module accepts a parameter file, instructing it which view(s) to create multiple permutations for.  Before the utility sorts the records, the Sort Exit will replicate the extracted data, creating new records with permuted sort keys.  For example, a record with a sort key of A, B and C will be duplicated, and records with each of the following key orders would be created:

  • B, C, A
  • C, A, B
  • C, B, A
  • A, C, B
  • B, A, C

This produces many more possible outputs without creating all the different views or exploding the extract files with all the possible combinations.
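
To make the permutation idea concrete, the Java sketch below generates every ordering of a set of sort key values, which is the kind of replication the sort exit performs before the records are sorted.  It is a conceptual sketch only, not the GVBSR02 implementation.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    // Conceptual sketch of permuting sort key values.  A record keyed A, B, C is
    // replicated once per ordering, so output can be reported by any key sequence
    // without defining a separate view for each.
    public class KeyPermuter {
        public static List<List<String>> permutations(List<String> keys) {
            List<List<String>> result = new ArrayList<>();
            permute(new ArrayList<>(keys), 0, result);
            return result;
        }

        private static void permute(List<String> keys, int start, List<List<String>> out) {
            if (start == keys.size() - 1) {
                out.add(new ArrayList<>(keys));
                return;
            }
            for (int i = start; i < keys.size(); i++) {
                Collections.swap(keys, start, i);
                permute(keys, start + 1, out);
                Collections.swap(keys, start, i);
            }
        }

        public static void main(String[] args) {
            // Prints all six orderings of A, B, C, including the original
            permutations(Arrays.asList("A", "B", "C")).forEach(System.out::println);
        }
    }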

Using this exit requires concatenating special sort cards to the GVBMR95 generated sort cards, and creating parameters for the Logic Table programs instructing them to generate permuted VDP views, similar to the process undertaken when the JLT is created for the Reference File Phase. These views are never found in the Metadata Repository; they are temporary views only in that run’s VDP.  They are required in the VDP for GVBMR88 to refer to in processing the permuted records.

Slide 29

In this module, we examined the following logic table functions:

  • REEX, Read a new Event Record with a user exit
  • LUEX, Lookup calling a user exit
  • WREX, Write calling a user exit

Slide 30

This module described SAFR User Exit Routines. Now that you have completed this module, you should be able to:

  • Describe uses for SAFR user-exit routines
  • Read a Logic Table and Trace which use exits
  • Explain the Function Codes used in the example
  • Debug views using exits

Additional information about SAFR is available at the web addresses shown here. This concludes Module 20, Overview of SAFR User-Exit Routines

Training Module 19: Parallel Processing

The slides for the following video are shown below:

Slide 1

Welcome to the training course on IBM Scalable Architecture for Financial Reporting, or SAFR.  This is Module 19, Parallel Processing

Slide 2

Upon completion of this module, you should be able to:

  • Describe SAFR’s Parallel Processing feature
  • Read a Logic Table and Trace with multiple sources
  • Set Trace parameters to filter out unneeded rows
  • Configure shared extract files across parallel threads
  • Set thread governors to conserve system resources
  • Debug a SAFR execution with multiple sources

Slide 3

In a prior module we noted that the Extract Engine, GVBMR95, is a parallel processing engine, while the Format Phase engine, GVBMR88, is not.  It is possible to run multiple Format Phase jobs in parallel, one processing each extract file.  Yet each job is independent of the others.  They share no resources, such as look-up tables for join processing.  In this module we will highlight the benefits of GVBMR95 multi-thread parallelism, and how to enable, control, and debug it.

Slide 4

Parallelism allows for use of more resources to complete a task in a shorter period of time.  For example, if a view requires reading 1 million records to produce the appropriate output and the computer can process 250,000 records a second, it will require 4 seconds at a minimum to produce this view.  If the file is divided into 4 parts, then the output could be produced in 1 second.
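
The arithmetic can be checked with a throwaway calculation, shown here only to restate the example above in code:

    // Back-of-the-envelope check of the example above: 1 million records at
    // 250,000 records per second, processed as 1 partition and as 4 partitions.
    public class ParallelEstimate {
        public static void main(String[] args) {
            long records = 1_000_000;
            long recordsPerSecond = 250_000;
            for (int partitions : new int[] {1, 4}) {
                double seconds = (double) records / ((double) recordsPerSecond * partitions);
                System.out.println(partitions + " partition(s): " + seconds + " seconds");
            }
        }
    }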

Doing so requires adequate system resources, in the form of I/O channels, CPUs, and memory.  If, for example, our computer only has 1 CPU, then each file will effectively be processed serially, no matter how many files we have.  All parallel processing is resource constrained in some way.

Slide 5

GVBMR95 is a multi-task parallel processing engine.  It does not require multiple jobs to create parallelism.  Using the Logic Table and VDP, GVBMR95 creates individual programs from scratch and then asks the operating system to execute each program as a subtask, one for each input Event File to be scanned for the required views.  This is all done within one z/OS job step.  Each of these subtasks independently reads data from the Event File, and writes extract records to extract files for all views reading that event file.

In this example, the combination of views contained in the VDP and Logic Table require reading data from four different event files.  These views write output data to six different extract files.  The main GVBMR95 program will generate four different sub tasks, corresponding to the four different input event files to be read.
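
As a rough analogy only (GVBMR95 executes generated machine code under z/OS subtasks, not Java threads), the model resembles starting one worker per input event file inside a single process, with the mother task waiting for all of them:

    import java.util.List;

    // Rough analogy only: one worker per input event file, all inside one process,
    // sharing in-memory reference data.  GVBMR95 does this with generated machine
    // code and z/OS subtasks rather than Java threads.
    public class ThreadPerEventFile {
        public static void main(String[] args) throws InterruptedException {
            List<String> eventFiles = List.of("ORDER001", "ORDER002", "ORDER003", "ORDER004");

            List<Thread> threads = eventFiles.stream()
                    .map(dd -> new Thread(() -> processEventFile(dd), dd))
                    .toList();

            threads.forEach(Thread::start);      // run all partitions in parallel
            for (Thread t : threads) {
                t.join();                        // the "mother task" waits until all subtasks finish
            }
            System.out.println("All threads complete; close extract files and print the control report");
        }

        private static void processEventFile(String ddName) {
            // read each event record, apply the view logic, write extract records
            System.out.println("Processing " + ddName);
        }
    }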

Slide 6

Multi-threaded parallelism can provide certain benefits over Multi-job parallelism.  These benefits include:

  • Shared Memory.  Only one copy of the reference data to be joined to must be loaded into memory.  Sharing memory across different jobs is much less efficient than within a single job.  Thus all sub tasks can efficiently access the memory resident tables
  • Shared I/O.  Data from multiple input files can be written to a single extract file for that view, allowing for shared output files
  • Shared Control.  Only one job needs to be controlled and monitored, since all work occurs under this single job step
  • Shared Processing.  In certain cases, data may be shared between threads when required, through the use of SAFR Piping or exit processing

Slide 7

Recall from an earlier lesson that:

  • A physical file definition describes a data source like the Order Files
  • A logical file definition describes a collection of one or more physical files
  • Views specify which Logical Files to read

The degree of potential parallelism is determined by the number of Physical Files.  GVBMR95 generates subtasks to read each Physical File included in every Logical File read by any View included in the VDP.

Slide 8

The SAFR metadata component Physical File contains the DD Name to be used to read the input event file.  This DD Name is used throughout the GVBMR95 trace process to identify the thread being processed.

Slide 9

Remember that GVBMR95 can also perform dynamic allocation of input files.  This means that even if a file is NOT listed in the JCL, if the file name is listed in the SAFR Physical File entity, GVBMR95 will instruct the operating system to allocate this named file.  GVBMR95 will first test if the file is listed in the JCL before performing dynamic allocation.  Thus the SAFR Physical File entity’s file name can be overridden in the JCL if required.  Dynamic allocation can allow an unexpected view included in the VDP to run successfully even though no updates were made to the JCL.

Slide 10

The same Physical File can be included in multiple Logical Files.  This allows the SAFR developer to embed meaning in files that may be useful for view processing, like a partition for stores within each state.  Another Logical File can be created which includes all states in a particular region, and another to include all states in the US.

In this module’s example, we have the ORDER001 Logical File reading the Order_001 physical file, the ORDER005 Logical File reading the ORDER-005 physical File, and the ORDERALL Logical File, reading the ORDER001, 002, 003, 004 and 005 physical files.

Slide 11

In this module we have three sample views, each an Extract-Only view with the same column definitions (3 DT columns).  The only difference is the Logical File each is reading.  View 148 reads only the ORDER 001 file, while view 147 reads only the ORDER 005 file, and view 19 reads all Order files, including 1, 2, 3, 4 and 5.

Slide 12

In the first run, each view in our example writes its output to its own extract file.  Output files are selected in the view properties tab.  The SAFR Physical File meta data component lists the output DD Name for each, similar to the Input Event Files. 

Slide 13

As GVBMR90 builds the logic table, it copies each view into the “ES Set” which reads a unique combination of source files.  Doing so creates the template needed by GVBMR95 to generate the individualized program needed to read each source. Note that Output files are typically a single file for an entire view, and thus typically are shared across ES sets.

In our example, three ES Sets are created.  The first includes views 148 and 19, each with their NV, DTEs, and WRDT functions.  The second ES Set is only view 19.  And the third is views 19 and 147.  The Output04 file will be written to by multiple ES Sets, which contain view 19.

Slide 14

This is the generated logic table for our sample views.  It contains only one HD Header function, and one EN End function at the end.  But unlike any of the previous Logic Tables, it has multiple RENX Read Next functions, and ES End of Set functions.  Each RENX is preceded by a File ID row, with a generated File ID for that unique combination of files which are to be read.  Between the RENX and the ES is the logic for each view; only the Goto True and False rows differ for the same logic table functions between each ES set.

In our example, the logic table begins with the HD function.  ES Set 1, reading file ID “22” (a generated file ID for the ORDER001 file), contains the view logic for views 148 and 19.  The second ES Set, for file ID “23” (Order files 2, 3, and 4), contains only view 19, and the third set, for file ID “24” (the ORDER005 file), contains the view logic for views 147 and 19 and ends with the EN function.

Next we’ll look at the initial messages for the Logic Table Trace.

Slide 15

When Trace is activated, GVBMR95 prints certain messages during initialization, before parallel processing begins.  Let’s examine these messages.

  • MR95 validates any input parameters listed in the JCL for proper keywords and values
  • MR95 loads the VDP file from disk to memory
  • MR95 next loads the Logic Table from disk to memory
  • MR95 clones ES Sets to create threads, discussed more fully on the next slide
  • MR95 loads the REH Reference Data Header file into memory, and from this table then loads each RED Reference Data file into memory.  During this process it checks for proper sort order of each key in each reference table
  • Using each function code in the logic table, MR95 creates a customized machine code program in memory.  The data in this section can assist SAFR support in locating in memory the code generated for specific LT Functions
  • Next MR95 opens each of the output files to be used.  Opening of input Event files is done within each thread, but because threads can share output files, they are opened before beginning parallel processing
  • MR95 updates various addresses in the generated machine code in memory
  • Having loaded all necessary input and reference data, generated the machine code for each thread, and opened output files, MR95 then begins parallel processing.  It does this by instructing the operating system to execute each generated program.  The main MR95 program (sometimes called the “Mother Task”) then goes to sleep until all the subtasks are completed.  At this point, if no errors have occurred in any thread, the mother task wakes up, closes all extract files, and prints the control report.

Slide 16

The GVBMR90 logic table only includes ES Sets, not individual threads.  When GVBMR95 begins, it detects if multiple files must be read by a single ES Set.  If so, it clones the ES Set logic to create multiple threads from the same section of the logic table.

In our example views, the single ES Set for the Order 2, 3 and 4 files is cloned during MR95 initialization to create individual threads for each input file.

Slide 17

During parallel processing GVBMR95 prints certain messages to the JES Message Log:

  • The first message shows the number of threads started, and the time parallel processing began
  • As each thread finishes processing, it prints a message showing:
      ◦ the time it completed,
      ◦ the thread number,
      ◦ the DD Name of the input file being read, and
      ◦ the record count of the input file records read for that thread

Each thread is independent of (asynchronous with) every other thread, and of the sleeping Mother Task, during thread processing.  The order and length of work each performs is under the control of the operating system.  A thread may process one or many input event records in bursts, and then be swapped off the CPU to await some system resource or higher priority work.  Thus the timing of starting and stopping for each thread cannot be predicted.

Slide 18

The Trace output is a shared output for all processing threads.  Because threads process independently, under the control of the operating system, the order of the records written to the trace is unpredictable.  There may be long bursts of trace records from one thread, followed by a long burst of processing within another thread.  This would be the case if only two threads were being processed (for two input event files) on a one-CPU machine (there would be no parallel processing in this case either).  How long each thread remains on the CPU is determined by the operating system.

Alternatively, records from two or more threads may be interleaved in the trace output, as they are processed by different CPUs.  Thus the EVENT DDNAME column becomes critical to determining which thread a specific trace record is for.  The DD Name value is typically the input Event file DD name.  The Event Record count (the next column) always increases for a particular thread.  Thus record 3 is always processed after record 2 for one specific DD Name.

In this example,

  • The ORDER002 thread begins processing, and processes input event records 1 – 13 for view 19 before any other thread processes. 
  • It stops processing for a moment, and ORDER001 begins processing, but completes only 1 input event record for views 148 and 19 before it stops processing.
  • ORDER002 picks back up and processes record 14 for its one view. 
  • ORDER005 finally begins processing, completing input record 1 for both views 147 and 19. 
  • ORDER001 begins processing again, record 2 for both views 148 and 19.

Note that this portion of the trace does not show any processing for thread ORDER003 and ORDER004.

Slide 19

In a prior module we showed examples of Trace parameters which control what trace outputs are written to the trace file.  One of the most useful is the TRACEINPUT parameter, which allows tracing a specific thread or threads.  It uses the thread DD Name to restrict output to that thread.

Slide 20

The heading section of the GVBMR95 control report provides information about the GVBMR95 environment and execution.  It includes:

  • SAFR executable version information
  • The system date and time of the execution of the SAFR process.  (This is not the “Run Date” parameter, but the actual system date and time)
  • The GVBMR95 parameters provided for this execution, including overall parameters, environment variables, and trace parameters
  • zIIP processing status information

Slide 21

The remainder of the control report shows additional information when running in parallel mode:

  • The number of threads executed, 5 in this example, is shown
  • The number of views run is not simply a count of the views in the VDP, but the replicated ES Set views.  In this example, because view 19 read 5 sources, it counts as 5, plus view 147 and view 148 equals 7
  • The greater the number of views and threads that are run, the larger the size, in bytes, of the generated machine code programs.
  • The input record count for each thread is shown, along with its DD name.  Because GVBMR95 ran in full parallel mode, the record counts for the threads are the same as the record counts for each input file.  Later we’ll explain how these numbers can be different.
  • The results of each view against each input DD Name (thread in this case) are shown, including lookups found and not found, records extracted, the DD name where those records were written, and the number of records read from that DD Name.  Certain additional features, explained in a later module, can cause the record counts read for a view to be different than the record counts for the input file.

Slide 22

Because SAFR allows sharing output files across threads, it is possible to have many views writing to a single output file.  The outputs may all have the same record format, or differing formats written as variable-length records.  This can allow complex view logic to be broken up into multiple views, each containing a portion of the logic for a particular type of record output.

Slide 23

This control report shows the results of changing our views to write their outputs to one file.  The total record count of all records written remains the same.  They are now consolidated into one single file, rather than 3 files.

Similar to the Trace output, the order of the records written to the extract file by two or more threads is unpredictable. 

As we’ll discuss in a later module, there is a performance impact for multiple threads writing to a single output file.  Threads ready to write a record may have to wait, and the coordination of which thread should wait consumes resources.  Therefore it can be more efficient to limit the amount of sharing of output files when possible, and combine files (through concatenation, etc.) after SAFR processing.

Slide 24

In this execution, we have included additional views, reading the same input event files, but which produce Format Phase views.  The rows shown in red are added to the prior control report as additional outputs from the single scan of the input event files.  This also demonstrates how the standard extract files can be shared across multiple threads.

Slide 25

SAFR provides the ability to control the overall number of threads executed in parallel without changing the views.  The GVBMR95 Disk and Tape Threads parameters specify how many threads GVBMR95 should execute in parallel.  Disk threads control the number of threads reading input Event Files on disk, as specified in the SAFR Physical File meta data; tape threads control those input Event Files which are on tape.

The default value for these parameters is 999 meaning effectively all threads should be executed in parallel.  If one of the parameters is set to a number less than the number of input Event Files of a specific type, SAFR will generate only that many threads.  GVBMR95 will then process multiple event files serially within one of those threads.  As one event file completes processing, that thread will then examine if more event files remain to be processed and if so, it will process another Event file under the same thread.
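
Conceptually the governor behaves like a fixed-size worker pool: set it below the number of event files and the leftover files wait, then run serially as workers free up.  The Java sketch below is an analogy with invented names, not the GVBMR95 mechanism or its parameter syntax.

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Analogy only: a governor of 2 means at most two event files are processed
    // at once; the remaining files are picked up serially as a worker becomes free.
    public class ThreadGovernorSketch {
        public static void main(String[] args) throws InterruptedException {
            int diskThreadGovernor = 2;   // invented stand-in for the Disk Threads parameter
            List<String> eventFiles = List.of("ORDER001", "ORDER002", "ORDER003", "ORDER004", "ORDER005");

            ExecutorService pool = Executors.newFixedThreadPool(diskThreadGovernor);
            eventFiles.forEach(dd -> pool.submit(() -> System.out.println("Processing " + dd)));

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }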

Slide 26

The GVBMR95 Control Reports show the results of the thread governor.  The top report shows the disk governor parameters and the number of records read by each thread.  Because each thread only processed one event file, and all were run in parallel, the records read from each event file and for its thread are the same.

In the bottom example, the thread governor has been set to 1, meaning only one thread will be run.  In this instance the control report shows that the total records read for the thread equal the total records read for all event files, because this one thread processed each event file serially until all event files were complete.

Slide 27

The above highlights the key z/OS statistics often tracked from GVBMR95 runs.  These include CPU time, elapsed time, memory usage, and IO counts.

The top set of statistics are from the run with parallelism, the bottom from a run using the Thread Governor with no parallelism.  The impact of parallelism can be seen in the difference between the elapsed time for a job with parallelism and one without. Parallelism cut the elapsed time in half. 

The Extract Program, GVBMR95, is subject to all standard z/OS control features, including job classes, workload classes, priority, etc.  In other words, even with parallel processing, the z/OS job class or priority may be set to such a low level, and the other workloads in higher classes on the machine may be so significant, that effectively no parallel processing occurs even though GVBMR95 instructs z/OS to execute multiple threads.  Each of these threads receives the same job class, priority, etc. as the entire Extract Phase job.  In this way the operator and system programmer remain in control of all SAFR jobs.

Slide 28

This module described SAFR Parallel Processing. Now that you have completed this module, you should be able to:

  • Describe SAFR’s Parallel Processing feature
  • Read a Logic Table and Trace with multiple sources
  • Set Trace parameters to filter out unneeded rows
  • Configure shared extract files across parallel threads
  • Set thread governors to conserve system resources
  • Debug a SAFR execution with multiple sources

Slide 29

Additional information about SAFR is available at the web addresses shown here. This concludes Module 19, Parallel Processing

Slide 30

This page does not have audio.