In the summer of 2020, Kip Twitchell presented SAFR / GenevaERS to the Linux Foundation’s Open Mainframe Project’s Technical Advisory Committee (TAC).
The following presentation was built after a major successful project using Geneva/SAFR at a large, international bank, moving from tooling to a fully formed business configuration system.
The following citation, in summary at the top, with more detail below, are for a large financial service firm. More about this time can be read in Balancing Act: A Practical Approach to Business Event Based Insights, Chapter 21. ERP Reporting and Chapter 35. Model the Repository.
Insurance Company Manages Very Large Data Warehouse
Three years ago, a Fortune 100 Insurance Company embarked upon a “ Data Warehouse Program” to “build a single consistent common source of financially balanced data to be used by multiple business users for decision support, data mining, and internal/external reporting.” A major component of this information architecture is the construction of multiple Operational Data Stores or ODS’s. These ODS’s may contain up to 10 years of event level or detailed transaction history as the basis for populating other data warehousing applications. The company uses Geneva ERS to load and extract data from these data stores.
Data is loaded into these repositories nightly. In some cases Geneva ERS is used to extract data from operational source systems. “Source systems” vary from the PeopleSoft Journal Lines DB/2 table to legacy flat files. The DB/2 extract process reads 80 physical DB2 partitions in parallel, scanning over 80 million records in approximately 10 minutes.
This extracted data is edited using a combination of custom Geneva ERS processes and stand-alone custom programs. After the data is edited it is loaded into the repositories in parallel by Geneva ERS. The number of records loaded varies by repository, from 100,000 financial transactions (journal lines) to more than 1,000,000 policy transactions. In total, approximately 2 million transactions a day are added to the repositories.
In the same pass of the data that loads the repositories, Geneva ERS also produces multiple reports and extract files for use by business users and other application systems. One of the repositories is “backed up” by Geneva ERS at the same time. This daily execution of Geneva ERS reads over 450 million records in 23 different entities and 220 physical partitions. It writes over 900 million physical records to over 800 files. This process has run in 1 hour 3 minutes of wall clock time and 3 hours of CPU time.
Other executions of Geneva ERS extract data on a daily, weekly, monthly and annual basis. The output from one execution creates an executive information drill down file accessed via the company Intranet. This web site is accessible by over 300 business users.
The Geneva “Views” executed in all of the above processes are maintained within the Geneva ViewBuilder software. This includes record structures, field types, relationships between structures, and the actual queries themselves. Many queries are very similar to programs in their sophistication and contain complex business logic. Some have over 900 column definitions. These views also utilize custom code for accessing certain files and executing business logic requiring a programming language. Over 100 people have been trained on programming using Geneva ERS, and the company has had up to 10 testing environments at one time.
The most sophisticated use of Geneva ERS emulates a PeopleSoft financial cost allocation process. Custom programs were developed which generate over 6,000 Geneva ERS views based upon over 7,000 PeopleSoft rules. Geneva ERS executes these views to scan the financial repository selecting records eligible for allocation. It then allocates these costs through four allocation layers, such as products, and geographical units. At 1999 year-end, this process read over 50 million records selecting nearly 3 million that were eligible for allocation. These 3 million records were exploded into over 290 million virtual allocation records, of which 186 million summarized records were written to physical files. The process runs in 7½ hours wall clock time and 28½ hours of CPU time.
Financial Services Company Simplifies Complex Processes
Many financial service organizations were early adopters of computer technology. They quickly constructed systems to automate numerous processes. Programs were often added to the fringe of existing processes to solve new problems or provide an additional report or file for analysis. The need to keep multiple types of books and fulfill regulatory reporting requirements added to the complexity of some of these systems. These systems grew before modularization, subsystem definition, or other computing concepts were developed. Furthermore, the number of transactions generated by these organizations always challenged computing capacity. This required creative and complex solutions. Over time these systems metamorphosed into closely intertwined, interconnected, and inflexible systems.
In 1996, a Fortune 100 Insurance Company determined their reporting system was so inflexible that it would become unsupportable in the near future. They decided “…to build a single, consistent source of financially balanced data to be used by multiple business users for decision support, data mining, and internal/external reporting.” After analyzing their information needs they determined that the best approach to satisfy the broadest number of users was to construct a comprehensive data warehouse environment, including extract transformation processes, Operational Data Stores (ODSs), reporting environments, and data marts. The company viewed the ODSs as the heart of the data warehousing effort. ODSs are designed to contain up to 10 years of detailed transactional history. By storing the transactional detail, the company can satisfy requests for individual transactions, summaries using any field from the transaction detail, or snap shots as of any point in time. This robust environment truly enables the “single, consistent source of financially balanced data.”
Although such a data store simplifies and satisfies all types of information needs, it also means tremendous data volumes. Managing such data volumes requires a creative, industrial strength solution. It also requires people who know how to make it happen. This company went looking for a tool and the people that were up to the challenge. They chose PricewaterhouseCoopers and Geneva ERS.
Geneva offers a robust data warehousing solution able to process tremendous amounts of data quickly. It also provides for innovative, flexible solutions that are easily supported by company employees. This company uses Geneva to do the following:
- Source system Extract and Transform
- ODS Load, Query, Interface production
- Detailed Financial Allocation Process
- Executive Information Delivery
Extract and Transform
Extracting from legacy systems can often present a challenge to the most robust tools. Especially when source systems include complex data structures and high data volumes. Legacy systems often were designed to support monthly reporting cycles by accumulating and summarizing data before finally kicking out the hard copy report at the end of the month. Changing such systems to provide more recent information can make the difference between a successful information environment and simply a different format for the same old content. However, making sure the results are right can be a very expensive process.
PwC consultants used a new approach to attack this problem. For this client, they first created a program which understood these complex and unique legacy structures, which could be called by the Geneva ERS open API. This program opened up difficult legacy systems to the power of Geneva. Geneva was used to extract data from multiple sources. The Geneva development environment allows definition of source system data structures. Once defined to Geneva, business logic can be applied. The business logic is stored as a Geneva “View.” The Views are organized by logical frequencies like daily or weekly processes or on request processes. Once the environment was created, they focused on understanding the data structures instead of tracing through the entire legacy process. They used an iterative prototyping approach to discover the source of all the data contained in legacy downstream files. They successfully proved that the system could be converted from a monthly to a daily system. They extracted the source system data and populated the ODSs. The iterative prototyping approach used by PwC consultants shaved months off the delivery cycle and hundreds of man-hours spent in legacy system research.
The power of Geneva is evident in the types of source systems from which data is extracted. In addition to the complex legacy structure noted above, a DB/2 extract process reads 80 physical DB/2 partitions in parallel, scanning over 80 million records in approximately 10 minutes. All processing is completed in a fraction of the time it would take another data warehousing tool.
Load, Query and Interface Production
PwC set out to help create an environment that would allow many users, with diverse information needs to pull the desired information from the repositories. They created data and processing models that allowed all queries to be resolved in a single pass of the transaction detail. These models minimizes data storage requirements by eliminating duplicate data from transactions, but combining the data back together for report production in an efficient manner.
In the same pass of the data that loads the repositories, Geneva ERS also produces multiple reports and extract files for use by business users and other application systems. Many output formats are possible including reports, spreadsheets and files. And because the ODSs contain transaction level detail, users are able to choose what level of detail they wish to see. The data model also allows for use of Geneva’s date effective join capabilities. This enables users to create reports as of any point in time. Summaries can be created using organizational structures from any point in time.
The client choose to construct an ODS to support the ERP General Ledger being installed. However, the coding structure for the new ERP package differed significantly from the historical organization and account coding structure. The ODS supported all interface production, translating from old to new and from new to old. Global utilities were constructed that were called from the Geneva API. Because of Geneva’s ability to process the detailed transactions, all fields could be translated at the lowest level of detail. This enabled consistent answer set production for all interfaces.
The number of records loaded varies by repository, from 100,000 financial transactions (journal lines) to more than 1,000,000 policy transactions. In total, approximately 2 million transactions a day are added to the repositories. The single pass architecture even produces back ups using Geneva ERS in the same pass of the ODS. This daily execution of Geneva ERS reads over 450 million records in 23 different entities and 220 physical partitions. It writes over 900 million physical records to over 800 files. This process has run in 1 hour 3 minutes of wall clock time and 3 hours of CPU time.
Detailed Financial Allocation Process
The most sophisticated use of Geneva emulates an ERP financial cost allocation process. The insurance company recognized that with their volume of data, the ERP package could not handle the allocation process. They would have to change their business requirements or find another solution. They looked to PwC and Geneva to supply that solution. Client members and PwC consultants analyzed the allocation process and created custom programs which generate over 6,000 Geneva views based upon over 7,000 allocation rules. Geneva executes these views to scan the financial ODS selecting records eligible for allocation. It then allocates these costs through four allocation layers, such as products, and geographical units.
During the first year of implementation, the year-end allocation process read over 50 million records selecting nearly 3 million that were eligible for allocation. These 3 million records were exploded into 186 million allocation results. The process runs in 7½ hours wall clock time. The Geneva system produces these results 250 times faster than the ERP package. Because of this innovative Geneva solution, the business users were able to have their data represented exactly as they wanted.
Geneva ERS Executive Information System
Providing users with on-line access into repositories holding hundreds of millions of detailed records is a significant challenge. The PwC team developed an innovative approach to give users the access, but not compromise the performance of the ODS processes or require massive new processing capacity. The result was the Geneva ERS Executive Information System.
This system uses Geneva to produce summary “views” of the detailed transactions. These summaries were developed within Geneva, and could summarize by any field in the ODS. Approximately 20 different cuts of the data were developed. During the load and query processes, these queries are executed to refresh the summaries from the detail transactions. Because the summaries are always regenerated from the detail, no sophisticated update processes had to be developed and they also always contain the same consistent answer.
Users access the companies Intranet site, and select which summary to view. The Geneva ERS Java applet allows users to drill down within the report to lower and lower levels of detail. Because of the unique structure of the Geneva output file, data access if very efficient. This web site is accessible by over 300 business users
Most Geneva Views are created and maintained within the Geneva ViewBuilder software. This interface stores table structures, field types, relationships between structures, and the business logic to process the data. Geneva trainers trained over 100 company employees on site on the use of the ViewBuilder, for everything from basic view construction to use of the Geneva API, to key data warehousing concepts.
With PwC assistance the company implemented the first two ODSs. They have now moved on to developing additional warehouses on their own and multiple data marts.
The result has been the ability to replace certain legacy systems with much more flexible architecture. The company has made major strides in meeting it objectives of “…a single, consistent source of financially balanced data to be used by multiple business users for decision support, data mining, and internal/external reporting.” They have been able to create business intelligence in an intelligent way.
After this Fortune 100 firm implemented multiple SAP modules, they tackled the problem of analyzing all the integrated data captured by the new system. But running the SAP processes 24 hours a day, 7 days a week left very little time for anything but transaction processing. They needed their data warehouses updated three times each day to reflect the close of business in each of its worldwide operations. An innovative solution would be required, and it took a few attempts to find the right one.
One approach might have been to simply analyze the data in the production database. But the potential disruption of transaction processing ruled this out. There was simply no room for analysis in the transaction-processing environment.
Another approach involved replicating tables from the Oracle SAP transaction database to a separate SAP instance. This was supplemented with third party software to capture changes to specific data. After extracting the data, custom programs were used on other processors to complete the data transformation. The data was then loaded into a variety of EIS and data warehouse applications. While this approach freed up the production server, data volumes were so high that the extract and transform processes were taking 48 hours to complete for each of the three daily processes. They were getting farther behind the harder they worked!
The company turned to Geneva ERS for a solution. To shorten the data extract process, database triggers were used to capture just updates, deletes, and inserts to the database rather than replicating the entire database. To reduce the transform process time, the transformation logic was translated into GenevaERS “Views” supplemented with custom programs. This approach cut the elapsed time from 48 hours to 45 minutes without impacting the production system.
The company also found some unanticipated benefits as well. The Genera ERS solution provides a detailed repository of historical data for analysis. This data can be stored in the Geneva environment and archived from the SAP tables, thus improving the transaction system performance. New types of detailed analysis were possible using Geneva and the process has proven itself scaleable as the company continues to grow.
Since the first implementation, the company has employed GenevaERS in several different systems, and has used it to replace many data analysis processes originally coded in SAS. After reconfiguring one SAS application with Geneva, the end users were able to slash eight hours off of the delivery time for their reports.
A typical execution of GenevaERS at this client reads 100 million records, performs 1.2 billion foreign key joins across 2.2 gigabytes of table space. It writes out 100 million records in 70 user extract files. This process takes just under an hour of elapsed time and 73 minutes of CPU time.
Summary of Sizing Concepts and Architectural Alternatives
Richard K. Roth
September 15, 1996
Reasons for performance problems being encountered with operational reporting and data warehouse functions in medium- and high-volume SAP implementations are described. A generalized approach for estimating reporting workload early in an SAP project is provided along with throughput metrics that can be used to assess whether performance problems are unlikely, probable, or certain. The integrated ABAP/Geneva solution that was implemented to resolve the performance problems for Keebler is described, including comparative processing metrics for the alternative architectures.
This paper should be used in conjunction with the Data Warehouse Size Estimation (DWSE) model and instructions available for download from the IT Knowledge Net under the topic “Estimating Report Process Requirements.”
Experience inside and outside PW is demonstrating that the LIS, FIS, HIS, and EIS facilities of SAP are not adequate to support the full range of operational reporting or open information warehouse requirements for medium-to-large SAP implementations. Operational reporting and open information warehouse requirements in practice mean the ability to “slice and dice” the granular operational data captured by SAP applications for a variety of business purposes. Since our client base for SAP implementations consists of larger organizations, it is important for Price Waterhouse to identify up front when problems are likely to be encountered and be ready with solid recommendations on how requirements in this area should be accomplished.
“Data warehousing” frequently is assumed to be the solution for the functional void in reporting that needs to be filled. A problem with this terminology is that “data warehousing” has come to mean different things to different people. And in some quarters, lumping the problem of slicing and dicing granular SAP data under “data warehousing” has resulted in an inadvertent underscoping of the general reporting problem that needs to be solved.
Common concepts or topics that frame notions of data warehousing include: varying levels of data summarization, limitations on data content to the needs of particular business purposes (subject areas), time variant snapshots, focus on interactive usage and GUI presentation, performance measurement, limited scope pilot applications as a way to get started, and data warehousing software and hardware most widely discussed in the technology press. All of these things have their place under the right circumstances. But, approaching the problems being faced by big businesses with new SAP applications in these terms can be misleading with respect to the fundamental reporting problem that needs to be addressed.
Reporting Problem in General
Implementing a new SAP application means that the corresponding legacy system reporting framework being used to run the business is being replaced (presumably improved). Depending on the application, the old framework typically consists of hundreds to low single thousands of reports (on-line and hard copy). These reports were developed in response to business problems and ways of managing the enterprise that evolved over time. Certainly, a large portion of the legacy reports are obsolete and a new concept and framework for reporting will consist of many fewer more targeted reports. But, what is important here is that the mission critical reporting framework must be replaced as part of the SAP implementation. This is a general problem that should not be approached inadvertently with a set of limiting assumptions about what data warehousing is or is not.
Reporting problems come in varying sizes depending on the data basis and size of the reports in the mix. The data basis for a report means the number and size of data records that must be manipulated in order to produce the report. Report size means the number and length of the lines that constitute the final report output. By far the largest single factor in determining overall reporting load is data basis.
The most important aspect of data basis usually relates to the number and size of records that must be extracted then further manipulated to produce the report output. However, under certain circumstances, the number and size of records that must be scanned and evaluated as candidates for extract can be the primary determination of data basis. In cases where the extract data basis is small, but large sequential table scans must be executed in order to identify the extract records, input table size becomes the relevant data basis for projecting reporting load.
In general, the size of an overall reporting load can be determined by aggregating the data basis and report sizes across the number of reports and frequency with which they will be executed in a given period of time. By stratifying reporting requirements in categories of on-demand, daily, weekly, monthly, quarterly, and annual, an accurate picture of report processing load over time can be projected. It generally is a good practice to project detail reporting load separately from the summary reporting load.
Until an attempt is made to project the reporting load in this way, it almost certainly will be underscoped. Fortunately, most reports (largest number of reports) are low data-basis operational reports that typically are small in terms of data basis and report size. As a class, these reporting requirements do not contribute greatly to overall reporting load and do not need to be evaluated in detail. However, high data-basis reports tend to be comparatively few in number but account for the bulk of the overall reporting load. Underscoping risk primarily is in the area of high data-basis operational reporting requirements.
Low data-basis operational reports tend to deal with business problems involving open orders or transactions for individual customers, or other small subsets of data constrained to a limited period of time. Very often the indices that support SAP transaction processing also support the access requirements of operational reporting. Consequently, limited table scans are required to access the few records in the data basis for these kinds of reports. In general, the majority of low data-basis operational reporting requirements can be satisfied with a data basis under a hundred records and most with under a thousand, making them good candidates for on-demand processing. Even a large number of people executing these kinds of reports frequently will not create an unmanageable reporting load.
High data-basis operational reports tend to include requirements for retrospective analyses of transaction processing over time. High data-basis operational reporting requirements come from all major aspects of a business whether they be marketing, sales, product development, financial, production, human resources, procurement or legal to name some major constituencies. And all of these constituencies are interested in the basic transactions captured and processed by the SAP modules for many of their analytical purposes.
Keebler High Data-Basis Reporting Problem
Although the number of high data-basis operational reports tends to be dramatically smaller than the number of low data-basis reports, the data basis and size for these reports tends to be dramatically larger. The experience at Keebler clearly bears this out: in the original analysis of high data-basis operational reporting requirements, only 45 sales analysis reports were identified, but the data basis for these reports turned out to be approximately 23 gigabytes per week on average (peak loads at end of period are much higher than the average indicates). Examples of these reports are included below:
- FS Sales Force Ranking
- FS Item Customer Pch: SIUs
- FS Item Customer Pch: Cases
- Food Service Ranking – ITEM
- Food Service Ranking – Customer
- Food Service Comparison CASES
- Food Service Comparison $$$
- Food Service Budget – PTD to HYTD
- Item Sales to Budget (Div.)
- Diversified Retail Sales – Final
- SBU Item Sales for Period: (by Sales Org)
- SBU Item Sales for Period: (SBU-Brand-Item) Corp Type
- Item Sales to Budget (Store Door)
- Budget Buster RANKED
- Final Sales BT
- Budget Buster
- Military Account Item
- Account Item – Trend & YTD
- Customer Purchase (Store Detail)
- Gain & Loss, w/ Allow.
- FS Cat. Grouping
As should be evident from the titles, these do not represent “nice to have” reporting requirements, but are blocking and tackling period-to-period comparisons of key volume statistics needed to run the company. The 23 gigabytes also does not include Keebler-specific requirements for data reorganization due to material, sales and other account hierarchy changes, legacy systems data integration, replacement of reporting currently outsourced to Synectics and substantial ad hoc requirements that are anticipated. Unless a detailed analysis of the high data-basis operational reporting requirements is done, it is easy to dismiss these kinds of basic reporting requirements as something that will be handled in background processes when time is available.
In order to understand whether a given reporting load could be handled in background when time is available, it is necessary to have some basic metrics about how fast this kind of processing can be done in a given environment. Assuming an HP T-500 8-Way world with EMC disk (the Keebler configuration), extract processing can be done at a rate of about 6 megabytes/minute (if table joins are required, calculations need to be adjusted to reflect processing at a rate of about 3,000 joins/minute). These numbers are consistent with information from SAP on performance in reporting and general Price Waterhouse experience with UNIX SMP machines running third-party relational data bases. In the Keebler case, this would mean approximately 65 hours per week of continuous processing just to complete the initial extract, assuming no table joins were required. Taking into account table joins would mean the basic extract processing would take about 260 hours of continuous single-threaded processing. Further, according to SAP, a general rule of thumb is that reports with a data basis of several hundred thousand records will require several hours to process (this is consistent with our Keebler observations). Considering that the 23 weekly Keebler gigabytes represent about 210 million extract records, something on the order of 2,000 processing hours per week would be a reasonable estimate (albeit a practical impossibility to implement) of the end-to-end processing load, given the Keebler environment.
Another consideration is that, according to SAP, report sizes in excess of 50 megabytes cannot be produced at all because of application server memory limitations (no Keebler attempt at this kind of reporting got far enough to verify this limitation). This would mean that reports would have to be split up (stacked), which in turn would increase the extract scanning load substantially.
* * *
To recap the Keebler experience, it was assumed at the outset that most reporting could be handled as a data warehouse problem through the SAP SIS and EIS facilities. Early experience with using these facilities highlighted the fundamental throughput problems and other alternatives were explored, including popular data warehouse tools. However, it was when the detailed analysis of the basic high data-basis operational reporting load was completed that the scope of the problem became apparent. No rationalization about data warehousing being limited to performance measurement or summary level numbers could eliminate the need to produce the reports or reduce the data basis required to generate them. Understanding the scope of the problem in terms of data basis was the key to focusing attention on architectural solutions that would work at scale.
Projecting Data Basis
Data basis is a function of:
- Transaction volume;
- Data structure complexity; and
- Number of reports, level of detail, and content requirements.
Transaction volume. Transaction detail represents the richest source of information about financial and other events captured by a system. As part of transaction processing, information entered about individual events is augmented with data from master tables according to business rules that constitute the accounting model in place at that point in time. Depending on SAP module and configuration options implemented, detail transaction files (e.g., SD COPA CE1 table) are posted in addition to various summary balances in the SAP data model.
In principle, a cumulative inception-to-date transaction file plus the related effective-dated reference tables could be the source data for all types of operational reporting. Projecting data basis for this reporting architecture would be relatively easy: total gigabytes of transaction files that would have to be scanned for each report, taking into account the number of reference table joins that would have to be performed as part of processing, extended by the frequency for reports over a period of time, would be the data basis for this kind of reporting load.
Taking the Keebler SD case as an example, 120,000 1,608-byte CE1 transaction records per day comes out to about 4.5 gigabytes per month or 111 gigabytes for the two years of history. The 45 basic reports actually represent an average of about 5 alternate sequences per report, so taking into account the repetitive aspects of daily, weekly, and monthly cycles, there are about 700 elemental reports that have to be produced to accomplish the basic high data-basis monthly reporting requirement. Producing one set of all reports directly off the detail would mean a data basis of something like 2.8 terabytes, clearly an impractical load to process at a rate of several hours per gigabyte, or even at a rate of several gigabytes per hour.
For audit trail or archive reporting purposes, there is no way around facing up to the volumes associated with cumulative transaction files, unless these requirements simply are ignored. However, for most summary reporting purposes, effectively designed summary files can dramatically reduce the data basis required to accomplish these requirements. The degree to which data basis reductions can be realized though use of summary file schemes depends primarily on data structure complexity.
Data Structure Complexity. The simple physics of routinely reprocessing raw transaction data as the method for producing reports was the driver behind development of the manual accounting cycle. By determining the important summary totals required in advance, then classifying all transactions accordingly as they were journalized, detail could be archived period-by-period leaving a practical size data basis for report processing over time. In the manual world, this required maintaining a separate set of account structures and subsidiary ledgers for each summarization structure. General ledger accounting, cost accounting, revenue accounting, tax accounting, fixed asset accounting, inventory accounting, regulatory accounting, receivable accounting, payable accounting, and so on, all had needs for different sets of summary totals derived from various subsets and supersets of the same basic transactions. As automation of these reporting processes advanced through the ‘60s and ‘70s, the basic subsidiary ledger structure of the manual world was preserved in the subsystem architecture of computer-based applications.
As computers became faster at re-sequencing the same basic transaction set a variety of different ways, more granular account structures started to emerge in subsystem implementations. Product, organizational, and project dimensions started to be included in general ledger account structures by the early ‘80s simply because it became practical to extract, sort, and summarize a few hundred thousand to a million or so fully qualified summary accounts several different ways, something that was strictly impractical in a manual world. As a result, separate subsystems were no longer required for each business function that required a different roll-up of transaction detail. But, the determination of what could be consolidated and what had to be broken out in a separate subsystem continued to be arbitrated primarily based on the volume of fully qualified accounts implied by the dimensions included in the account classification structure.
SAP presents a highly integrated profile for transaction processing, which leaves the impression that highly integrated cross-dimensional views of the data captured also should be available. However, there are virtually an unlimited number of ways transaction-level data from SAP could be summarized taking into account all the possible cross-structural and time-delimited combinations. Fortunately, for a given implementation, patterns in the use of cross-structural combinations usually emerge that permit reporting data basis to be minimized by taking advantage of “locality” in the transactions with respect to account classification structures (i.e., maintaining summary files). Locality means that a large number of transactions create a much smaller set of combinations when summarized by a set of cross-structural dimensions. How substantial the locality effect is depends on transaction coding patterns, the cross-structural dimensions that are required given the patterns of report usage, and the number, breadth, and depth of the structures — data structure complexity.
The benefits of locality are usually substantial. But summary files are often assumed to be the “big” answer that makes the high data-basis operational reporting problem go away. The truth is that summary files quickly get larger than the detail files if a new summary file is created for each new use of the data that requires a different slice, which means that the summary files become a significant volume and maintenance problem unto themselves. In practice, summary files should be maintained to the point where the data basis reduction related to report production is greater than the data basis created by the need for summary file maintenance functions. In addition, summary files should be employed as sparingly as possible due to the substantial system administration requirements that attend their maintenance and use. In the Keebler case, only three (3) different cross-dimensional summary files maintained in multiple partitions by time-slice were required to satisfy these conditions, which is typical for data models of this type.
Modeling the degree of locality can be done fairly easily by using data from existing systems to calculate the collapse rate across structures. Three-to-six months of the two-to-four highest volume transaction types related to the structures in question usually is enough. Simply sorting the transaction files and calculating the average number of unique cross-structural key combinations by week, month and quarter will give a reasonably good picture of what kinds of collapse rates can be expected. The result usually is far less dramatic than is assumed in the absence of a detailed analysis.
Frequently, special reporting requirements will emerge that have not been (or cannot be) anticipated and are not supported by summary files. If the requirements relate to summary reports that will be produced with some frequency, the summary files can be modified and regenerated or new summary files can be created by reprocessing the detail transaction data. If the requirements relate to special requests with no pattern of usage, or if they are audit trail or archive requirements, or if requirements are that summary files reflect the current account structure hierarchies, there must be a way to process the transaction-level detail, or the requirements must be ignored. Said differently, unless all summary reporting requirements can be specified precisely at the outset, and audit trail, archive, or structure reorganization reporting is not a requirement, processing the transaction detail will be unavoidable. Summary files are not the “big” answer. They are just an important part of the solution.
Number of Reports, Level of Detail, and Content Requirements. Easily the biggest driver of data basis is the requirement to report at a low levels of hierarchies in cross-structural combinations. Subsystem reporting requirements typically are limited to summarizing data one structure at a time. As a result, the number of fully qualified accounts that get created by even large volumes of transactions is limited to the number of accounts in the given structure. By avoiding the need to report by customer/product/organization, vendor/SKU/location, employee ID/general ledger account, and so on, subsystem architectures limit their reporting data basis exposure — and consequently their usefulness.
Reporting at low levels of cross-structural detail usually is important because that is the level at which day-to-day resource allocation decisions are made. Inventory replenishment, sales force management and compensation, promotional campaigns, and crew scheduling are examples of large dollar investments managed in atomized quantities of resources, location-by-location, vendor-by-vendor, days at a time. High-level performance measures that show out of whack inventory turnover rates or sales per employee do not help identify the out-of-stock problems that need to be fixed or the particular sales calls not getting made. The thousands of reports in legacy systems got created because figuring out what was going wrong required a specific cross-structural slice of the data to illuminate the drivers about which something could be done.
It is unlikely that thousands of reports are needed to run any business at a given point in time. But, over time, the number of problems that need to be solved certainly accumulate into the thousands. In the world of subsystem architectures, a new problem meant building a new reporting subsystem (that mostly never get retired). In the integrated SAP world, the underlying assumption is that whatever slice is needed will be available because the transactions were captured by an integrated process. But, as we have seen, availability of any slice means access to detail or maintenance of every summary total (a clearly impractical alternative).
Two basic approaches to defining reporting level of detail and content requirements can be taken:
- Produce a pre-defined set of reports in case they are needed.
- Provide a framework that can generate almost anything, but produce only what people ask for.
The first approach is easiest to manage from a project standpoint. But, limiting the set of reports to a practical number is difficult. And it results in an enormous amount of processing for generating numbers nobody looks at. Also, it is not what most clients have in mind when they embark on implementing a new integrated information system. The second approach is preferable, but more difficult to achieve. Something that takes both into account is required.
An exercise that can be done very early in the project that promotes achieving the right balance is as follows:
- For the high-volume aspects of the modules being implemented, define the transaction-level data that constitutes the basis for operational reporting.
- Define the lowest level of cross-structural reporting that will be supported.
- Define the inter-period comparison (e.g., last year/this year, structure re-organizations) features that will be generally supported.
- Define a reasonably inclusive set of cross-transaction accumulators (e.g., promoted/not promoted sales, gross sales, returns, statistic 1, statistic 2, etc.) that will be generally supported.
- Define a basic set (20-40) of potentially high data-basis operational reports to be implemented that represent ranges in level of detail and content.
- Design the cross-dimensional summary files that efficiently support producing those reports over a two-year time horizon.
- Calculate the data basis for generating and maintaining the summary files and producing the reports.
(An Excel Data Warehouse Size Estimation (DWSE) model, with instructions, is available for download on the IT Knowledge Net to facilitate doing these data basis projections; see topic “Estimating Reporting Process Requirements.”)
The summary of results from the model for the current Keebler implementation are provided as follows:
|Day||Sum of Report Count||17|
|Sum of Table Joins – Total||124,818,052|
|Sum of Recs Read – Total||42,960,000|
|Sum of MB Read – Total||34,005|
|Sum of Records Extracted||22,774,204|
|Sum of MB Extracted||2,004|
|Week||Sum of Report Count||54|
|Sum of Table Joins – Total||873,863,473|
|Sum of Recs Read – Total||182,064,000|
|Sum of MB Read – Total||173,751|
|Sum of Records Extracted||57,880,800|
|Sum of MB Extracted||5,708|
|Period||Sum of Report Count||75|
|Sum of Table Joins – Total||507,524,187|
|Sum of Recs Read – Total||506,832,000|
|Sum of MB Read – Total||337,157|
|Sum of Records Extracted||56,289,420|
|Sum of MB Extracted||21,670|
|Quarter||Sum of Report Count||2|
|Sum of Table Joins – Total||1,514,661,174|
|Sum of Recs Read – Total||16,584,000|
|Sum of MB Read – Total||7,500|
|Sum of Records Extracted||10,209,000|
|Sum of MB Extracted||876|
|Half year||Sum of Report Count||9|
|Sum of Table Joins – Total||–|
|Sum of Recs Read – Total||105,300,000|
|Sum of MB Read – Total||89,771|
|Sum of Records Extracted||19,908,000|
|Sum of MB Extracted||6,993|
Even though specific configuration details are not available early in the project, the drivers of data complexity, level of detail, and report content are sufficiently predictable based on existing business processes to yield very useful results. This exercise forces the scale issue to surface early so that informed decisions can be made about hardware and software platform. It illuminates the generalized nature of the reporting problem and promotes implementing a generalized solution rather than a one-program-one-report architecture.
Demonstrating a generalized solution helps with scope control because it is clear how new requirements can be accommodated as they evolve over time. It minimizes the tendency to define everything the existing system does plus everything new that can be thought of into the requirements. And it shows that so many possibilities exist that there is no hope of producing everything just in case somebody wants to use some of it.
Most importantly, this exercise makes it possible to make meaningful economic decisions about how much and what kind of reporting will be supported. The big drivers of cost in reporting are data basis and number of reports. By creating this model early in the project that can be exercised as new ideas are explored, expectations about reporting requirements and technology can be managed on a factual rather than emotional basis.
Components of a High Data-Basis Operational Reporting Solution
Given that SAP facilities alone are not adequate, something else is required. A basic assumption in the market is that some sort of data warehouse solution probably is the answer. In addition, the components of the architecture are widely assumed to be based around SMP or MPP hardware, a relational and/or multi-dimensional data base and some OLAP presentation-layer tools.
The problem is that the heavy data manipulation load associated with high data-basis operational reporting is not a process that works well on bus architecture machines using these kinds of access methods. Only by ignoring the high-volume aspects of reporting requirements can these kinds of solutions be expected to work in a cost-effective way.
Current SAP operational reporting/data warehouse strategies seem to fall into three major categories:
- Use SAP summary level reporting features (SIS, LIS, EIS, etc.) and access the data directly from SAP data structures to support very low data-basis operational reporting requirements. General “slicing and dicing” of granular detail cannot be supported. Practical limits on summary table size are in the 10,000s of rows.
- Extract SAP data and create a series of summary files in relational or multi-dimensional data bases for access by third-party data warehouse tools to support low-data basis operational reporting requirements. General “slicing and dicing” of granular detail cannot be supported. However, options for broader-based interactive access to pre-defined summary tables are available compared to native SAP facilities. Practical limits on summary table size are in the 100,000s of rows.
- Extract SAP data and create a separate operational data store containing detail and summary files as required. Use Geneva or custom COBOL/Assembler programs for extraction and reduction processing on an MVS machine. Use third-party data warehouse tools or SAP/ABAP facilities for presentation-layer functions on MVS, UNIX, or other platforms as required.
In low data-basis situations, or where reporting requirements can be constrained, or where reporting requirements can be precisely defined at the outset and remain stable over time, a creative implementation of Strategies 1 or 2 may be adequate. However, for larger clients with even medium transaction volumes and moderately complicated data structures, including an MVS-based operational data store in the architecture (Strategy 3) is a very practical solution for dealing with the inherent data bandwidth problem. In addition, the MVS environment provides for managing and achieving high throughput for the multiple concurrent batch processes characteristic of operational reporting problems. Complex batch streams with contingent dependencies are not similarly well supported in the UNIX world.
The Strategy 3 architecture that was implemented to solve the Keebler problem very successfully demonstrates the essential elements of a general solution:
In this solution, the heavy data manipulation processes and archive requirements were off-loaded from the SAP UNIX machine. This left the SAP R/3 environment free to be tuned for transaction processing, unencumbered by extensive summary file posting requirements and accumulating detail data volumes. High data-basis report generation takes place on an outsourced MVS machine and answer sets are returned to the SAP/UNIX environment for access through an ABAP report viewer that appears to be a completely integrated part of the SAP application. It is an extremely cost-effective way to provide broad-based desk-top access to high-volume operational data.
Typical CPU and wall times for end-to-end processing of the reporting load described above using the Geneva engine are as follows:
Frequency CPU Time Wall Clock Time
Daily 15 min 1 hour
Weekly 150 min 3.5 hours
Period Summary File Build 300 min 8 hours
Period Reports 350 min 6 hours
(CPU/wallclock ratios vary due to differences in tape processing requirements, time of day the jobs are run, and whether the Geneva configuration is set up for single or multi-threaded processing — net result: cumulative processing time over the course of a month is approximately 26 CPU hours)
A more general presentation for high-volume situations that accounts for the various tool sets that could be employed would look as follows:
Event-level data from SAP, legacy, and other operational systems would be extracted and transformed using various tool sets as appropriate under the circumstances. The transform process would primarily deal with data cleansing issues (i.e., gaps, inconsistencies, and errors). Data reduction (i.e., summarization level and content) would be left primarily to the ODS component managed by Geneva or custom programs. The primary advantage of this separation is that this provides a systematic framework for re-running reductions over time as new requirements emerge. Reductions can be in the form of end-user report files delivered directly through some presentation layer, as was done in the Keebler case with ABAP on the UNIX machine. Or they can be data sets for downstream processes, including replacements or updates to relational or multi-dimensional data bases, data mining applications that require pre-processed inputs, or interfaces to other systems.
Absent an ODS component in medium- and high-volume situations, requirements must be defined very precisely at the outset, they must be stable over time, and limited to what the assumed technology platform can practically accommodate. As early SAP implementation experience has shown, long-range requirements are difficult to define. And volumes that can be feasibly accommodated are dramatically less than most imagine.
However, by incorporating an ODS on a high-data bandwidth platform for systematically reprocessing transaction-level data and maintaining/re-generating summary file structures, highly cost-effective and practical scale and flexibility can be designed into the architecture from the outset. SAP high data-basis operational reporting and data warehouse problems can be readily solved by using right tools for the right job in medium- and high-volume situations.
 For example, a data basis of all invoices for the last year would be a large data basis when compared to only invoices for yesterday, which would constitute a relatively smaller data basis. Last year’s invoices or yesterday’s invoices could be summarized at a high organizational level such as division, which would result in a small report size, or either data basis could be summarized by customer, item and week, which would result in a relatively larger report size.
 Detail (audit trail) reporting is a class of analytical reports that fits this profile. The general problem is to enumerate the detail transactions that support summary numbers accumulated as part of transaction processing, or summary numbers generated on other operational and analytical reports. Unless an index is available that permits going directly to the small subset of transactions required, sequential table scans are required to pick out the transactions that qualify for a given set of selection criteria. If the criteria require that table joins be executed as part of the processing, this needs to be accounted for accordingly.
While the extract data basis and consequent report sizes are small for the bulk of detail reporting requirements, the number of detail report requests can be substantial and the primary selection criteria can be based on any field in the data base, not just those with indices. The more robust the summary analytical reporting framework, the more demand there will be for detail reporting to substantiate the summary numbers. And the more likely it is that the robust summary numbers will be based on dimensions of the data base that do not have indices to facilitate selection processing.
 The nature of the processing necessary to produce reports also is an important factor. To the extent that the source data structures for a report are denormalized, or a report requires only a single table for input, data basis and size provide a good surrogate for describing reporting loads. To the extent that normalized structures are involved that force numerous table joins to be performed as part of extraction, calculation and formatting processes, projections of overall reporting load must be adjusted accordingly.
 The weekly data basis would have been over 20 times the 23 gigabytes if all processing had been done using transaction level data (basic transaction volume is 120,000 order lines per day @ 1,608 bytes per line). Based on the reporting requirements identified, summary file structures were defined that would reduce total data manipulation load necessary to produce the reports. As new reporting requirements emerged during development, summary files were repeatedly modified and regenerated from the transaction detail as necessary to support efficient production processing. It is expected that the summary file structures will continue to be modified and regenerated over time as new reporting requirements evolve.
 In some cases, obtaining detail transaction records from SAP is difficult for a variety of reasons. Since this is the richest source of information, as well as the basis for auditability of numbers generated by SAP, it is important that we resolve the open issues and develop consistent recommendations surrounding how all necessary transaction detail should be obtained from SAP, where it should be stored under varying volume situations, and how it should be accessed.
 Fortunately, audit trail reporting tends to involve large table scans but small extract sets, which means that processing subsequent to extraction is minimal.
 SAP highly integrates the interrelated aspects of transaction processing that traditionally have been disconnected by subsystem architectures. As transaction detail is captured in SAP, a very rich set of attributes about each business event is recorded. On a go-forward basis, business rules can be modified to accommodate new requirements or other changes over time. And SAP provides for posting a wide variety of summary records at transaction processing time to facilitate inquiry and operational reporting requirements. But, the determination of what summary totals will be required for each aspect of managing the business remains part of the up front configuration exercise. The ability to adapt to retrospective information requirements that were not foreseen up front, or new requirements that emerge over time, is very limited in the SAP world.
SAP will post only those summary totals that it is configured to post at the time a transaction is processed. No systematic facilities are provided for accessing or analyzing the rich detail that gets captured. There is no framework for mass regeneration of summary totals that are posted in error due to configuration problems. There is no general method for reorganizing data to reflect retrospective changes in account hierarchy relationships. SAP effectively addresses problems with transaction processing integration. But SAP does not effectively address integration problems related to high data-basis operational reporting.
In 1996 a major US cookie manufacturer installed SAP to manage all aspects of the manufacturing, HR and financial operations. The system captured data and provided adequate information for managing most operational aspects of the business. But SAP could not deliver the detailed sales analysis necessary to support the highly detailed, store-specific sales process. They chose Geneva ERS to solve the problem in an integrated SAP environment.
The reporting requirements were substantial. The company required detailed information at the customer/line item level over various time slices from daily to year-to-date, for over 60,000 customer stores and 2,000 items, selecting and summarizing by over 100 fields. The company sells over a million items per week, and they needed comparative information for each time period for at least the last two years. They also needed comparative data be reorganized to reflect changes in organizational structure over time!
A Geneva ERS application was developed to deliver the full benefit of the SAP implementation, without burdening users with another tool. SAP’s Profitability Analysis module was configured to capture the required data. Programs were written to extract the sales items from SAP on a nightly basis. A series of Geneva ERS processes were developed to manage the data warehouse, including creation of various summary file structures. The completed data warehouse contains over 250 gigabytes of data in 15 entities, containing over 300 million records.
Geneva ERS was configured to produce executive information files, commonly called “information cubes.” The company defined Geneva ERS “Views” to create over 150 cubes, and 75 interface files. Geneva’s effective date lookup processing allowed for recasting the historical detail to reflect the current management structures.
The SAP Executive Report Viewer was also developed. Constructed within the SAP environment and fully integrated with the Profitability Analysis reporting, this analysis tool gives users a single source of information. Through this tool, users are able to select a time period, and then the respective cube. Users can specify selection criteria, drill down into reports to lower and lower levels of detail. They can export to Excel, view the data in a graphical format, format the report, and save a profile of the report.
Users have commented that the sales reporting system has been one of the most successful components of the SAP implementation. It has allowed the company to capitalize on it national sales force that makes personal contact with stores nearly 3 times each week. The sales representatives are armed with needed information. They can analyze customer and product information for daily, weekly, monthly, year to date, and spot trends from over 3 years of data. The system has helped fuel their growth in net income nearly 10 times from 1996 to 1999, and has scaled as the company has purchased and integrated three additional companies.
The following brochure described the Geneva Solution for the State of Alaska. The origins of Geneva started in the early 1980’s with a custom developed financial system for Alaska. In circa 1994 the newer version of the tool (version 3.0) was installed to enhance the reporting processes.
by Richard K. Roth, Principal, Price Waterhouse LLP
Eric L. Denna, Warnick/Deloitte & Touche Faculty Fellow, Brigham Young University, Marriott School of Management
Through employing events-based concepts in systems architecture, organizations can dramatically increase the value and role their central systems play in accomplishing the business mission. Events-based architectural concepts result in system designs that increase the timeliness and quality of information, as well as expand the scope of business functions addressed by a given set of computer programs. Compared to traditional designs, events-based designs result in systems that are better, faster, and cheaper — from both development and operational standpoints.
The concept of event-driven (or events-based) systems started with taking a broader view of the proper role and scope for accounting systems. So we have used the accounting system problem as the basis for our discussion here. However, the same broader view applied to other application areas leads to the same general conclusions.
Events-based concepts discard the arbitrary limitations that historically have been used to narrowly define accounting. Under an events-based concept, owners of an accounting system become anyone in the organization with a need for information about events captured by the system, not just the accountants. While the accountants continue to be important users of the system, the financial statements and other reports accountants need are viewed as simply another byproduct of capturing and summarizing all the events of importance to the organization. Accountants use what they need to prepare balance sheets and cash flow analyses. Managers use some of the accounting data along with statistics about claims paid or policies issued to determine the number and type of adjusters and agents that should be assigned to various risks or locations.
Moving off the old accounting paradigm and its artifacts of invoices, journal vouchers, general
and subsidiary ledger files, etc., to an events-based approach has the effect of simplifying the system architecture problem for accountants, other users, and system builders alike. The traditional accounting system architecture has invoice and disbursement transactions being recorded in a payables subsystem, billing and cash receipt transactions get posted in a receivables subsystem, inventory receipts and issues get recorded in an inventory subsystem, and everything gets posted twice in the general ledger subsystem. Special functions often have to be developed to record cash receipts that represent previous overpayments of invoices or disbursements that represent refunds of credit balances in customer accounts. The effect of this traditional accounting system architecture is to create the multiple system problems of redundant entry and reconciliation within what appears to be the same system. This is why traditional accounting systems usually seem more complex than a general understanding of the accounting problem would suggest they should be.
By viewing accounting events in the same context as all events, a different architecture is suggested because it becomes impractical to build a new subsystem each time some new event needs to be captured. Information about vehicles inspected, miles of road striped, licenses issued, inventory counted, invoices received, invoices paid, bills issued, and cash deposited take on a similar appearance when viewed in the broader context. An events-based architecture for a system allows events to be summarized, counted, and compared at various levels of detail for various time frames according to the needs of the particular user concerned. Trends in product sales/square foot of floor space is of interest to assortment planners in retail operations. Payroll costs divided by miles of road striped by month would be of interest to a highway maintenance engineer. Warranty claims by product and failure type are of interest to manufacturers. Policy claims by risk with policyholder overlays would be of interest to risk managers and accountants alike.
As these examples illustrate, we can still get the accounting information required for financial accounting purposes from an events-based architecture by comparing various classes of events without having to incur the overhead of developing, operating, maintaining, and (probably most of all) understanding separate subsystems just to keep track of these subtotals for us.
Exhibits I-I through IV have been included to contrast the differences between traditional and events-based system architectures.
Exhibit I-I shows the basic architecture that has been used in accounting systems automation for the last 20 or so years.
Accounting events (which are a subset of all events a business would like to plan for, evaluate or control) are captured by a set of computer programs collectively called a transaction processor. The transaction processor takes a subset of information that was present on the accounting events as they were originally entered and posts the transactions on one or more summary files. The full detail is then posted to history files, which typically are archived or otherwise made relatively inaccessible. Report programs specific to the financial purpose being addressed are developed based on the specifics of information that was posted to the summary files by the transaction processor.
To the extent that the purposes being addressed have been fully provided for in the summary files and report programs, information can be obtained from the system. However, to the extent that detail information is required that was lost during the summary posting process, or was not captured at all because it was considered outside the scope of what accounting events properly should include, these requirements cannot be accommodated by the system and must be addressed by a subsidiary system specially designed for that purpose.
Exhibit I-II shows the extension of this result to the primary purposes that traditionally are included as part of the financial function. Financial reporting functions are provided for in a general ledger subsystem. Inventory control and reporting functions are provided for in an inventory subsystem. Fixed assets are a different type of inventory and are provided for separately again …and so on.
Each subsystem characteristically is designed to accept and process the specific events that are the appropriate subject matter for its corresponding area of the business. Event details are posted to inaccessible history files and specially designed summary files become the basis for narrow focus reporting requirements. Visibility to detail events captured and lost in the history files of one system become further removed in the context of the multiple subsystems of what we refer to as the — Financial Subsystems Paradigm. Each subsystem has its own data structures designed to support its particular purposes and business rules for transaction processing and reporting. Maintaining and understanding these subsystems and their interrelationships becomes an area of specialization in its own right. Systematically ensuring that the various subsystems are in balance is a labor-intensive function that consumes much of the energy invested in accounting activities. The numerous inter-system relationships make flexibility practically un-attainable.
Exhibit I-III illustrates the real impact of inside-the-box thinking relative to the Financial Subsystems Paradigm. The mainstream events that make or break business missions often have financial effects, but there also are other information requirements that are above and beyond what accountants need to accomplish their mission. By defining accounting systems narrowly, the Financial Subsystems Paradigm has left most mainstream business requirements to be addressed by still separate subsystems that capture most of the same events needed by the accountants, but with more detailed information attached.
The separate subsystems also have the same basic architecture of traditional accounting systems which is characterized by a transaction processor that archives detail and posts summary files that feed narrowly defined reporting processes. The result is that parallel systems are required to support accountants and mainstream business requirements which in turn creates increasing layers of reconciliation, flexibility, system maintenance, and data visibility problems. The layers of complexity caused by traditional architecture, coupled with the Financial Subsystems Paradigm, have resulted in a virtually unworkable situation from an enterprise standpoint.
Traditional architecture was fit for the purpose in the world of 20 years ago when the economics of computing power forced designers to archive detail (much like manual accounting systems) and automated systems were few in number, limited in scope and non-strategic in nature. A new paradigm is required as we attack integrated system requirements. The events-based architecture illustrated in Exhibit I-IV is a solution that is attractive both theoretically and practically.
Exhibit I-IV illustrates an architecture where the point of departure is a broad view of the business systems problem.
It views the proper subject matter for a system as all business events that an organization is interested in planning for, evaluating or controlling. Rather than building separate transaction processors to capture, edit and post payroll transactions, general ledger transactions, inventory transactions… and so on, one transaction processor can be designed to capture events according to their respective data structures and apply the respective business rules for editing and posting and reporting in a generalized way. It explicitly recognizes that detail information should be accessible so that information requirements can be accommodated as they are identified over time. In addition, it
recognizes that mainstream information require-ments and financial information requirements should be supported through the same data capture, maintenance, and reporting facility.
The practical result of viewing systems problems in this broader architectural context is a consolidated design for the systems that get built. A consolidated design means fewer programs to write, test, and maintain. Fewer interrelationships must be defined, understood, explained, and reconciled. The sensitivity to requirements defined at the outset is diminished substantially because views of the data captured are not limited by the summary data structures posted by the transaction processors.
A consolidated architecture is better because of its flexibility in accommodating changing requirements. A consolidated architecture is faster due to the reduced development life cycle emanating from the underlying simplicity of the concept. It is also cheaper because redundant functions are eliminated and there are fewer objects to build and maintain in the application portfolio.
These are images of the original paper.