During the January 12th Technical Steering Committee meeting, Kip Twitchell provided an update on our proof of concept work with GenevaERS and Spark.
Kip provided an overview of the GenevaERS project at the mini summit held Friday October 30th 2020.
Organization of the project continues, but much progress has been made. Check out the Community Repository on GitHub, and its Governance and Technical Steering Committee Checklist to see what’s been happening.
But don’t stop there….
The first Technical Steering Committee (TSC) Meeting, open to all, is scheduled for Tuesday, August 11, 2020 at 8 PM US CDT, Wednesday August 12, 10 AM WAST/HK time.
Project Email List: To join the meeting or to be kept up to date on project announcements, join the GenevaERS Email List. You’ll receive an invitation as part of the calendar system. You must be on the e-mail list to join the meeting.
Spark POC Plans
The team continues to prepare the code for release. But that’s no impediment to our open source progress.
We’ve determined that the best way to help people understand the power of The Single Pass Optimization Engine is to contrast it with the better-known Apache Spark, and to begin integrating the two.
Here’s what we plan to work on over the next several weeks, targeting completion before the inaugural Open Mainframe Summit, September 16 – 17, 2020.
This POC was designed using these ideas for the next version of GenevaERS.
The POC will consist of several Spark runs on z/OS, some of them using GenevaERS components, followed ultimately by a fully native GenevaERS run.
The configurations to be run include the following:
- The initial execution to produce the required outputs will be native Spark, on an open platform.
- Spark on z/OS, utilizing JZOS as the IO engine. JZOS is a set of Java utilities for accessing z/OS facilities. They are C++ routines with Java wrappers, which allows them to be included easily in Spark. This run will execute entirely on z/OS.
- The first set of code planned for release for GenevaERS is a set of utilities that perform GenevaERS encapsulated functions, such as GenevaERS overlapped BSAM IO, and the unique GenevaERS join algorithm. As part of this POC, we’ll encapsulate these modules, written in z/OS Assembler, in Java wrappers, to allow them to be called by Spark.
- If time permits, we’ll switch out the standard Spark join processes for the GenevaERS utility module.
- The last test is native GenevaERS execution.
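One way to keep these runs comparable is to feed every configuration the same input, treat the first run as the baseline, and check that each later run reproduces its output before looking at timings. The sketch below is a minimal illustration of that idea only. The class `PocComparison`, its methods, and the two "sum by key" functions are hypothetical stand-ins written for this example; they are not Spark, JZOS, or GenevaERS code, and in-process wall-clock timing is far too crude for real performance measurement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical harness: runs several configurations on one input and compares outputs. */
final class PocComparison {

    /** Outcome of one configuration run. */
    static final class Result {
        final String name;
        final long elapsedNanos;
        final boolean matchesBaseline;

        Result(String name, long elapsedNanos, boolean matchesBaseline) {
            this.name = name;
            this.elapsedNanos = elapsedNanos;
            this.matchesBaseline = matchesBaseline;
        }
    }

    /**
     * Runs each configuration in the map's iteration order (use a LinkedHashMap).
     * The first configuration is the baseline; later outputs are compared to it.
     */
    static <I, O> List<Result> compare(I input, Map<String, Function<I, O>> configs) {
        List<Result> results = new ArrayList<>();
        O baseline = null;
        boolean first = true;
        for (Map.Entry<String, Function<I, O>> entry : configs.entrySet()) {
            long start = System.nanoTime();
            O output = entry.getValue().apply(input);
            long elapsed = System.nanoTime() - start;
            if (first) {
                baseline = output;
                first = false;
            }
            results.add(new Result(entry.getKey(), elapsed, Objects.equals(output, baseline)));
        }
        return results;
    }

    /** Stand-in "configuration" 1: sums amounts per key with a plain loop. */
    static Map<String, Long> sumByKeyLoop(List<String> rows) {
        Map<String, Long> totals = new HashMap<>();
        for (String row : rows) {
            String[] parts = row.split(",");
            totals.merge(parts[0], Long.parseLong(parts[1]), Long::sum);
        }
        return totals;
    }

    /** Stand-in "configuration" 2: same result computed with the streams API. */
    static Map<String, Long> sumByKeyStream(List<String> rows) {
        return rows.stream()
                .map(row -> row.split(","))
                .collect(Collectors.groupingBy(
                        parts -> parts[0],
                        TreeMap::new,
                        Collectors.summingLong(parts -> Long.parseLong(parts[1]))));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("A,10", "B,5", "A,7");
        Map<String, Function<List<String>, Map<String, Long>>> configs = new LinkedHashMap<>();
        configs.put("baseline (stand-in)", PocComparison::sumByKeyLoop);
        configs.put("alternative (stand-in)", PocComparison::sumByKeyStream);
        for (Result r : compare(rows, configs)) {
            System.out.println(r.name + ": " + r.elapsedNanos
                    + " ns, matches baseline = " + r.matchesBaseline);
        }
    }
}
```

In a real comparison each map entry would launch one of the configurations listed above, and the output check would compare the files each run produces rather than in-memory maps.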
The results of this POC will start to contribute to this type of architectural choice framework, which will highlight where GenevaERS’s single pass optimization can enhance Apache Spark’s formidable capabilities.
Note that the POC scope will not test the efficiencies of the Map and Reduce functions directly, but will provide the basis upon which that work can be done.
Open source is all about people making contributions. Expertise is needed in all types of efforts to carry off the POC.
- Data preparation. We are working to build a data set that can be used on either z/OS or open systems to provide a fair comparison for either platform, but with enough volume to test some performance. Work here includes scripting, data analysis, and cross platform capabilities and design.
- Spark. We want some good Spark code that uses the power of that system and makes the POC give real-world results. Expertise needed includes writing Spark functions, data design, and tuning.
- Java JNI. Java Native Interface is the means by which one calls other language routines under Java, and thus under Spark. We could use help encapsulating the GenevaERS utility function GVBUR20 to perform fast IO for our test.
- GenevaERS. We hope to extract the configuration we create as GenevaERS VDP XML and provide it as a download for initial installation testing. We have a similar goal for the sample JCL that will be provided. GenevaERS expertise in these areas is needed.
- Documentation, Repository Work, and on and on and on. At the end of drafting this blog entry, facing the distinct chance it will be released with typos and other problems, we recognize we could use help in many more areas.
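For anyone new to JNI, the sketch below shows the general shape such a wrapper could take: a small Java interface, a class whose methods are declared `native`, and a pure-Java fallback so the calling code still runs where the native library is absent. Everything named here is an assumption made for illustration: the library name `gvbur20jni`, the classes `NativeGvbur20Reader`, `StreamRecordReader`, and `PocReaders`, the `open0`/`read0`/`close0` signatures, and the dataset name. The actual GVBUR20 interface is not described in this post, so the real binding may look quite different.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Contract shared by every IO engine, so calling code stays the same. */
interface PocRecordReader extends AutoCloseable {
    /** Returns the next fixed-length record, or null at end of data. */
    byte[] next() throws IOException;

    @Override
    void close() throws IOException;
}

/** HYPOTHETICAL JNI binding: library and native method names are assumptions. */
final class NativeGvbur20Reader implements PocRecordReader {
    private static final boolean LOADED = tryLoad("gvbur20jni"); // assumed library name

    private static boolean tryLoad(String library) {
        try {
            System.loadLibrary(library);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false; // native code is not installed on this platform
        }
    }

    static boolean isAvailable() {
        return LOADED;
    }

    // These would be implemented in native code; signatures are illustrative only.
    private static native long open0(String datasetName, int recordLength);
    private static native byte[] read0(long handle);
    private static native void close0(long handle);

    private final long handle;

    NativeGvbur20Reader(String datasetName, int recordLength) {
        if (!LOADED) {
            throw new IllegalStateException("native library not loaded");
        }
        this.handle = open0(datasetName, recordLength);
    }

    @Override
    public byte[] next() {
        return read0(handle);
    }

    @Override
    public void close() {
        close0(handle);
    }
}

/** Pure-Java fallback that reads fixed-length records from any stream. */
final class StreamRecordReader implements PocRecordReader {
    private final InputStream in;
    private final int recordLength;

    StreamRecordReader(InputStream in, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        this.in = in;
        this.recordLength = recordLength;
    }

    @Override
    public byte[] next() throws IOException {
        byte[] record = new byte[recordLength];
        int filled = 0;
        while (filled < recordLength) {
            int n = in.read(record, filled, recordLength - filled);
            if (n < 0) {
                break; // end of stream
            }
            filled += n;
        }
        if (filled == 0) {
            return null; // clean end of data
        }
        if (filled < recordLength) {
            throw new IOException("truncated record");
        }
        return record;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

/** Picks the native reader when available, otherwise the fallback. */
final class PocReaders {
    static PocRecordReader open(String datasetName, int recordLength, InputStream fallback) {
        if (NativeGvbur20Reader.isAvailable()) {
            return new NativeGvbur20Reader(datasetName, recordLength);
        }
        return new StreamRecordReader(fallback, recordLength);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "ABCDEFGH".getBytes(StandardCharsets.US_ASCII);
        // "HYPOTHETICAL.INPUT.DATA" is a made-up dataset name for this demo.
        try (PocRecordReader reader =
                 open("HYPOTHETICAL.INPUT.DATA", 4, new ByteArrayInputStream(sample))) {
            for (byte[] rec = reader.next(); rec != null; rec = reader.next()) {
                System.out.println(new String(rec, StandardCharsets.US_ASCII));
            }
        }
    }
}
```

Hiding the native calls behind one interface means the Spark side of the test would not need to change when the IO engine is swapped, which fits the goal of running the same workload under several configurations.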
The focus for our work is this Spark-POC repository. Clone, fork, edit, commit, create pull request, repeat.
An open scrum call on these topics is held daily at 4:00 PM US CDT at this virtual meeting link. This call is open to anyone to join.
Contributions from all are welcome.