What is VSAM? The acronym is old enough that AM means access method, a term coined for OS/360 in the mid-1960s. An access method is just a way to access data on disk, tape, or an external device. VSAM provides several ways to access data, but only on disk (there is no tape support, except for export, import, and backup activities). The VS means virtual storage, which was a big deal when VSAM arrived in the early 1970s, but today, not so much.
This article talks about VSAM ESDS and VSAM KSDS. An ESDS, or Entry Sequenced Data Set, provides direct access to a record given an RBA (Relative Byte Address). A KSDS, or Key Sequenced Data Set, also provides direct access to a record given an RBA, and in addition provides direct access to a record given a unique key. Sometimes alternate keys are necessary, and sometimes keys are not unique. An alternate index (AIX) can be created for each alternate key needed for either an ESDS or a KSDS.
An ESDS cluster consists of the ESDS catalog entry and a corresponding data component. A KSDS cluster consists of the KSDS catalog entry, the corresponding data component, and the index component for the key. If a cluster has associated alternate indices, the cluster, indices, and paths are together called a Sphere.
I first encountered VSAM in the late ’80s when working as a university student at IBM Toronto. We were building the C/370 compiler and runtime for MVS/ESA and VM/CMS. I was responsible for testing the runtime extensions to ANSI C that we were adding for VSAM through fopen(), fread(), fwrite(), and flocate(). I was super excited – not so much because of the work, but because we were behind in delivery and IBM put me up in a hotel for a week! They also paid for all my meals because my apartment lease had ended and I was supposed to head home to Vancouver. Living the life on IBM’s dime – I loved it.
As it turns out, more than 30 years later, VSAM is alive and well. It’s the backbone for a number of functions on z/OS, and because it’s blazing fast and super scalable, it is heavily used in the real world. It’s also available to everyone who has z/OS, and is therefore a great option for anyone who doesn’t want to pre-req a database for something relatively simple.
I realized that my knowledge of VSAM was not as deep as I wanted, so I took a course from Interskill that covered the theory of VSAM very well, and I read VSAM Demystified. But I am very much a learn-by-doing type of person, so I decided to build a key/value store using VSAM. I had already heard of zFAM, which seemed a great match, but then I saw it required CICS. I didn’t want to use CICS (not because CICS isn’t great, but because I didn’t want to pre-req anything other than base z/OS). After writing the draft of this blog, I discovered VSAMDB, which does much of what I am trying to do here, albeit without a C interface. Luckily, the VSAM team plans to provide a C interface in the future. Perhaps I will blog about rewriting this service to use VSAMDB when it is available.
Key/Value Store Introduction
The requirements I imposed on myself for this key/value store were as follows:
- Common stuff is simple. Setting a key/value pair is easy and retrieving the value of a key is easy.
- A permanent database (VSAM) that can be backed up, copied, and version controlled is used.
- It’s fast.
- No limitations on what the characters in the key or value are.
- No limitations on the length of the key or value.
- Able to list a group of related keys easily.
I called the program Xsysvar, for extended system variable. Here are a few simple examples:
Set and get an extended system variable:
Xsysvar 'MDLB URL'='https://MakingDevelopersLivesBetter.wordpress.com'
Xsysvar 'MDLB URL'
Work with a group of extended system variables:
Xsysvar -PZOS -C'z/OS CSI' CSI=MVS.GLOBAL.CSI
Xsysvar -PEQA -C'Debug Tool CSI' CSI=EQAE20.GLOBAL.CSI
Xsysvar -PIGY -C'COBOL CSI' CSI=IGY630.GLOBAL.CSI
Xsysvar -l CSI
ZOS CSI MVS.GLOBAL.CSI z/OS CSI
IGY CSI IGY630.GLOBAL.CSI COBOL CSI
EQA CSI EQAE20.GLOBAL.CSI Debug Tool CSI
Creating the Database
I need to decide how to access the data. The key is a key (not surprising – it is called a key after all). The key isn’t unique because I want to support very long keys, and VSAM requires that a key be fixed length. Making the key unique would require the fixed length be very long, which would waste space. I broke the key into a fixed (non-unique) and variable part, with the fixed part being at most 15 characters, and the variable part being less than 32K. The intent is for most of the keys to reside completely in the fixed component, with some needing the variable part. Also, most keys would vary in the first 15 characters. Very short keys would waste some space, but not much given the fixed part is only 15 characters.
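The fixed/variable split described above can be sketched in a few lines of C. This is only an illustration of the idea, not the Xsysvar source: the names key_parts, split_key, and KEY_FIXED_LEN are hypothetical, and the sketch assumes keys are NUL-terminated strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_FIXED_LEN 15   /* characters kept in the fixed, indexed part */

/* Hypothetical holder for a split key; names are illustrative only. */
struct key_parts {
    char        fixed[KEY_FIXED_LEN + 1]; /* zero padded, last byte always 0 */
    const char *variable;                 /* points into the caller's key    */
    size_t      variable_len;             /* 0 when the key fits in fixed    */
};

/* Split a key into its first 15 bytes and whatever is left over. */
struct key_parts split_key(const char *key)
{
    struct key_parts out;
    size_t len;

    assert(key != NULL);
    len = strlen(key);

    memset(out.fixed, 0, sizeof out.fixed);
    if (len <= KEY_FIXED_LEN) {
        /* Short key: everything lives in the fixed part. */
        memcpy(out.fixed, key, len);
        out.variable = key + len;         /* empty remainder */
        out.variable_len = 0;
    } else {
        /* Long key: the first 15 bytes are indexed, the rest is variable. */
        memcpy(out.fixed, key, KEY_FIXED_LEN);
        out.variable = key + KEY_FIXED_LEN;
        out.variable_len = len - KEY_FIXED_LEN;
    }
    return out;
}
```

With this sketch, split_key("MDLB URL") keeps the whole key in the fixed part, while a 20 character key leaves 5 characters in the variable part.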
I want to use this key store as an alternative to the config files currently being used for the zospm technology I’ve written about in the past. Therefore, I need to work with a group of keys, like all the keys associated with IMS or CICS. So I created a non-unique product-id key. I broke the product-id key up into a fixed and variable part, but since IBM has a 3 letter prefix it uses for z/OS software product-ids, I made the fixed size just 3 characters.
I need to be able to further refine keys by version/release/mod and it seems useful to specify sysplex or system specific keys, so I added these as optional filters in the variable section of the VSAM record. In addition, I added an optional comment to describe what a particular key is for.
Now it is simply a matter of programming.
Looking at my choices of KSDS, ESDS, LDS, and RRDS, I realize I need an ESDS with 2 alternate indices – one for the key and one for the product id. I couldn’t use a KSDS because I did not have a unique key. But ESDS does not support growing a record – that is only available with KSDS – and it does not support deleting a record either. I want to be able to change a key from a short value to a long value, so I encode an inactive flag in the VSAM record. When a value grows, the old record is flagged as inactive (logically deleted) and the longer record is written as a new entry.
Here is the layout of the VSAM record for a key/value pair that I use:
- I: Inactive indicator. 0 indicates active, 1 inactive.
- R: Reserved. 1 byte in size. Always 0 today.
- P: Fixed part of prod-id key. 4 bytes in size, with the last byte always 0.
- K: Fixed part of key key. 16 bytes in size, with the last byte always 0.
- V: Fixed part of value. 16 bytes in size, with the last byte always 0.
- Po: offset into text (T) area for variable part of prod-id key.
- Pl: length of variable part of prod-id key. 0 if prod-id is less than 4 bytes.
- Ko: offset into text (T) area for variable part of key key.
- Kl: length of variable part of key key. 0 if key is less than 16 bytes.
- Vo: offset into text (T) area for variable part of value.
- Vl: length of variable part of value. 0 if value is less than 16 bytes.
- Fo: offset into filter info (F-I) area for filter information, if present.
- Fl: length of filter info (F-I) area.
- F-I: filter info area. The filter info is a set of offset/length pairs for each of sysplex, system, version, release, mod, and comment. If the filter info is present, then each offset/length pair will point to corresponding strings in the text (T) area, not shown.
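As a rough sketch, the fixed portion of this layout could be declared in C as follows. The struct and field names are hypothetical, and the 16-bit width of each offset and length field is an assumption (the article only says the variable parts are less than 32K). The compile-time checks confirm that the prod-id and key fields land at offsets 2 and 6, which are the offsets used for the alternate indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: every offset/length field is an unsigned 16-bit integer. */
struct off_len {
    uint16_t offset;   /* offset into the text (T) or filter info (F-I) area */
    uint16_t length;   /* 0 when the variable part is absent                 */
};

/* Hypothetical C view of the fixed part of a record; the text area follows. */
struct xsysvar_header {
    uint8_t        inactive;         /* I: 0 = active, 1 = inactive       */
    uint8_t        reserved;         /* R: always 0 today                 */
    char           prodid_fixed[4];  /* P: last byte always 0             */
    char           key_fixed[16];    /* K: last byte always 0             */
    char           value_fixed[16];  /* V: last byte always 0             */
    struct off_len prodid_var;       /* Po, Pl                            */
    struct off_len key_var;          /* Ko, Kl                            */
    struct off_len value_var;        /* Vo, Vl                            */
    struct off_len filter;           /* Fo, Fl                            */
};

/* Hypothetical filter info (F-I) area: one pair per optional filter. */
struct xsysvar_filter_info {
    struct off_len sysplex, system, version, release, mod, comment;
};

/* The alternate index definitions depend on these two offsets. */
static_assert(offsetof(struct xsysvar_header, prodid_fixed) == 2,
              "prod-id key must start at offset 2");
static_assert(offsetof(struct xsysvar_header, key_fixed) == 6,
              "key key must start at offset 6");
```

Because the fields before the first 16-bit value add up to an even number of bytes, no padding is expected between fields on common compilers, but that is worth verifying on the target platform.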
I wrote crtvsam, which creates a VSAM Sphere (a VSAM ESDS or KSDS cluster and 0 or more alternate indices with corresponding paths) so that I could easily experiment with a variety of VSAM configurations. The code takes all the defaults, so it is likely not optimal, but it is great for experimenting. The syntax is fairly simple:
crtvsam cluster repro [key]*
- cluster is the dataset prefix to use for all the datasets
- repro is an HFS file that can be used to prime the base cluster so that alternate indices can be built
- key is one or more keys you want in your record.
Each key is of the form name:length:offset[U], where:
- name: the name of the key, which is used to name the alternate index data and index datasets
- length: the length of the key in the record
- offset: the offset of the key in the record
- U: if specified, indicates that the key is unique.
If the first key specified is unique, then the base cluster created will be a KSDS and the key will be used in the KSDS definition, otherwise the base cluster will be an ESDS and all keys will be alternate indices.
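To make the key syntax concrete, here is a small C sketch that parses one name:length:offset[U] argument. It is not taken from crtvsam, whose implementation is not shown in this article. The struct key_spec, the function parse_key_spec, and the 31 character limit on the name are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one key argument. */
struct key_spec {
    char     name[32];  /* assumption: names are at most 31 characters */
    unsigned length;    /* length of the key in the record             */
    unsigned offset;    /* offset of the key in the record             */
    int      unique;    /* 1 if the trailing U was given, else 0       */
};

/* Returns 0 on success, -1 if text is not of the form name:length:offset[U]. */
int parse_key_spec(const char *text, struct key_spec *out)
{
    int consumed = 0;
    const char *rest;

    assert(text != NULL && out != NULL);

    /* %n records how many characters were consumed by the three fields. */
    if (sscanf(text, "%31[^:]:%u:%u%n",
               out->name, &out->length, &out->offset, &consumed) != 3) {
        return -1;
    }
    if (out->length == 0) {
        return -1;                       /* a zero-length key is meaningless */
    }

    rest = text + consumed;
    if (rest[0] == '\0') {
        out->unique = 0;
    } else if (strcmp(rest, "U") == 0) {
        out->unique = 1;
    } else {
        return -1;                       /* trailing junk after the offset */
    }
    return 0;
}
```

For example, prodid:4:2 parses as a non-unique key of length 4 at offset 2. Under the rule above, a unique first key would select a KSDS base cluster, and anything else an ESDS.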
I wrote crtvsamxsysvar to create the Xsysvar VSAM Sphere. All it does is generate a 1 line repro file of 64 NULL characters, then creates a VSAM Sphere with a base ESDS cluster, the prodid alternate index of length 4 starting at offset 2 and the key alternate index of length 16 starting at offset 6. Note that crtvsamxsysvar uses the no-charge Z Open Automation Utilities to drive the VSAM commands.
As an aside, I am not sure why I need to put something into my VSAM base cluster before I can build an alternate index for it. If there is a way to avoid that, it would be useful because the record I put into the database is never used – it is just there so the alternate indices can be built.
Xsysvar Code Overview
Xsysvar reads, writes and updates entries in a VSAM database. I won’t go through the code line by line, but will highlight a few interesting aspects of the code. Xsysvar -? provides the syntax and options to run the program.
Xsysvar is written in C and runs as a POSIX(ON) application, i.e. it runs in the Unix System Services shell. It should work fine as a POSIX(OFF) application, i.e. can be built into a program object and run from batch, but I haven’t tried this. bldXsysvar is a trivial script to build the application.
The code is standard C, with the exception of I/O, which uses the z/OS XL C/C++ VSAM I/O extensions in fopen(), fread(), fwrite(), fclose(), flocate(), and fupdate(). To access the records directly with no key, use fopen() on the base cluster. To access the records through either the prod-id or key key, use fopen() on the corresponding path. To find a particular key, use flocate(). To update the record for that key, use fupdate(). Since our base cluster is an ESDS cluster, fupdate() will fail if the new record is longer than the old record. If the new record is longer, the current record is marked inactive by using fupdate() to set the first byte of the record to 1, and the new record is then written as a new entry. To ensure the base cluster and indices remain consistent, the alternate indices are defined with UPGRADE and the alternate index paths are defined with UPDATE.
setKey() opens the KEY path file with the extended format of type=record. All VSAM I/O is record I/O (i.e. writing an entire record at a time instead of a stream of bytes). vsamxlocate() is called to determine if the key is already present in the database. If the record is present, the length of the new record is compared with the length of the existing one. If the new record is longer, the current record is marked inactive. If the new record is not longer, it is padded out to the size of the old record with zeroes, and the record is updated in place. The KEY path file is then closed.
If the key did not exist, or the new record is longer than the old record, the base cluster is opened, and the new record written to the database.
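The decision setKey() makes can be summarized in a few lines of portable C. This is a sketch of the logic only, with no VSAM I/O in it. The enum, the function names, and the idea of passing in record lengths are illustrative assumptions rather than the actual Xsysvar code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the three outcomes described above. */
enum set_action {
    SET_APPEND,             /* key not found: write a new record             */
    SET_UPDATE_IN_PLACE,    /* new record fits: pad it and update in place   */
    SET_RETIRE_AND_APPEND   /* new record is longer: mark old one inactive,  */
                            /* then write the new record to the base cluster */
};

/* Decide what to do, given whether the key exists and the record lengths. */
enum set_action choose_set_action(int found, size_t old_len, size_t new_len)
{
    if (!found) {
        return SET_APPEND;
    }
    if (new_len > old_len) {
        return SET_RETIRE_AND_APPEND;   /* an ESDS record cannot grow */
    }
    return SET_UPDATE_IN_PLACE;
}

/*
 * Pad a shorter replacement record with zeroes so that it is exactly as
 * long as the record it replaces. buf must have room for old_len bytes.
 */
void pad_record(unsigned char *buf, size_t new_len, size_t old_len)
{
    assert(buf != NULL && new_len <= old_len);
    memset(buf + new_len, 0, old_len - new_len);
}
```

In the real program, the in-place and retire cases would be followed by an fupdate() against the record found through the KEY path, and both append cases by a write to the base cluster.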
getKey() opens the KEY path file with the extended format of type=record as setKey() does. vsamxlocate() is called to locate the key. If found, the corresponding value is printed. The KEY path file is then closed.
listEntriesByKey() opens the path file associated with the key passed in (either prod-id or key). vsamxlocate() is called to find the first record that matches the key, along with any filters specified. The code then loops through all records matching the key and filters, printing all the fields for each matching entry. The path file is then closed.
vsamxlocate() is the workhorse that finds the records that match the key and any filters specified. The code determines the fixed portion of the key to be located, which may be the prod-id or key key. flocate() is called. If the fixed key isn’t found, vsamxlocate() returns. If the fixed key is found, then the code reads through records that are a partial match. A partial match means the key matches, but one or more of the filters do not match. The search ends in one of three ways: a full match is found, the records stop matching the key (no match), or the end of file is reached (no match).
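The search strategy can be modeled in portable C with an array standing in for the records returned through a path. This is a simplified simulation, not the vsamxlocate() source. The struct entry, the single system filter, and the choice to skip inactive records during the scan are assumptions made to keep the example short. The first loop plays the role of flocate(), and the second loop plays the role of the sequential reads that follow it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a record seen through a path. */
struct entry {
    int         inactive;   /* 1 if the record was logically deleted  */
    const char *fixed_key;  /* fixed (indexed) part of the key        */
    const char *rest;       /* variable part of the key, "" if none   */
    const char *system;     /* one example filter value, "" if none   */
};

/*
 * Return the index of the first full match, or -1 if there is none.
 * Records are assumed to be ordered by fixed_key, as an index returns them.
 * Pass system == NULL to search without a filter.
 */
int locate(const struct entry *e, size_t n,
           const char *fixed, const char *rest, const char *system)
{
    size_t i = 0;

    assert((e != NULL || n == 0) && fixed != NULL && rest != NULL);

    /* Step 1: position on the first record with this fixed key. */
    while (i < n && strcmp(e[i].fixed_key, fixed) != 0) {
        i++;
    }

    /* Step 2: read forward while the fixed key still matches. */
    for (; i < n && strcmp(e[i].fixed_key, fixed) == 0; i++) {
        if (e[i].inactive) {
            continue;                       /* logically deleted record */
        }
        if (strcmp(e[i].rest, rest) != 0) {
            continue;                       /* long key differs         */
        }
        if (system != NULL && strcmp(e[i].system, system) != 0) {
            continue;                       /* partial match only       */
        }
        return (int)i;                      /* full match               */
    }
    return -1;                              /* key ran out, or end of data */
}
```

The sketch also shows why many records sharing one fixed key slow things down: every such record has to be read and compared before a match is confirmed or ruled out.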
My thanks to Anthony Giorgio, Terri Menendez, and Greg Keuken for their review of this article. I am very grateful for their valuable input.
Terri Menendez pointed out that I would get better results if the key I used was more random – i.e. that it was more likely to be unique and therefore I would have to search through fewer records looking for the long key. This makes a lot of sense, so I plan to generate some sort of hash string for each of the two keys to put in the 16 byte fixed area, and maybe change the length of the fixed area.
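One way the hashing idea might look is sketched below. No hash function has been chosen for Xsysvar, so the use of 64-bit FNV-1a here is purely an assumption, as are the function name and the decision to store the hash as 15 hexadecimal characters so that the last byte of the 16 byte fixed area stays 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FIXED_AREA 16   /* size of the fixed key area, last byte always 0 */

/*
 * Hash a key of any length into the fixed area as 15 hex characters.
 * Uses 64-bit FNV-1a, chosen here only because it is short and well known.
 */
void hash_key_to_fixed(const char *key, char fixed[FIXED_AREA])
{
    uint64_t h = UINT64_C(0xcbf29ce484222325);   /* FNV-1a offset basis */
    const unsigned char *p;
    char hex[17];

    assert(key != NULL && fixed != NULL);

    for (p = (const unsigned char *)key; *p != '\0'; p++) {
        h ^= *p;
        h *= UINT64_C(0x100000001b3);            /* FNV-1a 64-bit prime */
    }

    /* 16 hex digits are produced; only the first 15 fit in the fixed area. */
    snprintf(hex, sizeof hex, "%016llx", (unsigned long long)h);
    memcpy(fixed, hex, FIXED_AREA - 1);
    fixed[FIXED_AREA - 1] = '\0';
}
```

The hash works on raw bytes, so the same key text would hash differently in EBCDIC and ASCII. A hashed fixed area also no longer sorts in key order, and the full key would still need to be compared after a hit because two different keys can share a hash.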
This article shows a full end-to-end example of using VSAM to define and use a database on IBM z/OS using shell scripts, Z Open Automation Utilities, and C. Comments and suggestions for improvements are always welcome and appreciated.