Data Compression And Decompression System Patent Grant De Maine , et al. April 11, 1 [Research Corporation]

Patent Grant 3656178

U.S. patent number 3,656,178 [Application Number 04/857,707] was granted by the patent office on 1972-04-11 for data compression and decompression system. This patent grant is currently assigned to Research Corporation. Invention is credited to Paul A. D. De Maine, Gordon K. Springer.

United States Patent	3,656,178
De Maine, et al.	April 11, 1972

DATA COMPRESSION AND DECOMPRESSION SYSTEM

Abstract

A high speed, multistage, compressor-decompressor system for processing arbitrary bit strings by reversibly removing redundant information. Alphanumeric information is processed by Type 1 compression which involves removing patterns of contiguous bytes and replacing each removed pattern by decompression information which takes considerably less storage space, and Type 2 compression which involves removing individual redundant bytes and constructing a bit map identifying the location of the removed bytes. Numerical information is processed by a compression technique involving truncation, recursive differencing, sequence removal, packing, and then utilizing the Type 1 and Type 2 compression which are used in conjunction with alphanumeric information. The information which is to be compressed is arranged in strings of bytes and any information defining removal of redundant information from a string is kept together with the string. As a result, each string is self-defined in the sense that it contains all information needed to decompress that string.

Inventors:	De Maine; Paul A. D. (State College, PA), Springer; Gordon K. (State College, PA)
Assignee:	Research Corporation (New York, NY)
Family ID:	25326570
Appl. No.:	04/857,707
Filed:	September 15, 1969

Current U.S. Class:	341/87
Current CPC Class:	H03M 7/3066 (20130101)
Current International Class:	H03M 7/30 (20060101); G06f 007/06 ()
Field of Search:	340/172.5

References Cited [Referenced By]

U.S. Patent Documents


3237170	February 1966	Blasbalg et al.
3273130	September 1966	Baskin et al.
3289169	November 1966	Marosz
3310786	March 1967	Rinaldi et al.
3413611	November 1968	Pfeutze
3422403	January 1969	Webb
3490690	January 1970	Apple et al.
3535696	October 1970	Webb

Other References

P. A. D. de Maine, B. A. Marron, and K. Kloss, The Solid System II: Numeric Compression; The Solid System III: Alphanumeric Compression, Nat. Bureau of Standards Technical Note 413, Aug. 15, 1967.
R.W. Bemer, Data Compression System, IBM Tech. Disc. Bull. Vol. 3, No. 8, Jan. 1961.

Primary Examiner: Zache; Raulfe B.
Assistant Examiner: Chirlin; Sydney R.

Claims

We claim:

1. Method of utilizing a digital computer system including memory and control sections comprising the steps of:

a. storing in the memory a string comprising a set of multibit information units;

b. identifying and storing in the memory a LEXICON table comprising Type 1 codes defined as units which are of the same format as the stored string units and which do not occur in the stored string;

c. searching the stored string for the presence of a plurality of patterns of contiguous string units which patterns are repeated in the string, and identifying such repeated patterns;

d. replacing each of the patterns of a plurality of repeated patterns identified in the preceding step by a unique Type 1 code from the LEXICON table;

e. storing in the memory decompression information associated with the string and defining the replacement carried out in the preceding step; and

f. storing in the memory a PCORD table containing one pattern from each plurality of identical patterns which have been replaced by a Type 1 code.

2. Method as in claim 1 including computing and storing in the memory a savings ratio indicative of the saving in the length of the stored string achieved through replacing by a Type 1 code string pattern identical to patterns stored in the PCORD table to thereby associate a savings ratio with each pattern stored in the PCORD table.

3. Method as in claim 2 including: defining a maximum value for the number of patterns that can be stored in the PCORD table; testing before storing a pattern in the PCORD table if the PCORD table is full; if the PCORD table is full, testing if the savings ratio of the pattern to be stored in the PCORD table is more favorable than the least favorable savings ratio of the patterns already in the PCORD table; and, if the answer is yes, removing the pattern with the least favorable savings ratio from the PCORD table and storing therein the pattern with the more favorable savings ratio.
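The bounded PCORD table of claims 2 and 3 can be illustrated with a short Python sketch. The class name `PcordTable`, the method `offer`, and the convention that a larger savings ratio is the more favorable one are assumptions made for illustration; the claims do not fix how the ratio is computed.

```python
from dataclasses import dataclass, field


@dataclass
class PcordTable:
    """Bounded table of patterns, each tagged with a savings ratio (hypothetical layout)."""
    capacity: int
    entries: dict = field(default_factory=dict)  # pattern -> savings ratio

    def offer(self, pattern: bytes, savings_ratio: float) -> bool:
        """Try to store a pattern; return True if it ends up in the table."""
        # Room left (or pattern already present): store or update directly.
        if pattern in self.entries or len(self.entries) < self.capacity:
            self.entries[pattern] = savings_ratio
            return True
        if not self.entries:  # capacity of zero: nothing can be stored
            return False
        # Table is full: compare against the least favorable entry.
        # Assumption: a higher ratio means a more favorable saving.
        worst = min(self.entries, key=self.entries.get)
        if savings_ratio > self.entries[worst]:
            del self.entries[worst]
            self.entries[pattern] = savings_ratio
            return True
        return False
```

With a capacity of two, offering a third pattern displaces the stored pattern with the lowest ratio only when the newcomer's ratio is higher; otherwise the table is left unchanged.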

4. Method of utilizing a digital computer having memory and control sections to decompress a string compressed by the method of claim 1 comprising:

a. storing the compressed string and the decompression information in the memory;

b. identifying from the decompression information a Type 2 code and the pattern replaced by it;

c. locating in the compressed string Type 2 codes identical to the Type 2 code identified in the preceding step; and

d. replacing each Type 2 code located in the preceding step by the pattern identified in the identifying step.

5. Method of utilizing a digital computer system including memory and control sections comprising the steps of:

a. storing in the memory a string comprising a set of multibit information units;

b. identifying and storing in the memory a LEXICON table comprising Type 1 codes defined as units which are of the same format as the stored string units and which do not occur in the stored string;

c. storing in a PCORD table in the memory a list of PCORD patterns each composed of a plurality of units of the same format as the units of the stored string;

d. searching the stored string for a plurality of patterns each composed of contiguous units and each identical to a PCORD pattern;

e. identifying such patterns for replacement by Type 1 codes;

f. replacing each of the patterns of a plurality of repeated patterns identified in the preceding step by a unique Type 1 code from the LEXICON table; and

g. storing in the memory decompression information associated with the string and defining the replacement carried out in the preceding step.

6. Method as in claim 5 including defining prior to the searching step if the stored string is to be subjected to slow mode compression or to a fast mode compression, and -- if slow mode compression is defined -- searching in the searching step for a plurality of patterns each composed of contiguous units and identifying such patterns for replacement by Type 1 codes without reference to the PCORD patterns stored in the PCORD table, but -- if fast mode compression is defined -- searching in the searching step only for repeating patterns identical to PCORD patterns from the PCORD table.

7. Method of utilizing a digital computer system including memory and control sections comprising the steps of storing in the memory a string comprising a set of multibit information units; identifying and storing in the memory a LEXICON table comprising Type 1 codes defined as units which are of the same format as the stored string units and which do not occur in the stored string; searching the stored string for the presence of a plurality of patterns of contiguous string units which patterns are repeated in the string, and identifying such repeated patterns; replacing each of the patterns of a plurality of repeated patterns identified in the preceding step by a unique Type 1 code from the LEXICON table; storing in the memory decompression information associated with the string and defining the replacement carried out in the preceding step; including in said identifying and storing step the substep of identifying and storing in the LEXICON table Type 2 codes defined as units which are of the same format as the stored string units and which occur in the stored string at least a preselected number of times, and including the additional steps of: searching the stored string for string units identical to Type 2 codes stored in the LEXICON portion of the memory in the preceding step; constructing and storing in the memory a bit map identifying the locations in the stored string of units identical to a Type 2 code and occurring at least a preselected number of times in the string; removing from the string the units identified in the preceding step; and storing in the memory decompression information associated with the string and comprising the bit map and one of the removed string units.

8. Method of utilizing a digital computer system including memory and control sections comprising the steps of:

a. storing in the memory a string comprising a set of multibit information units;

b. identifying and storing in the memory a LEXICON table comprising Type 1 codes defined as units which are of the same format as the stored string units and which do not occur in the stored string;

c. searching the stored string for the presence of a plurality of patterns of contiguous string units which patterns are repeated in the string, and identifying such repeated patterns;

d. replacing each of the patterns of a plurality of repeated patterns identified in the preceding step by a unique Type 1 code from the LEXICON table; and

e. storing in the memory decompression information associated with the string and defining the replacement carried out in the preceding step;

f. storing in the memory a PCORD table containing preselected Type 2 codes each of the same format as the units of the stored string;

g. searching the stored string for units which are identical to a Type 2 code and which occur in the string at least a preselected number of times;

h. removing from the string units identified in the preceding step;

i. constructing a bit map identifying the locations of the removed units; and

j. storing in the memory decompression information associated with the string and comprising the bit map and one of the removed string units.

9. Method as in claim 8 including: defining prior to the step of searching the stored string for units identical to Type 2 codes if the stored string is to be subjected to fast mode compression or to slow mode compression; if slow mode compression is defined identifying and storing in said identifying and storing step in the LEXICON table Type 2 codes defined as identical in format with the units of the stored string and occurring in the string at least a preselected number of times, searching the stored string for string units identical to a Type 2 code from the LEXICON table of the memory and occurring at least a preselected number of times in the stored string, and then proceeding to the bit map constructing step without recourse to the table of Type 2 codes; but if fast mode compression is defined, then searching the stored string for units identical to Type 2 codes from the PCORD table of Type 2 codes without recourse to the LEXICON portion of the memory.

10. Method of utilizing a digital computer having memory and control sections to decompress a string compressed by the method of claim 8 comprising the steps of:

a. storing the compressed string and the decompression information associated with the string in the memory;

b. identifying from the decompression information a Type 1 code and the pattern replaced by it;

c. locating in the compressed string Type 1 codes identical to the Type 1 code identified in the preceding step;

d. replacing each Type 1 code located in the preceding step by the pattern identified in step b;

e. identifying from the decompression information a bit map and one of the string units removed in conjunction with constructing the bit map; and

f. storing the removed string unit in the places in the string identified by the bit map and expanding the string accordingly.

11. Method of utilizing a digital computer system including memory and control sections comprising the steps of:

a. storing in the memory a string comprising a set of multibit information units;

b. storing in a PCORD table in the memory at least one PCORD pattern composed of a plurality of units which are of the same format as the units of the stored string;

c. searching the stored string for the presence of a plurality of patterns each composed of contiguous string units and each identical to a PCORD pattern, and identifying such string patterns if any are found; and

d. replacing each of the string patterns identified in the preceding step by a Type 1 code defined as a unit which is of the same format as the string units but which does not occur in the stored string.

12. Method of utilizing a digital computer having memory and control sections to decompress a string compressed by the method of claim 11 comprising the steps of:

a. identifying a Type 1 code which has replaced a pattern and the pattern replaced by the code;

b. replacing each Type 1 code identical to the identified code and occurring in the string by the pattern identified in the preceding step; and

c. expanding the string by a number of units equal to the difference between the number of units in the replaced patterns and the number of Type 1 codes which have been replaced by patterns.

13. Method of utilizing a digital computer system including memory and control sections comprising the steps of:

a. storing in the memory a string comprising a set of multibit information units;

b. identifying and storing in a LEXICON table in the memory Type 2 codes defined as units which are of the same format as the stored string units and which occur in the stored string at least a preselected number of times;

c. searching the stored string for units identical to a Type 2 code and identifying the locations in the string of such identical units;

d. constructing and storing in the memory a bit map identifying the locations in the stored string of units identified in the preceding step;

e. removing from the string the identified units; and

f. storing in the memory decompression information associated with the string and comprising the bit map and one of the removed string units.

14. Method of utilizing a digital computer having memory and control sections to decompress a string compressed by the method of claim 13 comprising the steps of:

a. storing the compressed string and the decompression information in the memory;

b. identifying from the decompression information a bit map and a string unit removed in conjunction with constructing the bit map; and

c. inserting the string unit identified in the preceding step in the locations in the string identified by the bit map and expanding the string accordingly.

15. Method of utilizing a computer system including memory and control sections comprising the steps of:

a. storing in the memory a string composed of a number of multibit information units;

b. storing in a PCORD table in the memory Type 2 codes defined as units which are of the same format as the stored string units;

c. searching the stored string for units which are identical to a Type 2 code from the PCORD table and which occur at least a pre-selected number of times, and identifying such units if any are found;

d. constructing a bit map identifying the locations in the stored string of units identified in the preceding step; and

e. removing from the string the identified units and storing in the memory together with the string the bit map and one of the removed string units.

16. Method of utilizing a digital computer having byte-oriented memory and control sections to compress information supplied in the form of a string comprising a set of bytes of information bits, comprising the steps of:

a. using the value of each byte of the string to address a 256 byte table in which each byte address corresponds to a unique one of the 256 possible bit configurations of a byte and each byte address contains a count of the number of times the byte address has been addressed;

b. storing an indication of the address of each byte address of the 256 byte table that has not been addressed in the course of step (a) in a LEXICON table to compile thereby a set of Type 1 codes which are bytes that do not occur in the string;

c. detecting the occurrence in the string of a group of non-overlapping patterns, if any, of R contiguous bytes (R is an integer greater than 1) which patterns are identical with each other;

d. replacing each of the patterns detected in the course of step c. with an identical Type 1 code selected from available Type 1 codes in the LEXICON table and compressing the string to eliminate the space vacated because of the difference in length between each such replaced pattern of R bytes and the one byte Type 1 code replacing it;

e. associating with the string decompression information comprising the Type 1 code used in the course of step d and one of the replaced patterns of R bytes;

f. changing the value of R and repeating steps c through e for as long as both (i) the combined length of the Type 1 codes used as pattern replacements and the decompression information is less than the combined length of the replaced patterns and (ii) previously unused Type 1 codes are available in the LEXICON table; and

g. storing a pattern from each group of patterns of R bytes which has been deleted from the string in a PCORD table.

17. Method as in claim 16 including storing a pattern from each group of patterns of R bytes which has been deleted from the string in a PCORD TABLE; associating with each pattern in the PCORD table a savings ratio indicative of the degree of compression of the string resulting from the deletion of said pattern from the string.

18. Method as in claim 17 including: limiting the capacity of the PCORD table; checking whether the PCORD table is full when an attempt is made to include therein a new pattern; and, in case the PCORD table is full, storing the new pattern in the PCORD table and deleting from the PCORD table the pattern having the lowest associated savings ratio, but only if said lowest savings ratio is lower than the savings ratio of the new pattern.

19. Method of utilizing a digital computer having memory and control sections to decompress a string compressed by the method of claim 16 comprising the steps of:

a. storing the compressed string and the decompression information in the memory;

b. identifying from the decompression information a Type 1 code and the pattern of R bytes replaced by it;

c. locating in the compressed string Type 1 codes identical to the Type 1 code identified in the preceding step;

d. replacing each Type 1 code located in the preceding step by the pattern of R contiguous bytes identified in step b; and

e. expanding the string by a number of bytes equal to the difference between the number of bytes of the patterns replacing the Type 1 codes and the number of replaced Type 1 codes.

20. Method of utilizing a digital computer having memory and control sections to compress numeric information supplied in the form of a first string comprising a set of words A, B, C, D, E, . . . , each word containing a number, comprising the steps of:

a. recursively differencing by

i. adding the absolute value of the contents of the words of the first string to obtain a first sum;

ii. generating from the first string a second string having the same number of words, each word except the first having a value equal to the difference between the correspondingly located word of the first string and the next preceding word of the first string, whereby the second string comprises words A, A-B, B-C, C-D, D-E, . . . , and adding the absolute values of the words of the second string to obtain a second sum;

iii. comparing the magnitudes of the first and the second sums and, continuing to step (b) if the first sum is less than the second sum, but continuing to substep (iv) if the magnitude of the first sum is greater than or equal to the magnitude of the second sum;

iv. generating from the second string generated in substep (ii) a third string in the same manner as in substep (ii), whereby the third string comprises words A, A-(A-B), (A-B)-(B-C), (B-C)-(C-D), (C-D)-(D-E), ... , and repeating substeps i and ii by considering each newly generated string as a second string and considering the previously generated string as a first string;

b. detecting sequences of identical words;

c. determining if the replacement of such sequences by defined decompression information would result in saving in string length and proceeding to step (d) if saving is indicated but proceeding to step (e) if no saving is indicated;

d. deleting each detected sequence of identical words and associating with the string decompression information comprising the value of the deleted word, the number of deleted words and the address in the string prior to deletion at which the deleted sequence started, and compressing the string to take up the space vacated by the deleted words; and

e. packing the words of the string into double words, each double word having a set of bits designating the total number of and a set of bits storing values of words packed in the double word, each word occupying a fixed number of bit positions, said fixed number determined by the number of significant bits of the highest value word packed in the double word.
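The recursive differencing of step (a) can be sketched in Python as follows. The function names and the `max_order` safety cap are assumptions: the claim itself gives no limit, yet without one a string whose sums never grow (all zeros, for example) would be differenced indefinitely. The sign convention follows the words as listed in substep (ii): A, A-B, B-C, and so on.

```python
def difference_once(words):
    """Keep the first word; replace each later word by (previous - current)."""
    return words[:1] + [words[i - 1] - words[i] for i in range(1, len(words))]


def undo_difference_once(diffs):
    """Invert difference_once by running the subtraction along the string."""
    words = diffs[:1]
    for d in diffs[1:]:
        words.append(words[-1] - d)
    return words


def recursive_difference(words, max_order=8):
    """Difference repeatedly while the sum of absolute values does not grow.

    Returns (order, string), where order counts the differencing passes kept.
    max_order is an assumed safety cap, not something the claim specifies.
    """
    current, order = list(words), 0
    while order < max_order:
        candidate = difference_once(current)
        first_sum = sum(abs(w) for w in current)
        second_sum = sum(abs(w) for w in candidate)
        if first_sum < second_sum:  # differencing made things worse: stop
            break
        current, order = candidate, order + 1
    return order, current


def undo_recursive_difference(order, diffs):
    """Reverse the recorded number of differencing passes."""
    words = list(diffs)
    for _ in range(order):
        words = undo_difference_once(words)
    return words
```

For the string 100, 102, 104, 106, 108 a single pass is kept, giving 100, -2, -2, -2, -2; a second pass would raise the sum of absolute values from 108 to 202 and is therefore rejected. The retained order is the only extra information needed to restore the original words.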

21. Method as in claim 20 including converting floating point numbers contained in the words A, B, C, D, E, . . . , into integer numbers by a logical right shift truncation process.

22. Method as in claim 21 including: placing each integer number in the string into a four byte word; searching the string to find a minimum and maximum value of the words; storing said maximum and minimum values; determining the median value by dividing the sum of the minimum and maximum values by 2; recording said median value; subtracting the recorded median value from each word in the string; associating with the string decompression information defining the above steps; and storing the decompression information.
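As a small illustration of claim 22, the Python sketch below subtracts the midpoint of the minimum and maximum values (which the claim calls the median) from every word. The function names and the use of integer division are assumptions; the claim only states that the sum is divided by 2.

```python
def center_on_midpoint(words):
    """Subtract (min + max) // 2 from every word; return (midpoint, centered words)."""
    lo, hi = min(words), max(words)
    mid = (lo + hi) // 2  # assumed: integer division keeps the words integral
    return mid, [w - mid for w in words]


def restore_from_midpoint(mid, centered):
    """Add the recorded midpoint back to every word."""
    return [w + mid for w in centered]
```

The words 1000, 1010, 1004 become -5, 5, -1 with a recorded midpoint of 1005, so each word needs far fewer significant bits than before.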

23. Method as in claim 21 including: providing a value LSX indicating the degree of accuracy at which the information stored in the string is to be maintained; dividing LSX by 2 and storing the result; finding the minimum value of the words in the string; subtracting from each word of the string the minimum value and dividing the difference by the value LSX over 2; and rounding by removing all digits to the right of the decimal point in the words leaving only the digits to the left of the decimal point.

24. Method as in claim 21 including: locating in the string composed of integer numbers each contained in a word any patterns of words which patterns are composed of a plurality of contiguous words identical to each other; replacing each such pattern by two new words one of which is a count of the number of the words repeated in the pattern and the other one of which is a copy of the word which is repeated in the pattern, and associating with the string a third new word indicating the location in the string of the pattern which is replaced by said two new words.
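The run replacement of claim 24 might look like the following Python sketch. Two points are assumptions rather than statements of the claim: runs shorter than `min_run` are left alone because replacing them would save nothing, and each recorded location is the index of the count word in the compressed string.

```python
def remove_runs(words, min_run=4):
    """Replace each run of at least min_run identical words by (count, value).

    Returns (compressed, locations); locations holds, for every replaced run,
    the index of its count word in the compressed string (assumed convention).
    """
    compressed, locations = [], []
    i = 0
    while i < len(words):
        j = i
        while j < len(words) and words[j] == words[i]:
            j += 1
        run = j - i
        if run >= min_run:
            locations.append(len(compressed))
            compressed += [run, words[i]]
        else:
            compressed += words[i:j]
        i = j
    return compressed, locations


def restore_runs(compressed, locations):
    """Expand every (count, value) pair recorded in locations."""
    marked = set(locations)
    words, i = [], 0
    while i < len(compressed):
        if i in marked:
            count, value = compressed[i], compressed[i + 1]
            words += [value] * count
            i += 2
        else:
            words.append(compressed[i])
            i += 1
    return words
```

Keeping the locations outside the string means that ordinary words that happen to look like a count are never misread during decompression.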

25. Method of utilizing a digital computer having memory and control sections to decompress strings compressed by the method of claim 20 comprising the steps of:

a. storing in the memory the packed string and its decompression information;

b. unpacking the packed string double words by reference to the set of bits designating the total number of words packed in the double words and to the set of bits storing the values of words to generate unpacked words;

c. if sequences of words were deleted in the course of compressing the string, utilizing the decompression information to replace in the string the deleted words; and

d. if second or subsequent strings were generated during compression, carrying out the reverse of the second and subsequent string generating step to regenerate the original string of words A, B, C, D, E, . . . .

Description

A program for a general purpose digital computer for storing, retrieving or updating and purging a collection of items whose individual members have been assigned descriptor sets. The program comprises first translating the assigned descriptor sets into a special digital linear informational representation form. Within core storage of the digital computer a first index column array (MARRAY) is provided. MARRAY consists of subarrays each having EXECUTIVE POINTERS. The EXECUTIVE POINTERS in the index array contain only an address of length &ADDLY bytes. The EXECUTIVE POINTERS are arranged in the MARRAY subarray seriatim in accordance with the associated M value and the addresses contain the beginning address of a JARRAY subarray.

A plurality of subarray JARRAYS are provided in core storage which JARRAYS each contain EXECUTIVE POINTERS including the next descriptor in the JOB-LIST ITEM and the address of the next subarray to be checked. The EXECUTIVE POINTERS in the column JARRAY subarrays are arranged seriatim in accordance with the value of the JARRAY EXECUTIVE POINTERS descriptors.

The address portion of the JARRAY EXECUTIVE POINTER points to a subarray in a memory block resident in core. Said memory block subarrays are similar to said JARRAY subarray.

The final column arrays (RFILE) in the memory block have addresses which point to the address in the bulk storage of the digital computer where the collection of items may be stored, retrieved, updated or purged.

At the end of each subarray in the MARRAY, JARRAY and RFILE subarrays, a link address is provided where the search may be continued when a particular column subarray is filled. Core storage also maintains composite addresses, namely: EMPTY, which gives the exact location in core where the next newly created memory block is to be stored, with the fast memory address (FMADD of EMPTY) containing the beginning address of a transient portion of the core storage array or the beginning of the memory block; CURRENT, the address in core storage where the memory block normally resides, with the fast memory address portion of CURRENT (FMADD of CURRENT) being the relative address of the first byte of the resident memory block that is not part of an existing subarray; and ADDRESS, which is an address extracted from a subarray during a search and which may be a link address pointing to a continuance of the subarray, an address extracted from an EXECUTIVE POINTER, or zero if the subpath is missing. Additionally, there are provided in storage various indicator elements, one of which, called MSIGNAL, gives a continuously updated indication of the status of the current search.

After initializing the system, the program searches each descriptor in the JOB-LIST ITEM starting in the MARRAY and continuing in the JARRAY through the memory block until, in retrieval, one finds in RFILE an address or addresses in bulk storage where the information to be retrieved can be found or, in storage, an address where information can be stored. Provision is made for automatically updating storage, for eliminating certain descriptors by using overrides, and for transferring memory blocks back and forth into core with a minimum time loss.

The information in bulk storage is in compressed form and is decompressed only after having been retrieved or prior to storage. The compressor and decompressor take two forms; one is an alphanumeric compressor and decompressor (SANPAK) and the other is a numeric compressor and decompressor (SNUPAK).

The compressor and decompressor can be either in the form of a program for a general purpose digital computer or may be a hard wired special computer.

The alphanumeric compressor operates to compress a string of digital signals by first scanning a segment of the string on a byte by byte basis. A table (LEXICON) having 256 byte positions is provided in the core storage of the digital computer. Each byte position corresponds to one of the 256 bit configurations possible in a single byte of information. In each byte position of the table is stored the number of times the corresponding bit configuration appears in the segment. Those bit configurations that do not appear in the segment are set aside as possible Type 1 code bytes. If there are Type 1 code bytes, the segment is scanned in multiple byte groups, byte by byte, to determine whether there are any common groupings of bytes for which a Type 1 code byte may be substituted. If a common multi-byte segment is found, a Type 1 code byte is substituted for each occurrence in the string and the string is closed up. At the head of the string are placed the common multi-byte segment for which substitution has been effected, the Type 1 code byte substituted, and the number of times that the Type 1 code byte was substituted. This information is used upon decompression of the string. The common multi-byte segments (PCORDS) may also be kept in a special table called the PCORD TABLE. Where certain PCORDS are expected to be found in given information, it may not be necessary to scan the string byte by byte with different common multi-byte segments; instead, the string may be scanned with the PCORDS from the PCORD TABLE so as to speed up the compression process. The PCORD TABLE is continuously updated with the amount of saving achieved with particular PCORDS so that only those PCORDS which achieve a saving are used in successive compression steps.
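A minimal Python sketch of one Type 1 pass is given below. It is an illustration under stated assumptions, not the SANPAK implementation: the function names, the fixed `pattern_len`, the choice of the single most common pattern, and the simplified header (pattern plus code, without the substitution count mentioned above) are all hypothetical.

```python
from collections import Counter


def build_lexicon(segment: bytes):
    """Return (counts, type1_codes): a 256-entry count table and the byte
    values that never occur in the segment."""
    counts = [0] * 256
    for b in segment:
        counts[b] += 1
    type1_codes = [value for value, n in enumerate(counts) if n == 0]
    return counts, type1_codes


def type1_compress(segment: bytes, pattern_len: int = 4):
    """Substitute one unused byte for the most common pattern of pattern_len bytes.

    Returns (code, pattern, body), or None when no code is free or nothing is saved.
    """
    _, type1_codes = build_lexicon(segment)
    if not type1_codes or len(segment) < pattern_len:
        return None
    windows = Counter(segment[i:i + pattern_len]
                      for i in range(len(segment) - pattern_len + 1))
    pattern, _ = windows.most_common(1)[0]
    occurrences = segment.count(pattern)  # counts non-overlapping occurrences
    # Each substitution saves pattern_len - 1 bytes; the assumed header costs
    # pattern_len + 1 bytes (the pattern plus the code byte).
    if occurrences * (pattern_len - 1) <= pattern_len + 1:
        return None
    code = type1_codes[0]
    body = segment.replace(pattern, bytes([code]))
    return code, pattern, body


def type1_decompress(code: int, pattern: bytes, body: bytes) -> bytes:
    """Put the pattern back wherever the code byte appears."""
    return body.replace(bytes([code]), pattern)
```

Because the code byte never occurs in the original segment, every occurrence of it in the compressed body marks a substitution, which is what makes the replacement reversible without any escape mechanism.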

Additionally, where this type of compression has been utilized to its fullest, or where it cannot be used because there are no Type 1 codes available, Type 2 compression is effected. In this type of compression, where a particular byte appears more than 34 times in the first 256 bytes in the string, these common bytes are removed from the string and a bit map is placed at the head of the string to show where the bytes have been removed.
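Type 2 compression can be sketched in Python along the same lines. The function names are hypothetical, the caller is assumed to pass in the segment to be examined (the first 256 bytes in the description above), and the bit map is kept as a list of booleans for readability; a stored form would presumably pack it into bits.

```python
def type2_compress(segment: bytes, threshold: int = 34):
    """Remove the most frequent byte if it occurs more than threshold times.

    Returns (removed_byte, bit_map, remainder) or None. bit_map holds one
    boolean per original position; True marks a position that was removed.
    """
    if not segment:
        return None
    counts = [0] * 256
    for b in segment:
        counts[b] += 1
    removed = max(range(256), key=counts.__getitem__)
    if counts[removed] <= threshold:
        return None
    bit_map = [b == removed for b in segment]
    remainder = bytes(b for b in segment if b != removed)
    return removed, bit_map, remainder


def type2_decompress(removed: int, bit_map, remainder: bytes) -> bytes:
    """Reinsert the removed byte at every position flagged in the bit map."""
    rest = iter(remainder)
    return bytes(removed if bit else next(rest) for bit in bit_map)
```

The threshold of 34 is consistent with simple bookkeeping: a bit map covering 256 positions occupies 32 bytes, so a byte has to be removed more often than the map and its accompanying information cost before anything is saved.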

The numeric compressor compresses digital numeric strings by converting the strings of numeric information into integers, eliminating floating point exponents, differencing successive integers seriatim, placing at the head of the string a number indicating the number of differencing procedures, condensing identical sequences in the string and placing information at the head of the string showing the place where the identical sequences have been condensed, and packing all of the substring integers into double words in an optimal fashion.
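The final packing step can be illustrated with the Python sketch below. The bit layout is an assumption made for the example: a 64-bit double word whose top 8 bits hold the number of packed words and whose remaining 56 bits are divided evenly among them. The number of words per double word is chosen from the significant bits of the largest word in the group, and the field width then follows from the count. Only non-negative integers are handled; signed values would need an extra convention that is not sketched here.

```python
PAYLOAD_BITS = 56  # assumed: bits of a 64-bit double word available for values
COUNT_SHIFT = 56   # assumed: the top 8 bits hold the number of packed words


def pack_double_words(words):
    """Greedily pack non-negative integers into 64-bit double words."""
    packed, i = [], 0
    while i < len(words):
        group = []
        while i + len(group) < len(words):
            trial = group + [words[i + len(group)]]
            if min(trial) < 0:
                raise ValueError("this sketch handles non-negative words only")
            needed = max(1, max(w.bit_length() for w in trial))
            if needed * len(trial) > PAYLOAD_BITS:
                break
            group = trial
        if not group:
            raise ValueError("word too wide for one double word")
        width = PAYLOAD_BITS // len(group)  # fixed field width for this group
        payload = 0
        for w in group:
            payload = (payload << width) | w
        packed.append((len(group) << COUNT_SHIFT) | payload)
        i += len(group)
    return packed


def unpack_double_words(packed):
    """Recover the words using the count stored in each double word."""
    words = []
    for dword in packed:
        count = dword >> COUNT_SHIFT
        width = PAYLOAD_BITS // count
        payload = dword & ((1 << PAYLOAD_BITS) - 1)
        for k in range(count - 1, -1, -1):
            words.append((payload >> (k * width)) & ((1 << width) - 1))
    return words
```

Since the field width is derived from the stored count, no separate width field is needed and each double word can be unpacked on its own.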

The search procedure and the compression techniques are all integrated into a single storage, retrieval, updating and purging system which has been called the SOLID SYSTEM.

BACKGROUND OF THE INVENTION

The invention is in the field of data storage and retrieval systems and particularly in the field of large scale systems of this nature. It also relates to data compression and decompression systems.

In designing a large computer oriented data storage and retrieval system it is desirable that the final product meet the following design and performance specifications:

i. Storage, retrieval, updating and purging tasks must be accomplished as fast as possible.

ii. The system must be independent of the information base.

iii. Components should be capable of being coded independently of all the other components in the system.

iv. Programming a particular modification of a fully implemented scheme or combining equivalent hardware components to meet the varied needs of users, should be as simple as possible.

v. The system should be open ended to provide for future modifications.

vi. The coded system should be as free of machine dependence as possible to provide for easier translation to other computer configurations.

Because size and scope would prohibit writing such a system as a single program or creating a one piece hardware embodiment, the system is organized on a component by component basis. Each component should perform a single task in the overall scheme. For example, one component can handle card input; another the output; and so on for each separate task the system will perform.

To simplify recoding of a large system for another computer, it is essential that a higher language be used or developed. The present higher languages (e.g., FORTRAL, ALGOL, COBOL, SNOBOL, COMIT, etc.) are not suitable for coding large retrieval or indexing systems because they do not have the bit and byte manipulation capabilities that are essential for efficient machine coding. A large system should, therefore, have associated with it an open ended higher language, such as ALLOCATE, which can grow as the system is implemented. Thus, a fully implemented System can be coded in a machine independent higher language that can provide the basis for a retrieval language. The macro language provided in the IBM System 360 can provide a starting point for the language ALLOCATE.

Each component can be coded in the macro language. The central concept is thus one of extensively nested macros incorporated into the assembly language processor of the computer. In this way the normal operations of the assembly language are extended with macro instructions that perform the special operations needed. In the IBM 360 System the assembly language is called BAL. A programmer can add, delete, or ignore certain components of the system to suit specific needs. This design allows for the unrestricted growth of the system and the retrieval language (ALLOCATE) by adding new components to the system macro library.

Translation of a System defined in this manner to other computer configurations can be greatly simplified by the use of the component type system. For example, it is possible to translate directly to FORTRAN IV by a suitable translator. The translation can be performed component by component rather than by trying to rewrite an entire system. Moreover, since the components are independent of one another, only those components needed for the particular application of SOLID need be translated. Programming around deleted components becomes unnecessary.

In regard to the data compression part of the invention, the need for effective compressors is obvious because it is always desirable to reduce the number of information indicia required to represent information of given content without affecting the information content. Special recoding techniques that save storage or transmission time, such as the "SQUOZE encoding" developed for the Share 709 System, and the PREST scheme for the IBM 7094, have been developed in the past and are in use, but both are tied inseparably to the particular data base or to particular hardware. It is desirable therefore to have a data compressing system which is completely independent of the data base so as to have unrestricted general purpose use and which in addition meets the following design objectives:

i. By compression, increase the amount of information that can be stored in mass storage or on magnetic tapes and other peripheral devices.

ii. Increase the rate of transmission either from "slow" to "fast" memory or between receiver/transmitter stations by transferring compressed information.

iii. Automatically decompress the compressed information when it is needed either by a computer (in fast-memory) or by users of the system.

iv. Provide error checking procedures which ensure that errors in the transmitted compressed information are found either before or during decompression.

v. Increase the efficiency of the computer system by decreasing the time required for search and/or fetch operations.

SUMMARY OF THE INVENTION

The invention is in a data management system for manipulating large amounts of information. In a particular form, the invention resides in a data storage and retrieval system which, once it has associated a set of descriptors to an item of information of any size, is independent of the actual data base of that item of information. That item of information can then be stored in bulk memory, in compressed form, and the set of descriptors associated with it can be stored, retrieved, updated or purged within a fixed time which is independent of: the number of sets of descriptors maintained by the invented system, the actual size of a particular set of descriptors, or the type of search, retrieval, updating or purging operation carried out. The advantage of this fixed time for manipulating descriptors derives from the fact that they are manipulated independently of the items of information in bulk storage, and that an efficient novel manner of manipulating sets of descriptors has been provided.

In particular, a very small portion of each set of descriptors is kept in the fast memory of a computer incorporated into the invented system such that only a small part of that fast memory is occupied even though the total storage space required for all sets of descriptors may exceed the fast memory capacity many times. The remaining portions of all sets of descriptors are organized in memory blocks of which one is at all times in fast memory but all others are kept in virtual memory. The system ensures, through the use of "continuance tables" which come into use before a search associated with a particular set of descriptors goes into a memory block, that the search will be completed within a single memory block. Thus, for any number of memory blocks needed for a particular great number of sets of descriptors, a search should involve a transfer of only one block from virtual memory to fast memory. The size of the bulk storage space occupied by the information with which the descriptor sets are associated thus also has no effect on the search speed.

Storage space in fast and in virtual memory is utilized efficiently in that the need for reserving specific blocks of memory for a particular use has been avoided. The invented system utilizes arrays and subarrays which have no fixed location and which can vary in size as needed in a manner not requiring the intervention of a user of the system, but controlled in an optimum manner by a system control package. Further, any available spaces within a memory block which have been vacated by purged sets of descriptors are used for creating new descriptor paths before attempting to locate previously unused memory space. The control package for overseeing the use of these vacated spaces operates by linking each vacated space to another such space to create a continuous chain, such that only the beginning of it need be kept track of. Once it is determined that a new descriptor set is to be stored in a particular memory block, the control package need only locate the start of this chain of vacated spaces and then insert appropriate descriptor information in the first available locations along the link which can take that information. The newly occupied spaces are then deleted from the link but the rest of the link, if any, closes again around the deleted spaces.
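The chaining of vacated spaces described above can be illustrated with a small sketch. It is a simplification under stated assumptions: every space is treated as a uniform slot (the text allows spaces of varying size), and the class and attribute names are hypothetical.

```python
class MemoryBlock:
    """Toy memory block whose vacated slots are linked into a single chain."""

    END = -1  # marks the end of the chain of vacated slots

    def __init__(self, size):
        self.slots = [None] * size
        self.next_vacated = [self.END] * size  # link held for each vacated slot
        self.chain_start = self.END            # only the start of the chain is tracked
        self.first_unused = 0                  # first slot that has never been used

    def store(self, item):
        """Place an item in a vacated slot if one exists, else in unused space."""
        if self.chain_start != self.END:
            index = self.chain_start
            # Remove the slot from the chain; the rest of the chain closes around it.
            self.chain_start = self.next_vacated[index]
            self.next_vacated[index] = self.END
        elif self.first_unused < len(self.slots):
            index = self.first_unused
            self.first_unused += 1
        else:
            raise MemoryError("memory block is full")
        self.slots[index] = item
        return index

    def purge(self, index):
        """Vacate a slot and link it at the start of the chain."""
        self.slots[index] = None
        self.next_vacated[index] = self.chain_start
        self.chain_start = index


block = MemoryBlock(4)
for item in ("x", "y", "z"):
    block.store(item)
block.purge(1)
print(block.store("w"))  # reuses the vacated slot before touching unused space
```

Tracking only the start of the chain keeps the bookkeeping constant in size no matter how many spaces have been vacated.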

Storage space and retrieval time for the bulk storage information are optimized because, before entering bulk storage, the bulk information may be compressed into self-defining strings such that each string has associated with it all information needed for decompressing it. When compressed information is taken out of bulk storage, it can be decompressed without referring to any additional information associated with that particular string but stored elsewhere.

While the compressor-decompressor portion of the invention is of great importance to the efficiency of the data management system referred to above, it is also of great utility in any situation where data compression-decompression may be desirable, such as in communication between various combinations of peripheral devices, computer systems and subsystems and communication networks.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A, 1B and 1C illustrate a flow diagram of a macro SANPAKC used in alphanumeric compression.

FIG. 2 is a flow diagram of a macro SANPAKD used in alphanumeric decompression.

FIG. 3 is a generalized flow diagram of a macro SNUPAKC used in numeric compression.

FIG. 4 is a flow diagram of step a of the macro SNUPAKC of FIG. 3.

FIG. 5 is a flow diagram of step b of the macro SNUPAKC of FIG. 3.

FIGS. 6A, 6B and 6C illustrate an expanded flow diagram of the macro SNUPAKC of FIG. 3.

FIGS. 7A, 7B and 7C illustrate a flow diagram of a macro SNUPAKD used in numeric decompression.

FIG. 8 shows the information at the head of a string prior to its being supplied to the decompressor SNUPAKD of FIG. 7.

FIG. 9 is a table showing input format code and its meaning with respect to the type of compression in the string.

FIG. 10 is a flow diagram of Part A of a macro SMEMORY which is in a GLOBAL MEMORY component of the invented system.

FIG. 11 is a simplified flow diagram of a macro COPAK for use in compression-decompression.

FIG. 12 is a diagrammatic showing of a typical array in storage.

FIG. 13 is a diagrammatic showing of the manner in which a JOB-LIST Item is searched in an AUXILIARY FILE.

FIG. 14 is a diagram illustrating the hierarchical arrangement in the design selected for the invented system.

FIG. 15 is a diagrammatic showing of CONTROL routines utilized in the invented system.

FIG. 16 is a flow diagram showing the status of information called MSIGNAL and used in the retrieval package of the invented system.

FIG. 17 is a flow diagram showing a macro AUXFILE, used in search procedure of a file called RFILE.

FIGS. 18A and 18B are a flow diagram of a macro called SCREEN and used in search procedure.

FIGS. 19A and 19B are a flow diagram of a macro SUPERSCH used in search procedure.

FIGS. 20A and 20B are a flow diagram of a macro called INSERT.

FIG. 21 is a flow diagram of a macro called MMATCH.

FIG. 22 is a flow diagram of a macro called STRATEGY.

FIG. 23 is a flow diagram of a macro called TBADD.

FIGS. 24A and 24B are a flow diagram of a macro called CREATE.

FIG. 25 is a flow diagram for a CONTROL package called SOLIDE.

FIG. 26 is a flow diagram for a CONTROL package called COPAKCO.

FIG. 27 is a flow diagram for a CONTROL package called COPAKAN.

FIG. 28 is a flow diagram for a CONTROL package called COPAKNU.

Preface

In order to facilitate initial orientation into this sophisticated and multifaceted invention, the detailed description begins with a brief exploration of the mathematical basis of associating sets of descriptors with items of information for the purpose of creating a versatile and flexible data management system. Under the heading following that, an illustrative example is given of the invention as a data storage and retrieval system using a particular normalized form of these descriptors to access information items stored in large scale bulk storage. Once the cooperation between various portions of the invented system is indicated by means of the illustrative example, these portions are explained in great detail and particularity, with emphasis on their interrelation. The detailed description concludes with a portion devoted to a data compressing and decompressing system used in conjunction with the invented data management system. The data management system is referred to below as SOLID, and the data compressing and decompressing system is referred to as COPAK.

A. Mathematical Basis of the SOLID System

Suppose that a particular document has associated with it the following nine descriptors or designators:

A, B, C, D, E, F, G, H, and I.

To simplify matters, we shall suppose that there are no Type 2 over-rides. Perhaps, at this time, a simple description of the need for Type 2 over-rides should be presented. There are four codes for descriptors or designators which are reserved. These codes are as follows:

0, 1, 2 and 3.

The last three are called Type 1, Type 2, and Type 3 over-rides respectively. The 0 code for a designator indicates that there is no information at that point in the information representation.

In the retrieval mode, the Type 1 override code indicates that any non-zero designator at that particular point in the information representation is to be accepted for retrieval purposes. In the storage mode, that is when the information representation is being stored, the Type 1 over-ride code is used to create a new class with a 1 as a designator. It is thus possible to create a new non-specific class by substituting a 1 for one or more of the designators in an information representation.

The Type 2 over-rides are substantially similar to the Type 1 over-rides in the retrieval mode except that the Type 2 over-ride indicates that any zero or non-zero designator in the Type 2 over-ride position is acceptable.
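The retrieval-mode behaviour of the reserved codes can be sketched as a small matching routine. This is a hedged illustration: the function names are hypothetical, designators are modelled as plain integers, and the Type 3 over-ride is not modelled because its behaviour is not described in this passage.

```python
NO_INFORMATION = 0  # no information at this position
TYPE1 = 1           # accepts any non-zero designator
TYPE2 = 2           # accepts any zero or non-zero designator


def designator_matches(query, stored):
    """Retrieval-mode test of one query designator against a stored one."""
    if query == TYPE1:
        return stored != NO_INFORMATION
    if query == TYPE2:
        return True
    return query == stored  # ordinary designators must agree exactly


def representation_matches(query, stored):
    """Position-by-position test of two equally long designator sequences."""
    return len(query) == len(stored) and all(
        designator_matches(q, s) for q, s in zip(query, stored)
    )


print(representation_matches([7, TYPE1, TYPE2], [7, 9, 0]))
print(representation_matches([7, TYPE1, TYPE2], [7, 0, 0]))
```

The second call fails only because the Type 1 position meets a zero designator, which is the one difference between the two over-rides in retrieval mode.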

When utilized in the storage mode, a Type 2 override code for a designator in an information representation means that the existing designator must be replaced by another designator which can be found in a separate table in storage (table AOVER2R).

The translation routine, used for an entire collection of documents, arranges the designators A, B, C, D, E, F, G, H, and I in the form of a label as follows: ##SPC2##

Here the linear form is simply a shorthand way of writing the square array. / and // are inserted for clarity. LABEL is called an information representation (IR) of the particular designator-set. Its elements are divided into the following three categories or levels of disclosure.

Kernels: These lie on the principal diagonal (i.e., A, B, and C).

CI connectors: These lie above the principal diagonal (i.e., D, E, and F).

CII connectors: These lie below the principal diagonal (i.e., G, H, and I).
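The relation between the square array and its linear shorthand can be sketched as follows. The 3 by 3 layout used here is an assumption: it is chosen to agree with the three levels of disclosure just listed and with bit-maps (3) and (4), since the drawing of the array itself is not reproduced in this text. The function names are hypothetical.

```python
def level_of_disclosure(row, col):
    """Classify a position of the square array."""
    if row == col:
        return "kernel"
    return "CI connector" if col > row else "CII connector"


def linear_form(square):
    """Read the array diagonal by diagonal: principal diagonal, then the
    diagonals above it, then '//' and the diagonals below it."""
    n = len(square)

    def diagonal(offset):
        return "".join(
            square[r][r + offset] for r in range(n) if 0 <= r + offset < n
        )

    upper = [diagonal(k) for k in range(0, n)]
    lower = [diagonal(-k) for k in range(1, n)]
    return "/".join(upper) + "//" + "/".join(lower)


# Assumed layout: kernels A, B, C on the principal diagonal.
label = [
    ["A", "D", "F"],
    ["G", "B", "E"],
    ["I", "H", "C"],
]
print(linear_form(label))
print(level_of_disclosure(0, 2), "/", level_of_disclosure(2, 1))
```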

In an information representation, IR, the designators cannot be reclassified by any transformation. This means that IR's can be manipulated by any transformation rules that leave the designators in their assigned levels of disclosure. The mathematics of the transformation rules and normalization are shown and fully discussed in Appendices I and II of the book "Information Retrieval: a Critical View" in the article by P.A.D. deMaine and B. A. Marron, "The SOLID System I. A Method for Organizing and Searching Files." The book was edited by Schecter and was published by the Thompson Book Company in Washington, D. C. in 1967.

The elements in an IR can be assigned numbers, from a look-up table for each level of disclosure, in the translation step. However, as was discussed previously, four numbers have been assigned special meanings, namely Type 1, Type 2 and Type 3 over-rides and, of course, the zero which means no information for a designator.

Information representations can be expanded by replacing a single kernel by a new IR. Contraction occurs when an information representation is replaced by an IR with fewer kernels. These properties of expansion and contraction permit the reclassification of documents without having to reorganize the file. Reclassification may be desirable because either the original designator-set did not adequately describe the referenced information, or, due to natural growth, new subclasses must be created. Thus these information representations permit the uninhibited growth of a retrieval system without incurring redundancy or obsolescence.

The status of an information representation with respect to expansion or contraction is indicated by the first bit-map (B.sub.1) thus:

B.sub.1 = (M / m.sub.1, m.sub.2, ..., m.sub.M / L / l.sub.1, l.sub.2, ..., l.sub.L)    (2)

Here M is the number of nested representations in the IR; m.sub.i is the number of kernels in the ith nested IR within the basic IR; L and l.sub.i refer to the Type 2 over-rides. The first bit-map for the above example, (1), is B.sub.1 = (1/3/0). Here we have assumed that the kernels are single information representations and that no Type 2 override codes are present. The second bit-map (B.sub.2) is the binary projection of the linear form of LABEL. For example, if all nine designators in the LABEL are non-zero, B.sub.2 is:

B.sub.2 = (111/11/1//11/1) in binary

B.sub.2 = (7/3/1//3/1) in decimal (base ten)    (3)

The second bit-map (B.sub.2) is constructed from the information representation in its square-array form. The linear form of LABEL is compressed by eliminating all zero designators. Terminal zeros in B.sub.2 are omitted. For example, if E, G and I in (1) are zero, then:

LABEL = (ABC/D/F//H)

B.sub.1 = (1/3/0)    (4)

B.sub.2 = (7/2/1//1)

It should be noted that the original information representation can be reconstructed from LABEL and B.sub.2.
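The construction of B.sub.2 and of the compressed LABEL can be checked with a short sketch. It assumes that each diagonal is supplied as a list of designators in which 0 means "no information", and that the diagonals arrive in the order used by the linear form; the function name and argument layout are hypothetical.

```python
def bitmap_and_label(principal, upper, lower):
    """Return (LABEL, B2) in linear form for one information representation.

    principal: designators on the principal diagonal
    upper / lower: lists of diagonals above / below the principal diagonal
    """

    def bits(diagonal):
        value = 0
        for designator in diagonal:
            value = (value << 1) | int(designator != 0)  # bit on if non-zero
        return value

    def squeeze(diagonal):
        return "".join(str(d) for d in diagonal if d != 0)  # drop zero designators

    b2_upper = [bits(principal)] + [bits(d) for d in upper]
    b2_lower = [bits(d) for d in lower]
    while b2_lower and b2_lower[-1] == 0:  # terminal zeros are omitted
        b2_lower.pop()
    if not b2_lower:
        while len(b2_upper) > 1 and b2_upper[-1] == 0:
            b2_upper.pop()

    label_upper = [squeeze(principal)] + [s for s in map(squeeze, upper) if s]
    label_lower = [s for s in map(squeeze, lower) if s]

    def join(up, low):
        text = "/".join(str(x) for x in up)
        return text + "//" + "/".join(str(x) for x in low) if low else text

    return join(label_upper, label_lower), join(b2_upper, b2_lower)


# Example (4): E, G and I are zero.
print(bitmap_and_label(["A", "B", "C"], [["D", 0], ["F"]], [[0, "H"], [0]]))
```

Running the same function with all nine designators present gives the bit-map of (3).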

The MOBILE CANONICALIZATION is used to transform the information representation and its bit-maps to any of its equivalent forms; e.g., another form for (4) is:

LABEL = (BAC/DE//H); B.sub.1 = (1/3/0); B.sub.2 = (7/3/0//01).

One of these equivalent forms (the NORMAL FORM) is unique for each and every designator set. This means that there is a unique "information path" associated with each and every descriptor-set, no matter what its source. Thus the SOLID System is independent of the information base. It is possible to avoid normalization, if that is desired, and to use the unnormalized information representation. This situation occurs when the designators must remain in a particular order because of the nature of the information itself. This might occur, for example, where designators in the three classes (Kernels, CI, and CII Connectors) have been assigned the same codes, and their places in the information representation indicate the type of designator.

In the SOLID System the information in (4) is combined thus:

JOB-LIST ITEM = (1/3/ABC/2/D/1/F//1/H*)    (5)

The first two numbers are the values for M (=1) and J = m.sub.1, m.sub.2, m.sub.3, ..., m.sub.M = 3. ABC is the principal diagonal of the information representation. The remaining numbers are the other diagonals of the second bit-map (B.sub.2) and LABEL alternated. The asterisk (*), added to the last non-zero LABEL diagonal, indicates that the path is terminated. The field separators are inserted for clarity. They are not present in the machine representation. For a general information representation which contains M nested representations and J = m.sub.1, m.sub.2, m.sub.3, ..., m.sub.M kernels, (5) is replaced by (6):

JOB-LIST ITEM = (M/J/LD.sub.0/BD.sub.1/LD.sub.1/.../BD.sub.1-J/LD.sub.1-J*)    (6)

Here BD.sub.i is the decimal (base ten) value of the binary-bit projection of the i.sup.th diagonal of the IR, whose compressed form is LD.sub.i (i.e., no zeros). The range of i can be expressed as follows: (1-J) ≤ i ≤ (J-1). LD.sub.1-J* means that the compressed diagonal LD.sub.1-J terminates the JOB-LIST ITEM. Diagonals of B.sub.2 and LABEL are numbered beginning with the principal diagonal; through the diagonals with CI connectors; then through diagonals with CII connectors.

The diagonals of the second bit-map (B.sub.2) are binary bit projections of the associated diagonals of the information representation. Each bit in the B.sub.2 diagonal indicates the zero (bit off) or non-zero (bit on) status of a particular designator in the information representation. The actual machine method of representing the B.sub.2 diagonals is based on the fact that the basic data-unit in the IBM 360 system, a byte, contains eight bits. Thus in a single byte of a B.sub.2 diagonal the status of up to eight designators in the associated IR diagonal can be recorded. In the SOLID System each B.sub.2 diagonal is left adjusted to a byte boundary. This means that a particular B.sub.2 diagonal begins at the left-most bit in a particular byte and continues, across byte boundaries if necessary, until all the bits needed to indicate the zero or non-zero status of all the designators in the associated IR diagonal have been recorded. The used bits in the last byte of a particular B.sub.2 diagonal are left adjusted in the byte and the unused bits (of the original eight) are set to zero, or turned off. For example, if one takes the principal diagonal in (1), its B.sub.2 diagonal would contain a single byte with the eight bit binary number 11100000. This is the decimal (base ten) number 224. If both designators in the LD.sub.1 diagonal of the IR (1) are not zero, then its single byte BD.sub.1 diagonal would contain the binary number 11000000, or the decimal (base ten) number 192. It is understood that where the IR diagonal has more than eight designators it may be necessary to use two or more bytes in the B.sub.2 diagonal to record the zero and non-zero status of the designators. By using the above left adjusted method for B.sub.2 diagonals, the binary representations will follow without the need for any additional calculation by the computer.
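The left adjusted byte representation can be sketched as follows. The function name is hypothetical, and a diagonal is assumed to be given as a list of designators in which 0 means "no information".

```python
def left_adjusted_bytes(diagonal):
    """Record the zero / non-zero status of each designator, one bit each,
    starting at the left-most bit of the first byte. Unused bits in the
    last byte remain off."""
    out = bytearray()
    for start in range(0, len(diagonal), 8):
        byte = 0
        for position, designator in enumerate(diagonal[start:start + 8]):
            if designator != 0:
                byte |= 0x80 >> position  # 0x80 is the left-most bit
        out.append(byte)
    return bytes(out)


print(list(left_adjusted_bytes(["A", "B", "C"])))  # principal diagonal of (1)
print(list(left_adjusted_bytes(["D", "E"])))       # first CI diagonal of (1)
print(list(left_adjusted_bytes(["X"] * 9)))        # nine designators need two bytes
```

The first two calls reproduce the values 224 and 192 worked out in the text.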

As an example, suppose that the elements in the linear form of LABEL are from an assigned descriptor set that has been rearranged to the square array form as follows: ##SPC3##

The second bit-map diagonals are binary bit projections of LABEL diagonals, left adjusted to a byte boundary. The principal B.sub.2 diagonal, in this case 240, is not present in the JOB-LIST ITEM.

The information in LABEL, B.sub.1, and B.sub.2 are combined thus:

JOB-LIST ITEM = (1/4/ABCD/160/EF/0/128/I/96/PQ*)

The asterisk added to diagonal PQ signifies that this is the terminal link in the information path. The Translation Package produces the JOB-LIST ITEM as a self-defined string thus: ##SPC4##

The first two bytes contain the length of the JOB-LIST ITEM. Each diagonal is preceded by two bytes which define its length. The compressed referenced information is stored in bulk storage at an automatically assigned location.
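The idea of a self-defined, length-prefixed string can be sketched as follows. Only the two facts stated above are taken from the text (a two byte total length, and a two byte length before each diagonal). Everything else is an assumption made for illustration: big-endian byte order, lengths that count only the bytes that follow them, and the omission of the M and J fields, whose exact layout is not given in this passage. The function names are hypothetical.

```python
import struct


def pack_self_defined(diagonals):
    """Prefix each diagonal with a two byte length, then prefix the whole
    item with a two byte total length (assumed to count the bytes after it)."""
    body = b"".join(struct.pack(">H", len(d)) + d for d in diagonals)
    return struct.pack(">H", len(body)) + body


def unpack_self_defined(data):
    """Recover the diagonals using only information held in the string."""
    (total,) = struct.unpack_from(">H", data, 0)
    diagonals, position, end = [], 2, 2 + total
    while position < end:
        (length,) = struct.unpack_from(">H", data, position)
        position += 2
        diagonals.append(data[position:position + length])
        position += length
    return diagonals


# Diagonals of the example item, with each designator modelled as one byte.
item = [b"ABCD", bytes([160]), b"EF", bytes([0]), b"", bytes([128]), b"I",
        bytes([96]), b"PQ"]
packed = pack_self_defined(item)
print(unpack_self_defined(packed) == item)
```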

Each m.sub.i utilizes only one byte. This means that a single representation cannot have more than 255 kernels. It is understood that the parent IR, which can have up to M representations nested in it, may have as many as M times 255 kernels. The two byte length of M limits its maximum value to 65,535. Thus, without considering the storage limitations, the parent IR can have a maximum of between 255 (M=1) and 16,711,425 (M=65,535) kernels. It is unlikely that these limits will ever be attained in applications of the SOLID System.

Overview - An Illustrative Configuration of the SOLID System In Data Storage and Retrieval

The JOBLIST items defined under the previous heading are utilized in a system such as that shown in block form in FIG. 16. Although the operation of the configuration of FIG. 16 is described in much greater detail later on in the specification, it is believed that a brief and qualitative preview of it at this time will help illuminate and unify understanding of the operation, interrelation and cooperation of the various portions of the invention particularized below.

In one specific example in reference to FIG. 16, stages 600 and 602 initialize all appropriate registers in a computer system which is used in the embodiment of the invention and which has fast and virtual memories.

Then, a JOBLIST item, which may have been stored or created at stage 604, is used during the processing at stage 602. A portion of the JOBLIST item is examined at stage 606 where it may be determined that it is an index. Control then goes to stage 610.

In stage 610, a control procedure explained in much greater detail below carries out a search through an array in memory associated with indexes to locate information which may be associated with the particular index of the current search. If such information exists, it would be in the form of an EXECUTIVE POINTER. One of the following three situations may occur:

(a) An EXECUTIVE POINTER is found and control goes to stage 616, which contains a control package which is called MMATCH and is explained in detail further in the specification;

(b) An EXECUTIVE POINTER is not found in the searched array or possible extensions of it. If the system is in retrieval mode, the control package at stage 616 determines that further search procedures are unnecessary and aborts the search. If the search had been for a descriptor or screen in the JOBLIST item in a screen array (SCREEN SEARCH), and the JOBLIST item currently being used contained overrides, the control package MMATCH would call, through its macro called STRATEGY, another control package called MOBILE CANONICALIZATION PACKAGE. The contents and function of the macros and packages mentioned above are defined in detail in the description below. If the system is in a storage or updating mode, then steps which are explained in detail further in this specification are taken to create a new searching path for the JOBLIST item currently in use; and

(c) If the memory array reserved for the EXECUTIVE POINTERS which are being searched is full then control again goes to stage 616 containing the control package MMATCH, but with an indication that more space for the array may be needed.

Stage 620 contains a control package called TBADD which assumes control as directed by portions of the MMATCH control package of stage 616. The control package TBADD is a complex and highly efficient control for determining whether any transfer of memory blocks between fast and virtual memory may be required at a particular stage of using a JOBLIST ITEM. It should be noted that limiting all search procedures associated with one JOBLIST item to a single memory block is an important facet of the invention because that fixes the maximum time required for a search associated with a JOBLIST item. One memory block is always in fast memory; a number of memory blocks may be in virtual memory. If a search is limited to a single memory block then only one block need be transferred from virtual to fast memory. A search thus will take the same maximum time whether there are two or N memory blocks in the system. Because of this provision the search time is considered independent of the file size.

If a new memory block is required, then the control package TBADD in Stage 620 transfers control to another control package called SMEMORY which saves, if necessary, the memory block which had been in fast memory up to this time and replaces it by a new one obtained from virtual memory. If only a continuation of the memory space defined for a particular array is needed, the TBADD package transfers control to a control package called CREATE which performs the needed operations.
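The block exchange described above (one block resident in fast memory, saved only if necessary before being replaced) can be sketched as follows. This is an illustration only and not the SMEMORY macro itself: the class, its attributes and the "modified" flag used to decide whether saving is necessary are all assumptions.

```python
class FastMemory:
    """Holds exactly one memory block; all other blocks stay in virtual memory."""

    def __init__(self, virtual_blocks, resident_id):
        self.virtual = virtual_blocks            # block id -> block contents
        self.resident_id = resident_id
        self.resident = list(virtual_blocks[resident_id])
        self.modified = False                    # assumed "save needed" flag
        self.transfers = 0                       # blocks brought into fast memory

    def require(self, block_id):
        """Make block_id resident, saving the current block only if necessary."""
        if block_id == self.resident_id:
            return self.resident                 # no transfer needed
        if self.modified:
            self.virtual[self.resident_id] = list(self.resident)
        self.resident_id = block_id
        self.resident = list(self.virtual[block_id])
        self.modified = False
        self.transfers += 1
        return self.resident


memory = FastMemory({0: ["a"], 1: ["b"]}, resident_id=0)
memory.resident.append("x")
memory.modified = True
memory.require(1)          # block 0 is saved, block 1 becomes resident
memory.require(1)          # already resident: no further transfer
print(memory.transfers, memory.virtual[0])
```

A search confined to one block therefore costs at most one transfer, however many blocks exist in virtual memory.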

Once all operations associated with a portion of the JOBLIST ITEM currently being used have been completed, the control package TBADD transfers control to a stage 626 where it is determined if there are any more portions left of the JOBLIST ITEM. If there are, procedures similar to the one referred to above are repeated for each such portion, until a location is reached in a memory array called RFILE which is for addresses of information stored in bulk storage.

In addition to the INDEX SEARCH and the search through RFILE (AUXFILE) mentioned above, there are also search procedures, called SCREEN SEARCH, which use the JOBLIST ITEM between the INDEX SEARCH and AUXFILE.

The function and operation of the control packages mentioned above, as well as the function and operation of portions thereof, are explained in detail under the subsequent headings. Particular attention is directed to those individual steps or points in the control packages and portions thereof described below which relate to transferring or accepting control from another control package or a portion thereof.

STRUCTURE OF THE FILE

Storage is divided into the MAIN and AUXILIARY FILES. The MAIN FILE contains the referenced information which is stored in bulk storage (i.e., tapes, data-cells, etc.). The AUXILIARY FILE contains information paths which terminate in locations that contain addresses of the referenced information in the MAIN FILE. Job-List items, like those in (5) and (6), are used to trace, modify, or create information paths in the AUXILIARY FILE. The fully automatic COPAK compressor, discussed hereinafter, is used to substantially reduce the storage requirements of the MAIN FILE.

The AUXILIARY FILE consists of a maze of fully self-organizing column-arrays that are associated with an index (M), and screens (J, LD.sub.0, BD.sub.1, LD.sub.1, ...) that appear in the Job-List item (6). The column arrays are:

a. A single column array, MA, which is associated with the number of nested representations (M). Here M.sub.m is the maximum value of M in the system.

b. M.sub.m column arrays (JA(II), with II in the range 1 ≤ II ≤ M.sub.m). The Mth of these [JA(M)] is associated with the J screen (m.sub.1, m.sub.2, m.sub.3, ..., m.sub.M) for the M nested representations.

c. With each element of column array JA(M) there are associated two families of column arrays (BD(I) and LD(I)). BD(I) and LD(I) are associated with the configurations of the Ith diagonals, for B.sub.2 and LABEL in the Job-List item (see 6).

For each B.sub.2, "I" begins with the first positive diagonal, proceeds through all positive diagonals and then all negative diagonals, ending with the last non-zero diagonal. For each LABEL, "I" begins with the principal diagonal then proceeds as for B.sub.2.

d. The RECORD FILE (RFILE) contains the addresses in bulk storage of the referenced information.

Each column array has the structure shown in FIG. 12. The first four bytes of the array contain its length. This is followed by entries or elements, called executive pointers, and, in the last position, the link or continuance address for the array. The link address is &ADDL bytes long, and it contains the location where the particular kind of column-array (viz. MA, JA, or LD(I), etc.) is continued or extended. There are two kinds of elements or EXECUTIVE POINTERS. One kind, which is used exclusively in the MA array or its extensions, contains only an address of length &ADDL bytes. The second kind of EXECUTIVE POINTER is used in the arrays associated with screens. It contains a screen (e.g., J in (6)) and a composite address of length &ADDL bytes. The System Parameter &ADDL, whose value can be changed to meet storage requirements, will be discussed hereinafter.

If the column array, shown in FIG. 12, is for the index M, the elements or EXECUTIVE POINTERS are stored seriatim with respect to the associated M value, and the addresses contain the beginning address of a JA array which is associated with the particular M value. Since the EXECUTIVE POINTERS are arranged seriatim, one need only go to position (M times &ADDL plus four) in the MA array to find the address of the JA array that is associated with the particular M value. At the end of the array in FIG. 12 is the &ADDL long continuance address, which contains the location of another array where the array is extended (or continued) for other values of, for example, M. This seriatim arrangement continues over as many arrays as are required to store the different EXECUTIVE POINTERS. Thus the system is totally expandable, as the number of different M values does not in any way affect the system. It is possible to store as many EXECUTIVE POINTERS, each associated with a different M value, in the MA array(s) as the system is capable of storing.
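The direct positional lookup in the MA array can be sketched as follows. The offset formula (M times &ADDL plus four) is taken from the text. The rest is assumed for illustration: addresses are read as big-endian integers, the position is taken as a zero-based byte offset (so that a slot for M = 0 exists but is unused), the four byte length field counts the whole array, and continuation over extension arrays is not modelled. The function names are hypothetical.

```python
def ma_entry_offset(m, addl):
    """Byte position of the EXECUTIVE POINTER for index value m:
    M times &ADDL plus four (the four bytes hold the array length)."""
    return m * addl + 4


def read_ma_pointer(array_bytes, m, addl):
    """Read the &ADDL byte address stored for index value m."""
    start = ma_entry_offset(m, addl)
    return int.from_bytes(array_bytes[start:start + addl], "big")


ADDL = 3                                   # assumed value of &ADDL
addresses = [0, 0x000100, 0x000200, 0x000300]  # slot 0 is an unused placeholder
total_length = 4 + len(addresses) * ADDL + ADDL
ma_array = (
    total_length.to_bytes(4, "big")
    + b"".join(a.to_bytes(ADDL, "big") for a in addresses)
    + (0).to_bytes(ADDL, "big")            # link address: no extension array
)
print(hex(read_ma_pointer(ma_array, 2, ADDL)))
```

No comparison of screens is needed for the index array: the value of M alone determines where to read.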

All those arrays associated with screens differ from the MA, or index array, in that each EXECUTIVE POINTER in the arrays contains two parts. The first part is a particular value of the particular screen (viz. J, LD.sub.0, BD.sub.1, etc.) that is associated with the array in question. The second part contains the address of the array that is associated with the next screen where the search should continue. Further, within each array, the EXECUTIVE POINTERS are arranged in numerical order with the lowest value for the screen in the first position. For example, if the screen J in (6) is equal to 1, then the corresponding EXECUTIVE POINTER would be in the first position in array JA, and the address would contain the location of the array that is associated with screen LD.sub.0, where the search continues. It should be noted that the screen J cannot be zero and that the search is aborted if it is. It should also be noted that all the elements or EXECUTIVE POINTERS in a particular array (shown in FIG. 12) are the same length (= screen & address length). The length of a principal array, that is the total length of all EXECUTIVE POINTERS (= screen & address) between the initial four bytes, which indicate the array length, and the link address, is a variable System Parameter (&MATRIXL) that is discussed hereinafter. Of course, each screen array has a link address for an extension array, as discussed above. Also the EXECUTIVE POINTERS are arranged in numerical order over an array and all of its extensions.

Now column-arrays are created when they are needed for inserting the missing links or sub-paths (i.e., the elements in column-arrays) that will define a new "information path." The length of each newly created array is determined by the value of &MATRIXL.

Each element in the column arrays is called an EXECUTIVE POINTER, which contains the beginning address of another column array. Elements in arrays that are associated with the diagonals B.sub.2 and LABEL contain a screen (viz. J, LD.sub.O, etc.) in addition to the address. Within each column array and its extensions the elements are ordered according to the numerical value of the associated screen (for diagonals J, LD.sub.O, etc.) or the index M. The element in the column array that is associated with the last non-zero diagonal in the JOB-LIST item (see (5) and (6)) contains a screen (i.e., LD.sub.1.sub.-J *) and the address of a sub-array of RFILE. The sub-array of RFILE contains the address in bulk storage of the compressed referenced information.

The values of M, J, LD.sub.O, etc. (see (6)) are used to trace an "information path" through the maze of arrays of RFILE. For fixed values of M, each different configuration for each diagonal is entered only once and then only if it occurs. Since arrays are created in unoccupied storage areas only when they are needed, there is minimal movement of data. Moreover, because the "paths" are essentially independent of each other, the time needed for a search is not altered by increasing or decreasing the size of the file. Also, because only non-duplicate subpaths are stored, the AUXILIARY FILE will require substantially less storage than conventional files require. Further, because the entire search is accomplished in core, the search is extremely fast in providing the exact address in bulk storage where the desired compressed referenced information is stored.

FIG. 13 is a diagrammatic showing of the manner in which a JOB-LIST item is used to trace an "information path" in the AUXILIARY FILE. The value of M in the JOB-LIST item locates the EXECUTIVE POINTER in the array MA which points to the particular screen array JA(M) associated with that value of M. In the screen array JA(M), the screen J (as found in the JOB-LIST item) is used to locate a second EXECUTIVE POINTER which will point to the particular screen array, LD(.phi.), that is associated with the JOB-LIST entries J and M. In the array LD(.phi.), the screen LD is used to locate an EXECUTIVE POINTER which points to the next array, BD(1). It should be noted that in this case, in searching through the first column array of LD(.phi.), there was no EXECUTIVE POINTER found for the value of screen LD in the JOB-LIST item. However, when one came to the bottom of the first column array of LD(.phi.), a link address was given, indicating that the search should be continued in the first extension column array of LD(.phi.). In this extended column array of LD(.phi.) there was an EXECUTIVE POINTER pointing to the next column array, BD(1), which contained the screen LD.sub.o. In BD(1), the next EXECUTIVE POINTER with BD.sub.1 as a screen was found. This continues until, finally, the array LB(1) is found where the screen LD.sub.1 * indicates that the JOB-LIST item is ended and the EXECUTIVE POINTER then points to a place in RFILE where the address(es) in bulk storage of the referenced information can be found. These address(es) in bulk storage are used to fetch the referenced information that was requested by the JOB-LIST item. It should be noted that in the search to reach an address in RFILE, one has been working solely with core storage information. Once the RFILE address has been located, the time necessary to obtain the particular information will be determined by the characteristics of the device on which the bulk storage information is stored.
For example, if the bulk storage information is on a disk, the time it takes to bring the referenced information into core storage will include the disk access and transfer times. It should be understood that there is no need to search the MAIN FILE because the bulk storage address(es), which are found in the RFILE array(s), are the exact location(s) of the requested information. It is also understood that the information in the bulk storage or MAIN FILE is, within the context of this system, stored in compressed form, and that it will be decompressed by the COPAK decompressor in the core-storage. The COPAK compressor will be discussed hereinafter.
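
The path tracing just described can be sketched in Python as follows. This is a simplified illustration, not the SSEARCH component: the `ColumnArray` class, the helper names, the use of Python objects in place of core addresses, and the bulk storage address `("tape", 0, 17)` are all hypothetical, and the index array MA is treated like an ordinary screen array for brevity. The sketch does show the two behaviors named in the text: each screen selects the next array, and a failed search in one array continues in its extension through the link address.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

@dataclass
class ColumnArray:
    # (screen, target) pairs; target is the next array or an RFILE sub-array
    entries: List[Tuple[Any, Any]] = field(default_factory=list)
    link: Optional["ColumnArray"] = None   # extension array, if any

def find(array, screen):
    """Search an array and then its extensions for the given screen."""
    while array is not None:
        for s, target in array.entries:
            if s == screen:
                return target
        array = array.link                  # follow the link address
    return None

def trace(root, item):
    """Follow one JOB-LIST item; return the RFILE sub-array or None."""
    node = root
    for screen in item:
        node = find(node, screen)
        if node is None:
            return None                     # information is not in the system
    return node

def build_path(item, rfile):
    """Toy helper: build a chain of one-entry arrays that ends at rfile."""
    target = rfile
    for screen in reversed(item):
        target = ColumnArray(entries=[(screen, target)])
    return target

# The example item (1/3/ABC/128/D/128/F/64/H*) used later in the text.
item = [1, 3, "ABC", 128, "D", 128, "F", 64, "H*"]
rfile = [("tape", 0, 17)]                   # made-up bulk storage address
root = build_path(item, rfile)

# Move the "ABC" entry into an extension array to exercise the link address.
ld0 = find(find(root, 1), 3)
ld0.link = ColumnArray(entries=ld0.entries)
ld0.entries = [("AAA", ColumnArray())]

found = trace(root, item)
```

In this sketch the whole trace touches only in-memory structures, mirroring the statement that the search works solely with core storage until the RFILE address is reached.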

Retrieval operations in the SOLID System are illustrated in FIG. 13 and they have just been described. It should be noted that if, during the search of the AUXILIARY FILE, no EXECUTIVE POINTER can be found in a particular array, then this means that the requested referenced information is not in the MAIN FILE or the bulk storage. If this situation occurs in the retrieval mode, then the search is discontinued and the user is advised that the information is not in the system. If this occurs in the storage mode, then a new subpath is created by inserting new EXECUTIVE POINTERS in those arrays which do not have them. This creation of a new information path continues through the AUXILIARY FILE until the RFILE is reached and a new bulk storage address has been allocated for the compressed referenced information. Thus, later in the retrieval mode, the same JOBLIST item will trace the newly created information path and locate the new referenced information. Thus, the system expands, by itself, independently of the user. Further, it should be noted that the new items are stored without in any way affecting any of the other information previously stored in the system. Thus, the system can automatically expand until all the allocated storage has been used. Moreover, there is no duplication of information in the AUXILIARY FILE because, in the storage mode, EXECUTIVE POINTERS are added and new arrays are created only if subpaths, defined by addresses in EXECUTIVE POINTERS, cannot be found. The expansion of the arrays in the AUXILIARY FILE, which occurs when new information is stored, is entirely independent of the user. Storage areas for newly created or extended arrays are automatically allocated by the computer, and are not in any way controlled by the user. Thus, because all retrieval and storage operations are fully automatic, the user has no concern whatever with the actual machine structures of either the AUXILIARY or the MAIN FILES.
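
The storage-mode behavior described above (insert only the missing subpaths, keep each array ordered, extend an array when it is full) can be sketched in Python. Everything here is an illustrative assumption: the class and function names, the capacity of 4 standing in for &MATRIXL or &MATRIXS, and the string bulk addresses. For simplicity the sketch re-sorts and refills an array and its extensions on every insertion, which moves more data than the system described in the text.

```python
CAPACITY = 4  # stand-in for &MATRIXL / &MATRIXS (hypothetical value)

class ColumnArray:
    def __init__(self):
        self.entries = []   # (screen, target) pairs in ascending screen order
        self.link = None    # extension array, created only on overflow

def chain_entries(array):
    """All entries of an array followed by those of its extensions."""
    out = []
    while array is not None:
        out.extend(array.entries)
        array = array.link
    return out

def lookup(array, screen):
    for s, target in chain_entries(array):
        if s == screen:
            return target
    return None

def insert(array, screen, target):
    """Insert in screen order, spilling into extension arrays when full."""
    entries = chain_entries(array)
    entries.append((screen, target))
    entries.sort(key=lambda e: e[0])
    node = array
    while True:
        node.entries, entries = entries[:CAPACITY], entries[CAPACITY:]
        if not entries:
            node.link = None
            break
        if node.link is None:
            node.link = ColumnArray()       # new extension array
        node = node.link

def store(root, screens, bulk_address):
    """Create only the missing subpaths; return how many were created."""
    node, created = root, 0
    for i, screen in enumerate(screens):
        nxt = lookup(node, screen)
        if nxt is None:
            is_last = i == len(screens) - 1
            # The last screen points at an RFILE sub-array (a plain list here).
            nxt = [] if is_last else ColumnArray()
            insert(node, screen, nxt)
            created += 1
        node = nxt
    node.append(bulk_address)               # add the address to the RFILE sub-array
    return created

root = ColumnArray()
first = store(root, [1, 3, "ABC*"], "addr-1")    # whole path is new
second = store(root, [1, 3, "ABD*"], "addr-2")   # shares the M and J subpaths
third = store(root, [1, 3, "ABC*"], "addr-3")    # path already exists
```

The returned counts show that shared subpaths are entered only once, which is the non-duplication property claimed for the AUXILIARY FILE.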
The actual way in which the computer uses the AUXILIARY FILE will be described next. However, the computer method of organizing the AUXILIARY FILE, which is fully described hereinafter, must first be briefly discussed.

In the computer, the AUXILIARY FILE is divided into two parts. One part, which is permanently resident in core storage, contains the column arrays that are associated with the index M and the screen J (=m.sub.1 m.sub.2 - m.sub.M). This part is generated or read from cards in the RESERVE macro-instruction when the SOLID System is initialized. It is automatically generated when the system is used for the first time. The second part of the AUXILIARY FILE contains all those column arrays associated with the remainder of the screens in the JOB-LIST ITEM (i.e., J, LD.sub.o, -, BD.sub.1.sub.-J, LD.sub.1.sub.-J *) and the bulk storage address. This part is divided into memory blocks which are stored on disks in the virtual memory. They are transferred to core storage by the Global Memory component (SMEMORY) whenever they are needed. The size of the memory blocks determines the efficiency of the SOLID System, because as the memory block size increases the average search time decreases, since the memory blocks will be transferred less frequently. Each information path is restricted to a single memory block.

In retrieval operations the "continuance tables" will be used by the permanently resident part of the AUXILIARY FILE to select the memory block that might contain the requested path, described by a JOB-LIST ITEM. The Global Memory Component, SMEMORY, decides either that the selected memory-block is already core-resident, in which case it is not transferred, or transfers it from virtual memory to core. Thus, a maximum of one memory-block is transferred for each JOBLIST item or request. In storage operations, "continuance tables" will be used to ensure that no "information path" will extend over more than one memory-block, which guarantees that a maximum of one memory block will be transferred for each request. One System Parameter (&LTHAYY), which is described hereinafter, sets the memory-block size. It should be understood that in normal applications of the SOLID System, the memory-block size will be large enough so that there will be a high probability that many requests can be answered from a resident memory-block and, consequently, there will be infrequent transfers of memory-blocks from the virtual memory (i.e., disks) to core. During the storage cycle, if the particular storage path crosses more than one memory block, a program can be used to transfer that path to a single memory block so as to avoid the problem of crossing memory blocks during a retrieval cycle.

The system contains many composite addresses that are used during storage and retrieval operations to insure that memory blocks are correctly positioned either in core or in storage, and for enabling the machine to know where information is to be stored or retrieved at any particular time.

Composite addresses have two parts. The first part, which is called the slow memory address, specifies a location on a peripheral device like disks, drums, magnetic tape, data-cells, etc. It is used when memory-blocks or referenced information is to be transferred to or from core memory. The second part of the composite address is called the fast memory address, and it specifies a location in core-memory. Now there are two such composite addresses at the head of the permanently core-resident part of the AUXILIARY FILE, which contains the slow memory address in virtual memory (e.g. disks) where the next new memory-block can be stored. The first such composite address is called EMPTY. The slow memory address part of EMPTY will be used and updated when a new memory block is created in core-storage, and it is the location in virtual memory where the newly created memory-block can be stored by the Global Memory component (SMEMORY). The fast memory address in EMPTY is the location in core storage where the new memory block can be created. This location is always immediately after the permanently resident part of the AUXILIARY FILE.

The second composite address at the head of the permanently core-resident part of the AUXILIARY FILE is called BULK. It also has a slow memory address and a fast memory address part. The slow memory address part of BULK contains the bulk storage location where newly compressed referenced information can be stored. This part of BULK is used and updated in the storage mode when new "information paths" are created or another bulk storage address is added to an existing RFILE subarray. This occurs in the storage mode when new referenced information is added to the MAIN FILE. The fast memory part of BULK contains the location in core storage where the new uncompressed referenced information is found. In the SOLID System the address of the uncompressed information is located in the full word named LBRYY. The COPAK compressor, which is executed after new bulk storage (or slow memory) addresses are assigned, compresses the information in the location specified by the fast memory address and transfers it to the bulk storage location specified by the recently assigned slow-memory-address.

During the initialization of the SOLID System, which occurs in the macro-instruction RESERVE, the permanently core-resident part of the AUXILIARY FILE is either generated by the macro-instruction MJARRY, or it is read from cards. This card-deck is punched by the Global-Memory component (SMEMORY) during the termination procedure. SMEMORY will be described later in this disclosure. During the initialization, the first two composite addresses, which precede the M-J arrays, are loaded and the third composite address, which follows immediately after the M-J arrays, is made equal to the hexadecimal number FFFFFFFF. These three composite addresses, which at this point are in the principal data-array (YY), are transferred to the locations EMPTY, BULK, and CURRENT. Three other composite addresses (CORD1, (EMPTY+&ADDL), and ADDRESS) also play a significant role in the SOLID System. Two of these, (EMPTY+&ADDL) and CORD1, are used to ensure that machine or operation errors will not damage the AUXILIARY FILE. The last composite address, ADDRESS, is used to trace or create the information paths in the AUXILIARY FILE. The roles played by these six composite addresses (EMPTY, BULK, CURRENT, (EMPTY+&ADDL), ADDRESS, and CORD1) are described next.

The SSEARCH component uses the information in the JOBLIST ITEM to trace (retrieval or storage) or create (new storage) information paths in the AUXILIARY FILE in core-storage. The composite address BULK is used at the end of a storage operation, when the RFILE array is reached, to assign bulk storage locations for new compressed referenced information. After each use BULK is updated to show the location of the next available space in the bulk storage. In the termination procedure, which is executed by SMEMORY, BULK is stored in its assigned location at the head of the M-J arrays and the card deck for the next initialization of the SOLID System is punched. Of course, this new card-deck will also contain the new value of EMPTY.

For our present purpose we will suppose that a search is not discontinued because the sub-path cannot be found or created.

SSEARCH begins each task by setting (EMPTY+&ADDL) equal to the address EMPTY, and then it searches the MA array, which resides permanently in core, for the EXECUTIVE POINTER associated with the M value in the JOBLIST item. If a new sub-path is being executed, then a new EXECUTIVE POINTER will be constructed from the address EMPTY and correctly inserted in the array. The address part of the located (or constructed) EXECUTIVE POINTER is placed in ADDRESS. In the next step, which occurs in the TBADD macro-instruction, the slow memory address parts of CURRENT and ADDRESS are used to determine whether or not the requested address (ADDRESS) points to a location in core memory. The three alternatives are:

i. If the slow memory address parts of CURRENT and ADDRESS are equal, then the fast memory address part of ADDRESS points to a location in the resident memory block. In this case a transfer between the core and virtual memory does not occur.

ii. If the slow memory address part of ADDRESS is zero, then the fast memory address part points to a location in the permanently resident part of the AUXILIARY FILE. In this case, which occurs only if the located EXECUTIVE POINTER is in array MA, a transfer of memory-blocks does not occur.

iii. If the slow memory address parts of CURRENT and ADDRESS are not equal and that for ADDRESS is not zero, then ADDRESS points to a memory block that is not resident in core. In this complex situation the Global Memory component uses the slow parts of addresses CURRENT and ADDRESS, and the MSIGNAL signal byte to decide what course of action should be taken. Full details of the various procedures that are executed by the RETRIEVAL and GLOBAL MEMORY PACKAGES are given hereinafter.
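
The three alternatives above amount to a small decision on the slow memory address parts of CURRENT and ADDRESS. The following Python sketch is an illustration only: the enum, the function name `classify`, and the integer representation of the slow parts are assumptions, the tests are applied in the order the text lists them, and the further handling of case iii by the Global Memory component (including the MSIGNAL byte) is not modeled.

```python
from enum import Enum

class Location(Enum):
    RESIDENT_BLOCK = "i"     # memory block already in core, no transfer
    PERMANENT_PART = "ii"    # permanently resident part, no transfer
    NOT_RESIDENT = "iii"     # Global Memory component must decide

def classify(current_slow: int, address_slow: int) -> Location:
    """Decide where ADDRESS points, given the two slow memory address parts."""
    if address_slow == current_slow:
        return Location.RESIDENT_BLOCK
    if address_slow == 0:
        return Location.PERMANENT_PART
    return Location.NOT_RESIDENT

# Example values are arbitrary; 0xFFFFFF mimics a CURRENT that names no block.
case_i = classify(0x000123, 0x000123)
case_ii = classify(0x000123, 0x000000)
case_iii = classify(0xFFFFFF, 0x000123)
```

Only the third outcome can lead to a transfer between virtual memory and core.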

From the foregoing discussion it should be recognized that in the SOLID System the management and organization of both the AUXILIARY and the MAIN FILES is automatic in every respect. It should be further recognized that the system can be started anew or restarted when there is no information about memory-blocks in core storage and/or in the virtual memory. Moreover, fully automatic safety procedures, which are discussed hereinafter, protect the files from all machine and operator hazards. Finally, because storage is allocated automatically, the growth of the system is bound only by the total storage capacities of the machine that is being used. Thus, the entire system does not depend in any sense upon the amount of information that is stored, whether it is zero or billions of bytes.

SYSTEM PARAMETERS

There are 14 parameters in the SOLID System which must be set before the system is compiled. Properly selected values for these parameters insure that usage of core-storage and the performance of the SOLID System will be optimal. Nine of the 14 parameters can be reset, by recompiling the system, at any time. If the other five parameters (&ADDL, &LSLOW, &LFAST, &NTRKS, and &TRKL) are reset, the AUXILIARY FILE must be regenerated from scratch.

Six parameters are used to define the eight principal data-arrays for the entire SOLID System. One of these (&LTHAYY) is the amount of core-storage that can be used by both the AUXILIARY FILE and the COPAK Compressor. Two parameters name (&JBLIST) and define the length (&LJBLIST) of the data-array that is used for storing the JOBLIST ITEMS that are produced by the TRANSLATION PACKAGE from the descriptor-sets. &LJBLIST is also the length of a work-array (JBWORK). Three parameters (&LOVER1, &LOVER2, and &LOVER3) specify the lengths of the five arrays that are used to store information about the three over-ride codes (Type 1 (1); Type 2 (2); and Type 3 (3)).

Two variable parameters are associated with the COPAK Compressor. One of these, &LTHAYY, which is also associated with the AUXILIARY FILE, has already been mentioned. The other parameter (&TPCORD) is used to optimize the performance of the alphanumeric component of the COPAK Compressor.

Eight parameters are associated with various aspects of the AUXILIARY-FILE. One of these (&LTHAYY) has already been mentioned. Four of the eight parameters (&ADDL, &LTHAYY, &NTRKS, and &TRKL) are concerned exclusively with the way in which the AUXILIARY FILE is transferred between core-storage and virtual memory (i.e. disks). The last four variable parameters (&LSLOW, &LFAST, &MATRIXL, and &MATRIXS) together determine the length of each newly created column array. Two parameters (&MATRIXL and &MATRIXS), and information about the over-ride codes, determine the way in which the arrays that are associated with screens are searched.

The roles played by these fourteen System Parameters are discussed next.

A. Structure of JOBLIST ITEMS.

In the SOLID System a TRANSLATOR component, which is called by the TRANSLATION PACKAGE, rearranges the previously assigned descriptor-sets to the particular linear form that is used to trace the "Information Paths" through the part of the AUXILIARY FILE in core-storage. A new TRANSLATOR component must be recoded for each new collection of items that are to be stored in the SOLID System.

The linear form (JOBLIST ITEM) for an information representation with M nested representations and

Kernels is:

JOBLIST ITEM = (M/m.sub.1 m.sub.2 ... m.sub.M /LD.sub.0 /BD.sub.1 /LD.sub.1 /.../BD.sub.1.sub.-J /LD.sub.1.sub.-J *)    (7)

In the computer, (7) is expanded to (8). ##SPC5##

LJBI is the number of bytes in the JOBLIST ITEM (up to and including the last screen, LD.sub.1.sub.-J *). Each screen is preceded by a half-word (two bytes) which contains its length plus two. For example, (LL.sub.0 - 2) is the length of screen LD.sub.0.
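
The length-prefix rule for screens can be sketched in Python. Only the rule itself (a two-byte half-word holding the screen length plus two) comes from the text; the function names, the big-endian byte order (the usual order on System/360), and the treatment of a JOBLIST ITEM as nothing but a run of prefixed screens are assumptions, so this does not reproduce the full layout of representation (8).

```python
import struct
from typing import List

def encode_screen(screen: bytes) -> bytes:
    """Prefix a screen with a half-word holding its length plus two."""
    return struct.pack(">H", len(screen) + 2) + screen

def decode_screens(buf: bytes) -> List[bytes]:
    """Split a run of length-prefixed screens back into the screens."""
    screens, pos = [], 0
    while pos < len(buf):
        (ll,) = struct.unpack_from(">H", buf, pos)
        if ll < 2 or pos + ll > len(buf):
            raise ValueError("corrupt length half-word")
        screens.append(buf[pos + 2:pos + ll])   # screen length is ll - 2
        pos += ll                               # jump to the next half-word
    return screens

packed = b"".join(encode_screen(s) for s in [b"ABC", b"D", b"H*"])
unpacked = decode_screens(packed)
```

Storing the length plus two means the half-word doubles as the distance to the next half-word, so a scan can skip from screen to screen without extra arithmetic.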

The screen that is associated with the "zero" or principal diagonal of the second bit-map (B.sub.2) is omitted from the computer representation, (8). Screens associated with terminal zero or empty diagonals in both the second bit-map (B.sub.2) and LABEL (i.e. LD.sub.0, LD.sub.1, -, BD.sub.1.sub.-J, LD.sub.1.sub.-J) are omitted from the representation, (8). B.sub.2 diagonals in (8) are left-adjusted to a full byte boundary. Thus the JOBLIST ITEM (5) should be written:

JOBLIST ITEM = (1/3 /ABC/128/D/128/F/64/H*)

In the computer this representation is expanded to 9 thus:

In 9 there are 29 bytes in the JOBLIST ITEM. The first screen, J=3, is used to compute the rank

of the information representation (IR). The first B.sub.2 diagonal entered in 9 is the screen 128. This screen and the K value together indicate that the associated diagonal of the IR, whose screen is D, actually is D.phi.. The asterisk in the last screen (H*) indicates termination of the information path. The terminal zero or empty screens associated with B.sub.2 and LABEL have been omitted.

JOBLIST ITEMS like (9) are constructed from assigned descriptor-sets by the TRANSLATOR components in the data-array &JBLIST (=JBLIST), which is &LJBLIST bytes long. Random JOBLIST ITEMS can be generated for test purposes. The TRANSLATOR components also extract information about over-rides from the assigned or generated descriptor-sets and store it in the five over-ride arrays (AOVER1, AOVER2, AOVER2R, AOVER3, AOVER3R), whose lengths are defined by the parameters &LOVER1, &LOVER2, and &LOVER3. The TRANSLATOR components can produce more than one JOBLIST ITEM. Moreover, the JOBLIST may be automatically changed by adding or deleting items during retrieval or updating of the AUXILIARY FILE. The information that is generated by the TRANSLATOR components (i.e. JOBLIST ITEMS, over-ride tables, etc.) is used to trace (for retrieval), modify (for updating), create (for new storage), or purge "information paths" in the AUXILIARY FILE.

The variable parameters &JBLIST, &LJBLIST, &LOVER1, &LOVER2, and &LOVER3 that are associated with the TRANSLATION PACKAGE can be changed at any time. However, in this case, the CONTROL routine and three components (SMEMORY, SRESULT, and SSEARCH) might have to be recompiled.

The structure of the composite addresses used in the SOLID System is discussed next.

B. Address Structure:

Three system parameters (&ADDL, &LSLOW, and &LFAST) set the number of bytes in the addresses that are stored in the AUXILIARY FILE. The elements in the column-arrays in the AUXILIARY FILE are EXECUTIVE POINTERS, each of which contains a single address. In the "path" tracing procedure, the address part of an EXECUTIVE POINTER is the location of the next column array that is to be created (in a new storage or updating act) or searched (for retrieval). The elements of the RFILE sub-arrays, which are the terminal locations of the "information paths," contain the Bulk Storage Address (BSA) of the referenced information.

In the current version of the SOLID System the structure of the composite addresses is:

Here:

D is a code (0 to 15) which specifies a particular type of peripheral storage device (viz., disk, magnetic tape, data-cell, drum, etc.).

DNO is a code (0 to 15) which specifies a particular device of type D.

TRK (0 to 63) is the track where the information begins.

CYLN (0 to 1023) is the cylinder where the information begins.

(Note: if D specifies magnetic tape, then the record is stored in the two bytes occupied by TRK and CYLN.)

FMADD is the beginning location in core-memory where the information will reside.

The lengths of the slow (D, DNO, TRK, CYLN) and fast (FMADD) parts of the composite addresses are set by the variable parameters &LSLOW and &LFAST respectively. The total length of the composite address is set by the parameter &ADDL. In one program the values of &LSLOW, &LFAST, and &ADDL are 3, 3, and 6 bytes respectively. Assignments of the device type code (D) for the AUXILIARY FILE and RFILE addresses need not be the same. In one program D=0 means a magnetic tape drive (for Bulk Storage Addresses) or an IBM 2314 disk drive (for AUXILIARY FILE Addresses).
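
The field ranges given above fit the stated lengths exactly: D and DNO need 4 bits each, TRK needs 6, and CYLN needs 10, for 24 bits or 3 bytes (&LSLOW), and FMADD fills the remaining 3 bytes (&LFAST). The Python sketch below assembles and takes apart such a 6-byte address. The bit order within the slow part (D, DNO, TRK, CYLN from high to low), the big-endian byte order, and the names `assemble` and `take_apart` are assumptions; the service-macros of the SOLID System are not reproduced here.

```python
from typing import NamedTuple

LSLOW, LFAST = 3, 3            # bytes, as quoted for one program
ADDL = LSLOW + LFAST           # 6 bytes in total

class CompositeAddress(NamedTuple):
    d: int       # device type code, 0 to 15
    dno: int     # device number, 0 to 15
    trk: int     # track, 0 to 63
    cyln: int    # cylinder, 0 to 1023
    fmadd: int   # core-memory location, 0 to 2**24 - 1

def assemble(a: CompositeAddress) -> bytes:
    """Pack the fields into ADDL bytes (assumed field order, big-endian)."""
    limits = (16, 16, 64, 1024, 1 << (8 * LFAST))
    if any(not 0 <= v < lim for v, lim in zip(a, limits)):
        raise ValueError("field out of range")
    slow = (a.d << 20) | (a.dno << 16) | (a.trk << 10) | a.cyln
    return slow.to_bytes(LSLOW, "big") + a.fmadd.to_bytes(LFAST, "big")

def take_apart(raw: bytes) -> CompositeAddress:
    """Split ADDL bytes back into the five fields."""
    slow = int.from_bytes(raw[:LSLOW], "big")
    fmadd = int.from_bytes(raw[LSLOW:ADDL], "big")
    return CompositeAddress(slow >> 20, (slow >> 16) & 0xF,
                            (slow >> 10) & 0x3F, slow & 0x3FF, fmadd)

addr = CompositeAddress(d=0, dno=3, trk=63, cyln=1023, fmadd=0x012345)
raw = assemble(addr)
```

Comparing two addresses for the residency test then reduces to comparing the first &LSLOW bytes of each.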

Three service-macros (APART, ASADD, and COMPARE) disassemble, assemble, and compare addresses of the above type. If any of the above three System Parameters (&ADDL, &LSLOW, or &LFAST) are changed, these three macros must be recoded, and the AUXILIARY FILE must be regenerated. In one version of the SOLID System, one IBM 2314 disk pack is used to store the AUXILIARY FILE. This disk has been assigned D=0 and DNO=0. The compressed "referenced" or "bulk" information is stored on magnetic tapes which have been assigned D=0 and DNO = 0, 1, ..., 15.

C. Computer Organization of the AUXILIARY FILE

In the SOLID System the AUXILIARY FILE is divided into two parts. One part, which is permanently resident in core-storage, contains the column arrays that are associated with the prime index M and the screens beginning with J (=m.sub.1 m.sub.2 - m.sub.M) (see (8)). This part is generated by the service-macro MJARRAY, which is called by the initializing component SSTATECL, when the SOLID System is used for the first time. Thereafter, it is read from cards by SSTATECL in the macro RESERVE at the start of each job-stream. A new card-deck is punched as the final act in each job-stream.

The second part of the AUXILIARY FILE contains all those column arrays that are associated with the remainder of the screens in the JOBLIST ITEMS (i.e., LD.sub.0, BD.sub.1, LD.sub.1, -, LD.sub.1-J *) and the Bulk Storage Addresses (BSA), which are assigned when they are needed. This part is divided into memory-blocks, which are stored on disks in virtual memory, and are transferred to core-storage by the Global Memory component (SMEMORY) whenever they are needed. The size of the memory-blocks determines the efficiency of the SOLID System. As the size of the memory-blocks increases, the average search-time decreases because the memory-blocks will be transferred less frequently. Complete information paths are restricted to a single memory-block. Thus, because a maximum of one memory-block is transferred per query, search-time is virtually independent of the AUXILIARY FILE size.

Each memory-block is prefaced by a composite address (of length &ADDL) whose fast-memory address, FMADD (of length &LFAST), is its first unused byte. This information is used when new sub-paths are to be created. The slow-memory part of the composite address (of length &LSLOW) contains the beginning location in virtual memory where the memory-block will be stored. The first part of the AUXILIARY FILE, which resides permanently in core-storage, is prefaced by two composite addresses. The first of these is associated with the first unused byte in the last memory-block. This information is used to create new memory-blocks or to extend blocks that are full. The second composite address is the Bulk Storage Address that is used to assign slow-memory locations for new referenced information. Its fast-memory part, FMADD (of length &LFAST), contains the location in core-storage where the compressed referenced information will reside.

Three system parameters (&LTHAYY, &NTRKS, and &TRKL) together determine the size of every memory-block. &TRKL is the length of a single record in virtual memory. &NTRKS is the number of records of length &TRKL in each memory-block. &LTHAYY is the number of bytes in the principal data-array (YY), in core-storage. This array, YY, must contain the permanently resident part of the AUXILIARY FILE (i.e., arrays associated with M and J); a single memory-block; and approximately two strings of uncompressed referenced information. In one example of the program for the SOLID System &TRKL=7294; &NTRKS=10; and &LTHAYY=100,000. If these three parameters are changed, then the service-macros APART, ASADD and COMPARE must be changed, and the system started from scratch.

Two system parameters (&MATRIXL and &MATRIXS) determine the lengths of all newly-created column-arrays. These two parameters, which can be reset at any time are defined in the next section, D.

D. Definitions of Parameters

In this section, D, the fourteen system parameters of the SOLID System are defined.

&ADDL: The number of bytes in the composite addresses that are used in the AUXILIARY FILE.

&LSLOW: The length (in bytes) of the slow-memory part of the composite address.

&LFAST: The number of bytes in the fast-memory part of the composite address.

Note: &ADDL, &LSLOW, and &LFAST are associated with the service macros APART, ASADD, and COMPARE. These macros must be changed if &ADDL, &LSLOW or &LFAST is altered.

&JBLIST: (=JBLIST): The name of the array that is used for storing the JOBLIST ITEMS that are produced by the TRANSLATOR COMPONENTS (see Section A).

&LJBLIST: The number of bytes in the JOBLIST array, &JBLIST, and in the JOBLIST work-array, JBWORK.

&LOVER1: The length (in bytes) of the three principal over-ride arrays (AOVER1, AOVER2, and AOVER3) that are used for storing information about Type 1, Type 2, and Type 3 over-rides. Two additional over-ride arrays are used by the TRANSLATOR components. AOVER1, AOVER2, and AOVER3 normally contain information about the specific locations (in the JOBLIST array, &JBLIST) of the designated types of over-ride codes.

&LOVER2: The length of the second, Type 2 over-ride array (AOVER2R). This array normally contains information for updating the "information paths" (in the AUXILIARY FILE) and the compressed referenced information (in the Bulk Storage).

&LOVER3: The number of bytes in the second Type 3 (or gate) over-ride array (AOVER3R), which normally contains the two gates for each Type 3 over-ride whose location is in array AOVER3. &LOVER3, should be about the same length as &LOVER2, and twice the length of &LOVER1.

&LTHAYY: The length of the principal data-array (YY) of the SOLID System. YY should be large enough to include the permanently resident portion of the AUXILIARY FILE, which holds the M-J arrays in about 1080 bytes; a memory-block (=&NTRKS*&TRKL bytes); and two strings of uncompressed reference information. In one program &LTHAYY is set equal to &NTRKS*&TRKL plus 20,000. In large applications of the SOLID System (viz., a national retrieval network) it is anticipated that the value of &LTHAYY will be set between 1,000,000 and 15,000,000.
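
The sizing relations among &TRKL, &NTRKS, and &LTHAYY can be worked through with the figures quoted in this disclosure (&TRKL=7294, &NTRKS=10, about 1080 bytes for the M-J arrays, and the "plus 20,000" rule). The Python sketch below is only arithmetic on those figures; the function names are illustrative assumptions.

```python
TRKL = 7294        # bytes per record (IBM 2314 track length)
NTRKS = 10         # records per memory-block in one example
MJ_BYTES = 1080    # approximate size of the resident M-J arrays

def memory_block_bytes(ntrks: int = NTRKS, trkl: int = TRKL) -> int:
    """Size of one memory-block: &NTRKS records of &TRKL bytes each."""
    return ntrks * trkl

def lthayy_by_rule(ntrks: int = NTRKS, trkl: int = TRKL,
                   extra: int = 20_000) -> int:
    """&LTHAYY as set in one program: one memory-block plus 20,000 bytes."""
    return memory_block_bytes(ntrks, trkl) + extra

def room_for_strings(lthayy: int) -> int:
    """Bytes of YY left for uncompressed strings after the fixed parts."""
    return lthayy - MJ_BYTES - memory_block_bytes()

block = memory_block_bytes()            # 72,940 bytes
by_rule = lthayy_by_rule()              # 92,940 bytes
left_at_100k = room_for_strings(100_000)
```

With these figures a memory-block takes 72,940 bytes, the rule gives 92,940 bytes, and the quoted setting of 100,000 leaves 25,980 bytes for the two strings of uncompressed referenced information.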

&MATRIXL is the number of elements in the column-arrays that are associated with the screen of the principal diagonal of the information representations. In the JOBLIST ITEM 9 this is ABC.

&MATRIXS: The number of elements (e.g. EXECUTIVE POINTERS) in the column arrays that are associated with all screens other than the principal one. In JOBLIST ITEM (8) these are: J (=m.sub.1 m.sub.2 - m.sub.M), BD.sub.1, LD.sub.1, etc. At present &MATRIXS is also the number of Bulk Storage Addresses (BSA) in the RFILE sub-array which terminates the "information paths." However, the RFILE structure can be changed so that it can be accessed independently.

&NTRKS: The number of records in virtual memory that are in each memory-block (see Section C). In our program the record length (&TRKL) is set equal to the IBM 2314 disk track length (=7294 bytes). Thus &NTRKS is the number of tracks needed to store a single memory block.

&TPCORD: The number of permanent cords that are to be used by the alphanumeric compressors in the fast mode. This is one of two parameters which together determine the optimum throughput rate for the COPAK compressor.

&TRKL: The number of bytes in each record in virtual memory. In one program &TRKL is set equal to the track length of the IBM 2314 disks (7294).

DESIGN PHILOSOPHY

In designing a large computer oriented system it is essential that design and performance specifications for the completed system be set up. The design goals for the SOLID System are given next.

i. Storage, retrieval, updating, and purging tasks must be accomplished as fast as possible.

ii. The system must be independent of the information base.

iii. Components of the SOLID System should be capable of being coded independently of all the other components in the system.

iv. Programming of the fully implemented scheme, to meet the varied needs of users, should be as simple as possible.

v. The system should be open ended to provide for future innovations.

vi. The coded system should be as free of machine dependence as possible to provide for easier translation to other computer configurations.

Because the size and scope prohibit writing the system as a single program, it was decided to write the system on a component by component basis. Each component performs a single task in the overall scheme. For example, one component handles card input; another the output; and so on for each separate task the system will perform.

To simplify recoding of a large system for another computer, it is essential that a higher language be used or developed. The present higher languages (e.g., FORTRAN, ALGOL, COBOL, SNOBOL, COMIT, etc.) are not suitable for coding large retrieval or indexing systems because they do not have the bit and byte manipulation capabilities that are essential for efficient machine coding. The SOLID System has associated with it an open ended higher language, ALLOCATE, which grows as the system is implemented. Thus, the fully implemented SOLID System will be coded in a machine independent higher language that can provide the basis for a retrieval language. The macro language provided in the IBM System 360 provides a starting point for the language ALLOCATE.

Each component of SOLID is coded in the macro language. The central concept is one of extensively nested macros incorporated into the assembly language processor of the computer. In this way the normal operations of the assembly language are extended with macro instructions that perform the special operations needed for SOLID. In the IBM 360 System the assembly language is called BAL. A programmer can add, delete, or ignore components of the system at any time. This design allows for the unrestricted growth of the system and the retrieval language (ALLOCATE) by adding new components to the system macro library.

Translation of the SOLID System to other computer configurations is greatly simplified by the use of the component type system. For example, it is possible to translate the SOLID System directly to FORTRAN IV by a suitable translator. The translation will be performed component by component rather than by trying to rewrite the entire system. Moreover, since the components are independent of one another, only those components needed for the particular application of SOLID need be translated. Programming around deleted components becomes unnecessary.

Design of the SOLID System

The design of the SOLID System is based on the concept of a system which contains two subsets of instructions. In one subset are all the assembly language instructions. The second subset, which is entered in the macro library (SOLID.MACLIB) of the System 360 Assembly Language Processor (ALP), contains all of the components of the SOLID System and certain selected service macros. Components are entered as macro subroutines with their own USING controls so that they may be placed anywhere in the system. Independently compiled components are stored in a partitioned data set, SOLID.LOAD.

Since both subsets of instructions are processed by the same compiler, the programmer can code in any arbitrarily selected combination of assembly language and macro instructions. In the remainder of this discussion the terms "level of coding" or "coding level" refer to one such arbitrarily selected combination of instructions. At every coding level, instructions can be added, deleted, or over-ridden. New instructions can be programmed at a previously defined coding level. In its final form, the SOLID System, which will be coded at the highest level (in the language ALLOCATE), is independent of the machine used. As the coding level is lowered, the proportion of instructions needed from the first subset (assembly language) is increased and the programming becomes more difficult. The language ALLOCATE can have less than fifteen instructions, drawn from the two subsets mentioned above.

The two part design that has evolved for the SOLID System consists of the various components and service macros, MACROPAK, and a control routine (CONTROL). The control routine, which is coded in the evolving higher language ALLOCATE, assigns tasks to the various components of the SOLID System when a search, storage, compression or decompression job is executed.

The service macros in MACROPAK perform specialized tasks such as bit, byte and string manipulation which are necessary for an information system. Also included in this group are macros used for the input/output operations; the calling procedures for branching from the control routine (CONTROL); and other specialized macros. The components are coded in a hierarchical fashion with extensive nesting. The kind of hierarchical arrangement that has been achieved is illustrated in FIG. 14.

Each of the coding levels indicated in FIG. 14 designates an arbitrarily selected combination of assembly language (first subset) and macro (second subset) instructions. The pseudo-operations used in each of these arbitrarily selected coding levels are defined in MACROPAK and are themselves coded in mixtures of instructions from any of the lower coding levels. In a hierarchical scheme of this kind, the difficulty of coding the system is decreased as the coding level is raised. This occurs because both the proportion of assembly language instructions and the need for a specific knowledge of the contents of RESERVE are decreased.

The open ended, two part design just described permits optimum machine language coding while giving rise to a "machine independent" language that greatly simplifies the task of recoding the SOLID System for a new configuration. For the System 360, for example, this design has naturally led to the full utilization of the System 360 macro language facility. In some configurations it may be necessary to extend the assembly language processor so that those instructions that are not defined in the machine instruction set can be handled. Moreover, some or all of the macro instructions in the second subset may suggest ways in which existing hardware can be modified or they may influence the design of fourth generation machines. In this connection the automatic multi-stage COPAK compressor can be realized in the form of a small fast computer with an equivalent hardware set.

Contents of MACROPAK

The macro-instructions in MACROPAK can be classified as follows:

i. Reserve: There are twenty-two heavily nested macro-instructions which together contain the data declarations, system parameters, and status controls for the SOLID System and its stand-alone subsystems. Only two of these 22 instructions actually appear in the different CONTROL routines.

ii. Service-Macros: macro-instructions which simulate useful but unavailable "instructions", including certain rather elaborate operations.

iii. Components: useful macro-subroutines too large or complex to be viewed as single operations.

The 151 entries in one of the versions of MACROPAK are stored in SOLID.MACLIB, which is concatenated with the macro-library of the System 360 Assembly Language Processor (ALP). The contents and functions of entries in the three classes (Reserve, Service Macros, and Components) are described below:

Reserve:

The first and last executable instructions in each CONTROL routine are a "RESERVE" type and a "SUBMP" type of macro-instruction respectively. MACROPAK contains two "RESERVE" (RESERCO and RESERVE) and eight "SUBMP" macro-instructions. Different combinations of these two kinds of instructions are used in the CONTROL routines for the SOLID System and its stand-alone subsystems. The fourteen System Parameters defined earlier, which are used to tailor the SOLID System to the machine configuration, are primarily associated with the "RESERVE" and "SUBMP" types of instructions.

The "RESERVE" type of macro-instruction is located immediately after the START instruction of the CONTROL routine. It establishes addressability; defines all global constants, counters, DCB or format statements, work arrays, and registers; and initializes the CONTROL routine. It also contains the instructions for opening and closing all those input/output devices that are used in the SOLID System for communication purposes and storing the referenced information. These various functions are performed by one of three "JUNK" type instructions (JUNK, JUNKC, JUNKR), which contain six of seven special macro-instructions that perform the identified special tasks (e.g. define constants). One of these special macro-instructions (DEVICES) calls the component OPENSHUT, which executes all opening and closing instructions, and the component SACTION, which is designed to execute all error-correcting procedures for the COPAK compressor. The CONTROL routine for the SOLID System is initialized in the component SSTATECL, which is called from the RESERVE macro-instruction. JUNK and JUNKC are slightly modified versions of JUNKR. JUNK is used to separately compile components (see Components). JUNKC is used in RESERCO for the stand-alone subsystems.

The fully expanded "JUNK" type instruction contains more than 300 items whose significance must be understood if the SOLID System is to be coded entirely in assembly language. However, at the highest coding level (i.e., the language ALLOCATE) the significance of only the twenty-seven input/output commands need be fully understood.

The "SUBMP" type of instruction appears immediately before the END instruction of the CONTROL routines. Its function is to locate the literal pool; define the principal data arrays and specify their lengths; dummy the addresses of unused components; and position the OPENSHUT component at compilation time. The principal data-array (YY) is specified in "SUBMP" instructions. The two JOBLIST arrays (JBLIST and JBWORK) and five over-ride arrays (AOVER1, AOVER2, AOVER3, AOVER2R, and AOVER3R) are specified in the WORKAREA macro-instruction, which appears in certain of the "SUBMP" type of instruction.

Service Macros:

In one version there are 98 entries in MACROPAK that are classified as part of the service macros.

For the most part, the macros classified under this heading perform special tasks and are extensively nested within components (see Components). Among these entries are macros which will truncate a floating point number to fixed-point (TRUNC); convert a fixed point number to floating point (CONVE); move any string of information left (LMOVE) or right (RMOVE=RMVC) by a designated number of bytes; and several other macros which perform the specialized bit, byte, and string manipulative tasks needed for the SOLID System.

All the macros dealing directly with the input/output devices such as the card reader, card punch, printer, disks, and tape units are also classified as part of the service macros. The macros for reading and writing magnetic tape also perform the tasks of blocking or deblocking the information. At the highest coding level (i.e., the language ALLOCATE) card input, tape input and the entire output-operation(s) are single macro-instructions.

Other macros of special note here are nine macros which facilitate branching between the components.

Components:

Components are macro-instructions that contain their own USING controls for establishing addressability. They can be compiled in the control routine, or they may be separately compiled in named CSECTS. Compiled CSECTS are link-edited and stored in the partitioned data-set, SOLID.LOAD. The calling procedures are discussed hereinafter.

In one version there are 31 components of the SOLID System. The following three components initialize the system at the indicated times. OPENSHUT, which is called by the DEVICES macro-instruction in the "JUNK" type instructions, performs the tasks of opening and closing input/output devices at the beginning and end of each job-stream. SSTATECL, which is called by the RESERVE macro-instruction, initializes the CONTROL routines for the SOLID System at the beginning of each job-stream. SCOMMAND, which is called from the control routines, initializes the system at the beginning of each new job and before each use of the COPAK compressor.

Four of the components handle the input/output for the SOLID System (SJOBLIST, SREADC, SREADT, and SOUTPUT). These entries use the input/output service macros and another three components (SPR1NT, SPUNXH, and SREID). Performance data for each job (component SRESULT) and for the COPAK Compressor (service macro SAVINGS in SANPAKC) are printed. In production runs this information would not be required.

There are six components which are associated with the COPAK Compressor. One of these (SACTION), which is called from the "JUNK" type instruction, is intended to process the error-correcting procedures for the several forms of the COPAK Compressor. Six service-macros handle the task of setting-up the strings of referenced information and setting the status controls for the three principal compressor components (SANPAKC, SANPAKD, and SNUPAK). The remaining two components (SNAPAKJ and SNUPAB) are variants of SANPAKD and SNUPAK that are used in the separate alphanumeric and numeric stand-alone compressors.

Eleven components are used by the TRANSLATION PACKAGE, which generates JOBLIST ITEMS in their normal forms. The supervisory component (SJOBLIST), which also handles input, uses three service-macros (JLITEM, TRANSLATE, and NORMFORM) to call the random JOBLIST generator component (SGENITEM); the five translator components (STLATOR1, STLATOR2, STLATOR3, STLATOR4, and STLATOR5); and the normalization component (SNORMAL). The fully implemented SNORMAL component will use the three transformation rules, which are to be coded in components SCYCLIC, SREFLECT and SXCHANGE. These transformation rules will be used in the RETRIEVAL PACKAGE also. SGENITEM can be used to generate the random JOBLISTS that are needed to evaluate the performance of the SOLID System. The five translator components will actually perform the task of extracting (if necessary) and rearranging the descriptor-sets to the JOBLIST ITEM form. A new translator component must be coded for each new application of the SOLID System. There are provisions for incorporating up to 255 different translator components.

Five components are used by the RETRIEVAL PACKAGE, which duplicates the tracing, creating, purging and modification of "information paths" described earlier. One component (SRESULT), which is mentioned above, prints performance data for the RETRIEVAL PACKAGE. The supervisory component (SSEARCH) uses two service-macros (MMATCH and TBADD) to call the MOBILE CANONICALIZATION PACKAGE and the Global Memory component (SMEMORY). The Global Memory transfers the memory-blocks of the AUXILIARY FILE between core-storage and virtual-memory whenever they are needed. Two new components (SMATCH and SMOBILE) and the three transformation rules (SCYCLIC, SREFLECT and SXCHANGE) are used in the MOBILE CANONICALIZATION PACKAGE. The supervisory component of the RETRIEVAL PACKAGE, SSEARCH, has ten variable parameters which together designate the number of elements in each column array, and optimize the searches.

C. The CONTROL Routine.

The CONTROL routines are used to "thread" the retrieval, storage, updating, purging or compression problem through the various parts of the SOLID System. CONTROL is coded exclusively in the higher language ALLOCATE. In one version the CONTROL routine for the entire SOLID System contains sixteen statements. Five of these are ALP instructions. The other eleven instructions are taken from the second subset of macro-instructions. Thus the CONTROL routines can be easily changed to meet user needs by simply adding or deleting single instructions. This facility, and the fourteen System Parameters mentioned earlier permit the translation of the SOLID System to a particular machine configuration or application.

There are seven service macros which facilitate the branching among components. The three types of control that can be achieved by single macro-instructions are illustrated in FIG. 15. Five service macros (CALL1, CALL2, CALL3, CALL4, and CALL5) permit branching from the CONTROL routine to a component and the eventual return to any pre-assigned address in CONTROL. For example, the pseudo-operation; CALL1, SEARCH,RESEARCH passes control to the SSEARCH component and then returns it to the address RESEARCH in the CONTROL routine. The TRANSFER and GLOBAL service-macros permit branching between components.
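The first type of calling procedure (branch from CONTROL to a component, then return to a pre-assigned address in CONTROL) can be pictured with a small dispatch loop. The following Python sketch is only an illustration of that control flow under stated assumptions: the function run_control, the dictionary layout, and the labels BEGIN and FINISH are hypothetical, and the component bodies are stand-ins that merely record their names. Only the component names and the label RESEARCH are taken from the text.

```python
def run_control(statements, components, start):
    """Thread a job through CONTROL.

    `statements` maps a label in CONTROL to (component name, return label).
    Each entry behaves like one CALL1: the component runs, then control
    resumes at the pre-assigned return label (None ends the job).
    """
    trace = []
    label = start
    while label is not None:
        component_name, return_label = statements[label]
        components[component_name](trace)  # branch to the component
        label = return_label               # return to the pre-assigned address
    return trace


def make_component(name):
    """Stand-in for a real component: it only records that it ran."""
    def component(trace):
        trace.append(name)
    return component


components = {name: make_component(name)
              for name in ("SJOBLIST", "SSEARCH", "SOUTPUT")}

# Hypothetical CONTROL routine: BEGIN and FINISH are illustrative labels.
statements = {
    "BEGIN": ("SJOBLIST", "RESEARCH"),
    "RESEARCH": ("SSEARCH", "FINISH"),
    "FINISH": ("SOUTPUT", None),
}

print(run_control(statements, components, "BEGIN"))
```

Because every statement names its own return label, single statements can be added to or deleted from the table without touching the components, which mirrors the claim that CONTROL routines are changed by adding or deleting single instructions.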

The first and last executable statements in the CONTROL routines are the "RESERVE" and "SUBMP" type of macro-instruction respectively. Any allowed assembly language and service-macro instruction can be inserted between these two statements. The following program will read a record of compressed information from magnetic tape and print it in hexadecimal format.

SOLIDIO  START   0
         RESERIO 600
         MVC     MODE(4),ZERO
         REIDT   LIST,JII
         LA      BRY,JII
         PRINT   B,LIST,O(BRY)
         B       CL1
LIST     DS      150F
         SUBIO
         END     SOLIDIO

The information is read into the array LIST, which can contain up to 600 bytes. JII contains the record length.

RESERIO and SUBIO are minor modifications of the RESERCO and SUBCE macro-instructions respectively.
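For readers unfamiliar with the macro language, the effect of the program above (take one record of at most 600 bytes and print it in hexadecimal) can be sketched in Python. This is an illustrative analogue only, not a translation of the REIDT or PRINT macros; the function name hex_lines, the 16-bytes-per-line layout, and the sample record are assumptions.

```python
MAX_RECORD_BYTES = 600  # capacity of the array LIST (150 full words)


def hex_lines(record, bytes_per_line=16):
    """Return the record as lines of upper-case hexadecimal digits."""
    if len(record) > MAX_RECORD_BYTES:
        raise ValueError("record does not fit in a 600-byte work array")
    lines = []
    for offset in range(0, len(record), bytes_per_line):
        chunk = record[offset:offset + bytes_per_line]
        lines.append(chunk.hex().upper())
    return lines


# Hypothetical record standing in for compressed information read from tape.
for line in hex_lines(bytes([0x00, 0xFF, 0x10, 0xC1, 0xC2])):
    print(line)   # 00FF10C1C2
```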

DESCRIPTION OF MACRO-INSTRUCTIONS IN MACROPAK

The two-part open-ended design that has evolved for the SOLID System consists of MACROPAK, which contains the second subset of instructions, and a control routine (CONTROL). This control routine, which operates under the O/S system, assigns the specific tasks to the individual components of the SOLID System. It is easily changed to include new hardware or for special applications of parts of the SOLID System.

MACROPAK, which is entered in the macro-library (SOLID.MACLIB) of the System 360 Assembly Language Processor (ALP), contains entries which are identified in the Reserve, Service-Macro, or Component classes (see above). The individual macro-instructions are described below. Listings of the macro-instructions are given in the Appendix.

In the following, the macro-instruction name (e.g., SKIPP) precedes the prototype statement thus: SKIPP - (&J SKIPP &LC). A blank left field in the prototype statement means that there is no location variable.

A. Reserve

Twenty-two entries in MACROPAK are identified in the "Reserve" category. There are three "JUNK", two "RESERVE", and eight "SUBMP" types of instructions. The roles played by these three kinds of macro-instructions have already been described.

Seven of the 22 entries can be viewed as special service-macros for the "JUNK" type instruction. Another two entries, ENTRANCE and WORKAREA can be regarded as special service-macros for the SUBMP type instructions. These nine "special service-macros" of the Reserve category perform tasks like defining global constants; beginning and terminating jobstreams; declaring formats; specifying relocatable addresses and entry points; designating registers, counters and gates; and defining work areas for the SOLID System.

Two of the three "JUNK" type instructions are used in the two "RESERVE" type instructions. The third "JUNK" type instruction is a special variant that is used to separately compile components of the SOLID System. Only one of the "RESERVE" type instructions, and one of the eight "SUBMP" type instructions appears in each CONTROL routine. Three components (OPENSHUT, SACTION and SSTATECL) are associated exclusively with the "RESERVE" type of instruction. Two of these (OPENSHUT and SACTION) are called from one of the special service-macros for JUNK type instructions, DEVICES. The third component, SSTATECL, is called only by the macro-instruction RESERVE. The components are described in Section C of this Chapter.

The fourteen System Parameters, defined elsewhere in this disclosure, are the variable parameters for the "RESERVE" and "SUBMP" type macro-instructions. The System Parameters are briefly described in Table 1.

Table 1. Brief descriptions of System Parameters for the SOLID System. Definitions are given in the section on "System Parameters."

System Parameter   Description
__________________________________________________________________________
&ADDL              Length of composite addresses
&JBLIST            Name of the JOBLIST array (JBLIST)
&LFAST             Length of fast-memory part of composite addresses
&LJBLIST           Length of the &JBLIST array
&LOVER1            Length of the principal over-ride arrays
&LOVER2            Length of the second Type II over-ride array
&LOVER3            Length of the second Type III over-ride array
&LSLOW             Length of slow-memory part of composite addresses
&LTHAYY            Length of the principal data-array
&MATRIXL           Number of entries in sub-arrays of the principal screen
&MATRIXS           Number of entries in sub-arrays of the secondary screens
&NTRKS             Number of records in each memory-block
&TPCORD            Number of permanent cords to be used in fast mode
&TRKL              Length of records in memory-blocks
__________________________________________________________________________

The 22 macro-instructions in the Reserve category are described next.

Special Service-Macros:

1. CONSTANT-(CONSTANT )

Global constants and some error messages are defined in CONSTANT. This macro-instruction appears only in the three "JUNK" type instructions.

2. DEVICES-(&J DEVICES ) .

The DEVICES macro loads the second (BR3, 11) and third (BR4, 12) USING registers for the CONTROL routines. It calls the component OPENSHUT, which executes the IBM OPEN and CLOSE system macro-instructions for the communication (e.g. Card Reader and Punch; Printer) and bulk storage (e.g. magnetic tapes) devices. DCB statements for these devices are given in the INOUT special service macro of the Reserve category (see below). DEVICES prints messages about the bulk-storage devices and it calls the components SACTION and SMEMORY. After executing the termination procedures (at the end of each job-stream) control is returned to the IBM Operating System (O/S).

3. ENTRANCE-(ENTRANCE )

ENTRANCE is used in six of the eleven "SUBMP" type instructions to specify that the relocatable addresses for all components are in the main-stem of the designated CONTROL routines. These CONTROL routines are actually compiled in the so-called extended forms (see Section C). The remaining five CONTROL routines, whose "SUBMP" type instructions do not contain ENTRANCE, are executed with planned overlays.

4. INOUT-(INOUT )

The DCB or format statements for communication (e.g. Printer, Card Reader and Punch) and bulk storage (e.g. Magnetic Tapes) devices are specified in INOUT. The current allocation of resources is as follows:

DCB Name   Device Assigned   Purpose
__________________________________________________________________________
MASTER     Card-Read         Read 80-columns of a card.
PRINT      Print             Prints 132 bytes, preceded by the control character.
PUNCH      Card-Punch        Punches 1 columns on a card.
TAPEIND    Tape              Input tape containing uncompressed referenced information.
TAPEINC    Tape              Input tape containing compressed referenced information.
TAPEOTC    Tape              Output tape for the COPAK Compressor.
TAPEJB     Tape              Input tape containing the descriptor-sets.
__________________________________________________________________________

If new DCB's are added, then the DEVICES macro must be changed also. Opening and closing operations, and DCB's for the Global Memory component, SMEMORY, are specified in the DCBMEM macro-instruction, which is executed in SMEMORY.

5. MODADI-(MODADI )

The relocatable addresses for the 31 components, five over-ride arrays, and two JOBLIST arrays (JBLIST and JBWORK) are specified in the MODADI macro-instruction by V-type addresses. The address of the principal data-array, YY, is also specified. MODADI is used in the JUNIO, JUNKC and JUNKR macro-instructions. All new V-type and A-type addresses for the SOLID System must be specified in both MODADI and MODADX.

6. MODADX-(MODADX )

Except for the EXTRN declarations of all A-type addresses, the MODADX and MODADI macro-instructions are identical. MODADX is used only in the macro-instruction JUNK, which is used to separately compile the individual components as named CSECTS.

7. SAVEAREA-(SAVEAREA )

Global storage-areas for saving registers in the SOLID System are specified in the SAVEAREA macro-instruction. Some registers are also assigned names. The five COSAVEX (X=1,2,3,4 or 5) arrays are used by the TRANSFER calling instruction. If more than five levels are to be used, new arrays must be defined (see Section C).

8. STORAGE-(STORAGE &TPCORD,&LJBLIST)

The STORAGE macro-instruction allocates storage for the input/output commands, indicators, counters and gates; and for the composite addresses used by the RETRIEVAL PACKAGE. The permanent cords table (PCORDS), and other arrays used by the COPAK compressor are also allocated in STORAGE. The System Parameters &TPCORD and &LJBLIST are mentioned in Table 1, above.

9. WORKAREA-(WORKAREA &LOVER1,&LOVER2,&LOVER3,&LJBLIST)

The lengths of the five over-ride arrays (AOVER1, AOVER2, AOVER2R, AOVER3, and AOVER3R) and the two JOBLIST arrays (JBLIST and JBWORK) are assigned in WORKAREA, which is a special service-macro for the "SUBMP" type of instruction. These eight arrays have been assigned relocatable addresses (in MODADI and MODADX). The absolute machine address for the beginning of each array is found in the location specified in the MODADI and MODADX macro-instructions. For example, the word AAOVER1 contains the absolute machine address of array AOVER1.

"JUNK" TYPE INSTRUCTIONS

Two of the three "JUNK" type instructions (JUNKC and JUNKR) are actually service-macros for the two "RESERVE" type instructions. The third, JUNK, is used to separately compile the components of the SOLID System for the overlay forms of the CONTROL routines. The two System Parameters that are associated with "JUNK" type instructions specify the number of permanent cords (&TPCORD) and the length of the two JOBLIST arrays (JBLIST and JBWORK), &LJBLIST. The values for these parameters are stored in locations PCGATE and LJBLIST in the STORAGE macro-instruction.

10. JUNK-(JUNK &TPCORD,&LJBLIST)

JUNK is used as a DSECT to separately compile the components of the SOLID System. It uses the MODADX macro-instruction.

11. JUNKC-(JUNKC &TPCORD,&LJBLIST)

JUNKC is used in the RESERCO macro-instruction, which appears in the six CONTROL routines for stand-alone operation of the COPAK compressor and its components.

12. JUNKR-(JUNKR &TPCORD,&LJBLIST)

JUNKR is the service macro for RESERVE. The two CONTROL routines for the SOLID System, SOLIDE and SOLIDO, contain the RESERVE macro-instruction.

"RESERVE" Type Instructions

The two "RESERVE" type instructions are the initializing instructions in different CONTROL routines. The four System Parameters associated with these instructions have been defined in Table 1. Each "RESERVE" type instruction executes the IBM SAVE instruction, which saves all registers, and establishes the addressability of the entire CONTROL routine with registers 10, 11 and 12. These registers have been named BR1, BR3 and BR4 respectively. A brief description of each "RESERVE" type instruction is given below:

13. RESERCO-(RESERCO &LTHAYY,&TPCORD,&LJBLIST).

RESERCO is the initializing macro-instruction for the six CONTROL routines for the stand-alone COPAK compressor and its components. The addresses SAVEYY, SBRYY and SBRY are computed and the permanent cords table, PCORDS, is set to zero.

14. RESERVE-(RESERVE &ADDL,&LTHAYY,&TPCORD,&LJBLIST)

RESERVE initializes the CONTROL routines for the entire SOLID System (SOLIDE and SOLIDO). The actual initialization of registers, M-J arrays, and the permanent cords table (PCORDS) occurs in the component SSTATECL, which is called from RESERVE.

"SUBMP" Type Instructions

The "SUBMP" type instruction is the last executable statement in the CONTROL routines. Its function is to locate the literal pool; to dummy the relocatable addresses of unused components; to specify and position the eight principal arrays; and to position components compiled in the main stem. There are eight "SUBMP" type instructions in MACROPAK. Six of these (SUBCB, SUBCBO, SUBCE, SUBCO, SUBCJ, and SUBCJO) are associated with the six CONTROL routines for the stand alone compressors. Two (SUBME and SUBMO) are used in the CONTROL routines SOLIDE and SOLIDO respectively. A "SUBMP" type instruction ending with an O is used in the CONTROL routine that is to be executed as a planned overlay. "SUBMP" type instructions without the final O are used for the so-called extended forms of the CONTROL routines (see below).

Thirteen of the 14 System Parameters are associated with the "SUBMP" type of instruction. These parameters have been defined in Table 1. The eight "SUBMP" type instructions are described next.

15. SUBCB-(SUBCB &LTHAYY).

SUBCB and RESERCO are used in the CONTROL routine for the extended form of the stand-alone numeric compressor-decompressor (COPAKNU).

16. SUBCBO-(SUBCBO &LTHAYY)

SUBCBO is a variant of SUBCB which is used for the planned overlay form of the stand-alone numeric compressor-decompressor (COPAKNUO).

17. SUBCE-(SUBCE &LTHAYY)

SUBCE and RESERCO are used for the extended form of the stand alone combined compressor-decompressor, COPAKCO. This compressor-decompressor contains both the numeric and alphanumeric parts.

18. SUBCJ-(SUBCJ &LTHAYY)

The extended form of the stand-alone alphanumeric compressor-decompressor, COPAKAN, uses the macro-instructions RESERCO and SUBCJ.

19. SUBCJO-(SUBCJO &LTHAYY)

SUBCJO is the special variant of SUBCJ that is used in the planned overlay form of the CONTROL routine COPAKAN, which is called COPAKANO.

20. SUBCO-(SUBCO &LTHAYY)

SUBCO is a special variant of SUBCE. It is used in the planned overlay CONTROL routine COPAKCOO.

21. SUBME-(SUBME &ADDL,&LSLOW,&LFAST,&NTRKS,&TRKL,&LTHAYY,&JBLIST,&LJBLIST,&LOVER1,&LOVER2,&LOVER3,&MATRIXL,&MATRIXS)

SUBME and RESERVE are used in the extended form of the CONTROL routine for the entire SOLID System, SOLIDE. This CONTROL routine executes all 31 components. The 13 System Parameters associated with SUBME have been defined in Table 1.

22. SUBMO-(SUBMO &LOVER1,&LOVER2, &LOVER3,&LJBLIST)

SUBMO is used for the overlay form of the CONTROL routine for the entire SOLID System, SOLIDO.

A "RESERVE" type macro-instruction is the second instruction in the control routine. A "SUBMP" type macro-instruction always precedes the END statement.
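The pairings described in entries 13 to 22 can be collected into one lookup table. The following Python sketch is only a summary aid under stated assumptions: the names CONTROL_ROUTINES, is_planned_overlay and skeleton are hypothetical, macro operands are omitted, and the placeholder body line stands for whatever statements a particular CONTROL routine contains.

```python
# CONTROL routine -> ("RESERVE" type, "SUBMP" type), as stated in the text.
CONTROL_ROUTINES = {
    "COPAKNU":  ("RESERCO", "SUBCB"),
    "COPAKNUO": ("RESERCO", "SUBCBO"),
    "COPAKCO":  ("RESERCO", "SUBCE"),
    "COPAKCOO": ("RESERCO", "SUBCO"),
    "COPAKAN":  ("RESERCO", "SUBCJ"),
    "COPAKANO": ("RESERCO", "SUBCJO"),
    "SOLIDE":   ("RESERVE", "SUBME"),
    "SOLIDO":   ("RESERVE", "SUBMO"),
}


def is_planned_overlay(submp_name):
    """A "SUBMP" type name ending in O marks a planned-overlay routine."""
    return submp_name.endswith("O")


def skeleton(routine):
    """Statement order of a CONTROL routine (operands omitted)."""
    reserve, submp = CONTROL_ROUTINES[routine]
    return [
        f"{routine} START 0",
        reserve,                       # second instruction
        "(assembly language and service-macro statements)",
        submp,                         # always precedes END
        f"END {routine}",
    ]


for line in skeleton("SOLIDE"):
    print(line)
```

Note that the rule applies to the "SUBMP" name, not to the routine name: COPAKCO ends in O but is an extended form, because its "SUBMP" instruction is SUBCE.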

B. Service-Macros

The 98 service-macros in the current version of MACROPAK can be identified as either General or Special Service-Macros. The 37 General Service-Macros are needed for most information processing systems. Thus they can be regarded as basic operations which are not in the Assembly Language Processor (ALP) instruction set. The 64 Special Service-Macros execute the special bit, byte and string manipulative operations used in the SOLID System.

In the following discussion an "address" means either a named location or a singly subscripted variable. The two register form of IBM addressing (i.e., D2(X2,B2)) is not allowed.

a. General Service-Macros

The thirty-seven General Service-Macros are identified in five classes (see Table 2). The General Service-Macros can be used anywhere in the SOLID System since all registers used by these macros are preserved. Seven macro-instructions, in Class 4, are obvious extensions of existing IBM 360 ALP instructions. The remaining 30 General Service Macros must be viewed as vital operations for the SOLID System. Twenty-one of these, in Classes 2 and 3, are calling procedures which are used to manage all I/O and transfers during execution. The last nine macro-instructions, in Classes 1 and 5 (see Table 3), perform vital arithmetic and string movement operations on data.

The hardware implementation of many of the General Service Macros, either singly or in special combinations, can very substantially increase the already very impressive performance of the SOLID System.

The 37 General Service Macros are discussed below.

TABLE 2.

Classes of the General Service Macro Instructions

Class    Macro Name    Macro No.
__________________________________________________________________________
1        CONVE         23
         DICE          24
         FACE          25
         TRUNC         26
2        CALL1         27
         CALL2         28
         CALL3         29
         CALL4         30
         CALL5         31
         GLOBAL        32
         STSR          33
         TESTL         34
         TRANSFER      35
3(i)     PR1NT         36
         PUNSH         37
         PUNXH         38
         REID1         39
         REID2         40
         REID3         41
         REID4         42
         REID5         43
         REID6         44
         REID7         45
         REID8         46
3(ii)    REIDJB        47
         REIDT         48
         WR1TE         49
4        CSSCRN        50
         DECPC         51
         DUMADD        52
         HEXPC         53
         LARGEXC       54
         SKIPP         55
         TALE          56
5        LMOVE         57
         RMOVE         58
         RMVC          59
__________________________________________________________________________

1. Arithmetic Operations

23. CONVE-(&J CONVE &FROM,&TO)

The integer number in address &FROM is converted to a normalized (short) floating-point number and stored in address &TO. All registers are unchanged.

24. DICE-(&J DICE &NOBITS,&FROM,&TO)

A random integer number with the number of bits specified in &NOBITS is produced in the full-word location &RANDNO (right adjusted). &ODDNO contains the multiplicand, and it is updated after each use of DICE. If &ODDNO contains zero, a starting 32 bit odd number is constructed from the "Time of Day". Eleven global words (in STORAGE) are reserved exclusively for storing the random starting numbers. They are initialized to zero. These words are GENERATE, ODDNO, and ODDNOX (with X = 1, 2, ..., 9). All registers are unchanged.

25. FACE-(&J FACE &NFACES,&RANDNO,&ODDNO)

FACE is a special variant of DICE which produces a random integer number (in &RANDNO) that lies in the range from 0 to one less than the number specified in &NFACES. For example, if &NFACES contains 3, &RANDNO will contain 0, 1 or 2. All registers are unchanged.
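The behavior of DICE and FACE can be illustrated with a minimal Python sketch. The text does not give the generator's arithmetic, so the multiplicative update, the constant MULTIPLIER, and the use of the high-order bits are all assumptions made for illustration; only the odd 32 bit starting number, the updating of the multiplicand after each use, and the output ranges come from the descriptions above.

```python
import time

MASK32 = 0xFFFFFFFF
MULTIPLIER = 65539  # hypothetical constant; the text does not name one


def dice(nobits, oddno=0):
    """Return (randno, new_oddno); randno fits in `nobits` bits."""
    if not 1 <= nobits <= 32:
        raise ValueError("nobits must be between 1 and 32")
    if oddno == 0:
        # Build a 32 bit starting number from the clock and force it odd.
        oddno = (time.time_ns() & MASK32) | 1
    # Assumed update step: odd times odd stays odd, modulo 2**32.
    oddno = (oddno * MULTIPLIER) & MASK32
    randno = oddno >> (32 - nobits)  # right-adjusted result
    return randno, oddno


def face(nfaces, oddno=0):
    """Return (randno, new_oddno) with 0 <= randno < nfaces."""
    value, oddno = dice(32, oddno)
    return (value * nfaces) >> 32, oddno
```

With `nfaces` set to 3 the first returned value is always 0, 1 or 2, which matches the example given for FACE.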

26. TRUNC-(&J TRUNC &FROM,&TO)

The normalized (short) floating point address &FROM is truncated to an integer number and stored in address &TO. All registers are unchanged.

2. Calling Procedures

Special calling procedures are used in the SOLID System to branch between the CONTROL routine and the components or to branch among the components themselves. The three types of calling procedures have already been illustrated (FIG. 15). The macro-instruction CALL1 (see below) is the first type. TRANSFER is the second type. GLOBAL is a special version of TRANSFER which is used exclusively for calling the Global Memory component, SMEMORY. CALL2, CALL3, CALL4, and CALL5 are the third type of calling procedure.

There are two macro-instructions (STSR and TESTL) which are associated with the seven special calling instructions (see Table 2, Class 2). TESTL is nested in the CALLX (X = 1, ..., 5) and TRANSFER macro-instructions. The STSR instruction is the first executable instruction in each component. Together the TESTL and STSR macro-instructions permit the use of V-type relocatable addresses (in MODADI and MODADX) for all components of the SOLID System. With a CALLX (X = 1, ..., 5) instruction the USING and BRANCH registers are altered. No register is changed by using either the GLOBAL or TRANSFER instructions.

In the following macro-instructions &NAME, &NAME1, etc. are the relocatable addresses (V-type) of the designated components. In the SOLID System the name of a component is its relocatable address preceded by an S (e.g., ANPAKC (relocatable address); SANPAKC (name)). &ALINST is the return address in the CONTROL routine. &RETURN is the return address in the CONTROL routine or in the component in which the call is issued.

27. CALL1-(&J CALL1 &NAME,&ALINST)

After the branching to the component S&NAME (entry point &NAME) control is returned to location &ALINST in the CONTROL routine. The USING(&UR) and BRANCH(&RR) registers specified for component S&NAME are altered.

28. CALL2-(&J CALL2 &NAME1,&NAME2,&ALINST)

Components S&NAME1 and S&NAME2 are executed before control is returned to location &ALINST in CONTROL.

29. CALL3-(&J CALL3 &NAME1,&NAME2,&NAME3,&ALINST)

The components with relocatable addresses &NAME1, &NAME2 and &NAME3 are executed before returning to the address &ALINST in the CONTROL routine.

30. CALL4-(&J CALL4 &NAME1,&NAME2,&NAME3,&NAME4,&ALINST)

The four components whose relocatable addresses are given in the instruction are executed before returning to address &ALINST in the CONTROL routine.

31. CALL5-(&J CALL5 &NAME1,&NAME2,&NAME3,&NAME4,&NAME5,&ALINST)

The indicated five components are executed before control is returned to address &ALINST.

32. GLOBAL-(&J GLOBAL &ADDL,&NTRKS,&RETURN)

This instruction can be used anywhere in the SOLID System to call the Global Memory component, SMEMORY. The return address (&RETURN) can be in the CONTROL routine or anywhere in the component where the call is issued. All registers are unchanged. &ADDL is the length of the composite addresses, and &NTRKS is the number of tracks or records in each memory-block (see Table 1).

33. STSR-(&J STSR &UR,&RR,&DUMMY)

This instruction, which is the first executable statement in each component, defines addressability and restores registers 0, 1, 14 and 15 which are used in V-type addressing. &UR and &RR are the using and branch registers for the component. &DUMMY is DUMMY (for separate compilation of components) or SOLID (if the component is positioned by the SUBMP service-macro). Further details are given in the subsection on Components, Section C.

34. TESTL-(&J TESTL &NAME)

&NAME is the relocatable address of the component (e.g., the relocatable address of component SANPAKC is ANPAKC). This instruction appears in the calling procedures for components (viz. CALLX, TRANSFER, and GLOBAL) and the input/output routines (see below). Together with the STSR instruction, TESTL ensures that the registers 0, 1, 14, and 15 will be restored after executing a branch and link instruction to a V-type address.

35. TRANSFER-(&J TRANSFER &N,&NAME,&RETURN)

TRANSFER is the so-called global calling procedure (see Section C). &N is the assigned level of the component S&NAME. &RETURN is defined above. If a TRANSFER instruction is issued within a component, then &N must be greater than the assigned level of the component. All registers are unchanged.

3. Input/Output Calling Procedures

Assembly Language Processor (ALP) input/output packages for tapes and cards are a part of the SOLID System. They are global calling procedures and can be used anywhere in the SOLID System. The DCB statements for these packages are in the INOUT macro-instruction (see Reserve). Thus, these DCB statements can be changed without recompiling the individual input/output components. Here two forms of the calling procedures are noted.

i. Card/Printer Operations

The print (PR1NT), read (REIDY, Y = 1, 2, ..., 8) and punch (PUNXH) operations in the SOLID System are extremely versatile. A single variable (&FORMAT) completely designates the formats (i.e., A, I, J, B, E, F, mixed (e.g., IFB), or X). If certain kinds of errors are made in these instructions, the operation is aborted and an appropriate error message is printed. The addresses in these operations can be either named locations or singly subscripted variables. The two-register form of IBM addressing (i.e., D2(X2,B2)) is not permitted.

36. PR1NT-(&J PR1NT &FORMAT,&FROM,&TO)

All four-byte words (for numerical data) or bytes (for alphanumeric data) between the addresses &FROM and &TO+4 are printed in the fields and formats designated by &FORMAT. The first address (&FROM) must be located on a word boundary. If &FROM is greater than &TO the operation is aborted with the printed message: ADDRESSING ERROR. All registers are unaltered.

37. PUNSH-(&J PUNSH &FORMAT,&FROM,&TO)

PUNSH is a special form of the PUNXH operation which punches (on cards) the information between addresses &FROM and &TO in the X or column binary format. Unlike PUNXH, PUNSH does not have an associated component. No register is changed.

38. PUNXH-(&J PUNXH &FORMAT,&FROM,&TO)

Information between addresses &FROM and &TO+14 is punched on cards in the field(s) and formats designated by &FORMAT. Format options are A, I, B, E, F, mixed (e.g., IEF) or X. The address &FROM should be located on a single word boundary. Output obtained from the PUNXH operation can be read with the REIDY (Y = 1, 2, ..., 8) operation. If &FROM is greater than &TO the PUNXH operation is aborted with the printed message: ADDRESSING ERROR. All registers are unaltered.

39 to 47 REIDY, with Y = 1, 2, ..., 8.

There are eight separate card-read instructions with the form:

&J REIDY &FORMAT,&W1,&W2,...,&WY

&W1, &W2, ..., &WY are the addresses of the single variables or arrays where the information is to be stored in the designated format, &FORMAT (see Chapter V). The REIDY operation can be used to simultaneously load up to eight separate locations or arrays. The formats can be A, I, B, J, E, F, mixed or X. At the end of each completed card-reading operation the total number of bytes that were stored is in location JII. If any address (&W1, &W2, etc.) is in protected storage (i.e., in the O/S system or CONTROL routine) or the &FORMAT conflicts with the number of variables (Y), the REIDY operation is aborted with an appropriate error message. No register is changed.

ii. Tape Operations

In the SOLID System there are two tape-read (REIDT and REIDJB) and one tape-write (WR1TE) macro-instructions which are used in the input (SREIDT), job-list (SJOBLIST) and output (SOUTPUT) components. The DCB statements associated with these three macro-instructions are in the INOUT macro-instruction. &ADDRESS is the location in memory where the reading (from tape) or writing (on tape) is to begin. The location &JII contains either the number of bytes read into memory (for REIDT and REIDJB) or the number of bytes that are to be written on tape (for WR1TE). Registers are unchanged in these operations. In their present form, all tape DCBs have a block-size of 3000 bytes and two buffers. Variable length records (up to 3000 bytes) are used for all tapes. However, there is no limit for the value in location &JII.

48. REIDJB-(&J REIDJB &ADDRESS,&JII)

This instruction uses the tape DCB statement TAPEJB (see INOUT in the Appendix). This tape normally contains the index data that is to be translated (or rearranged) to the JOB-LIST ITEM form which is used to trace the information path through the AUXILIARY FILE. The DCB name is TAPEJB and the DDNAME = COPAK8. No register is changed.

49. REIDT-(&J REIDT &ADDRESS,&JII)

In this macro-instruction the retrieval command (i.e., MODE = 0 (retrieve) or MODE ≠ 0 (store or update)) determines which of two tapes will be read. If MODE = 0, compressed bulk information, which will be decompressed by COPAK, is read with the TAPEINC DCB. For MODE ≠ 0, bulk (or referenced) information is read with the TAPEIND DCB. This information will be processed by the COPAK compressor and stored on the device indicated by the output command, OUTPXT (see Chapter V). The DDNAMES of TAPEINC and TAPEIND are COPAK5 and COPAK4 respectively. No register is changed.

50. WR1TE-(&J WR1TE &ADDRESS,&JII)

The &JII bytes of information beginning at &ADDRESS are written on the tape with DCB TAPEOTC. The DDNAME of the TAPEOTC DCB is COPAK6. All registers are unaltered.

4. Extended IBM and Message Operations.

There are seven macro-instructions in Class 4. Four of these (CSSCRN, DUMADD, LARGEXC, and SKIPP) are extensions of existing IBM 360 ALP instructions. The remaining three (DECPC, HEXPC, and TALE) are used to print messages. Brief descriptions of these Class 4 general Service Macros are given below.

51. CSSCRN-(&J CSSCRN &HLGTH,&ADD1,&ADD2)

This instruction compares the items beginning in locations &ADD1 and &ADD2 over the number of bytes given in the half-word &HLGTH. &HLGTH can contain any positive integer number up to 65,535. If &HLGTH contains zero the first 256 bytes are compared. Condition code settings are the same as those for the IBM CLC (Compare Logical Character) instruction. The registers are not changed by the CSSCRN instruction.
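The comparison rule for CSSCRN can be illustrated with a short Python sketch. The function name and the choice to return the condition code as an integer are assumptions for illustration. The code values follow the CLC convention (0 for equal operands, 1 when the first operand is low, 2 when the first operand is high), and both items are assumed to hold at least as many bytes as are compared.

```python
def csscrn(hlgth, item1, item2):
    """Compare the leading bytes of two items, returning a CLC-style condition code."""
    if not 0 <= hlgth <= 65535:
        raise ValueError("hlgth must fit in an unsigned half-word")
    # A zero length selects the first 256 bytes, as stated for CSSCRN.
    count = 256 if hlgth == 0 else hlgth
    first = bytes(item1[:count])
    second = bytes(item2[:count])
    if first == second:
        return 0  # operands equal
    # bytes objects compare as unsigned byte strings, left to right.
    return 1 if first < second else 2
```

For example, `csscrn(3, b"ABC", b"ABD")` yields 1, while limiting the comparison to two bytes yields 0 because the items agree in their first two bytes.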

52. DECPC-(&J DECPC &NAME,&LOC,&DISCR)

This instruction prints a message about the decimal-ten number in &LOC thus: &NAME=&LOC &DISCR. If &DISCR=SECS, the number in &LOC is printed as a fixed-point number with two decimal places. Otherwise &LOC appears as an integer number. If &DISCR=RATE then BYTES PER SEC. is printed for &DISCR. &NAME and &DISCR are strings of up to 20 alphanumeric characters. No register is altered by the DECPC instruction.

53. DUMADD-(&J DUMADD &BRANCH,&DUMMY)

This instruction generates the IBM 360 ALP statement:

&DUMMY EQU &BRANCH

The address that is to be dummied, &DUMMY, is set equivalent to the address &BRANCH. The DUMADD macro-instruction is used extensively in the "SUBMP" type instructions of the category Reserve.

54. HEXPC-(&J HEXPC &NAME,&LOC,&LENGTH)

In this instruction, a string of hexadecimal characters, whose length (in bytes) is in the full word &LENGTH is printed thus:

&NAME=HEXADECIMAL STRING

The descriptive label (&NAME) can be any string of alphanumeric characters up to 19 bytes long. If &LENGTH contains a number greater than 65 or less than one, it is set to 65. The HEXPC instruction does not change any register.
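A minimal Python sketch of the HEXPC length rule follows. The function name and the returned string are illustrative assumptions; the printed layout on the actual printer is not specified beyond the form &NAME=HEXADECIMAL STRING.

```python
def hexpc(name, loc, length):
    """Format `length` bytes of `loc` as a labelled hexadecimal string."""
    # Lengths outside the range 1 to 65 are replaced by 65, as stated for HEXPC.
    if length > 65 or length < 1:
        length = 65
    return f"{name}={bytes(loc[:length]).hex().upper()}"
```

A call such as `hexpc("SOS", b"\x01\xab\xff", 2)` produces `SOS=01AB`.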

55. LARGEXC-(&J LARGEXC &NBYTES,&TO,&FROM)

LARGEXC is the extended form of the IBM XC (Exclusive Or Character) instruction. The half-word &NBYTES contains the number of bytes (up to 65,535), beginning in the address &FROM, that are to be exclusive-OR'ed into the string beginning in address &TO. If &NBYTES contains zero LARGEXC is not executed. If it contains a negative half-word number, then the first 256 bytes are exclusive-OR'ed. All registers are unchanged.
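Since LARGEXC extends the XC (exclusive OR) instruction, its length handling can be sketched in Python as follows. The function name and the use of a mutable bytearray for the target string are assumptions for illustration.

```python
def largexc(nbytes, to, frm):
    """Exclusive-OR bytes of `frm` into the bytearray `to`, in place."""
    if nbytes == 0:
        return  # a zero count means the instruction is not executed
    # A negative count selects the first 256 bytes, as stated for LARGEXC.
    count = 256 if nbytes < 0 else nbytes
    for i in range(count):
        to[i] ^= frm[i]
```

Applying the operation twice with the same source string restores the original target, which is the usual property of an exclusive OR.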

56. SKIPP-(&J SKIPP &LC)

&LC pages are skipped on the printer. All registers are saved. The instruction: SKIPP 1 skips one page.

57. TALE-(&J TALE &DASH,&MESSAGE,&NOTIMES)

This instruction prints the message START OF NEW &MESSAGE, &NOTIMES times. &DASH is the background in which the message is printed. Thus TALE ASTERICK,SEGMENT,2 will print (extended over 133 characters):

**********START OF NEW SEGMENT**********

**********START OF NEW SEGMENT**********

All registers are unchanged.

5. String Movement Instructions

There are three service-macros which greatly facilitate the movement of strings of bytes in core-storage. In the first two instructions (LMOVE and RMOVE), the halfword &NOBYTES contains the number of bytes that are to be moved, beginning in address &FROM, to begin in location &TO. The addresses cannot be doubly subscripted in the RMOVE and LMOVE instructions. In all three macro-instructions all the registers are saved.

58. LMOVE-(&J LMOVE &NOBYTES,&TO,&FROM)

If &NOBYTES ≤ 0 the LMOVE instruction is not executed. The maximum number of bytes that can be moved is 65,535. LMOVE is the extended form of the IBM MVC instruction, which moves the bytes in strings from left to right. No registers are changed.

59. RMOVE-(&J RMOVE &NOBYTES,&TO,&FROM)

This instruction is similar to LMOVE except that the bytes in the strings are moved from right to left. Thus RMOVE can be used to displace a string (of length &NOBYTES) right. &TO and &FROM are the addresses of the leftmost bytes. No registers are changed.

60. RMVC-(&J RMVC &D1,&IR,&B1,&D2,&B2)

In this special form of the RMOVE macro-instruction &D1(&B1) is the address of the last byte in the string's new location. &D2(&B2) is the address of the last byte in the string that is to be moved. &D1 and &D2 are displacements. &B1, &B2 and &IR are registers. &IR contains the number of bytes which are to be moved. If &IR contains zero or is negative the RMVC instruction is not executed.

The reverse move macro-instructions (RMOVE or RMVC) are much slower than the LMOVE or MVC instructions. The speed of the LMOVE and RMOVE instructions primarily determines the speed of the compressor and decompressor components of COPAK. These two instructions would perform best if implemented in hardware.
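The reason for providing both a left-to-right and a right-to-left move can be seen when the source and target strings overlap. The following Python sketch models core storage as a bytearray and addresses as indexes; both choices, and the function names, are assumptions for illustration.

```python
def lmove(core, nobytes, to, frm):
    """Move `nobytes` bytes from index `frm` to index `to`, left to right."""
    if nobytes <= 0:
        return  # not executed for a zero or negative count
    for i in range(nobytes):
        core[to + i] = core[frm + i]


def rmove(core, nobytes, to, frm):
    """Move `nobytes` bytes from index `frm` to index `to`, right to left."""
    if nobytes <= 0:
        return
    # Starting at the rightmost byte keeps a rightward, overlapping
    # displacement from overwriting bytes that have not been moved yet.
    for i in range(nobytes - 1, -1, -1):
        core[to + i] = core[frm + i]
```

Displacing the six bytes `ABCDEF` two positions to the right with `rmove` gives `ABABCDEF`, whereas the same call with `lmove` propagates the first two bytes and gives `ABABABAB`. For a displacement to the left the roles are reversed, and `lmove` is the safe choice.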

b. Special Service-Macros

There are 64 macro-instructions entered in MACROPAK that are exclusively associated with the 31 components of the SOLID System. Two of these, BEGINS and MJARRAY, are used to initialize various aspects of the entire SOLID System. Another three (DEVICE, GETJLIST, and STRING), which use the General Service-Macro Input calling procedures, handle input for the SOLID System and its subsystems.

One special service-macro, SAVINGS, computes both the percentile savings and the thruput rate for the COPAK Compressor. The remaining 58 special service macros perform the special bit, byte, string, and search operations that are an essential part of the SOLID System.

The last 58 special service macros, together with the String Movement and Arithmetic Operations (see Section B (a)), determine to a very substantial degree the speed of compression, decompression, retrieval, storage, purging, and updating. The principal service-macros can be hardwired in the SOLID System.

In this Section, B(b), the 64 special service-macros are discussed in terms of the nine groups of components (see Section C). For example, there are two components, OPENSHUT and SSTATECL, which initialize the SOLID System. Also, there are six components in the COPAK compressor that use the twenty-nine compressor special service macros. In general all the special service macros have been designed to perform highly selective tasks at particular locations in the SOLID System. Registers, arrays, gates, and counters are changed to achieve the designed purpose. They cannot be used outside their designated environments.

1. Initializing Operations

61. BEGINS-(&J BEGINS)

This instruction, which appears near the beginning of the SSEARCH component, is the initializing routine for the global memory (SMEMORY) component. The physical characteristics of the virtual memory devices (i.e., the number of cylinders, tracks/cylinder, and the number of devices) are recorded in arrays CHECK, SOS, BWX, and LSX.

62. MJARRAY-(&J MJARRAY &ADDL)

MJARRAY is used in the principal initializing component, SSTATECL. It generates the M-J arrays when the SOLID System is used for the first time. The MJARRAY macro-instruction is executed if NEWFILE appears on the first card in the data-deck. &ADDL is the number of bytes in the composite addresses (see Table 2).

2. Input Operations

Three macro-instructions (DEVICE, GETJLIST, and STRING) are used in the CONTROL routines to read (from cards) twenty-three of the 27 input commands for the SOLID System and its subsystems. Here the functions of the three macro-instructions will be discussed.

63. DEVICE-(&J DEVICE &INPUT,&OUTPUT,&SKIPS,&SLENGTH,&LLENGTH,&RNOS,&CORDS).

The DEVICE and STRING (see below) macro-instructions are special calling procedures for executing the SCOMMAND component. With DEVICE the seven device commands are read from a single card. The default options for each device command are executed in the component SCOMMAND.

64. GETJLIST-(&J GETJLIST &JLINPXT,&JLRSKIP,&JLTRAN,&JLNORM,&KLENGTH,&NJOBS,&NTAKS,&NVALUE,&JVALUE,&NUMDIAG,&GENERATE)

GETJLIST is a special calling procedure for executing the component SJOBLIST which is the supervisory component for the TRANSLATION PACKAGE. The first eight commands, which are associated exclusively with the control and use of the TRANSLATORS, are read from a single card. The last five commands are associated exclusively with the random generation of JOBLIST ITEMS by Monte Carlo Generators. These five commands are read (from a single card) only if the input command &JLINPXT=16.

65. STRING-(&J STRING &MODE,&POSTOP,&LEXCON,&LEXMODE,&LEXPCH)

STRING is a special calling procedure for executing the second half of the input component SCOMMAND. The five string commands are read from a single card.

3. Compressor

There are 29 special service macros associated with the COPAK Compressor, which has ten components (see Section C). Four of the special service macros (COPAB, COPAJ, JIMP1, and SINCOP) and two components (SANPAKJ and SNUPAK) are special variants that are used in the stand-alone Numeric, COPAKNU, and Alphanumeric, COPAKAN, versions of COPAK. The relationships between the components of the compressor are discussed hereinafter. Here it is sufficient to note the following:

i. Three components handle input (SREADC and SREADT) and output (SOUTPUT) for the compressor part of the SOLID System.

ii. Two components (SCOMMAND and SSTATECL) set up the strings of information for processing by the COPAK compressor. SSTATECL also initializes the AUXILIARY FILE.

iii. The actual compression/decompression (of compressed information) is done by one special service macro (COPAB, COPAJ, or COPAK), which calls the components when it is used. Table 3 shows the relationships between the special service macro and the compressor/decompressor components they call.

TABLE 3.

Decompressor/compressor components. "Combined" means both Numeric and Alphanumeric Information. ##SPC6##

Strings of information are divided into segments, which are handled by the alphanumeric components (SANPAKC, SANPAKJ and SANPAKD). Segments are divided into substrings which are processed by the numeric components (SNUPAB and SNUPAK). Detailed descriptions of the composition of the strings before and after compression will be found elsewhere.

The 29 special service macros associated with the COPAK compressor are identified in the following three categories:

Calling Procedures: Three macro-instructions (COPAB, COPAJ and COPAK) are used to call the compressor/decompressor components (see Table 3).

Alphanumeric: Sixteen macro-instructions are associated exclusively with the three alphanumeric components (SANPAKC, SANPAKD and SANPAKJ).

Numeric: Ten special service macros are associated exclusively with the two numeric components (SNUPAB and SNUPAK).

A description of the 29 special service macros in their three categories begins next.

i. Calling Procedures:

A single macro-instruction in each CONTROL routine is used to execute the compressor/decompressor parts of the several different forms of the COPAK compressor.

66. COPAB-(&J COPAB &RADD)

This special service macro uses the CALL1 instruction for calling the stand-alone numeric component (SNUPAB). After executing SNUPAB control returns to address &RADD in the CONTROL routines (COPAKNU and COPAKNUO). The COPAB Macro-instruction is associated exclusively with the "SUBMP" type instructions SUBCB and SUBCBO.

67. COPAJ-(&J COPAJ &RADD)

COPAJ uses the CALL2 instruction for calling SANPAKJ then SANPAKC for the stand-alone alphanumeric compressors (COPAKAN and COPAKANO). &RADD is the return address in the CONTROL routines. COPAJ is used only when the "SUBMP" type instructions SUBCJ and SUBCJO are used.

68. COPAK-(&J COPAK &RADD)

COPAK uses the CALL3 instruction to call SANPAKD, SNUPAK, then SANPAKC (in that order) before returning control to the address &RADD in the CONTROL routines. COPAK is used in the CONTROL routines for the SOLID System (SOLIDE and SOLIDO) and for the stand-alone combined compressor (COPAKCO and COPAKCOO). Four "SUBMP" type instructions (SUBCE, SUBCO, SUBME, and SUBMO) are associated with COPAK.

ii. Alphanumeric

The distribution of the sixteen Special Service-Macros among the several components is shown in Table 4.

TABLE 4.

Distribution of the Sixteen Alphanumeric Special Service Macros among the indicated components. ##SPC7##

Two macro-instructions, OPCORDS and PPCORDS, organize and punch the permanent cords table (PCORDS) respectively. The PCORDS table is used by SANPAKC when it operates in the fast mode, (as described hereinafter). Thruput rates and percentile savings for the Compressors are computed in SAVINGS, which is executed at the end of SANPAKC. In production situations SAVINGS should be removed. The JIMP1 macro-instruction is used only in the stand-alone alphanumeric compressor (COPAKAN and COPAKANO). It performs the substring manipulations that are normally executed in the numeric component (SNUPAK). The remaining twelve special service macros perform highly specialized bit, byte, and string manipulative operations.

69. BBM-(&J BBM &JII,&CODE,&CORD1)

The BBM macro searches the first &JII bytes of a string for the repeated occurrences of the high order byte in &CODE. A bit-map is constructed to denote the relative positions in the string where the repeated byte occurs. The resulting bit-map and its length are stored beginning at the location &CORD1. All registers are preserved by this macro.

70. BBMD-(&J BBMD &JII,&CODE)

BBMD disassembles the bit-map which was constructed by the BBM macro. The associated code and bit-map are removed from the head of the string. The code is stored in &CODE. The addresses of the occurrences of the code in the string are determined by disassembling the bit-map and are stored in the array TL, which is defined in JUNKR. &JII is reduced by the number of bytes in the bit-map minus two and the string is moved left the same number of bytes. All registers are preserved by this macro.
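The relationship between BBM and BBMD can be illustrated with a minimal Python sketch. The bit ordering (one bit per byte position, leftmost bit first) and the return of positions as a list standing in for the array TL are assumptions; the text does not give the exact layout of the bit-map or of its length field.

```python
def bbm(string, jii, code):
    """Build a bit-map marking where `code` occurs in the first `jii` bytes."""
    bitmap = bytearray((jii + 7) // 8)
    for pos in range(jii):
        if string[pos] == code:
            # Assumed layout: bit 0 of the map is the leftmost bit of byte 0.
            bitmap[pos // 8] |= 0x80 >> (pos % 8)
    return bytes(bitmap)


def bbmd(bitmap, jii):
    """Recover the positions recorded in a bit-map built by bbm."""
    return [pos for pos in range(jii)
            if bitmap[pos // 8] & (0x80 >> (pos % 8))]
```

For the eight-byte string `A  B   C` and the blank as the code byte, the map is the single byte X'6E', and disassembling it returns positions 1, 2, 4, 5 and 6.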

71. CCD-(&J CCD &JII,&NBB)

The CCD macro searches a string for the repeats of an R-byte pattern which is located &NBB bytes from the beginning of a string. Only the &JII-&NBB bytes following the pattern in the string are searched. The addresses of the repeats of the pattern are stored in the array TL. This macro also determines whether or not a savings can be made by comparing N*(R-1) with R+2. N is the number of repeats of the pattern. If N*(R-1) > R+2, then a savings can be made. Registers 0 to 3 are altered in this macro.
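The search and the savings test performed by CCD can be sketched in Python as follows. The sketch assumes that repeats do not overlap, that N counts only the repeats found after the pattern itself, and that R is passed as an argument; none of these details is stated in the text. The returned list stands in for the array TL.

```python
def ccd(string, jii, nbb, r):
    """Find repeats of the r-byte pattern at offset nbb and test for a savings."""
    pattern = string[nbb:nbb + r]
    repeats = []
    pos = nbb + r  # only bytes following the pattern are searched
    while pos + r <= jii:
        if string[pos:pos + r] == pattern:
            repeats.append(pos)
            pos += r  # assumed: repeats do not overlap
        else:
            pos += 1
    n = len(repeats)
    # Savings criterion given for CCD: N*(R-1) > R+2.
    return repeats, n * (r - 1) > r + 2
```

With a four-byte pattern the threshold R+2 is 6, so two repeats (2*3 = 6) do not yield a savings while three repeats (3*3 = 9) do.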

72. F1ND-(&J F1ND &JII,&CODE)

The F1ND macro searches the first &JII bytes of a string for the single byte which is contained in &CODE. The addresses where the byte occurs are stored in the array TL. Registers 0 to 3 are altered by the F1ND macro.

73. JAR-(&J JAR &JII,&NR)

The JAR macro combines the three low order bytes of &JII and the low order byte of &NR into a single machine word. &JII is increased by four bytes before the word is constructed. This composite control word is then inserted at the head of the string addressed by the register BRYY. This control word is used to decompress the compressed strings of information.

74. JIMP1-(&J JIMP1 &RADD)

JIMP1 contains the numeric special service macros SOSCODE, ENDSS, and STRINGA, which are normally executed in the component SNUPAK (see (iii) Numeric below). JIMP1, which appears at the end of component SANPAKJ (see Section C), is used for the stand-alone alphanumeric compressor, COPAKAN and COPAKANO.

75. LAND-(&J LAND &JII,&NR)

The LAND macro removes the four bytes of control information placed at the head of the string by the JAR macro. The four bytes are then disassembled into two full words (&JII and &NR). &JII is decreased by four to its original value. &JII and &NR occupy three and one bytes in the composite control word. Registers 0, 1 and 2 are changed by this macro.
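The JAR and LAND macros are inverses of one another, which a short Python sketch makes explicit. The placement of &JII in the three leftmost bytes of the control word, the big-endian byte order, and the modelling of the string as an immutable bytes object are assumptions for illustration.

```python
def jar(string, jii, nr):
    """Prepend the four-byte control word; return (new_string, new_jii)."""
    jii += 4  # the length is increased before the word is constructed
    word = ((jii & 0xFFFFFF) << 8) | (nr & 0xFF)
    return word.to_bytes(4, "big") + string, jii


def land(string):
    """Strip the control word; return (string, jii, nr) as they were before jar."""
    word = int.from_bytes(string[:4], "big")
    jii = (word >> 8) - 4  # restore the original length
    nr = word & 0xFF
    return string[4:], jii, nr
```

Packing the four-byte string `DATA` with &NR = 7 gives the control word X'00000807' followed by the data, and `land` returns the original string, length 4 and &NR = 7.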

76. LEX-(&J LEX &JII,&CODE,&CORD1)

This macro places the high order byte in &CODE and the R-bytes in &CORD1 at the head of a string. The string is moved to the right R+1 bytes to accommodate the addition. &JII is increased by the number R+1. Register 2 is altered by this macro.

77. LEXD-(&J LEXD &JII,&CODE,&CORD1)

The LEXD macro removes the first R+1 bytes of a string, storing the first byte in &CODE and the following R bytes in &CORD1. The string is moved to the left R+1 bytes. &JII is reduced by the number R+1. Registers 0 and 1 are changed by this macro.

78. MAS-(&J MAS &JII,&NBB,&CORD1)

The MAS macro creates an R-byte opening in a string beginning &NBB bytes from the head of the string. The R-bytes contained in &CORD1 are then substituted for the single byte which was located &NBB bytes from the head of the string. &JII is increased by R-1. Registers 0, 1 and 2 are changed by this macro.

79. OPCORDS-(&J OPCORDS &RADD)

In this macro-instruction, which appears at the beginning of the SANPAKD component (see Section C), some indicator messages are printed and the timing of the compression and decompression steps is begun. In production runs, the statements which perform these functions would be removed. The principal purpose of OPCORDS is to select those &TPCORD permanent cords with the highest saving ratios. &TPCORD is one of fourteen variable parameters which is set before SOLID is compiled. &RADD is the beginning location in the SANPAKD component of the string disassembly macro-instruction (STRINGD).

80. PPCORDS-(&J PPCORDS &RADD)

The PPCORDS macro-instruction appears at the beginning of the SOUTPUT component. &RADD is the first instruction in SOUTPUT which follows PPCORDS. The principal function of this macro is to print and punch (in column binary) the table of permanent cords (PCORDS) if the LEXMODE command is zero.

81. RAND-(&J RAND &JII,&REPEATS)

The RAND macro removes a single composite byte from the head of the string. The left four bits of the byte are incremented by one and stored in &REPEATS. The right four bits of the byte are stored in RM. RM is incremented by one and stored in R. &JII is reduced by 1. Registers 0 to 3 are changed by this macro.

82. RRL-(&J RRL &JII,&REPEATS)

The RRL macro constructs a composite byte which is to be added to the head of a string. The left four bits of the byte contain the number in &REPEATS decreased by one. The right four bits of the byte contain the number in RM. The composite byte is inserted at the head of the string. &JII is increased by 1. Registers 0, 1 and 2 are altered by this macro.
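The composite byte built by RRL and taken apart by RAND can be sketched in Python as follows. Passing RM as an argument and returning all values as a tuple are assumptions for illustration; in the macros these quantities live in named locations.

```python
def rrl(string, jii, repeats, rm):
    """Prepend the composite byte; return (new_string, new_jii)."""
    if not (1 <= repeats <= 16 and 0 <= rm <= 15):
        raise ValueError("repeats-1 and rm must each fit in four bits")
    composite = ((repeats - 1) << 4) | rm
    return bytes([composite]) + string, jii + 1


def rand(string, jii):
    """Strip the composite byte; return (string, jii, repeats, rm, r)."""
    composite = string[0]
    repeats = (composite >> 4) + 1  # left four bits, incremented by one
    rm = composite & 0x0F           # right four bits
    r = rm + 1                      # pattern length R
    return string[1:], jii - 1, repeats, rm, r
```

Storing each count decreased by one lets four bits represent the values 1 to 16 rather than 0 to 15. For &REPEATS = 3 and RM = 4 the composite byte is X'24'.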

83. SAM-(&J SAM &JII,&NBB,&CODE)

SAM substitutes the single byte contained in &CODE for the R-bytes which are located &NBB bytes from the head of a string. The trailing bytes of the string are moved left R-1 bytes to eliminate the spaces created during the substitution. Registers 0, 1 and 2 are changed by the SAM macro.
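SAM and the earlier MAS macro perform opposite substitutions, which the following Python sketch illustrates. Passing R explicitly to `sam`, taking R from the length of the cord in `mas`, and reducing the length by R-1 in `sam` are assumptions for illustration; the text states the change in &JII only for MAS.

```python
def sam(string, jii, nbb, r, code):
    """Replace the r bytes at offset nbb by the single byte `code`."""
    new_string = string[:nbb] + bytes([code]) + string[nbb + r:jii]
    return new_string, jii - (r - 1)  # assumed length adjustment


def mas(string, jii, nbb, cord):
    """Replace the single byte at offset nbb by the R bytes in `cord`."""
    r = len(cord)
    new_string = string[:nbb] + cord + string[nbb + 1:jii]
    return new_string, jii + (r - 1)  # &JII is increased by R-1
```

Substituting the code byte X'FF' for the pattern `ABCD` in `xxABCDyy` shortens the string from eight bytes to five, and applying `mas` with the same pattern at the same offset restores the original eight bytes.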

84. SAVINGS-(&J SAVINGS &RADD)

The percentile savings in storage and the thruput rate (expressed in bytes/second of uncompressed information) are computed in SAVINGS, which appears at the end of SANPAKC. Both the percentile savings and thruput rate are printed if MODE ≠ 0 (storage and update modes). Just the thruput rate is printed if MODE = 0 (retrieval). SAVINGS would be omitted during production runs. &RADD is a dummy address.

iii. Numeric

Two of the ten special numeric service macros (CONSTRCT and EXTRAKT) assemble and decompose the segment control word (PARM), which contains the segment length and the number of substrings. Another two macro-instructions (SOSCODE and ENDSS) ensure that the compressor (NUPAKC) and decompressor (NUPAKD) parts will process each segment of information one substring at a time. SOSCODE also redefines the state-of-substring control word, SOS. Of the remaining three macro-instructions, two (STRINGA and STRINGD) assemble and decompose compressed substrings respectively. The macro-instruction COPAKEND, which appears in the output component (SOUTPUT), executes the post-operation commands.

85. CONSTRCT-(&J CONSTRCT &PARM,&JII,&NV)

The four byte composite status-of-segment control word, &PARM, is constructed with &JII in the leftmost three bytes and &NV in the rightmost byte. &JII is the total number of bytes in the segment of information. &NV is the number of substrings in the segment. &JII and &NV are four byte words. The CONSTRCT operation leaves all registers unchanged.

86. COPAKEND-(&J COPAKEND)

The COPAKEND macro-instruction, which appears at the end of component SOUTPUT, executes the post-operation commands POSTOP(=LJ), NJOBS (number of bulk-storage items), and NTASKS (number of ITEMS in Joblist). These commands are:

LJ<0: increment LJ by one and, if LJ=0, set SWITCH=0 and LEXPCH=1. This means that for the next segment of information the alphanumeric compressor (SANPAKC) will operate in the fast mode and the PCORDS table will not be punched.

Decrement NJOBS by one. If NJOBS >0 control goes to the location CARDREAD in the CONTROL routine, where the next string is read.

If, after decrementation, NJOBS ≤ 0, then NTASKS is examined. For NTASKS > 0 control goes to the location LOOKFILE in the CONTROL routine. Normally the RETRIEVAL PACKAGE is entered at LOOKFILE. For NTASKS ≤ 0 control goes to the macro-instruction DISPENSE or DISPOSE at location ANSWER in the CONTROL routine. Other options of the POSTOP command are executed in DISPENSE and DISPOSE.

87. endss-(&j endss &radd)

the ENDSS macro-instruction appears in the SNUPAK and SNUPAB components after the SOSCODE, NUPAKC, and NUPAKD macro-instructions. In ENDSS the new four byte status-of-substring composite control word (SOS) is constructed. During decompression the absolute error check for each decompressed substring occurs. If an error is detected, control passes to location RTRANMIT in the macro-instruction DEVICES (see Section A), where the error procedure component (SACTION) is called (see Section C). If no errors are detected, and some substrings are still to be processed, control returns from ENDSS to location &RADD in SNUPAB or SNUPAK.

88. EXTRAKT-(&J EXTRAKT &JII,&NV,&PARM)

This macro-instruction is the reverse of the CONSTRCT instruction. The number of bytes in the segment (&JII) and the number of substrings (&NV) are extracted from the four byte composite status-of-segment control word, &PARM. &JII and &NV are four byte words. All registers and &PARM are unchanged.
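By way of illustration only, the packing performed by CONSTRCT and its reversal by EXTRAKT may be sketched in Python as follows. The function names, the use of Python integers in place of System 360 storage words, and the range checks are assumptions made for this sketch; the text does not state what the macro-instructions do when a value does not fit.

```python
def constrct(jii: int, nv: int) -> int:
    """Build the four byte status-of-segment word: JII in the leftmost
    three bytes, NV in the rightmost byte."""
    if not 0 <= jii < (1 << 24):
        raise ValueError("JII must fit in three bytes")  # assumed check
    if not 0 <= nv < (1 << 8):
        raise ValueError("NV must fit in one byte")      # assumed check
    return (jii << 8) | nv


def extrakt(parm: int) -> tuple[int, int]:
    """Recover (JII, NV) from the composite word without altering it."""
    return parm >> 8, parm & 0xFF


parm = constrct(1000, 12)   # 1000 bytes in the segment, 12 substrings
jii, nv = extrakt(parm)
```

Because one word carries both values, a segment in this sketch is limited to fewer than 2**24 bytes and at most 255 substrings.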

89. NUPAKC-(&J NUPAKC &RADD)

NUPAKC is the numeric compressor macro-instruction which is used in the numeric compressor components, SNUPAB and SNUPAK. After a substring is processed by NUPAKC, control goes to location &RADD in SNUPAB or SNUPAK for execution of the ENDSS macro-instruction.

90. NUPAKD-(&J NUPAKD &RADD)

NUPAKD is the numeric decompressor instruction in the component SNUPAK. After processing a substring by NUPAKD, control is returned to the ENDSS instruction in SNUPAK or SNUPAB at location &RADD. NUPAKD reverses the steps executed in NUPAKC.

91. SINCOP-(&J SINCOP &RADD)

SINCOP is used in the numeric stand-alone compressor component (SNUPAB). It contains the macro-instruction STRINGD and those parts of component SANPAKD that are needed to process segments of information. Information about the segments is printed and timing of compression/decompression is begun in SINCOP. &RADD is a dummy address.

92. SOSCODE-(&J SOSCODE &RADD)

The SOSCODE macro-instruction appears near the beginning of the SNUPAK component (see Appendix). Its principal purpose is to initialize the substrings for processing by either the compressor or decompressor parts of the numeric compressor component, SNUPAK or SNUPAB. The sign and NDR, which occupies the rightmost four bits of the status-of-substring commands, are modified to divert control to either the macro NUPAKC or to component SANPAKC. &RADD is the instruction in SNUPAK which follows SOSCODE.

93. STRINGA-(&J STRINGA &RADD)

This macro is used near the end of the SNUPAK and SNUPAB components (see Appendix). Its principal function during decompression is to check that processing by the numeric decompressor part of SNUPAK yielded the correct number of bytes for the segment. Disagreement leads to the printing of an appropriate error message and control passes to the location RTRANMIT, for processing by the error-procedure component (SACTION).

During compression STRINGA inserts the information that is needed to decompress and check the substrings at the head of the segment of compressed information.

&RADD is the location of the instruction which follows STRINGA in the components SNUPAK or SNUPAB.

94. STRINGD-(&J STRINGD &RADD)

This macro extracts the control information that was inserted at the head of the segment by STRINGA during compression. This control information is used to check the alphanumeric decompression (just completed in SANPAKD) and is used by the NUPAKD macro-instruction to decompress the segment, one substring at a time. Additional error checks are made on each substring (in ENDSS) and also on the segment after processing by SNUPAK (in STRINGA). &RADD is the return address in the component SANPAKD or SANPAKJ.

4. TRANSLATION PACKAGE

The TRANSLATION PACKAGE is a subsystem of SOLID that produces normalized JOBLIST ITEMS from the assigned descriptor sets or generates them with random number generators. The JOBLIST ITEMS are used to trace, create, purge, or update the information paths in the AUXILIARY FILE.

The package consists of eleven components and eleven special service macros. One special service macro (GETJLIST), which has been described above (see macro 68), calls the TRANSLATION PACKAGE supervisory component (SJOBLIST) from the CONTROL routines SOLIDE and SOLIDO. SJOBLIST reads the thirteen TRANSLATION PACKAGE commands and executes them. SJOBLIST performs the following six functions:

i. Read TRANSLATION PACKAGE commands from cards.

ii. Read the assigned descriptor-set from the designated input device, or

iii. Generate JOBLIST ITEMS with random number generators.

iv. Translate the assigned descriptor-sets to the JOBLIST form.

v. Read the override data from cards.

vi. Normalize the JOBLIST ITEMS produced in functions iii and iv. The last function, vi, is not executed if the command JLNORM equals zero.

One component (SGENITEM), which is called from SJOBLIST by the special service macro JLITEM, generates the random JOBLIST ITEMS. Another six special service macros (BITTHROW, JBLISTI, KERTHROW, LBLTHROW, MJTHROW and SQUEEZE) are associated exclusively with the component SGENITEM.

The special service macro TRANLATE calls the five TRANSLATOR components from SJOBLIST. There are provisions for including up to 255 different TRANSLATORS. The TRANSLATOR components, which rearrange the assigned descriptor-sets to the JOBLIST form, must be coded for each new collection of items.

The override data is read (from cards) by a single special service macro (OVERRIDE) in SJOBLIST.

The NORMALIZATION PACKAGE consists of four components (SNORMAL, SCYCLIC, SREFLECT, and SXCHANGE) and one special service macro, NORMFORM. However, three of these components (the TRANSFORMATION PACKAGE) are also used by the MOBILE CANONICALIZATION PACKAGE. NORMFORM is used in SJOBLIST to call the normalization supervisory component, SNORMAL.

Seven of the ten special service macros that are to be described below require an understanding of the JOBLIST ITEM structure, which follows. ##SPC8##

LJBI is the number of bytes in the JOBLIST ITEM. M is the number of nested information representations, which have ranks m_1, m_2, m_3, ..., m_M. LD_0 is the principal diagonal of the Information Representation (IR) of rank

and LL_0 - 2 is its length. BD_i and LD_i (i ≠ 0) are the left-adjusted second bit-map (B_2) diagonal and the associated LABEL diagonal respectively. (LB_i - 2) and (LL_i - 2) are the lengths of the screens BD_i and LD_i. A JOBLIST item terminates with an asterisk attached to the last non-zero diagonal of LABEL. A JOBLIST consists of NTASKS JOBLIST items, whose total length is found in JLL. A Bit Map Item consists of a Bit-Map Head (LB_i) and a Bit-Map Screen (BD_i). An Information Representation Item consists of an I.R. Head (i.e., LL_i) and an I.R. Screen (i.e., LD_i). Bit-Map and I.R. elements are the bits and elements in their screens.

95. BITTHROW-(&J BITTHROW &NUMBITS,&BITMAP)

This macro-instruction generates a pseudo-random Bit Map Item, which contains a two-byte Bit-Map Head and the Bit-Map Screen. The Bit-Map Head contains the length of the Bit-Map Screen plus two. &NUMBITS contains the number of elements (i.e. bits) in the screen. &BITMAP is the beginning address of the Bit-Map Item. All registers are unchanged.
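The layout of a Bit Map Item produced in this way may be sketched in Python as follows. The sketch is illustrative only: the function name, the use of Python's random module as the pseudo-random source, the measurement of the screen length in bytes, and the zero padding of the final byte are assumptions. Only the two-byte head holding the screen length plus two, and the left adjustment of the screen, are taken from the description.

```python
import random


def bitthrow(numbits: int, rng: random.Random) -> bytes:
    """Return a Bit Map Item: a two-byte Bit-Map Head followed by a
    left-adjusted Bit-Map Screen of `numbits` pseudo-random bits."""
    if numbits <= 0:
        raise ValueError("the screen must contain at least one bit")
    nbytes = (numbits + 7) // 8                  # screen length in bytes (assumed unit)
    pad = nbytes * 8 - numbits                   # unused low-order bits, left as zero
    screen = (rng.getrandbits(numbits) << pad).to_bytes(nbytes, "big")
    head = (len(screen) + 2).to_bytes(2, "big")  # head = screen length plus two
    return head + screen


item = bitthrow(12, random.Random(0))            # two-byte head, two-byte screen
```

A seeded generator is used so that a test run can be repeated with the same Bit Map Items.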

96. JBLISTI-(&J JBLISTI &JBLIST,&MVALUE,&JVALUE,&NUMDIAG)

JBLISTI constructs a pseudo-random JOBLIST item in the array &JBLIST(=JBLIST). Location &MVALUE contains the stipulated value of `M`. &JVALUE contains the maximum value for each of the m_i (1 ≤ i ≤ M) values in the first screen (see above). &NUMDIAG contains the number of Bit-Map Item - I.R. Item pairs that are to be constructed. No register is altered.

97. JLITEM-(&J JLITEM &JBLIST,&MVALUE,&JVALUE,&NUMDIAG)

This macro-instruction appears in the supervisory component (SJOBLIST) and it calls the random JOBLIST ITEM Generator Component, SGENITEM. Registers 2 through 5 are loaded with the addresses of the four variables, in that order. If the &JBLIST array is less than 256 bytes long, an error message is printed and execution is terminated. The four variable parameters (&JBLIST, &MVALUE,&JVALUE and &NUMDIAG) are defined above.

98. KERTHROW-(&J KERTHROW &JBLIST)

KERTHROW constructs a random principal diagonal for the information representation (i.e., LD_0 above) and stores it in the array JBLIST at the location &JBLIST. The principal diagonal, which is used as the second screen in the JOBLIST item, has no `zero` I.R. elements. Registers 5 through 15 are not changed.

99. LBLTHROW-(&J LBLTHROW &JAYBITS,&BITMAP)

This macro-instruction produces a single I.R. Item that is associated with the Bit Map Item which begins in location &BITMAP. &JAYBITS contains the number of I.R. elements that are to be generated. All registers are unchanged.

100. MJTHROW-(&J MJTHROW &JBLIST,&M,&JAY)

MJTHROW assigns to `M` the value of &M and then generates M random values of m_i (see above) whose values lie between one and &JAY. &JBLIST is the starting address in the array JBLIST of the current JOBLIST ITEM. &M contains the assigned number of nested representations (M). &JAY contains the maximum rank for the M nested representations. If &M ≤ 0 or &JAY ≤ 1 an error message is printed and execution is terminated. All sixteen registers are unchanged.
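The parameter checks and the generation of the ranks may be sketched in Python as follows. This is a minimal illustration: the function name, the use of an exception in place of the printed error message and termination, and the reading of "between one and &JAY" as inclusive of both limits are assumptions.

```python
import random


def mjthrow(m: int, jay: int, rng: random.Random) -> list[int]:
    """Return M pseudo-random ranks, each between 1 and `jay` inclusive."""
    if m <= 0 or jay <= 1:
        # The macro prints an error message and terminates execution here.
        raise ValueError("M must exceed 0 and JAY must exceed 1")
    return [rng.randint(1, jay) for _ in range(m)]


ranks = mjthrow(4, 9, random.Random(1))   # four ranks, none larger than nine
```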

101. NORMFORM-(&J NORMFORM &JBLIST,&NTASKS,&JLL,&KLENGTH)

This macro-instruction is used in the TRANSLATION PACKAGE supervisory component (SJOBLIST) to call the principal NORMALIZATION PACKAGE component (SNORMAL). The four variable parameters of NORMFORM are dummy parameters intended to indicate the key information that is needed by SNORMAL. &JBLIST(=JBLIST) is the beginning address of the JOBLIST. &NTASKS (=NTASKS) contains the number of items in the JOBLIST. &JLL (=JLL) contains the total length of the JOBLIST. &KLENGTH (=KLENGTH) contains the number of bytes in each element of the I.R. NORMFORM does not change any register.

102. OVERRIDE-(OVERRIDE )

The OVERRIDE macro-instruction is executed in component SJOBLIST when control returns from TRANLATE. Information about the number of Type 1, Type 2, and Type 3 override codes (which is automatically collected by the TRANSLATORS) is used in OVERRIDE to read updating information from cards. Registers 3 and 4 are changed.

103. SQUEEZE-(&J SQUEEZE &NUMBER,&ADDRESS)

SQUEEZE removes the `zero` or `empty` elements from a single I.R. Screen and modifies the I.R. Head accordingly. &NUMBER contains the number of I.R. elements in the I.R. Screen. &ADDRESS is the beginning address of the associated Bit Map Item. All registers are unaltered.
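The effect of SQUEEZE on one I.R. Item may be sketched in Python as follows. The sketch is a simplification under stated assumptions: the item is passed as a byte string rather than located through the address of its Bit Map Item, the element width is passed explicitly as `klength` (after the KLENGTH parameter described for NORMFORM), and the two-byte I.R. Head is taken to hold the screen length in bytes plus two, by analogy with the Bit-Map Head.

```python
def squeeze(item: bytes, klength: int) -> bytes:
    """Drop all-zero elements from an I.R. Screen and rebuild the I.R. Head.

    `item` is a two-byte head (screen length + 2) followed by the screen,
    which is a run of elements of `klength` bytes each.
    """
    length = int.from_bytes(item[:2], "big") - 2
    screen = item[2:2 + length]
    if klength <= 0 or len(screen) != length or length % klength:
        raise ValueError("head and screen are inconsistent")  # assumed check
    elements = [screen[i:i + klength] for i in range(0, length, klength)]
    kept = b"".join(e for e in elements if any(e))   # keep non-zero elements
    return (len(kept) + 2).to_bytes(2, "big") + kept


# Three two-byte elements (5, 0, 7); the empty middle element is removed.
before = (8).to_bytes(2, "big") + bytes([0, 5, 0, 0, 0, 7])
after = squeeze(before, 2)
```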

104. TRANLATE-(&J TRANLATE &ARRAY,&NTASKS,&JLL,&KLENGTH)

This macro-instruction is used in the supervisory component of the TRANSLATION PACKAGE, SJOBLIST, to call the five TRANSLATOR components (STLATORX, with X=1,2,...,5). These TRANSLATORS rearrange the assigned descriptor-sets to the JOBLIST form. Timing of searches begins, and information about the TRANSLATOR is printed, in TRANLATE. The translator gate (TGATE), which is a command read in SJOBLIST, determines which of the five TRANSLATORS will be used. There are provisions in TRANLATE for incorporating up to 255 different translators. When control returns from the selected TRANSLATOR component, NOVER1, NOVER2 and NOVER3 contain the number of occurrences of Type 1, Type 2 and Type 3 override codes. The locations of these override codes in the JOBLIST are stored in the three primary override-arrays. This information is used by the OVERRIDE macro-instruction in the component SJOBLIST to read update information from cards. The variables are the dummy parameters defined for NORMFORM.

5. Transformation Package

The TRANSFORMATION PACKAGE consists of three components (SCYCLIC, SREFLECT, and SXCHANGE). The TRANSFORMATION PACKAGE is used by both the TRANSLATION and RETRIEVAL PACKAGES.

6. Normalization Package.

At this time the NORMALIZATION PACKAGE consists of one empty shell component (SNORMAL) and its calling macro, NORMFORM (see Translation Package above). The NORMALIZATION PACKAGE will use the TRANSFORMATION PACKAGE and it will be executed in the supervisory component SJOBLIST after the TRANSLATORS have produced JOBLISTS (see 4. Translation Package). New special service macros will be incorporated here when they are implemented.

7. Retrieval Package

The RETRIEVAL PACKAGE will automatically retrieve, store, purge, or update the AUXILIARY FILE and/or the MAIN FILE with information produced by the Translation Package and, if necessary, the MOBILE CANONICALIZATION PACKAGE also.

The RETRIEVAL PACKAGE consists of 16 special service macros, two components (SSEARCH and SRESULT), and the MOBILE CANONICALIZATION and GLOBAL MEMORY PACKAGES. Another two special service macros (BEGINS and MJARRAY) and one component (SSTATECL) initialize the RETRIEVAL Package (see 1. Initializing Operations above). They will not be considered further here.

The GLOBAL MEMORY PACKAGE is called from the SSEARCH component by the TBADD macro-instruction whenever a memory-block is to be transferred between core-storage and virtual memory. The calling procedure, GLOBAL, has been described in (2. Input Operations). SRESULT, which prints search performance data, is called from the CONTROL routines SOLIDE and SOLIDO after each use of the RETRIEVAL PACKAGE. In production situations SRESULT will not be used.

The structure of the composite addresses, that are used in the AUXILIARY FILE to reference the memory-blocks, is:

D and DNO specify the type of device and its number respectively. TRK, CYLN, and FMADD are the track, cylinder, and fast-memory addresses (relative to the starting point) respectively. If a magnetic tape is specified, then the sixteen bits of TRK and CYLN specify the record number.

Three of the 16 special service macros (APART, ASADD, and COMPARE), which are used throughout the RETRIEVAL and GLOBAL MEMORY Packages, are used to decompose, assemble, and compare the component parts of the composite addresses. Another macro-instruction, BULK, updates the Bulk Storage Composite Address (also called BULK) after each new allocation of storage for referenced information. The macro-instruction DISPENSE is executed in the CONTROL routines SOLIDE and SOLIDO when control returns from components SSEARCH and SRESULT. Its primary function is to execute the post-operation commands which determine whether or not the COPAK compressor will be used. DISPOSE is a special form of DISPENSE that is used in the stand-alone compressor CONTROL routines. The macro-instruction LINKHOLE, listed as instruction (115) below, ensures that unused storage in memory-blocks, released during purging operations, will be efficiently searched. The remaining nine of the 16 special service macros are associated exclusively with the heavily nested SSEARCH component.

105. APART-(&J APART &ADDRESS,&RD,&RDNO,&RTRK,&RCYLN,&RFMADD).

APART extracts the five component parts from the composite address (in location &ADDRESS) into the designated registers. If &RTRK and &RCYLN are the same register, a magnetic tape is specified, and both registers contain the record number. The composite address, in location &ADDRESS, and all other registers are unchanged.

106. ASADD-(&J ASADD &ADDRESS,&RD,&RDNO,&RTRK,&RCYLN,&RFMADD)

This macro-instruction assembles the composite address in location &ADDRESS from its five component parts, in the designated registers. If &RTRK and &RCYLN are the same register, a magnetic tape is specified, and the record number is placed in the twelve bits normally occupied by RTRK and CYLN (see above).

107. AUXFILE-(&J AUXFILE &LSLOW,&ADDL)

AUXFILE is executed in the SSEARCH component whenever the terminal location of an information path has been found or created. The terminal locations are in RFILE, and they contain the address(es) of the compressed reference information. AUXFILE inserts (storage or updating), deletes (purging), or collects (retrieval) the bulk-storage address(es) in the RFILE sub-arrays of the AUXILIARY FILE. New composite bulk storage addresses are assigned and the macro-instruction BULK is executed in AUXFILE. The two System Parameters (&LSLOW and &ADDL) are the lengths of the slow part of the composite address and of the entire composite address respectively. In our program &LSLOW=3 and &ADDL=6.

108. BULK-(&J BULK )

This macro-instruction is executed in AUXFILE after each new assignment of a bulk storage address (called BULK). It increments the record number in the composite address BULK by one.

109. COMPARE-(&J COMPARE &ADD1,&ADD2,&ADDL)

The first four parts (D, DNO, TRK and CYLN) in the composite addresses &ADD1 and &ADD2 are compared. &ADDL is the number of bytes (viz. six) in the addresses. The COMPARE instruction sets the condition code in the PSW of the System 360. &ADDL is a System Parameter which is set at compilation time.
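The comparison may be sketched in Python as follows. The sketch assumes that the four compared parts (D, DNO, TRK and CYLN) together occupy the three leftmost bytes of the six byte composite address, which is inferred from the statement elsewhere in this description that the peripheral equipment addresses occupy the three left bytes. The function name and the return values -1, 0 and 1, which stand in for the condition code, are likewise assumptions.

```python
ADDL = 6    # bytes in a composite address
LSLOW = 3   # leftmost bytes assumed to hold D, DNO, TRK and CYLN


def compare(add1: bytes, add2: bytes) -> int:
    """Compare the device portions of two composite addresses.

    Returns -1, 0 or 1 (low, equal, high); FMADD is ignored.
    """
    if len(add1) != ADDL or len(add2) != ADDL:
        raise ValueError("a composite address is six bytes long")
    a, b = add1[:LSLOW], add2[:LSLOW]
    return (a > b) - (a < b)   # unsigned, left-to-right byte comparison


# Same device, track and cylinder; only the fast-memory address differs.
same_block = compare(bytes([1, 2, 3, 9, 9, 9]), bytes([1, 2, 3, 0, 0, 0])) == 0
```

A result of zero indicates that both addresses refer to the same memory-block, whatever their fast-memory addresses.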

110. CREATE- (&J CREATE &ADDL,&LSLOW,&LFAST,&NTRKS,&TRKL,&MATRIXL,&MATRIXS)

The CREATE macro, which is called by TBADD in the SSEARCH component, is entered whenever a new sub-array is to be created in a memory-block. The new subarray may be needed to define a new subpath or it may be needed to extend an RFILE subarray. CREATE, which is closely interconnected with the global memory component (SMEMORY) via the TBADD instruction, is extremely complex. Without this special service macro the generalized retrieval system which we have devised would not exist. The seven variable names (of CREATE) are System Parameters that have been fully defined elsewhere in this disclosure. If &ADDL, &LSLOW, &LFAST, &NTRKS or &TRKL are changed, the AUXILIARY FILE must be started from scratch. &MATRIXL and &MATRIXS can be changed at any time.

111. CSCREEN-(&J CSCREEN &HLGTH,&ADD1,&ADD2)

CSCREEN is used in the SCREEN macro-instruction whenever the screen portion of an EXECUTIVE POINTER is to be compared with the corresponding screen in the JOBLIST item. &ADD1 is the beginning location of the EXECUTIVE POINTER screen. &ADD2 is the address of the JOBLIST item screen. &HLGTH is a half-word which contains the screen length. All registers are unchanged.

The byte JI is used to indicate the comparison status thus:

JI=00; the screens are equal.

JI=01; the first screen (&ADD1) is zero (i.e., the location is empty).

JI=02; the first screen (&ADD1) is less than the second (&ADD2).

JI=04; the first screen is greater than the second.
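The setting of the status byte JI may be sketched in Python as follows. The four values of JI are those listed above; the function and constant names, the unsigned byte-by-byte ordering of screens, and the testing for an empty location before testing for equality are assumptions made for this sketch.

```python
JI_EQUAL, JI_EMPTY, JI_LESS, JI_GREATER = 0x00, 0x01, 0x02, 0x04


def cscreen(ep_screen: bytes, item_screen: bytes) -> int:
    """Compare an EXECUTIVE POINTER screen with a JOBLIST item screen
    of the same length and return the status byte JI."""
    if len(ep_screen) != len(item_screen):
        raise ValueError("both screens must have the length given in HLGTH")
    if not any(ep_screen):
        return JI_EMPTY            # all-zero screen: the location is empty
    if ep_screen == item_screen:
        return JI_EQUAL
    return JI_LESS if ep_screen < item_screen else JI_GREATER


ji = cscreen(b"\x01\x01", b"\x01\x02")   # first screen is the smaller one
```

Since each status occupies its own bit position except equality, a caller can test the result with a single mask.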

112. DISPENSE-(&J DISPENSE &RADD)

DISPENSE is used in the CONTROL routines SOLIDE and SOLIDO immediately after the SSEARCH and SRESULT components are called. It is executed after each use of SSEARCH and again after each use of the COPAK Compressor, when it is called from COPAKEND in SOUTPUT. DISPENSE executes the Post-Operation (LJ), NTASKS, and NJOBS commands. Register 1 is altered. &RADD is the entry location (in the CONTROL routine) for reading input for the compressor.

113. DISPOSE-(&J DISPOSE &RADD)

DISPOSE is used in place of DISPENSE in the CONTROL routines for the six stand-alone compressors. The primary difference between DISPENSE and DISPOSE is the way they execute the NTASKS and NJOBS commands.

114. INSERT-(&J INSERT &ADDL,&LFAST,&LTHAYY)

INSERT is used in the macro-instruction SCREEN (see below) to insert new executive pointers in their correct positions in subarrays in the AUXILIARY FILE. It is executed only when an EXECUTIVE POINTER must be moved from a location in any subarray. &ADDL, &LFAST, and &LTHAYY are System Parameters.

115. LINKHOLE-(&J LINKHOLE &EPLNGTH,&FEMPTY)

LINKHOLE is used in the CREATE macro-instruction. It is executed only when a memory-block is completely used, and before a new memory-block is created. Its designed purpose is to reuse vacant storage areas in the resident memory-block that might have been released during purging operations. Implementation instructions have been incorporated in the macro (see Appendix). &EPLNGTH contains the length of the new Executive Pointer. &FEMPTY is the address of the first available subarray in the memory-block. All available subarrays are chained together via their link addresses.

116. MMATCH-(&J MMATCH &NOVER1,&NOVER2,&NOVER3,&CURSCRN,&JBLIST,&JBWORK,&KLENGTH)

This macro-instruction is executed in the SSEARCH component whenever a screen or index, which corresponds to a subpath, is not found. It is also executed during retrieval operations if there are override codes present in the JOBLIST items. If a storage operation is being performed, the signal, MSIGNAL, is set to indicate that an insertion is to be made in the resident memory-block and, after exiting from MMATCH, the insertion is made. The screening procedure, the CREATE macro, and the global-memory component (SMEMORY) are tied together by the MSIGNAL and JI (see CSCREEN) multi-bit signalling system. This signalling system polices the resident memory-block and notifies CREATE whether or not arrays are to be created. It also notifies SMEMORY what procedure is to be executed when a new memory-block is needed or when the job-stream is to be terminated. In retrieval operations MMATCH aborts the search if no overrides are present; otherwise it passes control to the MOBILE CANONICALIZATION PACKAGE via its calling procedure, STRATEGY. The last four variable names to MMATCH specify information needed in STRATEGY (see below). The screen procedures, MMATCH, and the MOBILE CANONICALIZATION PACKAGE are tied together by the SRGATE multi-bit signalling system.

&NOVER1=NOVER1 contains the number of Type 1 override codes.

&NOVER2=NOVER2 contains the number of Type 2 override codes.

&NOVER3=NOVER3 contains the number of Type 3 override codes. &CURSCRN,&JBLIST,&JBWORK, and &KLENGTH are dummy variable names for STRATEGY which are explicitly defined in the STRATEGY macro (see Appendix).

117. SCREEN-(&J SCREEN &ADDL,&LFAST,&ADD1,&LTHAYY)

The SCREEN macro-instruction is executed in the SSEARCH component whenever a screen is sought (in the resident memory-block) in one of the column arrays or its extensions. After initializing the counters and registers, if no override codes are present in the JOBLIST item, the location in the array where the executive pointer should be is determined by the SUPERSCH macro-instruction. If overrides are present, control will go directly to MMATCH, for processing by the MOBILE CANONICALIZATION PACKAGE. If the executive pointer selected by SUPERSCH contains the sought screen (i.e. the sub-path has been found) the tracing procedure (through TBADD) continues. If the two screens are not identical, one of the following can occur:

1. A retrieval job is terminated via the MMATCH macro with a message that the search was unsuccessful.

2. If a vacancy exists, control again goes to MMATCH for insertion of the new executive pointer, which defines a new subpath, and the tracing of the information path continues.

3. If no vacancy exists a hole is created for the new executive pointer and (2) occurs. The hole creating procedure is as follows: If the column-array has a vacant location then all executive pointers greater than the one to be inserted are moved and (2) occurs. If the array is filled, the last executive pointer (EP_L) is saved; a hole is created; the new executive pointer is inserted; the continuance or extension array is found (via the link-address) or created (via TBADD and CREATE); and SCREEN is used to search the new array for the location to insert EP_L. This procedure is repeated until all the executive pointers in an array and its continuances are arranged in increasing order, then the tracing (or creation) of sub-paths continues in the normal manner. The procedure that will be used to ensure that the information paths do not cross memory-blocks, thus eliminating costly additional accesses to the virtual memory, is somewhat analogous to the hole creating procedure that has just been described. However, in this case, the automatic purging (still to be implemented), hole-closing, and hole-creating capability of SCREEN must all be used. Implementation of this capability will require new service macros for SCREEN and TBADD. The four variable names in the SCREEN prototype statement are System Parameters, defined earlier.
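The hole creating procedure may be sketched in Python as follows. The sketch is illustrative only and rests on several assumptions: executive pointers are represented by integers, a column array and its continuances form a singly linked chain of equal capacity, purging is not modelled, and a new pointer larger than every pointer in a filled array is passed directly to the continuance array. The class and function names are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ColumnArray:
    capacity: int
    pointers: list = field(default_factory=list)   # kept in increasing order
    link: Optional["ColumnArray"] = None            # continuance array


def insert_pointer(array: ColumnArray, ep: int) -> None:
    """Insert `ep`, spilling the last pointer (EP_L) into continuances."""
    while True:
        pos = bisect_left(array.pointers, ep)
        if len(array.pointers) < array.capacity:
            array.pointers.insert(pos, ep)      # vacancy: larger pointers move up
            return
        if pos == array.capacity:
            overflow = ep                       # belongs beyond this filled array
        else:
            overflow = array.pointers.pop()     # save EP_L, which creates the hole
            array.pointers.insert(pos, ep)
        if array.link is None:                  # continuance created on demand
            array.link = ColumnArray(array.capacity)
        array, ep = array.link, overflow        # repeat the search for EP_L


def flatten(array: Optional[ColumnArray]) -> list:
    """Collect the pointers of an array and all of its continuances."""
    out = []
    while array is not None:
        out.extend(array.pointers)
        array = array.link
    return out


head = ColumnArray(capacity=3)
for ep in (50, 20, 40, 10, 30, 60, 5):
    insert_pointer(head, ep)
```

After these insertions the chain read from the first array through its continuances is in increasing order, which is the condition the procedure is repeated to reach.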

118. STRATEGY-(&J STRATEGY &CURSCRN,&JBLIST,&JBWORK,&LTHAYY).

This macro-instruction appears in the MMATCH macro. It is the entry macro for the MOBILE CANONICALIZATION PACKAGE (see below). STRATEGY is executed if any overrides are present. It interacts closely with SCREEN and TBADD via the multi-bit SRGATE command. &CURSCRN, &JBLIST, &JBWORK, and &LTHAYY are dummy variables which are fully explained in the STRATEGY macro in the Appendix.

119. SUPERSCH-(&J SUPERSCH &ADDL,&LFAST)

SUPERSCH is executed in the SCREEN macro-instruction. It is executed if no override codes are present. However, it can be entered from the macro-instruction STRATEGY. SUPERSCH does a partition search of an array to find where the new executive pointer should be located. The two System Parameters, &ADDL and &LFAST, have already been defined.
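A search of this kind may be sketched in Python as follows, on the assumption that "partition search" denotes repeated halving of an ordered array (a binary search). The description does not name the partitioning rule, so the halving, the integer representation of executive pointers, and the function name are all assumptions.

```python
def supersch(pointers: list, target: int) -> int:
    """Return the index at which `target` is found or should be inserted
    in the increasing list `pointers`."""
    low, high = 0, len(pointers)
    while low < high:
        mid = (low + high) // 2        # partition the remaining range
        if pointers[mid] < target:
            low = mid + 1              # target lies in the upper part
        else:
            high = mid                 # target lies in the lower part
    return low


slot = supersch([10, 20, 40], 30)      # 30 belongs between 20 and 40
```

The returned index either holds the sought pointer or marks the place where a hole must be made for it.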

120. TBADD-(&J TBADD &ADDL,&LSLOW,&LFAST,&NTRKS,&TRKL,&MATRIXL,&MATRIXS)

The link address (for continuance or extension arrays) or address in the executive pointer obtained by SCREEN is stored in ADDRESS, and control goes to the TBADD macro. The peripheral equipment addresses, which occupy the three left bytes of ADDRESS and CURRENT, are compared in TBADD to determine if the required memory-block (designated by ADDRESS) is resident in core. If it is not in core the global-memory component (SMEMORY) fetches it. If the create bit of the MSIGNAL signal byte is on, TBADD directs CREATE to create a new column-array with &MATRIXL or &MATRIXS locations for executive pointers. The variables for TBADD are defined in the section on System Parameters.

8. Global Memory

The fully implemented GLOBAL MEMORY consists of one component (SMEMORY) and two special service macros (DCBMEM and GLOBAL). The calling procedure (GLOBAL), which has already been described above, is used in the TBADD macro-instruction. GLOBAL MEMORY transfers memory-blocks between core-storage and virtual memory. The Global Memory component, SMEMORY, contains its own DCB or format statements, which are specified in the macro-instruction DCBMEM.

121. DCBMEM-(&J DCBMEM )

This macro-instruction appears at the end of the SMEMORY component. It contains DCB or format statements together with the IBM OPEN and CLOSE instructions for all the peripheral devices that can be used by the GLOBAL MEMORY. If new peripherals are added the initializing macro-instruction BEGINS, which appears in the SSEARCH component, must be altered. However, except for the recompilation of SSEARCH and SMEMORY, no other changes are necessary. One IBM 2314 disk is currently used for the virtual memory. Branching to the location CLOSE terminates the job-stream. The virtual memory peripheral devices are opened (in DCBMEM) when SMEMORY is used for the first time. The DCB name is GLOBAL1 and its DDNAME is COPAK7.

9. Mobile Canonicalization Package

This package consists of the calling procedure STRATEGY, which is described above, and two components (SMATCH and SMOBILE), which are referred to below. The MOBILE CANONICALIZATION PACKAGE can also use the TRANSFORMATION PACKAGE.

Component SMATCH will determine whether or not mismatches (in MMATCH) are solely due to the presence of one or more override codes. Component SMOBILE will be executed after SMATCH only if the NORMALIZATION PACKAGE was used (to obtain NORMAL FORMS). The design objectives for the various parts of the MOBILE CANONICALIZATION PACKAGE have been fully discussed in the article by P. A. D. deMaine and B. A. Marron, "The SOLID System I. A Method for Organizing and Searching Files," in the book "Information Retrieval: A Critical View." The book is edited by G. Schecter and was published by the Thompson Book Company of Washington, D.C. in 1967.

C. Components

Components are macro-instructions that are stored in the macro-library, SOLID.MACLIB, which have their own using statements to define addressability. In the components it is assumed that data about information that is being processed and the information itself will be found in certain locations and arrays whose addressability is established in the CONTROL routine. Except for this restriction, the components may be viewed as independent subprograms or subroutines. They can be separately compiled (as named CSECTS) for use in planned overlays (see CONTROL PROGRAMS), or they may be compiled with the CONTROL routine by inserting their ENTRY and prototype statements into the `SUBMP` type-instruction. This flexibility in the use of components facilitates the implementation or modification of components, and, with planned overlays, makes it easy to fit the SOLID System onto small 360 configurations. Moreover, with this flexibility the SOLID System can be spread over several partitions in a single computer or over several computers in a network to further improve its already impressive performance.

The calling procedures are used to branch between the CONTROL routine (viz. main-stem) and the components or among the components themselves. The three different types of calling procedures are illustrated in FIG. 15, and they are described in Section B. The points that are to be emphasized in this section, C, will be illustrated with the calling procedures CALL1 (prototype: &J CALL1 &NAME,&ALINST) and TRANSFER (prototype: &J TRANSFER &N,&NAME,&RETURN), and the alphanumeric compressor component (prototype: SANPAKC &NAME, &UR, &RR, &DUMMY). &NAME is the relocatable address of the component, obtained by dropping the first S from the component's name. &ALINST and &RETURN are return addresses. &UR and &RR are the USING and branch registers for the component. &RR can be any register except &UR or 10, 11, and 12, which are used to establish addressability in the CONTROL routine. The choice of the base or USING register (&UR) is generally restricted to 8, 9, and sometimes 7 (see below). &DUMMY designates whether the component was separately compiled as a named CSECT (&DUMMY=DUMMY) or if it was compiled in the CONTROL routine by inserting the prototype statement in the "SUBMP" type instruction (&DUMMY=SOLID). If &DUMMY is equal to DUMMY, the separately compiled component is stored in the module-load library, SOLID.LOAD, for use in planned overlays.

The instruction:

CALL1 ANPAKC,RANPAKC

executes the branch-and-link to the component (SANPAKC) and, on completion of the component's task, returns control to the location RANPAKC. Because the CALLX (X=1,2,...,5) instructions change the values of registers &UR, &RR, and others, they can only be safely used if the return address (&ALINST=RANPAKC) is in the CONTROL routine. In general, they cannot be used to branch between two components.

The TRANSFER instructions can be used to branch between the CONTROL routine and components or among components. In this instruction all registers are unchanged. Thus the return address, &RETURN, can be either in the CONTROL routine or in the component where the instruction was issued. This greater versatility has been achieved by assigning levels to every component. The level of the requested component is incorporated into the TRANSFER instruction. For example:

TRANSFER 1,ANPAKC,RANPAKC

means that control is to be transferred to component SANPAKC at the first level. The return address RANPAKC can be in the CONTROL routine or it can be in another component (where the TRANSFER was used). The rules for using the TRANSFER instructions are given next.

i. There are, in the given form, five levels available. This number can be increased by adding to the COSAVE&N (&N=1,2,...,5) arrays in the macro-instruction SAVEAREA.

ii. While every component has been assigned a specific level, new levels can be arbitrarily reassigned. However, a level assigned to the preceding members of a chain of components cannot be used. This means that with a CALL1 and five successive TRANSFER instructions control can be transferred from the CONTROL routine through five components. The return address for each TRANSFER instruction can be in the CONTROL routine or in the component which contains it.

iii. Components which have been assigned the level "GLOBAL" are not subject to the restrictions in (ii). They have their own special calling procedures, which can be used anywhere in the SOLID system. PRINT, PUNXH, REIDX (X=1,2,...,8) and GLOBAL are such calling procedures.
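Rules (i) and (ii) may be sketched in Python as follows. The sketch models only the bookkeeping of levels along a chain of components; it does not model register saving in the COSAVE&N arrays or the GLOBAL calling procedures of rule (iii). The class names, the use of an exception for a violated rule, and the levels given to the components in the example are hypothetical and are not taken from Table 5.

```python
MAX_LEVELS = 5   # levels available in the given form (rule i)


class LevelError(Exception):
    """Raised when a TRANSFER would break rule (i) or rule (ii)."""


class CallChain:
    """Tracks the levels in use along the current chain of components."""

    def __init__(self) -> None:
        self.active = []   # (level, component) pairs, oldest first

    def transfer(self, level: int, component: str) -> None:
        if not 1 <= level <= MAX_LEVELS:
            raise LevelError(f"level {level} is not available")
        if any(used == level for used, _ in self.active):
            raise LevelError(f"level {level} is already used in this chain")
        self.active.append((level, component))

    def leave(self) -> tuple:
        """Return from the most recently entered component."""
        return self.active.pop()


chain = CallChain()
chain.transfer(1, "COMPONENT_A")   # hypothetical component at level 1
chain.transfer(2, "COMPONENT_B")   # hypothetical component at level 2
```

A further transfer at level 1 or 2 is refused while both components remain in the chain, and becomes permissible again once the component holding that level has returned.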

The levels assigned to each component are shown in Table 5.

TABLE 5.

Levels and Classes of Components. The assigned levels are used in the TRANSFER instruction (see Text). The classes are the categories in this Section, C. ##SPC9##

The thirty-one components of the SOLID System (see Table 5) are identified in nine classes or categories. Two (OPENSHUT and SSTATECL) may be viewed primarily as components which initialize the SOLID System. One of these (OPENSHUT) opens and closes the peripheral devices (except virtual memory). Three (SPR1NT, SPUNXH, and SREID) of the seven I/O components perform the basic operations of reading or punching cards, and printing. The other four components (SCOMMAND, SOUTPUT, SREADC and SREADT) are used to communicate with the user. They, together with components SJOBLIST and SRESULT, handle the input and output for the SOLID System and its subsystems.

The six components in Class 3 are exclusively associated with the COPAK compressor subsystem. One of these six (SACTION), which is called from the macro-instruction DEVICES (see Section A), is executed when a decompression error is detected. Another two components, SANPAKJ and SNUPAB, are the special forms of SANPAKD and SNUPAK that are used in the stand-alone alphanumeric and numeric compressors.

The TRANSLATION PACKAGE contains seven components (Class 4). The supervisory component, SJOBLIST, also reads the Translation Package commands from cards and information about descriptor-sets. The TRANSFORMATION (Class 5) and NORMALIZATION (Class 6) packages contain three and one components respectively.

The RETRIEVAL PACKAGE (Class 7) contains two components, SRESULT and SSEARCH. The component SRESULT prints information after each use of the SSEARCH component. In production situations SRESULT can be omitted.

The GLOBAL MEMORY (Class 8) and MOBILE CANONICALIZATION (Class 9) packages contain one and two components respectively.

In the remainder of this section, C, short descriptions of the 31 components are given.

1. Initializing Components

122. OPENSHUT-(OPENSHUT )

This level 0 component is always compiled in the main stem (viz. CONTROL ROUTINE). It is positioned by the "SUBMP" type instruction (see Section A). It opens (at the beginning of each jobstream) and closes (during termination) the DCB's specified in the INOUT macro-instruction. All peripheral devices, other than those in the virtual memory, must be specified in INOUT and OPENSHUT. OPENSHUT, which is called in the macro-instruction DEVICES, uses registers 8 and 9 for USING and Branching.

123. SSTATECL-(SSTATECL &NAME,&UR,&RR,&ADDL,&LTHAYY,&DUMMY)

SSTATECL is called from the RESERVE macro-instruction at the start of each job-stream. It initializes addresses used in SSEARCH and generates or reads (from cards) the permanently resident part of the AUXILIARY FILE, which contains the array associated with the prime index, M, and the screen J (= m_1 m_2 m_3 ... m_M). If the first card of the data-deck is NEWFILE then the MJARRY macro-instruction generates the M-J arrays.

Level=1

&NAME is STATECL

&UR can be register 8 or 9

&RR can be any register except &UR, 10, 11 or 12.

&ADDL is the number of bytes in the composite addresses which preface a memory-block.

&LTHAYY is the number of bytes in the principal data-array, YY.

2. I/O Components

SPR1NT, SPUNXH and SREID are the basic card/printer I/O components for the SOLID System. SCOMMAND reads (from cards) both the device and string commands. SOUTPUT is the principal output component. It handles the compressed and decompressed referenced information. SREADC and SREADT read in the referenced information that is to be compressed or decompressed.

i. Basic Components

There are two output (SPR1NT and SPUNXH) and one input (SREID) components in the SOLID System which are used to print, punch cards and read cards. These components, which are called in the SOUTPUT and SREAD components, can be used (with the PR1NT, PUNXH, and REIDX macro-instructions) on a stand-alone basis. They are extremely versatile, and can be used anywhere in the SOLID System.

124. SPR1NT-(SPR1NT &NAME,&UR,&RR,&DUMMY)

The PR1NT service-macro calls the component SPR1NT, which prints the requested information in the designated format(s) on the printer. The DCB, PR1NT, is specified in the INOUT macro-instruction.

Level=GLOBAL

&NAME is PR1NT

&UR must be register 9; &RR can be any register except 9-12.

125. SPUNXH-(SPUNXH &NAME,&UR,&RR,&DUMMY)

This component is called by PUNXH (see Section B). The DCB, PUNXH, is specified in INOUT (see Section A).

Level=GLOBAL

&NAME is PUNXH.

&UR must be register 9.

&RR can be any register except 9, 10, 11, or 12.

126. SREID-(SREID &NAME,&UR,&RR,&DUMMY)

This component is called by the eight REIDX (X=1,2,-8) service-macros. It reads information from cards with the DCB named MASTER (specified in INOUT).

Level=GLOBAL

&NAME is REID.

&UR must be 9

&RR can be any register except 9, 10, 11 or 12.

ii. Special Components

127. SCOMMAND-(SCOMMAND &NAME,&UR,&RR,&DUMMY)

This component has two parts, both called from the CONTROL routines. The first part (&NAME=COMANDD) reads the device commands from cards (calling instruction is DEVICE). The second part (&NAME=COMANDS), which is called by STRING, reads the string commands. The default options for all input commands are set in SCOMMAND.

Level=1

&NAME=COMMAND, COMANDD or COMANDS.

&UR must be 8

&RR can be any register except 9, 10, 11 or 12.

128. SOUTPUT-(SOUTPUT &NAME,&UR,&RR,&DUMMY)

This output package for the COPAK compressor prints, punches, or writes on tape the information strings that are processed by COPAK. A format code, which is stored with the compressed string of information, is used to print or punch the output in the entry format type.

Level=1

&NAME is OUTPUT

&UR can be register 8 or 9

&RR can be any register except &UR, 10, 11, or 12.

129. SREADC-(SREADC &NAME,&UR,&RR,&DUMMY)

The SREADC component reads control information and the substrings of data on cards that are processed by COPAK. The PCORDS table is read and checked in SREADC. The status-of-substring control words (SOS) are modified and the status-of-segment control word (PARM) is constructed. The input commands are printed at the end of the SREADC component.

Level=1.

&NAME is READC.

&UR is register 8 or 9.

&RR can be any register except &UR, 10, 11, or 12.

130. SREADT-(SREADT &NAME,&UR,&RR,&DUMMY)

This component reads the compressed and/or decompressed information from magnetic tape. Compressed information is read with the DCB named TAPEIND (DDNAME=COPAK4) and uncompressed information is read with the DCB named TAPEINC (DDNAME=COPAK5). DCB's are specified in the INOUT macro-instruction.

Level=1

&NAME is READT

&UR can be either 8 or 9.

&RR can be any register except &UR, 10, 11 or 12.

3. Compressor

The stand-alone numeric (COPAKNU), alphanumeric (COPAKAN), and combined (COPACOKO) Compressors use different combinations of the six compressor components. The combined compressor, which is also used in the CONTROL routines SOLIDE and SOLIDO, consists of two alphanumeric (SANPAKC and SANPAKD) and one numeric (SNUPAK) component. COPAKAN contains SANPAKC and SANPAKJ, which is a modified form of SANPAKD. COPAKNU contains the modified form of SNUPAK, which is called SNUPAB. The SACTION component is called from the main stem whenever decompression errors occur.

131. SACTION-(SACTION &NAME,&UR,&RR,&DUMMY)

This component is called from the macro-instruction DEVICES in the "Reserve" type instruction (see Section A). Control passes to SACTION from the decompressor parts of the compressors whenever an error is found. Currently SACTION prints appropriate error messages and terminates the job-stream. Error correcting procedures and/or retransmission requests should be handled by SACTION.

Level=1

&NAME is ACTION

&UR can be any register other than 0, 1, or 10-15.

&RR can be any register other than &UR, 10, 11 or 12.

132. SANPAKC-(SANPAKC &NAME,&UR,&RR,&DUMMY)

This component compresses the strings of information by two recursive bit-pattern methods in one of two modes. In the SLOW-MODE the recursive bit-patterns that are used are obtained from the string itself, and those bit patterns which yield savings are stored in a table, PCORDS. In the FAST-MODE only those bit-patterns in the PCORDS table are used. The three input commands associated with SANPAKC (LEXCON, LEXMODE, and LEXPCH) provide the following options:

i. Build a new PCORDS by compressing the first X strings of information in the SLOW-MODE, then process all subsequent strings in the FAST-MODE.

ii. Read in PCORDS and operate exclusively in the FAST-MODE.

iii. Extend the PCORDS table by processing the first X strings in the SLOW-MODE and then switch to the FAST-MODE.

Level=1.

&NAME is the relocatable address ANPAKC.

&UR -- the base or using register -- can be 8 or 9.

&RR -- the branch register -- can be any register except &UR, 10, 11 or 12. In our programs &RR = 1.

&DUMMY is discussed above.
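The switch between the SLOW-MODE and the FAST-MODE described for SANPAKC can be sketched as a table-management policy. The sketch below is only an illustration in Python: it does not perform the recursive bit-pattern removal itself, the class `PatternTable` and the function `repeated_patterns` are hypothetical, and "a pattern that yields savings" is assumed here to mean simply a four-byte pattern that occurs more than once.

```python
from collections import Counter


def repeated_patterns(data: bytes, width: int = 4) -> set:
    """Patterns of `width` bytes that occur more than once in `data`."""
    counts = Counter(data[i:i + width] for i in range(len(data) - width + 1))
    return {p for p, n in counts.items() if n > 1}


class PatternTable:
    """Hypothetical stand-in for PCORDS plus the SLOW/FAST switch."""

    def __init__(self, slow_strings: int, initial=()):
        self.slow_strings = slow_strings   # the "first X strings"
        self.patterns = set(initial)       # options (ii) and (iii): preloaded table
        self.seen = 0

    def process(self, data: bytes) -> set:
        """Return the table patterns that are usable on this string."""
        if self.seen < self.slow_strings:  # SLOW-MODE: learn from the string itself
            self.patterns |= repeated_patterns(data)
        self.seen += 1
        # In FAST-MODE only patterns already in the table are considered.
        return {p for p in self.patterns if p in data}


table = PatternTable(slow_strings=1)          # option (i) with X = 1
first = table.process(b"ABCDxxABCDyy")        # SLOW-MODE: learns b"ABCD"
second = table.process(b"zzABCDzzQRSTQRST")   # FAST-MODE: b"QRST" is not learned
```

Option (ii) corresponds to `PatternTable(slow_strings=0, initial=...)` and option (iii) to a preloaded table with `slow_strings` greater than zero.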

133. SANPAKD-(SANPAKD &NAME,&UR,&RR,&DUMMY)

This component first decompresses strings compressed by SANPAKC then disassembles its decompressed strings for processing by the decompressor part of the numeric compression package (SNUPAK). Since all the information that is needed to decompress the strings is stored in the strings themselves, no additional data is needed.

Level=1.

&NAME is the relocatable address ANPAKD.

The restrictions on &UR, &RR and &DUMMY for SANPAKC apply for SANPAKD also.

134. SANPAKJ-(SANPAKJ &NAME,&UR,&RR,&DUMMY)

SANPAKJ is the modified form of SANPAKD that is used in the stand-alone alphanumeric compressors (COPAKAN). It contains the macro-instruction JIMP1, which performs those substring operations which are normally executed in SNUPAK.

Level=1

&NAME is the relocatable address ANPAKD.

&UR, &RR, and &DUMMY have the specifications given for SANPAKC and SANPAKD.

135. SNUPAB-(SNUPAB &NAME,&UR,&RR,&DUMMY)

SNUPAB is a modified form of SNUPAK that is used in the stand-alone numeric compressor (COPAKNU). SNUPAB contains the macro-instruction STRINGD, which is normally executed at the end of SANPAKD.

Level=1.

&NAME is the relocatable address NUPAK.

&UR can be 8 or 9

&RR can be any register except &UR, 10, 11 or 12.

&DUMMY is set equal to SOLID if SNUPAB is compiled in the main-stem (viz. extended form). It is set equal to DUMMY if SNUPAB is compiled separately as a named CSECT.

136. SNUPAK-(SNUPAK &NAME,&UR,&RR,&DUMMY)

SNUPAK is the numeric compressor-decompressor package in COPAK. It processes strings of information one substring at a time. Compression is accomplished by the four step procedure: truncation, differencing, sequencing, and packing. There are three truncation methods, two of which are automatic. If savings cannot be achieved compression is terminated without loss of information.

Level=1

&NAME is NUPAK.

&UR can be register 8 or 9.

&RR can be any register except &UR, 10, 11, or 12.
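The differencing step of SNUPAK, and the rule that compression stops when no savings are achieved, can be sketched as follows. This is a minimal illustration in Python and not the SNUPAK code: truncation, sequencing and packing are omitted, the function names are hypothetical, and the cost measure (total bit length of the magnitudes) is an assumption chosen only to decide whether another pass of differencing pays off.

```python
from itertools import accumulate


def difference(values):
    """Keep the first value, replace the rest by successive differences."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def undifference(diffs):
    """Inverse of difference(): a running sum restores the original values."""
    return list(accumulate(diffs))


def cost(values):
    """Assumed cost: total number of bits needed for the magnitudes."""
    return sum(abs(v).bit_length() for v in values)


def recursive_difference(values, max_order=3):
    """Difference repeatedly while each pass still yields savings."""
    order, current = 0, list(values)
    while order < max_order:
        candidate = difference(current)
        if cost(candidate) >= cost(current):
            break                      # no savings: stop without loss of information
        current, order = candidate, order + 1
    return order, current


def restore(order, diffs):
    """Undo `order` passes of differencing."""
    for _ in range(order):
        diffs = undifference(diffs)
    return list(diffs)


values = [1000, 1010, 1020, 1030, 1040]
order, packed_input = recursive_difference(values)
restored = restore(order, packed_input)
```

Because only the number of passes has to be recorded with the result, the substring remains self-defined in the sense used elsewhere in this disclosure.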

4. Translation Package

The TRANSLATION PACKAGE consists of seven components and the NORMALIZATION PACKAGE, which uses the TRANSFORMATION PACKAGE. One of the seven components, SJOBLIST, can be regarded as the supervisory routine for the entire Translation Package. The macro-instruction GETJLIST calls SJOBLIST from the CONTROL routines SOLIDE and SOLIDO.

SJOBLIST is also the input component for the Translation Package. It reads the Translation Package commands; generates or reads descriptor-sets; reads over-ride information; rearranges descriptor-sets to the JOBLIST item form; and normalizes the JOBLIST items. Random JOBLIST items are produced by the component SGENITEM. Five TRANSLATOR components (STLATORX, with X=1,2,...,5), which are called by the special service macro TRANSLATE, convert the descriptor-sets to their JOBLIST item form. There are provisions for incorporating up to 255 TRANSLATORS. JOBLIST items are converted to their NORMAL FORMS by the Normalization Package, which is called in SJOBLIST by the special service macro NORMFORM.

The TRANSLATION PACKAGE components are briefly described next:

137. SGENITEM-(SGENITEM &NAME,&UR,&RR,&DUMMY)

This component generates JOBLIST items in the array stipulated by its calling procedure (JLITEM), which appears in the component SJOBLIST. SGENITEM uses the information in registers 2, 3, 4 and 5 that were loaded in JLITEM.

Level=2

&NAME is the relocatable address GENITEM.

&UR can be register 8 or 9.

&RR can be any register except 2, 3, 4, 5, &UR, 10, 11 or 12.

&DUMMY has been defined above.

138. SJOBLIST-(SJOBLIST &NAME,&UR,&RR,&DUMMY)

SJOBLIST is the supervisory component for the TRANSLATION PACKAGE. Its functions have been briefly described above.

Level=1.

&NAME is JOBLIST

&UR can be either 8 or 9

&RR can be any register except &UR, 10, 11, or 12.

139. to 143. STLATORX, with X=1,2,3,4, or 5

There are five TRANSLATOR components which have prototype statements like: STLATOR1 &NAME,&UR,&RR,&DUMMY. The first, STLATOR1, is reserved for the AGISAR Translator, which rearranges automatically extracted data from N-dimensional graphs (or pictures) to the JOBLIST form. New Translators must be coded for each new collection of items. There are provisions in the special service macro TRANLATE, which appears in component SJOBLIST, for incorporating up to 255 Translators.

Level=3.

&NAME is TLATOR1 or TLATOR2 or TLATOR3 or TLATOR4 or TLATOR5.

&UR can be 8 or 9.

&RR can be any register except &UR, 10, 11 or 12.

5. Transformation Package

The TRANSFORMATION PACKAGE consists of three components whose design purposes are to execute the CYCLIC SHIFT, REFLECTION, and the INTERCHANGE Transformation Rules. These components can be used by both the NORMALIZATION and MOBILE CANONICALIZATION packages.

144. SCYCLIC-(SCYCLIC &NAME,&UR,&RR,&DUMMY)

The CYCLIC component will execute both the left and right cyclic shifts. Entry information needed in CYCLIC is defined in the component SNORMAL.

Level=3

&NAME is CYCLIC

&UR can be 8 or 9

&RR may be any register except &UR, 10, 11 or 12.

145. SREFLECT-(SREFLECT &NAME,&UR,&RR,&DUMMY)

This component can be called from SNORMAL or SMOBILE. It will execute the Reflection Rule.

Level=3

&NAME is REFLECT

&UR, &RR and &DUMMY are the same as for component SCYCLIC

146. SXCHANGE-(SXCHANGE &NAME,&UR,&RR,&DUMMY)

SXCHANGE executes the kernel Interchange Rule.

Level=3

&NAME is XCHANGE

&UR, &RR and &DUMMY have been specified for SCYCLIC.

6. Normalization Package

The NORMALIZATION PACKAGE contains one component, SNORMAL, which is called by the NORMFORM macro-instruction in component SJOBLIST. SNORMAL can use the TRANSFORMATION PACKAGE (see above) to obtain the NORMAL FORMS of JOBLIST items that are produced by the TRANSLATORS.

147. SNORMAL-(SNORMAL &NAME,&UR,&RR,&DUMMY)

This component is called by NORMFORM in the component SJOBLIST after the TRANSLATORS have been executed. Full details of its assigned role in the SOLID System are given in the publication by P.A.D. deMaine and B.A. Marron, mentioned earlier.

Level=2.

&NAME is NORMAL

&UR may be 8 or 9

&RR can be any register except &UR, 10, 11, or 12.

7. Retrieval Package

The RETRIEVAL PACKAGE contains two components, SRESULT and SSEARCH, and it uses the GLOBAL MEMORY and MOBILE CANONICALIZATION PACKAGE, which uses the TRANSFORMATION PACKAGE. In its present form the Retrieval Package can handle explicit retrieval and storage questions. With minor changes, in the MMATCH and AUXFILE macro-instructions, it would handle the explicit purging and updating tasks also. The fully implemented retrieval package is able to handle any kind of explicit, implied (or non-explicit), or browsing question.

The SSEARCH component and the GLOBAL MEMORY and MOBILE CANONICALIZATION PACKAGES are extensively interrelated. They must be regarded as the Central Core of the SOLID Retrieval System. The Global Memory Component (SMEMORY), which is called from the TBADD macro-instruction in SSEARCH, transfers the memory-blocks between core-storage and the preselected storage devices. The SSEARCH component automatically traces and/or creates the information paths in the resident memory-block. The MOBILE CANONICALIZATION PACKAGE, which is called from the MMATCH macro-instruction, is used in retrieval operations if override codes are present. It makes possible the automatic implied (or non-explicit), fragment, or browsing searches. The SRESULT component, which is called from the CONTROL routine after completion of a search, prints results obtained by the SSEARCH component. SRESULT can be changed to collect statistics on the performance of the retrieval system. Hard-copy of the stored bulk-information is produced by the principal output component, SOUTPUT.

The complex interdependence of SSEARCH and the two packages mentioned is described elsewhere in this disclosure.

148. SRESULT-(SRESULT &NAME,&UR,&RR,&LSLOW,&JBLIST,&DUMMY)

The SRESULT component is called from the CONTROL routine after the SSEARCH component has been executed. In its present form it constructs and prints an obvious message like: REQUESTED INFORMATION APPEARS ON PRINTER. It also prints the request JOB-LIST item and the BULK address(es) assigned or retrieved in RFILE. These address(es) are normally the location of the compressed referenced information in the bulk storage. SRESULT can be modified to collect and analyze performance data for the SOLID System.

Level=2

&NAME is RESULT.

&UR is register 8

&RR can be any register except &UR, 10, 11, and 12

&LSLOW is the length of the slow portion of composite addresses.

&JBLIST(=JBLIST) is the Joblist array whose address is in AJBLIST

&DUMMY has been defined.

149. SSEARCH-(SSEARCH &NAME,&UR1,&UR2,&RR,&ADDL,&LSLOW,&LFAST,&NTRKS,&TRKL,&LTHAYY,&JBLIST,&LBJLIST,&MATRIXL,&MATRIXS,&DUMMY)

This component uses the information in the JOB-LIST item(s) to retrieve (MODE=0), store (MODE=1), update or purge (MODE=2, 3 or 4) items in the AUXILIARY FILE and compressed referenced information in BULK STORAGE. Except for providing the retrieval command, MODE, all operations are fully automatic. Thus information sub-paths are automatically traced (MODE=0, 2, and 3) and/or created (MODE=1, 2 or 3) and/or purged (MODE=2 and 3) in fast memory with the JOB-LIST item information. The Global Memory component (SMEMORY) ensures that the correct memory-block is resident in core-storage. SMEMORY also updates the memory-blocks of the AUXILIARY FILE in the virtual memory (see SMEMORY, above). Protection features in SSEARCH ensure that the AUXILIARY FILE (in virtual memory) will never be altered by coding, operator, or machine errors.

The MOBILE CANONICALIZATION PACKAGE, which handles implicit and intersecting file questions, is called from the Service-Macro MMATCH.

Purging and updating operations in the AUXILIARY FILE will be executed in MMATCH. Thus the basic form of the SSEARCH component will never be altered.

Level=1

&NAME - the relocatable address is SEARCH.

&UR1 and &UR2 are the two USING registers (7 and 8) which establish addressability in the SSEARCH component.

&RR - the branch register - can be any register except 7, 8, 10, 11 or 12.

&DUMMY is defined above. The ten System Parameters were defined earlier in this disclosure.

8. Global-Memory

The GLOBAL MEMORY PACKAGE consists of one component (SMEMORY) and its calling procedure (GLOBAL), which has already been discussed. The DCB or format statement(s) and the opening and closing instructions for the Global Memory are specified in the macro-instruction DCBMEM, which is found at the end of component SMEMORY. If more storage is allocated then DCBMEM must be altered and the macro BEGINS, which is used in SSEARCH, must be changed. The Global Memory Package is fully described elsewhere in this disclosure.

150. SMEMORY-(SMEMORY &NAME,&UR1,&UR2,&RR,&ADDL,&NTRKS,&TRKL,&JBLIST,&DUMMY)

This component must be positioned by the SUBMP macro-instruction in the CONTROL routine at compilation time. SMEMORY supervises the transfer of the memory-blocks (in the AUXILIARY FILE) between core storage and the designated virtual memory devices. The two parts of SMEMORY accomplish the following:

Part A: (Relocatable Address MEMORY) This part is entered from the TBADD macro-instruction in the SSEARCH component if a new memory-block is needed. If necessary the currently resident memory-block is rewritten at its assigned location (in virtual memory) and then the requested memory-block is fetched from virtual memory. New memory-blocks, which can be created in core-storage, are automatically assigned storage areas in virtual memory.

Part B: (Relocatable address SAVEFM) This part is entered immediately before the Operating System of the 360 regains control to terminate the job. If the resident memory-block has been altered in any storage, updating or purging operation it is stored at the assigned location in virtual memory. It is suggested that the IBM O/S job terminating routine be modified to include a final call to Part B.

The service macro DCBMEM, which appears at the end of SMEMORY, specifies the DCB, OPEN and CLOSE instructions for SMEMORY. The macro-instruction BEGINS, which is used in SSEARCH, specifies the device numbers.

Level=GLOBAL

&NAME is MEMORY.

&UR1 and &UR2 are two base registers (7 and 8).

&RR -- the branch register -- can be any register except 7, 8, 9, 10, 11 or 12.

&ADDL is the length of the composite addresses.

&NTRKS is the number of tracks (or records) in a memory-block.

&TRKL is the length of the track (or record).

&JBLIST(=JBLIST) is the Joblist array whose address is in location AJBLIST.

&DUMMY is defined above.
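The behaviour of the two parts of SMEMORY can be pictured as a single resident block with write-back on replacement. The following Python sketch is an illustration under stated assumptions, not the SMEMORY code: the class `GlobalMemory`, its method names and the use of a dictionary in place of the virtual memory devices are all hypothetical.

```python
class GlobalMemory:
    """Hypothetical model of one resident memory-block with write-back."""

    def __init__(self):
        self.virtual = {}        # block number -> bytes; stands in for virtual memory
        self.resident_id = None  # number of the block now in core storage
        self.resident = None     # contents of the resident block
        self.changed = False     # plays the role of the "block changed" signal
        self.writes = 0          # count of rewrites, for illustration only

    def _save(self):
        # Rewrite the resident block only if it exists and was altered.
        if self.resident_id is not None and self.changed:
            self.virtual[self.resident_id] = bytes(self.resident)
            self.writes += 1
        self.changed = False

    def fetch(self, block_id):
        """Part A: make `block_id` resident, saving the old block if necessary."""
        if block_id != self.resident_id:
            self._save()
            # A block not yet in virtual memory is treated as newly created.
            self.resident = bytearray(self.virtual.get(block_id, b""))
            self.resident_id = block_id
        return self.resident

    def modify(self, data):
        """Any storage, updating or purging operation marks the block changed."""
        self.resident += data
        self.changed = True

    def save_at_termination(self):
        """Part B: final save before the job terminates."""
        self._save()


memory = GlobalMemory()
memory.fetch(1)
memory.modify(b"abc")
memory.fetch(2)            # block 1 is rewritten before block 2 is fetched
memory.modify(b"z")
memory.save_at_termination()
```

Only altered blocks are rewritten in this sketch, which mirrors the phrase "if necessary" in Part A.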

9. Mobile Canonicalization Package

The MOBILE CANONICALIZATION PACKAGE consists of a service macro (STRATEGY), which appears in MMATCH, and two components, SMATCH and SMOBILE. The component SMOBILE will use the TRANSFORMATION PACKAGE to rearrange previously normalized JOBLIST items that contain override codes. The component SMATCH will determine if mismatches occur only because overrides are present. SMATCH and SMOBILE will also perform the intersecting type of search.

151. SMATCH-(SMATCH &NAME,&UR,&RR,&DUMMY)

The information needed by the SMATCH macro has been given in the macro-instruction STRATEGY (see Appendix).

Level=2

&NAME is MATCH

&UR can be 7, 8, or 9

&RR can be any register except registers 7-12.

152. SMOBILE-(SMOBILE &NAME,&UR,&RR,&DUMMY)

SMOBILE is executed after SMATCH and then only if a mismatch could not be resolved. Information required by SMOBILE is stipulated in the macro STRATEGY (see Appendix).

Level=2

&NAME is MOBILE

&UR can be 7, 8 or 9.

&RR can be any register except registers 7-12.

&DUMMY is defined above.

RETRIEVAL PACKAGE

Overview:

The AUXILIARY FILE can be viewed as a maze or network that automatically grows, contracts, or is modified to exactly fill the indexing needs for each and every application of the SOLID Retrieval System. Each path through the maze is unique and terminates in a location which contains the address in bulk storage where the compressed referenced information is stored. The JOBLIST items, which are produced by the Translation Packages from the assigned descriptor-sets, are used to trace or create the subpaths that together define an information path. New subpaths are created only when they are needed during a storage or updating assignment. Subpaths are eliminated during purging and in some updating operations. The "length" of a path (or search) is determined solely by the number of decisions that are made while tracing the path, not by the operation (i.e. storage or retrieval or purging or updating) that will be performed at the bulk storage address.

In many respects this scheme is analogous to a telephone network. Each telephone number can be viewed as a unique description (viz. JOBLIST item) of a path from the subscriber's substation to another substation in the network. A telephone call made from the subscriber's substation will be aborted if any link (i.e. subpath) of the path does not exist. In the SOLID System the prime index, M, and the screen, J, are analogous to an "area code", and the other screens are descriptions of the intermediate substations that are to be linked for the telephone call. The analogy with a telephone network breaks down when the following facets of the SOLID System are considered.

a. Unlike telephone numbers, which are somewhat arbitrarily assigned to each subscriber, the JOBLIST items actually describe both the path and the referenced information. This means that assignment of "idiot numbers", like those in the National Compound Registry, are quite unnecessary.

b. When the proposed new components are implemented, the Retrieval Package will have a capability for "browsing", which has no parallel in telephone networks.

c. Unlike telephone networks, whose new substations must be created at quite rigidly prescribed locations, the SOLID System creates new paths or substations wherever storage is available.

The AUXILIARY FILE is divided into two parts. One of these parts, which resides permanently in the computer, is associated with the prime index M and the screens J. The second part is divided into memory-blocks and is stored in virtual memory. The Retrieval Package uses the JOBLIST items produced by the Translation Package and a single input command, MODE, to automatically execute all tracing, creating and purging operations in core storage. A global memory component, SMEMORY, transfers the memory blocks between virtual memory and core storage when they are needed. The Continuance Tables will be used to restrict all paths within single memory-blocks. This will ensure that each explicit storage, purge, retrieval, or update request can be executed with, at most, the transfer of one memory-block.
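The maze described in this overview can be sketched with nested dictionaries, each level standing for one subpath. The Python below is an illustration only: the class `AuxiliaryFile`, the sentinel `_BULK` and the sequential bulk addresses are hypothetical, only MODE=0 (retrieval) and MODE=1 (storage) are modelled, and the division into memory-blocks is ignored.

```python
_BULK = object()  # hypothetical marker for the bulk storage address at the end of a path


class AuxiliaryFile:
    """Hypothetical in-memory stand-in: nested dicts play the role of subarrays."""

    def __init__(self):
        self.root = {}
        self.next_bulk = 0  # assumed: bulk addresses are handed out in sequence

    def search(self, joblist_item, mode):
        """Trace (MODE=0) or trace and create (MODE=1) the path for one item.

        Returns (bulk_address, decisions); the address is None when the
        search is aborted because a subpath is missing.
        """
        node, decisions = self.root, 0
        for key in joblist_item:       # prime index M, screen J, further screens
            decisions += 1
            if key not in node:
                if mode == 0:
                    return None, decisions   # a missing subpath aborts the search
                node[key] = {}               # created only when it is needed
            node = node[key]
        if _BULK not in node:
            if mode == 0:
                return None, decisions
            node[_BULK] = self.next_bulk
            self.next_bulk += 1
        return node[_BULK], decisions


aux = AuxiliaryFile()
missing = aux.search((3, 17, 42), mode=0)    # nothing stored yet
stored = aux.search((3, 17, 42), mode=1)     # creates three subpaths
found = aux.search((3, 17, 42), mode=0)      # same path, same length
```

The number of decisions depends only on the JOBLIST item and on where the path breaks off, not on the operation requested, which is the point made above about the "length" of a search.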

AUXILIARY FILE

It has already been noted that the AUXILIARY FILE is divided into two parts. Part A, which resides in core storage, is generated by the macro MJARRAY or it is read from cards by the SSTATECL component when the file is initialized. Part B is divided into memory blocks that are stored in fast-slow or virtual memory.

The principal data array (YY) is divided into three portions as follows:

a. The first portion contains that part of the AUXILIARY FILE which will reside permanently in core storage (i.e., Part A below).

b. The second, transient portion must be large enough to hold one memory-block.

c. The third portion is used to manipulate the strings of referenced information that are stored or retrieved in the bulk storage.

Memory blocks are transferred to the transient portion of core storage by the global memory component (SMEMORY) whenever they are needed for tracing or creating new "information paths". The two parts of the AUXILIARY FILE are discussed next:

Part A:

Part A contains one sub-array associated with the prime index, M, and five subarrays which are associated with the screen J. It is prefaced by two composite addresses, EMPTY and BULK. The first four items in EMPTY together give the location in virtual memory where the next newly created memory block is to be stored. The fast memory address portion of EMPTY (FMADD) contains the beginning address of the transient portion of the data array, YY. The first four items in BULK together give the location in bulk storage where compressed referenced information is stored. The fast memory address portion of BULK (FMADD) specifies the core-location of the referenced information.

Normally, the first input item for the SOLID System is a card deck with Part A punched in column binary. When the SOLID System is used for the first time, this card-deck is replaced by a single card that contains the word NEWFILE. This generates the initializing information for Part A. Thereafter, if Part A was changed, a new card deck is punched at the end of each job-stream. The information on the first two cards and last two cards is used to check the card deck at input time.

Part B:

Each of the memory-blocks is prefaced by a composite address, CURRENT. The first four items in CURRENT disclose the location in virtual memory where the memory-block normally resides. The fifth item, FMADD, is the relative address in the principal data array (YY) where a new sub-array can be created. Thus FMADD is the location of the first byte in the resident memory-block that is not a part of an existing sub-array.
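The role of FMADD as the first free byte of the resident memory-block can be sketched as a simple advancing pointer. This Python fragment is an assumption-laden illustration: the disclosure does not name the first four items of a composite address, so they are kept as an opaque tuple, and the class `CompositeAddress`, the function `create_subarray` and the `block_end` limit are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CompositeAddress:
    location: Tuple[int, int, int, int]  # first four items: place in virtual memory
    fmadd: int                           # fifth item: relative address in array YY


def create_subarray(current: CompositeAddress, size: int, block_end: int) -> int:
    """Reserve `size` bytes at FMADD and return the start of the new sub-array."""
    if current.fmadd + size > block_end:
        raise MemoryError("resident memory-block is full")
    start = current.fmadd
    current.fmadd += size   # FMADD again points at the first unused byte
    return start


current = CompositeAddress(location=(0, 0, 0, 1), fmadd=64)
first_start = create_subarray(current, size=16, block_end=128)
second_start = create_subarray(current, size=16, block_end=128)
```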

Description:

The translated JOBLIST item which is stored in the array &ARRAY, is used by the Retrieval Package to trace (retrieval and old storage) or create (new storage) the information path in the AUXILIARY FILE. This is accomplished with the aid of three addresses (EMPTY, CURRENT, and ADDRESS) and eight bit indicators in a single byte (MSIGNAL). The composite addresses EMPTY and CURRENT initially preface Part A and the resident memory block of the AUXILIARY FILE respectively. ADDRESS is extracted from a subarray during the search. It is either the link address, pointing to an extension or continuance of the subarray, or an address extracted from an EXECUTIVE POINTER. The position of the EXECUTIVE POINTER will disclose the prime index or screen in the JOBLIST item. If a subpath is missing, ADDRESS contains zero. In this case the MMATCH macro instruction either completes the construction of a new EXECUTIVE POINTER, thus creating a new subpath, or it aborts the search.

The eight bits of MSIGNAL are used by the retrieval package to indicate the status of the AUXILIARY FILE with respect to the current search. The signal system is discussed next.

Signal System (MSIGNAL):

The meanings that are assigned to each of the eight bits in MSIGNAL are given next.

MSIGNAL BIT ON
(HEXADECIMAL)     Instruction

80                A new memory-block is to be created.
40                The search component has been used before.
20                ADDRESS contains a link address.
10                Tracing with screen J has been completed.
08                Tracing with index M has been completed.
04                A new subarray is to be created.
02                The resident memory-block is new.
01                The resident memory-block has been changed.

The 40 bit is turned off when the system is initialized. This occurs at the start of each job-stream in the SSTATECL component after the card-deck of Part A has been read. Three bits (01, 02 and 80) are used by the Global Memory component (SMEMORY) to save the resident memory-block and to fetch a new one. Two bits (04 and 20) indicate the status of the sub-array whose beginning address is in ADDRESS. The last two bits (08 and 10) are used to indicate the type of executive pointer (e.g., with or without screen) in the subarray that is to be searched next.
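The bit assignments of MSIGNAL and the three groups of bits just described can be written down compactly. The Python sketch below uses the hexadecimal values given in the text; the flag names and the group names are hypothetical labels introduced only for this illustration.

```python
from enum import IntFlag


class MSignal(IntFlag):
    NEW_BLOCK_NEEDED = 0x80  # a new memory-block is to be created
    USED_BEFORE = 0x40       # the search component has been used before
    LINK_ADDRESS = 0x20      # ADDRESS contains a link address
    SCREEN_J_DONE = 0x10     # tracing with screen J has been completed
    INDEX_M_DONE = 0x08      # tracing with index M has been completed
    CREATE_SUBARRAY = 0x04   # a new subarray is to be created
    BLOCK_IS_NEW = 0x02      # the resident memory-block is new
    BLOCK_CHANGED = 0x01     # the resident memory-block has been changed


# Groups named in the text (the group names themselves are assumptions).
SMEMORY_BITS = MSignal.NEW_BLOCK_NEEDED | MSignal.BLOCK_IS_NEW | MSignal.BLOCK_CHANGED
SUBARRAY_BITS = MSignal.CREATE_SUBARRAY | MSignal.LINK_ADDRESS
POINTER_BITS = MSignal.INDEX_M_DONE | MSignal.SCREEN_J_DONE

msignal = MSignal(0)                                          # all bits off
msignal |= MSignal.BLOCK_CHANGED | MSignal.CREATE_SUBARRAY    # turn two bits on
needs_smemory = bool(msignal & SMEMORY_BITS)                  # test a group of bits
msignal &= ~MSignal.CREATE_SUBARRAY                           # turn one bit off
```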

The role played by MSIGNAL is shown in FIG. 16. A step-by-step description follows:

Step a

At the start of the job-stream the bits in MSIGNAL are turned off. This occurs in the SSTATECL component at stage 600. If CURRENT=FFFFFFFF, bit 80 is turned on. This means that no memory blocks are present in the AUXILIARY FILE.

Step b

All bits in MSIGNAL except 80, 40, 02 and 01 are turned off at stage 602. The index registers which point to M in the JOBLIST item (in array &ARRAY) and to the sub-array associated with M are initialized, and at stage 602, the composite address EMPTY is saved at CORD1. If the 40 bit is off, this address will also be stored in location EMPTY+&ADDL and the 40 bit turned on. If EMPTY and EMPTY+&ADDL do not contain the same address at the end of the job-stream a new card deck will be punched for Part A. Step b at 604 is the normal entry point to the retrieval package.

Step c

At stage 606 the 20 bit in MSIGNAL, which is used to indicate that an extension or link sub-array is to be fetched or created, is turned off.

Step d

Bit 08 is inspected at stage 608. If it is one, then either the JOBLIST item index points to a screen, or a search of an RFILE array, which contains bulk storage address(es) of compressed referenced information, is indicated. The length of the unused part of the JOBLIST item is used at stage 612 to differentiate these situations. If the MSIGNAL 08 bit is zero, then the JOBLIST item index points at M, and an index search of the subarray MA is completed at stage 610.

Step e

The sub-array in core storage is searched at stage 614 for an EXECUTIVE POINTER which contains the screen in the JOBLIST item. The following situations can occur:

1. An EXECUTIVE POINTER is found. The address portion is loaded into ADDRESS and control goes to MMATCH at stage 616. In this case, MSIGNAL is not altered.

2. The subarray is full, so one of its extensions or continuances must be fetched or created. In this case the "continuance bit", 20, is turned on and the link address is loaded in ADDRESS.

3. An EXECUTIVE POINTER which contains the screen in the JOBLIST item cannot be found in the sub-array or in its extension(s). If MODE=0 (i.e., retrieval) ADDRESS is set to zero. In the storage or updating modes (MODE>0) a hole is made at the correct spot in the subarray and the screen is stored in it (left adjusted). ADDRESS is set to zero.
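Situations 1 and 3 of Step e can be illustrated with a simplified model in which a sub-array is an ordered Python list of [screen, address] pairs. This is only a sketch: the function name is hypothetical, Python's `bisect` module is substituted for the search actually used, and situation 2 (a full subarray with continuances) is not modeled.

```python
import bisect


def find_or_make_hole(subarray, screen, mode):
    """Return ADDRESS for the screen, or 0 if no EXECUTIVE POINTER holds it.

    subarray: list of [screen, address] pairs kept in ascending screen order.
    In the storage or updating modes (mode > 0) a hole is made at the ordered
    position and the screen is stored in it; its address part stays zero.
    """
    keys = [entry[0] for entry in subarray]
    pos = bisect.bisect_left(keys, screen)
    if pos < len(subarray) and subarray[pos][0] == screen:
        return subarray[pos][1]           # situation 1: pointer found
    if mode > 0:
        subarray.insert(pos, [screen, 0])  # situation 3, storage: make the hole
    return 0                               # situation 3: ADDRESS set to zero
```

In this model the zero returned in the storage case is what later prompts the address part of the new pointer to be filled in.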

Step f

In the MMATCH macro instruction at stage 616, ADDRESS is compared to zero. If it is zero and MODE=0 (i.e. retrieval) the search is aborted as unsuccessful at stage 618. It should be noted that the MOBILE CANONICALIZATION PACKAGE is called by the macro STRATEGY from MMATCH whenever a mismatch (i.e. ADDRESS is zero) occurs in the retrieval mode and there are over-rides present. If MODE=1 (i.e. storage), the composite address CURRENT is stored in the correct location in the subarray, and both the 01 and 04 bits in MSIGNAL are turned on. This completes the construction of the new EXECUTIVE POINTER and also indicates that the resident memory-block has been changed. If ADDRESS is not zero the next subpath has been found and MSIGNAL is not changed.
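The decision made in MMATCH can be summarized by the following sketch. It is a simplified Python model with hypothetical names; override handling through STRATEGY and the MOBILE CANONICALIZATION PACKAGE is left out, and modes other than 0 and 1 are reported as unhandled because this passage does not describe them.

```python
def mmatch(address: int, mode: int, msignal: int, current: int):
    """Return (outcome, address, msignal) for the Step f decision.

    outcome is one of 'found', 'aborted', 'created' or 'unhandled'
    (labels chosen for this sketch only).
    """
    if address != 0:
        return "found", address, msignal      # next subpath found, MSIGNAL unchanged
    if mode == 0:
        return "aborted", 0, msignal          # unsuccessful retrieval (stage 618)
    if mode == 1:
        # CURRENT becomes the address part of the new EXECUTIVE POINTER;
        # the 01 (block changed) and 04 (create) bits are turned on.
        return "created", current, msignal | 0x01 | 0x04
    return "unhandled", 0, msignal
```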

Step g

TBADD at stage 620 is the most complex and powerful instruction in the Retrieval Package. It uses the information in the composite addresses CURRENT and ADDRESS plus bits 01, 02, 04 and 80 in MSIGNAL to accomplish the following:

1. If a new memory block is required, then the global memory component (SMEMORY at stage 624) saves, if necessary, the resident memory block and fetches a new one. Bits 01, 02, and 80 in MSIGNAL are used by SMEMORY.

2. If the 04 bit is "on", the CREATE macro instruction at 622 creates a new subarray, beginning in the location specified by the core address portion of ADDRESS. The composite address CURRENT is recomputed at stage 626 and, if necessary, the address EMPTY is recomputed also. The create bit (04) is turned off at the end of CREATE.

Step h

At stage 626 the continuance bit 20 of MSIGNAL indicates whether or not the JOBLIST item index is to be incremented and bits 10 and 08 are to be altered. This procedure is executed until either the search is aborted (in MMATCH) or the search ends successfully at stage 682 after retrieval or insertion of the bulk storage addresses in the RFILE subarray of the resident memory-block. These tasks are performed by the AUXFILE macro instruction at stage 632. It turns on the MSIGNAL 01 bit if a bulk storage address is inserted in RFILE. The 20 bit is turned on if a continuance of the RFILE subarray has to be fetched or created.

After a successful search, the bulk storage address(es) are used to store (MODE=1), update (MODE=4), or retrieve (MODE=0) the compressed referenced information in the MAINFILE. The retrieved referenced information is decompressed by COPAK before it is disseminated.

Search Procedures:

The subarrays in core-storage are searched for EXECUTIVE POINTERS in one of three different ways (see FIG. 16). These are:

Indexes

The EXECUTIVE POINTERS that are associated with the prime index (M) are stored in the subarray at the relative address indicated by the M value. A zero at the relative address means that no EXECUTIVE POINTER has been inserted.
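An index search of this kind amounts to direct addressing. The sketch below assumes, purely for illustration, that the relative address is the M value multiplied by a four-byte pointer length; the constant `ADDL` and the function name are hypothetical stand-ins and the actual scaling is not given in this passage.

```python
ADDL = 4  # assumed length in bytes of one index element (stand-in for &ADDL)


def index_lookup(subarray: bytes, m: int) -> int:
    """Return the EXECUTIVE POINTER stored for prime index m, or 0 if none.

    Assumes element m begins at relative address m * ADDL.
    """
    offset = m * ADDL
    return int.from_bytes(subarray[offset:offset + ADDL], "big")
```

A zero result corresponds to the case in which no EXECUTIVE POINTER has been inserted for that index value.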

RFILE

The bulk storage address is stored in the first vacant element of the RFILE sub-array or its extension (or continuances). A single macro-instruction, AUXFILE, is responsible for searching and maintaining the RFILE sub-arrays. Retrieved or newly assigned bulk storage addresses are stored in the high address end of the principal data array, YY. The beginning address is found in location SBRY. The number of retrieved or stored addresses is in location NJOBS. An unsuccessful retrieval is indicated by setting JII to zero.

The RFILE subarrays are created at the first available location in core-storage when they are needed. The number of bulk storage addresses that can be stored in a particular RFILE sub-array is determined by the System Parameter &MATRIXS.

The AUXFILE macro-instruction is executed at stage 630 of FIG. 16 whenever the residual or unused length of the JOBLIST item is less than zero. This occurs after the beginning address of the terminal RFILE array has been found. AUXFILE uses the retrieval command, MODE, and, if need be, the bulk storage address (BULK), to retrieve or store addresses of the compressed referenced information in RFILE.

The retrieval command, MODE, has been assigned the following five meanings:

MODE  MEANING
0     Retrieve
1     Store or Update
2     Change items in the AUXILIARY FILE
3     Purge paths from the AUXILIARY FILE
4     Purge, replace and add compressed referenced information to the MAINFILE

In AUXFILE these five commands mean that bulk storage addresses are to be retrieved, stored, purged or changed in RFILE. The three types of override codes (Type 1, Type 2, and Type 3), which are automatically inserted in the JOBLIST items during the translation of descriptor-sets, have different meanings for each of the five values of MODE. These are as follows:

MODE  Override  Meaning of Override
0     1         Accept any non-zero value for the designated descriptor or element.
      2         Accept any value, zero or non-zero, for the designated element.
      3         Accept any value for the designated descriptor that lies in the range specified in array AOVER3R.
1     1, 2      Create normal paths with the specified override. If such paths exist the inverted file searches will not be executed during retrieval.
2     1         Replace specified element(s) by a "1".
      2         Replace specified element(s) by a "2".
      3         Replace specified element(s) by a "3". (For the MODE=2 update the Normalization Package will not be used.)
3               Purge the information path specified in JOBLIST from the AUXILIARY FILE.
      2         Replace the element that is specified in the JOBLIST item by the element in the array AOVER3R.
      3         Use the information in array AOVER3R to construct alternate paths for the path described by the JOBLIST item.
4               Purge the referenced information items whose bulk storage address(es) are given in array AOVER1 from the MAIN FILE. Add the compressed referenced information whose storage addresses are given in array AOVER2. Replace the items in MAINFILE as specified in array AOVER3.

In its present form the SOLID System can process explicit retrieval (MODE=0) and storage (MODE=1) requests which do not use the NORMALIZATION or MOBILE CANONICALIZATION PACKAGES. The full potential of the SOLID System, which permits the use of all five MODE options and overrides, requires utilization of the NORMALIZATION and MOBILE CANONICALIZATION PACKAGE instructions and slight modification of the MMATCH and AUXFILE macro-instructions. In the following discussion it will be seen that the AUXFILE macro-instruction has been designed so that branches for the remaining three MODE options (2, 3, and 4) can be very easily incorporated. The flow-chart for AUXFILE (FIG. 17) is discussed next.

In FIG. 17, the operation starts at stage 634 wherein the question is asked "is MODE greater than, equal to, or less than 4?" If the answer is that MODE is greater than 4, then the operation is terminated at stage 636, because meanings have not yet been assigned for MODE greater than 4. If MODE is less than 4, control goes to stage 638. If MODE is equal to 4 control passes to stage 640.

At stage 638, the question asked is "is MODE equal to, less than, or greater than 2?" If MODE is greater than or equal to 2, then control again goes to stage 640. Stage 640 is operative to handle the conditions when MODE is equal to 2, 3 or 4.

If MODE equals one or zero, the program continues to stage 646 and registers R0 and R1 are loaded with zero and NJOBS respectively.
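The branching on MODE at stages 634 and 638 can be restated compactly. The following Python sketch simply returns the number of the stage that receives control; the function name is hypothetical and the stage numbers are those given for FIG. 17.

```python
def auxfile_entry(mode: int) -> int:
    """Return the FIG. 17 stage reached after the MODE tests at 634 and 638."""
    if mode > 4:
        return 636   # no meaning assigned yet: operation terminated
    if mode == 4:
        return 640   # stage 634 branches directly to 640
    if mode >= 2:
        return 640   # stage 638: MODE 2 or 3 also goes to 640
    return 646       # MODE 0 or 1: load R0 with zero and R1 with NJOBS
```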

At this point the significance of two counters, NJOBS and RNJOBS, must be understood. Both counters are initialized in the TRANSLATION PACKAGE, which is executed before the RETRIEVAL PACKAGE is used. For retrieval operations NJOBS and RNJOBS are both set equal to zero. NJOBS is used to count the number of bulk storage addresses retrieved from the RFILE array by AUXFILE. For storage operations both NJOBS and RNJOBS are set equal to the number of items of compressed referenced information that are to be stored for the particular JOBLIST item. Thus, if MODE=1, NJOBS cannot be less than one. NJOBS is incremented (retrieval) or decremented (storage) each time a bulk storage address is retrieved or inserted in the RFILE arrays. The bulk storage addresses are also recorded in a sub-array of the principal data array, YY, that begins at the location stored in SBRY. SBRY is set during the initialization of the SOLID System. NJOBS and RNJOBS are used to compute the actual locations in the sub-array of YY where each bulk storage address is recorded.

In the following discussion of stage 648 it is assumed that this is the first time that stage 646 is executed for a particular JOBLIST item. It will be understood that R1 is incremented (retrieval mode) or decremented (storage mode) in cycles through or within AUXFILE. At stage 648, a determination is made as to whether the system is operating in the retrieval or the storage mode. If it is in the retrieval mode then the program continues to stage 650 and, the location where the bulk storage address is to be recorded is computed in the register R1 from the System Parameter &LSLOW, SBRY, and R1 itself. It should be noted that before this computation R1 contains the number of bulk storage addresses that have been recorded in the subarray of YY to this point. &LSLOW is the length of each bulk storage address.

If, at stage 648, the system is operating in the storage mode, then control goes to stage 652, wherein register R1 is set equal to the difference between RNJOBS and NJOBS. As already noted above this difference (in R1) is the number of bulk storage addresses that have been assigned so far for the particular JOBLIST item. From stage 652 the program goes to stage 650, wherein the new R1 is computed as described above.
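The computation at stages 648 through 652 can be sketched as follows. The text states only that R1 is computed from &LSLOW, SBRY and R1 itself; the formula used here (start address plus count times element length) is an assumption, and the function and parameter names are hypothetical.

```python
def yy_slot(mode: int, njobs: int, rnjobs: int, sbry: int, lslow: int) -> int:
    """Return the assumed location in YY for the next bulk storage address.

    Retrieval (mode 0): NJOBS addresses have been recorded so far.
    Storage (mode > 0): RNJOBS - NJOBS addresses have been assigned so far.
    Assumed formula: SBRY + count * &LSLOW.
    """
    count = njobs if mode == 0 else rnjobs - njobs
    return sbry + count * lslow
```

Under this assumption, with SBRY=1000 and &LSLOW=4, the third address of either a retrieval (NJOBS=2) or a storage request (RNJOBS=5, NJOBS=3) would be recorded at location 1008.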

After stage 650 has been executed control goes to stage 654, wherein the question asked is "is MODE zero?". At this stage it should be noted that the location of the first element in array RFILE is recorded in register IR6 and that register IR3 contains the location of the continuance address of the RFILE array.

If MODE=0 at stage 654 then control goes to stage 656 and the question is asked "is the element of RFILE specified in register IR6 zero?" If the answer at stage 656 is "it equals zero" control then goes to stage 640 and the exit operations of AUXFILE are executed. If the answer at stage 656 is "it is not zero" then control goes to stage 658. Therein the bulk storage address in the RFILE array is recorded in the subarray YY at the location specified in register R1, and NJOBS is incremented by one. From stage 658 the program branches to stage 660 and register IR6 is incremented by the value &LSLOW, which is the length of the elements (viz., bulk storage addresses) in the RFILE array. At the next stage, 662, the contents of registers IR6 and IR3 are compared. If IR6 equals IR3, control goes to stage 664, wherein the question is asked "is the continuance address of the RFILE array zero?" If the continuance address is zero, control passes to stage 666 where the question asked is "is MODE equal to or greater than zero?" If MODE equals zero, the retrieval mode is indicated and control goes to stage 640 to begin the AUXFILE exit operations. If, at stage 666, MODE is greater than zero, then the storage mode is indicated, and the program goes to stage 670, where the MSIGNAL 20 bit is turned on. From stage 670 control goes to stage 640 and exits from AUXFILE. If, at stage 664, the continuance address is not zero, then the program goes directly to stage 670 as described above. The MMATCH and TBADD macro-instructions use the MSIGNAL 20 bit, which was turned on at stage 670, to fetch or create an extension of the RFILE array. They will be fully discussed hereinafter.

It should be noted that the MSIGNAL 20 bit is interrogated after each use of AUXFILE. If it is "one" control goes to the MMATCH macro-instruction (at stage 616) and the extension of the RFILE array is fetched (retrieval or storage) or created (storage) at stage 620. Eventually control returns to the AUXFILE macro via stages 626, 606, 608 and 612 of FIG. 16.

If, at stage 662, IR6 is greater than IR3, something is wrong, and the program goes to stage 668, where an error message is printed before exiting from the system at stage 636. If IR6 is less than IR3 control returns to the first stage in AUXFILE (634) where the next cycle through AUXFILE begins.

If the answer to the question asked at stage 654 was "MODE is not zero," then the storage mode is indicated and control goes to stage 672. At stage 672 the question asked at stage 656 is again posed. If the answer is "it is not zero" then control goes to stage 660 and the previously described operations are executed. If the answer at stage 672 is "it is zero", this means that a zero element has been found in the RFILE array, and control goes to stage 674. At stage 674 the new bulk storage address (in BULK) is stored in the RFILE and the YY array, at the locations specified by registers IR6 and R1 respectively. At this point it should be noted that the new bulk storage address that has just been assigned will be used to store compressed referenced information. COPAK will compress the referenced information and then store it in the MAINFILE at the assigned bulk storage addresses.

At the next stage, 676, of the AUXFILE macro, the address BULK is updated by the macro-instruction BULK. Thus the address BULK now contains the next location in the MAIN FILE that can be assigned for storing compressed referenced information. From stage 676 the program goes to stage 678, wherein NJOBS is decremented by one, and the MSIGNAL 01 bit is turned on. It should be noted that at this point NJOBS contains the number of bulk storage addresses that must still be assigned for the JOBLIST item, and the 01 MSIGNAL bit now indicates that the resident memory-block has been changed.

At stage 680 the question is asked "is NJOBS greater than zero?" If the answer is "it is greater than zero" then control goes to stage 660, and the previously described operations are executed. If the answer is "it is not greater than zero," which is identical with the answer "equal zero or negative", then control goes to stage 640.
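The retrieval and storage cycles of AUXFILE described above can be condensed into one loop over a single RFILE array. The sketch below is a simplified Python model, not the macro itself: RFILE is a list in which zero marks a vacant element, YY is returned as a list, the amount by which BULK advances (`item_len`) is a hypothetical parameter, and the error exit at stage 668 is not modeled.

```python
def auxfile(rfile, mode, njobs=0, bulk=0, item_len=1, continuance=0):
    """Model one pass of AUXFILE over a single RFILE array (MODE 0 or 1).

    Returns (yy, njobs, bulk, need_continuance), where need_continuance
    models the MSIGNAL 20 bit being turned on at stage 670.
    """
    yy = []
    for i, element in enumerate(rfile):
        if mode == 0:                      # retrieval branch (stage 656)
            if element == 0:
                return yy, njobs, bulk, False   # vacant element: exit via 640
            yy.append(element)             # stage 658: record address in YY
            njobs += 1
        else:                              # storage branch (stage 672)
            if element != 0:
                continue                   # occupied: step to next element (660)
            rfile[i] = bulk                # stage 674: store new address
            yy.append(bulk)
            bulk += item_len               # stage 676: advance BULK (assumed step)
            njobs -= 1                     # stage 678
            if njobs <= 0:
                return yy, njobs, bulk, False   # stage 680: all assigned
    # End of array reached (stage 662, IR6 equals IR3): stages 664 and 666.
    need_continuance = continuance != 0 or mode > 0
    return yy, njobs, bulk, need_continuance
```

For instance, a retrieval over `[11, 22, 0, 0]` collects the two addresses and stops at the first vacant element, while a storage request that finds no vacant element returns with the continuance indicator set.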

At stage 640 JII and IR1 are set equal to one and AYY respectively. Later in the SSEARCH component JII=1 indicates that a successful search has been performed and that the new assigned (storage mode) or retrieved (retrieval mode) bulk storage addresses will be found in the YY array, beginning in the location specified in SBRY. At the end of the SSEARCH component the information in this part of array YY is transferred to a work array, JBWORK, and, for the storage mode, NJOBS is set equal to RNJOBS. The information in NJOBS and in array JBWORK are used by the COPAK compressor, which is discussed hereinafter. The register IR1 has been reset to point to the beginning of that part of the AUXILIARY FILE that resides permanently in core-storage.

It should also be noted that in stage 678 the MSIGNAL 01 bit was turned on. This bit will signal the GLOBAL MEMORY component (SMEMORY), when it is called later by the TBADD macro-instruction (see stage 620, FIG. 16), that the resident memory-block must be updated in virtual memory. GLOBAL MEMORY is discussed later in this disclosure.

FIG. 18 is a flow diagram of the program steps in the SCREEN macro-instruction. SCREEN is used (at stage 614 in FIG. 16) whenever the MSIGNAL 08 bit is "on" and the residual length of the JOBLIST item, which is recorded in the half-word DUN1 is not less than zero. These conditions occur when the screens in the JOBLIST item (viz., J, LD.sub.o, BD.sub.1, etc) are being used to trace or create subpaths. Before beginning the discussion about the SCREEN macro instruction, it should be understood that there are two steps involved. First, an array whose EXECUTIVE POINTERS have the same length screen as the one in the JOBLIST item must be located. Second, the located array must be searched for the element where an EXECUTIVE POINTER with the JOBLIST item screen should be located. Of course, the located element may contain zero, the desired EXECUTIVE POINTER, or another EXECUTIVE POINTER. Once the location, where the EXECUTIVE POINTER should be, is found, then a decision can be made about the course of action that should be taken.

In the AUXILIARY FILE all the arrays that are associated with a particular screen (say LD.sub.o) and a particular value for the preceding screen or index (in our case the screen J) are linked one to the other by their continuance addresses. Within each array, all EXECUTIVE POINTERS have the same length screen. However, two different linked arrays can have EXECUTIVE POINTERS with different length screens.

At stage 684 in FIG. 18 the byte JI is set to zero and all registers, R0 to R15, are stored in an array which begins at DUMX+8. These registers are saved because they are needed if the attempt to locate an array which can be searched is unsuccessful. The byte JI has been initialized for the SUPERSCH macro-instruction, which is executed later in SCREEN, and is fully described hereinafter.

From stage 684 the program goes to stage 686, wherein the length of the EXECUTIVE POINTER, whose screen is in the JOBLIST item, is computed in register R14. Thus R14 contains the screen length plus &ADDL, which is the address length. Also, at stage 686 the total length of all the EXECUTIVE POINTERS in the array that are to be searched is stored in register R1. This length is found in the first four bytes of the array. At the next stage, 688, the question is asked "is register R1 exactly divisible by R14?" If the answer is "no" this means that the array is not to be searched, because the length of its screens differs from the length of the screen in the JOBLIST item. In this case control goes to stage 690 wherein all the registers, R0 to R15, are reloaded from the array DUMX+8, then the program goes to stage 692. At stage 692 the MSIGNAL 20 bit is turned "on" and register IR6 is set equal to register IR3, then the program exits from SCREEN at stage 694. At this point it should be noted that both registers IR3 and IR6 now contain the location of the array's continuance address. This information, in register IR6, and the 20 MSIGNAL bit are used by the MMATCH macro-instruction (Stage 616, FIG. 16) to eventually fetch or create an extension of the array or to abort the search, as described for MMATCH hereinafter. It should also be noted, from the discussion for FIG. 16, that if a search is not aborted at stage 616 control will eventually return to SCREEN via stages 606, 608, and 612.
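The test at stage 688 can be sketched in a few lines of Python. The function and parameter names are hypothetical; the arithmetic is as described, with R14 formed from the screen length plus the address length and R1 holding the total length of the EXECUTIVE POINTERS in the array.

```python
def array_may_be_searched(total_pointer_bytes: int, screen_len: int, addl: int) -> bool:
    """Stage 688: is R1 exactly divisible by R14?

    total_pointer_bytes models R1; screen_len + addl models R14.
    """
    pointer_len = screen_len + addl
    return total_pointer_bytes % pointer_len == 0
```

With a four-byte address, an array holding 24 bytes of pointers passes the test for a two-byte screen (24 is a multiple of 6) and fails it for a three-byte screen (24 is not a multiple of 7).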

If, at stage 688 of FIG. 18, the answer was "yes", this means that the array can now be searched. At stage 696 all registers, R0 to R15, are reloaded from the array DUMX+8, wherein they were stored at stage 684. At the next stage, 700, a series of programmatic steps, in a conventional manner, will determine whether or not override codes are present in the JOBLIST item. If overrides are present, control would go to the MMATCH macro-instruction (stage 616, FIG. 16) for eventual processing by the macro instruction MOBILE CANONICALIZATION.

From stage 700 control goes to stage 698 wherein the question is asked "is the half-word DUN1 greater than zero?". At this point it should be noted that the half-word DUN1, which is set in the initialization step at stage 602 of FIG. 16, contains the length of the JOBLIST item screen. If, at stage 698, the answer is "not greater than zero" control goes to stage 702 and the register R1 is loaded with the address of &ADD1, which is the address of the EXECUTIVE POINTER in the array that is being searched. At this point it must be noted that there is only one EXECUTIVE POINTER in the array, because the screen length is zero. At the next stage, 704, the question is asked "is the EXECUTIVE POINTER zero?" If it is not, the program goes to stage 710, wherein the registers IR2 and IR6 are both incremented with register BRYY. At this point it should be noted that register BRYY contains the length of the JOBLIST item screen, IR2 the location of the JOBLIST item screen in JOBLIST, and IR6 the location of the EXECUTIVE POINTER in the array that was reached. Thus, the incremented registers IR2 and IR6 contain the location of the next screen (in JOBLIST) and the address part of the EXECUTIVE POINTER. The program next goes to stage 694 and then exits from SCREEN to the MMATCH macro-instruction at stage 616 in FIG. 16, as described above.

If, at stage 704, the answer was "it is zero" then control goes to stage 706 wherein the byte JI is set equal to one, which indicates that a vacant element has been found in the array. At the next stage, 708, the question is asked, "is MODE equal to one?". If the answer is "it is one" this signifies that a storage operation is being performed, and the program goes to stage 730. At stage 730 the question is "is the byte LEXICON zero?". If the answer is "it is zero", then at stage 732 the LMOVE macro instruction is used to move the JOBLIST item screen (specified in register IR2) to the EXECUTIVE POINTER position in the array as specified in register IR6. It should be noted that stage 710 is executed only in the storage mode, and that the LMOVE operation, just described, actually constructs the first parts of a new EXECUTIVE POINTER in the correct position in the array. The address part of this partially constructed EXECUTIVE POINTER will be inserted at stage 616 of FIG. 16, wherein the MMATCH macro-instruction is executed. At that time the MSIGNAL 01 bit, which signifies whether or not the resident memory block has been altered, will be turned on, and subsequently, in stage 620 of FIG. 16, a new array will be created in the resident memory block.

Control goes from stage 732 to stage 710 and thence exits at stage 694, in the manner described previously.

Before continuing the discussion, an understanding of the roles played by SUPERSCH (stage 712), INSERT (stage 728), and the byte indicator LEXICON is needed. SUPERSCH actually performs the task of locating the position in the AUXILIARY FILE where an EXECUTIVE POINTER with the JOBLIST item screen should be. If the location is occupied by an EXECUTIVE POINTER with a different screen and the storage mode is indicated, then a vacancy must be created at the particular spot so that the new EXECUTIVE POINTER can be inserted in its correctly ordered position in the array and its continuances. The hole creating process can involve both the movement of EXECUTIVE POINTERS within an array and the transferral of EXECUTIVE POINTERS across arrays. These two tasks are accomplished in a complex manner by the INSERT macro-instruction, which will be discussed hereinafter. INSERT uses the LEXICON byte as an indicator. At stage 730 of FIG. 18 the LEXICON byte indicates the status of the transferral of EXECUTIVE POINTERS between arrays.

Now at stage 730 of FIG. 18 the question asked was "is the byte LEXICON zero?" If the answer is "it is zero" this means that a vacancy exists and that there are no EXECUTIVE POINTERS being transferred across arrays. In this case control goes to stage 732 in the manner described earlier. If, at stage 730, the answer is "it is not zero", then a hole must be created in the array at the location specified in register IR6. This task is accomplished by the INSERT macro instruction (at stage 728), wherein the LEXICON byte is also altered and, if necessary, a transferred EXECUTIVE POINTER is inserted. Next control goes to stage 694 to begin the sequences of operations stipulated at stages 616, 620, etc. in FIG. 16. It should be noted that the LEXICON byte indicator is used after the TBADD macro-instruction (stage 620, FIG. 16) to return control, if necessary, to the SCREEN macro at stage 684. After the transferral of EXECUTIVE POINTERS has been completed the new EXECUTIVE POINTER is constructed in the hole that was created by the INSERT. This insertion occurs as the final step in the INSERT macro-instruction.

The remaining stages of FIG. 18, which begin at stage 712, are discussed next.

Stage 712 is executed when the screen length, which is recorded in the half-word DUN1, is greater than zero. In this case control passes from stage 698 to stage 712, wherein the SUPERSCH macro-instruction is executed. SUPERSCH finds the location in an array or its extensions (or continuances) where the EXECUTIVE POINTER with the JOBLIST item screen should be. In SUPERSCH, which is fully discussed hereinafter, the continuance 20 bit of MSIGNAL can be turned on, and new arrays are created or fetched by passing directly from SUPERSCH to the MMATCH (stage 616) and TBADD (stage 620) macro-instructions, as shown in FIG. 16. In both these cases, if the search operation is not aborted in the MMATCH macro-instruction, control returns to SUPERSCH via stages 684 through 698 in FIG. 18. Thus when control goes from SUPERSCH (stage 712) to stage 714, the register IR6 specifies exactly where the EXECUTIVE POINTER with the JOBLIST screen should be.

At stage 714 the CSCREEN macro-instruction is used to compare the screen in the EXECUTIVE POINTER to zero and to the screen in the JOBLIST item, and set the byte JI with one of the following codes:

JI   Meaning
00   The two screens are equal.
01   The EXECUTIVE POINTER screen is zero.
02   The EXECUTIVE POINTER screen has a lower numerical value than the JOBLIST item screen.
04   The EXECUTIVE POINTER screen has a higher numerical value than the JOBLIST item screen.
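The four JI codes can be reproduced with a small comparison routine. This is a sketch in Python rather than the CSCREEN macro itself: screens are modeled as equal-length byte strings compared as unsigned values, the function name is hypothetical, and the order in which the equality and zero tests are made is an assumption.

```python
def cscreen(pointer_screen: bytes, joblist_screen: bytes) -> int:
    """Return the JI code for an EXECUTIVE POINTER screen versus a JOBLIST screen.

    Both screens are assumed to have the same length.
    """
    if pointer_screen == joblist_screen:
        return 0x00   # the two screens are equal
    if not any(pointer_screen):
        return 0x01   # the EXECUTIVE POINTER screen is zero (vacant element)
    if pointer_screen < joblist_screen:
        return 0x02   # pointer screen is lower than the JOBLIST item screen
    return 0x04       # pointer screen is higher than the JOBLIST item screen
```

Python compares byte strings of equal length position by position as unsigned values, which is the ordering assumed here for the screens.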

At stages 716, 718 and 724 of the SCREEN macro-instruction the specific number in byte JI is determined. If byte JI is zero, the program goes from stage 716 to stage 710, and the exit procedures of SCREEN are executed. If, at stage 718, JI is found to be 02 control goes to stage 720, and the register IR6 is incremented by register BRY, which contains the EXECUTIVE POINTER length. At the next stage, 722, the question is asked, "is register IR6 less than register IR3?" If the answer is "register IR6 is not less than register IR3" control goes to stage 692 wherein the MSIGNAL 20 bit is turned on and register IR6 is set equal to register IR3. At this point register IR6 contains the continuance address of an array that must be fetched or created, so control goes to stage 694 and then exits from SCREEN, as described above. If, at stage 722, it was found that register IR6 is less than register IR3, this would mean that there is at least one EXECUTIVE POINTER whose screen is greater than the screen in the JOBLIST item, and control goes to stage 714, which has been described previously. It should be recalled that all the equal length EXECUTIVE POINTERS are ordered within each array and its extensions (or continuances) with the lowest screen in the first position of the array. The operations in the branch, which begins at stage 720 and goes through stage 722 to stage 714, have been included to ensure that the correct location is found by SUPERSCH (at stage 712).

If, at stage 724, it is determined that JI contains the number 04, then control goes to stage 726, wherein the question is asked "is MODE equal to zero?" If MODE is zero, the retrieval mode is indicated and the program exits from SCREEN directly to the MMATCH macro-instruction (at stage 616 in FIG. 16), wherein the unsuccessful search is aborted. If, at stage 726, it is found that MODE is not zero (i.e. the storage mode is indicated) then control goes to stage 728 and the INSERT macro is executed, in the manner described earlier. If, at stage 724, JI is found to contain zero, then the sequence of steps that begins at stage 730 is executed in the manner described earlier.

In FIG. 19, the SUPERSCH macro instruction is shown in flow diagram form. As was stated with respect to the description of FIG. 18, the SUPERSCH macro execution started after completion of stage 698, where it is determined:

a. the particular array that is to be searched has EXECUTIVE POINTERS whose length is a multiple of the contents of register R14, as it was determined that R1 was exactly divisable by R14; R9 by dividing R1 by R2. Next, the square root of the number of EXECUTIVE POINTERS in the array, which is in register R9, is computed and stored at the location MASK1. If the square root is not an exact number, it is truncated without rounding. Thus, an integer number is always stored in MASK1. Also, at stage 724, the absolute address of the last EXECUTIVE POINTER in the array is computed in register R1. This is done by subtracting the EXECUTIVE POINTER length, which is in register R2, from the absolute continuance address, which is stored in location C7.

After completion of stage 724, control goes to stage 726 wherein the macro-instruction LARGEXC is performed. The LARGEXC macro instruction is an extended form of the IBM basic assembler instruction XC. In this LARGEXC macro-instruction the number of bytes specified in the half-word DUN1 of the array DUMX are set to zero. Because the half-word DUN1 contains the screen length, this means that a zero screen has been constructed in array DUMX. After completion of stage 726, control goes to stage 728, wherein the byte JI is set equal to the hexadecimal number 60.

At this stage it should be understood that the SUPERSCH macro-instruction operates by moving back and forth along the array in jumps equal to the square root of the number of EXECUTIVE POINTER positions in the array, as recorded in MASK1. Once a subblock, of the array of length MASK1, where the screen might be located, is determined it is searched, one EXECUTIVE POINTER at a time. The left-most four bits of byte JI, which are initialized at the start of the SCREEN macro-instruction (see stage 684 of FIG. 18), are used to control the directional movements of the register-pointer during execution of SUPERSCH. These four bits of the byte JI have been assigned the following meanings in SUPERSCH.

i. The 80 bit indicates the direction in which the register pointer R1 must be changed. If the 80 bit is "on" (i.e., one) then R1 must be increased. If the 80 bit is "off" (i.e., zero) then R1 must be decreased.

ii. The 40 bit indicates the last direction of change for the EXECUTIVE POINTER register R1. That is, if the 40 bit is "on", the last direction of change of R1 was positive, and if "off", the last direction of change was negative (i.e., R1 was decreased).

b. further, it has been determined if the screen in the JOBLIST and the screen in the array are greater than zero. As the system enters SUPERSCH, the first stage of the operation is accomplished at stage 716. At stage 716, registers R0 to R15 are stored in the array C1. Then, in the next succeeding stage 718, certain registers are set.

First, both registers R0 and R6 are set equal to the screen length recorded in the half-word DUN1.

Register R2 is set equal to the EXECUTIVE POINTER length by adding the screen length, set in R0, and the composite address length &ADDL, which is a System Parameter.

Register R4 is loaded with the absolute machine address of the head of the array. This is accomplished by taking the address in location C4, which is the address of the first available EXECUTIVE POINTER in the array, and subtracting therefrom four bytes. These four bytes at the head of the array contain the relative continuance address. Thus the value in register R4 will be the absolute address of the start of the array.

Register R1 is then set to the length of the EXECUTIVE POINTERS in the array. This is accomplished by subtracting four bytes from the relative continuance address, found at the address recorded in register R4. The four bytes are subtracted from the relative continuance address because that address is relative to the head of the array.

After completion of stage 718, control is transferred to stage 720 wherein the question is asked "is R1 evenly divisible by R2?". Actually, the answer to this question should always be yes, as the same question has been asked at stage 688 in the SCREEN macro. However, a check is made at stage 720 as to whether the length of the EXECUTIVE POINTERS in the array, as recorded in register R1, is divisible by the EXECUTIVE POINTER length recorded in register R2. If the answer is "no", then the program continues at stage 722 to print out an error message for the SUPERSCH macro. If the answer at stage 720 is "yes", the program continues to stage 724.
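The register arithmetic of stages 718 and 720 can be illustrated with a short Python sketch. It is not the assembly-language macro itself: the function name pointer_layout and its argument names are hypothetical, and the use of an integer square root for the jump length held in MASK1 is an assumption, since the text states only that the jump equals the square root of the number of EXECUTIVE POINTER positions.

```python
import math

def pointer_layout(screen_len, addl, pointers_bytes):
    """Sketch of stages 718-720 (names are hypothetical).

    screen_len     -- screen length from half-word DUN1 (role of R0/R6)
    addl           -- composite address length, System Parameter &ADDL
    pointers_bytes -- total length of the EXECUTIVE POINTERS (role of R1)
    """
    ptr_len = screen_len + addl            # role of R2
    if pointers_bytes % ptr_len != 0:      # stage 720 fails
        raise ValueError("SUPERSCH error: array length not a multiple "
                         "of the EXECUTIVE POINTER length")  # stage 722
    count = pointers_bytes // ptr_len      # number of EXECUTIVE POINTERS
    jump = math.isqrt(count)               # assumed rounding for MASK1
    return ptr_len, count, jump
```

For example, a 4-byte screen with a 4-byte address and 80 bytes of pointers gives 10 pointers of 8 bytes each and, under the rounding assumption, a jump of 3.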

At stage 724, certain parameters are computed. First, the number of EXECUTIVE POINTERS in the array is computed in register

iii. The 20 bit of the byte JI is used to indicate whether or not the first direction of change of register R1 has occurred. If it is "on", this means that the register R1 is to be changed for the first time. Of course, the 20 bit "off" means that register R1 has already been changed at least once.

iv. The 10 bit is used to record the last status of the 40 bit of the JI byte.
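The four control bits of byte JI can be pictured as flag constants. The following Python sketch is illustrative only: the constant names and the helper initial_ji are assumptions that do not appear in the specification, which identifies the bits solely by their hexadecimal values 80, 40, 20 and 10.

```python
# Hypothetical names for the left-most four bits of byte JI.
JI_DIRECTION_UP = 0x80  # on: R1 must be increased; off: R1 must be decreased
JI_LAST_UP = 0x40       # on: the last change of R1 was positive
JI_FIRST_MOVE = 0x20    # on: R1 is about to be changed for the first time
JI_SAVED_LAST = 0x10    # records the last status of the 40 bit

def initial_ji():
    """State of the control bits once the 40 and 20 bits have been
    turned on and the 80 and 10 bits have been turned off."""
    ji = JI_LAST_UP | JI_FIRST_MOVE
    ji &= ~(JI_DIRECTION_UP | JI_SAVED_LAST) & 0xFF
    return ji
```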

Now, at stage 728, the 40 and 20 bits of byte JI were turned on. This means, with respect to the 40 bit, that the EXECUTIVE POINTER register was last increased. This occurred when the position of the last EXECUTIVE POINTER in the array was computed in Register R1 at stage 724. With respect to the 20 bit, this means that the register R1 has not been changed from its initial setting.

At stage 730 the 80 and 10 bits are turned off, and control goes to stage 732, wherein the question is asked "What is the value of the 40 bit in byte JI?" If the 40 bit is one, then the 10 bit is turned on at stage 736 and the program goes to stage 734. If, at stage 732, the 40 bit is zero, control goes directly to stage 734. Thus at stage 734 the status of the 40 bit has been preserved in the 10 bit. At stage 734, the macro-instruction CSSCRN is executed. The macro-instruction CSSCRN is an extended form of the IBM basic assembly language instruction CLC. At stage 734, the question is asked "is the screen at the location indicated in register R1 (the last EXECUTIVE POINTER in the array at this stage) equal to zero or a number other than zero?". If the screen is equal to zero, then control goes to stage 738. If the screen is a value other than zero, control goes to stage 740. At stage 740, another CSSCRN macro-instruction determines if the screen, whose address is recorded in register R1, is equal to, less than, or greater than the screen whose address is recorded in register IR2. It will be remembered that the screen whose address is recorded in register IR2 is the screen in the JOBLIST item, and that the screen whose address is recorded in register R1 is the screen which we are looking at in the array. If the screen in the array is greater than the screen in the JOBLIST item, then we must go backward in the array to find the correct location and, accordingly, control goes directly to stage 738. If the screen in the array is less than the screen in the JOBLIST item, then control goes to stage 742, wherein the 80 bit in the JI byte is "turned on." This indicates that the register R1 should be increased to find the location in the array where the EXECUTIVE POINTER should be. From stage 742 the program goes to stage 738.
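The bit handling and comparisons of stages 730 to 742 can be sketched in Python as follows. This is a simplified illustration: the function name compare_step and the returned labels are hypothetical, screens are modelled as equal-length byte strings, and Python's lexicographic byte comparison stands in for the CLC-style comparison performed by CSSCRN.

```python
from typing import Tuple

def compare_step(ji: int, array_screen: bytes, job_screen: bytes) -> Tuple[int, str]:
    """Sketch of stages 730-742 (hypothetical name and return labels)."""
    ji &= ~(0x80 | 0x10) & 0xFF        # stage 730: turn the 80 and 10 bits off
    if ji & 0x40:                      # stage 732: test the 40 bit
        ji |= 0x10                     # stage 736: preserve it in the 10 bit
    if not any(array_screen):          # stage 734: screen in the array is zero
        return ji, "step"              # go to stage 738
    if array_screen == job_screen:     # stage 740: screens are equal
        return ji, "found"             # go to stage 746
    if array_screen < job_screen:      # array screen is lower: move forward
        ji |= 0x80                     # stage 742: turn the 80 bit on
    return ji, "step"                  # go to stage 738
```

With the 40 and 20 bits on at entry (value 0x60), an array screen lower than the JOBLIST screen yields 0xF0, meaning that the next movement of R1 is forward.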

At stage 738, the 20 bit in the JI byte is checked. If the 20 bit is one, then, at stage 744, the 20 bit of the JI byte is turned off, and the register R9 is decremented by one. This enables us to return to the first EXECUTIVE POINTER in the array when the new value of R1 is computed from R9. Control goes from stage 744 directly to stage 756.

Before discussing what happens at stage 738, when the 20 bit is zero, a discussion must be made of what happens at stage 740 when the screen in the array is equal to the screen in the JOBLIST item. When this occurs, we have found an EXECUTIVE POINTER which has the JOBLIST screen, and control goes to stage 746 where termination of the SUPERSCH operation will be effected. At stage 746, the location C4 has recorded therein the address in register R1, C15 has stored therein the screen length recorded in register R6, and C16 has stored therein the EXECUTIVE POINTER length recorded in register R2. Then registers R0 to R15 are loaded from the array C1. Next control goes to stage 748 wherein the MSIGNAL 20 bit is checked. If the MSIGNAL 20 bit is on, the address specified by register R6 is a continuance address. If the MSIGNAL 20 bit is off, then the SUPERSCH procedure has been completed.

If, at stage 748, the MSIGNAL 20 bit was "on", control goes to stage 750 wherein the question is asked "is the continuance address whose location is recorded in register IR6 equal to zero?" If the continuance address is equal to zero, then control goes directly to the MMATCH macro in SSEARCH (see stage 616 in FIG. 16).

If, at stage 750, the continuance address specified by register IR6 is not zero, then it is stored in the location ADDRESS, and control goes directly to the TBADD macro-instruction in SSEARCH (see stage 620 in FIG. 16).

TBADD macro will fetch a new array, and control will eventually be returned to the SUPERSCH macro instruction for a search of the new continuance array.

At stage 738, if it was determined that the 20 bit of JI was zero, meaning that this is at least the second time that control has come through stage 738, then the register R9 is reset to the value R9 minus the contents of MASK1, the square root of the total number of EXECUTIVE POINTERS. After completion of stage 754 or stage 744, control passes directly to stage 756 wherein a check is made as to whether R9 is equal to or less than zero. If it is, this means that the subblock search has been completed and control passes directly to stage 758. However, it is in the best interest of this discussion to discuss first the result when R9 is greater than zero and the subblock search must be completed, and to return to stage 758 at a later time in this discussion.

When R9 is greater than zero, the program passes control to stage 760 wherein a register R15 is set equal to the contents of register R2 times the contents of register R9 or the exact number of bits which must be moved to look at a new EXECUTIVE POINTER in the array. After completion of stage 760, control passes to stage 762 wherein the JI 80 bit is checked. If the 80 bit is zero, this means we must move backwards and if the 80 bit is one this means that we must move forward.

If the 80 bit is zero, control passes to stage 764 wherein the 40 bit of the JI byte is turned off and R15 is reset to the address in R1 minus the value set in R15. This, in effect, sets the new EXECUTIVE POINTER address in the array which is to be checked. Then, at stage 766, a check is made as to whether this new address is greater than or equal to the address in register IR6, which is the first address in the array. If the address in R15 is greater than or equal to the address in register IR6, the first address in the array, control would pass to stage 768 wherein register R1 would now be set at the address in register R15. If, at stage 766, the new address had been less than the first address, then control would have passed back to stage 700 for purposes of accomplishing a step search of the EXECUTIVE POINTERS in the sub-block. This type of search will be discussed later.

If the 80 bit of the JI byte at stage 762 is "one," this means that the EXECUTIVE POINTER index, R1, must be increased. Thus register R15 is set equal to register R1 plus the increment (which is in register R15) and the 40 bit of byte JI is turned on to indicate that the movement is forward. At stage 774 the question is asked "is this new value of register R15 greater than the continuance address which is recorded in storage C7?" If the address in register R15 is equal to or greater than the address in location C7, control goes to stage 766 to execute a series of instructions that will be discussed later. However, if the address in register R15 is less than the continuance address (recorded in location C7), control goes to stage 768 wherein the address recorded in register R15 is then transferred to location R1.
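The computation of the next probe address in stages 760 to 774 can be pictured with the following Python sketch. The function name next_probe, its parameter names, and the boolean return value are hypothetical; addresses are modelled as plain integers, and the handling of the out-of-range case is reduced to a flag rather than reproducing the step search that the flow diagram branches to.

```python
def next_probe(r1, ptr_len, r9, forward, first_addr, continuance_addr):
    """Sketch of stages 760-774 (hypothetical names).

    Returns (address, in_range). When in_range is False the coarse
    jump would leave the array and R1 is left unchanged.
    """
    offset = ptr_len * r9                          # stage 760: R15 = R2 * R9
    if forward:                                    # stage 762: 80 bit is one
        candidate = r1 + offset                    # R15 = R1 + increment
        in_range = candidate < continuance_addr    # stage 774: below C7
    else:                                          # 80 bit is zero
        candidate = r1 - offset                    # stage 764
        in_range = candidate >= first_addr         # stage 766: not below IR6
    return (candidate if in_range else r1), in_range
```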

When the operation at stage 768 is completed the program goes to stage 778 wherein the question is asked "Are the 40 and 10 bits of the JI byte both on, both off, or mixed?" If they are mixed, then the subblock search continues by returning control to stage 730. If they are both on, this means that there have been two successive forward going steps and, accordingly, control goes directly to stage 776. If they were both off, this means there have been two successive backward going steps and control goes to stage 758 wherein the register R1 is decreased by the subblock length, which is in MASK1.
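The three-way decision at stage 778 can be expressed compactly. In this Python sketch the function name and the returned stage numbers as integers are illustrative choices, not part of the specification.

```python
def stage_778(ji):
    """Return the number of the next stage, given control byte JI."""
    last_up = bool(ji & 0x40)    # direction of the step just taken
    prev_up = bool(ji & 0x10)    # direction of the step before it
    if last_up != prev_up:       # mixed: keep jumping
        return 730
    if last_up:                  # two successive forward steps
        return 776
    return 758                   # two successive backward steps
```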

Control then goes to stage 780 wherein the question is again asked "is the new value in register R1 equal to or greater than the address of the first EXECUTIVE POINTER that is recorded in location IR6?" If that is the case, control goes directly to stage 776. If the new address in register R1 is less than the address in register IR6, control goes to stage 770, wherein the address in register R1 would be set equal to the value of IR6. This assures that the address in register R1 will not be less than the address of the first EXECUTIVE POINTER in the array. From stage 770 control goes to stage 776 wherein the macro-instruction CSSCRN is executed. There, the question is asked "is the screen of the present EXECUTIVE POINTER, whose address is recorded in register R1, equal to zero?" If it equals zero, the SUPERSCH termination procedure that starts at stage 746 is executed (see above). If, at stage 776, the answer is "not zero", control goes to stage 782. At stage 782 the CSSCRN macro-instruction is used to ask the question: "is the screen whose address is recorded in register R1 less than the screen whose address is recorded in register IR2 (the JOBLIST item)?" If the answer is "not less than", then control again goes to stage 746 to begin the SUPERSCH termination procedure. It should be noted here that if the two screens are equal, the exact location in the array has been located, and no new EXECUTIVE POINTER will be inserted. However, if, during a storage mode, the screen whose address is in register R1 is greater than the JOBLIST item screen, whose address is in register IR2, then the INSERT macro-instruction of FIG. 20 will eventually be used to create a hole and insert a new EXECUTIVE POINTER.

If the answer at stage 782 is "less than" control goes to stage 784 wherein register R1 is incremented by the EXECUTIVE POINTER length, which is in register R2.

Then, control moves to stage 786 wherein the question is asked "is register R1 greater than, less than, or equal to the continuance address that is recorded in location C7?" If register R1 is greater than the continuance address, control goes to stage 722, and therein indicates that there is an error in SUPERSCH. If register R1 is equal to the continuance address, then control goes to stage 788 wherein the MSIGNAL 20 bit is turned "on", thus indicating that a continuance of the array must be fetched. Control goes from stage 788 to stage 746 wherein the SUPERSCH termination procedure begins (see above).

If, at stage 786, register R1 is less than the continuance address, control returns to stage 776. This cycle (through stages 776, 782, 784, and 786) is repeated until control goes to stage 746 (from stages 776 or 782) or either of stages 722 or 788 (from stage 786). Thus, in the SUPERSCH macro, we have either found the exact location of an EXECUTIVE POINTER or found the exact location where the new EXECUTIVE POINTER is to be inserted, or we have determined that an extension (or continuance) of the array must be fetched or created and searched.
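The overall effect of SUPERSCH, a coarse search in jumps of the square root of the number of pointers followed by a pointer-by-pointer scan, can be illustrated with a simplified Python sketch. It is deliberately not a transcription of the flow diagram: it jumps forward from the head of the array rather than starting from the last pointer and reversing direction under control of byte JI, it models the array as a Python list of screens with None standing for a vacant (zero) position, and the function name and result labels are hypothetical.

```python
import math
from typing import List, Optional, Tuple

def supersch_sketch(screens: List[Optional[bytes]], target: bytes) -> Tuple[int, str]:
    """Square-root jump search over ascending screens (simplified sketch).

    Returns (index, outcome) where outcome is one of:
      "found"       -- a pointer with the target screen exists at index
      "insert"      -- target belongs at index, before a higher screen
      "vacant"      -- index is the first vacant position
      "continuance" -- the end of the array was reached
    """
    n = len(screens)
    step = max(1, math.isqrt(n))              # plays the role of MASK1
    lo = 0
    # Coarse phase: advance one subblock while its first screen is lower.
    while (lo + step < n and screens[lo + step] is not None
           and screens[lo + step] < target):
        lo += step
    # Fine phase: one pointer at a time (compare stages 776 to 786).
    i = lo
    while i < n:
        screen = screens[i]
        if screen is None:
            return i, "vacant"
        if screen == target:
            return i, "found"
        if screen > target:
            return i, "insert"
        i += 1
    return n, "continuance"
```

For nine screens the jump length is 3, so at most a few coarse comparisons precede a scan of one subblock.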

FIG. 20 is the flow diagram for the macro-instruction INSERT. By the time the program reaches the macro-instruction INSERT, at stage 428 of FIG. 18, certain things have occurred. First, the particular EXECUTIVE POINTER recorded in register IR6 has been defined, and, additionally, it is known whether, at that address, the space is empty or filled. Further, if the space is filled, we know that the screen in the array is greater than the screen in JOBLIST. Additionally, we know whether we are in the retrieval or storage mode; here, we know that it is the storage mode. Further, we have the JI value set in CSCREEN; that is, the value of the JI byte is either 01 or 04, meaning that the screen in the array is zero or that the screen in the EXECUTIVE POINTER in the array is higher than the screen in JOBLIST.

The first stage of the macro INSERT is stage 800 wherein the question is asked "is JI equal to 01?". If the byte JI is 01, then this means that the screen in the array is equal to zero.

If the value of the screen is zero, the program continues to stage 802. If JI is not 01, then the program continues to stage 804. For purposes of discussion we will assume that the 04 bit of JI is "on" and that we are proceeding to program stage 804. At stage 804 two registers are set. The first register, R0, is set at the length of the EXECUTIVE POINTER which is stored in the register BRY. Register R1 is set to the address of the EXECUTIVE POINTER in the array, which is in register IR6. After these two registers are set, the program continues to stage 806 wherein the question is asked "is the EXECUTIVE POINTER in the array whose address is in register R1 vacant or zero?" If it is vacant or zero then the program would continue to stage 808.

If, at stage 806, the EXECUTIVE POINTER whose address is in register R1 is not zero, then the program would continue through a loop defined by stages 810 and 812 until a vacant or zero location below the starting address is found in the array. First, if the EXECUTIVE POINTER is not zero, at stage 810, register R1 is reset to one EXECUTIVE POINTER length beyond the address set in IR6, by adding register R0 to register R1. Then, at stage 812, the determination is made as to whether the continuance address has been reached. If it has not been reached, the program returns to stage 806 and a determination is made as to whether at that new R1 the EXECUTIVE POINTER is zero. If the EXECUTIVE POINTER was zero, then control goes to stage 808. If the EXECUTIVE POINTER at the new address in R1 is filled, the program would continue through stages 810 and 812 until either (a) an empty EXECUTIVE POINTER location would have been found, or (b) the absolute continuance address in register IR3 would be found. If the absolute continuance address in register IR3 is found, the program will continue to stage 814. At stage 814, the following registers would be set:

a. R1 is reset to the address of the last EXECUTIVE POINTER in the array by subtracting the value of R0 from R1. Then,

b. Registers R0 to R6 are stored in array C1; and

c. The value of register IR3 is decreased by R0.
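The loop through stages 806, 810 and 812 can be sketched in Python as shown here. The array is modelled as a bytearray and addresses as offsets into it; the function name find_vacant and its return convention are hypothetical, and the register saving of stage 814 is not modelled.

```python
def find_vacant(array, start, ptr_len, continuance):
    """Sketch of stages 806-814 (hypothetical names).

    Scans forward from 'start' in steps of one EXECUTIVE POINTER.
    Returns (offset, True) for the first all-zero pointer, or
    (offset of the last pointer, False) when the continuance
    address is reached without finding a vacancy.
    """
    r1 = start
    while r1 < continuance:                   # stage 812
        if not any(array[r1:r1 + ptr_len]):   # stage 806: vacant or zero?
            return r1, True                   # proceed to stage 808
        r1 += ptr_len                         # stage 810: R1 = R1 + R0
    return r1 - ptr_len, False                # stage 814(a): back up by R0
```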

After completing these steps, the program continues to stage 816 where a determination is made as to whether the 01 bit in LEXICON is on.

At this time, perhaps a discussion of the LEXICON array and its use in INSERT is desirable. The 16 bytes of the LEXICON array are used by the INSERT macro. In the first byte, which is set in SSEARCH, the last four bits are significant. The 01 bit is used to signify whether an address has been determined for LEXICON in the YY array storage. Further, when the 01 bit is on, there is an indication that, in addition to the address in the YY array having been determined, the screen in JOBLIST has been stored in the high end of the YY array and the address of the screen is stored beginning at LEXICON plus 4 bytes. At stage 816, if the 01 bit is zero, then the program would continue to stage 818. If the 01 bit has been on, the program would have continued to stage 820. At stage 818 certain steps are taken:

a. the register IR1 is set at a value equal to the beginning address of the main storage array, which is stored in AYY during the initialization by SSTATECL;

b. the System Parameter &LTHAYY, which is the length of the main storage array YY, is added to IR1. This would bring us to the end of the YY array.

c. From this value is subtracted the half word DUN1, which is the length of the screen in JOBLIST.

d. Then, this address IR4 is stored in LEXICON plus 4 bytes.

e. In LEXICON plus 8 bytes is stored the address IR1 minus R0. R0 contains the length of the EXECUTIVE POINTER in the array.

f. In LEXICON plus 12 bytes is stored IR1 minus 2 times R0.

g. In LEXICON plus 16 is stored ADDRESS, which is the composite address of the next available array which will be requested in CREATE.
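The address arithmetic of stage 818 can be checked with a small Python sketch. Addresses are treated as plain integers, the function name lexicon_save_area is hypothetical, and the result is returned as a dictionary keyed by the byte offset from LEXICON at which each value is stored.

```python
def lexicon_save_area(ayy, lthayy, dun1, r0, address):
    """Sketch of the values recorded at stage 818 (hypothetical names).

    ayy     -- beginning address of the main storage array YY
    lthayy  -- length of YY (System Parameter &LTHAYY)
    dun1    -- length of the screen in JOBLIST
    r0      -- EXECUTIVE POINTER length
    address -- composite address of the next available array
    """
    ir1 = ayy + lthayy - dun1      # save location for the JOBLIST screen
    return {
        4: ir1,                    # LEXICON plus 4: saved screen
        8: ir1 - r0,               # LEXICON plus 8: first saved pointer
        12: ir1 - 2 * r0,          # LEXICON plus 12: second saved pointer
        16: address,               # LEXICON plus 16: ADDRESS
    }
```

The save area thus occupies the high end of YY: the screen at the very top, with room for two EXECUTIVE POINTERS immediately below it.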

Then the program continues to stage 822 wherein the LMOVE macro is used to move the screen from JOBLIST and store it at the location described by register IR1 at the end of the array YY. This saves the JOBLIST screen in storage for its eventual insertion in the correct position in the array.

Then, as with the case of the LEXICON bit being turned "on", the program continues to stage 820.

At stage 820 a check is made to determine whether the 02 bit of LEXICON is turned on. If the 02 bit is zero, then, at stage 824, the 02 bit in LEXICON is turned on and register IR1 is loaded with the address in LEXICON plus 8. If the 02 bit were on at stage 820, the LEXICON 04 bit would have been turned on at stage 826 and register IR1 would have been loaded with the address at location LEXICON plus 12. In this case, it means that there is already an EXECUTIVE POINTER defined by the address in LEXICON plus 8 and, therefore, it is necessary to load the new EXECUTIVE POINTER in the address defined at location LEXICON plus 12.
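The choice between the two save locations at stages 820 to 826 amounts to a test of one flag bit. In this Python sketch the function name is hypothetical, and the return value pairs the updated LEXICON flag byte with the LEXICON offset (8 or 12) from which IR1 is loaded.

```python
def choose_save_slot(lexicon_flags):
    """Sketch of stages 820-826: pick the save slot for a displaced pointer."""
    if not lexicon_flags & 0x02:           # stage 820: first slot still free
        return lexicon_flags | 0x02, 8     # stage 824: use LEXICON plus 8
    return lexicon_flags | 0x04, 12        # stage 826: use LEXICON plus 12
```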

After completion of the steps at either 824 or 826, control goes to stage 828 wherein the half word MASK1 is set equal to the length of the EXECUTIVE POINTER, which is in register R0. Then control is transferred to stage 830, wherein the left move macro instruction LMOVE is executed and the EXECUTIVE POINTER in the array, defined by register IR3, is stored at location in register IR1 in the YY array. Then registers R0 to R6 are loaded from array C1 at stage 832. These registers were saved during the transfer process at stage 814.

At stage 808, the right move macro RMVC is executed. For this macro, certain registers are set. Register BRYY, the number of bytes which have to be moved within the array, is set equal to register R1 minus register R6. Register BRY is set equal to register R1 minus 1. Register R1 is set equal to register BRY plus register R0. BRY now contains the end location of the old array.

It should be understood that R1 contains either the address of the last byte in the last EXECUTIVE POINTER in the array, or alternatively, if this stage 808 had been reached directly from stage 806, the address of the last byte in the first vacant EXECUTIVE POINTER in the array.

The right move macro RMVC moves the EXECUTIVE POINTER starting at the address IR6 to the end of the array one EXECUTIVE POINTER length, leaving a hole in the sub array for the new EXECUTIVE POINTER whose screen is in JOBLIST and whose address will be inserted in MMATCH. The last EXECUTIVE POINTER in the array must be stored and, in fact, was stored previously, in the save area of the YY array and the address where said EXECUTIVE POINTER was stored is recorded at either LEXICON plus 8 or LEXICON plus 12.
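The effect of the right move, shifting a run of EXECUTIVE POINTERS one pointer length toward the end of the array so that a hole opens at the insertion point, can be sketched in Python. The array is modelled as a bytearray with spare room past the last pointer; the function name open_hole is hypothetical, and zeroing the hole inside the same function is a convenience of the sketch, not a step attributed to RMVC in the text.

```python
def open_hole(array, hole, end, ptr_len):
    """Shift array[hole:end] right by ptr_len bytes, in place.

    'hole' is the offset where the new pointer will go (role of IR6);
    'end' is the offset just past the last pointer to be moved. The
    bytearray must already extend to at least end + ptr_len.
    """
    # The right-hand slice is copied before assignment, so the
    # overlapping move is safe.
    array[hole + ptr_len:end + ptr_len] = array[hole:end]
    # Assumption of this sketch: clear the hole immediately.
    array[hole:hole + ptr_len] = bytes(ptr_len)
```

For instance, opening a 4-byte hole at offset 4 in AAAACCCC followed by four spare bytes leaves AAAA, four zero bytes, then CCCC.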

At the completion of the right move macro RMVC at stage 808, the program continues to stage 834 wherein location C15 and register BRYY are both set equal to the value of the half-word DUN1, which is the length of the screen in JOBLIST. Location C16 and register BRY are set equal to the value of the half word DUN1 plus &ADDL. Thus BRY and C16 now contain the length of the EXECUTIVE POINTER which will be inserted in the array. Next registers R0 to R6 are stored in array C1.

Then, the program continues to stage 836 wherein the question is asked "is the LEXICON byte zero?" This would have occurred only if the program stage 808 had been entered directly from stage 806. If this was so, the LEXICON array would not be used because no continuance is required; and, accordingly, the program then continues directly to stage 838 wherein certain operations would be accomplished before exiting from the program.

At stage 838, R0 is set to the value of the half-word DUN1, and IR6 is changed to the IR6 value plus R0. IR6 now contains the location of the place in the array where an address must be added to the screen from JOBLIST to form a new EXECUTIVE POINTER. The address at IR6 is, of course, set to zero. Then IR6 is reset to the screen address in the array by subtracting therefrom the value R0.

After completing these steps, the control goes to stage 840 wherein another LMOVE macro is executed. The screen in JOBLIST is moved to the address designated by IR6 in the array.

After completing stage 840, IR2 is reset to equal IR2 plus BRYY, or the address of the next screen in JOBLIST. IR6 is reset to IR6 plus BRYY, or the position in the array where the address must be placed adjacent to the screen just taken from JOBLIST. This is accomplished at stage 842. After completion of stage 842, the program is completed.

Returning to stage 836, we will consider the possibility when LEXICON is not equal to zero. Then, the program continues at stage 844 wherein a determination is made whether the 04 bit in LEXICON is zero or one. If the 04 bit is zero, that means that the location defined by the address in LEXICON plus 12 bytes is empty. If the 04 bit is one, then the location defined by the address in LEXICON plus 12 is filled. It should be understood that when the location in the YY array whose address is in LEXICON plus 12 bytes is filled then, necessarily, the location determined by the address in LEXICON plus 8 bytes is also filled and two EXECUTIVE POINTERS are in place. If the 04 bit is on, the program continues to stage 846. If the LEXICON 04 bit is zero, the program continues to stage 848.

At stage 846, the half word MASK1 has stored therein the contents of register R0. R0 contains the EXECUTIVE POINTER length. Then register IR1 is loaded with the address in LEXICON plus 8 bytes.

If the 04 bit of LEXICON were zero, as was stated previously, the program would have continued at stage 848 wherein the 01 bit of LEXICON would be checked. If the 01 bit in LEXICON were one, this would have meant that there were no more continuances and control would have gone to stage 852 and thence in the same manner that will be discussed with respect to stage 802. However, for our purposes, we will discuss the operation when stage 848 indicates that the 01 bit of LEXICON is zero. At stage 854 the half-word MASK1 is loaded with the EXECUTIVE POINTER length, which is in Register R0. Then, control goes to stage 856 wherein the macro LARGEXC is executed. This instruction merely zeros or erases the EXECUTIVE POINTER which was in the hole identified by the address recorded in register IR6. IR6 has recorded therein the address of the hole where the screen of JOBLIST is to be inserted. Thus, at stage 856, this hole has been erased and cleared for a later insertion.

After completing the step at stage 856, control is transferred to stage 858 wherein the LEXICON 01 bit is turned on. Then, at stage 860 the address in LEXICON + 8 is recorded in register IR1. Then, the program continues to stage 862. At stage 862, another LMOVE macro instruction is executed and the screen of the EXECUTIVE POINTER located by the address in register IR1 is moved into the JOBLIST in place of the screen originally therein. The screen originally therein, of course, has been saved in the YY array at the address recorded in LEXICON + 4.

After completion of the operation at stage 862, control goes to stage 864 wherein the registers R0 to R6 are loaded from the array C1 and the MSIGNAL 20 bit is turned on. After completion of stage 864, it will be obvious that the insertion operation has not been truly completed by the operation above described, but the program will continue, as shown in FIG. 16 through MMATCH stage 616, TBADD macro 620, and stage 626 back to the start of SCREEN at stage 614. This will recycle and, when it comes through for the second time, the LEXICON 04 bit will be on. When the LEXICON 04 bit is on, control would have come through stage 846 to stage 850. At stage 850 an LMOVE macro-instruction would have been executed to insert the EXECUTIVE POINTER recorded in the YY array at the address in LEXICON plus 8 bytes into the opening whose address is recorded in register IR6.

After completing stage 850, control is transferred to stage 866 wherein register IR3 is loaded with the address in LEXICON + 12. Then, at stage 868, another LMOVE macro is executed to transfer the EXECUTIVE POINTER in the YY array determined by the address in LEXICON + 12 to the place in the YY array determined by the address in LEXICON + 8 bytes. After completing stage 868, control goes to stage 870 wherein the 08 and 04 bits in the LEXICON byte are turned off. Then, control goes to stage 862 wherein the screen stored in the YY array whose address is at LEXICON + 8 bytes is transferred to JOBLIST. Then control passes through stage 864 as discussed previously.

If the LEXICON 04 bit was zero and the LEXICON 01 bit was one, the control goes to stage 852, as discussed earlier. Similarly, when the 01 bit of the JI byte was on, control also goes from stage 800 to stage 852. In stage 852, the register IR1 has recorded therein the address in LEXICON + 8 bytes. The half word MASK1 has recorded therein the value of R0, which is equal to the value in the half word DUN1 plus &ADDL, the EXECUTIVE POINTER length. From stage 852 control is transferred to stage 872 wherein another LMOVE macro instruction is executed and the EXECUTIVE POINTER stored in the YY array at the address defined in LEXICON + 8 is inserted in the array at the address recorded in IR6. Then, at stage 874, IR1 register is reset to LEXICON + 4 bytes. At stage 876, the screen whose location was determined by the address in LEXICON + 4 bytes is transferred back to the JOBLIST.

After completing stage 876, control goes to stage 878 and certain parameters are set. First, the byte LEXICON is set to zero; the registers R0 to R6 are loaded from array C1, and the MSIGNAL 20 bit is turned on. After completing stage 878, control is transferred to stage 620 in FIG. 16.

It must be understood that the hole created in the array will be filled when control returns to SCREEN, after cycling through SSEARCH, at stage 730 and goes to stage 732. The LMOVE macro in stage 732 will insert the correct screen in the hole left in the array.

In FIG. 21, there is shown the flow diagram for the MMATCH macro. In the first stage in MMATCH a determination is made as to whether the address associated with the EXECUTIVE POINTER in the array, as determined by the address recorded in register IR6, is zero or a value other than zero. IR6 has been set by any one of the three stages in INDEX search 610, screen search 614 or AUXFILE search 630 to point at the address of the EXECUTIVE POINTER in the array or to the continuance address. If the address specified by register IR6 is not zero then there is no mismatch; and, therefore, control should go to TBADD. Thus, MMATCH will be bypassed and control will go directly to TBADD. The output of stage 880 is connected to stage 882 wherein the address specified in register IR6 is recorded in the location ADDRESS.

Then the program continues to stage 884 wherein the question is asked "is the MSIGNAL 20 bit zero or one?". If the MSIGNAL 20 bit is "one", indicating that one must access or create a continuance, the program goes directly to the TBADD macro. If the MSIGNAL 20 bit is off or zero, the program continues to stage 886. There the half word MASK2 value is stored in the half word DUN1. As we noted previously, the half word DUN1 contains the screen length and it now contains the screen length of the next screen in JOBLIST.

Further, the third byte of MSIGNAL (or MSIGNAL + 2 bytes) is incremented by one to indicate that this is the first screen in JOBLIST. Each time MMATCH is executed at stage 884 and the MSIGNAL 20 bit is 0, the MSIGNAL + 2 byte is incremented by one. After completing stage 886 the control goes to the TBADD macro at stage 620 in FIG. 16. The programmatic stages 880, 882, 884 and 886 are not physically within the MMATCH program listing, but they have been shown in the flow diagram for purposes of clarity.

After determining at stage 880 that the address specified by register IR6 is zero, control is transferred to stage 888 where SRGATE is set equal to 02. SRGATE is a special gate which has three different settings. If SRGATE is 02, this means that the search was successful and the search may be continued. If SRGATE is 01, this means that the search must be reexecuted and that the JOBLIST will have been rearranged. This occurs in the macro-instruction STRATEGY. If the SRGATE is 00, this means that the search has failed and it must be terminated. This is accomplished by exiting to the instruction FINISHED in SSEARCH. The next stage of MMATCH is at stage 890 where the question is asked "is MODE equal to one?" If MODE is equal to one, then the search is in the storage mode. If MODE is not equal to one, then a retrieval or one of the three update mode operations is being executed. Thus, if the answer at stage 890 is that MODE is one, control goes to stage 892. If the answer at stage 890 is that MODE is not equal to one, the control is transferred to stage 894. At stage 892 the MSIGNAL 01 bit is turned on, indicating that the resident memory block has been changed. This information will be used by the TBADD macro and the Global Memory component (SMEMORY) if a new memory block is required.

After completing stage 892, control goes to stage 895 and SRGATE is checked. If the SRGATE is zero, then this means that the search is finished and it is terminated through stage 896 at FINISHED. If SRGATE is 01, this means that the JOBLIST was rearranged in the STRATEGY macro and a new search must be started by going through stage 898 to the location NEWPLAY in the SSEARCH component. If the SRGATE is in fact 02 at stage 895, then control goes to stage 900 wherein the composite address EMPTY is stored in the array at the address defined by register IR6. This completes the construction of the new EXECUTIVE POINTER in the array. Additionally, the MSIGNAL 04 bit is turned on and the location at the head of the M array where address EMPTY is normally stored is zeroed. It should be remembered that the M and J arrays are permanently resident in core. Also, at stage 900, the address in register IR6 is recorded in the storage location BWX plus 76. This information may be required by the CREATE macro and the SMEMORY component.
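The three settings of SRGATE and the branch taken at stage 895 can be summarized in a short Python sketch. The function name and the returned labels are illustrative only; FINISHED and NEWPLAY are the locations named in the text, while the label for stage 900 is a description rather than a program location.

```python
def srgate_dispatch(srgate):
    """Sketch of the three-way branch on SRGATE at stage 895."""
    if srgate == 0:
        return "FINISHED"            # stage 896: search failed, terminate
    if srgate == 1:
        return "NEWPLAY"             # stage 898: JOBLIST rearranged, search again
    if srgate == 2:
        return "stage 900"           # store composite address EMPTY at IR6
    raise ValueError("SRGATE has only the settings 00, 01 and 02")
```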

At stage 894 the register IR6 contains the address of a vacant or zero location in the array. This means that the retrieval and update has been unsuccessful and the override code information must be used to determine whether or not the search must be aborted. It should be noted that the three remaining options for MODE (namely 2, 3, and 4), which were discussed earlier, can be carried out by appropriate program steps between stages 890 and 894.

If, at stage 890, it was determined that the search was not in the storage mode, the program would have continued to stage 894 wherein the counter JII is set to zero. SRGATE is also set to zero. This information indicates that the search has been unsuccessful. This information can, of course, be overriden if in the next succeeding states 904, 906 and 908 control goes to STRATEGY and STRATEGY determines that, in fact, the search can be successful.

After completing the operation at stage 894, control goes to stage 902 wherein the question is asked "is the MSIGNAL 20 bit zero or one?". The MSIGNAL 20 bit indicates whether or not a link (or continuance) address is to be inserted at the foot of the array. If a link address is being considered the answer is one, and control goes directly to stage 895. Because the SRGATE has been set at zero the unsuccessful search will be aborted through stage 896. If, at stage 902, the answer was "zero," then the MSIGNAL 20 bit was not turned on and control goes to stage 904 wherein the question is asked "are there any Type 1 over rides?" If the answer is "no", then control goes to stage 906 wherein the question is asked "are there any Type 2 over rides?". If the answer at this stage is "no", the program would have continued to stage 908 where again the question would be asked "are there any Type 3 over rides?" If the answer at stage 908 is "no", control goes to stage 895 and, since the SRGATE is zero, the unsuccessful search would be aborted through stage 896 at location FINISHED in the SSEARCH component.

If at any one of the stages 904, 906 or 908, Type 1, Type 2, or Type 3 overrides were found, control is transferred to stage 910 and the macro-instruction STRATEGY is executed. Type 1, Type 2 and Type 3 overrides are introduced in the assigned descriptor-sets and they are counted and their locations are noted in the translators when these descriptor-sets are rearranged to their JOBLIST item forms. This occurs before the search procedure begins. Thus, if Type 1, Type 2 or Type 3 overrides are present in the assigned descriptor-sets, the macro-instruction STRATEGY will be used.
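The decision made at stages 902 through 910 can be sketched as follows. This is a minimal illustration in Python, not the macro code of the system; the function name and the representation of the override counts as integers are assumptions made for the example.

```python
LINK_BIT = 0x20  # MSIGNAL 20 bit: a link (continuance) address is being inserted


def after_unsuccessful_lookup(msignal, type1, type2, type3):
    """Return 'ABORT' or 'STRATEGY' after SRGATE has been set to zero.

    msignal            -- the MSIGNAL byte as an integer
    type1/type2/type3  -- counts of Type 1, 2 and 3 overrides noted by the
                          translators (hypothetical representation)
    """
    if msignal & LINK_BIT:                    # stage 902 -> 895 -> 896
        return "ABORT"
    if type1 > 0 or type2 > 0 or type3 > 0:   # stages 904, 906, 908 -> 910
        return "STRATEGY"
    return "ABORT"                            # stage 895 -> 896 (FINISHED)
```

With no overrides present the unsuccessful search is always aborted; a single override of any type is enough to reach STRATEGY, provided the link bit is off.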

The flow diagram for the STRATEGY macro-instruction is shown in FIG. 22. In the first stage, 916, a special TRANSFER macro-instruction transfers control to the special component SMATCH. In SMATCH the inverted or intersecting file type search is accomplished automatically.

At stage 918, the question is asked, "what is the value of SRGATE?". If SRGATE is two, then control goes directly to stage 922 wherein the question is again asked "is SRGATE equal to, less than, or greater than one?". If SRGATE is zero, at stage 918, control passes to FINISHED in the SSEARCH macro. This would terminate the search.

If SRGATE is one, at stage 918, the following conditions have probably occurred. The screen in JOBLIST might have been A1BD. If, in searching the screen array, SMATCH at stage 916 determines that there is an ABCD screen in the array, then SMATCH would have set the SRGATE so that the MOBILE CANONICALIZATION routine would be executed at stage 920. SMATCH would not have set the SRGATE at one if, during the translation stage, the JOBLIST had not been normalized.

At stage 920, control is transferred to the macro-instruction MOBILE wherein the MOBILE CANONICALIZATION package is executed on the JOBLIST item. At the end of the SMOBILE program, control is transferred back to stage 922 in STRATEGY. MOBILE CANONICALIZATION can be defined as a strategic rearrangement of the JOBLIST item which might effect matching. If the JOBLIST item can be rearranged so as to achieve matching, then SMOBILE will do it.

After completing stage 920, control goes to stage 922 wherein again the question is asked "is SRGATE equal to, less than or greater than one?". If less than one, the unsuccessful search is terminated at the location FINISHED in SSEARCH. If SRGATE is equal to one, control goes to the location NEWPLAY in SSEARCH. This occurs when new permitted arrangements of the JOBLIST item were effected in SMOBILE. If SRGATE is greater than one, the matching procedure (executed in SMATCH and/or SMOBILE) has disclosed the existence of an acceptable information subpath. In this case, at stage 912, the composite address in location 0(IR6) is loaded into ADDRESS and control goes to stage 924 in the TBADD macro.
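The three-way dispatch on SRGATE in STRATEGY (stages 916 through 922) may be illustrated by the following Python sketch. The function and parameter names are hypothetical; SMATCH and SMOBILE are represented only by the SRGATE values they leave behind.

```python
def strategy(srgate_from_smatch, run_smobile):
    """Return the location to which STRATEGY passes control.

    srgate_from_smatch -- SRGATE as set by SMATCH at stage 916
    run_smobile        -- hypothetical callable standing in for SMOBILE; it
                          returns the SRGATE value left after stage 920
    """
    srgate = srgate_from_smatch
    if srgate == 0:                 # stage 918: nothing matched
        return "FINISHED"
    if srgate == 1:                 # stage 918 -> 920: try rearrangement
        srgate = run_smobile()
    # stage 922
    if srgate < 1:
        return "FINISHED"           # unsuccessful search terminated
    if srgate == 1:
        return "NEWPLAY"            # new permitted arrangement, search again
    return "TBADD"                  # stage 912, then stage 924 in TBADD
```

Note that SMOBILE is consulted only when SMATCH leaves SRGATE at exactly one; a value of two bypasses stage 920 entirely.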

The flow diagram for TBADD is shown in FIG. 23. In TBADD, the first stage (924) contains a COMPARE macro-instruction which compares the slow address parts of the composite addresses CURRENT and ADDRESS. If these two slow memory addresses are equal, this means that the requested memory-block is already resident in core-memory. It should be remembered that CURRENT contains the virtual memory address of the resident memory-block, and ADDRESS contains the virtual memory address of the requested memory-block. The second part of ADDRESS, which is called the fast-memory address, specifies the beginning address of the requested array when the requested memory-block is core-resident.

Now to return to FIG. 23, if at stage 924 the slow address parts of CURRENT and ADDRESS are equal then the requested memory-block is already resident in core and control goes to stage 926. There, the APART macro-instruction extracts the fast-memory part of ADDRESS and stores it in register IR6. Next control is transferred to stage 928 and the 04 bit of the MSIGNAL byte is checked. If the 04 bit of MSIGNAL is off, control goes to stage 930. The 04 bit of MSIGNAL indicates whether or not the requested subarray exists in core-memory. If the subarray already exists the MSIGNAL 04 bit is off. If the subarray does not exist, then the MSIGNAL 04 bit is one and a new subarray must be created by the CREATE macro at the address specified in register IR6.

If the MSIGNAL 04 bit is on, control goes to stage 932 wherein the macro-instruction CREATE is executed. The operation of the macro-instruction CREATE will be more fully discussed hereinafter (see FIG. 24). However, for purposes of a simplified description, CREATE first checks to see whether it is possible to fit a new subarray in the resident memory block. If there is not enough space in the resident block CREATE calls the Global Memory Component (SMEMORY), which writes the resident memory-block in virtual memory and then creates a new resident memory-block. All the composite addresses are updated. Control then goes to stage 930 wherein the MSIGNAL 04 bit is turned off. After completing stage 930, control is transferred to stage 626, as shown in FIG. 16.

If there is room in the memory block for the subarray, CREATE simply creates the subarray, updates the composite addresses in the memory block, then control goes to stage 930.

If, at stage 924, it was determined that the slow memory portions of CURRENT and ADDRESS were not equal, then control passes to stage 934 wherein the slow portion of ADDRESS would be compared with zero. If the slow portion of ADDRESS was equal to zero, this would mean that a memory block is not required because the fast memory address in ADDRESS specifies a location in the permanently core-resident part of the AUXILIARY FILE. Accordingly, control passes immediately to stage 926. If the slow-memory part of ADDRESS is not equal to zero at stage 934, then control passes to the macro GLOBAL at stage 936. GLOBAL calls the Global Memory component (SMEMORY), which supervises the memory-blocks of the AUXILIARY FILE. GLOBAL ensures that the resident memory-block will be updated in virtual memory and it fetches (or creates) the new memory-block whose address is in ADDRESS. After GLOBAL has completed its operations, the composite address CURRENT is set equal to ADDRESS. This is accomplished at stage 938. This means that the request address (ADDRESS) is now also stored in CURRENT.

After completing stage 938, control goes to stage 926 in the manner discussed previously.
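The path through TBADD (stages 924 through 938) can be summarized by the sketch below. It is an illustration only: the `Composite` type, the callables `fetch_block` and `create_subarray`, and the returned tuple are assumptions introduced for the example, and the updating of composite addresses performed inside CREATE is not modelled.

```python
from dataclasses import dataclass

SUBARRAY_BIT = 0x04  # MSIGNAL 04 bit: requested subarray must be created


@dataclass(frozen=True)
class Composite:
    slow: int  # slow-memory (virtual memory) part
    fast: int  # fast-memory (core) part


def tbadd(current, address, msignal, fetch_block, create_subarray):
    """Return (CURRENT, IR6, MSIGNAL) after the TBADD steps."""
    # stage 924: is the requested block already resident?
    # stage 934: a zero slow part means permanently core-resident storage
    if address.slow != current.slow and address.slow != 0:
        fetch_block(address)        # stage 936: GLOBAL calls SMEMORY
        current = address           # stage 938
    ir6 = address.fast              # stage 926: APART extracts the fast part
    if msignal & SUBARRAY_BIT:      # stage 928
        create_subarray(ir6)        # stage 932: CREATE
    msignal &= ~SUBARRAY_BIT        # stage 930: 04 bit turned off
    return current, ir6, msignal
```

The memory block is exchanged only when the slow parts differ and the requested slow part is nonzero; in every other case the resident block is used as it stands.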

In FIG. 24, there is shown the flow diagram for the CREATE macro. In the first stage, 940, initializing information is computed. First, register IR5 is set equal to the sum of the values in registers IR6 and IR1. This sum in register IR5 is the absolute machine address of the head of the array that is to be created. Register IR6 contains the relative address of the head of the array that is to be created within the resident memory block. Register IR1 contains the base address of the memory block.

Register BRY is set equal to &TRKL times &NTRKS, or the memory block size. &TRKL and &NTRKS are system parameters with &TRKL being the number of bytes per track and &NTRKS being the number of tracks per memory block. Register R0 has recorded therein the half word DUN1, which is the screen length. Register R1 has recorded therein &ADDL, which is the address length for the array. It is now necessary to determine, first, how much storage is required for the array so that it will then be possible to determine whether there is sufficient room in the memory block for the new array to be created. Thus, the first step is to proceed to stage 942 where the question is asked "is R0, which contains the screen length, greater than zero, zero, or less than zero?" If R0 is less than zero, control goes to stage 944, where register R1 is set at &LSLOW, the slow memory address length which is a system parameter used in AUXFILE, and register R0 is set to zero.

If register R0 is equal to or greater than zero, control goes directly to stage 946. Control also goes from stage 944 to 946. At stage 946, register DUN7 has recorded therein the sum of registers R1 and R0. In the case of control coming from stage 944, DUN7 will contain &LSLOW. If register R0 was equal to or greater than zero, then DUN7 will contain the sum of &ADDL and the screen length. This is the EXECUTIVE POINTER length for the array. Additionally, at stage 946, register BRYY is loaded with &MATRIXS, which is a system parameter indicating the number of EXECUTIVE POINTERS in a secondary array. Secondary arrays are those associated with the screens BD.sub.1, LD.sub.1, BD.sub.2. . . , and the bulk storage addresses.

The value of the byte (MSIGNAL+2) determines which kind of array is to be created. At stage 948, the question is asked "does the MSIGNAL + 2 byte have two, more than two, or less than two therein?". If there is less than two in the MSIGNAL + 2 byte, then, at stage 950, BRYY is set to 20. A value of less than two in the MSIGNAL + 2 byte means that an array associated with index M or screen J is being created. In this case the length of the array is arbitrarily set at twenty EXECUTIVE POINTERS. If the MSIGNAL + 2 byte is greater than two, then the present value of &MATRIXS is correct and control goes directly to stage 952. If the MSIGNAL + 2 byte contains two, then control goes to stage 954 wherein BRYY would be set equal to &MATRIXL. &MATRIXL is a system parameter indicating the number of EXECUTIVE POINTERS which would fill the primary array, which is associated with the screen LD.sub.o.

At stage 952, the register BRYY (the number of EXECUTIVE POINTERS in the array) is multiplied by DUN7 (the EXECUTIVE POINTER length), four bytes are added thereto, and the total is stored in register R0. At this point the register R0 contains the relative continuance address in the array that is to be created. The extra four bytes are added to account for the location at the head of the array where the relative address of the continuance address is stored. Then, register R1 is set equal to register R0 plus &ADDL, which is the total length of the array that is to be created. &ADDL is added in order to provide space for the continuance address, which will be added at the end of the array. Control then goes to stage 956 where the question is asked "is register R1 (the length of the array) greater than BRY (the size of the memory block)?". If the answer is yes, then there is something wrong in the system which must be checked. First, the program would proceed to stage 958 wherein the register R14 is decreased by 1. This means that the length of the array will be decreased by one EXECUTIVE POINTER length. Then control goes to stage 960 where the question is asked "is R14 greater than zero?". If this is the case, as it should be, control returns to stage 952 to repeat stages 952 and 956. This loop will continue until either the memory block size is greater than the array size, as determined at stage 956, or the array size is zero. If the register R14 is zero, control goes to stage 962 where certain registers are saved. The exact operation at stage 962 and the succeeding stage 964 will be discussed with respect to another phase of the macro-instruction CREATE. However, suffice it to say that control eventually goes to stage 966 wherein the question is asked "is register R14 equal to zero or a value other than zero?".
Since, at stage 960, it was determined that register R14 was zero, control goes to stage 968 where a message is printed that a screen is too large for the memory block, and termination of the SOLID System would begin at location CL1 in the CONTROL routine.
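The size computation of stages 942 through 960 may be sketched in Python as follows. The sketch assumes that the counter decremented at stage 958 is the number of EXECUTIVE POINTERS in the array; all function and parameter names are hypothetical stand-ins for the registers and system parameters named in the text.

```python
def pointer_count(kind, matrixs, matrixl):
    """Stages 948-954: choose the number of EXECUTIVE POINTERS.

    kind -- value of the (MSIGNAL+2) byte
    """
    if kind < 2:
        return 20          # M or J array: arbitrarily twenty pointers
    if kind == 2:
        return matrixl     # primary array (&MATRIXL)
    return matrixs         # secondary array (&MATRIXS)


def array_layout(screen_len, addl, lslow, count, block_size):
    """Return (count, continuance_offset, total_length) for the new array."""
    # stages 942-946: EXECUTIVE POINTER length (DUN7)
    if screen_len < 0:
        pointer_len = lslow
    else:
        pointer_len = addl + screen_len
    while count > 0:                               # stage 960
        cont_offset = count * pointer_len + 4      # stage 952: 4-byte header
        total = cont_offset + addl                 # room for continuance address
        if total <= block_size:                    # stage 956
            return count, cont_offset, total
        count -= 1                                 # stage 958
    # stage 968
    raise ValueError("screen is too large for the memory block")
```

For example, with a screen length of 2, an address length of 6 and twenty pointers, each pointer occupies 8 bytes, the continuance address sits at relative offset 164, and the array is 170 bytes long.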

If the array length is less than or equal to the memory block length then control is transferred from stage 956 to stage 970. At stage 970, DUN7 is loaded with register R0. This occurs because, if the array length had been decreased through stages 958 and 960, then DUN7 will have changed. The value in DUN7 is then stored in the array at the location designated by register IR5. Then the value of IR5 is increased by 4 bytes. Thus, IR5 now points at the first EXECUTIVE POINTER or element in the array that is to be created. At the final step of stage 970 the registers R0 to R4 are stored in the array C1.

Next, control goes to stage 972 where EMPTY is decomposed into its five parts in registers R0, R1, R2, R3, and R4. At this stage, register R4 contains the address of the first unused byte in the memory block. Control then passes to stage 974 where register R14 is set equal to the sum of DUN7 plus &ADDL plus register R4. Thus register R14 contains the relative address of the last byte in the new array that is to be created in core-memory. Register R15 is set equal to R15 plus SAVEYY minus AYY. In this equation, the original R15 was the length of the memory block, computed in BRYY, SAVEYY is the absolute machine address of the beginning of the resident memory-block and AYY is the absolute machine address of the beginning of the M and J arrays. Thus register R15 now contains the relative address of the last byte in the memory block. At stage 976, a determination is made as to whether or not R14 is greater than R15. If R14 is greater than R15, then this means that the new array cannot be created in the resident memory-block. If R14 is less than or equal to R15, control goes to stage 978 where the register R4 is set equal to register R14. From stage 978 control goes to stage 980 where the macro-instruction ASADD performs the task of updating EMPTY with the new value in register R4. After completing stage 980, control goes to stage 982 where registers R0 to R4 are loaded from array C1. Next, control is transferred to stage 984 where register R0 is set equal to register R1 minus 4. Since register R0 originally contained the relative address of the continuation address and register R1 contains the length of the array, register R0 now contains the length of the array minus four bytes.
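The space check of stages 974 through 978 reduces to a single comparison, sketched below. The function name and the use of `None` to signal that the array does not fit are assumptions made for the illustration.

```python
def try_allocate(first_unused, cont_offset, addl, last_byte):
    """Return the new first-unused address (R4), or None if the array
    cannot be created in the resident memory-block.

    first_unused -- R4, fast part of EMPTY (first unused byte in the block)
    cont_offset  -- DUN7 as stored at stage 970
    addl         -- &ADDL, the address length
    last_byte    -- R15, relative address of the last byte in the block
    """
    end = cont_offset + addl + first_unused   # stage 974: R14
    if end > last_byte:                       # stage 976
        return None                           # continue at stage 996
    return end                                # stage 978: R4 = R14
```

When `None` is returned the flow continues with the search for an unused array or the creation of a new memory block.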

From stage 984 control goes to stage 986 where certain safety checks are completed. First, the slow memory portion of address EMPTY is compared with the slow memory portion of address EMPTY + &ADDL. If they are equal, this means that no memory block is to be created and, accordingly, at stage 988, EMPTY is set equal to EMPTY thus updating the fast portion of EMPTY. If the two are not equal at stage 986, control goes to stage 990 wherein the contents of register R0 are stored in the DUM1. Then, at stage 992, the macro-instruction LARGEXC is performed. In LARGEXC, the entire array except for the first four bytes is zeroed. At this point the new array has been created. After completing stage 992, control goes to stage 994 where register R15 is set equal to SAVEYY, the absolute address where the CURRENT address is recorded in the memory-block. Next, the updated composite address EMPTY is stored in the location specified by register R15. After completing stage 994, control returns to the TBADD macro.

The more difficult problem occurs when, at stage 976, it is found that there is insufficient space in the resident memory block to create the new array. When this occurs, control goes directly to stage 996.

At stage 996, the macro-instruction LINKHOLE (defined previously as macro 115) determines whether or not there are any unused arrays in the memory block which can be used for the new array. If the answer is yes, control would go immediately to stage 990, because the array already exists in the memory block. If the answer is no, control goes to stage 998 where a COMPARE macro-instruction compares the slow memory address portions of EMPTY and (EMPTY + &ADDL). If they are equal, this means we have not yet computed the slow memory address of the new memory block that is to be created. If they are not equal, this means that the slow memory address of the new memory block that is to be created has been computed and, accordingly, control goes directly to stage 1000 and, thence, to stage 1002. If a new slow memory address had not been computed, stage 1002 will be reached at the end of the branch which begins at stage 962. For purposes of clarity, it is assumed that the parts of EMPTY and (EMPTY + &ADDL) were not equal at stage 998. At this point the slow memory address of the new memory block has been computed and, accordingly, at stage 1000, register R15 is loaded with the address in SAVEYY, which is the address of the location at the foot of the M and J arrays. Then, the EMPTY address is stored at the foot of the MJ array. Next the MSIGNAL 80 bit is turned off to indicate that the request for a newly created memory-block is being executed. Also, the request address (ADDRESS) is set equal to EMPTY + &ADDL. Register R15 is then updated to (BWX+76). (BWX+76) contains the address in the resident memory block where the new composite address was inserted (see stage 900 in FIG. 21). Then, in this location in the last memory block, address EMPTY + &ADDL is inserted. Thus, the old memory block has now been updated by inserting the new value of the address of the new array. From stage 1000 control goes to stage 1002.
However, before discussing the operation at stage 1002, let us consider the case when, at stage 998, the slow memory addresses of EMPTY and (EMPTY + &ADDL) were equal. In this case, control goes to stage 962 where register R15 is loaded with the address in SAVEYY, which is the address of the composite address at the foot of the M and J arrays. Then, EMPTY is stored in the location at the foot of the M and J arrays. Further, R15 is loaded with the address contained in (BWX+76), and the composite address in the resident memory block that is specified in register R15 is set to zero.

From stage 962 control goes to stage 964 wherein the macro-instruction APART separates the five components of the composite address EMPTY + &ADDL and records them in the five registers R0 to R4. Then, at stage 966 the question is asked "is register R14 equal to zero or a value other than zero?" If R14 is equal to zero, the abort procedure which begins at stage 968 is executed. If R14 is not zero control goes to stage 1004. For purposes of clarity, stage 1004 has been shown as a block grouping which will effect the following:

Stage 1004 is a series of steps which compute the five components for the new composite address for the memory block that is to be created. These five components are as follows:

1. &RD, which is the device type number (disk, tape, data cell, etc.), recorded in register R0.

2. &RDO, which is the device number, recorded in register R1.

3. &RTRK, which is the beginning track on the device, recorded in register R2.

4. &RCYLN, which is the beginning cylinder on the device, recorded in register R3.

5. &RFMADD, which is a relative fast memory address in core where there is space to create an array.

Because a new memory-block is being created, &RFMADD is set equal to the address of the foot of the M and J arrays, plus &ADDL. If the device is a tape rather than a disk, it is not necessary to have both track and cylinder numbers, but only a record number and, accordingly, &RTRK and &RCYLN together contain a single record number recorded in registers R2 and R3. If, in computing the new components for the composite address of the new memory block, it is determined that there is, in fact, insufficient equipment to store the new memory block (for example, one has run out of disk memory and there is no other available virtual memory), then a message is printed to notify the machine operator that he needs to obtain additional storage devices for virtual memory. This abort procedure is completed through the Global Memory component SMEMORY. The resident memory-block is saved and the operator is notified that additional devices must be obtained. Additionally, the system also advises the operator as to what type of devices are to be preferred. The declared universe of the system, that is, the number of devices available to the system, is defined in a single macro-instruction called BEGINS.

From the block of stages 1004 control goes to the next stage, where the composite address (EMPTY+&ADDL) is assembled from the five registers R0 to R4. Then R4 (=&RFMADD) is incremented by &ADDL at stage 1008. It should be understood that in stage 1004 the register R4 contained the value in SAVEYY and it is necessary to increment it by the amount &ADDL to get the relative address of the first byte where the new array can be created. After completing stage 1008, control goes to stage 1010, where the composite address ADDRESS is updated with the new value in register R4. EMPTY is set equal to ADDRESS at stage 1012. Additionally, the EMPTY address is inserted in the resident memory block at the address in (BWX+76) as was done with respect to stage 1000.

From stage 1012 control is transferred to stage 1002 wherein registers R0 to R4 are loaded from the array C1 and then control goes to stage 936 in the TBADD macro. This return to GLOBAL in TBADD saves the resident memory block. After completing the save operation in SMEMORY, control returns from the TBADD macro-instruction to CREATE again. In CREATE, at stage 976, a determination is made as to whether there is sufficient space to create the new array in core and, accordingly, it does so at stages 976, 980, 982, 984, 986, 988, 990, 992 and 994 and then exits from CREATE.

GLOBAL MEMORY

The GLOBAL memory component (hereinafter called SMEMORY), transfers memory blocks between the AUXILIARY FILE, which can be located on any combination of devices, and core storage. The AUXILIARY FILE must be separated by definition from bulk storage. That is, information which is utilized to address information in the bulk storage will be found in the AUXILIARY FILE or in core storage. The AUXILIARY FILE is, normally, placed on the disk storage of the computer and aids in finding the address of a particular group or segments of information. The DCB (macro-instructions of the IBM 360 which define the characteristics of the data set on a peripheral storage device) and read/write instructions of each new device that is made a part of the GLOBAL MEMORY are incorporated in the DCBMEM macro-instruction, which is used in SMEMORY. The storage capacity of each new device must be given in the macro BEGINS. This information will be used by the computer to assign new memory blocks when all previously assigned devices are full. Thus, by modifying SMEMORY the GLOBAL MEMORY is easily extended to include new storage devices when they are added. SMEMORY notifies the operator when the GLOBAL MEMORY is full. Because the existing storage is not altered in any way when the GLOBAL MEMORY is extended, this component (SMEMORY) permits the simultaneous growth of the hardware and retrieval systems.

There are two parts, A and B, of the SMEMORY component. Part A supervises the AUXILIARY FILE while the information paths are being traced or purged or created or updated. Part B is entered when the job stream is terminated. Its function is to save (if necessary) the resident memory block and to punch (if necessary) the first part of the AUXILIARY FILE. This punched card deck, which contains the M and J subarrays, will preface the input deck for the next job stream.

It should be noted that the M and J subarrays, although part of AUXILIARY FILE, are always in core storage. The M and J subarrays are in core storage because all information paths start with these subarrays and, thus, it is possible to save considerable time by avoiding the necessity for fetching information from disk storage to start these paths.

At every step in the retrieval package, safety procedures are executed which assure that the memory block in the AUXILIARY FILE will never be damaged by program, input, operator or machine errors. Only the physical breakdown of the virtual memory hardware components can damage the AUXILIARY FILE. No attempt will be made to describe the numerous safety procedures. The two parts of SMEMORY are described next.

The flow chart of Part A of SMEMORY is shown in FIG. 10. Before explaining the flow chart, it should be understood that the input data for SMEMORY contains a composite address whose slow-memory part specifies the location of a memory block in the virtual memory storage. The fast memory part of the composite address specifies the location of the requested information when the memory-block resides in core-storage. The composite address is normally six bytes long and it is used to determine the course of action of GLOBAL MEMORY. For example, in the search procedure it is necessary that the machine believe, at all times, that the information it is looking for is in core storage. It is a purpose of GLOBAL MEMORY to find the information no matter where it may be and transfer it to core storage whenever it is needed. The composite address discussed above contains, in its first three bytes, information relating to non-core storage. The first four bits designate the type of non-core storage device where the memory-block can be found. For example, the sixteen possible combinations of the first four bits will include codes for tape storage, disks, drum storage, etc. The next four bits designate on which particular one of a possible 16 different units of tape, disk, or drum storage the memory-block can be found. The next 16 bits are divided into six bit and ten bit sections which together designate an address on the particular storage element. For example, if the storage element is a disk, the next sixteen bits contain the track number in the first six bits and the cylinder number in the next ten bits. If the storage element is a tape, then the 16 bits together specify the record where the memory block begins.

The remaining three bytes in the composite address contain the core address where the particular information can be found when the memory-block resides in core.
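The six-byte composite address layout for a disk device can be illustrated with the following Python sketch. The field widths (4, 4, 6, 10 and 24 bits) are taken from the description above; the big-endian bit ordering, in which the "first" bits are the most significant, and the function names are assumptions made for the example.

```python
def pack_composite(device_type, device_no, track, cylinder, fast):
    """Pack the five components into a six-byte composite address."""
    fields = ((device_type, 4), (device_no, 4), (track, 6),
              (cylinder, 10), (fast, 24))
    value = 0
    for field, width in fields:
        if not 0 <= field < (1 << width):
            raise ValueError("component does not fit in its field")
        value = (value << width) | field   # append the field on the right
    return value.to_bytes(6, "big")


def unpack_composite(raw):
    """Split a six-byte composite address into its five components."""
    value = int.from_bytes(raw, "big")
    fast = value & 0xFFFFFF            # last three bytes: core address
    slow = value >> 24                 # first three bytes: non-core storage
    return ((slow >> 20) & 0xF,        # device type
            (slow >> 16) & 0xF,        # device number (one of 16 units)
            (slow >> 10) & 0x3F,       # track (six bits)
            slow & 0x3FF,              # cylinder (ten bits)
            fast)
```

Under these assumptions, device type 1, device number 2, track 3, cylinder 4 and core address 5 pack to the hexadecimal bytes 12 0C 04 00 00 05. For a tape device the track and cylinder fields would instead be read together as one 16-bit record number.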

It should be noted that there is one byte of information known as MSIGNAL, which is continually being updated, and it specifies the type of action that is to be taken by the Global Memory Component (SMEMORY). If the right most bit of MSIGNAL is a "one", this means that the resident memory block or the permanently resident part of the AUXILIARY file has been altered by inserting a new EXECUTIVE POINTER and by creating a new subarray. Another portion of storage which is checked by SMEMORY is the composite address called CURRENT. If CURRENT equals FFFFFFFF (a condition which is placed into CURRENT in SSTRATECL when there is no AUXILIARY FILE) the AUXILIARY FILE does not exist, and a new memory block will have to be created in core-storage. When CURRENT equals zero, that means that there is no resident memory block, and one will have to be fetched from the virtual memory or created in core-storage. These values of CURRENT are set when the AUXILIARY FILE is initialized at the beginning of each job stream.

In one version of the SMEMORY component, the virtual memory was on IBM 2311 disks which had two distinctly different write modes (i.e., new write and rewrite). The 02 bit of the MSIGNAL byte, which is the bit next to the rightmost bit, designates which of the two write modes is to be used to store the resident memory block at the location in virtual memory specified by the slow-memory part of the composite address CURRENT.

The flow diagram for the first part of SMEMORY is shown in FIG. 10. The first step in SMEMORY is to check the rightmost or 01 bit of MSIGNAL at stage 322. If the MSIGNAL 01 bit is zero, this means that the memory block was not changed and, therefore, there need be no readout into virtual memory. If the MSIGNAL 01 bit was one, then there must be a readout into virtual memory.

If the MSIGNAL 01 bit is "one", the next step is to check the address CURRENT to determine whether or not there is an AUXILIARY FILE, by subtracting from CURRENT, FFFFFFFF (CONM.). If there is no AUXILIARY FILE, and the answer is, therefore, "zero", then control continues to stage 326 wherein a determination is made as to whether one wishes to create a memory block in core or whether one wishes to read it from virtual memory. This determination is made by testing the MSIGNAL 80 bit which, if it is "one", indicates that a new memory block is to be created in core and, if it is "zero", indicates that a memory block is to be read from virtual-memory into core. If the MSIGNAL 80 bit is "one" then the next step is to execute stage 328 wherein the machine updates the composite address EMPTY, a position in core storage which contains the composite address of the next available position in virtual memory where a new memory-block can be stored. This EMPTY address is set in CURRENT. Of course, the MSIGNAL 80 bit is turned off and the MSIGNAL 02 bit is turned on to indicate that this newly created memory-block has never been written into virtual memory. When the time comes to write this memory-block into virtual-memory the 02 bit of MSIGNAL will indicate that the new write operation must be used.

After completing step 328, control goes to stage 330 wherein the MSIGNAL 01 bit is turned off to indicate, that at this point, there has been no modification of the new memory block that is now resident in core.

If, at stage 324, the answer was something other than zero, control would have been transferred to stage 332. There CURRENT is compared to zero. If CURRENT were zero, then the procedure set forth with respect to stages 326, 328 and 330 would have been executed in substantially the same manner. However, there is one variation that could occur. That is, at stage 326 it might be determined that an existing memory block was to be read into core-memory from the virtual memory address specified in ADDRESS. In this case the MSIGNAL 80 bit would have been zero and control would have gone to stage 334, where the memory block which had been requested during the SEARCH procedure is read into core storage. It should be understood that in the TBADD macro in the SSEARCH component it was determined that the resident memory block is not the correct memory block for a particular search and, in fact, a different memory block was requested by SSEARCH. After the steps at stage 334 are completed the MSIGNAL 02 bit is turned off to indicate that the new memory block has been read from peripheral storage.

It thus should be noted that when the MSIGNAL 02 bit is off, it indicates that a rewrite procedure must be used when the resident memory-block is transferred back to its specified location in virtual-memory. If the MSIGNAL 02 bit is "on" this would indicate that the resident memory-block is new and the new write procedure must be used to transfer it to the AUXILIARY FILE. After completing stage 336, control goes to stage 330 wherein the MSIGNAL 01 bit is turned off, meaning that the new resident memory block has not yet been changed, then CURRENT is now loaded with the address of the memory block taken from ADDRESS.

If CURRENT is not zero, it specifies where the resident memory-block should be stored in the virtual memory. It must be understood that the memory block in core-memory, which originally came from the virtual memory or was created, has been modified before SMEMORY was called. The virtual memory must be updated with this changed resident memory-block before a new memory-block is transferred to core-memory or created. Thus, if there is an address in CURRENT, control goes to stage 338 wherein the MSIGNAL 02 bit is again reviewed to determine whether this is a new write or a rewrite procedure. If it is a new write procedure, then the resident memory block has not been transferred to virtual memory before. Thus, the resident memory block must be written in the virtual memory for the first time. Its previously assigned virtual memory location is found in the slow memory part of the composite address CURRENT.

If this is a new write procedure, then the MSIGNAL 02 bit is "one" and the program continues to stage 340 wherein the memory block in core is written for the first time into the virtual memory at the location specified by CURRENT. If the MSIGNAL 02 bit is zero a rewrite procedure is used. Thus, control goes to stage 342 wherein the rewrite procedure transfers the memory block from core back to its address in the virtual-memory that is specified in CURRENT. From stage 342, control goes to stage 326. It should be noted that the entire purpose of SMEMORY is to give to SSEARCH or a similar programmatic procedure a new memory block whenever it is needed and take care of the procedural functions that are necessary to preserve the core-resident memory block. The stages 322, 324, 332, 338, 340 and 342 have taken care of this procedural function. At stage 326, a determination is made as to whether a new memory block is to be created in core storage or whether a memory block is to be read from virtual memory into core storage. If a new memory block is to be created in core storage, then control is transferred to stage 328. If the required memory block already exists it is read into core storage at stage 334.
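Part A of SMEMORY (stages 322 through 342) can be summarized by the sketch below. It is a simplified illustration under stated assumptions: composite addresses are reduced to plain integers, device input and output are replaced by entries in a log, control is assumed to pass to stage 326 both when the 01 bit is zero and after the new write at stage 340, and the names `State` and `smemory_part_a` are hypothetical.

```python
from dataclasses import dataclass, field

ALTERED = 0x01     # resident block has been changed
NEW_WRITE = 0x02   # block has never been written to virtual memory
CREATE = 0x80      # a new memory block is to be created in core
NO_FILE = 0xFFFFFFFF


@dataclass
class State:
    msignal: int
    current: int   # CURRENT: address of the resident block
    address: int   # ADDRESS: address of the requested block
    empty: int     # EMPTY: next available position in virtual memory
    log: list = field(default_factory=list)


def smemory_part_a(s):
    # stages 322, 324, 332: save only an altered block with a real address
    if s.msignal & ALTERED and s.current not in (NO_FILE, 0):
        if s.msignal & NEW_WRITE:                  # stage 338
            s.log.append(("new_write", s.current)) # stage 340
        else:
            s.log.append(("rewrite", s.current))   # stage 342
    if s.msignal & CREATE:                         # stage 326
        s.current = s.empty                        # stage 328
        s.msignal = (s.msignal & ~CREATE) | NEW_WRITE
        s.log.append(("create", s.current))
    else:
        s.log.append(("read", s.address))          # stage 334
        s.msignal &= ~NEW_WRITE                    # stage 336
        s.current = s.address
    s.msignal &= ~ALTERED                          # stage 330
    return s
```

In this sketch a block that was altered and previously read from virtual memory is rewritten in place before the requested block is read, whereas a block created in core is stored with the new write procedure the first time it is saved.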

COPAK COMPRESSOR

COPAK is a high-speed, multistage, compressor-decompressor software package that can be used to compress arbitrary bit-strings by reversibly removing redundant information. Decompression occurs without losing a single significant binary-bit of the original string. Except for minimal commands, both the compressor and decompressor parts of COPAK are fully automatic. COPAK operates independently of both the data-base and the information-content.

COPAK can be used for supervision of bulk storage and for transmission of data in communications and computer networks. A more effective role can be achieved by implementing COPAK on a small, high-speed, low-cost, specially designed dedicated computer. This unit could be interfaced with computer/communication networks or used on a stand-alone basis for compressing and decompressing information. As such, it is highly useful as a buffer-converter between various combinations of computer systems and input-output devices. Careful considerations indicate that this low-cost unit could have a throughput between three and thirty times faster than COPAK on the IBM 360/67. The throughput on the 360/67, inclusive of both input and output times, lies between 40K and 900K BAUDS, with the optimum near 550K BAUDS.

COPAK has been described in detail hereinunder as a machine process in the form of a combination of a computer software package and a general purpose digital computer of adequate capacity and versatility. In fact, the COPAK package described hereinunder has been utilized in conjunction with general purpose machines such as IBM 360/67 and 360/40. It is noted that when carrying out COPAK, general purpose machines perform a specialized task and only those components of the warehouse of components contained therein which are ordered and organized by the COPAK act, as controlled by COPAK. In effect then, the combination of COPAK and a general purpose machine becomes a special purpose digital computer. Alternatively, the flow diagrams and the program steps and instructions described hereinunder in detail comprise a teaching of combining existing hardware components, such as those used in the general purpose machines mentioned above, under control of COPAK, to arrive at a special purpose computer carrying out COPAK. The process of so combining existing components as dictated by COPAK is an engineering task for one of ordinary skill in the art and does not involve inventive efforts. Although COPAK is usually referred to as "software package" hereinunder, its function as a special purpose machine when combined with a general purpose digital computer should remain clear.

The communications and computer industries have placed great emphasis on engineering research which can increase the "efficiency of networks" by increasing the channel capacity or speed (of transfer) or by reducing the proportion of redundant signals. In recent years the storage capacities of peripheral devices (like disks, drums, data-cells, tapes and cards) have been enormously increased by advances in engineering technology. Some special recoding techniques that save storage and/or lower transmission costs have been widely used. However, these special techniques are of limited usefulness because they apply to particular devices and/or they are not independent of the data base. There appears to be no report of a major effort to devise general software packages that can increase the information content per unit of the information itself. Such packages could artificially increase the storage capacities of existing facilities and lower transmission costs.

To be of more than transient usefulness these general software packages should meet as many of the following specifications as possible.

i. With a minimal number of commands the software packages should be capable of handling any binary coded information. This means that compression and decompression must be independent of the data-base or the information content.

ii. Compressed information should be automatically decompressed back to the original whenever it is needed.

iii. For communications networks the rate of compression should not be less than the rate of transmission. The decompression rate (at a receiver station) should not be slower than the rate of compression.

iv. There must be checks to ensure that errors in the compressed information will be detected before or during decompression.

v. The effectiveness of the proposed package for increasing the capacity of existing storage devices will be determined by several factors. Some of these are: the access and transfer times of the peripheral equipment; the speed of decompression; the frequency that the particular information is used. Obviously, infrequently used information can be highly compressed to release storage that would not normally be available.

To be fully effective in both storage and communications applications the general software packages should have adjustable parameters which would permit the user to stipulate the maximum amount of time that can be devoted to compressing (or decompressing) information.

The COPAK compressor meets the five specifications just listed. The computer speed and two variable parameters determine the rate of both compression and decompression. On the IBM 360/67 COPAK compresses information at rates of 40,000 to 900,000 BAUDS. Decompression is at least one and a half times faster.

Definition and Commands

The two parts of COPAK (Compressor and Decompressor) each have two stages (SNUPAK and SANPAK). The COPAK compressor handles the information as strings, segments and substrings. A string of information can be divided into non-equal segments. Segments can be sub-divided further into substrings. The lengths of strings, segments and substrings are a user option. The numeric stage (SNUPAK), which can process any information designated as binary coded numbers, handles segments of information at the substring level. The alphanumeric stage (SANPAK) handles strings of information at the segment level. As described hereinunder, COPAK processes one segment per string (i.e., string = segment) with each segment containing between one and twenty substrings.

The Device Command LLENGTH, which must have a value less than 256, specifies the number of bytes in the label or key which may preface each string. This information is not processed by COPAK. The leftmost LLENGTH bytes are removed from the first segment and the shortened segment is processed by COPAK. The structure of the stored composite string is:

The String Command MODE determines which part of the COPAK compressor is to be used, i.e., MODE=0, decompress; MODE≠0, compress. Three other String Commands (LEXCON, LEXPCH and LEXMODE) are associated exclusively with the alphanumeric compressor stage (SANPAKC) of COPAK. They will be discussed later.

The three Substring Commands (NV, SOS and LSX) are used extensively in the numeric stages (SNUPAK) of both parts of COPAK. NV, which is entered once for each segment, is the number of substrings (maximum 20). One SOS command and one LSX command are entered for each substring. Together they determine the entry format-type of the substring and the path that is to be taken through the compressor parts of COPAK. The entry format-type for each substring is stored in the compressed segment as a four-bit format code. This is used to produce hard copy when the segments are retrieved. Format codes used are: A=1; I=2; E or F=3; X=4 (printed in the hexadecimal format (B)). Here X is the IBM 360 column binary. The substring commands are not entered if MODE=0 (i.e., for retrieval).

Overview of the COPAK Compressor

In the compression mode (MODE≠0) the compressor parts of COPAK construct a completely self-defined string which contains the label or key; format codes; string structure (i.e., segments and substrings retain their identities); and sufficient information to ensure that errors will be detected during decompression. This information, exclusive of the label, is normally less than 24 bytes per segment. It is added even if there is no actual compression. The decompressor parts of COPAK unscramble the self-defined string to obtain the identical original information. The error-checks are executed during decompression. If an error is found, control goes to a location where error-correcting procedures and/or retransmission commands can be executed.

The status of each segment is recorded in a four-byte work area (PARM) which is updated whenever the segment is altered. The structure of PARM is:

The status of each substring in a segment is indicated by a four-byte word (SOS) which is updated whenever the substring is altered. The substring composite control words (SOS) contain four items of information thus:

Here NDR is the "depth of representation" that is computed in the differencing procedure (of NUPAKC).

A flow diagram for the COPAK compressor is given in FIG. 11. As a segment of information enters the computer its status-of-substring control words (SOS) are changed, to the form shown above, and the status-of-segment control word (PARM) is constructed. At this stage the sign of SOS and the values of both NDR (in SOS) and LSX together determine how a substring will be handled by the compressor part of SNUPAK. After some preliminary processing of the segment in SANPAKD it is processed, one substring at a time, by SNUPAK. In this step the SOS composite words are updated and a new status-of-segment control word (PARM) is constructed. The sign of the first SOS and the number of bytes in the segment emerging from SNUPAK are transferred to JII, which is the temporary control variable for SANPAKC. In the final step of SNUPAK the control words (PARM and SOS), check information, and other data needed by the decompressor part of SNUPAK are inserted at the head of the segment.

The information in JII is used by the alphanumeric compressor part (SANPAKC) to decide whether or not compression of the newly defined segment is to be attempted. If compression occurs in SANPAKC, information, which is used by SANPAKD during decompression, is inserted at the head of the segment. In the final step of SANPAKC, four bytes of control-information are inserted at the head of the segment. The label or key information is then inserted preceding the control-information at the head of the segment (see Device Commands). The structure of the four byte word of control-information is:

Here NL is the number of redundant bit-patterns removed by SANPAKC.

A single string command suffices to bring about decompression of a stored segment. When this command, MODE=0, is used the label or key is first removed from the head of the segment. Then the four bytes of control information are extracted and the following steps are executed:

Step a. The compressed segment is decompressed with the alphanumeric decompressor (SANPAKD) if NL≠0.

Step b. The control information that was inserted after processing by the SNUPAK compressor is extracted.

Step c. The substrings of information are decompressed one at a time by the decompressor part of SNUPAK.

Step d. The label or key, previously removed, is replaced at the head of the decompressed string.

Error-checks occur at every step of this decompression procedure. Thus a segment with N substrings has (N+2) absolute error checks. Also, there are an additional 15 error-checks which are made during the decompression by SANPAKD and SNUPAK. Moreover, the conventional CHECK-SUM can be used as an additional error check. If errors are found, the decompression is aborted and control goes to location RTRANSMIT, where error-correcting and retransmission procedures can be utilized.

The compressor parts of COPAK have incorporated fail-safe procedures which prevent the inadvertent destruction of information. For example, if SNUPAK is told to compress text or binary information as integers it will abort and change the processing commands to execute SANPAKC without destroying the data.

SANPAKC

INTRODUCTION

SANPAKC is the Macro instruction used for alphanumeric compression of information within the COPAK system.

DETAILED DESCRIPTION

The data on which SANPAKC operates is in alpha-numeric form, in strings of units. In the embodiment described hereinunder the units are conventional 8-bit bytes. It should be clear, however, that SANPAKC, as well as the complete COPAK package, can be equally applicable to machines using units other than bytes.

Two distinct types of compression are carried out consecutively. Each may be carried out either in Fast Mode or in Slow Mode.

In Type 1 compression, the string is searched for identical patterns of two or more contiguous units. If such identical multi-unit patterns are found, they are deleted from the string and decompression information which takes less space but has sufficient information content for subsequent decompression of the string to its original form is added to the string.

In the Slow Mode of Type 1 compression, the scan for identical multi-unit patterns is carried out by comparing a pattern of several contiguous units of the string with all other patterns in the string of like size. In the Fast Mode, this is carried out by comparing previously chosen patterns which are believed to occur often with patterns of like size in the string.

In Type 2 compression, which is executed after the completion of the Type 1 compression, the compressed string is scanned for individual units which occur more than a certain number of times. If such units are found, they are deleted from the string and decompression information is added to the string, but only if the length of the decompression information is less than the length of the deleted information.

As a brief qualitative description of a particular example of carrying out Type 1 compression in the Slow Mode, a string of 1,000 bytes is scanned such that the numerical value of each byte is used to address a 256-byte table in which each location corresponds to a unique one of the 256 possible combinations of the eight binary bits of each byte of the string and each location of the table acts as a counter for the number of times it has been addressed. After the last byte of the 1,000 byte string has been used as an address in this manner, the table is examined for locations which have not been addressed. The address values of these locations, if any, are stored consecutively in one area of LEXICON table and are called Type 1 codes. It will be appreciated that these Type 1 codes represent bytes which are not present in the 1,000 byte string. Additionally, the address values of the locations of the 256-byte table which have been addressed more than a certain number of times, for example, more than 34 times, are stored in another area of the LEXICON table and are called Type 2 codes. These Type 2 codes represent bytes which occur very often in the 1,000 byte string and are likely candidates for deletion. Next, a pattern of contiguous bytes from the string, for example, the first 12 bytes, is compared with all other patterns of the same format in the string. Identical patterns found in this manner are deleted from the string and are replaced by a Type 1 code from the LEXICON table, but only if actual saving in string length would result from this process and only if a unique Type 1 code is available for each group of like patterns. The same Type 1 code followed by the 12 bytes of the deleted pattern is inserted at the beginning of the string for later use in decompression. The process is repeated for different patterns of contiguous bytes for as long as there are unused Type 1 codes and for as long as saving in length of the string can be achieved. 
When a pattern has been found to occur several times in the string and has been deleted therefrom, it is stored in a PCORDS table which contains patterns likely to occur often in similar strings. A savings ratio is associated with that pattern to indicate the degree of compression achieved by the use of that pattern.

In Slow Mode of Type 2 compression, a portion of the compressed string, for example, a portion of 256 consecutive bytes, is examined for redundancy of particular individual bytes anywhere in the portion. If a particular byte selected from the Type 2 code in the LEXICON table is still found to occur in that portion more than a certain number of times, a 256 bit map is constructed in which each bit location corresponds to a byte of the examined portion of 256 bytes. The bit map serves as a record of the byte position in which the particular byte was found. The redundant bytes are then deleted, the string is closed in to take up the vacated space and the bit map together with the value of the deleted byte is added to the string after the size of the bit map is minimized. The value of a deleted byte and the savings ratio associated with it may be added to the PCORDS table.
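The bit-map bookkeeping just described can be illustrated with a minimal Python sketch. It is not the IBM Assembly implementation of COPAK: the function names, the most-significant-bit-first ordering of the map, and the choice to size the map to the length of the portion are all assumptions made for illustration, and the patent's own step of minimizing the bit map is not reproduced.

```python
def type2_remove(portion: bytes, value: int):
    """Delete every occurrence of `value` from a portion of at most 256 bytes.

    Returns (bitmap, remaining). Bit i of the bitmap is 1 when portion[i]
    held the deleted value (bit order within a byte is an assumption).
    """
    assert len(portion) <= 256
    bitmap = bytearray((len(portion) + 7) // 8)
    remaining = bytearray()
    for i, b in enumerate(portion):
        if b == value:
            bitmap[i // 8] |= 0x80 >> (i % 8)   # record the vacated position
        else:
            remaining.append(b)                 # string closes up over the gap
    return bytes(bitmap), bytes(remaining)


def type2_restore(bitmap: bytes, remaining: bytes, value: int, length: int) -> bytes:
    """Rebuild the original portion from the bit map and the surviving bytes."""
    out = bytearray()
    survivors = iter(remaining)
    for i in range(length):
        if bitmap[i // 8] & (0x80 >> (i % 8)):
            out.append(value)                   # position was a deleted byte
        else:
            out.append(next(survivors))
    return bytes(out)
```

A full 256-bit map occupies 32 bytes, so in this sketch deleting a byte only shortens the string when it occurs more often than the map and the stored byte value cost together.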

In the Fast Mode of carrying out Type 2 compression, the portion of 256 bytes from the string is checked for the occurrence of bytes selected not from the LEXICON table but from previously stored bytes in the PCORDS table.

In both modes of both Type 1 and Type 2 compressions, continuous track is kept of various string characteristics for the purpose of insuring complete reconstruction of the compressed string and for the purpose of providing adequate error detection features. A more detailed explanation of the SANPAKC compression, with particular reference to the flow diagrams in the drawings, can be found below.

The flow chart of SANPAKC is given in FIGS. 1A, 1B and 1C. The first step performed in the Macro SANPAKC is to initialize all the registers and counters in that portion of the computer which is being used for alphanumeric compression. The next program instruction at step 11, is to check whether the MODE Command is set equal to "0" or not. "0" means that no compression is desired, and "1" means that compression is desired. The "0" value would occur when the system was in a retrieval mode and therefore, compression would not be required. If the machine was in the storage mode (MODE≠0) compression might be desirable. The next step in the program is to determine if the variable JII is greater than zero. If JII is equal to or less than "0", that means that no compression is desired. If it is greater than "0" then compression is desired. The only way that JII would be negative is by setting it to a negative value prior to entering SANPAKC indicating that compression, by SANPAKC, is not desired. Therefore, even though the computer was in the store mode, one could prevent compression of the information. If JII is positive, the program begins compression at stage 12 in the flowchart of FIG. 1A. Although the flowchart shows various stages in the program, it is understood that this is just a means of designating a group of steps to be completed at a particular point in time. The program listing is IBM Assembly language and is set forth at the end of the written description. At stage 12, the program initiates the steps of finding all available codes and then storing the available codes in a location named LEXICON. Thus, this step is achieved as follows:

a. A thousand byte string of information is scanned one byte at a time starting from the first byte. The numerical value of the first byte is used to address a location in a table of 256 byte positions corresponding to the 256 different bit configurations possible in a single byte of information. A count is initiated at that particular position, to indicate that the particular byte has been found once within the thousand byte string. The next byte within the thousand byte string is similarly used to address a location in the 256-byte table (i.e., by adding the numerical value of the scanned byte to the base address (beginning address) of the table) and a count is added at that particular location. This is continued throughout the one thousand bytes in the string. Where bytes within the thousand byte string are identical, the count at the particular location in the 256-byte table will indicate that the particular byte of information appears more than once in the thousand byte string. The counters in the 256-byte table are not permitted to exceed the value 255 so that an absolute frequency count of the number of occurrences of a particular byte is not achieved if the byte occurs more than 255 times in the string. However, in going through any thousand byte string, there will be many of the positions or locations in the 256-byte table which will not be utilized as there is no byte in the thousand byte string corresponding to that location. Those locations which are "0" in the 256-byte table are determined by scanning the table. The corresponding numerical values (≤255) of the 256-byte table positions (i.e., the number of bytes past the beginning of the table) are then transmitted by means of program instructions to the 256-byte array named LEXICON and stored in consecutive byte locations of LEXICON to act, at a later stage, as possible Type 1 code numbers (for Type 1 compression) for groups of bytes to be compressed. 
The count in each of the individual positions in the 256-byte table where there has been one or more counts is also scanned to determine whether any particular location shows more than 34 counts. This indicates that the particular byte is a candidate for Type 2 compression (which will be described below). Any location which shows more than 34 counts is also stored in LEXICON to act later as Type 2 codes. LEXICON has only 256 positions of storage. However, the positions in the 256 byte table mentioned above which show more than 34 counts are stored starting at the position 256 of LEXICON and working backwards. For example, if position 193 in the 256-byte table were to show more than 34 counts it would be placed in position 256 in LEXICON and if position 232 in the 256-byte table were also found to have more than 34 counts it would be stored in the 255th position in LEXICON. It should be noted that it is impossible for the Type 1 codes stored in LEXICON to overlap the Type 2 codes stored in LEXICON as these positions are derived from the 256-byte table mentioned above.
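The stage 12 procedure can be summarized by the following Python sketch, offered as an illustration under stated assumptions rather than as the patented Assembly code. The function names `build_lexicon` and `pack_lexicon` are hypothetical, as is the use of `None` for unused LEXICON positions; the saturation of counters at 255, the threshold of 34, and the front/back layout follow the description above.

```python
def build_lexicon(string: bytes, type2_threshold: int = 34):
    """Return (type1_codes, type2_codes) for one string.

    type1_codes: byte values that never occur in the string.
    type2_codes: byte values counted more than `type2_threshold` times.
    Both lists are in ascending order of byte value.
    """
    counts = [0] * 256                      # the 256-byte counting table
    for b in string:
        if counts[b] < 255:                 # counters are not permitted to exceed 255
            counts[b] += 1
    type1_codes = [v for v in range(256) if counts[v] == 0]
    type2_codes = [v for v in range(256) if counts[v] > type2_threshold]
    return type1_codes, type2_codes


def pack_lexicon(type1_codes, type2_codes):
    """Lay both code lists out in one 256-position array (hypothetical helper)."""
    lexicon = [None] * 256
    for i, v in enumerate(type1_codes):     # Type 1 codes fill from the front
        lexicon[i] = v
    for i, v in enumerate(type2_codes):     # Type 2 codes fill from position 256 backwards
        lexicon[255 - i] = v
    return lexicon
```

Because a byte value cannot both be absent and occur more than 34 times, the two lists are disjoint and together never need more than 256 positions, which is why the two regions cannot collide.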

The next group of instructions in SANPAKC is shown at stage 14 of the flowchart. At this point the question is asked "are there any codes available?". At stage 14 LEXICON is checked at the front end thereof to see whether any Type 1 codes have been stored as a result of the steps taken at stage 12. It should be noted that when the available Type 1 codes were stored in LEXICON, a count was made of the number of available codes thus stored during the steps defined with respect to stage 12. When the step set forth in stage 14 occurs a check is made to determine whether the last mentioned counter has counted any available Type 1 codes being stored in LEXICON. It is possible, if the thousand byte string contains 256 bytes of information different from each other, that there may be a count in each one of the 256 counters in the 256-byte table mentioned previously. In that case, there are no available Type 1 codes stored in LEXICON. If the answer at stage 14 is that there are no available codes, a different type of compression, namely Type 2 compression, is used. However, Type 2 compression will be described more fully below.

If the answer at stage 14 is "yes" then the program continues to stage 16 where the determination is made as to whether the compression technique should be completed in the "Fast-Mode" or the "Slow-Mode". The Fast-Mode will be discussed separately as most of the operations in the Fast-Mode are included in the Slow-Mode and, in fact, the Fast-Mode may be considered a special case of the Slow-Mode. A full description of the Fast-Mode will be set forth below after consideration of the operation in the Slow-Mode. Thus, if the answer at stage 16 is "no", the compression process continues in the Slow-Mode and the program continues to stage 18 where counters R and RM are set. Counter R is a counter set to contain the value of the largest number of contiguous bytes in the pattern for the occurrence of which the string will be searched later. RM is equal to R-1. For purposes of the system, in actual use, R has been set at a maximum of 12 bytes and, of course, then RM would equal 11. After the counters R and RM are set, the program then proceeds to stage 20 where a scan of the input string is initiated to locate the occurrences of redundant patterns. A pattern is a contiguous grouping of bytes of variable length. Initially, we will consider groups of contiguous bytes of length 12 bytes. At stage 20, as shown in the diagram, there is a set of instructions to be followed entitled CCD: a number JII which designates the length of the string of information at any given point in the compression process (please note that in the example given above the string length was initially 1,000 bytes); and a number CS3 which is a number designating the number of bytes after the starting point at which redundancy is to be checked. The instruction CCD is described below.

A first contiguous group of bytes having a length determined by the counter R (in the first instance 12 bytes) is recorded and compared along the length of the string moving one byte at a time to find how many times and where the like of the first contiguous group of R bytes can be found in the string. If there is no other contiguous group of R bytes exactly like the first group being sampled, that result is transmitted to stage 22 of the flowchart where the question is asked as to whether a saving can be achieved by removing the redundant pattern of bytes. The answer to this question is determined by examining whether the formula R+2<N(R-1) is satisfied or not. N is the number of identical groups of bytes found during a single scan of the string. Since only one group had been found during the scan, R+2=14 and N(R-1)=11; the answer to the question as to whether there was a saving is obviously "no". Since the answer is "no", the program continues by transmitting this response to stage 24 where a determination is made as to whether the operation is in the Fast Mode. As was discussed previously, this operation is being accomplished in the Slow Mode and the answer at stage 24 is also "no". If the answer were "yes", another operation would take place. This operation will be discussed in conjunction with the description of the Fast-Mode. Since the answer is "no", the program continues by transmitting this information to stage 26 wherein steps are taken to determine whether additional comparisons can be made. This determination is made by comparing whether CS3 plus 1 plus 2R is less than JII. This comparison determines whether the sampling has reached the end of the string, since any more comparisons would be useless when there are not enough bytes left in the string to be able to effect a savings. Since, in this case, CS3 is "0" and the group CS3+1+2R is equal to 25, and this is certainly less than 1,000 (JII), the answer at stage 26 is "yes". 
(What happens when the answer is "no" at stage 26 will be discussed below.) First, since the answer at stage 26 is "yes", program control is returned to stage 20 where CS3 now has the value of "1" by the addition of one byte and CCD continues starting with the next byte of the groups of 12 bytes to be compared. Thus, starting at one byte past the first point of the string, the succeeding group of 12 bytes is checked for redundancy going byte by byte along the string to determine whether there are any similar groupings of 12 byte patterns along the length of the string. Assuming in this instance that three such groupings are found along the length of the string, then this information is passed to stage 22 and entered into the formula R+2<N(R-1) or "is 14 less than 33?". The answer obviously is "yes" and, accordingly, this information is passed on to stage 28. At stage 28 the counters associated with the compressor system are updated. That is, a counter, which for purposes of notation is designated as CS8, counts the number of compressions which have taken place on this particular string of information. That is, since this is the first compression which is to take place on this particular string of information the counter will be set equal to 1. An additional counter CS4 is actuated to count the number of compressions which have been taken with R bytes, that is, with 12 bytes, and, accordingly, since this is the first compression with 12 bytes this counter is also set equal to 1. Additionally, the counter CS6 which is associated with LEXICON to determine the spot where one will get a Type 1 code from the LEXICON code array is set. In this case this counter counts "1" and selects the first available code in the LEXICON code array which was set in the manner described with respect to stage 12. 
At this point, having selected a code from LEXICON, CS6 is now set at "2" so that it is ready to receive at a future time a request to select a new code from the second spot in the LEXICON. The next step is to move to stage 30. For consideration of what happens at stage 30, the effect of the program instructions in CCD at stage 20 must be understood. Each time that a redundant group was found during this stage, the addresses of the redundant groups were stored in core memory beginning at location TL, (which is an area of core memory). The address of the first redundant group, namely the first and original pattern, was stored at location TL in core memory and the address of each additional redundant group was positioned in increments of four bytes with the array TL. Each address stored in TL is located at a specific displacement past the beginning location TL (i.e., a multiple of four bytes past the beginning). The displacement where the address of the last redundant group can be found is set in a register IR4. For example, if there were three redundant groupings, the register IR4 would have a value equal to "8". It must be understood that the value "8" is in relative terms and the core address of the beginning of the TL array must be added to the displacement value (in IR4) in order to compute the absolute machine address desired. Thus, if the value in IR4 was "8", the absolute machine address is the initial address of TL in the core plus 8. Further, it should be noted that TL has physical storage limitations and can store addresses for a maximum of 200 repetitions of a pattern during any given pass. Of course, if the length of the string is maintained in reasonable bounds, this limitation should not be reached in the ordinary course of compression, but provision is made in the program for stopping the compression should there be an attempt to store more than 200 addresses of redundancies in the TL array.
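The scan of stage 20 and the tests of stages 22 and 26 can be sketched in Python as follows. This is a simplified illustration, not the CCD routine itself: the function names are hypothetical, the list `offsets` stands in for the TL array, matches are assumed not to overlap, and the sketch simply stops collecting at 200 matches where the program described above stops the compression.

```python
def find_matches(string: bytes, cs3: int, r: int, max_matches: int = 200):
    """Offsets of copies of string[cs3:cs3+r]; the original pattern comes first."""
    pattern = string[cs3:cs3 + r]
    offsets = [cs3]                         # plays the role of the TL array
    pos = cs3 + r
    while pos + r <= len(string) and len(offsets) < max_matches:
        if string[pos:pos + r] == pattern:
            offsets.append(pos)
            pos += r                        # assumption: matches never overlap
        else:
            pos += 1                        # otherwise move one byte at a time
    return offsets


def saves_space(r: int, n: int) -> bool:
    """Stage 22 test: R + 2 < N(R - 1)."""
    return r + 2 < n * (r - 1)


def more_comparisons_possible(cs3: int, r: int, jii: int) -> bool:
    """Stage 26 test: CS3 + 1 + 2R < JII."""
    return cs3 + 1 + 2 * r < jii


def last_displacement(offsets) -> int:
    """Value held in IR4: four bytes per stored address, counted from zero."""
    return 4 * (len(offsets) - 1)
```

With R = 12, a single occurrence fails the stage 22 test (14 is not less than 11) while three occurrences pass it (14 is less than 33), and three stored addresses give a displacement of 8, in agreement with the figures quoted in the text.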

At stage 30, the address of the match or redundancy from TL(IR4) (the last match found in the string) is loaded into a register IR2. Note that in this context TL(IR4) means the initial core address of the TL array plus the displacement value past the beginning of TL which is recorded in register IR4. That is, the address of TL plus the contents of register IR4 gives the machine location where the desired information is to be found. Then, after IR2 is loaded with the address from TL(IR4), the program moves to stage 32 wherein a series of instructions named SAM are executed. This set of instructions first substitutes, in the string of information, a one byte code in place of the redundant pattern whose address is recorded in register IR2. The replacement code is the code from the LEXICON array pulled out at stage 28. This one byte code is then placed at the address contained in register IR2 in place of the first byte of the redundancy. Then, a determination is made as to the number of bytes in the redundant pattern, namely R, the address of the redundant group, namely the address in IR2, and the length of the string before compression, namely JII. Then the remainder of the string following the last byte of the redundant pattern is moved to close the space between the newly added code information and the remainder of the string to compress the string by an amount equal to R-1 bytes. At this time, JII is changed to reflect the compression of the string by an amount R-1 bytes. It should be understood that the compression is R-1 as it was necessary to add one byte of information to account for the space required to store the replacement code. If the code had not been added, JII would have been reduced by R. It should be noted that at times, when Type 2 compression is being utilized (to be discussed below) no code information is placed in the space vacated by the matched grouping and, in such cases, JII would, in fact, be reduced by R. 
Since IR4 as set forth above is no longer pointing to the address of the last group of matched information in TL, (having already made the required substitution into the string), IR4 must be reduced by 4 as is accomplished at stage 34 so that, with its new value IR4 points to the address of the next to last group of matched bytes found in the scan.

This new IR4 is then checked at stage 36 to determine whether it is equal to or greater than "0". If it is equal to or greater than "0", then the above procedure is repeated starting at stage 30. The string is continually compressed through stages 30, 32, 34, and 36 until finally stage 36 determines that IR4 is less than "0". This situation occurs when all the redundant patterns located during a single scan of the string have been substituted for. When this occurs, the program continues from stage 36 to stage 38.
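The loop through stages 30, 32, 34 and 36 can be sketched as follows. This is a minimal illustration, not the patented program: the function name substitute_matches, the use of a Python list in place of the TL array, and the choice of exception are assumptions made for the sketch. The variable ir4 imitates register IR4 as a byte displacement into TL (four bytes per stored address), and the limit of 200 addresses is taken from the description above.

```python
TL_CAPACITY = 200  # maximum number of match addresses TL can hold per pass


def substitute_matches(string, tl, r, code):
    """Replace each R-byte match listed in tl by a one-byte code.

    string : bytearray holding the string being compressed (edited in place)
    tl     : ascending start offsets of non-overlapping matches (stand-in for TL)
    r      : length R of the redundant pattern
    code   : one-byte replacement code taken from LEXICON
    Returns the new string length (JII).
    """
    if len(tl) > TL_CAPACITY:
        raise OverflowError("more than 200 match addresses in TL")
    jii = len(string)
    ir4 = 4 * (len(tl) - 1)          # displacement of the last stored address
    while ir4 >= 0:                  # stage 36: continue while IR4 >= 0
        ir2 = tl[ir4 // 4]           # stage 30: load the address TL(IR4) into IR2
        # Stage 32: put the code where the match began and close the gap.
        string[ir2:ir2 + r] = bytes([code])
        jii -= r - 1                 # string shrinks by R-1 bytes per match
        ir4 -= 4                     # stage 34: step back to the previous match
    return jii
```

Working from the last match toward the first means the addresses still waiting in TL remain valid, because only bytes lying after them have been moved.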

At stage 38, a determination is made as to whether the compression had been a Type 2 compression mentioned previously. Since this is not a Type 2 compression, the program continues on to stage 40. There, information relating to what has occurred in stages 28 through 36 is placed at the front of the string of information. First, the code taken from the LEXICON array, at stage 28 of the flowchart, is stored at the head of the string followed by the pattern which was replaced so that the code defines the particular 12 byte pattern. Thus, in decompression, when one scans the string and finds the particular code, the information at the head of the string will define the meaning of the code information. Following the addition of the code and pattern information to the head of the string, the string is longer by R plus 1 bytes. Accordingly, JII is increased by R plus 1. It should be noted that the pattern which has been replaced is stored in the machine at location CORD1.
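The effect of stage 40 on the string, and the way the head entry lets a decompressor recover the pattern, can be sketched as below. The function names are hypothetical, and passing R as a parameter is a simplification made for this sketch; in the system described, the pattern length is conveyed by other control information at the head of the string.

```python
def add_head_entry(string, code, pattern):
    """Stage 40 sketch: put the code and the pattern it stands for at the head.

    The string grows by R + 1 bytes, where R = len(pattern).
    """
    return bytes([code]) + pattern + string


def expand_one_level(packed, r):
    """Undo one Type 1 substitution, given the pattern length R (assumed known)."""
    code = packed[0]
    pattern = packed[1:1 + r]
    body = packed[1 + r:]
    # Every occurrence of the code in the body stands for the pattern. This is
    # safe only because the code byte was chosen as one unused in the string.
    return body.replace(bytes([code]), pattern)
```

With an 18-byte string holding three occurrences of a 4-byte pattern, the body shrinks to 9 bytes and the head entry adds 5, giving 14 bytes in all.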

Before continuing, it is important now to discuss an element of the invention which has not yet been discussed and which is germane to the Fast-Mode procedure. There is in storage a table known as the PCORDS Table in which are maintained a maximum of 200 patterns which are considered to be the most repetitive patterns in strings of information processed by SANPAKC. In certain instances, where one knows the basic contents of the strings of information being fed to SANPAKC, one can input a table of PCORDS (permanent cords or matched groups), which contains the repetitious patterns. Where one is dealing with unknown alphanumeric information or information whose content is not known (for example, information that is purely numeric and should have been transmitted through SNUPAK prior to entering SANPAKC), one must, in order to operate in the Fast-Mode, set up a PCORDS Table which will be continuously changing to optimize the Fast-Mode by selecting the PCORDS with the best savings ratio. By savings ratio, it is meant the original JII minus the new JII after compression divided by the old JII, or the number of bytes saved divided by the old string length. Obviously, it is desired to utilize those PCORDS which provide the best savings ratio and, if the best 200 PCORDS are utilized, it may not be necessary to go through the entire Slow-Mode of operation as was previously described. It is expected that by utilizing the best 200 PCORDS one would be able to reach the optimum compression while saving an enormous amount of search time. Thus, it is important to obtain a PCORDS Table which represents the PCORDS having the best savings ratio. Obviously, the PCORDS Table will only record those patterns which have, in fact, effected a saving. A pattern which does not get past stage 22 (in the flowchart) will not be recorded in the PCORDS Table.
Although the PCORDS Table can hold up to 200 different PCORDS, it may be that the best five or ten PCORDS will give such a substantial compression that it would be unnecessary to utilize any further PCORDS. Machine time can thus be substantially reduced since only those five or ten PCORDS would be searched for in the input string.

The PCORDS Table always has at least one PCORD therein and it is expected in normal operation that the PCORDS Table will be initialized with six PCORDS of the following types: two PCORDS of 12 byte lengths (one containing all zeros and the other containing all blanks); two PCORDS of eight byte lengths (as described above) and two PCORDS of six byte lengths (as described above). By experience, it is known that in most strings of information there are groups of blanks and zeros which occur with regularity and, therefore, it is highly likely that these particular PCORDS will effect substantial savings in any string of information which might be fed to SANPAKC. The PCORDS Table includes 201 20-byte segments of storage space. The first 20-byte segment is control information which gives the status of the entire table. The next 20-byte segment is the first PCORD recorded in the PCORDS Table. This second 20-byte segment, like all succeeding 20-byte segments which are stored in the PCORDS Table, has its first byte signifying the number of bytes in the pattern to be stored. The succeeding bytes after the first byte are the stored pattern or PCORD followed by binary zeroes up to the 16th byte. From the 17th byte through the 20th byte is recorded the savings ratio associated with that particular PCORD. Thus, it can be seen that by scanning the first byte in each 20-byte segment one can determine the length of the PCORD in the 20-byte segment. By scanning the 17th through 20th byte in each 20-byte segment one can determine the savings ratio relating to the particular PCORD. In the first 20-byte segment, which holds the control information, the first 12 bytes are used to indicate the number of patterns of each length (length one up to length 12) stored in the PCORDS Table.
The first byte contains the number of length 1 patterns, the second byte contains the number of length 2 patterns, and so on up to the 12th byte. It should be noted that a byte can hold a value up to 255 and thus, since the Table holds at most 200 PCORDS, the number of entries in the PCORDS Table for each length can be fully recorded in the first 12 bytes of the control segment, even if all the patterns are the same length. The next four bytes, namely bytes 13 through 16 of the control segment, contain the address of the PCORD having the lowest savings ratio. Where the Table has not been filled, this address would be the address of the last 20-byte segment in the PCORDS Table, as yet unfilled. However, this information is extremely important when the PCORDS Table is filled as it is desirable to replace the PCORD having the lowest savings ratio with a pattern having a better savings ratio. Since this information is recorded in the control segment, it is possible to replace the lowest savings ratio PCORD with the pattern found to have a higher savings ratio. Of course, in order to compare against the lowest PCORDS savings ratio, it is necessary to know what that savings ratio is. This information is recorded in bytes 17 through 20 of the control segment.
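The 20-byte segment layout just described can be illustrated with a short sketch. The byte positions follow the description; everything else is an assumption made for illustration, in particular the function names, the big-endian byte order, and the storage of the savings ratio as a 32-bit floating point value, since the numeric format of bytes 17 through 20 is not stated here.

```python
import struct

# PCORD segment: 1 length byte, 15 bytes of pattern padded with binary zeros,
# 4 bytes of savings ratio (assumed here to be a big-endian 32-bit float).
PCORD_SEGMENT = struct.Struct(">B15sf")
# Control segment: 12 per-length counts, 4-byte address of the PCORD with the
# lowest savings ratio, 4 bytes holding that lowest ratio.
CONTROL_SEGMENT = struct.Struct(">12sIf")


def pack_pcord(pattern, ratio):
    if not 1 <= len(pattern) <= 12:
        raise ValueError("pattern length must be 1 to 12 bytes")
    # The 's' format pads the pattern with zero bytes up to 15 bytes.
    return PCORD_SEGMENT.pack(len(pattern), pattern, ratio)


def unpack_pcord(segment):
    length, padded, ratio = PCORD_SEGMENT.unpack(segment)
    return padded[:length], ratio


def pack_control(counts, lowest_address, lowest_ratio):
    if len(counts) != 12:
        raise ValueError("one count is needed for each length 1 to 12")
    # bytes() rejects any count above 255, the largest value one byte holds.
    return CONTROL_SEGMENT.pack(bytes(counts), lowest_address, lowest_ratio)
```

Both segment kinds come out at exactly 20 bytes, so a table of 201 segments can be indexed by multiplying the entry number by 20.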

In the PCORDS Table, the PCORDS with the longest length are placed at the top and the smallest length PCORDS at the bottom in sequential order. When a pattern is to be substituted in place of the lowest savings ratio PCORD already in the Table, provision is made for shifting the PCORDS so that this sequential arrangement is maintained at all times. The reason for this arrangement is that it is extremely desirable to compress starting with the longest length PCORDS and working downward to the shortest PCORDS since the highest savings are achieved with longer length PCORDS. Although it has been found desirable always to search starting with the PCORDS of the longest length and working towards the PCORDS of a shorter length, it is, of course, possible to reverse the procedure without affecting the operation of the compression techniques, although a different amount of compression will, in all probability, occur. It is expected that by operating from the PCORDS of the greater length and working towards those of a shorter length optimal compression can be achieved.

It should be noted that, by providing the savings ratio adjacent to each PCORD, it is possible to select, for example, the 20 PCORDS having the best savings ratio and then to scan these selected PCORDS starting with the PCORDS having the greatest length. It is desirable to scan PCORDS of a common length together so that it is not necessary to place at the beginning of the string, after compression with a particular PCORD, the length of the PCORD; instead, such information can be added after all PCORDS of the same length have been utilized in compressing the string. It will be understood that although SANPAKC has been set up to be its own lexicographer (i.e., to develop its own code depending upon the particular string of information supplied thereto), if there is a known input such as strictly alphanumeric information requiring only 50 different bytes, the remaining 206 bytes representable in an eight bit code are known to be available for use as coded information and, therefore, the PCORDS can be permanently assigned a code number without the necessity of going through stages 12 and 14 of the SANPAKC program.
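The selection just described (take the PCORDS with the best savings ratios, then scan the longest first) can be sketched as follows. The representation of the table as a list of (pattern, savings ratio) pairs and the name scan_order are assumptions made for this illustration only.

```python
def scan_order(pcords, k):
    """Return the k PCORDS with the best savings ratio, longest pattern first.

    pcords : list of (pattern, savings_ratio) pairs
    k      : how many of the best PCORDS to keep (for example 20)
    """
    # Keep the k entries with the highest savings ratio.
    best = sorted(pcords, key=lambda entry: entry[1], reverse=True)[:k]
    # Scan them from the greatest length down; sorted() is stable, so entries
    # of equal length stay in order of decreasing savings ratio.
    return sorted(best, key=lambda entry: len(entry[0]), reverse=True)
```

Because entries of a common length end up adjacent, the length information need only be recorded once per group of equal-length PCORDS.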

After completing the program step at stage 40, the program transmits the new JII value and the redundant pattern stored in the location CORD1 to stage 42 wherein the savings ratio for this particular pattern is determined. It will be understood that the absolute savings ratio for this particular pattern in CORD1 is determined by the formula [N(R-1) - (R+2)] divided by JII, where N is the number of redundant matched segments, R is the length of the pattern and JII is the original length of the string. For example, if the original JII was 1000 and the number of redundant matched segments was three and the length of CORD1 was 12, the savings would be 3(11) - 14 = 19 bytes and the savings ratio would be 1.9 percent. However, if the pattern in CORD1 was one of the PCORDS already stored in the PCORDS Table, it is necessary to compute the savings ratio for the pattern in the current string and to average this new savings ratio with the old one stored in the PCORDS Table. For example, if the old savings ratio was 3 percent and the savings ratio computed for this string is 1 percent, the savings ratio is determined by adding the old savings ratio to the new savings ratio and dividing by two, giving 2 percent. It should be noted that less emphasis is being placed on past performance of PCORDS than is placed on the performance on current or recent strings. In this way the PCORDS Table can reflect very quickly changes in the type of input information so as to provide a better indication of the true savings ratio of the PCORDS being utilized on the particular information being supplied. The pattern in CORD1 with its savings ratio or new savings ratio is now added to the PCORDS Table. When the PCORDS Table is full, the pattern is not added to the Table if its savings ratio is less than the smallest savings ratio already stored for a PCORD in the Table. All this is determined by the particular state of the PCORDS Table and the computations carried out at program stage 42.
Of course, if the pattern in CORD1 was not already present in the PCORDS Table and had effected a savings ratio greater than that of the lowest PCORDS savings ratio recorded in the control segment of the PCORDS Table, then it will be added to the PCORDS Table in the correct position.
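The arithmetic carried out at stage 42 can be checked with a few lines of code. The function names are hypothetical; the formula used is the one consistent with the worked example above, in which three matches of a 12-byte pattern in a 1000-byte string give 1.9 percent.

```python
def savings_ratio(n, r, jii):
    """Savings ratio for a pattern of length r found n times in a string of length jii.

    Each match saves r - 1 bytes; r + 2 bytes of overhead are charged against
    the saving, per the worked example in the text.
    """
    return (n * (r - 1) - (r + 2)) / jii


def updated_ratio(old_ratio, new_ratio):
    """Average the stored ratio with the ratio from the current string."""
    return (old_ratio + new_ratio) / 2


def should_enter_table(ratio, table_full, lowest_ratio):
    """Only patterns that effected a saving are recorded; when the Table is
    full, a pattern must at least match the lowest ratio already stored."""
    return ratio > 0 and (not table_full or ratio >= lowest_ratio)
```

Because each update is a plain average of the stored value and the newest value, the contribution of any one older string is halved every time the PCORD is used again.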

After stage 42, the program execution continues to stage 44 where a determination is made as to whether there are more than 16 repeats of patterns with this R length. Since this is the first pattern of an R length of 12, the answer must be "no". However, assuming that in fact there have been 16 patterns of a length 12 when the program entered stage 44, then the program execution would be immediately transmitted to stage 46 (shown on Figure 1B) wherein there would have been placed at the front of the string a one byte code indicating in the first four bits of the byte the number 16 and in the next four bits the length of the pattern, 12. The length of the string would accordingly be increased by one byte. At this point JII would have been adjusted to indicate this additional byte of information at the front of the string.
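The composite byte placed at the head of the string at stage 46 can be pictured as two four-bit fields. The sketch below is illustrative only. The description does not say how a count of 16 is fitted into four bits, so the sketch assumes, purely for the sake of a runnable example, that counts 1 through 16 are stored as the count minus one; the function names are likewise hypothetical.

```python
def pack_composite(count, length):
    """Build a composite byte: count in the first four bits, length in the next four.

    ASSUMPTION (not stated in the text): the count field stores count - 1,
    so that counts 1 through 16 fit in four bits.
    """
    if not 1 <= count <= 16:
        raise ValueError("count must be 1 to 16")
    if not 1 <= length <= 12:
        raise ValueError("pattern length must be 1 to 12")
    return ((count - 1) << 4) | length


def unpack_composite(byte):
    """Recover (count, length) under the same assumption."""
    return (byte >> 4) + 1, byte & 0x0F
```

Under this assumption, 16 patterns of length 12 give the byte 0xFC, and a single Type 2 compression (length 1) gives 0x01.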

After completing this operation at stage 46, the program continues executing at stage 48 (shown on Figure 1C) wherein the question is asked whether this is a Type 2 compression. If the answer is "yes", the program next moves to stage 50 where a further question is asked as to whether this is the Fast-Mode. If the answer is "no", the program moves to stage 90 (shown on Figure 1B) in the Slow-Mode for Type 2 operations, which will be discussed in more detail below. If the answer at stage 50 is "yes", the program next moves to stage 54 for the Fast-Mode series of steps, which will also be discussed in more detail below.

If the answer at stage 48 was "no", as in this case, then the program moves to stage 56 where the question is asked "are there any codes available?". If the answer is "no", then the program continues to stage 58 wherein the question is asked "is this Fast-Mode?". If the answer is "no", then the program moves to stage 60 which is the start of the Type 2 compression in the Slow-Mode. This will be discussed with respect to Type 2 compression below. If the answer at stage 58 is "yes", then the program continues to stage 62 wherein the question is asked "are there any Type 2 PCORDS available?". If the answer is "no", then the program moves to stage 64 which is the start of the exit procedures of SANPAKC which will be discussed at the end of the description of SANPAKC. If the answer at stage 62 is "yes", then the program moves to stage 66, where the counters for the PCORDS Table are updated so that the first Type 2 PCORD is the next PCORD to be picked up for scanning, eliminating all of the remaining Type 1 PCORDS. Then, the program moves back to stage 110 (in Figure 1B) to start the Fast-Mode scanning of the Type 2 PCORDS.

If there are any codes available as determined at stage 56, then the program moves to stage 68 wherein a counter CS4 is reset to zero. This is done so that it can count repeats of patterns of a particular R length during a further cycle of scans of the string by SANPAKC. A further count accumulating in CS4 will result in another composite byte being added at the head of the string at a later stage. These composite bytes are added at various stages during the compression cycle, namely whenever 16 different patterns of a particular length have effected a savings or whenever the length of the patterns which are being scanned for in the compressor is to be changed. The composite bytes are the most important pieces of information which are used during the decompression cycle to unscramble the compressed string. The importance of these composite bytes will become clear when discussing the alphanumeric decompressor, SANPAKD. After resetting the counter CS4 to zero, the program moves to stage 70 wherein a determination is made as to whether the composite byte created at stage 46 (in Figure 1B) is in the list of available codes in LEXICON. If the answer to this question is "yes", then this available code in LEXICON has become non-available and must be deleted from LEXICON. Thus, if the answer is "yes", the program moves to stage 72 where this code is deleted from the LEXICON Table. Further, since the code has been eliminated, the counter of the number of available codes, CS5, must be changed to indicate one less available code, and the codes in LEXICON must be shifted one byte to the left to fill in the space created by the absence of this now non-available code. Once this has been accomplished, the program moves to stage 74 where a further determination must be made as to whether any codes are now available.
This question must be asked because the removal at stage 72 of the code may have caused the LEXICON Table to be emptied of all available codes. If the answer at stage 74 is "no", then the program returns to stage 58 as was discussed previously.

If the answer at stage 74 is "yes", the program continues at stage 76. Additionally, if the answer at stage 70 had been "no", then the program would also have continued at stage 76. At stage 76 the question is asked "is this the Fast-Mode?" If the answer to this question is "yes", then the program continues at stage 110 (in Figure 1B) mentioned previously. If the answer at stage 76 is "no", then the program continues to stage 78 wherein the question is asked "is R to be decremented?". It will be understood that, as in this case, the reason why stage 46 (in Figure 1B) had been operated is that it had received its directions from stage 44 (in Figure 1A) answering "yes", and the answer at stage 78 would be "no". The "no" answer at stage 78 would be the proper one because at stage 44 control was sent to the routine which adds a composite byte to the head of the string because there were 16 repeats of a cord with a particular R length, in this case 12. This means that there may be more patterns of this length which may be found in the string; therefore decrementing of R at this point would not be desirable. If a "no" answer occurs at stage 78, then the next step in the program is to return to stage 20 for a new cycle in the Slow-Mode. If the answer at stage 78 is that R is to be decremented then the program continues at stage 80 and will operate in a manner to be discussed below.

If the answer at stage 44 (in Figure 1A) was "no", then the program would continue at stage 82 wherein the question to be asked is "are there any codes left in LEXICON which are available for substitution?". If the answer is "no", then the program moves to stage 84 (in Figure 1B) where the question is asked "are there any Type 2 codes available?". The manner of this operation will be discussed with respect to the entire Type 2 mode of operation.

If the answer at stage 82 (in Figure 1A) is "yes", then the program moves to stage 86 where the question is asked "is this the Fast-Mode?". If the answer to this question is "yes", then the program moves to stage 89 (in Figure 1B) in the Fast-Mode operations. This will be more fully discussed with respect to a direct discussion of the operation of the Fast-Mode. If the answer at stage 86 is "no", then the program moves to stage 88 wherein the question is asked "is this a Type 2 compression?". If the answer to this question is "yes", then the program would move to stage 90 (in Figure 1B) in the Type 2 mode of operation. However, that mode of operation will be discussed in more detail below. If the answer at stage 88 is "no", then the program moves to stage 26 at which point the question is asked "can more comparisons be made with patterns of this R length?"

The operation at stage 26 has been discussed previously in detail. Obviously, if the answer is "yes", the cycle starts again at stage 20. However, if the answer is "no", meaning that all of the possible comparisons have been made in the string of information for this R length, then the program moves to stage 92 at which point the question is asked "was there a Type 1 saving?". If the answer is that there was a Type 1 saving, then the program continues to stage 46 and will proceed through the succeeding stages from stage 46 in the manner previously discussed. If the answer at stage 92 is that there was no Type 1 saving, then the program continues to stage 80 where R (the length of a pattern to be scanned) is decremented by 1 and, therefore, R will be set equal to R-1 and RM will accordingly be set equal to RM-1. After this decrementing operation the program continues to stage 94 (in Figure 1B) at which stage the question is asked "is RM equal to "0"?". If RM is not equal to zero and the answer is therefore "no", then the program is recycled in the Slow-Mode at stage 20 with the new value of R being equal to one less than its previously cycled value of R. If the answer at stage 94 is "yes", then we must be prepared to enter the Type 2 Slow-Mode of operation and the program continues to stage 84 where the question is asked "are there any Type 2 codes available?". As was discussed previously, Type 2 codes are available when, in the LEXICON, there are more than 34 repetitions of single bytes in the string of information supplied to SANPAKC (see description of stages 12 and 14). If the answer at stage 84 is "no", then the program continues to the exit routines of SANPAKC which start at stage 64 as discussed previously. The operations following stage 64 will be discussed below.

If the answer at stage 84 is that there are Type 2 codes available, then there is initiated the Type 2 Slow-Mode operation at stage 60. A discussion of what Type 2 compression involves is now needed.

Where a single byte has reoccurred more than 34 times in the input string fed to SANPAKC, there is a good chance that, after compression of the string, the single byte will in fact still appear more than 34 times in the first 256 bytes of the compressed string. If this occurs, Type 2 compression will be operative and an attempt will be made to compress the string further. Type 2 compression operates as follows:

A scan is made of the first 256 bytes, or a lesser grouping thereof, to determine how many occurrences there are of the particular byte pattern and the locations of these one byte patterns. If the scan shows fewer than 34 occurrences of the byte pattern in the first 256 bytes of the string then Type 2 compression will not effect a saving and no Type 2 compression will follow. If, however, there are more than 34 occurrences of the particular byte pattern, then a bitmap is formed. The first byte of the bitmap is the byte of the redundant pattern. The second byte of the bitmap is a number designating the number of bytes in the entire bitmap. After the second byte, a map is formed in which, for each byte of the string, a bit is used to designate the presence or absence of the redundant byte, starting from the first byte of the string and continuing through two hundred and fifty six bytes in the string. If, in fact, the bitmap extends for the full 256 bytes in the string, then since a single bit has been substituted for each byte, there are 32 bytes in the bitmap. However, starting from the end of the bitmap, if the last byte in the bitmap does not contain more than one bit showing the occurrence of a redundant byte, then it is wasteful to have the bitmap the full 32 bytes in length and the last byte of the bitmap is removed and the bitmap reduced in size accordingly. A check of the last byte or bytes of the bitmap is made to determine whether the length of the bitmap is optimal and the bitmap is shortened until it has proved to be optimal in length. The bitmap is then, in fact, a map which shows where redundant bytes occur in the string of information.
Once this bitmap is made, it is then only necessary to remove the redundant bytes from the string, substantially compressing the string, and then to place the bitmap at the head of the string to indicate (a) the redundant pattern; (b) the length of the bitmap; and (c) the locations of the redundant pattern in the succeeding string, which is, of course, the bitmap.
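The Type 2 procedure described above can be sketched end to end as follows. This is an illustration under stated assumptions rather than the SANPAKC routine itself: the function names are hypothetical, the bits are assumed to be packed with the most significant bit first, and the length byte is assumed to count only the map bytes that follow it. The 256-byte window, the threshold of 34 occurrences, and the trimming of trailing map bytes that mark no more than one occurrence are taken from the description.

```python
WINDOW = 256      # at most the first 256 bytes of the string are mapped
THRESHOLD = 34    # more than 34 occurrences are needed for a saving


def type2_compress(data, byte):
    """Return the Type 2 compressed string, or None if no saving is expected."""
    cs3 = min(len(data), WINDOW)                 # number of bytes to scan
    flags = [b == byte for b in data[:cs3]]
    if sum(flags) <= THRESHOLD:
        return None
    bitmap = bytearray((cs3 + 7) // 8)
    for i, hit in enumerate(flags):
        if hit:
            bitmap[i // 8] |= 0x80 >> (i % 8)    # assumed MSB-first bit order
    # Drop trailing map bytes that mark at most one occurrence.
    while bin(bitmap[-1]).count("1") <= 1:
        bitmap.pop()
    covered = min(cs3, 8 * len(bitmap))          # bytes actually described
    kept = bytes(b for i, b in enumerate(data[:covered]) if not flags[i])
    # Head of string: redundant byte, map length, then the map itself.
    return bytes([byte, len(bitmap)]) + bytes(bitmap) + kept + data[covered:]


def type2_expand(packed):
    """Rebuild the original string from the output of type2_compress."""
    byte, n = packed[0], packed[1]
    bitmap = packed[2:2 + n]
    rest = iter(packed[2 + n:])
    out = bytearray()
    for i in range(8 * n):
        if bitmap[i // 8] & (0x80 >> (i % 8)):
            out.append(byte)                     # a removed byte goes back here
        else:
            nxt = next(rest, None)
            if nxt is None:                      # string ended inside the map
                break
            out.append(nxt)
    out.extend(rest)                             # bytes beyond the mapped region
    return bytes(out)
```

Occurrences of the byte that fall in the trimmed tail of the map are left in the string, which matches the remark that Type 2 compression may not remove every occurrence of the redundant byte.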

It is now useful to discuss the flow of information through SANPAKC for the purposes of Type 2 compression. Starting at stage 60 (in Figure 1B) counters and registers are set up for Type 2 compression. After this set up step at stage 60, control then flows to stage 52 wherein the machine is next asked to obtain the next Type 2 code in LEXICON to search for in the string. Type 2 codes are the codes which previously were stored at the end of LEXICON. Alternatively, the PCORDS Table can be checked to determine whether there are any Type 2 PCORDS to be searched for. This alternative will be discussed with respect to the Fast-Mode operation. In most instances the portion of the string to be searched for a single byte pattern will be 256 bytes in length, even though the actual length of the string may be greater than that. If the string itself is less than 256 bytes in length, then the length of the string to be searched will equal the actual length of the string. This value, the number of bytes of the string to search, is stored in the machine location CS3. After stage 52 operations have been completed, the control of the program continues at stage 96 where the instructions are executed to scan the first CS3 bytes of the string for the code which has been picked up from the LEXICON Table as noted at stage 52. The program then continues to stage 98 wherein the question is asked "was there a savings utilizing this particular code over the first CS3 bytes of the string?". If there are fewer than 34 occurrences of the one byte pattern (the selected code) then there is no saving and the answer is "no" at stage 98 and the program continues at stage 90 in the manner discussed previously with respect to stage 90. If the answer at stage 90 is "no", the program continues to stage 100 where again the question is asked "was there a Type 2 saving?".
Since the answer must again be "no" at stage 100, the program then moves to stage 64 where it is prepared to end the operation of SANPAKC.

If the answer at stage 90 had been "yes", that there were more Type 2 codes available, then the program returns to stage 52 and the operation continues with a new Type 2 code retrieved from LEXICON or from the PCORDS Table. The program will then continue through stages 96 and 98 as was discussed previously. At stage 98 if the answer has been "yes", then the program would have continued to stage 102 wherein the computer proceeds to build the bitmap (BBM) by scanning the first CS3 bytes of the string, determining where in those CS3 bytes the code appears, and recording that information in the bitmap, stored temporarily in location CORD1. Thus, at stage 102 the information of concern is the length of the string being compressed, CS3, the code of the redundant information which will be pulled out of the string to compress the string, and the bitmap or CORD1 which provides a road map of the places in the string where the code is present so that, after the redundant information is removed from the string, CORD1 provides a record of where that information was removed from the string, so that it can be later replaced during decompression. After stage 102 operations have been completed, the program continues to stage 28 (in Figure 1A) wherein the counters are updated. There is no need to select a code from the LEXICON array as this is Type 2 compression and such codes are unnecessary. The process of removing the Type 2 codes from the string is then continued through stages 28, 30, 32, 34 and 36 in the same manner as discussed with respect to Type 1 compression except for the special case, which is not found in Type 1 compression, wherein all of the located redundant bytes in the first CS3 bytes of the string may not be removed due to the fact that, in optimizing the length of the bitmap as was discussed earlier, the occurrences of some of the patterns may not be removed even though they are in the first CS3 bytes of the string.
Thus, it may be the case in Type 2 compression that the removal of a single byte pattern from the string may effect a savings, but all occurrences of the redundant byte may not be removed from the string, which is contrary to the case of Type 1 compressions where all occurrences of the redundant pattern are removed from the string.

After completing the instructions at stage 152 and having recorded the exact position of every occurrence of the code in the string, the program continues execution at stage 154. At stage 154, register IR2 is decremented by 4 and this new value is loaded into register IR4. IR4 points to a location in the TL array where the address of the first spot in string, where the original byte pattern which was substituted for during the execution of SANPAKC is to be replaced, is stored. After completing stage 36, the program continues to stage 38 wherein the question is again asked "is this Type 2 compression?". The answer is "yes" and the program continues on to stage 104 (in Figure 1B) wherein the counters are set up to assemble the LEXICON at the head of the string. The counters are set by setting up R and RM. The meaning of R at this stage is indicative of the number of bytes in the bitmap and RM is again equal to R minus 1. It should be noted that, in Type 2 compression, this action can be performed by simply adding the total number of bytes in the bitmap to the current value contained in R and RM for purposes which will be discussed hereinafter. After this has been done, the program then continues to stage 40 (in Figure 1A) wherein the LEX instructions are completed which move the string to the right an amount equal to R+1 bytes, leaving a space at the front of the string for inserting the control information for this Type 2 compression. JII is changed by adding to the original JII an amount equal to R+1. It should be understood that this is necessary as R+1 bytes of control information will be placed at the head of the string. The control information relating to the Type 2 compression comprises (a) one byte for the code of the byte being compressed (removed from the string); (b) one byte indicating the length of the bitmap; and (c) the actual bitmap. Thus JII will have been increased by R+1.
The program then continues through stages 40, 42, 44, 86, and 88 in the same manner as was discussed previously with respect to Type 1 compression. At stage 88, when the question "is this Type 2?" is answered "yes", the program continues to stage 90 (in Figure 1B) wherein the question is asked "are there any more Type 2 codes?". If the answer at stage 90 is "yes", the program returns to stage 52 for another cycle of Type 2 compression. If the answer at stage 90 is "no", the program continues to stage 100 wherein the question is asked "was there a Type 2 saving?". If the answer at this stage 100 is "no", then the program continues on to stage 64 which will be discussed below. If the answer at stage 100 is "yes", that there was a Type 2 saving, the program control is passed to stage 46, where additional information is placed at the head of the string, namely a composite byte which in its first four bits gives the number of times Type 2 compression has been effected and in its next four bits gives the number of bytes in the pattern being compressed. In this case the length value would be "1". This will key the machine for recognizing the occurrence of Type 2 compression during the decompression cycle. The program continues from stage 46 to stage 48 (in Figure 1C) and to stage 50 in a manner discussed previously. If this was not a Fast-Mode type compression, the answer at stage 50 is "no" and the program continues again at stage 90.

If the answer at stage 50 was "yes", then the program continues at stage 89 (in FIG. 1B) wherein the question is asked "are there any more PCORDS for this length?". If the answer is "yes", then the program continues again at stage 54. If the answer is "no", the program continues to stage 108, where the question is asked "was there a savings?". If the answer at stage 108 is "no", then the program continues to stage 110 where the question is asked "are there any more PCORDS?". If the answer is "yes", the program returns to stage 54 where the next PCORD is retrieved for scanning the string. If the answer is "no" at stage 110, then the system continues at stage 64.

If SANPAKC is operating in the Fast-Mode, then the response at stage 16 (in FIG. 1A) would transfer program control directly to stage 112 (in FIG. 1B) wherein the counters and registers for the Fast-Mode will be set up. After setting up the counters and registers for the Fast-Mode, the program continues to stage 54 to get the next PCORD to search for in the string (from the PCORDS Table). If the program is in Type 2 compression, it would be searching for a Type 2 PCORD or, if in a Type 1 compression, it would, of course, be checking the next Type 1 PCORD. This determination is made at the succeeding stage 114 wherein the question is "is the PCORD to be scanned for a Type 2 PCORD?". If the answer is "yes", then the program starts again at stage 96 and continues in the manner discussed previously with respect to Type 2 compression. If the answer is "no" at stage 114, then the program continues to stage 20 (in FIG. 1A) where the string is searched for the Type 1 PCORD.

All of the Type 1 and Type 2 Fast and Slow-Mode branches discussed previously have eventually terminated at stage 64. At stage 64, steps are taken to end the compression operation on the string of information. Only "housekeeping" functions are completed from stage 64 to the end of SANPAKC. That is, at stage 64 the PCORDS Table is searched to find the PCORD with the lowest savings ratio. It should be noted that if the PCORDS Table is not filled, the lowest savings ratio is zero and the address of the PCORD with the lowest savings ratio is the last location in the PCORDS Table. As was discussed previously, the PCORDS Table in its first 20 byte segment maintains the information as to the lowest savings ratio of a PCORD which is up for review at location 64. It should be noted that, specifically in the Slow-Mode, every PCORD in the PCORDS Table is updated as to its savings ratio. In the Fast-Mode, this would have been accomplished at stage 42 as discussed previously. However, if a particular PCORD had not been looked for during the Slow-Mode, this means that it was not in the string of information reviewed and, accordingly, the savings ratio associated with the particular PCORD is halved. It should be noted that the savings ratio is not averaged as might be expected, but is halved meaning that the last two strings of information have the greatest effect upon which PCORD remains in the Table with the highest savings ratio. This allows the system to rapidly change over from one type of information input to another. For example, if one were compressing information in English and there immediately followed information in German, where there might be different grouping of letters, the PCORDS Table would be very responsive to this change and after only a few strings of information had been fed through the compressor, the entries in the PCORDS Table would reflect this change to the new language being supplied to SANPAKC.
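
The effect of halving rather than averaging can be illustrated with a minimal Python sketch. The function name, the dictionary standing in for the PCORDS Table, and the rule that a PCORD found in the string simply takes its newly measured ratio are assumptions made for illustration; the text specifies only that the ratio of a PCORD not found in the string is halved.

```python
def update_savings(table, found):
    """Update savings ratios after one string has been processed.

    table: dict mapping a PCORD (bytes) to its current savings ratio.
    found: dict mapping each PCORD found in this string to the ratio
           measured for it (assumed replacement rule, illustration only).
    """
    for pcord in table:
        if pcord in found:
            table[pcord] = found[pcord]
        else:
            # Not present in this string: the ratio is halved.
            table[pcord] /= 2.0
    return table


table = {b"th": 0.40, b"ch": 0.10}
# Three consecutive strings in which b"th" no longer occurs.
for _ in range(3):
    update_savings(table, {b"ch": 0.30})
```

After three such strings the ratio of the absent pattern has fallen to one eighth of its starting value, which is why a change of input language is reflected in the table within a few strings.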

After operations are completed at stage 64, the program continues to stage 116. At stage 116, four additional bytes of information are placed at the head of the string. First, the length of the compressed string, JII, is recorded in the first three bytes of the four byte addition. JII will, of course, have been updated to include the last value of JII plus the four bytes of information to be added at this stage. As was stated, three bytes are used to record this value of the new JII, with the fourth byte being utilized to provide a count of the number of different compressions which have been made on the string prior to reaching stage 116. If the input information has not been compressed, then, of course, the fourth byte of the above mentioned four byte segment at the head of the string will be zero. After completion of the steps at stage 116, the remaining steps outlined in FIG. 1B are "housekeeping" machine functions which are completed merely to provide information as to the economics of the compression techniques completed in SANPAKC and to determine where the control of the program is to be continued.
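
The four byte header written at stage 116 can be sketched in Python as follows. The function names are hypothetical, and the byte order of the three byte length field is an assumption, since the text does not state it.

```python
def add_header(body, n_compressions):
    """Prefix a compressed string with the four-byte header of stage 116."""
    jii = len(body) + 4            # JII includes the four header bytes
    if jii >= 1 << 24 or not 0 <= n_compressions <= 255:
        raise ValueError("value does not fit in the header")
    # Big-endian byte order is an assumption; the text does not state it.
    return jii.to_bytes(3, "big") + bytes([n_compressions]) + body


def read_header(string):
    """Return (JII, number of compressions, remaining bytes)."""
    jii = int.from_bytes(string[:3], "big")
    return jii, string[3], string[4:]
```

A string that was not compressed carries a count byte of zero, which is the condition the decompressor tests before doing any further work.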

For example, at stage 120, after completion of the instructions at stage 116, there is provided a set of instructions for increasing the amount of storage within the machine which is addressable by the program. This is necessary because of a limitation on the number of instructions which can be addressed in one section of machine storage. At stage 122, the value of the variable MODE, an input command, indicates whether the string has come through SANPAKC or whether, at the input stage 11, the string had been directed not to be compressed, such as would occur in a retrieval mode of operation. Accordingly, those strings of information which are not to be compressed are transferred directly from stage 11 to stage 120 and then stage 122 without ever passing through any portion of SANPAKC. At stage 122, if the determination is made that the information had not been compressed, it goes directly to the termination point of SANPAKC. If there had been a compression, and, therefore, the answer at stage 122 is that MODE is not equal to zero, then at stage 124 there is computed a savings ratio of the amount of savings achieved by the compression of the input string by SANPAKC. Thus, the savings ratio is the number of bytes saved divided by the original number of bytes in the input string, and accordingly, the actual savings achieved by SANPAKC can be determined for each string of information supplied.

SANPAKD

INTRODUCTION

SANPAKD is the alphanumeric decompressor which is used for decompressing strings of information which have been compressed by SANPAKC. It is the purpose of SANPAKD to take such compressed information and return it to the form of the original input information.

DETAILED DESCRIPTION

Briefly, in the case of a string that has undergone both Type 1 and Type 2 compressions in SANPAKC, the first three bytes of the compressed string indicate its length in bytes. The fourth byte specifies the number of compressions carried out on the string in SANPAKC. These four bytes are removed and set to appropriate registers to be used for control purposes through the decompression process. The next two bytes in the compressed string relate to a Type 2 compression: one gives the Type 2 byte which was deleted from the string and must now be inserted in the proper location thereon, the other byte gives the length of the bit-map which follows next and will be used for finding the right locations in the string to carry out the insertions. The insertion process is carried out and then any other deleted Type 2 bytes are reinserted in the compressed string in the same manner. Next, decompression information relating to Type 1 compression is examined. As noted earlier, for each Type 1 compression, the string has at its head a Type 1 code byte, a byte designating in four bits the length of the replaced pattern and designating in the other four bits the number of patterns replaced. These two bytes are set to appropriate registers for control purposes, and the R bytes of the replaced pattern which follow at the head of the string are inserted in place of every occurrence in the string of the Type 1 code just mentioned. The process is repeated until all deleted patterns are replaced in the string.

Various housekeeping, control and error checking functions are also carried out. A detailed description of each step of the process, with particular references to the drawings, is given below.

The information flowchart for SANPAKD is shown in FIG. 2. In FIG. 2 at stage 130, the instruction SANPAKD is given which will initiate all the steps which follow as set forth in FIG. 2. The first step in the decompression of a string, such as the output of SANPAKC discussed above, is to complete the instructions at stage 132. The instruction OPCORDS at stage 132 is to optimize the PCORDS Table if the input string has not already been compressed. That is, if the input string is intended to be compressed, the PCORDS Table in SANPAKC will be optimized at stage 132. This optimization is carried out by removing all patterns in the PCORDS Table except for a specified number of PCORDS with the highest savings ratio values. The actual number of PCORDS to be retained in the PCORDS Table is an option of the user. For example, if it is only desired to scan the best five PCORDS, then, in fact, only the top five PCORDS in terms of savings ratio will be utilized during the SANPAKC compression with all other PCORDS being deleted from the table. Once this optimization instruction is completed prior to entering SANPAKC, program control then continues with all of the instructions in SANPAKC as was discussed previously. If the string at stage 132 is intended for decompression, then program execution is continued at stage 134 where all of the registers and counters for SANPAKD are set to receive a new string of information. After setting the registers and counters, the program continues at stage 136 wherein instructions are given to remove the first four bytes from the head of the input string. These four bytes were, as discussed previously with respect to SANPAKC, comprised of three bytes which designated the length of the string and one byte which indicated the number of compressions which had been completed on the string while passing through SANPAKC.
The first three bytes of these four bytes are removed from the input string and are then stored in counter JII (length of the string). The fourth byte is stored in counter CS8. This last counter CS8 indicates the number of compressions which had been completed on the input string. After completing the instructions at stage 136, the program then moves to stage 138 wherein a determination is made as to whether counter CS8 has a value greater than zero. If CS8 is equal to zero, then the input string has not been compressed and there is therefore no need for sending the string through any further stages of SANPAKD. Accordingly, program control is then immediately passed to stage 140 which will be discussed at the end of the operation of SANPAKD.

If CS8 is greater than zero, the string was compressed and therefore requires decompression. Thus, control passes to stage 142, where registers BRYY and BRY, which are registers in the computer, are loaded with information as to where the string of input information can be found in the computer. Once this is determined, the next step is taken at stage 144 where the first byte of the input string is examined. The first byte of the string thus processed will be, as was discussed with respect to SANPAKC, a composite byte comprising (A) a first four bits which indicate the number of repetitions of a particular pattern length to follow; and (B) a next four bits which indicate the length of the first group of repetitive patterns which have to be decompressed. For example, if two patterns of length 12 are at the head of the string, then (A) would be 2 and (B) would be 12. However, it should be noted that it is most likely that the length of the pattern at the head of the string would be small, as in SANPAKC compression occurs first with the longest patterns and works downward to the shortest patterns, with the last patterns to be compressed being the Type 2 patterns, one byte in length. If there was any compression of the string, this composite byte would be at the head of the string. The first half of the composite byte, indicating the number of repetitions, is stored in counter CS4 and the length of the pattern is stored in location RM, remembering that R equals RM+1.
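
Splitting the composite byte at stage 144 can be sketched in Python as below. The function name is hypothetical, and two points are assumptions: that the "first four bits" are the high-order bits of the byte, and that the four bit length field holds RM directly, so that R is obtained as RM+1.

```python
def split_composite(byte):
    """Split the composite byte into (CS4, RM, R)."""
    cs4 = byte >> 4        # first four bits: number of repetitions
    rm = byte & 0x0F       # next four bits: RM, where R = RM + 1
    return cs4, rm, rm + 1
```

Under these assumptions a value of RM equal to zero denotes a one byte pattern, which is the case tested at stage 146 to select Type 2 decompression.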

After the preliminary steps at stage 144, the instructions proceed to stage 146 wherein a determination is made as to whether the string should be decompressed for Type 2 or Type 1 information. Thus, if RM equals "0", then the information is in order for Type 2 decompression and the instructions would proceed to stage 148. If RM is greater than "0", then Type 1 decompression is in order and the instructions would proceed to stage 150. At stage 150, the input string must have, as its first byte, the code which has been substituted for a particular pattern and, as the succeeding R bytes, the redundant pattern for which substitution has been effected. It will be understood that at stage 150 JII is reduced by R plus 1 bytes and the substitution code and the redundant pattern are stored in locations CODE and CORD1 respectively. The string is then moved to the left R+1 bytes to close up the space created by the removal of the above information from the head of the string.

After completing the steps at stage 150, the program continues at stage 152 where the instruction F1ND is initiated. These instructions scan the string for the single byte in CODE stored during the operation at stage 150. This byte is the substitution code which replaced the occurrences of the redundant pattern during the execution of SANPAKC. The addresses of the locations in the string where this code is found are stored in the array TL. Register IR2 contains the number of bytes past the beginning of the TL array where the address of the last found occurrence of CODE is stored.

After completing the instructions at stage 152 and having recorded the exact position of every occurrence of the code in the string, the program continues execution at stage 154. At stage 154, register IR2 is decremented by 4 and its new value is loaded into register IR4. IR4 points to a location in the TL array holding the address of the first spot in the string at which the original byte pattern, substituted for during the execution of SANPAKC, is to be replaced. After completing this step, the instructions continue at stage 156, where the address stored in TL at the location designated by register IR4 is loaded into register IR5. Register IR5 thus indicates the last string address at which a code was found during the scan defined in the operations at stage 152. IR5 is now set equal to the value in IR5 minus the address of the beginning point of the string of information. Thus, at this point IR5 is equal to the number of bytes from the beginning of the string where the last code found is actually located.

The next step is to determine at stage 158 whether the code we are dealing with is a Type 1 or Type 2 code. If it is a Type 1 code then program control goes directly to stage 160. If it is a Type 2 code, program control goes to stage 162. This determination is made by merely checking, as at stage 146, whether RM is equal to zero or greater than zero. Considering the case with Type 1 compression, the program continues to stage 160 wherein the string is operated on by moving the remainder of the string, beginning one byte past the location pointed to by IR5 (the location of the CODE in the string), to the right an amount equal to RM (R minus 1). This leaves an opening of R bytes in the string starting at the location pointed to by IR5, one byte of which is the substitution code placed in the string by SANPAKC. Then, the pattern in CORD1 is inserted into the string in the R-byte space between the location pointed to by IR5 and the remainder of the string. The insertion of the pattern in CORD1 into the string erases the code which had replaced the pattern during the compression cycle and the new string will be returned toward its decompressed form. In the process, the actual length of the string has been increased by RM bytes and this amount is added to JII. After this stage is completed, IR4 is decremented by 4 so that it now points to the next lower address in the TL array where the next address in the string to be operated on is stored. This is accomplished at program stage 162. That is, the new IR4 is equal to IR4 minus 4 bytes which is the position of the next CODE address stored in TL. The program then continues to stage 164 where a determination is made whether the new IR4 is equal to or greater than zero. If the value is equal to or greater than zero, then the loop is executed again by returning to the instructions at stage 156 to again insert the pattern in CORD1 at particular CODE locations. This looping will continue until IR4 is less than zero.
This means that all the codes for this particular pattern have been replaced by the original string pattern and the program will continue to the instructions at stage 166. At stage 166 are the instructions relating to checking the operation of the string decompression and insuring correctness. Thus, at stage 166 counter CS8 is decremented by 1 indicating that the first decompression step has been completed and that there are now left a new CS8 minus 1 decompression steps to be completed before total decompression of the string is achieved. Further, counter CS4 is also decremented by 1 meaning that for this particular length of pattern there are CS4 minus 1 decompression steps to be completed before a new composite byte is located or total decompression of the string has been achieved.
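
The Type 1 loop of stages 152 through 164 can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that the substitution code occurs in the string only where a pattern was replaced. A Python list of positions stands in for the TL array and its four byte address entries.

```python
def expand_type1(string, code, pattern):
    """Replace every occurrence of the one-byte code with the pattern."""
    buf = bytearray(string)
    # Scan once and record every position of the code (the TL array).
    tl = [i for i, b in enumerate(buf) if b == code]
    # Work from the last occurrence to the first so that earlier
    # positions are not shifted by later insertions.
    for pos in reversed(tl):
        buf[pos:pos + 1] = pattern
    return bytes(buf)
```

Processing the recorded positions from the end of the string backward is what allows all addresses to be found in a single scan before any byte is moved.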

The next step is to determine, at stage 168, if counter CS4 is equal to or less than zero. If counter CS4 is greater than zero, then the program returns to stage 146 and the loop will continue until CS4 counts down to zero indicating that all patterns of this R length have been decompressed. When this occurs, the program continues to stage 170 where a determination is made as to whether counter CS8 is equal to zero. If counter CS8 is not equal to zero, then the program returns to stage 144 to remove a new composite byte which, at this stage, should be the first byte of the string and to continue through the decompression stages. If, in fact, all of the decompression steps have been completed, the counter CS8 will be equal to zero and the program will continue to stage 140. At stage 140, the now decompressed information is then set up for use in the numeric decompression portion of SNUPAK (if numeric decompression is required), the operation of which will be discussed below. This treatment involves breaking down the string of information into substrings in accordance with whether the information is textual, floating point, or integer information. All of this will be more fully discussed with respect to SNUPAK. After the steps at stage 140, the string would leave SANPAKD fully decompressed and ready for use wherever needed.

If the determination is made at stage 146 that RM is equal to zero and, thus, we are dealing with Type 2 compressed information, the next steps are taken at stage 148. At stage 148, the Type 2 control information is decompressed by removing three items of information from the head of the string. The first item of information is the one byte code which is to be replaced at selected locations of the string as designated upon decompressing the bitmap which is to follow. The second item of information which is removed from the string is the byte following the one byte code in the string of information. This byte of information designates the number of bytes in the bitmap which follow this byte in the string. It should be noted that by the removal of this information from the head of the string the length of the string, JII, must be reduced by an amount equal to the length of the bitmap plus 2 bytes. The third item, the bitmap, is also decompressed at this stage and the addresses of each location where a substitution must be made are stored seriatim starting at location TL in four byte increments. The number of bytes past the beginning of TL, where the last address is stored, is stored in counter CS10. The program then moves to stage 149 where the contents of counter CS10 are loaded into register IR4. The program then continues on to stage 156. Register IR4 is transformed at stage 156 in the same manner that was discussed previously. Control is then transmitted to stage 158 wherein, again, determination is made whether RM is equal to zero. Since RM is equal to zero, the program continues to stage 162. At stage 162, there is the utilization of a counter CS11 which holds the original number of one byte patterns removed from the string during the compression cycle as determined by the program when disassembling the bitmap at stage 148. This number minus 1 is loaded into register IR6 and the new number in IR6 is then stored back in counter CS11.
A new value stored in register IR5 must be determined. IR5 contains the number of bytes from the head of the string to the location where the code is to be inserted into the string, and from this number must be subtracted the value recorded in register IR6 in order to determine the actual position in the compressed string where the coded information will be placed. This is required as the address stored originally in the bitmap has been changed by reason of the other removals which had occurred during the Type 2 compression but have not yet been restored into the string. After completing the steps at stage 162, a correct value in IR5 has been computed which can be utilized at stage 160 to insert the Type 2 code into its correct position in the string. Once this has occurred, the loop will continue through stage 162 in the same manner as was discussed with respect to Type 1 compression until the bitmap has been completely utilized to insert the Type 2 codes in their correct positions in the string. The result of the SANPAKD operation is to produce, at the end, an exact reproduction of the original string input to SANPAKC. It should be noted that the string length JII originally recorded can be checked against the new length JII at stage 140 to determine whether the length of the decompressed string corresponds to the length of the original input string. Further, there is a check as to whether the number of compressions equals the number of decompressions which were effected by SANPAKD. These cross checks insure that there is no error in compressing and decompressing the input strings. This completes the operation of the alphanumeric compressor and decompressor in COPAK.
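
The position correction applied to Type 2 insertions can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that the bit map holds one bit per byte of the string as it stood before this Type 2 removal, and that bits are read from the high-order end of each byte; the text does not specify either detail.

```python
def bitmap_positions(bitmap):
    """Return the original-string positions flagged in the bit map."""
    positions = []
    for byte_index, byte in enumerate(bitmap):
        for bit in range(8):
            if byte & (0x80 >> bit):      # assumed MSB-first bit order
                positions.append(8 * byte_index + bit)
    return positions


def expand_type2(string, code, bitmap):
    """Reinsert the deleted byte at every position flagged in the bit map."""
    buf = bytearray(string)
    positions = bitmap_positions(bitmap)
    # Last to first: the k-th flagged position (counting from 0) lies k
    # bytes further left in the compressed string, because k deleted
    # bytes precede it and have not been restored yet.
    for k in range(len(positions) - 1, -1, -1):
        buf.insert(positions[k] - k, code)
    return bytes(buf)
```

The subtraction of k plays the role of the subtraction of IR6 from IR5 described above: it converts an address recorded against the original string into an address in the string that is still missing the earlier deleted bytes.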

NUPAKC

INTRODUCTION

NUPAKC is the numeric compressor. That is, NUPAKC is designed specifically for compressing numeric information. The machine is normally instructed that certain strings of information are basically in numeric form, and such information will be transmitted to NUPAKC for compression. In FIG. 3, there is shown a flow diagram of the steps that take place in NUPAKC to compress the numeric information. In FIG. 3, there is a start up procedure which instructs the machine to proceed to stage 182, where the input strings of numeric information are converted into integer numbers organized in four-byte words in a manner which will be more fully described in FIG. 4. This conversion into integers is the first compression step in that it removes the floating point exponent and allows the numerical information to be treated as an integer so as to conserve storage facilities and effect more efficient utilization in the remaining compression steps.

After conversion into integers, the program continues to a differencing stage 184. At this stage, successive integer words in a substring are differenced seriatim so as to substantially reduce the magnitudes of all the integers following the first integer in an optimal manner. The procedures at stage 184 are described in FIG. 6 and will only be accomplished if, in fact, such a differencing procedure will effect a saving, and the number of differencing cycles will be limited to that number which reaches the optimal condensing of the input substring.
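
Recursive differencing of this kind can be sketched in Python as follows. This is an illustration under stated assumptions and not the procedure of FIG. 6: the function names, the rule that cycle n leaves the first n words untouched, the use of the sum of magnitudes as the measure of a saving, and the cap on the number of cycles are all hypothetical.

```python
def difference_once(words, keep):
    """Leave the first `keep` words; replace each later word by its
    difference from its predecessor."""
    return words[:keep] + [words[i] - words[i - 1]
                           for i in range(keep, len(words))]


def difference_optimal(words, max_cycles=8):
    """Repeat differencing while it reduces the total magnitude.

    Returns (differenced words, number of cycles performed).
    """
    def cost(ws):
        return sum(abs(w) for w in ws[1:])

    cycles = 0
    while cycles < max_cycles:
        candidate = difference_once(words, cycles + 1)
        if cost(candidate) >= cost(words):
            break                      # no further saving: stop here
        words, cycles = candidate, cycles + 1
    return words, cycles


def undo_differencing(words, cycles):
    """Invert `cycles` rounds of differencing by running sums."""
    words = list(words)
    for keep in range(cycles, 0, -1):
        for i in range(keep, len(words)):
            words[i] += words[i - 1]
    return words
```

For the evenly spaced input [100, 102, 104, 106, 108] the sketch performs two cycles and yields [100, 2, 0, 0, 0]; recording the cycle count is what makes the step reversible.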

After completing the procedural steps at stage 184, the program continues to stage 186 described in FIG. 5 in which identical sequences are removed and condensed information replaces the sequential information and a map of the position of such information is placed at the head of the substring so as to indicate, for decompression purposes, where said condensed information can be found in the substring. After completion of the steps at program stage 186, the program continues to stage 188 where all of the substring integers are packed into eight byte double words in an optimal fashion, i.e., the maximum number of integers are placed in each double word so as to again condense the information. It has been found with NUPAKC, especially in dealing with highly repetitive information such as is found in graphical data, etc., that compression up to 99.99 percent is possible. However, more normally, compression of numerical data is in the range of 80 to 95 percent.

As indicated by the Table of FIG. 9, the string of these double words may then be directed to SANPAKC for further compression.

It should be noted that the longer the substring, the more likely are repetitive sequences to occur and more efficiently are the integers packed into double words. It has been found that when one substring, which could be compressed to save 88 percent, is included in a substring ten times as long it would give a savings of 95 percent. Thus, long substrings should in fact give rise to higher savings.

DETAILED DESCRIPTION

FIG. 4 is a structural flow diagram of the operation at stage 182 discussed with respect to FIG. 3. It should be understood that all substrings which might be utilized in the COPAK system have certain identifying words (the substring commands) associated with them. One of the first words has been called SOS. If SOS is a number less than zero then the substring is intended to be compressed by SANPAKC only. If SOS is equal to zero then the substring is to be compressed by NUPAKC. It will further be understood that when substring information is read into the computer, this type of identifying material is controlled by the user through the input commands because he knows what the information type is (either numeric or alphanumeric) and, therefore, whether it is capable of numeric compression or alphanumeric compression. However, it should be noted that NUPAKC is not purely limited to numeric information and, in fact, alphanumeric information could be compressed by NUPAKC, assuming that all the information being entered into the machine is in fact in numeric form. However, for practical purposes, NUPAKC is intended strictly for numeric information. In addition to the SOS substring command, there is a second command called LSX. LSX is a substring command which determines the type of numeric compression which will be used. NUPAKC may use no truncation, truncation by the bin procedure using the value of LSX, or truncation by the logical right shift method. These operations are discussed below.

In the procedure at stage 182 (in FIG. 4) the first step is a determination at stage 190 whether LSX is equal to, less than, or greater than zero. At stage 190, register IR1 contains a value of the number of bytes past location LSX in core memory of the machine where the desired LSX value for the particular substring is stored. If LSX is equal to zero, then the program would continue with no truncation at stage 192. If LSX is less than zero, then the program continues at stage 194 to begin execution of the logical right shift method. If LSX is greater than zero, then the program continues at stage 196 wherein truncation by the bin procedure using the value of LSX is started. LSX is an indication of the degree of reliability which the user desires the information to be passed through NUPAKC. Thus, if one knows that the input data is correct to within 1 percent, then LSX would equal, i.e., 0.01. If the user states that LSX is less than zero (usually set to -1) this means that the logical right shift method will be used. In the logical right shift method there will only be a small variation in the seventh significant figure in the input data upon decompression. Thus, normally, one who wishes to use the logical right shift method would be interested in extremely accurate data with little or no loss of significant information during compression.

Register IR1 will be used throughout this description and it will have the following meaning. IR1 is associated with the address of information in various arrays which are used by NUPAKC for each particular substring. The first location in each array (such as SOS, LSX, BWX, YM, etc.) contains the compression commands for the first substring. The second location in each array is the information associated with the second substring, and so on for each substring in the string to be compressed. IR1 contains a count of the number of bytes past the beginning location of the array where the substring information is stored. Thus, for substring 1, IR1 will have a value of zero, indicating that the information is stored at the beginning of each array. For substring 2, IR1 contains the value of 4, meaning that the information is four bytes past the beginning of the array. For notation purposes, the symbols such as SOS(IR1), LSX(IR1), etc., will indicate the above mentioned meaning. That is, SOS(IR1) means to use the beginning address of the SOS array plus the number of bytes past the beginning of the array (the value of IR1) to address the proper location of the substring information.

When LSX equals zero, as was previously stated, the program continues to stage 192 wherein the floating point number 0.0 is stored in location BWX(IR1). The number 0.0 for LSX indicates that the string is not to be truncated. This information will be added to the head of the string (composed of all the substrings) after completion of the passage of all the substrings through NUPAKC. Thus, after completion of the compression, this information will be placed at the head of the string to indicate that no truncation was completed on this particular substring so the proper decompression procedure can be effected.

After storing 0.0 in BWX, the program continues to stage 198 wherein instructions are provided to have the substring searched to find the minimum and maximum values in the substring. The minimum value is contained in register IR4 and the maximum value of the substring is contained in register IR5. After this is completed, the program continues to stage 200 wherein the median value of the substring is determined. YM(IR1), the median value for the substring, is computed as the sum of the minimum and maximum values stored in IR4 and IR5 divided by 2. IR5 is then reset to equal the absolute value of the median value for the substring (i.e., IR5 will always be a positive value at this point).

After completing this stage, the program continues to stage 202 wherein a determination is made as to whether IR5 is greater than 67,108,864. This would occur where the input information was not, in fact, numeric information or there had been some mistake in entering the input string. The number 67,108,864 is equal to 2 to the 27th power. If this were to occur, there obviously was an error in the kind of input information entered and, in fact, the input information should not have been supplied to NUPAKC. Since IR5 is indicative of the median value and would indicate that there are some numbers above and some numbers below that value, any numbers that exceed 2 to the 27th power are too large for the numeric compressor to operate on and, accordingly, they should bypass the numeric compressor. Thus, if IR5 minus 2 to the 27th power is greater than zero then the program continues to stage 204 wherein the number of times the differencing procedure has been executed (in this case 0) is stored in SOS(IR1). Then, SOS is loaded negatively so as to make SOS less than zero. As was discussed previously, when SOS is less than zero, the program is to use only SANPAKC for compression. After completing this storage, the program continues to stage 206 wherein a printout is made to tell the user that the string could not be compressed by NUPAKC.

At stage 208, immediately succeeding stage 206, the operations of NUPAKC on the input string have been completed and program control is transferred to the end of NUPAKC at location 210 as shown in FIG. 3.

If the result at stage 202 is a negative number, then the program continues at stage 212. At stage 212, the number of bytes in the substring is loaded into register IR3 from storage location CS2 and the address of the first byte of the substring is loaded into register BRYY from counter CS15. After this step, the program continues at stage 214 to subtract the median value YM(IR1), determined at stage 200, from each word in the substring, taking the substring word from storage to complete said subtraction step. Each word in the substring is stored in four byte intervals in a storage location addressed by register BRYY and, at stage 214, the first word in the substring has YM subtracted therefrom. Register IR3, loaded with the number of bytes in the substring, has four bytes subtracted therefrom. Thus, IR3 equals IR3 minus 4. Register BRYY is incremented by 4 to address the next word in the substring. Next, the program continues to stage 216 where determination is made as to whether the new IR3 is still greater than zero. If it is greater than zero, then the program returns to stage 214 and processes the next word in the substring recorded at BRYY plus 4 and further decrements IR3 by another four bytes. This value of the new IR3 is then checked at stage 216 until such time as the new IR3 is equal to zero. When this happens, the program continues to the terminal stage 208.
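
The no-truncation path of stages 198 through 216 can be sketched in Python as follows. The function name is hypothetical and integer division for YM is an assumption. The limit is passed as a parameter whose default is taken from the phrase "2 to the 27th power" in the text.

```python
LIMIT = 2 ** 27   # assumed from "2 to the 27th power"; treat as a parameter


def center_on_median(words, limit=LIMIT):
    """Subtract YM = (min + max) / 2 from every word of a substring.

    Returns (YM, centered words), or None when |YM| exceeds the limit and
    the substring should be left to SANPAKC alone.
    """
    ym = (min(words) + max(words)) // 2    # integer division is assumed
    if abs(ym) > limit:
        return None
    return ym, [w - ym for w in words]
```

Centering on the midpoint of the range halves the largest magnitude that has to be stored, and YM is kept with the substring commands so that it can be added back on decompression.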

If LSX had been greater than zero, the program would have continued at stage 196 to execute the bin procedure truncation. At this stage, a value LSX(IR1) divided by two is computed. The LSX(IR1)/2.0 value is stored at location BWX(IR1).

After completing the storage step, the program continues at stage 218 where the substring is scanned to find a minimum and a maximum value of the floating point numbers in the substring. These values are contained in registers IR4 and IR5 respectively. At stage 220, the value of the minimum number of the substring, in IR4, is subtracted from IR5 and divided by the number stored in BWX(IR1), namely, (LSX/2), to obtain a value which is stored at location DUM1. Thus, it will be seen that as LSX approaches zero, DUM1 becomes larger and larger. If DUM1 becomes greater than 2^27, then the program returns to stage 194 to execute the logical right shift method. If, at stage 222, it is determined that DUM1 is less than 2^27, then the address of the first byte in the string and the number of bytes in the substring, recorded in counters CS15 and CS2 respectively, are loaded into registers BRYY and IR3 so as to be ready for use. After loading the registers with the values from CS2 and CS15, the program continues to stage 226.

The value of the first word in the substring is replaced in its storage location by a value computed by subtracting the minimum number in the substring from the original value at that location and dividing by the value (LSX/2) stored in BWX. After completion of this step, this word is then further operated on at stage 228 by truncation. The truncation removes all digits to the right of the decimal place in the word, leaving only the digits to the left of the decimal place. This is known as truncation without rounding, as there is no significance placed on the size of the number to the right of the decimal place and it does not affect the integer which remains. The integer thus formed is now stored in the same location from which it was taken and, next, the program continues to stage 230 wherein register BRYY is incremented to address the next number in the substring and IR3 is decreased by four bytes, the amount one has moved to find the next word in memory. IR3 is equal to four times the number of words remaining to be processed in the substring and, by decreasing IR3 by four bytes for each word processed in the substring, it will be understood that when the entire string has been completed, IR3 will be equal to zero.

At the next stage, a determination is made as to whether in fact IR3 has reached zero. At this stage 232 if IR3 is still greater than zero the program continues back to stage 226 and a new truncation is performed by utilizing the value of the second word in the substring and subtracting therefrom the minimum value and dividing by (LSX/2). This new number is then truncated at stage 228 by dropping all digits to the right of the decimal point and storing the new integer in the place for the second word in the substring and continuing to stage 230 to look for the third word in the string while decreasing the new IR3 by another four bytes. If, at stage 232, IR3 is still greater than zero, the loop continues again until IR3 reaches zero. If IR3 is less than or equal to zero, the program continues to the terminal stage 208.
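The bin procedure of stages 196 and 218 through 232 can be illustrated with a short Python sketch. The function names `bin_truncate` and `bin_restore` are hypothetical, and treating a value of exactly 2^27 as a failure is an assumption, since the text only covers the "greater than" and "less than" cases. The restore step follows the later description of NUPAKD (multiply by BWX, then add back the stored minimum) and in general recovers only the lower edge of each bin.

```python
import math

LIMIT = 2 ** 27


def bin_truncate(values, lsx):
    """Map floating point values to integer bin numbers of width LSX/2."""
    bwx = lsx / 2.0                      # stage 196: bin width BWX
    lo, hi = min(values), max(values)    # stage 218: minimum and maximum
    if (hi - lo) / bwx >= LIMIT:         # stages 220 and 222: DUM1 range test
        return None                      # caller falls back to the right shift method
    # Stages 226 to 232: subtract the minimum, divide, truncate without rounding.
    bins = [math.trunc((v - lo) / bwx) for v in values]
    return bwx, lo, bins


def bin_restore(bwx, lo, bins):
    """Approximate inverse: bin number times BWX plus the stored minimum."""
    return [b * bwx + lo for b in bins]
```

With LSX equal to 0.5 the bin width is 0.25, so the values (1.0, 1.5, 2.25, 3.0) become the integers (0, 2, 5, 8).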

If LSX had been less than zero, at stage 190, the program would continue to stage 194, where the floating point number "-1.0" would be stored at location BWX(IR1). After completion of this storage stage, the program continues at stage 234. At stage 234, the program takes the first number from the substring and logically adds the last four bits of this number to itself, which is effective to round the number before the next step is taken of shifting the resultant sum logically to the right five bits, removing the last five bits from the number. This is a truncation with rounding and, although one has lost the last five bits, the bits removed are the least significant bits in the floating point number.

The program then continues at stage 236 wherein the truncated, rounded number is returned to its storage location, BRYY (the address in storage) is incremented by four bytes to address the next word in the string, and IR3 is decreased by four bytes. If IR3 is still greater than zero at the next stage of the program, stage 238, the loop continues by returning the program back to stage 234 for truncation with rounding of the next word in the string. This continues until IR3 is equal to or less than zero. When this occurs, the program moves on to stage 240.

It should be noted that the resultant words in the substring are now all integers and not floating point numbers as, by shifting to the right five bits, each number in the substring is less than 2^27 and the number, including its exponent at the front thereof, can be considered, for all practical purposes, as an integer. After this is completed, the program continues at stage 240 wherein the last word in the substring is removed from the substring and later stored in location CHECK(IR1) for purposes of later utilizing said word as a check on the accuracy of the compression and decompression of the substring. The number so retrieved at stage 240 is first shifted logically left five bits before storing it in CHECK(IR1) at stage 242. This is so the number will be in the exact form in which it should be found after decompression of the string at the end of NUPAKD. These operations occur at stage 242. After completion of the program steps at stage 242, there is stored in memory the address of the beginning of the first word in the string, at location CS15, and the number of bytes in the string, at location CS2. This latter number is loaded into register IR3, which is the number of bytes past the beginning of the substring where the last word in the substring is stored. The address of the beginning of the substring is loaded into register BRYY. After completion of this step, the program continues to stage 198 and the information is treated as though there had been no truncation, in the manner discussed previously with respect to integer numbers, including finding a median and subtracting the median from all numbers in the substring. After completion of stages 198, 200, 202, 212, 214 and 216 the converted substring reaches the terminal stage 208 for Step a of NUPAKC.

Step b of NUPAKC consists of identifying sequences and counting of significant bits so as to achieve condensation of information.

Step b of NUPAKC is best shown in FIG. 5. For purposes of definition the following are true:

IR4 is equal to the number of consecutive equal integers;

IR1 is the number of times a particular consecutive number is repeated in the substring;

IR3 is the number of bytes in the particular substring which is being compressed;

BRYY is the address in storage where the particular substring is stored.

In view of the detailed description heretofore given, it is now possible to describe the steps of the program in accordance with groups of steps and the functions which are achieved by the program steps without necessity of describing each individual step within a particular sequence of steps. The actual program listing at the end of the written description will also aid in the understanding.

In program Step b, in FIG. 5, it is first desirable to examine the string and search it to find consecutive numbers which are repeated along the length of the string. The scan, accomplished at the stages identified collectively by the numeral 244, finds the repeated numbers, identifies the number of repeats in a particular sequence, and provides the address of each of these repeated groups in the string. After the scan is completed, all of this information is stored in the computer.

After having completed the scan to determine the number of repeats of particular numbers, the address of the repeats and the particular number being repeated, the program continues to stage 246 where the determination is made as to whether the number of bytes which can be saved, based on the results of the scan during stage 244, is greater than the number of bytes needed for control information for replacing the sequences of consecutive numbers. If the answer to this question is "yes," i.e., that the net number of bytes saved is greater than zero, then the program continues to stage 248 wherein the consecutive identical numbers in the string have substituted therefor two four byte words (a double word) in which the first four byte word contains the number of repeats and the second four byte word contains the number which is being replaced. The address in the substring where this sequence begins is stored in location DUM1(IR4). After completing the operation at stage 248, the program continues to the steps shown at stage 250 wherein the substring is closed up to erase the sequence information now represented in the substring by the double word described with respect to stage 248, providing a new substring in which the consecutive identical numbers have been replaced by a double word indicating the number of times a particular number is repeated at a given address in the substring. The operation described with respect to stages 244, 246, 248 and 250 is continued to find further consecutive repeated numbers (if they exist) in the substring. When more than ten such sequences are found, the program stops with respect to this particular means of compressing the string. The number ten is merely an arbitrary number selected to indicate that a multitude of consecutive numbers has been found in the string. Beyond that point it is most probable that the overall saving is not going to be substantially increased by removing additional repeated numbers found in the substring and, therefore, there is no need to waste additional machine time searching the substring. The number of sequences found in the substring is recorded in register IR6. The actual length of the string (in bytes) is recorded in register IR7. When the string has been completely scanned, and there is no more saving to be effected by executing the stages 244, 246, 248 and 250, the program continues to stage 252 wherein the stored addresses (in array DUM1) of the replaced consecutive groupings are placed at the head of the condensed substring.

If, at stage 246, it was determined for a particular repeated sequence found during the program steps at stage 244 that no saving would be effected by substitution for the repeated numbers, then the program continues to stage 254 wherein a determination is made as to whether the program should continue back to stage 244 to look for additional repeated numbers or whether to end this searching procedure and continue on to stage 252. In effect, stage 254 is substantially similar to the operation at stage 250 discussed previously.
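The sequence removal of stages 244 through 254 can be sketched in Python as below. The names `remove_sequences` and `restore_sequences` are hypothetical. Two points are assumptions made only for the illustration: the control cost of a replaced run is taken as three words (the double word plus one stored address), and addresses are kept as word indices into the condensed substring rather than byte addresses.

```python
MAX_SEQUENCES = 10   # the arbitrary limit mentioned in the text
WORD = 4             # bytes per word


def remove_sequences(words):
    """Replace profitable runs of equal words by a (count, value) double word."""
    out, addresses = [], []
    i = 0
    while i < len(words):
        j = i
        while j < len(words) and words[j] == words[i]:
            j += 1                                   # stage 244: scan one run
        run = j - i
        saved = run * WORD - 3 * WORD                # stage 246: assumed control cost
        if saved > 0 and len(addresses) < MAX_SEQUENCES:
            addresses.append(len(out))               # where the double word starts
            out.extend([run, words[i]])              # stage 248: count, then value
        else:
            out.extend(words[i:j])                   # no saving: keep the run as is
        i = j
    return addresses, out


def restore_sequences(addresses, words):
    """Expand each (count, value) double word back into the original run."""
    out = list(words)
    for addr in reversed(addresses):                 # last first keeps addresses valid
        count, value = out[addr], out[addr + 1]
        out[addr:addr + 2] = [value] * count
    return out
```

Under these assumptions a run must contain at least four equal words before it is replaced, so the pair (3, 3) in the test data is left untouched while the run of five sevens is condensed.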

After condensing the substring and placing the address information at the head thereof, the program continues to a stage 256 wherein the string is now prepared for the packing step which follows as described in FIG. 6. It should be remembered that no number in the string is greater than 2^27. Thus, in any four byte word, there must, necessarily, be five bits which are not used; that is, the five leftmost bits in each four byte word are, of necessity, not used by any number in the substring. For purposes of packing, it will be necessary to determine, for each four byte word, the number of bits required to represent the number. In this regard, the string is prepared by moving the number in each four byte word to the left five bits, leaving the rightmost five bits in each four byte word empty. Then, each number is scanned to determine the number of bits required to represent the number and this information is placed in the rightmost five bits. It will be understood that since the number of bits necessary to represent the number cannot be more than 27 (since the number cannot exceed 2^27), the number of significant bits is a number which can be designated within five bits of digital information. Accordingly, the substring, as it reaches stage 256, will be in a form wherein the number in the four byte word is recorded in the first 27 bits and the next five bits will provide information relating to the number of significant bits in those 27 bits.
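The word layout produced at stage 256 can be illustrated with a small Python sketch. The helper names `tag_word` and `untag_word` are hypothetical, and the sketch assumes non-negative values below 2^27; how the program treats signs is not covered in this passage.

```python
def tag_word(n):
    """Place n in the upper 27 bits and its significant-bit count in the low 5 bits."""
    if not 0 <= n < 2 ** 27:
        raise ValueError("value must fit in 27 bits")
    return (n << 5) | n.bit_length()   # the bit count is at most 27, so it fits in 5 bits


def untag_word(word):
    """Split a tagged word back into (value, significant-bit count)."""
    return word >> 5, word & 0x1F
```

The value 5 needs three significant bits, so its tagged word is 5 x 32 + 3 = 163, and the largest permitted value still fits in a 32 bit word.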

FIG. 6 is a complete showing of the flow diagram for the program set forth schematically in FIG. 3. FIG. 6 indicates that there are, in fact, four steps required in NUPAKC. The steps are as follows:

a. Truncation;

b. Differencing;

c. Sequencing;

d. Packing.

In view of the detailed description given in respect to SANPAKC, it will be obvious to one skilled in the art after a discussion of the function of NUPAKC as to the manner in which NUPAKC operates from this functional description. Accordingly, although the flow diagram is a complete step by step analysis of the operation of NUPAKC, the description of the programming steps will be only in functional form without reference to specific counters, storage, and processing elements which will be accomplished in the computer by reason of the programming steps.

The step of truncation is effectively the step described in FIG. 4 with respect to stage 182. After completion of truncation, which has been generally designated by the numeral 258, the substring proceeds to be operated on by the program at stage 184 where the differencing operation is completed. The differencing technique is basically an attempt to reduce the number of significant bits in the numbers being compressed so as to make the packing operation more efficient. Thus, the lower the number of significant figures in a given number, the more efficient the packing that is possible. Differencing is a means of achieving lower numbers without losing any information in the string.

The differencing operation is effected as follows:

a. First, the substring is added in an absolute manner to determine the absolute sum of all of the numbers in the substring, regardless of the sign of any individual number.

b. Each number in the substring then has its next preceding number subtracted therefrom, seriatim, in a manner whereby, for example, if the first five numbers of the substring are the numbers (a, b, c, d, and e), after differencing, the new substring will have the numbers (a, (b-a), (c-b), (d-c), and (e-d)).

c. Then the new differenced substring is added in the same manner as in step a, taking the absolute value of the resultant numbers in the differenced substring to produce a second sum. If the second sum is greater than the first sum (that is, the absolute sum of the numbers achieved through the differencing operation is greater than the absolute sum of the original numbers), then no improvement can be achieved by differencing and, accordingly, the original substring will be passed out of the differencing stage of the program without any differencing operation being completed thereon. If the absolute sum of the differenced substring in accordance with step b is less than the absolute sum determined in step a, then there has been some betterment by the differencing technique and a determination will then be made as to whether the substring can be even further improved by a second differencing step.

d. A second differencing step similar to step b is then effected on the resultant substring of step b to achieve a new string which will be (a, [(b-a)-a], [(c-b)-(b-a)], [(d-c)-(c-b)], [(e-d)-(d-c)]). The absolute sum of the numbers in this new substring is then determined and, if this absolute sum is greater than the absolute sum of the substring in step b, then the substring in step b is continued into the next step in the program. If the sum in step d is less than the sum in step b, then the differencing technique is continued until, in the last differencing step, the absolute sum of the new substring is greater than that of the previous differencing step. The step which produces a substring having the lowest absolute sum of the numbers therein is the substring which will be processed through the remainder of the program in NUPAKC. A record is kept of the number of differencing steps achieved in the program stage 184 and this number is recorded in the variable that maintains the status of the substring (SOS), which will later be placed at the head of the substring as information regarding the manner in which the substring can be decompressed.
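Steps a through d above can be sketched in Python as follows. The function names are hypothetical, and stopping when the new absolute sum is equal to (rather than strictly greater than) the previous one is an assumption, since the text only describes the "greater" and "less" cases.

```python
def difference_once(words):
    """Keep the first word and replace every other word by its difference from its predecessor."""
    if not words:
        return []
    return [words[0]] + [b - a for a, b in zip(words, words[1:])]


def abs_sum(words):
    """Sum of absolute values, as used in steps a and c."""
    return sum(abs(w) for w in words)


def difference_best(words):
    """Return (number of differencing steps, substring with the lowest absolute sum)."""
    best, count = list(words), 0
    while True:
        candidate = difference_once(best)
        if abs_sum(candidate) >= abs_sum(best):
            return count, best            # no further improvement: keep the previous result
        best, count = candidate, count + 1
```

For the substring (100, 102, 104, 106, 108) the first differencing step lowers the absolute sum from 520 to 108, while a second step would raise it to 198, so the sketch keeps one step and reports a count of 1. That count plays the role of the number recorded in SOS.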

After completion of the differencing technique at stage 184, the program continues with the sequencing operation described in FIG. 5 with respect to step b, at stage 186. After completing the sequencing operation, one has a string of information in the form of numbers, each in four byte words, with the last five bits of each four byte word giving the number of significant bits in the preceding 27 bits. Additionally, there is information at the head of the string giving (a) the number of sequencing operations which have been performed on the string and (b) the addresses of the information which has been sequenced along the string. The purpose of the packing stage 188 is to take the string of information and compress it into its optimal form by the use of a packing technique which will be described as follows:

a. The information is basically placed into sequential double words.

b. In each double word, the first eight bits set forth how many numbers follow in the next fifty-six bits of information. For example, if a string of information includes numbers whose largest number requires only five bits of significant information, then it is possible to place eleven numbers in the 56 bits following the eight bits of control information at the head of the double word. That is, after the control information indicates that there are eleven numbers to follow, each one of the numbers in the succeeding string will be placed in five bit groupings within the double word, leaving one unused bit at the end of the double word. It will thus be understood that considerable compression has been achieved, as 11 numbers would normally have taken up 44 bytes, whereas, by this technique, it has been possible to compress them into eight bytes. From this limited point of view there has been a saving of 36 bytes.

c. The first double word in the substring is different from all of the other double words in that it has, in its first eight bits, the number of words which are packed into the last 48 bits in the first double word. The second eight bits in the first double word provide the number of sequences which were found in the substring at stage 186. This leaves only 48 bits in the first double word. It will be understood that if there have been sequencing operations on the substring, then, after the information relating to the number of sequencing operations, there are at the head of the substring the addresses where each one of these sequencing operations took place. It is thus possible to determine where the addresses for the sequencing operation begin (after the second eight bits in the first double word) and where they end (after the number of sequence addresses set forth in the second eight bits in the first double word). After all the addresses have been completed, then the remainder of the substring begins.

It should be understood that within each double word only that group of numbers which can be fit into the 56 bits following the control byte will be included within the 56 bits. For example, if the largest number of a group of successive numbers requires seven bits of significant information, then there will be eight numbers within the 56 bits, each in seven bit segments, and the control number will be eight. In this manner, maximum packing will be achieved for a particular substring which is being operated on by NUPAKC. This continues until all of the substrings which are being operated on by NUPAKC have completed their passage through NUPAKC. When this is completed, the information relating to each substring is placed at the head of the string sequentially and additional information is placed at the head of the string relating to the number of bytes in the now condensed string, and the number of bytes in the string prior to entering COPAK, which operates as a check for the operation after decompression of this string.
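The double word layout described in items a and b above can be sketched in Python. This is a simplified illustration under stated assumptions: the names are hypothetical, values are taken to be non-negative, the special first double word (with its 48 bit payload and sequence addresses) is ignored, and the field width is taken as 56 divided by the count with the remainder discarded, which matches the division described later for NUPAKD.

```python
PAYLOAD_BITS = 56   # bits after the eight bit control count


def pack_double_words(numbers):
    """Greedily pack non-negative integers into 64 bit double words."""
    if any(n < 0 or n >= 2 ** 27 for n in numbers):
        raise ValueError("values must be non-negative and below 2**27")
    out, i = [], 0
    while i < len(numbers):
        k = 1
        # Grow the group while every member still fits in 56 // (k + 1) bits.
        while (i + k < len(numbers) and k < PAYLOAD_BITS and
               max(n.bit_length() for n in numbers[i:i + k + 1])
               <= PAYLOAD_BITS // (k + 1)):
            k += 1
        width = PAYLOAD_BITS // k
        word = k                                    # control count in the top eight bits
        for n in numbers[i:i + k]:
            word = (word << width) | n
        word <<= PAYLOAD_BITS - k * width           # unused bits stay at the end
        out.append(word)
        i += k
    return out


def unpack_double_words(double_words):
    """Inverse of pack_double_words."""
    out = []
    for word in double_words:
        k = word >> PAYLOAD_BITS
        width = PAYLOAD_BITS // k
        payload = word & ((1 << PAYLOAD_BITS) - 1)
        for slot in range(k):
            shift = PAYLOAD_BITS - (slot + 1) * width
            out.append((payload >> shift) & ((1 << width) - 1))
    return out
```

Eleven numbers of at most five bits end up in a single double word with a control count of 11, and eight seven bit numbers fill the 56 bits exactly, which agrees with the two examples given in the text.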

NUPAKD

The input information to the decompressor NUPAKD, best shown in FIG. 8, is of the type wherein the head of the string has certain control information which has been placed in front of the compressed string immediately subsequent to the completion of the numeric compression in NUPAKC. The input control information has at the head thereof four bytes which are designated as JIR, the number of bytes in the original segment. After JIR, the next four bytes are designated PARM. The first three bytes of PARM are the number of bytes in the compressed segment, with the last byte indicating the number of substrings in the string. It should be noted that no string contains more than twenty substrings and, therefore, this information can be placed in one byte.

After PARM, comes the first status-of-substring information (SOS). The SOS four bytes of information contains, in the first three bytes, the number of bytes in the compressed substring. The next four bits contain the format code, which indicates the original input format type of X, A, I, E or F for the information. The format code and its meaning with respect to the type of compression in the string is shown in FIG. 9. The last four bits in the SOS four byte word contain the number of differencing procedures which were accomplished on the substring when passing through NUPAKC stage 184. After the SOS four byte word, there comes a four byte word indicated by the term CHECK. CHECK is the last four bytes in the substring which should be reproduced upon decompression. Thus, after decompression, it will be possible to compare CHECK with the last four bytes of the decompressed substring to determine whether there has been an accurate decompression of the substring.

After the CHECK four byte word, the next substring has its four byte words of SOS and CHECK as indicated. If the second substring had passed through the NUPAKC compressor utilizing truncation from floating point to integer form, it would be necessary to add two additional four byte words relating to said truncation. These two four byte words of information are BWX, the explanation of which has been discussed with respect to FIG. 4 and stage 196, and YM, which is discussed with respect to FIG. 4 and stage 220. The BWX and YM four byte words are only added if SOS indicates that there is an E or F type format code, indicating that truncation by the bin procedure or truncation by the logical right shift method was effected on the input information. Where the format type in SOS is neither E nor F, then there will be no words for BWX or YM. It will be understood that as many SOS and CHECK four byte words are added to the head of the processed string as there are substrings in the string.
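The field layout of the PARM and SOS words described above can be illustrated with a pair of Python helpers. The function names are hypothetical, each word is assumed to be an unsigned 32 bit value with the first byte most significant, and the format code is returned as a raw number because its meaning is given only in FIG. 9.

```python
def decode_parm(word):
    """Return (bytes in compressed segment, number of substrings)."""
    return word >> 8, word & 0xFF


def decode_sos(word):
    """Return (bytes in compressed substring, format code, differencing count)."""
    length = word >> 8               # first three bytes
    format_code = (word >> 4) & 0xF  # next four bits
    diff_count = word & 0xF          # last four bits
    return length, format_code, diff_count
```

For instance, the SOS word 0x0001A431 would describe a 420 byte compressed substring with format code 3 and one differencing step.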

All of the above mentioned information placed at the head of the string is removed from the substring and stored for use during the NUPAKD procedure. The first double word which enters NUPAKD contains in its first eight bits the number of words compressed within the last 48 bits of the first double word. The second eight bits of the first double word contains the number of sequencing operations utilized in compressing the substring. In NUPAKD the input first double word is picked up at program stage 262 of FIG. 7A wherein the first double word is loaded into registers IR2 and IR3. The IR2 contains the first four bytes and IR3 contains the second four bytes of the first double word. At program stage 264 immediately following, the position in storage of the substring is incremented by eight bytes to indicate that the first double word is now being decompressed. Additionally, IR7, the register which contains the number of bytes in the condensed substring, is decreased by eight bytes. After this step, the program moves to stage 266 wherein the first eight bits in the double word are extracted from IR2 to provide the number of words in the substring. After extracting the first eight bits, the program continues at stage 268 to compute the number of bits in each word in the remaining portion of the double word. Since the first double word has 48 bits of information, if the number of words in the substring were nine, at location 268, a determination will be made that there are five bits in each segment of the double word which are to be expanded into full words. The program continues next to stage 270 wherein the information in the next double word is shifted to the left eight bits so that the next 56 bits in the double word can be considered. If this is the first double word in the input substring, then, at stage 272, the next eight bits (NOS) are taken from IR2 and that number is stored in RSX. 
If this is the first double word, then the remaining 48 bits are shifted another eight bits to the left to bring the last 48 bits to the head of the double word for operation thereon. If the number NOS is zero, then the program would skip to program stage 274 and operate in the manner which will be discussed below. However, if NOS is greater than zero, indicating that there are some condensed sequences, then the LSX array is used to store the locations, in the substring, of the sequences. In each four byte segment is placed the number from IR2. First, however, the number from IR2 is placed into register IR5 at stage 276 at the right hand end of the register so that, when placed into LSX in four byte segments, the number will appear at the correct position, namely, the right hand end of the four byte segments. All of the above shift and storage into LSX of the information in IR2 and IR3 is accomplished at program stage 278. If, by reason of a review of NOS, it appears that all of the sequence addresses have not been included in the first double word, provision is made through the use of the program stages 282 and 284 to indicate that the second double word must be similarly decompressed. It should be understood that with the second double word, as it enters stage 262 and continues in a manner discussed with respect to the first double word, only the first eight bits would be looked at for current information, namely the number (NOS), and that at stage 268 the number NOS would be divided into 56, the number of remaining bits in the second double word (and each succeeding double word). Thus, after all of the sequencing addresses have been stored in LSX, the program continues at stage 274 to extract, bit group by bit group, each word compressed in the remaining 56 bits in each double word. However, in order to save time in forming the string, it is necessary to first store each 56 words (four byte group) in core storage position DUN1. 
As each 56 words are stored in DUN1, they are transferred, as a group, to a second core storage location YY where they form the partially decompressed string. This is all accomplished in a series of steps herein noted as program step 286 (shown in FIG. 7B). These transfers eliminate the need for continuously shifting all of the partially decompressed substrings to the right as each additional word is added to the substring. By grouping 56 words in DUN1, it is possible to shift this entire amount in one operation to storage location YY in the correct position at the right hand end of the partially decompressed substring. This operation continues until all of the double words have been expanded back into their original form and the complete partially decompressed substring is presented in which all packing has been removed. When this has been completed, the program is at stage 288. At stage 288, the operation for decompressing the condensed sequences is initiated. This programmatic operation is done generally within the steps indicated as stage 290 (in FIG. 7C). If there was no sequencing operation, then, of course, the entire stage of 290 is bypassed and the program would continue at the stage succeeding stage 290 which will be discussed below.

If sequencing has been accomplished, then the register addressing LSX is increased by four bytes at stage 292, and the first address in LSX is computed at stage 294 to determine where in the substring the sequence begins. At that location in the substring, the first word contains the number of times the sequence was repeated and the second word contains the actual number which was repeated. This is determined at stage 296. Thereafter, the program computes the size of the hole which has to be made in the substring in order to insert the repeated sequences.

This "hole" creation is accomplished at program stage 298. This program stage also creates the hole in the substring so as to allow the data to be inserted into the substring at the proper location. In the next sequence of steps in the program, the numbers to be inserted are regenerated and inserted into the substring at the proper address, and the new length for the string is computed for storage in counter JII. Finally, at substage 302, the number NOS is reduced by one and, if it is greater than zero, this indicates that there are additional sequencing addresses in LSX and the operation continues again at stage 288. If NOS is now zero, that means that all of the sequencing operations have been completed and all the sequencing numbers have been decompressed and the operation is ready to continue at stage 304. At stage 304, the differencing operation is reversed for the purposes of further expanding the substring. This is done by continuing to stage 306 wherein the first word is added to the second word. The newly formed second word is added to the third word, the newly formed third word is added to the fourth word, etc., down the substring until the end of the substring, thus reversing the differencing operation in NUPAKC and decompressing one complete differencing operation. If more than one differencing operation is required, at stage 308, a determination will be made that an additional differencing operation was accomplished on the compressed information and, accordingly, the steps at program stage 306 will be repeated until all of the differencing steps have been reversed, returning the compressed information to its original form prior to differencing. When this has been completed, NDR will be equal to zero and the program will continue at stage 310. At stage 310, the truncation process is reversed. If there was no truncation, then nothing happens at stage 310. 
If the right shift truncation was used during compression, the words will be shifted to the left five bits thus reversing the right truncation. If the bin method was used, then this too is reversed by multiplying BWX times each of the numbers in the substring and adding to each of the numbers in the substring the minimum YM which has been placed in storage. BWX is, of course, also placed in storage from information which was at the head of the substring prior to its application to NUPAKD. After completion of stage 310, the program continues, finally, to stage 312 wherein SOS is reconstructed in accordance with the new string, specifically adding the new JII and, further checking the new string against CHECK(1) and CHECK(2) and any other information which has been stored indicating the original information such as the original length of the string prior to compression and decompression. The information has thus been compressed in NUPAKC and decompressed in NUPAKD and is ready for whatever uses are desired by the user.
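The reversal of differencing at stages 304 through 308 amounts to a running sum applied once per recorded differencing step, and the reversal of the right shift truncation is a five bit left shift. A minimal Python sketch follows; the function names are hypothetical.

```python
from itertools import accumulate


def undo_differencing(words, times):
    """Apply a running sum once for each differencing step recorded in SOS."""
    for _ in range(times):
        words = list(accumulate(words))   # each new word is added to the next one
    return words


def undo_right_shift(words):
    """Reverse the five bit logical right shift; the dropped low bits return as zeros."""
    return [w << 5 for w in words]
```

Applying the running sum twice to (100, -98, 0, 0, 0) first yields (100, 2, 2, 2, 2) and then (100, 102, 104, 106, 108).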

CONTROL PROGRAMS

The CONTROL routine can be viewed as a supervisory program that serves as a buffer between the O/S system of the IBM 360 and the SOLID System. It is coded in the "higher language" (ALLOCATE) that has evolved from the open-ended two-part design. The CONTROL routine performs the following functions:

a. During assembly, CONTROL positions those components of the SOLID System that are compiled in the main system with the SUBMP service macro.

b. During execution, CONTROL calls the components when they are needed.

c. Special termination procedures, which are designed to protect the AUXILIARY FILE, are executed in the CONTROL routine before the O/S system terminates the job in the normal way.

By changing the CONTROL routine and the SUBMP service macro, a user can easily alter the SOLID System to perform a specific task like data compression. The SOLID System can be tailored to fit a particular 360 configuration by altering the planned overlays. A step-by-step description of the implementation procedure follows:

Step 1

Code the CONTROL routine in ALLOCATE. If some components are not going to be used, the SUBMP service macro must be altered to include the dummy entry points for the omitted components. Also, the name of the component must be deleted from the overlay structure. For example, if the component SSEARCH is not being used, SUBMP will contain the two statements:

DUMADD PMARRAYR SEARCH

and SEARCH must be deleted from the planned overlay.

Step 2

Determine the amount of core storage that is available.

Step 3

Figure the amount of storage that is needed for the components and the CONTROL routine.

Step 4

Select values for the fourteen variable parameters, then compile the components and store them in the load module library, SOLID.LOAD. If the retrieval package is being used, the size of the memory block (defined by &NTRKS, &TRKL and &LTHAYY) should be as large as possible. A minimum of 22,000 bytes is needed for the CONTROL routine plus the largest component. Because frequent accesses to the load module library (SOLID.LOAD) are costly, it is suggested that the planned overlay structure also be considered at this time.

Step 5

Construct the planned overlay structure so that the storage allocated for the programs will be fully and efficiently utilized. Separately compile the 31 components and store them in the load module library (SOLID.LOAD). Assemble the CONTROL routine.

The overlay structures are discussed next, then the CONTROL routines for data-compression, data-transmission and the SOLID System are given.

A. Overlay Structure

Here the planned overlays for the IBM 360/40 (128K) and IBM 360/67 (768K) are given as an example. For details of the overlay technique the reader is directed to the IBM 360 Link Editor Manual.

i. IBM 360/40

The storage (in bytes) needed for each component is given in parentheses. Double buffers were assigned for the four tape DCB's. The single buffer for the disk DCB occupied 3600 bytes. A memory block was defined as ten (=&NTRKS) logical records (length=&TRKL=3600). The two principal data arrays, &ARRAY and YY, had lengths of 1500 and 38000 bytes respectively. The storage figures given in parentheses below are approximate. This overlay arrangement requires about 27000 bytes of core storage, exclusive of the storage needed for the data arrays. ##SPC10##

ii. IBM 360/67

On the 768K IBM 360/67 the 30 components after OPENSHUT were on a single branch of the overlay. Two buffers were used for each tape DCB. About 84,000 bytes of core are needed for this arrangement. All 31 components can also be compiled with the CONTROL routine and positioned by the "SUBMP" type macro instruction.

B. CONTROL for the SOLID System

The program given for the CONTROL System (SOLIDO) requires both the SOLID.MACLIB and SOLID.LOAD libraries. Normally the CONTROL routine would be compiled separately and stored in the load module library, SOLID.LOAD.

Before the CONTROL routine is compiled, the variable parameters associated with the RESERVE and SUBMP macro instructions must be selected. Normally this would be done before the components are separately compiled and stored in the load module library, SOLID.LOAD. All variable parameters have been defined.

In the particular example described in this specification, the CONTROL routine accomplishes the following:

Storage:

An information path is traced in the AUXILIARY FILE with a JOBLIST item stored on a tape with the DCB named TAPEJB and a record number is assigned to the RFILE. The bulk referenced information (stored on the tape with DCB TAPEIND) is compressed and written on a tape with the DCB named TAPEOTC.

Retrieval:

The JOBLIST item stored on the tape with the DCB named TAPEJB is used to trace an information path in the AUXILIARY FILE to the bulk storage address in RFILE. This address, which is the number of a logical record on a tape read with the DCB named TAPEINC, is used to retrieve the compressed referenced information. This referenced information is decompressed and then appears on the device designated by OUTPXT. An unsuccessful search is terminated with an appropriate message.

The translated JOBLIST items can be arranged sequentially on the tape with the DCB named TAPEJB. The bulk referenced information, which is to be stored in compressed form, is located on the tape with DCB named TAPEIND. With OUTPXT=0 the compressed referenced information can be stored on the tape with DCB named TAPEOTC.

When this CONTROL routine is used on a production basis, the following steps are taken:

The compressed referenced information is stored on a bulk storage device such as data cells, disks or tapes. This might involve setting the tape-read (READT) and tape-write (WRITE) macro instructions and the first BULK address in the initialization macro MJARRAY, which is executed in SSTATECL.

The variable parameter &LTHAYY is set equal to the size of a memory block plus 2000 (for the M and J arrays) plus 2 1/2 times the maximum string length (LIT = SLENGTH + LLENGTH).

For example, if strings are to be less than 2,000 bytes and a memory block is to have 100 logical records, then: &NTRKS = 100 (the memory block is 100 × 7294 = 729,400 bytes); LIT = 5000 (i.e., greater than twice the maximum string length); &LTHAYY = 729,400 + 5,000 + 2,000 = 736,400, or about 740,000 after rounding up. The 2,000 is the number of bytes needed to store the first arrays associated with the prime index M and screen J.
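The sizing rule for &LTHAYY can be checked with a short Python sketch. The function and argument names are hypothetical and are not part of the SOLID System; the 2,000-byte allowance for the M and J arrays and the factor of 2 1/2 are taken from the text above. With the figures of the example the literal sum is 736,400 bytes, slightly below the 740,000 quoted in the example, which appears to be a rounded-up value.

```python
def lthayy_estimate(n_records: int,
                    record_length: int,
                    max_string_length: int,
                    index_bytes: int = 2000) -> int:
    """Estimate the principal array length &LTHAYY in bytes.

    memory block = n_records * record_length
    LIT          = 2 1/2 times the maximum string length (rounded up)
    index_bytes  = storage for the M and J arrays
    """
    memory_block = n_records * record_length
    lit = (5 * max_string_length + 1) // 2  # 2.5x, rounded up to a whole byte
    return memory_block + lit + index_bytes


# Usage with the figures of the example: 100 records of 7294 bytes,
# strings of at most 2,000 bytes.
required = lthayy_estimate(100, 7294, 2000)
```

In practice the parameter would be set to a convenient value at or above this estimate.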

The flow diagram for the CONTROL package SOLIDE is shown in FIG. 25. SOLIDE is the extended form of the CONTROL package, in which no overlays are required. If overlays were required, the stage 1020 shown in FIG. 25 calling for the macro-instruction SUBME would have substituted therefor the macro-instruction SUBMO with the numbers 100,000; 500; 500; 500; and 1500. However, for simplicity of description, the CONTROL package in its extended form will be fully discussed, with only the program at the end of the specification being utilized to show the operation with overlays.

In the CONTROL package SOLIDE, first, the RESERVE macro instruction is executed at stage 1022. The RESERVE macro-instruction defines the storage areas, the registers and the tapes, and initializes the SOLID System. The principal array YY, the override arrays, and the two arrays (JBLIST and JB1) are defined in the macro instruction SUBME at stage 1020. The system parameter &ADDL, which is the composite address length, is set, for example, at six bytes; the principal array length parameter &LTHAYY is set at 100,000 bytes; the number of PCORDS in the PCORD TABLE that are to be used in the fast mode of SANPAKC is set to 5 (&TPCORD); and the length of the two JOBLIST arrays JBLIST and JBWORK is set equal to 1500 bytes (&LJBLIST). After these system parameters are set, control goes to the next stage, 1024, wherein the macro instruction DEVICE is executed. The DEVICE macro sets up seven device commands which tell the system where to find information that is to be compressed and what to do with information after it has been compressed. These seven device commands are as follows: INPXT (tells what type of device information is coming in on); OUTPXT (tells the system what device the information should be put on after compression or decompression); RSKIPS (a tape command which tells how far to skip out on the tape before beginning); SLENGTH (the minimum number of bytes per segment of information, used as a compressor command); LLENGTH (the number of bytes in the LABEL that is not to be compressed at the beginning of each segment); RNOS (the number of strings or segments that have to be processed by the compressor before a new set of device commands is read); and TPCORD (the number of PCORDS in the PCORD TABLE that are to be used by SANPAKC in the fast mode). It should be noted that if TPCORD is not entered in the DEVICE command at stage 1024, the value of TPCORD falls back to the value &TPCORD set at stage 1022.
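The seven device commands and the fallback rule for TPCORD can be pictured as a small record type. The following Python sketch is illustrative only: the class, its field names and the constant SYSTEM_TPCORD are hypothetical stand-ins for the DEVICE macro commands and the system parameter &TPCORD, and the numeric coding of the device types is not specified here.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for the system parameter &TPCORD (5 in the example above).
SYSTEM_TPCORD = 5


@dataclass
class DeviceCommands:
    """One set of device commands as read by the DEVICE macro."""
    inpxt: int      # type of device the information comes in on
    outpxt: int     # device that receives the processed information
    rskips: int     # how far to skip on the tape before beginning
    slength: int    # minimum number of bytes per segment
    llength: int    # bytes of uncompressed LABEL at the head of each segment
    rnos: int       # strings processed before new device commands are read
    tpcord: Optional[int] = None  # PCORDS used in the fast mode, if entered

    def effective_tpcord(self, system_default: int = SYSTEM_TPCORD) -> int:
        """Return TPCORD, falling back to the system value when not entered."""
        return system_default if self.tpcord is None else self.tpcord


# Usage: TPCORD omitted, so the system value applies.
commands = DeviceCommands(inpxt=0, outpxt=0, rskips=0,
                          slength=2000, llength=80, rnos=10)
pcords_in_fast_mode = commands.effective_tpcord()
```

Keeping the fallback in one place means that a new set of device commands can omit TPCORD without disturbing the value chosen when the system was assembled.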

After completion of the DEVICE macro at stage 1024, control goes to stage 1026 where the macro-instruction STRING is executed. The STRING macro reads the five string commands that define the status of the strings or segments of information. These five commands are MODE (tells whether the string or segment is to be compressed or decompressed or to be used to update the system); POSTOP (tells the system what to do after it has completed the current job, i.e., to get out of the system, to read a new set of device commands, or to read a new set of string commands); LEXCON (indicates whether the PCORD TABLE must be read or not read); LEXMODE (indicates whether the system is operating in the fast mode or the slow mode or simply extending the PCORD TABLE); and LEXPACH (indicates whether the PCORD TABLE should be punched or not punched after the current string or segment has been compressed or decompressed).

After completion of stage 1026, the program continues to stage 1028 where the macro instruction GETJLIST is executed. The GETJLIST macro performs all of the instructions relating to the fetching and translation of the descriptor sets to the JOBLIST form. There are eleven instructions associated with the GETJLIST macro. These instructions are as follows: JLINPXT (designates the imprint input/output device); JLRSKIP (a tape command which indicates the number of records that are to be skipped before the first record is read); JLTRAN (designates the translator that is to be used); JLNORM (designates whether normalization is to be executed or not); KLENGTH (designates the number of bytes per kernel in the JOBLIST item); NJOBS (designates the number of bulk items to be stored for each information path); NTASKS (designates the number of items in the JOBLIST); and the last four items, NVALUE, JVALUE, NUMDIAG and GENERATE, which are special instructions associated with the Monte Carlo generator for generating random JOBLIST items. They are read only when JLINPXT equals 16, which indicates that the Monte Carlo generator is to be used. When random generation of JOBLIST items is being effected, NVALUE is the value of M; JVALUE is the maximum value of any J; NUMDIAG is the maximum number of diagonals or screens to be generated; and GENERATE is the location of the random number that is to be used by the Monte Carlo generator to generate the JOBLIST item. Monte Carlo or random generators are used to debug the system, to determine the limits of the system, and to determine the economics of its operation.

Control then goes to stage 1030 wherein the macro instruction CALL2 is executed. The CALL2 macro first executes the SSEARCH component, as discussed previously, then the component SRESULT is executed. SRESULT prints intermediate results of the search. In a continuing production system, it may be undesirable to utilize the SRESULT component at all and, accordingly, the CALL2 macro instruction may have substituted therefor a macro instruction CALL1 which would call only the SEARCH procedure. After the CALL1 or CALL2 instruction, control goes to the location ANSWER. Thereafter, at stage 1032, the instruction DISPENSE is executed. In DISPENSE the determination is made as to whether control should pass through the compressor, back to stage 1030, back to stage 1028, back to stage 1026, or back to stage 1024. The other option is, of course, to leave the machine because the day's operations have been completed.

At some point, after completion of stage 1032, the program would continue to stage 1034 wherein another CALL1 instruction would be effected to pass control to the SREADC component which reads the substring command, and the bulk information, if it is on cards. Then control goes to stage 1036 which is a decision box. At stage 1036, determination is made as to whether the INPXT command is zero or not zero. If the INPXT command is zero, then control goes to stage 1038 wherein a CALL1 instruction is used to pass control to SREADT component which reads the sub-strings of information from magnetic tape. When INPXT is zero, this means that the bulk information is on tape. If INPXT is not zero, then the control passes directly to stage 1040 wherein RSKIPS is set to zero.

RSKIPS is normally used for the tape read-out and, since INPXT is not zero, this means that the bulk information is not on tape and, therefore, there is no need to have any value of RSKIPS. If the information had been on tape, and had been read out at stage 1038, RSKIPS would have been reset to zero so that, at a later stage, it would be reset to a new value in the macro DISPENSE at 1032.
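The input handling of stages 1034 through 1040 can be summarized in a short Python sketch. This is a simplified model rather than the CONTROL routine: the two reader callables are hypothetical stand-ins for the SREADC and SREADT components, and the card reader is assumed to return the substring command together with the bulk information, or None when the bulk information is not on cards.

```python
from typing import Callable, Optional, Tuple


def read_bulk_input(
    inpxt: int,
    rskips: int,
    read_cards: Callable[[], Tuple[str, Optional[bytes]]],
    read_tape: Callable[[int], bytes],
) -> Tuple[str, Optional[bytes], int]:
    """Model of stages 1034 to 1040; returns (command, data, new RSKIPS)."""
    # Stage 1034: SREADC reads the substring command and any card data.
    command, data = read_cards()

    # Stage 1036: decision on INPXT. Zero means the bulk data is on tape.
    if inpxt == 0:
        # Stage 1038: SREADT reads the substrings from magnetic tape,
        # skipping out on the tape as directed by RSKIPS.
        data = read_tape(rskips)

    # Stage 1040 (and the end of stage 1038): RSKIPS is reset to zero so
    # that DISPENSE can assign a new value later if one is needed.
    return command, data, 0


# Usage with stub readers standing in for the real components.
skipped = []
command, data, rskips = read_bulk_input(
    inpxt=0,
    rskips=3,
    read_cards=lambda: ("CMD", None),
    read_tape=lambda n: skipped.append(n) or b"tape-data",
)
```

On either path RSKIPS leaves this step equal to zero, which matches the description above.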

After completion of stage 1040, control passes to stage 1042, which is the COPAK macro discussed previously. After completion of the COPAK macro at stage 1042, control passes to stage 1044 wherein the macro instruction CALL1 is used to call the SOUTPUT component. SOUTPUT disposes of the information after compression or decompression in COPAK in accordance with the OUTPXT command set at stage 1024. After completion of SOUTPUT, control can pass back to stage 1032, 1030, 1028, 1026 or 1024 or, alternatively, can pass out of the system. After stage 1044, the last stage of the program, SUBME at stage 1020, positions all the components correctly at compilation time. The fourteen system parameters discussed previously are defined at stage 1020.

CONTROL PROGRAM COPAKCO

If the COPAK compressor program were to stand alone without relation to the SOLID System, then a separate control program would be required for COPAK. This has been defined as COPAKCO. The control program flow diagram for COPAKCO is shown in FIG. 26.

The control program for COPAKCO is substantially similar to the control program for SOLIDE, except that unnecessary macro-instructions relating strictly to the SOLID System have been eliminated and new or changed macro instructions have been substituted therefor. Macro instructions which are similar to the macro instructions shown in FIG. 25 have been shown in FIG. 26 with primed numerals. The control program in FIG. 26 operates in substantially the same manner as the control program of FIG. 25.

In FIG. 26, the first stage of the flow diagram for COPAKCO is stage 1046 wherein the macro instruction RESERCO is executed. In this instruction, only three system parameters are set, namely &LTHAYY (the length of the principal data array); &TPCORD (the number of PCORDS in the PCORD TABLE used in the fast mode); and &LJBLIST (the length of the job list array). For purposes of example, in FIG. 26 &LTHAYY has been set at 20,0000 bytes, &TPCORD has been set at 5 and &LJBLIST has been set at 1,500 bytes. After setting these system parameters, the program continues through stages 1024' and 1026' to stage 1048 wherein the macro instruction DISPOSE is executed. DISPOSE performs the same functions as were performed by DISPENSE at stage 1032, except that those procedures relating to the search operation have been omitted. After completion of stage 1048, the program continues to stages 1034', 1036', 1038', 1040', 1042' and 1044' in the same manner as was discussed with respect to FIG. 25. Then, at stage 1050, the macro command SUBCE is performed, which positions all those components needed for compression or decompression at compilation time. SUBCE is used if COPAKCO is to be used in the extended form. If the CONTROL routine is to be executed in the overlay form, then the instruction SUBCO should be used in place of SUBCE.

CONTROL PROGRAM COPAKAN

If the alphanumeric compressor and decompressor is to be used as a stand-alone program, then a separate CONTROL program should be used. This CONTROL program is shown in FIG. 27 and it is named COPAKAN. Similar programmatic steps shown in FIGS. 25 and 26 have been shown by either ' or " numerals in FIG. 27 to indicate that there is no difference between these programmatic steps and the steps used in FIG. 27.

In FIG. 27, the program continues as in FIG. 26 through stages 1046', 1024", 1026", 1048', 1034", 1036", 1038", 1040", to stage 1052. At stage 1052, the macro-instruction COPAJ is effected. COPAJ is a special macro-instruction which, in effect, is COPAK without the numeric compressor-decompressor component SNUPAK. After completion of stage 1052, the program continues to stage 1044". Stage 1054 includes the instruction SUBCJ which positions all of the components necessary for COPAKAN. It should be noted that the instruction SUBCJ is for use in the extended form. If operation is in the overlay form, then the macro instruction SUBCJO is substituted for the instruction SUBCJ. Note that for both COPAKCO and COPAKAN and, additionally, for COPAKNU, to be discussed hereinafter, only three system parameters are needed, namely &LTHAYY, &TPCORD, and &LJBLIST.

CONTROL PROGRAM COPAKNU

If the numeric compressor and decompressor SNUPAK is operated as a stand-alone program without the alphanumeric compressor SANPAK, then a special control program for the macro SNUPAK must be used. This is shown in FIG. 28 and is defined as COPAKNU. Similar programmatic steps shown in FIGS. 25, 26, and 27 have been indicated with primed numerals to indicate similar instructions. In COPAKNU shown in FIG. 28, the program again starts at stage 1046" and continues through stage 1024'" to stage 1056. At stage 1056, the macro instruction STRING is effected, but only the first three string commands MODE, POSTOP and LEXCON are read, as the remaining instructions discussed with respect to FIGS. 26 and 27 relate to alphanumeric compression and, therefore, are not necessary.

After completion of stage 1056, the program continues through stages 1048", 1034'", 1036'", 1038'", 1040'", to stage 1058 wherein the macro instruction COPAB is performed. COPAB is a variation of COPAK without alphanumeric compression or decompression. After completion of stage 1058, the program continues to stage 1044'". Stage 1060 provides the macro-instruction SUBCB which positions all of the components of COPAKNU at the time of compilation. Again, SUBCB is the macro instruction in the extended form; if the system is operating in the overlay form, then a special instruction SUBCBO must be substituted for the instruction SUBCB.

It will be appreciated that all of the functions shown in block diagram form in the drawings are implemented by a digital program. The digital program listing in accordance with this invention will now be given in sufficient detail to enable those skilled in the art to carry it out. This routine is written in IBM BAL language and the program can be carried out by a number of suitable digital processing systems. As one example of a digital system on which this program has been performed, reference is made to the IBM Computer 360/40. The program is as follows:

* * * * *