3.1 Overview

3.1.1 Introduction

The next highest level of the Toolkit above the SMF tools includes the Process Control (PC) tools. Their purpose is to provide a direct interface between the science software and the rest of the SDPS, including accessing file attributes (data about files), physical filenames (for use by HDF functions), and other functions. These tools are used internally by many Toolkit functions, such as Generic I/O, Ancillary Access, and other tools.

There are two sets of PC tools: the Command tools, which are callable from Unix shell scripts, and the API tools, callable from C and Fortran. Much of the functionality is duplicated between these two groups; many of the Command tools are simple wrappers on the C code of the API tools, with some exceptions.

The Process Control File (PCF) is central to the PC tools. At the SCF, you construct a PCF using a text editor, one for each PGE. These PCFs are part of the delivery of your software to the DAAC. Your software will access files by logical identifiers (essentially integers, defined by mnemonics). The PCF maps these logical identifiers to physical references (currently physical file names and directories). Each logical identifier corresponds to one or more physical references, or versions. At the SCF, you can use any physical reference you like. In the production environment, the physical reference is supplied by the DAAC. Details are given below.

In this overview section, we walk you through the procedure of constructing your own Process Control File step-by-step, then explain the workings of the pccheck utility, which checks the format of this file.

The PCF is read by most of the PC tools (directly or indirectly), and is the current mechanism by which the Toolkit interfaces with the rest of the SDPS. The mechanism may change in the future, but the interface to your code will not.
3.1.2 Constructing your Process Control file

This section explains how to customize a Process Control File for use in your code. A default Process Control File (PCF) is included in the TK5.1.1 delivery. It contains entries which are either required or optional for use of many Toolkit functions. This file is named $PGSRUN/PCF.relA.template.

The particular example we use here is from the Pathfinder AVHRR/Land Toolkit Prototype study. The complete example file appears in Appendix B of this document.

It is recommended that you start with the same (customized) copy of the PCF each time you run at the SCF, especially if you are using temporary files in your processing. You don't want previous temporary file references in the PCF, since these files are deleted by the system (unless you are not using PGS_PC_Shell.sh or PGS_PC_TermCom).

The Unix environment variable $PGS_PC_INFO_FILE must point to your Process Control file in order for the Toolkit to work at all.

We go through the example file section-by-section. The sections of a Process Control file include:

    SYSTEM RUNTIME PARAMETERS
    PRODUCT INPUT
    PRODUCT OUTPUT
    SUPPORT INPUT
    SUPPORT OUTPUT
    USER-DEFINED RUNTIME PARAMETERS
    INTERMEDIATE INPUT
    INTERMEDIATE OUTPUT
    TEMPORARY IO
All sections of the PCF, except the SYSTEM RUNTIME PARAMETERS and USER-DEFINED RUNTIME PARAMETERS sections, consist of names, locations and other data about physical files. Each of these sections has a default file location, which is at the beginning of the section. The default file location is delimited by a '!' in column one of the PCF. This location points to the default directory in which these files are stored. This may be overridden for individual files, by inserting the fully-qualified physical directory path, as explained below.

The PRODUCT INPUT section provides a detailed example of considerations that apply to all sections that involve files. Explanations of other sections provide only differences unique to those sections.

General considerations:

    # Process Control File: Pathfinder AVHRR/Land Toolkit
    # Prototype
    #
    # Env variable PGS_PC_INFO_FILE must point to this file

Comments in a PCF are any lines that begin in the first column with "#".

    ? SYSTEM RUNTIME PARAMETERS

The "?" symbol in the first column defines this line as the subject of the section. These nine subject names must not be changed nor deleted from the PCF.

Blank lines are not allowed.
Pipe character "|" must be used to delimit fields.
The exclamation point "!" must be used to designate the default file location. This must appear before any file entries in each section of the PCF.
The entire length of any line in the PCF may not exceed 1000 characters.
Different sections of the PCF have different numbers of required and optional fields for each entry. In the examples below, each entry is identified as required or optional.

3.1.2.1 SYSTEM RUNTIME PARAMETERS

    ? SYSTEM RUNTIME PARAMETERS
    # ----------------------------------------------------------
    # Production Run ID - unique production run identifier
    # ----------------------------------------------------------
    1

This string identifies the particular run of your algorithm at the SDPS. This field is required, and may be up to 200 characters.
It cannot be the string "0".

    # ----------------------------------------------------------
    # Software ID - unique software configuration identifier
    # ----------------------------------------------------------
    1

This string identifies the particular software of which your PGE consists. This field is required, and may be up to 200 characters. It cannot be the string "0".

In the production system, both of these fields are written into the PCF by the SDP Planning and Scheduling sub-system. At the SCF, you may use any string you like. Note that the 'Production Run Id' value is used in the naming of Temporary and Intermediate files.

Currently these are the only two fields allowed in this section. DAAC and hardware identification are being considered as additions to the configuration data, for future deliveries of the Toolkit.

3.1.2.2 PRODUCT INPUT

This section is for primary data files used as input to create standard products. This includes such files as ancillary data, Level 0 data, and standard products output from other PGEs; in general, all of your input files.

    ? PRODUCT INPUT FILES
    # [ next line is for default location ]
    ! ~/runtime

The directory $PGSHOME/runtime is the default location of the files in this section, unless it is overridden for individual files, as explained below. Note that the tilde character "~" is equated to the environment variable PGSHOME. This is true throughout the entire PCF.

This particular default file location, $PGSHOME/runtime, must not be changed, because of the way the Toolkit Ancillary Data Access input files are handled. Default file locations of all other sections of the PCF may be changed to whatever you like.
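The tilde convention described above can be illustrated with a short C sketch. This is not Toolkit code: the helper name expand_pcf_location and its calling sequence are hypothetical, and the value of PGSHOME is passed in as an ordinary string rather than read from the environment. It only shows one plausible way a leading "~" in a default-location entry could be resolved to a full directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a Toolkit function: expand a leading "~" in a
 * PCF default-location string to the directory given in pgshome.
 * Returns 0 on success, -1 if the result does not fit in out. */
static int expand_pcf_location(const char *location, const char *pgshome,
                               char *out, size_t out_size)
{
    int written;

    assert(location != NULL && pgshome != NULL && out != NULL);

    if (location[0] == '~') {
        /* "~/runtime" becomes "<pgshome>/runtime" */
        written = snprintf(out, out_size, "%s%s", pgshome, location + 1);
    } else {
        /* a fully-qualified path is copied unchanged */
        written = snprintf(out, out_size, "%s", location);
    }
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

In a real program the second argument would typically come from getenv("PGSHOME"); the sketch leaves that to the caller so the helper has no hidden dependence on the environment.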
    # ----------------------------------------------------------
    # Pathfinder AVHRR/Land input files
    # ----------------------------------------------------------
    201|87002002709.no9_gac|||||1
    401|goldtopolandsea8.bin|||||1
    402|gridtoms_1987_sngl_ntwk|||||1
    403|ephem8788.dat|||||1
    404|timecorr8788.dat|||||1
    405|SDSannotations.dat|||||1
    406|HDFmetadata.dat|||||1
    410|jan021987.proclog|||||1

    201|87002002709.no9_gac|||||1

The first entry in this section is used as an example; it is the primary input file for Pathfinder AVHRR/Land processing.

    201|87002002709.no9_gac|||||1

Field 1 is the link between your software and this PCF entry, the logical identifier. This identifier should be associated with a mnemonic in your code, at the beginning of the module where you use PGS_IO_Gen_Open to open this file, as shown below. This field is required, and must be an integer, of type PGSt_integer (long) in C, INTEGER in Fortran. Science software may use any positive integer for logical identifiers, except integers in the range 10,000-10,999; these numbers are reserved for the Toolkit.

In C, the form of this is

    #define GAC_FILE 201

In Fortran,

    PARAMETER (GAC_FILE=201)

You then use GAC_FILE as an input parameter to Toolkit function PGS_IO_Gen_Open. Note that while you can use hard-coded numbers in calling sequences, instead of mnemonics (C) or parameters (Fortran), this will make things difficult for integration and test, and also for maintenance; this practice is strongly discouraged.

    201|87002002709.no9_gac|||||1

Field 2 is the file reference, currently the actual physical filename, unqualified (i.e., without directory information). In the future production system, this mechanism may change (for example to a Universal Reference), but this will not affect the science software. This field is required, and is a string of up to 256 characters.

    201|87002002709.no9_gac|||||1

Field 3 is the path name, for overriding the default directory.
In this example, the Toolkit will look for this file in location $PGSHOME/runtime/87002002709.no9_gac. If instead this entry were

    201|87002002709.no9_gac|/fire2/toma/data||||1

then the Toolkit would look for this file in /fire2/toma/data/87002002709.no9_gac. This field is optional, and is a string of up to 100 characters.

    201|87002002709.no9_gac|||||1

Field 4, blank here, is reserved for future use.

    201|87002002709.no9_gac|||||1

Field 5, blank here, is the universal reference. It may contain any string of up to 150 characters. This value may be returned by calling the function PGS_PC_GetUniversalRef.

    201|87002002709.no9_gac|||||1

Field 6, blank here, is the attribute location. It is the name of a file that contains data about the file of Field 2. This file must be in the same directory as the file in Field 2. This field is optional, and is a string of up to 256 characters. For an example of an attribute file, see the descriptions of the PGS_PC_Get*Attr Tools.

    201|87002002709.no9_gac|||||1

Field 7 is the sequence number. It is used if there is more than one physical file associated with the logical identifier of Field 1, which is normally only the case for PRODUCT INPUT and PRODUCT OUTPUT files. The version number, which is used as an argument to Toolkit functions that access different instances of a file, is not the same as the sequence number. The version number is the order in which the files are listed in the PCF, from smallest (1) to largest. For example, if the PCF lists

    201|87002002710.no9_gac|||||2
    201|87002002709.no9_gac|||||1

then file 87002002710.no9_gac is version #1 (sequence #2), and file 87002002709.no9_gac is version #2 (sequence #1). (As you may have noticed, it so happens that the version numbers specified in your code run opposite to the sequence numbers defined in the PCF.)

Field 7 is required for PRODUCT INPUT and PRODUCT OUTPUT files (but is optional for all other sections of the PCF). It must be an integer.
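To make the seven-field layout concrete, the following C sketch splits one file entry on the pipe delimiter while preserving blank fields. It is an illustration only and does not reflect how the Toolkit itself reads the PCF; the type PcfFileEntry, the function parse_pcf_file_entry, and the two size constants are hypothetical names chosen for this example (the buffer size simply follows the longest field length quoted above, 256 characters).

```c
#include <assert.h>
#include <string.h>

#define PCF_FILE_FIELDS 7      /* fields in a file entry, per the text above */
#define PCF_FIELD_MAX   257    /* longest field (256 characters) plus NUL */

/* Hypothetical record type for illustration; not a Toolkit structure.
 * field[0] is Field 1 (logical identifier) ... field[6] is Field 7. */
typedef struct {
    char field[PCF_FILE_FIELDS][PCF_FIELD_MAX];
} PcfFileEntry;

/* Split one file-entry line on '|', keeping empty fields.
 * Returns the number of fields found, or -1 if the line has more than
 * seven fields or a field is too long for the buffer. */
static int parse_pcf_file_entry(const char *line, PcfFileEntry *entry)
{
    int n = 0;
    const char *start = line;

    assert(line != NULL && entry != NULL);
    memset(entry, 0, sizeof *entry);

    for (;;) {
        size_t len = strcspn(start, "|\n");   /* length up to next delimiter */
        if (n >= PCF_FILE_FIELDS || len >= PCF_FIELD_MAX)
            return -1;
        memcpy(entry->field[n], start, len);
        entry->field[n][len] = '\0';
        n++;
        if (start[len] != '|')                /* end of line reached */
            break;
        start += len + 1;                     /* step past the '|' */
    }
    return n;
}
```

strcspn is used rather than strtok because strtok collapses consecutive delimiters, which would silently drop the blank Fields 3 through 6 in an entry such as 201|87002002709.no9_gac|||||1.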
The rest of the entries in the PRODUCT INPUT section of the Pathfinder AVHRR/Land Toolkit Prototype file (Appendix B), in the section labeled "Toolkit product input files", are Toolkit files. These are normally not modified.

3.1.2.3 PRODUCT OUTPUT

This section is for standard product output files.

    ? PRODUCT OUTPUT FILES
    # [ next line is for default location ]
    ! ~/runtime
    #
    # ----------------------------------------------------------
    # Pathfinder AVHRR/Land main output file
    # ----------------------------------------------------------
    301|test11.hdf|||||1

This file is defined in C code as

    #define HDF_FILE 301

or in Fortran code as

    PARAMETER (HDF_FILE=301)

It resides in directory $PGSHOME/runtime. It does not have an attribute file.

This section has the same fields as PRODUCT INPUT.

3.1.2.4 SUPPORT INPUT

This section is primarily for files that are input to Toolkit functions. Ordinarily, you would not modify any entries in this section. An exception to this is the template files used for ancillary files; see the Ancillary Data Access Tools section.

    ? SUPPORT INPUT FILES
    # [ next line is for default location ]
    ! ~/runtime
    #
    # ----------------------------------------------------------
    # Pathfinder AVHRR/Land support input files
    # ----------------------------------------------------------

They reside in directory $PGSHOME/runtime. They have no attribute files.

This section has the same fields as PRODUCT INPUT and PRODUCT OUTPUT, except that Field 7 is not required.

There are no support input files in the Pathfinder AVHRR/Land Toolkit Prototype.

The entries in the SUPPORT INPUT section of the Pathfinder AVHRR/Land Toolkit Prototype file (Appendix B) are Toolkit files, mostly to support the Ancillary Data Access (AA) Tools. You may modify these, as explained in the AA Tools section of this document.

3.1.2.5 SUPPORT OUTPUT

This section is primarily for files that are output from Toolkit functions.

    ? SUPPORT OUTPUT FILES
    # [ next line is for default location ]
    !
    ~/runtime
    #

This section has the same fields as PRODUCT INPUT and PRODUCT OUTPUT, except that Field 7 is not required.

There are no support output files in the Pathfinder AVHRR/Land Toolkit Prototype.

The Toolkit files in this section support the SMF Log files. You may change the names of the files and directories if you want, but not the logical identifier (Field 1).

3.1.2.6 USER-DEFINED RUNTIME PARAMETERS

This section of the PCF is different from the other sections in that it does not contain information about files. Instead, it may be used to obtain other kinds of information from the production environment.

    ? USER DEFINED RUNTIME PARAMETERS
    #
    # ----------------------------------------------------------
    # Pathfinder AVHRR/Land runtime parameters
    # ----------------------------------------------------------
    601|requested_size_x|409
    602|requested_size_y|128
    603|wait_time|3

    601|requested_size_x|409

Field 1 is, as always, the logical identifier. This field is required.

    601|requested_size_x|409

Field 2 is the parameter name. It is an optional text string of up to 200 characters. The Toolkit ignores this field; its intended use is for identification in this PCF, so you may enter whatever you like here. In the Pathfinder AVHRR/Land Toolkit Prototype, the name of the variable in the code is used for this purpose.

    601|requested_size_x|409

Field 3 is the parameter value. This is read into your code as a string of up to 200 characters by the Toolkit (PGS_PC_GetConfigData). Your code is responsible for any necessary conversion, e.g., to integer. This field is required.

Toolkit files in this section support the sending of files and email to remote locations. For an explanation of these entries in the PCF, see the Notes section of the Tool Description for PGS_SMF_SendRuntimeData.

3.1.2.7 INTERMEDIATE INPUT

    ? INTERMEDIATE INPUT
    # [ next line is for default location ]
    !
    ~/runtime
    #

This section and the next are for intermediate files, or files that will exist for longer than a single PGE, but are not standard products. This section is for intermediate input files.

This section has the same fields as PRODUCT INPUT and PRODUCT OUTPUT, except that Field 7 is not required.

The unqualified file name (Field 2) is ordinarily the name of a file that was generated as an INTERMEDIATE OUTPUT file by a previous run of this PGE.

At the SCF, if you are testing successive runs of a PGE which share intermediate files, you need to make sure that the logical identifier is the same in the PCF that you use for all the runs. If you are accessing the intermediate file from a different PGE than the one that created it, you also need to make sure that the mnemonic definitions in your code reference the same logical identifier. You should also copy over the file name if you use a different PCF.

How intermediate files are handled in the production environment, specifically how long they stay around, has not been determined at this writing.

Use function PGS_IO_Gen_Temp_Open (C) or PGS_IO_Gen_Temp_OpenF (Fortran) to open intermediate files.

There are no intermediate files in the Pathfinder AVHRR/Land Toolkit Prototype.

3.1.2.8 INTERMEDIATE OUTPUT

    ? INTERMEDIATE OUTPUT
    # [ next line is for default location ]
    ! ~/runtime
    #

This section is for intermediate output files. Entries for this section of the PCF are created by the Toolkit; you do not need to enter values.

This section has the same fields as PRODUCT INPUT and PRODUCT OUTPUT, except that Field 7 is not required.

The unqualified file name (Field 2) is generated by the Toolkit.

How intermediate files are handled in the production environment, specifically how long they stay around, has not been determined at this writing.

Use function PGS_IO_Gen_Temp_Open (C) or PGS_IO_Gen_Temp_OpenF (Fortran) to open intermediate files.

There are no intermediate files in the Pathfinder AVHRR/Land Toolkit Prototype.
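Returning briefly to the USER-DEFINED RUNTIME PARAMETERS section (3.1.2.6): since parameter values reach your code as character strings, a conversion step is needed before a value such as 409 can be used as a number. The C sketch below shows one cautious way to do this with the standard library. The helper name param_to_long is hypothetical and not part of the Toolkit, and the sketch assumes the string has already been retrieved (for example via PGS_PC_GetConfigData).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not a Toolkit function: convert a runtime parameter
 * value, already retrieved as a string, to a long.
 * Returns 0 on success, -1 if the string is not a complete valid integer. */
static int param_to_long(const char *value, long *result)
{
    char *end = NULL;
    long parsed;

    assert(value != NULL && result != NULL);

    errno = 0;
    parsed = strtol(value, &end, 10);
    if (end == value || *end != '\0' || errno == ERANGE)
        return -1;   /* empty, trailing characters, or out of range */

    *result = parsed;
    return 0;
}
```

strtol is preferred here over atoi because it lets the caller distinguish a genuine value of zero from a malformed entry in the PCF.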
3.1.2.9 TEMPORARY IO

Temporary files are files that exist only for the duration of a single PGE; the production system deletes these files automatically on PGE termination. (You may use the function PGS_IO_Gen_Temp_Delete to do this at the SCF.) Since a single PGE may consist of several of your executables, this section is part of the PCF to enable these files to be passed among these executables.

Entries for this section of the PCF are created by the Toolkit; you do not need to enter values. The unqualified file name (Field 2) is generated by the Toolkit.

    ? TEMPORARY IO
    # [ next line is for default location ]
    ! ~/runtime
    #
    # ----------------------------------------------------------
    # Pathfinder AVHRR/Land temporary file
    # ----------------------------------------------------------
    901|pc1157318822894183312||0|0|0|0

This file is defined in C code as

    #define BINARY_OUTPUT 901

or in Fortran code as

    PARAMETER (BINARY_OUTPUT=901)

It resides in directory $PGSHOME/runtime. It has no attribute file.

If you are sharing this temporary file among executables in the same PGE, then you need to have the same #define or PARAMETER statement in the code for each appropriate executable.

Use function PGS_IO_Gen_Temp_Open (C) or PGS_IO_Gen_Temp_OpenF (Fortran) to open temporary files. Use function PGS_IO_Gen_Temp_Delete to delete files you no longer need within a PGE.

This section has the same fields as PRODUCT INPUT and PRODUCT OUTPUT, except that Field 7 is not required.

3.1.2.10 End of PCF

All PCFs must end with the line

    ? END

Any information after this line is ignored.

3.1.3 Checking your Process Control File

Now that you have created your PCF, you can use it from your software through use of the Toolkit. However, you might want to check it to see if you have entered everything correctly. You can do this by using the pccheck utility, a Unix executable included with the Toolkit. This program is compiled at the time of Toolkit installation, and is located in directory $PGSBIN.
You execute shell script pccheck.sh, which calls executable pctcheck; its source code is $PGSSRC/PC/PGS_PC_Check.c.

To run it on your file mypcfile, on the Unix command line type

    $PGSBIN/pccheck.sh -i mypcfile

If there are any errors in your file, you will see messages of the form

    Error - problem with version number in Standard input file
    Line number: 23
    Line: 401|goldtopolandsea8.bin|||||

In this example the version number was omitted from the STANDARD INPUT file entry.

At the end, you will see a summary of the form

    Check of mypcfile completed
    Errors found: 7
    Warnings found: 0

For this utility, a pccheck error is defined as a PCF entry that will cause a Toolkit PC function to return an error message. A pccheck warning is defined as an incorrect entry that will not cause the Toolkit trouble, but may cause the PGE to operate incorrectly. For example, a blank character in the file name field does not bother a Toolkit PC function, since it simply returns the string as is; pccheck will not return an error. But a blank character will certainly cause a Unix error, when the file open is attempted by a Toolkit function; pccheck will return a warning to this effect.

Output is returned to stdout (usually the screen).

This is a simple explanation of how the pccheck utility works. For details, including a list of error messages, and information about other command line options, see "Validating Process Control Files", sec. C.2 of Appendix C, in the Toolkit Users Guide.

3.1.4 Metadata considerations

Prototype metadata (MET) tools, which format standard product metadata for ingest to the data server (SDS), will be available in the next Toolkit delivery. For the present, the only Toolkit functions that deal with data about data are the tools PGS_PC_GetFileAttr, PGS_PC_GetFileAttrCom and PGS_PC_GetFileByAttr. These functions are involved in retrieving file "attribute" data from the system, via the Process Control file.
Essentially, you can get character string metadata from a text file using these functions.

Since details about how the production system handles metadata are not yet available, this mechanism was determined to be the best that can be done about this issue at the moment. Every effort will be made to keep the calling sequence unchanged for these tools in the future. However, given the uncertainties about this issue, this cannot be guaranteed.

3.1.5 Command tools for use in shell scripts

This section briefly describes the usage of the set of Command tools, which are callable from Unix shell scripts. These tools are generally identified by the suffix "Com" in the function name.

Tool PGS_PC_Shell.sh is used to call your PGE. It is strongly recommended that you call your PGE from this function during testing at the SCF; among other things, it enables Toolkit (not user) shared memory, which speeds execution of certain Toolkit functions.

Tools PGS_PC_InitCom and PGS_PC_TermCom are used to initialize and terminate your PGE respectively.

Note: The above three functions are used at the DAAC as well as at your SCF; however, it is not necessary to include them in scripts that you deliver to the DAAC for Integration and Test. They are included as part of the Toolkit delivery for your use in testing at the SCF.

The rest of the PGS_PC_*Com tools are simply wrappers on Toolkit API tools. These functions are for use inside your PGE. Further details are given in the Tool Descriptions.

3.2 Process Control (PC) Tool descriptions

3.2.1 PGS_PC_InitCom

Short explanation of what it's for:
Command line function for initializing the Toolkit for use with your PGE.

This function is in file:
$PGSSRC/PC/PGS_PC_InitCom.c

Shell example:

    # Execute a PGE, enabling Toolkit (not user) shared memory,
    # and also initializing creation of Toolkit log files,
    # and have an SMF Cache size of 50.
    unix% PGS_PC_InitCom 1 1 50

Notes:
You might want to use this function if you decide to write a custom script to call your PGE, in lieu of using PGS_PC_Shell.sh; however, please note that any such customization is not part of your delivery to the DAAC.

This function enables your PGE to
The first argument of this function is for turning on or off Toolkit (not user) shared memory, if available; the second argument is for turning on or off creation of Toolkit log files. The third argument is to specify the amount (in records) of SMF Cache memory to reserve for the storage of SMF messages.

In the example, "unix%" is the Unix command line prompt. For this particular function, the example is identical for any Unix shell.

3.2.2 PGS_PC_GenUniqueID

Short explanation of what it's for:
Generates a string that uniquely identifies your standard product output file. May be used as file metadata.

This function is in file:
$PGSSRC/PC/PGS_PC_GenUniqueID.c

Examples:
The examples assume the following exist in the Process Control File (PCF):

    ? SYSTEM RUNTIME PARAMETERS
    # ----------------------------------------------------------
    # Production Run ID - unique production instance identifier
    # ----------------------------------------------------------
    1
    # ----------------------------------------------------------
    # Software ID - unique software configuration identifier
    # ----------------------------------------------------------
    1

C example:

    #include <PGS_PC.h>
    #define HDF_FILE 301

    char uniqueID[PGSd_PC_LABEL_SIZE_MAX];
    PGSt_SMF_status returnStatus;

    /* Begin example */
    returnStatus = PGS_PC_GenUniqueID(HDF_FILE,uniqueID);
    /* Variable uniqueID now contains the string
       "PRID - 1 SID - 1 PRODID - 301" */

Fortran example:

          IMPLICIT NONE
          INCLUDE 'PGS_SMF.f'
          INCLUDE 'PGS_PC.f'
          INCLUDE 'PGS_PC_9.f'
          INTEGER pgs_pc_genuniqueid
          INTEGER HDF_FILE
          PARAMETER(HDF_FILE=301)
          CHARACTER*200 uniqueid
          INTEGER returnstatus
    C
    C     Begin example
    C
          returnstatus = pgs_pc_genuniqueid(HDF_FILE,uniqueid)
    C
    C     Variable uniqueid now contains the string
    C     'PRID - 1 SID - 1 PRODID - 301'
    C

The mechanism for using the output of this function as file metadata has not yet been defined.