lots

Package: WA2L/edrc 1.5.57
Section: Maintenance Commands (1m)
Updated: 13 December 2014

 

NAME

lots - long term data save handling

 

SYNOPSIS

edrc/bin/lots [ -h ]

lots [ -c configfile ] -a ( collect [ -d datalist ] | save | execute ) [ -i identitylist ]

lots [ -c configfile ] -a ( lock | purge )

lots [ -c configfile ] -a clear -j jobname

lots [ -c configfile ] -a ( list_action | list_session | list_collect | list_save | list_lock | list_purge | list_clear ) [ -f from_date ][ -t to_date ]

lots [ -c configfile ] -a list_jobs [ -d datalist ]

lots [ -c configfile ] -a ( list_datalist | list_schedule | list_volume )

lots [ -c configfile ] -a ( print_job -j jobname | print_log [ -f from_date ][ -t to_date ] | print_logtail | print_session -s sessionname )

 

AVAILABILITY

WA2L/edrc

 

DESCRIPTION

lots is used to copy data in an organized and automated fashion to a long term storage device. On this device the data is locked.

lots locks the data by setting the access time of the saved data to a future point in time (current day + RETENTION ) and by removing the write flag of all files and directories to be locked. If the long term data storage is a file system on a NetAPP filer having a 'SnapLock Enterprise' volume or a 'SnapLock Compliance' volume, the data cannot be removed from Unix until the RETENTION has expired. If lots is used on a normal file system without this functionality, the root user can delete the data, but it is still ensured that non-root users cannot view, list or remove the saved data.
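The locking described above can be sketched with plain operating system commands. This is a minimal illustration of the mechanism, not the actual lots implementation; the path is an example and GNU coreutils ( touch, date, chmod ) are assumed:

```shell
# illustrative sketch of the lots locking mechanism (NOT the real implementation)
mkdir -p /tmp/lots_demo
echo "payload" > /tmp/lots_demo/file.dat

# set the access time to a future point in time (current day + RETENTION of 7 days)
touch -a -d "$(date -d '+7 days' +%Y-%m-%d)" /tmp/lots_demo/file.dat

# remove the write flag of the file and its directory
chmod a-w /tmp/lots_demo/file.dat /tmp/lots_demo
```

On a SnapLock volume this future access time is what the filer interprets as the retention date; on a normal file system only the removed write flags protect the data from non-root users.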

The source path where the data comes from, the permissions of the path and other properties are saved and recorded to ensure a data restore even if the user base has changed since the data save.

After RETENTION + DATA_PURGE_LAG has expired, the lots command purges (=removes) the expired data and the disk space is freed up.

The following four steps are performed during the life cycle of a long term data save:

1) collect

This step is executed once for each DATALIST whose schedule in schedule.dat matches for the current day and whose HOSTNAME setting matches the host where the lots command is started.

A new job is created and the file and directory names to be saved as defined in the datalist.dat file are collected and stored in the newly created job. Be aware that the lots release that created a job is always saved in the job, too ( VERSION property when printing the job).

In the following steps this job passes through several phases during the whole information life cycle. The job information can be displayed using the ' lots -a print_job -j jobname ' command.

In this step no data is saved. When the collect action is completed, the job is in the save phase.

If multiple schedule definitions of a DATALIST match on a certain day, only the schedule definition having the highest RETENTION value will be executed, in order to use the disk space economically. A scheduled DATALIST is only executed once a day unless the RETENTION in the schedule.dat is increased.

A job that is in the save phase can be removed using the ' lots -a clear -j jobname ' command.

2) save

This step is executed for all jobs that are in the save phase based on the resolved hostname in the collect step.

In this step the files found in the collect step are copied to the long term data save location as defined in the volume.dat file. The save can be performed with or without compression. The compression method has to be defined in the schedule.dat file.

When the save action is completed, the job is in the lock phase.

A job that is in the lock phase can be removed using the ' lots -a clear -j jobname ' command. All data that has been saved will also be removed from the long term storage.

3) lock

This step is executed for all jobs that are in the lock phase, whose save phase completed with a SAVESTATE of OK and for which the number of days defined in the DATA_LOCK_LAG setting in the lots.cfg file has expired.

The lock delay enables the system administrative personnel to react to unwanted conditions, such as: inconsistency of saved data, errors in definitions which cause too much data to be saved (=waste of disk space) and other reasons.

The DATA_LOCK_LAG should be defined carefully and the implications have to be known by the administrative personnel operating the long term data save setup. If, for instance, the DATA_LOCK_LAG is set to 7, the administrator has seven days after the save to react to a malfunction and to correct it. But during this time it is also possible to delete the saved data, which can be a risk, too.
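The resulting earliest lock date can be computed from the save date plus the DATA_LOCK_LAG, as sketched below (the save date and lag value are hypothetical; GNU date is assumed for the relative date arithmetic):

```shell
# earliest lock date = save date + DATA_LOCK_LAG days
# (hypothetical example values; lots computes this internally)
SAVE_DATE="2014-12-13"
DATA_LOCK_LAG=7
LOCK_DATE=$(date -d "$SAVE_DATE + $DATA_LOCK_LAG days" +%Y-%m-%d)
echo "$LOCK_DATE"    # 2014-12-20
```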

When a job is in the lock phase, but the previous phase ( save ) did not complete with a SAVESTATE equal to OK, the lock action for the related job will be skipped and the data will not be locked. This means that the job will remain in the lock phase until it is cleared using the ' lots -a clear -j jobname ' command. This avoids locking the saved data of unsuccessful save actions.

When the lock action is completed, the job is in the purge phase.

A job that is in the purge phase cannot be removed any more using the ' lots -a clear -j jobname ' command, because the data on the long term storage is locked and lots must wait for the lock to expire before it can remove the data related to the locked job.

4) purge

This step is executed for all jobs that are in the purge phase after RETENTION + DATA_PURGE_LAG has expired.

The purge action removes the data related to the job on the long term storage and the disk space is thereby freed up.

When the purge action is completed, the job is in the expired phase and will remain there forever for reference and compliance proof purposes.

The DATA_PURGE_LAG only has to be set greater than 0 if it turns out that the system clocks of the server(s) accessing the long term storage ( using lots ) and the 'Compliance Clock' of the long term data storage device are not completely in sync. The purge of the data is delayed by the number of days defined in this setting, to ensure smooth data purging.

These four steps (collect, save, lock, purge) can be executed separately using the commands:

1) lots -a collect
2) lots -a save
3) lots -a lock
4) lots -a purge
but the normal case is to use the ' lots -a execute ' command, which executes all steps in sequence in one session.

In a productive automated setup, the ' lots -a execute ' command will normally be called via a cron entry that starts lots once each day.
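Such a cron entry could look like the following (the installation path and the schedule time are examples and have to be adjusted to the local setup):

```shell
# crontab entry: run the complete long term data save cycle once each day at 02:15
15 2 * * * /opt/edrc/bin/lots -a execute
```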

It is supported to access the same long term storage in parallel from multiple hosts. In this case the VARDIR as defined in the lots.cfg file and the lots.cfg file itself should also be located centrally on the long term storage to allow the most convenient operation. It is in any case recommendable to put the VARDIR and the lots.cfg configuration file on the long term storage device, so that the configuration and state information is separated from the server. This secures the information and makes it independent from server crashes and server re-installations, which might occur during the life cycle of data stored for a long time on the long term data storage. For the central, secure setup suggested here, see the EXAMPLES section.

Each invocation of ' lots -a action ' creates a session. A session represents all command output and can be displayed for verification and compliance proof. Session logs are kept forever and can be displayed using the ' lots -a print_session -s sessionname ' command. To evaluate the sessions related to a job, use the ' lots -a list_(action|collect|save|lock|purge|clear) ' command.

To resolve the data save location for a job of a certain DATALIST, use the commands:

1) lots -a list_jobs [ -d datalist ]
2) lots -a print_job -j jobname

The lots command currently does not provide a data restore interface. The restore of the saved files is therefore performed using the normal operating system commands ( cp, cpio, unzip, bzip2 -d, gzip -d, uncompress ). After the restore, the permissions have to be adjusted, also using operating system commands ( chmod, chown, chgrp ), based on the information queried using the print_job action as described above.
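A manual restore along these lines can be sketched as follows. The example is self-contained: it creates a stand-in for a gzip-compressed file on the long term storage; all paths and the PERM value 0640 are hypothetical examples of what ' lots -a print_job -j jobname ' would print:

```shell
# sketch of a manual restore using operating system commands only
mkdir -p /tmp/lots_restore/save /tmp/lots_restore/target

# stand-in for a compressed saved file on the long term storage
echo "restored content" | gzip -c > /tmp/lots_restore/save/data.txt.gz

# restore: copy and decompress, then adjust permissions per the job properties
cp /tmp/lots_restore/save/data.txt.gz /tmp/lots_restore/target/
gzip -d /tmp/lots_restore/target/data.txt.gz
chmod 0640 /tmp/lots_restore/target/data.txt
# chown/chgrp would follow here, based on the USER and GROUP job properties
```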

 

OPTIONS

-h
print the usage message. The revision of lots is also displayed here.

-c config_file
configuration file.

-i identitylist
comma separated list of identities (=hostnames) of the lots command.
If this option is not specified and the environment variable $LOTS_IDENTITIES is not set, the identities of the lots command are all hostnames under which the host where lots is started is known. Using the -i option it can be defined to which HOSTNAME settings in the schedule.dat file the lots command reacts.

Example, when -i adm_ora1_tst is used:

  :
  resolve identities of this host ...
      adm_ora1_tst
  done.
  :

Example, when the -i option is not used and the $LOTS_IDENTITIES environment variable is not set (the adm_ora1_tst hostname is a cluster package that is currently running on the host):

  :
  resolve identities of this host ...
      adm_ora1_tst acme001 loghost localhost
  done.
  :

-a
action:

collect [ -d datalist ] [ -i identitylist ]

collect data of scheduled datalist(s).

save [ -i identitylist ]

save data that has been collected to the long term storage.

lock

lock data on the long term storage from modification.

purge

remove data whose locks (retention) have expired from the long term storage.

clear -j jobname
remove a job and the data saved (if any). It is only possible to clear a job in the save and lock phases. When a job that has the largest retention for a datalist for the current date is cleared, it is possible to perform another collect action of the related schedule.

execute [ -i identitylist ]

perform all four long term data save steps ( collect, save, lock, purge ) in one step.

list_action [ -f from_date ] [ -t to_date ]

list all performed actions between the from_date and the to_date.
If the from_date is not specified, the actions are listed from the beginning to the to_date. If the to_date is not entered, the actions are listed from the from_date to the end. If neither of the two dates is specified, all actions are listed.

list_session [ -f from_date ] [ -t to_date ]

list all saved sessions between the from_date and the to_date.
If the from_date is not specified, the sessions are listed from the beginning to the to_date. If the to_date is not entered, the sessions are listed from the from_date to the end. If neither of the two dates is specified, all sessions are listed.

list_collect [ -f from_date ] [ -t to_date ]

list only the collect actions between the from_date and the to_date.
If the from_date is not specified, the collect actions are listed from the beginning to the to_date. If the to_date is not entered, the collect actions are listed from the from_date to the end. If neither of the two dates is specified, all collect actions are listed.

list_save [ -f from_date ] [ -t to_date ]

list only the save actions between the from_date and the to_date.
If the from_date is not specified, the save actions are listed from the beginning to the to_date. If the to_date is not entered, the save actions are listed from the from_date to the end. If neither of the two dates is specified, all save actions are listed.

list_lock [ -f from_date ] [ -t to_date ]

list only the lock actions between the from_date and the to_date.
If the from_date is not specified, the lock actions are listed from the beginning to the to_date. If the to_date is not entered, the lock actions are listed from the from_date to the end. If neither of the two dates is specified, all lock actions are listed.

list_purge [ -f from_date ] [ -t to_date ]

list only the purge actions between the from_date and the to_date.
If the from_date is not specified, the purge actions are listed from the beginning to the to_date. If the to_date is not entered, the purge actions are listed from the from_date to the end. If neither of the two dates is specified, all purge actions are listed.

list_clear [ -f from_date ] [ -t to_date ]

list only the clear actions between the from_date and the to_date.
If the from_date is not specified, the clear actions are listed from the beginning to the to_date. If the to_date is not entered, the clear actions are listed from the from_date to the end. If neither of the two dates is specified, all clear actions are listed.

list_jobs [ -d datalist ]

list the jobs and the phase each job is in. If -d datalist is not specified, all jobs are listed.

list_datalist

list all valid datalist definitions. See also datalist.dat(4) for more information about the datalist format.

list_schedule

list all valid schedule definitions. See also schedule.dat(4) for more information about the schedule format.

list_volume

list all valid volume definitions. See also volume.dat(4) for more information about the volume format.

print_job -j jobname

print all properties of a certain job. Use this option to print the path of the saved data.

GENERAL JOB PROPERTIES:

PHASE
phase the job is currently in.

HOSTNAME
hostname where the data of a matching schedule has been collected and saved.

VERSION
version of lots that created the related job.

JOBNAME
name of the job. This name has to be specified in the -j option when required.

DATALIST
scheduled datalist name of the job.

DATALIST_DESCRIPTION
free text description of the DATALIST as defined in the datalist.dat file at the time of job creation.

TIMEZONE
time zone as returned by timezone(3) at the time of job creation.

SCHEDULE
schedule in schedule.dat which matched at the time of job creation.

SCHEDULE_DESCRIPTION
free text description of the SCHEDULE as defined in the schedule.dat file at the time of job creation.

RETENTION
effective data retention in days as defined in the schedule.dat file at the time of job creation.

COMPRESSION
compression method of the saved data as defined in the schedule.dat file at the time of job creation.

COLLECTTIMESTAMP
timestamp (seconds since the epoch) when the data to be saved has been collected. This property will be seen when the job is in the phase save, lock, purge or expired.

COLLECTTIMEDAT
this is the human readable format of COLLECTTIMESTAMP.

COLLECTSTATE
state of the collect action. This property will be seen when the job is in the phase save, lock, purge or expired.

LOCKTIMESTAMP
timestamp (seconds since the epoch) when the data to be saved will be locked. This property will be seen when the job is in the phase save, lock, purge or expired.

LOCKTIMEDAT
this is the human readable format of LOCKTIMESTAMP.

LOCKSTATE
state of the lock action. This property will be seen when the job is in the phase purge or expired.

SAVETIMESTAMP
timestamp (seconds since the epoch) when the data has been saved. This property will be seen when the job is in the phase lock, purge or expired.

SAVETIMEDAT
this is the human readable format of SAVETIMESTAMP.

SAVESTATE
state of the save action. This property will be seen when the job is in the phase lock, purge or expired.

PURGETIMESTAMP
timestamp (seconds since the epoch) when the data has been purged. This property will be seen when the job is in the phase expired.

If the PURGESTATE is RETRY, this property will also be seen when the job is in the phase purge.

PURGETIMEDAT
this is the human readable format of PURGETIMESTAMP.

If the PURGESTATE is RETRY, this property will also be seen when the job is in the phase purge.

PURGESTATE
state of the purge action. This property will normally be seen when the job is in the phase expired.

If the PURGESTATE is RETRY, this property will also be seen when the job is in the phase purge. The RETRY PURGESTATE shows that the purge action was not completely successful. The purging of the data of a job with this condition will be repeated in subsequent calls of lots until it succeeds.

DATA SAVE INFORMATION:

SAVE_BASEDIR
directory (on the long term storage) to which the data of the related job is saved. In normal cases only one directory is listed here, but in special cases multiple directories might be listed. This property will only have content when the job is in the phase lock or purge.

FILENAME
filename and path of the saved source file. This property will only have content when data to be saved as defined in datalist.dat has been found on the system when collecting the data.

SAVE_SUFFIX
suffix of the saved file. This suffix correlates to the chosen compression method as printed in the COMPRESSION property.

A file to be restored can therefore be accessed concatenating the properties SAVE_BASEDIR + FILENAME + SAVE_SUFFIX.

SIZE
size of the source file in bytes.

USER
file owner name of the source file. The numeric user ID is stored with the file.

GROUP
file group name of the source file. The numeric group ID is stored with the file.

PERMISSIONS
symbolic representation of the source file's permissions as displayed by the ls(1) command.

PERM
numeric representation of the source file's permissions which can be used with the chmod(1) command.

MTIME
modification time of the source file in military format.
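The concatenation rule SAVE_BASEDIR + FILENAME + SAVE_SUFFIX mentioned above can be sketched as follows; the property values are hypothetical examples in the <VOLUME_PATH>/<YEAR>/<MMDD>/<DATALIST>/<COUNTER>/data/ layout described in the FILES section:

```shell
# derive the full path of a saved file from hypothetical print_job properties
SAVE_BASEDIR="/ltstore/2014/1213/oradata/001/data"
FILENAME="/u01/oradata/system01.dbf"
SAVE_SUFFIX=".gz"
echo "${SAVE_BASEDIR}${FILENAME}${SAVE_SUFFIX}"
# /ltstore/2014/1213/oradata/001/data/u01/oradata/system01.dbf.gz
```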

print_log [ -f from_date ] [ -t to_date ]
print the master log between the from_date and the to_date.
If the from_date is not specified, the log file is printed from the beginning to the to_date. If the to_date is not entered, the log file is printed from the from_date to the end. If neither of the two dates is specified, the whole log is printed.

print_logtail

display a continuous master log output.

print_session -s sessionname

print the session log output. Each output of lots is saved and, for compliance reasons, never deleted. To evaluate the session name related to a certain action or a sequence of actions, use list_action, list_collect, list_save, list_lock, list_purge, list_clear or print_log.

-f from_date

begin date in the military format YYYY-MM-DD. To compute dates in this format, see: input(3), seconds(3), timer(1), today(3), tomorrow(3), yesterday(3).

-t to_date

end date in the military format YYYY-MM-DD. To compute dates in this format, see: input(3), seconds(3), timer(1), today(3), tomorrow(3), yesterday(3).

-j jobname

name of a lots job in the format YYYY-MM-DD_hh.mm.ss.

-s sessionname

name of a lots session in the format YYYY-MM-DD_hh.mm.ss.

-d datalist

name of a datalist as specified in datalist.dat and schedule.dat.

 

ENVIRONMENT

$LOTS_CONFIGFILE
configuration file of lots. The -c configfile command line option has preference. If the configuration file specified in $LOTS_CONFIGFILE does not exist, the default configuration file edrc/etc/lots.cfg is read.

$LOTS_IDENTITIES
comma separated list of identities (=hostnames) of the lots command. The -i identitylist option has preference.

 

EXIT STATUS

0
no error.

1
configuration file does not exist.

2
session could not be created. If you get this error, check the file systems where the VARDIR/log directory resides. See also lots.cfg(4).

4
usage printed.

5
lots aborted by pressing <Ctrl>+<C>.

6
job as specified in -j jobname not found.

7
session as specified in -s sessionname not found.

8
cannot write to logfile.

9
cannot write to VARDIR.

10
a job to be cleared does not exist or is not in the save or lock phases.

 

FILES

The VARDIR can be defined in the lots.cfg file. Default is edrc/var/lots.

edrc/etc/lots.cfg
default lots configuration file.

VARDIR/log/lots.master.log

logfile of lots.
 Do not modify or shorten this file.

VARDIR/log/lots.session.<SESSIONNAME>.log.gz

session logfile of lots. Display session logs using the ' lots -a print_session -s sessionname ' command. Do not modify or delete any of these files.

VARDIR/locks/
lockfiles, do not edit them by hand.

VARDIR/objects/datalist.dat
data save definition. In this file it is defined which set of files and directories are saved using one handle (datalist).

VARDIR/objects/schedule.dat
schedule, retention and compression definition. In this file it is defined which datalist is scheduled to be saved on which date.

VARDIR/objects/volume.dat
definition of data save volume destinations. With this file it is possible to locate long term data saves to different destinations.

VARDIR/spool/save/<JOBNAME>.job
jobs in the save phase. Do not access job files directly; always use the -a print_job action to query job information, since certain job information is resolved by lots while printing the job and some job properties are constructed for certain job VERSIONs due to functionality enhancements of lots.

VARDIR/spool/lock/<JOBNAME>.job
jobs in the lock phase. Do not access job files directly; always use the -a print_job action to query job information, since certain job information is resolved by lots while printing the job and some job properties are constructed for certain job VERSIONs due to functionality enhancements of lots.

VARDIR/spool/purge/<JOBNAME>.job
jobs in the purge phase. Do not access job files directly; always use the -a print_job action to query job information, since certain job information is resolved by lots while printing the job and some job properties are constructed for certain job VERSIONs due to functionality enhancements of lots.

VARDIR/spool/expired/<JOBNAME>.job
jobs in the expired phase. Do not access job files directly; always use the -a print_job action to query job information, since certain job information is resolved by lots while printing the job and some job properties are constructed for certain job VERSIONs due to functionality enhancements of lots.

VARDIR/state/action
record of all performed actions. Do not modify or shorten this file.

VARDIR/state/diskusage
record of used disk space. This information is used to create reports.

VARDIR/state/performance
record of durations and data throughput. This information is used to calculate forecasts and create performance reports.

<VOLUME_PATH>/<YEAR>/<MMDD>/<DATALIST>/<COUNTER>/data/
SAVE_BASEDIR directory on the long term storage device where the data as defined in a datalist is saved to.

<VOLUME_PATH>/<YEAR>/<MMDD>/<DATALIST>/<COUNTER>/meta/<JOBNAME>.job
copy of the job file as located in VARDIR/spool/lock/<JOBNAME>.job or VARDIR/spool/purge/<JOBNAME>.job, depending on the phase the job is currently in. In a severe emergency situation where the information in the VARDIR/spool/lock/ and/or VARDIR/spool/purge/ directories is lost without a backup, the state of the lots jobs in the lock and purge phases, which are the most critical ones, can be reconstructed by copying the job files from <VOLUME_PATH>/<YEAR>/<MMDD>/<DATALIST>/<COUNTER>/meta/ back to the VARDIR/spool/lock/ or VARDIR/spool/purge/ directories. The meta data of saved data is locked with the identical retention as the data itself; this information is therefore secured.

 

EXAMPLES

-

 

SEE ALSO

edrcintro(1), lots.cfg(4), datalist.dat(4), schedule.dat(4), volume.dat(4), bzip2(1), chmod(1), chown(1), compress(1), cpio(1), gzip(1), input(3), ls(1), seconds(3), timer(1), timezone(3), today(3), tomorrow(3), unzip(1), yesterday(3), zip(1).

 

NOTES

The NetAPP filers are able to provide the following SnapLock variants, based on the licenses applied:

SnapLock Enterprise:

This is a trusted administrator mode. In this mode a volume containing non-expired locked data can be destroyed by a NetAPP administrator on the NetAPP filer.

SnapLock Compliance:

This is an untrusted administrator mode. In this mode a volume containing non-expired locked data cannot be destroyed on the NetAPP filer. This carries the risk that, on program malfunction or handling errors on the SnapLock volume, a large amount of disk space could be wasted without any possibility to clean up the data and to free the wasted disk space.

 

BUGS

Abort handling is fully tested under HP-11ia only. Therefore, when running lots on other operating systems, refrain from aborting a running lots session if possible. However, the expected side effects are not severe.

 

AUTHOR

lots was developed by Christian Walther. Send suggestions and bug reports to wa2l@users.sourceforge.net.

 

COPYRIGHT

Copyright © 2014 Christian Walther

This is free software; see edrc/doc/COPYING for copying conditions. There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


 

Index

NAME
SYNOPSIS
AVAILABILITY
DESCRIPTION
OPTIONS
ENVIRONMENT
EXIT STATUS
FILES
EXAMPLES
SEE ALSO
NOTES
BUGS
AUTHOR
COPYRIGHT

This document was created by man2html using the manual pages.
Time: 13:15:31 GMT, March 12, 2025