Purpose: Data Archiving for Disk Space Optimization
This script helps free up disk space by archiving incident, entry, and indicator data from the /var/lib/demisto/tenants/{TENANT_NAME}/data
and /var/lib/demisto/data
folders and moving them to the /backups/demisto-archive
directory.
Build
The build process is managed through Gitlab’s continuous integration (CI) using the .gitlab-ci.yml
file. The build stage utilizes a script called rpm-build.sh and the deployment stage uses repo-deploy.sh to store the resulting .rpm
packages in a repository.
Installation
Using Ansible role
Manual: yum --enablerepo=mss install xsoar-archive -y
Running the script
To run the script, you need to pass one of the two options:
-s
: This option runs the whole script which includes stopping Demisto service, archiving and compressing tenant files, moving the files to the backups directory, and starting Demisto service.-a
: This option runs only the archiving and compressing tenant files, moving the files to the backups directory.
For example, to run the script with the -s
option, you would enter sh /bin/scriptname.sh -s
in the command line.
The script contains several functions that are used to perform specific tasks:
graph TD;
compression_section-->compress_in_path;
moving_section-->pathCheckCreation;
compress_in_path-->pathCheckCreation;
moving_section-->log_display;
compression_section-->log_display;
timestamp-->log_display;
pathCheckCreation-->log_display;
compress_in_path-->log_display;
moving_section-->getDaysToArchive;
moving_section-->timestamp;
compression_section-->timestamp;
compress_in_path-->timestamp;
moving_section-->gz_csv_pdf_docx_cleanup;
Each function is meant to be called as part of a larger script that coordinates the archiving process and provides an overall summary of the time taken for each task and any errors that might have occurred.
log_display
This function is used to display messages in the console and log them to a file. It serves as a way to log information, errors or success in a consistent way providing clear and organized information about the script’s execution.
The messages printed to the console with different colors, makes it easy to spot errors or important information.
The log file that this function is appending to is also a useful feature as it allows the user to review the script’s output even after it has finished running, this could be helpful in cases where the script has been running for long periods of time or when the user want to check the script’s output at a later time.
gz_csv_pdf_docx_cleanup
The gz_csv_pdf_docx_cleanup
function is a utility function that is used to delete certain types of files that are older than a certain age from a specific directory.
The criteria that the function checks for is :
.gz
and.csv
files.pdf
and.docx
files older than 7 days
timestamp
This function is a utility function that is used to calculate and display the amount of time tak en for a certain process. The function takes four arguments:
STARTTIME
: the start time of the processENDTIME
: the end time of the processmessage
: a string that describes the process that was timedswitch
: an optional argument that is used to switch off recording the duration totimesRecord
variable
pathCheckCreation
This function serves as a way to check if a directory exists and to create it if it does not. It is a helper function that can be used within the script to ensure that a specific directory is present before proceeding with other operations, like creating files or folders within that path, it also provides clear and organized information about the script’s execution, by logging the creation of the path if it’s needed
getDaysToArchive
This function is a utility function that finds and returns a list of months in the format “mmYYYY” for files older than a specified number of days in a given directory. It serves as a way to find files that need to be archived based on their age and it will be used by the script to decide which folders need to be archived
compress_in_path
This function is a utility function that compresses all folders in a given directory, and moves them to their corresponding “archived-YYYY” folder, while logging success or failure of each step in the process and it also outputs the time taken for each iteration of the for loop.
moving_section
This function is responsible for moving the content from the /data
folder of demisto and each tenant to the /demisto-archive
folder. It moves the content that is older than a specified number of days. The number of days is passed as an argument to the script and if none is provided, a default value of 365 days is used.
compression_section
This function performs compression and archiving operations on files and folders, specifically within the Demisto and Tenant file system. The script accepts a list of Tenant names as arguments, and uses this list to iterate through each tenant’s directory, compressing and archiving the data within them. The script uses two nested for-loops to achieve this, the first looping through the list of tenants and the second looping through a list of unique days to archive.
move_folder
This function moves the content from a staging area to the /backups
directory. It appends the current hostname to the destination folder name.
\