Since data stored in plain-text formats like CSV often takes up a huge amount of space, it is often compressed for transmission and storage.
Because these commands take many parameters, I find myself searching for the same topics over and over again. Hopefully, this page will be useful for you too.
# Compress the file
gzip testfile.csv
# Gives a testfile.csv.gz
# View the content directly without decompressing
less testfile.csv.gz
# Decompress a file
gzip -d testfile.csv.gz
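By default, gzip replaces the original file with the compressed one. If the original should stay in place, one option is to write the compressed stream to standard output with -c and redirect it into a new file. A minimal sketch, assuming a throwaway scratch directory and made-up sample data (neither comes from a real dataset):

```shell
# Work in a scratch directory so nothing real is touched (hypothetical sample data).
workdir=$(mktemp -d)
cd "$workdir"
printf 'id,name\n1,alice\n2,bob\n' > testfile.csv

# -c writes the compressed stream to standard output, so testfile.csv is left in place.
gzip -c testfile.csv > testfile.csv.gz

# Both the original and the compressed copy now exist.
ls testfile.csv testfile.csv.gz
```

The redirect form works on any gzip; it costs extra disk space because both copies are kept.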
Less can look inside a gzipped file, which makes it very useful for exploring huge files.
{: .notice}
Very often, I receive a database dump that is several gigabytes in size. Reading the entire file into memory is not feasible, and loading the data into a database takes some time.
Decompressing to standard output allows us to pipe the output to another command for further processing. For example, the following command counts the number of lines in the compressed file.
gzip -dc testfile.csv.gz | wc -l
Create a sample from the first 10,000 lines of the data.
gzip -dc testfile.csv.gz | head -n10000 > output.csv
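The same piping pattern works with other text tools. As a rough sketch, the snippet below peeks at the header row and counts the rows that match a pattern, all without writing a decompressed copy to disk. The sample rows, the `Paris` pattern, and the variable names are made up for illustration:

```shell
# Build a tiny compressed CSV in a scratch directory (hypothetical sample data).
workdir=$(mktemp -d)
cd "$workdir"
printf 'id,city\n1,Paris\n2,Tokyo\n3,Paris\n' | gzip > testfile.csv.gz

# Peek at the header row only.
header=$(gzip -dc testfile.csv.gz | head -n 1)

# Count the rows that mention Paris; grep -c prints the number of matching lines.
matches=$(gzip -dc testfile.csv.gz | grep -c 'Paris')

echo "$header"
echo "$matches"
```

Each pipeline decompresses the file again from the start, so on a multi-gigabyte dump it may be worth combining several checks into one pass.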
`zcat` can also print the content to standard output.
{: .notice}
To extract a gzipped tar archive (.tar.gz),
# extract
tar -zxvf archive.tar.gz
# list the files
tar -ztvf archive.tar.gz
# This will also list the files.
less archive.tar.gz
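When only part of an archive is needed, tar can extract a single member, and -C sends the output to another directory. A minimal sketch, assuming a hypothetical archive built on the spot from two made-up CSV files:

```shell
# Build a small archive in a scratch directory (hypothetical files).
workdir=$(mktemp -d)
cd "$workdir"
mkdir data
printf 'a,b\n1,2\n' > data/one.csv
printf 'c,d\n3,4\n' > data/two.csv
tar -zcf archive.tar.gz data

# Extract only data/one.csv; -C changes into the target directory before extracting.
mkdir extracted
tar -zxf archive.tar.gz -C extracted data/one.csv

# Only the requested file should be present.
ls extracted/data
```

The member name has to match the path as shown by the listing command, including any leading directory.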
To compress a directory or a list of files,
# create a gzipped archive of the files
tar -zcvf