Size of data in bytes

This was prompted by an error I ran into with the AWS S3 service: when transferring large files, I needed to tell the transfer utility the size of the data in bytes.

In this case I am looking at files of characters. Some of these methods should work equally well for binary files, and others don’t. In the following examples, I’ll use the full text of Moby-Dick from Project Gutenberg, 2701-0.txt, as the target file. I retrieved the file using the following command:

curl -O
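Because these are text files, it's worth being explicit that the transfer needs bytes, not characters. The two differ once the text contains non-ASCII characters such as curly quotes or accented letters, because UTF-8 encodes those in more than one byte. A minimal sketch using a made-up string rather than the Moby-Dick file:

```shell
#!/bin/sh
# "café" plus a newline, with é written as its two UTF-8 bytes (\303\251):
# 5 characters, but 6 bytes.

# wc -c counts bytes; tr strips the padding some wc versions add.
bytes=$(printf 'caf\303\251\n' | wc -c | tr -d ' ')

# wc -m counts characters, which depends on the locale: 5 in a UTF-8
# locale, but 6 in the C locale, where every byte is one character.
chars=$(printf 'caf\303\251\n' | wc -m | tr -d ' ')

echo "bytes: $bytes"
echo "chars: $chars"
```

The byte count is the locale-independent number, which is why all the commands below count bytes.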

A few commands for getting the size in bytes immediately came to mind: ls, stat, and wc.

$ ls -l 2701-0.txt | awk '{print $5}'

$ stat --format %s 2701-0.txt 

$ wc -c < 2701-0.txt
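The stat example uses GNU coreutils syntax; BSD-derived systems such as macOS spell it stat -f %z instead. A minimal sketch of a hypothetical file_size helper that tries both forms and falls back to reading the file with wc -c:

```shell
#!/bin/sh
# Hypothetical helper: print the size in bytes of the file named in $1.
file_size() {
  # GNU coreutils stat (most Linux systems); -c is the short form of --format
  stat -c %s "$1" 2>/dev/null && return
  # BSD stat (macOS, FreeBSD)
  stat -f %z "$1" 2>/dev/null && return
  # Last resort: read the whole file; tr strips padding some wc versions add
  wc -c < "$1" | tr -d ' '
}
```

For example, file_size 2701-0.txt. The stat calls only read file metadata, so they stay fast on very large files, while the wc fallback has to read every byte.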

All three work. But what if the input isn't a file on disk, and is instead an input stream? The next examples count the bytes in a character stream that could come from any source, so forgive the “useless use of cat”:

$ cat 2701-0.txt | wc -c

$ cat 2701-0.txt | cksum | cut -d' ' -f2

$ cat 2701-0.txt | dd of=/dev/null
2492+1 records in
2492+1 records out
1276201 bytes (1.3 MB, 1.2 MiB) copied, 0.00997434 s, 128 MB/s

The output from dd above is not the simplest thing to parse. It’s multi-line and sent to stderr, so I redirected it to stdout and grepped for “bytes”:

$ cat 2701-0.txt | dd of=/dev/null 2>&1 | grep 'bytes' | cut -d' ' -f1
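One caveat with grepping dd's summary: GNU dd's messages can be translated, so under a non-English locale the word "bytes" may not appear. A minimal sketch of a hypothetical dd_bytes function that pins the C locale before parsing:

```shell
#!/bin/sh
# Hypothetical helper: count the bytes on stdin by parsing dd's summary.
dd_bytes() {
  # LC_ALL=C keeps the summary in English. Matching "byte" rather than
  # "bytes" also covers dd using the singular form for a 1-byte input.
  LC_ALL=C dd of=/dev/null 2>&1 | grep 'byte' | cut -d' ' -f1
}
```

Usage: cat 2701-0.txt | dd_bytes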

That makes at least five ways to find the size of a file in bytes with common command-line tools:

  • ls
  • stat
  • wc
  • cksum
  • dd
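As a quick sanity check that the five methods agree, here is a minimal sketch of a hypothetical compare_sizes function that runs each one against the same file. It assumes GNU stat; on macOS, swap in stat -f %z.

```shell
#!/bin/sh
# Hypothetical helper: report the size of file $1 as measured by each of
# the five tools. Every line should show the same number.
compare_sizes() {
  f=$1
  # ls -l: the size is the 5th whitespace-separated field
  echo "ls    $(ls -l "$f" | awk '{print $5}')"
  # GNU stat; BSD/macOS would need: stat -f %z
  echo "stat  $(stat -c %s "$f")"
  # wc -c reading stdin prints only the count (tr strips any padding)
  echo "wc    $(wc -c < "$f" | tr -d ' ')"
  # cksum prints "CRC byte-count" when reading stdin
  echo "cksum $(cksum < "$f" | cut -d' ' -f2)"
  # dd reports the count on its stderr summary line
  echo "dd    $(LC_ALL=C dd if="$f" of=/dev/null 2>&1 | grep 'byte' | cut -d' ' -f1)"
}
```

Usage: compare_sizes 2701-0.txt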

Know of others? Leave a comment below.

nmap scans the top 1000 ports by default, but which 1000?

From man nmap:

The simple command nmap target scans 1,000 TCP ports on the host target.

You might reasonably ask: which 1,000 ports are those? Is the particular port I'm interested in included?

Fortunately, nmap ships with a list of ports and services, nmap-services, that records how frequently each port is found open. From this we can get the top 1000:

grep -v '^#' /usr/share/nmap/nmap-services | grep '/tcp' | sort -rk3 | head -n1000
  • The initial grep filters out the comments (lines that begin with the hash mark).
  • The second grep keeps only TCP entries. The file also lists ports for other protocols, such as UDP, and a default scan covers TCP only, so without this filter UDP ports would crowd into the top 1000.
  • The sort command sorts in descending order by the 3rd column (the frequency).
  • The final head command displays only the top 1000 results.

In my case, I wondered whether the radmin port, 4899/tcp, was included in an nmap scan. I piped the above command to grep to find out:

grep -v '^#' /usr/share/nmap/nmap-services | grep '/tcp' | sort -rk3 | head -n1000 | grep '4899/tcp'
radmin  4899/tcp        0.003337        # Radmin ( remote PC control software

It is included in a default nmap scan.
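To answer the same question for any port without reading through the list, here is a minimal sketch of a hypothetical tcp_port_rank function. It prints the port's rank among the TCP entries in nmap-services, so a rank of 1000 or less means a default scan includes it, and it exits non-zero if the port isn't listed. The NMAP_SERVICES variable and its default path are assumptions; the file lives elsewhere on some installs.

```shell
#!/bin/sh
# Assumed location of nmap's services file; override NMAP_SERVICES if yours differs.
NMAP_SERVICES=${NMAP_SERVICES:-/usr/share/nmap/nmap-services}

# Hypothetical helper: print the rank of TCP port $1 by open frequency.
tcp_port_rank() {
  grep -v '^#' "$NMAP_SERVICES" |
    awk '$2 ~ /\/tcp$/' |          # keep TCP entries only
    LC_ALL=C sort -k3,3nr |        # highest frequency first
    awk -v p="$1/tcp" '$2 == p { print NR; found = 1; exit }
                       END { exit !found }'
}
```

For example, tcp_port_rank 4899. Many ports share the same low frequency, so a port tied right at the 1000 cutoff may be ordered differently here than in nmap's own list.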

Is there an easier way to do this? Drop me a line in the comments!