How to Split and Join Large Files in Linux

There can be many reasons to split a large file into smaller chunks. In the past, this was extremely common because external media offered little storage space, and large files had to be carried across multiple floppy disks. The problem has diminished as storage capacity has increased drastically, but a large file can still be too big for a single storage medium.

A more common scenario is uploading or downloading a large file over an unstable Internet connection. Yes, there are utilities that can resume interrupted downloads, but not everyone knows how to use them. So if you're offering a large file as a download, you can simply split it into several pieces, place them on your website, and include instructions on how to reassemble them. Similarly, if you want to upload a large file to a server, you can split it up and reassemble the pieces yourself.

Whatever the reason, splitting and joining large files in Linux is extremely simple. In this tutorial, I'll show you two basic commands to do it, and at the end we'll compare the result to make sure that the reassembled file exactly matches the original.

Splitting a Large File into Multiple Pieces

In this example, I have a large Ubuntu ISO that I downloaded from the main Ubuntu site. You can see its size in MB with the following command:

ls -l --block-size=M

So we have a file of 1515 MB. Let's see how we can split it into a set of smaller files, each 500 MB in size. We use the “split” command for this:

split -b 500M -d -a 3 ubuntu.iso ubuntu

Let’s break this down bit by bit.

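  • The “-b” parameter sets the size of each chunk, in this case 500MB (split reads the “M” suffix as 1024 × 1024 bytes, the same unit ls uses with “--block-size=M”).
  • The “-d” parameter tells split to use numeric suffixes (digits) when naming each output file.
  • The “-a” parameter specifies how many characters long each suffix is, in this case 3.
  • The input file, “ubuntu.iso”, comes next, and the final parameter is the prefix for the output files.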
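If you'd like to try these options before touching a real ISO, here is a minimal sketch that assumes bash and GNU coreutils (the split shipped with BSD or macOS may not support “-d”). It creates a throwaway 5 MiB file with the hypothetical name “sample.bin” inside a temporary directory and splits it into 2 MiB pieces using the same “-d -a 3” options:

```shell
#!/usr/bin/env bash
# Sketch: practice "split" on a small throwaway file instead of a real ISO.
# Assumes bash and GNU coreutils; "sample.bin" is a hypothetical file name.
set -euo pipefail

# Work in a fresh temporary directory so nothing else gets split by accident.
workdir=$(mktemp -d)
cd "$workdir"

# Create a 5 MiB file of random bytes (5 * 1024 * 1024 bytes).
head -c $((5 * 1024 * 1024)) /dev/urandom > sample.bin

# Same options as the ISO example: fixed-size chunks, numeric suffixes,
# 3-character suffix, output prefix "sample".
split -b 2M -d -a 3 sample.bin sample

# List only the pieces (the glob does not match sample.bin).
ls -l sample0*
```

You should see “sample000” and “sample001” at 2 MiB each, with “sample002” holding the remaining 1 MiB, which is the same pattern the ISO produces at a larger scale.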

So this command should produce 500MB chunks with three-digit suffixes like “000, 001, 002”. Let's see if that's actually what happened:

As expected, we have files called “ubuntu000”, “ubuntu001” and so on, each of which is 500MB except for the last one, which holds the remainder of the file. Each of these can now be uploaded or downloaded separately. Once that's done, we need to join them back together.
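The number of pieces follows from simple ceiling division. As a rough sketch, the snippet below takes the sizes quoted in this example (1515 MB as reported by ls, which shows whole megabytes, and the 500M passed to split) and predicts how many pieces split creates, how large the last one is, and what its name will be:

```shell
#!/usr/bin/env bash
# Sketch: predict the output of "split -b" from the sizes used in this example.
total_mb=1515   # size of the ISO as reported by ls --block-size=M
chunk_mb=500    # value passed to split -b

# Ceiling division: one piece per full chunk, plus one more for any remainder.
pieces=$(( (total_mb + chunk_mb - 1) / chunk_mb ))
# Whatever is left after the full chunks ends up in the last piece.
last_mb=$(( total_mb - (pieces - 1) * chunk_mb ))

echo "pieces=$pieces last=${last_mb}MB"
# Suffixes start at 000, so the last piece is numbered pieces - 1.
printf 'last piece: ubuntu%03d\n' $(( pieces - 1 ))
```

For this example it prints “pieces=4 last=15MB” and “last piece: ubuntu003”, which is why the suffixes run from 000 to 003.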

Re-assembling the Pieces from the “split” Command

To put the split files back together into a single file, we use a command we're all familiar with: “cat”. Assuming the files follow the naming convention above, we use cat like this:

cat ubuntu{000..003} > ubuntu-new.iso

Here, we make sure that “cat” only receives the files we want by specifying the suffix range in curly braces, {}. If you use something like “ubuntu*” instead, the glob will also pick up the original file “ubuntu.iso” and append it as well! With braces, we can state the range explicitly, and that's one benefit of using numeric suffixes: the shell expands {000..003} in exactly that order, so the pieces are always joined in the correct sequence regardless of locale settings.
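To see the difference for yourself, here is a small sketch. It assumes bash, since zero-padded ranges like {000..003} are a bash brace-expansion feature, and it uses empty placeholder files in a scratch directory rather than a real ISO. It prints what each pattern expands to:

```shell
#!/usr/bin/env bash
# Sketch: compare what the shell hands to cat for a brace range versus a glob.
# Assumes bash; the files below are empty placeholders, not real ISO pieces.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
touch ubuntu.iso ubuntu000 ubuntu001 ubuntu002 ubuntu003

# Brace expansion: generated by bash itself, zero-padded, in exactly this order.
brace_list=$(echo ubuntu{000..003})
# Glob: every name starting with "ubuntu", including the original ISO.
glob_list=$(echo ubuntu*)

echo "brace: $brace_list"
echo "glob:  $glob_list"
```

The brace range always yields the four pieces in order and nothing else, while the glob also matches “ubuntu.iso”. Note that brace expansion doesn't check whether the files exist, so if a piece is missing, cat will report an error for that name.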

We also use the “>” redirection operator to write the output to the destination file. When the command finishes, “ubuntu-new.iso” contains all the pieces joined back together.

Finally, we can verify that the result is exactly the same as the original file by using the “diff” command with the “--speed-large-files” parameter. The following command compares the original and the re-assembled file:

diff --speed-large-files ubuntu.iso ubuntu-new.iso

The command produces no output, which means that the two files are exactly the same, which is what we wanted!
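The whole round trip can be rehearsed on a small file before you rely on it for something large. This is a minimal sketch assuming bash, GNU coreutils and GNU diff; “sample.bin” is a hypothetical throwaway file whose size deliberately isn't a multiple of the chunk size, so the last piece is smaller than the others:

```shell
#!/usr/bin/env bash
# Sketch: split -> join -> compare round trip on a throwaway file.
# Assumes bash, GNU coreutils and GNU diff; "sample.bin" is hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# 3 MiB plus 12345 bytes, so the last piece is a partial chunk.
head -c $((3 * 1024 * 1024 + 12345)) /dev/urandom > sample.bin

split -b 1M -d -a 3 sample.bin sample   # produces sample000 .. sample003
cat sample{000..003} > sample-new.bin     # rejoin in suffix order

# diff prints nothing and exits with status 0 when the files are identical.
if diff --speed-large-files sample.bin sample-new.bin; then
    result=identical
else
    result=different
fi
echo "$result"
```

Because diff is used as the condition of the if statement, a mismatch (a non-zero exit status) doesn't abort the script under “set -e”; it simply prints “different” instead of “identical”.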

With “split” and “cat”, there's no need to rely on zip or rar utilities for splitting. You can still use those to compress the data and create the archive in the first place. But those utilities don't do a very good job of splitting and joining archives: they take the placement of files within each piece into account and don't guarantee the size of each piece. The technique outlined above works on the raw bytes, so it doesn't depend on any particular archive format and works with any kind of file.