PowerShell: Working with large files performance comparison

Ever since I started using PowerShell (back in 2012) for automating boring or complex tasks, I assumed the built-in cmdlets were the best choice, not only for ease of use but also for performance. I believed cmdlets outperformed any other alternative available. However, when I had to work with large text files (we still have to work with text files, and they can be very large), I found myself waiting for results far longer than usual.

Before looking at the scripts and comparing their performance, let’s look at the requirement:

I needed to take around 10% of the lines from a file that was 22 GB in size and had roughly 260 million lines; to be precise, 29,331,402 lines. As always, I wrote a simple PowerShell script to do this. It ran for more than half an hour before I cancelled it. I had never seen a PowerShell script take that long on any large task, so I decided to try some alternatives and benchmark the results. I hope this will be beneficial to you as well.
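The "simple PowerShell script" mentioned above is not shown in the post; a minimal sketch of the typical cmdlet-based approach would look like this (file paths and the exact line count are placeholders):

```powershell
# Naive cmdlet pipeline: stream the file line by line and keep the first N lines.
# Paths below are illustrative, not from the original post.
Get-Content -Path '.\input.txt' |
    Select-Object -First 29331402 |
    Set-Content -Path '.\output.txt'
```

The pipeline is idiomatic and memory-friendly, but each line becomes a .NET string object passed object-by-object through the pipeline, which is where the overhead on a 260-million-line file comes from.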

To benchmark the process, I took a smaller file: 2,405 MB in size with 29.33 million lines. The test machine is a Dell Latitude E7450 with an i7-5600 2.6 GHz CPU, 16 GB of RAM, and a solid-state drive. The task: take the first 1 million lines from this text file and write them to a new file.
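One way to run this benchmark is to wrap each approach in `Measure-Command` and compare the elapsed times. The sketch below (paths are placeholders, not from the original post) contrasts the cmdlet pipeline with the raw .NET stream APIs, which avoid per-line pipeline overhead:

```powershell
$inFile  = '.\sample.txt'    # assumed path to the ~2.4 GB test file
$outFile = '.\first1M.txt'   # assumed output path
$n       = 1000000

# Approach 1: built-in cmdlets. -TotalCount stops reading after N lines.
$t1 = Measure-Command {
    Get-Content -Path $inFile -TotalCount $n | Set-Content -Path $outFile
}

# Approach 2: .NET StreamReader/StreamWriter, bypassing the pipeline.
$t2 = Measure-Command {
    $reader = [System.IO.StreamReader]::new((Resolve-Path $inFile).Path)
    $writer = [System.IO.StreamWriter]::new("$PWD\first1M.txt")
    try {
        for ($i = 0; $i -lt $n; $i++) {
            $line = $reader.ReadLine()
            if ($null -eq $line) { break }   # stop early at end of file
            $writer.WriteLine($line)
        }
    }
    finally {
        $reader.Dispose()
        $writer.Dispose()
    }
}

'Cmdlets: {0:n2}s  StreamReader: {1:n2}s' -f $t1.TotalSeconds, $t2.TotalSeconds
```

On large files the stream-based version is generally much faster, since it copies lines directly between buffered streams instead of materializing each line as a pipeline object.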
