C# I/O async (CopyToAsync): how to avoid file fragmentation?

Within a tool copying big files between disks, I replaced the System.IO.FileInfo.CopyTo method with System.IO.Stream.CopyToAsync. This allows a faster copy and better control during the copy, e.g. I can stop the copy. But it creates even more fragmentation of the copied files. It is especially annoying when I copy files of many hundreds of megabytes.

How can I avoid disk fragmentation during copy?

With the xcopy command, the /j switch copies files without buffering, and it is recommended for very large files on TechNet. It does indeed seem to avoid file fragmentation (while a simple file copy within the Windows 10 Explorer DOES fragment my file!).

A copy without buffering seems to be the opposite of this async copy. Is there any way to do an async copy without buffering?
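For reference, here is a sketch of what opening the destination unbuffered could look like from .NET. FILE_FLAG_NO_BUFFERING (0x20000000) is not a named member of the FileOptions enum, so the raw value is cast in; this is an undocumented trick I have not fully verified, and buffer sizes and file offsets must then be multiples of the sector size:

    // FILE_FLAG_NO_BUFFERING has no named member in FileOptions; cast the raw value.
    // Undocumented/unsupported: writes must then be sector-aligned (4096 bytes on NTFS).
    const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;

    using (var unbuffered = new FileStream(destinationFullPath,
        FileMode.Create, FileAccess.Write, FileShare.None,
        4096, // buffer size: must be a multiple of the sector size
        FileOptions.WriteThrough | FileFlagNoBuffering | FileOptions.Asynchronous))
    {
        // sector-aligned WriteAsync calls only
    }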

Here is my current code for the async copy. I kept the default buffer size of 81920 bytes, i.e. 10 * 1024 * sizeof(Int64).

I am working with NTFS file systems, thus 4096-byte clusters.

EDIT: I updated the code with SetLength as suggested, added FileOptions.Asynchronous when creating the destinationStream, and fixed the code to set the attributes AFTER setting the time (otherwise an exception is thrown for ReadOnly files).

        // source is a FileInfo; destinationFullPath, cancellationToken and
        // actionOnException come from the surrounding method.
        int bufferSize = 81920;
        bool operationCanceled = false;
        try
        {
            using (FileStream sourceStream = source.OpenRead())
            {
                // Remove existing file first
                if (File.Exists(destinationFullPath))
                    File.Delete(destinationFullPath);

                using (FileStream destinationStream = File.Create(destinationFullPath, bufferSize, FileOptions.Asynchronous))
                {
                    try
                    {                             
                        destinationStream.SetLength(sourceStream.Length); // avoid file fragmentation!
                        await sourceStream.CopyToAsync(destinationStream, bufferSize, cancellationToken);
                    }
                    catch (OperationCanceledException)
                    {
                        operationCanceled = true;
                    }
                } // properly disposed after the catch
            }
        }
        catch (IOException e)
        {
            actionOnException(e, "error copying " + source.FullName);
        }

        if (operationCanceled)
        {
            // Remove the partially written file
            if (File.Exists(destinationFullPath))
                File.Delete(destinationFullPath);
        }
        else
        {
            // Copy meta data (attributes and time) from source once the copy is finished
            File.SetCreationTimeUtc(destinationFullPath, source.CreationTimeUtc);
            File.SetLastWriteTimeUtc(destinationFullPath, source.LastWriteTimeUtc);
            File.SetAttributes(destinationFullPath, source.Attributes); // after set time if ReadOnly!
        }

I also fear that the File.SetAttributes and time calls at the end of my code could increase file fragmentation.

Is there a proper way to create a 1:1 asynchronous file copy without any file fragmentation, i.e. asking the HDD that the file stream only gets contiguous sectors?

Other topics regarding file fragmentation, like How can I limit file fragmentation while working with .NET, suggest incrementing the file size in larger chunks, but that does not seem to be a direct answer to my question.
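For reference, a minimal sketch of that chunked-growth idea adapted to my copy loop; the 64 MB step size is an arbitrary guess:

    // Grow the destination in large steps ahead of the writes, so NTFS can
    // allocate bigger contiguous runs. 64 MB is an arbitrary choice.
    const long chunk = 64L * 1024 * 1024;
    long allocated = 0;
    byte[] buffer = new byte[bufferSize];
    int read;
    while ((read = await sourceStream.ReadAsync(buffer, 0, buffer.Length, cancellationToken)) > 0)
    {
        if (destinationStream.Position + read > allocated)
        {
            allocated = Math.Min(sourceStream.Length, allocated + chunk);
            destinationStream.SetLength(allocated);
        }
        await destinationStream.WriteAsync(buffer, 0, read, cancellationToken);
    }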

There are 3 answers

EricBDev (best answer)

Considering Hans Passant's answer, in my code above, an alternative to

destinationStream.SetLength(sourceStream.Length);

would be, if I understood it properly:

byte[] writeOneZero = {0};
destinationStream.Seek(sourceStream.Length - 1, SeekOrigin.Begin);
destinationStream.Write(writeOneZero, 0, 1);
destinationStream.Seek(0, SeekOrigin.Begin);

It seems indeed to consolidate the copy.
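Slotted into the copy routine from my question, the alternative looks like this (a sketch; the zero-length guard is my addition):

    using (FileStream destinationStream = File.Create(destinationFullPath, bufferSize, FileOptions.Asynchronous))
    {
        if (sourceStream.Length > 0)
        {
            // Write the last byte first to force cluster allocation up front,
            // then rewind and stream the real content over it.
            destinationStream.Seek(sourceStream.Length - 1, SeekOrigin.Begin);
            destinationStream.Write(new byte[] { 0 }, 0, 1);
            destinationStream.Seek(0, SeekOrigin.Begin);
        }
        await sourceStream.CopyToAsync(destinationStream, bufferSize, cancellationToken);
    }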

But a look at the source code of FileStream.SetLengthCore suggests it does almost the same thing, seeking to the end but without writing the one byte:

    private void SetLengthCore(long value)
    {
        Contract.Assert(value >= 0, "value >= 0");
        long origPos = _pos;

        if (_exposedHandle)
            VerifyOSHandlePosition();
        if (_pos != value)
            SeekCore(value, SeekOrigin.Begin);
        if (!Win32Native.SetEndOfFile(_handle)) {
            int hr = Marshal.GetLastWin32Error();
            if (hr==__Error.ERROR_INVALID_PARAMETER)
                throw new ArgumentOutOfRangeException("value", Environment.GetResourceString("ArgumentOutOfRange_FileLengthTooBig"));
            __Error.WinIOError(hr, String.Empty);
        }
        // Return file pointer to where it was before setting length
        if (origPos != value) {
            if (origPos < value)
                SeekCore(origPos, SeekOrigin.Begin);
            else
                SeekCore(0, SeekOrigin.End);
        }
    }

Anyway, these methods won't guarantee the absence of fragmentation, but they at least avoid it in most cases. The automatic defragmentation tool can then finish the job at a low performance expense. My initial code, without these Seek calls, created hundreds of thousands of fragments for a 1 GB file, slowing down my machine when the defragmentation tool went active.

Yury Glushkov

I think FileStream.SetLength is what you need.

Hans Passant

but the SetLength method does the job

It does not do the job. It only updates the file size in the directory entry; it does not allocate any clusters. The easiest way to see this for yourself is to do it on a very large file, say 100 gigabytes. Note how the call completes instantly. The only way it can be instant is if the file system does not also do the job of allocating and writing the clusters. Reading from the file is actually possible, even though the file contains no actual data; the file system simply returns binary zeros.

This will also mislead any utility that reports fragmentation. Since the file has no clusters, there can be no fragmentation. So it only looks like you solved your problem.
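A quick way to observe this (a sketch; the path is hypothetical):

    // SetLength on a huge file returns almost immediately: no data is written.
    using (var fs = new FileStream(@"C:\temp\huge.bin", FileMode.Create))
    {
        var sw = System.Diagnostics.Stopwatch.StartNew();
        fs.SetLength(100L * 1024 * 1024 * 1024); // 100 GB
        Console.WriteLine(sw.ElapsedMilliseconds); // a few milliseconds
    }
    // Reading the file back simply yields binary zeros from the file system.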

The only thing you can do to force the clusters to be allocated is to actually write to the file. It is in fact possible to allocate 100 gigabytes worth of clusters with a single write: use Seek() to position to Length - 1, then write a single byte with Write(). This will take a while on a very large file; it is in effect no longer async.

The odds that it will reduce fragmentation are not great. You merely reduced the risk somewhat that the writes will be interleaved with writes from other processes. Only somewhat, since the actual writing is done lazily by the file system cache. The core issue is that the volume was fragmented before you began writing; it will never be less fragmented after you're done.

The best thing to do is to just not fret about it. Defragging is automatic on Windows these days and has been since Vista. Maybe you want to play with the scheduling, or maybe you want to ask more about it at superuser.com.