How do I get all files from a directory with a variable extension of a specified length?


I have a huge directory, including subdirectories, that I need to retrieve files from.

The folders contain various files, but I am only interested in specific proprietary files whose extension is exactly 7 digits long.

For example, I have a folder that contains the following files:

abc.txt
def.txt
GIWFJ1XA.0201000
GIWFJ1UC.0501000
NOOBO0XA.0100100
summary.pdf
someinfo.zip
T7F4JUXA.0300600
vxy98796.csv
YJHLPLBO.0302300
YJHLPLUC.0302800  

I have tried the following:

var fileList = Directory.GetFiles(someDir, "*.???????", SearchOption.AllDirectories);

and also

string searchSting = string.Empty;
for (int j = 0; j < 9999999; j++)
{
  searchSting += string.Format(", *.{0} ", j.ToString("0000000"));
}

var fileList2 = Directory.GetFiles(someDir, searchSting, SearchOption.AllDirectories);

which throws an error, obviously because the string is too long.

I want to return only the files whose extension has the specified length (in this case, 7 digits), so that I don't have to loop over the thousands of files I would otherwise have to process.

I have considered creating a variable string for the search criteria that would contain all 99,999,999 possible digits but d

How can I accomplish this?


There are 3 answers

3
ProgrammingLlama On BEST ANSWER

I don't believe there's a way to do this without looping through the files in the directory and its subfolders. The search pattern for GetFiles doesn't support regular expressions, so we can't use something like [\d]{7} as a filter. I would suggest using Directory.EnumerateFiles and then returning only the files that match your criteria.

You can use this to enumerate the files:

private static IEnumerable<string> GetProprietaryFiles(string topDirectory)
{
    Func<string, bool> filter = f =>
    {
        string extension = Path.GetExtension(f);
        // is 8 characters long including the .
        // all remaining characters are digits
        return extension.Length == 8 && extension.Skip(1).All(char.IsDigit);
    };

    // EnumerateFiles allows us to step through the files without
    // loading all of the filenames into memory at once.
    IEnumerable<string> matchingFiles =
        Directory.EnumerateFiles(topDirectory, "*", SearchOption.AllDirectories)
            .Where(filter);
                    
    // Return each file as the enumerable is iterated
    foreach (var file in matchingFiles)
    {
        yield return file;
    }
}

Path.GetExtension includes the . so we check that the number of characters including the . is 8, and that all remaining characters are digits.
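
To make the length check concrete, here is a minimal sketch that applies the same test to a few of the file names from the question. The class name ExtensionCheckDemo and the helper HasSevenDigitExtension are made up for this illustration; only Path.GetExtension, Skip, All and char.IsDigit come from the answer above.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical demo class; the names here are illustrative only.
public static class ExtensionCheckDemo
{
    // True when the extension is a dot followed by exactly seven digits.
    public static bool HasSevenDigitExtension(string fileName)
    {
        // Path.GetExtension keeps the leading dot, e.g. ".0201000".
        string extension = Path.GetExtension(fileName);
        return extension.Length == 8 && extension.Skip(1).All(char.IsDigit);
    }

    public static void Main()
    {
        string[] names = { "GIWFJ1XA.0201000", "summary.pdf", "abc.txt" };
        foreach (string name in names)
        {
            // Print the extension as returned, then the result of the check.
            Console.WriteLine(
                $"{name}: extension '{Path.GetExtension(name)}', match = {HasSevenDigitExtension(name)}");
        }
    }
}
```

Only the first name should pass, since ".pdf" and ".txt" are four characters long including the dot.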

Usage:

List<string> fileList = GetProprietaryFiles(someDir).ToList();
4
Ibrennan208 On

I would just grab the list of files in the directory, and then check whether the part after the last '.' is 7 characters long. (This works as long as you know no other files have an extension of that length.)

EDITED to use Path instead:

Directory.GetFiles(@"C:\temp").Where(
    fileName => Path.GetExtension(fileName).Length == 8
    ).ToList();

OLD:

Directory.GetFiles(someDir).Where(
         fileName => fileName.Substring(fileName.LastIndexOf('.') + 1).Length == 7
).ToList();
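
The caveat about other extensions of the same length can be shown with a small sketch. The file name "backup.torrent" is a made-up example (its extension happens to be seven letters), and the class and method names are hypothetical. Note also that Directory.GetFiles with a single path argument searches only the top directory, so SearchOption.AllDirectories would still be needed for subfolders.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical demo class comparing the two checks.
public static class LengthOnlyCaveat
{
    // The length-only test: dot plus seven characters of any kind.
    public static bool LengthOnly(string fileName) =>
        Path.GetExtension(fileName).Length == 8;

    // The stricter test: dot plus seven digits.
    public static bool LengthAndDigits(string fileName)
    {
        string extension = Path.GetExtension(fileName);
        return extension.Length == 8 && extension.Skip(1).All(char.IsDigit);
    }

    public static void Main()
    {
        // "backup.torrent" is an invented name used only to show the difference.
        foreach (string name in new[] { "GIWFJ1XA.0201000", "backup.torrent" })
        {
            Console.WriteLine(
                $"{name}: length only = {LengthOnly(name)}, length and digits = {LengthAndDigits(name)}");
        }
    }
}
```

Both names pass the length-only test, but only the first passes once the digit check is added.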
0
teamol On

Treat files below as the result of Directory.GetFiles().

using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        List<string> files = new List<string>()
        {"abc.txt", "def.txt", "GIWFJ1XA.0201000", "GIWFJ1UC.0501000", "NOOBO0XA.0100100", "summary.pdf", "someinfo.zip", "T7F4JUXA.0300600", "vxy98796.csv", "YJHLPLBO.0302300", "YJHLPLUC.0302800"};
        Regex r = new Regex("^\\.\\d{7}$");
        foreach (string file in files.Where(o => r.IsMatch(Path.GetExtension(o))))
        {
            Console.WriteLine(file);
        }
    }
}

Output:

GIWFJ1XA.0201000
GIWFJ1UC.0501000
NOOBO0XA.0100100
T7F4JUXA.0300600
YJHLPLBO.0302300
YJHLPLUC.0302800

Edit: I tried passing the method group (r.IsMatch) to Where instead of the lambda with o, but the dotnetfiddle compiler gives me this error:

Compilation error (line 14, col 27): The call is ambiguous between the following methods or properties: 'System.Linq.Enumerable.Where<string>(System.Collections.Generic.IEnumerable<string>, System.Func<string,bool>)' and 'System.Linq.Enumerable.Where<string>(System.Collections.Generic.IEnumerable<string>, System.Func<string,int,bool>)'

I can't debug it right now since I'm busy; I'd be happy if anyone passing by could suggest a fix. The current code above works.
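
A likely explanation for the ambiguity is that Regex.IsMatch has both an IsMatch(string) and an IsMatch(string, int) overload, so the bare method group fits both Where overloads named in the error. One possible fix, sketched below, is to give the delegate an explicit type first. Passing r.IsMatch directly would also test the whole file name rather than the extension, so this sketch assumes a slightly different pattern (\.\d{7}$) that is applied to the full name. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the original answer.
public static class MethodGroupDemo
{
    // Matches a name that ends in a dot followed by exactly seven digits.
    private static readonly Regex SevenDigitExtension = new Regex(@"\.\d{7}$");

    public static List<string> Filter(IEnumerable<string> files)
    {
        // Typing the delegate explicitly selects Regex.IsMatch(string),
        // so Where has only one applicable overload.
        Func<string, bool> isMatch = SevenDigitExtension.IsMatch;
        return files.Where(isMatch).ToList();
    }

    public static void Main()
    {
        var files = new List<string> { "abc.txt", "GIWFJ1XA.0201000", "vxy98796.csv" };
        foreach (string file in Filter(files))
        {
            Console.WriteLine(file);
        }
    }
}
```

The lambda form in the original code remains the simpler choice when the match has to be made against Path.GetExtension(o) rather than the full name.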