Thursday, February 3, 2011


DirectoryInfo.GetFiles returns more files than expected (or how to get Exactly What You Need, with an exact-match extension method)

Introduction

I must admit I hadn't noticed this behavior of the GetFiles() method until now. It's not something you see often, but it can happen, and it's dangerous.

As this post and the MSDN library itself state, when you call GetFiles() with a search pattern that contains the asterisk wildcard and a three-character extension (like *.xml or *.jpg), the method returns every file whose extension STARTS with the one you provided. That means a search for *.jpg will return files with extensions like .jpg, .jpg2, .jpgfileformat, etc.
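To make the documented rule concrete, here is a minimal sketch that models it on plain strings. LegacyStarExtensionMatch is a hypothetical helper, not a .NET API: it only covers "*.ext" patterns and never touches the file system. The real matching is done by Windows/.NET through the 8.3 short names, so treat this as an illustration of the rule, not its implementation.

```csharp
using System;

// Hypothetical helper: models the documented rule for "*.ext" search patterns.
// Both arguments include the leading dot, e.g. ".jpg".
static bool LegacyStarExtensionMatch(string patternExtension, string fileExtension)
{
    // Dot plus exactly three characters: any extension that STARTS with it matches.
    if (patternExtension.Length == 4)
        return fileExtension.StartsWith(patternExtension, StringComparison.OrdinalIgnoreCase);

    // Any other length: only an exact (case-insensitive) match counts.
    return string.Equals(fileExtension, patternExtension, StringComparison.OrdinalIgnoreCase);
}

string[] extensions = { ".jpg", ".jpg2", ".jpgfileformat", ".jpeg", ".png" };
foreach (string ext in extensions)
    Console.WriteLine($"{ext,-15} {LegacyStarExtensionMatch(".jpg", ext)}");
```

With .jpg as the pattern extension, .jpg, .jpg2 and .jpgfileformat report True, while .jpeg and .png report False.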

This is rather odd behavior (and not very elegant, I must say), introduced to support the 8.3 file name format. As the blog mentioned above explains:

“A file with the name “alongfilename.longextension” has an equivalent 8.3 filename of “along~1.lon”. If we filter the extensions “.lon”, then the above 8.3 filename will be a match.”
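The quote is easier to follow with a rough model of how a short name is built. The sketch below is a simplification under stated assumptions: SimplifiedShortName is a hypothetical helper that keeps the first six characters of the base name, appends "~1" and truncates the extension to three characters. It ignores invalid-character stripping, name collisions (~2, ~3, ...) and volumes where 8.3 name generation is disabled.

```csharp
using System;
using System.IO;

// Hypothetical, simplified model of a Windows 8.3 short name (see assumptions above).
static string SimplifiedShortName(string longName)
{
    string baseName = Path.GetFileNameWithoutExtension(longName).ToUpperInvariant();
    string extension = Path.GetExtension(longName).TrimStart('.').ToUpperInvariant();

    // Six characters of the base name plus "~1", three characters of the extension.
    string shortBase = baseName.Length > 6 ? baseName.Substring(0, 6) : baseName;
    string shortExt = extension.Length > 3 ? extension.Substring(0, 3) : extension;

    return shortBase + "~1" + (shortExt.Length > 0 ? "." + shortExt : "");
}

string shortName = SimplifiedShortName("alongfilename.longextension");

// A "*.lon" filter checked against the short name succeeds, even though the
// long extension is ".longextension".
bool matchesLon = shortName.EndsWith(".LON", StringComparison.OrdinalIgnoreCase);
Console.WriteLine($"{shortName} matches *.lon: {matchesLon}");
```

Windows typically keeps six characters of the base name, so this model yields ALONGF~1.LON rather than the abbreviated form in the quote; either way the short extension becomes .LON, which is what makes the filter match.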

That's why the GetFiles() method behaves this way. The official MSDN explanation:

Note

When using the asterisk wildcard character in a searchPattern (for example, "*.txt"), the matching behavior varies depending on the length of the specified file extension. A searchPattern with a file extension of exactly three characters returns files with an extension of three or more characters, where the first three characters match the file extension specified in the searchPattern. A searchPattern with a file extension of one, two, or more than three characters returns only files with extensions of exactly that length that match the file extension specified in the searchPattern. When using the question mark wildcard character, this method returns only files that match the specified file extension. For example, given two files in a directory, "file1.txt" and "file1.txtother", a search pattern of "file?.txt" returns only the first file, while a search pattern of "file*.txt" returns both files.

In my case, this caused a bug in my software: I had temporarily renamed an XML file to xxx.XML2222 just to take it out of the application's way, but the program kept reading it, which made it misbehave.

A workaround for this issue

If you want to prevent this behavior, you need to check the returned array of FileInfo objects manually and remove the entries that do not match your pattern exactly. An elegant way to do so is to write an extension method for the DirectoryInfo class, like the following one:

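using System;
using System.Collections.Generic;
using System.IO;

// Extension methods must be declared in a non-generic static class; the class name is arbitrary.
public static class DirectoryInfoExtensions
{
    /// <summary>
    /// Returns array of files that match the search wildcard, but with an exact match for the extension.
    /// </summary>
    /// <param name="dinfo"> Directory to search </param>
    /// <param name="pSearchWildcard"> Search wildcard, in the format: *.xml or file?.dat </param>
    /// <returns> Array of FileInfo objects </returns>
    public static FileInfo[] GetFilesByExactMatchExtension(this DirectoryInfo dinfo, string pSearchWildcard)
    {
        FileInfo[] files = dinfo.GetFiles(pSearchWildcard);
        if (files.Length == 0)
            return files;

        string extensionSearch = Path.GetExtension(pSearchWildcard);

        // A wildcard inside the extension (e.g. "*.*" or "*.x?l") cannot be compared
        // literally, so in that case the unfiltered result is returned.
        if (extensionSearch.IndexOfAny(new[] { '*', '?' }) >= 0)
            return files;

        List<FileInfo> filtered = new List<FileInfo>();
        foreach (FileInfo finfo in files)
        {
            // Keep only exact (case-insensitive) extension matches, dropping ".xml2222" and the like.
            if (string.Equals(finfo.Extension, extensionSearch, StringComparison.OrdinalIgnoreCase))
                filtered.Add(finfo);
        }

        return filtered.ToArray();
    }
}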

This way, right next to the regular GetFiles() method of the DirectoryInfo class, you will now find the brand-new GetFilesByExactMatchExtension() method, which behaves as desired.

Note: as with any other extension method, to use it from a class you need to add a using directive for the namespace that contains the extension method's class.
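To see the workaround end to end without depending on the extension method being compiled into your project, here is a self-contained sketch that recreates the situation from my bug in a temporary directory. The file names settings.xml and settings.XML2222 are made up for the example, and the same exact-extension check is applied inline with LINQ. How many files the raw GetFiles("*.xml") call returns depends on the runtime and on whether 8.3 names are generated on the volume, so only the filtered count should be relied on.

```csharp
using System;
using System.IO;
using System.Linq;

// Throw-away directory holding a real XML file and a "renamed out of the way" one.
string dir = Path.Combine(Path.GetTempPath(), "getfiles-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
File.WriteAllText(Path.Combine(dir, "settings.xml"), "<settings />");
File.WriteAllText(Path.Combine(dir, "settings.XML2222"), "<settings />");

DirectoryInfo dinfo = new DirectoryInfo(dir);

// Raw result: may include settings.XML2222 where the 8.3 matching quirk applies.
FileInfo[] raw = dinfo.GetFiles("*.xml");

// Same exact-extension check as GetFilesByExactMatchExtension, written inline.
FileInfo[] exact = raw
    .Where(f => string.Equals(f.Extension, ".xml", StringComparison.OrdinalIgnoreCase))
    .ToArray();

Console.WriteLine($"GetFiles: {raw.Length} file(s), exact match: {exact.Length} file(s)");

// Clean up the temporary directory and its files.
Directory.Delete(dir, recursive: true);
```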

Hope it helps!
