Detecting the Character Encoding of a File

13 May

There are several occasions when it’s necessary to automatically detect the encoding that’s used by a file: perhaps your program has an “Import” feature that allows the user to open an arbitrary text file, or perhaps you need to read an HTML file and don’t have access to (or can’t trust) the Content-Type HTTP header. (For an introduction to encodings, see The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky.)

On U.S. English Windows, you can usually assume that a file is encoded with either UTF-8 or Windows-1252, but if you guess wrong, you might get text that looks like this:

�It�s mine,� he said.

or this:

â€œItâ€™s mine,â€ he said.

or, worse yet, this:

Unhandled System.Text.DecoderFallbackException
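That exception is what a UTF-8 decoder configured to throw on invalid bytes reports when it meets bytes that aren’t valid UTF-8, and the same check can drive a crude guess. The following C# sketch assumes .NET 5 or later (for top-level statements and the in-box CodePagesEncodingProvider); GuessEncoding is a hypothetical helper, not a framework API. It decodes the bytes strictly as UTF-8 and falls back to Windows-1252 only if that throws, and it also reproduces the first two symptoms above.

```csharp
using System;
using System.Text;

// On .NET Core and .NET 5+, legacy code pages such as Windows-1252 are only
// available after registering this provider (not needed on .NET Framework).
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

var windows1252 = Encoding.GetEncoding(1252);

// Strict UTF-8: throwOnInvalidBytes = true makes malformed input raise
// DecoderFallbackException instead of decoding to U+FFFD replacement characters.
var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

// Hypothetical helper: if the bytes are valid UTF-8, assume UTF-8;
// otherwise fall back to Windows-1252. Any other single-byte code page
// is also reported as Windows-1252, so this is only a heuristic.
Encoding GuessEncoding(byte[] bytes)
{
    try
    {
        strictUtf8.GetString(bytes);
        return strictUtf8;
    }
    catch (DecoderFallbackException)
    {
        return windows1252;
    }
}

string text = "“It’s mine,” he said.";
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
byte[] cp1252Bytes = windows1252.GetBytes(text);

Console.WriteLine(GuessEncoding(utf8Bytes).CodePage);   // 65001 (UTF-8)
Console.WriteLine(GuessEncoding(cp1252Bytes).CodePage); // 1252

// Guessing wrong reproduces the first two symptoms.
// Encoding.UTF8 does not throw; each invalid byte becomes U+FFFD.
Console.WriteLine(Encoding.UTF8.GetString(cp1252Bytes));
// UTF-8 bytes read as Windows-1252: begins "â€œItâ€™s mine,â€".
Console.WriteLine(windows1252.GetString(utf8Bytes));
```

This is far less capable than a statistical detector: it cannot tell Windows-1252 apart from other single-byte code pages, and a short Windows-1252 file can occasionally happen to be valid UTF-8.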

While it’s obviously best to know the encoding that’s used by the input you’re processing, sometimes there’s no way to know it ahead of time. In that case, there are libraries that can guess the encoding, usually based on statistical analysis of the bytes or detection of invalid byte sequences. The Mozilla project has a universal charset detector, and since IE5, Microsoft has shipped MLang, a COM component that provides code page detection through the IMultiLanguage2.DetectCodepageInIStream method.

The COM interfaces and structures we need are declared as follows (definitions taken from MLang.h in the Windows SDK):

Checking for possibly null values in LINQ

30 Apr

I recently encountered some confusing code that was written to work around the following issue. Let’s say you want to find all items whose title is null. Using LINQ, you could do something like this:

var titleless = items.Where(x => x.Title == null);

This works just fine in LINQ to Objects, LINQ to SQL, and LINQ to Entities. But what if you instead want to find all items whose title is equal to a variable that may or may not be null?

string title = null;
var titleless = items.Where(x => x.Title == title);

This works in LINQ to Objects, but not in LINQ to SQL or LINQ to Entities, because they generate SQL like this:

select * from Items where Title = @x 

which, when @x is null, is equivalent to

select * from Items where Title = null 

which doesn’t match anything, because in SQL comparing a value to null with = yields unknown rather than true, so even rows whose Title is null are excluded. It needs to generate this:

select * from Items where Title is null 

You can make it work in LINQ to SQL if you use object.Equals:

var titleless = items.Where(x => object.Equals(x.Title, title)); 

That generates the is null test when title is null. But it doesn’t work in LINQ to Entities. You can make it work in LINQ to Entities with this statement:

var titleless = items.Where(x => title == null ? x.Title == null : x.Title == title);

But that generates some scary SQL akin to this:

select * from Items where (
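All three predicates above behave identically when LINQ to Objects evaluates them, because C#’s string equality treats two nulls as equal; the differences appear only when a provider translates the expression to SQL. The following sketch illustrates this with a hypothetical in-memory array of anonymous objects standing in for the Items table, assuming C# 9 or later for top-level statements. It says nothing about the SQL any particular provider generates; to see that, you would need to inspect the queries sent to a real database.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for the Items table.
var items = new[]
{
    new { Id = 1, Title = (string?)"Alpha" },
    new { Id = 2, Title = (string?)null },
    new { Id = 3, Title = (string?)"Beta" },
    new { Id = 4, Title = (string?)null },
};

string? title = null;

// In C#, string == treats two nulls as equal, so all three predicates
// select the untitled items when run by LINQ to Objects.
var byOperator = items.Where(x => x.Title == title).ToList();
var byObjectEquals = items.Where(x => object.Equals(x.Title, title)).ToList();
var byConditional = items.Where(x => title == null ? x.Title == null : x.Title == title).ToList();

Console.WriteLine(string.Join(",", byOperator.Select(x => x.Id)));     // 2,4
Console.WriteLine(string.Join(",", byObjectEquals.Select(x => x.Id))); // 2,4
Console.WriteLine(string.Join(",", byConditional.Select(x => x.Id)));  // 2,4
```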

What Makes Good Code Good?

19 Feb

The Logos software development team has a large set of coding guidelines that have evolved over the years. We record them all on an internal wiki, each in its own article. The scope of each guideline varies widely; some guidelines give broad advice, and others prescribe a specific coding style for a specific language.

Why do we have coding guidelines? To encourage the writing of good code. But what is good code? My favorite description of good code can be found in an MSDN Magazine article by the late Paul DiLascia. In his End Bracket article titled “What Makes Good Code Good?” he explains that