Thursday, March 27, 2008

VB.NET - reading an extended ASCII text file - StreamReader changes extended ASCII characters to question marks

Ran into an issue today reading a text file in Visual Basic .NET. The file had extended ASCII characters (Windows "curly quotes," to be exact) that were being read in as question marks. StreamReader decodes text as UTF-8 unless told otherwise, and single bytes above 0x7F from a Windows ANSI file are not valid UTF-8 on their own. The solution was to pass Encoding.Default to the StreamReader constructor, which on the .NET Framework is the system's "Windows ANSI" code page:

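Dim fileReader As System.IO.StreamReader
Dim filePath As String
Dim curLine As String

filePath = "C:\path\to\file.txt"
' Encoding.Default is the system's ANSI code page on the .NET Framework
fileReader = New System.IO.StreamReader(filePath, System.Text.Encoding.Default)

' Peek returns -1 once there is nothing left to read
While fileReader.Peek() <> -1
    curLine = fileReader.ReadLine()
    ' do something with curLine
End While

fileReader.Close()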
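
If the file needs to read the same way on every machine, one option is to name the code page instead of relying on Encoding.Default. The following is only a sketch, assuming the file is Windows-1252 (the post only establishes "Windows ANSI," so 1252 is a guess that fits curly quotes). ReadAnsiLines and AnsiReaderSketch are hypothetical names made up for this example. On the .NET Framework, Encoding.GetEncoding(1252) works out of the box. On .NET Core and later, Encoding.Default is UTF-8 and code page encodings have to be registered first through the System.Text.Encoding.CodePages provider, which this sketch does not do.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module AnsiReaderSketch

    ' Hypothetical helper: reads every line of a file, decoding it with the given code page.
    Function ReadAnsiLines(ByVal filePath As String, ByVal codePage As Integer) As List(Of String)
        Dim lines As New List(Of String)
        Dim ansi As Encoding = Encoding.GetEncoding(codePage)

        ' Using closes the reader even if an exception is thrown mid-read.
        Using reader As New StreamReader(filePath, ansi)
            Dim curLine As String = reader.ReadLine()
            While curLine IsNot Nothing
                lines.Add(curLine)
                curLine = reader.ReadLine()
            End While
        End Using

        Return lines
    End Function

    Sub Main()
        ' Write a small test file: bytes &H93 and &H94 are the curly double quotes in Windows-1252.
        Dim tempFile As String = System.IO.Path.GetTempFileName()
        File.WriteAllBytes(tempFile, New Byte() {&H93, &H68, &H69, &H94})

        For Each textLine As String In ReadAnsiLines(tempFile, 1252)
            Console.WriteLine(textLine)
        Next

        File.Delete(tempFile)
    End Sub

End Module
```

.NET strings are always Unicode, so a correctly decoded character shows up as its Unicode code point rather than its original byte value. In Windows-1252, for example, the byte 0x80 decodes to U+20AC (the euro sign), which is the expected result and not a sign of corruption.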

10 comments:

Gav said...

I get Unicode when I read in values over 0x7F (128 and up). Any workaround?
For example, when I read in 0x80, StreamReader returns 20AC.
Thanks in advance.

Anonymous said...

Well... Not sure if you have already solved this, but it seems that you have to start reading bytes. In theory there's no standard Extended ASCII, which is a pain in the neck.

Anonymous said...

Thanks. That was simpler than my Binary Stream approach.

Anonymous said...

Thanks for that - really saved me some time!

Philipp said...

Thanks for your post!

I ran into that issue when I was reading an RSS XML feed that uses a Latin character set. The following code solved my problem:

// Use the charset reported by the response when there is one; otherwise fall back to Encoding.Default
Encoding encoding = !String.IsNullOrEmpty(myHttpWebResponse.CharacterSet)
    ? Encoding.GetEncoding(myHttpWebResponse.CharacterSet)
    : Encoding.Default;
myStreamReader = new StreamReader(myHttpWebResponse.GetResponseStream(), encoding);

Nier said...

Great solution. Thanks! I was using Encoding.ASCII before, but the extended ASCII characters were not being recognized. Now I'm using Encoding.Default and everything seems to be fine.

Anonymous said...

Thank you very much... After a long search I found this damn useful code. Thanks, dude.