Thursday, March 27, 2008

VB.NET - reading extended ASCII text file - StreamReader changes extended ASCII to question marks

I ran into an issue today reading a text file in Visual Basic .NET. The file had extended ASCII characters (Windows "curly quotes", to be exact) that were being read in as question marks because they fall outside the 7-bit ASCII character set. StreamReader assumes UTF-8 unless told otherwise, and those single bytes are not valid UTF-8. The solution was to pass System.Text.Encoding.Default to the StreamReader constructor, which on the .NET Framework selects the system's "Windows ANSI" code page:

Dim fileReader As System.IO.StreamReader
Dim filePath As String
Dim curLine As String

filePath = "C:\path\to\file.txt"
' Encoding.Default: the system's ANSI code page (on the .NET Framework)
fileReader = New System.IO.StreamReader(filePath, System.Text.Encoding.Default)

While fileReader.Peek() <> -1
    curLine = fileReader.ReadLine()
    ' do something with curLine
End While

fileReader.Close()
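
One caveat: Encoding.Default depends on the machine and the runtime. On the .NET Framework it is the system's ANSI code page, while on .NET Core and later it is always UTF-8. If the file is known to be Windows-1252, one option is to name the code page explicitly. This is only a minimal sketch, assuming the file really is Windows-1252 and reusing the placeholder path from above; the module name is made up for illustration:

```vbnet
Imports System.IO
Imports System.Text

Module ReadAnsiFile ' hypothetical name, for illustration only
    Sub Main()
        ' Placeholder path, as in the snippet above
        Dim filePath As String = "C:\path\to\file.txt"

        ' Assumption: the file was saved as Windows-1252.
        ' On .NET Core / .NET 5+ code page 1252 is only available after:
        '   Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        Dim ansi As Encoding = Encoding.GetEncoding(1252)

        ' Using closes the reader even if an exception is thrown
        Using fileReader As New StreamReader(filePath, ansi)
            Dim curLine As String = fileReader.ReadLine()
            While curLine IsNot Nothing
                Console.WriteLine(curLine) ' do something with curLine
                curLine = fileReader.ReadLine()
            End While
        End Using
    End Sub
End Module
```

With the code page named explicitly, byte 0x80 decodes to the euro sign (U+20AC) and bytes 0x93/0x94 decode to the curly double quotes, regardless of the machine's regional settings.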

Comments:

cs240 said...

I get Unicode code points when I read in values over 0x7F (127). Is there any workaround?
For example, when I read in 0x80, StreamReader returns U+20AC.
Thanks in advance.

Braulio said...

Well... Not sure if you have already solved this, but it seems that you have to start reading bytes. In theory there's no standard Extended ASCII, which is a pain in the neck.

Anonymous said...

Thanks. That was simpler than my Binary Stream approach.

Anonymous said...

Thanks for that - really saved me some time!

Philipp said...

Thanks for your post!

I ran into that issue when reading an RSS XML feed that uses a Latin character set. The following code solved my problem:

myStreamReader = !String.IsNullOrEmpty(myHttpWebResponse.CharacterSet)
    ? new StreamReader(myHttpWebResponse.GetResponseStream(), Encoding.GetEncoding(myHttpWebResponse.CharacterSet))
    : new StreamReader(myHttpWebResponse.GetResponseStream(), Encoding.Default);

Nier said...

Great solution. Thanks! I was using Encoding.ASCII before, but the extended ASCII characters were not being recognized. Now I'm using Encoding.Default and everything seems to be fine.

Anonymous said...

Thank you very much... After a long search I found this damn useful code. Thanks, dude.