I had this problem the other day. The data was in English and Spanish, so it contained a variety of special characters. I was using Mechanize, a Ruby library (gem) that automates web scraping, to download an Excel spreadsheet and import the data, but I kept seeing names like Andr? and Ant?nio. Here is how I solved it:
Short Answer:
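require 'iconv'  # Ruby 1.8/1.9 standard library (removed from the stdlib in Ruby 2.0)

# a bunch of code to get the Excel response with Mechanize...
corruptbody = agent.page.body.strip
# Iconv.conv(to, from, string): convert the Windows-1252 (cp1252) bytes to UTF-8
body = Iconv.conv('UTF-8', 'cp1252', corruptbody)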
Quick and painless. Ruby's Iconv class came through for me. The hard part was figuring out which character set Microsoft was using for the Excel spreadsheet.
cp1252 (also called Windows-1252) is Microsoft's character set for Western European languages. You can find a list of character encodings under Resources below.
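Iconv was removed from Ruby's standard library in Ruby 2.0, so on newer Rubies the same conversion is usually done with String#encode (available since Ruby 1.9). Here is a minimal sketch, assuming the raw bytes really are Windows-1252; the sample bytes are made up and stand in for agent.page.body:

```ruby
# Hypothetical stand-in for agent.page.body: "André António" as
# Windows-1252 bytes (é = 0xE9, ó = 0xF3). .b makes a binary copy.
raw = "Andr\xE9 Ant\xF3nio".b

# Label the bytes with their real encoding, then transcode them to UTF-8.
body = raw.force_encoding('Windows-1252').encode('UTF-8')

puts body           # prints: André António
puts body.encoding  # prints: UTF-8
```

'Windows-1252' here and 'cp1252' in the Iconv call name the same encoding; Ruby also accepts 'CP1252' as an alias.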
Long Answer:
First I tried using Ruby's String instance methods to get the content into UTF-8:
corruptbody = agent.page.body.strip.force_encoding('UTF-8')
This didn't work, because I was using Ruby 1.8.7, which doesn't have force_encoding (it was added in Ruby 1.9). So I tried finding out what encoding the string was in to begin with.
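Even on Ruby 1.9, force_encoding would not have fixed the names on its own: it only changes the label on a string, not its bytes. A small sketch of the difference, using made-up Windows-1252 bytes for "André":

```ruby
# Hypothetical Windows-1252 bytes for "André" (é = 0xE9).
raw = "Andr\xE9".b

# force_encoding relabels the string; the bytes are untouched,
# and a lone 0xE9 byte is not valid UTF-8.
relabeled = raw.dup.force_encoding('UTF-8')
p relabeled.valid_encoding?  # false
p relabeled.bytes            # [65, 110, 100, 114, 233]

# encode transcodes: 0xE9 in Windows-1252 becomes 0xC3 0xA9 in UTF-8.
converted = raw.dup.force_encoding('Windows-1252').encode('UTF-8')
p converted.valid_encoding?  # true
p converted.bytes            # [65, 110, 100, 114, 195, 169]
```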
puts corruptbody.encoding
==> UTF-8
This confused me, because I thought I needed to convert the text to UTF-8 to get rid of the question marks, and it already said it was UTF-8. I found a post from someone who had the same problem and tried what they suggested, but nothing changed.
After some digging around, I realized the label was misleading: the string claimed to be UTF-8, but the bytes themselves were in some other encoding. So the problem was not Ruby's string handling; Mechanize simply had no way of knowing what encoding the Excel data was in.
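The symptom is easy to reproduce: a string labeled UTF-8 that actually holds Windows-1252 bytes reports UTF-8 as its encoding but fails valid_encoding?, and replacing the invalid bytes with ? gives exactly the mangled names. This is a sketch with made-up sample bytes; String#scrub needs Ruby 2.1 or later, and it only illustrates how the ? characters can arise, not necessarily what Mechanize does internally:

```ruby
# Hypothetical: Windows-1252 bytes for "André António", mislabeled as UTF-8.
mislabeled = "Andr\xE9 Ant\xF3nio".b.force_encoding('UTF-8')

p mislabeled.encoding         # #<Encoding:UTF-8>
p mislabeled.valid_encoding?  # false

# Replace each invalid byte sequence with "?" (String#scrub, Ruby 2.1+).
p mislabeled.scrub('?')       # "Andr? Ant?nio"
```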
Naturally, I figured the Excel data was using some Spanish character set that I needed to convert from (even though most Latin character sets include all the characters that both English and Spanish need). I read around some more and found out that ISO-8859-1 is commonly used for Latin alphabets. However, when I tried converting from it to UTF-8, nothing changed.
Finally I found the answer on a list of character encodings (linked under Resources below). This led me to the Iconv solution:
corruptbody = agent.page.body.strip
body = Iconv.conv('UTF-8', 'cp1252', corruptbody)
Resources:
List of all character encodings: http://en.wikipedia.org/wiki/Character_encoding
String encoding in Ruby 1.9: http://blog.grayproductions.net/articles/ruby_19s_string
Ruby Iconv documentation: http://www.ruby-doc.org/stdlib/libdoc/iconv/rdoc/classes/Iconv.html
Mechanize documentation: http://mechanize.rubyforge.org/mechanize/
Railscast on using Mechanize: http://railscasts.com/episodes/191-mechanize