AWS Athena

Gibberish in AWS Athena instead of Hebrew?

The problem is usually the original encoding of the source file.

But sometimes the cause is line endings that differ between operating systems. In that case, just run dos2unix, and try not to open the files on Windows.

brew install dos2unix

dos2unix filename.csv
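
Before converting, it can help to confirm that the file really has Windows line endings. The following is a minimal sketch in POSIX shell; count_crlf and strip_cr are hypothetical helper names, not part of dos2unix. strip_cr is only a fallback for machines where dos2unix is not installed, and it removes every carriage return, not just those at line ends.

```shell
# Count the lines that contain a carriage return (the CR of a Windows CRLF).
count_crlf() {
    cr=$(printf '\r')
    # grep -c prints 0 and exits non-zero when nothing matches, so ignore the status
    grep -c "$cr" "$1" || true
}

# Fallback when dos2unix is unavailable: write a copy with all CR bytes removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Usage (filename.csv is the example file from this post):
#   count_crlf filename.csv              # a result above 0 suggests CRLF line endings
#   strip_cr filename.csv filename.unix.csv
```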

 

Check the file type from the command line (file -I is the macOS form; on Linux the equivalent flag is lowercase, file -i):

file -I filename.csv

If the file is the problem, the result will look something like the line below. A file encoded as iso-8859-8 or UTF-8 should display Hebrew correctly, so a file that shows gibberish is probably in a different encoding:

text/plain; charset=iso-8859-1
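
Keep in mind that file only guesses the charset from byte patterns, so its answer is a hint rather than proof. A stricter check, sketched here, asks iconv to decode the file as UTF-8 and looks at the exit status. The helper name is_utf8 is made up for illustration.

```shell
# Succeeds (exit status 0) only if the whole file decodes as valid UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage (filename.csv is the example file from this post):
#   if is_utf8 filename.csv; then
#       echo "valid UTF-8"
#   else
#       echo "not UTF-8, needs converting"
#   fi
```

A file that passes this check but still shows gibberish most likely has a different problem, such as the line endings mentioned earlier.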

Then the challenge is to convert it…

3 ways to convert your gibberish text file into Hebrew:

  1. Microsoft Excel: rename the file to filename.txt and open it. A wizard will open, letting you choose the encoding.
  2. Linux CMD:

    iconv -f iso-8859-1 -t utf-8 < file > file.new

  3. An online converter to UTF-8. I used: https://subtitletools.com/convert-text-files-to-utf8-online
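
For option 2, the iconv step can be wrapped so that the output is also checked. This is a rough sketch, not a definitive script: to_utf8 is a hypothetical helper name, and the source encoding is passed in because the right value depends on your file. If converting from iso-8859-1 gives accented Latin letters instead of Hebrew, the bytes may actually be in a Hebrew encoding such as iso-8859-8. That is an assumption to test against your own file.

```shell
# Convert a file to UTF-8 from a given source encoding, then verify the result.
# Arguments: source encoding, input file, output file.
to_utf8() {
    src_enc=$1
    src=$2
    dst=$3
    # Stop if iconv cannot decode the input with the given encoding
    iconv -f "$src_enc" -t UTF-8 "$src" > "$dst" || return 1
    # Sanity check: the output must decode cleanly as UTF-8
    iconv -f UTF-8 -t UTF-8 "$dst" > /dev/null
}

# Usage (filename.csv is the example file from this post):
#   to_utf8 iso-8859-1 filename.csv filename.utf8.csv
```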

 

Try testing the file locally. If you see Hebrew on your desktop, you should be fine on Athena.

Have fun!

—————————————————————————————————–




I put a lot of thought into these blogs, so I could share the information in a clear and useful way. If you have any comments, thoughts, or questions, or you need someone to consult with, feel free to contact me:

https://www.linkedin.com/in/omid-vahdaty/


