Converting from Unicode to characters and symbols in Python p.2


As requested, this is a tutorial showing users how to handle Unicode on websites like Twitter. It can be used to convert special language symbols, smileys/emoticons, and really any symbol that you come across as escaped Unicode while you parse or stream.

Since the main request was for Arabic, that is the example used here, though it works across the board.

Comments
Author

I had an .md file with some text in it on Kali Linux. After I tried to open it in Windows 7, my text was replaced with strange text like yours.

jureklanac
Author

Need urgent help:

I want to compare two text files that are in UTF-8 encoding. File 1 is a dictionary of words and File 2 contains a sentence. I want to find the words that are present in both File 1 and File 2.

doorlasst
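For the file comparison asked about above, here is a minimal sketch. It assumes both files are UTF-8, that words are separated by whitespace, and that an exact match is what counts as "similar". The function names and the file names `file1.txt` and `file2.txt` are made up for illustration.

```python
import io
import os
import tempfile


def read_words(path):
    # Decode explicitly as UTF-8 instead of relying on the platform default.
    with io.open(path, "r", encoding="utf-8") as handle:
        return set(handle.read().split())


def common_words(dictionary_path, sentence_path):
    # Words that occur in both files (exact, whitespace-delimited matches).
    return read_words(dictionary_path) & read_words(sentence_path)


def demo_common_words():
    # Build two throwaway UTF-8 files so the sketch runs on its own.
    folder = tempfile.mkdtemp()
    dictionary_path = os.path.join(folder, "file1.txt")
    sentence_path = os.path.join(folder, "file2.txt")
    with io.open(dictionary_path, "w", encoding="utf-8") as handle:
        # Two dictionary words, one per line.
        handle.write(u"\u0645\u0631\u062d\u0628\u0627\n\u0643\u062a\u0627\u0628\n")
    with io.open(sentence_path, "w", encoding="utf-8") as handle:
        # A three-word sentence that shares its first word with the dictionary.
        handle.write(u"\u0645\u0631\u062d\u0628\u0627 \u064a\u0627 \u0635\u062f\u064a\u0642\u064a\n")
    return common_words(dictionary_path, sentence_path)
```

Punctuation attached to a word would prevent a match here, so real text may need stripping or normalization first.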
Author

Has anyone solved this error? Running in Python, I get: 'ascii' codec can't encode characters in position ###: ordinal not in range(128). I know what that means: ASCII is 7 bits, so its character set covers a total of 128 different characters (code points 0 to 127), and characters not in that set need some 8-bit or wider encoding. I know it will work in Python 3, but I need it to work in Python 2 because I'm trying to convert or port the script to be compatible with Python 3. Thanks.

lucioiams
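To make the error above concrete, the following sketch reproduces it on purpose and then encodes the same text as UTF-8. The variable names are illustrative. In Python 2 the same exception usually comes from an implicit ASCII encode (for example on `print` or `write`), so the explicit `encode("ascii")` call here only stands in for that hidden step.

```python
text = u"\u0645\u0631\u062d\u0628\u0627"  # five Arabic letters

try:
    # ASCII only covers code points 0..127, so this is expected to fail.
    text.encode("ascii")
except UnicodeEncodeError as error:
    # start and end are the character positions quoted in the message.
    failure = (error.encoding, error.start, error.end)

# Choosing an encoding that covers the characters avoids the error.
encoded = text.encode("utf-8")
```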
Author

Thank you so much.
Is there another way, for example with .csv?

erkamtokgoz
Author

What do you do if it's not a "\n" escape, but "\x"? This is killing me.

tomstieve
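On the "\x" question above: one common cause is UTF-8 bytes that were printed as literal `\xNN` text. Under that assumption (which may not match every case), a sketch of the round trip looks like this. `decode_hex_escapes` is a made-up helper name.

```python
def decode_hex_escapes(escaped):
    # Step 1: turn the literal "\xNN" sequences into characters U+0000..U+00FF.
    as_code_points = escaped.encode("ascii").decode("unicode_escape")
    # Step 2: map each of those characters back to a single raw byte.
    raw_bytes = as_code_points.encode("latin-1")
    # Step 3: interpret the bytes as UTF-8 (an assumption about the source).
    return raw_bytes.decode("utf-8")


word = decode_hex_escapes(r"\xd9\x85\xd8\xb1\xd8\xad\xd8\xa8\xd8\xa7")
```

If the bytes were produced by a different encoding, step 3 needs that encoding's name instead of "utf-8".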
Author

You are awesome... I have one more question, if you don't mind.
In the Twitter streaming program, every time I write Arabic it isn't accepted. How can I search for Arabic hashtags? Thanks a million...

haysoom
Author

x.decode('unicode-escape') is the magic word

kenmcc
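As a small illustration of the `unicode-escape` tip above: in Python 2 a byte string can call `.decode('unicode-escape')` directly, while in Python 3 a `str` has no `decode` method, so the sketch below goes through bytes first. It assumes the escaped text is plain ASCII; `unescape_unicode` is a hypothetical helper name.

```python
def unescape_unicode(escaped):
    # The ASCII text "\u0645" becomes the single character U+0645.
    return escaped.encode("ascii").decode("unicode_escape")


# A raw string keeps the backslashes as literal characters.
word = unescape_unicode(r"\u0645\u0631\u062d\u0628\u0627")
```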
Author

Hello my friend, thanks a lot for the solution. It works perfectly when I try to print the word in Arabic letters! However, it doesn't work when I try to save it to a file. It says: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128). Any solutions?

benkheddayoucef
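For the save-to-file error above, one possible approach is to open the file with an explicit encoding so nothing falls back to ASCII. This is a sketch, assuming UTF-8 is an acceptable output encoding. The helper names and the `arabic.txt` path are invented for the example.

```python
import io
import os
import tempfile


def save_text(path, text):
    # io.open accepts an encoding argument in both Python 2 and Python 3.
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def load_text(path):
    # Read the file back with the same encoding it was written with.
    with io.open(path, "r", encoding="utf-8") as handle:
        return handle.read()


# Hypothetical output location, used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "arabic.txt")
save_text(demo_path, u"\u0645\u0631\u062d\u0628\u0627")
```

In Python 2 the text passed to `save_text` has to be a `unicode` object, which is what `x.decode('unicode-escape')` returns.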
Author

I keep getting this error...
Traceback (most recent call last):
  File "pythonstreamarabic.py", line 6, in <module>
    print x.decode('unicode-escape')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 15-17: ordinal not in range(128)

haysoom
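The traceback above suggests that the decode itself worked and the failure happened when `print` tried to encode the result for an ASCII console. A hedged workaround is to encode explicitly for the output stream and replace anything it cannot show. `encode_for_stream` is a made-up helper, and the UTF-8 fallback is an assumption.

```python
import sys


def encode_for_stream(text, stream=None):
    # Use the stream's declared encoding; fall back to UTF-8 when it has none
    # (for example when output is piped in Python 2).
    stream = stream if stream is not None else sys.stdout
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # "replace" substitutes "?" for characters the encoding cannot express
    # instead of raising UnicodeEncodeError.
    return text.encode(encoding, "replace")
```

In Python 2 the returned byte string can be passed straight to `print`. Setting the `PYTHONIOENCODING` environment variable to `utf-8` before running the script is another option when the terminal can actually display UTF-8.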
Author

I got an error: invalid escape sequence '\/'

SintaxErorr
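The `\/` sequence above is a JSON escape, not a Python one, so a likely (but unconfirmed) cause is that the text being decoded is JSON. In that case, a sketch using the standard `json` module handles both `\uXXXX` and `\/` at once. The payload below is a hypothetical example, not real data.

```python
import json

# Hypothetical payload in the style of an escaped JSON response.
raw = r'{"text": "\u0645\u0631\u062d\u0628\u0627 http:\/\/example.com\/a"}'

# json.loads understands both \uXXXX and \/ escapes.
text = json.loads(raw)["text"]
```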