Asked  7 Months ago    Answers:  5   Viewed   176 times

Why does the code below fail, and why does it succeed with the "latin-1" codec?

o = "a test of \xe9 char"  # I want this to remain a string, as this is what I am receiving
v = o.decode("utf-8")

Which results in:

 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
     return codecs.utf_8_decode(input, errors, True)
 UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 10: invalid continuation byte

 Answers

99

In binary, 0xE9 is 1110 1001. If you read about UTF-8 on Wikipedia, you’ll see that a byte of the form 1110 xxxx starts a three-byte sequence, so it must be followed by two bytes of the form 10xx xxxx. So, for example:

>>> b'\xe9\x80\x80'.decode('utf-8')
u'\u9000'
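
The lead-byte rule can be sketched in a few lines. This is only an illustration, assuming Python 3; `expected_continuations` and `is_continuation` are hypothetical helper names, not part of any library, and they inspect only the high bits of a byte.

```python
def expected_continuations(lead_byte):
    """Return how many 10xxxxxx bytes should follow a UTF-8 lead byte.

    Hypothetical helper for illustration: it looks only at the high bits
    and does not reject overlong or out-of-range sequences.
    """
    if lead_byte & 0b10000000 == 0b00000000:   # 0xxxxxxx: single byte (ASCII)
        return 0
    if lead_byte & 0b11100000 == 0b11000000:   # 110xxxxx: two-byte sequence
        return 1
    if lead_byte & 0b11110000 == 0b11100000:   # 1110xxxx: three-byte sequence
        return 2
    if lead_byte & 0b11111000 == 0b11110000:   # 11110xxx: four-byte sequence
        return 3
    raise ValueError("not a valid UTF-8 lead byte: 0x%02x" % lead_byte)


def is_continuation(byte):
    """True if the byte has the form 10xxxxxx."""
    return byte & 0b11000000 == 0b10000000


print(expected_continuations(0xE9))   # 0xE9 is 1110 1001
print(is_continuation(ord(" ")))      # the space that follows \xe9 in the question
```

In the question's string, the byte after 0xE9 is a plain space (0x20), which is not of the form 10xx xxxx, hence "invalid continuation byte".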

But that’s just the mechanical cause of the exception. In this case, you have a string that is almost certainly encoded in Latin-1. You can see how the UTF-8 and Latin-1 encodings of the same character differ:

>>> u'\xe9'.encode('utf-8')
b'\xc3\xa9'
>>> u'\xe9'.encode('latin-1')
b'\xe9'

(Note, I'm using a mix of Python 2 and 3 representation here. The input is valid in any version of Python, but your Python interpreter is unlikely to actually show both unicode and byte strings in this way.)
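
As for why "latin-1" succeeds: Latin-1 maps every byte value from 0x00 to 0xFF to the code point with the same number, so decoding with it can never raise, whether or not the result is the text that was intended. A minimal sketch, assuming Python 3, where the question's string has to be written as a bytes literal:

```python
data = b"a test of \xe9 char"   # the question's bytes as a Python 3 bytes literal

try:
    data.decode("utf-8")
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start        # offset of the byte the codec rejected

text = data.decode("latin-1")    # every byte maps to exactly one code point

print(failed_at)
print(text == u"a test of \xe9 char")
print(text.encode("utf-8"))      # the same text re-encoded as valid UTF-8
```

If the data really is Latin-1, decoding it once at the input boundary and working with Unicode text from then on avoids the error.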

Tuesday, June 1, 2021
 
Raef
answered 7 Months ago
39

The problem is with the string

"C:\Users\Eric\Desktop\beeline.txt"

Here, \U in "C:\Users... starts an eight-character Unicode escape, such as \U00014321. In your code, the escape is followed by the character 's', which is invalid.

You either need to duplicate all backslashes:

"C:\\Users\\Eric\\Desktop\\beeline.txt"

Or prefix the string with r (to produce a raw string):

r"C:\Users\Eric\Desktop\beeline.txt"
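
A quick check that both spellings give the same string, and that the unescaped literal is rejected. This is a sketch assuming Python 3, where every string literal processes \U escapes (in Python 2 only u"..." literals do); the name `literal_compiles` is just for illustration.

```python
doubled = "C:\\Users\\Eric\\Desktop\\beeline.txt"   # every backslash escaped
raw = r"C:\Users\Eric\Desktop\beeline.txt"          # raw string, backslashes kept literally
print(doubled == raw)

# The source text handed to compile() is the literal "C:\Users" written with
# single backslashes, i.e. the problematic form from the question.
try:
    compile('"C:\\Users"', "<example>", "eval")
    literal_compiles = True
except SyntaxError:
    literal_compiles = False
print(literal_compiles)
```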
Tuesday, June 1, 2021
 
dimitarvp
answered 7 Months ago
17

The error occurs because the dictionary contains a non-ASCII character that can't be encoded or decoded with the default codec. One simple way to avoid it is to encode such strings explicitly with encode(), as follows (where a is the string containing the non-ASCII character):

a.encode('utf-8').strip()
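
To make the difference concrete, here is a small sketch with a made-up value for `a`. It assumes Python 3 semantics, where `encode()` returns a bytes object and `strip()` then removes surrounding ASCII whitespace from those bytes.

```python
a = u"  caf\xe9  "   # hypothetical value containing a non-ASCII character

try:
    a.encode("ascii")            # what an implicit ASCII conversion would attempt
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

encoded = a.encode("utf-8").strip()   # explicit UTF-8, surrounding whitespace removed
print(ascii_ok)
print(encoded)
```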
Tuesday, June 1, 2021
 
hillz
answered 7 Months ago
35

This happens because you chose the wrong encoding.

Since you are working on a Windows machine, just replacing

Past=pd.read_csv("C:/Users/Admin/Desktop/Python/Past.csv",encoding='utf-8') 

with

Past=pd.read_csv("C:/Users/Admin/Desktop/Python/Past.csv",encoding='cp1252')

should solve the problem.
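
If you are not sure which encoding a file uses, one option is to try a short list of candidates before loading it. The sketch below uses only the standard library; `first_working_encoding`, the candidate list, and the temporary stand-in file are assumptions for illustration, not part of pandas.

```python
import os
import tempfile


def first_working_encoding(path, candidates=("utf-8", "cp1252")):
    """Return the first encoding in `candidates` that decodes the whole file.

    Hypothetical helper: success only means no byte was invalid for that
    codec, not that the decoded text is what the author intended.
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    for name in candidates:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit: %r" % (candidates,))


# Build a small stand-in for Past.csv, saved the way a Windows tool might save it.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(u"name,price\ncaf\xe9,3\n".encode("cp1252"))

encoding = first_working_encoding(tmp.name)
print(encoding)   # could then be passed as pd.read_csv(path, encoding=encoding)
os.remove(tmp.name)
```

Trying utf-8 first is deliberate: it rejects most non-UTF-8 input, whereas cp1252 accepts almost any byte sequence and would mask a wrong guess.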

Wednesday, July 28, 2021
 
danjah
answered 5 Months ago
37

#-*- coding: xxx -*- has nothing to do with this error, it only applies to the encoding of the source file it is declared in, not the content of variables coming from a database.

Your error says that you are passing a str object containing non-ASCII characters to the unicode() constructor (which is called at line 43 of suds/sax/text.py).

You have to convert the strings coming from the database to unicode objects; for example, if your database is encoded in UTF-8:

title = product[1].decode("UTF-8")

See the str.decode() documentation for details.
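
If several fields need the same treatment, the conversion can be wrapped in a small helper. This is a sketch under assumptions: `to_text` and the sample `product` row are hypothetical, and the check relies on `bytes` being the byte-string type (it is an alias of `str` in Python 2.6+ and a separate type in Python 3).

```python
def to_text(value, encoding="utf-8"):
    """Decode byte strings from an external source; pass other values through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


product = (1, b"Caf\xc3\xa9 cr\xc3\xa8me")   # hypothetical row as returned by a database driver
title = to_text(product[1])
print(title == u"Caf\xe9 cr\xe8me")
```

Decoding at the point where data enters the program keeps everything downstream, including the call into suds, working with Unicode text only.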

Thursday, August 26, 2021
 
Blazemonger
answered 4 Months ago