Asked  7 Months ago    Answers:  5   Viewed   40 times

I have the standard XAMPP installation on Win7 (x64). I had my share of encoding troubles in a past project, where the MySQL encoding did not match the PHP encoding, which in turn sometimes output HTML in yet other encodings. So this time I decided to consistently encode everything using UTF-8.

I'm just getting started with the HTML markup and am already experiencing troubles.

  • My page is saved using utf-8 (no BOM, I think)
    //update: It turns out this was NOT the case. The file was actually saved as ISO-8859-1. I found this out later thanks to Sherm Pendley's answer. I had to go back and change my project settings (which were set to "ISO-8859-1") to the desired "UTF-8".
  • php is set per .htaccess to serve .php-pages in utf-8 with: AddCharset UTF-8 .php
  • html has a meta tag specifying: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  • To test, I also used PHP's header('Content-Type: text/html; charset=UTF-8');

The page is evidently served as UTF-8 (Firefox and Chrome recognize it as such), but any special characters such as é, á or ¡ just show up as ?. The same happens when viewing the source code.

When dropping the encoding settings mentioned above all characters are rendered correctly but the encoding that is detected shows either windows-1252 or ISO-8859-1 depending on the browser.

How come? I'm very puzzled. I would have expected the exact opposite behavior.
Any advice is welcome, thanks!

edit: Hopefully this helps a bit more. This is the response header (as per firebug)

HTTP/1.1 200 OK
Date: Sat, 26 Mar 2011 20:49:44 GMT
Server: Apache/2.2.14 (Win32) DAV/2 mod_ssl/2.2.14 OpenSSL/0.9.8l mod_autoindex_color PHP/5.3.1 mod_apreq2-20090110/2.7.1 mod_perl/2.0.4 Perl/v5.10.1
X-Powered-By: PHP/5.3.1
Content-Length: 91
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8

 Answers

25

When [dropping] the encoding settings mentioned above all characters [are rendered] correctly but the encoding that is detected shows either windows-1252 or ISO-8859-1 depending on the browser.

Then that's what you're really sending. None of the encoding settings in your bullet list will actually modify your output in any way; all they do is tell the browser what encoding to assume when interpreting what you send. That's why you're getting those ?s - you're telling the browser that what you're sending is UTF-8, but it's really ISO-8859-1.
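
To make the mismatch concrete, here is a minimal Python sketch (not part of the original answer; the sample text, the variable names and the is_valid_utf8 helper are illustrative assumptions). It encodes the same characters both ways, then decodes the ISO-8859-1 bytes as if they were UTF-8, which is roughly what a browser does when the header promises UTF-8:

```python
# Illustrative sample text; any non-ASCII Latin-1 characters would do.
text = "é, á, ¡"

latin1_bytes = text.encode("iso-8859-1")  # what a file saved as ISO-8859-1 holds
utf8_bytes = text.encode("utf-8")         # what the UTF-8 declaration promises

# Decoding with the encoding the bytes were really written in round-trips.
round_trip = latin1_bytes.decode("iso-8859-1")

# Decoding the Latin-1 bytes as UTF-8 does not work; errors="replace"
# substitutes U+FFFD for the undecodable bytes instead of raising.
as_seen_by_browser = latin1_bytes.decode("utf-8", errors="replace")
print(ascii(as_seen_by_browser))  # ascii() keeps the output console-safe


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(latin1_bytes), is_valid_utf8(utf8_bytes))
```

A check like is_valid_utf8 on the raw file contents is one way to confirm what the editor actually saved, rather than trusting the declared charset.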

Wednesday, March 31, 2021
 
Whakkee
answered 7 Months ago
91

It looks like you have missed the charset declaration in your page.

Try adding <meta charset="UTF-8"> to the head section of your web page. I previously had a similar issue displaying multilingual text in UTF-8, and this solved it.

Hope this helps.

By the way:

For HTML5, <meta charset="UTF-8"> works.

For HTML 4, use:

<meta http-equiv="Content-type" content="text/html;charset=UTF-8">

For XML, you have to specify:

<?xml version="1.0" encoding="UTF-8"?>

All of the details are covered here:

Declaring character encodings in HTML

There are several ways to set the content charset. You can even configure your server to always serve UTF-8; see the server setup section for more information.

EDIT:

After our conversation in the comments:

Your problem is with Joomla.

You tested by setting the charset to ISO-8859 in the web page and it worked. This clearly shows that the content you are getting is in ISO-8859, not UTF-8.

Your MySQL database is probably not in UTF-8, which is why it sends ISO text to the front end. You can change the DB to utf8_general_ci or to ISO latin1, whichever is feasible. I suggest changing the DB to utf8_general_ci, since your HTML pages already declare UTF-8 in their headers; that should solve your problem.

If you can't change the DB, then you already know it uses the ISO charset, so change all your Joomla template headers to the ISO charset.

The meta tag looks like this:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

or, in PHP:

header('Content-Type: text/html; charset=iso-8859-1');

Either one replaces your existing UTF-8 charset declaration.

Saturday, May 29, 2021
 
saad
answered 5 Months ago
76

As justhalf points out above, my question here is essentially a duplicate of this question.

The HTML content reported itself as UTF-8 encoded and, for the most part it was, except for one or two rogue invalid UTF-8 characters.

This apparently confuses BeautifulSoup about which encoding is in use. When I first decoded the content as UTF-8 before passing it to BeautifulSoup, like this:

soup = BeautifulSoup(response.read().decode('utf-8'))

I would get the error:

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 186812-186813: 
                    invalid continuation byte

Looking more closely at the output, there was an instance of the character Ü which was wrongly encoded as the invalid byte sequence 0xe3 0x9c, rather than the correct 0xc3 0x9c.

As the currently highest-rated answer on that question suggests, the invalid UTF-8 characters can be removed while parsing, so that only valid data is passed to BeautifulSoup:

soup = BeautifulSoup(response.read().decode('utf-8', 'ignore'))
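
The effect of the error handler can be reproduced without BeautifulSoup or a network request. This is only a sketch: the short byte strings below are made-up stand-ins for the real page content, built around the byte pair mentioned above.

```python
# Hypothetical stand-in for the downloaded page: "Ü" mis-encoded as
# 0xe3 0x9c instead of the correct 0xc3 0x9c.
bad = b"gr\xe3\x9cn"
good = b"gr\xc3\x9cn"

# Strict decoding (the default) raises on the invalid sequence.
strict_failed = False
try:
    bad.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict decode failed:", exc.reason)

# "ignore" silently drops the undecodable bytes.
ignored = bad.decode("utf-8", "ignore")

# "replace" keeps a visible U+FFFD marker where data was lost.
replaced = bad.decode("utf-8", "replace")

print(ascii(ignored), ascii(replaced), ascii(good.decode("utf-8")))
```

Note that 'ignore' loses the character without any trace, so 'replace' may be preferable when it matters to know that something was dropped.
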
Thursday, June 3, 2021
 
PedroKTFC
answered 5 Months ago
98

Quoting Wikipedia (Em dash):

When an actual em dash is unavailable—as in the ASCII character set—a double ("--") or triple hyphen-minus ("---") is used. In Unicode, the em dash is U+2014 (decimal 8212).

The em dash character is not part of the ASCII character set.
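
This is easy to check in Python. The snippet is a small sketch and not part of the quoted article; the variable names and the "a", "b" sample text are arbitrary.

```python
em_dash = "\u2014"

# The code point matches the decimal value quoted above.
code_point = ord(em_dash)  # 8212 == 0x2014

# Encoding to ASCII fails because ASCII only covers code points 0-127.
try:
    em_dash.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

# In UTF-8 the em dash takes three bytes.
utf8_bytes = em_dash.encode("utf-8")

# The double hyphen-minus fallback described in the quotation.
ascii_fallback = ("a" + em_dash + "b").replace(em_dash, "--")

print(code_point, fits_in_ascii, utf8_bytes, ascii_fallback)
```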

Monday, August 2, 2021
 
IcedAnt
answered 3 Months ago
37

You can achieve this using CSS only (using a checkbox and the :checked CSS state).

Let me know if something is not clear.

Note: the max-height:500px; is just an example. If the content is supposed to be larger, play with this value.

input[type="checkbox"] {
    display:none;
}

label {
  color:blue;
  text-decoration:underline;
  margin-top:10px;
  cursor:pointer;
  display:inline-block;
}

label:after {
  content:"more";  
}

input:checked ~ label:after {
  content:"less";  
}

.inner {
  max-height:100px;
  overflow:hidden;
  transition:all .3s ease;
  width:100px;
}

input:checked + .inner {
  max-height:500px;
}

<div class="outer">
  <input type="checkbox" id="readmore" />
  <div class="inner">
    CONTENT CONTENT CONTENT CONTENT
    CONTENT CONTENT CONTENT CONTENT
    CONTENT CONTENT CONTENT CONTENT
    CONTENT CONTENT CONTENT CONTENT
    CONTENT CONTENT CONTENT CONTENT
    CONTENT CONTENT CONTENT CONTENT
    CONTENT CONTENT CONTENT CONTENT
  </div>
  <label for="readmore">Read </label>
</div>
Friday, September 3, 2021
 
hoof_hearted
answered 2 Months ago