Asked  7 Months ago    Answers:  5   Viewed   31 times

I'm building an XML file from scratch and need to know if htmlentities() converts every character that could potentially break an XML file (and possibly UTF-8 data)?

The values will come from a Twitter/Flickr feed, so I need to be sure.

 Answers

43

htmlentities() is not a guaranteed way to build legal XML.

Use htmlspecialchars() instead of htmlentities() if this is all you are worried about. If you have encoding mismatches between the representation of your data and the encoding of your XML document, htmlentities() may serve to work around/cover them up (it will bloat your XML size in doing so). I believe it's better to get your encodings consistent and just use htmlspecialchars().
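To make the difference concrete, here is a small sketch (assuming PHP 7+ with the DOM extension and UTF-8 input; the `<v>` wrapper element and the sample string are made up for illustration). htmlentities() emits HTML-only entities such as `&eacute;`, which XML does not predefine, so a parser should reject the result, while the htmlspecialchars() output should parse:

```php
<?php
// Assumption: the input is valid UTF-8. "\u{E9}" is the letter e with an acute accent.
$text = "caf\u{E9} & bar";

// htmlentities() turns the accented letter into the HTML entity &eacute;
$viaEntities = htmlentities($text, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only escapes & < > " ' and leaves the letter as raw UTF-8
$viaSpecial = htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');

// Collect libxml parse errors instead of emitting PHP warnings
libxml_use_internal_errors(true);

$docA = new DOMDocument();
$entitiesOk = $docA->loadXML("<v>$viaEntities</v>");

$docB = new DOMDocument();
$specialOk = $docB->loadXML("<v>$viaSpecial</v>");

var_dump($viaEntities, $entitiesOk, $viaSpecial, $specialOk);
```

XML itself only predefines `&lt;`, `&gt;`, `&amp;`, `&apos;` and `&quot;`, which is why the htmlspecialchars() variant is the safer default.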

Also, be aware that if you put the return value of htmlspecialchars() inside XML attributes delimited with single quotes, you need to pass the ENT_QUOTES flag so that any single quotes in your source string are encoded too. I suggest doing this anyway, as it makes your code immune to bugs that would result from someone switching to single-quoted XML attributes in the future.
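As a rough illustration of the ENT_QUOTES advice, the sketch below wraps htmlspecialchars() in a helper and uses it for both element text and a single-quoted attribute. The helper name `xml_escape`, the `<item>` element and the sample title are assumptions made up for this example; `ENT_XML1` (PHP 5.4+) is added so that the single quote is written as `&apos;`.

```php
<?php
// Hypothetical helper, not part of PHP: escape a UTF-8 string for XML text or attributes.
function xml_escape(string $value): string
{
    // ENT_QUOTES escapes both " and '; ENT_XML1 selects XML 1.0 entity names.
    return htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

$title = "Tom's \"best\" <photo> & more";

// The same escaped value is safe in a single-quoted attribute and as element text.
$xml = "<item title='" . xml_escape($title) . "'>" . xml_escape($title) . "</item>";

echo $xml, "\n";
```

Without ENT_QUOTES, the apostrophe in "Tom's" would end the attribute value early and the document would no longer be well-formed.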

Edit: To clarify:

htmlentities() will convert a number of non-ANSI characters (I assume this is what you mean by UTF-8 data) to entities (which are represented with just ANSI characters). However, it cannot do so for any characters which do not have a corresponding entity, and so cannot guarantee that its return value consists only of ANSI characters. That's why I'm suggesting you not use it.

If encoding is a possible issue, handle it explicitly (e.g. with iconv()).
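A minimal sketch of handling the encoding explicitly, assuming the feed may deliver ISO-8859-1 bytes while the XML document is declared as UTF-8. The helper name `to_utf8` and the choice of ISO-8859-1 as the fallback source encoding are assumptions for illustration; in practice you would use whatever encoding the feed actually declares.

```php
<?php
// Hypothetical helper: return $s as UTF-8, converting from $from only when needed.
function to_utf8(string $s, string $from = 'ISO-8859-1'): string
{
    // With the /u modifier, preg_match() fails on byte sequences that are not valid UTF-8.
    if (preg_match('//u', $s) === 1) {
        return $s; // already valid UTF-8, leave untouched
    }

    $converted = iconv($from, 'UTF-8', $s);
    if ($converted === false) {
        throw new RuntimeException("Could not convert input from $from to UTF-8");
    }
    return $converted;
}

$latin1 = "caf\xE9"; // "cafe" with an accented e, encoded as ISO-8859-1
$utf8   = to_utf8($latin1);

// Once the encoding is consistent, plain htmlspecialchars() is enough.
echo htmlspecialchars($utf8, ENT_QUOTES | ENT_XML1, 'UTF-8'), "\n";
```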

Edit 2: Improved answer taking into account Josh Davis's comment below.

Wednesday, March 31, 2021
 
tika
answered 7 Months ago
52

For simple XML it is often easier to just output the string. But the more complex your document gets, the more benefit you will get from using an XML library (either those included with PHP or a third party script) as it will help you to output correct XML.
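For the library route, one option bundled with PHP is XMLWriter. The following is only a sketch; the `$photos` array and the element names are invented for the example. The point is that writeElement() and writeAttribute() take care of escaping, so no manual htmlspecialchars() call is needed.

```php
<?php
// Hypothetical sample data standing in for a feed.
$photos = [
    ['id' => '1', 'title' => 'Sunset & sea'],
    ['id' => '2', 'title' => 'A "quoted" <title>'],
];

$writer = new XMLWriter();
$writer->openMemory();          // build the document in memory
$writer->setIndent(true);
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('photos');

foreach ($photos as $photo) {
    $writer->startElement('photo');
    $writer->writeAttribute('id', $photo['id']);      // attribute value is escaped
    $writer->writeElement('title', $photo['title']);  // element text is escaped
    $writer->endElement(); // </photo>
}

$writer->endElement(); // </photos>
$writer->endDocument();

$xml = $writer->outputMemory();
echo $xml;
```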

For a sitemap, you would probably be best just writing the string.
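Writing the string by hand might look roughly like this. The URLs are placeholders; the namespace is the one defined by the sitemaps.org protocol. Note that even in this simple case each URL still needs escaping, because query strings commonly contain `&`.

```php
<?php
// Placeholder URLs for illustration only.
$urls = [
    'https://example.com/',
    'https://example.com/search?q=xml&page=2',
];

$xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

foreach ($urls as $url) {
    // Escape & < > " ' so the URL is valid XML character data.
    $loc  = htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8');
    $xml .= "  <url><loc>$loc</loc></url>\n";
}

$xml .= "</urlset>\n";

echo $xml;
```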

Saturday, May 29, 2021
 
inVader
answered 5 Months ago
25

Take a look at PEAR's XML_Serializer package. I've used it with pretty good results. You can feed it arrays, objects, etc., and it will turn them into XML. It also has a bunch of options, such as picking the name of the root node.

Should do the trick
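If installing a PEAR package is not an option, the same idea (feed in an array, pick a root name, get XML back) can be sketched with PHP's built-in DOM extension. To be clear, this is not XML_Serializer's API; the function names `array_to_xml` and `append_array` are made up for this example, and it assumes that string keys are valid XML element names.

```php
<?php
// Hypothetical helper: serialize a nested array to an XML string.
function array_to_xml(array $data, string $rootName = 'root'): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $root = $doc->appendChild($doc->createElement($rootName));
    append_array($doc, $root, $data);
    return $doc->saveXML();
}

// Hypothetical helper: recursively append array entries below $parent.
function append_array(DOMDocument $doc, DOMElement $parent, array $data): void
{
    foreach ($data as $key => $value) {
        // Numeric keys are not valid element names, so fall back to <item>.
        $name  = is_int($key) ? 'item' : $key;
        $child = $parent->appendChild($doc->createElement($name));

        if (is_array($value)) {
            append_array($doc, $child, $value);
        } else {
            // Text nodes are escaped automatically when the document is saved.
            $child->appendChild($doc->createTextNode((string) $value));
        }
    }
}

echo array_to_xml(
    ['user' => ['name' => 'A & B', 'tags' => ['x', 'y']]],
    'feed'
);
```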

Saturday, July 10, 2021
 
ajreal
answered 4 Months ago
38

Firefox seems to be correct, according to the WHATWG specification.

The XMLHttpRequest specification of the FormData constructor says:

  1. If form is given, set fd's entries to the result of constructing the form data set for form.

Then in the description of constructing the form data set, it says:

The algorithm to construct the form data set for a form form optionally in the context of a submitter submitter is as follows. If not specified otherwise, submitter is null.

A button in the form is only included in the form data set if it's the submitter. But when this algorithm is executed from the FormData constructor, no submitter is specified, so no buttons should be included in the form data set.

Thursday, July 29, 2021
 
diegoiglesias
answered 3 Months ago
80

This is what I came up with. I've tested it and it works well.

NOTE: If the file (file.xml) does not exist, the script will throw an error. So if you automatically delete the old file(s) via cron or some other method (you mentioned: "...and store it for X amount of time."), you'll need a way to create a pre-built, structured file with at least one set of entries in it.

E.g.:

<?xml version="1.0" encoding="UTF-8"?>
<entries>
  <reports>
    <timestamp>May 31, 2013, 11:56 am</timestamp>
    <fname>Fred</fname>
    <lname>Fletcher</lname>
    <location>Canada</location>
    <report>Wind Damage</report>
    <description>Winds were gusting mighty hard today!</description>
  </reports>
</entries>

This is relatively easy to do; I've done it before with an "if file exists" check.
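A file-exists check along those lines could be sketched as follows. The function name `load_or_create_entries` is hypothetical, and the sketch assumes that the code appending new entries only needs the `<entries>` root element to exist, so a missing file is replaced by an empty root rather than a sample entry.

```php
<?php
// Hypothetical helper: load the XML file, or start a fresh document if it is missing.
function load_or_create_entries(string $path): DOMDocument
{
    $xml = new DOMDocument('1.0', 'UTF-8');
    $xml->formatOutput = true;
    $xml->preserveWhiteSpace = false; // must be set before load()

    if (is_file($path)) {
        $xml->load($path);
    } else {
        // No file yet (for example, after a cron clean-up): create an empty <entries> root.
        $xml->appendChild($xml->createElement('entries'));
    }

    return $xml;
}

$xml = load_or_create_entries('file.xml');
echo $xml->documentElement->nodeName, "\n";
```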

Here is my working code:

<?php

// Script by Fred Fletcher, Canada.

// Load the existing file. preserveWhiteSpace must be set before load()
// so that formatOutput can re-indent the document on save.
$xml = new DOMDocument('1.0', 'utf-8');
$xml->formatOutput = true;
$xml->preserveWhiteSpace = false;
$xml->load('file.xml');

// Build a new <reports> entry, starting with the current time.
$newItem = $xml->createElement('reports');
$newItem->appendChild($xml->createElement('timestamp', date("F j, Y, g:i a")));

// XML element name => form field that supplies its value.
$fields = array(
    'fname'       => 'firstname',
    'lname'       => 'lastname',
    'location'    => 'location',
    'report'      => 'report',
    'description' => 'desc',
);

foreach ($fields as $tag => $field) {
    $value = isset($_POST[$field]) ? $_POST[$field] : '';

    // createElement() does not escape "&" in its value argument,
    // so add the user input as a text node, which is escaped on save.
    $child = $xml->createElement($tag);
    $child->appendChild($xml->createTextNode($value));
    $newItem->appendChild($child);
}

$xml->getElementsByTagName('entries')->item(0)->appendChild($newItem);

$xml->save('file.xml');

echo "Data has been written.";

?>

A "plug" as a comment in the script would be nice, "Script by Fred Fletcher, Canada." (wink)

Let me know how this works out for you.

Friday, July 30, 2021
 
derp
answered 3 Months ago