"fix malformed xml in php before processing using domdocument functions" Answer’s

0

Try using the Tidy library which can be used to clean up bad HTML and XML http://php.net/manual/en/book.tidy.php

A pure PHP solution to fix some XML like this:

<?xml version="1.0"?>
<feed>
<RECORD>
<ID>117387</ID>
<ADVERTISERNAME>Test < texter</ADVERTISERNAME>
<AID>10544740</AID>
<NAME>This & This</NAME>
<DESCRIPTION>For one day only this is > than this.</DESCRIPTION>
</RECORD>
</feed>

Would be something like this:

  function cleanupXML($xml) {
    $xmlOut = '';
    $inTag = false;
    $xmlLen = strlen($xml);
    for($i=0; $i < $xmlLen; ++$i) {
        $char = $xml[$i];
        // $nextChar = $xml[$i+1];
        switch ($char) {
        case '<':
          if (!$inTag) {
              // Seek forward for the next tag boundry
              for($j = $i+1; $j < $xmlLen; ++$j) {
                 $nextChar = $xml[$j];
                 switch($nextChar) {
                 case '<':  // Means a < in text
                   $char = htmlentities($char);
                   break 2;
                 case '>':  // Means we are in a tag
                   $inTag = true;
                   break 2;
                 }
              }
          } else {
             $char = htmlentities($char);
          }
          break;
        case '>':
          if (!$inTag) {  // No need to seek ahead here
             $char = htmlentities($char);
          } else {
             $inTag = false;
          }
          break;
        default:
          if (!$inTag) {
             $char = htmlentities($char);
          }
          break;
        }
        $xmlOut .= $char;
    }
    return $xmlOut;
  }

Which is a simple state machine noting whether we are in a tag or not and if not then encoding the text using htmlentities.

It's worth noting that this will be memory hungry on large files so you may want to rewrite it as a stream plugin or a pre-processor.

Wednesday, March 31, 2021
 
Kemrop
answered 10 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :