Asked  7 Months ago    Answers:  5   Viewed   27 times

I originally asked a question along these lines using Regex but was recommended to use the PHP DOM library instead... which is superior, but I am still stuck.

Basically, I want to wrap the contents of an <a> in a <span> if it is not already wrapped in <span>.

<?php
$input = <<<EOT
<html><head></head>
<body bgcolor="#393a36">
    <a href="#"><span style="color:#ffffff;">Link 1</span></a>
    <a href="#">Link 2</a>
    <a href="#"><img src="mypic.gif" />Image Link</a>
    <a href="#"><u>Underlined Link</u></a>
</body>
</html>
EOT;


$doc = new DOMDocument();
$doc->loadHTML($input);
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
    $spancount = $tag->getElementsByTagName("span")->length;
    if($spancount == 0){
        $content = nodeContent($tag);
        $element = $doc->createElement('span');
        $element->setAttribute('style','color:#ffffff;');
        $frag = $doc->createDocumentFragment();
        $frag->appendXML($content);
        $element->appendChild($frag);   
        $tag->nodeValue = ""; //clear node
        $tag->appendChild($element);
    }
}
echo $doc->saveHTML();

function nodeContent($n, $outer=false) { 
    $d = new DOMDocument('1.0'); 
    $d->formatOutput = true;
    $b = $d->importNode($n->cloneNode(true),true); 
    $d->appendChild($b);
    $h = $d->saveHTML(); 
    // remove outter tags 
    if (!$outer) $h = substr($h,strpos($h,'>')+1,-(strlen($n->nodeName)+4)); 
    return $h; 
} 

It provides this output:

PHP Warning: DOMDocumentFragment::appendXML(): Entity: line 1: parser error : Premature end of data in tag img line 1 in /private/var/folders/78/78vHGigZHcuFeXB1KKJSb++++TI/-Tmp-/untitled_3xd..php on line 24
PHP Warning: DOMDocumentFragment::appendXML(): Image Link in /private/var/folders/78/78vHGigZHcuFeXB1KKJSb++++TI/-Tmp-/untitled_3xd..php on line 24 PHP Warning: DOMDocumentFragment::appendXML(): ^ in /private/var/folders/78/78vHGigZHcuFeXB1KKJSb++++TI/-Tmp-/untitled_3xd..php on line 24 PHP Warning: DOMNode::appendChild(): Document Fragment is empty in /private/var/folders/78/78vHGigZHcuFeXB1KKJSb++++TI/-Tmp-/untitled_3xd..php on line 25

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>  
<head></head>  
<body bgcolor="#393a36">  
    <a href="#"><span style="color:#ffffff;">Link 1</span></a>  
    <a href="#"><span style="color:#ffffff;">Link 2</span></a>  
    <a href="#"><span style="color:#ffffff;"></span></a>  
    <a href="#"><span style="color:#ffffff;"><u>Underlined Link</u></span></a>  
</body>  
</html>

This mostly works, except that it is really picky, and as you can see it dies if here is an img (or similar) tag inside the a href.

What is the best way to make this work. I've been banging my head against for an embarrassing long time now.

EDIT

Based on feedback below, here is the revised code and output. Note that the text preceding the img tag isn't being wrapped for some reason. Any Ideas?

$doc = new DOMDocument();
$doc->loadHTML($input);
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
    $spancount = $tag->getElementsByTagName("span")->length;
    if($spancount == 0){
    $element = $doc->createElement('span');
    $element->setAttribute('style','color:#ffffff;');
    foreach ($tag->childNodes as $child) {
        $tag->removeChild($child);
        $element->appendChild($child);
    }
    $tag->appendChild($element);

    }
}
echo $doc->saveHTML();

Output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head></head>
<body bgcolor="#393a36">
    <a href="#"><span style="color:#ffffff;">Link 1</span></a>
    <a href="#"><span style="color:#ffffff;">Link 2</span></a>
    <a href="#">Image Link<span style="color:#ffffff;"><img src="mypic.gif"></span></a>
    <a href="#"><span style="color:#ffffff;"><u>Underlined Link</u></span></a>
</body>
</html>

 Answers

40

Why bother with re-creating the node? Why not just replace the node? (If I understand what you're trying to do)...

if($spancount == 0){
    $element = $doc->createElement('span');
    $element->setAttribute('style','color:#ffffff;');
    $tag->parentNode->replaceChild($element, $tag);
    $element->apendChild($tag);
}

Edit Whoops, it looks like you're trying to wrap everything under $tag in the span... Try this instead:

if($spancount == 0){
    $element = $doc->createElement('span');
    $element->setAttribute('style','color:#ffffff;');
    foreach ($tag->childNodes as $child) {
        $tag->removeChild($child);
        $element->appendChild($child);
    }
    $tag->appendChild($child);
}

Edit2 Based on your results, it looks like that foreach is not completing because of the node removal... Try replacing the foreach with this:

while ($tag->childNodes->length > 0) {
    $child = $tag->childNodes->item(0);
    $tag->removeChild($child);
    $element->appendChild($child);
}
Wednesday, March 31, 2021
 
John_BSDthos
answered 7 Months ago
38

PHP has a PECL extension that gives you access to the features of HTML Tidy. Tidy is a pretty powerful library that should be able to take code like that and close tags in an intelligent manner.

I use it to clean up malformed XML and HTML sent to me by a classified ad system prior to import.

Wednesday, March 31, 2021
 
IcedAnt
answered 7 Months ago
87

What you need to do is create a new paragraph element (with the new attribute) and replace the image tag with this new tag and then add the img tag back into the new paragraph tag...

foreach ($xpath->query("//figure/img") as $img) {
    $img->setAttribute('style','max-width: 100%; height: auto;');
    $img->setAttribute('class','mb-4 mt-4');
    $p = $dom->createElement("p");
    $p->setAttribute('style', 'text-align:center');
    $img->parentNode->replaceChild($p, $img);
    $p->appendChild($img);
}
Wednesday, March 31, 2021
 
daniel__
answered 7 Months ago
46

No, there is no way of specifying a particular doctype to use, or to modify the requirements of the existing one.

Your best workable solution is going to be to disable error reporting with libxml_use_internal_errors:

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML('...');
libxml_clear_errors();
Wednesday, June 2, 2021
 
ajreal
answered 5 Months ago
24

This is mentioned in a couple of comments on the DomNode::removeChild documentation, with the issue apparently being how the iterator pointer on the foreach not being able to deal with the fact that you are removing items from a parent array while looping through the list of children (or something).

The recommended fix is to loop through the main node first and push the child nodes you want to delete to its own array, then loop through that "to-be-deleted" array and deleting those children from their parent. Example:

$dom = new DOMDocument();
@$dom->loadHTML($description);
$pTag = $dom->getElementsByTagName('p');

$spotid_children = array();

foreach ($pTag as $value) {
    /** @var DOMElement $value */
    $id = $value->getAttribute('data-spotid');
    if ($id) {
        $spotid_children[] = $value; 
    }
}

foreach ($spotid_children as $spotid_child) {
    $spotid_child->parentNode->removeChild($spotid_child); 
}
Monday, August 16, 2021
 
Litty
answered 2 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :