Asked  7 Months ago    Answers:  5   Viewed   29 times

I'm new to DOM parsing in PHP.
I have an HTML file that I'm trying to parse. It has a bunch of DIVs like this:

<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div id="interestingbox"> 
......

I'm trying to get the contents of the many div boxes using PHP. How can I use the DOM parser to do this?

Thanks!

 Answers

91

First, I have to tell you that you can't use the same id on two different divs; that is what classes are for. Every element should have a unique id.

Code to get the contents of the div with id="interestingbox":

$html = '
<html>
<head></head>
<body>
<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div id="interestingbox2"><a href="#">a link</a></div>
</body>
</html>';


$dom_document = new DOMDocument();

$dom_document->loadHTML($html);

// use DOMXPath to navigate the HTML through the DOM
$dom_xpath = new DOMXPath($dom_document);

// select every div with id="interestingbox", wherever it sits in the tree
$elements = $dom_xpath->query("//div[@id='interestingbox']");

// query() returns false if the expression is invalid
if ($elements !== false) {

  foreach ($elements as $element) {
    echo "\n[". $element->nodeName. "]";

    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      echo $node->nodeValue. "\n";
    }

  }
}

//OUTPUT
[div]  {
        Content1
        Content2
}
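Since the markup in the question repeats the same id, here is a minimal sketch of how that case could still be handled. It assumes the invalid HTML cannot be changed; the sample markup and the variable names ($boxes, $contents) are made up for illustration. The XPath test compares @id as an ordinary attribute, so it returns every matching div, and libxml_use_internal_errors() keeps the parser's warnings out of the output.

```php
<?php
// Markup mirroring the question: two divs share the same id (invalid, but it happens).
$html = '
<div id="interestingbox"><div>Content1</div><div>Content2</div></div>
<div id="interestingbox"><div>Content3</div></div>';

// Collect libxml parse warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// @id is compared as a plain attribute here, so every matching div is returned.
$boxes = $xpath->query("//div[@id='interestingbox']");

$contents = [];
foreach ($boxes as $box) {
    // "./div" selects only the direct div children of the current box.
    foreach ($xpath->query('./div', $box) as $inner) {
        $contents[] = trim($inner->textContent);
    }
}

print_r($contents);
```

Passing the box as the second argument to query() makes the inner expression relative to that box, which keeps the contents of each box separate if you need them grouped.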

Example with classes:

$html = '
<html>
<head></head>
<body>
<div class="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div class="interestingbox"><a href="#">a link</a></div>
</body>
</html>';

//the same as before.. just change the xpath

[...]

$elements = $dom_xpath->query("//div[@class='interestingbox']");

[...]

//OUTPUT
[div]  {
        Content1
        Content2
}

[div]  {
a link
}

Refer to the DOMXPath page for more details.
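One caveat with @class='interestingbox' is that it only matches when the attribute is exactly that string. If an element carries several classes, a common XPath 1.0 idiom is to pad the class list with spaces and look for the padded token. The sketch below assumes made-up sample markup, and classPredicate() is a hypothetical helper, not part of the DOM extension.

```php
<?php
// Hypothetical helper: builds an XPath predicate that matches one whole class token.
// Only pass trusted class names, since the value is interpolated into the expression.
function classPredicate(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

$html = '
<div class="interestingbox highlighted"><div>Content1</div></div>
<div class="interestingbox"><a href="#">a link</a></div>
<div class="interestingbox-footer">ignored</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Matches the first two divs, but not "interestingbox-footer".
$matches = $xpath->query('//div[' . classPredicate('interestingbox') . ']');

$texts = [];
foreach ($matches as $div) {
    $texts[] = trim($div->textContent);
}

print_r($texts);
```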

Wednesday, March 31, 2021
 
AlterPHP
answered 7 Months ago
37

You need to do it manually.

DOM handles HTML attributes, not CSS properties.

You need to access the style attribute, explode its value using ; as a delimiter, then loop over the array looking for the property you want to unset.
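The steps above could be sketched roughly as follows. removeStyleProperty() is a hypothetical helper name, and the sample markup is made up. The split on ; is deliberately naive: it assumes no declaration value contains a semicolon (a data: URL would break it).

```php
<?php
// Hypothetical helper: returns the style string with one CSS property removed.
function removeStyleProperty(string $style, string $property): string
{
    $kept = [];
    foreach (explode(';', $style) as $declaration) {
        $declaration = trim($declaration);
        if ($declaration === '') {
            continue; // skip the empty piece after a trailing ";"
        }
        // Split "name: value" at the first colon only.
        $parts = explode(':', $declaration, 2);
        if (strcasecmp(trim($parts[0]), $property) !== 0) {
            $kept[] = $declaration;
        }
    }
    return implode('; ', $kept);
}

$doc = new DOMDocument();
$doc->loadHTML('<div id="box" style="color: red; width: 100px; margin: 0">text</div>');
$div = $doc->getElementsByTagName('div')->item(0);

$style = removeStyleProperty($div->getAttribute('style'), 'width');
if ($style === '') {
    $div->removeAttribute('style'); // nothing left, drop the attribute entirely
} else {
    $div->setAttribute('style', $style);
}

echo $div->getAttribute('style');
```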

Wednesday, March 31, 2021
 
huhushow
answered 7 Months ago
54

It's possible since phing 2.4.13 with the configuration attribute:

<phpunit configuration="path/to/phpunit.xml"/>
Saturday, May 29, 2021
 
alez
answered 5 Months ago
14

Put this code in test.php

require 'simple_html_dom.php';
$html = file_get_html('test1.php');
foreach ($html->find('table tr') as $row)
{
    // find('a', 0) returns the first link in the row (or null), not a list
    $link = $row->find('a', 0);
    if ($link)
    {
        echo $link->plaintext;
    }
}

and put your html code in test1.php

<table>
    <tbody>
        <tr>
            <td>
                <a href="#">1st Link</a>
            </td>
            <td>
                <a href="">2nd Link</a>
            </td>
            <td>
                <a href="#">3rd Link</a>
            </td>
        </tr>

        <tr>
            <td>
                <a href="#">1st Link</a>
            </td>
            <td>
                <a href="#">2nd Link</a>
            </td>
            <td>
                <a href="#">3rd Link</a>
            </td>
        </tr>
    </tbody>
</table>
Saturday, May 29, 2021
 
Pwner
answered 5 Months ago
46

You can define a custom function DOMinnerHTML() (described here) to retrieve an element's inner HTML, rather than its text content. It works by temporarily creating a new document for each child node:

<?php 
function DOMinnerHTML($element) 
{ 
    $innerHTML = ""; 
    $children = $element->childNodes; 
    foreach ($children as $child) 
    { 
        $tmp_dom = new DOMDocument(); 
        $tmp_dom->appendChild($tmp_dom->importNode($child, true)); 
        $innerHTML.=trim($tmp_dom->saveHTML()); 
    } 
    return $innerHTML; 
} 
?> 

Example usage:

$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach($divs as $div) {
    if ($div->getAttribute('class') === 'text_container') {
        $innerHtml = DOMinnerHTML($div);
        echo '<div>' . $innerHtml . '</div>';
    }
}
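As an alternative sketch, DOMDocument::saveHTML() accepts a node argument since PHP 5.3.6, so the temporary document can be avoided. innerHtmlOf() is a hypothetical name for this variant and the sample markup is made up; this is not the function from the linked description.

```php
<?php
// Hypothetical variant of DOMinnerHTML(): serializes each child with the
// owning document's saveHTML($node) instead of importing it into a new document.
function innerHtmlOf(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<div class="text_container"><p>Hello <b>world</b></p></div>');
$div = $doc->getElementsByTagName('div')->item(0);

echo innerHtmlOf($div);
```

The result keeps the child markup (the p and b tags) rather than flattening it to text, which is the same goal as DOMinnerHTML() above.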
Saturday, May 29, 2021
 
Tak
answered 5 Months ago