Asked  7 Months ago    Answers:  5   Viewed   41 times

I'm using PHP DOM and I'm trying to get an element within a DOM node that has a given class name. What's the best way to get that sub-element?

Update: I ended up using Mechanize for PHP which was much easier to work with.

 Answers

87

Update: XPath version of the *[@class~='my-class'] CSS selector

So after my comment below in response to hakre's comment, I got curious and looked into the code behind Zend_Dom_Query. It looks like the above selector is compiled to the following XPath (untested):

[contains(concat(' ', normalize-space(@class), ' '), ' my-class ')]

So the PHP would be:

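$dom = new DOMDocument();
$dom->load($filePath); // load() parses XML; use loadHTMLFile() for an HTML file
$finder = new DOMXPath($dom);
$classname = "my-class";
// Match elements whose space-separated class list contains $classname as a whole token
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");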

Basically, all we do here is normalize the class attribute so that the complete class list, even if it holds only a single class, is bounded by spaces. We then search for the class name padded with a space on each side. This way we match my-class only as a whole class name, not as part of a longer one such as my-class-2.
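
To make the difference concrete, here is a minimal sketch that runs the plain contains() test and the space-padded test against the same markup. The HTML string is made up for illustration (it is not from the question), and the document is loaded with loadHTML() rather than load() so the example is self-contained.

```php
<?php
// Hypothetical sample markup, used only to compare the two expressions.
$html = '<div class="my-class">a</div>'
      . '<div class="  other   my-class ">b</div>'
      . '<div class="my-class-2">c</div>'
      . '<div class="not-my-class">d</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);
$classname = 'my-class';

// Substring test: also matches "my-class-2" and "not-my-class".
$loose = $finder->query("//*[contains(@class, '$classname')]");

// Token test: pads the normalized class list with spaces,
// then looks for " my-class " inside it.
$exact = $finder->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]"
);

echo $loose->length, "\n"; // 4
echo $exact->length, "\n"; // 2
```

Only the first two div elements carry my-class as a whole class name, which is why the second query returns fewer nodes.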


Use an XPath selector?

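$dom = new DOMDocument();
$dom->load($filePath); // load() parses XML; use loadHTMLFile() for an HTML file
$finder = new DOMXPath($dom);
$classname = "my-class";
// contains() is a substring test, so this also matches classes such as "my-class-2"
$nodes = $finder->query("//*[contains(@class, '$classname')]");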

If it is only ever one type of element you can replace the * with the particular tagname.
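
Since the question asks for an element inside a given node, a rough sketch of combining a tag name with a context node may help. DOMXPath::query() accepts a context node as its second argument, and a leading .// keeps the search inside that node. The markup, the id values, and the $parent variable below are assumptions made up for this example.

```php
<?php
// Hypothetical markup: two containers, each holding elements with the same class.
$html = '<div id="a"><span class="my-class">in a</span></div>'
      . '<div id="b"><span class="my-class">in b</span>'
      . '<p class="my-class">p in b</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);
$classname = 'my-class';

// Pick the node to search within (assumed here to be the div with id "b").
$parent = $finder->query("//div[@id='b']")->item(0);

// "span" replaces "*", and ".//" restricts the search to descendants of $parent.
$spans = $finder->query(".//span[contains(@class, '$classname')]", $parent);

echo $spans->length, "\n";                // 1
echo $spans->item(0)->textContent, "\n";  // in b
```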

If you need to do a lot of this with very complex selectors, I would recommend Zend_Dom_Query, which supports CSS selector syntax (a la jQuery):

$finder = new Zend_Dom_Query($html);
$classname = 'my-class';
$nodes = $finder->query("*[class~=\"$classname\"]");
Wednesday, March 31, 2021
 
weegee
answered 7 Months ago
10

You can use XPath on your DOMDocument as follows:

$doc = new DOMDocument();
$doc->loadHTML($article_header);
$xpath = new DOMXPath($doc);

$imagesAndIframes = $xpath->query('//img | //iframe');

$length = $imagesAndIframes->length;
for ($i = 0; $i < $length; $i++) {
    $element = $imagesAndIframes->item($i);

    if ($element->tagName === 'img') {
        echo 'img';
    } else {
        echo 'iframe';
    }
}
Wednesday, March 31, 2021
 
Packy
answered 7 Months ago
45

You can set the user_agent ini setting at runtime with ini_set(), without the need for curl. Just use the lines below before you load the DOMDocument:

$agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
ini_set('user_agent', $agent);

And then your code:

$doc = new DOMDocument();
@$doc->loadHTMLFile('http://www.facebook.com');
$xpath = new DOMXPath($doc);
echo $xpath->query('//title')->item(0)->nodeValue."\n";
Wednesday, March 31, 2021
 
Keat
answered 7 Months ago
94

Use this:

$img = $dom->getElementsByTagName('img')->item(0);
echo $img->attributes->getNamedItem("src")->value;
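
As a possible alternative, DOMElement::getAttribute() reads the same value in one call and returns an empty string when the attribute is absent. This is only a sketch: the markup and the data-missing attribute name are made up for illustration.

```php
<?php
// Hypothetical markup containing a single image.
$dom = new DOMDocument();
$dom->loadHTML('<p><img src="photo.png" alt="demo"></p>');

$img = $dom->getElementsByTagName('img')->item(0);

$src = null;
$missing = null;
if ($img instanceof DOMElement) {          // item(0) is null when no <img> exists
    $src = $img->getAttribute('src');      // "photo.png"
    $missing = $img->getAttribute('data-missing'); // "" because it is not set
}

echo $src, "\n";
```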
Saturday, May 29, 2021
 
Sagar
answered 5 Months ago
70

I think it has to do with how you're iterating. The node list is live, so you're changing it while you iterate over it, and the loop winds up breaking (side effects). Try changing your loop to this:

$nodes = $root->getElementsByTagNameNS($root->lookupNamespaceURI('zuq'), 'data');
$i = $nodes->length - 1;
while ($i >= 0) {
    $node = $nodes->item($i);
    $node->parentNode->replaceChild(
        $node->ownerDocument->createTextNode('foo'), 
        $node
    );
    $i--;
}

Basically, it just iterates backwards over the list of nodes, so that when nodes are removed, they are removed from the end rather than the beginning...
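
Another way to sidestep the same problem, sketched here under the assumption of a simple document without namespaces, is to copy the live list into a plain array with iterator_to_array() before modifying the tree. The XML below is invented for the example.

```php
<?php
// Hypothetical XML with three <data> elements to replace.
$doc = new DOMDocument();
$doc->loadXML('<root><data>1</data><data>2</data><data>3</data></root>');

// Live list: it shrinks as matching nodes are removed from the document.
$nodes = $doc->getElementsByTagName('data');

// Snapshot the list first, so replacing nodes cannot disturb the iteration.
foreach (iterator_to_array($nodes) as $node) {
    $node->parentNode->replaceChild($doc->createTextNode('foo'), $node);
}

echo $doc->documentElement->textContent, "\n"; // foofoofoo
echo $nodes->length, "\n";                     // 0
```

The array holds its own references to the nodes, so iteration order no longer matters.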

Friday, August 6, 2021
 
RompelStompel
answered 3 Months ago