Asked  7 Months ago    Answers:  5   Viewed   31 times

I am following the suggestion from the question "Robust, Mature HTML Parser for PHP" about using DOMDocument to parse HTML that may be malformed.

Is there an easy way to loop over the parsed document? I would like to loop over HTML like this:

$html='<ul>
         <li>value1</li>
         <li>value1</li>
         <li>value3
            <p>subvalue</p>
         </li>
        </ul>
        <p>hello world</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
???
foreach (??? as $node)
{
  print $node->nodeName.':'.$node->nodeValue;
}

And get results somewhat like this.

 ul:
 li:value1
 li:value1
 li:value3
 p:subvalue
 p:hello world

Using $doc->childNodes by itself doesn't do what I want, since it doesn't seem to descend into the lower branches of the tree. I used the code suggested by halfdan and got results like this:

html:
html:value1
         value1
         value3
            subvalue

        hello world

 Answers

52

Try this:

$doc = new DOMDocument();
$doc->loadHTML($html);
showDOMNode($doc);

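// Recursively print every node below $domNode, depth-first in document order.
function showDOMNode(DOMNode $domNode) {
    foreach ($domNode->childNodes as $node) {
        // For an element, nodeValue is the concatenated text of all its descendants.
        print $node->nodeName . ':' . $node->nodeValue . PHP_EOL;
        if ($node->hasChildNodes()) {
            showDOMNode($node);
        }
    }
}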
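The recursive function above also visits text nodes, and an element's nodeValue includes the text of all its descendants, so the output can be noisier than the listing the question asks for. A minimal sketch of one possible variation follows: it keeps only element nodes and prints just the text that sits directly inside each element. The helper name collectElements and the choice to start at the body element (which loadHTML adds around a fragment) are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: returns one "name:own text" line per element, depth-first.
function collectElements(DOMNode $parent): array
{
    $lines = [];
    foreach ($parent->childNodes as $node) {
        if ($node->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip text, comment and doctype nodes
        }
        // Gather only the text nodes that are direct children of this element.
        $ownText = '';
        foreach ($node->childNodes as $child) {
            if ($child->nodeType === XML_TEXT_NODE) {
                $ownText .= $child->nodeValue;
            }
        }
        $lines[] = $node->nodeName . ':' . trim($ownText);
        // Descend into the element's children.
        $lines = array_merge($lines, collectElements($node));
    }
    return $lines;
}

$html = '<ul>
  <li>value1</li>
  <li>value1</li>
  <li>value3
    <p>subvalue</p>
  </li>
</ul>
<p>hello world</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Start below <body> so the wrapper elements added by loadHTML are not listed.
$body = $doc->getElementsByTagName('body')->item(0);
$lines = collectElements($body);
print implode(PHP_EOL, $lines) . PHP_EOL;
```

Starting the walk at $doc instead of the body element would also list the html and body wrappers.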
Wednesday, March 31, 2021
 
ranhan
answered 7 Months ago
43

Header, Nav and Section are elements from HTML5. Because HTML5 developers felt it is too difficult to remember Public and System Identifiers, the DocType declaration is just:

<!DOCTYPE html>

In other words, there is no DTD to check against, so DOM falls back to the HTML4 Transitional DTD. That DTD doesn't contain those elements, hence the warnings.

To suppress the warnings, put

libxml_use_internal_errors(true);

before the call to loadHTML and

libxml_use_internal_errors(false);

after it.
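Put together, the two calls might look like the sketch below. The helper name loadHtmlQuietly and the sample markup are assumptions for illustration. Instead of passing a hard-coded false afterwards, the sketch restores whatever setting was active before, and it reads the collected messages with libxml_get_errors() before clearing them with libxml_clear_errors(). Whether any warnings are actually collected depends on the libxml version in use.

```php
<?php
// Hypothetical helper: load HTML without emitting PHP warnings and
// return both the document and whatever libxml collected.
function loadHtmlQuietly(string $html): array
{
    $doc = new DOMDocument();

    // Buffer libxml errors internally; the call returns the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);

    $errors = libxml_get_errors(); // array of LibXMLError objects
    libxml_clear_errors();         // empty the internal error buffer

    // Restore the setting that was active before this function ran.
    libxml_use_internal_errors($previous);

    return [$doc, $errors];
}

[$doc, $errors] = loadHtmlQuietly(
    '<!DOCTYPE html><html><body><section><nav>menu</nav></section></body></html>'
);

// Inspect the collected messages instead of letting them surface as warnings.
foreach ($errors as $error) {
    print trim($error->message) . ' (line ' . $error->line . ')' . PHP_EOL;
}
```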

An alternative would be to use https://github.com/html5lib/html5lib-php.

Wednesday, March 31, 2021
 
laurent
answered 7 Months ago
56

There is no &nbsp; in XML. The only character entities that have an actual name defined (instead of using a numeric reference) are &amp;, &lt;, &gt;, &quot; and &apos;.

That means you have to use the numeric equivalent of a non-breaking space, which is &#160; or (in hex) &#xA0;.

If you are trying to save HTML into an XML container, then save it as text. HTML and XML may look similar, but they are very distinct. appendXML() expects well-formed XML as an argument. Use the nodeValue property instead; it will XML-encode your HTML string without any warnings.

// document fragment is completely unnecessary
$otherElement->nodeValue = $row['message'];
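
As a related sketch, and not the method shown in the answer, the same "store the HTML as text" idea can also be expressed with DOMDocument::createTextNode(), which treats the whole string as literal text and escapes <, > and & on output. The element names and the sample message are assumptions; $message stands in for $row['message'].

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->createElement('messages'); // hypothetical container element
$doc->appendChild($root);

// Stand-in for $row['message']: HTML containing an entity XML does not define.
$message = '<b>Hello</b>&nbsp;world';

$element = $doc->createElement('message');
// A text node stores the string literally; nothing in it is parsed as markup.
$element->appendChild($doc->createTextNode($message));
$root->appendChild($element);

// Serialize only the root element (no XML declaration).
$xml = $doc->saveXML($root);
print $xml . PHP_EOL;
```

Reading $element->textContent back returns the original, unescaped HTML string.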
Wednesday, March 31, 2021
 
Lloydworth
answered 7 Months ago
67

If you want to use one variable and perform an action with it, you just need one loop:

for file in 4 5 6 7 8
do
   paste "${file}_1" "${file}_2"
done

This will do

paste 4_1 4_2
paste 5_1 5_2
...
Tuesday, June 1, 2021
 
Dev
answered 5 Months ago
62

This should work for you. The code will:

  • Find the select element
  • Get all the options from the dropdown
  • Store each option's value in a list
  • For each value in the list, re-locate the dropdown and select that option

Re-locating the dropdown on each pass is necessary because the web page has changed.

Like so:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

browser = webdriver.Firefox()
browser.get("http://www.website.com")

dropdown = browser.find_element(By.XPATH, "//select[@id='idname']")  # get the select element
options = dropdown.find_elements(By.TAG_NAME, "option")  # get all the options into a list

# Store the value attributes up front, since the option elements may no
# longer be usable once the page has changed
optionsList = [option.get_attribute("value") for option in options]

for optionValue in optionsList:
    print("starting loop on option %s" % optionValue)

    # Re-locate the dropdown on every pass, as the web page has changed
    select = Select(browser.find_element(By.XPATH, "//select[@id='idname']"))
    select.select_by_value(optionValue)
Saturday, August 21, 2021
 
Stefan
answered 2 Months ago