Asked  7 Months ago    Answers:  5   Viewed   413 times

This question is intended as a reference to answer a particularly common question, which might take different forms:

  • I have an XML document which contains multiple namespaces; how do I parse it with SimpleXML?
  • My XML has a colon (":") in the tag name, how do I access it with SimpleXML?
  • How do I access attributes in my XML file when they have a colon in their name?

If your question has been closed as a duplicate of this, it may not be identical to these examples, but this page should tell you what you need to know.

Here is an illustrative example:

$xml = '<?xml version="1.0" encoding="utf-8"?>
    <document xmlns="http://example.com" xmlns:ns2="https://namespaces.example.org/two" xmlns:seq="urn:example:sequences">
        <list type="short">
            <ns2:item seq:position="1">A thing</ns2:item>
            <ns2:item seq:position="2">Another thing</ns2:item>
        </list>
    </document>
';
$sx = simplexml_load_string($xml);

This code will not work; why not?

foreach ( $sx->list->ns2:item as $item ) {
    echo 'Position: ' . $item['seq:position'] . "\n";
    echo 'Item: ' . (string)$item . "\n";
}

The first problem is that ->ns2:item is invalid syntax; but changing it to this doesn't work either:

foreach ( $sx->list->{'ns2:item'} as $item ) { ... }

Why not, and what should you use instead?

 Answers

89

What are XML namespaces?

A colon (:) in a tag or attribute name means that the element or attribute is in an XML namespace. Namespaces are a way of combining different XML formats / standards in one document, and keeping track of which names come from which format. The colon, and the part before it, aren't really part of the tag / attribute name, they just indicate which namespace it's in.

An XML namespace is identified by a URI (a URL or URN). The URI doesn't need to point at anything; it's just a way for someone to "own" the namespace. For instance, the SOAP standard uses the namespace http://www.w3.org/2003/05/soap-envelope and an OpenDocument file uses (among others) urn:oasis:names:tc:opendocument:xmlns:meta:1.0. The example in the question uses the namespaces http://example.com, https://namespaces.example.org/two, and urn:example:sequences.

Within a document, or a section of a document, a namespace is given a local prefix, which is the part you see before the colon. For instance, in different documents, the SOAP namespace might be given the local prefix soap:, SOAP:, SOAP-ENV:, env:, or just ns1:. These names are linked back to the identifier of the namespace using a special xmlns attribute, e.g. xmlns:soap="http://www.w3.org/2003/05/soap-envelope". The choice of prefix in a particular document is completely arbitrary, and could change each time it was generated without changing the meaning.
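To make the "prefix is arbitrary" point concrete, here is a minimal sketch. The two documents and the urn:example:demo namespace are made up for illustration; they declare the same namespace URI under different prefixes, and looking the element up by URI gives the same result for both.

```php
<?php
// Hypothetical documents: same namespace URI, different local prefixes.
$docA = '<root xmlns:a="urn:example:demo"><a:item>Hello</a:item></root>';
$docB = '<root xmlns:zzz="urn:example:demo"><zzz:item>Hello</zzz:item></root>';

// Look the element up by namespace URI, which is identical in both documents.
$fromA = (string) simplexml_load_string($docA)->children('urn:example:demo')->item;
$fromB = (string) simplexml_load_string($docB)->children('urn:example:demo')->item;

echo $fromA, "\n";
echo $fromB, "\n";

// Looking up by the prefix "a" finds nothing in the document that uses "zzz".
$countByWrongPrefix = count(simplexml_load_string($docB)->children('a', true));
echo $countByWrongPrefix, "\n";
```

Code that keys on the URI keeps working if the generator renames its prefixes; code that keys on the prefix does not.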

Finally, there is a default namespace in each document, or section of a document, which is the namespace used for elements with no prefix. It is defined by an xmlns attribute with no :, e.g. xmlns="http://www.w3.org/2003/05/soap-envelope". In the example above, <list> is in the default namespace, which is defined as http://example.com.

Somewhat peculiarly, un-prefixed attributes are never in the default namespace; they are treated as being in no namespace at all, a kind of "void namespace". See: XML Namespaces and Unprefixed Attributes
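A small sketch of that difference, using a cut-down version of the question's document (only the default namespace is kept; the variable names are just for illustration). The un-prefixed element is found under the default namespace URI, while its un-prefixed attribute is not.

```php
<?php
$xml = '<document xmlns="http://example.com"><list type="short"/></document>';
$sx = simplexml_load_string($xml);

// The un-prefixed <list> element is in the default namespace...
$list = $sx->children('http://example.com')->list;
$listInDefaultNs = isset($sx->children('http://example.com')->list);

// ...but its un-prefixed "type" attribute is not in that namespace.
$typeInDefaultNs = isset($list->attributes('http://example.com')->type);

// Asking for attributes with no namespace (null) does find it.
$typeInNoNs = (string) $list->attributes(null)->type;

var_dump($listInDefaultNs, $typeInDefaultNs, $typeInNoNs);
```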

SimpleXML gives me an empty object; what's wrong?

If you use print_r, var_dump, or similar "dump structure" functions on a SimpleXML object whose document uses namespaces, some of the contents will not display. They are still there, and can be accessed as described below.
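One way to convince yourself the content is really there is to count children with and without a namespace selected. This is only a sketch against the question's document; the variable names are illustrative.

```php
<?php
$xml = '<?xml version="1.0" encoding="utf-8"?>
<document xmlns="http://example.com" xmlns:ns2="https://namespaces.example.org/two" xmlns:seq="urn:example:sequences">
    <list type="short">
        <ns2:item seq:position="1">A thing</ns2:item>
        <ns2:item seq:position="2">Another thing</ns2:item>
    </list>
</document>';
$sx = simplexml_load_string($xml);

// With no namespace selected, the prefixed <ns2:item> children are not seen...
$visibleByDefault = count($sx->list->children());

// ...but selecting their namespace shows they were there all along.
$inNs2 = count($sx->list->children('https://namespaces.example.org/two'));

echo "Without namespace: $visibleByDefault, with namespace: $inNs2\n";
```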

How do you access namespaces in SimpleXML?

SimpleXML provides two main methods for using namespaces:

  • The ->children() method allows you to access child elements in a particular namespace. It effectively switches your object to look at that namespace, until you call it again to switch back, or to another namespace.
  • The ->attributes() method works in a similar way, but allows you to access attributes in a particular namespace.

Both of these methods take the namespace identifier as their first argument. Since these identifiers are rather long, it can be useful to define a constant or variable to represent the namespaces you're working with, so you don't have to copy and paste the full URI everywhere.

For instance, the example above might become:

define('XMLNS_EG2', 'https://namespaces.example.org/two');
define('XMLNS_SEQ', 'urn:example:sequences');
foreach ( $sx->list->children(XMLNS_EG2)->item as $item ) {
    echo 'Position: ' . $item->attributes(XMLNS_SEQ)->position . "\n";
    echo 'Item: ' . (string)$item . "\n";
}

As a short-hand, you can instead pass the methods the local prefix of the namespace, with true as the second argument. Remember that this prefix could change at any time; for instance, a generator might assign prefixes ns1, ns2, etc., and assign them in a different order if its code changes slightly. Using this short-hand, the code would become:

foreach ( $sx->list->children('ns2', true)->item as $item ) {
    echo 'Position: ' . $item->attributes('seq', true)->position . "\n";
    echo 'Item: ' . (string)$item . "\n";
}

(This short-hand was added in PHP 5.2, and you may see really old examples using a more long-winded version that calls $sx->getNamespaces() to get a list of prefix-identifier pairs. This is the worst of both worlds, as you're still hard-coding the prefix rather than the identifier.)
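That said, getNamespaces() can still be handy as a one-off inspection tool while you are working out which URIs to put in your constants. A rough sketch, again using the question's document (the loop and variable names are only illustrative):

```php
<?php
$xml = '<?xml version="1.0" encoding="utf-8"?>
<document xmlns="http://example.com" xmlns:ns2="https://namespaces.example.org/two" xmlns:seq="urn:example:sequences">
    <list type="short">
        <ns2:item seq:position="1">A thing</ns2:item>
    </list>
</document>';
$sx = simplexml_load_string($xml);

// true = look at the whole document, not just the root element.
$namespaces = $sx->getNamespaces(true);

// Print each prefix with its URI; the default namespace has an empty prefix.
foreach ($namespaces as $prefix => $uri) {
    echo ($prefix === '' ? '(default)' : $prefix) . ' => ' . $uri . "\n";
}
```

Copy the URIs it prints into your code, rather than calling it at runtime to translate a hard-coded prefix.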

Wednesday, March 31, 2021
 
Octopus
answered 7 Months ago
0
Great answer, helps a lot.
Thursday, July 29, 2021
 
w3stack
answered 3 Months ago
38

All you need is

$data = new SimpleXMLElement($xml);
$data->registerXPathNamespace('ns1','http://endpoint.websitecom/');
$part = $data->xpath("//ns1:return");
var_dump($part[0]->children("ns1",true));

Output

object(SimpleXMLElement)[3]
  public 'campaignID' => string '0' (length=1)
  public 'categoryID' => string '200230455' (length=9)
  public 'categoryName' => string 'Promotion' (length=9)
  public 'linkID' => string '10001599' (length=8)
  public 'linkName' => string 'KFL-20% off No Min' (length=18)
  public 'mid' => string '3071' (length=4)
  public 'nid' => string '1' (length=1)
  public 'clickURL' => string '
            http://someurl
        ' (length=36)
  public 'endDate' => string 'Oct 15, 2012' (length=12)
  public 'height' => string '250' (length=3)
  public 'iconURL' => string '
            http://someurl
        ' (length=36)
  public 'imgURL' => string '
            http://someurl
        ' (length=36)
  public 'landURL' => string '
            http://someurl
        ' (length=36)
  public 'serverType' => string '22' (length=2)
  public 'showURL' => string '
            http://someurl
        ' (length=36)
  public 'size' => string '13' (length=2)
  public 'startDate' => string 'Oct 14, 2012' (length=12)
  public 'width' => string '300' (length=3)
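
For comparison, here is a sketch of the same XPath approach applied to the document from the question. The prefixes 'two' and 'eg' are ones I chose for the query; they do not have to match the prefixes in the XML, only the URIs do.

```php
<?php
$xml = '<?xml version="1.0" encoding="utf-8"?>
<document xmlns="http://example.com" xmlns:ns2="https://namespaces.example.org/two" xmlns:seq="urn:example:sequences">
    <list type="short">
        <ns2:item seq:position="1">A thing</ns2:item>
        <ns2:item seq:position="2">Another thing</ns2:item>
    </list>
</document>';
$sx = simplexml_load_string($xml);

// Bind query-side prefixes to the namespace URIs.
$sx->registerXPathNamespace('two', 'https://namespaces.example.org/two');
$sx->registerXPathNamespace('eg', 'http://example.com');

// Find the items by namespace URI, via the prefix registered above.
$texts = array_map('strval', $sx->xpath('//two:item'));
print_r($texts);

// An un-prefixed name in XPath means "no namespace", so it misses <list>,
// which is in the default namespace; the registered prefix finds it.
$unprefixedMatches = count($sx->xpath('//list'));
$prefixedMatches = count($sx->xpath('//eg:list'));
```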
Wednesday, March 31, 2021
 
rojo
answered 7 Months ago
68

You've been fooled (and had me fooled) by the oldest trick in the SimpleXML book: SimpleXML doesn't parse the whole document into a PHP object, it presents a PHP API to an internal structure. Functions like var_dump can't see this structure, so don't always give a useful idea of what's in the object.

The reason it looks "empty" is that it is listing the children of the root element which are in the default namespace - but there aren't any, they're all in the "soapenv:" namespace.

To access namespaced elements, you need to use the children() method, passing in the full namespace name (recommended) or its local prefix (simpler, but could be broken by changes in the way the file is generated at the other end). To switch back to the "default namespace", use ->children(null).

So you could get the ID attribute of the first stationV2 element like this (live demo):

// Define constant for the namespace names, rather than relying on the prefix the remote service uses remaining stable
define('NS_SOAP', 'http://schemas.xmlsoap.org/soap/envelope/');

// Download the XML
$rawxml = file_get_contents("http://opendap.co-ops.nos.noaa.gov/axis/webservices/activestations/response.jsp?v=2&format=xml&Submit=Submit");
// Parse it
$ob = simplexml_load_string($rawxml);

// Use it!
echo $ob->children(NS_SOAP)->Body->children(null)->ActiveStationsV2->stationsV2->stationV2[0]['ID'];
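
If you want to try the namespace switching without hitting the remote service, here is a self-contained sketch. The inline document is a made-up stand-in for a SOAP response (the Result and station elements and the ID value are hypothetical), but the switching with children() works the same way.

```php
<?php
$soapNs = 'http://schemas.xmlsoap.org/soap/envelope/';

// Hypothetical, cut-down response: a SOAP wrapper around un-namespaced content.
$rawxml = '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
    <soapenv:Body>
        <Result><station ID="abc123">Example</station></Result>
    </soapenv:Body>
</soapenv:Envelope>';
$ob = simplexml_load_string($rawxml);

// Select the SOAP namespace to reach <soapenv:Body>.
$body = $ob->children($soapNs)->Body;

// $body is still "looking at" the SOAP namespace, so <Result> is not visible yet.
$seenWithoutSwitch = isset($body->Result);

// Switch back to un-namespaced children to reach <Result> and its contents.
$id = (string) $body->children(null)->Result->station['ID'];

var_dump($seenWithoutSwitch);
echo $id, "\n";
```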

I've written some debugging functions to use with SimpleXML which should be much less misleading than var_dump etc. Here's a live demo with your code and simplexml_dump.

Saturday, May 29, 2021
 
Hugo
answered 5 Months ago
97

whoops!

Turns out, I had got the package installed initially but upon reinstallation it was silently failing. In between those two builds I fixed the manifest to be as you see above - the installed version didn't have the intent-filters specified, which obviously wouldn't work.

Guess I'll leave this here in case someone has the same need? Or should I just delete it?

Wednesday, July 28, 2021
 
Pupil
answered 3 Months ago
42

In short, you can't. The Python Way would be to subclass String and work from there.

Tuesday, August 3, 2021
 
revive
answered 3 Months ago