Asked  7 Months ago    Answers:  5   Viewed   30 times

I know this has been asked many times, but I haven't been able to get any of the suggestions to work for my situation. I have searched the web and this site and tried everything I could find, and nothing works. I need to parse this XML, which uses the cap: namespace, and I only need four entries from it.

<?xml version="1.0" encoding="UTF-8"?>
<entry>
    <id>http://alerts.weather.gov/cap/wwacapget.php?x=TX124EFFB832F0.SpecialWeatherStatement.124EFFB84164TX.LUBSPSLUB.ac20a1425c958f66dc159baea2f9e672</id>
    <updated>2013-05-06T20:08:00-05:00</updated>
    <published>2013-05-06T20:08:00-05:00</published>
    <author>
        <name>w-nws.webmaster@noaa.gov</name>
    </author>
    <title>Special Weather Statement issued May 06 at 8:08PM CDT by NWS</title>
    <link href="http://alerts.weather.gov/cap/wwacapget.php?x=TX124EFFB832F0.SpecialWeatherStatement.124EFFB84164TX.LUBSPSLUB.ac20a1425c958f66dc159baea2f9e672"/>
    <summary>...SIGNIFICANT WEATHER ADVISORY FOR COCHRAN AND BAILEY COUNTIES... AT 808 PM CDT...NATIONAL WEATHER SERVICE DOPPLER RADAR INDICATED A STRONG THUNDERSTORM 30 MILES NORTHWEST OF MORTON...MOVING SOUTHEAST AT 25 MPH. NICKEL SIZE HAIL...WINDS SPEEDS UP TO 40 MPH...CONTINUOUS CLOUD TO GROUND LIGHTNING...AND BRIEF MODERATE DOWNPOURS ARE POSSIBLE WITH</summary>
    <cap:event>Special Weather Statement</cap:event>
    <cap:effective>2013-05-06T20:08:00-05:00</cap:effective>
    <cap:expires>2013-05-06T20:45:00-05:00</cap:expires>
    <cap:status>Actual</cap:status>
    <cap:msgType>Alert</cap:msgType>
    <cap:category>Met</cap:category>

    <cap:urgency>Expected</cap:urgency>
    <cap:severity>Minor</cap:severity>
    <cap:certainty>Observed</cap:certainty>
    <cap:areaDesc>Bailey; Cochran</cap:areaDesc>
    <cap:polygon>34.19,-103.04 34.19,-103.03 33.98,-102.61 33.71,-102.61 33.63,-102.75 33.64,-103.05 34.19,-103.04</cap:polygon>
    <cap:geocode>
        <valueName>FIPS6</valueName>
        <value>048017 048079</value>
        <valueName>UGC</valueName>

        <value>TXZ027 TXZ033</value>
    </cap:geocode>
    <cap:parameter>
        <valueName>VTEC</valueName>
        <value>
        </value>
    </cap:parameter>
</entry>

I am using SimpleXML and have a small test script that works great for parsing regular elements. For the life of me, I can't find a way to parse the elements with namespaces.

Here is the test script I am using; it works great for parsing simple elements. How do I use it to parse namespaced elements? Everything I've tried doesn't work. I need the values in variables so I can embed them in HTML for styling.

<?php 

$html = "";  

// Get the XML Feed
$data = "http://alerts.weather.gov/cap/tx.php?x=1";


// load the xml into the object
$xml = simplexml_load_file($data);

for ($i = 0; $i < 10; $i++){
    $title = $xml->entry[$i]->title;
    $summary = $xml->entry[$i]->summary;

    $html .= "<p><strong>$title</strong></p><p>$summary</p><hr/>";

}

 echo $html; 
?> 

This works fine for parsing regular elements but what about the ones with the cap: namespace under the entry parent?

<?php
ini_set('display_errors','1');

$html = "";
$data = "http://alerts.weather.gov/cap/tx.php?x=1";
$entries = simplexml_load_file($data);
if(count($entries)):
    //Registering NameSpace
    $entries->registerXPathNamespace('prefix', 'http://www.w3.org/2005/Atom');
    $result = $entries->xpath("//prefix:entry");
    //echo count($asin);
    //echo "<pre>";print_r($asin);
    foreach ($result as $entry):
        $title = $entry->title;
        $summary = $entry->summary;

        $html .= "<p><strong>$title</strong></p><p>$summary</p>$event<hr/>";

    endforeach;
endif;

echo $html;

?>

Any help would be greatly appreciated.

-Thanks

 Answers

11

I have given the same type of answer here: solution to your question

You just need to register the namespace, and then you can work normally with simplexml_load_file and XPath.

<?php
$data = "http://alerts.weather.gov/cap/tx.php?x=1";
$entries = file_get_contents($data);
$entries = new SimpleXmlElement($entries);
if (count($entries)):
    // Alternative to registering a namespace: match on local-name(), e.g.
    // $result = $entries->xpath("//*[local-name() = 'entry']");

    // Register a prefix for the Atom namespace so XPath can address <entry>.
    $entries->registerXPathNamespace('prefix', 'http://www.w3.org/2005/Atom');
    $result = $entries->xpath("//prefix:entry");
    foreach ($result as $entry):
        // children() with the CAP namespace URI exposes the cap: elements.
        $dc = $entry->children('urn:oasis:names:tc:emergency:cap:1.1');
        echo $dc->event."<br/>";
        echo $dc->effective."<br/>";
        echo "<hr>";
    endforeach;
endif;

That's it.
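
For reference, here is a self-contained sketch of the same approach that runs without network access. The inline XML is a trimmed, made-up stand-in for the feed. It assumes the feed uses the Atom namespace as its default and binds the cap prefix to urn:oasis:names:tc:emergency:cap:1.1, as in the code above. The $rows and $html variables are illustrative names only.

```php
<?php
// Trimmed, hypothetical stand-in for the real feed (namespace bindings assumed).
$feed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:cap="urn:oasis:names:tc:emergency:cap:1.1">
  <entry>
    <title>Special Weather Statement issued May 06 at 8:08PM CDT by NWS</title>
    <cap:event>Special Weather Statement</cap:event>
    <cap:effective>2013-05-06T20:08:00-05:00</cap:effective>
    <cap:expires>2013-05-06T20:45:00-05:00</cap:expires>
    <cap:severity>Minor</cap:severity>
  </entry>
</feed>
XML;

$xml  = new SimpleXMLElement($feed);
$rows = [];

// Plain property access reaches elements in the default (Atom) namespace.
foreach ($xml->entry as $entry) {
    // children() with a namespace URI exposes the cap: elements as properties.
    $cap = $entry->children('urn:oasis:names:tc:emergency:cap:1.1');

    $rows[] = [
        'title'     => (string) $entry->title,
        'event'     => (string) $cap->event,
        'effective' => (string) $cap->effective,
        'expires'   => (string) $cap->expires,
        'severity'  => (string) $cap->severity,
    ];
}

// Build the HTML from plain string variables.
$html = '';
foreach ($rows as $row) {
    $html .= '<p><strong>' . htmlspecialchars($row['title']) . '</strong></p>'
           . '<p>' . htmlspecialchars($row['event'])
           . ' (' . htmlspecialchars($row['severity']) . ')</p><hr/>';
}
echo $html;
```

Casting each value to string early keeps the SimpleXMLElement objects out of the template code, which makes the values easy to embed in HTML.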

Wednesday, March 31, 2021
 
toesslab
answered 7 Months ago
81

What you tried looks very strange, but at least you tried something ;-) ... You somehow mixed a POST and GET request, there is no header defined and where is the xml format?

Probably it helps to read this first: What is a XML-RPC Request

Then concerning App Inventor you can try to use the following blocks.
EDIT: update of the screenshot to make things clearer.


Wednesday, March 31, 2021
 
hillz
answered 7 Months ago
28

I don't want XMLWriter to encode the multilingual characters. How is this possible?

Actually, as soon as you write XML you are already encoding. What you mean is that you don't want numeric entities to be used for these two characters, which is possible, but not always.

To avoid numeric entities, the encoding of the document needs to match the encoding of your string. From the output you provided I can only guess a bit, but those two characters probably stand for:

  1. Unicode Han Character 'the Chinese people, Chinese language' (U+6F22)
  2. Unicode Han Character 'letter, character, word' (U+5B57)

Which could mean (I do not speak any Chinese so far) something like Chinese Word.

XMLWriter in PHP will always put characters into a numeric entity (like &#x6F22; and &#x5B57; in your example) whenever the encoding of the document is not able to represent that character within the document.

If the two encodings match, XMLWriter will not use numeric entities.

Here is a simpler example. Let's take the US-ASCII encoding and the German umlaut Ä from Äpfel (Unicode Character 'LATIN CAPITAL LETTER A WITH DIAERESIS' (U+00C4)) as an attribute value:

<?php
$xmlWriter = new XMLWriter();
$xmlWriter->openMemory();
$xmlWriter->startDocument('1.0', 'US-ASCII');
$xmlWriter->startElement('root');
$xmlWriter->writeAttribute('value', 'Äpfel');
$xmlWriter->endDocument();
echo $xmlWriter->flush();

Saved in a UTF-8 encoded PHP file, this code outputs the following when executed:

<?xml version="1.0" encoding="US-ASCII"?>
<root value="&#196;pfel"/>

&#196; is the numeric entity for the Unicode character U+00C4. If you look closely, C4 is the hexadecimal representation of decimal 196, which also shows that the numeric XML entity always represents the Unicode code point.

So the XML output uses the US-ASCII encoding, which cannot represent the Ä from the UTF-8 encoded string in the PHP code, and XMLWriter therefore encodes it as its numeric entity to preserve the character information.
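
The relationship between the entity number and the code point can be checked directly. This is a small sketch using only core PHP functions; the variable names are illustrative.

```php
<?php
// Hexadecimal C4 and decimal 196 are the same code point, U+00C4.
$codePoint = hexdec('C4');

// Decoding the decimal entity as UTF-8 yields the two bytes that encode U+00C4.
$decoded = html_entity_decode('&#196;', ENT_QUOTES, 'UTF-8');

// The hexadecimal form of the entity refers to the same character.
$decodedHex = html_entity_decode('&#xC4;', ENT_QUOTES, 'UTF-8');

// Show the code point and the raw UTF-8 bytes of both decoded strings.
printf("%d %s %s\n", $codePoint, bin2hex($decoded), bin2hex($decodedHex));
```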

Now changing the encoding from:

$xmlWriter->startDocument('1.0', 'US-ASCII');

to the UTF-8 encoding of the PHP string:

$xmlWriter->startDocument('1.0', 'UTF-8');

changes the output to:

<?xml version="1.0" encoding="UTF-8"?>
<root value="Äpfel"/>

This would work equally well with your example. However, one important piece of information is missing from your question: in which encoding is the string from that record?

If it is already UTF-8, then, as outlined in the example above, it just works:

<?php
$recordUTf8 = "... contents=\"Just <span style=\"color:red\">testing</span>:"
             ."\xE6\xBC\xA2\xE5\xAD\x97\"";
$encoding   = 'UTF-8';
// $encoding = 'US-ASCII'; // alternative: falls back to numeric entities

$xmlWriter = new XMLWriter();
$xmlWriter->openMemory();
$xmlWriter->startDocument('1.0', $encoding);
$xmlWriter->startElement('record');
$xmlWriter->writeAttribute('value', $recordUTf8);
$xmlWriter->endDocument();
echo $xmlWriter->flush();

Output:

<?xml version="1.0" encoding="UTF-8"?>
<record value="... contents=&quot;Just &lt;span style=&quot;color:red&quot;&gt;testing&lt;/span&gt;:漢字&quot;"/>

As this output shows, no numeric entities are used here, and the string is clearly UTF-8 encoded (it is written with byte escapes so that it stays binary safe if you copy it into a PHP file that uses a different encoding).

So, to summarize at this point: the XML encoding needs to match the encoding of the string in order to represent all characters without numeric entities (apart from the ones used to encode XML itself, like <, >, ', " and &).

These are pretty much XML basics. If the document has an encoding that cannot represent the character data, the fallback is numeric entities, because XML supports Unicode. You are trying to prevent this fallback by aligning the document encoding with the string encoding.

Here is my advice for PHP & XMLWriter specifically:

  1. Obtain or re-encode the record from the database to UTF-8.
  2. Only pass UTF-8 strings into XMLWriter methods.
  3. Set the XML document's encoding to UTF-8.

I give these suggestions because UTF-8 is the default encoding of XML and UTF-8 support in PHP is quite good. Also, XMLWriter expects Unicode strings to be UTF-8 encoded. There is no setting or option that allows you to change that, so the input already needs to be UTF-8 encoded.
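
The three steps above can be sketched as follows. This assumes the record arrives as ISO-8859-1 and that the iconv extension is available; the sample bytes and variable names are made up for illustration, so substitute the real source encoding of your record.

```php
<?php
// Hypothetical record as it might come from a Latin-1 database column:
// "\xC4" is the single-byte ISO-8859-1 encoding of the letter Ä.
$recordLatin1 = "\xC4pfel";

// Step 1: re-encode to UTF-8 (the source encoding is an assumption here).
$recordUtf8 = iconv('ISO-8859-1', 'UTF-8', $recordLatin1);

// Steps 2 and 3: pass only UTF-8 strings in and declare UTF-8 for the document.
$xmlWriter = new XMLWriter();
$xmlWriter->openMemory();
$xmlWriter->startDocument('1.0', 'UTF-8');
$xmlWriter->startElement('root');
$xmlWriter->writeAttribute('value', $recordUtf8);
$xmlWriter->endElement();
$xmlWriter->endDocument();

$output = $xmlWriter->flush();
echo $output;
```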

However, independent of the input string, you can naturally tell XMLWriter to use a different output encoding. For example, another Chinese or Unicode encoding might be suitable for you, and XMLWriter can output it as long as your PHP configuration supports that specific output encoding (check the iconv library you have).

When you start the document with XMLWriter, the second parameter specifies the encoding:

$xmlWriter->startDocument('1.0', $encoding);

You can put in any encoding from the set of the encodings XML supports in the corresponding XML-Declaration:

<?xml version="1.0" encoding="ISO-8859-1"?><!-- Latin-1 example -->

The full specs of the XML encoding value can be found here: http://www.w3.org/TR/REC-xml/#NT-EncName

In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" should be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-n" (where n is the part number) should be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" should be used for the various encoded forms of JIS X-0208-1997. It is recommended that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using their registered names; other encodings should use names starting with an "x-" prefix. XML processors should match character encoding names in a case-insensitive way and should either interpret an IANA-registered name as the encoding registered at IANA for that name or treat it as unknown (processors are, of course, not required to support all IANA-registered encodings).

Where [IANA-CHARSETS] is:

(Internet Assigned Numbers Authority) Official Names for Character Sets, ed. Keld Simonsen et al. (See http://www.iana.org/assignments/character-sets.)

These specs are perhaps a little verbose. In the context of your question, all you need to do is find out the encoding of your record string. By the way, I was not able to reproduce your exact output: I always get decimal entities, not hexadecimal ones. You might be able to provide more information with a hex dump of the string.

Wednesday, March 31, 2021
 
mopsyd
answered 7 Months ago
86

(Read @ThW's answer about why an array is actually not that important to aim for)

I know it's easy with non-namespaced nodes, but I don't know where to begin on something like this.

It's as easy as with non-namespaced nodes, because technically those are the same. Here is a quick example: the following script loops over all elements in the document regardless of namespace:

$result = $xml->xpath('//*');
foreach ($result as $element) {
    $depth = count($element->xpath('./ancestor::*'));
    $indent = str_repeat('  ', $depth);
    printf("%s %s\n", $indent, $element->getName());
}

The output in your case is:

 message
   header
     response
       result
       gsbStatus
   body
     bodyContent
       attendee
         person
           name
           firstName
           lastName
           address
             addressType
           phones
           email
           type
         contactID
         joinStatus
         meetingKey

As you can see, you can iterate over all elements as if they had no namespace at all.

But as has been outlined, when you ignore the namespace you also lose important information. For example, with the document you have, you're actually interested in the attendee and common elements; the service elements deal with the transport:

$uriAtt = 'http://www.webex.com/schemas/2002/06/service/attendee';
$xml->registerXPathNamespace('att', $uriAtt);

$uriCom = 'http://www.webex.com/schemas/2002/06/common';
$xml->registerXPathNamespace('com', $uriCom);

$result = $xml->xpath('//att:*|//com:*');
foreach ($result as $element) {
    $depth  = count($element->xpath("./ancestor::*[namespace-uri(.) = '$uriAtt' or namespace-uri(.) = '$uriCom']"));
    $indent = str_repeat('  ', $depth);
    printf("%s %s\n", $indent, $element->getName());
}

The example output this time:

 attendee
   person
     name
     firstName
     lastName
     address
       addressType
     phones
     email
     type
   contactID
   joinStatus
   meetingKey
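
For anyone who wants to try the filtering without the original document, here is a minimal self-contained sketch. The XML is an invented stand-in, and the urn:example:transport namespace and the t prefix are hypothetical; only the two namespace URIs are taken from the code above.

```php
<?php
$uriAtt = 'http://www.webex.com/schemas/2002/06/service/attendee';
$uriCom = 'http://www.webex.com/schemas/2002/06/common';

// Invented stand-in document; the "urn:example:transport" namespace is hypothetical.
$xml = new SimpleXMLElement(
    '<t:message xmlns:t="urn:example:transport"'
    . ' xmlns:att="' . $uriAtt . '" xmlns:com="' . $uriCom . '">'
    . '<t:body><att:attendee><att:person><com:name>Jane</com:name></att:person>'
    . '</att:attendee></t:body>'
    . '</t:message>'
);

// Register our own prefixes for the two namespaces of interest.
$xml->registerXPathNamespace('att', $uriAtt);
$xml->registerXPathNamespace('com', $uriCom);

// Collect the local names of all elements in either namespace.
$names = [];
foreach ($xml->xpath('//att:*|//com:*') as $element) {
    $names[] = $element->getName();
}
echo implode(', ', $names), "\n";
```

The transport elements (message and body in this sketch) are left out of the result because their namespace was not part of the XPath expression.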

So why drop all the namespaces? They help you to obtain the elements you're interested in. You can also do it dynamically

Saturday, May 29, 2021
 
Fernando
answered 5 Months ago
63

You should not read the namespaces from the document. The namespace is a unique string that defines the XML semantics the tag is part of. Your XML is a good example of that, because it has Point elements in two different namespaces.

p:Point is {http://example.org}:Point
gml:Point is {http://www.opengis.net/gml}:Point

The namespace prefixes like p and gml are aliases that make a document smaller and more readable. They are only valid for the element and its children, and they can be redefined at any point. More importantly, they are only valid within that document.

So to read XML, you define your own prefixes for the namespaces and use them with XPath, or you use the namespace-aware variants of the DOM methods, like getAttributeNS(). XPath is by far the more elegant solution. You can use the prefixes from the document or different ones.

$element = simplexml_load_string($content);
$element->registerXPathNamespace('gml', 'http://www.opengis.net/gml');
$element->registerXPathNamespace('p', 'http://example.org');

$result = [];
$positions = $element->xpath('//p:Point[1]//gml:pos');
foreach ($positions as $pos) {
  $result[] = (string)$pos;
}

var_dump($result);

Output: https://eval.in/159739

array(5) {
  [0]=>
  string(23) "-3.84307585 43.46031547"
  [1]=>
  string(23) "-3.84299411 43.46018513"
  [2]=>
  string(23) "-3.84299935 43.45998723"
  [3]=>
  string(23) "-3.84309913 43.46054546"
  [4]=>
  string(23) "-3.84307585 43.46031547"
}
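
To illustrate that only the namespace URI matters and not the prefix used in the document, here is a small sketch with two made-up documents. One uses the prefix a and the other uses a default namespace, and the same XPath expression with our own p alias matches both.

```php
<?php
// Two hypothetical documents: same namespace URI, different prefixes.
$docs = [
    '<a:root xmlns:a="http://example.org"><a:Point>1 2</a:Point></a:root>',
    '<root xmlns="http://example.org"><Point>1 2</Point></root>',
];

$found = [];
foreach ($docs as $content) {
    $element = simplexml_load_string($content);
    // "p" is our own alias; it does not need to appear in either document.
    $element->registerXPathNamespace('p', 'http://example.org');
    foreach ($element->xpath('//p:Point') as $point) {
        $found[] = (string) $point;
    }
}
var_dump($found);
```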
Saturday, August 7, 2021
 
benjisail
answered 3 Months ago