Asked  7 Months ago    Answers:  5   Viewed   86 times

I'm currently rewriting a PHP class that splits an XML file into smaller chunks so that it uses XMLReader and XMLWriter instead of the current basic filesystem and regex approach.

However, I can't figure out how to get the version, encoding and standalone flags from the XML preamble.

The start of my test XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE fakedoctype SYSTEM "fake_doc_type.dtd">

 <!--
 This is a comment, it's here to try and get the parser to break in some way
 --> 

<root attribute="value" otherattribute="othervalue">

I can open the file with the reader and move through the document with read(), next(), etc., but I can't seem to get at whatever is in <?xml ... ?>. The first thing I'm able to access is the fake DOCTYPE.

My testing code is as follows:

$a = new XMLReader ();
var_dump ($a -> open ('/path/to/test/file.xml')); // true
var_dump ($a -> nodeType); // 0
var_dump ($a -> name); // ""
var_dump ($a -> readOuterXML ()); // ''
var_dump ($a -> read ()); // true
var_dump ($a -> nodeType); // 10
var_dump ($a -> readOuterXML ()); // <!DOCTYPE fakedoctype SYSTEM "fake_doc_type.dtd">

Of course I could just always assume XML 1.0, UTF-8 encoding and standalone = yes, but for the sake of correctness I'd much rather read whatever values my source feed declares and use them when generating the split files.

The documentation on XMLReader and XMLWriter seems to be very poor, so there's every chance I've just missed something in the docs. Does anyone know what to do in this case?

 Answers


As far as I know, even though XMLReader has the XMLReader::XML_DECLARATION constant, I have never seen a node of that type in the XMLReader::$nodeType property when traversing the document with XMLReader::read().

It appears to be skipped. I have also wondered why, and I have not yet found any flag or option to change this behavior.
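A quick way to check this on your own PHP build is to collect every nodeType that read() reports for a small inline document that has a declaration. The nodeTypes() helper and the sample document are only for illustration.

```php
<?php
// Collect every node type XMLReader reports while traversing $xml with read().
function nodeTypes(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml); // read from a string instead of a file
    $types = [];
    while ($reader->read()) {
        $types[] = $reader->nodeType;
    }
    $reader->close();
    return $types;
}

$types = nodeTypes('<?xml version="1.0" encoding="UTF-8" standalone="yes"?><!--note--><root/>');
// The declaration does not show up as a node of type XMLReader::XML_DECLARATION (17)
var_dump(in_array(XMLReader::XML_DECLARATION, $types, true));
```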

As for the output, XMLReader always returns UTF-8 encoded strings, the same as the other libxml-based parts of PHP, so that side is clear. But I assume that is not what you're interested in; you want the values declared in the raw input of the file you open with XMLReader::open().
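To see the UTF-8 conversion, the sketch below feeds XMLReader a small ISO-8859-1 document (an assumption for illustration) from a string via XMLReader::XML(). The single input byte 0xE9 (é) comes back as the UTF-8 sequence C3 A9, which is also why the declared encoding cannot be recovered from the strings XMLReader returns.

```php
<?php
// An ISO-8859-1 document: "\xE9" is é as a single Latin-1 byte.
$latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Jos\xE9</name>";

$reader = new XMLReader();
$reader->XML($latin1);
$value = null;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::TEXT) {
        $value = $reader->value; // text content as handed back by libxml
    }
}
$reader->close();

// é is now the two-byte UTF-8 sequence C3 A9
var_dump(bin2hex($value)); // string(10) "4a6f73c3a9"
```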

Not specifically for XMLReader, I once created a utility class named XMLRecoder that detects the encoding of an XML string based on the XML declaration and also based on the BOM. I think you should do both. This is the one part where I think you still need regular expressions, but since the XML declaration must come first in the document and its syntax is strictly defined (it looks like a processing instruction (PI), although the specification treats it separately), you should be able to peek in there.
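A sketch of such a peek that covers the whole declaration (version, encoding and standalone), following the XMLDecl production of the XML 1.0 specification. This is not the XMLRecoder code; the function name parseXmlDeclaration is hypothetical, and it assumes an ASCII-compatible document (a UTF-8 BOM is skipped, UTF-16/32 input would need to be decoded first).

```php
<?php
/**
 * Hypothetical helper: parse the XML declaration at the start of $head.
 * Returns ['version' => string, 'encoding' => ?string, 'standalone' => ?string],
 * or null when $head does not start with an XML declaration.
 */
function parseXmlDeclaration(string $head): ?array
{
    // Skip a UTF-8 BOM so the anchored pattern can match.
    if (strncmp($head, "\xEF\xBB\xBF", 3) === 0) {
        $head = substr($head, 3);
    }
    // VersionInfo, optional EncodingDecl, optional SDDecl; \1, \3 and \5 make
    // each closing quote match its opening quote. \s is slightly looser than XML's S.
    $pattern = '/^<\?xml\s+version\s*=\s*(["\'])(?<version>1\.[0-9]+)\1'
        . '(?:\s+encoding\s*=\s*(["\'])(?<encoding>[A-Za-z][A-Za-z0-9._-]*)\3)?'
        . '(?:\s+standalone\s*=\s*(["\'])(?<standalone>yes|no)\5)?'
        . '\s*\?>/';
    if (preg_match($pattern, $head, $m) !== 1) {
        return null;
    }
    // Optional groups that did not match are either missing or "" in $m.
    return [
        'version'    => $m['version'],
        'encoding'   => ($m['encoding'] ?? '') !== '' ? $m['encoding'] : null,
        'standalone' => ($m['standalone'] ?? '') !== '' ? $m['standalone'] : null,
    ];
}
```

For a file, reading only the first few hundred bytes is usually enough, e.g. `parseXmlDeclaration(file_get_contents($path, false, null, 0, 512))`, because the declaration has to be at the very start.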

Here is the related part of the XMLRecoder code:

### excerpt from https://gist.github.com/hakre/5194634 

/**
 * pcre pattern to access EncodingDecl, see <http://www.w3.org/TR/REC-xml/#sec-prolog-dtd>
 */
const DECL_PATTERN = '(^<\?xml\s+version\s*=\s*(["\'])(1\.\d+)\1\s+encoding\s*=\s*(["\'])(((?!\3).)*)\3)';
const DECL_ENC_GROUP = 4;
const ENC_PATTERN = '(^[A-Za-z][A-Za-z0-9._-]*$)';

...

($result = preg_match(self::DECL_PATTERN, $buffer, $matches, PREG_OFFSET_CAPTURE))
    && $result = $matches[self::DECL_ENC_GROUP];

As you can see, the pattern only goes as far as the encoding, so it is not complete: it does not capture standalone, and it only matches declarations that contain an encoding. However, for extracting the encoding (and, for your needs, the version) it should do the job. I ran it against thousands of random XML documents for testing.
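To use the extracted values when generating the split files, they can be passed to XMLWriter::startDocument(), which takes version, encoding and standalone as its three optional arguments. A minimal sketch, assuming the values have already been extracted into an array; the startChunk name and the array layout are hypothetical.

```php
<?php
// Start a split file that repeats the source document's XML declaration.
// Assumed $decl layout: ['version' => string, 'encoding' => ?string, 'standalone' => ?string]
function startChunk(array $decl): XMLWriter
{
    $writer = new XMLWriter();
    $writer->openMemory(); // openUri($path) would write the chunk to a file instead
    // A null encoding or standalone value is left out of the declaration.
    $writer->startDocument($decl['version'], $decl['encoding'], $decl['standalone']);
    return $writer;
}

$writer = startChunk(['version' => '1.0', 'encoding' => 'UTF-8', 'standalone' => 'yes']);
$writer->writeElement('root', 'chunk 1');
$writer->endDocument();
$out = $writer->outputMemory();
echo $out;
```

Strings passed to XMLWriter should be UTF-8, which is what XMLReader hands back anyway.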

Another part is the BOM detection:

### excerpt from https://gist.github.com/hakre/5194634 

const BOM_UTF_8 = "\xEF\xBB\xBF";
const BOM_UTF_32LE = "\xFF\xFE\x00\x00";
const BOM_UTF_16LE = "\xFF\xFE";
const BOM_UTF_32BE = "\x00\x00\xFE\xFF";
const BOM_UTF_16BE = "\xFE\xFF";

...

/**
 * @param string $string string (recommended length 4 characters/octets)
 * @param string $default (optional) value to return if no BOM is detected
 * @return string|null Encoding; if it cannot be detected, $default (NULL unless given)
 * @throws InvalidArgumentException
 */
public function detectEncodingViaBom($string, $default = NULL)
{
    $len = strlen($string);

    if ($len > 4) {
        $string = substr($string, 0, 4);
    } elseif ($len < 4) {
        throw new InvalidArgumentException(sprintf("Need at least four characters, %d given.", $len));
    }

    switch (true) {
        case $string === self::BOM_UTF_16BE . $string[2] . $string[3]:
            return "UTF-16BE";

        case $string === self::BOM_UTF_8 . $string[3]:
            return "UTF-8";

        case $string === self::BOM_UTF_32LE:
            return "UTF-32LE";

        case $string === self::BOM_UTF_16LE . $string[2] . $string[3]:
            return "UTF-16LE";

        case $string === self::BOM_UTF_32BE:
            return "UTF-32BE";
    }

    return $default;
}

I also ran the BOM detection against the same set of XML documents; however, not many of them had a BOM. As you can see, the detection order is optimized for the more common cases while taking care of the byte patterns the different BOMs share: the UTF-16LE BOM (FF FE) is a prefix of the UTF-32LE BOM (FF FE 00 00), so UTF-32LE has to be tested first. Most documents I encountered have no BOM, and you mainly need the check to find out whether a document is UTF-32 encoded.
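When there is no BOM, the XML 1.0 specification (Appendix F, "Autodetection of Character Encodings") describes how the first four bytes, which must spell `<?xm` when a declaration is present, reveal the encoding family. A sketch of that lookup; the function name sniffEncodingFamily is hypothetical, and the EBCDIC case from the appendix is left out.

```php
<?php
// Hypothetical helper for the no-BOM case of XML 1.0 Appendix F.
// Returns an encoding family guess, or null if the first four bytes match none.
function sniffEncodingFamily(string $head): ?string
{
    $map = [
        "\x00\x00\x00\x3C" => 'UTF-32BE',         // "<" in 32-bit big-endian units
        "\x3C\x00\x00\x00" => 'UTF-32LE',         // "<" in 32-bit little-endian units
        "\x00\x3C\x00\x3F" => 'UTF-16BE',         // "<?" in 16-bit big-endian units
        "\x3C\x00\x3F\x00" => 'UTF-16LE',         // "<?" in 16-bit little-endian units
        "\x3C\x3F\x78\x6D" => 'ASCII-compatible', // "<?xm": read the encoding declaration next
    ];
    return $map[substr($head, 0, 4)] ?? null;
}
```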

Hope this at least gives some insights.

Friday, May 28, 2021
 
inieto
answered 7 Months ago

This should work.

function test()
{
    error_reporting(E_ERROR | E_WARNING | E_PARSE);
    $url = "http://site.xml";
    $reader = new XMLReader();
    $reader->open($url);

    $var = array();
    $i = 0;
    $limit = 3;

    while ($reader->read())
    {
        if (($reader->name == "id" || $reader->name == "username") && $reader->nodeType == XMLReader::ELEMENT)
        {
            $name = $reader->name;
            if ($i == $limit) break;
            while ($reader->read())
            {
                if ($reader->nodeType == XMLReader::TEXT
                    || $reader->nodeType == XMLReader::CDATA
                    || $reader->nodeType == XMLReader::WHITESPACE
                    || $reader->nodeType == XMLReader::SIGNIFICANT_WHITESPACE)
                {
                    $var[$i][$name] = $reader->value;
                }
                else if ($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == $name)
                {
                    // stop at the closing tag of the element we just entered (id or username)
                    break;
                }
            }

            if ($name == "username")
                $i++;
        }
    }
    $reader->close();

    echo '<pre>';
    print_r($var);
    echo '</pre>';
}

CHANGES:

($reader->name == "id" || $reader->name == "username")

$name = $reader->name;

$var[$i][$name] = $reader->value;

if($name == "username") $i++;
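A self-contained variant of the same loop, reading an inline string instead of the remote URL. The readUsers() name and the <users>/<user>/<id>/<username> layout are assumptions; instead of the nested read() loop, it uses XMLReader::readString() to take each element's text content.

```php
<?php
// Collect up to $limit id/username pairs from $xml.
function readUsers(string $xml, int $limit): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $users = [];
    $i = 0;
    while ($i < $limit && $reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT
            && ($reader->name === 'id' || $reader->name === 'username')) {
            $name = $reader->name;
            // readString() returns the element's text without moving the reader
            $users[$i][$name] = $reader->readString();
            if ($name === 'username') {
                $i++; // a username closes one record
            }
        }
    }
    $reader->close();
    return $users;
}

$xml = '<users><user><id>1</id><username>ann</username></user>'
     . '<user><id>2</id><username>bob</username></user>'
     . '<user><id>3</id><username>cy</username></user></users>';
print_r(readUsers($xml, 2));
```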

Saturday, May 29, 2021
 
mopsyd
answered 7 Months ago

Add a break statement after the end of the first case of the switch on nodeType:

<?php
$xml = new XMLReader();
$xml->open("php://stdin");

while($xml->read()) {

  switch($xml->nodeType) {
    case XMLReader::ELEMENT:
      switch($xml->name) {
        case 'author':
          echo("+" . $xml->name);
          break;
      }

      // THIS LINE IS MISSING
      break;

    case XMLReader::END_ELEMENT:
      switch($xml->name) {
        case 'author':
          echo("-" . $xml->name);
          break;
      }
  }
}
?>

Add another break after handling END_ELEMENT as well, if only for symmetry:

    case XMLReader::END_ELEMENT:
      switch($xml->name) {
        case 'author':
          echo("-" . $xml->name);
          break;
      }

      break;
  }

The problem arose from the coding style: with nested switch statements, a missing break is easy to overlook. Simplify the code. For example:

$xml = new XMLReader();
$xml->open("php://stdin");

while($xml->read()) {    
  switch($xml->nodeType) {
    case XMLReader::ELEMENT: {
      startElement( $xml->name );
      break;
    }

    case XMLReader::END_ELEMENT: {
      endElement( $xml->name );
      break;
    }
  }
}
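The startElement() and endElement() handlers are not defined in the answer. In this self-contained sketch they are inlined and the events are collected into an array so the effect of each break can be checked; the authorEvents() name and the sample document are assumptions.

```php
<?php
// Record "+author" on each <author> start tag and "-author" on each end tag.
function authorEvents(string $xml): array
{
    $events = [];
    $reader = new XMLReader();
    $reader->XML($xml);
    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                if ($reader->name === 'author') {
                    $events[] = '+' . $reader->name;
                }
                break; // without this, execution falls through into END_ELEMENT
            case XMLReader::END_ELEMENT:
                if ($reader->name === 'author') {
                    $events[] = '-' . $reader->name;
                }
                break;
        }
    }
    $reader->close();
    return $events;
}

print_r(authorEvents('<book><author>A</author><author>B</author></book>'));
```

Note that a self-closing `<author/>` produces only an ELEMENT node (with isEmptyElement set), not an END_ELEMENT.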

There are further simplifications you can make. PHP has an XML marshalling package, but you could also abstract the code into classes. Instances of those classes would then be able to read (or write) themselves from (or to) an XML file. For example:

$xml = new XMLReader();
$xml->open("php://stdin");

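while($xml->read()) {
  // react only to the start tag; name is also 'author' on the end tag
  if( $xml->nodeType == XMLReader::ELEMENT && $xml->name == 'author' ) {
    $author = new Author();
    $author->marshall( $xml );
  }
}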

This couples the details of how the object is stored with the object itself. Any time you change the Author object, you know you must change how it marshals itself. You could abstract and extend these concepts even further using appropriate design patterns, XML schemas, and so forth.
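The Author class itself is not shown in the answer. A minimal sketch of what it could look like; the class body, its name property and the inline sample document are all assumptions for illustration. readString() collects the text inside the current <author> element without moving the reader.

```php
<?php
// Hypothetical Author class that fills itself from an <author> element.
final class Author
{
    public string $name = '';

    public function marshall(XMLReader $xml): void
    {
        // Assumes the reader is positioned on the <author> start tag.
        $this->name = $xml->readString();
    }
}

$xml = new XMLReader();
$xml->XML('<authors><author>Ada</author><author>Alan</author></authors>');
$authors = [];
while ($xml->read()) {
    if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === 'author') {
        $author = new Author();
        $author->marshall($xml);
        $authors[] = $author;
    }
}
$xml->close();
```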

Thus your final code might resemble:

$xml = new XMLReader();
$xml->open( "php://stdin" );
$publications = new Publications();
$publications->marshall( $xml );

The Publications object is responsible for reading the XML document and instantiating the appropriate classes whenever their associated XML tags appear:

while($xml->read()) {
  if( $xml->nodeType == XMLReader::ELEMENT && $xml->name == 'article' ) {
    $article = new Article();
    $article->marshall( $xml );
    $this->add( $article );
  }
}

Use a PHP marshalling framework to save yourself time and effort. Consider XML_Serializer:

  • http://pear.php.net/package/XML_Serializer
Saturday, May 29, 2021
 
altermativ
answered 7 Months ago

Hey, I've had a go at this. I can't test it, as I don't have the additional files and the cart object, but it should be close to error-free.

I use a session variable 'cart': if it is present, we grab it and unserialize it, and we're done; we can then edit the values, save it back out, and so on.

If it is not present (i.e. on the first hit, or after the cart was deleted), we build a new cart from the database. (This isn't ideal and is just for testing; at present you're adding every item from the database to the cart?)

If the POST or GET value adjQ is present, we modify some of the values of the cart object and save it back out to the session variable.

If the POST or GET value showCart is present, we output the current cart. To make this work, you might have to tweak your ShoppingCart object to support the variables being called, the getCount function and the getAllRows function.

I've removed the additional storage of an array of the items from the cart (w). I'm not sure what that was for; since the data is stored in the object, you don't need to replicate it.

All the request variables should be sanitized to prevent injection attacks and so on.

I've added a hidden field to trigger the showCart request.

Anyway, hope this helps.

<?php
    session_start();
?>
<!doctype html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <title>Testing the Shopping Cart</title>
    </head>
    <body>
<?php # cart.php
// This script uses the ShoppingCart and Item classes.
//error_reporting(0);

    // Load the classes up front: Item must be defined before a stored cart
    // is unserialized, and Connect is used in more than one branch below.
    require('ShoppingCart.php');
    require('Item.php');
    require('Connect.php');
    require('userMenu.php');

    $rowCount = 0;

    if(isset($_SESSION['cart']))
    {
        echo "We have a stored cart in a Session variable, retrieving data ...";

        $cart = unserialize($_SESSION["cart"]);

        $rowCount = $cart->getCount();
    }
    else
    {
        // Create the cart:
        $cart = new ShoppingCart();

        // Create some items:

        $conn=Connect::doConnect();

        $query = "SELECT product_id, product_name, product_price from product";
        $result = mysqli_query($conn, $query);

        $rowCount = $result->num_rows;

        if ($result->num_rows > 0) {
            // output data of each row
            while($row = $result->fetch_assoc()) {
                $cart->addItem(new Item($row["product_id"], $row["product_name"],$row["product_price"]));
            }
        }
        $conn->close();
    }

    if(isset($_REQUEST['adjQ']))
    {
        echo "We have ".$rowCount." product types in stock";

        // Update some quantities:
        $cart_items_new = array_combine($_POST['item_adjust'], $_POST['quantity']);

        $conn=Connect::doConnect();
        // Prepared statement, so $product_id from the request is never spliced into the SQL
        $stmt = mysqli_prepare($conn, "SELECT product_id, product_name, product_price from product where product_id = ?");

        foreach ($cart_items_new as $product_id=>$quantity) {
            if($quantity > 0) {
                $cart->updateItem($product_id, $quantity);

                mysqli_stmt_bind_param($stmt, "s", $product_id);
                mysqli_stmt_execute($stmt);
                $result1 = mysqli_stmt_get_result($stmt); // requires the mysqlnd driver

                $row1 = mysqli_fetch_array($result1);
                echo $product_id." ".$quantity." + ".$row1["product_name"];
            }
            else {
                $cart->deleteItem($product_id);
            }
        }
        mysqli_stmt_close($stmt);
        $conn->close();

        // Show the cart contents:
        echo '<h2>Shopping cart contents (' . $rowCount . ' product types)</h2>
        The user is ' . $_SESSION["user"] . '.<br>
        User type is ' . $_SESSION["user_type"] . '.';

        if (!$cart->isEmpty()) {
            foreach ($cart as $arr) {
                // Get the item object:
                $item = $arr['item'];
                // Print the item:
                printf('<p><strong>%s</strong>: %d @ $%0.2f each.</p>', $item->getName(), $item->getQuantity(), $item->getPrice());
            } // End of foreach loop!

            echo "Saving cart to Session variable";
            //New_cart is only set in adjQ request, perhaps this code should be there?
            $_SESSION["cart"] = serialize($cart);
        } // End of IF.
    }

    if(isset($_REQUEST['showCart']))
    {
        if ($cart->getCount() > 0) {
            // output data of each row
            echo '<form action="cart.php" method="post"><table border="1">';
            echo '<tr><td><b>Product ID</b></td><td><b>Name</b></td><td><b>Price</b></td><td>Number of units requested</td></tr>';
            foreach ($cart->getAllRows() as $row) {
                echo '
                    <tr>
                        <td>'. $row->getProductId() . '</td>
                        <td>'. $row->getName() . '</td>
                        <td>'. $row->getPrice() . '</td>
                        <td><input type="number" min="0" value="0" name="quantity[]"><input type="hidden" value="' . $row->getProductId() . '" name="item_adjust[]"/><input type="hidden" value="showCart" name="showCart"/></td>
                    </tr>';
            }
            echo '<tr><td colspan="4"><input type="submit" value="Add to shopping cart" name="adjQ"></td></tr></table></form>';
        } else {
            echo "Cart is empty";
        }

    }
?>
</body>
</html>
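On the point that the request variables should be sanitized, one possible sketch for the quantity form, assuming product IDs are numeric: the hypothetical sanitizeQuantities() pairs each item_adjust[] entry with its quantity[] entry and keeps only non-negative integer quantities, using filter_var() with FILTER_VALIDATE_INT.

```php
<?php
// Hypothetical helper: map numeric product IDs to validated quantities.
// Entries with a non-numeric ID or a quantity that is not an integer >= 0 are dropped.
function sanitizeQuantities(array $ids, array $quantities): array
{
    $clean = [];
    foreach ($ids as $i => $id) {
        // filter_var() returns false for non-integers, negatives and arrays
        $qty = filter_var($quantities[$i] ?? null, FILTER_VALIDATE_INT, ['options' => ['min_range' => 0]]);
        if ($qty === false || !is_scalar($id) || !preg_match('/^[0-9]+$/', (string)$id)) {
            continue;
        }
        $clean[(int)$id] = $qty;
    }
    return $clean;
}

print_r(sanitizeQuantities(['5', '7', 'x'], ['2', '-1', '3']));
```

In cart.php this could take the place of the array_combine() call, once both POST fields have been checked to be arrays.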
Saturday, May 29, 2021
 
Slinky
answered 7 Months ago

You can use the PackageManager.GetPackageInfo method to get access to all sorts of information about your app, including VersionCode and VersionName.

Monday, August 16, 2021
 
Maxim Shoustin
answered 4 Months ago