Asked  7 Months ago    Answers:  5   Viewed   96 times

I have two files, 1.xml and 2.xml, with a similar structure, and I would like to merge them into one. I tried many solutions but only got errors; frankly speaking, I have no idea how those scripts were supposed to work.

1.xml:

<res>
    <items total="180">
        <item>
            <id>1</id>
            <title>Title 1</title>
            <author>Author 1</author>
        </item>
        ...
    </items>
</res> 

2.xml:

<res>
    <items total="123">
        <item>
            <id>190</id>
            <title>Title 190</title>
            <author>Author 190</author>
        </item>
        ...
    </items>
</res> 

I would like to create a new file merged.xml with the following structure

<res>
    <items total="303">
        <item>
            <id>1</id>
            <title>Title 1</title>
            <author>Author 1</author>
        </item>
        ... <!-- items from 1.xml -->
        <item>
            <id>190</id>
            <title>Title 190</title>
            <author>Author 190</author>
        </item>
        ... <!-- items from 2.xml -->
    </items>
</res> 

How should I do that? Could you explain how it works? How can I do it with more than two files? Thanks.

Edit

What I tried?

<?php
function mergeXML(&$base, $add)
{
    if ( $add->count() != 0 )
    $new = $base->addChild($add->getName());
    else
        $new = $base->addChild($add->getName(), $add);
    foreach ($add->attributes() as $a => $b)
    {
        $new->addAttribute($a, $b);
    }
    if ( $add->count() != 0 )
    {
       foreach ($add->children() as $child)
        {
            mergeXML($new, $child);
        }
    }
}
$xml = mergeXML(simplexml_load_file('1.xml'), simplexml_load_file('2.xml'));
echo $xml->asXML(merged.xml);
?>

EDIT2

Following Torious's advice, I looked into the DOMDocument manual and found an example:

function joinXML($parent, $child, $tag = null)
{
    $DOMChild = new DOMDocument;
    $DOMChild->load($child);
    $node = $DOMChild->documentElement;

    $DOMParent = new DOMDocument;
    $DOMParent->formatOutput = true;
    $DOMParent->load($parent);

    $node = $DOMParent->importNode($node, true);

    if ($tag !== null) {
        $tag = $DOMParent->getElementsByTagName($tag)->item(0);
        $tag->appendChild($node);
    } else {
        $DOMParent->documentElement->appendChild($node);
    }

    return $DOMParent->save('merged.xml');
}

joinXML('1.xml', '2.xml');

But it creates the wrong XML file:

<res>
    <items total="180">
        <item>
            <id>1</id>
            <title>Title 1</title>
            <author>Author 1</author>
        </item>
        ...
    </items>
    <res>
        <items total="123">
            <item>
                <id>190</id>
                <title>Title 190</title>
                <author>Author 190</author>
            </item>
            ...
        </items>
    </res> 
</res>  

I cannot use this file properly. I need the correct structure, but here one whole file is pasted into the other. I would like to "paste" only the item elements, not all the tags. What should I change?

EDIT3

Here is a solution, based on Torious's answer and adapted to my needs (see the //edited comments):

$doc1 = new DOMDocument();
$doc1->load('1.xml');

$doc2 = new DOMDocument();
$doc2->load('2.xml');

// get 'items' element of document 1
$res1 = $doc1->getElementsByTagName('items')->item(0); //edited res - items

// iterate over 'item' elements of document 2
$items2 = $doc2->getElementsByTagName('item');
for ($i = 0; $i < $items2->length; $i ++) {
    $item2 = $items2->item($i);

    // import/copy item from document 2 to document 1
    $item1 = $doc1->importNode($item2, true);

    // append imported item to document 1 'items' element
    $res1->appendChild($item1);

}
$doc1->save('merged.xml'); //edited -added saving into xml file
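
The merge above copies the items but leaves the total attribute of the first file unchanged. A minimal sketch of one way to refresh it, assuming total is simply meant to be the number of item elements (the question does not say how it is defined). Inline strings stand in for 1.xml and 2.xml so the snippet runs on its own:

```php
<?php
// Stand-ins for 1.xml and 2.xml (assumption: same structure as in the question)
$doc1 = new DOMDocument();
$doc1->loadXML('<res><items total="1"><item><id>1</id></item></items></res>');

$doc2 = new DOMDocument();
$doc2->loadXML('<res><items total="1"><item><id>190</id></item></items></res>');

// Target container in document 1
$items1 = $doc1->getElementsByTagName('items')->item(0);

// Copy every <item> of document 2 into document 1
foreach ($doc2->getElementsByTagName('item') as $item2) {
    $items1->appendChild($doc1->importNode($item2, true));
}

// Recount the <item> elements now present and store the result in "total"
$total = $items1->getElementsByTagName('item')->length;
$items1->setAttribute('total', (string) $total);

echo $doc1->saveXML();
```

If total is meant to be something else (for example a server-side count that is larger than the number of items in the file), adding the two attribute values would be the alternative.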

 Answers

51

Since you've put an effort in, I've coded something that should work.

Note that this is untested code, but you get the idea.

It's up to you to make it into pretty functions. Make sure you understand what's going on; being able to work with the DOM will probably help you out in a lot of future scenarios. The cool thing about the DOM standard is that you have pretty much the same operations in many different programming languages/platforms.

    $doc1 = new DOMDocument();
    $doc1->load('1.xml');

    $doc2 = new DOMDocument();
    $doc2->load('2.xml');

    // get 'res' element of document 1
    $res1 = $doc1->getElementsByTagName('res')->item(0);

    // iterate over 'item' elements of document 2
    $items2 = $doc2->getElementsByTagName('item');
    for ($i = 0; $i < $items2->length; $i ++) {
        $item2 = $items2->item($i);

        // import/copy item from document 2 to document 1
        $item1 = $doc1->importNode($item2, true);

        // append imported item to document 1 'res' element
        $res1->appendChild($item1);

    }
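
The same approach extends to more than two files. Below is a rough sketch, not part of the original answer: mergeItems() is a hypothetical helper name, and it takes XML strings so that the example is self-contained. To work with files, load() would replace loadXML(). It appends to the items element, as in the asker's EDIT3.

```php
<?php
/**
 * Hypothetical helper: copies the <item> elements of every further source
 * into the <items> element of the first source.
 *
 * @param string[] $sources XML strings, at least one
 */
function mergeItems(array $sources): DOMDocument
{
    if ($sources === []) {
        throw new InvalidArgumentException('At least one source is required');
    }

    // The first source becomes the base document
    $merged = new DOMDocument();
    $merged->loadXML(array_shift($sources));
    $target = $merged->getElementsByTagName('items')->item(0);

    foreach ($sources as $source) {
        $doc = new DOMDocument();
        $doc->loadXML($source);

        foreach ($doc->getElementsByTagName('item') as $item) {
            // importNode() creates a copy owned by $merged; $doc stays unchanged
            $target->appendChild($merged->importNode($item, true));
        }
    }

    return $merged;
}

$merged = mergeItems([
    '<res><items><item><id>1</id></item></items></res>',
    '<res><items><item><id>190</id></item></items></res>',
    '<res><items><item><id>400</id></item></items></res>',
]);

echo $merged->saveXML();
```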
Wednesday, March 31, 2021
 
ammezie
answered 7 Months ago
28

I don't want XMLWriter to encode the multilingual characters. How is this possible?

Actually, as you write XML you already encode. What you mean is that you don't want to use numeric entities for these two characters, which is possible, but not always.

To not use numeric entities, you need to match the encoding of the document with the encoding of your string. From the output you provided I can only guess a bit, but those two characters probably stand for:

  1. Unicode Han Character 'the Chinese people, Chinese language' (U+6F22)
  2. Unicode Han Character 'letter, character, word' (U+5B57)

Which could mean (I do not speak any Chinese so far) something like Chinese Word.

XMLWriter in PHP will always put characters into a numeric entity (like &#x6F22; and &#x5B57; in your example) whenever the encoding of the document is not able to represent that character within the document.

If both encodings match, XMLWriter will not use numeric entities.

Here is a simpler example. Let's take the US-ASCII encoding and the German umlaut Ä from Äpfel (Unicode Character 'LATIN CAPITAL LETTER A WITH DIAERESIS' (U+00C4)) as an attribute value:

<?php
$xmlWriter = new XMLWriter();
$xmlWriter->openMemory();
$xmlWriter->startDocument('1.0', 'US-ASCII');
$xmlWriter->startElement('root');
$xmlWriter->writeAttribute('value', 'Äpfel');
$xmlWriter->endDocument();
echo $xmlWriter->flush();

Saved in a UTF-8 encoded PHP file, this code outputs the following when executed:

<?xml version="1.0" encoding="US-ASCII"?>
<root value="&#196;pfel"/>

&#196; is the numeric entity for the Unicode character U+00C4. If you look closely, C4 is the hexadecimal representation of decimal 196, which shows that a numeric XML entity always represents the Unicode code point of the character.

So the XML output uses the US-ASCII encoding, which cannot represent the Ä from the UTF-8 encoded string in the PHP code, and therefore properly encodes it with its numeric entity to preserve the character information.
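
To make the decimal/hexadecimal relationship concrete, here is a small sketch using only core PHP functions. It decodes both the decimal and the hexadecimal form of the entity and compares the results; the variable names are only illustrative.

```php
<?php
// Both forms of the numeric character reference name the same code point, U+00C4
$fromDecimal = html_entity_decode('&#196;', ENT_QUOTES | ENT_XML1, 'UTF-8');
$fromHex     = html_entity_decode('&#xC4;', ENT_QUOTES | ENT_XML1, 'UTF-8');

// Hexadecimal form of decimal 196
var_dump(dechex(196));

// Whether both references decode to the same character
var_dump($fromDecimal === $fromHex);

// The UTF-8 bytes of the decoded character, shown as hex
var_dump(bin2hex($fromDecimal));
```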

Now changing the encoding from:

$xmlWriter->startDocument('1.0', 'US-ASCII');

to the UTF-8 encoding of the PHP string:

$xmlWriter->startDocument('1.0', 'UTF-8');

changes the output to:

<?xml version="1.0" encoding="UTF-8"?>
<root value="Äpfel"/>

This would work equally well with your example. However, one important piece of information is missing from your question: in which encoding is the string from that record?

If it is UTF-8 already, then like I outlined in the example above, it would work already:

<?php
$recordUTf8 = "... contents=\"Just <span style=\"color:red\">testing</span>:"
             ."\xE6\xBC\xA2\xE5\xAD\x97\"";
$encoding   = 'UTF-8';
// $encoding = 'US-ASCII'; // uncomment to get numeric entities instead

$xmlWriter = new XMLWriter();
$xmlWriter->openMemory();
$xmlWriter->startDocument('1.0', $encoding);
$xmlWriter->startElement('record');
$xmlWriter->writeAttribute('value', $recordUTf8);
$xmlWriter->endDocument();
echo $xmlWriter->flush();

Output:

<?xml version="1.0" encoding="UTF-8"?>
<record value="... contents=&quot;Just &lt;span style=&quot;color:red&quot;&gt;testing&lt;/span&gt;:漢字&quot;"/>

As this output shows, no numeric entities are used here, even though the string is clearly UTF-8 encoded (written in a binary-safe manner here, in case you use a different encoding for the PHP file when you copy it over).

So, to summarize at this point: the XML encoding needs to match the encoding of the string in order to represent all characters without numeric entities (apart from the characters that XML itself escapes, such as <, >, ', " and &).

These are pretty much XML basics. If the document's encoding cannot represent some of the character data, numeric entities are the fallback, because XML supports Unicode. You are trying to prevent this fallback by aligning the document encoding with the string encoding.

Here is my advice for PHP & XMLWriter specifically:

  1. Obtain or re-encode the record from the database to UTF-8.
  2. Only pass UTF-8 strings into XMLWriter methods.
  3. Set the XML documents encoding to UTF-8.

I give these suggestions because UTF-8 is the default encoding of XML and UTF-8 support is quite good in PHP. Also, XMLWriter expects Unicode strings to be UTF-8 encoded; there is no setting or option that allows you to change that, so the input already needs to be UTF-8 encoded.

Independent of the input string, however, you can tell XMLWriter to use a different output encoding. For example, any other Chinese or Unicode encoding might be suitable for you, and XMLWriter can output it as long as your PHP configuration supports that specific output encoding (check the iconv library you have).
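
As an illustration of steps 1 to 3, here is a minimal sketch. It assumes, purely for the example, that the record arrives as ISO-8859-1; the actual encoding of your database record is not known from the question. iconv() is used for the conversion, and mb_convert_encoding() would be an alternative if the mbstring extension is available.

```php
<?php
// Assumption: the input is ISO-8859-1, where the single byte 0xC4 is "Ä"
$latin1 = "\xC4pfel";

// Step 1: re-encode to UTF-8 before handing the string to XMLWriter
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

$xmlWriter = new XMLWriter();
$xmlWriter->openMemory();
// Step 3: declare the document as UTF-8
$xmlWriter->startDocument('1.0', 'UTF-8');
$xmlWriter->startElement('root');
// Step 2: only UTF-8 strings are passed to XMLWriter methods
$xmlWriter->writeAttribute('value', $utf8);
$xmlWriter->endElement();
$xmlWriter->endDocument();

$xml = $xmlWriter->outputMemory();
echo $xml;
```

Passing the ISO-8859-1 bytes directly would hand XMLWriter a string that is not valid UTF-8, which is why the conversion comes first.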

When you start the document with XMLWriter, the second parameter specifies the encoding:

$xmlWriter->startDocument('1.0', $encoding);

You can put in any encoding from the set of the encodings XML supports in the corresponding XML-Declaration:

<?xml version="1.0" encoding="ISO-8859-1"?><!-- Latin-1 example -->

The full specs of the XML encoding value can be found here: http://www.w3.org/TR/REC-xml/#NT-EncName

In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" should be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-n" (where n is the part number) should be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" should be used for the various encoded forms of JIS X-0208-1997. It is recommended that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using their registered names; other encodings should use names starting with an "x-" prefix. XML processors should match character encoding names in a case-insensitive way and should either interpret an IANA-registered name as the encoding registered at IANA for that name or treat it as unknown (processors are, of course, not required to support all IANA-registered encodings).

where [IANA-CHARSETS] is:

(Internet Assigned Numbers Authority) Official Names for Character Sets, ed. Keld Simonsen et al. (See http://www.iana.org/assignments/character-sets.)

These specs are perhaps a little verbose. In the context of your question, all you need to do is find out the encoding of your record string. By the way, I was not able to reproduce your exact output: I always get decimal entities, not hexadecimal ones. You might be able to provide more information with a hex dump of the string.

Wednesday, March 31, 2021
 
mopsyd
answered 7 Months ago
46

Using PHPExcel

$inputFileType1 = 'Excel2007';
$inputFileName1 = 'inputData1.xlsx';
$inputFileType2 = 'Excel5';
$inputFileName2 = 'inputData2.xls';
$outputFileType = 'Excel5';
$outputFileName = 'outputData.xls';

// Load the first workbook (an xlsx file)
$objPHPExcelReader1 = PHPExcel_IOFactory::createReader($inputFileType1);
$objPHPExcel1 = $objPHPExcelReader1->load($inputFileName1);

// Load the second workbook (an xls file)
$objPHPExcelReader2 = PHPExcel_IOFactory::createReader($inputFileType2);
$objPHPExcel2 = $objPHPExcelReader2->load($inputFileName2);

// Merge the second workbook into the first
$objPHPExcel2->getActiveSheet()->setTitle('Unique worksheet name');
$objPHPExcel1->addExternalSheet($objPHPExcel2->getActiveSheet());

// Save the merged workbook under a new name (could save under the original name)
// as an xls file
$objPHPExcelWriter = PHPExcel_IOFactory::createWriter($objPHPExcel1,$outputFileType);
$objPHPExcelWriter->save($outputFileName);
Saturday, May 29, 2021
 
Smandoli
answered 5 Months ago
72

If you want to import the whole node sub-tree (and not just the node itself), you need to set $deep to true in importNode:

$domDocument->importNode($node, true);
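
A short sketch of the difference, using inline XML so it runs on its own (the document contents are made up for the illustration):

```php
<?php
$source = new DOMDocument();
$source->loadXML('<items><item><id>1</id><title>Title 1</title></item></items>');
$item = $source->getElementsByTagName('item')->item(0);

$target = new DOMDocument();
$target->loadXML('<res/>');

// Shallow import: only the <item> element itself, without its children
$shallow = $target->importNode($item, false);

// Deep import: the <item> element together with its whole sub-tree
$deep = $target->importNode($item, true);

$target->documentElement->appendChild($shallow);
$target->documentElement->appendChild($deep);

echo $target->saveXML();
```

In both cases the source document is left untouched, because importNode() creates a copy rather than moving the node.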
Saturday, May 29, 2021
 
JakeGR
answered 5 Months ago
39

What the code you posted is doing is combining all the elements regardless of whether or not an element with the same tag already exists. So you need to iterate over the elements and manually check and combine them the way you see fit, because it is not a standard way of handling XML files. I can't explain it better than code, so here it is, more or less commented:

from xml.etree import ElementTree as et

class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # combine each element with the first one, and update that
            self.combine_element(self.roots[0], r)
        # return the string representation
        return et.tostring(self.roots[0])

    def combine_element(self, one, other):
        """
        This function recursively updates either the text or the children
        of an element if another element is found in `one`, or adds it
        from `other` if not found.
        """
        # Create a mapping from tag name to element, as that's what we are filtering with
        mapping = {el.tag: el for el in one}
        for el in other:
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[el.tag].text = el.text
                except KeyError:
                    # An element with this name is not in the mapping
                    mapping[el.tag] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[el.tag], el)
                except KeyError:
                    # Not in the mapping
                    mapping[el.tag] = el
                    # Just add it
                    one.append(el)

if __name__ == '__main__':
    r = XMLCombiner(('sample1.xml', 'sample2.xml')).combine()
    print('-' * 20)
    # tostring() returns bytes on Python 3, so decode before printing
    print(r.decode())
Tuesday, June 29, 2021
 
Zach
answered 4 Months ago