parse special character xml file using simplexml - php

Apologies if there is an obvious answer (and I know there are about 1000 of these similar questions) - but I have spent two days trying to attack this without success. I cannot seem to crack why I get a null response...
Short background: the following works just fine
$xurl= new SimpleXMLElement('https://gptxsw.appspot.com/view/submissionList?formId=GP_v7&numEntries=1', NULL, TRUE);
$keyname = $xurl->idList->id[0];
echo $keyname;
this provides a response: a unique key like uuid:d0721391-6953-4d0b-b981-26e38f05d2e5
however I try a similar request (which ultimately would be based on first request) and get a failure. I've simplified code as follows...
$xdurl= new SimpleXMLElement('https://gptxsw.appspot.com/view/downloadSubmission?formId=GP_v7[#version=null%20and%20#uiVersion=null]/GP_v7[#key=uuid:d0721391-6953-4d0b-b981-26e38f05d2e5]', NULL, TRUE);
$keyname2 = $xdurl->data->GP_v7->SDD_ID_N[0];
echo $keyname2;
this provides null. And if I try something like
echo $xdurl->asXML();
I get an error response from the site (not from PHP).
Do I need to eject from SimpleXMLElement for the second request? I've read about using XPath and about defining the namespace, but I'm not sure that either would be required: the second file does have two namespaces but one of them isn't used and the other has no prefix for elements. Plus I have tried variations of those - enough to think that my problem/error is either more global in nature (or oversight due to inexperience).
For purposes of this request I have no control over the formatting of either XML file.

Here we go: SimpleXMLElement seems to re-escape (or incorrectly handle in some way) already url-escaped characters like white spaces. Try:
$xdurl= new SimpleXMLElement('https://gptxsw.appspot.com/view/downloadSubmission?formId=GP_v7[#version=null and #uiVersion=null]/GP_v7[#key=uuid:d0721391-6953-4d0b-b981-26e38f05d2e5]', NULL, TRUE);
$keyname2 = $xdurl->data->GP_v7->SDD_ID_N[0];
echo $keyname2;
and you should be fine.
(FYI: I debugged this by manually creating a local copy of the XML request result named "foo.xml" which worked perfectly.)

Thanks to #Matze for getting me on right track.
Issue is that URL has special characters that SimpleXMLElement cannot parse without help.
Solution: add urlencode() command like the following
$fixurl = urlencode('https://gptxsw.appspot.com/view/downloadSubmission?formId=GP_v7[#version=null and #uiVersion=null]/GP_v7[#key=uuid:d0721391-6953-4d0b-b981-26e38f05d2e5]');
$xdurl= new SimpleXMLElement($fixurl, NULL, TRUE);
$keyname2 = $xdurl->data->GP_v7->SDD_ID_N[0];
echo $keyname2;
this provided the answer (in this case 958)

Related

Extract value from soap response

I need help to extract value from a soap response.
https://i.stack.imgur.com/djv5y.jpg
What i exactly need is:
$username=user
$message=success
Maybe include the SOAP response here, that would be helpful for those who come in the future. As for your question are you using a particular language? That would make it easier to answer.
If you are looking for a way to view use can use this URL: https://codebeautify.org/xmlviewer
Ok now that I see it, it's fairly easy
Assuming you have loaded the SOAP XML into a variable, lets call it $xml_string
$xml = simplexml_load_string($xml_string); // Load it as an object
$xmlarray = json_decode(json_encode($xml),TRUE); // Change it into an array
Then the variables you are looking for is in
$username = $xmlarray['UserName'];
$message = $xmlarray['response']['MESSAGE'];
BTW This solution is found here
PHP convert XML to JSON
I did it as an array as sometimes objects are a little hard to process. You can easily just do the first line and address it as an object. (If those are the only variables you need then an array works fine. Ex: the 'Plan' data will be messed up in an array as it appears twice)
There can be some issues such as the MESSAGE not appearing or the XML returning a failure but I think you should know how to code around missing datum.

simplexml_load_string - parse error due to unicode characters in payload

I have a problem with simplexml_load_string erring with parse errors due to an xml payload coming from a database with unicode characters in it.
I'm at a loss how to get php to read this and use the xml like I normally would. The code has been working fine until people were getting creative with data being submitted.
Unfortunately I cannot modify the source data, I have to work with what I receive, to give you an idea, one field that's breaking it in the original raw receipt looks like :
<FirstName>🐺</FirstName>
Previously the code works fine by parsing the xml with a simple line of :
$xmlresult = simplexml_load_string($result, 'SimpleXMLElement',LIBXML_NOCDATA);
However with these unicode characters, it just errors.
Depending on what I use to view the data if I dump the raw payload it can look like:
<d83d><dc3a>
or <U+D83D><U+DC3A>
Reading a bit on stack, it seemed DOM might work but didn't have any luck there either.
The incoming payload does have the header:
?xml version="1.0" encoding="UTF-8"?>
data comes in via
<data type="cdata"><![CDATA[<payload>
I'm at a complete loss, hopefully can get some help here to get me over this hump with this data handling.
I've been staring at this for days and it seems one thing I didn't try was to wrap my curl call function with utf8_encode like this :
$result = utf8_encode(do_curl($xmlbuildquery));
My do_curl function is just a separate function to call the curl procedure, nothing more.
Doing that, I'm able to parse the results, instead of those unicode characters showing up, instead its displaying as
[firstname] => 🐺
(the above is result of print_r($result); after
$xmldata = simplexml_load_string((string)$xmlresult->body->function->data);
With that in place the xml is now parsing finally. Oddly this sparked my curiosity further as this information is provided via csv thats imported into a mysql database and when I look up the same record its shown as :
FirstName: ????
with the table type set too :
FirstName varchar(40) COLLATE utf8mb4_unicode_ci NOT NULL,
That might suggest their not utf8_encoding the output to the csv perhaps, separate from this issue but just interesting.
And finally, my script is able to run again!!

Decode a byte encoded string via my URL

We have a PHP site on Zend Framework with a backend Postgresql database. Our primary character encoding is UTF-8.
I just checked our error log and found a strange entry. My URL is as follows:
www.mydomain.com/schuhe-für-breite-füsse
however someone (or maybe a bot) has tried to access this URL as follows:
www.mydomain.com/schuhe-f\xc3\xbcr-breite-f\xc3\xbcsse/
It's the first time I've seen something like the above. Two things are happening on my page:
1) The above URL is queried against our CMS. This works fine for some reason, I think Postgresql reaslises it is byte-encoded and then converts it back when tried to find this SEF URL in our database.
2) An Ajax request is made on the page, passing the same SEF URL. This fails. I believe the slashes are causing a problem on Javascript.
To avoid this I want to decode any URL that is encoded like this. However a quick test of the following code did not decode anything for me :(
$landing_sef_url = $this->_getParam('landing_sef_url');
$utf8=html_entity_decode($landing_sef_url);
$iso8859=utf8_decode($utf8);
$test3 = html_entity_decode($landing_sef_url, 1, "ISO-8859-1");
$test4 = urldecode($landing_sef_url);
echo utf8_decode("$landing_sef_url");
echo "<br/><br/>";
die($landing_sef_url . " -- $utf8 -- $iso8859 <br/>$test3<br/>$test4");
I found the above via various posts online but they all print back the same result - schuhe-f\xc3\xbcr-breite-f\xc3\xbcsse
Any help would be MUCH appreciated. Many thanks!
This method seems to do what you're looking for:
http://li.php.net/manual/en/function.stripcslashes.php
But if you're just looking to unescape \x## sequences, you could also do this with a fairly simple regular expression.

PHP json_decode returns null

I'm writing PHP code that uses a database. To do so, I use an array as a hash-map.
Every time content is added or removed from my DB, I save it to file.
I'm forced by my DB structure to use this method and can't use mysql or any other standard DB (School project, so structure stays as is).
I built two functions:
function saveDB($db){
$json_db = json_encode($db);
file_put_contents("wordsDB.json", $json_db);
} // saveDB
function loadDB(){
$json_db = file_get_contents("wordsDB.json");
return json_decode($json_db, true);
} // loadDB
When echo-ing the string I get after the encoding or after loading from file, I get a valid json (Tested it on a json viewer) Whenever I try to decode the string using json_decode(), I get null (Tested it with var_dump()).
The json string itself is very long (~200,000 characters, and that's just for testing).
I tried the following:
Replacing single/double-quotes with double/single-quotes (Without any backslashes, with one backslash and three backslashes. And any combination I could think of with a different number of backslashes in the original and replaced string), both manually and using str_replace().
Adding quotes before and after the json string.
Changing the page's encoding.
Decoding without saving to file (Right after encoding).
Checked for slashes and backslashes. None to be found.
Tried addslashes().
Tried using various "Escape String" variants.
json_last_error() doesn't work. I get no error number (Get null, not 0).
It's not my server, so I'm not sure what PHP version is used, and I can't upgrade/downgrade/install anything.
I believe the size has something to do with it, because small strings seem to work fine.
Thanks Everybody :)
In your JSON file change null to "null" and it will solve the problem.
Check if your file is UTF8 encoded. json_decode works with UTF8 encoded data only.
EDIT:
After I saw uploaded JSON data, I did some digging and found that there are 'null' key. Search for:
"exceeding":{"S01E01.html":{"2217":1}},null:{"S01E01.html":
Change that null to be valid property name and json_decode will do the job.
I had a similar problem last week. my json was valid according to jsonlint.com.
My json string contained a # and a & and those two made json_decode fail and return null.
by using var_dump(json_decode($myvar)) which stops right where it fails I managed to figure out where the problem was coming from.
I suggest var_dumping and using find dunction to look for these king of characters.
Just on the off chance.. and more for anyone hitting this thread rather than the OP's issue...I missed the following, someone had htmlentities($json) way above me in the call stack. Just ensure you haven't been bitten by the same and check the html source.
Kickself #124

Remove double-quotes from a json_encoded string on the keys

I have a json_encoded array which is fine.
I need to strip the double-quotes on all of the keys of the json string on returning it from a function call.
How would I go about doing this and returning it successfully?
Thanks!
I do apologise, here is a snippet of the json code:
{"start_date":"2011-01-01 09:00","end_date":"2011-01-01 10:00","text":"test"}
Just to add a little more info:
I will be retrieving the JSON via an AJAX request, so if it would be easier, I am open to ideas in how to do this on the javascript side.
EDITED as per anubhava's comment
$str = '{"start_date":"2011-01-01 09:00","end_date":"2011-01-01 10:00","text":"test"}';
$str = preg_replace('/"([^"]+)"\s*:\s*/', '$1:', $str);
echo $str;
This certainly works for the above string, although there maybe some edge cases that I haven't thought of for which this will not work. Whether this will suit your purposes depends on how static the format of the string and the elements/values it contains will be.
TL;DR: Missing quotes is how Chrome shows it is a JSON object instead of a string. Ensure that you have Header('Content-Type: application/json; charset=UTF8'); in PHP's AJAX response to solve the real problem.
DETAILS:
A common reason for wanting to solve this problem is due to finding this difference while debugging the processing of returned AJAX data.
In my case I saw the difference using Chrome's debugging tools. When connected to the legacy system, upon success, Chrome showed that there were no quotes shown around keys in the response according to the debugger. This allowed the object to be immediately treated as an object without using a JSON.parse() call. Debugging my new AJAX destination, there were quotes shown in the response and variable was a string and not an object.
I finally realized the true issue when I tested the AJAX response externally saw the legacy system actually DID have quotes around the keys. This was not what the Chrome dev tools showed.
The only difference was that on the legacy system there was a header specifying the content type. I added this to the new (WordPress) system and the calls were now fully compatible with the original script and the success function could handle the response as an object without any parsing required. Now I can switch between the legacy and new system without any changes except the destination URL.

Categories