I am trying to parse an XML string that contains the special character 0x85 (octal 205). The XML string comes from a SHOUTcast server. It seems that this character makes SimpleXMLElement fail.
This is my metadata.php file:
<?php
header('Content-type: application/json');
$output = file_get_contents('error.xml.bak');
$xml = new SimpleXMLElement($output, LIBXML_NOERROR|LIBXML_ERR_NONE) or die('Something is wrong');
echo 'OK';
?>
I am getting the following error:
kazepis$ php metadata.php
PHP Fatal error: Uncaught Exception: String could not be parsed as XML in /var/www/html/wolfclub/metadata.php:4
Stack trace:
#0 /var/www/html/radio/metadata.php(4): SimpleXMLElement->__construct('<?xml version="...', 32)
#1 {main}
thrown in /var/www/html/radio/metadata.php on line 4
You can find the sample XML with the problematic character (decimal 133, octal 205) here:
https://wetransfer.com/downloads/f1b615c1cc09c8262cdd9965991b9cd420200123155505/801ab3
or inline:
<?xml version="1.0" standalone="yes" ?><!DOCTYPE SHOUTCASTSERVER [<!ELEMENT SHOUTCASTSERVER (CURRENTLISTENERS,PEAKLISTENERS,MAXLISTENERS,REPORTEDLISTENERS,AVERAGETIME,SERVERGENRE,SERVERURL,SERVERTITLE,SONGTITLE,SONGURL,IRC,ICQ,AIM,WEBHITS,STREAMHITS,STREAMSTATUS,BITRATE,CONTENT,VERSION,WEBDATA,LISTENERS,SONGHISTORY)><!ELEMENT CURRENTLISTENERS (#PCDATA)><!ELEMENT PEAKLISTENERS (#PCDATA)><!ELEMENT MAXLISTENERS (#PCDATA)><!ELEMENT REPORTEDLISTENERS (#PCDATA)><!ELEMENT AVERAGETIME (#PCDATA)><!ELEMENT SERVERGENRE (#PCDATA)><!ELEMENT SERVERURL (#PCDATA)><!ELEMENT SERVERTITLE (#PCDATA)><!ELEMENT SONGTITLE (#PCDATA)><!ELEMENT SONGURL (#PCDATA)><!ELEMENT IRC (#PCDATA)><!ELEMENT ICQ (#PCDATA)><!ELEMENT AIM (#PCDATA)><!ELEMENT WEBHITS (#PCDATA)><!ELEMENT STREAMHITS (#PCDATA)><!ELEMENT STREAMSTATUS (#PCDATA)><!ELEMENT BITRATE (#PCDATA)><!ELEMENT CONTENT (#PCDATA)><!ELEMENT VERSION (#PCDATA)><!ELEMENT WEBDATA (INDEX,LISTEN,PALM7,LOGIN,LOGINFAIL,PLAYED,COOKIE,ADMIN,UPDINFO,KICKSRC,KICKDST,UNBANDST,BANDST,VIEWBAN,UNRIPDST,RIPDST,VIEWRIP,VIEWXML,VIEWLOG,INVALID)><!ELEMENT INDEX (#PCDATA)><!ELEMENT LISTEN (#PCDATA)><!ELEMENT PALM7 (#PCDATA)><!ELEMENT LOGIN (#PCDATA)><!ELEMENT LOGINFAIL (#PCDATA)><!ELEMENT PLAYED (#PCDATA)><!ELEMENT COOKIE (#PCDATA)><!ELEMENT ADMIN (#PCDATA)><!ELEMENT UPDINFO (#PCDATA)><!ELEMENT KICKSRC (#PCDATA)><!ELEMENT KICKDST (#PCDATA)><!ELEMENT UNBANDST (#PCDATA)><!ELEMENT BANDST (#PCDATA)><!ELEMENT VIEWBAN (#PCDATA)><!ELEMENT UNRIPDST (#PCDATA)><!ELEMENT RIPDST (#PCDATA)><!ELEMENT VIEWRIP (#PCDATA)><!ELEMENT VIEWXML (#PCDATA)><!ELEMENT VIEWLOG (#PCDATA)><!ELEMENT INVALID (#PCDATA)><!ELEMENT LISTENERS (LISTENER*)><!ELEMENT LISTENER (HOSTNAME,USERAGENT,UNDERRUNS,CONNECTTIME, POINTER, UID)><!ELEMENT HOSTNAME (#PCDATA)><!ELEMENT USERAGENT (#PCDATA)><!ELEMENT UNDERRUNS (#PCDATA)><!ELEMENT CONNECTTIME (#PCDATA)><!ELEMENT POINTER (#PCDATA)><!ELEMENT UID (#PCDATA)><!ELEMENT SONGHISTORY (SONG*)><!ELEMENT SONG (PLAYEDAT, TITLE)><!ELEMENT PLAYEDAT (#PCDATA)><!ELEMENT 
TITLE (#PCDATA)>]><SHOUTCASTSERVER><CURRENTLISTENERS>1</CURRENTLISTENERS><PEAKLISTENERS>3</PEAKLISTENERS><MAXLISTENERS>5000</MAXLISTENERS><REPORTEDLISTENERS>1</REPORTEDLISTENERS><AVERAGETIME>1</AVERAGETIME><SERVERGENRE>public</SERVERGENRE><SERVERURL>http://www.virtualdj.com/</SERVERURL><SERVERTITLE>wolf</SERVERTITLE><SONGTITLE>BARRY WHITE - YOU'RE THE FIRST,THE LAST ▒ </SONGTITLE><SONGURL></SONGURL><IRC>wolf</IRC><ICQ>wolf</ICQ><AIM>wolf</AIM><WEBHITS>80</WEBHITS><STREAMHITS>6</STREAMHITS><STREAMSTATUS>1</STREAMSTATUS><BITRATE>96</BITRATE><CONTENT>audio/mpeg</CONTENT><VERSION>1.9.8</VERSION><WEBDATA><INDEX>0</INDEX><LISTEN>0</LISTEN><PALM7>6</PALM7><LOGIN>0</LOGIN><LOGINFAIL>0</LOGINFAIL><PLAYED>0</PLAYED><COOKIE>0</COOKIE><ADMIN>1</ADMIN><UPDINFO>1</UPDINFO><KICKSRC>0</KICKSRC><KICKDST>0</KICKDST><UNBANDST>0</UNBANDST><BANDST>0</BANDST><VIEWBAN>0</VIEWBAN><UNRIPDST>0</UNRIPDST><RIPDST>0</RIPDST><VIEWRIP>0</VIEWRIP><VIEWXML>69</VIEWXML><VIEWLOG>0</VIEWLOG><INVALID>3</INVALID></WEBDATA><LISTENERS><LISTENER><HOSTNAME>78.129.222.56</HOSTNAME><USERAGENT>curl/7.29.0</USERAGENT><UNDERRUNS>0</UNDERRUNS><CONNECTTIME>216</CONNECTTIME><POINTER>0</POINTER><UID>2</UID></LISTENER></LISTENERS><SONGHISTORY><SONG><PLAYEDAT>1579791561</PLAYEDAT><TITLE>BARRY WHITE - YOU'RE THE FIRST,THE LAST ▒ </TITLE></SONG></SONGHISTORY></SHOUTCASTSERVER>
Any ideas why this is happening?
My operating system:
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
My PHP version:
PHP 7.3.11-1~deb10u1 (cli) (built: Oct 26 2019 14:14:18) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.11, Copyright (c) 1998-2018 Zend Technologies
with Zend OPcache v7.3.11-1~deb10u1, Copyright (c) 1999-2018, by Zend Technologies
Thank you!
I found this:
https://www.php.net/manual/en/class.simplexmlelement.php#107869
However, the $errors array returned by the line $errors = libxml_get_errors(); is always empty in my case, so that snippet did not help. I also got the following warnings:
PHP Warning: DOMDocument::loadXML(): Input is not proper UTF-8, indicate encoding !\nBytes: 0x85 0x20 0x20 0x20 in Entity, line: 1 in /var/www/html/radio/metadata.php on line 6
[Fri Jan 24 18:56:12.290661 2020] [php7:warn] [pid 17910] [client 130.xxx.xxx.xxx:xxxxx] PHP Warning: simplexml_import_dom(): Invalid Nodetype to import in /var/www/html/radio/metadata.php on line 12
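One likely reason the `$errors` array stays empty: `libxml_get_errors()` only returns errors that libxml has buffered, and buffering is switched on with `libxml_use_internal_errors(true)`. Without that call, libxml problems are reported as PHP warnings like the ones above and nothing is stored. A minimal sketch, using a shortened hypothetical `$broken` string that contains the same raw 0x85 byte:

```php
<?php
// Collect libxml parse errors in an array instead of emitting PHP warnings.
// $broken is a hypothetical, shortened document containing the raw 0x85 byte.
$broken = '<?xml version="1.0"?><SONGTITLE>THE LAST ' . "\x85" . ' </SONGTITLE>';

$previous = libxml_use_internal_errors(true); // start buffering errors
$xml = simplexml_load_string($broken);        // returns false on failure, no exception
$errors = libxml_get_errors();                // array of LibXMLError objects
libxml_clear_errors();                        // empty the buffer for the next parse
libxml_use_internal_errors($previous);        // restore the previous setting

if ($xml === false) {
    foreach ($errors as $error) {
        // $error->message already ends with a newline
        printf("libxml error %d (line %d): %s", $error->code, $error->line, $error->message);
    }
}
```

Restoring the previous setting keeps the change local to this parse instead of silencing libxml warnings for the rest of the script.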
Anyway, I managed to work around this by encoding the string with utf8_encode() before passing it to the SimpleXMLElement constructor.
My resulting "test" php file is:
<?php
// header('Content-type: application/json');
$output = file_get_contents('error.xml.bak');
$output = utf8_encode($output);
$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadXML($output);
var_dump($output);
$errors = libxml_get_errors();
var_dump($errors);
$xml = simplexml_import_dom($doc);
// $xml = new SimpleXMLElement($output, LIBXML_NOERROR|LIBXML_ERR_NONE) or die('Something is wrong');
var_dump($xml);
?>
which prints the following without any errors or warnings:
string(3396) "]>13500011publichttp://www.virtualdj.com/wolfBARRY WHITE - YOU'RE THE FIRST,THE LAST wolfwolfwolf806196audio/mpeg1.9.800600001100000000690378.129.222.56curl/7.29.00216021579791561 " array(0) { } object(SimpleXMLElement)#2 (22) { ["CURRENTLISTENERS"]=> string(1) "1" ["PEAKLISTENERS"]=> string(1) "3" ["MAXLISTENERS"]=> string(4) "5000" ["REPORTEDLISTENERS"]=> string(1) "1" ["AVERAGETIME"]=> string(1) "1" ["SERVERGENRE"]=> string(6) "public" ["SERVERURL"]=> string(25) "http://www.virtualdj.com/" ["SERVERTITLE"]=> string(4) "wolf" ["SONGTITLE"]=> string(64) "BARRY WHITE - YOU'RE THE FIRST,THE LAST " ["SONGURL"]=> object(SimpleXMLElement)#3 (0) { } ["IRC"]=> string(4) "wolf" ["ICQ"]=> string(4) "wolf" ["AIM"]=> string(4) "wolf" ["WEBHITS"]=> string(2) "80" ["STREAMHITS"]=> string(1) "6" ["STREAMSTATUS"]=> string(1) "1" ["BITRATE"]=> string(2) "96" ["CONTENT"]=> string(10) "audio/mpeg" ["VERSION"]=> string(5) "1.9.8" ["WEBDATA"]=> object(SimpleXMLElement)#4 (20) { ["INDEX"]=> string(1) "0" ["LISTEN"]=> string(1) "0" ["PALM7"]=> string(1) "6" ["LOGIN"]=> string(1) "0" ["LOGINFAIL"]=> string(1) "0" ["PLAYED"]=> string(1) "0" ["COOKIE"]=> string(1) "0" ["ADMIN"]=> string(1) "1" ["UPDINFO"]=> string(1) "1" ["KICKSRC"]=> string(1) "0" ["KICKDST"]=> string(1) "0" ["UNBANDST"]=> string(1) "0" ["BANDST"]=> string(1) "0" ["VIEWBAN"]=> string(1) "0" ["UNRIPDST"]=> string(1) "0" ["RIPDST"]=> string(1) "0" ["VIEWRIP"]=> string(1) "0" ["VIEWXML"]=> string(2) "69" ["VIEWLOG"]=> string(1) "0" ["INVALID"]=> string(1) "3" } ["LISTENERS"]=> object(SimpleXMLElement)#5 (1) { ["LISTENER"]=> object(SimpleXMLElement)#7 (6) { ["HOSTNAME"]=> string(13) "78.129.222.56" ["USERAGENT"]=> string(11) "curl/7.29.0" ["UNDERRUNS"]=> string(1) "0" ["CONNECTTIME"]=> string(3) "216" ["POINTER"]=> string(1) "0" ["UID"]=> string(1) "2" } } ["SONGHISTORY"]=> object(SimpleXMLElement)#6 (1) { ["SONG"]=> object(SimpleXMLElement)#7 (2) { ["PLAYEDAT"]=> string(10) "1579791561" ["TITLE"]=> string(64) 
"BARRY WHITE - YOU'RE THE FIRST,THE LAST " } } }
NOTE: The problematic character (\u0085) is still there, but it is now properly encoded, which I guess is why it is no longer a problem.
I also tried the previous version of the code with the SimpleXMLElement constructor:
<?php
$output = file_get_contents('error.xml.bak');
$output = utf8_encode($output);
$xml = new SimpleXMLElement($output, LIBXML_NOERROR|LIBXML_ERR_NONE) or die('Something is wrong');
echo json_encode($xml);
?>
which also worked as expected:
{"CURRENTLISTENERS":"1","PEAKLISTENERS":"3","MAXLISTENERS":"5000","REPORTEDLISTENERS":"1","AVERAGETIME":"1","SERVERGENRE":"public","SERVERURL":"http:\/\/www.virtualdj.com\/","SERVERTITLE":"wolf","SONGTITLE":"BARRY WHITE - YOU'RE THE FIRST,THE LAST \u0085 ","SONGURL":{},"IRC":"wolf","ICQ":"wolf","AIM":"wolf","WEBHITS":"80","STREAMHITS":"6","STREAMSTATUS":"1","BITRATE":"96","CONTENT":"audio\/mpeg","VERSION":"1.9.8","WEBDATA":{"INDEX":"0","LISTEN":"0","PALM7":"6","LOGIN":"0","LOGINFAIL":"0","PLAYED":"0","COOKIE":"0","ADMIN":"1","UPDINFO":"1","KICKSRC":"0","KICKDST":"0","UNBANDST":"0","BANDST":"0","VIEWBAN":"0","UNRIPDST":"0","RIPDST":"0","VIEWRIP":"0","VIEWXML":"69","VIEWLOG":"0","INVALID":"3"},"LISTENERS":{"LISTENER":{"HOSTNAME":"78.129.222.56","USERAGENT":"curl\/7.29.0","UNDERRUNS":"0","CONNECTTIME":"216","POINTER":"0","UID":"2"}},"SONGHISTORY":{"SONG":{"PLAYEDAT":"1579791561","TITLE":"BARRY WHITE - YOU'RE THE FIRST,THE LAST \u0085 "}}}
Note the \u0085 near the end.
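A possible explanation for the remaining \u0085, offered as an assumption rather than a confirmed diagnosis: the server may be sending Windows-1252 text. In Windows-1252 the byte 0x85 is "…" (U+2026, horizontal ellipsis), whereas `utf8_encode()` treats its input as ISO-8859-1, where 0x85 is the invisible control character U+0085. A truncated title ending in "…" would fit that reading. The sketch below converts from Windows-1252 instead; it assumes the mbstring extension is available, and `$raw` is a shortened, hypothetical response. Note also that `utf8_encode()` is deprecated as of PHP 8.2.

```php
<?php
// Sketch, assuming the server sends Windows-1252 text and the mbstring
// extension is available. $raw is a shortened, hypothetical response.
$raw = '<?xml version="1.0" standalone="yes" ?>'
     . '<SHOUTCASTSERVER><SONGTITLE>BARRY WHITE - YOU\'RE THE FIRST,THE LAST '
     . "\x85" . ' </SONGTITLE></SHOUTCASTSERVER>';

// Convert only when the bytes are not already valid UTF-8.
if (!mb_check_encoding($raw, 'UTF-8')) {
    // In Windows-1252, byte 0x85 maps to U+2026 HORIZONTAL ELLIPSIS.
    $raw = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
}

$xml = new SimpleXMLElement($raw);
echo json_encode((string) $xml->SONGTITLE, JSON_UNESCAPED_UNICODE), PHP_EOL;
```

If the server's encoding really is ISO-8859-1, then `utf8_encode()` (or `mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1')`) is the right conversion after all; comparing a few other titles that contain non-ASCII characters should tell the two cases apart.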
I am trying to write a file on my server, but I don't understand what is happening: the file is always empty (blank page).
No error is thrown.
When I put a var_dump($lines) inside the loop, I can see the data, but at some point an error appears.
Here is the code:
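use Phpml\Classification\KNearestNeighbors; // assumes php-ml is installed and autoloaded
// $dataset is built earlier in the script (not shown)
$minLat = -3.0000; //41.34343606848294;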
$maxLat = 22.0000; //57.844750992891;
$minLng = -0.0300; //-16.040039062500004;
$maxLng = 90.4200; //29.311523437500004;
$step = 0.1;
$k = 1;
$estimator = new KNearestNeighbors(9);
$estimator->train($dataset->getSamples(), $dataset->getTargets());
$lines = [];
for($lat=$minLat; $lat<$maxLat; $lat+=$step) {
for($lng=$minLng; $lng<$maxLng; $lng+=$step) {
$lines[] = sprintf('%s;%s;%s', $lat, $lng, $estimator->predict([[$lat, $lng]])[0]);
}
}
var_dump($lines); // ==> displays the data, but the file is always empty
$filename = '/var/www/test/php-ml/result_map.csv';
//$content = implode(PHP_EOL, $lines);
$content = implode( "\n", $lines );
file_put_contents($filename, $content);
The result (example):
-------------
bool(true) ==> checks whether the file exists
array(1) { [0]=> string(41) "-3;-0.03;Apple 15 Inch MacBook Pro Laptop" }
array(2) { [0]=> string(41) "-3;-0.03;Apple 15 Inch MacBook Pro Laptop" [1]=> string(40) "-3;0.07;Apple 15 Inch MacBook Pro Laptop" }
array(3) { [0]=> string(41) "-3;-0.03;Apple 15 Inch MacBook Pro Laptop" [1]=> string(40) "-3;0.07;Apple 15 Inch MacBook Pro Laptop" [2]=> string(40) "-3;0.17;Apple 15 Inch MacBook Pro Laptop" }
array(4) { [0]=> string(41) "-3;-0.03;Apple 15 Inch MacBook Pro Laptop" [1]=> string(40) "-3;0.07;Apple 15 Inch MacBook Pro Laptop" [2]=> string(40) "-3;0.17;Apple 15 Inch MacBook Pro Laptop" [3]=> string(40) "-3;0.27;Apple 15 Inch MacBook Pro Laptop" }
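The question does not show the actual error, so this is only a sketch of one defensive approach, not a diagnosis. With the values above, the loops run roughly 250 × 905 ≈ 226,000 predictions, and the file is only written after all of them finish; if the script stops earlier (a time or memory limit is one assumption), nothing reaches the file. Writing each row as soon as it is predicted keeps whatever was computed. `writeGrid()` and its `$predict` callback are hypothetical names; `$predict` stands in for `$estimator->predict([[$lat, $lng]])[0]`.

```php
<?php
// Sketch: stream grid predictions to a CSV file row by row instead of
// building one big array and writing it at the very end.
// $predict is a hypothetical callback standing in for the estimator call.
function writeGrid(string $filename, float $minLat, float $maxLat, float $minLng, float $maxLng, float $step, callable $predict): int
{
    $handle = fopen($filename, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $filename for writing");
    }
    // Integer counters avoid the drift that accumulates with $lat += $step;
    // the small epsilon keeps 250.0000000001 from becoming 251 steps.
    $latCount = (int) ceil(($maxLat - $minLat) / $step - 1e-9);
    $lngCount = (int) ceil(($maxLng - $minLng) / $step - 1e-9);
    $rows = 0;
    for ($i = 0; $i < $latCount; $i++) {
        $lat = round($minLat + $i * $step, 4);
        for ($j = 0; $j < $lngCount; $j++) {
            $lng = round($minLng + $j * $step, 4);
            // Same "lat;lng;label" format as the original sprintf()
            fwrite($handle, sprintf("%s;%s;%s\n", $lat, $lng, $predict($lat, $lng)));
            $rows++;
        }
        fflush($handle); // push each finished latitude to disk
    }
    fclose($handle);
    return $rows;
}
```

With the question's variables it would be called as `writeGrid($filename, $minLat, $maxLat, $minLng, $maxLng, $step, function ($lat, $lng) use ($estimator) { return $estimator->predict([[$lat, $lng]])[0]; });`, and the return value tells you how many rows were written.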
I am trying to create a project which will help students study various areas. The idea is that I have a piece of raw text containing quiz questions and answers, which I want to parse into a question header and answer options and insert into a database. However, the text is not properly formatted, and due to the large number of questions and answers (around 20k in total), I cannot afford the time to insert them manually or format the text myself.
The raw text looks like this:
1. A car averages 27 miles per gallon. If gas costs $4.04 per gallon, which of the following is closest to how much the gas would cost for this car to travel 2,727 typical miles?
a) $44.44 b) $109.08 c) $118.80
d) $408.04 e)
$444.40
2. When x = 3 and y = 5, by how much does the value of 3x2 – 2y exceed the value of 2x2 – 3y ?
a) 4
b) 14
c) 16
d) 20 e) 50
I tried writing my own PHP functions to parse the text, but I cannot get past the random line breaks, spaces, etc.
What I am trying to obtain:
array(1) {
[0]=>
array(3) {
["questionNumber"]=>
string(1) "1"
["questionText"]=>
string(175) "A car averages 27 miles per gallon. If gas costs $4.04 per gallon, which of the following is closest to how much the gas would cost for this car to travel 2,727 typical miles?"
["options"]=>
array(5) {
["a"]=>
string(6) "$44.44"
["b"]=>
string(7) "$109.08"
["c"]=>
string(7) "$118.80"
["d"]=>
string(7) "$408.04"
["e"]=>
string(7) "$444.40"
}
}
}
The code I have so far:
$rawText = '1. A car averages 27 miles per gallon. If gas costs $4.04 per gallon, which of the following is closest to how much the gas would cost for this car to travel 2,727 typical miles?
a) $44.44 b) $109.08 c) $118.80
d) $408.04 e)
$444.40
2. When x = 3 and y = 5, by how much does the value of 3x2 – 2y exceed the value of 2x2 – 3y ?
a) 4
b) 14
c) 16
d) 20 e) 50
';
$rawTextLines = explode("\n", $rawText);
foreach ($rawTextLines as $lineNumber => $lineContents) {
$lContents = trim($lineContents);
if (empty ($lContents)) {
unset ($rawTextLines[$lineNumber]);
} else {
$rawTextLines[$lineNumber] = $lContents;
}
}
$processedQuestions = array ();
$currentQuestionHeader = 0;
foreach ($rawTextLines as $lineNumber => $lineContents) {
if (ctype_digit(substr($lineContents, 0, 1))) { // Question header
$questionHeaderInformation = explode('.', $lineContents);
$currentQuestionHeader = $questionHeaderInformation[0];
$processedQuestions[$currentQuestionHeader]['questionNumber'] = $currentQuestionHeader;
$processedQuestions[$currentQuestionHeader]['questionText'] = $questionHeaderInformation[1];
} else { // Question option
$options = explode(')', $lineContents);
if (count ($options) % 2 === 0) {
$processedQuestions[$currentQuestionHeader]['options'][trim($options[0])] = ucfirst(trim($options[1]));
} else {
}
}
}
Which produces this:
array(2) {
[1]=>
array(3) {
["questionNumber"]=>
string(1) "1"
["questionText"]=>
string(35) " A car averages 27 miles per gallon"
["options"]=>
array(1) {
["a"]=>
string(8) "$44.44 b"
}
}
[2]=>
array(3) {
["questionNumber"]=>
string(1) "2"
["questionText"]=>
string(96) " When x = 3 and y = 5, by how much does the value of 3x2 – 2y exceed the value of 2x2 – 3y ?"
["options"]=>
array(3) {
["a"]=>
string(1) "4"
["b"]=>
string(2) "14"
["c"]=>
string(2) "16"
}
}
}
As you can see, the current output is nowhere near what I am trying to obtain.
Thank you in advance.
Hello, try this regex:
^[0-9]+\. (.*)[\r\n]+a\)[\s]+(.*)[\s]+b\)[\s]+(.*)[\s]+c\)[\s]+(.*)[\s]+d\)[\s]+(.*)[\s]+e\)[\s]+(.*)[\s]*
Try it:
$re = '/^[0-9]+\. (.*)[\r\n]+a\)[\s]+(.*)[\s]+b\)[\s]+(.*)[\s]+c\)[\s]+(.*)[\s]+d\)[\s]+(.*)[\s]+e\)[\s]+(.*)[\s]*/m';
$str = '1. A car averages 27 miles per gallon. If gas costs $4.04 per gallon, which of the following is closest to how much the gas would cost for this car to travel 2,727 typical miles?
a) $44.44 b) $109.08 c) $118.80
d) $408.04 e)
$444.40
2. When x = 3 and y = 5, by how much does the value of 3x2 – 2y exceed the value of 2x2 – 3y ?
a) 4
b) 14
c) 16
d) 20 e) 50';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
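The regex above expects exactly five options (a to e) per question in a particular order, and its matches still have to be mapped into the target array. A more tolerant sketch, using a different technique from the regex answer: cut the text into question blocks, collapse all whitespace inside each block, then split before every option label. It assumes that each question starts at the beginning of a line with "N. " and that option labels are a single letter a to e followed by ")" and whitespace; `parseQuestions()` is a hypothetical helper name.

```php
<?php
// Sketch: parse loosely formatted quiz text into records shaped like
// ['questionNumber' => ..., 'questionText' => ..., 'options' => ['a' => ..., ...]].
// Assumptions: questions start a line with "N. "; options are labelled a) to e).
function parseQuestions(string $rawText): array
{
    // A block runs from "N. " at a line start to the next header or the end of the text.
    preg_match_all('/^\s*(\d+)\.\s+(.*?)(?=^\s*\d+\.\s|\z)/ms', $rawText, $blocks, PREG_SET_ORDER);

    $questions = [];
    foreach ($blocks as $block) {
        // Collapse line breaks and runs of spaces so wrapped options line up.
        $body = trim(preg_replace('/\s+/', ' ', $block[2]));

        // Split before every option label; the first piece is the question text.
        $pieces = preg_split('/\s+(?=[a-e]\)(?:\s|$))/', $body);
        $question = [
            'questionNumber' => $block[1],
            'questionText'   => array_shift($pieces),
            'options'        => [],
        ];
        foreach ($pieces as $piece) {
            if (preg_match('/^([a-e])\)\s*(.*)$/', $piece, $m)) {
                $question['options'][$m[1]] = $m[2];
            }
        }
        $questions[] = $question;
    }
    return $questions;
}
```

Calling `var_dump(parseQuestions($str));` on the sample text should give one record per question in the shape shown in the question. Text that breaks the assumptions (for example a question that itself contains "b) ") would need extra handling.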
I am trying to access a Taleo RSS/XML feed and parse the data. I am using SimpleXML, and it is loading in all of the regular data correctly, such as <title>, <link>, etc.
However, there are several nodes that are formatted like <taleo:reqId> or <taleo:location>, and I can't seem to figure out how to access that data. It's not being returned by SimpleXML.
`$xml = simplexml_load_file('https://chp.tbe.taleo.net/chp03/ats/servlet/Rss?org=DRAGADOS&cws=1&WebPage=SRCHR&WebVersion=0&_rss_version=2');`
Returns in web browser source:
`<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="https://chp.tbe.taleo.net/chp03/ats/rss/taleorssfeed.xsl" ?>
<?xml-stylesheet type="text/css" href="https://chp.tbe.taleo.net/chp03/ats/rss/taleorssfeed.css" ?>
<rss xmlns:taleo="urn:TBERss" version="2.0">
<channel>
<title>Dragados Job Feed</title>
<link>https://chp.tbe.taleo.net/dispatcher/servlet/DispatcherServlet?org=DRAGADOS&act=redirectCws&cws=1</link>
<description>Dragados Job Feed</description>
<language>en</language>
<pubDate>Tue, 07 Nov 2017 17:00:40 GMT</pubDate>
<ttl>60</ttl>
<item>
<title>Estimator</title>
<link>https://chp.tbe.taleo.net/chp03/ats/careers/requisition.jsp?org=DRAGADOS&cws=1&rid=1155</link>
<guid>https://chp.tbe.taleo.net/chp03/ats/careers/requisition.jsp?org=DRAGADOS&cws=1&rid=1155</guid>
<description> ..... </description>
<pubDate>Tue, 07 Nov 2017 17:00:40 GMT</pubDate>
<taleo:reqId>1155</taleo:reqId>
<taleo:location>Southern California Branch (bidding)</taleo:location>
<taleo:locationCountry>US</taleo:locationCountry>
<taleo:locationState>US-CA</taleo:locationState>
<taleo:locationCity>Costa Mesa</taleo:locationCity>
<taleo:department>West Coast Bidding</taleo:department>
<taleo:html-description> ... </taleo:html-description>
</item>
...`
Returns in SimpleXML:
`object(SimpleXMLElement)#487 (2) { ["#attributes"]=> array(1) { ["version"]=> string(3) "2.0" } ["channel"]=> object(SimpleXMLElement)#486 (7) { ["title"]=> string(17) "Dragados Job Feed" ["link"]=> string(97) "https://chp.tbe.taleo.net/dispatcher/servlet/DispatcherServlet?org=DRAGADOS&act=redirectCws&cws=1" ["description"]=> string(17) "Dragados Job Feed" ["language"]=> string(2) "en" ["pubDate"]=> string(29) "Tue, 07 Nov 2017 17:00:40 GMT" ["ttl"]=> string(2) "60" ["item"]=> array(79) { [0]=> object(SimpleXMLElement)#485 (5) { ["title"]=> string(9) "Estimator" ["link"]=> string(87) "https://chp.tbe.taleo.net/chp03/ats/careers/requisition.jsp?org=DRAGADOS&cws=1&rid=1155" ["guid"]=> string(87) "https://chp.tbe.taleo.net/chp03/ats/careers/requisition.jsp?org=DRAGADOS&cws=1&rid=1155" ["description"]=> string(3580) " ... " ["pubDate"]=> string(29) "Tue, 07 Nov 2017 17:00:40 GMT" }
...`
In cases like this, I find it easier to split out the data for different namespaces. When going through the <item> data, you can extract all the elements of a specific namespace (the 'taleo' elements here) using ->children('namespaceURN'), and then access the data the same way as elsewhere, but using this new set of nodes as the basis.
foreach ( $xml->channel as $channel ) {
echo "title=".$channel->title.PHP_EOL;
foreach ( $channel->item as $item ) {
echo "pubDate=".$item->pubDate.PHP_EOL;
// Extract the elements for taleo namespace
$taleo = $item->children("urn:TBERss");
echo "taleo:reqId=".$taleo->reqId.PHP_EOL;
}
}
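A self-contained version of the same idea, run against a trimmed copy of the feed shown in the question (the inline XML is illustrative, not the live feed). It shows the `children()` approach from this answer and, as an alternative, an XPath query with a prefix registered through `registerXPathNamespace()`; the prefix `t` is an arbitrary choice, since only the namespace URI has to match.

```php
<?php
// Sketch: read namespaced <taleo:*> elements with SimpleXML.
// The XML is a trimmed copy of the feed from the question.
$feed = <<<'XML'
<rss xmlns:taleo="urn:TBERss" version="2.0">
  <channel>
    <title>Dragados Job Feed</title>
    <item>
      <title>Estimator</title>
      <taleo:reqId>1155</taleo:reqId>
      <taleo:locationCity>Costa Mesa</taleo:locationCity>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);

// Option 1: children() with the namespace URI declared by xmlns:taleo.
foreach ($xml->channel->item as $item) {
    $taleo = $item->children('urn:TBERss');
    echo (string) $item->title, ' #', (string) $taleo->reqId, ' in ', (string) $taleo->locationCity, PHP_EOL;
}

// Option 2: XPath, after binding a prefix of our choice to the same URI.
$xml->registerXPathNamespace('t', 'urn:TBERss');
$reqIds = array_map('strval', $xml->xpath('//item/t:reqId'));
print_r($reqIds);
```

`$item->children('taleo', true)` also works; the second argument tells SimpleXML that the first one is a prefix rather than a URI.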
<?php
$show_value = 123;
echo 'sing_quote'.$show_value;
echo "double_quote{$show_value}";
?>
Its opcodes are:
1: <?php
2: $show_value = 123;
0 ASSIGN !0, 123
3: echo 'sing_quote'.$show_value;
1 CONCAT 'sing_quote', !0 =>RES[~1]
2 ECHO ~1
4: echo "double_quote{$show_value}";
3 ADD_STRING 'double_quote' =>RES[~2]
4 ADD_VAR ~2, !0 =>RES[~2]
5 ECHO ~2
6 RETURN 1
Check out the Vulcan Logic Disassembler PECL extension; see the author's home page for more info.
The Vulcan Logic Disassembler hooks into the Zend Engine and dumps all the opcodes (execution units) of a script. It was written as a beginning of an encoder, but I never got the time for that. It can be used to see what is going on in the Zend Engine.
Once installed, you can use it like this:
php -d vld.active=1 -d vld.execute=0 -f yourscript.php
See also this interesting blog post on opcode extraction, and the PHP manual page listing the available opcodes.
Parsekit has parsekit_compile_string().
sudo pecl install parsekit
var_dump(parsekit_compile_string(<<<PHP
\$show_value = 123;
echo 'sing_quote'.\$show_value;
echo "double_quote{\$show_value}";
PHP
));
The output is quite verbose, so you would need to post-process it to get an assembler-like format.
["opcodes"]=>
array(10) {
[0]=>
array(9) {
["address"]=>
int(44682716)
["opcode"]=>
int(101)
["opcode_name"]=>
string(13) "ZEND_EXT_STMT"
["flags"]=>
int(4294967295)
["result"]=>
array(8) {
["type"]=>
int(8)
["type_name"]=>
string(9) "IS_UNUSED"
["var"]=>
int(0)
["opline_num"]=>
string(1) "0"
["op_array"]=>
string(1) "0"
["jmp_addr"]=>
string(1) "0"
["jmp_offset"]=>
string(8) "35419039"
["EA.type"]=>
int(0)
}
["op1"]=>
array(8) {
["type"]=>
int(8)
["type_name"]=>
string(9) "IS_UNUSED"
["var"]=>
int(0)
["opline_num"]=>
string(1) "0"
["op_array"]=>
string(1) "0"
["jmp_addr"]=>
string(1) "0"
["jmp_offset"]=>
string(8) "35419039"
["EA.type"]=>
int(0)
}
You can run code and also see its opcodes on https://3v4l.org/
Note: it shows the Vulcan Logic Disassembler (VLD) output automatically, but only if "all supported versions" is selected in the version dropdown.
Here's a simple example (shown below for posterity): https://3v4l.org/Gt8fd/vld
Code:
<?php
$arr = [1, 2, 3, 4];
print_r(array_map(fn(int $i): int => $i * $i, $arr));
Result:
Finding entry points
Branch analysis from position: 0
1 jumps found. (Code = 62) Position 1 = -2
filename: /in/Gt8fd
function name: (null)
number of ops: 10
compiled vars: !0 = $arr
line #* E I O op fetch ext return operands
-------------------------------------------------------------------------------------
2 0 E > ASSIGN !0, <array>
3 1 INIT_FCALL 'print_r'
2 INIT_FCALL 'array_map'
3 DECLARE_LAMBDA_FUNCTION '%00%7Bclosure%7D%2Fin%2FGt8fd%3A3%240'
4 SEND_VAL ~2
5 SEND_VAR !0
6 DO_ICALL $3
7 SEND_VAR $3
8 DO_ICALL
9 > RETURN 1
Function %00%7Bclosure%7D%2Fin%2FGt8fd%3A3%240:
Finding entry points
Branch analysis from position: 0
1 jumps found. (Code = 62) Position 1 = -2
filename: /in/Gt8fd
function name: {closure}
number of ops: 6
compiled vars: !0 = $i
line #* E I O op fetch ext return operands
-------------------------------------------------------------------------------------
0 E > RECV !0
1 MUL ~1 !0, !0
2 VERIFY_RETURN_TYPE ~1
3 > RETURN ~1
4* VERIFY_RETURN_TYPE
5* > RETURN null
End of function %00%7Bclosure%7D%2Fin%2FGt8fd%3A3%240
Generated using Vulcan Logic Dumper, using php 8.0.0
Two options are setting the opcache.opt_debug_level INI setting, or using the phpdbg binary available in a debug-enabled PHP environment (which may require you to compile PHP from source or install the related package on Linux).
For more information and a full guide, refer to this php.watch article (credit for this answer also goes to that article).