How can I use ucfirst() on a PHP SimpleXML node?

I use PHP and SimpleXML to parse an XML feed from a URL. I want to take the value of a SimpleXML node and change it. I convert it to a string first, but ucfirst() doesn't work on that string.
$xml = simplexml_load_file($url);
foreach($xml->offers->offer as $offer)
{
$bodyType = (string) $offer->{"body-type"}; //I convert simplexml to string first
echo ucfirst($bodyType); // In this line ucfirst doesn't work
}
How can I deal with this?
UPDATE: The problem was the Cyrillic letters: ucfirst() only handles single-byte characters, so it leaves multibyte UTF-8 text such as Cyrillic unchanged.
A working solution is to use the mb_* functions instead:
$bodyType = (string) $offer->{"body-type"};
$encoding='UTF-8';
$str = mb_ereg_replace('^[\ ]+', '', $bodyType);
$str = mb_strtoupper(mb_substr($str, 0, 1, $encoding), $encoding). mb_substr($str, 1, mb_strlen($str), $encoding);
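The same idea can be wrapped in a small reusable function. This is only a sketch: the name mb_ucfirst_utf8 is made up for illustration, and it assumes the mbstring extension is loaded and that the input is valid UTF-8.

```php
<?php
// Hypothetical helper: upper-case the first character of a multibyte string.
// Assumes the mbstring extension is available.
function mb_ucfirst_utf8(string $text, string $encoding = 'UTF-8'): string
{
    $first = mb_substr($text, 0, 1, $encoding);    // first character, not first byte
    $rest  = mb_substr($text, 1, null, $encoding); // everything after it
    return mb_strtoupper($first, $encoding) . $rest;
}

// ltrim() removes leading ASCII whitespace, which is safe on UTF-8 input.
$bodyType = mb_ucfirst_utf8(ltrim('  фургон'));
```

Inside the loop from the question, this could be called as mb_ucfirst_utf8((string) $offer->{"body-type"}).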

Please share your XML file as well. I used the following file and it works fine:
<?xml version="1.0"?>
<data>
<offers>
<offer>
<body-type>offer 1</body-type>
</offer>
<offer>
<body-type>offer 2</body-type>
</offer>
</offers>
</data>
My output is:
Offer 1
Offer 2
HTML: Offer 1<br />Offer 2<br />
using the following PHP code:
<?PHP
$url = "test.xml";
$xml = simplexml_load_file($url);
foreach($xml->offers->offer as $offer)
{
$bodyType = (string) $offer->{"body-type"}; // convert the SimpleXML node to a string first
echo ucfirst($bodyType); // works for this Latin-only test data
echo '<br />';
}
?>

Given the test.xml from Farrukh's answer, you can actually even omit the typecasting. This works as well for me:
<?php
$url = "test.xml";
$xml = simplexml_load_file($url);
foreach($xml->offers->offer as $offer) {
echo ucfirst($offer->{"body-type"}) .'<br>';
}
Here's a live demo: http://codepad.viper-7.com/L4VwPL
UPDATE (after URL was provided by OP)
You'll most likely have an encoding issue. When I set the UTF-8 charset explicitly, it works as expected (otherwise simplexml returns corrupted strings only).
$url = "http://carsguru.net/x/used/exchange/4.xml";
$xml = simplexml_load_file($url);
header('Content-Type: text/html; charset=utf-8');
foreach($xml->offers->offer as $offer) {
echo ucfirst($offer->{"body-type"}) .'<br>';
}
When I run the above snippet, I get this output (stripped):
фургон
универсал
хэтчбек
хэтчбек
минивэн
минивэн
минивэн
седан
седан
универсал
хэтчбек
универсал
седан
хэтчбек
седан
NOTE: You don't serve a Content-Type/charset header for the XML. I'd add that.
Anyway, you may want to have a look at iconv: iconv("cp1251", "UTF-8", $str);
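As a rough illustration of that iconv call, here is a sketch that converts a windows-1251 byte string to UTF-8 and then capitalises it with the mb_* functions. The byte string is a hand-written stand-in for data read from a cp1251 source; it assumes the iconv and mbstring extensions are available.

```php
<?php
// "фургон" written out as windows-1251 bytes (illustrative stand-in for feed data).
$cp1251 = "\xf4\xf3\xf0\xe3\xee\xed";

// Convert to UTF-8 before doing any multibyte string handling.
$utf8 = iconv('CP1251', 'UTF-8', $cp1251);

// Upper-case the first character once the string is valid UTF-8.
$capitalised = mb_strtoupper(mb_substr($utf8, 0, 1, 'UTF-8'), 'UTF-8')
    . mb_substr($utf8, 1, null, 'UTF-8');
```

Whether a conversion is needed at all depends on where the string comes from, so treat this as one possible step rather than a required one.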
Actually, the file encoding is Cyrillic windows-1251, which probably makes sense.
Why? You can, of course, use valid UTF-8! Here is an example node from your XML converted with a cp1251-to-UTF-8 function (it might look odd, but it renders perfectly):
<?xml version="1.0" encoding="UTF-8"?>
<auto-catalog>
<creation-date>2013-02-07 02:00:08 GMT+4</creation-date>
<host>carsguru.net</host>
<offers>
<offer type="commercial">
<url>http://carsguru.net/used/5131406/view.html</url>
<date>2013-02-07</date>
<mark>ГАЗ</mark>
<model>2705</model>
<year>2003</year>
<seller-city>Санкт-Петербург</seller-city>
<seller-phone>8-921-997-74-06</seller-phone>
<price>150000</price>
<currency-type>RUR</currency-type>
<steering-wheel>левый</steering-wheel>
<run-metric>км</run-metric>
<run>194</run>
<displacement>2300</displacement>
<stock>в наличии</stock>
<state>Хорошее</state>
<color>синий</color>
<body-type>фургон</body-type>
<engine-type>бензин</engine-type>
<gear-type>задний</gear-type>
<transmission>ручная</transmission>
<horse-power>98</horse-power>
<image>http://carsguru.net/clf/03/af/9c/8b/used.4r9v39h31facog8cs0w0wk8ws.jpg.medium.jpg</image>
<image>http://carsguru.net/clf/ae/51/be/3a/used.bxyc3q9mx80sko0wg80880w0k.jpg.medium.jpg</image>
<image>http://carsguru.net/clf/28/dc/c1/d4/used.8i1b76l1b8o4cwg8gc08oos4s.jpg.medium.jpg</image>
<image>http://carsguru.net/clf/55/3d/37/10/used.7dmn7puczuo0wo4cs8kko0cco.jpg.medium.jpg</image>
<image>http://carsguru.net/clf/49/02/15/54/used.7k8lhomw4j4s4040kssk4kgso.jpg.medium.jpg</image>
<equipment>Магнитола</equipment>
<equipment>Подогрев зеркал</equipment>
</offer>
</offers>
</auto-catalog>

Related

How to remove the &nbsp; character in PHP? (str_replace function not working)

"contentDetails" has following data in it:
<p>This is data sample.&nbsp;</p><p>Second part of the paragraph.&nbsp;</p>
str_replace is not working here. Please take a look.
Here is how my XML structure in PHP looks:
$xml = '<?xml version="1.0" encoding="UTF-8"?>';
$xml .= '<root>';
$xml .= '<myData>';
$xml .= '<content>' . str_replace("&nbsp;", "", htmlentities($_POST[contentDetails])) . '</content>';
$xml .= '</myData>';
$xml .= '</root>';
I'm assuming your contentDetails actually contains:
<p>This is data sample.&nbsp;</p><p>Second part of the paragraph.&nbsp;</p>
($nbsp; replaced with &nbsp;)
Your problem is that when you call htmlentities on contentDetails, it converts &nbsp; into &amp;nbsp;, so your str_replace won't find any matches. To solve the problem, call str_replace before htmlentities:
$xml .= '<content>' . htmlentities(str_replace("&nbsp;", "", $_POST['contentDetails'])) . '</content>';
Note that associative array keys should be enclosed in quotes: an unquoted key raises a warning in PHP 7.2 and later 7.x releases and throws an Error in PHP 8.
The htmlentities() function converts &nbsp; to &amp;nbsp;, so try this:
str_replace("&amp;nbsp;", "", htmlentities($_POST['contentDetails']))
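The effect of the call order can be checked with a short sketch. The input string is a made-up sample in the shape of the question's data, and the variable names are illustrative only.

```php
<?php
// Sample input containing a literal &nbsp; entity.
$input = '<p>This is data sample.&nbsp;</p>';

// Escaping first turns "&nbsp;" into "&amp;nbsp;", so the search string no longer matches.
$escapedFirst = str_replace('&nbsp;', '', htmlentities($input));

// Replacing first removes the entity; escaping afterwards only touches the markup.
$replacedFirst = htmlentities(str_replace('&nbsp;', '', $input));
```

Here $escapedFirst still contains &amp;nbsp;, while $replacedFirst does not.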

Converting from XML to json with json_encode messes up the encoding of the string

I have a string that receives an XML structure.
One of the elements contains Chinese characters.
In order to convert the XML to JSON, I use json_encode(). The output for the Chinese characters is garbled.
I tried checking the encoding with mb_detect_encoding and even tried the solution listed here.
I've googled around (a lot) and found numerous other resources but none of them seems to solve my problem. Any help is much appreciated.
Code:
<?php
$str = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rootjson>
<widget>
<debug>on</debug>
<text>
<data>點擊這裡</data>
<size>36</size>
<alignment>center</alignment>
</text>
</widget>
</rootjson>
XML;
$xml = simplexml_load_string($str);
if ($encoding = mb_detect_encoding($xml, 'UTF-8', true)) echo 'XML is utf8'; //It finds it to be utf8
$json = json_encode($xml, JSON_PRETTY_PRINT);
if ($encoding = mb_detect_encoding($json, 'UTF-8', true)) echo 'Json is utf8'; //It also finds it to be utf8
var_dump($json);
?>
Output:
{
"widget": {
"debug": "on",
"text": {
"data": "\u9ede\u64ca\u9019\u88e1",
"size": "36",
"alignment": "center"
}
}
}
I don't think I can trust mb_detect_encoding here, as it reports that both $xml and $json are UTF-8 encoded. The Chinese string 點擊這裡 now shows as \u9ede\u64ca\u9019\u88e1.
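What you need is JSON_UNESCAPED_UNICODE; see the documentation at php.net/manual/en/function.json-encode.php. The \uXXXX sequences are valid JSON escapes rather than corrupted data, and json_decode() turns them back into the original characters.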
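A minimal sketch of the difference, using a shortened version of the XML from the question (the variable names are illustrative):

```php
<?php
$xml = simplexml_load_string('<rootjson><text><data>點擊這裡</data></text></rootjson>');

// Default behaviour: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($xml);

// With the flag: characters are written as literal UTF-8.
$readable = json_encode($xml, JSON_UNESCAPED_UNICODE);

// Both strings decode to the same data.
$same = json_decode($escaped, true) === json_decode($readable, true);
```

Flags can be combined, for example JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE.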

Why preg_match() result show 0 in PHP when I use simplexml_load_file()?

I have a problem with PHP. This is my code.
test.xml looks like this:
<?xml version='1.0'?>
<document responsecode="200">
<result count="10" start="0" totalhits="133047950">
<title>Test</title>
<from id = "jack">655</from>
<to>Tsung</to>
</result>
</document>
php code:
<?php
header("content-type:text/html; charset=utf-8");
$xml = simplexml_load_file("test.xml");
$text = htmlspecialchars($xml->asXML());
$pattern = "/</";
$result = preg_match($pattern,$text);
echo $result;
?>
The result is "0", which means nothing was found, so I changed the $pattern value:
$pattern = "/document/";
Now the result is "1" (which means it was found).
I have spent a lot of time debugging this.
Maybe it is an encoding problem (UTF-8 vs. ASCII), or is the "/</" pattern wrong?
My goal is to parse this string and get
'<title> .. </title>'
Can somebody tell me where my error is? Thanks :))
You are using a parser, just parse it, no need for a regex.
$xml = '<?xml version=\'1.0\'?>
<document responsecode="200">
<result count="10" start="0" totalhits="133047950">
<title>Test</title>
<from id = "jack">655</from>
<to>Tsung</to>
</result>
</document>';
$xml = new SimpleXMLElement($xml);
echo $xml->result->title->asXML();
Output:
<title>Test</title>
As the other answers state, the issue is your use of htmlspecialchars. Your regex also isn't specific enough to find the title element. If you needed to do this with a regex, you could use:
/((<|&lt;)title(>|&gt;).*?\2\/title\3)/
Demo: https://regex101.com/r/kM8tR8/1
Capture group 1 will contain your title element. If the title text can span multiple lines, add the s modifier.
Don't call htmlspecialchars, it's converting all the XML tags to HTML entities.
<?php
header("content-type:text/html; charset=utf-8");
$xml = simplexml_load_file("test.xml");
$text = $xml->asXML();
$pattern = "/</";
$result = preg_match($pattern,$text);
echo $result;
?>
The problem is that htmlspecialchars() converts special characters to HTML entities: < becomes &lt;, > becomes &gt;, and so on. So if you want to search the escaped document and get the title, match the entities instead:
header("content-type:text/html; charset=utf-8");
$xml = simplexml_load_file("test.xml");
$text = htmlspecialchars($xml->asXML());
$pattern = "/&lt;title&gt;(.*?)&lt;\/title&gt;/";
$matches = array();
preg_match($pattern, $text, $matches);
echo $matches[1]; // Test
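To see why the original pattern returned 0, it may help to compare the raw and escaped strings side by side. This sketch uses an inline, shortened copy of the document instead of test.xml; the variable names are illustrative.

```php
<?php
$xml = new SimpleXMLElement('<document><result><title>Test</title></result></document>');

$raw     = $xml->asXML();          // still contains literal "<" and ">"
$escaped = htmlspecialchars($raw); // every "<" is now "&lt;"

$inRaw       = preg_match('/</', $raw);        // literal "<" is present
$inEscaped   = preg_match('/</', $escaped);    // no literal "<" is left
$entityMatch = preg_match('/&lt;/', $escaped); // the entity is there instead
```

htmlspecialchars() is meant for output to a browser, so it is usually applied last, after any parsing or matching.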

Strip prepended and appended text from outside XML

We make a PHP XML-RPC call to a third party, and they are having issues with returning additional text outside the XML body, like:
133
<Envelope>
<Body>
<RESULT>
<SUCCESS>true</SUCCESS>
<SESSIONID>99B153C1DFA889C34213B</SESSIONID>
<ORGANIZATION_ID>f528764d624db129b32c21fbca0cb8d6</ORGANIZATION_ID>
<SESSION_ENCODING>;jsessionid=99B153C1DFA889C34213B</SESSION_ENCODING>
</RESULT>
</Body>
</Envelope>
0
The additional text varies and is not always numeric. Their staff are working on the issue but in the interim it would be great if using PHP I could cleanly eliminate everything in their response outside the <Envelope></Envelope>.
Anyone have a tip for me?
For example:
<?php
$xml = '133
<Envelope>
<Body>
<RESULT>
<SUCCESS>true</SUCCESS>
<SESSIONID>99B153C1DFA889C34213B</SESSIONID>
<ORGANIZATION_ID>f528764d624db129b32c21fbca0cb8d6</ORGANIZATION_ID>
<SESSION_ENCODING>;jsessionid=99B153C1DFA889C34213B</SESSION_ENCODING>
</RESULT>
</Body>
</Envelope>
0';
$open_tag = '<Envelope>';
$close_tag = '</Envelope>';
$start_index = strpos($xml,$open_tag);
$length = strpos($xml, $close_tag) - $start_index + strlen($close_tag);
$clean_xml = substr($xml, $start_index, $length);
echo $clean_xml;
echo "\r\n";
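Another solution, inline but way less elegant (array indexing is used because reset() and end() expect a variable passed by reference, not a function result):
$clean_xml = $open_tag . explode($close_tag, explode($open_tag, $xml, 2)[1], 2)[0] . $close_tag;
echo $clean_xml;
echo "\r\n";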
$xml = preg_replace('~^.*(<Envelope>.+?</Envelope>).*$~si', '$1', $xml);
Try this one. The lazy version :)
There are a number of approaches. You could use preg_match and a regular expression to get to the data, or simple string matching. Since you have a well-defined start and end point, I would probably opt for string matching. Simply read the entire response into a string, use strpos to find the locations of <Envelope> and </Envelope>, then use substr to extract the string between the two positions (note that you will need to add 11, the length of </Envelope>, to the location of the closing tag to include it in the extracted string).
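The string-matching approach could be wrapped in a function that also copes with a missing tag. This is a sketch under assumptions: the function name extract_element is invented here, and it assumes the wrapper element has no attributes and appears only once in the response.

```php
<?php
// Hypothetical helper: return the <Tag>...</Tag> block from a noisy response,
// or null when the opening or closing tag cannot be found.
function extract_element(string $response, string $tag = 'Envelope'): ?string
{
    $open  = "<$tag>";
    $close = "</$tag>";

    $start = strpos($response, $open);
    if ($start === false) {
        return null;
    }

    // Search from the end so trailing junk after the closing tag is dropped.
    $end = strrpos($response, $close);
    if ($end === false || $end < $start) {
        return null;
    }

    return substr($response, $start, $end - $start + strlen($close));
}

$clean = extract_element("133\n<Envelope><Body><SUCCESS>true</SUCCESS></Body></Envelope>\n0");
```

The result can then be passed to simplexml_load_string(), which returns false if the extracted block is still not well-formed.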

PHP SimpleXML doesn't preserve line breaks in XML attributes

I have to parse externally provided XML that has attributes with line breaks in them. Using SimpleXML, the line breaks seem to be lost. According to another stackoverflow question, line breaks should be valid (even though far less than ideal!) for XML.
Why are they lost? [edit] And how can I preserve them? [/edit]
Here is a demo file script (note that when the line breaks are not in an attribute they are preserved).
PHP File with embedded XML
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<Rows>
<data Title='Data Title' Remarks='First line of the row.
Followed by the second line.
Even a third!' />
<data Title='Full Title' Remarks='None really'>First line of the row.
Followed by the second line.
Even a third!</data>
</Rows>
XML;
$xml = new SimpleXMLElement( $xml );
print '<pre>'; print_r($xml); print '</pre>';
Output from print_r
SimpleXMLElement Object
(
[data] => Array
(
[0] => SimpleXMLElement Object
(
[#attributes] => Array
(
[Title] => Data Title
[Remarks] => First line of the row. Followed by the second line. Even a third!
)
)
[1] => First line of the row.
Followed by the second line.
Even a third!
)
)
Using SimpleXML, the line breaks seem to be lost.
Yes, that is expected... in fact it is required of any conformant XML parser that newlines in attribute values represent simple spaces. See attribute value normalisation in the XML spec.
If there was supposed to be a real newline character in the attribute value, the XML should have included a &#10; character reference instead of a raw newline.
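The normalisation rule can be observed directly with a short sketch. The two attribute values below are made up for illustration: one uses a character reference for the line break, the other a raw newline.

```php
<?php
// In a double-quoted PHP string, "&#10;" stays literal text and "\n" is a real newline.
$xml = new SimpleXMLElement(
    "<Rows>"
    . "<data Remarks='First line.&#10;Second line.'/>"
    . "<data Remarks='First line.\nSecond line.'/>"
    . "</Rows>"
);

$fromReference = (string) $xml->data[0]['Remarks']; // newline survives
$fromRawBreak  = (string) $xml->data[1]['Remarks']; // newline becomes a space
```

This is why the workarounds in the other answers rewrite raw newlines into character references before the string reaches the parser.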
The entity for a new line is &#10;. I played with your code until I found something that did the trick. It's not very elegant, I warn you:
//First remove any indentation (a four-space indent is assumed here; adjust to match the input):
$xml = str_replace("    ","", $xml);
$xml = str_replace("\t","", $xml);
//Next unify all new lines into Unix LF:
$xml = str_replace("\r","\n", $xml);
$xml = str_replace("\n\n","\n", $xml);
//Next replace all new lines with the character reference:
$xml = str_replace("\n","&#10;", $xml);
//Finally, replace any new line references between >< with a new line:
$xml = str_replace(">&#10;<",">\n<", $xml);
The assumption, based on your example, is that any new lines that occur inside a node or attribute will have more text on the next line, not a < to open a new element.
This of course would fail if your next line had some text that was wrapped in a line-level element.
Assuming $xmlData is your XML string before it is sent to the parser, this should replace all newlines in attributes with the correct entity. I had the issue with XML coming from SQL Server.
$parts = explode("<", $xmlData); //split over <
array_shift($parts); //remove the blank array element
$newParts = array(); //create array for storing new parts
foreach($parts as $p)
{
list($attr,$other) = explode(">", $p, 2); //get attribute data into $attr
$attr = str_replace("\r\n", "&#10;", $attr); //do the replacement
$newParts[] = $attr.">".$other; // put parts back together
}
$xmlData = "<".implode("<", $newParts); // put parts back together prefixing with <
Probably can be done more simply with a regex, but that's not a strong point for me.
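A regex version might look like the sketch below. The function name encode_attribute_newlines is invented for illustration, and the pattern is deliberately simple: it rewrites line breaks inside any quoted value that follows an equals sign, so it assumes the document has no CDATA sections or comments containing such text.

```php
<?php
// Hypothetical helper: replace raw line breaks inside quoted attribute values
// with the &#10; character reference before the XML is parsed.
function encode_attribute_newlines(string $xml): string
{
    $result = preg_replace_callback(
        '/(=\s*)("[^"<]*"|\'[^\'<]*\')/',
        function (array $m): string {
            // $m[1] is "=" plus optional whitespace, $m[2] is the quoted value.
            return $m[1] . str_replace(["\r\n", "\r", "\n"], '&#10;', $m[2]);
        },
        $xml
    );
    return $result ?? $xml; // fall back to the input if the regex engine fails
}

$raw = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<Rows>
<data Title='Data Title' Remarks='First line of the row.
Followed by the second line.
Even a third!' />
</Rows>
XML;

$rows    = new SimpleXMLElement(encode_attribute_newlines($raw));
$remarks = (string) $rows->data['Remarks'];
```

If the same pattern happens to match inside element text, the substitution is harmless there, because a &#10; reference in text content also parses back to a newline.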
Here is code to replace the new lines with the appropriate character reference in that particular XML fragment. Run this code prior to parsing.
$replaceFunction = function ($matches) {
return str_replace("\n", "&#10;", $matches[0]);
};
$xml = preg_replace_callback(
"/<data Title='[^']+' Remarks='[^']+'/i",
$replaceFunction, $xml);
This is what worked for me:
First, get the xml as a string:
$xml = file_get_contents($urlXml);
Then do the replacement:
$xml = str_replace(".\xe2\x80\xa9<as:eol/>",".\n\n<as:eol/>",$xml);
The "." and "< as:eol/ >" were there because I needed to add breaks in that case. The new lines "\n" can be replaced with whatever you like.
After replacing, just load the xml-string as a SimpleXMLElement object:
$xmlo = new SimpleXMLElement( $xml );
Et Voilà
This question is old, but someone else might end up on this page, as I did.
I took a slightly different approach, which I think is the most elegant of those mentioned.
Inside the XML, put a unique marker where you want each new line.
Change the XML to:
<data Title='Data Title' Remarks='First line of the row. \n
Followed by the second line. \n
Even a third!' />
And then, once you have the desired SimpleXML node's value as a string in $output, write something like this:
$findme = '\n'; // literal backslash-n marker, not a real newline
if (strpos($output, $findme) !== false)
{
$output = str_replace($findme, "<br/>", $output);
}
It doesn't have to be '\n'; it can be any unique marker.
