Characters intermittently disappearing from PHP-built XML - php

The XML output from this loop was failing to validate, but the validator gave me different errors each time. Every error involved a missing opening < on a closing tag, in a different place each time.
Every time I refresh and re-validate the output there is at least one of these, and it has never yet been in the same member record.
Initially I was adding CDATA tags everywhere, which is why you will see many of them wrapping things where they should not be needed.
The XML is built by this loop:
if ($members) {
    $xml = '<api><response status="ok"><users>';
    foreach ($members as $m) {
        $join_date = date("Y-m-d H:i:s", $m->join_date);
        list($md) = $mdObj->retrieve("member_id = '$m->member_id'");
        $join_date = ($m->join_date > 0) ? date("Y-m-d H:i:s", $m->join_date) : '0000-00-00 00:00:00';
        $address = preg_replace('/\R/', '', $md->m_field_id_3);
        $xml .= "<user id=\"$m->member_id\"><admin>0</admin><name><![CDATA[$m->username]]></name><company>$md->m_field_id_9</company><company_id>$md->m_field_id_28</company_id><address><![CDATA[$address]]></address><city>$md->m_field_id_5</city><region>$md->m_field_id_6</region><postal_code>$md->m_field_id_7</postal_code><email><![CDATA[$m->email]]></email><phone>$md->m_field_id_10</phone><first>$md->m_field_id_1</first><last>$md->m_field_id_1 $md->m_field_id_2</last><url></url><description><![CDATA[]]></description><status>active</status><date>$join_date</date><modified>0000-00-00 00:00:00</modified></user>";
    }
    $xml .= '</users></response></api>';
    return $xml;
}
Has anyone seen this before? Have any advice?
Here's a little PHP info:
PHP Version 5.2.17
Linux foo.foo.com 2.6.18-274.17.1.el5 #1 SMP Wed Jan 4 22:45:44 EST 2012 x86_64
Build Date Feb 8 2012 14:19:50

I suspect the database entries you're including in your XML might contain unescaped characters which have special meaning, e.g. &, <, >, " and ', which need to be encoded.
I would also break up that long string into
$xml .= "<user id=\"" . $m->member_id . "\"><admin>0</admin><name><![CDATA[";
$xml .= $m->username . "]]></name><company>" . $md->m_field_id_9 . "</company>";
$xml .= "<company_id>" . $md->m_field_id_28 . "</company_id><address><![CDATA[";
$xml .= $address . "]]></address><city>" . $md->m_field_id_5 . "</city><region>";
$xml .= $md->m_field_id_6 . "</region><postal_code>" . $md->m_field_id_7;
$xml .= "</postal_code><email><![CDATA[" . $m->email . "]]></email><phone>";
$xml .= $md->m_field_id_10 . "</phone><first>" . $md->m_field_id_1 . "</first>";
$xml .= "<last>" . $md->m_field_id_1 . " " . $md->m_field_id_2 . "</last><url></url>";
$xml .= "<description><![CDATA[]]></description><status>active</status><date>";
$xml .= $join_date . "</date><modified>0000-00-00 00:00:00</modified></user>";
and then use str_replace() to specifically encode the above-mentioned characters.
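As a rough sketch of the encoding step, PHP's built-in htmlspecialchars() can stand in for a hand-rolled str_replace() chain, since it encodes the same five characters. The helper names xmlText() and xmlCdata() and the UTF-8 charset are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: escape a value for use as XML text or attribute content.
function xmlText($value)
{
    // ENT_QUOTES encodes both quote styles; the UTF-8 charset is an assumption.
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: wrap a value in CDATA, splitting any "]]>" it contains
// so the section cannot be closed early by the data itself.
function xmlCdata($value)
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', (string) $value) . ']]>';
}

// Sample value standing in for a database field.
$company = 'Smith & Sons <Ltd>';
echo '<company>' . xmlText($company) . '</company>', PHP_EOL;
```

Fields that are not wrapped in CDATA (company, city, region and so on) would go through xmlText(); the CDATA-wrapped ones only need the "]]>" guard.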

What could be happening is that your data contains invisible control characters, most notably DEL (0x7F). I suppose that would cause this precise behaviour.
To check, loop over each character in the string and print its character code to see whether the string contains any hidden characters.
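A minimal sketch of that check, assuming byte-wise inspection is enough for the data in question. The function name findControlBytes() is made up for this example.

```php
<?php
// Hypothetical helper: map byte offset => byte code for every control byte in $s.
function findControlBytes($s)
{
    $found = array();
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $code = ord($s[$i]);
        // Codes below 0x20 are C0 control characters; 0x7F is DEL.
        if ($code < 0x20 || $code === 0x7F) {
            $found[$i] = $code;
        }
    }
    return $found;
}

// Sample value standing in for a database field.
print_r(findControlBytes("Main St\x7F 12\n"));
```

Running it over each member field should show quickly whether any record carries hidden bytes.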

This appears to be a bug in Chrome's view-source routine on a large XML file. XML obtained from the same source via IE and Firefox was valid across repeated tests.
Additionally, Chrome's normal view did not display these aberrations and did not report errors in the XML.

Related

Cannot output a big string block without break lines (nginx+php)

I am trying to print a big JSON block (100k) to the browser, but the server fails without an error.
For example:
echo 'var config = ' . json_encode( $config ) . ';' . PHP_EOL;
I have found that if I send a small piece, it's OK.
I have also found that if I put line breaks in the JSON string, it's OK even if the string is 400k.
For example:
$config_json = json_encode( $config );
$config_json = str_replace( '},', '},' . PHP_EOL, $config_json );
echo 'var config = ' . $config_json . ';' . PHP_EOL;
But the line breaks break my JSON.
So, if it's a buffer setting, why does PHP_EOL help?
I have also tried to split the JSON into pieces as described here: https://stackoverflow.com/a/19156563/1009525, but without success. Only the line breaks help.
As you write "the server fails without an error", I presume you mean that the server sends a response to the client (status code 200, no error), but the response body (the content) is empty (this is the failure).
You should check this, because if the server actually sends a response with content, then the issue is not with PHP, nginx or buffering.
Otherwise (as suggested in the comments), the JSON may be wrapped between <pre> tags instead of inside a <script>...</script> block, and this could be the problem (but I can't help unless you post more of your code).
From now on I assume the response sent from the server is empty.
The code you posted is valid and should correctly handle the output string you're building (its size is far below PHP's limits).
That said, this looks like a weird buffering issue. I write "weird" because, as far as I know (and I took time to do some research too), buffering should not be influenced by line breaks.
I have found that if I put line breaks in the JSON string, it's OK even if the string is 400k.
A quick workaround to solve your problem is to output a valid JSON with line breaks. You just need to specify an option to json_encode:
echo 'var config = ' . json_encode( $config, JSON_PRETTY_PRINT ) . ';' . PHP_EOL;
JSON_PRETTY_PRINT tells json_encode to format the JSON to be more readable, which adds line breaks.
(Note that this option is available for PHP 5.4.0 and above)
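For what it's worth, here is a small sketch of how the option could be guarded on servers where it may be missing. The sample $config array is an assumption standing in for the real data.

```php
<?php
// Sample data standing in for the real $config (an assumption).
$config = array(
    'name'  => 'demo',
    'items' => array(array('id' => 1), array('id' => 2)),
);

// JSON_PRETTY_PRINT is only defined on PHP 5.4.0+; fall back to compact output otherwise.
$options = defined('JSON_PRETTY_PRINT') ? JSON_PRETTY_PRINT : 0;
$json = json_encode($config, $options);

echo 'var config = ' . $json . ';' . PHP_EOL;
```

Because json_encode inserts the line breaks itself, they only ever land between tokens, so the output still decodes to the same data.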
I hope the above solution works for you.
Anyway, I strongly suggest you investigate the issue further so that the original code works too.
First, ensure you're running a recent, stable version of both nginx and PHP.
Then I would check the nginx configuration file, the php-fpm configuration (if you're using php-fpm) and finally the PHP configuration.
Also check the PHP, nginx and php-fpm error logs.
Try using a PHP heredoc for echoing: http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc
In case you don't have PHP 5.4.0 or later installed on your server, a quick workaround could be something like this. The snippet below works for a test array; the initial test was with an array of 250 KB. Since I can't post the actual test array, here is a test link with a smaller example. The result is similar to that of JSON_PRETTY_PRINT, though.
$out = json_encode($arr,JSON_FORCE_OBJECT);
$out = str_replace( ':{', ':' . PHP_EOL . ' ' . '{', $out );
$out = str_replace( '},', PHP_EOL . ' },', $out );
$out = str_replace( ',', ',' . PHP_EOL . ' ', $out );
$out = str_replace( '},' . PHP_EOL . ' ', '},' . PHP_EOL . ' ', $out );
$out = str_replace( '}}', PHP_EOL . ' }' . PHP_EOL . '}', $out );
echo $out;

Why are special characters being added at the end of my text converted from PHP?

So I get these values from a form and they are then saved into a word document.
If my input (this is a textarea by the way) reads this:
"This"
&
"That"
I would expect the output to be exactly like that
However, whenever it comes out it looks like this:
It adds those special block characters at the end...
How can I get rid of these?
These are my variables:
$multipleImports = explode("\n",$_POST['multipleImports']);
$multipleImportsInfo = explode("\n",$_POST['multipleImportsInfo']);
$multipleImportsCounts = explode("\n",$_POST['multipleImportsCounts']);
And here I concatenate them into a string.
$length = count($multipleImports);
for ($i = 0; $i < $length; $i++) {
$content = $content . $multipleImports[$i] . " " . $multipleImportsInfo[$i] . " " . $multipleImportsCounts[$i] . "\n ";
}
I tried rtrim(), htmlentities() and html_entity_decode(), and nothing worked. Please help.
Reading @Tom Hedden's post gave me the idea to try this, and it worked!
$length = count($multipleImports);
for ($i = 0; $i < $length; $i++) {
$content = $content . $multipleImports[$i] . " " . $multipleImportsInfo[$i] . " " . $multipleImportsCounts[$i] . "\r\n ";
}
I would be curious to know what those special characters are. You should do a hex dump to see. I just glanced at your code and haven't thought seriously about it, but what immediately pops into my mind is the difference in end-of-line between Windows and *nix. That is, if the data comes from Windows, I think the end of line is a carriage return AND a line feed ("\r\n") rather than just a line feed ("\n").
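To make the hex-dump idea concrete, here is a small sketch. The $posted string is a stand-in for the textarea value and assumes the browser submitted "\r\n" line breaks.

```php
<?php
// Simulated textarea value (an assumption); form submissions use "\r\n" line breaks.
$posted = "\"This\"\r\n&\r\n\"That\"";

// Splitting on "\n" alone leaves a stray "\r" at the end of each piece.
$naive = explode("\n", $posted);
echo bin2hex($naive[0]), PHP_EOL; // hex dump of the first piece

// Splitting on any line-break style avoids the leftover byte.
$lines = preg_split('/\r\n|\r|\n/', $posted);
echo bin2hex($lines[0]), PHP_EOL;
```

A trailing 0d in the first dump is the carriage return, which would explain block characters at the end of each line.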

PHP Convert Full Date To Short Date

I need a PHP script to loop through all .html files in a directory and, in each one, find the first instance of a long date (e.g. August 25th, 2014) and then add a tag with that date in short format (e.g. <p class="date">08/25/14</p>).
Has anyone done something like this before? I'm guessing you'd explode the string and use a complex case statement to convert the month names and days to regular numbers and then implode using /.
But I'm having trouble figuring out the regular expression to use for finding the first long date.
Any help or advice would be greatly appreciated!
Here's how I'd do it in semi-pseudo-code...
Loop through all the files using whatever floats your boat (glob() is an obvious choice)
Load the HTML file into a DOMDocument, eg
$doc = new DOMDocument();
$doc->loadHTMLFile($filePath);
Get the body text as a string
$body = $doc->getElementsByTagName('body');
$bodyText = $body->item(0)->textContent; // assuming there's at least one body tag
Find your date string via this regex
preg_match('/(January|February|March|April|May|June|July|August|September|October|November|December) \d{1,2}(st|nd|rd|th)?, \d{4}/', $bodyText, $matches);
Load this into a DateTime object and produce a short date string
$dt = DateTime::createFromFormat('F jS, Y', $matches[0]);
$shortDate = $dt->format('m/d/y');
Create a <p> DOMElement with the $shortDate text content, insert it into the DOMDocument where you want and write back to the file using $doc->saveHTMLFile($filePath)
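The find-and-convert steps can be sketched as a single function. The name firstShortDate() is made up for this example, and it rebuilds the date without the ordinal suffix before parsing, which is a small departure from the 'F jS, Y' format used above.

```php
<?php
// Hypothetical helper: find the first long date in $text and return it as mm/dd/yy,
// or null when no date is found or it cannot be parsed.
function firstShortDate($text)
{
    $months = 'January|February|March|April|May|June|July|August|September|October|November|December';
    if (!preg_match('/(' . $months . ') (\d{1,2})(?:st|nd|rd|th)?, (\d{4})/', $text, $m)) {
        return null;
    }
    // Rebuild the date without the ordinal suffix so one format string covers both styles.
    // The leading "!" resets all fields that the format does not set.
    $dt = DateTime::createFromFormat('!F j, Y', $m[1] . ' ' . $m[2] . ', ' . $m[3]);
    return $dt ? $dt->format('m/d/y') : null;
}

echo firstShortDate('Posted on August 25th, 2014 by admin'), PHP_EOL;
```

The null return makes it easy to skip files that contain no recognisable date.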
I incorporated the helpful response above into what I already had and it seems to work. I'm sure it's far from ideal but it still serves my purpose. Maybe it might be helpful to others:
<?php
$dir = "archive";
$a = scandir($dir);
$a = array_diff($a, array(".", ".."));
foreach ($a as $value) {
    echo '<br>File name is: ' . $value . "<br><br>";
    $filename = $dir . "/" . $value;
    $contents = file_get_contents($filename);
    // Match the first long date, with or without an ordinal suffix
    if (preg_match('/(January|February|March|April|May|June|July|August|September|October|November|December) \d{1,2}(st|nd|rd|th)?, \d{4}/', $contents, $matches)) {
        echo 'the date found is: ' . $matches[0] . "<br><br>";
        $dt = DateTime::createFromFormat('F jS, Y', $matches[0]);
        $shortDate = $dt->format('m/d/y');
        $dateTag = "\n" . '<p class="date">' . $shortDate . '</p>';
        // Append the short-date tag to the end of the file
        $file = fopen($filename, "a+");
        fwrite($file, $dateTag);
        fclose($file);
        echo 'Date tag added<br><br>';
    } else {
        echo "ERROR: No date found<br><br>";
    }
}
?>
The code assumes the files to modify are in a directory called "archive" that resides in the same directory as the script.
Needed the two different preg_match lines because I found out some dates are listed with the ordinal suffix (i.e. August 24th, 2005) and some are not (i.e. August 24, 2005). Couldn't quite puzzle out exactly how to get a single preg_match that handles both.
EDIT: replaced double preg_match with single one using \d{1,2}(st|nd|rd|th)? as suggested.

csv_output containing field with . (full stop)

I have a PHP script that generates a tab-delimited .txt file. It works great, but when I output a field that contains a . (full stop/period) within the $model field, e.g. itemname.1, it affects the formatting of the file.
$csv_output .= $model . "\t";
$csv_output .= '' . "\t";
$csv_output .= '' . "\t";
$csv_output .= '' . "\t";
$csv_output .= $totalstock . "\t";
$csv_output .= $leadtime . "\t";
$csv_output .= "\n";
$csv_handler = fopen('../outputfile.txt','w');
fwrite ($csv_handler,$csv_output);
fclose ($csv_handler);
I have tried enclosing the value in double quotes and various other variations, but the output still has newlines inserted.
Example output:
itemname.1
20 1
Any ideas how I can output the fields with the . in them without it affecting the tabs/lines?
Could you show us your code that encloses the variable in quotes please?
A not-very-well-informed suggestion: use str_replace() to test whether the . is indeed causing the new line, or whether it's "something else". You might even be able to just strip the newlines directly.
It was a schoolboy error.
The data imported into the db had carriage returns on each line entry. Although the data appeared correct in the db and in Excel, the problem only manifested in the output file.
Thanks for your help, guys; it made me go back to the source and identify the issue.
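For anyone who cannot clean the source data, one possible safeguard is to strip the offending characters while building each row. This is only a sketch; the helper name tsvRow() is hypothetical, and it assumes no field legitimately contains tabs or line breaks.

```php
<?php
// Hypothetical helper: build one tab-delimited row, removing characters
// that would break the column/line layout.
function tsvRow(array $fields)
{
    $clean = array();
    foreach ($fields as $field) {
        // Drop CR, LF and tabs that came in with the imported data.
        $clean[] = str_replace(array("\r", "\n", "\t"), '', (string) $field);
    }
    return implode("\t", $clean) . "\n";
}

// Sample row: the model value carries a trailing carriage return.
echo tsvRow(array("itemname.1\r", '', '', '', 20, 1));
```

Note that, unlike the original snippet, this does not leave a trailing tab before the newline.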

How do I convert this one element of the array to utf-8?

Using Zend_Gdata. For some reason, recently the $when string is no longer UTF-8. I need to convert it to UTF-8. All the other fields are working fine.
foreach ($feed as $event) { // iterating through all events
    $contentText = stripslashes($event->content->text); // stripping any escape characters
    $contentText = preg_replace('/\<br \/\>[\n\t\s]{1,}\<br \/\>/','<br />',stripslashes($event->content->text)); // replacing multiple breaks with a single break
    $contentText = explode('<br />',$contentText); // splitting data by break tag
    $eventData = filterEventDetails($contentText);
    $when = $eventData['when'];
    $where = $eventData['where'];
    $duration = $eventData['duration'];
    $title = stripslashes($event->title);
    echo '<li class="pastShows">' . $when . " - " . $title . ", " . $where . '</li>';
}
How do I make $when utf-8?
Thanks!
Depending on what encoding that string is using, you should be able to encode it to UTF-8 using one of the following functions :
utf8_encode()
iconv()
For example :
$when = utf8_encode($eventData['when']);
Or :
$when = iconv('ISO-8859-1', 'UTF-8', $eventData['when']);
If the string is in Latin1, you can just do what Pascal suggests.
Otherwise you need to find out which encoding it is.
To do so, check your php.ini settings, or try to detect it with mb_detect_encoding() (be aware that it's not foolproof).
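Since detection is unreliable, one cautious approach is to convert only when the string is not already valid UTF-8. The sketch below assumes the mbstring extension is loaded and that any non-UTF-8 input is ISO-8859-1; the helper name toUtf8() is made up for this example.

```php
<?php
// Hypothetical helper: return $s as UTF-8, converting only when it is not already valid UTF-8.
// Assumes mbstring is available and that non-UTF-8 input is in $assumedSource.
function toUtf8($s, $assumedSource = 'ISO-8859-1')
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8; converting again would double-encode it
    }
    return mb_convert_encoding($s, 'UTF-8', $assumedSource);
}

// Latin-1 input: "caf" followed by byte 0xE9.
echo bin2hex(toUtf8("caf\xE9")), PHP_EOL;
```

In the loop above this would be used as $when = toUtf8($eventData['when']); and it is safe to apply to the other fields too, since strings that are already UTF-8 pass through unchanged.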
