FastCGI 500 error on preg_match_all in PHP

I'm trying to set up some exotic PHP code (I'm not an expert), and I get a FastCGI 500 error on a PHP line containing 'preg_match_all'.
When I comment out the line, the page is returned with a 200 (though not the way it was meant to look).
The code is parsing PHP, HTML and JavaScript content loaded from the database and is composing them to return the finished page.
By placing some error_log calls around it, I determined that the line with the preg_match_all is the cause of the 500. However, the line is hit multiple times while the page loads, and on other occasions it does not cause an error.
Here's exactly what it looks like:
preg_match_all ("/(<([\w]+)[^>]*>)((?:.|\n)*)(<\/\\2>)/",
$part['data'], $tags, PREG_PATTERN_ORDER|PREG_OFFSET_CAPTURE);
The subject string is a piece of text that looks like:
<script> ... some javascript functions ... </script>
Edit: This code is up and running correctly elsewhere, so this could very well be a PHP setting or environment difference. I'm using PHP 5.2.13 on IIS6 with FastCGI.
Edit: Nothing is mentioned in the log files, at least not in the ones I checked:
IIS Logs
Event Logs
PHP Log
Edit: jab11 has pointed out the problem, but there's no solution yet:
Any thoughts or direction would be welcome.

Any chance that $part['data'] might be extremely big?
I used to get a 500 error from preg_match_all when I used it on strings bigger than 100 KB.
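If size is the suspect, it helps to know whether PCRE is reporting an error at all. preg_match_all() returns false on failure, and preg_last_error() (available since PHP 5.2.0) says why. A minimal sketch; checked_match_all() and pcre_error_name() are hypothetical helper names, not PHP built-ins:

```php
<?php
// Hypothetical helpers (not PHP built-ins). Written with array() and no
// closures so the same code also parses on the PHP 5.2 from the question.

function pcre_error_name($code)
{
    // Map a preg_last_error() code to the name of its constant.
    $names = array(
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
    );
    return isset($names[$code]) ? $names[$code] : 'unknown PCRE error ' . $code;
}

function checked_match_all($pattern, $subject, &$matches, $flags)
{
    $count = preg_match_all($pattern, $subject, $matches, $flags);
    if ($count === false) {
        // Log the subject size as well, since size is the prime suspect.
        error_log(sprintf(
            'preg_match_all failed on a %d-byte subject: %s',
            strlen($subject),
            pcre_error_name(preg_last_error())
        ));
    }
    return $count;
}
```

This only helps when PCRE returns an error. If the match recurses deeply enough to overflow the process stack, the FastCGI process dies before preg_last_error() can be consulted. The PHP manual warns that a high pcre.recursion_limit can crash PHP this way, so lowering it with ini_set() is the usual way to turn the crash into PREG_RECURSION_LIMIT_ERROR, at least on builds without the PCRE JIT; the right value depends on the server's stack size.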

This is a wonderful example of why it's a bad idea to process HTML with regular expressions. I'm willing to bet you're running into a stack overflow because the HTML source string contains some unclosed tags, making the regex try all sorts of permutations in its futile attempt to find a closing tag (</\2>). In an HTML file of 32 KB, it's easy to throw your regex off the trolley. Perhaps the stack size differs between servers, so it works on one but not the other.
A quick test:
I applied the regex to the source code of this page (after removing the closing </html> tag). RegexBuddy promptly went catatonic for about a minute before matching the <head> and <body> tags (successfully). Debugging the regex from <html> onward showed that it took the regex engine 970,257 steps to find out that it couldn't match.
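Independently of the HTML-parsing argument, the specific construct (?:.|\n)* is a known stack hog: the classic, non-JIT PCRE engine (the one in PHP 5.2) recurses roughly once per iteration of a repeated group, so the depth grows with the length of the text between the tags. Writing "any character, newline included" as a single repeated character, .* with the s (PCRE_DOTALL) modifier, keeps the same capture groups without that per-character recursion. A sketch comparing the two on a small subject; it reduces stack usage only, and an unclosed tag still makes the engine backtrack a lot:

```php
<?php
// The question's pattern: (?:.|\n)* is a repeated *group*.
$original = "/(<([\w]+)[^>]*>)((?:.|\n)*)(<\/\\2>)/";

// Same four groups, but "any character, newline included" is a single
// repeated character: .* with the s (dotall) modifier.
$rewritten = '/(<(\w+)[^>]*>)(.*)(<\/\2>)/s';

$subject = "<script>\nfunction hello() { return 1; }\n</script>";

$flags = PREG_PATTERN_ORDER | PREG_OFFSET_CAPTURE;
preg_match_all($original, $subject, $a, $flags);
preg_match_all($rewritten, $subject, $b, $flags);

// Same matches and same offsets on this input.
echo ($a === $b) ? "same\n" : "different\n"; // prints "same"
```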

Related

WordPress – HTML document does not start at line one

I've been struggling with this problem for a long time now, but I cannot find the solution. The problem is that <!DOCTYPE html etc. does not start on the first line; instead, four blank lines come before it.
All my files (header.php, index.php, etc.) have no line breaks before they start.
Anyone with similar problems or experiences out there? It would be a huge help!
See here for reference: view-source:http://2famous.tv/
Thank you
This is most often caused not by leading but by trailing whitespace. Lots of old PHP code still ends files with a closing ?> tag, which all too often is followed by stray whitespace:
<?php
// Lot of source code
?> <----- and stray spaces or newlines here are the culprit!
To avoid this issue, never end a file with ?>. PHP doesn't need it and simply stops parsing at EOF, which avoids this 'garbage' in the output. (PHP does swallow a single newline directly after ?>, so the output usually comes from trailing spaces or a second newline.)
As for finding the files causing it: good luck. I'd start by combing through any custom extensions and simply removing every ?> that ends a file.
As an alternative, you can probably 'fix' it by adding a single ob_start() call to your index.php and then calling ob_end_clean() in the template that contains the doctype, before the doctype is output. This captures all intermediate output in the output buffer and then discards it.
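For the "finding the files" step, PHP's own tokenizer can do the combing: text outside the PHP tags comes back from token_get_all() as T_INLINE_HTML tokens, which is exactly what PHP sends to the browser, and the closing-tag token already includes the one newline PHP swallows. A sketch, assuming the tokenizer extension is available (it is enabled by default); inline_output_chunks() and scan_for_stray_output() are hypothetical names, and the scan reports only chunks made of whitespace and/or a UTF-8 BOM, because templates legitimately contain inline HTML:

```php
<?php
// Return every chunk of a PHP source string that PHP would echo as-is,
// i.e. text outside the open/close tags (T_INLINE_HTML tokens).
function inline_output_chunks($source)
{
    $chunks = array();
    foreach (token_get_all($source) as $token) {
        if (is_array($token) && $token[0] === T_INLINE_HTML) {
            // $token[2] is the line on which the chunk starts.
            $chunks[] = array('line' => $token[2], 'text' => $token[1]);
        }
    }
    return $chunks;
}

// Walk a directory tree and report .php files that emit chunks made only
// of whitespace and/or a UTF-8 byte order mark (path => list of lines).
function scan_for_stray_output($dir)
{
    $report = array();
    $files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));
    foreach ($files as $file) {
        if (substr($file->getFilename(), -4) !== '.php') {
            continue;
        }
        $path = $file->getPathname();
        foreach (inline_output_chunks(file_get_contents($path)) as $chunk) {
            $text = str_replace("\xEF\xBB\xBF", '', $chunk['text']);
            if (trim($text) === '') {
                $report[$path][] = $chunk['line'];
            }
        }
    }
    return $report;
}
```

Run it against the theme and plugin directories; every reported file and line is a place where blank output can leak in before the doctype.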

Finding wrong characters in JavaScript

I am writing a small PHP script that I call to get all the JavaScript for my page, and this leads to strange errors on the client side. The script uses a configuration XML file and some XSL stylesheets to generate a large JavaScript string. Sometimes I get an 'unterminated String literal' error; other times an error is raised that says: 'An attempt was made to use an object that is not, or is no longer, usable', just after the JavaScript executes a document.write operation.
Are there any resources or tutorials, just something that describes the traps that lead to such problems when copying a bunch of JavaScript files into one string or file?
Greetings, Philipp
EDIT:
The following error:
is thrown in a webpage that is delivered with content-type 'application/xhtml+xml'. The actual generated JavaScript looks like this:
source code generated
The script itself runs until the first document.write command is triggered.
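It sounds like the strings are not escaped completely or correctly. If you take care of
escaping all backslashes to \\ (first, so the other escapes are not mangled)
escaping all apostrophes to \'
escaping all quotation marks to \"
escaping all line feeds to \n (and carriage returns to \r)
you shouldn't have any errors.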
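In PHP there is usually no need to hand-write that escaping: json_encode() (available since PHP 5.2) turns a PHP string into a double-quoted JavaScript string literal with quotes, backslashes, line breaks and other control characters escaped, and by default it also writes / as \/, so a "</script>" inside the text cannot close the surrounding script block. A sketch; js_string_literal() and js_string_literal_manual() are hypothetical names, and the manual version only shows what the escaping list looks like with backslashes handled:

```php
<?php
// Hypothetical helpers for embedding arbitrary text in generated JavaScript.

function js_string_literal($text)
{
    // json_encode() emits a double-quoted literal with ", \, newlines and
    // other control characters escaped, and "/" written as "\/".
    // The input must be valid UTF-8, otherwise json_encode() fails.
    return json_encode($text);
}

function js_string_literal_manual($text)
{
    // strtr() with an array replaces every key in one pass and never
    // rescans its own output, so the backslashes it inserts are not
    // escaped a second time.
    return "'" . strtr($text, array(
        '\\' => '\\\\',
        "'"  => "\\'",
        '"'  => '\\"',
        "\n" => '\\n',
        "\r" => '\\r',
    )) . "'";
}

echo 'var msg = ' . js_string_literal("it's \"quoted\"\n</script>") . ";\n";
```

The last line prints var msg = "it's \"quoted\"\n<\/script>"; which a browser parses as one string.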

What would cause html code formatting to disappear in php page?

I have a page written in PHP where, for some reason, all of the plain HTML content of index.php ends up on one line (look at the source). The whitespace is preserved, but all the newlines disappear.
I cannot come up with any reason why this would happen, short of a syntax error, but I went through it with a fine-toothed comb and found nothing out of place. This only happens on the index.php page.
Does anyone have any ideas what I should be looking for? I can post more code if necessary.
<?php
//...
include('ssi/header.php');
?>
<div>
<section id="charters">
<h2>Tanker Chartering</h2>
<!-- ... -->
The above code evaluates to something like this:
<div> <section id="charters"> <h2><a href="charters.php">Tanker ...
Maybe you have a Linux server and you're using a Windows system. Different operating systems use different newline characters. Also, on one server my FTP client uploaded the file with the wrong formatting and dropped every line break.
Also, applications like Notepad++ let you change the formatting and line breaks.
It's probably the encoding of the file combined with the FTP transfer mode you used to download/upload the file. Try an editor like Notepad2 and save the file as UTF-8 rather than ANSI. Also, upload/download with your FTP program in binary mode, not ASCII. That stopped all of my newline issues with PHP.
Could it be that your hosting provider is doing some kind of minification for you? No newlines means fewer characters pushed down the wire.
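To check whether line endings are really the culprit, look at the bytes on the server rather than what the editor shows: a file mangled in transfer shows up as an unexpected mix of conventions, or as no line endings at all. A small sketch; line_ending_counts() and normalize_line_endings() are hypothetical helpers:

```php
<?php
// Count CRLF (Windows), bare LF (Unix) and bare CR (classic Mac) endings.
function line_ending_counts($content)
{
    $crlf = substr_count($content, "\r\n");
    return array(
        'crlf' => $crlf,
        'lf'   => substr_count($content, "\n") - $crlf,
        'cr'   => substr_count($content, "\r") - $crlf,
    );
}

// Convert every line ending to LF. str_replace() applies the array entries
// in order, so CRLF pairs are collapsed before lone CRs are handled.
function normalize_line_endings($content)
{
    return str_replace(array("\r\n", "\r"), "\n", $content);
}
```

For example, print_r(line_ending_counts(file_get_contents('index.php'))) on the server copy, compared with the local copy, shows whether the upload changed anything.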

If just the index.php loads as its generic unparsed self, what exactly is [not] happening?

I visited a client's site today, and I'm getting the actual content of their index.php file rather than their website. The comment describing the index.php file says:
This file loads and executes the parser.
Assuming this is not happening, what would be some common reasons for that?
If Apache and PHP are configured correctly, so that .php files go through the PHP interpreter, the thing I would check is whether the PHP files use short open tags "<?" instead of standard "<?php" open tags. The recommended php.ini files for newer PHP versions turn short tags off, and using them is discouraged. If this is the case, look for the "short_open_tag" line in php.ini and set it to "On" or, preferably and if time allows, change the tags in the code. Although the second option is better in the long run, it can be time-consuming and error-prone if done manually.
I have done this in the past with a site-wide find/replace operation, and the general approach is:
Find all "<?=" and replace with "~|~|~|~|" or some other unusual string that is extremely unlikely to turn up in real code.
Find all "<?php" and replace with "$#$#$#"
Find all "<?" in the site and replace with "$#$#$#"
Find all "$#$#$#" and replace with "<?php " The trailing space is advised
Find all "~|~|~|~|" and replace with "<?php echo " The trailing space is necessary
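The same conversion can be scripted. This sketch replaces the placeholder strings with two replacements: "<?=" is handled first, and a regular expression with a negative lookahead then rewrites every other "<?" except "<?php" (any letter case) and "<?xml". expand_short_tags() is a hypothetical name, and like the manual find/replace it does not understand PHP syntax, so a "<?" inside a string literal would be rewritten too; review the diff before deploying:

```php
<?php
// Rewrite short open tags to full tags in one PHP source string.
function expand_short_tags($code)
{
    // 1. "<?=" becomes "<?php echo " (trailing space necessary).
    $code = str_replace('<?=', '<?php echo ', $code);
    // 2. Any other "<?" not followed by "php" or "xml" becomes "<?php "
    //    (trailing space advised). The i flag makes the lookahead
    //    case-insensitive, so "<?PHP" is left alone as well.
    return preg_replace('/<\?(?!php|xml)/i', '<?php ', $code);
}

echo expand_short_tags('<p><?= $title ?></p><? if ($x) { ?>'), "\n";
```

The echo line turns the sample into <p><?php echo  $title ?></p><?php  if ($x) { ?>; the doubled spaces are harmless.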

Debugging PHP Output

I have a PHP website that, on certain pages, adds a dot or space before the first HTML tag. I can't figure out where it is coming from. Is there a way to debug the code so I can see where it is coming from?
Thanks,
Josh
To help prevent this, it is considered good practice not to end your PHP files with ?>.
You probably have some files like this (notice the extra space after the ?>):
<?php
// Some code //
?>
If you remove the ?> at the end, the extra space at the end of the file won't be treated as output.
For files that contain only PHP code, the closing tag ("?>") is never permitted. It is not required by PHP, and omitting it prevents the accidental injection of trailing white space into the response.
Source: http://framework.zend.com/manual/en/coding-standard.php-file-formatting.html
Maybe it is a UTF-8 BOM (byte order mark) at the start of one of the files?
If you are using templates, check them too; the problem could be there rather than in your main code.
And yes, it is a GOOD PRACTICE in PHP to leave out the closing tag.
There really is no good way to go about debugging this. You need to go through every file the page is hitting and figure out where the output is coming from. If you really want to be lazy about it, you could use output buffering, but that isn't the right way to do things.
Problems like this can be difficult to track down. If you're in some kind of framework or system that includes a lot of files, you might try a var_dump(get_included_files()) on the line before your error occurs, and that will give you a place to start. If that isn't sufficient, Xdebug might get you further. Things to look out for are whitespace before and after the PHP tags, and functions that might send output.
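A UTF-8 BOM is the three bytes EF BB BF at the very start of a file. Placed before <?php, they are output like any other text and, depending on the encoding the browser assumes, can appear as a stray character or blank before the first tag. A sketch for spotting and stripping it; has_utf8_bom() and strip_utf8_bom() are hypothetical helpers, and re-saving the file as "UTF-8 without BOM" in the editor does the same job:

```php
<?php
// The UTF-8 encoding of U+FEFF, the byte order mark.
define('UTF8_BOM', "\xEF\xBB\xBF");

function has_utf8_bom($content)
{
    return substr($content, 0, 3) === UTF8_BOM;
}

function strip_utf8_bom($content)
{
    return has_utf8_bom($content) ? substr($content, 3) : $content;
}

// Example: fix a file in place (back it up first).
// $path = 'header.php';  // hypothetical file name
// file_put_contents($path, strip_utf8_bom(file_get_contents($path)));
```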
