Simple regex seems to cause infinite loop in PHP

Simple regex seems to cause infinite loop in PHP - php

The following 2 lines are my code:
$rank_content = file_get_contents('https://www.championsofregnum.com/index.php?l=1&ref=gmg&sec=42&world=2');
$tmp_ = preg_replace("/.+width=.16.> /Uis", "", $rank_content, 1);
The second line above causes an infinite loop.
In contrary, the following alternatives DO work:
$tmp_ = preg_replace("/.+width=.16.> /Ui", "", $rank_content, 1);
$tmp_ = preg_replace("/[^§]+width=.16.> /Uis", "", $rank_content, 1);
But sadly, they do not give me what I want - both alternatives do not include line breaks within $rank_content.
Also, if I replaced the file_get_contents function with something like
$rank_content = "asdfas\nasdfasdfaswidth=m16m> teststring";
There are no problems either, although \n represents a line break, too, doesn’t it?!
So do I understand it right that RegEx has problems in noticing a String with line breaks in it?
How can I filter a substring of $rank_content (which has multiple lines in it) by removing some lines until something like "width="16" " appears? (Can be seen in the site's source code)

Replace the m modifier with the s modifier. m changes the behaviour of ^ and $, whereas s changes the behaviour of .
That said, you should not be parsing HTML with regex. Seriously. Bad things happen.

I give up on it: It seems the problem is the LENGTH of the haystack variable $rank_content. Its length is about 90,000, while the maximum allowed length for regex match() is about 30,000, so I guess it is the same for regex replace().
Solving this problem would surely be possible, if somebody is interested: Have a look into this link -> PHP preg_match_all limit
I myself am going to solve the problem using another method for reading the contents of a website like HTML Unit or maybe retrieving the site line after line.

Related

Replace // comments by /* comments */ Except in URLs [duplicate]

I need to remove the comment lines from my code.
preg_replace('!//(.*)!', '', $test);
It works fine. But it removes the website url also and left the url like http:
So to avoid this I put the same like preg_replace('![^:]//(.*)!', '', $test);
It's work fine. But the problem is if my code has the line like below
$code = 'something';// comment here
It will replace the comment line with the semicolon. that is after replace my above code would be
$code = 'something'
So it generates error.
I just need to delete the single line comments and the url should remain same.
Please help. Thanks in advance

try this
preg_replace('#(?<!http:)//.*#','',$test);
also read more about PCRE assertions http://cz.php.net/manual/en/regexp.reference.assertions.php

If you want to parse a PHP file, and manipulate the PHP code it contains, the best solution (even if a bit difficult) is to use the Tokenizer : it exists to allow manipulation of PHP code.
Working with regular expressions for such a thing is a bad idea...
For instance, you thought about http:// ; but what about strings that contain // ?
Like this one, for example :
$str = "this is // a test";

This can get complicated fast. There are more uses for // in strings. If you are parsing PHP code, I highly suggest you take a look at the PHP tokenizer. It's specifically designed to parse PHP code.
Question: Why are you trying to strip comments in the first place?
Edit: I see now you are trying to parse JavaScript, not PHP. So, why not use a javascript minifier instead? It will strip comments, whitespace and do a lot more to make your file as small as possible.

Is there a typo in this str_replace code? / Am I reading it correctly?

Here is the line of code from a PHP file, specifically it is from zstore.php which is a file include as part of the "Zazzle Store Builder" toolset from Zazzle.com
The set of files allows someone like me, who has products for sale on Zazzle and massage that data into a nicer "storefront" which I can set up my way instead of being confined by the CMS structure of Zazzle.com where they understandably want to keep the monkeys (uhmmm... users like myself) from causing too much mayhem.
So... here is the code:
$keywords = str_replace(" ",",",str_replace(",","",$keywords));
Two questions:
Am I understanding what it does and
Is there an extra single or double quote in the string that does not need to be there?
Here is what I think the line of code is saying:
Take the string of characters that the user inputs (dance diva) and assign it to the variable called
$keywords
then run the following function on that character string
= str_replace
(" ","," <<< look for spaces. If you find a space, replace it with a comma
,str_replace(",","" <<< this is the bit I don't understand or which may have a typo
I THINK that it is saying " if you find commas, leave them alone, but I'm not certain.
,$keywords)); <<< then put the edited string of characters backing to the variable called $keywords.
What lead me to look at this was that I was inputting the following:
dance,diva which is what I THOUGHT the script was wanting from me based on the commented text in the README.txt file:
// Search terms. Comma separated keywords you can use to select products for your store
So..
Am I understanding what this line of code is supposed to do?
which, assuming I am correct, and I'm pretty sure that the first half is supposed to work as I've described, now brings me to my second question:
Why isn't the second bit working? Is there a typo?
To review:
dance diva produces results
dance,diva does not
Both, SHOULD work.
Thanks in advance for your help. I have a lot of HTML experience and computer experience but PHP is new to me.

$keywords = str_replace(" ",",",str_replace(",","",$keywords));
You can split into
$temp = str_replace(",","",$keywords);
$keywords = str_replace(" ",",",$temp);
First it replaces all comas with empty string, it is removes all comas. Then replaces all spaces with comas.
For "dance diva" there are no comas so first does nothing, then it replaces space and result is "dance,diva"
For "dance,diva" it removes coma, you get "dancediva" and there in no space to replace next so it is Your result.

PHP Regex URL parsing issues preg_replace

I have a custom markup parsing function that has been working very well for many years. I recently discovered a bug that I hadn't noticed before and I haven't been able to fix it. If anyone can help me with this that'd be awesome. So I have a custom built forum and text based MMORPG and every input is sanitized and parsed for bbcode like markup. It'll also parse out URL's and make them into legit links that go to an exit page with a disclaimer that you're leaving the site... So the issue that I'm having is that when I user posts multiple URL's in a text box (let's say \n delimited) it'll only convert every other URL into a link. Here's the parser for URL's:
$markup = preg_replace("/(^|[^=\"\/])\b((\w+:\/\/|www\.)[^\s<]+)" . "((\W+|\b)([\s<]|$))/ei", '"$1".shortURL("$2")."$4"', $markup);
As you can see it calls a PHP function, but that's not the issue here. Then entire text block is passed into this preg_replace at the same time rather than line by line or any other means.
If there's a simpler way of writing this preg_replace, please let me know
If you can figure out why this is only parsing every other URL, that's my ultimate goal here
Example INPUT:
http://skylnk.co/tRRTnb
http://skylnk.co/hkIJBT
http://skylnk.co/vUMGQo
http://skylnk.co/USOLfW
http://skylnk.co/BPlaJl
http://skylnk.co/tqcPbL
http://skylnk.co/jJTjRs
http://skylnk.co/itmhJs
http://skylnk.co/llUBAR
http://skylnk.co/XDJZxD
Example OUTPUT:
http://skylnk.co/tRRTnb
<br>http://skylnk.co/hkIJBT
<br>http://skylnk.co/vUMGQo
<br>http://skylnk.co/USOLfW
<br>http://skylnk.co/BPlaJl
<br>http://skylnk.co/tqcPbL
<br>http://skylnk.co/jJTjRs
<br>http://skylnk.co/itmhJs
<br>http://skylnk.co/llUBAR
<br>http://skylnk.co/XDJZxD
<br>

e flag in preg_replace is deprecated. You can use preg_replace_callback to access the same functionality.
i flag is useless here, since \w already matches both upper case and lower case, and there is no backreference in your pattern.
I set m flag, which makes the ^ and $ matches the beginning and the end of a line, rather than the beginning and the end of the entire string. This should fix your weird problem of matching every other line.
I also make some of the groups non-capturing (?:pattern) - since the bigger capturing groups have captured the text already.
The code below is not tested. I only tested the regex on regex tester.
preg_replace_callback(
"/(^|[^=\"\/])\b((?:\w+:\/\/|www\.)[^\s<]+)((?:\W+|\b)(?:[\s<]|$))/m",
function ($m) {
return "$m[1]".shortURL($m[2])."$m[3]";
},
$markup
);

What's throwing off my str_word_count?

I'm using PHP's function to count the number of words from a textarea via POST...
The issue is that if I do a post back to my file and output the word count it is different than if I copy and paste the same text into my PHP script to evaluate the word count.
What is throwing off the number? There is difference of 6 words, incidentally there are 6 double line breaks in the textarea as well.
How do I minimize this difference?

You could remove the line breaks and tags altogether:
str_word_count(str_replace('<br>', '', nl2br(strip_tags($data))));
Or I guess this is better:
str_word_count(strip_tags(nl2br($data)));

If your line breaks are in HTML-form, you could use something like strip_tags()
If they aren't, I suspect an issue with encoding. Maybe an combination of stripslashes, utf8_encode or utf8_decode could solve this wrong counted words.
As an last resort you could use some regular expression to filter anything but [a-zA-Z] and spaces.

how to remove the comment line starts with // and not the url like http:// using preg_replace

I need to remove the comment lines from my code.
preg_replace('!//(.*)!', '', $test);
It works fine. But it removes the website url also and left the url like http:
So to avoid this I put the same like preg_replace('![^:]//(.*)!', '', $test);
It's work fine. But the problem is if my code has the line like below
$code = 'something';// comment here
It will replace the comment line with the semicolon. that is after replace my above code would be
$code = 'something'
So it generates error.
I just need to delete the single line comments and the url should remain same.
Please help. Thanks in advance

try this
preg_replace('#(?<!http:)//.*#','',$test);
also read more about PCRE assertions http://cz.php.net/manual/en/regexp.reference.assertions.php

If you want to parse a PHP file, and manipulate the PHP code it contains, the best solution (even if a bit difficult) is to use the Tokenizer : it exists to allow manipulation of PHP code.
Working with regular expressions for such a thing is a bad idea...
For instance, you thought about http:// ; but what about strings that contain // ?
Like this one, for example :
$str = "this is // a test";

This can get complicated fast. There are more uses for // in strings. If you are parsing PHP code, I highly suggest you take a look at the PHP tokenizer. It's specifically designed to parse PHP code.
Question: Why are you trying to strip comments in the first place?
Edit: I see now you are trying to parse JavaScript, not PHP. So, why not use a javascript minifier instead? It will strip comments, whitespace and do a lot more to make your file as small as possible.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.