strange regex behaviour with [^\pL\s] - php

I have a snippet of PHP that strips every leading character that is not a Unicode letter (or whitespace). It works fine, with one exception, and I can't figure out why. Can anyone help?
<?php
$B=$A;
do{
$A=$B;
$B=preg_replace('/^[^\pL\s]/','',$B);
}
while($B!=$A);
echo $B;
?>
If I feed it with a string like "\\*^&\\\##\816.80831téstmé" it nicely spits out "téstmé".
$A="*^&\\\##\816.80831[+" gives an empty string, also correct.
But, when I enter "\\*^&\\\##\816.80831", I end up with "831", when in fact it should be an empty string.
"^&\\\##\8016.8048.31" gives "48.31"
"^&\\\##\8016.8148.31" gives an empty string correctly
"^&\\\##\8016.8148067" gives "16.8148067"
"^&\\\##\8116.8148167" the again is empty
It seems to have something to do with the zero and the dot, but I can't find a pattern or a solution. I tried adding strval, but got the same result.
Maybe someone has an answer? Thnx.

I honestly cannot find out why this is going wrong. It looks like some sort of bug. However, there is a simple solution.
<?php
$B=preg_replace('/^[^\pL\s]*/','',$A);
This way it has the same functionality, except it works and has a lot less overhead.
Update: I did some testing in Java, Regex Coach and regexpal.com, and they all do it correctly, so the pattern itself is fine and the problem has to be on the PHP side.
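A possible explanation, offered as an assumption since neither post identifies it: the loop condition uses the loose comparison !=, and PHP compares two numeric strings by value, so "0831" and "831" count as equal and the loop stops one step early. The sketch below (the function names are hypothetical, and the input is a simplified version of the failing string) reproduces the "831" result with != and shows that !== avoids it:

```php
<?php
// Hypothetical helpers for illustration; only the regex comes from the question.

// Loop from the question, using the loose != comparison.
function stripLeadingLoose(string $s): string
{
    $b = $s;
    do {
        $a = $b;
        $b = preg_replace('/^[^\pL\s]/', '', $b);
    } while ($b != $a); // "831" != "0831" is false: both are numeric strings
    return $b;
}

// Same loop with the strict !== comparison, which compares the strings as strings.
function stripLeadingStrict(string $s): string
{
    $b = $s;
    do {
        $a = $b;
        $b = preg_replace('/^[^\pL\s]/', '', $b);
    } while ($b !== $a);
    return $b;
}

var_dump('0831' == '831');                     // bool(true)
var_dump(stripLeadingLoose('*^&#816.80831'));  // string(3) "831"
var_dump(stripLeadingStrict('*^&#816.80831')); // string(0) ""
```

The single preg_replace with * shown above sidesteps the comparison entirely, which would explain why it works.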

Related

PHP: How would I remove parts of a string between 2 chunks of characters without removing too much?

This problem is driving me nuts. Let's say I have a string:
This is a &start;pretty bad&end; string that I want to &start;somehow&end; display differently
I want to be able to remove the &start; and &end; parts as well as everything in between so it says:
This is a string that I want to display differently
I tried using preg_replace with a regular expression but it took off too much, ie:
This is a display differently
The question is: how do I remove the stuff just between sets of &start; and &end; pairs and make sure that it doesn't remove anything between any &end; and &start; segments?
Keep in mind, I'm working with hundreds of strings that are very different to each other so I'm looking for a flexible solution that'll work with all of them.
Thanks in advance for any help with this.
Edit: Replaced dollar signs with ampersands. Oops!
Try this regex: /&start;(.+?)&end;/g
It looks like it works as desired: https://regex101.com/r/MW5nom/2
I quickly tried it in the Chrome console using JS. It should convert to PHP directly; just drop the g flag, since preg_replace already replaces every match:
"This is a &start;pretty bad&end; string that I want to &start;somehow&end; display differently".replace(/&start;(.+?)&end;/g, "")

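For PHP, a rough equivalent of the lazy-match approach might look like the sketch below. removeMarked() is a hypothetical helper name, and the trailing \s* is an addition of this sketch so that the result has no double spaces:

```php
<?php
// removeMarked() is a hypothetical helper name, not from the original posts.
function removeMarked(string $text): string
{
    // .+? is lazy, so each match ends at the nearest &end; instead of the last one.
    // The trailing \s* also swallows the whitespace after &end; to avoid double spaces.
    return preg_replace('/&start;.+?&end;\s*/', '', $text);
}

$input = 'This is a &start;pretty bad&end; string that I want to &start;somehow&end; display differently';
echo removeMarked($input), "\n";
// This is a string that I want to display differently
```

If the marked sections can span several lines, adding the s modifier (so that . also matches newlines) would probably be needed.
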
Using preg_match_all to filter out strings containing this but not this

I'm having an issue with preg_match_all. I have this string:
$product_req = "ACTIVE-6,CATEGORY-ACTIVE-8,CATEGORY-ACTIVE-4,ACTIVE-9";
I need to get the numbers preceded by "ACTIVE-" but not by "CATEGORY-ACTIVE-", so in this case the result should be 6,9. I used the statement below:
preg_match_all("/ACTIVE-(\d+)/", $product_req, $this_act);
However, this returns all the numbers, because all of them are in fact preceded by "ACTIVE-". That's not what I meant: I need to leave out those preceded by "CATEGORY-ACTIVE-". How can I configure preg_match_all to do this? Or is there some other function that can do the job?
EDIT:
I tried this:
preg_match_all("/CATEGORY-ACTIVE-(\d+)/", $product_req, $this_cat_act);
preg_match_all("/ACTIVE-(\d+)/", $product_req, $this_act);
$act_cat = str_replace($this_cat_act[1],"",$this_act[1]);
It kind of works, but I guess there is a better and cleaner way to do it. Besides, the output is a bit weird too.
Thank you.
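One way this could be done in a single call is a negative lookbehind. This is a sketch rather than an answer from the original thread, and it assumes "CATEGORY-" is the only prefix that needs to be excluded:

```php
<?php
$product_req = "ACTIVE-6,CATEGORY-ACTIVE-8,CATEGORY-ACTIVE-4,ACTIVE-9";

// (?<!CATEGORY-) is a negative lookbehind: "ACTIVE-" only matches
// when the text right before it is not "CATEGORY-".
preg_match_all('/(?<!CATEGORY-)ACTIVE-(\d+)/', $product_req, $this_act);

echo implode(',', $this_act[1]), "\n"; // 6,9
```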

apostrophe in preg_match_all() is giving me problems

So I've got this piece of code that won't play nice.
preg_match_all("/(\{\[)([\w-\d\s\.\|']*)(\]\})/i",$replace_text, $match);
What it is supposed to do is allow an apostrophe in my replacement text. So in my text, where I have "{[SPIN--they are|they’re]}", it should return "they are" or "they're".
But instead, it simply does nothing and spits out the entire spintax code just as I typed above.
The only time this does not work, is when a replacement text has an apostrophe. It works perfectly everywhere else. Been trying to fix this for two days and I'm about to throw my keyboard through my monitor.
There are many things that my project does and it is imperative to have the {[SPIN-- before specifying the replacement text, and the ]} closing brackets.
Can someone help, please?
In your example string it's not a single quote character, but something that looks similar:
’ (the actual character) vs ' (that's what you think it is)
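If the typographic apostrophe has to be accepted as well, one option is to add it to the character class. The sketch below assumes the text is UTF-8 (hence the u modifier and \x{2019}); it also moves the hyphen to the end of the class, where it is unambiguously literal:

```php
<?php
// Assumption: $replace_text is UTF-8; \u{2019} is the right single quotation mark.
$replace_text = "{[SPIN--they are|they\u{2019}re]}";

// The class accepts both the ASCII apostrophe and U+2019.
$pattern = '/(\{\[)([\w\s.|\'\x{2019}-]*)(\]\})/iu';

preg_match_all($pattern, $replace_text, $match);
echo $match[2][0], "\n"; // SPIN--they are|they’re
```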

PHP: preg_match; Not able to match the £ symbol

I've really been wracking my brains over this one, as for the life of me I can't figure out what the problem is.
I've got some data I want to run a regular expression on. For reference, the original document is encoded in iso-8859-15, if that makes any difference.
Here is a function using the regular expression;
if(preg_match("{£\d+\.\d+}", $handle)) //
{
echo 'Found a match';
}
else
{
echo 'No match found';
}
No matter what I try I can't seem to get it to match. I've tried just searching for the £ symbol. I've gone over my regular expression and there aren't any issues there. I've even pasted the source data directly into a regular expression tester and it finds a complete match for what I'm looking for. I just don't understand why my regular expression isn't working. I've looked at the raw data in my string that I'm searching for and the £ symbol is there as clear as day.
I get the feeling that there's some encoded character there that I just can't see, but no matter how I output the data all I can see is the £ symbol, but for whatever reason it's not being recognised.
Any ideas? Is there an absolute method to viewing raw data in a string? I've tried var_dump and var_export, but I do get the feeling that something isn't quite right, as var_export does display the data in a different language. How can I see what's "really" there in my variable?
I've even saved the content to a txt file. The £ is there. There should be no reason why I shouldn't be able to find it with my regular expression. I just don't get it. If I create a string and paste in the exact bit of text my regular expression should pick up, it finds the match without any problems.
Truly baffling.
You could always transform the letter:
$string = '£100.00';
if(preg_match("/\xa3/",$string)){
echo 'match found';
}else{
echo 'no matches';
}
You can include any character in your regular expression if you know the hexadecimal value. I think the value is 0A3H, so try this:
\xa3 // Updated with the correct hex value
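To see what is "really" in the variable, bin2hex shows the raw bytes. The sketch below uses made-up sample values and assumes the PHP script itself is saved as UTF-8, in which case a literal £ in the pattern is the two bytes C2 A3 and cannot match the single byte A3 found in ISO-8859-15 data:

```php
<?php
// Sample values are illustrative; $handle in the question would hold the real data.
$latin9 = "\xA3100.00";     // "£100.00" as ISO-8859-15 bytes
$utf8   = "\u{00A3}100.00"; // "£100.00" as UTF-8 bytes

echo bin2hex($latin9), "\n"; // a33130302e3030
echo bin2hex($utf8), "\n";   // c2a33130302e3030

// What a literal £ in a UTF-8 source file amounts to: the bytes C2 A3.
$utf8Pattern = "/\u{00A3}\\d+\\.\\d+/";

var_dump(preg_match($utf8Pattern, $latin9));     // int(0)
var_dump(preg_match('/\xa3\d+\.\d+/', $latin9)); // int(1)
```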

PHP wordpress string formatting error

I have a bit of PHP where I want to store a URL in a string.
The code itself seems fine, but for some reason, when I use the characters &sectionId=, it causes problems: it changes &sectionId= to §ionId=.
If I misspell it as &secionId then it works fine.
The full url SHOULD be:
http://url.com/file.php?appKey=$appkey&storeId=$storeid&sectionId=$sectionid&v=3
but when I do an echo $myURL; on it, it gives me:
http://url.com/file.php?appKey=$appkey&storeId=$storeid§ionId=$sectionid&v=3
Notice the §ionId= instead of &sectionId=.
Can anyone help me with this? It seems like basic PHP, but I don't understand why it just doesn't like those 4 or 5 characters in a row!
Thanks.
Are you echoing it straight to HTML? Some over-helpful browsers convert character entities even when they are not terminated with a semicolon, so &sect becomes §. Run the URL through htmlentities, or replace every & with &amp;, and it will display correctly.
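A minimal sketch of the escaping step, using htmlspecialchars (which is enough for the & case) and placeholder values that are not from the original post:

```php
<?php
// Placeholder values; the real ones come from the poster's application.
$appkey    = 'abc';
$storeid   = '7';
$sectionid = '42';

$myURL = "http://url.com/file.php?appKey=$appkey&storeId=$storeid&sectionId=$sectionid&v=3";

// htmlspecialchars turns & into &amp;, so the browser no longer sees "&sect"
// as the start of the § entity when the URL is printed into HTML.
$escaped = htmlspecialchars($myURL, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";
// http://url.com/file.php?appKey=abc&amp;storeId=7&amp;sectionId=42&amp;v=3
```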
