PHP: preg_match; Not able to match the £ symbol - php

I've really been wracking my brains over this one, as for the life of me I can't figure out what the problem is.
I've got some data I want to run a regular expression on. For reference, the original document is encoded in iso-8859-15, if that makes any difference.
Here is a function using the regular expression;
if(preg_match("{£\d+\.\d+}", $handle)) //
{
echo 'Found a match';
}
else
{
echo 'No match found';
}
No matter what I try I can't seem to get it to match. I've tried just searching for the £ symbol. I've gone over my regular expression and there aren't any issues there. I've even pasted the source data directly into a regular expression tester and it finds a complete match for what I'm looking for. I just don't understand why my regular expression isn't working. I've looked at the raw data in my string that I'm searching for and the £ symbol is there as clear as day.
I get the feeling that there's some encoded character there that I just can't see, but no matter how I output the data all I can see is the £ symbol, but for whatever reason it's not being recognised.
Any ideas? Is there an absolute method to viewing raw data in a string? I've tried var_dump and var_export, but I do get the feeling that something isn't quite right, as var_export does display the data in a different language. How can I see what's "really" there in my variable?
I've even saved the content to a txt file. The £ is there. There should be no reason why I shouldn't be able to find it with my regular expression. I just don't get it. If I create a string and paste in the exact bit of text my regular expression should pick up, it finds the match without any problems.
Truly baffling.

You could always match the character by its byte value instead of typing it literally:
$string = '£100.00';
if(preg_match("/\xa3/",$string)){
echo 'match found';
}else{
echo 'no matches';
}

You can include any character in your regular expression if you know its hexadecimal value. The £ sign is the byte 0xA3 in ISO-8859-1 and ISO-8859-15, so try this:
\xa3 // Updated with the correct hex value
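A likely explanation, offered as an assumption since the question does not say how the script file is saved: if the PHP source is UTF-8, a literal £ in the pattern is the two bytes C2 A3, while the ISO-8859-15 data contains the single byte A3. The sketch below uses bin2hex() to show what is really in a string, then matches either on the raw byte or after converting to UTF-8. The variable $raw is a stand-in for the real data, and mb_convert_encoding() assumes the mbstring extension is available.

```php
<?php
// Assumption: $raw stands in for data read from the ISO-8859-15 document.
$raw = "\xA3" . "12.50";

// bin2hex() shows the actual bytes, regardless of how a terminal renders them.
$hex = bin2hex($raw); // "a331322e3530"

// Option 1: match the single byte 0xA3 directly.
$byteMatch = preg_match('/\xA3\d+\.\d+/', $raw);

// Option 2: convert to UTF-8 first, then match code point U+00A3 with /u.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-15');
$utf8Match = preg_match('/\x{A3}\d+\.\d+/u', $utf8);

// For comparison: a UTF-8 encoded pound sign (bytes C2 A3) in the pattern
// does not match the single-byte data.
$literalMatch = preg_match("/\u{A3}\\d+\\.\\d+/", $raw);

var_dump($hex, $byteMatch, $utf8Match, $literalMatch);
```

If bin2hex() on the real data shows a3 where the £ should be, option 1 or option 2 should both work; if it shows c2a3, the data is already UTF-8 and the problem lies elsewhere.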

Related

using regex to find invalid postcode

I'm new to PHP and I'm trying to write a function to find an invalid postcode. This is an option, however I've been told this isn't the ideal format:
function postcode_valid($postcode) {
return preg_match('/\w{2,3} \d\w{2}/', $postcode);
}
//more accurate
//[A-Z]{1,2}[0-9]{1,2}[A-Z]? [0-9][A-Z]{2}
I understand the first function, but I don't know how to write the 'ideal' solution as a function, please can you advise?
If the regular expression you provided in the comment field is the correct one and you don't know how to use it in PHP, here is the solution:
function postcode_valid($postcode) {
return preg_match('/^[A-Z]{1,2}[0-9]{1,2}[A-Z]? [0-9][A-Z]{2}$/', $postcode);
}
You need to add two slashes (one in front, one at the end) as delimiters around the regular expression and put it in a PHP string. I would also highly recommend using ^ at the beginning and $ at the end of the regular expression to anchor it to the beginning and the end of the string. Otherwise the input is accepted as soon as any part of it matches the pattern, so a longer string that merely contains a valid postcode would pass. Here is a live example.
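A small sketch of the difference the anchors make. The input strings are made up for illustration. It also shows one detail not mentioned above: by default $ also matches just before a trailing newline, so the D modifier (or \z) is needed when that matters.

```php
<?php
$pattern = '[A-Z]{1,2}[0-9]{1,2}[A-Z]? [0-9][A-Z]{2}';

// Made-up input with junk around an otherwise valid-looking postcode.
$input = 'xxAB12 3CDyy';

// Without anchors the pattern may match anywhere inside the string.
$unanchored = preg_match('/' . $pattern . '/', $input);

// With ^ and $ the whole string has to match.
$anchored = preg_match('/^' . $pattern . '$/', $input);

// $ still accepts a single trailing newline; the D modifier does not.
$withNewline = preg_match('/^' . $pattern . '$/', "AB12 3CD\n");
$strictEnd   = preg_match('/^' . $pattern . '$/D', "AB12 3CD\n");

var_dump($unanchored, $anchored, $withNewline, $strictEnd);
```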
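If you are looking for the validation of a UK post code, you should be using the following regex instead (source). Note that it relies on character class subtraction such as [A-Z-[QVX]], which is .NET and XML Schema syntax; PHP's PCRE engine does not support it, so each subtraction has to be spelled out as a plain class (for example [A-PR-UWYZ]) before it will work with preg_match: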
(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][A-HJKPSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY])))) [0-9][A-Z-[CIKMOV]]{2})
If you are looking for something else, please provide a comment below.
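The UK pattern above uses class subtraction ([A-Z-[QVX]] and similar), which preg_match does not understand. Below is a sketch of the same pattern with each subtraction expanded by hand into an ordinary character class and with anchors added. The function name uk_postcode_valid is hypothetical, and the pattern is only a mechanical translation of the regex quoted above, not an independently verified postcode specification.

```php
<?php
// Hypothetical helper: the quoted UK regex with class subtraction expanded.
//   [A-Z-[QVX]]    becomes [A-PR-UWYZ]
//   [A-Z-[IJZ]]    becomes [A-HK-Y]
//   [A-Z-[CIKMOV]] becomes [ABD-HJLNP-UW-Z]
function uk_postcode_valid($postcode) {
    $pattern = '/^(GIR 0AA|((([A-PR-UWYZ][0-9][0-9]?)'
        . '|(([A-PR-UWYZ][A-HK-Y][0-9][0-9]?)'
        . '|(([A-PR-UWYZ][0-9][A-HJKPSTUW])'
        . '|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY]))))'
        . ' [0-9][ABD-HJLNP-UW-Z]{2}))$/D';

    // preg_match returns 1 on a match, 0 on no match.
    return preg_match($pattern, $postcode) === 1;
}

var_dump(uk_postcode_valid('SW1A 1AA')); // area SW, district 1A
var_dump(uk_postcode_valid('QW1 1AA'));  // Q is excluded as a first letter
```

The pattern expects upper-case input with a single space; normalising with strtoupper() and trim() beforehand is a separate design choice.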

Using preg_match_all to filter out strings containing this but not this

I'm having an issue with preg_match_all. I have this string:
$product_req = "ACTIVE-6,CATEGORY-ACTIVE-8,CATEGORY-ACTIVE-4,ACTIVE-9";
I need to get the numbers preceded by "ACTIVE-" but not by "CATEGORY-ACTIVE-", so in this case the result should be 6,9. I used the statement below:
preg_match_all("/ACTIVE-(\d+)/", $product_req, $this_act);
However, this returns all the numbers, because all of them are in fact preceded by "ACTIVE-". That's not what I meant: I need to leave out those preceded by "CATEGORY-ACTIVE-". How can I configure preg_match_all to do it? Or maybe there is some other function that can do the job?
EDIT:
I tried this:
preg_match_all("/CATEGORY-ACTIVE-(\d+)/", $product_req, $this_cat_act);
preg_match_all("/ACTIVE-(\d+)/", $product_req, $this_act);
$act_cat = str_replace($this_cat_act[1],"",$this_act[1]);
It kinda works, but I guess there is a better and cleaner way to do it. Besides, the output is kinda weird too.
Thank you.
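One possible approach, sketched here as a suggestion and not taken from an answer in the thread, is a negative lookbehind: (?<!CATEGORY-) only lets "ACTIVE-" match when it is not immediately preceded by "CATEGORY-". The variable name $this_act is reused from the question.

```php
<?php
$product_req = "ACTIVE-6,CATEGORY-ACTIVE-8,CATEGORY-ACTIVE-4,ACTIVE-9";

// (?<!CATEGORY-) is a negative lookbehind: it rejects any "ACTIVE-"
// that directly follows "CATEGORY-" without consuming any characters.
preg_match_all('/(?<!CATEGORY-)ACTIVE-(\d+)/', $product_req, $this_act);

// $this_act[1] holds the captured numbers.
$result = implode(',', $this_act[1]);
echo $result; // 6,9
```

This avoids the second preg_match_all call and the str_replace step, because the unwanted matches are never produced in the first place.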

strange regex behaviour with [^pL ]

I have a snippet of PHP that strips every leading character that is not a Unicode letter. It works fine, with one exception, and I can't figure out why. Can anyone help?
<?php
$B=$A;
do{
$A=$B;
$B=preg_replace('/^[^\pL\s]/','',$B);
}
while($B!=$A);
echo $B;
?>
If I feed it with a string like "\\*^&\\\##\816.80831téstmé" it nicely spits out "téstmé".
$A="*^&\\\##\816.80831[+" gives an empty string, also correct.
But, when I enter "\\*^&\\\##\816.80831", I end up with "831", when in fact it should be an empty string.
"^&\\\##\8016.8048.31" gives "48.31"
"^&\\\##\8016.8148.31" gives an empty string correctly
"^&\\\##\8016.8148067" gives "16.8148067"
"^&\\\##\8116.8148167" then again is empty
It seems to have something to do with the zero and the dot, but I can't find a pattern or a solution. I tried adding strval, but still the same result.
Maybe someone has an answer? Thnx.
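The regular expression itself is fine; the problem is the loop condition. $B!=$A is a loose comparison, and when both operands are numeric strings PHP compares them as numbers. So "0831" and "831" count as equal, and the loop stops before the remaining digits are stripped. Using !== instead fixes the loop. However, there is a simpler solution:
<?php
$B=preg_replace('/^[^\pL\s]*/','',$A);
This way it has the same functionality, except it works and has a lot less overhead.
Update: I did some testing in Java, Regex Coach and regexpal.com, and they all handle the pattern correctly, which is consistent with the regex not being at fault.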
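The early exit can be reproduced in isolation. The sketch below runs the question's loop twice, once with the loose != comparison and once with the strict !== comparison. The function names and the shortened test input are made up for illustration; the pattern is the one from the question.

```php
<?php
// Hypothetical wrapper around the original loop, using loose comparison.
function strip_leading_loose($s) {
    do {
        $prev = $s;
        $s = preg_replace('/^[^\pL\s]/', '', $s);
    } while ($s != $prev); // numeric strings are compared as numbers here
    return $s;
}

// Same loop, but with a strict comparison of the strings themselves.
function strip_leading_strict($s) {
    do {
        $prev = $s;
        $s = preg_replace('/^[^\pL\s]/', '', $s);
    } while ($s !== $prev);
    return $s;
}

// "0831" and "831" are both numeric strings, so == treats them as 831 == 831.
var_dump('0831' == '831');  // bool(true)
var_dump('0831' === '831'); // bool(false)

var_dump(strip_leading_loose('^&#816.80831'));  // string(3) "831"
var_dump(strip_leading_strict('^&#816.80831')); // string(0) ""
```

The same reasoning covers the other failing inputs in the question: each stops at the step where removing a leading zero leaves the numeric value unchanged.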

Help with replacing characters

Hopefully someone can help out here;
I am trying to write a function which replaces special characters and returns the correct one.
This is what I have so far:
function convertlatin($output){
$latinchar = array("€", "‚","Æ'","„","…","‡","ˆ","‰","Å","‹","Å'",'Ž','‘','’','“','â€','•','â€"','â€"','Ëœ','â"¢','Å¡','›','Å"',"ž",'Ÿ','¡','¢','£','¤','Â¥','¦','§','¨','©','ª','«','¬','®','¯','°','±','²','³','´','µ','¶','·','¸','¹','º','»','¼',"½",'¾','¿','À','Ã','Â','Ã','Ã"','Ã…','Æ','Ç','È','É','Ê','Ë','ÃŒ ','Ã','ÃŽ','ß','Ã',"Ã'","Ã'",'Ã"','Ã"','Õ','Ö','×','Ø','Ù','Ú','Û','Ãœ','Ã','Þ','ß','Ã','á','â','ã','ä','Ã¥','æ','ç','è','é','ê','ë','ì','Ã','î','ï','ð','ñ','ò','ó','ô','õ','ö','÷','ø','ù','ú','û','ü','ý',"þ","ÿ");
$correctchar = array("€", "‚","ƒ",'"','…','‡','ˆ','‰',"Š",'‹','Œ','Ž',"'","'",'"','"','•','–','—','˜','™','š','›','œ','ž','Ÿ','¡','¢','£','¤','¥','¦','§','¨','©','ª','«','¬','®','¯','°','±','²','³','´','µ','¶','·','¸','¹','º','»','¼','½','¾','¿','À','Á','Â','Ã','Ä','Å','Æ','Ç','È','É','Ê','Ë','Ì','Í','Î','Ï','Ð','Ñ','Ò','Ó','Ô','Õ','Ö','×','Ø','Ù','Ú','Û','Ü','Ý','Þ','ß','à','á','â','ã','ä','å','æ','ç','è','é','ê','ë','ì','í','î','ï','ð','ñ','ò','ó','ô','õ','ö','÷','ø','ù','ú','û','ü','ý','þ',"ÿ");
$returnval = str_replace($latinchar, $correctchar, $output);
echo($returnval);
return $returnval;
}
The problem I have is I thought it was working but it has random results, such as if it finds a match on just one of the characters it replaces a different one in that array. What I would like to do is find and replace an exact match of latin char within a supplied string eg "testingÿ" with "testingÿ" - at the mo it replaces ÿ with testingá¿
It just seems to replace one character in some occasions, when I would like it to match and replace both parameters.
I also tried strcmp with not much success.
Any ideas ?
It looks like your problem is not wrong characters but a wrong encoding, so it is probably better to change the encoding of $output than to replace characters one by one. utf8_encode will not help you: the "wrong" characters look like UTF-8 text that was misread as Windows-1252 and then converted again.
Try:
echo mb_convert_encoding('testingÿ','CP1252','UTF-8');
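To see why converting from UTF-8 to CP1252 repairs the text, here is a sketch that first produces the garbling on purpose and then reverses it. It assumes the damage came from UTF-8 bytes being read as Windows-1252 and re-encoded as UTF-8, which is a guess based on the characters in the question, and it assumes the mbstring extension is available.

```php
<?php
// "testingÿ" in UTF-8; \u{FF} is written as an escape so the file's own
// encoding does not matter.
$original = "testing\u{FF}";

// Simulate the damage: treat the UTF-8 bytes as Windows-1252 and convert
// them to UTF-8 a second time. The two bytes of ÿ (C3 BF) become "Ã¿".
$garbled = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

// Reverse it: converting back to Windows-1252 yields the original bytes,
// which are valid UTF-8 again.
$repaired = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');

var_dump(bin2hex(substr($garbled, 7))); // bytes of the garbled ÿ
var_dump($repaired === $original);
```

Characters whose UTF-8 bytes include a value that Windows-1252 leaves undefined may not survive this round trip, which would explain the truncated entries in the question's lookup array.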

regular expression and forward slash

I'm searching for keywords in a string via a regular expression. It works fine for all keywords except one, which contains a forward slash: "time/emit".
Even using preg_quote($find,'/'), which escapes it, I still get the message:
Unknown modifier 't' in /frontend.functions.php on line 71
If I print the find pattern, it shows /time\\/emit/. Without preg_quote, it shows /time/emit/, and both return the same error message.
Any bit of knowledge would be useful.
Try to begin and end your regular expression with a delimiter other than /.
I personally use `
I've seen people using #
I think most characters are fine (a delimiter cannot be alphanumeric, a backslash or whitespace). You can read more about it here: http://pl.php.net/manual/en/regexp.reference.delimiters.php
Like this:
preg_match('#time/emit#', $subject); // instead of /time/emit/
To put it another way: your $find variable should contain #time/emit# rather than /time/emit/
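A short sketch of both options: switching the delimiter, or keeping / and passing it to preg_quote as the second argument so that it gets escaped. The $subject string is made up for illustration.

```php
<?php
$find = 'time/emit';
$subject = 'the keyword time/emit appears here'; // made-up test string

// Option 1: use # as the delimiter, so the / in the keyword is harmless.
$withHash = preg_match('#' . preg_quote($find, '#') . '#', $subject);

// Option 2: keep / as the delimiter and let preg_quote escape it.
$quoted = preg_quote($find, '/');
$withSlash = preg_match('/' . $quoted . '/', $subject);

var_dump($quoted, $withHash, $withSlash);
```

Passing the delimiter to preg_quote matters in both cases: without the second argument it escapes regex metacharacters only, and the delimiter is left untouched.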
Looks like you have something already escaping it:
preg_quote('time/emit', '/') // returns time\/emit
preg_quote('time\/emit') // returns time\\/emit (no delimiter argument, so the / itself is left alone)
As a hack you could simply do:
preg_quote(stripslashes($find), '/') // will return time\/emit
Can you post a bit of code?
The regex for that particular term should look something like '/time\/emit/'. With a set of keywords there may be a more efficient method, so seeing what you are doing would be good.
This should work:
$a="Hello////////";
$b=str_replace("//","/",$a); // argument order is search, replace, subject
echo $b;
