PHP Regex Help (converting from preg_match_all to preg_replace) - php

I'm having a bit of difficulty converting some regex from being used in preg_match_all to being used in preg_replace.
Basically, via regex only, I would like to match uppercase characters that are preceded by either a space, the beginning of the text, or a hyphen. This is not a problem; I have the following for this, which works well:
preg_match_all('/(?<= |\A|-)[A-Z]/',$str,$results);
echo '<pre>' . print_r($results,true) . '</pre>';
Now, what I'd like to do, is to use preg_replace to only return the string with the uppercase characters that match my criteria above. If I port the regex straight into preg_replace, then it obviously replaces the characters I want to keep.
Any help would be much appreciated :)
Also, I'm fully aware regex isn't the best solution for this in terms of efficiency, but nonetheless I would like to use preg_replace.

According to De Morgan's laws,
if you want to keep letters that are
A-Z, and
preceded by [space], \A, or -
then you'd want to remove characters that are
not A-Z, or
not preceded by [space], \A, or -
Perhaps this (replace match with empty string)?
/[^A-Z]|(?<! |\A|-)./
See example here.
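To make the De Morgan reasoning above concrete, here is a minimal sketch that plugs the pattern into preg_replace with an empty replacement. The function name keepLeadingCapitals and the sample sentence are made up for illustration.

```php
<?php
// Hypothetical helper: keeps only capitals that sit at the start of the
// text or directly after a space or a hyphen.
function keepLeadingCapitals(string $text): string
{
    // Alternative 1 removes every character that is not A-Z.
    // Alternative 2 removes any character NOT preceded by space, \A or '-'.
    return preg_replace('/[^A-Z]|(?<! |\A|-)./', '', $text);
}

echo keepLeadingCapitals('Hello World-Foo bar McDonald'), "\n"; // prints HWFM
```

The D in McDonald is dropped because it is preceded by a lowercase letter, which is exactly the case the second alternative exists for.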

I think it will be something like this:
$sString = preg_replace('#.*?(?<= |\A|-)([A-Z])([a-z]+)#m',"$1", $sString);

Related

Using preg_replace() with search words that may have special characters [duplicate]

Regular Expressions are completely new to me and having done much searching my expression for testing purposes is this:
preg_replace('/\b0.00%\b/','- ', '0.00%')
It yields 0.00% when what I want is - .
With preg_replace('/\b0.00%\b/','- ', '50.00%') yields 50.00% which is what I want - so this is fine.
But clearly the expression is not working, as in the first example it is not replacing 0.00% with -.
I can think of workarounds with if(){} for testing length/content of string but presume the replace will be most efficient
The word boundary after % requires a word char (letter, digit or _) to appear right after it, so there is no replacement taking place here.
You need to replace the word boundaries with unambiguous boundaries defined with the help of (?<!\w) and (?!\w) lookarounds that will fail the match if the keywords are preceded or followed with word characters:
$value='0.00%';
$str = 'Price: 0.00%';
echo preg_replace('/(?<!\w)' . preg_quote($value, '/') . '(?!\w)/i', '- ', $str);
See the PHP demo
Output: Price: -
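A small side-by-side sketch of the point above: the \b version leaves the lone 0.00% untouched, while the lookaround version replaces it and still leaves 50.00% alone. The helper name replaceWholeValue is an assumption for this example, not something from the question.

```php
<?php
// Hypothetical helper: replaces $needle only where it is not glued to
// word characters (letters, digits, underscore) on either side.
function replaceWholeValue(string $needle, string $replacement, string $subject): string
{
    $pattern = '/(?<!\w)' . preg_quote($needle, '/') . '(?!\w)/';
    return preg_replace($pattern, $replacement, $subject);
}

// \b after % needs a word character next to it, so nothing is replaced here.
var_dump(preg_replace('/\b0\.00%\b/', '- ', '0.00%'));

// The lookarounds only forbid word characters, so the lone value is replaced...
var_dump(replaceWholeValue('0.00%', '- ', '0.00%'));

// ...but a value preceded by another digit is left as it is.
var_dump(replaceWholeValue('0.00%', '- ', '50.00%'));
```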
preg_replace has three arguments as you probably already know. The regular expression pattern to match, the replacement value, and the string to search (in that order).
It appears that your preg_replace pattern has word boundaries (\b) on either end of the value you are looking for, 0.00%, which should not really be needed. The trailing \b is what breaks it: a word boundary only matches between a word character and a non-word character, and % is not a word character. You might want to try it without the \b and use something like the start-of-string ^ and end-of-string $ anchors instead.

How do i match with regex special chars that are not alphanumeric whilst ignoring emojis?

I'm currently having a problem: I don't know how to make a regex match special characters whilst ignoring emojis.
Example: I want to match the special chars that are not emojis in this string: ❀️𝓉𝑒𝓈𝓉𝒾𝓃𝑔❀️
currently as my regex i have
[^\x00-\x7F]+
Current output: ❀️𝓉𝑒𝓈𝓉𝒾𝓃𝑔❀️
Wanted output: 𝓉𝑒𝓈𝓉𝒾𝓃𝑔
How would I go about fixing this?
Maybe, this expression might work:
$re = '/[\x{1f300}-\x{1f5ff}\x{1f900}-\x{1f9ff}\x{1f600}-\x{1f64f}\x{1f680}-\x{1f6ff}\x{2600}-\x{26ff}\x{2700}-\x{27bf}\x{1f1e6}-\x{1f1ff}\x{1f191}-\x{1f251}\x{1f004}\x{1f0cf}\x{1f170}-\x{1f171}\x{1f17e}-\x{1f17f}\x{1f18e}\x{3030}\x{2b50}\x{2b55}\x{2934}-\x{2935}\x{2b05}-\x{2b07}\x{2b1b}-\x{2b1c}\x{3297}\x{3299}\x{303d}\x{00a9}\x{00ae}\x{2122}\x{23f3}\x{24c2}\x{23e9}-\x{23ef}\x{25b6}\x{23f8}-\x{23fa}]/u';
$str = '❀️𝓉𝑒𝓈𝓉𝒾𝓃𝑔❀️';
$subst = '';
echo preg_replace($re, $subst, $str);
Output
𝓉𝑒𝓈𝓉𝒾𝓃𝑔️
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
Reference:
javascript unicode emoji regular expressions
Use the following unicode regex:
[^\p{M}\p{S}]+
\p{M} matches characters intended to be combined with another character (here the variation selector U+FE0F).
\p{S} matches symbols (❀ in this case).
Demo
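In PHP the property classes above need the u modifier so that pattern and subject are read as UTF-8. A minimal sketch, assuming PHP 7+ (for the "\u{...}" string escapes) and a test string made of U+2740 plus U+FE0F around seven mathematical script letters:

```php
<?php
// Seven mathematical script/italic letters (general category "letter").
$letters = "\u{1D4C9}\u{1D452}\u{1D4C8}\u{1D4C9}\u{1D4BE}\u{1D4C3}\u{1D454}";

// Symbol (U+2740) followed by a variation selector (U+FE0F) on both sides.
$str = "\u{2740}\u{FE0F}" . $letters . "\u{2740}\u{FE0F}";

// [^\p{M}\p{S}]+ matches a run of characters that are neither marks nor symbols.
if (preg_match('/[^\p{M}\p{S}]+/u', $str, $m)) {
    echo $m[0], "\n"; // prints only the seven letters
}
```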
I think that your post's title does not match its body.
There is virtually no overlap between emoji and AlphaNum characters.
There are a couple of keycap emoji but since their sequence beyond
the first digits don't overlap the alphanum, it's enough just to put
a negative look ahead in front of the alphanum class.
'~(?![0-9]\x{FE0F}\x{20E3}|\x{2139})[\pL\pN]+~u'
https://regex101.com/r/1JcUqY/1

path to php in regex

I am currently working on a project involving regex in PHP. I wanted to know why this recursive regular expression does not work in PHP, or how I can get it to work:
{{test":"([a-f0-9]{32})"},{"test2":"([a-z]{3})}}
And the given results should be an array with:
[a-f0-9]{32}
[a-z]{3}
Maybe, this regex helps
/.sample.\/.*?(\d+\.\d+\.\d+)-/
You should escape the . or it will mean any character.
If it does not matter, what is after the dash, you need not use the anchor $ for end of string.
This finds the first occurrence of the number in the string, because .*? is not greedy. It matches only as much as necessary for the rest of the pattern to match.
You can use this if only letters are allowed between / and the number:
/.sample.\/[a-zA-Z]*(\d+\.\d+\.\d+)-/
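To illustrate the difference between the lazy .*? and the greedy .*, here is a small sketch. The input string is invented for the example (the original input is not shown in the thread), and the patterns are trimmed down to the part after the slash.

```php
<?php
// Hypothetical input containing two version-like numbers after the slash.
$path = 'lib/foo-1.2.3-bar-4.5.6-x';

// Lazy: .*? consumes as little as possible, so the first number is captured.
preg_match('/\/.*?(\d+\.\d+\.\d+)-/', $path, $lazy);

// Greedy: .* consumes as much as possible, so the last number is captured.
preg_match('/\/.*(\d+\.\d+\.\d+)-/', $path, $greedy);

echo $lazy[1], "\n";   // prints 1.2.3
echo $greedy[1], "\n"; // prints 4.5.6
```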

PHP Regex to find a specific substring

So basically, I have a big string with some other information, and somewhere at the end, I have the following structure of a string:
62AC979D-5277D720
It is numbers and uppercase letters. I would like to extract this substring from many lines of the bigger strings which all contain it at different places. I have tried:
preg_match('/^[\w]+$/', $string);
But I really don't have much experience with regular expressions. Can someone provide the regex necessary or at least tell me where I am mistaken? Thank you for your time!
This regex should do it for you,
([A-Z\d]{8}-[A-Z\d]{8})
in use
<?php
$string = 'This is 62AC979D-5277D720 the whole string.';
preg_match_all('~([A-Z\d]{8}-[A-Z\d]{8})~', $string, $value);
print_r($value[1]);
I suspect your current regex fails because of the ^ and $. These anchor the match to the start and end of the whole string (or of each line if the m modifier is used), so the pattern only matches when the entire string consists of word characters. \w also matches a-z, A-Z, 0-9 and _, but not the dash. I think you only care about capital letters and digits, and you want to allow only one dash. If each half is always 8 characters long, you can use {8} in place of the +. The () capture the value that is found. The first found value in $string will be $value[1][0].
Demo: http://sandbox.onlinephpfunctions.com/code/c6b2c391d95c5454a3c7ea81d5ac4a3bb8e49aef
preg_match_all('/\\b[0-9A-Z]+-[0-9A-Z]+\\b/', $string, $matches);
This should do it for you.
preg_match('/\\b[0-9A-Z]{8}-[0-9A-Z]{8}\\b/', $string);
This works for the string you gave, i.e. 8 digits or uppercase letters followed by - and then 8 digits or uppercase letters again.
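Since the question mentions many lines that each carry the code in a different place, here is a rough sketch that applies the \b pattern above line by line. The function name extractCode and the second sample line are assumptions made for the example.

```php
<?php
// Hypothetical helper: returns the first 8-8 code found in a line,
// or null when the line does not contain one.
function extractCode(string $line): ?string
{
    return preg_match('/\b[0-9A-Z]{8}-[0-9A-Z]{8}\b/', $line, $m) ? $m[0] : null;
}

$lines = [
    'This is 62AC979D-5277D720 the whole string.',
    'no code on this line',
];

foreach ($lines as $line) {
    var_dump(extractCode($line));
}
```

Returning null for a line without a code keeps the "not found" case distinct from an empty string.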
You can try this (because of the ^ and $ anchors, it only matches when the whole string is the code):
preg_match('/^[0-9A-Z]{8}-[0-9A-Z]{8}$/', $string)

PHP Regex for checking space or certain characters after string

I need a regex which can basically check for space, line break etc after string.
So conditions are,
Allow special characters ., _, -, + inside the string i.e.#hello.world, #hello_world, #helloworld, etc.
Discard anything including special characters where there is no alpha-numeric string after them i.e. #helloworld.<space>, #helloworld-<space>, #helloworld.?, etc. must be parsed as #helloworld
My existing RegEx is /#([A-Za-z0-9+_.-]+)/, which works perfectly for condition #1, but there still seems to be a problem with condition #2.
I am using above RegEx in preg_replace()
Solution:
$str = preg_replace('/#[\w+.\-]+\b/', '[[$0]]', $str);
This works perfectly.
Tested with
http://gskinner.com/RegExr/
You can use word boundaries to easily find the position between an alphanumeric letter and a non-alphanumeric letter:
$str = preg_replace('/#[\w+.\-]+\b/', '[[$0]]', $str);
Working example: http://ideone.com/0ShCm
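A short sketch of how the trailing \b behaves on the cases from the question. The sample sentence is made up, and / is used as the delimiter here so that the literal # in the pattern cannot be mistaken for a delimiter.

```php
<?php
// The character class may run past the last letter, but \b forces the match
// to backtrack until it ends right after a word character.
$pattern = '/#[\w+.\-]+\b/';

$in  = 'See #hello.world and #helloworld. Also #hello_world-?';
$out = preg_replace($pattern, '[[$0]]', $in);

echo $out, "\n";
// prints: See [[#hello.world]] and [[#helloworld]]. Also [[#hello_world]]-?
```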
Here's an idea:
Use strrev to reverse the string
Use strcspn to find the longest prefix of the reversed string that does not contain any alphanumeric characters
Cut the prefix off with substr
Reverse the string again; this is your final result
See it in action.
I'm not taking into account any requirement that restricts the legal characters in the string to some subset, but you can use your regular expression for that (or even strspn, which might be faster).
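The four steps above could be sketched roughly like this. The function name trimTrailingNonAlnum is hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits only, and PHP 7+ is assumed.

```php
<?php
// Hypothetical helper: cuts off everything after the last ASCII letter or digit.
function trimTrailingNonAlnum(string $s): string
{
    $alnum = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

    $reversed = strrev($s);                      // 1. reverse the string
    $skip     = strcspn($reversed, $alnum);      // 2. length of the non-alphanumeric prefix
    $trimmed  = (string) substr($reversed, $skip); // 3. cut that prefix off

    return strrev($trimmed);                     // 4. reverse back
}

echo trimTrailingNonAlnum('#helloworld.?'), "\n"; // prints #helloworld
echo trimTrailingNonAlnum('#hello.world'), "\n";  // prints #hello.world
```

Note that strrev works on bytes, so this sketch is only safe for single-byte text.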
The reason is that it's reading the string as a whole. If you want to strip everything after the alphanumeric section, you might have to do something like end(explode()) on the string, check whether the last piece is valid, and remove it if it isn't; but then you'd have to check the end for every possible explode point, i.e. ., -, ~, etc.
Then again, another trap you might run into is that, for an item with any alphanumeric value, it might just parse everything after the last alphanumeric character.
Sorry that this isn't much help, but I figured thinking aloud does help.
