match the following pattern - php

I have a lot of (p. #) like (p. 13) (p. 234) in a string and I want to remove them. I used the following pattern to match but it doesn't work
preg_replace('/\(p\.*\)/','',$string);
\( to escape (
p is p
\. to escape .
I need some help here. Thank you.

This is the regular expression you're looking for:
\(p\.\s+\d+\)
Or, in your code:
preg_replace('/\(p\.\s+\d+\)/', '', $string);
Here's a fiddle.

You have nothing in your regexp to match the page number. You're just matching something like (p.......).
preg_replace('/\(p\.\s*\d+\s*\)/', '', $string);

The * means to match 0 or more of the preceding item, so it would be working on the . character in your example. Perhaps you might need something like:
preg_replace('/\(p\. ?[0-9]+\)/','',$string);
This matches:
(p.
Then a space, which the ? makes optional
Then one or more digits 0-9, due to the +
Then )
Hope this helps

First of all, preg_replace is a function, not a procedure: it returns the result rather than modifying its argument. If you want to see any changes, you need to write:
$str = preg_replace($pattern, $replacement, $str);
In your pattern you wrote \.*, which means a literal . repeated zero or more times; I assume that is not what you want. I assume you wanted to write \..* (a literal . followed by zero or more characters). But this doesn't work either, for two reasons: it doesn't check that the characters are digits, and since the * quantifier is greedy, .* will match all the characters up to the end of the line and then backtrack to the last closing parenthesis.
The best approach is probably Barmar's pattern, which checks that there is at least one digit (using the + quantifier), or the same pattern made more restrictive (exactly one space instead of a variable number of spaces):
/\(p\. \d+\)/
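As a quick check of the points above, here is a minimal sketch comparing the original pattern with the corrected one. The sample string is made up for illustration and is not from the question.

```php
<?php
// Hypothetical sample text, invented for this sketch.
$string = 'See (p. 13) and (p.234), but not (p...).';

// Original pattern: "\.*" repeats the literal dot, so only "(p", dots, ")" can match.
$broken = preg_replace('/\(p\.*\)/', '', $string);

// Corrected pattern: optional whitespace, then one or more digits.
// preg_replace returns the new string, so the result must be assigned.
$fixed = preg_replace('/\(p\.\s*\d+\s*\)/', '', $string);

echo $broken, PHP_EOL; // See (p. 13) and (p.234), but not .
echo $fixed, PHP_EOL;  // See  and , but not (p...).
```

Note that the text around each removed reference is left untouched, so doubled spaces can remain and may need a separate cleanup.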

Related

Using preg_replace() with search words that may have special characters [duplicate]

Regular Expressions are completely new to me and having done much searching my expression for testing purposes is this:
preg_replace('/\b0.00%\b/','- ', '0.00%')
It yields 0.00% when what I want is - .
preg_replace('/\b0.00%\b/','- ', '50.00%') yields 50.00%, which is what I want, so this is fine.
But clearly the expression is not working, as in the first example it is not replacing 0.00% with -.
I can think of workarounds with if(){} for testing the length/content of the string, but I presume the replace will be most efficient.
The word boundary after % requires a word char (letter, digit or _) to appear right after it, so there is no replacement taking place here.
You need to replace the word boundaries with unambiguous boundaries defined with the help of (?<!\w) and (?!\w) lookarounds that will fail the match if the keywords are preceded or followed with word characters:
$value='0.00%';
$str = 'Price: 0.00%';
echo preg_replace('/(?<!\w)' . preg_quote($value, '/') . '(?!\w)/i', '- ', $str);
See the PHP demo
Output: Price: -
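To see the difference between \b and the lookarounds in isolation, here is a small sketch using the two values from the question. The extra '0.00%x' input is an assumption added only to show when the trailing \b does match.

```php
<?php
// "%" and end-of-string are both non-word, so there is no word boundary after "%".
$plain = preg_match('/\b0\.00%\b/', '0.00%');

// With a word character after "%", the trailing \b does match.
$withLetter = preg_match('/\b0\.00%\b/', '0.00%x');

// Lookarounds only require that no word character is adjacent.
$pattern = '/(?<!\w)' . preg_quote('0.00%', '/') . '(?!\w)/';
$zero  = preg_replace($pattern, '- ', '0.00%');   // replaced
$fifty = preg_replace($pattern, '- ', '50.00%');  // left alone: "5" precedes the match

var_dump($plain, $withLetter, $zero, $fifty);
```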
preg_replace has three arguments as you probably already know. The regular expression pattern to match, the replacement value, and the string to search (in that order).
It appears that your preg_replace pattern has word boundaries (\b) on either end of the value you are looking for, 0.00%, which should not really be needed. This looks a bit like a bug to me, especially since it works fine when I plug it into the regex website I use. There is probably a somewhat odd quirk with it, so you might want to try it without the \b and use something like the start-of-string (^) and end-of-string ($) anchors instead.

preg_replace pattern to remove pNUMBERxNUMBER

I'm trying to locate a pattern with preg_replace() and remove it...
I have a string, that contains this: p130x130/ and these numbers vary, they can be higher, or lower ... what I need to do is locate that string, and remove it, whole thing.
I've been trying to use this:
preg_replace('/p+[0-9]+x+[0-9]"/', '', $str);
but that doesn't work for some reason. Would any of you know the correct regexp?
Kind regards
You need to first remove the + quantifier after p then switch the + quantifier from after x and place it after your character class (e.g. x[0-9]+), also remove the quote " inside of your expression, which to me looks like a typo here. You can also use a different delimiter to avoid escaping the ending slash.
$str = preg_replace('~p[0-9]+x[0-9]+/~', '', $str);
If the ending slash is by mistake a typo as well, then this is what you're looking for.
$str = preg_replace('/p[0-9]+x[0-9]+/', '', $str);
Regex to match p130x130/ is,
p[0-9]+x[0-9]+\/
Try this:
$str = preg_replace("/p[0-9]+?x[0-9]+?\//is","",$str);
As mentioned by the comment I have to explain the code as I'm a teacher now.
I've used "/" as a delimiter, but you can use a different character to avoid having to escape slashes.
The part that says [0-9]+ is saying to match any character between 0 and 9 at least once, but more if possible. If I had put [0-9]*? then it would have matched an empty string too (as * means to match 0 or more, not 1 or more like +) which is probably not what you wanted anyway.
I've put the ? at the end to make it non-greedy, just a habit of mine but I don't think it's needed. (I used ereg a lot previously).
Anyway, it's going to find 0-9 until it hits an x, and then it does another match for more numbers until it hits a single forward slash. I've backslashed that slash because my delimiter is a slash also and I didn't want it to end there.
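For comparison, here is a short sketch running the question's pattern and the suggested fixes against one input. The URL is a made-up example; only the p130x130/ part comes from the question.

```php
<?php
// Hypothetical input, invented for this sketch.
$str = 'https://example.com/img/p130x130/photo.jpg';

// Question's pattern: the second [0-9] has no quantifier and a stray quote is required.
$original = preg_replace('/p+[0-9]+x+[0-9]"/', '', $str);

// Fixed pattern with "~" as delimiter, so the trailing slash needs no escaping.
$greedy = preg_replace('~p[0-9]+x[0-9]+/~', '', $str);

// Lazy quantifiers give the same result here, since digits must run up to "x" and "/".
$lazy = preg_replace('/p[0-9]+?x[0-9]+?\//', '', $str);

echo $original, PHP_EOL; // unchanged
echo $greedy, PHP_EOL;   // https://example.com/img/photo.jpg
```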

Can you explain/simplify this regular expression (PCRE) in PHP?

preg_match('/.*MyString[ (\/]*([a-z0-9\.\-]*)/i', $contents, $matches);
I need to debug this one. I have a good idea of what it's doing but since I was never an expert at regular expressions I need your help.
Can you tell me what it does block by block (so I can learn)?
Can the syntax be simplified (I think there is no need to escape the dot with a backslash)?
The regexp...
'/.*MyString[ (\/]*([a-z0-9\.\-]*)/i'
.* matches any character zero or more times
MyString matches that string. But you are using case-insensitive matching, so the matched string will spell "mystring" with any capitalization
EDIT: (Thanks to Alan Moore) [ (\/]*. This matches any of the chars space, ( or /, repeated zero or more times. As Alan points out, the escape before the final / is there to stop the / being treated as the regexp delimiter.
EDIT: The ( does not need escaping and neither does the . (thanks AlexV) because:
All non-alphanumeric characters other than \, -, ^ (at the start) and
the terminating ] are non-special in character classes, but it does no
harm if they are escaped.
-- http://www.php.net/manual/en/regexp.reference.character-classes.php
The hyphen generally does need to be escaped, otherwise it will try to define a range. For example:
[A-Z] // matches all upper case letters of the alphabet
[A\-Z] // matches 'A', '-', and 'Z'
However, where the hyphen is at the end of the list you can get away with not escaping it (but it is always best to be in the habit of escaping it... I got caught out by this).
([a-z0-9\.\-]*) matches any string containing the characters a through z (note again that this is affected by the case-insensitive match), 0 through 9, a dot, or a hyphen, repeated zero or more times. The surrounding () capture this string, which means that $matches[1] will contain the string matched by [a-z0-9\.\-]*.
e.g.
<?php
$input = "aslghklfjMyString(james321-james.org)blahblahblah";
preg_match('/.*MyString[ (\/]*([a-z0-9.\-]*)/i', $input, $matches);
print_r($matches);
?>
outputs
Array
(
[0] => aslghklfjMyString(james321-james.org
[1] => james321-james.org
)
Note that because you use a case insensitive match...
$input = "aslghklfjmYsTrInG(james321898-james.org)blahblahblah";
Will also match and give the same answer in $matches[1]
Hope this helps....
Let's break this down step by step, removing the explained parts from the expression.
"/.*MyString[ (\/]*([a-z0-9\.\-]*)/i"
Let's first strip the regex delimiters (/i at the end means it's case-insensitive):
".*MyString[ (\/]*([a-z0-9\.\-]*)"
Then we've got a greedy wildcard, .* (match any character any number of times, backtracking until the next part of the pattern matches).
"MyString[ (\/]*([a-z0-9\.\-]*)"
Then match 'MyString' literally, followed by any number (note the '*') of any of the following: ' ', '(', '/'. Inside a character class the '(' does not need escaping, so this part is fine as written; only the '/' has to be escaped, because it is also the pattern delimiter.
"([a-z0-9\.\-]*)"
Then we get a capture group for any number of any of the following: a-z literals, 0-9 digits, '.', or '-'.
That's pretty much all of it.
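One way to keep such a breakdown next to the code is the x modifier, which lets the pattern carry its own comments. This is only a sketch: the "~" delimiter and the simplified class [a-z0-9.-] are my choices, not part of the original pattern, and the input is the one from the example above.

```php
<?php
// With the x flag, whitespace outside character classes is ignored and "#" starts a comment.
// "~" is used as delimiter so the slash inside the class needs no escaping.
$pattern = '~
    .*              # any characters, greedily
    MyString        # the literal text (case-insensitive because of the i flag)
    [ (/]*          # zero or more of: space, open parenthesis, slash
    ([a-z0-9.-]*)   # group 1: letters, digits, dots, hyphens
~xi';

$input = 'aslghklfjMyString(james321-james.org)blahblahblah';
preg_match($pattern, $input, $matches);

echo $matches[1], PHP_EOL; // james321-james.org
```

The comments must not contain the delimiter character or a single quote, since either would end the pattern or the PHP string early.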

how can i write this regex? ungreedy related

I'm sorry for the poor title, but it is a very generic question
I have to match this pattern
;AAAAAAA(BBBBBB,CCCCC,DDDDDD)
AAAAA = all characters starting from ";" to "(" (both ;( not included)
BBBBB = all characters starting from "(" to "," (both (, not included)
CCCCC = all characters starting from "," to "," (both ,, not included)
DDDDD = all characters starting from "," to ")" (both ,) not included)
The "all characters between x and y" is a problem that kills me every time
:(
I'm using PHP and I have to match all occurrences of this pattern (preg_match_all) that also, sadly, can be on multiple lines
Thank you in advance!
I would recommend you do not use an ungreedy quantifier, but instead make all repetitions mutually exclusive with their delimiters. What does this mean? It means, for instance, that A can be any character except (. Giving this regex:
;([^(]*)[(]([^,]*),([^,]*),([^)]*)[)]
Where the last [)] is not even necessary.
The PHP code would then look like this:
preg_match_all('/;([^(]*)[(]([^,]*),([^,]*),([^)]*)[)]/', $input, $matches);
$fullMatches = $matches[0];
$arrayOfAs = $matches[1];
$arrayOfBs = $matches[2];
$arrayOfCs = $matches[3];
$arrayOfDs = $matches[4];
As the comments show, my escaping technique is a matter of taste. This regex is of course equal to:
;([^(]*)\(([^,]*),([^,]*),([^)]*)\)
But I think that looks a lot more mismatched/unbalanced than the other variant. Take your pick!
Finally, for the question why this approach would be better than using ungreedy (lazy) quantifiers. Here is some good, general reading. Basically, when you use ungreedy quantifiers, the engine still has to backtrack. It tries one repetition first, then notices that ( after that doesn't match. So it has to go back into the repetition and consume another character. But then the ( still doesn't match, so back to the repetition again. With this approach however, the engine will consume as much as possible, when going into the repetition for the first time. And when all non-( characters are consumed, then the engine will be able to match the following ( right away.
You could use something like this code:
preg_match_all('/;(.*?)\((.*?),(.*?),(.*?)\)/s',$text,$matches);
See it on ideone.com.
Basically, you can use .*? (question mark being ungreedy), make sure to escape the parentheses, and you may need the s modifier to have it work on multiple lines.
Variables would be in an array: $matches
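Both approaches can be tried side by side. The sketch below uses a made-up input with one entry split across two lines; on well-formed input like this the two patterns should capture the same groups.

```php
<?php
// Hypothetical input: two entries, the second one spanning a line break.
$input = ";foo(1,2,3) some text ;bar(a,\nb,c)";

// Negated character classes: each part stops at its own delimiter, newlines included.
preg_match_all('/;([^(]*)[(]([^,]*),([^,]*),([^)]*)[)]/', $input, $negated);

// Lazy quantifiers: the s modifier lets "." match the newline as well.
preg_match_all('/;(.*?)\((.*?),(.*?),(.*?)\)/s', $input, $lazy);

print_r($negated[1]);          // the A parts: foo, bar
var_dump($negated === $lazy);  // bool(true) for this input
```

The two can differ on malformed input: the lazy version may run past a missing comma into the next entry, while the negated classes stop at the first delimiter they exclude.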

Remove number then a space from the start of a string

How would I go about removing numbers and a space from the start of a string?
For example, from '13 Adam Court, Cannock' remove '13 '
Because everyone else is going the \d+\s route I'll give you the brain-dead answer
$str = preg_replace("#^[0-9]+ #","",$str);
Word to the wise, don't use / as your delimiter in regex, you will experience the dreaded leaning-toothpick-problem when trying to do file paths or something like http://
:)
Use the same regex I gave in my JavaScript answer, but apply it using preg_replace():
preg_replace('/^\d+\s+/', '', $str);
Try this one :
^\d+ (.*)$
Like this :
preg_replace('/^\d+ (.*)$/', '$1', $string);
Resources :
preg_replace
regular-expressions.info
On the same topic :
Regular expression to remove number, then a space?
regular expression for matching number and spaces.
I'd use
/^\d+\s+/
It looks for a number of any size in the beginning of a string ^\d+
Then looks for a patch of whitespace after it \s+
When you use a backslash before certain letters it represents something...
\d represents a digit 0,1,2,3,4,5,6,7,8,9.
\s represents a whitespace character (space, tab, newline).
Add a plus sign (+) to the end and you can have...
\d+ a series of digits (number)
\s+ multiple spaces (typos etc.)
The same regex I gave you on your other question still applies. You just have to use preg_replace() instead.
Search for /^[\s\d]+/ and replace with the empty string. Eg:
$str = preg_replace('/^[\s\d]+/', '', $str);
This will remove digits and spaces in any order from the beginning of the string. For something that removes only a number followed by spaces, see BoltClock's answer.
If the input strings all have the same expected format and you will receive the same result from left trimming all numbers and spaces (no matter the order of their occurrence at the front of the string), then you don't actually need to fire up the regex engine.
I love regex, but know not to use it unless it provides a valuable advantage over a non-regex technique. Regex is often slower than non-regex techniques.
Use ltrim() with a character mask that includes spaces and digits.
Code: (Demo)
var_export(
ltrim('420 911 90210 666 keep this part', ' 0..9')
);
Output:
'keep this part'
It wouldn't matter if the string started with a space either. ltrim() will greedily remove all instances of spaces or numbers from the start of the string until it can't anymore.
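The two techniques only agree when nothing after the leading number starts with a digit. A small sketch of the difference, using the address from the question plus a second, made-up address:

```php
<?php
$address = '13 Adam Court, Cannock'; // from the question
$tricky  = '13 2nd Street';          // hypothetical input, invented for this sketch

// Regex: one run of digits followed by whitespace, anchored at the start.
$regexA = preg_replace('/^\d+\s+/', '', $address);
$regexB = preg_replace('/^\d+\s+/', '', $tricky);

// ltrim: strips every leading space or digit, in any order.
$trimA = ltrim($address, ' 0..9');
$trimB = ltrim($tricky, ' 0..9');

echo $regexA, ' | ', $trimA, PHP_EOL; // Adam Court, Cannock | Adam Court, Cannock
echo $regexB, ' | ', $trimB, PHP_EOL; // 2nd Street | nd Street
```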
