preg_replace pattern to remove pNUMBERxNUMBER - php

I'm trying to locate a pattern with preg_replace() and remove it.
I have a string that contains this: p130x130/. The numbers vary; they can be higher or lower. What I need to do is locate that substring and remove the whole thing.
I've been trying to use this:
preg_replace('/p+[0-9]+x+[0-9]"/', '', $str);
but that doesn't work for some reason. Would any of you know the correct regex?
Kind regards

First, remove the + quantifier after p. Then move the + quantifier from after the x to after your character class (i.e. x[0-9]+). Also remove the quote " inside your expression, which looks like a typo to me. You can also use a different delimiter to avoid escaping the ending slash.
$str = preg_replace('~p[0-9]+x[0-9]+/~', '', $str);
If the ending slash is also a typo, then this is what you're looking for:
$str = preg_replace('/p[0-9]+x[0-9]+/', '', $str);
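A quick check of the slash-terminated version with the ~ delimiter. The input strings below are made-up examples, assuming p130x130/ appears inside a longer path:

```php
<?php
// Hypothetical inputs: a size segment like p130x130/ embedded in a path.
$str   = 'images/p130x130/photo.jpg';
$other = 'a/p50x1200/b';

// ~ as the delimiter, so the trailing / needs no escaping.
$clean = preg_replace('~p[0-9]+x[0-9]+/~', '', $str);
$otherClean = preg_replace('~p[0-9]+x[0-9]+/~', '', $other);

echo $clean, "\n";      // images/photo.jpg
echo $otherClean, "\n"; // a/b
```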

The regex to match p130x130/ is:
p[0-9]+x[0-9]+\/

Try this:
$str = preg_replace("/p[0-9]+?x[0-9]+?\//is","",$str);
As a commenter asked, here is an explanation of the code.
I've used / as the delimiter, but you can use a different character to avoid escaping slashes.
The [0-9]+ part means: match any digit from 0 to 9 at least once, and as many times as possible. If I had written [0-9]*? instead, it could also have matched an empty string (* means zero or more, not one or more like +), which is probably not what you want anyway.
I've put a ? after each + to make it non-greedy. That's just a habit of mine (I used ereg a lot previously), and I don't think it's needed here.
Either way, it matches digits until it hits an x, then matches more digits until it hits a single forward slash. I've escaped that slash with a backslash because my delimiter is also a slash and I didn't want the pattern to end there.
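To see that the non-greedy ? really makes no difference here, this sketch runs the lazy and greedy forms side by side on a made-up string (the i and s modifiers are dropped because they don't affect this input). Each digit run must be followed by a literal x or /, so both forms end up consuming the same characters:

```php
<?php
$str = 'thumbs/p130x130/pic.png'; // hypothetical input

// Lazy version, as in the answer above.
$lazy = preg_replace('/p[0-9]+?x[0-9]+?\//', '', $str);
// Plain greedy version.
$greedy = preg_replace('/p[0-9]+x[0-9]+\//', '', $str);

var_dump($lazy === $greedy); // bool(true)
echo $lazy, "\n";            // thumbs/pic.png
```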

Related

Regex - Match Word As Long As Nothing Follows It

I'm having a little trouble with regex. I'm trying to test for a match, but only if nothing follows it. In the example below, if I go to test/create/1/2 it still matches. I only want to match if it's explicitly test/create/1 (but the 1 is dynamic).
if(preg_match('^test/create/(.*)^', 'test/create/1')):
// do something...
endif;
I've found some answers that suggest using $ before my delimiter but it doesn't appear to do anything. Or a combination of ^ and $ but I can't quite figure it out. Regex confuses the hell out of me!
EDIT:
I didn't really explain this well enough so just to clarify:
I need the if statement to return true if a URL is test/create/{id} - the {id} being dynamic (and of any length). If the {id} is followed by a forward slash the if statement should fail. So that if someone types in test/create/1/2 - it will fail because of the forward slash after the 1.
Solution
I went for thedarkwinter's answer in the end as it's what worked best for me, although other answers did work as well.
I also had to add a little extra to the regex to make sure it would work with hyphens as well, so the final code looked like this:
if(preg_match('^test/create/[\w-]*$^', 'test/create/1')):
// do something...
endif;
\w matches word characters, and $ matches the end of the string:
if(preg_match('^test/create/\w*$^', 'test/create/1'))
will match test/create/[word/num] with nothing following.
I think that's what you are after.
Edit: added * in \w*.
Here you go:
"/^test\\/create\\/([^\\/]*)$/"
This says:
The string starts with "test", followed by a forward slash (remember that the first backslash escapes the second, so PHP puts a literal backslash into the pattern, which in turn escapes the / for the regex), followed by "create", followed by a forward slash, followed by a captured group of everything that isn't a slash, which must then run to the end of the string.
Comment if you need more detail
I prefer to always use / as my delimiter because it has no special meaning as a regex character. I've seen # used too. I believe another answer uses ^, which means "start of string", so I wouldn't use it as a regex delimiter.
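Because the pattern above captures everything after test/create/, the id is available in the match array. A small check using the answer's own pattern string and a hypothetical id of 42:

```php
<?php
// Pattern copied from the answer; in a double-quoted string "\\/" becomes \/.
$pattern = "/^test\\/create\\/([^\\/]*)$/";

$matched = preg_match($pattern, 'test/create/42', $m);
echo $m[1], "\n"; // 42, the captured id

// A second path segment after the id makes the match fail.
$nested = preg_match($pattern, 'test/create/1/2');
var_dump($nested); // int(0)
```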
Use the following regular expression ($ denotes the end of the input):
'|test/create/[^/]+$|'
If you want to match only digits, use the following instead (\d matches a digit character):
'^test/create/\d+$^'
The ^ is an anchor for the beginning of the line, i.e. no characters occurring before the ^ . Use a $ to designate the end of the string, or end of the line.
EDIT: wanted to add a suggestion as well:
Your solution is fine and works, but in terms of style I'd advise against using the caret (^) as a delimiter, especially because it has special meaning as either negation or a start-of-line anchor, so it's a bit confusing to read it that way. You can legally use most special characters as long as they don't occur in the regex itself (or are escaped). I'm just talking about style and maintainability here.
Of course, nearly every potential delimiter has some special meaning, but you also often see ^ at the beginning of a regex, so I'd choose another alternative. For example, # is a good choice here:
if(preg_match('#test/create/[\w-]*$#', $mystring)) {
//etc
}
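A sketch of the # version with both anchors. The leading ^ is an addition not present in the snippet above; without it, a path that merely ends in test/create/1 also matches. The helper name isCreateUrl is hypothetical:

```php
<?php
// Hypothetical helper: ^ and $ pin the whole string, # avoids escaping slashes.
function isCreateUrl(string $path): bool
{
    return preg_match('#^test/create/[\w-]*$#', $path) === 1;
}

var_dump(isCreateUrl('test/create/1'));       // bool(true)
var_dump(isCreateUrl('test/create/my-post')); // bool(true)
var_dump(isCreateUrl('test/create/1/2'));     // bool(false)
var_dump(isCreateUrl('foo/test/create/1'));   // bool(false), rejected by ^

// The unanchored-at-start version accepts the prefixed path.
$loose = preg_match('#test/create/[\w-]*$#', 'foo/test/create/1');
var_dump($loose); // int(1)
```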
The regex abc$ will match abc only when it's at the end of the string:
abcd # no match
dabc # match
abc # match
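The same three cases run through preg_match(). One PCRE detail worth knowing: by default $ also matches just before a single trailing newline, and the D (dollar end-only) modifier turns that off:

```php
<?php
$results = [];
foreach (['abcd', 'dabc', 'abc'] as $s) {
    $results[$s] = preg_match('/abc$/', $s); // 1 = match, 0 = no match
}
print_r($results);

// $ matches before a final "\n" by default...
$trailingNewline = preg_match('/abc$/', "abc\n");
// ...but not with the D modifier.
$endOnly = preg_match('/abc$/D', "abc\n");
```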

A preg_replace puzzle: replacing zero or more of a char at the end of the subject

Say $d is a directory path and I want to ensure that it starts and ends with exactly one slash (/). It may initially have zero, one or more leading and/or trailing slashes.
I tried:
preg_replace('%^/*|/*$%', '/', $d);
which works for the leading slash but, to my surprise, yields two trailing slashes if $d has at least one trailing slash. If the subject is, e.g., 'foo///', then preg_replace() first matches the three trailing slashes and replaces them with one slash, and then it matches zero slashes at the end and replaces that with a slash. (You can verify this by replacing the second argument with '[$0]'.) I find this rather counterintuitive.
While there are many other ways to solve the underlying problem (and I implemented one) this became a PCRE puzzle for me: what (scalar) pattern in a single preg_replace does this job?
ADDITIONAL QUESTION (edit)
Can anyone explain why this pattern matches the way it does at the end of the string but does not behave similarly at the start?
$path = '/' . trim($path, '/') . '/';
This first removes all slashes at beginning or end and then adds single ones again.
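Wrapped in a small function (the name normalizeDir is hypothetical), the trim approach looks like this. Note one edge case: an empty or all-slash input comes out as // rather than a single slash:

```php
<?php
// Ensure exactly one leading and one trailing slash.
function normalizeDir(string $path): string
{
    return '/' . trim($path, '/') . '/';
}

$a = normalizeDir('foo');           // /foo/
$b = normalizeDir('///foo/bar///'); // /foo/bar/
$c = normalizeDir('');              // // (edge case)
echo $a, "\n", $b, "\n", $c, "\n";
```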
Given a regex like /* that can legitimately match zero characters, the regex engine has to make sure that it never matches more than once in the same spot, or it would get stuck in an infinite loop. Thus, if it does consume zero characters, the engine jumps forward one position before attempting another match. As far as I know, that's the only situation in which the regex engine does anything on its own initiative.
What you're seeing is the opposite situation: the regex consumes one or more characters, then on the next go-round it tries to start matching at the spot where it left off. Never mind that this particular regex can't match anything but the one character, and it already matched as many of those as it could; it still has the option of matching nothing, so that's what it does.
So, why doesn't your regex match twice at the beginning, like it does at the end? Because of the start anchor (^). If the subject starts with one or more slashes, it consumes them and then tries to match zero slashes, but it fails because it's not at the beginning of the string any more. And if there are no slashes at the beginning, the manual bump-along has the same effect.
At the end of the subject it's a different story. If there are no slashes there, it matches nothing, tries to bump along and fails; end of story. But if it does match one or more slashes, it consumes them and tries to match again, and succeeds because the $ anchor still matches.
So in general, if you want to prevent this kind of double match, you can either add a condition to the beginning of the match to prevent it, like the ^ anchor does for the first alternative:
preg_replace('%^/*|(?<!/)/*$%', '/', $d);
...or make sure that part of the regex has to consume at least one character:
preg_replace('%^/*|([^/])/*$%', '$1/', $d);
But in this case you have a much simpler option, as demonstrated by John Kugelman: just capture the part you want to keep and chuck the rest.
preg_replace('%^/*(.*?)/*$%', '/\1/', $d);
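A side-by-side run of the question's pattern and the three fixes from this answer on the 'foo///' subject described in the question. Only the original pattern produces the doubled trailing slash:

```php
<?php
// name => [pattern, replacement]
$patterns = [
    'original'   => ['%^/*|/*$%', '/'],
    'lookbehind' => ['%^/*|(?<!/)/*$%', '/'],
    'consume'    => ['%^/*|([^/])/*$%', '$1/'],   // unmatched $1 becomes ''
    'capture'    => ['%^/*(.*?)/*$%', '/\1/'],
];

$results = [];
foreach ($patterns as $name => [$regex, $replacement]) {
    $results[$name] = preg_replace($regex, $replacement, 'foo///');
}
print_r($results); // original => /foo//, the other three => /foo/
```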
It can be done in a single preg_replace:
preg_replace('/^\/{2,}|\/{2,}$|^([^\/])|([^\/])$/', '\2/\1', $d);
A small change to your pattern would be to separate out the two key concerns at the end of the string:
Replace multiple slashes with one slash
Replace no slashes with one slash
A pattern for that (and the existing part for matching at the start of the string) would look like:
#^/*|/+$|$(?<!/)#
A slightly less concise, but more precise, option would be to be very explicit about only matching zero or two-or-more slashes; the notion being, why replace one slash with one slash?
#^(?!/)|^/{2,}|/{2,}$|$(?<!/)#
Aside: nikic's suggestion to use trim (to remove leading/trailing slashes, then add your own) is a good one.
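A quick check of both patterns from this answer, each used with '/' as the replacement, on a few sample inputs (the inputs are illustrative, not from the question). The precise version leaves an already-normalized path untouched because it never matches a single slash:

```php
<?php
$concise = '#^/*|/+$|$(?<!/)#';
$precise = '#^(?!/)|^/{2,}|/{2,}$|$(?<!/)#';

$results = [];
foreach (['foo', '/foo/', '///foo///'] as $in) {
    $results[$in] = [
        preg_replace($concise, '/', $in),
        preg_replace($precise, '/', $in),
    ];
}
print_r($results); // every entry: ['/foo/', '/foo/']
```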

RegEx string "preg_replace"

I need to do a "find and replace" on about 45k lines of a CSV file and then put this into a database.
I figured I should be able to do this with PHP and preg_replace but can't seem to figure out the expression...
The lines consist of one field and are all in the following format:
"./1/024/9780310320241/SPSTANDARD.9780310320241.jpg" or "./t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg"
The first part will always be a period, the second part will always be one alphanumeric character, the third will always be three alphanumeric characters and the fourth should always be between 1 and 13 alphanumeric characters.
I came up with the following which seems to be right however I will openly profess to not knowing very much at all about regular expressions, it's a little new to me! I'm probably making a whole load of silly mistakes here...
$pattern = "/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z]{1,13}\/)$/";
$new = preg_replace($pattern, " ", $i);
Anyway any and all help appreciated!
Thanks,
Phil
The only mistake I see is the end-of-string anchor $, which should be removed. Your expression is also missing the _ character:
/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z_]{1,13}\/)/
A more general pattern would be to just exclude the /:
/^(\.\/[^\/]{1}\/[^\/]{3}\/[^\/]{1,13}\/)/
You should use PHP's built-in CSV parser to extract the values from the CSV before matching any patterns.
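A sketch combining the two suggestions: parse each line as CSV first, then strip the directory prefix with the corrected pattern (no trailing $, underscore allowed in the fourth part). The two lines are the samples from the question; in practice they would come from fgetcsv() in a loop. The empty $escape argument is an assumption to get plain RFC 4180-style quoting:

```php
<?php
$lines = [
    '"./1/024/9780310320241/SPSTANDARD.9780310320241.jpg"',
    '"./t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg"',
];

$pattern = '/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z_]{1,13}\/)/';

$files = [];
foreach ($lines as $line) {
    // The CSV parser strips the surrounding quotes.
    $field = str_getcsv($line, ',', '"', '')[0];
    $files[] = preg_replace($pattern, '', $field);
}
print_r($files); // SPSTANDARD.9780310320241.jpg, SPSTANDARD.8204909_flat.jpg
```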
I'm not sure I understand what you're asking. Do you mean every line in the file looks like that, and you want to process all of them? If so, this regex would do the trick:
'#^.*/#'
That simply matches everything up to and including the last slash, which is what your regex would do if it weren't for that rogue '$' everyone's talking about. If there are other lines in other formats that you want to leave alone, this regex will probably suit your needs:
'#^\./\w/\w{3}/\w{1,13}/#'
Notice how I changed the regex delimiter from '/' to '#' so I don't have to escape the slashes inside. You can use almost any punctuation character for the delimiters (but of course they both have to be the same).
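Comparing the two patterns from this answer: on the sample line they give the same result, but on a line in some other format (the notes/readme.txt line is a hypothetical example) only the strict one leaves it alone. \w includes the underscore, so 8204909_flat is covered:

```php
<?php
$line  = './t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg';
$other = 'notes/readme.txt'; // hypothetical line in a different format

$loose  = '#^.*/#';                   // everything up to the last slash
$strict = '#^\./\w/\w{3}/\w{1,13}/#'; // only the expected prefix

$a = preg_replace($loose,  '', $line);  // SPSTANDARD.8204909_flat.jpg
$b = preg_replace($strict, '', $line);  // SPSTANDARD.8204909_flat.jpg
$c = preg_replace($loose,  '', $other); // readme.txt
$d = preg_replace($strict, '', $other); // notes/readme.txt (unchanged)
echo "$a\n$b\n$c\n$d\n";
```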
The $ means the end of the string. So your pattern would match ./1/024/9780310320241/ and ./t/fla/8204909_flat/ if they were alone on their line. Remove the $ and it will match the first four parts of your string, replacing them with a space.
$pattern = "/(\.\/[0-9a-z]{1}\/[0-9a-z]{3}\/[0-9a-z\_]+\.(jpg|bmp|jpeg|png))\n/is";
I just noticed that your example string doesn't end with /, so maybe you should remove it from the end of your pattern. Also, the underscore is used in the filename and should be in the character class.

Regex Question Again!

I really don't know what my problem is lately, but Regex seems to be giving me the most trouble.
Very simple thing I need to do, but can't seem to get it:
I have a URI that is either /xmlfeed or /xmlfeed/what/is/this.
I want to match /xmlfeed on any occasion.
I've tried many variations of the following:
preg_match('/(\/.*?)\/?/', $_SERVER['REQUEST_URI'], $match);
I would read this as: match a forward slash, then match any characters until you come to an optional forward slash.
Why not:
preg_match('#/[^/]+#', $_SERVER['REQUEST_URI'], $match);
?
$match[0] will give you what you need
Why use a regex if it confuses you?
$string = "/xmlfeed/what/is/this";
$s = explode("/",$string,3);
print "/".$s[1]."\n";
output
$ php test.php
/xmlfeed
Your problem is the reluctant quantifier. After the initial slash is matched, .*? consumes the minimum number of characters it's allowed to, which is zero. Then /? takes over; it doesn't see a slash in the next position (which is immediately after the first slash), but that's okay because it's optional. The result: the regex always matches a single slash, and group #1 always captures just that slash.
Obviously, you can't just replace the reluctant quantifier with a greedy one. But if you replace the .* with something that can't match a slash, you don't have to worry about greediness. That's what K Prime's regex, '#/[^/]+#' does. Notice as well how it uses # as the regex delimiter and avoids the necessity of escaping slashes within the regex.
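A short run of both regexes against the two URI shapes from the question, confirming that the reluctant version stops after the first slash while the negated class runs up to the next slash (or the end):

```php
<?php
$uris = ['/xmlfeed', '/xmlfeed/what/is/this'];

$reluctant = [];
$negated   = [];
foreach ($uris as $uri) {
    preg_match('/(\/.*?)\/?/', $uri, $m);
    $reluctant[] = $m[0]; // always just "/"

    preg_match('#/[^/]+#', $uri, $m);
    $negated[] = $m[0];   // "/xmlfeed" both times
}
print_r($reluctant);
print_r($negated);
```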
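In PHP, '/(\/.*?)\/?/' is a string containing a regular expression, so there are two layers of escaping to think about: the string and the regex.
If the string layer stripped the backslashes, the regex would become /(/.*?)\/?/: a forward slash that starts the expression, an opening parenthesis, then a forward slash that ends the pattern early, and it would error because the parenthesis is never closed. (In a single-quoted PHP string, \/ is actually kept as-is, so this particular pattern does reach the regex engine intact.)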
So, to get this working:
Remember to escape characters with special meanings in strings and regular expressions
Don't confuse the forward slash / with the backslash \
You want to match everything after and including the first slash, but before any (optional) second slash (so we don't want the ? that makes it non-greedy):
/(\/[^\/]*)/
Which, expressed as a PHP string is:
'/(\\/[^\\/]*)/'
I know this avoids the regex, and therefore the question, but how about splitting the URI at the slashes into an array?
Then you can deal with the elements of the array, and ignore the bits of the uri you don't want.
Using the suggestions posted, I ended up trying this:
echo $_SERVER['REQUEST_URI'];
preg_match("/(\/.*)[^\/]/", $_SERVER['REQUEST_URI'], $match);
$url = "http://".$_SERVER['SERVER_NAME'].$match[0];
foreach($match as $k=>$v){
echo "<h1>$k - $v</h1>";
}
I also tried it without the .* and without the parentheses.
Without the .* AND () it returns the / with the next character ONLY.
As it is, it just returns the entire URI every time.
So, when run with the code above, the output is:
/tea-time-blog/post/20
0 - /tea-time-blog/post/20
1 - /tea-time-blog/post/2
This code is being eval()'d, by the way. I don't think that should make any difference in the way PHP handles the regular expression.

Including new lines in PHP preg_replace function

I'm trying to match a string that may appear over multiple lines. It starts and ends with a specific string:
{a}some string
can be multiple lines
{/a}
Can I grab everything between {a} and {/a} with a regex? It seems the . doesn't match new lines, but I've tried the following with no luck:
$template = preg_replace( $'/\{a\}([.\n]+)\{\/a\}/', 'X', $template, -1, $count );
echo $count; // prints 0
It matches . or \n when they're on their own, but not together!
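Use the s modifier, so that . also matches newlines. Note that a dot inside a character class is a literal dot, so [.\n] has to become a plain . for the modifier to help (and the $ before the opening quote has to go; it's a syntax error):
$template = preg_replace('/\{a\}(.+)\{\/a\}/s', 'X', $template, -1, $count);
// the s after the closing delimiter is the modifier
echo $count;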
I think you've got more problems than just the dot not matching newlines, but let me start with a formatting recommendation. You can use just about any punctuation character as the regex delimiter, not just the slash ('/'). If you use another character, you won't have to escape slashes within the regex. I understand '%' is popular among PHPers; that would make your pattern argument:
'%\{a\}([.\n]+)\{/a\}%'
Now, the reason that regex didn't work as you intended is that the dot loses its special meaning when it appears inside a character class (the square brackets), so [.\n] just matches a dot or a linefeed. What you were looking for was (?:.|\n), but I would have recommended matching the carriage return as well as the linefeed:
'%\{a\}((?:.|[\r\n])+)\{/a\}%'
That's because the word "newline" can refer to the Unix-style "\n", Windows-style "\r\n", or older-Mac-style "\r". Any given web page may contain any of those or a mixture of two or more styles; a mix of "\n" and "\r\n" is very common. But with /s mode (also known as single-line or DOTALL mode), you don't need to worry about that:
'%\{a\}(.+)\{/a\}%s'
However, there's another problem with the original regex that's still present in this one: the + is greedy. That means that if there's more than one {a}...{/a} sequence in the text, the first time your regex is applied it will match all of them, from the first {a} to the last {/a}. The simplest way to fix that is to make the + ungreedy (a.k.a. "lazy" or "reluctant") by appending a question mark:
'%\{a\}(.+?)\{/a\}%s'
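To see the greedy/lazy difference, here is a sketch with a made-up template containing two {a}...{/a} blocks, the first spanning a newline. The lazy pattern replaces each block separately; the greedy one swallows everything from the first {a} to the last {/a}:

```php
<?php
$template = "{a}first\nline{/a} keep {a}second{/a}"; // hypothetical template

$lazy   = preg_replace('%\{a\}(.+?)\{/a\}%s', 'X', $template, -1, $lazyCount);
$greedy = preg_replace('%\{a\}(.+)\{/a\}%s',  'X', $template, -1, $greedyCount);

echo $lazy, " ($lazyCount)\n";     // X keep X (2)
echo $greedy, " ($greedyCount)\n"; // X (1)
```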
Finally, I don't know what to make of the '$' before the opening quote of your pattern argument. I don't do PHP, but that looks like a syntax error to me. If someone could educate me in this matter, I'd appreciate it.
From http://www.regular-expressions.info/dot.html:
"The dot matches a single character,
without caring what that character is.
The only exception are newline
characters."
You will need to add the s modifier after the closing delimiter of your expression.
