Regular Expression to remove underscore and string


I need to use PHP regex to remove '_normal' from the end of this url.
http://a0.twimg.com/profile_images/3707137637/8b020cf4023476238704a9fc40cdf445_normal.jpeg
so that it becomes
http://a0.twimg.com/profile_images/3707137637/8b020cf4023476238704a9fc40cdf445.jpeg.
I tried
$prof_img = preg_replace('_normal', '', $prof_img);
but the underscore seems to be throwing things off.

As others have stated, str_replace is probably the best option for this simple example.
The problem with your specific code is that your regex string is undelimited; you need to do this instead:
$prof_img = preg_replace('/_normal/', '', $prof_img);
See PCRE regex syntax for a reference.
Inside a PCRE pattern the underscore is an ordinary character. In your call, though, PHP takes the leading _ as the opening delimiter and then fails because it finds no closing one.
If you require that only _normal at the end of the filename is matched, you can use:
$prof_img = preg_replace('/_normal(\.[^\.]+)$/', '$1', $prof_img);
See preg_replace for more information on how this works.
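To make the difference between the plain and the anchored replacement concrete, here is a minimal sketch. The second URL (example.com) is a made-up input that is not in the question; it only serves to show where the two approaches diverge.

```php
<?php
// URL taken from the question.
$prof_img = 'http://a0.twimg.com/profile_images/3707137637/8b020cf4023476238704a9fc40cdf445_normal.jpeg';

// Plain substring removal: strips every "_normal" it finds.
$plain = str_replace('_normal', '', $prof_img);

// Anchored regex: strips "_normal" only when it sits directly before the extension.
$anchored = preg_replace('/_normal(\.[^.]+)$/', '$1', $prof_img);

// Hypothetical URL where "_normal" also appears in a directory name.
$tricky = 'http://example.com/img_normal/abc_normal.jpeg';
$trickyPlain = str_replace('_normal', '', $tricky);                    // both occurrences removed
$trickyAnchored = preg_replace('/_normal(\.[^.]+)$/', '$1', $tricky);  // only the last one removed

echo $plain, "\n", $anchored, "\n", $trickyPlain, "\n", $trickyAnchored, "\n";
```

For the URL in the question both calls give the same result; they only differ when _normal can occur elsewhere in the string.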

Try using str_replace; it's much more efficient than regex for something like this.
However, if you want to use regular expressions, you need a delimiter:
preg_replace('|_normal|','', $url);

str_replace should work.
$prof_img = str_replace('_normal', '', $prof_img);

You just forgot to add delimiters around your regex.
http://www.php.net/manual/en/regexp.reference.delimiters.php
When using the PCRE functions, it is required that the pattern is
enclosed by delimiters. A delimiter can be any non-alphanumeric,
non-backslash, non-whitespace character.
Often used delimiters are forward slashes (/), hash signs (#) and
tildes (~). The following are all examples of valid delimited
patterns.
$prof_img = preg_replace('/_normal/', '', $prof_img);
$prof_img = preg_replace('#_normal#', '', $prof_img);
$prof_img = preg_replace('~_normal~', '', $prof_img);
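As a rough illustration of the delimiter rule, the sketch below runs the undelimited call next to delimited ones. The shortened file name is made up, and the preg_quote() line is an extra suggestion that is not part of the original answers; it is useful when the text to match comes from a variable.

```php
<?php
$prof_img = 'abc_normal.jpeg'; // shortened, hypothetical file name

// Undelimited: PHP reads the leading "_" as the opening delimiter, finds no
// closing one, emits a warning (suppressed here with @) and returns null.
$broken = @preg_replace('_normal', '', $prof_img);

// Any of the usual delimiters works.
$slash = preg_replace('/_normal/', '', $prof_img);
$hash  = preg_replace('#_normal#', '', $prof_img);

// preg_quote() escapes regex metacharacters and, via its second argument, the delimiter.
$needle = '_normal';
$quoted = preg_replace('/' . preg_quote($needle, '/') . '/', '', $prof_img);

var_dump($broken, $slash, $hash, $quoted);
```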

You can decompose the URL first, perform the replacement and stick the parts back together, i.e.
$url = 'http://a0.twimg.com/profile_images/3707137637/8b020cf4023476238704a9fc40cdf445_normal.jpeg';
$parts = pathinfo($url);
// transform: strip a trailing "_normal" from the file name, then rebuild the URL
// (pathinfo() returns dirname without a trailing slash, so the "/" is added back here)
$url = sprintf('%s/%s.%s',
    $parts['dirname'],
    preg_replace('/_normal$/', '', $parts['filename']),
    $parts['extension']
);
You might note two differences between your expression and mine:
Yours wasn't delimited.
Mine is anchored, i.e. it only removes _normal if it occurs at the end of the file name.

Using non-capturing groups, you can also try like this:
$prof_img = preg_replace('/(.+)(?:_normal)(.+)/', '$1$2', $prof_img);
The parts before and after _normal are captured as $1 and $2 and written back, so only _normal itself is dropped.

Related

Regexp for preg_replace in PHP

I have strings like this (some examples):
F7998FM3213/02F
J442554NM/05
K439459845/34D
I need to use PHP with preg_replace and regular expressions to delete all non-numeric characters in any string, after the forward-slash, '/'.
For example the codes above would look like this afterwards:
F7998FM3213/02
J442554NM/05
K439459845/34

If you're going for readability, something like this would be perfect:
$parts = explode("/",$line,2);
$parts[1] = preg_replace("/\D/","",$parts[1]);
$output = implode("/",$parts);
However, for conciseness and based entirely on the examples you have given, try this:
$output = preg_replace("/\D+$/","",$input);
This will strip any non-numeric characters from the end of the string, which seems to be what you're after based on your examples.
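The two variants behave the same on the sample codes but not on every input. Here is a minimal sketch comparing them; the function names are made up for illustration, and the extra input K439459845/34D34 is borrowed from another answer to this question.

```php
<?php
// Hypothetical helper: remove every non-digit after the first "/".
function digits_only_after_slash(string $code): string
{
    $parts = explode('/', $code, 2);
    if (count($parts) === 2) {
        $parts[1] = preg_replace('/\D/', '', $parts[1]);
    }
    return implode('/', $parts);
}

// Hypothetical helper: remove only a trailing run of non-digits.
function strip_trailing_non_digits(string $code): string
{
    return preg_replace('/\D+$/', '', $code);
}

foreach (['F7998FM3213/02F', 'J442554NM/05', 'K439459845/34D', 'K439459845/34D34'] as $code) {
    printf("%s | %s | %s\n", $code, digits_only_after_slash($code), strip_trailing_non_digits($code));
}
```

For K439459845/34D34 the first helper returns K439459845/3434, while the second leaves the string untouched because the non-digit is not at the end.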

You can use this:
$subject = <<<LOD
F7998FM3213/02F
J442554NM/05
K439459845/34D
K439459845/34D34
LOD;
echo preg_replace('~^[^/]*+/\K|[^\d\n]++~m', '', $subject);
Explanation:
The regex is an alternation between two branches:
The first matches from the beginning of the line up to and including the /; the \K then discards that text from the match, so nothing before the / is replaced.
The second matches one or more characters that are neither a digit nor a newline.
Since the first branch is tried first at the beginning of each line, only the non-digit characters after the / are removed.

To remove all \D anywhere after a / you could replace:
(?:/\K|\G(?!^))(\d*)\D+
with $1. Like:
preg_replace(',(?:/\K|\G(?!^))(\d*)\D+,', '$1', $str);
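A small sketch of how this pattern behaves, applied to one code at a time. The wrapper function name is made up, and the third input has an extra trailing X added to show that more than one run of non-digits is handled.

```php
<?php
// Hypothetical wrapper around the pattern above.
function strip_non_digits_after_slash(string $code): string
{
    // "/\K" starts a match right after the slash; "\G(?!^)" continues where the
    // previous match ended, so nothing before the slash is ever touched.
    return preg_replace(',(?:/\K|\G(?!^))(\d*)\D+,', '$1', $code);
}

$results = array_map('strip_non_digits_after_slash', [
    'F7998FM3213/02F',
    'J442554NM/05',
    'K439459845/34D34X',
]);

print_r($results);
```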

Making a url regex global

I've been searching for a regex to replace plain-text URLs in a string (the string can contain more than one URL) by:
url
and I found this:
http://mathiasbynens.be/demo/url-regex
I would like to use diegoperini's regex (which according to the tests is the best):
_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS
But I want to make it global so that it replaces all the URLs in a string.
When I use this:
/_(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?_iuS/g
It does not work. How do I make this regex global, and what do the underscore at the beginning and the "_iuS" at the end mean?
I would like to use it with php so I am using:
preg_replace($regex, '$0', $examplestring);

The underscores are the regex delimiters; the i, u and S are pattern modifiers:
i (PCRE_CASELESS)
If this modifier is set, letters in the pattern match both upper and lower
case letters.
u (PCRE_UTF8)
Pattern and subject strings are treated as UTF-8. This is what allows the
\x{00a1}-\x{ffff} ranges in the pattern. It is not the same as the uppercase
U (PCRE_UNGREEDY) modifier, which inverts the greediness of the quantifiers.
S
When a pattern is going to be used several times, it is worth spending more
time analyzing it in order to speed up the time taken for matching. If this
modifier is set, then this extra analysis is performed. At present, studying
a pattern is useful only for non-anchored patterns that do not have a single
fixed starting character.
For more information see http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
When you added the / ... /g, you added a second pair of regex delimiters plus the modifier g, which does not exist in PCRE; that's why it did not work. preg_replace already replaces every match in the string, so no global flag is needed. Removing the ^ and $ anchors, as you did in your second pattern, is what lets it match URLs inside a longer string.
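To see the "global by default" behaviour without the full pattern, here is a minimal sketch. The short regex is a deliberately simplified stand-in for the long URL pattern, and the sample text and the [$0] replacement are made up for illustration.

```php
<?php
// Simplified stand-in for the long URL pattern (an assumption, not diegoperini's regex).
// "_" is the delimiter; i = case-insensitive, u = treat pattern and subject as UTF-8.
$regex = '_(?:https?|ftp)://\S+_iu';
$text  = 'See HTTP://example.com/a and ftp://example.org/b for details.';

// preg_replace() replaces every match unless a limit is passed as the fourth argument.
$all   = preg_replace($regex, '[$0]', $text);
$first = preg_replace($regex, '[$0]', $text, 1);

// preg_match_all() returns the number of matches found.
$count = preg_match_all($regex, $text);

echo $all, "\n", $first, "\n", $count, "\n";
```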

I agree with @verdesmarald and used this pattern in the following function:
$string = preg_replace_callback(
    '_(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?_iuS',
    // An anonymous function replaces create_function(), which was removed in PHP 8.0.
    function ($match) {
        // Shorten the matched URL for display: drop the scheme and "www.", then truncate.
        $m = trim(strtolower($match[0]));
        $m = str_replace(['http://', 'https://', 'ftp://', 'www.'], '', $m);
        if (strlen($m) > 25) {
            $m = substr($m, 0, 25) . '...';
        }
        return $m;
    },
    $string
);
return $string;
It seems to do the trick and resolved an issue I was having. As @verdesmarald said, removing the ^ and $ characters allowed the pattern to work even in my preg_replace_callback().
The only thing that concerns me is how efficient the pattern is. If used in a busy, high-traffic web app, could it cause a bottleneck?
UPDATE
The above regex pattern breaks if there is a trailing dot at the end of the path section of a URL, like so: http://www.mydomain.com/page.. To solve this I modified the final part of the regex pattern by adding ^. so that it looks like [^\s^.]. As I read it: do not match whitespace or a dot. (Inside a character class only the leading ^ negates; the second ^ is a literal caret, so [^\s.] expresses the same intent without also excluding carets.)
In my tests so far it seems to be working fine.

php replace problem

There are some URLs like 11162_sport.html, 11451_sport.html, 11245_sport.html or 231sport.html.
When a URL looks like XXXXX_sport.html, I want to replace it with XXXXX_football.html, giving 11162_football.html, 11451_football.html and 11245_football.html, while 231sport.html stays unchanged.
How do I replace them? With $newurl = preg_replace("_sport.html","_football.html",$url)? Thanks.

Simply do $newurl = str_replace("_sport.html", "_football.html", $url);
This is faster than doing a preg_replace() and more accurate.
See the manual on str_replace.

You can use str_replace for such a simple replacement.

If it must be regular expressions, do:
preg_replace('/_sport\.html$/', '_football.html', $url);
str_replace() would indiscriminately replace all occurrences of _sport.html, whereas a regular expression with an end-of-string anchor ($) will only replace the pattern at the end of the URL.
The dot needs to be escaped because otherwise it would match any character (except newlines).
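The following sketch puts both points side by side. The third URL and the name 7_sportxhtml are made-up inputs, chosen only to expose the differences.

```php
<?php
$anchored = '/_sport\.html$/';

$results = [];
foreach (['11162_sport.html', '231sport.html', 'x_sport.html/11245_sport.html'] as $url) {
    $results[$url] = [
        // Replaces every occurrence, wherever it is.
        'str_replace'  => str_replace('_sport.html', '_football.html', $url),
        // Replaces only an occurrence at the very end.
        'preg_replace' => preg_replace($anchored, '_football.html', $url),
    ];
}

// Unescaped dot: "." matches any character, so this unrelated name is rewritten too.
$loose  = preg_replace('/_sport.html$/', '_football.html', '7_sportxhtml');
// Escaped dot: the name is left alone.
$strict = preg_replace($anchored, '_football.html', '7_sportxhtml');

print_r($results);
echo $loose, "\n", $strict, "\n";
```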

using preg_match to strip specified underscore in php

There has always been a confusion with preg_match in php.
I have a string like this:
apsd_01_03s_somedescription
apsd_02_04_somedescription
Can I use preg_match to strip off everything from the third underscore onward, including the third underscore itself?
thanks.

Try this:
preg_replace('/^([^_]*_[^_]*_[^_]*).*/', '$1', $str)
This will take only the first three sequences that are separated by _. So everything from the third _ on will be removed.

If you want to strip the "_somedescription" part: preg_replace('/([^_]*)_([^_]*)_([^_]*)_(.*)/', '$1_$2_$3', $str);

I agree with Gumbo's answer, however, instead of using regular expressions, you can use PHP's array functions:
$s = "apsd_01_03s_somedescription";
$parts = explode("_", $s);
echo implode("_", array_slice($parts, 0, 3));
// apsd_01_03s
This method appears to execute similarly in speed, compared to a regular expression solution.
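For comparison, here is a minimal sketch that runs the regex from the first answer and the array approach on both sample strings. The two function names are made up for illustration.

```php
<?php
// Hypothetical helper: regex version, keeps everything before the third underscore.
function first_three_regex(string $s): string
{
    return preg_replace('/^([^_]*_[^_]*_[^_]*).*/', '$1', $s);
}

// Hypothetical helper: array version, keeps the first three "_"-separated parts.
function first_three_array(string $s): string
{
    return implode('_', array_slice(explode('_', $s), 0, 3));
}

foreach (['apsd_01_03s_somedescription', 'apsd_02_04_somedescription'] as $s) {
    echo first_three_regex($s), ' ', first_three_array($s), "\n";
}
```

Both helpers return the input unchanged when it contains fewer than three underscores.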

If the third underscore is the last one, you can do this:
preg_replace('/^(.+)_.+?$/', '$1', $str);

Replacing HTML attributes using a regex in PHP

OK, I know that I should use a DOM parser, but this is to stub out some code that's a proof of concept for a later feature, so I want to quickly get some functionality on a limited set of test code.
I'm trying to strip the width and height attributes from chunks of HTML, in other words, replace
width="number" height="number"
with a blank string.
The function I'm trying to write looks like this at the moment:
function remove_img_dimensions($string,$iphone) {
$pattern = "width=\"[0-9]*\"";
$string = preg_replace($pattern, "", $string);
$pattern = "height=\"[0-9]*\"";
$string = preg_replace($pattern, "", $string);
return $string;
}
But that doesn't work.
How do I make that work?

PHP is unique among the major languages in that, although regexes are specified in the form of string literals like in Python, Java and C#, you also have to use regex delimiters like in Perl, JavaScript and Ruby.
Be aware, too, that you can use single-quotes instead of double-quotes to reduce the need to escape characters like double-quotes and backslashes. It's a good habit to get into, because the escaping rules for double-quoted strings can be surprising.
Finally, you can combine your two replacements into one by means of a simple alternation:
$pattern = '/(width|height)="[0-9]*"/i';
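Put together, the function from the question could look roughly like the sketch below. The function name and the sample markup are made up, and the leading \s+ is an addition that is not in the pattern above: it removes the space in front of each attribute and stops the pattern from matching inside names such as data-width.

```php
<?php
// Hypothetical rewrite of the question's function using a single delimited pattern.
function strip_img_dimensions(string $html): string
{
    // \s+ : the whitespace before the attribute (also prevents matching "data-width")
    // (?:width|height) : either attribute name, case-insensitive because of /i
    return preg_replace('/\s+(?:width|height)="[0-9]*"/i', '', $html);
}

echo strip_img_dimensions('<img src="a.png" width="100" HEIGHT="50" alt="x">'), "\n";
echo strip_img_dimensions('<img data-width="5" width="100">'), "\n";
```

As the question already says, this is only a stopgap for controlled test input; a DOM parser is the more robust route.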

Your pattern needs the start/end pattern character. Like this:
$pattern = "/height=\"[0-9]*\"/";
$string = preg_replace($pattern, "", $string);
"/" is the usual character, but most characters would work ("|pattern|","#pattern#",whatever).

I think you're missing the delimiters (which can be //, || or various other pairs of characters) that need to surround a regular expression in the string. Try changing your $pattern assignments to this form:
$pattern = "/width=\"[0-9]*\"/";
...if you want to be able to do a case-insensitive comparison, add an 'i' at the end of the string, thus:
$pattern = "/width=\"[0-9]*\"/i";
Hope this helps!
David

