preg_replace with lots of special characters

I tend to lose track when I'm dealing with a lot of special characters.
I have some urls (badly formatted, not consistent enough to use parse_url), and I want to replace all occurrences of the parameters ?dead and/or ?dead=some_text_here with nothing.
There might be other variables before and after.
Example urls:
http://www.url.com/?dead?dead=whatever_text&wow=test
http://www.url.com/?hello?dead=whatever_text
This is what I thought would work, but it doesn't.
$parsed_url = preg_replace("/(\?dead(?:=.*?)?)(?:\&|$|\?)/", "", $url);
What it's supposed to do is check for "?dead", with an optional =value behind it, then replace that with nothing. But this also replaces the ? or & that follows, if there is a parameter after the ?dead parameter. Also, it's only replacing 1 occurrence, not all.
It makes
http://www.url.com/?dead?dead=whatever_text&wow=test
Become
http://www.url.com/dead=whatever_text&wow=test

I think you want something like this pattern?
(\?dead(=[^&]*|))*
PHP Code:
echo preg_replace('/(\?dead(=[^&]*|))*/','',$sourcestring);
This will produce this output of your given urls:
http://www.url.com/&wow=test
http://www.url.com/&wow=test
http://www.url.com/?hello

You can use \Q and \E (as in QuotE) when dealing with lots of special characters.
The text between these delimiters will be treated literally.
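As a minimal sketch of that tip, the literal part of a pattern can be wrapped in \Q...\E, or built with preg_quote(). The sample URL and variable names below are only illustrative. One caveat: the delimiter itself still has to be escaped or avoided, because PHP looks for the closing delimiter before the pattern reaches the regex engine.

```php
<?php
$url = 'http://www.url.com/?hello?dead=whatever_text';

// \Q...\E: everything in between is taken literally,
// so the ? in "?dead" needs no backslash.
$a = preg_replace('/\Q?dead\E(?:=[^&?]*)?/', '', $url);

// preg_quote() escapes regex metacharacters (and the delimiter, if given).
$b = preg_replace('/' . preg_quote('?dead', '/') . '(?:=[^&?]*)?/', '', $url);

echo $a, "\n"; // http://www.url.com/?hello
echo $b, "\n"; // http://www.url.com/?hello
```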

How about:
$parsed_url = preg_replace("/\?dead(?:=[^&?]*)?/", "", $url);
preg_replace replaces all occurrences by default (the $limit parameter is -1 by default).
This regex is very similar to yours but instead of .* it uses [^&?]* to match until the next ? or &
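To check this against the two example URLs from the question, here is a small sketch (the array and variable names are just for illustration). Note that the first result keeps the leading & of the next parameter, so a further clean-up step may be wanted depending on how the URLs are used.

```php
<?php
$urls = [
    'http://www.url.com/?dead?dead=whatever_text&wow=test',
    'http://www.url.com/?hello?dead=whatever_text',
];

// preg_replace accepts an array of subjects and returns an array.
// [^&?]* stops at the next & or ?, so following parameters survive.
$cleaned = preg_replace('/\?dead(?:=[^&?]*)?/', '', $urls);

print_r($cleaned);
// [0] => http://www.url.com/&wow=test
// [1] => http://www.url.com/?hello
```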

Related

PHP - Comment System "Replace Http// urls" [duplicate]

I'm creating a simple comment system connected to the Steam API. Every Steam user connected to my website can automatically post things. But I'm changing some functions to replace things like URLs.
My question is: when a user posts something like,
"Hello I'm nice, have a look at http://www.cute.com"
it should automatically turn the http:// URL into a link, without changing the http:// in the string itself.

Maybe something like this?
<?php
$str = "helloo im nice, have a look http://www.cute.com";
echo preg_replace("/http:\/\/(.+)\.(.+)\.(.+)/", "<a href='http://$1.$2.$3'>$1.$2.$3</a>", $str);
?>
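This will convert the link into an anchor (or an a tag). Note that .+ is greedy and the pattern expects two full stops after http://, so it suits this example rather than arbitrary links.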
Alternative added
Alternatively, it might be a good idea to add support for https as well. In which case the following might be useful.
<?php
$str = "helloo im nice, have a look http://www.cute.com";
echo preg_replace("/http(s?):\/\/(.+)\.(.+)\.(.+)/", "<a href='http$1://$2.$3.$4'>http$1://$2.$3.$4</a>", $str);
?>
This takes advantage of the ? quantifier, which means "zero or one of the preceding character". In this case the preceding character is "s", so both "http" and "https" match.
Explanation
This uses RegEx (or Regular Expressions) to create this.
The first parameter of the preg_replace function takes the RegEx (I like to test mine here: http://regexr.com/).
In PHP, a RegEx must be wrapped in a pair of delimiters; a forward slash is the most common choice and is used here. The bits in between are as follows.
http: simply matches the literal text "http:"
\/\/ is called "escaping" and matches two forward slashes. Since the forward slash is the delimiter here (start and end of the pattern), it needs to be escaped so that PHP doesn't think the RegEx has ended sooner.
(.+) The brackets are also special characters (though not escaped) and they are known as "capture groups". They are used so that I can see what is between the "http://" and the ".com" (or whatever extension is used). The full stop (or period or ".") character matches any single character except a newline.
\. Further on the escaping. Since the full stop is a special character, we have to escape it to match a literal one. What that means so far is that we are matching "http://", then anything, and then stopping at a full stop.
(.+) Last but not least is the final capture group. This again matches anything from the string, so that we have our final capture group and the RegEx is complete.
Quantifiers:
? means "zero or one of the preceding character". This means that /tests?/ would match test and tests, since s is the preceding character: in the first case there are 0 of them and in the second there is 1.
+ means "one or more of the preceding character". In this case we are saying one or more of anything, which means we expect at least one character to be provided.
The second parameter is our replace part.
In short, $1, $2 and so on reference the bracketed groups from the above RegEx, in order.
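For text where the URL is not the last thing in the string, a somewhat stricter variant can help. The following is only a sketch under stated assumptions: the function name linkify is made up for this example, the pattern [^\s<]+ (a run of non-whitespace characters) is swapped in for the (.+)\.(.+)\.(.+) groups above, and htmlspecialchars() is added because comment text is user input. Punctuation directly after a URL would still end up inside the link.

```php
<?php
// Hypothetical helper: wraps http:// and https:// URLs in <a> tags.
function linkify(string $text): string
{
    // Escape first, so user-supplied text is safe to embed in HTML.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // https?   -> "http" followed by zero or one "s"
    // [^\s<]+  -> one or more characters that are not whitespace or "<"
    // $0       -> the whole match (the URL itself)
    return preg_replace('~https?://[^\s<]+~', '<a href="$0">$0</a>', $safe);
}

echo linkify('have a look at http://www.cute.com please');
// have a look at <a href="http://www.cute.com">http://www.cute.com</a> please
```

Using ~ as the delimiter means the forward slashes in the pattern do not need escaping.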

$string = 'helloo im nice, have a look http://www.cute.com';
$string = str_replace('http://', '', $string);
echo $string;

Encoding SEO friendly URL

I am trying to encode a phrase in order to pass it inside a URL. Currently it works fine with basic words, where spaces are replaced with dashes.
<a href="./'.str_replace(' ', '-', preg_replace("/[^A-Za-z0-9- ]/", '', $phrase)).'">
It produces something like:
/this-is-my-phase
On the page that this URL takes me I am able to replace the dashes with spaces and query my db for this phrase.
The problem I have is if the phrase contains apostrophe. My current script removes it. Is there any way to preserve it or replace with some URL-friendly character to accommodate something like?
this is bob's page

There is a PHP standard library function, urlencode(), which encodes non-alphanumeric characters (other than -, _ and .) as %XX, where XX is the hex value of the character; spaces are encoded as +.
If the limitations of that conversion (&, ©, £, etc.) are not acceptable, see rawurlencode().
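As a quick illustration with the phrase from the question (variable names are only for this sketch), both functions keep the apostrophe as %27 and differ in how they encode the space:

```php
<?php
$phrase = "this is bob's page";

$plus = urlencode($phrase);    // spaces become +
$raw  = rawurlencode($phrase); // spaces become %20

echo $plus, "\n"; // this+is+bob%27s+page
echo $raw, "\n";  // this%20is%20bob%27s%20page

// Both forms decode back to the original phrase.
var_dump(urldecode($plus) === $phrase);   // bool(true)
var_dump(rawurldecode($raw) === $phrase); // bool(true)
```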

If you want to allow another character, you have to add it to this section: ^A-Za-z0-9- so, for example, if you wish to allow ' the regex will be [^A-Za-z0-9-' ]

If you only need to replace all the apostrophes ('), then you can replace them with the URL-encoded character %27:
$url = str_replace("'", "%27", $url);
EDIT
If you want to replace all URL-unsafe characters, use a built-in function as in @wallyk's answer. It's much simpler.

PHP preg_replace pattern only seems to work if its wrong?

I have a string that looks like this
../Clean_Smarty_Projekt/tpl/templates_c\.
../Clean_Smarty_Projekt/tpl/templates_c\..
I want to replace ../, \. and \.. with a regular expression.
Before, I did this like this:
$result = str_replace(array("../","\..","\."),"",$str);
There, the search strings have to be in this order, because changing it makes the output a little buggy. So I decided to use a regular expression.
Now I came up with this pattern
$result = preg_replace('/(\.\.\/)|(\\[\.]{1,2})/',"",$str);
Which actually returns only empty strings...
Reason: (\\[\.]{1,2})
In Regex101 it's all OK. (It took me a couple of minutes to realize that I don't need the /g in preg_replace.)
If I use this pattern in preg_replace I have to write (\\\\[\.]{1,2}) to get it to work. But that's obviously wrong, because I'm not searching for two backslashes.
Of course I know the escaping rules (escaping slashes).
Why doesn't this match correctly?

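I suggest you use a different regex delimiter, so that the forward slash no longer needs escaping. Independently of the delimiter, a PHP string literal needs three (\\\) or four (\\\\) backslashes to match a single backslash: PHP's string parser reduces them to \\, which the regex engine then reads as one literal backslash. The dot after it needs a backslash of its own to be matched literally.
$string = '../Clean_Smarty_Projekt/tpl/templates_c\.'."\n".'../Clean_Smarty_Projekt/tpl/templates_c\..';
echo preg_replace('~\.\./|\\\\\.{1,2}~', '', $string);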
Output:
Clean_Smarty_Projekt/tpl/templates_c
Clean_Smarty_Projekt/tpl/templates_c
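To see the two layers of escaping separately, it can help to print the pattern string before using it. This is a small sketch, assuming a single-quoted pattern with four backslashes for the literal backslash and an escaped dot after it; the variable names are illustrative.

```php
<?php
$str = '../Clean_Smarty_Projekt/tpl/templates_c\..';

// Single-quoted PHP source    -> what the regex engine receives
// '\\\\' (four backslashes)   -> \\  (one literal backslash)
// '\.'                        -> \.  (one literal dot)
$pattern = '~\.\./|\\\\\.{1,2}~';

echo $pattern, "\n"; // ~\.\./|\\\.{1,2}~

$result = preg_replace($pattern, '', $str);
echo $result, "\n";  // Clean_Smarty_Projekt/tpl/templates_c
```

Tools like Regex101 show only the second layer, which is why a pattern that works there can need extra backslashes once it is written inside a PHP string.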

regex to clean up url

I am looking for a way to get a valid url out of a string like:
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
My original solution was:
preg_match('#^[^:|]*#', str_replace('//', '/', $string), $modifiedPath);
But obviously it's going to remove a slash from the http:// instead of the one in the middle of the string.
My expected output that I want from the original is:
http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
I could always break off the http part of the string first but would like a more elegant solution in the form of regex if possible. Thanks.

This will do exactly what you are asking:
<?php
$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
preg_match('/^([^|]+)/', $string, $m); // get everything up to and NOT including the first pipe (|)
$string = $m[1];
$string = preg_replace('/(?<!:)\/\//', '/' ,$string); // replace all occurrences of // as long as they are not preceded by :
echo $string; // outputs: http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
exit;
?>
EDIT:
(?<!X) in regular expressions is the syntax for what is called a lookbehind. The X is replaced with the character(s) we are testing for.
The following expression would match every instance of double slashes (//):
\/\/
But we need to make sure that the match we are looking for is NOT preceded by the : character so we need to 'lookbehind' our match to see if the : character is there. If it is then we don't want it to be counted as a match:
(?<!:)\/\/
The ! is what says NOT to match in our lookbehind. If we changed it to (?<=:)\/\/ then it would only match the double slashes that did have the : preceding them.
A lookahead and lookbehind tutorial can explain it all better than I can.
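The two steps (cut at the first pipe, then collapse slashes that are not preceded by a colon) can be wrapped into one small function. This is a sketch only: the name clean_url is made up, explode() is used instead of preg_match for the first step, and /{2,} is used so that longer runs of slashes are collapsed as well.

```php
<?php
// Hypothetical helper combining both steps.
function clean_url(string $raw): string
{
    // Keep everything before the first pipe (|).
    $url = explode('|', $raw, 2)[0];

    // (?<!:) is a negative lookbehind: the run of slashes must NOT be
    // preceded by a colon, so the // in http:// is left alone.
    return preg_replace('~(?<!:)/{2,}~', '/', $url);
}

$string = 'http://somesite.com/directory//sites/9/my_forms/3-895a3e/somefilename.jpg|:||:||:||:|19845';
echo clean_url($string);
// http://somesite.com/directory/sites/9/my_forms/3-895a3e/somefilename.jpg
```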

Assuming all your strings are in the form given, you don't need any but the simplest of regexes to do this; if you want an elegant solution, then a regex is definitely not what you need. Also, double slashes are legal in a URL, just like in a Unix path, and are usually treated the same as a single slash, so you don't really need to get rid of them at all.
Why not just
$url = preg_split('/\|/', $string)[0]; // array_shift() expects a variable, so index the result instead
?
If you really, really care about getting rid of the double slashes in the URL, then you can follow this with
$url = preg_replace('/([^:])\/\//', '$1/', $url);
or even combine them into
$url = preg_replace('/([^:])\/\//', '$1/', preg_split('/\|/', $string)[0]);
although that last form gets a little bit hairy.

Since this is a quite strictly defined situation, I'd consider just one preg to be the most elegant solution.
Off the top of my head:
$sanitizedURL = preg_replace('~((?<!:)/(?=/)|\\|.+)~', '', $rawURL);
Basically, what this does is look for any forward slash that IS NOT preceded by a colon (:) and IS followed by another forward slash. It also searches for any pipe character and every character following it.
Anything found is removed from the result.
I can explain the RegEx in more detail if you like.

Split string on non-alphanumerics in PHP? Is it possible with php's native function?

I was trying to split a string on non-alphanumeric characters or, simply put, I want to split it into words. The approach that immediately came to my mind is to use regular expressions.
Example:
$string = 'php_php-php php';
$splitArr = preg_split('/[^a-z0-9]/i', $string);
But there are two problems that I see with this approach.
It is not a native php function, and is totally dependent on the PCRE Library running on server.
An equally important problem is that what if I have punctuation in a word
Example:
$string = 'U.S.A-men\'s-vote';
$splitArr = preg_split('/[^a-z0-9]/i', $string);
Now this will split the string as [{U}{S}{A}{men}{s}{vote}]
But I want it as [{U.S.A}{men's}{vote}]
So my question is that:
How can we split them according to words?
Is there a possibility to do it with a native PHP function, or in some other way where we are not dependent on PCRE?
Regards

Sounds like a case for str_word_count() using the oft forgotten 1 or 2 value for the second argument, and with a 3rd argument to include hyphens, full stops and apostrophes (or whatever other characters you wish to treat as word-parts) as part of a word; followed by an array_walk() to trim those characters from the beginning or end of the resultant array values, so you only include them when they're actually embedded in the "word"

Either you have PHP installed (then you also have PCRE), or you don't. So your first point is a non-issue.
Then, if you want to exclude punctuation from your splitting delimiters, you need to add them to your character class:
preg_split('/[^a-z0-9.\']+/i', $string);
If you want to treat punctuation characters differently depending on context (say, make a dot only be a delimiter if followed by whitespace), you can do that, too:
preg_split('/\.\s+|[^a-z0-9.\']+/i', $string);
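Applied to the string from the question, the two patterns behave as follows. This is a small sketch: the second sample string and the variable names are made up for illustration, and PREG_SPLIT_NO_EMPTY is added so that no empty pieces are returned.

```php
<?php
// Split on runs of anything that is not a letter, digit, dot or apostrophe.
$simple = preg_split('/[^a-z0-9.\']+/i', 'U.S.A-men\'s-vote', -1, PREG_SPLIT_NO_EMPTY);
print_r($simple);
// [0] => U.S.A  [1] => men's  [2] => vote

// Additionally treat a dot as a delimiter, but only when whitespace follows it.
$contextual = preg_split('/\.\s+|[^a-z0-9.\']+/i', 'One vote. U.S.A-men\'s', -1, PREG_SPLIT_NO_EMPTY);
print_r($contextual);
// [0] => One  [1] => vote  [2] => U.S.A  [3] => men's
```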

As per my comment, you might want to try (add as many separators as needed)
$splitArr = preg_split('/[\s,!\?;:-]+|[\.]\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);
You'd then have to handle the case of a "quoted" word (it's not so easy to do in a regular expression, because 'is" "this' quoted? And how?).
So I think it's best to keep ' and " within words (so that "it's" is a single word, and "they 'll" is two words) and then deal with those cases separately. For example a regexp would have some trouble in correctly handling
they 're 'just friends'. Or that's what they say.
whereas a token such as "'re", or a sequence of words in which the first is left-quoted and the last is right-quoted (the first not being a known contraction such as 's, 're, 'll, 'd ...), may be handled at application level.

This is not a php-problem, but a logical one.
Words could be concatenated by a -. Abbreviations could look like short sentences.
You can match your example directly by creating a solution that fits only this particular phrase. But you can't get a solution for all possible phrases. That would require content recognition based on neural computing.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
