preg issue with PHP - php

I have the following piece of PHP code:
$string = "Ouch!; Funny, these photos were taken with my own phone... … ";
echo preg_replace("[^A-Za-z0-9:\/.,;]", '', $string);
As far as I can tell, this removes everything that is not alphanumeric or one of the characters : / . , ;
When I run it, I get:
Ouch!; Funny, these photos were taken with my own phone... …
Instead of what I was expecting:
Ouch!; Funny, these photos were taken with my own phone...
These special characters are still making it in, even though I am excluding them. Any ideas?
Answer:
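Summarized from the answers and comments below: this replaces special characters with a space but keeps letters, digits, whitespace and : / . , ; ! # ' ?, and then collapses whitespace runs so that we don't end up with multiple blanks:
$clean = preg_replace("/[^A-Za-z0-9:\/.,;!#'?\s]/u", ' ', $string);
$clean = preg_replace('/\s+/u', ' ', $clean);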

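PHP's PCRE functions, including preg_replace, expect delimiters around the regular expression. In your code the [ and ] were taken as the delimiters (PHP also accepts bracket pairs as delimiters), so the pattern actually applied was ^A-Za-z0-9:\/.,;, which never matches your string and leaves it unchanged.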
$string = "Ouch!; Funny, these photos were taken with my own phone... … ";
echo preg_replace("/[^A-Za-z0-9:\/.,;]/u", '', $string);
Note the / on either side of your expression. You'll also probably want the UTF-8 modifier u (thanks @jon).
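A quick way to confirm the delimiter problem is to run both versions side by side (the variable names are only for illustration): the bracket-delimited call returns the input untouched, while the slash-delimited one strips the unwanted characters.

```php
<?php
$string = "Ouch!; Funny, these photos were taken with my own phone... … ";

// "[" and "]" act as delimiters here, so the pattern is really ^A-Za-z0-9:\/.,;
// (anchored literal text), which never matches this input.
$bracketed = preg_replace("[^A-Za-z0-9:\/.,;]", '', $string);

// With "/" delimiters the brackets form a negated character class again.
$slashed = preg_replace("/[^A-Za-z0-9:\/.,;]/u", '', $string);

var_dump($bracketed === $string); // bool(true): input comes back unchanged
echo $slashed, "\n";              // Ouch;Funny,thesephotosweretakenwithmyownphone...
```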
Now in this case, you're actually going to end up with:
Ouch;Funny,thesephotosweretakenwithmyownphone...
This isn't the output you wrote out, however; to get that, you'll need slightly more complex code. You could simply replace with ' ' (a space), but you might end up with a lot of unwanted whitespace.
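If you want exactly the output from the question, with the ! and single spaces kept, one option is to allow those characters, delete everything else, and then squeeze and trim the whitespace. A minimal sketch; cleanText is a hypothetical helper name:

```php
<?php
// Hypothetical helper: keep letters, digits, : / . , ; ! and whitespace,
// then squeeze whitespace runs to one space and trim both ends.
function cleanText(string $string): string
{
    $kept = preg_replace('/[^A-Za-z0-9:\/.,;!\s]+/u', '', $string); // delete everything else
    $kept = preg_replace('/\s+/u', ' ', $kept);                      // one space per whitespace run
    return trim($kept);
}

echo cleanText("Ouch!; Funny, these photos were taken with my own phone... … "), "\n";
// Ouch!; Funny, these photos were taken with my own phone...
```

Deleting first and collapsing afterwards avoids the stray whitespace that a plain ' ' replacement leaves behind.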

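This keeps the spaces (note the space added to the character class), although it still drops the ! and leaves trailing spaces: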
$string = "Ouch!; Funny, these photos were taken with my own phone... … ";
echo preg_replace("/[^A-Za-z0-9:\/\.,; ]/", '', $string);
http://3v4l.org/ne7Qu

Related

Using regex to filter some words in Persian in PHP

I'm working on a script that identifies offensive words in text messages. The problem is that users sometimes alter words to make them unidentifiable, and my code has to recognize those too, as far as possible.
First, I replace all non-alphanumeric characters with spaces.
Then I apply two regex patterns.
The first removes repeated characters from the string.
For example, if the user writes seeeeex, it replaces it with sex:
preg_replace('/(.)\1+/', '$1', $text)
This regex works fine for English words, but not for Farsi words, which is my case.
For example, if you write:
امیییییییییین
it does nothing with it.
I also tried mb_ereg_replace, but it didn't work either.
My other regex should remove the spaces around all one-letter words.
For example, I want it to convert S E X to sex:
preg_replace('/( [a-zA-Zآ-ی] )\1+/', trim('$1'), $text);
This regex doesn't work at all and needs to be corrected.
Thank you for your help
When working with multibyte characters, you should enable the Unicode-aware modifier u so that tokens such as . match a whole character rather than a single byte. In your first case the pattern should be:
/(.)\1+/u
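The difference is easy to check. The word below is built from code point escapes only so that the exact characters are unambiguous; it has the same shape as the one in the question (a run of Farsi yeh):

```php
<?php
// A word like the one in the question: alef, meem, ten Farsi yeh (U+06CC), noon.
$word = "\u{0627}\u{0645}" . str_repeat("\u{06CC}", 10) . "\u{0646}";

// Without u, "." matches a single byte; every yeh is the two bytes 0xDB 0x8C,
// so no byte is immediately followed by an identical byte and nothing matches.
$byteMode = preg_replace('/(.)\1+/', '$1', $word);

// With u, "." matches a whole character, so the run of yeh collapses to one.
$utf8Mode = preg_replace('/(.)\1+/u', '$1', $word);

var_dump($byteMode === $word);                              // bool(true)
var_dump($utf8Mode === "\u{0627}\u{0645}\u{06CC}\u{0646}"); // bool(true)
```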
Your second regex, however, has both syntax and semantic errors; change it to:
/\b(\pL)\s+/u
PHP:
preg_replace('/\b(\pL)\s+/u', '$1', $text);
Putting it all together:
$text = 'سسس ککک سسس';
echo preg_replace(['/(.)\1+/u', '/\b(\pL)\s+/u'], '$1', $text); // outputs: سکس
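The full approach from the question (non-letters to spaces, then the two patterns from this answer) can be wrapped in one function. A sketch: normalizeMessage is a hypothetical name, and using [^\pL\pN] for "non-alphanumeric" is one Unicode-aware choice, not the asker's original code.

```php
<?php
// Hypothetical helper combining the steps from the question and this answer.
function normalizeMessage(string $text): string
{
    return trim(preg_replace(
        [
            '/[^\pL\pN]+/u', // 1. each run of non-letters/non-digits becomes one space
            '/(.)\1+/u',     // 2. collapse repeated characters: seeeeex -> sex
            '/\b(\pL)\s+/u', // 3. glue one-letter words to what follows: S E X -> SEX
        ],
        [' ', '$1', '$1'],
        $text
    ));
}

echo normalizeMessage('seeeeex'), "\n"; // sex
echo normalizeMessage('S.E.X'), "\n";   // SEX
```

Keep in mind that step 2 also shortens legitimate double letters (book becomes bok) and step 3 merges any single-letter word into the next word, so a filter built on this will produce some false positives.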

Regex to find hashtag in string - without taking the initial hashtag symbol

I'm trying to do this in PHP, and I'm not great with regex.
I'm trying to find all hashtags in a string and wrap them in a link to Twitter. To do this I need the content of the hashtag, without the symbol.
I want to match #hashtag but capture only hashtag, without the preceding #.
I'd like to do it in one line, but currently I'm doing a preg_replace followed by a str_replace, as shown:
$string = preg_replace('/\B#([a-z0-9_-]+)/i', '<a href="https://twitter.com/hashtag/$0">$0</a> ', $string);
$string = str_replace('https://twitter.com/hashtag/#', 'https://twitter.com/hashtag/', $string);
Any guidance is appreciated!
I was using a regex tester and found the answer.
preg_replace gives you two backreferences: $0 holds the whole match (#hashtag), and $1 holds the first capturing group (hashtag, without the # symbol).
Tested here (select preg_replace): http://www.phpliveregex.com/p/kOn
Perhaps it has something to do with the regex itself; I'm not sure. Hopefully this helps someone else too.
My one liner is:
$string = preg_replace('/\B#([a-z0-9_-]+)/i', '<a href="https://twitter.com/hashtag/$1">$0</a> ', $string);
Edit: I understand it now. The parentheses ( ) around the square brackets create a capturing group, which is what $1 refers to; $0 always refers to the whole match.
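Both uses of the capturing group can be sketched with the same pattern: $1 in a replacement for the link, and group 1 of preg_match_all() to list the tags. The function and constant names are hypothetical; the URL format is the one from the question.

```php
<?php
// Hypothetical helpers built on the pattern from this answer.
const HASHTAG = '/\B#([a-z0-9_-]+)/i';

// Wrap each #tag in a link: $1 (no "#") goes into the URL, $0 (with "#") is the text.
function linkHashtags(string $text): string
{
    return preg_replace(HASHTAG, '<a href="https://twitter.com/hashtag/$1">$0</a>', $text);
}

// Return only the tag names, without the leading "#".
function extractHashtags(string $text): array
{
    preg_match_all(HASHTAG, $text, $matches);
    return $matches[1]; // group 1 = the part inside the parentheses
}

echo linkHashtags('I love #php!'), "\n";
// I love <a href="https://twitter.com/hashtag/php">#php</a>!
print_r(extractHashtags('#regex and #php_tips, not foo#bar'));
```

The \B before # is what skips foo#bar: there is a word boundary between o and #, so no match starts there.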

preg_replace to convert a user input string to a link pattern

I want to convert user input string
"something ... un// important ,,, like-this"
to
"something-un-important-like-this"
So basically, replace every run of special characters with a single "-". I've googled and came up with this:
preg_replace('/[-]+/', '-', preg_replace('/[^a-zA-Z0-9_-]/s', '-', strtolower($string)));
I'm curious to know whether this can be done with a single preg_replace().
To clarify:
Replace all special characters and blank spaces with a hyphen (-). If several appear consecutively, replace them with a single hyphen.
My solution works exactly as I want, but I'm looking to do the same in a single call.
There was a similar question yesterday, but I don't have it at hand.
In your current first pattern:
[^a-zA-Z0-9_-]
you're looking for a single character only. If you make it a greedy match for one or more, the regular expression engine will replace each run of these characters with a single hyphen:
[^a-zA-Z0-9_-]+
^- + = one or more
You then still have the problem that existing hyphens inside the string are not caught, so you need to take - out of the negated character class:
[^a-zA-Z0-9_]+
This then should do it:
preg_replace('/[^a-zA-Z0-9_]+/s', '-', strtolower($string));
And since the string is already lowercase, you don't need to look for A-Z either, which is one more reduction:
preg_replace('/[^a-z0-9_]+/s', '-', strtolower($string));
See also Repetition and Quantifiers, of which + is one (see Repetition in the PHP docs; Repetition with Star and Plus on regular-expressions.info).
Also, if you look at the pattern modifiers in the docs, you'll see that the s (PCRE_DOTALL) modifier is not necessary: it only changes how . behaves, and this pattern contains no .:
$urlSlug = preg_replace('/[^a-z0-9_]+/', '-', strtolower($string));
Hope this helps and explains a little about the regular expression you're using, and also where you can find further documentation, which is always helpful.
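One case the single call doesn't cover is input that starts or ends with special characters, which leaves a hyphen at either end. A minimal sketch that adds trim() for that; slugify is a hypothetical name:

```php
<?php
// Hypothetical helper: the single preg_replace from this answer, plus trim()
// so that leading/trailing special characters don't leave stray hyphens.
function slugify(string $string): string
{
    $slug = preg_replace('/[^a-z0-9_]+/', '-', strtolower($string)); // each run -> one "-"
    return trim($slug, '-');
}

echo slugify('something ... un// important ,,, like-this'), "\n"; // something-un-important-like-this
echo slugify('  Hello, World! '), "\n";                           // hello-world
```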
Try this:
preg_replace('/[^a-zA-Z0-9_-]+/s', '-', strtolower($string));

Regex pattern matching literal repeated \n

Given a literal string such as:
Hello\n\n\n\n\n\n\n\n\n\n\n\nWorld
I would like to reduce the repeated \n's to a single \n.
I'm using PHP, and I've been playing around with a bunch of different regex patterns. Here's a simple example of the code:
$testRegex = '/(\\n){2,}/';
$test = 'Hello\n\n\n\n\n\n\n\n\nWorld';
$test2 = preg_replace($testRegex ,'\n',$test);
echo "<hr/>test regex<hr/>".$test2;
I'm new to PHP, not that new to regex, but it seems '\n' conforms to special rules. I'm still trying to nail those down.
Edit: I've placed the literal code from my PHP file here. If I use str_replace() I get the result I want, but that's obviously not a complete solution.
To match a literal \n with a regex, your string literal needs four backslashes: PHP turns them into a string with two backslashes, which the regex engine interprets as an escaped single backslash followed by n.
$testRegex = '/(\\\\n){2,}/';
$test = 'Hello\n\n\n\n\n\n\n\n\n\n\n\nWorld';
$test2 = preg_replace($testRegex, '\n', $test);
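The backslash counting can be checked directly. As an alternative that avoids counting, preg_quote() can escape the literal text for you; the $quoted and $test3 names are just for illustration.

```php
<?php
// In single quotes, \\ becomes \, so '\\\\n' is the 3-character string \\n,
// which PCRE reads as "a literal backslash followed by n".
$testRegex = '/(\\\\n){2,}/';
$test = 'Hello\n\n\n\n\n\n\n\n\n\n\n\nWorld'; // backslash-n text, not newlines

$test2 = preg_replace($testRegex, '\n', $test);
echo $test2, "\n"; // Hello\nWorld

// Same result without counting backslashes: preg_quote() escapes the backslash.
$quoted = '/(?:' . preg_quote('\n', '/') . '){2,}/';
$test3 = preg_replace($quoted, '\n', $test);
```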
Perhaps you need to double up the escape in the regular expression?
$pattern = "/\\n+/";
$awesome_string = preg_replace($pattern, "\n", $string);
Edit: I just read your comment on the accepted answer. This doesn't apply to your case, but it's still useful.
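The reason it doesn't apply: "/\\n+/" reaches PCRE as \n+, which matches actual newline characters, not the two characters \ and n from the question. A small comparison with made-up inputs:

```php
<?php
$pattern = "/\\n+/"; // PHP turns "\\n" into \n, which PCRE reads as a newline

$realNewlines = "Hello\n\n\n\nWorld"; // double quotes: four actual newlines
$literalPairs = 'Hello\n\n\n\nWorld'; // single quotes: backslash-n text

var_dump(preg_replace($pattern, "\n", $realNewlines) === "Hello\nWorld"); // bool(true)
var_dump(preg_replace($pattern, "\n", $literalPairs) === $literalPairs);  // bool(true): no match
```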
If you intend to extend this logic to other forms of whitespace too:
$output = preg_replace('%(\s)+%', '$1', $input);
This reduces each run of whitespace characters to a single instance of the matched whitespace character (the last one in the run).
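A quick check of the whitespace-collapsing idea, written with + (a * would also match the empty string at every position) and assigned directly rather than via echo; the input string is made up for illustration. Because a repeated group keeps only its last repetition, a mixed run such as " \t" becomes "\t".

```php
<?php
$input = "one  two\t\t\tthree\n\n\nfour";

// Each run of whitespace is replaced by the last character of that run.
$output = preg_replace('%(\s)+%', '$1', $input);

var_dump($output === "one two\tthree\nfour"); // bool(true)
```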
It does conform to special rules. You can add the "multiline" modifier m, although it only changes how ^ and $ behave and makes no difference to this pattern:
$pattern = '/(\n)+/m';
This matches actual newline characters. See the docs for all modifiers and their detailed meaning.
Since you're trying to reduce runs of newlines to one, this works if your string contains real newlines; it won't match the literal two-character sequence \n. Good luck!
Try this regular expression:
/[\n]+/

PHP trim problem

I asked earlier how to get rid of extra hyphens and whitespace at the beginning and end of user-submitted text; for example, -ruby-on-rails- should become ruby-on-rails. You suggested trim(), which worked fine on its own, but when I added it to my code it did not work at all; it actually did some funky things to my code.
I tried placing the trim() call everywhere in my code, but nothing worked. Can someone help me get rid of extra hyphens and whitespace at the beginning and end of user-submitted text?
Here is my PHP code.
$tags = preg_split('/,/', strip_tags($_POST['tag']), -1, PREG_SPLIT_NO_EMPTY);
$tags = str_replace(' ', '-', $tags);
Update the trim statement to the following in order to update each item in the array:
foreach($tags as $key=>$value) {
$tags[$key] = trim($value, '-');
}
Because trim() expects a string, this trims each value of the array individually.
If you have a string you can do this to strip hyphens from the beginning and end:
$tag = trim($tag, '-');
Your problem is that preg_split returns an array, but trim takes a string. You need to do the above for every string in the array.
Regarding trimming whitespace: if you are first converting all whitespace to hyphens then it should not be necessary to trim whitespace afterwards - the whitespace will already be gone. But be careful because the terms "whitespace" and "space" have different meanings. Your question seems to muddle these two terms.
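Putting the pieces together for the code in the question: split, trim whitespace, replace spaces, then trim hyphens from every element. A sketch assuming PHP 7.4+ (for the fn arrow functions); normalizeTags is a hypothetical name, and the input is passed in instead of being read from $_POST so it can be tested.

```php
<?php
// Hypothetical helper: turns the raw tag field into clean, hyphenated tags.
function normalizeTags(string $input): array
{
    $tags = preg_split('/,/', strip_tags($input), -1, PREG_SPLIT_NO_EMPTY);
    $tags = array_map('trim', $tags);                   // whitespace around each tag
    $tags = str_replace(' ', '-', $tags);               // str_replace() also works on arrays
    $tags = array_map(fn ($t) => trim($t, '-'), $tags); // leading/trailing hyphens
    return array_values(array_filter($tags, fn ($t) => $t !== '')); // drop now-empty tags
}

print_r(normalizeTags('-ruby-on-rails-, php , -'));
```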
Verify that the hyphen character you're attempting to trim is the same hyphen character that is wrapping -ruby-on-rails-. For example, these are all different characters that look similar: -, –, —, ―.
I'm new to StackOverflow.com, so I hope the function I wrote helps you in some way. You can specify which characters it should trim in the second parameter; for your example I've set it to remove whitespace and dashes by default. I've tested it with 'ruby-on-rails' and with the somewhat extreme '- -- - - ruby-on-rails - -- - - -', and both produce 'ruby-on-rails'.
The regular expression might be a bit of a quick-and-dirty way of going about it, but I hope it helps; just reply if you have any problems implementing it.
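function customTrim($s, $c = '- ')
{
    // Any character NOT in the trim set; preg_quote() escapes regex metacharacters in $c
    $a = '[^' . preg_quote($c, '#') . ']';
    // First kept character, optionally followed by anything up to the last kept character
    if (preg_match('#' . $a . '(?:.*' . $a . ')?#s', $s, $match)) {
        return $match[0];
    }
    return ''; // $s contained only trim characters
}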
