preg_replace to convert a user input string to a link pattern - php

I want to convert user input string
"something ... un// important ,,, like-this"
to
"something-un-important-like-this"
So basically, replace every run of special characters with a single "-". After some googling I came up with this:
preg_replace('/[-]+/', '-', preg_replace('/[^a-zA-Z0-9_-]/s', '-', strtolower($string)));
I'm curious whether this can be done with a single preg_replace().
To clarify:
replace all special characters and blank spaces with a hyphen (-). If several occur consecutively, replace them with a single hyphen.
My solution works exactly as I want, but I'm looking to do the same in a single call.

There was a similar question yesterday, but I don't have it at hand.
In your current first pattern:
[^a-zA-Z0-9_-]
you're looking for a single character only. If you make that a greedy match for one or more, the regular expression engine will automatically replace multiple of these with a single one:
[^a-zA-Z0-9_-]+
The trailing + means "one or more".
You then still have the problem that existing - inside the string are not caught, so you need to take them out of the "not-in" character class:
[^a-zA-Z0-9_]+
This then should do it:
preg_replace('/[^a-zA-Z0-9_]+/s', '-', strtolower($string));
And as it's only lowercase, you do not need to look for A-Z as well, just another reduction:
preg_replace('/[^a-z0-9_]+/s', '-', strtolower($string));
See also Repetition and Quantifiers, of which + is one (see Repetition in the PHP docs; Repetition with Star and Plus on regular-expressions.info).
Also, if you take a look at the pattern modifiers docs, you'll see that the s (PCRE_DOTALL) modifier is not necessary, because the pattern contains no dot:
$urlSlug = preg_replace('/[^a-z0-9_]+/', '-', strtolower($string));
Hope this helps, explains a little about the regular expression you're using, and shows where to find further documentation, which is always helpful.
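As a quick check of the single-call pattern, here is a minimal sketch. The function name slugify is hypothetical, and the final trim() is an addition that is not in the question: it assumes you also want to drop a hyphen left at either end of the result.

```php
<?php
// Hypothetical helper: slugify() is an illustrative name, not part of PHP.
function slugify(string $string): string
{
    // Lowercase first, then turn every run of characters outside a-z, 0-9 and _ into one hyphen.
    $slug = preg_replace('/[^a-z0-9_]+/', '-', strtolower($string));
    // Assumption: hyphens left at the start or end are unwanted.
    return trim($slug, '-');
}

echo slugify('something ... un// important ,,, like-this'), "\n";
// prints: something-un-important-like-this
```

Because existing hyphens are no longer excluded from the character class, a run such as " - " collapses into one hyphen together with the surrounding spaces.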

Try This:
preg_replace('/[^a-zA-Z0-9_-]+/s', '-', strtolower($string));

Related

using regex for filtering some words in persian in php

I'm working on a script that identifies offensive words in text messages. The problem is that users sometimes alter words to make them unidentifiable. My code has to be able to identify those too, as far as possible.
First of all, I replace all non-alphanumeric characters with spaces.
Then I apply two regex patterns that I've written.
The first removes repeated characters from the string.
For example, if the user has written seeeeex, it replaces it with sex:
preg_replace('/(.)\1+/', '$1', $text)
This regex works fine for English words but not for Farsi words, which is my case.
For example, if you write:
امیییییییییین
it does nothing with it.
I also tried
mb_ereg_replace
But it didn't work either.
My other regex is meant to remove the spaces between one-letter words.
For example, I want it to convert S E X to sex:
preg_replace('/( [a-zA-Zآ-ی] )\1+/', trim('$1'), $text);
This regex doesn't work at all and needs to be corrected.
Thank you for your help
When working with multi-byte characters, you should enable the Unicode-aware modifier (u) so that tokens such as . match whole characters instead of single bytes. In your first case it should be:
/(.)\1+/u
Your second regex, however, has both syntax and semantic errors; I would change it to:
/\b(\pL)\s+/u
PHP:
preg_replace('/\b(\pL)\s+/u', '$1', $text);
Putting all together:
$text = 'سسس ککک سسس';
echo preg_replace(['/(.)\1+/u', '/\b(\pL)\s+/u'], '$1', $text); // outputs: سکس
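To see what the u modifier changes, the following sketch runs the repeated-character pattern on the word from the question with and without it. The word is built from \u{...} escapes (an assumption: this needs PHP 7 or later) so that the exact code points are unambiguous.

```php
<?php
// The sample word: alef, meem, Farsi yeh repeated ten times, noon.
$word = "\u{0627}\u{0645}" . str_repeat("\u{06CC}", 10) . "\u{0646}";

// Without /u the pattern works on bytes; no two adjacent bytes are equal here, so nothing changes.
$byteWise = preg_replace('/(.)\1+/', '$1', $word);

// With /u the dot matches a whole UTF-8 character, so the run of yeh collapses to one.
$unicodeAware = preg_replace('/(.)\1+/u', '$1', $word);

var_dump($byteWise === $word);                                   // bool(true)
var_dump($unicodeAware === "\u{0627}\u{0645}\u{06CC}\u{0646}");  // bool(true)
```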

Replace all kind of dashes

I have an Excel document which I import into MySQL using a library.
But some of the texts in the document contain dashes which I thought I had replaced, but apparently not all of them.
-, –, - <-all of these are different.
Is there any way I could replace all kinds of dashes with this one: -
The main problem is that I don't know all of the dash characters that exist.
Just use a regex with the Unicode modifier u and the Unicode property for dash punctuation:
$output = preg_replace('#\p{Pd}#u', '-', $input);
From the manual: Pd is "Dash punctuation".
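A small sketch of that replacement in use. The helper name normalizeDashes is hypothetical, and the sample string uses \u{...} escapes (PHP 7 or later assumed) so the dash characters are explicit.

```php
<?php
// Hypothetical helper name; \p{Pd} is the Unicode "dash punctuation" property.
function normalizeDashes(string $input): string
{
    return preg_replace('/\p{Pd}/u', '-', $input);
}

// "\u{2013}" is an en dash and "\u{2014}" an em dash.
$sample = "2010\u{2013}2020 \u{2014} a-b";
echo normalizeDashes($sample), "\n"; // prints: 2010-2020 - a-b
```

Note that the minus sign U+2212 is classified as a math symbol rather than dash punctuation, so it would need to be added to the pattern separately if it can appear in the data.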
How about:
$string = str_replace(array('-','–','-','—', ...), '-', $string);
Use the above code and see if it works. If you're still seeing some dashes not being replaced, you can just add them into the array, and it'll work.

preg_replace proper regex for repeating character

OK, I'm stuck again; this time it's a problem with a regex. I searched Google and SO, but didn't find a post that solved it. So to make a long story short:
$text = database entry string -> could be anything
$text gets parsed, and the regex should replace everything between two * with:
[bla].$matchedtext.[blub]
So I've tried to find the right regex for that, and this is what I came up with:
$text= preg_replace('~(/\*([^\"]*?)\*/)~', "$1<b>$2</b>", $text);
And the two * around each match should disappear as well.
Obviously it doesn't work, otherwise I wouldn't be posting. Any ideas?
This should probably do it:
preg_replace('/\*([^"*]*)\*/', '<b>\1</b>', $text);
A few comments on your earlier regular expression:
[^\"]*?
The non-greedy *? is not necessary; with a negated character set, simply add the '*' to the set so the match cannot run past the closing asterisk. Also, the double quote doesn't need escaping.
[^"*]*
You only need capturing groups for things you wish to remember; in your case, you don't need to know that you matched a beginning and ending asterisk. So you can do the whole match with just one capturing group.
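For illustration, here is the suggested pattern applied to a sample string (the sample text is made up for this sketch):

```php
<?php
$text = 'plain *bold* and *more bold* text';
// Each *...* pair becomes <b>...</b>; the asterisks sit outside the group and are dropped.
$html = preg_replace('/\*([^"*]*)\*/', '<b>$1</b>', $text);
echo $html, "\n"; // prints: plain <b>bold</b> and <b>more bold</b> text
```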

PHP trim problem

Earlier I asked how to get rid of extra hyphens and whitespace at the beginning and end of user-submitted text; for example, -ruby-on-rails- should become ruby-on-rails. You suggested trim(), which worked fine by itself, but when I added it to my code it did not work at all and actually broke other things.
I tried placing the trim() call everywhere in my code, but nothing worked. Can someone help me get rid of the extra hyphens and whitespace at the beginning and end of user-submitted text?
Here is my PHP code.
$tags = preg_split('/,/', strip_tags($_POST['tag']), -1, PREG_SPLIT_NO_EMPTY);
$tags = str_replace(' ', '-', $tags);
Update the trim statement to the following in order to update each item in the array:
foreach ($tags as $key => $value) {
    $tags[$key] = trim($value, '-');
}
That way trim() receives a single string for each value, as it expects.
If you have a string you can do this to strip hyphens from the beginning and end:
$tag = trim($tag, '-');
Your problem is that preg_split returns an array, but trim takes a string. You need to do the above for every string in the array.
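Putting the pieces together, one possible sketch is to map a cleanup function over the array that preg_split returns. The helper name cleanTags is hypothetical, and dropping tags that end up empty is an assumption about the desired behaviour, not something the question states.

```php
<?php
// Hypothetical helper: splits a comma-separated tag string and cleans each tag.
function cleanTags(string $raw): array
{
    $tags = preg_split('/,/', strip_tags($raw), -1, PREG_SPLIT_NO_EMPTY);
    $tags = array_map(function (string $tag): string {
        $tag = str_replace(' ', '-', trim($tag)); // inner spaces become hyphens
        return trim($tag, '-');                    // strip hyphens at both ends
    }, $tags);
    // Assumption: tags that are empty after trimming should be dropped.
    return array_values(array_filter($tags, function (string $tag): bool {
        return $tag !== '';
    }));
}

print_r(cleanTags(' -ruby-on-rails- , php ,,my sql'));
// the resulting tags are: ruby-on-rails, php, my-sql
```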
Regarding trimming whitespace: if you are first converting all whitespace to hyphens then it should not be necessary to trim whitespace afterwards - the whitespace will already be gone. But be careful because the terms "whitespace" and "space" have different meanings. Your question seems to muddle these two terms.
Verify that the hyphen character you're attempting to trim is the same hyphen character that is wrapping -ruby-on-rails-. For example, these are all different characters that look similar: -, –, —, ―.
I'm new to StackOverflow.com, so I hope the function I wrote helps you in some way. You can specify which characters to trim in the second parameter; for your example I've set it to remove spaces and hyphens by default. I've tested it using 'ruby-on-rails' and a somewhat extreme example of '- -- - - ruby-on-rails - -- - - -', and both produce the result 'ruby-on-rails'.
The regular expression might be a quick-and-dirty way of going about it, but I hope it helps; just reply if you have any problems implementing it.
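function customTrim($s, $c = '- ')
{
    // Match from the first to the last character that is not in $c.
    $a = '[^' . preg_quote($c, '#') . ']';
    if (preg_match('#' . $a . '(?:.*' . $a . ')?#s', $s, $match)) {
        return $match[0];
    }
    return ''; // nothing is left after trimming
}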

Need to negate this regex pattern, but no clue how

I found a regex pattern for PHP that does the exact OPPOSITE of what I'm needing, and I'm wondering how I can reverse it?
Let's say I have the following text: Item_154 ($12)
This pattern /\((.*?)\)/ gets what's inside the parentheses, but I need to get "Item_154" and cut out the parenthesized part and the space before it.
Anybody know how I can do that?
Regex is above my head apparently...
/^([^( ]*)/
Match everything from the start of the string until the first space or (.
If the item you need to match can have spaces in it, and you only want to get rid of whitespace immediately before the parenthetical, then you can use this instead:
/^([^(]*?)\s*\(/
The following will match anything that looks like text (...) but returns just the text part in the match.
\w+(?=\s*\([^)]*\))
Explanation:
The \w includes alphanumeric and underscore, with + saying match one or more.
The (?= ) group is positive lookahead, saying "confirm this exists but don't match it".
Then we have \s for whitespace, and * saying zero or more.
The \( and \) match literal ( and ) characters (since they are normally special characters).
The [^)] is any character other than ), and again * is zero or more.
Hopefully that all makes sense?
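A short sketch of the lookahead pattern applied to the string from the question:

```php
<?php
$string = 'Item_154 ($12)';
// The lookahead checks for " (...)" after the word but leaves it out of the match.
if (preg_match('/\w+(?=\s*\([^)]*\))/', $string, $matches)) {
    echo $matches[0], "\n"; // prints: Item_154
}
```

Since the lookahead consumes nothing, the whole match in $matches[0] is already the wanted text and no capturing group is required.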
/(.*)\(.*\)/
What is not in () will now be your 1st match :)
One site that really helped me was http://gskinner.com/RegExr/
It'll let you build a regex and then paste in some sample targets/text to test it against, highlighting matches. All of the possible regex components are listed on the right with (essentially) a tooltip describing the function.
<?php
$string = 'Item_154 ($12)';
// The lazy group plus \s* keeps the space before "(" out of the capture.
$pattern = '/(.*?)\s*\(.*?\)/';
preg_match($pattern, $string, $matches);
var_dump($matches[1]);
?>
Should get you Item_154
The following regex works on your string if you use it as the pattern of a replacement (replace the match with an empty string):
\s*\(.*?\)
Here's an explanation of what it's doing...
Whitespace, any number of repetitions - \s*
Literal - \(
Any character, any number of repetitions, as few as possible - .*?
Literal - \)
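Used with preg_replace and an empty replacement string, that pattern could be applied roughly like this:

```php
<?php
$string = 'Item_154 ($12)';
// Remove the optional whitespace and the parenthesized part in one step.
$name = preg_replace('/\s*\(.*?\)/', '', $string);
echo $name, "\n"; // prints: Item_154
```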
I've found Expresso (http://www.ultrapico.com/) is the best way of learning/working out regular expressions.
HTH
Here is a one-shot to do the whole thing
$text = 'Item_154 ($12)';
$text = preg_replace('/([^\s]*)\s(\()[^)]*(\))/', '$1$2$3', $text);
var_dump($text);
//Outputs: Item_154()
Keep in mind that using any PCRE function involves a fair amount of overhead, so if you are doing something like this in a long loop and the text is simple, you could probably do it with substr/strpos and then concatenate the parentheses onto the end, since you know they should be empty anyway.
That said, if you are looking to learn regexes and be productive with them, I would suggest checking out: http://rexv.org
I've found the PCRE tool there to be very useful, though it can be quirky in certain ways. In particular, any examples that you work with there should use single quotes if possible, as it doesn't handle double quotes correctly.
Also, to really get a grip on how to use regexes, I would check out Mastering Regular Expressions by Jeffrey Friedl, ISBN-13: 978-0596528126
Since you are using PHP, I would try to get the 3rd Edition, since it has a section specifically on PHP PCRE. Just make sure to read the first 6 chapters first, since they give you the foundation needed to work with the material in that particular chapter. If you see the 2nd Edition on the cheap somewhere, that's pretty much the same core material, so it would be a good buy as well.
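A minimal sketch of the substr/strpos approach, assuming the text before the first "(" is the part to keep. The helper name beforeParen is hypothetical.

```php
<?php
// Hypothetical helper: returns the text before the first "(" without trailing whitespace.
function beforeParen(string $text): string
{
    $pos = strpos($text, '(');
    if ($pos === false) {
        return $text; // no parenthesis: leave the string unchanged
    }
    return rtrim(substr($text, 0, $pos));
}

echo beforeParen('Item_154 ($12)') . '()', "\n"; // prints: Item_154()
```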
