I have a PHP function that should do the following tasks:
The function will take 2 params: the string and the glue (defaults to "-").
For a given string:
-- remove any special characters
-- make it lowercase
-- remove multiple spaces
-- replace spaces with glue (-).
The function takes $input as parameter. The code I have used for that is below:
//make all the characters lowercase
$low = strtolower($input);
//remove special characters and multiple spaces
$nospecial = preg_replace('/[^a-zA-Z0-9\s+]/', '', $low);
//replace the spaces into glues (-). here is the problem.
$converted = preg_replace('/\s/', '-', $nospecial);
return $converted;
I did not find anything wrong with this code, but it shows multiple glues in the output, even though I have already removed multiple spaces in the second line of the code. So why does it show multiple glues? Does anyone have a solution?
"But I have already removed multiple spaces in the second line of the code"
No, you haven't removed the spaces. The second line of code keeps the letters, the digits, the whitespace characters and the plus sign (+) in $nospecial.
A character class matches a single character in the subject. Inside a character class, \s+ doesn't mean "one or more whitespace characters". It means either a whitespace character (\s) or a plus sign (+). If it meant what you intended, $nospecial wouldn't contain any whitespace at all.
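To see this for yourself, here is a minimal sketch (the sample string is made up) that runs the pattern from the question on a string containing a plus sign and a run of spaces:

```php
<?php
// Inside [...], "\s+" is two separate members: any whitespace character, or a literal "+".
$sample = "a+b   c!";

// The negated class keeps letters, digits, whitespace and "+", so only "!" is removed.
$result = preg_replace('/[^a-zA-Z0-9\s+]/', '', $sample);

echo $result, "\n"; // a+b   c
```

The plus sign and all three spaces survive, which is why the next step produces several glues in a row.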
I suggest you split the second processing step in two: first remove all the special characters (keep letters, digits and spaces) then compact the spaces (there is no way to do both of them in a single replace).
The compacting can be then combined with the replacement of the spaces with the glue in a single operation:
// Make all the characters lowercase
// Trim the whitespace first so the final result has no stray hyphens at the ends
$low = strtolower(trim($input));
// Remove special characters (keep letters, digits and spaces)
$nospecial = preg_replace('/[^a-z0-9\s]/', '', $low);
// Compact the spaces and replace them with the glue
$converted = preg_replace('/\s+/', '-', $nospecial);
return $converted;
Update: added trimming of the input string before any processing, to avoid a result that starts or ends with the glue. This is not required by the question; it was suggested by #niet-the-dark-absol in a comment and I also think it's a good idea. Most probably, the string generated this way is used as a file name by the question's author.
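Putting the steps together with the glue parameter the question asks for could look roughly like this. The function name toSlug is hypothetical (the question does not name its function), and the sketch assumes the glue contains no "$" or "\" sequences that preg_replace would read as backreferences:

```php
<?php
// Hypothetical helper name; the original post does not name its function.
function toSlug(string $input, string $glue = '-'): string
{
    // Lowercase and trim so the result cannot start or end with the glue.
    $low = strtolower(trim($input));
    // Remove special characters (keep letters, digits and whitespace).
    $nospecial = preg_replace('/[^a-z0-9\s]/', '', $low);
    // Compact each run of whitespace into a single glue.
    // Assumption: $glue has no "$n" or "\n" style backreference sequences.
    return preg_replace('/\s+/', $glue, $nospecial);
}

echo toSlug("  Hello,   World! 2024 "), "\n"; // hello-world-2024
echo toSlug("Foo  &  Bar", "_"), "\n";        // foo_bar
```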
I was scouring through SO answers and found that the solution that most gave for replacing multiple spaces is:
$new_str = preg_replace("/\s+/", " ", $str);
But in many cases the whitespace includes Unicode characters such as line feed, form feed, carriage return, non-breaking space, etc. This wiki says that Unicode defines twenty-five characters as whitespace.
So how do we replace all these characters as well using regular expressions?
When you pass the u modifier, \s becomes Unicode-aware. So, a simple solution is to use
$new_str = preg_replace("/\s+/u", " ", $str);
See the PHP online demo.
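For a quick local check, here is a small sketch with a made-up sample string. The "\u{...}" escapes need PHP 7 or later:

```php
<?php
// "\u{...}" escapes need PHP 7+; the sample text is made up for illustration.
$str = "foo\u{00A0}\u{2003}\u{00A0}bar"; // no-break space, em space, no-break space

// With the u modifier the subject is treated as UTF-8 and \s is Unicode-aware.
$new_str = preg_replace('/\s+/u', ' ', $str);

echo $new_str, "\n"; // foo bar
```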
The first thing to do is to read this explanation of how Unicode can be treated in regex. Coming specifically to PHP, we first need to include the PCRE modifier 'u' so that the engine treats the pattern and the subject as UTF-8. So this would be:
$pattern = "/<our-pattern-here>/u";
The next thing to note is that in PHP a Unicode character is written in the pattern as \x{00A0}, where 00A0 is the hex code of the non-breaking space. So if we want to replace consecutive non-breaking spaces with a single space we would have:
$pattern = "/\x{00A0}+/u";
$new_str = preg_replace($pattern," ",$str);
And if we were to include other types of spaces mentioned in the wiki like:
\x{000D} carriage return
\x{000C} form feed
\x{0085} next line
Our pattern becomes:
$pattern = "/[\x{00A0}\x{000D}\x{000C}\x{0085}]+/u";
Note that the square brackets [ ] with a + for one or more occurrences do not make the regex engine try every combination of these characters: a character class consumes one character at a time, so this pattern runs in a single pass.
If you prefer to keep the two operations separate, you can first replace every occurrence of each of these characters with a normal space, and then replace multiple spaces with a single normal space. We remove the [ ]+ and instead separate the characters with the or operator | :
$pattern = "/\x{00A0}|\x{000D}|\x{000C}|\x{0085}/u";
$new_str = preg_replace($pattern," ",$str); // we have one-to-one replacement of character by a normal space, so 5 unicode chars give 5 normal spaces
$final_str = preg_replace("/\s+/", " ", $new_str); // multiple normal spaces now become single normal space
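As a quick check, the sketch below compares the two-step replacement with a single character class that also contains a normal space. That extra space is my addition, so that runs mixing normal and Unicode spaces collapse too. On this made-up sample both give the same string:

```php
<?php
// Sample: a, NBSP, NBSP, CR, b, NEL, normal space, c ("\u{...}" needs PHP 7+).
$str = "a\u{00A0}\u{00A0}\rb\u{0085} c";

// Two steps: one-to-one replacement, then collapse the normal spaces.
$step1   = preg_replace('/\x{00A0}|\x{000D}|\x{000C}|\x{0085}/u', ' ', $str);
$twoStep = preg_replace('/\s+/', ' ', $step1);

// One step: the same characters plus a normal space in a single character class.
$oneStep = preg_replace('/[ \x{00A0}\x{000D}\x{000C}\x{0085}]+/u', ' ', $str);

var_dump($twoStep === $oneStep); // bool(true)
echo $oneStep, "\n";             // a b c
```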
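A pattern that matches all Unicode whitespace is [\pZ\pC]. \pZ covers the separator categories and \pC the control, format and other "Other" categories, so it also matches some characters that are not whitespace. Here is a unit test to prove it.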
If you're parsing user input in UTF-8 and need to normalize it, it's important to base your match on that list. So to answer your question that would be:
$new_str = preg_replace("/[\pZ\pC]+/u", " ", $str);
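A small sketch of what that class picks up, on a made-up string. Keep in mind that \pC is broader than whitespace, so control and format characters are replaced as well:

```php
<?php
// U+200B ZERO WIDTH SPACE is a format character (category Cf), so \pC matches it.
// U+00A0 is a space separator (\pZ) and the tab is a control character (\pC).
$str = "foo\u{200B}\u{00A0}\tbar"; // "\u{...}" needs PHP 7+

$new_str = preg_replace('/[\pZ\pC]+/u', ' ', $str);

echo $new_str, "\n"; // foo bar
```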
We're scrubbing a ridiculous amount of data, and are finding many examples of cleaned data that are left with irrelevant punctuation at the beginning and end of the final string. Single and double quotes are fine, but leading/trailing dashes, commas, etc. need to be removed.
I've studied the answer at How can I remove all leading and trailing punctuation?, but am unable to find a way to accomplish the same in PHP.
- some text.               => dash and period should be removed
"Some Other Text".         => period should be removed
it's a matter of opinion   => apostrophe should be kept
/ some more text?          => slash should be removed and question mark kept
In short,
Certain punctuation occurring BEFORE the first AlphaNumeric character must be removed
Certain punctuation occurring AFTER the last AlphaNumeric character must be removed
How can I accomplish this with PHP? The few examples I've found surpass my RegEx/JS abilities.
This is an answer without regex.
You can use the function trim (or a combination of ltrim/rtrim) to specify all the characters you want to remove. For your example:
$str = trim($str, " \t\n\r\0\x0B-.");
(As I suppose you also want to remove spacing and newlines at the beginning/end, I left the default mask.)
See also rtrim and ltrim if you don't want to remove the same charlist at the beginning and the end of your strings.
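Applied to the four samples from the question, that could look like the sketch below. The "/" in the mask is my addition so that the last sample is covered too:

```php
<?php
$samples = array(
    '- some text.',
    '"Some Other Text".',
    "it's a matter of opinion",
    '/ some more text?',
);

// Default whitespace mask plus "-", "." and "/" (the "/" is an assumption for the last sample).
$mask = " \t\n\r\0\x0B-./";

$cleaned = array();
foreach ($samples as $s) {
    $cleaned[] = trim($s, $mask);
}

print_r($cleaned);
// some text / "Some Other Text" / it's a matter of opinion / some more text?
```

Quotes, apostrophes and the question mark are not in the mask, so they stay where they are.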
You can modify the pattern to include other characters.
$array = array(
    '- some text.',
    '"Some Other Text".',
    'it\'s a matter of opinion',
    '/ some more text?'
);
foreach ($array as $key => $string) {
    $array[$key] = preg_replace(array(
        '/^[\.\-\/]*/',
        '/[\.\-\/]*$/'
    ), array('', ''), $string);
}
print_r($array);
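Note that with the patterns above the space after a leading dash or slash stays in place. A possible variant, assuming whitespace should go too, adds \s to the class and merges both anchors into one pattern:

```php
<?php
$strings = array(
    '- some text.',
    '"Some Other Text".',
    'it\'s a matter of opinion',
    '/ some more text?',
);

// Strip a run of whitespace, ".", "-" or "/" at the start or at the end of the string.
$pattern = '/^[\s.\-\/]+|[\s.\-\/]+$/';

$result = array();
foreach ($strings as $s) {
    $result[] = preg_replace($pattern, '', $s);
}

print_r($result);
// some text / "Some Other Text" / it's a matter of opinion / some more text?
```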
If the punctuation could be more than one character, you could do this
function trimFormatting($str) { // trim
    $osl = 0;
    $pat = '(<br>|,|\s+)';
    while ($osl !== strlen($str)) {
        $osl = strlen($str);
        $str = preg_replace('/^' . $pat . '|' . $pat . '$/i', '', $str);
    }
    return $str;
}
echo trimFormatting('<BR>,<BR>Hello<BR>World<BR>, <BR>');
// will give "Hello<BR>World"
The routine checks for "<BR>", "," and one or more whitespace characters ("\s+"). The "|" is the OR operator, used three times in the routine. It trims both at the start "^" and at the end "$" at the same time. It keeps looping until nothing more is trimmed off (i.e. there is no further reduction in string length).
I want to replace with spaces all characters except numbers, letters, spaces and these other characters: #=<>();*,.+\/-
e.g. preg_replace("/[^ #=<>();*,.+\/-\w]+/", " ", $string);
My problem is that when $string contains two or more consecutive characters to be replaced, the function replaces them with just one space, while I need it to replace two or more characters with the same number of spaces.
Is there a way?
You should match only one character at a time. You must also escape some of the characters.
Change
preg_replace("/[^ #=<>();*,.+\/-\w]+/", " ", $string);
to
preg_replace("/[^ #=<>();*,\\.+\\/\\-\\w]/", " ", $string);
If your character class contains both forward slashes and backslashes, you need to escape both of them inside the character class.
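The difference the quantifier makes can be seen in this small sketch (the sample string is made up, and the pattern is written in single quotes so the backslashes do not need doubling):

```php
<?php
$string = "ab@@cd!e";

// Without the quantifier each disallowed character becomes its own space.
$perChar = preg_replace('/[^ #=<>();*,.+\/\-\w]/', ' ', $string);

// With "+" a whole run of disallowed characters collapses into one space.
$perRun = preg_replace('/[^ #=<>();*,.+\/\-\w]+/', ' ', $string);

echo $perChar, "\n"; // ab  cd e
echo $perRun, "\n";  // ab cd e
```

The first result keeps the original string length, which is what the question asks for.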
"I wanna replace with spaces all characters except number, letters, space and other characters #=<>();*,.+\/-"
\w represents letters, digits and also the _ symbol, so avoid using \w inside the character class.
As another answer said, you need to remove the + after the character class; with it, one or more characters are replaced with a single space.
And your regex should be,
[^- #=<>();*,.+\\\/0-9A-Za-z]
DEMO
In the demo it matches the _ symbol because _ isn't included in the negated character class. In the replacement part I gave only a single space, so three _ symbols are replaced with three spaces.
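The underscore point can be checked with a short sketch. The two classes here are simplified on purpose (letters, digits and space only), so they are not the full class from the answer:

```php
<?php
// \w also matches "_", so a negated class built on \w leaves underscores alone.
$withW = preg_replace('/[^\w ]/', ' ', 'a___b');

// With explicit ranges the underscores are not allowed and each one becomes a space.
$explicit = preg_replace('/[^0-9A-Za-z ]/', ' ', 'a___b');

echo $withW, "\n";    // a___b
echo $explicit, "\n"; // a   b
```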
I need to write a regex that performs several checks: whether a string has whitespace at the beginning or at the end, whether the whole string is composed of whitespace, and whether the string contains only digits.
I've written this regex:
$regex = '/^(\s+ )| ^(\d+)$/';
but it doesn't work. What's wrong?
First things first: get your spaces right!
For example, (\s+ ) will match a minimum of one whitespace character (\s+) followed by another literal space ( )! The same applies to the space between | and ^. This way the pattern requires a literal space every time, which leads to wrong results.
If I get you right and you want to match on strings which
start with one or more spaces OR
end with one or more spaces OR
consist only of spaces OR
consist only of numbers
I'd use
/^(?:\s+.*|.*\s+$|\d+$)/
Demo @ regex101
This way you match spaces at the start of the string (\s+.*) or (|) spaces at the end of the string (.*\s+$) or a completely numeric string (\d+$).
Insert capturing groups as needed.
This will match in case the whole string consists of spaces, too, because technically the string then starts with spaces.
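If you want to try it without regex101, a few made-up inputs can be run through preg_match like this:

```php
<?php
$regex = '/^(?:\s+.*|.*\s+$|\d+$)/';

// Leading space, trailing space, only spaces, only digits, plain text, digits with inner space.
$samples = array(' abc', 'abc ', '   ', '12345', 'abc', '12 34');

$results = array();
foreach ($samples as $s) {
    $results[] = preg_match($regex, $s);
    echo var_export($s, true), ' => ', end($results), "\n";
}
// The first four samples give 1, the last two give 0.
```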
The space before ^(\d+) prevents your regex from matching the numeric string.
It should be like below:
$regex = '/^\s*\d*\s*$/';
First of all, remove the space between | and ^. You are trying to match a space before the beginning of the line (^), so that cannot work.
I do not exactly understand what you want. Either a string that only consists of white spaces, or a number that may have white spaces at the beginning or end? Try this:
$regex = '/^\s*\d*\s*$/';
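One thing worth knowing before using this pattern: every part of it is optional, so it also accepts an empty string. A quick sketch with made-up inputs:

```php
<?php
$regex = '/^\s*\d*\s*$/';

// Empty string, only spaces, padded number, bare number, digits with inner space, letters.
$samples = array('', '   ', ' 42 ', '42', '4 2', 'abc');

$results = array();
foreach ($samples as $s) {
    $results[] = preg_match($regex, $s);
    echo var_export($s, true), ' => ', end($results), "\n";
}
// The first four samples give 1 (including the empty string), the last two give 0.
```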
How to replace spaces and dashes when they appear together with only dash in PHP?
For example, below is my URL:
http://kjd.case.150/1 BHK+Balcony- 700+ sqft. spacious apartmetn Bandra Wes
In this URL I want to replace all special characters with a dash in PHP. There is already one dash after "Balcony". If I replace the special characters with dashes, I end up with two dashes in a row because there's already one dash in the URL, and I want only 1 dash.
I'd say you may want it the other way round: not "spaces" but every non-alphanumeric character.
That's because there can be other characters that are disallowed in a URL (the + sign, for example, which is used as a space replacement).
So, to make a valid URL from free-form text:
$url = preg_replace("![^a-z0-9]+!i", "-", $url);
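On a shortened version of the title from the question this could look as follows. The trim() call is my addition, assuming a leading or trailing dash is not wanted either:

```php
<?php
// Shortened sample taken from the title part of the question's URL.
$url = "1 BHK+Balcony- 700+ sqft. spacious";

// Every run of non-alphanumeric characters becomes a single dash.
$url = preg_replace("![^a-z0-9]+!i", "-", $url);

// Assumption: strip a dash left over at either end.
$url = trim($url, "-");

echo $url, "\n"; // 1-BHK-Balcony-700-sqft-spacious
```

Because of the + quantifier, "- " after "Balcony" turns into one dash rather than two.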
If there could be max one space surrounding the hyphen you can use the answer by John. If there could be more than one space you can try using preg_replace:
$str = preg_replace('/\s*-\s*/','-',$str);
This would also replace a - that is not surrounded by any spaces with a -!
To make it a bit more efficient you could do:
$str = preg_replace('/\s+-\s*|\s*-\s+/','-',$str);
Now this ensures that a - has at least one space next to it when it is replaced.
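Both patterns should give the same output. The second one just skips hyphens that have no space around them. A sketch on a made-up string:

```php
<?php
$str = "Balcony- 700 - sqft-ready";

// Matches every hyphen, with or without surrounding whitespace.
$a = preg_replace('/\s*-\s*/', '-', $str);

// Matches only hyphens that have whitespace on at least one side.
$b = preg_replace('/\s+-\s*|\s*-\s+/', '-', $str);

echo $a, "\n"; // Balcony-700-sqft-ready
echo $b, "\n"; // Balcony-700-sqft-ready
```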
This should do it for you
strtolower(preg_replace('/\s+/', '-', preg_replace('/[^a-zA-Z0-9\s]/', '', trim($string))));
Apply the regular expression /[^a-zA-Z0-9]/ with the replacement '-', which replaces every non-alphanumeric character with -. Store the result in a variable and then apply /\-$/ with the replacement '', which strips a trailing hyphen.
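It's an old thread, but to help someone, use this function:
function urlSafeString($str)
{
    // eregi_replace() was removed in PHP 7; preg_replace() with the i flag does the same job.
    // Turn hyphens into spaces, then drop everything except letters, digits and spaces.
    $str = preg_replace('/[^a-z0-9 ]/i', '', str_replace('-', ' ', $str));
    // Collapse each run of spaces into a single hyphen.
    $str = preg_replace('/ +/', '-', trim($str));
    return $str;
}
It will return a URL-safe string.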