How can I strip whitespace from a PHP variable? - php

I know about this comment on PHP.net.
I would like a tool similar to tr for PHP, so that I can simply run
tr -d " " ""
I tried the function php_strip_whitespace, without success:
$tags_trimmed = php_strip_whitespace($tags);
I also tried the regex function, without success:
$tags_trimmed = preg_replace(" ", "", $tags);

To strip all whitespace, you can use a regular expression:
$str=preg_replace('/\s+/', '', $str);
See also this answer for something which can handle whitespace in UTF-8 strings.
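As a minimal sketch of why the call in the question fails while this one works: preg_replace expects the pattern to be wrapped in delimiters (here /), so a bare " " is not a valid pattern. The sample value of $tags below is made up for illustration.

```php
<?php
// Hypothetical sample input; the question does not show the real value of $tags.
$tags = " php \t regex\nwhitespace ";

// '/\s+/' = one or more whitespace characters, wrapped in / delimiters.
$tagsTrimmed = preg_replace('/\s+/', '', $tags);

echo $tagsTrimmed, PHP_EOL; // phpregexwhitespace
```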

A regular expression is not UTF-8 aware by default. Without the u modifier, the \s meta-character only matches ASCII whitespace. Therefore, the following command only removes ASCII whitespace such as tabs, spaces, carriage returns and new lines:
// http://stackoverflow.com/a/1279798/54964
$str=preg_replace('/\s+/', '', $str);
With UTF-8 becoming mainstream, this expression will more and more often leave white space behind, because it does not match the Unicode white space characters that \s cannot account for.
To deal with the white space characters introduced in Unicode/UTF-8, a more extensive pattern is required to match and remove them.
Because regular expressions do not recognize multi-byte characters by default, each character has to be spelled out as a complete, delimited byte sequence. Matching single bytes would alter byte segments that belong to other UTF-8 characters (for example, \x80 from the quad set would also replace every \x80 sub-byte in smart quotes).
$cleanedstr = preg_replace(
"/(\t|\n|\v|\f|\r| |\xC2\x85|\xc2\xa0|\xe1\xa0\x8e|\xe2\x80[\x80-\x8D]|\xe2\x80\xa8|\xe2\x80\xa9|\xe2\x80\xaF|\xe2\x81\x9f|\xe2\x81\xa0|\xe3\x80\x80|\xef\xbb\xbf)+/",
"_",
$str
);
As written, this replaces each run of these characters with an underscore; pass "" as the replacement to remove them instead. It accounts for tabs, newlines, vertical tabs, form feeds, carriage returns, spaces, and additionally, from here:
next line, non-breaking spaces, Mongolian vowel separator, [en quad, em quad, en space, em space, three-per-em space, four-per-em space, six-per-em space, figure space, punctuation space, thin space, hair space, zero width space, zero width non-joiner, zero width joiner], line separator, paragraph separator, narrow no-break space, medium mathematical space, word joiner, ideographic space, and the zero width non-breaking space.
Many of these wreak havoc in XML files exported from automated tools or sites: they foul up text searches and recognition, and they can be pasted invisibly into PHP source code. There, paragraph and line separators cause the parser to jump to the next command, so lines of code are skipped, resulting in intermittent, unexplained errors that we have begun referring to as "textually transmitted diseases".
[It's not safe to copy and paste from the web anymore. Use a character scanner to protect your code. lol]
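A shorter way to express roughly the same character list is to let PCRE work in UTF-8 mode with the u modifier and use Unicode properties instead of hand-written byte sequences. This is only a sketch, not the method used above: the function name strip_unicode_whitespace is made up, and the code points listed explicitly are the next line, zero-width and joiner characters from the list above, which are not classified as Unicode separators.

```php
<?php
// Hypothetical helper name, for illustration only.
function strip_unicode_whitespace(string $str): string
{
    // \s        ASCII whitespace
    // \p{Z}     all Unicode separators (spaces, line and paragraph separator)
    // the rest  next line, Mongolian vowel separator, zero width space,
    //           zero width (non-)joiner, word joiner, zero width no-break space
    $pattern = '/[\s\p{Z}\x{0085}\x{180E}\x{200B}-\x{200D}\x{2060}\x{FEFF}]+/u';

    // With the u modifier, preg_replace returns null if $str is not valid UTF-8.
    $result = preg_replace($pattern, '', $str);

    return $result === null ? $str : $result;
}

$dirty = "a\u{00A0}b\u{2003}c\u{200B}d\u{3000}e\tf\u{FEFF}g";
echo strip_unicode_whitespace($dirty), PHP_EOL; // abcdefg
```

The "\u{...}" escapes and the type declarations assume PHP 7 or later.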

Sometimes you need to collapse consecutive white space into a single space. You can do it like this:
$str = "My     name     is";
$str = preg_replace('/\s\s+/', ' ', $str);
Output:
My name is

$string = str_replace(" ", "", $string);
I believe preg_replace would be looking for a delimited pattern, something like '/[[:space:]]/'.

You can use PHP's trim function to trim both sides (left and right):
trim($yourinputdata, " ");
Or
trim($yourinputdata);
You can also use:
ltrim() - Removes whitespace or other predefined characters from the left side of a string
rtrim() - Removes whitespace or other predefined characters from the right side of a string
System: PHP 4,5,7
Docs: http://php.net/manual/en/function.trim.php
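A short sketch of what the three functions do, using a made-up input string. Note that they only touch the ends of the string (the space between the words stays) and that they return the result rather than modifying the argument.

```php
<?php
// Hypothetical sample input.
$input = "  hello world \n";

var_dump(trim($input));  // string(11) "hello world"
var_dump(ltrim($input)); // leading spaces removed, trailing " \n" kept
var_dump(rtrim($input)); // trailing " \n" removed, leading spaces kept
```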

If you want to remove all spaces everywhere from $tags, why not just:
$tags = str_replace(' ', '', $tags);
If you want to remove new lines and such, that would require a bit more...
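One way the "bit more" could look, as a sketch: str_replace accepts an array of search strings, so the other ASCII whitespace characters can be listed next to the space. The sample value of $tags is made up.

```php
<?php
// Hypothetical sample value for $tags.
$tags = "php, \tregex,\r\n whitespace";

// Each search string in the array is replaced by the empty string.
$tags = str_replace([' ', "\t", "\n", "\r", "\v", "\f"], '', $tags);

echo $tags, PHP_EOL; // php,regex,whitespace
```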

Another possible option is to use a custom stream wrapper that exposes variables as files. You can achieve it like this:
1) First of all, register your wrapper (only once per file, the way you would call session_start()):
stream_wrapper_register('var', 'VarWrapper');
2) Then define your wrapper class (it was written quickly and is not completely correct, but it works):
class VarWrapper {
protected $pos = 0;
protected $content;
public function stream_open($path, $mode, $options, &$opened_path) {
$varname = substr($path, 6);
global $$varname;
$this->content = $$varname;
return true;
}
public function stream_read($count) {
$s = substr($this->content, $this->pos, $count);
$this->pos += $count;
return $s;
}
public function stream_stat() {
$f = fopen(__file__, 'rb');
$a = fstat($f);
fclose($f);
if (isset($a[7])) $a[7] = strlen($this->content);
return $a;
}
}
3) Then use any file function with your wrapper on var:// protocol (you can use it for include, require etc. too):
global $__myVar;
$__myVar = 'Enter tags here';
$data = php_strip_whitespace('var://__myVar');
Note: Don't forget to have your variable in global scope (like global $__myVar)

This is an old post, but the shortest answer is not listed here, so I am adding it now:
strtr($str,[' '=>'']);
Another common way to "skin this cat" would be to use explode and implode, like this:
implode('',explode(' ', $str));

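You can do it by using ereg_replace (deprecated since PHP 5.3 and removed in PHP 7; on current versions use preg_replace with a delimited pattern instead):
$str = 'This Is New Method Ever';
$newstr = ereg_replace('[[:space:]]+', '', trim($str));
echo $newstr;
// Result - ThisIsNewMethodEver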
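Since the ereg_* functions no longer exist in PHP 7 and later, here is a sketch of the same idea with preg_replace; the only changes are the function name and the / delimiters around the pattern.

```php
<?php
$str = 'This Is New Method Ever';

// POSIX character classes such as [[:space:]] are also understood by PCRE.
$newstr = preg_replace('/[[:space:]]+/', '', $str);

echo $newstr, PHP_EOL; // ThisIsNewMethodEver
```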

You can also use the preg_replace_callback function. It is identical to its sibling preg_replace, except that it takes a callback function, which gives you more control over how you manipulate the output.
$str = "this is a string";
echo preg_replace_callback(
'/\s+/',
function ($matches) {
return "";
},
$str
);
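To illustrate the "more control" part, here is a sketch in which the callback decides per match what to return: a run of whitespace that contains a line break is reduced to a single newline, and any other run is removed. The input string and this particular rule are made up for the example.

```php
<?php
// Hypothetical sample input.
$str = "first  line \n\n  second\tline";

$result = preg_replace_callback(
    '/\s+/',
    function (array $matches): string {
        // $matches[0] is the whole run of whitespace that was matched.
        // Keep a single newline if the run contained one, otherwise drop it.
        return strpos($matches[0], "\n") !== false ? "\n" : '';
    },
    $str
);

echo $result, PHP_EOL;
```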

$string = trim(preg_replace('/\s+/','',$string));

This is an old post, but it can be done like this:
if(!function_exists('strim')) :
function strim($str,$charlist=" ",$option=0){
$return='';
if(is_string($str))
{
// Translate HTML entities
$return = str_replace("&nbsp;"," ",$str);
$return = strtr($return, array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES)));
// Choose trim option
switch($option)
{
// Strip whitespace (and other characters) from the begin and end of string
default:
case 0:
$return = trim($return,$charlist);
break;
// Strip whitespace (and other characters) from the begin of string
case 1:
$return = ltrim($return,$charlist);
break;
// Strip whitespace (and other characters) from the end of string
case 2:
$return = rtrim($return,$charlist);
break;
}
}
return $return;
}
endif;
Standard trim() functions can be problematic when HTML entities are involved. That's why I wrote this "Super Trim" function, which handles that problem and also lets you choose whether to trim from the beginning, the end, or both sides of the string.
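A more compact variation on the same idea, as a sketch only: decode the entities with html_entity_decode and then trim with a UTF-8 aware regex, because a decoded &nbsp; becomes the no-break space U+00A0, which plain trim() does not treat as whitespace by default. The name entity_trim is made up, and unlike strim() this sketch has no $charlist or $option parameters.

```php
<?php
// Hypothetical helper, not the strim() function above: decode entities,
// then trim ASCII whitespace and no-break spaces from both ends.
function entity_trim(string $str): string
{
    $decoded = html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // \x{00A0} is the no-break space that &nbsp; decodes to.
    $trimmed = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $decoded);

    return $trimmed === null ? $decoded : $trimmed;
}

var_dump(entity_trim('&nbsp; Fish &amp; Chips &nbsp;') === 'Fish & Chips'); // bool(true)
```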

A simple way to remove spaces from the whole string is to use the explode function and print the whole string using a for loop.
$text = $_POST['string'];
$a=explode(" ", $text);
$count=count($a);
for($i=0;$i<$count; $i++){
echo $a[$i];
}

The \s regex argument is not compatible with UTF-8 multibyte strings.
This PHP regex is one I wrote to solve this, using PCRE (Perl Compatible Regular Expressions) arguments that are suitable for UTF-8 strings:
function remove_utf8_whitespace($string) {
return preg_replace('/\h+/u','',preg_replace('/\R+/u','',$string));
}
- Example Usage -
Before:
$string = " this is a test \n and another test\n\r\t ok! \n";
echo $string;
this is a test
and another test
ok!
echo strlen($string); // result: 43
After:
$string = remove_utf8_whitespace($string);
echo $string;
thisisatestandanothertestok!
echo strlen($string); // result: 28
PCRE Argument Listing
Source: https://www.rexegg.com/regex-quickstart.html
Character | Legend | Example | Sample Match
\t | Tab | T\t\w{2} | T ab
\r | Carriage return character | see below |
\n | Line feed character | see below |
\r\n | Line separator on Windows | AB\r\nCD | AB (line break) CD
\N | Perl, PCRE (C, PHP, R…): one character that is not a line break | \N+ | ABC
\h | Perl, PCRE (C, PHP, R…), Java: one horizontal whitespace character: tab or Unicode space separator | |
\H | One character that is not a horizontal whitespace | |
\v | .NET, JavaScript, Python, Ruby: vertical tab | |
\v | Perl, PCRE (C, PHP, R…), Java: one vertical whitespace character: line feed, carriage return, vertical tab, form feed, paragraph or line separator | |
\V | Perl, PCRE (C, PHP, R…), Java: any character that is not a vertical whitespace | |
\R | Perl, PCRE (C, PHP, R…), Java: one line break (carriage return + line feed pair, and all the characters matched by \v) | |
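To see \h and \R at work on characters outside ASCII, here is a small sketch that feeds the function above a no-break space and an ideographic space. The sample strings are made up, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
// Same body as the function above, repeated so this sketch runs on its own.
function remove_utf8_whitespace($string) {
    return preg_replace('/\h+/u', '', preg_replace('/\R+/u', '', $string));
}

// No-break space (U+00A0), ideographic space (U+3000), CRLF and a tab.
$sample = "a\u{00A0}b\u{3000}c\r\nd\te";

echo remove_utf8_whitespace($sample), PHP_EOL; // abcde

// Without the u modifier, plain \s leaves the ideographic space in place.
var_dump(preg_replace('/\s+/', '', "b\u{3000}c") === "b\u{3000}c"); // bool(true)
```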

There are some special types of whitespace in the form of tags.
First use
$str=strip_tags($str);
to remove redundant or malformed tags and get a normal string.
Then use
$str=preg_replace('/\s+/', '', $str);
This works for me.
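The two steps together, as a sketch on a made-up piece of markup:

```php
<?php
// Hypothetical sample markup.
$str = "<p>Hello <b>big</b>\n world</p><br>";

$str = strip_tags($str);                // "Hello big\n world"
$str = preg_replace('/\s+/', '', $str); // "Hellobigworld"

echo $str, PHP_EOL; // Hellobigworld
```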

Related

Replacing space indentation with tabs

I am looking to replace every 4 spaces at the start of a line with a tab, but nothing further once there is text present.
My initial regex of / {4}+/ (or /[ ]{4}+/ for the sake of readability) clearly worked, but obviously any instance of four spaces would be replaced.
$string = '        this is some text -->    <-- are these tabs or spaces?';
$string .= "\n    and this is another line singly indented";
// I wrote 4 spaces, a tab, then 4 spaces here but unfortunately it will not display
$string .= "\n    \t    and this is third line with tabs and spaces";
$pattern = '/[ ]{4}+/';
$replace = "\t";
$new_str = preg_replace( $pattern , $replace , $string );
echo '<pre>'. $new_str .'</pre>';
This was an example of what I had originally. The regex works perfectly with regard to the conversion, except that the 4 spaces between the --> <-- are also replaced by a tab. I am really looking to have the text after the indentation unaltered.
My best effort so far has been: (^) start of line, ([ ]{4}+) the pattern, (.*?[;\s]*) anything up to the first non-space \s
$pattern = '/^[ ]{4}+.*?[;\s]*/m';
which almost works, except that the indentation is now lost. Can anybody help me understand what I am missing here?
[edit]
For clarity, what I am trying to do is change the start-of-text indentation from spaces to tabs; I really don't understand why this is confusing to anybody.
To be as clear as possible (using the value of $string above):
First line has 8 spaces at the start, some text with 4 spaces in the middle.
I am looking for 2 tabs at the start and no change to spaces in the text.
Second line has 4 spaces at the start.
I am looking to have only 1 tab at the start of the line.
Third line has 4 spaces, 1 tab and 4 spaces.
I am looking to have 3 tabs at the start of the line.
If you're not a regular expression guru, this will probably make most sense to you and be easier to adapt to similar use cases (this is not the most efficient code, but it's the most "readable" imho):
// replace all regex matches with the result of applying
// a given anonymous function to a $matches array
function spaces2tabs($s_with_spaces) {
    // before anything else, replace existing tabs with 4 spaces
    // to permit homogeneous translation
    // (note: this also expands tabs that appear after the indentation)
    $s_with_spaces = str_replace("\t", '    ', $s_with_spaces);
    return preg_replace_callback(
        '/^([ ]+)/m',
        function ($ms) {
            // $ms[0] - is the full match
            // $ms[1] - is the first (...) group from the regex
            // ...here you can add extra logic to handle
            // leading spaces that are not a multiple of 4
            // (as written, the remainder is dropped)
            return str_repeat("\t", (int) floor(strlen($ms[1]) / 4));
        },
        $s_with_spaces
    );
}
// example (using dots to make spaces visible for explaining)
$s_with_spaces = <<<EOS
no indent
....4 spaces indent
........8 spaces indent
EOS;
$s_with_spaces = str_replace('.', ' ', $s_with_spaces);
$s_with_tabs = spaces2tabs($s_with_spaces);
If you want a performant but hard to understand or tweak one-liner instead, the solutions in the comments from the regex-gurus above should work :)
P.S. In general, preg_replace_callback (and its equivalent in JavaScript) is a great "Swiss army knife" of structured text processing. I have, shamefully, even written parsers for mini-languages using it ;)
The way I would do it is this:
$str = "...";
// \G anchors each match to where the previous one ended (or to the start of
// the line), so only the leading groups of 4 spaces are replaced
$pattern = '/\G {4}/';
$replace = "\t";
$lines = explode("\n", $str);
foreach ($lines as &$line) {
    $line = str_replace("\t", "    ", $line);
    $line = preg_replace($pattern, $replace, $line);
}
unset($line); // break the reference left over from the loop
$results = implode("\n", $lines);
Please re-evaluate the code thoroughly, as I wrote this quickly and intuitively.
I can't run a PHP server to test it :( but it should help you resolve this problem.
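For comparison, here is a sketch of a single-pass variant that works on the whole string and leaves tabs and spaces after the indentation untouched. It is an assumption of how such a pattern could look, not something taken from the answers: the m modifier lets ^ match at every line start, and \G lets further matches chain only from where the previous one ended. The input is the three-line string described in the question.

```php
<?php
// The three lines described in the question: 8 spaces; 4 spaces;
// 4 spaces + tab + 4 spaces.
$string  = '        this is some text -->    <-- are these tabs or spaces?';
$string .= "\n    and this is another line singly indented";
$string .= "\n    \t    and this is third line with tabs and spaces";

// (?:^|\G)     start of a line (m modifier), or exactly where the previous
//              match ended, so matches can only chain from the line start
// (?: {4}|\t)  one indentation unit: four spaces or an existing tab
$new_str = preg_replace('/(?:^|\G)(?: {4}|\t)/m', "\t", $string);

echo '<pre>' . $new_str . '</pre>';
```

With this input the first line should start with 2 tabs, the second with 1 and the third with 3, while the 4 spaces between --> and <-- stay as they are.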

PHP rtrim all trailing special characters

I'm making a function that detects and removes all trailing special characters from a string. It can convert strings like:
"hello-world"
"hello-world/"
"hello-world--"
"hello-world/%--+..."
into "hello-world".
Does anyone know a trick that doesn't require writing a lot of code?
Just for fun
[^a-z\s]+
Regex demo
Explanation:
[^x]: One character that is not x
\s: "whitespace character": space, tab, newline, carriage return, vertical tab
+: One or more
PHP:
$re = "/[^a-z\\s]+/i";
$str = "Hello world\nhello world/\nhello world--\nhellow world/%--+...";
$subst = "";
$result = preg_replace($re, $subst, $str);
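Try this (note that it removes special characters anywhere in the string, not only trailing ones):
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
or, to also keep apostrophes in the string:
preg_replace('/[^A-Za-z0-9\-\']/', '', $string); // keeps apostrophes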
You could use a regex like this, depending on your definition of "special characters":
function clean_string($input) {
return preg_replace('/\W+$/', '', $input);
}
It replaces any characters that are not a word character (\W) at the end of the string $ with nothing. \W will match [^a-zA-Z0-9_], so anything that is not a letter, digit, or underscore will get replaced. To specify which characters are special chars, use a regex like this, where you put all your special chars within the [] brackets:
function clean_string($input) {
return preg_replace('/[\/%.+-]+$/', '', $input);
}
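Here is a sketch that runs the first version on the four strings from the question. Since the title mentions rtrim, it also shows rtrim() with a character list as a regex-free alternative; the list '/%+-.' is an assumption about which characters count as special.

```php
<?php
// Same function as above, repeated so this sketch runs on its own.
function clean_string($input) {
    return preg_replace('/\W+$/', '', $input);
}

$inputs = ['hello-world', 'hello-world/', 'hello-world--', 'hello-world/%--+...'];

foreach ($inputs as $input) {
    // In an rtrim() character list, ".." would denote a range,
    // so "." appears only once here.
    echo clean_string($input), ' | ', rtrim($input, '/%+-.'), PHP_EOL;
}
// Each line prints: hello-world | hello-world
```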
This one is what you are looking for:
([^\n\w\d \"]*)$
It removes any trailing characters that are not letters, numbers, spaces or new lines.
Just call it like this:
preg_replace('/([^\n\w\s]*)$/', '', $string);

Trying to generate url slugs with PHP regex, Japanese characters not going through

So I'm trying to generate slugs to store in my DB. My locales include English, some European languages and Japanese.
I allow \d, \w, European characters are transliterated, Japanese characters are untouched. Period, plus and dash (-) are kept. Leading/trailing whitespace is removed, while the whitespace in between is replaced by a dash.
Here is some code: (please feel free to improve it, given my conditions above as my regex-fu is currently white belt tier)
function ToSlug($string, $separator='-') {
$url = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
$url = preg_replace('/[^\d\w一-龠ぁ-ゔァ-ヴー々〆〤.+ -]/', '', $url);
$url = strtolower($url);
$url = preg_replace('/[ ' . $separator . ']+/', $separator, $url);
return $url;
}
I'm testing this function, however my JP characters are not getting through; they are simply replaced by ''. Whilst I do suspect it's the //IGNORE that's taking them out, I need that there or else my German and French transliterations will not work. Any ideas on how I can fix this?
EDIT: I'm not sure if Japanese Kanji covers all of Simplified Chinese, but I'm going to need that and Korean as well. If anyone knows the regex off the bat, please let me know; it will save me some time searching. Thanks.
Note: I am not familiar with the Japanese writing system.
Looking at the function the iconv call appears to remove all the Japanese characters. Instead of using iconv to transliterate, it may be easier to just create a function that does it:
function _toSlugTransliterate($string) {
// Lowercase equivalents found at:
// https://github.com/kohana/core/blob/3.3/master/utf8/transliterate_to_ascii.php
$lower = [
'à'=>'a','ô'=>'o','ď'=>'d','ḟ'=>'f','ë'=>'e','š'=>'s','ơ'=>'o',
'ß'=>'ss','ă'=>'a','ř'=>'r','ț'=>'t','ň'=>'n','ā'=>'a','ķ'=>'k',
'ŝ'=>'s','ỳ'=>'y','ņ'=>'n','ĺ'=>'l','ħ'=>'h','ṗ'=>'p','ó'=>'o',
'ú'=>'u','ě'=>'e','é'=>'e','ç'=>'c','ẁ'=>'w','ċ'=>'c','õ'=>'o',
'ṡ'=>'s','ø'=>'o','ģ'=>'g','ŧ'=>'t','ș'=>'s','ė'=>'e','ĉ'=>'c',
'ś'=>'s','î'=>'i','ű'=>'u','ć'=>'c','ę'=>'e','ŵ'=>'w','ṫ'=>'t',
'ū'=>'u','č'=>'c','ö'=>'o','è'=>'e','ŷ'=>'y','ą'=>'a','ł'=>'l',
'ų'=>'u','ů'=>'u','ş'=>'s','ğ'=>'g','ļ'=>'l','ƒ'=>'f','ž'=>'z',
'ẃ'=>'w','ḃ'=>'b','å'=>'a','ì'=>'i','ï'=>'i','ḋ'=>'d','ť'=>'t',
'ŗ'=>'r','ä'=>'a','í'=>'i','ŕ'=>'r','ê'=>'e','ü'=>'u','ò'=>'o',
'ē'=>'e','ñ'=>'n','ń'=>'n','ĥ'=>'h','ĝ'=>'g','đ'=>'d','ĵ'=>'j',
'ÿ'=>'y','ũ'=>'u','ŭ'=>'u','ư'=>'u','ţ'=>'t','ý'=>'y','ő'=>'o',
'â'=>'a','ľ'=>'l','ẅ'=>'w','ż'=>'z','ī'=>'i','ã'=>'a','ġ'=>'g',
'ṁ'=>'m','ō'=>'o','ĩ'=>'i','ù'=>'u','į'=>'i','ź'=>'z','á'=>'a',
'û'=>'u','þ'=>'th','ð'=>'dh','æ'=>'ae','µ'=>'u','ĕ'=>'e','ı'=>'i',
];
return str_replace(array_keys($lower), array_values($lower), $string);
}
So, with some modifications, it could look something like this:
function toSlug($string, $separator = '-') {
// Work around this...
#$string = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
$string = _toSlugTransliterate($string);
// Remove unwanted chars + trim excess whitespace
// I got the character ranges from the following URL:
// https://stackoverflow.com/questions/6787716/regular-expression-for-japanese-characters#10508813
$regex = '/[^一-龠ぁ-ゔァ-ヴーa-zA-Z0-9ａ-ｚＡ-Ｚ０-９々〆〤.+ -]|^\s+|\s+$/u';
$string = preg_replace($regex, '', $string);
// Using the mb_* version seems safer for some reason
$string = mb_strtolower($string);
// Same as before
$string = preg_replace("/[ {$separator}]+/", $separator, $string);
return $string;
}
$x = ' æøå!this.ís-a test-ゔヴ ーァ ';
echo toSlug($x);
In regex you can use Unicode "scripts" to match letters of various languages. There is no "Japanese" one, but there are Hiragana, Katakana and Han. As I have no idea how Japanese is written, and how one could use these, I am not even going to try.
Using these scripts, however, would be done something like this (the u modifier makes PCRE treat the pattern and subject as UTF-8):
'/[\p{Hiragana}\p{Katakana}\p{Han}]+/u'
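As a rough sketch of how the script properties could slot into the slug function, here is a hypothetical variant that also allows Hangul for the Korean case mentioned in the question. The name slug_keep_cjk and the exact allow-list are assumptions. European transliteration is left out and would have to run first (for example with a lookup table like the one in the previous answer), and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper; the name and the exact allow-list are assumptions.
function slug_keep_cjk(string $string, string $separator = '-'): string
{
    $string = mb_strtolower(trim($string), 'UTF-8');

    // Keep CJK scripts (Han, Hiragana, Katakana, Hangul), the prolonged
    // sound mark (U+30FC), ASCII letters and digits, and ". + space -".
    // Assumes valid UTF-8 input; preg_replace returns null otherwise.
    $string = preg_replace(
        '/[^\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}\x{30FC}a-z0-9.+ -]/u',
        '',
        $string
    );

    // Collapse runs of spaces and separators into a single separator.
    return preg_replace('/[ ' . preg_quote($separator, '/') . ']+/u', $separator, $string);
}

echo slug_keep_cjk(' 東京 ひらがな カタカナ 한국어 Hello! '), PHP_EOL;
// 東京-ひらがな-カタカナ-한국어-hello
```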

PHP - Remove excess Whitespace but not new lines

I was looking for a way to remove excess whitespace from within a string (that is, if 2 or more spaces are next to each other, leave only 1 and remove the others). I found "Remove excess whitespace from within a string" and wanted to use this solution:
$foo = preg_replace( '/\s+/', ' ', $foo );
but this removes new lines as well, while I want to keep them.
Is there any way to keep newlines while removing excess whitespace?
http://www.php.net/manual/en/regexp.reference.escape.php
defines \h
any horizontal whitespace character (since PHP 5.2.4)
so probably you are looking for
$foo = preg_replace( '/\h+/', ' ', $foo );
example: http://ideone.com/NcOiKW
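If some of your symbols were converted to � after preg_replace (for example, the Cyrillic capital letter Р), add the u modifier so that the pattern and subject are treated as UTF-8. Without it, \h can match the single byte 0xA0 inside a multibyte character. (mb_ereg_replace is sometimes suggested instead, but its pattern takes no / delimiters, so '/\h+/' would not work there as written.)
$value = preg_replace('/\h+/u', ' ', $value);
If you want to remove excess plain spaces only (not tabs, new lines, etc.), you can use the hex code to be more specific: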
$text = preg_replace('/\x20+/', ' ', $text);
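Putting the pieces together, here is a small sketch on a made-up string that mixes Cyrillic text, tabs and blank lines. \h+ collapses the horizontal whitespace, the newlines survive, and the u modifier keeps multibyte characters intact.

```php
<?php
// Hypothetical sample input.
$foo = "Привет,   Россия \t!\n\n  second    line";

// \h matches horizontal whitespace only, so the newlines survive.
// The u modifier keeps multibyte characters such as "Р" (bytes D0 A0) intact.
$foo = preg_replace('/\h+/u', ' ', $foo);

echo $foo, PHP_EOL;
```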
