How do I convert unicode codepoints to hexadecimal HTML entities? - php

I have a data file (an Apple plist, to be exact), that has Unicode codepoints like \U00e8 and \U2019. I need to turn these into valid hexadecimal HTML entities using PHP.
What I'm doing right now is a long string of:
$fileContents = str_replace("\U00e8", "&#xe8;", $fileContents);
$fileContents = str_replace("\U2019", "&#x2019;", $fileContents);
Which is clearly dreadful. I could use a regular expression to convert the \U and any leading 0s to &#x, then append the trailing ;, but that also seems heavy-handed.
Is there a clean, simple way to take a string and replace all the Unicode codepoints with HTML entities?

Here's a correct answer, that deals with the fact that those are code units, not code points, and allows unencoding supplementary characters.
function unenc_utf16_code_units($string) {
/* go for possible surrogate pairs first */
$string = preg_replace_callback(
'/\\\\U(D[89ab][0-9a-f]{2})\\\\U(D[c-f][0-9a-f]{2})/i',
function ($matches) {
$hi_surr = hexdec($matches[1]);
$lo_surr = hexdec($matches[2]);
$scalar = (0x10000 + (($hi_surr & 0x3FF) << 10) |
($lo_surr & 0x3FF));
return "&#x" . dechex($scalar) . ";";
}, $string);
/* now the rest */
$string = preg_replace_callback('/\\\\U([0-9a-f]{4})/i',
function ($matches) {
//just to remove leading zeros
return "&#x" . dechex(hexdec($matches[1])) . ";";
}, $string);
return $string;
}
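As a worked example of the surrogate arithmetic used in the first callback above, here is a minimal sketch. The helper name surrogate_pair_to_scalar is hypothetical and not part of the answer; it only isolates the bit manipulation.

```php
<?php
// Hypothetical helper: combine a UTF-16 surrogate pair into one Unicode scalar value.
function surrogate_pair_to_scalar(int $hi, int $lo): int
{
    // Each surrogate carries 10 payload bits; the combined value is offset by 0x10000.
    return 0x10000 + (($hi & 0x3FF) << 10) + ($lo & 0x3FF);
}

// \UD83D\UDE00 is the surrogate pair for U+1F600.
echo '&#x' . dechex(surrogate_pair_to_scalar(0xD83D, 0xDE00)) . ';', "\n"; // prints &#x1f600;
```

The lowest and highest pairs (D800/DC00 and DBFF/DFFF) map to U+10000 and U+10FFFF, which are the bounds of the supplementary range.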

You can use preg_replace:
preg_replace('/\\\\U0*([0-9a-fA-F]{1,5})/', '&#x\1;', $fileContents);
Testing the RE:
PS> 'some \U00e8 string with \U2019 embedded Unicode' -replace '\\U0*([0-9a-f]{1,5})','&#x$1;'
some &#xe8; string with &#x2019; embedded Unicode
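The same check can be run in PHP itself. This is a minimal sketch that wraps the preg_replace call above in a function; the name plist_escapes_to_entities is an assumption made for illustration.

```php
<?php
// Hypothetical wrapper around the preg_replace call shown above.
function plist_escapes_to_entities(string $text): string
{
    // '\\\\' in a single-quoted PHP string becomes \\ in the regex, i.e. one literal backslash.
    return preg_replace('/\\\\U0*([0-9a-fA-F]{1,5})/', '&#x$1;', $text);
}

$input = 'some \U00e8 string with \U2019 embedded Unicode';
echo plist_escapes_to_entities($input), "\n";
// prints: some &#xe8; string with &#x2019; embedded Unicode
```

Note that the {1,5} quantifier is greedy, so an escape immediately followed by a hex letter (for example \U00e8a) would absorb that letter. If the data always uses exactly four hex digits, {4} without the 0* prefix avoids that.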

Related

Replace illegal characters in a text by underscore in PHP

I need to replace illegal characters with an underscore (_).
For example:
If the user-supplied text is "imageЙ ййé.png", I need to replace the characters Й йй with _ __, so the overall output must be image_ __é.png. This replacement must not affect French characters. Please check the code below and help me get that output.
<?php
$allowed_char_array=array("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","à","á","â","ã","ä","å","æ","ç","è","é","ê","ë","ì","í","î","ï","ñ","ò","ó","ô","õ","ö","ð","ø","œ","š","Þ","ù","ú","û","ü","ý","ÿ","ž","0","1","2","3","4","5","6","7","8","9"," ","(",")","-","_",".","#","#","$","%","*","¢","ß","¥","£","™","©","®","ª","×","÷","±","+","-","²","³","¼","½","¾","µ","¿","¶","·","¸","º","°","¯","§","…","¤","¦","≠","¬","ˆ","¨","‰");
$word = 'imageЙ ййé.png';
$file_name = url_rewrite(trim($word));
$file_name2 = strtolower($file_name);
$split = str_split($file_name2);
if(is_array($split) && is_array($allowed_char_array)){
$result=array_diff($split,$allowed_char_array);
echo '<pre>';
print_r($split);
echo '<pre>';
print_r($allowed_char_array);
echo '<pre>';
print_r($result);
}
function url_rewrite($chaine) {
// Format the string
// Replace so that there are no more accents
$accents = array('é','à','è','À','É','È');
$sans = array('é','à','è','À','É','È');
$chaine = str_replace($accents, $sans, $chaine);
return $chaine;
}
?>
I would build a regex (character class, to be exact) using your whitelisted characters, and then remove any character which matches the negation of that class.
$allowed_char_array = array("a","b","c","d","e"); // and others
$chars = implode("", $allowed_char_array);
$regex = "/[^" . $chars . "]/u";
$input = "imageЙ ййé.png";
echo $regex . "\n";
$output = preg_replace($regex, "_", $input);
echo $input . "\n" . $output;
imageЙ ййé.png
image_ __é.png
If the above is not clear, here is what the actual call to preg_replace would look like:
preg_replace("/[^abcdefghijklmnopqrstuv]/u", "_", $input);
That is, any non whitelisted character would be replaced with just underscore. I did not bother to list out the entire character class, because you already have that in your source code.
Note that the /u flag in the regex is critical here, because your input string is a UTF-8 string. UTF-8 characters may consist of more than one byte, and using preg_replace on them without /u may have unexpected results.
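One caveat when building the character class with implode: the whitelist in the question contains characters such as - and ^ that have a special meaning inside [...]. A minimal sketch, assuming a shortened whitelist and a hypothetical helper name replace_disallowed, escapes each entry with preg_quote first:

```php
<?php
// Hypothetical helper: replace every character that is not in $allowed with "_".
function replace_disallowed(string $input, array $allowed): string
{
    // Escape each whitelisted character so "-", "^", "]" and "\" stay literal inside [...].
    $class = implode('', array_map(fn($ch) => preg_quote($ch, '/'), $allowed));
    // /u makes PCRE work on UTF-8 code points, so each Cyrillic letter becomes one underscore.
    return preg_replace('/[^' . $class . ']/u', '_', $input);
}

// A shortened whitelist for illustration only; the full list is in the question.
$allowed = array_merge(
    str_split('abcdefghijklmnopqrstuvwxyz0123456789'),
    ['é', 'à', 'è', ' ', '.', '-', '_', '(', ')']
);
echo replace_disallowed('imageЙ ййé.png', $allowed), "\n"; // prints image_ __é.png
```

This assumes the source file and the input are both UTF-8 encoded.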
You will want to use mb_strtolower() to convert multibyte characters to lowercase safely.
My solution uses strtr() to convert your French accented letters to your preferred form.
Since all characters are lowercased from the onset, you can halve your white list of French characters.
Using pathinfo() helps you to dissect your filename.
Code: (Demo)
$word = 'imageЙ ййé.png';
$parts = pathinfo($word);
$filename = strtr(mb_strtolower($parts['filename']), ['é' =>'é', 'à' => 'à','è' => 'è']);
echo preg_replace('~[^ a-zéàè]~u', '_', $filename) , "." , $parts['extension'];
Output:
image_ __é.png

Regex not working with white spaces [duplicate]

I know this comment on PHP.net.
I would like to have a tool similar to tr for PHP, so that I can simply run
tr -d " " ""
I tried the function php_strip_whitespace, without success:
$tags_trimmed = php_strip_whitespace($tags);
I also tried a regex, again without success:
$tags_trimmed = preg_replace(" ", "", $tags);
To strip any whitespace, you can use a regular expression
$str=preg_replace('/\s+/', '', $str);
See also this answer for something which can handle whitespace in UTF-8 strings.
A regular expression does not account for UTF-8 characters by default. The \s meta-character only accounts for the original latin set. Therefore, the following command only removes tabs, spaces, carriage returns and new lines
// http://stackoverflow.com/a/1279798/54964
$str=preg_replace('/\s+/', '', $str);
With UTF-8 becoming mainstream, this expression will fail more frequently when it reaches the newer UTF-8 characters, leaving behind white space that \s cannot account for.
To deal with the new types of white space introduced in Unicode/UTF-8, a more extensive pattern is required to match and remove modern white space.
Because regular expressions by default do not recognize multi-byte characters, only a delimited sequence of bytes can be used to identify them. This prevents byte segments from being altered in other UTF-8 characters (a bare \x80 from the quad set could otherwise replace every \x80 sub-byte in smart quotes).
$cleanedstr = preg_replace(
"/(\t|\n|\v|\f|\r| |\xC2\x85|\xc2\xa0|\xe1\xa0\x8e|\xe2\x80[\x80-\x8D]|\xe2\x80\xa8|\xe2\x80\xa9|\xe2\x80\xaF|\xe2\x81\x9f|\xe2\x81\xa0|\xe3\x80\x80|\xef\xbb\xbf)+/",
"_",
$str
);
This matches tabs, newlines, vertical tabs, form feeds, carriage returns and spaces, and replaces each run with "_" (pass "" as the replacement to remove them instead). It additionally matches:
next line, non-breaking space, Mongolian vowel separator, [en quad, em quad, en space, em space, three-per-em space, four-per-em space, six-per-em space, figure space, punctuation space, thin space, hair space, zero width space, zero width non-joiner, zero width joiner], line separator, paragraph separator, narrow no-break space, medium mathematical space, word joiner, ideographic space, and the zero width non-breaking space.
Many of these wreak havoc in XML files exported from automated tools or sites: they foul up text searches and recognition, and they can be pasted invisibly into PHP source code, where paragraph and line separators cause the parser to jump to the next command so that lines of code are skipped. The result is intermittent, unexplained errors that we have begun referring to as "textually transmitted diseases".
[It's not safe to copy and paste from the web anymore. Use a character scanner to protect your code. lol]
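As an alternative to listing byte sequences by hand, here is a minimal sketch that relies on the /u modifier and the Unicode separator category \p{Z}. This swaps in a different technique from the byte-level pattern above, and the helper name strip_unicode_whitespace is hypothetical.

```php
<?php
// Hypothetical helper: strip ASCII whitespace plus every Unicode separator (category Z).
function strip_unicode_whitespace(string $str): string
{
    // With /u, pattern and subject are treated as UTF-8, so \p{Z} matches whole
    // code points (no-break space, en/em spaces, ideographic space, ...) rather than bytes.
    $result = preg_replace('/[\s\p{Z}]+/u', '', $str);
    // preg_replace returns null when the subject is not valid UTF-8; fall back to the input.
    return $result ?? $str;
}

// U+00A0 no-break space, U+2003 em space, U+3000 ideographic space, then ASCII whitespace.
$sample = "a\u{00A0}b\u{2003}c\u{3000}d \t\ne";
echo strip_unicode_whitespace($sample), "\n"; // prints abcde
```

This does not cover zero-width characters such as U+200B, which Unicode classifies as format characters rather than separators; those would need to be added to the class explicitly.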
Sometimes you would need to delete consecutive white spaces. You can do it like this:
$str = "My   name   is";
$str = preg_replace('/\s\s+/', ' ', $str);
Output:
My name is
$string = str_replace(" ", "", $string);
I believe preg_replace would be looking for something like [:space:]
You can use trim function from php to trim both sides (left and right)
trim($yourinputdata," ");
Or
trim($yourinputdata);
You can also use
ltrim() - Removes whitespace or other predefined characters from the left side of a string
rtrim() - Removes whitespace or other predefined characters from the right side of a string
System: PHP 4,5,7
Docs: http://php.net/manual/en/function.trim.php
If you want to remove all whitespaces everywhere from $tags why not just:
str_replace(' ', '', $tags);
If you want to remove new lines and such that would require a bit more...
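For the "bit more" mentioned above, str_replace also accepts an array of search strings. A minimal sketch, with a hypothetical helper name, that removes the common ASCII whitespace characters without a regex:

```php
<?php
// Hypothetical helper: remove spaces, tabs, line breaks, vertical tabs and form feeds.
function strip_ascii_whitespace(string $tags): string
{
    // Every entry of the search array is replaced by the same empty string.
    return str_replace([' ', "\t", "\n", "\r", "\v", "\f"], '', $tags);
}

echo strip_ascii_whitespace("php, regex,\tutf-8\r\n"), "\n"; // prints php,regex,utf-8
```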
Another possible option is to use a custom stream wrapper that exposes variables as files. You can achieve it like this:
1) First of all, register your wrapper (only once per file, much like session_start()):
stream_wrapper_register('var', 'VarWrapper');
2) Then define your wrapper class (written quickly and not completely correct, but it works):
class VarWrapper {
protected $pos = 0;
protected $content;
public function stream_open($path, $mode, $options, &$opened_path) {
$varname = substr($path, 6);
global $$varname;
$this->content = $$varname;
return true;
}
public function stream_read($count) {
$s = substr($this->content, $this->pos, $count);
$this->pos += $count;
return $s;
}
public function stream_stat() {
$f = fopen(__file__, 'rb');
$a = fstat($f);
fclose($f);
if (isset($a[7])) $a[7] = strlen($this->content);
return $a;
}
}
3) Then use any file function with your wrapper on var:// protocol (you can use it for include, require etc. too):
global $__myVar;
$__myVar = 'Enter tags here';
$data = php_strip_whitespace('var://__myVar');
Note: Don't forget to have your variable in global scope (like global $__myVar)
This is an old post but the shortest answer is not listed here so I am adding it now
strtr($str,[' '=>'']);
Another common way to "skin this cat" would be to use explode and implode like this
implode('',explode(' ', $str));
You can do it by using ereg_replace (deprecated in PHP 5.3 and removed in PHP 7.0, so this only runs on older versions):
$str = 'This Is New Method Ever';
$newstr = ereg_replace('[[:space:]]+', '', trim($str));
echo $newstr;
// Result - ThisIsNewMethodEver
You can also use the preg_replace_callback function. It is identical to its sibling preg_replace, except that it takes a callback function, which gives you more control over how you manipulate the output.
$str = "this is a string";
echo preg_replace_callback(
'/\s+/',
function ($matches) {
return "";
},
$str
);
$string = trim(preg_replace('/\s+/','',$string));
This is an old post, but it can be done like this:
if(!function_exists('strim')) :
function strim($str,$charlist=" ",$option=0){
$return='';
if(is_string($str))
{
// Translate HTML entities
$return = str_replace("&nbsp;"," ",$str);
$return = strtr($return, array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES)));
// Choose trim option
switch($option)
{
// Strip whitespace (and other characters) from the begin and end of string
default:
case 0:
$return = trim($return,$charlist);
break;
// Strip whitespace (and other characters) from the begin of string
case 1:
$return = ltrim($return,$charlist);
break;
// Strip whitespace (and other characters) from the end of string
case 2:
$return = rtrim($return,$charlist);
break;
}
}
return $return;
}
endif;
Standard trim() functions can be problematic when HTML entities are involved. That's why I wrote this "Super Trim" function, which handles that problem and also lets you choose whether to trim from the beginning, the end, or both sides of the string.
A simple way to remove spaces from the whole string is to use the explode function and print the whole string using a for loop.
$text = $_POST['string'];
$a=explode(" ", $text);
$count=count($a);
for($i=0;$i<$count; $i++){
echo $a[$i];
}
The \s regex argument is not compatible with UTF-8 multibyte strings.
I wrote this PHP regex to solve that, using PCRE (Perl Compatible Regular Expressions) arguments that work on UTF-8 strings:
function remove_utf8_whitespace($string) {
return preg_replace('/\h+/u','',preg_replace('/\R+/u','',$string));
}
- Example Usage -
Before:
$string = " this is a test \n and another test\n\r\t ok! \n";
echo $string;
this is a test
and another test
ok!
echo strlen($string); // result: 43
After:
$string = remove_utf8_whitespace($string);
echo $string;
thisisatestandanothertestok!
echo strlen($string); // result: 28
PCRE Argument Listing
Source: https://www.rexegg.com/regex-quickstart.html
Character | Legend | Example | Sample Match
\t | Tab | T\t\w{2} | T ab
\r | Carriage return character | see below |
\n | Line feed character | see below |
\r\n | Line separator on Windows | AB\r\nCD | AB (line break) CD
\N | Perl, PCRE (C, PHP, R…): one character that is not a line break | \N+ | ABC
\h | Perl, PCRE (C, PHP, R…), Java: one horizontal whitespace character: tab or Unicode space separator | |
\H | One character that is not a horizontal whitespace | |
\v | .NET, JavaScript, Python, Ruby: vertical tab | |
\v | Perl, PCRE (C, PHP, R…), Java: one vertical whitespace character: line feed, carriage return, vertical tab, form feed, paragraph or line separator | |
\V | Perl, PCRE (C, PHP, R…), Java: any character that is not a vertical whitespace | |
\R | Perl, PCRE (C, PHP, R…), Java: one line break (carriage return + line feed pair, and all the characters matched by \v) | |
There are some special types of whitespace in the form of tags.
First use
$str=strip_tags($str);
to remove redundant or erroneous tags and get back to a plain string.
Then use
$str=preg_replace('/\s+/', '', $str);
This works for me.
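The two steps above can be combined into one call. This is a minimal sketch with a hypothetical helper name, assuming the input is an HTML fragment:

```php
<?php
// Hypothetical helper: drop HTML tags first, then remove all remaining whitespace.
function strip_tags_and_whitespace(string $html): string
{
    return preg_replace('/\s+/', '', strip_tags($html));
}

echo strip_tags_and_whitespace("tag one<br />\n <b>tag two</b>"), "\n"; // prints tagonetagtwo
```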

reversing a regular expression in php

suppose I have this function:
function f($string){
$string = preg_replace("`\[.*\]`U","",$string);
$string = preg_replace('`&(amp;)?#?[a-z0-9]+;`i','-',$string);
$string = htmlentities($string, ENT_COMPAT, 'utf-8');
$string = preg_replace( "`&([a-z])(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig|quot|rsquo);`i","\\1", $string );
$string = preg_replace( array("`[^a-z0-9]`i","`[-]+`") , "-", $string);
return $string;
}
how can I reverse this function...ie. how should I write the function fReverse() such that we have the following:
$s = f("some string223---");
$reversed = fReverse($s);
echo $s;
and output: some string223---
f is lossy, so it is impossible to find an exact reverse. For example, both "some string223---" and "some string223--------" give the same output (see http://ideone.com/DtGQZ).
Nevertheless, we could find a pre-image of f. The 5 replacements of f are:
1) Strip everything between [ and ].
2) Replace entities like &lt;, &#123; and double-encoded entities like &amp;lt; with a hyphen -.
3) Escape special HTML characters (< → &lt;, & → &amp;, etc.)
4) Remove accents from accented characters (&eacute; (= é) → e, etc.)
5) Turn non-alphanumerics and consecutive hyphens into a single hyphen -.
Out of these, it is possible for 1, 2, 4 and 5 to be identity transforms. Therefore, one possible pre-image is just the reverse of step 3:
function fReverse($string) {
return html_entity_decode($string, ENT_COMPAT, 'utf-8');
}
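To make the lossiness concrete, here is a small check that reuses f from the question and fReverse from above. It is only a sketch: it shows that two different inputs collapse to the same output, and that fReverse returns a valid pre-image for that output.

```php
<?php
// f as given in the question.
function f($string) {
    $string = preg_replace("`\[.*\]`U", "", $string);
    $string = preg_replace('`&(amp;)?#?[a-z0-9]+;`i', '-', $string);
    $string = htmlentities($string, ENT_COMPAT, 'utf-8');
    $string = preg_replace("`&([a-z])(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig|quot|rsquo);`i", "\\1", $string);
    $string = preg_replace(array("`[^a-z0-9]`i", "`[-]+`"), "-", $string);
    return $string;
}

// fReverse as given in the answer.
function fReverse($string) {
    return html_entity_decode($string, ENT_COMPAT, 'utf-8');
}

$a = f('some string223---');
$b = f('some string223--------');
echo $a, "\n";                        // prints some-string223-
var_dump($a === $b);                  // bool(true): two inputs, one output
var_dump(f(fReverse($a)) === $a);     // bool(true): fReverse($a) is a pre-image of $a
```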

Convert string into slug with single-hyphen delimiters only

I would like to sanitize a string into a URL, so this is basically what I need:
Everything must be removed except alphanumeric characters, spaces and dashes.
Spaces should be converted into dashes.
Eg.
This, is the URL!
must return
this-is-the-url
function slug($z){
$z = strtolower($z);
$z = preg_replace('/[^a-z0-9 -]+/', '', $z);
$z = str_replace(' ', '-', $z);
return trim($z, '-');
}
First strip unwanted characters
$new_string = preg_replace("/[^a-zA-Z0-9\s]/", "", $string);
Then change spaces to hyphens
$url = preg_replace('/\s/', '-', $new_string);
Finally encode it ready for use
$new_url = urlencode($url);
The OP is not explicitly describing all of the attributes of a slug, but this is what I am gathering from the intent.
My interpretation of a perfect, valid, condensed slug aligns with this post: https://wordpress.stackexchange.com/questions/149191/slug-formatting-acceptable-characters#:~:text=However%2C%20we%20can%20summarise%20the,or%20end%20with%20a%20hyphen.
I find none of the earlier posted answers to achieve this consistently (and I'm not even stretching the scope of the question to include multi-byte characters).
convert all characters to lowercase
replace all sequences of one or more non-alphanumeric characters to a single hyphen.
trim the leading and trailing hyphens from the string.
I recommend the following one-liner which doesn't bother declaring single-use variables:
return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($string)), '-');
I have also prepared a demonstration which highlights what I consider to be inaccuracies in the other answers. (Demo)
'This, is - - the URL!' input
'this-is-the-url' expected
'this-is-----the-url' SilentGhost
'this-is-the-url' mario
'This-is---the-URL' Rooneyl
'This-is-the-URL' AbhishekGoel
'This, is - - the URL!' HelloHack
'This, is - - the URL!' DenisMatafonov
'This,-is-----the-URL!' AdeelRazaAzeemi
'this-is-the-url' mickmackusa
---
'Mork & Mindy' input
'mork-mindy' expected
'mork--mindy' SilentGhost
'mork-mindy' mario
'Mork--Mindy' Rooneyl
'Mork-Mindy' AbhishekGoel
'Mork & Mindy' HelloHack
'Mork & Mindy' DenisMatafonov
'Mork-&-Mindy' AdeelRazaAzeemi
'mork-mindy' mickmackusa
---
'What the_underscore ?!?' input
'what-the-underscore' expected
'what-theunderscore' SilentGhost
'what-the_underscore' mario
'What-theunderscore-' Rooneyl
'What-theunderscore-' AbhishekGoel
'What the_underscore ?!?' HelloHack
'What the_underscore ?!?' DenisMatafonov
'What-the_underscore-?!?' AdeelRazaAzeemi
'what-the-underscore' mickmackusa
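The three steps recommended above (lowercase, collapse non-alphanumeric runs into one hyphen, trim hyphens) can be sketched as a complete function. The name slugify is an assumption made for illustration; the body is the one-liner from the answer split into named steps.

```php
<?php
// Hypothetical function name; the logic follows the one-liner above.
function slugify(string $string): string
{
    $lower = strtolower($string);                             // 1. lowercase
    $hyphenated = preg_replace('/[^a-z0-9]+/', '-', $lower);  // 2. non-alphanumeric runs -> "-"
    return trim($hyphenated, '-');                            // 3. trim leading/trailing hyphens
}

foreach (['This, is - - the URL!', 'Mork & Mindy', 'What the_underscore ?!?'] as $input) {
    echo slugify($input), "\n";
}
// prints:
// this-is-the-url
// mork-mindy
// what-the-underscore
```

Like the one-liner, this treats every non-ASCII letter as a separator, so it is only suitable when multi-byte characters are out of scope.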
This will do it in a Unix shell (I just tried it on my MacOS):
$ tr -cs A-Za-z '-' < infile.txt > outfile.txt
I got the idea from a blog post on More Shell, Less Egg
Try This
function clean($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
return preg_replace('/-+/', '-', $string); // Replaces multiple hyphens with single one.
}
Usage:
echo clean('a|"bc!#£de^&$f g');
Will output: abcdef-g
source : https://stackoverflow.com/a/14114419/2439715
Using the intl transliterator is a good option because it lets you handle complicated cases with a single set of rules. I added custom rules to illustrate how flexible it can be and how you can keep as much meaningful information as possible. Feel free to remove them and to add your own rules.
$strings = [
'This, is - - the URL!',
'Holmes & Yoyo',
'L’Œil de démon',
'How to win 1000€?',
'€, $ & other currency symbols',
'Und die Katze fraß alle mäuse.',
'Белите рози на София',
'പോണ്ടിച്ചേരി സൂര്യനു കീഴിൽ',
];
$rules = <<<'RULES'
# Transliteration
:: Any-Latin ; :: Latin-Ascii ;
# examples of custom replacements
'&' > ' and ' ;
[^0-9][01]? { € > ' euro' ; € > ' euros' ;
[^0-9][01]? { '$' > ' dollar' ; '$' > ' dollars' ;
:: Null ;
# slugify
[^[:alnum:]&[:ascii:]]+ > '-' ;
:: Lower ;
# trim
[$] { '-' > &Remove() ;
'-' } [$] > &Remove() ;
RULES;
$tsl = Transliterator::createFromRules($rules, Transliterator::FORWARD);
$results = array_map(fn($s) => $tsl->transliterate($s), $strings);
print_r($results);
demo
Unfortunately, the PHP manual says nothing about ICU transformations, but you can find information about them here.
All previous answers deal with URLs, but in case someone needs to sanitize a string for a login (for example) and keep it as text, here you go:
function sanitizeText($str) {
$withSpecCharacters = htmlspecialchars($str);
$splitted_str = str_split($str);
$result = '';
foreach ($splitted_str as $letter){
if (strpos($withSpecCharacters, $letter) !== false) {
$result .= $letter;
}
}
return $result;
}
echo sanitizeText('ОРРииыфвсси ajvnsakjvnHB "&nvsp;\n" <script>alert()</script>');
//ОРРииыфвсси ajvnsakjvnHB &nvsp;\n scriptalert()/script
//No injections possible, all info at max keeped
function isolate($data) {
$data = trim($data);
$data = stripslashes($data);
$data = htmlspecialchars($data);
return $data;
}
You should use the slugify package and not reinvent the wheel ;)
https://github.com/cocur/slugify
The following will replace spaces with dashes.
$str = str_replace(' ', '-', $str);
Then the following statement will remove everything except alphanumeric characters and dashes (spaces are not listed because the previous step already replaced them with dashes).
// Char representation 0 - 9 A- Z a- z -
$str = preg_replace('/[^\x30-\x39\x41-\x5A\x61-\x7A\x2D]/', '', $str);
Which is equivalent to
$str = preg_replace('/[^0-9A-Za-z-]+/', '', $str);
FYI: To remove all special characters from a string use
$str = preg_replace('/[^\x20-\x7E]/', '', $str);
\x20 is hexadecimal for space, the first printable ASCII character, and \x7E is tilde, the last, according to Wikipedia: https://en.wikipedia.org/wiki/ASCII#Printable_characters
FYI: look into the Hex Column for the interval 20-7E
Printable characters
Codes 20hex to 7Ehex, known as the printable characters, represent letters, digits, punctuation marks, and a few miscellaneous symbols. There are 95 printable characters in total.
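A minimal sketch of the printable-ASCII filter described above, with a hypothetical helper name. Because the pattern has no /u modifier it works byte by byte, so every byte of a multi-byte UTF-8 character is removed as well.

```php
<?php
// Hypothetical helper: keep only bytes in the printable ASCII range 0x20-0x7E.
function keep_printable_ascii(string $str): string
{
    return preg_replace('/[^\x20-\x7E]/', '', $str);
}

// The tab, the two bytes of "é" and the trailing newline are all removed.
echo keep_printable_ascii("Tab\there, café\n"), "\n"; // prints Tabhere, caf
```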

