Mongo db utf-8 exception


I'm wondering why I'm having trouble inserting strings into the db. With a string like hey hey %80, the '%80' still produces an exception:
Uncaught exception 'MongoException' with message 'non-utf8 string: hey hey �'
What do I need to do? :( Is %80 not a UTF-8 char? :O
The JS passes the string to the controller:
function new_pool_post(_url,_data,_starter){
$.ajax({
type:'POST',
data:_data,
dataType:'json',
url:_url,
beforeSend:function(){
$('.ajax-loading').show();
$(_starter).attr('disabled','disabled');
},
error:function(){
$('.ajax-loading').hide();
$(_starter).removeAttr('disabled');
},
success:function(json){
$('.ajax-loading').hide();
$(_starter).removeAttr('disabled');
if(json){
$('.pool-append').prepend(json.pool_post);
}
}
});
}
The controller receives the data:
$id_project = $this->input->post('id_project',true);
$id_user = $this->session->userdata('user_id');
$pool_post = $this->input->post('pool_post',true);
The controller sanitizes the data:
public function xss_clean($str, $is_image = FALSE)
{
/*
* Is the string an array?
*
*/
if (is_array($str))
{
while (list($key) = each($str))
{
$str[$key] = $this->xss_clean($str[$key]);
}
return $str;
}
/* Remove non-UTF-8 chars */
$str = htmlspecialchars(urlencode(preg_replace('/[\x00-\x1F\x80-\xFF]/','',$str)));
/*
* Remove Invisible Characters
*/
$str = remove_invisible_characters($str);
// Validate Entities in URLs
$str = $this->_validate_entities($str);
/*
* URL Decode
*
* Just in case stuff like this is submitted:
*
* Google
*
* Note: Use rawurldecode() so it does not remove plus signs
*
*/
$str = rawurldecode($str);
/*
* Convert character entities to ASCII
*
* This permits our tests below to work reliably.
* We only convert entities that are within tags since
* these are the ones that will pose security problems.
*
*/
$str = preg_replace_callback("/[a-z]+=([\'\"]).*?\\1/si", array($this, '_convert_attribute'), $str);
$str = preg_replace_callback("/<\w+.*?(?=>|<|$)/si", array($this, '_decode_entity'), $str);
/*
* Remove Invisible Characters Again!
*/
$str = remove_invisible_characters($str);
/*
* Convert all tabs to spaces
*
* This prevents strings like this: ja vascript
* NOTE: we deal with spaces between characters later.
* NOTE: preg_replace was found to be amazingly slow here on
* large blocks of data, so we use str_replace.
*/
if (strpos($str, "\t") !== FALSE)
{
$str = str_replace("\t", ' ', $str);
}
/*
* Capture converted string for later comparison
*/
$converted_string = $str;
// Remove Strings that are never allowed
$str = $this->_do_never_allowed($str);
/*
* Makes PHP tags safe
*
* Note: XML tags are inadvertently replaced too:
*
* <?xml
*
* But it doesn't seem to pose a problem.
*/
if ($is_image === TRUE)
{
// Images have a tendency to have the PHP short opening and
// closing tags every so often so we skip those and only
// do the long opening tags.
$str = preg_replace('/<\?(php)/i', "&lt;?\\1", $str);
}
else
{
$str = str_replace(array('<?', '?'.'>'), array('&lt;?', '?&gt;'), $str);
}
/*
* Compact any exploded words
*
* This corrects words like: j a v a s c r i p t
* These words are compacted back to their correct state.
*/
$words = array(
'javascript', 'expression', 'vbscript', 'script',
'applet', 'alert', 'document', 'write', 'cookie', 'window'
);
foreach ($words as $word)
{
$temp = '';
for ($i = 0, $wordlen = strlen($word); $i < $wordlen; $i++)
{
$temp .= substr($word, $i, 1)."\s*";
}
// We only want to do this when it is followed by a non-word character
// That way valid stuff like "dealer to" does not become "dealerto"
$str = preg_replace_callback('#('.substr($temp, 0, -3).')(\W)#is', array($this, '_compact_exploded_words'), $str);
}
/*
* Remove disallowed Javascript in links or img tags
* We used to do some version comparisons and use of stripos for PHP5,
* but it is dog slow compared to these simplified non-capturing
* preg_match(), especially if the pattern exists in the string
*/
do
{
$original = $str;
if (preg_match("/<a/i", $str))
{
$str = preg_replace_callback("#<a\s+([^>]*?)(>|$)#si", array($this, '_js_link_removal'), $str);
}
if (preg_match("/<img/i", $str))
{
$str = preg_replace_callback("#<img\s+([^>]*?)(\s?/?>|$)#si", array($this, '_js_img_removal'), $str);
}
if (preg_match("/script/i", $str) OR preg_match("/xss/i", $str))
{
$str = preg_replace("#<(/*)(script|xss)(.*?)\>#si", '[removed]', $str);
}
}
while($original != $str);
unset($original);
// Remove evil attributes such as style, onclick and xmlns
$str = $this->_remove_evil_attributes($str, $is_image);
/*
* Sanitize naughty HTML elements
*
* If a tag containing any of the words in the list
* below is found, the tag gets converted to entities.
*
* So this: <blink>
* Becomes: &lt;blink&gt;
*/
$naughty = 'alert|applet|audio|basefont|base|behavior|bgsound|blink|body|embed|expression|form|frameset|frame|head|html|ilayer|iframe|input|isindex|layer|link|meta|object|plaintext|style|script|textarea|title|video|xml|xss';
$str = preg_replace_callback('#<(/*\s*)('.$naughty.')([^><]*)([><]*)#is', array($this, '_sanitize_naughty_html'), $str);
/*
* Sanitize naughty scripting elements
*
* Similar to above, only instead of looking for
* tags it looks for PHP and JavaScript commands
* that are disallowed. Rather than removing the
* code, it simply converts the parenthesis to entities
* rendering the code un-executable.
*
* For example: eval('some code')
* Becomes: eval&#40;'some code'&#41;
*/
$str = preg_replace('#(alert|cmd|passthru|eval|exec|expression|system|fopen|fsockopen|file|file_get_contents|readfile|unlink)(\s*)\((.*?)\)#si', "\\1\\2&#40;\\3&#41;", $str);
// Final clean up
// This adds a bit of extra precaution in case
// something got through the above filters
$str = $this->_do_never_allowed($str);
/*
* Images are Handled in a Special Way
* - Essentially, we want to know that after all of the character
* conversion is done whether any unwanted, likely XSS, code was found.
* If not, we return TRUE, as the image is clean.
* However, if the string post-conversion does not matched the
* string post-removal of XSS, then it fails, as there was unwanted XSS
* code found and removed/changed during processing.
*/
if ($is_image === TRUE)
{
return ($str == $converted_string) ? TRUE: FALSE;
}
log_message('debug', "XSS Filtering completed");
return $str;
}
The controller passes the sanitized data to the model, and the model inserts it into MongoDB.
Nothing more... :)

I had a related problem.
For example, ucfirst on UTF-8 text needs a multibyte-aware call such as mb_ucfirst('helo','UTF-8');
I think the problem in your situation is with substr: you need to use mb_substr.
Otherwise, maybe use iconv to convert to ISO-8859-1 at the beginning, and use iconv again to convert to UTF-8 when writing to the db.
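A minimal sketch of the conversion idea, assuming the mbstring extension is loaded and that any stray bytes were meant as ISO-8859-1 (both are assumptions; ensure_utf8 is a hypothetical helper, not part of the MongoDB driver or CodeIgniter). The byte 0x80, which is what the URL escape %80 decodes to, is not valid UTF-8 on its own, so the check runs on the final string right before the insert:

```php
<?php
// Hypothetical helper, not part of the MongoDB driver or CodeIgniter.
// Assumption: bytes that are not valid UTF-8 were meant as ISO-8859-1.
function ensure_utf8(string $str): string
{
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str; // already valid, leave untouched
    }
    return mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
}

$raw   = rawurldecode('hey hey %80'); // ends with the lone byte 0x80
$clean = ensure_utf8($raw);

var_dump(mb_check_encoding($raw, 'UTF-8'));   // validity before the conversion
var_dump(mb_check_encoding($clean, 'UTF-8')); // validity after the conversion
```

If the source encoding is unknown, guessing ISO-8859-1 can produce the wrong characters; rejecting the input may be the safer choice in that case.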

To prevent the problem you can send
header("Content-Type: text/html; charset=UTF-8");
at the top of the PHP file.
I found the solution in a Stack Overflow post, and it worked for me when migrating a MySQL DB with Latin special chars to MongoDB.

Related

PHP syntactic sugar: How to apply a function on a given input multiple times?

From a database I am getting a text where the function htmlentities() was applied four times. Sample text:
specials &amp;amp;amp;amp; workshops
In order to decode this text I have to do the following:
$out = html_entity_decode(html_entity_decode(html_entity_decode(html_entity_decode("specials &amp;amp;amp;amp; workshops"))));
Result:
specials & workshops
Is there a natural way in PHP to write this more efficiently?

I like to do it recursively in such a way that I do not need to know how many entities to match.
$string = 'specials &amp;amp;amp;amp; workshops';
$entity = '/&amp;/'; // match the encoded entity; a bare '/&/' would never stop matching
function recurseHTMLDecode($str, $entity) {
preg_match($entity, $str, $matches);
echo count($matches);
if(1 == count($matches)) {
$str = html_entity_decode($str);
$str = recurseHTMLDecode($str, $entity);
return $str;
} else {
return $str;
}
}
var_dump(recurseHTMLDecode($string, $entity));
This returns:
11110string(20) "specials & workshops"
This could be improved by adding a whitelist of entities to the function so you would not have to specify the entity when calling, just loop through the whitelist. This would solve the issue of having more than one entity in a string. It could be quite complex.
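One way to avoid both the entity argument and a whitelist is to compare the whole string before and after each pass and stop when nothing changes. A minimal sketch under stated assumptions: decode_until_stable and the $maxPasses safety cap are hypothetical names, not taken from the answers here.

```php
<?php
// Hypothetical helper: decode repeatedly until the text stops changing.
// $maxPasses is an assumed safety cap so the loop always terminates.
function decode_until_stable(string $str, int $maxPasses = 10): string
{
    for ($i = 0; $i < $maxPasses; $i++) {
        $decoded = html_entity_decode($str);
        if ($decoded === $str) {
            break; // nothing left to decode
        }
        $str = $decoded;
    }
    return $str;
}

echo decode_until_stable('specials &amp;amp;amp;amp; workshops'), PHP_EOL;
```

Because the comparison is on the full string, any mix of entities is handled. The trade-off is that text which is supposed to keep a literal entity (for example a visible "&amp;amp;" in a tutorial) gets decoded as well.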

Why not declare a function to do so?
$in = "specials &amp;amp; workshops";
$decode = function($in) {
foreach(range(1,4) as $x) $in = html_entity_decode($in);
return $in;
};
function decode($in) {
foreach(range(1,4) as $x)
$in = html_entity_decode($in);
return $in;
}
// inline
$out = $decode($in);
// traditional
$out = decode($in);

Following the recursive idea of @JayBlanchard, I have now created the following, and I really like it:
/**
* Apply a function to a certain input multiple times.
*
* @param mixed $input The input variable.
* @param callable $func The function to call.
* @param int $times How often the function should be called. -1 for deep call (unknown number of calls required). CAUTION: If output always changes this results in an endless loop.
* @return mixed
*/
function recapply($input,callable $func,int $times) {
if($times > 1) {
return recapply($func($input),$func,$times - 1);
} else if($times == -1) {
$res = $func($input);
if($res === $input) {
return $input;
} else {
return recapply($res,$func,-1);
}
}
return $func($input);
}
Working example call:
echo recapply("specials &amp;amp; workshops","html_entity_decode",4);

Truncate PHP string with a specific delimiter

Good morning,
I need to truncate a string at a specific delimiter character.
For example, with this string:
$myString = 'customname_489494984';
I would like to automatically strip every part of the string after "_"
(in this case "489494984").
Is there a function that truncates the part of a string after a specific
delimiter?
Many thanks!
François

You can also use strstr() which finds the first occurrence of a string:
$myString= "customname_489494984";
echo strstr($myString, '_', true);
Here's my reference: http://php.net/manual/en/function.strstr.php

Use a simple combo of substr and strpos:
$myString = 'customname_489494984';
echo substr($myString, 0, strpos($myString, '_'));
BONUS - Wrap it in a custom function:
function truncateStringAfter($string, $delim)
{
return substr($string, 0, strpos($string, $delim));
}
echo truncateStringAfter('customname_489494984', '_');
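One caveat with the helper above: strpos() returns false when the delimiter is missing, and substr($string, 0, false) then yields an empty string. A minimal sketch of a guarded variant follows; truncateStringAfterSafe is a hypothetical name, and keeping the whole string when there is no delimiter is an assumed requirement.

```php
<?php
// Hypothetical variant of truncateStringAfter() that keeps the whole
// string when the delimiter is missing.
function truncateStringAfterSafe(string $string, string $delim): string
{
    $pos = strpos($string, $delim);
    // strpos() returns false (not 0) when $delim is absent, so compare strictly.
    return $pos === false ? $string : substr($string, 0, $pos);
}

echo truncateStringAfterSafe('customname_489494984', '_'), PHP_EOL; // customname
echo truncateStringAfterSafe('customname', '_'), PHP_EOL;           // customname
```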

Try this. It will remove everything from the last "_" onward, because strrpos() finds the last occurrence.
$string= 'customname_489494984';
$string=substr($string, 0, strrpos($string, '_'));

The problem with the other straightforward approaches is that if the string doesn't contain "_", an empty string is returned. Here is a small StringUtils class (part of a larger one) whose static beforeFirst method handles that case in a multibyte-safe way.
var_dump(StringUtils::beforeFirst("customname_489494984", "_"));
class StringUtils
{
/**
* Returns the part of a string <b>before the first</b> occurrence of the string to search for.
* @param <b>$string</b> The string to be searched
* @param <b>$search</b> The string to search for
* @param <b>$caseSensitive boolean :optional</b> Defines if the search will be case sensitive. By default true.
* @return string
*/
public static function beforeFirst($string,$search,$caseSensitive = true)
{
$firstIndex = self::firstIndexOf($string, $search,$caseSensitive);
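// mb_strpos() returns false when nothing is found; compare strictly so index 0 is not mistaken for "not found"
return $firstIndex === false ? $string : self::substring($string, 0 , $firstIndex);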
}
/**
* Returns a part of the string from a character and for as many characters as provided
* @param <b>$string</b> The string to retrieve the part from
* @param <b>$start</b> The index of the first character (0 for the first one)
* @param <b>$length</b> The length of the part that will be extracted from the string
* @return string
*/
public static function substring($string,$start,$length = null)
{
return ( $length === null ) ? ( mb_substr($string, $start) ) : ( $length == 0 ? "" : mb_substr($string, $start , $length) );
}
/**
* Return the index of <b>the first occurrence</b> of a part of a string in the string
* @param <b>$string</b> The string to be searched
* @param <b>$search</b> The string to search for
* @param <b>$caseSensitive boolean :optional</b> Defines if the search will be case sensitive. By default true.
* @return number
*/
public static function firstIndexOf($string,$search,$caseSensitive = true)
{
return $caseSensitive ? mb_strpos($string, $search) : mb_stripos($string, $search);
}
/**
* Returns how many characters the string is
* @param <b>$string</b> The string
* @return number
*/
public static function length($string)
{
return mb_strlen($string);
}
}

What is the correct way to display a multi-line @param using PHPDoc?

From my research, I can't seem to find a correct way to format a multi-line PHPDoc @param line. What is the recommended way to do so?
Here's an example:
/**
* Prints 'Hello World'.
*
* Prints out 'Hello World' directly to the output.
* Can be used to render examples of PHPDoc.
*
* @param string $noun Optional. Sends a greeting to a given noun instead.
*                     Input is converted to lowercase and capitalized.
* @param bool $surprise Optional. Adds an exclamation mark after the string.
*/
function helloYou( $noun = 'World', $surprise = false ) {
$string = 'Hello ' . ucwords( strtolower( $string ) );
if( !!$surprise ) {
$string .= '!';
}
echo $string;
}
Would that be correct, or would you not add indentation, or would you just keep everything on one, long line?

You can simply do it this way:
/**
*
* @param string $string Optional. Sends a greeting to a given noun instead.
*                       Input is converted to lowercase and capitalized.
* @param bool $surprise
*/
function helloYou( $string = 'World', $surprise = false )
{
$string = 'Hello ' . ucwords( strtolower( $string ) );
if( !!$surprise ) {
$string .= '!';
}
echo $string;
}
So your example is fine except for one thing: the PHPDoc @param needs to have the same name as the PHP parameter. You called it $noun in the doc and $string in the actual code.

trying to count numbers that inside of a variable

I'm trying to count how many digits are inside a variable. Here is the regex that I use:
preg_match('/[^0-9]/', $password, $numbers, PREG_OFFSET_CAPTURE);
When I try to get all numbers one by one, I use:
print_r($this->filter->password_filter($data["user_password"]));
user_password is 123d4sd6789. Result is an empty array.
Array ( )

Well, you can easily do it using preg_split:
$temp = preg_split('/\d/', $password);
$numCount = count($temp) - 1;
Your regex is flawed. Since you're trying to count the number of digits in a password, using:
/[^0-9]/
won't cut it. Perhaps you meant to write:
/[0-9]/
because what you have now matches everything EXCEPT a digit.
There are many ways to do what you're trying to do. I benchmarked 4 different approaches and found regex to be the fastest. Using the bundled PCRE extension that PHP ships with, preg_split outperforms all other approaches ~70% of the time; ~20% of the time the loops are faster, and ~10% of the time preg_match_all is fastest.
On codepad, which doesn't use the standard PCRE for some reason, preg_match_all doesn't work and shuffle did not prove reliable, so I added a Knuth shuffle method and decided to test the difference between /\d/ and /[0-9]/ in combination with preg_split instead. On codepad, regex is faster >95% of the time as a result.
In short: use preg_split + regex for the best results.
Anyway, here's the benchmark code. It may seem silly to put it all into a class, but it's the fair way to benchmark: the string that is processed is kept in memory, as are all the arrays used to time the functions and compare speeds.
I'm not calling the test methods directly either, but go through a timeCall method instead, simply because I want the garbage collector to collect what needs collecting after each call. The code isn't too difficult to figure out, and it's the results that matter:
class Bench
{
/**
* @var string
*/
private $str = '123d4sd6789';
private $functions = array(
'regex' => null,
'regex2' => null,
'loop' => null,
'loop2' => null
);
private $random = null;
public function __construct()
{
$this->random = array_keys($this->functions);
if (!shuffle($this->random)) $this->knuth();
}
/**
* Knuth shuffle
*/
private function knuth()
{
for ($i=count($this->random)-1,$j=mt_rand(0,$i);$i>0;$j=mt_rand(0,--$i))
{
$temp = $this->random[$j];
$this->random[$j] = $this->random[$i];
$this->random[$i] = $temp;
}
}
/**
* Call all functions in random order, timing each function
* determine fastest approach, and echo results
* @param $randomize
* @return string
*/
public function test($randomize)
{
if ($randomize) if (!shuffle($this->random)) $this->knuth();
foreach($this->random as $func) $this->functions[$func] = $this->timeCall($func);
$fastest = array('f', 100000);
foreach($this->functions as $func => $time)
{
$fastest = $fastest[1] > $time ? array($func, $time) : $fastest;
echo 'Function ', $func, ' took ', $time, 's', PHP_EOL; // microtime(true) differences are in seconds
}
echo 'Fastest approach: ', $fastest[0], ' (', $fastest[1], 's)', PHP_EOL;
return $fastest[0];
}
/**
* Time function call
* @param string $func
* @return float
*/
private function timeCall($func)
{
echo $func, PHP_EOL;
$start = microtime(true);
$this->{$func}();
return (microtime(true) - $start);
}
/**
* Count digits in string using preg_split
* @return int
*/
private function regex()
{
return count(preg_split('/\d/', $this->str)) - 1;
}
/**
* count digits in string using str_split + is_numeric + loop
* @return int
*/
private function loop()
{
$chars = str_split($this->str);
$counter = 0;
foreach($chars as $char) if (is_numeric($char)) ++$counter;
return $counter;
}
/**
* count digits by iterating over string, using is_numeric
* @return int
*/
private function loop2()
{
for($i=0,$j=strlen($this->str),$counter=0;$i<$j;++$i) if (is_numeric($this->str[$i])) ++$counter;
return $counter;
}
/**
* use preg_split + [0-9] instead of \d
* @return int
*/
private function regex2()
{
return count(preg_split('/[0-9]/', $this->str)) - 1;
}
}
$benchmark = new Bench();
$totals = array();
for($i=0;$i<10;++$i)
{
$func = $benchmark->test($i);
if (!isset($totals[$func])) $totals[$func] = 0;
++$totals[$func];
if ($i < 9) echo PHP_EOL, '---------------------------------------------', PHP_EOL;
}
var_dump($totals);

Do you really want a regex?
$arr1 = str_split($password);
$counter=0;
foreach($arr1 as $v){
if(is_numeric($v))$counter++;
}

Use preg_match_all to select all the numbers:
$reg = '/[0-9]/';
$string = '123d4sd6789';
preg_match_all($reg, $string, $out);
echo (count($out[0]));
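A slightly shorter sketch of the same idea: preg_match_all() itself returns the number of full-pattern matches, so the result array can be skipped when only the count matters. This assumes PHP 5.4 or newer, where the third argument is optional.

```php
<?php
$string = '123d4sd6789';

// preg_match_all() returns the number of full-pattern matches
// (or false on error), so no result array is required.
$digits = preg_match_all('/[0-9]/', $string);

echo $digits, PHP_EOL; // 8
```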

How to pass a URL as a parameter of a controller's method in codeigniter

I have a Codeigniter controller which takes a full URL as the first argument, but the URL passed into my controller only shows http:
public function mydata($link)
{
echo $link; // shows only http: rather than the full URL http://abc.com
}
How can i solve this issue?

If you want to pass a URL as a parameter, use
urlencode(base64_encode($str))
i.e.:
$url=urlencode(base64_encode('http://stackoverflow.com/questions/9585034'));
echo $url;
Result:
aHR0cDovL3N0YWNrb3ZlcmZsb3cuY29tL3F1ZXN0aW9ucy85NTg1MDM0
Then you call:
http://example.com/mydata/aHR0cDovL3N0YWNrb3ZlcmZsb3cuY29tL3F1ZXN0aW9ucy85NTg1MDM0
and in your controller:
public function mydata($link)
{
$link=base64_decode(urldecode($link));
...
...
...
}
You can find an encoder/decoder here:
http://www.base64decode.org/

In Codeigniter controllers, each method argument comes from the URL separated by a / slash. http://example.com
There are a few different ways to piece together the arguments into one string:
public function mydata($link)
{
// URL: http://example.com/mysite/mydata/many/unknown/arguments
// Ways to get the string "many/unknown/arguments"
echo implode('/', func_get_args());
echo ltrim($this->uri->uri_string(), '/');
}
However:
In your case, the double slash // may be lost using either of those methods because it will be condensed to one in the URL. In fact, I'm surprised that a URL like:
http://example.com/mydata/http://abc.com
...didn't trigger Codeigniter's "The URI contains disallowed characters" error. I'd suggest you use query strings for this task to avoid all these problems:
http://example.com/mydata/?url=http://abc.com
public function mydata()
{
$link = $this->input->get('url');
echo $link;
}

Aside from the issue of whether you should be passing a URL in a URL think about how you are passing it:
example.com/theparameter/
but your URL will actually look like
example.com/http://..../
See where you're going wrong yet? The CodeIgniter framework takes the parameter out of the URL, delimited by slashes. So your function is working exactly as it should.
If this is how you must do it then URL encode your parameter before passing it.

You can try this. It worked for me.
"Encode" the value before passing:
$value = str_replace('=', '-', str_replace('/', '_', base64_encode($album)));
"Decode" the value after receiving:
$value = base64_decode(str_replace('-', '=', str_replace('_', '/', $value)));
reference: https://forum.codeigniter.com/printthread.php?tid=40607
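The replacements above leave "+" untouched, which base64 output can also contain. A minimal sketch of the common "base64url" variant follows, which maps "+" and "/" to "-" and "_" and drops the "=" padding. The helper names are hypothetical, and it assumes "-" and "_" are allowed by the application's permitted_uri_chars setting.

```php
<?php
// Hypothetical helpers implementing the "base64url" alphabet:
// '+' becomes '-', '/' becomes '_', and '=' padding is dropped.
function base64url_encode(string $data): string
{
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

function base64url_decode(string $data): string
{
    // base64_decode() in its default non-strict mode copes with the missing padding.
    return (string) base64_decode(strtr($data, '-_', '+/'));
}

$segment = base64url_encode('http://abc.com/?a=1&b=2');
echo $segment, PHP_EOL;
echo base64url_decode($segment), PHP_EOL;
```

Unlike the str_replace version, "=" is removed rather than mapped to "-", so the two schemes are not interchangeable.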

I liked @user72740's approach until I discovered that it can still produce characters not permitted by CI, like %.
What I ended up doing is converting the segment string into hex, then back.
So I created a MY_URI class that extends CI_URI and added these methods:
/**
* Segmentize
*
* Makes URI segments, CI "segment proof"
* Removes dots and forward slashes, leaving ONLY hex characters
* Allows passing "anything" as a CI URI segment and corresponding function param
*
* @access public
* @return string
*/
public function segmentize($segment){
if(empty($segment)){
return '';
}
return bin2hex($segment);
}
/**
* Desegmentize
*
* @access public
* @return string
*/
public function desegmentize($segment){
if(empty($segment)){
return '';
}
return $this->hex2bin($segment);
}
/**
* hex2bin
*
* PHP 5.3 version of 5.4 native hex2bin
*
* @access public
* @return string
*/
public function hex2bin($hex) {
$n = strlen($hex);
$bin = '';
$i = 0;
while($i < $n){
$a = substr($hex, $i, 2);
$c = pack('H*', $a);
if ($i == 0){
$bin = $c;
}
else {
$bin .= $c;
}
$i += 2;
}
return $bin;
}
Then used $this->uri->segmentize($url) to create the segment string and
$this->uri->desegmentize($this->input->post('url', true)) to get it back into readable format.
Thus
https://www.example.com/somewhere/over/the/rainbow
becomes
68747470733a2f2f7777772e6578616d706c652e636f6d2f736f6d6577686572652f6f7665722f7468652f7261696e626f77
and back.
I am sure there is a better way, like a base_convert() implementation, because this way the string can get arbitrarily long. But now I don't have to worry about = signs, padding, etc.
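On PHP 5.4 or newer the custom hex2bin() method should not be needed, since hex2bin() is built in. A minimal round-trip sketch, assuming such a PHP version; the variable names are illustrative only:

```php
<?php
// Assumes PHP 5.4 or newer, where hex2bin() is built in.
$url     = 'https://www.example.com/somewhere/over/the/rainbow';
$segment = bin2hex($url);     // contains only the characters 0-9 and a-f
$back    = hex2bin($segment); // returns false on malformed input

echo $segment, PHP_EOL;
var_dump($back === $url);
```

The hex form is twice as long as the input, so very long URLs may still run into URL length limits.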
