create URL slugs for chinese characters. Using PHP - php

My users sometimes use chinese characters for the title of their input.
My slugs are in the format of /stories/:id-:name where an example could be /stories/1-i-love-php.
How do I allow chinese characters?
I have googled and found the japanese version of this answer over here.
Don't quite understand Japanese, so I am asking about the chinese version.
Thank you.

i have tested in Bengali characters
it may work. try this:
at first the coded page (write code where in the page) have to convert into encoding type in UTF-8, then write code.
code here:
function to_slug($string, $separator = '-') {
$re = "/(\\s|\\".$separator.")+/mu";
$str = #trim($string);
$subst = $separator;
$result = preg_replace($re, $subst, $str);
return $result;
}
$id=34;
$string_text="আড়াইহাজারে দেড় বছরের --- শিশুর -গলায় ছুরি";
$base_url="http://example.com/";
echo $target_url=$base_url.$id."-". #to_slug($string_text);
var_dump($target_url);
output:
http://example.com/34-আড়াইহাজারে-দেড়-বছরের-শিশুর-গলায়-ছুরি
string 'http://example.com/34-আড়াইহাজারে-দেড়-বছরের-শিশুর-গলায়-ছুরি' (length=136)

Related

PHP regex replace based on \v character (vertical tab)

I have a character string like (ascii codes):
32,13,7,11,11,
"string1,blah;like: this...", 10,10, 32,32,32,32, 138,138, 32,32,32,32, 13,7, 11,11,
"string2/lorem/example-text...", 10,10, 32,32,32,32,32, 143,143,143,143,143
So the sequence is:
any characters, followed by my search string, followed by any
characters
11,11
the string I want to replace
any non-printable characters
If the block contains string1 then I need to replace the next string with something else. The second string always starts directly after the 11,11.
I'm using PHP.
I thought something like this, but I am not getting the correct result:
$updated = preg_replace("/(.*string1.*?\\v+)([[:print:]]+)([[:ascii:]]*)/mi", "$1"."new string"."$3", $orig);
This puts "new string" between the 10,10 and the 138,138 (and replaces the 32's).
Also tried \xb instead of \v.
Normally I test with regex101, but not sure how to do that with non-printable characters. Any suggestions from regex guru's?
Edit: the expected output is the sequence:
32,13,7,11,11,
"string1,blah;like: this...", 10,10, 32,32,32,32, 138,138, 32,32,32,32, 13,7, 11,11,
"new string", 10,10, 32,32,32,32,32, 143,143,143,143,143
Edit: sorry for the confusion regarding the ascii codes.
Here's a complete example:
<?php
$s = chr(32).chr(32).chr(7).chr(11).chr(11);
$s .= "string1,blah;like: this...". chr(10).chr(10).chr(32).chr(32).chr(32).chr(32).chr(138).chr(138);
$s .= chr(32).chr(32).chr(32).chr(32).chr(13).chr(7).chr(11).chr(11);
$s .= "string2/lorem/example-text...". chr(10).chr(10).chr(32).chr(32).chr(32).chr(32).chr(32).chr(143).chr(143).chr(143);
$result = preg_replace('/(.*string1.*?\v+)([[:print:]]+)([[:ascii:]]*)/mi', "$1"."new string"."$3", $s);
echo "\n------------------------\n";
echo $result;
echo "\n------------------------\n";
The text string2/lorem/example-text... should be replaced by new string.
My php-cli halted every time preg_match has reached char(138) and I don't know why.
I will throw my hat on this RegEx (note: \v matches a new-line | no flags are set):
"[^"]*"[^\x0b]+\v{2}"\K[^"]*
PHP code:
$source = chr(32).chr(13).chr(7).chr(11).chr(11)."\"string1,blah;like: this...\"".chr(10).
chr(10).chr(32).chr(32).chr(32).chr(32).chr(138).chr(138).chr(32).chr(32).chr(32).chr(32).
chr(13).chr(7).chr(11).chr(11)."\"string2/lorem/example-text...\"".chr(10).chr(10).chr(32).
chr(32).chr(32).chr(32).chr(32).chr(143).chr(143).chr(143).chr(143).chr(143);
echo preg_replace('~"[^"]*"[^\x0b]+\v{2}"\K[^"]*~', "new string", $source);
Beautiful output:
"string1,blah;like: this..."
��
"new string"
�����
Live demo
Solved. It was a combination of things:
/mis was needed (instead of /mi)
\x0b was needed (instead of \v)
Complete working example:
<?php
$s = chr(32).chr(32).chr(7).chr(11).chr(11);
$s .= "string1,blah;like: this...". chr(10).chr(10).chr(32).chr(32).chr(32).chr(32).chr(138).chr(138);
$s .= chr(32).chr(32).chr(32).chr(32).chr(13).chr(7).chr(11).chr(11);
$s .= "string2/lorem/example-text...". chr(10).chr(10).chr(32).chr(32).chr(32).chr(32).chr(32).chr(143).chr(143).chr(143);
$result = preg_replace('/(.*string1.*?\x0b+)([[:print:]]+)/mis', "$1"."new string", $s);
echo "\n------------------------\n";
echo $result;
echo "\n------------------------\n";
Thanks for everyone's suggestions. It put me on the right track.

How can i allow only arabic characters in input text field?

I already searched here and found similar posts related to this post but i did not find a solution yet .
i tried this :
$text = "الحمد لله رب العالمين hello";
echo $is_arabic = preg_match('/\p{Arabic}/u', $text);
I add the unicode flag but if i add any English characters it is returning true ! any fix for this ?
Any idea folks ?
Thanks in advance
Use unicode flag:
$text = "الحمد لله رب العالمين";
echo $is_arabic = preg_match('/\p{Arabic}/u', $text);
here __^
If you want to match only arabic you should do:
echo $is_arabic = preg_match('/^[\s\p{Arabic}]+$/u', $text);
Update: I see I am apparently wrong about classes not being supported (though the docs at say "Extended properties such as "Greek" or "InMusicalSymbols" are not supported by PCRE" but the comment at http://php.net/manual/en/regexp.reference.unicode.php#102756 says they are supported), so I guess M42's is the better answer. They can, however, be done with ranges as follows:
$text = "الحمد لله رب العالمين";
echo $is_arabic =
preg_match('/^[\s\x{0600}-\x{06FF}\x{0750}-\x{077F}\x{08A0}-\x{08FF}\x{FB50}-\x{FDFF}\x{FE70}-\x{FEFF}\x{10E60}\x{10E60}—\x{10E7F}\x{1EE00}—\x{1EEFF}]+$/u', $text);

php non latin to hex function

I have website that's in win-1251 encoding and it needs to stay that way. But I also need to be able to echo few links that contain non latin, non cyrillic characters like šžāņūī...
I need a function that convert this
"māja un man tā patīk"
to
"māja un man tā patīk"
and that does not touch html, so if there is <b> it needs to stay as <b>, not > or <
And please no advices about the encoding and how wrong that is.
$str = "<b>Obāchan</b> おばあちゃん";
$str = preg_replace_callback('/./u', function ($matches) {
$chr = $matches[0];
if (strlen($chr) > 1) {
$chr = mb_convert_encoding($chr, 'HTML-ENTITIES', 'UTF-8');
}
return $chr;
}, $str);
This expects the original $str to be UTF-8 encoded, i.e. your PHP file should be saved in UTF-8. It encodes all non-ASCII compatible code points to HTML entities. Since all HTML special characters are ASCII characters, they remain untouched. The resulting string is pure ASCII. Since the lower Win-1251 code points are ASCII compatible, the resulting string is also a valid Win-1251 string. The above $str converts to:
<b>Obāchan</b> おばあちゃん
The main things you probably don't want to encode are <, > and &. Those are really the only special characters. So how about encoding everything first, and then just decode <, > and & I feel you should be fine.
This is untested:
$output =
htmlspecialchars_decode(
htmlentities($input, ENT_NOQUOTES, 'CP-1251')
);
let me know
What Evert suggest looks logical to me too! If you insist this is a way to do it if there are only two letters that bother you. For more letters the scrit will not be as effective and needs to change.
<?PHP
function myConvert($str)
{
$chars['ā']='ā';
$chars['ī']='ī';
foreach ($chars as $key => $value)
$output = str_replace($key, $value, $str);
echo $str;
}
myConvert("māja un man tā patīk");
?>
==================edited==============
For many characters maybe this one can help you:
<?PHP
function myConvert($str)
{
$final=null;
$parts = preg_split("/&#[0-9]*;/i", $str);//get all text parts
preg_match_all("/&#[0-9]*;/i", $str, $delimiters );//get delimiters;
$delimiters[0][]='';//make arrays equal size
foreach($parts as $key => $value)
$final.=$value.mb_convert_encoding
($delimiters[0][$key], "UTF-8", "HTML-ENTITIES");
return $final;
}
$fh = fopen("testFile.txt", 'w') ;
fwrite($fh, myConvert("māja un man tā patīkī"));
fclose($fh);
?>
The desired output is written in the text file. This code, exactly as it is -not merged in some project- does what it claims to do. Converts codes like ā to the analogous character they present.

Problem in UTF Encoding in PHP

I use the following lines of code:
$revTerm = "". strrev($limitAry["term"]);
$revTerm = utf8_encode($revTerm);
The $revTerm contains Norwegian characters as ø æ å. However, it is shown correctly. I need to reverse them before displaying, so I use the first line.
When I display them this way, I get an error of bad xml format - used to fill a grid.
When I try to use the second line, I don't get an error but the characters are not shown correctly. Could there be any other way to solve that?
If it may help, I use jqGrid to fill those data in.
strrev, like most PHP string functions, is not safe for multi-byte encodings.
try this example
$test = 'А роза упала на лапу Азора ウィキ';
$test = iconv('utf-8', 'utf-16le', $test);
$test = strrev($test);
// キィウ арозА упал ан алапу азор А
echo iconv('utf-16be', 'utf-8', $test);
(russian)
http://bolknote.ru/2012/04/02/~3625#56
Try this:
$revTerm = utf8_decode($limitAry["term"]);
$revTerm = strrev($revTerm);
$revTerm = utf8_encode($revTerm);
For using strrev you have to decode your string to a non-multibyte string.

converting C style unicode escapes to htmlentities

I want to change every C style unicode char into html entity. I've written this function to do that:
function ununicode($text) {
$text = preg_replace('/\\\\u([0-9a-f]{4})/i', '&#x$1;', $text);
return $text;
}
it works good, but ignores second character in sth like that \u00f6\u00df. ie it will produce: ö\u00df
whats wrong with my regex?
Try adding the g flag (which allows more than one replacement per line), so that it's this:
$text = preg_replace('/\\\\u([0-9a-f]{4})/ig', '&#x$1;', $text);
Edit:
Your code as written seems to work for me:
php > $text = "\u00f6\u00df";
php > print $text;
\u00f6\u00df
php > $text2 = preg_replace('/\\\\u([0-9a-f]{4})/i', '&#x$1;', $text);
php > print $text2;
öß

Categories