Remove Arabic Diacritic

Remove Arabic Diacritic - php

I want PHP to convert this:
Text : الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ
converted to : الحمد لله رب العالمين
I am not sure where to start or how to do it; I have absolutely no idea. I have done some research and found this link http://www.suhailkaleem.com/2009/08/26/remove-diacritics-from-arabic-text-quran/ but it does not use PHP. I would like to use PHP to convert the above text into the converted text. I want to remove all diacritics from user-input Arabic text.

The vowel diacritics in Arabic are combining characters, so simply searching for them and removing them is enough. There's no need for a replace rule for every possible consonant with every possible vowel, which would be tedious.
Here's a working example that outputs what you need:
header('Content-Type: text/html; charset=utf-8', true);
$string = 'الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ';
$remove = array('ِ', 'ُ', 'ٓ', 'ٰ', 'ْ', 'ٌ', 'ٍ', 'ً', 'ّ', 'َ');
$string = str_replace($remove, '', $string);
echo $string; // outputs الحمد لله رب العالمين
What's important here is the $remove array. It looks odd because each pair of single quotes contains a combining character, which is rendered as if it modifies one of the quotes. You may need to save the PHP file in the same character encoding as your text (here UTF-8).
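A minimal sketch of the same str_replace() approach, assuming PHP 7.0 or later for the "\u{...}" escape syntax. Listing the marks by code point instead of pasting them between quotes keeps the source readable and avoids the editor-rendering issue described above. The ten code points are the same marks as in the $remove array; the comments name each one.

```php
<?php
// Same ten marks as the $remove array above, written as code-point escapes
// (PHP 7.0+) so they do not visually attach to the surrounding quotes.
$remove = [
    "\u{064B}", // fathatan
    "\u{064C}", // dammatan
    "\u{064D}", // kasratan
    "\u{064E}", // fatha
    "\u{064F}", // damma
    "\u{0650}", // kasra
    "\u{0651}", // shadda
    "\u{0652}", // sukun
    "\u{0653}", // maddah above
    "\u{0670}", // superscript alef
];

$string = 'الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ';
// str_replace() with an array of needles removes every listed mark.
echo str_replace($remove, '', $string), "\n";
```

Because the escapes always produce UTF-8, the text being cleaned must also be UTF-8.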

Try this:
$string = 'الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ';
$string = preg_replace("~[\x{064B}-\x{065B}]~u", "", $string);
echo $string; // outputs الحمد لله رب العالمين
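The range above stops at U+065B and does not include U+0670 (superscript alef, the ٰ in the first answer's $remove array). Here is a minimal sketch that widens the range to U+065F, adds U+0670, and wraps the call in a function. The function name is hypothetical, and the exact set of marks to strip is an assumption you should adapt to your text.

```php
<?php
// Sketch: regex-based removal with an explicit, documented set of marks.
// Which code points count as "diacritics" is an assumption; adjust as needed.
function removeArabicDiacritics(string $text): string
{
    // U+064B..U+065F: tanween, fatha, damma, kasra, shadda, sukun and related marks
    // U+0670: superscript (dagger) alef, frequent in Quranic text
    $result = preg_replace('/[\x{064B}-\x{065F}\x{0670}]/u', '', $text);

    // preg_replace() returns null on failure, most likely invalid UTF-8 input
    if ($result === null) {
        throw new InvalidArgumentException('preg_replace() failed (invalid UTF-8?)');
    }
    return $result;
}

echo removeArabicDiacritics('الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ'), "\n";
```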

Try this code; it works fine for the example above:
<?php
$str = 'الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ';
$unicode = [
"~[\x{0600}-\x{061F}]~u",
"~[\x{063B}-\x{063F}]~u",
"~[\x{064B}-\x{065E}]~u",
"~[\x{066A}-\x{06FF}]~u",
];
$str = preg_replace($unicode, "", $str);
echo $str;
?>
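See: Arabic unicode
Thanks to: Hosein Shahrestani
Note that these ranges remove more than diacritics: U+0600 to U+061F includes Arabic punctuation such as the Arabic comma (،) and question mark (؟), and U+066A to U+06FF includes extended letters used in Persian and Urdu, such as پ and ک.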

I don't speak Arabic, but I think you could build an alphabet remap:
function remap($string) {
$remap = [
'ą' => 'a',
'č' => 'c',
/* ... Arabic alphabet remap */
];
return str_replace(array_keys($remap), $remap, $string);
}
echo remap('ąčasdadfg'); // => acasdadfg

Related

php htmlentities corrupt string

For security reasons I wanted to add a function that turns strings into a safe format, using the code below. With normal English characters it works fine, but when I use Amharic characters like ከበደ I get a different string like áŠ¨á‰ á‹°. What should I do?
echo safestring("ከበደ");
// the echoed string is completely changed
function safestring($str){
// make the string safe from SQL injection
$str = htmlentities($str);
$str= mysql_real_escape_string($str);
return $str;
}

First things first: you have to declare the charset for your document.
HTML
Just add the following to the <head> element of your HTML:
<meta charset="UTF-8">
PHP
For JSON you can use PHP's header function like so:
header('content-type: application/json; charset=utf-8');
To avoid losing any characters from the string, you can use the code below, which escapes only the angle brackets instead of converting every non-ASCII character into an entity:
function safestring($string){
    $string = trim($string);
    // escape only the HTML-significant angle brackets
    $string = str_replace("<", "&lt;", $string);
    $string = str_replace(">", "&gt;", $string);
    // note: mysql_real_escape_string() was removed in PHP 7.0; prefer mysqli or PDO prepared statements
    $string = mysql_real_escape_string($string);
    return $string;
}
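Mojibake like áŠ¨á‰ á‹° usually means the UTF-8 bytes were treated as a single-byte encoding somewhere. For htmlentities() this happened by default before PHP 5.4, when its default charset was ISO-8859-1. Below is a minimal sketch, assuming UTF-8 input, that escapes for HTML output with the charset passed explicitly. The helper name escapeForHtml is hypothetical. SQL escaping is a separate concern and is better handled with prepared statements (mysqli or PDO) than inside the same helper.

```php
<?php
// Sketch: escape for HTML output only, with the charset given explicitly.
// Assumes UTF-8 input; escapeForHtml() is a hypothetical helper name.
function escapeForHtml(string $str): string
{
    // htmlspecialchars() converts only & < > " ' to entities, so multibyte
    // text such as Amharic passes through unchanged when the charset is UTF-8.
    // ENT_SUBSTITUTE replaces invalid UTF-8 sequences with U+FFFD instead of
    // returning an empty string.
    return htmlspecialchars($str, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo escapeForHtml('ከበደ'), "\n";         // ከበደ
echo escapeForHtml('<b>ከበደ</b>'), "\n";  // &lt;b&gt;ከበደ&lt;/b&gt;
```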

PHP mb_eregi_replace does not work

I am trying to match a whole UTF-8 word in PHP. This is how I am trying to do it:
<?php
$string = 'DS DAMAT TAKIM ELBİSE (GOLD)';
$search = 'takım elbise';
$replace = 'TakımElbise';
$result = mb_eregi_replace('/\b'.$search.'\b/ui', $replace, $string);
echo $result;
echo preg_match('/\b'.$search.'\b/ui', $replace);
?>
But it does not work. What could be the problem?
NOTE:
I have tried adding these lines at the beginning of script:
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
No result.

How about:
$string = 'DS DAMAT TAKIM ELBİSE (GOLD)';
// ^__ this isn't an I
$search = 'takım elbİse';
// ^__ this isn't an I
$replace = 'TakımElbise';
$result = preg_replace("/\b$search\b/ui", $replace, $string);
echo $result;
I've just changed the i to İ in the search string. You may want to use the lowercase form instead (I don't have it on my keyboard).

See the comment here: http://php.net/manual/en/function.mb-ereg-replace.php
Unlike preg_replace, mb_ereg_replace doesn't use delimiters.
Example with preg_replace:
$data = preg_replace("/[^A-Za-z0-9\.\-]/","",$data);
Example with mb_ereg_replace:
$data = mb_ereg_replace("[^A-Za-z0-9\.\-]","",$data);
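Also, don't append the ui flags to the pattern: mb_eregi_replace is already case-insensitive, and mb_ereg_replace takes its options as a separate fourth argument.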
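Putting both points together, here is a minimal sketch of the question's whole-word replacement with mb_eregi_replace(), with no delimiters and no trailing flags. It assumes the mbstring extension with regex support is loaded. replaceWholeWord is a hypothetical helper name, and the search word is inserted into the pattern unescaped.

```php
<?php
// Sketch: whole-word, case-insensitive replacement with mb_eregi_replace().
// Assumes the mbstring extension (with regex support) is available.
// replaceWholeWord() is a hypothetical helper name.
function replaceWholeWord(string $subject, string $word, string $replacement): string
{
    mb_regex_encoding('UTF-8');

    // No "/.../ui" delimiters or flags: the pattern is the bare regex, and
    // mb_eregi_replace() is case-insensitive by itself.
    // Caveat: $word is not escaped, so regex metacharacters in it stay active.
    $result = mb_eregi_replace('\b' . $word . '\b', $replacement, $subject);

    // mb_eregi_replace() returns false or null on error; fall back to the input.
    return is_string($result) ? $result : $subject;
}
```

For example, replaceWholeWord('DS DAMAT TAKIM ELBISE (GOLD)', 'takim elbise', 'TakimElbise') should return 'DS DAMAT TakimElbise (GOLD)'. The Turkish İ/ı pair in the question's strings is a separate case-mapping issue, as the previous answer points out.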

How to make the first 2 characters of a string uppercase?

How can I use the PHP strtoupper function for the first two characters of a string? Or is there another function for that?
So the string 'hello' or 'Hello' must be converted to 'HEllo'.

$txt = strtoupper( substr( $txt, 0, 2 ) ).substr( $txt, 2 );
This works also for strings that are less than 2 characters long.

$string = "hello";
$string[0] = strtoupper($string[0]);
$string[1] = strtoupper($string[1]);
var_dump($string);
//output: string(5) "HEllo"

Assuming it's just a single word, you need to do:
$ucfirsttwo = strtoupper(substr($word, 0, 2)) . substr($word, 2);
Basically, extract the first two characters and uppercase them, then append the remaining characters.
If you need to handle multiple words in the string, it gets a bit uglier.
Oh, and if you're using multibyte characters, prefix the two functions with mb_ (mb_strtoupper and mb_substr) to get multibyte-aware versions.
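A minimal sketch of that multibyte-aware variant, assuming UTF-8 input and the mbstring extension. The function name ucfirstTwo is hypothetical.

```php
<?php
// Sketch: uppercase the first two characters, counting characters rather than bytes.
// Assumes UTF-8 input and the mbstring extension; ucfirstTwo() is a hypothetical name.
function ucfirstTwo(string $word): string
{
    $head = mb_substr($word, 0, 2, 'UTF-8');    // first two characters (fewer if the string is shorter)
    $tail = mb_substr($word, 2, null, 'UTF-8'); // everything after them
    return mb_strtoupper($head, 'UTF-8') . $tail;
}
```

With this, ucfirstTwo('hello') gives 'HEllo' and ucfirstTwo('élan') gives 'ÉLan', whereas the byte-based substr() would count the two bytes of 'é' as two characters.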

$str = substr_replace($str, strtoupper(substr($str, 0, 2)), 0, 2);

Using preg_replace() with the e pattern modifier could be interesting here:
$str = 'HELLO';
echo preg_replace('/^(\w{1,2})/e', 'strtoupper("\\1")', strtolower($str));
EDIT: It is recommended that you not use this approach; the e modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0. From the PHP manual:
Use of this modifier is discouraged, as it can easily introduce security vulnerabilities:
<?php
$html = $_POST['html'];
// uppercase headings
$html = preg_replace(
'(<h([1-6])>(.*?)</h\1>)e',
'"<h$1>" . strtoupper("$2") . "</h$1>"',
$html
);
The above example code can be easily exploited by passing in a string
such as <h1>{${eval($_GET[php_code])}}</h1>. This gives the attacker
the ability to execute arbitrary PHP code and as such gives him nearly
complete access to your server.
To prevent this kind of remote code execution vulnerability the
preg_replace_callback() function should be used instead:
<?php
$html = $_POST['html'];
// uppercase headings
$html = preg_replace_callback(
'(<h([1-6])>(.*?)</h\1>)',
function ($m) {
return "<h$m[1]>" . strtoupper($m[2]) . "</h$m[1]>";
},
$html
);
As recommended, instead of using the e pattern, consider using preg_replace_callback():
$str = 'HELLO';
echo preg_replace_callback(
'/^(\w{1,2})/'
, function( $m )
{
return strtoupper($m[1]);
}
, strtolower($str)
);

This should work: strtoupper(substr($target, 0, 2)) . substr($target, 2), where $target is your 'hello' or whatever.

ucfirst does only the first character, but offset access on strings works, too:
$str = ucfirst($str);
$str[1] = strtoupper($str[1]);
Remark: this works, but on strings shorter than two characters offset 1 is not defined, so you will get notices (or worse, depending on the PHP version); older PHP versions even converted an empty string to an array. So it's not that safe and is merely here to show some options.

How to convert a string with numbers and spaces into an int

I have a small problem. I am trying to convert a string like "1 234" to a number: 1234.
I can't get there. The string is scraped from a website. Is it possible that the character there is not a space? I've tried methods like str_replace and preg_split on a space, and nothing works. Also, (int)$abc takes only the first digit (1).
If anyone has an idea, I'd be grateful! Thank you!

This is how I would handle it...
<?php
$string = "Here! is some text, and numbers 12 345, and symbols !£$%^&";
$new_string = preg_replace("/[^0-9]/", "", $string);
echo $new_string; // Returns 12345
?>

intval(preg_replace('/[^0-9]/', '', $input))

Scraping websites always requires specific code: you know how you receive the input, and you write the code required to make it usable.
That is why the first answer is still str_replace.
$iInt = (int)str_replace(array(" ", ".", ","), "", $iInt);

$str = "1 234";
$int = intval(str_replace(' ', '', $str)); //1234
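If str_replace(' ', '', ...) seems to do nothing, as the question describes, the separator in scraped HTML may be a non-breaking space (U+00A0, or the entity &nbsp;) rather than an ASCII space. That is an assumption about the page, not something the question confirms. Here is a minimal sketch that handles all three forms; scrapedToInt is a hypothetical helper name.

```php
<?php
// Sketch: assumes the "space" in the scraped text may be a non-breaking
// space (U+00A0) or the literal entity "&nbsp;" rather than an ASCII space.
// scrapedToInt() is a hypothetical helper name.
function scrapedToInt(string $raw): int
{
    // Turn "&nbsp;" (and other entities) into real UTF-8 characters first.
    $decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Remove ordinary whitespace and U+00A0 (NO-BREAK SPACE).
    $digits = preg_replace('/[\s\x{00A0}]+/u', '', $decoded);

    // preg_replace() returns null on invalid UTF-8; treat that as 0.
    return (int) ($digits ?? '0');
}

var_dump(scrapedToInt("1 234"));        // int(1234)
var_dump(scrapedToInt("1\u{00A0}234")); // int(1234)
var_dump(scrapedToInt("1&nbsp;234"));   // int(1234)
```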

I just ran into the same issue; however, the answers provided didn't cover all the different cases I had...
So I made this function (the idea popped into my mind thanks to Dan):
function customCastStringToNumber($stringContainingNumbers, $decimalSeparator = ".", $thousandsSeparator = " "){
    $numericValues = array();
    // escape both separators so that characters such as "." are matched literally
    $dec = preg_quote($decimalSeparator, "/");
    $thousands = preg_quote($thousandsSeparator, "/");
    // a number: digits, optionally grouped with the thousands separator,
    // optionally followed by the decimal separator and more digits
    preg_match_all("/[0-9](?:[0-9{$thousands}]*[0-9])?(?:{$dec}[0-9]+)?/", $stringContainingNumbers, $matches);
    foreach($matches[0] as $match):
        // drop the thousands separators, then normalise the decimal separator to "."
        $match = str_replace($thousandsSeparator, "", $match);
        $numericValues[] = (float)str_replace($decimalSeparator, ".", $match);
    endforeach;
    // a single number is returned as a float, several numbers as an array of floats
    if(count($numericValues) === 1)
        return $numericValues[0];
    return $numericValues;
}
So, basically, this function extracts all the numbers contained in a string, no matter how much text there is, and returns every extracted number as a float (a single float if exactly one number is found, otherwise an array of floats).
One can specify the decimal separator used in one's country with the $decimalSeparator parameter, and the thousands separator with $thousandsSeparator.

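Use this code to remove any other characters, such as .,:"'\/, !##$%^&*(), a-z and A-Z. Note that every digit in the string ends up in one integer, so the example below yields 12343512356 on a 64-bit build: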
$string = "This string involves numbers like 12 3435 and 12.356 and other symbols like !## then the output will be just an integer number!";
$output = intval(preg_replace('/[^0-9]/', '', $string));
var_dump($output);

php str_replace problem with single quote getting converted to &039

I am using str_replace to replace some characters and for some reason the output converts single quotes to &039. I am not trying to replace single quotes at all. What can be causing this?

$v = 'your string';
$newv = str_replace("&039", "'", $v);
Example:
$v = "Hi My Name Is &039George&039";
$newv = str_replace("&039", "'", $v);
echo $newv;
The output would be:
Hi My Name Is 'George'
I hope this helps and that I understood your question correctly.

Maybe some sort of conversion could be useful:
$v = $_GET['value'];
$v1 = html_entity_decode($v, ENT_QUOTES, 'UTF-8');

You can convert them back with something like
html_entity_decode(__("Some Text"), ENT_QUOTES, "UTF-8")
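The &039 in the question looks like the numeric entity &#039;, which is what htmlspecialchars() produces for a single quote when ENT_QUOTES is set. Since the question doesn't show where the conversion happens, this is an assumption. Here is a minimal sketch of the round trip:

```php
<?php
// Sketch: assumes the &#039; comes from HTML-escaping the string somewhere
// upstream (e.g. htmlspecialchars() with ENT_QUOTES), not from str_replace().
$original = "Hi My Name Is 'George'";

// With ENT_QUOTES, htmlspecialchars() turns ' into the numeric entity &#039;.
$escaped = htmlspecialchars($original, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n"; // Hi My Name Is &#039;George&#039;

// ENT_QUOTES is needed here too: before PHP 8.1 the default flags
// (ENT_COMPAT) left single quotes undecoded.
$decoded = html_entity_decode($escaped, ENT_QUOTES, 'UTF-8');
echo $decoded, "\n"; // Hi My Name Is 'George'
```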
