How to remove Chinese characters from a string in PHP

Is there an easy way to remove Chinese characters from a string? I found this regexp, but it doesn't work as expected:
<?php
$data1='疯狂的管道Test';
$data2='睡眠帮手-背景乐Test';
echo str_replace(preg_replace("/[^\x{4e00}-\x{9fa5}]+/u", '', $data1),'',$data1)
."<br>\n".
str_replace(preg_replace("/[^\x{4e00}-\x{9fa5}]+/u", '', $data2),'',$data2);
exit;
It works for $data1 but not for $data2.

You can use a Unicode character property (Han should work for you):
preg_replace("/\p{Han}+/u", '', $data)
Working example: http://ideone.com/uEiIV5

Try this code (online version at Ideone.com):
<?php
$data1='疯狂的管道Test';
$data2='睡眠帮手-背景乐Test';
echo preg_replace("/[\x{4e00}-\x{9fa5}]+/u", '', $data1), "\n";
echo preg_replace("/[\x{4e00}-\x{9fa5}]+/u", '', $data2);
// Better: use the Unicode script property (credit to Kobi's answer)
echo "\n", preg_replace("/\p{Han}+/u", '', $data1);
I have removed the ^ from the regular expression, so we don't need str_replace() anymore.
Your old regexp matched all non-Chinese characters, so preg_replace() left only the Chinese characters in the returned string. To obtain the final result, you then had to replace those Chinese characters with an empty string.
preg_replace("/[^\x{4e00}-\x{9fa5}]+/u", '', $data1) // returns 疯狂的管道
str_replace('疯狂的管道', '', $data1); // gives us Test
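For $data2, the regexp again matched all non-Chinese characters, including the hyphen in the middle. With the hyphen removed, the remaining Chinese characters no longer form one contiguous sequence of the original string: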
preg_replace("/[^\x{4e00}-\x{9fa5}]+/u", '', $data2) // returns 睡眠帮手背景乐
This string does not occur anywhere in $data2, so str_replace() finds nothing to replace and the approach fails.
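As a compact illustration of the explanation above, here is a minimal sketch. The helper name removeHan() is made up for this example, and it assumes the input is valid UTF-8 (preg_replace() returns null otherwise, in which case the sketch falls back to the original string).

```php
<?php
// Hypothetical helper: strips every run of Han (Chinese) characters.
function removeHan(string $text): string
{
    // \p{Han} matches any character in the Han script; /u enables UTF-8 mode.
    $result = preg_replace('/\p{Han}+/u', '', $text);

    // preg_replace() returns null on error (e.g. invalid UTF-8 input).
    return $result ?? $text;
}

echo removeHan('疯狂的管道Test'), "\n";      // Test
echo removeHan('睡眠帮手-背景乐Test'), "\n"; // -Test
```

The hyphen in the second string survives because it is not a Han character; only the matched characters are removed, wherever they occur.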

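This one should also do the job. Note that PCRE (and therefore PHP) does not support the \uXXXX escape, so use \x{...} with the u modifier, and leave out the ^ so that the class matches the Chinese characters themselves:
/[\x{4E00}-\x{9FFF}]+/u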

Related

Create a function to find a specific word in the title

I have the following title formation on my website:
It's no use going back to yesterday, because at that time I was... Lewis Carroll
The format is always: The phrase… (author).
I want to delete everything after the ellipsis (…), leaving only the sentence as the title. I thought of creating a function in PHP that would split the title into parts, put them in an array, and then work through each part to find the only pattern I have in the title, which is the ellipsis, and delete everything after it. But when I do that, one element of my array contains the following:
was...
At position 8 of the array, the word and the ellipsis come together, and I don't know how to find a pattern to delete the author from the title; my only pattern is the ellipsis. Any ideas?
<?php
$a = get_the_title(155571);
$search = '... ';
if(preg_match("/{$search}/i", $a)) {
echo 'true';
}
?>
I tried with the code above and found the ellipsis, but I needed to bring it into an array to delete the part I need. I tried something like this:
<?php
define('WP_USE_THEMES', false);
require('./wp-blog-header.php');
global $wpdb;
$title_array = explode(' ', get_the_title(155571));
$search = '... ';
if (array_key_exists("/{$search}/i",$title_array)) {
echo "true";
}
?>
I started doing it this way, but it doesn't work, any ideas?
Thanks,
If you use a regex, you need to escape the search string, as preg_quote() would do, because the dot is a metacharacter in a pattern.
But in your simple case, I would not use a regex and would just search for the three dots from the end of the string.
Note: if the ellipsis is added by the browser rather than being part of the stored title, there is no way to detect it in PHP.
$title = 'The phrase... (author).';
echo getPlainTitle($title);
function getPlainTitle(string $title) {
$rpos = strrpos($title, '...');
return ($rpos === false) ? $title : substr($title, 0, $rpos);
}
will output
The phrase
First of all, since you're working with regular expressions, you need to remember that . has a special meaning there: it means "any character". So /... / just means "any three characters followed by a space", which isn't what you want. To match a literal . you need to escape it as \.
Secondly, rather than searching or splitting, you could achieve what you want by replacing part of the string. For instance, you could find everything after the ellipsis, and replace it with an empty string. To do that you want a pattern of "dot dot dot followed by anything", where "anything" is spelled .*, so \.\.\..*
$title = preg_replace('/\.\.\..*/', '', $title);
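The titles in the question appear with both three typed dots (...) and the single typographic ellipsis character (…). The following is a minimal sketch that handles both, assuming the title is valid UTF-8. The function name stripAuthor() is invented for this example, and unlike the strrpos() answer it cuts at the first ellipsis rather than the last.

```php
<?php
// Hypothetical helper: cuts the title at the first ellipsis, typed or typographic.
function stripAuthor(string $title): string
{
    // \.{3} is three literal dots; \x{2026} is the single "…" character.
    // The s modifier lets .* run across line breaks; u enables UTF-8 mode.
    $result = preg_replace('/\s*(?:\.{3}|\x{2026}).*/us', '', $title);

    // preg_replace() returns null on error (e.g. invalid UTF-8 input).
    return $result ?? $title;
}

echo stripAuthor("It's no use going back to yesterday, because at that time I was... Lewis Carroll"), "\n";
echo stripAuthor('The phrase… (author).'), "\n"; // The phrase
```

A title without any ellipsis is returned unchanged, because the pattern simply finds nothing to replace.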

mb_strtolower not working

I have the following string:
$var = "RUA TANGARA";
And I'm doing:
echo mb_strtolower(preg_replace('/[^~\'"]/', null, iconv('UTF-8', 'ASCII//TRANSLIT', $var)), 'UTF-8');
But this still returns "RUA TANGARA".
I use the preg_replace() because $var can be "RÜÁ TÃNAGARA".
Can someone help me?
The issue is that you want to replace certain characters after the transliteration but you specified ^ (which has special meaning and means NOT) at the beginning of the character class [].
So you are replacing characters that are NOT ~'" (which happens to be all of them in your example), so it results in an empty string. To fix, just escape the ^, move it away from the beginning or remove it if not needed and it should be fine:
/[\^~\'"]/
Or:
/[~^\'"]/
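To make the effect of the leading ^ concrete, here is a small sketch. The sample string is an assumption: it imitates what some iconv implementations produce for "RÜÁ TÃNAGARA" with ASCII//TRANSLIT (the real output depends on the platform and locale), so it is hard-coded here instead of calling iconv(). It also assumes the mbstring extension is loaded.

```php
<?php
// Assumed sample: one possible transliteration of "RÜÁ TÃNAGARA".
$translit = 'R"U\'A T~ANAGARA';

// Negated class: removes everything EXCEPT ~ ' and "
$negated = preg_replace('/[^~\'"]/', '', $translit);

// Escaped caret: removes only the literal characters ^ ~ ' and "
$cleaned = preg_replace('/[\^~\'"]/', '', $translit);

var_dump($negated);                         // only the three marks remain
var_dump(mb_strtolower($cleaned, 'UTF-8')); // string(12) "rua tanagara"
```

With the negated class, all letters are thrown away and only the accent marks are kept, which is the opposite of what the question needs.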
PHP's functions work very well, take a look at this simple demonstration:
<?php
$data = "RUA TANGARA";
$result = mb_strtolower($data);
var_dump($result);
The obvious output is:
string(11) "rua tangara"
The same works with non-ASCII characters:
<?php
$data = 'RÜÁ TÃNAGARA';
$result = mb_strtolower($data);
var_dump($result);
The output of that is:
string(15) "rüá tãnagara"
Try
$newStr = strtolower($var);
echo $newStr;

how to start string from specific character and remove unwanted characters

I have the URL of a file, which looks like this:
movieImages/1`updateCategory.PNG
It should look like this:
updateCategory.PNG
You can simply use str_replace() like this:
$string = 'movieImages/1`updateCategory.PNG';
$ser = 'movieImages/1`';
$trimmed = str_replace($ser, '', $string);
echo $trimmed;
The output will be updateCategory.PNG
Find the position of unwanted character and then pick up the substring after that position.
$str="movieImages/1`updateCategory.PNG";
$unwanted="`";
echo substr($str,strpos($str,$unwanted)+1);
Output
updateCategory.PNG
This approach works when the string can vary in structure and size. If the first part always stays the same, you can simply remove it with str_replace:
echo str_replace('movieImages/1`','',$str);
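One caveat with the strpos() approach: if the unwanted character is missing, strpos() returns false, false + 1 evaluates to 1, and the first character of the string is silently dropped. A minimal sketch that guards against this follows; the function name afterChar() is made up for illustration.

```php
<?php
// Hypothetical helper: returns the part of $str after the first $marker,
// or the whole string when $marker does not occur.
function afterChar(string $str, string $marker): string
{
    $pos = strpos($str, $marker);

    // Compare with === false: a match at position 0 is also "falsy".
    if ($pos === false) {
        return $str;
    }

    return substr($str, $pos + strlen($marker));
}

echo afterChar('movieImages/1`updateCategory.PNG', '`'), "\n"; // updateCategory.PNG
echo afterChar('updateCategory.PNG', '`'), "\n";               // updateCategory.PNG
```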

PHP Strip String, Convert to int

I have a string $special which is formatted like £130.00 and is a price excluding tax (VAT).
I need to strip the first character so I can run some simple arithmetic.
$str= substr($special, 1, 0); // Strip first char '£'
echo $str ; // Echo Value to check its worked
$endPrice = (0.20*$str)+$str ; // Work out VAT
I don't receive any value when I echo on the second line. Also, would I then need to convert the string to an integer in order to run the addition?
Thanks
Matt
+++ UPDATE
Thanks for your help with this. I took your code and added some of my own. There are more than likely nicer ways to do this, but it works :) I found out that a price below 1000 looks like £130.00, while a larger price includes a thousands separator, e.g. £1,400.22.
$str = str_replace('£', '', $price);
$str2 = str_replace(',', '', $str);
$vatprice = (0.2 * $str2) + $str2;
$display_vat_price = sprintf('%0.2f', $vatprice);
echo "£";
echo $display_vat_price ;
echo " (Inc VAT)";
Thanks again, Matt
You cannot use substr() the way you are using it currently. You are trying to remove the £ character, which takes two bytes in UTF-8, and substr() counts bytes, not characters. You can either use $str = substr($string, 2) or, better, str_replace() like this:
$string = '£130.00';
$str = str_replace('£', '', $string);
echo (0.2 * $str) + $str; // 156
Original answer
I'll keep this version as it can still give some insight. The answer would be fine if £ weren't a two-byte character in UTF-8. Knowing this, you can still use it, but you need to start the substring at offset 2 instead of 1.
Your usage of substr is wrong. It should be:
$str = substr($special, 1);
Check the documentation: the third parameter is the length of the substring. You passed 0, therefore you got an empty string. If you omit the third parameter, substr() returns the substring starting at the offset given in the second parameter and running to the end of the original string.
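Putting the pieces of this thread together, here is a minimal sketch that handles both £130.00 and £1,400.22. The function name addVat() and the default rate of 20% (taken from the 0.20 in the question) are assumptions for illustration. It strips every byte that is not a digit or a dot, which removes the two-byte £ and the thousands separator in one step.

```php
<?php
// Hypothetical helper: takes a price string such as "£1,400.22" (ex VAT)
// and returns the VAT-inclusive price formatted with two decimals.
function addVat(string $price, float $rate = 0.20): string
{
    // Keep digits and the decimal point only, then convert to a number.
    $net = (float) preg_replace('/[^0-9.]/', '', $price);

    // number_format() rounds to 2 decimals and adds thousands separators.
    return '£' . number_format($net * (1 + $rate), 2);
}

echo addVat('£130.00'), "\n";   // £156.00
echo addVat('£1,400.22'), "\n"; // £1,680.26
```

The explicit (float) cast answers the conversion question: PHP would convert a numeric string automatically in arithmetic, but the cast makes the intent clear once the non-numeric characters are gone.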

php preg_match_all not specific number

I want to exclude a specific number like 4800 from a string of digits like 569048004801.
I'm using PHP's preg_match_all for this. Some of the patterns I have tried:
/([^4])([^8])([^0])([^0])/i
/([^4800])/i
If you just want to see whether a string contains 4800, you don't need regular expressions:
<?php
$string = '569048004801';
if(strpos($string,'4800') === false){
echo '4800 was not found in the string';
}
else{
echo '4800 was found in the string';
}
More information about strpos is available in the PHP documentation.
If you mean you simply want to remove 4800 from a string, this is easier with a str_replace:
$str = '569048004801';
$str = str_replace('4800', '', $str);
On the other hand, if you mean you want to know if a particular string of digits contains 4800, this will test that for you:
$str = '569048004801';
if (preg_match_all('/4800/', $str) > 0) {
echo 'String contains 4800.';
} else {
echo 'String does not contain 4800.';
}
/([^4])([^8])([^0])([^0])/i
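This actually matches any four consecutive characters in which the first is not '4', the second is not '8', and the third and fourth are not '0'. That rejects more than just "4800" (for example, "4123" is rejected because it starts with 4), so it is close but not what you want.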
/([^4800])/i
This actually says, a single character that is not '4', '8', or '0'.
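Assuming you mean to capture a number that doesn't contain "4800" anywhere in it, I think you might want
/\b(?!\d*4800)\d+\b/
This says: at the start of a run of digits, first check that "4800" does not occur later in that run and, provided this is the case, capture the whole run. The (?!...) part is called a "negative lookahead assertion". The \b word boundaries matter: without them the pattern can still match the tail of a number after the "4800", such as 8004801 in 569048004801.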
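As a quick check of the negative lookahead idea, here is a small sketch using a variant of the pattern anchored with \b word boundaries, so that each run of digits is accepted or rejected as a whole. The input string is invented for this example.

```php
<?php
// Made-up input: four space-separated numbers, two of which contain "4800".
$input = '123 569048004801 99 4800';

// \b anchors the match to the start and end of a digit run;
// (?!\d*4800) rejects the run if "4800" appears anywhere inside it.
preg_match_all('/\b(?!\d*4800)\d+\b/', $input, $matches);

print_r($matches[0]); // contains "123" and "99" only
```

Because \d* inside the lookahead cannot cross the spaces, each number is judged independently of its neighbours.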
