How to check single byte katakana in a string - php

I am working on a Japanese website that uses double-byte characters, and I need to check whether the user has entered single-byte (half-width) katakana. The site is developed in PHP.
This is the preg_match pattern that I used for checking:
'/[\x{3040}-\x{309F}]/u'

I'm not 100% sure that the test string I use is a legal $string; I'll remove the answer (or try to update it) if it works out differently. The string here is typed in manually (with the backslashes escaped) instead of being raw bytes:
$string = "\\xe3\\x80\\x85"; // RAW input might still be '\xe3\x80\x85' here
$result = preg_match_all("/\\\\xe3\\\\x8[0-3]\\\\x[8-9a-b][0-9a-f]/u", $string, $matches);
echo $string;
echo '<pre>';
print_r($matches);
echo '</pre>';
This prints out:
\xe3\x80\x85
Array
(
[0] => Array
(
[0] => \xe3\x80\x85
)
)
Thus: 々
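For the half-width check itself, here is a minimal sketch. It assumes the input is UTF-8 and that "single byte katakana" means the half-width forms, which are single bytes in Shift_JIS but sit at U+FF66 to U+FF9F in Unicode. The range U+3040 to U+309F used in the question is the Hiragana block, so it would not match them. The function name containsHalfwidthKatakana is made up for this example.

```php
<?php
// Hypothetical helper (not from the original post): reports whether $text
// contains at least one half-width katakana character.
function containsHalfwidthKatakana(string $text): bool
{
    // U+FF66..U+FF9F: half-width katakana letters, the half-width prolonged
    // sound mark, and the two half-width voicing marks.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/[\x{FF66}-\x{FF9F}]/u', $text) === 1;
}

// Half-width "ｶﾀｶﾅ", written with code point escapes (PHP 7+).
var_dump(containsHalfwidthKatakana("\u{FF76}\u{FF80}\u{FF76}\u{FF85}")); // bool(true)
// Full-width "カタカナ".
var_dump(containsHalfwidthKatakana("\u{30AB}\u{30BF}\u{30AB}\u{30CA}")); // bool(false)
```

If half-width punctuation such as U+FF61 to U+FF65 should count as well, the range can be widened to start at \x{FF61}.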

Related

Trim Text From String PHP

I want to remove unnecessary text from my string variable. I have a variable, $from, which contains a value like 123456#blog.com, and I want only 123456 from it. I have looked at some examples of trim but did not get a clear idea of how to do it. Let me know if someone can help. Thanks.
You can split the string on the # symbol like so:
$str = "123456#blog.com";
// here you split your string into pieces before and after #
$pieces = explode("#",$str);
// here you echo your first piece
echo $pieces[0];
You can use explode() to split the string.
Syntax:
explode('separator', string);
$result= explode("#", '123456#blog.com');
print_r($result);
Output:
Array
(
[0] => 123456
[1] => blog.com
)
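As an alternative to explode(), and only as a sketch, strstr() with its third argument set to true returns the part of the string before the first occurrence of the needle. The helper name textBeforeHash is invented for this example, and falling back to the whole string when there is no # is an assumption about what the asker would want.

```php
<?php
// Hypothetical helper: returns everything before the first '#'.
function textBeforeHash(string $from): string
{
    // strstr(..., true) returns the portion before the needle,
    // or false when the needle does not occur at all.
    $before = strstr($from, '#', true);

    // Assumption: with no '#' present, keep the input unchanged.
    return $before === false ? $from : $before;
}

echo textBeforeHash('123456#blog.com'), "\n"; // 123456
```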

php extract Emoji from a string

I have a string that contains emoji.
I want to extract the emoji from that string. I'm using the code below, but it doesn't do what I want.
$string = "😃 hello world 🙃";
preg_match('/([0-9#][\x{20E3}])|[\x{00ae}\x{00a9}\x{203C}\x{2047}\x{2048}\x{2049}\x{3030}\x{303D}\x{2139}\x{2122}\x{3297}\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u', $string, $emojis);
I want this:
$emojis = ["😃", "🙃"];
but it returns this:
$emojis = ["😃"];
And also, if:
$string = "😅😇☝🏿";
it returns only the first emoji:
$emojis = ["😅"];
Try looking at preg_match_all function. preg_match stops looking after it finds the first match, which is why you're only ever getting the first emoji back.
Taken from this answer:
preg_match stops looking after the first match. preg_match_all, on the other hand, continues to look until it finishes processing the entire string. Once match is found, it uses the remainder of the string to try and apply another match.
http://php.net/manual/en/function.preg-match-all.php
So your code would become:
$string = "😃 hello world 🙃";
preg_match_all('/([0-9#][\x{20E3}])|[\x{00ae}\x{00a9}\x{203C}\x{2047}\x{2048}\x{2049}\x{3030}\x{303D}\x{2139}\x{2122}\x{3297}\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u', $string, $emojis);
print_r($emojis[0]); // Array ( [0] => 😃 [1] => 🙃 )

PHP preg_match_all failing

I'm trying to extract IDs from a possibly huge text. What did I miss?
preg_match_all('/(ID\s\d+)/', "ID 20380843, ID ​20675712", $matches);
print_r( $matches[0] );
It only returns:
Array
(
[0] => ID 20380843
)
Instead of:
Array
(
[0] => ID 20380843
[1] => ID 20675712
)
Did you copy that string from your code? Because there is something sneaky happening.
When I copied the code to my editor, it gave me this for the string:
"ID 20380843, ID ?20675712"
As you can see, there is a question mark in the second ID, which makes your expression fail :)
Your problem isn't preg_match_all, it's your source file. There's an invisible Unicode character in the second ID. If you copy/paste it into this Unicode Converter, you'll see U+200B show up in various forms in the lower boxes:
Unicode U+hex notation
preg_match_all('/(ID\s\d+)/', "ID 20380843, ID U+200B^20675712", $matches);
(emphasis mine)
This is the Unicode Zero-Width Space, which is apparently not included in \s as PHP's PCRE defines it.
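Two possible ways to cope with the stray character, shown as a sketch. The sample text is modelled on the question, with the zero-width space written explicitly as "\u{200B}" (PHP 7+ syntax) so that it is visible in the source.

```php
<?php
// Sample input modelled on the question; "\u{200B}" is ZERO WIDTH SPACE.
$text = "ID 20380843, ID \u{200B}20675712";

// Option 1: strip zero-width spaces first, then use the original pattern.
$clean = preg_replace('/\x{200B}/u', '', $text);
preg_match_all('/(ID\s\d+)/', $clean, $matches);
print_r($matches[0]); // both "ID 20380843" and "ID 20675712"

// Option 2: leave the text alone and tolerate the character in the pattern.
// Group 1 then holds just the digits.
preg_match_all('/ID\s\x{200B}*(\d+)/u', $text, $ids);
print_r($ids[1]); // "20380843" and "20675712"
```

Option 1 keeps the pattern simple and suits input that will be reused elsewhere; option 2 avoids modifying the text.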
Use print_r($matches) instead of print_r($matches[0]).
Try:
preg_match_all('/(ID\s\d+)/', "ID 20380843, ID ​20675712", $matches);
print_r( $matches );

Is there a way to match recursively/nested with regex? (PHP, preg_match_all)

How can I match both (http://[^"]+)'s?:
(I know it's an illegal URL, but same idea)
I want the regex to give me these two matches:
1 http://yoursite.com/goto/http://aredirectURL.com/extraqueries
2 http://aredirectURL.com/extraqueries
Without running multiple preg_match_all's
Really stumped, thanks for any light you can shed.
This regular expression will get you the output you want: ((?:http://[^"]+)(http://[^"]+)). Note the usage of the non-capturing group (?:regex). To read more about non-capturing groups, see Regular Expression Advanced Syntax Reference.
<?php
preg_match_all(
'((?:http://[^"]+)(http://[^"]+))',
'',
$out);
echo "<pre>";
print_r($out);
echo "</pre>";
?>
The above code outputs the following:
Array
(
[0] => Array
(
[0] => http://yoursite.com/goto/http://aredirectURL.com/extraqueries
)
[1] => Array
(
[0] => http://aredirectURL.com/extraqueries
)
)
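The subject string did not survive in the snippet above, so here is a runnable sketch with a stand-in. The anchor tag in $html is an assumption reconstructed from the output shown, not the original input. The pattern is the same one, written with ~ as the delimiter so that the outer parentheses are not needed as delimiters.

```php
<?php
// Assumed input: the original markup was lost, so this tag is hypothetical.
$html = '<a href="http://yoursite.com/goto/http://aredirectURL.com/extraqueries">link</a>';

// The first group is non-capturing and greedy; it backtracks until the
// second "http://" can match, which ends up in capture group 1.
preg_match_all('~(?:http://[^"]+)(http://[^"]+)~', $html, $out);

echo $out[0][0], "\n"; // http://yoursite.com/goto/http://aredirectURL.com/extraqueries
echo $out[1][0], "\n"; // http://aredirectURL.com/extraqueries
```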
You can split the string with this function:
http://de.php.net/preg_split
Each element of the resulting array can then contain, for example, one of the URLs.
If there is more content, you could call preg_split from a callback while the full text is being processed.
$str = '';
preg_match("/\"(http:\/\/.*?)(http:\/\/.*?)\"/i", $str, $match);
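// $match[1] is the outer prefix and $match[2] the nested URL;
// $match[0] would be the whole match including the quotes.
echo "{$match[1]}{$match[2]}\n";
echo "{$match[2]}\n";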

PHP regex issue: cannot find $C

I'm trying to parse dollar amounts from a text in mixed French (Canadian) and English. The text is in UTF-8, and it uses $C to denote the currency. For some reason, when I use preg_match, neither the '$' nor the 'C' can be found. Everything else works fine. Any ideas?
E.g., using
preg_match_all('/\$C/u', $match)
on "Thanks for a payment of 46,00 $C" returns empty.
I think the regex can't find those characters because they aren't there. If you initialize the string like this:
$source = "Thanks for a payment of 46,00 $C";
...(i.e., as a double-quoted string literal), $C gets interpreted as a variable name. Since you never initialized that variable, it gets replaced with nothing in the actual string. You should either use single-quotes to initialize the string, or escape the dollar sign with a backslash like you did in the regex.
By the way, this couldn't be an encoding problem, because (in the example, at least), all the characters are from the ASCII character set. Whether it was encoded as UTF-8, ISO-8859-1 or ASCII, the binary representation of the string would be identical.
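A short sketch of the two safe ways to write the literal. The variable names are chosen for illustration only.

```php
<?php
// Single quotes: no variable interpolation, so "$C" stays in the string.
$single = 'Thanks for a payment of 46,00 $C';

// Double quotes with the dollar sign escaped produce the same bytes.
$escaped = "Thanks for a payment of 46,00 \$C";

var_dump($single === $escaped); // bool(true)

// With the characters really present, the pattern from the question matches.
$count = preg_match_all('/\$C/u', $single, $matches);
echo $count, "\n"; // 1
```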
preg_match_all('/\$C/u', 'Thanks for a payment of 46,00 $C', $matches);
print_r($matches);
works fine for me:
Array
(
[0] => Array
(
[0] => $C
)
)
Maybe this helps:
// assuming $text is the input string
$matches = array();
preg_match_all('/([0-9,\\.]+)\\s*\\$C/u', $text, $matches);
if (!empty($matches[1])) { // $matches itself is never empty after preg_match_all
$price = floatval(str_replace(',', '.', $matches[1][0]));
printf("%.2f\n", $price);
} else {
printf("No price found\n");
}
Just make sure the input string ($text) is valid UTF-8, because the /u modifier treats both the pattern and the subject as UTF-8. (Do not run it through utf8_decode first: that function converts UTF-8 to ISO-8859-1 and is deprecated as of PHP 8.2.)
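Since the /u modifier makes PCRE treat the subject as UTF-8, one way to check the input up front is to run an empty /u pattern against it. This is a sketch; the helper name isValidUtf8 is made up here, and mb_check_encoding() would be another option where the mbstring extension is available.

```php
<?php
// Hypothetical helper: true when $text is valid UTF-8.
function isValidUtf8(string $text): bool
{
    // An empty pattern with /u matches any valid UTF-8 subject;
    // preg_match() returns false when the subject is not valid UTF-8.
    return preg_match('//u', $text) === 1;
}

var_dump(isValidUtf8('46,00 $C'));  // bool(true)
var_dump(isValidUtf8("\xE9t\xE9")); // ISO-8859-1 bytes for "été": bool(false)
```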
