php extract Emoji from a string - php

I have a string that contains emoji.
I want to extract the emoji from that string. I'm using the code below, but it doesn't do what I want.
$string = "😃 hello world 🙃";
preg_match('/([0-9#][\x{20E3}])|[\x{00ae}\x{00a9}\x{203C}\x{2047}\x{2048}\x{2049}\x{3030}\x{303D}\x{2139}\x{2122}\x{3297}\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u', $string, $emojis);
I want this:
$emojis = ["😃", "🙃"];
but it returns this:
$emojis = ["😃"]
Likewise, if:
$string = "😅😇☝🏿"
it returns only the first emoji:
$emoji = ["😅"]

Try the preg_match_all function instead. preg_match stops looking after it finds the first match, which is why you only ever get the first emoji back.
Taken from this answer:
preg_match stops looking after the first match. preg_match_all, on the other hand, continues to look until it finishes processing the entire string. Once match is found, it uses the remainder of the string to try and apply another match.
http://php.net/manual/en/function.preg-match-all.php
So your code would become:
$string = "😃 hello world 🙃";
preg_match_all('/([0-9#][\x{20E3}])|[\x{00ae}\x{00a9}\x{203C}\x{2047}\x{2048}\x{2049}\x{3030}\x{303D}\x{2139}\x{2122}\x{3297}\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u', $string, $emojis);
print_r($emojis[0]); // Array ( [0] => 😃 [1] => 🙃 )
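For the second string in the question, note that the pattern above would return ☝ and its skin-tone modifier 🏿 (U+1F3FF) as two separate matches, because the modifier falls inside the \x{1F000}-\x{1F6FF} range. A minimal sketch of one way to keep them together, assuming a deliberately shortened pattern (only two of the original ranges, plus an optional trailing modifier from U+1F3FB to U+1F3FF); it is an illustration, not a complete emoji matcher:

```php
<?php
$string = "😅😇☝🏿";

// Shortened, illustrative pattern (an assumption, not the full pattern above):
// base emoji, optional variation selector, optional skin-tone modifier.
$pattern = '/[\x{2600}-\x{27BF}\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FE0F}]?[\x{1F3FB}-\x{1F3FF}]?/u';

preg_match_all($pattern, $string, $emojis);

// $emojis[0] holds the full matches, one entry per emoji
print_r($emojis[0]);
```

The same optional modifier class could be appended to each alternative of the longer pattern if that behaviour is wanted there.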

Related

How can i replace all specific strings in a long text which have dynamic number in between?

I am trying to replace substrings that contain a specific string with a dynamic number in it.
I tried preg_match_all, but it gives me a NULL value.
Here is what I am actually looking for, in detail:
My long text contains values of the form [_wc_acof_(some dynamic number)], e.g. [_wc_acof_6], and I want to convert them to $postmeta['_wc_acof_14'][0]
There can be several of them in the same long text.
I want to follow this logic:
1- First I get all the numbers after [_wc_acof_ and save them in an array using preg_match_all, as guided here: get number after string php regex
2- Then I run a foreach loop and build my arrays of patterns and replacements with each number, i.e.:
foreach ($allMatchNumbers as $MatchNumber){
$key = "[_wc_acof_" . $MatchNumber. "]";
$patterns[] = $key;
$replacements[] = $postmeta[$key][0];
}
3- Then I do the replacement with echo preg_replace($patterns, $replacements, $string);
But I cannot get preg_match_all to work; it gives me NULL with the attempt below:
preg_match_all('/[_wc_acof_/',$string,$allMatchNumbers );
Please help. I am not sure whether preg_grep would be better for this?
It seems you want to process the input in stages, to obtain all the numbers in specific lexical context first, and then modify the user input using some lookup technique.
The first step can be implemented as
preg_match_all('~\[_wc_acof_(\d+)]~', $text, $matches)
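which extracts every sequence of one or more digits between [_wc_acof_ and ] into Group 1 (you can access the values via $matches[1]). Note that the opening [ is escaped; an unescaped [ starts a character class, which is why the pattern in the question fails to compile.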
Then, you may fill the $replacements array using these values.
Next, you can use
preg_replace_callback('~\[_wc_acof_(\d+)]~', function($m) use ($replacements){
return $replacements[$m[1]];
}, $text)
See the PHP demo:
<?php
$text = '<p>[_wc_acof_6] i want to convert this and it contains also this [_wc_acof_9] or can be this [_wc_acof_11] number can never be static</p>';
$replacements = [];
if (preg_match_all('~\[_wc_acof_(\d+)]~', $text, $matches)) {
    // Build a lookup table keyed by each captured number
    foreach ($matches[1] as $matched) {
        $replacements[$matched] = 'NEW_VALUE_FOR_' . $matched . '_KEY';
    }
    print_r($replacements);
    // Swap each placeholder for its looked-up value
    echo preg_replace_callback('~\[_wc_acof_(\d+)]~', function ($m) use ($replacements) {
        return $replacements[$m[1]];
    }, $text);
}
Output:
Array
(
    [6] => NEW_VALUE_FOR_6_KEY
    [9] => NEW_VALUE_FOR_9_KEY
    [11] => NEW_VALUE_FOR_11_KEY
)
<p>NEW_VALUE_FOR_6_KEY i want to convert this and it contains also this NEW_VALUE_FOR_9_KEY or can be this NEW_VALUE_FOR_11_KEY number can never be static</p>
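If the replacement values already live in an array such as $postmeta, the first preg_match_all pass may not be needed at all. A minimal sketch, assuming a hypothetical $postmeta array shaped like the one in the question (key => array of values) and made-up sample values; placeholders without an entry are left untouched:

```php
<?php
// Hypothetical sample data standing in for the real $postmeta
$postmeta = [
    '_wc_acof_6' => ['first value'],
    '_wc_acof_9' => ['second value'],
];

$text = 'A [_wc_acof_6] B [_wc_acof_9] C [_wc_acof_11]';

// Capture the whole key (without the brackets) and look it up directly
$result = preg_replace_callback('~\[(_wc_acof_\d+)]~', function (array $m) use ($postmeta) {
    // Fall back to the original placeholder when no value is stored
    return $postmeta[$m[1]][0] ?? $m[0];
}, $text);

echo $result;
```

Doing the lookup inside the callback also avoids building pattern arrays for preg_replace, where each key would have to be escaped and delimited by hand.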

Getting substring from a matching string in PHP

I want to extract a substring that has a certain format from a string.
E.g.:
This is my test ABC-MMS-0001
Another test for ABC-MMS-00023
I need a way to get just the substring that is in the format ABC-MMS-<anynumber>
The examples above should give me:
ABC-MMS-0001
ABC-MMS-00023
Try using preg_match with the pattern \b\w+-\w+-\d+\b:
$input = "This is my test ABC-MMS-0001";
$matches = array();
preg_match("/\b\w+-\w+-\d+\b/", $input, $matches);
echo $matches[0];
This outputs:
ABC-MMS-0001
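When the input holds several such codes, preg_match_all collects all of them in one call. A small sketch, assuming the prefix is always the literal ABC-MMS- (the \w+-\w+- pattern above is looser and would also accept other prefixes):

```php
<?php
$input = "This is my test ABC-MMS-0001\nAnother test for ABC-MMS-00023";

// Literal prefix followed by one or more digits, bounded on both sides
preg_match_all('/\bABC-MMS-\d+\b/', $input, $matches);

// $matches[0] is the list of full matches
print_r($matches[0]);
```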

Matching a substring (an apostrophe) in a given word using regex

I have a server application which looks up where the stress is in Russian words. The end user writes a word жажда. The server downloads a page from another server which contains the stresses indicated with apostrophes for each case/declension like this жа'жда. I need to find that word in the downloaded page.
In Russian the stress is always written after a vowel. I've been using so far a regex that is a grouping of all possible combinations (жа'жда|жажда'). Is there a more elegant solution using just a regex pattern instead of making a PHP script which creates all these combinations?
EDIT:
I have a word жажда
The downloaded page contains the string жа'жда. (Notice the apostrophe; I do not know beforehand where the apostrophe in the word is.)
I want to match the word with the apostrophe (жа'жда).
P.S.: So far I have a PHP script creating the string (жа'жда|жажда') used in the regex (the apostrophe is only after vowels) which matches it. My goal is to get rid of this script and use just a regex, if that's possible.
If I understand your question,
have these options (d'isorder|di'sorder|dis'order|diso'rder|disor'der|disord'er|disorde'r|disorder') and one of these is in the downloaded page and I need to find out which one it is
this may suit your needs:
<pre>
<?php
$s = "d'isorder|di'sorder|dis'order|diso'rder|disor'der|disord'er|disorde'r|disorder'|disorde'";
$s = explode("|",$s);
print_r($s);
$matches = preg_grep("#[aeiou]'#", $s);
print_r($matches);
running example: https://eval.in/207282
Uhm... Is this ok with you?
<?php
function find_stresses($word, $haystack) {
$pattern = preg_replace('/[aeiou]/', '\0\'?', $word);
$pattern = "/\b$pattern\b/";
// word = 'disorder', pattern = "/\bdi'?so'?rde'?r\b/"
preg_match_all($pattern, $haystack, $matches);
return $matches[0];
}
$hay = "something diso'rder somethingelse";
find_stresses('disorder', $hay);
// => array(diso'rder)
You didn't specify whether there can be more than one match. If there can't, you could use preg_match instead of preg_match_all (faster). More than one is possible in some languages: in Italian, for example, we have àncora and ancòra :P
Obviously, if you use preg_match, the result in $matches[0] would be a string instead of an array.
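For the Russian words in the question, the same idea needs Cyrillic vowels and the u modifier. A hedged sketch: the function name find_stressed_forms, the vowel list and the sample text are assumptions for illustration. It uses letter/apostrophe lookarounds in place of \b, because \b does not hold between a trailing apostrophe and a following space, so a form stressed on its last vowel would otherwise lose its apostrophe:

```php
<?php
// Hypothetical helper: returns every apostrophe-marked form of $word in $haystack
function find_stressed_forms(string $word, string $haystack): array
{
    // Assumed list of Russian vowels (lowercase; the i flag covers uppercase)
    $vowels = 'аеёиоуыэюя';

    // Allow an optional apostrophe after each vowel, e.g. жа'?жда'?
    $body = preg_replace('/[' . $vowels . ']/iu', "$0'?", preg_quote($word, '/'));

    // No letter or apostrophe may touch the match on either side
    $pattern = '/(?<![\pL\'])' . $body . '(?![\pL\'])/iu';
    preg_match_all($pattern, $haystack, $matches);

    // Keep only the forms that actually carry a stress mark
    return array_values(preg_grep("/'/", $matches[0]));
}

$page = "Падежи: жа'жда, жа'жды, жа'жде";
print_r(find_stressed_forms('жажда', $page));
```

The source file and the downloaded page both need to be UTF-8 for the u modifier to work as intended.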
Based on your code, and on the requirements that no function is called and that the unstressed disorder is excluded, I think this is what you want. I have added a test vector.
<pre>
<?php
// test code
$downloadedPage = "
there is some disorde'r
there is some disord'er in the example
there is some di'sorder in the example
there also' is some order in the example
there is some disorder in the example
there is some dso'rder in the example
";
$word = 'disorder';
preg_match_all("#".preg_replace("#[aeiou]#", "$0'?", $word)."#iu"
, $downloadedPage
, $result
);
print_r($result);
$result = preg_grep("#'#"
, $result[0]
);
print_r($result);
// the code you need
$word = 'also';
preg_match("#".preg_replace("#[aeiou]#", "$0'?", $word)."#iu"
, $downloadedPage
, $result
);
print_r($result);
$result = preg_grep("#'#"
, $result
);
print_r($result);
Working demo: https://eval.in/207312

preg match to get text after # symbol and before next space using php

I need help finding the strings in a text that start with # and run until the next space, using preg_match in PHP.
Ex : I want to get #string from this line as separate.
In this example, I need to extract "#string" alone from this line.
Could anybody help me find a solution for this?
Thanks in advance!
PHP and Python are not the same in regard to searches. If you've already used a function like strip_tags on your capture, then something like this might work better than the Python example provided in one of the other answers since we can also use look-around assertions.
<?php
$string = <<<EOT
I want to get #string from this line as separate.
In this example, I need to extract "#string" alone from this line.
#maybe the username is at the front.
Or it could be at the end #whynot, right!
dog#cat.com would be an e-mail address and should not match.
EOT;
echo $string."<br>";
preg_match_all('~(?<=[\s])#[^\s.,!?]+~',$string,$matches);
print_r($matches);
?>
Output results
Array
(
[0] => Array
(
[0] => #string
[1] => #maybe
[2] => #whynot
)
)
Update
If you're pulling straight from the HTML stream itself, note that the Twitter HTML formats a username like this:
<s>#</s><b>UserName</b>
So to match a username from the html stream you would match with the following:
<?php
$string = <<<EOT
<s>#</s><b>Nancy</b> what are you on about?
I want to get <s>#</s><b>string</b> from this line as separate. In this example, I need to extract "#string" alone from this line.
<s>#</s><b>maybe</b> the username is at the front.
Or it could be at the end <s>#</s><b>WhyNot</b>, right!
dog#cat.com would be an e-mail address and should not match.
EOT;
$matchpattern = '~(<s>(#)</s><b\>([^<]+)</b>)~';
preg_match_all($matchpattern,$string,$matches);
$users = array();
foreach ($matches[0] as $username){
$cleanUsername = strip_tags($username);
$users[]=$cleanUsername;
}
print_r($users);
Output
Array
(
[0] => #Nancy
[1] => #string
[2] => #maybe
[3] => #WhyNot
)
Just do simply:
preg_match('/#\S+/', $string, $matches);
The result is in $matches[0]
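Note that /#\S+/ also picks up trailing punctuation and the part after # in something like dog#cat.com, and that preg_match returns only the first hit. A short comparison sketch, assuming a made-up sample sentence and assuming that tags consist of word characters only:

```php
<?php
$string = "I want #string here. End #whynot, right! dog#cat.com no.";

// Loose: everything after # up to the next whitespace
preg_match_all('/#\S+/', $string, $loose);

// Stricter: # must not follow a non-space character, and the tag is word characters only
preg_match_all('/(?<!\S)#\w+/', $string, $strict);

print_r($loose[0]);
print_r($strict[0]);
```

Which of the two fits depends on whether punctuation can be part of a tag in the real data.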

Get data out of string

I am going to parse a log file and I wonder how I can convert such a string:
[5189192e][game]: kill killer='0:Tee' victim='1:nameless tee' weapon=5 special=0
into some kind of array:
$log['5189192e']['game']['killer'] = '0:Tee';
$log['5189192e']['game']['victim'] = '1:nameless tee';
$log['5189192e']['game']['weapon'] = '5';
$log['5189192e']['game']['special'] = '0';
The best way is to use preg_match_all() with regular expressions.
For example, to get 5189192e you can use the expression
/[0-9]{7}e/
This matches seven digits followed by the letter e. To accept any trailing letters instead, change it to
/[0-9]{7}[a-z]+/
which is almost the same but allows one or more lowercase letters at the end.
A more advanced example, with subpatterns that capture all the details:
<?php
$matches = array();
preg_match_all('/\[[0-9]{7}e\]\[game\]: kill killer=\'([0-9]+):([a-zA-Z]+)\' victim=\'([0-9]+):([a-zA-Z ]+)\' weapon=([0-9]+) special=([0-9]+)/', $str, $matches);
print_r($matches);
?>
$str is the string to be parsed.
$matches contains all the data you need, such as the killer id, name, weapon, etc.
Using the function preg_match_all() and a regex you will be able to generate an array, which you then just have to organize into your multi-dimensional array:
here's the code:
$log_string = "[5189192e][game]: kill killer='0:Tee' victim='1:nameless tee' weapon=5 special=0";
preg_match_all("/^\[([0-9a-z]*)\]\[([a-z]*)\]: kill (.*)='(.*)' (.*)='(.*)' (.*)=([0-9]*) (.*)=([0-9]*)$/", $log_string, $result);
$log[$result[1][0]][$result[2][0]][$result[3][0]] = $result[4][0];
$log[$result[1][0]][$result[2][0]][$result[5][0]] = $result[6][0];
$log[$result[1][0]][$result[2][0]][$result[7][0]] = $result[8][0];
$log[$result[1][0]][$result[2][0]][$result[9][0]] = $result[10][0];
// $log is your formatted array
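A variation that does not hard-code the number of key=value pairs, offered as a sketch under assumptions: the id is taken to be lowercase hexadecimal, values are either single-quoted or contain no spaces, and the variable names are illustrative only:

```php
<?php
$line = "[5189192e][game]: kill killer='0:Tee' victim='1:nameless tee' weapon=5 special=0";
$log = [];

// Split the line into id, category, event name and the rest
if (preg_match('/^\[([0-9a-f]+)\]\[(\w+)\]: (\w+) (.*)$/', $line, $head)) {
    [, $id, $category, $event, $rest] = $head;

    // Each pair is key='quoted value' (group 2) or key=bare (group 3)
    preg_match_all("/(\w+)=(?:'([^']*)'|(\S+))/", $rest, $pairs, PREG_SET_ORDER);

    foreach ($pairs as $pair) {
        // Group 2 is empty for bare values, so fall back to group 3
        $log[$id][$category][$pair[1]] = $pair[2] !== '' ? $pair[2] : ($pair[3] ?? '');
    }
}

print_r($log);
```

All values come back as strings, which matches the array layout asked for in the question.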
You definitely need a regex. Here is the pertaining PHP function and here is a regex syntax reference.
