I have text in a PHP variable, $sentence. I want to display the actual emoji codes found in $sentence on my web page: not the graphical emoji icons, but the code points of the emojis embedded in the sentence.
I have been trying all day to do this by reverse engineering code that is usually used to "remove emojis". No success.
Looking to display the actual codes to emojis embedded in text. Example ...
twitter.com/ProfitTradeRoom/status/880439582063566848
As you can see, the fire emoji is in this tweet. The code for fire is 1F525. I want to see 1F525 on my screen when I echo out the PHP variable. Basically "1F525 HOT 1F525 stocks today" on my screen instead of the graphical emojis.
String taken straight from my MySQL database (copied and pasted) ...
🔥HOT🔥stocks today
I found the answer! The issue was my mysql connection. I was pulling from mysql into php. None of my regular expressions were matching any emoji unicodes until I did this ...
mysql_set_charset('utf8mb4');
I put mysql_set_charset immediately after I defined the connection. Now when I pull text containing emojis, preg_match and preg_replace patterns match them. For example, detecting the fire emoji in a sentence ...
$theFireEmojiCode = '/[\x{1F525}]/u';
preg_match($theFireEmojiCode, $sentence, $matches);
Or I can do a preg_replace like this ...
$theFireEmojiCode = '/[\x{1F525}]/u';
// preg_replace returns the new string, so assign the result
$sentence = preg_replace($theFireEmojiCode, "1F525", $sentence);
Hope this helps.
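To print the code of every emoji rather than only the fire emoji, something like the following might work. It is only a sketch: emojiToCodes is a hypothetical helper name, the range U+1F000 to U+1FAFF is an assumption that misses older symbols (for example U+2600 to U+27BF), and mb_ord needs PHP 7.2+ with the mbstring extension.

```php
<?php
// Hypothetical helper: replace each character in an assumed emoji range
// with its uppercase hexadecimal code point.
function emojiToCodes(string $text): string
{
    // Assumption: U+1F000..U+1FAFF covers the emojis of interest.
    return preg_replace_callback(
        '/[\x{1F000}-\x{1FAFF}]/u',
        function (array $m): string {
            // mb_ord returns the code point as an integer
            return strtoupper(dechex(mb_ord($m[0], 'UTF-8')));
        },
        $text
    );
}

// Same text as the database string above, with the emoji written as escapes
$sentence = "\u{1F525}HOT\u{1F525}stocks today";
echo emojiToCodes($sentence), "\n"; // prints 1F525HOT1F525stocks today
```

The pattern still needs the /u modifier and a UTF-8 (utf8mb4) connection, as described above.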
Related
Is there a PHP function that can take a string and convert any special characters to unicode. Similar to htmlspecialchars() or UTF8_encode().
For example in the string: "I think Bob's going too".
I would need the apostrophe or single right quote unicode in place of the apostrophe in "Bob's". So then after conversion the string should read: "I think Bob\u2019s going too".
I need this for use in a PHP script that prints into a javascript function.
Using \ to escape or ' does not work; it stops the script from running. I am trying to use Flowplayer's Playlist plugin. The only way it seems I can have a string with special characters is if they are in unicode.
Here is a JSFIDDLE to play around with and see what I mean when I say it doesn't work. Just replace \u2019 with ' or something similar and click to have the song play. The media player just goes black and doesn't play anything, whereas if you leave it with \u2019 then it plays fine.
Any help is appreciated.
I think json_encode() is the function you are looking for here.
The following code:
$string = "I think Bob’s going too";
print_r(json_encode($string));
will output:
"I think Bob\u2019s going too"
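As a rough sketch of how this could be used when printing into JavaScript: the playTrack function name below is made up for illustration, and JSON_HEX_APOS is an optional flag that also escapes a plain ASCII apostrophe.

```php
<?php
// The right single quote is written as an escape so the file encoding does not matter
$title = "I think Bob\u{2019}s going too";

// By default json_encode escapes non-ASCII characters as \uXXXX
$escaped = json_encode($title);

// JSON_UNESCAPED_UNICODE keeps the character as-is instead
$raw = json_encode($title, JSON_UNESCAPED_UNICODE);

// JSON_HEX_APOS turns a plain apostrophe into \u0027
$apos = json_encode("Bob's", JSON_HEX_APOS);

// playTrack is a hypothetical JavaScript function; the value already carries its own quotes
echo "playTrack($escaped);\n";
```

Because json_encode adds the surrounding double quotes itself, the result should be dropped into the script without extra quoting.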
I have the following website:
http://stationmeteo.meteorologic.net/metar/your-metar.php?icao=LFRS&day=070308
I want to extract data from it.
I tried using file_get_contents and some regular expressions, but something is not working.
this is the code I tried:
$content=file_get_contents('http://stationmeteo.meteorologic.net/metar/your-metar.php? icao=LFMN&day=010513');
preg_match('/00\:30 07\/03\/2008(.+)01\:30 07\/03\/2008/',$content,$m);
echo $m[0];
echo $m[1];
It's giving me undefined offset 0 and 1.
If I copy the content of the web page directly to $content instead of using file_get_contents, it works fine.
What am I missing?
The problem is that .+ matches any characters except newlines, and there is a newline character in the text you're trying to match.
Try
preg_match('~00:30 07/03/2008(.+)01:30 07/03/2008~s',$content,$m);
(using ~ as a delimiter so you don't have to escape all those slashes, by the way)
The next question is: Why don't I get this problem when copying the contents of the webpage directly into $content? Well, all whitespace is normalized to a single space when a webpage is rendered, turning the \n that's present in the page's source code (press Ctrl-U to see it) into a simple space. And .+ matches that space.
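A small sketch of the difference, using a made-up two-line string in place of the real page, and checking the return value of preg_match before touching $m (which avoids the undefined offset notices):

```php
<?php
// Made-up sample: the two timestamps are separated by a newline
$content = "00:30 07/03/2008 first row\nsecond row 01:30 07/03/2008";
$pattern = '~00:30 07/03/2008(.+)01:30 07/03/2008~';

// Without the s modifier the dot stops at the newline, so nothing matches
$withoutS = preg_match($pattern, $content, $m1);

// With the s modifier the dot also matches newlines
$withS = preg_match($pattern . 's', $content, $m2);

if ($withS === 1) {
    echo trim($m2[1]), "\n";
} else {
    echo "no match\n";
}
```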
The text fragment keeps its original layout (I mean the spacing, offsets, new lines, paragraphs) while it is stored in a MySQL ('text' type) field. I can tell when I look at it in my DB browser (Adminer).
But the layout gets lost when I output it from the DB: it becomes a single-line string of text. How can I restore its original layout?
I've tried to reshape the text fragment using the PHP nl2br() function with some success: it brought back the line breaks, but the positioning of the words is not kept; everything shifts to the left.
Thanks in advance for a good idea.
If you've got multiple spaces and things like that (e.g. for code), then try using the pre tag.
http://htmldog.com/reference/htmltags/pre
http://reference.sitepoint.com/html/pre
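A minimal sketch of the pre approach, assuming the text comes straight from the database; renderPreformatted is a hypothetical helper name, and the htmlspecialchars call is an addition so that any markup in the stored text is shown literally:

```php
<?php
// Hypothetical helper: wrap stored text in <pre> so spaces and newlines survive
function renderPreformatted(string $text): string
{
    // Escape first so characters like < and & are displayed, not interpreted
    return '<pre>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</pre>';
}

// Made-up sample with indentation and a blank line
$stored = "Line one\n    indented <b>line</b>\n\nNew paragraph";
echo renderPreformatted($stored), "\n";
```

If the monospace look of pre is unwanted, the CSS rule white-space: pre-wrap on an ordinary element preserves the layout in a similar way.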
The html_entity_decode() function converts HTML entities to characters.
The syntax is:
html_entity_decode(string, [quotestyle], [character-set]);
You can refer example2.
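For illustration, a short sketch with a made-up input string; ENT_QUOTES and 'UTF-8' are the flag and character set chosen here, not something the question specifies:

```php
<?php
// Made-up sample containing entities for <, >, & and a single quote
$encoded = '&lt;p&gt;Tom &amp; Jerry&#039;s&lt;/p&gt;';

// ENT_QUOTES decodes both double and single quote entities
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n";
```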
I'm writing a small PHP script to grab the latest half dozen Twitter status updates from a user feed and format them for display on a webpage. As part of this I need a regex replace to rewrite hashtags as hyperlinks to search.twitter.com. Initially I tried to use:
<?php
$strTweet = preg_replace('/(^|\s)#(\w+)/', '\1#\2', $strTweet);
?>
(taken from https://gist.github.com/445729)
In the course of testing I discovered that #test is converted into a link on the Twitter website, however #123 is not. After a bit of checking on the internet and playing around with various tags I came to the conclusion that a hashtag must contain alphabetic characters or an underscore in it somewhere to constitute a link; tags with only numeric characters are ignored (presumably to stop things like "Good presentation Bob, slide #3 was my favourite!" from being linked). This makes the above code incorrect, as it will happily convert #123 into a link.
I've not done much regex in a while, so in my rustiness I came up with the following PHP solution:
<?php
$test = 'This is a test tweet to see if #123 and #4 are not encoded but #test, #l33t and #8oo8s are.';
// Get all hashtags out into an array
if (preg_match_all('/(^|\s)(#\w+)/', $test, $arrHashtags) > 0) {
    foreach ($arrHashtags[2] as $strHashtag) {
        // Check each tag to see if there are letters or an underscore in there somewhere
        if (preg_match('/#\d*[a-z_]+/i', $strHashtag)) {
            $test = str_replace($strHashtag, ''.$strHashtag.'', $test);
        }
    }
}
echo $test;
?>
It works; but it seems fairly long-winded for what it does. My question is, is there a single preg_replace similar to the one I got from gist.github that will conditionally rewrite hashtags into hyperlinks ONLY if they DO NOT contain just numbers?
(^|\s)#(\w*[a-zA-Z_]+\w*)
PHP
$strTweet = preg_replace('/(^|\s)#(\w*[a-zA-Z_]+\w*)/', '\1#\2', $strTweet);
This regular expression says a # followed by 0 or more characters [a-zA-Z0-9_], followed by an alphabetic character or an underscore (1 or more), followed by 0 or more word characters.
http://rubular.com/r/opNX6qC4sG <- test it here.
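Here is a sketch of that single preg_replace run against the test string from the question. The link markup in the replacement is an assumption (the replacement shown above contains no link markup), and the search.twitter.com URL format is only a guess based on the question's description.

```php
<?php
$test = 'This is a test tweet to see if #123 and #4 are not encoded but #test, #l33t and #8oo8s are.';

// Assumed link format; \1 is the leading whitespace (or start), \2 is the tag text
$linked = preg_replace(
    '/(^|\s)#(\w*[a-zA-Z_]+\w*)/',
    '\1<a href="http://search.twitter.com/search?q=%23\2">#\2</a>',
    $test
);

echo $linked, "\n";
```

In this run #test, #l33t and #8oo8s are wrapped in links, while #123 and #4 are left alone because they contain no letter or underscore.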
It's actually better to search for characters that aren't allowed in a hashtag; otherwise tags like "#Trentemøller" won't work.
The following works well for me...
preg_match('/([ ,.]+)/', $string, $matches);
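A different way to handle tags like "#Trentemøller" is to use Unicode property classes with the u modifier. This is a different technique from the one above, offered only as a sketch: \p{L} matches any letter and \p{N} any digit, and the tweet text is made up.

```php
<?php
// A tag must contain at least one letter or underscore; digits alone do not count
$pattern = '/(^|\s)#([\p{L}\p{N}_]*[\p{L}_][\p{L}\p{N}_]*)/u';

// Made-up sample; the o-slash is written as an escape
$tweet = "Listening to #Trentem\u{00F8}ller on slide #3";

preg_match_all($pattern, $tweet, $m);
$tags = $m[2];

echo implode(', ', $tags), "\n";
```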
I have devised this: /(^|\s)#([[:alnum:]])+/gi
I found Gazler's answer to work, although the regex added a blank space at the beginning of the hashtag, so I removed the first part:
(^|\s)
This works perfectly for me now:
#(\w*[a-zA-Z_0-9]+\w*)
Example here: http://rubular.com/r/dS2QYZP45n
I'm trying to scrape a price from a web page using PHP and Regexes. The price will be in the format £123.12 or $123.12 (i.e., pounds or dollars).
I'm loading up the contents using libcurl. The output of which is then going into preg_match_all. So it looks a bit like this:
$contents = curl_exec($curl);
preg_match_all('/(?:\$|£)[0-9]+(?:\.[0-9]{2})?/', $contents, $matches);
So far so simple. The problem is, PHP isn't matching anything at all - even when there are prices on the page. I've narrowed it down to there being a problem with the '£' character - PHP doesn't seem to like it.
I think this might be a charset issue. But whatever I do, I can't seem to get PHP to match it! Anyone have any ideas?
(Edit: I should note if I try using the Regex Test Tool using the same regex and page content, it works fine)
Have you tried using \ in front of £?
preg_match_all('/(\$|\£)[0-9]+(\.[0-9]{2})/', $contents, $matches);
I have tried this expression in .NET with \£ and it works. I just edited it and removed some ":".
Read my comment about the possibility of Curl giving you bad encoding (comment of this post).
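One way to check the encoding theory is sketched below. It assumes the page arrives as ISO-8859-1 (a guess, not something the question confirms) and needs the mbstring extension. The pattern writes £ as \x{00A3} and uses the u modifier, so it does not depend on how the script file itself is saved.

```php
<?php
// £ written as a code point; u makes PCRE treat pattern and subject as UTF-8
$pattern = '/[$\x{00A3}][0-9]+(?:\.[0-9]{2})?/u';

// Made-up sample as raw ISO-8859-1 bytes: 0xA3 is the pound sign there
$latin1 = "Price: \xA3" . "123.12 or \$45";

// A lone 0xA3 byte is invalid UTF-8, so matching with u fails outright
$before = preg_match_all($pattern, $latin1, $ignored);

// Convert to UTF-8 first (assumption: the source really is ISO-8859-1)
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$after = preg_match_all($pattern, $utf8, $matches);

echo implode(', ', $matches[0]), "\n";
```

If the page declares its charset in the Content-Type header or a meta tag, that value would be a better source encoding than a hard-coded guess.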
Maybe the pound sign is replaced by its HTML entity in the page? I think you should try your regexp in some sort of regex testing program (i.e. match it against fixed text locally).
i'd change my regexp like this: '/(?:\$|£)\d+(?:\.\d{2})?/'
This should work for simple values.
'#(?:\$|\£|\€)(\d+(?:\.\d+)?)#'
This will not work with thousands separators, as in 234,343 and 34,454.45.
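If thousands separators matter, the pattern could be extended along these lines. This is a sketch only: it assumes commas appear strictly in groups of three digits, and it writes £ and € as code points with the u modifier.

```php
<?php
// First alternative: comma-grouped amounts; second: plain digits
$pattern = '#[$\x{00A3}\x{20AC}](\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)#u';

// Made-up sample text with the currency signs written as escapes
$text = "Was \$34,454.45, now \u{00A3}234,343 or \u{20AC}12.5";

preg_match_all($pattern, $text, $m);
$amounts = $m[1];

echo implode(' | ', $amounts), "\n"; // prints 34,454.45 | 234,343 | 12.5
```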