Extract specific data from string using Regex (PHP)

Extract specific data from string using Regex (PHP) - php

My objective is to check if the message have emojis or not , if yes i am going to convert them.
I am having a problem to find out how many emojis are in the string.
Example :
$string = ":+1::+1::+1:"; //can be any string that has :something: between ::
the objective here is to get the :+1: ( all of them ) but i tried two pattern and i keep getting only 1 match.
preg_match_all('/:(.*?):/', $string, $emojis, PREG_SET_ORDER); // $emojis is declared as $emojis='' at the beginning.
preg_match_all('/:(\S+):/', $string, $emojis, PREG_SET_ORDER);
The Rest of the codeafter getting the matches i am doing this :
if (isset($emojis[0]))
{
$stringMap = file_get_contents("/api/common/emoji-map.json");
$jsonMap = json_decode($stringMap, true);
foreach ($emojis as $key => $value) {
$emojiColon = $value[0];
$emoji = key_exists($emojiColon, $jsonMap) ? $jsonMap[$emojiColon] : '';
$newBody = str_replace($emojiColon, $emoji, $newBody);
}
}
i would appreciate any help or suggestions , Thanks.

I updated your expression a little bit:
preg_match_all('/\:(.+?)(\:)/', $string, $emojis, PREG_SET_ORDER);
echo var_dump($emojis);
The : is now escaped, otherwise it could be treated as special character.
Replaced the * by +, this way you won't match two consecutive :: but require at least one charcter in-between.

Related

Replacing values in string

I have a string which contains certain number of #{number} structures. For example:
#328_#918_#1358
SKU:#666:#456
TEST--#888/#982
For each #{number} structure, I have to replace it with a known string.
For the first example:
#328_#918_#1358
I have the following strings:
328="foo"
918="bar"
1358"arg"
And the result should be:
foo_bar_arg
How do I achieve such effect? My current code looks like that:
$matches = array();
$replacements = array();
// starting string
$string = "#328_#918:#1358";
// getting all the numbers from the string
preg_match_all("/\#[0-9]+/", $string, $matches);
// getting rid of #
foreach ($matches[0] as $key => &$feature) {
$feature = preg_replace("/#/", "", $feature);
} // end foreach
// obtaining the replacement values
foreach ($matches[0] as $key => $value) {
$replacement[$value] = "fizz"; // here the value required for replacement is obtained
} // end foreach
But I have no idea how to actually perform a replacement in $string variable using values from $replacement table. Any help is much appreciated!

You can use a preg_replace_callback solution:
$string = '#328_#918:#1358
SKU:#666:#456
TEST--#888/#982';
$replacements = [328=>"foo", 918=>"bar", 1358=>"arg"];
echo preg_replace_callback("/#([0-9]+)/", function ($m) use ($replacements) {
return isset($replacements[$m[1]]) ? $replacements[$m[1]] : $m[0];
}
,$string);
See the PHP demo.
The #([0-9]+) regex will match all non-overlapping occurrences of # and one or more digits right after capturing them into Group 1. If there is an item in the replacements associative array with the numeric key, the whole match is replaced with the corresponding value. Else, the match is returned so that no replacement could occur and the match does not get removed.

Getting a word after a specific character in PHP

I want to get the words in a string where a specific character stands before, in this case, its the : character.
textexjkladjladf :theword texttextext :otherword :anotherword
From this snippet the expected output will be:
theword
otherword
anotherword
How do i do this with PHP?

You can use regular expression:
$string = "textexjkladjladf :theword texttextext :otherword :anotherword";
$matches = array();
preg_match_all('/(?<=:)\w+/', $string, $matches);
foreach ($matches[0] as $item) {
echo $item."<br />";
}
Output is:
theword
otherword
anotherword
The array what you want is the $matches[0]

Another way of getting those words without Regular Expression can be:
Use explode(' ',$str) to get all the words.
Then loop the words and check which one starts with ':'.

try
$str = 'textexjkladjladf :theword texttextext :otherword :anotherword';
$tab = exlode(':',$str);
print_r($tab);
//if echo entry 0 =>
echo $tab[0]; // => textexjkladjladf

PHP extract one part of a string

I have to extract the email from the following string:
$string = 'other_text_here to=<my.email#domain.fr> other_text_here <my.email#domain.fr> other_text_here';
The server send me logs and there i have this kind of format, how can i get the email into a variable without "to=<" and ">"?
Update: I've updated the question, seems like that email can be found many times in the string and the regular expresion won't work well with it.

You can try with a more restrictive Regex.
$string = 'other_text_here to=<my.email#domain.fr> other_text_here';
preg_match('/to=<([A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4})>/i', $string, $matches);
echo $matches[1];

Simple regular expression should be able to do it:
$string = 'other_text_here to=<my.email#domain.fr> other_text_here';
preg_match( "/\<(.*)\>/", $string, $r );
$email = $r[1];
When you echo $email, you get "my.email#domain.fr"

Try this:
<?php
$str = "The day is <tag> beautiful </tag> isn't it? ";
preg_match("'<tag>(.*?)</tag>'si", $str, $match);
$output = array_pop($match);
echo $output;
?>
output:
beautiful

Regular expression would be easy if you are certain the < and > aren't used anywhere else in the string:
if (preg_match_all('/<(.*?)>/', $string, $emails)) {
array_shift($emails); // Take the first match (the whole string) off the array
}
// $emails is now an array of emails if any exist in the string
The parentheses tell it to capture for the $matches array. The .* picks up any characters and the ? tells it to not be greedy, so the > isn't picked up with it.

PHP Regex match string but exclude a certain word

This question has been asked multiple times, but I didn't find a working solution for my needs.
I've created a function to check for the URLs on the output of the Google Ajax API:
https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=site%3Awww.bierdopje.com%2Fusers%2F+%22Gebruikersprofiel+van+%22+Stevo
I want to exclude the word "profile" from the output. So that if the string contains that word, skip the whole string.
This is the function I've created so far:
function getUrls($data)
{
$regex = '/https?\:\/\/www.bierdopje.com[^\" ]+/i';
preg_match_all($regex, $data, $matches);
return ($matches[0]);
}
$urls = getUrls($data);
$filteredurls = array_unique($urls);
I've created a sample to make clear what I mean exactly:
http://rubular.com/r/1U9YfxdQoU
In the sample you can see 4 strings selected from which I only need the upper 2 strings.
How can I accomplish this?

function getUrls($data)
{
$regex = '#"(https?://www\\.bierdopje\\.com[^"]*+(?<!/profile))"#';
return preg_match_all($regex, $data, $matches) ?
array_unique($matches[1]) : array();
}
$urls = getUrls($data);
Result: http://ideone.com/dblvpA
vs json_decode: http://ideone.com/O8ZixJ
But generally you should use json_decode.

Don't use regular expressions to parse JSON data. What you want to do is parse the JSON and loop over it to find the correct matching elements.
Sample code:
$input = file_get_contents('https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=site%3Awww.bierdopje.com%2Fusers%2F+%22Gebruikersprofiel+van+%22+Stevo');
$parsed = json_decode($input);
$cnt = 0;
foreach($parsed->responseData->results as $response)
{
// Skip strings with 'profile' in there
if(strpos($response->url, 'profile') !== false)
continue;
echo "Result ".++$cnt."\n\n";
echo 'URL: '.$response->url."\n";
echo 'Shown: '.$response->visibleUrl."\n";
echo 'Cache: '.$response->cacheUrl."\n\n\n";
}
Sample on CodePad (since it doesn't support loading external files the string is inlined there)

regular expression word preceded by char

I want to grab a specific string only if a certain word is followed by a = sign.
Also, I want to get all the info after that = sign until a / is reached or the string ends.
Let's take into example:
somestring.bla/test=123/ohboy/item/item=capture
I want to get item=capture but not item alone.
I was thinking about using lookaheads but I'm not sure it this is the way to go. I appreciate any help as I'm trying to grasp more and more about regular expressions.

[^/=]*=[^/]*
will give you all the pairs that match your requirements.
So from your example it should return:
test=123
item=capture
Refiddle Demo

If you want to capture item=capture, it is straightforward:
/item=[^\/]*/
If you want to also extract the value,
/item=([^\/]*)/
If you only want to match the value, then you need to use a look-behind.
/(?<=item=)[^\/]*/
EDIT: too many errors due to insomnia. Also, screw PHP and its failure to disregard separators in a character group as separators.

Here is a function I wrote some time ago. I modified it a little, and added the $keys argument so that you can specify valid keys:
function getKeyValue($string, Array $keys = null) {
$keys = (empty($keys) ? '[\w\d]+' : implode('|', $keys));
$pattern = "/(?<=\/|$)(?P<key>{$keys})\s*=\s*(?P<value>.+?)(?=\/|$)/";
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);
foreach ($matches as & $match) {
foreach ($match as $key => $value) {
if (is_int($key)) {
unset($match[$key]);
}
}
}
return $matches ?: FALSE;
}
Just trow in the string and valid keys:
$string = 'somestring.bla/test=123/ohboy/item/item=capture';
$keys = array('test', 'item');
$keyValuePairs = getKeyValue($string, $keys);
var_dump($keyValuePairs);

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Extract specific data from string using Regex (PHP) - php

Related

Replacing values in string

Getting a word after a specific character in PHP

PHP extract one part of a string

PHP Regex match string but exclude a certain word

regular expression word preceded by char

Categories

Resources