Multiple Hash Tags removal - php

function getHashTagsFromString($str){
$matches = array();
$hashtag = array();
if (preg_match_all('/#([^\s]+)/', $str, $matches)) {
for($i=0;$i<sizeof($matches[1]);$i++){
$hashtag[$i]=$matches[1][$i];
}
}
// Always return an array, even when nothing matched.
return $hashtag;
}
Test string:
$str = <<<STR
this is a string
with a #tag and
another #hello #hello2 ##hello3 one
STR;
Using the above function I am getting results, but I am not able to remove both # characters from ##hello3. How can I remove them using a single regular expression?

Update your regular expression as follows:
/#+(\S+)/
Explanation:
/ - starting delimiter
#+ - match the literal # character one or more times
(\S+) - match (and capture) one or more non-whitespace characters (\S is shorthand for [^\s])
/ - ending delimiter
The output will be as follows:
Array
(
[0] => tag
[1] => hello
[2] => hello2
[3] => hello3
)

EDIT: To match all the hash tags use:
preg_match_all('/#\S+/', $str, $match);
To remove them, use preg_replace instead of preg_match_all:
$repl = preg_replace('/#\S+/', '', $str);
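Putting the two steps together, a minimal sketch could look like the following. The variable names $tags and $clean and the final whitespace tidy-up are my own additions for illustration, not part of the answer above.

```php
<?php
// Shortened version of the question's test string.
$str = "with a #tag and\nanother #hello #hello2 ##hello3 one";

// Extract: #+ swallows every leading #, so ##hello3 is captured as hello3.
preg_match_all('/#+(\S+)/', $str, $matches);
$tags = $matches[1];

// Remove: drop each tag, then collapse the runs of spaces left behind
// (the collapsing step is an assumption about the desired output).
$clean = preg_replace('/#\S+/', '', $str);
$clean = trim(preg_replace('/[ \t]{2,}/', ' ', $clean));

print_r($tags);
echo $clean, "\n";
```

Note that /#\S+/ also removes ##hello3 in one go, because the second # is itself a non-whitespace character.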

Related

Trying to create a regex in PHP that matches patterns inside a pattern

I have seen some regex examples where the string is "Test string: Group1Group2", and using preg_match_all(), matching for patterns of text that exists inside the tags.
However, what I am trying to do is a bit different, where my string is something like this:
"some t3xt../s8fo=123,sij(variable1=123,variable2=743,variable3=535)"
What I want to do is match the sections such as 'variable1=123' that exist inside the parentheses.
What I have so far is this:
if( preg_match_all("/\(([^\)]*?)\)/", $string_value, $matches) )
{
print_r( $matches[1] );
}
But this just captures everything inside the parentheses as one string, and doesn't match anything else.
Edit:
The desired output would be:
"variable1=123"
"variable2=743"
"variable3=535"
The output that I am getting is:
"variable1=123,variable2=743,variable3=535"
You can extract the matches you need with a single call to preg_match_all if the matches do not contain (, ) or ,:
$s = '"some t3xt../s8fo=123,sij(variable1=123,variable2=743,variable3=535)"';
if (preg_match_all('~(?:\G(?!\A),|\()\K[^,]+(?=[^()]*\))~', $s, $matches)) {
print_r($matches[0]);
}
Details:
(?:\G(?!\A),|\() - either end of the preceding successful match and a comma, or a ( char
\K - match reset operator that discards all text matched so far from the current overall match memory buffer
[^,]+ - one or more chars other than a comma (use [^,]* if you expect empty matches, too)
(?=[^()]*\)) - a positive lookahead that requires zero or more chars other than ( and ) and then a ) immediately to the right of the current location.
I would do this:
preg_match("/\(([^\)]+)\)/", $string_value, $matches);
$result = explode(",", $matches[1]);
If your end result is an array of key => value then you can transform it into a query string:
preg_match("/\(([^\)]+)\)/", $string_value, $matches);
parse_str(str_replace(',', '&', $matches[1]), $result);
Which yields:
Array
(
[variable1] => 123
[variable2] => 743
[variable3] => 535
)
Or replace with a newline \n and use parse_ini_string().
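The parse_ini_string() variant mentioned last could be sketched roughly like this. It assumes the keys and values contain nothing that has special meaning in INI syntax; the $ini variable is a name I made up for the example.

```php
<?php
$string_value = 'some t3xt../s8fo=123,sij(variable1=123,variable2=743,variable3=535)';

$result = [];
if (preg_match('/\(([^)]+)\)/', $string_value, $matches)) {
    // Turn "a=1,b=2" into one "key=value" pair per line, then parse it as INI text.
    $ini = str_replace(',', "\n", $matches[1]);
    $result = parse_ini_string($ini);
}

print_r($result);
```

Like the parse_str() version, this gives an array keyed by variable name.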

Match regex pattern that isn't within a bbcode tag

I am attempting to create a regex pattern that will match words in a string that begin with #.
The regex that solves this initial problem is '~(#\w+)~'.
A second requirement is that it must also ignore any matches that occur between [quote] and [/quote] tags.
A couple of attempts that have failed are:
(?:[0-9]+|~(#\w+)~)(?![0-9a-z]*\[\/[a-z]+\])
/[quote[\s\]][\s\S]*?\/quote](*SKIP)(*F)|~(#\w+)~/i
Example: the following string should have an array output as displayed:
$results = [];
$string = "#friends #john [quote]#and #jane[/quote] #doe";
//run regex match
preg_match_all('regex', $string, $results);
//dump results
var_dump($results[1]);
//results: array consisting of:
[1]=>"#friends"
[2]=>"#john"
[3]=>"#doe"
You may use the following regex (based on another related question):
'~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s'
The regex accounts for nested [quote] tags.
Details
(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F) - matches the pattern inside capturing parentheses and then (*SKIP)(*F) make the regex engine omit the matched text:
\[quote] - a literal [quote] string
(?:(?1)|.)*? - any 0+ (but as few as possible) occurrences of the whole Group 1 pattern ((?1)) or any char (.)
\[/quote] - a literal [/quote] string
| - or
#\w+ - a # followed with 1+ word chars.
PHP demo:
$results = [];
$string = "#friends #john [quote]#and #jane[/quote] #doe";
$rx = '~(\[quote\](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s';
preg_match_all($rx, $string, $results);
print_r($results[0]);
// => Array ( [0] => #friends [1] => #john [2] => #doe )
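To see the nested case in action, here is a small sketch with a made-up input string (the #a to #e tags are hypothetical test data) where one [quote] block sits inside another:

```php
<?php
// Hypothetical input: the inner [quote] block is nested inside the outer one.
$string = "#a [quote]#b [quote]#c[/quote] #d[/quote] #e";
$rx = '~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s';

preg_match_all($rx, $string, $results);
print_r($results[0]);
// => Array ( [0] => #a [1] => #e )
```

Because (?1) recurses into the whole group, the inner [/quote] does not end the outer block early, so #d is skipped as well.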

Words finder regex fails

I'm using this pattern to check if certain words exists in a string:
/\b(apple|ball|cat)\b/i
It works on this string: "cat ball apple"
but not on this one: "no spaces catball smallapple"
How can the pattern be modified so that the words match even if they are combined with other words and even if there are no spaces?
Remove \b from the regex. \b will match a word boundary, and you want to match the string that is not a complete word.
You can also remove the capturing group (denoted by ()) as it is not required any longer.
Use
/apple|ball|cat/i
An IDEONE PHP demo:
$re = "/apple|ball|cat/i";
$str = "no spaces catball smallapple";
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Results:
[0] => cat
[1] => ball
[2] => apple
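For comparison, a quick sketch that runs both patterns against the same string (the $whole and $anywhere names are just for this example):

```php
<?php
$str = "no spaces catball smallapple";

// With \b the keywords must stand alone as whole words, so nothing matches here.
$whole = preg_match_all('/\b(?:apple|ball|cat)\b/i', $str, $m1);

// Without \b the keywords are found anywhere, including inside longer words.
$anywhere = preg_match_all('/apple|ball|cat/i', $str, $m2);

echo "$whole whole-word matches, $anywhere substring matches\n";
```

preg_match_all() returns the number of full matches, which makes it easy to check both cases.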

Regex to match a set of characters but only if particular characters are not grouped

This is a tricky one, I have a string:
This is some text with a {%TAG IN IT%} and some more text then {%ANOTHER TAG%} with some more text at the end.
I have a regex to match the tags:
({%\w+[\w =!:;,\.\$%"'#\?\-\+\{}]*%})
Which will match a starting tag with any alphanumeric character followed by any number of other ansi characters (sample set specified in the regex above).
However (in PHP using "preg_match_all" and "preg_split" at least) the fact that the set contains both the percent (%) and the curly braces ({}) means that the regex matches too much if there are two tags on the same line.
e.g, in the example given, the following is matched:
{%TAG IN IT%} and some more text then {%ANOTHER TAG%}
As you can see, the %}...{% were matched. So, what I need is to allow the "%" but NOT when followed by "}"
I've tried non-greedy matching and a negative lookahead, but the negative lookahead won't work inside a character set (i.e. everything in the [\w...]* set).
I'm stuck!
You could use alternation to achieve this:
/\{%(?:[^%]|%(?!}))*%\}/
It matches either a character that isn't %, or a % that isn't followed by } (using a negative look-ahead assertion).
$str = 'This is some text with a {%tag with % and } inside%} and some more text then {%ANOTHER TAG%} with some more text at the end.';
$pattern = '/\{%(?:[^%]|%(?!}))*%\}/';
preg_match_all($pattern, $str, $matches);
print_r($matches[0]);
Output:
Array
(
[0] => {%tag with % and } inside%}
[1] => {%ANOTHER TAG%}
)
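Since the question also mentions preg_split, here is a rough sketch of the same pattern used for splitting. The sample string is made up, and wrapping the pattern in a capturing group with PREG_SPLIT_DELIM_CAPTURE is my own choice so that the tags are kept in the result.

```php
<?php
// Hypothetical sample text with two tags, one containing a lone %.
$str = 'a {%X%} b {%Y % Z%} c';

// Same alternation as above, wrapped in (...) so the tags are returned too.
$pattern = '/(\{%(?:[^%]|%(?!}))*%\})/';
$parts = preg_split($pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

The result alternates between plain text and tags, which is handy if each tag is to be replaced later.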
A slight modification of your regexp works (just add a question mark to make it non-greedy):
<?php
$input = "This is some text with a {%TAG % }IT%%} and some more text then {%ANOTHER TAG%} with some more text at the end.";
$regexp = "/{%\w+[\w =!:;,\.\$%\"'#\?\-\+\{}]*?%}/";
// Note the ? after the *, which makes the quantifier non-greedy
if(preg_match_all($regexp, $input, $matches, PREG_SET_ORDER)) {
foreach($matches as $match) {
var_dump($match);
echo "\r\n";
}
unset($match);
}
/*
Outputs:
array
0 => string '{%TAG % }IT%%}' (length=14)
array
0 => string '{%ANOTHER TAG%}' (length=15)
*/
?>

Regex Help with manipulating string

I am seriously struggling to get my head around regex.
I have a string: "iPhone: 52.973053,-0.021447"
I want to extract the two numbers after the colon into two separate strings, split at the comma.
Can anyone help me? Cheers
Try:
preg_match_all('/\w+:\s*(-?\d+\.\d+),(-?\d+\.\d+)/',
"iPhone: 52.973053,-0.021447 FOO: -1.0,-1.0",
$matches, PREG_SET_ORDER);
print_r($matches);
which produces:
Array
(
[0] => Array
(
[0] => iPhone: 52.973053,-0.021447
[1] => 52.973053
[2] => -0.021447
)
[1] => Array
(
[0] => FOO: -1.0,-1.0
[1] => -1.0
[2] => -1.0
)
)
Or just:
preg_match('/\w+:\s*(-?\d+\.\d+),(-?\d+\.\d+)/',
"iPhone: 52.973053,-0.021447",
$match);
print_r($match);
if the string only contains one coordinate.
A small explanation:
\w+ # match a word character: [a-zA-Z_0-9] and repeat it one or more times
: # match the character ':'
\s* # match a whitespace character: [ \t\n\x0B\f\r] and repeat it zero or more times
( # start capture group 1
-? # match the character '-' and match it once or none at all
\d+ # match a digit: [0-9] and repeat it one or more times
\. # match the character '.'
\d+ # match a digit: [0-9] and repeat it one or more times
) # end capture group 1
, # match the character ','
( # start capture group 2
-? # match the character '-' and match it once or none at all
\d+ # match a digit: [0-9] and repeat it one or more times
\. # match the character '.'
\d+ # match a digit: [0-9] and repeat it one or more times
) # end capture group 2
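If you want the two values in their own variables straight away, named groups are one option. This is only a sketch: the group names lat and lng are my assumption about what the numbers mean, the question does not say.

```php
<?php
$input = "iPhone: 52.973053,-0.021447";

// Same pattern as above, but with named groups (lat/lng are hypothetical names).
if (preg_match('/\w+:\s*(?<lat>-?\d+\.\d+),(?<lng>-?\d+\.\d+)/', $input, $m)) {
    $lat = $m['lat'];
    $lng = $m['lng'];
    echo "lat=$lat lng=$lng\n";
}
```

Both values stay strings, which is what the question asks for; cast with (float) if numbers are needed.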
A solution without regular expressions, using explode() and stripos():
$string = "iPhone: 52.973053,-0.021447";
$coordinates = explode(',', $string);
// $coordinates[0] = "iPhone: 52.973053"
// $coordinates[1] = "-0.021447"
$coordinates[0] = trim(substr($coordinates[0], stripos($coordinates[0], ':') +1));
Assuming that the string always contains a colon.
Or, if the identifier before the colon contains only letters (no digits), you can also do this:
$string = "iPhone: 52.973053,-0.021447";
$string = trim($string, "a..zA..Z: ");
//$string = "52.973053,-0.021447"
$coordinates = explode(',', $string);
Try:
$string = "iPhone: 52.973053,-0.021447";
preg_match_all( "/-?\d+\.\d+/", $string, $result );
print_r( $result );
I like @Felix's non-regex solution; I think it is clearer and more readable than using a regex.
Don't forget that you can use constants/variables for the separators, so the comma or colon is easy to change if the original string format changes.
Something like:
define('COORDINATE_SEPARATOR',',');
define('DEVICE_AND_COORDINATES_SEPARATOR',':');
$str="iPhone: 52.973053,-0.021447";
$s = array_filter(preg_split("/[a-zA-Z:,]/",$str) );
print_r($s);
An even simpler solution is to use preg_split() with a much simpler regex, e.g.
$str = 'iPhone: 52.973053,-0.021447';
$parts = preg_split('/[ ,]/', $str);
print_r($parts);
which will give you
Array
(
[0] => iPhone:
[1] => 52.973053
[2] => -0.021447
)
