PHP regex pattern

I need to use a regex pattern, but what is the right PHP syntax to "decode" it? My pattern is similar to BBCode, i.e. ['something'], where 'something' could be any length, though realistically I doubt it will be more than 10 letters or digits. What is the correct PHP syntax to "unscramble" it, i.e.
if ($row->xyz =['something'] ):
do this
else:
do that
endif;
Thanks in advance

A basic regexp to match BBCode style tags would look something like this:
preg_match('/\[[\/]?[A-Za-z0-9]+\]/', $row->xyz)
That will match anything that starts with a "[", ends with a "]", and has one or more alphanumeric characters in the middle (with an optional "/" for an end-tag.) Note it has flaws - for example, if you have a nested "[...]" in a larger "[...]", it will only grab the inner one. (i.e. [foo[bar]] will return only "[bar]".)
Example:
<?php
$regexp = '/\[[\/]?[A-Za-z0-9]+\]/';
$testString = '[i]An italic string with some [b]bold[/b] text.[/i]';
preg_match_all($regexp, $testString, $result);
var_dump($result);
?>
Result:
array(1) {
[0]=> array(4) {
[0]=> string(3) "[i]"
[1]=> string(3) "[b]"
[2]=> string(4) "[/b]"
[3]=> string(4) "[/i]"
}
}
Of course, I'm not sure this is what you actually mean you want to do, but it is what you say you want to do. Are you sure you want to find BBCodes, rather than find strings that are wrapped in them?
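If the goal is the if/else from the question, a minimal sketch could look like the following. It assumes the field holds exactly one bracketed tag with no quotes inside the brackets; bracketTagName() and the stand-in $row object are hypothetical names used only for illustration.

```php
<?php
// Hypothetical helper: returns the tag name if $value is exactly one
// bracketed tag such as "[something]", or null otherwise.
function bracketTagName(string $value): ?string
{
    // ^ and $ anchor the match to the whole value; the parentheses
    // capture the one or more alphanumeric characters between the brackets.
    if (preg_match('/^\[([A-Za-z0-9]+)\]$/', $value, $m) === 1) {
        return $m[1];
    }
    return null;
}

$row = new stdClass();        // stand-in for the asker's $row object
$row->xyz = '[something]';

$tag = bracketTagName($row->xyz);
if ($tag !== null):
    echo "matched tag: $tag\n";   // do this
else:
    echo "no tag\n";              // do that
endif;
```

Dropping the ^ and $ anchors would turn this into a "contains a tag somewhere" test instead of an "is exactly a tag" test.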

Related

Strange result of using asterisk * quantifier

I am trying to practice the asterisk * quantifier on a simple string, but although the string has only two letters, the result contains a third match.
<?php
$x = 'ab';
preg_match_all("/a*/",$x,$m);
echo '<pre>';
var_dump($m);
echo '</pre>';
?>
The result came out as:
array(1) {
[0]=>
array(3) {
[0]=> string(1) "a"
[1]=> string(0) ""
[2]=> string(0) ""
}
}
As I understand it, a* first matches a and then matches nothing at b, so the result should be:
array(1) {
[0]=>
array(2) {
[0]=> string(1) "a"
[1]=> string(0) ""
}
}
So what is the third match?
A regex demo tool shows that the first match is a, while the second and third matches are zero-width: one at the position between a and b, and one between b and the end of the string.
Keep in mind that the behavior of preg_match_all is to repeatedly take the pattern a* and try to apply it sequentially to the entire input string.
I suspect that what you really want to use here is a+. With a+ you only get a single match, for the single a letter in ab. So, I vote for using a+ here to resolve your problem.
Your regular expression '/a*/' matches zero or more consecutive a characters.
Example: if you match '/a*/' against an empty string, it still returns one match, because * means "zero or more".
preg_match_all keeps looking until it has processed the entire string. Once a match is found, it resumes at the end of that match and tries to apply the pattern again to the remainder of the string.
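To see where each match sits, here is a small sketch using the PREG_OFFSET_CAPTURE flag, which reports the byte offset of every match. The variable names $star and $plus are only illustrative.

```php
<?php
$x = 'ab';

// PREG_OFFSET_CAPTURE turns each match into a [text, byte offset] pair.
preg_match_all('/a*/', $x, $star, PREG_OFFSET_CAPTURE);
foreach ($star[0] as [$text, $offset]) {
    printf("offset %d: '%s'\n", $offset, $text);
}

// With + at least one "a" is required, so the empty matches disappear.
preg_match_all('/a+/', $x, $plus);
var_dump(count($plus[0]));
```

The three a* matches should be reported at offsets 0, 1 and 2: the letter a, the empty position before b, and the empty position at the end of the string.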

How to get numbers between a long space (PHP Regex)

I'd like to extract the numbers specifically with a PHP regex. I don't understand regex very well yet, although I'm currently experimenting on the regex101 website. The thing is, I have this:
66
28006 MadridVer teléfono
(Literally that; the original has many more spaces, and 28006 MadridVer teléfono is actually on the next line.) I'd like to extract the number 28006, or at least split the matches so that 28006 ends up on its own in one of the groups. What would my PHP regex look like? Maybe apart from capturing spaces I should capture a newline or something. I am totally lost here (yes, I'm still an absolute regex novice).
I don't see a need for regex.
Remove the new line and explode on space.
Then use array_filter to remove empty values from the array and rearrange the array with array_values.
$str = "66
28006 MadridVer teléfono";
$str = str_replace("\n", " ", $str);
$arr = explode(" ", $str);
$arr = array_values(array_filter($arr));
var_dump($arr);
Returns:
array(4) {
[0]=>
string(2) "66"
[1]=>
string(5) "28006"
[2]=>
string(9) "MadridVer"
[3]=>
string(9) "teléfono"
}
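If you would still rather use a regex, as the question asks, here is a rough sketch of two options. It assumes the input looks like the sample above; the variable names are illustrative only.

```php
<?php
$str = "66\n28006 MadridVer teléfono";

// Split on any run of whitespace (spaces, tabs, newlines) in one step.
// PREG_SPLIT_NO_EMPTY drops empty pieces without discarding a literal "0",
// which array_filter() without a callback would remove.
$parts = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);

// Or pull out only the runs of digits.
preg_match_all('/\d+/', $str, $m);
$numbers = $m[0];

var_dump($parts[1], $numbers[1]);
```

In both cases 28006 ends up as the second element, at index 1.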

PHP and RegEx: how to split a string including comma,space,colon to some substring

I'm trying to split a string that can be either comma, space or colon delimited. It could also contain a space or spaces after each delimiter. For example:
chr1:22222-333333 or
chr1 22222 333333 or
chr1 22222 333333 or
chr1:22,222-33,333
Any one of these would produce an array with three values ["chr1","22222","33333"]. I have tried some methods, but none of them handles all the cases, especially the fourth one.
Thank you very much for helping me.
$yourString = "chr1:22222-33333"; // for instance
$output = preg_split("/:| |;/", $yourString);
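This acts as an equivalent of explode() for cases where you want multiple delimiters. As written, the pattern splits only on ":", " " and ";", so the "-" in the sample string is not treated as a delimiter; add it to the alternation (for example /:| |;|-/) if you need it.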
Explanation of the characters in the preg_split statement:
/ encloses the regular expression; it marks where the pattern starts and ends
| acts as an OR operator, as if to say this OR this OR that
So in the end, /:| |;/ means: match anything that is ":" or " " or ";"
If you want to practice or simply better understand the principles of RegEx, you can have a look at this nice collection of RegEx tutorials
You can use str_replace with explode:
$str = array('chr1:22222-333333', 'chr1 22222 333333', 'chr1 22222 333333', 'chr1:22,222-33,333');
foreach($str as $val){
var_dump(explode(" ", str_replace(array(',',':','-'), array('',' ', ' '), $val)));
}
This removes every "," first, then replaces ":" and "-" with a space, and finally explodes with the space as the delimiter. It produces:
array(3) {
[0]=>
string(4) "chr1"
[1]=>
string(5) "22222"
[2]=>
string(6) "333333"
}
array(3) {
[0]=>
string(4) "chr1"
[1]=>
string(5) "22222"
[2]=>
string(6) "333333"
}
array(3) {
[0]=>
string(4) "chr1"
[1]=>
string(5) "22222"
[2]=>
string(6) "333333"
}
array(3) {
[0]=>
string(4) "chr1"
[1]=>
string(5) "22222"
[2]=>
string(5) "33333"
}
If you value conciseness and want to keep things neat, preg_split is the best way to go, in my opinion.
In the following examples, I assume you want your input separated by commas, spaces or colons:
$splitted = preg_split("/[,: ]/", $string);
If you want to treat tabs as whitespaces, you can replace the single space character with \s, which will match tabs as well:
$splitted = preg_split("/[,:\s]/", $string);
Note: The \s will match newlines too, in case your input may eventually be a multiline string.
Yet, if you don't trust your input (You don't, right?) and think that consecutive spaces and/or tabs should be collapsed and treated as a single delimiter, you can go with this version:
$splitted = preg_split("/,|:|\s+/", $string);
All the forms above work for input delimited by commas, colons or whitespace. Note that none of them splits on "-", and splitting on "," breaks a value such as 22,222 in two, so adjust the pattern if your input looks like the fourth example. If you want to play with these a little, an online regex tester is a nice place to do so.
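For the four sample inputs in the question, including the one with thousands separators, a minimal sketch might look like this. splitRegion() is a hypothetical helper name, and it assumes commas only ever appear inside numbers, so they are removed rather than used as delimiters.

```php
<?php
// Hypothetical helper for the four sample inputs in the question.
function splitRegion(string $input): array
{
    // Assumption: commas only ever appear as thousands separators,
    // so they are removed rather than treated as delimiters.
    $clean = str_replace(',', '', $input);

    // Split on one or more of: whitespace, ":", ";" or "-".
    return preg_split('/[\s:;-]+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
}

$samples = [
    'chr1:22222-333333',
    'chr1 22222 333333',
    'chr1   22222   333333',
    'chr1:22,222-33,333',
];
foreach ($samples as $sample) {
    echo implode('|', splitRegion($sample)), "\n";
}
```

The + after the character class is what collapses repeated delimiters, so several spaces in a row do not produce empty array entries.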

Need Regexp help PHP

I have, for example, strings such as "7-th Road", "7th number some other words" or "Some word 8-th word".
I need to get the first occurrence of a number and all the characters that follow it, up to the first space.
So for the examples above I need the values "7-th", "7th", "8-th".
Then, from matches like "7-th", I need to extract only the number in other operations.
Thanks in advance!
Regex should be /(\d+)([^\d]+)\s/ and the numbers would resolve to $1 and the ending characters to $2
Sample Code:
$string = '7-th Road';
preg_match_all('/(\d+)([^\d]+)\s/', $string, $result, PREG_PATTERN_ORDER);
var_dump($result[1]);
array(1) {
[0]=> string(1) "7"
}
var_dump($result[2]);
array(1) {
[0]=> string(3) "-th"
}
Are you asking for something like this?
#(\d+)-?(?:st|nd|rd|th)#
If you would like to get just the numbers from the text, use this:
preg_match_all('/(\d+)(?:-?th)?/', '7-th", "7th", "8-th', $matches);
But if you would like to remove "th" (with or without the hyphen), do a replacement:
preg_replace('/(\d+)(?:-?th)?/', '$1', 'some string')
Not sure about the last one...
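Another way to read the question is "first run of digits plus everything up to the next whitespace". A minimal sketch under that assumption follows; firstNumberToken() is a hypothetical helper. Using \S* instead of [^\d]+ keeps the match from running across later words, which the first pattern above can do on inputs like "7th number some other words".

```php
<?php
// Hypothetical helper: first token that starts at a digit, plus its number.
function firstNumberToken(string $text): ?array
{
    // (\d+) captures the first run of digits; \S* then consumes
    // everything up to (not including) the next whitespace character.
    if (preg_match('/(\d+)\S*/', $text, $m) === 1) {
        return ['token' => $m[0], 'number' => (int) $m[1]];
    }
    return null; // no digits anywhere in the input
}

$samples = ['7-th Road', '7th number some other words', 'Some word 8-th word'];
foreach ($samples as $sample) {
    $found = firstNumberToken($sample);
    echo $found['token'], ' => ', $found['number'], "\n";
}
```

The token and the bare number come out of a single preg_match call, so no second pass is needed to strip the suffix.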

Having a bit of regex headaches with varied links and href delimiters (" and ')

So, I want to match the following link structures with preg_match_all in PHP:
<a garbage href="http://this.is.a.link.com/?query=this has invalid spaces" possible garbage>
<a garbage href='http://this.is.a.link.com/?query=this also has has invalid spaces' possible garbage>
<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters possible garbage>
<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters>
I can get " and ' delimited URLs by doing
'#<a[^>]*?href=("|\')(.*?)("|\')#is'
or I can get all 3, but not if there are spaces in the first two with:
'#<a[^>]*?href=("|\')?(.*?)[\s\"\'>]#is'
How can I formulate this so that it will pick up " and ' delimited URLs with potential spaces, as well as properly encoded URLs without delimiters?
OK, this seems to work:
'#<a[^>]*?href=((["\'][^\'"]+["\'])|([^"\'\s>]+))#is'
($matches[1] contains the urls)
Only annoyance is that quoted urls have the quotes still on, so you'll have to strip them off:
$first = substr($match, 0, 1);
if($first == '"' || $first == "'")
$match = substr($match, 1, -1);
EDIT: I have edited this to work a little better than I originally posted.
You almost have it in the second regex:
'#<a[^>]*?href=("|\')?(.*?)[\\1|>]#is'
Returns the following array:
array(3) {
[0]=>
array(4) {
[0]=>
string(92) "<a garbage href="http://this.is.a.link.com/?query=this has invalid spaces" possible garbage>"
[1]=>
string(101) "<a garbage href='http://this.is.a.link.com/?query=this also has has invalid spaces' possible garbage>"
[2]=>
string(94) "<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters possible garbage>"
[3]=>
string(77) "<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters>"
}
[1]=>
array(4) {
[0]=>
string(1) """
[1]=>
string(1) "'"
[2]=>
string(0) ""
[3]=>
string(0) ""
}
[2]=>
array(4) {
[0]=>
string(74) "http://this.is.a.link.com/?query=this has invalid spaces" possible garbage"
[1]=>
string(83) "http://this.is.a.link.com/?query=this also has has invalid spaces' possible garbage"
[2]=>
string(77) "http://this.is.a.link.com/?query=no_spaces_but_no_delimiters possible garbage"
[3]=>
string(60) "http://this.is.a.link.com/?query=no_spaces_but_no_delimiters"
}
}
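It matches with or without delimiters. Note, however, that inside a character class \\1 is not a backreference, so the match only stops at ">"; as the dump shows, group 2 for the first three links runs on past the URL and picks up the closing quote and the trailing attributes.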
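A variant that moves the backreference out of the character class might look like the sketch below. It is an untuned illustration against the four sample links only, not a general HTML parser; $pattern and $matches are illustrative names.

```php
<?php
$html = <<<'HTML'
<a garbage href="http://this.is.a.link.com/?query=this has invalid spaces" possible garbage>
<a garbage href='http://this.is.a.link.com/?query=this also has has invalid spaces' possible garbage>
<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters possible garbage>
<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters>
HTML;

// Group 1 captures the opening quote (or nothing). \1 outside a character
// class is a real backreference, so the URL must end with the same quote
// (or with nothing), followed by whitespace or ">".
$pattern = '#<a[^>]*?href=(["\']?)(.*?)\1[\s>]#is';
preg_match_all($pattern, $html, $matches);

print_r($matches[2]);
```

For the quoted links the lazy (.*?) keeps going through the spaces until it reaches the matching quote; for the unquoted ones \1 is empty, so the URL ends at the first whitespace or ">".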
Use a DOM parser. You cannot parse (x)HTML with regular expressions.
$html = <<<END
<a garbage href="http://this.is.a.link.com/?query=this has invalid spaces" possible garbage>
<a garbage href='http://this.is.a.link.com/?query=this also has has invalid spaces' possible garbage>
<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters possible garbage>
<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters>
END;
$domd = new DOMDocument();
libxml_use_internal_errors(true);
$domd->loadHTML($html);
libxml_use_internal_errors(false);
$items = $domd->getElementsByTagName("a");
foreach ($items as $item) {
var_dump($item->getAttribute("href"));
}
When you say you want to match them, are you trying to extract information out of the links, or simply find hyperlinks with a href? If you're after only the latter, this should work just fine:
/<a[^>]*href=[^\s].*?>/
As @JasonWoof indicated, you need to use an embedded alternation: one alternative for quoted URLs, one for non-quoted. I also recommend using a capturing group to determine which kind of quote is being used, as @DanHorrigan did. With the addition of a negative lookahead ((?!\\2)) and possessive quantifiers (*+), you can create a highly robust regex that is also very quick:
~
<a\\s+[^>]*?\\bhref=
(
(["']) # capture the opening quote
(?:(?!\\2).)*+ # anything else, zero or more times
\\2 # match the closing quote
|
[^\\s>]*+ # anything but whitespace or closing brackets
)
~ix
See it in action on ideone. (The doubled backslashes are because the regex is written in the form of a PHP heredoc. I'd prefer to use a nowdoc, but ideone is apparently still running PHP 5.2.)
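For reference, here is a sketch of the same regex written as a nowdoc (PHP 5.3 or later), so the backslashes are not doubled. The sample HTML is taken from the question; the variable names and the quote-stripping step at the end are my own additions.

```php
<?php
$html = <<<'HTML'
<a garbage href="http://this.is.a.link.com/?query=this has invalid spaces" possible garbage>
<a garbage href='http://this.is.a.link.com/?query=this also has has invalid spaces' possible garbage>
<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters possible garbage>
<a garbage href=http://this.is.a.link.com/?query=no_spaces_but_no_delimiters>
HTML;

// Nowdoc: no escape processing, so every backslash reaches PCRE as written.
$regex = <<<'REGEX'
~
<a\s+[^>]*?\bhref=
(
  (["'])          # capture the opening quote
  (?:(?!\2).)*+   # any character that is not that quote
  \2              # match the closing quote
|
  [^\s>]*+        # unquoted: anything but whitespace or a closing bracket
)
~ix
REGEX;

preg_match_all($regex, $html, $m);

// Group 1 still carries the quotes for quoted URLs; strip them here.
$urls = array_map(function ($url) {
    return trim($url, "\"'");
}, $m[1]);

print_r($urls);
```

The trim() call removes quote characters from both ends, which is enough for these samples but would also strip a quote that legitimately ends an unquoted URL.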
