Manipulating strings using PHP

In PHP, I have strings like so:
$string = "This is a 123 test (your) string (for example 1234) to test.";
From that string, I'd like to get the words inside the parentheses together with the numbers. I've tried using explode(), but since the string has two parenthesized groups, I end up getting (your) instead of (for example 1234). I've also used substr() like so:
substr($string, -20)
This works most of the time, but when the string is shorter it also picks up unwanted text. I've also tried a regular expression like so:
/[^for]/
but that did not work either. The text I want always starts with "for", but its length varies. How can I get only the text enclosed in the parentheses that starts with the word "for"?

I might use preg_match() in this case.
preg_match("#\((for.*?)\)#",$string,$matches);
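Any match found is stored in $matches: $matches[0] holds the full match including the parentheses, and $matches[1] holds the captured text inside them.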
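As a minimal sketch of how that call might be used with the string from the question (the variable name $inside is only illustrative, and the null default for "no match" is an assumption):

```php
<?php
$string = "This is a 123 test (your) string (for example 1234) to test.";

// \( and \) match literal parentheses; (for.*?) lazily captures text starting with "for"
$inside = null;
if (preg_match('#\((for.*?)\)#', $string, $matches)) {
    $inside = $matches[1]; // the text between the parentheses
}

var_dump($inside);
```

Note that (your) is skipped because the text right after its opening parenthesis does not start with "for".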

Use the following regular expression:
(\(for.*?\))
It will capture patterns like:
(for)
(foremost)
(for example)
(for 1)
(for: 1000)
Sample PHP code:
$pattern = '/(\(for.*?\))/';
$result = preg_match_all(
    $pattern,
    " text (for example 1000) words (for: 20) other words",
    $matches
);
if ( $result > 0 ) {
    print_r( $matches );
}
The print_r( $matches ) call above prints:
Array
(
    [0] => Array
        (
            [0] => (for example 1000)
            [1] => (for: 20)
        )
    [1] => Array
        (
            [0] => (for example 1000)
            [1] => (for: 20)
        )
)

Use preg_match() with a regular expression:
$matches = array();
// Match a parenthesized group whose text starts with "for" (case-insensitive)
$pattern = '/\((for[^)]*)\)/i';
preg_match($pattern,$string,$matches);
print_r($matches);
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.

$matches = array();
preg_match("/(\(for[\w\d\s]+\))/i",$string,$matches);
var_dump($matches);

Related

find a specific word in string php

I have a text in PHP stored in the variable $row. I'd like to find the position of a certain group of words and that's quite easy. What's not so easy is to make my code recognize that the word it has found is exactly the word i'm looking for or a part of a larger word. Is there a way to do it?
Example of what I'd like to obtain
CODE:
$row= "some ugly text of some kind i'd like to find in someway";
$token= "some";
$pos= -1;
$counter= substr_count($row, $token);
for ($h=0; $h<$counter; $h++) {
    $pos= strpos($row, $token, $pos+1);
    echo $pos.' ';
}
OUTPUT:
what I obtain:
0 18 48
what I'd like to obtain
0 18
Any hint?
Use preg_match_all() with word boundaries (\b):
$search = preg_quote($token, '/');
preg_match_all("/\b$search\b/", $row, $m, PREG_OFFSET_CAPTURE);
Here, preg_quote() is used to escape the user input so that it can be used safely in our regular expression. Some characters have special meaning in regular expression language. Without proper escaping, those characters keep their special meaning instead of being matched literally, and your regex might not work as intended.
In the preg_match_all() statement, we are supplying the following regex:
/\b$search\b/
Explanation:
/ - starting delimiter
\b - word boundary. A word boundary, in most regex dialects, is a position between a word character (\w) and a non-word character (\W).
$search - escaped search term
\b - word boundary
/ - ending delimiter
In plain English, it means: find all occurrences of the given word (here, some) where it appears as a whole word.
Note that we're also using PREG_OFFSET_CAPTURE flag here. If this flag is passed, for every occurring match the appendant string offset will also be returned. See the documentation for more information.
To obtain the results you want, you can simply loop through the $m array and extract the offsets:
$result = implode(' ', array_map(function($arr) {
    return $arr[1];
}, $m[0]));
echo $result;
Output:
0 18
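To see why the preg_quote() step matters, here is a small sketch with a made-up token that contains a regex metacharacter. The input "a.b axb a.b" and the variable names are assumptions chosen for illustration only.

```php
<?php
// Hypothetical input: the token contains ".", which is a regex metacharacter
$row   = "a.b axb a.b";
$token = "a.b";

// Unescaped: "." matches any character, so "axb" is matched as well
preg_match_all("/\b$token\b/", $row, $raw, PREG_OFFSET_CAPTURE);

// Escaped with preg_quote(): "." is matched literally
$search = preg_quote($token, '/');
preg_match_all("/\b$search\b/", $row, $quoted, PREG_OFFSET_CAPTURE);

// Each match is array(matched text, offset); keep only the offsets
$rawOffsets    = array_column($raw[0], 1);
$quotedOffsets = array_column($quoted[0], 1);

echo implode(' ', $rawOffsets), "\n";    // 0 4 8
echo implode(' ', $quotedOffsets), "\n"; // 0 8
```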
What you're looking for is a regex with word boundaries combined with the flag that returns the offset (PREG_OFFSET_CAPTURE).
PREG_OFFSET_CAPTURE
If this flag is passed, for every occurring match the appendant
string offset will also be returned. Note that this changes the
value of matches into an array where every element is an array
consisting of the matched string at offset 0 and its string offset
into subject at offset 1.
$row= "some ugly text of some kind i'd like to find in someway";
$pattern= "/\bsome\b/i";
preg_match_all($pattern, $row, $matches, PREG_OFFSET_CAPTURE);
And we get something like this:
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => some
                    [1] => 0
                )
            [1] => Array
                (
                    [0] => some
                    [1] => 18
                )
        )
)
Then loop through the matches and extract the offset at which the needle was found in the haystack.
// store the positions of the match
$offsets = array();
foreach($matches[0] as $match) {
    $offsets[] = $match[1];
}
// display the offsets
echo implode(' ', $offsets);
Use preg_match():
if(preg_match("/some/", $row))
// [..]
The first argument is a regex, which can match virtually anything you want to match. But, there are dire warnings about using it to match things like HTML.

preg_replace_callback regex issue, match with (.*?) returns array

Given the string {{esc}}"Content"{{/esc}} ... {{esc}}"More content"{{/esc}} I would like to output \"Content\" ... \"More content\", i.e., I am trying to escape the quotes inside a string. (This is a contrived example, though, so an answer with something like 'just use this library to do it' would be unhelpful.)
Here is my current solution:
return preg_replace_callback(
    '/{{esc}}(.*?){{\/esc}}/',
    function($m) {
        return str_replace('"', '\\"', $m[1]);
    },
    $text
);
As you can see, I need to say $m[1], because a print_r reveals that $m looks like this:
Array
(
[0] => {{esc}}"Content"{{/esc}}
[1] => "Content"
)
or, for the second match,
Array
(
[0] => {{esc}}"More content"{{/esc}}
[1] => "More content"
)
My question is: why does my regex cause $m to be an array? Is there any way I can get the result of $m[1] as just a single variable $m?
The regex matches the string and puts the result into an array. If there is a match, the first element stores the whole matched string, and the remaining elements are the captured substrings.
preg_replace_callback() acts like preg_match():
$result = array();
preg_match('/{{esc}}(.*?){{\/esc}}/', $input_str, $result);
// $result will be a populated array if there is a match.
With the help of Jack, I answered my own question here since srain did not make this point clear: The second element of the array is the result captured by the parenthesized subexpression (.*?), per the PHP manual. Indeed, there does not appear to be a convenient way to extract the string matched by this subexpression otherwise.
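If the goal is simply to avoid writing $m[1] in every callback, one option is a thin wrapper that unpacks the array before calling your own function. This is only a sketch: replace_group1() is a hypothetical helper, not a PHP built-in, and it assumes the pattern always has at least one capture group.

```php
<?php
// Hypothetical helper (not part of PHP): forwards only capture group 1 to $fn
function replace_group1(string $pattern, callable $fn, string $subject): string
{
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($fn) {
            // $m[0] is the whole match, $m[1] is the first capture group
            return $fn($m[1]);
        },
        $subject
    );
}

$text = '{{esc}}"Content"{{/esc}} ... {{esc}}"More content"{{/esc}}';

$escaped = replace_group1(
    '/{{esc}}(.*?){{\/esc}}/',
    function (string $content) {
        return str_replace('"', '\\"', $content);
    },
    $text
);

echo $escaped, "\n";
```

The callback passed to preg_replace_callback() itself still receives an array; the wrapper only hides that detail from the inner function.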

Split string depending on the existence of a leading character

In PHP, I need to split a string by ":" characters without a leading "*".
This is what using explode() does:
$string = "1*:2:3*:4";
explode(":", $string);
output: array("1*", "2", "3*", "4")
However the output I need is:
output: array("1*:2", "3*:4")
How would I achieve the desired output?
You're probably looking for preg_match_all() rather than explode(), as you are attempting a more complex split than explode() itself can handle. preg_match_all() will allow you to gather all of the parts of a string that match a specific pattern, expressed using a regular expression. The pattern you are looking for is something along the lines of:
anything except : followed by *: followed by anything but :
So, try this instead:
preg_match_all('/[^:]+\*:[^:]+/', $string, $matches);
print_r($matches);
Which will output something like:
Array
(
    [0] => Array
        (
            [0] => 1*:2
            [1] => 3*:4
        )
)
You should be able to use this in much the same way as the results of explode(), even though there is an added dimension in the array: it divides the matches into 'groups', and all your results match the whole expression, which is the first (0th) group.
$str = '1*:2:3*:4';
$res = preg_split('~(?<!\*):~',$str);
print_r($res);
will output
Array
(
[0] => 1*:2
[1] => 3*:4
)
The pattern basically says:
split by [a colon that is not preceded by an asterisk]
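The preg_match_all() and preg_split() approaches agree on the sample string, but they can differ on other inputs. As a sketch, with a hypothetical input whose last segment contains no "*:" at all:

```php
<?php
// Hypothetical input: the last segment "5" contains no "*:"
$string = "1*:2:3*:4:5";

// Pattern-based extraction: only returns pieces that contain "*:"
preg_match_all('/[^:]+\*:[^:]+/', $string, $matches);
$matched = $matches[0];

// Lookbehind split: cuts at every ":" not preceded by "*" and keeps every piece
$split = preg_split('~(?<!\*):~', $string);

print_r($matched); // 1*:2, 3*:4
print_r($split);   // 1*:2, 3*:4, 5
```

Which behaviour is preferable depends on whether segments without "*:" should be kept, which the question does not specify.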

Get all occurrences of words between curly brackets

I have a text like:
This is a {demo} phrase made for {test}
I need to get
demo
test
Note: My text can have more than one block of {}, not always two. Example:
This is a {demo} phrase made for {test} written in {English}
I used this expression /{([^}]*)}/ with preg_match but it returns only the first word, not all words inside the text.
Use preg_match_all instead:
preg_match_all($pattern, $input, $matches);
It's much the same as preg_match, with the following stipulations:
Searches subject for all matches to the regular expression given in
pattern and puts them in matches in the order specified by flags.
After the first match is found, the subsequent searches are continued
on from end of the last match.
Your expression is correct, but you should be using preg_match_all() instead to retrieve all matches. Here's a working example of what that would look like:
$s = 'This is a {demo} phrase made for {test}';
if (preg_match_all('/{([^}]*)}/', $s, $matches)) {
    echo join("\n", $matches[1]);
}
To also capture the position of each match, you can pass PREG_OFFSET_CAPTURE as the fourth parameter to preg_match_all. For example:
if (preg_match_all('/{([^}]*)}/', $s, $matches, PREG_OFFSET_CAPTURE)) {
    foreach ($matches[1] as $match) {
        echo "{$match[0]} occurs at position {$match[1]}\n";
    }
}
Because { and } are also part of regex syntax (they delimit quantifiers such as {3,}), it is safest to escape these characters:
<?php
$text = <<<EOD
this {is} some text {from}
which I {may} want to {extract}
some words {between} brackets.
EOD;
preg_match_all("!\{(\w+)\}!", $text, $matches);
print_r($matches);
?>
produces
Array
(
    [0] => Array
        (
            [0] => {is}
            [1] => {from}
            [2] => {may}
            [3] => {extract}
            [4] => {between}
        )
    ... etc ...
)
This example may be helpful to understand the use of curly brackets in regexes:
<?php
$str = 'abc212def3456gh34ij';
preg_match_all("!\d{3,}!", $str, $matches);
print_r($matches);
?>
which returns:
Array
(
    [0] => Array
        (
            [0] => 212
            [1] => 3456
        )
)
Note that '34' is excluded from the results because the \d{3,} requires a match of at least 3 consecutive digits.
Matching the portions between pairs of braces with a regex is less robust than using a stack for this purpose. A regex works as a «quick and dirty patch», but for properly parsing and processing the input string you should use a stack.
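The stack idea could look roughly like the sketch below. The function name brace_groups() and the choice to return inner groups before the groups that enclose them are assumptions made for illustration; unbalanced braces are simply ignored here.

```php
<?php
// Hypothetical helper: returns the text of every balanced {...} group.
// Nested groups are returned as they close, so inner groups come first.
function brace_groups(string $text): array
{
    $stack  = []; // offsets of the currently open "{" characters
    $groups = [];
    $length = strlen($text);

    for ($i = 0; $i < $length; $i++) {
        if ($text[$i] === '{') {
            $stack[] = $i;
        } elseif ($text[$i] === '}' && $stack !== []) {
            $open = array_pop($stack);
            // Text strictly between the matching braces
            $groups[] = substr($text, $open + 1, $i - $open - 1);
        }
    }

    return $groups;
}

print_r(brace_groups('This is a {demo} phrase made for {test}')); // demo, test
print_r(brace_groups('a {b {c} d} e'));                           // c, b {c} d
```

Unlike the /{([^}]*)}/ pattern, this also handles nested braces, which is the main reason to reach for a stack.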

Regular expression to parse pipe-delimited data enclosed in double braces

I'm trying to match a string like this:
{{name|arg1|arg2|...|argX}}
with a regular expression
I'm using preg_match with
/{{(\w+)\|(\w+)(?:\|(.+))*}}/
but I get something like this, whenever I use more than two args
Array
(
[0] => {{name|arg1|arg2|arg3|arg4}}
[1] => name
[2] => arg1
[3] => arg2|arg3|arg4
)
The first two items cannot contain spaces, the rest can.
Perhaps I'm working too long on this, but I can't find the error - any help would be greatly appreciated.
Thanks Jan
Don't use regular expressions for this kind of simple task. What you really need is:
$inner = substr($string, 2, -2);
$parts = explode('|', $inner);
# And if you want to make sure the string has opening/closing braces,
# check the original $string (not $inner, which no longer has them):
$length = strlen($string);
assert($string[0] === '{');
assert($string[1] === '{');
assert($string[$length - 1] === '}');
assert($string[$length - 2] === '}');
The problem is here: \|(.+)
Regular expressions, by default, match as many characters as possible. Since . is any character, other instances of | are happily matched too, which is not what you would like.
To prevent this, you should exclude | from the expression, saying "match anything except |", resulting in \|([^\|]+).
Should work for anywhere from 1 to N arguments
<?php
$pattern = "/^\{\{([a-z]+)(?:\}\}$|(?:\|([a-z]+))(?:\|([a-z ]+))*\}\}$)/i";
$tests = array(
"{{name}}" // should pass
, "{{name|argOne}}" // should pass
, "{{name|argOne|arg Two}}" // should pass
, "{{name|argOne|arg Two|arg Three}}" // should pass
, "{{na me}}" // should fail
, "{{name|arg One}}" // should fail
, "{{name|arg One|arg Two}}" // should fail
, "{{name|argOne|arg Two|arg3}}" // should fail
);
foreach ( $tests as $test )
{
    if ( preg_match( $pattern, $test, $matches ) )
    {
        echo $test, ': Matched!<pre>', print_r( $matches, 1 ), '</pre>';
    } else {
        echo $test, ': Did not match =(<br>';
    }
}
Of course you would get something like this :) A regular expression cannot return a dynamic number of captures, which is what your arguments would need.
Looking at what you want to do, you should keep your current regular expression and just explode the extra args on '|', adding them to an args array.
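A rough sketch of that suggestion follows. It keeps the shape of the pattern from the question, but makes the trailing group optional and anchors the match; those two changes, the sample input and the variable names are my assumptions.

```php
<?php
// Hypothetical input; per the question, only the later args may contain spaces
$input = '{{name|arg1|arg 2|arg3|arg4}}';

$name = null;
$args = [];

// Group 1: name, group 2: first arg, group 3: everything after it (if any)
if (preg_match('/^\{\{(\w+)\|(\w+)(?:\|(.+))?\}\}$/', $input, $m)) {
    $name = $m[1];
    $args = [$m[2]];
    if (isset($m[3])) {
        // Split the remaining "a|b|c" tail into individual arguments
        $args = array_merge($args, explode('|', $m[3]));
    }
}

print_r($args); // arg1, arg 2, arg3, arg4
```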
Indeed, this is from the PCRE manual:
When a capturing subpattern is
repeated, the value captured is the
substring that matched the final
iteration. For example, after
(tweedle[dume]{3}\s*)+ has matched
"tweedledum tweedledee" the value of
the captured substring is
"tweedledee". However, if there are
nested capturing subpatterns, the
corresponding captured values may have
been set in previous iterations. For
example, after /(a|(b))+/ matches
"aba" the value of the second captured
substring is "b".
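The two examples from the manual can be checked directly in PHP. The third call is my own addition: it applies the [^|]+ fix to the pattern from the question, to show that the repeated group then keeps only the last argument.

```php
<?php
// Repeated capturing group: only the final iteration is kept
preg_match('/(tweedle[dume]{3}\s*)+/', 'tweedledum tweedledee', $first);

// Nested group: group 2 keeps the value set in an earlier iteration
preg_match('/(a|(b))+/', 'aba', $second);

// The question's pattern with .+ replaced by [^|]+ (braces escaped for clarity)
preg_match(
    '/\{\{(\w+)\|(\w+)(?:\|([^|]+))*\}\}/',
    '{{name|arg1|arg2|arg3|arg4}}',
    $third
);

echo $first[1], "\n";  // tweedledee
echo $second[2], "\n"; // b
echo $third[3], "\n";  // arg4
```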
