Attempting to understand handling regular expressions with PHP

I am trying to make sense of regular expressions in PHP. So far my code is:
PHP code:
$string = "This is a 1 2 3 test.";
$pattern = '/^[a-zA-Z0-9\. ]$/';
$match = preg_match($pattern, $string);
echo "string: " . $string . " regex response: " . $match;
Why is $match always returning 0 when I think it should be returning a 1?

[a-zA-Z0-9\. ] matches exactly one character that is alphanumeric, "." or " ", and because ^ and $ surround it, the whole string must be that single character. You need to repeat the class:
$pattern = '/^[a-zA-Z0-9. ]+$/';
The + after the closing ] means "one or more".
Note: you don't need to escape . inside a character class.
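A minimal sketch comparing the two patterns on the question's string (the variable names $single and $repeated are illustrative). Without a quantifier the class has to match the entire string as one character, so preg_match() returns 0; with + it returns 1.

```php
<?php
$string = "This is a 1 2 3 test.";

// Original pattern: exactly one allowed character between ^ and $.
$single = preg_match('/^[a-zA-Z0-9\. ]$/', $string);

// Fixed pattern: + repeats the class, so the whole string can match.
$repeated = preg_match('/^[a-zA-Z0-9. ]+$/', $string);

echo "without +: $single, with +: $repeated\n"; // without +: 0, with +: 1
```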

Here's what your pattern is saying:
'/: start of the expression (/ is the delimiter)
^: beginning of the string
[a-zA-Z0-9\. ]: any one alphanumeric character, period or space (if you intend to match any whitespace character, not just a space, use \s instead)
$: end of the string
/': end of the expression
So an example of a string that would match is:
$string = 'a';
Also, if you actually want the matched text, pass a third argument to preg_match; it is filled with the matches:
$numResults = preg_match($pattern, $string, $matches);
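For illustration, a small sketch (variable names are illustrative) showing that the original, unquantified pattern does match a one-character string, and that the third argument receives the matched text by reference. On a failed match that array is left empty.

```php
<?php
$pattern = '/^[a-zA-Z0-9\. ]$/';

// A one-character string is exactly what the unquantified class can match.
$numResults = preg_match($pattern, 'a', $matches);
echo $numResults, ' ', $matches[0], "\n"; // 1 a

// A longer string does not match, and $noMatches is set to an empty array.
$numLong = preg_match($pattern, 'ab', $noMatches);
echo $numLong, ' ', count($noMatches), "\n"; // 0 0
```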

You need a quantifier on the end of your character class, such as +, which means match 1 or more times.

Related

Get the portion of a string between two positions with PHP

I have a string like "some words 12345cm some more words"
and I want to extract the 12345cm bit from that string. So I get the position of the first number:
$position_of_first_number = strcspn( "some words 12345cm some more words" , '0123456789' );
Then I get the position of the first space after $position_of_first_number:
$position_of_space_after_numbers = strpos("some words 12345cm some more words", " ", $position_of_first_number);
Then I want a function that returns the portion of the string between $position_of_first_number and $position_of_space_after_numbers.
How do I do it?
You can use the substr function. Note that it takes a starting position and a length, which you can calculate as the difference between the start and end positions.
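A minimal sketch of the substr() approach, reusing the question's strcspn() and strpos() calls ($start, $end and $portion are illustrative names). It assumes a space actually follows the number; if none does, strpos() returns false and that case would need its own check.

```php
<?php
$s = "some words 12345cm some more words";

// Start: index of the first digit (the question's strcspn call).
$start = strcspn($s, '0123456789');
// End: index of the first space after that digit.
$end = strpos($s, ' ', $start);

// substr() wants a length, so pass end - start.
$portion = substr($s, $start, $end - $start);
echo $portion; // 12345cm
```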
Since you are looking for a pattern like blank-digits-letters-blank, I would recommend a regular expression using preg_match:
$s = "some words 12345cm some more words";
preg_match("/\s(?P<result>\d+[^\W\d_]+)\s/", $s, $matches);
echo $matches["result"];
12345cm
Explaining the pattern:
"/.../" delimits the pattern in PHP
\s matches any whitespace character
(?P<name>...) is a named capturing group; here its contents are available as $matches["result"]
\d+ matches 1 or more digits
[^\W\d_]+ matches 1 or more letters (any character that is not a non-word character, a digit or an underscore; see this answer). With the u modifier, this also covers non-ASCII letters.
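A short sketch of what preg_match() stores for this pattern: index 0 holds the whole match (including both whitespace characters), and the named group is available both under its name and under its numeric index.

```php
<?php
$s = "some words 12345cm some more words";
preg_match('/\s(?P<result>\d+[^\W\d_]+)\s/', $s, $matches);

// Index 0 is the whole match, so it includes the surrounding spaces.
$full = $matches[0];

// The named group appears twice: by name and by position.
echo $matches['result'], ' ', $matches[1]; // 12345cm 12345cm
```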

Looking for specific character in capture group

I need to replace all double quotes in any (variable) given string.
For example:
$text = 'data-caption="hello"world">';
$pattern = '/data-caption="[[\s\S]*?"|(")]*?">/';
$output = preg_replace($pattern, '&quot;', $text);
should result in:
"hello&quot;world"
(The above pattern is my attempt at getting it to work)
The problem is that I don't know in advance whether, or how many, double quotes are going to be in the string.
How can I replace the " with &quot;?
You may match strings between data-caption=" and "> and then, in a preg_replace_callback callback, replace every " inside that match with &quot; using str_replace:
$text = 'data-caption="<element attribute1="wert" attribute2="wert">Name</element>">';
$pattern = '/data-caption="\K.*?(?=">)/';
$output = preg_replace_callback($pattern, function($m) {
return str_replace('"', '&quot;', $m[0]);
}, $text);
print_r($output);
// => data-caption="<element attribute1=&quot;wert&quot; attribute2=&quot;wert">Name</element>">
Details
data-caption=" - starting delimiter
\K - match reset operator
.*? - any 0+ chars other than line break chars, as few as possible
(?=">) - a positive lookahead that requires the "> substring immediately to the right of the current location.
The match is passed to the anonymous function inside preg_replace_callback (accessible via $m[0]) and that is where it is possible to replace all " symbols in a convenient way.
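A sketch applying the same pattern and callback to the question's own input. Only the text between data-caption=" and "> reaches the callback, so the delimiting quotes survive. Note that the lazy .*? stops at the first "> it meets, which is why, with the nested markup above, the quote before >Name stays unconverted.

```php
<?php
$text = 'data-caption="hello"world">';
$pattern = '/data-caption="\K.*?(?=">)/';

// \K drops data-caption=" from the reported match, and the lookahead
// leaves "> outside it, so $m[0] is just the inner text.
$output = preg_replace_callback($pattern, function ($m) {
    return str_replace('"', '&quot;', $m[0]);
}, $text);

echo $output; // data-caption="hello&quot;world">
```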

PHP Array str_replace Whole Word

I'm doing str_replace on a very long string and my $search is an array.
$search = array(
" tag_name_item ",
" tag_name_item_category "
);
$replace = array(
" tag_name_item{$suffix} ",
" tag_name_item_category{$suffix} "
);
echo str_replace($search, $replace, $my_really_long_string);
I added spaces to both $search and $replace because I only want to match whole words. As you may have guessed from my code above, if I removed the spaces and my really long string were:
...
tag_name_item ...
tag_name_item_category ...
...
Then I would get something like
...
tag_name_item_sfx ...
tag_name_item_sfx_category ...
...
This is wrong because I want the following result:
...
tag_name_item_sfx ...
tag_name_item_category_sfx ...
...
So what's wrong?
Nothing really, it works. But I don't like it. Looks dirty, not well coded, inefficient.
I realized I could do this with regular expressions using the \b word-boundary assertion, but I'm not good with regex, so I don't know how to write the preg_replace call.
A possible approach using regular expressions would/could look like this:
$result = preg_replace(
'/\b(tag_name_item(_category)?)\b/',
'$1' . $suffix,
$string
);
How it works:
\b: as you say, a word boundary; this ensures we're only matching whole words, not parts of words
(: we want to reuse part of our match in the replacement string (tag_name_item has to be replaced with itself + a suffix). That's why we use a capturing group, so we can refer back to the match in the replacement string
tag_name_item is a literal match for that string.
(_category)?: Another literal match, grouped and made optional through use of the ? operator. This ensures that we're matching both tag_name_item and tag_name_item_category
): end of the first group (the optional _category match is the second group). This group, essentially, holds the entire match we're going to replace
\b: word boundary again
These matches are replaced with '$1' . $suffix. The $1 is a reference to the first match group (everything inside the outer brackets in the expression). You could refer to the second group using $2, but we're not interested in that group right now.
That's all there is to it really
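A runnable sketch of the specific regex with an example suffix of '_sfx' (the test string is made up). It writes the back-reference as ${1}, which PHP treats the same as $1 but which stays unambiguous if $suffix ever starts with a digit: '$1' . '2' would be read as a reference to group 12.

```php
<?php
$suffix = '_sfx';
$string = "tag_name_item foo tag_name_item_category bar tag_name_itemized";

// ${1} refers to the outer group; braces guard against a digit in $suffix.
$result = preg_replace(
    '/\b(tag_name_item(_category)?)\b/',
    '${1}' . $suffix,
    $string
);
echo $result;
// tag_name_item_sfx foo tag_name_item_category_sfx bar tag_name_itemized
```

tag_name_itemized is left alone because there is no word boundary between tag_name_item and ized.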
More generic:
So, you're trying to suffix all strings starting with tag_name, which, judging by your example, can be followed by any number of snake_cased words. A more generic regex for that would look something like this:
$result = preg_replace(
'/\b(tag_name[a-z_]*)\b/',
'$1' . $suffix,
$string
);
Like before, the use of \b, () and the tag_name literal remains the same. What changed is this:
[a-z_]*: This is a character class. It matches characters a-z (a to z), and underscores zero or more times (*). It matches _item and _item_category, just as it would match _foo_bar_zar_fefe.
These regexes are case-sensitive; if you want to match things like tag_name_XYZ, you'll probably want to use the i flag (case-insensitive): /\b(tag_name[a-z_]*)\b/i
Like before, the entire match is grouped, and used in the replacement string, to which we add $suffix, whatever that might be
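A sketch of the generic regex on a made-up string, with and without the i flag. other_tag_name is not touched because the underscore before tag_name is a word character, so there is no \b there.

```php
<?php
$suffix = '_sfx';
$string = "tag_name_item tag_name_foo_bar other_tag_name Tag_Name_X";

// [a-z_]* takes any run of lowercase letters and underscores after tag_name.
$result = preg_replace('/\b(tag_name[a-z_]*)\b/', '${1}' . $suffix, $string);
echo $result, "\n";
// tag_name_item_sfx tag_name_foo_bar_sfx other_tag_name Tag_Name_X

// With the i flag, the mixed-case variant is suffixed as well.
$resultI = preg_replace('/\b(tag_name[a-z_]*)\b/i', '${1}' . $suffix, $string);
echo $resultI, "\n";
// tag_name_item_sfx tag_name_foo_bar_sfx other_tag_name Tag_Name_X_sfx
```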
To avoid the problem without regular expressions, you can use strtr, which scans the string only once and, at each position, prefers the longest matching key:
$pairs = [ " tag_name_item " => " tag_name_item{$suffix} ",
" tag_name_item_category " => " tag_name_item_category{$suffix} " ];
$result = strtr($str, $pairs);
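A sketch showing why strtr() sidesteps the original problem even with the spaces removed from the keys: it tries the longest key first and never re-scans text it has already replaced. The caveat is that strtr() has no notion of word boundaries, so a longer word such as tag_name_items would still be rewritten to tag_name_item_sfxs.

```php
<?php
$suffix = '_sfx';
$str = "tag_name_item tag_name_item_category";

// Keys without surrounding spaces: the longer key wins where both could match.
$pairs = [
    "tag_name_item" => "tag_name_item{$suffix}",
    "tag_name_item_category" => "tag_name_item_category{$suffix}",
];
$result = strtr($str, $pairs);
echo $result; // tag_name_item_sfx tag_name_item_category_sfx
```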
The following function removes whole words (not substrings of longer words) that match any of the patterns in an array:
<?php
function removePrepositions($text) {
// \b on both sides means only whole words are removed, never parts of longer words
$prepositions = array('/\b,\b/i', '/\bthe\b/i', '/\bor\b/i');
foreach ($prepositions as $pattern) {
$text = preg_replace($pattern, '', trim($text));
}
return trim($text);
}
?>
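As a variation, preg_replace() also accepts an array of patterns and applies them in order, so the loop can be a single call. This sketch uses a made-up sentence and then collapses the doubled spaces that removing words leaves behind. "other" and "theory" survive because \bthe\b only matches "the" as a whole word.

```php
<?php
$text = "The cat or the other theory";

// preg_replace() with an array of patterns applies each one in turn.
$patterns = ['/\bthe\b/i', '/\bor\b/i'];
$out = preg_replace($patterns, '', $text);

// Removing words leaves runs of spaces behind; collapse them and trim.
$out = trim(preg_replace('/\s+/', ' ', $out));
echo $out; // cat other theory
```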

Add + before word, see all between quotes as one word

I have a question. I need to add a + before every word and treat everything between quotes as one word.
I have this code:
preg_replace("/\w+/", '+\0', $string);
which results in this
+test +demo "+bla +bla2"
But I need
+test +demo +"bla bla2"
Can someone help me :)
And is it possible to not add a + if there is already one? So you don't get ++test
Thanks!
Maybe you can use this regex:
$string = '+test demo between "double quotes" and between \'single quotes\' test';
$result = preg_replace('/\b(?<!\+)\w+|["\'].+?["\']/', '+$0', $string);
var_dump($result);
// which will result in:
string '+test +demo +between +"double quotes" +and +between +'single quotes' +test' (length=74)
I've used a 'negative lookbehind' to check for the '+'.
See also: "Regex lookahead, lookbehind and atomic groups".
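A different technique from the lookbehind, sketched here in case quoted phrases may already carry a +: preg_replace_callback consumes an optional leading + together with each token (a quoted phrase or a run of word characters) and decides in PHP whether to add one. addPlus is a hypothetical helper name. One caveat: an apostrophe inside a word, as in it's, would be treated as an opening single quote.

```php
<?php
function addPlus(string $string): string
{
    // A token is a double- or single-quoted phrase or a run of word characters,
    // optionally already preceded by a +.
    return preg_replace_callback(
        '/\+?(?:"[^"]*"|\'[^\']*\'|\w+)/',
        function ($m) {
            // Leave tokens that already start with + alone, so there is no ++.
            return $m[0][0] === '+' ? $m[0] : '+' . $m[0];
        },
        $string
    );
}

echo addPlus('test demo "bla bla2"'), "\n";   // +test +demo +"bla bla2"
echo addPlus('+test demo +"bla bla2"'), "\n"; // +test +demo +"bla bla2"
```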
I can't test this but could you try it and let me know how it goes?
First the regex: choose either an optional '+' followed by a quotation mark, any characters other than a quotation mark and a closing quotation mark, or a series of word characters which may or may not be preceded by a '+'.
I would hope this matches all your examples.
We then get all the matches of the regex in your string with preg_match_all and store them in the variable "$matches", which is an array ($matches[0] holds the full matches). We then loop through this array, testing whether '+' is the first character. If it is, do nothing; otherwise add one.
We then implode the array into a string, separating the elements by a space.
Note: $matches is created when it is passed as the third parameter to preg_match_all.
$regex = '/\+?"[^"]*"|\+?\w+/';
preg_match_all($regex, $string, $matches);
$words = $matches[0];
foreach ($words as &$word) // by reference, so the change is kept
{
if (substr($word, 0, 1) != "+") $word = "+" . $word; // . concatenates strings in PHP
}
unset($word); // break the reference left over from the loop
$result = implode(" ", $words);

PHP preg_match find certain word

I am trying to use preg_match to find a certain word in a string of text.
$pattern = "/" . $myword . "/i";
This pattern will find the word "car" inside "cartoon"...
I only need matches where the word appears on its own.
P.S. The word may be anywhere in the text.
Thanks
Wrap your regex with word-boundaries:
$pattern = "/\b" . $myword . "\b/i";
or, if your $myword may contain regex metacharacters, escape it (pass the delimiter as the second argument so a / is escaped too):
$pattern = "/\b" . preg_quote($myword, '/') . "\b/i";
Try this:
$pattern = "/\b" . $myword . "\b/i";
In regular expressions, the \b escape character represents a "word boundary" character. By wrapping your search term within these boundary matches, you ensure that you will only match the word itself.
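A minimal sketch wrapping the word-boundary pattern in a function; containsWord is a hypothetical helper name. It assumes the word starts and ends with word characters, since \b next to a character like + behaves differently.

```php
<?php
function containsWord(string $text, string $word): bool
{
    // preg_quote() escapes regex metacharacters; passing '/' escapes the delimiter too.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_match($pattern, $text) === 1;
}

var_dump(containsWord('I watch a cartoon', 'car')); // bool(false)
var_dump(containsWord('My Car is red', 'car'));     // bool(true)
```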
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
print_r($matches);
pattern
The pattern to search for, as a string.
subject
The input string.
matches
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
flags
flags can be the following flag:
PREG_OFFSET_CAPTURE
If this flag is passed, for every occurring match the appendant string offset will also be returned. Note that this changes the value of matches into an array where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1.
offset
Normally, the search starts from the beginning of the subject string. The optional parameter offset can be used to specify the alternate place from which to start the search (in bytes).
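Passing an offset is not the same as cutting the string with substr(): ^ still anchors to the start of the whole subject, which is why the example above finds nothing. A sketch contrasting it with \G, which anchors at the position where the search starts ($anchored, $atOffset, $m1 and $m2 are illustrative names):

```php
<?php
$subject = "abcdef";

// ^ means "start of the subject", so starting the search at offset 3 cannot match.
$anchored = preg_match('/^def/', $subject, $m1, PREG_OFFSET_CAPTURE, 3);

// \G means "at the current search position", which is the offset here.
$atOffset = preg_match('/\Gdef/', $subject, $m2, PREG_OFFSET_CAPTURE, 3);

// With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
echo "{$m2[0][0]} at {$m2[0][1]}"; // def at 3
```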
Example:
if (preg_match('/;/', $_POST['value_code']))
{
$input_error = 1;
display_error(_("The semicolon can not be used in the value code."));
set_focus('value_code');
}
