Why does regex fail to match quotes?

In my WordPress post content, I have the line [yu_TOC title="Short Stories"]. I am trying to match it with
preg_match('/\[yu_TOC title=\"(.*?)\"\s*\]/', $content[0], $matchedTitle);
I printed the line I want to match using error_log(substr($content, 0, 1000));.
The relevant part of the output is [yu_TOC title=”Short Stories”]</p>
Is it expected that the quotes have changed from " to ”?
Why doesn't my pattern match the line it should match?
How can I fix it?
Update: I tried replacing the [] with {}, but the issue is the same.

If the quotes have been changed and you want to match both the straight and the curly version, you can use an alternation inside a capturing group to match either one, and then use the backreference \1 so that the closing quote must be the same character as the opening one.
Your value is then in the second capturing group, because the first group holds the quote used by the backreference.
\[yu_TOC title=("|”)(.*?)\1\s*\]
Note that you don't have to escape " in the pattern.
For example:
$content = ["[yu_TOC title=”Short Stories”]</p>"];
preg_match('/\[yu_TOC title=("|”)(.*?)\1\s*\]/', $content[0], $matchedTitle);
print_r($matchedTitle);
Output
Array
(
[0] => [yu_TOC title=”Short Stories”]
[1] => ”
[2] => Short Stories
)
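The question also asks whether the changed quotes are expected. WordPress's wptexturize filter, which converts straight quotes to typographic ones, is a likely cause, though that is an assumption about this particular setup. As a minimal sketch of an alternative to the alternation above, a character class can accept a straight quote or either curly quote on both sides. The helper name extractTocTitle is hypothetical, and the /u modifier assumes the content is valid UTF-8.

```php
<?php
// Hypothetical helper: returns the title from a [yu_TOC ...] shortcode, or null.
function extractTocTitle(string $content): ?string
{
    // Accept " (straight), U+201C or U+201D (curly) as opening and closing quote.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $pattern = '/\[yu_TOC title=["\x{201C}\x{201D}](.*?)["\x{201C}\x{201D}]\s*\]/u';

    if (preg_match($pattern, $content, $m) === 1) {
        return $m[1]; // only one capturing group, so the title is in group 1
    }
    return null;
}

// Straight quotes, as typed in the editor.
var_dump(extractTocTitle('[yu_TOC title="Short Stories"]'));

// Curly quotes, as seen in the logged content.
var_dump(extractTocTitle("[yu_TOC title=\u{201D}Short Stories\u{201D}]</p>"));
```

Unlike the backreference version, this does not require the closing quote to be the same character as the opening one, which may be preferable if the opening and closing quotes are converted differently.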

Related

Preg Match Return Two Word in Array Position

I am trying to extract all words from a .txt file that follow this structure: %HOUSE% %CAR%
I am using preg_match_all and it works, but when two words are on the same line, the array returns both of them in a single position.
$rawContent = file($_FILES["file"]["tmp_name"]);
$content = implode(" ", $rawContent);
preg_match_all("/%.*%/", $content, $arrMatches);
Array ( [0] => %HOSTNAME% [1] => %INTERFAZ_LAN% [2] => %IP_LAN% %MASK_LAN% [3] => %ID_INTERFACE_WAN% )
Position [2], for example, contains two words.
I think that is a problem of my preg match expression I need to add some

By default, the * quantifier in a regular expression is "greedy", meaning it matches as many characters as possible. In this case, the expression .* is matching IP_LAN% %MASK_LAN.
To change this behavior to non-greedy, that is, to match as few characters as possible, add a question mark after the asterisk, so your pattern becomes /%.*?%/.
Alternatively, you can change your approach and, rather than match any character any number of times, match anything except the percentage sign any number of times: /%[^%]*%/.
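The three patterns discussed above can be compared side by side. This is a small sketch using a sample line modeled on the question's output; the variable names are chosen here for illustration.

```php
<?php
// Sample input modeled on the question: two placeholders on the same line.
$content = "%IP_LAN% %MASK_LAN%";

// Greedy: .* runs to the last % on the line, so both placeholders merge.
preg_match_all('/%.*%/', $content, $greedy);

// Lazy: .*? stops at the first closing % it can find.
preg_match_all('/%.*?%/', $content, $lazy);

// Negated class: [^%]* can never cross a %, so each placeholder is separate.
preg_match_all('/%[^%]*%/', $content, $negated);

print_r($greedy[0]);  // one element holding both placeholders
print_r($lazy[0]);    // two elements
print_r($negated[0]); // two elements
```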

Regexp for handling "test-12-1"-like strings (php)

I need some help writing a regexp to parse input strings like these:
test-12-1
blabla12412-5
t-dsf-gsdg-x-10
into the following matches:
test and 1
blabla12412 and 5
t-dsf-gsdg-x and 10
I tried something like
$matches = [];
preg_match('/^[a-zA-Z0-9]+(-\d+)+$/', 'test-12-1', $matches);
But I received an unexpected result:
array (
0 => 'test-12-1',
1 => '-1',
)
You can try it out in this playground: https://ru.functions-online.com/preg_match.html?command={"pattern":"/^[a-zA-Z0-9]+(-\d+)+$/","subject":"test-12-1"}
Thanks a lot!

You may use
'~^(.*?)(?:-(\d+))+$~'
Details
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
(?:-(\d+))+ - 1 or more occurrences of:
    - - a hyphen
    (\d+) - Group 2: one or more digits (only the last occurrence is kept in the group value, since the group sits inside a repeated non-capturing group)
$ - end of string.
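As a quick check of the pattern against the three inputs from the question, here is a minimal sketch. The helper name splitTrailingNumber is hypothetical and only wraps preg_match for convenience.

```php
<?php
// Hypothetical helper: returns [prefix, last number], or null when the input
// does not end in at least one "-<digits>" segment.
function splitTrailingNumber(string $input): ?array
{
    if (preg_match('~^(.*?)(?:-(\d+))+$~', $input, $m) !== 1) {
        return null;
    }
    // Group 1 is the lazy prefix; group 2 keeps only the last repeated number.
    return [$m[1], $m[2]];
}

foreach (['test-12-1', 'blabla12412-5', 't-dsf-gsdg-x-10'] as $input) {
    [$name, $number] = splitTrailingNumber($input);
    echo "$name and $number\n";
}
```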

How do I locate and replace text with a common element using regex?

I'm pretty lousy at regex, and need help with the following scenario. I need to locate and replace text that has a common structure, but one aspect will be different:
here is a string (with 3 values)
here is another string (with 5 values)
In the above examples, I need to locate and then replace the value in parentheses. I can't search by parentheses alone, as the string may contain other parentheses. But the value that needs to be replaced is consistently constructed: (with # values), where the only difference is the number.
So ideally the regex returns (with 3 values) and (with 5 values) so I can use a simple str_replace to change the text.
This is regex in a PHP script.

Try this regex:
\(with\s+\d+\s+values\)

The following regex should work for you:
/\(with (\d+) values\)/
This matches strings of the specified format and gives the value in a capture group so it may be used in the replacement. PHP's preg functions do not accept a g flag: preg_replace already replaces every occurrence by default, and preg_match_all finds all matches.
If, however, the number can only be a single digit, then the following will work:
/\(with (\d) values\)/
Or, if the number can only be a single digit greater than 1, for example, then the following:
/\(with ([2-9]) values\)/
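Since the goal in the question is to replace the text, the match and the replacement can be done in one step with preg_replace instead of a separate str_replace. The sketch below assumes an illustrative replacement text ("items"), which does not come from the question.

```php
<?php
// Two occurrences in one string, to show that every one is replaced.
$text = 'here is a string (with 3 values) and another (with 5 values)';

// $1 in the replacement refers to the captured number.
// "items" is a made-up replacement word for illustration.
$result = preg_replace('/\(with (\d+) values\)/', '(with $1 items)', $text);

echo $result, "\n";

// If only the matched substrings are needed, preg_match_all collects them.
preg_match_all('/\(with \d+ values\)/', $text, $found);
print_r($found[0]);
```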

If I got you right, you are looking for exactly three or five items within parentheses (comma separated).
This could be accomplished by
\( # "(" literally
(?:[^,()]+,){2} # not , or ( or ) exactly two times
(?:(?:[^,()]+,){2})? # repeated
[^,()]+ # without the comma in the end
\) # the closing parenthesis
See a demo on regex101.com.
If you're really looking for only two variants of the string, you could very easily use
\(with (?:3|5) values\)
or, in general,
\(with \d+ values\)
as proposed by @SchoolBoy.

Something like this maybe
$str = "here is another string (with 5 values)";
preg_match_all("/\(with (\d+) values\)/", $str, $out);
print_r($out);
Output:
Array
(
[0] => Array
(
[0] => (with 5 values)
)
[1] => Array
(
[0] => 5
)
)
It uses the regex
\(with (\d+) values\)
which matches the literal opening parenthesis followed by the string with # values, capturing the actual number #, and finally the closing parenthesis.
It returns the complete match (the parenthesized string) in the first dimension of the result array and the captured number in the second.

PHP Regex URL until a space, \ or " not returning what I need

I am having trouble creating a regex in PHP whereby I need to extract all URLs beginning like
http://hello.hello/asefaesasef my name is
https://aw3raw.com/asdfase/
www.aer.com/afseaegfefsesef\
domain.com/afsegaesga"
I need to basically extract the URL until I hit a white space, a backslash (\) or a double quote (").
I have the following code:
$column = "adsfahttp://hello.hello/asefaesas\"ef asefa aweoija weeij asd sa https://aw3raw.com/asdfase/ asdafewww.aer.com/afseaegfefsesef\ even ashafueh domain.com/afsegaesga\"asdfasda";
preg_match_all("/(http|https):\/\/\S+[^(\"|\\)]+/",$column,$urls);
echo "Url = \n";
print_r($urls);
So I need the extraction to give me:
http://hello.hello/asefaesasef
https://aw3raw.com/asdfase
www.aer.com/afseaegfefsesef
domain.com/afsegaesga
I'm struggling to get my head around it as my result is showing as:
Url =
Array
(
[0] => Array
(
[0] => http://hello.hello/asefaesas"ef asefa aweoija weeij asd sa https://aw3raw.com/asdfase/ asdafewww.aer.com/afseaegfefsesef\ even ashafueh domain.com/afsegaesga
)
[1] => Array
(
[0] => http
)
)

First, you've got the syntax of character classes wrong. Within the square brackets, you don't need parentheses for grouping or pipes for alternation. Just list the characters you're interested in--or in this case, that you want to exclude.
What you're doing now is matching some non-whitespace characters (including \ and "), followed by some not-quote, non-backslash characters (including whitespace). You need to combine both criteria into one negated character class:
preg_match_all("~https?://[^\"\s\\\\]+~", $column, $urls);
Notice that this only matches the URLs starting with http:// or https://. You can make the protocol optional ("~(?:https?://)?[^\"\s\\\\]+~"), but then the regex will match almost anything, making it useless. Are all your URLs at the beginning of a line, the way you showed them? If so, you can use an anchor instead:
preg_match_all('/(?m)^[^\"\s\\\\]+/', $column, $urls);
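To see the combined negated character class in action, here is a short sketch. The input string is an illustrative one written for this example (not the question's exact $column), with one URL ended by whitespace, one by a double quote and one by a backslash.

```php
<?php
// Illustrative input. Single quotes keep the backslash before " end" literal.
$column = 'see http://hello.hello/asefaesasef my name is "https://aw3raw.com/asdfase/" and http://www.aer.com/afseaegfefsesef\ end';

// In a single-quoted PHP string, \\\\ reaches PCRE as \\, i.e. one literal
// backslash inside the class. The class stops at ", whitespace or backslash.
preg_match_all('~https?://[^"\s\\\\]+~', $column, $urls);

print_r($urls[0]);
```

Using ~ as the delimiter avoids having to escape the slashes in http://.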

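You can add a \s to the negated class in your regex, /(http|https):\/\/\S+[^(\"|\\)\s]+/, so that the class no longer matches whitespace. Note that the preceding \S+ still consumes quotes and backslashes, so a match can run past them up to the next whitespace.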

php preg_match_all between ... and

I'm trying to use preg_match_all to match anything between ... and ..., where the text may wrap onto the next line. I've done a number of searches on Google and tried different combinations, but nothing works. I have tried this:
preg_match_all('/...(.*).../m/', $rawdata, $m);
Below is an example of what the format will look like:
...this is a test...
...this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test...

The s modifier allows for . to include new line characters so try:
preg_match_all('/\.{3}(.*?)\.{3}/s', $rawdata, $m);
The m modifier you were using makes ^ and $ act on a per-line basis rather than per string; since your pattern has no ^ or $, it has no effect here.
You can read more about the modifiers in the PHP manual's page on pattern modifiers.
Note the . needs to be escaped as well because it is a special character meaning any character. The ? after the .* makes it non-greedy so it will match the first ... that is found. The {3} says three of the previous character.
Regex101 demo: https://regex101.com/r/eO6iD1/1
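A small sketch of the s-modifier pattern on input that wraps across lines follows. The sample data is made up for illustration and is shorter than the question's example.

```php
<?php
// Two blocks; the second one wraps onto a new line.
$rawdata = "...this is a test...\n...this is a test\nthat wraps...";

// s lets . match newlines; .*? stops at the first closing "...".
preg_match_all('/\.{3}(.*?)\.{3}/s', $rawdata, $m);

// $m[1] holds the text between each pair of "..." markers.
print_r($m[1]);
```

Without the s modifier, the second block would not match at all, because . cannot cross the line break inside it.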

Please escape the literal dots, since the dot is a reserved character in regular expressions (note also that the pattern must end right after the modifier, without a trailing slash):
preg_match_all('/\.\.\.(.*)\.\.\./m', $rawdata, $m);
If what you meant is that there are line breaks within the content to match, you have to allow for them explicitly. Inside a character class the dot is a literal dot, so use a class such as [\s\S] that matches any character including line breaks:
preg_match_all('/\.\.\.([\s\S]*?)\.\.\./', $rawdata, $m);
Check here for reference on what characters the dot includes:
http://www.regular-expressions.info/dot.html

You're almost there;
you just need to update your RE:
/\.{3}(.*)\.{3}/m
RE breakdown
/: the pattern delimiters
\.: match a literal .
{3}: match exactly 3 of the preceding token (in this case, exactly 3 dots)
(.*): capture anything that comes after the first ... (greedy, up to the last ... on the line)
m: multiline mode, which makes ^ and $ match at line boundaries; it does not let . match line breaks (that is the s modifier)
When you put it all together, you'll have this:
$str = "...this is a test...";
preg_match_all('/\.{3}(.*)\.{3}/m', $str, $m);
print_r($m);
outputs
Array
(
[0] => Array
(
[0] => ...this is a test...
)
[1] => Array
(
[0] => this is a test
)
)
