How to write such url pattern? - php

I need a URL pattern for my router that matches:
/page_name.html
/page_name.html/1
/page_name.html/2
....
/page_name.html/999
preg_match() must put page_name into matches[1] and the number after the slash into matches[2] (or an empty string; index [2] must always be present!).
The pattern must not match:
/page_name.html/
/page_name.html131
I wrote this:
/^\/([\w\-]+)\.html[\/]?([\d]{1,3})?$/
But it matches URLs like /page_name.html123 and doesn't put anything into matches[2] if there is no digit.

You can use this regex:
preg_match('~^/([\w-]+)\.html(?|/(\d{1,3})|())$~', $input, $matches);
(?|...) - Subpatterns declared within each alternative of this construct will start over from the same index. This is to make sure to always populate $matches[2] with something, even an empty string.
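As a quick check of the branch reset group, here is a minimal sketch. The helper name matchPageUrl is hypothetical and only wraps the pattern above so the accepted and rejected URLs from the question can be tried side by side.

```php
<?php
// Hypothetical helper (not part of the answer): returns the matches
// array for an accepted path, or null when the path is rejected.
function matchPageUrl(string $path): ?array
{
    // (?|...) is a branch reset group: both alternatives write to group 2,
    // so $m[2] is either the page number or an empty string.
    $pattern = '~^/([\w-]+)\.html(?|/(\d{1,3})|())$~';
    return preg_match($pattern, $path, $m) === 1 ? $m : null;
}

var_dump(matchPageUrl('/page_name.html'));    // [1] => 'page_name', [2] => ''
var_dump(matchPageUrl('/page_name.html/42')); // [1] => 'page_name', [2] => '42'
var_dump(matchPageUrl('/page_name.html/'));   // NULL
var_dump(matchPageUrl('/page_name.html131')); // NULL
```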

Related

PHP preg_replace_callback match string but exclude urls

What I'm trying to do is find all the matches within a content block, but ignore anything that is inside tags, for use inside preg_replace_callback().
For example:
test
test title
test
In this case, I want the first line to match, and the third line to match, but NOT the url match, nor the title match in between the a tags.
I've got a regex that I feel like is close:
#(?!<.*?)(\btest\b)(?![^<>]*?>)#si
(and this will not match the url part)
But how do I modify the regex to also exclude the "test" between a and /a?
If it's always the same pattern you can use [A-Z] or a combination like [A-Za-z]
I ended up solving it myself. This regex pattern will do what I wanted:
#(?!<a[^>]*?>)(\btest\b)(?![^<]*?<\/a>)#si
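A minimal sketch of that pattern inside preg_replace_callback(). The sample HTML is an assumption (the markup in the question was not preserved), and uppercasing the match is only an illustrative callback.

```php
<?php
// Assumed sample input: plain "test", a link whose URL and text both
// contain "test", then plain "test" again.
$html = "test\n<a href=\"http://test.com\">test title</a>\ntest";

$pattern = '#(?!<a[^>]*?>)(\btest\b)(?![^<]*?<\/a>)#si';

// Uppercase every match that is not followed by a closing </a>
// before the next "<".
$result = preg_replace_callback($pattern, function (array $m): string {
    return strtoupper($m[1]);
}, $html);

echo $result;
```

With this input only the first and last "test" are replaced. The trailing lookahead does the work here: it rejects any "test" that is followed by a closing </a> with no other tag in between, so link text containing nested tags would probably need an HTML parser instead.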

Regex - Match characters but don't include within results

I have got the following Regex, which ALMOST works...
(?:^https?:\/\/)(?:www|[a-z]+)\.([^.]+)
I need the result to be the only result, or within the same position in the Array.
So, for example, http://m.facebook.com/ matches perfectly; there is only 1 group.
However, if I change it to http://facebook.com/ then I get com/ in place of where facebook should be. So I need to have (?:www|[a-z]+) as an optional check really.
Edit:
What I expect is just to match facebook, if ANY of the strings are as follows:
http://www.facebook.com
http://facebook.com
http://m.facebook.com
And obviously the https counterparts.
This is my Regex now
(?:^https?:\/\/)(?:www)?\.?([^.]+)
This is close; however, it matches the m when I try http://m.facebook.com
https://regex101.com/r/GDapY5/1
So I need to have (?:www|[a-z]+) as an optional check really.
A ? at the end of a pattern is generally used for "optional" bits -- it means "match zero or one" of that thing, so your subpattern would be something like this:
(?:www|[a-z]+)?
If you're simply trying to get the second level domain, I wouldn't bother with regex, because you'll be constantly adjusting it to handle special cases you come across. Just split on dots and take the penultimate value:
$domain = array_reverse(explode('.', parse_url($str)['host']))[1];
Or:
$domain = array_reverse(explode('.', parse_url($str, PHP_URL_HOST)))[1];
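The split-on-dots idea can be wrapped with a couple of guards. This is a sketch only: the function name secondLevelLabel is hypothetical, and returning null for hosts without a second label is an assumption about the desired behavior.

```php
<?php
// Hypothetical helper: returns the penultimate host label, or null when
// the URL has no host or the host has fewer than two labels.
function secondLevelLabel(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // no host component, or a malformed URL
    }
    $labels = array_reverse(explode('.', $host));
    return $labels[1] ?? null;
}

echo secondLevelLabel('http://www.facebook.com'), "\n"; // facebook
echo secondLevelLabel('http://facebook.com/'), "\n";    // facebook
echo secondLevelLabel('https://m.facebook.com'), "\n";  // facebook
```

Note that a host such as example.co.uk would yield co with this approach, so multi-part suffixes need separate handling.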
Perhaps you could make the first m. part optional with (?:\w+\.)?.
Instead of a capturing group you could use \K to reset the starting point of the reported match.
Then match one or more word characters \w+ and use a positive lookahead to assert that what follows is a dot (?=\.)
For example:
^https?://(?:www)?(?:\w+\.)?\K\w+(?=\.)
Edit: Or you could match for m. or www. using an alternation:
^https?://(?:m\.|www\.)?\K\w+(?=\.)
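For illustration, the alternation variant can be tried against the three URLs from the question. The helper name domainLabel is hypothetical; because of \K the result is read from $m[0] rather than from a capture group.

```php
<?php
// Hypothetical helper wrapping the \K pattern from the answer above.
function domainLabel(string $url): ?string
{
    // \K drops "http://", "m." or "www." from the reported match,
    // and (?=\.) requires a dot right after the label.
    $pattern = '~^https?://(?:m\.|www\.)?\K\w+(?=\.)~';
    return preg_match($pattern, $url, $m) === 1 ? $m[0] : null;
}

foreach (['http://www.facebook.com', 'http://facebook.com', 'https://m.facebook.com'] as $url) {
    echo $url, ' => ', domainLabel($url), "\n"; // each prints "facebook"
}
```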

php preg_match_all and recursive regexp

I have some trouble with a generated file, and I would like to make some substitutions.
Say I have this pattern:
<ul/><htmlelement>some text</htmlelement>
I want my regexp to find the value of some text. Since I can match the element name htmlelement with a regexp, I want to reuse that match later in the same regex, like:
preg_match_all("#<ul/><([^><])>(.)*</(first capturing match)>#", $string, $matches);
Do you have a solution?
You are missing the + quantifier after the character class for the "htmlelement" opening tag (1).
You need the * inside the capture group (2),
and it is better to make it non-greedy with ? (3).
Refer to the "first capturing match" with \1 (4).
So the regex should be:
<ul\/><([^><]+)>(.*?)<\/\1>
             ^    ^^    ^
             1    23    4
Demo: https://regex101.com/r/f25N9J/1
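A short sketch of the corrected pattern in preg_match_all(). The second element in the sample string is an assumption, added only to show that \1 follows whichever tag name was captured.

```php
<?php
// Sample input: the first element is from the question, the second
// (<b>more</b>) is made up for illustration.
$string = '<ul/><htmlelement>some text</htmlelement><ul/><b>more</b>';

// With # as the delimiter the slashes need no escaping; \1 refers back
// to the tag name captured by the first group.
preg_match_all('#<ul/><([^><]+)>(.*?)</\1>#', $string, $matches);

print_r($matches[1]); // tag names:  htmlelement, b
print_r($matches[2]); // inner text: some text, more
```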

Regex to match a section between two static url components

I have a url like so: http://example.com/c/TEXTTOMATCH/. The problem is that the url isn't always like that; sometimes it's http://example.com/c/TEXTTOMATCH/#/?test. I'm trying to use a regex to grab everything between /c/ and /. I've tried
$catpreg = preg_match('/c(.*)/', $reffer, $matches);
but it fails.
How about this:
<?php
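$url = 'http://example.com/wreqwreqrq/rfqewrqwe/c/TEXTTOMATCH/';
// Keep only the path, so a trailing "#/?test" fragment is ignored
$path = parse_url($url, PHP_URL_PATH);
$segments = explode('/', $path);
// Find the "c" segment; the wanted text is the segment right after it
$find = array_search('c', $segments, true);
if ($find !== false && isset($segments[$find + 1])) {
    echo $segments[$find + 1];
}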
Try this:
preg_match('#/c/(.*?)/#', $reffer, $matches);
You were just matching everything after c, not matching the slashes. The slashes in your call were being used as the delimiters around the regexp; I used # as the delimiters so I could use / inside the regexp without having to escape it.
The non-greedy quantifier .*? ensures that it only matches TEXTTOMATCH in the second example, not TEXTTOMATCH/#.
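To see the non-greedy match on both URL shapes from the question, a minimal sketch follows; the helper name categoryFromUrl is hypothetical.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function categoryFromUrl(string $url): ?string
{
    // .*? stops at the first slash after /c/
    return preg_match('#/c/(.*?)/#', $url, $m) === 1 ? $m[1] : null;
}

echo categoryFromUrl('http://example.com/c/TEXTTOMATCH/'), "\n";        // TEXTTOMATCH
echo categoryFromUrl('http://example.com/c/TEXTTOMATCH/#/?test'), "\n"; // TEXTTOMATCH
```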

How to preg_match '{95}1340{113}1488{116}1545{99}1364'

I want to preg_match the following string exactly as it is:
$this_string = '{95}1340{113}1488{116}1545{99}1364';
My best try was
preg_match('/^[\{\d+\}\d+]+$/', $this_string);
That matches
{95}1340{113}1488
but also
{95}1340{113}
which is wrong.
I think I know why it matches the last example: one item, {95}1340, already matched, so the '+' is always satisfied. But I don't know how to say that every repetition must be a complete match of what is inside '[…]'.
I expect only matches like these:
{…}…
{…}…{…}…
{…}…{…}…{…}…
One of my attempts:
^(\{\d+\}\d+)+$
also returns
{99}1364
from the very end of the string as a second result, so I get back an array with two elements:
Array[0] = {95}1340{113}1488{116}1545{99}1364 and
Array[1] = {99}1364
The problem is the unnecessary use of a character class in your regex, i.e. [ and ].
You can use:
'/^(\{\d+\}\d+)+$/'
Written more plainly, your regex is equivalent to /^[\{\}0-9+]+$/: it matches any string made up only of the characters {}0123456789+, in any order.
What you want is grouping, and grouping needs parentheses () rather than a character class [], so replace [] with ().
Short answer: '/^(\{\d+\}\d+)+$/'
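The extra array element mentioned in the question comes from the capturing group, which keeps the text of its last repetition. A small sketch, assuming only validation is needed, compares it with a non-capturing group (?:...):

```php
<?php
$valid   = '{95}1340{113}1488{116}1545{99}1364';
$invalid = '{95}1340{113}';

// Capturing group: $m[1] holds the text of the last repetition.
preg_match('/^(\{\d+\}\d+)+$/', $valid, $m);

// Non-capturing group: only the full match is returned.
preg_match('/^(?:\{\d+\}\d+)+$/', $valid, $n);

// The incomplete string is rejected, since {113} has no digits after it.
$rejected = preg_match('/^(?:\{\d+\}\d+)+$/', $invalid) === 0;

print_r($m);         // [0] => full string, [1] => {99}1364
print_r($n);         // [0] => full string
var_dump($rejected); // bool(true)
```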
What you are trying to do is a little unclear. Since your last edit, I assume that you want to check the global format of the string and to extract all items (i.e. {121}1231) one by one. To do that you can use this code:
$str = '{95}1340{113}1488{116}1545{99}1364';
$pattern = '~\G(?:{\d+}\d+|\z)~';
if (preg_match_all($pattern, $str, $matches) && empty(array_pop($matches[0])))
    print_r($matches[0]);
\G is an anchor for the start of the string or the end of the previous match
\z is an anchor for the end of the string
The alternation with \z is only needed to check that the last match is at the end of the string. If the last match is empty, you are sure that the format is correct until the end.
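The same check can be packaged as a function. This is a sketch under assumptions: the name extractItems and the null return for malformed input are hypothetical, and the braces are escaped here for clarity, which should not change what the pattern matches.

```php
<?php
// Hypothetical helper: returns the list of {n}m items, or null when the
// string does not consist entirely of such items.
function extractItems(string $str): ?array
{
    // \G chains each match to the end of the previous one; the \z branch
    // adds an empty final match only if the chain reaches the end.
    $pattern = '~\G(?:\{\d+\}\d+|\z)~';
    if (preg_match_all($pattern, $str, $matches) && array_pop($matches[0]) === '') {
        return $matches[0];
    }
    return null;
}

print_r(extractItems('{95}1340{113}1488{116}1545{99}1364')); // four items
var_dump(extractItems('{95}1340{113}'));                     // NULL
```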
