Regex to match a section between two static url components

I have a URL like this: http://example.com/c/TEXTTOMATCH/. The problem is that the URL isn't always in that form; sometimes it's http://example.com/c/TEXTTOMATCH/#/?test. I'm trying to use a regex to grab everything between /c/ and the next /. I've tried
$catpreg = preg_match('/c(.*)/', $reffer, $matches);
but it fails.

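How about this:
<?php
$url = 'http://example.com/wreqwreqrq/rfqewrqwe/c/TEXTTOMATCH/';
// Keep only the path part of the URL
$split_url = parse_url($url, PHP_URL_PATH);
$e = explode('/', $split_url);
// Find the index of the "c" segment; the wanted text is the next segment
$find = array_search('c', $e, true);
if ($find !== false && isset($e[$find + 1])) {
    echo $e[$find + 1];
}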

Try this:
preg_match('#/c/(.*?)/#', $reffer, $matches);
You were just matching everything after c, not matching the slashes. The slashes in your call were being used as the delimiters around the regexp; I used # as the delimiter so I could use / inside the regexp without having to escape it.
The non-greedy quantifier .*? ensures that it only matches TEXTTOMATCH in the second example, not TEXTTOMATCH/#.
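
A minimal sketch of the difference, run against the two URL forms from the question. The array and result variable names are only illustrative:

```php
<?php
// The two URL forms from the question.
$urls = [
    'http://example.com/c/TEXTTOMATCH/',
    'http://example.com/c/TEXTTOMATCH/#/?test',
];

$results = [];
foreach ($urls as $reffer) {
    // '#' is the delimiter, so '/' needs no escaping inside the pattern.
    // The lazy (.*?) stops at the first '/' after '/c/'.
    if (preg_match('#/c/(.*?)/#', $reffer, $matches)) {
        $results[] = $matches[1];
    }
}

// For comparison: the greedy (.*) runs on to the last '/' in the string.
preg_match('#/c/(.*)/#', $urls[1], $greedy);

print_r($results);      // both entries are TEXTTOMATCH
echo $greedy[1], "\n";  // TEXTTOMATCH/#
```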

Related

PHP regex last occurrence of words

My string is: /var/www/domain.com/public_html/foo/bar/folder/another/..
I want to remove the root folder from this string to get only the public folder, because some servers have multiple websites inside.
My actual regex is: /^(.*?)(www|public_html|public|html)/s
My actual result is: /domain.com/public_html/foo/bar/folder/another/..
But I want to remove everything up to the last occurrence and get something like this: /foo/bar/folder/another/..
Thanks!

You have to use a greedy quantifier and to check if the alternative is enclosed between slashes using lookarounds:
/^.*(?<![^\/])(?:www|public(?:_html)?|html)(?![^\/])/
About the lookarounds: I use negative lookarounds with a negated character class to check for either a slash or the boundary of the string in a single test. This way you can be sure that, for instance, html is a whole folder name and not part of another folder name.
I removed the s modifier, which is useless here. I removed the capture groups too, since the goal is to replace the whole match with an empty string.
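
As a rough sketch, this is how the pattern could be applied with preg_replace to the path from the question. The variable names are assumptions:

```php
<?php
$path = '/var/www/domain.com/public_html/foo/bar/folder/another/..';

// Greedy .* backtracks from the end, so the LAST matching folder wins.
// (?<![^\/]) : preceded by a slash or by the start of the string
// (?![^\/])  : followed by a slash or by the end of the string
$pattern = '/^.*(?<![^\/])(?:www|public(?:_html)?|html)(?![^\/])/';

$public = preg_replace($pattern, '', $path);
echo $public, "\n"; // /foo/bar/folder/another/..
```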

The ? makes your expression non-greedy which is not actually what you want here. Try:
^(.*)(www|public_html|public|html)
which should keep going until the last match.
Demo: https://regex101.com/r/v5WbB3/1/
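
One caveat worth sketching: without a check on what surrounds the keyword, the greedy pattern can stop inside a longer folder name. The path below is hypothetical, and the second pattern is the whole-folder-name variant from the other answer, written with ~ delimiters:

```php
<?php
// Hypothetical path: "htmlcache" merely starts with "html".
$path = '/var/www/site/public/htmlcache/file';

// Plain greedy pattern: the last "html" it can find is inside "htmlcache".
$plain = preg_replace('/^(.*)(www|public_html|public|html)/', '', $path);

// Variant that only accepts whole folder names, shown for comparison.
$bounded = preg_replace('~^.*(?<![^/])(?:www|public(?:_html)?|html)(?![^/])~', '', $path);

echo $plain, "\n";   // cache/file
echo $bounded, "\n"; // /htmlcache/file
```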

How to write such url pattern?

I need URL pattern for my router which would match with:
/page_name.html
/page_name.html/1
/page_name.html/2
....
/page_name.html/999
And preg_match() must put page_name into matches[1] and digit after slash into matches[2] (or empty string, index [2] must always be present!).
I need these not to match my pattern:
/page_name.html/
/page_name.html131
I wrote this:
/^\/([\w\-]+)\.html[\/]?([\d]{1,3})?$/
But it matches URLs like /page_name.html123 and doesn't put anything into matches[2] if there is no digit.

You can use this regex:
preg_match('~^/([\w-]+)\.html(?|/(\d{1,3})|())$~', $input, $matches);
(?|...) is a branch reset group: capture groups declared within each alternative of this construct start over from the same index. This makes sure $matches[2] is always populated with something, even if only an empty string.
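
A small sketch of the branch reset behaviour, checked against the inputs listed in the question. The $inputs and $results names are only for illustration:

```php
<?php
$pattern = '~^/([\w-]+)\.html(?|/(\d{1,3})|())$~';

$inputs = [
    '/page_name.html',
    '/page_name.html/42',
    '/page_name.html/',
    '/page_name.html131',
];

$results = [];
foreach ($inputs as $input) {
    // Subject comes second, the matches array third.
    $results[$input] = preg_match($pattern, $input, $matches)
        ? [$matches[1], $matches[2]]  // index 2 exists in both branches
        : null;                       // no match
}

var_export($results);
```

The first two inputs match, with an empty string and 42 in the second slot respectively; the last two do not match.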

How to use preg_replace to replace all occurrences of a given pattern?

I have a pattern (a slash followed by 1 or more dashes) inside strings that could occur many times like
/hi/--hello/-hi
I want to replace it with
/hi/hello/hi
I have tried
$str = preg_replace('/\/-+/', '/', $subject);
but this does not seem to be working properly. Am I missing something? I use http://www.debuggex.com/ to test my regex, and \/-+ does not seem to match the string.

The reason this doesn't work on debuggex.com is that the site doesn't expect delimiters around the pattern.
Remove the slashes at the beginning and at the end of the pattern in the input box.
Write only \/-+, or simply /-+, since you don't need to escape the slash there.
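
For reference, a quick sketch showing that the original call does work in PHP itself, where the delimiters are required. The second call uses ~ as an alternative delimiter:

```php
<?php
$subject = '/hi/--hello/-hi';

// With '/' as the delimiter, the literal slash has to be escaped.
$a = preg_replace('/\/-+/', '/', $subject);

// With another delimiter such as '~', no escaping is needed.
$b = preg_replace('~/-+~', '/', $subject);

echo $a, "\n"; // /hi/hello/hi
echo $b, "\n"; // /hi/hello/hi
```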

PHP Regex - Issue with forward slashes and alternation

I have a series of URLs like so:
http://www.somesite.com/de/page
http://www.somesite.com/de/another
http://www.somesite.com/de/page/something
http://www.somesite.com/de/page/bar
I need to search the block of text and pull the language and am using a regex like so:
/(de|en|jp)/
I'm trying to find and replace, via preg_replace and including the forward slashes:
/de/
/en/
/jp/
However, this doesn't work and does not include the slashes. I've tried escaping the slashes with \, \\. I've tried placing the needle in preg_quote but this breaks the alternation.
I feel like I am missing something very simple here!
edit:
Full function call:
preg_replace("/(de|en|jp)/", "/".$newLang."/", $url);
--
(tagged magento and wordpress as I am trying to solve an issue with unifying the navigation menu when both CMSes are multilingual)

You don't have to use slashes as delimiters, but you do have to have some delimiter. Try this:
if (preg_match("(/(de|en|jp)/)", $url, $m)) {
    $language = $m[1];
}

You can use a different delimiter, such as %.
if (preg_match('%/(de|en|jp)/%', $url, $match)) {
    $lang = $match[1];
}
That should help you, just modify what you have :).
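
Applied to the preg_replace call from the question, the same idea might look like the sketch below. The sample URL is taken from the question and the value of $newLang is an assumption:

```php
<?php
$url = 'http://www.somesite.com/de/page/something';
$newLang = 'en'; // hypothetical target language

// '%' delimiters leave '/' free to be used literally in the pattern.
// The limit of 1 replaces only the first language segment.
$switched = preg_replace('%/(de|en|jp)/%', '/' . $newLang . '/', $url, 1);

echo $switched, "\n"; // http://www.somesite.com/en/page/something
```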

Regular expression pattern to match URL with or without http://www

I'm not very good at regular expressions at all.
I've been using a lot of framework code to date, but I'm unable to find a pattern that matches a URL like http://www.example.com/etcetc and is also able to catch something like www.example.com/etcetc and example.com/etcetc.

For matching all kinds of URLs, the following code should work:
<?php
// Single-quoted strings, so PHP does not try to interpolate $_ inside the pattern
$regex = '((https?|ftp)://)?'; // SCHEME
$regex .= '([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?'; // User and Pass
$regex .= '([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))'; // Host or IP address
$regex .= '(:[0-9]{2,5})?'; // Port
$regex .= '(/([a-z0-9+$_%-]\.?)+)*/?'; // Path
$regex .= '(\?[a-z+&$_.-][a-z0-9;:@&%=+/$_.-]*)?'; // GET Query
$regex .= '(#[a-z_.-][a-z0-9+$%_.-]*)?'; // Anchor
?>
Then, the correct way to check against the regex is as follows:
<?php
if (preg_match("~^$regex$~i", 'www.example.com/etcetc', $m)) {
    var_dump($m);
}
if (preg_match("~^$regex$~i", 'http://www.example.com/etcetc', $m)) {
    var_dump($m);
}
?>
Courtesy: Comments made by splattermania in the PHP manual: preg_match

This worked for me in all cases I had tested:
$url_pattern = '/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:@\-_=#])*/';
Tests:
http://test.test-75.1474.stackoverflow.com/
https://www.stackoverflow.com
https://www.stackoverflow.com/
http://wwww.stackoverflow.com/
http://wwww.stackoverflow.com
http://test.test-75.1474.stackoverflow.com/
http://www.stackoverflow.com
http://www.stackoverflow.com/
stackoverflow.com/
stackoverflow.com
http://www.example.com/etcetc
www.example.com/etcetc
example.com/etcetc
user:pass@example.com/etcetc
example.com/etcetc?query=aasd
example.com/etcetc?query=aasd&dest=asds
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with-or-without-http-www
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with-or-without-http-www/
Every valid Internet URL has at least one dot, so the above pattern simply looks for at least two strings joined by a dot, each made up of characters that are valid in a URL.

Try this:
/^(https?:\/\/)?(www\.)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/
It accepts URLs with or without http://, https://, and www.

You can put a question mark after part of a regular expression to make that part optional, so you would want to use:
http:\/\/(www\.)?
That will match anything that has either http://www. or http:// (with no www.).
You could then use a replace function to remove the matched prefix, leaving you with the domain. It depends on what you need the domain for.
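
A minimal sketch of that replace step in PHP. The pattern is anchored with ^ and uses ~ delimiters so the slashes need no escaping; both choices are assumptions layered on top of the expression above:

```php
<?php
$urls = [
    'http://www.example.com/etcetc',
    'http://example.com/etcetc',
];

$stripped = [];
foreach ($urls as $url) {
    // (www\.)? is optional, so both forms lose their prefix.
    $stripped[] = preg_replace('~^http://(www\.)?~', '', $url);
}

print_r($stripped); // both entries become example.com/etcetc
```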

Try something like this:
.*([\w-]+\.)+[a-z]{2,5}(/[\w-]+)*

Use:
/(https?:\/\/)?((?:(\w+-)*\w+)\.)+(?:[a-z]{2})(\/?\w?-?=?_?\??&?)+[\.]?([a-z0-9\?=&_\-%#])?/g
It matches bare domains such as something.com as well as URLs that start with http(s):// or www. It does not match other [something]:// URLs, but for my purpose that's not necessary.
The regex matches e.g.:
http://foo.co.uk/
www.regex.com/foo.html?q=bar$some=thi-ng,regex
regex.foo.com/blog

You can try this:
r"(https?:\/\/)?([\w-]+\.)+([a-z]{2,5})(\/+\w+)? "
Selection:
may start with http:// or https:// (optional)
one or more words, each ending with a dot (.)
followed by 2 to 5 characters [a-z]
followed by "/[anything]" (optional)
followed by a space
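
In PHP the same expression could be tried roughly as follows, with the scheme written as https? so that both http:// and https:// are accepted. The sample text and variable names are made up for illustration:

```php
<?php
$text = 'visit www.example.com/docs today';

// Same structure as the expression above, with '~' delimiters.
// Note the trailing space: the URL must be followed by one.
$pattern = '~(https?://)?([\w-]+\.)+([a-z]{2,5})(/+\w+)? ~';

$url = null;
$tld = null;
if (preg_match($pattern, $text, $m)) {
    $url = rtrim($m[0]); // drop the space that the pattern consumed
    $tld = $m[3];        // third group: the 2 to 5 letter suffix
}

echo $url, "\n"; // www.example.com/docs
echo $tld, "\n"; // com
```

Because of the trailing space, a URL at the very end of the text would not be matched by this pattern.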

Try this:
$url_reg = '/(ftp|https?):\/\/(\w+:?\w*@)?(\S+)(:[0-9]+)?(\/([\w#!:.?+=&%@!\/-])?)?/';

I have been using the following, which works for all my test cases, as well as fixes any issues where it would trigger at the end of a sentence preceded by a full-stop (end.), or where there were single character initials, such as 'C.C. Plumbing'.
The following regex contains several {2,} quantifiers, each of which means two or more repetitions of the preceding element.
((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]{2,}\.([a-zA-Z0-9\&\.\/\?\:@\-_=#]){2,}
Matches URLs such as, but not limited to:
https://example.com
http://example.com
example.com
example.com/test
example.com?value=test
Does not match non-URLs such as, but not limited to:
C.C Plumber
A full-stop at the end of a sentence.
Single characters such as a.b or x.y
Please note: Due to the above, this will not match any single character URLs, such as: a.co, but it will match if it is preceded by a URL scheme, such as: http://a.co.
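
The effect of {2,} can be seen in isolation with two deliberately simplified patterns. These are stand-ins for the idea, not the full expression above:

```php
<?php
// Simplified stand-ins, not the full pattern from the answer.
$loose  = '~[a-z0-9]+\.[a-z0-9]+~i';       // one or more characters on each side of the dot
$strict = '~[a-z0-9]{2,}\.[a-z0-9]{2,}~i'; // two or more characters on each side

$samples = ['example.com', 'C.C Plumber', 'a.b'];

$report = [];
foreach ($samples as $s) {
    // preg_match returns 1 on a match and 0 otherwise.
    $report[$s] = [preg_match($loose, $s), preg_match($strict, $s)];
}

print_r($report);
```

Only example.com passes the strict version; the initials and the single-character pair are matched by the loose version alone.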

I had a lot of trouble getting the answer from anubhava to work, because PHP interpolates $ inside double-quoted strings and the preg_match call wasn't working.
Here is what I used:
// Regular expression
$re = '/((https?|ftp):\/\/)?([a-z0-9+!*(),;?&=.-]+(:[a-z0-9+!*(),;?&=.-]+)?@)?([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))(:[0-9]{2,5})?(\/([a-z0-9+%-]\.?)+)*\/?(\?[a-z+&$_.-][a-z0-9;:@&%=+\/.-]*)?(#[a-z_.-][a-z0-9+$%_.-]*)?/i';
// Match all
preg_match_all($re, $blob, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
// The first element of the array is the full match
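
To make the shape of the PREG_SET_ORDER result clearer, here is a sketch with a deliberately simple pattern. It is not the full expression above, and the sample text is invented:

```php
<?php
$blob = 'See http://example.com and https://example.org/page for details.';

// Simple pattern, used only to show the result layout.
preg_match_all('~(https?)://\S+~', $blob, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER, each element of $matches is one complete match set.
foreach ($matches as $set) {
    // $set[0] is the full match, $set[1] the first capture group (the scheme).
    echo $set[1], ' => ', $set[0], "\n";
}
```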

The Composer package URL highlight does a good job of this in PHP:
<?php
use VStelmakh\UrlHighlight\UrlHighlight;
$urlHighlight = new UrlHighlight();
$matches = $urlHighlight->getUrls($string);
?>

If it does not have to be a regex, you could always use the validate filters that are built into PHP.
filter_var('http://example.com', FILTER_VALIDATE_URL);
The signature is:
filter_var (mixed $variable [, int $filter = FILTER_DEFAULT [, mixed $options ]]);
See "Types of Filters" and "Validate Filters" in the PHP manual.
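
Note that FILTER_VALIDATE_URL expects a scheme, so a bare example.com/etcetc is rejected. One possible workaround, sketched here with a hypothetical helper named looksLikeUrl, is to prepend http:// when no scheme is present:

```php
<?php
// Hypothetical helper: prepend a scheme when none is present, then validate.
function looksLikeUrl(string $candidate): bool
{
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $candidate)) {
        $candidate = 'http://' . $candidate;
    }
    return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
}

// filter_var returns the validated string on success and false on failure.
$withScheme    = filter_var('http://example.com', FILTER_VALIDATE_URL);
$withoutScheme = filter_var('example.com/etcetc', FILTER_VALIDATE_URL);
$viaHelper     = looksLikeUrl('www.example.com/etcetc');

var_dump($withScheme, $withoutScheme, $viaHelper);
```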

Regex if you want to ensure a URL starts with HTTP/HTTPS:
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
If you do not require the HTTP protocol:
[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
