How to check the URL's structure using PHP preg_match?

All my site's URLs have the following structure:
https://www.example.com/section/item
where section is a word and item is a number.
So, possible URLs are:
https://www.example.com
https://www.example.com/section
https://www.example.com/section/item
Via .htaccess, all requests are routed to index.php.
I want to show a 404 error message if the user types:
https://www.example.com/section/item/somethingelse
In order to check the URL's structure, how can I change the pattern properly in the following function?
function isValidURL($url) {
    return preg_match('|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $url);
}
Thanks.

If section is a word (and cannot contain digits) and item is a number, you could use [^\W\d]+ to match one or more word characters except digits (letters and underscores), and \d+ to match one or more digits.
Because the section and item parts are optional in the example URLs, you could replace (/.*)?$ with (?:/[^\W\d]+(?:/\d+)?)?$.
Explanation
(?: Non capturing group
/[^\W\d]+ For section, match 1+ times a word char except a digit
(?:/\d+)? For item, optionally match / and 1+ digits
)? Close non capturing group and make it optional
If section can also consist of only digits, you could use \w+ instead.
The pattern might look like
^https?://[a-z0-9-]+(?:\.[a-z0-9-]+)*(?::[0-9]+)?(?:/[^\W\d]+(?:/\d+)?)?$
Note that the dot is escaped so that it matches a literal dot.
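As a minimal sketch, the pattern above could be dropped into the question's isValidURL() function like this. The ~ delimiter and the sample URLs are my own choices for illustration, so adjust them to your routing setup.

```php
<?php
// Sketch: the question's isValidURL() using the pattern from this answer.
// "~" is used as the delimiter so the slashes need no escaping (an assumption,
// any delimiter that does not appear in the pattern would do).
function isValidURL(string $url): bool
{
    $pattern = '~^https?://[a-z0-9-]+(?:\.[a-z0-9-]+)*(?::[0-9]+)?(?:/[^\W\d]+(?:/\d+)?)?$~i';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $url) === 1;
}

var_dump(isValidURL('https://www.example.com'));                      // bool(true)
var_dump(isValidURL('https://www.example.com/section'));              // bool(true)
var_dump(isValidURL('https://www.example.com/section/42'));           // bool(true)
var_dump(isValidURL('https://www.example.com/section/42/something')); // bool(false)
```

When the function returns false, index.php could send the 404 response.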

Related

Special Expression not allowed - Regular Expression in PHP

I am trying to match a string only when the image name does not contain dimensions, for example the 150x150 in the name below:
test-string-150x150.png
I am using the following pattern to match this String:
/^([^0-9x0-9]+)\..+/
It works fine, except in a case like this:
teststring.com-150x150.jpg
What I need: the pattern must disallow dimensions only at the end of the name (just before the extension). Here are some examples:
test-string-150x150.png > must disallow
any-string.png > allow
200x200-test.png > allow
1x1.png-100x100.jpg > disallow

You could use a negative lookahead to assert that the string does not end with the dimensions followed by a dot and one or more word characters.
^(?!.*\d+x\d+\.\w+$).+$
Explanation
^ Start of string
(?! Negative lookahead, assert what is on the right is not
.* Match 0+ occurrences of any char except a newline
\d+x\d+ Match the sizes format, where \d+ means 1 or more digits
\.\w+$ Match a dot, 1+ word characters and assert the end of the string $
) Close lookahead
.+ Match 1+ occurrences of any char except a newline
$ End of string
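A minimal sketch of how this pattern could be used from PHP, run against the file names from the question. The function name isAllowedName() is only an illustrative choice.

```php
<?php
// Sketch: returns true when the name does NOT end in <digits>x<digits>.<ext>.
// isAllowedName() is a hypothetical helper name, not part of the question.
function isAllowedName(string $name): bool
{
    return preg_match('/^(?!.*\d+x\d+\.\w+$).+$/', $name) === 1;
}

$names = [
    'test-string-150x150.png',    // dimensions at the end: disallowed
    'any-string.png',             // no dimensions: allowed
    '200x200-test.png',           // dimensions not at the end: allowed
    '1x1.png-100x100.jpg',        // dimensions at the end: disallowed
    'teststring.com-150x150.jpg', // dimensions at the end: disallowed
];

foreach ($names as $name) {
    echo $name, ' => ', isAllowedName($name) ? 'allow' : 'disallow', "\n";
}
```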

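If I understand your question, you're trying to match image names that do not include the image dimensions. If so, try this:
/^(?![\w.-]+(\d+x\d+))[\w.-]+\.\w+$/m
For details about this pattern, please see regexr.com/4tmd1. This site is a great place to play around with regexes to make sure you're getting the results you expect.
Be aware that the exact syntax of the regular expression depends on the regex engine used by whatever program you're running. The regexr demo uses the g flag and writes the character class as [\w-\.]; PHP has no g modifier (use preg_match_all() for that), and recent PCRE versions can reject a hyphen placed between \w and another character as an invalid range, so the hyphen is moved to the end of the class here.
Note that this pattern rejects dimensions anywhere after the first character of the name, not only at the end, so 200x200-test.png is rejected too.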

Using Regex to detect if string exists

I need to use PHP's preg_match() and Regex to detect the following conditions:
If a URL path is one of the following:
products/new items
new items/products
new items/products/brand name
Do something...
I can't seem to figure out how to check whether a string exists before or after the word products. The closest I can get is:
if (preg_match("([a-zA-Z0-9_ ]+)\/products\/([a-zA-Z0-9_ ]+)", $url_path)) {
// Do something
Would anyone know a way to check for both cases within a single regex?

You could use an alternation with an optional group for the last item making the / part of the optional group.
If you are only looking for a match, you can omit the capturing groups.
(?:[a-zA-Z0-9_ ]+/products(?:/[a-zA-Z0-9_ ]+)?|products/[a-zA-Z0-9_ ]+)
Explanation
(?: Non-capturing group
[a-zA-Z0-9_ ]+/products Match 1+ times what is listed in the character class, / followed by products
(?:/[a-zA-Z0-9_ ]+)? Optionally match / and what is listed in the character class
| Or
products/[a-zA-Z0-9_ ]+ Match products/, match 1+ times what is listed
) Close group
Note that [a-zA-Z0-9_ ]+ might be shortened to [\w ]+
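A minimal sketch of the alternation used with preg_match(). The function name isProductPath() and the ~ delimiter are assumptions for illustration. The pattern is left unanchored, as in the answer, so it matches anywhere inside the path.

```php
<?php
// Sketch: true when the path contains "<name>/products", optionally followed
// by "/<name>", or "products/<name>". isProductPath() is a hypothetical name.
function isProductPath(string $urlPath): bool
{
    $pattern = '~(?:[a-zA-Z0-9_ ]+/products(?:/[a-zA-Z0-9_ ]+)?|products/[a-zA-Z0-9_ ]+)~';

    return preg_match($pattern, $urlPath) === 1;
}

var_dump(isProductPath('products/new items'));            // bool(true)
var_dump(isProductPath('new items/products'));            // bool(true)
var_dump(isProductPath('new items/products/brand name')); // bool(true)
var_dump(isProductPath('products'));                      // bool(false)
```

If the whole path must match, wrapping the pattern in ^ and $ would be one way to tighten it.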

You can use alternation
([\w ]+)\/products|products\/([\w ]+)
Note: I am not sure how you're using the matched values. If you don't need a back reference to any specific value, you can drop the capturing groups, i.e.
[\w ]+\/products|products\/[\w ]+

regex for extracting all urls from string excluding period for terminating strings

I'm trying to extract URLs from a string. I have different posts that contain URLs in their message. I've prepared a pattern to match them, but it's not working properly. I asked the same question here before but forgot to include this case, so I'm asking a new question for it.
Tried Pattern
\b(\.?)(?:https?://)?(?:(?i:[a-z]+\.)+)[^\s,]+\b
CODE
for ( $i = 0; $i < $resultcount; $i ++ ) {
    $pattern = '%\b(\.?)(?:https?://)?(?:(?i:[a-z]+\.)+)[^\s,]+\b%';
    $message = (string)$result[$i]['message'];
    preg_match_all($pattern,$message,$match);
    print_r($match);
}
An example of one of my posts:
"This is just a post to test regex for extracting URL.
http://google.com, https://www.youtube.com/watch?v=dlw32af
https://instagram.com/oscar/ en.wikipedia.org"
A post may or may not use commas between multiple URLs, and it is also possible that there is no space between some text and a URL, like
sometext.http://google.com
Thank you people :)

This will match properly encoded strings that are formatted like HTTP URLs, except those that fall under IDN (internationalized domain names):
(?i)(?:https?://[^"'\s<>(){}]++|[a-z0-9](?<=\b.)[a-z0-9-]*+(?:\.[a-z-]{2,}+)++(?=[/?"'()\s]|:\d++|\Z)[^"'\s<>(){}]*+)
So you should not expect
ftp://username:password#ftpserver/folder/
to be matched.

In your initial question you failed to specify that each "word" (a part of the URL) can contain something other than letters.
Note that your regex contains [a-z], which suggests that you want to match only URLs whose "words" are composed entirely of letters, without any digits, minus chars or underscores.
Try the following regex:
(?:https?:\/\/)?(?i)[a-z][a-z0-9_-]*(?:[.\/](?!http)[a-z0-9_-]+)+\/?(?:\?[^\s,.]+)?
Description:
(?:https?:\/\/)? - Optional protocol name.
(?i) - Turn on the case-insensitive option.
[a-z][a-z0-9_-]* - The first "word" of the URL (a letter first, then any number of letters, digits, underscores or minus chars).
(?:[.\/] - Start of a non-capturing group: either a dot or a slash.
(?!http) - Negative lookahead, to block the case where a URL starting with http is immediately preceded by a dot (or a slash).
[a-z0-9_-]+)+ - The next "word" (no requirement to start with a letter); the whole non-capturing group repeats one or more times.
\/? - Optional slash, terminating the part before the query string (if any).
(?:\?[^\s,.]+)? - Optional non-capturing group for the query string. It starts with ? followed by a sequence of chars other than whitespace, comma or dot.
The above regex does not match the trailing dot, just as you wish.
Note:
As I tested this regex on regex101.com, I escaped the / chars contained in it. You can probably omit the escaping if you use a different delimiter.
Following your comment, I changed the regex so that a "word" can also contain digits, underscores and minus chars.
Note also that - as the first or last char between [...] stands for itself (as opposed to - between two other chars, where it means a range).
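A minimal sketch of how this regex could be used with preg_match_all() on the sample post from the question. The function name extractUrls() and the ~ delimiter (which removes the need to escape /) are assumptions for illustration.

```php
<?php
// Sketch: collect every URL-like substring from a message.
// extractUrls() is a hypothetical helper name, not part of the question.
function extractUrls(string $message): array
{
    $pattern = '~(?:https?://)?(?i)[a-z][a-z0-9_-]*(?:[./](?!http)[a-z0-9_-]+)+/?(?:\?[^\s,.]+)?~';

    preg_match_all($pattern, $message, $matches);

    // $matches[0] holds the full text of each match.
    return $matches[0];
}

$message = "This is just a post to test regex for extracting URL.\n"
    . "http://google.com, https://www.youtube.com/watch?v=dlw32af\n"
    . "https://instagram.com/oscar/ en.wikipedia.org";

// Prints the four URLs from the post, one per line.
echo implode("\n", extractUrls($message)), "\n";

// Text glued to a URL with a dot is skipped thanks to (?!http).
echo implode("\n", extractUrls('sometext.http://google.com')), "\n";
```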

regex to fetch url from string

I am working on a task in PHP.
I want a regex that can extract links from a string like this:
$str = "this is my friend's website http://example1.com I think it is coll some text example.com some text t.com/2000 some text rs.500 some text http://www some text";
How can I extract the following with the help of a regex?
http://example1.com
example.com
t.com/2000
http://www
rs.500 must be skipped!
Ideally I need a regex that can handle any link.
Please help me with that.

This regex is what you're looking for:
(https?:\/\/\S+)|([a-z]+\.[a-z]+(?:\/\S+)?)
It's basically the two regexes https?:\/\/\S+ and [a-z]+\.[a-z]+(?:\/\S+)? placed into capturing groups (so that you can extract all URLs with a global search) and then combined with an OR.
https?:\/\/\S+ finds URLs that are prefixed with http:// or https:// by matching:
The literal string http, followed by
An optional s (s?), followed by
A colon and two forward slashes (:\/\/), followed by
One or more non-whitespace characters (\S+)
If https?:\/\/\S+ doesn't match, then [a-z]+\.[a-z]+(?:\/\S+)? kicks in and finds URLs that are not prefixed with http:// or https:// and whose domain name and top-level domain contain only lowercase letters, by matching:
One or more lowercase letters ([a-z]+), followed by
A dot (\.), followed by
One or more lowercase letters ([a-z]+), followed by
An optional group, which consists of
A forward slash (\/), followed by
One or more non-whitespace characters (\S+)
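A minimal sketch of running this regex over the string from the question with preg_match_all(). The function name findLinks() and the ~ delimiter are assumptions for illustration.

```php
<?php
// Sketch: return every link found by the two-alternative regex.
// findLinks() is a hypothetical helper name, not part of the question.
function findLinks(string $text): array
{
    $pattern = '~(https?://\S+)|([a-z]+\.[a-z]+(?:/\S+)?)~';

    preg_match_all($pattern, $text, $matches);

    // Index 0 holds the full matches regardless of which alternative matched.
    return $matches[0];
}

$str = "this is my friend's website http://example1.com I think it is coll some text "
    . "example.com some text t.com/2000 some text rs.500 some text http://www some text";

// Prints http://example1.com, example.com, t.com/2000 and http://www;
// rs.500 is skipped because 500 is not made of letters.
echo implode("\n", findLinks($str)), "\n";
```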

regex to filter out numbers in seo url

I have some urls like these below
http://www.bla-bla.com/hello-world/blah/1345346-asfasdf.html
http://www.bla-bla.com/hello-world/454536556-asdf-rtrthr-dssdfg.html
http://www.bla-bla.com/hello-world/bla/how/what/26609768-nmbbasdf.html
If the URL has a slash followed by numbers, I need to return just the numbers,
so the result must be
1345346
454536556
26609768
How can I extract just those numbers from the URLs?

If those are the only numbers in your URL, you can simply use /\d+/, which stands for "Any digit one or more times".
If you need to specifically capture the numbers in the final part of the string, you can use something more like this: /\/(\d+).*\.html$/, which stands for "a literal forward slash '/', then a group of digits, followed by any characters and .html at the end of the string". Capture group 1 would contain the digits.
As requested in a comment: to get the numbers preceded by a forward slash / and followed by a hyphen -, just use this: /(?<=\/)\d+(?=\-)/, which can be broken down as:
(?<=\/) # Look behind for a forward slash, but don't include it in the match.
\d+ # Match one or more digits (0-9)
(?=\-) # Look ahead for a hyphen, but don't include it in the match.
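A minimal sketch of the lookaround version in PHP, applied to the URLs from the question. The function name extractId() and the ~ delimiter are assumptions for illustration.

```php
<?php
// Sketch: return the digits that sit between a "/" and a "-", or null if none.
// extractId() is a hypothetical helper name, not part of the question.
function extractId(string $url): ?string
{
    if (preg_match('~(?<=/)\d+(?=-)~', $url, $match) === 1) {
        return $match[0];
    }

    return null;
}

$urls = [
    'http://www.bla-bla.com/hello-world/blah/1345346-asfasdf.html',
    'http://www.bla-bla.com/hello-world/454536556-asdf-rtrthr-dssdfg.html',
    'http://www.bla-bla.com/hello-world/bla/how/what/26609768-nmbbasdf.html',
];

foreach ($urls as $url) {
    echo extractId($url), "\n"; // 1345346, 454536556, 26609768
}
```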

Try using this as your regular expression: /\/([0-9]+)/
