How to validate a hyperlink from different links using PHP

Can you please tell me how to validate a hyperlink among different hyperlinks? For example,
I want to fetch these links separately, starting with the bolded address (between the two stars), from a website using Simple HTML DOM:
1 http://**www.website1.com**/1/2/
2 http://**news.website2.com**/s/d
3 http://**website3.com/news**/gds
I know we can do it using preg_match, but I am having a hard time understanding preg_match.
Can anyone give me a preg_match script to validate these websites?
Can you also explain what this means?
preg_match('|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $url)
What are those random-looking characters in the preg_match pattern, and what do they mean?

If you want to learn about regular expressions, the regular-expressions.info website is a good place to start.
And if you want to use them more, the book Mastering Regular Expressions is a must-read.
Edit: here is a short walkthrough, though:
The first parameter of preg_match is the regexp string. The second is the string you're testing against. An optional third parameter is an array in which everything captured is stored.
The | characters are the delimiters: what is between the first and the second | is the regexp, and the i after the closing one is an option (a modifier) that makes the regexp case-insensitive.
The leading ^ marks where the string you want to match starts.
http matches literally; then (s)? means you want one or zero s characters, and the parentheses "capture" it. :// also matches literally.
[a-z0-9-]+ is one or more letters, digits, or hyphens (+ means at least one; * would allow zero).
(.[a-z0-9-]+)* is wrong, because an unescaped . matches any character. It should be (\.[a-z0-9-]+)* to match any number of sequences formed by a literal dot followed by at least one letter, digit, or hyphen.
(:[0-9]+)? matches one or zero sequences formed by : followed by one or more digits. It's used to match the URL's port.
(/.*)? matches the rest of the URL: a slash followed by any number of any characters.
$ marks the end of the string.
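A minimal sketch of the pattern in use, with the dot escaped as described in the walkthrough. The helper name looksLikeHttpUrl and the constant names are hypothetical, and most sample URLs come from the question. The question's unescaped version is kept alongside for comparison, to show that its . accepts any character (here an underscore) where a literal dot was intended.

```php
<?php
// Pattern from the question: the unescaped "." matches ANY character.
const URL_ORIGINAL = '|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i';
// Corrected pattern: "\." only matches a literal dot between host labels.
const URL_FIXED = '|^http(s)?://[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i';

// Hypothetical helper: true when $url matches the given pattern.
function looksLikeHttpUrl(string $url, string $pattern = URL_FIXED): bool
{
    return preg_match($pattern, $url) === 1;
}

$samples = [
    'http://www.website1.com/1/2/',
    'http://news.website2.com/s/d',
    'http://website3.com/news/gds',
    'http://localhost:8080',
    'http://www_website1.com/', // "_" is not allowed in the host part
    'ftp://website1.com/',
];

foreach ($samples as $url) {
    printf(
        "%-32s original=%s fixed=%s\n",
        $url,
        looksLikeHttpUrl($url, URL_ORIGINAL) ? 'yes' : 'no',
        looksLikeHttpUrl($url) ? 'yes' : 'no'
    );
}
```

This only checks the general shape of an http(s) URL; it does not pick out particular sites.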

Have a look at "In search of the perfect URL validation regex".
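The answers above validate URLs in general but do not show how to pick out the specific links from the question. One possible approach, sketched here under the assumption that the href values have already been collected into a PHP array (the $hrefs list is made-up sample data), is to anchor a pattern on each bolded part and require a / or the end of the string right after it, so look-alike hosts such as www.website1.com.example.net are rejected.

```php
<?php
// Made-up sample data standing in for hrefs scraped from a page.
$hrefs = [
    'http://www.website1.com/1/2/',
    'http://news.website2.com/s/d',
    'http://website3.com/news/gds',
    'http://website3.com/sports/x',
    'http://www.website1.com.example.net/',
    'http://example.com/?u=www.website1.com/',
];

// The bolded parts from the question.
$wantedPrefixes = ['www.website1.com', 'news.website2.com', 'website3.com/news'];

// Escape each prefix so "." is literal, then join them into one alternation.
$quoted = array_map(function (string $p): string {
    return preg_quote($p, '~');
}, $wantedPrefixes);

// ^https?:// then one of the prefixes, then "/" or the end of the string.
$pattern = '~^https?://(?:' . implode('|', $quoted) . ')(?:/|$)~i';

$matching = array_values(array_filter($hrefs, function (string $h) use ($pattern): bool {
    return preg_match($pattern, $h) === 1;
}));

print_r($matching);
```

Using "~" as the delimiter means the "/" inside website3.com/news needs no escaping.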

Related

PHP regex: detect a repeated character in a word

(preg_match('/(.)\1{3}/', $repeater))
I am trying to create a regular expression which will detect a word that repeats a character 3 or more times throughout the word. I have tried this numerous ways and I can't seem to get the correct output.
If you don't need letters to be contiguous, you can do it with this pattern:
\b\w*?(\w)\w*?\1\w*?\1\w*
Otherwise, this one should suffice:
\b\w*?(\w)\1{2}\w*
Try this regex instead:
(preg_match('/(.)\1{2,}/', $repeater))
This matches any character repeated 3 or more times in a row; see the example at http://regexr.com/3fk80
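To compare the patterns from the answers above, here is a small PHP sketch (the constant names, the helper hasRepeat and the sample words are made up for illustration). Note that the question's \1{3} requires four identical characters in total (the captured one plus three copies), which is why \1{2,} is needed for "three or more".

```php
<?php
// Pattern from the question: one char + exactly three more copies (4 in a row).
const QUESTION = '/(.)\1{3}/';
// One char + at least two more copies: 3 or more in a row.
const CONTIGUOUS = '/(.)\1{2,}/';
// Some word character occurring 3 times anywhere inside a single word.
const ANYWHERE = '/\b\w*?(\w)\w*?\1\w*?\1\w*/';

function hasRepeat(string $pattern, string $word): bool
{
    return preg_match($pattern, $word) === 1;
}

foreach (['zzz', 'zzzz', 'banana', 'balloon'] as $word) {
    printf(
        "%-8s question=%s contiguous=%s anywhere=%s\n",
        $word,
        hasRepeat(QUESTION, $word) ? 'yes' : 'no',
        hasRepeat(CONTIGUOUS, $word) ? 'yes' : 'no',
        hasRepeat(ANYWHERE, $word) ? 'yes' : 'no'
    );
}
```

"banana" is the interesting case: its three a's are not adjacent, so only the non-contiguous pattern flags it.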
Strictly speaking, patterns that include backreferences such as \1, \2, ... are not regular expressions in the mathematical sense, and the scanner that evaluates them is not efficient: it has to remember the text the group accepted in order to match it again, and on failure it has to backtrack over the length of the matched group.
The canonical way to write a true regular expression that accepts a letter repeated three or more times is
(A{3,}|B{3,}|C{3,}|...|Z{3,}|a{3,}|b{3,}|...|z{3,})
Note that {3,} cannot be factored out of the alternation: ([A-Za-z]){3,} would accept any three letters, not three copies of the same one, so it cannot be grouped the way your question does without a backreference.
For the pedantic, the pure regular expression (without the {3,} counting shorthand) would be:
(AAAA*|BBBB*|CCCC*|...|ZZZZ*|aaaa*|bbbb*|cccc*|...|zzzz*)
Since AAAA* is already satisfied as soon as three As are found, if you only need to detect a match this regex is also valid:
AAA|BBB|CCC|...|ZZZ|aaa|bbb|ccc|...|zzz
but the first version lets you capture, as group 1, the actual run of repeated letters.
This approach is longer to write, but it can be much more efficient when scanning the data string: a true regular expression can be compiled to a finite automaton that never backtracks and visits each character only once. (PHP's preg_* functions use PCRE, a backtracking engine, so the actual speed-up there is not guaranteed.)
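Typing all 52 branches by hand is error-prone, so here is a sketch that generates the first alternation with range(). The function name tripleLetterPattern is hypothetical, and each branch is written exactly as in the answer, e.g. AAAA* (three As, then zero or more further As). PHP evaluates it with PCRE, a backtracking engine, so treat this as an illustration of the expression's shape rather than a performance claim.

```php
<?php
// Hypothetical helper: builds (AAAA*|BBBB*|...|zzzz*) as in the answer.
function tripleLetterPattern(): string
{
    $branches = [];
    foreach (array_merge(range('A', 'Z'), range('a', 'z')) as $letter) {
        // "AAAA*" = three As followed by zero or more further As.
        $branches[] = str_repeat($letter, 4) . '*';
    }
    // The outer parentheses make the whole run available as group 1.
    return '/(' . implode('|', $branches) . ')/';
}

$pattern = tripleLetterPattern();
if (preg_match($pattern, 'Mississsippi', $m) === 1) {
    echo $m[1], PHP_EOL; // prints "sss"
}
```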

PHP RegEx: a Pattern to Validate the Second Level Domain

Note: this is a theoretical question about the PHP flavor of regex, not a practical question about validation in PHP. I am merely using domain names for lack of a better example.
"Second Level Domain" refers to the combination of letters, numbers, periods, and/or dashes placed between http:// or http://www. and .com (.co, .info, etc.).
I am only interested in second-level domains that use the English version of the Latin alphabet.
This pattern:
[A-Za-z0-9.-]+
matches valid domain names, such as stackoverflow, StackOverflow, stackoverflow.co (as in stackoverflow.co.uk), stack-overflow, or stackoverflow123.
However, the same pattern would also match something like stack...overflow, stack---over--flow, ........ , -------- , or even . and -.
How can that pattern be rewritten to indicate that periods and dashes, even though they can be used multiple times in a node,
cannot be used without other symbols,
cannot be placed twice or more side by side with each other,
and cannot be placed in the beginning or end of the node?
Thank you in advance!
I think something like this should do the trick:
^([a-zA-Z0-9]+[.-])*[a-zA-Z0-9]+$
What this tries to do is:
start at the beginning of the string, end at the end
one or more letters or digits
followed by either a dot or a hyphen
the group above repeated 0 or more times
followed by one or more letters or digits
Assuming you are looking for a regex that does not allow two consecutive . or - characters, you can use:
^[a-zA-Z0-9]+([-.][a-zA-Z0-9]+)*$
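A quick way to check both answer patterns against every example from the question is sketched below; the mismatches helper is hypothetical. The two patterns are equivalent: both allow a single . or - only between runs of letters and digits, never at either end.

```php
<?php
// The two patterns from the answers above.
$patterns = [
    '/^([a-zA-Z0-9]+[.-])*[a-zA-Z0-9]+$/',
    '/^[a-zA-Z0-9]+([-.][a-zA-Z0-9]+)*$/',
];

// Examples taken from the question.
$shouldMatch    = ['stackoverflow', 'StackOverflow', 'stackoverflow.co', 'stack-overflow', 'stackoverflow123'];
$shouldNotMatch = ['stack...overflow', 'stack---over--flow', '........', '--------', '.', '-'];

// Hypothetical helper: inputs for which the pattern's result differs from $expected.
function mismatches(string $pattern, array $inputs, bool $expected): array
{
    return array_values(array_filter(
        $inputs,
        function (string $s) use ($pattern, $expected): bool {
            return (preg_match($pattern, $s) === 1) !== $expected;
        }
    ));
}

foreach ($patterns as $pattern) {
    $bad = array_merge(
        mismatches($pattern, $shouldMatch, true),
        mismatches($pattern, $shouldNotMatch, false)
    );
    echo $pattern, ': ', $bad === [] ? 'all examples OK' : 'unexpected: ' . implode(', ', $bad), PHP_EOL;
}
```

One PCRE detail worth knowing: $ also matches just before a final newline, so both patterns accept "stack-overflow\n". Adding the D modifier (or ending with \z instead of $) makes the match stop at the very end of the string.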

Discard character in matching group

I have a couple of matching groups one after another in a long Regex pattern. Around the middle I have
...(?<number>(?:/(?:digit|num))?\d+|)...
which should match something like /num9, /digit9, 9, or an empty string (because I need the named group to appear in the resulting associative array even if it's empty).
The pattern works, but is it possible to discard the / character if one of the first two cases is matched? I tried a positive lookahead, but it seems that you can't use one if there are expressions before the lookahead.
Is what I'm trying to accomplish possible using Regex?
Based on your input, I think you need to match / anyway at some point, otherwise your whole regex fails. At the same time you want to ignore it, so it cannot be part of your named group. Therefore, by putting it outside the group and making it optional, while ensuring with the negative lookbehind (?<!/) that a digit is not directly preceded by a /, you get the desired results:
^/?(?<number>(?:(?:digit|num))?(?<!/)\d+|)$
However, without a more complete input and regex, I am not 100% sure this will work for all your cases.
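Running the answer's pattern on the sample inputs from the question can be sketched like this (the extractNumber helper is hypothetical). "~" is used as the delimiter so the / inside the pattern needs no escaping, and the fragment is tested in isolation because the rest of the original long pattern is not shown.

```php
<?php
// The answer's pattern, anchored with ^...$ as in the answer.
$pattern = '~^/?(?<number>(?:(?:digit|num))?(?<!/)\d+|)$~';

// Hypothetical helper: returns the "number" group, or null when nothing matches.
function extractNumber(string $pattern, string $input): ?string
{
    return preg_match($pattern, $input, $m) === 1 ? $m['number'] : null;
}

foreach (['/num9', '/digit9', '9', '', '/9'] as $input) {
    echo json_encode($input), ' => ', var_export(extractNumber($pattern, $input), true), PHP_EOL;
}
```

The leading / is consumed by /? outside the group, so the named group holds num9 or digit9 without it, and a bare /9 is rejected by the lookbehind.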

Regular expression to match words with no space

I am trying to use preg_match to filter unwanted spam queries, and I would like to match any word listed in the pattern and filter it if it has no space after it.
So, for example, if I have the word balloon in the pattern, I want to filter anything like "balloon1", "balloond", or "balloonedfbdg", and allow anything with a space after balloon, like "balloon big", "balloon small", etc.
I get a lot of queries from Google that take a single word and add a whole bunch of crap to it, which I want to filter out. It is only a few words, but it irritates me enough to come here and find a fix.
I already use preg_match with regular expressions for some of the spam queries, but I do not know how to match a word that is not followed by a space while allowing one that is.
Any help is appreciated, Thanks.
Your expression: /(balloon|otherwordone|othertwo)[^\s]/i
This matches the listed words if they're not followed by whitespace (\s).
Edit: using \B (not a word boundary):
/(balloon|otherwordone|othertwo)\B/i
This prevents common punctuation after the word, such as a period or comma, from triggering the regex.
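Both expressions from the answer can be compared on a few sample queries; the isSpam helper and the queries are made up for illustration. The difference shows up with punctuation: [^\s] treats a trailing period as "no space", while \B does not, and neither flags the bare word at the end of the string.

```php
<?php
// First version: listed word followed by any non-whitespace character.
$notFollowedBySpace = '/(balloon|otherwordone|othertwo)[^\s]/i';
// Edited version: listed word NOT followed by a word boundary.
$noWordBoundary = '/(balloon|otherwordone|othertwo)\B/i';

// Hypothetical helper: true when the query should be filtered.
function isSpam(string $pattern, string $query): bool
{
    return preg_match($pattern, $query) === 1;
}

foreach (['balloon big', 'balloon1', 'balloonedfbdg', 'Balloon.', 'balloon'] as $q) {
    printf(
        "%-15s first=%s second=%s\n",
        $q,
        isSpam($notFollowedBySpace, $q) ? 'spam' : 'ok',
        isSpam($noWordBoundary, $q) ? 'spam' : 'ok'
    );
}
```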

PHP: Validate string containing numbers, separated by hyphen (possible by preg_match)?

I’m trying to validate a string which contains numbers where each group of four digits is separated by a hyphen, for example 1111-2222-3333-4444.
I’m trying to validate it so I can guarantee that this format is being used (16 digits, three hyphens and nothing else). I have this preg_match, which checks for digits only, but I need it to accept hyphens in this format:
preg_match('/^[0-9]{1,}$/', $validatenumbers)
I’ve tried to do it with regex, but unfortunately it isn’t my strongest side, so I haven’t been able to validate the numbers correctly.
It is important that it is done in PHP and not JavaScript, because JavaScript can be “turned off” in the browser.
preg_match("/^([0-9]{4}-){3}[0-9]{4}$/", $input);
([0-9]{4}-){3} matches exactly 3 groups of 4 digits, each followed by a hyphen. That is followed by a final group [0-9]{4} (4 digits without a hyphen).
preg_match('/^[0-9]{4}\-[0-9]{4}\-[0-9]{4}\-[0-9]{4}$/',$numbers);
I think that should work.
This looks like a credit card number. If that's the case, you should also check it with a Luhn checksum rather than relying on a simple regex.
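If the input is indeed a card number, one way to combine the two checks is sketched below: the format regex from the answers first, then the Luhn checksum on the 16 digits. The function names are hypothetical, the D modifier is an addition so that $ cannot match before a trailing newline, and 4111-1111-1111-1111 is used only because it is a widely published test number that passes the checksum.

```php
<?php
// Luhn checksum: from the right, double every second digit,
// subtract 9 from doubled values above 9, and require sum % 10 == 0.
function passesLuhn(string $digits): bool
{
    $sum = 0;
    $double = false; // the rightmost digit is not doubled
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        if ($double) {
            $d *= 2;
            if ($d > 9) {
                $d -= 9; // same as adding the two digits of $d
            }
        }
        $sum += $d;
        $double = !$double;
    }
    return $sum % 10 === 0;
}

// Format check first (D: "$" matches only at the very end), then Luhn.
function isValidCardNumber(string $input): bool
{
    if (preg_match('/^([0-9]{4}-){3}[0-9]{4}$/D', $input) !== 1) {
        return false;
    }
    return passesLuhn(str_replace('-', '', $input));
}

var_dump(isValidCardNumber('4111-1111-1111-1111')); // bool(true)
var_dump(isValidCardNumber('4111-1111-1111-1112')); // bool(false)
```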
Try:
if (preg_match('#^\d{4}-\d{4}-\d{4}-\d{4}$#', $string)) { /* valid */ }
If you need to match that exact format, the pattern would be '~^\d{4}-\d{4}-\d{4}-\d{4}$~', or you can write it more generally as '/^(\d+-)*\d+$/' (this would also match 11, 11-11111, and so on).
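The difference between the exact pattern and the more general one can be seen on a few made-up inputs; preg_match returns 1 for a match and 0 otherwise.

```php
<?php
// Exactly four groups of four digits.
$exact = '~^\d{4}-\d{4}-\d{4}-\d{4}$~';
// Any number of digit runs separated by single hyphens.
$general = '/^(\d+-)*\d+$/';

foreach (['1111-2222-3333-4444', '11', '11-11111', '1111-2222', '1111--2222'] as $s) {
    printf("%-20s exact=%d general=%d\n", $s, preg_match($exact, $s), preg_match($general, $s));
}
```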
