PHP match *.domain.com

This is some code from a PHP file I'm working with. I need to match 'domain.com', but entering that alone doesn't work: the code parses a document looking for href attributes, and I think it needs the http://www. prefix for the match. I tried the preg_match below, but it didn't work, and my coding isn't too great. Any help would be appreciated.
preg_match ("/domain.com/i");
$match = 'http://www.domain.com';
for($i=0;$i<$documentLinks->length;$i++)
{
$documentLink = $documentLinks->item($i);
if ($documentLink->hasAttribute('href') AND substr(strtolower($documentLink->getAttribute('href')), 0, strlen($match)) == $match)
{

Try this:
for($i=0;$i<$documentLinks->length;$i++)
{
$documentLink = $documentLinks->item($i);
if ($documentLink->hasAttribute('href'))
{
if (preg_match('!^https?://([^/]+\.)?domain\.com(/|#|$|\?)!i', trim($documentLink->getAttribute('href'))))
{
The regexp is the important part:
^https?://([^/]+\.)?domain\.com(/|#|$|\?)
Start at the beginning of the string and match http or https, then an optional subdomain that may not include forward slashes (so you know you're still in the domain part), followed by the domain you want to match, then either the start of a path, the start of a fragment, the start of a query string, or the end of the URL.
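As a rough sketch of how that pattern could be wrapped for reuse: the function name `isOnDomain` and its parameters are made up for illustration and are not part of the answer above. The domain is passed in and escaped with `preg_quote`, so its dots only match literal dots.

```php
<?php
// Hypothetical helper: true if $href points at $domain or one of its subdomains.
function isOnDomain(string $href, string $domain): bool
{
    // preg_quote() escapes the dots in the domain; '!' is the pattern delimiter.
    $pattern = '!^https?://([^/]+\.)?' . preg_quote($domain, '!') . '(/|#|$|\?)!i';

    return preg_match($pattern, trim($href)) === 1;
}

var_dump(isOnDomain('http://www.domain.com', 'domain.com'));        // bool(true)
var_dump(isOnDomain('https://shop.domain.com/cart', 'domain.com')); // bool(true)
var_dump(isOnDomain('http://notdomain.com/', 'domain.com'));        // bool(false)
```

Inside the loop from the answer, the call would then be something like `isOnDomain($documentLink->getAttribute('href'), 'domain.com')`.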

Related

Regex to replace domain of url if it's ending with .css

I'm trying to write a php script to replace ONLY domain of every URL in the content with new domain if the URL ends with .css.
For example:
www.example.com/asset/css/style.css
After checking condition and replacement we have:
www.new-domain.net/asset/css/style.css
Would anyone please help me to find the correct pattern for this.
So far I've tried this:
preg_replace('/[http://].*\.(css)/i','www.new-domain.net',$Html_contents)
If I understood correctly, you should try something like:
preg_replace('/^(https?:\/\/|)?[^\/]*(?=\/.*\.css$)/i','$1www.new-domain.net',$Html_contents)
Where
^ anchors the match at the start of the string, so only the domain is replaced and not later path segments
(https?:\/\/|) means that the string http:// (or https://) is optional
[^\/]* means "anything but /"
(?=\/.*\.css$) means "a /, followed by anything, followed by a literal dot, followed by css, followed by end of string"
Because of the ^ and $ anchors, this works on a string that holds a single URL.
If the domain is static, you can do this without a regex:
$old_domain = 'https://www.example.com/asset/css/style.css';
if (substr($old_domain, -4) == '.css'){
echo str_replace('www.example.com', 'www.new-domain.net', $old_domain);
}
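If the content is a larger chunk of HTML with several URLs in it, one possible approach is sketched below. The helper name `replaceCssHost` and the pattern are assumptions for illustration, not taken from the answers above. The pattern only matches a dotted host name that is followed by a path ending in .css, and it keeps an optional http:// or https:// prefix.

```php
<?php
// Hypothetical helper: swap the host of every URL whose path ends in .css.
function replaceCssHost(string $html, string $newHost): string
{
    // Optional scheme, then a dotted host name, matched only when a path
    // ending in ".css" follows (checked by the lookahead, not consumed).
    $pattern = '~(https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?=/[^\s"\'<>]*\.css\b)~i';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($newHost): string {
            // $m[1] is the scheme; it is absent when the URL had none.
            return ($m[1] ?? '') . $newHost;
        },
        $html
    );
}

$html = '<link href="https://www.example.com/asset/css/style.css">'
      . '<script src="https://www.example.com/app.js"></script>';

echo replaceCssHost($html, 'www.new-domain.net'), PHP_EOL;
```

Only the stylesheet URL is rewritten here; the script URL keeps its original host because no .css path follows it.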

Regex to match OPO Invite link

if (preg_match('#^https?://account.oneplus.net/invite/claim/....-....-....-....#', $url) === 0) {
return "Invalid link";
}
I currently use this code (in PHP) to verify the URL. However, it also passes when other stuff trails behind the link. How do I fix this so that only links with or without a trailing / pass?
This was the regex I was looking for:
preg_match('#^https?://account\.oneplus\.net/invite/claim/\S{4}-\S{4}-\S{4}-\S{4}/?$#', $url) === 0
I suggest replacing the dots in the code part with \S (which matches any non-space character), so .... is written as \S{4}, because . would also match a horizontal space. Escape the dots in the host name so they match only a literal dot, and add /?$ to allow an optional trailing / and then require the end of the string.
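A minimal sketch of that check wrapped in a function follows. The name `isValidInviteLink` is made up here, and the code format (four groups of four non-space characters) is simply taken from the pattern above. The `D` modifier is an addition so that `$` does not accept a trailing newline.

```php
<?php
// Hypothetical helper around the pattern discussed above.
function isValidInviteLink(string $url): bool
{
    // \S{4} = four non-space characters; /?$ = optional trailing slash, then end.
    // The D modifier makes $ match only at the very end of the string.
    $pattern = '#^https?://account\.oneplus\.net/invite/claim/\S{4}-\S{4}-\S{4}-\S{4}/?$#D';

    return preg_match($pattern, $url) === 1;
}

$base = 'https://account.oneplus.net/invite/claim/ABCD-EFGH-IJKL-MNOP';

var_dump(isValidInviteLink($base));            // bool(true)
var_dump(isValidInviteLink($base . '/'));      // bool(true)
var_dump(isValidInviteLink($base . '/extra')); // bool(false)
```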

how to make regex string to match domain and url [duplicate]

I'm not very good at regular expressions at all.
I've been using a lot of framework code to date, but I'm unable to find a pattern that matches a URL like http://www.example.com/etcetc and is also able to catch something like www.example.com/etcetc and example.com/etcetc.
For matching all kinds of URLs, the following code should work:
<?php
// Single-quoted strings, so PHP does not try to interpolate $_ as a variable.
$regex = '((https?|ftp)://)?'; // SCHEME
$regex .= '([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?'; // User and Pass
$regex .= '([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))'; // Host or IP address
$regex .= '(:[0-9]{2,5})?'; // Port
$regex .= '(/([a-z0-9+$_%-]\.?)+)*/?'; // Path
$regex .= '(\?[a-z+&$_.-][a-z0-9;:@&%=+/$_.-]*)?'; // GET Query
$regex .= '(#[a-z_.-][a-z0-9+$%_.-]*)?'; // Anchor
?>
Then, the correct way to check against the regex is as follows:
<?php
if(preg_match("~^$regex$~i", 'www.example.com/etcetc', $m))
var_dump($m);
if(preg_match("~^$regex$~i", 'http://www.example.com/etcetc', $m))
var_dump($m);
?>
Courtesy: Comments made by splattermania in the PHP manual: preg_match
RegEx Demo in regex101
This worked for me in all cases I had tested:
$url_pattern = '/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:@\-_=#])*/';
Tests:
http://test.test-75.1474.stackoverflow.com/
https://www.stackoverflow.com
https://www.stackoverflow.com/
http://wwww.stackoverflow.com/
http://wwww.stackoverflow.com
http://test.test-75.1474.stackoverflow.com/
http://www.stackoverflow.com
http://www.stackoverflow.com/
stackoverflow.com/
stackoverflow.com
http://www.example.com/etcetc
www.example.com/etcetc
example.com/etcetc
user:pass@example.com/etcetc
example.com/etcetc?query=aasd
example.com/etcetc?query=aasd&dest=asds
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with-or-without-http-www
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with-or-without-http-www/
Every valid Internet URL has at least one dot, so the above pattern simply looks for at least two strings joined by a dot and made up of characters that a URL may contain.
Try this:
/^(https?:\/\/)?(www\.)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/
It accepts URLs with or without http://, https://, and www.
You can put a question mark after a group in a regular expression to make it optional, so you would want to use:
http:\/\/(www\.)?
That will match anything that starts with either http://www. or http:// (with no www.).
You could just use a replace method to remove the above, thus getting you the domain. It depends on what you need the domain for.
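A small sketch of that replace idea: the helper name `stripSchemeAndWww` is made up for illustration, and the pattern is extended with `s?` so that https:// is handled too.

```php
<?php
// Hypothetical helper: drop a leading http:// or https:// and an optional "www.".
function stripSchemeAndWww(string $url): string
{
    // ^ anchors at the start; (www\.)? makes the "www." part optional.
    return preg_replace('~^https?://(www\.)?~i', '', $url);
}

echo stripSchemeAndWww('http://www.example.com/etcetc'), PHP_EOL; // example.com/etcetc
echo stripSchemeAndWww('https://example.com'), PHP_EOL;           // example.com
```

Note that a string without a scheme, such as www.example.com, is left unchanged, because the pattern requires http:// or https:// at the start.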
Try something like this:
.*([\w-]+\.)+[a-z]{2,5}(/[\w-]+)*
Use:
/(https?:\/\/)?((?:(\w+-)*\w+)\.)+(?:[a-z]{2})(\/?\w?-?=?_?\??&?)+[\.]?([a-z0-9\?=&_\-%#])?/g
It matches something.com, http(s):// or www. It does not match other [something]:// URLs though, but for my purpose that's not necessary.
The regex matches e.g.:
http://foo.co.uk/
www.regex.com/foo.html?q=bar$some=thi-ng,regex
regex.foo.com/blog
You can try this:
r"(https?:\/\/)?([\w-]+\.)+([a-z]{2,5})(\/+\w+)? "
Selection:
may be start with http:// or https:// (optional)
anything (word) end with dot (.)
followed by 2 to 5 character [a-z]
followed by "/[anything]" (optional)
followed by space
Try this
$url_reg = '/(ftp|https?):\/\/(\w+:?\w*@)?(\S+)(:[0-9]+)?(\/([\w#!:.?+=&%@!\/-])?)?/';
I have been using the following, which works for all my test cases, as well as fixes any issues where it would trigger at the end of a sentence preceded by a full-stop (end.), or where there were single character initials, such as 'C.C. Plumbing'.
The following regex contains multiple {2,}s, which means two or more matches of the previous pattern.
((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]{2,}\.([a-zA-Z0-9\&\.\/\?\:@\-_=#]){2,}
Matches URLs such as, but not limited to:
https://example.com
http://example.com
example.com
example.com/test
example.com?value=test
Does not match non-URLs such as, but not limited to:
C.C Plumber
A full-stop at the end of a sentence.
Single characters such as a.b or x.y
Please note: Due to the above, this will not match any single character URLs, such as: a.co, but it will match if it is preceded by a URL scheme, such as: http://a.co.
I ran into many issues getting the answer from anubhava to work: PHP interpolates $ variables inside double-quoted strings, so the pattern was being mangled and preg_match wasn't working.
Here is what I used:
// Regular expression
$re = '/((https?|ftp):\/\/)?([a-z0-9+!*(),;?&=.-]+(:[a-z0-9+!*(),;?&=.-]+)?@)?([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))(:[0-9]{2,5})?(\/([a-z0-9+%-]\.?)+)*\/?(\?[a-z+&$_.-][a-z0-9;:@&%=+\/.-]*)?(#[a-z_.-][a-z0-9+$%_.-]*)?/i';
// Match all
preg_match_all($re, $blob, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
// The first element of the array is the full match
The Composer package URL Highlight does a good job of this in PHP:
<?php
use VStelmakh\UrlHighlight\UrlHighlight;
$urlHighlight = new UrlHighlight();
$matches = $urlHighlight->getUrls($string);
?>
If it does not have to be regex, you could always use the validate filters that are in PHP.
filter_var('http://example.com', FILTER_VALIDATE_URL);
filter_var (mixed $variable [, int $filter = FILTER_DEFAULT [, mixed $options ]]);
See Types of Filters and Validate Filters in the PHP manual.
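One thing to keep in mind is that FILTER_VALIDATE_URL requires a scheme, so www.example.com/etcetc on its own is rejected. A minimal sketch of a workaround is shown below. The helper name `looksLikeUrl` and the choice to prepend http:// are assumptions for illustration.

```php
<?php
// Hypothetical helper: validate a URL that may or may not carry a scheme.
function looksLikeUrl(string $candidate): bool
{
    // FILTER_VALIDATE_URL needs a scheme, so add one when it is missing.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $candidate)) {
        $candidate = 'http://' . $candidate;
    }

    // filter_var() returns the URL on success and false on failure.
    return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
}

var_dump(looksLikeUrl('http://www.example.com/etcetc')); // bool(true)
var_dump(looksLikeUrl('www.example.com/etcetc'));        // bool(true)
var_dump(looksLikeUrl('not a url'));                     // bool(false)
```

This validates one candidate string at a time; it does not find URLs inside a longer text the way the regex answers do.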
Regex if you want to ensure a URL starts with HTTP/HTTPS:
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
If you do not require the HTTP protocol:
[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

Remove trailing slash on domain extensions without trailing directory

I'm importing data from a csv and I've been looking high and low for a particular regular expression to remove trailing slashes from domain names without a directory after it. See the following example:
example.com/ (remove trailing slash)
example.co.uk/ (remove trailing slash)
example.com/gb/ (do not remove trailing slash)
Can anyone help me out with this or at least point me in the right direction?
Edit: This is my progress so far, I've only matched the extension at the moment but it's picking up those domains with trailing directories.
[a-z0-9\-]+[a-z0-9]\/[a-z]
Many thanks
I don't know how it would compare to a regular expression performance-wise, but you can do it without one.
A simple example:
$string = rtrim ($string, '/');
$string .= (strpos($string, '/') === false) ? '' : '/';
In the second line I'm only adding a / at the end if the string already contains one (to separate domain from folder).
A more solid approach would probably be to only rtrim if the first / found, is the last character of the string.
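That more solid variant could look roughly like this. The function name `stripBareTrailingSlash` is made up, and the sketch assumes the values carry no scheme such as http://, as in the examples from the question.

```php
<?php
// Hypothetical helper: remove the trailing slash only when it is the first
// (and therefore only) slash in the string, i.e. no directory follows the domain.
function stripBareTrailingSlash(string $value): string
{
    $firstSlash = strpos($value, '/');

    if ($firstSlash !== false && $firstSlash === strlen($value) - 1) {
        return substr($value, 0, -1);
    }

    return $value; // no slash at all, or a directory part is present
}

echo stripBareTrailingSlash('example.com/'), PHP_EOL;    // example.com
echo stripBareTrailingSlash('example.co.uk/'), PHP_EOL;  // example.co.uk
echo stripBareTrailingSlash('example.com/gb/'), PHP_EOL; // example.com/gb/
```

Unlike the rtrim version, this leaves a value such as example.com/gb untouched instead of appending a slash to it.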
I'm not sure, but you could try this: only remove the slash when the value is just a server name, like the one $_SERVER['SERVER_NAME'] returns, and keep it otherwise, because $_SERVER['SERVER_NAME'] never includes a directory.
try this
/^(http|https|ftp)\:\/\/[a-z0-9\-\.]+\.[a-z]{2,3}(:[a-z0-9]*)?\/?([a-z0-9\-\._\?\,\'\/\\\+&%\$#\=~])*$/i
You could test for a directory segment, i.e. letters between two slashes, then remove the last character if none is found.
This is JavaScript, but it'd be similar in PHP.
/\/[a-z]+\//
var txt = 'example.com/gb/';
var match = txt.match(/\/[a-z]+\//);
if (!match) {
alert(txt.substring(0, txt.length - 1));
}
else {
alert(txt);
}
http://jsfiddle.net/xjKTS/
Try this, it works:
<?php
$result = preg_replace('/^([^\/]+)(\/)$/','$1',$your_data);
?>
I have tested it like this:
<?php
$reg = '/^([^\/]+)(\/)$/';
echo preg_replace($reg,'$1','example.com/');//example.com
echo preg_replace($reg,'$1','example.co.uk/');//example.co.uk
echo preg_replace($reg,'$1','example.com/gb/');//example.com/gb/
?>

preg_match expression how to ignore a character

I am absolutely a newbie and have not ventured to this level yet but needed to be able to strip a domain down to only the hostname for a search function. I looked and found this below which pretty much works except if the domain name has any - in it. So http://www.example.com strips down to example.com as does www.example.com but www.exa-mple.com becomes example.com.
$pattern = '/\w+\..{2,3}(?:\..{2,3})?(?:$|(?=\/))/i';
$url = $myurl;
if (preg_match($pattern, $url, $matches) === 1) {
$mydom = $matches[0];
}
What would have to be changed in the expression so that it accepts the - in the domain names?
You'd be better off with parse_url function:
parse_url($url)
Just prepend http:// if the url doesn't start with it.
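A minimal sketch of that suggestion follows. The function name `hostnameOf` is hypothetical, and stripping a leading www. is an assumption based on the examples in the question, not something parse_url does by itself.

```php
<?php
// Hypothetical helper: extract the host name from a URL with or without a scheme.
function hostnameOf(string $url): ?string
{
    // parse_url() only finds the host reliably when a scheme is present.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // could not be parsed
    }

    // Assumption: a leading "www." should be dropped, as in the question.
    return preg_replace('/^www\./', '', strtolower($host));
}

echo hostnameOf('http://www.example.com'), PHP_EOL;     // example.com
echo hostnameOf('www.exa-mple.com'), PHP_EOL;           // exa-mple.com
echo hostnameOf('www.exa-mple.com/some/path'), PHP_EOL; // exa-mple.com
```

Hyphens are kept because no character class is involved. Other subdomains are not removed, so sub.example.com stays as it is.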
Your regex currently allows the character _ and disallows the character -, which means it accepts invalid URLs. You can correct this with the following character class:
$pattern = '/[a-z0-9-]+\..{2,3}(?:\..{2,3})?(?:$|(?=\/))/i';
Note that there are still issues with this. First, domain names are not allowed to start or end with a hyphen. Second, you are currently allowing any character in the TLD, whereas they only contain letters.
The best solution would be to use a proper URL parsing library and not to try to do this yourself.
$sites = array('mysite.com',
'www.mysite.com',
'http://www.mysite.com',
'www.my-site.com',
'sub.folder.2.example.com',
'http://www.mysite.com/argh/index.php');
$reg = '%^(?:http://)?(?:[^.]*\.)*([a-zA-Z0-9_-]+\.[a-zA-Z0-9]+)%m';
foreach($sites as $site)
{
if(preg_match($reg,$site,$matches))
{
echo $matches[1],PHP_EOL;
}
}
Output:
mysite.com
mysite.com
mysite.com
my-site.com
example.com
mysite.com
