I'm trying to write a PHP script that replaces ONLY the domain of every URL in the content with a new domain if the URL ends with .css.
For example:
www.example.com/asset/css/style.css
After checking the condition and doing the replacement, we get:
www.new-domain.net/asset/css/style.css
Would anyone please help me find the correct pattern for this?
So far I've tried this:
preg_replace('/[http://].*\.(css)/i','www.new-domain.net',$Html_contents)
If I understood correctly, and each string holds a single URL, you should try something like:
preg_replace('/^(https?:\/\/|)?[^\/]*(?=\/.*\.css$)/i','$1www.new-domain.net',$Html_contents)
Where
^ anchors the match at the start of the string, so only the domain is replaced and not the path segments after it
(https?:\/\/|) means that the string http:// (or https://) is optional
[^\/]* means "anything but /"
(?=\/.*\.css$) means "a /, followed by anything, followed by a literal dot, followed by css, followed by end of string"
See demo here.
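When the input is a whole HTML document rather than a single URL, an end-of-string anchor no longer helps. The following is only a sketch of one way to handle that case: the function name replaceCssDomain and the choice of delimiters (whitespace, quotes, angle brackets) are assumptions, not something from the question. It matches each URL that has a host and a path ending in .css, then rebuilds it with the new host.

```php
<?php
// Hypothetical helper: swaps the host of every URL in $content whose path ends in ".css".
// Assumes URLs are delimited by whitespace, quotes, or angle brackets.
function replaceCssDomain(string $content, string $newDomain): string
{
    // Group 1: optional scheme. Group 2: host (dot-separated labels).
    // Group 3: path ending in ".css", which must be followed by a delimiter or end of input.
    $pattern = '~(https?://)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)(/[^\s"\'<>]*\.css)(?=[\s"\'<>]|$)~i';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($newDomain): string {
            // Keep the scheme and path, replace only the host.
            return $m[1] . $newDomain . $m[3];
        },
        $content
    );
}
```

URLs that do not end in .css (images, scripts, style.css?v=1) are left untouched, as are relative paths, which have no host to replace.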
If the domain is static, you can do this without using a regex:
$old_domain = 'https://www.example.com/asset/css/style.css';
if (substr($old_domain, -4) == '.css'){
echo str_replace('www.example.com', 'www.new-domain.net', $old_domain);
}
if (preg_match('#^https?://account.oneplus.net/invite/claim/....-....-....-....#', $url) === 0) {
return "Invalid link";
}
I currently use this code (in PHP) to verify the URL. However, it also accepts links that have other stuff trailing after the code. How do I fix this so that only links ending at the code, with or without a trailing /, are accepted?
This was the regex I was looking for:
preg_match('#^https?://account.oneplus.net/invite/claim/\S{4}-\S{4}-\S{4}-\S{4}/?$#', $url) === 0
I suggest replacing all the dots with \S (which matches any non-space character), so .... would be written as \S{4}, because . would also match a horizontal space. Also add /? to match an optional / at the end, followed by $ so that nothing else can come after it.
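As a rough sketch, the check can be wrapped in a small function. The name isValidInviteLink is made up for illustration, and escaping the dots in the host name is an extra tightening that the answer above does not include.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function isValidInviteLink(string $url): bool
{
    // Four groups of four non-space characters, an optional trailing slash, then end of string.
    // The dots in the host are escaped here (an addition) so they only match a literal dot.
    $pattern = '#^https?://account\.oneplus\.net/invite/claim/\S{4}-\S{4}-\S{4}-\S{4}/?$#';

    return preg_match($pattern, $url) === 1;
}
```

With this, a link with or without the final / passes, while anything trailing after the code is rejected.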
I'm not very good at regular expressions at all.
I've been using a lot of framework code to date, but I'm unable to find a pattern that matches a URL like http://www.example.com/etcetc and also catches something like www.example.com/etcetc and example.com/etcetc.
For matching all kinds of URLs, the following code should work:
<?php
$regex = "((https?|ftp)://)?"; // SCHEME
// Note: every $ is escaped as \$ so PHP does not interpolate $_ inside the double-quoted strings
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass
$regex .= "([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))"; // Host or IP address
$regex .= "(:[0-9]{2,5})?"; // Port
$regex .= "(/([a-z0-9+\$_%-]\.?)+)*/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+/\$_.-]*)?"; // GET Query
$regex .= "(#[a-z_.-][a-z0-9+\$%_.-]*)?"; // Anchor
?>
Then, the correct way to check against the regex is as follows:
<?php
if(preg_match("~^$regex$~i", 'www.example.com/etcetc', $m))
var_dump($m);
if(preg_match("~^$regex$~i", 'http://www.example.com/etcetc', $m))
var_dump($m);
?>
Courtesy: Comments made by splattermania in the PHP manual: preg_match
RegEx Demo in regex101
This worked for me in all cases I had tested:
$url_pattern = '/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:@\-_=#])*/';
Tests:
http://test.test-75.1474.stackoverflow.com/
https://www.stackoverflow.com
https://www.stackoverflow.com/
http://wwww.stackoverflow.com/
http://wwww.stackoverflow.com
http://test.test-75.1474.stackoverflow.com/
http://www.stackoverflow.com
http://www.stackoverflow.com/
stackoverflow.com/
stackoverflow.com
http://www.example.com/etcetc
www.example.com/etcetc
example.com/etcetc
user:pass@example.com/etcetc
example.com/etcetc?query=aasd
example.com/etcetc?query=aasd&dest=asds
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with-or-without-http-www
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with-or-without-http-www/
Every valid Internet URL has at least one dot, so the above pattern simply looks for at least two strings joined by a dot and made up of characters that a URL may contain.
Try this:
/^(https?:\/\/)?(www\.)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/
It accepts URLs with or without http://, https://, and www.
You can put a question mark after part of a regular expression to make that part optional, so you would want to use:
http:\/\/(www\.)?
That will match anything that starts with either http://www. or http:// (with no www.)
You could just use a replace method to remove the above, thus getting you the domain. It depends on what you need the domain for.
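A minimal sketch of that replace approach follows. The function name domainOf is hypothetical, and the pattern is widened (as an assumption about what is wanted) to also make the scheme itself optional and to accept https://.

```php
<?php
// Hypothetical helper: strips an optional scheme and an optional "www." prefix,
// then keeps everything up to the first slash.
function domainOf(string $url): string
{
    // Both groups are optional, so a bare "example.com" passes through unchanged.
    $rest = preg_replace('~^(https?://)?(www\.)?~i', '', $url);

    // Cut off any path that follows the host.
    return explode('/', $rest, 2)[0];
}
```

This only removes a literal www. prefix; other subdomains are kept as part of the result.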
Try something like this:
.*([\w-]+\.)+[a-z]{2,5}(/[\w-]+)*
Use:
/(https?:\/\/)?((?:(\w+-)*\w+)\.)+(?:[a-z]{2})(\/?\w?-?=?_?\??&?)+[\.]?([a-z0-9\?=&_\-%#])?/g
It matches something.com, http(s):// or www. It does not match other [something]:// URLs though, but for my purpose that's not necessary.
The regex matches e.g.:
http://foo.co.uk/
www.regex.com/foo.html?q=bar$some=thi-ng,regex
regex.foo.com/blog
You can try this:
r"(https?:\/\/)?([\w-]+\.)+([a-z]{2,5})(\/+\w+)? "
Selection:
may start with http:// or https:// (optional)
one or more words, each ending with a dot (.)
followed by 2 to 5 characters from [a-z]
followed by "/[anything]" (optional)
followed by a space
Try this
$url_reg = /(ftp|https?):\/\/(\w+:?\w*@)?(\S+)(:[0-9]+)?(\/([\w#!:.?+=&%@!\/-])?)?/;
I have been using the following, which works for all my test cases, as well as fixes any issues where it would trigger at the end of a sentence preceded by a full-stop (end.), or where there were single character initials, such as 'C.C. Plumbing'.
The following regex contains multiple {2,}s, which means two or more matches of the previous pattern.
((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]{2,}\.([a-zA-Z0-9\&\.\/\?\:@\-_=#]){2,}
Matches URLs such as, but not limited to:
https://example.com
http://example.com
example.com
example.com/test
example.com?value=test
Does not match non-URLs such as, but not limited to:
C.C Plumber
A full-stop at the end of a sentence.
Single characters such as a.b or x.y
Please note: Due to the above, this will not match any single character URLs, such as: a.co, but it will match if it is preceded by a URL scheme, such as: http://a.co.
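To see the behaviour described above on running text, here is a small sketch using preg_match_all. The function name extractUrls and the sample sentence are made up for illustration; the pattern is the one from this answer wrapped in / delimiters.

```php
<?php
// Hypothetical helper: returns every URL-like substring found in $text.
function extractUrls(string $text): array
{
    // Each side of the dot needs at least two allowed characters ({2,}),
    // which is what rejects "C.C." and a full stop at the end of a sentence.
    $pattern = '/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]{2,}\.([a-zA-Z0-9\&\.\/\?\:@\-_=#]){2,}/';

    preg_match_all($pattern, $text, $matches);

    // $matches[0] holds the full matches.
    return $matches[0];
}

$found = extractUrls('See https://example.com or example.com/test, not C.C. Plumbing at the end.');
// $found contains 'https://example.com' and 'example.com/test'
```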
I had a lot of trouble getting the answer from anubhava to work, because PHP treats $ inside double-quoted strings as the start of a variable, so the preg_match call wasn't working.
Here is what I used:
// Regular expression
$re = '/((https?|ftp):\/\/)?([a-z0-9+!*(),;?&=.-]+(:[a-z0-9+!*(),;?&=.-]+)?@)?([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))(:[0-9]{2,5})?(\/([a-z0-9+%-]\.?)+)*\/?(\?[a-z+&$_.-][a-z0-9;:@&%=+\/.-]*)?(#[a-z_.-][a-z0-9+$%_.-]*)?/i';
// Match all
preg_match_all($re, $blob, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
// The first element of the array is the full match
The Composer package URL Highlight does a good job of this in PHP:
<?php
use VStelmakh\UrlHighlight\UrlHighlight;
$urlHighlight = new UrlHighlight();
$matches = $urlHighlight->getUrls($string);
?>
If it does not have to be regex, you could always use the validate filters that are in PHP.
filter_var('http://example.com', FILTER_VALIDATE_URL);
filter_var (mixed $variable [, int $filter = FILTER_DEFAULT [, mixed $options ]]);
Types of Filters
Validate Filters
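One thing to keep in mind is that FILTER_VALIDATE_URL only accepts URLs that include a scheme, so www.example.com/etcetc on its own is rejected. A possible workaround, sketched here with a hypothetical function name isValidUrl, is to assume http:// when no scheme is present before validating.

```php
<?php
// Hypothetical helper: validates a URL with filter_var, assuming http:// when no scheme is given.
function isValidUrl(string $candidate): bool
{
    // A scheme is a letter followed by letters, digits, "+", "." or "-", then "://".
    if (!preg_match('~^[a-z][a-z0-9+.\-]*://~i', $candidate)) {
        $candidate = 'http://' . $candidate;
    }

    // filter_var returns the filtered string on success and false on failure.
    return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
}
```

Whether defaulting to http:// is acceptable depends on what the URL is used for afterwards.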
Regex if you want to ensure a URL starts with HTTP/HTTPS:
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
If you do not require the HTTP protocol:
[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)
I'm importing data from a csv and I've been looking high and low for a particular regular expression to remove trailing slashes from domain names without a directory after it. See the following example:
example.com/ (remove trailing slash)
example.co.uk/ (remove trailing slash)
example.com/gb/ (do not remove trailing slash)
Can anyone help me out with this or at least point me in the right direction?
Edit: This is my progress so far, I've only matched the extension at the moment but it's picking up those domains with trailing directories.
[a-z0-9\-]+[a-z0-9]\/[a-z]
Many thanks
I don't know how it would compare to a regular expression performance-wise, but you can do it without one.
A simple example:
$string = rtrim ($string, '/');
$string .= (strpos($string, '/') === false) ? '' : '/';
In the second line I'm only adding a / at the end if the string already contains one (to separate domain from folder).
A more solid approach would probably be to only rtrim if the first / found is also the last character of the string.
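That stricter variant could look roughly like the sketch below. The function name trimBareDomainSlash is hypothetical, and it assumes the values carry no scheme (no http://), as in the question's examples.

```php
<?php
// Hypothetical helper: removes the trailing slash only when it is the first
// (and therefore only) slash in the string, i.e. there is no directory.
function trimBareDomainSlash(string $value): string
{
    $firstSlash = strpos($value, '/');

    // Trim only when the first slash found is also the last character.
    if ($firstSlash !== false && $firstSlash === strlen($value) - 1) {
        return substr($value, 0, -1);
    }

    return $value;
}
```

Unlike the two-line version, this never adds a slash to a value such as example.com/gb that did not have one.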
Not sure, but you can try this: remove the slash only if the value is just $_SERVER['SERVER_NAME'], and keep it otherwise, because $_SERVER['SERVER_NAME'] returns the host name without any directory.
try this
/^(http|https|ftp)\:\/\/[a-z0-9\-\.]+\.[a-z]{2,3}(:[a-z0-9]*)?\/?([a-z0-9\-\._\?\,\'\/\\\+&%\$#\=~])*$/i
You could test for a match on a directory segment such as /gb/, then remove the last character if it's not found.
This is JavaScript, but it'd be similar in PHP.
/\/[a-z]+\//
var txt = 'example.com/gb/';
var match = txt.match(/\/[a-z]+\//);
if (!match) {
alert(txt.substring(0, txt.length - 1));
}
else {
alert(txt);
}
http://jsfiddle.net/xjKTS/
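A PHP version of the same idea might look like this sketch. The function name dropSlashUnlessDirectory is hypothetical, and it uses rtrim instead of cutting the last character, which is a small deviation from the JavaScript so that a value without a trailing slash is not shortened.

```php
<?php
// Hypothetical PHP port of the JavaScript idea above.
function dropSlashUnlessDirectory(string $txt): string
{
    // A directory segment is a slash, some lowercase letters, and another slash.
    if (preg_match('~/[a-z]+/~', $txt) !== 1) {
        // No directory found: remove trailing slashes only (not an arbitrary last character).
        return rtrim($txt, '/');
    }

    return $txt;
}
```

Like the original, the [a-z]+ test only recognises directories made of lowercase letters.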
Try this, it works:
<?php
$result = preg_replace('/^([^\/]+)(\/)$/','$1',$your_data);
?>
I tested it like this:
<?php
$str1 = 'example.com/';
$str2 = 'example.co.uk/';
$str3 = 'example.com/gb/';
$reg = '/^([^\/]+)(\/)$/';
echo preg_replace($reg,'$1',$str1);//example.com
echo preg_replace($reg,'$1',$str2);//example.co.uk
echo preg_replace($reg,'$1',$str3);//example.com/gb/
?>
I am absolutely a newbie and have not ventured to this level yet but needed to be able to strip a domain down to only the hostname for a search function. I looked and found this below which pretty much works except if the domain name has any - in it. So http://www.example.com strips down to example.com as does www.example.com but www.exa-mple.com becomes example.com.
$pattern = '/\w+\..{2,3}(?:\..{2,3})?(?:$|(?=\/))/i';
$url = $myurl;
if (preg_match($pattern, $url, $matches) === 1) {
$mydom = $matches[0];
}
What would have to be changed in the expression so that it accepts the - in the domain names?
You'd be better off with the parse_url function:
parse_url($url)
Just prepend http:// if the url doesn't start with it.
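Put together, that suggestion could be sketched as follows. The function name hostOf is hypothetical, and stripping a leading www. is an assumption based on the behaviour the question describes.

```php
<?php
// Hypothetical helper: returns the host of $url without a leading "www.", or null on failure.
function hostOf(string $url): ?string
{
    // parse_url only reports a host when a scheme is present, so prepend one if needed.
    if (!preg_match('~^https?://~i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        // parse_url returns false for seriously malformed URLs and null when there is no host.
        return null;
    }

    return preg_replace('~^www\.~i', '', $host);
}
```

Because the host comes from parse_url rather than a character class, hyphens in the domain name are handled without any extra work. Other subdomains are left in place; only www. is removed.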
Your regex currently allows the character _ and disallows the character -, which means it accepts invalid host names and rejects valid ones. You can correct this with the following character class:
$pattern = '/[a-z0-9-]+\..{2,3}(?:\..{2,3})?(?:$|(?=\/))/i';
Note that there are still issues with this. First, domain names are not allowed to start or end with a hyphen. Second, you are currently allowing any character in the TLD, whereas they only contain letters.
The best solution would be to use a proper URL parsing library and not to try to do this yourself.
$sites = array('mysite.com',
'www.mysite.com',
'http://www.mysite.com',
'www.my-site.com',
'sub.folder.2.example.com',
'http://www.mysite.com/argh/index.php');
$reg = '%^(?:http://)?(?:[^.]*\.)*([a-zA-Z0-9_-]+\.[a-zA-Z0-9]+)%m';
foreach($sites as $site)
{
if(preg_match($reg,$site,$matches))
{
echo $matches[1],PHP_EOL;
}
}
Output:
mysite.com
mysite.com
mysite.com
my-site.com
example.com
mysite.com