I'm trying to write a PHP script that replaces ONLY the domain of every URL in the content with a new domain, if the URL ends with .css.
For example:
www.example.com/asset/css/style.css
After checking the condition and replacing, we should have:
www.new-domain.net/asset/css/style.css
Would anyone please help me find the correct pattern for this?
So far I've tried this:
preg_replace('/[http://].*\.(css)/i','www.new-domain.net',$Html_contents)
If I understood correctly, you could try something like:
preg_replace('/(https?:\/\/|)?[^\/]*(?=\/.*\.css$)/i','$1www.new-domain.net',$Html_contents)
Where
(https?:\/\/|) means that the string http:// (or https://) is optional
[^\/]* means "any run of characters other than /"
(?=\/.*\.css$) means "a /, followed by anything, followed by a literal dot, followed by css, followed by end of string"
See demo here.
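The pattern above anchors on the end of the string, so it fits a single URL. For content that holds several URLs, a variation along these lines may work; it is a sketch under assumptions, not a tested drop-in. The sample markup is made up, and the rule that a URL stops at whitespace, a quote, or an angle bracket is an assumption about the content.

```php
<?php
// Hypothetical sample content; the host names are the ones from the question.
$Html_contents = '<link href="http://www.example.com/asset/css/style.css">'
    . '<script src="http://www.example.com/asset/js/app.js"></script>';

// (?<![\w./-])  do not start in the middle of a host or a path
// (https?://)?  optional scheme, re-inserted through $1
// [\w.-]+       the host that gets replaced
// (?=...)       only when a path ending in .css follows, with no whitespace,
//               quote, or angle bracket in between
$pattern = '~(?<![\w./-])(https?://)?[\w.-]+(?=/[^\s"\'<>]*\.css\b)~i';

$result = preg_replace($pattern, '$1www.new-domain.net', $Html_contents);
echo $result, PHP_EOL;
```

Only the stylesheet URL is rewritten here; the .js URL keeps its original host because the lookahead never finds .css after it.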
If the domain is static, you can do this without a regex:
$old_domain = 'https://www.example.com/asset/css/style.css';
// Only touch URLs that end in .css
if (substr($old_domain, -4) === '.css') {
    echo str_replace('www.example.com', 'www.new-domain.net', $old_domain);
}
I have one URL fragment, page/login, and I need to know whether another URL contains it.
These should match:
/admin/page/login/
/admin/page/login
admin/page/login
http://www.dot.com/admin/page/login
/admin/page/login?id=10
/admin/page/login/id/10
/admin/page/login/?id=10
/admin/page/login/user?id=10
/admin/page/login/user/?id=10
page/login
page/login/
page/login/id/10
/page/login/id/10
And these should not:
/admin/firstpage/login
admin/page/loginOk
/admin/page/loginOk/id/10
mypage/login/id/10
/mypage/login/id/10
mypage/login
I tried: page\/login[\/\s\?], \/?page\/login[\/\s\?] without any result
You can use a word boundary so partial matches aren't matched.
\bpage\/login[\/\s?]
Demo: https://regex101.com/r/yhNsdw/1/
Also if you change your delimiter none of the forward slashes will need to be escaped.
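As a rough check of the word-boundary idea, the sketch below uses ~ as the delimiter so the slashes stay unescaped. One deliberate change from the pattern above: when each URL is tested as its own string, a URL that ends right after login has no trailing character for the class to consume, so the sketch adds $ as an alternative. The function name containsFragment is made up for illustration.

```php
<?php
// '~' as delimiter: forward slashes need no escaping.
// "|$" is an addition so that URLs ending right after "login" also match.
$pattern = '~\bpage/login(?:[/\s?]|$)~';

// Hypothetical helper name, used only for this example.
function containsFragment(string $pattern, string $url): bool
{
    return preg_match($pattern, $url) === 1;
}

$shouldMatch = ['/admin/page/login/', '/admin/page/login', 'page/login',
    'http://www.dot.com/admin/page/login', '/admin/page/login?id=10', 'page/login/id/10'];
$shouldNotMatch = ['/admin/firstpage/login', 'admin/page/loginOk',
    '/admin/page/loginOk/id/10', 'mypage/login/id/10', 'mypage/login'];

foreach (array_merge($shouldMatch, $shouldNotMatch) as $url) {
    echo str_pad($url, 40), containsFragment($pattern, $url) ? 'match' : 'no match', PHP_EOL;
}
```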
I'm trying to write a regexp.
Some background info: I am trying to see whether the REQUEST_URI of my website's URL contains another URL, like this:
http://mywebsite.com/google.com/search=xyz
However, the URL won't always contain the 'http' or the 'www', so the pattern should also match strings like:
http://mywebsite.com/yahoo.org/search=xyz
http://mywebsite.com/www.yahoo.org/search=xyz
http://mywebsite.com/msn.co.uk
http://mywebsite.com/http://msn.co.uk
There are a bunch of regexps out there to match URLs, but none I have found do an optional match on the http and www.
I'm wondering if the pattern to match could be something like:
^([a-z]).(com|ca|org|etc)(.)
I thought maybe another option was to perhaps just match any string that had a dot (.) in it. (as the other REQUEST_URI's in my application typically won't contain dots)
Does this make sense to anyone?
I'd really appreciate some help with this; it's been blocking my project for weeks.
Thank you very much
-Tim
I suggest a simple approach that builds on what you said (anything with a dot in it) but also accounts for the forward slashes, so that it captures everything and doesn't miss unusual URLs. Something like:
^((?:https?:\/\/)?[^./]+(?:\.[^./]+)+(?:\/.*)?)$
It reads as:
optional http:// or https://
non-dot-or-forward-slash characters
one or more sets of a dot followed by non-dot-or-forward-slash characters
optional forward slash and anything after it
The whole match is captured in the first group.
It would match, for example:
nic.uk
nic.uk/
http://nic.uk
http://nic.uk/
https://example.com/test/?a=bcd
Verifying they are valid URLs is another story! It would also match:
index.php
It would not match:
directory/index.php
The minimal match is basically something.something, with no forward slash in it, unless it comes at least one character past the dot. So just be sure not to use that format for anything else.
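To see how this behaves on a REQUEST_URI, here is a small sketch. It assumes the leading slash of the request path is stripped before matching, and extractEmbeddedUrl is a hypothetical helper name, not something from the question.

```php
<?php
// Hypothetical helper: returns the embedded URL, or null if the path
// does not look like "something.something".
function extractEmbeddedUrl(string $requestUri): ?string
{
    // Same pattern as above, with '~' as delimiter so '/' needs no escaping.
    $pattern = '~^((?:https?://)?[^./]+(?:\.[^./]+)+(?:/.*)?)$~';

    // Assumption: REQUEST_URI starts with "/", which is not part of the embedded URL.
    $candidate = ltrim($requestUri, '/');

    return preg_match($pattern, $candidate, $m) === 1 ? $m[1] : null;
}

var_dump(extractEmbeddedUrl('/google.com/search=xyz'));  // embedded URL found
var_dump(extractEmbeddedUrl('/http://msn.co.uk'));       // scheme is optional
var_dump(extractEmbeddedUrl('/directory/index.php'));    // null: slash before the dot
```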
To match an optional part, you use a question mark ?, see Optional Items.
For example to match an optional www., capture the domain and the search term, the regular expression could be
(www\.)?(.+?)/search=(.+)
Note that the question mark in .+? plays a different role: there it makes the + quantifier non-greedy, see http://www.regular-expressions.info/repeat.html.
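A short sketch of both uses of the question mark, with ~ as the delimiter and sample strings based on the question. The last two calls compare the non-greedy .+? with a greedy .+ on a made-up input that contains /search= twice.

```php
<?php
$pattern = '~(www\.)?(.+?)/search=(.+)~';

// Optional group present: $withWww[1] is "www."
preg_match($pattern, 'www.yahoo.org/search=xyz', $withWww);

// Optional group absent: $withoutWww[1] is an empty string
preg_match($pattern, 'google.com/search=xyz', $withoutWww);

// Non-greedy vs. greedy on an input with two "/search=" parts (made-up input)
preg_match('~(.+?)/search=(.+)~', 'a.com/search=1/search=2', $lazy);
preg_match('~(.+)/search=(.+)~', 'a.com/search=1/search=2', $greedy);

print_r([$withWww, $withoutWww, $lazy, $greedy]);
```

The non-greedy version stops at the first /search=, the greedy one at the last.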
You might try starting your regex with
^(http://)?(www\.)?
And then the rules to match the rest of a URL.
$re = '/http:\/\/mywebsite\.com\/((?:http:\/\/)?[0-9A-Za-z]+(?:-+[0-9A-Za-z]+)*(?:\.[0-9A-Za-z]+(?:-+[0-9A-Za-z]+)*)+(?:\/.*)?)/';
https://regex101.com/r/x6vUvp/1
It obeys the DNS rule that a hyphen must have a letter or digit on each side. Replace http with https? to allow https URLs as well.
According to the list of TLDs at Wikipedia there are at least 1519 of them, and the list is not constant, so you may want to give the domain its own capture group so that it can be verified with an online API or a file listing them all.
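One way the separate capture group could look is sketched below. The extra group around the host, the embeddedDomain helper, and the three-entry TLD list are all assumptions for illustration; a real list would come from a file or an API as described above.

```php
<?php
// Group 1: the whole embedded URL. Group 2 (added here): just the host.
$re = '~http://mywebsite\.com/((?:https?://)?'
    . '([0-9A-Za-z]+(?:-+[0-9A-Za-z]+)*(?:\.[0-9A-Za-z]+(?:-+[0-9A-Za-z]+)*)+)'
    . '(?:/.*)?)~';

// Hypothetical, deliberately tiny TLD list.
$knownTlds = ['com', 'org', 'uk'];

// Hypothetical helper: returns the embedded host if its TLD is known, else null.
function embeddedDomain(string $re, string $url, array $tlds): ?string
{
    if (preg_match($re, $url, $m) !== 1) {
        return null;
    }
    $domain = strtolower($m[2]);
    $tld = substr($domain, strrpos($domain, '.') + 1);

    return in_array($tld, $tlds, true) ? $domain : null;
}

var_dump(embeddedDomain($re, 'http://mywebsite.com/www.yahoo.org/search=xyz', $knownTlds));
var_dump(embeddedDomain($re, 'http://mywebsite.com/index.php', $knownTlds));
```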
Here are my two cents:
$regex = "/http:\/\/mywebsite\.com\/((http:\/\/|www\.)?[a-z]*(\.org|\.co\.uk|\.com).*)/";
See the working example
But I'm sure you can do better!
Hope it helps.
I have a problem with the following piece of code: pastebin. For example:
/^\/index\.php\/index\/home\/(\w+)$/
It adds a backslash before the .php extension. Any ideas how to fix it?
Well, if you pass that example as the URI, I see that on line 10 you have preg_quote($uri). That should be the reason: since the dot (.) has a special meaning in a regex, the function escapes it.
But I believe that is what you want, because if you strip that backslash your regex will match ANY character in that position (including the dot). So any of these will be valid:
indexBphp
index-php
indexmphp
index.php
etc...
An unescaped dot in a regex means "match any character at this position". So I believe there is nothing wrong, right?
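The effect can be reproduced in a few lines. The pastebin code is not shown here, so the $uri value and the way the final regex is assembled are assumptions made for this sketch.

```php
<?php
// Assumed input, chosen to reproduce the regex from the question.
$uri = '/index.php/index/home/';

// preg_quote() escapes regex metacharacters such as "."; passing "/" as the
// second argument also escapes the delimiter.
$regex = '/^' . preg_quote($uri, '/') . '(\w+)$/';
echo $regex, PHP_EOL;

// The same pattern with the dot left unescaped, for comparison.
$unescaped = '/^\/index.php\/index\/home\/(\w+)$/';

var_dump(preg_match($regex, '/indexBphp/index/home/foo'));     // escaped dot: no match
var_dump(preg_match($unescaped, '/indexBphp/index/home/foo')); // bare dot: matches
```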
One way to fix this if you still want to have that dot there is to build the regex in two separate parts:
// Limit of 2: split only at the first '.php', so $urlDivided has at most two parts
$urlDivided = explode('.php', $url, 2);
$this->finalRegex = preg_quote($urlDivided[0]) . '.php' . preg_quote($urlDivided[1]);
Obviously, the method above assumes that you always have the '.php' extension in the url. You should do sanity checks.
I would like to use preg_match in PHP to test the format of a URL. The URL looks like this:
/2013/09/05/item-01.html
I'm trying to get this to work :
if (preg_match("/([0-9]{4})\/([0-9]{2})\/([0-9]{2})/[.]+[.html]$", $_SERVER['REQUEST_URI'])) :
echo "match";
endif;
But something's not quite right. Any ideas?
Try:
if (preg_match('!\d{4}/\d{2}/\d{2}/.*\.html$!', $_SERVER['REQUEST_URI'])) {
    echo 'match';
}
\d is short for [0-9] and you can use different start/end delimiters (I use ! in this case) to make the regexp more readable when you're trying to match slashes.
It looks like you are nearly there, though there are a few minor problems:
You forgot to escape the last "/" before the page name.
The [.]+ should be [^.]+: you aren't looking for one or more periods, you are looking for anything that is not a period.
You shouldn't use [] to match html, but rather () or nothing at all.
if (preg_match("/([0-9]{4})\/([0-9]{2})\/([0-9]{2})\/([^.]+\.html)$/", $_SERVER['REQUEST_URI'])) :
    echo "match";
endif;
Also, be aware of what () does: parentheses capture whatever is matched inside them. In your case I'm not sure whether you want to capture every directory up to the file or not.
My guess is you had a working expression for a file path, and it stopped working when you tried to add the file name part.
preg_match() requires a pair of delimiter characters, one at each end of the expression. It looks like you have these, but you've put an extra bit of the expression (i.e. the file name) at the end of the string, outside of the delimiters. This is invalid.
"/([0-9]{4})\/([0-9]{2})\/([0-9]{2})/[.]+[.html]$"
 ^                                  ^
 your start delimiter      your end delimiter
You need to move the expression code [.]+[.html]$ that is currently after the end delimiter so that it sits inside the delimiters.
That should solve the problem.
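Putting the points from these answers together, a minimal sketch might look like the following. It assumes the path has no query string and that the file name contains no further slashes; parseDatedPath and the array keys are hypothetical names.

```php
<?php
// '!' as delimiter, anchored at both ends; four capture groups.
$pattern = '!^/(\d{4})/(\d{2})/(\d{2})/([^/]+\.html)$!';

// Hypothetical helper: returns the date parts and file name, or null on no match.
function parseDatedPath(string $pattern, string $path): ?array
{
    if (preg_match($pattern, $path, $m) !== 1) {
        return null;
    }

    return ['year' => $m[1], 'month' => $m[2], 'day' => $m[3], 'file' => $m[4]];
}

print_r(parseDatedPath($pattern, '/2013/09/05/item-01.html'));
var_dump(parseDatedPath($pattern, '/2013/9/05/item-01.html')); // null: month needs two digits
```

With a live request, the path part of $_SERVER['REQUEST_URI'] would be passed in place of the literal string.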
This is some code from a PHP file I'm working with. I need to match 'domain.com', but when I enter that it doesn't work, because the code parses a document looking for href attributes and I think it needs the http://www. prefix for the match. I tried the preg_match below but it didn't work, and my coding isn't too great. Any help would be appreciated.
preg_match ("/domain.com/i");
$match = 'http://www.domain.com';
for($i=0;$i<$documentLinks->length;$i++)
{
    $documentLink = $documentLinks->item($i);
    if ($documentLink->hasAttribute('href') AND substr(strtolower($documentLink->getAttribute('href')), 0, strlen($match)) == $match)
    {
Try this:
for($i=0;$i<$documentLinks->length;$i++)
{
    $documentLink = $documentLinks->item($i);
    if ($documentLink->hasAttribute('href'))
    {
        if (preg_match('!^https?://([^/]+\.)?domain\.com(/|#|$|\?)!i', trim($documentLink->getAttribute('href'))))
        {
The regexp is the important part:
^https?://([^/]+\.)?domain\.com(/|#|$|\?)
Start at the beginning of the string and match http or https, then an optional subdomain that may not include forward slashes (so you know you're still in the domain part), followed by the domain you want to match, then either the start of a path, the start of a fragment, the end of the URL, or the start of a query string.
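To check the pattern without a DOM document, it can be run against a few href values directly. The sample URLs and the isDomainLink helper are made up for this sketch.

```php
<?php
$pattern = '!^https?://([^/]+\.)?domain\.com(/|#|$|\?)!i';

// Hypothetical helper wrapping the check from the loop above.
function isDomainLink(string $pattern, string $href): bool
{
    return preg_match($pattern, trim($href)) === 1;
}

$hrefs = [
    'http://www.domain.com',        // bare host
    'https://domain.com/page',      // no subdomain, with a path
    'http://sub.domain.com?x=1',    // query string right after the host
    'http://notdomain.com/',        // different host that ends in the same letters
    'http://domain.com.evil.net/',  // domain.com used as a subdomain of another host
    'http://evil.net/domain.com',   // domain.com only appears in the path
];

foreach ($hrefs as $href) {
    echo str_pad($href, 32), isDomainLink($pattern, $href) ? 'match' : 'no match', PHP_EOL;
}
```

The first three are accepted and the last three are rejected, because the host must end exactly at domain.com before a path, fragment, query string, or the end of the URL.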