I want to convert relative URLs such as ../stuff/more.php to http://www.example.com/stuff/more.php in my RSS feed.
The PHP code I used to do so is the following:
$content = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"'>]+)#", '$1http://www.example.com/$2$3', $content);
The result is wrong though; it returns the URL like this:
http://www.example.com/../stuff/more.php
Notice the ../ part hasn't been removed, please help!
So basically:
This is what I have: ../stuff/more.php
This is what I get (after running the code above): http://www.example.com/../stuff/more.php
This is what I WANT: http://www.example.com/stuff/more.php
Adding (\.|\.\.|\/)* should work.
$content = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)(\.|\.\.|\/)*([^\"'>]+)([\"'>]+)#", '$1http://www.example.com/$3$4', $content);
Also, note that $2$3 has been changed to $3$4.
Edit:
Reduced to one alternative:
$content = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)(\.\.\/)*([^\"'>]+)([\"'>]+)#", '$1http://www.example.com/$3$4', $content);
Why don't you just replace the first 2 dots with the domain?
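// str_replace() has no limit argument (its 4th parameter is a by-reference count), so use preg_replace() with a limit of 1
$result = preg_replace('/\.\./', 'http://www.example.com', $content, 1);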
Use $_SERVER['HTTP_HOST'] and $_SERVER['REQUEST_URI'], the PHP superglobal entries for the current request, to build the absolute URL.
Well, I'll start looking at the regex. Most of it looks good (in fact, you've got a good enough regex here I'm a little surprised you're having trouble otherwise!) but the end is a bit weird -- better like this:
#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"']>)#
(Technically it would be better to capture the starting quote and make sure the ending quote matches it, but chances are you won't have any problems there.)
To remove the ../ I would do it apart from regex entirely:
foreach (array("<a href=\"http://../foo/bar\">",
"<a href=\"../foo/bar\">") as $content) {
echo "A content=$content<br />\n";
########## copy from here down to...
if (preg_match("#(<\s*a\s+[^>]*?href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"']>)#", $content, $m)) {
echo "m=<pre>".print_r($m,true)."</pre><br />\n";
if (substr($m[2], 0, 3) == '../')
$m[2] = substr($m[2], 3);
$content = $m[1].'http://www.example.com/'.$m[2].$m[3];
}
######### copy from above down to HERE
echo "B content=$content<br />\n";
}
(I included a mini-test suite around what you're looking for - you will need to take just the marked lines inside for your code.)
I found the solution thanks to everyone who helped me on this.
Here's the code I used:
$content = preg_replace("#(<a href=\"\.\.\/)#", '<a href="http://www.example.com/', $content);
It searches for <a href="../ and replaces it with <a href="http://www.example.com/. It's not general, but it works for me.
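For anyone who needs something a bit more general than the one-liner above, here is a minimal sketch. It assumes PHP 7 or later, that every relative link should hang directly off the site root (which is what the question asks for, not true relative-path resolution), and that hrefs are quoted. The function name absolutize_links and the $base parameter are made up for illustration.

```php
<?php
// Hypothetical helper: prefix relative <a href> values with $base (no trailing slash).
function absolutize_links(string $html, string $base): string
{
    return preg_replace_callback(
        // 1: everything up to the quote, 2: the quote itself, 3: the relative URL.
        // The lookahead skips absolute URLs, protocol-relative URLs, anchors and mailto links.
        '~(<a\s[^>]*?href\s*=\s*)(["\'])(?!https?://|//|#|mailto:)([^"\'>]+)\2~i',
        function (array $m) use ($base): string {
            // Drop any leading "./", "../" or "/" so the path is appended to the site root.
            $path = preg_replace('~^(?:\.{1,2}/|/)+~', '', $m[3]);
            return $m[1] . $m[2] . $base . '/' . $path . $m[2];
        },
        $html
    );
}

echo absolutize_links('<a href="../stuff/more.php">more</a>', 'http://www.example.com');
// prints: <a href="http://www.example.com/stuff/more.php">more</a>
```

Matching the closing quote with \2 also covers the point raised earlier about making sure the ending quote is the same as the starting one.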
Let's say that $content is the content of a textarea
/*Convert the http/https to link */
$content = preg_replace('!((https://|http://)+[a-z0-9_./?=&-]+)!i', '<a target="_blank" href="$1">$1</a> ', nl2br($_POST['helpcontent'])." ");
/*Convert the www. to link prepending http://*/
$content = preg_replace('!((www\.)+[a-z0-9_./?=&-]+)!i', '<a target="_blank" href="http://$1">$1</a> ', $content." ");
This was working OK for links, but I realised that it was breaking the markup when an image is within the text...
I am trying like this now:
$content = preg_replace('!\s((https?://|http://)+[a-z0-9_./?=&-]+)!i', ' $1 ', nl2br($_POST['content'])." ");
$content = preg_replace('!((www\.)+[a-z0-9_./?=&-]+)!i', '<a target="_blank" href="http://$1">$1</a> ', $content." ");
As it is, the images are respected, but the problem is that URLs in http:// or https:// format won't be converted now:
google.com -> Not converted (as expected)
www.google.com -> Well Converted
http://google.com -> Not converted (unexpected)
https://google.com -> Not converted (unexpected)
What am I missing?
-EDIT-
Current almost working solution:
$content = preg_replace('!(\s|^)((https?://)+[a-z0-9_./?=&-]+)!i', ' $2 ', nl2br($_POST['content'])." ");
$content = preg_replace('!(\s|^)((www\.)+[a-z0-9_./?=&-]+)!i', '<a target="_blank" href="http://$2" target="_blank">$2</a> ', $content." ");
The thing here is that if this is the input:
www.funcook.com http://www.funcook.com https://www.funcook.com
funcook.com http://funcook.com https://funcook.com
All the urls I want (all, except name.domain) are converted as expected, but this is the output
www.funcook.com http://www.funcook.com https://www.funcook.com ;
funcook.com http://funcook.com https://funcook.com
Note that a ; is inserted. Any idea why?
try this:
preg_replace('!(\s|^)((https?://|www\.)+[a-z0-9_./?=&-]+)!i', ' $2 ',$text);
It will pick up links beginning with http:// or with www.
You can't do it 100%, because there may be links such as stackoverflow.com which do not have www.
If you're only targeting those links:
!(www\.\S+)!i
Should work well enough for you.
EDIT: As for your newest question about why http links don't get converted but https links do: your first pattern only searches for https://, or for http:// followed by a literal dot, which never occurs in your input. Simplify it by replacing:
(https://|http://\.)
With
(https?://)
Which will make the s optional.
Another way to add hyperlinks is to take the text that you want to parse for links and explode it into an array. Then loop through it using foreach (a very fast construct - http://www.phpbench.com/) and change anything that starts with http://, https://, or www., or ends with .com, .org, etc. into a link.
I'm thinking maybe something like this:
$userTextArray = explode(" ",$userText);
foreach( $userTextArray as &$word){
//if statements to test if if it starts with www. or ends with .com or whatever else
//change $word so that it is a link
}
Your changes will be reflected in the array since you put the "&" before $word in your foreach statement.
Now just implode the array back into a string and you're good to go.
This made sense in my head... But I'm not 100% sure that this is what you're looking for
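Filling in the loop above, a minimal sketch might look like this. It assumes the input is plain text separated by single spaces and only handles words starting with http://, https:// or www. (the "ends with .com" case is left out). The function name linkify_words is made up for illustration.

```php
<?php
// Hypothetical helper: wrap link-looking words in <a> tags.
function linkify_words(string $userText): string
{
    $words = explode(' ', $userText);
    foreach ($words as &$word) {
        if (preg_match('~^https?://~i', $word)) {
            $href = $word;
        } elseif (stripos($word, 'www.') === 0) {
            $href = 'http://' . $word; // prepend a scheme so the link is absolute
        } else {
            continue; // not a link, leave the word untouched
        }
        $word = '<a href="' . htmlspecialchars($href) . '">' . htmlspecialchars($word) . '</a>';
    }
    unset($word); // break the reference so later code cannot overwrite the last element
    return implode(' ', $words);
}

echo linkify_words('see www.example.com now');
// prints: see <a href="http://www.example.com">www.example.com</a> now
```

The unset() after the loop matters: without it, $word stays bound to the last array element.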
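I had a similar problem. Here is a function which helped me. Maybe it will fit your needs too:
function clHost($Address) {
    $parseUrl = parse_url(trim($Address));
    // Array keys must be quoted strings; unquoted host/path are treated as constants
    $host = isset($parseUrl['host']) ? $parseUrl['host'] : '';
    $path = isset($parseUrl['path']) ? $parseUrl['path'] : '';
    // Without a scheme, parse_url() puts everything in 'path', so fall back to it
    return str_replace("www.", "", trim(trim($host . $path), '/'));
}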
This function will return domain without protocol and "www", so you can add them yourself later.
For example:
$url = "http://www.". clHost($link);
I did it like that, because I couldn't find good regexp.
\s((https?://|www.)+[a-z0-9_./?=&-]+)
The problem is that your starting \s is forcing the match to start with a space, so, if you don't have that starting space your match fails. The reg exp is fine (without the \s), but to avoid replacing the images you need to add something to avoid matching them.
If the images are pure html use this:
(?<!src=")((https?://|www.)+[a-z0-9_./?=&-]+)
The negative lookbehind skips any URL that is immediately preceded by src=", so image sources are left alone.
If you use another mark up, tell me and I'll try to find another way to avoid the images.
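Here is a hedged sketch of that lookbehind in use, assuming PHP 7 or later and images written as plain HTML with double-quoted src attributes. It adds a second lookbehind (my addition, not part of the pattern above) so that the www. inside an already skipped http://www... image URL is not picked up on its own. The function name linkify_outside_src is made up.

```php
<?php
// Hypothetical helper: link http(s):// and www. URLs, but leave <img src="..."> alone.
function linkify_outside_src(string $text): string
{
    // (?<!src=")   : not directly after src="
    // (?<![\w/.])  : not in the middle of a longer URL or word
    $pattern = '~(?<!src=")(?<![\w/.])((?:https?://|www\.)[a-z0-9_./?=&-]+)~i';

    return preg_replace_callback($pattern, function (array $m): string {
        // www. links need a scheme in the href; http(s) links are used as-is.
        $href = stripos($m[1], 'www.') === 0 ? 'http://' . $m[1] : $m[1];
        return '<a target="_blank" href="' . $href . '">' . $m[1] . '</a>';
    }, $text);
}

echo linkify_outside_src('Visit http://example.com or www.example.org <img src="http://example.com/a.png">');
```

Doing both cases in one pass also avoids the second preg_replace re-matching URLs that the first one already wrapped in a link.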
I'm trying to create a script that transforms all relative paths into absolute paths.
So how can I find and replace, in an HTML text, all the occurrences of
src="/jsfile.js
with
src="http://mysite.com/jsfile.js
then
src="../jsfile.js
with
src="http://mysite.com/jsfile.js
and then
src="js/jsfile.js
with
src="http://mysite.com/js/jsfile.js
and maybe more cases? And of course the href scenarios as well.
UPDATE
Maybe my question was badly written, but the goal is to replace any relative URL or relative link with an absolute URL... I'm not sure if the answers below work.
How about a single regex using preg_replace? It will also work for href and src attributes. Be sure to check the demo to see it in action!
This converts all of the above test cases correctly:
$result = preg_replace( '/(src|href)="(?:\.\.\/|\/)?([^"]+)"/i', '$1="' . $url . '/$2"', $test);
Demo
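One thing to watch with a pattern like this is that it will also prefix URLs that are already absolute. A hedged variant with a negative lookahead that skips anything starting with a scheme, // or #; it assumes double-quoted attributes and a base URL without a trailing slash, and the function name make_urls_absolute is made up:

```php
<?php
// Hypothetical helper: make src/href values absolute, leaving absolute URLs untouched.
function make_urls_absolute(string $html, string $url): string
{
    return preg_replace(
        // Skip values that start with a scheme ("http:", "mailto:", ...), "//" or "#",
        // then drop one leading "../" or "/" before capturing the rest of the path.
        '~(src|href)="(?![a-z][a-z0-9+.-]*:|//|#)(?:\.\./|/)?([^"]+)"~i',
        '$1="' . $url . '/$2"',
        $html
    );
}

$test = '<script src="../jsfile.js"></script><a href="http://other.example/x">x</a>';
echo make_urls_absolute($test, 'http://mysite.com');
// prints: <script src="http://mysite.com/jsfile.js"></script><a href="http://other.example/x">x</a>
```

Like the original, this only strips a single leading ../ and treats every relative path as relative to the site root.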
That's not really a good comparison. Those two functions serve separate purposes. I would personally use three, in this order:
preg_match: Find the URLs that need to be modified.
substr: Modify the URLs.
str_replace: Replace the old URLs with the modified URLs.
If it becomes more than 3, use
$pathes=array(
'src="/jsfile.js' => 'src="http://mysite.com/jsfile.js',
'src="../jsfile.js' => 'src="http://mysite.com/jsfile.js',
'src="js/jsfile.js' => 'src="http://mysite.com/js/jsfile.js'
);
$newhtml=str_replace(array_keys($pathes),$pathes,$oldhtml);
<?php
$html = file_get_contents('index.html');
$html = preg_replace_callback('#"(\S+).js"#', "replace_url", $html);
function replace_url($url) {
return '"http://'.$_SERVER['HTTP_HOST'].chr(47).trim($url[1], '/,.').'.js"';
}
echo $html;
Use preg_replace_callback
I’m working on a small hobby project where I want to replace a specific page on a URL. Let me explain:
I’ve got the URL
http://www.example.com/article/paragraph/low/
I want to keep the URL but replace the last segment /low/ with /high/ so the new URL is:
http://www.example.com/article/paragraph/high/
I’ve tried different explode, split and splice approaches but I just can’t seem to wrap my head around it and make it work. I can change the entire URL, but I can’t change just the last segment and save it in a new variable.
I’m pretty confident that it is a pretty straightforward case, but I’ve never worked that much with arrays / string manipulation in PHP, so I’m pretty lost.
I guess that I have to first split the URL up in segments, using the "\" to separate it (I tried that but have problems by using explode("\", $string)) and then replace the last \low\ with \high\
Hope someone could help or point me in the right direction to what methods to use for doing this.
Sincere
Mestika
how about str_replace?
<?php
$newurl = str_replace('low', 'high', $oldurl);
?>
documentation;
http://php.net/manual/en/function.str-replace.php
edit;
Rik is right; if your domain (or any other part of the url for that matter) includes the string "low", this will mess up your link.
So: if your url may contain multiple 'low' 's, you will have to add an extra indicator in the script. An example of that would be including the /'s in your str_replace.
You mistook \ for /: URL segments are separated by forward slashes.
$url = explode('/', rtrim($url, '/'));
if (end($url) == 'low') {
$url[count($url)-1] = 'high';
}
$url = implode('/', $url) .'/';
Use parse_url to split the URL into its components, modify them as required (here you can use explode to split the path into its segments), and then rebuild the URL with http_build_url.
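A minimal sketch of that idea, assuming PHP 7 or later and a URL that has a scheme and a host but no port, query string or fragment. It rebuilds the URL by hand instead of calling http_build_url, which as far as I know is not in core PHP but comes from the pecl_http extension. The function name replace_last_segment is made up.

```php
<?php
// Hypothetical helper: swap the last path segment if it equals $from.
function replace_last_segment(string $url, string $from, string $to): string
{
    $parts = parse_url($url);
    // "/article/paragraph/low/" -> ["article", "paragraph", "low"]
    $segments = explode('/', trim($parts['path'] ?? '', '/'));
    if (end($segments) === $from) {
        $segments[count($segments) - 1] = $to;
    }
    // Assumption: scheme and host are present, and a trailing slash is always wanted.
    return $parts['scheme'] . '://' . $parts['host'] . '/' . implode('/', $segments) . '/';
}

echo replace_last_segment('http://www.example.com/article/paragraph/low/', 'low', 'high');
// prints: http://www.example.com/article/paragraph/high/
```

Because only the path is split, a "low" in the host name or in an earlier segment is never touched.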
<?php
class TestURL extends PHPUnit_Framework_TestCase {
public function testURL() {
$URL = 'http://www.mydomain.com/article/paragraph/low/';
$explode = explode('/', $URL);
$explode[5] = 'high';
$expected = 'http://www.mydomain.com/article/paragraph/high/';
$actual = implode('/', $explode);
$this->assertEquals($expected, $actual);
}
}
--
phpunit simple-test.php
PHPUnit 3.4.13 by Sebastian Bergmann.
.
Time: 0 seconds, Memory: 4.75Mb
OK (1 test, 1 assertion)
This will probably be enough:
$url = "http://www.mydomain.com/article/paragraph/low/";
$newUrl = str_replace('/low/', '/high/', $url);
or with regular expressions (it allows more flexibility)
$url = "http://www.mydomain.com/article/paragraph/low/";
$newUrl = preg_replace('/low(\/?)$/', 'high$1', $url);
Note that the string approach will replace every low segment, but only where it is followed by a /. The regex approach will replace low only at the very end of the URL, whether or not it is followed by a /.
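One caveat with the regex above: low(\/?)$ also matches a last segment that merely ends in "low", for example /slow/. A hedged variant that anchors on the preceding slash (swap_last_segment is a made-up name, and PHP 7 or later is assumed):

```php
<?php
// Hypothetical helper: replace "low" only when it is the whole last segment.
function swap_last_segment(string $url): string
{
    // The leading "/" makes sure "low" is a complete segment; (/?) keeps an optional trailing slash.
    return preg_replace('~/low(/?)$~', '/high$1', $url);
}

echo swap_last_segment('http://www.mydomain.com/article/paragraph/low/');
// prints: http://www.mydomain.com/article/paragraph/high/
```

Using ~ as the delimiter means the slashes in the pattern do not need escaping.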
I am trying to get the page or last directory name from a url
For example, if the URL is http://www.example.com/dir/ I want it to return dir, and if the passed URL is http://www.example.com/page.php I want it to return page. Notice that I do not want the trailing slash or the file extension.
I tried this:
$regex = "/.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*/i";
$name = strtolower(preg_replace($regex,"$2",$url));
I ran this regex in PHP and it returned nothing. (however I tested the same regex in ActionScript and it worked!)
So what am I doing wrong here, how do I get what I want?
Thanks!!!
Don't use / as the regex delimiter if it also contains slashes. Try this:
$regex = "#^.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*$#i";
You may try to escape the "/" in the middle; unescaped, it simply closes your regex. So this may work:
$regex = "/.*\.(com|gov|org|net|mil|edu)\/([a-z_\-]+).*/i";
You may also make the regex somewhat more general, but that's another problem.
You can use this
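$parts = explode('/', rtrim($url, '/')); // rtrim so a trailing slash does not yield an empty last element
$last = array_pop($parts); // array_pop() needs a variable, not the direct result of explode()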
Then apply a simple regex to remove any file extension
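Put together, the two steps might look like the sketch below. It assumes PHP 7 or later and a URL that has at least one path segment (a bare http://www.example.com would return part of the host name) and no query string. The function name last_url_name is made up.

```php
<?php
// Hypothetical helper: last directory or page name, without slash or extension.
function last_url_name(string $url): string
{
    $parts = explode('/', rtrim($url, '/'));
    $last = array_pop($parts);
    // Strip a trailing ".ext" if there is one.
    return preg_replace('/\.[^.]+$/', '', $last);
}

echo last_url_name('http://www.example.com/dir/'), "\n";     // prints: dir
echo last_url_name('http://www.example.com/page.php'), "\n"; // prints: page
```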
Assuming you want to match the entire address after the domain portion:
$regex = "%://[^/]+/([^?#]+)%i";
The above assumes a URL of the format extension://domainpart/everythingelse.
Then again, it seems that the problem here isn't that your RegEx isn't powerful enough, just mistyped (closing delimiter in the middle of the string). I'll leave this up for posterity, but I strongly recommend you check out PHP's parse_url() method.
This should adequately deliver:
substr($s = basename($_SERVER['REQUEST_URI']), 0, strrpos($s,'.') ?: strlen($s))
But this is better:
preg_replace('/[#\.\?].*/','',basename($path));
Although, your example is short, so I cannot tell if you want to preserve the entire path or just the last element of it. The preceding example will only preserve the last piece, but this should save the whole path while being generic enough to work with just about anything that can be thrown at you:
preg_replace('~(?:/$|[#\.\?].*)~','',substr(parse_url($path, PHP_URL_PATH),1));
As much as I personally love using regular expressions, more 'crude' (for want of a better word) string functions might be a good alternative for you. The snippet below uses sscanf to parse the path part of the URL for the first bunch of letters.
$url = "http://www.example.com/page.php";
$path = parse_url($url, PHP_URL_PATH);
sscanf($path, '/%[a-z]', $part);
// $part = "page";
This expression:
(?<=^[^:]+://[^.]+(?:\.[^.]+)*/)[^/]*(?=\.[^.]+$|/$)
Gives the following results:
http://www.example.com/dir/ dir
http://www.example.com/foo/dir/ dir
http://www.example.com/page.php page
http://www.example.com/foo/page.php page
Apologies in advance if this is not valid PHP regex - I tested it using RegexBuddy.
Save yourself the regular expression and make PHP's other functions feel more loved.
$url = "http://www.example.com/page.php";
$filename = pathinfo(parse_url($url, PHP_URL_PATH), PATHINFO_FILENAME);
Warning: for PHP 5.2 and up.