PHP: str_replace or preg_match?

I'm trying to write a script that converts all relative paths to absolute paths.
How can I find and replace, in an HTML text, all occurrences of
src="/jsfile.js
with
src="http://mysite.com/jsfile.js
then
src="../jsfile.js
with
src="http://mysite.com/jsfile.js
and then
src="js/jsfile.js
with
src="http://mysite.com/js/jsfile.js
And maybe there are more cases? The same applies to href attributes, of course.
UPDATE
Maybe my question was badly written, but the goal is to replace any relative URL or relative link with an absolute URL. I'm not sure whether the answers below work.

How about a single regex using preg_replace? It works for both href and src attributes.
This converts all of the above test cases correctly ($url is the base URL without a trailing slash, e.g. http://mysite.com, and $test is the HTML text):
$result = preg_replace( '/(src|href)="(?:\.\.\/|\/)?([^"]+)"/i', '$1="' . $url . '/$2"', $test);
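One caveat with a pattern like this is that it also prefixes URLs that are already absolute. The following is a minimal sketch of a variant that skips them, not a definitive implementation. The function name absolutize_attrs and the $base parameter are assumptions for illustration, and it only handles double-quoted attributes.

```php
<?php
// Hypothetical helper: prefixes relative src/href values with $base.
// Assumes double-quoted attributes. Leading "/", "./" and "../" are simply
// stripped, as in the answer above; they are not resolved against a directory.
function absolutize_attrs(string $html, string $base): string
{
    // The negative lookahead skips values that start with a scheme
    // ("http:", "mailto:", ...), with "//" (protocol-relative) or with "#".
    $pattern = '~\b(src|href)="(?![a-z][a-z0-9+.-]*:|//|#)(?:\.{0,2}/)*([^"]*)"~i';
    return preg_replace($pattern, '$1="' . rtrim($base, '/') . '/$2"', $html);
}
```

Called as absolutize_attrs($html, 'http://mysite.com'), this rewrites src="/jsfile.js", src="../jsfile.js" and src="js/jsfile.js" while leaving href="http://other.com/x" untouched.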

That's not really a good comparison. Those two functions serve separate purposes. I would personally use three, in this order:
preg_match: Find the URLs that need to be modified.
substr: Modify the URLs.
str_replace: Replace the old URLs with the modified URLs.
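The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the function name rewrite_src_urls and the $base parameter are hypothetical, it uses preg_match_all rather than preg_match so that every src attribute is found, and it handles double-quoted src attributes only.

```php
<?php
// Sketch of the find / modify / replace sequence.
// rewrite_src_urls() and $base are assumptions made for this example.
function rewrite_src_urls(string $html, string $base): string
{
    // 1. preg_match_all: collect every double-quoted src value.
    preg_match_all('/src="([^"]+)"/i', $html, $found);

    foreach (array_unique($found[1]) as $old) {
        // Leave URLs alone that already have a scheme or start with "//".
        if (preg_match('~^(?:[a-z][a-z0-9+.-]*:|//)~i', $old)) {
            continue;
        }
        // 2. substr: drop a single leading "../" or "/".
        $path = $old;
        if (substr($path, 0, 3) === '../') {
            $path = substr($path, 3);
        } elseif (substr($path, 0, 1) === '/') {
            $path = substr($path, 1);
        }
        // 3. str_replace: swap the old attribute for the rewritten one.
        $html = str_replace('src="' . $old . '"', 'src="' . $base . '/' . $path . '"', $html);
    }
    return $html;
}
```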

If there are more than three cases, use an array of replacements:
$pathes=array(
'src="/jsfile.js' => 'src="http://mysite.com/jsfile.js',
'src="../jsfile.js' => 'src="http://mysite.com/jsfile.js',
'src="js/jsfile.js' => 'src="http://mysite.com/js/jsfile.js'
);
$newhtml=str_replace(array_keys($pathes),$pathes,$oldhtml);

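Use preg_replace_callback:
<?php
// Rewrites every quoted path ending in .js to an absolute URL on the current host.
function replace_url($match) {
    // $match[1] is the path without ".js"; trim surrounding "/", "," and "." characters.
    return '"http://' . $_SERVER['HTTP_HOST'] . '/' . trim($match[1], '/,.') . '.js"';
}
$html = file_get_contents('index.html');
// The dot before "js" is escaped so that it only matches a literal ".".
$html = preg_replace_callback('#"(\S+)\.js"#', 'replace_url', $html);
echo $html;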

Related

PHP: replace relative top URL "../" with absolute domain URL

I want to convert relative URLs that start with ../ (for example ../stuff/more.php) to absolute ones (http://www.example.com/stuff/more.php) in my RSS feed.
I used the following PHP code to do so:
$content = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"'>]+)#", '$1http://www.example.com/$2$3', $content);
The result is wrong, though. It returns the URL like this:
http://www.example.com/../stuff/more.php
Notice that the ../ part hasn't been removed. Please help!
So basically:
This is what I have: ../stuff/more.php
This is what I get (after running the code above): http://www.example.com/../stuff/more.php
This is what I WANT: http://www.example.com/stuff/more.php
Adding (\.\./|\./|/)* after the lookahead should work:
$content = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)(\.\./|\./|/)*([^\"'>]+)([\"'>]+)#", '$1http://www.example.com/$3$4', $content);
Also, note that $2$3 has been changed to $3$4.
Edit:
Reduced to one alternative:
$content = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)(\.\.\/)*([^\"'>]+)([\"'>]+)#", '$1http://www.example.com/$3$4', $content);
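Why don't you just replace the first two dots with the domain? Note that str_replace has no limit argument (its fourth parameter only receives the number of replacements made), so use preg_replace with a limit of 1:
$result = preg_replace('/\.\./', 'http://www.example.com', $content, 1);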
Use $_SERVER['HTTP_HOST'] and $_SERVER['REQUEST_URI'], the entries of PHP's $_SERVER superglobal that hold the current host and request path, to build the absolute URL.
Well, I'll start looking at the regex. Most of it looks good (in fact, you've got a good enough regex here I'm a little surprised you're having trouble otherwise!) but the end is a bit weird -- better like this:
#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"']>)#
(Technically it would be better to capture the starting quote and make sure the ending quote matches it, but chances are you won't have any problems there.)
To remove the ../ I would do it apart from the regex entirely:
foreach (array("<a href=\"http://../foo/bar\">",
"<a href=\"../foo/bar\">") as $content) {
echo "A content=$content<br />\n";
########## copy from here down to...
if (preg_match("#(<\s*a\s+[^>]*?href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"']>)#", $content, $m)) {
echo "m=<pre>".print_r($m,true)."</pre><br />\n";
if (substr($m[2], 0, 3) == '../')
$m[2] = substr($m[2], 3);
$content = $m[1].'http://www.example.com/'.$m[2].$m[3];
}
######### copy from above down to HERE
echo "B content=$content<br />\n";
}
(I included a mini-test suite around what you're looking for - you will need to take just the marked lines inside for your code.)
I found the solution thanks to everyone who helped me on this.
Here's the code I used:
$content = preg_replace("#(<a href=\"\.\.\/)#", '<a href="http://www.example.com/', $content);
It searches for <a href="../ and replaces it with <a href="http://www.example.com/. It's not general, but it works for me.
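For a more general approach than stripping one leading ../, the relative URL can be resolved against a base URL by collapsing the "." and ".." path segments. The sketch below is a simplified illustration, not a full RFC 3986 resolver: the function name resolve_relative is hypothetical, and ports, user info, query strings in the base, and protocol-relative ("//host/...") inputs are not handled.

```php
<?php
// Hypothetical helper: resolves $rel against $base (an absolute http/https URL).
// Simplified on purpose; see the limitations listed in the text above.
function resolve_relative(string $rel, string $base): string
{
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $rel)) {
        return $rel; // already absolute
    }
    $p = parse_url($base);
    $root = $p['scheme'] . '://' . $p['host'];
    $basePath = $p['path'] ?? '/';

    // A leading "/" means "relative to the site root"; otherwise start
    // from the directory part of the base path.
    if ($rel !== '' && $rel[0] === '/') {
        $path = $rel;
    } else {
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $rel;
    }

    // Collapse "." and ".." segments; ".." above the root is ignored.
    $out = [];
    foreach (explode('/', $path) as $seg) {
        if ($seg === '..') {
            array_pop($out);
        } elseif ($seg !== '.' && $seg !== '') {
            $out[] = $seg;
        }
    }
    // Keep a trailing slash if the combined path had one.
    $trail = ($out && substr($path, -1) === '/') ? '/' : '';
    return $root . '/' . implode('/', $out) . $trail;
}

// resolve_relative('../stuff/more.php', 'http://www.example.com/')
// returns "http://www.example.com/stuff/more.php"
```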

What is the PHP regex to get the parent directory of a page?

I have a page at:
http://somewebsite.com/1234/test/
How do I get the 1234 extracted from it with a PHP regex?
Don't use a regex; use parse_url() and explode().
For one level up, use dirname().
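As a rough sketch of the parse_url() and explode() suggestion, the path can be split into its non-empty segments. The function name path_segments is an assumption made for this example.

```php
<?php
// Hypothetical helper: returns the non-empty path segments of a URL.
function path_segments(string $url): array
{
    // parse_url() returns null when the URL has no path, hence the cast.
    $path = (string) parse_url($url, PHP_URL_PATH);
    // explode() yields empty strings for the leading and trailing "/";
    // array_filter() with 'strlen' drops them, array_values() reindexes.
    return array_values(array_filter(explode('/', $path), 'strlen'));
}

$segments = path_segments('http://somewebsite.com/1234/test/');
// $segments[0] is "1234" and $segments[1] is "test"
```

Here the parent directory is the first segment. For deeper paths, the second-to-last segment would be the parent.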
I like to avoid regex if I don't need it, so I would recommend you do it this way:
$url = explode("/", $_SERVER['PHP_SELF']);
Then you can reference the part of the url that you want using this:
$url[1]
If you need/want to use regex, I'll think about it for a second and try to post a solution later.
To just get the one above you can use
echo dirname("http://www.google.com/cake/lol");
Outputs
http://www.google.com/cake
Or for just the bit between the / and / you could do
var_dump(explode("/", "http://www.google.com/cake/lol"));
Or in regex
preg_match_all('#/([^/]*)/#',$sourcestring,$matches);
Perform the regex on $_SERVER['REQUEST_URI'], or split it on '/' and take the first non-empty element => 1234
If the PHP was triggered by launching that URL, then these will probably be true:
$_SERVER['REQUEST_URI'] == "/1234/test/"
dirname($_SERVER['REQUEST_URI']) == "/1234"
basename(dirname($_SERVER['REQUEST_URI'])) == "1234"
preg_match('#.*/(\w+)/(\w+)/#', $url, $matches);
The parent directory is $matches[1].

PHP remove page name Regex - preg_replace

I have this url (several similar ones)..
images/image1/image1.jpg
images/images1/images2/image2.jpg
images/images2/images3/images4/image4.jpg
I have this regex, but I want it to strip the image name from the string:
<?php $imageurlfolder = $pagename1;
$imageurlfolder = preg_replace('/[A-Za-z0-9]+.asp/', '', $pagename1);?>
The resulting string should look like the URLs above, e.g. images/images2/images3/images4/, but without the image4.jpg.
Hope you can help.
Thanks
For this particular purpose, the dirname() function would be sufficient:
<?php echo dirname('images/images2/images3/images4/image4.jpg'); ?>
Would return:
images/images2/images3/images4
I think you can use the dirname function.
For instance (from the PHP manual page for dirname):
dirname("/etc/passwd")
would print
/etc
A quite straightforward way to do it:
preg_replace("#(?<=/)[^/]+$#","",$your_string);
It will remove everything between the last / and the end of the string.
Edit: as many people pointed out, you can also use dirname, which might prove faster.
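One difference between the two approaches is that dirname() drops the trailing slash, while the question asks for images/images2/images3/images4/ with the slash kept. A minimal sketch that adds it back, assuming forward-slash paths and using the hypothetical name folder_of:

```php
<?php
// Hypothetical helper: strips the file name but keeps a trailing slash.
function folder_of(string $path): string
{
    $dir = dirname($path);
    // dirname() returns "." when the path has no directory part.
    return $dir === '.' ? '' : rtrim($dir, '/') . '/';
}

// folder_of('images/images2/images3/images4/image4.jpg')
// returns "images/images2/images3/images4/"
```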

How can I use a PHP regex to transform the contents of certain HTML tag attributes?

I think I am right in assuming that a regex can do this job; I'm just not sure how I would do it!
Basically I have a number of links on my website that are in the format of:
Example
I need some code that will transform the href value so that it is output in lowercase, but that does not affect the anchor text. E.g.:
Example
Is this possible? And if so, what would be the code to do this?
You can use preg_replace_callback, something like this:
function replace($matches) {
    // $matches[0] is the whole href="..." attribute; lowercase it.
    return strtolower($matches[0]);
}
...
$str = preg_replace_callback('/href="[^"]*"/i', 'replace', $str);
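Using the preg_match and strtolower functions (this handles the first link in $cadena only):
if (preg_match('/<a(.*?)>(.*?)<\/a>/i', $cadena, $a)) {
    // $a[1] holds the attributes and $a[2] the anchor text; lowercase only the attributes.
    $cadena = str_replace($a[0], '<a' . strtolower($a[1]) . '>' . $a[2] . '</a>', $cadena);
}
echo $cadena;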
Regards!

regex to get current page or directory name?

I am trying to get the page or last directory name from a URL.
For example, if the URL is http://www.example.com/dir/ I want it to return dir, and if the URL is http://www.example.com/page.php I want it to return page. Note that I do not want the trailing slash or the file extension.
I tried this:
$regex = "/.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*/i";
$name = strtolower(preg_replace($regex,"$2",$url));
I ran this regex in PHP and it returned nothing. (However, I tested the same regex in ActionScript and it worked!)
So what am I doing wrong here, and how do I get what I want?
Thanks!!!
Don't use / as the regex delimiter if it also contains slashes. Try this:
$regex = "#^.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*$#i";
You could try to escape the "/" in the middle. Unescaped, it simply closes your regex. So this may work:
$regex = "/.*\.(com|gov|org|net|mil|edu)\/([a-z_\-]+).*/i";
You may also make the regex somewhat more general, but that's another problem.
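You can use this (array_pop takes its argument by reference, so store the array in a variable first; the rtrim makes a trailing slash harmless):
$parts = explode('/', rtrim($url, '/'));
$name = array_pop($parts);
Then apply a simple regex to remove any file extension.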
Assuming you want to match the entire address after the domain portion:
$regex = "%://[^/]+/([^?#]+)%i";
The above assumes a URL of the format extension://domainpart/everythingelse.
Then again, it seems that the problem here isn't that your regex isn't powerful enough, just that it is mistyped (the closing delimiter appears in the middle of the string). I'll leave this up for posterity, but I strongly recommend you check out PHP's parse_url() function.
This should adequately deliver:
substr($s = basename($_SERVER['REQUEST_URI']), 0, strrpos($s,'.') ?: strlen($s))
But this is better:
preg_replace('/[#\.\?].*/','',basename($path));
Although, your example is short, so I cannot tell if you want to preserve the entire path or just the last element of it. The preceding example will only preserve the last piece, but this should save the whole path while being generic enough to work with just about anything that can be thrown at you:
preg_replace('~(?:/$|[#\.\?].*)~','',substr(parse_url($path, PHP_URL_PATH),1));
As much as I personally love using regular expressions, more 'crude' (for want of a better word) string functions might be a good alternative for you. The snippet below uses sscanf to parse the path part of the URL for the first bunch of letters.
$url = "http://www.example.com/page.php";
$path = parse_url($url, PHP_URL_PATH);
sscanf($path, '/%[a-z]', $part);
// $part = "page";
This expression:
(?<=^[^:]+://[^.]+(?:\.[^.]+)*/)[^/]*(?=\.[^.]+$|/$)
Gives the following results:
http://www.example.com/dir/ dir
http://www.example.com/foo/dir/ dir
http://www.example.com/page.php page
http://www.example.com/foo/page.php page
Apologies in advance if this is not valid PHP regex - I tested it using RegexBuddy.
Save yourself the regular expression and make PHP's other functions feel more loved.
$url = "http://www.example.com/page.php";
$filename = pathinfo(parse_url($url, PHP_URL_PATH), PATHINFO_FILENAME);
Warning: PATHINFO_FILENAME requires PHP 5.2 or later.
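The same two calls also cover the directory case from the question, because pathinfo() ignores a trailing slash when it determines the last path component. A small sketch, with last_url_name as a hypothetical wrapper name:

```php
<?php
// Hypothetical wrapper: returns the last path component of a URL
// without its trailing slash or file extension.
function last_url_name(string $url): string
{
    // The query string and fragment are excluded by PHP_URL_PATH.
    $path = (string) parse_url($url, PHP_URL_PATH);
    return pathinfo($path, PATHINFO_FILENAME);
}

// last_url_name('http://www.example.com/dir/')     returns "dir"
// last_url_name('http://www.example.com/page.php') returns "page"
```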
