PHP's file_get_contents: dealing with relative paths inside the result

I'm trying to solve a cross-domain issue, so I'm implementing a script that takes a URL from a GET parameter and opens it with file_get_contents. It works fine until the page tries to load resources through relative paths, such as this line inside index.html:
<script src="js/custom_script.js" />
If I run a preg_replace regex over the HTML that rewrites js/custom_script.js to http://content.domain/js/custom_script.js, it also works. The problem is that I don't always know how many levels deep the page I'm opening goes: for example, index.html could have a button leading to another page with its own relative paths.
Is there an elegant solution to this problem?

I would use the HTML base tag: instead of rewriting every URL in the source, just insert it right before the </head> tag.
This can be done in a single line of code: echo str_replace("</head>", "<base href=\"http://content.domain/\" target=\"_blank\"></head>", $source);
The base tag specifies a default URL and a default target for all links on a page, so the browser resolves all the nested relative links for you.
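A slightly more defensive variant of the same idea, as a minimal sketch. It inserts the tag right after the opening <head> tag instead of before </head>, because the HTML specification expects <base> to come before any other element that carries a URL; with the tag at the end of the head, stylesheets and scripts referenced earlier in the head may still be resolved against the proxy's own URL. The function name injectBaseTag is hypothetical, not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): inserts a <base> tag
// immediately after the first opening <head> tag.
function injectBaseTag(string $html, string $baseHref): string
{
    $tag = '<base href="' . htmlspecialchars($baseHref, ENT_QUOTES) . '" target="_blank">';

    // Match <head> with or without attributes (but not <header>),
    // case-insensitively, and only the first occurrence.
    $result = preg_replace_callback(
        '/<head(\s[^>]*)?>/i',
        function (array $m) use ($tag): string {
            return $m[0] . $tag;
        },
        $html,
        1
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $html;
}

$source = '<html><head><title>t</title></head><body></body></html>';
echo injectBaseTag($source, 'http://content.domain/');
```

If the fetched page has no <head> tag at all, the input comes back unchanged, so that case would need separate handling.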

Related

file_get_contents not returning entire webpage

I've been trying to retrieve the contents of a webpage (http://3sk.tv) using file_get_contents. Unfortunately, the resulting output is missing many elements (images, formatting, styling, etc.) and looks nothing like the original page I'm trying to retrieve.
This has never happened with any other URL I have tried to retrieve using this same method, but for some reason this particular URL (http://3sk.tv) refuses to work properly.
The code I'm using is:
<?php
$homepage = file_get_contents('http://3sk.tv');
echo $homepage;
?>
Am I missing anything? All suggestions on how to get this working properly would be greatly appreciated. Thank you all for your time and consideration.
That's normal behaviour: file_get_contents only grabs the HTML file itself, not the related images, stylesheets, etc.
A quick workaround to fix the relative paths is the <base> tag:
http://www.w3schools.com/tags/tag_base.asp
Just add a <base> tag to the output:
<?php
$homepage = file_get_contents('http://3sk.tv');
echo str_replace(
'<head>',
'<head><base href="http://3sk.tv" target="_blank">',
$homepage
);
?>
It should help.
This is to be expected. If you look at the source code, you'll notice many places that do not use a full URL (e.g. lib/dropdown/dropdown.css). This tells the browser to assume http://3sk.tv/lib/dropdown/dropdown.css. However, on your website, it will become YOURURL.COM/lib/dropdown/dropdown.css, which does not exist. This will be the case for much of the content.
So, you can't just print another website's source and expect it to work. It needs to be the same URL.
The best way to embed another website is usually to just use an iframe or some alternative.
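If you do go with a <base> tag, the value it needs is the directory of the page you fetched, since that is what the browser would have resolved relative paths against on the original site. A minimal sketch of deriving it with parse_url follows; the function name baseHrefFor is hypothetical, and the sketch ignores any <base> tag the fetched page may already contain.

```php
<?php
// Hypothetical helper: returns scheme://host[:port]/directory/ for a page URL.
function baseHrefFor(string $pageUrl): string
{
    $parts = parse_url($pageUrl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $pageUrl");
    }

    // A URL without a path (http://3sk.tv) is treated as the site root.
    $path = $parts['path'] ?? '/';

    // Keep everything up to and including the last slash, dropping the file name.
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    $port = isset($parts['port']) ? ':' . $parts['port'] : '';

    return $parts['scheme'] . '://' . $parts['host'] . $port . $dir;
}

echo baseHrefFor('http://3sk.tv'), "\n";
echo baseHrefFor('http://content.domain/sub/index.html'), "\n";
```

The query string and fragment are dropped on purpose, because they play no part in resolving relative paths.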
The webpage is not completely generated server-side; it relies heavily on JavaScript that runs after the HTML loads. If you want to render the page as it looks in a browser, you may need a headless browser instead. See e.g. this binding to PhantomJS: http://jonnnnyw.github.io/php-phantomjs/

Relative Link to Subfolder?

I have the link: www.mysite.com/jobs/stackoverflow
And I want to get this link using a relative file path: www.mysite.com/jobs/stackoverflow/123
Right now I am getting the current url using PHP and then appending '/123' onto the end of the 'href' value to access the subfolder. This method works but I was wondering if there is a way to write a relative link that will link to a subfolder. At first I thought it would be as easy as typing:
<a href="123">Link</a>
but that replaces the 'stackoverflow' part rather than appending it onto the end.
I'm fine with using the PHP solution and I am not looking for a javascript or jQuery solution, I was just curious to see if there was a simple way to do this using only HTML.
I found someone linking to the current directory like this:
Basic HTML - how to set relative path to current folder?
so putting the ./ before the 123 in your link might work.
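One caveat on the HTML-only approach: under standard relative URL resolution, ./123 resolves the same way as 123, so from www.mysite.com/jobs/stackoverflow both replace the last segment unless the current URL ends with a trailing slash. For comparison, here is a minimal sketch of the PHP approach described in the question. The function name childLink is hypothetical, and the fallback path is only there so the snippet also runs outside a web server.

```php
<?php
// Hypothetical helper: appends one path segment to the current path,
// regardless of whether the current path ends with a slash.
function childLink(string $currentPath, string $segment): string
{
    return rtrim($currentPath, '/') . '/' . rawurlencode($segment);
}

// REQUEST_URI is set by the web server; the fallback is for command-line runs.
$path = parse_url($_SERVER['REQUEST_URI'] ?? '/jobs/stackoverflow', PHP_URL_PATH) ?: '/';

echo '<a href="' . htmlspecialchars(childLink($path, '123')) . '">Link</a>';
```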

CSS failure when using INCLUDE Function

I downloaded a template and CSS file for a website that I'm building. The template worked well until I tried to break it down and put each part in its own file (for easier modification and editing in the future).
When I cut out the head part (title, meta data, etc.), put it in its own file, and replaced it with an include() call, I lost the CSS styles and the page fell back to the browser's default style (black and white with no extra formatting).
Where did I go wrong? Here is the include call that I used:
<?php
include 'files/head.php';
?>
With a URL like file:///C:/xampp/htdocs/test6/index.php, PHP is NOT executed. You must run it through Apache. Currently you are opening your PHP script as a regular text or HTML file, so it is passed to the browser without any processing.
To make the include function work, you must run it through Apache. As you are using XAMPP, you should simply open it with a URL like http://localhost/test6/index.php. In this case, Apache will receive the request and pass it to PHP. The PHP engine will interpret your script and "replace" the include of files/head.php with the content of head.php.
If everything is OK, after pressing Ctrl+U (or inspecting the HTML with Developer Tools or Firebug) you should see the content of head.php instead of <?php include ....
Please note that CSS files should be linked with a relative URL like css/screen.css or an absolute URL like http://localhost/test6/css/screen.css. Search for relative and absolute URLs on Google for more info.

Using PHP to assign css/js files to an html document

I've created my own templating/viewing engine to use with Codeigniter. In it I'm able to specify certain css/js files to use with a specific view. I assign the file names in an array, which will then get looped through while echoing the necessary <link href="X"..., <script type="X"..., etc for the respective file type in the header file of the template.
The problem is that I can't seem to use the resources I'm trying to include. The CSS/JS files aren't working even though they're being included and embedded and everything looks right in terms of the syntax in the HTML source code.
My theory is that because I'm using echo to print the link/script element into the HTML, it's not really an element that HTML can recognize? Kind of like trying to echo an object in PHP: it doesn't work.
Any advice?
It does not matter whether you use plain HTML or PHP-generated code. To the browser it is all the same.
You need to check your source code, and check if the scripts you include are accessible. So copy/paste the src="blabla" from your source code from your browser, and paste it in the address-bar and see what happens.
It is definitely not PHP's fault.
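To make that check easier, it can help to emit full URLs so that each one can be pasted straight into the address bar. A minimal sketch, assuming the file names are stored relative to a known site base URL; renderAssetTags and the example base URL are hypothetical and not part of CodeIgniter.

```php
<?php
// Hypothetical helper: turns a list of asset paths into <link>/<script> tags
// with full URLs built from $baseUrl.
function renderAssetTags(array $files, string $baseUrl): string
{
    $out = '';
    foreach ($files as $file) {
        // Join base and file with exactly one slash, then escape for HTML.
        $url = htmlspecialchars(rtrim($baseUrl, '/') . '/' . ltrim($file, '/'), ENT_QUOTES);
        $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));

        if ($ext === 'css') {
            $out .= '<link rel="stylesheet" href="' . $url . '">' . "\n";
        } elseif ($ext === 'js') {
            $out .= '<script src="' . $url . '"></script>' . "\n";
        }
        // Other file types are skipped in this sketch.
    }
    return $out;
}

echo renderAssetTags(['css/screen.css', 'js/app.js'], 'http://localhost/site');
```

The output is ordinary HTML text, exactly what the browser would see if the tags had been typed by hand.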

How to keep a website with url routing directory independent

I'm developing a PHP website that uses url routing. I'd like the site to be directory independent, so that it could be moved from http://site.example.com/ to http://example.com/site/ without having to change every path in the HTML. The problem comes up when I'm linking to files which are not subject to routing, like css files, images and so on.
For example, let's assume that the view for the action index of the controller welcome contains the image img/banner.jpg. If the page is requested with the url http://site.example.com/welcome, the browser will request the image as http://site.example.com/img/banner.jpg, which is perfectly fine. But if the page is requested with the url http://site.example.com/welcome/index, the browser will think that welcome is a directory and will try to fetch the image as http://site.example.com/welcome/img/banner.jpg, which is obviously wrong.
I've already considered some options, but they all seem imperfect to me:
Use url rewriting to redirect requests from (*.css|*.js|...) or (css/*|js/*|...) to the right path.
Problems: Every extension would have to be named in the rewrite rules. If someone would add a new filetype (e.g. an mp3 file), it wouldn't be rewritten.
Prepend the base path to each relative path with a php function. For example:
<img src="<?php echo url::base(); ?>img/banner.jpg" />
Problems: Looks messy; css- and js-files containing paths would have to be processed by PHP.
So, how do you keep a website directory independent? Is there a better/cleaner way than the ones I came up with?
You could put in the head
<base href="<?php echo url::base(); ?>" />
This means the browser will request any non-absolute URLs relative to that path. It does not affect paths defined in CSS files, which are resolved relative to the stylesheet itself. (thanks mooware)
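The url::base() call comes from the question's framework. As a rough sketch of how such a value could be computed without hard-coding the directory, the function below derives it from SCRIPT_NAME. It assumes that every routed request is handled by a single front controller (e.g. index.php) located in the site's root directory; the name basePath is hypothetical.

```php
<?php
// Hypothetical helper: returns the directory of the front controller,
// always ending with a slash ("/" or "/site/").
function basePath(array $server): string
{
    // dirname() may return a backslash on Windows for a root-level script.
    $dir = str_replace('\\', '/', dirname($server['SCRIPT_NAME'] ?? '/index.php'));

    return rtrim($dir, '/') . '/';
}

echo '<base href="' . htmlspecialchars(basePath($_SERVER), ENT_QUOTES) . '">';
```

With this in place, moving the site from http://site.example.com/ to http://example.com/site/ changes the emitted value from / to /site/ without touching the templates.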
The <base> approach will work, but remember that it also affects your <a> tags. Consider this example:
<!-- this page is http://oursite.com/index.html -->
<html>
<head>
<base href="http://static.oursite.com/" />
</head>
<body>
<img src="logo.gif" alt="this is http://static.oursite.com/logo.gif" />
<a href="login">this links to http://static.oursite.com/login which is not what we wanted. we wanted http://oursite.com/login</a>
</body>
</html>
If you use a PHP function call to create your links, that won't be a problem, as you can make sure it emits absolute URLs. But if you (or your designers) hand-code the <a> tags, then you're stuck with the same problem again, just with <a> instead of <img>.
EDIT: I should add that the above paragraph assumes you serve images from a different host name, like we do. If you don't, then obviously that won't be a problem.
tomhaigh has a good point, and it would be worthwhile to investigate it further.
According to MSDN, the base tag works for all external sources, including style sheets, images, etc.
Perhaps I'm missing something, but can't you just do what I (and I thought everybody else) do/es? Namely put all your images, css, javascripts, etc in a common directory i.e.:
/inc/images/
/inc/css/
/inc/javascript/
etc
And then reference them with root-relative URLs, i.e.:
<img src="/inc/images/foo.jpg" />
etc
?
