Relative paths INSIDE file_get_contents result


I am loading a url into my page on my local server with PHP file_get_contents. It works fine but relative paths on the site I am pulling obviously fail and default to my localhost.
Does anyone have any advice on how I could swap relative paths of a page I am pulling to absolute ones?
I've tried something like this, but it fails:
$homepage = file_get_contents($theURL);
$homepage2 = str_replace($homepage, "/images", $theURL + '/images');
echo $homepage2;
Thanks!

Your arguments for str_replace are in the wrong order.
Instead of
$homepage2 = str_replace($homepage, "/images", $theURL + '/images');
you should be doing
$homepage2 = str_replace("/images", $theURL . '/images', $homepage);
Note that PHP concatenates strings with ., not +.
http://php.net/str_replace
str_replace(search, replace, subject)
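A minimal sketch of the corrected call applied to fetched markup, assuming the page references its assets with root-relative, double-quoted attributes such as src="/images/...". The helper name absolutizeRootPaths and the sample markup are made up for illustration and are not part of the original question.

```php
<?php
// Hypothetical helper: prefixes root-relative src/href attributes with the
// site origin. $origin is assumed to look like "http://example.com".
function absolutizeRootPaths(string $html, string $origin): string
{
    $origin = rtrim($origin, '/');

    // str_replace(search, replace, subject); the arrays pair up by index.
    return str_replace(
        ['src="/', 'href="/'],
        ['src="' . $origin . '/', 'href="' . $origin . '/'],
        $html
    );
}

$sample = '<img src="/images/logo.png"><a href="/about">About</a>';
echo absolutizeRootPaths($sample, 'http://example.com/'), "\n";
```

This plain string replacement does not handle single-quoted attributes, protocol-relative URLs (//host/path), or paths relative to the current directory, so treat it as a starting point only.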

Related

Just another php image exists issue

All of this stuff is for example (names aren't actual).
Everything is also located on localhost:8080 (USBWebserver 8.5)
Directory Structure:
(Files located on localhost:8080/[project_name])
/ajax
    /ajax_file.php
/img
    /250x250
        /[image_name].jpg
Code (From ajax_file.php):
$url = 'img/250x250/'.$image_name.'.jpg';
$url = file_exists($url);
This will return false.
I've tried an img_exists($url) function which used cURL, but that did not work.
I've also tried:
$url = 'img/250x250/'.$image_name.'.jpg';
$image_check = getimagesize($url);
if (!is_array($image_check))
{
$url = 'img/default_image.png';
}
but this returns a warning for getimagesize() saying no file or directory exists.
When I put $url = 'img/250x250/'.$image_name.'.jpg' into <img src="$url" /> the image shows up...but if the image does not exist then it comes up with a broken image...
How come anything I try to do fails in some way?
I want a default image to show up when the image is broken :/
EDIT
$url = 'img/products/250x250/'.$image_name.'.jpg';
$url = var_dump(file_exists($url));
Returns bool(false)
$url = '../img/products/250x250/'.$image_name.'.jpg';
$url = var_dump(file_exists($url));
Returns bool(false)

It appears as if you need to branch out of the ajax folder before accessing img folder?
Try:
$url = '../img/250x250/'.$image_name.'.jpg';
@Alex Lunix
My guess is that he put the img tag inside of the actual php page, not the ajax script.

If you're in /ajax/ajax_file.php and you look for 'img/250x250/'.$image_name.'.jpg' it will be looking for /ajax/img/250x250/'.$image_name.'.jpg. Instead you should be using
$url = '../img/250x250/'.$image_name.'.jpg';
Although I'm not sure why it shows up in image tags, my guess is you're getting lucky and your browser is fixing the url.
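One way to avoid depending on the current working directory is to anchor the filesystem check to the script's own location. The sketch below assumes the directory layout from the question (ajax/ and img/ side by side under the project root) and that the page showing the image lives at the project root. The helper name imageUrlOrDefault is hypothetical.

```php
<?php
// Hypothetical helper: returns the web path of an image, or a default image
// when the file is missing on disk. $projectRoot is a filesystem directory.
function imageUrlOrDefault(string $projectRoot, string $imageName): string
{
    $relative = 'img/250x250/' . $imageName . '.jpg';

    // file_exists()/is_file() need a filesystem path, so build one from a
    // known directory instead of relying on the current working directory.
    $onDisk = rtrim($projectRoot, '/') . '/' . $relative;

    return is_file($onDisk) ? $relative : 'img/default_image.png';
}

// From /ajax/ajax_file.php, the project root is one level above __DIR__.
$url = imageUrlOrDefault(dirname(__DIR__), 'example');
echo $url, "\n";
```

The value returned is what goes into the img tag; the browser resolves it against the page URL, which is why the same string can work in HTML while failing in file_exists().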

How to scrape only the largest images from the DOM?

I am using SimpleHTMLDOM to scrape pages (in servers other than mine).
The basic implementation is
try {
$html = file_get_html(urldecode(trim($url)));
} catch (Exception $e) {
echo $url;
}
foreach ($html->find('img') as $element) {
$src = "";
$src = $element->src;
if (preg_match("/\.(?:jpe?g|png)$/i", $src)) {
$images[] = $src;
}
}
This works fine but it returns all images from the page, including small avatars, icons, and button images. Of course I'd like to avoid these.
I then tried to insert within the loop as follows
...
if (preg_match("/\.(?:jpe?g|png)$/i", $src)) {
$size = getimagesize($src);
if ($size[0] > 200) {
$images[] = $src;
}
}
...
That works well on a page like http://cnn.com.
But in others it returns numerous errors.
For example
http://www.huffingtonpost.com/2012/05/27/alan-simpson-republicans_n_1549604.html
gives a bunch of errors like
<p>Severity: Warning</p>
<p>Message: getimagesize(/images/snn-logo-comments.png): failed to open stream: No such file or directory
<p>Severity: Warning</p>
<p>Message: getimagesize(/images/close-gray.png): failed to open stream: No such file or directory
which seem to be happening because of relative URLs in some images. The problem here is that this crashes the script, so no images are loaded and my Ajax box keeps loading forever.
Do you have any ideas how to troubleshoot this?

The problem is that the image URLs are relative to the site root, so your server can't make sense of them to fetch them and find out their size. You could refer to this question to figure out how to get absolute URLs from relative ones.

The approach you tried with image size checking is correct.
However, in order for it to work on all sites, you would need to add some kind of relative URL parsing.
I don't know if there are any libraries or such for it but here's a quick overview on how to do it:
1. Find the domain part of the URL you're scraping.
2. Assume any URL starting with / is relative to the site root. You can fetch these simply by concatenating the scheme and domain with the path.
3. Assume any URL that neither starts with / nor carries a scheme is relative to the page's directory. You may need to resolve any .. markers in the URL to locate the expected path.
4. Check for the <base> tag in the document: if the document has a <base> tag, it anchors all relative paths to the URL defined in the tag.
You may be able to find a library to convert relative paths and absolute paths into something you can use, but in most cases they will not account for the <base> tag mentioned in the last point.
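The steps above can be sketched roughly as follows. This is an illustrative helper, not a full RFC 3986 resolver: the name resolveUrl is hypothetical, the example URLs are made up, and a <base href> value, if the page has one, is assumed to be passed in place of the page URL. Query strings containing slashes and trailing slashes on directory paths are not handled.

```php
<?php
// Hypothetical helper: resolves $src against $pageUrl (or against a
// <base href> value passed in its place). Covers common cases only.
function resolveUrl(string $src, string $pageUrl): string
{
    // Already absolute (has a scheme): nothing to do.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $src)) {
        return $src;
    }

    $base = parse_url($pageUrl);
    $scheme = $base['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($base['host'] ?? '')
        . (isset($base['port']) ? ':' . $base['port'] : '');

    if (strpos($src, '//') === 0) {   // protocol-relative: //host/path
        return $scheme . ':' . $src;
    }
    if (strpos($src, '/') === 0) {    // relative to the site root
        return $origin . $src;
    }

    // Relative to the page: start from the directory part of the page path.
    $path = $base['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    $resolved = [];
    foreach (explode('/', $dir . $src) as $segment) {
        if ($segment === '..') {
            array_pop($resolved);     // step up one directory
        } elseif ($segment !== '.' && $segment !== '') {
            $resolved[] = $segment;
        }
    }

    return $origin . '/' . implode('/', $resolved);
}

$page = 'http://www.example.com/2012/05/27/story.html';
echo resolveUrl('/images/close-gray.png', $page), "\n";
echo resolveUrl('../img/a.png', $page), "\n";
```

With the resolved URL in hand, getimagesize() receives something it can actually fetch. Checking its return value against false before reading $size[0] keeps a single missing image from aborting the loop.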

Try something like this, assuming a URL of http://somedomain.com...
$parts = explode('/', $url);
$domain = $parts[0] . '//' . $parts[2]; // e.g. http://somedomain.com
// ... snip ...
if (preg_match("/\.(?:jpe?g|png)$/i", $src)) {
    // Prefix root-relative paths with the domain before fetching the image
    if (strpos($src, '/') === 0) {
        $src = $domain . $src;
    }
    $size = getimagesize($src);
    if ($size !== false && $size[0] > 200) {
        $images[] = $src;
    }
}
This will help some, but it won't be fool-proof - I can't think of many domains using ../../etc relative paths to images, but I'm sure someone is - of course, you could test for a match of anything other than the domain in the image's src attribute, and try throwing the domain on there but no promises that will work every time either. I would think there's a better way... perhaps have a default method and load a config with predefined domain "fixes" for troublesome domains.

Get full path to image from php

Hi!
I parse a website with Simple HTML DOM to get the links of all the pictures.
The problem is that the link is like "/pics/bla.jpg".
I have the full path of the website, like "http://xxx.xxx/blob/gulsch".
Now I want to get the full image link (site root + /pics/bla.jpg, not a plain concatenation with the page URL)
like: http://xxx.xxx/pics/bla.jpg
This should work for many websites
I tried it with explode()
$root = explode("/", $link);
echo $root[2];
I never got it working.
Please help.

Try with parse_url:
$r = parse_url($websiteUrl);
$imageUrl = $r["scheme"] . "://" . $r["host"] . "/" . ltrim($imageRelativeUrl, "/");
The "root" of the website is simply $r["host"].
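Applied to the URLs from the question, the parse_url approach might look like the sketch below. The helper name siteRoot is hypothetical; it also keeps a non-default port if the page URL has one, which the snippet above leaves out.

```php
<?php
// Hypothetical helper: returns scheme://host[:port] for an absolute page URL.
function siteRoot(string $pageUrl): string
{
    $parts = parse_url($pageUrl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $pageUrl");
    }
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';

    return $parts['scheme'] . '://' . $parts['host'] . $port;
}

// ltrim() avoids a double slash when the image path already starts with "/".
$imageUrl = siteRoot('http://xxx.xxx/blob/gulsch') . '/' . ltrim('/pics/bla.jpg', '/');
echo $imageUrl, "\n"; // http://xxx.xxx/pics/bla.jpg
```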

Given the parts of a URL, you can build a full URL with http_build_url, which requires the pecl_http extension and is not part of PHP core.

PHP take arguments from URL path

Say I have a url like this:
http://www.mysite.com/forum/board1/sub-forum/topics/123
Is there a simple way in PHP (can't use HTAccess) to take that URL and extract board1, sub-forum, topics and 123 so I can use them in a database for example? Are there any built in functions or will I have to write my own?
Thanks,
James

explode('/', getenv('REQUEST_URI'));
If your environment happens to include the query string part in the above value, here's a neat workaround:
explode('/', strtok(getenv('REQUEST_URI'), '?'));

You can, but without redirecting requests your webserver will just return a 404 error for non-existing paths.
However, you can use URLs like http://your.site.com/index.php/foo/bar/baz and then split the URL into parts like @pestaa said, which you can then parse into parameter values.

This is taken from my MVC
http://www.phpclasses.org/package/6363-PHP-Implements-the-MVC-design-pattern.html
The link is outdated at the moment. I have just updated the code so it no longer contains the MVC stuff. It can be called with getLoadDetails($_URL); and $_URL will then look exactly like $_GET, except that it gets its data from the folder path.
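function getLoadDetails(&$_URL){
    // Drop the query string and the surrounding slashes before splitting
    $path = trim(strtok($_SERVER['REQUEST_URI'], '?'), '/');
    if ($path === '') {
        return;
    }
    $filePath = explode("/", $path);
    // Segments are read in pairs: /key1/value1/key2/value2
    for($i = 0; $i < count($filePath); $i += 2){
        $key = urldecode($filePath[$i]);
        // A trailing key without a value gets null instead of a notice
        $val = isset($filePath[$i + 1]) ? urldecode($filePath[$i + 1]) : null;
        $_URL[$key] = $val;
    }
}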
I do have one question: if you can't use .htaccess, how do you plan on coping with the folder path? Please don't tell me your system is going to create the folder paths and an index file for every URL. That will trash your server's speed, and your host will hate you for it.

If you already have configured your web server to send those requests to your particular PHP file, you can use parse_url and explode to get and then split the requested URI path into its segments:
$_SERVER['REQUEST_URI_PATH'] = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$pathSegments = explode('/', $_SERVER['REQUEST_URI_PATH']);
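A small sketch of the same idea wrapped in a function, so it can be tried without a web server. The helper name pathSegments is hypothetical; in a real request you would pass $_SERVER['REQUEST_URI'] instead of the sample string.

```php
<?php
// Hypothetical helper: splits the path of a request URI into its segments.
function pathSegments(string $requestUri): array
{
    // PHP_URL_PATH drops the query string; parse_url() can return null or false.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return [];
    }

    // trim() removes leading and trailing slashes so explode() does not
    // produce empty segments at either end.
    $trimmed = trim($path, '/');

    return $trimmed === '' ? [] : array_map('rawurldecode', explode('/', $trimmed));
}

print_r(pathSegments('/forum/board1/sub-forum/topics/123?page=2'));
```

For the URL in the question this yields forum, board1, sub-forum, topics and 123, in that order.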

PHP absolute path with timThumb doesn't work

I'm writing a wordpress plugin, and using this script to resize images: Timthumb
This script uses absolute paths, but I can't get it to work for me; I've triple-checked all my paths but still nothing.
Here is my code:
$plugin_dir_name = "my-plugin";
$pathTimThumb = WP_PLUGIN_URL . '/' . $plugin_dir_name . '/timthumb.php';
$pathToUpload = WP_CONTENT_URL.'/uploads/'.$plugin_dir_name;
$hImg = 150;
$wImg = 150;
....
$myImage = '<img class="thumb" src="'.$pathImageThumb.'?src='.$pathToUpload.'/'.$allImages[$i].'&h='.$hImg.'&w='.$wImg.'&zc=1" alt="">';
In firebug I get this URL:
<img alt="" src="http://localhost/mu/wp-content/plugins/my-plugin/timthumb.php?src=http://localhost/mu/wp-content/uploads/my-plugin/car___1/26zhoar5.jpg&h=150&w=150&zc=1" class="thumb">
Where is the mistake?

Use this one:
$my_plugin_url = plugins_url('my-plugin-name/');
$my_timthumb_url = $my_plugin_url.'timthumb.php?';
$my_image_url = 'http://localhost/images/image.jpg';
echo '<img alt="" src="'.$my_timthumb_url.'src='.$my_image_url.'&h=150&w=150&zc=1"/>';
Things to consider to make TimThumb work:
- chmod your TimThumb cache folder
- do not use external images
- check your TimThumb version
Cheers, Dave

WP_CONTENT_URL is a URL, not an absolute path. Use WP_CONTENT_DIR instead.

TimThumb tries to determine the local path of the image by stripping out http://CURRENT_HOST.tld from the beginning of the src parameter.
Since you're running on localhost it might be getting a little confused and falsely calculating it as an external image. I doubt this is the case (I checked the source and it should be OK), but it's an educated guess.
Have you tried reading the HTTP response headers from http://localhost/mu/wp-content/plugins/my-plugin/timthumb.php?src=http://localhost/mu/wp-content/uploads/my-plugin/car___1/26zhoar5.jpg&h=150&w=150&zc=1?
If not, use HttpFox for Firefox and post back the results.
