Why is this foreach failing? - php

The script I am using fetches an HTML page and parses it, showing only the .jpg images within. I need to make some modifications, but when I do, it simply fails...
This works:
include('simple_html_dom.php');
function getUrlAddress() {
$url = $_SERVER['HTTPS'] == 'on' ? 'https' : 'http';
return $url .'://'.$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
}
$html = file_get_html($url);
foreach($html->find('img[src$=jpg]') as $e)
echo '<img src='.$e->src .'><br>';
However, there are some problems. I only want to show images over a certain size, and some sites do not put the full URL in the img tag, so I need to work around that too. So I have done the following:
include('simple_html_dom.php');
function getUrlAddress() {
$url = $_SERVER['HTTPS'] == 'on' ? 'https' : 'http';
return $url .'://'.$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
}
$html = file_get_html($url);
foreach($html->find('img[src$=jpg]') as $e)
$image = $e->src;
// check to see if src has domain
if (preg_match("/http/", $e->src)) {
$image = $image;
} else {
$parts = explode("/",$url);
$image = $parts['0']."//".$parts[1].$parts[2].$e->src;
}
$size = getimagesize($image);
echo "<br /><br />size is {$size[0]}";
echo '<img src='.$image.'><br>';
This works, but only returns the first image.
On the example link below there are 5 images. The first code finds all of them but cannot display them, because the src is missing the leading domain.
Example link as mentioned above
Is there a better way to do this? And why does the loop fail?

You seem to be missing a {:
foreach($html->find('img[src$=jpg]') as $e) {

You forgot your brackets:
foreach($html->find('img[src$=jpg]') as $e){
$image = $e->src;
// check to see if src has domain
if (preg_match("/http/", $e->src)) { $image = $image; }
else {
$parts = explode("/",$url);
$image = $parts['0']."//".$parts[1].$parts[2].$e->src;
}
$size = getimagesize($image);
echo "<br /><br />size is {$size[0]}";
echo '<img src='.$image.'><br>';
}
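To see why the braces matter, here is a minimal sketch (the file names and variable names are made up for illustration). Without braces, only the single statement right after the foreach belongs to the loop; everything below it runs once, after the loop has finished, using whatever value was assigned on the final iteration.

```php
<?php
// Hypothetical list standing in for the matched <img> elements.
$items = ['a.jpg', 'b.jpg', 'c.jpg'];

// No braces: only the next statement is the loop body.
$withoutBraces = [];
foreach ($items as $item)
    $image = $item;            // runs once per item
$withoutBraces[] = $image;     // runs once, after the loop has ended

// With braces: every statement in the block runs once per item.
$withBraces = [];
foreach ($items as $item) {
    $image = $item;
    $withBraces[] = $image;
}

echo count($withoutBraces), ' vs ', count($withBraces), "\n";
```

So the echo statements in the question were never part of the loop at all, which is why only one image was printed.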

Related

Loading html inside PHP for heroku

I have created separate HTML and jQuery sites for iPhone and iPad. Because I am using Heroku for deployment, I have to have a PHP index, so I have included the HTML inside PHP. It loads the index.html inside the respective folders, but the links (CSS, images) are broken.
Is there any solution or alternative for that?
<?php
echo $width = "<script>document.write(screen.width);</script>";
echo $height = "<script>document.write(screen.height);</script>";
if($width!=null && $height!=null) {
if($width>376 && $height>812){
include_once("ipad/index.html");
} else {
include_once("iphone/index.html");
}
} else {
echo json_encode(array('outcome'=>'error','error'=>"Couldn't redirect. Redirecting to iPhone version"));
}
?>
Note -
I have tried something like this
<?php
echo $width = "<script>document.write(screen.width);</script>";
// echo $height = "<script>document.write(screen.height);</script>";
if($width!=null) {
$path = $_SERVER['DOCUMENT_ROOT'];
if($width>376){
$path = $_SERVER['DOCUMENT_ROOT'];
$path .= "/StanstedGo/ipad/index.html";
include_once($path);
} else {
$path = $_SERVER['DOCUMENT_ROOT'];
$path .= "/StanstedGo/iphone/index.html";
include_once($path);
}
} else {
echo json_encode(array('outcome'=>'error','error'=>"Couldn't redirect. Redirecting to iPhone version"));
}
?>
but it also does not solve my problem. I still have broken links to CSS files.

Print out favicon instead of link to it

I'm trying to print a website's favicon, as an image, not as a link to it.
I have a php script in which I extract the favicon, but now I want to show it as it is.
Here is what I've tried.
//extract favicon
$url = $_POST['url'];
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE;
$doc->loadHTML(file_get_contents($url));
$xml = simplexml_import_dom($doc);
$arr = $xml->xpath('//link[@rel="shortcut icon"]');
echo "<br>";
//echo "favicon:";
if( $arr)
{
$src = $arr[0]['href'];
echo "<img src = "$src">";//as I can see, the parameter here cannot be a variable
//second thing that I've tried: echo "<img src = "$arr[0]['href']""; it doesn't work either
}
This is what my script is echoing right now. http://i.stack.imgur.com/Wkoyj.jpg
Instead of the link to the favicon, I want the actual favicon to be displayed. I hope I explained myself correctly.
Your error is with the code:
echo "<img src = "$src">";//as I can see, the parameter here cannot be a variable
It should be
echo '<img src="'.$src.'">';
Or even
echo "<img src=\"$src\">";
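A small sketch of the two quoting styles above, plus an optional escaped variant. The URL is a made-up placeholder, and the htmlspecialchars step is my own addition rather than part of the original answer; it only matters when the href may contain characters such as & or quotes.

```php
<?php
// Hypothetical favicon URL, used only for illustration.
$src = 'http://example.com/favicon.ico?a=1&b=2';

// Single quotes with concatenation: inner double quotes need no escaping.
$tagA = '<img src="' . $src . '">';

// Double quotes with interpolation: inner double quotes must be escaped.
$tagB = "<img src=\"$src\">";

// Optional: escape the value before placing it in the attribute.
$tagC = '<img src="' . htmlspecialchars($src, ENT_QUOTES) . '">';

echo $tagA, "\n", $tagB, "\n", $tagC, "\n";
```

The first two produce the same string; the third differs only in that & becomes &amp;.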

Parsing image url from source code of the page

Here is my regex to get the image url on the page.
<?php
$url = $_POST['url'];
$data = file_get_contents($url);
$logo = get_logo($data);
function get_logo($html)
{
preg_match_all('/\bhttps?:\/\/\S+(?:png|jpg)\b/', $html, $matches);
//echo "mactch : $matches[0][0]";
return $matches[0][0];
}
?>
Is there anything missing in the regex? For some URLs it does not return an image URL even though the page contains images.
For example: http://www.milanart.in/
It does not return the image on that page.
Please, no DOM. I could not use it.
<?php
$url = "http://www.milanart.in";
$data = file_get_contents($url);
$logo = get_logo($data);
function get_logo($html)
{
preg_match_all("/<img src=\"(.*?)\"/", $html, $matches);
return $matches[1][0];
}
echo 'logo path : '.$logo;
echo '<img src="'.$url.'/'.$logo.'" />';
?>
Use DOM Class of PHP to get all images:
Search for image files in CSS.....url(imagefilename.extension)
Search for image file in HTML ......
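A minimal sketch of both searches suggested above, assuming the page source is already in a string. The sample markup, variable names, and the url(...) pattern are illustrative assumptions, not taken from the original answer.

```php
<?php
// Hypothetical page source; in practice this would come from file_get_contents($url).
$html = '<html><head><style>.logo { background: url(images/logo.png); }</style></head>'
      . '<body><img src="/img/a.jpg"><img src="b.png"></body></html>';

// Parse the HTML; real-world markup is often invalid, so collect
// libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// 1) Images referenced from <img> tags.
$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $sources[] = $img->getAttribute('src');
}

// 2) Images referenced from CSS as url(...), with or without quotes.
preg_match_all('/url\(\s*[\'"]?([^\'")]+)[\'"]?\s*\)/i', $html, $m);
$cssImages = $m[1];

print_r($sources);
print_r($cssImages);
```

This only covers inline CSS; external stylesheets would have to be fetched and scanned separately.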

Get images from article in Joomla with php

I'm trying to edit a plugin which I use to add Open Graph meta tags to the header. The problem is that it would only let me choose one picture for the whole site. This is what I've done:
preg_match_all('/<img .*?(?=src)src=\"([^\"]+)\"/si', $hdog_base, $image);
if (strlen($hdog_base) <= 25)
{
if (substr($image[0], 0, 4) != 'http')
{
$image[0] = JURI::base().$image[0];
}
$hdog_image_tmp = $image[0];
}
else
{
if (substr($image[1], 0, 4) != 'http')
{
$image[1] = JURI::base().$image[1];
}
$hdog_image_tmp = $image[1];
}
$hdog_image = '<meta property="og:image" content="'.$hdog_image_tmp.'" />
';
$hdog_base is the current webpage I'm on.
The first if-statement would show the very first picture, which is the logo (used for ex. homepage), and the else would show the second picture (which would be different on each page), but the result only comes out like this, no matter if I'm on the homepage or anywhere else on the site:
<meta property="og:image" content="http://mysite.com/Array" />
Any suggestions?
Thanks in advance,
Update:
The biggest mistake I'm making is that I am trying to find the images in a URL, not in the actual webpage. So how would I go about getting the contents of the current page as a string, instead of $hdog_base, which is nothing but a link?
UPDATE, SOLVED:
I used
$buffer = JResponse::getBody();
to get the webpage in HTML
and then DOM for the rest
$doc = new DOMDocument();
@$doc->loadHTML($buffer);
$images = $doc->getElementsByTagName('img');
if (strlen($hdog_base) <= 26)
{
$image = $images->item(0)->getAttribute('src');
}
else
{
$image = $images->item(1)->getAttribute('src');
}
if (substr($image, 0, 4) != 'http') $image = JURI::base().$image;
$hdog_image = '<meta property="og:image" content="'.$image.'" />
';
Thanks a lot cpilko for your help! :)
preg_match_all fills its third argument with a multidimensional array: $image[0] holds every full match and $image[1] holds every capture from the first subpattern. In your code $image[n] is therefore an array, and when you use an array as a string in PHP, as you are doing, you get the text Array.
EDIT: Using a regex to parse HTML isn't ideal. You are better off doing it with DOMDocument:
$doc = new DOMDocument();
@$doc->loadHTML($hdog_base);
$images = $doc->getElementsByTagName('img');
if (strlen($hdog_base) <= 25) {
$image = $images->item(0)->getAttribute('src');
} else {
$image = $images->item(1)->getAttribute('src');
}
// Prepend the site base when the src is not already an absolute URL
if (substr($image, 0, 4) != 'http') $image = JURI::base().$image;
$hdog_image = '<meta property="og:image" content="'.$image.'" />
';
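To make the array layout of preg_match_all concrete, here is a short sketch with made-up markup and a simplified pattern (both are assumptions for illustration). Index 0 holds the full matches, index 1 holds the captured src values, and each of those is itself an array.

```php
<?php
// Hypothetical markup with two images.
$html = '<img class="logo" src="/images/logo.png"><img src="http://example.com/photo.jpg">';

preg_match_all('/<img\s[^>]*?src="([^"]+)"/si', $html, $image);

// $image[0] is an array of full matches, $image[1] an array of captured src values.
$first  = $image[1][0];   // src of the first image
$second = $image[1][1];   // src of the second image

var_dump(is_array($image[1]));
echo $first, "\n", $second, "\n";
```

Using $image[1] directly in string context is what produced "Array" in the meta tag; the individual values live one level deeper.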

Scrape FULL image src with PHP

I am trying to scrape img srcs with PHP. I can get the src fine, but if the src does not include the full path then I can't really reuse it. Is there a way to get the full path of the image using PHP? (Browsers can get it via the right-click menu.)
i.e. how do I get a FULL path including the domain in one of the following two examples?
src="../foo/logo.png"
src="/images/logo.png"
Thanks,
Allan
You don't need a regex... just some patience. I don't really want to write the code for you, but just check if the src starts with http://, and if not, you have like 3 different cases.
If it begins with a / then prepend http://domain.com
If it begins with .. you'll have to split the full URL and hack off pieces until the src starts with a /
Else (it begins with a letter), take the full URL of the page, strip it down to the last slash, then append the src URL.
Or.... be lazy and steal this script
$url = "http://www.goat.com/money/dave.html";
$rel = "../images/cheese.jpg";
$com = InternetCombineURL($url,$rel);
// Returns http://www.goat.com/images/cheese.jpg
function InternetCombineUrl($absolute, $relative) {
$p = parse_url($relative);
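// Already absolute: nothing to combine
if(!empty($p["scheme"]))return $relative;
extract(parse_url($absolute));
$path = dirname($path);
// Curly-brace string offsets ($relative{0}) were removed in PHP 8
if($relative[0] == '/') {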
$cparts = array_filter(explode("/", $relative));
}
else {
$aparts = array_filter(explode("/", $path));
$rparts = array_filter(explode("/", $relative));
$cparts = array_merge($aparts, $rparts);
foreach($cparts as $i => $part) {
if($part == '.') {
$cparts[$i] = null;
}
if($part == '..') {
$cparts[$i - 1] = null;
$cparts[$i] = null;
}
}
$cparts = array_filter($cparts);
}
$path = implode("/", $cparts);
$url = "";
if(!empty($scheme)) {
$url = "$scheme://";
}
if(!empty($user)) {
$url .= "$user";
if(!empty($pass)) {
$url .= ":$pass";
}
$url .= "@";
}
if(!empty($host)) {
$url .= "$host/";
}
$url .= $path;
return $url;
}
From http://www.web-max.ca/PHP/misc_24.php
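As an alternative, here is a minimal sketch of the three cases described in words above, using a stack so that repeated .. segments are handled. resolveUrl is a hypothetical helper name, not part of PHP or of the script above, and it assumes the base URL has a scheme and host. It deliberately ignores protocol-relative URLs (//host/path), query strings on the base, and fragments.

```php
<?php
// Hypothetical helper: resolve $rel against the page URL $base.
function resolveUrl(string $base, string $rel): string
{
    // Case 0: the src already has a scheme, so it is absolute.
    $scheme = parse_url($rel, PHP_URL_SCHEME);
    if (is_string($scheme) && $scheme !== '') {
        return $rel;
    }

    $b = parse_url($base);
    $root = $b['scheme'] . '://' . $b['host']
          . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';

    if ($rel !== '' && $rel[0] === '/') {
        // Case 1: root-relative, keep the path as given.
        $path = $rel;
    } else {
        // Cases 2 and 3: relative to the directory of the page.
        $dir = substr($basePath, 0, strrpos($basePath, '/') + 1);
        $path = $dir . $rel;
    }

    // Normalise "." and ".." with a stack of path segments.
    $stack = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;
        }
        if ($segment === '..') {
            array_pop($stack);
            continue;
        }
        $stack[] = $segment;
    }

    return $root . '/' . implode('/', $stack);
}

echo resolveUrl('http://www.goat.com/money/dave.html', '../images/cheese.jpg'), "\n";
echo resolveUrl('http://www.goat.com/money/dave.html', '/images/logo.png'), "\n";
```

The stack is the main design difference from the script above: each .. pops the most recent real segment, so something like a/b/../../c.png resolves correctly.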
Unless you have the site URL you're starting with (in which case you can prepend it to the value of the src attribute) it seems like all you're left with there is a string.
I'm assuming you don't have access to any additional information of course. If you're parsing HTML, I'd assume you must be able to access an absolute URL to at least the HTML page, but perhaps not.
