I have an automated archive of several (media) websites' front pages, written in PHP. Specifically, I copy the HTML inside the <body> tag twice a day and keep a copy of all their CSS and JS files, so I can recreate the front page from any point in the past. Now I have run into a problem with one of those websites: it loads the main slider content (the most important news) with an AJAX call. I would like this AJAX call to be executed before I parse the data, so that I capture the slider content and not just a blank div. By looking around, I found out that they use a WordPress plugin named lof-jslidernews2, but I can't find the specific AJAX call to see the URL and make a curl request. Any ideas how to achieve this?
The website: http://fokus.mk/
My code (had to parse manually like this, because of some problems with DomDocument and not-valid html):
// ...
if($html = file_get_contents ($row['page_url'])) {
$content = strstr($html, '<body');
$content = str_before($content, '</body>') . '</body>';
$filename = date('YmdHis') . $row['page_name'];
if($success = file_put_contents ('app/webroot/files/' . $filename, $content)) {
// ....
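Note that str_before is not a PHP built-in, so it is presumably a helper defined elsewhere in the project. A minimal sketch of what it might look like, assuming it returns everything before the first occurrence of the needle; extract_body is a hypothetical wrapper name added only for illustration:

```php
<?php
// Hypothetical helper: the part of $haystack before the first $needle
// (case-insensitive), or the whole string when $needle is absent.
function str_before(string $haystack, string $needle): string
{
    $pos = stripos($haystack, $needle);
    return $pos === false ? $haystack : substr($haystack, 0, $pos);
}

// Hypothetical wrapper: cut out <body ...>...</body> without DomDocument,
// so invalid HTML does not matter. Returns null when there is no <body> tag.
function extract_body(string $html): ?string
{
    $start = stripos($html, '<body');
    if ($start === false) {
        return null;
    }
    return str_before(substr($html, $start), '</body>') . '</body>';
}
```

With this in place, the two $content lines above could be replaced by a single extract_body($html) call.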
** There is nothing illegal about my project, I am not stealing content, just freezing frontpages for later comparison. I have consulted a lawyer about this. :)
I don't know why, but the guy that actually solved my problem deleted his answer. So, here it is:
He suggested using an emulator, specifically Mink. It was easy to install (using composer) and did the job on the first try. Awesome library.
Mink is an open source browser controller/emulator for web applications, written in PHP 5.3.
I have image URLs saved in my database. I am using PHP and MySQL.
Some images can be displayed, but others are restricted. At the moment, restricted images show up broken on my site.
I only wish to display non-restricted images.
The URL of an image that can be shown is
http://images.icecat.biz/img/gallery/16678932_9061.jpg
A restricted image is
http://images.icecat.biz/img/gallery/8622798_7908.jpg
I have tried getimagesize but can't seem to have any luck.
kind regards
nafri
file_exists() doesn't work across domains. Server side can be done like:
$url = 'http://images.icecat.biz/img/gallery/16678932_9061.jpg';
$header = get_headers($url, 1);
if(strpos( $header[0],'200') === false){
// do what ever
}
EDIT: fixed for 200 response. Using curl would be better, though, since it is faster.
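A curl version of the same check could look roughly like this. It is only a sketch: http_status and is_displayable are made-up names, and it assumes the image server answers HEAD requests and reports restricted images with a non-200 status.

```php
<?php
// Returns the final HTTP status code for $url, or 0 if the request failed.
function http_status(string $url, int $timeout = 5): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD request: skip the image body
        CURLOPT_FOLLOWLOCATION => true,  // report the status after redirects
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
    ]);
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Assumption: only a plain 200 means the image can be shown.
function is_displayable(int $status): bool
{
    return $status === 200;
}
```

Usage would be along the lines of if (is_displayable(http_status($url))) { ... } before printing the <img> tag. Since each check is a network round trip, it is probably worth storing the result next to the URL in the database instead of checking on every page view.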
If you're happy to handle this on the client side, then you could use javascript to deal with this:
You can handle the onError event to replace the image, or do something else, when the image cannot be displayed.
See this answer for an example.
I'm using an external web service that returns an image URL, which I display on my website. For example:
$url = get_from_web_service();
echo '<img src="'.$url.'" />';
Everything works fine, except that if I have 100 images to show, calling the web service becomes time- and resource-consuming.
//the problem
foreach($items as $item) {
$url = get_from_web_service($item);
echo '<img src="'.$url.'" />';
}
So now i'm considering two options:
//Option1: Using php file_get_contents():
foreach($items as $item)
{
echo '<img src="url_to_my_website/get_image.php?id='.$item->id.'" />';
}
get_image.php :
$url = get_from_web_service($_GET['id']);
header("Content-Type: image/png");
echo file_get_contents($url);
//Option2: Using ajax:
echo '<img src="dummy_image_or_website_logo" data-id="123" />';
//ajax call to the web service with id=123 to get the url, then set the src attribute of that image.
THOUGHTS
The first option seems more straightforward, but my server would be involved in every single image request and might become overloaded.
With the second option everything is done by the browser and the web service, so my server is not involved at all. But for each image I'm making 2 calls: 1 AJAX call to get the image URL and another one to get the image. So loading time might vary, and AJAX calls might fail when there is a large number of them.
Information
Around 50 images will be displayed on that page.
This service will be used by around 100 users at any given time.
I have no control over the web service, so I can't change its functionality, and it doesn't accept more than 1 image ID per call.
My Questions
Is there any better option I should consider?
If not, which option should I follow and, most importantly, why?
Thanks
Method 1: Rendering in PHP
Pros:
Allows for custom headers that're independent of any server software. If you're using something that's not generally cached (like a PHP file with a query string) or are adding this to a package that needs header functionality regardless of server software, this is a very good idea.
If you know how to use GD or Imagick, you can easily resize, crop, compress, index, etc. your images to reduce the image file size (sometimes drastically) and make the page load significantly faster.
If width and height are passed as variables to the PHP file, the dimensions can be set dynamically:
<div id="gallery-images">
<noscript>
<!-- So that the thumbnail is small for old mobile devices //-->
<img src="get-image.php?id=123&h=200&w=200" />
</noscript>
</div>
<script type="text/javascript">
/* Something to create an image element inside of the div.
* In theory, the browser height and width can be pulled dynamically
* on page load, which is useful for ensuring that images are no larger
* than they need to be. Having a function to load the full image
* if the browser becomes bigger isn't a bad idea though.
*/
</script>
This would be incredibly considerate of mobile users on a page that has an image gallery. This is also very considerate of users with limited bandwidth (like almost everyone in Alaska. I say this from personal experience).
Allows you to easily clear the EXIF data of images if they're uploaded by users on the website. This is important for user privacy as well as making sure there aren't any malicious scripts living in your JPGs.
Gives potential to dynamically create a large image sprite and drastically reduce your HTTP requests if they're causing latency. It'd be a lot of work so this isn't a very strong pro, but it's still something you can do using this method that you can't do using the second method.
Cons:
Depending on the number and size of images, this could put a lot of strain on your server. When used with browser-caching, the dynamic images are being pulled from cache instead of being re-generated, however it's still very easy for a bot to be served the dynamic image a number of times.
It requires knowledge of HTTP headers, basic image manipulation skills, and an understanding of how to use image manipulation libraries in PHP to be effective.
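As a rough illustration of the resizing idea in Method 1, here is a sketch using GD. The function name resize_to_fit is made up, it always outputs PNG, and it ignores transparency, so treat it as a starting point only:

```php
<?php
// Scale raw image bytes to fit within $maxW x $maxH, keeping the aspect
// ratio, and return the result as PNG bytes. Requires the GD extension.
function resize_to_fit(string $data, int $maxW, int $maxH): string
{
    $src = imagecreatefromstring($data);
    if ($src === false) {
        throw new InvalidArgumentException('Not a readable image');
    }
    $w = imagesx($src);
    $h = imagesy($src);

    // Never upscale: cap the scaling ratio at 1.
    $ratio = min($maxW / $w, $maxH / $h, 1);
    $newW  = max(1, (int) round($w * $ratio));
    $newH  = max(1, (int) round($h * $ratio));

    $dst = imagecreatetruecolor($newW, $newH);
    imagecopyresampled($dst, $src, 0, 0, 0, 0, $newW, $newH, $w, $h);

    // imagepng() prints to the output buffer, so capture it as a string.
    ob_start();
    imagepng($dst);
    return ob_get_clean();
}
```

In get-image.php this would sit between fetching the image and echoing it, with w and h read from $_GET, cast to int, and clamped to a sensible maximum so that nobody can request huge images.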
Method 2: AJAX
Pros:
The page would finish loading before any of the images. This is important if your content absolutely needs to load as fast as possible, and the images aren't very important.
Is far simpler, easier and significantly faster to implement than any kind of dynamic PHP solution.
It spaces out the HTTP requests, so the initial content loads faster (since the HTTP requests can be sent based on browser action instead of just page load).
Cons:
It doesn't decrease the number of HTTP requests, it simply spaces them out. Also note that there will be at least one additional external JS file in addition to all of these images.
Displays nothing if the end device (such as older mobile devices) does not support JavaScript. The only way you could fix this is to have all of the images load normally between some <noscript> tags, which would require PHP to generate twice as much HTML.
Would require you to add loading.gif (and another HTTP request) or Please wait while these images load text to your page. I personally find this annoying as a website user because I want to see everything when the page is "done loading".
Conclusion:
If you have the background knowledge or time to learn how to effectively use Method 1, it gives far more potential because it allows for manipulation of the images and HTTP requests sent by your page after it loads.
Conversely, if you're looking for a simple method to space out your HTTP Requests or want to make your content load faster by making your extra images load later, Method 2 is your answer.
Looking back at methods 1 and 2, it looks like using both together could be the best answer: have two of your cached and compressed images load with the page (one is visible, the other is a buffer so that the user doesn't have to wait every time they click "next"), and have the rest load one by one as the user sees fit.
In your specific situation, I think that Method 2 would be the most effective if your images can be displayed in a "slideshow" fashion. If all of the images need to be loaded at once, try compressing them and applying browser-caching with method 1. If too many image requests on page load are destroying your speed, try image spriting.
As of now, you are contacting the webservice 100 times. You should change it so it contacts the webservice only once and retrieves an array of all the 100 images, instead of each image separately.
You can then loop over this array, which will be very fast as no further webtransactions are needed.
If the images you are fetching from the webservice are not dynamic in nature, i.e. they do not get changed or modified frequently, I would suggest setting up a scheduled process/cron job on your server which gets the images from the webservice and stores them locally (on your server itself). You can then display the images on the webpage from your own server and avoid a third-party server round trip every time the webpage is served to end users.
Neither option will solve your problem, and either may make it worse.
For option 1:
The step that costs the most time is get_from_web_service($item), and this option only moves that call into another script (assuming get_image.php runs on the same server).
For option 2:
It only makes the browser trigger the image request; your server still has to run get_from_web_service($item).
To be clear, the problem is the performance of get_from_web_service, so the most direct fix is to make that call faster. Failing that, you can reduce the number of concurrent connections. I haven't thought this all the way through, but I have a few suggestions:
Asynchronous: Users don't view your whole page at once; they first see only the top. If the images are not all displayed at the top, you can use the jquery.lazyload extension, which holds back requests for images in the invisible region until they scroll into view.
CSS Sprites: An image sprite is a collection of images put into a single image. If the images on your page do not change frequently, you can write some code to merge them daily.
Cache Image: You can cache the images on your server, or (better) on another server, and keep a key->value mapping where the key is derived from $item and the value is the resource location (URL).
I am not a native English speaker; I hope this was clear and helpful.
I'm not an expert, but I think every echo takes some time. Getting 100 images shouldn't be a problem in itself, though.
Also, maybe get_from_web_service($item) could take an array?
$counter = 1;
$urls = array();
foreach($items as $item)
{
$urls[$counter] = get_from_web_service($item);
$counter++;
}
// and then you can echo the information?
foreach($urls as $url)
{
//echo each or use a function to better do it
//echo '<img url="url_to_my_website/get_image?id='.$url->id.'" />'
}
get_image.php :
$url = get_from_web_service($item);
header("Content-Type: image/png");
echo file_get_contents($url);
at the end, it would be mighty nice if you can just call
get_from_web_service($itemArray); //intake the array and return images
Option 3:
cache the requests to the web service
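One way this could look, as a sketch only: a small file-based cache in front of the web service call. cached_lookup and its parameters are invented for this example, and get_from_web_service is the function from the question, passed in as a callable.

```php
<?php
// Return the cached value for $id if it is younger than $ttl seconds;
// otherwise call $fetch($id), store the result, and return it.
function cached_lookup(string $id, callable $fetch, string $dir, int $ttl = 3600): string
{
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);
    }
    // Hash the id so that it can never point outside the cache directory.
    $file = $dir . '/' . sha1($id);
    if (is_file($file) && filemtime($file) + $ttl > time()) {
        return file_get_contents($file);
    }
    $value = (string) $fetch($id);
    file_put_contents($file, $value, LOCK_EX);
    return $value;
}
```

Usage would be something like $url = cached_lookup($item->id, 'get_from_web_service', __DIR__ . '/cache'); so that repeated page views within the hour hit the web service only once per image.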
Option one is the best option. I would also want to make sure that the images are cached on the server, so that multiple round trips are not required from the original web server for the same image.
If you're interested, this is the core of the code that I use for caching images etc. (note that a few things, like re-serving the same content back to the client, are missing):
<?php
function error404() {
header("HTTP/1.0 404 Not Found");
echo "Page not found.";
exit;
}
function hexString($md5, $hashLevels=3) {
$hexString = substr($md5, 0, $hashLevels );
$folder = "";
while (strlen($hexString) > 0) {
$folder = "$hexString/$folder";
$hexString = substr($hexString, 0, -1);
}
if (!file_exists('cache/' . $folder))
mkdir('cache/' . $folder, 0777, true);
return 'cache/' . $folder . $md5;
}
if (!isset($_GET['img']))
error404();
function getFile($url) {
// true to enable caching, false to delete cache if already cached
$cache = true;
$defaults = array(
CURLOPT_HEADER => FALSE,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_FOLLOWLOCATION => 1,
CURLOPT_MAXCONNECTS => 15,
CURLOPT_CONNECTTIMEOUT => 30,
CURLOPT_TIMEOUT => 360,
CURLOPT_USERAGENT => 'Image Download'
);
$ch = curl_init();
curl_setopt_array($ch, $defaults);
curl_setopt($ch, CURLOPT_URL, $url);
$key = hexString(sha1($url));
if ($cache && file_exists($key)) {
return file_get_contents($key);
} elseif (!$cache && file_exists($key)) {
unlink($key);
}
$data = curl_exec($ch);
$info = curl_getinfo($ch);
if ($cache === true && $info['http_code'] == 200 && strlen($data) > 20)
file_put_contents($key, $data);
elseif ($info['http_code'] != 200)
error404();
return $data;
}
$content = getFile($_GET['img']);
if ($content !== null && $content !== false) {
// Success!
header("Content-Type: image");
echo $content;
}
Neither of the two options will resolve the server resource usage issue. Of the two, though, I would recommend option 1. The second one will delay page loading, slowing the site down and hurting your SEO rankings.
Best option for you would be something like:
foreach($items as $item) {
echo '<img src="url_to_my_website/get_image.php?id='.$item->id.'" />';
}
Then where the magic happens is get_image.php:
$id = (int) $_GET['id'];
// Check, read and write the same local path so the cache is actually hit
$path = '/path_to_local_storage/image_'.$id.'.png';
if(file_exists($path)) {
$img = file_get_contents($path);
} else {
$url = get_from_web_service($id);
$img = file_get_contents($url);
file_put_contents($path, $img);
}
header("Content-Type: image/png");
echo $img;
This way you will only send the request to the web service once per image and then store the image locally. The next time the image is requested, you serve it from local storage and skip the web service request.
Of course, considering image IDs to be unique and persistent.
Probably not the best solution, but should work pretty well for you.
Since, as shown above, you put the URL of the web-service-provided image directly in the src attribute of the <img> tag, one can safely assume that these URLs are not secret or confidential.
Given that, the following snippet for get_image.php will work with the least possible overhead:
$url = get_from_web_service($id);
header("Location: $url");
If you're getting a lot of repeated requests for the same id from a given client, you can reduce the number of requests somewhat by exploiting the browser's cache:
header("Cache-Control: private, max-age=$seconds");
header("Expires: ".gmdate('D, d M Y H:i:s', time()+$seconds).' GMT');
Otherwise, resort to server-side caching by means of Memcached, a database, or plain files, like so:
is_dir('cache') or mkdir('cache');
$cachedDataFile = "cache/$id";
$cacheExpiryDelay = 3600; // an hour
if (is_file($cachedDataFile) && filesize($cachedDataFile)
&& filemtime($cachedDataFile) + $cacheExpiryDelay > time()) {
$url = file_get_contents($cachedDataFile);
} else {
$url = get_from_web_service($id);
file_put_contents($cachedDataFile, $url, LOCK_EX);
}
header("Cache-Control: private, max-age=$cacheExpiryDelay");
header("Expires: ".gmdate('D, d M Y H:i:s', time() + $cacheExpiryDelay).' GMT');
header("Location: $url");
I want to create a web directory site, and I need to get these site screenshots. How to get a site screenshot quickly using PHP?
I tried IECAPT, webscreencapture and khtml2png, but they are all slow, and they all capture one URL at a time.
Does IECAPT depend on the IE browser? If so, why can't it open several IE tabs and work on them at the same time?
Can anyone recommend screenshot software that I can use from PHP online and that meets the requirements above? Thank you.
Your requirements are unrealistic. Your best bet is to integrate with WebKit through something like CutyCapt that doesn't run an actual browser, but just the WebKit rendering engine. You shouldn't have any concurrency issues, but it isn't going to be fantastic.
These external services are developing fast. Take a look at:
http://immediatenet.com/thumbnail_api.html
it renders thumbnails extremely fast and caches them like the other similar services.
Probably the easiest way is to use an external service. There used to be Alexa Site Thumbnail but it has been discontinued, so you must look for alternatives. For example http://www.pageglimpse.com/ seems to be one.
I have tried CutyCapt: I made 3 copies of CutyCapt.exe and renamed them. But it still captures the screenshots one by one instead of running the 3 processes at the same time.
<?php
set_time_limit(0);
$url1 = 'http://www.google.co.uk';
$out1 = '1.jpg';
$path1 = 'CutyCapt1.exe';
$cmd1 = "$path1 -u=$url1 -o=$out1";
//exec($cmd);
system($cmd1);
$url2 = 'http://www.google.com';
$out2 = '2.jpg';
$path2 = 'CutyCapt2.exe';
$height2 = '1200 ';
$cmd2 = "$path2 -u=$url2 -o=$out2";
//exec($cmd);
system($cmd2);
$url3 = 'http://www.google.co.jp';
$out3 = '3.jpg';
$path3 = 'CutyCapt3.exe';
$height3 = '1200 ';
$cmd3 = "$path3 -u=$url3 -o=$out3";
//exec($cmd);
system($cmd3);
?>
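The captures in a script like this run one after another because system() (like exec()) blocks until the command has finished. A possible way around that, sketched here with PHP's proc_open, is to start all commands first and only then wait for them. run_concurrently is a made-up name, and the commands passed in are whatever you would otherwise give to system():

```php
<?php
// Start every command at once, then collect each one's stdout and exit code.
function run_concurrently(array $commands): array
{
    $running = [];
    foreach ($commands as $key => $cmd) {
        $pipes = [];
        // Only stdout is piped; stdin and stderr are inherited from PHP.
        $proc = proc_open($cmd, [1 => ['pipe', 'w']], $pipes);
        if (is_resource($proc)) {
            $running[$key] = [$proc, $pipes[1]];
        }
    }

    $results = [];
    foreach ($running as $key => [$proc, $stdout]) {
        // Reading waits for this process, but the others keep running.
        $output = stream_get_contents($stdout);
        fclose($stdout);
        $results[$key] = ['output' => $output, 'exit' => proc_close($proc)];
    }
    return $results;
}
```

For the case above this would be called as run_concurrently([$cmd1, $cmd2, $cmd3]). There is then no need for renamed copies of the executable, since each proc_open call starts its own process. Whether three captures in parallel are actually faster depends on the machine and the network, so it is worth measuring.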
I do not think thumbnail service sites like pageglimpse.com install many browsers on their web servers. What technology do they use?
I have a little problem here, and no tutorials have been of help, since I couldn't find one that was directed at this specific problem.
I have 2 hosting accounts, one on a server that supports PHP. And the other on a different server that does not support PHP.
SERVER A = PHP Support, and
SERVER B = NO PHP Support.
On server A I have a PHP script that generates a random image. On server B, I have an HTML file that includes a JavaScript that calls that PHP script on server A. But no matter how I do it, it never works.
I have the following code to retrieve the result from the php script:
<script language="javascript" src="http://www.mysite.com/folder/file.php"></script>
I know I'm probably missing something, but I've been looking for weeks! But haven't found any information that could explain how this is done. Please help!
Thank you :)
UPDATE
The PHP script is:
$theimgs= array ("images/logo.png", "images/logo.png", "images/logo.png", "images/logo.png", "images/logo.png");
function doitnow ( $imgs) {
$total = count($imgs);
$call = rand(0,$total-1);
return $imgs[$call];
}
echo '<img src="'.doitnow($theimgs).'" alt="something" />';
<img src="http://mysite.com/folder/file.php" alt="" /> ?
It's not clear, why you include a PHP file as JavaScript. But try following:
Modify your PHP script so that it returns an image file directly. I'll call that script image.php. For further information, look up the PHP function call header('Content-type: image/jpeg').
In your JavaScript file use image.php as you would any normal image.
Include the JavaScript on server B as a *.js file.
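A sketch of what such an image.php might look like, assuming the images are PNG files stored next to the script. The file names and both function names are placeholders:

```php
<?php
// Pick one path at random; array_rand can return any index, including the last.
function pick_random_image(array $images): string
{
    return $images[array_rand($images)];
}

// Send the chosen file as image data instead of an <img> tag.
function serve_image(string $path): void
{
    if (!is_file($path)) {
        http_response_code(404);
        return;
    }
    header('Content-Type: image/png');   // assumption: all files are PNGs
    header('Cache-Control: no-store');   // so each request can show a new pick
    header('Content-Length: ' . filesize($path));
    readfile($path);
}

// Hypothetical file names, relative to this script.
$images = ['images/logo1.png', 'images/logo2.png', 'images/logo3.png'];
if (PHP_SAPI !== 'cli') {
    serve_image(__DIR__ . '/' . pick_random_image($images));
}
```

Because the script now outputs the image bytes themselves, the page on server B can point an ordinary <img> tag at it, using the full http:// URL of the script on server A.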
UPDATE:
It's still not clear, why you need JavaScript.
Try as image.php:
$theimgs= array ("images/logo.png", "images/logo.png", "images/logo.png", "images/logo.png", "images/logo.png");
function doitnow ( $imgs) {
$total = count($imgs);
$call = rand(0,$total-1);
return $imgs[$call];
}
$host = $_SERVER['HTTP_HOST'];
$uri = rtrim(dirname($_SERVER['PHP_SELF']), '/\\');
$extra = 'mypage.php';
header("Location: http://$host$uri/" . doitnow($theimgs));
And on server b:
<img src="http://www.example.org/image.php"/>
You didn't specify, but I assume the two servers have different domain/hostnames. You may be running into a browser security model problem (same origin policy).
If that's the case, you need to use JSONP.
You may be using outdated sources to learn, since the language attribute is deprecated and you should use type="text/javascript" instead. It's also not clear what kind of output the .php script produces. If it's image data, why are you trying to load it as a script and not an image (i.e., with the <img> tag)?
Update: The script is returning HTML, which means it should be loaded using Ajax, but you can't do that if it's on a different domain due to the same origin policy. The reason nothing is working now is that scripts loaded using the <script> tag aren't interpreted as HTML. To pass data between servers, you should try JSONP instead.
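For the server side of JSONP, the PHP script has to wrap its data in a call to a function whose name the page supplies. A minimal sketch, where jsonp_response and the callback parameter name are my own choices and not something prescribed:

```php
<?php
// Wrap $data in a call to the client-supplied callback name.
function jsonp_response(string $callback, array $data): string
{
    // Allow only plain identifiers so the parameter cannot inject script.
    if (!preg_match('/^[A-Za-z_$][A-Za-z0-9_$]*$/', $callback)) {
        throw new InvalidArgumentException('Invalid callback name');
    }
    return $callback . '(' . json_encode($data) . ');';
}
```

On server A the script would send header('Content-Type: application/javascript') and echo jsonp_response($_GET['callback'], ['src' => $chosenImage]), where $chosenImage stands for whatever the random picker returned. On server B the page defines a function, say showImage, and loads file.php?callback=showImage through a <script> tag; the browser then runs showImage with the data as its argument.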
It seems that server A generates an HTML link to a random image (not an image). The URL is relative to wherever you insert it:
<img src="images/logo.png" alt="something" />
That means that you have an images subdirectory everywhere you are using the picture. If not, please adjust the URL accordingly. Forget about JavaScript, PHP or AJAX: this is just good old HTML.
Update
The PHP Script displays pics randomly. Pics are hosted on server A, and they are indeed accessible and readable from the internet. The PHP Script has been tested by itself, and works.
If these statements are true, Māris Kiseļovs' answer should work. So either your description of the problem is inaccurate or you didn't understand the answer...