I'm fooling around while developing a few things like a "change avatar" feature, where the user has the option of supplying the image by URL:
$raw = file_get_contents($src);      // fetch the remote image
$img = imagecreatefromstring($raw);  // hand the bytes to GD
// ... other GD stuff
It works fine, except when the image comes from certain websites, like pixiv:
http://i2.pixiv.net/img02/img/suzupin/2800349.jpg
This one, for example, throws some errors.
BTW, the same thing happens when I try to PIN this on Pinterest =P WHY?
Is there a way to prevent others from doing things like file_get_contents on my site?
Am I right in thinking it has something to do with East Asian websites? Pretty often I can't PIN images from Japanese sources. >.<
What is happening is that these websites implement 'hotlink prevention'. There are many different methods, some of which @duskwuff suggested: cookies, checking the referer, sessions, etc.
What you want to do is circumvent hotlink prevention, and that is answered in this SO question: Download file from URL using CURL
You can use a login form or a captcha and only then allow the image to be viewed, test the browser version, etc., but if the user is using curl with cookies (I don't know how to send cookies with file_get_contents) then it's tough to keep them out.
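For what it's worth, here is a minimal cURL sketch of the circumvention the linked answer describes: sending a Referer, a browser-like User-Agent, and a cookie jar along with the request. The header values below are assumptions; which of them a given site actually checks varies.

$src = 'http://i2.pixiv.net/img02/img/suzupin/2800349.jpg';

$ch = curl_init($src);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);              // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);              // follow any redirects
curl_setopt($ch, CURLOPT_REFERER, 'http://www.pixiv.net/');  // pretend the request came from the site itself
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0'); // browser-like UA
curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookies.txt');     // store any cookies the site sets...
curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookies.txt');    // ...and send them back on later requests

$raw = curl_exec($ch);
curl_close($ch);

if ($raw !== false) {
    $img = imagecreatefromstring($raw);  // hand the bytes to GD as before
}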
Related
I use file_get_contents to find out whether there is a URL in the results of the search I'm looking at:
http://www.google.com/search?q=*a*+site:www.reddit.com/r/+-inurl:(/shirt/|/related/|/domain/|/new/|/top/|/controversial/|/widget/|/buttons/|/about/|/duplicates/|dest=|/i18n)&num=1&sort=date-sdate
If I go to this URL in my browser, a different page is displayed than what I see when I echo file_get_contents:
$url = "http://www.google.com/search?q=*a*+site:www.reddit.com/r/+-inurl:(/shirt/|/related/|/domain/|/new/|/top/|/controversial/|/widget/|/buttons/|/about/|/duplicates/|dest=|/i18n)&num=1&sort=date-sdate";
$google_search = file_get_contents($url);
What's wrong with my code?
Nothing, really. The problem is that the page uses JavaScript and AJAX to load its contents. So, in order to get a "snapshot" of the page, you need to "run" it; that is, you need to execute the JavaScript code, which PHP doesn't do.
Your best bet is to use a headless browser such as PhantomJS. If you search around, you'll find tutorials explaining how to do it.
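As a rough sketch of that headless-browser route (assuming PhantomJS is installed, and assuming a hypothetical render_page.js script that loads the URL, lets its JavaScript run, and prints the resulting HTML to stdout), you could call it from PHP like this:

$url  = 'http://www.google.com/search?q=...';  // the full search URL from above
$html = shell_exec('phantomjs render_page.js ' . escapeshellarg($url));

if ($html === null) {
    die('PhantomJS produced no output');
}
// $html now contains the page as rendered after its JavaScript ran
echo $html;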
NOTE
If all you're looking for is a way to retrieve raw data from the search, you might want to try Google's search API.
Google is almost certainly checking the user agent to prevent automated searches.
So you should at least use cURL and set a proper user agent string (i.e. the same as a common browser) to "trick" Google.
Somehow I fear it will not be that easy to trick Google, but maybe I'm just paranoid, and at least you may learn something about cURL.
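For example, a hedged sketch of the cURL request with a browser-like user agent (the UA string is just an example; Google may still serve different markup or block the query):

$url = "http://www.google.com/search?q=...";  // the same search URL as above

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0 Safari/537.36');

$google_search = curl_exec($ch);
curl_close($ch);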
I'm working on a script that indexes and downloads an entire website from a user-submitted URL.
For example, when a user submits a domain like http://example.com, I collect all the links on the index page, download the pages they point to, and then repeat the process on those pages.
I do this part with cURL and regular expressions to download the pages and extract the links.
However,
some shady websites generate fake URLs; for example, if you go to http://example.com?page=12 it has links to http://example.com?page=12&id=10, http://example.com?page=13, and so on.
This creates a loop and the script can never finish downloading the site.
Is there any way to detect these kinds of pages?
P.S.: I think Google, Yahoo, and other search engines face this kind of problem too, but their databases are clean and their search results don't show this kind of data.
Some pages may use GET variables and be perfectly valid (as you've mentioned here, ?page=12 and ?page=13 may both be acceptable). So what I believe you're actually looking for is a way to identify a unique page.
It's not possible, however, to detect these straight from their URL. ?page=12 may point to exactly the same thing as ?page=12&id=1, or it may not. The only way to tell is to download the page, compare the download to pages you've already got, and thereby find out whether it really is one you haven't seen yet. If you have seen it before, don't crawl its links.
Minor side note: make sure you skip links to a different domain, otherwise you may accidentally start crawling the whole web :)
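Here is a minimal sketch of both ideas, assuming a simple queue-based crawler: hash each downloaded body and skip anything whose content you've already seen, and stay on the submitted domain. The variable names and the use of file_get_contents instead of cURL are just for illustration.

$queue      = array('http://example.com/');       // start page submitted by the user
$baseHost   = parse_url($queue[0], PHP_URL_HOST);
$seenHashes = array();

while ($url = array_shift($queue)) {
    // Skip links that leave the original domain.
    if (parse_url($url, PHP_URL_HOST) !== $baseHost) {
        continue;
    }

    $html = @file_get_contents($url);
    if ($html === false) {
        continue;
    }

    // Identical content already seen under another URL? Then don't crawl its links.
    $hash = md5($html);
    if (isset($seenHashes[$hash])) {
        continue;
    }
    $seenHashes[$hash] = $url;

    // ... extract the links from $html and push the new ones onto $queue ...
}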
I'm working on a small PHP-driven website that's so basic that I can't imagine a browser from any time in the 2000's, if not further back, would have any serious issues with it.
I added the FancyZoom Javascript image viewer, though, and it's the ONE part of my site that I can't bet my life on in terms of across-the-board compatibility, especially taking fragmented mobile browsers into account (for instance, I'm still using an iPhone 3GS, so I know luddites like me are out there).
I know browser/feature detection is discussed here often, but I've got a relatively specific request since I'm not an up-to-date web programmer. What specific features (or user agents, if the case may be) should I be detecting to determine whether to enable an image viewer like FancyZoom or simply leave the user with a direct image link?
I'd imagine that it should be possible to filter out a few cases where the image loader wouldn't work, without going so far as to use one of those uber-complex user agent parsers that require updates, etc. This is a really simple, specific detection problem.
Any ideas on how to boil this down to the simplest possible features to check for would be great. Thanks!
You could do something like this to weed out old browsers:
// Sniff the user agent and skip the fancy viewer for IE 6 and older
// (Opera used to send an MSIE token, so exclude it from the match).
$browser = $_SERVER['HTTP_USER_AGENT'];

if (preg_match('/MSIE [1-6]/i', $browser) && !preg_match('/Opera/i', $browser)) {
    $image = 'Non fancy zoom';
} else {
    $image = 'fancy zoom';
}
I have not messed with the plugin you're using, so I don't know which browsers it supports, but you can use the line below to find the visitor's browser and just write rules from there...
$browser = $_SERVER['HTTP_USER_AGENT'];
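For example, a sketch of such rules, deciding whether to include FancyZoom at all. The user-agent patterns and the script path are assumptions, not an exhaustive or guaranteed list.

$browser = $_SERVER['HTTP_USER_AGENT'];

$isOldIE  = preg_match('/MSIE [1-6]/i', $browser) && !preg_match('/Opera/i', $browser);
$isMobile = preg_match('/Mobile|Android|BlackBerry|Opera Mini/i', $browser);

// If the browser looks risky, leave the plain <a href="photo.jpg"> link alone
// and simply don't load the viewer; clicking opens the image directly.
$useFancyZoom = !($isOldIE || $isMobile);

if ($useFancyZoom) {
    echo '<script src="/js/FancyZoom.js"></script>';  // hypothetical path to the FancyZoom script
}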
I know this has been discussed a number of times, but the problem I'm having at the moment is finding a solution that is easy to work with and does not require much hacking around.
I want to be able to upload a file and report on its progress. I've been playing with SWFUpload, and it seems like a bit too much messing around for my liking. Integrating it with CodeIgniter just seems like it's going to cause headaches.
I want a visual progress indicator of some sort to show the user their upload hasn't stagnated. Even if it was just a spinner saying "Uploading. Do not close this window until upload is complete." that would be enough for me.
Security is the most important. Using something like SWFUpload is going to require passing variables to the upload form such as the user ID and other information I'd rather not give snooping noses the opportunity to sniff.
Any possible solutions? Help is much appreciated.
You should take a look at HTML5 FormData and XMLHttpRequest 2, which let you watch the upload progress directly in JavaScript.
You'll have to customize the tool to meet your requirements, especially since it's open source.
"Security is the most important. Using something like SWFUpload is going to require passing variables to the upload form such as the user ID and other information I'd rather not give snooping noses the opportunity to sniff."
Why do you need to pass the user ID? I think that's an HTTP matter, not an SWFUpload one, so you can make it secure.
Look here : http://demo.swfupload.org/Documentation/
I'm actually on the hunt for the same thing. A few of the options I have encountered so far are:
http://www.uploadify.com
http://valums.com/ajax-upload/ (which is now headed up by Ben Colon here: github.com/bencolon/file-uploader)
I haven't used any of those solutions because I'm not quite sure how to customize them for my application, but those are the links that keep popping up in my research so far.
I need to confirm something before I go accuse someone of ... well I'd rather not say.
The problem:
We allow users to upload images and embed them within text on our site. In the past we allowed users to hotlink to our images as well, but due to server load we unfortunately had to stop this.
Current "solution":
The method the programmer used to solve our "too many connections" issue was to rename the file that receives and processes image requests (image_request.php) to image_request2.php, and replace the contents of the original with
<?php
header("HTTP/1.1 500 Internal Server Error");
?>
Obviously this has caused all images with their src attribute pointing to the original image_request.php to be broken, and is also the wrong code to be sending in this case.
Proposed solution:
I feel a more elegant solution would be:
In .htaccess:
1. If the request is for image_request.php
2. Check the referrer
3. If the referrer is not our site, send the appropriate header
4. If the referrer is our site, proceed to image_request.php and process the image request
What I would like to know is:
Compared to simply returning a 500 for each request to image_request.php:
How much more load would be incurred if we were to use my proposed alternative solution outlined above?
Is there a better way to do this?
Our main concern is that the site stays up. I am not willing to agree that breaking all internally linked images is the best / only way to solve this. I refuse to tell our users that because of something WE changed they must now manually change the embed code in all their previously uploaded content.
OK, then you can use the mod_rewrite capability of Apache to prevent hotlinking:
http://www.cyberciti.biz/faq/apache-mod_rewrite-hot-linking-images-leeching-howto/
Using mod_rewrite will probably give you less load than running a PHP script. I think your solution would be lighter.
Make sure that you only block access in step 3 if the referer header is not empty. Some browsers and firewalls strip the referer header completely, and you wouldn't want to block those users.
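If you end up doing the check in PHP rather than in .htaccess, a sketch of step 3 at the top of image_request.php might look like this. The hostname is a placeholder, and an empty referrer is allowed through for the reason given above.

$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

$allowed = ($referer === '')                                       // referrer stripped by browser/firewall
    || (parse_url($referer, PHP_URL_HOST) === 'www.example.com');  // replace with your own host

if (!$allowed) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}

// ... carry on processing the image request as before ...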
I assume you store the image paths in a database along with image ids, right?
And then you query the database for the image path, giving it the image id.
I suggest you install Memcached on the server and cache those lookups; it's easy to do in PHP. After that you can watch the server load and decide whether you need to stop the hotlinking at all.
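A rough sketch of that caching, assuming the PHP Memcached extension and an existing PDO connection; the table and column names here are made up:

$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

// $db is your existing PDO connection.
function image_path($id, Memcached $memcached, PDO $db)
{
    $key  = 'img_path_' . (int) $id;
    $path = $memcached->get($key);

    if ($path === false) {  // cache miss: hit the database once
        $stmt = $db->prepare('SELECT path FROM images WHERE id = ?');
        $stmt->execute(array((int) $id));
        $path = $stmt->fetchColumn();
        $memcached->set($key, $path, 3600);  // keep it cached for an hour
    }

    return $path;
}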
Your increased load is equal to that of a string comparison in PHP (zilch).
The obfuscation solution doesn't even solve the problem to begin with, as it doesn't stop future hotlinking from happening. If you do check the referrer header, make absolutely certain that all major mainstream browsers will set the header as you expect. It's an optional header, and the behavior might vary from browser to browser for images embedded in an HTML document.
You likely have sessions enabled for all requests (whether they're authenticated or not) -- as a backup plan, you can also rename your session cookie name to something obscure (edit: obscurity here actually doesn't matter as long as the cookie is set for your host only (and it is)) and check that a cookie by that name is set in image_request.php (no cookie set would indicate that it's a first request to your site). Only use that as a fallback or redundancy check. It's worse than checking the referrer.
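A sketch of that fallback, assuming the session cookie has been renamed to a made-up name like 'xsid' (e.g. via session_name('xsid') in your bootstrap):

// In image_request.php, before doing any real work:
if (!isset($_COOKIE['xsid'])) {
    // No session cookie at all: almost certainly not a request from a page on our site.
    header('HTTP/1.1 404 Not Found');
    exit;
}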
If you were generating the IMG HTML on the fly from Markdown or something else, you could use a private-key hash strategy with a short-lived expiry time attached to the query string. Completely airtight, but it seems way over the top for what you're doing.
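For completeness, a sketch of that hash strategy; the secret, the parameter names, and the five-minute window are all assumptions:

$secret = 'replace-with-a-private-key-kept-on-the-server';

// When generating the IMG tag:
$id      = 42;
$expires = time() + 300;  // link is valid for five minutes
$sig     = hash_hmac('sha256', $id . '|' . $expires, $secret);
$src     = "/image_request.php?id=$id&expires=$expires&sig=$sig";

// At the top of image_request.php:
$ok = isset($_GET['id'], $_GET['expires'], $_GET['sig'])
    && $_GET['expires'] > time()
    && hash_equals(hash_hmac('sha256', $_GET['id'] . '|' . $_GET['expires'], $secret), $_GET['sig']);

if (!$ok) {
    header('HTTP/1.1 404 Not Found');  // just 404 it, as the next point suggests
    exit;
}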
Also, there is no "appropriate header" for lying to a client about the availability of a resource ;) Just send a 404.