I have a video broadcasting project in which I need to provide a download option. I am using the Justin.tv API: it sends a URL for downloading the video file, but when I hit that URL I get a 403 Forbidden error. I discussed this problem with their contact person, and he replied:
Browsers will get the 403 error, you need to either proxy the file
through your server (by removing the User-Agent header) or tell users
to use a download manager.
Definitely the latter is not a good idea. Now I am stuck at sending a request without the User-Agent header. How can I do this using PHP? I have googled it but did not find anything helpful.
Necromancing this old thread: I don't know if the info in the comment by @ayman-safadi was accurate at the time it was posted (that was a quote from some other location). But now, to remove the User-Agent header, you do this:
-H "User-Agent:"
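In PHP's cURL extension, the command-line `-H "User-Agent:"` corresponds to an entry with an empty value in `CURLOPT_HTTPHEADER`; libcurl treats a header with nothing after the colon as "do not send this header". A minimal sketch, assuming you already have a cURL handle for the Justin.tv download URL (both function names are hypothetical):

```php
<?php
// Hypothetical helper: drop any existing User-Agent line from a header list and
// append the empty "User-Agent:" entry, which tells libcurl not to send that header.
function headersWithoutUserAgent(array $headers): array
{
    $kept = [];
    foreach ($headers as $line) {
        // Header names are case-insensitive, so match without regard to case.
        if (stripos($line, 'user-agent:') !== 0) {
            $kept[] = $line;
        }
    }
    $kept[] = 'User-Agent:';
    return $kept;
}

// Hypothetical wrapper: applies the header list to an existing cURL handle.
// Requires the curl extension.
function disableUserAgent($ch, array $headers = []): bool
{
    return curl_setopt($ch, CURLOPT_HTTPHEADER, headersWithoutUserAgent($headers));
}
```

Usage would be along the lines of `$ch = curl_init($downloadUrl); disableUserAgent($ch); curl_exec($ch);`, where `$downloadUrl` stands for the URL the API returned. The empty entry also overrides a user agent set earlier with `CURLOPT_USERAGENT`.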
Maybe you can have the "download" link point to an internal page that will make a cURL call to the actual Justin.tv link.
According to one of the comments:
FYI... unless you specifically set the user agent, no user agent will be sent in your request as there is no default value like some of the other options.
There are a lot more comments that might help.
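The internal download page could look roughly like the sketch below. It is not a tested Justin.tv integration: `$remoteUrl` stands for whatever URL the API returned, the file name is made up, and it relies on the point quoted above that PHP's cURL sends no User-Agent unless you set one. Streaming through `CURLOPT_WRITEFUNCTION` avoids holding the whole video in memory, and the download headers are only sent once data actually arrives, so a 403 from the remote side is not passed to the user as an empty file.

```php
<?php
// Builds a Content-Disposition header; basename() strips path parts and
// quotes/backslashes/newlines are removed so the header stays well-formed.
function attachmentHeader(string $filename): string
{
    $safe = str_replace(['"', '\\', "\r", "\n"], '', basename($filename));
    return 'Content-Disposition: attachment; filename="' . $safe . '"';
}

// Streams $remoteUrl to the browser as a download. No CURLOPT_USERAGENT is set,
// so cURL sends no User-Agent header. Requires the curl extension.
function proxyDownload(string $remoteUrl, string $filename): bool
{
    $ch = curl_init($remoteUrl);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true); // treat 4xx/5xx (e.g. 403) as failure
    $headersSent = false;
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($handle, string $chunk) use (&$headersSent, $filename) {
        if (!$headersSent) {
            header('Content-Type: application/octet-stream');
            header(attachmentHeader($filename));
            $headersSent = true;
        }
        echo $chunk;
        return strlen($chunk); // tell cURL the whole chunk was handled
    });
    $ok = curl_exec($ch);
    curl_close($ch);
    return $ok !== false;
}
```

A hypothetical `download.php` would look up `$remoteUrl` server-side and call `proxyDownload($remoteUrl, 'broadcast.flv')`. Taking the remote URL straight from the query string would turn the page into an open proxy, so it is better not to.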
Related
Using PHP I'm trying to download/save the following image:
http://www.bobshop.nl/catalog/product_image.php?size=detail&id=42428
When you load this image in a browser, you can see it, but when I try to download it using several different methods, I get a 1 KB file that says the product could not be found on the server.
I tried both the file_put_contents approach and cURL.
I even used the function get_web_page that I found somewhere on StackOverflow, to catch a possible redirect.
What else could explain being able to see the image in a browser but not being able to download it?
UPDATE:
Thanks to an error that was thrown while trying out the different answers, I just found the real cause of the problem. Somewhere in the process of scraping the HTML, the URL got &amp; instead of &. I replace these now, and every other method works too... thanks all!
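The entity problem is easy to reproduce. Inside an HTML attribute, `&` is normally written as `&amp;`, so if the scraped URL is requested as-is the server receives a parameter named `amp;id` instead of `id`, which would explain the "product not found" response. A minimal sketch of decoding the entities before making the request:

```php
<?php
// URL as it appears inside the scraped HTML attribute.
$scraped = 'http://www.bobshop.nl/catalog/product_image.php?size=detail&amp;id=42428';

// What the server sees if the URL is requested unchanged:
// the second parameter is named "amp;id", so "id" is missing.
parse_str((string) parse_url($scraped, PHP_URL_QUERY), $brokenParams);

// Decode HTML entities first; "&amp;" becomes a plain "&".
$url = html_entity_decode($scraped, ENT_QUOTES | ENT_HTML5, 'UTF-8');
parse_str((string) parse_url($url, PHP_URL_QUERY), $params);

echo isset($brokenParams['id']) ? "id present\n" : "id missing\n"; // id missing
echo $params['id'], "\n";                                          // 42428
```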
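I just implemented a simple way to download and store the image, and it worked:
<?php
// Download the image (note the plain "&" in the query string) and save it unchanged.
$fileContent = file_get_contents("http://www.bobshop.nl/catalog/product_image.php?size=detail&id=42428");
if ($fileContent === false) {
    die("Download failed");
}
// file_put_contents() is binary-safe, which matters for image data.
file_put_contents("/tmp/image", $fileContent);
?>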
Are you behind a proxy? That could be the problem (your browser is configured to use the proxy, but PHP is not) ;)
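If that is the cause, PHP's HTTP stream wrapper can be told to use the same proxy through a stream context. A sketch; the proxy address is a placeholder for whatever your browser is configured with:

```php
<?php
// Placeholder proxy address: use the one configured in your browser.
$proxy = 'tcp://proxy.example.com:8080';

$context = stream_context_create([
    'http' => [
        'proxy' => $proxy,
        // Many HTTP proxies expect the full URL in the request line.
        'request_fulluri' => true,
    ],
]);

// file_get_contents($imageUrl, false, $context) would now go through the proxy.
// With cURL, the corresponding option is CURLOPT_PROXY.
```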
There is likely some kind of header checking being done by this PHP script to ensure that a browser is requesting the image and not someone trying to scrape their content. This can be forged with cURL (although after doing something like this I feel like I need to take a shower), specifically with curl_setopt():
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'User-agent: Some legitimate string'
));
To find out which headers need to be sent, you'll need to do some experimentation. If you have Google Chrome, you've probably used the Inspector (if you don't, Firefox has similar add-ons such as Firebug). If you request the image with Chrome, you can right-click to inspect it. Go to the Network tab and refresh the page. The request to product_image.php should show up. If you click on it and open the Headers tab, you should see the list of headers sent. My browser sends: User-Agent, Accept, Accept-Encoding, Accept-Language, and Accept-Charset.
Try combinations of these headers with valid values to see which ones need to be sent for the image to be returned. I'd bet this site only checks User-Agent, so start with that one.
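To make that experimentation less tedious, you could keep the candidate headers in an associative array and switch them on one at a time. A sketch; the values are placeholders to replace with what your own browser's Network tab shows. Accept-Encoding is deliberately left out, because when it is set by hand cURL does not decompress the response (`CURLOPT_ENCODING` is the option for that).

```php
<?php
// Candidate headers copied from the browser (placeholder values).
$candidates = [
    'User-Agent'      => 'Mozilla/5.0 (placeholder)',
    'Accept'          => 'image/*,*/*;q=0.8',
    'Accept-Language' => 'en-US,en;q=0.5',
    'Accept-Charset'  => 'utf-8',
];

// Turns ['Name' => 'value'] into the "Name: value" lines that CURLOPT_HTTPHEADER
// expects, keeping only the header names listed in $enabled.
function headerLines(array $candidates, array $enabled): array
{
    $lines = [];
    foreach ($candidates as $name => $value) {
        if (in_array($name, $enabled, true)) {
            $lines[] = $name . ': ' . $value;
        }
    }
    return $lines;
}

// Start with only User-Agent and add more names if the image still fails.
$headers = headerLines($candidates, ['User-Agent']);
// curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
```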
An important note: You should cache the result of this call, because it will be very suspicious if your server requests the image multiple times in rapid succession (say if many users on your site request the script that grabs this image). Also as an extra layer of anonymity, you might want to pick your User-agent from an array of valid ones so bobshop.nl thinks that all of the requests are coming from users behind a large network (like a college campus). You can find valid user agent strings on UserAgentString.com.
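Both suggestions (caching and picking the user agent from a list) can be combined in a small helper. This is a sketch under some assumptions: the cache is a plain directory of files keyed by the URL's MD5, and the actual HTTP request is passed in as a callable so the caching logic does not depend on whether you use cURL or file_get_contents. `fetchCached` is a hypothetical name.

```php
<?php
// Returns the cached copy of $url if it is younger than $ttl seconds; otherwise
// calls $fetch($url, $userAgent) once and stores the result.
// $userAgents must be a non-empty list of user-agent strings.
function fetchCached(string $url, string $cacheDir, int $ttl, callable $fetch, array $userAgents): string
{
    $path = $cacheDir . '/' . md5($url);
    if (is_file($path) && time() - filemtime($path) < $ttl) {
        return file_get_contents($path);
    }
    // Pick one of the user-agent strings at random for this request.
    $userAgent = $userAgents[array_rand($userAgents)];
    $body = $fetch($url, $userAgent);
    file_put_contents($path, $body, LOCK_EX);
    return $body;
}
```

With a one-hour `$ttl`, many visitors hitting your script in quick succession result in a single request to the remote site.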
How can I make it so that when a visitor to mysite.com clicks a link, like http://google.com, the referrer page is not sent to the target website?
Is this possible with PHP?
Basically, I want the linked site not to know where the visitor came from.
I don't think it is possible, as the HTTP referrer information is sent by the browser. You can install browser plugins to prevent sending referrers, but not directly with PHP.
Update: I just found this
If a website is accessed from a HTTP Secure (HTTPS) connection and a link points to anywhere except another secure location, then the referrer field is not sent.
The upcoming standard HTML5 will support the attribute/value rel = "noreferrer" in order to instruct the user agent not to send a referrer.
Source: http://en.wikipedia.org/wiki/HTTP_referrer#Referrer_hiding
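PHP cannot remove the header itself, but it can generate the markup described in the quote. A minimal sketch with a hypothetical helper; whether the referrer is actually suppressed is still up to the visitor's browser:

```php
<?php
// Hypothetical helper: builds an outbound link carrying rel="noreferrer",
// escaping the URL and link text for HTML output.
function noReferrerLink(string $url, string $text): string
{
    return sprintf(
        '<a href="%s" rel="noreferrer">%s</a>',
        htmlspecialchars($url, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($text, ENT_QUOTES, 'UTF-8')
    );
}

echo noReferrerLink('http://google.com', 'Google'), "\n";
// <a href="http://google.com" rel="noreferrer">Google</a>
```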
The referer is set by the browser, not the server, so broadly speaking, you can't really control this.
You may be able to find ways to mask mysite.com by redirecting the user through an intermediary site to google.com. I wouldn't recommend this, though.
No. Not possible. The client (browser) is responsible for that HTTP header. A browser might even choose never to send it. (I'm not sure about the exact protocols/specifications for when to send it.)
Edit:
There might be a trick (but I don't know it). Maybe some JavaScript, a header-cancelling image, or something nasty.
I have an application which records user visits. None of these visits are direct; 100% of them are referred from another site.
I am passing $_SERVER['HTTP_REFERER'] through to the database. Approximately 35% of the logged entries have a referer; the rest are blank.
Is there a reason for this?
There are a number of reasons why HTTP_REFERER might be blank.
You have to understand that it's a value supplied by the browser in the Referer request header, which means users can remove it or even change it if they intend to.
Users accessing the link from a bookmark, history or by typing the link manually do not have a referer.
IE has also been known to drop the referer in situations involving JavaScript, such as window.open, window.location, and even setting target="_blank" on anchors, or a meta refresh.
Clicking a link embedded in a chat application or a PDF/Word/Excel document will also not set a referer.
Requests made with file_get_contents, fopen and similar functions (in PHP or other languages) will not send a referer unless one is added explicitly.
cURL, fsockopen, and applications with browser-like components might not set a referer either.
There are probably more situations when this could happen, I'll update if I can think of anything that seems reasonable.
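Since a missing referer is normal, it may help to log it as NULL rather than an empty string, so blank entries can be counted separately. Note that when the browser sends no Referer, PHP does not set `$_SERVER['HTTP_REFERER']` at all, so reading it directly raises an undefined-index notice (a warning in PHP 8). A sketch; the `visits` table and its `referer` column are hypothetical:

```php
<?php
// Returns the referer to store, or null when the browser sent none (or an empty one).
function refererForLog(array $server): ?string
{
    $referer = trim($server['HTTP_REFERER'] ?? '');
    return $referer === '' ? null : $referer;
}

// Hypothetical table: visits(referer VARCHAR NULL). A prepared statement keeps
// the client-supplied value out of the SQL text.
function logVisit(PDO $db, array $server): void
{
    $stmt = $db->prepare('INSERT INTO visits (referer) VALUES (?)');
    $stmt->execute([refererForLog($server)]);
}
```

Calling `logVisit($pdo, $_SERVER)` on each request would then let you count `WHERE referer IS NULL` separately from real referers.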
If a user visits your site directly, there is no referrer. It's also possible they have set it up so their browser never sends the referrer.
According to this answer, browsers do not necessarily send a referrer when doing a meta refresh.
Browsers sometimes will include the referer in the request. But it is not mandatory to do so (the referer is 100% voluntary). Indeed there are various privacy and security issues surrounding the referer (for example, if an HTTPS site refers you to an HTTP site, the browser should not include the referring site as the referer). So don't rely on it.
When linking from one document to another in Internet Explorer 4.0 and later, the Referer header will not be sent when the link is from an HTTPS page to a non-HTTPS page. The Referer header also will not be sent when the link is from a non-HTTP(S) protocol, such as file://, to another page. For more info, go to this link.
Direct access to your page (typing URL in address bar or from bookmarks, history, etc)
Browser settings (disabled referrer or empty)
Someone requests the page content with the file_get_contents() function...
Common causes, when you are stuck figuring out why it is missing:
- Sometimes the referring page is HTTPS and yours is HTTP; then the referer is lost.
Otherwise:
- The user entered the URL directly.
- The user bookmarked the page and came from the bookmark.
- The user has the URL set as the browser's home page (similar to a bookmark).
- Browsing through a proxy may strip the referer.
- Bots (e.g. search engines) accessing the website.
It also depends on the protocol. I encountered an issue where my consumer application A was served over HTTP, while the application sending the request was served over HTTPS, so the referer was dropped.
I'm currently trying to grab a file from an external URL that pops up an authorization box (like the default one asking for a username and password).
How can I have a script get the contents of the page (it's a video), save it to a directory, and handle the authorization (I have a username and password)?
Thanks :)
file_put_contents('where to put it', file_get_contents('http://username:password@domain.com/video'));
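Credentials in the URL work for HTTP Basic authentication, but a password containing characters such as `@` or `:` has to be percent-encoded first. An alternative is to send the `Authorization` header through a stream context and copy the stream straight to disk. A sketch; the function names, URL, credentials and target path are placeholders:

```php
<?php
// Builds an HTTP Basic "Authorization" header line from a username and password.
function basicAuthHeader(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

// Downloads $url with Basic auth and writes it to $target.
// Returns the number of bytes written, or false on failure.
function downloadWithBasicAuth(string $url, string $user, string $password, string $target)
{
    $context = stream_context_create([
        'http' => ['header' => basicAuthHeader($user, $password)],
    ]);
    $in = fopen($url, 'rb', false, $context);
    if ($in === false) {
        return false;
    }
    // Passing a stream to file_put_contents() copies it without loading
    // the whole video into memory.
    $bytes = file_put_contents($target, $in);
    fclose($in);
    return $bytes;
}
```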
In a word, look at cURL (http://php.net/curl) for all your posting/logging-in/cookie/session needs in HTTP country.
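With cURL, the authorization box corresponds to `CURLOPT_HTTPAUTH` plus `CURLOPT_USERPWD`, and `CURLOPT_FILE` writes the response body directly to an open file. A sketch, assuming Basic authentication; `savePathFor` and `curlDownloadWithAuth` are hypothetical names and the naming rule is only an example:

```php
<?php
// Chooses a local file name from the last path segment of the URL (example rule).
function savePathFor(string $dir, string $url): string
{
    $name = basename((string) parse_url($url, PHP_URL_PATH));
    return rtrim($dir, '/') . '/' . ($name !== '' ? $name : 'download.bin');
}

// Downloads $url using HTTP Basic auth and streams the body to disk.
// Requires the curl extension.
function curlDownloadWithAuth(string $url, string $user, string $password, string $dir): bool
{
    $path = savePathFor($dir, $url);
    $out = fopen($path, 'wb');
    if ($out === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
    curl_setopt($ch, CURLOPT_USERPWD, $user . ':' . $password);
    curl_setopt($ch, CURLOPT_FILE, $out);        // write the response body to $out
    curl_setopt($ch, CURLOPT_FAILONERROR, true); // e.g. 401 if the credentials are wrong
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $ok = curl_exec($ch) !== false;
    curl_close($ch);
    fclose($out);
    if (!$ok) {
        unlink($path); // do not leave an empty or partial file behind
    }
    return $ok;
}
```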
You don't need to download the page, just check what is being submitted to the web server. Chances are it's just a POST. It may have some additional checks (i.e. checksum) which may need to be scraped from the page.
You can use the HTTP Headers plugin for Firefox to see how the browser is communicating with the server. You then just need to emulate that transaction. It is likely a POST, which is easy to do with CURL.
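If it does turn out to be a login form rather than an HTTP authorization box, emulating the POST with cURL could look like this sketch. The field names and values are placeholders: copy the real ones (including any hidden fields such as a checksum) from what the browser submits. The cookie file keeps the login session for later requests.

```php
<?php
// Placeholder form fields; use the names the real form submits.
$fields = [
    'username' => 'alice',
    'password' => 's3cret',
];

// URL-encoded body, as a browser sends for a normal <form method="post">.
$body = http_build_query($fields);

// Sends the POST and returns the response body, or null on failure.
// Requires the curl extension.
function postForm(string $url, string $body, string $cookieFile): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // save session cookies
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // and send them back later
    $response = curl_exec($ch);
    curl_close($ch);
    return $response === false ? null : $response;
}
```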
I don't think file_put_contents will work since it doesn't do an http POST.
How can I detect the site the user came from before accessing mine in PHP?
You could check the Referer HTTP header:
echo htmlspecialchars($_SERVER['HTTP_REFERER'] ?? '', ENT_QUOTES, 'UTF-8');
But note that the Referer is sent by the browser, which means :
It can be disabled (it's not mandatory; it is just additional information that the browser can send)
It can be faked (i.e. anyone can send anything -- even some SQL injection, or XSS injection, for instance)
So, you can use the referer to provide an additional feature on your website, but you have to make sure that your website doesn't rely on it : your application must still work, even if the Referer is not present.
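A sketch of reading the referer defensively, so the page keeps working when it is absent and a forged value is not echoed back raw. `referringHost` is a hypothetical helper:

```php
<?php
// Returns the lower-cased host of the referring page, or null if the header is
// missing or not a well-formed URL. The value is client-supplied and untrusted.
function referringHost(array $server): ?string
{
    $referer = $server['HTTP_REFERER'] ?? '';
    if (filter_var($referer, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    $host = parse_url($referer, PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : null;
}

$host = referringHost($_SERVER);
// Escape before printing, since the header can contain anything.
echo $host === null ? 'No referer' : htmlspecialchars($host, ENT_QUOTES, 'UTF-8'), "\n";
```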
Try this:
$_SERVER['HTTP_REFERER']
For more information, please see HTTP referrer:
The referrer, or HTTP referrer (also known by the common misspelling referer that occurs as an HTTP header field), identifies, from the point of view of an internet webpage or resource, the address of the webpage (commonly the URL, the more generic URI or the i18n updated IRI) of the resource that links to it. By checking the referrer, the new page can see where the request came from.
echo htmlspecialchars($_SERVER['HTTP_REFERER'] ?? '', ENT_QUOTES, 'UTF-8');
It's not entirely reliable and can be spoofed, but in general it will be populated with the URL that the user clicked to get to the script.
You need to look at the HTTP Referer Header:
$_SERVER['HTTP_REFERER']
See the PHP documentation for more HTTP headers.
As @Andrew Hare states in his answer, the HTTP_REFERER value in $_SERVER (which comes from a header sent as part of the HTTP request) will tell you the site that the browser was last on.
What should be noted, however, is that it is completely possible that this header/server variable will have no value, for a number of legitimate reasons, some being:
The user typed in the URL to the site in the same window
The user opened a bookmark in the same window
The user just opened the browser and did one of the things above
All of the above are really variations on the same thing: the same browser window is used to go to another site, but the visit wasn't prompted by clicking on a link in a document that led them there, by a redirect, or by some other action of the page before yours in the history.
The above notes are correct, but keep in mind that the user can make his/her browser not send this information, or they can mess with this information and send false data.