I am trying to write a program that automatically downloads files (served through PHP links). However, I have two issues right now.
First, my target website requires registration on first access. After that, every time I click the download link, it automatically downloads the file I want. It looks like it checks cookies saved on my computer to determine who I am. How can I make my Python program use my local cookies? What if there are multiple cookies?
Second, can anyone provide example code showing how to handle a PHP download link? I want to save all these files to a specific location with a specific name. How should I do that in Python 3?
For getting the cookies:
Try:
import urllib.request
cookier = urllib.request.HTTPCookieProcessor()
# create the cookie handler
opener = urllib.request.build_opener(cookier)
urllib.request.install_opener(opener)
The HTTPCookieProcessor keeps an http.cookiejar.CookieJar object (cookielib.CookieJar in Python 2) in its cookiejar attribute, which contains those cookies. You can loop through it to find the cookie you want.
for c in cookier.cookiejar:
    if c.domain == '.stackoverflow.com':
        pass  # do something with c.name and c.value
To read the content at the link:
Try:
url = 'YOUR_URL'
_headers = {'User-Agent': 'Mozilla/5.0'}  # example only; copy the real header values from your browser
req = urllib.request.Request(url, headers=_headers)
f = urllib.request.urlopen(req)
contents = f.read()
# contents holds the raw bytes of the file; call .decode('utf-8') only for text files
# You can add code here to write contents to another file (opened in 'wb' mode) to save it
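The two snippets above can be combined into one helper that keeps cookies between requests and writes the response to a chosen path. This is only a minimal sketch: the names make_opener and save_download are made up for illustration, and it assumes the site identifies you through ordinary HTTP cookies set during an earlier request with the same opener.

```python
import shutil
import urllib.request
from http.cookiejar import CookieJar
from pathlib import Path


def make_opener(jar=None):
    """Build an opener that stores and resends cookies through the given jar."""
    # An empty CookieJar is falsy, so test against None explicitly.
    jar = jar if jar is not None else CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def save_download(opener, url, destination, headers=None):
    """Stream the response body for url into destination as raw bytes."""
    request = urllib.request.Request(url, headers=headers or {})
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so large files are never held fully in memory.
    with opener.open(request) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

Reusing one opener for the login request and for every later download keeps the session cookies together. If the cookies only exist in your browser, http.cookiejar.MozillaCookieJar can load a cookies.txt export and be passed to make_opener instead.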
I have a PHP file that performs a query to an external API using a query parameter to retrieve the image from the database. This particular API doesn't return the image, but rather generates a new URL that can be used for a short period of time (associated cookie involved).
So my PHP file might be found at:
myserver.com/getFile.php?id=B9590963-145B-4E6A-8230-C80749D689WE
which performs the API call which generates a URL like this:
myserver.com/Streaming_SSL/MainDB/38799B1C4E38F1F9BCC99D5A4A2E0A514EEA26558C287C4C941FA8BA4FB7885B.png?RCType=SecuredRCFileProcessor&Redirect
which I store in a PHP variable - $fileURL. I then use:
header('Location: '. $fileURL);
to redirect the browser to the URL that shows the image. This all works well, but it has created an issue with a service that I integrate with. This service caches the second (redirected) URL, which causes problems, as that URL essentially only works the first time it is generated because of the session cookie. I need a solution that lets them cache a URL that will continue to show the image.
I'm wondering whether there is a way to download the image and show it somehow, without redirecting the browser to a new location, so that the original URL continues to work if it is cached?
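To get the image or file, do this:
// include the scheme (http:// or https://); without it PHP treats the string as a local path
$getImg = file_get_contents('https://myserver.com/getFile.php?id=B9590963-145B-4E6A-8230-C80749D689WE');
Name the file:
$fileName = 'name.png';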
Save the image on your server (in directory) to view it later or show somewhere else with url.
$saveFile = file_put_contents($fileName , $getImg);
Load the file like you wanted after you have saved it.
header('Location: '. $fileName);
Hope it helps.
I have seen file protection methods used on many web sites such as YouTube, file hosting sites, music sites and Facebook. They use a special way to control the availability of the file.
The links look like this:
http://www.mysite.com/music/audio.mp3?Expires=1354180089&Key=APKAIKAIRXBA2H7FXITA
After the expiry time, the file is no longer available, so a user who wants the file has to request it again with a new expiry code. This prevents unauthorized use of the file on other sites and protects bandwidth.
With this approach, the file is not available forever, as it would be at http://www.mysite.com/music/audio.mp3
I searched everywhere for tutorials but couldn't find any. Please help.
In this case, audio.mp3 is not a real MP3 file. It is a script that checks the session expiry time and, if the session is still valid, sends the right headers and prints out the real MP3 file, which is located somewhere on the server where only the script can access it. Something like this pseudocode:
if (session valid) {
    // set the right header;
    // print out the mp3 file;
} else {
    // text/html header;
    // print the message about session being invalid;
}
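The Expires and Key parameters from the question can also be checked without a session. The sketch below shows one common way to do that, assuming the key is an HMAC of the path and the expiry time computed with a server-side secret. That scheme, the SECRET value and the function names are assumptions for illustration, not something the sites mentioned above are known to use.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"change-me"  # hypothetical secret, known only to the server


def sign(path, expires):
    """Return a hex signature tying the path to its expiry time."""
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_link(path, lifetime_seconds, now=None):
    """Build a link that stops validating after lifetime_seconds."""
    now = int(time.time()) if now is None else int(now)
    expires = now + lifetime_seconds
    query = urlencode({"Expires": expires, "Key": sign(path, expires)})
    return f"{path}?{query}"


def is_valid(link, now=None):
    """Check that the link is unexpired and its key matches the path."""
    now = int(time.time()) if now is None else int(now)
    parts = urlsplit(link)
    query = parse_qs(parts.query)
    try:
        expires = int(query["Expires"][0])
        key = query["Key"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    # Constant-time comparison avoids leaking how much of the key matched.
    return hmac.compare_digest(key.encode(), sign(parts.path, expires).encode())
```

The script behind audio.mp3 would call something like is_valid on the requested URL and only then stream the real file. Changing the path or the Expires value invalidates the key.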
I'm developing sites, and some visitors' browsers show stale cached versions.
Is there a way to clear a visitor's browser cache from server-side code, or even JavaScript, so they don't have to clear it themselves?
I cannot find a direct answer to this.
There must be a way that big companies like Facebook, eBay, etc. do it.
We have been using htaccess to determine the caching rules of the clients. We explicitly give the cache a 24h lifetime and we put no-cache rules the day before we do the update. It has helped but it is tedious and not so reliable.
Just posting it to give you ideas if no one answers, but I would really love to get the answer too. :)
First Method:
You can actually save the output of the page before you end the script, then load the cache at the start of the script.
example code:
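<?php
$cachefile = 'cache/'.basename($_SERVER['PHP_SELF']).'.cache'; // e.g. cache/index.php.cache
$cachetime = 3600; // time to cache in seconds
if (file_exists($cachefile) && time() - $cachetime <= filemtime($cachefile)) {
    // cache is still fresh: serve it and stop
    echo file_get_contents($cachefile);
    exit;
} elseif (file_exists($cachefile)) {
    unlink($cachefile); // cache is stale: remove it
}
ob_start(); // buffer everything the page outputs
// all the coding goes here
$c = ob_get_contents(); // the buffered output is still sent to the browser when the script ends
file_put_contents($cachefile, $c);
?>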
If you have a lot of pages needing this caching you can do this:
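in cachestart.php:
<?php
$cachefile = 'cache/'.basename($_SERVER['PHP_SELF']).'.cache'; // e.g. cache/index.php.cache
$cachetime = 3600; // time to cache in seconds
if (file_exists($cachefile) && time() - $cachetime <= filemtime($cachefile)) {
    echo file_get_contents($cachefile);
    exit;
} elseif (file_exists($cachefile)) {
    unlink($cachefile);
}
ob_start();
?>
in cacheend.php:
<?php
$c = ob_get_contents();
file_put_contents($cachefile, $c);
?>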
Then just simply add
include('cachestart.php');
at the start of your scripts, and add
include('cacheend.php');
at the end of your scripts. Remember to have a folder named cache and allow PHP to access it.
Also remember that if you're doing a full-page cache, your page should not contain session-specific output (e.g. a members' bar), because it will be cached as well. Look at a framework for more specific caching (of a variable or part of the page).
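For comparison, here is the same full-page caching idea as a small Python sketch. It is not a translation of any particular framework: cached_render and its parameters are hypothetical names, and it assumes the page can be produced by calling a function that returns the HTML as a string.

```python
import time
from pathlib import Path


def cached_render(cache_file, max_age_seconds, render, now=None):
    """Return the cached page if it is fresh, otherwise render and store it."""
    now = time.time() if now is None else now
    cache_file = Path(cache_file)
    # Serve from the cache while the file is younger than max_age_seconds.
    if cache_file.exists() and now - cache_file.stat().st_mtime <= max_age_seconds:
        return cache_file.read_text(encoding="utf-8")
    output = render()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(output, encoding="utf-8")
    return output
```

The same caveat applies as for the PHP version: anything session-specific that render() produces ends up in the shared cache file.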
Second Method:
Use Squid or update the HTTP headers correctly to do browser caching.
Third Method:
PEAR has a caching package (actually two):
http://pear.php.net/package/Cache
Fourth Method:
Use http://memcached.org/. There's an explanation of how to do it on that site.
I usually use a combination of techniques:
HTML resulting from PHP code is not cached using the standard configuration, because it sends out the appropriate headers automatically.
Images and other binary assets get renamed if they change.
For JavaScript and CSS I add an automatically created unique code (e.g. an MD5 hash of the contents, or the file size) to the filename (e.g. /public/styles.f782bed8.css) and remove it again with mod_rewrite. This way every change in the file results in a new file name. This can be done at runtime in PHP while outputting the HTML header, to have it fully automated. In that case, however, computing an MD5 hash might have a performance impact.
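The renaming scheme for JavaScript and CSS can be sketched as follows. This assumes an 8-character hex prefix of the MD5 digest, as in the styles.f782bed8.css example. The function names are invented for illustration, and original_name only mimics what the mod_rewrite rule would do on the server.

```python
import hashlib
import re
from pathlib import Path

# Matches names such as styles.f782bed8.css (8 hex characters before the extension).
FINGERPRINT = re.compile(r"^(?P<stem>.+)\.[0-9a-f]{8}(?P<suffix>\.[^.]+)$")


def fingerprinted_name(path):
    """Return the file name with a short content hash inserted before the extension."""
    path = Path(path)
    digest = hashlib.md5(path.read_bytes()).hexdigest()[:8]
    return f"{path.stem}.{digest}{path.suffix}"


def original_name(name):
    """Strip the hash again, as the rewrite rule would before serving the file."""
    match = FINGERPRINT.match(name)
    return f"{match['stem']}{match['suffix']}" if match else name
```

Because the hash depends only on the file contents, the URL changes exactly when the file changes, so browsers can cache each version indefinitely.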
I am writing an anti-leeching download script, and my plan is to create a temporary file, named after the session ID, which is automatically deleted after the session expires. Is this possible? And can you give me some tips on how to do that in PHP?
Thanks so much for any reply
PHP has a function for that named tmpfile(). It creates a temporary file and returns a resource. The resource can be used like any other file resource.
E.g. the example from the manual:
<?php
$temp = tmpfile();
fwrite($temp, "writing to tempfile");
fseek($temp, 0);
echo fread($temp, 1024);
fclose($temp); // this removes the file
?>
The file is automatically removed when closed (using fclose()), or when the script ends. You can use any file function on the resource; they are listed in the PHP manual. Hope this helps.
Another solution would be to create the file in the regular way and use a cron job to regularly check whether a session has expired. The expiration date and other session data could be stored in a database. Use the script to query that data and determine whether a session has expired. If so, remove the file physically from the disk. Make sure to run the script once an hour or so (depending on your timeout).
So we have one or more files available for download. Creating a temporary file for each download request is not a good idea. Creating a symlink() to each file instead is a much better idea. This will save loads of disk space and keep down the server load.
Naming the symlink after the user's session is a decent idea. A better idea is to generate a random symlink name and associate it with the session, so the script can handle multiple downloads per session. You can use session_set_save_handler() and register a custom read function that checks for expired sessions and removes symlinks when the session has expired.
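The symlink idea could look roughly like this in Python. It is a sketch under assumptions: the mapping from link to expiry time is a plain dict here, whereas the answer suggests tying it to the session, and the helper names are hypothetical.

```python
import os
import secrets
import time
from pathlib import Path


def create_download_link(target, link_dir, lifetime_seconds, now=None):
    """Create a randomly named symlink to target and return (link, expiry time)."""
    now = time.time() if now is None else now
    link = Path(link_dir) / secrets.token_urlsafe(16)
    os.symlink(Path(target).resolve(), link)
    return link, now + lifetime_seconds


def remove_expired(links, now=None):
    """Delete expired symlinks; links maps link path -> expiry timestamp."""
    now = time.time() if now is None else now
    alive = {}
    for link, expires in links.items():
        if expires <= now:
            # Removes only the symlink; the real file stays where it is.
            Path(link).unlink(missing_ok=True)
        else:
            alive[link] = expires
    return alive
```

Only the link is ever created or deleted, so many users can download the same file without extra copies on disk.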
Could you explain your problem a bit more deeply? Because I don't see a reason why not to use $_SESSION. The data in $_SESSION is stored server-side in a file (see http://php.net/session.save-path) BTW. At least by default. ;-)
OK, so we have the following requirements so far:
1. Let the user download in his/her session only
2. No copying and pasting the link to somebody else
3. Users have to download from the site, i.e. no hotlinking
4. Control speed
Let's see. This is not working code, but it should work along these lines:
<?php // download.php
session_start(); // start or resume a session
// always sanitize user input
$fileId  = filter_input(INPUT_GET, 'fileId', FILTER_SANITIZE_NUMBER_INT);
$token   = filter_input(INPUT_GET, 'token', FILTER_UNSAFE_RAW);
$referer = filter_input(INPUT_SERVER, 'HTTP_REFERER', FILTER_SANITIZE_URL);
$script  = filter_input(INPUT_SERVER, 'SCRIPT_NAME', FILTER_SANITIZE_URL);
// HTTP_REFERER is a full URL, so compare only its path with the script name
$refererPath = parse_url((string) $referer, PHP_URL_PATH);
// mush session_id and fileId into an access token
$secret = 'i can haz salt?';
$expectedToken = md5($secret . session_id() . $fileId);
// check if request came from download.php and has the valid access token
if (hash_equals($expectedToken, (string) $token) && ($refererPath === $script)) {
    $file = realpath('path/to/files/' . $fileId . '.zip');
    if ($file !== false && is_readable($file)) {
        session_destroy(); // optional
        header('Content-Type: application/zip'); // plus Content-Disposition, Content-Length, ...
        readfile($file); // fpassthru() needs an open handle; readfile() takes a path
        exit;
    }
}
// if no file was sent, send the page with the download link.
?>
<html ...
<?php printf('<a href="/download.php?fileId=%s&amp;token=%s">Download</a>',
    $fileId, $expectedToken); ?>
...
</html>
And that's it. No database required. This should cover requirements 1-3. You cannot control speed with PHP, but if you don't destroy the session after sending a file, you could write a counter to the session and limit the number of files the user will be sent during a session.
I wholeheartedly agree that this could be solved much more elegantly than with this monkeyform hack, but as proof-of-concept, it should be sufficient.
I'd suggest not copying the file in the first place. I'd do the following: when a user requests the file, generate a random unique string and give him a link like dl.php?k=hd8DcjCjdCkk123, then store this string in a database along with his IP address, maybe his session, and the time you generated the link. Then, when a user requests that link, make sure everything (key, IP and so on) matches and the link has not expired (e.g. no more than N hours have passed since it was generated), and if everything is OK, use PHP to pipe the file. Set up a cron job to look through the DB and remove the expired entries. What do you think?
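A minimal sketch of that database-backed approach, using Python's built-in sqlite3 module in place of whatever database the site actually uses. The table layout, column names and function names are assumptions made for the example.

```python
import secrets
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) the table that maps download tokens to files."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "token TEXT PRIMARY KEY, file TEXT, ip TEXT, created REAL)"
    )
    return db


def issue_token(db, file, ip, now=None):
    """Store a fresh random token for this file and client IP, and return it."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(12)
    db.execute("INSERT INTO links VALUES (?, ?, ?, ?)", (token, file, ip, now))
    db.commit()
    return token


def resolve_token(db, token, ip, max_age_seconds, now=None):
    """Return the file for a matching, unexpired token, or None."""
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT file, created FROM links WHERE token = ? AND ip = ?", (token, ip)
    ).fetchone()
    if row is None or now - row[1] > max_age_seconds:
        return None
    return row[0]


def purge_expired(db, max_age_seconds, now=None):
    """Delete expired rows; this is the part a cron job would run."""
    now = time.time() if now is None else now
    db.execute("DELETE FROM links WHERE ? - created > ?", (now, max_age_seconds))
    db.commit()
```

The download script would call resolve_token with the k parameter and the client address, and stream the file only when a name comes back.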
tmpfile
Creates a temporary file with a unique name in read-write (w+) mode and returns a file handle. The file is automatically removed when closed (using fclose()), or when the script ends.
Maybe it's too late to answer, but I'll share this for anyone who finds it through Google in the future.
If you use cPanel, there is a quick way to block external requests for your hosted files, called HotLink protection.
You can enable HotLink protection in cPanel to make sure nobody can request your files from another host or use your files as a download reference.
To achieve this, I would keep one file and protect it using chmod, making it unavailable to the public. Alternatively, save the contents in a database table row and fetch them whenever required.
To make it downloadable as a file, I would get the contents from the protected file (or fetch them from the database table) and simply output them. Using PHP headers, I would give the output the desired name and extension, specify its type, and finally force the browser to download it as a file.
This way, you only need to save the data in one place, either in a protected file or in the database. You can force the client's browser to download it as many times as the conditions are met, e.g. as long as the user is logged in, without having to worry about disk space, temp files, cron jobs or auto-deletion of the file.
Hi I am trying to redirect all links to any pdf file in my site to a page with a form in it that collects user info before they can proceed to download/view the pdf.
Eg
I want to redirect *.pdf files in web site to request.php?file=name_of_pdf_being_redirected
Where request.php is the page with the form on it asking for a few details before proceeding.
All PDFs on the site are held inside the /pdf folder.
Any ideas?
EDIT: sorry I'm using Apache on the server.
OK I'M GETTING THERE:
I have it working now using:
RewriteEngine on
RewriteRule ^pdf/(.+.pdf)$ request.php?file=/$1 [R]
But now, when the form is submitted and I want to let the person actually download the file, my new rule sends the download link back to the form. :-P Is there any way to let the file download once the form has been submitted and you're on download.php?
Ideas? You could start by telling us which web/app server you're using, that might help :-)
In Apache, you should be able to use a RewriteRule to morph the request into a different form. For example, turning /pub/docs/x.pdf into request.php?file=/pub/docs/x.pdf could be done with something like:
RewriteRule ^/pdf/(.*)\.pdf$ request.php?file=/$1.pdf
Keep in mind this is from memory (six years since I touched Apache and still clean :-), the format may be slightly different.
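To check which paths such a pattern matches before putting it on the server, the rule can be imitated in Python. This is only an approximation: Python's re syntax is close to, but not identical to, the PCRE syntax Apache uses. The pattern below is the per-directory (.htaccess) form without a leading slash and with the dot before pdf escaped, and the rewrite function is a stand-in for what mod_rewrite does.

```python
import re

# Escaping the dot matters: an unescaped "." matches any character.
RULE = re.compile(r"^pdf/(.+\.pdf)$")


def rewrite(path):
    """Return the rewritten target for PDF paths, or the path unchanged."""
    match = RULE.match(path)
    return f"request.php?file=/{match.group(1)}" if match else path
```

Requests for download.php, or for anything else that does not end in .pdf under pdf/, pass through untouched.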
Update:
Now you've got that sorted, here's a couple of options for your next problem.
1/ Rename the PDFs to have a different extension so that they're not caught by the rewrite rule. They should be configured to push out the same MIME type to the client so that they open in the client's choice of viewer.
2/ Do the download as part of the script as well, not as a direct access to the PDF. Since the submission of the form is a HTTP request, you should be able to answer it immediately with the PDF contents rather than re-directing them again to the download page.
That second option would be my choice since it:
stops people figuring out they can get to the PDFs just by requesting xx.pdfx instead of xx.pdf.
makes it quicker for the person to get the PDF (they don't have to click on the link again).
You can try this:
Move your files to a folder "outside" your web root so that no one can access it thru a browser
Use sessions to detect whether a person has completed the form or not
Use a php powered file download script. In its naivest form, it might look like this:
session_start(); // needed before $_SESSION can be read
if ( isset( $_SESSION[ 'OK_TO_DOWNLOAD' ] ) == false )
{
    header( "Location: must_fill_this_first.php" );
    exit( 0 );
}
header( "Content-Type: application/pdf" );
// basename() strips directory parts so the request cannot escape the directory
readfile( 'some_directory_inaccessible_thru_www/' . basename( $_GET[ 'pdf_name' ] ) );
// readfile() is binary-safe and streams the file instead of loading it all into memory
This is a tried and tested technique I used on a website. The code example is a draft outline and needs refinement.
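One refinement worth making in any language is validating the requested file name, because a value such as ../../etc/passwd would otherwise be appended to the directory path as-is. A sketch of that check in Python, where resolve_download is a hypothetical helper and the directory is whatever folder outside the web root holds the PDFs:

```python
from pathlib import Path


def resolve_download(base_dir, requested_name):
    """Return the real path of a file directly inside base_dir, or None."""
    base = Path(base_dir).resolve()
    # Keep only the final path component, then resolve symlinks and "..".
    candidate = (base / Path(requested_name).name).resolve()
    if candidate.parent != base or not candidate.is_file():
        return None
    return candidate
```

The download script would send the file only when this returns a path, and answer with an error page otherwise.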
Note, my answer is with respect to a .NET website, but I'm sure the same constructs exist somewhere in PHP.
I would have an HTTPModule with a path of *.pdf that simply does a Response.Redirect to request.php?...etc (in my case request.aspx). Then, in the event handler for the button click on that page, when you know which PDF to display and that the user is authorized, simply do a Response.ContentType = [MIME type of pdf], then Response.WriteFile(pdfFile), and finally Response.End().
There are other things you can add to make it better, such as the file size, etc. But in the minimal case, this would work. If you want the code for it in C# I could come up with something in about 3 minutes, but in PHP I'm quite lost. I'd start out looking for HTTPModules and how to write them in PHP.
Googling for "PHP HTTPModule" leads to this: Equivalent of ASP.NET HttpModules in PHP so, I may be a little wrong, but hopefully that's a starting point.
Use an .htaccess file if you're using an Apache web server. You'll need to make certain that you have mod_rewrite enabled, but once you do you can rewrite all files using these two simple lines:
RewriteEngine On
RewriteRule \.pdf$ /rewrite.php [NC,L]
If you are using IIS, you can accomplish something similar using ISAPI_Rewrite.
Your other alternative is to place your pdf's inside of a directory that is not publicly accessible and then any request made for a pdf resource would return an access denied error and the files could only be accessed through the appropriate download script.
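if ($user == authenticated) { // pseudocode: replace with your own authentication check
    header('Content-Type: application/pdf'); // set pdf headers
    echo file_get_contents('actual.pdf');
}
No mod_rewrite rules needed, it hides the actual source, and it is what I normally do. Hope this helps.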