Bulk download files from url - php

I need to download files in bulk (each 0-2.5 MB) from a URL to my server (Linux CentOS, but it could be any other distribution too).
I would like to use wget (if you have another solution, please post it).
My first approach is to test it with only 1 file:
wget -U --load-cookies=cookies.txt "url"
The shell response shows the problem: it doesn't download the file, only an empty HTML page. The necessary cookie is saved in the correct format in the file, and the download works in the browser.
Once downloading a single file works, I want to use a text file with all the URLs (e.g. urls.txt), where the URLs look like the one above with only one parameter changing. I would also like it to download maybe 10-100 files at a time.
If you have a solution in PHP or Python for this, it will help me too.
Thank you for your help!

I have solved it now with aria2. It's a great tool for such things.
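For reference, a minimal aria2 invocation for this kind of job (assuming one URL per line in urls.txt and a Netscape-format cookies.txt, both hypothetical names) might look like:
aria2c --input-file=urls.txt --max-concurrent-downloads=10 --load-cookies=cookies.txt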

Basically:
for i in foo bar 42 baz; do
    wget -other -options -here "http://blah/blah?param=$i" -O "$i.txt"
done
Note the -O parameter, which lets you set the output filename. foo.txt is a little easier to use than data-output?format=blahblahblah.
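Since the question also asks about PHP, here is a rough sketch using PHP's cURL extension. The file names (urls.txt, cookies.txt, the downloads/ directory) are assumptions, and the cookie file is expected to be in the Netscape format that wget and browsers export:
<?php
// Sketch: read one URL per line and download each file with cURL.
$urls = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($urls as $i => $url) {
    // Assumes a downloads/ directory already exists.
    $target = fopen('downloads/file_' . $i . '.bin', 'w');

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $target);             // write the body straight to disk
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);      // follow redirects
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt'); // reuse the exported cookies
    curl_exec($ch);

    if (curl_errno($ch)) {
        echo "Failed: $url (" . curl_error($ch) . ")\n";
    }

    curl_close($ch);
    fclose($target);
}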

Related

How to clear contents of file with batch script

I have a self-hosted local website (on W10) with a little chat application. The chat history is saved to a log.html file, and I want to clear it out with a batch script.
I know that in the Ubuntu shell it is as simple as > log.html, but on Windows that doesn't work.
I also found nul > log.html, but it says access denied.
I also don't want to use a PowerShell script, as I'd have to change execution policies and it takes nearly a minute. So, my question is:
Is there a way that I can empty log.html with a batch script that doesn't stay open for longer than 20 seconds?
Or, I don't mind if there is a way to use something PHP-related to clear it daily. I'm using IIS on Windows 10 v1803, if that helps.
I think what you want is:
TYPE NUL > log.html
…or as possible alternatives:
BREAK>log.html
CLS 2>log.html
CD.>log.html
Technically, they're not emptying the file; they're writing a new (empty) file which overwrites the existing one.
This will delete the file and re-create it, and the window closes instantly, so it does pretty much what you want. Replace "Desktop" with the path to the file, and place this .bat in the same folder as your log.html:
@echo off
cd "Desktop"
del "log.html"
echo. 2>log.html
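And since the question mentions that something PHP-related run daily would also be acceptable, a one-line PHP script (a hypothetical clear_log.php, scheduled however you prefer, e.g. with Task Scheduler) could simply truncate the file:
<?php
// Overwrite log.html with an empty string, i.e. truncate it.
file_put_contents('log.html', '');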

Wget download file using PHP exec

Is it possible to download a file using Wget? I want to download that file into my default browser's download directory.
It's possible, but it wouldn't achieve the effect you desire.
Running wget would cause the file to be downloaded to the server (which is something you'd be better off using the cURL library for instead of shelling out to an external binary).
If you want the browser to download it, then you need to output the file from PHP, not save it to a file on the server.
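To illustrate that last point, here is a rough sketch (a hypothetical download.php fetching a hypothetical $url, then streaming it to the browser so it lands in the browser's own download directory):
<?php
// Fetch the remote file server-side...
$url = 'https://example.com/somefile.zip'; // assumption: the file to serve

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$data = curl_exec($ch);
curl_close($ch);

// ...then hand it to the browser as a download.
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($url) . '"');
header('Content-Length: ' . strlen($data));
echo $data;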
Try something like this:
shell_exec('wget -P path_to_default_download_directory google.com');

PHP exec not giving the same result as cmd

exec ("C:/Lame/sox \"C:/1/2.wav\" -t wav \"C:/1/2.rev\" reverse");
I'm using that code to run an audio post-processing tool to reverse a sound file. There is an output, but the file is about 1/5th the size it should be and I am unable to play it. Basically, it makes a file, but it's not the one I would have gotten if I had run this in the command prompt:
C:/Lame/sox "C:/1/2.wav" -t wav "C:/1/2.rev" reverse
With that, I get the result I want and I am able to play the rev file.
Anyone have any idea why this is happening?
Found the problem. It was a permissions issue.
All the other post-processing commands work because they write to that folder. reverse creates a temporary file in another folder that the current user didn't have write access to, which is why it made a small file: it later tried to read from a file that didn't exist.
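For anyone hitting a similar mismatch, one general way to see what the command is actually complaining about is to capture its stderr and exit code from PHP, for example:
<?php
// Redirect stderr into the captured output and check the exit code.
$cmd = 'C:/Lame/sox "C:/1/2.wav" -t wav "C:/1/2.rev" reverse';
exec($cmd . ' 2>&1', $output, $exitCode);

echo "exit code: $exitCode\n";
echo implode("\n", $output) . "\n"; // the tool's error messages show up here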

Get filename from wget in php

I'm setting up a script so that I can input a URL to a web page and the script will wget the file. Most of the files, however, will be in the *.rar format. Is there any way I can pass the filename to the unrar command to unarchive the files downloaded via wget?
Many, many thanks in advance!
EDIT: I thought about using PHP's explode() function to break up the URL by the slashes (/), but that seems a bit hacky.
Rather than forking out to external programs to download and extract the file, you should consider using PHP's own cURL and RAR extensions. You can use the tmpfile() function to create a temporary file, use it as the value of the CURLOPT_FILE option to make cURL save the downloaded file there, and then open that file with the RAR functions to extract the contents.
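A rough sketch of that approach, assuming the cURL and PECL rar extensions are installed; the URL and the extract/ directory below are placeholders:
<?php
// Download the archive into a temporary file with cURL.
$url = 'http://www.example.com/archive.rar'; // placeholder
$tmp = tmpfile();
$tmpPath = stream_get_meta_data($tmp)['uri'];

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $tmp);           // write straight into the temp file
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
curl_close($ch);
fflush($tmp); // make sure everything is on disk before reading it back

// Open it with the RAR extension and extract every entry.
$rar = RarArchive::open($tmpPath);
if ($rar !== false) {
    foreach ($rar->getEntries() as $entry) {
        $entry->extract('extract/'); // assumes extract/ exists
    }
    $rar->close();
}

fclose($tmp); // closing the handle also removes the temporary file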
Use basename() to get the filename.
@Wyzard gives the best answer. If there's a library that solves your problem, use it instead of forking an external process. It's safer and it's the cleaner solution. PHP's cURL and RAR extensions are good; use them.
However, if you must use wget and unrar, then @rik gives a good answer. wget's -O filename option saves the file as filename, so you don't have to work it out. I would rather pipe wget's output directly to unrar, though, using wget -q -O - http://www.example.com | unrar.
@Byron's answer is helpful, but you really should not need to use it here. It is, however, better than using explode() as your edit mentions.
wget -O filename URL && unrar filename

How can I create a site in php and have it generate a static version?

For a particular project I have, no server-side code is allowed. How can I create the web site in PHP (with includes, conditionals, etc.) and then have that converted into a static HTML site that I can give to the client?
Update: Thanks to everyone who suggested wget. That's what I used. I should have specified that I was on a PC, so I grabbed the Windows version from here: http://gnuwin32.sourceforge.net/packages/wget.htm.
If you have a Linux system available to you, use wget:
wget -k -K -E -r -l 10 -p -N -F -nH http://website.com/
Options
-k : convert links to relative
-K : keep the original versions of files without the conversions made by wget
-E : rename html files to .html (if they don’t already have an htm(l) extension)
-r : recursive… of course we want to make a recursive copy
-l 10 : the maximum level of recursion. If you have a really big website you may need to use a higher number, but 10 levels should be enough.
-p : download all necessary files for each page (css, js, images)
-N : Turn on time-stamping.
-F : When input is read from a file, force it to be treated as an HTML file.
-nH : By default, wget puts files in a directory named after the site's hostname. This disables the creation of those hostname directories and puts everything in the current directory.
Source: Jean-Pascal Houde's weblog
Build your site, then use a mirroring tool like wget or lwp-mirror to grab a static copy.
I have done this in the past by adding:
ob_start();
at the top of the pages, and then in the footer:
$page_html = ob_get_contents();
ob_end_clean();
file_put_contents($path_where_to_save_files . $_SERVER['PHP_SELF'], $page_html);
You might want to convert .php extensions to .html before baking the HTML into the files.
If you need to generate multiple pages with variables, one quite easy option is to append an md5 hash of all the GET variables to the filename; you just need to change the links in the HTML too. So you can convert:
somepage.php?var1=hello&var2=hullo
to
somepage_e7537aacdbba8ad3ff309b3de1da69e1.html
Ugly, but it works.
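A small sketch of how such a filename could be derived (hypothetical, and it assumes the GET variables always arrive in the same order, since the hash depends on it):
<?php
// e.g. somepage.php?var1=hello&var2=hullo -> somepage_<md5 of the query string>.html
$base   = basename($_SERVER['PHP_SELF'], '.php');
$static = $base . '_' . md5(http_build_query($_GET)) . '.html';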
Sometimes you can use PHP to generate JavaScript to emulate some features, but that cannot be automated very easily.
Create the site as normal, then use spidering software to generate an HTML copy.
HTTrack is software I have used before.
One way to do this is to create the site in PHP as normal, and have a script actually grab the webpages (through HTTP; you can use wget or write another PHP script that just uses file() with URLs) and save them to the public website locations when you are "done". Then you can just run the script again when you decide to change the pages again. This method is quite useful when you have a slowly changing database and lots of traffic, as you can eliminate all SQL queries on the live site.
If you use MODX, it has a built-in function to export static files.
If you have a number of pages, with all sorts of request variables and whatnot, probably one of the spidering tools the other commenters have mentioned (wget, lwp-mirror, etc) would be the easiest and most robust solution.
However, if the number of pages you need to get is low, or at least manageable, you've got a few options which don't require any third-party tools (not that you should discount them just because they are third party).
1. You can use PHP on the command line to get it to output directly into a file:
php myFile.php > myFile.html
Using this method could get painful (though you could put it all into a shell script), and it doesn't allow you to pass variables in the same way (e.g. php myFile.php?abc=1 won't work).
2. You could use another PHP file as a "build" script which contains a list of all the URLs you want, and then grabs them via file_get_contents() or file() and writes them to a local file. Using this method, you can also get it to check whether the file has changed (md5_file() should work for that), so you'll know what to give your client, should they only want updates. A sketch of this approach follows the list.
3. Further to #2, before you write the output to a file, scan it for local URLs and then add those to your list of files to download. While you're there, change those URLs to link to what you'll eventually name your output, so you have a functioning web at the end. A note of caution here: if this is sounding good, you could probably use one of the tools which already exist and do this for you.
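Here is a rough sketch of that build-script idea; the base URL, the page list, and the output/ directory are all assumptions:
<?php
// Hypothetical build script: fetch each page over HTTP and write a static copy.
$base  = 'http://localhost/mysite/';                        // where the PHP site runs
$pages = ['index.php', 'about.php', 'products.php?cat=1'];  // pages to bake

foreach ($pages as $page) {
    $html = file_get_contents($base . $page);
    if ($html === false) {
        echo "Failed to fetch $page\n";
        continue;
    }

    // Name the output after the page, swapping .php for .html (assumes output/ exists).
    $out = 'output/' . str_replace('.php', '.html', strtok($page, '?'));

    // Only rewrite the file if the content actually changed.
    if (!file_exists($out) || md5_file($out) !== md5($html)) {
        file_put_contents($out, $html);
        echo "Updated $out\n";
    }
}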
As an alternative to wget, you could use WinHTTrack or WebHTTrack to grab the static pages. HTTrack even corrects links to files and documents to match the static output.
You can use Python or Visual Basic (or whatever you prefer) to create your static files all at once, then upload them.
For a project with 11 million business listings in Excel files, I used VBA to extract the spreadsheet data into 11 million small .php files, then zipped, FTP'd, and unzipped them.
https://contactlookup.us
Voila - a super fast business directory
I started with Jekyll, but after about half a million entries the generator got bogged down. For 11 million, it looked like it would take about 2 months to finish the build!
I do it on my own web site for certain pages that are guaranteed not to change -- I simply run a shell script that boils down to (warning: bash pseudocode):
find site_folder -name \*.static.php -print -exec Staticize {} \;
with Staticize being:
#!/bin/bash
# This replaces .static.php with .html
TARGET_NAME="$(dirname "$1")/$(basename "$1" .static.php).html"
php "$1" > "$TARGET_NAME"
wget is probably the most complete method. If you don't have access to that, and you have a template-based layout, you may want to look into using Savant 3. I recommend Savant 3 highly over other template systems like Smarty.
Savant is very lightweight and uses PHP as the template language, not some proprietary sublanguage. The command you would want to look up is fetch(), which will "compile" your template and place it in a variable that you can output.
http://www.phpsavant.com/
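A rough sketch of what that looks like, assuming Savant3 is installed and using a hypothetical page.tpl.php template:
<?php
require_once 'Savant3.php';

$tpl = new Savant3();
$tpl->title = 'Home';                // template variables become object properties
$html = $tpl->fetch('page.tpl.php'); // "compile" the template into a string

file_put_contents('index.html', $html); // bake it out as a static page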
