PHP web based scraper

What I'm trying to do is use PHP to scrape a website whose URL I enter as a parameter.
I want the whole raw source code, but that's not all:
I want it then saved into an HTML page on the local server that runs the PHP script.
Is there an easy snippet for this, or can someone write up the code for me?
For example,
I want to scrape http://google.com
So for instance, mysite.com/scrape.php?url=http://google.com
should save the front page of Google into http://mysite.com/scraped/google.com.html

Here's a script that will save the contents of the specified URL into a file named scraped.html:
<?php
// Fetch the raw source of the requested URL and write it to disk.
if (isset($_GET['url'])):
    $contents = file_get_contents($_GET['url']);
    file_put_contents('scraped.html', $contents);
endif;
To use a URL in the call to file_get_contents() you must enable allow_url_fopen in your php.ini file.
Of course, this will only save the actual source of the requested URL and not any other resources, such as images, scripts, and stylesheets.
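To also match the scraped/google.com.html naming from the question, here is a minimal sketch that derives the output filename from the URL's host. It assumes a writable scraped/ directory next to the script; also note that feeding a user-supplied URL straight to file_get_contents lets anyone make your server fetch arbitrary URLs, so restrict it accordingly:
<?php
// scrape.php?url=http://google.com  ->  scraped/google.com.html
// Sketch only: assumes the "scraped/" directory exists and is writable.
if (isset($_GET['url'])) {
    $host = parse_url($_GET['url'], PHP_URL_HOST);
    $contents = ($host !== null && $host !== false)
        ? file_get_contents($_GET['url'])
        : false;
    if ($contents !== false) {
        file_put_contents('scraped/' . $host . '.html', $contents);
    }
}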

Related

PHP: save a file (generated by aspx) to server

I have a URL provided by a wholesaler. The URL generates an XML file which I need to save on my server.
I use PHP's file_get_contents and file_put_contents to do that:
$savepath = "path_to_my_server_folder";
$xmlurl = "http://usistema.eurodigital.lt/newxml/xmlfile.aspx?xml=labas&code=052048048048051057049050048049052";
file_put_contents($savepath.'eurodigital.xml', file_get_contents($xmlurl));
The file is created on my server, but its content is empty. I have no problems with other XML files when I'm given a direct XML URL, but in this situation the file is generated dynamically by an .aspx page. The XML URL above is the actual URL I use. When I open it in a browser, the XML file gets saved to my device.
Can you help me with moving the XML to my server? What function should I use? Is it even possible to do that using PHP 5? "allow_url_fopen" is ON.
Thank you in advance!
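A common cause of an empty result like this is the remote server refusing requests that don't look like they come from a browser. A minimal cURL sketch, assuming the .aspx endpoint wants a User-Agent header and possibly issues redirects (both assumptions; the endpoint's exact requirements aren't confirmed):
<?php
// Sketch: fetch the dynamically generated XML with cURL.
$xmlurl   = "http://usistema.eurodigital.lt/newxml/xmlfile.aspx?xml=labas&code=052048048048051057049050048049052";
$savepath = "path_to_my_server_folder";

$ch = curl_init($xmlurl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow any redirects the .aspx page issues
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; XmlFetcher/1.0)');   // assumed requirement
$xml = curl_exec($ch);
curl_close($ch);

if ($xml !== false && $xml !== '') {
    file_put_contents($savepath . 'eurodigital.xml', $xml);
}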

How to download a file using PHP when it has a delayed force download?

I am in a situation where I need to download files from a URL. It is easy with direct file URLs like https://somedomain.com/some-path/somefile.exe
file_put_contents($save_file_loc, file_get_contents($url_to_download));
But what do you do when the URL has a delayed force download and actually returns HTML, and how do you differentiate those URLs?
Example URL: https://filehippo.com/download_mozilla-firefox-64/post_download/
EDIT: On the above URL the file download is started by JS; I tested by blocking JS, and the download did not start.
Thanks in advance for your help.
1. Read the HTML of the URL using file_get_contents.
2. Find the URL of the file within the HTML. You'll have to visit the page and view source to locate the URL. In your example of https://filehippo.com/download_mozilla-firefox-64/post_download/ it's found in data-qa-download-url="https://dl5.filehippo.com/367/fb9/ef3863463463b174ae36c8bf09a90145/Firefox_Installer.exe?Expires=1594425587&Signature=18ab87cedcf3464363469231db54575665668c4f6&url=https://filehippo.com/download_mozilla-firefox-64/&Filename=Firefox_Installer.exe"
3. Create a regex based on the above to extract the URL using preg_match.
4. Then file_get_contents the URL of the file to download it (see the sketch after this list).
As you may have noticed, the page may have pre-approved the request, so it's not guaranteed to work if the host has checks using cookies or other methods.
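A minimal sketch of those steps; the regex is keyed to the data-qa-download-url attribute quoted above, and that attribute name and the page structure are assumptions that can change whenever the site does:
<?php
// Sketch: extract a forced-download URL from the page HTML, then fetch it.
$page_url = 'https://filehippo.com/download_mozilla-firefox-64/post_download/';
$html = file_get_contents($page_url);

// Assumption: the real file URL sits in a data-qa-download-url="..." attribute.
if ($html !== false && preg_match('/data-qa-download-url="([^"]+)"/', $html, $m)) {
    $file_url = html_entity_decode($m[1]);   // the attribute value may contain &amp; etc.
    file_put_contents('Firefox_Installer.exe', file_get_contents($file_url));
}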

How to get a file from a remote server and serve it to the user as a download?

Sorry if I did not get this title right.
I have on my server a redirect.php script that receives a URL passed by the client user, fetches the content using the file_get_contents() function, and then shows it to the user with the echo() function.
The problem is when the user points directly to a PDF or JPG file in that URL, and the script then shows the file contents as binary code.
Once I set the code to recognize that the requested URL points directly to a downloadable file,
what should be the function or header to send to the user so that his browser asks him to download the file instead of showing it?
Do I have to first put it into a file on my server, or can I do it directly from a command like file_get_contents()? If I can do it without writing it to my server, that would be a much better approach.
I can't point directly to the server because some sites are blocked by my employer, and the third-party company that provides this service thinks that StackExchange sites are malicious and not constructive and has tagged them as online communities, like Facebook.
Try this:
<?php
// Copy the remote file to a local file on your server.
$sourcefile = "http://www.myremotewebsite.com/myfile.jpg";
$destfile = "myfile.jpg";
copy($sourcefile, $destfile);
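That copies the file onto your server, but the question also asks how to make the browser offer a download directly. A minimal sketch that streams the remote file through with force-download headers and no temporary file; the generic octet-stream content type is an assumption, and you could forward the real type instead:
<?php
// Sketch: proxy a remote file to the user as a download (no local copy).
// Assumes allow_url_fopen is enabled, as with file_get_contents.
$url = $_GET['url'];
$filename = basename(parse_url($url, PHP_URL_PATH));   // e.g. "myfile.pdf"

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . $filename . '"');
readfile($url);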

download a file with php and hash

It is simple to understand what I want to achieve. I get links like these:
http://rockdizfile.com/atfmzkm7236t
http://rockdizfile.com/xuj5oincoqmy
http://rockdizfile.com/pg8wg9ej3pou
These links are from one cloud storage site, and I want to make a PHP script that automates downloading them.
I can't find which script the download button on these pages triggers, or how I can trigger it myself so I can download the file with PHP on my server.
Basically, my idea is to download a lot of files, but I don't want to do it manually, so I need an automatic way of doing it. As far as I know, a download makes requests to the following 2 URLs:
http://rockdizfile.com/pg8wg9ej3pou
http://wi32.rockdizfile.com/d/wsli6rbhfp4r2ge4t7cqeeztijrprelfiw4afvqg5iwspmvqabpkmgiz/Desislava%20feat.%20Mandi%20&%20Ustata%20-%20Pusni%20go%20pak%20(CDRIP).mp3
The first URL leads to the second one, but here comes the tricky part. As far as I tested, the last string, Desislava%20feat.%20Mandi%20&%20Ustata%20-%20Pusni%20go%20pak%20(CDRIP).mp3, is just the file name we get when downloading: if you change it to, for example, somefile.mp3, it will download somefile.mp3 with the same file content. So the data is hidden in the hash wsli6rbhfp4r2ge4t7cqeeztijrprelfiw4afvqg5iwspmvqabpkmgiz, or so I think. And now the tricky part: how do we get this hash? We have almost everything: the code from the URL, atfmzkm7236t; the hash, wsli6rbhfp4r2ge4t7cqeeztijrprelfiw4afvqg5iwspmvqabpkmgiz; and the filename, Desislava%20feat.%20Mandi%20&%20Ustata%20-%20Pusni%20go%20pak%20(CDRIP).mp3. There must be a way to download from this site without clicking, so please help me kind of hack this :)
You can use PHP's header function to force a file to download:
<?php
// Tell the browser to save the response as an attachment instead of rendering it.
header('Content-Disposition: attachment; filename=index.php');
readfile('Link');
You should know that this will not give you the ability to download PHP files from external websites.
You can only use this if you have the direct link to a file.
It's impossible to tell you without the source code.
e.g. sha1("Test Message") gives you 35ee8386410d41d14b3f779fc95f4695f4851682, but sha256("Vote this up") gives you 65e03c456bcc3d71dde6b28d441f5a933f6f0eaf6222e578612f2982759378ed
Totally different... unless their hidden function adds "65e03c456bcc3d71dde6b28dxxxxxxxxxxxxxxxxxxxxxxxxxx" (where xxxxxxxxxxxxxxxxxxxxxxxxxx is a bunch of numbers I can't be arsed to work out) to each hash...
then sha1("Test Message") gives you 65e03c456bcc3d71dde6b28d441f5a933f6f0eaf6222e578612f2982759378ed
The file URL is embedded in the SWF player; in the browser console you can read it with:
alert(jwplayer('mp3player').config.file);
Something like:
<?PHP echo file_get_contents($_GET["url"]); ?>
<script>
document.location=jwplayer('mp3player').config.file;
</script>
Though I've actually just noticed that they change 5 digits of the URL on each page request, and the script above makes 2 page requests: one to get the URL and HTML source, and another to try to download the file, meaning the URL has changed before the second request has started.
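A server-side variant could avoid that second page load by extracting the jwplayer file URL from a single fetched copy of the page and downloading it immediately. A sketch under two assumptions: the player setup contains something like file: 'http://...mp3', and the extracted URL stays valid for one prompt follow-up request:
<?php
// Sketch: fetch the page once, pull the mp3 URL out of the jwplayer setup,
// and download it right away so the rotating URL has no time to change.
$html = file_get_contents('http://rockdizfile.com/pg8wg9ej3pou');

if ($html !== false && preg_match("/file:\s*['\"]([^'\"]+)['\"]/", $html, $m)) {
    $mp3_url = $m[1];
    file_put_contents(basename($mp3_url), file_get_contents($mp3_url));
}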

Curl or file_get_contents for downloading whole webpage with css, images and JS files

What I need is to get the source code of some webpage URL:
$url = 'http://www.kupime.com/';
$data = file_get_contents($url);
After that, I need to save the string as an HTML file in a folder on my server, and after that I need to save the other elements of the page (images, CSS and JS files) and put them in the server folder too...
After all that, I need to show this page on my own domain so that it looks like an iframe, but with the source code available for other actions.
How can I do that, with PHP's file_get_contents, with some cURL functions, or something else? Suggestions welcome!
You need to parse all the URLs out of the site's markup (src=, href=, etc.), and it's really hard to do that well. Try out hidemyass.com, and note that not every website will work correctly because of JS. The script you are looking for is called a WebProxy. A sketch of the URL-parsing part follows below.
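For the URL-parsing part, a minimal sketch using DOMDocument to collect the src/href attributes you would need to download and rewrite; handling CSS url(...) references, JS-generated requests, and relative-path rewriting is left out:
<?php
// Sketch: collect asset URLs (images, scripts, stylesheets) from a fetched page.
$url  = 'http://www.kupime.com/';
$html = file_get_contents($url);

$doc = new DOMDocument();
@$doc->loadHTML($html);   // suppress warnings from real-world malformed HTML

$assets = array();
foreach ($doc->getElementsByTagName('img') as $img) {
    $assets[] = $img->getAttribute('src');
}
foreach ($doc->getElementsByTagName('script') as $script) {
    if ($script->getAttribute('src') !== '') {
        $assets[] = $script->getAttribute('src');
    }
}
foreach ($doc->getElementsByTagName('link') as $link) {
    if ($link->getAttribute('rel') === 'stylesheet') {
        $assets[] = $link->getAttribute('href');
    }
}

// Each entry still needs to be resolved against $url if it is relative,
// downloaded, and rewritten in the saved HTML to point at the local copy.
print_r($assets);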
