XML-RPC failing to respond to POST requests via cURL in PHP

I'm having some issues with calling WordPress XML-RPC via cURL in PHP. It's a WordPress.com hosted blog, and the XML-RPC file is located at http://sunseekerblogbook.com/xmlrpc.php.
Starting yesterday (or at least, yesterday was when it was noticed), cURL has been failing with error #52: Empty reply from server.
The code snippet we're using is below:
$ch = curl_init('http://sunseekerblogbook.com/xmlrpc.php');
curl_setopt_array($ch, [
    CURLOPT_HEADER => false,
    CURLOPT_HTTPHEADER => [
        'Content-Type: text/xml'
    ],
    CURLOPT_POSTFIELDS => xmlrpc_encode_request('wp.getPosts', [
        1,
        WP_USERNAME,
        WP_PASSWORD,
        [
            'number' => 15
        ]
    ]),
    CURLOPT_RETURNTRANSFER => true
]);

$ret = curl_exec($ch);
$data = xmlrpc_decode($ret, 'UTF-8');
Using cURL directly, however, everything returns exactly as expected:
$output = [];
exec('curl -d "<?xml version=\"1.0\" encoding=\"UTF-8\"?><methodCall><methodName>wp.getPosts</methodName><params><param><value><int>1</int></value></param><param><value><string>' . WP_USERNAME . '</string></value></param><param><value><string>' . WP_PASSWORD . '</string></value></param><param><value><struct><member><name>number</name><value><int>15</int></value></member></struct></value></param></params></methodCall>" sunseekerblogbook.com/xmlrpc.php', $output);
$data = xmlrpc_decode(implode('', $output), 'UTF-8');
We've been successfully able to query WordPress since July 2013, and we're at a dead-end as to why this has happened. It doesn't look like PHP or cURL have been updated/changed recently on the server, but the first code snippet has failed on every server we've tried it on now (with PHP 5.4+).
Using the http://sunseekerblogbook.wordpress.com/xmlrpc.php link gives the same issue.
Is there anything missing from the PHP code that would cause this issue? That it suddenly stopped working over 12 months down the line is what has flummoxed me.

Managed to fix it. Looking at the headers sent by cURL, the only differences were that the cURL command line used Content-Type: application/x-www-form-urlencoded and sent User-Agent: curl/7.30.0.
The choice of content type didn't affect it, but setting a user agent sorted it! It seems WordPress.com (but not self-hosted WordPress.org sites running the latest v3.9.2) now requires a user agent for XML-RPC requests, though this hasn't been documented anywhere that I can find.
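For reference, a minimal sketch of the working call, assuming the same request as above; the user agent value here ('php-curl') is an arbitrary placeholder, since any non-empty string appears to satisfy WordPress.com:
$ch = curl_init('http://sunseekerblogbook.com/xmlrpc.php');
curl_setopt_array($ch, [
    CURLOPT_HEADER => false,
    CURLOPT_USERAGENT => 'php-curl', // the missing piece: any non-empty UA string
    CURLOPT_HTTPHEADER => [
        'Content-Type: text/xml'
    ],
    CURLOPT_POSTFIELDS => xmlrpc_encode_request('wp.getPosts', [
        1,
        WP_USERNAME,
        WP_PASSWORD,
        ['number' => 15]
    ]),
    CURLOPT_RETURNTRANSFER => true
]);

$ret = curl_exec($ch);
$data = xmlrpc_decode($ret, 'UTF-8');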

Related

PHP script no longer receiving data from context stream

We have a "legacy" script that stopped working a little while back. Pretty sure it's because the endpoint it's connecting to changed from http to https, and the old http address now returns a 301.
I've never done anything other than tiny changes to PHP scripts, so am a little out of my depth here.
Note that our PHP version is old - 5.3.0. This may well be part of the problem.
The script as-is (relevant bit anyway):
$uri = "http://www.imf.org/external/np/fin/data/rms_mth.aspx"
."?SelectDate=$date&reportType=CVSDR&tsvflag=Y";
$opts = array('http' => array(
'proxy' => 'tcp://internal.proxy.address:port',
'method' => 'GET',
'request_fulluri' => true)
);
$ctx = stream_context_create($opts);
$lines = file($uri, false, $ctx);
foreach ($lines as $line)
...
This no longer returns anything. The link, by the way, is the IMF link for exchange rates, so it's open to all - if you open it you'll get a download with a rate table in it. The rest of the script basically parses this for the data we want.
Now, pretty sure our proxy is OK. Running some tests with curl gives the following results:
curl --proxy tcp://internal.proxy.address:port -v "https://www.imf.org/external/np/fin/data/rms_mth.aspx?SelectDate=05/28/2020&reportType=CVSDR&tsvflag=Y"
(specify https) works just fine.
curl --proxy tcp://internal.proxy.address:port -v "http://www.imf.org/external/np/fin/data/rms_mth.aspx?SelectDate=05/28/2020&reportType=CVSDR&tsvflag=Y"
(specify http) does not work, and returns a 301 redirect.
curl --proxy tcp://internal.proxy.address:port -v -L "http://www.imf.org/external/np/fin/data/rms_mth.aspx?SelectDate=05/28/2020&reportType=CVSDR&tsvflag=Y"
(specify http with follow redirects) then works OK.
I've tried a few things after some googling. It seems I need opts for 'ssl' as well when using https, so I've made the following changes:
$uri = "https://www.imf.org/external/np/fin/data/rms_mth.aspx"
     . "?SelectDate=$date&reportType=CVSDR&tsvflag=Y";

$opts = array(
    'http' => array(
        'proxy' => 'tcp://internal.proxy.address:port',
        'method' => 'GET',
        'request_fulluri' => true),
    'ssl' => array(
        'verify_peer' => false,
        'verify_peer_name' => false,
        'SNI_enabled' => false)
);
Sadly, the SNI_enabled flag was introduced after 5.3.0, so I don't think this helps. There's also a follow_location context option for http, but that was introduced in 5.3.4, so also no use.
(BTW, I have little to no control over the version of PHP we have, so while I appreciate higher versions may offer better solutions, that's not a lot of use to me I'm afraid).
Basically, I am now stuck. No combination of these parameters or settings returns any data at all. I can see it works via curl and the proxy, so it's not a general connectivity issue.
Any and all suggestions gratefully received!
Update: After adding the lines to enable error reporting, the warning shows the stream failing to connect:
Warning: file(https://www.imf.org/external/np/fin/data/rms_mth.aspx?SelectDate=05/28/2020&reportType=CVSDR&tsvflag=Y): failed to open stream: Cannot connect to HTTPS server through proxy in /usr/bass/apps/htdocs/BASS/mods/module.XSM.php on line 79
(line 79 is the $lines = ... line)
So it doesn't connect in the php script, but running the same connection via the proxy in curl works fine. What's the difference in php that causes this?
You can use PHP's cURL functions to get the response from your given URL, and then use explode() to break the response up line by line.
$uri = "https://www.imf.org/external/np/fin/data/rms_mth.aspx"
."?SelectDate=$date&reportType=CVSDR&tsvflag=Y";
$opts = array(
CURLOPT_URL => $uri,
CURLOPT_PROXY => 'tcp://internal.proxy.address:port',
CURLOPT_HEADER => false,
CURLOPT_RETURNTRANSFER => true
);
$ch = curl_init();
curl_setopt_array($ch, $opts);
$lines = curl_exec($ch);
curl_close($ch);
$lines = explode("\n", $lines); // breaking the whole response string line by line
foreach ($lines as $line)
...
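As an aside, if the script has to keep the http:// URL, the 301 seen in the curl tests can be followed from PHP's cURL as well; a small sketch mirroring curl's -L flag:
$opts = array(
    CURLOPT_URL => $uri, // the http:// variant of the URL
    CURLOPT_PROXY => 'tcp://internal.proxy.address:port',
    CURLOPT_FOLLOWLOCATION => true, // follow the 301 to https, like curl -L
    CURLOPT_RETURNTRANSFER => true
);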

file_get_contents() gets 403 from api.github.com every time

I call myself an experienced PHP developer, but this one drives me crazy. I'm trying to get release information for a repository in order to display update warnings, but I keep getting 403 errors. To simplify things, I used the most basic usage of GitHub's API: GET https://api.github.com/zen. It is kind of a hello world.
This works:
- directly in the browser
- with a plain curl https://api.github.com/zen in a terminal
- with a PHP GitHub API class like php-github-api
This does not work:
- with a simple file_get_contents() from a PHP script
This is my whole simplified code:
<?php
$content = file_get_contents("https://api.github.com/zen");
var_dump($content);
?>
The browser shows Warning: file_get_contents(https://api.github.com/zen): failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden, and the variable $content is boolean false.
I guess I'm missing some sort of HTTP header fields, but I can't find anything about them in the API docs, and my terminal curl call doesn't send any special headers yet works fine.
This happens because GitHub requires you to send a User-Agent header. It doesn't need to be anything specific. This will do:
$opts = [
    'http' => [
        'method' => 'GET',
        'header' => [
            'User-Agent: PHP'
        ]
    ]
];

$context = stream_context_create($opts);

$content = file_get_contents("https://api.github.com/zen", false, $context);
var_dump($content);
The output is:
string(35) "Approachable is better than simple."
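Since plain curl from a terminal works (the command-line tool sends its own User-Agent header by default, whereas PHP's cURL does not), an equivalent sketch using PHP's cURL functions would be:
$ch = curl_init('https://api.github.com/zen');
curl_setopt_array($ch, [
    CURLOPT_USERAGENT => 'PHP', // any non-empty value will do
    CURLOPT_RETURNTRANSFER => true
]);
$content = curl_exec($ch);
curl_close($ch);
var_dump($content);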

PHP simple_html_dom load_file with user agent

I need to crawl a web site with simple_html_dom->load_file(), and I need to include a user agent. My code is below, but I don't know whether it's right or whether there's a better way to achieve this. Thanks in advance.
$option = array(
    'http' => array(
        'method' => 'GET',
        'header' => 'User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    )
);

$context = stream_context_create($option);

$simple_html_dom = new simple_html_dom();
$simple_html_dom->load_file(CRAWLER_URL, false, $context);
I have tested your method / code and I can confirm it works as intended: the user agent in the HTTP request header is correctly changed to the one you provide in the code. :-)
As for your uncertainty: I usually use the cURL functions to obtain the HTML string (http://php.net/manual/en/ref.curl.php). That way I have more control over the HTTP request, and then (when everything works fine) I use simple_html_dom's str_get_html() function on the HTML string I get with cURL. This makes error handling and dealing with redirects more flexible, and I have implemented some caching...
A simple way to check your problem was solved is to fetch a URL like http://www.whatsmyuseragent.com/ and look in the result for the user-agent string used in the request, to confirm it worked as intended...
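A minimal sketch of that cURL + str_get_html() combination, using the same user agent as in the question (CRAWLER_URL stands in for the real target):
$ch = curl_init(CRAWLER_URL);
curl_setopt_array($ch, array(
    CURLOPT_USERAGENT => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    CURLOPT_FOLLOWLOCATION => true, // handle redirects here rather than in the parser
    CURLOPT_RETURNTRANSFER => true
));
$html = curl_exec($ch);
curl_close($ch);

$simple_html_dom = str_get_html($html); // parse the fetched string instead of load_file()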

File cache in GAE PHP

I'm having some trouble using GAE PHP as a simple proxy using "file_get_contents".
When I load a file for the first time, I get the latest version available.
But if I change the content of the file, I don't get the latest version immediately.
$result = file_get_contents('http://example.com/'.$url);
The temporary solution I found was to add a random variable at the end of the query string, which allowed me to get a fresh version of the file every time:
$result = file_get_contents('http://example.com/'.$url.'?r=' . rand(0, 9999));
But this trick doesn't work for api calls with parameters for example.
I tried disabling the APC cache in GAE's php.ini (using apc.enabled = "0") and I used clearstatcache(); in my script, but neither works.
Any ideas?
Thanks.
As described in the App Engine documentation, the http stream wrapper uses urlfetch. As seen in another question, urlfetch provides a public/shared cache, and as such does not allow individual apps to clear it. For your own services, you can set the HTTP cache headers to reduce or avoid the cache as necessary.
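For instance, if the file being proxied comes from a handler you control, a sketch of the response headers that endpoint could send to discourage caching (the handler itself is hypothetical):
<?php
// Sent by the *fetched* service, not the proxy: tells urlfetch's shared cache
// that this response must not be reused.
header('Cache-Control: no-cache, max-age=0');
readfile($file_path); // $file_path is a placeholder for the real content
?>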
Additionally, you can add HTTP request headers indicating the maximum age of data that is allowed to be returned. The Python example given in the mailing-list thread is:
result = urlfetch.fetch(url, headers = {'Cache-Control' : 'max-age=300'})
Per the php.net file_get_contents HTTP header example and the HTTP header documentation, a modified example would be:
<?php
$opts = [
    'http' => [
        'method' => 'GET',
        'header' => "Cache-Control: max-age=60\r\n",
    ],
];

$context = stream_context_create($opts);

$file = file_get_contents('http://www.example.com/', false, $context);
?>

'&' becomes '&amp;' when trying to get contents from a URL

I was running my web server for months with the same algorithm, getting the content of a URL with this line of code:
$response = file_get_contents('http://femoso.de:8019/api/2/getVendorLogin?' . http_build_query(array('vendor'=>$vendor,'user'=>$login,'pw'=>$pw),'','&'));
But now something must have changed, as all of a sudden it stopped working.
In earlier days the URL looked like it should have been:
http://femoso.de:8019/api/2/getVendorLogin?vendor=100&user=test&pw=test
but now I get an error in my nginx log saying that I requested the following URL, which returned a 403:
http://femoso.de:8019/api/2/getVendorLogin?vendor=100&amp;user=test&amp;pw=test
I know that something changed on the target server, but I don't think that should affect me, should it?
I already spent hours reading and searching through Google and Stack Overflow, but all the suggested fixes, such as urlencode() or htmlspecialchars(), didn't work for me.
For your information, the environment is a zend application with a nginx server on my end and a php webservice with apache on the other end.
Like I said, it changed without any change on my side!
Thanks
Let's find out the culprit!
1) Is it http_build_query? Try replacing:
'http://femoso.de:8019/api/2/getVendorLogin?' . http_build_query(array('vendor' => $vendor, 'user' => $login, 'pw' => $pw))
with:
"http://femoso.de:8019/api/2/getVendorLogin?vendor={$vendor}&user={$login}&pw={$pw}"
2) Is some kind of post-processing in place? Try replacing '&' with chr(38).
3) Maybe give cURL a try and play a little bit with it?
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => 'http://femoso.de:8019/api/2/getVendorLogin?' . http_build_query(array('vendor' => $vendor, 'user' => $login, 'pw' => $pw)),
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HEADER => true, // include response header in result
    //CURLOPT_FOLLOWLOCATION => true, // uncomment to follow redirects
    CURLINFO_HEADER_OUT => true, // track request header, see var_dump below
));

$data = curl_exec($ch);
var_dump($data, curl_getinfo($ch, CURLINFO_HEADER_OUT)); // read info before closing the handle
curl_close($ch);
exit;
Sounds like your arg_separator.output is set to "&amp;" in your php.ini. Either comment that line out or change it to just "&".
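If php.ini can't be edited, the same fix works at runtime, since arg_separator.output can be changed with ini_set(); a minimal sketch:
// Restore the default separator for this script only
ini_set('arg_separator.output', '&');
$url = 'http://femoso.de:8019/api/2/getVendorLogin?' . http_build_query(array('vendor' => $vendor, 'user' => $login, 'pw' => $pw));
Passing '&' explicitly as the third argument to http_build_query() also overrides the ini setting.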
I'm no expert, but that's the way the computer reads the address, since it's a special character. Something with encoding. A simple fix would be to filter it using str_replace(). Something along those lines.
