I'm trying to scrape the below feed (with permission) via PHP cURL:
http://www.safc.com/Home/RSS Feeds/News%20Feed
It loads fine in a browser, but cURL gives me a 400 'Bad Request'.
$ch = curl_init($uri); //http://www.safc.com/Home/RSS Feeds/News%20Feed
curl_setopt_array($ch, array(
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_ENCODING => '',
CURLOPT_TIMEOUT => CURL_CONNECT_TIMEOUT,
CURLOPT_USERAGENT => CURL_USER_AGENT,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_FOLLOWLOCATION => true
));
$ret = curl_exec($ch);
Result is a 400; I know this from looking in curl_getinfo().
CURL_USER_AGENT is an arbitrary identifier as I realised some other feeds wouldn't spit out content unless this header was present. I have tried removing the headers one by one, and tried adding a few more, but that approach feels a bit needle/haystack.
Before I approach the owners of the site, does anyone know how I might resolve this?
Use http://www.safc.com/home/rss%20feeds/news%20feed instead. Note the difference between "Home" and "home": the server responds with a 301 redirect when you use "Home".
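The URL in the question also contains a literal space ("RSS Feeds"), while the URL suggested above has every space encoded as %20. A minimal sketch of normalising a path before handing it to cURL follows, assuming the unencoded space contributes to the 400. The helper name encode_url_path is hypothetical, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: percent-encode each segment of a URL path.
// Decoding first avoids double-encoding segments that already contain %20.
function encode_url_path(string $path): string
{
    $segments = explode('/', $path);
    $encoded = array_map(
        static fn(string $segment): string => rawurlencode(rawurldecode($segment)),
        $segments
    );
    return implode('/', $encoded);
}

// Mixed input: one raw space and one space that is already encoded.
$uri = 'http://www.safc.com' . encode_url_path('/home/rss feeds/news%20feed');
// $uri can now be passed to curl_init($uri).
```

Encoding segment by segment keeps the slashes intact, which a single rawurlencode() call over the whole path would not.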
I have searched a lot and found many related questions and forum threads, but this one is challenging.
I'm trying to POST a complex array via cURL. It has to be form-data, while the first value in the array is of type JSON.
The other two values of the array are two images, which are uploaded and ready to send.
I tried it in Postman and it works perfectly. I then used the PHP code generated by Postman, but it does not work. It seems Postman handles some details behind the scenes without revealing them.
Anyway, I'm posting a Postman image to illustrate what I mean:
As you can see, I'm sending the data in form-data tab, my first value (param1) is a JSON with content-type application/json, the second and third values are images uploaded in Postman.
This works just fine in Postman.
The problem is, if I set Content-Type:multipart/form-data in header, the destination server throws an error saying the content-type must be JSON.
If I set the Content-Type:application/json in header, the destination server says content must be of type Multipart.
Somehow, I need to set both content-types. The main one as form-data and the one for param1 as JSON.
I'm pasting the Postman code as well; it may be a good starting point for anyone who wants to help.
Postman Code:
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => 'http://xxxxx.com/xxxx/xxx/xxxx',
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => '',
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 0,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => 'POST',
CURLOPT_POSTFIELDS => array('param1' => '{
"AgentId":"1414",
"ContractId":36529,
"Files":[
{
"FileName":"car_card_front_image.png",
"FileTypeId":2
},
{
"FileName":"car_card_back_image.png",
"FileTypeId":2
}
]
}','param2'=> new CURLFILE('/C:/images/icons/car_card_back_image.png'),'param3'=> new CURLFILE('/C:/images/icons/car_card_front_image.png')),
CURLOPT_HTTPHEADER => array(
'authenticationToken: xxxx-xxx-xx-xxxxxxxx'
),
));
$response = curl_exec($curl);
curl_close($curl);
echo $response;
The PHP code generated by Postman is not working. One reason may be that no content-type is mentioned in it.
I tried modifying the code, adding content-types in the header and in the parameter, but nothing seems to work.
If Postman can do it, we should be able to do it too, right?
Go ahead, make as many changes as you like or suggest anything that comes to mind; I will test them all.
Cheeeeers...
May I suggest the ixudra/curl library?
It would make your life easier...
$response = Curl::to('http://example.org')
->withData( array( 'Foo' => 'Bar' ) )
->withFile( 'image_1', '/path/to/dir/image1.png', 'image/png', 'imageName1.png' )
->withFile( 'image_2', '/path/to/dir/image2.png', 'image/png', 'imageName2.png' )
->post();
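For the original requirement (a form-data request whose first part is JSON with its own content type), one option in plain PHP is to assemble the multipart body by hand. The following is a minimal sketch, not the method Postman uses internally: build_multipart_body is a hypothetical helper, the image bytes are a placeholder, and the cURL wiring at the end is an assumption about how the body would be sent.

```php
<?php
// Hypothetical helper: build a multipart/form-data body by hand so each part
// can carry its own Content-Type (for example application/json for param1).
function build_multipart_body(array $parts, string $boundary): string
{
    $body = '';
    foreach ($parts as $part) {
        $disposition = 'Content-Disposition: form-data; name="' . $part['name'] . '"';
        if (isset($part['filename'])) {
            $disposition .= '; filename="' . $part['filename'] . '"';
        }
        $body .= '--' . $boundary . "\r\n"
            . $disposition . "\r\n"
            . 'Content-Type: ' . $part['type'] . "\r\n\r\n"
            . $part['content'] . "\r\n";
    }
    // The closing delimiter ends with two extra dashes.
    return $body . '--' . $boundary . "--\r\n";
}

$boundary = 'boundary-' . bin2hex(random_bytes(8));
$json = json_encode(['AgentId' => '1414', 'ContractId' => 36529]);
$body = build_multipart_body([
    ['name' => 'param1', 'type' => 'application/json', 'content' => $json],
    // Placeholder bytes; a real call would use file_get_contents() on the image.
    ['name' => 'param2', 'type' => 'image/png',
     'filename' => 'car_card_back_image.png', 'content' => 'PNG-BYTES'],
], $boundary);

// Assumed wiring: send the string body and declare the boundary in the header.
if (function_exists('curl_init')) {
    $options = [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $body,
        CURLOPT_HTTPHEADER => ['Content-Type: multipart/form-data; boundary=' . $boundary],
    ];
}
```

Passing a string to CURLOPT_POSTFIELDS stops cURL from generating its own multipart encoding, so the request-level Content-Type header with the matching boundary has to be set explicitly.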
When using Postman locally on my machine, I am able to send the request without problems and get a response back. Because I am sending the API an invalid token, I should receive this back:
{
"status": "Error",
"message": "Invalid API Token"
}
Using Postman's utility to generate PHP cURL code for this request, I get this:
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => "https://app.mobilecause.com/api/v2/reports/transactions.json",
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 30,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => "GET",
CURLOPT_POSTFIELDS => "",
CURLOPT_COOKIESESSION => true,
CURLOPT_COOKIEFILE => "cookie.txt",
CURLOPT_COOKIEJAR => "cookie.txt",
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HTTPHEADER => array(
'Authorization: Token token="test_token"',
"Content-Type: application/x-www-form-urlencoded",
"cache-control: no-cache",
),
));
curl_setopt($curl, CURLOPT_VERBOSE, true);
$response = curl_exec($curl);
$err = curl_error($curl);
curl_close($curl);
if ($err) {
echo "cURL Error #:" . $err;
} else {
echo $response;
}
Running this code on my web server returns a page body that is a Cloudflare landing page, specifically this:
Please enable cookies.
One more step
Please complete the security check to access app.mobilecause.com
Why do I have to complete a CAPTCHA?
Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.
What can I do to prevent this in the future?
If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.
If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.
Cloudflare Ray ID: RAY_ID • Your IP_REDACTED • Performance & security by Cloudflare
I cannot explain why this happens. I have a valid 'cookie.txt' that is getting written to, but it seems like it is missing content.
The cookie that curl writes through this request stored in 'cookie.txt' looks like this. (Redacted potentially sensitive information.)
#HttpOnly_.app.mobilecause.com TRUE / FALSE shortStringOfNumbers __cfduid longStringOfNumbers
The cookies generated by postman when executing the command through postman look like this. (Redacted potentially sensitive information.)
__cfruid=longStringOfNumbers-shortStringOfNumbers; path=/; domain=.app.mobilecause.com; HttpOnly; Expires=Tue, 19 Jan 2038 03:14:07 GMT;
__cfduid=longStringOfNumbers; path=/; domain=.app.mobilecause.com; HttpOnly; Expires=Thu, 23 Jan 2020 04:54:50 GMT;
Essentially it seems like the php request is missing the '__cfruid' cookie. Could this be the cause?
Copying this exact code into http://phpfiddle.org/ produces this same cloudflare landing page. Running this locally on my machine produces the expected result.
You're running into a Managed Challenge: https://developers.cloudflare.com/fundamentals/get-started/concepts/cloudflare-challenges/
The key question here is whether you own the zone. The site owner can add a managed challenge for pretty much any reason as part of their WAF: https://developers.cloudflare.com/waf/ . We could speculate about whether it's due to your traffic being deemed a bot or maybe they're blocking based on your user agent string. You don't have control over managed challenges that are served to you if you don't own the domain in Cloudflare.
If you are the site owner, you can determine which rule is causing this Managed Challenge by taking the Cloudflare Ray ID and filtering for it in Security > Overview. You can then add a bypass to your firewall rule to exclude this PHP cURL traffic.
I am going to send a login request to the server using cURL in my PHP code. The request is a POST, and I wrote the following code:
if (strpos($header, 'POST') !== false) {
    curl_setopt_array($curl, array(
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_HEADER => 1,
        CURLOPT_URL => $url,
        CURLOPT_USERAGENT => 'proxy',
        CURLOPT_POST => 1,
        CURLOPT_POSTFIELDS => $postParameters,
        CURLOPT_COOKIE => $postCoockies
    ));
    $respond = curl_exec($curl);
    curl_close($curl);
}
Here is what my variables contain:
$url is the exact string after POST and before HTTP/1.1.
When I want to send a GET request, I use this same part as my URL, which then contains all the parameters in the query string.
$postParameters is the exact string that exists in the POST body.
(That's a string, not an associative array.)
$postCoockies is the exact string after Cookie: in my request header.
(Again, that's not an associative array.)
My problem is:
When I send this request, the server only responds to the URL I am passing, which loads the login page again! It seems that the server is not receiving my parameters!
Also, when I used a GET request for login (I have access to the source code of the website) and sent the whole URL and parameters inside $url, the same thing happened.
Am I doing something wrong in sending my request?
Should I consider something else here?
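One thing that may be worth checking, given the variables described above: the string between POST and HTTP/1.1 is usually only a path, not an absolute URL, so the host has to come from the Host header. The sketch below splits a raw request into the pieces passed to cURL. It is only an illustration under that assumption; parse_raw_request, the sample request and the http default scheme are all hypothetical, and it does not claim to identify the actual cause of the problem.

```php
<?php
// Hypothetical helper: split a raw HTTP request into the pieces cURL needs.
function parse_raw_request(string $raw, string $scheme = 'http'): array
{
    // Headers and body are separated by the first blank line.
    $sections = explode("\r\n\r\n", $raw, 2);
    $body = $sections[1] ?? '';
    $lines = explode("\r\n", $sections[0]);

    // Request line: METHOD SP request-target SP HTTP-version
    [$method, $target] = explode(' ', array_shift($lines), 3);

    $headers = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip malformed header lines
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }

    // A path-only target is not a URL cURL can fetch; prefix it with the Host header.
    $url = strpos($target, '://') !== false
        ? $target
        : $scheme . '://' . ($headers['host'] ?? '') . $target;

    return [
        'method' => $method,
        'url'    => $url,
        'cookie' => $headers['cookie'] ?? '',
        'body'   => $body,
    ];
}

// Made-up request for illustration only.
$raw = "POST /login.php HTTP/1.1\r\n"
     . "Host: example.org\r\n"
     . "Cookie: PHPSESSID=abc123\r\n"
     . "\r\n"
     . "user=alice&pass=secret";
$request = parse_raw_request($raw);
```

The 'url', 'body' and 'cookie' entries correspond to CURLOPT_URL, CURLOPT_POSTFIELDS and CURLOPT_COOKIE in the code above.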
We've gotten permission to periodically copy a webcam image from another site. We use cURL functions elsewhere in our code, but when trying to access this image, we are unable to.
I'm not sure what is going on. The code we use for many other cURL functions is like so:
$image = 'http://island-alpaca.selfip.com:10202/SnapShotJPEG?Resolution=640x480&Quality=Standard'
$options = array(
CURLOPT_URL => $image,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10
);
$ch = curl_init();
curl_setopt_array($ch, $options);
$cURL_source = curl_exec($ch);
curl_close($ch);
This code doesn't work for the following URL (webcam image), which is accessible in a browser from our location: http://island-alpaca.selfip.com:10202/SnapShotJPEG?Resolution=640x480&Quality=Standard
When I run a test cURL, it just seems to hang for the length of the timeout. $cURL_source never has any data.
I've tried some other cURL examples online, but to no avail. I'm assuming there's a way to build the cURL request to get this to work, but nothing I've tried seems to get me anywhere.
Any help would be greatly appreciated.
Thanks
I don't see any problems with your code. You can sometimes get errors because of various network problems. You can retry in a loop to increase the chances of success.
Something like:
$image = 'http://island-alpaca.selfip.com:10202/SnapShotJPEG?Resolution=640x480&Quality=Standard';
$tries = 3; // max tries to get good response
$retry_after = 5; // seconds to wait before new try
while ($tries > 0) {
    $options = array(
        CURLOPT_URL => $image,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT => 10,
        CURLOPT_MAXREDIRS => 10
    );
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $cURL_source = curl_exec($ch);
    curl_close($ch);
    if ($cURL_source !== false) {
        break;
    } else {
        $tries--;
        sleep($retry_after);
    }
}
Can you fetch the URL from the server where this code is running? Perhaps it has firewall rules in place? You are fetching from a non-standard port: 10202. It must be allowed by your firewall.
I, like the others, found it easy to fetch the image with curl/php.
As was said before, I can't see any problem with the code either. However, you might consider a longer timeout for cURL, to be sure that this slow-loading picture finally gets loaded. So, as one possibility, try increasing CURLOPT_TIMEOUT to a very large number, along with the corresponding timeout for PHP script execution. It may help.
Perhaps the best option is to combine the previous answer's approach with this one.
I tried wget on the image URL and it downloads the image and then seems to hang - perhaps the server isn't correctly closing the connection.
However I got file_get_contents to work rather than curl, if that helps:
<?php
$image = 'http://island-alpaca.selfip.com:10202/SnapShotJPEG?Resolution=640x480&Quality=Standard';
$imageData = base64_encode(file_get_contents($image));
$src = 'data: '.mime_content_type($image).';base64,'.$imageData;
echo '<img src="',$src,'">';
Are you sure it's not working? Your code is working fine for me (after adding the missing semicolon after $image = ...).
The reason it might be giving you trouble is that it's not actually a single image; it's an MJPEG stream. It uses an HTTP session that is kept open with multipart content (similar to what you see in MIME email), and the server pushes a new JPEG frame to replace the last one at an interval. cURL seems to be happy just giving you the first frame, though.
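If the response really is a multipart MJPEG stream as described above, a single frame can be cut out of whatever bytes have arrived by looking for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers. This is a rough sketch under that assumption: extract_first_jpeg is a hypothetical helper, the sample buffer is made up, and frames with embedded thumbnails contain nested markers that this simple scan does not handle.

```php
<?php
// Hypothetical helper: return the first complete JPEG found in a byte buffer,
// or null if no complete frame is present yet.
function extract_first_jpeg(string $buffer): ?string
{
    $start = strpos($buffer, "\xFF\xD8"); // SOI (start of image) marker
    if ($start === false) {
        return null;
    }
    $end = strpos($buffer, "\xFF\xD9", $start + 2); // EOI (end of image) marker
    if ($end === false) {
        return null;
    }
    // Include the two bytes of the EOI marker itself.
    return substr($buffer, $start, $end - $start + 2);
}

// Made-up buffer shaped like one part of a multipart MJPEG response.
$chunk = "--frame\r\nContent-Type: image/jpeg\r\n\r\n"
       . "\xFF\xD8JPEGDATA\xFF\xD9"
       . "\r\n--frame\r\n";
$frame = extract_first_jpeg($chunk);
```

A helper like this could be called on the accumulated response data so the transfer can be stopped once one frame is complete, instead of waiting for a stream that never ends.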
I have a PHP file that invokes another PHP file via cURL. I am trying to have the second file send a response back to the first to let it know that it started. The problem is that the first can't wait for the second to finish execution, because that can take a minute or more; I need the second to send a response immediately and then go about its regular business. I tried using an echo at the top of the second file, but the first doesn't get that as a response.
How do I send back a response without finishing execution?
file1.php
<?php
$url = 'file2.php';
$params = array('data'=>$data,'moredata'=>$moredata);
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "Mozilla", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLOPT_TIMEOUT => 10, // don't wait too long
CURLOPT_POST => true, // Use Method POST (not GET)
CURLOPT_POSTFIELDS => http_build_query($params)
);
$ch = curl_init($url);
curl_setopt_array( $ch, $options );
$response = curl_exec($ch); // See that the page started.
curl_close($ch);
echo 'Response: ' . $response;
?>
file2.php
<?php
/* This is the top of the file. */
echo 'I started.';
.
.
.
// Other CODE
.
.
.
?>
When I run file1.php, the result is 'Response: ' but I expect it to be 'Response: I started.' I know that file2.php gets started because 'Other CODE' gets executed, but the echo doesn't get sent back to file1.php. Why?
This could be just what you're looking for. Forking in PHP:
http://framework.zend.com/manual/en/zendx.console.process.unix.overview.html
A process divides in two, and one becomes the parent of the other. The parent can tell the client that the job has just begun while the child does the work. When the child finishes, it can report to the parent, which can in turn report to the client.
Keep in mind there are many requirements for this to run:
Linux
CLI or CGI interface
shmop, pcntl and posix extensions (require recompiling)
The answer ended up being that CURL does not behave like a browser:
PHP Curl output buffer not receiving response
I ended up running my 2nd file first and my 1st file second. The 2nd file waited for a 'finished' file write that the 1st file did once it, obviously, finished.
At this point, it seems like a database would be a better place to store messages that the files pass between each other, but a file also works for a quick and dirty job.
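The 'finished' file approach described above can be sketched roughly as follows. This is an illustration rather than the code actually used: wait_for_flag, the flag path and the timeout values are all assumptions.

```php
<?php
// Hypothetical helper: poll for a flag file until it appears or the timeout passes.
function wait_for_flag(string $flagPath, float $timeoutSeconds, int $pollMicroseconds = 100000): bool
{
    $deadline = microtime(true) + $timeoutSeconds;
    do {
        clearstatcache(true, $flagPath); // avoid a stale cached file_exists() result
        if (file_exists($flagPath)) {
            return true;
        }
        usleep($pollMicroseconds);
    } while (microtime(true) < $deadline);
    return false;
}

// The long-running script would signal completion like this (path is an assumption).
$flag = sys_get_temp_dir() . '/job-' . bin2hex(random_bytes(4)) . '.finished';
file_put_contents($flag, 'finished');

// The waiting script polls for the flag with an upper bound on the wait.
$done = wait_for_flag($flag, 2.0);
unlink($flag); // clean up the flag once it has been seen
```

The timeout keeps the waiting script from blocking forever if the other script dies before writing the flag.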