Simple_HTML_dom and Curl stopped working on a server - php

I've had several scripts that pull information from a different server, and they worked perfectly for about a year.
Recently, it looks like they've changed something, and both simple_html_dom and curl now fail, either with a "failed to open stream: HTTP request failed!" error or by simply freezing.
Sometimes I can successfully get ONE request through, but only one. The problem is that the script needs to make more than one request to that server.
The page I am trying to pull is http://www.ic.gc.ca/app/opic-cipo/trdmrks/srch/cntnBscSrch.do?textField1=otimo&selectField1=tmlookup_ext&useblg=bscSrch.do%3Flang%3Deng&languageDirection=f&lang=eng&submitButton=Search&selectMaxDoc=1000000&selectDocsPerPage=1000000
I would really appreciate any help.
This is a simplified version of the code, which results in the same problem:
<?php
require("simple_html_dom.php");
function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return the data instead of printing it to the browser.
    curl_setopt($ch, CURLOPT_URL, $url);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$link = "http://www.ic.gc.ca/app/opic-cipo/trdmrks/srch/cntnBscSrch.do?textField1=otamo&selectField1=tmlookup_ext&useblg=bscSrch.do%3Flang%3Deng&languageDirection=f&lang=eng&submitButton=Search&selectMaxDoc=1000000&selectDocsPerPage=1000000";
$html = file_get_html($link);
echo "SIMPLE HTML DOM:<br>".$html;
$html = file_get_contents_curl($link);
echo "CURL:<br>".$html;
?>
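One way to narrow this down is to make cURL report why a request fails instead of hanging silently. The sketch below is not a confirmed fix: it only assumes that the remote server may be stalling or rejecting requests, which is a guess. The function names fetch_with_diagnostics and describe_result, the User-Agent string and the timeout values are all made up for illustration.

```php
<?php
// Hypothetical helper: fetch a URL and return the body together with
// the HTTP status code and any cURL error, so failures are visible.
function fetch_with_diagnostics($url, $timeoutSeconds = 30) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Give up instead of freezing (values are arbitrary assumptions).
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    // Illustrative User-Agent; some servers treat requests without one differently.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)');
    $body = curl_exec($ch);
    $result = array(
        'body'      => $body,
        'http_code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'errno'     => curl_errno($ch),
        'error'     => curl_error($ch),
    );
    curl_close($ch);
    return $result;
}

// Hypothetical helper: turn the result array into a one-line summary.
function describe_result(array $result) {
    if ($result['body'] === false) {
        return 'cURL error ' . $result['errno'] . ': ' . $result['error'];
    }
    return 'HTTP ' . $result['http_code'] . ', ' . strlen($result['body']) . ' bytes';
}
```

Calling echo describe_result(fetch_with_diagnostics($link)); for each request, with a sleep() between calls, should at least show whether the second request times out, is refused, or comes back with an error status.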

Related

CURl not returning an entire page

I am using cURL to retrieve pages for a small search engine project I am working on, but on some pages it's not retrieving the entire page.
The function that I have set up is:
public function grabSourceCode($url) {
    // Try to get the source code using cURL
    $ch = curl_init();
    $timeout = 50; // connection timeout in seconds
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_USERAGENT, 'NameBot/0.2');
    $source_code = curl_exec($ch);
    curl_close($ch);
    return $source_code;
}
and I am retrieving the page using:
$Crawler->grabSourceCode('https://sedo.com/search/searchresult.php4?keyword=cats&language_output=e&language=e')
On this page I get everything, but on this page I only get part of the page.
I have tried using file_get_contents(), but that has the same results.
It seems to be an issue with dynamic loading of the page: when I run the browser with JavaScript blocked, it shows the same results as the cURL function.
Is there any way to do this in PHP, or would I have to look at another language, such as JavaScript?
Thanks, Daniel

Issue with cURL in PHP, returning false regardless

I'm rather new to using cURL in PHP. I was told to use it for a task this week and it's been nothing but a pain, and I can't seem to find a solution to my problem no matter how hard I search.
What I am attempting to do is send a file to an upload directory on my hosted server from a remote portal manager that I have built. The file upload handler in the portal manager connects via curl to the remote destination, and the remote destination then grabs the info and processes the file like normal. No matter what I try, though, everything just throws back a failed response.
Here is the updated version of the code I am working with:
updates
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, false);
//curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt(
    $ch,
    CURLOPT_POSTFIELDS,
    array(
        'file' =>
            '#' .$_FILES['doc']['tmp_name'][$i]
            .';filename=' .$_FILES['doc']['name'][$i]
            .';type=' .$_FILES['doc']['type'][$i]
    )
);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//$response = curl_exec($ch);
if (curl_exec($ch) === false) {
    echo curl_error($ch);
    echo curl_errno($ch);
} else {
    echo "ok";
}
With all of this information set I am getting no value for curl_error but I get a value of 43 for curl_errno.
From what I have been researching, error 43 for curl is
CURLE_BAD_FUNCTION_ARGUMENT (43)
Internal error. A function was called with a bad parameter.
However, all my curl_setopt() calls are put together correctly based on the info from php.net. So this is where I am now confused, because I have no idea what is causing this to happen. Thanks again for the help!
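For comparison, here is a minimal sketch of the upload written with the CURLFile class, which PHP has offered since 5.5 for attaching files; the older string form that prefixed the path with '@' was deprecated in PHP 5.5 and is no longer supported in PHP 7. This is not a diagnosis of error 43, only the documented way to build the field. The helper names build_upload_fields and send_upload are hypothetical.

```php
<?php
// Hypothetical helper: describe one uploaded file for a multipart POST.
// CURLFile takes the path on disk, the MIME type, and the name to send.
function build_upload_fields($tmpPath, $originalName, $mimeType) {
    return array('file' => new CURLFile($tmpPath, $mimeType, $originalName));
}

// Hypothetical helper: POST the fields and report success or the cURL error.
function send_upload($url, array $fields) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    // Passing an array makes cURL send multipart/form-data.
    curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    $result = array(
        'ok'       => $response !== false,
        'response' => $response,
        'errno'    => curl_errno($ch),
        'error'    => curl_error($ch),
    );
    curl_close($ch);
    return $result;
}
```

Inside the loop from the question, the fields would be built with build_upload_fields($_FILES['doc']['tmp_name'][$i], $_FILES['doc']['name'][$i], $_FILES['doc']['type'][$i]) and passed to send_upload($url, $fields).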

How to get GitHub raw file contents with PHP?

I am browsing the GitHub info/docs but can't find any simple example of how to
get the contents of a raw GitHub file.
For example if I try to use
$url ="https://raw.githubusercontent.com/octocat/Spoon-Knife/master/index.html";
echo file_get_contents($url);
I get failed to open stream: HTTP...
If I use curl, I get the same thing: a 404 page. So obviously I have to use the API, but there is just no simple example of how to do it.
Can someone please post a plain example?
Thank you!
You can use a different method, cURL, which lets you skip SSL certificate verification and add more options if you want.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://raw.githubusercontent.com/octocat/Spoon-Knife/master/index.html');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);
echo $data;
This works fine for me here; you just need to enable the curl extension if it isn't already enabled.

Timeout when passing variables with the url and fopen() in php

I have done quite a bit of searching and cannot quite find my answer. My problem is that I am trying to call a link with GET variables attached to it and it just hangs and hangs until connection times out. When I just literally call the link in a web browser it works fine no problem.
Here is the fopen() php code example:
<?php
$url = "https://www.mysite.com/folder/second_folder/file.php?varA=val1&varB=val2&varC=val3&varD=val4&varE=val5";
$ch = fopen($url, 'r');
if(!$ch){
echo "could not open!!! $url";
} else {
echo "Success! ($url)";
}
?>
I can call file.php without the GET variables just fine. Returns with no error.
NOTE: when one of the vars is passed, file.php runs some functions and then does a header Location redirect. I do not think it even gets that far before the connection times out, though, because when I had problems I put a "check point" before the header Location call that should email me, and it does not email me.
Again, if I run the URL in a web browser it works just fine.
So what is going on if anyone can help me? I just need to run the URL as if PHP is clicking on the links. I have used fopen before but for some reason it does not work now. Also cURL did not work on this.
Try changing '' to " " in this case.
My working code is
<?php $handle = fopen("c:\\folder\\resource.txt", "r"); ?>
I think you want to be using
$ch = file_get_contents($url);
Edit: cURL option
// open
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS, 1);
curl_setopt($ch, CURLOPT_FORBID_REUSE, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$page_data = curl_exec($ch);
$page_info = curl_getinfo($ch);
// close
curl_close ($ch);

How can I output the body of a webpage if the response code is 500 using curl?

I've been through hell and high water with this problem. I get a 500 error on a page a tiny, tiny fraction of the time. I have been completely unable to reproduce it, but Google insists that they see a 500 code. Fetch as Googlebot says it's successful, however something is wrong. I've been down many avenues and the only recourse I have left is to brute-force the local copy of the website.
I want to use curl to hammer the dev site until I get a 500 error, and when I do, to output the body of the page to the terminal so I can actually get some useful information.
for (;;) {
    $url = "http://www.blahblah.dev/";
    $ch = curl_init();
    // Set the URL
    curl_setopt($ch, CURLOPT_URL, $url);
    // Return the response instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Enable POST data
    curl_setopt($ch, CURLOPT_POST, true);
    // Use the $jData array as the POST data
    curl_setopt($ch, CURLOPT_POSTFIELDS, $jData);
    $result = curl_exec($ch);
    if (strstr($result, "error")) {
        echo $result;
        exit();
    }
    curl_close($ch);
    usleep(500000);
}
As you can see, I'm simply checking to see if "error" appears in the body, as I can't figure out how to check for a 500 error properly. I realize that this is a terrible and contrived way of debugging, but it's all I've got at this point. Thanks!
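A minimal sketch of checking the status code directly, instead of searching the body for "error": curl_getinfo() with CURLINFO_HTTP_CODE returns the status of the last response, and because CURLOPT_FAILONERROR is off by default, curl_exec() still returns the body of a 500 page. The helper names is_server_error and fetch_status_and_body are hypothetical.

```php
<?php
// Hypothetical helper: true for any 5xx status code.
function is_server_error($statusCode) {
    return $statusCode >= 500 && $statusCode <= 599;
}

// Hypothetical helper: perform the request and return array($status, $body).
function fetch_status_and_body($url, $postFields = null) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    if ($postFields !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
    }
    $body = curl_exec($ch);
    // Status code of the last response; 0 if no response was received.
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return array($status, $body);
}
```

In the loop above, list($status, $body) = fetch_status_and_body($url, $jData); followed by if (is_server_error($status)) { echo $body; exit(); } would stop on the first 5xx response and print its body.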
