I have a PHP script that uses cURL to fetch the contents of a URL whose file name contains colons:
$url = 'http://www.awebsite.com/anxml:file:thatoddly:hascolons:allovertheplace:';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
$data = curl_exec($ch);
if(curl_errno($ch)) {
echo 'Curl error: ' . curl_error($ch);
}
curl_close($ch);
I am getting this error:
Could not resolve host: http; nodename nor servname provided, or not known <url here>
I've double checked that the url is working fine otherwise, but I suspect cURL is choking on the colons in the filename. The source isn't mine, so I can't remove the colons.
Is there another way around this?
Update: the provider fixed their files, so I no longer have to deal with the colons. It turns out I was using cURL improperly after all, and urlencode() with the code below would likely have worked.
This DIDN'T WORK:
$url = urlencode($url);
$url = str_replace("http%3A","http:",$url);
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
This DID WORK:
$url = urlencode($url);
$url = str_replace("http%3A","http:",$url);
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$data = iconv("UTF-8","ISO-8859-1",curl_exec($c));
Hope that helps someone out.
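Looking at the man page, the curl command-line tool has a --data-urlencode flag (it URL-encodes the data sent with the request, not the URL itself).
If it's just one URL, and it is being fetched from PHP rather than the CLI, you could use PHP's urlencode() on the file name.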
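As a small illustration of why the encoded part matters, the sketch below compares encoding a whole URL with encoding only the file name. The URL is a shortened, hypothetical version of the one in the question, and whether the remote server accepts %3A in place of a colon is an assumption that would need checking.

```php
<?php
// Hypothetical URL modeled on the one in the question.
$base = 'http://www.awebsite.com/';
$file = 'anxml:file:thatoddly';

// Encoding the whole URL also escapes the "://" and the slashes,
// which is why the question had to turn "http%3A" back into "http:".
$whole = urlencode($base . $file);

// Encoding only the file name leaves the rest of the URL intact.
$fileOnly = $base . rawurlencode($file);

echo $whole, "\n";    // http%3A%2F%2Fwww.awebsite.com%2Fanxml%3Afile%3Athatoddly
echo $fileOnly, "\n"; // http://www.awebsite.com/anxml%3Afile%3Athatoddly
```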
I wrote the following code to get the HTML from a URL. It works for HTTPS sites such as Facebook, but not for Instagram.
Instagram returns a blank page:
<?php
$url = 'https://www.instagram.com';
$returned_content = get_data($url);
print_r($returned_content);
/* gets the data from a URL */
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
?>
Instagram returns mostly JavaScript, and your browser can't load it when the page is served from your own host, because the scripts are referenced by relative paths: <script src='/path/file.js'> is requested from localhost/path/file.js instead of instagram.com/path/file.js. That file doesn't exist on localhost, so the page stays blank.
One solution is to get the full HTML instead of the JavaScript version, and the "User-Agent" header can do this trick. Search-engine crawlers generally don't execute JavaScript, so Instagram (like many websites) serves a version of the page without JS to clients that identify themselves as bots.
So, add this:
curl_setopt($ch, CURLOPT_USERAGENT, "ABACHOBot");
"ABACHOBot" is one crawler. In this page you can find many other alternatives, such as "Baiduspider" or "BecomeBot".
You can also use a generic user agent, such as "bot", "spider" or "crawler"; it will probably work too.
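A minimal sketch of how the user agent could be made configurable follows. The helper name crawler_options() and its defaults are hypothetical, and whether a given site really serves different HTML to a given user agent is an assumption to verify for each site.

```php
<?php
// Hypothetical helper: collects the cURL options in one array so the
// user agent can be swapped without touching the rest of the code.
function crawler_options(string $url, string $userAgent = 'spider'): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_USERAGENT      => $userAgent,
    ];
}

$ch = curl_init();
curl_setopt_array($ch, crawler_options('https://www.instagram.com', 'ABACHOBot'));
// $html = curl_exec($ch);  // uncomment to perform the request
```

Keeping the options in an array also makes it easy to print them while debugging.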
Here, try this one:
<?php
$url = 'https://www.instagram.com';
$returned_content = get_data($url);
print_r($returned_content);
/* gets the data from a URL */
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// Updated: identify as a crawler and skip certificate verification
curl_setopt($ch, CURLOPT_USERAGENT, 'spider');
// Note: disabling peer verification removes protection against man-in-the-middle attacks
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_HEADER, false);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
?>
You should set
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
and the other options as above.
For more detail, please see
http://stackoverflow.com/questions/4372710/php-curl-https
I have a task: given a keyword, fetch the matching Wikipedia article, save it to a database, and then search within the saved articles.
The problem is how to access the API and retrieve data from Wikipedia. I tried this URL (at first I tried the JSON format):
$url = 'https://en.wikipedia.org/w/api.php?action=query&titles=Dog&prop=revisions&rvprop=content&format=xml';
and this php code:
$ch=curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
$res = curl_exec($ch);
if (!$res) {
echo 'cURL Error: '.curl_error($ch);
}
var_dump($res);
but nothing happened. Is it possible to access the data with cURL?
In the end, this code worked with the URL above:
ini_set('user_agent','TestText');
$xmlDoc = new \DOMDocument();
$xmlDoc->load($url);
echo($xmlDoc->saveXML());
and then I get the text like this
{{about|the domestic dog|related species known as "dogs"|Canidae|other
uses|Dog (disambiguation)|}} {{Redirect|Doggie|the Danish
artist|Doggie (artist)}} {{pp-semi-indef}} {{pp-move-indef}} {{Taxobox
| name = Domestic dog | fossil_range = {{Fossil
range|0.033|0}}[[Pleistocene]] – [[Recent]] |
How can I make it prettier (text with paragraphs, or at least plain text)?
So, there are two questions:
1. Is it possible to access wiki data with PHP cURL, and how should I improve my code?
2. How do I make the wiki XML output prettier?
My question is about the code, especially about cURL. Why doesn't it work?
Also, the answer to the other question only talks about Wikipedia API URLs. I can't solve the problem just by changing the URL.
I've found the solution: CURLOPT_SSL_VERIFYPEER had to be set to false:
$url = 'http://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&explaintext=&titles=Dog';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
$res = curl_exec($ch);
//$json_data = mb_substr($res, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
curl_close($ch);
$json = json_decode($res);
$content = $json->query->pages;
$wiki_id = '';
foreach ($content as $key => $value) {
$wiki_id = $key;
}
echo $content = $content->$wiki_id->extract;
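The extraction step at the end of that code can also be written against an associative array, which avoids the loop that only exists to discover the page id. This is a sketch under assumptions: the function name first_extract() is hypothetical, and the sample JSON is a made-up placeholder that only mirrors the query/pages/extract shape used above.

```php
<?php
// Hypothetical helper: returns the first "extract" found under
// query.pages, or null if the response has a different shape.
function first_extract(string $json): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data)
        || !isset($data['query']['pages'])
        || !is_array($data['query']['pages'])) {
        return null;
    }
    foreach ($data['query']['pages'] as $page) {
        if (is_array($page) && isset($page['extract'])) {
            return $page['extract'];
        }
    }
    return null;
}

// Placeholder response (not real Wikipedia data).
$sample = '{"query":{"pages":{"12345":{"title":"Dog","extract":"Sample extract text."}}}}';
echo first_extract($sample), "\n";
```

Returning null for an unexpected response makes it possible to tell a failed request apart from an article with an empty extract.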
I have the following bit of PHP, it works locally (via apache and localhost) but not on my hosting - $response is always empty:
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$api_key = 'my_api_key';
$randomString = generateRandomString(10);
$endLabel = sha1(md5($randomString));
$user_id = $endLabel;
$amount_doge = '5';
$url = "https://dogeapi.com/wow/?api_key=".$api_key."&a=get_new_address&address_label=".$user_id;
$response = get_data($url);
I wondered if this could be because I'm hosted on HTTP (no SSL option) and I'm calling a HTTPS domain? If so, is there a way around this? I've tried curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); but it doesn't seem to do anything :(
Try adding echo curl_error($ch); right after $data = curl_exec($ch); (and before curl_close($ch);) to see what cURL says.
It will report what happened.
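One way to make that check hard to forget is to wrap the request in a function that throws when curl_exec() returns false. This is a sketch, not the asker's code: the name fetch_or_fail() is hypothetical, and the demo call deliberately uses an unsupported scheme so that it fails without touching the network.

```php
<?php
// Hypothetical wrapper: returns the response body or throws with
// the cURL error number and message.
function fetch_or_fail(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);

    $data = curl_exec($ch);
    if ($data === false) {
        // Read the error while the handle is still available.
        throw new RuntimeException(
            'cURL error ' . curl_errno($ch) . ': ' . curl_error($ch)
        );
    }
    return $data;
}

try {
    // Deliberately invalid scheme, used only to trigger an error.
    fetch_or_fail('nosuchscheme://example.invalid/');
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

On the hosting environment, the message printed here would show whether the failure is related to SSL, DNS, or something else.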
I'm trying to execute curl using the following code.
mainFunction{
.
.
$url = strtolower($request->get('url', NULL));
$html_output= $this->startURLCheck($url);
.
.
}
function startURLCheck($url)
{
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$html_output = curl_exec($ch);
}
When I give the URL string directly, this works fine. But when I pass the string through a function, cURL does not execute. curl_error shows no errors either. I tried many encoding and decoding methods for the string, with the same result. Am I doing something wrong? I am working with an XAMPP server on Windows.
I'm passing the URL to this function after getting it from an HTML POST request in another function.
The problem is that your function startURLCheck does not actually return a value for the main program to use. Change the last line:
function startURLCheck($url)
{
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
return curl_exec($ch);
}
In your calling code, take out the "$this->"
$html_output = startURLCheck($url);
$html_output now contains results of the curl call.
I have assumed that you copied and pasted this code from somewhere since your "mainFunction" declaration is syntactically incorrect, and you used "$this->" without specifying that startURLCheck was a method of an object.
If in fact you intend startURLCheck to be an object method and you want it to set $html_output on the object, do this:
<?php
class Example {
private $html_output;
function mainFunction()
{
$url='http://www.ebay.com/itm/Apple-iPhone-5-16GB-Black-Slate-Cricket-intl-UNLOCKED-pleeze-read-ad-/251252227033';
$this->startURLCheck($url);
echo "HTML output: " . $this->html_output;
}
function startURLCheck($url)
{
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$this->html_output = curl_exec($ch);
}
}
$example = new Example();
$example->mainFunction();
I have tested this code on the command line (not in a web page). If you copy and paste this into a file and run it with php followed by the file name (php -r expects the code itself on the command line, not a file), you will see the results. (And note that I didn't include a closing ?> tag. The closing tag is optional when the file contains only PHP code and no HTML. In fact it is recommended that the closing tag be omitted in such cases. See http://php.net/manual/en/language.basic-syntax.instruction-separation.php)
Please also note in your question code for mainFunction you have illegal spaces before "pleeze" in your URL and you are missing the semicolon at the end of the $url assignment.
Hope this helps. Good luck.
This works well:
<?php
function excURL()
{
$ch = curl_init();
$url = "http://www.google.com";
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$html_output = curl_exec($ch);
echo $html_output;
}
excURL();
?>
Hey guys, I have finally found the problem.
When I set CURLOPT_FOLLOWLOCATION for cURL, it works fine:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
But it is still not clear why it worked when I hardcoded the URL inside the function and did not work when I passed the URL into the function as a variable, without setting CURLOPT_FOLLOWLOCATION. With this option set, it works both ways.
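One plausible explanation, offered only as a guess, is that strtolower() in the calling code changed the URL enough for the server to answer with a redirect, which cURL does not follow by default. The array returned by curl_getinfo() can confirm this. The helper explain_status() below is a hypothetical sketch that reads the http_code and redirect_url entries of that array.

```php
<?php
// Hypothetical helper: summarizes the array returned by curl_getinfo().
function explain_status(array $info): string
{
    $code = $info['http_code'] ?? 0;
    if ($code === 0) {
        return 'No HTTP response; check curl_error()';
    }
    if ($code >= 300 && $code < 400) {
        $target = $info['redirect_url'] ?? '';
        return "Redirect ($code) to "
            . ($target !== '' ? $target : 'an unknown location')
            . '; enable CURLOPT_FOLLOWLOCATION to follow it';
    }
    return "Final response with status $code";
}

// Typical use after a request: echo explain_status(curl_getinfo($ch));
// Demo with a hand-written array instead of a real request:
echo explain_status(['http_code' => 301, 'redirect_url' => 'https://www.example.com/']), "\n";
```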
When I use the cURL library to fetch an image from a URL, I get a 400 Bad Request error.
I found that the problem is with the URL encoding.
But the usual fix doesn't work in my case, because my URL is the path to an image on the server, like
http://example.com/images/products/product 1.jpg
I understand that using spaces in file names is bad practice, but it's not my server and I didn't create those files.
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, urlencode($url));
echo $ret = curl_exec($ch);
When I use the urlencode function, cURL returns http_code = 0.
Update:
$url = str_replace(' ', '+', $url);
doesn't work; the server returns a 404 error.
Does this maybe work?
$url = 'http://host/a b.img';
$url = str_replace(" ", '%20', $url);
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
echo $ret = curl_exec($ch);
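You need to use the rawurlencode() function, applied to the file name rather than to the whole URL (encoding the whole URL would also escape the "://" and the slashes):
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, dirname($url) . '/' . rawurlencode(basename($url)));
echo $ret = curl_exec($ch);
rawurlencode() encodes a space as %20, which is what a URL path needs. urlencode() encodes a space as +, which is only meaningful in query strings and form data. For more details look at this SO answer.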
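You can't urlencode the entire string because that will encode the slashes and other characters that need to remain unencoded. If spaces are your only problem, this will do (in a path a space must become %20; a + is treated as a literal plus sign, which is why the server returned 404):
$url = str_replace(' ', '%20', $url);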
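For URLs that may contain more than spaces, one option is to encode each path segment separately and leave the scheme, host and slashes alone. The function below is a sketch under assumptions: the name encode_url_path() is hypothetical, it expects an absolute URL that is not already percent-encoded (otherwise the % signs would be encoded a second time), and it drops user info and fragments for brevity.

```php
<?php
// Hypothetical helper: percent-encodes each path segment of an
// absolute URL while keeping scheme, host, port and query as they are.
function encode_url_path(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    // Split on "/" so the separators themselves are never encoded.
    $segments = explode('/', $parts['path'] ?? '');
    $path = implode('/', array_map('rawurlencode', $segments));

    $result = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $path;
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    return $result;
}

echo encode_url_path('http://example.com/images/products/product 1.jpg'), "\n";
// http://example.com/images/products/product%201.jpg
```

The result can then be passed to CURLOPT_URL unchanged.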
If the PHP cURL extension is giving you trouble, you can call the curl command-line tool through shell_exec instead (note that -k disables certificate verification):
$cmd = "curl -k -v -X POST 'https://serveraddress:8243/?foo=bar' -H 'Authorization: Basic cEZ.....h'";
$result = shell_exec($cmd);
$output = json_decode($result,true);
print_r($output);
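When the URL or the headers come from variables, the command string should be escaped before it reaches the shell. A minimal sketch, assuming the curl binary is on the PATH; the helper name build_curl_command() is hypothetical and the URL and header are placeholders.

```php
<?php
// Hypothetical helper: builds a curl command line with every
// user-supplied value passed through escapeshellarg().
function build_curl_command(string $url, array $headers = []): string
{
    $parts = ['curl', '-s'];
    foreach ($headers as $header) {
        $parts[] = '-H';
        $parts[] = escapeshellarg($header);
    }
    $parts[] = escapeshellarg($url);
    return implode(' ', $parts);
}

$cmd = build_curl_command('https://example.com/?foo=bar', ['Accept: application/json']);
echo $cmd, "\n";
// $result = shell_exec($cmd);  // uncomment to run the command
```

Without the escaping, a quote or semicolon in the URL could end the argument early and run an unintended command.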