cURL: set the language header
I was trying to fetch the source of Facebook's homepage with cURL, but the page came back entirely in Chinese because of where my server is hosted. To switch the language to English, I added an Accept-Language header via CURLOPT_HTTPHEADER, but it had no effect. Following the answer I quoted above, this is the PHP cURL code I tried:
<?php
$url = "http://www.facebook.com/";
if(isset($_SERVER['HTTP_USER_AGENT']))
$user_agent = $_SERVER['HTTP_USER_AGENT'];
else
$user_agent = "";
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_HTTPHEADER => array("Accept-Language: en-US;q=0.6,en;q=0.4"),
CURLOPT_USERAGENT => $user_agent);
$ch = curl_init($url);
curl_setopt_array($ch, $options);
$content = curl_exec($ch);
$err = curl_errno($ch);
$errmsg = curl_error($ch);
$header = curl_getinfo($ch);
curl_close($ch);
echo $content;
?>
But the page was still in Chinese.
How can I solve this problem?
Related
I was looking for sample code for PHP and cURL and found this link: http://www.php-guru.in/2013/upload-files-using-php-curl/
I adapted that code to the gifs.com API to convert a GIF to MP4 (for speed reasons) and then display it on my site. I tried uploading a Giphy URL to gifs.com and ended up with the code below.
$url = 'https://api.gifs.com/media/upload';
$headers = array("Content-Type:multipart/form-data", "Gifs-API-Key:gifkey"); // cURL headers for file uploading
$postfields = array("file" => "#https://media.giphy.com/media/l378drKbCncSKYbS0/giphy.gif", "title" => 'guineapig');
$ch = curl_init();
$options = array(
CURLOPT_URL => $url,
CURLOPT_HEADER => true,
CURLOPT_POST => 1,
CURLOPT_HTTPHEADER => $headers,
CURLOPT_POSTFIELDS => $postfields,
CURLOPT_RETURNTRANSFER => true
); // cURL options
curl_setopt_array($ch, $options);
$server_output = curl_exec($ch);
if (!curl_errno($ch)) {
$info = curl_getinfo($ch);
echo $info['http_code'];
echo $server_output;
} else {
$errmsg = curl_error($ch);
echo $errmsg;
}
curl_close ($ch);
The problem is that it always returns HTTP code 400, and I don't know what the problem is.
Here is the full response it displays:
HTTP/1.1 100 Continue
HTTP/1.1 400 Bad Request
Server: nginx
Date: Thu, 02 Nov 2017 13:44:05 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 0
Access-Control-Allow-Credentials: false
Access-Control-Allow-Headers: Origin, Accept,Content-Type,Gifs-API-Key
Access-Control-Allow-Methods: GET,POST,OPTIONS
Access-Control-Allow-Origin: *
Access-Control-Max-Age: 43200
Request-Id: 9b78a2d3-0f25-4f13-bb2d-a40b75e6fa8f
Via: 1.1 google
Alt-Svc: clear
I don't understand the error.
Note: I'm using a local XAMPP server. Could that be what is causing the problem?
It turns out I only needed their import API, since the source is a link to a GIF rather than a file upload. I switched the endpoint to /media/import and sent a JSON body with a Content-Type: application/json header instead of multipart/form-data:
$url = 'https://api.gifs.com/media/import';
$headers = array("Gifs-API-Key: gifkey", "Content-Type: application/json"); // cURL headers for file uploading
$postfields = "{\n \"source\": \"https://media.giphy.com/media/l378drKbCncSKYbS0/giphy.gif\",\n \"title\": \"guineapig\",\n \"tags\": [\"crazy\", \"hand drawn\", \"2015\", \"art\"],\n \"attribution\": {\n \"site\": \"vine\",\n \"user\": \"someone\"\n }\n}";
$ch = curl_init();
$options = array(
CURLOPT_URL => $url,
CURLOPT_HEADER => true,
CURLOPT_POST => 1,
CURLOPT_HTTPHEADER => $headers,
CURLOPT_POSTFIELDS => $postfields,
CURLOPT_RETURNTRANSFER => true
); // cURL options
curl_setopt_array($ch, $options);
$server_output = curl_exec($ch);
if (!curl_errno($ch)) {
$info = curl_getinfo($ch);
echo $info['http_code'];
echo $server_output;
} else {
$errmsg = curl_error($ch);
echo $errmsg;
}
curl_close ($ch);
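The hand-escaped JSON string above could also be produced with json_encode(), which avoids quoting mistakes. This is a minimal sketch, assuming the same field values as the snippet above; the $payload variable name is only illustrative and not part of the gifs.com API.

```php
<?php
// Describe the request body as a PHP array (same values as the snippet above).
$payload = array(
    'source' => 'https://media.giphy.com/media/l378drKbCncSKYbS0/giphy.gif',
    'title' => 'guineapig',
    'tags' => array('crazy', 'hand drawn', '2015', 'art'),
    'attribution' => array('site' => 'vine', 'user' => 'someone'),
);

// Let PHP handle the escaping; JSON_UNESCAPED_SLASHES keeps URLs readable.
$postfields = json_encode($payload, JSON_UNESCAPED_SLASHES);
if ($postfields === false) {
    die('Could not encode payload: ' . json_last_error_msg());
}
```

The resulting $postfields string can be passed to CURLOPT_POSTFIELDS unchanged.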
I am trying to read the content of a website using cURL in order to compare some data. I managed to fetch the page content with cURL, but extracting data from it does not work. I parse the content with DOMDocument, but characters such as & and € do not seem to be converted properly, so parsing fails. That is why I tried applying htmlentities to it, but that does not work either.
This is one of the errors I receive:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 37 in URL on line 40
Can anyone suggest what I should do differently?
This is how I get the content of a website:
function get_web_page( $url )
{
$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
$options = array(
CURLOPT_CUSTOMREQUEST =>"GET", //set request type post or get
CURLOPT_POST =>false, //set to GET
CURLOPT_USERAGENT => $user_agent, //set user agent
CURLOPT_COOKIEFILE =>"cookie.txt", //set cookie file
CURLOPT_COOKIEJAR =>"cookie.txt", //set cookie jar
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => false, // do not follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
$html = get_web_page("url of a website");
And this is how I thought I should parse it:
$dom = new DOMDocument;
$dom->loadHTML(mb_convert_encoding($html["content"], 'HTML-ENTITIES', 'UTF-8'));
foreach($dom->getElementsByTagName('div') as $div){
echo $div->nodeValue."<br>";
}
But what I am actually looking for is the value of one specific div with a given class, and only that value. How can I get it?
I use SimpleHTMLDom; it is quite easy to use and well documented.
You can also find plenty of questions about it here on Stack Overflow.
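As an alternative to an external library, PHP's built-in DOMXPath can select a div by class. The following is a rough sketch, not the answerer's method: the function name div_text_by_class and the sample markup are made up for illustration, and it assumes the page is UTF-8 and that the class name contains no quote characters.

```php
<?php
// Return the trimmed text of every <div> whose class list contains $class.
function div_text_by_class($html, $class)
{
    $dom = new DOMDocument();
    // Collect parser warnings (e.g. a bare "&") instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog is a common hint that makes loadHTML treat the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match the class as a whole word inside the class attribute.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $values = array();
    foreach ($xpath->query($query) as $div) {
        $values[] = trim($div->textContent);
    }
    return $values;
}

// Hypothetical markup standing in for $html["content"]; note the bare "&".
$sample = '<div class="item price">€ 10 & up</div><div class="other">skip</div>';
print_r(div_text_by_class($sample, 'price'));
```

Suppressing libxml warnings with libxml_use_internal_errors() lets the parser recover from malformed entities without the htmlParseEntityRef warning being printed.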
I have read many questions on this topic. Basically, I'm using a combination of get_headers and cURL to check whether a URL exists.
$url = "http://www.asdkkk.com";
$headers = get_headers($url);
if(strpos($headers[0],'404') === false){
$ch = curl_init($url);
curl_setopt_array($ch,array(
CURLOPT_HEADER => true,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_HTTPHEADER => array("Accept-Language: en-US;q=0.6,en;q=0.4"),
CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6'
));
$data = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if($httpCode != 404){
curl_close($ch);
return $data;
}
}else{
echo "URL Not Exists";
}
Both functions return status code 200 for this URL ("http://www.asdkkk.com"). The URL leads to a "page not found" site, but it appears to be hosted, and the page's status header is not set to 404. I have tried other URLs as well, not just this one. So how can I determine accurately whether a URL actually exists?
I think the issue with your example code is that you are confusing a 404 'Not Found' HTTP response from a server with the case of a URL that doesn't point to any server at all. If there's no server response at all, cURL reports '0' as the HTTP code rather than 404. Try running the code below and see if it works for your purposes:
$urls = array(
"http://www.asdkkk.com",
"http://www.google.com/cantfindthisurl",
"http://www.google.com",
);
$ch = curl_init();
foreach($urls as $url){
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
$http_status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
echo "$http_status for $url <br>";
}
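The distinction drawn above (0 means no server answered, 404 means a server answered "not found") can be wrapped in a small helper. This is only a sketch; the function name describe_http_status and the labels it returns are invented for illustration.

```php
<?php
// Interpret the value returned by curl_getinfo($ch, CURLINFO_HTTP_CODE).
function describe_http_status($httpCode)
{
    $httpCode = (int) $httpCode;
    if ($httpCode === 0) {
        return 'no response';   // nothing answered: DNS failure, refused connection, timeout
    }
    if ($httpCode === 404) {
        return 'not found';     // a server answered, but the resource is missing
    }
    if ($httpCode >= 200 && $httpCode < 400) {
        return 'reachable';     // success or redirect
    }
    return 'error';             // any other client or server error
}

foreach (array(0, 200, 301, 404, 500) as $code) {
    echo $code, ': ', describe_http_status($code), "\n";
}
```

A host that answers 200 for its own error page, as described in the question, is still reported as reachable; the status code alone cannot detect that case.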
I created the following PHP function to get the HTTP status code of a webpage.
function get_link_status($url, $timeout = 10)
{
$ch = curl_init();
// set cURL options
$opts = array(CURLOPT_RETURNTRANSFER => true, // do not output to browser
CURLOPT_URL => $url, // set URL
CURLOPT_NOBODY => true, // do a HEAD request only
CURLOPT_TIMEOUT => $timeout); // set timeout
curl_setopt_array($ch, $opts);
curl_exec($ch); // do it!
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // find HTTP status
curl_close($ch); // close handle
return $status;
}
How can I modify this function to follow 301 and 302 redirects (possibly multiple redirects) and get the final HTTP status code?
Set CURLOPT_FOLLOWLOCATION to TRUE.
$opts = array(CURLOPT_RETURNTRANSFER => true, // do not output to browser
CURLOPT_URL => $url, // set URL
CURLOPT_NOBODY => true, // do a HEAD request only
CURLOPT_FOLLOWLOCATION => true, // follow location headers
CURLOPT_TIMEOUT => $timeout); // set timeout
If you're not bound to cURL, you can also do this with PHP's standard HTTP stream wrapper (which may itself use cURL internally). Example code:
$url = 'http://example.com/';
$code = FALSE;
$options['http'] = array(
'method' => "HEAD"
);
$context = stream_context_create($options);
$body = file_get_contents($url, false, $context);
foreach($http_response_header as $header)
{
sscanf($header, 'HTTP/%*d.%*d %d', $code);
}
echo "Status code (after all redirects): $code<br>\n";
See also: HEAD first with PHP Streams.
A related question is How can one check to see if a remote file exists using PHP?.
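The header loop above keeps the status of the last response in the redirect chain. The same idea can be written as a reusable function. This is a sketch under assumptions: the name final_status_code and the sample header list are hypothetical, and a regular expression replaces sscanf so that status lines such as "HTTP/2 200" are also matched.

```php
<?php
// Return the status code of the last "HTTP/..." status line in a header list,
// i.e. the response that ended the redirect chain, or null if none is present.
function final_status_code(array $headers)
{
    $code = null;
    foreach ($headers as $header) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $header, $m)) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code;
}

// Hypothetical header list, shaped like $http_response_header after one redirect.
$sample = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://example.com/new',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
);
echo final_status_code($sample), "\n";
```

In real use, $http_response_header would be passed in after the file_get_contents() call.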
I coded this function to retrieve JSON data from an API (which returns data in JSON format).
function file_get_contents_curl($url,$json=false){
$ch = curl_init();
$headers = array();
if($json) {
$headers[] = 'Content-type: application/json';
$headers[] = 'X-HTTP-Method-Override: GET';
}
$options = array(
CURLOPT_URL => $url,
CURLOPT_HTTPHEADER => $headers,
CURLOPT_TIMEOUT => 5,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_HEADER => 0,
CURLOPT_FOLLOWLOCATION => 1,
CURLOPT_MAXREDIRS => 3,
CURLOPT_USERAGENT => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)'
);
curl_setopt_array($ch,$options);
$response = curl_exec($ch);
curl_close($ch);
if($response === false) {
return false;
} else {
return $response;
}
}
If $response is in fact ===false, does it mean that cURL could not connect to the URL? Or could it also be that the API itself returned nothing (but the connection was successful)?
How do I know if cURL connects properly to the URL?
PHP curl doc says:
Return Values
Returns TRUE on success or FALSE on failure. However, if the
CURLOPT_RETURNTRANSFER option is set, it will return the result on
success, FALSE on failure.
Check for errors using curl_error().
If $response === false then curl failed.
It does not mean that curl succeeded but got no content. Since you've turned on CURLOPT_RETURNTRANSFER, that means the response will be returned as a string. So, no content should be indicated by $response === ''.
Where you would run into trouble is if you only had two equal signs rather than three. With three, you're doing type checking, so the boolean false is not the same as an empty string.
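The difference between the strict and loose comparison can be illustrated without any network access. In this sketch the function name classify_curl_result and its labels are made up; the argument stands for whatever curl_exec() returned with CURLOPT_RETURNTRANSFER enabled.

```php
<?php
// Classify the return value of curl_exec() when CURLOPT_RETURNTRANSFER is set.
function classify_curl_result($response)
{
    if ($response === false) {
        return 'failed';   // cURL itself failed; curl_error() has the details
    }
    if ($response === '') {
        return 'empty';    // the transfer succeeded but the body was empty
    }
    return 'content';      // the transfer succeeded and returned a body
}

// With only two equal signs, an empty body is indistinguishable from a failure.
var_dump('' == false);   // loose comparison
var_dump('' === false);  // strict comparison
```

Note that the check has to happen on the value returned by curl_exec(), and curl_error() must be called before curl_close() if the error message is needed.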