I'm stuck on this. I'm trying to pull dynamically generated JSON data from a remote server. Here is the URL that generates the JSON:
https://www.librarything.com/api_getdata.php?userid=jtodd1973&key=1962548278&max=1&responseType=json
I am able to access the data fine using jQuery/AJAX. Here's the code I'm using on jtodd.info/librarything.php:
<div id="booklist">
<table id="tbl_books" style="width:80%; border: thin solid black;">
<tr style="border: thin solid black; background-color: #666; color: #fff;">
<th style="width:40%;">Title</th>
<th style="width:30%;">Author</th>
<th style="width:10%;">Rating</th>
<th style="width:20%;">Reading dates</th>
</tr>
</table>
</div>
<script type="text/javascript">
$(document).ready(function () {
    $.ajax({
        // JSONP requests are sent as GET via an injected <script> tag,
        // so type/contentType/custom headers are not needed here
        dataType: 'jsonp',
        crossDomain: true,
        url: 'https://www.librarything.com/api_getdata.php?userid=jtodd1973&key=1962548278&booksort=title&showTags=1&showCollections=1&showDates=1&showRatings=1&max=1000',
        success: function (data) {
            // data is already a parsed object; no stringify/parse round trip is needed
            var x = 0;
            $.each(data.books, function (i, book) {
                var cutoff = 1420027199; // Unix timestamp cutoff (end of 2014)
                if (Number(book.entry_stamp) > cutoff) {
                    x = x + 1;
                    // strip any parenthesised suffix from the title
                    var testTitle = book.title;
                    var n = testTitle.indexOf(" (");
                    var bookTitle = (n > -1) ? testTitle.substr(0, n) : testTitle;
                    var bookAuthor = book.author_lf;
                    var bookRating = book.rating;
                    var rowColor = (x % 2 === 0) ? "#fff" : "#ccc"; // zebra striping
                    $('#booklist table').append('<tr style="background-color:' + rowColor + ';">' +
                        '<td style="font-style: italic;">' + bookTitle +
                        '</td><td>' + bookAuthor +
                        '</td><td style="text-align: center;">' + bookRating +
                        '</td><td> ' +
                        '</td></tr>');
                }
            });
        },
        error: function () {
            alert("Sorry, I can't get the feed");
        }
    });
});
</script>
However, I am not able to access the data using PHP and cURL; I get no response from the server. Specifically, I get cURL error number 7 (couldn't connect) and HTTP code 0. Here's the code I am using on jtodd.info/librarything2.php:
<?php
$url = 'https://www.librarything.com/api_getdata.php?userid=jtodd1973&key=1962548278&max=1&responseType=json';
$result = get_web_page( $url );
if ( $result['errno'] != 0 )
    echo "<p>Error number = " . $result['errno'] . "</p>";
if ( $result['http_code'] != 200 )
    echo "<p>HTTP code = " . $result['http_code'] . "</p>";
$page = $result['content'];
echo "<pre>" . $page . "</pre>";

function get_web_page( $url ) {
    if ( !function_exists("curl_init") ) die("cURL extension is not installed");
    $ch = curl_init();
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the page as a string
        CURLOPT_HEADER         => true,   // include response headers in the output
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_ENCODING       => "",     // handle all encodings
        CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2", // who am i
        CURLOPT_AUTOREFERER    => true,   // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,    // timeout on connect
        CURLOPT_TIMEOUT        => 120,    // timeout on response
        CURLOPT_MAXREDIRS      => 10,     // stop after 10 redirects
        CURLOPT_SSL_VERIFYPEER => false   // disable SSL certificate checks
    );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
?>
Thanks for any advice.
I've just tried your code - and this is the response I get:
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 11 Feb 2015 20:53:34 GMT
Content-Type: application/json
Content-Length: 1102
Connection: keep-alive
Set-Cookie: cookie_from=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/
Set-Cookie: LTAnonSessionID=572521187; expires=Wed, 10-Feb-2016 20:53:34 GMT; path=/
lt-backend: 192.168.0.101:80
{"settings":{"amazonchoice":null,"show":{"showCovers":null,"showAuthors":null,"showTitles":null,"showRatings":null,"showDates":null,"showReviews":null,"showTags":null,"showCollections":null},"style":null,"title":null,"titleLink":null,"theuser":"jtodd1973","powered":"Powered by ","uniqueKey":null,"bookcount":1,"showWhat":null,"nullSetMsg":"No books found.","notEnoughImagesMsg":"Not enough books found.","domain":"www.librarything.com","textsnippets":{"by":"by","Tagged":"Tagged","readreview":"read review","stars":"stars"}},"books":{"112016118":{"book_id":"112016118","title":"As I lay dying : the corrected text","author_lf":"Faulkner, William","author_fl":"William Faulkner","author_code":"faulknerwilliam","ISBN":"067973225X","ISBN_cleaned":"067973225X","publicationdate":"1990","entry_stamp":"1409083726","entry_date":"Aug 26, 2014","copies":"1","rating":5,"language_main":"","language_secondary":"","language_original":"","hasreview":"0","dateacquired_stamp":"0","dateacquired_date":"Dec 31, 1969","cover":"https:\/\/images-na.ssl-images-amazon.com\/images\/P\/067973225X.01._SCLZZZZZZZ_.jpg"}}}
In other words, the problem is not in your PHP code; you need to look further (for example, find out whether your IP is blocked for some reason).
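As a side note on the reported error code: libcurl's own description of errno 7 can be printed from PHP (curl_strerror() exists from PHP 5.5 onward), which confirms this is a connection-level failure rather than a bad request:

```php
<?php
// Diagnostic sketch: cURL error 7 is CURLE_COULDNT_CONNECT, meaning the
// TCP connection itself failed (firewall, routing, or a blocked IP),
// not that the server rejected the request.
echo curl_strerror(7), "\n";
```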
Related
I'm using the following code in an attempt to get a public LinkedIn company page into a variable, but it always returns LinkedIn's 404 "page not found" response. Any idea where I'm going wrong?
$html = get_web_page('https://www.linkedin.com/company/google/');
echo stripos( $html['content'], 'occludable-update' );
echo $html['content'];
function get_web_page( $url )
{
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
    $options = array(
        CURLOPT_CUSTOMREQUEST  => "GET",        // set request type post or get
        CURLOPT_POST           => false,        // set to GET
        CURLOPT_USERAGENT      => $user_agent,  // set user agent
        CURLOPT_COOKIEFILE     => "cookie.txt", // set cookie file
        CURLOPT_COOKIEJAR      => "cookie.txt", // set cookie jar
        CURLOPT_RETURNTRANSFER => true,         // return web page
        CURLOPT_HEADER         => false,        // don't return headers
        CURLOPT_FOLLOWLOCATION => true,         // follow redirects
        CURLOPT_ENCODING       => "",           // handle all encodings
        CURLOPT_AUTOREFERER    => true,         // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,          // timeout on connect
        CURLOPT_TIMEOUT        => 120,          // timeout on response
        CURLOPT_MAXREDIRS      => 10,           // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
They must have some kind of scraping protection in place. If you fetch the page with curl on the command line, you can see that it just returns a bit of JavaScript code:
$ curl https://www.linkedin.com/company/google/
<html><head>
<script type="text/javascript">
window.onload = function() {
// Parse the tracking code from cookies.
var trk = "bf";
var trkInfo = "bf";
var cookies = document.cookie.split("; ");
for (var i = 0; i < cookies.length; ++i) {
if ((cookies[i].indexOf("trkCode=") == 0) && (cookies[i].length > 8)) {
trk = cookies[i].substring(8);
}
else if ((cookies[i].indexOf("trkInfo=") == 0) && (cookies[i].length > 8)) {
trkInfo = cookies[i].substring(8);
}
}
if (window.location.protocol == "http:") {
// If "sl" cookie is set, redirect to https.
for (var i = 0; i < cookies.length; ++i) {
if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
window.location.href = "https:" + window.location.href.substring(window.location.protocol.length);
return;
}
}
}
// Get the new domain. For international domains such as
// fr.linkedin.com, we convert it to www.linkedin.com
var domain = "www.linkedin.com";
if (domain != location.host) {
var subdomainIndex = location.host.indexOf(".linkedin");
if (subdomainIndex != -1) {
domain = "www" + location.host.substring(subdomainIndex);
}
}
window.location.href = "https://" + domain + "/authwall?trk=" + trk + "&trkInfo=" + trkInfo +
"&originalReferer=" + document.referrer.substr(0, 200) +
"&sessionRedirect=" + encodeURIComponent(window.location.href);
}
</script>
</head></html>
This cron-job PHP script finds a CSV file on my server and loops through the URLs in it, using cURL to check whether each one loads over https or http, or is offline. The cURL requests may be taking up too much time. I've done this via an AJAX POST and it completes the job, but I need to do it via a cron job and a CSV file. Are there any other possible solutions?
Can you find a reason why it doesn't complete the task?
Any help would be great.
function url_test($url) {
    $timeout = 20;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, true);          // we want headers
    curl_setopt($ch, CURLOPT_NOBODY, true);          // we don't need the body
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_exec($ch);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);                                 // free the handle inside the function
    return ($http_code == 200 || $http_code == 301);
}

// run on each url
$offline = 0;
$fullcount = 0;
if (($handle = fopen("/pathtocsv/".$csv, "r")) !== FALSE)
{
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE)
    {
        $num = count($data);
        for ($c = 0; $c < $num; $c++)
        {
            $site  = $data[$c];
            $https = preg_replace('/\s+/', '', strtolower("https://" . $site));
            $http  = preg_replace('/\s+/', '', strtolower("http://" . $site));
            if (url_test($https))
            {
                $fullcount++;
                echo $https . " <br>";
            }
            else if (url_test($http))
            {
                $fullcount++;
                echo $http . " <br>";
            }
            else
            {
                echo $site . " <br>";
                $mysqltime = date("Y-m-d H:i:s", $phptime);
                try
                {
                    $conn = new PDO("conn info here");
                    // set the PDO error mode to exception
                    $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
                    // use a prepared statement so the values are bound safely
                    $stmt = $conn->prepare("INSERT INTO table (url,csv,related) VALUES (?,?,1)");
                    $stmt->execute(array($site, $csv));
                    echo "New record created successfully";
                }
                catch (PDOException $e)
                {
                    echo "Connection failed: " . $e->getMessage();
                }
            }
        }
    }
    fclose($handle);
}
You can use the get_headers() function (see the PHP manual for reference).
It will return a response similar to:
Array
(
[0] => HTTP/1.1 200 OK
[1] => Date: Sat, 29 May 2004 12:28:13 GMT
[2] => Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
[3] => Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
[4] => ETag: "3f80f-1b6-3e1cb03b"
[5] => Accept-Ranges: bytes
[6] => Content-Length: 438
[7] => Connection: close
[8] => Content-Type: text/html
)
which you can use to validate as needed.
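As a sketch of how url_test() could be rebuilt on top of get_headers() (the function name url_up is mine, and the 200/301 whitelist simply mirrors the question's logic; adapt as needed):

```php
<?php
// Sketch: URL liveness check using get_headers() instead of cURL.
// Treats 200 and 301 as "up", mirroring url_test() in the question.
function url_up($url) {
    $headers = @get_headers($url);   // false on DNS failure, timeout, malformed URL
    if ($headers === false) {
        return false;
    }
    // $headers[0] is the status line, e.g. "HTTP/1.1 200 OK"
    return (bool) preg_match('{^HTTP/\S+\s+(200|301)\b}', $headers[0]);
}

var_dump(url_up('not-a-valid-url')); // bool(false): get_headers() fails fast here
```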
As for why the task you're running does not complete: have you checked the error logs?
EDIT:
My updated PHP Code:
<?php
$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "Referer: http://www.ucaster.me/hembedplayer/shid05/1/1/1\r\n" .
                    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0"
    )
);
$url = file_get_contents('http://www.ucaster.me/hembedplayer/shid05/1/1/1', false, stream_context_create($opts));
preg_match('/playlist\.m3u8\?id=([^&=>]+).*?enableVideo\("([^"]*)/s', $url, $m);
$video_id = $m[1];
$video_pk = $m[2];
echo $video_pk;
?>
What I have to do is pull the stream pk out of the page source.
Page Source (same as before):
<script type="text/javascript">
var videoPlayer = document.createElement("video");
videoPlayer.setAttribute("id", "videoplayer");
videoPlayer.setAttribute("width", "580");
videoPlayer.setAttribute("height", "450");
videoPlayer.setAttribute("poster", "/static/images/logo.png");
videoPlayer.setAttribute("autoplay", true);
videoPlayer.setAttribute("controls", "");
var em = document.createElement("em");
em.innerHTML = "Sorry, your browser doesn't support HTML5 video.";
videoPlayer.appendChild(em);
document.getElementById("player_div").appendChild(videoPlayer);
function setupVideo() {
if (Hls.isSupported()) {
var video = document.getElementById('videoplayer');
var player = new Hls();
player.attachMedia(video);
player.on(Hls.Events.MEDIA_ATTACHED, function () {
var hlsUrl = "http://" + ea + ":8088/live/shid04/playlist.m3u8?id=95328&pk=";
hlsUrl = hlsUrl + enableVideo("5be02e45f5917b29199f8e5326499a6f8c6c7c9df86920b38c09bee46b050289");
player.loadSource(hlsUrl);
player.on(Hls.Events.MANIFEST_PARSED, function (event, data) {
video.play();
});
});
}else {
em.innerHTML = "Sorry, your browser doesn't support HTML5 video.";
}
}
</script>
I am trying to echo the pk, like this:
5be02e45f5917b29199f8e5326499a6f8c6c7c9df86920b38c09bee46b050289
See https://regex101.com/r/aIxtsI/1
preg_match('/playlist\.m3u8\?id=([^&=>]+).*?enableVideo\("([^"]*)/s', $input, $m);
$video_id = $m[1];
$video_pk = $m[2];
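If it helps, the pattern can be sanity-checked offline against the two relevant lines copied from the page source, with no fetch involved:

```php
<?php
// Sketch: the suggested pattern run against the two relevant source lines.
$input = 'var hlsUrl = "http://" + ea + ":8088/live/shid04/playlist.m3u8?id=95328&pk=";' . "\n"
       . 'hlsUrl = hlsUrl + enableVideo("5be02e45f5917b29199f8e5326499a6f8c6c7c9df86920b38c09bee46b050289");';
preg_match('/playlist\.m3u8\?id=([^&=>]+).*?enableVideo\("([^"]*)/s', $input, $m);
echo $m[1], "\n"; // 95328
echo $m[2], "\n"; // the pk hash
```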
I have this code
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: application/json; charset=utf-8"));
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data_str));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$xmlresult = curl_exec($ch);
$xmlError = $xmlresult;
$json = json_decode($xmlresult, true);
The response I get is JSON, but I could not decode it because there are extra characters at the beginning of the response. See this example:
п»ї{"customerPaymentProfileIdList":[],"customerShippingAddressIdList":[],"validationDirectResponseList":[],"messages":{"resultCode":"Error","message":[{"code":"E00039","text":"A duplicate record with ID 39223758 already exists."}]}}
response header
HTTP/1.1 200 OK
Cache-Control: private
Content-Length: 232
Content-Type: application/json; charset=utf-8
Server: Microsoft-IIS/7.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: PUT,OPTIONS,POST,GET
Access-Control-Allow-Headers: x-requested-with,cache-control,content-type,origin,method,SOAPAction
Date: Thu, 04 Feb 2016 09:08:15 GMT
Connection: keep-alive
Because of the extra characters I cannot json_decode the string. What can be done?
I encountered the same issue when developing my library for accessing their JSON API. In the code that handles the response I had to strip those characters out in order to properly decode the string as JSON.
Line 113:
$this->responseJson = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $responseJson);
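One caveat on that approach: the `[\x80-\xFF]` range strips every multibyte UTF-8 character, not only the BOM, so it can corrupt non-ASCII payload text. A narrower sketch (with a simulated BOM-prefixed body) that removes just the three BOM bytes before decoding:

```php
<?php
// Sketch: remove only the UTF-8 BOM (EF BB BF) before decoding.
$raw = "\xEF\xBB\xBF" . '{"resultCode":"Error"}';   // simulated BOM-prefixed body
if (substr($raw, 0, 3) === "\xEF\xBB\xBF") {
    $raw = substr($raw, 3);
}
$decoded = json_decode($raw, true);
echo $decoded['resultCode'], "\n"; // Error
```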
I'm having the same issue in Node.js with JSON.parse().
var https = require('https');

var requestData = {
    "getCustomerProfileIdsRequest": {
        "merchantAuthentication": {
            "name": "your-auth-name-here",
            "transactionKey": "your-trans-key-name-here"
        }
    }
};
var requestString = JSON.stringify(requestData);

var req = https.request({
    host: "apitest.authorize.net",
    port: "443",
    path: "/xml/v1/request.api",
    method: "POST",
    headers: {
        "Content-Length": requestString.length,
        "Content-Type": "application/json"
    }
});

req.on('response', function (resp) {
    var response = '';
    resp.setEncoding('utf8');
    resp.on('data', function (chunk) {
        response += chunk;
    });
    resp.on('end', function () {
        var buf = new Buffer(response);
        console.log('buf[0]:', buf[0]);                           // 239 Binary 11101111
        console.log('buf[0] char:', String.fromCharCode(buf[0])); // "ï"
        console.log('buf[1]:', buf[1]);                           // 187 Binary 10111011
        console.log('buf[1] char:', String.fromCharCode(buf[1])); // "»"
        console.log('buf[2]:', buf[2]);                           // 191 Binary 10111111
        console.log('buf[2] char:', String.fromCharCode(buf[2])); // "¿"
        console.log('buf[3]:', buf[3]);                           // 123
        console.log('buf[3] char:', String.fromCharCode(buf[3])); // "{"
        // Note: the first three bytes are a Byte Order Mark (BOM),
        // i.e. ZERO WIDTH NO-BREAK SPACE, 11101111 10111011 10111111
        response = JSON.parse(response); // Throws: SyntaxError: Unexpected token
        console.log(response);
    });
});

req.on('error', function (error) {
    console.log(JSON.stringify(error));
});

req.on('socket', function (socket) {
    socket.on('secureConnect', function () {
        req.write(requestString);
        req.end();
    });
});
If I call trim() on the response, it works:
response = JSON.parse(response.trim());
Or replace the BOM:
response = response.replace(/^\uFEFF/, '');
response = JSON.parse(response);
I have a variable with multiple single quotes and want to extract a string from it.
My Code is:
$image['src'] = addslashes($image['src']);
preg_match('~src=["|\'](.*?)["|\']~', $image['src'], $matches);
$image['src'] = $matches[1];
$image['src'] contains this string:
tooltip_html(this, '<div style="display: block; width: 262px"><img src="https://url.com/var/galerie/15773_262.jpg"/></div>');
I thought everything would be right, but $image['src'] ends up null. The addslashes call works fine and returns this:
tooltip_html(this, \'<div style="display: block; width: 262px"><img src="https://url.com/var/galerie/15773_262.jpg"/></div>\');
I don't see the problem here; did I miss something?
=====UPDATE======
The whole code:
<?php
error_reporting(E_ALL);
header("Content-Type: application/json", true);
define('SITE', 'https://akipa-autohandel.autrado.de/');
include_once('simple_html_dom.php');

/**
 * Create CDATA method for XML output
 */
class SimpleXMLExtended extends SimpleXMLElement {
    public function addCData($cdata_text) {
        $node = dom_import_simplexml($this);
        $no = $node->ownerDocument;
        $node->appendChild($no->createCDATASection($cdata_text));
    }
}

/**
 * Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
 * array containing the HTTP server response header fields and content.
 */
function get_web_page( $url ) {
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
    $options = array(
        CURLOPT_CUSTOMREQUEST  => "GET",        // set request type post or get
        CURLOPT_POST           => false,        // set to GET
        CURLOPT_USERAGENT      => $user_agent,  // set user agent
        CURLOPT_COOKIEFILE     => "cookie.txt", // set cookie file
        CURLOPT_COOKIEJAR      => "cookie.txt", // set cookie jar
        CURLOPT_RETURNTRANSFER => true,         // return web page
        CURLOPT_HEADER         => false,        // don't return headers
        CURLOPT_FOLLOWLOCATION => true,         // follow redirects
        CURLOPT_ENCODING       => "",           // handle all encodings
        CURLOPT_AUTOREFERER    => true,         // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,          // timeout on connect
        CURLOPT_TIMEOUT        => 120,          // timeout on response
        CURLOPT_MAXREDIRS      => 10,           // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    if ($content === FALSE) {
        // when output is false it can't be used in str_get_html(),
        // so output a proper error message in such cases
        echo 'output error';
        die(curl_error($ch));
    }
    curl_close( $ch );
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}

function renderPage( $uri ) {
    $rendering = get_web_page( $uri );
    if ( $rendering['errno'] != 0 )
        echo 'bad url, timeout, redirect loop';
    if ( $rendering['http_code'] != 200 )
        echo 'no page, no permissions, no service';
    $content = $rendering['content'];
    $parsing = null; // avoid returning an undefined variable on empty content
    if ( !empty($content) ) {
        $parsing = str_get_html($content);
    }
    return $parsing;
}

/**
 * Get all current car data of the selected autrado site
 */
function models() {
    $paramURI = SITE . 'schnellsuche.php?suche_hersteller=14&suche_modell=&suche_from=form&suche_action=suche&itemsperpage=500';
    $content = renderPage($paramURI);
    foreach ($content->find('tr[class*=fahrzeugliste]') as $auto) {
        $item['src'] = $auto->find('a[onmouseover]', 0)->onmouseover;
        preg_match('~src=["\'](.*?)["\']~', $item['src'], $matches);
        echo $matches[1];
    }
}

if (isset($_POST['action']) && !empty($_POST['action'])) {
    $action = $_POST['action'];
    if ((string) $action == 'test') {
        $output = models();
        json_encode($output);
    }
}
?>
The content of $image['src'] is not what you wrote above. I have just run your script, and the content is:
tooltip_html(this, '<div style="display: block; width: 262px"><img src="http://server12.autrado.de/autradogalerie_copy/var/galerie/127915_262.jpg" /></div>');
It will work if you add the following line before the preg_match:
$item['src']= html_entity_decode($item['src']);
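For context, here is a minimal reproduction sketch (the attribute value is shortened, with the quotes HTML-encoded the way simple_html_dom can return them); without the decode step the pattern finds no match:

```php
<?php
// Sketch: decode HTML entities first, then extract the img src.
$src = 'tooltip_html(this, &#039;<div><img src=&quot;https://url.com/var/galerie/15773_262.jpg&quot; /></div>&#039;);';
$src = html_entity_decode($src);   // &quot; becomes a real double quote
preg_match('~src=["\'](.*?)["\']~', $src, $m);
echo $m[1], "\n"; // https://url.com/var/galerie/15773_262.jpg
```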