My get-listing.php output contains 16 links, and I need to send a request to each URL and receive the list of elements in each response.
include_once('simple_html_dom.php');
$base1 = "http://testbox.elementfx.com/get-listing.php";
$html = file_get_html($base1);
$links = $html->find('p[id=links] a');
$urls = array();
foreach ($links as $element)
{
// open each URL from the list
$urls[] = $url = $element->href;
$data = file_get_html($url);
}
When I use the code above, I only get 9 responses. I should have more than 9.
Can you please tell me how I can send a request to every URL and get the responses using simple_html_dom?
If you just want to send a simple request to each of the URLs you've already parsed and get a response back, try file_get_contents:
foreach ($links as $element)
{
// This array stack is only necessary if you plan on using it later
$urls[] = $url = $element->href;
// $opts and $context are optional for specifying options like method
$opts = array(
'http'=>array(
'method'=>"GET", // "GET" or "POST"
)
);
$context = stream_context_create($opts);
// Remove context argument if not using options array
$data = file_get_contents($url, false, $context);
// ... Do something with $data
}
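The loop above still overwrites $data on every pass. If you want to keep every body for later, a minimal sketch (assuming the $links array from the question; fetch_all() is a hypothetical helper name) collects one entry per link, which also makes it easy to compare the number of links with the number of responses:

```php
<?php
// Hypothetical helper: fetch every URL and keep each response instead of
// overwriting one $data variable on every pass through the loop.
// Results are keyed like $urls, so duplicate links are not collapsed.
function fetch_all(array $urls, $context = null): array
{
    $responses = array();
    foreach ($urls as $i => $url) {
        // file_get_contents() returns false on failure (and emits a warning),
        // so a failed request still shows up as an entry in the result.
        $responses[$i] = ($context === null)
            ? file_get_contents($url)
            : file_get_contents($url, false, $context);
    }
    return $responses;
}

// Usage with the links parsed by simple_html_dom:
// $urls = array();
// foreach ($links as $element) { $urls[] = $element->href; }
// $context = stream_context_create(array('http' => array('method' => 'GET')));
// $responses = fetch_all($urls, $context);
// echo count($urls), ' links, ', count($responses), ' responses';
```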
Your other option, which is more complex but more flexible (depending on the application), is cURL:
foreach ($links as $element)
{
$urls[] = $url = $element->href;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// GET is cURL's default method; CURLOPT_HTTPGET just makes it explicit
curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
// Necessary to return data
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);
// ... Do something with $data
}
This only scratches the surface of cURL; the documentation on PHP's website has more information.
If the data returned from the URLs is HTML, you can parse it with PHP's DOMDocument after it has been fetched. Documentation and examples are readily available on PHP's site.
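As a sketch of that approach, the function below uses DOMDocument and DOMXPath to do what the question's $html->find('p[id=links] a') does with simple_html_dom: it returns the href of every link inside <p id="links">. The name extract_links() and the XPath expression are my own, so check them against the real markup of your page:

```php
<?php
// Hypothetical helper: DOMXPath counterpart of simple_html_dom's
// $html->find('p[id=links] a'), returning each link's href.
function extract_links(string $html): array
{
    if ($html === '') {
        return array(); // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid: collect libxml errors internally
    // instead of printing warnings, then discard them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $hrefs = array();
    // Every <a> with an href anywhere below <p id="links">.
    foreach ($xpath->query('//p[@id="links"]//a[@href]') as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}
```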
Related
I am just beginning to learn the DOM parser.
Let's assume that http://test.com has 4 lines like the one below, and I am trying to extract their content as text.
All I need is LPPR 051600Z 35010KT CAVOK 27/14 Q1020 to send as a JSON payload to an incoming webhook.
<FONT FACE="Monospace,Courier">LPPR 051600Z 35010KT CAVOK 27/14 Q1020</FONT><BR>
Using this example, how can I do it with $html = str_get_html() and $html->find()?
I managed to send the complete HTML content, but that's not what I want.
<?php
include_once('simple_html_dom.php');
$html = file_get_html('http://test.com')->plaintext;
// The data to send to the API
$postData = array('text' => $html);
// Setup cURL
$ch = curl_init('https://uri.com/test');
curl_setopt_array($ch, array(
CURLOPT_POST => TRUE,
CURLOPT_RETURNTRANSFER => TRUE,
CURLOPT_HTTPHEADER => array(
'Authorization: '.$authToken, // $authToken must be defined earlier (not shown)
'Content-Type: application/json'
),
CURLOPT_POSTFIELDS => json_encode($postData)
));
// Send the request
$response = curl_exec($ch);
// Check for errors
if($response === FALSE){
die(curl_error($ch));
}
// Decode the response
$responseData = json_decode($response, TRUE);
// Print the date from the response
echo $responseData['published'];
?>
Many Thanks
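If you are certain that each line looks exactly like this one, you can split the page's HTML on the <BR> tags. Split case-insensitively, because the page writes <BR> in upper case and explode('<br>', ...) would not match it:
$line = preg_split('/<br\s*\/?>/i', $response);
This creates an array with one <FONT>xxxxx</FONT> line in each position.
To get only the text from the 2nd line:
$filteredResponse = strip_tags($line[1]);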
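A minimal sketch of that idea, assuming every line of the page follows the <FONT>...</FONT><BR> pattern from the question. font_lines() and webhook_payload() are hypothetical helpers; the second one builds the JSON body for the webhook, and JSON_UNESCAPED_SLASHES keeps "27/14" intact instead of json_encode()'s default "27\/14":

```php
<?php
// Hypothetical helper: text of each <FONT>...</FONT><BR> line, tags removed.
function font_lines(string $html): array
{
    $texts = array();
    // <BR>, <br> and <br /> all count as line breaks.
    foreach (preg_split('/<br\s*\/?>/i', $html) as $chunk) {
        $text = trim(strip_tags($chunk));
        if ($text !== '') {
            $texts[] = $text; // skip the empty piece after the last <BR>
        }
    }
    return $texts;
}

// Hypothetical helper: JSON body for the incoming webhook.
function webhook_payload(string $text): string
{
    return json_encode(array('text' => $text), JSON_UNESCAPED_SLASHES);
}
```

With the question's page, webhook_payload(font_lines($response)[0]) would be the value to pass as CURLOPT_POSTFIELDS instead of the whole plaintext.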
You can also use PHP's DOM extension as an alternative to simple_html_dom.
The example below fetches a page with cURL (here, Google's home page) and prints the text of every <font> tag in it.
<?php
# Use the Curl extension to query Google and get back a page of results
$url = "http://www.google.com";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$html = curl_exec($ch);
curl_close($ch);
# Create a DOM parser object
$dom = new DOMDocument();
# Parse the HTML from Google.
# The @ before the method call suppresses any warnings that
# loadHTML might throw because of invalid HTML in the page.
@$dom->loadHTML($html);
# Iterate over all the <font> tags
foreach($dom->getElementsByTagName('font') as $link) {
# Show the text of the <font> tag
echo $link->textContent;
echo "<br />";
}
?>
Replace 'font' in $dom->getElementsByTagName('font') with whatever tag you want.
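The same getElementsByTagName() loop can run on any HTML string, which makes it easy to try against the <FONT> line from the question. This sketch (texts_of_tag() is a hypothetical name) uses libxml_use_internal_errors() rather than the @ operator to keep parser warnings quiet, and returns the texts instead of echoing them:

```php
<?php
// Hypothetical helper: trimmed text content of every element named $tag.
function texts_of_tag(string $html, string $tag): array
{
    if ($html === '') {
        return array(); // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = array();
    // libxml's HTML parser lower-cases element names, so <FONT> matches 'font'.
    foreach ($dom->getElementsByTagName($tag) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}
```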
Happy scraping
References:
http://htmlparsing.com/php.html
http://php.net/manual/en/book.dom.php
Here is my index.php file...
<?php
// Defining the basic cURL function
function curl($url) {
$ch = curl_init(); // Initialising cURL
curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
$data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
curl_close($ch); // Closing cURL
return $data; // Returning the data from the function
}
$url1 = $_GET['link'];
$response = curl($url1);
$response = str_replace("./views","http://movietube.pm/views",$response);
$response = str_replace("./lib","http://movietube.pm/lib",$response);
$response = str_replace("./assets","http://movietube.pm/assets",$response);
echo $response;
?>
Basically, what I want it to do is take an input such as
www.example.com?link=(link)
and return the HTML of the page after the PHP has been executed...
The output loads the page correctly, but it doesn't include the TV show content, such as the video player, the links, or the episode director...
What it does:
http://muchmovies.uphero.com/?link=http://www.tvstreaming.cc/watch.php?v=TGmi0OPy0Cc
What I want it to do:
http://www.tvstreaming.cc/watch.php?v=TGmi0OPy0Cc
Any help is appreciated!
You may have a problem with the PHP variable $_GET['link'], because you are requesting this link:
http://muchmovies.uphero.com/?link=http://www.tvstreaming.cc/watch.php?v=TGmi0OPy0Cc
Note that your query string will then be:
?link=http://www.tvstreaming.cc/watch.php?v=TGmi0OPy0Cc
The value of this parameter must be encoded, and you are not encoding it, so $_GET['link'] may not hold the value you need for the cURL request (for example, an & in the target URL would be read as the start of a new parameter).
I recommend one of two options:
Encode the URL parameter (urlencode() in PHP), or
pass the URL base64-encoded and decode it on your server with base64_decode(). Standard base64 output can contain +, / and =, so it still needs URL-encoding (or a URL-safe variant) to survive in a query string.
Please tell me if this solves it.
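A sketch of the first option, assuming the proxy lives at the address from the question; proxy_link() is a hypothetical helper, not part of the asker's code. urlencode() escapes ?, =, & and / so the whole target URL travels as a single link value:

```php
<?php
// Hypothetical helper: build the proxy URL with the target safely encoded.
function proxy_link(string $proxyBase, string $target): string
{
    // The encoded target cannot be split into extra query parameters.
    return $proxyBase . '?link=' . urlencode($target);
}

// Example:
// echo proxy_link('http://muchmovies.uphero.com/',
//                 'http://www.tvstreaming.cc/watch.php?v=TGmi0OPy0Cc');
```

On the receiving side, PHP has already decoded the value, so $_GET['link'] is the original URL again and no urldecode() call is needed before passing it to curl().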
I am currently adding the ability for a PHP back-end system to print directly, and I am trying to get it working with Google Cloud Print. Imagine the app as an online shopping cart: I want it to print picking notes (completed orders) without anyone needing to log in. The server is remote and the destination has Cloud Ready printers.
So far I have been able to print through the interfaces, as long as I am simply passing HTML, plain text or a URL to a PDF. I can set color, marginless printing and the print quality.
However, where I have hit a problem is that the PDFs the system creates are not publicly accessible, so I can't pass a URL to the file; I need to pass the contents of the file.
I have been trying, without success, to modify one of the examples I found on the web. However, I don't know the language, so I am struggling with it.
I have also been trying, again without success, with another example written in Python.
I'm using PHP and the Zend Framework to work with the interface. Here is one sample I have tried, cut down to the part where I prepare the file to send. As I said, I'm not really sure how to translate from Python to PHP, or whether the Python script even works, but this is what I came up with:
<?php
// Test print a job:
$pdfPath = PDF_PATH.'ec22c3.pdf';
$b64_pathname = $pdfPath.'.b64';
$fileType = "application/pdf";
// Open the original file and base64 encode it:
$dataHandle = fopen($pdfPath, "rb");
$dataContent = fread($dataHandle, filesize($pdfPath));
fclose($dataHandle);
$b64data = $fileType.base64_encode($dataContent);
// Store the base64 encoded file:
$ourFileHandle = fopen($b64_pathname, 'w');
fwrite($ourFileHandle, $b64data);
fclose($ourFileHandle);
// Read the contents of the base64 encoded file and delete it:
$fileHandle = fopen($b64_pathname, "rb");
$fileContent = fread($fileHandle, filesize($b64_pathname));
fclose($fileHandle);
unlink($b64_pathname);
// URL encode the file contents:
$file = urlencode($fileContent);
// Add the file and send to the printer:
$client->setParameterPost('content', $file);
$client->setParameterPost('contentType', $fileType);
$client->request(Zend_Http_Client::POST);
?>
Here's a way to do it in PHP using cURL (note: I have object-level properties called _auth, _username, _password and _printerId).
First, build a function to post with cURL:
function processRequest($url, $postFields, $referer) {
$ret = "";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_USERAGENT, "");
if(!is_null($postFields)) {
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS,
http_build_query($postFields));
// http_build_query() properly escapes the fields and
// builds a query string.
}
if(strlen($this->_auth) > 0) {
$headers = array(
"Authorization: GoogleLogin auth=". $this->_auth,
//"GData-Version: 3.0",
"X-CloudPrint-Proxy: yourappname"
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
}
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_REFERER, $referer);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
$ret = curl_exec ($ch);
curl_close ($ch);
return $ret;
}
Then, a function to authorize against Google:
public function authorize() {
$url = "https://www.google.com/accounts/ClientLogin";
$post = array("accountType" => "HOSTED_OR_GOOGLE",
"Email" => $this->_username,
"Passwd" => $this->_password,
"service" => "cloudprint",
"source" => "yourappname");
$resp = $this->processRequest($url, $post, "");
if (preg_match("/Auth=([a-z0-9_\-]+)/i", $resp, $matches)) {
$this->_auth = $matches[1];
}
}
Finally, build a function to submit to the cloud printer:
function printDocument($title, $docBytes)
{
$url = "http://www.google.com/cloudprint/submit?printerid=". $this->_printerId."&output=json";
$post = array(
"printerid" => $this->_printerId,
"capabilities" => "",
"contentType" => "dataUrl",
"title" => $title,
"content" => 'data:application/pdf;base64,'. base64_encode($docBytes)
);
$ret = $this->processRequest($url, $post, "");
echo $ret;
}
In use, call authorize() to get the authentication token. Then just read your file (from wherever) into a variable and pass it to printDocument with the title.
To send base64-encoded content, you need to send another parameter in the submit request:
$client->setParameterPost('contentTransferEncoding', 'base64');
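A sketch of the complete set of submit fields for that approach. The field names come from the question and this answer; cloud_print_fields() is a hypothetical helper. Note that the content is plain base64 of the PDF bytes: the question's code prefixes "application/pdf" to the base64 string, which would corrupt it. It also calls urlencode() before setParameterPost(); assuming Zend_Http_Client form-encodes POST parameters itself, that would likely double-encode the data, so the sketch passes values as-is:

```php
<?php
// Hypothetical helper: POST fields for submitting raw base64 PDF content.
function cloud_print_fields(string $pdfBytes): array
{
    return array(
        'content'                 => base64_encode($pdfBytes), // no MIME prefix
        'contentType'             => 'application/pdf',
        'contentTransferEncoding' => 'base64',
    );
}

// With Zend_Http_Client ($client and $pdfPath as in the question):
// foreach (cloud_print_fields(file_get_contents($pdfPath)) as $name => $value) {
//     $client->setParameterPost($name, $value);
// }
// $client->request(Zend_Http_Client::POST);
```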
Please take a look at this sample code:
function http_response($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_NOBODY, TRUE); // remove body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$head = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
echo $httpCode;
}
This code prints the HTTP status code of the given URL. I have a couple of questions:
1. Can I get rid of some of the curl_setopt() lines and still get the httpCode?
2. What if I want to check multiple URLs at the same time? Can I modify the code to do that?
3. Can I do the same thing in a simpler way using libraries other than cURL?
Thanks :)
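1. You should be able to remove CURLOPT_HEADER and CURLOPT_NOBODY and still get the same status code, although without CURLOPT_NOBODY cURL sends a normal GET and downloads the whole body.
2. You could do it like this (assuming http_response() is changed to return $httpCode instead of echoing it):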
$urls = array(
'http://google.com',
'http://facebook.com'
);
$status = array();
foreach($urls as $url){
$status[$url] = http_response($url);
}
Try print_r($status); after this and you'll see the result.
3. You could do this with file_get_contents() and $http_response_header; to learn more, see http://www.php.net/manual/en/reserved.variables.httpresponseheader.php. I would, however, recommend using cURL anyway.
2. To check multiple URLs, you have to call this function in a loop: in any programming language, one response from a server requires one request to that server. If you want a single function to get responses from multiple servers, you can pass an array to the function and do the loop inside it.
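If the checks really need to run at the same time, PHP's curl_multi_* functions (a technique neither answer mentions) can drive several transfers concurrently; each URL still gets its own request, but they proceed in parallel. A rough sketch, assuming the cURL extension is loaded; http_status_multi() is a hypothetical name and the 10-second timeout is an arbitrary default:

```php
<?php
// Hypothetical helper: HTTP status per URL, checked concurrently.
// A status of 0 means no response was received (DNS failure, timeout, ...).
function http_status_multi(array $urls, int $timeout = 10): array
{
    if ($urls === array()) {
        return array();
    }
    $mh = curl_multi_init();
    $handles = array();
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request, no body
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't print anything
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }
    // Drive all transfers until none is still running.
    do {
        $result = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait for activity on any handle
        }
    } while ($running > 0 && $result === CURLM_OK);

    $status = array();
    foreach ($handles as $url => $ch) {
        $status[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $status;
}
```

Called as print_r(http_status_multi(array('http://google.com', 'http://facebook.com')));, it returns the same shape of array as the $status loop in the first answer.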
3. You can try it this way:
function get_contents() {
file_get_contents("http://example.com");
var_dump($http_response_header);
}
get_contents();
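$http_response_header holds raw header lines, so the status code still has to be read out of them. After a redirect, the array contains the headers of every response in turn, so the last "HTTP/..." line is the final status. A small sketch; status_from_headers() is a hypothetical helper:

```php
<?php
// Hypothetical helper: final HTTP status code from $http_response_header.
function status_from_headers(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 200 OK"; keep the last one seen.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code; // null if no status line was present
}

// Usage (needs network access):
// file_get_contents('http://example.com');
// echo status_from_headers($http_response_header);
```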
I'm trying to decode a JSON string returned from Flickr in my PHP code. I'm using cURL, but it keeps returning a string even when I wrap json_decode() around the JSON string variable. Any ideas?
$api_key = '####';
$photoset_id = '###';
$query = 'http://api.flickr.com/services/rest/?&method=flickr.photosets.getPhotos&api_key='.$api_key.'&photoset_id='.$photoset_id.'&extras=url_o,url_t&format=json&jsoncallback=1';
$ch = curl_init(); // open curl session
// set curl options
curl_setopt($ch, CURLOPT_URL, $query);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch); // execute curl session
curl_close($ch); // close curl session
var_dump(json_decode($data));
Your request URL ends with:
&format=json&jsoncallback=1';
The correct name of the parameter is nojsoncallback, so the right URL you should be using ends like this:
&format=json&nojsoncallback=1';
Change that and it should work.
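One way to avoid this kind of typo is to build the query with http_build_query() instead of concatenating strings. A sketch using the parameters from the question; flickr_photoset_url() is a hypothetical helper:

```php
<?php
// Hypothetical helper: Flickr photoset request URL with plain-JSON output.
function flickr_photoset_url(string $apiKey, string $photosetId): string
{
    return 'http://api.flickr.com/services/rest/?' . http_build_query(array(
        'method'         => 'flickr.photosets.getPhotos',
        'api_key'        => $apiKey,
        'photoset_id'    => $photosetId,
        'extras'         => 'url_o,url_t',
        'format'         => 'json',
        'nojsoncallback' => 1, // plain JSON, no jsonFlickrApi(...) wrapper
    ));
}
```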
Regards.
That's because the returned data is not valid JSON. It's valid JavaScript, though.
The returned data is wrapped inside a default callback function called jsonFlickrApi.
You need to get rid of the callback that wraps the JSON; it is meant to be executed on the client side. Do some string manipulation on the returned data to remove the default jsonFlickrApi callback, and then pass the result to json_decode:
$api_key = '####';
$photoset_id = '###';
$query = 'http://api.flickr.com/services/rest/?&method=flickr.photosets.getPhotos&api_key='.$api_key.'&photoset_id='.$photoset_id.'&extras=url_o,url_t&format=json';
$ch = curl_init(); // open curl session
// set curl options
curl_setopt($ch, CURLOPT_URL, $query);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch); // execute curl session
curl_close($ch); // close curl session
$data = str_replace( 'jsonFlickrApi(', '', $data );
$data = substr( $data, 0, strlen( $data ) - 1 ); //strip out last paren
$object = json_decode( $data ); // stdClass object
var_dump( $object );
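A slightly more defensive version of the same string manipulation, as a sketch: it only strips the wrapper when the body really is jsonFlickrApi(...), tolerates surrounding whitespace, leaves plain JSON alone, and returns null when decoding fails. decode_flickr_jsonp() is a hypothetical helper:

```php
<?php
// Hypothetical helper: decode a Flickr response with or without the
// jsonFlickrApi(...) wrapper into an associative array.
function decode_flickr_jsonp(string $body): ?array
{
    // Greedy (.*) with /s captures everything up to the final ")".
    if (preg_match('/^\s*jsonFlickrApi\((.*)\)\s*;?\s*$/s', $body, $m)) {
        $body = $m[1];
    }
    $decoded = json_decode($body, true);
    return is_array($decoded) ? $decoded : null;
}
```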
Even better: instead of format=json in your URL, use format=php_serial to get a serialized string. Then you won't have to worry about the formatting of Flickr's JSON, and you get an array back:
$api_key = '####';
$photoset_id = '###';
$query = 'http://api.flickr.com/services/rest/?&method=flickr.photosets.getPhotos&api_key='.$api_key.'&photoset_id='.$photoset_id.'&extras=url_o,url_t&format=php_serial';
$ch = curl_init(); // open curl session
// set curl options
curl_setopt($ch, CURLOPT_URL, $query);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch); // execute curl session
curl_close($ch); // close curl session
$output = unserialize($data, array('allowed_classes' => false)); // never instantiate objects from remote data
Stack Overflow saves the day again. I scoured the Flickr documentation and found NO MENTION of this "nojsoncallback" parameter.
Who makes such a feature the default, then doesn't tell anyone how to disable it?
Even worse, why would you have to ENABLE a parameter in order to DISABLE the function?!
Ridiculous... but thanks for the heads-up, this fixed my problem!
The details of nojsoncallback are at the bottom of this page: https://www.flickr.com/services/api/response.json.html