Unable to get Healthline search results with PHP - php

I am trying to run a script that will search Healthline with a query string and determine if there are any search results, but I can't get the contents with the query string posting to the page. To search for something on their site, you go to https://www.healthline.com/search?q1=search+string.
Here is what I tried:
$healthline_url = 'https://www.healthline.com/search';
$search_string = 'ashwaganda';
$postdata = http_build_query(
array(
'q1' => $search_string
)
);
$opts = array('http' =>
array(
'method' => 'POST',
'header' => 'Content-type: application/x-www-form-urlencoded',
'content' => $postdata
)
);
$stream = stream_context_create($opts);
$theHtmlToParse = file_get_contents($healthline_url, false, $stream);
print_r($theHtmlToParse);
I also tried to just add the query string to the url and skip the stream, amongst other variations, but I'm running out of ideas. This also didn't work:
$healthline_url = 'https://www.healthline.com/search';
$search_string = 'ashwaganda';
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Content-Type: text/xml; charset=utf-8"
)
);
$stream = stream_context_create($opts);
$theHtmlToParse = file_get_contents($healthline_url.'?q1='.urlencode($search_string), false, $stream);
print_r($theHtmlToParse);
Any suggestions?
EDIT: I changed the url in case someone wants to look at the search page. Also fixed the query string. Still doesn't work.
In response to Ken Lee, I did try the following cURL script that also just returns the page without search results:
$healthline_url = 'https://www.healthline.com/search?q1=ashwaganda';
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $healthline_url);
$data = curl_exec($ch);
curl_close($ch);
print_r($data);

Healthline does not render the search results in the initial HTML. Its search index is stored in Algolia, and the page makes extra JavaScript calls to retrieve the results. That is why you cannot see them with file_get_contents() or a plain cURL request.
To see the search results, you need a JavaScript-capable browser (or a tool that drives one) to load and run the page.
For PHP developers, you may try php-webdriver to control browsers through WebDriver (e.g. Selenium, Chrome + chromedriver, Firefox + geckodriver).
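A minimal sketch of that approach with php-webdriver, assuming the library is installed through Composer (with its autoloader already loaded) and that a WebDriver server such as chromedriver or Selenium is listening at http://localhost:4444. The function name fetchRenderedHtml and the $resultSelector argument are hypothetical; the real CSS selector has to be found by inspecting the rendered Healthline page.

```php
<?php
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\WebDriverBy;
use Facebook\WebDriver\WebDriverExpectedCondition;

// Hypothetical helper: loads $url in a real browser and returns the HTML
// after an element matching $resultSelector has appeared in the DOM.
// Assumption: a WebDriver server is reachable at $serverUrl.
function fetchRenderedHtml($url, $resultSelector, $serverUrl = 'http://localhost:4444')
{
    $driver = RemoteWebDriver::create($serverUrl, DesiredCapabilities::chrome());
    try {
        $driver->get($url);
        // Wait up to 10 seconds for the JavaScript-rendered results to show up.
        $driver->wait(10)->until(
            WebDriverExpectedCondition::presenceOfElementLocated(
                WebDriverBy::cssSelector($resultSelector)
            )
        );
        return $driver->getPageSource();
    } finally {
        // Always close the browser session, even if the wait times out.
        $driver->quit();
    }
}
```

Nothing runs until the function is called, for example fetchRenderedHtml('https://www.healthline.com/search?q1=ashwaganda', '<selector found by inspection>'). If the element never appears, the wait throws an exception rather than returning partial HTML.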
Update: Didn't know that the target site is Healthline. Updated the answer once I found out.

Related

Delay the response by some seconds when scraping a website

I am working on a PHP scraping server: I have a list of websites to loop over, and for each one I fetch the page content to extract the data I want.
The problem is that some pages are not fully returned; as far as I can see, some of the data only appears after the page has finished loading.
I tried both of these methods, but I can't get the full page.
First method:
$opts = array('http' =>
array(
'method' => 'GET',
'timeout' => 10
) );
$context = stream_context_create($opts);
$html = file_get_contents('some url',false,$context);
echo $html;
Second method:
$html = implode('',file('some url'));
echo $html;
I just want to get the content of the page 1 or 2 seconds after it has loaded.
For example, with this URL I can't get the search results, only this:
Résultats
News Photos Vidéos Tags Filtre par date
Précédente Suivante
Things are not as they seem.
The URL you actually want to hit is
https://api.swiftype.com/api/v1/public/engines/search.json, because on load the web page makes a JSON request to it.
You have to POST the following data to that URL as JSON (shown here as a PHP array):
$search = array("engine_key"=>"naxCjQ58frTkB_diETvu","page"=>1,"q"=>"kardas","per_page"=>12,"sort_direction"=>"","filters"=>array("page"=>array("category"=>"News")),"facets"=>array("page"=>array("0"=>"tag")));
A quick guide:
"page" is the number of the results page you want to get,
"q" is the term you want to search for,
"per_page" is the number of entries returned per page (12 is the default; try other values),
the rest you will have to work out yourself.
A code example that works:
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch,CURLOPT_URL,"https://api.swiftype.com/api/v1/public/engines/search.json");
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_POSTFIELDS,json_encode($search));
curl_setopt($ch,CURLOPT_POST, true);
curl_setopt($ch,CURLOPT_HTTPHEADER, array('Content-Type: application/json; charset=utf-8'));
curl_setopt($ch,CURLOPT_HEADER, 0);
$data = curl_exec($ch);
curl_close($ch);
To check the results:
print_r(json_decode($data));
This is beautiful: it is like being handed an API on a plate.
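The quick guide can be wrapped in a small helper so that the search term, page, and page size are easy to vary. This is only a sketch: the function name buildSearchPayload is hypothetical, and the engine key, filters, and facets are copied unchanged from the array in this answer.

```php
<?php
// Hypothetical helper: builds the request body described in the quick guide.
// engine_key, filters and facets are taken as-is from the answer's $search array.
function buildSearchPayload($query, $page = 1, $perPage = 12)
{
    return array(
        'engine_key'     => 'naxCjQ58frTkB_diETvu',
        'page'           => $page,      // which page of results to fetch
        'q'              => $query,     // the search term
        'per_page'       => $perPage,   // entries per page (12 is the default)
        'sort_direction' => '',
        'filters'        => array('page' => array('category' => 'News')),
        'facets'         => array('page' => array('tag')),
    );
}

// The JSON string that would be passed to CURLOPT_POSTFIELDS.
echo json_encode(buildSearchPayload('kardas', 2)), "\n";
```

The returned array can replace $search in the cURL example, so fetching the next page only means calling the helper with a different page number.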

file_get_contents() with no response - php

I am trying to call a web service which basically looks like this:
http://10.10.10.10:8080/gw/someAction?amount=10&description='Some description'
So this is how I call this web service:
$endpoint = "http://10.10.10.10:8080/gw/someAction?amount=10&description='Some description'";
$opts = array('http' =>
array(
'method' => 'GET',
'header' => 'Content-type: application/xml'
)
);
$context = stream_context_create( $opts );
$result = file_get_contents( $endpoint, false, $context );
$xml_result = simplexml_load_string( $result );
echo $xml_result->success;
So here I get nothing; $xml_result is empty. And here is the interesting part: when I remove the blank space from the description:
http://10.10.10.10:8080/gw/someAction?amount=10&description='Somedescription'
Everything is fine and I get the answer from the web service. I also tried calling the web service with the Chrome REST client WITH the blank space in the description, and everything is OK; I get a response. So this points to some kind of PHP problem with blank spaces in the URL. Please help!
UPDATE:
print_r($result)
results in
1
This is not a valid URL; spaces must be escaped:
http://10.10.10.10:8080/gw/someAction?amount=10&description='Some%20description'
You might want to take a look at How to properly URL encode a string in PHP?.
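Rather than escaping by hand, the query string can be built with http_build_query(). The sketch below assumes the service really expects the single quotes around the description, as in the question; passing PHP_QUERY_RFC3986 (available since PHP 5.4) encodes spaces as %20 instead of +.

```php
<?php
// Base URL and parameters taken from the question.
$base = 'http://10.10.10.10:8080/gw/someAction';
$params = array(
    'amount'      => 10,
    'description' => "'Some description'", // quotes kept as in the question (assumption)
);

// RFC 3986 encoding: space becomes %20 and the single quote becomes %27.
$query = http_build_query($params, '', '&', PHP_QUERY_RFC3986);
$endpoint = $base . '?' . $query;

echo $endpoint, "\n";
// prints http://10.10.10.10:8080/gw/someAction?amount=10&description=%27Some%20description%27
```

The resulting $endpoint can be passed to file_get_contents() exactly as in the question.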

using a context-stream resource with file_get_contents returns a NULL string

I'm using PHP 4.3.9 and am trying to POST to a url without a form using stream_context_create like below:
function do_post_request($url, $postdata) {
$content = "";
foreach($postdata as $key => $value)
$content .= "$key=$value&";
$content = urlencode($content);
$params = array('http' => array(
'method' => 'POST',
'header' => 'Content-Type: application/x-www-form-urlencoded',
'content' => $content
));
$ctx = stream_context_create($params);
$result = file_get_contents($url, false, $ctx);
var_dump($result);
This code is taken almost word for word from the php manual and I've seen it in several places here on stackoverflow as well.
If I call file_get_contents without $ctx, var_dump($result) displays the page at $url properly (but without the changes the POST data would cause, of course). With $ctx, var_dump($result) is NULL. So something is wrong with $ctx, but I have no idea what. Am I setting up my $params incorrectly?
Any insight would be appreciated. If there is another way to pass POST data to a url I wouldn't mind hearing that either. But I cannot use cURL (or anything that needs installation) and I'm using an older version of php so my choices are limited.
Thanks
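One thing that stands out in the function above is that urlencode() is applied to the whole "key=value&" string, which also encodes the = and & separators, so the receiving script would no longer see separate fields. A minimal sketch of encoding each key and value on its own follows; the helper name build_form_body is hypothetical, and it is written without type hints so it does not depend on newer PHP syntax.

```php
<?php
// Hypothetical helper: builds an application/x-www-form-urlencoded body.
// Each key and value is encoded separately so '=' and '&' stay as separators.
function build_form_body($postdata)
{
    $pairs = array();
    foreach ($postdata as $key => $value) {
        $pairs[] = urlencode($key) . '=' . urlencode($value);
    }
    return implode('&', $pairs);
}

echo build_form_body(array('name' => 'John Smith', 'note' => 'a&b=c')), "\n";
// prints name=John+Smith&note=a%26b%3Dc
```

This may not by itself explain the NULL result. The PHP manual's changelog for file_get_contents() lists context support as added in PHP 5.0.0, so the three-argument call may simply not be supported on 4.3.9.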

Getting JSON response with PHP

I'm trying to connect to an API that uses OAuth2 via PHP. The original plan was to use client-side JS, but that isn't possible with OAuth2.
I'm simply trying get a share count from the API's endpoint which is here:
https://api.bufferapp.com/1/links/shares.json?url=[your-url-here]&access_token=[your-access-key-here]
I do have a proper access_token that I am using to access the json file, that is working fine.
This is the code I have currently written, but I'm not even sure I'm on the right track.
// OAUTH2 ACCESS TOKEN FOR AUTHENTICATION
$key = '[my-access-key-here]';
// JSON URL TO BE REQUESTED
$json_url = 'https://api.bufferapp.com/1/links/shares.json?url=http://bufferapp.com&access_token=' . $key;
// GET THE SHARE COUNT FROM THE REQUEST
$json_string = '[shares]';
// INITIALIZE CURL
$ch = curl_init( $json_url );
// CONFIG CURL OPTIONS
$options = array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HTTPHEADER => array('Content-type: application/json') ,
CURLOPT_POSTFIELDS => $json_string
);
// SETTING CURL OPTIONS
curl_setopt_array( $ch, $options );
// GET THE RESULTS
$result = curl_exec($ch); // Getting jSON result string
Like I said, I don't know if this is the best method - so I'm open to any suggestions.
I'm just trying to retrieve the share count with this PHP script, then with JS, spit out the share count where I need it on the page.
My apologies for wasting anyone's time. I have since been able to work this out. All the code is essentially the same; to check that you're getting the correct response, just print it to the page. Again, sorry to have wasted anyone's time.
<?php
// OAUTH2 ACCESS TOKEN FOR AUTHENTICATION
$key = '[your_access_key_here]';
// URL TO RETRIEVE SHARE COUNT FROM
$url = '[your_url_here]';
// JSON URL TO BE REQUESTED - API ENDPOINT
$json_url = 'https://api.bufferapp.com/1/links/shares.json?url=' . urlencode($url) . '&access_token=' . $key;
// GET THE SHARE COUNT FROM THE REQUEST
$json_string = '[shares]';
// INITIALIZE CURL
$ch = curl_init( $json_url );
// CONFIG CURL OPTIONS
$options = array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HTTPHEADER => array('Content-type: application/json') ,
CURLOPT_POSTFIELDS => $json_string
);
// SETTING CURL OPTIONS
curl_setopt_array( $ch, $options );
// GET THE RESULTS
$result = curl_exec($ch); // Getting jSON result string
print $result;
?>
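To hand only the number to the page instead of the raw response, the JSON string can be decoded first. This is a sketch that assumes the endpoint returns a JSON object with a "shares" key, which the '[shares]' placeholder above suggests; the function name extractShareCount and the sample response are hypothetical.

```php
<?php
// Hypothetical helper: pulls the share count out of the JSON response string.
// Assumption: the response looks like {"shares": <number>}.
function extractShareCount($json)
{
    $decoded = json_decode($json, true);
    if (!is_array($decoded) || !isset($decoded['shares'])) {
        return null; // invalid JSON or unexpected response shape
    }
    return (int) $decoded['shares'];
}

// Sample response used only for illustration.
$sample = '{"shares": 47348}';
echo extractShareCount($sample), "\n";
```

In the script above, $result from curl_exec() would take the place of $sample, and a null return signals that the response should be inspected by printing it.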

Send parameters to a URL and get output from that page

I have 2 pages say abc.php and def.php. When abc.php sends 2 values [id and name] to def.php, it shows a message "Value received". Now how can I send those 2 values to def.php without using form in abc.php and get the "Value received" message from def.php? I can't use form because when user frequently visits the abc.php file, the script should automatically work and get the message "Value received" from def.php. Please see my example code:
abc.php:
<?php
$id="123";
$name="blahblah";
//need to send the value to def.php & get value from that page
// echo $value=Print the "Value received" msg from def.php;
?>
def.php:
<?php
$id=$_GET['id'];
$name=$_GET['name'];
if(!is_null($id)&&!is_null($name))
{ echo "Value received";}
else{echo "Not ok";}
?>
Is there any kind heart who can help me solve the issue?
First, make up your mind: do you want GET or POST parameters?
Your script currently expects them to be GET parameters, so you can simply call it (provided that URL wrappers are enabled anyway) using :
$f = file_get_contents('http://your.domain/def.php?id=123&name=blahblah');
To use the curl examples posted here in other answers you'll have to alter your script to use $_POST instead of $_GET.
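A minimal sketch of the GET variant as it might look in abc.php, assuming def.php is reachable at http://your.domain/def.php and URL wrappers are enabled. The function names buildDefUrl and callDef are hypothetical.

```php
<?php
// Hypothetical helper: appends URL-encoded GET parameters to the base URL.
function buildDefUrl($baseUrl, array $params)
{
    return $baseUrl . '?' . http_build_query($params, '', '&');
}

// Hypothetical helper: requests def.php and returns its output,
// or null if the request failed.
function callDef($baseUrl, array $params)
{
    $response = @file_get_contents(buildDefUrl($baseUrl, $params));
    return $response === false ? null : $response;
}

echo buildDefUrl('http://your.domain/def.php', array('id' => '123', 'name' => 'blahblah')), "\n";
// prints http://your.domain/def.php?id=123&name=blahblah
```

In abc.php, callDef('http://your.domain/def.php', array('id' => $id, 'name' => $name)) would then return "Value received" when def.php gets both values.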
You can try without cURL (I haven't tried it, though):
Copied from: POSTing data without cURL extension
// Your POST data
$data = http_build_query(array(
'param1' => 'data1',
'param2' => 'data2'
));
// Create HTTP stream context
$context = stream_context_create(array(
'http' => array(
'method' => 'POST',
'header' => 'Content-Type: application/x-www-form-urlencoded',
'content' => $data
)
));
// Make POST request
$response = file_get_contents('http://example.com', false, $context);
Taken from the examples page of php.net:
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "example.com/abc.php");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
Edit: To send parameters
curl_setopt( $ch, CURLOPT_POST, true );
curl_setopt( $ch, CURLOPT_POSTFIELDS, array('var1' => 'foo', 'var2' => 'bar'));
Use cURL or Zend_Http_Client.
<?php
$method = 'GET'; //change to 'POST' for post method
$url = 'http://localhost/browse/';
$data = array(
'manufacturer' => 'kraft',
'packaging_type' => 'bag'
);
if ($method == 'POST'){
//Make POST request
$data = http_build_query($data);
$context = stream_context_create(array(
'http' => array(
'method' => "$method",
'header' => 'Content-Type: application/x-www-form-urlencoded',
'content' => $data)
)
);
$response = file_get_contents($url, false, $context);
}
else {
// Make GET request
$data = http_build_query($data, '', '&');
$response = file_get_contents($url."?".$data, false);
}
echo $response;
?>
Inspired by trix's answer, I decided to extend that code to handle both the GET and POST methods.
