Load website in div using curl - php

I'm using curl to retrieve an external website and display it in a div...
function Get_Domain_Contents($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
But how do I get it to return the CSS and images of that webpage too? Right now it returns everything except the images and CSS. Thanks in advance.

$url = 'http://example.com';
$html = Get_Domain_Contents($url);
// Prepend a <base> tag so the browser resolves relative CSS/image URLs
// against the original site instead of yours.
$html = "<base href='{$url}' />" . $html;
echo $html;

Rather than downloading all those extra files, you could parse the content you downloaded and rewrite any relative URLs into absolute ones. The CSS and images would then be loaded from the original site when the browser renders your div. Something like the PHP Simple HTML DOM parser makes the parsing part of the task easier, as sketched below.
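A minimal sketch of that approach, assuming simple_html_dom.php is available locally and reusing Get_Domain_Contents() from the question; the absolutize() helper is a hypothetical illustration, not a full URL resolver:

include_once('simple_html_dom.php');

// Hypothetical helper for illustration: crude relative-to-absolute join,
// with no handling of ../ paths or protocol-relative //host URLs.
function absolutize($url, $base) {
    if (!$url || preg_match('#^https?://#i', $url)) {
        return $url; // empty or already absolute
    }
    return rtrim($base, '/') . '/' . ltrim($url, '/');
}

$base = 'http://example.com';
$dom  = str_get_html(Get_Domain_Contents($base));

foreach ($dom->find('img') as $img) {
    $img->src = absolutize($img->src, $base);
}
foreach ($dom->find('link') as $css) {
    $css->href = absolutize($css->href, $base);
}

echo $dom; // simple_html_dom renders back to HTML when echoed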

Related

Return the Link of first result in google search in PHP

I want to return the first URL of a Google search result, like:
First Url Result
PHP code:
<?php
// Check if the submit button was clicked
if (isset($_POST['submit'])) {
    $text = $_POST['text'];
    // echo $text;

    function file_get_contents_curl($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the data instead of printing it to the browser
        curl_setopt($ch, CURLOPT_URL, $url);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }

    $query = $text;
    $url = 'http://www.google.co.in/search?q=' . urlencode($query);
    echo $url;
    $scrape = file_get_contents_curl($url);
    echo $scrape;
}
?>
How can I achieve that?
Because this scraping method only sends a raw HTTP request and does not render the page the way a browser does, Google won't serve the page you are looking for; it will show you an agreement page instead.
You should use the Google Search API, as ADyson mentioned in the comments.
Another approach, which is not recommended, is to use Selenium or a headless browser. With a headless browser you will still be prompted with the agreement page, but behind it you can scrape the search results.
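A minimal sketch of the API route, reusing file_get_contents_curl() from the question. The customsearch/v1 endpoint is Google's Custom Search JSON API; YOUR_API_KEY and YOUR_CX are placeholders for credentials you create yourself:

$query  = urlencode($text);
// Placeholders: substitute your own API key and search-engine ID (cx)
$apiUrl = "https://www.googleapis.com/customsearch/v1?key=YOUR_API_KEY&cx=YOUR_CX&q={$query}";
$json   = file_get_contents_curl($apiUrl);
$result = json_decode($json, true);
if (!empty($result['items'][0]['link'])) {
    echo $result['items'][0]['link']; // URL of the first result
}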

Parse element from redirected page with PHP Simple HTML DOM Parser

I'm trying to parse a title from redirected page. Here is my code:
<?php
include_once('simple_html_dom.php');
$link = "https://duckduckgo.com/?q=!ducky+google";
$html = file_get_html($link);
foreach ($html->find('title') as $text) {
    echo $text->plaintext . "<br/>";
}
?>
The result should be "Google". Thanks
I'm not sure I 100% understood your request, but here are a few things to help you move on!
Three things:
First, the "!" in $link redirects you to Google. Delete it if you want to reach the DuckDuckGo result page.
Second, simple-html-dom can't access the DuckDuckGo result page. Did you try to echo $html to see what you get? I tried and was blocked by a captcha... you'll need to figure out how to bypass it. Then, and only then, you'll have access to the titles.
Finally, the titles are h2 elements... it might be easier to target h2 tags with the parser, as in the sketch below.
Does this help?
If you find a way to bypass the captcha, let me know! I'm interested :)
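If you do get past the captcha, a minimal sketch of the h2 approach (whether the result titles are plain h2 elements is an assumption about DuckDuckGo's markup):

include_once('simple_html_dom.php');
$html = file_get_html("https://duckduckgo.com/?q=ducky+google"); // "!" removed to stay on the result page
foreach ($html->find('h2') as $title) {
    echo $title->plaintext . "<br/>";
}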
Alternatively, before using Simple HTML DOM, find the URL of the redirected page with cURL:
<?php
$url = "https://duckduckgo.com/?q=!ducky+google";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Must be true so that PHP follows any "Location:" header
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$a = curl_exec($ch); // $a will contain all headers
$url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // This is what you need: the last effective URL
echo $url; // your URL is here
?>
And now use your code:
<?php
include_once('simple_html_dom.php');
$html = file_get_html($url);
foreach ($html->find('title') as $text) {
    echo $text->plaintext . "<br/>";
}
?>

Call links from the site when using the shortcut API - white page

I have a problem with a link-shortening site.
When I use the API this way: http://hulklink.net/api.php?key=534287562&url=google.com
it does not show the link, but I'm sure my code is correct.
I think the problem is that this site doesn't show the short link on the same page
but shows it at another path,
like this: http://hulklink.net/apiget.php
Here's my code:
<?php
function get_vgd($url)
{
    $apiurl = "http://hulklink.net/api.php?key=534287562&url=$url";
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $apiurl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
echo get_vgd("http://google.com");
?>
Thanks.
It looks like that website is using a redirect to display the shortened link.
Turn on follow-redirects for cURL:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
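Applied to the function from the question, that looks like this (a sketch; that hulklink.net redirects to apiget.php is inferred from the behavior described above):

<?php
function get_vgd($url)
{
    $apiurl = "http://hulklink.net/api.php?key=534287562&url=$url";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $apiurl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the redirect to the page that holds the short link
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
echo get_vgd("http://google.com");
?>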

Getting Website URL Dynamically in Simple PHP HTML dom parser

This script gets a website address dynamically from the URL. Example: www.site.com/fetch.php?url=http://www.google.com should fetch the content of google.com using the PHP Simple HTML DOM parser, but my script can't get the URL. Any idea?
$url = htmlentities($_GET['url']);
$html = file_get_html('$url')->plaintext;
$result = $html;
Do not quote the $url variable:
$url = htmlentities($_GET['url']);
$html = file_get_html($url)->plaintext;
$result = $html;
Not sure, but you can use this: $html = file_get_contents($_GET['url']);
You can either use file_get_contents($_GET['url']) as suggested, or cURL:
<?php
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips SSL certificate verification; convenient for testing but insecure
curl_setopt($ch, CURLOPT_URL, $_GET['url']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
echo $data;
?>
To get the plain text, you need to do some parsing. You can get the parser here.
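A minimal sketch, assuming simple_html_dom.php is available and that $data holds the HTML returned by the cURL block above:

include_once('simple_html_dom.php');
$text = str_get_html($data)->plaintext; // strips tags, keeping readable text
echo $text;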

Selecting a specific div from an external webpage using cURL

Hi, can anyone help me select a specific div from the content of a webpage?
Let's say I want to get the div with id="wrapper_content" from the webpage http://www.test.com/page3.php.
My current code looks something like this (not working):
// REG EXP
$s_searchFor = '#^/.dont know what to put here..#ui';
// CURL
$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt($ch, CURLOPT_URL, 'http://www.test.com/page3.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
if (!preg_match($s_searchFor, $ch)) {
    $file_contents = curl_exec($ch);
}
curl_close($ch);
// display file
echo $file_contents;
So I'd like to know how I can use regular expressions to find a specific div, and how to drop the rest of the webpage so that $file_contents only contains that div.
HTML isn't regular, so you shouldn't use regex. Instead I would recommend an HTML parser such as Simple HTML DOM or PHP's built-in DOM extension.
If you were going to use Simple HTML DOM you would do something like the following:
$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);
Even if you were going to use regex, your code still wouldn't work correctly: you need to get the contents of the page before you can run the regex against it.
// wrong
if (!preg_match($s_searchFor, $ch)) {
    $file_contents = curl_exec($ch);
}
// right
$file_contents = curl_exec($ch); // get the page contents
preg_match($s_searchFor, $file_contents, $matches); // match the element
$file_contents = $matches[0]; // set $file_contents to the matched element
include('simple_html_dom.php');
$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);
Download simple_html_dom.php
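If you'd rather avoid an extra library, here is a sketch of the same selection using PHP's built-in DOM extension, assuming $file_contents holds the fetched page:

$doc = new DOMDocument();
@$doc->loadHTML($file_contents); // "@" silences warnings from imperfect real-world markup
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="wrapper_content"]');
if ($nodes->length > 0) {
    echo $doc->saveHTML($nodes->item(0)); // just the div and its contents
}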
Check out Hpricot; it lets you elegantly select sections. (Note that Hpricot is a Ruby library, so this applies outside PHP.)
First you would use curl to get the document, then use Hpricot to get the part you need.
