Scraping Google Custom Search results with PHP

I'm using simple_html_dom.php
<?php
include('simple_html_dom.php');
$songName = '再见青春';
$dom = file_get_html('http://www.google.com/cse?q='. $songName .'&cx=partner-pub-4291153493758949%3A9692445719&cof=FORID%3A10&ie=UTF-8&ad=w9&num=1');
$firstRow = $dom->find('#gs-visibleUrl-long')->plaintext;
echo $dom;
var_dump($firstRow);
?>
$dom itself is fine, but when I try to dig into the DOM it doesn't work: $firstRow comes back as NULL. Am I doing this scraping wrong?
The DOM and the error are here: http://daysof.me/chrome_lyric/lyric.php

Related

select by class and id in simple html dom not work in google search

The following code, where I try to find divs by class, is not working for Google search results. I have also tried selecting by id.
include('simple_html_dom.php');
$dom = file_get_html("https://www.google.com/search?q=best+mug");
$all_divs = $dom->find("div[class='g']");
foreach ($all_divs as $div) {
echo $div->plaintext;
}
I think it's better to use XPath to do that, here is a sample of what your code could look like with XPath:
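$dom = file_get_contents("https://www.google.com/search?q=best+mug");
$doc = new DOMDocument();
// @ silences the warnings libxml raises on real-world, invalid HTML
@$doc->loadHTML($dom);
$xpath = new DOMXPath($doc);
// XPath attribute tests use @, e.g. @class
$all_divs = $xpath->query("//div[@class='g']");
foreach ($all_divs as $div) {
    // DOMElement has textContent; plaintext belongs to simple_html_dom
    echo $div->textContent;
}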
Try it out and let me know if it works.
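One caveat with an `@class='g'` test: it only matches when the class attribute is exactly `g`, so an element carrying several classes is skipped. Below is a minimal sketch of matching on a whole class token instead. The HTML string and the `extra` and `gx` class names are made up for illustration and are not Google's real markup.

```php
<?php
// Hypothetical markup standing in for a downloaded page.
$html = '<html><body>'
      . '<div class="g">first</div>'
      . '<div class="g extra">second</div>'
      . '<div class="gx">third</div>'
      . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Pad the class list with spaces so ' g ' can only match a whole class token.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' g ')]";

$texts = [];
foreach ($xpath->query($query) as $div) {
    $texts[] = $div->textContent;
}

echo implode("\n", $texts), "\n";
```

With this query the first two divs are selected and the `gx` one is not, which is usually what a CSS-style `.g` selector is expected to do.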

Simple crawl to website

I am trying to crawl a website. Is the code below an efficient way to get the values I listed?
<?php
include 'simple_html_dom.php';
$target_url = "http://www.phunwa.com/phone/0191/2604233";
$html = new simple_html_dom();
$html->load_file($target_url);
foreach($html->find('Name') as $link){
echo $link."<br />";
}
?>
Actually, I am trying to fetch the name, address and location. Could anybody please give me any idea on this?
Thanks in advance
By looking at the source code, try getting the contents of the div with class address-tags then looping through the tags and echoing the contents.
Try this to start with:
$dom = new DomDocument();
$dom->loadHtml($html);
$xpath = new DomXpath($dom);
$div = $xpath->query('//*[@class="address-tags"]')->item(0);
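The looping step described above could be sketched as follows. The markup is hypothetical: the real page's structure inside `address-tags` is not shown in the answer, so the `span` children and their placeholder text are assumptions.

```php
<?php
// Hypothetical markup; the real page's structure may differ.
$html = '<div class="address-tags">'
      . '<span>Example Name</span>'
      . '<span>Example Address</span>'
      . '<span>Example Location</span>'
      . '</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep libxml warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$div = $xpath->query('//*[@class="address-tags"]')->item(0);

$values = [];
if ($div !== null) {                // item(0) is null when nothing matched
    foreach ($div->childNodes as $child) {
        // Skip whitespace text nodes; keep only element children.
        if ($child instanceof DOMElement) {
            $values[] = trim($child->textContent);
        }
    }
}

echo implode("<br />\n", $values), "\n";
```

Checking for `null` before looping matters here, because `item(0)` returns `null` rather than throwing when the class is not found on the page.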

php simple html dom parser not updating multi classes

I'm trying to apply 2 classes to an element like this:
$div->setAttribute('class', 'txt found');
Unfortunately it doesn't work, as I'm getting the following markup:
<div found="" class="txt">
I've also tried $div->class = "txt found"; which had the same result.
Any ideas how to fix this?
Could you please try the following:
$div->className = "txt found";
Updated:
<?php
$divHtml = "<div></div>";
$dom = new DOMDocument();
$dom->loadHTML($divHtml);
$allElements = $dom->getElementsByTagName('div');
$divElement = $allElements->item(0);
$divElement->setAttribute("class", "txt found");
echo $dom->saveHTML();
?>
I tried to reproduce your case and it finally worked. You can test it. If you send more code, we can modify it in order to make it work.

php simple html dom parser how to get the content of html tag

I am trying to get the content of a specific tag, but I don't seem to be able to do so using the following function:
<?PHP
include_once('simple_html_dom.php');
function read_page($url = 'http://google.com')
{
$doc = new DOMDocument();
$data = file_get_html($url);
$content = $data->find('div#footer');
print_r( $content);
}
read_page();
?>
Try $data->find('div[id="footer"]')

Echoing only a div with php

I'm attempting to make a script that echoes only the div that encloses the image on Google.
$url = "http://www.google.com/";
$page = file($url);
foreach($page as $theArray) {
echo $theArray;
}
The problem is that this echoes the whole page.
I want to echo only the part between the <div id="lga"> and the next closest </div>
Note: I have tried using if statements, but they weren't working so I deleted them.
Thanks
Use the built-in DOM methods:
<?php
$page = file_get_contents("http://www.google.com");
$domd = new DOMDocument();
libxml_use_internal_errors(true);
$domd->loadHTML($page);
libxml_use_internal_errors(false);
$domx = new DOMXPath($domd);
$lga = $domx->query("//*[@id='lga']")->item(0);
$domd2 = new DOMDocument();
$domd2->appendChild($domd2->importNode($lga, true));
echo $domd2->saveHTML();
In order to do this you need to parse the DOM and then find the element with the ID you are looking for. Check out a parsing library such as http://simplehtmldom.sourceforge.net/manual.htm
After feeding your html document into the parser you could call something like:
$html = str_get_html($page);
// Pass index 0 to get the first match; without it find() returns an array
$element = $html->find('div[id=lga]', 0);
echo $element->plaintext;
That, I think, would be your quickest and easiest solution.
