Parsing HTML with PHP and Ganon - php

Please help me change the selector in my code.
I am trying to get the seller name from the page http://www.plati.ru/asp/seller.asp?id_s=119777
It should be amedia, but I can't get it.
This is my code:
$result = curl_exec($ch);
curl_close($ch);
$html = str_get_dom($result );
foreach ($html('table tr td tr td') as $element) {
$seller_name = $element->getPlainText();
}

You can try the following; let me know if you still have any difficulties:
include "ganon.php";
$shopUrl = "http://www.plati.ru/asp/seller.asp?id_s=119777";
$html = file_get_dom($shopUrl);
echo $html('table',9)->getPlainText();

You can use DOMDocument like this to retrieve the td value:
<?php
header('Content-Type: text/html; charset=utf-8');
$DOM = new DOMDocument;
// The @ suppresses warnings caused by malformed HTML
@$DOM->loadHTMLFile('http://www.plati.ru/asp/seller.asp?id_s=119777');
$tables = $DOM->getElementsByTagName('table');//->item(10);
$table = $tables->item(9);
$cells = $table->getElementsByTagName('td');
$cell = $cells->item(0);
echo $cell->textContent;
?>
Then split $cell->textContent on spaces.
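The splitting step could look something like the sketch below. The cell text here is made up for illustration (the real page's markup is not shown in this thread), and splitCellText is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: split a cell's text on any run of whitespace.
function splitCellText(string $text): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings that leading or
    // trailing whitespace would otherwise produce.
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom->loadHTML('<table><tr><td>  Seller:   amedia  </td></tr></table>');
libxml_clear_errors();

$cell  = $dom->getElementsByTagName('td')->item(0);
$words = splitCellText($cell->textContent);

// The seller name is assumed to be the last word in the cell.
echo end($words), "\n";   // prints "amedia"
```

Which word holds the name depends on the real cell contents, so check the split result before picking an index.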

Related

Datascraping With PHP

I am trying to take advantage of DOMDocument to scrape a table from another website. I am on shared hosting.
Here is what the html looks like:
<tbody>
<tr class="odd">
<td class="nightclub">Elleven</td>
<td class="city">Downtown Miami</td>
</tr>
<tr class="even">
<td class="night club">Story</td>
<td class="city">South Beach</td>
</tr>
</tbody>
I tried doing:
<?php
$domDoc = new \DOMDocument();
$url = "http://example.com/";
$html = file_get_contents($url);
$domDoc->loadHtml($html);
$domDoc->preserveWhiteSpace = false;
$tables = $domDoc->getElementsByTagName('tbody');
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row)
{
$columns = $row->getElementsByTagName('td');
print $columns->item(0)->nodeValue."/n";
print $columns->item(1)->nodeValue."/n";
print $columns->item(2)->nodeValue;
}
When I do this I get no result. I think the server is blocking my request.
Try simplehtmldom:
// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');
// Find all tr
foreach($html->find('tr') as $element)
echo $element->innertext . '<br>';
It's a good library for parsing HTML.
What I did was use an open-source PHP package called Guzzle. It will even let you crawl the site you are scraping.
If you are on shared hosting, download Guzzle and upload it to your server:
github.com/guzzle/guzzle/releases
<?php
require 'vendor/autoload.php';
$client = new GuzzleHttp\Client();
$domDoc = new DOMDocument();
$url = 'http://example.com';
$res = $client->request('GET', $url, [
'auth' => ['user', 'pass']
]);
$html = (string)$res->getBody();
// The @ in front of $domDoc will suppress any warnings
$domHtml = @$domDoc->loadHTML($html);
//discard white space
$domDoc->preserveWhiteSpace = false;
//the table by its tag name
$tables = $domDoc->getElementsByTagName('tbody');
//get all rows from the table
$rows = $tables->item(0)->getElementsByTagName('tr');
// loop over the table rows
foreach ($rows as $row)
{
// get each column by tag name
$columns = $row->getElementsByTagName('td');
// echo the values
echo $columns->item(0)->nodeValue.'<br />';
echo $columns->item(1)->nodeValue.'<br />';
echo $columns->item(2)->nodeValue;
}
?>
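One thing worth checking in the loop above: each row in the sample markup has only two cells, so $columns->item(2) returns null and reading nodeValue from it fails. Here is a minimal sketch that runs against the sample HTML from the question and loops over whatever cells exist. The function name extractRows is my own, and it uses libxml_use_internal_errors() in place of @ to keep parser warnings quiet.

```php
<?php
// Sample markup copied from the question; in practice $html would be the HTTP response body.
$html = <<<HTML
<table><tbody>
<tr class="odd"><td class="nightclub">Elleven</td><td class="city">Downtown Miami</td></tr>
<tr class="even"><td class="night club">Story</td><td class="city">South Beach</td></tr>
</tbody></table>
HTML;

// Hypothetical helper: return every table row as an array of trimmed cell texts.
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);  // keep libxml warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        // Iterate over the cells that are actually present instead of
        // assuming a fixed column count.
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

foreach (extractRows($html) as $cells) {
    echo implode(' / ', $cells), "\n";
}
```

If this works on the pasted markup but not on the live page, the problem is more likely in fetching the page than in the parsing.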
If you don't mind using a library, this is the simplest solution. Use Simple HTML DOM like this:
$html = file_get_html("WWW.YOURDOMAIN.COM");
$data = array();
foreach($html->find("table tr") as $tr){
$row = array();
foreach($tr->find("td") as $td){
$row[] = $td->plaintext;
}
$data[] = $row;
}
See detailed answer here.
Your code is fine; just remove the \ from
$domDoc = new \DOMDocument();
Try:
$domDoc = new DOMDocument();

Getting data from HTML using DOMDocument

I'm trying to get data from HTML using DOM. I can get some data, but can't figure out how to get the rest. Here is an image highlighting the data I want.
http://i.imgur.com/Es51s5s.png
Here is the code itself:
http://pastebin.com/Re8qEivv
And here is my PHP code:
$html = file_get_contents('result.html');
$dom = new DOMDocument;
$dom->loadHTML($html);
$tr = $dom->getElementsByTagName('tr');
foreach ($tr as $row){
$td = $row->getElementsByTagName('td');
$td1 = $td->item(1);
$td2 = $td->item(2);
foreach ($td1->childNodes as $node){
$title = $node->textContent;
}
foreach ($td2->childNodes as $node){
$type = $node->textContent;
}
}
Figured it out
$html = file_get_contents('result.html');
$dom = new DOMDocument;
$dom->loadHTML($html);
$tr = $dom->getElementsByTagName('tr');
foreach ($tr as $row){
$td = $row->getElementsByTagName('td');
$td1 = $td->item(1);
$td2 = $td->item(2);
$title = $td1->childNodes->item(0)->textContent;
$firstURL = $td1->getElementsByTagName('a')->item(0)->getAttribute('href');
$type = $td2->childNodes->item(0)->textContent;
$imageURL = $td2->getElementsByTagName('img')->item(0)->getAttribute('src');
}
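The version above assumes every row has at least three cells and that each one contains an a or img tag. A rough sketch of the same extraction with null checks is below. The table markup is invented for illustration (the original pastebin is not reproduced here), and firstAttribute is a hypothetical helper name.

```php
<?php
// Hypothetical markup standing in for result.html.
$html = <<<HTML
<table>
<tr><td>1</td><td><a href="/item/1">First title</a></td><td><img src="/img/a.png" alt="">Type A</td></tr>
<tr><td>2</td><td>No link here</td><td>Type B</td></tr>
<tr><td>header-like row with one cell</td></tr>
</table>
HTML;

// Hypothetical helper: attribute of the first <$tag> inside $scope, or null if there is none.
function firstAttribute(DOMElement $scope, string $tag, string $attr): ?string
{
    $node = $scope->getElementsByTagName($tag)->item(0);
    return $node instanceof DOMElement ? $node->getAttribute($attr) : null;
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$items = [];
foreach ($dom->getElementsByTagName('tr') as $row) {
    $td = $row->getElementsByTagName('td');
    if ($td->length < 3) {
        continue;                   // skip rows without the expected cells
    }
    $items[] = [
        'title' => trim($td->item(1)->textContent),
        'url'   => firstAttribute($td->item(1), 'a', 'href'),
        'type'  => trim($td->item(2)->textContent),
        'image' => firstAttribute($td->item(2), 'img', 'src'),
    ];
}

print_r($items);
```

Rows missing a link or image end up with null in that field instead of triggering an error.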
I have used the following class:
http://sourceforge.net/projects/simplehtmldom/
It is very simple and easy to use.
You can use
$html->find('#RosterReport > tbody', 0);
to find a specific table, and
$html->find('tr')
$html->find('td')
to find table rows or cells.
Note that $html is the variable holding the full HTML DOM content.

nested selector failed in using simple html dom parser

I want to get the link and scrape its content, but I can't even get that far. What's wrong with my nested selector?
My PHP:
$dom = file_get_html('http://mojim.com/%E5%BF%83%E8%B7%B3.html?t3');
$tables = $dom->find('.iB');
$firstRow = $tables->find('tr',1)->find('td',4);
foreach ($firstRow as $value) {
echo $value;
}
?>
Here is what the DOM looks like:
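You are just not traversing to the correct element. Called without an index, find('.iB') returns an array of elements, so you cannot chain ->find() on its result; pass an index to get a single element.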
Example:
$dom = file_get_html('http://mojim.com/%E5%BF%83%E8%B7%B3.html?t3');
$firstRow = $dom->find('table.iB', 0)->find('tr', 1)->find('td', 3);
$link = $firstRow->find('a', 0);
echo $link->href . '<br/>' . $link->title;
Should output:
/twy100015x34x8.htm
心跳 歌詞 王力宏
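For comparison, roughly the same traversal can be sketched with PHP's built-in DOMXPath instead of Simple HTML DOM. The markup below is a made-up stand-in for the real page, and firstLink is a hypothetical helper name. Note that XPath positions start at 1, while the index passed to find() starts at 0, so find('tr', 1) and find('td', 3) become tr[2] and td[4].

```php
<?php
// Hypothetical markup: a table with class "iB" whose second row holds the link in its fourth cell.
$html = <<<HTML
<table class="iB">
<tr><td>#</td><td>Artist</td><td>Album</td><td>Song</td></tr>
<tr><td>1</td><td>x</td><td>y</td><td><a href="/song-1.htm" title="Example song lyrics">Example</a></td></tr>
</table>
HTML;

// Hypothetical helper: href and title of the first link in row 2, cell 4 of table.iB.
function firstLink(string $html): ?array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // The concat/normalize-space idiom matches "iB" as a whole class name,
    // even when the element carries several classes.
    $query = '//table[contains(concat(" ", normalize-space(@class), " "), " iB ")]'
           . '//tr[2]/td[4]//a';
    $a = $xpath->query($query)->item(0);
    if (!$a instanceof DOMElement) {
        return null;                    // nothing matched
    }
    return ['href' => $a->getAttribute('href'), 'title' => $a->getAttribute('title')];
}

$link = firstLink($html);
echo $link['href'], "\n", $link['title'], "\n";
```

On a real page with non-ASCII text you would also need to make sure loadHTML() sees the right character encoding, which this sketch does not cover.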

PHP: How to find an element with particular name attribute in html (from url)

I am currently using PHP's file_get_contents($url) to fetch content from a URL. After getting the contents, I need to inspect the HTML, find a 'select' that has a given name attribute, and extract its options with their values and text. I am not sure how to go about this. I can use PHP's simplehtmldom class to parse the HTML, but how do I get a particular 'select' with the name 'union'?
<span class="d3-box">
<select name='union' class="blockInput" >
<option value="">Select a option</option> ..
The page can have multiple 'select' boxes, so I need to look specifically by the name attribute.
<?php
include_once("simple_html_dom.php");
$htmlContent = file_get_contents($url);
foreach($htmlContent->find(byname['union']) as $element)
echo 'option : value';
?>
Any sort of help is appreciated. Thank you in advance.
Try this PHP code:
<?php
require_once dirname(__FILE__) . "/simple_html_dom.php";
$url = "Your link here";
$htmlContent = str_get_html(file_get_contents($url));
foreach ($htmlContent->find("select[name='union'] option") as $element) {
$option = $element->plaintext;
$value = $element->getAttribute("value");
echo $option . ":" . $value . "<br>";
}
?>
How about this:
$htmlContent = file_get_html('your url');
$htmlContent->find('select[name="union"]');
In an object-oriented way:
$html = new simple_html_dom();
// load_file() fills $html itself, so call find() on $html
$html->load_file('your url');
$html->find('select[name="union"]');
From DOMDocument documentation: http://www.php.net/manual/en/class.domdocument.php
$html = file_get_contents( $url );
$dom = new DOMDocument();
$dom->loadHTML( $html );
$selects = $dom->getElementsByTagName( 'select' );
$select = $selects->item(0);
// Assuming all children are options.
$children = $select->childNodes;
$options_values = array();
for ( $i = 0; $i < $children->length; $i++ )
{
$item = $children->item( $i );
$options_values[] = $item->nodeValue;
}
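The DOMDocument snippet above takes the first select on the page, not the one named 'union'. If you want to stay with the built-in DOM classes, one possible approach is an XPath query on the name attribute. This is only a sketch: the option values and labels after the first one are invented, and selectOptions is a hypothetical helper name.

```php
<?php
// Markup modelled on the question, padded with invented options and a second select.
$html = <<<HTML
<span class="d3-box">
<select name="other"><option value="x">Other</option></select>
<select name='union' class="blockInput">
<option value="">Select a option</option>
<option value="u1">Union One</option>
<option value="u2">Union Two</option>
</select>
</span>
HTML;

// Hypothetical helper: map option value => option text for the select with the given name.
function selectOptions(string $html, string $name): array
{
    // The name is interpolated into the XPath expression, so only accept simple names.
    if (!preg_match('/^[\w-]+$/', $name)) {
        throw new InvalidArgumentException('Unsupported select name');
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath   = new DOMXPath($doc);
    $options = [];
    foreach ($xpath->query('//select[@name="' . $name . '"]/option') as $option) {
        $options[$option->getAttribute('value')] = trim($option->textContent);
    }
    return $options;
}

foreach (selectOptions($html, 'union') as $value => $text) {
    echo $text, ':', $value, "\n";
}
```

Options that share the same value would overwrite each other in this array; collect pairs in a list instead if that can happen on your page.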

How to parse the attribute value of a <a> tag in PHP

I am trying to parse an HTML page to build a database of universities and colleges in the US. The code I wrote fetches the names of the universities, but I am unable to fetch their respective URLs.
public function fetch_universities()
{
$url = "http://www.utexas.edu/world/univ/alpha/";
$dom = new DOMDocument();
$html = $dom->loadHTMLFile($url);
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
$tr = $tables->item(1)->getElementsByTagName('tr');
$td = $tr->item(7)->getElementsByTagName('td');
$rows = $td->item(0)->getElementsByTagName('li');
$count = 0;
foreach ($rows as $row)
{
$count++;
$cols = $row->getElementsByTagName('a');
echo "$count:".$cols->item(0)->nodeValue. "\n";
}
}
This is my code that I have currently.
Please tell me how to fetch the attribute values as well.
Thank you
If you have a reference to an element, you just have to use getAttribute(), so probably:
echo "$count:".$cols->item(0)->getAttribute('href') . "\n";
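Putting name and URL together, a small self-contained sketch might look like this. The list markup and the example.edu / example.org addresses are placeholders, not the real utexas.edu page, and extractLinks is a hypothetical helper name.

```php
<?php
// Placeholder markup: a list where most items contain one link.
$html = <<<HTML
<ul>
<li><a href="http://www.example.edu/">Example University</a></li>
<li><a href="http://www.example.org/college">Example College</a></li>
<li>No link in this item</li>
</ul>
HTML;

// Hypothetical helper: return name and URL of the first link in every <li>.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('li') as $li) {
        $a = $li->getElementsByTagName('a')->item(0);
        if ($a === null) {
            continue;                   // item() returns null when the <li> has no link
        }
        $links[] = [
            'name' => trim($a->textContent),
            'url'  => $a->getAttribute('href'),
        ];
    }
    return $links;
}

foreach (extractLinks($html) as $i => $link) {
    echo ($i + 1), ':', $link['name'], ' ', $link['url'], "\n";
}
```

getAttribute() returns an empty string when the attribute is missing, so an a tag without href gives '' here, not an error.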
