I would like to be pointed in the right direction on how to edit the data cells (not the headings) of a table using PHP's DOMDocument.
I have been looking into PHP's DOMDocument to replace the contents of "Name 1", "Age 1", etc. with real data from a database; however, I am having a few issues...
<?php
$doc = new DOMDocument();
$doc->loadHTMLFile('template.html');
$sql = 'SELECT name,
age
FROM db.people';
$sql = mysql_query($sql);
for($i=0; $person = mysql_fetch_assoc($sql); $i++)
{
$doc->getElementsByTagName('td')->item($i)->nodeValue = $person['name'];
$doc->getElementsByTagName('td')->item($i)->nodeValue = $person['age'];
}
$doc->formatOutput = TRUE;
echo $doc->saveHTML();
?>
I would like to continue editing the PHP code above so that it replaces the placeholder data in template.html with data from a database. The template looks like this:
<table>
<tr>
<th>Name</th>
<th>Age</th>
</tr>
<tr>
<td>Stephanie</td>
<td>22</td>
</tr>
<tr>
<td>Martin</td>
<td>45</td>
</tr>
<tr>
<td>Sarah</td>
<td>61</td>
</tr>
<tr>
<td>Kevin</td>
<td>12</td>
</tr>
</table>
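Can anyone point me in the right direction, and tell me whether I'm on the right track?
Both assignments inside the loop target exactly the same element, item($i): the first assigns the name and the second assigns the age, so the age always wins. There is a second problem: getElementsByTagName('td') returns every <td> in the document, and each row has two cells, so row $i's name and age cells are at indexes 2*$i and 2*$i + 1, not $i.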
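A minimal sketch of a corrected loop, assuming the records come from a plain array rather than the mysql_* functions (which were removed in PHP 7). The $people array and the "Name 1"/"Age 1" template are stand-ins; the idea is to index rows instead of cells, skip the header row, and write into each row's first and second <td>:

```php
<?php
// Sketch: fill the data rows of a template table from an array of records.
// $people stands in for the database result (assumption: fetch it with
// mysqli or PDO in real code).
$template = '<table>'
    . '<tr><th>Name</th><th>Age</th></tr>'
    . '<tr><td>Name 1</td><td>Age 1</td></tr>'
    . '<tr><td>Name 2</td><td>Age 2</td></tr>'
    . '</table>';

$people = [
    ['name' => 'Stephanie', 'age' => 22],
    ['name' => 'Martin', 'age' => 45],
];

$doc = new DOMDocument();
$doc->loadHTML($template);

// Keep only rows that contain <td> cells, so the <th> header row is skipped.
$rows = [];
foreach ($doc->getElementsByTagName('tr') as $tr) {
    if ($tr->getElementsByTagName('td')->length > 0) {
        $rows[] = $tr;
    }
}

// Row $i receives record $i: name in the first cell, age in the second.
foreach ($people as $i => $person) {
    if (!isset($rows[$i])) {
        break; // more records than template rows
    }
    $cells = $rows[$i]->getElementsByTagName('td');
    $cells->item(0)->nodeValue = $person['name'];
    $cells->item(1)->nodeValue = (string) $person['age'];
}

$doc->formatOutput = true;
echo $doc->saveHTML();
```

Working per row rather than per cell means adding a column to the template only changes the item() indexes, not the loop arithmetic.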
I just want to sort my table by ID number... what is the simplest way to do that?
I want the table to be sorted by ID automatically, in ascending order (1, 2, 3, ... from the smallest number to the biggest).
<table>
<tr>
<th>ID numbers</th>
<th>Names</th>
</tr>
<tr>
<td>1</td>
<td>haluk</td>
</tr>
<tr>
<td>2</td>
<td>betul</td>
</tr>
<tr>
<td>3</td>
<td>Erdem</td>
</tr>
<tr>
<td>5</td>
<td>Eylül</td>
</tr>
</table>
Thanks...
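Sorting the table automatically with plain HTML is not possible; you need PHP to do the sorting. But first, all the data must be in your database table. Then the simplest approach is to let the query sort it with ORDER BY and echo the rows in a simple loop:
<?php
// $connection is an open mysqli connection; tablename, id and name are placeholders
$SQL = "SELECT id, name FROM tablename ORDER BY id ASC";
$Query = mysqli_query($connection, $SQL);
while ($row = mysqli_fetch_assoc($Query)) {
    // escape the values so they cannot break the surrounding HTML
    echo "<tr><td>" . htmlspecialchars($row['id']) . "</td><td>"
        . htmlspecialchars($row['name']) . "</td></tr>\n";
}
?>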
If anything is unclear, feel free to ask.
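If the rows only exist as static HTML rather than in a database, PHP's DOMDocument can still reorder them. A minimal sketch, assuming the ID is the integer in each row's first cell and the header row uses <th> cells (non-ASCII names are left out of the sample to avoid libxml's default Latin-1 decoding):

```php
<?php
// Sketch: sort the data rows of an existing HTML table by the integer
// in their first cell, using only DOMDocument.
$html = '<table>'
    . '<tr><th>ID numbers</th><th>Names</th></tr>'
    . '<tr><td>3</td><td>Erdem</td></tr>'
    . '<tr><td>1</td><td>haluk</td></tr>'
    . '<tr><td>2</td><td>betul</td></tr>'
    . '</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$table = $doc->getElementsByTagName('table')->item(0);

// Copy the data rows (those with <td> cells) into a plain array first;
// a live DOMNodeList should not be modified while it is being iterated.
$rows = [];
foreach ($table->getElementsByTagName('tr') as $tr) {
    if ($tr->getElementsByTagName('td')->length > 0) {
        $rows[] = $tr;
    }
}

// Ascending order by the number in the first cell.
usort($rows, function (DOMElement $a, DOMElement $b): int {
    $idA = (int) $a->getElementsByTagName('td')->item(0)->textContent;
    $idB = (int) $b->getElementsByTagName('td')->item(0)->textContent;
    return $idA <=> $idB;
});

// appendChild() moves a node that is already in the document, so
// re-appending in sorted order reorders the rows after the header.
foreach ($rows as $tr) {
    $tr->parentNode->appendChild($tr);
}

echo $doc->saveHTML($table);
```

Because the header row is never moved, it stays first while the data rows are appended after it in order.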
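<?php
// $baglanti = connection (mysqli, from database_connection.php), $sorgu = query, $sonuc = result, makale = article
include('database_connection.php');
$sorgu = $baglanti->query("SELECT * FROM makale ORDER BY ID");
while ($sonuc = $sorgu->fetch_assoc()) {
?>
<tr><td><?php echo htmlspecialchars($sonuc['ID']); ?></td><td><?php echo htmlspecialchars($sonuc['name']); ?></td></tr>
<?php } // end while; ID and name are assumed column names ?>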
I added that code before my table rows and used fetch_assoc() to output my data into the table; now everything is ordered by ID number.
I am trying to get the text of child elements using the PHP DOM.
Specifically, I am trying to get only the first <a> tag within every <tr>.
The HTML is like this...
<table>
<tbody>
<tr>
<td>
1st Link
</td>
<td>
2nd Link
</td>
<td>
3rd Link
</td>
</tr>
<tr>
<td>
1st Link
</td>
<td>
2nd Link
</td>
<td>
3rd Link
</td>
</tr>
</tbody>
</table>
My sad attempt at it used nested foreach() loops, but doing a print_r() on $aVal would only ever return Array().
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML(returnURLData($url));
libxml_use_internal_errors(false);
$tables = $dom->getElementsByTagName('table');
$aVal = array();
foreach ($tables as $table) {
foreach ($table as $tr){
$trVal = $tr->getElementsByTagName('tr');
foreach ($trVal as $td){
$tdVal = $td->getElementsByTagName('td');
foreach($tdVal as $a){
$aVal[] = $a->getElementsByTagName('a')->nodeValue;
}
}
}
}
Am I on the right track or am I completely off?
Put this code in test.php
require 'simple_html_dom.php';
$html = file_get_html('test1.php');
foreach ($html->find('table tr') as $row)
{
    // find('a', 0) returns only the first matching <a> (or null), not an array
    $link = $row->find('a', 0);
    if ($link)
    {
        echo $link->plaintext;
    }
}
and put your html code in test1.php
<table>
<tbody>
<tr>
<td>
1st Link
</td>
<td>
2nd Link
</td>
<td>
3rd Link
</td>
</tr>
<tr>
<td>
1st Link
</td>
<td>
2nd Link
</td>
<td>
3rd Link
</td>
</tr>
</tbody>
</table>
I am pretty sure I am late, but a better way is to iterate through all "tr" elements with getElementsByTagName('tr'), and for each node in the returned node list call getElementsByTagName('a'). There is no need to iterate through that second node list: just take the first element it returns with item(0). That's it! Another way is to use XPath.
I personally don't like SimpleHtmlDom because it brings in loads of extra features when only a small piece of functionality is required. For heavy scraping, its memory management can also hold you back; it's better to do the DOM analysis yourself than to depend on a third-party library.
Just my opinion. I used SHD (Simple HTML DOM) initially too, but later realized this.
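The getElementsByTagName() + item(0) approach described above, plus the XPath alternative, can be sketched like this. The sample markup assumes each "Nth Link" is wrapped in an <a> element; the href values are placeholders:

```php
<?php
// Sketch: collect the text of the first <a> in every <tr>, two ways.
$html = '<table><tbody>'
    . '<tr><td><a href="#1">1st Link</a></td><td><a href="#2">2nd Link</a></td>'
    . '<td><a href="#3">3rd Link</a></td></tr>'
    . '<tr><td><a href="#4">1st Link</a></td><td><a href="#5">2nd Link</a></td>'
    . '<td><a href="#6">3rd Link</a></td></tr>'
    . '</tbody></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // silence warnings from sloppy real-world HTML
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

// Approach 1: loop over the rows and take item(0) of each row's <a> list.
$firstLinks = [];
foreach ($dom->getElementsByTagName('tr') as $tr) {
    $a = $tr->getElementsByTagName('a')->item(0); // null if the row has no link
    if ($a !== null) {
        $firstLinks[] = trim($a->textContent);
    }
}

// Approach 2: one XPath query; [1] applies per <tr> on the descendant axis.
$xpath = new DOMXPath($dom);
$viaXPath = [];
foreach ($xpath->query('//tr/descendant::a[1]') as $a) {
    $viaXPath[] = trim($a->textContent);
}

print_r($firstLinks);
```

Note that //tr//a[1] would not be equivalent: it expands to "the first <a> child of every descendant", which here means the first link of every cell.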
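You're looping over $table itself (foreach ($table as $tr)), but $table is a DOMElement, not a node list, so that loop does not give you its rows; use $table->getElementsByTagName('tr') instead. Also, getElementsByTagName('a') returns a DOMNodeList, which has no nodeValue; use ->item(0)->nodeValue.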
The webpage in question is http://assignments.uspto.gov/assignments/q?db=pat&pub=20060030630
Now, let's say I want to capture the assignees in the first assignment. The relevant code there looks like this:
<div class="t3">Assignee:</div>
</td>
</tr>
</table>
</td><td>
<table width="100%" cellpadding="0" cellspacing="0" border="0">
<tbody valign="top">
<tr>
<td>
<table>
<tr>
<td>
<div class="p1">
LEAR CORPORATION
</div>
</td>
</tr>
<tr>
<td><span class="p1">21557 TELEGRAPH ROAD</span></td>
</tr>
<tr>
<td><span class="p1">SOUTHFIELD, MICHIGAN 48034</span></td>
</tr>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
I suppose I could use XPath to grab everything out of spans with class p1, except that class is used throughout the page for basically everything; the same goes for the div class that LEAR CORPORATION is in.
So is there a way for me to just read "Assignees" and then grab just the information relevant to it?
I figure if I can understand how to do that, I can extrapolate from it and figure out how to grab any specific data on the page that I want, e.g. the conveyance data for any particular assignment.
But if, say, I were to grab all the data on the page (reel/frame, conveyance, assignors, assignee, correspondent for every assignment, and the header information about the patent itself), might that be easier than trying to grab each individual piece of information?
There is no clean way to do it, since nothing in the DOM marks where this information is; the structure is very arbitrary.
I would recommend using some math to find the pattern of where in the DOM the assignee resides.
For example, among the elements with class p1, the assignee value is at position 16, and a new assignment starts every 23 positions. Using a loop, you could work it out.
This should get you started at the very least.
$Site = file_get_contents('http://assignments.uspto.gov/assignments/q?db=pat&pub=20060030630');
$Dom = new DomDocument();
libxml_use_internal_errors(true); // the page is not well-formed; suppress parser warnings
$Dom->loadHTML($Site);
$Finder = new DomXPath($Dom);
// every element whose class list contains "p1"
$Nodes = $Finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' p1 ')]");
$position = 0;
foreach($Nodes as $node) {
    if(($position % 16) == 0 && $position > 0) {
        var_dump($node->nodeValue);
        break;
    }
    $position++;
}
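An alternative to counting positions is to anchor the query on the "Assignee:" label itself and walk from there to the neighbouring cell. The sketch below runs against a hypothetical reconstruction of the excerpt in the question, so the axis steps (ancestor::td[2], following-sibling::td[1]) are assumptions that would need checking against the live page:

```php
<?php
// Sketch: find the assignee block by its label instead of by p1 position.
// The markup is a hypothetical reconstruction of the question's excerpt.
$html = <<<HTML
<table><tr>
<td><table><tr><td><div class="t3">Assignee:</div></td></tr></table></td>
<td><table width="100%"><tbody valign="top"><tr><td><table>
<tr><td><div class="p1">LEAR CORPORATION</div></td></tr>
<tr><td><span class="p1">21557 TELEGRAPH ROAD</span></td></tr>
<tr><td><span class="p1">SOUTHFIELD, MICHIGAN 48034</span></td></tr>
</table></td></tr></tbody></table></td>
</tr></table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// 1. the first div.t3 whose text is "Assignee:"
// 2. climb to the second-nearest <td> (the cell holding the label's table)
// 3. step to the next sibling <td> and collect every element with class p1 in it
$query = "(//div[@class='t3'][normalize-space()='Assignee:'])[1]"
    . "/ancestor::td[2]/following-sibling::td[1]"
    . "//*[contains(concat(' ', normalize-space(@class), ' '), ' p1 ')]";

$assignee = [];
foreach ($xpath->query($query) as $node) {
    $assignee[] = trim($node->textContent);
}
print_r($assignee);
```

Swapping the label text (e.g. for a conveyance or assignor label) reuses the same navigation, provided those sections share the layout.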
I'm trying to read data from an HTML table and store part of it in a variable called $id. For example, I have this code:
<tr>
<td>413</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td>View SWF</td>
</tr>
I have another variable, $array[$i], which holds a search query. I want my PHP code to search through the table until it finds the cell containing that query; in this case it would be "Party Hat". Once it finds the query, it should look at the ID, which is the td just before the name "Party Hat" (413 in this case), and store that ID in $id. How do I do this? Any help would be highly appreciated!
Using Tidy, DOMDocument and DOMXPath (make sure those PHP extensions are enabled), you can do something like this:
<?php
$url = "http://example.org/test.html";
function get_data_from_table($id, $url)
{
// retrieve the content of that url
$content = file_get_contents($url);
// repair bad HTML
$tidy = tidy_parse_string($content);
$tidy->cleanRepair();
$content = (string)$tidy;
// load into DOM
$dom = new DOMDocument();
$dom->loadHTML($content);
// make xpath-able
$xpath = new DOMXPath($dom);
// search for the first td of each tr, where its content is $id
$query = "//tr/td[position()=1 and normalize-space(text())='$id']";
$elements = $xpath->query($query);
if ($elements->length != 1) {
// not exactly 1 result as expected? return number of hits
return $elements->length;
}
// our td was found
$element = $elements->item(0);
// get its parent element (the tr)
$tr = $element->parentNode;
$data = array();
// iterate over its td elements
foreach ($tr->getElementsByTagName("td") as $td) {
// retrieve the content as text
$data[] = $td->textContent;
}
// return the array of <td> contents
return $data;
}
echo '<pre>';
print_r(
get_data_from_table(
414,
$url
)
);
echo '</pre>';
Your HTML source (http://example.org/test.html):
<table><tr>
<td>413</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td>View SWF</td>
</tr><tr>
<td>414</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td>View SWF</td>
</tr>
(As you can see, this is not valid HTML, but that doesn't matter because Tidy repairs it before it is loaded into the DOM.)
This works (although it's a bit ugly; perhaps someone else can come up with a better XPath solution):
$html = <<<HTML
<html>
<body>
<table>
<thead>
<tr>
<td>id</td>
<td>name</td>
<td>a</td>
<td>b</td>
<td>c</td>
</tr>
</thead>
<tbody>
<tr>
<td>413</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td>a link</td>
</tr>
<tr>
<td>414</td>
<td>Party Hat 2</td>
<td>0</td>
<td>No</td>
<td>a link</td>
</tr>
</tbody>
</table>
</body>
</html>
HTML;
$doc = new DOMDocument();
$doc->loadHTML($html);
$domxpath = new DOMXPath($doc);
$res = $domxpath->query("//*[local-name() = 'td'][text() = 'Party Hat']/../td[position() = '1']");
var_dump($res->length, $res->item(0)->textContent);
Outputs:
int(1)
string(3) "413"
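One possibly tidier XPath, sketched below, selects the row whose second cell matches and then takes its first cell. findIdByName() and xpathLiteral() are hypothetical helpers; the latter exists because XPath 1.0 string literals have no escape character, so a search term from $array[$i] that contains quotes would otherwise break the query:

```php
<?php
// XPath 1.0 cannot escape quotes inside a literal, so pick a quote style
// the value does not contain, or fall back to concat() when it has both.
function xpathLiteral(string $s): string
{
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';
    }
    return "concat('" . str_replace("'", "', \"'\", '", $s) . "')";
}

// Returns the first-cell text of the row whose second cell equals $name, or null.
function findIdByName(DOMXPath $xpath, string $name): ?string
{
    // td[2] is the name column, td[1] the ID column (XPath positions are 1-based).
    $query = '//tr[td[2][normalize-space() = ' . xpathLiteral($name) . ']]/td[1]';
    $nodes = $xpath->query($query);
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

$doc = new DOMDocument();
$doc->loadHTML('<table>'
    . '<tr><td>413</td><td>Party Hat</td><td>0</td><td>No</td><td>a link</td></tr>'
    . '<tr><td>414</td><td>Party Hat 2</td><td>0</td><td>No</td><td>a link</td></tr>'
    . '</table>');
$xpath = new DOMXPath($doc);

$array = ['Party Hat'];   // stands in for the question's search terms
$i = 0;
$id = findIdByName($xpath, $array[$i]);
var_dump($id);            // string(3) "413"
```

Matching on td[2] exactly (rather than with contains()) keeps "Party Hat" from also matching "Party Hat 2".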
Try loading the HTML into a new DOMDocument via loadHTML() and processing it like an XML document, with XPath or other kinds of queries.
I have a table whose number of columns can change depending on the configuration of the scraped page (I have no control over it). I want to get only the information from a specific column, identified by the column's heading.
Here is a simplified table:
<table>
<tbody>
<tr class='header'>
<td>Image</td>
<td>Name</td>
<td>Time</td>
</tr>
<tr>
<td><img src='someimage.png' /></td>
<td>Name 1</td>
<td>13:02</td>
</tr>
<tr>
<td><img src='someimage.png' /></td>
<td>Name 2</td>
<td>13:43</td>
</tr>
<tr>
<td><img src='someimage.png' /></td>
<td>Name 3</td>
<td>14:53</td>
</tr>
</tbody>
</table>
I want to only extract the names (column 2) of the table. However, as previously stated, the column order cannot be known. The Image column might not be there, for example, in which case the column I want would be the first one.
I was wondering if there's any way to do this with DOMDocument/DOMXPath. Perhaps search for the string "Name" in the first tr, find out which column index it is in, and then use that index to get the info. A less elegant solution would be to check whether the first column has an img tag; if so, the image column is first, so we can throw that away and use the next one.
I've been looking at it for about an hour and a half, but I'm not familiar with DOMDocument functions and manipulation. I'm having a lot of trouble with this one.
Simple HTML DOM Parser may be useful; you can check its manual. Basically, you would use something like this:
$url = "file url";
$html = file_get_html($url);
$header = $html->find('tr.header td');
$i = 0;
$num = null;
foreach ($header as $element){
    // simple_html_dom exposes cell text as ->plaintext / ->innertext (lowercase)
    if (trim($element->plaintext) == 'Image') { $num = $i; }
    $i++;
}
This finds which column ($num) is the image column. You can add additional code from there.
PS: An easy way to find all image sources:
$images = $html->find('tr td img');
foreach ($images as $image){
$imageUrl[] = $image->src;
}
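For comparison, the approach described in the question (find the index of "Name" in the header row, then read that cell from every other row) can be done with DOMDocument/DOMXPath alone. A minimal sketch, assuming the header row is the one with class header:

```php
<?php
// Sketch: extract one column by its header text, whatever its position.
$html = "<table><tbody>
<tr class='header'><td>Image</td><td>Name</td><td>Time</td></tr>
<tr><td><img src='someimage.png' /></td><td>Name 1</td><td>13:02</td></tr>
<tr><td><img src='someimage.png' /></td><td>Name 2</td><td>13:43</td></tr>
<tr><td><img src='someimage.png' /></td><td>Name 3</td><td>14:53</td></tr>
</tbody></table>";

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate sloppy scraped markup
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$isHeader = "contains(concat(' ', normalize-space(@class), ' '), ' header ')";

// Zero-based index of the header cell whose text is "Name", or null if absent.
$headerCells = $xpath->query("//tr[$isHeader]/td");
$col = null;
for ($i = 0; $i < $headerCells->length; $i++) {
    if (trim($headerCells->item($i)->textContent) === 'Name') {
        $col = $i;
        break;
    }
}

$names = [];
if ($col !== null) {
    // XPath positions are 1-based, hence $col + 1.
    $cells = $xpath->query(sprintf("//tr[not(%s)]/td[%d]", $isHeader, $col + 1));
    foreach ($cells as $td) {
        $names[] = trim($td->textContent);
    }
}
print_r($names);
```

If the Image column is absent, $col becomes 0 and the same code returns the first column, so no special case for the img tag is needed.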