XPath: Get tables with exactly two columns - php

<table>
<tr>
<td>1</td>
</tr>
<tr>
<td>2</td>
</tr>
</table>
<table>
<tr>
<td>1</td>
<td>A</td>
</tr>
<tr>
<td>2</td>
<td>B</td>
</tr>
</table>
<table>
<tr>
<th>1</th>
<td>A</td>
</tr>
<tr>
<th>2</th>
<td>B</td>
</tr>
</table>
<table>
<tr>
<th>1</th>
<th>A</th>
</tr>
<tr>
<td>2</td>
<td>B</td>
</tr>
</table>
How do I get only the table(s) with exactly 2 columns, whether the row holds two th, two td, or one of each?

If you want to select tables that contain a row whose th and td cells together number exactly 2, this is one possible way:
//table[tr[count(td|th) = 2]]
And if the children of tr are always either th or td, never another element that you don't want to include in the count, then you can just count all child elements:
//table[tr[count(*) = 2]]
Notice that both expressions count only direct children, not deeper descendants, and that they match a table as soon as one of its rows has two cells.
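If every row has to have two cells, the predicate can be inverted: keep tables that have rows and no row with a different count. The following is a minimal sketch using PHP's DOMDocument and DOMXPath; the helper name tablesWithTwoColumns and the sample markup are made up for illustration, and it assumes no nested tables.

```php
<?php
// Hypothetical helper: returns the <table> nodes in which every row has exactly two cells.
function tablesWithTwoColumns(string $html): array
{
    $doc = new DOMDocument();
    // Suppress the warnings libxml may emit for loose HTML fragments.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // The table has at least one row, and no row whose td/th count differs from 2.
    // ".//tr" also covers rows wrapped in thead/tbody.
    $nodes = $xpath->query('//table[.//tr and not(.//tr[count(td|th) != 2])]');

    return iterator_to_array($nodes);
}

// Illustrative input: one column, two columns, and a ragged table.
$html = '<table><tr><td>1</td></tr></table>'
      . '<table><tr><th>1</th><td>A</td></tr><tr><th>2</th><td>B</td></tr></table>'
      . '<table><tr><td>1</td><td>A</td></tr><tr><td>2</td></tr></table>';

foreach (tablesWithTwoColumns($html) as $table) {
    echo $table->ownerDocument->saveHTML($table), "\n";
}
```

Only the second table is printed here, because the ragged third table has a row with a single cell.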

Related

How do I only retrieve data from the first table when web-scraping?

I only know the basics of scraping webpages using PHP Simple HTML DOM. The webpage has several tables on it, all with the same class, so nothing is unique to each table.
<table class="table">
<tr>
<td>item 1</td>
<td>item 2</td>
<td>item 3</td>
</tr>
</table>
<table class="table">
<tr>
<td>item 4</td>
<td>item 5</td>
<td>item 6</td>
</tr>
</table>
<table class="table">
<tr>
<td>item 7</td>
<td>item 8</td>
<td>item 9</td>
</tr>
</table>
The code I have works, but it returns data from ALL tables, and I only want data from the first table. I'm using the following to find the rows of data:
$d1s = $dom->find('table.table tr');
I'm assuming it's a simple error I've made. Can anyone help?
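One way to limit the rows to the first table is to select that table first and only then look for its rows. With PHP Simple HTML DOM, the optional second argument of find() picks a single match by index, so something like $dom->find('table.table', 0)->find('tr') is the likely change. The sketch below shows the same idea with PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM; the helper name firstTableRows is hypothetical, and it assumes the class attribute is exactly "table".

```php
<?php
// Hypothetical helper: returns the cell texts of each row of the first table.table only.
function firstTableRows(string $html): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $rows = [];
    // (...)[1] keeps only the first matching table in document order.
    // Assumption: the class attribute is exactly "table".
    foreach ($xpath->query('(//table[@class="table"])[1]//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('td|th', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

$html = '<table class="table"><tr><td>item 1</td><td>item 2</td><td>item 3</td></tr></table>'
      . '<table class="table"><tr><td>item 4</td><td>item 5</td><td>item 6</td></tr></table>';

print_r(firstTableRows($html));
```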

How to follow the condition to underline in the table?

I have a question about underlining table cells according to column data. The example code below shows the problem I am facing:
I want to check the Underline column: if it is 1, the first name should be underlined; if it is 0, it should not. The sample below is hardcoded. In the real situation I have too many rows to add text-decoration: underline; to each td one by one, so I hope someone can guide me on how to solve this. I am using PHP to set a variable that defines the underline.
<!--Below the php code I just write the logic, because I don't know how to write to detect the column underline value-->
<?php
if ( <th>Underline</th> == 1) {
$add_underline = "text-decoration: underline;";
}
if ( <th>Underline</th> == 0) {
$add_underline = "text-decoration: underline;";
}
?>
<table style="width:100%">
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Underline</th>
</tr>
<tr>
<td style="<?php echo $add_underline;?> ">Jill</td>
<td>Smith</td>
<td>1</td>
</tr>
<tr>
<td style="<?php echo $add_underline;?>">Eve</td>
<td>Jackson</td>
<td>0</td>
</tr>
<tr>
<td style="<?php echo $add_underline;?>">John</td>
<td>Doe</td>
<td>1</td>
</tr>
</table>
My current output looks like the picture below:
My expected result looks like the picture below; only Jill and John should be underlined:
Why not use JavaScript to achieve this? No matter what the server sends, the script evaluates whether the value is 1 and underlines accordingly. You need classes to select the table cells that hold the values: I added class='name' to the name <td> tags and class='underline' to the underline <td> tags.
// get the values of the elements with a class of 'name'
let names = document.getElementsByClassName('name');
// get the values of the elements with a class of 'underline'
let underline = document.getElementsByClassName('underline');
// loop over elements using for and use the keys to get and set values
// `i` will iterate until it reaches the length of the list of elements with class of underline
for(let i = 0; i < underline.length; i++){
// use the key to get the text content and check if 1 is set use Number to change string to number for strict evaluation
if(Number(underline[i].textContent) === 1){
// set values set to 1 to underline in css style
names[i].style.textDecoration = "underline";
}
}
<table style="width:100%">
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Underline</th>
</tr>
<tr>
<td class="name">Jill</td>
<td>Smith</td>
<td class='underline'>1</td>
</tr>
<tr>
<td class="name">Eve</td>
<td>Jackson</td>
<td class='underline'>0</td>
</tr>
<tr>
<td class="name">John</td>
<td>Doe</td>
<td class='underline'>1</td>
</tr>
</table>
Or using the td child values...
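// all rows, including the header row at index 0
let tr = document.querySelectorAll("tr");
// start at 1 to skip the header row
for(let i = 1; i < tr.length; i++){
// the last cell holds the underline flag, the first cell holds the first name
if(Number(tr[i].lastElementChild.innerHTML) === 1){
tr[i].firstElementChild.style.textDecoration = "underline";
}
}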
<table style="width:100%">
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Underline</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>1</td>
</tr>
<tr>
<td>Eve</td>
<td>Jackson</td>
<td>0</td>
</tr>
<tr>
<td>John</td>
<td>Doe</td>
<td>1</td>
</tr>
</table>
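Since the question asks for PHP, the same condition can also be applied on the server while the rows are rendered. This is only a sketch: the $rows array, its keys, and the helper name renderRow are assumptions standing in for however the real data is fetched.

```php
<?php
// Hypothetical helper: renders one table row and underlines the first name when the flag is 1.
function renderRow(array $row): string
{
    // Decide per row, instead of once for the whole table.
    $style = ((int) $row['underline'] === 1) ? ' style="text-decoration: underline;"' : '';

    return '<tr>'
        . '<td' . $style . '>' . htmlspecialchars($row['first']) . '</td>'
        . '<td>' . htmlspecialchars($row['last']) . '</td>'
        . '<td>' . (int) $row['underline'] . '</td>'
        . '</tr>';
}

// Hypothetical row data; in the real page this would come from the database.
$rows = [
    ['first' => 'Jill', 'last' => 'Smith',   'underline' => 1],
    ['first' => 'Eve',  'last' => 'Jackson', 'underline' => 0],
    ['first' => 'John', 'last' => 'Doe',     'underline' => 1],
];

echo '<table style="width:100%">', "\n";
echo '<tr><th>Firstname</th><th>Lastname</th><th>Underline</th></tr>', "\n";
foreach ($rows as $row) {
    echo renderRow($row), "\n";
}
echo '</table>', "\n";
```

The key point is that the style is computed inside the loop for each row, so a single variable set before the table is never reused for every name.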

Trouble returning correct results when comparing 2 tables

I have two tables like this:
products
<table>
<tr>
<th>id</th>
<th>name</th>
<th>description</th>
</tr>
<tr>
<td>1</td>
<td>book</td>
<td>book desc</td>
</tr>
<tr>
<td>2</td>
<td>tea</td>
<td>tea desc</td>
</tr>
<tr>
<td>3</td>
<td>glasses</td>
<td>glasses desc</td>
</tr>
</table>
product_attributes
<table>
<tr>
<th>product_id</th>
<th>attribute_id</th>
</tr>
<tr>
<td>1</td>
<td>2</td>
</tr>
<tr>
<td>2</td>
<td>7</td>
</tr>
<tr>
<td>2</td>
<td>8</td>
</tr>
<tr>
<td>3</td>
<td>2</td>
</tr>
<tr>
<td>3</td>
<td>7</td>
</tr>
<tr>
<td>3</td>
<td>9</td>
</tr>
<tr>
<td>3</td>
<td>10</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
</tr>
<tr>
<td>1</td>
<td>5</td>
</tr>
<tr>
<td>1</td>
<td>7</td>
</tr>
</table>
I also have two variables (strings), $pid and $search_ids
$pid contains product ids, and has a value of 2,3,1
$search_ids contains attribute ids, and has a value of 7,8
I want to return name from products table, IF in product_attributes table both product_id and attribute_id in one row contain a value from $pid and $search_ids.
So in the case of tables specified above, I expect to get only TEA returned, because only its id matches both ids of $search_ids in the product_attributes table.
I tried the following but this returns all products I have four times for some reason:
$q = "SELECT p.name FROM products AS p, product_attributes AS pa WHERE p.id IN ($pid) AND pa.product_id IN ($pid) AND pa.attribute_id IN ($search_ids)";
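Try this change:
SELECT p.name
FROM products AS p, product_attributes AS pa
WHERE p.id = pa.product_id -- join condition added
AND p.id IN ($pid) AND pa.attribute_id IN ($search_ids)
You need to specify the join condition. Without it, every product is paired with every product_attributes row that passes the filters (four rows in the sample data), which is why each product comes back four times.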
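The join condition removes the duplicated products, but it still returns every product that has at least one of the searched attributes. To get only products that have all of them (only tea for 7,8), one common pattern is GROUP BY with HAVING COUNT(DISTINCT ...). The sketch below is an assumption-laden illustration: it uses PDO with an in-memory SQLite database standing in for MySQL, passes the ids as arrays rather than comma-separated strings, and the helper name productsWithAllAttributes is made up.

```php
<?php
// Hypothetical helper: names of the products in $productIds that carry every id in $attrIds.
function productsWithAllAttributes(PDO $db, array $productIds, array $attrIds): array
{
    // One "?" placeholder per id, e.g. "?,?,?".
    $pIn = implode(',', array_fill(0, count($productIds), '?'));
    $aIn = implode(',', array_fill(0, count($attrIds), '?'));

    $sql = "SELECT p.name
            FROM products AS p
            JOIN product_attributes AS pa ON pa.product_id = p.id
            WHERE p.id IN ($pIn) AND pa.attribute_id IN ($aIn)
            GROUP BY p.id, p.name
            HAVING COUNT(DISTINCT pa.attribute_id) = ?";

    $stmt = $db->prepare($sql);
    $params = array_merge($productIds, $attrIds, [count($attrIds)]);
    foreach ($params as $i => $value) {
        // Positional placeholders are 1-indexed; bind everything as an integer.
        $stmt->bindValue($i + 1, (int) $value, PDO::PARAM_INT);
    }
    $stmt->execute();

    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Assumption: in-memory SQLite stands in for the MySQL database from the question.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE products (id INTEGER, name TEXT)');
$db->exec('CREATE TABLE product_attributes (product_id INTEGER, attribute_id INTEGER)');
$db->exec("INSERT INTO products VALUES (1,'book'),(2,'tea'),(3,'glasses')");
$db->exec('INSERT INTO product_attributes VALUES
    (1,2),(2,7),(2,8),(3,2),(3,7),(3,9),(3,10),(1,2),(1,5),(1,7)');

print_r(productsWithAllAttributes($db, [2, 3, 1], [7, 8]));
```

COUNT(DISTINCT ...) is used so that duplicate rows such as the repeated (1,2) pair do not inflate the count.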

php mysql - get more than 1 record in single query

I have a table like
<table width="60%" border="0">
<tr>
<td>intId</td>
<td>tagname</td>
<td>cid</td>
<td>lid</td>
</tr>
<tr>
<td>1</td>
<td>chemis</td>
<td>5</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>hist</td>
<td>4</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>canada</td>
<td>0</td>
<td>9</td>
</tr>
<tr>
<td>4</td>
<td>chemis</td>
<td>6</td>
<td>0</td>
</tr>
<tr>
<td>5</td>
<td>chemis</td>
<td>9</td>
<td>2</td>
</tr>
<tr>
<td>6</td>
<td>hist</td>
<td>3</td>
<td>1</td>
</tr>
</table>
$srarchkey_arr = array('chemis','tes','loyal','hist','canada');
My output should be
<table width="60%" border="0">
<tr>
<td>Tag Name </td>
<td>cid</td>
<td>lid</td>
</tr>
<tr>
<td>Chemis</td>
<td>5,6,9</td>
<td>0,0,2</td>
</tr>
<tr>
<td>hist</td>
<td>4,3</td>
<td>0,1</td>
</tr>
<tr>
<td>canada</td>
<td>0</td>
<td>9</td>
</tr>
</table>
Please refer the fiddle
That is, my tags table holds many tags, each with a cid and a lid.
I want to search for the words in the array $srarchkey_arr and produce the output shown above. I used a LIKE query, but it returns individual records, so I then have to loop again to concatenate the cid and lid values.
Is it possible to do this with a single query and a single loop? Is there a way to pass the array to LIKE, or to use something like IN() for strings?
Please help me. Thanks.
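Use this SQL; the WHERE clause restricts the result to the search keys from $srarchkey_arr:
SELECT tagname, GROUP_CONCAT(cid), GROUP_CONCAT(lid)
FROM test.check
WHERE tagname IN ('chemis', 'tes', 'loyal', 'hist', 'canada')
GROUP BY tagname;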
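To pass the PHP array to the query, the IN() list can be built from one placeholder per search key. The sketch below is a hedged illustration: it uses PDO with an in-memory SQLite database in place of MySQL (both provide GROUP_CONCAT), and the table name tags and the helper name tagsGrouped are assumptions.

```php
<?php
// Hypothetical helper: one row per matching tag, with its cid and lid values joined by commas.
function tagsGrouped(PDO $db, array $keys): array
{
    if (count($keys) === 0) {
        return []; // an empty IN () list is not valid SQL in MySQL
    }
    // One "?" placeholder per search key.
    $in = implode(',', array_fill(0, count($keys), '?'));

    $sql = "SELECT tagname, GROUP_CONCAT(cid) AS cids, GROUP_CONCAT(lid) AS lids
            FROM tags
            WHERE tagname IN ($in)
            GROUP BY tagname";

    $stmt = $db->prepare($sql);
    $stmt->execute(array_values($keys));

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Assumption: in-memory SQLite stands in for MySQL; "tags" is a made-up table name.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE tags (intId INTEGER, tagname TEXT, cid INTEGER, lid INTEGER)');
$db->exec("INSERT INTO tags VALUES
    (1,'chemis',5,0),(2,'hist',4,0),(3,'canada',0,9),
    (4,'chemis',6,0),(5,'chemis',9,2),(6,'hist',3,1)");

$srarchkey_arr = array('chemis', 'tes', 'loyal', 'hist', 'canada');
foreach (tagsGrouped($db, $srarchkey_arr) as $row) {
    echo $row['tagname'], ' | ', $row['cids'], ' | ', $row['lids'], "\n";
}
```

Keys without a matching row, such as 'tes' and 'loyal', simply produce no output row. The order of values inside each concatenated list is not guaranteed unless the database is told how to order them.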

Parsing a complicated HTML table with PHP

I successfully parsed a dynamic table with the following PHP code:
$docH = new DOMDocument();
$docH->loadHTMLFile($url);
//get everything inside the body element:
$bodyH = $docH->getElementsByTagName('body')->item(0);
foreach ($bodyH->childNodes as $childNode) {
echo $docH->saveHTML($childNode);
}
Parsed HTML Table:
<table>
<tr>
<td>5CG</td>
<td>aass</td>
<td>sxs</td>
<td>sx</td>
<td>EK</td>
<td></td>
<td>72</td>
</tr>
<td></td>
<td>samplxs</td>
<td>xs</td>
<td></td>
<td>xss</td>
<td>fkxsx aus</td>
<td>s</td>
</tr>
<td></td>
<td>5AH.</td>
<td>ds</td>
<td>d</td>
<td>sdf</td>
<td>sdfsdf aus</td>
<td></td>
</tr>
<tr>
<td>6CG</td>
<td>3.</td>
<td>sfd</td>
<td></td>
<td>scs</td>
<td>das aus</td>
<td>a</td>
</tr>
<tr>
<td>7DG</td>
<td>6.</td>
<td>s</td>
<td>s</td>
<td>sD</td>
<td>sdsa.</td>
<td></td>
</tr>
<td></td>
<td>samplxs</td>
<td>xs</td>
<td></td>
<td>xss</td>
<td>fkxsx aus</td>
<td>s</td>
</tr>
<tr>
<td>7DG, 7CG, 7CR</td>
<td>6.</td>
<td>NsdR</td>
<td>s</td>
<td>SP</td>
<td>fasdlt aus</td>
<td>s</td>
</tr>
<td></td>
<td>samplxs</td>
<td>xs</td>
<td></td>
<td>xss</td>
<td>fkxsx aus</td>
<td>s</td>
</tr>
<tr>
<td>9BR</td>
<td>6.</td>
<td>FEI</td>
<td>sa</td>
<td>DE</td>
<td>fasdad aus</td>
<td></td>
</tr>
<tr>
<td>9AR, 9BR, 9CR</td>
<td>62.</td>
<td>BEH</td>
<td></td>
<td>sd</td>
<td>fasda aus</td>
<td></td>
</tr>
<tr>
<td></td>
<td>6.</td>
<td>MLR</td>
<td></td>
<td>FdR</td>
<td>fsdfaus</td>
<td></td>
</tr>
<tr>
<td>E10C</td>
<td>6.</td>
<td>sdf</td>
<td>d</td>
<td>d</td>
<td>fsdfs aus</td>
<td></td>
</tr>
<tr>
</table>
But my goal is to show only the part of the table the user asks for: the <tr> elements starting at a <tr> whose first <td> contains some text, up to the next <tr> whose first <td> has different content.
For example, if the user types "9BR" into an input field, I want them to see only:
9BR
6.
FEI
sa
DE
fasdad aus
9AR, 9BR, 9CR
62.
BEH
sd
fasda aus
6.
MLR
FdR
fsdfaus
If he types in 5CG:
<tr>
<td>5CG</td>
<td>aass</td>
<td>sxs</td>
<td>sx</td>
<td>EK</td>
<td></td>
<td>72</td>
</tr>
<td></td>
<td>samplxs</td>
<td>xs</td>
<td></td>
<td>xss</td>
<td>fkxsx aus</td>
<td>s</td>
</tr>
Or if 6CG just:
<tr>
<td>6CG </td>
<td>3. </td>
<td>sfd </td>
<td> </td>
<td>scs </td>
<td>das aus</td>
<td>a </td>
</tr>
Using XPath, something like this should do the trick:
http://de3.php.net/manual/en/class.domxpath.php
$xpath = new DomXpath($docH);
$trs = $xpath->query('//tr[td[1][contains(text(), "9BR")]]');
This finds all tr elements whose first td contains the text "9BR".
To also pick up the following tr elements with an empty first td, the query below might not be the most elegant form, but it would work (replace "anything" with the search text):
$query = '
//tr[td[1][contains(text(), "anything")]]
|
//tr[td[1][contains(text(), "anything")]]
/following-sibling::tr[td[1][not(text())] and preceding-sibling::tr[1][td[1][not(text()) or contains(text(), "anything")]]]
';
This finds all tr elements whose first td contains the text "anything".
It also finds all following tr elements whose first td is empty and whose immediately preceding sibling tr has a first td that is also empty or contains the text "anything".
example: http://3v4l.org/q6eDu
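As an alternative to the pure-XPath query, the grouping can be done in a plain PHP loop that remembers whether the current group matched. This is a sketch under assumptions: the helper name rowsForGroup is hypothetical, the sample table is shortened, and the HTML is assumed to be well-formed (every row wrapped in its own <tr>).

```php
<?php
// Hypothetical helper: cell texts of every row that belongs to a group matching $needle.
function rowsForGroup(string $html, string $needle): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $result = [];
    $inGroup = false;
    foreach ($xpath->query('//tr') as $tr) {
        $cells = $xpath->query('td', $tr);
        if ($cells->length === 0) {
            continue; // skip rows without td cells, e.g. header rows
        }
        $first = trim($cells->item(0)->textContent);
        if ($first !== '') {
            // A non-empty first cell starts a new group; check whether it matches.
            $inGroup = strpos($first, $needle) !== false;
        }
        // Rows with an empty first cell inherit the state of the row above them.
        if ($inGroup) {
            $texts = [];
            foreach ($cells as $cell) {
                $texts[] = trim($cell->textContent);
            }
            $result[] = $texts;
        }
    }
    return $result;
}

// Shortened, well-formed sample based on the table in the question.
$html = '<table>'
      . '<tr><td>7DG</td><td>6.</td></tr>'
      . '<tr><td></td><td>samplxs</td></tr>'
      . '<tr><td>9BR</td><td>6.</td></tr>'
      . '<tr><td>9AR, 9BR, 9CR</td><td>62.</td></tr>'
      . '<tr><td></td><td>MLR</td></tr>'
      . '<tr><td>E10C</td><td>6.</td></tr>'
      . '</table>';

print_r(rowsForGroup($html, '9BR'));
```

Because the search text is compared in PHP rather than embedded in the XPath string, user input containing quotes does not need to be escaped for the query.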
