Extract links from specific table

I have an HTML document with many tables. I want to extract the links from one specific table, the one that has a specific div above it.
Here's my sample code:
<div class="boxuniwersal_header">Table 1</div>
<img src="img/boxuniwersal_top.gif" width="210" height="18" alt="" style="margin-top: 5px" />
<div class="boxuniwersal_content">
<div class="boxuniwersal_subcontent">
<div class='menu_m1'><table cellpadding="3"><tr><td><img src="some.jpg" width="45" /></td><td>Some text</td></tr></table></div>
<br />
</div>
</div>
<!-- /box -->
<!-- box -->
<div class="boxuniwersal_header">Table 2</div>
<img src="img/boxuniwersal_top.gif" width="210" height="18" alt="" style="margin-top: 5px" />
<div class="boxuniwersal_content">
<div class="boxuniwersal_subcontent">
<div class='menu_m1'><table cellpadding="3"><tr><td><img src="some2.jpg" width="45" /></td><td>Some text2</td></tr></table></div>
<br />
</div>
</div>
$domXPath = new DOMXPath($domDocument);
$results = $domXPath->query("//div/div/table/tr/td/a|//table//tr/td//a"); //querying domdocument
foreach($results as $result)
{
$links[]=$result->getAttribute("href");
}
This code returns all links. I want to grab only the links from Table 1. Is that possible?

Your main problem is just tuning the XPath expression to select the right nodes.
Change your XPath to
//div[text()="Table 1"]/following-sibling::div[1]//table//a
This first finds the <div> element whose text is the one you're after.
The following-sibling::div[1] part then selects the first <div> element at the same level after the <div> already selected (this is the one that contains the <table>).
The last part just looks for all <a> elements within the enclosed <table>.
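
As a minimal sketch of that expression in use (the sample markup in the question has no <a> elements, so the links and file names below are assumed purely for illustration):

```php
<?php
// Trimmed version of the question's markup; the <a> elements are hypothetical.
$html = <<<HTML
<div class="boxuniwersal_header">Table 1</div>
<img src="img/boxuniwersal_top.gif" alt="" />
<div class="boxuniwersal_content">
<div class="menu_m1"><table><tr><td><a href="one.html">One</a></td><td><a href="two.html">Two</a></td></tr></table></div>
</div>
<div class="boxuniwersal_header">Table 2</div>
<img src="img/boxuniwersal_top.gif" alt="" />
<div class="boxuniwersal_content">
<div class="menu_m1"><table><tr><td><a href="three.html">Three</a></td></tr></table></div>
</div>
HTML;

$domDocument = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$domDocument->loadHTML($html);
libxml_clear_errors();

$domXPath = new DOMXPath($domDocument);
// Header div with the exact text "Table 1", then its first following sibling div, then every link in a table inside it.
$results = $domXPath->query('//div[text()="Table 1"]/following-sibling::div[1]//table//a');

$links = [];
foreach ($results as $result) {
    $links[] = $result->getAttribute('href');
}
print_r($links);
```

Note that text()="Table 1" is an exact match, so extra whitespace inside the header div would stop it from matching.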

Related

Couldn't grab div element with a specified class name using simple_html_dom?

I am using simple_html_dom and am having issues grabbing a div with a specified class name. Below is the code!
<?php
include 'simple_html_dom.php';
$html='
<div class="user-info ">
<div class="user-action-time">
answered <span title="2016-06-27 20:01:45Z" class="relativetime">Jun 27 at 20:01</span>
</div>
<div class="user-gravatar32">
<div class="gravatar-wrapper-32"><img src="https://www.gravatar.com/avatar/09e3746cf7e47d4b3b15f5d871b91661?s=32&d=identicon&r=PG" alt="" width="32" height="32"></div>
</div>
<div class="user-details">
David Mulder
<div class="-flair">
';
echo $html->find('div[class=user-details]',0);
?>
What am I doing wrong here? I am getting the error "Call to a member function find() on string".
Thanks!

You are trying to use Simple HTML DOM to parse an HTML string, but $html holds a plain string, not a parsed object.
Do not assign your HTML string to the $html variable.
Assign it to another one, like $html_string.
Then use $html = str_get_html($html_string);
and
echo $html->find('div[class=user-details]',0);

You are trying to call an object method on a string variable. This should work:
$html = str_get_html('<div class="user-info ">
<div class="user-action-time">
answered <span title="2016-06-27 20:01:45Z" class="relativetime">Jun 27 at 20:01</span>
</div>
<div class="user-gravatar32">
<div class="gravatar-wrapper-32"><img src="https://www.gravatar.com/avatar/09e3746cf7e47d4b3b15f5d871b91661?s=32&d=identicon&r=PG" alt="" width="32" height="32"></div>
</div>
<div class="user-details">
David Mulder
<div class="-flair">');
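
If Simple HTML DOM is not available, the same lookup can be sketched with PHP's built-in DOMDocument and DOMXPath instead. This is a swapped-in technique, not the library used in the question, and the markup below is a shortened, well-formed version of the sample:

```php
<?php
// Shortened, well-formed version of the question's markup (an assumption for this sketch).
$html_string = '<div class="user-info ">
<div class="user-details">
David Mulder
<div class="-flair"></div>
</div>
</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html_string);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match "user-details" as one of possibly several space-separated class names.
$nodes = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " user-details ")]');

// textContent returns the text of the node and all of its descendants.
$name = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $name;
```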

PHP with DOMXPath - Sum values after selection of items

I have this html structure:
<div class="wanted-list">
<div class="headline"></div>
<div class="entry">
<div></div>
<div></div>
<div class="length">1100</div>
<div></div>
<div class="status">
<img src="xxxx" alt="open">
</div>
</div>
<div class="entry mark">
<div></div>
<div></div>
<div class="length">800</div>
<div></div>
<div class="status">
<img src="xxxx" alt="open">
</div>
</div>
<div class="entry">
<div></div>
<div></div>
<div class="length">2300</div>
<div></div>
<div class="status">
<img src="xxxx" alt="closed">
</div>
</div>
</div>
I want to select only the items that are 'open', so I do:
$doc4 = new DOMDocument();
$doc4->loadHtmlFile('http://www.whatever.com');
$doc4->preserveWhiteSpace = false;
$xpath4 = new DOMXPath($doc4);
$elements4 = $xpath4->query("//div[@class='wanted-list']/div/div[5]/img[@alt='open']");
Now, if I'm not mistaken, we have isolated the 'open' items we wanted. Now, I need to get the 'length' values, and sum them to make a total length so I can echo it. I've spent several hours trying different solutions and researching, but I haven't found anything similar. Can you guys help?
Thanks in advance.
EDITED the wrong div's, sorry.

I'm not sure whether you want all the calculations done in the XPath expression or whether you just want the sum of the lengths available in PHP; either way, this captures and sums the lengths. As noted by @Chris85 in the comments, the HTML was invalid: there were spare closing div tags within each entry, and presumably the image is supposed to be a child of div.status. If that is so, the code below would need slight modification to target the correct parent. That said, I received no warnings from DOMDocument whilst parsing it, but it is better to fix than ignore!
$strhtml='
<div class="wanted-list">
<div class="headline"></div>
<div class="entry">
<div></div>
<div></div>
<div class="length">1100</div>
<div></div>
<div class="status">
<img src="xxxx" alt="open">
</div>
</div>
<div class="entry mark">
<div></div>
<div></div>
<div class="length">800</div>
<div></div>
<div class="status">
<img src="xxxx" alt="open">
</div>
</div>
<div class="entry">
<div></div>
<div></div>
<div class="length">2300</div>
<div></div>
<div class="status">
<img src="xxxx" alt="closed">
</div>
</div>
</div>';
$dom = new DOMDocument();
$dom->loadHtml( $strhtml );/* alternative to loading a file directly */
$dom->preserveWhiteSpace = false;
$xp = new DOMXPath($dom);
$col=$xp->query('//img[@alt="open"]');/* target the nodes with the attribute you need to look for */
/* variable to increment with values found from DOM values */
$length=0;
foreach( $col as $n ) {/* loop through the found nodes collection */
$parent=$n->parentNode->parentNode;/* corrected here to account for change in html layout ~ get the suitable parent node */
/* based on original code, find value from particular node */
$length += $parent->childNodes->item(5)->nodeValue;
}
echo 'Length:'.$length;
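
As an alternative sketch, the sum can also be computed inside XPath itself with DOMXPath::evaluate(), which avoids relying on the position of the length div among the child nodes. The markup is a condensed copy of the sample, and the expression assumes the image is a child of div.status:

```php
<?php
// Condensed copy of the sample markup (empty filler divs removed).
$strhtml = '<div class="wanted-list">
<div class="entry"><div class="length">1100</div><div class="status"><img src="xxxx" alt="open"></div></div>
<div class="entry mark"><div class="length">800</div><div class="status"><img src="xxxx" alt="open"></div></div>
<div class="entry"><div class="length">2300</div><div class="status"><img src="xxxx" alt="closed"></div></div>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($strhtml);
$xp = new DOMXPath($dom);

// Pick every entry that contains an "open" status image, then sum its length div.
$total = $xp->evaluate(
    'sum(//div[@class="wanted-list"]/div[div[@class="status"]/img[@alt="open"]]/div[@class="length"])'
);
echo 'Length:' . $total;
```

Because evaluate() returns the result of sum() as a float, no loop is needed on the PHP side.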

Efficient and easy way to extract data from divs taken from website

Hi, I have 100 such elements in each of 40 files, so 4000 elements in total.
I want to extract the src and href from each element and store them in an array so I can simply send it to the database.
<a class="market_listing_row_link" href="http://steamcommunity.com/market/listings/730/FAMAS%20%7C%20Colony%20%28Minimal%20Wear%29" id="resultlink_99">
<div class="market_listing_row market_recent_listing_row market_listing_searchresult" id="result_99">
<img id="result_99_image" src="http://steamcommunity-a.akamaihd.net/economy/image/fWFc82js0fmoRAP-qOIPu5THSWqfSmTELLqcUywGkijVjZYMUrsm1j-9xgEObwgfEh_nvjlWhNzZCveCDfIBj98xqodQ2CZknz59Ne60Iwh0fTvREaFdWco39RrlByIN5M5kXMP49bhWKA3utIrGYLl-M4pJH5PRWaLSNFz5ux1pg_dbeZyPoyvui3i6PnBKBUQvkKsHsA/62fx62f" style="border-color: #D2D2D2;" class="market_listing_item_img" alt="" />
<div class="market_listing_right_cell market_listing_their_price">
<span class="market_table_value">
Starting at:<br/>
<span style="color:white">$0.05 USD</span>
</span>
<span class="market_arrow_down" style="display: none"></span>
<span class="market_arrow_up" style="display: none"></span>
</div>
<div class="market_listing_right_cell market_listing_num_listings">
<span class="market_table_value">
<span class="market_listing_num_listings_qty">6,191</span>
</span>
</div>
<div class="market_listing_item_name_block">
<span id="result_99_name" class="market_listing_item_name" style="color: #D2D2D2;">FAMAS | Colony (Minimal Wear)</span>
<br/>
<span class="market_listing_game_name">Counter-Strike: Global Offensive</span>
</div>
</div>
</a>

You could try using Simple HTML Dom: http://simplehtmldom.sourceforge.net/
It will let you go through the HTML using the DOM instead of having to manually parse through everything.
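
A comparable sketch using PHP's built-in DOM classes rather than Simple HTML Dom is shown here. The two listings and their example.com URLs are hypothetical stand-ins for the much longer markup in the question; only the class names are taken from it:

```php
<?php
// Two hypothetical listings that follow the structure of the question's markup.
$html = <<<HTML
<a class="market_listing_row_link" href="http://example.com/listings/item-1" id="resultlink_0">
<div class="market_listing_row" id="result_0">
<img id="result_0_image" src="http://example.com/images/item-1.png" class="market_listing_item_img" alt="" />
</div>
</a>
<a class="market_listing_row_link" href="http://example.com/listings/item-2" id="resultlink_1">
<div class="market_listing_row" id="result_1">
<img id="result_1_image" src="http://example.com/images/item-2.png" class="market_listing_item_img" alt="" />
</div>
</a>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$items = [];
foreach ($xpath->query('//a[@class="market_listing_row_link"]') as $a) {
    $items[] = [
        'href' => $a->getAttribute('href'),
        // string() yields the src of the first matching image inside this link, or '' if there is none.
        'src'  => $xpath->evaluate('string(.//img[@class="market_listing_item_img"]/@src)', $a),
    ];
}
print_r($items);
```

Running this once per file and merging the resulting arrays would give one list of rows ready for a database insert.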

Php to auto populate grids

I have the following html code:
<div class="media row-fluid">
<div class="span3">
<div class="widget">
<div class="well">
<div class="view">
<img src="img/demo/media/1.png" alt="" />
</div>
<div class="item-info">
Title 1
<p>Info.</p>
<p class="item-buttons">
<i class="icon-pencil"></i>
<i class="icon-trash"></i>
</p>
</div>
</div>
</div>
<div class="widget">
<div class="well">
<div class="view">
<img src="img/demo/media/2.png" alt="" />
</div>
<div class="item-info">
This is another title
<p>Some info and details go here.</p>
<p class="item-buttons">
<i class="icon-pencil"></i>
<i class="icon-trash"></i>
</p>
</div>
</div>
</div>
</div>
The markup basically alternates between a span class with the widget class, and then the widget class without the span3 class.
What I want to know is whether there is a way to have PHP echo or populate the image and the details under the "item-info" class. Would I need to use a foreach statement to get this done? I would be storing the information in a MySQL database, and while I can get it to fill in the info one by one (repeatedly entering the markup and echoing out each image and item title), that is not practical when more than 15 different items need to be displayed. I'm not well versed in foreach statements, so I could definitely use some help with it.
If someone could help me perhaps structure a php script so that it can automatically output the html based on the number individual items in the database, that'd be greatly appreciated!
I'm wondering if the html + php (not including the foreach) would look like this:
<div class="span3">
<div class="widget">
<div class="well">
<div class="view">
<img src="img/<?= $file ?>" alt="" />
</div>
<div class="item-info">
<?= $title ?>
<p>Info.</p>
<p class="item-buttons">
<i class="icon-pencil"></i>
<i class="icon-trash"></i>
</p>
</div>
</div>
</div>
EDIT:
I wanted to add some more information. The items populated would be based on a type of subscription - which will be managed by a group id.
I was initially going to use
<?php if ($_SESSION['group_id'] == 1) {
echo '<div class="item-info">';
echo $title;
echo '<p>' . $info . '</p>';
echo '</div>';
} ?>
so that only the subscribed items would populate. But, I would need it to iterate through all the items for group1 table and list it. Currently I know that I can do
<?php
if ($_SESSION['group_id'] == 1) {
while ($row = mysql_fetch_assoc($sqlItem))
{
$itemInfo = $row['info'];
$image = $row['image'];
$title = $row['title'];
$url = $row['url'];
}
}
?>
$sqlItem for now can only be assigned one thing (manually - as in: $sqlItem = '123'), unless I iterate through which is what I'm trying to figure out.

I just read that mysql_fetch_assoc is deprecated as of PHP 5.5. The mysqli way is newer and, I think, looks better and easier:
http://php.net/manual/en/mysqli-stmt.fetch.php
In the example there, replace the printf with an echo of your HTML.
The loop will iterate through the rows in your database until there are no more matching records.

Shouldn't a while loop be enough? It depends on the structure of your database and website (I don't think we needed so much HTML; some more PHP would have helped). Hope this helps.
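
A minimal sketch of the looping part is shown below, assuming the rows have already been fetched into an array. The $rows data and the renderWidget() helper are hypothetical; in practice each row would come from the fetch loop, and the column names image, title and info follow the question:

```php
<?php
// Hypothetical rows; in practice these would come from the database query.
$rows = [
    ['image' => 'demo/media/1.png', 'title' => 'Title 1', 'info' => 'Info.'],
    ['image' => 'demo/media/2.png', 'title' => 'Tom & Jerry', 'info' => 'Some info and details go here.'],
];

// Hypothetical helper: builds the markup for one widget from one row.
function renderWidget(array $row): string
{
    // Escape every value so that database content cannot break the HTML.
    $image = htmlspecialchars($row['image']);
    $title = htmlspecialchars($row['title']);
    $info  = htmlspecialchars($row['info']);

    return '<div class="widget"><div class="well">'
        . '<div class="view"><img src="img/' . $image . '" alt="" /></div>'
        . '<div class="item-info">' . $title . '<p>' . $info . '</p></div>'
        . '</div></div>';
}

$output = '';
foreach ($rows as $row) {
    $output .= renderWidget($row) . "\n";
}
echo $output;
```

The same renderWidget() call could sit directly inside a while loop over a result set, so the number of widgets always matches the number of rows.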

Find and separate the HTML blocks to an array

First of all, I want to describe the idea. Everyone knows that any CMS or simple website has some kind of repeated blocks, for example the list of articles on the main page of a WordPress site, where each article is shown in a block of information: title, author, content, date, etc.
So the main idea is how to find and separate such blocks of HTML and append each of them to an array.
I thought I would first need to strip them of classes, ids and styles.
Step 1:
<div id="box1">
<h3 class="title_style">Title1</h3>
<p>content for box1</p>
<div class="author">Author Name1<span class="style_date">date1</span>any text</div>
</div>
<div id="box2">
<h3 class="title_style">Title2</h3>
<p>content for box2</p>
<div class="author">Author Name2<span class="style_date">date2</span>any text2</div>
</div>
to
<div>
<h3>Title1</h3>
<p>content for box1</p>
<div>Author Name1<span>date1</span>any text</div>
</div>
<div>
<h3>Title2</h3>
<p>content for box2</p>
<div>Author Name2<span>date2</span>any text2</div>
</div>
Step 2:
I need to find each block and write it to an array so that I can put each block into a table row like this (note that these blocks are present on almost any site, so it doesn't matter what tags they have; they just repeat with different content and attributes, and only the structure is the same):
<table>
<tr id="block1">
<td>Title1</td>
<td>content for box1</td>
<td>Author Name1</td>
<td>date1</td>
<td>any text</td>
</tr>
<tr id="block2">
<td>Title2</td>
<td>content for box2</td>
<td>Author Name2</td>
<td>date2</td>
<td>any text2</td>
</tr>
</table>
Any ideas? I need the logic for how to do this, not the code itself.

You can walk the DOM of the document using PHP's DOMDocument class.
So you can do something like this:
$str = <<<STR
<div id="box1">
<h3 class="title_style">Title1</h3>
<p>content for box1</p>
<div class="author">Author Name1<span class="style_date">date1</span>any text</div>
</div>
<div id="box2">
<h3 class="title_style">Title2</h3>
<p>content for box2</p>
<div class="author">Author Name2<span class="style_date">date2</span>any text2</div>
</div>
STR;
$dom = new DOMDocument();
$dom->loadHTML($str);
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
//read child elements
}
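
One way to fill in the "read child elements" step is sketched here, under the assumption that every top-level div is one block and that each non-empty text node inside it is one field:

```php
<?php
$str = <<<STR
<div id="box1">
<h3 class="title_style">Title1</h3>
<p>content for box1</p>
<div class="author">Author Name1<span class="style_date">date1</span>any text</div>
</div>
<div id="box2">
<h3 class="title_style">Title2</h3>
<p>content for box2</p>
<div class="author">Author Name2<span class="style_date">date2</span>any text2</div>
</div>
STR;

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

$blocks = [];
// loadHTML wraps the fragment in <html><body>, so the blocks are the direct div children of body.
foreach ($xpath->query('/html/body/div') as $div) {
    $fields = [];
    // Collect every non-blank text node in document order; tags and attributes are ignored.
    foreach ($xpath->query('.//text()[normalize-space()]', $div) as $text) {
        $fields[] = trim($text->nodeValue);
    }
    $blocks[] = $fields;
}
print_r($blocks);
```

Each inner array then maps directly onto one table row, with one cell per field. Because only text nodes are read, there is no need to strip classes, ids or styles first.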

Try this library Simple HTML Dom Parser.
