phpQuery selecting div inside first li - php

I have an HTML page that I'm trying to read. I used htmlsql.class.php before, but it is too old and outdated, so I have to use phpQuery.
The html markup is:
<ul class="small-block-grid-1 medium-block-grid-2 large-block-grid-3">
<li>
<div data-widget-type="epg.tvGuide.channel" data-view="epg.tvGuide.channel" id="widget-765574917197" class=" widget-epg_tvGuide_channel">
<div class="group-box">
<div class="group-header l-center" data-action="togglePreviousBroadcasts">
<span class="header-text">
<img src="logo.png" style="height: 40px" />
</span>
</div>
<div>
<div class="tvGuide-item is-past">
<span data-action="toggleEventMeta">
06:15 what a day
</span>
<div class="tvGuide-item-meta">
Some text.
<div>Näita rohkem</div>
</div>
</div>
<div class="tvGuide-item is-current">
<span data-action="toggleEventMeta">
06:15 what a day
</span>
<div class="tvGuide-item-meta">
Some text.
<div>Näita rohkem</div>
</div>
</div>
<div class="tvGuide-item">
<span data-action="toggleEventMeta">
06:15 what a day
</span>
<div class="tvGuide-item-meta">
Some text.
<div>Näita rohkem</div>
</div>
</div>
</div>
</div>
With the previous library it was fairly easy:
$wsql->select('li');
if (!$wsql->query('SELECT * FROM span')){
print "Query error: " . $wsql->error;
exit;
}
foreach($wsql->fetch_array() as $row){
But I could not read the class, and I need to know when the class is current and when it is not.
I'm new to phpQuery, and real-life examples are hard to find.
Can someone point me in the right direction?
I would like to get the "span" text and the item meta, and I would also like to know when the div class is "is-past" or "is-current".

You can find information about phpQuery here: https://code.google.com/archive/p/phpquery/
I prefer the "one-file" version at the top of the downloads page:
https://code.google.com/archive/p/phpquery/downloads
Simple examples based on your code:
// for loading files use phpQuery::newDocumentFileHTML();
// for plain strings use phpQuery::newDocument();
$document = phpQuery::newDocumentFileHTML('http://domain.com/yourFile.html');
$items = pq($document)->find('.tvGuide-item');
foreach($items as $item) {
if(pq($item)->hasClass('is-past') === true) {
// matching past items
}
if(pq($item)->hasClass('is-current') === true) {
// matching current items
}
// examples for finding elements and grabbing text/attributes
$span = pq($item)->find('span');
$text_in_span = pq($span)->text();
$meta = pq($item)->find('.tvGuide-item-meta');
$link_in_meta = pq($meta)->find('a');
$href_of_link_in_meta = pq($link_in_meta)->attr('href');
}
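If phpQuery is not an option, roughly the same extraction can be sketched with PHP's built-in DOMDocument and DOMXPath instead. This is only a sketch under assumptions: the markup is a trimmed copy of the question's HTML with made-up times and titles, classTest() is a hypothetical helper, and the "upcoming" label for items that are neither past nor current is my own naming.

```php
<?php
// Trimmed-down copy of the question's markup; times and titles are made up.
$html = <<<'HTML'
<ul>
<li>
<div class="tvGuide-item is-past">
<span data-action="toggleEventMeta">06:15 what a day</span>
<div class="tvGuide-item-meta">Some text.<div>Show more</div></div>
</div>
<div class="tvGuide-item is-current">
<span data-action="toggleEventMeta">07:00 news</span>
<div class="tvGuide-item-meta">Other text.<div>Show more</div></div>
</div>
<div class="tvGuide-item">
<span data-action="toggleEventMeta">08:00 film</span>
<div class="tvGuide-item-meta">More text.<div>Show more</div></div>
</div>
</li>
</ul>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical helper: XPath 1.0 test for one token in a space-separated class list,
// so that "tvGuide-item" does not also match "tvGuide-item-meta".
function classTest(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

$rows = [];
foreach ($xpath->query('//li[1]//div[' . classTest('tvGuide-item') . ']') as $item) {
    $classes = preg_split('/\s+/', trim($item->getAttribute('class')));
    if (in_array('is-current', $classes, true)) {
        $status = 'current';
    } elseif (in_array('is-past', $classes, true)) {
        $status = 'past';
    } else {
        $status = 'upcoming';       // assumption: no marker class means not started yet
    }
    $rows[] = [
        'status' => $status,
        // text of the span directly inside the item
        'title'  => $xpath->evaluate('normalize-space(./span)', $item),
        // first text node of the meta div, skipping the nested "Show more" div
        'meta'   => $xpath->evaluate(
            'normalize-space(./div[' . classTest('tvGuide-item-meta') . ']/text()[1])',
            $item
        ),
    ];
}

print_r($rows);
```

Each entry in $rows then carries the status, the span text and the meta text for one programme, which is the information the question asks for.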

Related

simple html dom traversal confusion when looping

I'm trying to use the php script simplehtmldom to loop over divs on a web page while scraping.
Right now I have this:
$url = "https://test.com/";
$html = new simple_html_dom();
$html->load_file($url);
$item_list = $html->find('div.main div[id]');
foreach ($item_list as $item)
{
echo $item->outertext . PHP_EOL;
}
This will give me many like this (from the echo in the loop above):
<div id=1>
<div>
stuff here
</div>
<div>
<span class="title">name</span>
</div>
</div>
<div id=2>
<div>
stuff here
</div>
<div>
<span class="title">name 2</span>
</div>
</div>
What I'm trying to do is loop over the span with class=title, but no matter what I can't seem to quite get the right selector. Could someone help me out?
You can get the spans by adding span[class=title] to the selector:
$item_list = $html->find('div.main div[id] span[class=title]');
foreach ($item_list as $item)
{
echo $item->outertext . PHP_EOL;
}
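If you need the title per outer div rather than one flat list of spans, a relative query inside the loop is one way to do it. The sketch below uses PHP's built-in DOMDocument and DOMXPath rather than simplehtmldom, and the sample markup is reconstructed from the output shown in the question, so treat it as an illustration only.

```php
<?php
// Sample markup reconstructed from the question's output (assumption).
$html = <<<'HTML'
<div class="main">
<div id="1"><div>stuff here</div><div><span class="title">name</span></div></div>
<div id="2"><div>stuff here</div><div><span class="title">name 2</span></div></div>
</div>
HTML;

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$titles = [];
// Outer loop: every div with an id inside div.main.
foreach ($xpath->query("//div[@class='main']//div[@id]") as $item) {
    // Inner query is relative to the current item (note the leading dot).
    $titles[$item->getAttribute('id')] =
        $xpath->evaluate("string(.//span[@class='title'])", $item);
}

print_r($titles);
```

The leading dot in the inner expression is what keeps the search inside the current div; without it the query would run against the whole document again.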

Xpath to select based on attribute value not working in PHP as expected

The code works well if it is "//div" or "//html". The moment I use "//*[@class='hit']", "//div[@class='hit']" or '//*[@class="hit"]', it does not select the element I need.
This is the code:
$xpath = "//div";
$data = file_get_contents("https://www.hachi.tech/searching?q=&hPP=144&idx=instant_product_price_asc&p=0&is_v=1");
d($data);
//d() is a custom function that works like var_dump
$doc = new DOMDocument();
$doc->loadHTML($data);
$xpatho = new DOMXpath($doc);
$elementsn = $xpatho->query($xpath);
d($xpath);
d($elementsn->length);
//d() is a custom function that works like var_dump
When I dumped $data, I got this:
https://justpaste.it/38v46
(the text is very long so I pasted in a separate link).
There is clearly a div element with class="hit" in the html (you can do a search). Search for:
<div class="hit" style="min-height:258px;">
I can only think of malformed HTML, in which case what can I do in general to check (and fix!) the HTML first before passing it for selection?
You can't access that div directly because it is not in the DOM; it is inside a script element and thus treated as text. It gets added to the DOM once the page loads, via this code:
a.addWidget(instantsearch.widgets.hits({container:"#hits",hitsPerPage: showperpage,templates:{item: document.querySelector("#hit-template").innerHTML ,empty: document.querySelector("#no-results-template").innerHTML }}));
You can find the contents of the script tag with this code:
// $doc is the DOMDocument loaded in the question
$xpatho = new DOMXPath($doc);
$elementsn = $xpatho->query("//*[contains(text(), 'div class=\"hit\"')]");
var_dump($elementsn);
echo htmlspecialchars($elementsn->item(0)->nodeValue);
Output:
<div class="col-lg-3 col-md-3 col-sm-4 col-xs-6"> <div class="hit" style="min-height:258px;"> <div class="thumbnail"> <div class="stay-image"> <div class="space-overlay hide" id="{{item_id}}"> <a href="https://www.hachi.tech/product/{{{item_id}}}/{{{item_url_desc}}}"><img src="https://cdn.hachi.tech/assets/images/product_images_thumb/{{image}}" alt="{{item_desc}}" class="img-responsive"/> <div class="caption"> <h5 class="product-title"><a style="-webkit-line-clamp: 3;max-height: 50px;" href="https://www.hachi.tech/product/{{{item_id}}}/{{{item_url_desc}}}">{{{_highlightResult.item_desc.value}}} <div class="prod-rating" id="R{{{item_id}}}"> <span class="text-red mbr-price"><span class="text-red mbr-price hidden-xs">ValueClub {{{display_final_price}}} <br class="dp_rebate"><span class="dp_rebate text-red product-price">{{{rebate}}} <p> <span class="reg-price"> <span class="hidden-xs hidden-sm" style='color:black'>{{strikeoff}} <div class="positionBtns"> {{#color_display}} <a class="hidden-xs hidden-sm" id="{{item_id}}" href="https://www.hachi.tech/product/{{{item_id}}}/{{{item_url_desc}}}"><span class="colours-special">{{color_display}} {{/color_display}} {{#special_display}} <a class="hidden-xs hidden-sm" id="{{item_id}}" href="https://www.hachi.tech/product/{{{item_id}}}/{{{item_url_desc}}}"><span class="colours-special">{{special_display}} {{/special_display}} <span class="colours-special publish_inv_check hide " style="color: #EE1C24;" id="{{item_id}}">Sold Out
However, if you want to access that div directly as an element, you will need something that executes the page's JavaScript, such as a headless browser driven by Selenium.
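To check for this situation in general, one option is to collect libxml's parse errors and compare an element query against a search of the script text. The sketch below runs on a small made-up page (not the real hachi.tech HTML) in which the "hit" div exists only inside a template script; the variable names are my own.

```php
<?php
// Made-up page: the "hit" div exists only as text inside a template script.
$data = <<<'HTML'
<html><body>
<div id="hits"></div>
<script type="text/template" id="hit-template">
<div class="col-lg-3"><div class="hit"><h5>{{item_desc}}</h5></div></div>
</script>
</body></html>
HTML;

libxml_use_internal_errors(true);      // buffer parse errors instead of emitting warnings
$doc = new DOMDocument();
$doc->loadHTML($data);
$parseErrors = libxml_get_errors();    // array of LibXMLError objects, possibly empty
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// How many real <div class="hit"> elements are in the parsed DOM?
$asElement = $xpath->query("//div[@class='hit']")->length;

// How many script elements merely contain that markup as text?
$asScriptText = $xpath->query("//script[contains(., 'class=\"hit\"')]")->length;

foreach ($parseErrors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
printf("elements: %d, script blocks containing the markup: %d\n", $asElement, $asScriptText);
```

If the first count is zero and the second is not, the markup is template text rather than malformed HTML, and no amount of repairing the document will turn it into a selectable element.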

How to get content text of div by simple html dom - php

I get the HTML below with Simple HTML DOM (file_get_html('http://example.com')):
<div id="ship" class="fe" data-feature-name="box" data-cel-widget="sox">
<div class="a-medium b-di">
<div id="mer-info" class="a-section a-spacing-mini">
Hello World
<span class="">
</span>
</div>
</div>
</div>
How can I get the 'Hello World' text content?
I tried a lot of things, for example the calls below, but they gave me NULL:
$html->find('div[id="mer-info"]',0);
$html->find("div#mer-info");
$html->find("div#mer-info")->plaintext;
$html->find('div[id="mer-info"]')->innertext;
and so on.
But I still got NULL!
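Without an index as the second argument, find() returns an array of elements, so reading ->plaintext or ->innertext from its result gives NULL. You passed the index (0) only in the call that uses div[id="mer-info"] as the selector, and that call never reads the element's text. Pass the index and then read plaintext: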
require 'simple_html_dom.php';
$html =<<<html
<div id="ship" class="fe" data-feature-name="box" data-cel-widget="sox">
<div class="a-medium b-di">
<div id="mer-info" class="a-section a-spacing-mini">
Hello World
<span class="">
</span>
</div>
</div>
</div>
html;
$dom = str_get_html($html);
$elem = $dom->find('#mer-info', 0);
print $elem->plaintext;
print "\n";
$elem = $dom->find('div#mer-info', 0);
print $elem->plaintext;
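The same "list versus single element" distinction shows up in PHP's built-in DOM classes, which may help if Simple HTML DOM is not available. A small sketch using the question's markup (data attributes omitted for brevity): query() returns a node list, and the text is read only after picking one node out of it.

```php
<?php
// Markup from the question, with the data-* attributes left out.
$html = <<<'HTML'
<div id="ship" class="fe">
<div class="a-medium b-di">
<div id="mer-info" class="a-section a-spacing-mini">
Hello World
<span class="">
</span>
</div>
</div>
</div>
HTML;

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$nodes = $xpath->query('//div[@id="mer-info"]');   // a DOMNodeList, not a single element
$first = $nodes->item(0);                          // one node, or null if nothing matched
$text  = $first !== null ? trim($first->textContent) : null;

var_dump($text);
```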

How do I extract keyword from webpage using PHP DOM

Here is a sample of code I have extracted from a webpage...
<div class="user-details-narrow">
<div class="profileheadtitle">
<span class=" headline txtBlue size15">
Profession
</span>
</div>
<div class="profileheadcontent-narrow">
<span class="txtGrey size15">
administration
</span>
</div>
</div>
When displayed on the webpage it shows as "Profession administration". What I want to do is extract the profession, in this case "administration". However, it's not as simple as it might seem because this piece of code is repeated many times for various other questions, such as
<div class="user-details-narrow">
<div class="profileheadtitle">
<span class=" headline txtBlue size15">
Industry
</span>
</div>
<div class="profileheadcontent-narrow">
<span class="txtGrey size15">
banking
</span>
</div>
</div>
Any ideas on a good solution?
Please do not use regular expressions to get node values from a page.
PHP has a very nice class named DOMDocument. You can load the page into a DOMDocument and query it with XPath:
$dom = new DOMDocument;
// loadHTMLFile() reads the page from a file path or URL
$dom->loadHTMLFile("http://test.de/page.html");
$finder = new DOMXPath($dom);
$spaner = $finder->query("//*[contains(@class, 'size15')]");
echo $spaner->item(0)->nodeValue . "/" . $spaner->item(1)->nodeValue;
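Since the same block repeats for every question, it may be more robust to pair each label with its value than to rely on item(0) and item(1). Here is a minimal sketch of that idea using the two blocks from the question as inline sample data; the $details array is my own naming.

```php
<?php
// The two sample blocks from the question, inlined so the sketch is self-contained.
$html = <<<'HTML'
<div class="user-details-narrow">
<div class="profileheadtitle"><span class=" headline txtBlue size15"> Profession </span></div>
<div class="profileheadcontent-narrow"><span class="txtGrey size15"> administration </span></div>
</div>
<div class="user-details-narrow">
<div class="profileheadtitle"><span class=" headline txtBlue size15"> Industry </span></div>
<div class="profileheadcontent-narrow"><span class="txtGrey size15"> banking </span></div>
</div>
HTML;

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$finder = new DOMXPath($dom);

$details = [];
foreach ($finder->query("//div[@class='user-details-narrow']") as $block) {
    // Both expressions are relative to the current block (leading dot).
    $label = $finder->evaluate("normalize-space(.//div[@class='profileheadtitle']/span)", $block);
    $value = $finder->evaluate("normalize-space(.//div[@class='profileheadcontent-narrow']/span)", $block);
    if ($label !== '') {
        $details[$label] = $value;
    }
}

echo $details['Profession'] ?? 'not found', "\n";
```

Looking values up by label means the result does not change if the page reorders the blocks or adds new ones.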

Php to auto populate grids

I have the following html code:
<div class="media row-fluid">
<div class="span3">
<div class="widget">
<div class="well">
<div class="view">
<img src="img/demo/media/1.png" alt="" />
</div>
<div class="item-info">
Title 1
<p>Info.</p>
<p class="item-buttons">
<i class="icon-pencil"></i>
<i class="icon-trash"></i>
</p>
</div>
</div>
</div>
<div class="widget">
<div class="well">
<div class="view">
<img src="img/demo/media/2.png" alt="" />
</div>
<div class="item-info">
This is another title
<p>Some info and details go here.</p>
<p class="item-buttons">
<i class="icon-pencil"></i>
<i class="icon-trash"></i>
</p>
</div>
</div>
</div>
</div>
The markup basically alternates between a widget wrapped in a div with the span3 class and a widget without the span3 wrapper.
What I wanted to know was if there was a way to have php "echo" or populate the details for and details under the "item-info" class. Would I need to use a foreach statement to get this done? I would be storing the information in a mysql database, and while I can get it to fill in the info one by one (repeatedly entering the and echoing out each image and item title) it's not practical when the content needed to be displayed is over 15 different items. I'm not well versed in foreach statements so I could definitely use some help on it.
If someone could help me perhaps structure a php script so that it can automatically output the html based on the number individual items in the database, that'd be greatly appreciated!
I'm wondering if the html + php (not including the foreach) would look like this:
<div class="span3">
<div class="widget">
<div class="well">
<div class="view">
<img src="img/<? $file ?>" alt="" />
</div>
<div class="item-info">
<?$title?>
<p>Info.</p>
<p class="item-buttons">
<i class="icon-pencil"></i>
<i class="icon-trash"></i>
</p>
</div>
</div>
</div>
EDIT:
I wanted to add some more information. The items populated would be based on a type of subscription - which will be managed by a group id.
I was initially going to use <? (if $_SESSION['group_id']==1)>
echo <div class="item-info">
$title
<p>$info</p>
</div>
so that only the subscribed items would populate. But, I would need it to iterate through all the items for group1 table and list it. Currently I know that I can do
<? (if $_SESSION['group_id']==1)
while ($row=mysql_fetch_assoc($sqlItem))
{
$itemInfo = $row['info'];
$image = $row['image'];
$title = $row['title'];
$url = $row['url'];
};
>
$sqlItem for now can only be assigned one thing (manually - as in: $sqlItem = '123'), unless I iterate through which is what I'm trying to figure out.
I just read that mysql_fetch_assoc is deprecated as of PHP 5.5. Here is the new way, which looks better and easier, I think:
http://php.net/manual/en/mysqli-stmt.fetch.php
Replace the printf in that example with an echo of your HTML.
This will iterate through the rows in your database until there are no more matching records.
Shouldn't a while loop be enough? It depends on the structure of your database and website (we didn't need so much HTML, I think; some more PHP, maybe). Hope this helps.
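To make the loop idea concrete, here is a rough sketch of rendering one widget per row. The rows are a hard-coded stand-in for the database result, and the column names (title, info, image) and the renderWidget() helper are assumptions for illustration, not something taken from the question's schema.

```php
<?php
// Stand-in for database rows (assumption). With mysqli you would typically
// build the same kind of array from $result->fetch_assoc() in a while loop.
$items = [
    ['title' => 'Title 1', 'info' => 'Info.', 'image' => '1.png'],
    ['title' => 'This is another title', 'info' => 'Some info and details go here.', 'image' => '2.png'],
];

// Hypothetical helper: returns the HTML for one widget, escaping every value.
function renderWidget(array $row): string
{
    $image = htmlspecialchars($row['image'], ENT_QUOTES, 'UTF-8');
    $title = htmlspecialchars($row['title'], ENT_QUOTES, 'UTF-8');
    $info  = htmlspecialchars($row['info'], ENT_QUOTES, 'UTF-8');

    return <<<HTML
<div class="widget">
  <div class="well">
    <div class="view"><img src="img/{$image}" alt="" /></div>
    <div class="item-info">
      {$title}
      <p>{$info}</p>
    </div>
  </div>
</div>

HTML;
}

// One pass over the rows, however many there are.
$output = '';
foreach ($items as $row) {
    $output .= renderWidget($row);
}

echo '<div class="span3">', "\n", $output, '</div>', "\n";
```

The group check from the question could wrap the loop, or be part of the query that selects the rows, so that only the subscribed items are fetched in the first place.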
