PHP DOM parser: get li plaintext - php

I have HTML like this:
<li>
TEXT <---- GET THIS TEXT
<ul>
<li>a</li>
<li>aa</li>
</ul>
</li>
I want to get "TEXT" from the li element, but when I try to get the li element I get the text of all the nested elements as well.
This is my code:
$html = str_get_html('<li>TEXT<ul><li>a</li><li>aa</li></ul></li>');
echo $html->find('li', 0)->plaintext;
Output:
TEXTaaa
But I need to get only TEXT, and I can't add an id or anything else to the markup.

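Each piece of text before or after an element is a text node, so you just need to get the first child node:
$foo->firstChild->textContent;
This assumes a DOMDocument-style API, as in PHP's built-in DOM extension. Simple HTML DOM has its own API, so this may not work there unchanged.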
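
For reference, here is a minimal sketch of the text-node idea using PHP's built-in DOM extension rather than Simple HTML DOM. The helper name direct_text() is made up for this example, and the markup is wrapped in an extra <ul> only to keep the sample well formed. It concatenates every text node that is an immediate child of the li, so it should also cope with inline elements sitting between the text and the nested list.

```php
<?php
// Hypothetical helper: returns only the text that sits directly inside $node,
// ignoring text that belongs to nested elements.
function direct_text(DOMNode $node): string
{
    $text = '';
    foreach ($node->childNodes as $child) {
        // Keep text nodes that are immediate children of $node.
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    return trim($text);
}

$doc = new DOMDocument();
$doc->loadHTML('<ul><li>TEXT<ul><li>a</li><li>aa</li></ul></li></ul>');

// The outer <li> comes first in document order.
$li = $doc->getElementsByTagName('li')->item(0);
$result = direct_text($li);
echo $result; // prints: TEXT
```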

I solved it! What you needed was to grab the first textnode:
<?php
require_once 'simple_html_dom.php';
$html = str_get_html('<li>TEXT<ul><li>a</li><li>aa</li></ul></li>');
echo $html->find('li text', 0)->plaintext;
?>

OK, another example:
$html = str_get_html('<li>TEXTb<ul><li>a</li><li>aa</li></ul></li>');
echo $html->find('li', 0)->first_child()->plaintext;
Now I get "b". How do I get "TEXT" in this situation?

Related

How to access a span tag without a class name

I have this code in my Simple HTML DOM project.
How can I access these span tags, which have no class name?
<div class="somename">
<span>This text i need </span>
<span>This text i need too </span>
</div>
How can I echo the contents of those span tags?
I already tried this:
$html->find(".somename",0)->innertext;
I believe you are using simple_html_dom.php. If that is the case, then:
$html->find("span",0)->innertext;
should give you the first span, and
$html->find("span",1)->innertext;
should give you the second span.
$html->find("span");
returns all spans as an array, so loop over it and read innertext from each element.
If you only want the text content of a span, use plaintext rather than innertext.
To search specifically for spans inside a div with the class somename, you can do it like this:
$html->find("div[class=somename] span");
This also returns an array when called without an index.
Reference: http://simplehtmldom.sourceforge.net/manual.htm
Use XPath to get those span tags. Note that SimpleXMLElement expects the input to be well-formed XML:
$xml = new SimpleXMLElement($yourHtmlContents);
$result = $xml->xpath('//span');
$firstSpan = (string) $result[0];
$secondSpan = (string) $result[1];
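
Since real-world HTML is often not well-formed XML, a variant of the XPath answer using DOMDocument::loadHTML() and DOMXPath may be more forgiving. This is only a sketch: the markup is the snippet from the question, and the $spans array is a name introduced for the example.

```php
<?php
$markup = '<div class="somename">'
    . '<span>This text i need </span>'
    . '<span>This text i need too </span>'
    . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Select only spans that are direct children of the div with class "somename".
$spans = [];
foreach ($xpath->query('//div[@class="somename"]/span') as $span) {
    $spans[] = trim($span->textContent);
}

print_r($spans);
```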

Find immediate descendants with PHP Simple DOM parser

I would like to be able to do the equivalent of
$html->find("#foo>ul")
But the PHP Simple DOM library doesn't recognize the "immediate descendant" selector >, so it finds all <ul> elements under #foo, including those nested deeper in the DOM.
What would you recommend as the best way to grab the immediate descendants that are of a specific type?
You can use DomElementFilter to fetch the desired type of nodes under some Dom branch. This is described here:
PHP DOM: How to get child elements by tag name in an elegant manner?
Or loop over all childNodes yourself and filter them by tag name:
foreach ($parent->childNodes as $node)
if ($node->nodeName == "tagname1")
...
HTML snippet
<div id="foo">
<ul>
<li>1</li>
</ul>
<ul>
<li>2</li>
</ul>
<ul>
<li>3</li>
</ul>
</div>
PHP code to get FIRST <ul>
echo $html->find('#foo>ul', 0);
this will output
<ul>
<li>1</li>
</ul>
but if you want to get just 1 from first <ul>
echo $html->find('#foo>ul', 0)->plaintext;
Just to share the solutions I found in related posts, in a nutshell:
"Find immediate descendants with PHP Simple DOM parser" works both with...
...PHP Simple DOM:
//if there is only one div containing your searched tag
foreach ($html->find('div.with-given-class')[0]->children() as $div_with_given_class) {
    if ($div_with_given_class->tag == 'tag-you-are-searching-for') {
        $output[] = $div_with_given_class->plaintext; //or whatever you want
    }
}
//if there are more divs with a given class (better solution)
$all_divs_with_given_class = $html->find('div.with-given-class');
foreach ($all_divs_with_given_class as $single_div_with_given_class) {
    foreach ($single_div_with_given_class->children() as $children) {
        if ($children->tag == 'tag-you-are-searching-for') {
            $output[] = $children->plaintext; //or whatever you want
        }
    }
}
...and also PHP DOM/xpath:
$all_divs_with_given_class =
    $xpath->query("//div[@class='with-given-class']/tag-you-are-searching-for");
if ($all_divs_with_given_class !== false) { //query() returns false for an invalid expression
    foreach ($all_divs_with_given_class as $tag_you_are_searching_for) {
        $output[] = $tag_you_are_searching_for->nodeValue; //or whatever you want
    }
}
Note that you have to use a single slash "/" in the XPath to find immediate descendants only.
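
To make the single-slash versus double-slash difference concrete, here is a small sketch with PHP's DOM extension. The markup is a made-up variation of the #foo example with one extra nested list, so the two queries return different counts.

```php
<?php
// Two <ul> elements are direct children of #foo; a third is nested deeper.
$markup = '<div id="foo">'
    . '<ul><li>1<ul><li>nested</li></ul></li></ul>'
    . '<ul><li>2</li></ul>'
    . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// "/" is the child axis: immediate descendants only.
$direct = $xpath->query('//div[@id="foo"]/ul');
// "//" is the descendant axis: matches at any depth.
$all = $xpath->query('//div[@id="foo"]//ul');

echo $direct->length, ' ', $all->length; // prints: 2 3
```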

How to parse multiple elements in portions for html via Simple Html Dom

I am attempting to get various elements inside of an li as shown below. I am pretty new to this so I may not be using the most efficient methods but this is where I have started...
EXAMPLE CODE SIMPLIFIED....
<li id='entry_0' title='09879879'>
<div ....>
<h2> The title text would go here </h2>
<span class='entrySize' ....> 20oz </span>
<span class='entryPrice' ....> $32.09 </span>
<span class='anotherEntry' ....> More Data I need To Grab </span>
.......
</div>
</li>
<li> .... With same structure as above .... 100's of entries like this </li>
I know how to pull individual parts separately but having trouble grasping how to do it grouped within a portion of the html.
$filename = "directory/file.html";
$html = file_get_html($filename);
for($i=0; $i<=count(entryNumber);$i++)
{
$li_id = "entry_".$i;
foreach($html->find('li[id='.$li_id.']') as $li) {
echo $li->innertext;
}
}
So this gets me the content of the list item tag with the id as the unique attribute. I would like to grab the h2 text, entrySize, entryPrice, etc. as I iterate through the list item tags. What I don't understand is how, once I have the list item content, I can parse its inner tags and attributes. Other parts of the full HTML document may have tags with the same ids and classes, so I am breaking the document down into portions and then parsing one section at a time.
I would also like to pull the title attribute out of the li tag.
I hope my explanation makes sense.
You should probably use a DOM parser. PHP comes bundled with one, and there are many others you could use.
http://php.net/dom
PHP Simple HTML DOM Parser
<?php
$html = file_get_contents($page);
$doc = new DOMDocument();
$doc->loadHTML($html);
// now find what you need
$items = $doc->getElementsByTagName('li');
foreach ($items as $item) {
    $id = $item->getAttribute('id');
    // the ids in the question start with "entry_"
    if (strpos($id, 'entry_') === 0) {
        // found a matching li, grab its children
    }
}
Use this as a baseline, we can't write all the code for you. Check out the PHP docs to finish this :) From what I have so far, you need to follow the docs to make it grab the child values, and handle them.
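
One possible way to finish that baseline, sketched under a few assumptions: the markup below is a cut-down copy of the question's example, the $entries array and its keys are names invented here, and each li is used as the context node so the relative XPath expressions only look inside that entry.

```php
<?php
// Nowdoc keeps "$32.09" literal instead of treating "$" as a variable.
$markup = <<<'HTML'
<ul>
<li id="entry_0" title="09879879">
  <div>
    <h2> The title text would go here </h2>
    <span class="entrySize"> 20oz </span>
    <span class="entryPrice"> $32.09 </span>
  </div>
</li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

$entries = [];
foreach ($xpath->query('//li[starts-with(@id, "entry_")]') as $li) {
    // The second argument scopes each expression to the current <li>.
    $entries[] = [
        'id'      => $li->getAttribute('id'),
        'title'   => $li->getAttribute('title'),
        'heading' => trim($xpath->evaluate('string(.//h2)', $li)),
        'size'    => trim($xpath->evaluate('string(.//span[@class="entrySize"])', $li)),
        'price'   => trim($xpath->evaluate('string(.//span[@class="entryPrice"])', $li)),
    ];
}

print_r($entries);
```

Scoping the queries to each li is what keeps identically named classes elsewhere in the document from leaking into an entry.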

How to get html code between two <p> tag?

I want to get some HTML code between two tags, and I have two regexes for it:
1-$LinkGrabber = "<p><strong>item1:<\/strong> <span style=\"color: #ff0000;\"><strong>Full<\/strong><\/span><\/p>(.*)<p> <\/p>";
2-$linkGrabber = "<p><strong>item2<\/strong> <span style=\"color: #ff0000;\"><strong>Full<\/strong><\/span><\/p>(.*)<p> <\/p>";
The first one works fine but the second does not. Can you tell me what the difference between them is?
I'd say they both work fine, but they're named differently. When testing the second one, make sure to use $linkGrabber instead of $LinkGrabber from the first example.
Don't ever use Regex to Parse HTML tags. Make use of a DOM Parser.
$dom = new DOMDocument;
@$dom->loadHTML($html); //<---- Pass your HTML source here
foreach ($dom->getElementsByTagName('p') as $tag) {
echo $tag->nodeValue; //"prints" the content of the p tag.
}
The first is looking for HTML that contains "item1:" while the second looks for "item2" (note that the second pattern has no colon after item2).
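
If the goal is the markup between the marker paragraph and the next empty paragraph, a DOM-based sketch could look like the following. The sample markup is modeled on the regex from the question, the two links are made-up filler, and the stop condition assumes the closing paragraph contains only plain whitespace.

```php
<?php
$markup = '<div>'
    . '<p><strong>item1:</strong> <span style="color: #ff0000;"><strong>Full</strong></span></p>'
    . '<a href="#one">link one</a><a href="#two">link two</a>'
    . '<p> </p>'
    . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Locate the marker paragraph by the start of its text.
$start = $xpath->query('//p[starts-with(normalize-space(.), "item1:")]')->item(0);

$between = '';
if ($start !== null) {
    for ($node = $start->nextSibling; $node !== null; $node = $node->nextSibling) {
        // Stop at the first paragraph that holds only whitespace.
        if ($node->nodeName === 'p' && trim($node->textContent) === '') {
            break;
        }
        // Serialize each sibling back to HTML.
        $between .= $doc->saveHTML($node);
    }
}

echo $between;
```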

DOM Document - How to get the text inside a tag without the inner tags

Assume I have a dom_document containing the following html and it is put in a variable called $dom_document
<div>
<a href='something'>some text here</a>
I want this
</div>
What I would like is to retrieve the text that is inside the div tag ('I want this'), but not the text of the a tag. What I do is the following:
$dom_document->nodeValue;
Unfortunately, this statement also returns the text of the a tag. Hope someone can help. Thank you in advance. Cheers. Marc
You can use XPath for it:
$xpath = new DOMXpath($dom_document);
$textNodes = $xpath->query('//div/text()');
foreach ($textNodes as $txt) {
echo $txt->nodeValue;
}
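
A self-contained version of the same idea, as a sketch: it builds the document from the question's snippet and skips the whitespace-only text nodes that surround the a tag. The $parts and $result names are introduced for the example.

```php
<?php
$markup = "<div>\n<a href='something'>some text here</a>\nI want this\n</div>";

$dom_document = new DOMDocument();
$dom_document->loadHTML($markup);
$xpath = new DOMXPath($dom_document);

// text() selects only text nodes that are direct children of the div.
$parts = [];
foreach ($xpath->query('//div/text()') as $txt) {
    $value = trim($txt->nodeValue);
    if ($value !== '') {
        $parts[] = $value;
    }
}

$result = implode(' ', $parts);
echo $result; // prints: I want this
```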
