php dom parser return parent and child - php

I think this is a simple question, but I can't sort it out. I am trying to get all heading tags with the Simple HTML DOM parser for PHP, but my code works only one way. For example:
$heading['h2']=$html->find('h2 a');//works fine
I have found that some sites wrap the h2 inside the a tag, like this:
<a href='#'><h2> my heading</h2></a>
The problem is trying to get both tags so I can display the link with it. So when I do this
$heading['h2']=$html->find('a h2');
I get the h2 fine, but without the link tag wrapped around it. That makes sense, since the selector finds all h2 tags that are descendants of an a. But how do I get the entire parent tag? What I want it to return is
<a href='#'><h2>My Headings</h2></a>
so that I can just print the output with
echo $heading['h2']; // and the link will be there

If the <a href="[..]"> is just the outer element, you can do it like this:
$heading['h2']=$html->find('a h2');
foreach ($heading['h2'] as $h2) {
    echo $h2->parent(), "\n";
}
You could also go up the DOM tree until you reach an <a> tag:
$heading['h2']=$html->find('a h2');
foreach ($heading['h2'] as $h2) {
    $a = $h2;
    // climb until the current node is an <a> (or we run out of parents)
    while ($a && $a->tag != "a") $a = $a->parent();
    if (!$a) continue; // no <a> above <h2>
    echo $a, "\n";
}
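For comparison, here is a minimal sketch of the same "walk up to the enclosing link" idea using PHP's built-in DOM extension instead of Simple HTML DOM. The helper name wrappingAnchors() and the sample markup are made up for illustration; the XPath step ancestor::a[1] selects the nearest <a> above each <h2>.

```php
<?php
// Sketch only: uses DOMDocument/DOMXPath, not the Simple HTML DOM library.
// wrappingAnchors() is a hypothetical helper name.

/**
 * Return the outer HTML of every <a> that contains an <h2>.
 *
 * @return string[]
 */
function wrappingAnchors(string $html): array
{
    $dom = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings silently.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $result = [];
    // For each <h2>, select the closest <a> ancestor (if there is one).
    foreach ($xpath->query('//h2/ancestor::a[1]') as $a) {
        $result[] = $dom->saveHTML($a);
    }
    return $result;
}

$sample = '<div><a href="#one"><h2>First</h2></a>'
        . '<h2>No link</h2>'
        . '<a href="#two"><h2>Second</h2></a></div>';

foreach (wrappingAnchors($sample) as $anchorHtml) {
    echo $anchorHtml, "\n";
}
```

Headings that are not inside a link are simply skipped, which mirrors the "if (!$a) continue;" check above.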

Well, my first thought would be to use
$html->find('a');
But I'm guessing you have multiple links on your page. So the correct practice would then be to use an ID (or a class) to identify your link:
<a href='#' id='titleLink'><h2> my heading</h2></a>
And then search for that specific ID:
$html->find('a#titleLink');
I don't know what library you're using and what syntax it supports, but I hope you get the idea anyway.

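According to the docs, find() returns an array when it is called with only a selector, so $html->find('a > h2')->parent() will not work directly. $heading['h2']=$html->find('a > h2', 0)->parent(); would return the anchor tag wrapping the first h2. If you have multiple 'a > h2' matches in the page, loop over the array with foreach and call parent() on each element.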

$info = $html->find('a,h2');
echo '<a href='.$info[0]->href.'>'.$info[1]->innertext.'</a>';

Related

use selector search on html code(string) on PHP variable or ways alike

What I'm currently doing is providing a text area where the user can copy and paste HTML code.
I want to get a certain element of that HTML.
In a plain HTML page, this can be done with a jQuery selector,
but I think it's a whole different thing when the HTML code is in a variable and treated as a string.
How can I locate a certain element in that case?
The code is:
function searchHtml() {
$html = $_POST; // text area input contains html code
$selector = "#rso > div > div > div:nth-child(1) > div > h3 > a"; //example - the a element with hello world
$getValue = getValueBySelector($selector); //will return hello world
}
function getValueBySelector($selector) {
//what will i do here?
}
searchHtml();
You can look at the Simple HTML DOM Parser (manual at http://simplehtmldom.sourceforge.net/manual.htm). It is a powerful tool for parsing HTML code to find and extract various elements and their attributes.
For your particular case, you can use
// Create a DOM object from the input string
$htmlDom = str_get_html($html);
// Find the required element
$e = $htmlDom->find($selector);
Oh, and you have to pass the submitted input value ($html) to the getValueBySelector() function :-)
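If installing a library is not an option, a rough equivalent can be sketched with PHP's built-in DOM extension. Note the swap: DOMXPath takes XPath expressions, not CSS selectors, so the selector has to be rewritten by hand. The function name getValueByXPath() and the sample markup are hypothetical.

```php
<?php
// Sketch only: getValueByXPath() is a hypothetical helper that takes an
// XPath expression instead of a CSS selector.
function getValueByXPath(string $html, string $query): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy pasted markup
    $dom->loadHTML($html);
    libxml_clear_errors();

    $nodes = (new DOMXPath($dom))->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null; // invalid expression or nothing matched
    }
    // Text content of the first matching node.
    return trim($nodes->item(0)->textContent);
}

// Stand-in for the HTML pasted into the text area.
$html = '<div id="rso"><div><h3><a href="/x">hello world</a></h3></div></div>';

// XPath counterpart of the CSS selector "#rso > div > h3 > a".
echo getValueByXPath($html, '//*[@id="rso"]/div/h3/a'), "\n";
```

The HTML string is passed in as an argument here, which is the same point the answer makes about getValueBySelector().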

How to get html code between two <p> tag?

I want to get some HTML code between two tags, and I have two regexes for it:
1-$LinkGrabber = "<p><strong>item1:<\/strong> <span style=\"color: #ff0000;\"><strong>Full<\/strong><\/span><\/p>(.*)<p> <\/p>";
2-$linkGrabber = "<p><strong>item2<\/strong> <span style=\"color: #ff0000;\"><strong>Full<\/strong><\/span><\/p>(.*)<p> <\/p>";
The first one works fine but the second does not. Can you tell me what the difference is between them?
I'd say, they both work fine but they're named different. Make sure, when testing the second one to use $linkGrabber instead of $LinkGrabber in the first example.
Don't ever use Regex to Parse HTML tags. Make use of a DOM Parser.
$dom = new DOMDocument;
$dom->loadHTML($html); // pass your HTML source here
foreach ($dom->getElementsByTagName('p') as $tag) {
    echo $tag->nodeValue; // prints the text content of the p tag
}
The first is looking for HTML that contains item1: (with a colon), while the second looks for item2 (without one)...
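As a rough illustration of the DOM approach for this particular layout, the sketch below finds the paragraph whose text contains a label and collects the sibling nodes that follow it until the first empty paragraph. The function name htmlAfterParagraph() and the sample markup are assumptions; the real page structure may differ.

```php
<?php
// Sketch only: htmlAfterParagraph() is a hypothetical helper.
// It returns the HTML that follows the <p> containing $label, stopping at
// the first <p> that holds nothing but whitespace or a non-breaking space.
function htmlAfterParagraph(string $html, string $label): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('p') as $p) {
        if (strpos($p->textContent, $label) === false) {
            continue; // not the marker paragraph
        }
        $out = '';
        for ($node = $p->nextSibling; $node !== null; $node = $node->nextSibling) {
            $isEmptyParagraph = $node->nodeName === 'p'
                && preg_match('/^[\s\x{00A0}]*$/u', $node->textContent) === 1;
            if ($isEmptyParagraph) {
                break; // reached the closing "empty" paragraph
            }
            $out .= $dom->saveHTML($node);
        }
        return $out;
    }
    return ''; // label not found
}

$page = '<p><strong>item2</strong> <span style="color: #ff0000;"><strong>Full</strong></span></p>'
      . '<ul><li>link A</li></ul>'
      . '<p>&nbsp;</p><p>rest</p>';

echo htmlAfterParagraph($page, 'item2'), "\n";
```

Because the lookup is by text content rather than by an exact markup string, a small difference such as a missing colon in the label no longer matters as long as the label text itself matches.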

search for element name using PHP simple HTML dom parser

I'm hoping someone can help me. I'm using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/manual.htm) successfully, but I am now trying to find elements based on a certain name. For example, in the fetched HTML, there might be tags such as:
<p class="mattFacer">Matt Facer</p>
<p class="mattJones">Matt Jones</p>
<p class="daveSmith">DaveS Smith</p>
What I need to do is to read in this HTML and capture any HTML elements which match anything beginning with the word, "matt"
I've tried
$html = str_get_html("http://www.testsite.com");
foreach($html->find('matt*') as $element) {
echo $element;
}
but this doesn't work. It returns nothing.
Is it possible to do this? I basically want to search for any HTML element which contains the word "matt". It could be a span, div or p.
I'm at a dead end here!
// file_get_html() loads a URL; str_get_html() only parses a string of HTML
$html = file_get_html("http://www.testsite.com");
foreach($html->find('[class*=matt]') as $element) {
    echo $element;
}
Let's try that.
Maybe this?
foreach(array_merge($html->find('[class*=matt]'),$html->find('[id*=matt]')) as $element) {
echo $element;
}
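Note that [class*=matt] matches "matt" anywhere in the attribute, not only at the start. If a strict "begins with" match is needed and the built-in DOM extension is acceptable, it could be sketched as below. The helper name elementsWithClassPrefix() is hypothetical; it checks each class name separately so elements with several classes are handled too.

```php
<?php
// Sketch only: elementsWithClassPrefix() is a hypothetical helper built on
// DOMDocument/DOMXPath rather than Simple HTML DOM.

/**
 * Return the outer HTML of every element that has at least one class name
 * starting with $prefix (case-insensitive).
 *
 * @return string[]
 */
function elementsWithClassPrefix(string $html, string $prefix): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $matches = [];
    // Any element type qualifies (p, span, div, ...) as long as it has a class.
    foreach ((new DOMXPath($dom))->query('//*[@class]') as $el) {
        $classes = preg_split('/\s+/', trim($el->getAttribute('class')));
        foreach ($classes as $class) {
            if (stripos($class, $prefix) === 0) {
                $matches[] = $dom->saveHTML($el);
                break; // one matching class is enough
            }
        }
    }
    return $matches;
}

$sample = '<p class="mattFacer">Matt Facer</p>'
        . '<p class="mattJones">Matt Jones</p>'
        . '<p class="daveSmith">DaveS Smith</p>';

foreach (elementsWithClassPrefix($sample, 'matt') as $elementHtml) {
    echo $elementHtml, "\n";
}
```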

Highlighting Text: How to echo HTML DOM element with all tags

I want to highlight specified keywords in the body of an HTML document. At first I used preg_replace to put a <span> around the keywords, but of course that caused problems if the keyword was part of a tag, like the letter "i" (as in <li>). So instead, I'm using DOMDocument::loadHTMLFile(path) to load the document, and then using preg_replace inside the values of each child.
So far, so good. I can echo out the plain text of the document with the appropriate words highlighted and no interference from tags. But I need to echo the entire body of the text including the tags after the changes, and I don't know how. Here's what I have so far:
if (file_exists('mss/'.$link_title)) {
$descfile = DOMDocument::loadHTMLFile('mss/'.$link_title);
foreach ($descfile->childNodes as $e) {
$desc_output = $e->nodeValue;
$desc_output = preg_replace($to_highlight, "<span class=\"highlight\">$0</span>", $desc_output);
}
echo ???
}
What should I echo?
If you change the body of your loop to:
$e->nodeValue = preg_replace($to_highlight, "<span class=\"highlight\">$0</span>", $e->nodeValue);
you can probably use DOMDocument::saveHTML() (http://php.net/manual/de/domdocument.savehtml.php) to output your entire HTML document.
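One caveat with assigning to nodeValue: the value is stored as text, so the span tags come out escaped when the document is serialized. A possible alternative, sketched here under the assumption that the keyword is a plain string rather than a regex, is to split each text node and insert real span elements. The helper name highlightHtml() is hypothetical.

```php
<?php
// Sketch only: highlightHtml() is a hypothetical helper. It wraps every
// occurrence of $keyword found in text nodes in <span class="highlight">,
// leaving tag names and attributes untouched.
function highlightHtml(string $html, string $keyword): string
{
    if ($keyword === '') {
        return $html; // nothing to highlight
    }
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $pattern = '/(' . preg_quote($keyword, '/') . ')/i';
    $xpath = new DOMXPath($dom);

    // Copy the node list first because the tree is modified in the loop.
    $textNodes = iterator_to_array($xpath->query('//body//text()'));
    foreach ($textNodes as $text) {
        // Odd indexes hold the matched keyword, even indexes the text between.
        $parts = preg_split($pattern, $text->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // keyword not present in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $span = $dom->createElement('span');
                $span->setAttribute('class', 'highlight');
                $span->appendChild($dom->createTextNode($part));
                $fragment->appendChild($span);
            } else {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $text->parentNode->replaceChild($fragment, $text);
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $out = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo highlightHtml('<ul><li>item <a href="i.html">i</a></li></ul>', 'i'), "\n";
```

Because only text nodes are touched, the "i" in <li> and in the href attribute is left alone.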

Highlight Search Terms in PHP without breaking anchor tags using regex

I'm searching through some database search results on a website & trying to highlight the term in the returned results that matches the searched term. Below is what I have so far (in php):
$highlight = trim($highlight);
if(preg_match('|\b(' . $highlight . ')\b|i', $str_content))
{
$str_content = preg_replace('|\b(' . $highlight. ')(?!["\'])|i', "<span class=\"highlight\">$1</span>", $str_content);
break;
}
The downside of going this route is that if my search term shows up in the URL permalink as well, the returned result will insert the span into the href attribute and break the anchor tag. Is there any way in my regex to exclude "any" information from the search results that appears in between an opening and closing HTML tag?
I know I could use the strip_tags() function and just spit out the results in plain text, but I'd rather not do that if I didn't have to.
DO NOT try to parse HTML with regular expressions:
RegEx match open tags except XHTML self-contained tags
Try something like PHP Simple HTML DOM.
<?php
// get DOM
$html = file_get_html('http://www.google.com/search?q=hello+kitty');
// ensure this is properly sanitized.
$term = trim($term);
// highlight $term in all <div class="result">...</div> elements
foreach($html->find('div.result') as $e){
echo str_replace($term, '<span class="highlight">'.$term.'</span>', $e->plaintext);
}
?>
Note: this is not an exact solution because I don't know what your HTML looks like, but this should put you pretty close to being on track.
I think assertions are what you're looking for.
I ended up going this route, which so far, works well for this specific situation.
<?php
if(preg_match('|\b(' . $term . ')\b|i', $str_content))
{
$str_content = strip_tags($str_content);
$str_content = preg_replace('|\b(' . $term . ')(?!["\'])|i', "<span class=\"highlight\">$1</span>", $str_content);
$str_content = preg_replace('|\n[^<]+|', '</p><p>', $str_content);
break;
}
?>
It's still html encoded, but it's easier to parse through now without html tags
