This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
get wrapping element using preg_match php
I want to get the element that wraps a specified string, for example:
$string = "My String";
$code = "<div class=\"string\"><p class='text'>My String</p></div>";
So how can I get the <p class='text'></p> element that wraps the string by matching it with a regex pattern?
Using the DOM Classes of PHP you are able to do so.
$html = new DomDocument();
// load in the HTML
$html->loadHTML('<div class="string"><p class=\'text\'>My String</p></div>');
// create XPath object
$xpath = new DOMXPath($html);
// get a DOMNodeList containing every DOMNode which has the text 'My String'
$list = $xpath->evaluate("//*[text() = 'My String']");
// lets grab the first item from the list
$element = $list->item(0);
Now we have the whole <p> element, but we still need to remove all of its child nodes. Here is a little function for that:
function remove_children($node) {
    while (($childnode = $node->firstChild) != null) {
        remove_children($childnode);
        $node->removeChild($childnode);
    }
}
Let's use this function:
// remove all the child nodes (including the text 'My String')
remove_children($element);
// this will output '<p class="text"></p>'
echo $html->saveHTML($element);
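As a possible alternative to emptying the element in place, a shallow clone gives the same result without touching the original document. The following is only a sketch: wrapping_tag() is a hypothetical helper name, and it assumes the search string contains no single quote, since the string is interpolated straight into the XPath expression.

```php
<?php
// Hypothetical helper: returns the markup of the first element whose text
// equals $needle, with all of its children stripped.
function wrapping_tag(string $markup, string $needle): ?string {
    $doc = new DOMDocument();
    $doc->loadHTML($markup);
    $xpath = new DOMXPath($doc);

    // Assumption: $needle contains no single quote (it is not escaped here).
    $list = $xpath->query("//*[text() = '" . $needle . "']");
    if ($list === false || $list->length === 0) {
        return null; // nothing wraps the given string
    }

    // cloneNode(false) copies the element and its attributes, but not its children.
    $shell = $list->item(0)->cloneNode(false);

    return $doc->saveHTML($shell);
}

echo wrapping_tag('<div class="string"><p class=\'text\'>My String</p></div>', 'My String');
```

Because the clone is never inserted into the tree, the loaded document still contains the original <p> element with its text.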
Related
I did the following which works with simple text fields:
$field = "How are you doing?";
$arr = explode(' ',trim($field));
$first_word = $arr[0];
$balance = strstr("$field"," ");
It didn't work because the field contains HTML markup (perhaps an image, video, div, paragraph, etc.), so everything inside the HTML got mixed in with the text.
I could possibly use strip_tags to strip out the html then obtain first word and reformat it, but then I would have to figure out how to add the html back into the data. I'm wondering if there is a php or custom function ready made for this purpose.
You can use DOMDocument to parse the HTML, modify the contents, and save it back as HTML. Also, finding words is not always as simple as splitting on spaces, since not all languages delimit their words with spaces and not all words are necessarily delimited by spaces. For example, mother-in-law could be viewed as one word or as three, depending on how you define a word. Likewise, do you consider pancake one word or two (pan and cake)? One simple solution is to use IntlBreakIterator::createWordInstance(), which returns a break iterator that implements the Unicode standard for text segmentation, also known as UAX #29.
Here's an example of how you might go about implementing this:
$html = <<<'HTML'
<div>some sample text here</div>
HTML;
/* Let's extend DOMDocument to include a walk method that can traverse the entire DOM tree */
class MyDOMDocument extends DOMDocument {
public function walk(DOMNode $node, $skipParent = false) {
if (!$skipParent) {
yield $node;
}
if ($node->hasChildNodes()) {
foreach ($node->childNodes as $n) {
yield from $this->walk($n);
}
}
}
}
$dom = new MyDOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
// Let's traverse the DOMTree to find the first text node
foreach ($dom->walk($dom->childNodes->item(0)) as $node) {
if ($node->nodeName === "#text") {
break;
}
}
// Extract the first word from that text node
$iterator = IntlBreakIterator::createWordInstance();
$iterator->setText($node->nodeValue); // set the text in the word iterator
$it = $iterator->getPartsIterator(IntlPartsIterator::KEY_RIGHT);
foreach ($it as $offset => $word) {
break;
}
// You can do whatever you want to $word here
$word .= "s"; // I'm going to append the letter s
// Replace the first word of the text node with the modified word
$unmodifiedString = substr($node->nodeValue, $offset);
$modifiedString = $word . $unmodifiedString;
// Assigning to nodeValue updates the node in place, so no replaceChild() call is needed
$node->nodeValue = $modifiedString;
// Save the HTML
$newHTML = $dom->saveHTML();
echo $newHTML;
Outputs
<div>somes sample text here</div>
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Grabbing the href attribute of an A element
I need to parse all links of an HTML document that contain some word (it's always different).
Example:
BLA
BLA
BLA
I only need the links with "href=/link: ...." what's the best way to go for it?
$html = "SOME HTML ";
$dom = new DomDocument();
@$dom->loadHTML($html);
$urls = $dom->getElementsByTagName('a');
foreach ($urls as $url)
{
echo "<br> {$url->getAttribute('href')} , {$url->getAttribute('title')}";
echo "<hr><br>";
}
In this example all links are shown, I need specific links.
By using a condition.
<?php
$lookfor='/link:';
foreach ($urls as $url){
if(substr($url->getAttribute('href'),0,strlen($lookfor))==$lookfor){
echo "<br> ".$url->getAttribute('href')." , ".$url->getAttribute('title');
echo "<hr><br>";
}
}
?>
Instead of first fetching all the a elements and then filtering out the ones you need you can query your document for those nodes directly by using XPath:
//a[contains(@href, "link:")]
This query will find all a elements in the document which contain the string link: in the href attribute.
To check whether the href attribute starts with link: you can do
//a[starts-with(@href, "link:")]
Full example (demo):
$dom = new DomDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a[contains(@href, "link:")]') as $a) {
echo $a->getAttribute('href'), PHP_EOL;
}
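To make the starts-with variant concrete, here is a minimal sketch that collects only the links whose href begins with /link:. The sample markup, its title values, and the $matches array are assumptions made for illustration; they do not come from the question.

```php
<?php
// Sample markup invented for this sketch.
$html = '<p><a href="/link:one" title="First">BLA</a>'
      . '<a href="/other" title="Second">BLA</a>'
      . '<a href="/link:two" title="Third">BLA</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Map each matching link's title to its href.
$matches = [];
foreach ($xpath->query('//a[starts-with(@href, "/link:")]') as $a) {
    $matches[$a->getAttribute('title')] = $a->getAttribute('href');
}

print_r($matches);
```

The link pointing at /other is skipped by the query itself, so no filtering is needed inside the loop.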
Please also see
Implementing condition in XPath
excluding URLs from path links?
PHP/XPath: find text node that "starts with" a particular string?
PHP Xpath : get all href values that contain needle
for related questions.
Note: marking this CW because of the many related questions
Use regular expressions.
foreach ($urls as $url)
{
$href = $url->getAttribute('href');
if (preg_match("/^\/link:/", $href)) {
$links[$url->getAttribute('title')] = $href;
}
}
The $links array then maps the title of each matching link to its href.
As getAttribute() simply returns a string, you only need to check what it starts with, using strpos().
$href = $url->getAttribute('href');
if (strpos ($href, '/link:') === 0)
{
// Do your processing here
}
This question already has answers here:
PHP Getting and Setting tag attributes
(2 answers)
Closed 9 years ago.
I'm looking for a solution for manipulating html elements via php.
I was reading http://www.php.net/manual/en/book.dom.php but I didn't get too far.
I'm taking an "iframe" element ( video embed code ) and trying to modify it before echoing it.
I would like to add some parameters to the "src" attribute.
Based on the answer from https://stackoverflow.com/a/2386291 I'm able to iterate through element attributes.
$doc = new DOMDocument();
// $frame_array holds <iframe> tag as a string
$doc->loadHTML($frame_array['frame-1']);
$frame= $doc->getElementsByTagName('iframe')->item(0);
if ($frame->hasAttributes()) {
foreach ($frame->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
echo "Attribute '$name' :: '$value'<br />";
}
}
My questions are:
How could I get the attribute value without iterating through all attributes of the element and checking to see if the current element is the one I'm looking for?
How can I set the attribute value on the element?
I prefer not to use regex for this because I would like it to be future proof. If the "iframe" tag is properly formatted, should I have any problems with this?
iframe example:
<iframe src="http://player.vimeo.com/video/68567588?color=c9ff23" width="486"
height="273" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen>
</iframe>
// to get the 'src' attribute
$src = $frame->getAttribute('src');
// to set the 'src' attribute
$frame->setAttribute('src', 'newValue');
To change the URL, you should first use parse_url($src), then rebuild it with your new query arguments, for example:
$parts = parse_url($src);
extract($parts); // creates $host, $scheme, $path, $query...
// extract query string into an array;
// be careful if you have magic quotes enabled (this function may add slashes)
parse_str($query, $args);
$args['newArg'] = 'someValue';
// rebuild query string
$query = http_build_query($args);
$newSrc = sprintf('%s://%s%s?%s', $scheme, $host, $path, $query);
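Putting the pieces together, the sketch below reads the src attribute, merges in an extra query argument, and writes the result back to the element. It is only one way this could be done: add_query_args() is a hypothetical helper, the autoplay argument is an invented example, and the helper reads the parse_url() result by key rather than through extract() so that a URL without a query string or fragment does not hit undefined variables.

```php
<?php
// Hypothetical helper: adds or overwrites query arguments on a URL.
function add_query_args(string $url, array $newArgs): string {
    $parts = parse_url($url);

    // Start from the existing query arguments, if there are any.
    $args = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $args);
    }
    $args = array_replace($args, $newArgs);

    // Rebuild the URL from whichever components were present.
    $result = '';
    if (isset($parts['scheme'])) {
        $result .= $parts['scheme'] . '://';
    }
    $result .= $parts['host'] ?? '';
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $parts['path'] ?? '';
    $result .= '?' . http_build_query($args);
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }

    return $result;
}

$doc = new DOMDocument();
$doc->loadHTML('<iframe src="http://player.vimeo.com/video/68567588?color=c9ff23" width="486" height="273"></iframe>');
$frame = $doc->getElementsByTagName('iframe')->item(0);

// 'autoplay' is an invented argument, used here only for illustration.
$frame->setAttribute('src', add_query_args($frame->getAttribute('src'), ['autoplay' => '1']));

echo $doc->saveHTML($frame);
```

Note that saveHTML() escapes the ampersand in the attribute as &amp;, which is the correct form inside HTML; getAttribute('src') still returns the plain URL.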
I don't understand why you need to iterate through the attributes to determine whether this is the element you are looking for. You seem to be grabbing only the first iframe element, so I am not clear what your first question is really about.
For your second question, you just need to use setAttribute() method of DOMElement like this:
$frame->setAttribute($attr_key, $attr_value);
You shouldn't have problems parsing the HTML you have shown.
This question already has an answer here:
Closed 10 years ago.
Possible Duplicate:
DOMDocument::load - PHP - Getting attribute value
I have many div tags pulled from a string through php, each of them having a unique id and a subjective class. I am trying to get the id and class of each of the divs but am not too sure how I would do this.
HTML:
<div id='x1y1' class = 'classname'></div><div id = 'x2y1' class = 'classname1'>
so far I have tried
$html = new DOMDocument();
$html->loadHTML($boardDataStripSlashes);
$elements = $html->getElementsByTagName('div');
but have not been able to find anything on how to get the actual id's and classes of the selected elements.
You need to use DOMElement::getAttribute to retrieve attributes of elements.
foreach($elements as $element) {
$id = $element->getAttribute('id');
$className = $element->getAttribute('class');
// ...
}
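For completeness, a self-contained sketch of the same loop is given below. The markup is modelled on the example in the question, with the second div closed, and collecting the values into a $divs array keyed by id is an assumption made for illustration.

```php
<?php
// Markup modelled on the question's example (second div closed here).
$markup = "<div id='x1y1' class='classname'></div><div id='x2y1' class='classname1'></div>";

$doc = new DOMDocument();
$doc->loadHTML($markup);

// Map each div's id to its class.
$divs = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    $divs[$div->getAttribute('id')] = $div->getAttribute('class');
}

print_r($divs);
```

getAttribute() returns an empty string when an attribute is missing, so a div without an id would end up under the empty key in this sketch.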