PHP getting and setting attributes on HTML Elements [duplicate]

This question already has answers here: PHP Getting and Setting tag attributes (2 answers). Closed 9 years ago.
I'm looking for a way to manipulate HTML elements with PHP.
I was reading http://www.php.net/manual/en/book.dom.php but I didn't get too far.
I'm taking an "iframe" element (video embed code) and trying to modify it before echoing it.
I would like to add some parameters to the "src" attribute.
Based on the answer from https://stackoverflow.com/a/2386291 I'm able to iterate through the element's attributes.
$doc = new DOMDocument();
// $frame_array holds the <iframe> tag as a string
$doc->loadHTML($frame_array['frame-1']);
$frame = $doc->getElementsByTagName('iframe')->item(0);
if ($frame->hasAttributes()) {
    // print every attribute name/value pair
    foreach ($frame->attributes as $attr) {
        $name = $attr->nodeName;
        $value = $attr->nodeValue;
        echo "Attribute '$name' :: '$value'<br />";
    }
}
My questions are:
How can I get an attribute's value without iterating through all of the element's attributes and checking whether the current one is the one I'm looking for?
How can I set an attribute's value on the element?
I'd prefer not to use regex for this because I would like it to be future-proof. If the "iframe" tag is properly formatted, should I run into any problems with this approach?
iframe example:
<iframe src="http://player.vimeo.com/video/68567588?color=c9ff23" width="486"
height="273" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen>
</iframe>

// to get the 'src' attribute
$src = $frame->getAttribute('src');
// to set the 'src' attribute
$frame->setAttribute('src', 'newValue');
To change the URL, you should first use parse_url($src), then rebuild it with your new query arguments, for example:
$parts = parse_url($src);
// parse the existing query string (if any) into an array;
// note: on old PHP versions with magic_quotes_gpc enabled, parse_str() added slashes
$args = array();
if (isset($parts['query'])) {
    parse_str($parts['query'], $args);
}
$args['newArg'] = 'someValue';
// rebuild the query string and the URL (port, user info and fragment are not preserved here)
$query = http_build_query($args);
$newSrc = sprintf('%s://%s%s?%s', $parts['scheme'], $parts['host'], $parts['path'], $query);
$frame->setAttribute('src', $newSrc);
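Putting the pieces together, here is a minimal self-contained sketch that loads the embed code, reads `src` with getAttribute(), appends a query parameter and writes it back with setAttribute(). It assumes the embed contains exactly one `<iframe>` with an absolute `src` (scheme and host present); the helper names `addQueryParam()`/`updateIframeSrc()` and the `autoplay` parameter are illustrative, not from the original post.

```php
<?php
// Hypothetical helper: add (or overwrite) one query parameter on an absolute URL.
function addQueryParam(string $url, string $key, string $value): string
{
    $parts = parse_url($url);
    $args = array();
    if (isset($parts['query'])) {
        parse_str($parts['query'], $args);
    }
    $args[$key] = $value;

    // Rebuild scheme://host[:port]path?query[#fragment]
    $rebuilt = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $rebuilt .= ':' . $parts['port'];
    }
    $rebuilt .= isset($parts['path']) ? $parts['path'] : '';
    $rebuilt .= '?' . http_build_query($args, '', '&');
    if (isset($parts['fragment'])) {
        $rebuilt .= '#' . $parts['fragment'];
    }
    return $rebuilt;
}

// Hypothetical helper: update the src of the first <iframe> in an embed snippet.
function updateIframeSrc(string $embed, string $key, string $value): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($embed);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $frame = $doc->getElementsByTagName('iframe')->item(0);
    if ($frame === null) {
        return $embed; // no <iframe> found: return the input unchanged
    }
    $src = $frame->getAttribute('src');
    $frame->setAttribute('src', addQueryParam($src, $key, $value));

    // Serialize only the iframe, not the <html><body> wrapper that loadHTML() adds
    return $doc->saveHTML($frame);
}

$embed = '<iframe src="http://player.vimeo.com/video/68567588?color=c9ff23" width="486" height="273" frameborder="0" allowFullScreen></iframe>';
echo updateIframeSrc($embed, 'autoplay', '1');
```

Passing a node to saveHTML() keeps the output to the iframe alone, which is usually what you want when echoing embed code back into a page.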

I don't understand why you need to iterate through the attributes to determine whether this is the element you are looking for. You seem to be grabbing only the first iframe element, so I'm not clear what your first question is really about.
For your second question, you just need to use the setAttribute() method of DOMElement, like this:
$frame->setAttribute($attr_key, $attr_value);
You shouldn't have problems parsing the HTML you have shown.

Related

Get all HTML list elements using Simple HTML DOM

I'm currently working on a project that requires me to parse some data from another website, and I'm having some issues (note: I'm very new to PHP).
Here's the code I'm using:
$dl = $html2->find('ol.tracklist', 0);
print $dl->outertext;
The code above returns the markup we're trying to get (the output is long and very messy, so it isn't reproduced here).
However, when I put this in a foreach loop, it only returns one of the <a href> links.
foreach ($html2->find('ol.tracklist') as $li) {
    $title = $li->find('a', 0);
    print $title;
}
How can I make it return all of the <a href> elements from the list above?
NOTE: I am using simple_html_dom.php for this.
Based on the markup, point directly at the list items, then get each item's anchor:
foreach ($html2->find('ol.tracklist li') as $li) {
    // 'ol.tracklist li' also matches the nested <li>s, where this lookup returns null
    $anchor = $li->find('ul li a', 0);
    if ($anchor) {
        echo $anchor->href; // and other attributes
    }
}
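For comparison, the same "collect every link inside the tracklist" idea can be written with PHP's built-in DOM extension and an XPath query, so no third-party file is needed. This swaps simple_html_dom for DOMDocument/DOMXPath; the sample markup and the `tracklistHrefs()` name are hypothetical, since the real page's markup isn't shown.

```php
<?php
// Hypothetical helper: return the href of every <a> inside <ol class="tracklist">.
function tracklistHrefs(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // suppress warnings about sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // <ol> elements whose class list contains "tracklist", then every <a href> below them
    $query = "//ol[contains(concat(' ', normalize-space(@class), ' '), ' tracklist ')]//a[@href]";
    $hrefs = array();
    foreach ($xpath->query($query) as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

// Hypothetical markup; the link outside the list should be ignored
$html = '<p><a href="/elsewhere">Elsewhere</a></p>'
      . '<ol class="tracklist"><li><a href="/track/1">One</a></li>'
      . '<li><a href="/track/2">Two</a></li></ol>';
print_r(tracklistHrefs($html));
```

The `concat(' ', normalize-space(@class), ' ')` trick matches "tracklist" as a whole class name even when the element has several classes.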

PHP: How to change part of XML using DomElement

I'm trying to write a function that replaces part of an XML document using XPath. I adapted part of someone else's post:
/*********************************************************************
 Function to replace part of an XML
**********************************************************************/
function replacePartofXML($element, $methodName, $methodValue, $xml, $newPartofXML)
{
    $xpathstring = "//" . $element . "[@$methodName = \"$methodValue\"]";
    $xml->xpath($xpathstring);
    //$domToChange = dom_import_simplexml($xml->xpath($xpathstring));
    $domToChange = dom_import_simplexml($xml);
    $domReplace = dom_import_simplexml($newPartofXML);
    $nodeImport = $domToChange->ownerDocument->importNode($domReplace, TRUE);
    $domToChange->parentNode->replaceChild($nodeImport, $domToChange);
    return($xml);
}
What I want is to return the XML with that part replaced. I can't use dom_import_simplexml($xml->node->node) because my XML has many repeating elements (they have different IDs, which is why I'm trying to use XPath).
The commented-out line doesn't work either, because xpath() returns an array and dom_import_simplexml() cannot import an array.
Thanks for your input.
If you're sure the target element is unique, you can take the first element returned by xpath() (the check for an empty result is omitted here):
$domToChange = dom_import_simplexml($xml->xpath($xpathstring)[0]);
Alternatively, iterate over the array returned by xpath() and replace each match in turn.
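Here is a minimal sketch of the question's function with the answer's fix applied, plus the empty-result check the answer leaves out. It assumes the attribute value contains no double quotes (it is interpolated straight into the XPath expression); the function name, the `item`/`id` names and the sample XML are hypothetical.

```php
<?php
// Replace the first element matching //$element[@$attr="$value"] with $replacement.
function replacePartOfXml(SimpleXMLElement $xml, string $element, string $attr,
                          string $value, SimpleXMLElement $replacement): SimpleXMLElement
{
    $matches = $xml->xpath(sprintf('//%s[@%s="%s"]', $element, $attr, $value));
    if (empty($matches)) {
        return $xml; // nothing matched: leave the document unchanged
    }
    $target = dom_import_simplexml($matches[0]);
    $newNode = dom_import_simplexml($replacement);
    // The replacement lives in another document, so import it first
    $imported = $target->ownerDocument->importNode($newNode, true);
    $target->parentNode->replaceChild($imported, $target);
    return $xml; // SimpleXML and DOM share the same underlying tree
}

$xml = new SimpleXMLElement('<root><item id="1">a</item><item id="2">b</item></root>');
$new = new SimpleXMLElement('<item id="2">changed</item>');
echo replacePartOfXml($xml, 'item', 'id', '2', $new)->asXML();
```

Because dom_import_simplexml() returns a DOM view of the same tree, the change is visible through the original SimpleXMLElement without converting back.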

How to use preg_match_all to check whether HTML source contains a given URL?

I want to find all <a href> tags in a given HTML source that point to my URL.
I used this code:
preg_match_all("'<a.*?href=\"(http[s]*://[^>\"]*?)\"[^>]*?>(.*?)</a>'si", $target_source, $matches);
For example, I'm trying to find the <a href> tags that contain http://www.emrekadan.com.
How can I do this?
I'd use PHP's DOM parser for this. It may seem harder than a regex, but it's actually much easier and is the correct way to parse HTML.
$url = 'WEBSITE_TO_SEARCH_FOR';
$searchstring = 'YOUR_SEARCH_STRING';
$dom = new DOMDocument();
@$dom->loadHTMLFile($url);
$result = array();
foreach ($dom->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // case-insensitive substring check
    if (stripos($href, $searchstring) !== FALSE) {
        $result[] = $href;
    }
}
if (!empty($result)) print_r($result);
Explanation:
Loads the given URL using the loadHTMLFile() method
Finds all <a> tags and loops through them
Uses stripos() to case-insensitively check if the href contains the given search term
If it does, it's pushed into the $result array
Note: if an empty string is passed as the filename, or the named file is empty, a warning is generated. I've used @ to suppress that message, but this is generally regarded as bad practice. You can add checks to make sure the URL exists before trying to load it.
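As a variation on the answer, here is a sketch that works on an HTML string (so it can be tried without fetching a URL) and collects parser warnings with libxml_use_internal_errors() instead of hiding them with @. The function name `findLinksContaining()` and the sample markup are hypothetical.

```php
<?php
// Return every <a href> value that contains $needle (case-insensitive).
function findLinksContaining(string $html, string $needle): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the previous setting

    $result = array();
    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        if (stripos($href, $needle) !== false) {
            $result[] = $href;
        }
    }
    return $result;
}

$html = '<p><a href="http://www.emrekadan.com/about">About</a>'
      . '<a href="http://example.org/">Other</a>'
      . '<a href="HTTP://WWW.EMREKADAN.COM/">Home</a></p>';
print_r(findLinksContaining($html, 'http://www.emrekadan.com'));
```

To search a live page, you could fetch it first (for example with file_get_contents()) and check that the result is a non-empty string before passing it in.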

simple_html_dom find all elements that ONLY contain certain text

I have:
<span>something or other</span>
<b>blarg</b>
<b>blarg and stuff</b>
<span>blarg</span>
<em>wakka wakka</em>
<em>wakka blarg</em>
<em>blarg</em>
and I just want to get the elements that ONLY contain "blarg" and no other text, so:
<b>blarg</b>
<span>blarg</span>
<em>blarg</em>
What matters here is checking whether "blarg" appears on its own in any single element on the page. I've had some luck with regex, but I'd rather use simple_html_dom so that I can also inspect child and sibling elements.
What is the simplest way to do this with simple_html_dom?
One way to do it is to go through every tag and test whether its text is exactly 'blarg'.
Here's a working example:
include_once 'simple_html_dom.php';

$text = '<span>something or other</span>
<b>blarg</b>
<b>blarg and stuff</b>
<span>blarg</span>
<em>wakka wakka</em>
<em>wakka blarg</em>
<em>blarg</em>';
echo "<div>Original Text: <xmp>$text</xmp></div>";
$html = str_get_html($text);
// Find all elements
$tags = $html->find('*');
foreach ($tags as $key => $tag) {
    // If the tag's trimmed text is exactly 'blarg'
    if (strcmp(trim($tag->plaintext), 'blarg') == 0) {
        echo "<div> 'blarg' found in \$tags[$key]: <xmp>" . $tag->outertext . "</xmp></div>";
    }
}
I don't know exactly what you want to do with the results, but this may be a start :)
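If pulling in simple_html_dom isn't an option, the same "trimmed text is exactly the word" test can be sketched with PHP's built-in DOMDocument and DOMXPath. This swaps in a different library from the one asked about; the function name is hypothetical. Like the answer's version, it also matches a wrapper such as `<div><b>blarg</b></div>`, because textContent includes the text of descendants.

```php
<?php
// Return the outer HTML of every element under <body> whose trimmed text is exactly $word.
function elementsWithExactText(string $html, string $word): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // fragments make libxml warn; collect quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $found = array();
    // //body//* skips the <html>/<body> wrapper that loadHTML() adds around a fragment
    foreach ($xpath->query('//body//*') as $el) {
        if (trim($el->textContent) === $word) {
            $found[] = $doc->saveHTML($el);
        }
    }
    return $found;
}

$text = '<span>something or other</span>
<b>blarg</b>
<b>blarg and stuff</b>
<span>blarg</span>
<em>wakka wakka</em>
<em>wakka blarg</em>
<em>blarg</em>';
print_r(elementsWithExactText($text, 'blarg'));
```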

PHP - Extracting two values from a line

I'm a beginner with regular expressions and I'm working on a server where I cannot install anything (does using the DOM methods require installing anything?).
I have a problem that I cannot solve with my current knowledge.
I would like to extract the album ID and the image URL from the line below.
There are more lines and other URL elements in the string (file), but the album IDs and image URLs I need are all in strings similar to the one below:
<img alt="/" src="http://img255.imageshack.us/img00/000/000001.png" height="133" width="113">
So in this case I would like to get '774' and 'http://img255.imageshack.us/img00/000/000001.png'
I've seen plenty of examples that extract just the URL or one other value from a string, but I need to keep both values together and store them in a single database record.
Any help is really appreciated!
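Since you're new to this: you can use PHP's HTML parser, DOMDocument, to extract what you need. It is part of PHP's DOM extension, which is enabled by default in standard PHP builds (some Linux distributions package it separately, e.g. as php-xml), so you usually don't need to install anything. You should not use a regular expression here, because regexes are inherently error-prone when parsing HTML and can easily produce false positives.
To start, let's say you have your HTML: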
$html = '<img alt="/" src="http://img255.imageshack.us/img00/000/000001.png" height="133" width="113">';
And now, we load that into DOMDocument:
$doc = new DOMDocument;
$doc->loadHTML( $html);
Now that the HTML is loaded, it's time to find the elements we need. Let's assume your document may contain other <a> tags, so we only want the <a> tags that have an <img> tag as a direct child. Once we have the right nodes, we extract the information we need. So, let's have at it:
$results = array();
// Loop over all of the <a> tags in the document
foreach( $doc->getElementsByTagName( 'a') as $a) {
    // If there are no children, continue on
    if( !$a->hasChildNodes()) continue;
    // Find the child <img> tag, if it exists
    foreach( $a->childNodes as $child) {
        if( $child->nodeType == XML_ELEMENT_NODE && $child->tagName == 'img') {
            // Now we have the <a> tag in $a and the <img> tag in $child
            // Get the information we need:
            parse_str( parse_url( $a->getAttribute('href'), PHP_URL_QUERY), $a_params);
            $results[] = array( $a_params['album'], $child->getAttribute('src'));
        }
    }
}
Running print_r( $results); now gives us:
Array
(
    [0] => Array
        (
            [0] => 774
            [1] => http://img255.imageshack.us/img00/000/000001.png
        )
)
Note that this omits basic error checking. One check you can add in the inner foreach loop is to make sure an album parameter was actually parsed from the <a>'s href attribute, like so:
if( isset( $a_params['album'])) {
    $results[] = array( $a_params['album'], $child->getAttribute('src'));
}
Every function I've used in this can be found in the PHP documentation.
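Here is the same approach wrapped into one runnable function, with the isset() check folded in. The question's markup only shows the `<img>` tag, so the surrounding `<a href="...?album=774">` below is a hypothetical reconstruction for testing; adjust it to the real page. The function name `albumImagePairs()` is also hypothetical.

```php
<?php
// Return [album, imageSrc] pairs for every <a> that has a direct <img> child.
function albumImagePairs(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $results = array();
    foreach ($doc->getElementsByTagName('a') as $a) {
        foreach ($a->childNodes as $child) {
            // Only direct <img> children of the <a> tag
            if ($child->nodeType === XML_ELEMENT_NODE && $child->tagName === 'img') {
                // parse_url() returns null when there is no query string
                $query = parse_url($a->getAttribute('href'), PHP_URL_QUERY);
                $params = array();
                parse_str((string) $query, $params);
                if (isset($params['album'])) {
                    $results[] = array($params['album'], $child->getAttribute('src'));
                }
            }
        }
    }
    return $results;
}

// Hypothetical input: the <a href> wrapper is assumed, not taken from the question
$html = '<a href="http://example.com/gallery.php?album=774"><img alt="/" src="http://img255.imageshack.us/img00/000/000001.png" height="133" width="113"></a>';
print_r(albumImagePairs($html));
```

Each pair can then be written to the database as one record, which keeps the album ID and the image URL together.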
If you've already narrowed it down to this line, then you can use a regex like the following:
$matches = array();
preg_match('#.+album=(\d+).+src="([^"]+)#', $yourHtmlLineHere, $matches);
Now, if you run:
echo $matches[1];
echo " ";
echo $matches[2];
You'll get the following:
774 http://img255.imageshack.us/img00/000/000001.png
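If the file has many such lines, the same regex can be applied line by line, keeping the album ID and image URL together as one record per match. This is a sketch: the input lines (including the `<a href="...?album=...">` part) and the function name `extractAlbumImages()` are hypothetical.

```php
<?php
// Apply the answer's pattern to each line and collect album/image records.
function extractAlbumImages(string $source): array
{
    $records = array();
    // \R matches any line-break sequence (\n, \r\n, \r)
    foreach (preg_split('/\R/', $source) as $line) {
        $m = array();
        if (preg_match('#.+album=(\d+).+src="([^"]+)#', $line, $m)) {
            $records[] = array('album' => $m[1], 'image' => $m[2]);
        }
    }
    return $records;
}

// Hypothetical input lines
$source = implode("\n", array(
    '<a href="http://example.com/gallery.php?album=774"><img alt="/" src="http://img255.imageshack.us/img00/000/000001.png" height="133" width="113"></a>',
    '<p>no album here</p>',
    '<a href="http://example.com/gallery.php?album=775"><img alt="/" src="http://img255.imageshack.us/img00/000/000002.png"></a>',
));
print_r(extractAlbumImages($source));
```

This only stays reliable while every line really has the shape shown above; for anything less regular, the DOMDocument approach is the safer choice.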
