Hi, I am working on a scraper, but I am unable to get one piece of information.
This is the link: http://sfglobe.com/?id=19110
<div class="video_container">
<div class="video_object">
<iframe id="player" width="100%" height="100%" frameborder="0" allowfullscreen="1" title="YouTube video player"
src="http://www.youtube.com/embed/KMYrIi_Mt8A?enablejsapi=1&controls=1&showinfo=0&color=white&rel=0&wmode=transparent&modestbranding=1&theme=light&autohide=1&start=4&origin=http%3A%2F%2Fsfglobe.com">
<!DOCTYPE html>
<html lang="en" data-cast-api-enabled="true" dir="ltr"
I need the src value: "http://www.youtube.com/embed/KMYrIi_Mt8A....."
This is my code, which does not work:
foreach ($html->find('.video_object')as $iframe){
echo "this is video ".$iframe->outertext ." <br>";
}
Thank you very much.
Does this return anything in your code?
$html->find('.video_object iframe')
If so, try using ->getAttribute('src'); it might work.
For further information take a look at PHP DOMElement
EDIT
Use XPath instead; it will output the expected result:
//init DOMDocument
$dom = new DOMDocument();
//get the source from the URL
$html = file_get_contents("URL");
//load the html from html string
$dom->loadHTML($html);
//init XPath
$xpath = new DOMXPath($dom);
//fetch the src from the iframe within
$iframe_src=$xpath->query('//*[@class="CLASSNAME"]/iframe/@src');
//print each matched src attribute value
foreach ($iframe_src as $src) {
echo $src->nodeValue, "\n";
}
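For reference, here is a self-contained sketch of the XPath approach above. It runs against an inline HTML string modeled on the markup in the question rather than the live page, and the helper name iframe_sources is made up for illustration. If the page builds the iframe with JavaScript (which I can't tell from the snippet), it won't be in the HTML that file_get_contents returns, and no query will find it.

```php
<?php
// Hypothetical helper: returns the src of every iframe that is a direct
// child of an element whose class attribute equals $className exactly.
function iframe_sources(string $html, string $className): array
{
    $dom = new DOMDocument();
    // Real-world pages are rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Assumption: $className contains no double quotes.
    $nodes = $xpath->query('//*[@class="' . $className . '"]/iframe/@src');

    $sources = [];
    foreach ($nodes as $attr) {
        $sources[] = $attr->nodeValue;
    }
    return $sources;
}

// Sample markup modeled on the question (URL parameters shortened).
$sample = '<div class="video_container"><div class="video_object">'
    . '<iframe id="player" src="http://www.youtube.com/embed/KMYrIi_Mt8A?enablejsapi=1&amp;controls=1"></iframe>'
    . '</div></div>';

foreach (iframe_sources($sample, 'video_object') as $src) {
    echo $src, "\n";
}
```

Note that @class="..." only matches when the class attribute is exactly that string; an element with several classes would need contains() or a similar test.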
Related
I'm trying to get the full, exact img tags from HTML code using DOM:
$content=new DOMDocument();
$content->loadHTML($htmlcontent);
$imgTags=$content->getElementsByTagName('img');
foreach($imgTags as $tag) {
echo $content->saveXML($tag); }
If I had the original <img src="img">, the result would be <img src="img"/>. But I need the exact value corresponding to the original.
Is it possible to get the exact img tag using DOM, without regular expressions or third-party libraries (Simple HTML DOM)?
No. It isn't possible to do this.
However, you can achieve your goal of removing the <img> elements from an HTML document if they meet specific conditions using DOMDocument. Here's some sample code which removes images which contain the class attribute "removeme".
$htmlcontent =
'<!DOCTYPE html><html><head><title>Example</title></head><body>'
. '<img src="1"><img src="2" class="removeme"><img src="3"><img class="removeme" src="4">'
. '</body></html>';
$content=new DOMDocument();
$content->loadHTML($htmlcontent);
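// Copy the live node list to an array first; removing nodes while iterating the live list can skip elements.
foreach (iterator_to_array($content->getElementsByTagName('img')) as $image) {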
if ($image->getAttribute("class") == "removeme") {
$image->parentNode->removeChild($image);
}
}
echo $content->saveHTML();
Output:
<!DOCTYPE html> <html><head><title>Example</title></head><body><img src="1"><img src="3"></body></html>
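On the serialization point from the question, a small sketch comparing the two save methods may help. saveXML() writes the img as a self-closing tag, while saveHTML() with a node argument uses HTML serialization. Neither gives back the original source text (the quoting below changes from single to double quotes), which is consistent with the "No" above. The function name serialize_images is only for illustration.

```php
<?php
// Hypothetical helper: serializes every img in $html both ways.
function serialize_images(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $result[] = [
            'xml'  => $dom->saveXML($img),  // XML rules: self-closing tag
            'html' => $dom->saveHTML($img), // HTML rules: no closing slash
        ];
    }
    return $result;
}

// The input uses single quotes; both outputs are normalized by the serializer.
$pairs = serialize_images("<p><img src='img'></p>");
foreach ($pairs as $pair) {
    echo $pair['xml'], "\n", $pair['html'], "\n";
}
```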
I have an HTML string of an iframe that includes a width attribute and its value. I want to replace the width's value with a regex in PHP. For example, I am getting the value dynamically as
<iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe>
I want to change the value of width with a regular expression. Can someone help me?
Avoid using regex on XML/HTML documents unless there is a very, very good reason for it; there are performant libraries for that.
Try this code:
<?php
$html = '<iframe width="560" height="315" src="" frameborder="0" allowfullscreen></iframe>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$elements = $doc->getElementsByTagName('iframe');
foreach($elements as $el) {
$el->setAttribute('width', '1024');
}
print $doc->saveHTML();
OUTPUT
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><iframe width="1024" height="315" src="" frameborder="0" allowfullscreen></iframe></body></html>
sounds like a really bad idea, but here goes... something like
<?php
header('content-type:text/plain;charset=utf-8');
$str=base64_decode('PGlmcmFtZSB3aWR0aD0iNTYwIiBoZWlnaHQ9IjMxNSIgc3JjPSIiIGZyYW1lYm9yZGVyPSIwIiBhbGxvd2Z1bGxzY3JlZW4+PC9pZnJhbWU+');
$ret=preg_replace('/(\<iframe.*?width\=)\"(.*?)\"/','${1}"999"',$str);
var_dump($str,$ret);
will change width to 999... but you should really use a proper HTML parser instead, like DOMDocument.
$domd=new DOMDocument();
@$domd->loadHTML($str);
$domd->getElementsByTagName('iframe')->item(0)->setAttribute('width',999);
echo $domd->saveHTML($domd->getElementsByTagName('iframe')->item(0));
will also change width to 999, and is much more reliable (for example, the regex will break if there are spaces or newlines between the width and the =, although that would still be legal HTML.. sigh)
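To make the whitespace problem concrete, here is a rough side-by-side sketch. The regex is the one from the snippet above; the function names set_width_regex and set_width_dom and the sample string are my own, and the width is assumed to be plain digits.

```php
<?php
// The regex from the answer, wrapped in a function for comparison.
function set_width_regex(string $html, string $width): string
{
    $result = preg_replace('/(\<iframe.*?width\=)\"(.*?)\"/', '${1}"' . $width . '"', $html);
    return $result ?? $html; // preg_replace returns null on error
}

// The DOMDocument variant, changing only the first iframe.
function set_width_dom(string $html, string $width): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $iframe = $dom->getElementsByTagName('iframe')->item(0);
    if ($iframe === null) {
        return $html; // nothing to change
    }
    $iframe->setAttribute('width', $width);
    return $dom->saveHTML($iframe);
}

// Legal HTML, but with spaces around the equals sign.
$spaced = '<iframe width = "560" height="315" src=""></iframe>';

echo set_width_regex($spaced, '999'), "\n"; // the pattern finds no "width=" here
echo set_width_dom($spaced, '999'), "\n";
```

The regex version leaves the spaced input untouched because it expects "width" to be followed directly by "=", while the parser-based version does not depend on that spacing.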
I'm trying to validate the following HTML code (note the text content inside the IMG tag, which is structurally correct as markup but invalid as HTML):
<html>
<head>
</head>
<body>
<img src="./">
Some Text
</img>
</body>
</html>
Using PHP and DomDocument, I try to read entire tree with XPATH:
$dom = new DOMDocument();
$dom->validateOnParse = 0;
$dom->loadHTML($htmlSource);
$xpath = new DOMXPath($dom);
$allNodes = $xpath->query("//node()");
The result I get:
/html
/html/head
/html/body
/html/body/#text[1]
/html/body/img
/html/body/#text[2]
which obviously does not match the exact HTML structure.
What I expected to see is
....
/html/body/img/#text
....
Why does XPATH interpret the tree this way?
How can I get it to work as I expected?
I have the image tag <img src="path_to_file.png">, but I want the image tag to be converted to a link on the mobile site.
So I want the img to be converted to an a tag with an href:
Click here to open in new tab
I am getting started with PHP DOM.
I could get all the attributes listed.
$doc = new DOMDocument();
$doc->loadHTML($html); // $html holds the markup to parse
$getimagetag = $doc->getElementsByTagName('img');
foreach($getimagetag as $tag) {
echo $src=$tag->getAttribute('src');
}
But how do we get the src attribute, then remove the img tag completely (because it contains other parameters like height and length), and then create a new link tag?
Hi guys, I got it done with PHP DOM using the following code:
$input="<img src='path_to_file.png' height='50'>";
$doc = new DOMDocument();
$doc->loadHTML($input);
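// Copy the live node list first; replacing nodes while iterating the live list can skip elements.
$imageTags = iterator_to_array($doc->getElementsByTagName('img'));
foreach($imageTags as $tag) {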
$src=$tag->getAttribute('src');
$a=$doc->createElement('a','click here to open in new tab');
$a->setAttribute('href',$src);
$a->setAttribute('style','color:red;');
$tag->parentNode->replaceChild($a,$tag);
}
$input=$doc->saveHTML();
echo $input;
createElement can also set the text between <a></a>, i.e. Click...new tab.
replaceChild removes $tag (the img) and replaces it with the a tag.
By setting attributes, we can add other parameters like style, target, etc.
I used PHP DOM in the end because I only wanted the data that I get from MySQL to be converted, and not other elements like the website logo. Of course, it is possible using JavaScript too.
Thanks
@dave chen for the JavaScript way and for pointing to detecting a mobile link.
@nate for pointing me to an answer.
Use phpQuery, it's amazing. It's just like using jquery! :)
https://code.google.com/p/phpquery/
I would recommend doing this with JavaScript:
<!DOCTYPE html>
<html>
<head>
<title>Images Test</title>
<script>
window.onload = changeImages;
function changeImages() {
var images = document.getElementsByTagName("img");
while (images.length > 0) {
var imageLink = document.createElement("a");
imageLink.href = images[0].src;
imageLink.innerHTML = "Click here to view " + images[0].title;
images[0].parentNode.replaceChild(imageLink, images[0]);
}
}
</script>
</head>
<body>
Here is a image of flowers : <img src="images/flowers.bmp" title="Flowers" ><br>
Here is a image of lakes : <img src="images/lakes.bmp" title="Lakes" ><br>
Here is a image of computers: <img src="images/computers.bmp" title="Computers"><br>
</body>
</html>
Example
First I need to find all img tags in the site,
and then check whether each img has the "alt" attribute. If the image has the attribute, it will be skipped; if it does not have one, or the alt is empty, a string will be randomly added to the img from a list or array.
Here is how you do it with JavaScript:
find if a img have alt in jquery if not then add from array
But it did not help me, because according to this:
How do search engines crawl Javascript?
search bots can't read it. If you use JavaScript, you need to use a server-side language to add keywords to the img alt.
What next? PHP? Can I do it with simple code?
Well, import it into a DOMDocument object and find all images inside.
Seems rather trivial. See the DOMDocument class.
Here's my code for the problem:
<?php
$html = <<<HTML
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title></title>
</head>
<body>
<p>
<img src="test.png">
<img src="test.jpg" alt="Testing">
<img src="test.gif">
</p>
</body>
</html>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$images = $dom->getElementsByTagName("img");
foreach ($images as $image) {
if (!$image->hasAttribute("alt")) {
$altAttribute = $dom->createAttribute("alt");
$altAttribute->value = "Ready Value!";
$image->appendChild($altAttribute);
}
}
echo $dom->saveHTML();
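The code above always writes the same value and only checks for a missing attribute. A possible variation that also treats an empty alt as missing and picks the text at random from a list, as the question asks, could look like this. The function name fill_missing_alts and the keyword list are placeholders, and the list is assumed to be non-empty.

```php
<?php
// Hypothetical helper: fills missing or empty alt attributes with a random
// entry from $alts (assumed non-empty) and returns the resulting HTML.
function fill_missing_alts(string $html, array $alts): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $image) {
        // getAttribute() returns '' when the attribute is absent,
        // so this covers both the missing and the empty case.
        if (trim($image->getAttribute('alt')) === '') {
            $image->setAttribute('alt', $alts[array_rand($alts)]);
        }
    }
    return $dom->saveHTML();
}

// Placeholder keyword list and markup for illustration.
$keywords = ['flowers', 'lakes', 'computers'];
$page = '<html><body><p>'
    . '<img src="test.png"><img src="test.jpg" alt="Testing"><img src="test.gif" alt="">'
    . '</p></body></html>';

echo fill_missing_alts($page, $keywords);
```

Since only attributes are changed and no nodes are added or removed, iterating the live node list directly is fine here.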