Get the first image in a page with class foo - php

I'm trying to get the first image with a specific class from a page using PHP.
<?php
$document = new DOMDocument();
@$document->loadHTML(file_get_contents('http://www.cbsnews.com/8301-501465_162-57471379-501465/first-picture-on-the-internet-turns-20/'));
$lst = $document->getElementsByTagName('img');
for ($i=0; $i<$lst->length; $i++) {
$image = $lst->item($i);
echo $image->attributes->getNamedItem('src')->value, '<br />';
}
?>
This code gets all images from the page. I'm now trying to get only the images with class "cnet-image" from this page.

You should be able to do this with Simple HTML DOM. Give it a try; I've used it for several similar things, including image crawlers. http://simplehtmldom.sourceforge.net/
Something like the following should cover what you need.
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
$ret = $html->find('img[class=foo]');

I presume that you want to retrieve the first image with a specific class attribute in an HTML document.
If that's the case, then this client-side JavaScript could help.
var l = document.images;
var myclass = "myclass";//This is the class you want
var firstImageWithMyClass = null;
for(var i = 0; i<l.length; i++)
if(document.images[i].className==myclass){
firstImageWithMyClass = document.images[i];
break;
}
//Then you can see if an image with that class was found,
//then do what you want to do with it here;
if(firstImageWithMyClass!=null){
var imageSource = firstImageWithMyClass.src;
//etc, etc
}
jQuery makes this easier. Let me know if you would like to know how to do the same with jQuery and I can share with you.
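Staying with the DOMDocument approach from the question, the same lookup can be done server-side with DOMXPath. This is a minimal sketch, not a definitive implementation: the helper name firstImageSrcByClass and the sample markup are made up for illustration, and it assumes the class name contains no quote characters.

```php
<?php
// Hypothetical helper: returns the src of the first <img> whose class list
// contains $class, or null when no such image exists.
function firstImageSrcByClass(string $html, string $class): ?string
{
    libxml_use_internal_errors(true); // tolerate real-world markup
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($document);
    // Pad the class attribute with spaces so that "cnet-image" does not
    // also match a longer name such as "cnet-image-large".
    $query = sprintf(
        '//img[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    // XPath results come back in document order, so item(0) is the first match.
    $node = $xpath->query($query)->item(0);

    return $node instanceof DOMElement ? $node->getAttribute('src') : null;
}

// Sample markup (an assumption, not the real page).
$sample = '<html><body><img src="a.png"><img class="big cnet-image" src="b.png">'
        . '<img class="cnet-image" src="c.png"></body></html>';
echo firstImageSrcByClass($sample, 'cnet-image'), "\n";
```

For a live page, the string passed in would come from file_get_contents() on the URL, as in the question.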

Related

str_replace for multiple img classes

I am currently trying to use str_replace() to give multiple images on my site unique classes. How can this be done?
preg_match('/src=".+?(\.jpg)/', $image, $src);
if ($src) {
// add the same class to every <img> tag
$classedImg = str_replace('<img', '<img class="plant-img" ', $image);
// strip the src=" prefix and collapse double slashes
$src = str_replace(array('src="', '//'), array('','/'), $src[0]);
}
This is how it is currently set up, and it works for giving every image the same class. How would I go about assigning a unique class to each of several images?
Thank you.
If your solution doesn't require php:
Javascript
Note: you can modify the function and add arguments. I am not sure I understand why you want a unique class, but you can just increment a variable as below.
<script>
function replaceClass() {
var x = document.getElementsByClassName("targetClass");
for (var i = 0; i < x.length; i++) {
x[i].classList.add('replaceWithClass'+i); // Added the "+ i" to make unique
}
}
</script>
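If the classes have to be assigned in PHP, as in the question, one option is to swap str_replace() for preg_replace_callback() and keep a counter. The sketch below is only one way to do it; the helper name addUniqueImageClasses and the numbered "plant-img-N" naming scheme are assumptions, and it presumes the img tags do not already carry a class attribute.

```php
<?php
// Hypothetical helper: gives each <img> tag a class of the form "<prefix>-<n>".
function addUniqueImageClasses(string $html, string $prefix = 'plant-img'): string
{
    $counter = 0;

    return preg_replace_callback(
        '/<img\b/i',
        function (array $match) use (&$counter, $prefix): string {
            $counter++; // one number per image, starting at 1
            return $match[0] . ' class="' . $prefix . '-' . $counter . '"';
        },
        $html
    );
}

echo addUniqueImageClasses('<img src="/a.jpg"><img src="/b.jpg">'), "\n";
// prints: <img class="plant-img-1" src="/a.jpg"><img class="plant-img-2" src="/b.jpg">
```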

scraping images from url using php

I am trying to make a page that lets me grab and save images from another link. Here's what I want on my page:
a text box (to enter the URL that I want to get images from).
a save dialog box to specify the path to save the images to.
However, I only want to save the images from that URL that sit inside a specific element.
For example, my code would say: go to example.com and, from inside the element with class="images", grab all images.
Note: not all images from the page, just those inside the element.
Whether the element has 3 images in it or 50 or 100 doesn't matter.
Here's what I tried in PHP, and it works:
<?php
$html = file_get_contents('http://www.tgo-tv.net');
preg_match_all( '|<img.*?src=[\'"](.*?)[\'"].*?>|i',$html, $matches );
echo $matches[ 1 ][ 0 ];
?>
This gets the image name and path, but what I am trying to build is a save dialog box, and the code must save the image directly into the chosen path instead of echoing it out.
I hope that makes sense.
Edit 2
It's OK not to have a save dialog box. I can specify the save path in the code.
If you want something generic, you can use:
<?php
$the_site = "http://somesite.com";
$the_tag = "div"; #
$the_class = "images";
$html = file_get_contents($the_site);
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//'.$the_tag.'[contains(@class,"'.$the_class.'")]/img') as $item) {
$img_src = $item->getAttribute('src');
print $img_src."\n";
}
Usage:
Change the site, the tag (which can be a div, span, a, etc.), and the class name.
For example, change the values to:
$the_site = "https://stackoverflow.com/questions/23674744/what-is-the-equivalent-of-python-any-and-all-functions-in-javascript";
$the_tag = "div"; #
$the_class = "gravatar-wrapper-32";
Output:
https://www.gravatar.com/avatar/67d8ca039ee1ffd5c6db0d29aeb4b168?s=32&d=identicon&r=PG
https://www.gravatar.com/avatar/24da669dda96b6f17a802bdb7f6d429f?s=32&d=identicon&r=PG
https://www.gravatar.com/avatar/24780fb6df85a943c7aea0402c843737?s=32&d=identicon&r=PG
Maybe you should try HTML DOM Parser for PHP. I found this tool recently and, to be honest, it works pretty well. It has jQuery-like selectors, as you can see on the site. I suggest you take a look and try something like:
<?php
require_once("./simple_html_dom.php");
$html = file_get_html("http://www.tgo-tv.net"); // The page to scrape (URL taken from the question)
foreach ($html->find("div") as $parent) // Start from the root (<html></html>) and find the parent tag you want to search in ("div" searches in all divs; change it as needed)
{
foreach ($parent->find("img") as $img) // Search for img tags inside each parent that was found
{
echo $img->src . "<br>"; // Output the img's src attribute (if the found tag is <img src="www.example.com/cat.png"> you will get www.example.com/cat.png as the result)
}
}
?>
I hope this helps.
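Neither answer covers the saving step from the question. Since the asker later settled for a path set in code, that part could look roughly like the sketch below. The helper name saveImages is hypothetical; it assumes every source is an absolute URL (or a local path), that allow_url_fopen is enabled for remote URLs, and that overwriting files with the same name is acceptable.

```php
<?php
// Hypothetical helper: copies each image URL (or local path) into $targetDir
// and returns the list of files that were written.
function saveImages(array $sources, string $targetDir): array
{
    if (!is_dir($targetDir) && !mkdir($targetDir, 0777, true)) {
        throw new RuntimeException("Cannot create $targetDir");
    }

    $saved = [];
    foreach ($sources as $source) {
        $data = @file_get_contents($source);
        if ($data === false) {
            continue; // skip images that cannot be fetched
        }
        // Use the last path segment as the file name; query strings are dropped.
        $name = basename((string) parse_url($source, PHP_URL_PATH));
        if ($name === '') {
            continue;
        }
        $path = rtrim($targetDir, '/') . '/' . $name;
        file_put_contents($path, $data);
        $saved[] = $path;
    }

    return $saved;
}

// Usage (assumption: $img_srcs holds the src values collected by one of the
// loops above, already resolved to absolute URLs):
// $saved = saveImages($img_srcs, __DIR__ . '/images');
```

Relative src values would have to be resolved against the site's base URL before being passed in.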

Reference an element id from another HTML file

So I have two files, index.php and march.html. I'd like to grab the employee's name out of march.html, where I gave it an id like this:
<h3 id="name">John Doe</h3>
So how would I go about grabbing that name out of march.html so I can place it in my index.php? For more detail: it's an employee-of-the-month page, so I need to grab a name from 11 other files as well and reference them all in index.php. I've tried using DOMDocument in PHP, but it's giving me a lot of trouble. Here is that code anyway:
<?php
$dom = new DomDocument();
$dom->validateOnParse = true;
$dom->loadHtml("march.html");
$name = $dom->getElementById("name");
print $name;
?>
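loadHTML() expects the markup itself rather than a file name, so read the file first (or use loadHTMLFile()). Then use the nodeValue property to get the text: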
<?php
$dom = new DOMDocument();
$dom->validateOnParse = true;
$dom->loadHTML(file_get_contents("march.html"));
$name = $dom->getElementById("name")->nodeValue;
print $name;
?>
I would use a jQuery Ajax call to do that, with the specialized Ajax function load().
Example:
$('#result').load('ajax/test.html #container');
jQuery load API:
http://api.jquery.com/load/
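For the twelve monthly files mentioned in the question, the PHP lookup can be wrapped in a small function and called in a loop. This is a sketch under assumptions: the helper name extractNameFromHtml and the file names other than march.html are invented, and it uses an XPath id lookup in place of getElementById().

```php
<?php
// Hypothetical helper: returns the text of the element with id="name",
// or null when the markup has no such element.
function extractNameFromHtml(string $html): ?string
{
    libxml_use_internal_errors(true); // tolerate imperfect markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $node = $xpath->query('//*[@id="name"]')->item(0);

    return $node ? trim($node->textContent) : null;
}

echo extractNameFromHtml('<html><body><h3 id="name">John Doe</h3></body></html>'), "\n";

// A loop over the monthly files (file names are assumptions) could look like:
// foreach (['january.html', 'february.html', 'march.html'] as $file) {
//     echo $file, ': ', extractNameFromHtml(file_get_contents($file)), "\n";
// }
```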

get the href value of a specific element and load it

I'm using jquery to add rel=brochure using $('.imageOuter a').attr('rel', 'brochure') this works as expected.
However, I want to grab the link that has rel as brochure. I'm trying to do this with loadHTML, as below:
function getBrochureLink() {
$doc = new DOMDocument();
$doc->loadHTML($file);
$area = $doc->getElementsByTagName('body')->item(0);
$links = $area->getElementsByTagName("link");
foreach($links as $l) {
if($l->getAttribute("rel") == "brochure") {
$brochureLink = $l->getAttribute("href");
}
}
}
Sadly $brochureLink is empty and not grabbing it.
Your issue is that the attr is set via Javascript. When you retrieved the page's contents via loadHTML, the JS was not executed, so you can't find the matching link.
You'll have to either run the JS on the server side, put the attr into the DOM directly without JS, or find another architecture for whatever you're attempting to accomplish.
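One way to avoid depending on the JavaScript-added attribute is to select the links on the server by the same structure the jQuery selector uses, i.e. anchors inside .imageOuter. The sketch below assumes that structure is present in the raw HTML; the function name brochureLinks and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: collects the href of every <a> inside an element
// whose class list contains "imageOuter".
function brochureLinks(string $html): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " imageOuter ")]//a[@href]';

    $links = [];
    foreach ($xpath->query($query) as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }

    return $links;
}

// Sample markup (an assumption about what the page looks like).
$sample = '<html><body><div class="imageOuter"><a href="/brochure.pdf">Brochure</a></div>'
        . '<a href="/other">Other</a></body></html>';
print_r(brochureLinks($sample));
```

Note that the question's code searches for link elements, whereas the jQuery selector targets a elements, so the query here looks for a.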

Dom element of paragraph's text

I'm making a web scraper and this is driving me crazy!
I need to get the text of a paragraph. Simple, right?! Here's the code.
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//div");
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('class');
echo "<br />Found it: $url";
}
It works perfectly, grabs the class of every div on the page and echoes it out. But what I really need to do is find all <p> tags - every one on the page - and echo the text that is in between the <p>! I have a feeling it's simple but I just can't figure it out.
edit
All it took was the following:
$doc = new DOMDocument();
@$doc->loadHTML($html);
$node = $doc->getElementsByTagName('p')->item(3);
echo $node->textContent."\n";
What you really want is getElementsByTagName, and once you have the node, you use textContent for the win. Thanks folks! Not sure if it will apply to everyone else's situation, but it sure does to mine. =o
Use getElementsByTagName to retrieve all <p> elements. Then iterate over the resulting DOMNodeList and fetch the nodeValue of each item.
<?php
$dom=new DOMDocument;
$dom->loadXML('<html><body><p>para1<p>para2<p>para3</p></p></p></body></html>');
$paras=$dom->getElementsByTagName('p');
for($p=0;$p<$paras->length;++$p)
{
echo htmlentities($paras->item($p)->nodeValue).'<hr/>';
}
?>
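For scraped pages, which are rarely well-formed XML, the same loop can be run after loadHTML() instead of loadXML(). A minimal sketch, with the helper name paragraphTexts and the sample markup being assumptions:

```php
<?php
// Hypothetical helper: returns the trimmed text of every <p> in the markup,
// including the text of nested elements such as links.
function paragraphTexts(string $html): array
{
    libxml_use_internal_errors(true); // scraped HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $texts[] = trim($p->textContent);
    }

    return $texts;
}

$sample = '<html><body><p>para1</p><div><p>para2 <a href="#">link</a></p></div></body></html>';
foreach (paragraphTexts($sample) as $text) {
    echo $text, "\n";
}
```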
This jQuery snippet may help. When the textarea is clicked, it finds the text of all p elements
and loads it into the textarea.
/** BEGIN **/
$(document).ready(function(){
$('textarea').click(function(){
var pText = $('p').text();
var spanText = '', liText = '';
// children() always returns a jQuery object, so test its length
if($('p').children('a, span, li').length)
{
spanText = $('span').text();
liText = $('li').text();
}
//alert('the value p is ' + pText + spanText + liText);
// a textarea's content is set with val() rather than text()
$(this).val(pText + spanText + liText);
});
});
/** END **/
