I want to get the src of an image based on its class or id.
For example, on an HTML page there are many <img src="url"> tags, but only one has a class or id:
<img src="url" class="image" or id="image">
How can I get the right src attribute for the one with a specific class or id?
Please, regex, not DOM.
Let me explain why I don't want to use DOM or other libraries: I'm fetching the HTML page from another site that doesn't allow fopen or file_get_contents, and only cURL can retrieve it. That's also why I don't use libraries like simplehtmldom; sometimes it's impossible to fetch the remote HTML page with them and I have to write some scripts myself.
You say that you don't want to use DOM libraries because you need to use cURL. That's fine: DOMDocument and simplexml_load_string both take string arguments. So you can get your string from cURL and load it into your DOM library.
For instance:
$html = curl_exec($ch); // assuming CURLOPT_RETURNTRANSFER
$dom = new DOMDocument;
$dom->loadHTML($html); // load the string from cURL into the DOMDocument object
// using an ID
$el = $dom->getElementById('image');
// using a class
$xpath = new DOMXPath($dom);
$els = $xpath->query('//img[@class="image"]');
$el = $els->item(0);
$src = $el->getAttribute('src');
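For completeness, a minimal cURL setup that produces that $html string might look like this (the URL below is only a placeholder):
$ch = curl_init('http://www.example.com/page.html'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response body as a string instead of printing it
$html = curl_exec($ch);
curl_close($ch);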
If you absolutely have to use regex, here it is:
<img(?:[^>]+src="(.+?)"[^>]+(?:id|class)="image"|[^>]+(?:id|class)="image"[^>]+src="(.+?)")
That said, the right way to do it is to use jQuery or a similar DOM-parsing technique. Don't use the regex unless you have a very good reason to because it will miss many cases (for example, it won't work if single quotes are used instead of double quotes or if there are spaces before "image").
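If you do end up using the regex anyway, a minimal sketch of applying it with preg_match (assuming $html already holds the fetched page) could look like this:
$pattern = '/<img(?:[^>]+src="(.+?)"[^>]+(?:id|class)="image"|[^>]+(?:id|class)="image"[^>]+src="(.+?)")/';
if (preg_match($pattern, $html, $m)) {
    // the src lands in capture group 1 or 2 depending on attribute order
    $src = isset($m[2]) ? $m[2] : $m[1];
    echo $src;
}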
Related
I am trying to read the HTML <audio> tag in PHP, but it is created dynamically.
This is the URL I'm using to read it.
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ suppresses warnings from malformed HTML
foreach (iterator_to_array($dom->getElementsByTagName('audio')) as $node) {
$this->printnode($node);
}
In the printnode() function it shows as if no <audio> tag exists, because the tag is created dynamically.
After looking at the structure: yes, the URL for the actual audio is being loaded dynamically via JS.
But the audio playlist data is still visible. Use that:
$xpath = new DOMXPath($dom);
$playlist_data = $xpath->evaluate('string(//script[@id="playlist-data"])');
$data = json_decode($playlist_data, true);
echo $data['audio'];
It's inside another script tag, as a JSON string. So basically, access that element and get its value as a string. Then you have the JSON string; as usual, pass it to json_decode and the parser will do its thing, returning an array, and you can access the audio URL like any normal array element.
Side note: I just used XPath as a personal preference; you can use:
$playlist_data = $dom->getElementById('playlist-data')->nodeValue;
if you choose to do so.
I used the following code to parse the HTML of another site, but it displays a fatal error:
$html=file_get_html('http://www.google.co.in');
Fatal error: Call to undefined function file_get_html()
Are you sure you have downloaded and included the PHP Simple HTML DOM Parser?
You are calling a class that does not belong to PHP itself.
Download the simple_html_dom class here and use the methods it includes as you like. It is really great, especially when you are working with email newsletters:
include_once('simple_html_dom.php');
$html = file_get_html('http://www.google.co.in');
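Once the class is included, its find() method takes CSS-like selectors; a rough sketch of pulling image sources, for example:
include_once('simple_html_dom.php');
$html = file_get_html('http://www.google.co.in');
// each matched element exposes its attributes as properties
foreach ($html->find('img') as $img) {
    echo $img->src . "\n";
}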
As everyone has told you, you are seeing this error because you didn't download and include the simple_html_dom class after copy-pasting that third-party code.
Now you have two options. Option one is what the other developers have provided in their answers, along with mine.
However, my friend, option two is to not use that third-party PHP class at all and instead use PHP's built-in class to perform the same task. That class always ships with PHP, so this method is also more efficient, as well as original and more secure!
Instead of file_get_html, which is not a function defined by the PHP developers, use:
$doc = new DOMDocument();
$doc->loadHTMLFile("filename.html");
echo $doc->saveHTML();
That is indeed defined by them. Check it in the original PHP manual (php.net/manual).
This puts the HTML into a DOM object which can be parsed by individual tags, attributes, etc. Here is an example of getting all the 'href' attributes and corresponding node values out of the 'a' tags. Very cool.
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
echo $tag->getAttribute('href').' | '.$tag->nodeValue."\n";
}
It looks like you're looking for simplexml_load_file which will load a file and put it into a SimpleXML object.
Of course, if it is not well formatted, that might cause problems. Your other option is DOMDocument::loadHTMLFile. That is a good deal more forgiving of badly formed documents.
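For instance (a rough sketch; the URLs are only placeholders):
// works best on well-formed XML
$xml = simplexml_load_file('http://www.example.com/feed.xml');
// more forgiving of badly formed HTML
$doc = new DOMDocument();
@$doc->loadHTMLFile('http://www.example.com/page.html'); // @ silences parse warnings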
If you don't care about the XML and just want the data, you can use file_get_contents.
$html = file_get_contents('http://www.google.co.in');
to get the HTML content of the page.
In simple words:
Download simple_html_dom.php from here.
Now add this line to your PHP file:
include_once('simple_html_dom.php');
and start your coding after that:
$html = file_get_html('http://www.google.co.in');
No error will be displayed.
Try file_get_contents.
http://www.php.net/manual/en/function.file-get-contents.php
Does anybody have any idea how they do it? I currently use OffLiberty.com to parse Mixcloud links and get the raw MP3 URL for use in a custom HTML5 player for iOS compatibility. I was just wondering if anyone knows exactly how their process works, so I could create something similar that would 'cut out the middleman', so to speak, and my end users wouldn't have to go to an external site to get a link to the MP3 for the mix they want to post. Just a thought really; not terribly important if it can't be done, but it would be a nice touch :)
Anybody have any idea?
Note that I'm against content scraping, and you should ask those websites for permission to scrape their MP3 URLs. Otherwise, if I were them, I'd block you right now, and ad vitam æternam.
Anyway, you can parse its HTML using DOMDocument.
For example :
<?php
// just so you don't see parse errors
$internal_errors = libxml_use_internal_errors(true);
// initialize the document
$doc = new DomDocument();
// load a page
$doc->loadHTMLFile('http://www.mixcloud.com/LaidBackRadio/le-motel-on-the-road/');
// initialize XPATH for the document
$xpath = new DomXPath($doc);
// span with "data-preview-url" seems to contain MP3 url
// we request them inside a DomNodeList http://www.php.net/manual/en/class.domnodelist.php
$mp3 = $xpath->query('//span[@data-preview-url]');
foreach($mp3 as $m){
// we print the attribute value
echo $m->attributes->getNamedItem('data-preview-url')->nodeValue . '<br/>';
}
libxml_use_internal_errors($internal_errors);
I am grabbing the contents from Google with PHP. How can I search $page for the element with the id "lga" and echo out another property? Say #lga is an image; how would I echo out its source?
No, I'm not going to do this with Google; Google is strictly an example and test page.
<body><img id="lga" src="snail.png" /></body>
I want to find the element with the id "lga" and echo out its source, so for the above code I would want to echo out "snail.png".
This is what I'm using and how I'm storing what I found:
<?php
$url = "https://www.google.com/";
$page = file($url);
foreach($page as $part){
}
?>
You can achieve this using the built-in DOMDocument class. This class allows you to work with HTML in a structured manner rather than parsing plain text yourself, and it's quite versatile:
$dom = new DOMDocument();
$dom->loadHTML($html);
To get the src attribute of the element with the id lga, you could simply use:
$imageSrc = $dom->getElementById('lga')->getAttribute('src');
Note that DOMDocument::loadHTML will generate warnings when it encounters invalid HTML. The method's doc page has a few notes on how to suppress these warnings.
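One common way to keep it quiet on messy markup is to route the warnings through libxml, roughly like this:
libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
$dom->loadHTML($html);
libxml_clear_errors();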
Also, if you have control over the website you are parsing the HTML from, it might be more appropriate to have a dedicated script to serve the information you are after. Unless you need to parse exactly what's on a page as it is served, extracting data from HTML like this could be quite wasteful.
I'm trying to find all href links on a webpage and replace each link with my own proxy link.
For example, this link:
<a href="http://google.com">Google</a>
needs to become:
<a href="http://www.example.com/?loadpage=http%3A%2F%2Fgoogle.com">Google</a>
Use PHP's DomDocument to parse the page
$doc = new DOMDocument();
// load the string into the DOM (this is your page's HTML), see below for more info
$doc->loadHTML('<a href="http://google.com">Google</a>');
//Loop through each <a> tag in the dom and change the href property
foreach($doc->getElementsByTagName('a') as $anchor) {
$link = $anchor->getAttribute('href');
$link = 'http://www.example.com/?loadpage='.urlencode($link);
$anchor->setAttribute('href', $link);
}
echo $doc->saveHTML();
Check it out here: http://codepad.org/9enqx3Rv
If you don't have the HTML as a string, you may use cURL (docs) to grab the HTML, or you can use the loadHTMLFile method of DomDocument.
Documentation
DomDocument - http://php.net/manual/en/class.domdocument.php
DomElement - http://www.php.net/manual/en/class.domelement.php
DomElement::getAttribute - http://www.php.net/manual/en/domelement.getattribute.php
DOMElement::setAttribute - http://www.php.net/manual/en/domelement.setattribute.php
urlencode - http://php.net/manual/en/function.urlencode.php
DomDocument::loadHTMLFile - http://www.php.net/manual/en/domdocument.loadhtmlfile.php
cURL - http://php.net/manual/en/book.curl.php
Just another option: if you would like to have the links replaced by jQuery, you could also do the following:
$(document).find('a').each(function(key, element){
    var curValue = $(element).attr('href');
    $(element).attr('href', 'http://www.example.com?loadpage=' + curValue);
});
However, a more secure way is doing it in PHP, of course.
Simplest way I can think of to do this:
$loader = "http://www.example.com?loadpage=";
$page_contents = str_ireplace(array('href="', "href='"), array('href="'.$loader, "href='".$loader), $page_contents);
But that might have some problems with URLs containing ? or &, or if the text (not code) of the document contains href=.
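A slightly more robust variant (still just a sketch, and still less reliable than a DOM parser) would capture each href value and URL-encode it, so ? and & in the original links survive the rewrite:
$loader = "http://www.example.com?loadpage=";
$page_contents = preg_replace_callback(
    '/href=(["\'])(.*?)\1/i',
    function ($m) use ($loader) {
        return 'href="' . $loader . urlencode($m[2]) . '"';
    },
    $page_contents
);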