I'm using the following function:
function callframe() {
    $ch = curl_init("file.html");
    curl_setopt($ch, CURLOPT_HEADER, 0);
    echo curl_exec($ch);
    curl_close($ch);
}
Then I call callframe() and the content appears on my PHP page.
Let's say this is the content of file.html:
<html>
<body>
[...]
<td class="bottombar" valign="middle" height="20" align="center" width="1%" nowrap>
[...]
<a href="link.html">Link</a>
[...]
</body>
</html>
How could I delete the <td class="bottombar" valign="middle" height="20" align="center" width="1%" nowrap> line?
How could I delete a single attribute, like the height attribute, or change align from center to left?
How could I insert 'http://www.whatever.com/' before link.html in my a href?
Thanks for your help!
PS: You may want to ask why I don't directly change file.html. Well, then there would be no question.
To get you started: instead of just echoing the curl_exec() output, set CURLOPT_RETURNTRANSFER so curl_exec() returns the page, and store it so you can work with it:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
Now, load it into a DOMDocument that you can then use for parsing and making changes:
$dom = new DOMDocument();
$dom->loadHTML($html);
Now, for the first task (removing that td), it'd look something like:
//
// rough example, not just copy-paste code
//
$tds = iterator_to_array($dom->getElementsByTagName('td')); // static copy of the live DOMNodeList, so removals are safe while looping
foreach ($tds as $td) // $td = DOMElement
{
    // validate this $td is the one you want to delete
    // (e.g. $td->getAttribute('class') === 'bottombar'), then call something like:
    $parent = $td->parentNode;
    $parent->removeChild($td);
}
Perform any other kinds of processing as well.
Then, finally call:
echo $dom->saveHTML();
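Along the same lines, here is a rough sketch (my addition, not the answer's code) for the other two tasks, dropping or changing an attribute and prefixing the link. It assumes the class name and the link.html href from the question's markup:
// still working on the same $dom as above
foreach ($dom->getElementsByTagName('td') as $td) {
    if ($td->getAttribute('class') === 'bottombar') {
        $td->removeAttribute('height');      // delete the height attribute
        $td->setAttribute('align', 'left');  // change align="center" to align="left"
    }
}
foreach ($dom->getElementsByTagName('a') as $a) {
    if ($a->getAttribute('href') === 'link.html') {
        $a->setAttribute('href', 'http://www.whatever.com/' . $a->getAttribute('href'));
    }
}
echo $dom->saveHTML();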
You can capture your output in one variable and use string functions to do your replacements:
function callframe() {
    $ch = curl_init("file.html");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $result = curl_exec($ch);
    $result = str_replace("link.html", "http://www.whatever.com/link.html", $result);
    // other replacements as required
    curl_close($ch);
    echo $result; // output the modified HTML
}
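For the attribute changes from the question, the same string approach could look roughly like this (a sketch that assumes the exact attribute strings shown in the question's markup):
$result = str_replace(' height="20"', '', $result);                 // drop the height attribute
$result = str_replace('align="center"', 'align="left"', $result);   // switch the alignment
// or remove the whole td line
$result = str_replace('<td class="bottombar" valign="middle" height="20" align="center" width="1%" nowrap>', '', $result);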
This is how I did it, to change, for example, an option field (for a search string).
This changes the second value of my option list and replaces it with what I wanted:
require('simple_html_dom.php');
$html = file_get_html('fileorurl');
$e = $html->find('option', 0)->next_sibling();
$e->outertext = '<option value="WTR">Tradition</option>';
then
echo $html;
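Applied to the td from the original question, the same idea might look like this (a rough sketch; in simple_html_dom you change attributes as properties, and setting one to null removes it):
$e = $html->find('td.bottombar', 0);
$e->height = null;      // setting an attribute to null removes it
$e->align = 'left';     // change align from center to left
// $e->outertext = '';  // or drop the element (and its contents) entirely
echo $html;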
I'm trying to parse a webpage for the ISBN number; the HTML looks like:
<tr>
<td>ISBN: </td>
<td itemprop="isbn">9781472223821</td>
</tr>
I currently have:
header('Content-Type:application/json');
$url = "URL Removed";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$res = curl_exec($ch);
curl_close($ch);
$dom = new DomDocument();
@$dom->loadHTML($res);
$searchNodes = $dom->getElementsByTagName("//td[@itemprop='isbn']");
foreach ($searchNodes as $node) {
echo $node->nodeValue, PHP_EOL;
}
When I run this I get no output. I've double-checked the XPath query in the Chrome dev tools and it correctly selects the element I'm after. I believe it's something to do with the nodeValue option. I've tried a var_dump on the $searchNodes variable and get:
object(DOMNodeList)#2 (1) {
["length"]=>
int(0)
}
Is anyone able to highlight my next steps for investigating this?
getElementsByTagName expects only a single tag name. Here is a working example using DOMXPath:
$res = '
<tr>
<td>ISBN: </td>
<td itemprop="isbn">9781472223821</td>
</tr>
';
$dom = new DomDocument();
$dom->loadHTML($res);
$xpath = new DOMXPath($dom);
$searchNodes = $xpath->query("//td[@itemprop='isbn']");
foreach ($searchNodes as $node) {
echo $node->nodeValue, PHP_EOL;
}
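Plugged back into the cURL-based script from the question, it could look roughly like this (a sketch; libxml_use_internal_errors() is used instead of error suppression because real pages rarely parse cleanly):
$res = curl_exec($ch);
curl_close($ch);
$dom = new DomDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$dom->loadHTML($res);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//td[@itemprop='isbn']") as $node) {
    echo $node->nodeValue, PHP_EOL; // e.g. 9781472223821
}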
I have this PHP code:
$url = "http://www.bbc.co.uk/";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
$doc = new DOMDocument();
$doc->validateOnParse = true;
@$doc->loadHtml($data);
// I want to get the element id, and all I know is that the element contains the text "Business"
echo $doc->getElementById($id)->textContent;
Let's assume that there is an element on a page I want to keep track of. I don't know the id, just the text content at that time. I want to get the id so I could read the text content of the same element next week or month, no matter whether the text content has changed or not...
Have a look at this project:
http://code.google.com/p/phpquery/
With this you can use CSS3 selectors like "div:contains('foo')" to find elements containing a given text.
Update: An example
The task: Find the elements containing "find me" inside "test.html":
<html>
<head></head>
<body>
<div>hello</div>
<div>find me!</div>
<div>and find me!</div>
<div>another one</div>
</body>
</html>
The PHP script:
<?php
include "phpQuery-onefile.php";
phpQuery::newDocumentFileXHTML('test.html');
$domNodes = pq('div:contains("find me")');
foreach($domNodes as $domNode) {
/** @var DOMNode */
echo $domNode->textContent . PHP_EOL;
}
The result of running it:
php test.php
find me!
and find me!
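To actually pull out the id that the original question asks for, a rough sketch on top of this (assuming the matched element carries an id attribute) would be:
foreach (pq('div:contains("Business")') as $domNode) {
    /** @var DOMElement $domNode */
    echo $domNode->getAttribute('id') . PHP_EOL; // store this id and query it again next week
}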
I'm trying to monitor a website's new-products page for specific words. I already have a basic script that searches for a single word using file_get_contents(); however, this is not effective.
Looking at the code, they are in <td> tags within a <table>.
How do I get PHP to search for the words no matter what order or capitalisation they are in? E.g.
$searchTerm = "Orange Boots";
from:
<table>
<td>Boots (RED)</td>
</table>
<table>
<td>boots (ORANGE)</td>
</table>
<table>
<td>Shirt (GREEN)</td>
</table>
Returns a match.
Sorry if it's not clear, but I hope you understand.
You can do this like:
$newcontent= (str_replace( 'Boots', '<span class="Red">Boots</span>',$cont));
and just write CSS for the Red class the way you want to show the red colour, e.g. color: red;, and do the same thing for the rest.
But the better approach would be DOM and XPath.
If you're looking to make a quick and dirty search over that HTML block, you can try a simple regular expression with the preg_match_all() function. For example, you can try:
$html_block = file_get_contents(...);
$matches_found = preg_match_all('/(orange|boots|shirt)/i', $html_block, $matches);
$matches_found will hold the number of matches found (or FALSE on error), and $matches will be populated with the matches accordingly.
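That alternation matches any one of the words. If every search term has to be present, regardless of order, one option (my own sketch, not part of this answer) is a lookahead per word; this checks the whole fetched block, so for per-cell matching you would run it against each td's text instead:
$searchTerm = "Orange Boots";
$words = preg_split('/\s+/', trim($searchTerm));
// builds something like /^(?=.*Orange)(?=.*Boots)/is
$pattern = '/^' . implode('', array_map(function ($w) {
    return '(?=.*' . preg_quote($w, '/') . ')';
}, $words)) . '/is';
if (preg_match($pattern, $html_block)) {
    echo "Match found\n"; // e.g. a block containing "boots (ORANGE)"
}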
Use cURL. It's much faster than file_get_contents(). Here's a starting point:
$target_url="http://www.w3schools.com/htmldom/dom_nodes.asp";
// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
if (!$html) {exit;}
$dom = new DOMDocument();
@$dom->loadHTML($html);
$query = "(/html/body//tr)"; //this is where the search takes place
$xpath = new DOMXPath($dom);
$result = $xpath->query($query);
for ($i = 0; $i < $result->length; $i++) {
    $node = $result->item($i); // use $i here, not 0, so every row is visited
    echo "{$node->nodeName} - {$node->nodeValue}<br />";
}
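To turn that loop into the word check the question actually asks for, a rough sketch (my own addition, using stripos for case-insensitive matching of every search word within a row) could be:
$searchTerm = "Orange Boots";
$words = preg_split('/\s+/', trim($searchTerm));
for ($i = 0; $i < $result->length; $i++) {
    $node = $result->item($i);
    $matchesAll = true;
    foreach ($words as $word) {
        // every word must occur somewhere in this row, in any order and any case
        if (stripos($node->nodeValue, $word) === false) {
            $matchesAll = false;
            break;
        }
    }
    if ($matchesAll) {
        echo "Match: {$node->nodeValue}<br />";
    }
}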
<?php
$file = 'http://www.google.com';
$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents($file));
$element = $doc->getElementsByTagName('span');
echo $element->item(2)->nodeValue;
if (0 != $element->length)
{
$content = trim($element->item(2)->nodeValue);
if (empty($content))
{
$content = trim($element->item(2)->textContent);
}
echo $content . "\n";
}
?>
I'm trying to get the inner content of a span tag from google.com's home page. This code should output the first span tag, but it is not outputting any results.
This is not an error ... the first span on http://www.google.com is empty, and I am not sure what else you expect:
<span class=gbtcb></span> <---------------- item(0)
<span class=gbtb2></span> <---------------- item(1)
<span class=gbts>Search</span> <----------- item(2)
Try
$element = $doc->getElementsByTagName('span')->item(2);
var_dump($element->nodeValue);
Output
Search
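If you want to see what each span actually contains before picking an index, a quick loop (my own sketch) helps:
$spans = $doc->getElementsByTagName('span');
for ($i = 0; $i < $spans->length; $i++) {
    echo $i . ': ' . $spans->item($i)->nodeValue . PHP_EOL;
}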
First, bear in mind that the HTML is not necessarily valid XML.
That aside, check that you're actually getting some contents to parse; you need to have allow_url_fopen enabled in order to use file_get_contents() with URLs.
In general, avoid using the error suppression operator (@) because it will almost certainly come back to bite you some time (and this time might well be that time); there is a discussion on this elsewhere on SO.
So, as a first step, switch to something like the following and let me know if you're getting any contents at all.
// stop using @ to suppress errors
$contents = file_get_contents($file);
// check that you're getting something to parse
echo $contents;
Try this and tell us what the output is
<?php
echo ini_get('allow_url_fopen');
?>
Try using cURL to get the data and then load it into a DOMDocument:
<?php
$url = "http://www.google.com";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument();
@$dom->loadHTML($data); // The @ is necessary to suppress warnings about invalid markup
$element = $dom->getElementsByTagName('span');
echo $element->item(2)->nodeValue;
if (0 != $element->length)
{
$content = trim($element->item(2)->nodeValue);
if (empty($content))
{
$content = trim($element->item(2)->textContent);
}
echo $content . "\n";
}
?>
I've been picking up bits and pieces of code; you can see roughly what I'm trying to do. Obviously this doesn't work and is utterly wrong:
<?php
$dom= new DOMDocument();
$dom->loadHTMLFile('http://example.com/');
$data = $dom->getElementById("profile_section_container");
$html = $data->saveHTML();
echo $html;
?>
Using a CURL call, I am able to retrieve the document URL source:
function curl_get_file_contents($URL)
{
$c = curl_init();
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_URL, $URL);
$contents = curl_exec($c);
curl_close($c);
if ($contents) return $contents;
else return FALSE;
}
$f = curl_get_file_contents('http://example.com/');
echo $f;
So how can I use this now to instantiate a DOMDocument object in PHP and extract a node using getElementById?
This is the code you will need to avoid any malformed HTML errors:
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile('http://example.com/');
$data = $dom->getElementById("banner");
echo $data->nodeValue . "\n";
To dump the whole HTML source you can call:
echo $dom->saveHTML();
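If you only want the markup of the element you grabbed (which seems to be what the question is after), note that from PHP 5.3.6 you can pass the node to saveHTML; a small sketch:
$data = $dom->getElementById("profile_section_container"); // the id from the question
if ($data !== null) {
    echo $dom->saveHTML($data); // dumps just that element (PHP 5.3.6+)
}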
<?php
$f = curl_get_file_contents('http://example.com/');
$dom = new DOMDocument();
@$dom->loadHTML($f);
$data = $dom->getElementById("profile_section_container");
$html = $dom->saveHTML($data);
echo $html;
?>
It would help if you provided the example HTML.
I'm not sure, but I remember that once when I wanted to use this I was unable to load some external URL as a file, because the php.ini directive allow_url_fopen was set to off ...
So check your php.ini, or try to open the URL with fopen to see if you can read it as a file:
<?php
$f = file_get_contents('http://example.com/');
var_dump($f); // just to see the content
?>
Regards,
mimiz
Try this:
$dom = new DOMDocument();
$dom->loadHTMLFile('http://example.com/');
$data = $dom->getElementById("profile_section_container"); // getElementById returns a single DOMElement (or null), not a list
$html = $dom->saveHTML($data); // PHP 5.3.6+: pass the node to get only its markup
echo $html;
I think that now you can use DOMDocument::loadHTML.
Maybe you should test whether a doctype exists (with a regexp) and then add it if necessary, to be sure it is declared ...
Regards
Mimiz
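A rough sketch of that doctype check (my own illustration, assuming $f holds the HTML fetched earlier):
if (!preg_match('/<!DOCTYPE/i', $f)) {
    // prepend a doctype so the parser has one to work with
    $f = "<!DOCTYPE html>\n" . $f;
}
$dom = new DOMDocument();
@$dom->loadHTML($f);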