Is it possible to check if an element exists with PHP?
I'm already aware of the JavaScript method, but I'd like to avoid it if possible.
If you have the HTML server side in a string, you can use DOMDocument:
<?php
$html = '<html><body><div id="first"></div><div id="second"></div></body></html>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$element = $dom->getElementById('second');
// this will be null if it isn't found
var_dump($element);
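A minimal sketch of turning the lookup above into a yes/no check. The helper name elementExists is hypothetical (it is not part of DOMDocument), and the libxml calls are only there to keep parser warnings about imperfect markup quiet:

```php
<?php
// Hypothetical helper: true when $html contains an element with the given id.
function elementExists(string $html, string $id): bool
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementById() returns null when no element has that id.
    return $dom->getElementById($id) !== null;
}

$html = '<html><body><div id="first"></div><div id="second"></div></body></html>';
var_dump(elementExists($html, 'second')); // bool(true)
var_dump(elementExists($html, 'third'));  // bool(false)
```

This only sees the HTML string you hand it, so it works for markup you hold server side, not for whatever the browser has rendered.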
Not directly, because PHP is serverside only.
But if you really wish to do so, you can send the whole markup of your page to a PHP script on your server using an AJAX request, parse it there to find out whether a div with the specified ID exists (see Shabbyrobe's post), and return the result in your AJAX response. This would be very inefficient and is not recommended when you can easily check it with JavaScript.
No. PHP can only serve content, it has no control or view of the DOM except what you ask it to create.
Let's say you wanted to parse the DOM with PHP. You can easily achieve this using the DOMDocument class.
However, in order to do so, you would need to load some HTML using loadHTML or loadHTMLFile and provide the functions with a string containing HTML (or a file path in the case of loadHTMLFile).
As an example, if you just wanted to get an element with a specific ID (in PHP, not JavaScript), WITHIN your page, what can you do?
If you have PHP code generating the page, you could use the output buffer to generate the page in memory, edit the generated page and then flush it to the browser. You can only change the DOM before the browser gets it.
You could do the following:
ob_start(); // Should be called before any output is generated
// ... PHP code that outputs HTML ...
$generated_html = ob_get_clean(); // Store generated HTML to string
// Load and manipulate HTML
$doc = new DOMDocument();
$doc->loadHTML($generated_html);
// ... Manipulate the generated HTML ...
echo $doc->saveHTML(); // echo the modified HTML
However, since you are generating the HTML yourself, it would make more sense to change whatever you need to change before it is generated, to reduce processing time.
If you want to change the HTML of a page which is already shown in the browser you'll need another way (such as JS/AJAX) since at that point PHP can't possibly access the DOM.
The getElementById method can be called on a DOMDocument instance with an id string to get the element.
$element = $testDOMDocument->getElementById('test-id');
I'm trying to get an element from a website, but I can't find an element that is appended by JavaScript. Is there a solution for this problem?
Code here:
$dom = new Dom;
$obj = $dom->loadFromUrl($url);
$element = $obj->find(".c-payment");
echo count($element);
The result is 0, but the element is present on the website.
When you read a web page's content with PHP, you get only the static content (which is provided by the web server). The dynamic part of the content (which would be generated by JavaScript) does not exist at that moment, because PHP does not execute the JavaScript code.
You can try the V8 JavaScript engine integration for PHP, but I do not think you can easily achieve what you want with it.
Maybe it will be useful for you: https://github.com/scraperlab/browserext
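To illustrate the point about static content, here is a small sketch. It uses the built-in DOMDocument and DOMXPath rather than the Dom class from the question, and the sample markup is made up for the illustration. The element with class c-payment is only created by the script, so a parser that does not run JavaScript never sees it:

```php
<?php
// Made-up page: the .c-payment element exists only after a browser runs the script.
$html = <<<'HTML'
<html><body>
<div id="app"></div>
<script>
  var el = document.createElement('div');
  el.className = 'c-payment';
  document.getElementById('app').appendChild(el);
</script>
</body></html>
HTML;

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// XPath equivalent of the CSS selector ".c-payment".
$xpath = new DOMXPath($dom);
$matches = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' c-payment ')]"
);

// The script body is plain text to the parser, so nothing matches.
echo $matches->length, "\n";
```

The static div with id "app" is found as usual; only the script-generated element is missing.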
I am grabbing the contents of Google's page with PHP. How can I search $page for the element with the id "lga" and echo out another of its properties? Say #lga is an image, how would I echo out its source?
No, I'm not going to do this with Google; Google is strictly an example and a test page.
<body><img id="lga" src="snail.png" /></body>
I want to find the element with the id "lga" and echo out its source; so for the above code I would want to echo out "snail.png".
This is what I'm using and how I'm storing what I found:
<?php
$url = "https://www.google.com/";
$page = file($url);
foreach($page as $part){
}
?>
You can achieve this using the built-in DOMDocument class. This class allows you to work with HTML in a structured manner rather than parsing plain text yourself, and it's quite versatile:
$dom = new DOMDocument();
$dom->loadHTML($html);
To get the src attribute of the element with the id lga, you could simply use:
$imageSrc = $dom->getElementById('lga')->getAttribute('src');
Note that DOMDocument::loadHTML will generate warnings when it encounters invalid HTML. The method's doc page has a few notes on how to suppress these warnings.
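One way to handle those warnings, sketched here under the assumption that you simply want them silenced, is libxml_use_internal_errors(). The helper name srcById is hypothetical, and it also guards against the element not being found, since calling getAttribute() on null would be a fatal error:

```php
<?php
// Hypothetical helper: returns the src of the element with the given id, or null.
function srcById(string $html, string $id): ?string
{
    // Keep libxml warnings about invalid HTML from being printed.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $element = $dom->getElementById($id);
    if ($element === null || !$element->hasAttribute('src')) {
        return null; // no such element, or it has no src attribute
    }

    return $element->getAttribute('src');
}

$html = '<body><img id="lga" src="snail.png" /></body>';
echo srcById($html, 'lga'), "\n"; // snail.png
```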
Also, if you have control over the website you are parsing the HTML from, it might be more appropriate to have a dedicated script to serve the information you are after. Unless you need to parse exactly what's on a page as it is served, extracting data from HTML like this could be quite wasteful.
UPDATE: The source code is very much different from what Developer Tools shows.
Check out the source: view-source:http://www.machinerytrader.com/list/list.aspx?ETID=1&catid=1002
Is that javascript that needs to be rendered by a browser into html? If so, how can I have php do that process so that I have Html to parse? It's weird that you can use Xpath Checker to return the items I'm looking for (see below), but you cannot access the full html!
(XPath: //table[contains(@id, 'ctl00_ContentPlaceHolder1') and (contains(@id,"tblContent") or contains(@id,"tblListingHeader"))])
END UPDATE
I need to scrape some information off of this site for work on a regular basis. I am attempting to write some PHP code to scrape this data. I think I have some namespace issues here, having read a number of other posts on SO. I have never encountered namespace problems before and used the approach shown on another SO post (to no avail :().
It appears the xpath query is just not happening for whatever reason. If you have any guesses or solutions as to how to handle this issue, I am open for suggestions.
Also here is the output from my code:
object(DOMXPath)#2 (0) {
}
Debug 1
array(0) {
}
array(0) {
}
I left out the bottom of the code where I var_dump testarray and create and var_dump otherarray. Their output is included above. Obviously the two arrays will be empty if the DOMXPath element has length 0 as well.
$string = 'http://www.machinerytrader.com/list/list.aspx?ETID=1&catid=1002';
$machine_trader = file_get_contents($string);
$xml = new DOMDocument();
$xml->loadHTML($machine_trader);
$xpath = new DOMXPath($xml);
$rootNamespace = $xml->lookupNamespaceUri($xml->namespaceURI);
$xpath->registerNamespace('x', $rootNamespace);
$tableRows = $xpath->query("//x:table[contains(@id, 'ctl00_ContentPlaceHolder1') and (contains(@id,'tblContent') or contains(@id,'tblListingHeader'))]");
var_dump($xpath);
$testarray = array();
$otherarray = array();
foreach ( $tableRows as $row )
{
echo "Debug 1"."\n";
$testarray[] = $row->nodeValue;
}
This is not an XPath issue: the actual content is returned by a form post, which your code has not reached yet. The JavaScript in the page source does nothing more than authenticate a proper 'user' for the information request and then send the request via a form submission.
At each request, the salt / encryption 'key' is randomized and changes, preventing simple scrapes.
You could rewrite that JavaScript to PHP and then issue two requests, battling the authentication process along the way.
Or, rather than diddle with reverse-engineering this, you could switch your scraping to NodeJS and use something like PhantomJS since it can evaluate javascript but give you programmatic access. Given the complexity of this task, it'd be much simpler to use the right tool.
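Whichever route gets you the real HTML, the query itself can stay simple. The sketch below runs the question's XPath against made-up sample markup (the table ids and cell text are assumptions standing in for the real page). Documents loaded with DOMDocument::loadHTML() carry no namespace, so it queries plain table elements without registering a prefix:

```php
<?php
// Made-up markup standing in for the real listing page.
$html = <<<'HTML'
<html><body>
<table id="ctl00_ContentPlaceHolder1_tblListingHeader"><tr><td>Header</td></tr></table>
<table id="ctl00_ContentPlaceHolder1_tblContent"><tr><td>Row</td></tr></table>
<table id="unrelated"><tr><td>Skip</td></tr></table>
</body></html>
HTML;

$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// No registerNamespace() call: the parsed HTML elements are not namespaced.
$xpath = new DOMXPath($doc);
$tables = $xpath->query(
    "//table[contains(@id, 'ctl00_ContentPlaceHolder1') and "
    . "(contains(@id, 'tblContent') or contains(@id, 'tblListingHeader'))]"
);

$values = [];
foreach ($tables as $table) {
    $values[] = trim($table->nodeValue);
}
print_r($values);
```

Note that var_dump() on the DOMXPath object tells you little about the result; checking the length of the returned node list is more informative.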
I am trying to find a way to search through a page in php to replace the names of form elements.
I guess I should explain. I'm doing a job for a friend and I want to make an easy database updater that is robust and can withstand adding elements without the person knowing much about php or databases.
In short, I want to search through a form and replace all the name="%name%" with the respective database table key names, so I can use a simple foreach method to update the table.
So I was looking at the DOMDocument class to open an HTML page and replace every form name inside, in order, with the corresponding table keys, but I wasn't sure whether I can open a PHP page with loadHTMLFile or not. And, if I could open a PHP page, would a page opening itself cause an infinite loop? Or would it just parse the HTML as if it were looking at client-side HTML?
Is there any way to do what I want? If not, that's OK, I'll just make it a little less awesome, but I was just wondering.
It's perfectly doable.
DOMDocument is possibly the ideal (native) tool for this task, but you'll probably want to look into the DOMDocument::loadHTML() method instead of loadHTMLFile().
To get the processed PHP page into a string, you can request the page with cURL, file_get_contents() or a similar alternative. This involves making an additional request and adding specific control logic to avoid an endless loop.
A better alternative might be to use output buffering. Here is a simple example I have at hand of how to replace the contents of the <title> tag:
<?php
ob_start();
echo '<title>Original Title</title>';
/* get and delete current buffer && start a new buffer */
if ((($html = ob_get_clean()) !== false) && (ob_start() === true))
{
echo preg_replace('~<title>([^<]*)</title>~i', '<title>NEW TITLE</title>', $html, 1);
}
?>
I am using preg_replace(), but you shouldn't have any problems adapting it to use DOMDocument nodes. It's also worth noting that the ob_start() call must come before any headers or content are sent to the browser; this includes session cookies and so on.
This should get you going, let me know if you need any more help.
A generic DOMDocument example:
<?php
ob_start(); // This must be the very first thing.
echo '<html>'; // Start of HTML.
echo '...'; // Your inputs and so on.
echo '</html>'; // End of HTML.
// Final processing, the $html variable will hold all output so far.
if ((($html = ob_get_clean()) !== false) && (ob_start() === true))
{
$dom = new DOMDocument();
$dom->loadHTML($html); // load the output HTML
/* your specific search and replace logic goes here */
echo $dom->saveHTML(); // output the replaced HTML
}
?>
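As a rough sketch of what the search and replace logic could look like for the form-name case from the question: the $columns array is hypothetical (in practice it would come from the database table), and the inputs are assumed to appear in the same order as the columns.

```php
<?php
// Hypothetical column names; in practice these would be read from the table.
$columns = ['first_name', 'last_name', 'email'];

ob_start(); // Capture the page instead of sending it straight to the browser.
echo '<html><body><form>';
echo '<input type="text" name="%name%">';
echo '<input type="text" name="%name%">';
echo '<input type="text" name="%name%">';
echo '</form></body></html>';
$html = ob_get_clean();

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Assign the column names to the inputs in document order.
$index = 0;
foreach ($dom->getElementsByTagName('input') as $input) {
    if (isset($columns[$index])) {
        $input->setAttribute('name', $columns[$index]);
    }
    $index++;
}

$result = $dom->saveHTML();
echo $result; // Send the rewritten page to the browser.
```

Inputs beyond the end of $columns keep their original name, so a mismatch between form and table is at least visible in the output.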