I’m trying to store the content of a div to a variable.
Example:
<div class="anything">
<p>We don't know the content of this div</p>
</div>
I want to search for <div class="anything"> and store everything between the opening and closing tags.
We also want to avoid using absolute pathnames, so that it only searches the current HTML/PHP file for this div where the code is present.
Is this possible with PHP, or is this only possible with JavaScript?
PHP is not that intelligent. He doesn't even know what he says.
PHP is a server-side language. It has absolutely NO clue about what the DOM (i.e. what is displayed in your browser's window) is when it delivers a page. Yeah, I know, PHP rendered the DOM, so how could it not know what's in there?
Simply put, let's say that PHP doesn't have a memory of what he renders. He just knows that at one particular moment, he is delivering strings of characters, but that's all. He kind of doesn't get the big picture. The big picture goes to the client and is called the DOM. The server (PHP) forgets it immediately as he's rendering it.
Like a goldfish.
To do that, you need JavaScript (which runs on the client's computer and therefore has complete access to the rendered DOM), or, if you want PHP to do this, you have to retrieve a fully rendered page first.
So the only way to do what you want to do in PHP is to get your page printed, and only then you can retrieve it with an http request and parse it with, in your case, a library such as simpleHtmlDom.
Quick example on how to parse a rendered page with simpleHtmlDom:
Let's say you know that your page will be available at http://mypage.com/mypage.php
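include 'simple_html_dom.php'; // the Simple HTML DOM library file
$html = file_get_html('http://mypage.com/mypage.php');
foreach ($html->find('div.anything') as $element) {
    // innertext holds the HTML between the opening and closing tags
    echo $element->innertext . '<br>';
}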
You probably need a combination of those.
In your JavaScript:
var content = document.getElementsByClassName("anything")[0].innerHTML;
document.getElementById('formfield').value = content;
document.getElementById('hiddenForm').submit();
In your HTML/PHP file:
<form id="hiddenForm" method="post" action="path/to/your/script">
<input type="hidden" id="formfield" name="formfield" value="" />
</form>
In the script you defined in the form action:
if(!empty($_POST)){
$content = $_POST['formfield'];
// DO something with the content;
}
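Put together, the client-side half of this answer could look like the sketch below. It is only an illustration: copyDivToForm is a hypothetical helper name, and it assumes the hidden input carries id="formfield" and that the form is submitted with method="post" so that $_POST is filled on the PHP side.

```javascript
// Hypothetical helper: copies the HTML of the first element with the given
// class into a hidden form field and submits the form.
function copyDivToForm(doc, className, fieldId, formId) {
  var source = doc.getElementsByClassName(className)[0];
  if (!source) {
    return false; // no matching element, nothing to send
  }
  // innerHTML and value are properties, not methods
  doc.getElementById(fieldId).value = source.innerHTML;
  doc.getElementById(formId).submit();
  return true;
}

// In the browser, run it once the DOM is ready.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", function () {
    copyDivToForm(document, "anything", "formfield", "hiddenForm");
  });
}
```

Passing the document in as a parameter is just a design choice that lets the helper be exercised outside a browser.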
Alternatively you could send the data via AJAX but I guess you are new to this stuff so you should start slowly :)
Cheers!
steve
You could use JS to take the .innerHTML from the elements you want and store it in the .value of some input fields of a form, and then use a submit button to run the PHP form handling as normal. Use .readOnly to make the input fields uneditable.
Related
I have started learning PHP and I have a question. Let's say I have the following HTML code:
<p id='tobeChanged'>I will be changed throughout the execution</p>
This paragraph is not static. Its content can be changed by the user with a button, which will produce a random number and replace the paragraph's HTML.
E.g. from
<p id='tobeChanged'>I will be changed throughout the execution</p>
to
<p id='tobeChanged'>42</p><!--changed with a button-->
Now my question: is it possible to pass the newly produced value to a PHP variable? If possible, I would like a long explanation.
Also, I would like not to use forms (if possible).
Thanks in advance
You need to fire an AJAX request on that button click; it will send the value to the server so that PHP can read it.
You can do something like this (you need to include jQuery on page):
$.post("/saveVariable.php",{randNum:randomNum},function(data){alert("Data saved successfully");})
On the PHP end, you will get the value in
$_POST['randNum']
Maybe that will help.
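For readers who would rather not load jQuery, the same idea can be sketched with the browser's built-in fetch. This is a rough sketch under assumptions: saveRandomNumber is a hypothetical helper, the button id randomButton is made up for illustration, and /saveVariable.php is simply the URL used in the answer above.

```javascript
// Hypothetical helper: sends the freshly generated number to the server.
// `send` defaults to the global fetch; it is a parameter only so the
// helper can be tried without a network.
function saveRandomNumber(randomNum, send) {
  send = send || fetch;
  // A URLSearchParams body is sent URL-encoded, so PHP fills $_POST['randNum'].
  var body = new URLSearchParams({ randNum: String(randomNum) });
  return send("/saveVariable.php", { method: "POST", body: body });
}

// Browser wiring; this must run after both elements exist in the page.
if (typeof document !== "undefined") {
  document.getElementById("randomButton").addEventListener("click", function () {
    var randomNum = Math.floor(Math.random() * 100);
    document.getElementById("tobeChanged").textContent = randomNum;
    saveRandomNumber(randomNum);
  });
}
```

No form is involved: the click handler updates the paragraph and posts the value in the background.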
I would like to download the content of a certain page and get one number from it (I'm still not sure how, probably using the PHP DOM interface). I opened the page, started Firefox's debugger, picked the element with the number, and found that it is in <div id="lblOptimizePercent" class="wod-dpsval">98.4%</div> (98.4% is what I am looking for). So I opened the page's source code, pressed Ctrl+F for lblOptimizePercent, and all I found was <div id="lblOptimizePercent" class="wod-dpsval"></div> without any content. What have I done wrong? Or is it some protection on the site's part against stealing content?
Normally, to scrape the page from PHP, you would have to
save the page
extract the value you want from HTML via a regular expression
alternatives include using SimpleXML for DOM querying...
The piece of HTML we are looking at is:
<div id="lblOptimizePercent" class="wod-dpsval">DATA</div>
<?php
$text = file_get_contents('http://www.askmrrobot.com/wow/optimize/eu/drak%27thul/Ecclesiastic');
// Non-greedy capture, so the match stops at the first closing </div>
$regexp = '~<div id="lblOptimizePercent" class="wod-dpsval">(.*?)</div>~';
if (preg_match($regexp, $text, $matches)) {
    $percentage = $matches[1];
    echo $percentage;
}
This should give you DATA - the percentage value. But this doesn't happen! Why:
The data is dynamically inserted by JavaScript on the client side.
The id or class selector is used for DOM querying (element selection), then the data value is added.
http://api.jquery.com/id-selector/ - http://api.jquery.com/class-selector/
jQuery example
On this site they deliver <div id="lblOptimizePercent" class="wod-dpsval"></div> to the client and then use an update call like this: $("#lblOptimizePercent").text("100%"); to update the percentage value.
If you want to query it on the client side, you might use $("#lblOptimizePercent").text();
Try this in your console. It returns the percentage value.
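To make the difference concrete, here is a small sketch that applies a pattern like the PHP regular expression above to two strings: the markup as the server delivers it, and the markup as it looks once the page's script has run. extractPercent is a hypothetical helper, and the two strings are taken from the question rather than fetched from the site.

```javascript
// Hypothetical helper mirroring the PHP regular expression above.
function extractPercent(html) {
  var match = /<div id="lblOptimizePercent" class="wod-dpsval">(.*?)<\/div>/.exec(html);
  return match ? match[1] : null;
}

// What the server delivers: the div is still empty.
var delivered = '<div id="lblOptimizePercent" class="wod-dpsval"></div>';
// What the DOM contains after the page's script has filled it in.
var rendered = '<div id="lblOptimizePercent" class="wod-dpsval">98.4%</div>';

console.log(JSON.stringify(extractPercent(delivered))); // ""
console.log(JSON.stringify(extractPercent(rendered)));  // "98.4%"
```

The pattern matches in both cases; the downloaded source simply has nothing between the tags yet.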
How to scrape this page?
If you want to scrape this page with dynamic data, you need something like a browser environment for scraping: PhantomJS or SlimerJS are your friends.
Open the page with PhantomJS, run the jQuery command from above, and you're done.
This snippet should get you pretty close. You might save it as scrape.js then execute it with Phantom.
var page = require('webpage').create();
page.open('http://www.askmrrobot.com/wow/optimize/eu/drak%27thul/Ecclesiastic', function() {
    page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function() {
        // evaluate() runs inside the page; return the value to the PhantomJS script
        var percentage = page.evaluate(function() {
            return $("#lblOptimizePercent").text();
        });
        console.log(percentage);
        phantom.exit();
    });
});
You can also save the "evaluated page" (now with data) and do the extract with PHP.
That's exactly like using Save Page in your browser and working on the saved HTML file.
In Firebug or other web developer tools you see the generated content; in the source code there is only an empty element.
At first the empty element is shown (while the site is rendering), and then the content is filled in using JS.
Crawlers that don't execute JavaScript can't see this JS-generated content, but that's no problem in this case.
Code:
document.getElementById('lblOptimizePercent').innerHTML = '94%';
Or similarly using jQuery:
$('#lblOptimizePercent').html('94%');
// need to load jQuery before, of course
I have a PHP page with a form/text area, and I do not want to use a button press to post the data to some other PHP script.
I am looking for some auto-post AJAX query that will put all the data the user has typed in the text box into a $variable in the same PHP file.
Something like this:
<?php
$checkbox.= '<input type="textbox" name="vehicle" value="" />';
echo "$checkbox<br>";
$return = $checkbox;
//$return should have all the data typed by the user
?>
This is not possible with PHP alone, as it is a server-side language, which means that all code is processed before the document is sent to the browser. Ajax and JavaScript, however, work on the client side and only come into play once the client has received the document.
Working with PHP asynchronously can be achieved, as you already guessed, by using Ajax. By sending the information to a PHP script and printing the result as an XML document, you can run whatever you want to process and read the result back on the original page using JavaScript.
But I suggest reading a tutorial on Ajax itself to get an idea of what to do, as this is a bit too much for a single Stack Overflow answer.
Here are some tutorials that might help you:
http://killerajax.com/
https://developer.mozilla.org/
I'd like to pass some data from PHP to JavaScript without JSON.
The reason is that I don't want the data to be readable by anyone who clicks on view page source.
So, I have PHP like
print('<script type="text/javascript">a = "aaa";</script>');
In my HTML code this will be
<script type="text/javascript">a = "aaa";</script>
I can remove this on the client side after loading the variable, for example with jQuery:
$('script[type="text/javascript"]').remove();
After that, the DOM will no longer have the script tag, but it will still have the variable a.
Later, if I type window.a into the console, it will be aaa.
But I do not want to show the <script type="text/javascript">a = "aaa";</script> in my HTML source code. Is it possible to pass the PHP variable directly to the DOM?
Thanks for the help.
JavaScript is a client-side language. Whatever you pass to it (by whatever means) will be readable by the end user.
Removing the script element from the DOM won't help, as "view source" shows the HTML code as it was during download. If that is what you are concerned about, you can fetch the variable via an AJAX call once the DOM has been loaded.
(But it is still readable by anyone who can read JavaScript and re-run the AJAX call, or who uses Firebug or Wireshark. It really only helps against a simple "view source".)
I have a page, say abc.html, that has a small form with some fields.
<form name="form" method="post" action="abc.html">.......................</form>
When we submit the form, it comes back to abc.html with some data posted and shows the resulting names on the page, produced by processing the posted data.
Throughout the whole procedure the page URL remains the same. Now I want to parse this abc.html containing the data shown after the form submission. I have parsed pages where the original URL contains all the data, but not one like this, where the data only appears on the page after submission. Please tell me how I can parse such a page.
Well, to get the correct HTML from the server, you have to send a POST request containing the form data. Then you can parse the server response.
Parsing the HTML file is the same as us seeing it. The HTML page rendered after posting the data will contain some HTML element in which the additional text is displayed. When you parse the page, check whether this element or its container exists; if so, read the rest of the data. The HTML page displayed without the posted data will not have this additional element or container.
Edit: Look at this question : PHP Screen Scraping and Sessions
First of all, your page should be abc.php; otherwise it will not parse any PHP.
Second, here is some code that will help you out (I hope). Copy and paste this example into abc.php:
<html>
<head></head>
<body>
<?php
if (isset($_POST['submit'])) {
echo 'you posted the following value: ' . htmlspecialchars($_POST['foo']);
}
?>
<form name="form" action="abc.php" method="post">
<input type="text" name="foo" value="" />
<input type="submit" name="submit" value="Press Me" />
</form>
</body>
</html>
If this is not the case and you want to parse HTML the way you would parse XML, you should use PHP's DOMDocument class:
$oDom = new DOMDocument();
$oDom->loadHTML($sHTMLstring);
// or
$oDom->loadHTMLFile($sFileName);
// now you can walk the dom like
$oFormNodes = $oDom->getElementsByTagName('form'); // returns a DOMNodeList
http://nl.php.net/manual/en/domdocument.loadhtml.php
http://nl.php.net/manual/en/domdocument.loadhtmlfile.php
http://nl.php.net/manual/en/domdocument.getelementsbytagname.php
Hope this helps
Good question, but I think it's not possible with PHP. My company does this with a very advanced tool written in C. It just grabs any page, submits any form, and gets the response HTML.
But maybe you can find some tools for it. I don't know.
I think the point here is that you can't just open the URL and read the HTML that comes back. You will have to play the part of the browser in order to interact with the server side form. To do this, you'll have to write your own code to HTTP POST the form input data. The HTTP response to your POST will contain the generated HTML, which you can then parse for the processed results.
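As a rough illustration of "playing the part of the browser", the sketch below builds the same kind of POST request a browser would send for a simple form. Everything named here is an assumption for illustration: buildFormPost and submitForm are hypothetical helpers, the field names are taken from the abc.php example above, and the URL in the usage comment is a placeholder. It relies on the global fetch found in current browsers and in Node 18 or later.

```javascript
// Hypothetical helper: encodes form fields the way a browser does for
// a form submitted with method="post".
function buildFormPost(fields) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams(fields).toString()
  };
}

// Hypothetical helper: posts the fields and resolves with the HTML that
// the server generates in response.
async function submitForm(url, fields) {
  var response = await fetch(url, buildFormPost(fields));
  return response.text();
}

// Usage (placeholder URL):
// submitForm("http://example.com/abc.php", { foo: "bar", submit: "Press Me" })
//   .then(function (html) { /* parse the processed results out of html */ });
```

The response body is the page as rendered after submission, which is the HTML you then parse.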
If you want to send the form to the web server (i.e. "fill" it first), you need something similar to Perl's WWW::Mechanize. See this question for possible solutions to do this. Afterwards, you need to parse the resulting page, and that depends heavily on the site in question: one site might use named elements you can easily retrieve using regular expressions, while a different site might not, making it much harder to get the values you're interested in.