I need to go to http://butlercountyclerk.org/bcc-11112005/ForeclosureSearch.aspx, enter data in the fields, then click the button to get results. When taken to the result page, I'm given a table of data but it's paginated into 5 different pages.
I'm able to do the above using cURL, but it's at this point that I get stuck.
Once I'm on the result page, I need to click the "date" header twice to make the data order by decreasing date, then skim off the current day's results.
Any idea how to do this, advanced detail or in concept? Either way should help.
Thanks!
The problem is that the click is actually performing a postback using javascript, with the limitations of PHP and cURL you will need to inspect the HTTP headers (GET, POST and COOKIES) being sent by the browser, and emulate them. Taking in mind that some values might be session dependent. Right now I don't have time to do this for you but I know it can be quite tricky with ASP.Net websites in some cases. There might be easier ways to do it, but that's what it will always come down to, because that's what happens.
If you weren't tied to PHP a whole world of options open - for example, the aggregator in the project I'm working on is actually capable of executing (controlled) javascript specifically for these kinds of tasks/pages (albeit on a grander scale).
I can't get a working set of results - if you could post some dummy data that gives results, that would help.
As a generic answer, you need something that can manipulate the DOM. You can go server-side with something like PHP and Webdriver, or purely client-side with Selenium. Simulate the click, get the resulting HTML and parse that.
This should work. try this.
$url ='http://butlercountyclerk.org/bcc-11112005/ForeclosureSearch.aspx';
## do curl , with cookies enabled.
## after do this.
$url =$url.'?'.'__EVENTTARGET=Search%3AdgSearch%3A_ctl2%3A_ctl1&__EVENTARGUMENT=&__VIEWSTATE=dDwtMjk2Mjk5NzczO3Q8O2w8aTwxPjs%2BO2w8dDw7bDxpPDE%2BOz47bDx0PDtsPGk8Mz47aTwxNz47aTwxOT47PjtsPHQ8dDw7cDxsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0PjtpPDU%2BOz47bDxwPDIwMDY7MjAwNj47cDwyMDA3OzIwMDc%2BO3A8MjAwODsyMDA4PjtwPDIwMDk7MjAwOT47cDwyMDEwOzIwMTA%2BO3A8MjAxMTsyMDExPjs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VmlzaWJsZTs%2BO2w8bzx0Pjs%2BPjs%2BOzs%2BO3Q8QDA8cDxwPGw8Q3VycmVudFBhZ2VJbmRleDtQYWdlQ291bnQ7XyFJdGVtQ291bnQ7XyFEYXRhU291cmNlSXRlbUNvdW50O0RhdGFLZXlzOz47bDxpPDA%2BO2k8ND47aTwxMD47aTw0MD47bDw%2BOz4%2BOz47Ozs7Ozs7Ozs7PjtsPGk8MD47PjtsPHQ8O2w8aTwyPjtpPDM%2BO2k8ND47aTw1PjtpPDY%2BO2k8Nz47aTw4PjtpPDk%2BO2k8MTA%2BO2k8MTE%2BOz47bDx0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQzNjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8xNjE3NzE0OSAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS8zLzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFNVTlRSVVNUIE1PUlRHQUdFIElOQzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8TkFUSEFOSUVMIEdBQkJBUkQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDEzNTQgVkFOREVSVkVFUiBBVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47dDw7bDxpPDA%2BO2k8MT47aTwyPjtpPDM%2BO2k8ND47PjtsPHQ8O2w8aTwwPjs%2BO2w8dDxwPHA8bDxUZXh0O05hdmlnYXRlVXJsOz47bDxDViAyMDExIDA1IDE0MTU7aHR0cDovL3d3dy5idXRsZXJjb3VudHljbGVyay5vcmcvcGEvcGEudXJkL3BhbXcyMDAwLW9fY2FzZV9zdW0%2FMTk2MzQ4ODUgICAgICAgICAgICA7Pj47Pjs7Pjs%2BPjt0PHA8cDxsPFRleHQ7PjtsPDUvMi8yMDExOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxUSElSRCBGRURFUkFMIFNBVklOR1MgQU5EIExPQU4gQVNTTiBPRiBDTEVWRUxBTkQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEdBWUxFIE5BU0g7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDg5NDEgQ09YIFJEIFdFU1QgQ0hFU1RFUiwgT0ggNDUwNjk7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTUwMztodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8yMjY1MTYxMiAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS85LzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFUgUyBCQU5LIE4gQTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8TE9VSVMgTUlSTUFOOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDw2OTkxIEdBUlkgTEVFIERSIFdFU1QgQ0hFU1RFUiwgT0ggNDUwNjk7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQ5MjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8yMzk3NTc5MiAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS82LzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEZJRlRIIFRISVJEIE1PUlRHQUdFIENPOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxSQVlNT05EIFNURUlOOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDwyMzU5IFRIUlVTSCBBVkUgRkFJUkZJRUxELCBPSCA0NTAxNDs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDM4O2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI0NzgyOTYzICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzMvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8V0VMTFMgRkFSR08gQkFOSyBOIEE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEpBTkVUIEJPRUhNOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDw4NjA4IEdPTERGSU5DSCBXQVkgV0VTVCBDSEVTVEVSLCBPSCA0NTA2OTs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDQwO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI1NTkwMjAzICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzQvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8RklGVEggVEhJUkQgQkFOSzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8VEhFT0RPUkUgQ09PSzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8UE8gQk9YIDE3MTEgV0VTVCBDSEVTVEVSLCBPSCA0NTA3MTs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDkwO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI2ODY3MDkxICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzYvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8Q0lUSUZJTkFOQ0lBTCBJTkM7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPERPTk5BIE1BUkRJUzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NjU0OSBDQU5BU1RPVEEgRFJJVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47dDw7bDxpPDA%2BO2k8MT47aTwyPjtpPDM%2BO2k8ND47PjtsPHQ8O2w8aTwwPjs%2BO2w8dDxwPHA8bDxUZXh0O05hdmlnYXRlVXJsOz47bDxDViAyMDExIDA1IDE0Njg7aHR0cDovL3d3dy5idXRsZXJjb3VudHljbGVyay5vcmcvcGEvcGEudXJkL3BhbXcyMDAwLW9fY2FzZV9zdW0%2FMjk4NzU2MDIgICAgICAgICAgICA7Pj47Pjs7Pjs%2BPjt0PHA8cDxsPFRleHQ7PjtsPDUvNS8yMDExOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxDSVRJTU9SVEdBR0UgSU5DOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxNQVRUSEVXIEJMVU5ERUxMOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDwxNDEyIEhFTE1BIEFWRSBIQU1JTFRPTiwgT0ggNDUwMTM7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQzMjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8zMjI0MzYxNyAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS8zLzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFdFTExTIEZBUkdPIEJBTksgTiBBOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxKT0hOIEJPV01BTjs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8Jm5ic3BcOzs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDYzO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzQyMjcwMTE5ICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzQvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8VSBTIEJBTksgTkFUSU9OQUwgQVNTT0NJQVRJT047Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEJSWUFOIFNDSE1JRFQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDI4OTUgV0VFUElORyBXSUxMT1cgRFJJVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47Pj47Pj47Pj47Pj47Pj47PtVTse1TdIXrxq%2FXrY%2Fp22QQ7pAh&Search%3AddlMonth=5&Search%3AddlYear=2011&Search%3AtxtCompanyName=&Search%3AtxtLastName=&Search%3AtxtCaseNumber=';
## DO curl with cookies on again
Related
I recently created an image that automatically changes depending on the time thanks to a PHP script. I'm now thinking about doing something but I'm not sure if it's possible.
I do have restrictions. I need this to work on a forum board so it means I have to have all scripting on a different server. I would Google how to do this but I'm not sure what to search hence the broad title. If someone could possibly tell me if it's possible and show a small example to get me on the right track, that'd be appreciated.
What I need to do is print text out onto the page. As I stated above, all the scripting needs to be on a different server as the forum doesn't allow for php and only basic HTML (similar to here). This means I can't use include 'file.php';.
IMHO You have two options
Use HTML iframe element to embeed the external content (just give the external URL and the browser will handle the rest)
Call ajax request from javascript and inject the result into the DOM tree of the board.
Now that Your decision what suits You more.
I got a theoretic question.
If I use a form with GET method that is leading for an external PHP file (test.php),
I suppose anyone can find out what would be the result simply by viewing the source page, getting the variables (e.g., action="test.php" name="do" value="hello"), and then typing the URL with these variables:
....test.php?do=hello
I mean, he wouldn't have to actually click the button on the original page in order to find out what happens.
However, is there anyway to know what would be the result of a POST method button, without clicking it?
Your question has two possible meanings.
One is discover what the page does, what is the result of processing. That can be found by almost anybody with enough knowledge and tools to send a post request. There are a bunch of tools that allow you to do that. You can do it with plugins for your browser, security analysing tools like webscarab, programming languages using cURL, etc.
The second meaning is determining how the result was achieved. That, is not possible to know unless the source code of the processing file is accessed and analysed.
For a system I am developing I need to programmatically go to a specific page. Fill out one field in the form (I know the id and name of the input element), submit it and store the results.
I have seen a few different Perl, python and java classes that do this. However I would like to do this using PHP and havent found anything as of yet.
I do have the permission to do this from the site i am getting the information from as well.
Any help is appreciated
Take a look at David Walsh's simple explanation.
http://davidwalsh.name/curl-post
You can easily store the response (in this example, $result) in your database or logfile.
Usually PHP crawlers/scrapers use CURL - http://php.net/manual/en/book.curl.php.
It allows you to make a query from the server where PHP runs and get response from the website that you need to crawl. It returns response data in plain format and parsing it is up to you. You can manually check what does the form submit when you do it manually, and do the same thing via curl.
You also may try phpcrawl (http://phpcrawl.cuab.de), seems to fit all your needs.
(See "addPostData()"-method)
i'm trying to grab the content of a website based on ajax and https but with no luck.
Is this possible.
The website i'm trying to crawl is this:
https://www.bet3000.com/en/html/home.html#!https://www.bet3000.com/html/en/eventssportsbook.html?category_id=2117
Thanks
If you take a look at the HTTP requests that this page is doing (using, for example, Firebug for Firefox), you'll notice it makes several Ajax requests.
Instead of trying to execute the Javascript code, a possible solution could be for you to request one of those URLs, and get the data -- you'd also not have to parse the HTML, this way.
In this specific case, one of those requests is made to the following URL :
https://www.bet3000.com/ajax/en/sportsbook.json.html?category_id=2117&offset=&live=&sportsbook_id=0
This URL seems to return some JSON data, that should interest you quite a bit ;-)
(There is a few characters before and after the JSON, that will need to be removed, but, asides from that, I don't see anything that doesn't look good.)
I'm looking for a way to extract some information from this site via PHP:
http://www.mycitydeal.co.uk/deals/london
There ist a counter where the time left is displayed, but the information is within the JavaScript. Since I'm really a JavaScript rookie, I didn't really know how to get the information.
Normally I would extract the information with "preg_match" and some regular expressions. Can someone help me to extract the information (Hrs., Min., Sec.) ?
Jennifer
Extracting the count-down time is not going to be easy, because it is fetched and set purely using JavaScript, which cannot be parsed using pure PHP. You would have to de-code the JavaScript code and see what calls it makes to fetch the initial times.
That is not an easy process, and could be changed by the site owners in no time.
Also, doing that, you would be in clear breach of their T&C:
For the avoidance of doubt, scraping of the Website (and hacking of the Website) is not allowed.
I hate to say "no", but in this situation PHP is not the right job for this. JavaScript requires a browser to run (in this case) and on top of that you probably have a jQuery lib.
The only thing PHP could do is invoke a browser that would contain some JavaScript (i.e., GreaseMonkey) that could try and scrape the page for the info. But this is really a job for embedded JavaScript.
As the others have said you can usually not access JavaScript stuff from PHP. However JavaScript has to get its data from somewhere, and this is where to start.
I found this in the source code:
<input type="hidden" id="currentTimeLeft" value="3749960"/>
That's the number of microsecond until whatever it is.
However this was only present in firefox, not when fetching it with wget. I found out it's the cookie that matters, so you'd have to request the page once, store the cookies and then access it a second time.