Good day, everyone!
I am trying to append a script to a remote page (a form page that is not mine) that would hide some specific parts of its content before it is shown. I am using cURL, but so far the only thing I can do is retrieve its HTML.
Is there any way to do what I want?
I'm assuming that the user asks your server for content, and your server needs to fetch that content on another server and process it before sending it back to the user.
Query the other script using CURL, then run your script on that HTML to remove the pieces that you don't want to keep (I hope for your sake that they are reasonably easy to find and eliminate), and finally output the resulting HTML to the user.
To remove part of the HTML, you could use preg_replace() with a regular expression.
Searching for an online regex tester might help if you have no experience with regular expressions.
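The fetch-then-strip approach described above could look roughly like the sketch below. The function names and the `promo` id are made up for illustration, and the regular expression only copes with a simple, non-nested div, so treat it as a starting point rather than a general solution.

```php
<?php
// Fetch a remote page with cURL. Returns the HTML, or null on failure.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $html = curl_exec($ch);
    return $html === false ? null : $html;
}

// Remove every <div id="$id">...</div> block (assumes the div is not nested).
function removeDivById(string $html, string $id): string
{
    $pattern = '~<div\b[^>]*\bid="' . preg_quote($id, '~') . '"[^>]*>.*?</div>~is';
    return preg_replace($pattern, '', $html) ?? $html;
}

// Offline demonstration on a literal string; in practice pass fetchHtml($url).
$sample = '<form><div id="promo">Hide me</div><input name="q"></form>';
echo removeDivById($sample, 'promo'), PHP_EOL;
```

If the parts to hide are nested or vary a lot, a DOM parser is usually more robust than a regular expression.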
Related
For a system I am developing I need to programmatically go to a specific page. Fill out one field in the form (I know the id and name of the input element), submit it and store the results.
I have seen a few different Perl, Python and Java classes that do this. However, I would like to do this using PHP and haven't found anything yet.
I also have permission to do this from the site I am getting the information from.
Any help is appreciated
Take a look at David Walsh's simple explanation.
http://davidwalsh.name/curl-post
You can easily store the response (in this example, $result) in your database or logfile.
Usually PHP crawlers/scrapers use CURL - http://php.net/manual/en/book.curl.php.
It allows you to make a request from the server where PHP runs and get the response from the website that you need to crawl. It returns the response data as plain text, and parsing it is up to you. You can check what the form submits when you fill it in manually in a browser, and then send the same data via cURL.
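A minimal sketch of submitting one form field with cURL, as described above. The URL, the field name `query` and the function names are placeholders, not taken from any real site.

```php
<?php
// Encode form fields the way a browser does for a normal form POST.
function buildFormBody(array $fields): string
{
    return http_build_query($fields, '', '&');
}

// POST the fields to $url and return the response body, or null on failure.
function submitForm(string $url, array $fields): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => buildFormBody($fields),
        CURLOPT_RETURNTRANSFER => true, // return the response instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 15,
    ]);
    $response = curl_exec($ch);
    return $response === false ? null : $response;
}

// Hypothetical usage (placeholder URL and field name):
// $result = submitForm('https://example.com/search.php', ['query' => 'php curl']);
// file_put_contents('result.html', $result ?? '');

// Offline check of the encoded request body:
echo buildFormBody(['query' => 'php curl', 'page' => 2]), PHP_EOL;
```

The returned string can then be stored in a database or log file, or parsed further.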
You also may try phpcrawl (http://phpcrawl.cuab.de), seems to fit all your needs.
(See "addPostData()"-method)
I am trying to get data from a site and be able to manipulate it to display it on my own site.
The site contains a table with ticks and is updated every few hours.
Here's an example: http://www.astynomia.gr/traffic-athens.php
This data is there for everyone to use, and I will mention them on my own site just to be sure.
I've read something about php's cURL but I have no idea if this is the way to go.
Any pointers/tutorials, or code anyone could provide so I can start somewhere would be very helpful.
Also any pointers on how I can get informed as soon as the site is updated?
If you want to crawl the page, use something like Simple HTML DOM Parser for PHP. That'll serve your purpose.
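As a rough illustration of pulling rows out of a table, here is a sketch that uses PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than Simple HTML DOM Parser. The sample table is invented; the real page's markup will differ and the XPath would need adjusting.

```php
<?php
// Return the text of every <td> cell, grouped by table row.
function tableRows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells !== []) { // skip header rows that only contain <th>
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Invented sample data, standing in for the fetched page.
$sample = '<table><tr><th>Road</th><th>Status</th></tr>'
        . '<tr><td>Kifisias</td><td>heavy</td></tr>'
        . '<tr><td>Syngrou</td><td>normal</td></tr></table>';
print_r(tableRows($sample));
```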
First, your web host/localhost should have the php_curl extension enabled.
To start with, you should read a bit here. If you want to jump in directly, there is a simple function in "Why I can't get website content using CURL". You just have to change the values of the variables $url and $timeout.
Lastly, to get the updated data every 2hrs you will have to run the script as a cronjob. Please refer to this post
PHP - good cronjob/crontab/cron tutorial or book
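One way a cron-driven script could tell whether the page has changed since the last run is to compare a hash of the content with the hash saved previously. This is only a sketch; the function name, the state file and the crontab path in the comment are assumptions.

```php
<?php
// Example crontab entry (hypothetical path), running every two hours:
//   0 */2 * * * php /path/to/check.php

// Returns true when $content differs from what was seen on the previous run,
// and records the new fingerprint in $stateFile.
function hasChanged(string $content, string $stateFile): bool
{
    $newHash = hash('sha256', $content);
    $oldHash = is_file($stateFile) ? trim((string) file_get_contents($stateFile)) : '';
    if ($newHash === $oldHash) {
        return false;
    }
    file_put_contents($stateFile, $newHash);
    return true;
}

// Offline demonstration with literal strings in place of the fetched page.
$stateFile = tempnam(sys_get_temp_dir(), 'page');
var_dump(hasChanged('<table>v1</table>', $stateFile)); // first run: changed
var_dump(hasChanged('<table>v1</table>', $stateFile)); // same content: unchanged
var_dump(hasChanged('<table>v2</table>', $stateFile)); // new content: changed
unlink($stateFile);
```

When the function returns true, the script could send an email or refresh the local copy of the data.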
Is there a way I can pass a PHP variable from a PHP page to a JQuery page and back to a PHP page?
Maybe I'm wrong but I believe you may be confusing things.
Server Generates page using PHP.
Generated page is composed of HTML, JAVASCRIPT whatever...
Client receives page
Received page is interpreted (JavaScript code is run)
In the end, what you are asking for is possible but, from the way you put the question, I thought that the above should be clarified.
How to do it?
Say you want to pass id=123 from server to client and then back.
generate page with a tag, say <span id="js-val">123</span>
have client read the contents of id="js-val"
client can then resend the 123 using POST or GET that really depends on what you want.
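The three steps above can be sketched in PHP as follows. The script name `receive.php` and both function names are hypothetical, and the embedded script uses the browser's plain `fetch` rather than jQuery to keep the example short.

```php
<?php
// Step 1: the server embeds the value in the generated page.
function renderPage(int $id): string
{
    $safeId = htmlspecialchars((string) $id, ENT_QUOTES, 'UTF-8');
    return <<<HTML
<span id="js-val">{$safeId}</span>
<script>
  // Step 2: the client reads the embedded value.
  var id = document.getElementById('js-val').textContent;
  // Step 3: the client sends it back (receive.php is a placeholder name).
  fetch('receive.php', {
    method: 'POST',
    headers: {'Content-Type': 'application/x-www-form-urlencoded'},
    body: 'id=' + encodeURIComponent(id)
  });
</script>
HTML;
}

// On the receiving page: validate the value that came back from the client.
function readReturnedId(array $post): ?int
{
    $value = filter_var($post['id'] ?? null, FILTER_VALIDATE_INT);
    return $value === false ? null : $value;
}

echo renderPage(123), PHP_EOL;
var_dump(readReturnedId(['id' => '123'])); // in a real script, pass $_POST
```

Anything that comes back from the client should be validated again, since the user can change it.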
Hope it helps to clear things up.
I want to dynamically load content from a soccer live-score website into my database.
I also want to do this daily, from a single page on that website (the soccer matches for that day).
If you can help me only with the connection and retrieval of data from that webpage, I will manage the rest.
website: http://soccerstand.com/
language: php/java - mysql
Thank you !
You can use php's file function to get the data. You just pass it a URL and it returns the content as an array of lines from the file. You can also use file_get_contents to get the content as one big string.
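A small sketch of both calls. The wrapper names `readAll` and `readLines` are made up; reading a URL this way needs `allow_url_fopen` to be enabled, so the demonstration below uses a local temporary file, which the same functions accept.

```php
<?php
// Read a whole resource as one string, or null on failure.
// The timeout only applies when $source is an http(s) URL.
function readAll(string $source, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create(['http' => ['timeout' => $timeoutSeconds]]);
    $data = @file_get_contents($source, false, $context);
    return $data === false ? null : $data;
}

// Read a resource as an array of non-empty lines without trailing newlines.
function readLines(string $source): array
{
    $lines = @file($source, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines === false ? [] : $lines;
}

// Offline demonstration; with allow_url_fopen on, a URL can be passed instead.
$path = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($path, "<html>\n<body>score</body>\n</html>\n");
echo readAll($path);
print_r(readLines($path));
unlink($path);
```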
Ethical questions about scraping other site's data aside:
With PHP you can do an "open" call on a website as long as you're set up correctly. See this page for more details on that and examples: http://www.php.net/manual/en/wrappers.http.php
From there you have the content of the web page and it's a matter of breaking it up. Off the top of my head, I'd use regular expressions or an HTML parser to break apart the HTML, and then loop through the child elements and parse the data into your database calls to save the data.
There are a lot of resources for parsing HTML on the web and it's simply a matter of choosing the one that will work best for you.
Keep in mind you'll need to monitor the site for changes, because if they change elements, or their classes/ids you might need to change your parsing structure as well.
Using curl you will get the content of the page, then using regex you will get what you want.
There is an easy way: http://www.jonasjohn.de/lab/htmlsql.htm
I'm looking for a way to extract some information from this site via PHP:
http://www.mycitydeal.co.uk/deals/london
There is a counter where the time left is displayed, but the information is within the JavaScript. Since I'm really a JavaScript rookie, I don't know how to get at the information.
Normally I would extract the information with "preg_match" and some regular expressions. Can someone help me to extract the information (Hrs., Min., Sec.) ?
Jennifer
Extracting the countdown time is not going to be easy, because it is fetched and set purely by JavaScript, which PHP on its own cannot execute. You would have to read through the JavaScript code and see what calls it makes to fetch the initial times.
That is not an easy process, and could be changed by the site owners in no time.
Also, doing that, you would be in clear breach of their T&C:
For the avoidance of doubt, scraping of the Website (and hacking of the Website) is not allowed.
I hate to say "no", but in this situation PHP is not the right tool for the job. JavaScript requires a browser to run (in this case), and on top of that the page probably uses a jQuery library.
The only thing PHP could do is invoke a browser that would contain some JavaScript (i.e., GreaseMonkey) that could try and scrape the page for the info. But this is really a job for embedded JavaScript.
As the others have said you can usually not access JavaScript stuff from PHP. However JavaScript has to get its data from somewhere, and this is where to start.
I found this in the source code:
<input type="hidden" id="currentTimeLeft" value="3749960"/>
That's presumably the number of milliseconds until whatever it is (3749960 ms is a little over an hour; as microseconds it would be under four seconds).
However, this was only present in Firefox, not when fetching the page with wget. I found out it's the cookie that matters, so you'd have to request the page once, store the cookies and then access it a second time.
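The two-request idea could be sketched like this with cURL's cookie options. The function names are made up, and the code assumes the hidden field holds a millisecond count, which is a guess based on the sample value rather than something the site documents.

```php
<?php
// Fetch a page while persisting cookies in $cookieFile between calls.
function fetchWithCookies(string $url, string $cookieFile): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies received are written here
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies stored here are sent
        CURLOPT_TIMEOUT        => 15,
    ]);
    $html = curl_exec($ch);
    unset($ch); // releasing the handle flushes the cookie jar to disk
    return $html === false ? null : $html;
}

// Pull the numeric value out of the hidden currentTimeLeft input.
function parseTimeLeft(string $html): ?int
{
    if (preg_match('~id="currentTimeLeft"\s+value="(\d+)"~', $html, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Split a duration into [hours, minutes, seconds], assuming milliseconds.
function splitMilliseconds(int $ms): array
{
    $seconds = intdiv($ms, 1000);
    return [intdiv($seconds, 3600), intdiv($seconds % 3600, 60), $seconds % 60];
}

// Hypothetical usage: request twice so the second response carries the field.
// $jar = tempnam(sys_get_temp_dir(), 'cookies');
// fetchWithCookies($url, $jar);
// $html = fetchWithCookies($url, $jar);

// Offline demonstration with the input element quoted above.
$html = '<input type="hidden" id="currentTimeLeft" value="3749960"/>';
vprintf("%d h %d min %d s\n", splitMilliseconds(parseTimeLeft($html) ?? 0));
```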