The title says it all. I've tried using iframes, but that's not what I need. I have a website that echoes data (users on the site, etc.), and I want to be able to display that data on another website, using PHP to grab the text from the first one.
I don't want to use iframes, since they aren't a PHP solution and I don't want the actual link the data comes from to be visible; I want it all done on the back end.
Website 1 contains the information I want displayed on website 2. For website 2 to access the data, it needs to load http://example.com/page.php, which echoes all the information. I just want website 2 to echo/display that data in text format, if that makes sense.
$file = file_get_contents('http://www.address.com/something'); // Can be a local file too, e.g. something.php
echo $file;
Be careful: this method is not 100% safe unless you are 100% sure that the URL you are echoing is safe.
Hope it helps.
To "include" a page, you can use file_get_contents, which returns the file contents as a string. All you have to do is echo the returned string and you're good.
A few things though:
Relative URLs in the page you're scraping won't work too well; you'll have to convert them to absolute URLs.
It's rude to scrape someone's website. If you don't own it, ask for permission.
You said at the end of your question that you want it in text format. If you mean you want plain text rather than rendered HTML, you'll want to call htmlspecialchars on it.
You aren't always going to be certain that the page you're scraping is kosher. It could be hacked or otherwise violated, which would put your users at risk too.
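A minimal sketch of the whole idea, assuming allow_url_fopen is enabled; the URL is a placeholder for the real page:

<?php
// Fetch the remote page as a string (requires allow_url_fopen).
$html = file_get_contents('http://example.com/page.php');

if ($html === false) {
    die('Could not fetch the remote page.');
}

// To show the markup as plain text rather than rendered HTML,
// escape it first:
echo htmlspecialchars($html);

// Or, if you trust the source, output it as-is:
// echo $html;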
I'm working on a script for indexing and downloading a whole website from a user-submitted URL.
For example, when a user submits a domain like http://example.com, I copy all the links on the index page, download the pages behind those links, and then start again from the first one.
I do this part with cURL and regular expressions to download the pages and extract the links.
However, some shady websites generate endless fake URLs: for example, http://example.com?page=12 has links to http://example.com?page=12&id=10, http://example.com?page=13, and so on.
This creates a loop, and the script can never finish downloading the site.
Is there any way to detect these kinds of pages?
P.S.: I think Google, Yahoo, and some other search engines face this kind of problem too, but their databases are clean, and their search results don't show this kind of data.
Some pages may use GET variables and be perfectly valid (as you've mentioned here, ?page=12 and ?page=13 may both be acceptable). So what I believe you're actually looking for here is a way to identify a unique page.
It's not possible, however, to detect these straight from their URL. ?page=12 may point to exactly the same thing as ?page=12&id=1 does, or it may not. The only way to detect this is to download the page, compare the download to pages you've already got, and as a result find out whether it really is one you haven't seen yet. If you have seen it before, don't crawl its links.
Minor side note here: Make sure you block websites from a different domain, otherwise you may accidentally start crawling the whole web :)
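A rough sketch of that download-and-compare idea, using cURL and a hash of the normalized content to spot duplicates (all names here are illustrative):

<?php
$seen = array(); // content-hash => true, for pages already crawled

function fetch_page($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

function is_new_page($html, &$seen) {
    // Normalize whitespace so trivial differences don't defeat the check.
    $hash = md5(preg_replace('/\s+/', ' ', $html));
    if (isset($seen[$hash])) {
        return false; // same content as a page we already have
    }
    $seen[$hash] = true;
    return true;
}

$html = fetch_page('http://example.com?page=12&id=10');
if ($html !== false && is_new_page($html, $seen)) {
    // extract this page's links and queue them, skipping any whose
    // host (via parse_url) is on a different domain
}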
I'm developing a web application where an HTML page is created for the user. The page could include anything that the user puts in it. We take these pages and add a little PHP at the top to check some things before outputting the actual HTML. It would look kind of like this:
<?php
require 'checksomestuff.php';
// User's html below
?>
<html>
<!-- user's html -->
</html>
Is there a way to stop PHP from parsing anything after my require? I need the HTML to be output, but since the user can add anything they want to it, I don't want any user-added PHP to be executed. Obviously that would be a security issue. So I want the user's HTML to be output, but without any PHP in it being parsed. I would rather not have to put the user's HTML into another file.
One sensible way would be to offload the user-created content to another file; your main PHP file then loads that file and outputs it as-is, without parsing it as PHP.
There are many other ways to do this, but if creating another file does the job for you, then that's probably the best way forward.
UPDATE: I really must read the last line of the question!
You could store the HTML base64-encoded in a variable and simply print out the decoded string.
If you don't store the data in a .php file, but in, say, a .txt or .html file, the PHP inside it won't be evaluated.
Alternatively, you could read the file via file_get_contents() or some other means that doesn't involve evaluating PHP.
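For example, a minimal sketch (user_content.html is a made-up filename): because the file never goes through the PHP parser, any <?php tags inside it are printed rather than executed.

<?php
require 'checksomestuff.php';

// Emit the user's markup verbatim; it is never parsed as PHP.
readfile('user_content.html');
// or: echo file_get_contents('user_content.html');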
Though I'm still tempted to ask why you want to do this (particularly this way), it sounds to me like one of the only things that can help you is the special __halt_compiler() function...
That will prevent PHP from executing (or even parsing) the rest of the file, but it won't output it either; read the first (and currently only) example in the PHP manual for that function (linked above) for how to echo the remainder manually.
The only trouble I see with this method is that you'd probably have to have that code in every file you want to do this for, after your require.
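Here's a sketch of that pattern, adapted from the manual's example: the code before __halt_compiler() reads this very file from the halt offset and echoes the remainder, which the parser never sees.

<?php
require 'checksomestuff.php';

// Everything after the __halt_compiler() call below is raw data.
// Read it back from this file and output it verbatim.
$fp = fopen(__FILE__, 'r');
fseek($fp, __COMPILER_HALT_OFFSET__);
echo stream_get_contents($fp);
fclose($fp);
__halt_compiler();
<html>
<!-- user's html; a stray <?php tag in here is printed, not executed -->
</html>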
I'm trying to build something similar to the "Try on your site" feature on http://www.mywebpresenters.com/
I want to let users enter their URL; I then need to save the HTML of that URL/page to a MySQL database using PHP. I then need to insert a div containing more code and serve the whole lot back to the user.
I have done this using an iframe, but I'd like to do it better.
Can anyone shed light on this? Also, this will be used on a WordPress site, if that changes anything.
Thanks in advance,
Barry
If you want to save the whole HTML page in the database, you can use Smarty. It simply fetches an HTML file into a variable, like so:
$myvar = $smarty->fetch('htmlfile');
Now you can simply save $myvar into the SQL database.
Unless you are prepared to parse the HTML and remove the head/body tags of the scraped page, an iframe is the way to go. You could probably use a find/replace algorithm when you retrieve the stored HTML from the server to insert the containing code you need into the correct location.
The only way I could personally think of doing this is having them upload the page to your site, then doing a MySQL query in PHP to store it in the database, and then using PHP 5's DOMDocument model to take that same code and save it as a dynamic page.
Well, actually, you can do it two ways. You can also use the PHP cURL library to fetch the page at the given URL, then simply parse the HTML code and save it to the database as well.
To present it in a div, you'd use JavaScript and give the div a target="_top" attribute so window.open can load the content in that page.
You don't even need WordPress for this, to be honest. It can be done from scratch.
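A rough sketch of the cURL route (the saved_pages table, its columns, and the DSN/credentials are all made up; adjust them for your own setup):

<?php
$url = 'http://www.example.com/';

// Fetch the page's HTML.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

// Store it in MySQL with a prepared statement.
if ($html !== false) {
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $stmt = $pdo->prepare('INSERT INTO saved_pages (url, html) VALUES (?, ?)');
    $stmt->execute(array($url, $html));
}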
I want to dynamically load content from a soccer live-score website into my database.
I also want to do this daily, from a single page on that website (the soccer matches for that day).
If you can help me only with the connection and retrieval of data from that webpage, I will manage the rest.
website: http://soccerstand.com/
language: php/java - mysql
Thank you !
You can use PHP's file function to get the data. You just pass it a URL, and it returns the content as an array of lines from the file. You can also use file_get_contents to get the content as one big string.
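A quick illustration of both (remote URLs require allow_url_fopen):

<?php
$lines = file('http://soccerstand.com/');              // array of lines
$page  = file_get_contents('http://soccerstand.com/'); // one big string

if ($lines !== false && $page !== false) {
    echo count($lines) . " lines, " . strlen($page) . " bytes\n";
}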
Ethical questions about scraping other sites' data aside:
With PHP you can do an "open" call on a website as long as you're set up correctly. See this page for more details and examples: http://www.php.net/manual/en/wrappers.http.php
From there you have the content of the web page, and it's a matter of breaking it up. Off the top of my head, I'd use regular expressions or an HTML parser to break apart the HTML, then loop through the child elements and parse the data into your database calls to save it.
There are a lot of resources for parsing HTML on the web and it's simply a matter of choosing the one that will work best for you.
Keep in mind you'll need to monitor the site for changes: if they change elements or their classes/IDs, you might need to change your parsing structure as well.
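For instance, a hedged sketch using DOMDocument and XPath; the 'score-row' class is invented, so inspect the real page to find the elements that actually hold the scores:

<?php
$html = file_get_contents('http://soccerstand.com/');
if ($html === false) {
    die('Could not fetch the page.');
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
foreach ($xpath->query("//tr[contains(@class, 'score-row')]") as $row) {
    $text = trim($row->textContent);
    // build and run your INSERT from $text here
    echo $text . "\n";
}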
Using cURL you can get the content of the page; then, using regex, you can extract what you want.
There is an easy way: http://www.jonasjohn.de/lab/htmlsql.htm
I have a bunch of big .txt files (game walkthroughs) that I need to translate from English to French. My first instinct was to host them on a server and use a PHP script to automate the translation by doing a file_get_contents() and some URL manipulation to get the translated text. Something like:
http://translate.google.com/translate?hl=fr&sl=en&u=http://mysite.com/faq.txt
I found this poses two problems: 1) there are frames, and 2) the frame src values are relative (i.e. src="/translate_c?...."), so nothing loads.
Is there any way to fetch pages translated via Google in PHP (without using their AJAX API as it's really not suitable here)?
Use cURL to get the resulting page and then parse it.
Instead of using the regular translate URL, which has frames, use the src of the frame:
http://translate.googleusercontent.com/translate_c?hl=<INTERFACE LANGUAGE>&sl=<SOURCE LANGUAGE>&tl=<TARGET LANGUAGE>&u=http://<URL TO TRANSLATE>&rurl=translate.google.com&twu=1&usg=ALkJrhhxPIf2COh7LOgXGl4jZdEBNutZAg
For example to translate the page http://chaimchaikin.za.net/ from English to Afrikaans:
http://translate.googleusercontent.com/translate_c?hl=en&sl=en&tl=af&u=http://chaimchaikin.za.net/&rurl=translate.google.com&twu=1&usg=ALkJrhhxPIf2COh7LOgXGl4jZdEBNutZAg
This will open up only a "frameless" page of the translation.
You may need to experiment a bit to find the codes for the languages you require.
Also bear in mind that Google may add scripts to the translation (for example to show original text on hover).
EDIT: On examining the code, it appears there is a lot of JavaScript mixed in with the translation. You may need to find a way to get rid of it.
EDIT: Further examination shows that the final "usg=ALkJr..." parameter seems to change every time. Maybe first run a request on the regular Google Translate page (e.g. http://translate.google.com/translate?hl=fr&sl=en&u=http://mysite.com/faq.txt), then find and parse the "usg=..." part and use it for your next request on the "frameless" page (http://translate.googleusercontent.com/translate_c?...).
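A sketch of that two-step approach (untested; it assumes the framed page contains a translate_c link with a fresh usg token that a regex can pull out, which may break whenever Google changes the page):

<?php
function fetch_translated($url, $from = 'en', $to = 'fr') {
    // Step 1: request the regular, framed translate page.
    $framed = file_get_contents(
        'http://translate.google.com/translate?hl=' . $to
        . '&sl=' . $from . '&u=' . urlencode($url)
    );
    if ($framed === false) {
        return false;
    }

    // Step 2: pull out the frameless translate_c URL, usg token included.
    $pattern = '#http://translate\.googleusercontent\.com/translate_c\?[^"\']+#';
    if (!preg_match($pattern, $framed, $m)) {
        return false; // layout changed or the request was blocked
    }

    // Step 3: fetch the frameless page itself.
    return file_get_contents(html_entity_decode($m[0]));
}

echo fetch_translated('http://mysite.com/faq.txt');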