Fetch pages translated by Google? (PHP) - php

I have a bunch of big txt files (game walkthroughs) that I need translating from English to French. My first instinct was to host them on a server and use a PHP script to automate the translation process by doing a file_get_contents() and some URL manipulation to get the translated text. Something like:
http://translate.google.com/translate?hl=fr&sl=en&u=http://mysite.com/faq.txt
I found it poses two problems: 1) the result page uses frames, and 2) the frame src values are relative (i.e. src="/translate_c?....") so nothing loads.
Is there any way to fetch pages translated via Google in PHP (without using their AJAX API as it's really not suitable here)?

Use cURL to get the resulting page and then parse it.

Instead of using the regular translate URL which has frames, use the src of the frame:
http://translate.googleusercontent.com/translate_c?hl=<INTERFACE LANGUAGE>&sl=<SOURCE LANGUAGE>&tl=<TARGET LANGUAGE>&u=http://<URL TO TRANSLATE>&rurl=translate.google.com&twu=1&usg=ALkJrhhxPIf2COh7LOgXGl4jZdEBNutZAg
For example to translate the page http://chaimchaikin.za.net/ from English to Afrikaans:
http://translate.googleusercontent.com/translate_c?hl=en&sl=en&tl=af&u=http://chaimchaikin.za.net/&rurl=translate.google.com&twu=1&usg=ALkJrhhxPIf2COh7LOgXGl4jZdEBNutZAg
This will open up only a "frameless" page of the translation.
You may need to experiment to find the code for the language you need.
Also bear in mind that Google may add scripts to the translation (for example to show original text on hover).
EDIT: It appears, on examining the code, that there is a lot of JavaScript mixed in with the translation. You may need to find a way to get rid of it.
EDIT: Further examination shows that the end bit "usg=ALkJr..." seems to change every time. Maybe first run a request on the regular Google Translate page (e.g. http://translate.google.com/translate?hl=fr&sl=en&u=http://mysite.com/faq.txt), then find and parse the "usg=.." part and use it for your next request on the "frameless" page (http://translate.googleusercontent.com/translate_c?...).
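The two-step idea from the last edit could be sketched roughly like this. It is only a minimal sketch, assuming the framed page still contains a frame whose src starts with /translate_c? (as described in the question). The helper names fetchUrl, extractFrameSrc and stripScripts are made up for illustration, and the host used to resolve the relative src is an assumption based on the URLs above.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body, or null on failure.
function fetchUrl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Hypothetical helper: find the relative frame src (which carries the fresh
// usg=... token) in the framed page and turn it into an absolute URL.
// Assumption: the src looks like src="/translate_c?..." as quoted in the question.
function extractFrameSrc(string $html, string $host): ?string
{
    if (!preg_match('#src="(/translate_c\?[^"]+)"#i', $html, $m)) {
        return null;
    }
    // Attribute values are HTML-escaped, so &amp; has to become & again.
    return rtrim($host, '/') . html_entity_decode($m[1]);
}

// Hypothetical helper: drop the <script> blocks mixed into the translation.
function stripScripts(string $html): string
{
    return preg_replace('#<script\b[^>]*>.*?</script>#is', '', $html) ?? $html;
}

// Usage, with the URLs from the question:
// $framed = fetchUrl('http://translate.google.com/translate?hl=fr&sl=en&u=http://mysite.com/faq.txt');
// $src = $framed === null ? null : extractFrameSrc($framed, 'http://translate.googleusercontent.com');
// $translated = $src === null ? null : fetchUrl($src);
// echo $translated === null ? 'Translation failed' : stripScripts($translated);
```

A regex is enough here because only one attribute is needed; for cleaning up the translated page itself, an HTML parser would probably be more robust.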

Related

How do I find out which PHP function in WordPress generates a block of HTML using the browser inspector?

I'm developing a WordPress site, which I'm fairly new to. I'm not sure if this is a stupid question, but I haven't been able to find any decent Google results on this. Is there a way to find out which PHP function is generating a piece of HTML code using a browser code inspector like Chrome's? Thanks!
No.
Once the data arrives at the browser, all the PHP code has already been processed, so you can't tell which part of the PHP generated which part of the HTML.
No, not without modifying the PHP code to enable some kind of debugging. Chrome can only give you information about the HTML document received on the client side (you), whereas PHP code is executed on the server side.
You kind of can:
Download a copy of the theme and plugins folders.
Open the page on your site that you want to find the function for.
Find a div/class that is specific to that section, e.g. <article>.
Open a text editor like Notepad++ (one that lets you search through multiple files at once).
Use the find feature of your chosen text editor to search for the div/class.
The results will show you a list of files that contain that term.
Look through those files for the function you are looking for (it might take a few goes).
This is a bit of a roundabout way of doing it, but other than looking through each file separately, I think it is your next best option.
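If you would rather not depend on a particular editor, the same multi-file search can be done with a few lines of PHP run against the downloaded copy. This is just a sketch: findInThemeFiles is a made-up helper name, and the folder and search string in the usage note are placeholders.

```php
<?php
// Hypothetical helper: list every .php file under $dir that mentions $needle,
// mapped to the (1-based) line numbers where it occurs.
function findInThemeFiles(string $dir, string $needle): array
{
    $hits = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Only templates and plugin code are interesting here.
        if (!$file->isFile() || strtolower($file->getExtension()) !== 'php') {
            continue;
        }
        $lines = file($file->getPathname(), FILE_IGNORE_NEW_LINES);
        if ($lines === false) {
            continue; // unreadable file, skip it
        }
        foreach ($lines as $i => $line) {
            if (strpos($line, $needle) !== false) {
                $hits[$file->getPathname()][] = $i + 1;
            }
        }
    }
    ksort($hits);
    return $hits;
}

// Usage (placeholder folder and markup):
// print_r(findInThemeFiles('wp-content/themes/mytheme', '<article'));
```

The output only tells you which template files print the markup; you still have to read those files to find the function that is responsible.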

How to include another website in another using PHP

The title says it all. I've tried using iFrames, but that's not what I need. I have a website that echoes data (users on the site, etc.), and I want to display this data on another website, using PHP to grab the text from the first one.
I don't want to use iFrames, since I don't want it in PHP & I don't want the actual link where it's coming from to be shown; so I want it to be all done backend.
Website 1 contains the information I want displayed on website 2. To access the data, website 2 needs to load http://example.com/page.php, which echoes all the information. I just want website 2 to echo/display the data in text format, if that makes sense.
$file = file_get_contents('http://www.address.com/something'); // Can be locally too, for example: something.php
echo $file;
Be careful: this method is not safe unless you are 100% sure that the content at the URL you are echoing is safe.
Hope it helps.
To "include" a page, you can use file_get_contents, which returns the file contents as a string. All you have to do is echo the returned string and you're good.
A few things though:
Relative URLs in the page you're scraping won't work. You'll have to convert them to absolute URLs.
It's rude to scrape someone's website. If you don't own it, ask for permission.
You said at the end of your question that you want it in text format. If you mean you want it in plain text, not formatted HTML, you'll want to call htmlspecialchars on it.
You aren't always going to be certain that the page you're scraping is kosher. It could be hacked or otherwise violated, which would put your users at risk too.
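To make the plain-text and safety points concrete, here is a minimal sketch of escaping the fetched content before echoing it. The helper name remoteContentAsText is made up, and the strip_tags option is an addition of this sketch for the case where only the visible text is wanted. Note that file_get_contents only accepts URLs when allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: make fetched markup safe to echo as literal text.
function remoteContentAsText(string $html, bool $stripTags = false): string
{
    if ($stripTags) {
        // Assumption: only the visible text is wanted, so drop the tags first.
        $html = strip_tags($html);
    }
    // Escape what is left so the browser never interprets it as markup.
    return htmlspecialchars($html, ENT_QUOTES, 'UTF-8');
}

// Usage, with the URL from the question:
// $html = @file_get_contents('http://example.com/page.php');
// echo $html === false ? 'Could not load the data.' : remoteContentAsText($html);
```

Escaping also covers the last point in the list: even if the remote page is compromised, nothing it sends gets executed in your visitors' browsers.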

Creating my own audio captcha in PHP

I wanted to make a little audio captcha in PHP, so I need to convert text to speech, but I have two restrictions:
First, it should be a PHP solution. Creating an mp3/ogg would be fine, as it could be inserted and played with audio tags etc.
Second, I need to install it on a server using only FTP access, so I can't rely on standard applications that PHP would call.
I have already investigated some solutions:
jQuery's Jtalk can read text aloud, but it's impractical here because JavaScript is always visible to the client, so the captcha would be in plain sight in the source code.
Google has an API to speak text aloud, too. However, you need to make a call to an external file with the text as part of the URL, so listening to the outgoing requests would reveal the captcha, too.
I tried to combine my own audio files using PHP. I have read in some posts here that many players support a simple echo file_get_contents('audio1.ogg').file_get_contents('audio2.ogg'); solution. However, using the plugin in Firefox, only the first file is played. Downloading the result and playing it in VLC plays both audio files. I'm also not really happy with this approach even if it worked, as one could just associate each ogg source with its letter and recognise the captcha by slicing the audio data...
I also thought of loading all letters in audio tags and playing them as needed, but that would again reveal the captcha in the page's source code.
Lastly, I heard of "flite", which promised to be able to do all these things, but I think I was mistaken: it needs to be installed directly on the server rather than just putting a few files on it via FTP.
So, does anybody know how to build a text-to-speech solution with only FTP access and without contacting other websites with the text as part of the URL?
Regards,
Julian
So, I have come up with a solution combining JavaScript and PHP that I am happy with and that could be modified for additional security (such as adding noise or putting something other than a single letter in each sound file).
It works like this: you set up a sounds folder, protected via .htaccess, so that only a captcha.php script can read the files. There is one file per letter you want to play.
The script can also access the captcha via the session, a database or a protected file, and keeps a pointer to the position that is currently being read. Every time it is requested, it returns the audio of the next letter. This could be done with e.g.
echo file_get_contents('sounds/'.$_SESSION["curaudio"].'.ogg');
Then you only need to insert the audio-element into your html:
<audio hidden id="Sound_captcha">
Your browser does not support the audio element.
</audio>
Then use JavaScript to switch to the next letter. To do that, set the src attribute of the audio element to the address of your captcha.php file. Remember to add a changing value to prevent caching:
"captcha.php?"+(new Date()).getTime()
You can call the play()-function of the audio-element to play the file.
Switching to the next letter requires either waiting a fixed amount of time per file (very insecure) or using the ended event of the audio element.
Of course, your PHP script should also signal when the captcha has been read completely. For example, a second script queried via an AJAX request could report this, the sound script could return audio only on every odd access and a status otherwise, or the script could tell you at the beginning how many reloads you need.
That is all you need for a basic player, which would still have to be hardened against easy bot access. However, in my opinion this is at least as secure as a standard text captcha and removes a great barrier for people with impaired vision.
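The server side of this could be sketched as follows. It is only an illustration under a few assumptions: the helper name nextCaptchaSound and the session keys "captcha" and "pos" are made up, and answering with an empty 204 response once all letters have been served is just one of the possible "finished" signals mentioned above.

```php
<?php
// Hypothetical helper: return the path of the sound file for the next captcha
// character and advance the pointer, or null once every character was served.
// $session stands in for $_SESSION so the logic can be tried without a session.
function nextCaptchaSound(array &$session, string $soundDir): ?string
{
    $captcha = $session['captcha'] ?? '';
    $pos = $session['pos'] ?? 0;
    if ($pos >= strlen($captcha)) {
        return null; // nothing left to read
    }
    $char = strtolower($captcha[$pos]);
    if (!preg_match('/^[a-z0-9]$/', $char)) {
        return null; // never build a file path from unexpected characters
    }
    $session['pos'] = $pos + 1;
    return rtrim($soundDir, '/') . '/' . $char . '.ogg';
}

// Usage inside captcha.php:
// session_start();
// $path = nextCaptchaSound($_SESSION, __DIR__ . '/sounds');
// if ($path === null || !is_file($path)) {
//     http_response_code(204); // assumption: empty response means "captcha finished"
//     exit;
// }
// header('Content-Type: audio/ogg');
// header('Cache-Control: no-store');
// readfile($path);
```

Keeping the pointer in the session means the client never learns which letter it is requesting; it only ever asks for "the next one".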

How can I save content from another website to my database?

I want to dynamically load content from a soccer live score website into my database.
I also want to do this daily, from a single page on that website (the soccer matches for that day).
If you can help me only with the connection and retrieval of data from that webpage, I will manage the rest.
website: http://soccerstand.com/
language: php/java - mysql
Thank you !
You can use PHP's file() function to get the data. You just pass it a URL and it returns the content as an array of lines. You can also use file_get_contents() to get the content as one big string.
Ethical questions about scraping other site's data aside:
With PHP you can do an "open" call on a website as long as you're set up correctly. See this page for more details and examples: http://www.php.net/manual/en/wrappers.http.php
From there you have the content of the web page and it's a matter of breaking it up. Off the top of my head, I'd use regular expressions or an HTML parser to break apart the HTML, and then loop through the child elements and parse the data into your database calls to save the data.
There are a lot of resources for parsing HTML on the web and it's simply a matter of choosing the one that will work best for you.
Keep in mind you'll need to monitor the site for changes, because if they change elements, or their classes/ids you might need to change your parsing structure as well.
Use cURL to get the content of the page, then use a regex to extract what you want.
There is an easy way: http://www.jonasjohn.de/lab/htmlsql.htm
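As a rough illustration of the fetch-then-parse approach suggested in these answers, here is a sketch using PHP's built-in DOM extension instead of regular expressions. The markup it looks for (rows with class "match" and cells with classes "home", "away" and "score") is invented for the example and will not match the real site; the function name parseMatches, the table name and the credentials in the usage note are placeholders too.

```php
<?php
// Hypothetical helper: pull match rows out of a scores page.
// Assumption: each match is a <tr class="match"> with "home", "away" and
// "score" cells. The real page will use different markup.
function parseMatches(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $matches = [];
    foreach ($xpath->query('//tr[@class="match"]') as $row) {
        // Read the text of the first cell with the given class inside this row.
        $cell = function (string $class) use ($xpath, $row): string {
            $node = $xpath->query('.//td[@class="' . $class . '"]', $row)->item(0);
            return $node === null ? '' : trim($node->textContent);
        };
        $matches[] = ['home' => $cell('home'), 'away' => $cell('away'), 'score' => $cell('score')];
    }
    return $matches;
}

// Usage (placeholder database, table and credentials):
// $html = file_get_contents('http://soccerstand.com/');
// $pdo = new PDO('mysql:host=localhost;dbname=scores;charset=utf8mb4', 'user', 'secret');
// $stmt = $pdo->prepare('INSERT INTO matches (home, away, score) VALUES (?, ?, ?)');
// foreach (parseMatches($html === false ? '' : $html) as $m) {
//     $stmt->execute([$m['home'], $m['away'], $m['score']]);
// }
```

Run it once a day from cron, and expect to adjust the XPath expressions whenever the site changes its markup.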

Translate a webpage in PHP

I'm looking to translate a webpage in PHP 5 so I can save the translation and make it easily accessible via mydomain.com/lang/fr/category/article.html rather than users having to go through Google Translate.
I've found various easy ways to translate text via cURL; however, what I'd really like to do is translate an entire webpage while ignoring the tags.
The problem is that Google Translate messes up all the HTML tags, class names etc.
Does anyone know of a PHP class that can translate an entire webpage whilst ignoring the tags?
I'm guessing it may be possible via advanced regular expressions or something like that, but I'm not sure.
I can't just cURL Google's response as I'll have all the extra JS that they put in.
Any ideas?
I know it's not quite what you asked for, but a much simpler alternative would be to include the free Google Translate widget on all your pages. That way visitors select the language they would like to view the site in and Google dynamically does the rest (and persists their selection throughout the site). You then don't need to worry about creating and keeping updated dozens of different HTML files for every page, each with its own set of internal links (which, frankly, sounds like a nightmare to maintain).
