I am struggling with encoding issues in a PHP app that:

1. Reads an XML file and parses it according to some rules.
2. Calls the Google Translate API and uses the result to populate a database that is later used to display data in the browser (that part works well).
3. Saves that data to an XML file (it saves, but there's something wrong with the encoding).
The data comes from Google Translate encoded in UTF-8, and in the browser, provided you send the proper header, it displays fine whatever the language is.
Here's the Google Translate function:
function mt($text, $lang) {
    // Note: $apiKey must be in scope here (defined globally or passed in)
    $url = 'https://www.googleapis.com/language/translate/v2?key=' . $apiKey . '&q=' . rawurlencode($text) . '&source=en&target=' . $lang;
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
    $response = curl_exec($handle);
    // The second argument must be true to get an associative array;
    // JSON_UNESCAPED_UNICODE is a json_encode() flag and doesn't belong here
    $responseDecoded = json_decode($response, true);
    $responseCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
    curl_close($handle);
    if ($responseCode != 200) {
        $resultxt = 'not200result';
    } else {
        $resultxt = $responseDecoded['data']['translations'][0]['translatedText'];
    }
    return $resultxt;
}
I'm using SimpleXML to load an XML file, modify its contents and save it with asXML().
The generated XML file is encoded in something other than UTF-8, as it looks like this:
<value>ようこそ%0 ST数学</value>
Here's the code that attributes the translation to the XML node and saves it.
$xml=simplexml_load_file('myfile.xml'); //Load source XML file
$xml->addAttribute('encoding', 'UTF-8');
$xmlFile = 'translation.xml'; //File that will be saved
//Here I have a call to the MT function above and get it to the XML file at face value.
$xml->asXML($xmlFile) //save translated XML file
I've tried using htmlentities() and played with utf8_encode() and utf8_decode(), but I can't make it work.
I've tried everything and looked at many other posts. For the life of me, I can't figure this one out. Any help is appreciated.
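For reference, here is a minimal sketch of the flow I'm after, written with DOMDocument instead, since I gather addAttribute() only sets an attribute on the root element rather than the declared encoding (the node path is made up; my real rules are more involved):

<?php
// Sketch only: load, translate nodes, save with an explicit UTF-8 declaration
$dom = new DOMDocument();
$dom->load('myfile.xml');
$dom->encoding = 'UTF-8'; // sets the encoding in the XML declaration itself

// Hypothetical node update using the mt() function above
foreach ($dom->getElementsByTagName('value') as $node) {
    $node->nodeValue = mt($node->nodeValue, 'ja');
}

$dom->save('translation.xml');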
Related
I have a PHP script that loads this webpage to extract some data from its tables.
The following methods failed to get its table contents:
Using file_get_contents:
$document = file_get_contents("http://www.webpage.com/");
print_r($document);
Using cURL:
$document = curl_init('http://www.webpage.com/');
curl_setopt($document, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($document);
print_r($html);
Using loadHTMLFile:
$document = new DOMDocument();
$document->loadHTMLFile('http://www.webpage.com/');
print_r($document);
I'm not an expert in PHP, and apart from the first method, the others are copied from Stack Overflow answers.
What am I doing wrong? And how do they block some content from loading?
Not the answer you're likely to want to hear, but none of the methods you describe will evaluate JavaScript and other browser resources as a normal browser client would. Instead, each of those methods retrieves the contents of only the file you've specified. A quick glance at the site you're targeting clearly shows this table in question being populated as the result of an AJAX call, which none of the methods you've tried are able to evaluate.
You'll need to lean on a library or script that has the capability for this type of emulation; namely laravel/dusk, the PHP bindings for Selenium webdriver, or something similar.
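For illustration, a minimal sketch with the php-webdriver bindings might look like this (this assumes a Selenium server is already running at localhost:4444 and that the package is installed via Composer; the target URL is the placeholder from the question):

<?php
require 'vendor/autoload.php';

use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Remote\DesiredCapabilities;

// Connect to a running Selenium server and open the page in a real browser
$driver = RemoteWebDriver::create('http://localhost:4444/wd/hub', DesiredCapabilities::chrome());
$driver->get('http://www.webpage.com/');

// Crude wait for the AJAX call to populate the table;
// WebDriverWait with an expected condition is the more robust option
sleep(5);

$html = $driver->getPageSource(); // HTML after JavaScript has run
$driver->quit();

print_r($html);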
This is what I did to scrape data from a webpage using php curl:
// Defining the basic cURL function
function curl($url) {
    $ch = curl_init(); // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch); // Closing cURL
    return $data; // Returning the data from the function
}
// Defining the basic scraping function
function scrape_between($data, $start, $end) {
    $data = stristr($data, $start); // Stripping all data from before $start
    $data = substr($data, strlen($start)); // Stripping $start
    $stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
    $data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
    return $data; // Returning the scraped data from the function
}
$target_url = "https://www.somesite.com";
$scraped_website = curl($target_url);
$data_set_1 = scrape_between($scraped_website, "%before%", "%after%");
$data_set_2 = scrape_between($scraped_website, "%before%", "%after%");
%before% and %after% are placeholders for data that always shows up on the webpage before and after the data you wish to grab. They could be div tags or some other HTML tags that are unique to the data you wish to grab.
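For example, with made-up markers (you'd substitute whatever uniquely brackets your table in the real page source):

// Hypothetical markers: grab the contents of a table with a known id
$table_html = scrape_between($scraped_website, '<table id="results">', '</table>');
print_r($table_html);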
So maybe look into using cURL to imitate the same AJAX request that the site itself uses? When I searched for that, this is what I found:
Mimicking an ajax call with Curl PHP
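Roughly, that idea looks like this (the endpoint and parameters below are placeholders; you'd copy the real ones from your browser's Network tab):

// Replay the site's own AJAX request instead of fetching the page shell
$ch = curl_init('http://www.webpage.com/ajax/table.php'); // hypothetical endpoint
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(['page' => 1])); // hypothetical params
curl_setopt($ch, CURLOPT_HTTPHEADER, ['X-Requested-With: XMLHttpRequest']);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);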
I have been searching and trying for hours and can't seem to find anything that actually solves my problem.
I'm calling a PHP function that grabs content using the Google Translate API, and I'm passing it a string to be translated.
There are quite a few instances where the encoding is affected but I've done this before and it worked fine as far as I can remember.
Here's the code that calls that function:
$name = utf8_encode(mt($name));
And here's the actual function:
function mt($text) {
    $apiKey = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';
    $url = 'https://www.googleapis.com/language/translate/v2?key=' . $apiKey . '&q=' . rawurlencode($text) . '&source=en&target=es';
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
    $response = curl_exec($handle);
    echo curl_error($handle);
    $responseDecoded = json_decode($response, true);
    $responseCode = curl_getinfo($handle, CURLINFO_HTTP_CODE); // Fetch the HTTP response code
    curl_close($handle);
    if ($responseCode != 200) {
        $resultxt = 'failed!';
        return $resultxt;
    } else {
        $resultxt = $responseDecoded['data']['translations'][0]['translatedText'];
        return utf8_decode($resultxt); // return($resultxt) won't work either
    }
}
What I end up getting is garbled characters for any accented character, like GuÃa del desarrollador de XML.
I've tried all combinations of encoding/decoding and I just can't get it to work...
I've had this kind of issue before. Here's what I can tell you to try:
In the <head> tag, try to add:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Try to add it in the PHP header:
header("Content-Type: text/html; charset=utf-8");
Check the encoding of your file, for example in Notepad++:
Encoding > UTF-8 without BOM
Setting charset in the .htaccess
AddDefaultCharset utf-8
As you said you are reading files from the users, you can use the function mb_convert_encoding() to convert to UTF-8, and mb_check_encoding() to verify the result. Try this:
$content = mb_convert_encoding($content, 'UTF-8');
if (mb_check_encoding($content, 'UTF-8')) {
    // log('Converted to UTF-8');
} else {
    // log('Could not convert to UTF-8');
}
return $content;
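One more thing worth testing in your specific case: the Translate API already returns UTF-8, so if my read is right, the utf8_encode() around the mt() call is re-encoding an already-UTF-8 string, which is exactly the kind of double conversion that produces output like GuÃa. A quick sanity check, sketched inside your else branch:

$resultxt = $responseDecoded['data']['translations'][0]['translatedText'];

// If this prints true, the string is valid UTF-8 as-is and needs no conversion
var_dump(mb_check_encoding($resultxt, 'UTF-8'));

// Re-encoding already-UTF-8 text is what produces the garbled output
echo utf8_encode($resultxt);

return $resultxt; // return it untouched and drop the conversions at the call site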
I'm trying to load an XML file from another website. I can do this with cURL using the following:
function getLatestPlayerXML($par1) {
    $url = "http://somewebsite/page.php?par1=" . $par1;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $xmlresponse = curl_exec($ch);
    curl_close($ch);
    $xml = simplexml_load_string($xmlresponse);
    $xml->asXML("./userxml/" . $par1 . ".xml"); // Cache a local copy
    return $xml;
}
This works well enough; however, the external website takes a long time to respond with the file, which is why I save the XML file to ./userxml/$par1.xml (that part also works). I load it like this:
function getLocalPlayerXML($par1) {
    $xml = simplexml_load_file("./userxml/" . $par1 . ".xml");
    if ($xml !== false) {
        // How can I make it so that when called it only temporarily uses this file until the latest is available?
        return $xml;
    } else {
        return getLatestPlayerXML($par1);
    }
}
The problem: I want a single load function that first tries to load the XML from the local file and, if that file exists, uses it until the latest version has been received, at which point the page updates. If the file does not exist, it should simply wait until the latest file has been retrieved and then use that. Is this even possible?
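Not a full solution, but a minimal sketch of one compromise: treat the cached file as good for some TTL and refetch once it goes stale (the 300 seconds is an arbitrary choice; a true "update the page when the fresh copy lands" would need an AJAX request from the browser or a background job):

function getPlayerXML($par1, $maxAge = 300) {
    $file = "./userxml/" . $par1 . ".xml";
    // Serve the cached copy while it is fresh enough
    if (file_exists($file) && (time() - filemtime($file)) < $maxAge) {
        return simplexml_load_file($file);
    }
    // Otherwise fetch the latest (which also re-saves the cache)
    return getLatestPlayerXML($par1);
}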
I am trying to get an XML document back from a web server that also supports PHP.
It's similar to what traditional web services do, but I want to achieve it in PHP. Is this even possible?
To be more specific about my needs:
I want to send an XML document as a request to the server, have PHP do some processing on it, and send me back an XML document as a response.
Thanks in advance.
Maybe you simply want http://php.net/SOAP ?
If not SOAP, then you can send your XML POST request and use $xml = file_get_contents('php://input'); to dump it to a variable that you can feed to http://php.net/DOM or other XML processors.
After processing, send header('Content-Type: text/xml'); (or application/xml) and output the modified XML document.
Super simple example of reading an XML request body:
$request = http_get_request_body(); // from pecl_http; file_get_contents('php://input') also works
if (!$request || strpos($request, '<?xml') !== 0) {
    // not XML; do something appropriate
} else {
    $response = new DOMDocument(); // easier to manipulate when *building* XML
    $requestData = new DOMDocument();
    $requestData->loadXML($request); // parse the raw request body (load() expects a file path)
    // process $requestData however and build the $response XML
    $responseString = $response->saveXML();
    header('HTTP/1.1 200 OK');
    header('Content-Type: application/xml');
    header('Content-Length: ' . strlen($responseString));
    print $responseString;
    exit(0);
}
Use cURL:
// Open connection
$ch = curl_init();
// Set the URL, mark the request as a POST, and attach the XML body
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $xmlString);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response rather than printing it
// Execute the POST
$result = curl_exec($ch);
curl_close($ch);
A remote site is supplying a data structure in a js file.
I can include this file in my page to access the data and display it in my page.
<head>
<script type="text/javascript" src="http://www.example.co.uk/includes/js/data.js"></script>
</head>
Does anyone know how I can use PHP to take this data and store it in a database?
You should GET that file directly, via, for example, cURL. Then parse it; if it comes in JSON, you can use json_decode().
Simple example (slightly modified version of code found here):
<?php
$url = "http://www.example.co.uk/includes/js/data.js";
$ch = curl_init(); // was missing; curl_setopt() needs an initialised handle
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
...
$output = curl_exec($ch);
$info = curl_getinfo($ch);
if ($output === false || $info['http_code'] != 200) {
    $error = "No cURL data returned for $url [" . $info['http_code'] . "]";
    if (curl_error($ch))
        $error .= "\n" . curl_error($ch);
}
else {
    $js_data = json_decode($output);
    // 'OK' status; save the $js_data members in the database, or the $output directly,
    // depending on what you want to actually do.
    ...
}
curl_close($ch);
// Display $error or do something about it
?>
You can grab the file via cURL or some other HTTP downloading library/function. Then parse the data. If you're lucky, the data is in JSON format and you can use a PHP function to convert it into a PHP array. Then iterate through the items in the array, inserting each into your database.
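As a rough sketch of that last step (the DSN, credentials, table, and column names are all placeholders, and the shape of the decoded array depends entirely on what data.js actually contains):

// Hypothetical: assumes the file decodes to a list of records
// like [{"name": "...", "value": "..."}, ...]
$items = json_decode($output, true);

$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO items (name, value) VALUES (:name, :value)');

foreach ($items as $item) {
    $stmt->execute([
        ':name'  => $item['name'],
        ':value' => $item['value'],
    ]);
}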