I developed a desktop AIR application which saves data locally as XML files.
When the user clicks a button, the xml content is posted to a PHP application, where data is handled (inserted or updated).
The problem is that when I post data containing special characters such as & or >, the XML content is not parsed.
I'm using this class to parse the xml content:
https://sites.google.com/site/floweringmind/home
This is the input:
<companie_ps>
<denCompanie>Company & Friends</denCompanie>
<email>abc#abc.ro</email>
</companie_ps>
This is the wanted result:
Array
(
[0] => Array
(
[denCompanie] => Company & Friends
[email] => abc#abc.ro
)
)
This is the actual result:
Array
(
[0] => Array
(
[denCompanie] => Array
(
[#content] => Company
)
)
)
Error message: XML parse error 68 'XML_ERR_NAME_REQUIRED' at line 5, column 28 (byte index 63).
EDIT: The problem was that the POST was converting HTML entities inside the XML tag contents into literal characters. So I used a regex to replace the special characters with their HTML entities. Here is the code, in case someone needs it:
function _handle_match($match)
{
return '<' . $match[1] . '>' . htmlentities($match[2]) . '</' . $match[3] . '>';
}
$pattern = "/\<(.*)\>(.*?)\<\/(.*)\>/imU";
$xml = preg_replace_callback($pattern, '_handle_match', $xml);
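A slightly more defensive variant of the same idea, as a minimal sketch rather than a drop-in replacement. It assumes flat elements whose content is plain text with no raw "<" in it, and it swaps htmlentities() for htmlspecialchars() with ENT_XML1, because XML only predefines a handful of entities and named HTML entities such as &eacute; would not parse. The function name escape_xml_text is hypothetical.

```php
<?php
// Escape the text between a matching pair of tags so it is safe inside XML.
// Assumption: elements are flat (text only) and the text contains no raw "<".
function escape_xml_text(string $xml): string
{
    // \1 forces the closing tag to have the same name as the opening tag.
    $pattern = '/<([A-Za-z_][\w.\-]*)>([^<]*)<\/\1>/';

    return preg_replace_callback($pattern, function (array $m): string {
        // The last argument (double_encode = false) leaves text that is
        // already escaped, such as "&amp;", untouched.
        $text = htmlspecialchars($m[2], ENT_QUOTES | ENT_XML1, 'UTF-8', false);
        return '<' . $m[1] . '>' . $text . '</' . $m[1] . '>';
    }, $xml);
}

$xml = "<companie_ps>\n"
     . "<denCompanie>Company & Friends</denCompanie>\n"
     . "<email>abc#abc.ro</email>\n"
     . "</companie_ps>";

echo escape_xml_text($xml), "\n";
```

The outer element is left alone because its content contains child tags; only the leaf elements are rewritten.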
Not an experienced developer and using CodeIgniter for the first time. I'm trying to grab a signed URL for a given MP3 filename stored in S3. This is currently working with the exception of files that contain brackets.
Relevant controller code:
function index ($streamfile) {
// Load S3 client
$this->load->spark('amazon-sdk');
$s3 = $this->awslib->get_s3();
// Define request parameters
$s3bucket = $userdata['s3bucket']; // defined elsewhere
$streamfiletest = (string) 'Crazy_(Remix).mp3';
// Request signed URL
$url = $s3->get_object_url($s3bucket, ***EITHER $streamfiletest or $streamfile***, '5 minutes');
// Fetch status code
$http = new CFRequest($url);
$http->add_header('Content-Type', '');
$http->send_request(true);
$code = $http->get_response_code();
$headers = $http->get_response_header();
// Load the view
$data['filename'] = $url;
$data['debug'] = array(
'file1' => $streamfile,
'file2' => $streamfiletest,
'signed_url' => $url,
'code' => $code,
'headers' => $headers
);
$this->load->view('play', $data);
}
Relevant view code:
<?php if (isset($debug)) {
echo "DEBUGS:";
echo '<pre>' . print_r($debug, TRUE) . '</pre>';
} ?>
As you can see I either pass $streamfile or $streamfiletest. In the debug I can confirm that both variables are the same string.
When passing $streamfile to the URL request, the URL in the response is incorrect:
DEBUGS:
[file1] => Crazy_(Remix).mp3
[file2] => Crazy_(Remix).mp3
[signed_url] => http://s3-...(removed)/Crazy_%26%2340%3BRemix%26%2341%3B.mp3?AWSAccessKey...
[code] => 404
You can see that the brackets have been strangely encoded %26%2340%3B and therefore I can't find the file in S3.
When passing $streamfiletest however, the response is fine:
DEBUGS:
[file1] => Crazy_(Remix).mp3
[file2] => Crazy_(Remix).mp3
[signed_url] => http://s3-...(removed)/Crazy_%28Remix%29.mp3?AWSAccessKey...
[code] => 200
The brackets are encoded correctly in the signed URL and I get an HTTP 200 from S3.
Any ideas what could be causing this?
"In the debug I can confirm that both variables are the same string."
Actually, not quite.
If you look closely, it becomes apparent what the url escaped values must mean:
%26%2340%3B %26%2341%3B
& # 40 ; & # 41 ;
Those are numeric HTML character references, which the browser displays as ( and ). That does not mean the two strings have identical content; they only appear to.
The solution, of course, depends on how they are getting transformed that way, and either not doing that, or decoding the numeric character codes.
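The second option (decoding the numeric character references) could look roughly like this. It is a sketch under the assumption that $streamfile arrives with the brackets already converted to &#40; and &#41;; the helper name normalize_s3_key is made up for illustration, and rawurlencode() is used only to show the effect on the path segment, not to reproduce what the SDK does internally.

```php
<?php
// Hypothetical helper: turn HTML character references back into literal
// characters before the key is URL-encoded and signed.
function normalize_s3_key(string $key): string
{
    return html_entity_decode($key, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$fromUri = 'Crazy_&#40;Remix&#41;.mp3'; // what $streamfile appears to contain
$literal = 'Crazy_(Remix).mp3';         // what $streamfiletest contains

var_dump($fromUri === $literal);                   // bool(false)
var_dump(normalize_s3_key($fromUri) === $literal); // bool(true)

echo rawurlencode($fromUri), "\n";                   // Crazy_%26%2340%3BRemix%26%2341%3B.mp3
echo rawurlencode(normalize_s3_key($fromUri)), "\n"; // Crazy_%28Remix%29.mp3
```

Viewing the debug output as page source, or wrapping it in htmlspecialchars(), makes the difference between the two strings visible.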
Try the following to decode the URL-encoded brackets:
$data['filename'] = urldecode($url);
This should return the string to its expected format, i.e. with brackets.
I am stuck on a scraping task in my project.
I want to grab the data from the page loaded into $html, specifically the contents of every tr and td in the table. Here I am trying to grab the links, but the only output is javascript: self.close()
<?php
include("simple_html_dom.php");
$html = file_get_html('http://www.areacodelocations.info/allcities.php?ac=201');
foreach($html->find('a') as $element)
echo $element->href . '<br>';
?>
Usually, pages of this kind load a bunch of JavaScript (jQuery, etc.), which then builds the interface and retrieves the data to be displayed from a data source.
So what you need to do is open the page in Firefox or similar with a tool such as Firebug, in order to see which requests are actually being made. If you're lucky, you will find the data source directly in the list of XHR requests, as in this case:
http://www.govliquidation.com/json/buyer_ux/salescalendar.js
Notice that this course of action may infringe on some license or terms of use. Clear this with the webmaster/data source/copyright owner before proceeding: detecting and forbidding this kind of scraping is very easy, and identifying you is probably only slightly less so.
Anyway, if you issue the same call in PHP, you can directly scrape the data (provided there is no session/authentication issue, as seems the case here) with very simple code:
<?php
$url = "http://www.govliquidation.com/json/buyer_ux/salescalendar.js";
$json = file_get_contents($url);
$data = json_decode($json);
?>
This yields a data object that you can inspect and convert to CSV with a simple loop.
stdClass Object
(
[result] => stdClass Object
(
[events] => Array
(
[0] => stdClass Object
(
[yahoo_dur] => 11300
[closing_today] => 0
[language_code] => en
[mixed_id] => 9297
[event_id] => 9297
[close_meridian] => PM
[commercial_sale_flag] => 0
[close_time] => 01/06/2014
[award_time_unixtime] => 1389070800
[category] => Tires, Parts & Components
[open_time_unixtime] => 1388638800
[yahoo_date] => 20140102T000000Z
[open_time] => 01/02/2014
[event_close_time] => 2014-01-06 17:00:00
[display_event_id] => 9297
[type_code] => X3
[title] => Truck Drive Axles # Killeen, TX
[special_flag] => 1
[demil_flag] => 0
[google_close] => 20140106
[event_open_time] => 2014-01-02 00:00:00
[google_open] => 20140102
[third_party_url] =>
[bid_package_flag] => 0
[is_open] => 1
[fda_count] => 0
[close_time_unixtime] => 1389045600
You retrieve $data->result->events, use fputcsv() on its items converted to array form, and Bob's your uncle.
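As a rough sketch of that last step: the function below casts each event object to an array and writes it with fputcsv(), using the first event's keys as the header row. The function name events_to_csv and the shortened sample JSON are illustrative only; the sample stands in for the real response so the snippet runs without a network call.

```php
<?php
// Write a list of decoded JSON objects to an open file handle as CSV.
// Assumes every event has the same fields in the same order.
function events_to_csv(array $events, $fp): void
{
    $headerWritten = false;
    foreach ($events as $event) {
        $row = (array) $event; // stdClass -> associative array
        if (!$headerWritten) {
            fputcsv($fp, array_keys($row), ',', '"', '\\');
            $headerWritten = true;
        }
        fputcsv($fp, $row, ',', '"', '\\');
    }
}

// Shortened stand-in for the response; the real one has many more fields.
$json = '{"result":{"events":[{"event_id":"9297",'
      . '"category":"Tires, Parts & Components",'
      . '"open_time":"01/02/2014","close_time":"01/06/2014"}]}}';
$data = json_decode($json);

$fp = fopen('php://memory', 'w+'); // use fopen('events.csv', 'w') for a real file
events_to_csv($data->result->events, $fp);
rewind($fp);
echo stream_get_contents($fp);
fclose($fp);
```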
In the case of the second site, you have a table with several TR elements, and you want to catch the first two TD children of each TR.
By inspecting the source code you see something like this:
<tr>
<td> Allendale</td>
<td> Eastern Time
</td>
</tr>
<tr>
<td> Alpine</td>
<td> Eastern Time
</td>
So you just grab all the TR's
<?php
include("simple_html_dom.php");
$html = file_get_html('http://www.areacodelocations.info/allcities.php?ac=201');
$fp = fopen('output.csv', 'w');
if (!$fp) die("Cannot open output CSV - permission problems maybe?");
foreach($html->find('tr') as $tr) {
$csv = array(); // Start empty. A new CSV row for each TR.
// Now find the TD children of $tr. They will make up a row.
foreach($tr->find('td') as $td) {
// Get the TD's innertext (raw for now; it still needs cleaning up)
$csv[] = $td->innertext;
}
fputcsv($fp, $csv);
}
fclose($fp);
?>
You will notice that the CSV text is "dirty". That is because the actual text is:
<td> Alpine</td>
<td> Eastern Time[CARRIAGE RETURN HERE]
</td>
So to have "Alpine" and "Eastern Time", you have to replace
$csv[] = $td->innertext;
with something like
$csv[] = trim(
html_entity_decode (
$td->innertext,
ENT_COMPAT | ENT_HTML401,
'UTF-8'
)
);
Check out the PHP man page for html_entity_decode() about character set encoding and entity handling. The above ought to work -- and an ought and fifty cents will get you a cup of coffee :-)
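One caveat worth sketching: if the leading blank in each cell is actually a &nbsp; entity in the page source (an assumption; the source shown above only displays it as a space), html_entity_decode() turns it into a non-breaking space (U+00A0), which plain trim() does not remove. A hypothetical helper, clean_cell, that handles both cases might look like this:

```php
<?php
// Decode entities, then strip ordinary whitespace and non-breaking spaces
// from both ends. The /u flag makes the pattern UTF-8 aware.
function clean_cell(string $innertext): string
{
    $text = html_entity_decode($innertext, ENT_COMPAT | ENT_HTML401, 'UTF-8');
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $text);
}

var_dump(clean_cell("&nbsp;Alpine"));            // string(6) "Alpine"
var_dump(clean_cell(" Eastern Time\r\n"));       // string(12) "Eastern Time"
```

In the loop, this would replace the assignment with $csv[] = clean_cell($td->innertext);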
I'm trying to add a server side image uploaded from a user form to a Word document that gets generated using LiveDocx.
My word template looks like this.
«image:photo»
AKA
{ MERGEFIELD image:photo \* MERGEFORMAT }
My php looks like this.
$mailMerge = new Zend_Service_LiveDocx_MailMerge();
$mailMerge->uploadImage($this->logo_path);
$mailMerge->assign('image:photo', $this->logo_path);
I just get a blank area where the image should be. My other merge fields are working properly.
I didn't realize that LiveDocx stores only the name of the file to be referenced. I found this out using:
$mailMerge->listImages();
The format came back like this:
array
0 => array
'filename' => string 'directory_logo.png' (length=18)
'fileSize' => int 12829
'createTime' => int 1352835686
'modifyTime' => int 1352835686
So the template file was fine with this format:
«image:photo»
AKA
{ MERGEFIELD image:photo \* MERGEFORMAT }
But my php needed to look like this:
$mailMerge->uploadImage($this->logo_path);
$mailMerge->assign('image:photo', $this->logo_file);
My full working code looks like this:
$mailMerge = new Zend_Service_LiveDocx_MailMerge();
$mailMerge->setUsername($username)
->setPassword($password);
$mailMerge->setLocalTemplate($template_path . '/service_template.docx');
$mailMerge->uploadImage($this->logo_path);
$mailMerge->assign('image:photo', 'directory_logo.png');
$mailMerge->createDocument();
$document = $mailMerge->retrieveDocument('docx');
file_put_contents($this->config->livedocx->document . '/' . $this->prefs['serverid'] . '/service_directory.docx', $document);
$mailMerge->deleteImage('directory_logo.png');
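If only the full path is stored, the file name to pass to assign() can be derived with basename() instead of being hard-coded. A small sketch, where the upload directory is a made-up example and the LiveDocx calls are left as comments because they need a live account:

```php
<?php
// Hypothetical upload location; only the final path component matters.
$logo_path = '/var/www/uploads/directory_logo.png';
$logo_file = basename($logo_path);

echo $logo_file, "\n"; // directory_logo.png

// Then, with the $mailMerge object from the code above:
// $mailMerge->uploadImage($logo_path);              // full local path
// $mailMerge->assign('image:photo', $logo_file);    // file name only
// ...
// $mailMerge->deleteImage($logo_file);
```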
I am trying to build a restful web service for my website. I have a php mysql query using the following code:
function mysql_fetch_rowsarr($result, $taskId, $num, $count){
$got = array();
if(mysql_num_rows($result) == 0)
return $got;
mysql_data_seek($result, 0);
while ($row = mysql_fetch_assoc($result)) {
$got[]=$row;
}
print_r($got);
print_r(json_encode($got));
return $got;
}
which returns the following from the print_r($got) call in the code above:
Array ( [0] => Array ( [show] => Blip TV Photoshop Users TV [region] => UK [url] => http://blip.tv/photoshop-user-tv/rss [resourceType] => RSS / Atom feed [plugin] => Blip TV ) [1] => Array ( [show] => TV Highlights [region] => UK [url] => http://feeds.bbc.co.uk/iplayer/highlights/tv [resourceType] => RSS / Atom feed [plugin] => iPlayer (UK) ) )
Here is the json it returns:
[{"show":"Blip TV Photoshop Users TV","region":"UK","url":"http:\/\/blip.tv\/photoshop-user-tv\/rss","resourceType":"RSS \/ Atom feed","plugin":"Blip TV"},{"show":"TV Highlights","region":"UK","url":"http:\/\/feeds.bbc.co.uk\/iplayer\/highlights\/tv","resourceType":"RSS \/ Atom feed","plugin":"iPlayer (UK)"}]
I am using the following code to add some items to the array then convert it to json and return the json.
$got=array(array("resource"=>$taskId,"requestedSize"=>$num,"totalSize"=>$count,"items"),$got);
using the following code to convert it to json and return it.
$response->body = json_encode($result);
return $response;
this gives me the following json.
[{"resource":"video","requestedSize":2,"totalSize":61,"0":"items"},[{"show":"Blip TV Photoshop Users TV","region":"UK","url":"http:\/\/blip.tv\/photoshop-user-tv\/rss","resourceType":"RSS \/ Atom feed","plugin":"Blip TV"},{"show":"TV Highlights","region":"UK","url":"http:\/\/feeds.bbc.co.uk\/iplayer\/highlights\/tv","resourceType":"RSS \/ Atom feed","plugin":"iPlayer (UK)"}]]
The consumers of the API want the JSON in the following format, and I cannot figure out how to produce it. I have searched and tried everything I can find and still cannot get it. I have not even started on the XML formatting yet.
{"resource":"video", "returnedSize":2, "totalSize":60,"items":[{"show":"Blip TV Photoshop Users TV","region":"UK","url":"http://blip.tv/photoshop-user-tv/rss","resourceType":"RSS / Atom feed","plugin":"Blip TV"},{"show":"TV Highlights","region":"UK", "url":"http://feeds.bbc.co.uk/iplayer/highlights/tv","resourceType":"RSS / Atom feed","plugin":"iPlayer (UK)"}]}
I appreciate any and all help with this. I have set up a copy of the database with read-only access and can provide all the source code if that will help. I will warn you that I am only now learning PHP; I learned to program in BASIC and Fortran 77, so the PHP is pretty messy and, I would guess, pretty bloated.
OK, the JSON encoding question above has been answered. The API consumers also want the "/" character left unescaped, since it is part of a URL. I tried passing JSON_UNESCAPED_SLASHES to json_encode() and got the following error:
json_encode() expects parameter 2 to be long
Your $result line should look like
$result=array(
"resource"=>$taskId,
"requestedSize"=>$num,
"totalSize"=>$count,
"items" => $got
);
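Put together, a minimal sketch of the whole response might look like the following. The function name build_response and the one-item sample are illustrative. On the slash question: JSON_UNESCAPED_SLASHES was added in PHP 5.4, and on older versions an undefined constant is treated as a string, which would be consistent with the "expects parameter 2 to be long" error, so the sketch falls back to no flags when the constant is missing.

```php
<?php
// Build the envelope as one associative array so json_encode() emits a
// single JSON object with "items" as a nested list.
function build_response($taskId, $num, $count, array $got): array
{
    return array(
        "resource"      => $taskId,
        "requestedSize" => $num,
        "totalSize"     => $count,
        "items"         => $got,
    );
}

$got = array(
    array(
        "show" => "TV Highlights",
        "url"  => "http://feeds.bbc.co.uk/iplayer/highlights/tv",
    ),
);

$result = build_response("video", 1, 61, $got);

// Only use the flag where it exists (PHP 5.4 or later).
$flags = defined('JSON_UNESCAPED_SLASHES') ? JSON_UNESCAPED_SLASHES : 0;
$json  = json_encode($result, $flags);

echo $json, "\n";
```

Escaped slashes ("\/") are still valid JSON and decode to the same URL, so consumers using a real JSON parser should see no difference either way.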
Hi, I am working on a barcode reader project. When I call the UPC Database from my PHP script, it gives me an error. I am using the PHP example provided by www.upcdatabase.com.
The code is:
<?php error_reporting(E_ALL);
ini_set('display_errors', true);
require_once 'XML/RPC.php';
$rpc_key = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'; // Set your rpc_key here
$upc='0639382000393';
// Setup the URL of the XML-RPC service
$client = new XML_RPC_Client('/xmlrpc', 'http://www.upcdatabase.com');
$params = array( new XML_RPC_Value( array(
'rpc_key' => new XML_RPC_Value($rpc_key, 'string'),
'upc' => new XML_RPC_Value($upc, 'string'),
), 'struct'));
$msg = new XML_RPC_Message('lookup', $params);
$resp = $client->send($msg);
if (!$resp)
{
echo 'Communication error: ' . $client->errstr;
exit;
}
if(!$resp->faultCode())
{
$val = $resp->value();
$data = XML_RPC_decode($val);
echo "<pre>" . print_r($data, true) . "</pre>";
}else{
echo 'Fault Code: ' . $resp->faultCode() . "\n";
echo 'Fault Reason: ' . $resp->faultString() . "\n";
}
?>
When I look up $upc = '0639382000393' on the UPC Database website it works fine, but when I run this script in the browser it gives the following error:
Array
(
[status] => fail
[message] => Invalid UPC length
)
Unfortunately, their API appears rather short on documentation.
There are three types of codes the site mentions on the Item Lookup page:
13 digits for an EAN/UCC-13
12 digits for a Type A UPC code, or
8 digits for a Type-E (zero-suppressed) UPC code.
Right after the page mentions those three types, it also says,
Anything other than 8 or 12 digits is not a UPC code!
The 13-digit EAN/UCC-13 is a superset of UPC. It includes valid UPCs, but it has many other values that are not valid UPCs.
From the Wikipedia article on EAN-13:
If the first digit is zero, all digits in the first group of six are encoded using the patterns used for UPC, hence a UPC barcode is also an EAN-13 barcode with the first digit set to zero.
Having said that, when I removed the leading zero from $upc, it worked as expected. Apparently the Item Lookup page has logic to remove the leading zero, while the API does not.
Array
(
[upc] => 639382000393
[pendingUpdates] => 0
[status] => success
[ean] => 0639382000393
[issuerCountryCode] => us
[found] => 1
[description] => The Teenager's Guide to the Real World by BYG Publishing
[message] => Database entry found
[size] => book
[issuerCountry] => United States
[noCacheAfterUTC] => 2011-01-22T14:46:15
[lastModifiedUTC] => 2002-08-23T23:07:36
)
Alternatively, instead of setting the upc param, you can set the original 13-digit value to the ean param and it will also work.
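The leading-zero rule can be applied before calling the API. This is a sketch only: the helper name ean13_to_upca is invented, and it checks length and digits but not the check digit.

```php
<?php
// Return the 12-digit UPC-A form of a code, or null if it has none.
// A 13-digit EAN is a UPC only when its first digit is zero.
function ean13_to_upca(string $code): ?string
{
    if (preg_match('/\A\d{12}\z/', $code)) {
        return $code;            // already a 12-digit UPC-A
    }
    if (preg_match('/\A0\d{12}\z/', $code)) {
        return substr($code, 1); // drop the leading zero
    }
    return null;                 // other EAN-13 values are not UPCs
}

var_dump(ean13_to_upca('0639382000393')); // string(12) "639382000393"
var_dump(ean13_to_upca('5012345678900')); // NULL
```

When the result is null, the original 13-digit value can be sent in the ean param instead, as described above.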