Determine if a PHP resource contains binary or text - php

I am using MongoDB and storing files into GridFS using PHP. I am pulling files out via:
$mongo = new Mongo;
$images = $mongo->my_db->getGridFS('images');
$image = $images->findOne('epic-beard-man.png');
$stream = $image->getResource();
Which is cool, because $stream is a PHP resource. What I need is to determine whether the stream/resource is binary or text. If it is text, I want to output it; if it is binary, I don't.
Is there a magical function like is_binary($stream)?
EDIT
echo get_resource_type($stream);
Returns STREAM. Hum, not very useful.

You cannot check this without actually reading from the resource. You can read the whole thing and look for non-printable characters (which should happen pretty fast if it is an image). You can check for "printability" with ctype_print, which will unfortunately return false for tabs and newlines, so it may not be the best one after all. You can also build your own regex to check the data:
preg_match(':^(\P{Cc}|[\t\n])*$:', $data)
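A sampling variant of this check might look like the sketch below. It reads only the first few kilobytes and treats any control byte other than tab, line feed, or carriage return as a sign of binary data. The function name looks_like_text and the 8192-byte sample size are illustrative assumptions, not part of any library, and the heuristic can misjudge text stored in encodings such as UTF-16.

```php
<?php
// Hypothetical helper: guess whether a stream holds text by sampling its start.
function looks_like_text($stream, int $sampleSize = 8192): bool
{
    $sample = fread($stream, $sampleSize);
    if ($sample === false) {
        return false; // could not read, so make no claim that it is text
    }
    // Control bytes other than \t (0x09), \n (0x0A) and \r (0x0D) suggest binary data.
    return preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', $sample) === 0;
}
```

Note that fread advances the stream position, so if the stream is seekable you would call rewind($stream) before outputting it.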
The best and easiest thing to do is however to save the data type, possibly the MIME type, together with the object. That way you do not need to do anything magic at display time.
I think that schemaless databases like MongoDB need at least as much care in the design stage as relational databases. This is a typical thing to think about when designing a database: what type do my data have?

Related

PHP - alternative to Base64 with shorter results?

I'm currently using base64 to encode a short string and decode it later, and wonder if a better (shorter) alternative is possible.
$string = '/path/to/img/image.jpg';
$convertedString = base64_encode($string);
// New session, new user
$convertedString = 'L3BhdGgvdG8vaW1nL2ltYWdlLmpwZw==';
$originalString = base64_decode('L3BhdGgvdG8vaW1nL2ltYWdlLmpwZw==');
// Can $convertedString be shorter by any means ?
Requirements :
Shorter result possible
Must be reversible any time in a different session (therefore unique)
No security needed (anyone can guess it)
Any kind of characters that can be used in a URL (except slashes)
Can be an external lib
Goal :
Get a clean, unique ID from a file path that is not the file path itself and can be used in a URL, without using a database.
I've searched and read a lot, looks like it doesn't exist but couldn't find a definitive answer.
Well since you're using these in a URL, why not use rawurlencode($string) and rawurldecode($encodedString)?
If you can reserve one character like - (i.e., ensure that - never appears in your file names), you can do even better by doing rawurlencode(str_replace('/', '-', $string)) and str_replace('-', '/', rawurldecode($encodedString)). Depending on the file names you pick, this will create IDs that are the same length as the original file name. (With UTF-8 file names the str_replace step is still safe, because the bytes for / and - never occur inside a multi-byte sequence, but rawurlencode percent-encodes every non-ASCII byte, so the ID gets longer than the original.)
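The round trip described above could be wrapped up roughly as follows. The function names path_to_id and id_to_path are made up for illustration, and the sketch assumes that - never appears in any path.

```php
<?php
// Hypothetical helpers; they assume "-" never appears in a file path.
function path_to_id(string $path): string
{
    // Swap the slashes for the reserved character, then percent-encode the rest.
    return rawurlencode(str_replace('/', '-', $path));
}

function id_to_path(string $id): string
{
    // Undo the two steps in reverse order.
    return str_replace('-', '/', rawurldecode($id));
}

$id = path_to_id('/path/to/img/image.jpg');
echo $id, "\n";              // -path-to-img-image.jpg
echo id_to_path($id), "\n";  // /path/to/img/image.jpg
```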
You could try using compression functions, but for strings as short as file paths, compression usually makes the output larger than the input.
Ultimately, unless you are willing to use a database, disallow certain file names, or you know something about what kinds of file names will come up, the best you can hope for is IDs that are as short or almost as short as the original file names. Otherwise, this would be a universal compression function, which is impossible.
I don't think there is anything reliable out there that would significantly shorten the encoded string and keep it URL friendly.
e.g. if you use something like
$test = gzcompress(base64_encode($parameter), 9, ZLIB_ENCODING_DEFLATE);
echo $test;
it would generate characters that are not URL-friendly and any post-transformation would be just a risky mess.
However, you can easily transform text to get URL-friendly parameters.
I use the following code to generate URL-friendly parameters:
$encodedParameter = urlencode(base64_encode($parameter));
And the following code to decode it:
$parameter = base64_decode(urldecode($encodedParameter));
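A common variation, offered here only as a sketch, is the "URL-safe" base64 alphabet: replace + and / with - and _ and drop the = padding, so no percent-encoding is needed and no slash can appear. The function names below are illustrative and not built into PHP. The result is still about a third longer than the input, so this does not make the string shorter than the original path.

```php
<?php
// Hypothetical helpers implementing the URL-safe base64 alphabet.
function base64url_encode(string $data): string
{
    // Swap the two URL-unfriendly characters and strip the padding.
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

function base64url_decode(string $id): string
{
    $b64 = strtr($id, '-_', '+/');
    // Restore the padding that was stripped during encoding.
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4);
    return (string) base64_decode($b64);
}

echo base64url_encode('/path/to/img/image.jpg'), "\n"; // L3BhdGgvdG8vaW1nL2ltYWdlLmpwZw
```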
As an alternative solution, you could use generated tokens to map known files using some database.

Pull a mime type out of a URI using PHP

So after searching through multiple documentation sources I'm still no closer to figuring out how to extract the MIME type from a data URI that has already been processed and stored in a DB.
That is a quick snapshot of the exact data I have to work with. I just want a dynamic way to always get the "image/png" part, which may change with each image in the DB.
I'm using PHP.
Not an elegant solution, but you could do:
// assume you've set $image_uri to be the URI from the database
$image_parts = explode(";", $image_uri); // split on the ; after the mime type
$mime_type = substr($image_parts[0], 5); // get the information after the data: text
It could be done with regular expressions, but I'm not good enough at them to come up with it.
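For what it's worth, a regular-expression version could look something like this. The function name data_uri_mime_type is made up, and the sketch only reads the type declared at the start of the URI; it returns null when no type is declared.

```php
<?php
// Hypothetical helper: return the MIME type declared in a data URI, or null.
function data_uri_mime_type(string $uri): ?string
{
    // Capture everything between "data:" and the first ";" or ",".
    if (preg_match('#^data:([^;,]+)#i', $uri, $m) === 1) {
        return $m[1];
    }
    return null;
}

echo data_uri_mime_type('data:image/png;base64,iVBORw0KGgo='), "\n"; // image/png
```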
Here's the elegant solution, using the mime_content_type function.
return mime_content_type($data_uri);
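Just pass the data URI to the function, and it'll work (source). PHP opens the URI through its data:// stream wrapper, so the type reported is the one detected from the decoded content, which is not necessarily the type declared in the URI.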

PHP - DOMDocument load XML with encoded name

Let's say that in my Flash project I have a script that creates XML files for me dynamically (via PHP). The XML file name is based on a specific variable and escaped using escape(variable), because the variable may (and mostly does) contain characters that are not supported in file names...
I need to know the precise name of the XML file later in my Flash project, because I load these XML files only if unescape(XMLfile) == variable. There are a lot of variables, so I can't just use String.replace() to strip out the unsupported file name characters...
Here is the part of the PHP file I'm using:
$XMLDom = new DomDocument('1.0', 'UTF-8');
$xmlId = trim($_POST['xmlId']);
if(file_exists($xmlId)){
$XMLDom ->load($xmlId);
}else{
$newXMLHandler = fopen($xmlId, 'w') or die("can't open file");
fclose($newXMLHandler);
$XMLDom ->load($xmlId);
.... rest of the code ....
$XMLDom ->save($xmlId);
}
The result of the code above is that two newly created XML files appear in the directory:
One empty XML file created by fopen($xmlId, 'w'), named: "fi%20le%2C%2E%40.xml"
and a second one named: "fi le,.#.xml", where all my new XML data is stored...
Is there any way to make PHP load an XML file with an escaped name?
Thanks in advance.
Arthur.
I'm not fully confident I understand your problem, but if you are looking for the PHP analogue of escape(), then urlencode() looks like the best match. You do need to research exactly what is being escaped, though. Note, for example, that there are several different ways to percent-encode strings, especially multibyte strings. Flash may use escapeMultibyte() or encodeURIComponent(); each encodes a different subset of characters, and in a different way, so beware!
Now, regarding file names: if your HTTP server is running on a Unix system, then "fi le,.#.xml" is a valid file name. Nothing to worry about; it is sometimes inconvenient, but it is a legitimate name.
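To illustrate how the encoders differ, here is a small comparison. The sample name is what you get by decoding the escaped name from the question (without the .xml extension). Note that neither PHP function encodes the dot, so re-encoding in PHP would not reproduce the %2E from the question exactly.

```php
<?php
$name = 'fi le,.@';

echo urlencode($name), "\n";     // fi+le%2C.%40   (space becomes +)
echo rawurlencode($name), "\n";  // fi%20le%2C.%40 (space becomes %20)

// Decoding the escaped name from the question:
echo rawurldecode('fi%20le%2C%2E%40'), "\n"; // fi le,.@
```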
touch 'fi le,.#.xml'
would create a file, no problems there. Basically, the restricted characters are the slash and the null character ("\x00"), but it is also common to restrict characters that may be interpreted by the shell. This is really up to you.

php how to write data into json?

I am new to PHP. I want to build a PHP page cache: query data from MySQL and store it in JSON format.
I have several questions:
Which type of file should I store it in: .json, .txt, or .cache? I also need to use json_decode to load the data back into the page.
I want to use a cron job that runs many MySQL queries and writes the results into one JSON file. Which functions should I use to write: fopen, fwrite, file_get_contents, or something else? (It should not overwrite the existing data but keep appending. I will delete the file and regenerate it on the next cron run.)
With multiple writes into one JSON file (10 or more MySQL queries at the same time, all writing into the same file, each JSON child formatted like {name: ".$row['name']."}), how do I add the top { and bottom } to make it standard JSON?
{ //how to add this one
{name: ".$row['name']."}
{name: ".$row['name']."}
// many name from 10 more mysql queries
} //and this one
Thanks.
It's json_encode()
json_encode() — Returns the JSON representation of a value
<?php
$arr = array ('a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5);
echo json_encode($arr);
?>
which type of file should I store
It doesn't matter. There is no fixed extension, but I would pick .json just to make it clear what the file is supposed to contain.
what write code should I choose?
Just use file_put_contents to put the JSON string (see next section) into a file.
each json child format like
You really do not want to use that method. It might work for a while, but becomes very complex when you need to handle things like quoting and special-character escapes. Instead of re-inventing the wheel, use PHP's built-in JSON functions for this.
Create the data-structure you want using PHP's strings, numbers, and arrays, and then rely on json_encode to turn it into a string.
The main thing to be careful of is that depending on how your php array() looks, you might get JSON [] versus {}.
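A quick illustration of that difference, using made-up sample values: a PHP array with sequential keys starting at 0 becomes a JSON array, while anything else becomes a JSON object.

```php
<?php
// Sequential keys 0..n-1 give a JSON array.
echo json_encode(array('bad', 'barley', 'base')), "\n"; // ["bad","barley","base"]

// String keys give a JSON object.
echo json_encode(array('name' => 'Alice')), "\n";       // {"name":"Alice"}

// Integer keys that do not start at 0 also give an object.
echo json_encode(array(1 => 'a', 2 => 'b')), "\n";      // {"1":"a","2":"b"}

// JSON_FORCE_OBJECT turns even an empty array into {}.
echo json_encode(array(), JSON_FORCE_OBJECT), "\n";     // {}
```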
As for saving the file, .txt or .json won't make a difference.
I think the focal point of this all lies in the json_encode page. Here's the example from that page:
This code:
<?php
$arr = array ('a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5);
echo json_encode($arr);
?>
Outputs like this:
{"a":1,"b":2,"c":3,"d":4,"e":5}
3. You can use fopen and fwrite to write to your file. The second argument to fopen is the mode; you want to use 'a' for append.
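As an alternative to appending fragments, which leaves the file without its enclosing brackets, one option is to collect all rows in a PHP array first and write the file once. The sketch below assumes the query results are already available as arrays of rows with a 'name' column; write_json_cache and $rowSets are hypothetical names.

```php
<?php
// Hypothetical helper: $rowSets stands in for the results of several MySQL queries.
function write_json_cache(string $file, array $rowSets): bool
{
    $items = array();
    foreach ($rowSets as $rows) {
        foreach ($rows as $row) {
            $items[] = array('name' => $row['name']);
        }
    }
    // json_encode adds the enclosing brackets and handles quoting and escaping.
    $json = json_encode($items);
    if ($json === false) {
        return false;
    }
    // LOCK_EX keeps concurrent writers from interleaving their output.
    return file_put_contents($file, $json, LOCK_EX) !== false;
}
```

Because the items form a list, the resulting file holds a JSON array such as [{"name":"a"},{"name":"b"}], which json_decode can read back directly.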
Don't write your own cache because anything you write in PHP will be slower than can be supported by native extensions (like APC or memcached or even MySQL itself!!).
Don't cache as JSON. JSON is not particularly fast to serialize. If you're doing caching you don't want to do any serialization at all. Just store it as it is.
MySQL can do query caching for you (the query cache was removed in MySQL 8.0, so this applies to older versions). If performance is a problem, first tune your MySQL queries and database schema. Caching is one of the absolute last optimizations you want to do.
If you want an easy way to cache, make a MySQL table called 'cache' and use that. If you want quick (small) file access, use MySQL (seriously). If you want even faster cache access, use an in-memory cache like APC or memcached.

Assistance with building an inverted-index

It's part of an information retrieval thing I'm doing for school. The plan is to create a hashmap of words using the first two letters of the word as a key and any words with those two letters saved as a string value. So,
hashmap["ba"] = "bad barley base"
Once I'm done tokenizing a line I take that hashmap, serialize it, and append it to the text file named after the key.
The idea is that if I take my data and spread it over hundreds of files I'll lessen the time it takes to fulfill a search by lessening the density of each file. The problem I am running into is when I'm making 100+ files in each run it happens to choke on creating a few files for whatever reason and so those entries are empty. Is there any way to make this more efficient? Is it worth continuing this, or should I abandon it?
I'd like to mention I'm using PHP. The two languages I know relatively intimately are PHP and Java. I chose PHP because the front end will be very simple to do and I will be able to add features like autocompletion/suggested search without a problem. I also see no benefit in using Java. Any help is appreciated, thanks.
I would use a single file to get and put the serialized string. I would also use JSON as the serialization format.
Put the data
$string = "bad barley base";
$data = explode(" ",$string);
$hashmap["ba"] = $data;
$jsonContent = json_encode($hashmap);
file_put_contents("a-z.txt",$jsonContent);
Get the data
$jsonContent = file_get_contents("a-z.txt");
// Pass true so JSON objects are decoded as associative arrays
$hashmap = json_decode($jsonContent, true);
// Look the key up directly instead of scanning every entry
$wordCount = isset($hashmap['ba']) ? count($hashmap['ba']) : 0;
You didn't explain the problem you are trying to solve. I'm guessing you are trying to make a full text search engine, but you don't have document ids in your hashmap so I'm not sure how you are using the hashmap to find matching documents.
Assuming you want a full text search engine, I would look into using a trie for the data structure. You should be able to fit everything in it without it growing too large. Nodes that match a word you want to index would contain the ids of the documents containing that word.
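A minimal sketch of such a trie is shown below. The class and method names are made up for illustration, it splits words byte by byte (so it assumes single-byte characters), non-empty words, and integer document ids, and it needs PHP 7.4 or later for typed properties.

```php
<?php
// Hypothetical trie node: child nodes keyed by character, plus document ids.
final class TrieNode
{
    /** @var array<string, TrieNode> */
    public array $children = [];
    /** @var array<int, bool> document ids stored as keys to avoid duplicates */
    public array $docIds = [];
}

final class Trie
{
    private TrieNode $root;

    public function __construct()
    {
        $this->root = new TrieNode();
    }

    // Record that $word occurs in document $docId.
    public function insert(string $word, int $docId): void
    {
        $node = $this->root;
        foreach (str_split($word) as $ch) {
            if (!isset($node->children[$ch])) {
                $node->children[$ch] = new TrieNode();
            }
            $node = $node->children[$ch];
        }
        $node->docIds[$docId] = true;
    }

    // Return the ids of documents containing exactly $word.
    public function search(string $word): array
    {
        $node = $this->root;
        foreach (str_split($word) as $ch) {
            if (!isset($node->children[$ch])) {
                return [];
            }
            $node = $node->children[$ch];
        }
        return array_keys($node->docIds);
    }
}

$trie = new Trie();
$trie->insert('bad', 1);
$trie->insert('bad', 2);
$trie->insert('base', 2);
print_r($trie->search('bad')); // document ids 1 and 2
```

Words that share a prefix share nodes, so "bad" and "base" reuse the path for "ba". A prefix that was never inserted as a whole word, such as "ba" here, returns no documents.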
