SHA-1 base32 of a file report back to a string

SHA-1 base32 of a file report back to a string - php

I'm having troubles figuring out how to implement a program that I can generate a base32 sha-1 value of a file. I know it can't be too difficult to figure out as to generate a standard sha1 file is fairly easy.
$file1 = sha1_file('main.jpg');
Any help would be appreciated.

If you want to encode the SHA1 value as Base32, you gonna have to either write that yourself or find a library. PHP does not have it built-in, like it does with base64_encode.
A while ago, I needed a base32_encode function, so I wrote one. I don't know how efficient it is, and I'm sure better ones exist out there, but it does work. It's located here: https://github.com/NTICompass/PHP-Base32
Using that you can do:
<?php
include 'Base32.php';
$base32 = new Base32;
$file1 = sha1_file('main.jpg');
echo $base32->base32_encode($file1);

I have a PHP class, Base2n, which can encode Base32 per RFC 4648. It's actually more flexible than that, allowing you to parametrically define many standard and non-standard encoding schemes with a 2n base.
https://github.com/ademarre/binary-to-text-php
Base32 is demonstrated in the first example of the README file.
Here's what your "main.jpg" example would look like:
// include, require, or autoload Base2n
$file1 = sha1_file('main.jpg', TRUE);
$base32 = new Base2n(5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567', FALSE, TRUE, TRUE);
$base32FileHash = $base32->encode($file1);
Notice that I also set $raw_output=TRUE in the call to sha1_file(). I assume this is what you want because otherwise the Base32 output would be done on the hexadecimal representation of the SHA-1 hash digest, not the raw 160-bit digest itself.

Related

LibSodium functions return unreadable characters

I am following along with a tutorial on encryption: https://php.watch/articles/modern-php-encryption-decryption-sodium. In working with the Sodium extension I'm just baffled by a few things. Googling is returning frustratingly little help. (Most of the results are just duplications of the php.net/manual.)
1. In various articles I'm reading, the result of sodium_crypto_*_encrypt() is something familiar:
// ex. DEx9ATXEg/eRq8GWD3NT5BatB3m31WED
Whenever I echo it out myself I get something like:
// ex. 𫦢�2(*���3�CV��Wu��R~�u���H��
which I'm certain won't store correctly on a database. Nowhere in the articles or documentation does it mention anything about charset weirdness. I can throw a header('Content-Type: text/html; charset=ISO-8859-1') in there, but I still get weird characters I'm not certain are right since I'm not finding any threads talking about this:
// ex. ÑAÁ5eŠ…n#±'ýÞÃ1è9ÜÈÌ³¬"CžãÚ0ÿÛ
2. I can't find any information about the best practice for storing keys or nonces.
I just figured this obvious-to-security-folks-but-not-to-others bit of information would be a regularly discussed part of articles on keygens and nonces and such. Seeing as both my keygen and nonce functions (at least in the Sodium library) seem to return non-UTF-8 gibberish, what do I do with it? fwrite it out to a file to be referenced later? Pass it directly to my database? Copy/pasting certainly doesn't work right with it being wingdings.
Other than these things, everything else in the encryption/decryption process makes complete sense to me. I'm far from new to PHP development, I just can't figure this out.

Came across https://stackoverflow.com/a/44874239/1128978 answering "PHP random_bytes returns unreadable characters"
random_bytes generates an arbitrary length string of cryptographic random bytes...
And suggests to use bin2hex to get readable characters. So amending my usages:
// Generate $ciphertext
$message = 'This is a secret message';
$key = sodium_crypto_*_keygen();
$nonce = random_bytes(SODIUM_CRYPTO_*BYTES);
$ciphertext = sodium_crypto_*_encrypt($message, '', $nonce, $key);
// Store hexadecimal versions of binary output
$nonce_hex = bin2hex($nonce);
$key_hex = bin2hex($key);
$ciphertext_hex = bin2hex($ciphertext);
// When ready to decrypt, convert hexadecimal values back to binary
$ciphertext_bin = hex2bin($ciphertext_hex);
$nonce_bin = hex2bin($nonce_hex);
$key_bin = hex2bin($key_hex);
$decrypted = sodium_crypto_*_decrypt($ciphertext_bin, '', $nonce_bin, $key_bin);
// "This is a secret message"
So making lots of use of bin2hex and hex2bin, but this now makes sense. Effectively solved, though not confident this is the proper way to work with it. I still have no idea why this isn't pointed out anywhere in php.net/manual nor in any of the articles/comments I've been perusing.

Implementing internationalization (language strings) in a PHP application

I want to build a CMS that can handle fetching locale strings to support internationalization. I plan on storing the strings in a database, and then placing a key/value cache like memcache in between the database and the application to prevent performance drops for hitting the database each page for a translation.
This is more complex than using PHP files with arrays of strings - but that method is incredibly inefficient when you have 2,000 translation lines.
I thought about using gettext, but I'm not sure that users of the CMS will be comfortable working with the gettext files. If the strings are stored in a database, then a nice administration system can be setup to allow them to make changes whenever they want and the caching in RAM will insure that the fetching of those strings is as fast, or faster than gettext. I also don't feel safe using the PHP extension considering not even the zend framework uses it.
Is there anything wrong with this approach?
Update
I thought perhaps I would add more food for thought. One of the problems with string translations it is that they doesn't support dates, money, or conditional statements. However, thanks to intl PHP now has MessageFormatter which is what really needs to be used anyway.
// Load string from gettext file
$string = _("{0} resulted in {1,choice,0#no errors|1#single error|1<{1, number} errors}");
// Format using the current locale
msgfmt_format_message(setlocale(LC_ALL, 0), $string, array('Update', 3));
On another note, one of the things I don't like about gettext is that the text is embedded into the application all over the place. That means that the team responsible for the primary translation (usually English) has to have access to the project source code to make changes in all the places the default statements are placed. It's almost as bad as applications that have SQL spaghetti-code all over.
So, it makes sense to use keys like _('error.404_not_found') which then allow the content writers and translators to just worry about the PO/MO files without messing in the code.
However, in the event that a gettext translation doesn't exist for the given key then there is no way to fall back to a default (like you could with a custom handler). This means that you either have the writter mucking around in your code - or have "error.404_not_found" shown to users that don't have a locale translation!
In addition, I am not aware of any large projects which use PHP's gettext. I would appreciate any links to well-used (and therefore tested), systems which actually rely on the native PHP gettext extension.

Gettext uses a binary protocol that is quite quick. Also the gettext implementation is usually simpler as it only requires echo _('Text to translate');. It also has existing tools for translators to use and they're proven to work well.
You can store them in a database but I feel it would be slower and a bit overkill, especially since you'd have to build the system to edit the translations yourself.
If only you could actually cache the lookups in a dedicated memory portion in APC, you'd be golden. Sadly, I don't know how.

For those that are interested, it seems full support for locales and i18n in PHP is finally starting to take place.
// Set the current locale to the one the user agent wants
$locale = Locale::acceptFromHttp(getenv('HTTP_ACCEPT_LANGUAGE'));
// Default Locale
Locale::setDefault($locale);
setlocale(LC_ALL, $locale . '.UTF-8');
// Default timezone of server
date_default_timezone_set('UTC');
// iconv encoding
iconv_set_encoding("internal_encoding", "UTF-8");
// multibyte encoding
mb_internal_encoding('UTF-8');
There are several things that need to be condered and detecting the timezone/locale and then using it to correctly parse and display input and output is important. There is a PHP I18N library that was just released which contains lookup tables for much of this information.
Processing User input is important to make sure you application has clean, well-formed UTF-8 strings from whatever input the user enters. iconv is great for this.
/**
* Convert a string from one encoding to another encoding
* and remove invalid bytes sequences.
*
* #param string $string to convert
* #param string $to encoding you want the string in
* #param string $from encoding that string is in
* #return string
*/
function encode($string, $to = 'UTF-8', $from = 'UTF-8')
{
// ASCII is already valid UTF-8
if($to == 'UTF-8' AND is_ascii($string))
{
return $string;
}
// Convert the string
return #iconv($from, $to . '//TRANSLIT//IGNORE', $string);
}
/**
* Tests whether a string contains only 7bit ASCII characters.
*
* #param string $string to check
* #return bool
*/
function is_ascii($string)
{
return ! preg_match('/[^\x00-\x7F]/S', $string);
}
Then just run the input through these functions.
$utf8_string = normalizer_normalize(encode($_POST['text']), Normalizer::FORM_C);
Translations
As Andre said, It seems gettext is the smart default choice for writing applications that can be translated.
Gettext uses a binary protocol that is quite quick.
The gettext implementation is usually simpler as it only requires _('Text to translate')
Existing tools for translators to use and they're proven to work well.
When you reach facebook size then you can work on implementing RAM-cached, alternative methods like the one I mentioned in the question. However, nothing beats "simple, fast, and works" for most projects.
However, there are also addition things that gettext cannot handle. Things like displaying dates, money, and numbers. For those you need the INTL extionsion.
/**
* Return an IntlDateFormatter object using the current system locale
*
* #param string $locale string
* #param integer $datetype IntlDateFormatter constant
* #param integer $timetype IntlDateFormatter constant
* #param string $timezone Time zone ID, default is system default
* #return IntlDateFormatter
*/
function __date($locale = NULL, $datetype = IntlDateFormatter::MEDIUM, $timetype = IntlDateFormatter::SHORT, $timezone = NULL)
{
return new IntlDateFormatter($locale ?: setlocale(LC_ALL, 0), $datetype, $timetype, $timezone);
}
$now = new DateTime();
print __date()->format($now);
$time = __date()->parse($string);
In addition you can use strftime to parse dates taking the current locale into consideration.
Sometimes you need the values for numbers and dates inserted correctly into locale messages
/**
* Format the given string using the current system locale
* Basically, it's sprintf on i18n steroids.
*
* #param string $string to parse
* #param array $params to insert
* #return string
*/
function __($string, array $params = NULL)
{
return msgfmt_format_message(setlocale(LC_ALL, 0), $string, $params);
}
// Multiple choices (can also just use ngettext)
print __(_("{1,choice,0#no errors|1#single error|1<{1, number} errors}"), array(4));
// Show time in the correct way
print __(_("It is now {0,time,medium}), time());
See the ICU format details for more information.
Database
Make sure your connection to the database is using the correct charset so that nothing gets currupted on storage.
String Functions
You need to understand the difference between the string, mb_string, and grapheme functions.
// 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5) normalization form "D"
$char_a_ring_nfd = "a\xCC\x8A";
var_dump(grapheme_strlen($char_a_ring_nfd));
var_dump(mb_strlen($char_a_ring_nfd));
var_dump(strlen($char_a_ring_nfd));
// 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_A_ring = "\xC3\x85";
var_dump(grapheme_strlen($char_A_ring));
var_dump(mb_strlen($char_A_ring));
var_dump(strlen($char_A_ring));
Domain name TLD's
The IDN functions from the INTL library are a big help processing non-ascii domain names.

There are a number of other SO questions and answers similar to this one. I suggest you search and read them as well.
Advice? Use an existing solution like gettext or xliff as it will save you lot's of grief when you hit all the translation edge cases such as right to left text, date formats, different text volumes, French is 30% more verbose than English for example that screw up formatting etc. Even better advice Don't do it. If the users want to translate they will make a clone and translate it. Because Localisation is more about look and feel and using colloquial language this is usually what happens. Again giving and example Anglo-Saxon culture likes cool web colours and san-serif type faces. Hispanic culture like bright colours and Serif/Cursive types. Which to cater for you would need different layouts per language.
Zend actually cater for the following adapters for Zend_Translate and it is a useful list.
Array:- Use PHP arrays for Small pages; simplest usage; only for programmers
Csv:- Use comma separated (.csv/.txt) files for Simple text file format; fast; possible problems with unicode characters
Gettext:- Use binary gettext (*.mo) files for GNU standard for linux; thread-safe; needs tools for translation
Ini:- Use simple INI (*.ini) files for Simple text file format; fast; possible problems with unicode characters
Tbx:- Use termbase exchange (.tbx/.xml) files for Industry standard for inter application terminology strings; XML format
Tmx:- Use tmx (.tmx/.xml) files for Industry standard for inter application translation; XML format; human readable
Qt:- Use qt linguist (*.ts) files for Cross platform application framework; XML format; human readable
Xliff:- Use xliff (.xliff/.xml) files for A simpler format as TMX but related to it; XML format; human readable
XmlTm:- Use xmltm (*.xml) files for Industry standard for XML document translation memory; XML format; human readable
Others:- *.sql for Different other adapters may be implemented in the future

I'm using the ICU stuff in my framework and really finding it simple and useful to use. My system is XML-based with XPath queries and not a database as you're suggesting to use. I've not found this approach to be inefficient. I played around with Resource bundles too when researching techniques but found them quite complicated to implement.
The Locale functionality is a god send. You can do so much more easily:
// Available translations
$languages = array('en', 'fr', 'de');
// The language the user wants
$preference = (isset($_COOKIE['lang'])) ?
$_COOKIE['lang'] : ((isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) ?
Locale::acceptFromHttp($_SERVER['HTTP_ACCEPT_LANGUAGE']) : '');
// Match preferred language to those available, defaulting to generic English
$locale = Locale::lookup($languages, $preference, false, 'en');
// Construct path to dictionary file
$file = $dir . '/' . $locale . '.xsl';
// Check that dictionary file is readable
if (!file_exists($file) || !is_readable($file)) {
throw new RuntimeException('Dictionary could not be loaded');
}
// Load and return dictionary file
$dictionary = simplexml_load_file($file);
I then perform word lookups using a method like this:
$selector = '/i18n/text[#label="' . $word . '"]';
$result = $dictionary->xpath($selector);
$text = array_shift($result);
if ($formatted && isset($text)) {
return new MessageFormatter($locale, $text);
}
The bonus for my system is that the template system is XSL-based which means I can use the same translation XML files directly in my templates for simple messages that don't need any i18n formatting.

Stick with gettext, you won't find a faster alternative in PHP.
Regarding the how, you can use a database to store your catalog and allow other users to translate the strings using a friendly gui. When the new changes are reviewed/approved, hit a button, compile a new .mo file and deploy.
Some resources to get you on track:
http://code.google.com/p/simplepo/
http://www.josscrowcroft.com/2011/code/php-mo-convert-gettext-po-file-to-binary-mo-file-php/
https://launchpad.net/php-gettext/
http://sourceforge.net/projects/tcktranslator/

What about csv files (which can be easily edited in many apps) and caching to memcache (wincache, etc.)? This approach works well in magento. All languages phrases in the code are wrapped into __() function, for example
<?php echo $this->__('Some text') ?>
Then, for example before new version release, you run simple script which parses source files, finds all text wrapped into __() and puts into .csv file. You load csv files and cache them to memcache. In __() function you look into your memcache where translations are cached.

In a recent project, we considered using gettext, but it turned out to be easier to just write our own functionality. It really is quite simple: Create a JSON file per locale (e.g. strings.en.json, strings.es.json, etc.), and create a function somewhere called "translate()" or something, and then just call that. That function will determine the current locale (from the URI or a session var or something), and return the localized string.
The only thing to remember is to make sure any HTML you output is encoded in UTF-8, and marked as such in the markup (e.g. in the doctype, etc.)

Maybe not really an answer to your question, but maybe you can get some ideas from the Symfony translation component? It looks very good to me, although I must confess I haven't used it myself yet.
The documentation for the component can be found at
http://symfony.com/doc/current/book/translation.html
and the code for the component can be found at
https://github.com/symfony/Translation.
It should be easy to use the Translation component, because Symfony components are intended to be able to be used as standalone components.

On another note, one of the things I don't like about gettext is that
the text is embedded into the application all over the place. That
means that the team responsible for the primary translation (usually
English) has to have access to the project source code to make changes
in all the places the default statements are placed. It's almost as
bad as applications that have SQL spaghetti-code all over.
This isn't actually true. You can have a header file (sorry, ex C programmer), such as:
<?php
define(MSG_404_NOT_FOUND, 'error.404_not_found')
?>
Then whenever you want a message, use _(MSG_404_NOT_FOUND). This is much more flexible than requiring developers to remember the exact syntax of the non-localised message every time they want to spit out a localised version.
You could go one step further, and generate the header file in a build step, maybe from CSV or database, and cross-reference with the translation to detect missing strings.

have a zend plugin that works very well for this.
<?php
/** dependencies **/
require 'Zend/Loader/Autoloader.php';
require 'Zag/Filter/CharConvert.php';
Zend_Loader_Autoloader::getInstance()->setFallbackAutoloader(true);
//filter
$filter = new Zag_Filter_CharConvert(array(
'replaceWhiteSpace' => '-',
'locale' => 'en_US',
'charset'=> 'UTF-8'
));
echo $filter->filter('ééé ááá 90');//eee-aaa-90
echo $filter->filter('óóó 10aáééé');//ooo-10aaeee
if you do not want to use the zend framework can only use the plugin.
hug!

How to convert torrent info hash for scrape?

I have a torrent hash from the magnet link. For example: fda164e7af470f83ea699a529845a9353cc26576
When I try to get information about leechers and peers I should request: http://tracker.publicbt.com/scrape?info_hash=???
How should I convert info hash for this request? Is it url encoding or becoding? how? In PHP.

It's a raw hexadecimal representation. Use pack() with H to convert it. Then URL encode it.

Got this python snippet from a colleague,
r = ''
s = 'fda164e7af470f83ea699a529845a9353cc26576'
for n in range(0, len(s), 2):
r += '%%%s' % s[n:n+2].upper()
print r
Output: %FD%A1%64%E7%AF%47%0F%83%EA%69%9A%52%98%45%A9%35%3C%C2%65%76
Works like a charm.
Edit: Works like a charm for getting the tracker to give back status 200 (ok) but still doesn't work for retrieving the torrent details...

In case someone is having trouble and comes across this thread in the future: the trick to this whole issue is to use the bool $raw_output argument of the PHP: sha1 function, setting it to "true".
The BDecode/DEncode classes can be found HERE. This project, called Trackon, also includes many other helpful classes to interact with torrent trackers and files.
So, in PHP, something like this will work to obtain the correct info hash for scraping the tracker for details:
include('./path/to/BDecode.php');
include('./path/to/BEncode.php');
function getHash($torFile){
$tfile = BDecode(file_get_contents($torFile));
$infohash = sha1(BEncode($tfile["info"]), TRUE);
return urlencode($infohash);
}
Then merely call it like so:
$hash = getHash('./path/to/.torrent');
Hope this helps someone out there. I was still scratching my head after reading many posts about how to obtain the correct info hash. I understand why this wasn't mentioned anywhere now though, this argument was added in PHP 5. If you're not running PHP 5, you will have to convert the sha1 hash to raw binary after you calculate it.

PHP - fast serialize/unserialize?

I have a PHP script that builds a binary search tree over a rather large CSV file (5MB+). This is nice and all, but it takes about 3 seconds to read/parse/index the file.
Now I thought I could use serialize() and unserialize() to quicken the process. When the CSV file has not changed in the meantime, there is no point in parsing it again.
To my horror I find that calling serialize() on my index object takes 5 seconds and produces a huge (19MB) text file, whereas unserialize() takes unbearable 27 seconds to read it back. Improvements look a bit different. ;-)
So - is there a faster mechanism to store/restore large object graphs to/from disk in PHP?
(To clarify: I'm looking for something that takes significantly less than the aforementioned 3 seconds to do the de-serialization job.)

var_export should be lots faster as PHP won't have to process the string at all:
// export the process CSV to export.php
$php_array = read_parse_and_index_csv($csv); // takes 3 seconds
$export = var_export($php_array, true);
file_put_contents('export.php', '<?php $php_array = ' . $export . '; ?>');
Then include export.php when you need it:
include 'export.php';
Depending on your web server set up, you may have to chmod export.php to make it executable first.

Try igbinary...did wonders for me:
http://pecl.php.net/package/igbinary

First you have to change the way your program works. divide CSV file to smaller chunks. This is an IP datastore i assume. .
Convert all IP addresses to integer or long.
So if a query comes you can know which part to look.
There are <?php ip2long() /* and */ long2ip(); functions to do this.
So 0 to 2^32 convert all IP addresses into 5000K/50K total 100 smaller files.
This approach brings you quicker serialization.
Think smart, code tidy ;)

It seems that the answer to your question is no.
Even if you discover a "binary serialization format" option most likely even that would be to slow for what you envisage.
So, what you may have to look into using (as others have mentioned) is a database, memcached, or on online web service.
I'd like to add the following ideas as well:
caching of requests/responses
your PHP script does not shutdown but becomes a network server to answer queries
or, dare I say it, change the data structure and method of query you are currently using

i see two options here
string serialization, in the simplest form something like
write => implode("\x01", (array) $node);
read => explode() + $node->payload = $a[0]; $node->value = $a[1] etc
binary serialization with pack()
write => pack("fnna*", $node->value, $node->le, $node->ri, $node->payload);
read => $node = (object) unpack("fvalue/nre/nli/a*payload", $data);
It would be interesting to benchmark both options and compare the results.

If you want speed, writing to or reading from the file system in less than optimal.
In most cases, a database server will be able to store and retrieve data much more efficiently than a PHP script that is reading/writing files.
Another possibility would be something like Memcached.
Object serialization is not known for its performance but for its ease of use and it's definitely not suited to handle large amounts of data.

SQLite comes with PHP, you could use that as your database. Otherwise you could try using sessions, then you don't have to serialize anything, you just saving the raw PHP object.

What about using something like JSON for a format for storing/loading the data? I have no idea how fast the JSON parser is in PHP, but it's usually a fast operation in most languages and it's a lightweight format.
http://php.net/manual/en/book.json.php

How to encrypt a string in Python and decrypt that same string in PHP?

I have a string that I would like to encrypt in Python, store it as a cookie, then in a PHP file I'd like to retrieve that cookie, and decrypt it in PHP. How would I go about doing this?
I appreciate the fast responses.
All cookie talk aside, lets just say I want to encrypt a string in Python and then decrypt a string in PHP.
Are there any examples you can point me to?

Use a standard encryption scheme. The implementation is going to be equivalent in either language.
RSA is available (via third party libraries) in both languages, if you need asymmetric key crypto. So is AES, if you need symmetric keys.

There is a good example here:
http://www.codekoala.com/blog/2009/aes-encryption-python-using-pycrypto/
Other links that may help:
http://www.phpclasses.org/browse/package/4238.html
http://www.chilkatsoft.com/p/php_aes.asp

If you're not talking about encryption but encoding to make sure the contents make it through safely regardless of quoting issues, special characters, and line breaks, I think base64 encoding is your best bet. PHP has base64_encode / decode() out of the box, and I'm sure Python has, too.
Note that base64 encoding obviously does nothing to encrypt your data (i.e. to make it unreadable to outsiders), and base64 encoded data grows by 33%.

Well, my first thought would be to use a web server that uses SSL and set the cookie's secure property to true, meaning that it will only be served over SSL connections.
However, I'm aware that this probably isn't what you're looking for.

Although a bit late. Find sample code below using the Fernet library
#Python Code - fernet 1.0 library
from cryptography.fernet import Fernet
key = b"Gm3wFh9OiQHcVc8rcAMm8IOqKOJtk7CbrGRKVhrvXhg="
f = Fernet(key)
token = f.encrypt(b'the quick brown fox jumps over the lazy hare')
print(token)
##gAAAAABiMWVPsStLo42ExcmIqcGvRvCCmnhB5B6dc2JsOm4w-VsE9oJOuk_qYuZvHv5quQR4t_6ZjNJzAdCiDPOtESNzCreJZLwc2X-_apbqKKnBwc3KhmqL-K5X7t1uR1WXuyUEYUtW
<?php
//PHP - kelvinmo/fernet-php v1.0.1 A
require 'vendor/autoload.php';
use Fernet\Fernet;
$key = "Gm3wFh9OiQHcVc8rcAMm8IOqKOJtk7CbrGRKVhrvXhg=" ;
$fernet = new Fernet($key);
$token = "gAAAAABiMWVPsStLo42ExcmIqcGvRvCCmnhB5B6dc2JsOm4w-VsE9oJOuk_qYuZvHv5quQR4t_6ZjNJzAdCiDPOtESNzCreJZLwc2X-_apbqKKnBwc3KhmqL-K5X7t1uR1WXuyUEYUtW";
echo $fernet->decode($token);
?>

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.