Alternative for htmlentities($string,ENT_SUBSTITUTE) - php

I have a bit of a stupid question.
I am currently building a website for a company on a server that runs a rather outdated PHP version (5.2.17). I have a database in which many fields are varchar containing characters like 'é ä è ê' and so on, which I have to display in an HTML page.
Because the PHP version is outdated (and I am not allowed to update it, since other parts of the site that I have no access to must keep working), I can't use the htmlentities function with the ENT_SUBSTITUTE flag, which was only added in version 5.4.
So my question is:
Is there an alternative to
htmlentities($string,ENT_SUBSTITUTE); or do I have to write a function
myself covering all kinds of strange characters, which would be incomplete anyway?

Define a function that handles ill-formed byte sequences and call it before passing the string to htmlentities. There are various ways to define such a function.
First, try UConverter::transcode if you are not on Windows.
http://pecl.php.net/package/intl
If you are willing to handle bytes directly, see my previous answer.
https://stackoverflow.com/a/13695364/531320
The last option is to develop a PHP extension. Thanks to php_next_utf8_char, it's not hard.
Here is a code sample. The name "scrub" comes from Ruby 2.1 (see Equivalent of Iconv.conv("UTF-8//IGNORE",...) in Ruby 1.9.X?).
// header file
// PHP_FUNCTION(utf8_scrub);
#include "ext/standard/html.h"
#include "ext/standard/php_smart_str.h"
const zend_function_entry utf8_string_functions[] = {
    PHP_FE(utf8_scrub, NULL)
    PHP_FE_END
};
PHP_FUNCTION(utf8_scrub)
{
    char *str = NULL;
    int len, status;
    size_t pos = 0, old_pos;
    unsigned int code_point;
    smart_str buf = {0};
    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &str, &len) == FAILURE) {
        return;
    }
    while (pos < (size_t) len) {
        old_pos = pos;
        // Advances pos past the next character, or past the ill-formed bytes.
        code_point = php_next_utf8_char((const unsigned char *) str, len, &pos, &status);
        if (status == FAILURE) {
            // Ill-formed sequence: emit U+FFFD REPLACEMENT CHARACTER instead.
            smart_str_appendl(&buf, "\xEF\xBF\xBD", 3);
        } else {
            smart_str_appendl(&buf, str + old_pos, pos - old_pos);
        }
    }
    if (buf.c == NULL) {
        // Empty input: nothing was appended, so there is no buffer to return.
        RETURN_EMPTY_STRING();
    }
    smart_str_0(&buf);
    // dup = 0 hands buf.c over to the return value, so it must not be freed here.
    RETURN_STRINGL(buf.c, buf.len, 0);
}
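For readers who want to see the same idea without the PHP internals, here is a minimal standalone C sketch. It is not the code behind php_next_utf8_char; the names utf8_seq_len and scrub_utf8 are made up for this illustration, and it assumes that every offending byte is replaced by its own U+FFFD, which may differ from how PHP groups ill-formed bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length (1..4) of the well-formed UTF-8 sequence at s, or 0 if ill-formed. */
size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned char lo = 0x80, hi = 0xBF;    /* allowed range of the 2nd byte */
    size_t len, i;

    if (n == 0) return 0;
    if (s[0] < 0x80) return 1;             /* ASCII */
    if (s[0] >= 0xC2 && s[0] <= 0xDF) len = 2;
    else if (s[0] >= 0xE0 && s[0] <= 0xEF) len = 3;
    else if (s[0] >= 0xF0 && s[0] <= 0xF4) len = 4;
    else return 0;                         /* stray continuation, C0/C1, F5..FF */

    if (s[0] == 0xE0) lo = 0xA0;           /* reject overlong 3-byte forms */
    if (s[0] == 0xED) hi = 0x9F;           /* reject UTF-16 surrogates */
    if (s[0] == 0xF0) lo = 0x90;           /* reject overlong 4-byte forms */
    if (s[0] == 0xF4) hi = 0x8F;           /* reject values above U+10FFFF */

    if (len > n) return 0;                 /* truncated sequence */
    if (s[1] < lo || s[1] > hi) return 0;
    for (i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;
    return len;
}

/*
 * Copies in[0..n) to out, replacing each offending byte with U+FFFD.
 * out must have room for 3 * n + 1 bytes. Returns the number of bytes
 * written, not counting the terminating NUL.
 */
size_t scrub_utf8(const char *in, size_t n, char *out)
{
    const unsigned char *s = (const unsigned char *) in;
    size_t pos = 0, w = 0;

    assert(in != NULL && out != NULL);
    while (pos < n) {
        size_t len = utf8_seq_len(s + pos, n - pos);
        if (len == 0) {
            memcpy(out + w, "\xEF\xBF\xBD", 3);   /* U+FFFD */
            w += 3;
            pos += 1;
        } else {
            memcpy(out + w, s + pos, len);
            w += len;
            pos += len;
        }
    }
    out[w] = '\0';
    return w;
}
```

Calling scrub_utf8("caf\xE9", 4, out) on a Latin-1 encoded "café" writes "caf" followed by the bytes EF BF BD, while well-formed input such as "caf\xC3\xA9" is copied through unchanged.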

You don't need ENT_SUBSTITUTE if your encoding is handled correctly.
If the characters in your database are UTF-8, stored as UTF-8, read as UTF-8 and displayed to the user as UTF-8, there should be no problem.

Just add
if (!defined('ENT_SUBSTITUTE')) define('ENT_SUBSTITUTE', 0);
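and you'll be able to pass ENT_SUBSTITUTE to htmlentities. Note that with a value of 0 the flag has no effect on the old PHP version; it only keeps the call from tripping over an undefined constant.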

Related

Store output of php_execute_script

How can I store the output of php_execute_script() in a character array? php_execute_script() only prints the output.
Here's what I've got:
PHP_EMBED_START_BLOCK(argc, argv)
    char string[200];
    setbuf(stdout, string);
    zend_file_handle file_handle;
    zend_stream_init_filename(&file_handle, "say.php");
    fflush(stdout);
    if (php_execute_script(&file_handle) == FAILURE) {
        php_printf("Failed to execute PHP script.\n");
    }
    setbuf(stdout, NULL);
    printf("\"%s\" is what say.php has to say.\n", string);
PHP_EMBED_END_BLOCK()
I've tried redirecting stdout to string, but it doesn't even look like php_execute_script is actually writing to stdout! It just ignores it.
I'm trying to communicate with the PHP embedded script ("say.php") from C using the PHP SAPI after building PHP with embedding enabled.
Thinking that there's no such thing, I made an issue:
https://github.com/php/php-src/issues/9330
#include <sapi/embed/php_embed.h>
#include <string.h>
char *string[999] = {NULL};
int i = 0;
static size_t embed_ub_write(const char *str, size_t str_length){
    // Copy the chunk: str is not guaranteed to stay valid (or to be
    // NUL-terminated) once this callback returns.
    // Keep the last slot NULL as the terminator for the print loop.
    if (i < 998) {
        string[i] = strndup(str, str_length); i++;
    }
    return str_length;
}
int main(int argc, char **argv)
{
    php_embed_module.ub_write = embed_ub_write;
    PHP_EMBED_START_BLOCK(argc, argv)
        zend_file_handle file_handle;
        zend_stream_init_filename(&file_handle, "say.php");
        if (php_execute_script(&file_handle) == FAILURE) {
            php_printf("Failed to execute PHP script.\n");
        }
        for (i = 0; string[i] != NULL; i++) {
            printf("%s", string[i]);
        }
    PHP_EMBED_END_BLOCK()
    return 0;
}
You need to set php_embed_module.ub_write to a callback function, which is then called every time PHP writes output (for example, on echo).
It's weird how PHP doesn't write to stdout...
EDIT: As it turns out, it does. You just need to use pipe.
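The EDIT mentions a pipe without showing it. One common way to do this on POSIX systems is to point file descriptor 1 at a pipe while the writer runs and read the result back afterwards. The following is a sketch under assumptions: capture_stdout is a hypothetical helper, demo_writer is a stand-in for the php_execute_script() call, and it has not been tried against the embed SAPI.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Stand-in for php_execute_script(): anything that writes to stdout. */
void demo_writer(void)
{
    printf("Hello from say.php\n");
}

/*
 * Runs fn() with stdout temporarily redirected into a pipe and copies what
 * was written into out (NUL-terminated). Returns the byte count, or -1.
 * Output beyond cap - 1 bytes is dropped, and output larger than the
 * kernel's pipe buffer would block, so this only suits small outputs.
 */
long capture_stdout(void (*fn)(void), char *out, size_t cap)
{
    int fds[2];
    int saved;
    ssize_t n;

    assert(fn != NULL && out != NULL && cap > 0);

    fflush(stdout);                       /* do not capture earlier output */
    saved = dup(STDOUT_FILENO);           /* remember the real stdout */
    if (saved < 0)
        return -1;
    if (pipe(fds) < 0) {
        close(saved);
        return -1;
    }
    if (dup2(fds[1], STDOUT_FILENO) < 0) {
        close(fds[0]);
        close(fds[1]);
        close(saved);
        return -1;
    }
    close(fds[1]);                        /* fd 1 is now the only write end */

    fn();
    fflush(stdout);                       /* push stdio's buffer into the pipe */

    dup2(saved, STDOUT_FILENO);           /* restore stdout, closing the write end */
    close(saved);

    n = read(fds[0], out, cap - 1);       /* one read is enough for small outputs */
    close(fds[0]);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return (long) n;
}
```

In the embed program, the php_execute_script() call would take the place of demo_writer(); for larger outputs the pipe would have to be drained from another thread or process while the script runs.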

how to duplicate openssl_encrypt?

I was hoping someone had already implemented this in golang, as I am far from good at cryptography. However, in porting a project from php to golang I have run into an issue with porting the openssl_encrypt method found here. I have also dug into the source code a little, to no avail.
Here is the method I have implemented in golang, which gives me the output
lvb7JwaI4OCYUrdJMm8Q9uDd9rIILnvbZKJb/ozFbwCmLKkxoJN5Zf/ODOJ/RGq5
Here is the output I need when using php.
lvb7JwaI4OCYUrdJMm8Q9uDd9rIILnvbZKJb/ozFbwDV98XaJjvzEjBQp7jc+2DH
And here is the function I used to generate it with php.
$data = "This is some text I want to encrypt";
$method = "aes-256-cbc";
$password = "This is a really long key and su";
$options = 0;
$iv = "MMMMMMMMMMMMMMMM";
echo openssl_encrypt($data, $method, $password, $options, $iv);
To me it looks like it is very close and I must be missing something obvious.
You were very close, but you had the padding wrong. According to this answer (and the PHP docs), PHP uses the default OpenSSL padding behavior, which is to use the required number of padding bytes as the padding byte value.
The only change I made was:
copy(plaintextblock[length:], bytes.Repeat([]byte{uint8(extendBlock)}, extendBlock))
You can see the full updated code here.
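To make the padding rule concrete outside of Go, here is a minimal C sketch of the scheme described above (commonly called PKCS#7 padding). The function name pkcs7_pad and its signature are made up for this illustration; it only pads and does not encrypt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Pads in[0..len) to a multiple of block (1..255): append pad bytes, each
 * holding the value pad, where pad is in 1..block. Input that is already
 * block-aligned therefore gets a full extra block.
 * out must have room for len + block bytes. Returns the padded length.
 */
size_t pkcs7_pad(const unsigned char *in, size_t len, size_t block,
                 unsigned char *out)
{
    size_t pad;

    assert(in != NULL && out != NULL);
    assert(block >= 1 && block <= 255);

    pad = block - (len % block);          /* always 1..block, never 0 */
    memcpy(out, in, len);
    memset(out + len, (int) pad, pad);
    return len + pad;
}
```

For the 35-byte plaintext from the question and the 16-byte AES block size, pad is 13, so the padded input is 48 bytes long. That matches the 64 base64 characters of the expected output, and it explains why only the last block of the two outputs differs.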
Others beat me to the answer while I was playing with it, but I have a "better" fixed version of your example code that also takes into account that padding is always required (at least to emulate what the php code does).
It also shows the openssl command line that you'd use to do the same thing, and if available runs it (of course the playground won't).
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/base64"
	"fmt"
	"log"
	"os/exec"
	"strings"
)

func main() {
	const input = "This is some text I want to encrypt"
	fmt.Println(opensslCommand(input))
	fmt.Println(aesCBCenctypt(input))
}

func aesCBCenctypt(input string) string {
	// Of course real IVs should be from crypto/rand
	iv := []byte("MMMMMMMMMMMMMMMM")
	// And real keys should be from something like PBKDF2, RFC 2898.
	// E.g. use golang.org/x/crypto/pbkdf2 to turn a
	// "passphrase" into a key.
	key := []byte("This is a really long key and su")
	// Make sure the data size is a multiple of aes.BlockSize.
	// Pad to aes.BlockSize using the pad length as the padding
	// byte. pad is always in 1..aes.BlockSize, so input that would
	// otherwise need no padding gets an entire extra block.
	pad := aes.BlockSize - len(input)%aes.BlockSize
	data := make([]byte, len(input)+pad)
	copy(data, input)
	for i := len(input); i < len(input)+pad; i++ {
		data[i] = byte(pad)
	}
	cb, err := aes.NewCipher(key)
	if err != nil {
		log.Fatalln("error NewCipher():", err)
	}
	mode := cipher.NewCBCEncrypter(cb, iv)
	mode.CryptBlocks(data, data)
	return base64.StdEncoding.EncodeToString(data)
}

// Just for comparison, don't do this for real!
func opensslCommand(input string) string {
	iv := []byte("MMMMMMMMMMMMMMMM")
	key := []byte("This is a really long key and su")
	args := []string{"enc", "-aes-256-cbc", "-base64"}
	// "-nosalt", "-nopad"
	args = append(args, "-iv", fmt.Sprintf("%X", iv))
	args = append(args, "-K", fmt.Sprintf("%X", key))
	cmd := exec.Command("openssl", args...)
	// Show how you could do this via the command line:
	fmt.Println("Command:", strings.Join(cmd.Args, " "))
	cmd.Stdin = strings.NewReader(input)
	result, err := cmd.CombinedOutput()
	if err != nil {
		if e, ok := err.(*exec.Error); ok && e.Err == exec.ErrNotFound {
			// openssl not available
			return err.Error() // XXX
		}
		// some other error, show it and the (error?) output and die
		fmt.Println("cmd error:", err)
		log.Fatalf("result %q", result)
	}
	// Strip trailing '\n' (if any) and return it.
	if n := len(result) - 1; n >= 0 && result[n] == '\n' {
		result = result[:n]
	}
	return string(result)
}
Playground

PHP extension seg fault when modifying a zval by reference

PHP:
$publickey = pack('H*', "03ca473d3c0cccbf600d1c89fa33b7f6b1f2b4c66f1f11986701f4b6cc4f54c360");
$pubkeylen = strlen($publickey);
$result = secp256k1_ec_pubkey_decompress($publickey, $pubkeylen);
C extension:
PHP_FUNCTION(secp256k1_ec_pubkey_decompress) {
    secp256k1_start(SECP256K1_START_SIGN);
    zval *pubkey, *pubkeylen;
    unsigned char* newpubkey;
    int newpubkeylen;
    int result;
    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "zz", &pubkey, &pubkeylen) == FAILURE) {
        return;
    }
    newpubkey = Z_STRVAL_P(pubkey);
    newpubkeylen = Z_LVAL_P(pubkeylen);
    result = secp256k1_ec_pubkey_decompress(newpubkey, &newpubkeylen);
    if (result == 1) {
        newpubkey[newpubkeylen] = 0U;
        ZVAL_STRINGL(pubkey, newpubkey, newpubkeylen, 0);
        ZVAL_LONG(pubkeylen, newpubkeylen);
    }
    RETURN_LONG(result);
}
The $publickey is decompressed from a 33-byte to a 65-byte string; for whatever reason, we get a segmentation fault when doing this.
I assume we're doing something structurally wrong, considering this is our first PHP extension.
full code; https://github.com/afk11/secp256k1-php
After looking at your extension code: you haven't linked the actual secp256k1 library (.so) while building your extension. (#include "secp256k1.h" only pulls in the declarations; it does not include the bitcoin/secp256k1 C library code itself.)
You need to change your config.m4 in one of the following ways:
1) Link against the installed library by adding "-L/path/to/bitcoin/secp256k1/lib" (the library search path) and "-lsecp256k1" (the library itself) to the "gcc" options.
Help: once you run "make install" on bitcoin/secp256k1, the libraries are installed to /usr/lib, /usr/lib64, /usr/lib/secp256k1 or similar.
// i.e. something like...
PHP_NEW_EXTENSION(secp256k1, secp256k1.c, $ext_shared,, "-lsecp256k1 -DZEND_ENABLE_STATIC_TSRMLS_CACHE=1")
2) Or, include the actual *.c files from the secp256k1 library:
PHP_NEW_EXTENSION(secp256k1, secp256k1.c ../secp256k1/src/secp256k1.c ../secp256k1/src/others.c, $ext_shared,, -DZEND_ENABLE_STATIC_TSRMLS_CACHE=1)
I would recommend option 1.

How to change the source code when called zend_compile_file() before it is parsed?

I am replacing zend_compile_string() in order to pre-process code, which I then pass on to the original zend_compile_string().
It works.
But I also need to pre-process code that comes from files (include/require, php <file.php>).
Zend provides the ability to replace zend_compile_file(), but the source text is only available inside open_file_for_scanning() (zend_stream_fixup(file_handle, &buf, &size TSRMLS_CC)), which is not accessible from an extension.
How can I change the code before it is sent to zendparse()?
Edit: I found a solution:
zend_op_array* ext_zend_compile_file(zend_file_handle *file_handle, int type TSRMLS_DC)
{
    char *buf;
    size_t size;
    if (zend_stream_fixup(file_handle, &buf, &size TSRMLS_CC) == FAILURE) {
        return NULL;
    }
    char *res;
    size_t res_size;
    // My code reads file_handle->handle.stream.mmap.buf/len (read-only) and fills res/res_size
    file_handle->handle.stream.mmap.buf = res;
    file_handle->handle.stream.mmap.len = res_size;
    return ext_orig_zend_compile_file(file_handle, type TSRMLS_CC);
}

PHP extension for Linux: reality check needed!

Okay, I've written my first functional PHP extension. It worked but it was a proof-of-concept only. Now I'm writing another one which actually does what the boss wants.
What I'd like to know, from all you PHP-heads out there, is whether this code makes sense. Have I got a good grasp of things like emalloc and the like, or is there stuff there that's going to turn around later and try to bite my hand off?
Below is the code for one of the functions. It returns a base64 of a string that has also been Blowfish encrypted. When the function is called, it is supplied with two strings, the text to encrypt and encode, and the key for the encryption phase. It's not using PHP's own base64 functions because, at this point, I don't know how to link to them. And it's not using PHP's own mcrypt functions for the same reason. Instead, it links in the SSLeay BF_ecb_encrypt functions.
PHP_FUNCTION(Blowfish_Base64_encode)
{
    char *psData = NULL;
    char *psKey = NULL;
    int argc = ZEND_NUM_ARGS();
    int psData_len;
    int psKey_len;
    char *Buffer = NULL;
    char *pBuffer = NULL;
    char *Encoded = NULL;
    BF_KEY Context;
    int i = 0;
    unsigned char Block[ 8 ];
    unsigned char * pBlock = Block;
    char *plaintext;
    int plaintext_len;
    int cipher_len = 0;
    if (zend_parse_parameters(argc TSRMLS_CC, "ss", &psData, &psData_len, &psKey, &psKey_len) == FAILURE)
        return;
    Buffer = (char *) emalloc( psData_len * 2 );
    pBuffer = Buffer;
    Encoded = (char *) emalloc( psData_len * 4 );
    BF_set_key( &Context, psKey_len, psKey );
    plaintext = psData;
    plaintext_len = psData_len;
    for (;;)
    {
        if (plaintext_len--)
        {
            Block[ i++ ] = *plaintext++;
            if (i == 8 )
            {
                BF_ecb_encrypt( Block, pBuffer, &Context, BF_ENCRYPT );
                pBuffer += 8;
                cipher_len += 8;
                memset( Block, 0, 8 );
                i = 0;
            }
        } else {
            BF_ecb_encrypt( Block, pBuffer, &Context, BF_ENCRYPT );
            cipher_len += 8;
            break;
        }
    }
    b64_encode( Encoded, Buffer, cipher_len );
    RETURN_STRINGL( Encoded, strlen( Encoded ), 0 );
}
You'll notice that I have two emalloc calls, for Encoded and for Buffer. Only Encoded is passed back to the caller, so I'm concerned that Buffer won't be freed. Is that the case? Should I use malloc/free for Buffer?
If there are any other glaring errors, I'd really appreciate knowing.
emalloc() allocates memory per request, and it is freed automatically when the request ends.
You should, however, compile PHP with
--enable-debug --enable-maintainer-zts
It will tell you if anything goes wrong (it can detect memory leaks if you've used the e*() functions and report_memleaks is set in your php.ini).
And yes, you should efree() Buffer.
You'll notice that I have two emalloc calls, for Encoded and for Buffer. Only Encoded is passed back to the caller, so I'm concerned that Buffer won't be freed. Is that the case? Should I use malloc/free for Buffer?
Yes, you should free it with efree before returning.
Although PHP has a safety net, and memory allocated with emalloc will be freed at the end of the request, leaking memory is still a bug, and you will be warned about it if you run a debug build with report_memleaks = On.
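Separately from the efree question, the buffer sizes may deserve a second look. Reading the loop in the question, it always writes one final 8-byte block, so the ciphertext takes 8 * (n / 8 + 1) bytes, which is more than psData_len * 2 for very short inputs. Below is a sketch of that arithmetic in plain C. The helper names are hypothetical, and the base64 figure assumes that b64_encode produces standard padded base64 plus a terminating NUL, which the question does not state.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes the question's loop writes to Buffer: one 8-byte block per full
 * 8 input bytes, plus one final block that is always emitted.
 */
size_t bf_ecb_output_len(size_t plain_len)
{
    assert(plain_len <= (size_t) -1 - 8);     /* guard against overflow */
    return (plain_len / 8 + 1) * 8;
}

/* Bytes needed for padded base64 of n input bytes, plus a trailing NUL. */
size_t base64_buf_len(size_t n)
{
    assert(n < (size_t) -1 / 2);              /* guard against overflow */
    return 4 * ((n + 2) / 3) + 1;
}
```

With a 3-byte input, for example, the loop writes 8 bytes into a 6-byte Buffer, and the encoded result needs 13 bytes where Encoded has only 12. Sizing the two allocations from these helpers instead of psData_len * 2 and psData_len * 4 would avoid that, under the assumptions stated above.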
