I receive data from a PUSH service. This data is compressed with gzcompress(). The very beginning of the data contains an int, which is the length of the data; this int is prepended after gzcompress() has run. So a sample would be:
187xœËHÍÉÉ,
Which is produced by
echo '187'.gzcompress('Hello');
I don't know the length of the int; it could be 1 digit or 10 digits. I also don't know what the first character of the compressed data is, so I can't search for it to find where the string begins.
Any ideas on how to retrieve and strip off the int?
$length_value=???
$string_value=???
Assuming that the compressed data would NEVER start with a digit, then a regex would be easiest:
$string = '187xœËHÍÉÉ,';
preg_match('/^(\d+)/', $string, $matches);
$number = $matches[1];                                // the leading digits, e.g. '187'
$compressed_data = substr($string, strlen($number));  // everything after the digits
If the compressed data DOES start with a digit, then you're going to end up with corrupt data - you'll have absolutely no way of differentiating where the 'length' value stops and the compressed data starts, e.g.
$compressed = '123foo';
$length = '6';
$your_string = '6123foo';
So is that a length of 61 with compressed data 23foo? Or 612 followed by 3foo?
You could use preg_match() to catch the integer at the start of the string.
http://php.net/manual/en/function.preg-match.php
You could do:
$contents = "187xœËHÍÉÉ,";
$length = (int)$contents;
$startingPosition = strlen((string)$length);
$original = gzuncompress(substr($contents, $startingPosition), $length);
But I feel this may fail if the first compressed byte is a digit.
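A sketch that avoids both the regex and the (int) cast: strspn() measures the run of leading digits directly. It assumes the prefix is plain ASCII digits. It also relies on gzcompress() producing a zlib stream, whose first header byte with the default settings is 0x78 (the letter x), so the payload itself should not begin with a digit. The function name splitLengthPrefix() is hypothetical.

```php
<?php
// Sketch: split a decimal length prefix from gzcompress() output.
// Assumes the prefix is ASCII digits and the payload is a zlib stream,
// which (with default settings) begins with the byte 0x78, the letter 'x'.
function splitLengthPrefix(string $data): array
{
    $digits = strspn($data, '0123456789');   // number of leading digit bytes
    if ($digits === 0) {
        throw new InvalidArgumentException('No length prefix found');
    }
    $length  = (int) substr($data, 0, $digits);
    $payload = substr($data, $digits);        // everything after the digits
    return [$length, $payload];
}

$data = '187' . gzcompress('Hello');
[$length, $payload] = splitLengthPrefix($data);
echo $length, PHP_EOL;                // 187
echo gzuncompress($payload), PHP_EOL; // Hello
```

Unlike the (int) cast approach, this keeps leading zeros in the count, so a prefix such as 0187 is still split at the right position.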
I have a hardware unit that, when asked for some data, returns a string; when exploded on spaces, it gives an array of values:
$bytes = array(
'03',
'80',
'A0',
'01' // and others, total of 240 entries
);
These actually represent bytes: 0x03, 0x80, 0xA0, 0x01. I need to transform them into their actual values.
In a loop, I have tried $value = 0x{$byte}, $value = {'0x' . $byte} and other variations, to no avail.
I also tried unpack(), but I don't know which format to use; I'm kind of clueless about bytes.
It seems like a basic issue, yet I cannot wrap my head around it.
How can I dynamically transform them into their actual integer values?
Use chr() if you want a one-byte string (convert the hex code to a number first):
$value = chr(hexdec($byte));
Use hexdec() if you want an integer:
$value = hexdec($byte);
In PHP, bytes are the same as one-character long strings, with the following escaping:
$byte = "\x03";
There is a function that can help you: chr().
This function takes as its parameter the numeric code of the byte you want to obtain. Since PHP 7, a string such as "0x03" is no longer interpreted as a hexadecimal number, so convert the hex code with hexdec() first:
$code = "03";
$byte = chr(hexdec($code));
to obtain the "\x03" byte.
On the other hand, as mentioned by @chumkiu, if you are trying to obtain integer values, the following code will work:
$code = "03";
$int = hexdec($code);
I think something like this will be sufficient:
foreach ($bytes as $byte)
{
echo hexdec($byte);
}
See also the hexdec manual.
If $string is the raw data (hex digits separated by spaces), then you can extract the binary data like this:
$binary = pack('H*',str_replace(' ','',$string));
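Putting the hexdec() and pack() answers together, and covering the unpack() part of the question: a sketch in which $response is a hypothetical sample of what the unit returns. unpack('C*') reads each byte as an unsigned 8-bit integer; because unpack() numbers its results from 1, array_values() re-indexes them from 0.

```php
<?php
// Sketch: two ways to turn a space-separated hex dump into byte values.
$response = '03 80 A0 01';   // hypothetical sample response

// 1) hexdec() on each token gives integers directly.
$ints = array_map('hexdec', explode(' ', $response));

// 2) pack('H*') builds the raw binary string; unpack('C*') reads it back
//    as unsigned chars (keys start at 1, so re-index with array_values()).
$binary = pack('H*', str_replace(' ', '', $response));
$viaUnpack = array_values(unpack('C*', $binary));

echo implode(', ', $ints), PHP_EOL;   // 3, 128, 160, 1
```

The pack()/unpack() route is handy if you also need the raw binary string; otherwise hexdec() alone is enough.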
Sorry for the title, I really didn't know how to say this...
I often have a string that needs to be cut after X characters. My problem is that this string often contains HTML entities such as &egrave;.
So I'm wondering: is there a way in PHP to know, without transforming my string, whether I am cutting it in the middle of a special character?
Example
This is my string with a special char : &egrave; - and I want it to cut in the middle of the "&egrave;" but still keeping the string intact
So right now my result with substr() would be:
This is my string with a special char : &egra
but I want to have something like this :
This is my string with a special char : &egrave;
The best thing to do here is store your string as UTF-8 without any html entities, and use the mb_* family of functions with utf8 as the encoding.
But if your string is ASCII or ISO-8859-1/Windows-1252, you can use the special HTML-ENTITIES encoding of the mbstring extension (note that this encoding is deprecated as of PHP 8.2):
$s = 'This is my string with a special char : &egrave; - and I want it to cut in the middle of the "&egrave;" but still keeping the string intact';
echo mb_substr($s, 0, 40, 'HTML-ENTITIES');
echo mb_substr($s, 0, 41, 'HTML-ENTITIES');
However, if your underlying string is UTF-8 or some other multibyte encoding, using HTML-ENTITIES is not safe! This is because HTML-ENTITIES really means "win1252 with high-bit characters as html entities". This is an example of where this can go wrong:
// Assuming that é is in utf8:
mb_substr('é ', 0, 2, 'HTML-ENTITIES') === 'é'
// should be 'é '
When your string is in a multibyte encoding, you must instead convert all html entities to a common encoding before you split. E.g.:
$strings_actual_encoding = 'UTF-8';
$s_noentities = html_entity_decode($s, ENT_QUOTES, $strings_actual_encoding);
$s_trunc_noentities = mb_substr($s_noentities, 0, 41, $strings_actual_encoding);
The best solution would be to store your text as UTF-8 instead of storing it as HTML entities. Other than that, if you don't mind the count being off (&grave; counts as one character instead of 7), then the following snippet should work:
<?php
$string = 'This is my string with a special char : &egrave; - and I want it to cut in the middle of the "&egrave;" but still keeping the string intact';
$cut_string = htmlentities(mb_substr(html_entity_decode($string, ENT_QUOTES, 'UTF-8'), 0, 45, 'UTF-8'), ENT_QUOTES, 'UTF-8')."<br><br>";
Note: If you use a different function to encode the text (e.g. htmlspecialchars()), use that function instead of htmlentities(). If you use a custom encoding function, use it instead of htmlentities(), and use a custom function that does the opposite instead of html_entity_decode().
The longest HTML 4 entity (&thetasym;) is 10 characters long, including the ampersand and semicolon; HTML5 defines some much longer names. If you intend to cut the string at X bytes, check bytes X-9 through X-1 for an ampersand. If the corresponding semicolon appears at byte X or later, cut the string after the semicolon instead of after byte X.
However, if you're willing to preprocess the string, Mike's solution will be more accurate, because it cuts the string at X characters, not bytes.
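A sketch of the byte-based check described in this answer, assuming entities of at most 10 bytes. The function name cutOutsideEntity() is hypothetical, and unlike the description it also checks that the & really starts something shaped like an entity (&name;, &#232; or &#xE8;), so a bare ampersand followed by a distant semicolon does not extend the cut.

```php
<?php
// Sketch: cut $text at $x bytes unless that would split an HTML entity.
// Assumption: entities are at most 10 bytes long (e.g. &thetasym;).
function cutOutsideEntity(string $text, int $x): string
{
    if (strlen($text) <= $x) {
        return $text;
    }
    // Look for an ampersand in bytes $x-9 .. $x-1.
    $windowStart = max(0, $x - 9);
    $window = substr($text, $windowStart, $x - $windowStart);
    $amp = strrpos($window, '&');
    if ($amp !== false) {
        $amp += $windowStart;                       // absolute offset of '&'
        // Does something shaped like &name; or &#123; start there?
        if (preg_match('/^&#?[A-Za-z0-9]+;/', substr($text, $amp, 10), $m)) {
            $end = $amp + strlen($m[0]);            // offset just after ';'
            if ($end > $x) {
                return substr($text, 0, $end);      // keep the entity whole
            }
        }
    }
    return substr($text, 0, $x);
}

$s = 'This is my string with a special char : &egrave; - and more';
echo cutOutsideEntity($s, 45), PHP_EOL;
```

Cutting at 45 bytes would land inside &egrave;, so the result is extended to end just after its semicolon.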
You can use html_entity_decode() first to decode all the HTML entities, then split your string, then use htmlentities() to re-encode the entities.
$decoded_string = html_entity_decode($original_string);
// implement logic to split string here
// then for each string part do the following:
$encoded_string_part = htmlentities($split_string_part);
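A sketch filling in the decode, cut, re-encode outline above as one function, assuming the text is UTF-8 and was encoded with htmlentities() using the same flags; truncateEntities() is a hypothetical name. Passing the flags and encoding explicitly keeps the decode and encode steps symmetric.

```php
<?php
// Sketch: decode entities, cut by characters, then re-encode.
// Assumes UTF-8 text encoded with htmlentities() and these same flags.
function truncateEntities(string $html, int $chars): string
{
    $flags   = ENT_QUOTES | ENT_HTML401;
    $decoded = html_entity_decode($html, $flags, 'UTF-8'); // "&egrave;" -> "è"
    $cut     = mb_substr($decoded, 0, $chars, 'UTF-8');    // counts characters, not bytes
    return htmlentities($cut, $flags, 'UTF-8');            // "è" -> "&egrave;"
}

$s = 'This is my string with a special char : &egrave; - and more';
echo truncateEntities($s, 41), PHP_EOL;
```

Because the cut happens on decoded text, &egrave; counts as one character and can never be split.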
A somewhat brute-force solution, which I'm not really happy with, would be a PCRE expression. Let's say that you want to pass 80 characters and the longest possible HTML entity is 7 characters long:
$regex = '~^(.{73}([^&]{7}|.{0,7}$|[^&]{0,6}&[^;]+;))(.*)~s'; // s: let . match newlines too
// Note: this can return slightly shorter text
return preg_replace($regex, '$1', $text);
Just so you know:
.{73} - 73 characters
[^&]{7} - okay, we may fill it with anything that doesn't contain &
.{0,7}$ - keep in mind the possible end (this shouldn't be necessary because shorter text wouldn't match at all)
[^&]{0,6}&[^;]+; - up to 6 characters (putting you at the 79th), then & and let the entity finish
Something that seems much better, but requires a bit of fiddling with the numbers, is:
function cutText($text, $N) { // hypothetical wrapper name
    // Check whether $text is longer than $N chars at all :)
    if (strlen($text) <= $N) {
        return $text;
    }
    $head = substr($text, 0, $N);
    // Get the last & before the cut
    $pos = strrpos($head, '&');
    // We're not young anymore, we have to check this too (no entities at all) :)
    if ($pos === false) {
        return $head;
    }
    // Get the last ; before the cut (-1 if there is none)
    $end = strrpos($head, ';');
    if ($end === false) {
        $end = -1;
    }
    // Okay, entity closed (; is after &)
    if ($end > $pos) {
        return $head;
    }
    // Now we need to find the first ; at or after the cut
    $end = strpos($text, ';', $N);
    if ($end === false) {
        // Not valid HTML, entity never closed; do whatever you want
        return $head;
    }
    return substr($text, 0, $end + 1); // keep the ;
}
Double-check the numbers; there may still be an off-by-one error somewhere in the indexes.
I think you would have to use a combination of strpos and strrpos to find the next and previous spaces, parse the text between the spaces, check that against a known list of special characters, and if it matches, extend your "cut" to the position of the next space. If you had a code sample of what you have now, we could give you a better answer.
I'm sorry if this is a very stupid question or an obvious newbie mistake, but as basic as this is, I have hardly ever used the do-while loop before (I know, I can't comprehend it myself! How did I manage to avoid it all those years?)
So: I want to select a number of words from the beginning of a text paragraph.
I used the following code:
$no_of_char = 70;
$string = $content;
$string = strip_tags(stripslashes($string)); // convert to plaintext
$string = substr($string, 0, strpos(wordwrap($string, $no_of_char), "\n"));
This kind of works, but the problem is that sometimes it gives EMPTY results.
I think that is because the paragraph contains spaces, empty lines and/or carriage returns...
So I am trying to write a loop condition that keeps trying until the string is at least X characters long:
$no_of_char = 70; // approximation - how many characters we want
$string = $content;
do {
$string = strip_tags(stripslashes($string)); // plaintext
$string = substr($string, 0, strpos(wordwrap($string, $no_of_char), "\n")); // do not crop words
}
while (strlen($string) > 8); // this would be X - and I am guessing here is my problem
Obviously it does not work (otherwise this question would not exist), and now it ALWAYS produces nothing (an empty string).
Try using str_word_count:
$words = str_word_count($string, 2);
2 - returns an associative array, where the key is the numeric position of the word inside the string and the value is the actual word itself
Then use array_slice:
$total_words = 70;
$selected_words = array_slice($words, 0, $total_words);
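To turn the selected words back into text, join them with implode(). A sketch; note that str_word_count() treats only letters plus ' and - as word characters, so digits and other punctuation do not survive. Format 1 is used here because the positions returned by format 2 are not needed.

```php
<?php
// Sketch: first N words of a string via str_word_count() and array_slice().
$string = 'The quick brown fox jumps over the lazy dog';
$total_words = 4;
$words = str_word_count($string, 1);              // format 1: list of words
$selected_words = array_slice($words, 0, $total_words);
echo implode(' ', $selected_words), PHP_EOL;      // The quick brown fox
```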
The most likely problem you have is that the string has blank lines at the start. You can easily get rid of them with ltrim(). Then use your original code to get the first actual newline.
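The reason your loop didn't work is that while (strlen($string) > 8) keeps repeating the body as long as the string is longer than 8 characters, so it keeps cutting the string down, which in practice ends with an empty string.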
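Besides leading blank lines, the original wordwrap() code has a second way to return an empty string: if the text contains no newline and is not longer than the limit, wordwrap() inserts no "\n", strpos() returns false, and substr($string, 0, false) is empty. A sketch that handles both cases without a loop; firstWords() is a hypothetical name, and it collapses all whitespace to single spaces, which may not suit every use. wordwrap() and strlen() count bytes, so multibyte text is measured in bytes.

```php
<?php
// Sketch: roughly the first $maxChars characters, without cutting words.
function firstWords(string $text, int $maxChars): string
{
    // Plain text with blank lines, tabs and CR/LF collapsed to single spaces.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($text)));
    if (strlen($text) <= $maxChars) {
        return $text;              // wordwrap() would add no "\n" here
    }
    // cut = true forces a break even inside a word longer than $maxChars,
    // so the wrapped text is guaranteed to contain a "\n".
    $wrapped = wordwrap($text, $maxChars, "\n", true);
    return substr($wrapped, 0, strpos($wrapped, "\n"));
}

echo firstWords("\n\n  The quick brown fox jumps over the lazy dog", 15), PHP_EOL;
```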
I am trying to extract a substring from a string fetched from a MySQL database, using the substr() function:
substr($mystring,$startpoint,$endpoint);
Here the start and end points can be any number.
But I am not getting the desired result. The start point works, but something is wrong with the end point.
What is the reason?
Edit
When I pass start and end points such as 15 and 50, the start point works fine: the resulting string starts at the 15th character of the main string. But the end point does not work: the last character of the resulting string is not the 50th character of the main string.
My guess is that you have mixed up the end point with the length. The third parameter of substr() is the length of the substring, that is, the number of characters from the start point, not the index of the last character.
<?php
$str = "A short string";
echo substr($str, 2, 2); // Prints sh
echo substr($str, 2, 4); // Prints shor
?>
If you want to specify an end point, you can calculate the length by subtracting the start point from the end point:
<?php
$startpoint = 2;
$endpoint = 5;
$str = "A short string";
echo substr($str, $startpoint, ($endpoint - $startpoint)); // Prints sho
?>
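If you need end positions often, a small wrapper can do the subtraction for you. A sketch with a hypothetical substrRange() helper; like the answer above, it treats the end point as exclusive, so pass $end + 1 if you want the character at $end included. Offsets are 0-based byte positions.

```php
<?php
// Sketch: hypothetical helper that takes an end position instead of a length.
// $end is exclusive; pass $end + 1 for an inclusive end.
function substrRange(string $s, int $start, int $end): string
{
    if ($end <= $start) {
        return '';
    }
    return substr($s, $start, $end - $start);
}

$str = 'A short string';
echo substrRange($str, 2, 5), PHP_EOL;   // sho
echo substrRange($str, 2, 7), PHP_EOL;   // short
```

With the question's values, substrRange($mystring, 15, 50) returns 35 characters, ending just before offset 50.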
The third parameter specifies the length, counted from the start point.
But you want the string up to the position given in the third parameter, so the two are different.
Try the following; it will work:
substr($string,$startpoint,($endpoint-$startpoint));
You should compute $endpoint - $startpoint and pass the result as the third parameter to get the desired output, like this:
substr($mystring,$startpoint,($endpoint-$startpoint));
string substr ( string $string , int $start [, int $length ] )
length, not endpoint.
http://php.net/manual/en/function.substr.php
string substr (string $string, int $start [, int $length])
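That means the third argument is a length, so an end point has to be converted first:
<?php
$text = "foobar text";
$startpoint = 0;
$endpoint = $startpoint + strlen($text);                    // position just past the last character
echo substr($text, $startpoint, $endpoint - $startpoint);   // third argument: a length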
?>
I am building a real estate feed for a portal, and it says the maximum length of a string is 20,000 bytes (20 KB). I have never run across this before.
How can I measure the byte size of a varchar string, so that I can then use a while loop to trim it down?
You can use mb_strlen() to get the byte length by passing an encoding in which every character is a single byte; that way you don't have to worry about whether the string is multibyte or single-byte.
For example, as drake127 says in a comment on the mb_strlen manual page, you can use the '8bit' encoding:
<?php
$string = 'Cién cañones por banda';
echo mb_strlen($string, '8bit');
?>
You can run into problems with strlen() because PHP has an option (mbstring.func_overload) to overload strlen() so that it actually calls mb_strlen(). See more info about it at http://php.net/manual/en/mbstring.overload.php (the option was deprecated in PHP 7.2 and removed in PHP 8.0).
To trim the string by byte length without splitting it in the middle of a multibyte character, you can use:
mb_strcut(string $str, int $start [, int $length [, string $encoding ]] )
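A sketch of trimming to the feed's 20,000-byte limit with mb_strcut(), assuming the text is UTF-8; limitBytes() is a hypothetical name. mb_strcut() counts in bytes but backs off to the previous character boundary, so the result can be a byte or so shorter than the limit but does not end in a partial character.

```php
<?php
// Sketch: keep at most $maxBytes bytes of UTF-8 text, never splitting a character.
function limitBytes(string $text, int $maxBytes): string
{
    // mb_strlen($text, '8bit') is a byte count even when
    // mbstring.func_overload has replaced strlen() (PHP < 8).
    if (mb_strlen($text, '8bit') <= $maxBytes) {
        return $text;
    }
    return mb_strcut($text, 0, $maxBytes, 'UTF-8');
}

$s = 'boršč';                       // š and č take 2 bytes each in UTF-8
echo mb_strlen($s, '8bit'), ' ', mb_strlen($s, 'UTF-8'), PHP_EOL; // 7 5
echo limitBytes($s, 6), PHP_EOL;    // borš (6 bytes would split č)
```

For the feed, limitBytes($description, 20000) would replace the trimming while loop entirely.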
You have to figure out whether the string is ASCII-encoded or uses a multibyte encoding.
In the former case, you can just use strlen().
In the latter case you need to find the number of bytes per character.
The strlen documentation gives an example of how to do it: http://www.php.net/manual/en/function.strlen.php#72274
Do you mean byte size or string length?
Byte size is measured with strlen(), whereas string length is queried using mb_strlen(). You can use substr() to trim a string to X bytes (note that this will break the string if it has a multi-byte encoding - as pointed out by Darhazer in the comments) and mb_substr() to trim it to X characters in the encoding of the string.
PHP's strlen() function returns the number of bytes, not characters; the two are equal only for single-byte text such as ASCII.
strlen('borsc') -> 5 (bytes)
strlen('boršč') -> 7 (bytes)
$limit_in_bytes = 20000;
$pointer = 0;
// Note: chunks are cut by bytes, so a multibyte character may be split
while (strlen($your_string) > (($pointer + 1) * $limit_in_bytes)) {
    $str_to_handle = substr($your_string, ($pointer * $limit_in_bytes), $limit_in_bytes);
    // here you can handle each full-size part of the string
    $pointer++;
}
$str_to_handle = substr($your_string, ($pointer * $limit_in_bytes), $limit_in_bytes);
// here you can handle the last part of the string
Or you can use a function like this:
function parseStrToArr($string, $limit_in_bytes) {
    $ret = array();
    $pointer = 0;
    while (strlen($string) > (($pointer + 1) * $limit_in_bytes)) {
        $ret[] = substr($string, ($pointer * $limit_in_bytes), $limit_in_bytes);
        $pointer++;
    }
    $ret[] = substr($string, ($pointer * $limit_in_bytes), $limit_in_bytes);
    return $ret;
}
$arr = parseStrToArr($your_string, 20000);
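The same fixed-size chunking is also available as the built-in str_split(), which takes the chunk length as its second argument. A sketch with a hypothetical 45,000-byte payload; like the loop above, it counts bytes, so a multibyte character can end up split across two chunks.

```php
<?php
// Sketch: str_split() cuts a string into fixed-size byte chunks.
$your_string = str_repeat('a', 45000);     // hypothetical 45,000-byte payload
$chunks = str_split($your_string, 20000);  // chunk length in bytes
echo count($chunks), PHP_EOL;              // 3
echo strlen($chunks[2]), PHP_EOL;          // 5000 (the last, shorter chunk)
```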
Further to PhoneixS's answer about getting the correct length of a string in bytes: since mb_strlen() is slower than strlen(), for best performance you can check the mbstring.func_overload ini setting so that mb_strlen() is used only when it is really required:
$content_length = ini_get('mbstring.func_overload') ? mb_strlen($content , '8bit') : strlen($content);