I am sorry if this is a very stupid question or an obvious newbie mistake, but as basic as this is, I have hardly ever used the do-while loop before (I know, I cannot comprehend it myself! How did I manage to avoid it all those years?)
So:
I want to select a number of words from the beginning of a text paragraph.
I used the following code:
$no_of_char = 70;
$string = $content;
$string = strip_tags(stripslashes($string)); // convert to plaintext
$string = substr($string, 0, strpos(wordwrap($string, $no_of_char), "\n"));
This kind of works, but the problem is that sometimes it gives EMPTY results.
I think that is because the paragraph contains spaces, empty lines, and/or carriage returns.
So I am trying to write a loop that keeps trying until the length of the string is at least X characters:
$no_of_char = 70; // approximation - how many characters we want
$string = $content;
do {
$string = strip_tags(stripslashes($string)); // plaintext
$string = substr($string, 0, strpos(wordwrap($string, $no_of_char), "\n")); // do not crop words
}
while (strlen($string) > 8); // this would be X - and I am guessing here is my problem
Well, obviously it does not work (otherwise this question would not exist), and now it ALWAYS produces an empty string.
Try using str_word_count:
$words = str_word_count($string, 2);
2 - returns an associative array, where the key is the numeric position of the word inside the string and the value is the actual word itself
Then use array_slice:
$total_words = 70;
$selected_words = array_slice($words, 0, $total_words);
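Putting the two calls together, a minimal sketch might look like this. The sample sentence and the limit of 3 words are made up for illustration. Note that str_word_count() only treats letters, apostrophes and hyphens as word characters by default, so punctuation such as the comma is dropped.

```php
<?php
// Sample input; the sentence and the word limit are illustrative assumptions.
$string = 'Hello brave new world, this is a test';

// Format 2: array keyed by each word's position in the string.
$words = str_word_count($string, 2);

// Take the first 3 words (array_slice reindexes the numeric keys).
$selected_words = array_slice($words, 0, 3);

// Join the words back into a single string.
$excerpt = implode(' ', $selected_words);
echo $excerpt, "\n"; // Hello brave new
```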
The most likely problem you have is that the string has blank lines at the start. You can easily get rid of them with ltrim(). Then use your original code to get the first actual newline.
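The reason your loop didn't work is that the condition tells it to keep going for as long as the string is longer than 8 characters. On the second pass the string is already shorter than $no_of_char, so wordwrap() adds no newline, strpos() returns false, and substr() returns an empty string.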
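As a rough sketch of that advice, no loop is needed at all: clean up the whitespace first, then handle the case where wordwrap() inserts no newline. The function name excerpt() and the test sentence are hypothetical, and collapsing all whitespace runs with a regex is one possible choice in place of ltrim().

```php
<?php
// Hypothetical helper: the name and the default limit are illustrative only.
function excerpt(string $content, int $limit = 70): string
{
    // Plain text, with whitespace runs (blank lines included) collapsed to one space.
    $text = strip_tags(stripslashes($content));
    $text = trim(preg_replace('/\s+/', ' ', $text));

    // Short enough already: wordwrap() would add no newline, so return as is.
    if (strlen($text) <= $limit) {
        return $text;
    }

    // wordwrap() breaks at the last space before $limit; keep the first line only.
    $wrapped = wordwrap($text, $limit, "\n");
    $pos = strpos($wrapped, "\n");

    return $pos === false ? $wrapped : substr($wrapped, 0, $pos);
}

echo excerpt("\n\n<p>The quick brown fox jumps over the lazy dog</p>", 20), "\n";
// The quick brown fox
```

The explicit `$pos === false` check is what avoids the empty result: strpos() returning false would otherwise be used as a length of 0.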
I have a string that looks something like this:
abc-def-ghi-jkl-mno-pqr-stu-vwx-yz I'd like to get the content BEFORE the 4th dash, so effectively, I'd like to get abc-def-ghi-jkl assigned to a new string, then I'd like to get mno assigned to a different string.
How could I go about doing this? I tried using explode but that changed it to an array and I didn't want to do it that way.
Try this:
$n = 4; //nth dash
$str = 'abc-def-ghi-jkl-mno-pqr-stu-vwx-yz';
$pieces = explode('-', $str);
$part1 = implode('-', array_slice($pieces, 0, $n));
$part2 = $pieces[$n];
echo $part1; //abc-def-ghi-jkl
echo $part2; //mno
See demo
http://php.net/manual/en/function.array-slice.php
http://php.net/manual/en/function.explode.php
http://php.net/manual/en/function.implode.php
Can you add your source code? I have done this before, but I can't remember the exact code I used. I am pretty sure I used explode, and you can't avoid using an array.
EDIT: Mark M's answer is right.
you could try using substr as another possible solution
http://php.net/manual/en/function.substr.php
If I see where you are trying to get with this you could also go onto substr_replace
I guess an alternative to explode would be to find the position of the 4th - in the string and then get a substring from the start of the string up to that character.
You can find the position using a loop with the method explained in "find the second occurrence of a char in a string php" and then use substr($string, 0, $pos) to get the substring.
$string = "abc-def-ghi-jkl-mno-pqr-stu-vwx-yz";
$pos = -1;
for($i=0;$i<4;$i++)
$pos = strpos($string, '-', $pos+1);
echo substr($string, 0, $pos);
Code isn't tested but the process is easy to understand. You start at the first character (0), find a - and on the next loop you start at that position +1. The loop repeats it for a set number of times and then you get the substring from the start to that last - you found.
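A slightly more defensive sketch of the same loop is shown below. The helper name nth_pos() is made up for this example; it returns false when the string has fewer than $n occurrences, and the same positions are then used to pull out the piece after the 4th dash.

```php
<?php
// Hypothetical helper: position of the $n-th occurrence of $needle, or false.
function nth_pos(string $haystack, string $needle, int $n)
{
    $pos = -1;
    for ($i = 0; $i < $n; $i++) {
        // Continue searching just after the previous match.
        $pos = strpos($haystack, $needle, $pos + 1);
        if ($pos === false) {
            return false; // fewer than $n occurrences
        }
    }
    return $pos;
}

$string = 'abc-def-ghi-jkl-mno-pqr-stu-vwx-yz';
$pos    = nth_pos($string, '-', 4);

// Everything before the 4th dash.
$before = substr($string, 0, $pos);

// The piece between the 4th and 5th dash.
$next  = strpos($string, '-', $pos + 1);
$after = substr($string, $pos + 1, $next - $pos - 1);

echo $before, "\n"; // abc-def-ghi-jkl
echo $after, "\n";  // mno
```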
This may be a dupe, but I cannot seem to find a thread which matches this issue. I want to remove all chars from a string after a given sub-string - but the chars and the number of chars after the sub-string is unknown. Most solutions I have found seem to only work for removing the given sub-string itself or a fixed length after a given sub-string.
I have
$str = preg_replace('(.gif*)','.gif$',$str);
Which locates 'blahblah.gif?12345' ok, but I cannot seem to remove the chars after the sub-string '.gif'. I read that $ denotes EOS so I thought this would work, but apparently not. I also tried
'.gif$/'
and simply
'.gif'
It can be done without regex:
echo substr('blahblah.gif?12345', strpos('blahblah.gif?12345', '.gif') + 4);
// returns ?12345; the + 4 is the length of the substring '.gif'
So the code is:
$str = 'original string';
$match = 'matching string';
$output = substr($str, strpos($str, $match) + strlen($match));
Ok, now I'm not sure if you want to keep the first or the second part of the string. Anyway, here's the code for keeping the first part:
echo substr('blahblah.gif?12345', 0, strpos('blahblah.gif?12345', '.gif') + 4);
// returns blahblah.gif; the 0 start offset is the key
And the full code:
$str = 'original string';
$match = 'matching string';
$output = substr($str, 0, strpos($str, $match) + strlen($match));
See both examples working here: http://ideone.com/Ge30rY
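One caveat with the snippets above: if the match is missing, strpos() returns false, which is then treated as 0 and the result is just the first few characters. A small sketch that guards against this is shown below; the function name keep_through() is hypothetical. The last line shows what a working regex for the question could look like, with the dot escaped and everything after ".gif" matched.

```php
<?php
// Hypothetical helper: keep everything up to and including the first $match.
function keep_through(string $str, string $match): string
{
    $pos = strpos($str, $match);
    if ($pos === false) {
        return $str; // no match: leave the string unchanged
    }
    return substr($str, 0, $pos + strlen($match));
}

echo keep_through('blahblah.gif?12345', '.gif'), "\n"; // blahblah.gif

// Regex equivalent: escape the dot and consume the rest of the string.
echo preg_replace('/\.gif.*$/s', '.gif', 'blahblah.gif?12345'), "\n"; // blahblah.gif
```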
Assuming (from OP's comment) that you are working with actual URLs as your source string, I believe that the best course of action here would be to use PHP's built-in functionality for working with and parsing URLs. You do this by using the parse_url() function:
(PHP 4, PHP 5)
parse_url — Parse a URL and return its components
This function parses a URL and returns an associative array containing any of the various components of the URL that are present.
This function is not meant to validate the given URL, it only breaks it up into the above listed parts. Partial URLs are also accepted, parse_url() tries its best to parse them correctly.
From your example: www.page.com/image.gif?123 (or even just image.gif?123) using parse_url() will look something like this:
var_dump( parse_url( "www.page.com/image.gif?123" ) );
array(2) {
["path"]=>
string(22) "www.page.com/image.gif"
["query"]=>
string(3) "123"
}
As you can see, without the need for regular expressions or string manipulations we have broken up the URL into its separate components. No need to re-invent the wheel. Nice and clean :)
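If only one component is needed, parse_url() also accepts a second argument that selects it directly. The sketch below assumes a full URL with a scheme (the http:// prefix is added here for illustration); as the dump above shows, without a scheme the host ends up inside the path.

```php
<?php
// Illustrative URL; the scheme is an assumption added for this example.
$url = 'http://www.page.com/image.gif?123';

$host  = parse_url($url, PHP_URL_HOST);  // "www.page.com"
$path  = parse_url($url, PHP_URL_PATH);  // "/image.gif"
$query = parse_url($url, PHP_URL_QUERY); // "123"

// The URL without its query string.
echo $host . $path, "\n"; // www.page.com/image.gif
```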
You could do this:
$str = "somecontent.gif?anddata";
$pattern = ".gif";
echo strstr($str,$pattern,true).$pattern;
// Set up string to search through
$haystack = "blahblah.gif?12345";
// Determine substring and length of it
$needle = ".gif";
$length = strlen($needle);
// Find position of last substring
$location = strrpos($haystack, $needle);
// Use location of last occurrence + its length to get new string
$newtext = substr($haystack, 0, $location+$length);
I have a STRING $special which is formatted like £130.00 and is also an ex TAX(VAT) price.
I need to strip the first char so I can run some simple addition.
$str= substr($special, 1, 0); // Strip first char '£'
echo $str ; // Echo Value to check its worked
$endPrice = (0.20*$str)+$str ; // Work out VAT
I don't receive any value when I echo on the second line. Also, would I then need to convert the string to an integer in order to run the addition?
Thanks
Matt
+++ UPDATE
Thanks for your help with this. I took your code and added some of my own. There are more than likely nicer ways to do this, but it works :) I found out that a price below 1000 looks like £130.00, while a larger value includes a thousands separator, e.g. £1,400.22.
$str = str_replace('£', '', $price);
$str2 = str_replace(',', '', $str);
$vatprice = (0.2 * $str2) + $str2;
$display_vat_price = sprintf('%0.2f', $vatprice);
echo "£";
echo $display_vat_price ;
echo " (Inc VAT)";
Thanks again, Matt
You cannot use substr() the way you are using it currently. You are trying to remove the £ character, which takes two bytes in UTF-8, and substr() counts bytes, not characters. You can either use $str = substr($string, 2) or, better, str_replace() like this:
$string = '£130.00';
$str = str_replace('£', '', $string);
echo (0.2 * $str) + $str; // 156
Original answer
I'll keep this version as it can still give some insight. The answer would be fine if £ were not a two-byte character in UTF-8. Knowing this, you can still use it, but you need to start the substring at offset 2 instead of 1.
Your usage of substr is wrong. It should be:
$str = substr($special, 1);
Check the documentation: the third parameter is the length of the substring. You passed 0, therefore you got an empty string. If you omit the third parameter, it returns the substring starting at the index given in the second parameter and running to the end of the original string.
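An alternative sketch that sidesteps the byte-length question entirely is to strip everything that is not a digit or a decimal point, then cast to float. The function name add_vat() and the default 20% rate are assumptions made for this example; the cast also answers the conversion question, since a float is what the arithmetic needs here rather than an integer.

```php
<?php
// Hypothetical helper: returns the VAT-inclusive price formatted to 2 decimals.
function add_vat(string $price, float $rate = 0.20): string
{
    // Drop the currency symbol and thousands separators; keep digits and the dot.
    $net = (float) preg_replace('/[^0-9.]/', '', $price);

    // number_format() rounds to 2 decimals and adds thousands separators.
    return number_format($net * (1 + $rate), 2);
}

echo '£' . add_vat('£130.00') . " (Inc VAT)\n";   // £156.00 (Inc VAT)
echo '£' . add_vat('£1,400.22') . " (Inc VAT)\n"; // £1,680.26 (Inc VAT)
```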
I receive data from a PUSH service. This data is compressed with gzcompress(). At the very Beginning of the data, it contains an int which is the length of the data contained. This is done after the gzcompress(); So a sample data would be:
187xœËHÍÉÉ,
Which is produced by
echo '187'.gzcompress('Hello');
Now, I don't know the length of the int, it could be 1 digit it could be 10 digits. I also don't know the first character to find the position of the beginning of a string.
Any ideas on how to retrieve/subtract the int?
$length_value=???
$string_value=???
Assuming that the compressed data would NEVER start with a digit, then a regex would be easiest:
$string = '187xœËHÍÉÉ,';
preg_match('/^(\d+)/', $string, $matches);
$number = $matches[0];
$compressed_data = substr($string, strlen($number)); // everything after the leading digits
If the compressed data DOES start with a digit, then you're going to end up with corrupt data - you'll have absolutely no way of differentiating where the 'length' value stops and the compressed data starts, e.g.
$compressed = '123foo';
$length = '6';
$your_string = '6123foo';
Ok - is that a string of length 61, with compressed data 23foo? or 612 + 3foo?
You could use preg_match() to catch the integer at the start of the string.
http://php.net/manual/en/function.preg-match.php
You could do:
$contents = "187xœËHÍÉÉ,";
$length = (int)$contents;
$startingPosition = strlen((string)$length);
$original = gzuncompress(substr($contents, $startingPosition), $length);
But I feel this may fail if the first compressed byte is a digit.
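A regex-free way to do the same split is strspn(), which counts how many leading characters are digits. The sketch below rebuilds the sample payload from the question and assumes the zlib extension is loaded. It relies on the compressed data not starting with a digit; output from gzcompress() begins with the zlib header byte 0x78 (the "x" visible in the sample), so that holds for data produced this way.

```php
<?php
// Rebuild the sample payload from the question.
$payload = '187' . gzcompress('Hello');

// Number of leading bytes that are ASCII digits.
$digits = strspn($payload, '0123456789');

// Split into the numeric prefix and the compressed remainder.
$length_value = (int) substr($payload, 0, $digits);
$string_value = gzuncompress(substr($payload, $digits));

echo $length_value, ' ', $string_value, "\n"; // 187 Hello
```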
Sorry for the title, I really didn't know how to say this...
I often have a string that needs to be cut after X characters. My problem is that this string often contains special characters like &egrave;
So I'm wondering: is there a way in PHP, without transforming my string, to know whether I am cutting it in the middle of a special char?
Example
This is my string with a special char : &egrave; - and I want it to cut in the middle of the "&egrave;" but still keeping the string intact
so right now my result with a sub string would be :
This is my string with a special char : &egra
but I want to have something like this :
This is my string with a special char : è
The best thing to do here is store your string as UTF-8 without any html entities, and use the mb_* family of functions with utf8 as the encoding.
But, if your string is ASCII or iso-8859-1/win1252, you can use the special HTML-ENTITIES encoding of the mb_string library:
$s = 'This is my string with a special char : &egrave; - and I want it to cut in the middle of the "&egrave;" but still keeping the string intact';
echo mb_substr($s, 0, 40, 'HTML-ENTITIES');
echo mb_substr($s, 0, 41, 'HTML-ENTITIES');
However, if your underlying string is UTF-8 or some other multibyte encoding, using HTML-ENTITIES is not safe! This is because HTML-ENTITIES really means "win1252 with high-bit characters as html entities". This is an example of where this can go wrong:
// Assuming that é is in utf8:
mb_substr('é ', 0, 2, 'HTML-ENTITIES') === 'é'
// should be 'é '
When your string is in a multibyte encoding, you must instead convert all html entities to a common encoding before you split. E.g.:
$strings_actual_encoding = 'utf8';
$s_noentities = html_entity_decode($s, ENT_QUOTES, $strings_actual_encoding);
$s_trunc_noentities = mb_substr($s_noentities, 0, 41, $strings_actual_encoding);
The best solution would be to store your text as UTF-8 instead of storing it as HTML entities. Other than that, if you don't mind the count being off (&egrave; counts as one character instead of eight), then the following snippet should work:
<?php
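$string = 'This is my string with a special char : &egrave; - and I want it to cut in the middle of the "&egrave;" but still keeping the string intact';
// Decode, cut at 45 characters, re-encode. ENT_NOQUOTES leaves quote characters untouched in both directions.
$cut_string = htmlentities(mb_substr(html_entity_decode($string, ENT_NOQUOTES, 'UTF-8'), 0, 45, 'UTF-8'), ENT_NOQUOTES, 'UTF-8')."<br><br>";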
Note: If you use a different function to encode the text (e.g. htmlspecialchars()), then use that function instead of htmlentities(). If you use a custom encoding function, then use it in place of htmlentities() and use a custom function that does the opposite in place of html_entity_decode().
The longest HTML 4 named entity is 10 characters long, including the ampersand and semicolon (HTML5 defines longer ones, so widen the window if you need them). If you intend to cut the string at X bytes, check bytes X-9 through X-1 for an ampersand. If the corresponding semicolon appears at byte X or later, cut the string after the semicolon instead of after byte X.
However, if you're willing to preprocess the string, Mike's solution will be more accurate because his cuts the string at X characters, not bytes.
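The byte-window check described above could be sketched roughly as follows. The function name cut_at_bytes() and the $maxEntity parameter are hypothetical, and byte positions are taken as zero-based. Treat it as an illustration of the idea rather than a tested implementation for every input.

```php
<?php
// Hypothetical helper: cut at $x bytes, extending the cut past an entity it would split.
function cut_at_bytes(string $text, int $x, int $maxEntity = 10): string
{
    if (strlen($text) <= $x) {
        return $text;
    }

    // Look for an & in the last ($maxEntity - 1) bytes before the cut point.
    $start  = max(0, $x - ($maxEntity - 1));
    $window = substr($text, $start, $x - $start);
    $amp    = strrpos($window, '&');

    if ($amp !== false) {
        $amp += $start; // position in the full string
        $semi = strpos($text, ';', $amp);
        // The entity straddles the cut if its ; sits at byte $x or later.
        if ($semi !== false && $semi >= $x && $semi - $amp < $maxEntity) {
            return substr($text, 0, $semi + 1);
        }
    }

    return substr($text, 0, $x);
}

$s = 'This is my string with a special char : &egrave; - and more';
echo cut_at_bytes($s, 45), "\n"; // This is my string with a special char : &egrave;
```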
You can use html_entity_decode() first to decode all the HTML entities. Then split your string. Then htmlentities() to re-encode the entities.
$decoded_string = html_entity_decode($original_string);
// implement logic to split string here
// then for each string part do the following:
$encoded_string_part = htmlentities($split_string_part);
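For the common case of a single cut, the placeholder logic above might be filled in like this. The function name truncate_entities() is made up, the text is assumed to be UTF-8, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: cut at $chars visible characters without splitting an entity.
function truncate_entities(string $s, int $chars): string
{
    // Turn entities into real characters so each one counts once.
    $decoded = html_entity_decode($s, ENT_QUOTES, 'UTF-8');

    // Cut by characters, not bytes.
    $cut = mb_substr($decoded, 0, $chars, 'UTF-8');

    // Re-encode so the output is in the same form as the input.
    return htmlentities($cut, ENT_QUOTES, 'UTF-8');
}

$s = 'This is my string with a special char : &egrave; - and more';
echo truncate_entities($s, 41), "\n"; // This is my string with a special char : &egrave;
```

Because the re-encoding step uses htmlentities(), characters that were plain in the input (quotes, for instance) come back as entities; use the inverse of whatever produced the original string if that matters.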
A somewhat brute-force solution that I'm not really happy with would be a PCRE expression. Let's say that you want to keep 80 characters and the longest possible HTML entity is 7 chars long:
$regex = '~^(.{73}([^&]{7}|.{0,7}$|[^&]{0,6}&[^;]+;))(.*)~mx';
// Note, this could return a slightly shorter text
return preg_replace($regex, '$1', $text);
Just so you know:
.{73} - 73 characters
[^&]{7} - okay, we may fill it with anything that doesn't contain &
.{0,7}$ - keep in mind the possible end (this shouldn't be necessary because shorter text wouldn't match at all)
[^&]{0,6}&[^;]+; - up to 6 characters (you'd be at 79th), then & and let it finish
Something that seems much better but requires bit of play with numbers is to:
// If $text is not longer than $N chars there is nothing to cut :)
if (strlen($text) <= $N) {
    return $text;
}
$head = substr($text, 0, $N);
// Get the last & before the cut point
$pos = strrpos($head, '&');
// We're not young anymore, we have to check this too (no entities at all) :)
if ($pos === false) {
    return $head;
}
// Get the last ; before the cut point
$end = strrpos($head, ';');
// Okay, entity closed (; is after &)
if ($end !== false && $end > $pos) {
    return $head;
}
// Entity is still open: now we need to find the first ; at or after the cut point
$end = strpos($text, ';', $N);
if ($end === false) {
    // Not valid HTML, unclosed entity; do whatever you want (here: cut before the &)
    return substr($text, 0, $pos);
}
// Include the closing ;
return substr($text, 0, $end + 1);
Check numbers, there may be +/-1 somewhere in indexes...
I think you would have to use a combination of strpos and strrpos to find the next and previous spaces, parse the text between the spaces, check that against a known list of special characters, and if it matches, extend your "cut" to the position of the next space. If you had a code sample of what you have now, we could give you a better answer.