I have a block of text which occasionally has a really long word/web address which breaks out of my site's layout.
What is the best way to go through this block of text and shorten the words?
EXAMPLE:
this is some text and this a long word appears like this
fkdfjdksodifjdisosdidjsosdifosdfiosdfoisjdfoijsdfoijsdfoijsdfoijsdfoijsdfoisjdfoisdjfoisdfjosdifjosdifjosdifjosdifjosdifjsodifjosdifjosidjfosdifjsdoiofsij and i need that to either wrap in ALL browsers or trim the word.
You need the wordwrap() function, I suppose.
You could truncate the string so it appears with an ellipsis in the middle or at the end. However, this would be independent of the actual rendering in a web browser. PHP has no way to determine the width a string will have in a given font when rendered in a browser, especially if you have defined fallback fonts and don't know which font the browser actually uses, e.g.
font-family: Verdana, Arial, sans-serif;
Compare the following:
I am 23 characters long
I am 23 characters long
Both strings have the same number of characters, but since one is set in a monospaced font and the other isn't, their rendered widths differ. PHP cannot determine this. You'd have to find a client-side technology, probably JavaScript, to solve this for you.
You could also wrap the text into an element with the CSS property overflow:hidden to make the text disappear after a fixed length.
Look around SO. I'm pretty sure this was asked more than once before.
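As a rough sketch of the truncation idea mentioned above (an ellipsis in the middle of the string), something like the following could work. The function name `truncateMiddle` and its parameters are made up for illustration, and it counts bytes, so it assumes single-byte text; UTF-8 input would need the mb_* equivalents.

```php
<?php
// Hypothetical helper: shorten $text to at most $max bytes, replacing the
// middle with an ellipsis. Byte-based, so it assumes single-byte text.
function truncateMiddle(string $text, int $max, string $ellipsis = '...'): string
{
    if (strlen($text) <= $max) {
        return $text; // already short enough
    }
    $keep = $max - strlen($ellipsis);
    if ($keep <= 0) {
        return substr($ellipsis, 0, $max); // no room for any original text
    }
    $head = (int) ceil($keep / 2); // bytes kept from the start
    $tail = $keep - $head;         // bytes kept from the end
    return substr($text, 0, $head) . $ellipsis . ($tail > 0 ? substr($text, -$tail) : '');
}

echo truncateMiddle('http://example.com/some/very/long/path/to/a/file.html', 30), "\n";
```

This only caps the character count; as noted above, it says nothing about the rendered width.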
You could use the word-wrap: break-word CSS property to wrap the text that breaks your layout.
Check out the Mozilla Developer Center examples which demonstrate its use.
function fixlongwords($string) {
    $exploded = explode(' ', $string);
    $result = '';
    foreach ($exploded as $curr) {
        if (strlen($curr) > 20) {
            // Double quotes so "\n" is a real newline; true forces a cut inside long words
            $curr = wordwrap($curr, 20, "<br/>\n", true);
        }
        $result .= $curr . ' ';
    }
    return $result;
}
This should do the job.
You could do something like this:
$text = preg_replace("/(\\S{20})/", '$1&zwnj;', $text);
It should* add a zero-width non-joiner character after every 20 characters of each long word. This means they will word-wrap.
* (untested)
Based on @JonnyLitt's answer, here's my take on the problem:
<?php
function insertSoftBreak($string, $interval=20, $breakChr='&shy;') {
$splitString = explode(' ', $string);
foreach($splitString as $key => $val) {
if(strlen($val)>$interval) {
$splitString[$key] = wordwrap($val, $interval, $breakChr, true);
}
}
return implode(' ', $splitString);
}
$string = 'Hello, My name is fwwfdfhhhfhhhfrhgrhffwfweronwefbwuecfbryhfbqpibcqpbfefpibcyhpihbasdcbiasdfayifvbpbfawfgawg, because that is my name.';
echo insertSoftBreak($string);
?>
This breaks the string up into space-separated values and checks the length of each individual 'word' (words include symbols like dot, comma, or question mark). If a word is longer than $interval characters, a &shy; (soft hyphen) is inserted after every $interval characters.
I've chosen soft hyphens because they seem to be relatively well supported across browsers, and they usually don't show unless the word actually wraps at that position.
I'm not aware of any other usable (and well-supported) HTML entities that could be used instead (the alternative I tried does not seem to work in FF 3.6, at least), so if cross-browser support for &shy; turns out to be lacking, a pure CSS or JavaScript-based solution would be best.
Related
I'm trying to properly understand strlen() in PHP to make an application where text is shortened and finished with a ...
My code:
$prize_text = "Learn how to eat pizza TODAY";
if (strlen($prize_text) > 24) {
$prize_text = substr($prize_text, 0, 21) . '...';
}
$prize_text = "Watch Good Day Sunshine Today!";
$prize_text = "abcdefghijklmnopqrstuvwxyz";
Why is everything not matching up? I want one uniform standard of shortening the text and then appending three dots. What is wrong with my code?
Wrong tool for the job. You want CSS instead to avoid the breaking problem you're having.
#element {
overflow: hidden;
white-space: nowrap;
text-overflow: ellipsis;
}
Advantages
You can hide/show the text at will, whereas with PHP the text sent is already rendered in the document.
Will work for multiple situations whereas the PHP is hard-coded.
strlen() counts the number of bytes in a string (one per character for plain ASCII), not the space the string occupies in the viewport. You can bring strings of the same length closer in width by putting all characters in the same case, either lowercase or uppercase; to make all strings the same width, you may refer to http://www.ampsoft.net/webdesign-l/WindowsMacFonts.html
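A minimal sketch of one uniform shortening rule, assuming the goal is a fixed maximum of 24 bytes with the dots included. The helper name `shorten` is hypothetical. Every shortened string ends up with the same strlen(), yet in a proportional font the rendered widths will still differ, which is the point the answers above make.

```php
<?php
// Hypothetical helper: cap $text at $max bytes in total, dots included.
function shorten(string $text, int $max = 24, string $dots = '...'): string
{
    if (strlen($text) <= $max) {
        return $text;
    }
    return substr($text, 0, $max - strlen($dots)) . $dots;
}

$samples = [
    'Learn how to eat pizza TODAY',
    'Watch Good Day Sunshine Today!',
    'abcdefghijklmnopqrstuvwxyz',
];
foreach ($samples as $s) {
    $short = shorten($s);
    echo strlen($short), ' ', $short, "\n"; // same byte count for every line
}
```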
Hi everybody, I am using html2pdf, and it doesn't support the CSS word-break: break-all. Any ideas?
Example:
<td style="width:30%;word-break:break-all ;">
testtestetstetstetstetstettstetstetstetstetstetstetstetstetstetstets
</td>
In the output PDF the cell takes more than the 30% width; it grows to the length of the string:
Output PDF: testtestetstetstetstetstettstetstetstetstetstetstetstetstetstetstets
The output I want:
testtestetstetstetstetstettstets tetstetstetstetstetstetstetstets
Well, that's complicated. Your teststring is too long, but it's not composed of multiple words. That means that word-break won't work, because there aren't any words to break on. Obviously, this might well just be an example, in which case it might be that html2pdf just doesn't support relative widths and word-break, so you could try having an absolute width and word-break.
That said, here's something I know that will work: wordwrap in PHP. So, instead of echo $yourvar; you could use echo wordwrap($yourvar, 75, "\n", true) instead, which will always cut the string, even if it's just one long string. It takes a little fiddling to get the number of characters to match up with the width that you're looking for, but it will work.
<?php
$foo = str_repeat('test',12);
echo wordwrap($foo, 20, '<br />', true);
Output:
testtesttesttesttest
testtesttesttesttest
testtest
Try this:
<td style="width:30%; word-wrap:break-word;">
testtestetstetstetstetstettstetstetstetstetstetstetstetstetstetstets
</td>
It is word-wrap, not word-break.
If you want long strings to wrap consistently within a boundary container, I think you should be able to accomplish this by inserting zero-width space characters (&#8203; or \xe2\x80\x8b) between every letter of the original string. This will have the effect of wrapping as if every character were its own word, but without displaying the spaces to the end user. This may cause you trouble with text searches or indexing on the final product, but it should accomplish the task reliably from an aesthetic perspective.
Thus:
testtestetstetstetstetstettstetstetstetstetstetstetstetstetstetstets
Becomes
testtestetstetstetstetstettstetstetstetstetstetstetstetstetstetstets
(which displays: "testtestetstetstetstetstettstetstetstetstetstetstetstetstetstetstets")
So if you wrap it it will wrap exactly to the bounds of its container. Here's a fiddle of it as an example.
Just write a PHP script to loop through the string and insert the space:
$string = "testtestetstetstetstetstettstetstetstetstetstetstetstetstetstetstets";
$new_string = "";
$length = strlen($string);
for ($i = 0; $i < $length; $i++) {
    $new_string .= $string[$i];
    // At the end of the string, or if this or the next character is a space,
    // there's no reason to add a break character
    if ($i === $length - 1 || $string[$i] === ' ' || $string[$i + 1] === ' ') {
        continue;
    }
    $new_string .= "&#8203;";
}
echo $new_string;
This is a particularly nice solution, because unlike wordwrap(), it automatically adjusts for non-fixed-width fonts (which is basically 99% of fonts that are actually used).
Again, if you need the resulting PDF to be searchable, this is not a good approach, but it will make it look like you want it to.
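The loop above works byte by byte, so it would split multi-byte UTF-8 characters. As a sketch of the same idea for UTF-8 text, a single preg_replace with the /u flag can insert U+200B between every pair of adjacent non-whitespace characters. The function name is made up for illustration, and the "\u{200B}" escape assumes PHP 7 or later.

```php
<?php
// Insert U+200B (zero-width space) between every pair of adjacent
// non-whitespace characters. The /u flag makes the pattern work on
// UTF-8 code points instead of bytes.
function insertZeroWidthSpaces(string $text): string
{
    return preg_replace('/(?<=\S)(?=\S)/u', "\u{200B}", $text);
}

$wrapped = insertZeroWidthSpaces('testtest ab');
// Show the (normally invisible) break points as "|"
echo str_replace("\u{200B}", '|', $wrapped), "\n";
```

Note that this also separates a base letter from a following combining mark, so it is only a starting point for scripts that use them.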
In your test, word-break will not work because it only breaks between the words of a sentence. So you can use a sentence with multiple words and then try word-break again.
Just use the substr function in your code.
Here is an example. First put your output in a variable.
$get_value = "testtestetstetstetstetstettstetstetstet";
$first = substr($get_value, 0, 4);  // third argument is a length: characters 0-3
$second = substr($get_value, 4, 4); // characters 4-7
and so on.
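The "and so on" can be done in one call: str_split() cuts a string into fixed-size pieces, which can then be joined with whatever separator is needed. The chunk size of 4 here is an arbitrary value for illustration.

```php
<?php
// Cut a string into fixed-size pieces and join them with a line break.
$get_value = "testtestetstetstetstetstettstetstetstet";
$chunks = str_split($get_value, 4); // chunk size chosen for illustration
echo implode(PHP_EOL, $chunks), PHP_EOL;
```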
You can use "\r\n" to print a newline character; make sure to put it in double quotes. If your string is in a variable, you need to use a word count function and append this string. You can also use PHP_EOL to avoid platform dependency.
html2pdf does not support this word-break:break-all css
Ref: http://www.yaronet.com/en/posts.php?sl=&h=0&s=151321#0
You may use this method.
<?php
$get_value = "testtestetstetstetstetstettstetstetstet";
$first = substr($get_value, 0, 4);  // third argument is a length: characters 0-3
$second = substr($get_value, 4, 4); // characters 4-7
$third = substr($get_value, 8, 4);  // characters 8-11
?>
I want to add a bit of my own experience with HTML2PDF and tables.
I used this solution to generate a PDF containing a table filled with a delivery confirmation (a list of products). Such a list may contain up to a thousand products (rows).
I encountered a problem with formatting and long strings in cells. The first problem was that the table was getting too wide even though I set the table's width to 100% and the width of the header (<th>) columns (HTML2PDF does not support <colgroup>, so I couldn't define it globally); some columns were outside the visible area. I used wordwrap() with <br /> as the separator to break down the long strings, which looked like it was working. Unfortunately, it turned out that if there is such a long string in the first and last row, an empty page is added before and after the whole table. Not a real bug, but it doesn't look nice either. The final solution (for tables whose width could exceed the visible area) was to:
set fixed widths for the table and each row in pixels
for A4 page size I am using a total width of 550 px with default margins, but you'd have to play around a little to distribute the width between columns
in wordwrap, use an empty space or &#8203; / \xe2\x80\x8b as the delimiter
For small tables that you'd like to spread for 100% of visible area width it is OK to use width expressed in %.
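A minimal sketch of the last step in the list above, assuming the cell text is single-byte (wordwrap() counts bytes) and that the PDF renderer treats the zero-width space as a wrap opportunity, as described. The helper name and the width of 10 are chosen only for illustration.

```php
<?php
// Break long cell contents with an invisible zero-width space (UTF-8 bytes
// \xe2\x80\x8b) so the renderer has somewhere to wrap.
function breakCellText(string $text, int $width = 10): string
{
    return wordwrap($text, $width, "\xe2\x80\x8b", true);
}

$cell = breakCellText(str_repeat('test', 6));
// Show the (normally invisible) break points as "|"
echo str_replace("\xe2\x80\x8b", '|', $cell), "\n";
```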
I think this function is a limping solution.
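function String2PDFString($string, $Length)
{
    $NewString = ''; // initialise before appending
    $Arry = explode(" ", $string);
    foreach ($Arry as $Line)
    {
        // Cut any word longer than $Length into space-separated pieces
        if (strlen($Line) > $Length)
            $Line = wordwrap($Line, $Length, " ", true);
        $NewString .= " " . $Line;
    }
    return $NewString;
}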
I need to annotate an image with Chinese Text and I am using Imagick library right now.
An example of a Chinese Text is
这是中文
The Chinese Font file used is this
The file originally is named 华文黑体.ttf
it can also be found in Mac OSX under /Library/Font
I have renamed it to English STHeiTi.ttf make it easier to call the file in php code.
In particular the Imagick::annotateImage function
I also am using the answer from "How can I draw wrapped text using Imagick in PHP?".
The reason why I am using it is because it is successful for English text and application needs to annotate both English and Chinese, though not at the same time.
The problem is that when I run the annotateImage using Chinese text, I get annotation that looks like 罍
Code included here
The problem is that you are feeding imagemagick the output of a "line splitter" (wordWrapAnnotation), to which you pass the text input after running it through utf8_decode. This is wrong for sure if you are dealing with Chinese text. utf8_decode can only deal with UTF-8 text that CAN be converted to ISO-8859-1 (the most common 8-bit extension of ASCII).
Now, I hope that your text is UTF-8 encoded. If it is not, you might be able to convert it like this:
$text = mb_convert_encoding($text, 'UTF-8', 'BIG-5');
or like this
$text = mb_convert_encoding($text, 'UTF-8', 'GB18030'); // only PHP >= 5.4.0
(in your code $text is rather $text1 and $text2).
Then there are (at least) two things to fix in your code:
pass the text "as is" (without utf8_decode) to wordWrapAnnotation,
change the argument of setTextEncoding from "utf-8" to "UTF-8"
as per specs
I hope that all variables in your code are initialized in some missing part of it. With the two changes above (the second one might not be necessary, but you never know...), and with the missing parts in place, I see no reason why your code should not work, unless your TTF file is broken or the Imagick library is broken (imagemagick, on which Imagick is based, is a great library, so I consider this last possibility rather unlikely).
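To make the encoding point concrete, here is a small sketch of why converting Chinese text to ISO-8859-1 cannot work. It uses mb_convert_encoding as a stand-in for utf8_decode (which is deprecated as of PHP 8.2); that substitution is my assumption, and the snippet requires the mbstring extension.

```php
<?php
// ISO-8859-1 has no Chinese characters, so converting to it cannot
// preserve them. Requires the mbstring extension.
$text = '这是中文';

$latin1 = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
$roundTrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

var_dump(mb_check_encoding($text, 'UTF-8')); // is the input valid UTF-8?
var_dump($roundTrip === $text);              // did the text survive the round trip?
```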
EDIT:
Following your request, I update my answer with
a) the fact that setting mb_internal_encoding('utf-8') is very important for the solution, as you say in your answer, and
b) my proposal for a better line splitter, that works acceptably for western languages and for Chinese, and that is probably a good starting point for other languages using Han logograms (Japanese kanji and Korean hanja):
function wordWrapAnnotation(&$image, &$draw, $text, $maxWidth)
{
$regex = '/( |(?=\p{Han})(?<!\p{Pi})(?<!\p{Ps})|(?=\p{Pi})|(?=\p{Ps}))/u';
$cleanText = trim(preg_replace('/[\s\v]+/', ' ', $text));
$strArr = preg_split($regex, $cleanText, -1, PREG_SPLIT_DELIM_CAPTURE |
PREG_SPLIT_NO_EMPTY);
$linesArr = array();
$lineHeight = 0;
$goodLine = '';
$spacePending = false;
foreach ($strArr as $str) {
if ($str == ' ') {
$spacePending = true;
} else {
if ($spacePending) {
$spacePending = false;
$line = $goodLine.' '.$str;
} else {
$line = $goodLine.$str;
}
$metrics = $image->queryFontMetrics($draw, $line);
if ($metrics['textWidth'] > $maxWidth) {
if ($goodLine != '') {
$linesArr[] = $goodLine;
}
$goodLine = $str;
} else {
$goodLine = $line;
}
if ($metrics['textHeight'] > $lineHeight) {
$lineHeight = $metrics['textHeight'];
}
}
}
if ($goodLine != '') {
$linesArr[] = $goodLine;
}
return array($linesArr, $lineHeight);
}
In words: the input is first cleaned up by replacing all runs of whitespace, including newlines, with a single space, except for leading and trailing whitespace, which is removed. Then it is split either at spaces, or right before Han characters not preceded by "leading" characters (like opening parentheses or opening quotes), or right before "leading" characters. Lines are assembled in order not to be rendered in more than $maxWidth pixels horizontally, except when this is not possible by the splitting rules (in which case the final rendering will probably overflow). A modification in order to force splitting in overflow cases is not difficult. Note that, e.g., Chinese punctuation is not classified as Han in Unicode, so that, except for "leading" punctuation, no linebreak can be inserted before it by the algorithm.
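One possible shape for the forced-splitting modification mentioned above, as a sketch only. The function name `forceSplitToken` and the `$measure` callable are hypothetical; with Imagick, `$measure` could wrap `queryFontMetrics()` and return `textWidth`. A stand-in measure is used here so the example runs without Imagick.

```php
<?php
// Hypothetical helper: split one over-long token into pieces that each fit
// within $maxWidth, adding one character at a time. $measure is any callable
// that returns the rendered width of a string.
function forceSplitToken(string $token, callable $measure, float $maxWidth): array
{
    $chars = preg_split('//u', $token, -1, PREG_SPLIT_NO_EMPTY);
    $pieces = [];
    $current = '';
    foreach ($chars as $char) {
        if ($current !== '' && $measure($current . $char) > $maxWidth) {
            $pieces[] = $current; // adding $char would overflow: close this piece
            $current = $char;
        } else {
            $current .= $char;
        }
    }
    if ($current !== '') {
        $pieces[] = $current;
    }
    return $pieces;
}

// Stand-in measure for demonstration: pretend every character is 10px wide.
$fakeMeasure = function (string $s): float {
    return 10.0 * count(preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY));
};
print_r(forceSplitToken('abcdefgh', $fakeMeasure, 30.0));
```

Inside wordWrapAnnotation this would be applied to `$str` whenever it alone is wider than `$maxWidth`.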
I'm afraid you will have to choose a TTF that can support Chinese code points. There are many sources for this, here are two:
http://www.wazu.jp/gallery/Fonts_ChineseTraditional.html
http://wildboar.net/multilingual/asian/chinese/language/fonts/unicode/non-microsoft/non-microsoft.html
Full solution here:
https://gist.github.com/2971092/232adc3ebfc4b45f0e6e8bb5934308d9051450a4
Key ideas:
Must set the html charset and internal encoding on the form and on the processing page
header('Content-Type: text/html; charset=utf-8');
mb_internal_encoding('utf-8');
These lines must be at the top lines of the php files.
Use this function to determine if text is Chinese and use the right font file
function isThisChineseText($text) {
return preg_match("/\p{Han}+/u", $text);
}
For more details check out https://stackoverflow.com/a/11219301/80353
Set TextEncoding properly in ImagickDraw object
$draw = new ImagickDraw();
// set utf 8 format
$draw->setTextEncoding('UTF-8');
Note the capitalized UTF. This was helpfully pointed out to me by Walter Tross in his answer here: https://stackoverflow.com/a/11207521/80353
Use preg_match_all to explode English words, Chinese Words and spaces
// separate the text by chinese characters or words or spaces
preg_match_all('/([\w]+)|(.)/u', $text, $matches);
$words = $matches[0];
Inspired by this answer https://stackoverflow.com/a/4113903/80353
Works just as well for english text
Can you post a regex search and replacement in php for minifying/compressing javascript?
For example, here's a simple one for CSS
header('Content-type: text/css');
ob_start("compress");
function compress($buffer) {
/* remove comments */
$buffer = preg_replace('!/\*[^*]*\*+([^/][^*]*\*+)*/!', '', $buffer);
/* remove tabs, spaces, newlines, etc. */
$buffer = str_replace(array("\r\n", "\r", "\n", "\t", '  ', '    ', '    '), '', $buffer);
return $buffer;
}
/* put CSS here */
ob_end_flush();
And here's one for html:
<?php
/* Minify All Output - based on the search and replace regexes. */
function sanitize_output($buffer)
{
$search = array(
'/\>[^\S ]+/s', //strip whitespaces after tags, except space
'/[^\S ]+\</s', //strip whitespaces before tags, except space
'/(\s)+/s' // shorten multiple whitespace sequences
);
$replace = array(
'>',
'<',
'\\1'
);
$buffer = preg_replace($search, $replace, $buffer);
return $buffer;
}
ob_start("sanitize_output");
?>
<html>...</html>
But what about one for javascript?
A simple regex for minifying/compressing javascript is unlikely to exist anywhere. There are probably several good reasons for this, but here are a couple of these reasons:
Line breaks and semicolons
Good javascript minifiers remove all extra line breaks, but because javascript engines will work without semicolons at the end of each statement, a minifier could easily break this code unless it is sophisticated enough to watch for and handle different coding styles.
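A small demonstration of the line-break problem, using a made-up JavaScript snippet that relies on newlines instead of semicolons:

```php
<?php
// Semicolon-free JavaScript that relies on line breaks to end statements.
$js = "var a = 1\nvar b = 2\nconsole.log(a + b)";

// Naive "minification": delete every line break.
$naive = str_replace(["\r", "\n"], '', $js);

echo $naive, "\n"; // the statements are now fused together
```

The result is no longer valid JavaScript, because nothing separates the statements any more.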
Dynamic Language Constructs
Many of the good javascript minifiers available will also change the names of your variables and functions to minify the code. For instance, a function named 'strip_white_space' that is called 12 times in your file might be renamed simply 'a', for a savings of 192 characters in your minified code. Unless your file has a lot of comments and/or whitespace, optimizations like these are where the majority of your filesize savings will come from.
Unfortunately, this is much more complicated than a simple regex should try to handle. Say you do something as simple as:
var length = 12, height = 15;
// other code that uses these length and height values
var arr = [1, 2, 3, 4];
for (i = (arr.length - 1); i >= 0; --i) {
//loop code
}
This is all valid code. BUT, how does the minifier know what to replace? The first "length" has "var" before it (but it doesn't have to), but "height" just has a comma before it. And if the minifier is smart enough to replace the first "length" properly, how smart does it have to be to know NOT to change the word "length" when used as a property of the array? It would get even more complicated if you defined a javascript object where you specifically defined a "length" property and referred to it with the same dot-notation.
Non-regex Options
Several projects exist to solve this problem using more complex solutions than just a simple regex, but many of them don't make any attempt to change variable names, so I still stick with Dean Edwards' packer or Douglas Crockford's JSMin or something like the YUI Compressor.
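To see the renaming problem in action, here is what a naive whole-word replacement does to a condensed version of the snippet above (the replacement name "a" is arbitrary):

```php
<?php
$js = 'var length = 12; var arr = [1, 2, 3]; for (var i = arr.length - 1; i >= 0; --i) {}';

// Naive rename: replace every whole-word "length" with "a".
$naive = preg_replace('/\blength\b/', 'a', $js);

echo $naive, "\n"; // the array property has been renamed too
```

The variable is renamed as intended, but so is the array's length property, which breaks the loop.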
PHP implementation of Douglas Crockford's JSMin
https://github.com/mrclay/minify
I had better luck with this Gist by orangeexception than with Jan's or B.F.'s answers.
preg_replace('#(?s)\s|/\*.*?\*/|//[^\r\n]*#', '', $javascript);
https://gist.github.com/orangexception/1301150/ed16505e2cb200dee0b0ab582ebbc67d5f060fe8
I'm writing my own minifier because I have some PHP inside.
There is still one unsolved problem: preg_replace cannot treat quotes as boundaries, or rather, it cannot keep track of whether a quote opens or closes a string. On top of that, there are double quotes, escaped double quotes, single quotes and escaped single quotes.
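To illustrate the quoting problem: a naive comment-stripping pattern cannot tell a "//" inside a string literal from a real comment. The JavaScript line here is made up for the demonstration.

```php
<?php
$js = 'var url = "http://example.com/"; load(url); // fetch it';

// Naive comment removal: delete everything from "//" to the end of the line.
$naive = preg_replace('#//.*#', '', $js);

echo $naive, "\n"; // the string literal is cut off inside the URL
```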
Here are just some interesting preg-functions.
$str=preg_replace('#//.*#','',$str);//delete comments
$str=preg_replace('#\s*/>#','>',$str);//delete xhtml tag slash ( />)
$str=str_replace(array("\n","\r","\t"),"",$str);//delete escaped white spaces
$str=preg_replace("/<\?(.*\[\'(\w+)\'\].*)\?>/","?>$1<?",$str);//rewrite associated array to object
$str=preg_replace("/\s*([\{\[\]\}\(\)\|&;]+)\s*/","$1",$str);//delete white spaces between brackets
$count=preg_match_all("/(\Wvar (\w{3,})[ =])/", $str, $matches);//find var names
$x=65;$y=64;
for($i=0;$i<$count;$i++){
if($y+1>90){$y=65;$x++;}//count upper case alphabetic ascii code
else $y++;
$str=preg_replace("/(\W)(".$matches[2][$i]."=".$matches[2][$i]."\+)(\W)/","$1".chr($x).chr($y)."+=$3",$str);//replace 'longvar=longvar+'blabla' to AA+='blabla' ($matches[2] holds the captured names)
$str=preg_replace("/(\W)(".$matches[2][$i].")(\W)/","$1".chr($x).chr($y)."$3",$str);//replace all other vars
}
//echo or save $str.
?>
You may do similarly with function names:
$count= preg_match_all("/function (\w{3,})/", $str, $matches);
If you want to see the replaced vars, put the following code in the for-loop:
echo chr($x).chr($y)."=".$matches[2][$i]."<br>";
Separate php from JS by:
$jsphp=(array)preg_split("/<\?php|\?>/",$str);
for($i=0;$i<count($jsphp);$i++){
if($i%2==0){/* do something with the JS part */}
else {/* do something with the PHP part */}
}
This is only a draft. I'm always happy for suggestions.
I hope that was English...
Adapted from B.F. answer and some other searching and testing I got to this. It works for my needs, is fast enough etc. It does leave my quoted text alone(finally).
<?php $str=preg_replace('#//.*#','',$someScriptInPhpVar);//delete comments
$count=preg_match_all("/(\Wvar (\w{3,})[ =])/", $str, $matches);//find var names
$x=65;$y=96;
for($i=0;$i<$count;$i++){if($y+1>122){$y=97;$x++;} else $y++; //count upper lower case alphabetic ascii code
$str=preg_replace("/([^\"a-zA-Z])(".$matches[2][$i]."=".$matches[2][$i]."\+)(\W)/","$1".chr($x).chr($y)."+=$3",$str);//replace 'longvar=longvar+'blabla' to AA+='blabla'
$str=preg_replace("/(\b)(".$matches[2][$i].")(\b)(?![\"\'\w :])/","$1".chr($x).chr($y)."$3",$str);//replace all other vars
}
$str=preg_replace("/\s+/"," ",$str);
$someScriptInPhpVar=str_replace(array("\n","\r","\t","; ","} "),array("","","",";","}"),$str);//delete escaped white space and other space ?>
If a user types a really long string it doesn't move onto a 2nd line and will break a page on my site. How do I take that string and remove it completely if it's not a URL?
Why would you want to remove what the user wrote? Instead, wrap it to a new line - there is a function in PHP to do that, called wordwrap
Do you really want to remove the word, or do you just want to prevent it from making your page layout too wide? If the latter is more what you want, consider using CSS to manage the overflow.
For instance:
div {
overflow:hidden;
}
will hide any content that exceeds the div boundary.
Here's more info on CSS overflow:
http://www.w3schools.com/css/pr_pos_overflow.asp
// remove words over 30 chars long
$str = preg_replace('/\S{30,}/', '', $str);
edit: updated per Tim P's suggestion, \S matches any non-space char (the same as [^\s])
Also here is a better way incorporating ehdv's suggestion to use wordwrap:
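//This will break up the long words with spaces so they don't stretch layouts.
//(The /e modifier was removed in PHP 7, so a callback is used instead.)
$str = preg_replace_callback('/\S{30,}/', function ($m) { return wordwrap($m[0], 30, ' ', true); }, $str);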
What if it is a really long URL? At any rate why not just match the text to a valid URL, and only accept those? Check out some php-regex info on URLs and see how they work. The Regular Expressions Cookbook has a good chapter on URL matching, as well.
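Putting the suggestions in this thread together, one possible sketch keeps long tokens that validate as URLs and breaks any other over-long token so it can wrap. The function name `tameLongWords` is hypothetical, and using filter_var() with FILTER_VALIDATE_URL as the URL check is my assumption rather than something the answers prescribe; the limit of 30 matches the value used above.

```php
<?php
// Hypothetical helper: keep long tokens that validate as URLs, and break any
// other over-long token with spaces so it can wrap.
function tameLongWords(string $text, int $max = 30): string
{
    // Split on whitespace but keep the whitespace runs as tokens,
    // so the original spacing is preserved when the pieces are rejoined.
    $tokens = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($tokens as $i => $token) {
        if (strlen($token) <= $max || trim($token) === '') {
            continue; // short enough, or just whitespace
        }
        if (filter_var($token, FILTER_VALIDATE_URL) !== false) {
            continue; // leave URLs untouched
        }
        $tokens[$i] = wordwrap($token, $max, ' ', true);
    }
    return implode('', $tokens);
}

$url = 'http://example.com/' . str_repeat('a', 40);
echo tameLongWords('see ' . $url . ' and ' . str_repeat('x', 70)), "\n";
```

A long URL left intact can still break the layout, so this would normally be combined with the CSS overflow handling suggested above.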
@Rob, take care when using regex. Watch out for performance.