Truncate Text Within Specific HTML Tag

Truncate Text Within Specific HTML Tag - php

This might not even be possible but I have quite a limited knowledge of PHP so I can't figure out if it is or not.
Basically I have a string $myText and this string outputs HTML in the following format:
<p>This is the main bit of text</p>
<small> This is some additional text</small>
My aim is to limit the number of characters displayed specifically within the <p> tag, for example 10 characters.
I have been playing around with PHP substr but I can only get this to work on all of the text, not just the text in the <p> tag.
Do you know if this is possible and if it is, do you know how to do it? Any pointers at all would be appreciated.
Thank you

The simplest solution is:
<?php
$text = '
<p>This is the main bit of text</p>
<small> This is some additional text</small>';
$pos = strpos($text,'<p>');
$pos2 = strpos($text,'</p>');
$text = '<p>' . substr($text,$pos+strlen('<p>'),10).substr($text,$pos2);
echo $text;
but it will work just for first pair of <p> ... </p>
If you need more, you can use regular expressions:
<?php
$text = '
<p>This is the main bit of text</p>
<small> This is some additional text</small>
<p>
werwerwrewre
</p>';
preg_match_all('#<p>(.*)</p>#isU', $text, $matches);
foreach ($matches[1] as $match) {
$text = str_replace('<p>'.$match.'</p>', '<p>'.substr($match,0,10).'</p>', $text);
}
echo $text;
or even
<?php
$text = '
<p>This is the main bit of text</p>
<small> This is some additional text</small>
<p>
werwerwrewre
</p>';
$text = preg_replace_callback('#<p>(.*)</p>#isU', function($matches) {
$matches[1] = '<p>'.substr($matches[1],0,10).'</p>';
return $matches[1];
}, $text);
echo $text;
However in those all 3 cases, all white characters are assumed as part of the string, so if the content of <p>...</p> starts with 3 spaces and you want to display only 3 characters, you simple display only 3 spaces, nothing more. Of course it can be quite easily modified, but I mentioned it to notice that fact.
And one more thing, quite possible you will need to use multibyte version of functions to get the result, so for example instead of strpos() you should use mb_strpos() and set earlier utf-8 encoding using mb_internal_encoding('UTF-8'); to make it working

You can achieve it by a quite simple way:
<?php
$max_length = 5;
$input = "<b>example: </b><div align=left>this is a test</div><div>another very very long item</div>";
$elements_count = preg_match_all("|(<[^>]+>)(.*)(</[^>]+>)|U",
$input,
$out, PREG_PATTERN_ORDER);
for($i=0; $i<$elements_count; $i++){
echo $out[1][$i].substr($out[2][$i], 0, $max_length).$out[3][$i]."\n";
}
these will work for any tag and any class or attribute within it.
ex. input:
<b>example: </b><div align=left>this is a test</div><div>another very very long item</div>
output:
<b>examp</b>
<div align=left>this </div>
<div>anoth</div>

Related

How to wrap every word in spans with PHP?

I have a some html paragraphs and I want to wrap every word in . Now I have
$paragraph = "This is a paragraph.";
$contents = explode(' ', $paragraph);
$i = 0;
$span_content = '';
foreach ($contents as $c){
$span_content .= '<span>'.$c.'</span> ';
$i++;
}
$result = $span_content;
The above codes work just fine for normal cases, but sometimes the $paragraph would contains some html tags, for example
$paragraph = "This is an image: <img src='/img.jpeg' /> This is a <a href='/abc.htm'/>Link</a>'";
How can I not wrap "words" inside html tag so that the htmnl tags still works but have the other words wrapped in spans? Thanks a lot!

Some (*SKIP)(*FAIL) mechanism?
<?php
$content = "This is an image: <img src='/img.jpeg' /> ";
$content .= "This is a <a href='/abc.htm'/>Link</a>";
$regex = '~<[^>]+>(*SKIP)(*FAIL)|\b\w+\b~';
$wrapped_content = preg_replace($regex, "<span>\\0</span>", $content);
echo $wrapped_content;
See a demo on ideone.com as well as on regex101.com.
To leave out the Link as well, you could go for:
(?:<[^>]+> # same pattern as above
| # or
(?<=>)\w+(?=<) # lookarounds with a word
)
(*SKIP)(*FAIL) # all of these alternatives shall fail
|
(\b\w+\b)
See a demo for this on on regex101.com.

The short version is you really do not want to attempt this.
The longer version: If you are dealing with HTML then you need an HTML parser. You can't use regexes. But where it becomes even more messy is that you are not starting with HTML, but with an HTML fragment (which may, or may not be well-formed. It might work if Hence you need to use an HTML praser to identify the non-HTML extents, separate them out and feed them into a secondary parser (which might well use regexes) for translation, then replace the translted content back into the DOM before serializing the document.

PHP - Replace Word in String (while ignore HTML tags)

I have a string with HTML tags, $paragraph:
$paragraph = '
<p class="instruction">
<sup id="num1" class="s">80</sup>
Hello there and welcome to Stackoverflow! You are welcome indeed.
</p>
';
$replaceIndex = array(0, 4);
$word = 'dingo';
I'd like to replace the words at indices defined by $replaceIndex (0 and 4) of $paragraph. By this, I mean I want to replace the words "80" and "welcome" (only the first instance) with $word. The paragraph itself may be formatted with different HTML tags in different places.
Is there a way to locate and replace certain words of the string while virtually ignoring (but not stripping) HTML tags?
Thanks!
Edit: Words are separated by (multiple) tags and (multiple) whitespace characters, while not including anything within the tags.

Thanks for all the tips. I figured it out! Since I'm new to PHP, I'd appreciate it if any PHP veterans have any tips on simplifying the code. Thanks!
$paragraph = '
<p class="instruction">
<sup id="num1" class="s">80</sup>
Hello there and welcome to Stackoverflow! You are welcome indeed.
</p>
';
// Split up $paragraph into an array of tags and words
$paragraphArray = preg_split('/(<.*?>)|\s/', $paragraph, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
$wordIndicies = array(0, 4);
$replaceWith = 'REPLACED';
foreach ($wordIndicies as $wordIndex) {
for ($i = 0; $i <= $wordIndex; $i++) {
// if this element starts with '<', element is a tag.
if ($paragraphArray[$i]{0} == '<') {
// push wordIndex forward to compensate for found tag element
$wordIndex++;
}
// when we reach the word we want, replace it!
elseif ($i == $wordIndex) {
$paragraphArray[$i] = $replaceWith;
}
}
}
// Put the string back together
$newParagraph = implode(' ', $paragraphArray);
// Test output!
echo(htmlspecialchars($newParagraph));
*Only caveat is that this may potentially produce unwanted spaces in $newParagraph, but I'll see if that actually causes any issues when I implement the code.

$text = preg_replace('/\b80\b|\bwelcome\b/', $word, $paragraph);
Hope this will help you :)

SimpleXML could come in handy as well:
$paragraph = '
<p class="instruction">
<sup id="num1" class="s">80</sup>
Hello there and welcome to Stackoverflow! You are welcome indeed.
</p>
';
$xml = simplexml_load_string($paragraph);
$xml->sup = $word;

how to regex character < and > replace like < and > in tag <code> </code>?

I have a string like bellow:
<pre title="language-markup">
<code>
<div title="item_content item_view_content" itemprop="articleBody">
abc
</div>
</code>
</pre>
In the <code></code> tag I want to replace all the characters < and > with < and >. How should I do?
Example: <code> < div ><code>.
Please tell me if you have any ideas. Thanks all.

try below solution:
$textToScan = '<pre title="language-markup">
<code>
<div title="item_content item_view_content" itemprop="articleBody">
abc
</div>
</code>
</pre>';
// the regex pattern (case insensitive & multiline
$search = "~<code>(.*?)</code>~is";
// first look for all CODE tags and their content
preg_match_all($search, $textToScan, $matches);
//print_r($matches);
// now replace all the CODE tags and their content with a htmlspecialchars() content
foreach($matches[1] as $match){
$replace = htmlspecialchars($match);
// now replace the previously found CODE block
$textToScan = str_replace($match, $replace, $textToScan);
}
// output result
echo $textToScan;
output:
<pre title="language-markup">
<code>
<div title="item_content item_view_content" itemprop="articleBody">
abc
</div>
</code>
</pre>

Don't. Use htmlspecialchars. That is there only to serve that very purpose
echo htmlspecialchars("<a href='test'>Test</a>");
Output of your HTML code
<pre title="language-markup"><code>
<div title="item_content item_view_content"
itemprop="articleBody">abc</div></code></pre>
Another example based on your comment
<code>
<?php
echo htmlspecialchars('html here');?>
</code>

Use either htmlspecialchars() or htmlentities()
$string = "<html></html>"
// Do this
$encodedString = htmlentities($string);
// or
$encodedString = htmlspecialchars($string);
The difference in these two functions is that one will encode everything or better said "entities". The other will only encode special characters.
Below are some quotes from PHP.net
From the PHP documentation for htmlentities:
This function is identical to htmlspecialchars() in all ways, except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.
From the PHP documentation for htmlspecialchars:
Certain characters have special significance in HTML, and should be represented by HTML entities if they are to preserve their meanings. This function returns a string with some of these conversions made; the translations made are those most useful for everyday web programming. If you require all HTML character entities to be translated, use htmlentities() instead.

Ok, I'm trying to fix my problem. I was successed, this is my code to resolve my problem. You can use my way or use Chetan Ameta's way bellow my answer:
function replaceString($string)
{
preg_match_all('/<code>(.*?)<\/code>/', $string, $matches);
$result = [];
foreach ($matches[1] as $key => $match) {
$result[$key] = str_replace(['<', '>'], ['<', '>'], $match);
}
return str_replace($matches[1], $result, $string);
}
$string = '<pre title="language-markup"><code><div title="item_content item_view_content" itemprop="articleBody">abc</div></code></pre>';
echo replaceString($string);
I like this place, thanks all help me, i'm so grateful. Thank again.

how do I use preg_replace_callback on unknown values?

I got some great help today with starting to understand preg_replace_callback with known values. But now I want to tackle unknown values.
$string = '<p id="keepthis"> text</p><div id="foo">text</div><div id="bar">more text</div><a id="red"href="page6.php">Page 6</a><a id="green"href="page7.php">Page 7</a>';
With that as my string, how would I go about using preg_replace_callback to remove all id's from divs and a tags but keeping the id in place for the p tag?
so from my string
<p id="keepthis"> text</p>
<div id="foo">text</div>
<div id="bar">more text</div>
<a id="red"href="page6.php">Page 6</a>
<a id="green"href="page7.php">Page 7</a>
to
<p id="keepthis"> text</p>
<div>text</div>
<div>more text</div>
Page 6
Page 7

There's no need of a callback.
$string = preg_replace('/(?<=<div|<a)( *id="[^"]+")/', ' ', $string);
Live demo
However in the use of preg_replace_callback:
echo preg_replace_callback(
'/(?<=<div|<a)( *id="[^"]+")/',
function ($match)
{
return " ";
},
$string
);
Demo

For your example, the following should work:
$result = preg_replace('/(<(a|div)[^>]*\s+)id="[^"]*"\s*/', '\1', $string);
Though in general you'd better avoid parsing HTML with regular expressions and use a proper parser instead (for example load the HTML into a DOMDocument and use the removeAttribute method, like in this answer). That way you can handle variations in markup and malformed HTML much better.

Stripping html tags using php

How can i strip html tag except the content inside the pre tag
code
$content="
<div id="wrapper">
Notes
</div>
<pre>
<div id="loginfos">asdasd</div>
</pre>
";
While using strip_tags($content,'') the html inside the pre tag too stripped of. but i don't want the html inside pre stripped off

Try :
echo strip_tags($text, '<pre>');

You may do the following:
Use preg_replace with 'e' modifier to replace contents of pre tags with some strings like ###1###, ###2###, etc. while storing this contents in some array
Run strip_tags()
Run preg_relace with 'e' modifier again to restore ###1###, etc. into original contents.
A bit kludgy but should work.

<?php
$document=html_entity_decode($content);
$search = array ("'<script[^>]*?>.*?</script>'si","'<[/!]*?[^<>]*?>'si","'([rn])[s]+'","'&(quot|#34);'i","'&(amp|#38);'i","'&(lt|#60);'i","'&(gt|#62);'i","'&(nbsp|#160);'i","'&(iexcl|#161);'i","'&(cent|#162);'i","'&(pound|#163);'i","'&(copy|#169);'i","'&#(d+);'e");
$replace = array ("","","\1","\"","&","<",">"," ",chr(161),chr(162),chr(163),chr(169),"chr(\1)");
$text = preg_replace($search, $replace, $document);
echo $text;
?>

$text = 'YOUR CODE HERE';
$org_text = $text;
// hide content within pre tags
$text = preg_replace( '/(<pre[^>]*>)(.*?)(<\/pre>)/is', '$1###pre###$3', $text );
// filter content
$text = strip_tags( $text, '<pre>' );
// insert back content of pre tags
if ( preg_match_all( '/(<pre[^>]*>)(.*?)(<\/pre>)/is', $org_text, $parts ) ) {
foreach ( $parts[2] as $code ) {
$text = preg_replace( '/###pre###/', $code, $text, 1 );
}
}
print_r( $text );

Ok!, you leave nothing but one choice: Regular Expressions... Nobody likes 'em, but they sure get the job done. First, replace the problematic text with something weird, like this:
preg_replace("#<pre>(.+?)</pre>#", "||k||", $content);
This will effectively change your
<pre> blah, blah, bllah....</pre>
for something else, and then call
strip_tags($content);
After that, you can just replace the original value in ||k||(or whatever you choose) and you'll get the desired result.

I think your content is not stored very well in the $content variable
could you check once by converting inner double quotes to single quotes
$content="
<div id='wrapper'>
Notes
</div>
<pre>
<div id='loginfos'>asdasd</div>
</pre>
";
strip_tags($content, '<pre>');

You may do the following:
Use preg_replace with 'e' modifier to replace contents of pre tags with some strings like ###1###, ###2###, etc. while storing this contents in some array
Run strip_tags()
Run preg_relace with 'e' modifier again to restore ###1###, etc. into original contents.
A bit kludgy but should work.
Could you please write full code. I understood, but something goes wrong. Please write full programming code

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Truncate Text Within Specific HTML Tag - php

Related

How to wrap every word in spans with PHP?

PHP - Replace Word in String (while ignore HTML tags)

how to regex character < and > replace like < and > in tag <code> </code>?

how do I use preg_replace_callback on unknown values?

Stripping html tags using php

Categories

Resources