How to display only the text with substr() - php

In my database I have saved an article that contains HTML such as <br>, <table>, etc.
Example:
<tr>
<td style="width:50%;text-align:left;">Temas nuevos - Comunidad</td>
<td class="blocksubhead" style="width:50%;text-align:left;">Temas actualizados - Comunidad Temas actualizados - Comunidad</td>
</tr>
What I want is to display a summary of the article on another screen using substr(). My problem is that I cannot print what I want, because the HTML code comes first in the string.
Example: echo substr($row["news"], 0, 20);
This prints the first 20 characters, so the browser only receives markup like:
<td style="width:50%;text-align:l<td/>
What I want is to show only the text and discard the HTML code.

Use strip_tags() to strip the HTML from the string first.
So: echo substr(strip_tags($row["news"]), 0, 20);
http://php.net/manual/en/function.strip-tags.php
You could also do it using preg_replace() to match and replace anything that looks like a tag :)
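Both suggestions can be sketched together. This is a minimal example, not the asker's actual code: the function names plain_summary() and strip_tags_regex() and the sample $news string are made up for illustration. Note that substr() counts bytes, so for multibyte text (accented characters in UTF-8) mb_substr() from the mbstring extension may be the safer choice.

```php
<?php
// Hypothetical helper: plain-text summary of an HTML fragment.
function plain_summary(string $html, int $length = 20): string
{
    // Remove the tags first, so that substr() only ever sees text.
    $text = strip_tags($html);
    // Collapse whitespace runs left behind by the removed markup.
    $text = trim(preg_replace('/\s+/', ' ', $text));
    return substr($text, 0, $length);
}

// Hypothetical regex alternative: drops anything that looks like <...>.
// It is cruder than strip_tags() (e.g. a ">" inside an attribute value confuses it).
function strip_tags_regex(string $html): string
{
    return preg_replace('/<[^>]*>/', '', $html);
}

$news = '<tr><td style="width:50%;text-align:left;">Temas nuevos - Comunidad</td></tr>';

echo plain_summary($news), "\n";     // Temas nuevos - Comun
echo strip_tags_regex($news), "\n";  // Temas nuevos - Comunidad
```

strip_tags() is usually the better default; the regex version is shown only because the answer mentions it.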

The strip_tags() function strips HTML, XML, and PHP tags from a string.
// Remove the HTML from the string.
$row["news"] = strip_tags($row["news"]);
// Get the first 20 characters of the string and display them as output.
echo substr($row["news"], 0, 20);

Related

Substr String With HTML In PHP

I have some text that comes back from my database like so:
<span rgb(61,="" 36,="" 36);="" font-family:="" 'frutiger="" neue="" w01="" book',="" 'helvetica="" neue',="" helvetica,="" arial,="" sans-serif;="" line-height:="" 23.8px;"="">The Department of ...
I use echo html_entity_decode($item->body); to display:
The Department of ...
However, if I use the PHP substr function on this content it never displays correctly. It will display the first x characters of HTML and not the HTML formatted text.
Here's what I tried: echo substr(html_entity_decode($item->body), 0, 5);
But it doesn't display anything. If I try an amount like 0, 200); it will display:
The Department of Molec
But this is most definitely not the first 200 characters of the formatted text because the first character is T.
My idea is that there must be a way to format and then substr, even though I can't get it to work using html_entity_decode() and substr() by themselves.
Can anybody help me out here? Thanks!
Try to use this instead of html_entity_decode():
strip_tags($item->body);
strip_tags removes all HTML tags from the string. So you are better off cleaning the string first and then doing something with it.
You will see the output in the page source, but it is not being rendered. The source will show:
echo substr(html_entity_decode($item->body), 0, 5);
// Output: "<span"
What you probably want to do is search for the end of the html-tag, and display 5 characters after that, like:
$text = html_entity_decode($item->body);
$start = strpos( $text, '>' ) + 1;
echo substr( $text, $start, 5 );
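Since the question displays the content with html_entity_decode(), the stored value is presumably entity-encoded. Under that assumption, a sketch that combines both answers would decode first, then strip the tags, then cut. The $body value below is a made-up stand-in for $item->body.

```php
<?php
// Hypothetical stand-in for $item->body: HTML stored in entity-encoded form.
$body = '&lt;span style=&quot;color: red;&quot;&gt;The Department of Examples&lt;/span&gt;';

// Decoding turns the entities back into real markup.
$html = html_entity_decode($body);

// Cutting the markup directly returns part of the opening tag,
// which the browser does not render as visible text.
$raw = substr($html, 0, 5);                 // "<span"

// Stripping the tags first leaves only the visible text to cut.
$preview = substr(strip_tags($html), 0, 5); // "The D"

echo $preview, "\n";
```

Unlike the strpos() approach, this does not depend on the text starting right after the first tag.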

substr doesn't work after using strip_tags

I want to remove all tags before showing them on preview mode (just some text).
I have this code:
$text = strip_tags($item['content']);
echo substr($text,0,13);
My $item['content'] looks something like this:
<div class="note note-success">
<p>
Font Awesome gives you scalable
vector icons that can instantly be customized — size, color, drop
shadow, and anything that can be done with the power of CSS. The
complete set of 439 icons in Font Awesome 4.1.0
</p>
For more info check out: <a target="_blank" href="http://fortawesome.github.io/Font-Awesome/icons/">http://fortawesome.github.io/Font-Awesome/icons/</a>
</div>
The problem is that when I use substr it doesn't show anything, but when I use normal echo, it shows the content of the variable that was stripped before.
Does strip_tags not give string output?
Try to remove whitespaces before outputting your substring:
$new = str_replace(' ','',$text); (Use trim instead, as @mario.klump said)
$text = strip_tags($item['content']);
$new = trim($text);
echo substr($new,0,13);
The strip_tags() function only works on actual HTML markup, like the text in the example below. If your content is stored HTML-encoded, it contains no real tags, so nothing will be parsed and stripped.
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
For your example you can use like this:
$text = htmlentities($item['content']);
echo substr(html_entity_decode($text),0,13);
// or
echo substr($text,0,13);
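One likely reason substr() appears to print nothing: strip_tags() removes the tags but keeps the newlines and indentation between them, so the first 13 bytes can be pure whitespace, which a browser collapses. A sketch assuming indented content (the $content value is invented for illustration):

```php
<?php
// Hypothetical content, indented the way a template or editor might leave it.
$content = '<div class="note note-success">' . "\n"
    . str_repeat(' ', 12) . '<p>' . "\n"
    . str_repeat(' ', 16) . 'Font Awesome gives you scalable vector icons' . "\n"
    . str_repeat(' ', 12) . '</p>' . "\n"
    . '</div>';

$text = strip_tags($content);

// The first 13 bytes are the newline and indentation that preceded <p>,
// so this substring is pure whitespace and looks empty in a browser.
$head = substr($text, 0, 13);
var_dump(trim($head) === '');           // bool(true)

// Collapse whitespace runs to single spaces and trim before cutting.
$clean = trim(preg_replace('/\s+/', ' ', $text));
echo '[', substr($clean, 0, 13), "]\n"; // [Font Awesome ]
```

trim() alone fixes the leading whitespace; collapsing the inner runs as well keeps the preview from wasting characters on line breaks.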

Preg_match to ignore new lines or tabs

My html output source is something like this
<td><span class="bookdetailtitle">ISBN</span></td>
<td>:</td>
<td>9788172338299</td>
I need only 9788172338299 to be printed. If the above markup is on a single line, it prints properly. But since there are newlines and tabs, I get no output. I tried replacing /i with /s, but that did not work. I want preg_match to match the string regardless of newlines or tabs and print the desired output.
Here is my code:
$page2='<td><span class="bookdetailtitle">ISBN</span></td>
<td>:</td>
<td>9788172338299</td>';
preg_match('/<td><span class="bookdetailtitle">ISBN<\/span><\/td><td>:<\/td><td>(.*)<\/td>/s', $page2, $keywords);
echo $keywords_out = $keywords[1];
If you just need the number, how about something like this?
$page2='<td><span class="bookdetailtitle">ISBN</span></td>
<td>:</td>
<td>9788172338299</td>';
preg_match('/<td>([0-9]+)<\/td>/', $page2, $keywords);
print_r($keywords);
http://phpfiddle.org/main/code/43j-t8b
P.S. many will say - don't use regex for parsing html. I agree. :)
I would do something like this:
$page=explode('<td>',$page2);
print_r($page[3]);
http://phpfiddle.org/main/code/buf-95c
Edit: to get rid of last td -> print_r(strip_tags($page[3]));
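For completeness, the original pattern can also be kept almost as is. The /s modifier only changes what . matches; the pattern has no . between the tags, so the newlines there have nothing to match them. A sketch that allows optional whitespace between the cells (the $pattern variable name is mine):

```php
<?php
$page2 = '<td><span class="bookdetailtitle">ISBN</span></td>
<td>:</td>
<td>9788172338299</td>';

// \s* between the cells matches newlines, tabs and spaces (or nothing at all).
// # is used as the delimiter so that "/" needs no escaping.
$pattern = '#<td><span class="bookdetailtitle">ISBN</span></td>\s*<td>:</td>\s*<td>(.*?)</td>#s';

if (preg_match($pattern, $page2, $keywords)) {
    echo $keywords[1], "\n"; // 9788172338299
}
```

The non-greedy (.*?) stops at the first closing </td>, which matters once more cells follow.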

How to count characters while skipping the <img></img> tag, but keeping it in the text? (Joomla)

I used this
$item->introtext = JHtml::_('content.prepare', $item->introtext);
$item->introtext = strip_tags($item->introtext);
$item->introtext = substr($item->introtext, 0, 50);
But then only plain text appears, without the image. I want to count characters after the img tag, but still have the img tag in the text after counting.
You can store the <img></img> and its position in the string in an array, then reinsert it after the count.
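One way to sketch that idea without tracking positions by hand is to split the string around the <img> tags and only count the text pieces. The helper name truncate_keep_images() and the sample $intro are hypothetical; in Joomla the input would be $item->introtext.

```php
<?php
// Hypothetical helper: truncate to $limit text characters, leaving <img> tags uncounted.
function truncate_keep_images(string $html, int $limit): string
{
    // Keep only <img> tags; every other tag is removed.
    $html = strip_tags($html, '<img>');
    // Split into text pieces and <img ...> pieces, keeping the tags in the result.
    $parts = preg_split('/(<img\b[^>]*>)/i', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $out = '';
    $remaining = $limit;
    foreach ($parts as $part) {
        if (stripos($part, '<img') === 0) {
            $out .= $part;          // images are copied through and not counted
            continue;
        }
        $piece = substr($part, 0, $remaining);
        $out .= $piece;
        $remaining -= strlen($piece);
        if ($remaining <= 0) {
            break;                  // text budget used up; anything later is dropped
        }
    }
    return $out;
}

$intro = '<p><img src="a.png" alt="">Hello <b>world</b>, this is the intro text.</p>';
echo truncate_keep_images($intro, 11), "\n"; // <img src="a.png" alt="">Hello world
```

Images that come after the cut-off point are dropped here; whether that is desirable is a design choice.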

php preg_match_all html dates with slashes error

I'm trying to preg_match_all a date with slashes in it that sits between two HTML tags; however, it returns null.
Here is the HTML:
<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>
Here is my preg_match_all() code:
preg_match_all('/<td width=\'40%\' align=\'right\' class=\'SmallDimmedText\'>Last([a-zA-Z0-9\s\.\-\',]*)<\/td>/', $h, $table_content, PREG_PATTERN_ORDER);
where $h is the HTML above.
What am I doing wrong?
Thanks in advance.
From a quick glance, it is because you are trying to match:
Last Login: 11/14/2009
With this regex:
Last([a-zA-Z0-9\s\.\-\',]*)
The regex doesn't contain the characters : and /, which appear in the text string. Changing that part of the regex to:
Last([a-zA-Z0-9\s\.\-\',:/]*)
gives a match. (If / stays the pattern delimiter, the / inside the class must be escaped as \/.)
Would it be better to simply use a DOM parser, and then perform the regex on the result of the DOM lookup? It makes for nicer regex...
EDIT
The other issue is that your HTML is:
...40%' align='right'class='SmallDimmedText'>...
Where there is no space between align='right' and class='SmallDimmedText'
However your regex for that section is:
...40%\' align=\'right\' class=\'SmallDimmedText\'>...
Where it is indicated there is a space.
Use a DOM parser. It will save you more headaches caused by subtle bugs than you can count.
Just to give you an idea of how simple it is to parse using Simple HTML DOM:
$html = str_get_html(...);
$elems = $html->find('.SmallDimmedText');
// find() without an index returns an array of matching elements
if ( count($elems) != 1 ){
throw new Exception('Too many/few elements found');
}
$text = $elems[0]->plaintext;
//parsing here is only an example, but you have removed all
//the html so that any regex used is really simple.
$date = substr($text, strlen('Last Login: '));
$unixTime = strtotime($date);
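The same idea can be sketched with PHP's built-in DOM extension instead of Simple HTML DOM. This swaps in a different library than the answer uses, and the $h string is a hypothetical, well-formed stand-in for the real page (wrapped in a table, with a space before class).

```php
<?php
// Hypothetical, well-formed stand-in for the page being scraped.
$h = "<table><tr><td width='40%' align='right' class='SmallDimmedText'>"
   . "Last Login: 11/14/2009</td></tr></table>";

libxml_use_internal_errors(true);   // keep parser warnings about sloppy HTML out of the output
$doc = new DOMDocument();
$doc->loadHTML($h);

// Select every <td> whose class attribute is exactly "SmallDimmedText".
$xpath = new DOMXPath($doc);
$cells = $xpath->query("//td[@class='SmallDimmedText']");
if ($cells->length !== 1) {
    throw new Exception('Too many/few elements found');
}

$text = trim($cells->item(0)->textContent);     // "Last Login: 11/14/2009"
$date = substr($text, strlen('Last Login: '));  // "11/14/2009"
echo date('Y-m-d', strtotime($date)), "\n";     // 2009-11-14
```

strtotime() reads slash-separated dates as month/day/year, which matches the sample value.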
I see at least two problems:
1. In your HTML string, there is no space between 'right' and class=, but there is one in your regex.
2. You must add at least these 3 characters to the list of matched characters, between the []:
':' (there is one between "Login" and the date),
' ' (there are spaces between "Last" and "Login", and between ":" and the date),
and '/' (between the date parts).
With this code, it seems to work better:
$h = "<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>";
if (preg_match_all("#<td width='40%' align='right'class='SmallDimmedText'>Last([a-zA-Z0-9\s\.\-',: /]*)<\/td>#",
$h, $table_content, PREG_PATTERN_ORDER)) {
var_dump($table_content);
}
I get this output:
array
0 =>
array
0 => string '<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>' (length=80)
1 =>
array
0 => string ' Login: 11/14/2009' (length=18)
Note I have also used:
# as a regex delimiter, to avoid having to escape slashes
" as a string delimiter, to avoid having to escape single quotes
My first suggestion would be to minimize the amount of text you have in the preg_match_all; why not just match between a ">" and a "<"? Second, I'd end up writing the regex like this (with # as the delimiter, so the slashes in the date need no escaping), not sure if it helps:
#>.*[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}<#
That will look for the end of one tag, then any characters, then a date, then the beginning of another tag.
I agree with Yacoby.
At the very least, remove all references to the HTML specifics and simply make the regex:
preg_match_all('#Last Login: ([\d+/?]+)#', ...
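A sketch of that last idea with a stricter pattern than the one quoted above. The \d{1,2}/\d{1,2}/\d{4} shape and the use of DateTime::createFromFormat() are my assumptions about the date format (month/day/year), not something the question states.

```php
<?php
$h = "<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>";

// Anchor on the visible label only; the surrounding attributes no longer matter.
if (preg_match('#Last Login:\s*(\d{1,2}/\d{1,2}/\d{4})#', $h, $m)) {
    // "!" resets unspecified fields (the time) to zero instead of "now".
    $date = DateTime::createFromFormat('!m/d/Y', $m[1]);
    echo $date->format('Y-m-d'), "\n"; // 2009-11-14
}
```

Because the pattern ignores the tag itself, the missing space between align='right' and class= no longer affects the match.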
