How to extract only text from HTML string with PHP?

I want to extract only the text from a PHP string.
The string contains HTML markup (tags and so on).
I only need the plain text from it.
This is the actual string:
<div class="devblog-index-content battlelog-wordpress">
<p><strong>The celebration of the Recon class in our second </strong>BF4 Class Week<strong> continues with a sneaky stroll down memory lane. Learn more about how the Recon has changed in appearance, name and weaponry over the years…</strong></p>
<p> </p>
<p style="text-align:center"><img alt="bf4-history-of-recon-1" class="aligncenter" src="http://eaassets-a.akamaihd.net/battlelog/prod/954660ddbe53df808c23a0ba948e7971/en_US/blog/wp-content/uploads/2014/10/bf4-history-of-recon-1.jpg?v=1412871863.37" style="width:619px" /></p>
From that string, I want to output only this:
The celebration of the Recon class in our second BF4 Class Week continues with a sneaky stroll down memory lane. Learn more about how the Recon has changed in appearance, name and weaponry over the years…
This text will be placed in a meta description tag, so it must not contain any HTML.
How can I do this? Any ideas about the technique?

You may try:
echo(strip_tags($your_string));
More info here: http://php.net/manual/en/function.strip-tags.php
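For the meta description case specifically, a minimal sketch building on strip_tags() might also decode entities such as &nbsp;, collapse the leftover whitespace, and escape the result again when writing the attribute. The function name htmlToMetaDescription and the 160-character limit are assumptions for illustration; strip_tags() itself requires neither.

```php
<?php
// Sketch: turn an HTML fragment into plain text for <meta name="description">.
// htmlToMetaDescription and the 160-character default are hypothetical choices.
function htmlToMetaDescription(string $html, int $maxLength = 160): string
{
    // Remove all tags, keeping only their text content.
    $text = strip_tags($html);

    // Turn entities such as &amp; or &nbsp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace (including non-breaking spaces) into one space.
    $text = trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $text));

    // Optionally shorten to a typical description length, ending with an ellipsis.
    if (mb_strlen($text, 'UTF-8') > $maxLength) {
        $text = rtrim(mb_substr($text, 0, $maxLength - 1, 'UTF-8')) . '…';
    }
    return $text;
}

$html = '<p><strong>The celebration of the Recon class in our second </strong>BF4 Class Week'
      . '<strong> continues</strong></p><p>&nbsp;</p><p><img alt="x" src="a.jpg" /></p>';
$description = htmlToMetaDescription($html);

// Escape again when writing the attribute so quotes in the text cannot break the tag.
echo '<meta name="description" content="'
   . htmlspecialchars($description, ENT_QUOTES, 'UTF-8') . '">', "\n";
```

Escaping at output time matters because the decoded text may contain quotes or ampersands.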

Another option is to use Html2Text. It will do a much better job than strip_tags, especially if you want to parse complicated HTML code.
Extracting text from HTML is tricky, so your best bet is to use a library built for this purpose.
https://github.com/mtibben/html2text
Install using composer:
composer require html2text/html2text
Basic usage:
require 'vendor/autoload.php'; // Composer's autoloader
$html = new \Html2Text\Html2Text('Hello, "<b>world</b>"');
echo $html->getText(); // Hello, "WORLD"

For anyone else who needs this, the Stringizer library is another option; see its Strip Tags feature.
Full disclosure: I'm the owner of the project.

Related

How can I display and format HTML code within the page?

I want to display HTML code just like what you see here:
<textarea><script id="ff">gdgdgs</script></textarea>
and have it displayed without altering the page, nicely inside a box like this.
How is this achieved?

I think the best way is to actually have a look and see how Stackoverflow does it! :)
If you right-click on your code box in Chrome and select Inspect Element, it'll show you the markup for that box. It's so useful to be able to do this, obviously not to rip people off, but to learn how other people put websites together and how they achieve cool effects like code boxes! :)
Interestingly enough though, if you simply right click on the page and go to view source, you'll see something slightly different:
<pre><code>&lt;textarea&gt;&lt;script id="ff"&gt;gdgdgs&lt;/script&gt;&lt;/textarea&gt;
</code></pre>
So we can see here that this is what the markup for that box looks like before the page has loaded and any JavaScript has run. When the page starts to load on the client side, some JavaScript runs which takes the above markup and transforms it into the markup you see when you right-click the code box and inspect it in Chrome. Inspecting gives you a real-time view of the HTML on the page:
<pre class="lang-php prettyprint">
<code>
<span class="tag">&lt;textarea&gt;&lt;script</span>
<span class="pln"></span>
<span class="atn">id</span>
<span class="pun">=</span>
<span class="atv">"ff"</span>
<span class="tag">&gt;</span>
<span class="pln">gdgdgs</span>
<span class="tag">&lt;/script&gt;&lt;/textarea&gt;</span>
<span class="pln"><br></span>
</code>
</pre>
So if you have a look, you can see the transformed code uses a pre tag. This basically tells the browser to keep line breaks and spaces exactly where I left them! Tags inside a pre are still parsed as HTML, though, which is why the < and > characters in the code have to be escaped as &lt; and &gt;.
As well as using the pre tag to wrap the code, you can also see that they use different CSS classes. This is to achieve the color coding you can see.
They also use a code tag. Unlike pre, code doesn't preserve whitespace on its own; it mainly makes your markup clearer by saying "within this tag, you should expect to see code". It's more semantic than anything, like the HTML article tag. In most browsers it also switches the text inside it to a monospace font, which is a bit more code-like! :)
You can go further into this and see exactly what their CSS classes look like; from this you can start to build a mental picture of how their markup and CSS work together to produce their nice code boxes.
Of course, if you don't want to roll this functionality yourself, you can use someone else's framework to achieve this. SyntaxHighlighter, for example, is widely used and recommended.
With SyntaxHighlighter, you simply reference its CSS and JavaScript, and then only need to wrap your code in one pre tag (escaping < as &lt;) to get it working, something like below:
<pre class="brush: xml">
&lt;textarea&gt;&lt;script id="ff"&gt;gdgdgs&lt;/script&gt;&lt;/textarea&gt;
</pre>
It might be worth a look!
Hope this helps! :)

You could use &gt; for > and &lt; for <.
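When the snippet comes from PHP, htmlspecialchars() generates those entities for you. A minimal sketch, where the helper name renderCodeBlock is hypothetical:

```php
<?php
// Sketch: show a snippet of HTML as visible code instead of letting the
// browser parse it. htmlspecialchars() turns < > & " ' into entities
// such as &lt; and &gt;. renderCodeBlock is a hypothetical helper name.
function renderCodeBlock(string $code): string
{
    $escaped = htmlspecialchars($code, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // <pre> keeps line breaks and spaces; <code> marks the content as code.
    return '<pre><code>' . $escaped . '</code></pre>';
}

echo renderCodeBlock('<textarea><script id="ff">gdgdgs</script></textarea>'), "\n";
```

The browser then displays the tags literally inside the box, and CSS on pre/code can style it.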

This website here can help you with your particular problem: it converts your tags/HTML/JavaScript into character codes. If you need a function, here it is. It converts every character of the passed string into a numeric HTML character reference, which the browser displays as text instead of parsing it as markup. You can later put the generated output into the box.
function stringToAscii(s)
{
  // Encode every character as a numeric character reference,
  // e.g. "<" (code 60) becomes "&#060;", so it is shown as text.
  var ascii = "";
  for (var i = 0; i < s.length; i++)
  {
    var c = "" + s.charCodeAt(i);
    while (c.length < 3)
      c = "0" + c; // pad to three digits
    ascii += "&#" + c + ";";
  }
  return ascii;
}

Use the Encoded Version like this:
&lt;textarea&gt;
&lt;script id="ff"&gt;
gdgdgs
&lt;/script&gt;
&lt;/textarea&gt;

Is this what you mean?
<textarea><script id="ff">gdgdgs</script></textarea>
Look up HTML entities.

Yeah, just insert it as text, e.g. with jQuery's .text():
$(document).ready(function(){
var a = '<textarea><script id="ff">gdgdgs</scrip'+'t></textarea>';
$("div").css('background','red').text(a);
});

I use the <xmp> element.

Select first DOM Element of type text using phpQuery

Let's say I have this block of code:
<div id="id1">
This is some text
<div class="class1"><p>lala</p> Some markup</div>
</div>
What I want is only the text "This is some text", without the contents of the child element .class1. I can do it in jQuery using $('#id1').contents().eq(0).text(); how can I do this in phpQuery?
Thanks.

My bad, I was doing
pq('#id1.contents().eq(0).text()')
instead of
pq('#id1')->contents()->eq(0)->text()

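If compatibility is what you're after and you want to traverse/manipulate elements as DOM objects, then perhaps the PHP DOM XML library is what you need: http://www.php.net/manual/en/book.domxml.php (note that DOM XML is a PHP 4 extension; PHP 5 and later replace it with the DOM extension and its DOMDocument class).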
Your code would look something like this:
$xml = xmldoc('<div id="id1">This is some text<div class="class1"><p>lala</p> Some markup</div></div>');
$node = $xml->get_element_by_id("id1");
$content = $node->get_content();
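I'm sorry, I don't have time to test this right now, but hopefully it sets you in the right direction and forms the basis for a decent revision. Note that get_content() also returns the text of the node's descendants, so you would still need to pick out the first text child. There is a good list of DOM traversal functions in the PHP documentation, though :)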
References: http://www.php.net/manual/en/book.domxml.php, http://www.php.net/manual/en/function.domdocument-get-element-by-id.php, http://www.php.net/manual/en/function.domnode-get-content.php
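For PHP 5 and later, a sketch with the built-in DOM extension (DOMDocument plus DOMXPath, swapped in here for the old DOM XML functions) can select just the first text node inside #id1. The XPath expression is my assumption about how to mirror jQuery's .contents().eq(0); it is not phpQuery's API.

```php
<?php
// Sketch using PHP's DOM extension: get only the first text node directly
// inside #id1, similar to jQuery's .contents().eq(0).text().
$html = '<div id="id1">
This is some text
<div class="class1"><p>lala</p> Some markup</div>
</div>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// text()[1] selects the first child text node, not the descendants' text.
$nodes = $xpath->query('//*[@id="id1"]/text()[1]');

$firstText = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : '';
echo $firstText, "\n"; // This is some text
```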

php to display text as html formatted stored in database

I have some text saved in a MySQL database:
<p> Celebrate with these amazing<br /> offers direct from the
When I print that text using echo,
I get this output:
<p> Celebrate with these amazing<br /> offers direct from the
but I want to display it as
Celebrate with these amazing
offers direct from the
like rendered HTML.
When I look in the DB, it's stored like below:
&lt;p&gt;
Celebrate with these amazing&lt;br /&gt;
offers from
How can I do this?

Assuming your database has saved the escaped information like this:
&lt;p&gt;
Celebrate with these amazing&lt;br /&gt;
offers from
Then you could just use PHP's html_entity_decode function to output that block of HTML.
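A minimal sketch of that, where $stored stands in (hypothetically) for the value read from the database. Only decode content you trust: once decoded, any markup in it, scripts included, will be rendered by the browser.

```php
<?php
// Sketch: the row holds escaped HTML (e.g. produced earlier by htmlentities()).
// $stored is a stand-in for the value fetched from MySQL.
$stored = "&lt;p&gt;\nCelebrate with these amazing&lt;br /&gt;\noffers from";

// Turn &lt; / &gt; back into real < and > so the browser renders the markup.
$html = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $html, "\n";
```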

Maybe this could help you: http://php.about.com/od/phpwithmysql/qt/php_in_mysql.htm

If you are outputting to the console (it's not clear from your question what you are trying to output to):
If you look at the PHP strip_tags documentation, you should be able to reach something approximating what you want. http://uk3.php.net/manual/en/function.strip-tags.php
With strip_tags(), you can allow the <br> tag to remain (using its second, allowed-tags argument) and then replace it with "\n" afterwards.
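A sketch of that strip_tags() approach for console output. The function name htmlToConsoleText is hypothetical, and the extra per-line trim() is an assumption about how you want the spacing handled.

```php
<?php
// Sketch for console output: keep only <br> tags via strip_tags()'s
// allowed-tags argument, then turn each <br>, <br/> or <br /> into "\n".
function htmlToConsoleText(string $html): string
{
    // Allowing "<br>" also keeps the self-closing forms <br/> and <br />.
    $text = strip_tags($html, '<br>');
    // Replace every <br> variant with a newline (case-insensitive).
    $text = preg_replace('~<br\s*/?>~i', "\n", $text);
    // Trim stray spaces around each line.
    $lines = array_map('trim', explode("\n", $text));
    return trim(implode("\n", $lines));
}

echo htmlToConsoleText('<p> Celebrate with these amazing<br /> offers direct from the</p>'), "\n";
```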
Otherwise, if you are outputting to a browser, then other posts on this page have the answer: you are probably storing the htmlentities() version of the data in the database.

converting html text to an image with php

What's the best way to convert text embedded in an HTML tag into an image using PHP, keeping the style written in the HTML tag? For example,
convert:
<span class="Apple-style-span" style="font-size: xx-large;"><font class="Apple-style-span" color="#F4A460">Stack </font><font class="Apple-style-span" color="#800000">Overflow</font></span>
into an image of that styled text.
Is there any class for it? Or should I explode it and read the tags one by one? Any suggestions?

Might want to have a look at Painty. Although it isn't exactly what you're looking for because you'll have to feed it an array of options, it should be a good resource on which you can expand.
Not sure if you also want to render the font(s) used in your HTML snippet, but if you do, you would also have to get all the commonly used web fonts and put them in a folder the script can read from.
Hope this helps.

With PHP GD Library support, yes:
http://visionmasterdesigns.com/tutorial-convert-text-into-transparent-png-image-using-php/ (font/size technique included)
http://corpocrat.com/2009/06/23/php-script-to-convert-textemail-address-to-image/
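As a rough sketch of the GD route (assuming the GD extension is enabled), the two colours from the question can be drawn with GD's built-in font 5, so no font file is needed. It ignores font-size: xx-large and any web font; matching those would mean imagettftext() and a .ttf file. The function name renderColouredText is hypothetical.

```php
<?php
// Sketch (requires the GD extension): draw each [text, "#RRGGBB"] part in its
// colour and return PNG bytes. GD's built-in font 5 avoids needing a .ttf file.
function renderColouredText(array $parts): string
{
    $font = 5;
    $charWidth = imagefontwidth($font);

    // Size the canvas from the total number of characters plus a 5px margin.
    $totalChars = 0;
    foreach ($parts as [$text]) {
        $totalChars += strlen($text);
    }
    $image = imagecreatetruecolor($totalChars * $charWidth + 10, imagefontheight($font) + 10);
    imagefill($image, 0, 0, imagecolorallocate($image, 255, 255, 255));

    $x = 5;
    foreach ($parts as [$text, $hex]) {
        // "#F4A460" -> red 0xF4, green 0xA4, blue 0x60
        $colour = imagecolorallocate(
            $image,
            hexdec(substr($hex, 1, 2)),
            hexdec(substr($hex, 3, 2)),
            hexdec(substr($hex, 5, 2))
        );
        imagestring($image, $font, $x, 5, $text, $colour);
        $x += strlen($text) * $charWidth;
    }

    // Capture the PNG output in a string instead of sending it to the browser.
    ob_start();
    imagepng($image);
    return ob_get_clean();
}

// Colours taken from the <font color="..."> attributes in the question.
$png = renderColouredText([['Stack ', '#F4A460'], ['Overflow', '#800000']]);
// e.g. file_put_contents('stackoverflow.png', $png);
```

Parsing the colours out of the HTML itself (rather than passing them in) would still need a DOM parser.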

Check this one out
http://code.google.com/p/wkhtmltopdf/downloads/list
The project is centered on HTML to PDF using the WebKit engine, but there are also binaries and source for HTML to image (wkhtmltoimage). It's an external binary though, so it might not be useful in your use case.
Otherwise I would look into ImageMagick.

Using PHP PCRE to fetch div content

I'm trying to fetch data from a div (based on its id) using PHP's PCRE. The goal is to fetch the div's contents based on its id, using recursion / depth to get everything inside it. The main problem is getting other divs inside the "main div", because the regex stops at the first </div> it finds after the initial <div id="test">.
I've tried many different approaches, and none of them worked. The best solution, in my opinion, is to use the (?R) recursion syntax, but I never got it to work properly.
Any ideas?
Thanks in advance :D

You'd be much better off using some form of DOM parser - regex really isn't suited to this problem. If all you want is basic HTML dom parsing, something like simplehtmldom would be right up your alley. It's trivial to install (just include a single PHP file) and trivial to use (2-3 lines will do what you need).
include('simple_html_dom.php');
$dom = str_get_html($bunchofhtmlcode); // parse the HTML string
$testdiv = $dom->find('div#test', 0); // 0 for the first occurrence
$testdiv_contents = $testdiv->innertext; // everything inside the div, nested divs included
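If you'd rather avoid a third-party file, PHP's built-in DOM extension can do the same job. This is a sketch swapped in for simplehtmldom, not part of it; innerHtmlById is a hypothetical helper and assumes the id contains no double quotes.

```php
<?php
// Sketch with PHP's DOM extension instead of regex or a third-party parser:
// return the inner HTML of the element with the given id, nested divs included.
function innerHtmlById(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Assumes $id contains no double quote, since it is pasted into the XPath.
    $xpath = new DOMXPath($doc);
    $node = $xpath->query('//*[@id="' . $id . '"]')->item(0);
    if ($node === null) {
        return null;
    }

    // Serialize every child node; together they form the element's inner HTML.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

$html = '<div id="test"><div>one</div><div><div>two</div></div></div><div>after</div>';
echo innerHtmlById($html, 'test'), "\n";
```

Because the parser builds a real tree, nesting depth is handled for free, which is exactly where the regex approach breaks down.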
