How to select content without element using simplehtmldom


I am trying to extract data from a website using PHP with the help of simplehtmldom.
The problem is that the text I need is not wrapped in an element of its own; it sits between two sibling elements:
<div class="div_1">First div</div> The text i need to grab <div class="div_2">Second div</div>
In the example above, I need to extract "The text i need to grab".

If simplehtmldom is not required, try a regular expression; the text you want ends up in capture group 1:
<div[^<>]*class="div_1"[^<>]*>.*?</div>(.*?)<div[^<>]*class="div_2"[^<>]*>.*?</div>
http://regexr.com/3bd5f
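As a rough sketch of how this could look in PHP, the snippet below applies the regular expression above with preg_match and, as an alternative that was not part of the answer, selects the same text node with PHP's built-in DOMDocument and DOMXPath. The sample string and its html/body wrapper, the "~" delimiters, the "s" flag, and the variable names are all assumptions made for illustration.

```php
<?php
// Sample markup from the question, wrapped in a body (an assumption about the page).
$html = '<html><body><div class="div_1">First div</div> The text i need to grab '
      . '<div class="div_2">Second div</div></body></html>';

// Approach 1: the regular expression, with "~" delimiters and the "s" flag
// so that "." also matches newlines.
$pattern = '~<div[^<>]*class="div_1"[^<>]*>.*?</div>(.*?)<div[^<>]*class="div_2"[^<>]*>.*?</div>~s';
$fromRegex = preg_match($pattern, $html, $m) === 1 ? trim($m[1]) : null;

// Approach 2: DOMDocument + XPath, selecting the first text node after div_1.
$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@class="div_1"]/following-sibling::text()[1]');
$fromDom = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;

echo $fromRegex, PHP_EOL;
echo $fromDom, PHP_EOL;
```

The DOM variant tends to survive attribute reordering and extra whitespace in the markup better than the regex does.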

Related

XPath data after closing tag

I have this HTML:
<span id="bla">text</span>more text
I want to get text and more text.
I have this XPath:
//span[@id="bla"]/text()
I can't figure out how to get past the closing tag and select what comes after it.

The "more text" part is sometimes called the "tail" of an element, and it can be retrieved via the following-sibling axis:
//span[@id="bla"]/following-sibling::text()

<span id="bla">text</span>more text alone is not well-formed and cannot be processed via XPath.
Let's put it in context:
<div><span id="bla">text</span>more text</div>
Then, you can simply take the string value of the parent element, div:
string(/div)
to get
textmore text
as requested.
If there's other surrounding content that you don't want:
<div>DO NOT WANT<span id="bla">text</span>more text<b/>DO NOT WANT</div>
You can follow @alecxe's lead with the following-sibling:: axis and use concat() to combine the parts you want:
concat(//span[@id="bla"], //span[@id="bla"]/following-sibling::text()[1])
to again get
textmore text
as requested.
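For readers working in PHP, here is a minimal sketch of evaluating these expressions with DOMXPath::evaluate. It assumes the well-formed sample with the unwanted surrounding content; the variable names are illustrative only.

```php
<?php
// The well-formed sample with surrounding content that should be ignored.
$xml = '<div>DO NOT WANT<span id="bla">text</span>more text<b/>DO NOT WANT</div>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Text node immediately after the span (its "tail").
$tail = $xpath->evaluate('string(//span[@id="bla"]/following-sibling::text()[1])');

// Span content plus tail, combined by XPath itself.
$both = $xpath->evaluate(
    'concat(//span[@id="bla"], //span[@id="bla"]/following-sibling::text()[1])'
);

echo $tail, PHP_EOL; // more text
echo $both, PHP_EOL; // textmore text
```

evaluate() is used rather than query() because both expressions return a string, not a node list.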

Is there any way to rename or hide only one HTML tag?

Preface: I cannot rename the source tags or edit their IDs. Any changes to the tags must happen after they have been fetched.
What I'm doing: using file_get_contents in PHP, I am requesting data from a remote site. This data is just two <p> tags. I need to hide or rename the second of the two <p> tags.
Is this possible with PHP or jQuery?
What I'm working with:
<p>Hello my name is test</p><p>I like studying geology.</p>

If you need to hide the second paragraph, you can do this with jQuery:
$('p:eq(1)').hide();

You could try a PHP string replace. Note that this merges the two paragraphs into one rather than hiding the second:
$new_string = str_replace('</p><p>','',file_get_contents('somecontent'));

If you need to do it before the HTML is rendered, you have to parse the content, remove or replace the second p tag, and build the new content from the result.
A DOM parser you can use for this: Simple HTML DOM Parser
Similar questions:
How to match second <a> tag in this string
How to add attribute to first P tag using PHP regular expression?
Alternatively, you can do it after the HTML has rendered, as rNix suggested.
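A possible server-side version of the parse-and-remove approach is sketched below. It uses PHP's built-in DOMDocument rather than Simple HTML DOM Parser, and the $fetched string is a stand-in (an assumption) for whatever file_get_contents returns.

```php
<?php
// Stand-in for the string returned by file_get_contents() in the question.
$fetched = '<p>Hello my name is test</p><p>I like studying geology.</p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate imperfect markup
$doc->loadHTML($fetched);
libxml_clear_errors();

// Remove the second <p>, if there is one.
$paragraphs = $doc->getElementsByTagName('p');
if ($paragraphs->length > 1) {
    $second = $paragraphs->item(1);
    $second->parentNode->removeChild($second);
}

// loadHTML() wraps fragments in <html><body>; serialize only the body's children.
$result = '';
$body = $doc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}

echo trim($result), PHP_EOL;
```

To hide the paragraph instead of removing it, the same $second node could be given a class or style attribute with setAttribute() before serializing.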

Replacing Content inside DIV Tags

I am trying to replace the content inside a particular div tag (id="dd-header") with a comment. I have tried several methods, including regular expressions. This is my latest try:
$html = preg_replace('/(<div\sid=\"dd\-header\">)[^<]+(<\/div>)/i', '<!-- Comment -->', $html);
Couldn't get it working. What am I doing wrong here?
NOTE: the div contains multiple nested tags
Sample Code to Replace
<div id="dd-header">
<a id="logo-small" href="/" title="title"></a>
Link 1 |
Link 2
<!-- Image | -->
| Link 3</div>

$html = preg_replace('/(<div\sid="dd-header">)([^<]|<.+>.*<\/.+>)+(<\/div>)/i', '$1<!-- Comment -->$3', $html);
see http://codepad.org/GpYkteh4

While you can do this in simple cases, as posted by rabudde, you can't handle the general case with regular expressions. This is a limitation of the regular expression language, and it has been discussed extensively here on SO.
rabudde's code fails when a div contains sub-tags.
The correct way to do it is to parse the tree with an (X)HTML parser, find the div node, remove its children, and replace them with whatever you like.

Just use DOMDocument. It'll parse it into a DOM that is ridiculously easy to traverse, search by ID, and manipulate.
See the documentation, starting with loadHTML: http://docs.php.net/manual/en/domdocument.loadhtml.php
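A minimal sketch of that DOMDocument approach follows: find the div, drop its children, and append a comment node. The sample markup is modeled on the question (with the links reduced to plain text), and the variable names are assumptions.

```php
<?php
// Sample markup modeled on the question (link texts are plain text here).
$html = '<div id="dd-header"><a id="logo-small" href="/" title="title"></a>'
      . ' Link 1 | Link 2 <!-- Image | --> | Link 3</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@id="dd-header"]')->item(0);

$result = null;
if ($div !== null) {
    // Drop every child node, nested tags and comments included.
    while ($div->firstChild !== null) {
        $div->removeChild($div->firstChild);
    }
    // Put a single comment node in their place.
    $div->appendChild($doc->createComment(' Comment '));
    $result = $doc->saveHTML($div);
}

echo $result, PHP_EOL;
```

Because the parser builds a tree, nested tags inside the div need no special handling, which is where the regular expressions run into trouble.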

How can I scrape the text from a certain DIV using PHP and exclude html tags inside the DIV

I have a project I'm working on where I need to scrape the text out of a specific div tag, but only the text, no HTML tags.
Here is example of the html:
<div id="divid1" class="divclass1">
<h1>
TEXT INSIDE DIV
</h1>
</div>
I need to scrape the text inside the DIV without the H1 tags. I've tried this numerous ways and just can't get it right.
Any suggestions? Thanks!

Use PHP's DOM parser (DOMDocument); it is well suited to this purpose.
http://www.php.net/manual/en/domdocument.loadhtml.php

I would use PHP Simple HTML DOM Parser.
http://simplehtmldom.sourceforge.net/
You could say:
foreach ($html->find('div#divid1 h1') as $e)
    echo $e->innertext;
This will echo the text within the h1 tag inside #divid1 (but not the tag itself).
The documentation is simple, but helps a bunch: http://simplehtmldom.sourceforge.net/manual.htm
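If you would rather avoid a third-party library, the same result can probably be had with the built-in DOMDocument suggested in the other answer. The sketch below assumes the sample markup from the question and reads the div's textContent, which contains the text of all descendants without any tags.

```php
<?php
// Sample markup from the question.
$html = '<div id="divid1" class="divclass1">
<h1>
TEXT INSIDE DIV
</h1>
</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@id="divid1"]')->item(0);

// textContent concatenates all descendant text and leaves the tags out.
$text = $div !== null ? trim($div->textContent) : null;

echo $text, PHP_EOL;
```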

Get element content from a variable containing html

How do I use the DOM parser to extract the content of an HTML element held in a variable?
More exactly:
I have a form where the user enters HTML in a text area. I want to extract the content of the first paragraph.
I know there are many tutorials on this, but I could not find any on extracting from a variable rather than a file (page).
Thanks

If you're taking HTML as user input, I recommend using simplehtmldom. It has a loose parser that tolerates buggy HTML and lets you use CSS selectors to pull elements and their content out of the DOM.
I didn't test this, but it should work:
$html = str_get_html($_POST['input']);
print $html->find('p', 0)->plaintext; // first paragraph
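For comparison, here is a sketch of the same extraction with PHP's built-in DOMDocument, which also parses HTML straight from a string. The $input value is a hypothetical stand-in for the submitted textarea content.

```php
<?php
// Hypothetical stand-in for the submitted textarea value, e.g. $_POST['input'].
$input = '<p>First paragraph.</p><p>Second paragraph.</p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // user-supplied HTML is often malformed
$doc->loadHTML($input);             // parses a string, no file needed
libxml_clear_errors();

// item(0) is the first <p> in document order, or null if there is none.
$first = $doc->getElementsByTagName('p')->item(0);
$text = $first !== null ? $first->textContent : null;

echo $text, PHP_EOL;
```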

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
