Retrieve value of a textarea with PHP

Would anyone perhaps know how to get the value of a specific element in an HTML document with PHP? What I'm doing right now is using file_get_contents to pull up the HTML code from another website, and on that website there is a textarea:
<textarea id="body" name="body" rows="12" cols="75" tabindex="1">Hello World!</textarea>
What I want to do is have my script do the file_get_contents and just pull out the "Hello World!" from the textarea. Is that possible? Sorry for bugging you guys, again, you give such helpful advice :].

Don't be sorry for bugging us, this is a good question I'm happy to answer. You can use PHP Simple HTML DOM Parser to get what you need:
$html = file_get_html('http://www.domain.com/');
$textarea = $html->find('textarea[id=body]', 0); // index 0 returns the first match rather than an array
$contents = $textarea->innertext;
echo $contents; // Outputs 'Hello World!'
If you want to use file_get_contents(), you can do it like this:
$raw_html = file_get_contents('http://www.domain.com/');
$html = str_get_html($raw_html);
...
Although I don't see any need for the file_get_contents() as you can use the outertext method to get the original, full HTML out if you need it somewhere:
$html = file_get_html('http://www.domain.com/');
$raw_html = $html->outertext;
Just for kicks, you can also do this with a one-liner regular expression:
preg_match('~<textarea id="body".*?>(.*?)</textarea>~s', file_get_contents('http://www.domain.com/'), $matches);
echo $matches[1]; // Outputs 'Hello World!'
I'd strongly advise against this though as you are a lot more vulnerable to code changes which might break this regular expression.
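To make that fragility concrete, here is a minimal sketch that runs both approaches on inline strings instead of a live page. The sample markup and the helper names textareaByRegex and textareaByDom are made up for illustration; only the pattern comes from the answer above.

```php
<?php
// Hypothetical sample markup: the same textarea with its attributes in two different orders.
$original  = '<textarea id="body" name="body" rows="12">Hello World!</textarea>';
$reordered = '<textarea name="body" id="body" rows="12">Hello World!</textarea>';

// Regex approach: only matches when id="body" is the first attribute.
function textareaByRegex(string $html): ?string
{
    return preg_match('~<textarea id="body".*?>(.*?)</textarea>~s', $html, $m) ? $m[1] : null;
}

// DOM approach: attribute order does not matter.
function textareaByDom(string $html): ?string
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $node = (new DOMXPath($dom))->query('//textarea[@id="body"]')->item(0);
    return $node ? $node->textContent : null;
}

var_dump(textareaByRegex($original));  // string: Hello World!
var_dump(textareaByRegex($reordered)); // NULL, the pattern no longer matches
var_dump(textareaByDom($reordered));   // string: Hello World!
```

Simply swapping two attributes is enough to break the regular expression, while the parser-based lookup keeps working.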

I'd suggest using PHP's DOMDocument and DOMXPath classes.
$dom = new DOMDocument();
$dom->loadHTMLFile( $url );
$xpath = new DOMXPath( $dom );
$nodes = $xpath->query( '//textarea[@id="body"]' );
$result = array();
foreach( $nodes as $node ) {
    $result[] = $node->textContent;
}
Here $result contains the text of every textarea whose id is body.

Related

How to get <pre> tag contents using preg_match_all?

I need to scrape the contents of the <pre> tag from a web page. I am using the preg_match_all function, but it's not working.
The <pre> tag content of the website I'm scraping is given below.
<pre># Mon Jul 22 03:10:03 CDT 2013
99.46.177.18
99.27.119.169
99.254.168.132
99.245.96.210
99.245.29.38
99.240.245.97
99.239.100.211
<pre>
Php file
Updated
$data = file_get_contents('http://www.infiltrated.net/blacklisted');
preg_match_all ("/<pre>([^`]*?)<\/pre>/", $data, $matches);
print_r($matches);
exit;
My PHP file returns an empty array. I know my preg_match_all call is the problem.
How can I get the contents of the pre tag? Please guide me.
Edit Question
I can run @Pieter's script, but it returns only Array().
My script is given below.
<?php
$url = 'http://www.infiltrated.net/blacklisted';
$data = new DOMDocument();
$data->loadHTML(file_get_contents($url));
$xpath = new DomXpath($data);
$pre_tags = array();
foreach($xpath->query('//pre') as $node){
$pre_tags[] = $node->nodeValue;
}
print_r($pre_tags);
exit;
?>
Use the PHP functions to loop through DOM. Using Regex-patterns for HTML tags is strongly discouraged.
Try this code:
$data = new DOMDocument();
$data->loadHTML(file_get_contents($url));
$xpath = new DomXpath($data);
$pre_tags = array();
foreach($xpath->query('//pre') as $node){
$pre_tags[] = $node->nodeValue;
}
Or try PHP Simple HTML DOM Parser, see: http://simplehtmldom.sourceforge.net/
Finally I got it. The URL http://www.infiltrated.net/blacklisted is served from a plain text file, so the pre tags only show up in the browser's page source and are not part of the response. So I am using this method instead:
$array = explode("\n", file_get_contents('http://www.infiltrated.net/blacklisted'));
print_r($array);
Finally it's working great.
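Since the response is plain text, the split result will also contain the comment line and any blank lines. A small sketch, assuming the goal is a clean list of IP addresses: the sample text is copied from the question, and a real script would read it with file_get_contents instead.

```php
<?php
// Sample data copied from the question; a real script would fetch it with file_get_contents($url).
$text = "# Mon Jul 22 03:10:03 CDT 2013\n99.46.177.18\n99.27.119.169\n\n99.254.168.132\n";

$ips = [];
foreach (explode("\n", $text) as $line) {
    $line = trim($line);
    // Keep only lines that are valid IP addresses; this drops the "#" comment and blank lines.
    if (filter_var($line, FILTER_VALIDATE_IP) !== false) {
        $ips[] = $line;
    }
}

print_r($ips);
```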

Regex to find target="_blank" links and add text before closing </a> tag

I need to be able to parse some text and find all the instances where an <a> tag has target="_blank", and for each match add (for example) "This link opens in a new window" before the closing </a> tag.
For example:
Before:
<a href="..." target="_blank">Go here now</a>
After:
<a href="..." target="_blank">Go here now<span>(This link opens in a new window)</span></a>
This is for a PHP site, so I assume preg_replace() will be the method. I just don't have the skills to write the regex properly.
Thanks in advance for any help anyone can offer.
You should never use a regex to parse HTML, except maybe in extremely well-defined and controlled circumstances.
Instead, try a built-in parser:
$dom = new DOMDocument();
$dom->loadHTML($your_html_source);
$xpath = new DOMXPath($dom);
$links = $xpath->query("//a[@target='_blank']");
foreach($links as $link) {
$link->appendChild($dom->createTextNode(" (This link opens in a new window)"));
}
$output = $dom->saveHTML();
Alternatively, if this is being output to the browser, you can just use CSS:
a[target='_blank']:after {
content: ' (This link opens in a new window)';
}
This will work for anchor tag replacement....
$string = str_replace('<a ','<a target="_blank" ',$string);
Well, @Kolink is right, but here's my RegExp version.
$string = '<p>mess</p><a href="..." target="_blank">Google</a><p>mess</p>';
echo preg_replace("/(\<a.*?target=\"_blank\".*?>)(.*?)(\<\/a\>)/miU","$1$2(This link opens in a new window)$3",$string);
This does the job:
$newText = '<span>(This link opens in a new window)</span>';
$pattern = '~<a\s[^>]*?\btarget\s*=(?:\s*([\'"])_blank\1|_blank\b)[^>]*>[^<]*(?:<(?!/a>)[^<]*)*\K~i';
echo preg_replace($pattern, $newText, $html);
However, this direct string approach may also modify commented-out HTML, strings or comments in CSS or JavaScript code, and even JavaScript regex literals, which is unnecessary at best and harmful at worst. Use a DOM approach if you want to avoid these pitfalls. All you have to do is append a new node to each link that has the desired attribute:
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
$nodeList = $xp->query('//a[@target="_blank"]');
foreach($nodeList as $node) {
    $newNode = $dom->createElement('span', '(This link opens in a new window)');
$node->appendChild($newNode);
}
$html = $dom->saveHTML();
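As a quick check of the point about commented-out markup, the following sketch feeds the DOM approach a hypothetical input with one live link and one link inside an HTML comment. The input string and the paths in it are invented for illustration.

```php
<?php
// Hypothetical input: one live link, and one link that only exists inside an HTML comment.
$html = '<p><a href="/a" target="_blank">Live</a></p>'
      . '<!-- <a href="/b" target="_blank">Old</a> -->';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);

$xp = new DOMXPath($dom);
foreach ($xp->query('//a[@target="_blank"]') as $node) {
    // Only real <a> elements are matched; the comment is a separate node type.
    $node->appendChild($dom->createElement('span', '(This link opens in a new window)'));
}

$out = $dom->saveHTML();
echo $out;
```

Only the live link should receive the span, because the commented-out link is a comment node and not an element, so the XPath query never sees it.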
Finally, a last alternative is to leave the HTML unchanged and do it with CSS:
a[target="_blank"]::after {
content: " (This link opens in a new window)";
font-style: italic;
color: red;
}
You won't be able to write a regex that will evaluate an infinitely long string. I suggest:
$h = explode('>', $html);
This will give you the chance to traverse it like any other array and then do:
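foreach ($h as $i => $piece) {
    // Skip pieces that are not an opening <a ...> tag with target="_blank".
    if (!preg_match('/<a\s[^>]*href=/', $piece)) {
        continue;
    } elseif (!preg_match('/target="_blank"/', $piece)) {
        continue;
    } elseif (isset($h[$i + 1])) {
        // The next piece holds the link text followed by "</a"; put the note before it.
        $h[$i + 1] = str_replace('</a', '(open in new window)</a', $h[$i + 1]);
    }
}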
$html = implode('>', $h);
This is how I would approach such a problem. Of course, I just wrote this off the top of my head and it is not guaranteed to work as is, but with a few tweaks to match your exact logic, you will have what you need.

Read page source using PHP with primes "

I am trying to read the source code of a page. I just want to read some text that is within a certain division element with the id "wrapper_left".
My problem is that if a prime " is used in the first argument of the explode function, it does not work. I tried escaping the string, although I figured this wouldn't do anything.
$source_code = htmlspecialchars(file_get_contents('http://mydomain.com'));
$source_code = explode('<div id="wrapper_left">', $source_code);
echo $source_code[1];
Thanks tons in advance.
Don't bother trying to get this done with explode(), string manipulation, or a regular expression, you need an HTML parser, like DOMDocument:
$doc = new DOMDocument;
$doc->loadHTMLFile( 'http://mydomain.com');
$xpath = new DOMXPath( $doc);
$div = $xpath->query( '//div[@id="wrapper_left"]')->item(0);
echo $div->textContent;
You can see it working in this demo, which, when fed this HTML:
<div id="wrapper_left">Some text</div>
It produces:
Some text
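As for why the original explode() call found nothing: the quote character itself is not the culprit. The page source was passed through htmlspecialchars() first, which rewrites <, > and " as entities, so the literal delimiter no longer occurs in the string. A minimal sketch with a made-up one-line page:

```php
<?php
$delimiter = '<div id="wrapper_left">';
$raw       = '<div id="wrapper_left">Some text</div>'; // hypothetical page source

// htmlspecialchars() turns <, > and " into &lt;, &gt; and &quot;.
$escaped = htmlspecialchars($raw);
echo $escaped, "\n";

$partsEscaped = explode($delimiter, $escaped); // delimiter not found: one element
$partsRaw     = explode($delimiter, $raw);     // delimiter found: two elements

var_dump(count($partsEscaped), count($partsRaw));
echo $partsRaw[1], "\n";
```

Splitting the raw source works for this tiny input, but it still leaves the closing tag attached, which is one more reason to prefer the parser shown above.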

php insert text into a href

I'm working on using htmlpurifier to create a text-only version of my site.
I now need to replace all the a hrefs with the text-only URL, i.e. 'www.example.com/aboutus' becomes 'www.example.com/text/aboutus'.
Initially I tried a simple str_replace on the domain (I use a global variable for the domain), but the problem is links to files also get replaced i.e.
'www.example.com/document.pdf' becomes 'www.example.com/text/document.pdf' and therefore fails.
Is there a regular expression where I can say replace domain with domain/text where the url does not include string?
Thanks for any pointers you might be able to give me :)
Use a negative lookahead:
$output = preg_replace(
'#www\.example\.com(?!/text/)#', // dots escaped so they only match a literal "."
'www.example.com/text',
$input
);
Better yet, use DOM with it:
$html = 'foo
<p>hello</p>
bar';
libxml_use_internal_errors(true); // suppresses DOM errors
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->query('//a/@href');
foreach ($hrefs as $href) {
$href->value = preg_replace(
'#^www.example.com(?!/text/)(.*?)(?<!\.pdf)$#',
'www.example.com/text\\1',
$href->value
);
}
This should give you:
foo
<p>hello</p>
bar
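To see what the lookahead and lookbehind in that pattern do, here is a small sketch that applies it to three hypothetical href values directly, without the DOM step. The dots are escaped in this version; otherwise the pattern is the one from the answer.

```php
<?php
// (?!/text/)  skips URLs that already point at the text-only version.
// (?<!\.pdf)$ skips URLs that end in .pdf.
$pattern = '#^www\.example\.com(?!/text/)(.*?)(?<!\.pdf)$#';

// Hypothetical href values for illustration.
$hrefs = [
    'www.example.com/aboutus',
    'www.example.com/text/aboutus',
    'www.example.com/document.pdf',
];

$rewritten = [];
foreach ($hrefs as $href) {
    $rewritten[] = preg_replace($pattern, 'www.example.com/text$1', $href);
}

print_r($rewritten);
```

Only the first URL is rewritten; the second already contains /text/ and the third ends in .pdf, so both come back unchanged.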

PHP - How to replace a phrase with another?

How can i replace this <p><span class="headline"> with this <p class="headline"><span>
easiest with PHP.
$data = file_get_contents("http://www.ihr-apotheker.de/cs1.html");
$clean1 = strstr($data, '<p>');
$str = preg_replace('#(<a.*>).*?(</a>)#', '$1$2', $clean1);
$ausgabe = strip_tags($str, '<p>');
echo $ausgabe;
Before I alter the html from the site I want to get the class declaration from the span to the <p> tag.
Don't parse HTML with regex!
This class should provide what you need:
http://simplehtmldom.sourceforge.net/
The reason not to parse HTML with regex is if you can't guarantee the format. If you already know the format of the string, you don't have to worry about having a complete parser.
In your case, if you know that's the format, you can use str_replace
str_replace('<p><span class="headline">', '<p class="headline"><span>', $data);
Well, answer was accepted already, but anyway, here is how to do it with native DOM:
$dom = new DOMDocument;
$dom->loadHTMLFile("http://www.ihr-apotheker.de/cs1.html");
$xPath = new DOMXpath($dom);
// remove links but keep link text
foreach($xPath->query('//a') as $link) {
$link->parentNode->replaceChild(
$dom->createTextNode($link->nodeValue), $link);
}
// switch classes
foreach($xPath->query('//p/span[@class="headline"]') as $node) {
$node->removeAttribute('class');
$node->parentNode->setAttribute('class', 'headline');
}
echo $dom->saveHTML();
On a side note, HTML has elements for headings, so why not use an <h*> element instead of the semantically superfluous "headline" class?
Have you tried using str_replace?
If the placement of the <p> and <span> tags is consistent, you can simply replace one with the other:
str_replace("part to replace", "replacement", $string);
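For reference, str_replace() takes the search string first, then the replacement, then the subject. A short sketch using the two strings from the question; the sample paragraph content is made up, and the optional fourth argument is used here to confirm that the expected markup was actually found.

```php
<?php
// Hypothetical input in the format described in the question.
$data = '<p><span class="headline">News</span></p>';

// Argument order: search, replace, subject. $count receives the number of replacements made.
$result = str_replace(
    '<p><span class="headline">',
    '<p class="headline"><span>',
    $data,
    $count
);

echo $result, "\n";
echo "replacements: $count\n";
```

A count of zero would mean the page did not contain the exact markup, which is a cheap way to notice when the format you relied on has changed.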
