file_get_contents and div

What's wrong with my code?
I wish to get all dates from
but my array is empty.
<?php
$url = "http://weather.yahoo.com/";
$page_all = file_get_contents($url);
preg_match_all('#<div id="myLocContainer">(.*)</div>#', $page_all, $div_array);
echo "<pre>";
print_r($div_array);
echo "</pre>";
?>
Thanks

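Your content spans multiple lines, but by default . in a regex pattern does not match newline characters, so (.*) can never reach a </div> on a later line. The s (dot-all) modifier changes that; the m modifier only affects the ^ and $ anchors, and i makes the match case-insensitive.
Try using this:
preg_match_all('#<div id="myLocContainer">(.*?)</div>#sim', $page_all, $div_array);
Please note that regular expressions are not well suited to parsing HTML content because of the hierarchical nature of HTML documents.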
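To see the effect of the s modifier in isolation, here is a minimal sketch. The $page_all string is a made-up stand-in for the downloaded page (the real markup will differ), and $without, $with, $m1 and $m2 are names chosen only for this illustration.

```php
<?php
// Hypothetical stand-in for the downloaded page; the real markup will differ.
$page_all = "<div id=\"myLocContainer\">\n  <span>Mon</span>\n</div>\n<div id=\"other\">x</div>";

// Without "s", "." cannot cross the newline after the opening tag, so nothing matches.
$without = preg_match_all('#<div id="myLocContainer">(.*?)</div>#', $page_all, $m1);

// With "s", "." also matches newlines; the lazy ".*?" stops at the first </div>.
$with = preg_match_all('#<div id="myLocContainer">(.*?)</div>#s', $page_all, $m2);

echo "matches without s: $without\n";
echo "matches with s: $with\n";
echo "captured: ", trim($m2[1][0]), "\n";
```

Because the lazy quantifier stops at the first </div>, this still cuts the result short when the container holds nested divs, which is one more reason to prefer a DOM parser.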

Try adding the "m" and "s" modifiers; there might be newlines inside the div you need. Like this:
preg_match_all('#<div id="myLocContainer">(.*)</div>#ms', $page_all, $div_array);

Before messing around with regex, try HTML scraping. "HTML Scraping in PHP" might give you some ideas on how to do it in a more elegant and (possibly) faster way.

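$doc = new DOMDocument;
libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$doc->loadHTMLFile('http://weather.yahoo.com/'); // load() expects well-formed XML; loadHTMLFile() parses HTML
$div = $doc->getElementById('myLocContainer'); // null if the page has no element with that id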

You need to escape special characters in your regular expression, like the following:
~\<div id\=\"myLocContainer\"\>(.*)\<\/div\>~
Also check whether there is a newline problem, as mentioned by #eyazici and #kgb.

Test your response before running the regex search. Then you'll know which part isn't working.
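One way to do that check is sketched below. The diagnose_fetch() helper is hypothetical (it is not from the thread); it only separates "the download failed" from "the page does not contain the markup I expect", so that the regex is the last thing to suspect.

```php
<?php
// Hypothetical helper, not from the thread: reports which step is failing.
function diagnose_fetch($page, string $marker): string
{
    if ($page === false) {
        return 'download failed';        // file_get_contents() returns false on failure
    }
    if (strpos($page, $marker) === false) {
        return 'marker not in response'; // the served page lacks the expected markup
    }
    return 'marker found';               // the download is fine; look at the regex next
}

// In real use: diagnose_fetch(file_get_contents($url), 'id="myLocContainer"')
$failed  = diagnose_fetch(false, 'id="myLocContainer"');
$missing = diagnose_fetch('<html><body>Sign in</body></html>', 'id="myLocContainer"');
$found   = diagnose_fetch('<div id="myLocContainer">', 'id="myLocContainer"');

echo $failed, "\n", $missing, "\n", $found, "\n";
```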

Related

PHP Regex to remove HTML-Tag

I am looking for a way to search a string in PHP and remove "<pre", "</pre>" and everything that is in between.
Example:
$string = 'Hello, I am a little text. <pre class="foo">This should be deleted.</pre> This is fine again.';
// Some magic function
$newString = 'Hello, I am a little text. This is fine again.';
Is there any way to do it? If I use strip_tags(), only the tags will be removed, but not the content inside of the tags.
Thank you very much!

If it's just a small string, regex would be alright here, although I don't recommend it.
$newString = preg_replace('~<pre[^>]*>[^<]*</pre>~', '', $string);
However, I always use DOM when dealing with HTML/XML.
$doc = new DOMDocument;
$doc->loadHTML($string);
// Copy the live node list first; removing nodes while iterating it directly skips elements.
foreach (iterator_to_array($doc->getElementsByTagName('pre')) as $tag) {
    $tag->parentNode->removeChild($tag);
}

I'd use #hwnd's parsing example below (or above), that's a lot safer than using regex.
You could use something like this:
/<(.*?)(\h*).*?>(.*?)<\/\1>/
Demo: https://regex101.com/r/cN9rL4/3
PHP Demo: https://eval.in/415470
echo preg_replace('/<(.*?)(\h*).*?>(.*?)<\/\1>/s', '', 'Hello, I am a little text. <pre class="foo">This should be deleted.</pre> This is fine again.');
Output:
Hello, I am a little text. This is fine again.
Edit: added the s modifier in case the content spans more than one line; demo of the failure without it: https://regex101.com/r/cN9rL4/2.
Also note this isn't specific to pre; it will replace any element it encounters that has a matching closing tag.

[php]how to extract a single simple text from a long html source

I have HTML like this:
......whatever very long html.....
<span class="title">hello world!</span>
......whatever very long html......
It is a very long HTML document and I only want the content 'hello world!' from it.
I got this HTML with
$result = file_get_contents($url , false, $context);
Many people use Simple HTML DOM parser, but I think in this case using regex would be more efficient.
How should I do it? Any suggestions? Any help would be really great.
Thanks in advance!

Stick with the DOM parser - it is better. Having said that, you could use a REGEX like this...
// where the html is stored in `$html`
preg_match('/<span class="title">(.+?)<\/span>/', $html, $m);
$whatYouWant = $m[1];
preg_match() fills $m with an array: element 0 is the entire matched string, and the following elements are the parts captured by each pair of parentheses in the regex. The regex is very simple in this case, being almost a direct string match for what you want, with the closing span tag's slash escaped. The captured part just means any character (.) one or more times (+) un-greedily (?).
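A small sketch of what ends up in $m, using a shortened made-up $html string in place of the long page. It also checks the return value, since $m[1] does not exist when nothing matches.

```php
<?php
// Made-up stand-in for the long page from the question.
$html = '<p>intro</p><span class="title">hello world!</span><p>more</p>';

if (preg_match('/<span class="title">(.+?)<\/span>/', $html, $m)) {
    echo $m[0], "\n"; // the whole match, tags included
    echo $m[1], "\n"; // only the first parenthesised group
    $whatYouWant = $m[1];
} else {
    $whatYouWant = null; // no match: $m is an empty array
    echo "no match\n";
}
```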

No, I really don't think regex or similar functions would be either more effective or easier.
If you use Simple HTML DOM, you can quickly get the data you are looking for like this:
//Get your file
$html = file_get_html('myfile.html');
//Use jQuery-style selectors; the index 0 returns the first match instead of an array
$spanValue = $html->find('span.title', 0)->plaintext;
echo($spanValue);
With preg_match you could do it like this:
preg_match("/<span class=\"title\">(.*?)<\/span>/s", $data, $matches);
or this, if there are multiple spans with the class "title":
preg_match_all("/<span class=\"title\">(.*?)<\/span>/s", $data, $matches);

How to remove a href when using get_contents

I am really new to php so still getting to grips.
I am using this bit of code to pull in world market feed.
<?php
$homepage = file_get_contents('http://www.news4trader.com/cgi-bin/google_finance.cgi?widget=worldmarkets');
echo $homepage;
?>
I just wanted to know how I can strip the google links out of it so the market titles are just static text.
All help is very much appreciated.

You can use the PHP function strip_tags() like this:
<?php
$homepage = file_get_contents('http://www.news4trader.com/cgi-bin/google_finance.cgi?widget=worldmarkets');
echo strip_tags($homepage, "<style><div><table><tr><td>");
?>
Just include all the tags you want to allow in the second argument.
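To make the effect of the second argument concrete, here is a minimal sketch on a made-up fragment. The $homepage markup is an assumption for illustration, not the real feed. The <a> tag is not in the allowed list, so it is dropped while its text stays.

```php
<?php
// Made-up fragment standing in for the downloaded widget markup.
$homepage = '<table><tr><td><a href="http://www.google.com/finance?q=X">Dow Jones</a></td></tr></table>';

// Tags listed in the second argument survive; every other tag is removed, its text is kept.
$static = strip_tags($homepage, '<table><tr><td>');

echo $static, "\n";
```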

You can use preg_replace() with a regex pattern to filter the links out. This is simple, but not very flexible if you want to do more with the loaded data. PHP also provides a nice library called DOMDocument (http://php.net/manual/de/class.domdocument.php), which lets you work with the document very flexibly.
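A rough sketch of the DOMDocument route, assuming a small made-up fragment in $homepage rather than the real feed. It replaces every <a> element with a plain text node holding the link text; filtering on the href attribute would be needed to touch only some links.

```php
<?php
// Made-up fragment standing in for the downloaded widget markup.
$homepage = '<div><a href="http://www.google.com/finance?q=X">Dow Jones</a> 1.2%</div>';

libxml_use_internal_errors(true); // keep parser warnings about imperfect HTML quiet
$doc = new DOMDocument;
$doc->loadHTML($homepage, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Copy the live node list first; replacing nodes while iterating it directly skips elements.
foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
    $a->parentNode->replaceChild($doc->createTextNode($a->textContent), $a);
}

$out = $doc->saveHTML();
echo $out;
```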

You could use the DOMDocument class; it's meant for exactly that.
http://php.net/manual/en/class.domdocument.php
You should have a basic idea of OOP to use it.
If you struggle with it, you could use strpos(), substr() and the like, but that would be hard.
strpos: http://php.net/manual/en/function.strpos.php
substr: http://php.net/manual/en/function.substr.php

You can use a regex, something like this:
/<a [^>]*google[^>]*>.+?<\/a>/
This matches any link that has an attribute or value containing the word google.

HTML span tag content and attribute preg_match regular expressions

Does anyone have a fix for the following?
I'm taking the following string:
<span id="tmpl_main_lblWord" class="randomWord">kid</span>
and using the following preg_match / regex rule:
preg_match("'/(<span id=.* class=.*>)(.*)(<\/span>)/'si", $buffer, $match);
But it's returning an empty array. Any ideas?

The following example uses DOMDocument:
$doc = new DOMDocument();
$doc->loadHtml('<span id="tmpl_main_lblWord" class="randomWord">kid</span>');
$el = $doc->getElementById('tmpl_main_lblWord');
echo 'Inner text is: ' . $el->textContent;

In general I would strongly advise against using regex to try and get values from HTML. I would use an HTML parser. See this question: Robust and Mature HTML Parser for PHP
If you insist though... you have two sets of nested quotes. The inner single quotes are treated as the pattern delimiters, which turns the slashes into literal characters that would have to appear in the input. I would remove the inner single quotes. That should solve your problem.
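The difference can be checked side by side. This is a small sketch using the string from the question; $before, $after, $matchBefore and $matchAfter are names made up for the comparison.

```php
<?php
$buffer = '<span id="tmpl_main_lblWord" class="randomWord">kid</span>';

// Original call: the single quotes are the delimiters, so "/" must match literally.
$before = preg_match("'/(<span id=.* class=.*>)(.*)(<\/span>)/'si", $buffer, $matchBefore);

// Inner single quotes removed: "/" is the delimiter again.
$after = preg_match('/(<span id=.* class=.*>)(.*)(<\/span>)/si', $buffer, $matchAfter);

echo "before: $before, after: $after\n";
echo "inner text: ", $matchAfter[2], "\n";
```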

How to remove text between tags in php?

Despite using PHP for years, I've never really learnt how to use expressions to truncate strings properly... which is now biting me in the backside!
Can anyone provide me with some help truncating this? I need to chop out the text portion from the url, turning
text
into

$str = preg_replace('#(<a.*?>).*?(</a>)#', '$1$2', $str);

Using SimpleHTMLDom:
<?php
// example of how to modify an anchor's inner text
include('simple_html_dom.php');
// get DOM from URL or file
$html = file_get_html('http://www.example.com/');
// empty the inner text of each anchor (the property name is lowercase: innertext)
foreach($html->find('a') as $e) {
    $e->innertext = '';
}
// dump contents
echo $html;
?>

What about something like this, considering you might want to re-use it with other hrefs :
$str = 'text';
$result = preg_replace('#(<a[^>]*>).*?(</a>)#', '$1$2', $str);
var_dump($result);
Which will get you :
string '' (length=24)
(I'm considering you made a typo in the OP ? )
If you don't need to match any other href, you could use something like :
$str = 'text';
$result = preg_replace('#().*?()#', '$1$2', $str);
var_dump($result);
Which will also get you :
string '' (length=24)
As a side note: for more complex HTML, don't try to use regular expressions. They work fine for this kind of simple situation, but for real-life HTML they generally don't help much: HTML is not quite "regular" enough to be parsed by regexes.

You could use substr() in combination with strpos(), even though this is not a very nice approach.
Check: PHP Manual - String functions
Another way would be to write a regular expression to match your criteria.
But in order to get your problem solved quickly, the string functions will do...
EDIT: I underestimated the audience. ;) Go ahead with the regexes... ^^

You don't need to capture the tags themselves. Just target the text between the tags and replace it with an empty string. Super simple.
Demo of both techniques
Code:
$string = 'text';
echo preg_replace('/<a[^>]*>\K[^<]*/', '', $string);
// the opening tag--^^^^^^^^  ^^^^^-match everything before the end tag
//                          ^^-restart fullstring match
Output:
Or in fringe cases when the link text contains a <, use this: ~<a[^>]*>\K.*?(?=</a>)~
This avoids the expense of capture groups using a lazy quantifier, the fullstring restarting \K and a "lookahead".
Older & wiser:
If you are parsing valid html, you should use a dom parser for stability/accuracy. Regex is DOM-ignorant, so if there is a tag attribute value containing a >, my snippet will fail.
As a narrowly suited domdocument solution to offer some context:
$dom = new DOMDocument;
$dom->loadHTML($string, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD); // the flags omit the implied <html>/<body> wrappers and the DOCTYPE
$dom->getElementsByTagName('a')[0]->nodeValue = '';
echo $dom->saveHTML();

Just use strip_tags(); that would get rid of the tags and leave only the text between them.
