My HTML output source is something like this:
<td><span class="bookdetailtitle">ISBN</span></td>
<td>:</td>
<td>9788172338299</td>
I need only 9788172338299 to be printed. If the above code is all on one line, it prints properly, but since there are newlines and tabs between the tags, I'm not getting any output. I tried replacing the /i modifier with /s, but it doesn't work. I want preg_match to match the string regardless of newlines or tabs and print the desired output.
Here is my code:
$page2='<td><span class="bookdetailtitle">ISBN</span></td>
<td>:</td>
<td>9788172338299</td>';
preg_match('/<td><span class="bookdetailtitle">ISBN<\/span><\/td><td>:<\/td><td>(.*)<\/td>/s', $page2, $keywords);
echo $keywords_out = $keywords[1];
If you just need the number, something like this?
$page2='<td><span class="bookdetailtitle">ISBN</span></td>
<td>:</td>
<td>9788172338299</td>';
preg_match('/<td>([0-9]+)<\/td>/', $page2, $keywords);
echo $keywords[1];
http://phpfiddle.org/main/code/43j-t8b
P.S. Many will say: don't use regex to parse HTML. I agree. :)
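To address the original pattern directly: it fails because it expects each tag to follow the previous one with nothing in between. A minimal sketch that allows any whitespace (spaces, tabs, newlines) between the cells with \s*, assuming the three cells always appear in this order; the ~ delimiter is only chosen so the slashes need no escaping:

```php
<?php
// Sample input: the same three cells, separated by newlines and tabs.
$page2 = "<td><span class=\"bookdetailtitle\">ISBN</span></td>\n\t<td>:</td>\n\t<td>9788172338299</td>";

// \s* absorbs any run of spaces, tabs or newlines between the tags;
// ([^<]+) captures the text of the third cell.
$pattern = '~<span class="bookdetailtitle">ISBN</span></td>\s*<td>:</td>\s*<td>([^<]+)</td>~';

$isbn = preg_match($pattern, $page2, $m) ? trim($m[1]) : null;
echo $isbn;
```

Anchoring on the "ISBN" label keeps the regex from grabbing some other numeric cell elsewhere on the page.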
I would do something like this:
$page=explode('<td>',$page2);
print_r($page[3]);
http://phpfiddle.org/main/code/buf-95c
Edit: to get rid of the trailing </td>, use print_r(strip_tags($page[3]));
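If you would rather skip regex and string splitting entirely, as the P.S. above suggests, here is a sketch of the parser route with PHP's DOMDocument and DOMXPath. It assumes the cells sit inside a <table><tr> in the real page (the sample is wrapped in one here) and that the value is always the second <td> after the one holding the ISBN label:

```php
<?php
// Sample cells, wrapped in a table row so the HTML parser sees a normal structure.
$html = "<table><tr><td><span class=\"bookdetailtitle\">ISBN</span></td>\n\t<td>:</td>\n\t<td>9788172338299</td></tr></table>";

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect warnings about sloppy markup instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// The <td> whose span reads "ISBN", then the second <td> after it (skipping the ":" cell).
$nodes = $xpath->query('//td[span[@class="bookdetailtitle"]="ISBN"]/following-sibling::td[2]');
$isbn = ($nodes !== false && $nodes->length > 0) ? trim($nodes->item(0)->textContent) : null;
echo $isbn;
```

Newlines and tabs between the cells are irrelevant here, because XPath only counts element siblings.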
I know I can do a simple replace when I want to convert <br> tags to newlines, but I'm running into a problem because the <br> tags I receive are not empty; they carry attributes:
<br style=\"color: rgb(83, 83, 83); font-family: \" helvetica=\"\" ...
The back end is not mine, so there is no point discussing good or bad coding here; I am just wondering whether there is a way to replace those tags with plain newlines.
Something like nl2br(), but in reverse.
EDIT:
I don't know what use it is to show code when I already know why what I've tried isn't working... but here goes:
public function removeSingleHtmlFormatting($single)
{
$single->short_description = str_replace("<br>", "\r\n", $single->short_description);
$single->short_description = strip_tags($single->description);
$single->short_description = preg_replace("/&nbsp;/", " ", $single->short_description);
}
Of course the replace doesn't work, because there is no such literal string to replace... I have no idea where to start parsing it.
instead of
str_replace("<br>", "\r\n", $single->short_description);
try
preg_replace("/<br.*>/U", "\r\n", $single->short_description);
This way the regular expression matches a <br> tag including anything inside it, not only an empty <br>: the /U modifier makes .* ungreedy, so each match stops at the first >.
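The same idea wrapped in a reusable "reverse nl2br()" helper might look like this. br2nl() is a hypothetical name, not a built-in PHP function; the sketch uses [^>]* instead of .* with /U (same effect: stop at the first >) and assumes no attribute value contains a literal >:

```php
<?php
// Hypothetical helper: replace every <br> tag, with or without attributes, by a newline.
function br2nl(string $html, string $newline = "\r\n"): string
{
    // \b keeps "<br" from matching the start of a longer tag name,
    // [^>]* skips the attributes, and /i also catches <BR> and <Br />.
    return preg_replace('/<br\b[^>]*>/i', $newline, $html);
}

echo br2nl('line one<br style="color: rgb(83, 83, 83);">line two<BR />line three');
```

Passing "\n" as the second argument gives Unix-style line breaks instead of "\r\n".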
In the database I have saved an article that contains HTML such as <br>, <table>, etc.
Example:
<tr>
<td style="width:50%;text-align:left;">Temas nuevos - Comunidad</td>
<td class="blocksubhead" style="width:50%;text-align:left;">Temas actualizados - Comunidad Temas actualizados - Comunidad</td>
</tr>
What I want is to display a summary of the article on another screen using substr(). My problem is that I can't print what I want: it prints the HTML code first.
Example: echo substr($row["news"], 0, 20);
It prints the first 20 characters, so the browser only shows:
<td style="width:50%;text-align:l<td/>
What I want is for it to show only the text and discard the HTML code it contains.
Use strip_tags() to strip the HTML etc. from the string:
So: echo substr(strip_tags($row["news"]), 0, 20);
http://php.net/manual/en/function.strip-tags.php
You could also do it using preg_replace() to match and replace anything that looks like a tag :)
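A rough sketch of that preg_replace() variant, assuming the text contains no stray < and no attribute value contains >; strip_tags() remains the safer choice:

```php
<?php
// Delete everything from a "<" up to the next ">", i.e. anything that looks like a tag.
$html = '<td style="width:50%;text-align:left;">Temas nuevos - Comunidad</td>';
$text = preg_replace('/<[^>]*>/', '', $html);

// Only now take the first 20 characters, so no half-tag ends up in the output.
echo substr($text, 0, 20);
```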
The strip_tags() function strips HTML, XML, and PHP tags from a string.
//remove the html from the string.
$row["news"] = strip_tags($row["news"]);
// get the first 20 characters of the string and display them.
echo substr($row["news"], 0, 20);
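If the articles contain accented characters (the sample is in Spanish), substr() can cut a multibyte UTF-8 character in half, and the newlines left between the removed tags end up in the summary. A sketch that handles both, assuming UTF-8 data and the mbstring extension; summarize() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: plain-text summary of stored HTML (assumes UTF-8 and mbstring).
function summarize(string $html, int $length = 20): string
{
    $text = strip_tags($html);
    // Turn entities such as &amp; back into the characters they stand for.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse the newlines/indentation left between removed tags into single spaces.
    $text = trim(preg_replace('/\s+/u', ' ', $text));
    // mb_substr() counts characters, not bytes, so it never splits a character like "á".
    return mb_substr($text, 0, $length, 'UTF-8');
}

$news = "<tr>\n<td style=\"width:50%;text-align:left;\">Temas nuevos - Comunidad</td>\n</tr>";
echo summarize($news);
```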
I'm trying to use preg_replace to strip out a section of code, but I'm having problems getting it to work right.
Code Example:
$str = '<p class="code">some string here</p>';
PHP I'm using:
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
preg_replace($pattern,"", $str);
This strips out the code just as I want, with the exception of the space between the p and class.
Returns:
some string here //notice the single space at the beginning.
I'm trying to get:
some string here //no space at the beginning.
I have been beating my head against the wall trying to find a solution. The reason I'm trying to strip it out in one chunk instead of breaking the preg_replace into pieces is that I don't want to change anything that may be in the string between the tags. Any ideas?
That does not happen for me (and it shouldn't).
The space is probably being output somewhere else (use var_dump() to inspect the string).
You might want to look into this thread to see whether you want to switch to DOMDocument. It'll save you a great deal of headaches when parsing HTML.
Robust and Mature HTML Parser for PHP
test:
<?php
$str = '<p class="code">some string here</p>';
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
$result = preg_replace($pattern,"", $str);
var_dump($result);
result:
php pregrep.php
string(16) "some string here"
It seems to work just fine.
Alex, I figured out where I was picking up the extra space.
I was putting that code into a textarea like this:
<?php
$str = '<p class="code">some string here</p>';
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
$strip_str = preg_replace($pattern,"", $str);
?>
<textarea id="code_area" class="syntaxhl" name="code" cols="66" rows="5">
<?php echo $strip_str; ?>
</textarea>
This gave me the extra space, but when I changed the code to:
<textarea id="code_area" class="syntaxhl" name="code" cols="66" rows="5"><?php echo $strip_str; ?></textarea>
with no spaces or line breaks between the textarea tags and the value, the extra space went away (whitespace inside a <textarea> is displayed as-is).
Why not use trim()?
$text = trim($text);
This removes whitespace from the beginning and end of a string.
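Putting both fixes together, a sketch that emits the textarea with nothing between the tags and the value, and trims the value too. codeTextarea() is a hypothetical helper; the htmlspecialchars() call is an addition so that markup inside the code is shown rather than parsed by the browser:

```php
<?php
// Hypothetical helper: no whitespace between <textarea> and the value, because
// browsers keep whitespace inside a textarea (apart from one newline right after
// the opening tag). trim() also removes stray whitespace in the value itself.
function codeTextarea(string $code): string
{
    return '<textarea id="code_area" class="syntaxhl" name="code" cols="66" rows="5">'
        . htmlspecialchars(trim($code), ENT_QUOTES, 'UTF-8')
        . '</textarea>';
}

echo codeTextarea(" some string here\n");
```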
I have a string that's separated by spaces. I want to show every space-separated part of the string on a new line. How can I do that?
base1|123|wen dsj|test base2|sa|7243|sdg custom3|dskkjds|823|kd
If there is no more | after an initial pipe, then the space should break the line, and the result should look like this:
base1|123|wen dsj|test
base2|sa|7243|sdg
custom3|dskkjds|823|kd
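echo str_replace(' ',"\n",$string);
or
echo str_replace(' ',PHP_EOL,$string);
Note that both versions also break the line at the space inside "wen dsj", which the question wants to keep.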
This is pretty messy, and I have yet to clean up the last, empty result:
$string = 'base1|123|wen dsj|test base2|sa|7243|sdg custom3|dskkjds|823|kd';
preg_match_all('/(?P<line>(?:[^\\| ]*\\|{0,1})*(?: [^\\| ]*\\|[^\\| ]*(?: |\\z){0,1})*)(?: |\\z)/',$string,$matches,PREG_SET_ORDER);
print_r($matches);
Edit: Actually this is pretty horrible
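A less horrible sketch, under an assumption the question only implies from its sample: every record has exactly four |-separated fields, and the last field never contains a space. With that, preg_match_all() can grab three "field then pipe" chunks plus the last field:

```php
<?php
// Assumption: 4 fields per record, and only the last field is guaranteed space-free.
$string = 'base1|123|wen dsj|test base2|sa|7243|sdg custom3|dskkjds|823|kd';

// (?:[^|]*\|){3} = three runs of non-pipe characters, each followed by a pipe;
// \S* = the last field, which ends at the space separating it from the next record.
preg_match_all('/(?:[^|]*\|){3}\S*/', $string, $m);

// Every match after the first starts with that separating space, so trim it off.
$lines = array_map('trim', $m[0]);
echo implode(PHP_EOL, $lines);
```

If the field count varies between records, this breaks, and the data would need some other record delimiter.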
I'm trying to write a regex to extract some data from a table.
The code I've got now is:
<table>
<tr>
<td>quote1</td>
<td>have you trying it off and on again ?</td>
</tr>
<tr>
<td>quote65</td>
<td>You wouldn't steal a helmet of a policeman</td>
</tr>
</table>
I want to replace this with:
quote1:have you trying it off and on again ?
quote65:You wouldn't steal a helmet of a policeman
The code I've already written is this:
%<td>((?s).*?)</td>%
But now I'm stuck.
If you really want to use regexes (which might be OK if you are really, really sure your string will always be formatted like that), what about something like this in your case:
$str = <<<A
<table>
<tr>
<td>quote1</td>
<td>have you trying it off and on again ?</td>
</tr>
<tr>
<td>quote65</td>
<td>You wouldn't steal a helmet of a policeman</td>
</tr>
</table>
A;
$matches = array();
preg_match_all('#<tr>\s+?<td>(.*?)</td>\s+?<td>(.*?)</td>\s+?</tr>#', $str, $matches);
var_dump($matches);
A few words about the regex:
<tr>
then one or more whitespace characters
then <td>
then what you want to capture
then </td>
and the same again
and finally, </tr>
And I use:
? in the regex to match in non-greedy mode
preg_match_all to get all the matches
You then get the results you want in $matches[1] and $matches[2] (not $matches[0]); here's the output of the var_dump I used (I've removed entry 0 to make it shorter):
array
0 =>
...
1 =>
array
0 => string 'quote1' (length=6)
1 => string 'quote65' (length=7)
2 =>
array
0 => string 'have you trying it off and on again ?' (length=37)
1 => string 'You wouldn't steal a helmet of a policeman' (length=42)
You then just need to manipulate this array with some string concatenation or the like; for instance:
$num = count($matches[1]);
for ($i=0 ; $i<$num ; $i++) {
echo $matches[1][$i] . ':' . $matches[2][$i] . '<br />';
}
And you get :
quote1:have you trying it off and on again ?
quote65:You wouldn't steal a helmet of a policeman
Note: you should add some checks (for instance, that preg_match_all did not return false and that the count is at least 1, ...).
As a side note: using regex to parse HTML is generally not a good idea; if you can use a real parser, it should be much safer...
Tim's regex probably works, but you may want to consider using the DOM functionality of PHP instead of regex, as it may be more reliable in dealing with minor changes in the markup.
See the DOMDocument::loadHTML method.
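A sketch of that DOM approach for this table, assuming the dom extension is enabled and that the first two <td> cells of each row hold the label and the quote; tableToLines() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: turn each two-cell <tr> into "label:quote" using the DOM, not a regex.
function tableToLines(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $lines = [];
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $cells = $row->getElementsByTagName('td');
        if ($cells->length >= 2) { // skip rows without at least two cells
            $lines[] = trim($cells->item(0)->textContent) . ':' . trim($cells->item(1)->textContent);
        }
    }
    return $lines;
}

$str = <<<A
<table>
<tr>
<td>quote1</td>
<td>have you trying it off and on again ?</td>
</tr>
<tr>
<td>quote65</td>
<td>You wouldn't steal a helmet of a policeman</td>
</tr>
</table>
A;

foreach (tableToLines($str) as $line) {
    echo $line, '<br />';
}
```

Extra attributes on <td>, different whitespace, or nested tags inside the cells don't change the result, which is the reliability advantage mentioned above.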
As usual, extracting text from HTML and other non-regular languages should be done with a parser, as regexes can cause problems here. But if you're certain of your data's structure, you could use
%<td>((?s).*?)</td>\s*<td>((?s).*?)</td>%
to find the two pieces of text. \1:\2 would then be the replacement.
If the text cannot span more than one line, you'd be safer dropping the (?s) bits...
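Applied in PHP with preg_replace() (which accepts $1/$2 as well as \1-style backreferences), a sketch might look like this. The strip_tags() and filtering steps that clean up the leftover <table>/<tr> lines are additions, not part of the answer:

```php
<?php
$str = "<table>\n<tr>\n<td>quote1</td>\n<td>have you trying it off and on again ?</td>\n</tr>\n<tr>\n<td>quote65</td>\n<td>You wouldn't steal a helmet of a policeman</td>\n</tr>\n</table>";

// Replace each pair of adjacent cells with "first:second".
$replaced = preg_replace('%<td>((?s).*?)</td>\s*<td>((?s).*?)</td>%', '$1:$2', $str);

// Remove the remaining <table>/<tr> tags, then drop the blank lines they leave behind.
$lines = array_values(array_filter(
    array_map('trim', explode("\n", strip_tags($replaced))),
    'strlen'
));
echo implode(PHP_EOL, $lines);
```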
Extract the content of each <td>:
preg_match_all('%<td[^>]*>(.*?)</td>%s', $response, $matches);
var_dump($matches);
Don't use regex; use an HTML parser, such as the PHP Simple HTML DOM Parser.