I was googling for hours and just cannot find an answer. Please suggest:
I have an .html file that contains only user comments in paragraphs, like:
<p>12/02/2012 4:32pm Mark</p>
<p>Hi! it's a nice demo! Really thankful</p>
<hr>
<p>11/02/2012 11:03am Miron</p>
<p>How to change the font size from CFD again?</p>
<hr>
<!-- AND LOADS OF OTHER <P><P> COMMENTS DELIMITED BY <HR> ... -->
There are thousands of comments structured like this.
I'd like to grab the newest 10 (not by date, just the first ten comments in the file), and I don't know how.
I know I can use jQuery's .load('comments.html') and then remove all the elements except the first 10 comments, or even include the whole file with PHP and then call .hide() with jQuery... but is it a good idea to load the whole file for just 10 comments?
How can I split that file and put the first 10 comments from comments.html inside a <div id="latest_10_comments"></div>?
I know you wanted a JavaScript solution but you could do this in PHP by using the explode function.
Something like this:
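// Relative path; "/comments.html" would point at the filesystem root
$comments = explode("<hr>", file_get_contents("comments.html"));
// Stop early if the file holds fewer than 10 comments
for($i = 0; $i < 10 && $i < count($comments); $i++) {
print($comments[$i]);
}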
This creates an array called $comments in which each element is one comment from comments.html, split at the <hr> tag.
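On the worry about loading the whole file for just 10 comments: a minimal sketch, assuming each <hr> sits on its own line as in the sample, is to read the file line by line and stop at the tenth delimiter. The function name firstComments() is made up for illustration.

```php
<?php
// Hypothetical helper: returns the HTML of the first $limit comments,
// reading line by line and stopping once $limit <hr> lines were seen.
function firstComments(string $path, int $limit = 10): string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return '';
    }
    $html = '';
    $seen = 0;
    while ($seen < $limit && ($line = fgets($handle)) !== false) {
        $html .= $line;
        // Assumption: the delimiter is alone on its line, as in the sample
        if (trim($line) === '<hr>') {
            $seen++;
        }
    }
    fclose($handle);
    return $html;
}
```

Echoing firstComments('comments.html') inside the <div id="latest_10_comments"></div> would then output only the first ten comments, and the rest of the file is never read.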
First, I'd suggest reconsidering your approach to this problem entirely. Why are you storing everything in an HTML file this way? You should either store it as an XML file or store it in your database if you want to dynamically load certain comments on demand.
However, to answer your question: you're going to need an X/HTML parser like PHP's DOMDocument if you want to do this in PHP. Here's a working example...
EDIT (changed to reflect the OP's desired behavior):
$dom = new DOMDocument;
$dom->loadHTMLFile("comments.html");
// Get all the P tag elements in the DOM
$comments = $dom->getElementsByTagName('p');
// Each comment is two P tags (header line and body), so 10 comments = 20 tags
$amount = 10 * 2; // number of comments you want, times two P tags each
foreach ($comments as $num => $comment_node) {
if ($num >= $amount)
break;
echo $comment_node->nodeValue, PHP_EOL;
}
Solution 1. You can use a RegEx pattern to match 2 p tags followed by hr and repeat the pattern for 10 times.
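A rough sketch of what Solution 1 could look like, assuming every comment is exactly two <p> elements followed by <hr> as in the sample. The helper name firstCommentsByRegex() is invented here, and the whole file still has to be read into a string first.

```php
<?php
// Hypothetical helper: grabs up to $limit leading comments with one regex.
// Assumes each comment is <p>..</p><p>..</p><hr>, as in the sample file.
function firstCommentsByRegex(string $html, int $limit = 10): string
{
    // /s lets the dot cross newlines; .*? keeps each <p> match minimal
    $pattern = '/^(?:\s*<p>.*?<\/p>\s*<p>.*?<\/p>\s*<hr>){1,' . $limit . '}/s';
    return preg_match($pattern, $html, $m) === 1 ? $m[0] : '';
}
```

Calling firstCommentsByRegex(file_get_contents('comments.html')) would return the markup of the first ten comments, or an empty string if the file does not start with that structure.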
Solution 2.
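The idea is from the other answer (Chris's), but as that one has an error in PHP, I am suggesting this:
// Relative path, and never read past the end of the array
$comments = explode("<hr>", file_get_contents("comments.html"));
for($i = 0; $i < 10 && $i < count($comments); $i++) {
print($comments[$i]);
}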
$("p").each(function(index, element)
{
//Do what you want here
});
This will cycle through all your <p>. If you know the order of the elements then you can do what you want with them based on index.
The source of this problem is that I'm running ads on my website. My content is mainly HTML stored in a database, so I decided to place "In-Text Ads": ads that are not in a fixed zone.
My solution was to explode the content by paragraphs and place the text ad in the middle of the p tags. That worked pretty well, since I use CKEditor to generate the content and I thought images, blockquotes, and other tags would be nested inside p tags (silly me). I realize now that images and blockquotes disappeared from my posts. What did I do next? I changed my code to explode using * instead of exploding by p tag. I celebrated too soon, because now I get a lot of duplicate content: for example, if I have one image, I now get the same image 4 times, and the same goes for all the other tags. I'm not sure about the source of these duplicates, but I think it has something to do with nested HTML. I looked for a solution for hours, and now I'm here asking whether somebody can help me solve this headache.
Here is my code:
//In a helper file
function splitByHTMLTagName(string $string, string $tagName = 'p')
{
$text = <<<TEXT
$string
TEXT;
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$nodes = [];
$dom->loadHTML('<?xml encoding="utf-8" ?>' . $text);
foreach ($dom->getElementsByTagName($tagName) as $node) {
array_push($nodes, $dom->saveHTML($node));
}
libxml_clear_errors();
return $nodes;
}
//In my view
$text = nl2br($database['content']);
$nodes = splitByHTMLTagName($text, '*');
//Using var_dump($nodes); here shows the duplicates are here already.
$nodes_count = count($nodes);
$show_ad_at = -1;
$was_added = false;
if($nodes_count % 2 == 0 ){
$show_ad_at = $nodes_count /2;
}else if ($nodes_count == 1 || $nodes_count < 3){
$show_ad_at = -1; //add later
}else if ($nodes_count > 3 && $nodes_count % 2 != 0){
$show_ad_at = ceil($nodes_count/2);
}
for($i = 0; $i<count($nodes); $i++){
if(!$was_added && $i == $show_ad_at){
$was_added = true;
?>
<div>
<script></script><!--This script is provided to me, it adds the ad where it is placed, I don't show the full script, It has nothing to do with the duplicates problem-->
</div>
<?php
}
echo $nodes[$i]; //print the node that comes from $nodes array where the duplicates already exist
}
if(!$was_added){
$was_added = true;
?>
<div>
<script></script><!--This script is provided to me, it adds the ad where it is placed, I don't show the full script, It has nothing to do with the duplicates problem-->
</div>
<?php
}
What can I do?
Thanks in advance.
P.S. #1: I use CodeIgniter as my PHP framework.
P.S. #2: My ads provider does not implement "In-Text ads" as a feature like Google does.
It seems you are printing the "ads block" inside an if statement.
If I haven't misunderstood, your code is something like:
foreach ... {
if (strpos($html_line, "In-Text Ads") !== FALSE) {
print($ads_html);
}
}
I think you should use str_replace() instead of print()-like functions, if you are using something like print() when outputting the value...
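On the duplicates themselves: getElementsByTagName('*') returns every element in the document, including nested ones and the html/body wrappers that loadHTML() adds, so a nested tag is serialized once on its own and again inside each of its ancestors. A minimal sketch of one way around that, assuming only the top-level blocks are wanted, is to walk the direct children of <body>. The name splitTopLevelNodes() is hypothetical.

```php
<?php
// Hypothetical replacement for splitByHTMLTagName(): returns only the
// direct children of <body>, so nested tags stay inside their parent.
function splitTopLevelNodes(string $html): array
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // Same UTF-8 hint the question's code already uses
    $dom->loadHTML('<?xml encoding="utf-8" ?>' . $html);
    libxml_clear_errors();

    $nodes = [];
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $nodes;
    }
    foreach ($body->childNodes as $child) {
        $chunk = $dom->saveHTML($child);
        // Skip whitespace-only text nodes between blocks
        if (trim($chunk) !== '') {
            $nodes[] = $chunk;
        }
    }
    return $nodes;
}
```

With this, an image inside a paragraph comes back once, as part of that paragraph, and the ad can still be placed at the midpoint of the returned array.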
I want to cut off everything but the first paragraph from an RTE field for an excerpt:
20 = HTML
20.value.field = tx_myextention_field
20.value.parseFunc < lib.parseFunc_RTE
20.wrap = <p class="claim-long">|</p>
20.stdWrap.replacement {
10 {
search = /^(.*?\/p).*$/m
replace = \1>
useRegExp = 1
}
}
Why is this regex not working?
Or is there a better solution?
You could use stdWrap.cropHTML to achieve a similar effect. It would also shorten a long first paragraph, and use more than one paragraph if the first one is too short. But maybe that's desirable in your situation?
Please be aware that the HTML cObject was deprecated in TYPO3 4.6. You should use the TEXT cObject.
I suspect that in your case the parseFunc was not properly applied because stdWrap cannot be used on the value property, only directly on the object. Without stdWrap, the newlines saved in the database are not transformed into <p> tags, and therefore your regex couldn't apply.
I tried to fix your TypoScript (but it is untested):
20 = TEXT
20.field = tx_myextension_field
20.stdWrap.parseFunc < lib.parseFunc_RTE
20.stdWrap.replacement {
10 {
search = /^(.*?\/p).*$/m
replace = \1>
useRegExp = 1
}
}
20.wrap = <p class="claim-long">|</p>
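The regex flags may also matter here. Assuming the search pattern ends up in PHP's preg_replace() more or less unchanged (an assumption worth verifying for your TYPO3 version), the /m flag makes ^ and $ match on every line while the dot still stops at newlines, so each paragraph line is simply replaced by itself. The sketch below uses made-up sample markup to compare that with the /s flag.

```php
<?php
// Sample markup standing in for the parsed RTE field (an assumption)
$html = "<p>First paragraph.</p>\n<p>Second paragraph.</p>";

// /m: ^ and $ match at every line, so each line is replaced by itself
$withM = preg_replace('/^(.*?\/p).*$/m', '\1>', $html);

// /s: the dot also matches newlines, so everything after the first /p is dropped
$withS = preg_replace('/^(.*?\/p).*$/s', '\1>', $html);

echo $withS, PHP_EOL; // prints <p>First paragraph.</p>
```

If that assumption holds, changing the flag from m to s in the search line would be the thing to try.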
Note: I'm sorry if the title was a little unclear; I couldn't think of another way to put it.
I am making a PHP posting system for a blog-like website. I have a file called posts.txt which has information that points to other text files. These other text files have the actual post content in them. I know this is not the best way to do it, but for now this is what I'm doing.
A sample of the posts.txt:
posts/topDownShooter.txt
posts/leapMotionSandbox.txt
end
The first two lines point to other text files that contain post content. The last line "end" lets the program know that all the post "pointers" are done
Here is a sample of a post like topDownShooter.txt
programming
Top Down Shooter
The actual post content goes here
end
The first line is a tag for organization. The second line is the title of the post. The third is the actual content. The last line, "end", serves the same purpose as in posts.txt.
Here is my PHP code:
I use "<--" for comments
<?php
$posts = "posts/posts.txt"; <--Pointer to the location of the posts.txt
$postsLines = file($posts);
$fetchingPost = TRUE; <--For while loop
$postNumber = 0;
$postPointer; <--In the example of posts.txt this would be the second or third line
$postTag;
$postTitle;
$postContent;
$endCondition = "end";
while ($fetchingPost == TRUE) {
$endOfFile = strcmp($postsLines[$postNumber], $endCondition);
if ($endOfFile == 0) {
$fetchingPost = FALSE;
}
if ($endOfFile <> 0) {
$postPointer[$postNumber] = $postsLines[$postNumber];
$postTag[$postNumber] = file($postPointer[$postNumber]); <--The problem, see below
$postNumber = $postNumber + 1;
}
}
?>
The Problem: It will not let me use a line that I take out of posts.txt as a "pointer" for accessing topDownShooter.txt or anything like that. I thought that the value I was pulling out of posts.txt was a string, but it is not. Is there any way that I can convert this to a string or make it work?
EDIT:
In short:
Is there any way to take something from $postsLines = file("somerandomtxtfile.txt"); and make $postsLines[0] a string?
I'm not sure I understand your question, but I'd try replacing that line with this:
$postTag[$postNumber] = file_get_contents($postPointer[$postNumber]);
Answering the question in your edit, you can do that like this:
$postsLines = explode(PHP_EOL, file_get_contents("somerandomtxtfile.txt"));
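A likely cause, as far as I can tell from the code: file() keeps the trailing newline on every line it returns, so the pointer is "posts/topDownShooter.txt" plus a newline, and the comparison with "end" never returns 0. A minimal sketch of reading the pointer file with FILE_IGNORE_NEW_LINES follows. The helper name loadPosts() and the array keys are invented for illustration.

```php
<?php
// Hypothetical helper: reads the pointer file and loads each post.
// FILE_IGNORE_NEW_LINES strips the newline that file() otherwise keeps
// at the end of every line.
function loadPosts(string $indexFile): array
{
    $posts = [];
    $pointers = file($indexFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($pointers === false) {
        return $posts;
    }
    foreach ($pointers as $pointer) {
        $pointer = trim($pointer); // also drops a stray "\r" from Windows files
        if ($pointer === 'end') {
            break;
        }
        $lines = file($pointer, FILE_IGNORE_NEW_LINES);
        if ($lines === false) {
            continue; // pointer to a missing file
        }
        $posts[] = [
            'tag'     => $lines[0] ?? '',
            'title'   => $lines[1] ?? '',
            'content' => $lines[2] ?? '',
        ];
    }
    return $posts;
}
```

loadPosts('posts/posts.txt') would then return one array per post, with the tag, title and content as plain strings.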
Possible Duplicate:
How to parse and process HTML with PHP?
Let's say I want to extract a certain number/text from a table from here: http://www.fifa.com/associations/association=chn/ranking/gender=m/index.html
I want to get the first number on the right table td under FIFA Ranking position. That would be 88 right now. Upon inspection, it is <td class="c">88</td>.
How would I use PHP to extract the info from said webpage?
Edit: I am told jQuery/JavaScript is better suited for this.
This could probably be prettier, but it'd go something like:
<?php
$page = file_get_contents("http://www.fifa.com/associations/association=chn/ranking/gender=m/index.html");
// Escape the slash in </td> (it is also the regex delimiter) and capture the digits
if (preg_match('/<td class="c">([0-9]+)<\/td>/', $page, $matches)) {
// $matches[1] holds the first captured number
echo $matches[1];
}
?>
I've never done anything like this before with PHP, so it may not work.
If you can work your magic after page load, you can use JavaScript/jQuery:
<script type='text/javascript'>
var arr = [];
// Collect the contents of every td.c cell
jQuery('table td.c').each(function () {
arr.push(jQuery(this).html());
});
// arr[0] is the first matching cell
</script>
Also, sorry for deleting my comment. You weren't specific as to what needed to be done, so I initially thought jQuery would better fit your needs, but then I thought "Maybe you want to get the page content before an HTML page is loaded".
Try http://simplehtmldom.sourceforge.net/,
$html = file_get_html('http://www.google.com/');
echo $html->find('div.rankings', 0)->find('table', 0)->find('tr',0)->find('td.c',0)->plaintext;
This is untested, just looking at the source. I'm sure you could target it faster.
In fact,
echo $html->find('div.rankings', 0)->find('td.c',0)->plaintext;
should work.
Using DOMDocument, which should come bundled with your PHP installation:
$dom = new DOMDocument();
// @ silences the warnings that real-world HTML usually triggers
@$dom->loadHTML(file_get_contents("http://www.example.com/file.html"));
$xpath = new DOMXPath($dom);
$cell = $xpath->query("//td[@class='c']")->item(0);
if ($cell) {
$number = intval(trim($cell->textContent));
// do stuff
}
I am attempting to scrape the web page (see code) - as well as those pages going back in time (you can see the date '20110509' in the page itself) - for simple numerical strings. I can't seem to figure out through much trial and error (I'm new to programming) how to parse the specific data in the table that I want. I have been trying to use simple PHP/HTML without curl or other such things. Is this possible? I think my main issue is using the delimiters that are necessary to get the data from the source code.
What I'd like is for the program to start at the very first page it can, say for example '20050101', and scan through each page till the current date, grabbing the specific data for example, the "latest close" (column), "closing arm" (row), and have that value for the corresponding date exported to a single .txt file, with the date being separated from the value with a comma. Each time the program is run, the date/value should be appended to the existing text file.
I am aware many lines of the code below are junk, it's part of my learning process.
<html>
<title>HTML with PHP</title>
<body>
<?php
$rawdata = file_get_contents('http://online.wsj.com/mdc/public/page/2_3021-tradingdiary2-20110509.html?mod=mdc_pastcalendar');
//$data = substr(' ', $data);
//$begindate = '20050101';
//$newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
//if (preg_match(' <td class="text"> ' , $data , $content)) {
//$content = str_replace($newlines
echo $rawdata;
///file_put_contents( 'NYSETRIN.html' , $content , FILE_APPEND);
?>
<b>some more html</b>
<?php
?>
</body>
</html>
All right so let's do this. We're going to first load the data into an HTML parser, then create an XPath parser out of it. XPath will help us navigate around the HTML easily. So:
$date = "20110509";
$data = file_get_contents("http://online.wsj.com/mdc/public/page/2_3021-tradingdiary2-{$date}.html?mod=mdc_pastcalendar");
$doc = new DOMDocument();
@$doc->loadHTML($data);
$xpath = new DOMXpath($doc);
Now then we need to grab some data. First off let's get all the data tables. Looking at the source, these tables are indicated by a class of mdcTable:
$result = $xpath->query("//table[@class='mdcTable']");
echo "Tables found: {$result->length}\n";
So far:
$ php test.php
Tables found: 5
Okay, so we have the tables. Now we need to get the specific column. So let's use the latest close column you mentioned:
$result = $xpath->query("//table[@class='mdcTable']/*/td[contains(.,'Latest close')]");
foreach($result as $td) {
echo "Column contains: {$td->nodeValue}\n";
}
The result so far:
$ php test.php
Column contains: Latest close
Column contains: Latest close
Column contains: Latest close
... etc ...
Now we need the column index for getting the specific column for the specific row. We do this by counting all of the previous sibling elements, then adding one. This is because element index selectors are 1 indexed, not 0 indexed:
$result = $xpath->query("//table[@class='mdcTable']/*/td[contains(.,'Latest close')]");
// Count only the previous siblings in the same row, then add one
$column_position = $xpath->query('preceding-sibling::*', $result->item(0))->length + 1;
echo "Position is: $column_position\n";
Result is:
$ php test.php
Position is: 2
Now we need to get our specific row:
$data_row = $xpath->query("//table[@class='mdcTable']/*/td[starts-with(.,'Closing Arms')]");
echo "Returned {$data_row->length} row(s)\n";
Here we use starts-with, since the row label has a utf-8 symbol in it. This makes it easier. Result so far:
$ php test.php
Returned 4 row(s)
Now we need to use the column index to get the data we want:
$data_row = $xpath->query("//table[@class='mdcTable']/*/td[starts-with(.,'Closing Arms')]/../*[$column_position]");
foreach($data_row as $row) {
echo "{$date},{$row->nodeValue}\n";
}
Result is:
$ php test.php
20110509,1.26
20110509,1.40
20110509,0.32
20110509,1.01
Which can now be written to a file. Now, we don't have the markets these apply to, so let's go ahead and grab those:
$headings = array();
$market_headings = $xpath->query("//table[@class='mdcTable']/*/td[@class='colhead'][1]");
foreach($market_headings as $market_heading) {
$headings[] = $market_heading->nodeValue;
}
Now we can use a counter to reference which market we're on:
$data_row = $xpath->query("//table[@class='mdcTable']/*/td[starts-with(.,'Closing Arms')]/../*[$column_position]");
$i = 0;
foreach($data_row as $row) {
echo "{$date},{$headings[$i]},{$row->nodeValue}\n";
$i++;
}
The output being:
$ php test.php
20110509,NYSE,1.26
20110509,Nasdaq,1.40
20110509,NYSE Amex,0.32
20110509,NYSE Arca,1.01
Now for your part:
This can be made into a function that takes a date
You'll need code to write out the file. Check out the filesystem functions for hints
This can be made extendible to use different columns and different rows
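As a starting point for those items, here is a rough sketch that wraps the steps above in a function taking the page HTML and a date, plus a helper that appends the result to a text file. The names closingArmsLines() and appendLines() are made up, and the XPath uses //td instead of /*/td so it does not depend on whether the rows sit directly under the table. Treat it as one possible shape, not a finished scraper.

```php
<?php
// Hypothetical helper: returns "date,value" lines for the Closing Arms row.
function closingArmsLines(string $html, string $date): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string on PHP 8
    }
    libxml_use_internal_errors(true); // real pages are rarely valid HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $header = $xpath->query("//table[@class='mdcTable']//td[contains(.,'Latest close')]")->item(0);
    if ($header === null) {
        return [];
    }
    // XPath positions are 1-based, hence the + 1
    $column = $xpath->query('preceding-sibling::*', $header)->length + 1;

    $lines = [];
    $cells = $xpath->query("//table[@class='mdcTable']//td[starts-with(.,'Closing Arms')]/../*[$column]");
    foreach ($cells as $cell) {
        $lines[] = $date . ',' . trim($cell->nodeValue);
    }
    return $lines;
}

// Hypothetical helper: appends the lines to a text file, one per row.
function appendLines(string $file, array $lines): void
{
    if ($lines !== []) {
        file_put_contents($file, implode(PHP_EOL, $lines) . PHP_EOL, FILE_APPEND | LOCK_EX);
    }
}
```

For each date you would fetch the page with file_get_contents() as shown earlier, check that the result is not false, and pass it on, for example appendLines('arms.txt', closingArmsLines($data, $date)), where arms.txt is just a placeholder file name.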
I'd recommend using the HTML Agility Pack; it's an HTML parser which is very handy for finding particular content within an HTML document.