Inserting numerical IDs in paragraphs (PHP/MySQL DB Query) - php

I have a pretty ordinary query that displays articles stored in a database table (field = 'Article')...
while ($row = $stm->fetch())
{
$Content = $row['Article'];
}
echo $Content;
I'd like to know how I can modify the display so that every paragraph has a numerical ID. For example, the first paragraph would be [p id="1"], the second one [p id="2"] and so on. However, it would be even better if the last paragraph displayed as [p id="Last"].
(Sorry, I forgot how to post inline code, so I replaced the tags (e.g. <) with brackets.)
My goal is to simply get more control over my content. For example, there are certain items that I want to include after the first paragraph on some pages, and I might want to include a certain feature before paragraph#4 on one special page.
ON EDIT... Neither of the methods suggested below worked for me, but it's probably because I simply didn't implement them correctly; the code in both examples isn't familiar to me. At any rate, I'm bookmarking this page so I can learn more about those scripts.
In the meantime, I finally found a regex solution. (I think preg_replace is another word for regex, right?)
This inserts a numerical ID in each paragraph tag:
$c = 1;
$r = preg_replace('/(<p( [^>]+)?>)/ie', '"<p\2 id=\"" . $c++ . "\">"', $Article);
$Article = $r;
This changes the ID in the last paragraph tag to "Last"...
$c = 1;
$r = preg_replace('/(<p( [^>]+)?>)/ie', '"<p\2 id=\"" . $c++ . "\">"', $Article);
$r = preg_replace('/(<p.*?)id="'.($c-1).'"(>)/i', '\1id="Last"\2', $r);
$Article = $r;
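A note on the snippets above: the /e modifier they rely on was deprecated in PHP 5.5 and removed in PHP 7, so on a current PHP version they no longer run. The following is a minimal sketch of the same idea using preg_replace_callback(); the function name numberParagraphs is made up for illustration, and it assumes the article's paragraphs are plain <p ...> tags that do not already carry an id.

```php
<?php
// Hypothetical helper: numbers every <p> tag and labels the final one "Last".
function numberParagraphs(string $html): string
{
    $pattern = '/<p(\s[^>]*)?>/i';          // <p> or <p attr="...">, but not <pre>
    $total = preg_match_all($pattern, $html);
    $n = 0;

    return preg_replace_callback(
        $pattern,
        function (array $m) use (&$n, $total): string {
            $n++;
            $id = ($n === $total) ? 'Last' : (string) $n;
            // $m[1] holds the existing attributes, if the tag had any.
            return '<p' . ($m[1] ?? '') . ' id="' . $id . '">';
        },
        $html
    );
}

$Article = numberParagraphs('<p>a</p><p class="x">b</p><p>c</p>');
echo $Article;
```

With this sketch an article that has only one paragraph gets id="Last" on that paragraph rather than id="1".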

Assuming your HTML is well-formed, you could use the SimpleXMLElement class to do so:
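// Note: SimpleXMLElement needs a single root element around the paragraphs.
$sxe = new SimpleXMLElement($row['Article']);
$i = 0;
foreach ($sxe->children() as $p) {
    $p->addAttribute('id', (string) ++$i); // 1, 2, 3, ...
}
$p['id'] = 'Last'; // overwrite the id attribute of the last paragraph
echo $sxe->asXML();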
If it isn't well-formed, you could use the DOMDocument class instead:
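$dom = new DOMDocument;
$dom->loadHTML($row['Article']);
$i = 0;
foreach ($dom->getElementsByTagName('p') as $p) {
    $p->setAttribute('id', (string) ++$i); // 1, 2, 3, ...
}
$p->setAttribute('id', 'Last'); // $p still points at the last paragraph
echo $dom->saveHTML();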

Related

PHP getElementsByTagName('*') avoid duplicate nodes | "In Text ads" by separating content nodes

This problem comes from running ads on my website. My content is mainly HTML stored in a database, so I decided to place "In-Text Ads", that is, ads that are not in a fixed zone.
My solution was to explode the content by paragraphs and place the text ad in the middle of the p tags. Since I use CKEditor to generate the content, I assumed images, blockquotes, and other tags would be nested inside p tags. They are not, and I now realize that images and blockquotes disappeared from my posts. Next, I changed my code to split on * instead of on the p tag. I celebrated too soon, because now I get a lot of duplicate content: if a post has one image, I get the same image four times, and the same happens with all other tags. I'm not sure where the duplicates come from, but I think it has something to do with nested HTML. I looked for a solution for hours, and now I'm asking here to see whether somebody can help me solve this headache.
Here is my code:
//In a helper file
function splitByHTMLTagName(string $string, string $tagName = 'p')
{
$text = <<<TEXT
$string
TEXT;
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$nodes = [];
$dom->loadHTML('<?xml encoding="utf-8" ?>' . $text);
foreach ($dom->getElementsByTagName($tagName) as $node) {
array_push($nodes, $dom->saveHTML($node));
}
libxml_clear_errors();
return $nodes;
}
//In my view
$text = nl2br($database['content']);
$nodes = splitByHTMLTagName($text, '*');
//Using var_dump($nodes); here shows the duplicates are here already.
$nodes_count = count($nodes);
$show_ad_at = -1;
$was_added = false;
if($nodes_count % 2 == 0 ){
$show_ad_at = $nodes_count /2;
}else if ($nodes_count == 1 || $nodes_count < 3){
$show_ad_at = -1; //add later
}else if ($nodes_count > 3 && $nodes_count % 2 != 0){
$show_ad_at = ceil($nodes_count/2);
}
for($i = 0; $i<count($nodes); $i++){
if(!$was_added && $i == $show_ad_at){
$was_added = true;
?>
<div>
<script></script><!--This script is provided to me, it adds the ad where it is placed, I don't show the full script, It has nothing to do with the duplicates problem-->
</div>
<?php
}
echo $nodes[$i]; //print the node that comes from $nodes array where the duplicates already exist
}
if(!$was_added){
$was_added = true;
?>
<div>
<script></script><!--This script is provided to me, it adds the ad where it is placed, I don't show the full script, It has nothing to do with the duplicates problem-->
</div>
<?php
}
What can I do?
Thanks in advance.
P.S. #1: I use CodeIgniter as my PHP framework.
P.S. #2: My ads provider does not implement "In-Text ads" as a feature like Google does.
It seems you are printing the ad block inside an if statement.
If I haven't misunderstood, your code is something like this:
foreach ... {
if (strpos($html_line, "In-Text Ads") !== FALSE) {
print($ads_html);
}
}
If you are using something like print() when outputting the value, I think you should use str_replace() instead.
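On the duplicates themselves: getElementsByTagName('*') returns every element in the document, including html, body, and each nested element, and saveHTML($node) serializes a node together with all of its descendants. A nested image is therefore printed once for each ancestor and once more for itself. One possible fix, sketched below, is to collect only the direct children of body. The function name splitTopLevelNodes is an assumption for illustration, not part of the original helper.

```php
<?php
// Hypothetical replacement for splitByHTMLTagName($text, '*'):
// returns one HTML string per top-level node, so nothing is repeated.
function splitTopLevelNodes(string $html): array
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML('<?xml encoding="utf-8" ?>' . $html);
    libxml_clear_errors();

    $nodes = [];
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $nodes; // nothing parseable
    }
    foreach ($body->childNodes as $child) {
        // Each child is serialized once, with its nested tags inside it.
        $nodes[] = $dom->saveHTML($child);
    }
    return $nodes;
}

$nodes = splitTopLevelNodes('<p>a <img src="x.png"></p><blockquote>q</blockquote>');
echo count($nodes);
```

Text between top-level elements (for example whitespace) also shows up as its own entry, so the ad position may need to skip entries that are empty after trim().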

Retrieve a text with certain class name from PHP url

How can I get a text property from another page that has certain class name with PHP?
I have an array list of URLs like this
$url_array = array(
'https://www.example.com/item/32',
'https://www.example.com/item/33',
'https://www.example.com/item/34'
);
This is really difficult to explain, so I made a not-so-beautiful sketch of the process:
The first list of bubbles are the items of $url_array, each of which contains a different URL.
Now I need a method to read each URL and get its content.
The PHP will return a div element that has an <a> element with an href URL, but the URL is different each time.
Then I want to get content from the URL in that <a> element. It should return the text content of a <span> or <p> tag that has text-class as its class.
How could I turn this approach into PHP code?
I have tried this but it ain't working:
$htmlAsString = "index.php";
$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query('//a[@class="class-name"]/@href');
for ($i = 0; $i < $nodeList->length; $i++) {
$url_price = $nodeList->item($i)->value . "<br/>\n";
$retrieve_text_begin = explode('<div class="text-property">',
$url_price);
$retrieve_text_end = explode('</div>', $retrieve_text_begin[1]);
echo $retrieve_text_end[0];
}
I know that the $htmlAsString = "index.php"; might be the problem.
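For what it's worth, loadHTML() expects markup, not a file name, so passing "index.php" makes it parse that literal string. Below is a rough sketch of the two lookups with the page contents already in hand. The two inline HTML strings and the helper name queryHtml are assumptions standing in for pages that would normally be fetched first (for example with file_get_contents($url)).

```php
<?php
// Hypothetical markup standing in for the fetched pages.
$listingHtml = '<div><a class="class-name" href="https://www.example.com/item/32">Item</a></div>';
$itemHtml    = '<div><span class="text-class">Some text</span></div>';

// Hypothetical helper: runs an XPath expression against an HTML string
// and returns the text of every matching node (element or attribute).
function queryHtml(string $html, string $expression): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $values = [];
    foreach ((new DOMXPath($doc))->query($expression) as $node) {
        $values[] = trim($node->textContent);
    }
    return $values;
}

// Step 1: the href of each <a class="class-name">.
$hrefs = queryHtml($listingHtml, '//a[@class="class-name"]/@href');
// Step 2: the text of a <span> or <p> whose class is exactly "text-class".
$texts = queryHtml($itemHtml, '//*[(self::span or self::p) and @class="text-class"]');

print_r($hrefs);
print_r($texts);
```

The @class="..." test matches only when the attribute is exactly that value; an element with several classes would need a contains()-style expression instead.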

Display first 4 columns of external table

I am using Windows software to organize a tourpool. This program creates (among other things) HTML pages with rankings of participants. But these HTML pages are quite hideous, so I am building a site around it.
To show the top 10 ranking I need to select the first 10 out of about 1000 participants of the generated HTML file and put it on my own site.
To do this, I used:
// get top 10 ranks of p_rnk.htm
$file_contents = file_get_contents('p_rnk.htm');
$start = strpos($file_contents, '<tr class="header">');
// get end
$i = 11;
while (strpos($file_contents, '<tr><td class="position">'. $i .'</td>', $start) === false){
$i++;
}
$end = strpos($file_contents, '<td class="position">'. $i .'</td>', $start);
$code = substr($file_contents, $start, $end - $start); // third argument is a length, not an end offset
echo $code;
This way I get it to work, only the last 3 columns (previous position, up or down and details) are useless information. So I want these columns deleted or find a way to only select and display the first 4.
How do I manage this?
EDIT
I adjusted my code and at the end I only echo the adjusted table.
<?php
$DOM = new DOMDocument;
$DOM->loadHTMLFile("p_rnk.htm");
$table = $DOM->getElementsByTagName('table')->item(0);
$rows = $table->getElementsByTagName('tr');
$cut_rows_after = 10;
$cut_colomns_after = 3;
$row_index = $rows->length-1;
while($row = $rows->item($row_index)) {
if($row_index+1 > $cut_rows_after)
$table->removeChild($row);
else {
$tds = $row->getElementsByTagName('td');
$colomn_index = $tds->length-1;
while($td = $tds->item($colomn_index)) {
if($colomn_index+1 > $cut_colomns_after)
$row->removeChild($td);
$colomn_index--;
}
}
$row_index--;
}
echo $DOM->saveHTML($table);
?>
I'd say that the best way to deal with such stuff is to parse the HTML document (see, for instance, the first answer here) and then manipulate the object that describes the DOM. This way, you can easily extract the table itself using various selectors, get your first 10 records in a simpler manner, and remove unnecessary child (td) nodes from each row (using removeChild). When you're done modifying, dump the resulting HTML using saveHTML.
Update:
OK, here's tested code. I removed the need to hardcode the numbers of columns and rows and moved the desired numbers into a couple of variables (so that you can adjust them if needed). Give the code a closer look: you'll notice some details that were missing in your code. The index runs 0..999, not 1..1000, which is why all those -1s and +1s appear. It's better to decrease the index instead of increasing it, because then you don't have to care about numbering shifts on removal. I've also used while instead of for so that the case $rows->item($row_index) == null doesn't need separate handling:
<?php
$DOM = new DOMDocument;
$DOM->loadHTMLFile("./table.html");
$table = $DOM->getElementsByTagName('tbody')->item(0);
$rows = $table->getElementsByTagName('tr');
$cut_rows_after = 10;
$cut_colomns_after = 4;
$row_index = $rows->length-1;
while($row = $rows->item($row_index)) {
if($row_index+1 > $cut_rows_after)
$table->removeChild($row);
else {
$tds = $row->getElementsByTagName('td');
$colomn_index = $tds->length-1;
while($td = $tds->item($colomn_index)) {
if($colomn_index+1 > $cut_colomns_after)
$row->removeChild($td);
$colomn_index--;
}
}
$row_index--;
}
echo $DOM->saveHTML();
?>
Update 2:
If the page doesn't contain tbody, use the container which is present. For instance, if tr elements are inside a table element, use $DOM->getElementsByTagName('table') instead of $DOM->getElementsByTagName('tbody').
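As an alternative to walking the indexes backwards, the unwanted rows and cells can be selected with XPath and removed in one pass. This is only a sketch under assumptions: the inline table is made-up sample data, the variable names $keepRows and $keepCols are hypothetical, and a header row built from th cells would need its own th condition.

```php
<?php
// Made-up sample data: three rows of three cells each.
$html = '<table>'
      . '<tr><td>1</td><td>A</td><td>x</td></tr>'
      . '<tr><td>2</td><td>B</td><td>y</td></tr>'
      . '<tr><td>3</td><td>C</td><td>z</td></tr>'
      . '</table>';

$keepRows = 2; // hypothetical limits for this sample
$keepCols = 2;

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// position() counts among siblings, so this works with or without tbody.
$query = "//tr[position() > $keepRows] | //tr/td[position() > $keepCols]";

// XPath results are a snapshot, so removing nodes while looping is safe.
foreach (iterator_to_array($xpath->query($query)) as $node) {
    $node->parentNode->removeChild($node);
}

echo $dom->saveHTML();
```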

Add comments and attributes including an incremented number to elements in an HTML string

I have been trying to understand how preg_replace_callback() works, but I just don't get it.
Say, for example, I read navigation.php with file_get_contents().
In that text are a bunch of a href and div elements. I want to give them incremental ids and add a comment before each a href.
How would I loop over all of those so that the ids increment and the comments get added?
<?php
$string = file_get_contents("navigation.php");
$i = 1;
$replace = "<a ";
$with = '<!-- UNIT'.$i.' --><a id=a_'.$i;
$replace2 = "<div ";
$with2 = '<div id=b_'.$i;
preg_replace_callback()
$i++
?>
I figured that if I could get an example based on my code, I would be able to understand it better.
So $replace and $replace2 are the strings I am searching for, $with and $with2 are the respective replacements, and $i is the increment.
An example of data coming in:
<a href="page4.php">Page 4</a>
<a href="page3.php">Page 3</a>
<div class="red">stuff</div>
<div class="blue">stuff</div>
I would want an output like..
<!-- UNIT 1 --><a id="a_1" href="page4.php">Page 4</a>
<!-- UNIT 2 --><a id="a_2" href="page3.php">Page 3</a>
<div id="b_1" class="red">stuff</div>
<div id="b_2" class="blue">stuff</div>
You have multiple goals; the simplest way to accomplish them, IMO, is to do it step by step.
1. The RegEx
You want two HTML tags, these can be caught easily via /(<a|<div)/i (explanation, g modifier is only used to demonstrate that it correctly matches).
With this you could write the following code:
$parsed = preg_replace_callback('/(<a|<div)/i', ???, $string);
2. The callback
The logic behind this can be simplified to the following switch
switch ($found) {
case '<div':
$result = '<div id="b_'.$id.'"';
break;
case '<a':
$result = '<!-- UNIT'.$id.' --><a id="a_'.$id.'"';
break;
default:
$result = "";
break;
}
To implement this you can either write a new function or use an anonymous one. To make $id accessible, you need to learn about variable scope in PHP. An easy way out of using anything like global $id; or define() is using Closures with the use() syntax. To be able to manipulate $id (increment it), you'll need to pass it by reference (when using Closures). This brings you to the following code:
$id = 1; // initialise the counter before the closure captures it by reference
$parsed = preg_replace_callback("/(<a|<div)/", function($match) use (&$id) {
switch ($match[1]) {
case '<div':
$result = '<div id="b_'.$id.'"';
break;
case '<a':
$result = '<!-- UNIT'.$id.' --><a id="a_'.$id.'"';
break;
default:
$result = $match[1];//do nothing
break;
}
$id++;
return $result;
}, $string);
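One thing to be aware of: the callback above shares a single counter between both tags, so the divs continue the numbering of the links instead of starting again at b_1. A possible variation with one counter per tag is sketched below. The function name tagUnits is hypothetical, and the pattern uses \b so that tags such as <abbr> or <article> are not mistaken for <a>.

```php
<?php
// Hypothetical variation: separate counters for <a> and <div>.
function tagUnits(string $html): string
{
    $counters = ['a' => 0, 'div' => 0];

    return preg_replace_callback(
        '/<(a|div)\b/i',
        function (array $m) use (&$counters): string {
            $tag = strtolower($m[1]);
            $n = ++$counters[$tag];      // advance only this tag's counter
            if ($tag === 'a') {
                return "<!-- UNIT $n --><a id=\"a_$n\"";
            }
            return "<div id=\"b_$n\"";
        },
        $html
    );
}

$string = "<a href=\"page4.php\">Page 4</a>\n<div class=\"red\">stuff</div>\n<div class=\"blue\">stuff</div>";
echo tagUnits($string);
```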
I recommend not using a preg_ function at all. PHP has a robust set of tools for parsing valid HTML -- use a DOM parser.
Code:
$html = <<<HTML
<body>
<a href="page4.php">Page 4</a>
<a href="page3.php">Page 3</a>
<div class="red">stuff</div>
<div class="blue">stuff</div>
</body>
HTML;
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$counter = 0;
foreach ($dom->getElementsByTagName("a") as $a) {
++$counter;
$comment = new DOMComment(" UNIT $counter ");
$a->parentNode->insertBefore($comment, $a);
$a->setAttribute('id', "a_$counter");
}
$counter = 0;
foreach ($dom->getElementsByTagName("div") as $b) {
++$counter;
$b->setAttribute('id', "b_$counter");
}
echo substr($dom->saveHTML(), 7, -9);
I have wrapped your HTML in a parent body tag and removed it at the end of the script to aid in preserving the newlines of your input (otherwise some newlines will be lost while processing).
The remainder of the syntax is rather self-documenting because the class methods are very descriptive of their functionality.

PHP - dom appendChild() - adding strings around selected html tags using PHP DOM

I'm trying to run through some html and insert some custom tags around every instance of an "A" tag. I've got so far, but the last step of actually appending my pseudotags to the link tags is eluding me, can anyone offer some guidance?
It all works great up until the last line of code - which is where I'm stuck. How do I place these pseudotags either side of the selected "A" tag?
$dom = new domDocument;
$dom->loadHTML($section);
$dom->preserveWhiteSpace = false;
$ahrefs = $dom->getElementsByTagName('a');
foreach($ahrefs as $ahref) {
$valueID = $ahref->getAttribute('name');
$pseudostart = $dom->createTextNode('%%' . $valueID . '%%');
$pseudoend = $dom->createTextNode('%%/' . $valueID . '%%');
$ahref->parentNode->insertBefore($pseudostart, $ahref);
$ahref->parentNode->appendChild($pseudoend);
$expression[] = $valueID; //^$link_name[0-9a-z_()]{0,3}$
$dom->saveHTML();
}
//$dom->saveHTML();
I'm hoping to get this to perform the following:
<a name="yyy">text</a>
turned into
%%yyy%%<a name="yyy">text</a>%%/yyy%%
But currently it doesn't appear to do anything - the page outputs, but there are no replacements or nodes added to the source.
In order to make sure that the ahref node is wrapped...
foreach($ahrefs as $ahref) {
$valueID = $ahref->getAttribute('name');
$pseudostart = $dom->createTextNode('%%' . $valueID . '%%');
$pseudoend = $dom->createTextNode('%%/' . $valueID . '%%');
$ahref->parentNode->insertBefore($pseudostart, $ahref);
$ahref->parentNode->insertBefore($ahref->cloneNode(true), $ahref); // Inserting cloned element (in order to insert $pseudoend immediately after)
$ahref->parentNode->insertBefore($pseudoend, $ahref);
$ahref->parentNode->removeChild($ahref); // Removing old element
}
print $dom->saveXML();
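A shorter variation that avoids cloning is to insert the closing marker before the link's next sibling. insertBefore() appends at the end of the parent when that sibling is null, so it also works when the link is the last child. This is a sketch with a made-up $section value, not the original page content.

```php
<?php
// Made-up input standing in for $section.
$section = '<p>before <a name="yyy">text</a> after</p>';

$dom = new DOMDocument();
$dom->loadHTML($section, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Copy the live node list into an array before modifying the tree.
foreach (iterator_to_array($dom->getElementsByTagName('a')) as $a) {
    $valueID = $a->getAttribute('name');
    $parent = $a->parentNode;
    // Opening marker goes directly before the link...
    $parent->insertBefore($dom->createTextNode('%%' . $valueID . '%%'), $a);
    // ...and the closing marker directly after it.
    $parent->insertBefore($dom->createTextNode('%%/' . $valueID . '%%'), $a->nextSibling);
}

$out = $dom->saveHTML(); // the return value has to be used, e.g. echoed
echo $out;
```

In the question's loop the result of saveHTML() was discarded, which would explain why no change appeared in the page output.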
