Finding the maximum occurring string within a text file

So I've seen questions asked before that are along the lines of finding the maximum occurrence of a string within a file, but all of those rely on knowing what to look for.
I have what you might almost call a flat file database that grabs a bunch of input data and basically wraps different parts of it in html span tags with referencing ids.
Each line comes out in this kind of fashion:
<p>
<span class="ip">58.106.**.***</span>
Wrote <span class='text'>some text</span>
<span class='effect1'> and caused seizures </span>
<span class='time'>23:47</span>
</p>
How would I then go about finding the #text contents that occur the most times?
i.e. if I had
<p>
<span class="ip">58.106.**.***</span>
Wrote <span id='text'>woof</span>
<span class='effect1'> and caused seizures </span>
<span class='time'>23:47</span>
</p>
<p>
<span class="ip">58.106.**.***</span>
Wrote <span class='text'>meow</span>
<span class='effect1'> and caused mind-splosion </span>
<span class='time'>23:47</span>
</p>
<p>
<span class="ip">58.106.**.***</span>
Wrote <span class='text'>meow</span>
<span class='effect1'> and used no effect </span>
<span class='time'>23:47</span>
</p>
<p>
<span class="ip">58.106.**.***</span>
Wrote <span class='text'>meow</span>
<span class='effect1'> and used no effect </span>
<span class='time'>23:47</span>
</p>
the output would be 'meow'.
How would I accomplish this in php?

First off: Your format is not conducive to this type of data manipulation; you might want to consider changing it.
That said, based on this structure the logical solution would be to leverage DOMXPath as Dani says. This could have been problematic because of all the duplicate ids in there, but in practice it works (after emitting a boatload of warnings, which is one more reason that the data structure affords revision).
Here's some code to go with the idea:
$input = '<body>'.get_input().'</body>';
$doc = new DOMDocument;
$doc->loadHTML($input); // lots of warnings, duplicate ids!
$xpath = new DOMXPath($doc);
// Select the text nodes of every element whose id is 'text'
$result = $xpath->query("//*[@id='text']/text()");
$occurrences = array();
foreach ($result as $item) {
    if (!isset($occurrences[$item->wholeText])) {
        $occurrences[$item->wholeText] = 0;
    }
    $occurrences[$item->wholeText]++;
}
// Sort the results (highest count first) and produce final answer
arsort($occurrences);
reset($occurrences);
echo "The most common text is '".key($occurrences).
    "', which occurs ".current($occurrences)." times.";
Update (seeing as you fixed the duplicate id issue): You would simply change the XPath query to "//*[@class='text']/text()" so that it continues to match. However, this way of doing things remains inefficient, so if one or more of these apply:
you are going to do this all the time
you have lots of data
you need it to be really fast
then changing the data format is a good idea.
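As a rough sketch of the class-based variant described in the update, the counting can also be done with array_count_values. The helper name mostCommonText and the sample markup are made up for illustration, libxml_use_internal_errors is used here only to keep parser warnings quiet, and array_key_first assumes PHP 7.3 or later:

```php
<?php
// Hypothetical helper (not from the original answer): returns [text, count]
// for the most frequent <span class="text"> content, or null if none is found.
function mostCommonText(string $html): ?array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query("//span[@class='text']") as $span) {
        $texts[] = trim($span->textContent);
    }
    if ($texts === []) {
        return null;
    }

    $counts = array_count_values($texts); // text => number of occurrences
    arsort($counts);                      // highest count first, keys preserved
    $top = array_key_first($counts);
    return [$top, $counts[$top]];
}

// Made-up sample input in the shape shown in the question.
$sample = "<p>Wrote <span class='text'>woof</span></p>"
        . "<p>Wrote <span class='text'>meow</span></p>"
        . "<p>Wrote <span class='text'>meow</span></p>";

[$text, $count] = mostCommonText($sample);
echo "The most common text is '$text', which occurs $count times.\n";
```

Note that array_count_values turns purely numeric strings into integer keys, so if the texts can be numbers the returned value may be an int rather than a string.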

Have a look at DOMXPath: you can use an XPath query to get all the #text elements and then find the most used one with PHP.
One problem is that you used the same id a few times, which is not valid HTML, so DOM might break.

Related

How to style the output of an echo to display a grid?

My team and I have made a database in phpMyAdmin for a restaurant, and I'm currently working on the customer dashboard. I'm using foreach loops to display complete orders in one of the dashboard tabs, and have the code working, but right now it just outputs regular black text. I was wondering how to style it to output the rows as a grid, similar to Bootstrap grids.
I've tried to just add containers with rows and columns to the foreach echo itself, but it's just not working as I thought it would.
<div id="CurrentOrders" class="tabcontent" style="margin-left: 24px">
<!-- This information will be pulled from the Orders table in the DB -->
<h3>Current Orders</h3>
<p>
<div class="container">
<?php
foreach ($orderno as $order) {
$n = $order['OrderNo'];
$menunamequery = "SELECT * FROM OrderItem WHERE OrderNo = '{$n}'";
$currentorders = getRows($menunamequery);
foreach ($currentorders as $currentorder) {
echo "Order Number -"." ".$currentorder['OrderNo']." , "."Order -"." ".$currentorder['MenuName']." , "."Quantity -"." ".$currentorder['Quantity']."<br>";
}
}
?> </div>
</p>
</div>
The expected result is for the rows I'm outputting to have some sort of grid layout; the actual result is just plain text currently.
Sorry if this is a bad question, my team and I just learned php this semester and are hoping to continue getting better at it. Any help would be appreciated.

You can simply output HTML from PHP:
echo '<span style="color: red">'.$currentorder['MenuName'].'</span>';
However, it is advised that you sanitize your output, so nobody can "create HTML" by putting tags in the database:
echo '<span style="color: red">'.htmlspecialchars($currentorder['MenuName']).'</span>';
This does exactly what it says: it makes HTML entities from special characters. For example, > will be printed as &gt;, which the browser will safely render as >, instead of trying to interpret it as an HTML element closing bracket.
Alternatively, you can simply write HTML directly if you wish, by closing and opening the PHP tags:
// PHP Code
?>
<span class="some-class"><?=htmlspecialchars($currentorder['MenuName'])?></span>
<?php
// More PHP Code
You may also want to look into templating engines to make it easier for you, although it depends on the project if it's worth it for you to look into that, since there is a little bit of a learning curve to it.
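To connect this back to the grid layout asked about, here is a minimal sketch that wraps each order in row and column divs and escapes every value. The helper name renderOrderGrid and the sample rows are invented for illustration, and the container/row/col class names assume Bootstrap's stylesheet is loaded on the page:

```php
<?php
// Hypothetical helper: turns order rows into Bootstrap-style grid markup.
// The column keys match the ones used in the question's echo statement.
function renderOrderGrid(array $orders): string
{
    $html = '<div class="container">' . "\n";
    foreach ($orders as $order) {
        $html .= '  <div class="row">' . "\n";
        foreach (['OrderNo', 'MenuName', 'Quantity'] as $column) {
            // Escape every value so stored text cannot inject markup.
            $value = htmlspecialchars((string) $order[$column]);
            $html .= '    <div class="col">' . $value . '</div>' . "\n";
        }
        $html .= '  </div>' . "\n";
    }
    return $html . '</div>' . "\n";
}

// Made-up rows standing in for the result of the database query.
echo renderOrderGrid([
    ['OrderNo' => 12, 'MenuName' => 'Fish & Chips', 'Quantity' => 2],
    ['OrderNo' => 12, 'MenuName' => '<b>Soup</b>', 'Quantity' => 1],
]);
```

Building the markup in a function keeps the escaping in one place; the same structure could just as well be written as plain HTML with short echo tags.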

how to set create regex for this string [duplicate]

This question already has answers here:
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 6 years ago.
<div id="plugin-description">
<p itemprop="description" class="shortdesc">
BuddyPress helps you build any type of community website using WordPress, with member profiles, activity streams, user groups, messaging, and more. </p>
<div class="description-right">
<p class="button">
<a itemprop="downloadUrl" href="https://downloads.wordpress.org/plugin/buddypress.2.6.1.1.zip">Download Version 2.6.1.1</a>
i need description just with this code
<p itemprop="description" class="shortdesc">[a-z]</p>
i need download link
<a itemprop="downloadUrl" href="[A-Z]"></a>

There are better tools for parsing HTML than regular expressions. That said, there are times when parsing HTML with regular expressions works safely and consistently, so don't be bullied out of trying it. These cases are usually for small, known sets of HTML markup.
For this particular case, it seems that using an HTML parser would be effective and leave you with more legible code. To illustrate this, I'll use a command line tool like pup, which will help you retrieve your content pretty simply. Let's pretend that the markup is stored at /tmp/input on your computer.
To grab the downloadUrl...
pup < /tmp/input 'a[itemprop="downloadUrl"] attr{href}'
To grab the description...
pup < /tmp/input 'p[itemprop="description"] text{}'
This I think illustrates the simplicity and benefits of using an HTML parser to grab what you're after.

And once again:
<?php
$data = <<<DATA
<div id="plugin-description">
<p itemprop="description" class="shortdesc">
BuddyPress helps you build any type of community website using WordPress.
</p>
<div class="description-right">
<p class="button">
<a itemprop="downloadUrl" href=".zip">Download Version 2.6.1.1</a>
</p>
</div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$containers = $xpath->query("//div[@id='plugin-description']");
foreach ($containers as $container) {
    // Both queries are relative to the current container (note the leading dot)
    $description = trim($xpath->query(".//p[@itemprop='description']", $container)->item(0)->nodeValue);
    $link = $xpath->query(".//a[@itemprop='downloadUrl']/@href", $container)->item(0)->nodeValue;
    echo $description . $link;
}
?>
See a demo on ideone.com.

is there any significant difference in these 3 different applications of php

I am trying to improve my programming theory, and in a previous question it was pointed out to me that I should not use multi-line echos in my programming as shown in the first example. I use this because once it is compiled it automatically minimizes the output HTML. Which of the three examples below is the best practice for making use of PHP, and why?
1)
echo '<div class="row cf">';
echo '<div class="col_8 cf alpha">'.$page_title.'</div>';
echo '<div class="col_4 cf omega right">';
echo '<a href="'.$table_url.'-action.php?action=add" class="button blue">';
echo '<i class="icon-plus-sign"> </i> Add a Site</a>';
echo '</div>';
echo '</div><hr>';
2)
echo '
<div class="row cf">
<div class="col_8 cf alpha">'.$page_title.'</div>
<div class="col_4 cf omega right">
<a href="'.$table_url.'-action.php?action=add" class="button blue">
<i class="icon-plus-sign"> </i> Add a Site</a>
</div>
</div>
<hr>
';
3)
<div class="row cf">
<div class="col_8 cf alpha"><?php echo $page_title; ?></div>
<div class="col_4 cf omega right">
<a href="<?php echo $table_url; ?>-action.php?action=add" class="button blue">
<i class="icon-plus-sign"> </i> Add a Site</a>
</div>
</div>
<hr>
Thanks.... Pete

There is significant difference. The first one is especially hard to maintain. The second is a bit better, but still inconvenient.
The third one allows you to write plain HTML without the need to escape anything. You only briefly open PHP tags to insert variables. This HTML is also properly syntax-highlighted if you have a smart editor like NetBeans or even Notepad++.
So I would choose the third one, except maybe when I insert a very tiny piece of HTML.
If I may suggest a small improvement:
<?php echo $x; ?>
can also be written as
<?= $x ?>
In case of performance, I think there won't be much difference. I would guess that the third one is faster, since it needs to parse and execute smaller pieces of PHP. In the other ones the strings need to be parsed as well to check for special characters.
That said, I doubt you would be able to measure any difference at all, and it shouldn't be your concern. Choose the one you like best. For optimization, you'd better find real bottlenecks, which are usually found in the area of executing too many, too complex, or poorly optimized database queries.

I prefer solution 3 for two reasons:
My IDE (notepad++) will actually syntax highlight the HTML and the PHP and not just colour it the "string colour". I also do not have to escape ''s in the HTML by changing it to \' every time I use it.

PHP Simple HTML DOM parser give faulty data

I'm using PHP Simple HTML DOM to parse a web page with the following HTML. Notice the extra </span>-tags in each <li>.
<li>
<span class="name">
Link asdasd
</span>
</span>
</li>
<li>
<span class="name">
Link asdasd2
</span>
</span>
</li>
My queries are:
$lis = $dom->find('li');
foreach ($lis as $li) {
$spans = $li->find('span');
foreach ($spans as $span) {
echo $span->plaintext."<br>";
}
}
My output is:
Link asdasd
Link asdasd2
-----------
Link asdasd2
-----------
As you can see the find('span') finds two spans as children to the first <li> and getting the value from the next <span> it can find (even though it's a child of the next <li>). Removing the trailing </span> fixes the problem.
My questions are:
Why is this happening?
How I can solve this particular case?
Everything else works well and I'm not in a position to make big changes to my script. I can change the DOM queries easily though if needed.
I am thinking about counting opening and closing tags and stripping one </span> if there are too many of them. Since they will always be <span>s, is there a smart way to check it with a regexp?

1) Simple is trying to fix your extra </span> by adding a <span> somewhere. So now you have an extra span that shouldn't be there. For the record, DOMDocument would do the same thing, although perhaps in a more predictable way.
2) Simplify:
foreach ($dom->find('li > span') as $span) {
echo $span->plaintext."<br>";
}
// Link asdasd <br> Link asdasd2 <br>
Now you've told it you only want the span that is a child of a li. Even better, do something like:
foreach ($dom->find('span.name') as $span) {
echo $span->plaintext."<br>";
}
Use those attributes, that's what they're good for.

$newTxt = preg_replace('/<\/span>\s*<\/span>/', '</span>', $txt); // \s* matches the whitespace between the two closing tags
The method 'find(x)' is an overloaded function that can return the equivalents of:
$e->getElementById(x);
$e->getElementsById(x);
$e->getElementByTagName(x); and
$e->getElementsByTagName(x);
Your first call makes use of the last of these; the second call, on $li, makes use of the third. Which one gets picked is probably an optimization based on what you are asking for through the API. I guess you have found a bug in the API, because in both cases you were asking for the third call:
$e->getElementByTagName();
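For the regexp idea from the question, a small sketch of stripping the stray closing tag before the markup is handed to the parser. The helper name dropDuplicateSpanClose is made up, and the approach assumes spans are never legitimately nested, since it would also collapse two closing tags that really belong to nested spans:

```php
<?php
// Hypothetical helper: collapses two consecutive closing span tags
// (separated only by optional whitespace) into a single one.
function dropDuplicateSpanClose(string $html): string
{
    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace('~</span>\s*</span>~', '</span>', $html) ?? $html;
}

// One list item in the shape shown in the question.
$broken = "<li>\n<span class=\"name\">\nLink asdasd\n</span>\n</span>\n</li>";
echo dropDuplicateSpanClose($broken), "\n";
```

The cleaned string can then be loaded with the parser as before, so the existing find() queries do not need to change.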

Step through DOMDocument tag by tag

Strangely I can't find an answer for this, though it seems like it must have been asked before. I have a DOMDocument in PHP and I want to step through each HTML tag as if it were a flat document, basically. I need to inspect each element, looking for names of the tag and specific attribute values. I don't think I can use XPath in this instance because, although the structure of the HTML remains the same, the attributes can be different depending on when the doc is parsed.
My document is a little unusual like this
<tr class='THIS COULD BE ONE OF THREE DIFFERENT CLASSES' id='UNIQUE ID'>
<td class='statistics show' >
<button class="js-hide">Show</button>
</td>
<td class='details'>
<p>
<span class='home'>
<a href='LINK'>TEAM 1</a> </span>
<span class='COULD BE ONE OF TWO DIFFERENT CLASSES'> VARIABLE CONTENT </span> <span class='away'>
<a href='LINK'>TEAM 2</a> </span>
</p>
</td>
<td class='COULD BE ONE OF THREE CLASS TYPES'>
VARIABLE CONTENT</td>
<td class='status'>
</td>
</tr>
There are other tags around the document but there are a number of duplicated sections like that one I would like to pull out. I can't see how xpath would allow me to parse this sensibly so tag by tag is my only option but I can't find the correct way to do it. Any suggestions?

You could use getElementsByTagName('*') to get all elements and loop through those.
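A minimal sketch of that loop, checking each element's tag name and class attribute as it goes. The helper name listTaggedElements and the sample markup are invented for illustration; the elements come back in document order:

```php
<?php
// Hypothetical helper: walks every element in document order and returns
// "tag.class" labels for the ones that carry a class attribute.
function listTaggedElements(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    foreach ($doc->getElementsByTagName('*') as $element) {
        if ($element->hasAttribute('class')) {
            $found[] = $element->tagName . '.' . $element->getAttribute('class');
        }
    }
    return $found;
}

// Made-up, shortened version of the table row from the question.
$html = "<table><tr class='win' id='m1'>"
      . "<td class='details'><span class='home'>TEAM 1</span></td>"
      . "</tr></table>";
print_r(listTaggedElements($html));
```

Inside the loop you can branch on $element->tagName and on whichever attribute values are possible, which avoids having to know the exact class names up front.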
