I'm facing a slight issue with an idea.
I use a chat feature within an online forum on all my computing devices, including mobile, which causes some formatting and input issues. My idea is to relay all the chat from a relay account to my own mobile-friendly site.
I haven't started on sending messages yet, although I know how to read messages; parsing and outputting them is the issue.
I sniffed outgoing packets on my computer as the chat uses ajax. I was then able to find the following url: http://server05.ips-chat-service.com/get.php?room=xxxx&user=xxxx&access_key=xxxx
The page outputs something similar to this: ~~||~~1419344231,1,kondaxdesign,Could somebody send a quick message for me__C__ please?,,10248~~||~~1419344237,1,tom.bridges,its a iso and a vm what more do we need to know?,,10880~~||~~
That string would output this in chat: http://i.stack.imgur.com/j7CM6.png
I unfortunately don't have much knowledge of regex, or of any other function that would split this. Would anybody be able to assist me in extracting 1) the name, 2) the chat data and 3) the timestamp?
As you can see, the string is something like this: ~~||~~[timestamp],1,[name],[data],,[some integer]~~||~~
Cheers.
After reading through the string output, when somebody leaves chat, this is sent: ~~||~~1419344521,2,wegface,TIMEOUT,2_10828,0~~||~~
The beginning of the log starts with 1,224442 before the first ~~||~~.
You would first explode each record, then use str_getcsv to read the string and parse it as you want. Here is a script that does that, without any output formatting; the variables are named after the fields described in the question.
I wouldn't use a regular expression to parse the string, as str_getcsv gives you better functionality for this.
$string = "~~||~~1419344231,1,kondaxdesign,Could somebody send a quick message for me__C__ please?,,10248~~||~~1419344237,1,tom.bridges,its a iso and a vm what more do we need to know?,,10880~~||~~";
//Split so we have each chat record to loop around
foreach( explode("~~||~~", $string) as $segment ) {
    //Read the CSV properly
    $chat = str_getcsv($segment);
    if( count($chat) != 6 ) { continue; } //Skip any that don't have all the data
    $timestamp = $chat[0];
    $name = $chat[2];
    $data = $chat[3];
    $some_integer = $chat[5];
    echo $name .' said - '. $data .'<br />';
}
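Based on the edit to the question, the second field appears to be a message type: 1 for a normal chat line and 2 for system events such as the TIMEOUT record sent when somebody leaves. If that interpretation is right, you could skip the system records inside the loop, right after the count check:
    // assumption: $chat[1] is a message type (1 = chat message, 2 = join/leave event)
    if ( $chat[1] != 1 ) { continue; } //Skip TIMEOUT and other system records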
I'm considering converting a website from Adobe to Yii. In the Adobe code, I have an include page of variables that uses the form POST data like this:
$firstContact = "This is an email I sent to {POST.userFirstName}";
When I submit, the post data is picked up by the variable and is sent nicely.
But as I start to convert these pages to Yii, I'm wondering if that {POST.userFirstName} is something that Yii will recognize as PHP and properly deploy that POST data in the email message.
Can someone kindly tell me where to look in the Yii documentation that will actually answer this, or, if you already know that it does work, just tell me that, too?
Thanks
The syntax in PHP would be like this:
$firstContact = "words ".$_POST['userFirstName']." more words ";
Or
$firstContact = "words {$_POST['userFirstName']} more words ";
But I would personally do this:
$userFirstName = isset($_POST['userFirstName']) ? $_POST['userFirstName'] : '';
$firstContact = "words $userFirstName more words ";
In the case of emails with POST data injected into them, I would very strongly recommend adding htmlentities:
$userFirstName = isset($_POST['userFirstName']) ? htmlentities($_POST['userFirstName'], ENT_QUOTES, "UTF-8") : '';
$firstContact = "words $userFirstName more words ";
But please note this will render HTML such as <p>html</p> useless, so it largely depends on what you need and whether you can 100% trust the content, who is sending the email, and who it's being sent to. The reason is that a user could add HTML containing JavaScript code that could hijack user sessions and data, and all kinds of other evil things that are best avoided.
There are several ways to put variables in strings (interpolation). Yii may offer a syntax like that, but it's not done in native PHP as such. A lot of template systems use a syntax similar to what you have, but I am not a fan of using just POST; I would want the _ in front, as in {_POST.var}, but that is just me.
The . in PHP is the concatenation operator, similar to + in JavaScript. Many template systems use it as an access operator, which is what JavaScript does; this is similar to -> in PHP, or [ ] in the case of an array. In general, template designers are more familiar with JavaScript, which is why they use the . that way.
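For completeness: Yii will not expand {POST.userFirstName} inside a plain PHP string; you would normally pull the value from the request component yourself and build the string as above. A rough sketch for Yii 2 (treat the exact method as an assumption to check against your version's documentation; Yii 1.1 uses Yii::app()->request->getPost() instead):
$userFirstName = Yii::$app->request->post('userFirstName', ''); // second argument is the default
$firstContact = "This is an email I sent to " . htmlentities($userFirstName, ENT_QUOTES, "UTF-8");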
Hello, I'm a newbie in PHP. I am trying to make a search function using PHP, but only within the website, without any database.
Basically, if I search for a string, say "Health", it should display the lines
The Joys of Health
Healthy Diets
This snippet is the only thing I could find that, if properly coded, would output the "lines" I want:
$myPage = array("directory.php","pages.php");
$lines = file($myPage[n]);
echo $lines[n];
I haven't tried yet whether it would work, but before I do, I want to ask: is there a better way to do this?
If my files have too many lines, won't it stress out the server?
The file() function will return an array. You should use file_get_contents() instead, as it returns a string.
Then, use regular expressions to find specific text within a link.
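A minimal sketch of that approach (the file name and the pattern are just placeholders for illustration):
// find every line containing "Health" in one page
$content = file_get_contents('directory.php');
if (preg_match_all('/^.*Health.*$/mi', $content, $matches)) {
    foreach ($matches[0] as $line) {
        echo htmlspecialchars($line) . '<br />'; // print each matching line
    }
}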
Your goal is fine, but the method you're thinking about is not. The file() function reads a file line by line and puts it into an array. This assumes the HTML is well structured in a human-readable fashion, which is not always the case. However, if you're the one providing the HTML and you make sure the structure is perfectly defined, OK... here you have the example you provided us with, but complete (take into account it's the 'wrong' way of solving your problem, but if you want to follow that pattern, it's ok):
function pagesearch($pages, $string) {
    $tags = [];
    if (!empty($pages) && !empty($string)) {
        foreach ($pages as $page) {
            if ($lines = file($page)) {
                foreach ($lines as $line) {
                    // mb_strpos() returns 0 for a match at the start of a line,
                    // so compare strictly against false
                    if (!empty($line) && mb_strpos($line, $string) !== false) {
                        $tags[$page][] = $line;
                    }
                }
            }
        }
    }
    return $tags;
}
This will return an array, keyed by page, containing every occurrence of the word you are looking for in each of the pages you referenced. As I said, it's not the way you want to solve this, but it's a way.
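For example (using the file names from the question):
$results = pagesearch(["directory.php", "pages.php"], "Health");
foreach ($results as $page => $lines) {
    echo $page . ':<br />' . implode('<br />', array_map('htmlspecialchars', $lines));
}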
Hope that helps
Because you do not want to use any database, and because the term database is very broad and includes the file system, you effectively want to search a database without having a database.
That makes no sense. In your case, at least one database is the file system. If you accept that you want to search a database (here, your HTML files) but do not want to use a database to store anything related to the search (e.g. some index or cached results), then what you suggest is basically how it works: a real-time, text-based, line-by-line file search.
Sure, it is very rudimentary, but as your constraint is "no database", you have already found the only possible way. And yes, it will stress your server when used, because real-time search is expensive.
Otherwise, Lucene/Solr is normally used for this job, but that is a database, and even a server.
I want to extract the content of a specific div in an external webpage. The div looks like this:
<dt>Win rate</dt><dd><div>50%</div></dd>
My target is the "50%". I'm currently using this PHP code to extract the content:
function getvalue($parameter, $content) {
    preg_match($parameter, $content, $match);
    return $match[1];
}
$parameter = '#<dt>Score</dt><dd><div>(.*)</div></dd>#';
$content = file_get_contents('https://somewebpage.com');
Everything works fine; the problem is that this method is taking too much time, especially if I have to use it several times with different $content.
I would like to know if there's a better (faster, simpler, etc.) way to accomplish the same thing? Thx!
You may use DOMDocument::loadHTML and navigate your way to the given node.
$content = file_get_contents('https://somewebpage.com');
$doc = new DOMDocument();
$doc->loadHTML($content);
Now to get to the desired node, you may use the DOMDocument::getElementsByTagName method, e.g.
$dds = $doc->getElementsByTagName('dd');
foreach($dds as $dd) {
    // process each <dd> element here: extract the inner <div> and read its text
    $div = $dd->getElementsByTagName('div')->item(0);
    if ($div !== null) {
        echo $div->nodeValue; // e.g. "50%"
    }
}
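If the markup around the value is stable, a DOMXPath query (not something the code above uses, so treat this as an optional sketch) can jump straight to the node instead of looping:
$xpath = new DOMXPath($doc);
// select the <div> inside the <dd> that immediately follows <dt>Win rate</dt>
$nodes = $xpath->query('//dt[.="Win rate"]/following-sibling::dd[1]/div');
if ($nodes->length > 0) {
    echo $nodes->item(0)->nodeValue; // "50%"
}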
Edit: I see the point #pebbl has made about DOMDocument being slower. Indeed it is; however, parsing HTML with preg_match is asking for trouble. In that case, I'd also recommend looking at an event-driven SAX XML parser. It is much more lightweight, faster and less memory-intensive, as it does not build a tree. You may take a look at XML_HTMLSax for such a parser.
There are basically three main things you can do to improve the speed of your code:
Offload the external page load to another time (i.e. use cron)
On a Linux-based server I would know what to suggest, but seeing as you use Windows I'm not sure what the equivalent would be; cron on Linux allows you to fire off scripts at scheduled time offsets, in the background, so not using a browser. Basically, I would recommend that you create a script whose sole purpose is to go and fetch the website pages at a particular time offset (depending on how frequently you need to update your data) and then write those web pages to files on your local system.
$listOfSites = array(
'http://www.something.com/page.htm',
'http://www.something-else.co.uk/index.php',
);
$dirToContainSites = getcwd() . '/sites';
/// make sure the target folder exists before writing to it
if ( !is_dir($dirToContainSites) ) {
    mkdir($dirToContainSites, 0755, true);
}
foreach ( $listOfSites as $site ) {
    $content = file_get_contents( $site );
    /// i've just simply converted the URL into a filename here, there are
    /// better ways of handling this, but this at least keeps things simple.
    /// the following just converts any non letter or non number into an
    /// underscore... so, http___www_something_com_page_htm
    $file_name = preg_replace('/[^a-z0-9]/i','_', $site);
    file_put_contents( $dirToContainSites . '/' . $file_name, $content );
}
Once you've created this script, you then need to set the server up to execute it as regularly as you need. Then you can modify your front-end script that displays the stats to read from local files; this will give a significant speed increase.
You can find out how to read files from a directory here:
http://uk.php.net/manual/en/function.dir.php
Or the simpler method (but prone to possible problems) is just to re-step your array of sites, convert the URLs to file names using the preg_replace above, and then check for the file's existence in the folder.
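A minimal sketch of that simpler method, reusing the same filename conversion as above:
foreach ( $listOfSites as $site ) {
    $file_name = preg_replace('/[^a-z0-9]/i','_', $site);
    $path = $dirToContainSites . '/' . $file_name;
    if ( file_exists($path) ) {
        $html = file_get_contents($path); /// parse this instead of hitting the live site
    }
}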
Cache the result of calculating your statistics
It's quite likely, this being a stats page, that you'll want to visit it quite frequently (not as frequently as a public page, but still). If the same page is visited more often than the cron-based script is executed, then there is no reason to do all the calculation again. So basically, all you have to do to cache your output is something similar to the following:
$cachedVersion = getcwd() . '/cached/stats.html';
/// check to see if there is a cached version of this page
if ( file_exists($cachedVersion) ) {
    /// if so, load it and echo it to the browser
    echo file_get_contents($cachedVersion);
}
else {
    /// start output buffering so we can catch what we send to the browser
    ob_start();
    /// DO YOUR STATS CALCULATION HERE AND ECHO IT TO THE BROWSER LIKE NORMAL
    /// end output buffering and grab the contents so we now have a string
    /// of the page we've just generated
    $content = ob_get_contents();
    ob_end_clean();
    /// write the content to the cached file for next time
    file_put_contents($cachedVersion, $content);
    echo $content;
}
Once you start caching things you need to be aware of when you should delete or clear your cache, otherwise your stats output will never change. With regards to this situation, the best time to clear your cache is at the point you go and fetch the external web pages again. So you should add these lines to the bottom of your "cron" script.
$cachedVersion = getcwd() . '/cached/stats.html';
unlink( $cachedVersion ); /// will delete the file
There are other speed improvements you could make to the caching system (you could even record the modified times of the external webpages and load only when they have been updated) but I've tried to keep things easy to explain.
Don't use an HTML Parser for this situation
Scanning an HTML file for one particular unique value does not require the use of a full-blown or even lightweight HTML parser. Using RegExp incorrectly seems to be one of those things that lots of start-up programmers fall into, and is a question that is always asked. This has led to lots of automatic knee-jerk reactions from more experienced coders to automatically adhere to the following logic:
if ( $askedAboutUsingRegExpForHTML ) {
    $automatically->orderTheSillyPersonToUse( $HTMLParser );
} else {
    $soundAdvice = $think->about( $theSituation );
    print $soundAdvice;
}
HTML parsers should be used when the target within the markup is not so unique, or when your pattern to match relies on such flimsy rules that it'll break the second an extra tag or character occurs. They should be used to make your code more reliable, not to speed things up. Even parsers that do not build a tree of all the elements will still be using some form of string searching or regular expression notation, so unless the library code you are using has been compiled in an extremely optimised manner, it will not beat well-coded strpos/preg_match logic.
Considering I have not seen the HTML you are hoping to parse, I could be way off, but from what I've seen of your snippet it should be quite easy to find the value using a combination of strpos and preg_match. Obviously if your HTML is more complex and might have random multiple occurrences of <dt>Win rate</dt><dd><div>50%</div></dd> it will cause problems - but even so - an HTML parser would still have the same problem.
/// $html is assumed to already contain the fetched page, e.g. from
/// file_get_contents() or from one of the cached files created above
$offset = 0;
/// loop through the occurrences of 'Win rate'
while ( ($p = stripos($html, 'win rate', $offset)) !== FALSE ) {
    /// grab out a snippet of the surrounding HTML to speed up the RegExp
    /// (note the third argument of substr() is a length, not an end position)
    $snippet = substr($html, $p, 50);
    /// I've extended your RegExp to try and account for 'white space' that could
    /// occur around the elements. The following won't take into account any random
    /// attributes that may appear, so if you find some pages aren't working - echo
    /// out the $snippet var using something like "echo '<xmp>'.$snippet.'</xmp>';"
    /// and that should show you what is appearing that is breaking the RegExp.
    if ( preg_match('#^win\s+rate\s*</dt>\s*<dd>\s*<div>\s*([0-9]+%)\s*<#i', $snippet, $regs) ) {
        /// once you are here your % value will be in $regs[1];
        break; /// exit the while loop as we have found our 'Win rate'
    }
    /// move the offset past this occurrence so the next loop doesn't re-find it
    $offset = $p + 1;
}
Gotchas to be aware of
If you are new to PHP, as you state in a comment above, then the above may seem rather complicated - which it is. What you are trying to do is quite complex, especially if you want to do it optimally and fast. However, if you follow through the code I've given and research any bits that you aren't sure of / haven't heard of (php.net is your friend), it should give you a better understanding of a good way to achieve what you are doing.
Guessing ahead however, here are some of the problems you might face with the above:
File permission errors - in order to be able to read and write files to and from the local operating system you will need to have the correct permissions to do so. If you find you cannot write files to a particular directory, it might be that the host you are using won't allow you to do so. If this is the case you can either contact them to ask about how to get write permission to a folder, or if that isn't possible you can easily change the code above to use a database instead.
I can't see my content - when using output buffering, all the echo and print commands do not get sent to the browser; they instead get saved up in memory. PHP should automatically output all the stored content when the script exits, but if you use a command like ob_end_clean() this actually wipes the 'buffer', so all the content is erased. This can lead to confusing situations when you know you are echoing something... but it just isn't appearing.
(Mini Disclaimer :) I've typed all the above manually so you may find there are PHP errors, if so, and they are baffling, just write them back here and StackOverflow can help you out)
Instead of trying to avoid preg_match, why not just trim your document contents down in size? For example, you could dump everything before <body and everything after </body>; then preg_match will be searching less content already.
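A rough sketch of that trimming idea (assuming the page always contains a body tag):
$start = stripos($content, '<body');   // start of the body tag (it may have attributes)
$end   = stripos($content, '</body>'); // end of the body
if ($start !== false && $end !== false) {
    $content = substr($content, $start, $end - $start); // keep only the body content
}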
Also, you could try to run each of these fetches as a pseudo-separate process, so they aren't happening one at a time.
I have some JSON-like data that I got from crawling a URL:
[[["oppl.lr",[,,,,,,,,,,,[[[[[,"A Google User"]
,,,,1]
,3000,,,"Double tree was ok, it wasnt super fancy or anything. Its good for families and just relaxing by the pool. Service was good, and rooms were kept neat.","a year ago",["http://www.ma..",,1],,"","",""]
]
,["Rooms","Service","Location","Value"]
,[]
This is impossible to parse using PHP's json_decode() function. Is there any library or something that will allow me to convert this to regular JSON so that my task will be easier? Otherwise, I know I will have to write a regular expression.
Thanks in advance.
Based on your comment. In case your data is
[["oppl.lr",[,,,,,,,,,,,[[,"A Google User"],"",""],""]]]
If you can somehow send the data to the client side, then it is a valid JavaScript array. You can either process the data client-side or use
JSON.stringify([["oppl.lr",[,,,,,,,,,,,[[,"A Google User"],"",""],""]]]);
and send the data back to the server as
"[["oppl.lr",[null,null,null,null,null,null,null,null,null,null,null,[[null,"A Google User"],"",""],""]]]"
Otherwise, via PHP, you can use this function:
function getValidArray($input) {
    // run the replacement twice so that long runs of consecutive commas
    // all end up with an empty string between them
    $input = str_replace(",,", ',"",', $input);
    $input = str_replace(",,", ',"",', $input);
    // handle an empty slot right after an opening bracket
    $input = str_replace("[,", '["",', $input);
    // note: eval() on crawled data is risky; only use it on input you trust
    return eval("return $input;");
}
You can optimize the above function as per the need.
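If you would rather avoid eval() entirely, a similar normalisation followed by json_decode() should work on data shaped like the example above (a sketch with a hypothetical helper name, not tested against every variant of this output):
function decodeSparseArray($input) {
    // fill the empty slots with null so the string becomes valid JSON
    $input = str_replace(",,", ",null,", $input);
    $input = str_replace(",,", ",null,", $input); // second pass for long runs of commas
    $input = str_replace("[,", "[null,", $input);
    return json_decode($input, true); // decode into a PHP array
}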
I have been reading up on email obfuscation.
I found an interesting post entitled Best Method for Email Obfuscation? - by Jeff Starr, where he describes various tests performed over 1.5 years by Silvan Mühlemann.
According to this study, CSS obfuscation was 100% effective throughout the 1.5-year test, despite its various downsides.
Seeing as I was playing around with this method of obfuscation before, I decided to give it another go, with the addition of a PHP function that I came across.
Here is the function:
// Converts email and tel into html special characters
function convert_email_adr($email)
{
    $pieces = str_split(trim($email));
    $new_mail = '';
    foreach ($pieces as $val)
    {
        $new_mail .= '&#'.ord($val).';';
    }
    return $new_mail;
}
And here is the PHP using that function.
$lstEmail = convert_email_adr("{$row['email']}");
This does exactly as described, and I would assume that this would work out quite well, assuming the harvesters have not written code that identifies the string of special chars and decodes them.
So I decided: what if I combined these two methods? That is, I break the string into special chars, then use strrev() on it, then use CSS to reverse the string... Simple...
Here is the added piece of PHP that reverses the actual string as seen in the page source:
$lstEmail = strrev($lstEmail);
and the css to reverse it again on the client side:
span.obfuscate { unicode-bidi:bidi-override; direction: rtl; }
And the html:
<p><span class='listHeadings'>eMail:</span> <span class='obfuscate' style='font-size:0.6em;'><a href='mailto: $lstEmail?subject=Testing 123'>$lstEmail</a></span></p>
But the problem is that the string is now in reverse and will not be validated... Here is an example:
;901#&;111#&;99#&;64#&;801#&;501#&;79#&;901#&;301#&;46#&;411#&;101#&;001#&;011#&;111#&;611#&;011#&;79#&;811#&;301#&;501#&;79#&;411#&;99#&
What happens is that the special characters are not decoded into actual characters, so all you see is the string of special characters in reverse.
There is also the downside described by Jeff Starr: you cannot use the CSS method inside mailto, as you cannot use the span tag within the href attribute.
So now I am truly at odds over how to go about this task. I guess I might be able to live with forcing people to type my email address themselves if they would like to mail me... but, on the other hand, I am not so sure about that.
Then there comes the task of validating special characters in reverse...
Would anyone be able to provide me with any kind of input or support in this regard? Also, any suggestions for different, LEGITIMATE ways of going about this task would be greatly appreciated!!
I say legitimate because I plan to use these functions in one of my live projects, a business listing website (currently using the PHP function above)... The last thing I want to do is start playing around, open a hole, and leak a bunch of info to the spammers! I think that would be very bad for business...
As a webmaster, I always put my email in plain text on the contact page. It's the most convenient solution for visitors, and it works regardless of whether CSS or JS is supported.
I have done this with several addresses for 10 years... yes, I get some spam, but not that much, about 3-5 a day. I've got a good spam filter, and I look over the spam once a week and delete it.
I do not use mailto because a lot of people do not have a local email program configured and do not know what to do with the popup when clicking the mailto link.
Just reverse it before you obfuscate it...
$email = 'blah#whatever.co.uk';
$new = convert_email_adr($email);
echo '<span style="unicode-bidi:bidi-override; direction: rtl;">'.$new.'</span>';
function convert_email_adr($email, $reverse = true, $obfuscate = true)
{
    $email = trim($email);
    if($reverse)
    {
        $email = strrev($email);
    }
    if($obfuscate)
    {
        $pieces = str_split($email);
        $email = '';
        foreach($pieces as $piece)
        {
            $email .= '&#'.ord($piece).';';
        }
    }
    return $email;
}
Why don't you use it this way?
function convert_email_adr($email)
{
    $pieces = str_split(strrev(trim($email)));
    $new_mail = '';
    foreach ($pieces as $val)
    {
        $new_mail .= '&#'.ord($val).';';
    }
    return $new_mail;
}
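A quick usage example, combining it with the obfuscate class from the question (the markup here is only illustrative):
$lstEmail = convert_email_adr($row['email']);
echo "<span class='obfuscate'>$lstEmail</span>"; // the bidi-override CSS flips it back for the reader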
Generally, a good solution to this is to provide a layer of abstraction around the email address entirely, by which I mean providing a contact form instead of just the email address. Visitors fill in their info and submit it, and your server sends the information along to the proper email address.
That's not an especially scalable approach, though; it's generally applicable to a single "contact me" situation, not a "here are our listings of companies to contact" situation, in which case obfuscation runs directly counter to your goal of making sure customers can contact the targets with as much ease as possible. In that case you generally want to go with good spam protection.