Regex to select div from source - php

A client of mine has asked for me to create a simple site that monitors files on another site. He needs to monitor the file names (unsure why?) and have them outputted to a file.
Here's the example source; http://pastebin.com/tyLUmCJr
I don't speak Russian, so I'm unaware of what the site's about. I apologize if it's anything that's 'less-than-suitable'.
Anyway, if you scroll to line 117, you will see a file name. I need to get all of the file names.
I've played around with the DOMDocument and third-party tools although I believe I could use regex to increase the speed of this. If anybody could point me in the correct direction, it would be greatly appreciated.
Note: take in mind that the source is stored within a string-variable known as $content.
Cheers!

After some more detailed, extensive research, I found a way to do it. Here's how I achieved it;
<?php
require_once("phpQuery.php");
$min = isset($_GET['min']) ? $_GET['min'] : 1;
$max = isset($_GET['max']) ? $_GET['max'] : 2;
$pages = [];
foreach(range($min, $max) as $page) {
array_push($pages, iconv("CP1251", "UTF-8", file_get_contents("http://www.fayloobmennik.net/files/list/" . $page . ".html")));
}
$html = file_get_html("http://www.fayloobmennik.net/files/list/");
$elem = $html->find('div[id=info] table > tbody', 0);
$test = $elem->find('tr a');
foreach ($test as $test2) {
$regex = '/<a href=\"([^\"]*)\">(.*)<\/a>/iU';
$test2 = preg_match($regex, $test2, $match);
print_r(iconv("CP1251", "UTF-8", $match[2]));
echo "<br/>";
}
?>
The phpQuery.php class is simple_html_dom (I believe that's what it's called?).
Cheers.

Related

Simple PHP code for extracting data from the HTML source code

I know I can use xpath, but in this case it wouldn't work because of the complexity of the navigation of the site.
I can only use the source code.
I have browsed all over the place and couldn't find a simple php solution that would:
Open the HTML source code page (I already have an exact source code page URL).
Select and extract the text between two codes. Not between a div. But I know the start and end variables.
So, basically, I need to extract the text between
knownhtmlcodestart> Text to extract <knownhtmlcodeend
What I'm trying to achieve in the end is this:
Go to a source code URL.
Extract the text between two codes.
Store the data temporarily (define the time manually for how long) on my web server in a simple text file.
Define the waiting time and then repeat the whole process again.
The website that I'm going to extract data from is changing dynamically. So it would always store new data into the same file.
Then I would use that data (but that's a question for another time).
I would appreciate it if anyone could lead me to a simple solution.
Not asking to write a code, but maybe someone did anything similar and sharing the code here would be helpful.
Thanks
I (shamefully) found the following function useful to extract stuff from HTML. Regexes sometimes are too complex to extract large stuff, e.g. a whole <table>
/*
$start - string marking the start of the sequence you want to extract
$end - string marking the end of it..
$offset - starting position in case you need to find multiple occurrences
returns the string between `$start` and `$end`, and the indexes of start and end
*/
function strExt($str, $start, $end = null, $offset = 0)
{
$p1 = mb_strpos($str,$start,$offset);
if ($p1 === false) return false;
$p1 += mb_strlen($start);
$p2 = $end === null ? mb_strlen($str) : mb_strpos($str,$end, $p1+1);
return
[
'str' => mb_substr($str, $p1, $p2-$p1),
'start' => $p1,
'end' => $p2];
}
This would assume the opening and closing tag are on the same line (as in your example). If the tags can be on separate lines, it wouldn't be difficult to adapt this.
$html = file_get_contents('website.com');
$lines = explode("\n", $html);
foreach($lines as $word) {
$t1 = strpos($word, "knownhtmlcodestart");
$t2 = strpos($word, "knownhtmlcodeend");
if ($t1)
$c1 = $t1;
if ($t2)
$c2 = $t2;
if ($c1 && $c2){
$text = substring($word, $c1, $c2-$c1);
break;
}
}
echo $text;

Large text replace array

I'm looking for some help when replacing text from when i'm importing an XML file. I want to text-replace some values when importing, so it matches my categories, filter values etc. on my website.
I'm using this function. i wrote it myself with copy-pasting from internet (i'm not a coder) but now i need some help/advice.
<?php
// Text replace test function
function my_text_replace($x) {
for ($y = 0; $y < 2; $y = $y+1) {
$phrase = $x;
$old = array("Draaideurkast", "fout1 MRC", "Draaideurkast MRC", "Draaideurkast MRC");
$new = array("fout1", "fout2", "goed", "fout3");
$x = str_ireplace($old, $new, $phrase);
$y = $y+1;
return $x;
}
}
?>
Code Fix:
What happens is that i do not want a partial match replace, but only the complete value of $x. in the example the output should be 'goed'. it only should replace once when found. (but that is fixed with the for loop i think). the output should be case insensitive.
Advice question:
is this a correct way of replace (large amounts) of texts during an import? you guys know other best practises or plugins (wordpress) or tools..
Thanks for any response!
Harm

PHP encoding and obfuscation

I've been trying to understand what the code below is doing. On a fundamental level I understand what the decoding process is doing however, once arriving to the for loop, It becomes a bit confusing. Basically, I'm trying to take the below and write an appropriate encoder which will allow me to reuse the decoding routine below. I'm using this for the obfuscation of a PHP script.
Would anyone be so kind as to provide assistance in the area of taking the below and providing guidance on how I'd be able to write the reverse of this (an encoder) vs the decoding method provided?
Any assistance is greatly appreciated.
<?php
$B_yhp='bCtyDEF7u+8SszsSu+kgCO5ydU+XgC8PCVdhu1jwM7cyDU6dY0V7URbSZ/';
$L_qmfd=base64_decode($B_yhp);
for($i=0; $i<strlen($L_qmfd); $i++)
{
$L_qmfd[$i]=chr(ord($L_qmfd[$i])^((91403)%256));
}
$Laz_ep=#gzinflate(strrev($L_qmfd));
//Use $Lax_ep to generate a function IE:
//create_function('$runtime',$Laz_ep);
?>
Edited:
I've tried to create the encoder using the following but I believe I'm missing / doing something wrong. The CRC values don't end up matching and the php code doesn't successfully decode to it's original state so I know I'm clearly missing a step or overlooking something here.
<?php
$hash = hash_file('crc32b', $argv[1]);
$array = unpack('N', pack('H*', $hash));
$crc32 = $array[1];
printf("CRC: %u\n",$crc32);
$content = file_get_contents($argv[1]);
$content = strrev($content);
$content = gzdeflate($content, 9);
for($i=0; $i<strlen($content); $i++)
{
$content[$i]=chr( ord($content[$i])^((91403)%256));
}
$content=base64_encode($content);
file_put_contents($argv[1]. ".packed",$content);
?>
You are doing the gzdeflate and strrev in the wrong order. Do the gzdeflate first, then reverse the deflated $content.
<?php
$hash = hash_file('crc32b', $argv[1]);
$array = unpack('N', pack('H*', $hash));
$crc32 = $array[1];
printf("CRC: %u\n",$crc32);
$content = file_get_contents($argv[1]);
$content = gzdeflate($content, 9); // Swapped these
$content = strrev($content); // two lines
for($i=0; $i<strlen($content); $i++)
{
$content[$i]=chr( ord($content[$i])^((91403)%256));
}
$content=base64_encode($content);
file_put_contents($argv[1]. ".packed",$content);
?>

How does youtube encode their urls?

Quick Question
How does youtube encode theirs urls? take below
http://www.youtube.com/watch?v=MhWyAL2hKlk
what are they doing to get the value MhWyAL2hKlk
are they using some kind of encryption then decrypting at their end
I want to something similar with a website i am working on below looks horrible.
http://localhost:8888/example/account_player/?playlist=drum+and+bass+music
i would like to encode the urls to act like youtubes dont know how they do it tho.
Any advice
Well, technically speaking, YouTube generates video IDs by using an algorithm. Honestly, I have no idea. It could be a hashsum of the entire video file + a salt using the current UNIX time, or it could be a base64 encoding of something unique to the video. But I do know that it's most likely not random, because if it were, the risk of collision would be too high.
For the sake of example, though, we'll assume that YouTube does generate random ID's. Keep in mind that when using randomly generated values to store something, it is generally a good idea to implement collision checking to ensure that a new object doesn't overwrite the existing one. In practice, though, I would recommend using a hashing algorithm, since they are one-way and very effective at preventing collisions.
So, I'm not very familiar with PHP. I had to write it in JavaScript first. Then, I ported it to PHP, which turned out to be relatively simple:
function randch($charset){
return $charset[rand() % strlen($charset)];
}
function randstr($len, $charset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"){
$out = [];
for($i = 0; $i < $len; $i++){
array_push($out, randch($charset));
}
return join("", $out);
}
What this does is generate a random string len characters long via the given charset.
Here's some sample output:
randstr(5) -> 1EWHd
randstr(30) -> atcUVgfhAmM5bXz-3jgyRoaVnnY2jD
randstr(30, "asdfASDF") -> aFSdSAfsfSdAsSSddFFSSsdasDDaDa
Though it's not a good idea to use such a short charset.
randstr(30, "asdf")
sdadfaafsdsdfsaffsddaaafdddfad
adaaaaaafdfaadsadsdafdsfdfsadd
dfaffafaaddfdddadasaaafsfssssf
randstr(30)
r5BbvJ45HEN6dWtNZc5ZvHGLCg4Qyq
50vKb1rh66WWf9RLZQY2QrMucoNicl
Mklh3zjuRqDOnVYeEY3B0V3Moia9Dn
Now let's say you have told the page to use this function to generate a random id for a video that was just uploaded, now you want to store this key in a table with a link to the relevant data to display the right page. If an id is requested via $_GET (e.g. /watch?v=02R0-1PWdEf), you can tell the page to check this key against the database containing the video ids, and if it finds a match, grab the data from that key, else give a 404.
You can also encode directly to a base 64 string if you don't want it to be random. This can be done with base64_encode() and base64_decode(). For example, say you have the data for the video in one string $str="filename=apples.avi;owner=coolpixlol124", for whatever reason. base64_encode($str) will give you ZmlsZW5hbWU9YXBwbGVzLmF2aTtvd25lcj1jb29scGl4bG9sMTI0.
To decode it later use base64_decode($new_str), which will give back the original string.
Though, as I said before, it's probably a better idea to use a hashing algorithm like SHA.
I hope this helped.
EDIT: I forgot to mention, YouTube's video ids as of now are 11 characters long, so if you want to use the same kind of thing, you would want to use randstr(11) to generate an 11 digit random string, like this sample id I got: 6AMx8N5r6cg
EDIT 2 (2015.12.17): Completely re-wrote answer. Original was crap, I don't know what I was thinking when I wrote it.
Your question is similar to this other SO question which contains some optimised generator functions along with a clear description of the problem you're trying to solve:
php - help improve the efficiency of this youtube style url generator
It will provide you with code, a better understanding of performance issues, and a better understanding of the problem domain all at once.
Dunno how exactly google generates their strings, but the idea is really simple. Create a table like:
+----------+------------------------------+
| code | url |
+----------+------------------------------+
| asdlkasd | playlist=drum+and+bass+music |
+----------+------------------------------+
Now, create your url like:
http://localhost:8888/example/account_player/asdlkasd
After that, just read compare your own made code with the database url and load your image, video or whatever you intend to.
PS: This is just a fast example. It can be done in many other ways also of course.
If you don't want to use decimal numbers, you can encode them into base36:
echo base_convert(123456789, 10, 36); // => "21i3v9"
And decode back:
echo base_convert("21i3v9", 36, 10); // => "123456789"
function alphaID($in, $to_num = false, $pad_up = false, $pass_key = null)
{
$out = '';
$index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$base = strlen($index);
if ($pass_key !== null) {
for ($n = 0; $n < strlen($index); $n++) {
$i[] = substr($index, $n, 1);
}
$pass_hash = hash('sha256',$pass_key);
$pass_hash = (strlen($pass_hash) < strlen($index) ? hash('sha512', $pass_key) : $pass_hash);
for ($n = 0; $n < strlen($index); $n++) {
$p[] = substr($pass_hash, $n, 1);
}
array_multisort($p, SORT_DESC, $i);
$index = implode($i);
}
if ($to_num) {
// Digital number <<-- alphabet letter code
$len = strlen($in) - 1;
for ($t = $len; $t >= 0; $t--) {
$bcp = bcpow($base, $len - $t);
$out = $out + strpos($index, substr($in, $t, 1)) * $bcp;
}
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$out -= pow($base, $pad_up);
}
}
} else {
// Digital number -->> alphabet letter code
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$in += pow($base, $pad_up);
}
}
for ($t = ($in != 0 ? floor(log($in, $base)) : 0); $t >= 0; $t--) {
$bcp = bcpow($base, $t);
$a = floor($in / $bcp) % $base;
$out = $out . substr($index, $a, 1);
$in = $in - ($a * $bcp);
}
}
return $out;
}
?>
you can encypt or decrypt using this function.
<?php
$random_id=57256;
$encode=alphaID($random_id);
$decode=alphaID($encode,true); //where boolean true reverse the string back to original
echo "Encode : {$encode} <br> Decode : {$decode}";
?>
Just visit the below for more info :
http://kvz.io/blog/2009/06/10/create-short-ids-with-php-like-youtube-or-tinyurl/
Just use an auto-increment ID value (from a database). Although I personally like the long URLs.

Clear PHP CLI output

I'm trying to get a "live" progress indicator working on my php CLI app. Rather than outputting as
1Done
2Done
3Done
I would rather it cleared and just showed the latest result. system("command \C CLS") doesnt work. Nor does ob_flush(), flush() or anything else that I've found.
I'm running windows 7 64 bit ultimate, I noticed the command line outputs in real time, which was unexpected. Everyone warned me that out wouldn't... but it does... a 64 bit perk?
Cheers for the help!
I want to avoid echoing 24 new lines if I can.
Try outputting a line of text and terminating it with "\r" instead of "\n".
The "\n" character is a line-feed which goes to the next line, but "\r" is just a return that sends the cursor back to position 0 on the same line.
So you can:
echo "1Done\r";
echo "2Done\r";
echo "3Done\r";
etc.
Make sure to output some spaces before the "\r" to clear the previous contents of the line.
[Edit] Optional: Interested in some history & background? Wikipedia has good articles on "\n" (line feed) and "\r" (carriage return)
I came across this while searching for a multi line solution to this problem. This is what I eventually came up with. You can use Ansi Escape commands. http://www.inwap.com/pdp10/ansicode.txt
<?php
function replaceOut($str)
{
$numNewLines = substr_count($str, "\n");
echo chr(27) . "[0G"; // Set cursor to first column
echo $str;
echo chr(27) . "[" . $numNewLines ."A"; // Set cursor up x lines
}
while (true) {
replaceOut("First Ln\nTime: " . time() . "\nThird Ln");
sleep(1);
}
?>
I recently wrote a function that will also keep track of the number of lines it last output, so you can feed it arbitrary string lengths, with newlines, and it will replace the last output with the current one.
With an array of strings:
$lines = array(
'This is a pretty short line',
'This line is slightly longer because it has more characters (i suck at lorem)',
'This line is really long, but I an not going to type, I am just going to hit the keyboard... LJK gkjg gyu g uyguyg G jk GJHG jh gljg ljgLJg lgJLG ljgjlgLK Gljgljgljg lgLKJgkglkg lHGL KgglhG jh',
"This line has newline characters\nAnd because of that\nWill span multiple lines without being too long",
"one\nmore\nwith\nnewlines",
'This line is really long, but I an not going to type, I am just going to hit the keyboard... LJK gkjg gyu g uyguyg G jk GJHG jh gljg ljgLJg lgJLG ljgjlgLK Gljgljgljg lgLKJgkglkg lHGL KgglhG jh',
"This line has newline characters\nAnd because of that\nWill span multiple lines without being too long",
'This is a pretty short line',
);
One can use the following function:
function replaceable_echo($message, $force_clear_lines = NULL) {
static $last_lines = 0;
if(!is_null($force_clear_lines)) {
$last_lines = $force_clear_lines;
}
$term_width = exec('tput cols', $toss, $status);
if($status) {
$term_width = 64; // Arbitrary fall-back term width.
}
$line_count = 0;
foreach(explode("\n", $message) as $line) {
$line_count += count(str_split($line, $term_width));
}
// Erasure MAGIC: Clear as many lines as the last output had.
for($i = 0; $i < $last_lines; $i++) {
// Return to the beginning of the line
echo "\r";
// Erase to the end of the line
echo "\033[K";
// Move cursor Up a line
echo "\033[1A";
// Return to the beginning of the line
echo "\r";
// Erase to the end of the line
echo "\033[K";
// Return to the beginning of the line
echo "\r";
// Can be consolodated into
// echo "\r\033[K\033[1A\r\033[K\r";
}
$last_lines = $line_count;
echo $message."\n";
}
In a loop:
foreach($lines as $line) {
replaceable_echo($line);
sleep(1);
}
And all lines replace each other.
The name of the function could use some work, just whipped it up, but the idea is sound. Feed it an (int) as the second param and it will replace that many lines above instead. This would be useful if you were printing after other output, and you didn't want to replace the wrong number of lines (or any, give it 0).
Dunno, seemed like a good solution to me.
I make sure to echo the ending newline so that it allows the user to still use echo/print_r without killing the line (use the override to not delete such outputs), and the command prompt will come back in the correct place.
i know the question isn't strictly about how to clear a SINGLE LINE in PHP, but this is the top google result for "clear line cli php", so here is how to clear a single line:
function clearLine()
{
echo "\033[2K\r";
}
function clearTerminal () {
DIRECTORY_SEPARATOR === '\\' ? popen('cls', 'w') : exec('clear');
}
Tested on Win 7 PHP 7. Solution for Linux should work, according to other users reports.
something like this :
for ($i = 0; $i <= 100; $i++) {
echo "Loading... {$i}%\r";
usleep(10000);
}
Use this command for clear cli:
echo chr(27).chr(91).'H'.chr(27).chr(91).'J'; //^[H^[J
Console functions are platform dependent and as such PHP has no built-in functions to deal with this. system and other similar functions won't work in this case because PHP captures the output of these programs and prints/returns them. What PHP prints goes to standard output and not directly to the console, so "printing" the output of cls won't work.
<?php
error_reporting(E_ERROR | E_WARNING | E_PARSE);
function bufferout($newline, $buffer=null){
$count = strlen(rtrim($buffer));
$buffer = $newline;
if(($whilespace = $count-strlen($buffer))>=1){
$buffer .= str_repeat(" ", $whilespace);
}
return $buffer."\r";
};
$start = "abcdefghijklmnopqrstuvwxyz0123456789";
$i = strlen($start);
while ($i >= 0){
$new = substr($start, 0, $i);
if($old){
echo $old = bufferout($new, $old);
}else{
echo $old = bufferout($new);
}
sleep(1);
$i--;
}
?>
A simple implementation of #dkamins answer. It works well. It's a bit- hack-ish. But does the job. Wont work across multiple lines.
function (int $count = 1) {
foreach (range(1,$count) as $value){
echo "\r\x1b[K"; // remove this line
echo "\033[1A\033[K"; // cursor back
}
}
See the full example here
Unfortunately, PHP 8.0.2 does not has a function to do it. However, if you just want to clear console try this: print("\033[2J\033[;H"); or use : proc_open('cls', 'w');
It works in php 8.0.2 and windows 10. It is the same that system('cls') using c language programing.
Tried some of solutions from answers:
<?php
...
$messages = [
'11111',
'2222',
'333',
'44',
'5',
];
$endlines = [
"\r",
"\033[2K\r",
"\r\033[K\033[1A\r\033[K\r",
chr(27).chr(91).'H'.chr(27).chr(91).'J',
];
foreach ($endlines as $i=>$end) {
foreach ($messages as $msg) {
output()->write("$i. ");
output()->write($msg);
sleep(1);
output()->write($end);
}
}
And \033[2K\r seems like works correct.

Categories