PHP: Get specific content of a website


I want to get specific content of a website into an array.
I have approx. 20 sites to fetch content from and then output in whatever way I like. Only the port changes (it is not always 27015; it may be 27016 and so on).
This is just one: SOURCE-URL of Content
For now, I use this PHP code to fetch the game icon "cs.png", but the icon name varies in length, so a fixed-length substr() isn't the best way, is it? :-/
$srvip = '148.251.78.214';
$srvlist = array('27015');
foreach ($srvlist as $srvport) {
$source = file_get_contents('http://www.gametracker.com/server_info/'.$srvip.':'.$srvport.'/');
$content = array(
"icon" => substr($source, strpos($source, 'game_icons64')+13, 6),
);
echo $content[icon];
}
Thanks for helping, it's been a while since my last PHP work :P

You just need to look for the first " that comes after the game_icons64 and read up to there.
$srvip = '148.251.78.214';
$srvlist = array('27015');
foreach ($srvlist as $srvport) {
$source = file_get_contents('http://www.gametracker.com/server_info/'.$srvip.':'.$srvport.'/');
// find the position right after game_icons64/
$first_occurance = strpos($source, 'game_icons64')+13;
// find the first occurrence of " after game_icons64, where the src attribute of the img ends
$second_occurance = strpos($source, '"', $first_occurance);
$content = array(
// take a substring starting at the end of game_icons64/ and ending just before the src attribute ends
"icon" => substr($source, $first_occurance, $second_occurance-$first_occurance),
);
echo $content['icon'];
}
Also, you had an error because you used [icon] and not ['icon']. Without the quotes, PHP treats icon as a constant, which is an error in PHP 8.
Edit to match the second request involving multiple strings:
$srvip = '148.251.78.214';
$srvlist = array('27015');
$content_strings = array( );
// the first 2 items are the string you are looking for in your first occurrence and how many chars to skip from that position
// the third is what should be the first char after the string you are looking for, so the first char that will not be copied
// the last item is how you want your array / program to register the string you are reading
$content_strings[] = array('game_icons64', 13, '"', 'icon');
// to add more items to your search, just copy paste the line above and change whatever you need from it
foreach ($srvlist as $srvport) {
$source = file_get_contents('http://www.gametracker.com/server_info/'.$srvip.':'.$srvport.'/');
$content = array();
foreach($content_strings as $k=>$v)
{
$first_occurance = strpos($source, $v[0])+$v[1];
$second_occurance = strpos($source, $v[2], $first_occurance);
$content[$v[3]] = substr($source, $first_occurance, $second_occurance-$first_occurance);
}
print_r($content);
}
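One thing the snippets above don't cover: strpos() returns false when the marker isn't on the page, and false + 13 quietly becomes 13. A minimal sketch of a helper that returns null in that case instead. The name extract_between() and the sample markup are made up for illustration; they are not the real gametracker HTML.

```php
<?php
// Hypothetical helper: returns the text that starts $skip characters after the
// beginning of $marker and runs up to the next $terminator, or null when the
// marker or the terminator cannot be found.
function extract_between(string $source, string $marker, int $skip, string $terminator): ?string
{
    $pos = strpos($source, $marker);
    if ($pos === false) {
        return null; // marker is not in the page
    }
    $start = $pos + $skip;
    if ($start > strlen($source)) {
        return null; // skipping would run past the end of the page
    }
    $end = strpos($source, $terminator, $start);
    if ($end === false) {
        return null; // terminator never appears after the marker
    }
    return substr($source, $start, $end - $start);
}

// Sample markup invented for this example.
$html = '<img src="/images/game_icons64/cs.png" alt="">';
echo extract_between($html, 'game_icons64', 13, '"'), "\n";
```

Inside the inner foreach this could replace the three strpos/substr lines, e.g. $content[$v[3]] = extract_between($source, $v[0], $v[1], $v[2]);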

Related

How to get the string position of the middle-most element within HTML content?

I am working with news articles in HTML format that come from a WYSIWYG editor, and I need to find the middle of each one in a visual/HTML context, meaning an empty place in between two root elements. It is as if you wanted to split the article into two pages, with an equal number of paragraphs on each when possible.
All root elements seem to come out as paragraphs, which was easy enough to count, a simple
$p_count = substr_count($article_text, '<p');
returns the total number of opening paragraph tags, and then I can look for the strpos of the ($p_count/2)-th occurrence of a paragraph.
But the problem is embedded tweets, which contain paragraphs that sometimes appear under blockquote > p and other times under center > blockquote > p.
So I turned to DOMDocument. This little snippet gives me the index n of the middle element (even if the elements are divs and not paragraphs, which is cool):
$dom = new DOMDocument();
$dom->loadHTML($article_text);
$body = $dom->getElementsByTagName('body');
$rootNodes = $body->item(0)->childNodes;
$empty_nodes = 0;
foreach($rootNodes as $node) {
if($node->nodeType === XML_TEXT_NODE && strlen(trim($node->nodeValue)) === 0) {
$empty_nodes++;
}
}
$total_elements = $rootNodes->length - $empty_nodes;
$middle_element = floor($total_elements / 2);
But how do I now find the string offset of this middle element within my original HTML source, so that I can point to this middle place within the article text string? This is especially tricky because DOMDocument converts the HTML I gave it into a full HTML page (with a doctype, a head and all that), so its output HTML is bigger than my original article source.

OK, I solved it.
What I did was match all HTML tags in the article using the PREG_OFFSET_CAPTURE flag of preg_match_all, which records the character offset at which each match was found. Then I looped through them sequentially and tracked the current depth: an opening tag counts as +1 and a closing tag as -1 (self-closing tags are skipped). Every time the depth gets back to zero after a closing tag, I count one more root element as closed. If the depth is 0 at the end, I assume I counted correctly.
Now I can take the number of root elements I counted, divide it by 2 to get the middle-ish one (+-1 for odd numbers), and look up the offset of the element at that index as reported by preg_match_all.
The complete code is below, in case anyone needs to do the same thing.
It might be sped up if the is_self_closing() function extracted the tag name with a regex and then checked it with in_array() against $self_closing_tags instead of using a foreach loop, but in my case it didn't make enough of a difference for me to bother.
function calculate_middle_of_article(string $text, bool $debug=false): ?int {
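// Guard the nested definition: without it, a second call to
// calculate_middle_of_article() fails with "Cannot redeclare is_self_closing()".
if(!function_exists('is_self_closing')) {
function is_self_closing(string $input, array $self_closing_tags): bool {
foreach($self_closing_tags as $tag) {
if(substr($input, 1, strlen($tag)) === $tag) {
return true;
}
}
return false;
}
}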
$self_closing_tags = [
'!--',
'area',
'base',
'br',
'col',
'embed',
'hr',
'img',
'input',
'link',
'meta',
'param',
'source',
'track',
'wbr',
'command',
'keygen',
'menuitem',
];
$regex = '/<("[^"]*"|\'[^\']*\'|[^\'">])*>/';
preg_match_all($regex, $text, $matches, PREG_OFFSET_CAPTURE);
$debug && print count($matches[0]) . " tags found \n";
$root_elements = [];
$depth = 0;
foreach($matches[0] as $match) {
if(!is_self_closing($match[0], $self_closing_tags)) {
$depth+= (substr($match[0], 1, 1) === '/') ? -1 : 1;
}
$debug && print "level {$depth} after tag: " . htmlentities($match[0]) . "\n";
if($depth === 0) {
$root_elements[]= $match;
}
}
$ok = ($depth === 0);
$debug && print ($ok ? 'ok' : 'not ok') . "\n";
// has to end at depth zero to confirm counting is correct
if(!$ok) {
return null;
}
$debug && print count($root_elements) . " root elements\n";
$element_index_at_middle = floor(count($root_elements)/2);
$half_char = $root_elements[$element_index_at_middle][1];
$debug && print "which makes the half the {$half_char}th character at the {$element_index_at_middle}th element\n";
return $half_char;
}
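The regex plus in_array() variant of is_self_closing() mentioned above might look roughly like this. It is only a sketch: the name is_self_closing_fast() is made up, and comments are handled as a special case because "!--" is not a tag name. A side effect of comparing the whole tag name rather than a prefix is that a tag such as <colgroup> is no longer mistaken for <col>.

```php
<?php
// Hypothetical replacement for is_self_closing(): extract the tag name once
// and look it up, instead of looping over every known self-closing tag.
function is_self_closing_fast(string $tag, array $self_closing_tags): bool
{
    // HTML comments never open a nesting level.
    if (strncmp($tag, '<!--', 4) === 0) {
        return true;
    }
    // Capture the tag name right after "<"; closing tags like "</p>" do not match.
    if (!preg_match('/^<([a-z][a-z0-9]*)/i', $tag, $m)) {
        return false;
    }
    return in_array(strtolower($m[1]), $self_closing_tags, true);
}

$tags = ['br', 'col', 'img'];
var_dump(is_self_closing_fast('<br />', $tags));     // bool(true)
var_dump(is_self_closing_fast('<colgroup>', $tags)); // bool(false)
```

Whether this is actually faster than the foreach version would have to be measured on real articles.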

php wildcard characters in filename

In a PHP while loop, I have the following:
$sql = mysqli_query($conn,"SELECT * FROM stock ORDER BY partnumber");
$productCount = mysqli_num_rows($sql);
if ($productCount > 0) {
while($row = mysqli_fetch_array($sql)){
$id = $row["id"];
$partnumber = $row["partnumber"];
$description = $row["description"];
$price = $row["listprice"];
$availability = $row["availability"];
$image = "path/$partnumber.jpg";
if(!file_exists($image)) { //substitute image if one does not exist
$image = "path/no-image.jpg";
}
When displaying the images in a table, I have:
<img src="'.$image.'" alt="'.$partnumber.'" />
It works ok with exact matches.
The problem is, many of the images contain lower case x in the filename as a character wildcard.
How can I get these image filenames containing x to display with the partnumbers containing explicit characters in their filenames?
Example filename: AB2Dxxx.jpg
Example partnumber: AB2DTUV

I hope this will help - I've had to adapt it to run locally, but it uses glob and assumes that the wildcards are all at the end of the string, so the filename can step back until a match is found.
The execution path has a folder called images, which will contain our theoretical part images. In lieu of a database, I've defined a simple key/value array containing my test results (it looks like you only rely on the partnumber, so that's all I've defined...).
The images directory contains two files: ABCDEF.jpg and ABCxxx.jpg.
$parts = array( array( "partnumber"=> "ABCDEF" ),
array( "partnumber"=> "ABDCEF" ),
array( "partnumber"=> "ABCDDD" ) );
foreach($parts as $part) {
$partnumber = $part["partnumber"];
$next_pn = $partnumber;
$pn_len = strlen($partnumber);
echo $partnumber." is the part number<br>-------------<br>";
while(strlen($next_pn) > 0 )
{
$image = null;
foreach (glob("images/".$partnumber."*.jpg", GLOB_NOSORT) as $filename) {
$image = $filename;
echo $filename . " matches " . $partnumber."<br>";
}
if($image) {
echo "<br>";
break;
}
$next_pn = substr($next_pn, 0, -1);
$partnumber = str_pad($next_pn, $pn_len, "x");
}
}
This outputs
ABCDEF is the part number
-------------
images/ABCDEF.jpg matches ABCDEF
ABDCEF is the part number
-------------
ABCDDD is the part number
-------------
images/ABCxxx.jpg matches ABCxxx
As you can see, it will basically take a filename and check for a match - if none is found, it takes off a character and replaces it with an x, then checks again. In this case:
ABCDEF matches ABCDEF.jpg instantly;
ABDCEF doesn't match any of the files, and;
ABCDDD matches ABCxxx.jpg on the fourth loop (first checks for ABCDDD, second ABCDDx, third ABCDxx, fourth matches on ABCxxx).
It isn't perfect and might need a bit more work, but it seems like it would do what you're looking to do.
P.S. glob has a lot of options - I've used NOSORT as a flag which tends to help it run a bit quicker when there are lots of files to hunt through.
http://php.net/manual/en/function.glob.php
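If the x wildcards can also sit in the middle of a filename, a different approach is to go the other way round: list the images once and test each filename against the part number, treating every lowercase x as "any single character". This is a sketch of that alternative, not the method above. The function names are made up, and it assumes real part numbers never contain a literal lowercase x.

```php
<?php
// Hypothetical helper: does $filename_stem (e.g. "AB2Dxxx") match $partnumber
// when each lowercase "x" stands for exactly one arbitrary character?
function stem_matches_part(string $filename_stem, string $partnumber): bool
{
    // Escape regex metacharacters first, then turn every "x" into ".".
    $pattern = str_replace('x', '.', preg_quote($filename_stem, '/'));
    return preg_match('/^' . $pattern . '$/', $partnumber) === 1;
}

// Returns the first matching .jpg in $dir, or null when nothing matches.
function find_part_image(string $dir, string $partnumber): ?string
{
    foreach (glob($dir . '/*.jpg') ?: [] as $path) {
        if (stem_matches_part(basename($path, '.jpg'), $partnumber)) {
            return $path;
        }
    }
    return null;
}

var_dump(stem_matches_part('AB2Dxxx', 'AB2DTUV')); // bool(true)
```

In the original loop this would be used as $image = find_part_image('path', $partnumber) ?? 'path/no-image.jpg';. If an exact file and a wildcard file can both match, checking file_exists() for the exact name first keeps the exact one preferred.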

how to partially mask/hide email address using PHP

I'm trying to achieve the following with PHP:
sample#gmail.com => s*****#gmail.com
sa#yahoo.com => **#yahoo.com
sampleaddress#hotmail.com => samplead*****#hotmail.com
I want to hide the last five characters of the portion before the '#'.
I can write long code to do this by splitting and then replacing based on lengths, but I'm sure there must be an easy way to do this using PHP functions. Any help, please?
UPDATE:
I'm adding my code here. I'm sure it's not efficient, and that's the reason I'm asking here.
$email = 'sampleuser#gmail.com';
$star_string = '';
$expl_set = explode('#',$email);
if(strlen ($expl_set[0]) > 5){$no_stars = 5; }else{$no_stars = strlen ($expl_set[0]); }
for($i=0;$i<$no_stars; $i++)
{
$star_string.='*';
}
$masked_email = substr($expl_set[0], 0, -5).$star_string.'#'.$expl_set[1];

You can wrap it into a function, making it easier to call multiple times.
Basically, split the address from the domain, then replace the last $masks characters of the address (default 5) with *. If the address is shorter than $masks, the whole address is masked.
function mask_email($email, $masks = 5) {
$array = explode("#", $email);
$string_length = strlen($array[0]);
if ($string_length < $masks)
$masks = $string_length;
$result = substr($array[0], 0, -$masks) . str_repeat('*', $masks);
return $result."#".$array[1];
}
The above would be used like this
echo mask_email("test#test.com")."\n";
echo mask_email("longeremail#test.com");
which would output this
****#test.com
longer*****#test.com
You can also specify the number you want filtered by using the second parameter, which is optional.
echo mask_email("longeremail#test.com", 2); // Output: longerema**#test.com

Php loop and count through txt file

I've got a bit of a complex problem. At work we have to count our inventory every month. This is done with a scanner. At each location there can be up to 100 different items. Every item, even those of the same kind, has to be scanned. When each location has been scanned, we print out the list of scanned items. The problem is that each scan has its own line in the txt file (it does not add/subtract multiple counts of the same item).
As the vendor of our system is notoriously slow implementing new functions I thought about a php script that does the following:
1: read every line from the txt file
2: add/subtract the count of the same item
3: print out a list with the item number and count.
The txt file is as following:
01234+000001N
The first 5 digits are the item number. Because it is possible to add and subtract, the next symbol is + or -, then the next 6 digits are the count, and the N is the "eol".
So somehow I have to put it all in some sort of array, sort it by item number, add/subtract the counts, and finally print out the final list.

Assuming you've loaded the file into a string with one scan per line, you can do the following (read the code comments):
$strTxtFile = <<<TXT
01234+000001N
01234+000001N
09876+000002N
01234+000001N
01234+000001N
09876+000002N
01234-000001N
09876+000002N
TXT;
/**
* 01234 should have 3 stock
* 09876 should have 6 stock
*/
$arrProducts = array();
$arrFileLines = explode(PHP_EOL, $strTxtFile);
foreach($arrFileLines as $strFileLine) {
//Split the lines by the action (+/-)
$arrStockAction = preg_split("/(\+|\-)/", $strFileLine, -1, PREG_SPLIT_DELIM_CAPTURE); //-1 means no limit; passing NULL is deprecated since PHP 8.1
$strProductCode = $arrStockAction[0]; //The first part is the product code
$strAction = $arrStockAction[1]; //The action (+/-) to the stock
$intStockAmount = (int) $arrStockAction[2]; //Cast it to an int to get the number
//Check if the product exists in our array, if not, create it with 0 stock
if( array_key_exists($strProductCode, $arrProducts) === FALSE ) {
$arrProducts[$strProductCode] = 0;
}
if($strAction === "+") {
//Add stock
$arrProducts[$strProductCode] += $intStockAmount;
} else {
//Minus stock
$arrProducts[$strProductCode] -= $intStockAmount;
}
}
print_r($arrProducts);
https://repl.it/ECrW

Similar to the other answer, maybe a little simpler:
$result = array();
foreach(file('/path/to/file.txt') as $line) {
$item = substr($line, 0, 5);
$sign = substr($line, 5, 1);
$qty = substr($line, 6, 6);
if(!isset($result[$item])) {
$result[$item] = 0;
}
// apply the sign on every line, including the first one seen for an item
$result[$item] += (int) ($sign.$qty);
}
Or replace the substr() lines with:
preg_match('/(\d{5})(.)(\d{6})/', $line, $matches);
And use $matches[1], $matches[2] and $matches[3].

I just found out I had misread the txt file. The lines are as follows:
01234000001 N
And
01234000001-N
The blank space between the last digit and the N represents addition, and the - represents subtraction.
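For that corrected layout, the counting could be sketched like this. The function name sum_inventory() is made up, and the field widths (5-digit item number, 6-digit count) are read off the two sample lines, so treat them as an assumption.

```php
<?php
// Sketch for the corrected layout: 5-digit item number, 6-digit count,
// then " " (add) or "-" (subtract), then "N". The widths are an assumption
// taken from the sample lines.
function sum_inventory(array $lines): array
{
    $totals = [];
    foreach ($lines as $line) {
        if (!preg_match('/^(\d{5})(\d{6})([ -])N/', $line, $m)) {
            continue; // skip blank or malformed lines
        }
        $qty = (int) $m[2];
        if ($m[3] === '-') {
            $qty = -$qty;
        }
        $totals[$m[1]] = ($totals[$m[1]] ?? 0) + $qty;
    }
    ksort($totals, SORT_STRING); // order the list by item number
    return $totals;
}

$lines = ['01234000001 N', '01234000001 N', '09876000002 N', '01234000001-N'];
print_r(sum_inventory($lines)); // 01234 => 1, 09876 => 2
```

With a real file, the lines could come from file('/path/to/file.txt', FILE_IGNORE_NEW_LINES).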

Split a large string into an array, but the split point cannot break a tag

I wrote a script that sends chunks of text off to Google to translate, but sometimes the text (which is HTML source code) ends up split in the middle of an HTML tag, and Google returns the code incorrectly.
I already know how to split the string into an array, but is there a better way to do this while ensuring that each output string does not exceed 5000 characters and is not split inside a tag?
UPDATE: Thanks to the answer, this is the code I ended up using in my project, and it works great:
function handleTextHtmlSplit($text, $maxSize) {
//our collection array
$niceHtml[] = '';
// Splits on tags, but also includes each tag as an item in the result
$pieces = preg_split('/(<[^>]*>)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
//the current position of the index
$currentPiece = 0;
//start assembling a group until it gets to max size
foreach ($pieces as $piece) {
//make sure string length of this piece will not exceed max size when inserted
if (strlen($niceHtml[$currentPiece] . $piece) > $maxSize) {
//advance current piece
//will put overflow into next group
$currentPiece += 1;
//create empty string as value for next piece in the index
$niceHtml[$currentPiece] = '';
}
//insert piece into our master array
$niceHtml[$currentPiece] .= $piece;
}
//return array of nicely handled html
return $niceHtml;
}

Note: haven't had a chance to test this (so there may be a minor bug or two), but it should give you an idea:
function get_groups_of_5000_or_less($input_string) {
// Splits on tags, but also includes each tag as an item in the result
$pieces = preg_split('/(<[^>]*>)/', $input_string,
-1, PREG_SPLIT_DELIM_CAPTURE);
$groups[] = '';
$current_group = 0;
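// Compare against null: a plain truthiness test would stop early at an empty piece or the string "0"
while (($cur_piece = array_shift($pieces)) !== null) {
if ($cur_piece === '') { continue; } // preg_split yields empty pieces between adjacent tags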
$piecelen = strlen($cur_piece);
if(strlen($groups[$current_group]) + $piecelen > 5000) {
// Adding the next piece whole would go over the limit,
// figure out what to do.
if($cur_piece[0] == '<') {
// Tag goes over the limit, just put it into a new group
$groups[++$current_group] = $cur_piece;
} else {
// Non-tag goes over the limit, split it and put the
// remainder back on the list of un-grabbed pieces
$grab_amount = 5000 - strlen($groups[$current_group]);
$groups[$current_group] .= substr($cur_piece, 0, $grab_amount);
$groups[++$current_group] = '';
array_unshift($pieces, substr($cur_piece, $grab_amount));
}
} else {
// Adding this piece doesn't go over the limit, so just add it
$groups[$current_group] .= $cur_piece;
}
}
return $groups;
}
Also note that this can split in the middle of regular words - if you don't want that, then modify the part that begins with // Non-tag goes over the limit to choose a better value for $grab_amount. I didn't bother coding that in since this is just supposed to be an example of how to get around splitting tags, not a drop-in solution.
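One possible way to choose a better $grab_amount is to cut at the last space that still fits. This is only a sketch under simple assumptions: the helper name word_safe_grab_amount() is made up, only the plain space character counts as a word boundary, and text with no space in range is still cut mid-word so the loop always makes progress.

```php
<?php
// Hypothetical helper: how many characters of $piece to take when only $room
// characters are left in the current group, preferring to cut after a space.
function word_safe_grab_amount(string $piece, int $room): int
{
    if ($room <= 0) {
        return 0; // group is already full
    }
    if (strlen($piece) <= $room) {
        return strlen($piece); // the whole piece fits
    }
    $head = substr($piece, 0, $room);
    $last_space = strrpos($head, ' ');
    // Cut just after the last space so the space stays with the first chunk;
    // fall back to a hard cut when there is no space in range.
    return $last_space === false ? $room : $last_space + 1;
}

var_dump(word_safe_grab_amount('hello world foo', 8)); // int(6)
```

In the loop above it would be called as $grab_amount = word_safe_grab_amount($cur_piece, 5000 - strlen($groups[$current_group]));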

Why not strip the HTML tags from the string before sending it to Google? PHP has a strip_tags() function that can do this for you.

preg_split with a good regex would do it for you.
