I'm working with a function taken from Corrupt (a web-based piece of software used to get "glitchy" effects from JPEG images). The function can be found in the corrupt.php file on line 23. At the moment it's not making the files glitchy enough. I made these images to show you how I want the results to look; they were made by opening the JPEG in a text editor, cutting certain lines, and pasting them in other places.
I want this function to do a similar thing but at the moment it doesn't. Any ideas? Is there a better way of doing this maybe?
function scramble($content, $size) {
    $sStart = 10;
    $sEnd = $size - 1;
    $nReplacements = rand(1, 30);
    for ($i = 0; $i < $nReplacements; $i++) {
        $PosA = rand($sStart, $sEnd);
        $PosB = rand($sStart, $sEnd);
        $tmp = $content[$PosA];
        $content[$PosA] = $content[$PosB];
        $content[$PosB] = $tmp;
    }
    return $content;
}
It is randomly swapping bytes around in the image data loaded from your file, which causes a valid image to come out with invalid information in some sectors. Note that image files often carry additional information (headers and metadata) at the front and end of the file; this function only skips the first ten bytes ($sStart = 10), so it can corrupt that information as well.
To increase the amount of swapping, you will want to increase the number of replacements. The bit of code you are particularly interested in is rand(1, 30); I would suggest raising the minimum amount of scrambling first, and then the upper range if you still do not get the desired effect.
The function does random swaps between the elements of the array. The number of swaps is a randomly generated number from 1 to 30.
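Since the manual technique described (cutting whole lines in a text editor and pasting them elsewhere) moves runs of bytes rather than single characters, a variant that swaps whole chunks tends to produce much stronger glitches. This is only a sketch, not Corrupt's actual code; the chunk size and the larger header offset are guesses you would tune per image:

```php
<?php
// Sketch: swap whole chunks of bytes instead of single characters.
// $chunkSize and the header offset (500) are arbitrary starting points.
function scrambleChunks($content, $size, $chunkSize = 100) {
    $sStart = 500; // skip further into the file so the JPEG header survives
    $sEnd = $size - $chunkSize - 1;
    $nReplacements = rand(5, 30);
    for ($i = 0; $i < $nReplacements; $i++) {
        $posA = rand($sStart, $sEnd);
        $posB = rand($sStart, $sEnd);
        // read both chunks first, then write them back swapped
        $chunkA = substr($content, $posA, $chunkSize);
        $chunkB = substr($content, $posB, $chunkSize);
        $content = substr_replace($content, $chunkB, $posA, $chunkSize);
        $content = substr_replace($content, $chunkA, $posB, $chunkSize);
    }
    return $content;
}
```

Because the swapped chunks have the same length, the file size is preserved; only the byte order inside the image data changes.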
I've been trying to validate over 1 million randomly generated values (strings) on an online form, using PHP and a client-side programming language, but there are a few challenges I'm facing:
PHP
Link to the (editable) PHP code: https://3v4l.org/AtTkO
The PHP code:
<?php
function generateRandomString($length = 10) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyz-_.';
    $charactersLength = strlen($characters);
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, $charactersLength - 1)];
    }
    return $randomString;
}

$unique = array();
for ($i = 0; $i < 9000000; $i++) {
    $u = $i + 1;
    $random = generateRandomString(5);
    if (!in_array($random, $unique)) {
        echo $u . ".m" . $random . "#[server]\n";
        $unique[] = $random;
        gc_collect_cycles();
    } else {
        echo "duplicate detected";
        $i--;
    }
}
echo memory_get_peak_usage();
What should happen:
New 5 character value gets randomly generated
Value gets checked if it already exists in the array
Value gets added to array
All randomly generated values are exported to a .txt file to be used for validating. (Not in the script yet)
What actually happens:
I hit either a memory usage limit or a server timeout for the execution time.
What I've tried
I've tried using sleep(3) during the for loop.
Setting Memory limit to -1 and timeout to 0. The unlimited memory doesn't make a difference and is too dangerous in a working environment.
Using gc_collect_cycles() during the for loop
Using echo memory_get_peak_usage(); -> I don't really understand how I could use this for debugging.
What I need help with:
Memory management in PHP
Having pauses in the script that will reset the PHP execution timer
Client-side programming language
This is where I have absolutely no clue which way I should go or which programming language I should use for this.
What I want to achieve
Load a webpage that has a form
Load the .txt with all randomly generated strings
fill in the form with the first string
submit the form:
If positive response from form > save string in special .txt file or array, go to the next value
If negative response from form > delete string from file, go to the next value | or just go to the next value
All values with a positive response are filtered out and easily accessible at the end.
I don't know which programming language I should use for this part. I've been thinking about JavaScript and Python, but I'm not sure how I could combine either with PHP. A nudge in the right direction would be appreciated.
I might be completely wrong for trying to achieve this with PHP, if so, please let me know what would be the better and easier option.
Thanks!
Interesting question. First of all, whenever you think of a solution like this, one of the first things to consider is: can it be async? If the answer is yes, your implementation will likely be simple; otherwise, you will likely have to pay large server costs or serve random cached results.
NB: remove gc_collect_cycles(). It does the opposite of what you want here, and you hardly ever need to call it manually.
That being said, the approach I would recommend in your case is as follows:
Use a websocket which is opened only once in the client's browser, and then forward results in real time from the server to the browser. Of course, this code can run entirely client-side via JavaScript, so if it's not just a PoC, you can convert the PHP code to JavaScript.
Change your code to yield items or forward results via websocket once a generated code has been confirmed as unique.
However, if you're really doing only what the PHP code says, you can do it all in JavaScript and save your server resources. See this answer for example code to replace your generateRandomString function.
Assuming you have the ability to edit the php.ini:
Increase your memory limit as described in the PHP documentation for the memory_limit setting, and for the execution timeout add:
set_time_limit(0);
at the top of the PHP file.
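If you can't edit php.ini (e.g. on shared hosting), the same two settings can usually be changed from the script itself, assuming the host hasn't disabled these functions:

```php
<?php
// Raise both limits at runtime; the values are examples, not recommendations.
ini_set('memory_limit', '512M'); // per-request memory cap
set_time_limit(0);               // 0 = no execution time limit
```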
Have you tried using sets? https://www.php.net/manual/en/class.ds-set.php
Sets are very efficient whenever you want to ensure a value isn't present twice.
Checking for the presence of a value in a set is far faster than looping across all entries of an array.
I'm not an expert with PHP, but it would look something like this in Ruby:
require 'set'

CHARS = '0123456789abcdefghijklmnopqrstuvwxyz-_.'.split('')
unique = Set.new

def generateRandomString(l = 10)
  Array.new(l) { CHARS.sample }.join
end

while unique.length < 1_000_000
  random_string = generateRandomString
  if !unique.include?(random_string)
    unique.add(random_string)
  end
end
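If the ds extension isn't installed, plain PHP can get the same constant-time membership check by using the generated strings as array keys instead of values; isset() on a key is O(1), while in_array() scans the whole array. A sketch (note one PHP quirk: keys that look like decimal integers are cast to int, which is harmless for uniqueness here):

```php
<?php
function generateRandomString($length = 10) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyz-_.';
    $max = strlen($characters) - 1;
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, $max)];
    }
    return $randomString;
}

$unique = array();
$target = 1000; // 1000 for illustration; scale up as needed
while (count($unique) < $target) {
    $random = generateRandomString(5);
    if (!isset($unique[$random])) { // O(1) hash lookup, not a linear scan
        $unique[$random] = true;
    }
}
$values = array_keys($unique); // the unique strings themselves
```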
hope it helps
BACKGROUND
I own a website that indexes all psychologists of Denmark.
My site provides contact information for all the clinics as well as user ratings.
I'm currently listing 12.000 Psychologists, of which about 6.000 have a website. About 1000 of the Psychologists have visited my website, and filled out their profile with additional "Descriptive" info (such as opening hours, prices, etc.)
I'm attempting to automatically scrape (with PHP and RegEx) the sites of those who haven't provided details to my community, for informative reasons.
I went through a good random sample of about 150 of the websites, and concluded that more than 85% of them have valuable text following the word 'Velkommen' (= welcome, in Danish). PRECIOUS!
THE QUESTIONS
#1
How do I specify in my script that I'd like to grab only approx. 360 characters and nothing more? This text should follow (and include) the word Velkommen. Also, the script shouldn't be case sensitive (though Velkommen is usually spelled with a capital V, it can pop up mid-sentence).
Also, it should use the last occurring 'velkommen' on the whole front page, since the word sometimes occurs as a menu/navigation option, which would be bad, since I'd then grab the navigation options.
#2
Currently - my script saves info in arrays, and then in the database.
Not sure how I should even go about this. What would be optimal for SEO;
Save the scraped text in a MySQL and display that every time.
Render the same 360-characters-text every time [that follows 'Velkommen']
Render random 360-characters-text from the sites, each time someone views a specific Psychologist on my site.
An example site:
$web = "http://www.psykologdorthelau.dk/";
$website = file_get_contents($web);
preg_match_all("/velkommen.+?/sim", $website, $information);
// THIS SHOULD SPECIFY THE VERY LAST 'VELKOMMEN' - it doesn't, I know :(
for ($i = 0; $i < count($information[0]); $i++) {
    preg_match_all("/Velkommen (.+?)\"/sim", $information[0][$i], $text, PREG_SET_ORDER);
    $psychologist[$i]['text'] = mysql_real_escape_string($text[0][1]);
}
Thank you to anyone who can solve this puzzle, from the wonderful country of Denmark.
When you want to fetch only a certain amount of data, you can use a file stream.
It would look something like this:
$handle = fopen("http://www.example.com/", "r"); // open a file stream
// Fetch, for example, only 10 bytes each time we check
$chunkSize = 10;
$contents = "";
while (!feof($handle) && strlen($contents) < 360) {
    $buffer = fread($handle, $chunkSize);
    $contents .= $buffer;
}
$status = fclose($handle);
// your data is stored in $contents
//your data is stored in $contents
"the scraped data should be preceeding the word 'velkommen'":
preg_replace_callback('/velkommen(.{0,360})/is',
    function($matched) {
        // Use $matched[1] to perform further testing
    },
    $contents
);
It's hacky, but it will get you started. Requires PHP 5.4 I believe.
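For the "last occurrence" requirement specifically, a regex isn't even needed: strripos() finds the final case-insensitive match and substr() takes the next 360 characters. A sketch (grabWelcome is a hypothetical helper name, not from the question's code):

```php
<?php
// Hypothetical helper: return the last, case-insensitive occurrence of
// $needle plus the $length characters that follow it, or null if absent.
function grabWelcome($html, $needle = 'velkommen', $length = 360) {
    $pos = strripos($html, $needle); // last occurrence, any capitalisation
    if ($pos === false) {
        return null;
    }
    return substr($html, $pos, strlen($needle) + $length);
}
```

Running strip_tags() on the page before calling this keeps navigation markup from eating into the 360 characters.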
I ran into an interesting (well at least in my opinion) problem.
I have a PHP script that should generate the formatting (e.g. the absolute positioning values of each image, so they get displayed next to each other in a logical pattern) and the image sources when run. When completed it would load the appropriate image paths from an SQL database, but I have a problem before getting to that point.
Currently my script looks something like:
for ($i = 0; $i < (866 + 1 + 866); $i++) {
    for ($j = 0; $j < 1001; $j++) {
        $data .= "<div id=\"tac-" . $j . "\"><img src=\"default_tactical.png\"/></div>";
    }
}
As you can see it's rather basic at this point, as I only wanted to test if I can get the images in place.
Also, $data is a variable that my template simply echoes to the browser.
The problem with all this is that my server runs out of memory whenever I try to run this script.
So what's the problem? Or rather: how can I have a lot of images in a webpage without running out of memory?
Try changing it to:
for ($i = 0; $i < (866 + 1 + 866); $i++) {
    for ($j = 0; $j < 1001; $j++) {
        echo "<div id=\"tac-" . $j . "\"><img src=\"default_tactical.png\"/></div>";
    }
}
It should not run out of memory since it's not storing anything, just directly outputs it.
EDIT: Since you can't modify the code, just try raising the memory limit somewhere in the code (any PHP code that executes before your loop will do):
ini_set("memory_limit", "512M");
Look at it this way: you've got 2 nested loops, and are building one big string inside.
866 + 1 + 866 = 1733 outer iterations × 1001 inner iterations = 1,734,733 iterations
1,734,733 iterations × ~55 characters each ≈ 95 megabytes
Either DON'T build the string all at once, or at least split it into chunks, e.g.
for ($i = 0; $i < (866 + 1 + 866); $i++) {
    $string = '';
    for ($j = 0; $j < 1001; $j++) {
        $string .= "<div id=\"tac-" . $j . "\"><img src=\"default_tactical.png\"/></div>";
    }
    echo $string; // emit one chunk, then start over with an empty string
}
Until you echo $data, no image has been loaded; it's just a string. The browser is what loads each image, after PHP processing is done. Your PHP runs on the server and the client loads the images. It's your variable $data that is running out of memory.
Try it like this:
for ($i = 0; $i < (866 + 1 + 866); $i++) {
    for ($j = 0; $j < 1001; $j++) {
        echo "<div id=\"tac-" . $j . "\"><img src=\"default_tactical.png\"/></div>";
    }
}
I've been having a major headache lately with parsing metadata from video files, and found that part of the problem is a disregard of various standards (or at least differences in interpretation) by video-production software vendors, among other causes.
As a result I need to be able to scan through very large video (and image) files, of various formats, containers and codecs, and dig out the metadata. I've already got FFmpeg, ExifTool, Imagick and Exiv2 each handling different types of metadata in various file types, and I've been through various other options to fill some other gaps (please don't suggest libraries or other tools, I've tried them all :)).
Now I'm down to scanning the large files (up to 2 GB each) for an XMP block (which is commonly written to movie files by the Adobe suite and some other software). I've written a function to do it, but I'm concerned it could be improved.
function extractBlockReverse($file, $searchStart, $searchEnd)
{
    $handle = fopen($file, "r");
    if (!$handle) {
        throw new Exception('Could not open file');
    }
    $startLen = strlen($searchStart);
    $endLen = strlen($searchEnd);
    for ($pos = 0,
         $output = '',
         $length = 0,
         $finished = false,
         $target = '';
         $length < 10000 &&
         !$finished &&
         fseek($handle, $pos, SEEK_END) !== -1;
         $pos--)
    {
        $currChar = fgetc($handle);
        if (!empty($output)) {
            $output = $currChar . $output;
            $length++;
            $target = $currChar . substr($target, 0, $startLen - 1);
            $finished = ($target == $searchStart);
        } else {
            $target = $currChar . substr($target, 0, $endLen - 1);
            if ($target == $searchEnd) {
                $output = $target;
                $length = $length + $endLen;
                $target = '';
            }
        }
    }
    fclose($handle);
    return $output;
}

echo extractBlockReverse("very_large_video_file.mov",
    '<x:xmpmeta',
    '</x:xmpmeta>');
At the moment it's 'ok' but I'd really like to get the most out of php here without crippling my server so I'm wondering if there is a better way to do this (or tweaks to the code which would improve it) as this approach seems a bit over the top for something as simple as finding a couple of strings and pulling out anything between them.
You can use one of the fast string-searching algorithms, like Knuth-Morris-Pratt or Boyer-Moore, to find the positions of the start and end tags, and then read all the data between them.
You should measure their performance though, as with such small search patterns it might turn out that the constant of the chosen algorithm is not good enough for it to be worth it.
With files this big, I think that the most important optimization is NOT to search the whole file. I don't believe that a video or image will ever have an XMP block smack in the middle, and if it does, it is likely garbage.
Okay, it IS possible: TIFF can do this, and JPEG too, and PNG; so why not video formats? But in real-world applications, loose-format metadata such as XMP is usually stored last. Sometimes it is stored near the beginning of the file instead, but that's less common.
Also, I think that most XMP blocks will not have sizes too great (even if Adobe routinely pads them in order to be able to "almost always" quickly update them in-place).
So my first attempt would be to extract the first, say, 100 KB and the last 100 KB of the file, then scan these two blocks for '<x:xmpmeta'.
If the search does not succeed, you will still be able to run the exhaustive search, but if it succeeds it will return in one ten-thousandth of the time. Conversely, even if this "trick" only succeeded one time in one thousand, it would still be worthwhile.
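The head/tail idea above can be sketched as follows: read the last 100 KB in a single fread() (then fall back to the first 100 KB), and use strrpos()/strpos() instead of per-character seeks. The window size, and the simplification that a block never straddles a window boundary, are assumptions:

```php
<?php
// Search the tail, then the head, of a file for a block delimited by
// $start ... $end. Returns the block, or null if neither window has it.
function findBlockFast($file, $start, $end, $window = 102400) {
    $size = filesize($file);
    $handle = fopen($file, 'rb');
    if (!$handle) {
        throw new Exception('Could not open ' . $file);
    }
    foreach (array(max(0, $size - $window), 0) as $offset) {
        fseek($handle, $offset);
        $chunk = fread($handle, $window);
        $s = strrpos($chunk, $start); // last start tag in this window
        if ($s !== false) {
            $e = strpos($chunk, $end, $s); // matching end tag after it
            if ($e !== false) {
                fclose($handle);
                return substr($chunk, $s, $e - $s + strlen($end));
            }
        }
    }
    fclose($handle);
    return null;
}
```

Usage would mirror the original call, e.g. findBlockFast("very_large_video_file.mov", '<x:xmpmeta', '</x:xmpmeta>'), with the character-by-character function kept as the exhaustive fallback.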
I have large text files, 140 KB or larger, full of paragraphs of text, and I need to insert a sentence into each file at random intervals, but only if the file contains more than 200 words.
The sentence I need to insert randomly throughout the larger document is 10 words long.
I have full control over the server running my LAMP site so I can use PHP or a linux command line application if one exists which would do this for me.
Any ideas of how best to tackle this would be greatly appreciated.
Thanks
Mark
You could use str_word_count() to get the number of words in the string, and from there determine whether you want to insert the sentence or not. As for inserting it "at random," that could be dangerous. Do you mean you want to insert it in a couple of random places? If so, load the contents of the file as an array with file() and insert your sentence anywhere between $file[0] and $file[count($file) - 1].
The following code should do the trick to locate and insert strings into random locations. From there you would just need to re-write the file. This is a very crude way and does not take into account punctuation or anything like that, so some fine-tuning will most likely be necessary.
$save = array();
$words = str_word_count(file_get_contents('somefile.txt'), 1);
if (count($words) <= 200) {
    $save = $words;
} else {
    foreach ($words as $word) {
        $save[] = $word;
        $rand = rand(0, 1000);
        if ($rand >= 100 && $rand <= 200) {
            $save[] = 'some string';
        }
    }
}
$save = implode(' ', $save);
This generates a random number and checks whether it falls between 100 and 200 inclusive; if so, it inserts the string. You can change the range of the random number, and of the check, to increase or decrease how many are added. You could also implement a counter to, for example, make sure there are at least x words between each inserted string.
Again, this doesn't take punctuation into account and just assumes all words are separated by spaces, so some fine-tuning may be necessary, but it should be a good starting point.
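The counter mentioned above could look like this; the 20-word minimum gap and the 5% per-word insertion chance are arbitrary knobs, not anything from the question:

```php
<?php
// Insert $sentence at random points in $text, only if the text has more
// than 200 words, and never closer than $minGap words to the previous insert.
function insertRandomly($text, $sentence, $minGap = 20) {
    $words = str_word_count($text, 1);
    if (count($words) <= 200) {
        return $text; // too short: leave the text unchanged
    }
    $out = array();
    $sinceLast = $minGap; // allow an insertion right away
    foreach ($words as $word) {
        $out[] = $word;
        $sinceLast++;
        if ($sinceLast >= $minGap && rand(1, 100) <= 5) {
            $out[] = $sentence;
            $sinceLast = 0;
        }
    }
    return implode(' ', $out);
}
```

Like the snippet above, this flattens punctuation and line breaks, so the same fine-tuning caveat applies.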