Using PHP to search a text file


I'm trying to write a script to download MP3 files embedded in a page. It starts out as a form: you enter the URL and submit it, and the script writes the HTML source of that page to a text file. I also want the script to search the source to see whether an audio file is embedded. I should mention that the link is not in the form filename.mp3. The format is:
embed type="application/x-shockwave-flash" src="http://diaryofthedead.tumblr.com/swf/audio_player_black.swf?audio_file=http://www.tumblr.com/audio_file/1435664895/tumblr_lb2ybulZkt1qb5hrc&color=FFFFFF" height="27" width="207" quality="best"
So here's the thing: there's a certain string you have to append to the audio URL for it to redirect to the MP3 file, and I know that string. What I want to do is extract, for example, "http://www.tumblr.com/audio_file/1435664895/tumblr_lb2ybulZkt1qb5hrc" from the middle of this. I know how to read from files, but I have no idea how to extract certain parts without already knowing the exact filename. So is there any way I can have it search the source for "audio_file" and, if it finds the string, extract the audio file URL?

If your program only extracts MP3 files embedded in a web page, you don't even need to save the page to a file; you can work with the page source in memory.
If you want to detect the paths to MP3 files inside Flash embeds and you know a regular expression that matches them, you are done.
If you don't know much about regular expressions, you should read up on them.
If you don't need as much power as a regular expression gives you, you can always find strings by position, like:
$pos = strpos($haystack, $needle);
Beware: strpos() finds the first occurrence of a string (strrpos() finds the last). So make the needle as specific as you can, or you might end up capturing something unwanted.
Take a look at http://www.regular-expressions.info/quickstart.html or something similar.
I can't post more links because I don't have enough reputation yet
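The position-based approach can be sketched roughly as follows. The helper name extractBetween() and the sample markup are made up for illustration; only strpos(), strlen() and substr() are real PHP functions, and the markers 'audio_file=' and '&' are assumptions taken from the embed code in the question.

```php
<?php
// Hypothetical helper: return the text between $start and $end, or null if either is missing.
function extractBetween(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;                      // start marker not found
    }
    $from += strlen($start);              // skip past the marker itself
    $to = strpos($haystack, $end, $from); // search for the end marker only after the start
    if ($to === false) {
        return null;                      // no end marker after the start
    }
    return substr($haystack, $from, $to - $from);
}

// Sample markup modelled on the question; example.com is a placeholder host.
$html = '<embed src="http://example.com/player.swf?audio_file='
      . 'http://www.tumblr.com/audio_file/1435664895/tumblr_lb2ybulZkt1qb5hrc&color=FFFFFF">';

$audioUrl = extractBetween($html, 'audio_file=', '&');
```

Using 'audio_file=' (with the equals sign) rather than just 'audio_file' keeps the needle from matching the "audio_file/" that appears inside the URL itself.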

You can try using preg_match (http://php.net/manual/en/function.preg-match.php) to get the contents between "audio_file=" and "&".
Or you can also use a string between function to get the contents between those two strings:
http://www.php.net/manual/en/function.substr.php#89493
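A minimal sketch of the preg_match() route, assuming the embed markup looks like the one in the question. The sample string and the variable names are illustrative; the pattern simply captures everything after "audio_file=" up to the next "&" or quote.

```php
<?php
// Sample markup modelled on the question; example.com is a placeholder host.
$html = '<embed type="application/x-shockwave-flash" '
      . 'src="http://example.com/player.swf?audio_file='
      . 'http://www.tumblr.com/audio_file/1435664895/tumblr_lb2ybulZkt1qb5hrc&color=FFFFFF" '
      . 'height="27" width="207">';

// [^&"']+ stops at the next "&" or at the quote that closes the attribute.
if (preg_match('/audio_file=([^&"\']+)/', $html, $matches)) {
    $audioUrl = $matches[1];   // first capture group: the audio URL
} else {
    $audioUrl = null;          // no embedded audio file found
}
```

If a page can contain several players, preg_match_all() with the same pattern would collect every URL instead of only the first.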

Related

Load string into html from file? Preferably not using javascript

I have the following code in index.html:
<div class="button">
Title
</div>
I'd like to save "ridiculously long string" in a text file, referenced by index.html. Is this possible?
I tried replacing the string with a call to PHP's file_get_contents(), as follows, but it doesn't work:
<div class="button">
Title
</div>
Error symptoms: the button on my page now reads title="title">Title, and clicking it takes me to a 404: The requested URL /~user/html_root/< was not found on this server. Both index.html and text.txt are in the html_root directory.
Here's how one of the shorter text.txts read:
?autoplay=0&trail=0&grid=1&colors=1&zoom=1&s=%5B{%228%22:%5B60,61,98,103,109,115%5D},{%229%22:%5B60,61,77,78,97,99,102,104,108,110,114,116%5D},{%2210%22:%5B76,79,98,103,105,109,111,115,117%5D},{%2211%22:%5B76,79,104,110,112,116,118%5D},{%2212%22:%5B60,61,63,64,77,78,111,117%5D},{%2213%22:%5B60,61,63,64%5D},{%2219%22:%5B76,77,79,97,98,102,103,108,109,114,115%5D},{%2220%22:%5B76,78,79,97,99,102,104,108,110,114,116%5D},{%2221%22:%5B98,103,105,109,111,115,117%5D},{%2222%22:%5B104,110,112,116,118%5D},{%2223%22:%5B61,111,117%5D},{%2224%22:%5B60,62,76,77%5D},{%2225%22:%5B60,62,75,78%5D},{%2226%22:%5B61,76,79%5D},{%2227%22:%5B77,78,96,97,102,103,109,110,115,116%5D},{%2228%22:%5B96,98,102,104,109,111,115,117%5D},{%2229%22:%5B61,65,97,98,103,105,110,112,116,118%5D},{%2230%22:%5B60,62,64,66,104,105,111,113,117,119%5D},{%2231%22:%5B60,62,64,66,75,76,112,113,118,120%5D},{%2232%22:%5B61,65,75,78,119,120%5D},{%2233%22:%5B77,78%5D},{%2237%22:%5B78,79%5D},{%2238%22:%5B77,79%5D},{%2239%22:%5B77%5D},{%2240%22:%5B60,61,63,64,75,77%5D},{%2241%22:%5B61,63,75,76%5D},{%2242%22:%5B61,63%5D},{%2243%22:%5B60,61,63,64,114%5D},{%2244%22:%5B78,79,84,85,92,93,95,113,115%5D},{%2245%22:%5B79,84,86,92,93,95,96,97,104,112,115%5D},{%2246%22:%5B78,86,98,103,105,111,113,114%5D},{%2247%22:%5B75,77,86,87,92,93,95,96,97,102,105,110,112%5D},{%2248%22:%5B75,76,93,95,103,104,109,112%5D},{%2249%22:%5B93,95,110,111%5D},{%2250%22:%5B94%5D}%5D
I thought changing text.txt to a more benign URL might help debugging. I changed text.txt to https://www.google.com/ and get the same 404.
I could implement a javascript solution. There's already js on this webpage. But it's controlled by a colleague and I'd prefer to try a stand alone solution first. Many thanks to anyone who can help!

Anytime you want to inject arbitrary data into HTML, you need to wrap it with htmlspecialchars() so that any reserved characters are escaped. Additionally, you actually need to surround attribute values with quotes or you're going to be generating invalid HTML.
Title
Really though, "ridiculously long string" is questionable anyway. I assume you're using some huge data URI? If so, consider not doing that, as there are limits you'll run into and it's not efficient to base64-encode things.
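As a rough sketch of that advice: the helper renderButtonLink() below is hypothetical, and the temporary file stands in for text.txt. It also assumes the page is actually run through PHP (typically a .php file); the 404 on "/<" in the question suggests the PHP tag was sent to the browser unexecuted, though that is a guess from the symptom.

```php
<?php
// Hypothetical helper: build the link markup from a URL stored in a text file.
function renderButtonLink(string $path, string $label): string
{
    $url = trim((string) file_get_contents($path)); // drop the trailing newline, if any
    // Escape both values and quote the attribute so reserved characters cannot break the HTML.
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
         . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';
}

// Stand-in for text.txt, shortened from the query string in the question.
$tmp = tempnam(sys_get_temp_dir(), 'url');
file_put_contents($tmp, "?autoplay=0&grid=1&s=%5B{%228%22:%5B60%5D}%5D\n");

$html = renderButtonLink($tmp, 'Title');
unlink($tmp);
```

In the page itself this would be echoed inside the div, for example with echo renderButtonLink('text.txt', 'Title').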

Regex on File Names

I have a function called getContents(), which accepts a regex for the file names it should find.
I scan the js folder for JavaScript files with the following two regex patterns:
$js['head'] = "/(\.head\.js\.php)|(\.head\.js)|(\.h.js)/";
$js['foot'] = "/(\.foot\.js\.php)|(\.foot\.js)|(\.f.js)|(\.js)^(\.head\.js)/";
I have a naming system whereby the file name determines where the JavaScript file gets loaded: in the <head> tag or in the footer of the HTML page. All files are loaded at the bottom of the page unless you specify otherwise (.head.js, for example).
A few days ago I noticed that the $js['foot'] array was also including .head.js files, causing them to be loaded twice. So I added ^(\.head\.js) and it worked: it stopped the .head.js files from being added to the footer array. I was quite pleased with myself, because I suck at regex. However, it now seems that standard .js files (any normal .js files) aren't being loaded into the $js['foot'] array. Why is this? If I remove the ^(\.head\.js) part, it loads them.
To be clear, I want the $js['foot'] array to load files ending with:
.foot.js.php
.foot.js
.f.js
.js
And IGNORE all:
.head.js.php
.head.js
.h.js
Can someone correct my regex above to do this? I thought the ^ operator meant NOT, but I was wrong!

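^(\.head\.js) in the middle of the pattern does not negate anything: outside a character class, ^ is an anchor that matches the start of the string (or of a line in multiline mode). Because it follows (\.js) in the same alternative, that alternative can never match, which is why plain .js files stopped being picked up.
You actually need a negative lookbehind assertion to stop the footer regex from matching .head.js: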
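// Every dot is escaped, and $ anchors the match at the end of the file name.
$js['head'] = '/\.(?:head\.js(?:\.php)?|h\.js)$/';
// The dot is part of the lookbehind so that names such as search.js are not excluded.
$js['foot'] = '/\.foot\.js(?:\.php)?$|(?<!\.head|\.h)\.js$/';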
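To check the lookbehind approach, here is a small sketch that sorts a made-up list of file names into the two slots. The patterns are a variant that anchors at the end of the name and keeps the dot inside the lookbehind; the file names and the $result array are purely illustrative.

```php
<?php
$js = [
    'head' => '/\.(?:head\.js(?:\.php)?|h\.js)$/',
    'foot' => '/\.foot\.js(?:\.php)?$|(?<!\.head|\.h)\.js$/',
];

// Illustrative file names covering every suffix listed in the question.
$files = [
    'app.js', 'search.js', 'menu.f.js', 'menu.foot.js', 'menu.foot.js.php',
    'init.h.js', 'init.head.js', 'init.head.js.php',
];

$result = ['head' => [], 'foot' => []];
foreach ($files as $file) {
    foreach ($js as $slot => $pattern) {
        if (preg_match($pattern, $file)) {
            $result[$slot][] = $file;   // the file belongs to this slot
        }
    }
}
```

With these patterns no file lands in both arrays, and search.js (which ends in "h" before ".js") still goes to the footer.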

How to remove Junk characters coming in gmail attachments in php?

I have marked the junk characters in the image and I want the code to remove it and start reading the data after it.

That ugly-looking text is not junk; it is what makes a *.doc file a DOC file (i.e. formatting). You can't really just echo that file using PHP.
You can display it using a PHP DOC viewer library, though, or find an online API that converts DOC to TXT.
You can also make the user download it. Use file_put_contents() to store the attachment in a .doc file, like below:
// file_put_contents() returns the number of bytes written, or false on failure
if (file_put_contents("attachment.doc", $email['attachment']) !== false) {
    header("Location: attachment.doc");
    exit; // stop so nothing else is sent after the redirect
}

The binary data represents a *.doc file. If you really want to extract plain text from it, you could do some fuzzy logic, and extract the lines that do not contain any characters with low ASCII codes (except for CR and LF).
Assuming your data structure is in $data, you could do this:
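// Iterate by reference so the cleaned text is written back into $data
foreach ($data as &$element) {
    // Delete every line that contains a low ASCII control character.
    // No commas inside the class: they would also match literal commas.
    $element["attachment"] = preg_replace(
        "/^.*?[\x01-\x09\x0B\x0C\x0E-\x1F].*?$\R?/m",
        "", $element["attachment"]);
}
unset($element); // break the reference left over from the loop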
Again, this is just "fuzzy" logic, so you still might get some meaningless text that is not removed.
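A small sketch of the same idea on made-up input shows what survives. The function name stripBinaryLines() and the sample string are hypothetical; the character class is the one from the answer without the commas, which would otherwise also match literal commas.

```php
<?php
// Hypothetical helper: drop every line that contains a low ASCII control character.
function stripBinaryLines(string $text): string
{
    // /m makes ^ and $ match at line boundaries; \R? also removes the line break.
    return preg_replace('/^.*?[\x01-\x09\x0B\x0C\x0E-\x1F].*?$\R?/m', '', $text);
}

// Made-up sample: the middle line carries control bytes, the others are plain text.
$sample  = "Hello\n\x01\x02binary\x03\nWorld\n";
$cleaned = stripBinaryLines($sample);
```

Lines made only of printable characters are kept as they are, so any binary run that happens to contain no control bytes will still get through.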

how can I get ID3 tags of my mp3 file with php?

I'm surprised not to find more questions about this; I might be the only one on Stack Overflow. How do I go about extracting the basic ID3 tags of an MP3 file with PHP?
Without downloading any libraries (I just want a neat function that returns an associative array where each key is a string naming the tag and each value is a string holding that tag's value).
I found many ways to do it, but all of them require downloading libraries with many functions that I don't necessarily need.
I don't want kiddyscripting.
I want to build this function using fopen and reading the first bytes of the file according to the length of each tag (as read in the RFC).
Which functions will I need to get the bits (or the bytes converted to bits), rather than the characters in the MP3? It is of course not a text file.

Reading ID3 tags is not as easy as "reading the first few bytes" (by the way, ID3v1 would be at the end and not the beginning), because most MP3s have ID3v2 tags, which are either at the beginning or the end of the file and have dynamic length and encoding.
Why don't you just use an existing library, which does all the work for you (e.g. this)?
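For the simpler ID3v1 case only, a minimal sketch with fopen() is possible, because that tag is a fixed 128-byte block at the end of the file: "TAG", then title, artist and album (30 bytes each), year (4), comment (30) and one genre byte. The function name readId3v1() and the demo file are made up, and ID3v2 is not handled at all.

```php
<?php
// Hypothetical helper: return the ID3v1 fields of $path, or null if there is no ID3v1 tag.
function readId3v1(string $path): ?array
{
    if (!is_readable($path) || filesize($path) < 128) {
        return null;
    }
    $fh = fopen($path, 'rb');            // 'b' = binary-safe read
    if ($fh === false) {
        return null;
    }
    fseek($fh, -128, SEEK_END);          // ID3v1 occupies the last 128 bytes
    $raw = fread($fh, 128);
    fclose($fh);

    if ($raw === false || strlen($raw) !== 128 || substr($raw, 0, 3) !== 'TAG') {
        return null;                     // no ID3v1 marker
    }

    // Field name => [offset, length] inside the 128-byte block.
    $layout = [
        'title'   => [3, 30],
        'artist'  => [33, 30],
        'album'   => [63, 30],
        'year'    => [93, 4],
        'comment' => [97, 30],
    ];
    $tag = [];
    foreach ($layout as $name => [$offset, $length]) {
        // Fields are padded with NUL bytes or spaces.
        $tag[$name] = rtrim(substr($raw, $offset, $length), "\0 ");
    }
    $tag['genre'] = ord($raw[127]);      // numeric genre index
    return $tag;
}

// Demo on a fake file: 256 filler bytes followed by a hand-built ID3v1 block.
$demo = tempnam(sys_get_temp_dir(), 'mp3');
file_put_contents($demo, str_repeat("\xFF", 256) . 'TAG'
    . str_pad('Song', 30, "\0") . str_pad('Artist', 30, "\0")
    . str_pad('Album', 30, "\0") . '2010' . str_pad('', 30, "\0") . chr(17));
$tag = readId3v1($demo);
unlink($demo);
```

The text fields are raw bytes, usually ISO-8859-1, so they may need converting before display.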

I'm not sure who authored this class (ID3TagsReader) but it's used in a tutorial here and works great: http://www.script-tutorials.com/id3-tags-reader-with-php/
No libraries, fluff, etc. It just reads ID3v2 tags.

PHP "spinning" content via random find/replace

I'm using file_get_contents('mysourcefile.html') to load the contents of mysourcefile.html into mysql db.
I have two things that I want to do to the contents of mysourcefile.html before I insert it into the db.
First...
I'd like to do a find/replace on specific string matches contained in mysourcefile.
For example: the tags that a user may place in their source input files would look something like this:
Welcome to [site-name], located at
[site-url] contact us at [site-email]
if you need help.
And I'd like to do a simple string match replacement on these values as they appear in the source file before they are written to the db. The replacement text would come from the wordpress database setup fields. For example, get_option('admin_email') and get_option('home')
Secondly...
I'd also like to allow the user to specify, via a special bracket, a string of words in which to use in order to randomize the content each time its imported, using the same input source file.
For example, in the above sentence, it might be encoded in the source file like so:
I'd also [%like|prefer|want%] to
[%allow|permit%]the user to
[%specify|declare|select%] via a
[%special|unique|predefined%] bracket,
a string of [%words|characters|text%]
in which to use in order to randomize
the content from site to site, using
the same input source file.
So I want to parse that content string and do a simple random replacement of each set element to pick one word out of the collection and use that word for the insert.
It's basically a crude content replacement/spinner, and I'm looking for some direction and methods I could use to do it.

For the first part:
$tags = array('[site-name]', '[site-url]', '[site-email]');
$words = array("My Name", "My URL", "My Email");
$content = str_replace($tags, $words, $content);
The second part might be a little trickier. But the process is:
Grab the content between the "[%" and "%]" tags.
Split it into its options: explode("|", $string);
Pick a random value.
So... you'll need someone who knows regex.
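The steps above can be sketched with preg_replace_callback(). The function names fillTags() and spin() and the sample values are hypothetical; in WordPress the replacement values would come from calls such as get_option('admin_email'), as the question describes.

```php
<?php
// Hypothetical helper: replace [site-name]-style tags with their values.
function fillTags(string $content, array $values): string
{
    return str_replace(array_keys($values), array_values($values), $content);
}

// Hypothetical helper: replace each [%a|b|c%] group with one randomly chosen option.
function spin(string $content): string
{
    return preg_replace_callback(
        '/\[%(.+?)%\]/s',                           // lazy match up to the nearest "%]"
        function (array $m): string {
            $options = explode('|', $m[1]);         // "like|prefer|want" => three options
            return $options[array_rand($options)];  // pick one at random
        },
        $content
    );
}

$source = "Welcome to [site-name]. I'd [%like|prefer|want%] to help.";
$filled = fillTags($source, ['[site-name]' => 'My Site']);
$spun   = spin($filled);
```

Each import calls spin() again, so the same source file can yield different text every time.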

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
