regex php and text files - php

I am currently attempting to pull some information out of a .txt file using a PHP script. I have been reading about regexes and thought this would be ideal. To give you some idea, the format of the text in the .txt file is as follows:
Data Rate: 20 Hz
Digital I/O Channels:
CH1_IN,0,CH2_OUT,1,CH3_IN,0,CH4_OUT,1,CH5_IN,0,CH6_IN,0,CH7_IN,0,CH8_OUT,0,CH9_IN,0,
CH10_IN,0,CH11_OUT,1,CH12_IN,0,CH13_IN,0,CH14_IN,0,CH15_IN,0,CH16_IN,0,
QEA: Enabled
I am trying to pull out the following detail for each channel:
CH(number)_(IN or OUT),(integer)
As described in various posts and some tutorials I have tried using preg_split but haven't been able to get it to work as I want. My understanding is that something like that shown below should work, although it is likely I have not used it correctly:
$log_file_data = file_get_contents('Log.txt');
$channel_detail = preg_split("/CH[0-9]{2}_[A-Z],[0-1]{1}/",$log_file_data);
My intention is that this would split the text nicely into portions as described earlier but as expected it just pretty much spews out the complete text file. Am I using the correct method or does it not suit what I am looking to achieve?
Any guidance would be appreciated.

You don't actually need preg_split here; use preg_match_all with an improved regex:
$line = <<< EOF
CH1_IN,0,CH2_OUT,1,CH3_IN,0,CH4_OUT,1,CH5_IN,0,CH6_IN,0,CH7_IN,0,CH8_OUT,0,CH9_IN,0,
CH10_IN,0,CH11_OUT,1,CH12_IN,0,CH13_IN,0,CH14_IN,0,CH15_IN,0,CH16_IN,0,
EOF;
if (preg_match_all('/CH([0-9]+)_(IN|OUT),([01])/', $line, $arr))
    print_r($arr);
Your channel number, IN/OUT and the following number are available in groups #1, #2 and #3.
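If one record per channel is more convenient than one array per group, the PREG_SET_ORDER flag of preg_match_all can be used. The following is a minimal sketch, assuming the log text looks like the sample in the question; the array keys 'channel', 'direction' and 'value' are names made up for illustration.

```php
<?php
// Sample text in the format from the question (assumed to match the real log file).
$log = <<<EOF
Data Rate: 20 Hz
Digital I/O Channels:
CH1_IN,0,CH2_OUT,1,CH3_IN,0,
CH10_IN,0,CH11_OUT,1,
QEA: Enabled
EOF;

$channels = [];
// PREG_SET_ORDER groups the captures per match instead of per group.
if (preg_match_all('/CH([0-9]+)_(IN|OUT),([01])/', $log, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $channels[] = [
            'channel'   => (int) $m[1], // hypothetical key names
            'direction' => $m[2],
            'value'     => (int) $m[3],
        ];
    }
}

print_r($channels);
```

Lines that do not contain channel data, such as "Data Rate" and "QEA", are simply ignored because the pattern never matches them.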

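You really don't need regex at all. Exploding on ',' will yield an array in which, counting from zero, every even index holds a channel name and every odd index holds the integer that belongs to the name just before it. Remove the line breaks first, since the channel list wraps across lines.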
Cheers
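A rough sketch of the explode approach, assuming only the channel lines are passed in (the surrounding "Data Rate" and "QEA" lines would have to be removed first). Note that a plain array_filter would also discard the string "0", so 'strlen' is used as the callback to drop only the empty piece left by the trailing comma.

```php
<?php
// Only channel lines in the format from the question; the real file has other lines around them.
$line = "CH1_IN,0,CH2_OUT,1,CH3_IN,0,\nCH10_IN,0,CH11_OUT,1,\n";

// Drop the line breaks, split on commas, and discard empty pieces.
$flat = str_replace(["\r", "\n"], '', $line);
$parts = array_values(array_filter(explode(',', $flat), 'strlen'));

$channels = [];
// Walk the list two items at a time: name, then value.
for ($i = 0; $i + 1 < count($parts); $i += 2) {
    $channels[$parts[$i]] = (int) $parts[$i + 1];
}

print_r($channels);
```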

Related

Simple web scraping in PHP

To make it clear from the beginning, I have full consent from the website administrator to do this until they build an API.
What I want to do is get, say, a number or any piece of data found in a specific part of the site, although its position in the line can change.
As an example of what I wish to do: if I were to store the HTML in a variable through file_get_contents and wanted to find the place in the source where it says "<p>User status: Online.</p>", I would need to store the text between "status: " and ".</p>" in a variable. I only know these two strings to find it by, but I also know that there is only one possible place where those two texts appear on the same line.
EDIT: I seem to have forgotten the most important part of this. Well, the question is how to do what I just described, if you have a lot of text, how can I find what's between one piece of text and another piece of text, and store it in a variable?
There are a couple ways to scrape websites, one would be to use CSS Selectors and another would be to use XPath, which both select elements from the DOM.
Since I can't see the full HTML of the webpage it would be hard for me to determine which method is better for you. There is another option which may be frowned upon, but in this case it might work.
You could use a regex (regular expression) to find the characters. I'm not the best at regular expressions, but here is some sample code showing how that might work:
<?php
$subject = "<html><body><p>Some User</p><p>User status: Online.</p></body></html>";
$pattern = '/User status: (.*)\<\/p\>/';
preg_match($pattern, $subject, $matches);
print_r($matches);
?>
Sample output:
Array
(
[0] => User status: Online.</p>
[1] => Online.
)
The regex above looks for the string "User status: " and then captures all the characters (.*) up to the closing paragraph tag (escaped).
Here is a pattern that returns just "Online" without the period. I wasn't sure whether all statuses end in a period, but this is what it would look like:
'/User status: (.*)\.\<\/p\>/'
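One caveat: (.*) is greedy, so if the same line contains a later "</p>", the capture runs up to the last one. A lazy (.*?) stops at the first. The sketch below wraps this in a small helper; text_between is a hypothetical name, not a built-in, and the sample HTML is invented for illustration.

```php
<?php
// Hypothetical helper: returns the text between $start and $end, or null if not found.
function text_between(string $html, string $start, string $end): ?string
{
    // preg_quote() escapes regex metacharacters in the markers, including the "/" delimiter.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/';
    return preg_match($pattern, $html, $m) ? $m[1] : null;
}

// Two paragraphs on one line, so a greedy pattern would overshoot here.
$subject = "<html><body><p>User status: Online.</p><p>Last seen: today.</p></body></html>";
echo text_between($subject, 'status: ', '.</p>'), "\n"; // prints "Online"
```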

Make certain text in outlook incoming e-mails into links?

I'd like to do some operations on incoming e-mails. Namely transform all 6 digit numbers into links which lead to a url based on the number.
I don't want to open a huge can of worms in terms of APIs or languages besides PHP. This isn't that much of a timesaver, but it would be nice. Has anyone done anything like this? I'm just looking to get pointed in the right direction!
You can use a regex to find your numbers and replace them with your links. Since I do not know your link structure, I made one up.
Here is a simple example:
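$str = "Testing 385758 String";
// The link markup appears to have been lost from this post; example.com is a placeholder URL.
$str = preg_replace('/(\d{6})/', '<a href="http://example.com/$1">$1</a>', $str);
This will turn $str into:
Testing <a href="http://example.com/385758">385758</a> String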
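A plain \d{6} also matches the first six digits of a longer number. If only standalone six-digit numbers should become links, lookarounds can guard both ends. This is a sketch only: the message text is invented and http://example.com/ is a placeholder, since the real link structure is not given in the question.

```php
<?php
$body = "Ticket 385758 was merged into 1234567 and 42.";

// (?<!\d) and (?!\d) ensure the six digits are not part of a longer number.
// http://example.com/ is a placeholder URL, not the asker's real link structure.
$linked = preg_replace(
    '/(?<!\d)(\d{6})(?!\d)/',
    '<a href="http://example.com/$1">$1</a>',
    $body
);

echo $linked, "\n";
```

Here only 385758 is wrapped in a link; 1234567 and 42 are left untouched.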

Use line-break as separator for an array input?

I've never actually used arrays before, as I've never had to so far (a simple variable has been enough for me), however now I've created a form with a text-area that is meant to POST multiple urls through to my PHP script.
What I want to do is use a line-break in the visitors input to act as a separator for an array input.
For example, the visitor inputs 90 lines of text (all url's), the array breaks each one into a list of 90, and creates an array value for each one.
Any info, advice or comments would be greatly appreciated :)!
I'm not 100% sure which line breaks are used, e.g.:
Windows uses \r\n
Linux uses \n
(old) Macs used \r
However if you know this you can simply do:
$urls = explode("\n", $_POST['urls']);
EDIT
Actually, after testing, using a regex IS faster than first doing a str_replace() and then an explode().
Look at http://www.php.net/manual/en/function.preg-split.php and use the newline character as the delimiter, or see the answer by Tgr on the SO question "PHP REGEX - text to array by preg_split at line break".
Be careful about using just \r or \n, because each operating system defines "new line" differently.
Use explode
$array=explode("\n",$_POST['textarea']);
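To cope with all three line-break styles at once, preg_split can be given an alternation. A minimal sketch, assuming the textarea is posted as a field named 'urls' (the sample string below stands in for $_POST['urls']):

```php
<?php
// Stand-in for $_POST['urls']; the field name is an assumption.
$input = "http://example.com/a\r\nhttp://example.com/b\n\nhttp://example.com/c\r";

// Try \r\n first so a Windows line break is not treated as two separators.
$lines = preg_split('/\r\n|\r|\n/', $input);

// Trim whitespace and drop blank lines.
$urls = array_values(array_filter(array_map('trim', $lines), 'strlen'));

print_r($urls);
```

Each remaining element of $urls is one line of the visitor's input, so 90 lines of text become an array of 90 values.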

Need PHP Regex help

I've been working on this simple script all day trying to figure it out. I'm new to regex so please keep that in mind. On top of that, I've tried just about anything and everything I could to get this to work.
I'm trying to (to learn, please don't point me to the API) download a TSV file from Yahoo Site Explorer via either cURL or file_get_contents (both work, just messing with different things) and then using regex to get only the URL column to appear. I realize I might have more luck with other functions, but I can't find anything dealing with TSV and now it's become a challenge. I've literally spent the entire day trying to get this correct.
So a URL would be:
https://siteexplorer.search.yahoo.com/search?p=www.google.com&bwm=i&bwmo=&bwmf=s
And my regex currently looks like this (I know it's horrible...it's probably the millionth attempt):
preg_match_all('((http(s?)://?(([^/]+(\/.+))))^[\t]$)', $dl, $matches);
My issue right now is that there's 4 columns. TITLE URL SIZE FORMAT. I'm able to strip out everything from the first column (TITLE) and the last (FORMAT) column, but I cannot seem to strip out the SIZE column and get rid of the last slash in case the sites linking in don't have that last slash.
Another thing - I've actually accomplished getting JUST the URL to appear, but they all had ending slashes which leave out links from, say, Twitter.
Any help would be greatly appreciated!
Don't know much about PHP, but this regex works in python (should be the same in PHP):
".+?\t(.+?)\t.*"
Just match it and get the content of group 1. FWIW, code in Python:
import re
import fileinput

# Lazily skip the first column, capture the second (the URL), ignore the rest.
urlre = re.compile(r".+?\t(.+?)\t.*")
for line in fileinput.input():
    m = urlre.match(line)
    if m:
        print(m.group(1))
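The same idea in PHP might look like the sketch below. It assumes the four tab-separated columns described in the question (TITLE, URL, SIZE, FORMAT); the sample rows are invented. The /m modifier makes ^ match at the start of every line, and rtrim removes a trailing slash only when one is present, so URLs without one are kept as they are.

```php
<?php
// Invented sample rows in the TITLE<TAB>URL<TAB>SIZE<TAB>FORMAT layout from the question.
$dl = "Title A\thttp://example.com/\t123\ttext/html\n"
    . "Title B\thttp://twitter.com/user\t456\ttext/html\n";

// Skip the first column, capture the second; [^\t\n] keeps each match on one line.
preg_match_all('/^[^\t\n]+\t([^\t\n]+)\t/m', $dl, $matches);

// Strip a trailing slash where there is one.
$urls = array_map(function ($u) {
    return rtrim($u, '/');
}, $matches[1]);

print_r($urls);
```

If the file starts with a header row, that row matches too and would need to be dropped, for example with array_shift.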
Personally, I'd split the lines by tab. For example:
$stuff = file_get_contents($url);
// split the whole file by newlines, to get an array of lines
$lines = explode("\n", $stuff);
$urls = [];
// loop through the lines
foreach ($lines as $line) {
    // split by tab
    $parts = explode("\t", $line);
    // skip blank or malformed lines that have no URL column
    if (!isset($parts[1])) {
        continue;
    }
    // put the URLs in a list
    $urls[] = $parts[1];
    // or keep track of them by title
    // $urls[$parts[0]] = $parts[1];
    // or whatever...
}
Just use parse_url or parse_str instead. Where a built-in function does the job, prefer it over a regular expression, which tends to be slower and harder to maintain.

PHP tags. How to minimize the tags usage in php script

I am using an ask answer script on a website and it converts the headline title words into the search query tags automatically.
For example: "Who are you?" is converted into tags 'Who' 'are' and 'you' tags respectively. I want tags to be displayed only if the letters in the word are greater than 4. Is it possible?
I am not into php but I searched for the 'tags' in my script and have uploaded the result here http://pastebin.com/m670a1609. Kindly let me know which source file would help in achieving this..
Thanks!
"I want tags to be displayed only if the letters in the word are greater than 4. Is it possible?"
You can do it like this:
if (strlen($your_word) > 4)
{
    // go ahead
}
Surely it would be easier to directly ask the provider of the script for help. A quick Google search makes me believe that AnswerScript is a commercial package that comes with support.
Maybe you should be looking for something like
explode(' ', $title_words)
and not php tags. This function splits the variable $title_words into separate array elements using a space as a delimiter. More or less what that script does to that title.
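Putting the two suggestions together, a sketch of the filtering could look like this. The title string and variable names are made up, and where this would plug into the actual script is unknown. strlen counts bytes, so mb_strlen may be a better fit for titles with non-ASCII characters.

```php
<?php
// Hypothetical title; in the real script this would come from the question headline.
$title = "Who are you and where do people learn regex?";

// Split on whitespace, strip punctuation from the ends of each word,
// then keep only words longer than four characters.
$words = preg_split('/\s+/', trim($title));
$tags = [];
foreach ($words as $word) {
    $word = trim($word, ".,;:!?\"'()");
    if (strlen($word) > 4) {
        $tags[] = $word;
    }
}

print_r($tags);
```

With this title, only "where", "people", "learn" and "regex" remain as tags.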
