Search for part of a string in array - php

I have been through many posts online trying to sort this out.
I am integrating a small search feature into a PHP script.
I have tried many approaches, including in_array (which only works for a full match) and array_search,
and nothing seems to work.
I have a script that creates an array of file names (all in the format "Random-Name-1.ext"),
but the script removes the file extension, leaving just the file names.
All file names have their words separated by a -
All words have the first letter capitalised.
The array is called $files1;
The search string is called $search_string;
What I'm looking for would be along the lines of a foreach loop that checks whether the search string is contained in any part of each array value and, if it is, puts the full array value into another array called $search_results,
as the next part of my script (the pagination) needs the array $search_results to echo each of the file names and display them 10 per page.
I hope this is enough info. I've been working on it for ages and racking my brains trying to find the correct code.
Thanks in advance.
EDIT:
I got the script working with preg_grep, but I now have a slight problem.
The script before the search code is designed to get the URL of the search page with q={search string}
and then trim this so that I just have the search string as a variable.
I use a str_replace to change the page URL from
/games-search?q={search-string}
to {search-string}
This is perfectly fine, but the script that paginates the results at 10 per page adds ?page=2 to the URL.
So when I click page 2, the str_replace that changes the URL to the search string no longer works, as the new page URL for page 2 is
/games-search?page=2&q={search-string}
I have been trying to do another str_replace to change the subsequent pages' URLs to just the search string, but I am having problems with the regex to match the page number.
The page numbers vary from 1 to about 50 (never more than 99, so 2 digits would be enough to match).
I have tried over and over again today to get this regex correct, but I am not sure if I am going about it the correct way.
Here is my latest effort:
/games-search?page=(^[0-9][0-9]+)&
That is what I am trying to replace with "" (i.e., nothing),
as I only need the data from after the & character in the URL, and that's only IF the URL even contains it (for example, the first page's URL doesn't contain the & character; in that case I don't want it modified, as I already have the data I need).
Thanks again if anyone can help.
I decided not to paginate the results of each search, as there will never be more than about 15 results. This is fine, but I am going to implement a minimum search length of 4 characters so it doesn't bring up 500 results for the letter A :)

To search an array for a substring match, use preg_grep.
Then do something like
$search_results = ...; // e.g. array_values() of the preg_grep() result, so the keys run 0, 1, 2, ...
$paginated = array();
for ($i = 0; $i < count($search_results); $i++)
{
    // Page numbers start at 1; each page holds 10 results.
    $page = (int) floor($i / 10) + 1;
    $paginated[$page][] = $search_results[$i];
}
Then print_r that array, and you'll see you've got something dead easy to work with.
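For what it's worth, here is a minimal sketch of the whole flow, under a few assumptions: $files1 holds the extension-less names, the search term arrives in the q parameter, and the helper names search_term_from_url and search_files are made up for illustration. Reading q with parse_url and parse_str avoids the str_replace/regex problem, because it doesn't matter whether page=2 comes before or after q. The sample file names are invented.

```php
<?php
// Hypothetical helper: pull the "q" parameter out of a URL,
// wherever it sits in the query string.
function search_term_from_url(string $url): string
{
    $query = (string) parse_url($url, PHP_URL_QUERY);
    parse_str($query, $params);
    return (isset($params['q']) && is_string($params['q'])) ? $params['q'] : '';
}

// Hypothetical helper: case-insensitive substring search over the file names.
function search_files(array $files, string $search_string): array
{
    // Minimum search length of 4 characters, as mentioned in the question.
    if (strlen($search_string) < 4) {
        return array();
    }
    // preg_quote() stops characters such as "." or "?" in the search
    // string from being treated as regex syntax.
    $pattern = '/' . preg_quote($search_string, '/') . '/i';
    // preg_grep() preserves the original keys, so reindex from 0.
    return array_values(preg_grep($pattern, $files));
}

// Invented sample data.
$files1 = array('Super-Mario-1', 'Mario-Kart-2', 'Sonic-Dash-3');

$search_string  = search_term_from_url('/games-search?page=2&q=mario');
$search_results = search_files($files1, $search_string);

// array_chunk() splits the results into pages of 10.
$pages = array_chunk($search_results, 10);
print_r($pages);
```

Inside a normal web request, $_GET['q'] and $_GET['page'] would give the same values directly, without any URL trimming.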

Related

PHP get 5 additional characters after a specified string of a particular page and list them

Note: I have edited the post on 2017/08/20
I'm trying to obtain a list of product page URLs of the form "www.example.com/product/11111/".
There are over 200 different products available, each with its own product page, and I want to print out each product as a PDF file.
On "www.example.com/productlist/", there are URLs that lead to each product's page.
So, what I'm trying to do is:
1. Obtain the URLs that I need from "www.example.com/productlist/"
2. Generate PDF files from the URLs that I have obtained
Insufficient information: You did not provide much information about the code you already have or how the website will get the 200 URLs, so I can't write the whole code, because it depends on where your website gets the links from.
If you explain more about how the website is supposed to get the links, I will help you put them into an array, save them into a file, and implement the rest of the code!
1-) What I understood is that you want the last 5 characters
As you just want the last 5 characters, not the whole last part, you can do something like this.
$string = "http://example.com/folder/example/1234567"; //your link
$letters = 5; //edit 5 to show more or fewer characters
$code = substr($string, -$letters); //a negative start counts from the end of the string
echo $code; //will show the last 5 characters
I am always here to help. Good luck!
Start with parse_url().
$parts=parse_url("http://example.com/folder/example/12345");
That will give you an array with a handful of keys. The one you are looking for is path.
Split the path on / and take the last one.
$path_parts=explode('/', $parts['path']);
Your random number can now be stored like this:
$number = $path_parts[count($path_parts)-1];
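One thing to watch: the URLs in the question end with a trailing slash ("/product/11111/"), in which case the last element after explode is an empty string. A small sketch that trims the slashes first might look like this; the helper name last_path_segment is made up for illustration.

```php
<?php
// Hypothetical helper: return the last non-empty path segment of a URL.
function last_path_segment(string $url): string
{
    $path = (string) parse_url($url, PHP_URL_PATH);
    // Trim leading/trailing slashes so "/product/11111/" does not
    // produce an empty last element.
    $segments = explode('/', trim($path, '/'));
    return end($segments);
}

echo last_path_segment('http://www.example.com/product/11111/'), "\n";
```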

PHP extract String after pattern

I want to cut up an HTML link with PHP.
The HTML link always follows the same pattern:
domain.com/forum/members/84564-name.html
I want to get the 84564 from the name.
The /forum/members/ part is always the same,
and the "-" after the user ID is also always the same.
Can you help me to extract the user ID?
I'm going to assume the user id may not always be 5 digits.
$domainSplit = explode("/", $theDomain); // split into parts
$theIdSplit=explode("-", $domainSplit[3]); // split 84564-name.html
$id=$theIdSplit[0];
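Note that index 3 only lines up when the link has no "http://" prefix; with a scheme in front, explode produces two extra elements. As an alternative sketch, a small regex anchored on the fixed /forum/members/ part does not depend on the position; the helper name member_id is made up for illustration.

```php
<?php
// Hypothetical helper: return the numeric member id from a profile link,
// or null if the link does not match the expected pattern.
function member_id(string $link): ?string
{
    // One or more digits between "/forum/members/" and the following "-".
    if (preg_match('#/forum/members/(\d+)-#', $link, $m) === 1) {
        return $m[1];
    }
    return null;
}

echo member_id('domain.com/forum/members/84564-name.html'), "\n";
```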

PHP website data mining Preg_Match Undefined Offset

I'm working on a PHP project for school. The task is to build a website to grab and analyze data from another website. I have the framework set up, and I am able to grab certain data from the desired site, but I can't seem to get the syntax right for other data that I need to obtain.
For example, the site that I am currently analyzing is a page for a specific item returned from a search of Amazon.com (e.g. search amazon.com for "iPad" and pick the first result). I am able to grab the title of the product's page, but I need to grab the review count and the price, and therein lies the issue. I'm using preg_match to get the title (works fine), but I'm not able to get the reviews or the price. I keep getting the Undefined Offset error, which I've discovered means that nothing is being returned that matches the given pattern. Simply checking whether something has been returned will not help me, since I need to obtain these data for my analysis. The <span>s that I'm trying to mine are unique on the page, so there is only one instance of each.
The page source for my product page contains the following snippets of HTML that I need to grab. (The website can, and needs to be able to, handle anything, but for this example, I searched "iPad").
<span id="priceblock_ourprice" class="a-size-medium a-color-price">$397.74</span>
I need the 397.74.
<span id="acrCustomerReviewText" class="a-size-base">1,752 customer reviews</span>
I need the 1,752.
I've tried all combinations of escape characters, wildcards, etc., but I can't seem to get beyond the Undefined Offset error. An example of my code is as follows where $link is the URL, and $f is an empty array in which I want to store the result (Note: There is NOT a space after the '<' in "< span..." It just erased everything up to the "...(.*)..." when I typed it as "< span..." without the space):
preg_match("#\< span id\=\"priceblock\_ourprice\" class\=\"a\-size\-medium a\-color\-price\"\>(.*)\<\/span\>#", file_get_contents($link), $f);
$price=$f[1]; //Offset error occurs on this line
echo $price;
Please help. I've been beating my head against this for the past two days now. I'm hoping I'm just doing something stupid. This is my first experience with preg_match and data mining. Thank you very much in advance for your time and assistance.
Code
As stated by #cabellicar123, you shouldn't use regex with html.
I believe what you are looking for is strpos() and substr(). It should look something like this:
function get_content($string, $begintag, $endtag) {
    if (strpos($string, $begintag) !== False) {
        $location = strpos($string, $begintag) + strlen($begintag);
        $leftover = substr($string, $location);
        $contents = substr($leftover, 0, strpos($leftover, $endtag));
        return $contents;
    }
}
// Usage (Change the variables):
$str = file_get_contents('http://www.amazon.com/OLB3-Official-League-Recreational-Ball/dp/B004KOBRMC/');
$beg = '<b class="priceLarge">$';
$end = '</b>';
echo get_content($str, $beg, $end); // print the extracted price
I've provided a working example which would return the price of the object on the page, in this case, the price of a Rawlings baseball.
Explanation
I'll go through the code, line by line, and explain every piece.
function get_content($string, $begintag, $endtag)
$string is the string being searched through (in this case an Amazon page), $begintag is the opening tag of the element being searched for, and $endtag is the closing tag of that element. NOTE: This will only use the first instance of the opening tag; any later instances are ignored.
if (strpos($string, $begintag) !== False)
Checks if the beginning tag actually exists. Note the !== False; that's needed because strpos can return 0 (a match at the very start of the string), which a loose comparison would treat as False.
$location = strpos($string, $begintag) + strlen($begintag);
strpos() will return the first instance of $begintag in $string, therefore the length of the $begintag must be added to the strpos() to get the location of the end of $begintag.
$leftover = substr($string, $location);
Now that we have the $location of the opening tag, we need to narrow the $string down by setting $leftover to the part of the $string after $location.
$contents = substr($leftover, 0, strpos($leftover, $endtag));
This gets the position of the $endtag in $leftover, and stores everything before that $endtag in $contents.
As for the last few lines of code, they are specific to this example and just need to be changed to fit the circumstances.
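If you would rather go the parser route hinted at above, here is a rough sketch using PHP's bundled DOM extension. It assumes the two <span> ids from the question; the helper name span_text_by_id is made up, the sample HTML is just the question's two snippets wrapped in a <div>, and the real page markup may well differ.

```php
<?php
// Hypothetical helper: return the trimmed text of the first <span> with
// the given id, or null when no such element exists.
function span_text_by_id(string $html, string $id): ?string
{
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query('//span[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Sample input built from the snippets in the question.
$html = '<div>'
      . '<span id="priceblock_ourprice" class="a-size-medium a-color-price">$397.74</span>'
      . '<span id="acrCustomerReviewText" class="a-size-base">1,752 customer reviews</span>'
      . '</div>';

$price_text  = span_text_by_id($html, 'priceblock_ourprice');
$review_text = span_text_by_id($html, 'acrCustomerReviewText');

// Strip the leading "$" from the price; keep only the digits-and-commas
// part of the review text.
$price   = ($price_text === null) ? null : ltrim($price_text, '$');
$reviews = ($review_text !== null && preg_match('/[\d,]+/', $review_text, $m)) ? $m[0] : null;

var_dump($price, $reviews);
```

Because the helper returns null when the element is missing, the caller can test for that instead of running into an Undefined Offset notice.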

Need to pull a list of image url's from a Mysql database based on a subdomain, and store them as a list

I have a MySQL forum database in which I need to query all the posts for a specific set of images on a particular URL.
The URLs are of images hosted on a subdomain that we don't have access to, like this: "http://images.website.com/images/randomnumberhere.jpg".
I need a MySQL query to pull these out and process them into a list which we can later loop through to grab them all and move them. (I've got this part handled.)
I'm a PHP/MySQL programmer, but this feels like a regex problem and I'm not so great with that yet.
The issue is that we don't have a list of the images, and each name is a big long random number (so far as I can see). So what I need is to match strings like "images.website.com/images/(randomnumbers).jpg" and then put them into a list.
You could get all fancy pants-like and use regular expressions,
but you could also try a simple
SELECT * FROM image_table WHERE image_source LIKE '%images.website.com/images/%'
Is this what you are looking for?
If you're looking to pull all of the text from the database, and then use PHP to create a list of the images, try something like this:
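$image_list = array();
while($row = $sql->fetch_array())
{
    $text = $row['text'];
    /* Changed to preg_match_all */
    if(preg_match_all("/http:\/\/images\.website\.com\/images\/[0-9]+\.(jpg|jpeg|png|gif)/i", $text, $matches))
    {
        /* $matches[0] holds every full URL found in this row; merge it in so $image_list stays a flat list */
        $image_list = array_merge($image_list, $matches[0]);
    }
}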
Nothing fancy, and I didn't test it, but it should work. That's a hardcoded regex that matches the URL you're looking for specifically. You may want to modify it so that it can match multiple domains from an array, or something, but it should get you started.
EDIT: Should have mentioned that you could then loop through the $image_list array to display the images, or whatever you're going to do with them.
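To try the matching without a database, here is a small self-contained sketch of the same idea. The helper name collect_image_urls and the sample post texts are made up for illustration; the pattern assumes the image names are purely numeric, as described in the question.

```php
<?php
// Hypothetical helper: collect every matching image URL from a list of
// post bodies, without duplicates.
function collect_image_urls(array $posts): array
{
    // "#" delimiters avoid having to escape every "/" in the URL.
    $pattern = '#http://images\.website\.com/images/[0-9]+\.(?:jpg|jpeg|png|gif)#i';
    $urls = array();
    foreach ($posts as $text) {
        if (preg_match_all($pattern, $text, $matches)) {
            $urls = array_merge($urls, $matches[0]);
        }
    }
    // Drop repeats and reindex from 0.
    return array_values(array_unique($urls));
}

// Invented sample posts.
$posts = array(
    'Look: http://images.website.com/images/12345.jpg and http://images.website.com/images/678.png',
    'Repeat http://images.website.com/images/12345.jpg, plus http://other.example.com/images/9.jpg',
);

print_r(collect_image_urls($posts));
```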
$rand = rand(); // rand() or whatever you would like to create the random number; since you didn't give ranges, it might be an array as well
$string = "images.website.com/images/$rand.jpg";
Or even a simple loop:
for ($i = 1; $i < 100; $i++) {
    echo $string = "images.website.com/images/$i.jpg";
}
It's up to you how you use it with your database.

character-based pagination - inserting page breaks on text, not punctuation or code

I'm writing code to generate character-based pagination. I have articles in my site that I want to split up based on length.
The code I have so far is working, albeit with two issues:
1. It's splitting pages in the middle of words and HTML tags; I want it to only split after a complete word, tag, or a punctuation mark.
2. In the pagination bar, it's generating the wrong number of pages.
I need help addressing these two issues. Code follows:
$text = file_get_contents($View);
$ArticleLength = strlen($text);
$CharsPerPage = 5000;
$NoOfPages = round((double)$ArticleLength / (double)$CharsPerPage);
$CurrentPage = $this->ReturnNeededObject('pagenumber');
$Page = (isset($CurrentPage) && '' !== $CurrentPage) ? $CurrentPage : '1';
$PageText = substr($text, $CharsPerPage*($Page-1), $CharsPerPage);
echo $PageText, '<p>';
for ($i=1; $i<$NoOfPages+1; $i++)
{
    if ($i == $CurrentPage)
    {
        echo '<strong>', $i, '</strong>';
    }
    else
    {
        echo '', $i, '';
    }
    echo ' | ';
}
echo '</p>';
What am I doing wrong?
Thanks, guys. I put in the fix for the 1st point and it worked beautifully.
Hm. I guess it is messy to do the second point. I've found some regex on-line. Will think, write, and get back to you when I make some progress.
Thanks again.
$NoOfPages = round((double)$ArticleLength / (double)$CharsPerPage);
That should use ceil instead of round. If you use round, 4.2 pages will only show pages 1-4; you need a 5th page to show the last 0.2 of a page.
The other part is harder. It's common to use some sort of marker in the file to indicate where the page breaks go, because no matter how clever your code is, it can't judge where a good break is in the same way a human can.
If you insist on doing it in code, I suggest some logic that first works forwards or backwards to the nearest space when a page break is created, which isn't too tricky. More tricky is deciding whether you are within a tag or not. I think you'll need either some fairly heavy regex or an HTML parsing tool.
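A rough sketch of the "work backwards to the nearest space" idea, assuming plain text with no HTML in it (tags would need a parser, as noted above). The helper name paginate_text is made up for illustration.

```php
<?php
// Hypothetical helper: split plain text (no HTML) into pages of at most
// $chars_per_page characters, breaking only at a space where possible.
function paginate_text(string $text, int $chars_per_page): array
{
    $pages = array();
    $text = trim($text);
    while (strlen($text) > $chars_per_page) {
        // Look one character past the limit, so a space sitting exactly
        // at the boundary still counts as a break point.
        $window = substr($text, 0, $chars_per_page + 1);
        $break = strrpos($window, ' ');
        if ($break === false || $break === 0) {
            // No usable space (one very long word): fall back to a hard split.
            $break = $chars_per_page;
        }
        $pages[] = rtrim(substr($text, 0, $break));
        $text = ltrim(substr($text, $break));
    }
    if ($text !== '') {
        $pages[] = $text;
    }
    return $pages;
}

// With fixed-size pages, ceil() gives the page count: 21000 characters at
// 5000 per page is 4.2 pages of text, which needs 5 pages.
$no_of_pages = (int) ceil(21000 / 5000);

print_r(paginate_text('the quick brown fox jumps', 10));
```

Since pages that break on a space can come out shorter than the limit, the number of links in the pagination bar should then come from count() of the returned array rather than from the ceil() calculation.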
You're calculating the number of pages wrong... you should be using ceil() not round() (for example 4.1 pages worth of text is still 5 pages to display).
To fix the other issue, you're going to have big problems if there's arbitrary HTML in there. For example, you need to know that <div>s and <p>s are OK to split, but <table>s aren't (unless you want to get really fancy)!
To do it properly you should use an HTML library to build a tree of elements and then go from there.
Based on your first statement,
It's splitting pages in the middle of words and HTML tags
it appears that your character count is being done after markup is inserted. That would imply that e.g. long URLs in links would be counted against the page length you're trying to achieve. However, you didn't say how the articles were being created initially.
I'd suggest looking for a point in the process of creating the article where you could examine the raw text. By regarding the actual content (without markup) as a set of paragraphs, and estimating the vertical length of each paragraph based on typical number of characters per line, you can come up with a more consistent sizing.
I would also consider only breaking between paragraphs, to keep units of thought together on the same page. Speaking as a reader, I really hate going to sites that force me to pause, hit a button or link, and wait for a page reload, all in the middle of a single thought.
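Breaking only between paragraphs could be sketched roughly like this, assuming the raw article text separates paragraphs with blank lines. The helper name paginate_paragraphs is made up, and the character budget is a soft limit: a single paragraph longer than the budget still gets a page of its own.

```php
<?php
// Hypothetical helper: group whole paragraphs into pages, starting a new
// page when adding the next paragraph would exceed the character budget.
function paginate_paragraphs(string $text, int $chars_per_page): array
{
    // Paragraphs are assumed to be separated by one or more blank lines.
    $paragraphs = preg_split('/\R{2,}/', trim($text));
    $pages = array();
    $current = '';
    foreach ($paragraphs as $paragraph) {
        $candidate = ($current === '') ? $paragraph : $current . "\n\n" . $paragraph;
        if ($current !== '' && strlen($candidate) > $chars_per_page) {
            // The page is full: close it and start the next one.
            $pages[] = $current;
            $current = $paragraph;
        } else {
            $current = $candidate;
        }
    }
    if ($current !== '') {
        $pages[] = $current;
    }
    return $pages;
}

print_r(paginate_paragraphs("aaaa\n\nbbbb\n\ncccc", 10));
```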
