Consider the following string:
$MRK - Merck - ($AAPL, $MSFT, $F) having day $AA! like $P and me :)
Although it's gibberish, it shows my problem. I want to scan for all words starting with a dollar sign ($) and check them against a predefined list of tickers. If there's a match, replace the word with a label ({TICKER}), as follows:
{TICKER} - Merck - ({TICKER}, {TICKER}, {TICKER}) having day {TICKER}! like {TICKER} and me :)
I now use this function:
function _process_tickers($tweet) {
    $tickers = array();
    $result = db_query("SELECT symbol FROM us_stocks");
    while ($row = db_fetch_object($result)) {
        // Each search term is the symbol wrapped in single spaces.
        $tickers[] = ' $' . $row->symbol . ' ';
    }
    return str_replace($tickers, ' {TICKER} ', $tweet);
}
Problem: this only catches tickers that are surrounded by spaces (this $AA is surrounded by spaces), not other situations such as a ticker with only a space in front (this ticker has only a space in front $AA) or one surrounded by commas (my,$AA, ticker). Two tickers right after each other (happy with $AA$XOM) should also become (happy with {TICKER}{TICKER}). How do I catch all these possible situations?
Use regular expressions:
preg_replace('/\$[A-Z]+/', '{TICKER}', $tweet);
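Note that this pattern replaces every $-prefixed run of capitals, whether or not it is in the ticker list. If the check against the list matters, one possible sketch is to combine the same pattern with preg_replace_callback() and a lookup. The function name replace_known_tickers and the hard-coded $known array are assumptions for illustration; in the question the symbols would come from the us_stocks table.

```php
<?php
// Hypothetical helper: replaces only the $SYMBOLs that appear in $known.
function replace_known_tickers($text, array $known) {
    // Flip the list into a set so each lookup is a cheap isset() check.
    $set = array_flip($known);
    return preg_replace_callback('/\$([A-Z]+)/', function ($m) use ($set) {
        // $m[0] is the full match (e.g. "$AA"), $m[1] is the bare symbol.
        return isset($set[$m[1]]) ? '{TICKER}' : $m[0];
    }, $text);
}

// Assumed sample list; $ZZZZ is deliberately not in it.
$known = array('MRK', 'AAPL', 'MSFT', 'F', 'AA', 'P', 'XOM');
echo replace_known_tickers('happy with $AA$XOM and $ZZZZ, $F!', $known), "\n";
// prints: happy with {TICKER}{TICKER} and $ZZZZ, {TICKER}!
```

Unknown symbols such as $ZZZZ are left untouched, and adjacent tickers like $AA$XOM are still handled because the pattern stops at the next dollar sign.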
What's the downside of changing this code
$tickers[] = ' $' . $row->symbol . ' ';
to this?
$tickers[] = '$' . $row->symbol;
Doing so will make this change regardless of what comes before or after the ticker.
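One downside worth checking is that str_replace() walks the search array in order, so a short symbol can eat the start of a longer one. The sketch below illustrates this with a made-up two-symbol list; the helper names replace_in_order and longest_first are hypothetical, not part of the original function.

```php
<?php
// Hypothetical wrapper around the str_replace() call from the question.
function replace_in_order(array $tickers, $text) {
    return str_replace($tickers, '{TICKER}', $text);
}

// Hypothetical helper: sort search terms so the longest come first.
function longest_first(array $tickers) {
    usort($tickers, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    return $tickers;
}

// Assumed sample list where one symbol is a prefix of another.
$tickers = array('$A', '$AA');

// '$A' is replaced first and leaves a stray 'A' behind.
echo replace_in_order($tickers, 'happy with $AA'), "\n";
// prints: happy with {TICKER}A

// With the longest symbols first, '$AA' is matched as a whole.
echo replace_in_order(longest_first($tickers), 'happy with $AA'), "\n";
// prints: happy with {TICKER}
```

Sorting only helps when the longer string is itself in the list; a word like $AAPL would still be partially replaced if only $AA were a known ticker.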
You are probably looking for the \b word boundary. If the "tickers" will always just be a string of capital letters, this should suffice:
<?php
$str = '$MRK - Merck - ($AAPL, $MSFT, $F) having day $AA! like $P and me :)';
echo preg_replace('#\$[A-Z]+#', '{TICKER}', $str), "\n";
Output:
{TICKER} - Merck - ({TICKER}, {TICKER}, {TICKER}) having day {TICKER}! like {TICKER} and me :)
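The pattern above does not actually use \b. As a rough sketch of where the word boundary could go, placing it after the letters keeps the pattern from matching only the capitalised start of a longer word. The function name replace_bounded_tickers is an assumption for illustration.

```php
<?php
// Hypothetical wrapper, for illustration only.
function replace_bounded_tickers($text) {
    // \b after the letters requires the symbol to end at a word boundary,
    // so "$Fine" is not treated as the ticker "$F" followed by "ine".
    return preg_replace('/\$[A-Z]+\b/', '{TICKER}', $text);
}

echo replace_bounded_tickers('$AA$XOM, $Fine and ($MSFT)'), "\n";
// prints: {TICKER}{TICKER}, $Fine and ({TICKER})
```

Without the trailing \b, the same input would turn $Fine into {TICKER}ine.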