The Feed Doctor: New Function Released
It’s been nearly two months since my last post; my excuse is that we were super-busy with the holiday season and with getting some new features out the door, as Max mentioned earlier. In my last post, I hinted at a new feature I was working on for ShoppingAdvisor. That’s going to be a huge post in itself, so today I’ll just briefly mention a new function that you can use to write business rules: REGEXGET.
Regular readers know some of the magic that a regular expression (or “regex”) can work, from our discussion of two other functions, REGEXREPLACE and REGEXMATCH. REGEXGET works a lot like REGEXMATCH: it takes the same two arguments, text and a regex. Instead of returning a true/false value, though, REGEXGET returns the part of the text that matches the regex. If I may plagiarize myself, I’ll re-use an example from before:
Dear Feed Doctor,
I sell model cars, and one of the attributes I want to put in my feed is the year of the car. Unfortunately, I don’t have any separate data field for the year; however, the title DOES have the year in it. Can I write a business rule that will pull the year out of the title? My titles are usually something like “Red 1978 Mercury Cougar.”
You may recall that we solved this problem using REGEXREPLACE:
REGEXREPLACE($TITLE,".+(\d{4}).+","$1")
REGEXGET can solve the same problem with less typing. The main difference is the regular expression. Before, we needed something that matched the entire title. We only cared about the four digits, which are matched by \d{4}; we put parentheses around that part to show that we wanted to “save” it. But then we also had to match everything before and after the digits. That’s what the .+ stuff is for. Then finally, we replaced what was matched (which is the entire title text) with the part we “saved,” which is what the $1 is for. With REGEXGET, we only need to match the part we want to “save”:
REGEXGET($TITLE,"\d{4}")
Here are the results:
| Example # | Title | Year (From Rule) |
|---|---|---|
| 1 | Red 1978 Mercury Cougar | 1978 |
| 2 | Metallic Mint Green 1963 Pontiac Tempest | 1963 |
| 3 | White 1979 Oldsmobile 98 | 1979 |
| 4 | Grey 1984 Audi 4000 | 1984 |
| 5 | Ford Model T | |
Note example #4: even though the regular expression could match in more than one place (both “1984” and “4000”), you only get the first match. Also, as #5 shows, if the regex doesn’t match anything, the result will be blank.
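If you'd like to try the idea outside of a business rule, here is a rough Python sketch of the behavior described above. The `regexget` and `regexreplace` helpers are hypothetical stand-ins I wrote for illustration, not the actual rule functions; they assume that "first match, or blank when nothing matches" is the whole story.

```python
import re

YEAR = r"\d{4}"  # same regex as the rule: four digits in a row


def regexget(text, pattern):
    """Hypothetical stand-in for REGEXGET: first match, or "" if none."""
    match = re.search(pattern, text)
    return match.group(0) if match else ""


def regexreplace(text, pattern, replacement):
    """Hypothetical stand-in for REGEXREPLACE, built on Python's re.sub."""
    return re.sub(pattern, replacement, text)


titles = [
    "Red 1978 Mercury Cougar",
    "Metallic Mint Green 1963 Pontiac Tempest",
    "White 1979 Oldsmobile 98",
    "Grey 1984 Audi 4000",
    "Ford Model T",
]

for title in titles:
    # repr() makes the blank result for "Ford Model T" visible as ''
    print(title, "->", repr(regexget(title, YEAR)))

# The older approach: match the whole title, keep only the saved group.
# Python writes the saved group as \1 where the rule uses $1.
print(regexreplace(titles[0], r".+(\d{4}).+", r"\1"))
```

The sketch only checks the REGEXREPLACE version against the first title. How the real function treats a title with no year in it isn't covered in this post, so I've left that case out rather than guess.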
That’s it for this post. The next one is a little bit more technical than usual, so be warned!
