I’ve been reading Mastering Regular Expressions by Jeffrey E.F. Friedl, and since nobody in my life (aside from my wife) cares, I thought I’d share something I’m pretty proud of: my first set of regular expressions, which I wrote myself to manipulate the text I’m working with.
What I’m so happy about is that I wrote these expressions. I understand exactly what they do and the purpose of each character in each expression.
I’ve used regex in the past, mostly stuff cobbled together from Stack Overflow, but I never really understood how the expressions worked or what they meant, just that they did what I needed them to do at the time.
I’m only about 10% of the way through the book, but already I understand so much more than I ever did about regex (I also recognize I have a lot to learn).
I wrote the expressions to be used with egrep and sed to generate and clean up a list of filenames pulled out of tarballs (movies I’ve ripped from my DVD collection and tarballed to archive them).
The first expression I wrote was this one used with tar and egrep to list the files in the tarball and get just the name of the video file:
tar -tzvf file.tar.gz | egrep -o '\/[^/]*\.m(kv|p4)' > movielist
That gives me a list of movies, of which this is an example:
/The.Hunger.Games.(2012).[tmdbid-70160].mp4
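To see what the pattern picks out, here is a small sketch that runs it against a single made-up line of verbose tar output. The listing line (permissions, owner, size, date, the movies/ directory) and the helper name extract_name are assumptions for illustration, and grep -E is used in place of egrep; it accepts the same pattern. The slash is left unescaped, which grep does not mind.

```shell
# Hypothetical line in the style of `tar -tzvf` output; the real archive layout may differ.
listing='-rw-r--r-- user/group 1234567 2023-01-01 12:00 movies/The.Hunger.Games.(2012).[tmdbid-70160].mp4'

# -o prints only the matched text: a slash, then any run of non-slash
# characters, ending in .mkv or .mp4. Because [^/]* cannot cross a slash,
# the match has to start at the last slash before the file name.
extract_name() {
  grep -E -o '/[^/]*\.m(kv|p4)'
}

printf '%s\n' "$listing" | extract_name
```

The slash in user/group is tried first, but no .mkv or .mp4 can be reached from there without crossing another slash, so the only match is the file name itself.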
Then I used sed with the expression groups to remove:
- the leading forward slash
- Everything from .[ to the end
- All of the periods in between words
And the last expression checks for one or more spaces and replaces them with a single space.
This is the full sed command:
sed -Eie 's/^\///; s/\.\[[a-z]+-[0-9]+\]\.m(p4|kv)//; s/[^a-zA-Z0-9\(\)&-]/ /g; s/ +/ /g' movielist
Which leaves me with a pretty list of movies that looks like this:
The Hunger Games (2012)
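The four substitutions can be tried on the sample name without touching movielist by running sed as a filter. This is only a sketch: the helper name clean_name is made up, and the in-place flags are dropped on purpose so the input comes from a pipe.

```shell
# Same four substitutions as the sed command above, applied to stdin.
clean_name() {
  sed -E '
    s/^\///                              # 1. drop the leading slash
    s/\.\[[a-z]+-[0-9]+\]\.m(p4|kv)//    # 2. drop ".[tmdbid-70160].mp4" style tails
    s/[^a-zA-Z0-9\(\)&-]/ /g             # 3. turn periods and other stray characters into spaces
    s/ +/ /g                             # 4. collapse runs of spaces into one
  '
}

printf '%s\n' '/The.Hunger.Games.(2012).[tmdbid-70160].mp4' | clean_name
```

One thing that may be worth checking on the in-place version: GNU sed appears to read -Eie as -E plus -i with a backup suffix of e, so it probably leaves a copy named movieliste next to the edited file. Writing the flags as -E -i -e should avoid that.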
I’m sure this could be done more elegantly, and I’m happy for any feedback on how to do that! For now, I’m just excited that I’m beginning to understand regex and how to use it!
Edit: fixed title so it didn’t say “regex expressions”
It is a great book, although a bit outdated. In particular, egrep is no longer recommended nowadays; grep -E is a more portable synonym.
Some notes on your script:
- You don’t need to escape slashes in a grep regex. In the sed s/// command it is better to use another delimiter character, as in s###, so you can leave slashes unescaped there too.
- You usually don’t need to pipe grep into sed: sed -n with a regex address and an explicit print command gives the same result as grep.
- You could omit the leading slash in your egrep regex, so you won’t need to remove it later.
So I would do the same with
tar -tzvf file.tar.gz | sed -En '/\.(mp4|mkv)$/{s#^.*/##; s#\.\[.*##; s#[^a-zA-Z0-9()&-]# #g; s/ +/ /g; p}'
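A quick way to check that one-liner without a tarball is to feed it a few made-up listing lines. The sample lines and the helper name list_movies are assumptions for illustration, and the block syntax ending in p} is assumed to be run with GNU sed.

```shell
# The sed program from the command above, wrapped so it can read any listing on stdin.
# -n suppresses default output; the /.../ address selects lines ending in
# .mp4 or .mkv (the grep part), and p prints them after the substitutions.
list_movies() {
  sed -En '/\.(mp4|mkv)$/{s#^.*/##; s#\.\[.*##; s#[^a-zA-Z0-9()&-]# #g; s/ +/ /g; p}'
}

# Hypothetical `tar -tzvf` style lines: a directory, a movie, and an unrelated file.
printf '%s\n' \
  'drwxr-xr-x user/group 0 2023-01-01 12:00 movies/' \
  '-rw-r--r-- user/group 1234567 2023-01-01 12:00 movies/The.Hunger.Games.(2012).[tmdbid-70160].mp4' \
  '-rw-r--r-- user/group 2048 2023-01-01 12:00 movies/poster.jpg' |
  list_movies
```

Only the movie line passes the address, so the directory and the poster are dropped without a separate grep. Note that s#^.*/## strips up to the last slash on the line, which relies on the file sitting inside a directory in the archive.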
Not directed at you personally, but this is the kind of pointless pedantry from upstream developers that grinds my gears.
Like, I’ve used egrep for 25 years. I don’t know of a still relevant Unix variant in existence that doesn’t have the egrep command. But suddenly now, when any other Unix variant but Linux is all but extinct, and all your shell scripts are probably full of bashisms and Linuxisms anyway, now there is somehow a portability problem, and they deem it necessary to print out a warning whenever I dare to run egrep instead of grep -E? C’mon now … If anything, they have just made it less portable by spitting out spurious warnings where there weren’t any before.
GNU grep, the most widespread implementation, has not included egrep, fgrep and rgrep for years. Distributions (not all, but many) provide shell scripts that simply run grep with the corresponding option for backward compatibility. You can learn this from the official documentation.
Also, my scripts are not full of bashisms, gnuisms, linuxisms and other -isms. I try to keep them portable unless it is really necessary to use some unportable command or syntax.
It seems you need to read the official documentation yourself. While it’s new information to me that egrep is no longer a symlink, as it used to be a couple of years ago, but a shell script wrapper to grep -E instead, the egrep command is to this day still provided by upstream GNU grep and is installed by default if you run ./configure; make; make install from source. So it is not a backward compatibility hack provided by the distribution.
You can check for yourself. Download the source from https://ftp.gnu.org/gnu/grep/grep-3.11.tar.gz, unpack and look for src/egrep.sh or line 1756 of src/Makefile. Apparently the change from symlink to shell script was done in 2014, and the deprecation warning was added only last year.
In any case, my larger point is that the deprecation of egrep was a pointless and arbitrary decision that does not benefit users, especially not veterans like myself who have become accustomed to its presence. I don’t mind change, but let’s be honest, most people are not in the habit of checking the minutiae of every little command line utility they use, so a change like this violates the principle of least surprise. It’s one thing if things are changed with a good reason and the users do not only suffer the inconvenience of the change but get to reap the benefits of it as well, but so far I haven’t found any justification for it, nor can I think of any.
So if there is a portability problem with using egrep now, it’s a self-inflicted portability problem that they caused by deprecating egrep in the first place.
Good for you. Do you want a cookie or something?
I don’t know about that guy but you need a chill-pill dude.
Well he wrote it like he wanted to be applauded for it or something.
I also find the irony of your comment extremely funny … although that’s probably lost on you.
Later, dude.
I did. Debian man page, GNU grep manual.
I’m sorry for your loss, however the egrep deprecation is a fact. Of course you can continue using it as a veteran, but it is not correct to recommend this to beginners.
You are strawmanning, and your links are not countering any point I made. I never disputed the deprecation as fact, and I never recommended that beginners should use egrep over grep -E.
I disputed your claims that the egrep command has just been a distro hack all these years, when in fact GNU to this day still distributes egrep through its source tarballs and only very recently started to warn about it through the wrapper script. And again, the only “portability problem” here is the fact that they deprecated it in the first place, i.e. a self-inflicted one.
Then as a Linux and Unix veteran I gave my subjective opinion by lamenting and criticizing the fact that this deprecation happened, and how changes like this always feel like unnecessary pedantry to me. Yes it’s an expression of frustration, but I am allowed to feel frustrated about it. I don’t need people like you invalidating how I feel about breaking changes in software that I use daily.