I’ve been reading Mastering Regular Expressions by Jeffrey E.F. Friedl, and since nobody in my life (aside from my wife) cares, I thought I’d share something I’m pretty proud of: my first set of regular expressions, which I wrote myself to manipulate the text I’m working with.
What I’m so happy about is that I wrote these expressions. I understand exactly what they do and the purpose of each character in each expression.
I’ve used regex in the past, mostly stuff cobbled together from Stack Overflow, but I never really understood how the expressions worked or what they meant, just that they did what I needed them to do at the time.
I’m only about 10% of the way through the book, but already I understand so much more than I ever did about regex (I also recognize I have a lot to learn).
I wrote the expressions to be used with egrep and sed to generate and clean up a list of filenames pulled out of tarballs (movies I’ve ripped from my DVD collection and tarballed to archive them).
The first expression I wrote was this one, used with tar and egrep to list the files in the tarball and get just the name of the video file:
tar -tzvf file.tar.gz | egrep -o '\/[^/]*\.m(kv|p4)' > movielist
That gives me a list of movies, of which this is an example:
/The.Hunger.Games.(2012).[tmdbid-70160].mp4
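For anyone who wants to poke at the pattern without a tarball handy, here is a minimal sketch that feeds it one made-up line in the style of `tar -tzvf` output. The sample line and the function name `extract_video_name` are assumptions for illustration, and `grep -Eo` is used as the current spelling of `egrep -o`. The slash is written without a backslash because it is not special to grep.

```shell
#!/bin/sh
# Hypothetical helper: print only the "/name.mkv" or "/name.mp4" part of each input line.
extract_video_name() {
    # /         a literal slash
    # [^/]*     any run of characters that are not a slash
    # \.        a literal period
    # m(kv|p4)  either "mkv" or "mp4"
    grep -Eo '/[^/]*\.m(kv|p4)'
}

# Made-up listing line, standing in for real `tar -tzvf` output.
sample='-rw-r--r-- user/user 123 2024-01-01 12:00 movies/The.Hunger.Games.(2012).[tmdbid-70160].mp4'

printf '%s\n' "$sample" | extract_video_name
# prints: /The.Hunger.Games.(2012).[tmdbid-70160].mp4
```

The slash in `user/user` does not produce a match because no `.mkv` or `.mp4` follows it before the next slash, so only the last path component is printed.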
Then I used sed with the expression groups to remove:
- the leading forward slash
- everything from .[ to the end
- all of the periods in between words
And the last expression checks for one or more spaces and replaces them with a single space.
This is the full sed command:
sed -Eie 's/^\///; s/\.\[[a-z]+-[0-9]+\]\.m(p4|kv)//; s/[^a-zA-Z0-9\(\)&-]/ /g; s/ +/ /g' movielist
Which leaves me with a pretty list of movies that looks like this:
The Hunger Games (2012)
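To try the substitutions on a single name without touching movielist, the same four expressions can be run on standard input. This is only a sketch: the function name `clean_title` is made up, and splitting the script into separate `-e` arguments is just a way to put a comment next to each step.

```shell
#!/bin/sh
# Hypothetical helper: apply the four substitutions to stdin instead of editing a file in place.
#   1. drop the leading slash
#   2. drop ".[tag-digits].mp4" or ".[tag-digits].mkv"
#   3. turn every character outside the bracketed set into a space (this catches the periods)
#   4. squeeze runs of spaces down to one space
clean_title() {
    sed -E \
        -e 's/^\///' \
        -e 's/\.\[[a-z]+-[0-9]+\]\.m(p4|kv)//' \
        -e 's/[^a-zA-Z0-9\(\)&-]/ /g' \
        -e 's/ +/ /g'
}

printf '%s\n' '/The.Hunger.Games.(2012).[tmdbid-70160].mp4' | clean_title
# prints: The Hunger Games (2012)
```

Because the function reads from a pipe, it could also sit directly after the egrep stage, which would skip the intermediate file if that is not needed.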
I’m sure this could be done more elegantly, and I’m happy for any feedback on how to do that! For now, I’m just excited that I’m beginning to understand regex and how to use it!
Edit: fixed title so it didn’t say “regex expressions”
I can also recommend the book the TS mentioned. It is very good, and after reading it you will understand regular expressions. It’s fine to use a cheat sheet if you want, because if you don’t use regex regularly the knowledge can fade, but the understanding is what matters. Also, depending on the context, different implementations can have slightly different syntax or modifiers to be aware of.
I lent the book to my brother once and he somehow lost it, so I never got it back. Don’t lend out books, guys.
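A small illustration of the implementation point, as a sketch: plain grep uses the basic syntax, where parentheses and the pipe are ordinary characters, while `grep -E` uses the extended syntax from the post. The helper names `matches_basic` and `matches_extended` and the sample name are made up for the example.

```shell
#!/bin/sh
# Hypothetical helpers: succeed if the pattern ($1) is found in the text ($2).
matches_basic()    { printf '%s\n' "$2" | grep -q -e "$1"; }   # basic syntax
matches_extended() { printf '%s\n' "$2" | grep -Eq -e "$1"; }  # extended syntax

name='The.Hunger.Games.(2012).mp4'

# Basic syntax looks for the literal text "m(kv|p4)", which is not in the name.
if matches_basic 'm(kv|p4)' "$name"; then echo 'basic: match'; else echo 'basic: no match'; fi
# prints: basic: no match

# Extended syntax treats ( | ) as grouping and alternation, so "mp4" matches.
if matches_extended 'm(kv|p4)' "$name"; then echo 'extended: match'; else echo 'extended: no match'; fi
# prints: extended: match
```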
And remember not everything can be solved using a regular expression: https://xkcd.com/1171/