
Capitalizing words and changing case using regex

Submitted by Druss on Sun, 2014-08-03 11:16

Let's assume that we have some sample text like the following:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Now, say that we want to convert all characters in this block to upper-case using a regular expression. To do this, we would use something like:

s/(\w+)/\U\1/g

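This matches each run of "word" characters (\w, i.e., letters, digits and the underscore) and upper-cases the captured text using \U in the replacement. Spaces and punctuation are not matched, so they pass through unchanged. Note that \U and its relatives are not universal: they are supported in the replacement part of a substitution by tools such as Perl, GNU sed and Vim. This gives us: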

LOREM IPSUM DOLOR SIT AMET, CONSECTETUR ADIPISICING ELIT, SED DO EIUSMOD TEMPOR INCIDIDUNT UT LABORE ET DOLORE MAGNA ALIQUA. UT ENIM AD MINIM VENIAM, QUIS NOSTRUD EXERCITATION ULLAMCO LABORIS NISI UT ALIQUIP EX EA COMMODO CONSEQUAT. DUIS AUTE IRURE DOLOR IN REPREHENDERIT IN VOLUPTATE VELIT ESSE CILLUM DOLORE EU FUGIAT NULLA PARIATUR. EXCEPTEUR SINT OCCAECAT CUPIDATAT NON PROIDENT, SUNT IN CULPA QUI OFFICIA DESERUNT MOLLIT ANIM ID EST LABORUM.

To accomplish the opposite, i.e., convert all these words to lower-case, we'd use:

s/(\w+)/\L\1/g

This would give us:

lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
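
Not every regex engine understands \U and \L in the replacement. Python's re module, for example, does not, so there the same effect is usually obtained by passing a function as the replacement. The following is only a rough sketch of that swapped-in approach, using a shortened sample sentence; the variable names are my own:

```python
import re

text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit."

# Rough equivalent of s/(\w+)/\U\1/g: upper-case every run of word characters.
upper = re.sub(r"(\w+)", lambda m: m.group(1).upper(), text)

# Rough equivalent of s/(\w+)/\L\1/g: lower-case every run of word characters.
lower = re.sub(r"(\w+)", lambda m: m.group(1).lower(), text)

print(upper)  # LOREM IPSUM DOLOR SIT AMET, CONSECTETUR ADIPISICING ELIT.
print(lower)  # lorem ipsum dolor sit amet, consectetur adipisicing elit.
```

The commas, spaces and full stop are never matched by \w+, so they are copied through untouched, just as with the substitutions above.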

This is all well and good. However, the more common need is to capitalise either only the first letter of each sentence (sentence case) or the first letter of each word (title case), often leaving prepositions in lower case. While this could perhaps be done by matching the characters that follow spaces, there is a replacement escape that makes our job easier.

To convert only the first letter of each match to upper-case, use:

s/(\w+)/\u\1/g

This gives us:

Lorem Ipsum Dolor Sit Amet, Consectetur Adipisicing Elit, Sed Do Eiusmod Tempor Incididunt Ut Labore Et Dolore Magna Aliqua. Ut Enim Ad Minim Veniam, Quis Nostrud Exercitation Ullamco Laboris Nisi Ut Aliquip Ex Ea Commodo Consequat. Duis Aute Irure Dolor In Reprehenderit In Voluptate Velit Esse Cillum Dolore Eu Fugiat Nulla Pariatur. Excepteur Sint Occaecat Cupidatat Non Proident, Sunt In Culpa Qui Officia Deserunt Mollit Anim Id Est Laborum.

The converse, \l (a lower-case L), converts only the first letter of a match to lower case.
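
For engines without \u and \l, the same idea can be sketched with a replacement function. The Python below is an illustration under that assumption, and the helper names upper_first and lower_first are hypothetical. It deliberately avoids str.capitalize() and str.title(), since those also lower-case the rest of each word, which \u does not do:

```python
import re

def upper_first(match):
    # Like \u\1: upper-case the first character, leave the rest unchanged.
    word = match.group(1)
    return word[0].upper() + word[1:]

def lower_first(match):
    # Like \l\1: lower-case the first character, leave the rest unchanged.
    word = match.group(1)
    return word[0].lower() + word[1:]

print(re.sub(r"(\w+)", upper_first, "lorem ipsum DOLOR sit amet"))
# Lorem Ipsum DOLOR Sit Amet
print(re.sub(r"(\w+)", lower_first, "Lorem Ipsum DOLOR"))
# lorem ipsum dOLOR
```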

There is probably some lengthy regex, with lots of look-aheads and look-behinds, that can format a string in title case in one go. But I'm happy to break the job up into multiple steps.
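
As one possible way of breaking it up, here is a small Python sketch that runs three passes: lower-case everything, capitalise the first letter of every word, then put a handful of small words back into lower case unless they start the string. The SMALL_WORDS list and the title_case name are my own assumptions for illustration, not a standard:

```python
import re

# Hypothetical list of words to leave in lower case; adjust to taste.
SMALL_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to"}

def title_case(text):
    # Step 1: lower-case every word (the \L step).
    text = re.sub(r"\w+", lambda m: m.group(0).lower(), text)
    # Step 2: upper-case the first letter of every word (the \u step).
    text = re.sub(r"\w+", lambda m: m.group(0)[0].upper() + m.group(0)[1:], text)

    # Step 3: restore small words to lower case, except at the very start.
    def restore_small(m):
        word = m.group(0)
        if m.start() > 0 and word.lower() in SMALL_WORDS:
            return word.lower()
        return word

    return re.sub(r"\w+", restore_small, text)

print(title_case("the lord OF the rings"))  # The Lord of the Rings
```

This only protects a small word at the very beginning of the string; handling the first word after each full stop would need a further pass.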

Hope this helps.
