David Y.
How do I split a string into a list of words using Python?
We can do this using the string method split:
sentence = 'Jackdaws love my big sphinx of quartz.'
wordlist = sentence.split()
print(wordlist)  # will print ['Jackdaws', 'love', 'my', 'big', 'sphinx', 'of', 'quartz.']
The split method takes an optional sep argument, which lets us specify a substring to treat as the separator between items in our list. If no separator is specified, as in the above example, the string is split on each run of one or more whitespace characters (such as spaces, tabs, and newlines), and any leading or trailing whitespace is ignored.
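To make the difference between the default behavior and an explicit sep concrete, here is a small sketch using made-up strings. It relies only on the built-in str.split; the point is that an explicit separator is matched exactly, so it does not collapse runs the way the default does.

```python
# Default split(): any run of whitespace counts as one separator,
# and leading/trailing whitespace is dropped.
messy = "  Jackdaws\tlove \n my  big sphinx  "
print(messy.split())
# ['Jackdaws', 'love', 'my', 'big', 'sphinx']

# Explicit sep: every occurrence is a separator, so two adjacent
# separators produce an empty string between them.
csv_line = "quartz,,sphinx,jackdaw"
print(csv_line.split(","))
# ['quartz', '', 'sphinx', 'jackdaw']

# sep can be a multi-character substring.
dashed = "love--my--sphinx"
print(dashed.split("--"))
# ['love', 'my', 'sphinx']

# Even a single space as sep differs from the default.
double_space = "my  big"
print(double_space.split(" "))
# ['my', '', 'big']
```

For splitting prose into words, calling split() with no argument is usually what you want.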
Splitting the string on whitespace leaves sentence punctuation such as periods and commas attached to the neighboring word, as with 'quartz.' in the example. If this is acceptable, you can stick with split. But if you would like to remove punctuation from wordlist, use a regular expression with re.findall to build a list containing all the words in the string. For example:
import re

sentence = 'Jackdaws love my big sphinx of quartz.'
wordlist = re.findall(r'\b\w+\b', sentence)
print(wordlist)  # will print ['Jackdaws', 'love', 'my', 'big', 'sphinx', 'of', 'quartz']
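The \w class in this pattern matches more than ASCII letters. A quick check on a made-up sentence suggests what to expect: in Python 3, for str patterns, \w also matches digits, the underscore, and accented letters by default, while the re.ASCII flag narrows it to [a-zA-Z0-9_].

```python
import re

pattern = r"\b\w+\b"
text = "Café prices rose 12% in 2023_Q4!"

# Default (Unicode) matching: accented letters, digits, and "_" are word characters.
unicode_words = re.findall(pattern, text)
print(unicode_words)
# ['Café', 'prices', 'rose', '12', 'in', '2023_Q4']

# re.ASCII: "é" is no longer a word character, so "Café" is cut short.
ascii_words = re.findall(pattern, text, flags=re.ASCII)
print(ascii_words)
# ['Caf', 'prices', 'rose', '12', 'in', '2023_Q4']
```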
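The regular expression \b\w+\b matches every run of one or more word characters (\w+, where a word character is a letter, digit, or underscore) that lies between word boundaries (\b). A word boundary is the zero-width position between a word character and a non-word character such as a space, comma, or period, or the start or end of the string. However, apostrophes and hyphens are also non-word characters, so this code will split words like "Jack's" and "mother-in-law" into pieces, which you may not want. To avoid that, you need to make the regular expression a little more complicated: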
import re

sentence = "Jack's mother-in-law's favorite cozy tavern was a quaint pub where zebras, lynxes, and quokkas danced."
wordlist = re.findall(r"\b\w+(?:[-']\w+)*\b", sentence)
print(wordlist)
# will print ["Jack's", "mother-in-law's", 'favorite', 'cozy', 'tavern', 'was', 'a', 'quaint', 'pub', 'where', 'zebras', 'lynxes', 'and', 'quokkas', 'danced']
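Here, we’ve added (?:[-']\w+)* to the regular expression. It lets a word continue with zero or more extra pieces, each made of a hyphen or apostrophe followed by one or more word characters, so a hyphen or apostrophe is kept only when it sits between word characters. The (?:...) syntax makes the group non-capturing; with a plain capturing group, re.findall would return only the group's contents instead of the whole match. Our code now splits the sentence into a list of words, keeping words with internal apostrophes and hyphens intact while discarding commas, periods, and other sentence-level punctuation. The regular expression can be further tweaked to serve different use cases.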
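As one example of such a tweak, the sketch below runs both patterns from this answer on a short sample, plus a hypothetical variant (not part of the original answer) that also accepts the typographic apostrophe U+2019, which word processors often substitute for a straight "'". The sample string is made up for illustration.

```python
import re

# The two patterns used in the examples above.
SIMPLE = r"\b\w+\b"
EXTENDED = r"\b\w+(?:[-']\w+)*\b"
# Hypothetical tweak: also allow the curly apostrophe (U+2019) inside words.
EXTENDED_CURLY = r"\b\w+(?:[-'\u2019]\w+)*\b"

# One straight apostrophe and one curly apostrophe.
sample = "Jack's mother-in-law\u2019s pub"

for label, pattern in [("simple", SIMPLE), ("extended", EXTENDED), ("curly", EXTENDED_CURLY)]:
    print(label, re.findall(pattern, sample))
# simple ['Jack', 's', 'mother', 'in', 'law', 's', 'pub']
# extended ["Jack's", 'mother-in-law', 's', 'pub']
# curly ["Jack's", 'mother-in-law’s', 'pub']
```

The straight-apostrophe pattern stops at the curly apostrophe and leaves a stray 's', which is a common surprise with text pasted from word processors.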