How to Remove Punctuation from Python String
This blog explains how to remove punctuation from a Python string. We cover three methods here: string.punctuation, regular expressions, and translate(). Let's explore!
Various types of processing are required to make data useful when dealing with huge amounts of data. Raw data often contain irrelevant and unfiltered information, which must be sorted and filtered before further processing. A typical example is removing unwanted characters from strings, which is required for data mining, web scraping, and machine learning. Cleaning strings could involve eliminating particular text from a long string or discarding unnecessary symbols that aren’t needed for processing. This article will explore different methods to remove punctuation from Python strings.
We will be covering the following sections:
- How to Remove Punctuation from Python String?
- Using string.punctuation
- Using Regular Expression
- Using translate()
How to Remove Punctuation from Python String?
Written English uses a number of symbols, such as hyphens -, underscores _, exclamation marks !, question marks ?, commas ,, colons :, semicolons ;, parentheses (), braces {}, and others, which are referred to as punctuation marks. These symbols help disambiguate the meaning of words and sentences. Although we often receive information as natural English text, punctuation can complicate further processing, so it is often necessary to strip these symbols from the text first. In this article, we will explore how to remove them.
There are several ways to remove punctuation from a string in Python. We will be discussing the three most common methods below:
Using string.punctuation
The string module in Python includes a constant string named “punctuation” that contains all the common punctuation marks. We can utilize this to remove the punctuation from a given string by iterating through each character in the string and checking if it is in string.punctuation. If it is not, we add it to a new string.
Here is an example:
import string

# Sample string
my_string = "Hello! How are you? I'm doing well, thanks."

# Remove punctuation
new_string = ""
for char in my_string:
    if char not in string.punctuation:
        new_string += char

# Output
print(new_string)
Output:
Hello How are you Im doing well thanks
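The loop above builds the result through repeated string concatenation. A slightly more idiomatic sketch of the same idea filters the characters with a generator expression and joins them once. The helper name remove_punctuation is an illustrative choice for this example, not part of the standard library.

```python
import string

# string.punctuation is a plain str holding the 32 ASCII punctuation characters
print(string.punctuation)


def remove_punctuation(text):
    """Return text with every character found in string.punctuation dropped."""
    # Hypothetical helper name, used for illustration only
    punctuation = set(string.punctuation)  # a set gives fast membership tests
    return "".join(char for char in text if char not in punctuation)


my_string = "Hello! How are you? I'm doing well, thanks."
print(remove_punctuation(my_string))
```

Note that string.punctuation only covers ASCII symbols, so characters such as curly quotes would pass through this filter unchanged.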
Using Regular Expression
Regular expressions, often shortened to regex, are sequences of characters that describe a pattern to search for within a string. Python's re module provides functions for working with them. In the following code, we use regex pattern matching to eliminate punctuation from a Python string.
This approach identifies every character that is neither a word character (a letter, digit, or underscore) nor whitespace, and replaces it with an empty string. This is achieved through the re.sub() function, which performs the substitution.
The syntax of re.sub() is given as:
re.sub(pattern, replacement, string)
The re.sub() function takes the following arguments:
- pattern: regular expression pattern to match
- replacement: a string, or a function that returns a string, that will be substituted in place of each match
- string: the actual string to perform substitutions on
The return value of re.sub() is a string with the newly applied substitutions.
Here is an example:
import re

# Sample string
my_string = "Hello! How are you? I'm doing well, thanks."

# Remove punctuation
new_string = re.sub(r'[^\w\s]', '', my_string)

# Output
print(new_string)
Output:
Hello How are you Im doing well thanks
In this example, the r'[^\w\s]' regular expression pattern matches every character that is neither a word character nor whitespace, and re.sub() replaces each match with an empty string. Because \w also matches the underscore, underscores are left in place by this pattern.
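One practical difference from string.punctuation is how the underscore is handled. The short sketch below, using a made-up sample string, compares the pattern from the example with a variant that adds `_` as an alternative so underscores are removed too.

```python
import re

sample = "snake_case, kebab-case & more!"

# \w matches letters, digits, and the underscore, so "_" survives this pattern
underscore_kept = re.sub(r"[^\w\s]", "", sample)

# Adding "_" as an alternative removes underscores as well
underscore_removed = re.sub(r"[^\w\s]|_", "", sample)

print(underscore_kept)
print(underscore_removed)
```

Whitespace is untouched in both cases, so removing a standalone symbol such as `&` leaves the spaces on either side of it behind.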
Please note that a regular expression is usually not the fastest option for this task. The regex engine carries pattern-matching overhead that a plain character-deletion table, as used by translate(), does not need. Actual timings depend on the Python version and the input, so measure with the timeit module if performance matters.
Using translate()
We can use the translate() method to remove punctuation from a Python string. The translate() method can replace specified characters with other characters or delete them altogether.
Here is an example:
import string

# Sample string
my_string = "Hi, Welcome aboard! You're in for a ride of your life. Are you ready?"

# Create a translation table
translator = str.maketrans('', '', string.punctuation)

# Remove punctuation
new_string = my_string.translate(translator)

# Output
print(new_string)
Output:
Hi Welcome aboard Youre in for a ride of your life Are you ready
In this example, we create a translation table using the str.maketrans() method and pass it the string.punctuation constant as the third argument. This creates a translation table that maps all punctuation characters to None, effectively deleting them. We then pass this translation table to the translate() method to remove the punctuation from the original string.
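To make the "maps to None" behaviour concrete, the sketch below inspects one entry of the table and then builds a variation that keeps apostrophes, so contractions stay intact. The name keep_apostrophes is an illustrative choice, and whether apostrophes should be kept depends on the task.

```python
import string

# The third argument lists characters to delete; each one is mapped to None
translator = str.maketrans("", "", string.punctuation)
print(translator[ord("!")])

# Variation: drop the apostrophe from the delete list so "You're" survives
keep_apostrophes = str.maketrans("", "", string.punctuation.replace("'", ""))

my_string = "Hi, Welcome aboard! You're in for a ride of your life. Are you ready?"
cleaned = my_string.translate(keep_apostrophes)
print(cleaned)
```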
This article discussed three methods to remove punctuation from a Python string. The translate() method is generally the fastest and most compact of the three, and the string.punctuation loop is the easiest to follow. The regex method is usually not the quickest for this task, but it remains useful for more complex pattern-matching tasks.
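Performance claims like these are easy to check on your own machine. The following is a rough timing sketch using the standard timeit module. The function names, the repeated sample text, and the run count are arbitrary choices for illustration, and the numbers will vary with the Python version and the input.

```python
import re
import string
import timeit

# Repeat the sample sentence so each call does a measurable amount of work
TEXT = "Hello! How are you? I'm doing well, thanks." * 100

PUNCT_PATTERN = re.compile(r"[^\w\s]")
TRANSLATOR = str.maketrans("", "", string.punctuation)


def with_loop(text):
    # Character-by-character filter against string.punctuation
    result = ""
    for char in text:
        if char not in string.punctuation:
            result += char
    return result


def with_regex(text):
    # Precompiled pattern, equivalent to re.sub(r"[^\w\s]", "", text)
    return PUNCT_PATTERN.sub("", text)


def with_translate(text):
    # Deletion table built once with str.maketrans()
    return text.translate(TRANSLATOR)


for func in (with_loop, with_regex, with_translate):
    seconds = timeit.timeit(lambda: func(TEXT), number=200)
    print(f"{func.__name__}: {seconds:.4f} s")
```

All three functions return the same result for this sample. They can differ on other input, since the regex pattern keeps underscores and removes non-ASCII symbols, while string.punctuation does the opposite.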
Endnotes
Overall, the method used to remove punctuation from a string will depend on the specific requirements of the task and the trade-off between simplicity and performance. I hope this article was helpful to you. You can explore related articles if you wish to learn more about Python and practice Python programming.
Contributed By: Prerna Singh