How to Remove Punctuation from Python String

Updated on Mar 31, 2023 14:44 IST

This blog explains how to remove punctuation from a Python string. We cover three methods here: string.punctuation, regular expressions, and translate(). Let’s explore!

Large amounts of data usually need several kinds of processing before they become useful. Raw data often contains irrelevant and unfiltered information, which must be sorted and filtered before further processing. A typical example is removing unwanted characters from strings, which is required for data mining, web scraping, and machine learning. Cleaning strings could involve eliminating particular text from a long string or discarding symbols that aren’t needed for processing. This article will explore different methods to remove punctuation from Python strings.

How to Remove Punctuation from Python String? 

Written English uses a number of symbols, such as hyphens -, underscores _, exclamation marks !, question marks ?, commas ,, colons :, semicolons ;, braces {}, parentheses (), and others, which are referred to as punctuation marks. These symbols help disambiguate the meaning of words and sentences. Although we often receive information as natural English text, punctuation can complicate further processing. Therefore, it is often necessary to eliminate these symbols from the text. In this article, we will explore the process of removing them.

There are several ways to remove punctuation from a string in Python. We will be discussing the three most common methods below: 

Using string.punctuation 

The string module in Python includes a string constant named punctuation that contains the 32 ASCII punctuation characters. We can use it to remove punctuation from a given string by iterating through each character and checking whether it is in string.punctuation. If it is not, we add it to a new string.

Here is an example: 


import string

# Sample string
my_string = "Hello! How are you? I'm doing well, thanks."

# Remove punctuation: keep only characters that are not in string.punctuation
new_string = ""
for char in my_string:
    if char not in string.punctuation:
        new_string += char

# Output
print(new_string)

Output: 

Hello How are you Im doing well thanks 
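The same idea can be written more compactly with str.join(). The sketch below is only an illustration: the helper name remove_punctuation and the PUNCTUATION set are our own, not part of the standard library. It also shows that string.punctuation covers ASCII punctuation only, so a curly apostrophe is left in place.

```python
import string

# The constant is a plain str holding the 32 ASCII punctuation characters.
print(string.punctuation)

# A set gives constant-time membership tests (our own helper name).
PUNCTUATION = set(string.punctuation)


def remove_punctuation(text):
    """Return text with ASCII punctuation removed (hypothetical helper)."""
    return "".join(char for char in text if char not in PUNCTUATION)


print(remove_punctuation("Hello! How are you? I'm doing well, thanks."))
# Hello How are you Im doing well thanks

# Non-ASCII punctuation, such as the curly apostrophe, is not in the constant.
print(remove_punctuation("Let’s explore!"))
# Let’s explore
```

If your text contains typographic quotes or dashes, you would need to extend the set yourself.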

Using Regular Expression 

Regular expressions, often shortened to regex, are widely used for pattern matching. A regular expression is a sequence of characters that describes a pattern to search for within a given string. In the following code, we use regex pattern matching to eliminate punctuation from a Python string. Python’s re module provides the functions for working with regular expressions.

This approach identifies every character that is neither a word character (a letter, digit, or underscore) nor whitespace, and replaces it with an empty string. Because the underscore counts as a word character, it is not removed. The substitution is performed by the re.sub() function.

The syntax of re.sub() is given as: 


re.sub(pattern, replacement, string)

The re.sub() function takes the following arguments: 

  • pattern: regular expression pattern to match 
  • replacement: a string, or a function that takes a match object and returns a string, that will be substituted in place of each match 
  • string: the actual string to perform substitutions on 

The return value of re.sub() is a string with the newly applied substitutions. 
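As a quick illustration of these arguments, the sketch below calls re.sub() with a string replacement, with the optional count keyword, and with a function as the replacement. The sample text and the to_space function are made up for this example.

```python
import re

text = "a-b-c-d"

# Replacement as a plain string: every match is replaced.
print(re.sub(r"-", "", text))           # abcd

# The optional count keyword limits how many matches are replaced.
print(re.sub(r"-", "", text, count=2))  # abc-d


# Replacement as a function: it receives the match object
# and returns the replacement string.
def to_space(match):
    return " "


print(re.sub(r"-", to_space, text))     # a b c d
```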

Here is an example: 


import re

# Sample string
my_string = "Hello! How are you? I'm doing well, thanks."

# Remove punctuation
new_string = re.sub(r'[^\w\s]', '', my_string)

# Output
print(new_string)

Output: 

Hello How are you Im doing well thanks 

In this example, the regular expression pattern r'[^\w\s]' matches every character that is neither a word character nor whitespace, and re.sub() replaces each match with an empty string. 
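One detail worth checking is how this pattern treats underscores and accented letters. The sketch below compares it with a character class built from string.punctuation; the sample string and the name ascii_punct are our own, chosen for illustration.

```python
import re
import string

sample = "snake_case, café & co."

# \w keeps letters, digits and the underscore, so "_" survives this pattern.
print(re.sub(r"[^\w\s]", "", sample))
# snake_case café  co

# Building the character class from string.punctuation removes "_" as well.
# re.escape() escapes the characters that are special inside a pattern.
ascii_punct = re.compile("[" + re.escape(string.punctuation) + "]")
print(ascii_punct.sub("", sample))
# snakecase café  co
```

Which of the two behaviours you want depends on whether underscores are meaningful in your data.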

Please note that regular expressions are usually not the fastest way to perform this task. re.sub() scans the string only once, but every character still passes through a general-purpose pattern-matching engine, whereas translate() only has to look each character up in a table. Actual timings depend on the input and the Python version, so it is worth measuring before choosing a method for performance reasons. 
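Speed claims like this are easy to check on your own data. Below is a minimal timing sketch using the standard timeit module; the function names, the test text, and the repetition counts are arbitrary choices for illustration, and the numbers it prints will vary between machines and Python versions.

```python
import re
import string
import timeit

# Arbitrary test input: the sample sentence repeated many times.
text = "Hello! How are you? I'm doing well, thanks. " * 1000

PUNCT_SET = set(string.punctuation)
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def with_loop(s):
    return "".join(ch for ch in s if ch not in PUNCT_SET)


def with_regex(s):
    return PUNCT_RE.sub("", s)


def with_translate(s):
    return s.translate(PUNCT_TABLE)


# All three must agree before the timings mean anything.
assert with_loop(text) == with_regex(text) == with_translate(text)

for func in (with_loop, with_regex, with_translate):
    seconds = timeit.timeit(lambda: func(text), number=100)
    print(f"{func.__name__}: {seconds:.4f} s")
```

Note that the regex here is built from string.punctuation so that all three functions remove exactly the same characters.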

Using translate() 

We can also use the translate() method to remove punctuation from a Python string. The translate() method replaces specified characters with other characters or deletes them altogether, according to a translation table.

Here is an example: 


import string

# Sample string
my_string = "Hi, Welcome aboard! You're in for a ride of your life. Are you ready?"

# Create a translation table
translator = str.maketrans('', '', string.punctuation)

# Remove punctuation
new_string = my_string.translate(translator)

# Output
print(new_string)

Output: 

Hi Welcome aboard Youre in for a ride of your life Are you ready 

In this example, we create a translation table using the str.maketrans() method and pass it the string.punctuation constant as the third argument. This creates a translation table that maps all punctuation characters to None, effectively deleting them. We then pass this translation table to the translate() method to remove the punctuation from the original string. 
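To see what the table contains, and as a variation, the sketch below inspects one entry and then builds a second table that maps punctuation to spaces instead of deleting it. The variable names and the sample text are our own; mapping to spaces is an alternative you might prefer when punctuation is the only thing separating words.

```python
import string

# Three-argument form: every character in the third argument maps to None.
delete_table = str.maketrans("", "", string.punctuation)
print(delete_table[ord("!")])  # None

# Two-argument form: map each punctuation character to a space instead,
# which keeps words apart when punctuation is the only separator.
space_table = str.maketrans(string.punctuation, " " * len(string.punctuation))

text = "well-known,rock-solid"
print(text.translate(delete_table))         # wellknownrocksolid
print(text.translate(space_table).split())  # ['well', 'known', 'rock', 'solid']
```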

This article discussed three methods to remove punctuation from a Python string. The string.punctuation loop and translate() are simple to write, and translate() is usually the fastest of the three, while the regex method remains useful for more complex pattern-matching tasks. 

Endnotes 

Overall, the method used to remove punctuation from a string will depend on the specific requirements of the task and the trade-off between simplicity and performance. I hope this article was helpful to you. You can explore related articles if you wish to learn more about Python and practice Python programming. 

Contributed By: Prerna Singh
