
This is my first post in this community and I am excited.

Environment

I am using a Notebook in Microsoft Fabric. The language is PySpark.

Objective

I want to convert a column that contains RTF-formatted text to plain text.

pip install striprtf

# Import the modules needed
from striprtf.striprtf import rtf_to_text
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

# Wrap function in UDF
my_udf = udf(lambda x:rtf_to_text(x), StringType())

# Create new column with plaintext
df.withColumn("plaintext", my_udf(col("formattedtext"))) \
  .show(truncate=False)

Challenge

I have installed the striprtf module, imported the required functions, and defined a UDF. Even so, the last command fails with the error "No module named 'striprtf'". If I call rtf_to_text directly on a variable, it works.
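The symptom above (the function works on a plain variable but fails inside the UDF) would be consistent with the package being visible to the driver's Python process but not to the executors' Python processes. The following is a minimal diagnostic sketch for checking that, not a confirmed fix. The helper name module_available is made up for illustration, and the commented-out part assumes the notebook's built-in `spark` session.

```python
from importlib.util import find_spec


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by this Python process."""
    return find_spec(name) is not None


# Driver-side check: this runs in the notebook's own Python process.
print("driver has striprtf:", module_available("striprtf"))

# Hypothetical executor-side check (assumes the notebook's `spark` session).
# Left commented out so the snippet also runs without a cluster:
# flags = (
#     spark.sparkContext.parallelize(range(4), 4)
#     .map(lambda _: module_available("striprtf"))
#     .collect()
# )
# print("executors have striprtf:", flags)
```

If the driver check is true while the executor check returns false values, the install only reached the driver. In that case it may be worth trying the install as a %pip install striprtf magic in its own cell, or attaching the library through the workspace environment. Both are suggestions I have not verified in this setup.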

  • Has the library loaded OK? Try pip show striprtf to check. Commented Jun 18, 2024 at 11:35
  • Thanks, Jon. I didn't know that command. The library loaded okay. Do I have to take the note "you may need to restart the kernel to use updated packages" into account? I get it every time I run the script. Commented Jun 19, 2024 at 13:56
