1,988 questions
0
votes
1
answer
94
views
How to format a vocabulary list into a table using Python for Google Sheets
What are the details of your problem?
I am a teacher and I want to use Python to create a worksheet for my students. I have a vocabulary PDF with content like this:
do your best duː jɔː best
33, 81
do ...
0
votes
0
answers
100
views
How to remove duplicate footers from pdf text
ASP.NET Core 9 MVC / C# controller extracts texts from invoices using pdfpig based on code in answer How to group text to lines if there is small difference in Y position.
Invoices can have multiple ...
0
votes
0
answers
42
views
Swift Regex -- RegexBuilder for extracting blocks from enclosed in tags
I'm trying to use RegexBuilder/Swift to write a Swift method that extracts for example lists enclosed by <ul> and </ul> from an HTML-string.
In this example
let htmlText = ""&...
1
vote
1
answer
82
views
How do I remove escape characters from output of nltk.word_tokenize?
How do I get rid of non-printing (escaped) characters from the output of the nltk.word_tokenize method? I am working through the book 'Natural Language Processing with Python' and am following the ...
2
votes
2
answers
97
views
How can sort a file on specific lines, preserving headers?
raw txt file contains these lines:
cat raw.txt
ID DESCRIPTION
----- --------------
2 item2
4 item4
1 item1
3 item3
How can reorder it by ID as ...
3
votes
1
answer
149
views
Parse text file, change some strings to camel case, add other strings - follow up question
Note that this is the follow up question of Parse text file, change some strings to camel case, add other strings .
The parsing rules are similar but different:
The input order in the output is ...
2
votes
1
answer
96
views
How to Dynamically Map Directory-Based Identifiers to a Specific Column in Annotated Output Files?
Problem:
I have a two-step bioinformatics pipeline where:
Code 1 generates output files (.marked.bam) and places them into a directory structure.
Code 2 processes annotated files (annotated....
-5
votes
1
answer
120
views
How to Separate Text and Code in Python Strings?
I've encountered an issue in python. I have a string that contains both a message and code, and I need to separate them and pass each to different functions. An example:
text = """
Can ...
0
votes
0
answers
140
views
Accented characters in Presidio deny list not being recognized despite correct YAML encoding
I'm using Microsoft Presidio for text analysis and anonymization, and I have a configuration file (all-config.yml) to specify recognizers, including some deny lists with accented characters. However, ...
-1
votes
1
answer
43
views
Vector search to get a uniqueness score based on context
I have a single blog post with a title and description, and I want to compare its uniqueness against multiple blog entries in a CSV file. The CSV contains several blogs, each with a title and meta ...
-2
votes
2
answers
133
views
Keep most common line from each set of duplicates of a column
I have a relatively complicated Bash problem. I have a two-column CSV file that contains duplicate values in the first column, as well as duplicates within those duplicate values (in the second column)...
0
votes
2
answers
703
views
convert list of word in github actions into json array and use as a strategy matrix
I have in github actions a list of words with spaces in the shell Bash. e.g.
hello1 hello2 hello3
The goal is to convert this list to a JSON array format, write it to the output variable and use it ...
1
vote
1
answer
74
views
A regex line to remove whitespaces unless within double quotes, taking into account escaped double quotes
I am parsing some game config files using Python and putting it all in dictionaries. It seemed to all work well until I encountered the following edge-case:
random_owned_controlled_state = {
...
0
votes
0
answers
554
views
Creating a Python script to mask sensitive data in a flat log file
I want to mask all types of sensitive data (usernames, passwords, api keys, DB connection strings, endpoints, secrets, and even any custom variables containing secrets) present in a flat log file.
...
0
votes
1
answer
124
views
Return sentences from list of sentences using user specified keyword
I got a list of sentences (roughly 20000) stored in excel file named list.xlsx and sheet named Sentence under column name named Sentence.
My intention is to get words from user and return those ...
1
vote
1
answer
802
views
how to use jq to print one-line per root-level object key?
I would like to compress the space for a json file by printing in compact mode (-c) but I want to add a new line after each root-level object.
for example, for the below object
{
"a": {
...
0
votes
1
answer
67
views
How to print specific values of the etherscan API output with python
I use the following code to request data from etherscan API
from etherscan import Etherscan
eth = Etherscan("API_KEY_HERE") # key in quotation marks
response = eth.get_normal_txs_by_address(...
-1
votes
1
answer
978
views
Split address text into components using Machine Learning
I have a CSV file with each row representing the different components of an address, such as City, Street, House No, etc., and then a column with a combined address in one line, with a predefined ...
-4
votes
1
answer
82
views
awk find/print paragraph containing multiple patterns [closed]
Request:
Extract blocks of text that contain 2 or more search terms, something akin to [ AND ] logical operator in [ awk ].
Preferably run as awk in bash/zsh function (but also ok with standalone awk ...
1
vote
1
answer
58
views
Using a Word Counter in Python is understating results
As a complete preface, I am a beginner and learning. But, here's the sample schema of my products review table.
Record_ID
Product_ID
Review Comment
1234
89847457
I love this product it was shipped ...
-2
votes
1
answer
44
views
How to merge some elements in a python list based on some features
This is a list, each elements consists of two strings and a "/t" between.We can call the string on the left "label" and the part on the right "text".
continued In the ...
-3
votes
3
answers
495
views
How can I find out the number of tabs at the beginning of each line in a text file?
I have a text file, where each line may start with a number of tabs, including no tabs. For example, the first line starts with no tab, the second line with 1 tab, and the third line with 2 tabs:
...
1
vote
1
answer
294
views
SAM alignment: Extract specific region in the query sequence and its enclosing parts in the CIGAR string
I need to perform the local alignment of a given region of a DNA sequence for which a global alignment was made, and update the corresponding part of the global CIGAR string.
The steps would be as ...
1
vote
1
answer
132
views
Text processing with Python
I need to extract 1,500,000 out of 25,000,000 records and group them.
The groups and the UUIDs of the records to extract are defined in a separate file (200MB) with the following format:
>Cluster 0
...
1
vote
1
answer
90
views
How to split an array of string nodes at given index?
I probably need to use other data structure, but I'm stuck with this solution for now. Will appreciate any advice on this.
For now I have this data structure:
const data = [
{
id: 'node-1',...