Skip to main content
Filter by
Sorted by
Tagged with
Advice
0 votes
4 replies
83 views

I am working with a university faculty salary dataset where the same person appears across many years, but their name strings are inconsistent. The dataset has about 8,000 unique people and years from ...
Mengyang Cao's user avatar
0 votes
2 answers
84 views

I'm working with many tabular datasets (Excel, CSV) that contain inconsistent or messy column names due to typos, different naming conventions, spacing, punctuation, etc. I have a standard schema (as ...
Ste347789's user avatar
-2 votes
1 answer
116 views

I'm working with two datasets for German NUTS-3 level regions: A shapefile from Eurostat via the giscoR package: > library(giscoR) > nuts3_germany <- gisco_get_nuts(country = "Germany&...
Saïd Maanan's user avatar
2 votes
3 answers
121 views

I have a column in Pandas DataFrame(Names) with a large collection of names. I have another DataFrame(Title) text column and in between text, the names in Name frame are there. What would be the ...
Totura's user avatar
  • 167
1 vote
1 answer
71 views

I have a database with three columns: name, occupation, and organization. In these columns, I have duplicates with slightly different names. For example, Anne Sue Frank and Anne S. Frank refer to the ...
Vitoria Sanchez's user avatar
1 vote
3 answers
94 views

I have a large pandas DataFrames like below. import pandas as pd import numpy as np df = pd.DataFrame( [ ("1", "Dixon Street", "Auckland"), ("2&...
Totura's user avatar
  • 167
1 vote
1 answer
78 views

I'm trying to write a regex that matches every occurrence of some_function(...), but it should not match when it's part of an object method like my.some_function(...) or if it is a substring of ...
JVS's user avatar
  • 2,682
1 vote
0 answers
75 views

What I'm trying to do is find and correct similar names in my database, like 'Patrick Maxwell' and 'Patrick Maxwel.' However, the issue I'm facing is that the best match for each name is often itself, ...
Kauan Randall Oliveira Ferreir's user avatar
1 vote
1 answer
2k views

I have a string that is returned from an api call , the string is something like ".\controllers\myaction c:\test\path" I want to use Powershell to check if the string contains c:\ ...
Kate's user avatar
  • 61
0 votes
2 answers
71 views

I have a DataFrame with a column of publisher names that contains various minor variations of the same publisher. For example, entries such as "Harlequin Romance", "Harlequin Blaze"...
Claudine U's user avatar
0 votes
1 answer
62 views

Major edit: Apparently it is difficult to understand my question, so I'll do my best to concretize it. I got two dataframes, "df1" and "df2". These are quite larger, larger than in ...
Calle Flygare's user avatar
0 votes
2 answers
75 views

I have searched high and low and nobody seems to have asked that exact question, so I'm at loss. I have a data frame with a couple columns. One of this column contains various sentences that don't ...
Laue28's user avatar
  • 1
2 votes
6 answers
133 views

I have a series of string in a vector and need to remove the matching starting pattern from the string. However, I don't know the pattern or how long it is. stringa <- c("apple_tart", &...
Katie Helm's user avatar
2 votes
2 answers
335 views

Goal: I'd like to find all exact occurrences of a string, or close matches of it, in a longer string in Python. I'd also like to know the location of these occurrences in the longer string. To define ...
Franck Dernoncourt's user avatar
0 votes
0 answers
80 views

I'm working on a problem involving string matching where I need to compute the similarity scores for each prefix of a string C against another string S. The similarity score for a prefix P of C and S ...
NatsumiStar's user avatar
0 votes
1 answer
635 views

I have 2 pandas dataframes that both contain company names. I want to merge these 2 dataframes on company names using a fuzzy match. But the problem is 1 dataframe contains 5m rows and the other 1 ...
L H's user avatar
  • 27
2 votes
3 answers
167 views

I have 2 dataframes that captured the hierarchy of the same dataset. Df1 is more complete compared to Df2, so I want to use Df1 as the standard to analyze if the hierarchy in Df2 is correct. However, ...
L H's user avatar
  • 27
-1 votes
1 answer
85 views

Reading a text file with the format: e2c=["(vsim-86)" ,'kkk', "pppp", "bbbbbb", #"old", "uio", " sds # sds", #"old2", " sds #...
taquionbcn's user avatar
0 votes
0 answers
61 views

I have implemented a string matching function in Python utilizing n-grams and similarity ratios. The function signature is as follows: # concise version of the function def match_strings(...
NIDHI SHASTRY's user avatar
1 vote
1 answer
62 views

As my question indicates, I would like to convert a vector of strings into a new vector one of two values that appears in every string. Here is an example of a very simple data frame I have: data <-...
jdenn0514's user avatar
  • 109
0 votes
0 answers
701 views

I'm trying to create a code to see if my predictions for games and the actual result of the games are the same. I was going to create a point value, like March Madness has, but I can't actually get ...
Dixon Gerber's user avatar
1 vote
1 answer
319 views

Been trying to use thefuzz to compare two different lists, and got the above error, which doesn't seem right. I've commented everything else out in my code except the below two test lines and still ...
user avatar
0 votes
0 answers
34 views

PS C:\Users\Administrator> $string = "hello world" PS C:\Users\Administrator> $string -ilike "hello" False the above is outputing false, and not true. not sure what I am ...
ctappy's user avatar
  • 177
0 votes
2 answers
102 views

I am trying to join several messy datasets together without using "fuzzy matching". In the core dataset (example dataset1 below), I have simple names for companies. In the datasets I would ...
lyd-m's user avatar
  • 3
1 vote
1 answer
573 views

I have a table with address1, city, state, and postal code. However, some address1 will also contains city, state and postal code (separated by either comma or space or both). Example: Address1: 9999 ...
shano's user avatar
  • 23

1
2 3 4 5
47