Newest 'data-manipulation' Questions

Advice

0 votes

3 replies

47 views

String manipulation: extract words under brackets

I'm not yet very familiar with the patterns in Lua's string.gsub function. If I have a string like this: Fishing Lure(+100 Fishing Skill)(1 hour) and I want extract only the string "1 hour"...

user3204810

391

asked Nov 18 at 21:18

3 votes

4 answers

237 views

Fast unnest complex column with data.table

I have a dataset where the column to unnest contains data with unequal rows and columns rather than data with equal dimensions. I'm looking for a fast approach to unnest this dataset using data.table. ...

Steve

107

asked Sep 16 at 18:24

3 votes

3 answers

214 views

Mutating detection data into binary

Currently I have a dataframe of bear detections that I want to convert into a binary detection history (14 columns of day1, day2, day3, etc. where: actual_date_out = the date the camera was deployed, ...

Jessie Elliott

33

asked Aug 11 at 23:27

0 votes

0 answers

20 views

Find conditions from multiple databases to have in a single database

I am currently working on a project where multiple databases are available to check for specific conditions of a patient. Specifically, I have a "master" database in wide format, with one row ...

Claudio Laudani

133

asked May 25 at 14:44

0 votes

1 answer

77 views

Setting a row number for each row in PySpark Dataframe

Currently I'm working with a large database using PySpark and am stuck on how to correctly set row numbers depending on a condition. My dataframe is: id_company id_client id_loan date c1 ...

lenpyspanacb

333

asked Apr 12 at 19:53

0 votes

3 answers

245 views

Update Object_construct nested in an Array_construct in Snowflake

Can anyone please help me with this scenario where I might have multiple OBJECT_CONSTRUCT nested within an ARRAY_CONSTRUCT? I am not able to update one value of an element within it. I am using ...

tan1987

7

asked Feb 21 at 20:55

0 votes

2 answers

170 views

Merge dataframes with conditions using PySpark

Currently I'm making calculations using PySpark and trying to match data from multiple dataframes on specific conditions. I'm new to PySpark and decided to ask for help. My first dataframe ...

lenpyspanacb

333

asked Feb 3 at 20:31

0 votes

0 answers

78 views

Dropping rows whose row sum = zero keeping the original structure same

I have a dataframe containing a very large number of rows and columns. The df is structured such that up to the 6th row and 2nd column I have strings as input, and the rest are numbers (floating point). I want to ...

hrith_

1

asked Jan 24 at 15:35

0 votes

1 answer

98 views

Wrong variable comparison result when performing data.table merge of two tables with duplicated keys

A colleague trying to do an analysis came up with code from ChatGPT that does something wrong, in a way I don't understand. Here is the example: Let's consider a first table (drugs: patients have an id, ...

denis

5,721

asked Jan 10 at 11:24

0 votes

0 answers

92 views

How to avoid burp suite from altering input dropdown values in java

We have an application which was tested with Burp Suite by intercepting and altering the values of the dropdown data in our application. Those fields are disabled when viewed through the browser, but able ...

Saranya Raghavan

1

asked Dec 17, 2024 at 4:29

0 votes

2 answers

58 views

Correlation based on multiple columns and rows

I have a data frame arranged along these lines: theta Rater Case1 Case2 ... CaseN theta1 rater1 score1 score2 ... scoreN theta2 rater1 score1 score4 ... scoreN theta1 rater2 score1 score2 ... scoreN .....

D Theorist

9

asked Nov 4, 2024 at 2:18

0 votes

2 answers

82 views

Return a list of positions of all possible occurrences of a character in a string

In Python, how do you get a list of all possible occurrences of a character or a substring in a given string? Input: "This is a string", "s" Output: [3,6,10] Explanation: Returns ...

Anmol Maheshwari

1

asked Oct 12, 2024 at 3:35
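
A minimal sketch of one way to approach the question above, assuming 0-based positions as in the asker's sample output. The function name `find_all` is hypothetical (not from the question), and counting overlapping matches is an assumption, since the excerpt does not say how they should be handled.

```python
def find_all(text: str, sub: str) -> list[int]:
    """Return the start index of every occurrence of sub in text.

    Overlapping matches are included (an assumption; the question
    excerpt does not specify this).
    """
    if not sub:
        # An empty pattern would match everywhere; return nothing instead.
        return []
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume one character after the last hit so overlaps are found.
        start = text.find(sub, start + 1)
    return positions


print(find_all("This is a string", "s"))  # [3, 6, 10]
```

`str.find` returns -1 when there is no further match, which ends the loop; advancing by `len(sub)` instead of 1 would skip overlapping matches if those are not wanted.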

0 votes

1 answer

66 views

I need advice with data manipulation R: large data set

I have two data sets of bird detections. One by a human at randomly selected intervals of 2 minutes and one by a machine. I want to compare how well the machine did by checking if the detections ...

CrazyBirdLady

59

asked Oct 11, 2024 at 8:06

0 votes

1 answer

41 views

How can I format the entire tree recursively of PHP Nested Category array output?

I have a nested model category tree in array format as follows .... $data = [ [ 'categoryId' => '08adf337-a577-4038-86a6-a5cd16676dff', 'name' => 'ELECTRONICS', 'parentId' => 0, ...

Ersin Güvenç

311

asked Sep 27, 2024 at 8:25

0 votes

1 answer

75 views

Separating one mysql row into n different ones

My company's client has a table to store inventory data, this inventory separates products using a column called product_code, and it has another column called Qtd that stores how many of the same ...

Pedro Henrique Trentin

3

asked Sep 19, 2024 at 20:27

0 votes

2 answers

76 views

how to filter rows in r in a dataframe with multiple columns based on names in a column from another dataframe?

I have two dataframes of names, where dataframe one contains a single column of names whereas dataframe two contains multiple columns of names. How can I filter the second dataframe to only contain ...

yardley

1

asked Sep 10, 2024 at 3:26

0 votes

1 answer

67 views

How to consolidate two rows based on data source?

Using Snowflake for this: I have a query that produces a very simple table union from 5 different data sources: WITH personal_info_workday AS ( SELECT 'Workday' AS source, CAST(w....

NidenK

359

asked Aug 28, 2024 at 18:01

1 vote

0 answers

230 views

PyArrow Table manipulation: Unnest float-array column to individual columns

I have nested data stored in parquet files. Polars was my main entrypoint for fast data formatting of this nested data, but for performance reasons, I'd like to use native arrow, using the PyArrow ...

Durand

89

asked Aug 20, 2024 at 13:32

1 vote

1 answer

128 views

restrict to those with data at specific age ranges in R

I have the following long format data frame with columns, id, age, and BMI. I have restricted the dataset such that only people (id) with at least 3 repeated measurements between age 2 weeks and 10 ...

aelhak

435

asked Aug 14, 2024 at 13:45

2 votes

2 answers

100 views

How can I replicate rows in R based on the values of another row?

I am wondering how to write a function to replicate rows based on the value within a column, e.g. if there is a difference of > +-0.1 between one row and the next, that row is replicated so that ...

Oscar Flynn

21

asked Aug 7, 2024 at 10:49

0 votes

1 answer

67 views

A way to concat data from child rows of parent row of a pivot table

I have a large pivot table with the following data. The bold text is the parent row of data and the following years (non-bold data) are the child data of the parent row. Is it possible in Excel or ...

chickenbutt

11

asked Jul 11, 2024 at 15:54

1 vote

0 answers

33 views

Create subset and calculate sums in Python based on a condition

I am currently doing some data manipulation procedures and have run into a problem of how to make subsets based on special conditions. My example (dataframe) is like this: Name ID Debt ...

lenpyspanacb

333

asked Jul 4, 2024 at 19:25

0 votes

2 answers

71 views

r difference in each observation within Id

Assuming I have a dataset like this id time cd4 sequence 1 -0.741958 548 1 1 -0.246407 893 2 1 0.243669 657 3 2 -2.7296369 464 1 2 -2.2505131 845 2 2 -0....

Ahir Bhairav Orai

697

asked Jun 29, 2024 at 16:20

1 vote

1 answer

329 views

Power Query document not saving changes

I will preface this with I am brand new to Excel Power Query, just learned about it last night and made my source folder containing the csv files to merge (5 total - 2020, 2021, 2022, 2023,2024 (...

CCoats45

13

asked Jun 26, 2024 at 12:44

0 votes

1 answer

123 views

Pandas idxmin equivalent for mean

I am trying to filter a very large dataframe that looks like this: unique id x y 1 1 2 1 2 3 1 3 4 2 1 2 2 2 3 2 3 4 to only contain the mean values for each unique id, (e.g. filtered on 'x') like ...

Razrer

21

asked Jun 25, 2024 at 22:18
