36,973 questions
1 vote, 1 answer, 50 views
Counting the number of successes after a certain date every year in pandas using groupby cumsum
I have a data frame that looks like
Date Student_ID Exam_Score
2020-12-24 1 79
2020-12-24 3 100
2020-12-24 4 88
2021-01-19 1 100
...
1 vote, 1 answer, 121 views
Fill null values based on unusual repeating pattern [closed]
I have a pandas df that needs to be cleaned by filling nulls in city_id and address_type fields:
City_ID  Date     State  City     Address_type
1001     10/1/24  Texas  Houston  House
1001     10/1/24  Texas  Houston  House
...
1 vote, 1 answer, 66 views
Counting all "-1" and -1 in a dataframe for a list of certain columns
I have a polars dataframe, and for some of the columns I want to count the number of "-1" (if character) and -1 (if numeric). I would really like to make this a fast query, so I'm very ...
2 votes, 3 answers, 107 views
Pandas groupby transform mean with date before current row for a huge dataframe
I have a Pandas dataframe that looks like
df = pd.DataFrame([['John', '1/1/2017','10'],
['John', '2/2/2017','15'],
['John', '2/2/2017','20'],
['...
0 votes, 1 answer, 34 views
SQL - Merging columns and grouping by column name while summing the other column
I have a question about a SQL query. The data looks like this:
Name1  Name2  Category   Total
ABC    NULL   Category1  100
DEF    ABC    Category1  20
GHI    ABC    Category3  300
XYZ    DEF    Category2  60
XYZ    GHI    ...
2 votes, 2 answers, 71 views
SQL: retrieve all users with multiple email invitations, ordered by the largest invitation count per email, and grouped by user
I have a table invitations that stores, for each user, the email invitations they received.
One user can have many emails, and each email can receive many invitations.
create table invitation (
user_id ...
1 vote, 2 answers, 204 views
How do I find the min and max of a value of one column in PostgreSQL and have it return only 2 rows?
I'm using PostgreSQL and would like to find the minimum value and the maximum value in one column. I have to have columns titled Order ID, Minimum Order, Maximum Order in the query. See the below ...
0 votes, 2 answers, 66 views
Selecting multiple columns (`MultiIndex` based) within a `DataFrameGroupBy`
I have a complex dataframe with multiple columns, all of them MultiIndex based. At some point I wanted to be quite specific when it comes to estimating some metrics so I started experimenting ...
1 vote, 2 answers, 98 views
Pandas create % and # distribution list in descending order for each group
I have a pandas dataframe like the one below
data = {
'cust_id': ['abc', 'abc', 'abc', 'abc', 'abc', 'abc', 'abc', 'abc', 'abc', 'abc'],
'product_id': [12, 12, 12, 12, 12, 12, 12, 12, 12, 12],
'...
1 vote, 2 answers, 80 views
What is the best way to filter groups by conditionally checking the values of the first row of each group only?
This is my DataFrame:
import pandas as pd
df = pd.DataFrame(
{
'group': list('xxxxyyy'),
'open': [100, 150, 200, 160, 300, 150, 170],
'close': [105, 150, 200, 160, 350, 150,...
0 votes, 1 answer, 74 views
Group by and having clause
Consider the following relational schema:
candidates (candidate_id, skill)
[Sample input table]
Select candidate_id
from candidates
group by candidate_id
having sum(if(lower(skill)='python' or lower(...
1 vote, 2 answers, 71 views
Find a value based on all its present combinations with a value in another field in Postgres
I have a PostgreSQL table:
create table test(type_id, m_id)as values
(1, 123)
,(2, 456)
,(3, 123)
,(4, 123)
,(2, 456)
,(1, 789)
;
Basically, one m_id could have ...
0 votes, 1 answer, 70 views
Aggregation: Category missing in SQL subquery
My base data:
Process ID  Location  Date     Timeliness
2030608     New York  May 24   in time
2067393     Ohio      May 24   overdue
1329306     Ohio      May 24   in time
1740814     Ohio      June 24  overdue
1924676     Chicago   May 24   overdue
...
0 votes, 3 answers, 107 views
MySQL GROUP BY most frequent value for each ID
I have the following 2 tables: Sales and Menu.
SALES TABLE
customer_id  product_id
A            1
A            2
A            2
A            3
A            3
A            3
B            1
B            1
B            3
B            3
B            2
B            2
C            3
C            3
C            3
MENU TABLE
product_id  product_name
1           sushi
2           curry
3           ramen
**...
1 vote, 1 answer, 60 views
Pandas groupby is changing column values
I have a multiindex Pandas DataFrame and I'm using groupby to extract the rows containing the first appearances of the first index.
After this operation, however, the output column values do not ...
2 votes, 3 answers, 149 views
Group elements in dataframe and show them in chronological order
Consider the following dataframe, where Date is in the format DD-MM-YYYY:
Date Time Table
01-10-2000 13:00:03 B
01-10-2000 13:00:04 A
01-10-2000 13:00:05 B
01-10-2000 13:00:06 A
01-...
0 votes, 3 answers, 70 views
Pandas Groupby and Filter based on first record having date greater than specific date
I have a dataframe that shows details about employees and the site they are at and the positions they have held. The dataframe has columns for Site Id, Employee ID, and StartDate (plus a lot more ...
0 votes, 1 answer, 54 views
Set a boolean Mask for a Group with a Condition
If the value '2235' appears in column 'Age', every row of the associated group in column 'Name' should be set to True in a new column.
My test dataframe is:
import pandas as pd
# initialise data of lists.
data ...
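One possible approach to the group-wide flag described in this question, as a minimal sketch only. The sample rows and the new column name `Flag` are hypothetical (the question's own data is truncated above), and the sketch assumes 'Age' holds numbers rather than strings.

```python
import pandas as pd

# Hypothetical sample data: only the column names 'Name' and 'Age'
# and the value 2235 come from the question.
df = pd.DataFrame({
    "Name": ["A", "A", "B", "B", "C"],
    "Age": [2235, 40, 31, 52, 2235],
})

# Mark each row where Age equals 2235, then broadcast "any match in the
# group" back to every row of that 'Name' group.
df["Flag"] = df["Age"].eq(2235).groupby(df["Name"]).transform("any")

print(df)
```

If 'Age' were stored as text, the comparison would need `eq("2235")` instead.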
0 votes, 0 answers, 52 views
Count values greater than row value in all previous rows in each group
I want to compare the numerical value of each row of groups with the previous rows of the same group and return the number of previous rows that have a value greater than the current row.
I know the ...
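A minimal sketch of the per-group count described here, assuming a simple two-column layout. The column names `group`, `value`, and `n_prev_greater`, the helper `count_previous_greater`, and the sample rows are all hypothetical. It keeps a sorted list of the values seen so far in each group and uses binary search to count how many are strictly greater than the current one.

```python
from bisect import bisect_right, insort

import pandas as pd


def count_previous_greater(values: pd.Series) -> pd.Series:
    """For each element, count earlier elements that are strictly greater."""
    seen = []    # earlier values of this group, kept sorted
    counts = []
    for v in values:
        # Everything to the right of bisect_right(seen, v) is > v.
        counts.append(len(seen) - bisect_right(seen, v))
        insort(seen, v)
    return pd.Series(counts, index=values.index)


# Hypothetical sample data.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "a"],
    "value": [5, 3, 4, 2, 7, 1],
})

df["n_prev_greater"] = df.groupby("group")["value"].transform(count_previous_greater)
print(df)
```

The Python-level loop may be slow on very large groups; it is meant to make the definition concrete, not to be the fastest option.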
1 vote, 1 answer, 69 views
Query Returns Invalid error on Column Already Exists?
I want to GROUP BY date, payment, branch name, and code.
I don't know why this SQL query returns this error:
Column 'FactSalesTable.SalesTotal' is invalid in the select list because it is not ...
0 votes, 1 answer, 45 views
Pandas groupby and concat multiple rows
CONTEXT
I want to group by both a rule_id and calc_id and transform multiple columns into one row where each variable is concatenated with a ","
DATA EXAMPLE
Calc_ID Rule_ID Name Tracked?...
0 votes, 2 answers, 149 views
Pandas Dataframe groupby count number of rows quickly [duplicate]
I have a dataframe that looks like
Class_ID Student_ID feature
1 4 31
1 4 86
1 4 2
1 2 11
1 2 0
5 3 ...
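A minimal sketch of counting rows per group with `groupby(...).size()`. It uses only the five complete rows visible in the excerpt. Grouping on both `Class_ID` and `Student_ID` and the output column name `n_rows` are assumptions, since the question body is truncated.

```python
import pandas as pd

# The five complete rows shown in the excerpt; the truncated row is left out.
df = pd.DataFrame({
    "Class_ID": [1, 1, 1, 1, 1],
    "Student_ID": [4, 4, 4, 2, 2],
    "feature": [31, 86, 2, 11, 0],
})

# size() counts rows per group without touching the value columns;
# sort=False keeps groups in order of first appearance and skips a sort.
counts = (
    df.groupby(["Class_ID", "Student_ID"], sort=False)
      .size()
      .reset_index(name="n_rows")
)
print(counts)
```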
0 votes, 1 answer, 158 views
Efficiently Grouping Millions of Records by Field in MongoDB - Performance Optimization
I'm working with a large dataset in MongoDB where I need to group approximately 15 million records by a specific field (name). The goal is to find all records where the name field is duplicated. I'm ...
1 vote, 2 answers, 176 views
SQL TRANSFORM with time calculation in MS Access
I'm trying to write a SQL TRANSFORM query with a time calculation on the in and out columns.
I want the answer in one query, and I don't want to use a function in MS Access because I want to use the SQL in VB.NET.
...
0 votes, 0 answers, 123 views
Pandas how to groupby close values with large range
I have dataframes for which I want to group relatively close values of a column.
I tried using percentile but often there are huge differences between values in a group.
I.E. column has 1000 values ...