Newest 'data-mining' Questions

0 votes

0 answers

64 views

Measuring logicality of programming languages?

I have a simple question: how would you measure the logicality of a programming language? EDIT: I was asked to specify the term "logicality", so I will try to provide a stipulation. By ...

Shawn W.

11

asked Oct 15, 2024 at 20:33

1 vote

1 answer

173 views

What books are there to learn to implement these graph algorithms?

I saw a post on Reddit (https://www.reddit.com/r/math/comments/ci50d3/visualizing_mathematical_subjects/) that utilizes label propagation, Fruchterman-Reingold algorithm, and edge betweenness ...

ZENG

113

asked Jan 20, 2023 at 21:02

1 vote

1 answer

75 views

Machine learning and test split for time series data

I have used different machine learning algorithms to predict solar panels' power output. There are ten independent features for weather data. In all models, I set time as an index and have used the ...

graphicart86

39

asked Jan 27, 2022 at 14:31

1 vote

3 answers

2k views

How can we express the cosine similarity of two documents as a percentage?

We were doing project work for plagiarism checking. For this purpose, we have taken a term frequency vector of two documents and measured the similarity using a cosine similarity measure. The value of ...

Tushar Saha

27

asked Nov 8, 2021 at 6:45
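A minimal sketch of the setup this question describes: two documents are reduced to term-frequency vectors and compared with cosine similarity. The whitespace tokenizer, the sample sentences, and the names `cosine_similarity`, `doc_a`, and `doc_b` are illustrative assumptions, not taken from the question. Because term frequencies are non-negative, the result lies in [0, 1], so multiplying by 100 gives a number on a percent scale; whether that number is a meaningful "percentage of plagiarism" is a separate modelling decision.

```python
import math
from collections import Counter


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity of two sparse term-frequency vectors (term -> count)."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(count * tf_b.get(term, 0) for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty document has no direction; treat it as sharing nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents, tokenized by whitespace for illustration only.
doc_a = Counter("the cat sat on the mat".split())
doc_b = Counter("the cat lay on the rug".split())

similarity = cosine_similarity(doc_a, doc_b)
percent = 100.0 * similarity  # approximately 75 for these two sentences
```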

1 vote

1 answer

58 views

Would samples be considered redundant data if they are naturally similar to each other?

I am working on building an ML/DL solution for a problem where the data are naturally similar, and I am worried that this would be considered data redundancy. My question is: is that so? And ...

Luka

11

asked Jul 28, 2021 at 6:13

0 votes

0 answers

63 views

What are the confusion matrix values?

I'm currently going through past paper questions and was wondering if I could get some help answering this one? 'Consider a classification model which is applied to a set of records, of which 100 ...

curiousCoder

1

asked Apr 30, 2021 at 16:00

0 votes

1 answer

184 views

How to detect outliers using DBSCAN?

I am working on a Fraudulent Cash Transaction Detection System using DBSCAN, and I want to know the proper way to identify outliers. Thank you. Edit: I had a problem with how to represent the ...

Xx_22

1

asked Oct 24, 2020 at 7:17

0 votes

0 answers

66 views

How to handle distribution of values with same attributes into different classes

I'm a student studying a data mining course and have come across a problem. I need to explain the problem with the help of an example scenario as I do not know how to explain the problem in any other ...

mahesh Rao

153

asked Sep 13, 2020 at 10:05

3 votes

1 answer

236 views

How does an inverted index reduce storage requirements?

On p. 7 of the book "Introduction to Information Retrieval" (by Manning et al.), the authors explain how, given a collection of text documents, an inverted index is built by tokenizing, then ...

jm jm

31

asked Jul 1, 2020 at 10:36

1 vote

0 answers

65 views

Can anyone think of applications of a 3 way (k-way) dot product in computer science or data mining

I have developed a locality sensitive hashing algorithm for the 3-way or k-way dot product. When I say 3-way dot product I mean the following. Suppose we have $x,y,z \in [-1,1]^{S}$ for $S \in \...

Mihir Mongia

11

asked Jan 16, 2020 at 21:13

2 votes

2 answers

305 views

Why is it not always possible to compute the centroid of feature vectors?

In the data mining and machine learning course that I'm taking, there is a subject on feature spaces, and there is a part about feature vector aggregation and metric spaces that I don't really ...

Mads

21

asked Aug 14, 2019 at 17:53

-1 votes

1 answer

96 views

Dimension reduction: which feature should be removed to reduce the dimension of the matrix?

Let's suppose that we have the following 2 tables: If we want to reduce the dimension by one (in every table), which feature should we remove, and why? I am confused about the way that I should work ...

Emily Serone

1

asked Jun 16, 2019 at 17:35

3 votes

1 answer

63 views

Find plane within margin of error of >50% of points

There are $N < 3\times10^4$ 3D points. At least 50% of them lie approximately in the same plane, i.e. the distance between the plane and each point is at most $p$. Find such a plane. Attempt: since ...

Ignacio

133

asked May 11, 2019 at 11:55

2 votes

0 answers

54 views

In topological data analysis, do bar codes that begin and end at the same index mean anything?

The typical workflow in topological data analysis is from point cloud data to filtration to a list of bar codes corresponding to each dimension. A filtration is a sequence of simplicial complexes, ...

Eben Kadile

235

asked Jul 30, 2018 at 8:25

0 votes

0 answers

51 views

What are the key benefits of ontologies in a systematic literature review?

I am working on a Systematic Literature Review (SLR) and am about to finish the data synthesis. After the SLR, I want to create an ontology and include different details of the SLR in the ontology. I have almost ...

Khan

1

asked Apr 16, 2018 at 21:51

1 vote

0 answers

174 views

Combining Computer Science and Humanities

I currently hold a bachelor's in computer science and a master's in art history. I really want to combine the two, and I know of Digital Humanities, but I'm not completely aware of where Digital Humanists ...

dcs1

11

asked Mar 20, 2018 at 19:40

0 votes

0 answers

127 views

Navier-Stokes equations and machine learning

I am looking for a reference explaining how to solve the Navier-Stokes equations numerically using machine learning algorithms. Thank you in advance for your help.

ABRAICH Ayoub

1

asked Feb 28, 2018 at 2:42

4 votes

1 answer

339 views

Finding (and possibly extracting) source code in a heterogeneous text data set

I'm looking for a way to recognize and possibly extract source code from text files that may contain only source code, source code mixed with plain text or just plain text without any source code. ...

Marv

143

asked Feb 17, 2018 at 14:52

1 vote

0 answers

627 views

Algorithms for tabulating/counting/frequency counting?

It is common in data science to receive two equal-length vectors (one-dimensional arrays), say Categories and Weights. We aim to find all unique values of Categories and sum up the corresponding ...

xiaodai

131

asked Oct 24, 2017 at 22:01
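A minimal sketch of the tabulation task in this question, assuming the truncated excerpt means summing the entries of Weights that share a category. That reading, the function name `weighted_tabulate`, and the sample data are assumptions for illustration. A single pass with a hash map takes expected linear time in the vector length; sorting first and summing runs of equal categories is a common alternative.

```python
from collections import defaultdict


def weighted_tabulate(categories, weights):
    """Sum the weights belonging to each distinct category in one pass."""
    if len(categories) != len(weights):
        raise ValueError("categories and weights must have equal length")
    totals = defaultdict(float)
    for category, weight in zip(categories, weights):
        # Assumption: "sum up the corresponding ..." refers to the weights.
        totals[category] += weight
    return dict(totals)


# Hypothetical input vectors.
totals = weighted_tabulate(["a", "b", "a", "c", "b"], [1.0, 2.0, 3.0, 4.0, 5.0])
```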

1 vote

0 answers

44 views

How may I look for 'regions' of text in a larger corpus of different texts

I have an extremely large (100GB+) corpus of many different texts. All of them are in English and 'well' formatted. They are not loaded into any kind of database, think of them as a huge collection of ...

Alex Morales

121

asked Sep 6, 2017 at 0:48

1 vote

0 answers

332 views

What is the best stream data clustering algorithm that can handle non-static, uncertain data? [closed]

I have gone through many algorithms, including streaming k-means, CluStream, etc., and they all have their pros and cons. What is the best performing algorithm in terms of Computational Complexity Memory ...

Cybernix

11

asked Jul 28, 2017 at 8:18

0 votes

1 answer

959 views

List count of occurrences pairs, triplets, etc. from sets

A receipt is an array of products. I have an array of receipts. I need to generate a report in which I can find the products often bought together. For instance, for a single receipt where the ...

Berry

101

asked Jun 27, 2017 at 5:03
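A brute-force sketch of the report this question asks for: count how many receipts contain each pair (or triplet, and so on) of products. The function name `count_itemsets` and the sample receipts are hypothetical. The approach enumerates every combination of the requested size within each receipt, so it is only practical for short receipts and small sizes; frequent-itemset algorithms such as Apriori prune this search and are the usual choice at scale.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(receipts, size):
    """Count the receipts that contain each product combination of a given size."""
    counts = Counter()
    for receipt in receipts:
        # Deduplicate and sort so each combination has one canonical tuple form.
        items = sorted(set(receipt))
        counts.update(combinations(items, size))
    return counts


# Hypothetical receipts.
receipts = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
]

pairs = count_itemsets(receipts, 2)
triplets = count_itemsets(receipts, 3)
```

Calling `pairs.most_common()` lists the combinations from most to least frequent, which is one way to produce the report.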

2 votes

2 answers

256 views

How to use neural network classification if the data are not the same size?

I have data like this: [0 1 0 1 0] [0 1 0 1 0 1 1] [0 1 0 1] [0 1 0 1 0 1 1 1 1 0] ... I want to classify it with a neural network, but the samples have different sizes. I can ...

user572575

121

asked May 19, 2017 at 7:38

12 votes

5 answers

20k views

Data Science vs Operations Research

The general question, as the title suggests, is: what is the difference between DS and OR/optimization? On a conceptual level I understand that DS tries to extract knowledge from the available data ...

PsySp

261

asked Mar 14, 2017 at 13:20

1 vote

1 answer

244 views

Method for finding correlation between data sets

Let's say that I have $N$ data sets where I have data points at some fixed frequency, such as "daily". What would be a good method for finding correlation between any of the data sets, or choosing a ...

Alan Wolfe

1,368

asked Jan 7, 2017 at 18:05
