Newest 'cluster-computing' Questions - Page 4

0 votes

1 answer

322 views

How training loop for 1 epoch is taking way longer than all batches of trainloader loop execution time?

I have the following specification: AMD 32-core processors and an NVIDIA Tesla V100 16 GB GPU. To debug/test, I have used only 1 GPU. My dataset is ImageNet-1k 2012, with batch size 32, using the following details: ...

Ray

1

asked Sep 2, 2023 at 1:14

0 votes

1 answer

532 views

Windows Failover Cluster Generic Script Resource: How to get IP-Address?

I have a generic script resource (VBS). This script needs the cluster application IP address. At the moment the IP is configured in the script, but I would like to get it from the cluster (maybe from the ...

stackedyellowangel

25

asked Aug 28, 2023 at 15:16

1 vote

0 answers

25 views

Wildfly 27 : Cluster not work when we have multiple servergroups

I am trying to configure WildFly 27 in domain mode and in a clustered environment. There are 2 RHEL servers with WildFly installed on both (no HTTPD, nothing). Done required settings on Primary & Secondary. Created a ...

fatherazrael

6,027

asked Aug 22, 2023 at 10:00

0 votes

1 answer

3k views

How to get VS Code Server working in RHEL9?

I would like to use the remote development capabilities of VS Code in the context of an LSF compute cluster (but I think the same question would be relevant on e.g. a Slurm cluster). In particular, I ...

Adam L. Taylor

391

asked Aug 15, 2023 at 15:52

1 vote

1 answer

2k views

Can i do a K-Means cluster analysis based on only one variable (in R)?

I have a dataframe with 2 columns. The first column has the name of a meteorological station and the other column has a corresponding index. Can I do a K-Means cluster analysis in order to group the ...

nick

39

asked Aug 9, 2023 at 16:46

1 vote

0 answers

69 views

cluster failover generic application

Is there any way to monitor specific variables generated inside a generic application monitored by a failover cluster in Windows Server? I would like to avoid an internal application failure not ...

s4n-dev

13

asked Aug 8, 2023 at 12:54

0 votes

0 answers

91 views

How do I run an Octave code on a cluster?

I recently set up 5 Debian-based computers with the purpose of running a series of tests to determine the effect of additional nodes on the amount of processing time. I am wondering how I should set ...

Nate Winslow

1

asked Aug 2, 2023 at 20:46

1 vote

1 answer

602 views

How to allow %run magic command in Azure Databricks cluster?

I am currently trying to work through a Databricks course and am getting the following error stating that "Your administrator has only allowed sql and python commands on this cluster. This ...

Dat Boi

27

asked Aug 1, 2023 at 19:25

0 votes

1 answer

539 views

GKE cluster creation failed in health check

I tried to create a GKE standard cluster. Creation keeps failing after it reaches 83%, at the health check stage. Is there any solution to this problem? ...

dnt -kamal

1

asked Aug 1, 2023 at 12:20

1 vote

0 answers

401 views

Slurm Cluster Python Script Not Running on Multiple Nodes using SBATCH

We recently set up a Slurm cluster with 2 nodes (1 head node + compute node and 1 compute node) for some HPC CFD simulations. Right now I am trying to run a Python script which is used for feature ...

akhil kumar

1,626

asked Jul 27, 2023 at 11:32

1 vote

0 answers

224 views

Why is it that I keep getting the error Insufficient 'DISKS_TOTAL_GB' quota when trying to create cluster on google cloud console?

I keep getting the error Insufficient 'DISKS_TOTAL_GB' quota when I try to create a Compute Engine cluster on Google Cloud console. It is strange because I have no disks that are active. All the disks ...

Nick

11

asked Jul 26, 2023 at 14:05

1 vote

1 answer

193 views

Storing slurmd node computation outputs in a database?

I have a Slurm cluster of 8 separate RHEL 9 server nodes to serve as master, compute, and database nodes. The master and compute nodes are up and talking, but I have not activated the database node yet. ...

paul runner

53

asked Jul 21, 2023 at 21:03

0 votes

2 answers

1k views

Pyspark stuck and not processing. It shows more than 1K processes

I have a for loop running in Databricks. The first iterations run fast, then it gets slower, and then it doesn't proceed at all. While I know that is common in for loops if data size is increasing ...

Eugenio.Gastelum96

462

asked Jul 16, 2023 at 3:28

1 vote

1 answer

940 views

Running an Independent SLURM Job with mpirun Inside a Python Script - Recursive Call Error

I'm currently using a Python script that requires MPI to operate. This script runs on a SLURM system. In order to run my Python script, I define the number of nodes to use and launch the following ...

yvrob

105

asked Jul 12, 2023 at 16:52

2 votes

1 answer

95 views

How to unnest an object from `spatialsampling`'s `spatial_clustering_cv`

I would like to perform spatial clustering on a sf object and attach the fold IDs to my original dataframe, in a new column. Here's what my input looks like (an sf object with points) # A tibble: 6 × ...

Nova

5,980

asked Jul 11, 2023 at 17:42

0 votes

1 answer

145 views

Long-polling with message queues in a clustered environment

I have a system design question that I'm looking for some guidance on. I have two different systems that need to have a basic level of communication. This is abstracted via message queues. For ...

user1597121

343

asked Jul 10, 2023 at 0:10

2 votes

2 answers

1k views

Install and run containers on Slurm HPC

I have a Slurm HPC cluster on Ubuntu and I want to install Docker or rootless Docker. But I can't find anything on the official sites, so how could I install Docker on a Slurm cluster and run containers or if you ...

Sergiu Neagoe

25

asked Jul 6, 2023 at 7:01

1 vote

1 answer

91 views

How can I incorporate cluster-robust standard errors into a randomization test using the 'ritest' function?

I'm currently attempting to perform a randomization test following several regressions. However, I'm encountering difficulties in incorporating cluster-robust standard errors into my randomization ...

user1290547

43

asked Jul 1, 2023 at 2:21

0 votes

1 answer

148 views

Data not available after neo4j-admin import (Causal Cluster / Neo4J Enterprise Edition)

I am using neo4j-admin import command to import data into a neo4j causal cluster with neo4j enterprise 4.4.7 installed on 9 Ubuntu VMs configured as CORE instances. Cluster is functional and works. ...

Esanu Codrin Stefan

1

asked Jun 25, 2023 at 22:27

0 votes

1 answer

464 views

How to manually cluster rows for heatmap

I am trying to make a heatmap composing of several different "clusters" of gene types. For example, 5 genes are related to protein folding, 5 genes are related to ECM composition, etc. I ...

Maria Faleeva

57

asked Jun 22, 2023 at 10:34

0 votes

1 answer

268 views

CDO application issue

I run a climate model on a cluster. The necessary modules have been loaded, including: module load netcdf/c/4.6.1-intel-2013.1, module load netcdf/fortran/4.4.4-intel-2013.1, module load cdo/1.9.1. The ...

M Wang

1

asked Jun 21, 2023 at 5:08

0 votes

0 answers

318 views

connection to the server X.X.X.X:6443 was refused - did you specify the right host or port?

I'm trying to configure a cluster with a master and 2 workers on EC2 machines. I followed the Kubernetes guide and it gave a final result of NAME STATUS ROLES AGE VERSION ...

aviv levari

33

asked Jun 12, 2023 at 16:34

1 vote

1 answer

539 views

slurm ignores dependency on running job

Suppose that on a cluster with slurm the job with ID 12345 is currently running. I want to submit another job that will start after this job finishes. I tried sbatch -d after:12345 job.script, but I ...

stardt

1,229

asked Jun 8, 2023 at 22:09

0 votes

0 answers

429 views

S2d Failover Cluster Permissions

I have just set up a new 3-node cluster with S2D using a storage pool of 6 drives per node, but when I reboot any of the nodes, one or more of the attached drives become detached from the pool and ...

David Crawford

305

asked Jun 4, 2023 at 13:22

0 votes

1 answer

286 views

problems joining nodes 2 and 3 in a Galera cluster with mysql on ubuntu server 20.04

My project consists of building a Galera cluster with MySQL on Ubuntu 20.04. The problem arises when joining the nodes (I have 3) to MySQL: the main node does not give me any problem but ...

Rubén Cortés Barba

1

asked Jun 2, 2023 at 11:24
