GalIC/README

------------------------------------------------------------------------
GALIC v1.0 - A code for the creation of galaxy initial conditions
------------------------------------------------------------------------

Copyright (c) 2014 by Denis Yurin and Volker Springel

Heidelberg Institute for Theoretical Studies (HITS)
Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany,

Heidelberg University, Zentrum fuer Astronomie, ARI
Moenchhofstrasse 12-14, 69120 Heidelberg, Germany

Code web-site: http://www.h-its.org/tap/galic

GALIC is an implementation of a new iterative method to construct steady
state composite halo-disk-bulge galaxy models with prescribed density
distribution and velocity anisotropy. The method is described in full in
the paper

  Yurin D., Springel V., 2014, MNRAS, in press
  (see also the preprint at http://arxiv.org/abs/1402.1623)

Users of the code are kindly asked to cite the paper if they make use of
the code. The code is released "as is", without any guarantees or
warranties.


------------
Dependencies
------------

GalIC needs the following non-standard libraries for compilation:

mpi  - the ‘Message Passing Interface’
       (http://www-unix.mcs.anl.gov/mpi/mpich)

gsl  - the GNU scientific library. This open-source package can be
       obtained at http://www.gnu.org/software/gsl

hdf5 - the ‘Hierarchical Data Format’ (available at
       http://hdf.ncsa.uiuc.edu/HDF5). This library is optional and only
       needed when one wants to read or write snapshot files in HDF5
       format.


-----------
Compilation
-----------

Please first copy the "Template-Makefile.systype" file to
"Makefile.systype" and uncomment your system if a suitable one is
already predefined. If your system is not listed, you should define a
corresponding section in the "Makefile" and activate it in
"Makefile.systype".

Next, please copy the "Template-Config.sh" file to "Config.sh" and
uncomment the compile-time options specified there according to your
needs.

Once the above steps are completed, it should be possible to compile the
code by simply executing "make".


-----
Usage
-----

To start GalIC, run the executable with a command of the form

  mpirun -np 12 ./GalIC myparameterfile.param

This example will run GalIC using 12 MPI processes, with parameters as
specified in the parameter file "myparameterfile.param", which is passed
as an argument to the program. The number of MPI ranks is arbitrary, and
it is also possible to run the code in serial.


-------------
Parameterfile
-------------

The parameterfile is a simple text file that defines one parameter-value
pair per line. The sequence of the parameters is arbitrary, and comment
lines are also allowed. The code will complain about missing or
duplicate parameters.

For creating a parameterfile, it is best to start from one of the
example files ("Model_H1.param", "Model_H2.param", etc.) included in the
code distribution. These correspond to the models considered in the
method paper (as listed in Table 1 of the paper) and contain comments
that briefly explain the parameters.

As an important step, we note that the total mass of the halo-disk-bulge
system needs to be specified. GalIC assumes that this mass is the total
mass enclosed within the virial radius of an equivalent NFW halo. To
determine the halo mass distribution, one should set the parameter V200
(the circular velocity at the virial radius) and CC (the concentration
parameter) - see http://arxiv.org/abs/astro-ph/0411108 for details.
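For orientation, a fragment of such a parameterfile might look like the
following excerpt. This is only a sketch: the parameter names are the
ones discussed in this section and in the paragraphs below, while the
numerical values are purely illustrative and should be taken from the
provided example files or adjusted to the desired model.

  V200        200.0
  CC          10.0
  MD          0.035
  MB          0.01
  JD          0.035
  DiskHeight  0.2
  BulgeSize   0.1
  N_HALO      1000000
  N_DISK      200000
  N_BULGE     50000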
Another important parameter is the number of particles used for the
different components of the galaxy model, which is specified through
N_HALO, N_DISK and N_BULGE. The masses of the disk and bulge have to be
specified as fractions of the total mass via the MD and MB parameters.

Both halo and bulge are parametrized as Hernquist spheres. The scale
length of the bulge is determined by 'BulgeSize' and must be given in
units of the halo scale length, which GalIC computes automatically from
the V200 and CC parameters. The disk is parametrized with an exponential
profile in the radial direction. One needs to specify the disk spin
fraction JD (which essentially defines the disk scale length) and the
thickness of the disk through 'DiskHeight'.

Finally, the type of velocity structure of each component is selected
with the parameters TypeOfHaloVelocityStructure,
TypeOfDiskVelocityStructure, and TypeOfBulgeVelocityStructure. For each
component, one of four values can be used:

0 - the velocity ellipsoid is everywhere isotropic, and its radial
    variation is spherically symmetric;

1 - the same as 0, but now the specific ratio between the radial
    velocity dispersion and the velocity dispersion perpendicular to the
    radial direction can be specified by HaloBetaParameter and
    BulgeBetaParameter;

2 - this case represents systems where the velocity ellipsoid is
    isotropic in the meridional plane but may have a different shape and
    a non-zero first moment in the azimuthal direction, as specified by
    HaloStreamingVelocityParameter or DiskStreamingVelocityParameter.
    We refer to these systems as axisymmetric systems with two integrals
    of motion, the total energy E and the vertical component of the
    angular momentum Lz. The streaming parameter either refers directly
    to "k" as given in the paper, or, if a negative value is adopted, it
    gives k in units of kmax;

3 - this case is a combination of cases 1 and 2. It allows one to choose
    the specific ratio between the radial velocity dispersion and the
    one perpendicular to the orbital plane via the
    HaloDispersionRoverZratio, DiskDispersionRoverZratio and
    BulgeDispersionRoverZratio parameters, and also to specify a
    streaming velocity via HaloStreamingVelocityParameter or
    DiskStreamingVelocityParameter. This case corresponds to
    axisymmetric systems with three integrals of motion, E, Lz and a
    non-classical third integral I3.

All other parameters are related to the procedure for finding a solution
and typically need not be altered. You can find short comments on them
in the example parameter files provided with the code. These 20
parameter files were used to create the initial conditions described in
the paper.


------
Output
------

As GalIC progresses, it regularly dumps "snapshot files", which can be
used as initial conditions files. The latest snapshot represents the
last optimization state, and hence the best initial conditions produced
by the code thus far.

GalIC supports the three file formats of the GADGET code for its output
snapshot files (i.e. the venerable 'type1' format, the slightly improved
'type2' format, and an HDF5 format). The default file format in the
example parameter files is the plain type1 format. A documentation of
this file format can be found at
http://www.mpa-garching.mpg.de/gadget/users-guide.pdf
(a small stand-alone example for inspecting such a file is given at the
end of this README).


-------
Restart
-------

The general usage of GalIC is in fact quite similar to the GADGET code.
However, restarts are currently not yet supported.
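As a small illustration of the plain type1 output format mentioned in
the Output section, the following self-contained C sketch reads the
256-byte header block of a snapshot and prints the particle numbers and
masses per type. This is only an example, not part of GalIC itself: it
assumes the standard GADGET header layout described in the users' guide
linked above, a file written with the machine's native endianness, and a
single-file snapshot; the io.c in the src/ directory remains the
authoritative reference for what the code actually writes.

/* read_header.c - print the header of a plain 'type1' snapshot.
 * Compile e.g. with:  cc read_header.c -o read_header
 * Usage:              ./read_header <snapshotfile>
 */
#include <stdio.h>
#include <stdlib.h>

struct io_header
{
  int npart[6];                 /* particle numbers of each type in this file */
  double mass[6];               /* particle mass of each type (0 means individual masses) */
  double time;
  double redshift;
  int flag_sfr;
  int flag_feedback;
  unsigned int npartTotal[6];   /* total particle numbers of each type */
  int flag_cooling;
  int num_files;
  double BoxSize;
  double Omega0;
  double OmegaLambda;
  double HubbleParam;
  char fill[96];                /* pads the header to 256 bytes */
};

int main(int argc, char **argv)
{
  if(argc != 2)
    {
      fprintf(stderr, "usage: %s <snapshotfile>\n", argv[0]);
      exit(1);
    }

  FILE *fd = fopen(argv[1], "rb");
  if(!fd)
    {
      fprintf(stderr, "can't open '%s'\n", argv[1]);
      exit(1);
    }

  /* each block is bracketed by 4-byte size fields (Fortran record markers) */
  unsigned int blksize1, blksize2;
  struct io_header header;

  if(fread(&blksize1, sizeof(blksize1), 1, fd) != 1 ||
     fread(&header, sizeof(header), 1, fd) != 1 ||
     fread(&blksize2, sizeof(blksize2), 1, fd) != 1)
    {
      fprintf(stderr, "read error\n");
      exit(1);
    }

  if(blksize1 != sizeof(header) || blksize2 != sizeof(header))
    {
      fprintf(stderr, "unexpected block size - not a plain type1 snapshot?\n");
      exit(1);
    }

  for(int type = 0; type < 6; type++)
    printf("type %d: npart = %10d   mass = %g\n",
           type, header.npart[type], header.mass[type]);

  printf("time = %g\n", header.time);

  fclose(fd);
  return 0;
}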
GalIC/Makefile

#
#/*******************************************************************************
# * This file is part of the GALIC code developed by D. Yurin and V. Springel.
# *
# * Copyright (c) 2014
# * Denis Yurin (denis.yurin@h-its.org)
# * Volker Springel (volker.springel@h-its.org)
# *******************************************************************************/
#
# You might be looking for the compile-time Makefile options of the code...
#
# They have moved to a separate file.
#
# To build the code, do the following:
#
#  (1) Copy the file "Template-Config.sh" to "Config.sh"
#
#        cp Template-Config.sh Config.sh
#
#  (2) Edit "Config.sh" as needed for your application
#
#  (3) Run "make"
#
#
# New compile-time options should be added to the file
# "Template-Config.sh" only. Usually, they should be added there in the
# disabled/default version.
#
# "Config.sh" should not be checked in to the repository.
#
# Note: It is possible to override the default name of the Config.sh
# file, if desired, as well as the name of the executable. For example:
#
#      make CONFIG=MyNewConf.sh EXEC=GalIC_new
#
#-----------------------------------------------------------------
#
# You might also be looking for the target system SYSTYPE option
#
# It has also moved to a separate file.
#
# To build the code, do the following:
#
# (A) set the SYSTYPE variable in your .bashrc (or similar file):
#
#        e.g. export SYSTYPE=Magny
#
# or
#
# (B) set SYSTYPE in Makefile.systype
#     This file has priority over your shell variable:
#
#      (1) Copy the file "Template-Makefile.systype" to "Makefile.systype"
#
#            cp Template-Makefile.systype Makefile.systype
#
#      (2) Uncomment your system in "Makefile.systype".
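#
# A complete example sequence might thus look like this (a sketch,
# assuming the predefined "Magny" system type matches your machine;
# otherwise first add a section for your system further below):
#
#      cp Template-Makefile.systype Makefile.systype
#         (then uncomment SYSTYPE="Magny" in Makefile.systype)
#      cp Template-Config.sh Config.sh
#         (then enable the desired compile-time options)
#      make
#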
# # If you add an ifeq for a new system below, also add that systype to # Template-Makefile.systype EXEC = GalIC CONFIG = Config.sh BUILD_DIR = build SRC_DIR = src ################### #determine SYSTYPE# ################### ifdef SYSTYPE SYSTYPE := "$(SYSTYPE)" -include Makefile.systype else include Makefile.systype endif MAKEFILES = Makefile config-makefile ifeq ($(wildcard Makefile.systype), Makefile.systype) MAKEFILES += Makefile.systype endif PERL = /usr/bin/perl RESULT := $(shell CONFIG=$(CONFIG) PERL=$(PERL) BUILD_DIR=$(BUILD_DIR) make -f config-makefile) CONFIGVARS := $(shell cat $(BUILD_DIR)/galicconfig.h) MPICHLIB = -lmpich GMPLIB = -lgmp GSLLIB = -lgsl -lgslcblas MATHLIB = -lm ifeq ($(SYSTYPE),"Darwin") CC = mpicc # sets the C-compiler OPTIMIZE = -ggdb -O3 -Wall -Wno-format-security -Wno-unknown-pragmas ifeq (NUM_THREADS,$(findstring NUM_THREADS,$(CONFIGVARS))) OPTIMIZE += -fopenmp MPI_COMPILE_FLAGS = $(shell mpicc --showme:compile) CC = gcc $(MPI_COMPILE_FLAGS) # to replace clang with gcc (mpicc uses clang for some reason) endif GSL_INCL = -I/sw/include -I/opt/local/include GSL_LIBS = -L/sw/lib -L/opt/local/lib FFTW_INCL= -I/sw/include -I/opt/local/include FFTW_LIBS= -L/sw/lib -L/opt/local/lib MPICHLIB = -lmpi HDF5INCL = -I/sw/lib -I/opt/local/include -DH5_USE_16_API #-DUSE_SSE HDF5LIB = -L/sw/lib -L/opt/local/lib -lhdf5 -lz endif ifeq ($(SYSTYPE),"Darwin-mpich") CC = mpicc # sets the C-compiler LINKER = mpic++ OPTIMIZE = -m64 -ggdb -O3 -Wall -Wextra -Wno-format-security -Wno-unknown-pragmas ifeq (NUM_THREADS,$(findstring NUM_THREADS,$(CONFIGVARS))) OPTIMIZE+= -fopenmp endif GSL_INCL = -I/opt/local/include GSL_LIBS = -L/opt/local/lib FFTW_INCL= -I/opt/local/include FFTW_LIBS= -L/opt/local/lib HDF5INCL = -I/opt/local/include -DH5_USE_16_API HDF5LIB = -L/opt/local/lib -lhdf5 -lz CUDA_INCL= -I/Developer/NVIDIA/CUDA-5.0/include CUDA_LIBS= -Xlinker -rpath /Developer/NVIDIA/CUDA-5.0/lib -L/Developer/NVIDIA/CUDA-5.0/lib -lcudart -lnvToolsExt -framework CUDA NVCC = /Developer/NVIDIA/CUDA-5.0/bin/nvcc CUDA_OPTIMIZE = -g -G -O3 -m64 --ptxas-options=-v -Xptxas="-v" --maxrregcount=32 -arch=sm_30 $(filter -I%,$(shell mpicc -show)) endif # modules for Magny # module add mvapich2/gcc/64/1.6-qlc ifeq ($(SYSTYPE),"Magny") CC = mpicc OPTIMIZE = -g -Wall -m64 -O3 -msse3 ifeq (NUM_THREADS,$(findstring NUM_THREADS,$(CONFIGVARS))) OPTIMIZE += -fopenmp else OPTIMIZE += -Wno-unknown-pragmas endif GSL_INCL = -I/hits/tap/sw/libs/gsl-1.15/include GSL_LIBS = -L/hits/tap/sw/libs/gsl-1.15/lib -Xlinker -R -Xlinker /hits/tap/sw/libs/gsl-1.15/lib FFTW_INCL= -I/hits/tap/sw/libs/fftw-2.1.5/include FFTW_LIBS= -L/hits/tap/sw/libs/fftw-2.1.5/lib -Xlinker -R -Xlinker /hits/tap/sw/libs/fftw-2.1.5/lib GMP_INCL = -I/hits/tap/sw/libs/gmp-5.0.5/include GMP_LIBS = -L/hits/tap/sw/libs/gmp-5.0.5/lib -Xlinker -R -Xlinker /hits/tap/sw/libs/gmp-5.0.5/lib MPICHLIB = HDF5INCL = -I/hits/tap/sw/libs/hdf5-1.8.10/include -DH5_USE_16_API HDF5LIB = -L/hits/tap/sw/libs/hdf5-1.8.10/lib -lhdf5 -Xlinker -R -Xlinker /hits/tap/sw/libs/hdf5-1.8.10/lib #OPT += -DNOCALLSOFSYSTEM #OPT += -DIMPOSE_PINNING #OPT += -DUSE_SSE endif # modules for Nehalem cluster # module add mvapich2/gcc/64/1.6 ifeq ($(SYSTYPE),"Nehalem") CC = mpicc ifeq (SOFTDOUBLEDOUBLE,$(findstring SOFTDOUBLEDOUBLE,$(CONFIGVARS))) CC = mpicxx endif OPTIMIZE = -O3 -msse3 -g -Wall -m64 GSL_INCL = -I/hits/tap/sw/libs/include GSL_LIBS = -L/hits/tap/sw/libs/lib -Xlinker -R -Xlinker /hits/tap/sw/libs/lib FFTW_INCL= -I/hits/tap/sw/libs/include FFTW_LIBS= -L/hits/tap/sw/libs/lib -Xlinker 
-R -Xlinker /hits/tap/sw/libs/lib MPICHLIB = HDF5INCL = -I/hits/tap/sw/nehalem/include -DH5_USE_16_API #HDF5LIB = -L/hits/tap/sw/nehalem/lib -Xlinker -R -Xlinker /hits/tap/sw/nehalem/lib -lhdf5 HDF5LIB = /hits/tap/sw/nehalem/lib/libhdf5.a -lz #OPT += -DNOCALLSOFSYSTEM #OPT += -DIMPOSE_PINNING #OPT += -DUSE_SSE endif # modules for Magny-Intel # module load intel/compiler # module load mvapich2/intel/64/1.6-qlc ifeq ($(SYSTYPE),"Magny-Intel") CC = mpicc OPTIMIZE = -O2 -g -Wall -m64 ifeq (NUM_THREADS,$(findstring NUM_THREADS,$(CONFIGVARS))) OPTIMIZE += -openmp else OPTIMIZE += -Wno-unknown-pragmas endif GSL_INCL = -I/hits/tap/sw/libs/include GSL_LIBS = -L/hits/tap/sw/libs/lib -Xlinker -R -Xlinker /hits/tap/sw/libs/lib FFTW_INCL= FFTW_LIBS= MPICHLIB = MATHLIB = -limf -lm HDF5INCL = -DH5_USE_16_API HDF5LIB = -lhdf5 #OPT += -DNOCALLSOFSYSTEM OPT += -DIMPOSE_PINNING endif ifndef LINKER LINKER = $(CC) endif ########################################## #determine the needed object/header files# ########################################## SUBDIRS = . OBJS = main.o allocate.o allvars.o disk.o grid.o bulge.o set_particles.o parallel_sort.o \ halo.o init.o io.o mymalloc.o orbit_response.o parameters.o structure.o system.o disp_fields.o \ forcetree/gravtree.o forcetree/forcetree.o forcetree/forcetree_walk.o domain/peano.o domain/pqueue.o \ domain/domain.o domain/domain_balance.o domain/domain_counttogo.o domain/domain_exchange.o \ domain/domain_rearrange.o domain/domain_sort_kernels.o domain/domain_toplevel.o domain/domain_vars.o domain/domain_box.o INCL += allvars.h proto.h SUBDIRS += forcetree domain ################################ #determine the needed libraries# ################################ ifneq (HAVE_HDF5,$(findstring HAVE_HDF5,$(CONFIGVARS))) HDF5LIB = endif ifeq (NUM_THREADS,$(findstring NUM_THREADS,$(CONFIGVARS))) THREAD_LIB = endif ########################## #combine compiler options# ########################## CFLAGS = $(OPTIMIZE) $(OPT) $(HDF5INCL) $(GSL_INCL) $(FFTW_INCL) $(ODE_INCL) $(GMP_INCL) $(MKL_INCL) $(CUDA_INCL) -I$(BUILD_DIR) CFLAGS_CUDA = $(CUDA_OPTIMIZE) $(OPT) $(GSL_INCL) $(FFTW_INCL) $(HDF5INCL) $(ODE_INCL) $(GMP_INCL) $(MKL_INCL) $(CUDA_INCL) -I$(BUILD_DIR) LIBS = $(MATHLIB) $(HDF5LIB) $(MPICHLIB) $(GSL_LIBS) $(GSLLIB) $(FFTW_LIB) $(GMP_LIBS) $(GMPLIB) $(ODE_LIB) $(MKL_LIBS) $(THREAD_LIB) $(CUDA_LIBS) SUBDIRS := $(addprefix $(BUILD_DIR)/,$(SUBDIRS)) OBJS := $(addprefix $(BUILD_DIR)/,$(OBJS)) $(BUILD_DIR)/compile_time_info.o INCL := $(addprefix $(SRC_DIR)/,$(INCL)) $(BUILD_DIR)/galicconfig.h ################ #create subdirs# ################ RESULT := $(shell mkdir -p $(SUBDIRS) ) ############# #build rules# ############# all: $(EXEC) $(EXEC): $(OBJS) $(LINKER) $(OPTIMIZE) $(OBJS) $(LIBS) -o $(EXEC) clean: rm -f $(OBJS) $(EXEC) lib$(LIBRARY).a rm -f $(BUILD_DIR)/compile_time_info.c $(BUILD_DIR)/galicconfig.h $(BUILD_DIR)/%.o: $(SRC_DIR)/%.c $(INCL) $(MAKEFILES) $(CC) $(CFLAGS) -c $< -o $@ $(BUILD_DIR)/compile_time_info.o: $(BUILD_DIR)/compile_time_info.c $(MAKEFILES) $(CC) $(CFLAGS) -c $< -o $@ GalIC/Template-Config.sh000644 000765 000024 00000001607 12373713531 015765 0ustar00volkerstaff000000 000000 #******************************************************************************* # This file is part of the GALIC code developed by D. Yurin and V. Springel. 
# # Copyright (c) 2014 # Denis Yurin (denis.yurin@h-its.org) # Volker Springel (volker.springel@h-its.org) #******************************************************************************* #!/bin/bash # this line only there to enable syntax highlighting in this file #---------------------------------------- Single/Double Precision DOUBLEPRECISION=1 #OUTPUT_IN_DOUBLEPRECISION # snapshot files will be written in double precision #--------------------------------------- Output/Input options HAVE_HDF5 # needed when HDF5 I/O support is desired #DEBUG_ENABLE_FPU_EXCEPTIONS #enables floating point exceptions #--------------------------------------- Special behaviour #RADIAL_WEIGHTING_IN_DENSITY_RESPONSEGalIC/Template-Makefile.systype000644 000765 000024 00000001530 12373713530 017375 0ustar00volkerstaff000000 000000 # Select Target Computer # # Please copy this file to Makefile.systype and uncomment your # system. Don't commit changes to this file unless you add support for # a new system. #SYSTYPE="Curie" #SYSTYPE="Hermite" #SYSTYPE="Ranger_pgi" #SYSTYPE="Ranger_intel" #SYSTYPE="lonestar" #SYSTYPE="Kraken_pgi" #SYSTYPE="aurora" #SYSTYPE="hecate" #SYSTYPE="Darwin" #SYSTYPE="Darwin-mpich" #SYSTYPE="MBM" #SYSTYPE="Magny" #SYSTYPE="Magny-Intel" #SYSTYPE="Nehalem" #SYSTYPE="OpenSuse" #SYSTYPE="OpenSuse64" #SYSTYPE="OpenSuse64-cuda" #SYSTYPE="Judge" #SYSTYPE="HLRB2" #SYSTYPE="OPA-Cluster64-Intel" #SYSTYPE="OPA-Cluster64-Gnu" #SYSTYPE="Odin" #SYSTYPE="OpteronMPA-Gnu" #SYSTYPE="OpteronMPA-Intel" #SYSTYPE="MPA" #SYSTYPE="VIP" #SYSTYPE="odyssey" #SYSTYPE="odyssey-intel" #SYSTYPE="odyssey-opteron" #SYSTYPE="Ubuntu" #SYSTYPE="Centos5-intel" #SYSTYPE="Centos5-Gnu" GalIC/config-makefile000644 000765 000024 00000000255 12373713530 015413 0ustar00volkerstaff000000 000000 RESULT := $(shell mkdir -p $(BUILD_DIR) ) all: $(BUILD_DIR)/galicconfig.h $(BUILD_DIR)/galicconfig.h: $(CONFIG) $(PERL) prepare-config.perl $(CONFIG) $(BUILD_DIR) GalIC/prepare-config.perl000644 000765 000024 00000002460 12373727413 016242 0ustar00volkerstaff000000 000000 #******************************************************************************* # This file is part of the GALIC code developed by D. Yurin and V. Springel. # # Copyright (c) 2014 # Denis Yurin (denis.yurin@h-its.org) # Volker Springel (volker.springel@h-its.org) #******************************************************************************* # # This file processes the configurations options in Config.sh, producing # two files: # # galicconfig.h to be included in each source file (via allvars.h) # compile_time_info.c code to be compiled in, which will print the configuration # if( @ARGV != 2) { print "usage: perl prepare-config.perl \n"; exit; } open(FILE, @ARGV[0]); $path = @ARGV[1]; open(OUTFILE, ">${path}/galicconfig.h"); open(COUTF, ">${path}/compile_time_info.c"); print COUTF "#include \n"; print COUTF "void output_compile_time_options(void)\n\{\n"; print COUTF "printf(\n"; while($line=) { chop $line; @fields = split ' ' , $line; if(substr($fields[0], 0, 1) ne "#") { if(length($fields[0]) > 0) { @subfields = split '=', $fields[0]; print OUTFILE "#define $subfields[0] $subfields[1]\n"; print COUTF "\" $fields[0]\\n\"\n"; } } } print COUTF "\"\\n\");\n"; print COUTF "\}\n"; GalIC/src/allocate.c000644 000765 000024 00000006016 12373713530 015170 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include "allvars.h" #include "proto.h" /* This routine allocates memory for * particle storage, both the collisionless and the SPH particles. * The memory for the ordered binary tree of the timeline * is also allocated. */ void allocate_memory(void) { int NTaskTimesThreads; NTaskTimesThreads = MaxThreads * NTask; Exportflag = (int *) mymalloc("Exportflag", NTaskTimesThreads * sizeof(int)); Exportindex = (int *) mymalloc("Exportindex", NTaskTimesThreads * sizeof(int)); Exportnodecount = (int *) mymalloc("Exportnodecount", NTaskTimesThreads * sizeof(int)); Send_count = (int *) mymalloc("Send_count", sizeof(int) * NTaskTimesThreads); Send_offset = (int *) mymalloc("Send_offset", sizeof(int) * NTaskTimesThreads); Recv_count = (int *) mymalloc("Recv_count", sizeof(int) * NTask); Recv_offset = (int *) mymalloc("Recv_offset", sizeof(int) * NTask); Send_count_nodes = (int *) mymalloc("Send_count_nodes", sizeof(int) * NTask); Send_offset_nodes = (int *) mymalloc("Send_offset_nodes", sizeof(int) * NTask); Recv_count_nodes = (int *) mymalloc("Recv_count_nodes", sizeof(int) * NTask); Recv_offset_nodes = (int *) mymalloc("Recv_offset_nodes", sizeof(int) * NTask); Mesh_Send_count = (int *) mymalloc("Mesh_Send_count", sizeof(int) * NTask); Mesh_Send_offset = (int *) mymalloc("Mesh_Send_offset", sizeof(int) * NTask); Mesh_Recv_count = (int *) mymalloc("Mesh_Recv_count", sizeof(int) * NTask); Mesh_Recv_offset = (int *) mymalloc("Mesh_Recv_offset", sizeof(int) * NTask); P = (struct particle_data *) mymalloc_movable(&P, "P", All.MaxPart * sizeof(struct particle_data)); ActiveGravityParticles = (int *) mymalloc_movable(&ActiveGravityParticles, "ActiveGravityParticle", All.MaxPart * sizeof(int)); /* set to zero */ memset(P, 0, All.MaxPart * sizeof(struct particle_data)); } void free_allocated_memory(void) { myfree(ActiveGravityParticles); myfree(P); myfree(Mesh_Recv_offset); myfree(Mesh_Recv_count); myfree(Mesh_Send_offset); myfree(Mesh_Send_count); myfree(Recv_offset_nodes); myfree(Recv_count_nodes); myfree(Send_offset_nodes); myfree(Send_count_nodes); myfree(Recv_offset); myfree(Recv_count); myfree(Send_offset); myfree(Send_count); myfree(Exportnodecount); myfree(Exportindex); myfree(Exportflag); } void reallocate_memory_maxpart(void) { mpi_printf("ALLOCATE: Changing to MaxPart = %d\n", All.MaxPart); P = (struct particle_data *) myrealloc_movable(P, All.MaxPart * sizeof(struct particle_data)); ActiveGravityParticles = (int *) myrealloc_movable(ActiveGravityParticles, All.MaxPart * sizeof(int)); } GalIC/src/allvars.c000644 000765 000024 00000027772 12373713530 015064 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ /*! \file allvars.h * \brief declares global variables. * * This file declares all global variables. Further variables should be added here, and declared as * 'extern'. The actual existence of these variables is provided by the file 'allvars.c'. 
To produce * 'allvars.c' from 'allvars.h', do the following: * * - Erase all #define statements * - add #include "allvars.h" * - delete all keywords 'extern' * - delete all struct definitions enclosed in {...}, e.g. * "extern struct global_data_all_processes {....} All;" * becomes "struct global_data_all_processes All;" */ #include "allvars.h" #ifdef PERIODIC MyDouble boxSize, boxHalf; #ifdef LONG_X MyDouble boxSize_X, boxHalf_X; #else #endif #ifdef LONG_Y MyDouble boxSize_Y, boxHalf_Y; #else #endif #ifdef LONG_Z MyDouble boxSize_Z, boxHalf_Z; #else #endif #endif #ifdef FIX_PATHSCALE_MPI_STATUS_IGNORE_BUG MPI_Status mpistat; #endif /*********************************************************/ /* Global variables */ /*********************************************************/ int FG_Nbin, FG_Ngrid; double FG_Rmin, FG_Fac, FG_Rin; double *FG_Pot; double *FG_DPotDR; double *FG_DPotDz; double *FG_Pot_exact; double *FG_DPotDR_exact; double *FG_DPotDz_exact; double *FG_Disp_r[6]; double *FG_DispZ[6]; double *FG_DispPhi[6]; double *FG_Vstream[6]; double *FG_tilted_vz2[6]; double *FG_tilted_vR2[6]; double *FG_tilted_vz2_prime[6]; double *FG_tilted_vR2_prime[6]; double *FG_R; int EG_MaxLevel, EG_Nstack, EG_Nbin, EG_Ngrid; double EG_Fac, EG_Rin, EG_Rmin; double *EG_R; double *EGs_EgyResponse_r[6]; double *EGs_EgyResponse_t[6]; double *EGs_EgyResponse_p[6]; double *EGs_EgyResponse_q[6]; double *EGs_EgyTarget_r[6]; double *EGs_EgyTarget_t[6]; double *EGs_EgyTarget_p[6]; double *EGs_EgyTarget_q[6]; double *EGs_MassTarget[6]; double *EGs_MassResponse[6]; double *EG_MassLoc[6]; double *EG_EgyResponseRLoc[6]; double *EG_EgyResponseTLoc[6]; double *EG_EgyResponsePLoc[6]; double *EG_EgyResponseQLoc[6]; double *EG_EgyResponseRLoc_delta[6]; double *EG_EgyResponseTLoc_delta[6]; double *EG_EgyResponsePLoc_delta[6]; double *EG_EgyResponseQLoc_delta[6]; int DG_MaxLevel, DG_Nstack, DG_Nbin, DG_Ngrid; double DG_Rmin, DG_Fac, DG_Rin; double *DG_CellVol; double *DG_CellSize; double *DGs_LogR; double *DGs_LogZ; double *DGs_Distance; double *DGs_MassTarget[6]; double *DGs_MassResponse[6]; double *DG_MassLoc[6]; double *DG_MassLoc_delta[6]; double Totorbits[6]; int Tries[6]; int Changes[6]; double TotDv2Sum[6]; double Epsilon; double Tintegrate; double S[6]; double Sdisp_r[6]; double Sdisp_t[6]; double Sdisp_p[6]; double Sdisp_q[6]; double Srelfac[6]; double Srelfac_count[6]; double MType[6]; int NType[6]; double SizeType[6]; int CountLargeChange[6]; int Noptimized; FILE *FdFit[6]; int ThisTask; /*!< the number of the local processor */ int NTask; /*!< number of processors */ int PTask; /*!< note: NTask = 2^PTask */ double CPUThisRun; /*!< Sums CPU time of current process */ int NumForceUpdate; /*!< number of active particles on local processor in current timestep */ long long GlobNumForceUpdate; int NumSphUpdate; /*!< number of active SPH particles on local processor in current timestep */ int MaxTopNodes; /*!< Maximum number of nodes in the top-level tree used for domain decomposition */ int RestartFlag; /*!< taken from command line used to start code. 0 is normal start-up from initial conditions, 1 is resuming a run from a set of restart files, while 2 marks a restart from a snapshot file. 
*/ int RestartSnapNum; int Argc; char **Argv; int Nforces; int Ndensities; int Nhydroforces; int *TargetList; int *Threads_P_CostCount[NUM_THREADS]; int *Threads_TreePoints_CostCount[NUM_THREADS]; int *Threads_Node_CostCount[NUM_THREADS]; int maxThreads = NUM_THREADS; #ifdef IMPOSE_PINNING cpu_set_t cpuset_thread[NUM_THREADS]; #endif int *Exportflag, *ThreadsExportflag[NUM_THREADS]; /*!< Buffer used for flagging whether a particle needs to be exported to another process */ int *Exportnodecount; int *Exportindex; int *Send_offset, *Send_count, *Recv_count, *Recv_offset; int *Send_offset_nodes, *Send_count_nodes, *Recv_count_nodes, *Recv_offset_nodes; int Mesh_nimport, Mesh_nexport, *Mesh_Send_offset, *Mesh_Send_count, *Mesh_Recv_count, *Mesh_Recv_offset; int TakeLevel; int SelRnd; FILE *FdMemory; unsigned char *ProcessedFlag; int TimeBinCount[TIMEBINS]; int TimeBinCountSph[TIMEBINS]; int TimeBinCountSphHydro[TIMEBINS]; int TimeBinActive[TIMEBINS]; int NActiveHydro; int NActiveGravity; int *ActiveGravityParticles; int *ActiveHydroParticles; long long GlobalNActiveHydro; long long GlobalNActiveGravity; #ifdef USE_SFR double TimeBinSfr[TIMEBINS]; #endif #ifdef SUBFIND int GrNr; int NumPartGroup; #endif int FlagNyt = 0; char DumpFlag = 1; size_t AllocatedBytes; size_t HighMarkBytes; size_t FreeBytes; size_t HighMark_run, HighMark_domain, HighMark_gravtree, HighMark_pmperiodic, HighMark_pmnonperiodic, HighMark_sphdensity, HighMark_sphhydro, HighMark_subfind_processing, HighMark_subfind_density; double WallclockTime; /*!< This holds the last wallclock time measurement for timings measurements */ double StartOfRun; /*!< This stores the time of the start of the run for evaluating the elapsed time */ double EgyInjection; int NumPart; /*!< number of particles on the LOCAL processor */ int NumGas; /*!< number of gas particles on the LOCAL processor */ gsl_rng *random_generator; /*!< the random number generator used */ #ifdef USE_SFR int Stars_converted; /*!< current number of star particles in gas particle block */ #endif #ifdef TOLERATE_WRITE_ERROR int WriteErrorFlag; #endif double TimeOfLastDomainConstruction; /*!< holds what it says */ int *Ngblist; /*!< Buffer to hold indices of neighbours retrieved by the neighbour search routines */ double DomainCorner[3], DomainCenter[3], DomainLen, DomainFac; double DomainInverseLen, DomainBigFac; int *DomainStartList, *DomainEndList; double *DomainCost, *TaskCost; int *DomainCount, *TaskCount; struct no_list_data *ListNoData; int domain_bintolevel[TIMEBINS]; int domain_refbin[TIMEBINS]; int domain_corr_weight[TIMEBINS]; int domain_full_weight[TIMEBINS]; double domain_reffactor[TIMEBINS]; int domain_to_be_balanced[TIMEBINS]; int *DomainTask; int *DomainNewTask; int *DomainNodeIndex; peanokey *Key, *KeySorted; struct topnode_data *TopNodes; int NTopnodes, NTopleaves; /* variables for input/output , usually only used on process 0 */ char ParameterFile[MAXLEN_PATH]; /*!< file name of parameterfile used for starting the simulation */ FILE *FdInfo, /*!< file handle for info.txt log-file. */ *FdEnergy, /*!< file handle for energy.txt log-file. */ *FdTimings, /*!< file handle for timings.txt log-file. */ *FdDomain, /*!< file handle for domain.txt log-file. */ *FdBalance, /*!< file handle for balance.txt log-file. */ *FdMemory, *FdTimebin, *FdCPU; /*!< file handle for cpu.txt log-file. */ #ifdef OUTPUT_CPU_CSV FILE *FdCPUCSV; #endif #ifdef USE_SFR FILE *FdSfr; /*!< file handle for sfr.txt log-file. 
*/ #endif struct pair_data *Pairlist; #ifdef FORCETEST FILE *FdForceTest; /*!< file handle for forcetest.txt log-file. */ #endif #ifdef DARKENERGY FILE *FdDE; /*!< file handle for darkenergy.txt log-file. */ #endif int WriteMiscFiles = 1; void *CommBuffer; /*!< points to communication buffer, which is used at a few places */ /*! This structure contains data which is the SAME for all tasks (mostly code parameters read from the * parameter file). Holding this data in a structure is convenient for writing/reading the restart file, and * it allows the introduction of new global variables in a simple way. The only thing to do is to introduce * them into this structure. */ struct global_data_all_processes All; /*! This structure holds all the information that is * stored for each particle of the simulation. */ struct particle_data *P, /*!< holds particle data on local processor */ *DomainPartBuf; /*!< buffer for particle data used in domain decomposition */ struct subfind_data *PS; /* the following struture holds data that is stored for each SPH particle in addition to the collisionless * variables. */ struct sph_particle_data *SphP, /*!< holds SPH particle data on local processor */ *DomainSphBuf; /*!< buffer for SPH particle data in domain decomposition */ #ifdef EXACT_GRAVITY_FOR_PARTICLE_TYPE struct special_particle_data *PartSpecialListGlobal; #endif peanokey *DomainKeyBuf; /* Various structures for communication during the gravity computation. */ struct data_index *DataIndexTable; /*!< the particles to be exported are grouped by task-number. This table allows the results to be disentangled again and to be assigned to the correct particle */ struct data_nodelist *DataNodeList; struct gravdata_in *GravDataIn, /*!< holds particle data to be exported to other processors */ *GravDataGet; /*!< holds particle data imported from other processors */ struct gravdata_out *GravDataResult, /*!< holds the partial results computed for imported particles. Note: We use GravDataResult = GravDataGet, such that the result replaces the imported data */ *GravDataOut; /*!< holds partial results received from other processors. This will overwrite the GravDataIn array */ int ThreadsNexport[NUM_THREADS], ThreadsNexportNodes[NUM_THREADS]; int *ThreadsNgblist[NUM_THREADS]; struct data_partlist *PartList, *ThreadsPartList[NUM_THREADS]; struct datanodelist *NodeList, *ThreadsNodeList[NUM_THREADS]; int *NodeDataGet, *NodeDataIn; struct potdata_out *PotDataResult, /*!< holds the partial results computed for imported particles. Note: We use GravDataResult = GravDataGet, such that the result replaces the imported data */ *PotDataOut; /*!< holds partial results received from other processors. This will overwrite the GravDataIn array */ /*! Header for the standard file format. 
*/ struct io_header header; /*!< holds header for snapshot files */ #ifdef PARAMS_IN_SNAP char Parameters[MAX_PARAMETERS][MAXLEN_PARAM_TAG]; /*!< holds the tags of the parameters defined in the parameter file */ char ParameterValues[MAX_PARAMETERS][MAXLEN_PARAM_VALUE]; /*!< holds the values for the parameters defined in the parameter file */ #endif /* * Variables for Tree * ------------------ */ int Nexport, Nimport; int NexportNodes, NimportNodes; int MaxNexport, MaxNexportNodes; int BufferFullFlag; int NextParticle; int NextJ; struct permutation_data *permutation; /** Variables for gravitational tree */ int Tree_MaxPart; int Tree_NumNodes; int Tree_MaxNodes; int Tree_FirstNonTopLevelNode; int Tree_NumPartImported; int Tree_NumPartExported; int Tree_ImportedNodeOffset; int Tree_NextFreeNode; MyDouble *Tree_Pos_list; unsigned long long *Tree_IntPos_list; int *Tree_Task_list; int *Tree_ResultIndexList; struct treepoint_data *Tree_Points; struct resultsactiveimported_data *Tree_ResultsActiveImported; int *Nextnode; /*!< gives next node in tree walk (nodes array) */ int *Father; /*!< gives parent node in tree (Prenodes array) */ struct NODE *Nodes; /*!< points to the actual memory allocted for the nodes */ /*!< this is a pointer used to access the nodes which is shifted such that Nodes[All.MaxPart] gives the first allocated node */ float *Nodes_GravCost; /** Variables for neighbor tree */ int Ngb_MaxPart; int Ngb_NumNodes; int Ngb_MaxNodes; int Ngb_FirstNonTopLevelNode; int Ngb_NextFreeNode; int *Ngb_DomainNodeIndex; int *Ngb_Nextnode; /** The ngb-tree data structure */ struct NgbNODE *Ngb_Nodes; struct ExtNgbNODE *ExtNgb_Nodes; #ifdef STATICNFW double Rs, R200; double Dc; double RhoCrit, V200; double fac; #endif #ifdef NUM_THREADS int MaxThreads = NUM_THREADS; #else int MaxThreads = 1; #endif GalIC/src/bulge.c000644 000765 000024 00000005017 12373713530 014502 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "allvars.h" #include "proto.h" /* this function returns a new random coordinate for the bulge */ void bulge_get_fresh_coordinate(double *pos) { double r; do { double q = gsl_rng_uniform(random_generator); if(q > 0) r = All.Bulge_A * (q + sqrt(q)) / (1 - q); else r = 0; } while(r > All.Rmax); double phi = gsl_rng_uniform(random_generator) * M_PI * 2; double theta = acos(gsl_rng_uniform(random_generator) * 2 - 1); pos[0] = r * sin(theta) * cos(phi); pos[1] = r * sin(theta) * sin(phi); pos[2] = r * cos(theta) / All.BulgeStretch; } /* return the bulge density for the given coordinate */ double bulge_get_density(double *pos) { double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1] + All.BulgeStretch * All.BulgeStretch * pos[2] * pos[2]); return All.BulgeStretch * All.Bulge_Mass / (2 * M_PI) * All.Bulge_A / (r + 1.0e-6 * All.Bulge_A) / pow(r + All.Bulge_A, 3); } /* Note that the other functions below will only be called in a meaningfull for a spherical system */ double bulge_get_mass_inside_radius(double r) { if(All.Bulge_Mass > 0) return All.Bulge_Mass * pow(r / (r + All.Bulge_A), 2); else return 0; } double bulge_get_potential(double *pos) { double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1] + pos[2] * pos[2]); return bulge_get_potential_from_radius(r); } double bulge_get_potential_from_radius(double r) { double phi = -All.G * All.Bulge_Mass / (r + All.Bulge_A); return phi; } /* returns the acceleration at coordinate pos[] */ void bulge_get_acceleration(double *pos, double *acc) { double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1] + pos[2] * pos[2]); double fac = All.G * All.Bulge_Mass / ((r + 1.0e-6 * All.Bulge_A)* (r + All.Bulge_A) * (r + All.Bulge_A)); acc[0] = -fac * pos[0]; acc[1] = -fac * pos[1]; acc[2] = -fac * pos[2]; } double bulge_get_escape_speed(double *pos) { double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1] + pos[2] * pos[2]); double phi = -All.G * All.Bulge_Mass / (r + All.Bulge_A); double vesc = sqrt(-2.0 * phi); return vesc; } GalIC/src/disk.c000644 000765 000024 00000003507 12373713530 014340 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "allvars.h" #include "proto.h" /* this function returns a new random coordinate for the disk */ void disk_get_fresh_coordinate(double *pos) { double q, f, f_, R, R2, Rold, phi; do { q = gsl_rng_uniform(random_generator); pos[2] = All.Disk_Z0 / 2 * log(q / (1 - q)); q = gsl_rng_uniform(random_generator); R = 1.0; do { f = (1 + R) * exp(-R) + q - 1; f_ = -R * exp(-R); Rold = R; R = R - f / f_; } while(fabs(R - Rold) / R > 1e-7); R *= All.Disk_H; phi = gsl_rng_uniform(random_generator) * M_PI * 2; pos[0] = R * cos(phi); pos[1] = R * sin(phi); R2 = pos[0] * pos[0] + pos[1] * pos[1] + pos[2] * pos[2]; } while(R2 > All.Rmax * All.Rmax); } /* return the density of the disk at the given coordinate */ double disk_get_density(double *pos) { if(All.Disk_Mass > 0) { double R = sqrt(pos[0] * pos[0] + pos[1] * pos[1]); double z = pos[2]; double rho = All.Disk_Mass / (4 * M_PI * All.Disk_H * All.Disk_H * All.Disk_Z0) * exp(-R / All.Disk_H) * pow(2 / (exp(z / All.Disk_Z0) + exp(-z / All.Disk_Z0)), 2); return rho; } else return 0; } /* return the disk mass contained inside the specified radius */ double disk_get_mass_inside_radius(double R) { return All.Disk_Mass * (1 - (1 + R / All.Disk_H) * exp(-R / All.Disk_H)); } GalIC/src/disp_fields.c000644 000765 000024 00000064071 12373713530 015676 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include "allvars.h" #include "proto.h" double get_density_of_type(double *pos, int type) { if(type == 1) return halo_get_density(pos); else if(type == 2) return disk_get_density(pos); else if(type == 3) return bulge_get_density(pos); else terminate("unknown type"); return 0; } double get_beta_of_type(double *pos, int type) { double beta = 0; if(type == 1) { beta = All.HaloBetaParameter; if(beta >= 1) { /* this signals that we adopt a beta that depends on the local density slope */ double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1] + pos[2] * pos[2]); double dlogrhodr = -1.0 - 3.0 * r / (r + All.Halo_A); beta = -0.15 - 0.20 * dlogrhodr; } } else if(type == 2) { /* for the disk component, we only support beta = 0 */ beta = 0; } else if(type == 3) { beta = All.BulgeBetaParameter; if(beta >= 1) { /* this signals that we adopt a beta that depends on the local density slope */ double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1] + pos[2] * pos[2]); double dlogrhodr = -1.0 - 3.0 * r / (r + All.Bulge_A); beta = -0.15 - 0.20 * dlogrhodr; } } else terminate("unknown type"); return beta; } void get_disp_rtp(double *pos, int type, double *disp_r, double *disp_t, double *disp_p, double *disp_q) { int typeOfVelocityStructure = 0; if(type == 1) /* a halo particle */ typeOfVelocityStructure = All.TypeOfHaloVelocityStructure; else if(type == 2) /* disk */ typeOfVelocityStructure = All.TypeOfDiskVelocityStructure; else if(type == 3) /* bulge */ typeOfVelocityStructure = All.TypeOfBulgeVelocityStructure; else terminate("unknown type"); if(typeOfVelocityStructure == 0) /* 
spherical, isotropic case */ { *disp_r = get_radial_disp_spherical(pos, type); *disp_t = *disp_r; *disp_p = *disp_r; *disp_q = *disp_r; } else if(typeOfVelocityStructure == 1) /* spherical, anisotropic case */ { *disp_r = get_radial_disp_spherical(pos, type); *disp_t = (1 - get_beta_of_type(pos, type)) * (*disp_r); *disp_p = *disp_t; *disp_q = *disp_t; } else if(typeOfVelocityStructure == 2) { *disp_t = get_z_disp_cylindrical(pos, type); *disp_r = *disp_t; *disp_p = get_phi_disp(pos, type); double vstr = get_vstream(pos, type); *disp_q = (*disp_p) + vstr * vstr; } else if(typeOfVelocityStructure == 3) { *disp_r = get_r_disp_tilted(pos, type); *disp_t = get_theta_disp_tilted(pos, type); *disp_p = get_phi_disp(pos, type); double vstr = get_vstream(pos, type); *disp_q = (*disp_p) + vstr * vstr; } else terminate("unknown velocity structure"); } double get_vstream(double *pos, int type) { int iz, iR; double fR, fz; forcegrid_get_cell(pos, &iR, &iz, &fR, &fz); double vstr = FG_Vstream[type][iz * FG_Nbin + iR] * (1 - fR) * (1 - fz) + FG_Vstream[type][(iz + 1) * FG_Nbin + iR] * (1 - fR) * (fz) + FG_Vstream[type][iz * FG_Nbin + (iR + 1)] * (fR) * (1 - fz) + FG_Vstream[type][(iz + 1) * FG_Nbin + (iR + 1)] * (fR) * (fz); return vstr; } double get_z_disp_cylindrical(double *pos, int type) { int iz, iR; double fR, fz; forcegrid_get_cell(pos, &iR, &iz, &fR, &fz); double disp = FG_DispZ[type][iz * FG_Nbin + iR] * (1 - fR) * (1 - fz) + FG_DispZ[type][(iz + 1) * FG_Nbin + iR] * (1 - fR) * (fz) + FG_DispZ[type][iz * FG_Nbin + (iR + 1)] * (fR) * (1 - fz) + FG_DispZ[type][(iz + 1) * FG_Nbin + (iR + 1)] * (fR) * (fz); return disp; } double get_phi_disp(double *pos, int type) { int iz, iR; double fR, fz; forcegrid_get_cell(pos, &iR, &iz, &fR, &fz); double disp = FG_DispPhi[type][iz * FG_Nbin + iR] * (1 - fR) * (1 - fz) + FG_DispPhi[type][(iz + 1) * FG_Nbin + iR] * (1 - fR) * (fz) + FG_DispPhi[type][iz * FG_Nbin + (iR + 1)] * (fR) * (1 - fz) + FG_DispPhi[type][(iz + 1) * FG_Nbin + (iR + 1)] * (fR) * (fz); return disp; } double get_r_disp_tilted(double *pos, int type) { int iz, iR; double fR, fz; forcegrid_get_cell(pos, &iR, &iz, &fR, &fz); double disp = FG_tilted_vR2_prime[type][iz * FG_Nbin + iR] * (1 - fR) * (1 - fz) + FG_tilted_vR2_prime[type][(iz + 1) * FG_Nbin + iR] * (1 - fR) * (fz) + FG_tilted_vR2_prime[type][iz * FG_Nbin + (iR + 1)] * (fR) * (1 - fz) + FG_tilted_vR2_prime[type][(iz + 1) * FG_Nbin + (iR + 1)] * (fR) * (fz); return disp; } double get_theta_disp_tilted(double *pos, int type) { int iz, iR; double fR, fz; forcegrid_get_cell(pos, &iR, &iz, &fR, &fz); double disp = FG_tilted_vz2_prime[type][iz * FG_Nbin + iR] * (1 - fR) * (1 - fz) + FG_tilted_vz2_prime[type][(iz + 1) * FG_Nbin + iR] * (1 - fR) * (fz) + FG_tilted_vz2_prime[type][iz * FG_Nbin + (iR + 1)] * (fR) * (1 - fz) + FG_tilted_vz2_prime[type][(iz + 1) * FG_Nbin + (iR + 1)] * (fR) * (fz); return disp; } /* this function decomposes the velocity vector vel[0,1,2] assigned to particle n into * the relevant 'radial' and 'tangential' velocity components squared */ void calc_disp_components_for_particle(int n, double *vel, double *vr2, double *vt2, double *vp2, double *vq2) { int type = P[n].Type; int typeOfVelocityStructure; if(type == 1) /* a halo particle */ typeOfVelocityStructure = All.TypeOfHaloVelocityStructure; else if(type == 2) /* disk */ typeOfVelocityStructure = All.TypeOfDiskVelocityStructure; else if(type == 3) /* bulge */ typeOfVelocityStructure = All.TypeOfBulgeVelocityStructure; else terminate("unknown type"); 
if(typeOfVelocityStructure == 0 || typeOfVelocityStructure == 1 || typeOfVelocityStructure == 3) { double phi = atan2(P[n].Pos[1], P[n].Pos[0]); double theta = acos(P[n].Pos[2] / sqrt(P[n].Pos[0] * P[n].Pos[0] + P[n].Pos[1] * P[n].Pos[1] + P[n].Pos[2] * P[n].Pos[2])); double er[3], ePhi[3], eTheta[3]; er[0] = sin(theta) * cos(phi); er[1] = sin(theta) * sin(phi); er[2] = cos(theta); ePhi[0] = -sin(phi); ePhi[1] = cos(phi); ePhi[2] = 0; eTheta[0] = -cos(theta) * cos(phi); eTheta[1] = -cos(theta) * sin(phi); eTheta[2] = sin(theta); double vr = vel[0] * er[0] + vel[1] * er[1] + vel[2] * er[2]; double vphi = vel[0] * ePhi[0] + vel[1] * ePhi[1] + vel[2] * ePhi[2]; double vtheta = vel[0] * eTheta[0] + vel[1] * eTheta[1] + vel[2] * eTheta[2]; double vstr = 0; if(typeOfVelocityStructure == 1 || typeOfVelocityStructure == 3) vstr = get_vstream(P[n].Pos, type); *vr2 = vr * vr; *vt2 = vtheta * vtheta; *vp2 = (vphi - vstr) * (vphi - vstr); *vq2 = vphi * vphi; } else if(typeOfVelocityStructure == 2) { double phi = atan2(P[n].Pos[1], P[n].Pos[0]); double eR[3], ePhi[3], eZ[3]; eR[0] = cos(phi); eR[1] = sin(phi); eR[2] = 0; ePhi[0] = -sin(phi); ePhi[1] = cos(phi); ePhi[2] = 0; eZ[0] = 0; eZ[1] = 0; eZ[2] = 1; double vR = vel[0] * eR[0] + vel[1] * eR[1] + vel[2] * eR[2]; double vphi = vel[0] * ePhi[0] + vel[1] * ePhi[1] + vel[2] * ePhi[2]; double vZ = vel[0] * eZ[0] + vel[1] * eZ[1] + vel[2] * eZ[2]; double vstr = get_vstream(P[n].Pos, type); *vr2 = vR * vR; *vt2 = vZ * vZ; *vp2 = (vphi - vstr) * (vphi - vstr); *vq2 = vphi * vphi; } } double get_radial_disp_spherical(double *pos, int type) { double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1] + pos[2] * pos[2]); double br = (log(r / FG_Rmin + 1.0) / log(FG_Fac)); int binr; double fR; if(br < 0) br = 0; binr = (int) br; fR = br - binr; if(binr < 0) terminate("binr=%d\n", binr); if(binr >= FG_Nbin - 1) { binr = FG_Nbin - 2; fR = 1; } double disp = FG_Disp_r[type][binr] * (1 - fR) + FG_Disp_r[type][binr + 1] * (fR); return disp; } struct int_parameters { double R; int type; }; double disp_integ(double z, void *param) { double pos[3], acc[3]; struct int_parameters *par; par = param; pos[0] = par->R; pos[1] = 0; pos[2] = z; forcegrid_get_acceleration(pos, acc); return -acc[2] * get_density_of_type(pos, par->type); } double integrate_axisymmetric_jeans(double zstart, double zend, double R, int type) { int steps = 50, i; double dz = (zend - zstart) / steps; double sum = 0; for(i = 0; i < steps; i++) { double z0 = zstart + i * dz; double z1 = zstart + (i + 1) * dz; double pos0[3] = {R, 0, z0}; double pos1[3] = {R, 0, z1}; double acc0[3], acc1[3]; forcegrid_get_acceleration(pos0, acc0); double y0 = -acc0[2] * get_density_of_type(pos0, type); forcegrid_get_acceleration(pos1, acc1); double y1 = -acc1[2] * get_density_of_type(pos1, type); sum += 0.5 * (y0 + y1) * dz; } return sum; } double integrate_spherical_jeans_beta(int type, double ystart, double rstart, double rend) { int steps = 50, i; double dr = (rend - rstart) / steps; for(i = 0; i < steps; i++) { double r0 = rstart + i * dr; double r1 = rstart + (i + 1) * dr; double pos0[3] = {r0, 0, 0}; double pos1[3] = {r1, 0, 0}; double beta0 = get_beta_of_type(pos0, type); double beta1 = get_beta_of_type(pos1, type); double pos[3], acc[3]; pos[0] = r0; pos[1] = 0; pos[2] = 0; double dens0 = get_density_of_type(pos, type); forcegrid_get_acceleration(pos, acc); double acc0 = acc[0]; pos[0] = r1; pos[1] = 0; pos[2] = 0; double dens1 = get_density_of_type(pos, type); forcegrid_get_acceleration(pos, acc); double acc1 = acc[0]; 
if(r1 > fabs(dr) && r0 > fabs(dr)) { double ypred = ystart + dr * (dens0 * acc0 - 2 * beta0 / r0 * ystart); ystart += dr * 0.5 * ((dens0 * acc0 - 2 * beta0 / r0 * ystart) + (dens1 * acc1 - 2 * beta1 / r1 * ypred)); } else if(r0 > fabs(dr)) { ystart += dr * (dens0 * acc0 - 2 * beta0 / r0 * ystart); } } return ystart; } void calculate_dispfield(void) { int i, j, k, type; struct int_parameters par; #define AA(i,j) ((i) * FG_Nbin + (j)) /* (zindex, Rindex) */ mpi_printf("\nCalculating velocity dispersion fields...\n"); /* purely radial case first (only useful for TypeOf-VelocityStructure = 0 or 1) */ for(type = 1; type <= 3; type++) { if(type == 1 && All.Halo_N == 0) continue; if(type == 2 && All.Disk_N == 0) continue; if(type == 3 && All.Bulge_N == 0) continue; double r1 = FG_Rmin * (pow(FG_Fac, FG_Nbin) - 1.0); double r2 = FG_Rmin * (pow(FG_Fac, FG_Nbin + 2) - 1.0); double y = integrate_spherical_jeans_beta(type, 0.0, r2, r1); for(j = FG_Nbin - 1; j >= 0; j--) { r1 = FG_Rmin * (pow(FG_Fac, j) - 1.0); r2 = FG_Rmin * (pow(FG_Fac, j + 1) - 1.0); y = integrate_spherical_jeans_beta(type, y, r2, r1); FG_Disp_r[type][j] = y; } for(j = FG_Nbin - 1; j >= 0; j--) { r1 = FG_Rmin * (pow(FG_Fac, j) - 1.0); double pos[3]; pos[0] = r1; pos[1] = 0; pos[2] = 0; double dens = get_density_of_type(pos, type); if(dens > 0) FG_Disp_r[type][j] /= dens; else FG_Disp_r[type][j] = 0; } /* now we output the result */ if(ThisTask == 0) { char buf[1000]; sprintf(buf, "%s/sigma_r_%d.txt", All.OutputDir, type); FILE *fd = fopen(buf, "w"); fprintf(fd, "%d\n", FG_Nbin); for(j = 0; j < FG_Nbin; j++) { r1 = FG_Rmin * (pow(FG_Fac, j) - 1.0); double pos[3]; pos[0] = r1; pos[1] = 0; pos[2] = 0; fprintf(fd, "%g %g %g\n", r1, FG_Disp_r[type][j], get_beta_of_type(pos, type)); } fclose(fd); } } /* now do the simple axisymmetric f(E,Lz) case (useful for TypeOf-VelocityStructure = 2) */ for(type = 1; type <= 3; type++) { if(type == 1 && All.Halo_N == 0) continue; if(type == 2 && All.Disk_N == 0) continue; if(type == 3 && All.Bulge_N == 0) continue; double kParameter = 0; if(type == 1) kParameter = All.HaloStreamingVelocityParameter; else if(type == 2) kParameter = All.DiskStreamingVelocityParameter; else kParameter = All.BulgeStreamingVelocityParameter; for(j = 0; j < FG_Nbin; j++) { double R = FG_Rmin * (pow(FG_Fac, j) - 1.0); double z1, z2; par.R = R; par.type = type; k = FG_Nbin; z1 = FG_Rmin * (pow(FG_Fac, k) - 1.0); double integ = integrate_axisymmetric_jeans(z1, FG_Rmin * (pow(FG_Fac, 2 * FG_Nbin) - 1.0), R, type); for(k = FG_Nbin - 1; k >= 0; k--) { i = k * FG_Nbin + j; /* r,z */ z1 = FG_Rmin * (pow(FG_Fac, k) - 1.0); z2 = FG_Rmin * (pow(FG_Fac, k + 1) - 1.0); integ += integrate_axisymmetric_jeans(z1, z2, R, type); double pos[3]; pos[0] = R; pos[1] = 0; pos[2] = z1; double dens = get_density_of_type(pos, type); if(dens > 0) { FG_DispZ[type][i] = integ / dens; } else FG_DispZ[type][i] = 0; } } /* now calculate streaming velocity through axisymmetric Jeans equations */ for(k = FG_Nbin - 1; k >= 0; k--) for(j = 0; j < FG_Nbin; j++) { double z = FG_Rmin * (pow(FG_Fac, k) - 1.0); double R = FG_Rmin * (pow(FG_Fac, j) - 1.0); double pos[3], acc[3], R1, R2; int i1, i2; pos[0] = R; pos[1] = 0; pos[2] = z; forcegrid_get_acceleration(pos, acc); i = k * FG_Nbin + j; /* r,z */ if(j > 1 && j < FG_Nbin - 1) { R1 = FG_Rmin * (pow(FG_Fac, j - 1) - 1.0); i1 = k * FG_Nbin + j - 1; R2 = FG_Rmin * (pow(FG_Fac, j + 1) - 1.0); i2 = k * FG_Nbin + j + 1; } else if(j == 1) { R1 = FG_Rmin * (pow(FG_Fac, j) - 1.0); i1 = k * FG_Nbin + j; R2 = FG_Rmin * 
(pow(FG_Fac, j + 1) - 1.0); i2 = k * FG_Nbin + j + 1; } else if(j == 0) { R1 = FG_Rmin * (pow(FG_Fac, j + 1) - 1.0); i1 = k * FG_Nbin + j + 1; R2 = FG_Rmin * (pow(FG_Fac, j + 2) - 1.0); i2 = k * FG_Nbin + j + 2; } else { R1 = FG_Rmin * (pow(FG_Fac, j - 1) - 1.0); i1 = k * FG_Nbin + j - 1; R2 = FG_Rmin * (pow(FG_Fac, j) - 1.0); i2 = k * FG_Nbin + j; } pos[0] = R1; double dens1 = get_density_of_type(pos, type); pos[0] = R2; double dens2 = get_density_of_type(pos, type); double dlogDensSigma_dlogR = 0; if(dens1 * FG_DispZ[type][i1] > 0 && dens2 * FG_DispZ[type][i2] > 0) dlogDensSigma_dlogR = log( (dens2 * FG_DispZ[type][i2]) / (dens1 * FG_DispZ[type][i1])) / log(R2/R1); double Vphi2 = FG_DispZ[type][i] + R * (-acc[0]) + FG_DispZ[type][i] * dlogDensSigma_dlogR; if(Vphi2 > 0) { double vstr = 0; if(kParameter >= 0) { if(Vphi2 >= FG_DispZ[type][i]) vstr = kParameter * sqrt(Vphi2 - FG_DispZ[type][i]); } else { vstr = -kParameter * sqrt(Vphi2); if(kParameter < -1) terminate("illegal parameter kParameter=%g", kParameter); } FG_DispPhi[type][i] = Vphi2 - vstr * vstr; FG_Vstream[type][i] = vstr; } else { FG_DispPhi[type][i] = 0; FG_Vstream[type][i] = 0; } } if(ThisTask == 0) { double *tmpR = mymalloc("tmpR", FG_Ngrid * sizeof(double)); double *tmpz = mymalloc("tmpz", FG_Ngrid * sizeof(double)); for(k = 0; k < FG_Nbin; k++) { double z = FG_Rmin * (pow(FG_Fac, k) - 1.0); for(j = 0; j < FG_Nbin; j++) { double R = FG_Rmin * (pow(FG_Fac, j) - 1.0); i = k * FG_Nbin + j; /* z,r */ tmpR[i] = R; tmpz[i] = z; } } char buf[1000]; sprintf(buf, "%s/sigma_%d.dat", All.OutputDir, type); FILE *fd = fopen(buf, "w"); fwrite(&FG_Nbin, sizeof(int), 1, fd); fwrite(FG_DispZ[type], sizeof(double), FG_Ngrid, fd); fwrite(FG_DispPhi[type], sizeof(double), FG_Ngrid, fd); fwrite(FG_Vstream[type], sizeof(double), FG_Ngrid, fd); fwrite(tmpR, sizeof(double), FG_Ngrid, fd); fwrite(tmpz, sizeof(double), FG_Ngrid, fd); fclose(fd); myfree(tmpz); myfree(tmpR); } } /* now do the more difficult axisymmetic f(E,Lz,I3) case (useful for TypeOf-VelocityStructure = 3) */ for(type = 1; type <= 3; type++) { if(type == 1 && All.Halo_N == 0) continue; if(type == 2 && All.Disk_N == 0) continue; if(type == 3 && All.Bulge_N == 0) continue; if(type == 1 && All.TypeOfHaloVelocityStructure != 3) continue; if(type == 2 && All.TypeOfDiskVelocityStructure != 3) continue; if(type == 3 && All.TypeOfBulgeVelocityStructure != 3) continue; if(type == 1 && All.HaloDispersionRoverZratio == 0) terminate("invalid HaloDispersionRoverZratio=%g", All.HaloDispersionRoverZratio); if(type == 2 && All.DiskDispersionRoverZratio == 0) terminate("invalid DiskDispersionRoverZratio=%g", All.DiskDispersionRoverZratio); if(type == 3 && All.BulgeDispersionRoverZratio == 0) terminate("invalid BulgeDispersionRoverZratio=%g", All.BulgeDispersionRoverZratio); double kParameter = 0; if(type == 1) kParameter = All.HaloStreamingVelocityParameter; else if(type == 2) kParameter = All.DiskStreamingVelocityParameter; else if(type == 3) kParameter = All.BulgeStreamingVelocityParameter; /* grid for solution */ double *FG_q = mymalloc("FG_q", FG_Ngrid * sizeof(double)); /* auxiliary vectors */ double *qprev = mymalloc("qprev", FG_Nbin * sizeof(double)); for(k = FG_Nbin-1; k >= 0; k--) { mpi_printf("method of lines, row %d out of %d for type=%d\n", k, FG_Nbin, type); if(k== FG_Nbin-1) { for(j = 0; j < FG_Nbin; j++) { FG_q[AA(k,j)] = 0; } } else { double z0 = FG_Rmin * (pow(FG_Fac, k + 1) - 1.0); double z1 = FG_Rmin * (pow(FG_Fac, k) - 1.0); double dz = (z1 - z0); int nsteps = 100, st; for(j = 
0; j < FG_Nbin; j++) qprev[j] = FG_q[AA(k+1,j)]; for(st = 0; st < nsteps; st++) { double z = z0 + (dz / nsteps)*st; for(j = 0; j < FG_Nbin; j++) { double R1 = FG_Rmin * (pow(FG_Fac, j) - 1.0); double h1 = h_factor(R1, z, type); double pos[3], acc[3]; pos[0] = R1; pos[1] = 0; pos[2] = z; double dens = get_density_of_type(pos, type); forcegrid_get_acceleration(pos, acc); double p = - dens * acc[2]; double dqdR; if(h1 > 0) { if(j == FG_Nbin - 1) dqdR = 0; else { double R2 = FG_Rmin * (pow(FG_Fac, j+1) - 1.0); double h2 = h_factor(R2, z, type); double dR = R2 - R1; dqdR = (h2*qprev[j+1] - h1*qprev[j]) / (dR) + qprev[j] * h_over_R(R1, z, type); } FG_q[AA(k,j)] = qprev[j] + ( - p - dqdR) * (dz / nsteps); } else { if(j == 0) { double R2 = FG_Rmin * (pow(FG_Fac, j+1) - 1.0); double h2 = h_factor(R2, z, type); double dR = R2 - R1; dqdR = (h2*qprev[j+1] - h1*qprev[j]) / (dR) + qprev[j] * h_over_R(R1, z, type); } else { double R2 = FG_Rmin * (pow(FG_Fac, j-1) - 1.0); double h2 = h_factor(R2, z, type); double dR = R2 - R1; dqdR = (h2*qprev[j-1] - h1*qprev[j]) / (dR) + qprev[j] * h_over_R(R1, z, type); } FG_q[AA(k,j)] = qprev[j] + ( - p - dqdR) * (dz / nsteps); } } for(j = 0; j < FG_Nbin; j++) qprev[j] = FG_q[AA(k,j)]; } } } for(k = 0; k < FG_Nbin; k++) for(j = 0; j < FG_Nbin; j++) { double R = FG_Rmin * (pow(FG_Fac, j) - 1.0); double z = FG_Rmin * (pow(FG_Fac, k) - 1.0); double pos[3]; pos[0] = R; pos[1] = 0; pos[2] = z; double dens = get_density_of_type(pos, type); if(dens > 0) FG_tilted_vz2[type][AA(k,j)] = FG_q[AA(k,j)] / dens; else FG_tilted_vz2[type][AA(k,j)] = 0; double f = 0; if(type == 1) f = All.HaloDispersionRoverZratio; else if(type == 2) f = All.DiskDispersionRoverZratio; else if(type == 3) f = All.BulgeDispersionRoverZratio; else terminate("not allowed"); if(j > 0) { double alpha = atan(z / R); double vrvz = FG_tilted_vz2[type][AA(k,j)] * ((f - 1) / 2 * tan (2 * alpha))/ ( pow(cos(alpha),2) - f*pow(sin(alpha),2) + (1.0+f)/2.0 * sin(2*alpha) * tan(2*alpha)); double vr2 = FG_tilted_vz2[type][AA(k,j)] * ( f*pow(cos(alpha),2) - pow(sin(alpha),2) + (1.0+f)/2.0 * sin(2*alpha) * tan(2*alpha)) / ( pow(cos(alpha),2) - f*pow(sin(alpha),2) + (1.0+f)/2.0 * sin(2*alpha) * tan(2*alpha)); double vr2_prime = vr2 * pow(cos(alpha),2) + 2 * vrvz * sin(alpha)*cos(alpha) + FG_tilted_vz2[type][AA(k,j)] * pow(sin(alpha),2); double vz2_prime = vr2 * pow(sin(alpha),2) - 2 * vrvz * sin(alpha)*cos(alpha) + FG_tilted_vz2[type][AA(k,j)] * pow(cos(alpha),2); FG_tilted_vR2[type][AA(k,j)] = vr2; FG_tilted_vz2_prime[type][AA(k,j)] = vz2_prime; FG_tilted_vR2_prime[type][AA(k,j)] = vr2_prime; } else { FG_tilted_vR2[type][AA(k,j)] = FG_tilted_vz2[type][AA(k,j)] / f; FG_tilted_vz2_prime[type][AA(k,j)] = FG_tilted_vR2[type][AA(k,j)]; FG_tilted_vR2_prime[type][AA(k,j)] = FG_tilted_vz2[type][AA(k,j)]; } } /* now calculate streaming velocity through axisymmetric Jeans equations */ for(k = FG_Nbin - 1; k >= 0; k--) for(j = 0; j < FG_Nbin; j++) { double z = FG_Rmin * (pow(FG_Fac, k) - 1.0); double R = FG_Rmin * (pow(FG_Fac, j) - 1.0); double pos[3], acc[3], R2; int i2; pos[0] = R; pos[1] = 0; pos[2] = z; double dens = get_density_of_type(pos, type); forcegrid_get_acceleration(pos, acc); i = k * FG_Nbin + j; /* r,z */ if(j < FG_Nbin - 1) { R2 = FG_Rmin * (pow(FG_Fac, j + 1) - 1.0); i2 = k * FG_Nbin + j + 1; } else { R2 = FG_Rmin * (pow(FG_Fac, j - 1) - 1.0); i2 = k * FG_Nbin + j - 1; } pos[0] = R2; double dens2 = get_density_of_type(pos, type); double Vphi2 = 0; if(dens > 0) Vphi2 = FG_tilted_vR2[type][i] + R * (-acc[0]) + R / dens 
* (dens2 * FG_tilted_vR2[type][i2] - dens * FG_tilted_vR2[type][i]) / (R2 - R) + R / dens * (dens2 * h_factor(R2, z, type) * FG_tilted_vz2[type][i2] - dens * h_factor(R, z, type) * FG_tilted_vz2[type][i]) / (R2 - R); if(Vphi2 > 0) { double vstr = 0; if(kParameter >= 0) { if(Vphi2 >= FG_tilted_vR2[type][i]) vstr = kParameter * sqrt((Vphi2 - FG_tilted_vR2[type][i])); } else { vstr = -kParameter * sqrt(Vphi2); if(kParameter < -1) terminate("illegal parameter kParameter=%g", kParameter); } FG_DispPhi[type][i] = Vphi2 - vstr * vstr; FG_Vstream[type][i] = vstr; } else { FG_DispPhi[type][i] = 0; FG_Vstream[type][i] = 0; } } if(ThisTask == 0) { double *tmpR = mymalloc("tmpR", FG_Ngrid * sizeof(double)); double *tmpz = mymalloc("tmpz", FG_Ngrid * sizeof(double)); for(k = 0; k < FG_Nbin; k++) { double z = FG_Rmin * (pow(FG_Fac, k) - 1.0); for(j = 0; j < FG_Nbin; j++) { double R = FG_Rmin * (pow(FG_Fac, j) - 1.0); i = k * FG_Nbin + j; /* z,r */ tmpR[i] = R; tmpz[i] = z; } } char buf[1000]; sprintf(buf, "%s/pde_%d.dat", All.OutputDir, type); FILE *fd = fopen(buf, "w"); fwrite(&FG_Nbin, sizeof(int), 1, fd); fwrite(FG_q, sizeof(double), FG_Ngrid, fd); fwrite(FG_tilted_vz2[type], sizeof(double), FG_Ngrid, fd); fwrite(FG_tilted_vR2[type], sizeof(double), FG_Ngrid, fd); fwrite(FG_tilted_vz2_prime[type], sizeof(double), FG_Ngrid, fd); fwrite(FG_tilted_vR2_prime[type], sizeof(double), FG_Ngrid, fd); fwrite(tmpR, sizeof(double), FG_Ngrid, fd); fwrite(tmpz, sizeof(double), FG_Ngrid, fd); fwrite(FG_DispZ[type], sizeof(double), FG_Ngrid, fd); fwrite(FG_DispPhi[type], sizeof(double), FG_Ngrid, fd); fwrite(FG_Vstream[type], sizeof(double), FG_Ngrid, fd); fclose(fd); myfree(tmpz); myfree(tmpR); } myfree(qprev); myfree(FG_q); } mpi_printf("done.\n\n"); } double h_factor(double R, double z, int type) { double f = 0, fac; if(type == 1) f = All.HaloDispersionRoverZratio; else if(type == 2) f = All.DiskDispersionRoverZratio; else if(type == 3) f = All.BulgeDispersionRoverZratio; else terminate("not allowed"); if(R <= 1.0e-12 * z || R == 0) fac = 0; else { double alpha = atan(z / R); fac = ((f - 1) / 2 * tan (2 * alpha))/ ( pow(cos(alpha),2) - f*pow(sin(alpha),2) + (1.0+f)/2.0 * sin(2*alpha) * tan(2*alpha)); } return fac; } double h_over_R(double R, double z, int type) { double f = 0, fac; if(type == 1) f = All.HaloDispersionRoverZratio; else if(type == 2) f = All.DiskDispersionRoverZratio; else if(type == 3) f = All.BulgeDispersionRoverZratio; else terminate("not allowed"); if(z == 0) terminate("z = 0 not allowed"); if(R <= 1.0e-12 * z || R == 0) fac = (f - 1) / f; else { double alpha = atan(z / R); fac = ((f - 1) / 2 * tan(alpha) * tan (2 * alpha))/ ( pow(cos(alpha),2) - f*pow(sin(alpha),2) + (1.0+f)/2.0 * sin(2*alpha) * tan(2*alpha)); } return fac / z; } GalIC/src/grid.c000644 000765 000024 00000106044 12373713530 014333 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "allvars.h" #include "proto.h" /* returns the index of the spatial grid point corresponding to coordinate pos[] */ void forcegrid_get_cell(double *pos, int *iR, int *iz, double *fR, double *fz) { double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1]); double z = fabs(pos[2]); double br = (log(r / FG_Rmin + 1.0) / log(FG_Fac)); double bz = (log(z / FG_Rmin + 1.0) / log(FG_Fac)); int binr, binz; if(br < 0) br = 0; if(bz < 0) bz = 0; binr = (int) br; *fR = br - binr; binz = (int) bz; *fz = bz - binz; if(binr < 0) terminate("binr=%d: pos=(%g|%g|%g)\n", binr, pos[0], pos[1], pos[2]); if(binz < 0) terminate("binz=%d: pos=(%g|%g|%g)\n", binz, pos[0], pos[1], pos[2]); if(binr >= FG_Nbin - 1) { binr = FG_Nbin - 2; *fR = 1; } if(binz >= FG_Nbin - 1) { binz = FG_Nbin - 2; *fz = 1; } *iR = binr; *iz = binz; } /* gets the gravitational potential at the given coordinate. */ double forcegrid_get_potential(double *pos) { double pot; if(All.Disk_Mass > 0 || (All.Halo_Mass > 0 && All.SampleForceNhalo > 0) || (All.Bulge_Mass > 0 && All.SampleForceNbulge > 0)) { /* interpolate from potential grid */ double R = sqrt(pos[0] * pos[0] + pos[1] * pos[1]); double jdbl = log(R / FG_Rmin + 1.0) / log(FG_Fac); int jbin = (int) jdbl; double jfac = jdbl - jbin; double idbl = log(fabs(pos[2]) / FG_Rmin + 1.0) / log(FG_Fac); int ibin = (int) idbl; double ifac = idbl - ibin; if(ibin < FG_Nbin - 1 && jbin < FG_Nbin - 1) { pot = FG_Pot[ibin * FG_Nbin + jbin] * (1 - ifac) * (1 - jfac) + FG_Pot[ibin * FG_Nbin + (jbin + 1)] * (1 - ifac) * jfac + FG_Pot[(ibin + 1) * FG_Nbin + jbin] * ifac * (1 - jfac) + FG_Pot[(ibin + 1) * FG_Nbin + (jbin + 1)] * ifac * jfac; } else { /* we are off the grid. In this case, let's pretend the halo is spherical and adopt this as the force */ pot = All.M200 / All.Halo_Mass * halo_get_potential(pos); } } else { pot = 0; } if(All.Halo_Mass > 0 && All.SampleForceNhalo == 0) { if(All.HaloStretch != 1.0) terminate("not allowed"); /* add analytic halo potential */ pot += halo_get_potential(pos); } if(All.Bulge_Mass > 0 && All.SampleForceNbulge == 0) { if(All.BulgeStretch != 1.0) terminate("not allowed"); /* add analytic bulge potential */ pot += bulge_get_potential(pos); } return pot; } /* returns the escape speed at the given coordinate. */ double forcegrid_get_escape_speed(double *pos) { double phi = forcegrid_get_potential(pos); double vesc = sqrt(-2.0 * phi); return vesc; } /* returns the gravitational acceleration at the given coordinate. 
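 * The components are obtained by bilinear interpolation of the tabulated FG_DPotDR and
 * FG_DPotDz fields on the logarithmic (R, |z|) grid; the sign of the vertical component
 * is flipped for z < 0, and the radial component is rotated back into Cartesian x/y via
 * cos(phi) and sin(phi). Outside the tabulated grid, the analytic spherical halo force
 * rescaled by All.M200 / All.Halo_Mass is used as a fallback, in analogy to
 * forcegrid_get_potential() above.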
*/ void forcegrid_get_acceleration(double *pos, double *acc) { if(All.Disk_Mass > 0 || (All.Halo_Mass > 0 && All.SampleForceNhalo > 0) || (All.Bulge_Mass > 0 && All.SampleForceNbulge > 0)) { /* interpolate from force grid */ double phi = atan2(pos[1], pos[0]); double R = sqrt(pos[0] * pos[0] + pos[1] * pos[1]); double jdbl = log(R / FG_Rmin + 1.0) / log(FG_Fac); int jbin = (int) jdbl; double jfac = jdbl - jbin; double idbl = log(fabs(pos[2]) / FG_Rmin + 1.0) / log(FG_Fac); int ibin = (int) idbl; double ifac = idbl - ibin; if(ibin < FG_Nbin - 1 && jbin < FG_Nbin - 1) { double accR = FG_DPotDR[ibin * FG_Nbin + jbin] * (1 - ifac) * (1 - jfac) + FG_DPotDR[ibin * FG_Nbin + (jbin + 1)] * (1 - ifac) * jfac + FG_DPotDR[(ibin + 1) * FG_Nbin + jbin] * ifac * (1 - jfac) + FG_DPotDR[(ibin + 1) * FG_Nbin + (jbin + 1)] * ifac * jfac; double accz = FG_DPotDz[ibin * FG_Nbin + jbin] * (1 - ifac) * (1 - jfac) + FG_DPotDz[ibin * FG_Nbin + (jbin + 1)] * (1 - ifac) * jfac + FG_DPotDz[(ibin + 1) * FG_Nbin + jbin] * ifac * (1 - jfac) + FG_DPotDz[(ibin + 1) * FG_Nbin + (jbin + 1)] * ifac * jfac; if(pos[2] < 0) accz = -accz; acc[0] = accR * cos(phi); acc[1] = accR * sin(phi); acc[2] = accz; } else { /* we are off the grid. In this case, let's pretend the halo is spherical and adopt this as the force */ halo_get_acceleration(pos, acc); double fac = All.M200 / All.Halo_Mass; int k; for(k=0; k<3; k++) acc[k] *= fac; } } else { acc[0] = acc[1] = acc[2] = 0; } if(All.Halo_Mass > 0 && All.SampleForceNhalo == 0) { if(All.HaloStretch != 1.0) terminate("SampleForceNhalo=0 not allowed because aspherical halo has been chosen"); /* add analytic halo force */ double acc_h[3]; halo_get_acceleration(pos, acc_h); acc[0] += acc_h[0]; acc[1] += acc_h[1]; acc[2] += acc_h[2]; } if(All.Bulge_Mass > 0 && All.SampleForceNbulge == 0) { if(All.BulgeStretch != 1.0) terminate("SampleForceNbulge=0 not allowed because aspherical bulge has been chosen"); /* add analytic bulge force */ double acc_b[3]; bulge_get_acceleration(pos, acc_b); acc[0] += acc_b[0]; acc[1] += acc_b[1]; acc[2] += acc_b[2]; } } /* The force grids allocated below store tabulated values on a grid with grid spacings that grow logarithmically. * The width of the first bin is FG_Rmin * (FG_Fac - 1). The 1D index [0] stores the value at coordinate 0, the * index [1] at coordinate FG_Rmin * (FG_Fac - 1), the index [2] at coordinate FG_Rmin * (FG_Fac^2 - 1), etc.
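 * In general, the coordinate associated with index j is FG_Rmin * (pow(FG_Fac, j) - 1.0),
 * which is the same logarithmic mapping that forcegrid_get_cell() inverts through
 * bin = log(r / FG_Rmin + 1.0) / log(FG_Fac); the bin widths therefore grow by the
 * constant factor FG_Fac from one cell to the next, and FG_Rmin * (FG_Fac - 1.0) is the
 * value reported as "Rmin" by forcedensitygrid_create().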
*/ void forcegrid_allocate(void) { int type; FG_Ngrid = FG_Nbin * FG_Nbin; FG_Pot = mymalloc("FG_Pot", FG_Ngrid * sizeof(double)); FG_DPotDR = mymalloc("FG_PotDR", FG_Ngrid * sizeof(double)); FG_DPotDz = mymalloc("FG_PotDz", FG_Ngrid * sizeof(double)); FG_Pot_exact = mymalloc("FG_Pot_exact", FG_Ngrid * sizeof(double)); FG_DPotDR_exact = mymalloc("FG_PotDR_exact", FG_Ngrid * sizeof(double)); FG_DPotDz_exact = mymalloc("FG_PotDz_exact", FG_Ngrid * sizeof(double)); FG_R = mymalloc("FG_R", FG_Nbin * sizeof(double)); for(type = 1; type <= 3; type++) { FG_Disp_r[type] = mymalloc("FG_Sigma_r", FG_Ngrid * sizeof(double)); FG_DispZ[type] = mymalloc("FG_SigmaZ", FG_Ngrid * sizeof(double)); FG_DispPhi[type] = mymalloc("FG_DispPhi", FG_Ngrid * sizeof(double)); FG_Vstream[type] = mymalloc("FG_Vstream", FG_Ngrid * sizeof(double)); FG_tilted_vz2[type] = mymalloc("FG_title_vz2", FG_Ngrid * sizeof(double)); FG_tilted_vR2[type] = mymalloc("FG_title_vR2", FG_Ngrid * sizeof(double)); FG_tilted_vz2_prime[type] = mymalloc("FG_title_vz2_prime", FG_Ngrid * sizeof(double)); FG_tilted_vR2_prime[type] = mymalloc("FG_title_vR2_prime", FG_Ngrid * sizeof(double)); } } /* this routines allocates the grids for holding the energy response */ void energygrid_allocate(void) { int type; /* total elements in grid stack */ EG_Nstack = ((1 << (2 + 2 * EG_MaxLevel)) - 1) / 3; EG_Nbin = (1 << EG_MaxLevel); EG_Ngrid = EG_Nbin * EG_Nbin; EG_R = mymalloc("EG_R", EG_Nbin * sizeof(double)); for(type = 1; type <= 3; type++) { EGs_MassTarget[type] = mymalloc("EGs_MassTarget", EG_Nstack * sizeof(double)); EGs_MassResponse[type] = mymalloc("EGs_MassResponse", EG_Nstack * sizeof(double)); EGs_EgyTarget_r[type] = mymalloc("EGs_EgyTarget_r", EG_Nstack * sizeof(double)); EGs_EgyTarget_t[type] = mymalloc("EGs_EgyTarget_t", EG_Nstack * sizeof(double)); EGs_EgyTarget_p[type] = mymalloc("EGs_EgyTarget_p", EG_Nstack * sizeof(double)); EGs_EgyTarget_q[type] = mymalloc("EGs_EgyTarget_p", EG_Nstack * sizeof(double)); EGs_EgyResponse_r[type] = mymalloc("EGs_EgyResponse_r", EG_Nstack * sizeof(double)); EGs_EgyResponse_t[type] = mymalloc("EGs_EgyResponse_t", EG_Nstack * sizeof(double)); EGs_EgyResponse_p[type] = mymalloc("EGs_EgyResponse_p", EG_Nstack * sizeof(double)); EGs_EgyResponse_q[type] = mymalloc("EGs_EgyResponse_p", EG_Nstack * sizeof(double)); EG_MassLoc[type] = mymalloc("EG_MassLoc", sizeof(double) * EG_Ngrid); EG_EgyResponseRLoc[type] = mymalloc("EG_EgyResponseRLoc", sizeof(double) * EG_Ngrid); EG_EgyResponseTLoc[type] = mymalloc("EG_EgyResponseTLoc", sizeof(double) * EG_Ngrid); EG_EgyResponsePLoc[type] = mymalloc("EG_EgyResponsePLoc", sizeof(double) * EG_Ngrid); EG_EgyResponseQLoc[type] = mymalloc("EG_EgyResponseQLoc", sizeof(double) * EG_Ngrid); EG_EgyResponseRLoc_delta[type] = mymalloc("EG_EgyResponseRLoc", sizeof(double) * EG_Ngrid); EG_EgyResponseTLoc_delta[type] = mymalloc("EG_EgyResponseTLoc", sizeof(double) * EG_Ngrid); EG_EgyResponsePLoc_delta[type] = mymalloc("EG_EgyResponsePLoc", sizeof(double) * EG_Ngrid); EG_EgyResponseQLoc_delta[type] = mymalloc("EG_EgyResponseQLoc", sizeof(double) * EG_Ngrid); } } /* this routine allocates grids for holding the density response of the orbits. 
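 * The "stack" arrays used here and in energygrid_allocate() above concatenate all
 * refinement levels l = 0 ... MaxLevel of a quad-tree of 2^l x 2^l grids, which is why
 * the total element count is ((1 << (2 + 2 * MaxLevel)) - 1) / 3 = (4^(MaxLevel+1) - 1) / 3,
 * i.e. the sum of 4^l over all levels. STACKOFFSET(l, i, j) addresses cell (i, j) of the
 * level-l grid, and smooth_stack() fills the coarser levels by summing the four daughter
 * cells of the level below.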
*/ void densitygrid_allocate(void) { int type; /* total elements in grid stack */ DG_Nstack = ((1 << (2 + 2 * DG_MaxLevel)) - 1) / 3; /* finest grid resolution per dimension */ DG_Nbin = (1 << DG_MaxLevel); /* elements in finest grid */ DG_Ngrid = DG_Nbin * DG_Nbin; DG_CellVol = mymalloc("DG_CellVol", DG_Ngrid * sizeof(double)); DG_CellSize = mymalloc("DG_CellSize", DG_Ngrid * sizeof(double)); DGs_LogR = mymalloc("DGs_LogR", DG_Nstack * sizeof(double)); DGs_LogZ = mymalloc("DGs_LogZ", DG_Nstack * sizeof(double)); DGs_Distance = mymalloc("DGs_Distance", DG_Nstack * sizeof(double)); for(type = 1; type <= 3; type++) { DG_MassLoc[type] = mymalloc("DG_MassLoc", DG_Ngrid * sizeof(double)); DG_MassLoc_delta[type] = mymalloc("DG_MassLoc", DG_Ngrid * sizeof(double)); DGs_MassTarget[type] = mymalloc("DGs_MassTarget", DG_Nstack * sizeof(double)); DGs_MassResponse[type] = mymalloc("DGs_MassResponse", DG_Nstack * sizeof(double)); } } /* this function creates the look-up fields for the force and density fields * of the desired model. */ void forcedensitygrid_create(void) { double fac, dfac; int iter; fac = All.OutermostBinEnclosedMassFraction; All.Rmax = All.Halo_A * (fac + sqrt(fac)) / (1 - fac); mpi_printf("\nGrid structure:\n" "Rmax = %g (outer edge of grid)\n", All.Rmax); fac = All.InnermostBinEnclosedMassFraction; DG_Rin = All.Halo_A * (fac + sqrt(fac)) / (1 - fac); mpi_printf("Rin = %g (radius that encloses a fraction %g of the Hernquist halo)\n", DG_Rin, fac); /* we put DG_Nbin per dimension for the density grid, ibin=[0,...,Nbin-1] * left and right edges of a bin are given by * rleft = Rmin * (fac^ibin - 1) * right = Rmin * (fac^(ibin+1) - 1) */ iter = 0; DG_Fac = 1.02; double y = (All.Rmax / DG_Rin); do { double f = log((pow(DG_Fac, DG_Nbin) - 1.0) / (y * (DG_Fac - 1.0))); double df = DG_Nbin / (DG_Fac - 1.0 / pow(DG_Fac, DG_Nbin - 1)) - 1.0 / (DG_Fac - 1.0); dfac = -f / df; DG_Fac += dfac; iter++; if(iter > MAXITER) terminate("iter > MAXITER"); } while(fabs(dfac) > 1.0e-8 * DG_Fac); DG_Rmin = DG_Rin / (DG_Fac - 1.0); mpi_printf("Extension of first cell of density grid: Rmin = %10g (grid spacing factor=%g)\n", DG_Rmin * (DG_Fac - 1.0), DG_Fac); /* now determine the DGs_Distance stack */ int i, j, k, l; for(k = 0; k < DG_Nbin; k++) { for(j = 0; j < DG_Nbin; j++) { double z = DG_Rmin * 0.5 * (pow(DG_Fac, k) + pow(DG_Fac, k + 1) - 2.0); double R = DG_Rmin * 0.5 * (pow(DG_Fac, j) + pow(DG_Fac, j + 1) - 2.0); DGs_LogR[STACKOFFSET(DG_MaxLevel, k, j)] = log(R); DGs_LogZ[STACKOFFSET(DG_MaxLevel, k, j)] = log(z); } } smooth_stack(DGs_LogR, DG_MaxLevel); smooth_stack(DGs_LogZ, DG_MaxLevel); for(l = DG_MaxLevel - 1; l >= 0; l--) { int n = (1 << l); int f = (1 << (DG_MaxLevel - l)); for(i = 0; i < n; i++) for(j = 0; j < n; j++) { DGs_LogR[STACKOFFSET(l, i, j)] /= (f*f); DGs_LogZ[STACKOFFSET(l, i, j)] /= (f*f); } } for(l = DG_MaxLevel; l >= 0; l--) { int n = (1 << l); for(i = 0; i < n; i++) for(j = 0; j < n; j++) { DGs_Distance[STACKOFFSET(l, i, j)] = sqrt(pow(exp(DGs_LogR[STACKOFFSET(l, i, j)]), 2) + pow(exp(DGs_LogZ[STACKOFFSET(l, i, j)]), 2)); } } /* now do the same thing for the force grid */ FG_Rin = DG_Rin; iter = 0; FG_Fac = 1.02; y = (All.Rmax / FG_Rin); do { double f = log((pow(FG_Fac, FG_Nbin) - 1.0) / (y * (FG_Fac - 1.0))); double df = FG_Nbin / (FG_Fac - 1.0 / pow(FG_Fac, FG_Nbin - 1)) - 1.0 / (FG_Fac - 1.0); dfac = -f / df; FG_Fac += dfac; iter++; if(iter > MAXITER) terminate("iter > MAXITER"); } while(fabs(dfac) > 1.0e-8 * FG_Fac); FG_Rmin = FG_Rin / (FG_Fac - 1.0); mpi_printf("Extension of 
first cell of force grid: Rmin = %10g (grid spacing factor=%g)\n", FG_Rmin * (FG_Fac - 1.0), FG_Fac); /* now do the same thing for the energy grid */ EG_Rin = DG_Rin; iter = 0; EG_Fac = 1.02; y = (All.Rmax / EG_Rin); do { double f = log((pow(EG_Fac, EG_Nbin) - 1.0) / (y * (EG_Fac - 1.0))); double df = EG_Nbin / (EG_Fac - 1.0 / pow(EG_Fac, EG_Nbin - 1)) - 1.0 / (EG_Fac - 1.0); dfac = -f / df; EG_Fac += dfac; iter++; if(iter > MAXITER) terminate("iter > MAXITER"); } while(fabs(dfac) > 1.0e-8 * EG_Fac); EG_Rmin = EG_Rin / (EG_Fac - 1.0); mpi_printf("Extension of first cell of energy grid: Rmin = %10g (grid spacing factor=%g)\n", EG_Rmin * (EG_Fac - 1.0), EG_Fac); forcedensitygrid_calculate(); mpi_printf("Extension of first cell of energy grid: Rmin = %10g (grid spacing factor=%g)\n", EG_Rmin * (EG_Fac - 1.0), EG_Fac); } /* returns the index of the spatial grid point corresponding to coordinate pos[] */ void densitygrid_get_cell(double *pos, int *iR, int *iz, double *fR, double *fz) { double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1]); double z = fabs(pos[2]); double br = (log(r / DG_Rmin + 1.0) / log(DG_Fac)) - 0.5; double bz = (log(z / DG_Rmin + 1.0) / log(DG_Fac)) - 0.5; int binr, binz; if(br < 0) br = 0; if(bz < 0) bz = 0; binr = (int) br; *fR = br - binr; binz = (int) bz; *fz = bz - binz; if(binr < 0) terminate("binr=%d: pos=(%g|%g|%g)\n", binr, pos[0], pos[1], pos[2]); if(binz < 0) terminate("binz=%d: pos=(%g|%g|%g)\n", binz, pos[0], pos[1], pos[2]); if(binr >= DG_Nbin - 1) { binr = DG_Nbin - 2; *fR = 1; } if(binz >= DG_Nbin - 1) { binz = DG_Nbin - 2; *fz = 1; } *iR = binr; *iz = binz; } /* this function samples the desired density model in order * to create an accurate force field with a tree code. */ void densitygrid_sample_targetresponse(void) { int type, i, j, k, count, cstart; double *mfield, pos[3]; mpi_printf("sampling density field\n"); for(type = 1; type <= 3; type++) { if(MType[type] > 0) { mfield = mymalloc("mfield", DG_Nbin * DG_Nbin * sizeof(double)); for(k = 0; k < DG_Nbin; k++) for(j = 0; j < DG_Nbin; j++) { i = k * DG_Nbin + j; /* r,z */ mfield[i] = 0; } count = 0; cstart = 0; int ntarget = All.SampleParticleCount / NTask + 1; while(count < ntarget) { if(type == 1) halo_get_fresh_coordinate(pos); /* a halo particle */ else if(type == 2) disk_get_fresh_coordinate(pos); /* disk particle */ else if(type == 3) bulge_get_fresh_coordinate(pos); /* disk particle */ double r = sqrt(pos[0] * pos[0] + pos[2] * pos[2]); if(r < All.Rmax) { int iR, iz; double fR, fz; densitygrid_get_cell(pos, &iR, &iz, &fR, &fz); mfield[iz * DG_Nbin + iR] += (1 - fR) * (1 - fz); mfield[iz * DG_Nbin + (iR + 1)] += (fR) * (1 - fz); mfield[(iz + 1) * DG_Nbin + iR] += (1 - fR) * (fz); mfield[(iz + 1) * DG_Nbin + (iR + 1)] += (fR) * (fz); if(count >= cstart) { mpi_printf("."); fflush(stdout); cstart += ntarget / 100; } count++; } } MPI_Allreduce(mfield, &DGs_MassTarget[type][STACKOFFSET(DG_MaxLevel, 0, 0)], DG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); double nt = ntarget, nsum; MPI_Allreduce(&nt, &nsum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); for(k = 0; k < DG_Nbin; k++) for(j = 0; j < DG_Nbin; j++) { i = k * DG_Nbin + j; /* r,z */ DGs_MassTarget[type][STACKOFFSET(DG_MaxLevel, 0, 0) + i] *= MType[type] / nsum; } smooth_stack(DGs_MassTarget[type], DG_MaxLevel); myfree(mfield); } } mpi_printf("done\n"); } /* this function determines grids with the density response of the * desired mass model, as well as a tabulated loop-up table for the * gravitational force field created by this mass distribution. 
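 * To tabulate the force, massless marker particles (Type 5) are placed on the (R, z)
 * force grid at FG_SECTIONS different azimuthal angles, the tree code supplies their
 * accelerations and potentials, and the read-out values are averaged over the sections
 * (hence the division by FG_SECTIONS below); averaging over azimuth damps the
 * discreteness noise of the sampled particle representation.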
*/ void forcedensitygrid_calculate(void) { int i, j, k, s, type; if(All.Disk_Mass > 0 && All.SampleForceNdisk == 0) terminate("Disk_Mass > 0 combined with SampleForceNdisk == 0 is not allowed"); /* First, find the force field by creating a particle representation. * For bulge and halo, the analytic force fields will be used if the corresponding values for * SampleForceNhalo or SampleForceNbulge are zero */ if( (All.Halo_Mass > 0 && All.SampleForceNhalo > 0) || (All.Disk_Mass > 0 && All.SampleForceNdisk > 0) || (All.Bulge_Mass > 0 && All.SampleForceNbulge > 0)) { int nsample_tot = FG_Nbin * FG_Nbin * FG_SECTIONS; int nsample_before = 0, ns; int nhalo = get_part_count_this_task(All.SampleForceNhalo); int ndisk = get_part_count_this_task(All.SampleForceNdisk); int nbulge = get_part_count_this_task(All.SampleForceNbulge); int nsample = get_part_count_this_task(nsample_tot); NumPart = nhalo + ndisk + nbulge + nsample; MPI_Allreduce(&NumPart, &All.MaxPart, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD); sumup_large_ints(1, &NumPart, &All.TotNumPart); allocate_memory(); int *tmp = mymalloc("tmp", NTask * sizeof(int)); MPI_Allgather(&nsample, 1, MPI_INT, tmp, 1, MPI_INT, MPI_COMM_WORLD); for(i = 0; i < ThisTask; i++) nsample_before += tmp[i]; myfree(tmp); int n = 0; for(i = 0; i < nhalo; i++, n++) { P[n].Type = 1; P[n].Mass = All.Halo_Mass / All.SampleForceNhalo; halo_get_fresh_coordinate(P[n].Pos); /* a halo particle */ } for(i = 0; i < ndisk; i++, n++) { P[n].Type = 2; P[n].Mass = All.Disk_Mass / All.SampleForceNdisk; disk_get_fresh_coordinate(P[n].Pos); /* a disk particle */ } for(i = 0; i < nbulge; i++, n++) { P[n].Type = 3; P[n].Mass = All.Bulge_Mass / All.SampleForceNbulge; bulge_get_fresh_coordinate(P[n].Pos); /* a bulge particle */ } for(s = 0, ns = 0; s < FG_SECTIONS; s++) for(k = 0; k < FG_Nbin; k++) for(j = 0; j < FG_Nbin; j++) { if(ns >= nsample_before && ns < (nsample_before + nsample)) { P[n].Type = 5; P[n].Mass = 0; i = (s * FG_Nbin * FG_Nbin) + k * FG_Nbin + j; /* r,z */ P[n].ID = i; double phi = 2 * M_PI / FG_SECTIONS * s; double r1 = FG_Rmin * (pow(FG_Fac, j) - 1.0); double z1 = FG_Rmin * (pow(FG_Fac, k) - 1.0); P[n].Pos[0] = r1 * cos(phi); P[n].Pos[1] = r1 * sin(phi); P[n].Pos[2] = z1; n++; } ns++; } if(n != NumPart) terminate("n=%d != NumPart=%d nsample_before=%d nsample=%d ", n, NumPart, nsample_before, nsample); /* call the tree code to calculate the gravitational forces */ gravity(); double *loc_FG_DPotDR = mymalloc("loc_FG_DPotDR", FG_Ngrid * sizeof(double)); double *loc_FG_DPotDz = mymalloc("loc_FG_DPotDz", FG_Ngrid * sizeof(double)); double *loc_FG_Pot = mymalloc("loc_FG_Pot", FG_Ngrid * sizeof(double)); memset(loc_FG_DPotDR, 0, FG_Ngrid * sizeof(double)); memset(loc_FG_DPotDz, 0, FG_Ngrid * sizeof(double)); memset(loc_FG_Pot, 0, FG_Ngrid * sizeof(double)); /* read out the forces and potentials */ for(n = 0; n < NumPart; n++) { if(P[n].Type == 5) { i = P[n].ID; s = i / (FG_Nbin * FG_Nbin); i -= s * (FG_Nbin * FG_Nbin); k = i / FG_Nbin; i -= k * FG_Nbin; j = i; i = k * FG_Nbin + j; /* r,z */ double phi = 2 * M_PI / FG_SECTIONS * s; loc_FG_DPotDR[i] += P[n].GravAccel[0] * cos(phi) + P[n].GravAccel[1] * sin(phi); loc_FG_DPotDz[i] += P[n].GravAccel[2]; loc_FG_Pot[i] += P[n].Potential; } } MPI_Allreduce(loc_FG_DPotDR, FG_DPotDR, FG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(loc_FG_DPotDz, FG_DPotDz, FG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(loc_FG_Pot, FG_Pot, FG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); for(i = 0; i < FG_Ngrid; i++) { FG_DPotDR[i] 
/= FG_SECTIONS; FG_DPotDz[i] /= FG_SECTIONS; FG_Pot[i] /= FG_SECTIONS; } myfree(loc_FG_Pot); myfree(loc_FG_DPotDz); myfree(loc_FG_DPotDR); free_allocated_memory(); } else { memset(FG_DPotDR, 0, FG_Ngrid * sizeof(double)); memset(FG_DPotDz, 0, FG_Ngrid * sizeof(double)); memset(FG_Pot, 0, FG_Ngrid * sizeof(double)); } /* for test purposes, we create a grid of the exact force just for the halo */ for(k = 0; k < FG_Nbin; k++) for(j = 0; j < FG_Nbin; j++) { i = k * FG_Nbin + j; /* r,z */ double r1 = FG_Rmin * (pow(FG_Fac, j) - 1.0); double z1 = FG_Rmin * (pow(FG_Fac, k) - 1.0); double pos[3], acc[3]; pos[0] = r1; pos[1] = 0; pos[2] = z1; FG_Pot_exact[i] = halo_get_potential(pos); halo_get_acceleration(pos, acc); FG_DPotDR_exact[i] = acc[0]; FG_DPotDz_exact[i] = acc[2]; } if(ThisTask == 0) { char buf[1000]; sprintf(buf, "%s/forcefield.dat", All.OutputDir); FILE *fd = fopen(buf, "w"); fwrite(&FG_Nbin, sizeof(int), 1, fd); fwrite(FG_DPotDR, sizeof(double), FG_Ngrid, fd); fwrite(FG_DPotDz, sizeof(double), FG_Ngrid, fd); fwrite(FG_Pot, sizeof(double), FG_Ngrid, fd); fwrite(FG_DPotDR_exact, sizeof(double), FG_Ngrid, fd); fwrite(FG_DPotDz_exact, sizeof(double), FG_Ngrid, fd); fwrite(FG_Pot_exact, sizeof(double), FG_Ngrid, fd); double *tmpR = mymalloc("tmpR", FG_Ngrid * sizeof(double)); double *tmpz = mymalloc("tmpz", FG_Ngrid * sizeof(double)); for(k = 0; k < FG_Nbin; k++) { double z = FG_Rmin * (pow(FG_Fac, k) - 1.0); for(j = 0; j < FG_Nbin; j++) { double R = FG_Rmin * (pow(FG_Fac, j) - 1.0); i = k * FG_Nbin + j; /* z,r */ tmpR[i] = R; tmpz[i] = z; } } fwrite(tmpR, sizeof(double), FG_Ngrid, fd); fwrite(tmpz, sizeof(double), FG_Ngrid, fd); fclose(fd); myfree(tmpz); myfree(tmpR); } force_test(); /* now the density grid */ for(k = 0; k < DG_Nbin; k++) for(j = 0; j < DG_Nbin; j++) { i = k * DG_Nbin + j; /* r,z */ double r1 = DG_Rmin * (pow(DG_Fac, j) - 1.0); double r2 = DG_Rmin * (pow(DG_Fac, j + 1) - 1.0); double z1 = DG_Rmin * (pow(DG_Fac, k) - 1.0); double z2 = DG_Rmin * (pow(DG_Fac, k + 1) - 1.0); double vol = M_PI * (r2 * r2 - r1 * r1) * (z2 - z1); vol *= 2; /* this factor accounts for symmetry at z=0 plane */ double pos[3]; pos[0] = 0.5 * (r1 + r2); pos[1] = 0; pos[2] = 0.5 * (z1 + z2); DG_CellSize[i] = dmax(r2 - r1, z2 - z1); DG_CellVol[i] = vol; for(type = 1; type <= 3; type++) DGs_MassTarget[type][STACKOFFSET(DG_MaxLevel, 0, 0) + i] = vol * get_density_of_type(pos, type); } for(type = 1; type <= 3; type++) smooth_stack(DGs_MassTarget[type], DG_MaxLevel); if(All.SampleDensityFieldForTargetResponse) densitygrid_sample_targetresponse(); mpi_printf("\nMass assigned to density reposnse grid:\n"); for(type = 1; type <= 3; type++) { double msum = 0; for(i = 0; i < DG_Ngrid; i++) msum += DGs_MassTarget[type][STACKOFFSET(DG_MaxLevel, 0, 0) + i]; mpi_printf("Type=%d: raw mass on grid = %10g target=%10g (after recalibration)\n", type, msum, MType[type]); /* we renormalize to compensate for any missing mass off the grid, and errors from poor density sampling */ if(msum > 0) { for(i = 0; i < DG_Ngrid; i++) DGs_MassTarget[type][STACKOFFSET(DG_MaxLevel, 0, 0) + i] *= MType[type] / msum; if(ThisTask == 0) { char buf[1000]; sprintf(buf, "%s/target_%d.dat", All.OutputDir, type); FILE *fd = fopen(buf, "w"); fwrite(&DG_Nbin, sizeof(int), 1, fd); fwrite(DG_CellSize, sizeof(double), DG_Ngrid, fd); fwrite(&DGs_MassTarget[type][STACKOFFSET(DG_MaxLevel, 0, 0)], sizeof(double), DG_Ngrid, fd); fclose(fd); } } } } /* returns the index of the spatial grid point corresponding to coordinate pos[] */ void 
energygrid_get_cell(double *pos, int *iR, int *iz, double *fR, double *fz) { double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1]); double z = fabs(pos[2]); double br = (log(r / EG_Rmin + 1.0) / log(EG_Fac)) - 0.5; double bz = (log(z / EG_Rmin + 1.0) / log(EG_Fac)) - 0.5; int binr, binz; if(br < 0) br = 0; if(bz < 0) bz = 0; binr = (int) br; *fR = br - binr; binz = (int) bz; *fz = bz - binz; if(binr < 0) terminate("binr=%d: EG_Rmin=%g EG_Fac=%g pos=(%g|%g|%g)\n", binr, EG_Rmin, EG_Fac, pos[0], pos[1], pos[2]); if(binz < 0) terminate("binz=%d: EG_Rmin=%g EG_Fac=%g pos=(%g|%g|%g)\n", binz, EG_Rmin, EG_Fac, pos[0], pos[1], pos[2]); if(binr >= EG_Nbin - 1) { binr = EG_Nbin - 2; *fR = 1; } if(binz >= EG_Nbin - 1) { binz = EG_Nbin - 2; *fz = 1; } *iR = binr; *iz = binz; } void calc_energy_grid_mass_maps(void) { int i, j, k, iR, iz, type, count, cstart; double fR, fz; double *mfield, *egyfield_r, *egyfield_t, *egyfield_p, *egyfield_q, pos[3]; mpi_printf("calculate mass map for energy grid...\n"); mpi_printf("sampling energy field\n"); for(type = 1; type <= 3; type++) { if(MType[type] > 0) { mfield = mymalloc("mfield", EG_Nbin * EG_Nbin * sizeof(double)); egyfield_r = mymalloc("egyfield_r", EG_Nbin * EG_Nbin * sizeof(double)); egyfield_t = mymalloc("egyfield_t", EG_Nbin * EG_Nbin * sizeof(double)); egyfield_p = mymalloc("egyfield_p", EG_Nbin * EG_Nbin * sizeof(double)); egyfield_q = mymalloc("egyfield_q", EG_Nbin * EG_Nbin * sizeof(double)); for(k = 0; k < EG_Nbin; k++) for(j = 0; j < EG_Nbin; j++) { i = k * EG_Nbin + j; /* r,z */ mfield[i] = 0; egyfield_r[i] = 0; egyfield_t[i] = 0; egyfield_p[i] = 0; egyfield_q[i] = 0; } count = 0; cstart = 0; int ntarget = All.SampleParticleCount / NTask + 1; while(count < ntarget) { if(type == 1) halo_get_fresh_coordinate(pos); /* a halo particle */ else if(type == 2) disk_get_fresh_coordinate(pos); /* disk particle */ else if(type == 3) bulge_get_fresh_coordinate(pos); /* disk particle */ double r = sqrt(pos[0] * pos[0] + pos[2] * pos[2]); if(r < All.Rmax) { energygrid_get_cell(pos, &iR, &iz, &fR, &fz); double disp_r = 0, disp_t = 0, disp_p = 0, disp_q = 0; get_disp_rtp(pos, type, &disp_r, &disp_t, &disp_p, &disp_q); /* we are actually binning the dispersion of a particle normalized to the expected dispersion, hence * this boils down to unity here */ mfield[iz * EG_Nbin + iR] += (1 - fR) * (1 - fz); mfield[iz * EG_Nbin + (iR + 1)] += (fR) * (1 - fz); mfield[(iz + 1) * EG_Nbin + iR] += (1 - fR) * (fz); mfield[(iz + 1) * EG_Nbin + (iR + 1)] += (fR) * (fz); egyfield_r[iz * EG_Nbin + iR] += (1 - fR) * (1 - fz) * disp_r; egyfield_r[iz * EG_Nbin + (iR + 1)] += (fR) * (1 - fz) * disp_r; egyfield_r[(iz + 1) * EG_Nbin + iR] += (1 - fR) * (fz) * disp_r; egyfield_r[(iz + 1) * EG_Nbin + (iR + 1)] += (fR) * (fz) * disp_r; egyfield_t[iz * EG_Nbin + iR] += (1 - fR) * (1 - fz) * disp_t; egyfield_t[iz * EG_Nbin + (iR + 1)] += (fR) * (1 - fz) * disp_t; egyfield_t[(iz + 1) * EG_Nbin + iR] += (1 - fR) * (fz) * disp_t; egyfield_t[(iz + 1) * EG_Nbin + (iR + 1)] += (fR) * (fz) * disp_t; egyfield_p[iz * EG_Nbin + iR] += (1 - fR) * (1 - fz) * disp_p; egyfield_p[iz * EG_Nbin + (iR + 1)] += (fR) * (1 - fz) * disp_p; egyfield_p[(iz + 1) * EG_Nbin + iR] += (1 - fR) * (fz) * disp_p; egyfield_p[(iz + 1) * EG_Nbin + (iR + 1)] += (fR) * (fz) * disp_p; egyfield_q[iz * EG_Nbin + iR] += (1 - fR) * (1 - fz) * disp_q; egyfield_q[iz * EG_Nbin + (iR + 1)] += (fR) * (1 - fz) * disp_q; egyfield_q[(iz + 1) * EG_Nbin + iR] += (1 - fR) * (fz) * disp_q; egyfield_q[(iz + 1) * EG_Nbin + (iR + 1)] += (fR) * 
(fz) * disp_q; if(count >= cstart) { mpi_printf("."); fflush(stdout); cstart += ntarget / 100; } count++; } } MPI_Allreduce(mfield, &EGs_MassTarget[type][STACKOFFSET(EG_MaxLevel, 0, 0)], EG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(egyfield_r, &EGs_EgyTarget_r[type][STACKOFFSET(EG_MaxLevel, 0, 0)], EG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(egyfield_t, &EGs_EgyTarget_t[type][STACKOFFSET(EG_MaxLevel, 0, 0)], EG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(egyfield_p, &EGs_EgyTarget_p[type][STACKOFFSET(EG_MaxLevel, 0, 0)], EG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(egyfield_q, &EGs_EgyTarget_q[type][STACKOFFSET(EG_MaxLevel, 0, 0)], EG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); double nt = ntarget, nsum; MPI_Allreduce(&nt, &nsum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); for(k = 0; k < EG_Nbin; k++) for(j = 0; j < EG_Nbin; j++) { i = k * EG_Nbin + j; /* r,z */ EGs_MassTarget[type][STACKOFFSET(EG_MaxLevel, 0, 0) + i] *= MType[type] / nsum; EGs_EgyTarget_r[type][STACKOFFSET(EG_MaxLevel, 0, 0) + i] *= MType[type] / nsum; EGs_EgyTarget_t[type][STACKOFFSET(EG_MaxLevel, 0, 0) + i] *= MType[type] / nsum; EGs_EgyTarget_p[type][STACKOFFSET(EG_MaxLevel, 0, 0) + i] *= MType[type] / nsum; EGs_EgyTarget_q[type][STACKOFFSET(EG_MaxLevel, 0, 0) + i] *= MType[type] / nsum; } smooth_stack(EGs_MassTarget[type], EG_MaxLevel); smooth_stack(EGs_EgyTarget_r[type], EG_MaxLevel); smooth_stack(EGs_EgyTarget_t[type], EG_MaxLevel); smooth_stack(EGs_EgyTarget_p[type], EG_MaxLevel); smooth_stack(EGs_EgyTarget_q[type], EG_MaxLevel); myfree(egyfield_q); myfree(egyfield_p); myfree(egyfield_t); myfree(egyfield_r); myfree(mfield); } } mpi_printf("done.\n"); } /* returns the minimum of two input values */ double min(double a, double b) { if(a < b) return a; else return b; } /* returns a measure of the difference between two hierarchically smoothed fields. * If there is enough mass in a given cell (i.e. if it lies above a threshold value), a finer version of the mesh is considered, and the daughter cells are considered in turn. * If 'flag' is set, we are dealing with energy density fields, which are converted to velocity dispersion * fields before evaluating the difference.
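 * Concretely, the recursion starts at level l = 0 and descends into the four daughter
 * cells of the next finer level as long as the reference mass of a cell exceeds 'thresh'
 * and 'maxlevel' has not been reached; the contributions of the terminal cells are summed.
 * With 'flag' set, d/ref gives the mean dispersion of a cell, and the difference of the
 * mean dispersions is normalized by the target dispersion, floored at All.LowerDispLimit.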
*/ double calc_stack_difference(double *d1, double *d2, int l, int i, int j, int maxlevel, double *ref1, double *ref2, double thresh, double *dist, int flag) { if(l >= maxlevel || ref1[STACKOFFSET(l, i, j)] < thresh) { if(flag) { if(ref1[STACKOFFSET(l, i, j)] > 0 && ref2[STACKOFFSET(l, i, j)] > 0) return fabs(d1[STACKOFFSET(l, i, j)] / ref1[STACKOFFSET(l, i, j)] - d2[STACKOFFSET(l, i, j)] / ref2[STACKOFFSET(l, i, j)]) / dmax((d2[STACKOFFSET(l, i, j)] / ref2[STACKOFFSET(l, i, j)]), All.LowerDispLimit); else return 0; } else { #ifdef RADIAL_WEIGHTING_IN_DENSITY_RESPONSE return fabs(d1[STACKOFFSET(l, i, j)] - d2[STACKOFFSET(l, i, j)]) / min(dist[STACKOFFSET(l, i, j)], All.R200 / 20.0); #else return fabs(d1[STACKOFFSET(l, i, j)] - d2[STACKOFFSET(l, i, j)]); #endif } } else { double sum = 0; int ii, jj; for(ii = 2 * i; ii <= 2 * i + 1; ii++) for(jj = 2 * j; jj <= 2 * j + 1; jj++) sum += calc_stack_difference(d1, d2, l + 1, ii, jj, maxlevel, ref1, ref2, thresh, dist, flag); return sum; } } void calc_smoothed_stack(double *din, double *dout, int maxlevel, double *ref, double thresh) { int i, j, n; n = (1 << maxlevel); for(i = 0; i < n; i++) for(j = 0; j < n; j++) { dout[i * n + j] = eval_smoothed_stack(din, maxlevel, i, j, maxlevel, ref, thresh); } } double eval_smoothed_stack(double *din, int l, int i, int j, int maxlevel, double *ref, double thresh) { if(l == 0) return din[STACKOFFSET(l, i, j)] / pow((1 << (maxlevel - l)), 2); else { if(ref[STACKOFFSET(l - 1, i / 2, j / 2)] > thresh) return din[STACKOFFSET(l, i, j)] / pow((1 << (maxlevel - l)), 2); else return eval_smoothed_stack(din, l - 1, i / 2, j / 2, maxlevel, ref, thresh); } } /* produce a hierarchically averaged version of the given field, a 'stack' */ void smooth_stack(double *data, int maxlevel) { int l, i, j, n; for(l = maxlevel - 1; l >= 0; l--) { n = (1 << l); for(i = 0; i < n; i++) for(j = 0; j < n; j++) { data[STACKOFFSET(l, i, j)] = data[STACKOFFSET(l + 1, 2 * i, 2 * j)] + data[STACKOFFSET(l + 1, 2 * i + 1, 2 * j)] + data[STACKOFFSET(l + 1, 2 * i, 2 * j + 1)] + data[STACKOFFSET(l + 1, 2 * i + 1, 2 * j + 1)]; } } } /* produce a little test of whether we get the forces and potentials * of the halo correctly. */ void force_test(void) { int i, N= 10000; double pos[3], acc[3], acc_exact[3], pot, pot_exact; if(ThisTask == 0) { FILE *fd = fopen("forcetest.dat", "w"); fwrite(&N, sizeof(int), 1, fd); for(i=0; i< N; i++) { halo_get_fresh_coordinate(pos); /* a halo particle */ forcegrid_get_acceleration(pos, acc); halo_get_acceleration(pos, acc_exact); pot = forcegrid_get_potential(pos); pot_exact = halo_get_potential(pos); fwrite(pos, 3, sizeof(double), fd); fwrite(acc, 3, sizeof(double), fd); fwrite(acc_exact, 3, sizeof(double), fd); fwrite(&pot, 1, sizeof(double), fd); fwrite(&pot_exact, 1, sizeof(double), fd); } fclose(fd); } } GalIC/src/halo.c000644 000765 000024 00000010726 12373713530 014332 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "allvars.h" #include "proto.h" /* this file contains auxiliary routines for the description of the halo, * here modeled as a Hernquist sphere */ /* this function returns a new random coordinate for the halo */ void halo_get_fresh_coordinate(double *pos) { double r; do { double q = gsl_rng_uniform(random_generator); if(q > 0) r = All.Halo_A * (q + sqrt(q)) / (1 - q); else r = 0; double phi = gsl_rng_uniform(random_generator) * M_PI * 2; double theta = acos(gsl_rng_uniform(random_generator) * 2 - 1); pos[0] = r * sin(theta) * cos(phi); pos[1] = r * sin(theta) * sin(phi); pos[2] = r * cos(theta) / All.HaloStretch; r = sqrt(pos[0]*pos[0] + pos[1]*pos[1] + pos[2]*pos[2]); } while(r > All.Rmax); } /* return the dark matter halo density at the given coordinate */ double halo_get_density(double *pos) { double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1] + All.HaloStretch * All.HaloStretch * pos[2] * pos[2]); return All.HaloStretch * All.Halo_Mass / (2 * M_PI) * All.Halo_A / (r + 1.0e-6 * All.Halo_A) / pow(r + All.Halo_A, 3); } /* Note that the other functions below will only be meaningfully called for a spherical system */ /* cumulative mass inside a given radius for a spherical Hernquist halo */ double halo_get_mass_inside_radius(double r) { return All.Halo_Mass * pow(r / (r + All.Halo_A), 2); } double halo_get_potential(double *pos) { double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1] + pos[2] * pos[2]); return halo_get_potential_from_radius(r); } double halo_get_potential_from_radius(double r) { double phi = -All.G * All.Halo_Mass / (r + All.Halo_A); return phi; } /* returns the acceleration at coordinate pos[] */ void halo_get_acceleration(double *pos, double *acc) { double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1] + pos[2] * pos[2]); double fac = All.G * All.Halo_Mass / ((r + 1.0e-6 * All.Halo_A)* (r + All.Halo_A) * (r + All.Halo_A)); acc[0] = -fac * pos[0]; acc[1] = -fac * pos[1]; acc[2] = -fac * pos[2]; } double halo_get_escape_speed(double *pos) { double r = sqrt(pos[0] * pos[0] + pos[1] * pos[1] + pos[2] * pos[2]); double phi = -All.G * All.Halo_Mass / (r + All.Halo_A); double vesc = sqrt(-2.0 * phi); return vesc; } /* E to q conversion */ double halo_E_to_q(double E) { return sqrt(-E * All.Halo_A / (All.G * All.Halo_Mass)); } /* Hernquist density of states (as a function of q) */ double halo_g_q(double q) { double pre = 2 * sqrt(2) * M_PI * M_PI * All.Halo_A * All.Halo_A * All.Halo_A * sqrt(All.G * All.Halo_Mass / All.Halo_A); return pre * (3 * (8 * q * q * q * q - 4 * q * q + 1) * acos(q) - q * sqrt(1 - q * q) * (4 * q * q - 1) * (2 * q * q + 3)) / (3 * q * q * q * q * q); } /* Hernquist distribution function (as a function of q) */ double halo_f_q(double q) { double pre = (All.Halo_Mass / (All.Halo_A * All.Halo_A * All.Halo_A)) / (4 * M_PI * M_PI * M_PI * pow(2 * All.G * All.Halo_Mass / All.Halo_A, 1.5)); return pre * (3 * asin(q) + q * sqrt(1 - q * q) * (1 - 2 * q * q) * (8 * q * q * q * q - 8 * q * q - 3)) / pow(1 - q * q, 2.5); } /* Hernquist distribution function (as a function of radius and velocity) */ double halo_f(double rad, double vel) { double E = 0.5 * vel * vel + halo_get_potential_from_radius(rad); double q = halo_E_to_q(E); return halo_f_q(q); } /* generate velocities for Hernquist distribution function with von Neumann 
rejection technique */ double halo_generate_v(double rad) { double pot = halo_get_potential_from_radius(rad); double v_max = sqrt(-2 * pot); double v_guess, x_aux; double f_max = v_max * v_max * halo_f(rad, 0); v_guess = gsl_rng_uniform(random_generator) * v_max; x_aux = gsl_rng_uniform(random_generator) * f_max; while(x_aux > v_guess * v_guess * halo_f(rad, v_guess)) { v_guess = gsl_rng_uniform(random_generator) * v_max; x_aux = gsl_rng_uniform(random_generator) * f_max; } return v_guess; } GalIC/src/init.c000644 000765 000024 00000005532 12373713530 014351 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include "allvars.h" #include "proto.h" /* do various initializations */ void init(void) { if(ThisTask == 0) { char buf[2000]; sprintf(buf, "%s/memory.txt", All.OutputDir); if(!(FdMemory = fopen(buf, "w"))) terminate("can't open file '%s'", buf); } mymalloc_init(); set_units(); random_generator = gsl_rng_alloc(gsl_rng_ranlxd1); gsl_rng_set(random_generator, 42 + ThisTask); /* start-up seed */ set_softenings(); All.TopNodeAllocFactor = 0.1; All.TreeAllocFactor = 0.8; #ifdef DEBUG_ENABLE_FPU_EXCEPTIONS enable_core_dumps_and_fpu_exceptions(); #endif } /* Computes conversion factors between internal code units and the cgs-system. * In addition constants like the gravitation constant are set. */ void set_units(void) { All.UnitTime_in_s = All.UnitLength_in_cm / All.UnitVelocity_in_cm_per_s; All.UnitTime_in_Megayears = All.UnitTime_in_s / SEC_PER_MEGAYEAR; if(All.GravityConstantInternal == 0) All.G = GRAVITY / pow(All.UnitLength_in_cm, 3) * All.UnitMass_in_g * pow(All.UnitTime_in_s, 2); else All.G = All.GravityConstantInternal; All.UnitDensity_in_cgs = All.UnitMass_in_g / pow(All.UnitLength_in_cm, 3); All.UnitPressure_in_cgs = All.UnitMass_in_g / All.UnitLength_in_cm / pow(All.UnitTime_in_s, 2); All.UnitCoolingRate_in_cgs = All.UnitPressure_in_cgs / All.UnitTime_in_s; All.UnitEnergy_in_cgs = All.UnitMass_in_g * pow(All.UnitLength_in_cm, 2) / pow(All.UnitTime_in_s, 2); /* convert some physical input parameters to internal units */ All.Hubble = HUBBLE * All.UnitTime_in_s; if(ThisTask == 0) { printf("\nHubble (internal units) = %g\n", All.Hubble); printf("G (internal units) = %g\n", All.G); printf("UnitMass_in_g = %g\n", All.UnitMass_in_g); printf("UnitTime_in_s = %g\n", All.UnitTime_in_s); printf("UnitVelocity_in_cm_per_s = %g\n", All.UnitVelocity_in_cm_per_s); printf("UnitDensity_in_cgs = %g\n", All.UnitDensity_in_cgs); printf("UnitEnergy_in_cgs = %g\n", All.UnitEnergy_in_cgs); printf("\n"); } } /* set the softening length used in the tree calculation of the gravitional forces */ void set_softenings(void) { All.ForceSoftening = 2.8 * All.Softening; } /* terminate the calculation. */ void endrun(void) { mpi_printf("endrun called, calling MPI_Finalize()\nbye!\n\n"); fflush(stdout); MPI_Finalize(); exit(0); } GalIC/src/io.c000644 000765 000024 00000111267 12373713530 014020 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include #include #include #include #include #include #include "allvars.h" #include "proto.h" #ifdef HAVE_HDF5 #include void write_header_attributes_in_hdf5(hid_t handle); #endif /*! \file io.c * \brief Output of a snapshot (or an image file) file to disk. */ static int n_type[6]; /**< contains the local (for a single task) number of particles of each type in the snapshot file */ static long long ntot_type_all[6]; /**< contains the global number of particles of each type in the snapshot file */ void output_density_field(int iter) { int nbin; double *dout; if(!(iter % All.StepsBetweenDump) == 0) return; int num = iter / All.StepsBetweenDump; int type, lev; for(type = 1; type <= 3; type++) { if(type == 1 && All.Halo_N == 0) continue; if(type == 2 && All.Disk_N == 0) continue; if(type == 3 && All.Bulge_N == 0) continue; char buf[1000]; sprintf(buf, "%s/densfield_%d_%03d.dat", All.OutputDir, type, num); FILE *fd = fopen(buf, "w"); fwrite(&DG_MaxLevel, sizeof(int), 1, fd); for(lev = DG_MaxLevel; lev >= 0; lev--) { int nbin = (1 << lev); fwrite(&nbin, sizeof(int), 1, fd); fwrite(&DGs_MassTarget[type][STACKOFFSET(lev, 0, 0)], sizeof(double), nbin * nbin, fd); } /* target, variable resolution */ nbin = (1 << DG_MaxLevel); dout = mymalloc("dout", nbin * nbin * sizeof(double)); calc_smoothed_stack(DGs_MassTarget[type], dout, DG_MaxLevel, DGs_MassTarget[type], All.MinParticlesPerBinForDensityMeasurement * MType[type] / NType[type]); fwrite(&nbin, sizeof(int), 1, fd); fwrite(dout, sizeof(double), nbin * nbin, fd); /* response, variable resolution */ calc_smoothed_stack(DGs_MassResponse[type], dout, DG_MaxLevel, DGs_MassTarget[type], All.MinParticlesPerBinForDensityMeasurement * MType[type] / NType[type]); fwrite(&nbin, sizeof(int), 1, fd); fwrite(dout, sizeof(double), nbin * nbin, fd); myfree(dout); /* response, fine resolution */ fwrite(&DG_Nbin, sizeof(int), 1, fd); fwrite(&DGs_MassResponse[type][STACKOFFSET(DG_MaxLevel, 0, 0)], sizeof(double), DG_Ngrid, fd); /************************************/ /* Density grid coordinates */ { double *tmpR = mymalloc("tmpR", DG_Ngrid * sizeof(double)); double *tmpz = mymalloc("tmpz", DG_Ngrid * sizeof(double)); int i, k, j; for(k = 0; k < DG_Nbin; k++) { double z = DG_Rmin * 0.5 * (pow(DG_Fac, k) + pow(DG_Fac, k + 1) - 2.0); for(j = 0; j < DG_Nbin; j++) { double R = DG_Rmin * 0.5 * (pow(DG_Fac, j) + pow(DG_Fac, j + 1) - 2.0); i = k * DG_Nbin + j; /* z,r */ tmpR[i] = R; tmpz[i] = z; } } nbin = DG_Nbin; fwrite(&nbin, sizeof(int), 1, fd); fwrite(tmpR, sizeof(double), DG_Ngrid, fd); fwrite(&nbin, sizeof(int), 1, fd); fwrite(tmpz, sizeof(double), DG_Ngrid, fd); myfree(tmpz); myfree(tmpR); } /*************************************/ /* Energy grid coordinates */ { double *tmpR = mymalloc("tmpR", EG_Ngrid * sizeof(double)); double *tmpz = mymalloc("tmpz", EG_Ngrid * sizeof(double)); int i, k, j; for(k = 0; k < EG_Nbin; k++) { double z = EG_Rmin * 0.5 * (pow(EG_Fac, k) + pow(EG_Fac, k+1) - 2.0); for(j = 0; j < EG_Nbin; j++) { double R = EG_Rmin * 0.5 * (pow(EG_Fac, j) + pow(EG_Fac, j + 1) - 2.0); i = k * EG_Nbin + j; /* z,r */ tmpR[i] = R; tmpz[i] = z; } } nbin = EG_Nbin; fwrite(&nbin, sizeof(int), 1, fd); fwrite(tmpR, sizeof(double), EG_Ngrid, fd); fwrite(&nbin, sizeof(int), 1, fd); fwrite(tmpz, sizeof(double), EG_Ngrid, fd); 
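/* note: as in the blocks above, each array written to the densfield file is preceded by
   an int record holding the number of bins per dimension, so that the variable-size
   blocks can be separated again when the file is read back */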
myfree(tmpz); myfree(tmpR); } /*************************************/ /* target mass, variable resolution */ nbin = (1 << EG_MaxLevel); dout = mymalloc("dout", nbin * nbin * sizeof(double)); calc_smoothed_stack(EGs_MassTarget[type], dout, EG_MaxLevel, EGs_MassResponse[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type]); fwrite(&nbin, sizeof(int), 1, fd); fwrite(dout, sizeof(double), nbin * nbin, fd); /* response mass, variable resolution */ nbin = (1 << EG_MaxLevel); calc_smoothed_stack(EGs_MassResponse[type], dout, EG_MaxLevel, EGs_MassResponse[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type]); fwrite(&nbin, sizeof(int), 1, fd); fwrite(dout, sizeof(double), nbin * nbin, fd); myfree(dout); /*************************************/ /* target energy, variable resolution */ nbin = (1 << EG_MaxLevel); dout = mymalloc("dout", nbin * nbin * sizeof(double)); calc_smoothed_stack(EGs_EgyTarget_r[type], dout, EG_MaxLevel, EGs_MassResponse[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type]); fwrite(&nbin, sizeof(int), 1, fd); fwrite(dout, sizeof(double), nbin * nbin, fd); /* response energy, variable resolution */ calc_smoothed_stack(EGs_EgyResponse_r[type], dout, EG_MaxLevel, EGs_MassResponse[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type]); fwrite(&nbin, sizeof(int), 1, fd); fwrite(dout, sizeof(double), nbin * nbin, fd); myfree(dout); /*************************************/ /* target energy, variable resolution */ nbin = (1 << EG_MaxLevel); dout = mymalloc("dout", nbin * nbin * sizeof(double)); calc_smoothed_stack(EGs_EgyTarget_t[type], dout, EG_MaxLevel, EGs_MassResponse[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type]); fwrite(&nbin, sizeof(int), 1, fd); fwrite(dout, sizeof(double), nbin * nbin, fd); /* response energy, variable resolution */ calc_smoothed_stack(EGs_EgyResponse_t[type], dout, EG_MaxLevel, EGs_MassResponse[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type]); fwrite(&nbin, sizeof(int), 1, fd); fwrite(dout, sizeof(double), nbin * nbin, fd); myfree(dout); /*************************************/ /* target energy, variable resolution */ nbin = (1 << EG_MaxLevel); dout = mymalloc("dout", nbin * nbin * sizeof(double)); calc_smoothed_stack(EGs_EgyTarget_p[type], dout, EG_MaxLevel, EGs_MassResponse[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type]); fwrite(&nbin, sizeof(int), 1, fd); fwrite(dout, sizeof(double), nbin * nbin, fd); /* response energy, variable resolution */ calc_smoothed_stack(EGs_EgyResponse_p[type], dout, EG_MaxLevel, EGs_MassResponse[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type]); fwrite(&nbin, sizeof(int), 1, fd); fwrite(dout, sizeof(double), nbin * nbin, fd); myfree(dout); /*************************************/ /* target energy, variable resolution */ nbin = (1 << EG_MaxLevel); dout = mymalloc("dout", nbin * nbin * sizeof(double)); calc_smoothed_stack(EGs_EgyTarget_q[type], dout, EG_MaxLevel, EGs_MassResponse[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type]); fwrite(&nbin, sizeof(int), 1, fd); fwrite(dout, sizeof(double), nbin * nbin, fd); /* response energy, variable resolution */ calc_smoothed_stack(EGs_EgyResponse_q[type], dout, EG_MaxLevel, EGs_MassResponse[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type]); fwrite(&nbin, sizeof(int), 1, fd); 
fwrite(dout, sizeof(double), nbin * nbin, fd); myfree(dout); /*************************************/ fclose(fd); } } void output_particles(int iter) { if(!(iter % All.StepsBetweenDump) == 0) return; int num = iter / All.StepsBetweenDump; char buf[500]; int n, filenr, gr, ngroups, masterTask, lastTask; mpi_printf("\nwriting snapshot file #%d... \n", num); CommBuffer = mymalloc("CommBuffer", All.BufferSize * 1024 * 1024); if(NTask < All.NumFilesPerSnapshot) { if(ThisTask == 0) printf("Fatal error.\nNumber of processors must be larger or equal than All.NumFilesPerSnapshot.\n"); endrun(); } if(All.SnapFormat < 1 || All.SnapFormat > 3) { mpi_printf("Unsupported File-Format\n"); endrun(); } #ifndef HAVE_HDF5 if(All.SnapFormat == 3) { mpi_printf("Code wasn't compiled with HDF5 support enabled!\n"); endrun(); } #endif /* determine global and local particle numbers */ for(n = 0; n < 6; n++) n_type[n] = 0; for(n = 0; n < NumPart; n++) { n_type[P[n].Type]++; } sumup_large_ints(6, n_type, ntot_type_all); /* assign processors to output files */ distribute_file(All.NumFilesPerSnapshot, 0, 0, NTask - 1, &filenr, &masterTask, &lastTask); if(All.NumFilesPerSnapshot > 1) { if(ThisTask == 0) { sprintf(buf, "%s/snapdir_%03d", All.OutputDir, num); mkdir(buf, 02755); } MPI_Barrier(MPI_COMM_WORLD); } if(All.NumFilesPerSnapshot > 1) sprintf(buf, "%s/snapdir_%03d/%s_%03d.%d", All.OutputDir, num, All.OutputFile, num, filenr); else sprintf(buf, "%s%s_%03d", All.OutputDir, All.OutputFile, num); ngroups = All.NumFilesPerSnapshot / All.NumFilesWrittenInParallel; if((All.NumFilesPerSnapshot % All.NumFilesWrittenInParallel)) ngroups++; for(gr = 0; gr < ngroups; gr++) { if((filenr / All.NumFilesWrittenInParallel) == gr) /* ok, it's this processor's turn */ { write_file(buf, masterTask, lastTask); } MPI_Barrier(MPI_COMM_WORLD); } myfree(CommBuffer); mpi_printf("done with writing snapshot.\n\n"); } /*! \brief This function fills the write buffer with particle data. * * New output blocks can in principle be added here. * * \param blocknr ID of the output block (i.e. position, velocities...) * \param startindex pointer containing the offset in the write buffer * \param pc nuber of particle to be put in the buffer * \param type particle type * \param subbox_flag if greater than 0 instructs the code to output only a subset * of the whole domain */ void fill_write_buffer(enum iofields blocknr, int *startindex, int pc, int type) { int n, k, pindex; MyOutputFloat *fp; MyIDType *ip; fp = (MyOutputFloat *) CommBuffer; ip = (MyIDType *) CommBuffer; pindex = *startindex; for(n = 0; n < pc; pindex++) { if(P[pindex].Type == type) switch (blocknr) { case IO_POS: /* positions */ for(k = 0; k < 3; k++) { fp[k] = P[pindex].Pos[k]; } n++; fp += 3; break; case IO_VEL: /* velocities */ for(k = 0; k < 3; k++) { fp[k] = P[pindex].Vel[k]; } n++; fp += 3; break; case IO_VELTHEO: /* velocities */ for(k = 0; k < 3; k++) { fp[k] = P[pindex].VelTheo[k]; } n++; fp += 3; break; case IO_ID: /* particle ID */ *ip++ = P[pindex].ID; n++; break; case IO_MASS: /* particle mass */ *fp++ = P[pindex].Mass; n++; break; case IO_LASTENTRY: terminate("reached last entry in switch - how can that be?"); break; } } *startindex = pindex; } /*! \brief This function tells the size in bytes of one data entry in each of the blocks * defined for the output file. * * \param blocknr ID of the output block (i.e. position, velocities...) * \param mode used to distinguish whether the function is called in input * mode (mode > 0) or in output mode (mode = 0). 
The size of one data * entry may vary depending on the mode * \return size of the data entry in bytes */ int get_bytes_per_blockelement(enum iofields blocknr, int mode) { int bytes_per_blockelement = 0; switch (blocknr) { case IO_POS: case IO_VEL: case IO_VELTHEO: if(mode) bytes_per_blockelement = 3 * sizeof(MyInputFloat); else bytes_per_blockelement = 3 * sizeof(MyOutputFloat); break; case IO_ID: bytes_per_blockelement = sizeof(MyIDType); break; case IO_MASS: if(mode) bytes_per_blockelement = sizeof(MyInputFloat); else bytes_per_blockelement = sizeof(MyOutputFloat); break; case IO_LASTENTRY: terminate("reached last entry in switch - strange."); break; } return bytes_per_blockelement; } /*! \brief This function determines the type of one data entry in each of the blocks * defined for the output file. * * Used only if output in HDF5 format is enabled * * \param blocknr ID of the output block (i.e. position, velocities...) * \return typekey, a flag that indicates the type of the data entry */ int get_datatype_in_block(enum iofields blocknr) { int typekey; switch (blocknr) { case IO_ID: #ifdef LONGIDS typekey = 2; /* native long long */ #else typekey = 0; /* native int */ #endif break; default: typekey = 1; /* native MyOutputFloat */ break; } return typekey; } /*! \brief This function determines the number of elements composing one data entry * in each of the blocks defined for the output file. * * Used only if output in HDF5 format is enabled * * \param blocknr ID of the output block (i.e. position, velocities...) * \return number of elements of one data entry */ int get_values_per_blockelement(enum iofields blocknr) { int values = 0; switch (blocknr) { case IO_POS: case IO_VEL: case IO_VELTHEO: values = 3; break; case IO_ID: case IO_MASS: values = 1; break; case IO_LASTENTRY: terminate("reached last entry in switch - strange."); break; } return values; } /*! \brief Get particle number in an output block * * This function determines how many particles there are in a given block, * based on the information in the header-structure. It also flags particle * types that are present in the block in the typelist array. * * \param blocknr ID of the output block (i.e. position, velocities...) * \param typelist array that contains the number of particles of each type in the block * \return the total number of particles in the block */ int get_particles_in_block(enum iofields blocknr, int *typelist) { int i, nall, ntot_withmasses; nall = 0; ntot_withmasses = 0; for(i = 0; i < 6; i++) { typelist[i] = 0; if(header.npart[i] > 0) { nall += header.npart[i]; typelist[i] = 1; } if(All.MassTable[i] == 0) ntot_withmasses += header.npart[i]; } switch (blocknr) { case IO_POS: case IO_VEL: case IO_VELTHEO: case IO_ID: return nall; break; case IO_MASS: for(i = 0; i < 6; i++) { typelist[i] = 0; if(All.MassTable[i] == 0 && header.npart[i] > 0) typelist[i] = 1; } return ntot_withmasses; break; case IO_LASTENTRY: terminate("reached last entry in switch - strange."); break; } terminate("reached end of function - this should not happen"); return 0; } /*! \brief Check if a block is present in a file * * This function tells whether a block in the input/output file is present * or not. Because the blocks processed in the two cases are different, the * mode is indicated with the flag write (1=write, 0=read). * * \param blocknr ID of the output block (i.e. position, velocities...) 
* \param write if 0 the function is in read mode, if 1 the functionis in write mode * \return 0 if the block is not present, 1 otherwise */ int blockpresent(enum iofields blocknr, int write) { switch (blocknr) { case IO_POS: case IO_VEL: case IO_VELTHEO: case IO_ID: case IO_MASS: return 1; /* always present */ break; case IO_LASTENTRY: return 0; /* will not occur */ break; } return 0; /* default: not present */ } /*! \brief This function associates a short 4-character block name with each block number. * * This is stored in front of each block for snapshot FileFormat=2. * * \param blocknr ID of the output block (i.e. position, velocities...) * \param label string containing the dataset name */ void get_Tab_IO_Label(enum iofields blocknr, char *label) { switch (blocknr) { case IO_POS: strncpy(label, "POS ", 4); break; case IO_VEL: strncpy(label, "VEL ", 4); break; case IO_VELTHEO: strncpy(label, "VELT", 4); break; case IO_ID: strncpy(label, "ID ", 4); break; case IO_MASS: strncpy(label, "MASS", 4); break; case IO_LASTENTRY: terminate("reached last statement in switch - this should not happen"); break; } } /*! \brief This function associates a dataset name with each block number. * * This is needed to name the dataset if the output is written in HDF5 format * * \param blocknr ID of the output block (i.e. position, velocities...) * \param buf string containing the dataset name */ void get_dataset_name(enum iofields blocknr, char *buf) { strcpy(buf, "default"); switch (blocknr) { case IO_POS: strcpy(buf, "Coordinates"); break; case IO_VEL: strcpy(buf, "Velocities"); break; case IO_VELTHEO: strcpy(buf, "VelocitiesTheory"); break; case IO_ID: strcpy(buf, "ParticleIDs"); break; case IO_MASS: strcpy(buf, "Masses"); break; case IO_LASTENTRY: terminate("reached last statement in switch - this should not happen"); break; } } /*! \brief Actually write the snapshot file to the disk * * This function writes a snapshot file containing the data from processors * 'writeTask' to 'lastTask'. 'writeTask' is the one that actually writes. * Each snapshot file contains a header first, then particle positions, * velocities and ID's. Then particle masses are written for those particle * types with zero entry in MassTable. After that, first the internal * energies u, and then the density is written for the SPH particles. If * cooling is enabled, mean molecular weight and neutral hydrogen abundance * are written for the gas particles. This is followed by the SPH smoothing * length and further blocks of information, depending on included physics * and compile-time flags. 
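 * For SnapFormat=1 and 2 every block is bracketed by int-sized block-size fields (the
 * SKIP macro below); SnapFormat=2 additionally writes a short record in front of each
 * block carrying the 4-character label from get_Tab_IO_Label(). The per-type particle
 * totals are split into header.npartTotal and header.npartTotalHighWord, the latter
 * holding the upper 32 bits of the 64-bit total.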
* * \param fname string containing the file name * \param writeTask the rank of the task in a writing group that which is responsible * for the output operations * \param lastTask the rank of the last task in a writing group * \param subbox_flag if greater than 0 instructs the code to output only a subset * of the whole domain * */ void write_file(char *fname, int writeTask, int lastTask) { int type, bytes_per_blockelement, npart, nextblock, typelist[6]; int n_for_this_task, n, p, pc, offset = 0, task; int blockmaxlen, ntot_type[6], nn[6]; enum iofields blocknr; char label[8]; int bnr; int blksize; MPI_Status status; FILE *fd = 0; #ifdef HAVE_HDF5 hid_t hdf5_file = 0, hdf5_grp[6], hdf5_headergrp = 0, hdf5_dataspace_memory; hid_t hdf5_datatype = 0, hdf5_dataspace_in_file = 0, hdf5_dataset = 0; hsize_t dims[2], count[2], start[2]; int rank = 0, pcsum = 0; char buf[500]; #endif #define SKIP {my_fwrite(&blksize,sizeof(int),1,fd);} /* determine particle numbers of each type in file */ if(ThisTask == writeTask) { for(n = 0; n < 6; n++) ntot_type[n] = n_type[n]; for(task = writeTask + 1; task <= lastTask; task++) { MPI_Recv(&nn[0], 6, MPI_INT, task, TAG_LOCALN, MPI_COMM_WORLD, &status); for(n = 0; n < 6; n++) ntot_type[n] += nn[n]; } for(task = writeTask + 1; task <= lastTask; task++) MPI_Send(&ntot_type[0], 6, MPI_INT, task, TAG_N, MPI_COMM_WORLD); } else { MPI_Send(&n_type[0], 6, MPI_INT, writeTask, TAG_LOCALN, MPI_COMM_WORLD); MPI_Recv(&ntot_type[0], 6, MPI_INT, writeTask, TAG_N, MPI_COMM_WORLD, &status); } /* fill file header */ for(n = 0; n < 6; n++) { header.npart[n] = ntot_type[n]; header.npartTotal[n] = (unsigned int) ntot_type_all[n]; header.npartTotalHighWord[n] = (unsigned int) (ntot_type_all[n] >> 32); } for(n = 0; n < 6; n++) header.mass[n] = All.MassTable[n]; header.time = 0; header.redshift = 0; header.flag_sfr = 0; header.flag_feedback = 0; header.flag_cooling = 0; header.flag_stellarage = 0; header.flag_metals = 0; header.flag_tracer_field = 0; header.num_files = All.NumFilesPerSnapshot; header.BoxSize = 0; header.Omega0 = 0; header.OmegaLambda = 0; header.HubbleParam = 0; #ifdef OUTPUT_IN_DOUBLEPRECISION header.flag_doubleprecision = 1; #else header.flag_doubleprecision = 0; #endif /* open file and write header */ if(ThisTask == writeTask) { if(All.SnapFormat == 3) { #ifdef HAVE_HDF5 sprintf(buf, "%s.hdf5", fname); mpi_printf("writing snapshot file: '%s' (file 1 of %d)\n", fname, All.NumFilesPerSnapshot); hdf5_file = H5Fcreate(buf, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT); hdf5_headergrp = H5Gcreate(hdf5_file, "/Header", 0); for(type = 0; type < 6; type++) { if(header.npart[type] > 0) { sprintf(buf, "/PartType%d", type); hdf5_grp[type] = H5Gcreate(hdf5_file, buf, 0); } } write_header_attributes_in_hdf5(hdf5_headergrp); #endif } else { if(!(fd = fopen(fname, "w"))) { printf("can't open file `%s' for writing snapshot.\n", fname); terminate("file open error"); } mpi_printf("writing snapshot file: '%s' (file 1 of %d)\n", fname, All.NumFilesPerSnapshot); if(All.SnapFormat == 2) { blksize = sizeof(int) + 4 * sizeof(char); SKIP; my_fwrite((void *) "HEAD", sizeof(char), 4, fd); nextblock = sizeof(header) + 2 * sizeof(int); my_fwrite(&nextblock, sizeof(int), 1, fd); SKIP; } blksize = sizeof(header); SKIP; my_fwrite(&header, sizeof(header), 1, fd); SKIP; } } for(bnr = 0; bnr < 1000; bnr++) { blocknr = (enum iofields) bnr; if(blocknr == IO_LASTENTRY) break; if(blockpresent(blocknr, 1)) { bytes_per_blockelement = get_bytes_per_blockelement(blocknr, 0); blockmaxlen = ((int) (All.BufferSize * 
1024 * 1024)) / bytes_per_blockelement; npart = get_particles_in_block(blocknr, &typelist[0]); if(npart > 0) { if(ThisTask == 0) { char buf[1000]; get_dataset_name(blocknr, buf); printf("writing block %d (%s)...\n", blocknr, buf); } if(ThisTask == writeTask) { if(All.SnapFormat == 1 || All.SnapFormat == 2) { if(All.SnapFormat == 2) { blksize = sizeof(int) + 4 * sizeof(char); SKIP; get_Tab_IO_Label(blocknr, label); my_fwrite(label, sizeof(char), 4, fd); nextblock = npart * bytes_per_blockelement + 2 * sizeof(int); my_fwrite(&nextblock, sizeof(int), 1, fd); SKIP; } blksize = npart * bytes_per_blockelement; SKIP; } } for(type = 0; type < 6; type++) { if(typelist[type]) { #ifdef HAVE_HDF5 if(ThisTask == writeTask && All.SnapFormat == 3 && header.npart[type] > 0) { switch (get_datatype_in_block(blocknr)) { case 0: hdf5_datatype = H5Tcopy(H5T_NATIVE_UINT); break; case 1: #ifdef OUTPUT_IN_DOUBLEPRECISION hdf5_datatype = H5Tcopy(H5T_NATIVE_DOUBLE); #else hdf5_datatype = H5Tcopy(H5T_NATIVE_FLOAT); #endif break; case 2: hdf5_datatype = H5Tcopy(H5T_NATIVE_UINT64); break; } dims[0] = header.npart[type]; dims[1] = get_values_per_blockelement(blocknr); if(dims[1] == 1) rank = 1; else rank = 2; get_dataset_name(blocknr, buf); hdf5_dataspace_in_file = H5Screate_simple(rank, dims, NULL); hdf5_dataset = H5Dcreate(hdf5_grp[type], buf, hdf5_datatype, hdf5_dataspace_in_file, H5P_DEFAULT); pcsum = 0; } #endif for(task = writeTask, offset = 0; task <= lastTask; task++) { if(task == ThisTask) { n_for_this_task = n_type[type]; for(p = writeTask; p <= lastTask; p++) if(p != ThisTask) MPI_Send(&n_for_this_task, 1, MPI_INT, p, TAG_NFORTHISTASK, MPI_COMM_WORLD); } else MPI_Recv(&n_for_this_task, 1, MPI_INT, task, TAG_NFORTHISTASK, MPI_COMM_WORLD, &status); while(n_for_this_task > 0) { pc = n_for_this_task; if(pc > blockmaxlen) pc = blockmaxlen; if(ThisTask == task) fill_write_buffer(blocknr, &offset, pc, type); if(ThisTask == writeTask && task != writeTask) MPI_Recv(CommBuffer, bytes_per_blockelement * pc, MPI_BYTE, task, TAG_PDATA, MPI_COMM_WORLD, &status); if(ThisTask != writeTask && task == ThisTask) MPI_Ssend(CommBuffer, bytes_per_blockelement * pc, MPI_BYTE, writeTask, TAG_PDATA, MPI_COMM_WORLD); if(ThisTask == writeTask) { if(All.SnapFormat == 3) { #ifdef HAVE_HDF5 start[0] = pcsum; start[1] = 0; count[0] = pc; count[1] = get_values_per_blockelement(blocknr); pcsum += pc; H5Sselect_hyperslab(hdf5_dataspace_in_file, H5S_SELECT_SET, start, NULL, count, NULL); dims[0] = pc; dims[1] = get_values_per_blockelement(blocknr); hdf5_dataspace_memory = H5Screate_simple(rank, dims, NULL); H5Dwrite(hdf5_dataset, hdf5_datatype, hdf5_dataspace_memory, hdf5_dataspace_in_file, H5P_DEFAULT, CommBuffer); H5Sclose(hdf5_dataspace_memory); #endif } else { my_fwrite(CommBuffer, bytes_per_blockelement, pc, fd); } } n_for_this_task -= pc; } } #ifdef HAVE_HDF5 if(ThisTask == writeTask && All.SnapFormat == 3 && header.npart[type] > 0) { if(All.SnapFormat == 3) { H5Dclose(hdf5_dataset); H5Sclose(hdf5_dataspace_in_file); H5Tclose(hdf5_datatype); } } #endif } } if(ThisTask == writeTask) { if(All.SnapFormat == 1 || All.SnapFormat == 2) SKIP; } } } } if(ThisTask == writeTask) { if(All.SnapFormat == 3) { #ifdef HAVE_HDF5 for(type = 5; type >= 0; type--) if(header.npart[type] > 0) H5Gclose(hdf5_grp[type]); H5Gclose(hdf5_headergrp); H5Fclose(hdf5_file); #endif } else fclose(fd); } } #ifdef HAVE_HDF5 /*! 
\brief Write the fields contained in the header group of the HDF5 snapshot file * * This function stores the fields of the structure io_header as attributes belonging * to the header group of the HDF5 file. * * \param handle contains a reference to the header group */ void write_header_attributes_in_hdf5(hid_t handle) { hsize_t adim[1] = { 6 }; hid_t hdf5_dataspace, hdf5_attribute; hdf5_dataspace = H5Screate(H5S_SIMPLE); H5Sset_extent_simple(hdf5_dataspace, 1, adim, NULL); hdf5_attribute = H5Acreate(handle, "NumPart_ThisFile", H5T_NATIVE_INT, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_UINT, header.npart); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SIMPLE); H5Sset_extent_simple(hdf5_dataspace, 1, adim, NULL); hdf5_attribute = H5Acreate(handle, "NumPart_Total", H5T_NATIVE_UINT, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_UINT, header.npartTotal); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SIMPLE); H5Sset_extent_simple(hdf5_dataspace, 1, adim, NULL); hdf5_attribute = H5Acreate(handle, "NumPart_Total_HighWord", H5T_NATIVE_UINT, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_UINT, header.npartTotalHighWord); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SIMPLE); H5Sset_extent_simple(hdf5_dataspace, 1, adim, NULL); hdf5_attribute = H5Acreate(handle, "MassTable", H5T_NATIVE_DOUBLE, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_DOUBLE, header.mass); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "Time", H5T_NATIVE_DOUBLE, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_DOUBLE, &header.time); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "Redshift", H5T_NATIVE_DOUBLE, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_DOUBLE, &header.redshift); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "BoxSize", H5T_NATIVE_DOUBLE, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_DOUBLE, &header.BoxSize); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "NumFilesPerSnapshot", H5T_NATIVE_INT, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_INT, &header.num_files); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "Omega0", H5T_NATIVE_DOUBLE, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_DOUBLE, &header.Omega0); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "OmegaLambda", H5T_NATIVE_DOUBLE, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_DOUBLE, &header.OmegaLambda); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "HubbleParam", H5T_NATIVE_DOUBLE, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_DOUBLE, &header.HubbleParam); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "Flag_Sfr", H5T_NATIVE_INT, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_INT, 
&header.flag_sfr); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "Flag_Cooling", H5T_NATIVE_INT, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_INT, &header.flag_cooling); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "Flag_StellarAge", H5T_NATIVE_INT, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_INT, &header.flag_stellarage); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "Flag_Metals", H5T_NATIVE_INT, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_INT, &header.flag_metals); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "Flag_Feedback", H5T_NATIVE_INT, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_INT, &header.flag_feedback); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "Flag_DoublePrecision", H5T_NATIVE_INT, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_INT, &header.flag_doubleprecision); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); hdf5_dataspace = H5Screate(H5S_SCALAR); hdf5_attribute = H5Acreate(handle, "Composition_vector_length", H5T_NATIVE_INT, hdf5_dataspace, H5P_DEFAULT); H5Awrite(hdf5_attribute, H5T_NATIVE_INT, &header.composition_vector_length); H5Aclose(hdf5_attribute); H5Sclose(hdf5_dataspace); } /*! \brief A simple error handler for HDF5 * * This function terminates the run or if write errors are tolerated, calls * the write_error() function to print information about the error and returns * a positive integer to allow the repetition of the write operation * (see also the HDF5 documentation) * * \param unused the parameter is not used, but it is necessary for compatibility * with the HDF5 library * \return 1 if the write error is tolerated, otherwise the run is terminated */ herr_t my_hdf5_error_handler(void *unused) { terminate("An HDF5 error was detected. Good Bye.\n"); return 0; } #endif void distribute_file(int nfiles, int firstfile, int firsttask, int lasttask, int *filenr, int *master, int *last) { int i, group; int tasks_per_file = NTask / nfiles; int tasks_left = NTask % nfiles; if(tasks_left == 0) { group = ThisTask / tasks_per_file; *master = group * tasks_per_file; *last = (group + 1) * tasks_per_file - 1; *filenr = group; return; } double tpf = ((double) NTask) / nfiles; for(i = 0, *last = -1; i < nfiles; i++) { *master = *last + 1; *last = (i + 1) * tpf; if(*last >= NTask) *last = *last - 1; if(*last < *master) terminate("last < master"); *filenr = i; if(i == nfiles - 1) *last = NTask - 1; if(ThisTask >= *master && ThisTask <= *last) return; } } /*! \brief A wrapper for the fwrite() function * * This catches I/O errors occuring for fwrite(). In this case we * better stop. If stream is null, no attempt at writing is done. 
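 *
 * Sketch of how this wrapper is used together with the leading/trailing block-size
 * markers in write_file() above (format 1/2 snapshots); the variable names are the
 * ones used there, and the real code writes the data in buffer-sized chunks:
 *
 *     int blksize = npart * bytes_per_blockelement;
 *     my_fwrite(&blksize, sizeof(int), 1, fd);                   // leading size marker
 *     my_fwrite(CommBuffer, bytes_per_blockelement, npart, fd);  // the data itself
 *     my_fwrite(&blksize, sizeof(int), 1, fd);                   // trailing size marker
 *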
* * \param ptr pointer to the beginning of data to write * \param size size in bytes of a single data element * \param nmemb number of elements to be written * \param stream pointer to the output stream * \return number of elements written to stream */ size_t my_fwrite(void *ptr, size_t size, size_t nmemb, FILE * stream) { size_t nwritten; if(!stream) return 0; if(size * nmemb > 0) { if((nwritten = fwrite(ptr, size, nmemb, stream)) != nmemb) terminate("I/O error (fwrite) on task=%d has occurred: %s\n", ThisTask, strerror(errno)); } else nwritten = 0; return nwritten; } /*! \brief A wrapper for the fread() function * * This catches I/O errors occurring for fread(). In this case we * better stop. If stream is null, no attempt at reading is done. * * \param ptr pointer to the beginning of memory location where to store data * \param size size in bytes of a single data element * \param nmemb number of elements to be read * \param stream pointer to the input stream * \return number of elements read from stream */ size_t my_fread(void *ptr, size_t size, size_t nmemb, FILE * stream) { size_t nread; if(!stream) return 0; if(size * nmemb > 0) { if((nread = fread(ptr, size, nmemb, stream)) != nmemb) { if(feof(stream)) { terminate("I/O error (fread) on task=%d has occurred: end of file\n", ThisTask); } else { terminate("I/O error (fread) on task=%d has occurred: %s\n", ThisTask, strerror(errno)); } } } else nread = 0; return nread; } /*! \brief A wrapper for the printf() function * * This function provides the same functionality as the standard printf() * function. However, data is written to the standard output only for * the task with rank 0. * * \param fmt string that contains format arguments */ void mpi_printf(const char *fmt, ...) { if(ThisTask == 0) { va_list l; va_start(l, fmt); vprintf(fmt, l); fflush(stdout); va_end(l); } } /*! \brief Opens the requested file name and returns the file descriptor. * * If opening fails, an error is printed and the file descriptor is * null. * * \param fnam the file name * \return a file descriptor to the file */ FILE *open_file(char *fnam) { FILE *fd; if(!(fd = fopen(fnam, "w"))) { printf("can't open file `%s' for writing.\n", fnam); } return fd; } GalIC/src/main.c000644 000765 000024 00000063417 12373713530 014340 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel.

* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include #include #include "allvars.h" #include "proto.h" int main(int argc, char **argv) { /* MPI-Initialization */ MPI_Init(&argc, &argv); MPI_Comm_rank(MPI_COMM_WORLD, &ThisTask); MPI_Comm_size(MPI_COMM_WORLD, &NTask); for(PTask = 0; NTask > (1 << PTask); PTask++); mpi_printf("\nThis is GalIC, version %s.\n\n" "Running on %d processors.\n\n" "Code was compiled with settings:\n\n", GALIC_VERSION, NTask); if(ThisTask == 0) output_compile_time_options(); /* check code parameters */ if(argc < 2) { if(ThisTask == 0) { printf("\nParameters are missing.\n"); printf("Call with []\n"); printf("\n"); printf(" RestartFlag Action\n"); printf(" 0 Start from scratch\n"); printf(" 1 Try to read gravity field from file\n"); printf("\n"); } endrun(); } strcpy(ParameterFile, argv[1]); if(argc >= 3) RestartFlag = atoi(argv[2]); else RestartFlag = 0; /* read in parameters for this run */ read_parameter_file(ParameterFile); /* do basic code initialization */ init(); /* determine structural parameters of the galaxy model */ structure_determination(); /* allocate force_grid */ forcegrid_allocate(); /* allocate density response grid */ densitygrid_allocate(); /* allocate grid for calculating velocity dispersion */ energygrid_allocate(); /* calculate force/potential/density grids */ forcedensitygrid_create(); /* determine solutions of the Jeans equations in the axisymmetric or spherically symmetric cases */ calculate_dispfield(); /* set-up particle positions for galaxy model */ initialize_particles(); /* calculate mass maps for energy grid */ calc_energy_grid_mass_maps(); /* calculate all orbits up front at initial state */ calc_all_response_fields(); /* now prepare the iteration loop */ int iter = 0, rep; log_message(iter); output_particles(iter); output_density_field(iter); do { for(rep = 0; rep < All.IndepenentOptimizationsPerStep; rep++) { init_updates(); optimize_some_particles(); commit_updates(); } iter++; log_message(iter); output_particles(iter); output_density_field(iter); } while(iter <= All.MaximumNumberOfSteps); mpi_printf("Maximum number of steps reached.\n"); /* clean up & finalize MPI */ endrun(); return 0; } /* this function draws new velocities for a fraction of the particles * and checks whether they would improve the fit, assuming the other * particles are not changed. */ void optimize_some_particles(void) { static int first = 1, n = 0, n_per_cpu; int i; if(first) { first = 0; n_per_cpu = 1 + (All.FractionToOptimizeIndependendly * All.TotNumPart) / NTask; mpi_printf("\nParticles treated independently per CPU: %d (effective value of FractionToOptimizeIndependendly is %g)\n\n", n_per_cpu, ((double) n_per_cpu * NTask) / All.TotNumPart); if(n_per_cpu > NumPart) terminate("n_per_cpu > NumPart"); } for(i = 0; i < n_per_cpu; i++, n++) { if(n >= NumPart) n = 0; optimize(permutation[n].index); } } /* this function calculates all the local "response fields" of the system, consisting of the * 1) time-averaged mass density of the orbits (DG_MassLoc), where "DG" stands for density grid. * 2) the energy response in the "radial" component (EG_EgyResponseRLoc), where "EG" stands for energy grid. 
* 3) similarly, energy response in the "tangential" component (EG_EgyResponseTLoc) * 4) similarly, energy response in the azimuthal component, relative to the streaming velocity (vphi-vstream)^2, (EG_EgyResponsePLoc) * 5) similarly, the energy response in azimuthal direction, vphi^2 (EG_EgyResponseQLoc) * 6) Finally, also the mass density is accumulated in the energy grid (EG_MassLoc), so that one can obtain the * velocity dispersion by dividing the energy response fields by this. */ void calc_all_response_fields(void) { int i, n, type; /* loop over the possible particle types */ for(type = 1; type <= 3; type++) { if(type == 1 && All.Halo_N == 0) continue; if(type == 2 && All.Disk_N == 0) continue; if(type == 3 && All.Bulge_N == 0) continue; for(i = 0; i < DG_Ngrid; i++) DG_MassLoc[type][i] = 0; for(i = 0; i < EG_Ngrid; i++) { EG_MassLoc[type][i] = 0; EG_EgyResponseRLoc[type][i] = 0; EG_EgyResponseTLoc[type][i] = 0; EG_EgyResponsePLoc[type][i] = 0; EG_EgyResponseQLoc[type][i] = 0; } } double *massOrbit = mymalloc("massOrbit", sizeof(double) * DG_Ngrid); for(n = 0; n < NumPart; n++) { type = P[n].Type; produce_orbit_response_field(P[n].Pos, P[n].Vel, P[n].ID, massOrbit, P[n].Mass, P[n].Tint, &P[n].Orbits); for(i = 0; i < DG_Ngrid; i++) DG_MassLoc[P[n].Type][i] += massOrbit[i]; calc_disp_components_for_particle(n, P[n].Vel, &P[n].vr2, &P[n].vt2, &P[n].vp2, &P[n].vq2); add_to_energy_grid(P[n].Pos, P[n].Mass, P[n].vr2, P[n].vt2, P[n].vp2, P[n].vq2, EG_MassLoc[type], EG_EgyResponseRLoc[type], EG_EgyResponseTLoc[type], EG_EgyResponsePLoc[type], EG_EgyResponseQLoc[type]); } myfree(massOrbit); calc_global_fit(); } /* this function evaluates the global quality of the fit of the model, taking into account the * orbit response and the initial velocity dispersions of the particles. 
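 *
 * The combined figure of merit per particle type, as reported by log_message(),
 * is (a sketch using the globals defined below):
 *
 *     double Sall = S[type]
 *                 + (Sdisp_r[type] + Sdisp_t[type] + Sdisp_p[type] + Sdisp_q[type])
 *                   / Srelfac_count[type];
 *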
*/ void calc_global_fit(void) { int type; static int firstrun[6] = {0, 0, 0, 0, 0, 0}; for(type = 1; type <= 3; type++) { /* now sum across all CPUs */ MPI_Allreduce(DG_MassLoc[type], &DGs_MassResponse[type][STACKOFFSET(DG_MaxLevel, 0, 0)], DG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(EG_MassLoc[type], &EGs_MassResponse[type][STACKOFFSET(EG_MaxLevel, 0, 0)], EG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(EG_EgyResponseRLoc[type], &EGs_EgyResponse_r[type][STACKOFFSET(EG_MaxLevel, 0, 0)], EG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(EG_EgyResponseTLoc[type], &EGs_EgyResponse_t[type][STACKOFFSET(EG_MaxLevel, 0, 0)], EG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(EG_EgyResponsePLoc[type], &EGs_EgyResponse_p[type][STACKOFFSET(EG_MaxLevel, 0, 0)], EG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(EG_EgyResponseQLoc[type], &EGs_EgyResponse_q[type][STACKOFFSET(EG_MaxLevel, 0, 0)], EG_Ngrid, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); /* hierarchically smooth the results */ smooth_stack(DGs_MassResponse[type], DG_MaxLevel); smooth_stack(EGs_MassResponse[type], EG_MaxLevel); smooth_stack(EGs_EgyResponse_r[type], EG_MaxLevel); smooth_stack(EGs_EgyResponse_t[type], EG_MaxLevel); smooth_stack(EGs_EgyResponse_p[type], EG_MaxLevel); smooth_stack(EGs_EgyResponse_q[type], EG_MaxLevel); /* calculate the goodness of fit of the density response */ S[type] = calc_stack_difference(DGs_MassResponse[type], DGs_MassTarget[type], 0, 0, 0, DG_MaxLevel, DGs_MassTarget[type], NULL, All.MinParticlesPerBinForDensityMeasurement * MType[type] / NType[type], DGs_Distance, 0); S[type] *= 1.0 / DGs_MassTarget[type][0]; /* now evalute the difference between the fields for the velocity dispersions */ Sdisp_r[type] = calc_stack_difference(EGs_EgyResponse_r[type], EGs_EgyTarget_r[type], 0, 0, 0, EG_MaxLevel, EGs_MassResponse[type], EGs_MassTarget[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type], NULL, 1); Sdisp_t[type] = calc_stack_difference(EGs_EgyResponse_t[type], EGs_EgyTarget_t[type], 0, 0, 0, EG_MaxLevel, EGs_MassResponse[type], EGs_MassTarget[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type], NULL, 1); Sdisp_p[type] = calc_stack_difference(EGs_EgyResponse_p[type], EGs_EgyTarget_p[type], 0, 0, 0, EG_MaxLevel, EGs_MassResponse[type], EGs_MassTarget[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type], NULL, 1); Sdisp_q[type] = calc_stack_difference(EGs_EgyResponse_q[type], EGs_EgyTarget_q[type], 0, 0, 0, EG_MaxLevel, EGs_MassResponse[type], EGs_MassTarget[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type], NULL, 1); int typeOfVelocityStructure; if(type == 1) /* a halo particle */ typeOfVelocityStructure = All.TypeOfHaloVelocityStructure; else if(type == 2) /* disk */ typeOfVelocityStructure = All.TypeOfDiskVelocityStructure; else if(type == 3) /* bulge */ typeOfVelocityStructure = All.TypeOfBulgeVelocityStructure; else terminate("unknown type"); /* depending on the type of velocity structure model that is selected, we either average 3 or 4 velocity dispersion constraints, * and weight this then roughly equally as the density response. 
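 *
 * Equivalently, as a one-line sketch of the branch that follows (types 0 and 1
 * drop the vphi^2 constraint, so Sdisp_q is zeroed for them):
 *
 *     Srelfac_count[type] = (typeOfVelocityStructure <= 1) ? 3.0 : 4.0;
 *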
*/ if(typeOfVelocityStructure == 0 || typeOfVelocityStructure == 1) { Srelfac_count[type] = 3.0; Sdisp_q[type] = 0; } else Srelfac_count[type] = 4.0; /* the relative weighting factor is determined on the first iteration such that density response and velocity structure * constraint are weighted equally. The relative weighting factor is then kept fixed as the optimization progresses. */ if(firstrun[type] == 0) { Srelfac[type] = S[type] / ((Sdisp_r[type] + Sdisp_t[type] + Sdisp_p[type] + Sdisp_q[type])/ Srelfac_count[type]); firstrun[type] = 1; } Sdisp_r[type] *= Srelfac[type]; Sdisp_t[type] *= Srelfac[type]; Sdisp_p[type] *= Srelfac[type]; Sdisp_q[type] *= Srelfac[type]; } } /* this function determines a new velocity guess for particle n, and checks whether adopting this would * improve the overall fit. */ void optimize(int n) { int i, k, orbits, orbits_try; double vdir_new; double vbase[3], dir[3], sigma; double dir_rnd = gsl_rng_uniform(random_generator); double F_base, F_try; int typeOfVelocityStructure, type = P[n].Type; if(type == 1) /* a halo particle */ typeOfVelocityStructure = All.TypeOfHaloVelocityStructure; else if(type == 2) /* disk */ typeOfVelocityStructure = All.TypeOfDiskVelocityStructure; else if(type == 3) /* bulge */ typeOfVelocityStructure = All.TypeOfBulgeVelocityStructure; else terminate("unknown type"); double phi = atan2(P[n].Pos[1], P[n].Pos[0]); double vstr = 0; if(typeOfVelocityStructure == 2) { if(dir_rnd < 0.333333) { /* phi - direction */ dir[0] = -sin(phi); dir[1] = cos(phi); dir[2] = 0; sigma = sqrt(P[n].vp2_target); vstr = get_vstream(P[n].Pos, P[n].Type); } else if(dir_rnd < 0.666666) { /* R - direction */ dir[0] = cos(phi); dir[1] = sin(phi); dir[2] = 0; sigma = sqrt(P[n].vr2_target); } else { /* z - direction */ dir[0] = 0; dir[1] = 0; dir[2] = 1; sigma= sqrt(P[n].vt2_target); } } else { double phi = atan2(P[n].Pos[1], P[n].Pos[0]); double theta = acos(P[n].Pos[2] / sqrt(P[n].Pos[0] * P[n].Pos[0] + P[n].Pos[1] * P[n].Pos[1] + P[n].Pos[2] * P[n].Pos[2])); if(dir_rnd < 0.333333) { /* phi-direction */ dir[0] = -sin(phi); dir[1] = cos(phi); dir[2] = 0; sigma = sqrt(P[n].vp2_target); vstr = get_vstream(P[n].Pos, P[n].Type); } else if(dir_rnd < 0.666666) { /* radial r-direction */ dir[0] = sin(theta) * cos(phi); dir[1] = sin(theta) * sin(phi); dir[2] = cos(theta); sigma = sqrt(P[n].vr2_target); } else { /* theta-direction */ dir[0] = -cos(theta) * cos(phi); dir[1] = -cos(theta) * sin(phi); dir[2] = sin(theta); sigma= sqrt(P[n].vt2_target); } } double vdir = P[n].Vel[0] * dir[0] + P[n].Vel[1] * dir[1] + P[n].Vel[2] * dir[2]; /* We only change one of the velocity components, in direction dir[]. * The rest of the velocity vector is computed in vbase[], and will not be changed. */ vbase[0] = P[n].Vel[0] - vdir * dir[0]; vbase[1] = P[n].Vel[1] - vdir * dir[1]; vbase[2] = P[n].Vel[2] - vdir * dir[2]; double vbase2 = (vbase[0] * vbase[0] + vbase[1] * vbase[1] + vbase[2] * vbase[2]); int iter =0; do { vdir_new = vstr + gsl_ran_gaussian(random_generator, sigma); iter++; if(iter > 100000) terminate("iter > 100000"); } while( sqrt(vdir_new * vdir_new + vbase2) >= All.MaxVelInUnitsVesc * P[n].Vesc); /* in order to avoid that directional biases through binning or grid alignment effects can * lead to subtle biases, we always evaluate an orbit twice, where the relevant velocity component * is changed in sign. Vel[] and vel2[] correspond to the sign-changed pair of the original velocity, * while vel_try[] and vel_try2[] are the new velocities. 
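 *
 * In symbols (a sketch of the assignments that follow), with vbase[] the part of
 * the velocity that is kept fixed and dir[] the unit vector of the resampled
 * component:
 *
 *     vel_try[k]  = vbase[k] + vdir_new * dir[k];   // proposed velocity
 *     vel_try2[k] = vbase[k] - vdir_new * dir[k];   // sign-reversed partner
 *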
*/ double vel_try[3], vel_try2[3]; for(k = 0; k < 3; k++) { vel_try[k] = vbase[k] + vdir_new * dir[k]; vel_try2[k] = vbase[k] - vdir_new * dir[k]; } /* velocity with reversed component */ double vel2[3]; vel2[0] = vbase[0] - vdir * dir[0]; vel2[1] = vbase[1] - vdir * dir[1]; vel2[2] = vbase[2] - vdir * dir[2]; double *massOrbit_old = mymalloc("massOrbit_old", sizeof(double) * DG_Ngrid); double *massOrbit_new = mymalloc("massOrbit_new", sizeof(double) * DG_Ngrid); produce_orbit_response_field(P[n].Pos, P[n].Vel, P[n].ID, massOrbit_old, P[n].Mass, P[n].Tint, &P[n].Orbits); produce_orbit_response_field(P[n].Pos, vel_try, P[n].ID, massOrbit_new, P[n].Mass, P[n].Tint, &orbits_try); if(vstr > 0) { F_base = eval_fit(n, P[n].Vel, massOrbit_old, massOrbit_old); F_try = eval_fit(n, vel_try, massOrbit_new, massOrbit_old); } else { double *massOrbit_old2 = mymalloc("massOrbit_old2", sizeof(double) * DG_Ngrid); double *massOrbit_new2 = mymalloc("massOrbit_new2", sizeof(double) * DG_Ngrid); produce_orbit_response_field(P[n].Pos, vel2, P[n].ID, massOrbit_old2, P[n].Mass, P[n].Tint, &orbits); produce_orbit_response_field(P[n].Pos, vel_try2, P[n].ID, massOrbit_new2, P[n].Mass, P[n].Tint, &orbits); F_base = eval_fit(n, P[n].Vel, massOrbit_old, massOrbit_old) + eval_fit(n, vel2, massOrbit_old2, massOrbit_old); F_try = eval_fit(n, vel_try, massOrbit_new, massOrbit_old) + eval_fit(n, vel_try2, massOrbit_new2, massOrbit_old); myfree(massOrbit_new2); myfree(massOrbit_old2); } Tries[type]++; if(F_try < F_base) { for(k = 0; k < 3; k++) P[n].Vel[k] = vel_try[k]; P[n].Orbits = orbits_try; for(i = 0; i < DG_Ngrid; i++) DG_MassLoc_delta[type][i] += massOrbit_new[i] - massOrbit_old[i]; double vr2, vt2, vp2, vq2; calc_disp_components_for_particle(n, vel_try, &vr2, &vt2, &vp2, &vq2); double vr2_diff = (vr2 - P[n].vr2); double vt2_diff = (vt2 - P[n].vt2); double vp2_diff = (vp2 - P[n].vp2); double vq2_diff = (vq2 - P[n].vq2); add_to_energy_grid(P[n].Pos, P[n].Mass, vr2_diff, vt2_diff, vp2_diff, vq2_diff, NULL, EG_EgyResponseRLoc_delta[type], EG_EgyResponseTLoc_delta[type], EG_EgyResponsePLoc_delta[type], EG_EgyResponseQLoc_delta[type]); P[n].vr2 = vr2; P[n].vt2 = vt2; P[n].vp2 = vp2; P[n].vq2 = vq2; Changes[type]++; } myfree(massOrbit_new); myfree(massOrbit_old); Noptimized++; } /* this function updated the local response fields with the calculated * changes due to adopting modified velocities for some of the particles. 
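 *
 * Sketch of the cycle that drives these updates from main() above:
 *
 *     init_updates();              // clear the *_delta accumulation fields
 *     optimize_some_particles();   // propose new velocities, record deltas
 *     commit_updates();            // fold the deltas into the response fields
 *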
*/ void commit_updates(void) { int i, type; for(type = 1; type <= 3; type++) { if(type == 1 && All.Halo_N == 0) continue; if(type == 2 && All.Disk_N == 0) continue; if(type == 3 && All.Bulge_N == 0) continue; for(i = 0; i < DG_Ngrid; i++) DG_MassLoc[type][i] += DG_MassLoc_delta[type][i]; for(i = 0; i < EG_Ngrid; i++) { EG_EgyResponseRLoc[type][i] += EG_EgyResponseRLoc_delta[type][i]; EG_EgyResponseTLoc[type][i] += EG_EgyResponseTLoc_delta[type][i]; EG_EgyResponsePLoc[type][i] += EG_EgyResponsePLoc_delta[type][i]; EG_EgyResponseQLoc[type][i] += EG_EgyResponseQLoc_delta[type][i]; } } calc_global_fit(); } /* clear the update fields */ void init_updates(void) { int i, type; for(type = 1; type <= 3; type++) { if(type == 1 && All.Halo_N == 0) continue; if(type == 2 && All.Disk_N == 0) continue; if(type == 3 && All.Bulge_N == 0) continue; for(i = 0; i < DG_Ngrid; i++) DG_MassLoc_delta[type][i] = 0; for(i = 0; i < EG_Ngrid; i++) { EG_EgyResponseRLoc_delta[type][i] = 0; EG_EgyResponseTLoc_delta[type][i] = 0; EG_EgyResponsePLoc_delta[type][i] = 0; EG_EgyResponseQLoc_delta[type][i] = 0; } } } /* add a particle with the given squared velocity components to the energy grids. */ void add_to_energy_grid(double *pos, double mass, double vr2, double vt2, double vp2, double vq2, double *egyMass, double *egyResponse_r, double *egyResponse_t, double *egyResponse_p, double *egyResponse_q) { int iR, iz; double fR, fz; energygrid_get_cell(pos, &iR, &iz, &fR, &fz); vr2 *= mass; vt2 *= mass; vp2 *= mass; vq2 *= mass; if(egyMass) { egyMass[OFFSET(EG_MaxLevel, iz, iR)] += (1 - fR) * (1 - fz) * mass; egyMass[OFFSET(EG_MaxLevel, iz, iR + 1)] += (fR) * (1 - fz) * mass; egyMass[OFFSET(EG_MaxLevel, iz + 1, iR)] += (1 - fR) * (fz) * mass; egyMass[OFFSET(EG_MaxLevel, iz + 1, iR + 1)] += (fR) * (fz) * mass; } egyResponse_r[OFFSET(EG_MaxLevel, iz, iR)] += (1 - fR) * (1 - fz) * vr2; egyResponse_r[OFFSET(EG_MaxLevel, iz, iR + 1)] += (fR) * (1 - fz) * vr2; egyResponse_r[OFFSET(EG_MaxLevel, iz + 1, iR)] += (1 - fR) * (fz) * vr2; egyResponse_r[OFFSET(EG_MaxLevel, iz + 1, iR + 1)] += (fR) * (fz) * vr2; egyResponse_t[OFFSET(EG_MaxLevel, iz, iR)] += (1 - fR) * (1 - fz) * vt2; egyResponse_t[OFFSET(EG_MaxLevel, iz, iR + 1)] += (fR) * (1 - fz) * vt2; egyResponse_t[OFFSET(EG_MaxLevel, iz + 1, iR)] += (1 - fR) * (fz) * vt2; egyResponse_t[OFFSET(EG_MaxLevel, iz + 1, iR + 1)] += (fR) * (fz) * vt2; egyResponse_p[OFFSET(EG_MaxLevel, iz, iR)] += (1 - fR) * (1 - fz) * vp2; egyResponse_p[OFFSET(EG_MaxLevel, iz, iR + 1)] += (fR) * (1 - fz) * vp2; egyResponse_p[OFFSET(EG_MaxLevel, iz + 1, iR)] += (1 - fR) * (fz) * vp2; egyResponse_p[OFFSET(EG_MaxLevel, iz + 1, iR + 1)] += (fR) * (fz) * vp2; egyResponse_q[OFFSET(EG_MaxLevel, iz, iR)] += (1 - fR) * (1 - fz) * vq2; egyResponse_q[OFFSET(EG_MaxLevel, iz, iR + 1)] += (fR) * (1 - fz) * vq2; egyResponse_q[OFFSET(EG_MaxLevel, iz + 1, iR)] += (1 - fR) * (fz) * vq2; egyResponse_q[OFFSET(EG_MaxLevel, iz + 1, iR + 1)] += (fR) * (fz) * vq2; } /* calc the global quality of the fit if for particle n the velocity vel[] is adopted. 
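 *
 * Sketch of the accept/reject test in optimize() that relies on this function
 * (simplified; when there is no streaming velocity the sign-reversed partner
 * orbit is evaluated as well):
 *
 *     F_base = eval_fit(n, P[n].Vel, massOrbit_old, massOrbit_old);
 *     F_try  = eval_fit(n, vel_try,  massOrbit_new, massOrbit_old);
 *     if(F_try < F_base)
 *       P[n].Vel[k] = vel_try[k];   // for k=0..2, plus the delta bookkeeping done in optimize()
 *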
*/ double eval_fit(int n, double vel[3], double *massOrbit_new, double *massOrbit_old) { int i, type; double value = 0, value_r = 0, value_t = 0, value_p = 0, value_q = 0; type = P[n].Type; double *updated_DGs_MassResponse = mymalloc("updated_DGs_MassResponse", DG_Nstack * sizeof(double)); for(i = 0; i < DG_Ngrid; i++) updated_DGs_MassResponse[STACKOFFSET(DG_MaxLevel, 0, 0) + i] = DGs_MassResponse[type][STACKOFFSET(DG_MaxLevel, 0, 0) + i] + massOrbit_new[i] - massOrbit_old[i]; smooth_stack(updated_DGs_MassResponse, DG_MaxLevel); value = calc_stack_difference(updated_DGs_MassResponse, DGs_MassTarget[type], 0, 0, 0, DG_MaxLevel, DGs_MassTarget[type], NULL, All.MinParticlesPerBinForDensityMeasurement * MType[type] / NType[type], DGs_Distance, 0); value *= 1.0 / MType[type]; myfree(updated_DGs_MassResponse); double mass = P[n].Mass, vr2, vt2, vp2, vq2; calc_disp_components_for_particle(n, vel, &vr2, &vt2, &vp2, &vq2); double vr2_diff = (vr2 - P[n].vr2); double vt2_diff = (vt2 - P[n].vt2); double vp2_diff = (vp2 - P[n].vp2); double vq2_diff = (vq2 - P[n].vq2); double *updated_EGs_EgyResponse_r = mymalloc("updated_EGs_EgyResponse_r", EG_Nstack * sizeof(double)); double *updated_EGs_EgyResponse_t = mymalloc("updated_EGs_EgyResponse_r", EG_Nstack * sizeof(double)); double *updated_EGs_EgyResponse_p = mymalloc("updated_EGs_EgyResponse_p", EG_Nstack * sizeof(double)); double *updated_EGs_EgyResponse_q = mymalloc("updated_EGs_EgyResponse_q", EG_Nstack * sizeof(double)); int off = STACKOFFSET(EG_MaxLevel, 0, 0); for(i = 0; i < EG_Ngrid; i++) { updated_EGs_EgyResponse_r[off + i] = EGs_EgyResponse_r[type][off + i]; updated_EGs_EgyResponse_t[off + i] = EGs_EgyResponse_t[type][off + i]; updated_EGs_EgyResponse_p[off + i] = EGs_EgyResponse_p[type][off + i]; updated_EGs_EgyResponse_q[off + i] = EGs_EgyResponse_q[type][off + i]; } add_to_energy_grid(P[n].Pos, mass, vr2_diff, vt2_diff, vp2_diff, vq2_diff, NULL, updated_EGs_EgyResponse_r + off, updated_EGs_EgyResponse_t + off, updated_EGs_EgyResponse_p + off, updated_EGs_EgyResponse_q + off); smooth_stack(updated_EGs_EgyResponse_r, EG_MaxLevel); smooth_stack(updated_EGs_EgyResponse_t, EG_MaxLevel); smooth_stack(updated_EGs_EgyResponse_p, EG_MaxLevel); smooth_stack(updated_EGs_EgyResponse_q, EG_MaxLevel); value_r = calc_stack_difference(updated_EGs_EgyResponse_r, EGs_EgyTarget_r[type], 0, 0, 0, EG_MaxLevel, EGs_MassResponse[type], EGs_MassTarget[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type], NULL, 1); value_t = calc_stack_difference(updated_EGs_EgyResponse_t, EGs_EgyTarget_t[type], 0, 0, 0, EG_MaxLevel, EGs_MassResponse[type], EGs_MassTarget[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type], NULL, 1); value_p = calc_stack_difference(updated_EGs_EgyResponse_p, EGs_EgyTarget_p[type], 0, 0, 0, EG_MaxLevel, EGs_MassResponse[type], EGs_MassTarget[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type], NULL, 1); value_q = calc_stack_difference(updated_EGs_EgyResponse_q, EGs_EgyTarget_q[type], 0, 0, 0, EG_MaxLevel, EGs_MassResponse[type], EGs_MassTarget[type], All.MinParticlesPerBinForDispersionMeasurement * MType[type] / NType[type], NULL, 1); myfree(updated_EGs_EgyResponse_q); myfree(updated_EGs_EgyResponse_p); myfree(updated_EGs_EgyResponse_t); myfree(updated_EGs_EgyResponse_r); int typeOfVelocityStructure; if(type == 1) /* a halo particle */ typeOfVelocityStructure = All.TypeOfHaloVelocityStructure; else if(type == 2) /* disk */ typeOfVelocityStructure = 
All.TypeOfDiskVelocityStructure; else if(type == 3) /* bulge */ typeOfVelocityStructure = All.TypeOfBulgeVelocityStructure; else terminate("unknown type"); if(typeOfVelocityStructure == 0 || typeOfVelocityStructure == 1) value_q = 0; return value + Srelfac[type] * (value_r + value_t + value_p + value_q) / Srelfac_count[type]; } /* ouput some log-info about how the iterations are progressing. */ void log_message(int iter) { int n, k, type; long long totNoptimized, tottries, totchanges; sumup_large_ints(1, &Noptimized, &totNoptimized); for(type = 1; type <= 3; type++) { sumup_large_ints(1, &Tries[type], &tottries); sumup_large_ints(1, &Changes[type], &totchanges); Tries[type] = 0; Changes[type] = 0; double vavg = 0, vavgtot, norbits = 0; int count = 0, counttot; for(n = 0; n < NumPart; n++) { if(P[n].Type == type) { for(k = 0; k < 3; k++) vavg += P[n].Vel[k] * P[n].Vel[k]; count++; norbits += P[n].Orbits; } } MPI_Allreduce(&vavg, &vavgtot, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(&count, &counttot, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(&norbits, &Totorbits[type], 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); if(counttot > 0) { vavg = sqrt(vavgtot / counttot); mpi_printf ("iter=%5d: opt-frac=%10.7g type=%d vavg=%10.7g (average number of orbits %7.3g) S=%10.7g [Sdisp_r=%10.7g | Sdisp_t=%10.7g | Sdisp_p=%10.7g | Sdisp_q=%10.7g] Sall=%10.7g Chg-Frac=%10.7g\n", iter, ((double)totNoptimized) / All.TotNumPart, type, vavg, (double) Totorbits[type] / counttot, S[type], Sdisp_r[type], Sdisp_t[type], Sdisp_p[type], Sdisp_q[type], S[type] + (Sdisp_r[type] + Sdisp_t[type] + Sdisp_p[type] + Sdisp_q[type]) / Srelfac_count[type], ((double)totchanges) / (tottries+1.0e-60)); if(ThisTask == 0) { fprintf(FdFit[type], "%10.7g %10.7g %10.7g %10.7g %10.7g %10.7g %10.7g %10.7g %10.7g\n", ((double)totNoptimized) / All.TotNumPart, S[type], vavg, Sdisp_r[type], Sdisp_t[type], Sdisp_p[type], Sdisp_q[type], S[type] + (Sdisp_r[type] + Sdisp_t[type] + Sdisp_p[type] + Sdisp_q[type]) / Srelfac_count[type], ((double)totchanges) / (tottries+1.0e-60)); fflush(FdFit[type]); } } } } GalIC/src/mymalloc.c000644 000765 000024 00000052041 12373713530 015220 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "allvars.h" #include "proto.h" #define MAXBLOCKS 5000 #define MAXCHARS 19 /** \file mymalloc.c * \brief Manager for dynamic memory allocation * * This module handles the dynamic memory allocation. * To avoid memory allocation/dellocation overhead a big chunk of memory * (which will be the maximum amount of dinamically allocatable memory) * is allocated upon initialization. This chunk is then filled by the memory * blocks as in a stack structure. The blocks are automatically aligned to a 64 bit boundary. * Memory blocks come in two flavours: movable and non-movable. In non-movable * blocks the starting address is fixed once the block is allocated and cannot be changed. * Due to the stack structure of the dynamic memory, this implies that the last (non-movable) * block allocated must be the first block to be deallocated. If this condition is not met, * an abort condition is triggered. 
If more flexibility is needed, movable memory blocks can * be used. In this case, the starting address of the block is again fixed upon allocation * but the block can be shifted (therefore its initial address changes) according to needs. * For a movable block to be successfully shifted it is required that all the subsequent allocated * blocks are movable. Again, an abort condition is triggered if this condition is not met. * Movable blocks can be deallocated in any order provided that the condition just described holds. * The gap resulting form the deallocation of a block that is not in * the last position will be automatically filled by shifting all the blocks coming after the * deallocated block. */ static size_t TotBytes; /**< The total dimension (in bytes) of dynamic memory available to the current task. */ static void *Base; /**< Base pointer (initial memory address) of the stack. */ static unsigned long Nblocks; /**< The current number of allocated memory blocks. */ static void **Table; /**< Table containing the initial addresses of the allocated memory blocks.*/ static size_t *BlockSize; /**< Array containing the size (in bytes) of all the allocated memory blocks. */ static char *MovableFlag; /**< Identifies whether a block is movable. */ static void ***BasePointers; /**< Base pointers containing the initial addresses of movable memory blocks */ static size_t GlobHighMarkBytes = 0; /**< The maximum number of bytes allocated by all tasks. */ static char *VarName; /**< The name of the variable with which the block has been allocated. */ static char *FunctionName; /**< The function name that has allocated the memory block. */ static char *FileName; /**< The file name where the function that has allocated the block is called. */ static int *LineNumber; /**< The line number in FileName where the function that allocated the block has been called. */ /** \brief Initialize memory manager. * * This function initializes the memory manager. In particular, it sets * the global variables of the module to their initial value and allocates * the memory for the stack. */ void mymalloc_init(void) { size_t n; BlockSize = (size_t *) malloc(MAXBLOCKS * sizeof(size_t)); Table = (void **) malloc(MAXBLOCKS * sizeof(void *)); MovableFlag = (char *) malloc(MAXBLOCKS * sizeof(char)); BasePointers = (void ***) malloc(MAXBLOCKS * sizeof(void **)); VarName = (char *) malloc(MAXBLOCKS * MAXCHARS * sizeof(char)); FunctionName = (char *) malloc(MAXBLOCKS * MAXCHARS * sizeof(char)); FileName = (char *) malloc(MAXBLOCKS * MAXCHARS * sizeof(char)); LineNumber = (int *) malloc(MAXBLOCKS * sizeof(int)); memset(VarName, 0, MAXBLOCKS * MAXCHARS); memset(FunctionName, 0, MAXBLOCKS * MAXCHARS); memset(FileName, 0, MAXBLOCKS * MAXCHARS); n = All.MaxMemSize * ((size_t) 1024 * 1024); if(n & 63) terminate("want 64 byte aligned address"); if(!(Base = malloc(n))) { printf("Failed to allocate memory for `Base' (%d Mbytes).\n", All.MaxMemSize); terminate("failure to allocate memory"); } TotBytes = FreeBytes = n; AllocatedBytes = 0; Nblocks = 0; HighMarkBytes = 0; } /** \brief Output memory usage for the task with the greatest amount of memory allocated. * * \param OldHighMarkBytes old value of the maximum number of bytes allocated. If the current maximum amount of bytes is less than this value no output is done * \param label contains the neme of the code module which requested the memory report (e.g. RUN, ...) 
* \param func name of function that has requested the memory usage report (usually given by the __FUNCTION__ macro) * \param file file where the function that has requested the memory usage report resides (usually given by the __FILE__ macro) * \param line line number of file where the function that has requested the memory usage was called (usually given by the __LINE__ macro) */ void report_detailed_memory_usage_of_largest_task(size_t * OldHighMarkBytes, const char *label, const char *func, const char *file, int line) { size_t *sizelist, maxsize, minsize; double avgsize; int i, task; sizelist = (size_t *) mymalloc("sizelist", NTask * sizeof(size_t)); MPI_Allgather(&AllocatedBytes, sizeof(size_t), MPI_BYTE, sizelist, sizeof(size_t), MPI_BYTE, MPI_COMM_WORLD); for(i = 1, task = 0, maxsize = minsize = sizelist[0], avgsize = sizelist[0]; i < NTask; i++) { if(sizelist[i] > maxsize) { maxsize = sizelist[i]; task = i; } if(sizelist[i] < minsize) { minsize = sizelist[i]; } avgsize += sizelist[i]; } myfree(sizelist); if(maxsize > GlobHighMarkBytes) GlobHighMarkBytes = maxsize; if(maxsize > 1.1 * (*OldHighMarkBytes)) { *OldHighMarkBytes = maxsize; avgsize /= NTask; if(ThisTask == task) { char *buf = mymalloc("buf", 200 * (Nblocks + 10)); int cc = 0; cc += sprintf (buf + cc, "\n\nStep=%d: At '%s', %s()/%s/%d: Largest Allocation = %g Mbyte (on task=%d), Smallest = %g Mbyte, Average = %g Mbyte (Past Largest: %g Mbyte)\n\n", All.NumCurrentTiStep, label, func, file, line, maxsize / (1024.0 * 1024.0), task, minsize / (1024.0 * 1024.0), avgsize / (1024.0 * 1024.0), GlobHighMarkBytes / (1024.0 * 1024.0)); cc += dump_memory_table_buffer(buf + cc); if(task == 0) { if(RestartFlag <= 2) { fprintf(FdMemory, "%s", buf); fflush(FdMemory); } } else { MPI_Send(&cc, 1, MPI_INT, 0, TAG_N, MPI_COMM_WORLD); MPI_Send(buf, cc + 1, MPI_BYTE, 0, TAG_PDATA, MPI_COMM_WORLD); } myfree(buf); } if(ThisTask == 0 && task > 0) { int cc; MPI_Recv(&cc, 1, MPI_INT, task, TAG_N, MPI_COMM_WORLD, MPI_STATUS_IGNORE); char *buf = mymalloc("buf", cc + 1); MPI_Recv(buf, cc + 1, MPI_BYTE, task, TAG_PDATA, MPI_COMM_WORLD, MPI_STATUS_IGNORE); if(RestartFlag <= 2) { fprintf(FdMemory, "%s", buf); fflush(FdMemory); } myfree(buf); } fflush(stdout); MPI_Barrier(MPI_COMM_WORLD); } } /** \brief Dump the buffer where the memory information is stored to the standard output. * */ void dump_memory_table(void) { char *buf = malloc(200 * (Nblocks + 10)); dump_memory_table_buffer(buf); printf("%s", buf); free(buf); } /** \brief Fill the output buffer with the memory log. 
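 *
 * Typical use, as in dump_memory_table() above (roughly 200 bytes per block is
 * the sizing convention used there):
 *
 *     char *buf = malloc(200 * (Nblocks + 10));
 *     int len = dump_memory_table_buffer(buf);
 *     printf("%s", buf);   // len characters were written to buf
 *     free(buf);
 *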
* * \param p output buffer * \return the number of charcter written to p */ int dump_memory_table_buffer(char *p) { int i, cc = 0; size_t totBlocksize = 0; cc += sprintf(p + cc, "-------------------------- Allocated Memory Blocks----------------------------------------\n"); cc += sprintf(p + cc, "Task Nr F Variable MBytes Cumulative Function/File/Linenumber\n"); cc += sprintf(p + cc, "------------------------------------------------------------------------------------------\n"); for(i = 0; i < Nblocks; i++) { totBlocksize += BlockSize[i]; if(strncmp(VarName + i * MAXCHARS, "yieldsSNIa", 10) == 0 || strncmp(VarName + i * MAXCHARS, "yieldsSNII", 10) == 0 || strncmp(VarName + i * MAXCHARS, "yieldsAGB", 9) == 0 || strncmp(VarName + i * MAXCHARS, "netCoolingRate", 14) == 0) { continue; } cc += sprintf(p + cc, "%4d %5d %d %19s %10.4f %10.4f %s()/%s/%d\n", ThisTask, i, MovableFlag[i], VarName + i * MAXCHARS, BlockSize[i] / (1024.0 * 1024.0), totBlocksize / (1024.0 * 1024.0), FunctionName + i * MAXCHARS, FileName + i * MAXCHARS, LineNumber[i]); } cc += sprintf(p + cc, "------------------------------------------------------------------------------------------\n"); return cc; } /** \brief Allocate a non-movable memory block and store the relative information. * * \param varname name of the variable to be stored in the allocated block * \param n size of the memory block in bytes * \param func name of function that has called the allocation routine (usually given by the __FUNCTION__ macro) * \param file file where the function that has called the allocation routine resides (usually given by the __FILE__ macro) * \param line line number of file where the allocation routine was called (usually given by the __LINE__ macro) * \return a pointer to the beginning of the allocated memory block */ void *mymalloc_fullinfo(const char *varname, size_t n, const char *func, const char *file, int line) { char msg[1000]; if((n % 64) > 0) n = (n / 64 + 1) * 64; if(n < 64) n = 64; if(Nblocks >= MAXBLOCKS) { sprintf(msg, "Task=%d: No blocks left in mymalloc_fullinfo() at %s()/%s/line %d. MAXBLOCKS=%d\n", ThisTask, func, file, line, MAXBLOCKS); terminate(msg); } if(n > FreeBytes) { dump_memory_table(); sprintf (msg, "\nTask=%d: Not enough memory in mymalloc_fullinfo() to allocate %g MB for variable '%s' at %s()/%s/line %d (FreeBytes=%g MB).\n", ThisTask, n / (1024.0 * 1024.0), varname, func, file, line, FreeBytes / (1024.0 * 1024.0)); terminate(msg); } Table[Nblocks] = Base + (TotBytes - FreeBytes); FreeBytes -= n; strncpy(VarName + Nblocks * MAXCHARS, varname, MAXCHARS - 1); strncpy(FunctionName + Nblocks * MAXCHARS, func, MAXCHARS - 1); strncpy(FileName + Nblocks * MAXCHARS, file, MAXCHARS - 1); LineNumber[Nblocks] = line; AllocatedBytes += n; BlockSize[Nblocks] = n; MovableFlag[Nblocks] = 0; Nblocks += 1; if(AllocatedBytes > HighMarkBytes) HighMarkBytes = AllocatedBytes; return Table[Nblocks - 1]; } /** \brief Allocate a movable memory block and store the relative information. 
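 *
 * Usage sketch (assuming the usual mymalloc_movable() wrapper macro that supplies
 * __FUNCTION__/__FILE__/__LINE__; the variable name is illustrative):
 *
 *     P = mymalloc_movable(&P, "P", NumPart * sizeof(*P));
 *     // passing &P lets the manager update P if the block is later shifted
 *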
* * \param ptr pointer to the initial memory address of the block * \param varname name of the variable to be stored in the allocated block * \param n size of the memory block in bytes * \param func name of function that has called the allocation routine (usually given by the __FUNCTION__ macro) * \param file file where the function that has called the allocation routine resides (usually given by the __FILE__ macro) * \param line line number of file where the allocation routine was called (usually given by the __LINE__ macro) * \return a pointer to the beginning of the allocated memory block */ void *mymalloc_movable_fullinfo(void *ptr, const char *varname, size_t n, const char *func, const char *file, int line) { char msg[1000]; if((n % 64) > 0) n = (n / 64 + 1) * 64; if(n < 64) n = 64; if(Nblocks >= MAXBLOCKS) { sprintf(msg, "Task=%d: No blocks left in mymalloc_fullinfo() at %s()/%s/line %d. MAXBLOCKS=%d\n", ThisTask, func, file, line, MAXBLOCKS); terminate(msg); } if(n > FreeBytes) { dump_memory_table(); sprintf (msg, "\nTask=%d: Not enough memory in mymalloc_fullinfo() to allocate %g MB for variable '%s' at %s()/%s/line %d (FreeBytes=%g MB).\n", ThisTask, n / (1024.0 * 1024.0), varname, func, file, line, FreeBytes / (1024.0 * 1024.0)); terminate(msg); } Table[Nblocks] = Base + (TotBytes - FreeBytes); FreeBytes -= n; strncpy(VarName + Nblocks * MAXCHARS, varname, MAXCHARS - 1); strncpy(FunctionName + Nblocks * MAXCHARS, func, MAXCHARS - 1); strncpy(FileName + Nblocks * MAXCHARS, file, MAXCHARS - 1); LineNumber[Nblocks] = line; AllocatedBytes += n; BlockSize[Nblocks] = n; MovableFlag[Nblocks] = 1; BasePointers[Nblocks] = ptr; Nblocks += 1; if(AllocatedBytes > HighMarkBytes) HighMarkBytes = AllocatedBytes; return Table[Nblocks - 1]; } /** \brief Deallocate a non-movable memory block. * * For this operation to be successful the block that has to be deallocated must be the last allocated one. * * \param p pointer to the memory block to be deallocated * \param func name of function that has called the deallocation routine (usually given by the __FUNCTION__ macro) * \param file file where the function that has called the deallocation routine resides (usually given by the __FILE__ macro) * \param line line number of file where the deallocation routine was called (usually given by the __LINE__ macro) */ void myfree_fullinfo(void *p, const char *func, const char *file, int line) { char msg[1000]; if(Nblocks == 0) { sprintf(msg, "no allocated blocks that could be freed"); terminate(msg); } if(p != Table[Nblocks - 1]) { dump_memory_table(); sprintf(msg, "Task=%d: Wrong call of myfree() at %s()/%s/line %d: not the last allocated block!\n", ThisTask, func, file, line); terminate(msg); } Nblocks -= 1; AllocatedBytes -= BlockSize[Nblocks]; FreeBytes += BlockSize[Nblocks]; } /** \brief Deallocate a movable memory block. * * For this operation to be successful all the blocks allocated after the block that has to be freed must be of movable type. 
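 *
 * Sketch of the out-of-order deallocation this permits (hypothetical movable
 * blocks a, b, c; mymalloc_movable()/myfree_movable() are the assumed wrapper macros):
 *
 *     a = mymalloc_movable(&a, "a", na);
 *     b = mymalloc_movable(&b, "b", nb);
 *     c = mymalloc_movable(&c, "c", nc);
 *     myfree_movable(b);   // c is memmove'd down and the pointer c is updated via its base pointer
 *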
* * \param p pointer to the memory block to be deallocated * \param func name of function that has called the deallocation routine (usually given by the __FUNCTION__ macro) * \param file file where the function that has called the deallocation routine resides (usually given by the __FILE__ macro) * \param line line number of file where the deallocation routine was called (usually given by the __LINE__ macro) */ void myfree_movable_fullinfo(void *p, const char *func, const char *file, int line) { int i; char msg[1000]; if(Nblocks == 0) { sprintf(msg, "no allocated blocks that could be freed"); terminate(msg); } /* first, let's find the block */ int nr; for(nr = Nblocks - 1; nr >= 0; nr--) if(p == Table[nr]) break; if(nr < 0) { dump_memory_table(); sprintf(msg, "Task=%d: Wrong call of myfree_movable() from %s()/%s/line %d - this block has not been allocated!\n", ThisTask, func, file, line); terminate(msg); } if(nr < Nblocks - 1) /* the block is not the last allocated block */ { /* check that all subsequent blocks are actually movable */ for(i = nr + 1; i < Nblocks; i++) if(MovableFlag[i] == 0) { dump_memory_table(); sprintf (msg, "Task=%d: Wrong call of myfree_movable() from %s()/%s/line %d - behind block=%d there are subsequent non-movable allocated blocks\n", ThisTask, func, file, line, nr); fflush(stdout); terminate(msg); } } AllocatedBytes -= BlockSize[nr]; FreeBytes += BlockSize[nr]; size_t offset = -BlockSize[nr]; size_t length = 0; for(i = nr + 1; i < Nblocks; i++) length += BlockSize[i]; if(nr < Nblocks - 1) memmove(Table[nr + 1] + offset, Table[nr + 1], length); for(i = nr + 1; i < Nblocks; i++) { Table[i] += offset; *BasePointers[i] = *BasePointers[i] + offset; } for(i = nr + 1; i < Nblocks; i++) { Table[i - 1] = Table[i]; BasePointers[i - 1] = BasePointers[i]; BlockSize[i - 1] = BlockSize[i]; MovableFlag[i - 1] = MovableFlag[i]; strncpy(VarName + (i - 1) * MAXCHARS, VarName + i * MAXCHARS, MAXCHARS - 1); strncpy(FunctionName + (i - 1) * MAXCHARS, FunctionName + i * MAXCHARS, MAXCHARS - 1); strncpy(FileName + (i - 1) * MAXCHARS, FileName + i * MAXCHARS, MAXCHARS - 1); LineNumber[i - 1] = LineNumber[i]; } Nblocks -= 1; } /** \brief Reallocate an existing non-movable memory block. * * For this operation to be successful this must be the last allocated block. * * \param p pointer to the existing memory block to be reallocated * \param n the new size of the memory block in bytes * \param func name of function that has called the reallocation routine (usually given by the __FUNCTION__ macro) * \param file file where the function that has called the reallocation routine resides (usually given by the __FILE__ macro) * \param line line number of file where the reallocation routine was called (usually given by the __LINE__ macro) * \return a pointer to the beginning of the newly allocated memory block */ void *myrealloc_fullinfo(void *p, size_t n, const char *func, const char *file, int line) { char msg[1000]; if((n % 64) > 0) n = (n / 64 + 1) * 64; if(n < 64) n = 64; if(Nblocks == 0) { sprintf(msg, "no allocated blocks that could be reallocated"); terminate(msg); } if(p != Table[Nblocks - 1]) { dump_memory_table(); sprintf(msg, "Task=%d: Wrong call of myrealloc() at %s()/%s/line %d - not the last allocated block!\n", ThisTask, func, file, line); terminate(msg); } AllocatedBytes -= BlockSize[Nblocks - 1]; FreeBytes += BlockSize[Nblocks - 1]; if(n > FreeBytes) { dump_memory_table(); sprintf (msg, "Task=%d: Not enough memory in myremalloc(n=%g MB) at %s()/%s/line %d. 
previous=%g FreeBytes=%g MB\n", ThisTask, n / (1024.0 * 1024.0), func, file, line, BlockSize[Nblocks - 1] / (1024.0 * 1024.0), FreeBytes / (1024.0 * 1024.0)); terminate(msg); } Table[Nblocks - 1] = Base + (TotBytes - FreeBytes); FreeBytes -= n; AllocatedBytes += n; BlockSize[Nblocks - 1] = n; if(AllocatedBytes > HighMarkBytes) HighMarkBytes = AllocatedBytes; return Table[Nblocks - 1]; } /** \brief Reallocate an existing movable memory block. * * For this operation to be successful all the blocks allocated after the block that has to be reallocated must be of movable type. * * \param p pointer to the existing memory block to be reallocated * \param n the new size of the memory block in bytes * \param func name of function that has called the reallocation routine (usually given by the __FUNCTION__ macro) * \param file file where the function that has called the reallocation routine resides (usually given by the __FILE__ macro) * \param line line number of file where the reallocation routine was called (usually given by the __LINE__ macro) * \return a pointer to the beginning of the newly allocated memory block */ void *myrealloc_movable_fullinfo(void *p, size_t n, const char *func, const char *file, int line) { int i; char msg[1000]; if((n % 64) > 0) n = (n / 64 + 1) * 64; if(n < 64) n = 64; if(Nblocks == 0) { sprintf(msg, "no allocated blocks that could be reallocated"); terminate(msg); } /* first, let's find the block */ int nr; for(nr = Nblocks - 1; nr >= 0; nr--) if(p == Table[nr]) break; if(nr < 0) { dump_memory_table(); sprintf(msg, "Task=%d: Wrong call of myrealloc_movable() from %s()/%s/line %d - this block has not been allocated!\n", ThisTask, func, file, line); terminate(msg); } if(nr < Nblocks - 1) /* the block is not the last allocated block */ { /* check that all subsequent blocks are actually movable */ for(i = nr + 1; i < Nblocks; i++) if(MovableFlag[i] == 0) { dump_memory_table(); sprintf (msg, "Task=%d: Wrong call of myrealloc_movable() from %s()/%s/line %d - behind block=%d there are subsequent non-movable allocated blocks\n", ThisTask, func, file, line, nr); terminate(msg)} } AllocatedBytes -= BlockSize[nr]; FreeBytes += BlockSize[nr]; if(n > FreeBytes) { dump_memory_table(); sprintf (msg, "Task=%d: at %s()/%s/line %d: Not enough memory in myremalloc_movable(n=%g MB). previous=%g FreeBytes=%g MB\n", ThisTask, func, file, line, n / (1024.0 * 1024.0), BlockSize[nr] / (1024.0 * 1024.0), FreeBytes / (1024.0 * 1024.0)); terminate(msg); } size_t offset = n - BlockSize[nr]; size_t length = 0; for(i = nr + 1; i < Nblocks; i++) length += BlockSize[i]; if(nr < Nblocks - 1) memmove(Table[nr + 1] + offset, Table[nr + 1], length); for(i = nr + 1; i < Nblocks; i++) { Table[i] += offset; *BasePointers[i] = *BasePointers[i] + offset; } FreeBytes -= n; AllocatedBytes += n; BlockSize[nr] = n; if(AllocatedBytes > HighMarkBytes) HighMarkBytes = AllocatedBytes; return Table[nr]; } GalIC/src/orbit_response.c000644 000765 000024 00000010432 12373713530 016436 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include #include #include "allvars.h" #include "proto.h" /* returns the timestep for the particle with the giving velocity and acceleration */ double get_timestep(double *pos, double *vel, double *acc, int icell) { // double r = sqrt(pos[0]*pos[0] + pos[1]*pos[1] + pos[2]*pos[2]); double v = sqrt(vel[0] * vel[0] + vel[1] * vel[1] + vel[2] * vel[2]); double aa = sqrt(acc[0] * acc[0] + acc[1] * acc[1] + acc[2] * acc[2]); double torbit = All.V200 / aa; double tcross = DG_CellSize[icell] / v; return dmin(All.TimeStepFactorOrbit * torbit, All.TimeStepFactorCellCross * tcross); } /* calculate the density response of a single particle starting from pos[]/vel[], * averaged over time 'timespan'. If timespan=0, the routine determines an * appropriate time itself. */ double produce_orbit_response_field(double *pos, double *vel, int id, double *mfield, double mass, double timespan, int *orbitstaken) { int i, norbit, icell, flag = 0, iR, iz; double x[3], v[3], a[3], dt, tall, radsign_previous = 0, radsign, fR, fz; for(i = 0; i < 3; i++) { x[i] = pos[i]; v[i] = vel[i]; } for(i = 0; i < DG_Ngrid; i++) mfield[i] = 0; norbit = 0; tall = 0; forcegrid_get_acceleration(x, a); densitygrid_get_cell(x, &iR, &iz, &fR, &fz); icell = iz * DG_Nbin + iR; int Norbits = 100000000; double E0 = 0.5 * (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]) + forcegrid_get_potential(x); int steps = 0; do { dt = get_timestep(x, v, a, icell); if(timespan > 0) if(dt + tall > timespan) { dt = timespan - tall; flag = 1; } mfield[iz * DG_Nbin + iR] += 0.5 * dt * (1 - fR) * (1 - fz); mfield[iz * DG_Nbin + (iR + 1)] += 0.5 * dt * (fR) * (1 - fz); mfield[(iz + 1) * DG_Nbin + iR] += 0.5 * dt * (1 - fR) * (fz); mfield[(iz + 1) * DG_Nbin + (iR + 1)] += 0.5 * dt * (fR) * (fz); for(i = 0; i < 3; i++) v[i] += 0.5 * dt * a[i]; for(i = 0; i < 3; i++) x[i] += dt * v[i]; forcegrid_get_acceleration(x, a); for(i = 0; i < 3; i++) v[i] += 0.5 * dt * a[i]; densitygrid_get_cell(x, &iR, &iz, &fR, &fz); icell = iz * DG_Nbin + iR; mfield[iz * DG_Nbin + iR] += 0.5 * dt * (1 - fR) * (1 - fz); mfield[iz * DG_Nbin + (iR + 1)] += 0.5 * dt * (fR) * (1 - fz); mfield[(iz + 1) * DG_Nbin + iR] += 0.5 * dt * (1 - fR) * (fz); mfield[(iz + 1) * DG_Nbin + (iR + 1)] += 0.5 * dt * (fR) * (fz); tall += dt; radsign = v[0] * x[0] + v[1] * x[1] + v[2] * x[2]; if(radsign > 0 && radsign_previous < 0) norbit++; radsign_previous = radsign; steps++; if(steps > 100000000) { printf("too many steps... 
pos=(%g|%g|%g) vel=(%g|%g|%g) dt=%g\n", pos[0], pos[1], pos[2], vel[0], vel[1], vel[2], dt); double E1 = 0.5 * (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]) + forcegrid_get_potential(x); printf("steps=%d: rel error = %g\n", steps, fabs(E1 - E0) / fabs(E0)); exit(1); } } while((timespan == 0 && norbit < Norbits) || (timespan != 0 && flag == 0)); double E1 = 0.5 * (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]) + forcegrid_get_potential(x); double rel_egy_error = fabs((E1 - E0) / E0); if(rel_egy_error > 0.5) { mpi_printf("relative energy error= %g orbits=%d steps=%d pos=(%g|%g|%g) vel=(%g|%g|%g)\n", rel_egy_error, norbit, steps, pos[0], pos[1], pos[2], vel[0], vel[1], vel[2]); /* terminate("error seems large, we better stop: pos=(%g|%g|%g) vel=(%g|%g|%g) id=%d v=%g vesc=%g", pos[0], pos[1], pos[2], vel[0], vel[1], vel[2], id, sqrt(vel[0] * vel[0] + vel[1] * vel[1] + vel[2] * vel[2]), forcegrid_get_escape_speed(pos)); */ } double fac = mass / tall; for(i = 0; i < DG_Ngrid; i++) mfield[i] *= fac; *orbitstaken = norbit; return tall; } GalIC/src/parallel_sort.c000644 000765 000024 00000053164 12373713530 016255 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include #include #include #include #include "allvars.h" #include "proto.h" #define TRANSFER_SIZE_LIMIT 1000000000 #define MAX_ITER_PARALLEL_SORT 500 /* Note: For gcc-4.1.2, the compiler produces incorrect code for this routine if optimization level O1 or higher is used. * In gcc-4.3.4, this problem is absent.
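 *
 * Overview of the routine: parallel_sort() first sorts the local data of every
 * MPI rank with a serial merge sort, then iteratively determines the
 * Local_NTask-1 global splitter elements by refining median-of-medians guesses
 * and counting their global ranks, redistributes the data with MPI_Alltoallv
 * according to these splitters, and finally re-sorts the received data locally.
 * Ties between equal keys are broken by an element's original global position
 * (the "tie_braking_rank").
 *
 * A minimal usage sketch; the comparator compare_ID is hypothetical and not
 * part of this file, while P and NumPart are the global particle array and the
 * local particle count used throughout the code:
 *
 *   int compare_ID(const void *a, const void *b)
 *   {
 *     MyIDType ida = ((struct particle_data *) a)->ID;
 *     MyIDType idb = ((struct particle_data *) b)->ID;
 *     return (ida > idb) - (ida < idb);
 *   }
 *
 *   parallel_sort(P, NumPart, sizeof(struct particle_data), compare_ID);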
*/ #define TAG_TRANSFER 100 /* #define CHECK_LOCAL_RANK */ static void serial_sort(char *base, size_t nmemb, size_t size, int (*compar) (const void *, const void *)); static void msort_serial_with_tmp(char *base, size_t n, size_t s, int (*compar) (const void *, const void *), char *t); static void get_local_rank(char *element, size_t tie_braking_rank, char *base, size_t nmemb, size_t size, size_t noffs_thistask, long long left, long long right, size_t * loc, int (*compar) (const void *, const void *)); #ifdef CHECK_LOCAL_RANK static void check_local_rank(char *element, size_t tie_braking_rank, char *base, size_t nmemb, size_t size, size_t noffs_thistask, long long left, long long right, size_t loc, int (*compar) (const void *, const void *)); #endif static int (*comparfunc) (const void *, const void *); static char *median_element_list; static size_t element_size; int parallel_sort_indirect_compare(const void *a, const void *b) { return (*comparfunc) (median_element_list + *((int *) a) * element_size, median_element_list + *((int *) b) * element_size); } double parallel_sort(void *base, size_t nmemb, size_t size, int (*compar) (const void *, const void *)) { return parallel_sort_comm(base, nmemb, size, compar, MPI_COMM_WORLD); } double parallel_sort_comm(void *base, size_t nmemb, size_t size, int (*compar) (const void *, const void *), MPI_Comm comm) { int i, j, ranks_not_found, Local_ThisTask, Local_NTask, Local_PTask, Color, new_max_loc; size_t tie_braking_rank, new_tie_braking_rank, rank; MPI_Comm MPI_CommLocal; double ta = second(); /* do a serial sort of the local data up front */ serial_sort((char *) base, nmemb, size, compar); /* we create a communicator that contains just those tasks with nmemb > 0. This makes * it easier to deal with CPUs that do not hold any data. 
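 * Each rank contributes color 1 if it holds local data and color 0 otherwise;
 * only the color-1 communicator takes part in the splitter search below, while
 * empty ranks skip it and just join the final MPI_Comm_free().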
*/ if(nmemb) Color = 1; else Color = 0; MPI_Comm_split(comm, Color, ThisTask, &MPI_CommLocal); MPI_Comm_rank(MPI_CommLocal, &Local_ThisTask); MPI_Comm_size(MPI_CommLocal, &Local_NTask); if(Local_NTask > 1 && Color == 1) { for(Local_PTask = 0; Local_NTask > (1 << Local_PTask); Local_PTask++); size_t *nlist = (size_t *) mymalloc("nlist", Local_NTask * sizeof(size_t)); size_t *noffs = (size_t *) mymalloc("noffs", Local_NTask * sizeof(size_t)); MPI_Allgather(&nmemb, sizeof(size_t), MPI_BYTE, nlist, sizeof(size_t), MPI_BYTE, MPI_CommLocal); for(i = 1, noffs[0] = 0; i < Local_NTask; i++) noffs[i] = noffs[i - 1] + nlist[i - 1]; char *element_guess = mymalloc("element_guess", Local_NTask * size); size_t *element_tie_braking_rank = mymalloc("element_tie_braking_rank", Local_NTask * sizeof(size_t)); size_t *desired_glob_rank = mymalloc("desired_glob_rank", Local_NTask * sizeof(size_t)); size_t *current_glob_rank = mymalloc("current_glob_rank", Local_NTask * sizeof(size_t)); size_t *current_loc_rank = mymalloc("current_loc_rank", Local_NTask * sizeof(size_t)); long long *range_left = mymalloc("range_left", Local_NTask * sizeof(long long)); long long *range_right = mymalloc("range_right", Local_NTask * sizeof(long long)); int *max_loc = mymalloc("max_loc", Local_NTask * sizeof(int)); size_t *list = mymalloc("list", Local_NTask * sizeof(size_t)); size_t *range_len_list = mymalloc("range_len_list", Local_NTask * sizeof(long long)); char *median_element = mymalloc("median_element", size); median_element_list = mymalloc("median_element_list", Local_NTask * size); size_t *tie_braking_rank_list = mymalloc("tie_braking_rank_list", Local_NTask * sizeof(size_t)); int *index_list = mymalloc("index_list", Local_NTask * sizeof(int)); int *max_loc_list = mymalloc("max_loc_list", Local_NTask * sizeof(int)); size_t *source_range_len_list = mymalloc("source_range_len_list", Local_NTask * sizeof(long long)); size_t *source_tie_braking_rank_list = mymalloc("source_tie_braking_rank_list", Local_NTask * sizeof(long long)); char *source_median_element_list = mymalloc("source_median_element_list", Local_NTask * size); char *new_element_guess = mymalloc("new_element_guess", size); for(i = 0; i < Local_NTask - 1; i++) { desired_glob_rank[i] = noffs[i + 1]; current_glob_rank[i] = 0; range_left[i] = 0; /* first element that it can be */ range_right[i] = nmemb; /* first element that it can not be */ } /* now we determine the first split element guess, which is the same for all divisions in the first iteration */ /* find the median of each processor, and then take the median among those values. 
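 * Concretely: every rank reports the middle element of its current search range
 * (together with its global index, used for tie breaking) to rank 0, which drops
 * entries from ranks whose range is empty, sorts the remaining candidates with
 * the user-supplied comparator, and broadcasts the median of these medians as
 * the first guess for all split points.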
* This should work reasonably well even for extremely skewed distributions */ long long range_len = range_right[0] - range_left[0]; if(range_len >= 1) { long long mid = (range_left[0] + range_right[0]) / 2; memcpy(median_element, (char *) base + mid * size, size); tie_braking_rank = mid + noffs[Local_ThisTask]; } MPI_Gather(&range_len, sizeof(long long), MPI_BYTE, range_len_list, sizeof(long long), MPI_BYTE, 0, MPI_CommLocal); MPI_Gather(median_element, size, MPI_BYTE, median_element_list, size, MPI_BYTE, 0, MPI_CommLocal); MPI_Gather(&tie_braking_rank, sizeof(size_t), MPI_BYTE, tie_braking_rank_list, sizeof(size_t), MPI_BYTE, 0, MPI_CommLocal); if(Local_ThisTask == 0) { for(j = 0; j < Local_NTask; j++) max_loc_list[j] = j; /* eliminate the elements that are undefined because the corresponding CPU has zero range left */ int nleft = Local_NTask; for(j = 0; j < nleft; j++) { if(range_len_list[j] < 1) { range_len_list[j] = range_len_list[nleft - 1]; if(range_len_list[nleft - 1] >= 1 && j != (nleft - 1)) { memcpy(median_element_list + j * size, median_element_list + (nleft - 1) * size, size); memcpy(tie_braking_rank_list + j, tie_braking_rank_list + (nleft - 1), sizeof(size_t)); max_loc_list[j] = max_loc_list[nleft - 1]; } nleft--; j--; } } /* do a serial sort of the remaining elements (indirectly, so that we have the order of tie braking list as well) */ comparfunc = compar; element_size = size; for(j = 0; j < nleft; j++) index_list[j] = j; qsort(index_list, nleft, sizeof(int), parallel_sort_indirect_compare); /* now select the median of the medians */ int mid = nleft / 2; memcpy(&element_guess[0], median_element_list + index_list[mid] * size, size); element_tie_braking_rank[0] = tie_braking_rank_list[index_list[mid]]; max_loc[0] = max_loc_list[index_list[mid]]; } MPI_Bcast(element_guess, size, MPI_BYTE, 0, MPI_CommLocal); MPI_Bcast(&element_tie_braking_rank[0], sizeof(size_t), MPI_BYTE, 0, MPI_CommLocal); MPI_Bcast(&max_loc[0], 1, MPI_INT, 0, MPI_CommLocal); for(i = 1; i < Local_NTask - 1; i++) { memcpy(element_guess + i * size, element_guess, size); element_tie_braking_rank[i] = element_tie_braking_rank[0]; max_loc[i] = max_loc[0]; } int iter = 0; do { for(i = 0; i < Local_NTask - 1; i++) { if(current_glob_rank[i] != desired_glob_rank[i]) { get_local_rank(element_guess + i * size, element_tie_braking_rank[i], (char *) base, nmemb, size, noffs[Local_ThisTask], range_left[i], range_right[i], ¤t_loc_rank[i], compar); #ifdef CHECK_LOCAL_RANK check_local_rank(element_guess + i * size, element_tie_braking_rank[i], (char *) base, nmemb, size, noffs[Local_ThisTask], range_left[i], range_right[i], current_loc_rank[i], compar); #endif } } /* now compute the global ranks by summing the local ranks */ /* Note: the last element in current_loc_rank is not defined. 
It will be summed by the last processor, and stored in the last element of current_glob_rank */ MPI_Alltoall(current_loc_rank, sizeof(size_t), MPI_BYTE, list, sizeof(size_t), MPI_BYTE, MPI_CommLocal); for(j = 0, rank = 0; j < Local_NTask; j++) rank += list[j]; MPI_Allgather(&rank, sizeof(size_t), MPI_BYTE, current_glob_rank, sizeof(size_t), MPI_BYTE, MPI_CommLocal); for(i = 0, ranks_not_found = 0; i < Local_NTask - 1; i++) { if(current_glob_rank[i] != desired_glob_rank[i]) /* here we're not yet done */ { ranks_not_found++; if(current_glob_rank[i] < desired_glob_rank[i]) { range_left[i] = current_loc_rank[i]; if(Local_ThisTask == max_loc[i]) range_left[i]++; } if(current_glob_rank[i] > desired_glob_rank[i]) range_right[i] = current_loc_rank[i]; } } /* now we need to determine new element guesses */ for(i = 0; i < Local_NTask - 1; i++) { if(current_glob_rank[i] != desired_glob_rank[i]) /* here we're not yet done */ { /* find the median of each processor, and then take the median among those values. * This should work reasonably well even for extremely skewed distributions */ source_range_len_list[i] = range_right[i] - range_left[i]; if(source_range_len_list[i] >= 1) { long long middle = (range_left[i] + range_right[i]) / 2; memcpy(source_median_element_list + i * size, (char *) base + middle * size, size); source_tie_braking_rank_list[i] = middle + noffs[Local_ThisTask]; } } } MPI_Alltoall(source_range_len_list, sizeof(long long), MPI_BYTE, range_len_list, sizeof(long long), MPI_BYTE, MPI_CommLocal); MPI_Alltoall(source_median_element_list, size, MPI_BYTE, median_element_list, size, MPI_BYTE, MPI_CommLocal); MPI_Alltoall(source_tie_braking_rank_list, sizeof(size_t), MPI_BYTE, tie_braking_rank_list, sizeof(size_t), MPI_BYTE, MPI_CommLocal); if(Local_ThisTask < Local_NTask - 1) { if(current_glob_rank[Local_ThisTask] != desired_glob_rank[Local_ThisTask]) /* in this case we're not yet done for this split point */ { for(j = 0; j < Local_NTask; j++) max_loc_list[j] = j; /* eliminate the elements that are undefined because the corresponding CPU has zero range left */ int nleft = Local_NTask; for(j = 0; j < nleft; j++) { if(range_len_list[j] < 1) { range_len_list[j] = range_len_list[nleft - 1]; if(range_len_list[nleft - 1] >= 1 && j != (nleft - 1)) { memcpy(median_element_list + j * size, median_element_list + (nleft - 1) * size, size); memcpy(tie_braking_rank_list + j, tie_braking_rank_list + (nleft - 1), sizeof(size_t)); max_loc_list[j] = max_loc_list[nleft - 1]; } nleft--; j--; } } if((iter & 1)) { int max_range, maxj; for(j = 0, maxj = 0, max_range = 0; j < nleft; j++) if(range_len_list[j] > max_range) { max_range = range_len_list[j]; maxj = j; } /* now select the median element from the task which has the largest range */ memcpy(new_element_guess, median_element_list + maxj * size, size); new_tie_braking_rank = tie_braking_rank_list[maxj]; new_max_loc = max_loc_list[maxj]; } else { /* do a serial sort of the remaining elements (indirectly, so that we have the order of tie braking list as well) */ comparfunc = compar; element_size = size; for(j = 0; j < nleft; j++) index_list[j] = j; qsort(index_list, nleft, sizeof(int), parallel_sort_indirect_compare); /* now select the median of the medians */ int mid = nleft / 2; memcpy(new_element_guess, median_element_list + index_list[mid] * size, size); new_tie_braking_rank = tie_braking_rank_list[index_list[mid]]; new_max_loc = max_loc_list[index_list[mid]]; } } else { /* in order to preserve existing guesses */ memcpy(new_element_guess, element_guess + 
Local_ThisTask * size, size); new_tie_braking_rank = element_tie_braking_rank[Local_ThisTask]; new_max_loc = max_loc[Local_ThisTask]; } } MPI_Allgather(new_element_guess, size, MPI_BYTE, element_guess, size, MPI_BYTE, MPI_CommLocal); MPI_Allgather(&new_tie_braking_rank, sizeof(size_t), MPI_BYTE, element_tie_braking_rank, sizeof(size_t), MPI_BYTE, MPI_CommLocal); MPI_Allgather(&new_max_loc, 1, MPI_INT, max_loc, 1, MPI_INT, MPI_CommLocal); iter++; if(iter > (MAX_ITER_PARALLEL_SORT - 100) && Local_ThisTask == 0) { printf("PSORT: iter=%d: ranks_not_found=%d Local_NTask=%d\n", iter, ranks_not_found, Local_NTask); fflush(stdout); if(iter > MAX_ITER_PARALLEL_SORT) terminate("can't find the split points. That's odd"); } } while(ranks_not_found); myfree(new_element_guess); myfree(source_median_element_list); myfree(source_tie_braking_rank_list); myfree(source_range_len_list); myfree(max_loc_list); myfree(index_list); myfree(tie_braking_rank_list); myfree(median_element_list); myfree(median_element); /* At this point we have found all the elements corresponding to the desired split points */ /* we can now go ahead and determine how many elements of the local CPU have to go to each other CPU */ if(nmemb * size > (1LL << 31)) terminate("currently, local data must be smaller than 2 GB"); /* note: to restrict this limitation, the send/recv count arrays have to made 64-bit, * and the MPI data exchange though MPI_Alltoall has to be modified such that buffers > 2 GB become possible */ int *send_count = mymalloc("send_count", Local_NTask * sizeof(int)); int *recv_count = mymalloc("recv_count", Local_NTask * sizeof(int)); int *send_offset = mymalloc("send_offset", Local_NTask * sizeof(int)); int *recv_offset = mymalloc("recv_offset", Local_NTask * sizeof(int)); for(i = 0; i < Local_NTask; i++) send_count[i] = 0; int target = 0; for(i = 0; i < nmemb; i++) { while(target < Local_NTask - 1) { int cmp = compar((char *) base + i * size, element_guess + target * size); if(cmp == 0) { if(i + noffs[Local_ThisTask] < element_tie_braking_rank[target]) cmp = -1; else if(i + noffs[Local_ThisTask] > element_tie_braking_rank[target]) cmp = +1; } if(cmp >= 0) target++; else break; } send_count[target]++; } MPI_Alltoall(send_count, 1, MPI_INT, recv_count, 1, MPI_INT, MPI_CommLocal); size_t nimport; for(j = 0, nimport = 0, recv_offset[0] = 0, send_offset[0] = 0; j < Local_NTask; j++) { nimport += recv_count[j]; if(j > 0) { send_offset[j] = send_offset[j - 1] + send_count[j - 1]; recv_offset[j] = recv_offset[j - 1] + recv_count[j - 1]; } } if(nimport != nmemb) terminate("nimport != nmemb"); for(j = 0; j < Local_NTask; j++) { send_count[j] *= size; recv_count[j] *= size; send_offset[j] *= size; recv_offset[j] *= size; } char *basetmp = mymalloc("basetmp", nmemb * size); /* exchange the data */ MPI_Alltoallv(base, send_count, send_offset, MPI_BYTE, basetmp, recv_count, recv_offset, MPI_BYTE, MPI_CommLocal); memcpy(base, basetmp, nmemb * size); myfree(basetmp); serial_sort((char *) base, nmemb, size, compar); myfree(recv_offset); myfree(send_offset); myfree(recv_count); myfree(send_count); myfree(range_len_list); myfree(list); myfree(max_loc); myfree(range_right); myfree(range_left); myfree(current_loc_rank); myfree(current_glob_rank); myfree(desired_glob_rank); myfree(element_tie_braking_rank); myfree(element_guess); myfree(noffs); myfree(nlist); } MPI_Comm_free(&MPI_CommLocal); double tb = second(); return tb - ta; } static void get_local_rank(char *element, /* element of which we want the rank */ size_t tie_braking_rank, /* 
the inital global rank of this element (needed for braking ties) */ char *base, /* base address of local data */ size_t nmemb, size_t size, /* number and size of local data */ size_t noffs_thistask, /* cumulative length of data on lower tasks */ long long left, long long right, /* range of elements on local task that may hold the element */ size_t * loc, /* output: local rank of the element */ int (*compar) (const void *, const void *)) /* user-specified comparison function */ { if(right < left) terminate("right < left"); if(left == 0 && right == nmemb + 1) { if(compar(base + (nmemb - 1) * size, element) < 0) { *loc = nmemb; return; } else if(compar(base, element) > 0) { *loc = 0; return; } } if(right == left) /* looks like we already converged to the proper rank */ { *loc = left; } else { if(compar(base + (right - 1) * size, element) < 0) /* the last element is smaller, hence all elements are on the left */ *loc = (right - 1) + 1; else if(compar(base + left * size, element) > 0) /* the first element is already larger, hence no element is on the left */ *loc = left; else { while(right > left) { long long mid = ((right - 1) + left) / 2; int cmp = compar(base + mid * size, element); if(cmp == 0) { if(mid + noffs_thistask < tie_braking_rank) cmp = -1; else if(mid + noffs_thistask > tie_braking_rank) cmp = +1; } if(cmp == 0) /* element has exactly been found */ { *loc = mid; break; } if((right - 1) == left) /* elements is not on this CPU */ { if(cmp < 0) *loc = mid + 1; else *loc = mid; break; } if(cmp < 0) { left = mid + 1; } else { if((right - 1) == left + 1) { if(mid != left) terminate("Can't be: -->left=%lld right=%lld\n", left, right); *loc = left; break; } right = mid; } } } } } #ifdef CHECK_LOCAL_RANK static void check_local_rank(char *element, /* element of which we want the rank */ size_t tie_braking_rank, /* the inital global rank of this element (needed for braking ties) */ char *base, /* base address of local data */ size_t nmemb, size_t size, /* number and size of local data */ size_t noffs_thistask, /* cumulative length of data on lower tasks */ long long left, long long right, /* range of elements on local task that may hold the element */ size_t loc, int (*compar) (const void *, const void *)) /* user-specified comparison function */ { int i; long long count = 0; for(i = 0; i < nmemb; i++) { int cmp = compar(base + i * size, element); if(cmp == 0) { if(noffs_thistask + i < tie_braking_rank) cmp = -1; } if(cmp < 0) count++; } if(count != loc) terminate("Inconsistency: Task=%d: loc=%lld count=%lld left=%lld right=%lld nmemb=%lld\n", ThisTask, (long long) loc, count, left, right, (long long) nmemb); } #endif static void serial_sort(char *base, size_t nmemb, size_t size, int (*compar) (const void *, const void *)) { size_t storage = nmemb * size; char *tmp = (char *) mymalloc("tmp", storage); msort_serial_with_tmp(base, nmemb, size, compar, tmp); myfree(tmp); } static void msort_serial_with_tmp(char *base, size_t n, size_t s, int (*compar) (const void *, const void *), char *t) { char *tmp; char *b1, *b2; size_t n1, n2; if(n <= 1) return; n1 = n / 2; n2 = n - n1; b1 = base; b2 = base + n1 * s; msort_serial_with_tmp(b1, n1, s, compar, t); msort_serial_with_tmp(b2, n2, s, compar, t); tmp = t; while(n1 > 0 && n2 > 0) { if(compar(b1, b2) < 0) { --n1; memcpy(tmp, b1, s); tmp += s; b1 += s; } else { --n2; memcpy(tmp, b2, s); tmp += s; b2 += s; } } if(n1 > 0) memcpy(tmp, b1, n1 * s); memcpy(base, t, (n - n2) * s); } void parallel_sort_test_order(char *base, size_t nmemb, size_t size, int 
(*compar) (const void *, const void *)) { int i, recv, send; size_t *nlist; nlist = (size_t *) mymalloc("nlist", NTask * sizeof(size_t)); MPI_Allgather(&nmemb, sizeof(size_t), MPI_BYTE, nlist, sizeof(size_t), MPI_BYTE, MPI_COMM_WORLD); for(i = 0, recv = -1; i < ThisTask && nmemb > 0; i++) if(nlist[i] > 0) recv = i; for(i = ThisTask + 1, send = -1; nmemb > 0 && i < NTask; i++) if(nlist[i] > 0) { send = i; break; } char *element = mymalloc("element", size); MPI_Request requests[2]; int nreq = 0; if(send >= 0) MPI_Isend(base + (nmemb - 1) * size, size, MPI_BYTE, send, TAG_TRANSFER, MPI_COMM_WORLD, &requests[nreq++]); if(recv >= 0) MPI_Irecv(element, size, MPI_BYTE, recv, TAG_TRANSFER, MPI_COMM_WORLD, &requests[nreq++]); MPI_Waitall(nreq, requests, MPI_STATUSES_IGNORE); if(recv >= 0) { for(i = 0; i < nmemb; i++) { if(compar(element, base + i * size) > 0) terminate("wrong order"); } } myfree(element); myfree(nlist); } GalIC/src/parameters.c000644 000765 000024 00000026617 12373713530 015560 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include #include #include #include "allvars.h" #include "proto.h" #define REAL 1 #define STRING 2 #define INT 3 /* this routine parses and reads the parameterfile */ void read_parameter_file(char *fname) { FILE *fd, *fdout; char buf[MAXLEN_PARAM_TAG + MAXLEN_PARAM_VALUE + 200], buf1[MAXLEN_PARAM_TAG + 200], buf2[MAXLEN_PARAM_VALUE + 200], buf3[MAXLEN_PARAM_TAG + MAXLEN_PARAM_VALUE + 400]; int i, j, nt; int id[MAX_PARAMETERS]; void *addr[MAX_PARAMETERS]; char tag[MAX_PARAMETERS][MAXLEN_PARAM_TAG]; int pnum, errorFlag = 0; if(sizeof(long long) != 8) { mpi_printf("\nType `long long' is not 64 bit on this platform. Stopping.\n\n"); endrun(); } if(sizeof(int) != 4) { mpi_printf("\nType `int' is not 32 bit on this platform. Stopping.\n\n"); endrun(); } if(sizeof(float) != 4) { mpi_printf("\nType `float' is not 32 bit on this platform. Stopping.\n\n"); endrun(); } if(sizeof(double) != 8) { mpi_printf("\nType `double' is not 64 bit on this platform. 
Stopping.\n\n"); endrun(); } if(ThisTask == 0) /* read parameter file on process 0 */ { nt = 0; strcpy(tag[nt], "DG_MaxLevel"); addr[nt] = &DG_MaxLevel; id[nt++] = INT; strcpy(tag[nt], "FG_Nbin"); addr[nt] = &FG_Nbin; id[nt++] = INT; strcpy(tag[nt], "EG_MaxLevel"); addr[nt] = &EG_MaxLevel; id[nt++] = INT; strcpy(tag[nt], "TorbitFac"); addr[nt] = &All.TorbitFac; id[nt++] = REAL; strcpy(tag[nt], "MaxVelInUnitsVesc"); addr[nt] = &All.MaxVelInUnitsVesc; id[nt++] = REAL; strcpy(tag[nt], "TimeStepFactorOrbit"); addr[nt] = &All.TimeStepFactorOrbit; id[nt++] = REAL; strcpy(tag[nt], "TimeStepFactorCellCross"); addr[nt] = &All.TimeStepFactorCellCross; id[nt++] = REAL; strcpy(tag[nt], "TypeOfHaloVelocityStructure"); addr[nt] = &All.TypeOfHaloVelocityStructure; id[nt++] = INT; strcpy(tag[nt], "TypeOfDiskVelocityStructure"); addr[nt] = &All.TypeOfDiskVelocityStructure; id[nt++] = INT; strcpy(tag[nt], "TypeOfBulgeVelocityStructure"); addr[nt] = &All.TypeOfBulgeVelocityStructure; id[nt++] = INT; strcpy(tag[nt], "HaloBetaParameter"); addr[nt] = &All.HaloBetaParameter; id[nt++] = REAL; strcpy(tag[nt], "BulgeBetaParameter"); addr[nt] = &All.BulgeBetaParameter; id[nt++] = REAL; strcpy(tag[nt], "HaloStreamingVelocityParameter"); addr[nt] = &All.HaloStreamingVelocityParameter; id[nt++] = REAL; strcpy(tag[nt], "DiskStreamingVelocityParameter"); addr[nt] = &All.DiskStreamingVelocityParameter; id[nt++] = REAL; strcpy(tag[nt], "BulgeStreamingVelocityParameter"); addr[nt] = &All.BulgeStreamingVelocityParameter; id[nt++] = REAL; strcpy(tag[nt], "HaloDispersionRoverZratio"); addr[nt] = &All.HaloDispersionRoverZratio; id[nt++] = REAL; strcpy(tag[nt], "DiskDispersionRoverZratio"); addr[nt] = &All.DiskDispersionRoverZratio; id[nt++] = REAL; strcpy(tag[nt], "BulgeDispersionRoverZratio"); addr[nt] = &All.BulgeDispersionRoverZratio; id[nt++] = REAL; strcpy(tag[nt], "MinParticlesPerBinForDensityMeasurement"); addr[nt] = &All.MinParticlesPerBinForDensityMeasurement; id[nt++] = INT; strcpy(tag[nt], "MinParticlesPerBinForDispersionMeasurement"); addr[nt] = &All.MinParticlesPerBinForDispersionMeasurement; id[nt++] = INT; strcpy(tag[nt], "OutermostBinEnclosedMassFraction"); addr[nt] = &All.OutermostBinEnclosedMassFraction; id[nt++] = REAL; strcpy(tag[nt], "InnermostBinEnclosedMassFraction"); addr[nt] = &All.InnermostBinEnclosedMassFraction; id[nt++] = REAL; strcpy(tag[nt], "CC"); addr[nt] = &All.Halo_C; id[nt++] = REAL; strcpy(tag[nt], "V200"); addr[nt] = &All.V200; id[nt++] = REAL; strcpy(tag[nt], "LAMBDA"); addr[nt] = &All.Lambda; id[nt++] = REAL; strcpy(tag[nt], "MD"); addr[nt] = &All.MD; id[nt++] = REAL; strcpy(tag[nt], "MBH"); addr[nt] = &All.MBH; id[nt++] = REAL; strcpy(tag[nt], "MB"); addr[nt] = &All.MB; id[nt++] = REAL; strcpy(tag[nt], "JD"); addr[nt] = &All.JD; id[nt++] = REAL; strcpy(tag[nt], "DiskHeight"); addr[nt] = &All.DiskHeight; id[nt++] = REAL; strcpy(tag[nt], "BulgeSize"); addr[nt] = &All.BulgeSize; id[nt++] = REAL; strcpy(tag[nt], "HaloStretch"); addr[nt] = &All.HaloStretch; id[nt++] = REAL; strcpy(tag[nt], "BulgeStretch"); addr[nt] = &All.BulgeStretch; id[nt++] = REAL; strcpy(tag[nt], "N_HALO"); addr[nt] = &All.Halo_N; id[nt++] = INT; strcpy(tag[nt], "N_DISK"); addr[nt] = &All.Disk_N; id[nt++] = INT; strcpy(tag[nt], "N_BULGE"); addr[nt] = &All.Bulge_N; id[nt++] = INT; strcpy(tag[nt], "OutputDir"); addr[nt] = All.OutputDir; id[nt++] = STRING; strcpy(tag[nt], "OutputFile"); addr[nt] = All.OutputFile; id[nt++] = STRING; strcpy(tag[nt], "FractionToOptimizeIndependendly"); addr[nt] = 
&All.FractionToOptimizeIndependendly; id[nt++] = REAL; strcpy(tag[nt], "IndepenentOptimizationsPerStep"); addr[nt] = &All.IndepenentOptimizationsPerStep; id[nt++] = INT; strcpy(tag[nt], "StepsBetweenDump"); addr[nt] = &All.StepsBetweenDump; id[nt++] = INT; strcpy(tag[nt], "MaximumNumberOfSteps"); addr[nt] = &All.MaximumNumberOfSteps; id[nt++] = INT; strcpy(tag[nt], "SampleForceNhalo"); addr[nt] = &All.SampleForceNhalo; id[nt++] = INT; strcpy(tag[nt], "SampleForceNdisk"); addr[nt] = &All.SampleForceNdisk; id[nt++] = INT; strcpy(tag[nt], "SampleForceNbulge"); addr[nt] = &All.SampleForceNbulge; id[nt++] = INT; strcpy(tag[nt], "SampleParticleCount"); addr[nt] = &All.SampleParticleCount; id[nt++] = INT; strcpy(tag[nt], "SampleDensityFieldForTargetResponse"); addr[nt] = &All.SampleDensityFieldForTargetResponse; id[nt++] = INT; strcpy(tag[nt], "MaxMemSize"); addr[nt] = &All.MaxMemSize; id[nt++] = INT; strcpy(tag[nt], "UnitVelocity_in_cm_per_s"); addr[nt] = &All.UnitVelocity_in_cm_per_s; id[nt++] = REAL; strcpy(tag[nt], "UnitLength_in_cm"); addr[nt] = &All.UnitLength_in_cm; id[nt++] = REAL; strcpy(tag[nt], "UnitMass_in_g"); addr[nt] = &All.UnitMass_in_g; id[nt++] = REAL; strcpy(tag[nt], "ErrTolTheta"); addr[nt] = &All.ErrTolTheta; id[nt++] = REAL; strcpy(tag[nt], "ErrTolForceAcc"); addr[nt] = &All.ErrTolForceAcc; id[nt++] = REAL; strcpy(tag[nt], "MultipleDomains"); addr[nt] = &All.MultipleDomains; id[nt++] = INT; strcpy(tag[nt], "TopNodeFactor"); addr[nt] = &All.TopNodeFactor; id[nt++] = REAL; strcpy(tag[nt], "SnapFormat"); addr[nt] = &All.SnapFormat; id[nt++] = INT; strcpy(tag[nt], "NumFilesPerSnapshot"); addr[nt] = &All.NumFilesPerSnapshot; id[nt++] = INT; strcpy(tag[nt], "NumFilesWrittenInParallel"); addr[nt] = &All.NumFilesWrittenInParallel; id[nt++] = INT; strcpy(tag[nt], "TypeOfOpeningCriterion"); addr[nt] = &All.TypeOfOpeningCriterion; id[nt++] = INT; strcpy(tag[nt], "Softening"); addr[nt] = &All.Softening; id[nt++] = REAL; strcpy(tag[nt], "BufferSize"); addr[nt] = &All.BufferSize; id[nt++] = INT; strcpy(tag[nt], "BufferSizeGravity"); addr[nt] = &All.BufferSizeGravity; id[nt++] = INT; strcpy(tag[nt], "GravityConstantInternal"); addr[nt] = &All.GravityConstantInternal; id[nt++] = REAL; if((fd = fopen(fname, "r"))) { sprintf(buf, "%s%s", fname, "-usedvalues"); if(!(fdout = fopen(buf, "w"))) { printf("error opening file '%s' \n", buf); errorFlag = 1; } else { printf("Obtaining parameters from file '%s':\n", fname); while(!feof(fd)) { *buf = 0; fgets(buf, MAXLEN_PARAM_TAG + MAXLEN_PARAM_VALUE + 200, fd); if(sscanf(buf, "%s%s%s", buf1, buf2, buf3) < 2) continue; if(buf1[0] == '%') continue; for(i = 0, j = -1; i < nt; i++) if(strcmp(buf1, tag[i]) == 0) { j = i; tag[i][0] = 0; break; } if(j >= 0) { switch (id[j]) { case REAL: *((double *) addr[j]) = atof(buf2); sprintf(buf3, "%%-%ds%%g\n", MAXLEN_PARAM_TAG); fprintf(fdout, buf3, buf1, *((double *) addr[j])); fprintf(stdout, buf3, buf1, *((double *) addr[j])); break; case STRING: strcpy((char *) addr[j], buf2); sprintf(buf3, "%%-%ds%%s\n", MAXLEN_PARAM_TAG); fprintf(fdout, buf3, buf1, buf2); fprintf(stdout, buf3, buf1, buf2); break; case INT: *((int *) addr[j]) = atoi(buf2); sprintf(buf3, "%%-%ds%%d\n", MAXLEN_PARAM_TAG); fprintf(fdout, buf3, buf1, *((int *) addr[j])); fprintf(stdout, buf3, buf1, *((int *) addr[j])); break; } } else { fprintf(stdout, "Error in file %s: Tag '%s' not allowed or multiply defined.\n", fname, buf1); errorFlag = 1; } } fclose(fd); fclose(fdout); printf("\n"); i = strlen(All.OutputDir); if(i > 0) if(All.OutputDir[i - 1] 
!= '/') strcat(All.OutputDir, "/"); mkdir(All.OutputDir, 02755); sprintf(buf1, "%s%s", fname, "-usedvalues"); sprintf(buf2, "%s%s", All.OutputDir, "parameters-usedvalues"); sprintf(buf3, "cp %s %s", buf1, buf2); #ifndef NOCALLSOFSYSTEM system(buf3); #endif } } else { printf("Parameter file %s not found.\n", fname); errorFlag = 1; } for(i = 0; i < nt; i++) { if(*tag[i]) { printf("Error. I miss a value for tag '%s' in parameter file '%s'.\n", tag[i], fname); errorFlag = 1; } } } MPI_Bcast(&errorFlag, 1, MPI_INT, 0, MPI_COMM_WORLD); if(errorFlag) { MPI_Finalize(); exit(0); } /* now communicate the relevant parameters to the other processes */ MPI_Bcast(&All, sizeof(struct global_data_all_processes), MPI_BYTE, 0, MPI_COMM_WORLD); MPI_Bcast(&DG_MaxLevel, sizeof(int), MPI_BYTE, 0, MPI_COMM_WORLD); MPI_Bcast(&FG_Nbin, sizeof(int), MPI_BYTE, 0, MPI_COMM_WORLD); MPI_Bcast(&EG_MaxLevel, sizeof(int), MPI_BYTE, 0, MPI_COMM_WORLD); for(pnum = 0; All.NumFilesWrittenInParallel > (1 << pnum); pnum++); if(All.NumFilesWrittenInParallel != (1 << pnum)) { mpi_printf("NumFilesWrittenInParallel MUST be a power of 2\n"); endrun(); } if(All.NumFilesWrittenInParallel > NTask) { mpi_printf("NumFilesWrittenInParallel MUST be smaller than number of processors\n"); endrun(); } } GalIC/src/set_particles.c000644 000765 000024 00000027510 12373713530 016247 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include #include #include "allvars.h" #include "proto.h" /* This function samples the mass model with particles. * The main job of the code will then be to find the initial velocities. 
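 * Positions are drawn from the analytic density profiles of the halo, disk and
 * bulge via the *_get_fresh_coordinate() routines. Initial velocities are then
 * drawn from Gaussians whose dispersions follow the local Jeans moments, with a
 * streaming component added to v_phi and draws rejected until the speed falls
 * below MaxVelInUnitsVesc times the local escape speed. These velocities only
 * serve as a starting point for the subsequent iterative optimization.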
*/ void initialize_particles(void) { int n, i, k; double phi, theta, vr; double vsum2 = 0, rsum2 = 0, vsum2_exact = 0; int count_r[6], count_t[6], count_p[6], count_q[6]; int tot_count_r[6], tot_count_t[6], tot_count_p[6], tot_count_q[6]; int nhalo = get_part_count_this_task(All.Halo_N); int ndisk = get_part_count_this_task(All.Disk_N); int nbulge = get_part_count_this_task(All.Bulge_N); NumPart = nhalo + ndisk + nbulge; MPI_Allreduce(&NumPart, &All.MaxPart, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD); sumup_large_ints(1, &NumPart, &All.TotNumPart); P = (struct particle_data *) mymalloc_movable(&P, "P", All.MaxPart * sizeof(struct particle_data)); memset(P, 0, All.MaxPart * sizeof(struct particle_data)); permutation = (struct permutation_data *) mymalloc_movable(&permutation, "permutation", All.MaxPart * sizeof(struct permutation_data)); n = 0; for(i = 0; i < 6; i++) count_r[i] = count_t[i] = count_p[i] = count_q[i] = 0; for(i = 0; i < nhalo; i++, n++) { P[n].Type = 1; P[n].Mass = All.Halo_Mass / All.Halo_N; } for(i = 0; i < ndisk; i++, n++) { P[n].Type = 2; P[n].Mass = All.Disk_Mass / All.Disk_N; } for(i = 0; i < nbulge; i++, n++) { P[n].Type = 3; P[n].Mass = All.Bulge_Mass / All.Bulge_N; } int *nlist = mymalloc("nlist", NTask * sizeof(int)); MPI_Allgather(&NumPart, 1, MPI_INT, nlist, 1, MPI_INT, MPI_COMM_WORLD); int nbefore = 0; for(i = 0; i < ThisTask; i++) nbefore += nlist[i]; myfree(nlist); for(n = 0; n < NumPart; n++) P[n].ID = nbefore + n + 1; for(n = 0; n < NumPart; n++) { if(P[n].Type == 1) halo_get_fresh_coordinate(P[n].Pos); /* a halo particle */ else if(P[n].Type == 2) disk_get_fresh_coordinate(P[n].Pos); /* disk particle */ else if(P[n].Type == 3) bulge_get_fresh_coordinate(P[n].Pos); /* disk particle */ P[n].Vesc = forcegrid_get_escape_speed(P[n].Pos); double acc[3]; forcegrid_get_acceleration(P[n].Pos, acc); double a = sqrt(acc[0] * acc[0] + acc[1] * acc[1] + acc[2] * acc[2]); double r = sqrt(P[n].Pos[0] * P[n].Pos[0] + P[n].Pos[1] * P[n].Pos[1] + P[n].Pos[2] * P[n].Pos[2]); P[n].Tint = All.TorbitFac * 2 * M_PI * r / sqrt(r * a); P[n].RecalcFlag = 1; if(P[n].Type == 1) { /* generate a realization in VelTheo[] with the exact spherically symmetric, isotropic Hernquist distribution function, for comparison */ do { vr = halo_generate_v(r); } while(vr >= All.MaxVelInUnitsVesc * P[n].Vesc); /* isotropic velocity distribution */ phi = gsl_rng_uniform(random_generator) * M_PI * 2; theta = acos(gsl_rng_uniform(random_generator) * 2 - 1); P[n].VelTheo[0] = vr * sin(theta) * cos(phi); P[n].VelTheo[1] = vr * sin(theta) * sin(phi); P[n].VelTheo[2] = vr * cos(theta); vsum2_exact += vr * vr; rsum2 += r * r; } /* generate an initial guess for the velocities */ /* let's pick the Jeans moment for this, and use a Gaussian */ int typeOfVelocityStructure = 0; if(P[n].Type == 1) /* a halo particle */ typeOfVelocityStructure = All.TypeOfHaloVelocityStructure; else if(P[n].Type == 2) /* disk */ typeOfVelocityStructure = All.TypeOfDiskVelocityStructure; else if(P[n].Type == 3) /* bulge */ typeOfVelocityStructure = All.TypeOfBulgeVelocityStructure; else terminate("unknown type"); double disp_r = 0, disp_t = 0, disp_p = 0, disp_q = 0; get_disp_rtp(P[n].Pos, P[n].Type, &disp_r, &disp_t, &disp_p, &disp_q); if(disp_r <= All.LowerDispLimit) { count_r[P[n].Type]++; disp_r = All.LowerDispLimit; } if(disp_t <= All.LowerDispLimit) { count_t[P[n].Type]++; disp_t = All.LowerDispLimit; } if(disp_p <= All.LowerDispLimit) { count_p[P[n].Type]++; disp_p = All.LowerDispLimit; } if(disp_q <= All.LowerDispLimit) { 
count_q[P[n].Type]++; disp_q = All.LowerDispLimit; } P[n].vr2_target = disp_r; P[n].vt2_target = disp_t; P[n].vp2_target = disp_p; P[n].vq2_target = disp_q; double vstr = get_vstream(P[n].Pos, P[n].Type); if(typeOfVelocityStructure == 0 || typeOfVelocityStructure == 1 || typeOfVelocityStructure == 3) /* spherical case */ { double sigmaR = sqrt(disp_r); double sigmaT = sqrt(disp_t); double sigmaP = sqrt(disp_p); double v, vr, vphi, vtheta; /* draw three Gaussians with the relevant dispersions */ do { vr = gsl_ran_gaussian(random_generator, sigmaR); vtheta = gsl_ran_gaussian(random_generator, sigmaT); vphi = gsl_ran_gaussian(random_generator, sigmaP); vphi += vstr; v = sqrt(vr * vr + vphi * vphi + vtheta * vtheta); } while(v >= All.MaxVelInUnitsVesc * P[n].Vesc); double phi = atan2(P[n].Pos[1], P[n].Pos[0]); double theta = acos(P[n].Pos[2] / sqrt(P[n].Pos[0] * P[n].Pos[0] + P[n].Pos[1] * P[n].Pos[1] + P[n].Pos[2] * P[n].Pos[2])); double er[3], ePhi[3], eTheta[3]; er[0] = sin(theta) * cos(phi); er[1] = sin(theta) * sin(phi); er[2] = cos(theta); ePhi[0] = -sin(phi); ePhi[1] = cos(phi); ePhi[2] = 0; eTheta[0] = -cos(theta) * cos(phi); eTheta[1] = -cos(theta) * sin(phi); eTheta[2] = sin(theta); for(k = 0; k < 3; k++) P[n].Vel[k] = vr * er[k] + vphi * ePhi[k] + vtheta * eTheta[k]; /* for(k = 0; k < 3; k++) P[n].Vel[k] *= 0.1; */ } else if(typeOfVelocityStructure == 2) /* axisymmetric case, f(E,Lz), with net rotation */ { double sigmaR = sqrt(disp_r); double sigmaT = sqrt(disp_t); double sigmaP = sqrt(disp_p); double v, vR, vphi, vz; /* draw three Gaussians with the relevant dispersions */ do { vR = gsl_ran_gaussian(random_generator, sigmaR); vz = gsl_ran_gaussian(random_generator, sigmaT); vphi = gsl_ran_gaussian(random_generator, sigmaP); vphi += vstr; v = sqrt(vR * vR + vphi * vphi + vz * vz); } while(v >= All.MaxVelInUnitsVesc * P[n].Vesc); phi = atan2(P[n].Pos[1], P[n].Pos[0]); double eR[3], ePhi[3], eZ[3]; eR[0] = cos(phi); eR[1] = sin(phi); eR[2] = 0; ePhi[0] = -sin(phi); ePhi[1] = cos(phi); ePhi[2] = 0; eZ[0] = 0; eZ[1] = 0; eZ[2] = 1; for(k = 0; k < 3; k++) P[n].Vel[k] = vR * eR[k] + vphi * ePhi[k] + vz * eZ[k]; } vsum2 += P[n].Vel[0] * P[n].Vel[0] + P[n].Vel[1] * P[n].Vel[1] + P[n].Vel[2] * P[n].Vel[2]; } MPI_Allreduce(count_r, tot_count_r, 6, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(count_t, tot_count_t, 6, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(count_p, tot_count_p, 6, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(count_q, tot_count_q, 6, MPI_INT, MPI_SUM, MPI_COMM_WORLD); int type; for(type = 1; type <= 3; type++) { if(NType[type] == 0) continue; double frac_r = ((double)tot_count_r[type]) / NType[type]; double frac_t = ((double)tot_count_t[type]) / NType[type]; double frac_p = ((double)tot_count_p[type]) / NType[type]; double frac_q = ((double)tot_count_q[type]) / NType[type]; mpi_printf("Type=%d: fractions of particles with problematic low velocity dispersion: (r/R|t/z|phi/tot_phi) = (%g|%g|%g|%g)\n", type, frac_r, frac_t, frac_p, frac_q); if(frac_r > 0.05 || frac_t > 0.05 || frac_p > 0.05 || frac_q > 0.05) { mpi_printf("\nwe better stop, because there appears to be no valid velocity structure for this configuration.\n\n"); endrun(); } } if(ThisTask == 0) { for(type = 1; type <= 3; type++) { if(NType[type] == 0) continue; char buf[2000]; sprintf(buf, "%s/fit_%d.txt", All.OutputDir, type); if(!(FdFit[type] = fopen(buf, "w"))) terminate("can't open file '%s'", buf); } } for(n = 0; n < NumPart; n++) { permutation[n].rnd = gsl_rng_uniform(random_generator); 
permutation[n].index = n; } qsort(permutation, NumPart, sizeof(struct permutation_data), permutation_compare); output_toomre_Q(); output_rotcurve(); } int permutation_compare(const void *a, const void *b) { if(((struct permutation_data *) a)->rnd < (((struct permutation_data *) b)->rnd)) return -1; if(((struct permutation_data *) a)->rnd > (((struct permutation_data *) b)->rnd)) return +1; return 0; } int get_part_count_this_task(int n) { int avg = (n - 1) / NTask + 1; int exc = NTask * avg - n; int tasklastsection = NTask - exc; if(ThisTask < tasklastsection) return avg; else return avg - 1; } void output_toomre_Q(void) { if(ThisTask == 0 && NType[2] > 0) { double pos[3], R, acc[3], R2, acc2[3], R1, acc1[3]; double disp_r, disp_t, disp_p, disp_q; char buf[1000]; int j, n = 500; double Rmax = 5.0 * All.Disk_H; sprintf(buf, "%s/toomreQ.txt", All.OutputDir); FILE *fd = fopen(buf, "w"); fprintf(fd, "%d\n", n); for(j = 0; j < n; j++) { R = (Rmax / n) * (j + 0.5); pos[0] = R; pos[1] = 0; pos[2] = 0; forcegrid_get_acceleration(pos, acc); double dphiDR = -acc[0]; R2 = R + 0.05 * R; R1 = R - 0.05 * R; pos[0] = R2; forcegrid_get_acceleration(pos, acc2); pos[0] = R1; forcegrid_get_acceleration(pos, acc1); double d2phiDR2 = (-acc2[0] - (-acc1[0])) / (R2 - R1); double kappa2 = d2phiDR2 + 3.0 / R * dphiDR; if(kappa2 < 0) terminate("kappa2 = %g", kappa2); double kappa = sqrt(kappa2); pos[0] = R; pos[1] = 0; pos[2] = 0; get_disp_rtp(pos, 2, &disp_r, &disp_t, &disp_p, &disp_q); double sigmaR = sqrt(disp_r); double sigma_star = All.Disk_Mass / (2 * M_PI * All.Disk_H * All.Disk_H) * exp(-R / All.Disk_H); double Q = sigmaR * kappa / (3.36 * All.G * sigma_star); fprintf(fd, "%g %g\n", R, Q); } fclose(fd); } } /* this function outputs the rotation curve. */ void output_rotcurve(void) { if(ThisTask == 0) { double pos[3], R, acc[3]; char buf[1000]; int j, n = 5000; double Rmax = All.R200; sprintf(buf, "%s/rotcurve.txt", All.OutputDir); FILE *fd = fopen(buf, "w"); fprintf(fd, "%d\n", n); double vc2_tot, vc2_dm, vc2_disk, vc2_bulge; for(j = 0; j < n; j++) { R = (Rmax / n) * (j + 0.5); pos[0] = R; pos[1] = 0; pos[2] = 0; forcegrid_get_acceleration(pos, acc); vc2_tot = fabs(R * acc[0]); if(All.Bulge_Mass > 0) { bulge_get_acceleration(pos, acc); vc2_bulge = fabs(R * acc[0]); } else vc2_bulge = 0; if(All.Halo_Mass > 0) { halo_get_acceleration(pos, acc); vc2_dm = fabs(R * acc[0]); } else vc2_dm = 0; vc2_disk = vc2_tot - vc2_dm - vc2_bulge; if(vc2_disk < 0) vc2_disk = 0; fprintf(fd, "%g %g %g %g %g\n", R, sqrt(vc2_tot), sqrt(vc2_dm), sqrt(vc2_disk), sqrt(vc2_bulge)); } fclose(fd); } } GalIC/src/structure.c000644 000765 000024 00000011225 12373713530 015442 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
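 * structure.c: determines the structural parameters of the composite model
 * (M200, R200, Hernquist scale lengths, disk scale length and thickness),
 * iterating the disk scale length until the disk carries the prescribed
 * fraction JD of the halo angular momentum.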
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include #include #include "allvars.h" #include "proto.h" static double fc(double c) { return c * (0.5 - 0.5 / pow(1 + c, 2) - log(1 + c) / (1 + c)) / pow(log(1 + c) - c / (1 + c), 2); } static double jdisk_int(double x, void *param) { double vc2, Sigma0, vc, y; if(x > 1.0e-10 * All.Halo_A) vc2 = All.G * (halo_get_mass_inside_radius(x) + bulge_get_mass_inside_radius(x)) / x; else vc2 = 0; if(vc2 < 0) terminate("vc2 < 0"); Sigma0 = All.Disk_Mass / (2 * M_PI * All.Disk_H * All.Disk_H); y = x / (2 * All.Disk_H); if(y > 1e-4) vc2 += x * 2 * M_PI * All.G * Sigma0 * y * (gsl_sf_bessel_I0(y) * gsl_sf_bessel_K0(y) - gsl_sf_bessel_I1(y) * gsl_sf_bessel_K1(y)); vc = sqrt(vc2); return pow(x / All.Disk_H, 2) * vc * exp(-x / All.Disk_H); } static double gc_int(double x, void *param) { return pow(log(1 + x) - x / (1 + x), 0.5) * pow(x, 1.5) / pow(1 + x, 2); } void structure_determination(void) { double jhalo, jdisk, jd; double hnew, dh; /* total galaxy mass */ All.M200 = pow(All.V200, 3) / (10 * All.G * All.Hubble); /* virial radius of galaxy */ All.R200 = All.V200 / (10 * All.Hubble); All.LowerDispLimit = pow(0.01 * All.V200, 2); /* halo scale radius */ All.Halo_Rs = All.R200 / All.Halo_C; /* determine the masses of all components */ All.Disk_Mass = All.MD * All.M200; All.Bulge_Mass = All.MB * All.M200; All.BH_Mass = All.MBH * All.M200; if(All.MBH > 0) All.BH_N = 1; else All.BH_N = 0; All.Halo_Mass = All.M200 - All.Disk_Mass - All.Bulge_Mass - All.BH_Mass; /* set the scale factor of the hernquist halo */ All.Halo_A = All.Halo_Rs * sqrt(2 * (log(1 + All.Halo_C) - All.Halo_C / (1 + All.Halo_C))); jhalo = All.Lambda * sqrt(All.G) * pow(All.M200, 1.5) * sqrt(2 * All.R200 / fc(All.Halo_C)); jdisk = All.JD * jhalo; double halo_spinfactor = 1.5 * All.Lambda * sqrt(2 * All.Halo_C / fc(All.Halo_C)) * pow(log(1 + All.Halo_C) - All.Halo_C / (1 + All.Halo_C), 1.5) / structure_gc(All.Halo_C); mpi_printf("\nStructural parameters:\n"); mpi_printf("R200 = %g\n", All.R200); mpi_printf("M200 = %g (this is the total mass)\n", All.M200); mpi_printf("A (halo) = %g\n", All.Halo_A); mpi_printf("halo_spinfactor = %g\n", halo_spinfactor); /* first guess for disk scale length */ All.Disk_H = sqrt(2.0) / 2.0 * All.Lambda / fc(All.Halo_C) * All.R200; All.Disk_Z0 = All.DiskHeight * All.Disk_H; /* sets disk thickness */ All.Bulge_A = All.BulgeSize * All.Halo_A; /* this will be used if no disk is present */ MType[1] = All.Halo_Mass; MType[2] = All.Disk_Mass; MType[3] = All.Bulge_Mass; NType[1] = All.Halo_N; NType[2] = All.Disk_N; NType[3] = All.Bulge_N; if(All.Disk_Mass > 0) { do { jd = structure_disk_angmomentum(); /* computes disk momentum */ hnew = jdisk / jd * All.Disk_H; dh = hnew - All.Disk_H; if(fabs(dh) > 0.5 * All.Disk_H) dh = 0.5 * All.Disk_H * dh / fabs(dh); else dh = dh * 0.1; All.Disk_H = All.Disk_H + dh; /* mpi_printf("Jd/J=%g hnew: %g \n", jd / jhalo, All.Disk_H); */ All.Disk_Z0 = All.DiskHeight * All.Disk_H; /* sets disk thickness */ } while(fabs(dh) / All.Disk_H > 1e-5); } mpi_printf("H (disk) = %g\n", All.Disk_H); mpi_printf("Z0 (disk) = %g\n", All.Disk_Z0); mpi_printf("A (bulge) = %g\n", All.Bulge_A); } double structure_disk_angmomentum(void) { gsl_function F; gsl_integration_workspace *workspace = gsl_integration_workspace_alloc(WORKSIZE); F.function = &jdisk_int; double 
result, abserr; gsl_integration_qag(&F, 0, dmin(30 * All.Disk_H, All.R200), 0, 1.0e-8, WORKSIZE, GSL_INTEG_GAUSS41, workspace, &result, &abserr); result *= All.Disk_Mass; gsl_integration_workspace_free(workspace); return result; } double structure_gc(double c) { gsl_function F; gsl_integration_workspace *workspace = gsl_integration_workspace_alloc(WORKSIZE); F.function = &gc_int; double result, abserr; gsl_integration_qag(&F, 0, c, 0, 1.0e-8, WORKSIZE, GSL_INTEG_GAUSS41, workspace, &result, &abserr); gsl_integration_workspace_free(workspace); return result; } GalIC/src/system.c000644 000765 000024 00000017630 12373713530 014734 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "allvars.h" #include "proto.h" int get_thread_num(void) { #if (NUM_THREADS > 1) /* This enables OpenMP */ return omp_get_thread_num(); #else return 0; #endif } double dabs(double a) { if(a < 0) return -a; else return a; } double dmax(double a, double b) { if(a > b) return a; else return b; } double dmin(double a, double b) { if(a < b) return a; else return b; } int imax(int a, int b) { if(a > b) return a; else return b; } int imin(int a, int b) { if(a < b) return a; else return b; } #ifdef DEBUG_ENABLE_FPU_EXCEPTIONS #include void enable_core_dumps_and_fpu_exceptions(void) { /* enable floating point exceptions */ extern int feenableexcept(int __excepts); feenableexcept(FE_DIVBYZERO | FE_INVALID); /* set core-dump size to infinity */ struct rlimit rlim; getrlimit(RLIMIT_CORE, &rlim); rlim.rlim_cur = RLIM_INFINITY; setrlimit(RLIMIT_CORE, &rlim); /* MPICH catches the signales SIGSEGV, SIGBUS, and SIGFPE.... * The following statements reset things to the default handlers, * which will generate a core file. */ signal(SIGSEGV, SIG_DFL); signal(SIGBUS, SIG_DFL); signal(SIGFPE, SIG_DFL); signal(SIGINT, SIG_DFL); } #endif /* returns the number of cpu-ticks in seconds that * have elapsed. (or the wall-clock time) */ double second(void) { return MPI_Wtime(); /* * possible alternative: * * return ((double) clock()) / CLOCKS_PER_SEC; * * but note: on AIX and presumably many other 32bit systems, * clock() has only a resolution of 10ms=0.01sec */ } double measure_time(void) /* strategy: call this at end of functions to account for time in this function, and before another (nontrivial) function is called */ { double t, dt; t = second(); dt = t - WallclockTime; WallclockTime = t; return dt; } /* returns the time difference between two measurements * obtained with second(). The routine takes care of the * possible overflow of the tick counter on 32bit systems. 
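 * A short usage sketch (do_work() is just a placeholder, not a GalIC function):
 *
 *   double t0 = second();
 *   do_work();
 *   double elapsed = timediff(t0, second());   // elapsed wall-clock time in seconds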
*/ double timediff(double t0, double t1) { double dt; dt = t1 - t0; if(dt < 0) /* overflow has occured (for systems with 32bit tick counter) */ { #ifdef WALLCLOCK dt = 0; #else dt = t1 + pow(2, 32) / CLOCKS_PER_SEC - t0; #endif } return dt; } void minimum_large_ints(int n, long long *src, long long *res) { int i, j; long long *numlist; numlist = (long long *) mymalloc("numlist", NTask * n * sizeof(long long)); MPI_Allgather(src, n * sizeof(long long), MPI_BYTE, numlist, n * sizeof(long long), MPI_BYTE, MPI_COMM_WORLD); for(j = 0; j < n; j++) res[j] = src[j]; for(i = 0; i < NTask; i++) for(j = 0; j < n; j++) if(res[j] > numlist[i * n + j]) res[j] = numlist[i * n + j]; myfree(numlist); } void sumup_large_ints_comm(int n, int *src, long long *res, MPI_Comm comm) { int i, j, *numlist; int ntask; MPI_Comm_size(comm, &ntask); numlist = (int *) mymalloc("numlist", ntask * n * sizeof(int)); MPI_Allgather(src, n, MPI_INT, numlist, n, MPI_INT, comm); for(j = 0; j < n; j++) res[j] = 0; for(i = 0; i < ntask; i++) for(j = 0; j < n; j++) res[j] += numlist[i * n + j]; myfree(numlist); } void sumup_large_ints(int n, int *src, long long *res) { sumup_large_ints_comm(n, src, res, MPI_COMM_WORLD); } void sumup_longs(int n, long long *src, long long *res) { int i, j; long long *numlist; numlist = (long long *) mymalloc("numlist", NTask * n * sizeof(long long)); MPI_Allgather(src, n * sizeof(long long), MPI_BYTE, numlist, n * sizeof(long long), MPI_BYTE, MPI_COMM_WORLD); for(j = 0; j < n; j++) res[j] = 0; for(i = 0; i < NTask; i++) for(j = 0; j < n; j++) res[j] += numlist[i * n + j]; myfree(numlist); } void sumup_floats(int n, float *x, float *res) { int i, j, p; float *numlist; double min_FreeBytes_glob, FreeBytes_local = 1.0 * FreeBytes; MPI_Allreduce(&FreeBytes_local, &min_FreeBytes_glob, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD); int sum_chunksize = (int) (min_FreeBytes_glob / sizeof(float) / NTask); int sum_pieces = n / sum_chunksize; int sum_restsize = n % sum_chunksize; if(sum_chunksize == 0) terminate("min_FreeBytes_glob too small - not enough memory for sumup_floats.\n"); for(j = 0; j < n; j++) res[j] = 0; for(p = 0; p < sum_pieces; p++) { numlist = (float *) mymalloc("numlist", NTask * sum_chunksize * sizeof(float)); MPI_Allgather(x + p * sum_chunksize, sum_chunksize, MPI_FLOAT, numlist, sum_chunksize, MPI_FLOAT, MPI_COMM_WORLD); for(i = 0; i < NTask; i++) for(j = 0; j < sum_chunksize; j++) res[p * sum_chunksize + j] += numlist[i * sum_chunksize + j]; myfree(numlist); } if(sum_restsize > 0) { numlist = (float *) mymalloc("numlist", NTask * sum_restsize * sizeof(float)); MPI_Allgather(x + sum_pieces * sum_chunksize, sum_restsize, MPI_FLOAT, numlist, sum_restsize, MPI_FLOAT, MPI_COMM_WORLD); for(i = 0; i < NTask; i++) for(j = 0; j < sum_restsize; j++) res[sum_pieces * sum_chunksize + j] += numlist[i * sum_restsize + j]; myfree(numlist); } } void sumup_doubles(int n, double *x, double *res) { int i, j, p; double *numlist; double min_FreeBytes_glob, FreeBytes_local = 1.0 * FreeBytes; MPI_Allreduce(&FreeBytes_local, &min_FreeBytes_glob, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD); int sum_chunksize = (int) (min_FreeBytes_glob / sizeof(float) / NTask); int sum_pieces = n / sum_chunksize; int sum_restsize = n % sum_chunksize; if(sum_chunksize == 0) terminate("min_FreeBytes_glob too small - not enough memory for sumup_doubles.\n"); for(j = 0; j < n; j++) res[j] = 0; for(p = 0; p < sum_pieces; p++) { numlist = (double *) mymalloc("numlist", NTask * sum_chunksize * sizeof(double)); MPI_Allgather(x + p * 
sum_chunksize, sum_chunksize, MPI_DOUBLE, numlist, sum_chunksize, MPI_DOUBLE, MPI_COMM_WORLD); for(i = 0; i < NTask; i++) for(j = 0; j < sum_chunksize; j++) res[p * sum_chunksize + j] += numlist[i * sum_chunksize + j]; myfree(numlist); } if(sum_restsize > 0) { numlist = (double *) mymalloc("numlist", NTask * sum_restsize * sizeof(double)); MPI_Allgather(x + sum_pieces * sum_chunksize, sum_restsize, MPI_DOUBLE, numlist, sum_restsize, MPI_DOUBLE, MPI_COMM_WORLD); for(i = 0; i < NTask; i++) for(j = 0; j < sum_restsize; j++) res[sum_pieces * sum_chunksize + j] += numlist[i * sum_restsize + j]; myfree(numlist); } } size_t sizemax(size_t a, size_t b) { if(a < b) return b; else return a; } /* The following function is part of the GNU C Library. Contributed by Torbjorn Granlund (tege@sics.se) */ /* Find the first bit set in the argument */ int my_ffsll(long long int i) { unsigned long long int x = i & -i; if(x <= 0xffffffff) return ffs(i); else return 32 + ffs(i >> 32); } double mysort(void *base, size_t nel, size_t width, int (*compar) (const void *, const void *)) { double t0, t1; t0 = second(); qsort(base, nel, width, compar); t1 = second(); return timediff(t0, t1); } GalIC/src/allvars.h000644 000765 000024 00000076464 12373713530 015073 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ /*! \file allvars.h * \brief declares global variables. * * This file declares all global variables. Further variables should be added here, and declared as * 'extern'. The actual existence of these variables is provided by the file 'allvars.c'. To proN_Cduce * 'allvars.c' from 'allvars.h', do the following: * * - Erase all #define statements * - add #include "allvars.h" * - delete all keywords 'extern' * - delete all struct definitions enclosed in {...}, e.g. 
* "extern struct global_data_all_processes {....} All;" * becomes "struct global_data_all_processes All;" */ #ifndef ALLVARS_H #define ALLVARS_H #include #include #include #include #include #include #include #include #include "../build/galicconfig.h" #define TAG_N 10 /*!< Various tags used for labelling MPI messages */ #define TAG_HEADER 11 #define TAG_PDATA 12 #define TAG_SPHDATA 13 #define TAG_KEY 14 #define TAG_DMOM 15 #define TAG_NODELEN 16 #define TAG_HMAX 17 #define TAG_GRAV_A 18 #define TAG_GRAV_B 19 #define TAG_DIRECT_A 20 #define TAG_DIRECT_B 21 #define TAG_HYDRO_A 22 #define TAG_HYDRO_B 23 #define TAG_NFORTHISTASK 24 #define TAG_PERIODIC_A 25 #define TAG_PERIODIC_B 26 #define TAG_PERIODIC_C 27 #define TAG_PERIODIC_D 28 #define TAG_NONPERIOD_A 29 #define TAG_NONPERIOD_B 30 #define TAG_NONPERIOD_C 31 #define TAG_NONPERIOD_D 32 #define TAG_POTENTIAL_A 33 #define TAG_POTENTIAL_B 34 #define TAG_DENS_A 35 #define TAG_DENS_B 36 #define TAG_LOCALN 37 #define TAG_BH_A 38 #define TAG_BH_B 39 #define TAG_SMOOTH_A 40 #define TAG_SMOOTH_B 41 #define TAG_ENRICH_A 42 #define TAG_CONDUCT_A 43 #define TAG_CONDUCT_B 44 #define TAG_FOF_A 45 #define TAG_FOF_B 46 #define TAG_FOF_C 47 #define TAG_FOF_D 48 #define TAG_FOF_E 49 #define TAG_FOF_F 50 #define TAG_FOF_G 51 #define TAG_HOTNGB_A 52 #define TAG_HOTNGB_B 53 #define TAG_GRAD_A 54 #define TAG_GRAD_B 55 #ifndef LONGIDS typedef unsigned int MyIDType; #define MPI_MYIDTYPE MPI_UNSIGNED #else typedef unsigned long long MyIDType; #define MPI_MYIDTYPE MPI_UNSIGNED_LONG_LONG #endif #ifndef DOUBLEPRECISION /* default is single-precision */ typedef float MyFloat; typedef float MyDouble; #define MPI_MYFLOAT MPI_FLOAT #define MPI_MYDOUBLE MPI_FLOAT #else #if (DOUBLEPRECISION == 2) /* mixed precision */ typedef float MyFloat; typedef double MyDouble; #define MPI_MYFLOAT MPI_FLOAT #define MPI_MYDOUBLE MPI_DOUBLE #else /* everything double-precision */ typedef double MyFloat; typedef double MyDouble; #define MPI_MYFLOAT MPI_DOUBLE #define MPI_MYDOUBLE MPI_DOUBLE #endif #endif #ifdef OUTPUT_IN_DOUBLEPRECISION typedef double MyOutputFloat; #else typedef float MyOutputFloat; #endif #ifdef INPUT_IN_DOUBLEPRECISION typedef double MyInputFloat; #else typedef float MyInputFloat; #endif #define GALIC_VERSION "1.0" /* code version string */ #define FG_SECTIONS 8 // 32 extern int FlagNyt; #define terminate(...) {if(FlagNyt==0){char termbuf1[1000], termbuf2[1000]; sprintf(termbuf1, "Code termination on task=%d, function %s(), file %s, line %d", ThisTask, __FUNCTION__, __FILE__, __LINE__); sprintf(termbuf2, __VA_ARGS__); printf("%s: %s\n", termbuf1, termbuf2); fflush(stdout); FlagNyt=1; MPI_Abort(MPI_COMM_WORLD, 1);} exit(0);} #define warn(...) 
{char termbuf1[1000], termbuf2[1000]; sprintf(termbuf1, "Code warning on task=%d, function %s(), file %s, line %d", ThisTask, __FUNCTION__, __FILE__, __LINE__); sprintf(termbuf2, __VA_ARGS__); printf("%s: %s\n", termbuf1, termbuf2); myflush(stdout); FILE *fd=fopen("WARNINGS", "w"); fclose(fd);} /* define an "assert" macro which outputs MPI task (we do NOT want to call MPI_Abort, because then the assertion failure isn't caught in the debugger) */ #ifndef NDEBUG #define myassert(cond) \ if(!(cond)) { \ char termbuf[1000]; \ sprintf(termbuf, "Assertion failure!\n\ttask=%d, function %s(), file %s, line %d:\n\t%s\n", ThisTask, __FUNCTION__, __FILE__, __LINE__, #cond); \ printf("%s", termbuf); myflush(stdout); \ assert(0); \ } #else #define myassert(cond) #endif #define mymalloc(x, y) mymalloc_fullinfo(x, y, __FUNCTION__, __FILE__, __LINE__) #define mymalloc_movable(x, y, z) mymalloc_movable_fullinfo(x, y, z, __FUNCTION__, __FILE__, __LINE__) #define myrealloc(x, y) myrealloc_fullinfo(x, y, __FUNCTION__, __FILE__, __LINE__) #define myrealloc_movable(x, y) myrealloc_movable_fullinfo(x, y, __FUNCTION__, __FILE__, __LINE__) #define myfree(x) myfree_fullinfo(x, __FUNCTION__, __FILE__, __LINE__) #define myfree_movable(x) myfree_movable_fullinfo(x, __FUNCTION__, __FILE__, __LINE__) #define report_memory_usage(x, y) report_detailed_memory_usage_of_largest_task(x, y, __FUNCTION__, __FILE__, __LINE__) #define STACKOFFSET(l,i,j) (((1<<(2*(l))) -1)/3 + ((i) * (1 << (l))) + (j)) #define OFFSET(l,i,j) (((i) * (1 << (l))) + (j)) typedef int integertime; #define TIMEBINS 29 #define TIMEBASE (1<= MaxPart+MaxNodes are "pseudo particles" that hang off the toplevel leaf nodes belonging to other tasks. These are not represented by this structure. Instead, the tree traversal for these are saved in the Nextnode, Prevnode and Father arrays, indexed with the node number in the case of real particles and by nodenumber-MaxNodes for pseudo particles. */ extern struct NODE { union { int suns[8]; /**< temporary pointers to daughter nodes */ struct { MyDouble s[3] __attribute__((__aligned__(16))); /**< center of mass of node */ MyDouble mass; /**< mass of node */ /** The next node in the tree walk in case the current node does not need to be opened. This means that it traverses the 8 subnodes of a node in a breadth-first fashion, and then goes to father->sibling. */ int sibling; /** The next node in case the current node needs to be opened. Applying nextnode repeatedly results in a pure depth-first traversal of the tree. */ int nextnode; /** The parent node of the node. (Is -1 for the root node.) */ int father; } d; } u; float center[3]; /**< geometrical center of node */ float len; /**< sidelength of treenode */ } *Nodes; /** Gives next node in tree walk for the "particle" nodes. Entries 0 -- MaxPart-1 are the real particles, and the "pseudoparticles" are indexed by the node number-MaxNodes. */ extern int *Nextnode; /** Gives previous node in tree walk for the leaf (particle) nodes. Entries 0 -- MaxPart-1 are the real particles, and the "pseudoparticles" are indexed by the node number-MaxNodes. */ extern int *Father; extern int MaxThreads; #endif GalIC/src/proto.h000644 000765 000024 00000017364 12373713530 014564 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
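/* Sketch of the call-site bookkeeping behind the mymalloc()/myfree() macros
   above: the macro forwards __FUNCTION__, __FILE__ and __LINE__ to a thin
   wrapper, so a failed allocation can report exactly where it was requested.
   tracked_malloc()/TMALLOC() are stand-in names for illustration (gcc). */
#include <stdio.h>
#include <stdlib.h>

static void *tracked_malloc(size_t n, const char *func, const char *file, int line)
{
  void *p = malloc(n);
  if(!p)
    {
      fprintf(stderr, "allocation of %zu bytes failed in %s() at %s:%d\n",
              n, func, file, line);
      exit(1);
    }
  return p;
}

#define TMALLOC(n) tracked_malloc(n, __FUNCTION__, __FILE__, __LINE__)

int main(void)
{
  double *buf = TMALLOC(16 * sizeof(double));
  buf[0] = 3.14;
  printf("buf[0] = %g\n", buf[0]);
  free(buf);
  return 0;
}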
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #ifndef PROTO_H #define PROTO_H #include "allvars.h" #include "forcetree/forcetree.h" #include #include #ifdef HAVE_HDF5 #include #endif int cmp_P_Rnd(const void *a, const void *b); void shuffle_energies(int iter); double parallel_sort(void *base, size_t nmemb, size_t size, int (*compar) (const void *, const void *)); double parallel_sort_comm(void *base, size_t nmemb, size_t size, int (*compar) (const void *, const void *), MPI_Comm comm); void smooth_stack(double *data, int maxlevel); double calc_stack_difference(double *d1, double *d2, int l, int i, int j, int maxlevel, double *ref1, double *ref2, double thresh, double *dist, int flag); double calc_stack_difference_used(double *d1, double *d2, int l, int i, int j, int maxlevel, double *ref1, double *ref2, double *used1, double *used2, double thresh, int flag); double eval_smoothed_stack(double *din, int l, int i, int j, int maxlevel, double *ref, double thresh); void calc_smoothed_stack(double *din, double *dout, int maxlevel, double *ref, double thresh); double integrate_axisymmetric_jeans(double zstart, double zend, double R, int type); double h_factor(double R, double z, int type); double get_beta_of_type(double *pos, int type); void free_allocated_memory(void); void force_test(void); void forcegrid_get_cell(double *pos, int *iR, int *iz, double *fR, double *fz); double halo_get_potential(double *pos); void halo_get_acceleration(double *pos, double *acc); void halo_get_fresh_coordinate(double *pos); double halo_generate_v(double rad); double halo_get_potential_from_radius(double r); double halo_get_density(double *pos); double halo_get_mass_inside_radius(double r); double halo_get_escape_speed(double *pos); void disk_get_fresh_coordinate(double *pos); double disk_get_density(double *pos); double disk_get_mass_inside_radius(double R); double bugle_get_mass_inside_radius(double r); void bulge_get_fresh_coordinate(double *pos); double bulge_get_density(double *pos); double bulge_get_mass_inside_radius(double r); double bulge_get_escape_speed(double *pos); double bulge_get_potential(double *pos); double bulge_get_potential_from_radius(double r); void bulge_get_acceleration(double *pos, double *acc); double bulge_get_escape_speed(double *pos); void output_rotcurve(void); void densitygrid_sample_targetresponse(void); void enable_core_dumps_and_fpu_exceptions(void); double h_over_R(double R, double z, int type); void line_search(void); void calc_energy_grid_mass_maps(void); void energygrid_get_cell(double *pos, int *iR, int *iz, double *fR, double *fz); void calc_disp_components_for_particle(int n, double *v, double *vr2, double *vt2, double *vp2, double *vq2); void structure_determination(void); double structure_disk_angmomentum(void); double structure_gc(double c); double eval_fit(int n, double *vel, double *newdens, double *olddens); double goldensection_search(int n, double ekin_a, double ekin_b, double ekin_c, double f_a, double f_b, double f_c, double *dir, double *egy, double *fnew, int *count); double eval_fit_anisotropy(int, double alpha, double v, double *rad, double *perp); void optimize(int n); void free_all_response_fields(void); void calc_all_response_fields(void); void optimize_some_particles(void); void forcegrid_allocate(void); double forcegrid_get_potential(double *pos); void forcegrid_get_acceleration(double *pos, double *acc); double 
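/* Sketch of the physical relation behind the *_get_escape_speed() prototypes
   above: for a potential that vanishes at infinity, v_esc(r) = sqrt(-2*Phi(r)).
   A Hernquist sphere with G = M = a = 1 is used as a stand-in potential; the
   real code evaluates its own grid-based halo/disk/bulge potentials. */
#include <stdio.h>
#include <math.h>

static double hernquist_potential(double r)      /* Phi = -GM/(r+a), G=M=a=1 */
{
  return -1.0 / (r + 1.0);
}

int main(void)                                   /* build with -lm */
{
  double radii[3] = { 0.1, 1.0, 10.0 };
  for(int i = 0; i < 3; i++)
    {
      double phi = hernquist_potential(radii[i]);
      printf("r = %5.2f   Phi = %8.4f   v_esc = %.4f\n",
             radii[i], phi, sqrt(-2.0 * phi));
    }
  return 0;
}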
forcegrid_get_escape_speed(double *pos); void forcedensitygrid_create(void); void forcedensitygrid_calculate(void); void densitygrid_allocate(void); void densitygrid_get_cell(double *pos, int *iR, int *iz, double *fR, double *fz); void forcedensitygrid_load(void); void forcedensitygrid_save(void); void commit_updates(void); void init_updates(void); void calc_global_fit(void); void energygrid_allocate(void); void reorient_particle_velocities(int iter); void update_velocities(int iter); void initialize_particles(void); double get_density_of_type(double *pos, int type); double get_vstream(double *pos, int type); double get_z_disp_cylindrical(double *pos, int type); double get_radial_disp_spherical(double *pos, int type); void get_disp_rtp(double *pos, int type, double *disp_r, double *disp_t, double *disp_p, double *disp_q); double get_r_disp_tilted(double *pos, int type); double get_theta_disp_tilted(double *pos, int type); double get_phi_disp(double *pos, int type); void calculate_dispfield(void); void calc_all_response_fields_and_gradients(void); void log_message(int iter); void calc_response_dispersion(void); void allocate_memory(void); void output_toomre_Q(void); void add_to_energy_grid(double *pos, double mass, double vr2, double vt2, double vp2, double vq2, double *egyMass, double *egyResponse_r, double *egyResponse_t, double *egyResponse_p, double *egyResponse_q); double produce_orbit_response_field(double *pos, double *vel, int id, double *mfield, double mass, double timespan, int *orbitstaken); void init(void); void set_units(void); void endrun(void); void output_compile_time_options(void); void set_softenings(void); void read_parameter_file(char *fname); void mpi_printf(const char *fmt, ...); size_t my_fread(void *ptr, size_t size, size_t nmemb, FILE * stream); size_t my_fwrite(void *ptr, size_t size, size_t nmemb, FILE * stream); void write_file(char *fname, int writeTask, int lastTask); void get_dataset_name(enum iofields blocknr, char *buf); void get_Tab_IO_Label(enum iofields blocknr, char *label); int blockpresent(enum iofields blocknr, int write); int get_particles_in_block(enum iofields blocknr, int *typelist); int get_values_per_blockelement(enum iofields blocknr); int get_datatype_in_block(enum iofields blocknr); int get_bytes_per_blockelement(enum iofields blocknr, int mode); void fill_write_buffer(enum iofields blocknr, int *startindex, int pc, int type); void output_particles(int iter); void output_density_field(int iter); void distribute_file(int nfiles, int firstfile, int firsttask, int lasttask, int *filenr, int *master, int *last); void *mymalloc_fullinfo(const char *varname, size_t n, const char *func, const char *file, int linenr); void *mymalloc_movable_fullinfo(void *ptr, const char *varname, size_t n, const char *func, const char *file, int line); void *myrealloc_fullinfo(void *p, size_t n, const char *func, const char *file, int line); void *myrealloc_movable_fullinfo(void *p, size_t n, const char *func, const char *file, int line); void myfree_fullinfo(void *p, const char *func, const char *file, int line); void myfree_movable_fullinfo(void *p, const char *func, const char *file, int line); int dump_memory_table_buffer(char *p); void mymalloc_init(void); int permutation_compare(const void *a, const void *b); double dabs(double a); double dmax(double a, double b); double dmin(double a, double b); int imax(int a, int b); int imin(int a, int b); int get_part_count_this_task(int n); size_t sizemax(size_t a, size_t b); int my_ffsll(long long int i); void 
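/* One plausible reading of the my_fwrite()/my_fread() prototypes above is a
   checked wrapper around stdio that verifies the element count, so truncated
   snapshot writes are caught immediately.  checked_fwrite() and the file name
   below are illustrative stand-ins, not GalIC's actual implementation. */
#include <stdio.h>
#include <stdlib.h>

static size_t checked_fwrite(const void *ptr, size_t size, size_t nmemb, FILE *f)
{
  size_t n = fwrite(ptr, size, nmemb, f);
  if(n != nmemb)
    {
      fprintf(stderr, "fwrite wrote only %zu of %zu elements\n", n, nmemb);
      exit(1);
    }
  return n;
}

int main(void)
{
  float block[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
  FILE *f = fopen("testblock.dat", "wb");       /* illustrative file name */
  if(!f)
    return 1;
  checked_fwrite(block, sizeof(float), 4, f);
  fclose(f);
  printf("wrote %zu bytes\n", 4 * sizeof(float));
  return 0;
}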
reorder_particles(int *Id); void gravity(void); double second(void); void sumup_large_ints(int n, int *src, long long *res); void sumup_longs(int n, long long *src, long long *res); double timediff(double t0, double t1); int get_thread_num(void); peanokey peano_hilbert_key(int x, int y, int z, int bits); void peano_hilbert_order(void); void peano_hilbert_key_inverse(peanokey key, int bits, int *x, int *y, int *z); double mysort(void *base, size_t nel, size_t width, int (*compar) (const void *, const void *)); #endif GalIC/src/domain/domain.c000644 000765 000024 00000014707 12373713530 016130 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "../allvars.h" #include "../proto.h" #include "domain.h" /*! \file domain.c * \brief code for domain decomposition * * This file contains the code for the domain decomposition of the * simulation volume. The domains are constructed from disjoint subsets * of the leaves of a fiducial top-level tree that covers the full * simulation volume. Domain boundaries hence run along tree-node * divisions of a fiducial global BH tree. As a result of this method, the * tree force are in principle strictly independent of the way the domains * are cut. The domain decomposition can be carried out for an arbitrary * number of CPUs. Individual domains are not cubical, but spatially * coherent since the leaves are traversed in a Peano-Hilbert order and * individual domains form segments along this order. This also ensures * that each domain has a small surface to volume ratio, which minimizes * communication. */ /*! This is the main routine for the domain decomposition. It acts as a * driver routine that allocates various temporary buffers, maps the * particles back onto the periodic box if needed, and then does the * domain decomposition, and a final Peano-Hilbert order of all particles * as a tuning measure. 
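/* Sketch of what the sumup_large_ints() prototype above provides: per-task
   int counters are widened to long long before a global MPI_Allreduce, so the
   total particle number cannot overflow a 32-bit int.  A minimal MPI program
   with made-up per-type counts; build with mpicc and run with mpirun. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, ntask;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &ntask);

  int local[6] = { 1000000000, 0, rank, 0, 0, 0 };   /* per-type local counts */
  long long local_ll[6], total[6];
  for(int i = 0; i < 6; i++)
    local_ll[i] = local[i];                          /* widen before summing */

  MPI_Allreduce(local_ll, total, 6, MPI_LONG_LONG_INT, MPI_SUM, MPI_COMM_WORLD);

  if(rank == 0)
    printf("global type-0 count = %lld on %d tasks\n", total[0], ntask);
  MPI_Finalize();
  return 0;
}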
*/ void domain_Decomposition(void) { mpi_printf("DOMAIN:\n"); mpi_printf("DOMAIN: Begin domain decomposition (sync-point %d).\n", All.NumCurrentTiStep); domain_allocate(); domain_allocate_lists(); topNodes = (struct local_topnode_data *) mymalloc_movable(&topNodes, "topNodes", (MaxTopNodes * sizeof(struct local_topnode_data))); /* find total cost factors */ domain_find_total_cost(); /* determine global dimensions of domain grid */ domain_findExtent(); /* determine top-level tree */ domain_determineTopTree(); /* find the split of the top-level tree */ domain_combine_topleaves_to_domains(All.MultipleDomains * NTask, NTopleaves); /* combine on each MPI task several of the domains (namely the number All.MultipleDomains) */ domain_combine_multipledomains(); /* permutate the task assignment such that the smallest number of particles needs to be moved */ domain_optimize_domain_to_task_mapping(); /* determine for each cpu how many particles have to be shifted to other cpus */ domain_countToGo(); /* finally, carry out the actual particle exchange */ domain_exchange(); /* copy what we need for the topnodes */ domain_preserve_relevant_topnode_data(); myfree(topNodes); domain_free_lists(); int nummax; MPI_Allreduce(&NumPart, &nummax, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD); mpi_printf("\nDOMAIN: ----> Final load balance = %g <------\n\n", nummax / ( ((double)All.TotNumPart) / NTask)); mpi_printf("DOMAIN: domain decomposition done.\n"); peano_hilbert_order(); myfree(Key); TopNodes = (struct topnode_data *) myrealloc_movable(TopNodes, NTopnodes * sizeof(struct topnode_data)); DomainTask = (int *) myrealloc_movable(DomainTask, NTopleaves * sizeof(int)); } void domain_preserve_relevant_topnode_data(void) { int i; for(i = 0; i < NTopnodes; i++) { TopNodes[i].StartKey = topNodes[i].StartKey; TopNodes[i].Size = topNodes[i].Size; TopNodes[i].Daughter = topNodes[i].Daughter; TopNodes[i].Leaf = topNodes[i].Leaf; int j; int bits = my_ffsll(TopNodes[i].Size); int blocks = (bits - 1) / 3 - 1; for(j = 0; j < 8; j++) { int xb, yb, zb; peano_hilbert_key_inverse(TopNodes[i].StartKey + j * (TopNodes[i].Size >> 3), BITS_PER_DIMENSION, &xb, &yb, &zb); xb >>= blocks; yb >>= blocks; zb >>= blocks; int idx = (xb & 1) | ((yb & 1) << 1) | ((zb & 1) << 2); if(idx < 0 || idx > 7) { char buf[1000]; sprintf(buf, "j=%d idx=%d xb=%d yb=%d zb=%d blocks=%d bits=%d size=%lld\n", j, idx, xb, yb, zb, blocks, bits, TopNodes[i].Size); terminate(buf); } TopNodes[i].MortonToPeanoSubnode[idx] = j; } } } void domain_find_total_cost(void) { int i; long long Ntype[6]; /*!< total number of particles of each type */ int NtypeLocal[6]; /*!< local number of particles of each type */ if(All.MultipleDomains < 1 || All.MultipleDomains > 512) terminate("All.MultipleDomains < 1 || All.MultipleDomains > 512"); for(i = 0; i < 6; i++) NtypeLocal[i] = 0; for(i = 0; i < NumPart; i++) NtypeLocal[P[i].Type]++; /* because Ntype[] is of type `long long', we cannot do a simple * MPI_Allreduce() to sum the total particle numbers */ sumup_large_ints(6, NtypeLocal, Ntype); for(i = 0, totpartcount = 0; i < 6; i++) totpartcount += Ntype[i]; fac_load = 1.0 / totpartcount; } int domain_double_to_int(double d) { union { double d; unsigned long long ull; } u; u.d = d; return (int) ((u.ull & 0xFFFFFFFFFFFFFllu) >> (52 - BITS_PER_DIMENSION)); } /*! 
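/* Sketch of the bit trick in domain_double_to_int() above: a coordinate
   rescaled into [1,2) has a fixed exponent, so its top mantissa bits are
   directly the integer grid coordinate.  BITS_SKETCH stands in for
   BITS_PER_DIMENSION and the sample fractions are made up. */
#include <stdio.h>

#define BITS_SKETCH 10

static int double_to_grid(double d)               /* expects d in [1,2) */
{
  union { double d; unsigned long long ull; } u;
  u.d = d;
  return (int) ((u.ull & 0xFFFFFFFFFFFFFllu) >> (52 - BITS_SKETCH));
}

int main(void)
{
  double frac[4] = { 0.0, 0.25, 0.5, 0.999 };     /* position / box side */
  for(int i = 0; i < 4; i++)
    printf("frac = %.3f  ->  cell %4d of %d\n",
           frac[i], double_to_grid(1.0 + frac[i]), 1 << BITS_SKETCH);
  return 0;
}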
This function allocates all the stuff that will be required for the tree-construction/walk later on */ void domain_allocate(void) { MaxTopNodes = (int) (All.TopNodeAllocFactor * All.MaxPart + 1); if(DomainStartList) terminate("domain storage already allocated"); DomainStartList = (int *) mymalloc_movable(&DomainStartList, "DomainStartList", (NTask * All.MultipleDomains * sizeof(int))); DomainEndList = (int *) mymalloc_movable(&DomainEndList, "DomainEndList", (NTask * All.MultipleDomains * sizeof(int))); TopNodes = (struct topnode_data *) mymalloc_movable(&TopNodes, "TopNodes", (MaxTopNodes * sizeof(struct topnode_data))); DomainTask = (int *) mymalloc_movable(&DomainTask, "DomainTask", (MaxTopNodes * sizeof(int))); } void domain_free(void) { if(!DomainStartList) terminate("domain storage not allocated"); myfree(DomainTask); myfree(TopNodes); myfree(DomainEndList); myfree(DomainStartList); DomainTask = NULL; TopNodes = NULL; DomainEndList = NULL; DomainStartList = NULL; } void domain_printf(char *buf) { if(RestartFlag <= 2) { printf("%s", buf); } } GalIC/src/domain/domain_balance.c000644 000765 000024 00000037305 12373713530 017574 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "../allvars.h" #include "../proto.h" #include "domain.h" #include "pqueue.h" /** Computes the total gravity cost of a particle i. * All timebins in which the particle appears are summed, and the relative frequency with * which this timebin is executed is taken into account. */ double domain_grav_tot_costfactor(int i) { return 1.0; } /** This function determines the cost and load associated with each top-level leave node of the * tree. These leave nodes can be distributed among the processors in order to reach a good * work-load and memory-load balance. 
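/* Sketch of the leaf-ownership rule used in domain_sumCost() below: the
   NTopleaves top-level leaves are split into NTask nearly equal blocks, the
   first rmd tasks holding one extra leaf, and pivot_no marks where the block
   size drops from blk+1 to blk.  All numbers here are toy values. */
#include <stdio.h>

int main(void)
{
  int NTopleaves = 23, NTask = 5;
  int blk = NTopleaves / NTask;
  int rmd = NTopleaves - blk * NTask;       /* remainder */
  int pivot_no = rmd * (blk + 1);           /* first leaf of the "small" blocks */

  for(int no = 0; no < NTopleaves; no++)
    {
      int task;
      if(no < pivot_no)
        task = no / (blk + 1);
      else
        task = rmd + (no - pivot_no) / blk;
      printf("leaf %2d -> task %d\n", no, task);
    }
  return 0;
}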
*/ void domain_sumCost(void) { int i, j, n, no, nexport = 0, nimport = 0, ngrp, task, loc_first_no; struct domain_cost_data *loc_DomainLeaveNode, *listCost, *export_node_data, *import_node_data; int *blocksize = mymalloc("blocksize", sizeof(int) * NTask); int blk = NTopleaves / NTask; int rmd = NTopleaves - blk * NTask; /* remainder */ int pivot_no = rmd * (blk + 1); for(task = 0, loc_first_no = 0; task < NTask; task++) { if(task < rmd) blocksize[task] = blk + 1; else blocksize[task] = blk; if(task < ThisTask) loc_first_no += blocksize[task]; } loc_DomainLeaveNode = mymalloc("loc_DomainLeaveNode", blocksize[ThisTask] * sizeof(struct domain_cost_data)); memset(loc_DomainLeaveNode, 0, blocksize[ThisTask] * sizeof(struct domain_cost_data)); listCost = mymalloc("listCost", NTopleaves * sizeof(struct domain_cost_data)); int *no_place = mymalloc("no_place", NTopleaves * sizeof(int)); memset(no_place, -1, NTopleaves * sizeof(int)); for(j = 0; j < NTask; j++) Send_count[j] = 0; /* find for each particle its top-leave, and then add the associated cost with it */ for(n = 0; n < NumPart; n++) { no = 0; while(topNodes[no].Daughter >= 0) no = topNodes[no].Daughter + (Key[n] - topNodes[no].StartKey) / (topNodes[no].Size >> 3); no = topNodes[no].Leaf; int p = no_place[no]; if(p < 0) { p = nexport++; no_place[no] = p; memset(&listCost[p], 0, sizeof(struct domain_cost_data)); listCost[p].no = no; if(no < pivot_no) task = no / (blk + 1); else task = rmd + (no - pivot_no) / blk; /* note: if blk=0, then this case can not occur, since then always no < pivot_no */ if(task < 0 || task > NTask) terminate("task < 0 || task > NTask"); Send_count[task]++; } listCost[p].Count += 1; } myfree(no_place); MPI_Alltoall(Send_count, 1, MPI_INT, Recv_count, 1, MPI_INT, MPI_COMM_WORLD); for(j = 0, nimport = 0, Recv_offset[0] = 0, Send_offset[0] = 0; j < NTask; j++) { nimport += Recv_count[j]; if(j > 0) { Send_offset[j] = Send_offset[j - 1] + Send_count[j - 1]; Recv_offset[j] = Recv_offset[j - 1] + Recv_count[j - 1]; } } export_node_data = mymalloc("export_node_data", nexport * sizeof(struct domain_cost_data)); import_node_data = mymalloc("import_node_data", nimport * sizeof(struct domain_cost_data)); for(j = 0; j < NTask; j++) Send_count[j] = 0; for(i=0; i < nexport; i++) { if(listCost[i].no < pivot_no) task = listCost[i].no / (blk + 1); else task = rmd + (listCost[i].no - pivot_no) / blk; /* note: if blk=0, then this case can not occur, since then always no < pivot_no */ int ind = Send_offset[task] + Send_count[task]++; export_node_data[ind] = listCost[i]; } for(ngrp = 0; ngrp < (1 << PTask); ngrp++) /* note: here we also have a transfer from each task to itself (for ngrp=0) */ { int recvTask = ThisTask ^ ngrp; if(recvTask < NTask) if(Send_count[recvTask] > 0 || Recv_count[recvTask] > 0) MPI_Sendrecv(&export_node_data[Send_offset[recvTask]], Send_count[recvTask] * sizeof(struct domain_cost_data), MPI_BYTE, recvTask, TAG_DENS_B, &import_node_data[Recv_offset[recvTask]], Recv_count[recvTask] * sizeof(struct domain_cost_data), MPI_BYTE, recvTask, TAG_DENS_B, MPI_COMM_WORLD, MPI_STATUS_IGNORE); } for(i=0; i < nimport; i++) { int j = import_node_data[i].no - loc_first_no; if(j < 0 || j>= blocksize[ThisTask]) terminate("j=%d < 0 || j>= blocksize[ThisTask]=%d loc_first_no=%d import_node_data[i].no=%d i=%d nimport=%d", j, blocksize[ThisTask], loc_first_no, import_node_data[i].no, i, nimport); loc_DomainLeaveNode[j].Count += import_node_data[i].Count; } myfree(import_node_data); myfree(export_node_data); /* now share the cost data 
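/* Sketch of the pairwise communication schedule used in domain_sumCost()
   above: in round ngrp each task exchanges with partner = task ^ ngrp, which
   pairs all tasks off symmetrically (and lets every task "send to itself" in
   round 0).  This just prints the schedule for a made-up task count; PTask
   is the smallest power of two >= NTask. */
#include <stdio.h>

int main(void)
{
  int NTask = 6, PTask = 0;
  while((1 << PTask) < NTask)
    PTask++;                                     /* PTask = 3 for NTask = 6 */

  for(int ngrp = 0; ngrp < (1 << PTask); ngrp++)
    {
      printf("round %d:", ngrp);
      for(int task = 0; task < NTask; task++)
        {
          int partner = task ^ ngrp;
          if(partner < NTask)
            printf("  %d<->%d", task, partner);
          else
            printf("  %d idle", task);
        }
      printf("\n");
    }
  return 0;
}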
across all processors */ int *bytecounts = (int *) mymalloc("bytecounts", sizeof(int) * NTask); int *byteoffset = (int *) mymalloc("byteoffset", sizeof(int) * NTask); for(task = 0; task < NTask; task++) bytecounts[task] = blocksize[task] * sizeof(struct domain_cost_data); for(task = 1, byteoffset[0] = 0; task < NTask; task++) byteoffset[task] = byteoffset[task - 1] + bytecounts[task - 1]; MPI_Allgatherv(loc_DomainLeaveNode, bytecounts[ThisTask], MPI_BYTE, DomainLeaveNode, bytecounts, byteoffset, MPI_BYTE, MPI_COMM_WORLD); myfree(byteoffset); myfree(bytecounts); myfree(listCost); myfree(loc_DomainLeaveNode); myfree(blocksize); } /** This function uses the cumulative cost function (which weights work-load and memory-load equally) to subdivide * the list of top-level leave nodes into pieces that are (approximately) equal in size. */ void domain_combine_topleaves_to_domains(int ncpu, int ndomain) { double t0 = second(); int i, start, end; double work, workavg, work_before, workavg_before, workhalfnode, max_work = 0; workhalfnode = 0.5 / ndomain; workavg = 1.0 / ncpu; work_before = workavg_before = 0; start = 0; for(i = 0; i < ncpu; i++) { work = 0; end = start; work += fac_load * DomainLeaveNode[end].Count; while((work + work_before + (end + 1 < ndomain ? fac_load * DomainLeaveNode[end + 1].Count : 0) < workavg + workavg_before + workhalfnode) || (i == ncpu - 1 && end < ndomain - 1)) { if((ndomain - end) > (ncpu - i)) end++; else break; work += fac_load * DomainLeaveNode[end].Count; } DomainStartList[i] = start; DomainEndList[i] = end; work_before += work; workavg_before += workavg; start = end + 1; if(max_work < work) max_work = work; } double t1 = second(); mpi_printf("DOMAIN: balance reached among multiple-domains=%g, average leave-nodes per domain=%g (took %g sec)\n", max_work / workavg, ((double) ndomain) / ncpu, timediff(t0, t1)); } static struct domain_segments_data { int task, start, end; double load; double normalized_load; } *domainAssign; struct tasklist_data { double load; int count; } *tasklist; int domain_sort_task(const void *a, const void *b) { if(((struct domain_segments_data *) a)->task < (((struct domain_segments_data *) b)->task)) return -1; if(((struct domain_segments_data *) a)->task > (((struct domain_segments_data *) b)->task)) return +1; return 0; } int domain_sort_load(const void *a, const void *b) { if(((struct domain_segments_data *) a)->normalized_load > (((struct domain_segments_data *) b)->normalized_load)) return -1; if(((struct domain_segments_data *) a)->normalized_load < (((struct domain_segments_data *) b)->normalized_load)) return +1; return 0; } /* mode structure for priority queues */ typedef struct node_t { double pri; int val; size_t pos; } node_t; /* define call back functions for priority queues */ static int cmp_pri(double next, double curr) { return (next > curr); } static double get_pri(void *a) { return (double) ((node_t *) a)->pri; } static void set_pri(void *a, double pri) { ((node_t *) a)->pri = pri; } static size_t get_pos(void *a) { return ((node_t *) a)->pos; } static void set_pos(void *a, size_t pos) { ((node_t *) a)->pos = pos; } /** This function assigns the domain pieces to individual MPI tasks with the goal to balance the work-load * on different timebins. The algorithm used works as follows: * * The domains are assigned to the CPUs in sequence of decreasing "effective load", which is a simple combined measure of * relative total gravity, hydro and memory load. 
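/* Sketch of the accumulation logic in domain_combine_topleaves_to_domains()
   above: leaves are walked in Peano-Hilbert order and a segment is closed
   once its cumulative (normalized) cost reaches the running average target.
   The cost array and the choice of 12 leaves / 4 cpus are toy values. */
#include <stdio.h>

int main(void)
{
  double cost[12] = { 3, 1, 2, 5, 1, 1, 4, 2, 2, 1, 3, 2 };
  int ndomain = 12, ncpu = 4, start = 0;
  double total = 0;
  for(int i = 0; i < ndomain; i++)
    total += cost[i];

  double fac = 1.0 / total;                  /* normalize to a total load of 1 */
  double workavg = 1.0 / ncpu;               /* target share per cpu */
  double workhalfnode = 0.5 / ndomain;
  double work_before = 0, workavg_before = 0;

  for(int c = 0; c < ncpu; c++)
    {
      int end = start;
      double work = fac * cost[end];
      while((end + 1 < ndomain &&
             work + work_before + fac * cost[end + 1] <
             workavg + workavg_before + workhalfnode) ||
            (c == ncpu - 1 && end < ndomain - 1))
        {
          if((ndomain - end) > (ncpu - c))   /* leave enough leaves for the rest */
            work += fac * cost[++end];
          else
            break;
        }
      printf("cpu %d: leaves %2d..%2d  share %.3f (target %.3f)\n",
             c, start, end, work, workavg);
      work_before += work;
      workavg_before += workavg;
      start = end + 1;
    }
  return 0;
}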
For each assignment, a number of possible target CPUs are evaluated, and * the assignment leading to the lowest total runtime is adopted. * The set of target CPUs that is tested in each step is the one that * consists of the CPUs that currently have the lowest load in the set of primary tasks that are examined. */ void domain_combine_multipledomains(void) { double t0 = second(); double best_runtime; double tot_load; double max_load; int best_target, target; int i, n, ta; int ndomains = All.MultipleDomains * NTask; domainAssign = (struct domain_segments_data *) mymalloc("domainAssign", ndomains * sizeof(struct domain_segments_data)); tasklist = mymalloc("tasklist", NTask * sizeof(struct tasklist_data)); for(ta = 0; ta < NTask; ta++) { tasklist[ta].load = 0; tasklist[ta].count = 0; } for(n = 0; n < ndomains; n++) for(i = DomainStartList[n]; i <= DomainEndList[n]; i++) DomainTask[i] = n; /* now assign this cost to the domainAssign-structure, which keeps track of the different pieces */ tot_load = 0; for(n = 0; n < ndomains; n++) { domainAssign[n].start = DomainStartList[n]; domainAssign[n].end = DomainEndList[n]; domainAssign[n].load = 0; for(i = DomainStartList[n]; i <= DomainEndList[n]; i++) domainAssign[n].load += DomainLeaveNode[i].Count; tot_load += domainAssign[n].load; } for(n = 0; n < ndomains; n++) domainAssign[n].normalized_load = domainAssign[n].load / ((double) tot_load + 1.0e-30); /* sort the pieces according to their normalized work-load, with the most heavily loaded coming first */ mysort(domainAssign, ndomains, sizeof(struct domain_segments_data), domain_sort_load); max_load = 0; /* create priority queues, one for the cost of each occupied timebin, * one for the hydro cost of each occupied timebin */ pqueue_t *queue_load; node_t *nload; queue_load = pqueue_init(NTask, cmp_pri, get_pri, set_pri, get_pos, set_pos); nload = mymalloc("nload", NTask * sizeof(node_t)); for(i = 0; i < NTask; i++) { nload[i].pri = 0; nload[i].val = i; pqueue_insert(queue_load, &nload[i]); } /* now assign each of the domains to a CPU, trying to minimize the overall runtime */ for(n = 0; n < ndomains; n++) { best_runtime = 1.0e30; best_target = -1; /* now check also the load queue */ node_t *node = pqueue_peek(queue_load); target = node->val; double runtime = 0; double load = domainAssign[n].load + tasklist[target].load; if(load < max_load) load = max_load; runtime += ((double) load) / totpartcount; if(runtime < best_runtime || best_target < 0) { best_runtime = runtime; best_target = target; } if(best_target < 0) terminate("best_target < 0"); target = best_target; domainAssign[n].task = target; tasklist[target].load += domainAssign[n].load; tasklist[target].count++; if(max_load < tasklist[target].load) max_load = tasklist[target].load; pqueue_change_priority(queue_load, tasklist[target].load, &nload[target]); } /* free the priority queues again */ myfree(nload); pqueue_free(queue_load); mysort(domainAssign, ndomains, sizeof(struct domain_segments_data), domain_sort_task); for(n = 0; n < ndomains; n++) { DomainStartList[n] = domainAssign[n].start; DomainEndList[n] = domainAssign[n].end; for(i = DomainStartList[n]; i <= DomainEndList[n]; i++) DomainTask[i] = domainAssign[n].task; } myfree(tasklist); myfree(domainAssign); double t1 = second(); mpi_printf("DOMAIN: combining multiple-domains took %g sec\n", timediff(t0, t1)); } /** This function determines a permutation of the new assignment of domains to CPUs such that * the number of particles that has to be moved given the current distribution of 
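/* Sketch of the assignment strategy described above for
   domain_combine_multipledomains(): domains sorted by decreasing load are
   handed one by one to the task that currently carries the least total load.
   The real code keeps the candidates in a priority queue (pqueue.c); this
   toy version, with invented loads, simply scans for the minimum. */
#include <stdio.h>

int main(void)
{
  double load[8] = { 9, 7, 6, 5, 4, 3, 2, 1 };   /* domain loads, pre-sorted */
  int ntask = 3;
  double tasksum[3] = { 0, 0, 0 };

  for(int n = 0; n < 8; n++)
    {
      int best = 0;
      for(int t = 1; t < ntask; t++)
        if(tasksum[t] < tasksum[best])
          best = t;                              /* least loaded task so far */
      tasksum[best] += load[n];
      printf("domain %d (load %.0f) -> task %d\n", n, load[n], best);
    }
  for(int t = 0; t < ntask; t++)
    printf("task %d carries %.0f\n", t, tasksum[t]);
  return 0;
}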
particles * is minimized. */ void domain_optimize_domain_to_task_mapping(void) { int i, j, m, maxcount, maxtask; double t0 = second(); int *count_per_task = mymalloc("count_per_task", NTask * sizeof(int)); for(i = 0; i < NTask; i++) count_per_task[i] = 0; /* count how many we want to send to each task */ for(i = 0; i < NumPart; i++) { int no = 0; while(topNodes[no].Daughter >= 0) no = topNodes[no].Daughter + (Key[i] - topNodes[no].StartKey) / (topNodes[no].Size / 8); no = topNodes[no].Leaf; int task = DomainTask[no]; count_per_task[task]++; } /* find the task that holds most of our particles (we really would like to be this task) */ for(i = 1, maxcount = count_per_task[0], maxtask = 0; i < NTask; i++) if(count_per_task[i] > maxcount) { maxcount = count_per_task[i]; maxtask = i; } struct domain_count_data loc_count; struct domain_count_data *domain_count = mymalloc("domain_count", NTask * sizeof(struct domain_count_data)); loc_count.task = maxtask; loc_count.count = maxcount; loc_count.origintask = ThisTask; MPI_Allgather(&loc_count, sizeof(struct domain_count_data), MPI_BYTE, domain_count, sizeof(struct domain_count_data), MPI_BYTE, MPI_COMM_WORLD); qsort(domain_count, NTask, sizeof(struct domain_count_data), domain_compare_count); /* this array will hold a permutation of all tasks constructed such that particle exchange should be minimized */ int *new_task = mymalloc("new_task", NTask * sizeof(int)); /* this array will now flag tasks that have been assigned */ for(i = 0; i < NTask; i++) { count_per_task[i] = 0; new_task[i] = -1; } for(i = 0; i < NTask; i++) { int task = domain_count[i].task; int origin = domain_count[i].origintask; if(new_task[task] == -1 && count_per_task[origin] == 0) { count_per_task[origin] = 1; /* taken */ new_task[task] = origin; } } /* now we have to fill up still unassigned ones in case there were collisions */ for(i = 0, j = 0; i < NTask; i++) { if(new_task[i] == -1) { while(count_per_task[j]) j++; new_task[i] = j; count_per_task[j] = 1; } } int *copy_DomainStartList = mymalloc("copy_DomainStartList", All.MultipleDomains * NTask * sizeof(int)); int *copy_DomainEndList = mymalloc("copy_DomainEndList", All.MultipleDomains * NTask * sizeof(int)); memcpy(copy_DomainStartList, DomainStartList, All.MultipleDomains * NTask * sizeof(int)); memcpy(copy_DomainEndList, DomainEndList, All.MultipleDomains * NTask * sizeof(int)); /* apply permutation to DomainTask assignment */ for(i = 0; i < NTask; i++) for(m = 0; m < All.MultipleDomains; m++) { DomainStartList[new_task[i] * All.MultipleDomains + m] = copy_DomainStartList[i * All.MultipleDomains + m]; DomainEndList[new_task[i] * All.MultipleDomains + m] = copy_DomainEndList[i * All.MultipleDomains + m]; } myfree(copy_DomainEndList); myfree(copy_DomainStartList); for(i = 0; i < NTopleaves; i++) DomainTask[i] = new_task[DomainTask[i]]; myfree(new_task); myfree(domain_count); myfree(count_per_task); double t1 = second(); mpi_printf("DOMAIN: task reshuffling took %g sec\n", timediff(t0, t1)); } GalIC/src/domain/domain_box.c000644 000765 000024 00000003416 12373713530 016773 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "../allvars.h" #include "../proto.h" #include "domain.h" /*! This routine finds the extent of the global domain grid. If periodic is on, the minimum extent is the box size. Otherwise it looks at the maximum extent of the particles. */ void domain_findExtent(void) { int i, j; double len, xmin[3], xmax[3], xmin_glob[3], xmax_glob[3]; /* determine local extension */ for(j = 0; j < 3; j++) { xmin[j] = MAX_REAL_NUMBER; xmax[j] = -MAX_REAL_NUMBER; } for(i = 0; i < NumPart; i++) { for(j = 0; j < 3; j++) { if(xmin[j] > P[i].Pos[j]) xmin[j] = P[i].Pos[j]; if(xmax[j] < P[i].Pos[j]) xmax[j] = P[i].Pos[j]; } } MPI_Allreduce(xmin, xmin_glob, 3, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD); MPI_Allreduce(xmax, xmax_glob, 3, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD); len = 0; for(j = 0; j < 3; j++) if(xmax_glob[j] - xmin_glob[j] > len) len = xmax_glob[j] - xmin_glob[j]; len *= 1.00001; for(j = 0; j < 3; j++) { DomainCenter[j] = 0.5 * (xmin_glob[j] + xmax_glob[j]); DomainCorner[j] = 0.5 * (xmin_glob[j] + xmax_glob[j]) - 0.5 * len; } DomainLen = len; DomainInverseLen = 1.0 / DomainLen; DomainFac = 1.0 / len * (((peanokey) 1) << (BITS_PER_DIMENSION)); DomainBigFac = (DomainLen / (((long long) 1) << 52)); } GalIC/src/domain/domain_counttogo.c000644 000765 000024 00000002301 12373713530 020214 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "../allvars.h" #include "../proto.h" #include "domain.h" /*! This function determines how many particles that are currently stored * on the local CPU have to be moved off according to the domain * decomposition. */ int domain_countToGo(void) { int n; for(n = 0; n < NTask; n++) { toGo[n] = 0; } for(n = 0; n < NumPart; n++) { int no = 0; while(topNodes[no].Daughter >= 0) no = topNodes[no].Daughter + (Key[n] - topNodes[no].StartKey) / (topNodes[no].Size / 8); no = topNodes[no].Leaf; if(DomainTask[no] != ThisTask) { toGo[DomainTask[no]] += 1; } } MPI_Alltoall(toGo, 1, MPI_INT, toGet, 1, MPI_INT, MPI_COMM_WORLD); return 0; } GalIC/src/domain/domain_exchange.c000644 000765 000024 00000011235 12373713530 017763 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
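/* Sketch of the global bounding-box reduction in domain_findExtent() above:
   local per-coordinate minima and maxima are combined with MPI_MIN / MPI_MAX
   Allreduce calls and the resulting box side is enlarged slightly.  A minimal
   MPI program with one invented position per task. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  double pos[3] = { 0.1 * rank, -0.2 * rank, 0.05 };   /* one toy "particle" */
  double xmin[3], xmax[3], xmin_glob[3], xmax_glob[3];
  for(int j = 0; j < 3; j++)
    xmin[j] = xmax[j] = pos[j];

  MPI_Allreduce(xmin, xmin_glob, 3, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);
  MPI_Allreduce(xmax, xmax_glob, 3, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);

  double len = 0;
  for(int j = 0; j < 3; j++)
    if(xmax_glob[j] - xmin_glob[j] > len)
      len = xmax_glob[j] - xmin_glob[j];
  len *= 1.00001;                                      /* small safety margin */

  if(rank == 0)
    printf("global domain side length = %g\n", len);
  MPI_Finalize();
  return 0;
}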
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "../allvars.h" #include "../proto.h" #include "domain.h" int myMPI_Alltoallv(void *sendbuf, int *sendcounts, int *sdispls, void *recvbuf, int *recvcounts, int *rdispls, int len, MPI_Comm comm) { int i, ntask; MPI_Comm_size(comm, &ntask); int *scount = mymalloc("scount", ntask * sizeof(int)); int *rcount = mymalloc("rcount", ntask * sizeof(int)); int *soff = mymalloc("soff", ntask * sizeof(int)); int *roff = mymalloc("roff", ntask * sizeof(int)); for(i=0; i < ntask; i++) { scount[i] = sendcounts[i] * len; rcount[i] = recvcounts[i] * len; soff[i] = sdispls[i] * len; roff[i] = rdispls[i] * len; } int ret = MPI_Alltoallv(sendbuf, scount, soff, MPI_BYTE, recvbuf, rcount, roff, MPI_BYTE, comm); myfree(roff); myfree(soff); myfree(rcount); myfree(scount); return ret; } void domain_resize_storage(int count_get, int count_get_sph, int option_flag) { int max_load, load = NumPart + count_get; int max_sphload, sphload = NumGas + count_get_sph; MPI_Allreduce(&load, &max_load, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD); MPI_Allreduce(&sphload, &max_sphload, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD); if(max_load > (1.0 - ALLOC_TOLERANCE) * All.MaxPart || max_load < (1.0 - 3 * ALLOC_TOLERANCE) * All.MaxPart) { All.MaxPart = max_load / (1.0 - 2 * ALLOC_TOLERANCE); mpi_printf("ALLOCATE: Changing to MaxPart = %d\n", All.MaxPart); P = (struct particle_data *) myrealloc_movable(P, All.MaxPart * sizeof(struct particle_data)); if(option_flag == 1) Key = (peanokey *) myrealloc_movable(Key, sizeof(peanokey) * All.MaxPart); } } void domain_exchange(void) { double t0 = second(); int count_togo = 0, count_get = 0; int *count, *offset; int *count_recv, *offset_recv; int i, n, no, target; struct particle_data *partBuf; peanokey *keyBuf; long long sumtogo = 0; for(i = 0; i < NTask; i++) sumtogo += toGo[i]; sumup_longs(1, &sumtogo, &sumtogo); mpi_printf("DOMAIN: exchange of %lld particles\n", sumtogo); count = (int *) mymalloc_movable(&count, "count", NTask * sizeof(int)); offset = (int *) mymalloc_movable(&offset, "offset", NTask * sizeof(int)); count_recv = (int *) mymalloc_movable(&count_recv, "count_recv", NTask * sizeof(int)); offset_recv = (int *) mymalloc_movable(&offset_recv, "offset_recv", NTask * sizeof(int)); offset[0] = 0; for(i = 1; i < NTask; i++) offset[i] = offset[i - 1] + toGo[i - 1]; for(i = 0; i < NTask; i++) { count_togo += toGo[i]; count_get += toGet[i]; } partBuf = (struct particle_data *) mymalloc_movable(&partBuf, "partBuf", count_togo * sizeof(struct particle_data)); keyBuf = (peanokey *) mymalloc_movable(&keyBuf, "keyBuf", count_togo * sizeof(peanokey)); for(i = 0; i < NTask; i++) count[i] = 0; for(n = 0; n < NumPart; n++) { no = 0; while(topNodes[no].Daughter >= 0) no = topNodes[no].Daughter + (Key[n] - topNodes[no].StartKey) / (topNodes[no].Size / 8); no = topNodes[no].Leaf; target = DomainTask[no]; if(target != ThisTask) { partBuf[offset[target] + count[target]] = P[n]; keyBuf[offset[target] + count[target]] = Key[n]; count[target]++; P[n] = P[NumPart - 1]; Key[n] = Key[NumPart - 1]; NumPart--; n--; } } /**** now resize the storage for the P[] and SphP[] arrays if needed ****/ domain_resize_storage(count_get, 0, 1); /***** space has been created, now can do the actual exchange *****/ for(i = 0; i < NTask; i++) count_recv[i] = toGet[i]; offset_recv[0] = 
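/* Sketch of the bookkeeping used in domain_exchange() above: per-target send
   counts become offsets via a prefix sum, and each outgoing element is packed
   at offset[target] + count[target]++.  Toy integer items stand in for the
   particle_data/peanokey buffers, and no MPI is needed to show the idea. */
#include <stdio.h>

int main(void)
{
  int NTask = 3;
  int target_of[8] = { 2, 0, 1, 2, 2, 1, 0, 1 };  /* destination of each item */
  int toGo[3] = { 0, 0, 0 }, offset[3], count[3] = { 0, 0, 0 };
  int sendbuf[8];

  for(int n = 0; n < 8; n++)
    toGo[target_of[n]]++;                         /* how many go to each task */

  offset[0] = 0;
  for(int i = 1; i < NTask; i++)
    offset[i] = offset[i - 1] + toGo[i - 1];      /* prefix sums */

  for(int n = 0; n < 8; n++)
    {
      int t = target_of[n];
      sendbuf[offset[t] + count[t]++] = n;        /* pack item n for task t */
    }

  for(int i = 0; i < NTask; i++)
    {
      printf("for task %d:", i);
      for(int k = 0; k < toGo[i]; k++)
        printf(" item %d", sendbuf[offset[i] + k]);
      printf("\n");
    }
  return 0;
}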
NumPart; for(i = 1; i < NTask; i++) offset_recv[i] = offset_recv[i - 1] + count_recv[i - 1]; myMPI_Alltoallv(partBuf, count, offset, P, count_recv, offset_recv, sizeof(struct particle_data), MPI_COMM_WORLD); myMPI_Alltoallv(keyBuf, count, offset, Key, count_recv, offset_recv, sizeof(peanokey), MPI_COMM_WORLD); NumPart += count_get; myfree(keyBuf); myfree(partBuf); myfree(offset_recv); myfree(count_recv); myfree(offset); myfree(count); double t1 = second(); mpi_printf("DOMAIN: particle exchange done. (took %g sec)\n", timediff(t0, t1)); } GalIC/src/domain/domain_rearrange.c000644 000765 000024 00000002252 12373713530 020146 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "../allvars.h" #include "../proto.h" #include "domain.h" void domain_rearrange_particle_sequence(void) { #ifdef USE_SFR if(Stars_converted) { struct particle_data psave; peanokey key; int i; for(i = 0; i < NumGas; i++) if((P[i].Type & 15) != 0) /*If not a gas particle, swap to the end of the list */ { psave = P[i]; key = Key[i]; P[i] = P[NumGas - 1]; SphP[i] = SphP[NumGas - 1]; Key[i] = Key[NumGas - 1]; P[NumGas - 1] = psave; Key[NumGas - 1] = key; NumGas--; i--; } /*Now we have rearranged the particles, *we don't need to do it again unless there are more stars*/ Stars_converted = 0; } #endif } GalIC/src/domain/domain_sort_kernels.c000644 000765 000024 00000004276 12373713530 020722 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
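/* Sketch of the idea behind myMPI_Alltoallv() used just above: structures of
   arbitrary type are exchanged by scaling element counts and displacements to
   bytes and calling MPI_Alltoallv with MPI_BYTE.  Minimal MPI program in
   which every task sends one small struct to every task. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

struct item { int from; double payload; };

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, ntask;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &ntask);

  struct item *sendbuf = malloc(ntask * sizeof(struct item));
  struct item *recvbuf = malloc(ntask * sizeof(struct item));
  int *counts = malloc(ntask * sizeof(int));
  int *displs = malloc(ntask * sizeof(int));

  for(int i = 0; i < ntask; i++)
    {
      sendbuf[i].from = rank;
      sendbuf[i].payload = 100.0 * rank + i;
      counts[i] = sizeof(struct item);            /* 1 element, counted in bytes */
      displs[i] = i * sizeof(struct item);
    }

  MPI_Alltoallv(sendbuf, counts, displs, MPI_BYTE,
                recvbuf, counts, displs, MPI_BYTE, MPI_COMM_WORLD);

  printf("task %d got payload %.0f from task %d\n",
         rank, recvbuf[0].payload, recvbuf[0].from);

  free(displs); free(counts); free(recvbuf); free(sendbuf);
  MPI_Finalize();
  return 0;
}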
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "../allvars.h" #include "../proto.h" #include "domain.h" int domain_compare_count(const void *a, const void *b) { if(((struct domain_count_data *) a)->count > (((struct domain_count_data *) b)->count)) return -1; if(((struct domain_count_data *) a)->count < (((struct domain_count_data *) b)->count)) return +1; return 0; } int domain_compare_key(const void *a, const void *b) { if(((struct domain_peano_hilbert_data *) a)->key < (((struct domain_peano_hilbert_data *) b)->key)) return -1; if(((struct domain_peano_hilbert_data *) a)->key > (((struct domain_peano_hilbert_data *) b)->key)) return +1; return 0; } static void msort_domain_with_tmp(struct domain_peano_hilbert_data *b, size_t n, struct domain_peano_hilbert_data *t) { struct domain_peano_hilbert_data *tmp; struct domain_peano_hilbert_data *b1, *b2; size_t n1, n2; if(n <= 1) return; n1 = n / 2; n2 = n - n1; b1 = b; b2 = b + n1; msort_domain_with_tmp(b1, n1, t); msort_domain_with_tmp(b2, n2, t); tmp = t; while(n1 > 0 && n2 > 0) { if(b1->key <= b2->key) { --n1; *tmp++ = *b1++; } else { --n2; *tmp++ = *b2++; } } if(n1 > 0) memcpy(tmp, b1, n1 * sizeof(struct domain_peano_hilbert_data)); memcpy(b, t, (n - n2) * sizeof(struct domain_peano_hilbert_data)); } void mysort_domain(void *b, size_t n, size_t s) { /* this function tends to work slightly faster than a call of qsort() for this particular * list, at least on most platforms */ const size_t size = n * s; struct domain_peano_hilbert_data *tmp; tmp = (struct domain_peano_hilbert_data *) mymalloc("tmp", size); msort_domain_with_tmp((struct domain_peano_hilbert_data *) b, n, tmp); myfree(tmp); } GalIC/src/domain/domain_toplevel.c000644 000765 000024 00000014777 12373713530 020051 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "../allvars.h" #include "../proto.h" #include "domain.h" /*! This function constructs the global top-level tree node that is used * for the domain decomposition. This is done by considering the string of * Peano-Hilbert keys for all particles, which is recursively chopped off * in pieces of eight segments until each segment holds at most a certain * number of particles. 
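/* Sketch of the merge sort behind mysort_domain() above: a recursive top-down
   merge with a single scratch buffer, following the same control flow as
   msort_domain_with_tmp() but on plain ints for clarity. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void msort_with_tmp(int *b, size_t n, int *t)
{
  if(n <= 1)
    return;
  size_t n1 = n / 2, n2 = n - n1;
  int *b1 = b, *b2 = b + n1, *tmp = t;

  msort_with_tmp(b1, n1, t);
  msort_with_tmp(b2, n2, t);

  while(n1 > 0 && n2 > 0)                 /* merge the two sorted halves into t */
    {
      if(*b1 <= *b2) { --n1; *tmp++ = *b1++; }
      else           { --n2; *tmp++ = *b2++; }
    }
  if(n1 > 0)
    memcpy(tmp, b1, n1 * sizeof(int));
  memcpy(b, t, (n - n2) * sizeof(int));   /* remaining n2 entries already sit at the end */
}

int main(void)
{
  int a[9] = { 7, 3, 9, 1, 4, 8, 2, 6, 5 };
  int *t = malloc(9 * sizeof(int));
  msort_with_tmp(a, 9, t);
  for(int i = 0; i < 9; i++)
    printf("%d ", a[i]);
  printf("\n");
  free(t);
  return 0;
}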
*/ int domain_determineTopTree(void) { int i, count; mp = (struct domain_peano_hilbert_data *) mymalloc_movable(&mp, "mp", sizeof(struct domain_peano_hilbert_data) * NumPart); for(i = 0, count = 0; i < NumPart; i++) { int xb = domain_double_to_int(((P[i].Pos[0] - DomainCorner[0]) * DomainInverseLen) + 1.0); int yb = domain_double_to_int(((P[i].Pos[1] - DomainCorner[1]) * DomainInverseLen) + 1.0); int zb = domain_double_to_int(((P[i].Pos[2] - DomainCorner[2]) * DomainInverseLen) + 1.0); mp[count].key = Key[i] = peano_hilbert_key(xb, yb, zb, BITS_PER_DIMENSION); mp[count].index = i; count++; } mysort_domain(mp, count, sizeof(struct domain_peano_hilbert_data)); NTopnodes = 1; topNodes[0].Daughter = -1; topNodes[0].Parent = -1; topNodes[0].Size = PEANOCELLS; topNodes[0].StartKey = 0; topNodes[0].PIndex = 0; topNodes[0].Count = count; int list[1] = { 0 }; int *listp = list; domain_do_local_refine(1, &listp); myfree(mp); /* count the number of top leaves */ NTopleaves = 0; domain_walktoptree(0); mpi_printf("DOMAIN: NTopleaves=%d\n", NTopleaves); if(NTopleaves < All.MultipleDomains * NTask) terminate("NTopleaves = %d < All.MultipleDomains * NTask = %d * %d = %d", NTopleaves, All.MultipleDomains, NTask, All.MultipleDomains * NTask); mpi_printf("DOMAIN: determination of top-level tree done\n"); domain_sumCost(); mpi_printf("DOMAIN: cost summation for top-level tree done\n"); return 0; } int domain_do_local_refine(int n, int **listp) /* In list[], we store the node indices hat should be refined, N is their number */ { static int message_printed = 0; int i, j, k, l, p, sub, ret, *list; list = *listp; double limit = 1.0 / (All.TopNodeFactor * All.MultipleDomains * NTask); if(list[0] == 0) message_printed = 0; while((NTopnodes + 8 * n) > MaxTopNodes) { mpi_printf("DOMAIN: Increasing TopNodeAllocFactor=%g ", All.TopNodeAllocFactor); All.TopNodeAllocFactor *= 1.3; mpi_printf("new value=%g\n", All.TopNodeAllocFactor); if(All.TopNodeAllocFactor > 1000) terminate("something seems to be going seriously wrong here. 
Stopping.\n"); MaxTopNodes = (int) (All.TopNodeAllocFactor * All.MaxPart + 1); topNodes = (struct local_topnode_data *) myrealloc_movable(topNodes, (MaxTopNodes * sizeof(struct local_topnode_data))); TopNodes = (struct topnode_data *) myrealloc_movable(TopNodes, (MaxTopNodes * sizeof(struct topnode_data))); DomainTask = (int *) myrealloc_movable(DomainTask, (MaxTopNodes * sizeof(int))); DomainLeaveNode = (struct domain_cost_data *) myrealloc_movable(DomainLeaveNode, (MaxTopNodes * sizeof(struct domain_cost_data))); list = *listp; /* update this here because the above reallocations may have moved the pointer to the memory block */ } int *new_list = mymalloc_movable(&new_list, "new_list", 8 * n * sizeof(int)); double *worktotlist = mymalloc("worktotlist", 8 * n * sizeof(double)); double *worklist = mymalloc("worklist", 8 * n * sizeof(double)); double non_zero = 0, non_zero_tot; /* create the new nodes */ for(k = 0; k < n; k++) { i = list[k]; topNodes[i].Daughter = NTopnodes; NTopnodes += 8; for(j = 0; j < 8; j++) { sub = topNodes[i].Daughter + j; topNodes[sub].Daughter = -1; topNodes[sub].Parent = i; topNodes[sub].Size = (topNodes[i].Size >> 3); topNodes[sub].StartKey = topNodes[i].StartKey + j * topNodes[sub].Size; topNodes[sub].PIndex = topNodes[i].PIndex; topNodes[sub].Count = 0; } sub = topNodes[i].Daughter; for(p = topNodes[i].PIndex, j = 0; p < topNodes[i].PIndex + topNodes[i].Count; p++) { if(j < 7) while(mp[p].key >= topNodes[sub + 1].StartKey) { j++; sub++; topNodes[sub].PIndex = p; if(j >= 7) break; } topNodes[sub].Count++; } for(j = 0; j < 8; j++) { sub = topNodes[i].Daughter + j; worklist[k * 8 + j] = fac_load * topNodes[sub].Count; if(worklist[k * 8 + j] != 0) non_zero++; } } MPI_Allreduce(&non_zero, &non_zero_tot, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(worklist, worktotlist, 8 * n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); int new_n = 0; for(k = 0, l = 0; k < n; k++) { i = list[k]; for(j = 0; j < 8; j++, l++) { sub = topNodes[i].Daughter + j; if(worktotlist[l] > limit) { if(topNodes[sub].Size < 8) { if(message_printed == 0) { mpi_printf("DOMAIN: Note: we would like to refine top-tree, but PEANOGRID is not fine enough\n"); message_printed = 1; } } else new_list[new_n++] = sub; } } } myfree(worklist); myfree(worktotlist); new_list = myrealloc(new_list, new_n * sizeof(int)); if(new_n > 0) ret = domain_do_local_refine(new_n, &new_list); else ret = 0; myfree(new_list); return ret; } /*! This function walks the global top tree in order to establish the * number of leaves it has, and for assigning the leaf numbers along the * Peano-Hilbert Curve. These leaves are later combined to domain pieces, * which are distributed to different processors. */ void domain_walktoptree(int no) { int i; if(topNodes[no].Daughter == -1) { topNodes[no].Leaf = NTopleaves; NTopleaves++; } else { for(i = 0; i < 8; i++) domain_walktoptree(topNodes[no].Daughter + i); } } GalIC/src/domain/domain_vars.c000644 000765 000024 00000003247 12373713530 017160 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
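/* Sketch of how a refined top node in domain_do_local_refine() above hands
   its key range to its eight daughters: each daughter covers Size/8
   consecutive Peano-Hilbert keys starting at StartKey + j*(Size/8), and the
   already-sorted particle keys are swept once to count how many fall into
   each daughter.  Key values below are toy numbers. */
#include <stdio.h>

int main(void)
{
  long long StartKey = 0, Size = 64;               /* parent covers keys 0..63 */
  long long sub_size = Size >> 3;                  /* 8 keys per daughter */
  long long keys[10] = { 1, 2, 9, 10, 11, 30, 31, 40, 55, 63 };   /* sorted */
  int count[8] = { 0 };

  int j = 0;
  for(int p = 0; p < 10; p++)
    {
      while(j < 7 && keys[p] >= StartKey + (j + 1) * sub_size)
        j++;                                       /* advance to the right daughter */
      count[j]++;
    }
  for(j = 0; j < 8; j++)
    printf("daughter %d: keys %2lld..%2lld, %d particles\n",
           j, StartKey + j * sub_size, StartKey + (j + 1) * sub_size - 1, count[j]);
  return 0;
}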
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "../allvars.h" #include "../proto.h" #include "domain.h" struct domain_peano_hilbert_data *mp; struct local_topnode_data *topNodes, *branchNodes; /*!< points to the root node of the top-level tree */ double totpartcount; struct domain_cost_data *DomainLeaveNode; double fac_load; int Nbranch; /*! toGo[partner] gives the number of particles on the current task that have to go to task 'partner' */ int *toGo; int *toGet; int *list_NumPart; int *list_load; void domain_allocate_lists(void) { Key = (peanokey *) mymalloc_movable(&Key, "domain_key", (sizeof(peanokey) * All.MaxPart)); toGo = (int *) mymalloc_movable(&toGo, "toGo", (sizeof(int) * NTask)); toGet = (int *) mymalloc_movable(&toGet, "toGet", (sizeof(int) * NTask)); list_NumPart = (int *) mymalloc_movable(&list_NumPart, "list_NumPart", (sizeof(int) * NTask)); list_load = (int *) mymalloc_movable(&list_load, "list_load", (sizeof(int) * NTask)); DomainLeaveNode = (struct domain_cost_data *) mymalloc_movable(&DomainLeaveNode, "DomainLeaveNode", (MaxTopNodes * sizeof(struct domain_cost_data))); } void domain_free_lists(void) { myfree(DomainLeaveNode); myfree(list_load); myfree(list_NumPart); myfree(toGet); myfree(toGo); } GalIC/src/domain/peano.c000644 000765 000024 00000041047 12373713530 015760 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include "../allvars.h" #include "../proto.h" #include "domain.h" void mysort_peano(void *b, size_t n, size_t s, int (*cmp) (const void *, const void *)); void peano_hilbert_key_inverse(peanokey key, int bits, int *x, int *y, int *z); int peano_compare_key(const void *a, const void *b); #include /** Returns the peanokey for the position on the Peano-Hilbert curve that contains pos. */ peanokey position_to_peanokey(MyDouble pos[3]) { // First find peano-hilbert key for the position peanokey key = peano_hilbert_key((int) ((pos[0] - DomainCorner[0]) * DomainFac), (int) ((pos[1] - DomainCorner[1]) * DomainFac), (int) ((pos[2] - DomainCorner[2]) * DomainFac), BITS_PER_DIMENSION); return key; } /** Returns the topnode index containing the specified peanokey. 
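/* Sketch of the descent described above for peanokey_to_topnode(): starting
   at the root, the daughter index is (key - StartKey) / (Size/8), repeated
   until a node without daughters is reached.  A tiny hand-built two-level
   toy tree replaces the real TopNodes[] array. */
#include <stdio.h>

struct toynode { long long StartKey, Size; int Daughter; };   /* Daughter = -1 for a leaf */

int main(void)
{
  struct toynode tree[9];
  tree[0].StartKey = 0; tree[0].Size = 64; tree[0].Daughter = 1;  /* root, children 1..8 */
  for(int j = 0; j < 8; j++)
    {
      tree[1 + j].StartKey = j * 8;
      tree[1 + j].Size = 8;
      tree[1 + j].Daughter = -1;
    }

  long long key = 42;
  int node = 0;
  while(tree[node].Daughter >= 0)
    node = tree[node].Daughter +
           (int) ((key - tree[node].StartKey) / (tree[node].Size / 8));

  printf("key %lld lands in node %d covering keys %lld..%lld\n",
         key, node, tree[node].StartKey,
         tree[node].StartKey + tree[node].Size - 1);
  return 0;
}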
*/ int peanokey_to_topnode(peanokey key) { int node = 0; while(TopNodes[node].Daughter >= 0) { node = TopNodes[node].Daughter + (key - TopNodes[node].StartKey) / (TopNodes[node].Size / 8); } node = TopNodes[node].Leaf; return node; } void peano_hilbert_key_extended(peanokey * keyp1, peanokey * keyp0, int x, int y, int z, int bits); static struct peano_hilbert_data { #ifdef PEANOHILBERT_EXTEND_DYNAMIC_RANGE peanokey key1, key0; #else peanokey key; #endif int index; } *pmp; static int *Id; void peano_hilbert_order(void) { int i; mpi_printf("begin Peano-Hilbert order...\n"); if(NumPart - NumGas > 0) { pmp = (struct peano_hilbert_data *) mymalloc("pmp", sizeof(struct peano_hilbert_data) * (NumPart - NumGas)); pmp -= (NumGas); Id = (int *) mymalloc("Id", sizeof(int) * (NumPart - NumGas)); Id -= (NumGas); for(i = NumGas; i < NumPart; i++) { pmp[i].index = i; #ifdef PEANOHILBERT_EXTEND_DYNAMIC_RANGE peano_hilbert_key_extended(&pmp[i].key1, &pmp[i].key0, (int) ((P[i].Pos[0] - DomainCorner[0]) * DomainFac), (int) ((P[i].Pos[1] - DomainCorner[1]) * DomainFac), (int) ((P[i].Pos[2] - DomainCorner[2]) * DomainFac), BITS_PER_DIMENSION); #else pmp[i].key = Key[i]; #endif } mysort_peano(pmp + NumGas, NumPart - NumGas, sizeof(struct peano_hilbert_data), peano_compare_key); for(i = NumGas; i < NumPart; i++) Id[pmp[i].index] = i; reorder_particles(Id); Id += NumGas; myfree(Id); pmp += NumGas; myfree(pmp); } mpi_printf("Peano-Hilbert done.\n"); } void peano_hilbert_order_DP(void) { mpi_printf("begin Peano-Hilbert order of DP points...\n"); /* int i; if(Mesh.Ndp) { pmp = (struct peano_hilbert_data *) mymalloc("pmp", sizeof(struct peano_hilbert_data) * Mesh.Ndp); Id = (int *) mymalloc("Id", sizeof(int) * Mesh.Ndp); point *DP = Mesh.DP; for(i = 0; i < Mesh.Ndp; i++) { pmp[i].index = i; #ifdef PEANOHILBERT_EXTEND_DYNAMIC_RANGE peano_hilbert_key_extended(&pmp[i].key1, &pmp[i].key0, (int) ((DP[i].x + DomainLen) * DomainFac / 3), (int) ((DP[i].y + DomainLen) * DomainFac / 3), (int) ((DP[i].z + DomainLen) * DomainFac / 3), BITS_PER_DIMENSION); #else pmp[i].key = peano_hilbert_key((int) ((DP[i].x + DomainLen) * DomainFac / 3), (int) ((DP[i].y + DomainLen) * DomainFac / 3), (int) ((DP[i].z + DomainLen) * DomainFac / 3), BITS_PER_DIMENSION); #endif } mysort_peano(pmp, Mesh.Ndp, sizeof(struct peano_hilbert_data), peano_compare_key); for(i = 0; i < Mesh.Ndp; i++) Id[pmp[i].index] = i; myfree(Id); myfree(pmp); } */ mpi_printf("Peano-Hilbert of DP points done.\n"); } int peano_compare_key(const void *a, const void *b) { #ifdef PEANOHILBERT_EXTEND_DYNAMIC_RANGE if(((struct peano_hilbert_data *) a)->key1 < (((struct peano_hilbert_data *) b)->key1)) return -1; if(((struct peano_hilbert_data *) a)->key1 > (((struct peano_hilbert_data *) b)->key1)) return +1; if(((struct peano_hilbert_data *) a)->key0 < (((struct peano_hilbert_data *) b)->key0)) return -1; if(((struct peano_hilbert_data *) a)->key0 > (((struct peano_hilbert_data *) b)->key0)) return +1; #else if(((struct peano_hilbert_data *) a)->key < (((struct peano_hilbert_data *) b)->key)) return -1; if(((struct peano_hilbert_data *) a)->key > (((struct peano_hilbert_data *) b)->key)) return +1; #endif return 0; } void reorder_particles(int *Id) { int i; struct particle_data Psave, Psource; int idsource, idsave, dest; for(i = NumGas; i < NumPart; i++) { if(Id[i] != i) { Psource = P[i]; idsource = Id[i]; dest = Id[i]; do { Psave = P[dest]; idsave = Id[dest]; P[dest] = Psource; Id[dest] = idsource; if(dest == i) break; Psource = Psave; idsource = idsave; dest = idsource; } 
while(1); } } } /* The following rewrite of the original function * peano_hilbert_key_old() has been written by MARTIN REINECKE. * It is about a factor 2.3 - 2.5 faster than Volker's old routine! */ const unsigned char rottable3[48][8] = { {36, 28, 25, 27, 10, 10, 25, 27}, {29, 11, 24, 24, 37, 11, 26, 26}, {8, 8, 25, 27, 30, 38, 25, 27}, {9, 39, 24, 24, 9, 31, 26, 26}, {40, 24, 44, 32, 40, 6, 44, 6}, {25, 7, 33, 7, 41, 41, 45, 45}, {4, 42, 4, 46, 26, 42, 34, 46}, {43, 43, 47, 47, 5, 27, 5, 35}, {33, 35, 36, 28, 33, 35, 2, 2}, {32, 32, 29, 3, 34, 34, 37, 3}, {33, 35, 0, 0, 33, 35, 30, 38}, {32, 32, 1, 39, 34, 34, 1, 31}, {24, 42, 32, 46, 14, 42, 14, 46}, {43, 43, 47, 47, 25, 15, 33, 15}, {40, 12, 44, 12, 40, 26, 44, 34}, {13, 27, 13, 35, 41, 41, 45, 45}, {28, 41, 28, 22, 38, 43, 38, 22}, {42, 40, 23, 23, 29, 39, 29, 39}, {41, 36, 20, 36, 43, 30, 20, 30}, {37, 31, 37, 31, 42, 40, 21, 21}, {28, 18, 28, 45, 38, 18, 38, 47}, {19, 19, 46, 44, 29, 39, 29, 39}, {16, 36, 45, 36, 16, 30, 47, 30}, {37, 31, 37, 31, 17, 17, 46, 44}, {12, 4, 1, 3, 34, 34, 1, 3}, {5, 35, 0, 0, 13, 35, 2, 2}, {32, 32, 1, 3, 6, 14, 1, 3}, {33, 15, 0, 0, 33, 7, 2, 2}, {16, 0, 20, 8, 16, 30, 20, 30}, {1, 31, 9, 31, 17, 17, 21, 21}, {28, 18, 28, 22, 2, 18, 10, 22}, {19, 19, 23, 23, 29, 3, 29, 11}, {9, 11, 12, 4, 9, 11, 26, 26}, {8, 8, 5, 27, 10, 10, 13, 27}, {9, 11, 24, 24, 9, 11, 6, 14}, {8, 8, 25, 15, 10, 10, 25, 7}, {0, 18, 8, 22, 38, 18, 38, 22}, {19, 19, 23, 23, 1, 39, 9, 39}, {16, 36, 20, 36, 16, 2, 20, 10}, {37, 3, 37, 11, 17, 17, 21, 21}, {4, 17, 4, 46, 14, 19, 14, 46}, {18, 16, 47, 47, 5, 15, 5, 15}, {17, 12, 44, 12, 19, 6, 44, 6}, {13, 7, 13, 7, 18, 16, 45, 45}, {4, 42, 4, 21, 14, 42, 14, 23}, {43, 43, 22, 20, 5, 15, 5, 15}, {40, 12, 21, 12, 40, 6, 23, 6}, {13, 7, 13, 7, 41, 41, 22, 20} }; const unsigned char subpix3[48][8] = { {0, 7, 1, 6, 3, 4, 2, 5}, {7, 4, 6, 5, 0, 3, 1, 2}, {4, 3, 5, 2, 7, 0, 6, 1}, {3, 0, 2, 1, 4, 7, 5, 6}, {1, 0, 6, 7, 2, 3, 5, 4}, {0, 3, 7, 4, 1, 2, 6, 5}, {3, 2, 4, 5, 0, 1, 7, 6}, {2, 1, 5, 6, 3, 0, 4, 7}, {6, 1, 7, 0, 5, 2, 4, 3}, {1, 2, 0, 3, 6, 5, 7, 4}, {2, 5, 3, 4, 1, 6, 0, 7}, {5, 6, 4, 7, 2, 1, 3, 0}, {7, 6, 0, 1, 4, 5, 3, 2}, {6, 5, 1, 2, 7, 4, 0, 3}, {5, 4, 2, 3, 6, 7, 1, 0}, {4, 7, 3, 0, 5, 6, 2, 1}, {6, 7, 5, 4, 1, 0, 2, 3}, {7, 0, 4, 3, 6, 1, 5, 2}, {0, 1, 3, 2, 7, 6, 4, 5}, {1, 6, 2, 5, 0, 7, 3, 4}, {2, 3, 1, 0, 5, 4, 6, 7}, {3, 4, 0, 7, 2, 5, 1, 6}, {4, 5, 7, 6, 3, 2, 0, 1}, {5, 2, 6, 1, 4, 3, 7, 0}, {7, 0, 6, 1, 4, 3, 5, 2}, {0, 3, 1, 2, 7, 4, 6, 5}, {3, 4, 2, 5, 0, 7, 1, 6}, {4, 7, 5, 6, 3, 0, 2, 1}, {6, 7, 1, 0, 5, 4, 2, 3}, {7, 4, 0, 3, 6, 5, 1, 2}, {4, 5, 3, 2, 7, 6, 0, 1}, {5, 6, 2, 1, 4, 7, 3, 0}, {1, 6, 0, 7, 2, 5, 3, 4}, {6, 5, 7, 4, 1, 2, 0, 3}, {5, 2, 4, 3, 6, 1, 7, 0}, {2, 1, 3, 0, 5, 6, 4, 7}, {0, 1, 7, 6, 3, 2, 4, 5}, {1, 2, 6, 5, 0, 3, 7, 4}, {2, 3, 5, 4, 1, 0, 6, 7}, {3, 0, 4, 7, 2, 1, 5, 6}, {1, 0, 2, 3, 6, 7, 5, 4}, {0, 7, 3, 4, 1, 6, 2, 5}, {7, 6, 4, 5, 0, 1, 3, 2}, {6, 1, 5, 2, 7, 0, 4, 3}, {5, 4, 6, 7, 2, 3, 1, 0}, {4, 3, 7, 0, 5, 2, 6, 1}, {3, 2, 0, 1, 4, 5, 7, 6}, {2, 5, 1, 6, 3, 4, 0, 7} }; /*! This function computes a Peano-Hilbert key for an integer triplet (x,y,z), * with x,y,z in the range between 0 and 2^bits-1. */ peanokey peano_hilbert_key(int x, int y, int z, int bits) { int mask; unsigned char rotation = 0; peanokey key = 0; for(mask = 1 << (bits - 1); mask > 0; mask >>= 1) { unsigned char pix = ((x & mask) ? 4 : 0) | ((y & mask) ? 2 : 0) | ((z & mask) ? 
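/* Sketch of plain bit interleaving for comparison with peano_hilbert_key()
   above: the loop structure is the same, but the Peano-Hilbert version also
   threads a rotation state through rottable3[]/subpix3[] so that consecutive
   keys stay spatially adjacent.  The interleaving order below follows the
   code's morton_key() further down (z, y, x); the example coordinates are
   arbitrary. */
#include <stdio.h>

typedef unsigned long long toykey;

static toykey morton_sketch(int x, int y, int z, int bits)
{
  toykey key = 0;
  for(int mask = 1 << (bits - 1); mask > 0; mask >>= 1)
    {
      key <<= 3;
      key |= ((z & mask) ? 4 : 0) | ((y & mask) ? 2 : 0) | ((x & mask) ? 1 : 0);
    }
  return key;
}

int main(void)
{
  printf("morton_sketch(3, 5, 6) with 3 bits = %llu\n", morton_sketch(3, 5, 6, 3));
  return 0;                                  /* prints 427 */
}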
1 : 0); key <<= 3; key |= subpix3[rotation][pix]; rotation = rottable3[rotation][pix]; } return key; } void peano_hilbert_key_extended(peanokey * keyp1, peanokey * keyp0, int x, int y, int z, int bits) { int mask; unsigned char rotation = 0; peanokey key1 = 0, key0 = 0; bits *= 2; /* double the dynamic range */ for(mask = 1 << (bits - 1); mask > 0; mask >>= 1) { unsigned char pix = ((x & mask) ? 4 : 0) | ((y & mask) ? 2 : 0) | ((z & mask) ? 1 : 0); key1 <<= 3; key1 |= (key0 >> 61) & 7; key0 <<= 3; key0 |= subpix3[rotation][pix]; rotation = rottable3[rotation][pix]; } *keyp0 = key0; *keyp1 = key1; } peanokey morton_key(int x, int y, int z, int bits) { int mask; peanokey morton = 0; for(mask = 1 << (bits - 1); mask > 0; mask >>= 1) { morton <<= 3; morton |= ((z & mask) ? 4 : 0) | ((y & mask) ? 2 : 0) | ((x & mask) ? 1 : 0); } return morton; } peanokey peano_and_morton_key(int x, int y, int z, int bits, peanokey * morton_key) { int mask; unsigned char rotation = 0; peanokey key = 0; peanokey morton = 0; for(mask = 1 << (bits - 1); mask > 0; mask >>= 1) { unsigned char pix = ((x & mask) ? 4 : 0) | ((y & mask) ? 2 : 0) | ((z & mask) ? 1 : 0); key <<= 3; key |= subpix3[rotation][pix]; rotation = rottable3[rotation][pix]; morton <<= 3; morton |= ((z & mask) ? 4 : 0) | ((y & mask) ? 2 : 0) | ((x & mask) ? 1 : 0); } *morton_key = morton; return key; } static int quadrants[24][2][2][2] = { /* rotx=0, roty=0-3 */ {{{0, 7}, {1, 6}}, {{3, 4}, {2, 5}}}, {{{7, 4}, {6, 5}}, {{0, 3}, {1, 2}}}, {{{4, 3}, {5, 2}}, {{7, 0}, {6, 1}}}, {{{3, 0}, {2, 1}}, {{4, 7}, {5, 6}}}, /* rotx=1, roty=0-3 */ {{{1, 0}, {6, 7}}, {{2, 3}, {5, 4}}}, {{{0, 3}, {7, 4}}, {{1, 2}, {6, 5}}}, {{{3, 2}, {4, 5}}, {{0, 1}, {7, 6}}}, {{{2, 1}, {5, 6}}, {{3, 0}, {4, 7}}}, /* rotx=2, roty=0-3 */ {{{6, 1}, {7, 0}}, {{5, 2}, {4, 3}}}, {{{1, 2}, {0, 3}}, {{6, 5}, {7, 4}}}, {{{2, 5}, {3, 4}}, {{1, 6}, {0, 7}}}, {{{5, 6}, {4, 7}}, {{2, 1}, {3, 0}}}, /* rotx=3, roty=0-3 */ {{{7, 6}, {0, 1}}, {{4, 5}, {3, 2}}}, {{{6, 5}, {1, 2}}, {{7, 4}, {0, 3}}}, {{{5, 4}, {2, 3}}, {{6, 7}, {1, 0}}}, {{{4, 7}, {3, 0}}, {{5, 6}, {2, 1}}}, /* rotx=4, roty=0-3 */ {{{6, 7}, {5, 4}}, {{1, 0}, {2, 3}}}, {{{7, 0}, {4, 3}}, {{6, 1}, {5, 2}}}, {{{0, 1}, {3, 2}}, {{7, 6}, {4, 5}}}, {{{1, 6}, {2, 5}}, {{0, 7}, {3, 4}}}, /* rotx=5, roty=0-3 */ {{{2, 3}, {1, 0}}, {{5, 4}, {6, 7}}}, {{{3, 4}, {0, 7}}, {{2, 5}, {1, 6}}}, {{{4, 5}, {7, 6}}, {{3, 2}, {0, 1}}}, {{{5, 2}, {6, 1}}, {{4, 3}, {7, 0}}} }; static int rotxmap_table[24] = { 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 17, 18, 19, 16, 23, 20, 21, 22 }; static int rotymap_table[24] = { 1, 2, 3, 0, 16, 17, 18, 19, 11, 8, 9, 10, 22, 23, 20, 21, 14, 15, 12, 13, 4, 5, 6, 7 }; static int rotx_table[8] = { 3, 0, 0, 2, 2, 0, 0, 1 }; static int roty_table[8] = { 0, 1, 1, 2, 2, 3, 3, 0 }; static int sense_table[8] = { -1, -1, -1, +1, +1, -1, -1, -1 }; static int flag_quadrants_inverse = 1; static char quadrants_inverse_x[24][8]; static char quadrants_inverse_y[24][8]; static char quadrants_inverse_z[24][8]; peanokey peano_hilbert_key_old(int x, int y, int z, int bits) { int i, quad, bitx, bity, bitz; int mask, rotation, rotx, roty, sense; peanokey key; mask = 1 << (bits - 1); key = 0; rotation = 0; sense = 1; for(i = 0; i < bits; i++, mask >>= 1) { bitx = (x & mask) ? 1 : 0; bity = (y & mask) ? 1 : 0; bitz = (z & mask) ? 1 : 0; quad = quadrants[rotation][bitx][bity][bitz]; key <<= 3; key += (sense == 1) ? 
(quad) : (7 - quad); rotx = rotx_table[quad]; roty = roty_table[quad]; sense *= sense_table[quad]; while(rotx > 0) { rotation = rotxmap_table[rotation]; rotx--; } while(roty > 0) { rotation = rotymap_table[rotation]; roty--; } } return key; } peanokey peano_and_morton_key_old(int x, int y, int z, int bits, peanokey * morton_key) { int i, quad, bitx, bity, bitz; int mask, rotation, rotx, roty, sense; peanokey key, morton; mask = 1 << (bits - 1); key = 0; rotation = 0; sense = 1; morton = 0; for(i = 0; i < bits; i++, mask >>= 1) { bitx = (x & mask) ? 1 : 0; bity = (y & mask) ? 1 : 0; bitz = (z & mask) ? 1 : 0; quad = quadrants[rotation][bitx][bity][bitz]; key <<= 3; key += (sense == 1) ? (quad) : (7 - quad); rotx = rotx_table[quad]; roty = roty_table[quad]; sense *= sense_table[quad]; while(rotx > 0) { rotation = rotxmap_table[rotation]; rotx--; } while(roty > 0) { rotation = rotymap_table[rotation]; roty--; } morton <<= 3; morton += (bitz << 2) + (bity << 1) + bitx; } *morton_key = morton; return key; } peanokey morton_key_old(int x, int y, int z, int bits) { int i, bitx, bity, bitz; int mask; peanokey morton; mask = 1 << (bits - 1); morton = 0; for(i = 0; i < bits; i++, mask >>= 1) { bitx = (x & mask) ? 1 : 0; bity = (y & mask) ? 1 : 0; bitz = (z & mask) ? 1 : 0; morton <<= 3; morton += (bitz << 2) + (bity << 1) + bitx; } return morton; } void peano_hilbert_key_inverse(peanokey key, int bits, int *x, int *y, int *z) { int i, bitx, bity, bitz, quad, rotation, shift; peanokey keypart, mask; char sense, rotx, roty; if(flag_quadrants_inverse) { flag_quadrants_inverse = 0; for(rotation = 0; rotation < 24; rotation++) for(bitx = 0; bitx < 2; bitx++) for(bity = 0; bity < 2; bity++) for(bitz = 0; bitz < 2; bitz++) { quad = quadrants[rotation][bitx][bity][bitz]; quadrants_inverse_x[rotation][quad] = bitx; quadrants_inverse_y[rotation][quad] = bity; quadrants_inverse_z[rotation][quad] = bitz; } } shift = 3 * (bits - 1); mask = ((long long) 7) << shift; rotation = 0; sense = 1; *x = *y = *z = 0; for(i = 0; i < bits; i++, mask >>= 3, shift -= 3) { keypart = (key & mask) >> shift; quad = (sense == 1) ? 
(keypart) : (7 - keypart); *x = (*x << 1) + quadrants_inverse_x[rotation][quad]; *y = (*y << 1) + quadrants_inverse_y[rotation][quad]; *z = (*z << 1) + quadrants_inverse_z[rotation][quad]; rotx = rotx_table[quad]; roty = roty_table[quad]; sense *= sense_table[quad]; while(rotx > 0) { rotation = rotxmap_table[rotation]; rotx--; } while(roty > 0) { rotation = rotymap_table[rotation]; roty--; } } } static void msort_peano_with_tmp(struct peano_hilbert_data *b, size_t n, struct peano_hilbert_data *t) { struct peano_hilbert_data *tmp; struct peano_hilbert_data *b1, *b2; size_t n1, n2; if(n <= 1) return; n1 = n / 2; n2 = n - n1; b1 = b; b2 = b + n1; msort_peano_with_tmp(b1, n1, t); msort_peano_with_tmp(b2, n2, t); tmp = t; while(n1 > 0 && n2 > 0) { #ifdef PEANOHILBERT_EXTEND_DYNAMIC_RANGE if(b1->key1 <= b2->key1 || (b1->key1 == b2->key1 && b1->key0 <= b2->key0)) #else if(b1->key <= b2->key) #endif { --n1; *tmp++ = *b1++; } else { --n2; *tmp++ = *b2++; } } if(n1 > 0) memcpy(tmp, b1, n1 * sizeof(struct peano_hilbert_data)); memcpy(b, t, (n - n2) * sizeof(struct peano_hilbert_data)); } void mysort_peano(void *b, size_t n, size_t s, int (*cmp) (const void *, const void *)) { /* this function could be replaced by a call of qsort(b, n, s, cmp), but the present * merge sort implementation is usually a bit faster for this array */ const size_t size = n * s; struct peano_hilbert_data *tmp = (struct peano_hilbert_data *) mymalloc("tmp", size); msort_peano_with_tmp((struct peano_hilbert_data *) b, n, tmp); myfree(tmp); } GalIC/src/domain/pqueue.c000644 000765 000024 00000012105 12373713530 016153 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ /* V. Springel modified some of the memory allocation calls to inline it with * our internal memory handler. */ #include #include #include #include "pqueue.h" #include "../allvars.h" #include "../proto.h" #define left(i) ((i) << 1) #define right(i) (((i) << 1) + 1) #define parent(i) ((i) >> 1) pqueue_t *pqueue_init(size_t n, pqueue_cmp_pri_f cmppri, pqueue_get_pri_f getpri, pqueue_set_pri_f setpri, pqueue_get_pos_f getpos, pqueue_set_pos_f setpos) { pqueue_t *q; q = mymalloc("q", sizeof(pqueue_t)); /* Need to allocate n+1 elements since element 0 isn't used. */ q->d = mymalloc("q->d", (n + 1) * sizeof(void *)); q->size = 1; q->avail = q->step = (n + 1); /* see comment above about n+1 */ q->cmppri = cmppri; q->setpri = setpri; q->getpri = getpri; q->getpos = getpos; q->setpos = setpos; return q; } void pqueue_free(pqueue_t * q) { myfree(q->d); myfree(q); } size_t pqueue_size(pqueue_t * q) { /* queue element 0 exists but doesn't count since it isn't used. 
*/ return (q->size - 1); } static void bubble_up(pqueue_t * q, size_t i) { size_t parent_node; void *moving_node = q->d[i]; pqueue_pri_t moving_pri = q->getpri(moving_node); for(parent_node = parent(i); ((i > 1) && q->cmppri(q->getpri(q->d[parent_node]), moving_pri)); i = parent_node, parent_node = parent(i)) { q->d[i] = q->d[parent_node]; q->setpos(q->d[i], i); } q->d[i] = moving_node; q->setpos(moving_node, i); } static size_t maxchild(pqueue_t * q, size_t i) { size_t child_node = left(i); if(child_node >= q->size) return 0; if((child_node + 1) < q->size && q->cmppri(q->getpri(q->d[child_node]), q->getpri(q->d[child_node + 1]))) child_node++; /* use right child instead of left */ return child_node; } static void percolate_down(pqueue_t * q, size_t i) { size_t child_node; void *moving_node = q->d[i]; pqueue_pri_t moving_pri = q->getpri(moving_node); while((child_node = maxchild(q, i)) && q->cmppri(moving_pri, q->getpri(q->d[child_node]))) { q->d[i] = q->d[child_node]; q->setpos(q->d[i], i); i = child_node; } q->d[i] = moving_node; q->setpos(moving_node, i); } int pqueue_insert(pqueue_t * q, void *d) { size_t i; size_t newsize; if(!q) return 1; /* allocate more memory if necessary */ if(q->size >= q->avail) { newsize = q->size + q->step; q->d = myrealloc(q->d, sizeof(void *) * newsize); q->avail = newsize; } /* insert item */ i = q->size++; q->d[i] = d; bubble_up(q, i); return 0; } void pqueue_change_priority(pqueue_t * q, pqueue_pri_t new_pri, void *d) { size_t posn; pqueue_pri_t old_pri = q->getpri(d); q->setpri(d, new_pri); posn = q->getpos(d); if(q->cmppri(old_pri, new_pri)) bubble_up(q, posn); else percolate_down(q, posn); } int pqueue_remove(pqueue_t * q, void *d) { size_t posn = q->getpos(d); q->d[posn] = q->d[--q->size]; if(q->cmppri(q->getpri(d), q->getpri(q->d[posn]))) bubble_up(q, posn); else percolate_down(q, posn); return 0; } void *pqueue_pop(pqueue_t * q) { void *head; if(!q || q->size == 1) return NULL; head = q->d[1]; q->d[1] = q->d[--q->size]; percolate_down(q, 1); return head; } void *pqueue_peek(pqueue_t * q) { void *d; if(!q || q->size == 1) return NULL; d = q->d[1]; return d; } void pqueue_dump(pqueue_t * q, FILE * out, pqueue_print_entry_f print) { int i; fprintf(stdout, "posn\tleft\tright\tparent\tmaxchild\t...\n"); for(i = 1; i < q->size; i++) { fprintf(stdout, "%d\t%d\t%d\t%d\t%ul\t", i, left(i), right(i), parent(i), (unsigned int) maxchild(q, i)); print(out, q->d[i]); } } static void set_pos(void *d, size_t val) { /* do nothing */ } static void set_pri(void *d, pqueue_pri_t pri) { /* do nothing */ } void pqueue_print(pqueue_t * q, FILE * out, pqueue_print_entry_f print) { pqueue_t *dup; void *e; dup = pqueue_init(q->size, q->cmppri, q->getpri, set_pri, q->getpos, set_pos); dup->size = q->size; dup->avail = q->avail; dup->step = q->step; memcpy(dup->d, q->d, (q->size * sizeof(void *))); while((e = pqueue_pop(dup))) print(out, e); pqueue_free(dup); } static int subtree_is_valid(pqueue_t * q, int pos) { if(left(pos) < q->size) { /* has a left child */ if(q->cmppri(q->getpri(q->d[pos]), q->getpri(q->d[left(pos)]))) return 0; if(!subtree_is_valid(q, left(pos))) return 0; } if(right(pos) < q->size) { /* has a right child */ if(q->cmppri(q->getpri(q->d[pos]), q->getpri(q->d[right(pos)]))) return 0; if(!subtree_is_valid(q, right(pos))) return 0; } return 1; } int pqueue_is_valid(pqueue_t * q) { return subtree_is_valid(q, 1); } GalIC/src/forcetree/forcetree.c000644 000765 000024 00000075576 12373713530 017361 0ustar00volkerstaff000000 000000 
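/* Illustrative usage sketch for the priority-queue interface above (pqueue.c). This is
 * not part of the original GalIC sources: the node layout and the five callbacks simply
 * mirror the ones defined in forcetree_optimizebalance.c, while the function name
 * sketch_pqueue_demo and the "work packet" loop are made up for illustration, and plain
 * malloc/free is used instead of the internal mymalloc/myfree handler. With the
 * comparison callback returning (next > curr) the heap behaves as a min-heap, so
 * pqueue_peek() always returns the entry with the smallest priority. */
#include <stdio.h>
#include <stdlib.h>
#include "pqueue.h"

typedef struct sketch_node
{
  double pri;                   /* current priority (here: accumulated load) */
  int val;                      /* user payload (here: a task number) */
  size_t pos;                   /* heap position, maintained by the queue */
} sketch_node;

static int    sketch_cmp(double next, double curr) { return next > curr; }
static double sketch_getpri(void *a)               { return ((sketch_node *) a)->pri; }
static void   sketch_setpri(void *a, double pri)   { ((sketch_node *) a)->pri = pri; }
static size_t sketch_getpos(void *a)               { return ((sketch_node *) a)->pos; }
static void   sketch_setpos(void *a, size_t pos)   { ((sketch_node *) a)->pos = pos; }

void sketch_pqueue_demo(int ntask)
{
  pqueue_t *q = pqueue_init(ntask, sketch_cmp, sketch_getpri, sketch_setpri, sketch_getpos, sketch_setpos);
  sketch_node *n = (sketch_node *) malloc(ntask * sizeof(sketch_node));
  int i;

  for(i = 0; i < ntask; i++)    /* all tasks start out with zero load */
    {
      n[i].pri = 0;
      n[i].val = i;
      pqueue_insert(q, &n[i]);
    }

  for(i = 0; i < 3; i++)        /* hand out three equal work packets greedily */
    {
      sketch_node *least = (sketch_node *) pqueue_peek(q);       /* least loaded task so far */
      printf("packet %d goes to task %d\n", i, least->val);
      pqueue_change_priority(q, least->pri + 1.0, least);        /* its load grows by one packet */
    }

  free(n);
  pqueue_free(q);
}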
/******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #ifdef USE_SSE #include #include #endif #include "../allvars.h" #include "../proto.h" #include "../domain/domain.h" static int *th_list; static unsigned char *level_list; /*! \file forcetree.c * \brief gravitational tree build * * This file contains the construction of the tree used for calculating the gravitational force. * The type tree implemented is a geometrical oct-tree, starting from a cube encompassing * all particles. This cube is automatically found in the domain decomposition, which also * splits up the global "top-level" tree along node boundaries, moving the particles * of different parts of the tree to separate processors. In this version of the code, the tree * construction may be repeated every timestep without a renewed domain decomposition. * If particles are on the "wrong" processor because a new domain decomposition has not been * carried out, they are sent as temporary points to the right insertion processor according * to the layout of the top-level nodes. In addition, the mapping of the top-level nodes to * processors may be readjusted in order to improve work-load balance for the current time step. * */ /*! This function is a driver routine for constructing the gravitational oct-tree. * * \return number of local+top nodes of the constructed tree */ int force_treebuild(int npart /*!< number of particles on local task */, int optimized_domain_mapping /*!< specifies if mapping of the top-level nodes to processors may be readjusted */) { int i, flag; mpi_printf("FORCETREE: Tree construction. 
(presently allocated=%g MB)\n", AllocatedBytes / (1024.0 * 1024.0)); do /* try constructing tree until successful */ { int flag_single = force_treebuild_construct(npart, optimized_domain_mapping); MPI_Allreduce(&flag_single, &flag, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD); if(flag < 0) { /* tree construction was not successful and needs to be repeated */ if(flag_single != -2) { myfree(Tree_Points); } force_treefree(); All.TreeAllocFactor *= 1.15; mpi_printf("FORCETREE: Increasing TreeAllocFactor, new value=%g\n", All.TreeAllocFactor); force_treeallocate(npart, All.MaxPart); } } while(flag < 0); Nextnode = (int *) mymalloc_movable(&Nextnode, "Nextnode", (Tree_MaxPart + NTopleaves + Tree_NumPartImported) * sizeof(int)); Father = (int *) mymalloc_movable(&Father, "Father", (Tree_MaxPart + Tree_NumPartImported) * sizeof(int)); for(i = 0; i < Tree_MaxPart + Tree_NumPartImported; i++) Father[i] = -1; /* insert the pseudo particles that represent the mass distribution of other domains */ force_insert_pseudo_particles(); /* now compute the multipole moments recursively */ int last = -1; force_update_node_recursive(Tree_MaxPart, -1, -1, &last); if(last >= Tree_MaxPart) { if(last >= Tree_MaxPart + Tree_MaxNodes) /* a pseudo-particle or imported particle */ Nextnode[last - Tree_MaxNodes] = -1; else Nodes[last].u.d.nextnode = -1; } else Nextnode[last] = -1; force_exchange_topleafdata(); Tree_NextFreeNode = Tree_MaxPart + 1; force_treeupdate_toplevel(Tree_MaxPart, 0, 1, 0, 0, 0); int max_imported; long long tot_imported; sumup_large_ints(1, &Tree_NumPartImported, &tot_imported); MPI_Reduce(&Tree_NumPartImported, &max_imported, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD); double numnodes = Tree_NumNodes, tot_numnodes; MPI_Reduce(&numnodes, &tot_numnodes, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); mpi_printf ("FORCETREE: Tree construction done. (imported/local ratio: max=%g avg_global=%g) =%g NTopnodes=%d NTopleaves=%d tree-build-scalability=%g\n", max_imported / ((double) (All.TotNumPart / NTask) + 1.0e-60), tot_imported / ((double) All.TotNumPart), tot_numnodes / NTask, NTopnodes, NTopleaves, ((double) ((tot_numnodes - NTask * ((double)NTopnodes)) + NTopnodes)) / tot_numnodes); return Tree_NumNodes; } /*! Constructs the gravitational oct-tree. * * The index convention for accessing tree nodes is the following: \n * node index \n * [0... Tree_MaxPart-1] references single particles, the indices \n * [Tree_MaxPart... Tree_MaxPart+Tree_MaxNodes-1] references tree nodes \n * [Tree_MaxPart+Tree_MaxNodes... Tree_MaxPart+Tree_MaxNodes+NTopleaves-1] references "pseudo particles", i.e. mark branches on foreign CPUs \n * [Tree_MaxPart+Tree_MaxNodes+NTopleaves... Tree_MaxPart+Tree_MaxNodes+NTopleaves+Tree_NumPartImported-1] references imported points \n * * the pointer `Nodes' is shifted such that Nodes[Tree_MaxPart] gives the first tree node (i.e. the root node). 
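 *
 * During a tree walk this index convention translates into the following branch order,
 * with the tests applied in sequence (compare force_treeevaluate() in forcetree_walk.c):
 *   no <  Tree_MaxPart                  -> local particle: Tree_Pos_list[3*no], P[no].Mass
 *   no <  Tree_MaxPart + Tree_MaxNodes  -> internal tree node: Nodes[no]
 *   no >= Tree_ImportedNodeOffset       -> imported point: Tree_Points[no - Tree_ImportedNodeOffset]
 *   otherwise                           -> pseudo particle standing in for a top-leaf owned by another task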
* * \return if successful returns the number of local+top nodes of the constructed tree \n * -1 if the number of allocated tree nodes is too small \n * -2 if the number of allocated tree nodes is even too small to fit the top nodes \n * -3 if a particle out of domain box condition was encountered */ int force_treebuild_construct(int npart /*!< number of particles on local task */, int optimized_domain_mapping /*!< specifies if mapping of the top-level nodes to processors may be readjusted */) { int i, j, no; int ngrp, recvTask; unsigned long long *intposp; MyDouble *posp; optimized_domain_mapping = 0; /* create an empty root node */ Tree_NextFreeNode = Tree_MaxPart; /* index of first free node */ struct NODE *nfreep = &Nodes[Tree_NextFreeNode]; /* select first node */ for(j = 0; j < 8; j++) nfreep->u.suns[j] = -1; nfreep->len = DomainLen; for(j = 0; j < 3; j++) nfreep->center[j] = DomainCenter[j]; Tree_NumNodes = 1; Tree_NextFreeNode++; /* create a set of empty nodes corresponding to the top-level domain * grid. We need to generate these nodes first to make sure that we have a * complete top-level tree which allows the easy insertion of the * pseudo-particles at the right place */ if(force_create_empty_nodes(Tree_MaxPart, 0, 1, 0, 0, 0) < 0) return -2; Tree_FirstNonTopLevelNode = Tree_NextFreeNode; /* if a high-resolution region in a global tree is used, we need to generate * an additional set of empty nodes to make sure that we have a complete * top-level tree for the high-resolution inset */ /* we first do a dummy allocation here that we'll resize later if needed, in which case the following arrays will have to be moved once. */ int guess_nimported = 1.2 * NumPart; Tree_Points = (struct treepoint_data *) mymalloc_movable(&Tree_Points, "Tree_Points", guess_nimported * sizeof(struct treepoint_data)); th_list = (int *) mymalloc_movable(&th_list, "th_list", npart * sizeof(int)); level_list = (unsigned char *) mymalloc_movable(&level_list, "level_list", npart * sizeof(int)); Tree_IntPos_list = (unsigned long long *) mymalloc_movable(&Tree_IntPos_list, "Tree_IntPos_list", 3 * npart * sizeof(unsigned long long)); for(i = 0, posp = Tree_Pos_list; i < npart; i++) { for(j = 0; j < 3; j++, posp++) *posp = P[i].Pos[j]; } /* now we determine for each point the insertion top-level node, and the task on which this lies */ for(i = 0, posp = Tree_Pos_list, intposp = Tree_IntPos_list; i < npart; i++) { unsigned long long xxb = force_double_to_int(((*posp++ - DomainCorner[0]) * DomainInverseLen) + 1.0); unsigned long long yyb = force_double_to_int(((*posp++ - DomainCorner[1]) * DomainInverseLen) + 1.0); unsigned long long zzb = force_double_to_int(((*posp++ - DomainCorner[2]) * DomainInverseLen) + 1.0); unsigned long long mask = ((unsigned long long) 1) << (52 - 1); unsigned char shiftx = (52 - 1); unsigned char shifty = (52 - 2); unsigned char shiftz = (52 - 3); unsigned char levels = 0; *intposp++ = xxb; *intposp++ = yyb; *intposp++ = zzb; no = 0; while(TopNodes[no].Daughter >= 0) /* walk down top tree to find correct leaf */ { unsigned char subnode = (((unsigned char) ((xxb & mask) >> (shiftx--))) | ((unsigned char) ((yyb & mask) >> (shifty--))) | ((unsigned char) ((zzb & mask) >> (shiftz--)))); mask >>= 1; levels++; no = TopNodes[no].Daughter + TopNodes[no].MortonToPeanoSubnode[subnode]; } no = TopNodes[no].Leaf; th_list[i] = no; level_list[i] = levels; } memcpy(DomainNewTask, DomainTask, NTopleaves * sizeof(int)); for(j = 0; j < NTask; j++) Mesh_Send_count[j] = 0; for(i = 0; i < npart; i++) /* make 
list of insertion top leaf and task for all particles */ { no = th_list[i]; th_list[i] = DomainNodeIndex[no]; int task = DomainNewTask[no]; Tree_Task_list[i] = task; if(task != ThisTask) { terminate("particle i=%d on task=%d should be on task=%d", i, ThisTask, task); Mesh_Send_count[task]++; } } MPI_Alltoall(Mesh_Send_count, 1, MPI_INT, Mesh_Recv_count, 1, MPI_INT, MPI_COMM_WORLD); for(j = 0, Tree_NumPartImported = 0, Tree_NumPartExported = 0, Mesh_Recv_offset[0] = 0, Mesh_Send_offset[0] = 0; j < NTask; j++) { Tree_NumPartImported += Mesh_Recv_count[j]; Tree_NumPartExported += Mesh_Send_count[j]; if(j > 0) { Mesh_Send_offset[j] = Mesh_Send_offset[j - 1] + Mesh_Send_count[j - 1]; Mesh_Recv_offset[j] = Mesh_Recv_offset[j - 1] + Mesh_Recv_count[j - 1]; } } if(Tree_NumPartImported > guess_nimported) Tree_Points = (struct treepoint_data *) myrealloc_movable(Tree_Points, Tree_NumPartImported * sizeof(struct treepoint_data)); struct treepoint_data *export_Tree_Points = (struct treepoint_data *) mymalloc("export_Tree_Points", Tree_NumPartExported * sizeof(struct treepoint_data)); for(j = 0; j < NTask; j++) { Mesh_Send_count[j] = 0; } for(i = 0; i < npart; i++) /* prepare particle data to be copied to other tasks */ { int task = Tree_Task_list[i]; if(task != ThisTask) { int n = Mesh_Send_offset[task] + Mesh_Send_count[task]++; /* this point has to go to another task */ export_Tree_Points[n].Pos[0] = Tree_Pos_list[3 * i + 0]; export_Tree_Points[n].Pos[1] = Tree_Pos_list[3 * i + 1]; export_Tree_Points[n].Pos[2] = Tree_Pos_list[3 * i + 2]; export_Tree_Points[n].IntPos[0] = Tree_IntPos_list[3 * i + 0]; export_Tree_Points[n].IntPos[1] = Tree_IntPos_list[3 * i + 1]; export_Tree_Points[n].IntPos[2] = Tree_IntPos_list[3 * i + 2]; export_Tree_Points[n].Mass = P[i].Mass; export_Tree_Points[n].index = i; export_Tree_Points[n].Type = P[i].Type; export_Tree_Points[n].th = th_list[i]; export_Tree_Points[n].level = level_list[i]; } } /* exchange data */ for(ngrp = 1; ngrp < (1 << PTask); ngrp++) { recvTask = ThisTask ^ ngrp; if(recvTask < NTask) if(Mesh_Send_count[recvTask] > 0 || Mesh_Recv_count[recvTask] > 0) MPI_Sendrecv(&export_Tree_Points[Mesh_Send_offset[recvTask]], Mesh_Send_count[recvTask] * sizeof(struct treepoint_data), MPI_BYTE, recvTask, TAG_DENS_A, &Tree_Points[Mesh_Recv_offset[recvTask]], Mesh_Recv_count[recvTask] * sizeof(struct treepoint_data), MPI_BYTE, recvTask, TAG_DENS_A, MPI_COMM_WORLD, MPI_STATUS_IGNORE); } myfree(export_Tree_Points); Tree_ImportedNodeOffset = Tree_MaxPart + Tree_MaxNodes + NTopleaves; int full_flag = 0; /* now we insert all particles */ for(i = 0; i < npart; i++) { if(Tree_Task_list[i] == ThisTask) { if(P[i].Type != 5) if(force_treebuild_insert_single_point(i, &Tree_IntPos_list[3 * i], th_list[i], level_list[i]) < 0) { full_flag = 1; break; } } } if(full_flag == 0) /* only continue if previous step was successful */ { for(i = 0; i < Tree_NumPartImported; i++) { if(force_treebuild_insert_single_point(i + Tree_ImportedNodeOffset, Tree_Points[i].IntPos, Tree_Points[i].th, Tree_Points[i].level) < 0) { full_flag = 1; break; } } } myfree_movable(Tree_IntPos_list); myfree_movable(level_list); myfree_movable(th_list); if(full_flag) return -1; return Tree_NumNodes; } /*! 
inserts a single particle into the gravitational tree * * \return 0 if successful \n * -1 if too few nodes have been allocated in the Nodes array */ int force_treebuild_insert_single_point(int i /*!< index of particle */, unsigned long long *intpos /*!< integer representation of particle position */, int th /*!< target node */, unsigned char levels /*!< level of target node */) { int j, parent = -1; unsigned char subnode = 0; unsigned long long xxb = intpos[0]; unsigned long long yyb = intpos[1]; unsigned long long zzb = intpos[2]; unsigned long long mask = ((unsigned long long) 1) << ((52 - 1) - levels); unsigned char shiftx = (52 - 1) - levels; unsigned char shifty = (52 - 2) - levels; unsigned char shiftz = (52 - 3) - levels; signed long long centermask = (0xFFF0000000000000llu); unsigned long long *intppos; centermask >>= levels; while(1) { if(th >= Tree_MaxPart && th < Tree_ImportedNodeOffset) /* we are dealing with an internal node */ { subnode = (((unsigned char) ((xxb & mask) >> (shiftx--))) | ((unsigned char) ((yyb & mask) >> (shifty--))) | ((unsigned char) ((zzb & mask) >> (shiftz--)))); centermask >>= 1; mask >>= 1; levels++; if(levels > MAX_TREE_LEVEL) { /* seems like we're dealing with particles at identical (or extremely close) * locations. Shift subnode index to allow tree construction. Note: Multipole moments * of tree are still correct, but one should MAX_TREE_LEVEL large enough to have * DomainLen/2^MAX_TREE_LEEL < gravitational softening length */ for(j = 0; j < 8; j++) { if(Nodes[th].u.suns[subnode] < 0) break; subnode++; if(subnode >= 8) subnode = 7; } } int nn = Nodes[th].u.suns[subnode]; if(nn >= 0) /* ok, something is in the daughter slot already, need to continue */ { parent = th; th = nn; } else { /* here we have found an empty slot where we can attach * the new particle as a leaf. */ Nodes[th].u.suns[subnode] = i; break; /* done for this particle */ } } else { /* We try to insert into a leaf with a single particle. Need * to generate a new internal node at this point. 
*/ Nodes[parent].u.suns[subnode] = Tree_NextFreeNode; struct NODE *nfreep = &Nodes[Tree_NextFreeNode]; /* one possibility is: double len = 2 * ((force_int_to_double(mask) - 1.0) * DomainLen); double cx = (force_int_to_double((xxb & centermask) | mask) - 1.0) * DomainLen + DomainCorner[0]; double cy = (force_int_to_double((yyb & centermask) | mask) - 1.0) * DomainLen + DomainCorner[1]; double cz = (force_int_to_double((zzb & centermask) | mask) - 1.0) * DomainLen + DomainCorner[2]; */ /* the other is: */ double len = ((double) (mask << 1)) * DomainBigFac; double cx = ((double) ((xxb & centermask) | mask)) * DomainBigFac + DomainCorner[0]; double cy = ((double) ((yyb & centermask) | mask)) * DomainBigFac + DomainCorner[1]; double cz = ((double) ((zzb & centermask) | mask)) * DomainBigFac + DomainCorner[2]; nfreep->len = len; nfreep->center[0] = cx; nfreep->center[1] = cy; nfreep->center[2] = cz; for(j = 0; j < 8; j++) nfreep->u.suns[j] = -1; if(th >= Tree_ImportedNodeOffset) intppos = Tree_Points[th - Tree_ImportedNodeOffset].IntPos; else intppos = &Tree_IntPos_list[3 * th]; subnode = (((unsigned char) ((intppos[0] & mask) >> shiftx)) | ((unsigned char) ((intppos[1] & mask) >> shifty)) | ((unsigned char) ((intppos[2] & mask) >> shiftz))); nfreep->u.suns[subnode] = th; th = Tree_NextFreeNode; /* resume trying to insert the new particle the newly created internal node */ Tree_NumNodes++; Tree_NextFreeNode++; if(Tree_NumNodes >= Tree_MaxNodes) { if(All.TreeAllocFactor > MAX_TREE_ALLOC_FACTOR) { char buf[500]; sprintf(buf, "task %d: looks like a serious problem for particle %d, stopping with particle dump. Tree_NumNodes=%d Tree_MaxNodes=%d Tree_NumPartImported=%d NumPart=%d\n", ThisTask, i, Tree_NumNodes, Tree_MaxNodes, Tree_NumPartImported, NumPart); dump_particles(); terminate(buf); } return -1; } } } return 0; } /*! This function recursively creates a set of empty tree nodes which * corresponds to the top-level tree for the domain grid. This is done to * ensure that this top-level tree is always "complete" so that we can easily * associate the pseudo-particles of other CPUs with tree-nodes at a given * level in the tree, even when the particle population is so sparse that * some of these nodes are actually empty. 
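 *
 * Each daughter created here gets half the side length of its parent and a centre shifted
 * by a quarter of the parent's side length along every axis; it is stored in slot
 * count = i + 2*j + 4*k of the parent's u.suns[] array, while the Peano-Hilbert sub-index
 * sub = 7 & peano_hilbert_key((x<<1)+i, (y<<1)+j, (z<<1)+k, bits) identifies the matching
 * daughter in the TopNodes[] array, so tree nodes and domain top-nodes stay in one-to-one
 * correspondence (top-leaves additionally record their tree node in DomainNodeIndex[]).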
* * \return 0 if successful \n * -1 if number of allocated tree nodes is too small to fit the newly created nodes */ int force_create_empty_nodes(int no /*!< parent node for which daughter nodes shall be created */, int topnode /*!< index of the parent node in the #TopNodes array */, int bits /*!< 2^bits is the number of nodes per dimension at the level of the daughter nodes */, int x /*!< position of the parent node in the x direction, falls in the range [0,2^(bits-1) - 1] */, int y /*!< position of the parent node in the y direction, falls in the range [0,2^(bits-1) - 1] */, int z /*!< position of the parent node in the z direction, falls in the range [0,2^(bits-1) - 1] */) { int i, j, k, n, sub, count; if(TopNodes[topnode].Daughter >= 0) { for(i = 0; i < 2; i++) /* loop over daughter nodes */ for(j = 0; j < 2; j++) for(k = 0; k < 2; k++) { if(Tree_NumNodes >= Tree_MaxNodes) { if(All.TreeAllocFactor > MAX_TREE_ALLOC_FACTOR) { char buf[500]; sprintf(buf, "task %d: looks like a serious problem (NTopnodes=%d), stopping with particle dump.\n", ThisTask, NTopnodes); dump_particles(); terminate(buf); } return -1; } sub = 7 & peano_hilbert_key((x << 1) + i, (y << 1) + j, (z << 1) + k, bits); count = i + 2 * j + 4 * k; Nodes[no].u.suns[count] = Tree_NextFreeNode; double lenhalf = 0.25 * Nodes[no].len; Nodes[Tree_NextFreeNode].len = 0.5 * Nodes[no].len; Nodes[Tree_NextFreeNode].center[0] = Nodes[no].center[0] + (2 * i - 1) * lenhalf; Nodes[Tree_NextFreeNode].center[1] = Nodes[no].center[1] + (2 * j - 1) * lenhalf; Nodes[Tree_NextFreeNode].center[2] = Nodes[no].center[2] + (2 * k - 1) * lenhalf; for(n = 0; n < 8; n++) Nodes[Tree_NextFreeNode].u.suns[n] = -1; if(TopNodes[TopNodes[topnode].Daughter + sub].Daughter == -1) DomainNodeIndex[TopNodes[TopNodes[topnode].Daughter + sub].Leaf] = Tree_NextFreeNode; Tree_NextFreeNode++; Tree_NumNodes++; if(force_create_empty_nodes(Tree_NextFreeNode - 1, TopNodes[topnode].Daughter + sub, bits + 1, 2 * x + i, 2 * y + j, 2 * z + k) < 0) return -1; /* create granddaughter nodes for current daughter node */ } } return 0; } /*! this function inserts pseudo-particles which will represent the mass * distribution of the other CPUs. Initially, the mass of the * pseudo-particles is set to zero, and their coordinate is set to the * center of the domain-cell they correspond to. These quantities will be * updated later on. */ void force_insert_pseudo_particles(void) { int i, index; for(i = 0; i < NTopleaves; i++) { index = DomainNodeIndex[i]; if(DomainNewTask[i] != ThisTask) Nodes[index].u.suns[0] = Tree_MaxPart + Tree_MaxNodes + i; } } /*! this routine determines the multipole moments for a given internal node * and all its subnodes using a recursive computation. The result is * stored in the Nodes[] structure in the sequence of this tree-walk. */ void force_update_node_recursive(int no /*!< node for which the moments shall be found */, int sib /*!< sibling of node no */, int father /*!< father node of node no */, int *last /*!< last node for which this function was called, or -1 when called for root node */) { int j, jj, p, pp, nextsib, suns[8]; double s[3], mass; if(no >= Tree_MaxPart && no < Tree_MaxPart + Tree_MaxNodes) /* internal node */ { for(j = 0; j < 8; j++) suns[j] = Nodes[no].u.suns[j]; /* this "backup" is necessary because the nextnode entry will overwrite one element (union!) 
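 * (the union u in struct NODE holds either the eight daughter indices u.suns[] or the
 * multipole data u.d; once u.d.nextnode, u.d.mass etc. are filled in below, the daughter
 * list is no longer available, hence the copy into the local suns[] array first)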
*/ if(*last >= 0) { if(*last >= Tree_MaxPart) { if(*last >= Tree_MaxPart + Tree_MaxNodes) Nextnode[*last - Tree_MaxNodes] = no; /* a pseudo-particle or imported point */ else Nodes[*last].u.d.nextnode = no; } else Nextnode[*last] = no; } *last = no; mass = 0; s[0] = 0; s[1] = 0; s[2] = 0; for(j = 0; j < 8; j++) { if((p = suns[j]) >= 0) { /* check if we have a sibling on the same level */ for(jj = j + 1; jj < 8; jj++) if((pp = suns[jj]) >= 0) break; if(jj < 8) /* yes, we do */ nextsib = pp; else nextsib = sib; force_update_node_recursive(p, nextsib, no, last); if(p < Tree_MaxPart) /* a particle */ { MyDouble *pos = &Tree_Pos_list[3 * p]; mass += P[p].Mass; s[0] += P[p].Mass * pos[0]; s[1] += P[p].Mass * pos[1]; s[2] += P[p].Mass * pos[2]; } else if(p < Tree_MaxPart + Tree_MaxNodes) /* an internal node */ { mass += Nodes[p].u.d.mass; s[0] += Nodes[p].u.d.mass * Nodes[p].u.d.s[0]; s[1] += Nodes[p].u.d.mass * Nodes[p].u.d.s[1]; s[2] += Nodes[p].u.d.mass * Nodes[p].u.d.s[2]; } else if(p < Tree_MaxPart + Tree_MaxNodes + NTopleaves) /* a pseudo particle */ { /* nothing to be done here because the mass of the * pseudo-particle is still zero. This will be changed * later. */ } else { /* an imported point */ int n = p - (Tree_MaxPart + Tree_MaxNodes + NTopleaves); if(n >= Tree_NumPartImported) terminate("n >= Tree_NumPartImported"); mass += Tree_Points[n].Mass; s[0] += Tree_Points[n].Mass * Tree_Points[n].Pos[0]; s[1] += Tree_Points[n].Mass * Tree_Points[n].Pos[1]; s[2] += Tree_Points[n].Mass * Tree_Points[n].Pos[2]; } } } if(mass) { s[0] /= mass; s[1] /= mass; s[2] /= mass; } else { s[0] = Nodes[no].center[0]; s[1] = Nodes[no].center[1]; s[2] = Nodes[no].center[2]; } Nodes[no].u.d.mass = mass; Nodes[no].u.d.s[0] = s[0]; Nodes[no].u.d.s[1] = s[1]; Nodes[no].u.d.s[2] = s[2]; Nodes[no].u.d.sibling = sib; Nodes[no].u.d.father = father; } else /* single particle or pseudo particle */ { if(*last >= 0) { if(*last >= Tree_MaxPart) { if(*last >= Tree_MaxPart + Tree_MaxNodes) Nextnode[*last - Tree_MaxNodes] = no; /* a pseudo-particle or an imported point */ else Nodes[*last].u.d.nextnode = no; } else Nextnode[*last] = no; } *last = no; if(no < Tree_MaxPart) /* only set it for single particles... */ Father[no] = father; if(no >= Tree_MaxPart + Tree_MaxNodes + NTopleaves) /* ...or for imported points */ Father[no - Tree_MaxNodes - NTopleaves] = father; } } /*! This function communicates the values of the multipole moments of the * top-level tree-nodes of the domain grid. This data can then be used to * update the pseudo-particles on each CPU accordingly. 
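 * Each task first packs the moments of the top-leaves it owns into a local buffer; the
 * buffers are then combined with MPI_Allgatherv using per-task byte counts, and every
 * task finally copies the moments belonging to foreign leaves into its own Nodes[] entries.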
*/ void force_exchange_topleafdata(void) { int n, no, idx, task; int *recvcounts, *recvoffset, *bytecounts, *byteoffset; struct DomainNODE { MyDouble s[3]; MyDouble mass; } *DomainMoment, *loc_DomainMoment; DomainMoment = (struct DomainNODE *) mymalloc("DomainMoment", NTopleaves * sizeof(struct DomainNODE)); /* share the pseudo-particle data accross CPUs */ recvcounts = (int *) mymalloc("recvcounts", sizeof(int) * NTask); recvoffset = (int *) mymalloc("recvoffset", sizeof(int) * NTask); bytecounts = (int *) mymalloc("bytecounts", sizeof(int) * NTask); byteoffset = (int *) mymalloc("byteoffset", sizeof(int) * NTask); for(task = 0; task < NTask; task++) recvcounts[task] = 0; for(n = 0; n < NTopleaves; n++) recvcounts[DomainNewTask[n]]++; for(task = 0; task < NTask; task++) bytecounts[task] = recvcounts[task] * sizeof(struct DomainNODE); for(task = 1, recvoffset[0] = 0, byteoffset[0] = 0; task < NTask; task++) { recvoffset[task] = recvoffset[task - 1] + recvcounts[task - 1]; byteoffset[task] = byteoffset[task - 1] + bytecounts[task - 1]; } loc_DomainMoment = (struct DomainNODE *) mymalloc("loc_DomainMoment", recvcounts[ThisTask] * sizeof(struct DomainNODE)); for(n = 0, idx = 0; n < NTopleaves; n++) { if(DomainNewTask[n] == ThisTask) { no = DomainNodeIndex[n]; /* read out the multipole moments from the local base cells */ loc_DomainMoment[idx].s[0] = Nodes[no].u.d.s[0]; loc_DomainMoment[idx].s[1] = Nodes[no].u.d.s[1]; loc_DomainMoment[idx].s[2] = Nodes[no].u.d.s[2]; loc_DomainMoment[idx].mass = Nodes[no].u.d.mass; idx++; } } MPI_Allgatherv(loc_DomainMoment, bytecounts[ThisTask], MPI_BYTE, DomainMoment, bytecounts, byteoffset, MPI_BYTE, MPI_COMM_WORLD); for(task = 0; task < NTask; task++) recvcounts[task] = 0; for(n = 0; n < NTopleaves; n++) { task = DomainNewTask[n]; if(task != ThisTask) { no = DomainNodeIndex[n]; idx = recvoffset[task] + recvcounts[task]++; Nodes[no].u.d.s[0] = DomainMoment[idx].s[0]; Nodes[no].u.d.s[1] = DomainMoment[idx].s[1]; Nodes[no].u.d.s[2] = DomainMoment[idx].s[2]; Nodes[no].u.d.mass = DomainMoment[idx].mass; } } myfree(loc_DomainMoment); myfree(byteoffset); myfree(bytecounts); myfree(recvoffset); myfree(recvcounts); myfree(DomainMoment); } /*! This function updates the top-level tree after the multipole moments of * the pseudo-particles have been updated. 
*/ void force_treeupdate_toplevel(int no /*!< node to be updated */, int topnode /*!< index of the node no in the #TopNodes array */, int bits /*!< 2^bits is the number of nodes per dimension at the level of the daughter nodes of node no */, int x /*!< position of the node no in the x direction, falls in the range [0,2^(bits-1) - 1] */, int y /*!< position of the node no in the y direction, falls in the range [0,2^(bits-1) - 1] */, int z /*!< position of the node no in the z direction, falls in the range [0,2^(bits-1) - 1] */) { int i, j, k, sub; int p; double s[3], mass; if(TopNodes[topnode].Daughter >= 0) { for(i = 0; i < 2; i++) for(j = 0; j < 2; j++) for(k = 0; k < 2; k++) { sub = 7 & peano_hilbert_key((x << 1) + i, (y << 1) + j, (z << 1) + k, bits); Tree_NextFreeNode++; force_treeupdate_toplevel(Tree_NextFreeNode - 1, TopNodes[topnode].Daughter + sub, bits + 1, 2 * x + i, 2 * y + j, 2 * z + k); } mass = 0; s[0] = 0; s[1] = 0; s[2] = 0; p = Nodes[no].u.d.nextnode; for(j = 0; j < 8; j++) /* since we are dealing with top-level nodes, we know that there are 8 consecutive daughter nodes */ { if(p >= Tree_MaxPart && p < Tree_MaxPart + Tree_MaxNodes) /* internal node */ { mass += Nodes[p].u.d.mass; s[0] += Nodes[p].u.d.mass * Nodes[p].u.d.s[0]; s[1] += Nodes[p].u.d.mass * Nodes[p].u.d.s[1]; s[2] += Nodes[p].u.d.mass * Nodes[p].u.d.s[2]; } else terminate("may not happen"); p = Nodes[p].u.d.sibling; } if(mass) { s[0] /= mass; s[1] /= mass; s[2] /= mass; } else { s[0] = Nodes[no].center[0]; s[1] = Nodes[no].center[1]; s[2] = Nodes[no].center[2]; } Nodes[no].u.d.s[0] = s[0]; Nodes[no].u.d.s[1] = s[1]; Nodes[no].u.d.s[2] = s[2]; Nodes[no].u.d.mass = mass; } } /*! This function allocates the memory used for storage of the tree nodes. Usually, * the number of required nodes is of order 0.7*maxpart, but if this is insufficient, * the code will try to allocated more space. */ void force_treeallocate(int maxpart /*!< number of particles on the current task */, int maxindex /*!< the Nodes pointer will be shifted such that the index of the first element is maxindex */) { if(Nodes) terminate("already allocated"); Tree_MaxPart = maxindex; Tree_MaxNodes = (int) (All.TreeAllocFactor * maxpart) + NTopnodes; DomainNewTask = (int *) mymalloc_movable(&DomainNewTask, "DomainNewTask", NTopleaves * sizeof(int)); DomainNodeIndex = (int *) mymalloc_movable(&DomainNodeIndex, "DomainNodeIndex", NTopleaves * sizeof(int)); Tree_Task_list = (int *) mymalloc_movable(&Tree_Task_list, "Tree_Task_list", maxpart * sizeof(int)); Tree_Pos_list = (MyDouble *) mymalloc_movable(&Tree_Pos_list, "Tree_Pos_list", 3 * maxpart * sizeof(MyDouble)); Nodes = (struct NODE *) mymalloc_movable(&Nodes, "Nodes", (Tree_MaxNodes + 1) * sizeof(struct NODE)); Nodes -= Tree_MaxPart; } /*! This function frees the memory allocated for the tree, i.e. it frees * the space allocated by the function force_treeallocate(). */ void force_treefree(void) { if(Nodes) { myfree(Nodes + Tree_MaxPart); myfree(Tree_Pos_list); myfree(Tree_Task_list); myfree(DomainNodeIndex); myfree(DomainNewTask); Nodes = NULL; DomainNodeIndex = NULL; DomainNewTask = NULL; Tree_Task_list = NULL; Nextnode = NULL; Father = NULL; } else terminate("trying to free the tree even though it's not allocated"); } /*! This function dumps some of the basic particle data to a file. In case * the tree construction fails, it is called just before the run * terminates with an error message. Examination of the generated file may * then give clues to what caused the problem. 
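 * Each task writes its own binary file "particles<ThisTask>.dat", containing NumPart,
 * followed by all particle positions, then all velocities, then all particle IDs.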
*/ void dump_particles(void) { FILE *fd; char buffer[200]; int i; sprintf(buffer, "particles%d.dat", ThisTask); fd = fopen(buffer, "w"); my_fwrite(&NumPart, 1, sizeof(int), fd); for(i = 0; i < NumPart; i++) my_fwrite(&P[i].Pos[0], 3, sizeof(MyDouble), fd); for(i = 0; i < NumPart; i++) my_fwrite(&P[i].Vel[0], 3, sizeof(MyFloat), fd); for(i = 0; i < NumPart; i++) my_fwrite(&P[i].ID, 1, sizeof(int), fd); fclose(fd); } GalIC/src/forcetree/forcetree_optimizebalance.c000644 000765 000024 00000030122 12373713530 022561 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "allvars.h" #include "proto.h" #include "domain.h" #include "pqueue.h" static struct force_segments_data { int start, end, task; double work, cost, count, normalized_load; } *force_domainAssign; int force_sort_load(const void *a, const void *b) { if(((struct force_segments_data *) a)->normalized_load > (((struct force_segments_data *) b)->normalized_load)) return -1; if(((struct force_segments_data *) a)->normalized_load < (((struct force_segments_data *) b)->normalized_load)) return +1; return 0; } /* mode structure for priority queues */ typedef struct node_t { double pri; int val; size_t pos; } node_t; /* define call back functions for priority queues */ static int cmp_pri(double next, double curr) { return (next > curr); } static double get_pri(void *a) { return (double) ((node_t *) a)->pri; } static void set_pri(void *a, double pri) { ((node_t *) a)->pri = pri; } static size_t get_pos(void *a) { return ((node_t *) a)->pos; } static void set_pos(void *a, size_t pos) { ((node_t *) a)->pos = pos; } static double oldmax, oldsum; double force_get_current_balance(double *impact) { #ifndef NO_MPI_IN_PLACE MPI_Allreduce(MPI_IN_PLACE, TaskCost, NTask, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); #else double *inTaskCost = mymalloc("inTaskCost", NTask * sizeof(double));; memcpy(inTaskCost, TaskCost, NTask * sizeof(double)); MPI_Allreduce(inTaskCost, TaskCost, NTask, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD); myfree(inTaskCost); #endif int i; for(i = 0, oldmax = oldsum = 0; i < NTask; i++) { oldsum += TaskCost[i]; if(oldmax < TaskCost[i]) oldmax = TaskCost[i]; } *impact = 1.0 + domain_full_weight[All.HighestActiveTimeBin] * (oldmax - oldsum / NTask) / All.TotGravCost; return oldmax / (oldsum / NTask); } void force_get_global_cost_for_leavenodes(int nexport) { int i, j, n, nimport, idx, task, ngrp; struct node_data { double domainCost; int domainCount; int no; } *export_node_data, *import_node_data; MPI_Alltoall(Send_count, 1, MPI_INT, Recv_count, 1, MPI_INT, MPI_COMM_WORLD); for(j = 0, nimport = 0, Recv_offset[0] = 0, Send_offset[0] = 0; j < NTask; j++) { nimport += Recv_count[j]; if(j > 0) { Send_offset[j] = Send_offset[j - 1] + Send_count[j - 1]; Recv_offset[j] = Recv_offset[j - 1] + Recv_count[j - 1]; } } for(j = 0; j < NTask; j++) Send_count[j] = 0; export_node_data = mymalloc("export_node_data", nexport * sizeof(struct node_data)); import_node_data = mymalloc("import_node_data", nimport * sizeof(struct node_data)); for(i=0; i < nexport; i++) { int task = ListNoData[i].task; int ind = Send_offset[task] + Send_count[task]++; export_node_data[ind].domainCost = 
ListNoData[i].domainCost; export_node_data[ind].domainCount = ListNoData[i].domainCount; export_node_data[ind].no = ListNoData[i].no; } for(ngrp = 1; ngrp < (1 << PTask); ngrp++) { int recvTask = ThisTask ^ ngrp; if(recvTask < NTask) if(Send_count[recvTask] > 0 || Recv_count[recvTask] > 0) MPI_Sendrecv(&export_node_data[Send_offset[recvTask]], Send_count[recvTask] * sizeof(struct node_data), MPI_BYTE, recvTask, TAG_DENS_B, &import_node_data[Recv_offset[recvTask]], Recv_count[recvTask] * sizeof(struct node_data), MPI_BYTE, recvTask, TAG_DENS_B, MPI_COMM_WORLD, MPI_STATUS_IGNORE); } for(i=0; i < nimport; i++) { int no = import_node_data[i].no; DomainCost[no] += import_node_data[i].domainCost; DomainCount[no] += import_node_data[i].domainCount; } myfree(import_node_data); myfree(export_node_data); /* now share the cost data across all processors */ struct DomainNODE { double domainCost; int domainCount; } *DomainMoment, *loc_DomainMoment; DomainMoment = (struct DomainNODE *) mymalloc("DomainMoment", NTopleaves * sizeof(struct DomainNODE)); /* share the cost data accross CPUs */ int *recvcounts = (int *) mymalloc("recvcounts", sizeof(int) * NTask); int *recvoffset = (int *) mymalloc("recvoffset", sizeof(int) * NTask); int *bytecounts = (int *) mymalloc("bytecounts", sizeof(int) * NTask); int *byteoffset = (int *) mymalloc("byteoffset", sizeof(int) * NTask); for(task = 0; task < NTask; task++) recvcounts[task] = 0; for(n = 0; n < NTopleaves; n++) recvcounts[DomainTask[n]]++; for(task = 0; task < NTask; task++) bytecounts[task] = recvcounts[task] * sizeof(struct DomainNODE); for(task = 1, recvoffset[0] = 0, byteoffset[0] = 0; task < NTask; task++) { recvoffset[task] = recvoffset[task - 1] + recvcounts[task - 1]; byteoffset[task] = byteoffset[task - 1] + bytecounts[task - 1]; } loc_DomainMoment = (struct DomainNODE *) mymalloc("loc_DomainMoment", recvcounts[ThisTask] * sizeof(struct DomainNODE)); for(n = 0, idx = 0; n < NTopleaves; n++) { if(DomainTask[n] == ThisTask) { loc_DomainMoment[idx].domainCost = DomainCost[n]; loc_DomainMoment[idx].domainCount = DomainCount[n]; idx++; } } MPI_Allgatherv(loc_DomainMoment, bytecounts[ThisTask], MPI_BYTE, DomainMoment, bytecounts, byteoffset, MPI_BYTE, MPI_COMM_WORLD); for(task = 0; task < NTask; task++) recvcounts[task] = 0; for(n = 0; n < NTopleaves; n++) { task = DomainTask[n]; if(task != ThisTask) { idx = recvoffset[task] + recvcounts[task]++; DomainCost[n] = DomainMoment[idx].domainCost; DomainCount[n] = DomainMoment[idx].domainCount; } } myfree(loc_DomainMoment); myfree(byteoffset); myfree(bytecounts); myfree(recvoffset); myfree(recvcounts); myfree(DomainMoment); } void force_optimize_domain_mapping(void) { int i, j; double fac_cost = 0.5 / oldsum; double fac_count = 0.5 / All.TotNumPart; int ncpu = NTask * All.MultipleDomains; int ndomain = NTopleaves; double workavg = 1.0 / ncpu; double workhalfnode = 0.5 / NTopleaves; double work_before = 0; double workavg_before = 0; int start = 0; force_domainAssign = mymalloc("force_domainAssign", ncpu * sizeof(struct force_segments_data)); for(i = 0; i < ncpu; i++) { double work = 0, cost = 0, count = 0; int end = start; cost += fac_cost * DomainCost[end]; count += fac_count * DomainCount[end]; work += fac_cost * DomainCost[end] + fac_count * DomainCount[end]; while((work + work_before + (end + 1 < NTopleaves ? 
fac_cost * DomainCost[end + 1] + fac_count * DomainCount[end + 1] : 0) < workavg + workavg_before + workhalfnode) || (i == ncpu - 1 && end < ndomain - 1)) { if((ndomain - end) > (ncpu - i)) end++; else break; cost += fac_cost * DomainCost[end]; count += fac_count * DomainCount[end]; work += fac_cost * DomainCost[end] + fac_count * DomainCount[end]; } force_domainAssign[i].start = start; force_domainAssign[i].end = end; force_domainAssign[i].work = work; force_domainAssign[i].cost = cost; force_domainAssign[i].count = count; force_domainAssign[i].normalized_load = cost + count; /* note: they are already multiplied by fac_cost/fac_count */ work_before += work; workavg_before += workavg; start = end + 1; } qsort(force_domainAssign, ncpu, sizeof(struct force_segments_data), force_sort_load); /* create three priority queues, one for the cost load, one for the particle count, and one for the combined cost */ pqueue_t *queue_cost = pqueue_init(NTask, cmp_pri, get_pri, set_pri, get_pos, set_pos); node_t *ncost = mymalloc("ncost", NTask * sizeof(node_t)); pqueue_t *queue_count = pqueue_init(NTask, cmp_pri, get_pri, set_pri, get_pos, set_pos); node_t *ncount = mymalloc("ncount", NTask * sizeof(node_t)); pqueue_t *queue_combi = pqueue_init(NTask, cmp_pri, get_pri, set_pri, get_pos, set_pos); node_t *ncombi = mymalloc("ncombi", NTask * sizeof(node_t)); /* fill in all the tasks into the queue. The priority will be the current cost/count, the tag 'val' is used to label the task */ for(i = 0; i < NTask; i++) { ncost[i].pri = 0; ncost[i].val = i; pqueue_insert(queue_cost, &ncost[i]); ncount[i].pri = 0; ncount[i].val = i; pqueue_insert(queue_count, &ncount[i]); ncombi[i].pri = 0; ncombi[i].val = i; pqueue_insert(queue_combi, &ncombi[i]); } double max_load = 0; double max_cost = 0; for(i = 0; i < ncpu; i++) { /* pick the least work-loaded target from the queue, and the least particle-loaded, and then decide which choice gives the smallest load overall */ double cost, load; node_t *node_cost = pqueue_peek(queue_cost); node_t *node_count = pqueue_peek(queue_count); node_t *node_combi = pqueue_peek(queue_combi); int targetA = node_cost->val; int targetB = node_count->val; int targetC = node_combi->val; cost = ncost[targetA].pri + force_domainAssign[i].cost; load = ncount[targetA].pri + force_domainAssign[i].count; if(cost < max_cost) cost = max_cost; if(load < max_load) load = max_load; double workA = cost + load; cost = ncost[targetB].pri + force_domainAssign[i].cost; load = ncount[targetB].pri + force_domainAssign[i].count; if(cost < max_cost) cost = max_cost; if(load < max_load) load = max_load; double workB = cost + load; cost = ncost[targetC].pri + force_domainAssign[i].cost; load = ncount[targetC].pri + force_domainAssign[i].count; if(cost < max_cost) cost = max_cost; if(load < max_load) load = max_load; double workC = cost + load; int target; if(workA < workB && workA < workC) target = targetA; else if(workC < workB) target = targetC; else target = targetB; force_domainAssign[i].task = target; cost = ncost[target].pri + force_domainAssign[i].cost; load = ncount[target].pri + force_domainAssign[i].count; pqueue_change_priority(queue_cost, cost, &ncost[target]); pqueue_change_priority(queue_count, load, &ncount[target]); pqueue_change_priority(queue_combi, cost + load, &ncombi[target]); if(max_cost < cost) max_cost = cost; if(max_load < load) max_load = load; } /* free queue again */ myfree(ncombi); pqueue_free(queue_combi); myfree(ncount); pqueue_free(queue_count); myfree(ncost); pqueue_free(queue_cost); 
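/* the greedy assignment above has fixed a target task for every segment; translate it
 * into the new top-leaf -> task table, then re-accumulate the per-task cost and particle
 * count to judge whether the optimized mapping is actually an improvement */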
for(i = 0; i < ncpu; i++) for(j = force_domainAssign[i].start; j <= force_domainAssign[i].end; j++) DomainNewTask[j] = force_domainAssign[i].task; myfree(force_domainAssign); for(i = 0; i < NTask; i++) { TaskCost[i] = 0; TaskCount[i] = 0; } for(i = 0; i < NTopleaves; i++) { TaskCost[DomainNewTask[i]] += DomainCost[i]; TaskCount[DomainNewTask[i]] += DomainCount[i]; } double max, sum, maxload, sumload; for(i = 0, max = sum = 0, maxload = sumload = 0; i < NTask; i++) { sum += TaskCost[i]; if(max < TaskCost[i]) max = TaskCost[i]; sumload += TaskCount[i]; if(maxload < TaskCount[i]) maxload = TaskCount[i]; } mpi_printf("FORCETREE: Active-TimeBin=%d [unoptimized work-balance=%g] new work-balance=%g, new load-balance=%g\n", All.HighestActiveTimeBin, oldmax / (oldsum / NTask), max / (sum / NTask), maxload / (sumload / NTask)); if((max / (sum / NTask) > oldmax / (oldsum / NTask)) || (maxload > All.MaxPart)) { mpi_printf("FORCETREE: The work-load is either worse than before or the memory-balance is not viable. We keep the old distribution.\n"); memcpy(DomainNewTask, DomainTask, NTopleaves * sizeof(int)); } } GalIC/src/forcetree/forcetree_walk.c000644 000765 000024 00000013261 12373713530 020356 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "../allvars.h" #include "../proto.h" int force_treeevaluate(int i, int mode, int thread_id) { struct NODE *nop = 0; int k, target, numnodes, no, task; double r2, dx, dy, dz, mass, r, u, hmax, h_inv, h3_inv; double pos_x, pos_y, pos_z; double fac; double acc_x = 0; double acc_y = 0; double acc_z = 0; double wp, pot = 0.0; int ninteractions = 0; hmax = All.ForceSoftening; if(mode == 0) { target = TargetList[i]; if(target < NumPart) { pos_x = Tree_Pos_list[3 * target + 0]; pos_y = Tree_Pos_list[3 * target + 1]; pos_z = Tree_Pos_list[3 * target + 2]; } else { terminate("target >= NumPart"); } numnodes = 1; } else { target = i; pos_x = GravDataGet[target].Pos[0]; pos_y = GravDataGet[target].Pos[1]; pos_z = GravDataGet[target].Pos[2]; if(target == Nimport - 1) numnodes = NimportNodes - GravDataGet[target].Firstnode; else numnodes = GravDataGet[target + 1].Firstnode - GravDataGet[target].Firstnode; } for(k = 0; k < numnodes; k++) { if(mode == 0) no = Tree_MaxPart; /* root node */ else { no = NodeDataGet[GravDataGet[target].Firstnode + k]; no = Nodes[no].u.d.nextnode; /* open it */ } while(no >= 0) { if(no < Tree_MaxPart) /* single particle */ { dx = Tree_Pos_list[3 * no + 0] - pos_x; dy = Tree_Pos_list[3 * no + 1] - pos_y; dz = Tree_Pos_list[3 * no + 2] - pos_z; r2 = dx * dx + dy * dy + dz * dz; mass = P[no].Mass; no = Nextnode[no]; } else if(no < Tree_MaxPart + Tree_MaxNodes) /* internal node */ { if(mode == 1) { if(no < Tree_FirstNonTopLevelNode) /* we reached a top-level node again, which means that we are done with the branch */ { no = -1; continue; } } nop = &Nodes[no]; mass = nop->u.d.mass; dx = nop->u.d.s[0] - pos_x; dy = nop->u.d.s[1] - pos_y; dz = nop->u.d.s[2] - pos_z; r2 = dx * dx + dy * dy + dz * dz; /* we have an internal node. 
Need to check opening criterion */ if(nop->len * nop->len > r2 * All.ErrTolTheta * All.ErrTolTheta) { /* open cell */ no = nop->u.d.nextnode; continue; } /* ok, node can be used */ no = nop->u.d.sibling; } else if(no >= Tree_ImportedNodeOffset) /* point from imported nodelist */ { int n = no - Tree_ImportedNodeOffset; dx = Tree_Points[n].Pos[0] - pos_x; dy = Tree_Points[n].Pos[1] - pos_y; dz = Tree_Points[n].Pos[2] - pos_z; r2 = dx * dx + dy * dy + dz * dz; mass = Tree_Points[n].Mass; no = Nextnode[no - Tree_MaxNodes]; } else /* pseudo particle */ { if(mode == 0) { task = DomainNewTask[no - (Tree_MaxPart + Tree_MaxNodes)]; if(ThreadsExportflag[thread_id][task] != i) { ThreadsExportflag[thread_id][task] = i; int nexp = ThreadsNexport[thread_id]++; if(nexp >= MaxNexport) terminate("nexp >= MaxNexport"); ThreadsPartList[thread_id][nexp].Task = task; ThreadsPartList[thread_id][nexp].Index = i; } int nexp = ThreadsNexportNodes[thread_id]++; if(nexp >= MaxNexportNodes) terminate("nexp >= MaxNexportNodes"); ThreadsNodeList[thread_id][nexp].Task = task; ThreadsNodeList[thread_id][nexp].Index = i; ThreadsNodeList[thread_id][nexp].Node = DomainNodeIndex[no - (Tree_MaxPart + Tree_MaxNodes)]; } no = Nextnode[no - Tree_MaxNodes]; continue; } /* now evaluate the multipole moment */ if(mass) { r = sqrt(r2); if(r >= hmax) { fac = mass / (r2 * r); wp = -mass / r; } else { h_inv = 1.0 / hmax; h3_inv = h_inv * h_inv * h_inv; u = r * h_inv; if(u < 0.5) { fac = mass * h3_inv * (10.666666666667 + u * u * (32.0 * u - 38.4)); wp = mass * h_inv * (-2.8 + u * u * (5.333333333333 + u * u * (6.4 * u - 9.6))); } else { fac = mass * h3_inv * (21.333333333333 - 48.0 * u + 38.4 * u * u - 10.666666666667 * u * u * u - 0.066666666667 / (u * u * u)); wp = mass * h_inv * (-3.2 + 0.066666666667 / u + u * u * (10.666666666667 + u * (-16.0 + u * (9.6 - 2.133333333333 * u)))); } } acc_x += dx * fac; acc_y += dy * fac; acc_z += dz * fac; pot += wp; ninteractions++; } } } /* store result at the proper place */ if(mode == 0) { if(target < NumPart) { P[target].GravAccel[0] = acc_x; P[target].GravAccel[1] = acc_y; P[target].GravAccel[2] = acc_z; P[target].Potential = pot; } else { int idx = Tree_ResultIndexList[target - Tree_ImportedNodeOffset]; Tree_ResultsActiveImported[idx].GravAccel[0] = acc_x; Tree_ResultsActiveImported[idx].GravAccel[1] = acc_y; Tree_ResultsActiveImported[idx].GravAccel[2] = acc_z; Tree_ResultsActiveImported[idx].Potential = pot; } } else { GravDataResult[target].Acc[0] = acc_x; GravDataResult[target].Acc[1] = acc_y; GravDataResult[target].Acc[2] = acc_z; GravDataResult[target].Potential = pot; } return ninteractions; } GalIC/src/forcetree/gravtree.c000644 000765 000024 00000053740 12373713530 017207 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include "../allvars.h" #include "../proto.h" #include "../domain/domain.h" /*! \file gravtree.c * \brief main driver routines for gravitational (short-range) force computation * * This file contains the code for the gravitational force computation by * means of the tree algorithm. 
To this end, a tree force is computed for all * active local particles, and particles are exported to other processors if * needed, where they can receive additional force contributions. If the * TreePM algorithm is enabled, the force computed will only be the * short-range part. */ void gravity_tree(void); void gravity_primary_loop(int thread_id); void gravity_secondary_loop(int thread_id); int compare_partlist_task_index(const void *a, const void *b); int compare_nodelist_task_index_node(const void *a, const void *b); /*! \brief main driver routine of tree force calculation * * This routine handles the whole tree force calculation. First it * builds a new force tree with force_treebuild() every timestep. This tree is then * used to calculate a new tree force for every active particle ( gravity_tree() ). */ void gravity(void) { domain_Decomposition(); force_treeallocate(NumPart, All.MaxPart); force_treebuild(NumPart, 1); gravity_tree(); myfree(Father); myfree(Nextnode); myfree(Tree_Points); force_treefree(); domain_free(); } /*! \brief This function computes the gravitational forces for all active particles. * * The tree walk is done in two phases: First the local part of the force tree is processed (gravity_primary_loop() ). * Whenever an external node is encountered during the walk, this node is saved on a list. * This node list along with data about the particles is then exchanged among tasks. * In the second phase (gravity_secondary_loop() ) each task now continues the tree walk for * the imported particles. Finally the resulting partial forces are sent back to the original task * and are summed up there to complete the tree force calculation. * * If only the tree algorithm is used in a periodic box, the whole tree walk is done twice. * First a normal tree walk is done as described above, and afterwards a second tree walk, * which adds the needed Ewald corrections, is performed. * * Particles are only exported to other processors when really needed, thereby allowing a * good use of the communication buffer. Every particle is sent at most once to a given processor * together with the complete list of relevant tree nodes to be checked on the other task. * * Particles which drifted into the domain of another task are sent to this task for the force computation. * Afterwards the resulting force is sent back to the originating task. * * In order to improve the work load balancing during a domain decomposition, the work done by each * node/particle is measured. The work is measured for the interaction partners (i.e. the nodes or particles) * and not for the particles themselves that require a force computation. This way, work done for imported * particles is accounted for at the task where the work was actually incurred. The cost measurement is * only done for the "GRAVCOSTLEVELS" highest occupied time bins. The variable #MeasureCostFlag will state whether a * measurement is done at the present time step. If the option THREAD_COSTS_EXACT is activated, cost measurements * will be done by each thread, otherwise only by thread 0. * * The tree imbalance can be further reduced using the CHUNKING option. The particles requiring a force computation * are split into chunks of size #Nchunksize. A set of every #Nchunk -th chunk is processed first. * Then the process is repeated, processing the next set of chunks. This way the amount of exported particles * is more balanced, as communication heavy regions are mixed with less communication intensive regions.
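 *
 * In outline (as implemented below, summarizing the routine that follows), each iteration of the
 * communication loop proceeds as follows:
 *   1) gravity_primary_loop() is run by all threads; it walks the local tree and records the particles
 *      and tree nodes that must be exported in the per-thread PartList/NodeList buffers;
 *   2) the per-thread export lists are consolidated, sorted by task, and the counts are exchanged
 *      with MPI_Alltoall();
 *   3) the particle data (GravDataIn) and the node lists (NodeDataIn) are exchanged pairwise
 *      with MPI_Sendrecv();
 *   4) gravity_secondary_loop() continues the tree walk for the imported particles;
 *   5) the partial results are sent back and added to the local particles.
 * The loop is repeated until all tasks have processed their active particles (ndone == NTask).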
* */ void gravity_tree(void) { long long n_exported = 0; long long N_nodesinlist = 0; int i, j, k, l, rel_node_index, ncount, iter = 0, threadid; int ndone, ndone_flag, ngrp; int place; int recvTask; MPI_Status status; /* set new softening lengths on global steps to take into account possible cosmological time variation */ set_softenings(); /* allocate buffers to arrange communication */ mpi_printf("GRAVTREE: Begin tree force. (presently allocated=%g MB)\n", AllocatedBytes / (1024.0 * 1024.0)); if(Tree_NumPartImported != 0) terminate("Tree_NumPartImported=%d != 0", Tree_NumPartImported); /* Create list of targets. We do this here to simplify the treatment of the two possible sources of points */ TargetList = mymalloc("TargetList", (NumPart + Tree_NumPartImported) * sizeof(int)); Tree_ResultIndexList = mymalloc("Tree_ResultIndexList", Tree_NumPartImported * sizeof(int)); Nforces = 0; for(i = 0; i < NumPart; i++) { if(P[i].Type == 5) TargetList[Nforces++] = i; } for(i = 0, ncount = 0; i < Tree_NumPartImported; i++) if(Tree_Points[i].Type & 16) { Tree_ResultIndexList[i] = ncount++; TargetList[Nforces++] = i + Tree_ImportedNodeOffset; } Tree_ResultsActiveImported = mymalloc("Tree_ResultsActiveImported", ncount * sizeof(struct resultsactiveimported_data)); MaxNexport = (int) ((All.BufferSizeGravity * 1024 * 1024) / (sizeof(struct data_partlist) + FAC_AVG_NODES_PER_EXPORT * sizeof(struct datanodelist) + sizeof(struct gravdata_in) + sizeof(struct gravdata_out) + FAC_AVG_NODES_PER_EXPORT * sizeof(int) + sizemax(sizeof(struct gravdata_in) + FAC_AVG_NODES_PER_EXPORT * sizeof(int), sizeof(struct gravdata_out)))); MaxNexportNodes = FAC_AVG_NODES_PER_EXPORT * MaxNexport; MaxNexport /= NUM_THREADS; MaxNexportNodes /= NUM_THREADS; if(MaxNexport <= (NTask - 1) || MaxNexportNodes <= NTopleaves) terminate("Bummer. 
Can't even safely process a single particle for the given gravity buffer size"); mpi_printf("GRAVTREE: MaxNexport=%d MaxNexportNodes=%d (per thread)\n", MaxNexport, MaxNexportNodes); NextParticle = 0; do { iter++; PartList = (struct data_partlist *) mymalloc("PartList", MaxNexport * NUM_THREADS * sizeof(struct data_partlist)); NodeList = (struct datanodelist *) mymalloc("NodeList", MaxNexportNodes * NUM_THREADS * sizeof(struct datanodelist)); for(i = 0; i < NUM_THREADS; i++) { ThreadsNexport[i] = 0; ThreadsNexportNodes[i] = 0; ThreadsPartList[i] = PartList + i * MaxNexport; ThreadsNodeList[i] = NodeList + i * MaxNexportNodes; ThreadsExportflag[i] = Exportflag + i * NTask; } /* do local particles and prepare export list */ #pragma omp parallel private(threadid) { threadid = get_thread_num(); gravity_primary_loop(threadid); /* do local particles and prepare export list */ } Nexport = ThreadsNexport[0]; NexportNodes = ThreadsNexportNodes[0]; /* consolidate the results of all threads into one list */ for(i = 1; i < NUM_THREADS; i++) { memmove(&PartList[Nexport], ThreadsPartList[i], ThreadsNexport[i] * sizeof(struct data_partlist)); memmove(&NodeList[NexportNodes], ThreadsNodeList[i], ThreadsNexportNodes[i] * sizeof(struct datanodelist)); Nexport += ThreadsNexport[i]; NexportNodes += ThreadsNexportNodes[i]; } n_exported += Nexport; N_nodesinlist += NexportNodes; qsort(PartList, Nexport, sizeof(struct data_partlist), compare_partlist_task_index); qsort(NodeList, NexportNodes, sizeof(struct datanodelist), compare_nodelist_task_index_node); for(j = 0; j < NTask; j++) { Send_count[j] = 0; Send_count_nodes[j] = 0; } for(j = 0; j < Nexport; j++) Send_count[PartList[j].Task]++; for(j = 0; j < NexportNodes; j++) Send_count_nodes[NodeList[j].Task]++; MPI_Alltoall(Send_count, 1, MPI_INT, Recv_count, 1, MPI_INT, MPI_COMM_WORLD); MPI_Alltoall(Send_count_nodes, 1, MPI_INT, Recv_count_nodes, 1, MPI_INT, MPI_COMM_WORLD); for(j = 0, Nimport = 0, NimportNodes = 0, Recv_offset[0] = 0, Send_offset[0] = 0, Recv_offset_nodes[0] = 0, Send_offset_nodes[0] = 0; j < NTask; j++) { Nimport += Recv_count[j]; NimportNodes += Recv_count_nodes[j]; if(j > 0) { Send_offset[j] = Send_offset[j - 1] + Send_count[j - 1]; Recv_offset[j] = Recv_offset[j - 1] + Recv_count[j - 1]; Send_offset_nodes[j] = Send_offset_nodes[j - 1] + Send_count_nodes[j - 1]; Recv_offset_nodes[j] = Recv_offset_nodes[j - 1] + Recv_count_nodes[j - 1]; } } GravDataGet = (struct gravdata_in *) mymalloc("GravDataGet", Nimport * sizeof(struct gravdata_in)); NodeDataGet = (int *) mymalloc("NodeDataGet", NimportNodes * sizeof(int)); GravDataIn = (struct gravdata_in *) mymalloc("GravDataIn", Nexport * sizeof(struct gravdata_in)); NodeDataIn = (int *) mymalloc("NodeDataIn", NexportNodes * sizeof(int)); /* prepare particle data for export */ for(j = 0, l = 0, rel_node_index = 0; j < Nexport; j++) { if(j > 0) { if(PartList[j].Task != PartList[j-1].Task) rel_node_index = 0; } place = PartList[j].Index; int target = TargetList[place]; if(target < NumPart) { for(k = 0; k < 3; k++) GravDataIn[j].Pos[k] = P[target].Pos[k]; GravDataIn[j].Type = P[target].Type; GravDataIn[j].Firstnode = rel_node_index; } else { target -= Tree_ImportedNodeOffset; for(k = 0; k < 3; k++) GravDataIn[j].Pos[k] = Tree_Points[target].Pos[k]; GravDataIn[j].Type = Tree_Points[target].Type & 15; GravDataIn[j].Firstnode = rel_node_index; terminate("should not get here\n"); } while(l < NexportNodes && NodeList[l].Index == PartList[j].Index && NodeList[l].Task == PartList[j].Task) { l++; 
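/* consume the NodeList entries belonging to this particle/task pair; rel_node_index counts them so
   that the Firstnode values stored in GravDataIn are offsets relative to the block of nodes sent to
   this task (the receiver later adds its Recv_offset_nodes to obtain indices into NodeDataGet) */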
rel_node_index++; } } for(j = 0; j < NexportNodes; j++) NodeDataIn[j] = NodeList[j].Node; /* exchange particle data */ for(ngrp = 1; ngrp < (1 << PTask); ngrp++) { recvTask = ThisTask ^ ngrp; if(recvTask < NTask) { if(Send_count[recvTask] > 0 || Recv_count[recvTask] > 0) { /* get the particles */ MPI_Sendrecv(&GravDataIn[Send_offset[recvTask]], Send_count[recvTask] * sizeof(struct gravdata_in), MPI_BYTE, recvTask, TAG_GRAV_A, &GravDataGet[Recv_offset[recvTask]], Recv_count[recvTask] * sizeof(struct gravdata_in), MPI_BYTE, recvTask, TAG_GRAV_A, MPI_COMM_WORLD, &status); /* get the nodes */ MPI_Sendrecv(&NodeDataIn[Send_offset_nodes[recvTask]], Send_count_nodes[recvTask], MPI_INT, recvTask, TAG_GRAV_B, &NodeDataGet[Recv_offset_nodes[recvTask]], Recv_count_nodes[recvTask], MPI_INT, recvTask, TAG_GRAV_B, MPI_COMM_WORLD, &status); } } } myfree(NodeDataIn); myfree(GravDataIn); GravDataResult = (struct gravdata_out *) mymalloc("GravDataIn", Nimport * sizeof(struct gravdata_out)); GravDataOut = (struct gravdata_out *) mymalloc("GravDataOut", Nexport * sizeof(struct gravdata_out)); for(recvTask=0; recvTask < NTask; recvTask++) for(k=0; k < Recv_count[recvTask]; k++) GravDataGet[Recv_offset[recvTask] + k].Firstnode += Recv_offset_nodes[recvTask]; NextJ = 0; #pragma omp parallel private(threadid) { int threadid = get_thread_num(); gravity_secondary_loop(threadid); /* do particles that were sent to us */ } if(NextParticle < Nforces) ndone_flag = 0; else ndone_flag = 1; MPI_Allreduce(&ndone_flag, &ndone, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); /* get the result */ for(ngrp = 1; ngrp < (1 << PTask); ngrp++) { recvTask = ThisTask ^ ngrp; if(recvTask < NTask) { if(Send_count[recvTask] > 0 || Recv_count[recvTask] > 0) { /* send the results */ MPI_Sendrecv(&GravDataResult[Recv_offset[recvTask]], Recv_count[recvTask] * sizeof(struct gravdata_out), MPI_BYTE, recvTask, TAG_GRAV_B, &GravDataOut[Send_offset[recvTask]], Send_count[recvTask] * sizeof(struct gravdata_out), MPI_BYTE, recvTask, TAG_GRAV_B, MPI_COMM_WORLD, &status); } } } /* add the results to the local particles */ for(j = 0; j < Nexport; j++) { place = PartList[j].Index; int target = TargetList[place]; if(target < NumPart) { for(k = 0; k < 3; k++) P[target].GravAccel[k] += GravDataOut[j].Acc[k]; P[target].Potential += GravDataOut[j].Potential; } else { int idx = Tree_ResultIndexList[target - Tree_ImportedNodeOffset]; for(k = 0; k < 3; k++) Tree_ResultsActiveImported[idx].GravAccel[k] += GravDataOut[j].Acc[k]; Tree_ResultsActiveImported[idx].Potential += GravDataOut[j].Potential; } } myfree(GravDataOut); myfree(GravDataResult); myfree(NodeDataGet); myfree(GravDataGet); myfree(NodeList); myfree(PartList); } while(ndone < NTask); /* now communicate the forces in Tree_ResultsActiveImported */ for(j = 0; j < NTask; j++) Recv_count[j] = 0; int n; for(i = 0, n = 0, k = 0; i < NTask; i++) for(j = 0; j < Mesh_Recv_count[i]; j++, n++) { if(Tree_Points[n].Type & 16) { Tree_ResultsActiveImported[k].index = Tree_Points[n].index; Recv_count[i]++; k++; } } MPI_Alltoall(Recv_count, 1, MPI_INT, Send_count, 1, MPI_INT, MPI_COMM_WORLD); for(j = 0, Nexport = 0, Nimport = 0, Recv_offset[0] = 0, Send_offset[0] = 0; j < NTask; j++) { Nexport += Send_count[j]; Nimport += Recv_count[j]; if(j > 0) { Send_offset[j] = Send_offset[j - 1] + Send_count[j - 1]; Recv_offset[j] = Recv_offset[j - 1] + Recv_count[j - 1]; } } struct resultsactiveimported_data *tmp_results = mymalloc("tmp_results", Nexport * sizeof(struct resultsactiveimported_data)); memset(tmp_results, -1, Nexport * 
sizeof(struct resultsactiveimported_data)); /* exchange data */ for(ngrp = 1; ngrp < (1 << PTask); ngrp++) { recvTask = ThisTask ^ ngrp; if(recvTask < NTask) { if(Send_count[recvTask] > 0 || Recv_count[recvTask] > 0) { MPI_Sendrecv(&Tree_ResultsActiveImported[Recv_offset[recvTask]], Recv_count[recvTask] * sizeof(struct resultsactiveimported_data), MPI_BYTE, recvTask, TAG_FOF_A, &tmp_results[Send_offset[recvTask]], Send_count[recvTask] * sizeof(struct resultsactiveimported_data), MPI_BYTE, recvTask, TAG_FOF_A, MPI_COMM_WORLD, MPI_STATUS_IGNORE); } } } for(i = 0; i < Nexport; i++) { int target = tmp_results[i].index; for(k = 0; k < 3; k++) P[target].GravAccel[k] = tmp_results[i].GravAccel[k]; P[target].Potential = tmp_results[i].Potential; } myfree(tmp_results); myfree(Tree_ResultsActiveImported); myfree(Tree_ResultIndexList); myfree(TargetList); /* muliply by G */ for(i = 0; i < NumPart; i++) { int target = i; for(j = 0; j < 3; j++) P[target].GravAccel[j] *= All.G; P[target].Potential *= All.G; } mpi_printf("GRAVTREE: tree-force is done.\n"); } /*! \brief Drives the first phase of the tree walk * * This routine calls the tree walk routine for every active particle. In * this phase only the contribution of the local part of the tree is added * to local particles. * * If threading is activated, this routine can run in multiple threads simultaneously. * * The global variable #NextParticle keeps track of the next particle to be processed among all threads. * * The Depending whether a PM mesh is used and whether the box is periodic, * the walk for a single particle is handled either by, force_treeevaluate_shortrange(), * force_treeevaluate() or force_treeevaluate_ewald_correction(). * * \param p pointer to the index of this thread */ void gravity_primary_loop(int thread_id) { int i, j; for(j = 0; j < NTask; j++) ThreadsExportflag[thread_id][j] = -1; while(1) { if(ThreadsNexport[thread_id] >= (MaxNexport - (NTask - 1)) || ThreadsNexportNodes[thread_id] >= (MaxNexportNodes - NTopleaves)) break; #pragma omp critical { i = NextParticle; if(i < Nforces) { NextParticle++; } } /* end of critical section */ if(i >= Nforces) break; force_treeevaluate(i, 0, thread_id); } } /*! \brief Drives the second phase of the tree walk * * This routine calls the tree walk routine for every imported particle from the previous phase. * The additional contributions from this tasks local tree are added to the imported particles * * If threading is activated, this routine can run in multiple threads simultaneously. * * The Depending whether a PM mesh is used and whether the box is periodic, * the walk for a single particle is handled either by, force_treeevaluate_shortrange(), * force_treeevaluate() or force_treeevaluate_ewald_correction(). * * \param p pointer to the index of this thread */ void gravity_secondary_loop(int thread_id) { int j; while(1) { #pragma omp critical j = NextJ++; if(j >= Nimport) break; force_treeevaluate(j, 1, thread_id); } } /*! \brief Returns the softening length for local particles. * * \param i the index of the local particle * \return the softening length of particle i */ double get_softening_of_particle(int i) { return All.ForceSoftening; } /*! \brief Sort function for data_partlist objects. * * Sorts first by task and then by index. * This function is used as a comparison kernel in a sort * routine to group particle information that is sent to other * task to be processed further. 
* * \param a data_partlist struct to be compared * \param b data_partlist struct to be compared * \return sort result */ int compare_partlist_task_index(const void *a, const void *b) { if(((struct data_partlist *) a)->Task < (((struct data_partlist *) b)->Task)) return -1; if(((struct data_partlist *) a)->Task > (((struct data_partlist *) b)->Task)) return +1; if(((struct data_partlist *) a)->Index < (((struct data_partlist *) b)->Index)) return -1; if(((struct data_partlist *) a)->Index > (((struct data_partlist *) b)->Index)) return +1; return 0; } /*! \brief Sort function for datanodelist objects. * * Sorts first by task, then by index and then by node. * This function is used as a comparison kernel in a sort * routine to group nodes information that goes along with * data_partlist structs. * * \param a datanodelist struct to be compared * \param b datanodelist struct to be compared * \return sort result */ int compare_nodelist_task_index_node(const void *a, const void *b) { if(((struct datanodelist *) a)->Task < (((struct datanodelist *) b)->Task)) return -1; if(((struct datanodelist *) a)->Task > (((struct datanodelist *) b)->Task)) return +1; if(((struct datanodelist *) a)->Index < (((struct datanodelist *) b)->Index)) return -1; if(((struct datanodelist *) a)->Index > (((struct datanodelist *) b)->Index)) return +1; if(((struct datanodelist *) a)->Node < (((struct datanodelist *) b)->Node)) return -1; if(((struct datanodelist *) a)->Node > (((struct datanodelist *) b)->Node)) return +1; return 0; } /*! \brief Sort function for data_index objects. * * Sorts first by task, then by index then by IndexGet. This function is used as * a comparison kernel in a sort routine to group particles in the * communication buffer that are going to be sent to the same CPU. * * \param a data_index struct to be compared * \param b data_index struct to be compared * \return sort result */ int data_index_compare(const void *a, const void *b) { if(((struct data_index *) a)->Task < (((struct data_index *) b)->Task)) return -1; if(((struct data_index *) a)->Task > (((struct data_index *) b)->Task)) return +1; if(((struct data_index *) a)->Index < (((struct data_index *) b)->Index)) return -1; if(((struct data_index *) a)->Index > (((struct data_index *) b)->Index)) return +1; if(((struct data_index *) a)->IndexGet < (((struct data_index *) b)->IndexGet)) return -1; if(((struct data_index *) a)->IndexGet > (((struct data_index *) b)->IndexGet)) return +1; return 0; } /*! \brief Implements the sorting function for mysort_dataindex() * * The data_index table is sorted using a merge sort algorithm. * * \param b data_index array to sort * \param n number of elements to sort * \param t temporary buffer array */ static void msort_dataindex_with_tmp(struct data_index *b, size_t n, struct data_index *t) { struct data_index *tmp; struct data_index *b1, *b2; size_t n1, n2; if(n <= 1) return; n1 = n / 2; n2 = n - n1; b1 = b; b2 = b + n1; msort_dataindex_with_tmp(b1, n1, t); msort_dataindex_with_tmp(b2, n2, t); tmp = t; while(n1 > 0 && n2 > 0) { if(b1->Task < b2->Task || (b1->Task == b2->Task && b1->Index <= b2->Index)) { --n1; *tmp++ = *b1++; } else { --n2; *tmp++ = *b2++; } } if(n1 > 0) memcpy(tmp, b1, n1 * sizeof(struct data_index)); memcpy(b, t, (n - n2) * sizeof(struct data_index)); } /*! \brief Sort the data_index array b of n entries using the sort kernel * cmp. * * The parameter s is set to sizeof(data_index). 
* * \param b the data_index array to sort * \param n number of entries in array b * \param size of each entry (must be sizeof(data_index; there for compatibility with qsort) * \param cmp comparator function */ void mysort_dataindex(void *b, size_t n, size_t s, int (*cmp)(const void *, const void *)) { /* this function could be replaced by a call to qsort(b, n, s, cmp) * but the present merge-sort is usually slightly faster for the * data_index list */ const size_t size = n * s; struct data_index *tmp = (struct data_index *) mymalloc("tmp", size); msort_dataindex_with_tmp((struct data_index *) b, n, tmp); myfree(tmp); } GalIC/src/mpi_utils/checksummed_sendrecv.c000644 000765 000024 00000020214 12373713530 021566 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "../allvars.h" #include "../proto.h" #ifdef MPISENDRECV_CHECKSUM #undef MPI_Sendrecv int MPI_Check_Sendrecv(void *sendbuf, int sendcount, MPI_Datatype sendtype, int dest, int sendtag, void *recvbufreal, int recvcount, MPI_Datatype recvtype, int source, int recvtag, MPI_Comm comm, MPI_Status * status) { int checksumtag = 1000, errtag = 2000; int i, iter = 0, err_flag, err_flag_imported, size_sendtype, size_recvtype; long long sendCheckSum, recvCheckSum, importedCheckSum; unsigned char *p, *buf, *recvbuf; char msg[500]; if(dest != source) terminate("destination task different from source task"); MPI_Type_size(sendtype, &size_sendtype); MPI_Type_size(recvtype, &size_recvtype); if(dest == ThisTask) { memcpy(recvbufreal, sendbuf, recvcount * size_recvtype); return 0; } if(!(buf = mymalloc(recvcount * size_recvtype + 1024))) terminate("not enough memory to allocate the buffer buf"); for(i = 0, p = buf; i < recvcount * size_recvtype + 1024; i++) *p++ = 255; recvbuf = buf + 512; MPI_Sendrecv(sendbuf, sendcount, sendtype, dest, sendtag, recvbuf, recvcount, recvtype, source, recvtag, comm, status); for(i = 0, p = buf; i < 512; i++, p++) { if(*p != 255) { sprintf (msg, "MPI-ERROR: Task=%d/%s: Recv occured before recv buffer. message-size=%d from %d, i=%d c=%d\n", ThisTask, getenv("HOST"), recvcount, dest, i, *p); terminate(msg); } } for(i = 0, p = recvbuf + recvcount * size_recvtype; i < 512; i++, p++) { if(*p != 255) { sprintf (msg, "MPI-ERROR: Task=%d/%s: Recv occured after recv buffer. 
message-size=%d from %d, i=%d c=%d\n", ThisTask, getenv("HOST"), recvcount, dest, i, *p); terminate(msg); } } for(i = 0, p = sendbuf, sendCheckSum = 0; i < sendcount * size_sendtype; i++, p++) sendCheckSum += *p; importedCheckSum = 0; if(dest > ThisTask) { if(sendcount > 0) MPI_Ssend(&sendCheckSum, sizeof(sendCheckSum), MPI_BYTE, dest, checksumtag, MPI_COMM_WORLD); if(recvcount > 0) MPI_Recv(&importedCheckSum, sizeof(importedCheckSum), MPI_BYTE, dest, checksumtag, MPI_COMM_WORLD, status); } else { if(recvcount > 0) MPI_Recv(&importedCheckSum, sizeof(importedCheckSum), MPI_BYTE, dest, checksumtag, MPI_COMM_WORLD, status); if(sendcount > 0) MPI_Ssend(&sendCheckSum, sizeof(sendCheckSum), MPI_BYTE, dest, checksumtag, MPI_COMM_WORLD); } checksumtag++; for(i = 0, p = recvbuf, recvCheckSum = 0; i < recvcount * size_recvtype; i++, p++) recvCheckSum += *p; err_flag = err_flag_imported = 0; if(recvCheckSum != importedCheckSum) { printf ("MPI-ERROR: Receive error on task=%d/%s from task=%d, message size=%d, sendcount=%d checksums= %d %d %d %d. Try to fix it...\n", ThisTask, getenv("HOST"), source, recvcount, sendcount, (int) (recvCheckSum >> 32), (int) recvCheckSum, (int) (importedCheckSum >> 32), (int) importedCheckSum); myflush(stdout); err_flag = 1; } if(dest > ThisTask) { MPI_Ssend(&err_flag, 1, MPI_INT, dest, errtag, MPI_COMM_WORLD); MPI_Recv(&err_flag_imported, 1, MPI_INT, dest, errtag, MPI_COMM_WORLD, status); } else { MPI_Recv(&err_flag_imported, 1, MPI_INT, dest, errtag, MPI_COMM_WORLD, status); MPI_Ssend(&err_flag, 1, MPI_INT, dest, errtag, MPI_COMM_WORLD); } errtag++; if(err_flag > 0 || err_flag_imported > 0) { printf("Task=%d is on %s, wants to send %d and has checksum=%d %d of send data\n", ThisTask, getenv("HOST"), sendcount, (int) (sendCheckSum >> 32), (int) sendCheckSum); myflush(stdout); do { sendtag++; recvtag++; for(i = 0, p = recvbuf; i < recvcount * size_recvtype; i++, p++) *p = 0; if((iter & 1) == 0) { if(dest > ThisTask) { if(sendcount > 0) MPI_Ssend(sendbuf, sendcount, sendtype, dest, sendtag, MPI_COMM_WORLD); if(recvcount > 0) MPI_Recv(recvbuf, recvcount, recvtype, dest, recvtag, MPI_COMM_WORLD, status); } else { if(recvcount > 0) MPI_Recv(recvbuf, recvcount, recvtype, dest, recvtag, MPI_COMM_WORLD, status); if(sendcount > 0) MPI_Ssend(sendbuf, sendcount, sendtype, dest, sendtag, MPI_COMM_WORLD); } } else { if(iter > 5) { printf("we're trying to send each byte now on task=%d (iter=%d)\n", ThisTask, iter); myflush(stdout); if(dest > ThisTask) { for(i = 0, p = sendbuf; i < sendcount * size_sendtype; i++, p++) MPI_Ssend(p, 1, MPI_BYTE, dest, i, MPI_COMM_WORLD); for(i = 0, p = recvbuf; i < recvcount * size_recvtype; i++, p++) MPI_Recv(p, 1, MPI_BYTE, dest, i, MPI_COMM_WORLD, status); } else { for(i = 0, p = recvbuf; i < recvcount * size_recvtype; i++, p++) MPI_Recv(p, 1, MPI_BYTE, dest, i, MPI_COMM_WORLD, status); for(i = 0, p = sendbuf; i < sendcount * size_sendtype; i++, p++) MPI_Ssend(p, 1, MPI_BYTE, dest, i, MPI_COMM_WORLD); } } else { MPI_Sendrecv(sendbuf, sendcount, sendtype, dest, sendtag, recvbuf, recvcount, recvtype, source, recvtag, comm, status); } } importedCheckSum = 0; for(i = 0, p = sendbuf, sendCheckSum = 0; i < sendcount * size_sendtype; i++, p++) sendCheckSum += *p; printf("Task=%d gas send_checksum=%d %d\n", ThisTask, (int) (sendCheckSum >> 32), (int) sendCheckSum); myflush(stdout); if(dest > ThisTask) { if(sendcount > 0) MPI_Ssend(&sendCheckSum, sizeof(sendCheckSum), MPI_BYTE, dest, checksumtag, MPI_COMM_WORLD); if(recvcount > 0) MPI_Recv(&importedCheckSum, 
sizeof(importedCheckSum), MPI_BYTE, dest, checksumtag, MPI_COMM_WORLD, status); } else { if(recvcount > 0) MPI_Recv(&importedCheckSum, sizeof(importedCheckSum), MPI_BYTE, dest, checksumtag, MPI_COMM_WORLD, status); if(sendcount > 0) MPI_Ssend(&sendCheckSum, sizeof(sendCheckSum), MPI_BYTE, dest, checksumtag, MPI_COMM_WORLD); } for(i = 0, p = recvbuf, recvCheckSum = 0; i < recvcount; i++, p++) recvCheckSum += *p; err_flag = err_flag_imported = 0; if(recvCheckSum != importedCheckSum) { printf ("MPI-ERROR: Again (iter=%d) a receive error on task=%d/%s from task=%d, message size=%d, checksums= %d %d %d %d. Try to fix it...\n", iter, ThisTask, getenv("HOST"), source, recvcount, (int) (recvCheckSum >> 32), (int) recvCheckSum, (int) (importedCheckSum >> 32), (int) importedCheckSum); myflush(stdout); err_flag = 1; } if(dest > ThisTask) { MPI_Ssend(&err_flag, 1, MPI_INT, dest, errtag, MPI_COMM_WORLD); MPI_Recv(&err_flag_imported, 1, MPI_INT, dest, errtag, MPI_COMM_WORLD, status); } else { MPI_Recv(&err_flag_imported, 1, MPI_INT, dest, errtag, MPI_COMM_WORLD, status); MPI_Ssend(&err_flag, 1, MPI_INT, dest, errtag, MPI_COMM_WORLD); } if(err_flag == 0 && err_flag_imported == 0) break; errtag++; checksumtag++; iter++; } while(iter < 10); if(iter >= 10) { char buf[1000]; int length; FILE *fd; sprintf(buf, "send_data_%d.dat", ThisTask); fd = fopen(buf, "w"); length = sendcount * size_sendtype; fwrite(&length, 1, sizeof(int), fd); fwrite(sendbuf, sendcount, size_sendtype, fd); fclose(fd); sprintf(buf, "recv_data_%d.dat", ThisTask); fd = fopen(buf, "w"); length = recvcount * size_recvtype; fwrite(&length, 1, sizeof(int), fd); fwrite(recvbuf, recvcount, size_recvtype, fd); fclose(fd); sprintf(msg, "MPI-ERROR: Even 10 trials proved to be insufficient on task=%d/%s. Stopping\n", ThisTask, getenv("HOST")); terminate(msg); } } memcpy(recvbufreal, recvbuf, recvcount * size_recvtype); myfree(buf); return 0; } #endif GalIC/src/mpi_utils/hypercube_allgatherv.c000644 000765 000024 00000002743 12373713530 021613 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "../allvars.h" #include "../proto.h" #ifdef MPI_HYPERCUBE_ALLGATHERV #define TAG 100 int MPI_hypercube_Allgatherv(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int *recvcount, int *displs, MPI_Datatype recvtype, MPI_Comm comm) { int ntask, thistask, ptask, ngrp, size_sendtype, size_recvtype; MPI_Status status; MPI_Comm_rank(comm, &thistask); MPI_Comm_size(comm, &ntask); MPI_Type_size(sendtype, &size_sendtype); MPI_Type_size(recvtype, &size_recvtype); for(ptask = 0; ntask > (1 << ptask); ptask++); for(ngrp = 1; ngrp < (1 << ptask); ngrp++) { int recvtask = thistask ^ ngrp; if(recvtask < ntask) MPI_Sendrecv(sendbuf, sendcount, sendtype, recvtask, TAG, recvbuf + displs[recvtask] * size_recvtype, recvcount[recvtask], recvtype, recvtask, TAG, comm, &status); } if(sendbuf != recvbuf + displs[thistask] * size_recvtype) memcpy(recvbuf + displs[thistask] * size_recvtype, sendbuf, sendcount * size_sendtype); return 0; } #endif GalIC/src/mpi_utils/mpi_util.c000644 000765 000024 00000013042 12373713530 017230 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ /** \file MPI utility functions. */ #include #include #include "../allvars.h" #include "../proto.h" /** Implements the common idiom of exchanging buffers with every other MPI task. The number of items to send/receive are in the send_count and recv_count arrays, respectively. The data to exchange are in send_buf and recv_buf, and the offset to the location of the data to/from each task is in send_offset and recv_offset. Since the buffer pointers are void*, the size of the items to be exchanged are in item_size, and the tag to apply to the MPI call is in commtag. If include_self is true, the send data for ThisTask is also copied to the recieve buffer. All arrays should be allocated with NTask size. */ void mpi_exchange_buffers(void *send_buf, int *send_count, int *send_offset, void *recv_buf, int *recv_count, int *recv_offset, int item_size, int commtag, int include_self) { int ngrp; // this loop goes from 0 in some cases, but that doesn't make sense // because then recvTask==ThisTask and nothing is done. for(ngrp = include_self ? 0 : 1; ngrp < (1 << PTask); ngrp++) { int recvTask = ThisTask ^ ngrp; if(recvTask < NTask) { if(send_count[recvTask] > 0 || recv_count[recvTask] > 0) { /* exchange data */ MPI_Sendrecv((char *) send_buf + send_offset[recvTask] * item_size, send_count[recvTask] * item_size, MPI_BYTE, recvTask, commtag, (char *) recv_buf + recv_offset[recvTask] * item_size, recv_count[recvTask] * item_size, MPI_BYTE, recvTask, commtag, MPI_COMM_WORLD, MPI_STATUS_IGNORE); } } } } /** Calculates the recv_count, send_offset, and recv_offset arrays based on the send_count. Returns nimport, the total number of particles to be received. If an identical set of copies are to be sent to all tasks, set send_identical=1 and the send_offset will be zero for all tasks. All arrays should be allocated with NTask size. 
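    A minimal usage sketch of this idiom (hypothetical payload type 'struct item', send buffer 'send'
    and message tag 'TAG_EXCHANGE'; the same pattern is used by mpi_distribute_items_to_tasks() below):

      // Send_count[] has already been filled with the number of items destined for each task
      int nimport = mpi_calculate_offsets(Send_count, Send_offset, Recv_count, Recv_offset, 0);
      struct item *recv = mymalloc("recv", nimport * sizeof(struct item));
      mpi_exchange_buffers(send, Send_count, Send_offset,
                           recv, Recv_count, Recv_offset,
                           sizeof(struct item), TAG_EXCHANGE, 1);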
*/ int mpi_calculate_offsets(int *send_count, int *send_offset, int *recv_count, int *recv_offset, int send_identical) { // Exchange the send/receive counts MPI_Alltoall(send_count, 1, MPI_INT, recv_count, 1, MPI_INT, MPI_COMM_WORLD); int nimport = 0; recv_offset[0] = 0; send_offset[0] = 0; int j; for(j = 0; j < NTask; j++) { nimport += recv_count[j]; if(j > 0) { send_offset[j] = send_offset[j - 1] + (send_identical ? 0 : send_count[j - 1]); recv_offset[j] = recv_offset[j - 1] + recv_count[j - 1]; } } return nimport; } /** Compare function used to sort an array of int pointers into order of the pointer targets. */ int intpointer_compare(const void *a, const void *b) { if((**(int **) a) < (**(int **) b)) return -1; if((**(int **) a) > (**(int **) b)) return +1; return 0; } /** Sort an opaque array into increasing order of an int field, given by the specified offset. (This would typically be field indicating the task.) Returns a sorted copy of the data array, that needs to be myfreed. We do this by sorting an array of pointers to the task field, and then using this array to deduce the reordering of the data array. Unfortunately this means making a copy of the data, but this just replaces the copy after the mpi_exchange_buffers anyway. */ void *sort_based_on_field(void *data, int field_offset, int n_items, int item_size) { int i; char *data2; int **perm; data2 = mymalloc("data2", n_items * item_size); perm = mymalloc("perm", n_items * sizeof(*perm)); for(i = 0; i < n_items; ++i) perm[i] = (int *) ((char *) data + i * item_size + field_offset); mysort(perm, n_items, sizeof(*perm), intpointer_compare); // reorder data into data2 for(i = 0; i < n_items; ++i) { size_t orig_pos = ((char *) perm[i] - ((char *) data + field_offset)) / item_size; myassert(((char *) perm[i] - ((char *) data + field_offset)) % item_size == 0); memcpy(data2 + item_size * i, (char *) data + item_size * orig_pos, item_size); } myfree(perm); return (void *) data2; } /** This function distributes the members in an opaque structure to the tasks based on a task field given by a specified offset into the opaque struct. The task field must have int type. n_items is updated to the new size of data. max_n is the allocated size of the data array, and is updated if a realloc is necessary. */ void mpi_distribute_items_to_tasks(void *data, int task_offset, int *n_items, int *max_n, int item_size, int commtag) { int i; for(i = 0; i < NTask; i++) Send_count[i] = 0; for(i = 0; i < *n_items; i++) { int task = *(int *) ((char *) data + i * item_size + task_offset); myassert(task >= 0 && task < NTask); Send_count[task]++; } void *data2 = sort_based_on_field(data, task_offset, *n_items, item_size); int nimport = mpi_calculate_offsets(Send_count, Send_offset, Recv_count, Recv_offset, 0); if(*max_n < nimport) { data = myrealloc_movable(data, nimport * item_size); *max_n = nimport; } mpi_exchange_buffers(data2, Send_count, Send_offset, data, Recv_count, Recv_offset, item_size, commtag, 1); myfree(data2); *n_items = nimport; } GalIC/src/mpi_utils/sizelimited_sendrecv.c000644 000765 000024 00000003670 12373713530 021627 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. 
* * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #include #include #include #include #include #include #include "../allvars.h" #include "../proto.h" #ifdef MPISENDRECV_SIZELIMIT #undef MPI_Sendrecv int MPI_Sizelimited_Sendrecv(void *sendbuf, int sendcount, MPI_Datatype sendtype, int dest, int sendtag, void *recvbuf, int recvcount, MPI_Datatype recvtype, int source, int recvtag, MPI_Comm comm, MPI_Status * status) { int iter = 0, size_sendtype, size_recvtype, send_now, recv_now; int count_limit; if(dest != source) terminate("dest != source"); MPI_Type_size(sendtype, &size_sendtype); MPI_Type_size(recvtype, &size_recvtype); if(dest == ThisTask) { memcpy(recvbuf, sendbuf, recvcount * size_recvtype); return 0; } count_limit = (int) ((((long long) MPISENDRECV_SIZELIMIT) * 1024 * 1024) / size_sendtype); while(sendcount > 0 || recvcount > 0) { if(sendcount > count_limit) { send_now = count_limit; if(iter == 0) { printf("imposing size limit on MPI_Sendrecv() on task=%d (send of size=%d)\n", ThisTask, sendcount * size_sendtype); myflush(stdout); } iter++; } else send_now = sendcount; if(recvcount > count_limit) recv_now = count_limit; else recv_now = recvcount; MPI_Sendrecv(sendbuf, send_now, sendtype, dest, sendtag, recvbuf, recv_now, recvtype, source, recvtag, comm, status); sendcount -= send_now; recvcount -= recv_now; sendbuf += send_now * size_sendtype; recvbuf += recv_now * size_recvtype; } return 0; } #endif GalIC/src/domain/domain.h000644 000765 000024 00000012165 12373713530 016131 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #ifndef ALLVARS_H #include "../allvars.h" #endif #ifndef DOMAIN_H #define DOMAIN_H extern struct local_topnode_data { peanokey Size; /*!< number of Peano-Hilbert mesh-cells represented by top-level node */ peanokey StartKey; /*!< first Peano-Hilbert key in top-level node */ long long Count; /*!< counts the number of particles in this top-level node */ int Daughter; /*!< index of first daughter cell (out of 8) of top-level node */ int Leaf; /*!< if the node is a leaf, this gives its number when all leaves are traversed in Peano-Hilbert order */ int Parent; int PIndex; /*!< first particle in node */ } *topNodes, *branchNodes; /*!< points to the root node of the top-level tree */ struct domain_count_data { int task; int count; int origintask; }; extern struct domain_peano_hilbert_data { peanokey key; int index; } *mp; extern struct trans_data { MyIDType ID; int new_task; int new_index; int wrapped; } *trans_table; extern int N_trans; extern int Nbranch; extern double fac_load; extern double totpartcount; extern struct domain_cost_data { int no; int Count; /*!< a table that gives the total number of particles held by each processor */ } *DomainLeaveNode; /*! 
toGo[partner] gives the number of particles on the current task that have to go to task 'partner' */ extern int *toGo; extern int *toGet; extern int *list_NumPart; extern int *list_load; int domain_check_for_local_refine_new(int i, MPI_Comm current_comm); int domain_double_to_int(double d); double domain_grav_tot_costfactor(int i); double domain_hydro_tot_costfactor(int i); void domain_init_sum_cost(void); void domain_printf(char *buf); void domain_report_balance(void); int domain_sort_load(const void *a, const void *b); int domain_compare_count(const void *a, const void *b); int domain_sort_task(const void *a, const void *b); void domain_post_checks(void); void domain_prechecks(void); void domain_insertnode(struct local_topnode_data *treeA, struct local_topnode_data *treeB, int noA, int noB); void domain_add_cost(struct local_topnode_data *treeA, int noA, long long count, double cost, double sphcost); int domain_compare_count(const void *a, const void *b); void domain_rearrange_particle_sequence(void); void domain_combine_topleaves_to_domains(int ncpu, int ndomain); void domain_findSplit_load_balanced(int ncpu, int ndomain); int domain_sort_loadorigin(const void *a, const void *b); int domain_sort_segments(const void *a, const void *b); void domain_combine_multipledomains(void); void domain_allocate(void); void domain_Decomposition(void); int domain_check_memory_bound(void); int domain_compare_key(const void *a, const void *b); int domain_compare_key(const void *a, const void *b); int domain_compare_toplist(const void *a, const void *b); double domain_particle_costfactor(int i); int domain_countToGo(void); int domain_decompose(void); int domain_determineTopTree(void); void domain_exchange(void); void domain_findExchangeNumbers(int task, int partner, int sphflag, int *send, int *recv); void domain_findExtent(void); void domain_findSplit(int cpustart, int ncpu, int first, int last); void domain_findSplit_balanced(int cpustart, int ncpu, int first, int last); void domain_free(void); void domain_shiftSplit(void); void domain_sumCost(void); int domain_topsplit(int node, peanokey startkey); int domain_topsplit_local(int node, peanokey startkey, int mode); int domain_topsplit_special(void); int domain_compare_key(const void *a, const void *b); int domain_check_for_local_refine(int i, MPI_Comm comm, double work); void domain_free_trick(void); void domain_allocate_trick(void); int domain_recursively_combine_topTree(int start, int ncpu); void domain_walktoptree(int no); void domain_optimize_domain_to_task_mapping(void); int domain_compare_count(const void *a, const void *b); void domain_allocate_lists(void); void domain_free_lists(void); void domain_pack_tree_branch(int no, int parent); int domain_unpack_tree_branch(int no, int parent); int domain_check_for_local_refine_alt(int i, int *current_taskset); int domain_reduce_error_flag(int flag, int *current_taskset); int domain_do_local_refine(int n, int **list); void domain_preserve_relevant_topnode_data(void); void domain_find_total_cost(void); void domain_voronoi_dynamic_update_execute(void); void domain_prepare_voronoi_dynamic_update(void); void domain_voronoi_dynamic_flag_particles(void); void domain_mark_in_trans_table(int i, int task); void domain_exchange_and_update_DC(void); int domain_compare_connection_ID(const void *a, const void *b); int domain_compare_local_trans_data_ID(const void *a, const void *b); int domain_compare_recv_trans_data_ID(const void *a, const void *b); int domain_compare_recv_trans_data_oldtask(const void *a, const void 
*b); void mysort_domain(void *b, size_t n, size_t s); #endif GalIC/src/domain/pqueue.h000644 000765 000024 00000007443 12373713530 016171 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ /** * @file pqueue.h * @brief Priority Queue function declarations * * @{ */ #ifndef PQUEUE_H #define PQUEUE_H /** priority data type */ typedef double pqueue_pri_t; /** callback functions to get/set/compare the priority of an element */ typedef pqueue_pri_t (*pqueue_get_pri_f)(void *a); typedef void (*pqueue_set_pri_f)(void *a, pqueue_pri_t pri); typedef int (*pqueue_cmp_pri_f)(pqueue_pri_t next, pqueue_pri_t curr); /** callback functions to get/set the position of an element */ typedef size_t (*pqueue_get_pos_f)(void *a); typedef void (*pqueue_set_pos_f)(void *a, size_t pos); /** debug callback function to print a entry */ typedef void (*pqueue_print_entry_f)(FILE *out, void *a); /** the priority queue handle */ typedef struct pqueue_t { size_t size; size_t avail; size_t step; pqueue_cmp_pri_f cmppri; pqueue_get_pri_f getpri; pqueue_set_pri_f setpri; pqueue_get_pos_f getpos; pqueue_set_pos_f setpos; void **d; } pqueue_t; /** * initialize the queue * * @param n the initial estimate of the number of queue items for which memory * should be preallocated * @param pri the callback function to run to assign a score to a element * @param get the callback function to get the current element's position * @param set the callback function to set the current element's position * * @Return the handle or NULL for insufficent memory */ pqueue_t * pqueue_init(size_t n, pqueue_cmp_pri_f cmppri, pqueue_get_pri_f getpri, pqueue_set_pri_f setpri, pqueue_get_pos_f getpos, pqueue_set_pos_f setpos); /** * free all memory used by the queue * @param q the queue */ void pqueue_free(pqueue_t *q); /** * return the size of the queue. * @param q the queue */ size_t pqueue_size(pqueue_t *q); /** * insert an item into the queue. * @param q the queue * @param d the item * @return 0 on success */ int pqueue_insert(pqueue_t *q, void *d); /** * move an existing entry to a different priority * @param q the queue * @param old the old priority * @param d the entry */ void pqueue_change_priority(pqueue_t *q, pqueue_pri_t new_pri, void *d); /** * pop the highest-ranking item from the queue. * @param p the queue * @param d where to copy the entry to * @return NULL on error, otherwise the entry */ void *pqueue_pop(pqueue_t *q); /** * remove an item from the queue. * @param p the queue * @param d the entry * @return 0 on success */ int pqueue_remove(pqueue_t *q, void *d); /** * access highest-ranking item without removing it. 
* @param q the queue * @param d the entry * @return NULL on error, otherwise the entry */ void *pqueue_peek(pqueue_t *q); /** * print the queue * @internal * DEBUG function only * @param q the queue * @param out the output handle * @param the callback function to print the entry */ void pqueue_print(pqueue_t *q, FILE *out, pqueue_print_entry_f print); /** * dump the queue and it's internal structure * @internal * debug function only * @param q the queue * @param out the output handle * @param the callback function to print the entry */ void pqueueu_dump(pqueue_t *q, FILE *out, pqueue_print_entry_f print); /** * checks that the pq is in the right order, etc * @internal * debug function only * @param q the queue */ int pqueue_is_valid(pqueue_t *q); #endif /* PQUEUE_H */ /** @} */ GalIC/src/forcetree/forcetree.h000644 000765 000024 00000005452 12373713530 017350 0ustar00volkerstaff000000 000000 /******************************************************************************* * This file is part of the GALIC code developed by D. Yurin and V. Springel. * * Copyright (c) 2014 * Denis Yurin (denis.yurin@h-its.org) * Volker Springel (volker.springel@h-its.org) *******************************************************************************/ #ifndef FORCETREE_H #define FORCETREE_H #ifndef INLINE_FUNC #ifdef INLINE #define INLINE_FUNC inline #else #define INLINE_FUNC #endif #endif /*! length of lock-up table for short-range force kernel in TreePM algorithm */ #define NTAB 1000 #define MAX_TREE_LEVEL 30 #define MAX_TREE_ALLOC_FACTOR 30.0 #define MAX_IMPACT_BEFORE_OPTIMIZATION 1.03 #define BITFLAG_TOPLEVEL 0 #define BITFLAG_DEPENDS_ON_LOCAL_MASS 1 #define BITFLAG_DEPENDS_ON_EXTERN_MASS 2 #define BITFLAG_INTERNAL_TOPLEVEL 6 #define BITFLAG_MULTIPLEPARTICLES 7 #define BITFLAG_NODEHASBEENKICKED 8 #define BITFLAG_CONTAINS_GAS 10 #define BITFLAG_MASK ((1<< BITFLAG_CONTAINS_GAS) + (1 << BITFLAG_MULTIPLEPARTICLES)) static inline unsigned long long force_double_to_int(double d) { union { double d; unsigned long long ull; } u; u.d=d; return (u.ull&0xFFFFFFFFFFFFFllu); } static inline double force_int_to_double(unsigned long long x) { union { double d; unsigned long long ull; } u; u.d = 1.0; u.ull |= x; return u.d; } int force_treebuild(int npart, int optimized_domain_mapping); int force_treebuild_construct(int npart, int optimized_domain_mapping); int force_treebuild_insert_single_point(int i, unsigned long long *intpos, int th, unsigned char level); int force_create_empty_nodes(int no, int topnode, int bits, int x, int y, int z); void force_insert_pseudo_particles(void); #ifndef GPU_TREE void force_update_node_recursive(int no, int sib, int father, int *last); #else int force_update_node_recursive(int no, int sib, int father, int *last, int depth); #endif void force_exchange_topleafdata(void); void force_treeupdate_toplevel(int no, int topnode, int bits, int x, int y, int z); void force_treeallocate(int maxpart, int maxindex); void force_treefree(void); void dump_particles(void); int force_add_empty_nodes(void); void force_short_range_init(void); int force_treeevaluate(int target, int mode, int thread_id); int force_treeevaluate_shortrange(int target, int mode, int thread_id, int measure_cost_flag); int force_treeevaluate_ewald_correction(int i, int mode, int thread_id); int force_treeevaluate_direct(int target, int mode); void force_assign_cost_values(void); void force_update_node_recursive_sse(int no, int sib, int father, int *last); void force_optimize_domain_mapping(void); double 
force_get_current_balance(double *impact); void force_get_global_cost_for_leavenodes(int nexport); #endif GalIC/Model_B1.param000644 000765 000024 00000014524 12373713530 015060 0ustar00volkerstaff000000 000000 %------ File and path names, as well as output file format OutputDir ./Model_B1 OutputFile snap % Base filename of generated sequence of files SnapFormat 1 % File format selection %------ Basic structural parameters of model CC 10.0 % halo concentration V200 200.0 % circular velocity v_200 (in km/sec) LAMBDA 0.035 % spin parameter MD 0.0 % disk mass fraction MB 0.05 % bulge mass fraction MBH 0.0 % black hole mass fraction. If zero, no black % hole is generated, otherwise one at the centre % is added. JD 0.0 % disk spin fraction, typically chosen equal to MD DiskHeight 0.2 % thickness of stellar disk in units of radial scale length BulgeSize 0.1 % bulge scale length in units of halo scale length HaloStretch 1.0 % should be one for a spherical halo, smaller than one corresponds to prolate distortion, otherwise oblate BulgeStretch 1.0 % should be one for a spherical bulge, smaller than one corresponds to prolate distortion, otherwise oblate %------ Particle numbers in target model N_HALO 100000 % desired number of particles in dark halo N_DISK 0 % desired number of collisionless particles in disk N_BULGE 100000 % number of bulge particles %------ Selection of symmetry constraints of velocity structure TypeOfHaloVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfDiskVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfBulgeVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified HaloBetaParameter 0 % only relevant for TypeOfHaloVelocityStructure=1 BulgeBetaParameter 0 % only relevant for TypeOfBulgeVelocityStructure=1 HaloDispersionRoverZratio 4.0 % only relevant for TypeOfHaloVelocityStructure=3 DiskDispersionRoverZratio 4.0 % only relevant for TypeOfDiskVelocityStructure=3 BulgeDispersionRoverZratio 4.0 % only relevant for TypeOfBulgeVelocityStructure=3 HaloStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') DiskStreamingVelocityParameter 1.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') BulgeStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') %------ Orbit integration accuracy TorbitFac 10.0 % regulates the integration time of orbits % (this is of the order of the typical number of orbits per particle) TimeStepFactorOrbit 0.01 TimeStepFactorCellCross 0.25 %------ Iterative optimization parameters FractionToOptimizeIndependendly 0.001 IndepenentOptimizationsPerStep 100 StepsBetweenDump 10 MaximumNumberOfSteps 100 MinParticlesPerBinForDispersionMeasurement 100 MinParticlesPerBinForDensityMeasurement 50 %------ Grid dimension and 
extenstion/resolution DG_MaxLevel 7 EG_MaxLevel 7 FG_Nbin 256 % number of bins for the acceleration grid in the R- and z-directions OutermostBinEnclosedMassFraction 0.999 % regulates the fraction of mass of the Hernquist % halo that must be inside the grid (determines grid extension) InnermostBinEnclosedMassFraction 0.0000001 % regulates the fraction of mass enclosed by the innermost % bin (regulates size of innermost grid cells) MaxVelInUnitsVesc 0.9999 % maximum allowed velocity in units of the local escape velocity %------ Construction of target density field SampleDensityFieldForTargetResponse 1 % if set to 1, the code will randomly sample points to construct the density field SampleParticleCount 100000000 % number of points sampled for target density field %------ Construction of force field SampleForceNhalo 0 % number of points to use to for computing force field with a tree SampleForceNdisk 0 SampleForceNbulge 0 Softening 0.05 %------ Accuracy settings of tree code used in construction of force field TypeOfOpeningCriterion 1 ErrTolTheta 0.4 ErrTolForceAcc 0.0025 %------ Domain decomposition parameters used in parallel tree code MultipleDomains 4 TopNodeFactor 4 %------ Parallel I/O paramaters, only affects writing of galaxy files NumFilesPerSnapshot 1 NumFilesWrittenInParallel 1 %------ Memory allocation parameters MaxMemSize 2300.0 % in MB BufferSize 100.0 BufferSizeGravity 100.0 %------ Specification of internal system of units UnitLength_in_cm 3.085678e21 % 1.0 kpc UnitMass_in_g 1.989e43 % 1.0e10 solar masses UnitVelocity_in_cm_per_s 1e5 % 1 km/sec GravityConstantInternal 0 GalIC/Model_B2.param000644 000765 000024 00000014602 12373713530 015056 0ustar00volkerstaff000000 000000 %------ File and path names, as well as output file format OutputDir ./Model_B2 OutputFile snap % Base filename of generated sequence of files SnapFormat 1 % File format selection %------ Basic structural parameters of model CC 10.0 % halo concentration V200 200.0 % circular velocity v_200 (in km/sec) LAMBDA 0.035 % spin parameter MD 0.0 % disk mass fraction MB 0.05 % bulge mass fraction MBH 0.0 % black hole mass fraction. If zero, no black % hole is generated, otherwise one at the centre % is added. 
JD 0.0 % disk spin fraction, typically chosen equal to MD DiskHeight 0.2 % thickness of stellar disk in units of radial scale length BulgeSize 0.1 % bulge scale length in units of halo scale length HaloStretch 1.0 % should be one for a spherical halo, smaller than one corresponds to prolate distortion, otherwise oblate BulgeStretch 1.0 % should be one for a spherical bulge, smaller than one corresponds to prolate distortion, otherwise oblate %------ Particle numbers in target model N_HALO 100000 % desired number of particles in dark halo N_DISK 0 % desired number of collisionless particles in disk N_BULGE 100000 % number of bulge particles %------ Selection of symmetry constraints of velocity structure TypeOfHaloVelocityStructure 1 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfDiskVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfBulgeVelocityStructure 1 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified HaloBetaParameter 0.5 % only relevant for TypeOfHaloVelocityStructure=1, value larger than 1 selects beta(r) model BulgeBetaParameter -1.0 % only relevant for TypeOfBulgeVelocityStructure=1 HaloDispersionRoverZratio 4.0 % only relevant for TypeOfHaloVelocityStructure=3 DiskDispersionRoverZratio 4.0 % only relevant for TypeOfDiskVelocityStructure=3 BulgeDispersionRoverZratio 4.0 % only relevant for TypeOfBulgeVelocityStructure=3 HaloStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') DiskStreamingVelocityParameter 1.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') BulgeStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') %------ Orbit integration accuracy TorbitFac 10.0 % regulates the integration time of orbits % (this is of the order of the typical number of orbits per particle) TimeStepFactorOrbit 0.01 TimeStepFactorCellCross 0.25 %------ Iterative optimization parameters FractionToOptimizeIndependendly 0.001 IndepenentOptimizationsPerStep 100 StepsBetweenDump 10 MaximumNumberOfSteps 100 MinParticlesPerBinForDispersionMeasurement 100 MinParticlesPerBinForDensityMeasurement 50 %------ Grid dimension and extenstion/resolution DG_MaxLevel 7 EG_MaxLevel 7 FG_Nbin 256 % number of bins for the acceleration grid in the R- and z-directions OutermostBinEnclosedMassFraction 0.999 % regulates the fraction of mass of the Hernquist % halo that must be inside the grid (determines grid extension) InnermostBinEnclosedMassFraction 0.0000001 % regulates the fraction of mass enclosed by the innermost % bin (regulates size of innermost grid cells) MaxVelInUnitsVesc 0.9999 % maximum allowed velocity in units of the local escape velocity %------ Construction of target density field SampleDensityFieldForTargetResponse 1 % if set to 1, the code will randomly sample points to construct the 
GalIC/Model_B3.param

%------ File and path names, as well as output file format

OutputDir    ./Model_B3
OutputFile   snap    % Base filename of generated sequence of files
SnapFormat   1       % File format selection

%------ Basic structural parameters of model

CC       10.0     % halo concentration
V200     200.0    % circular velocity v_200 (in km/sec)
LAMBDA   0.035    % spin parameter
MD       0.0      % disk mass fraction
MB       0.05     % bulge mass fraction
MBH      0.0      % black hole mass fraction. If zero, no black hole is generated,
                  % otherwise one at the centre is added.
velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') DiskStreamingVelocityParameter 1.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') BulgeStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') %------ Orbit integration accuracy TorbitFac 10.0 % regulates the integration time of orbits % (this is of the order of the typical number of orbits per particle) TimeStepFactorOrbit 0.01 TimeStepFactorCellCross 0.25 %------ Iterative optimization parameters FractionToOptimizeIndependendly 0.001 IndepenentOptimizationsPerStep 100 StepsBetweenDump 10 MaximumNumberOfSteps 100 MinParticlesPerBinForDispersionMeasurement 100 MinParticlesPerBinForDensityMeasurement 50 %------ Grid dimension and extenstion/resolution DG_MaxLevel 7 EG_MaxLevel 7 FG_Nbin 256 % number of bins for the acceleration grid in the R- and z-directions OutermostBinEnclosedMassFraction 0.999 % regulates the fraction of mass of the Hernquist % halo that must be inside the grid (determines grid extension) InnermostBinEnclosedMassFraction 0.0000001 % regulates the fraction of mass enclosed by the innermost % bin (regulates size of innermost grid cells) MaxVelInUnitsVesc 0.9999 % maximum allowed velocity in units of the local escape velocity %------ Construction of target density field SampleDensityFieldForTargetResponse 1 % if set to 1, the code will randomly sample points to construct the density field SampleParticleCount 100000000 % number of points sampled for target density field %------ Construction of force field SampleForceNhalo 100000000 % number of points to use to for computing force field with a tree SampleForceNdisk 0 SampleForceNbulge 0 Softening 0.05 %------ Accuracy settings of tree code used in construction of force field TypeOfOpeningCriterion 1 ErrTolTheta 0.4 ErrTolForceAcc 0.0025 %------ Domain decomposition parameters used in parallel tree code MultipleDomains 4 TopNodeFactor 4 %------ Parallel I/O paramaters, only affects writing of galaxy files NumFilesPerSnapshot 1 NumFilesWrittenInParallel 1 %------ Memory allocation parameters MaxMemSize 2300.0 % in MB BufferSize 100.0 BufferSizeGravity 100.0 %------ Specification of internal system of units UnitLength_in_cm 3.085678e21 % 1.0 kpc UnitMass_in_g 1.989e43 % 1.0e10 solar masses UnitVelocity_in_cm_per_s 1e5 % 1 km/sec GravityConstantInternal 0 GalIC/Model_B4.param000644 000765 000024 00000014540 12373713531 015062 0ustar00volkerstaff000000 000000 %------ File and path names, as well as output file format OutputDir ./Model_B4 OutputFile snap % Base filename of generated sequence of files SnapFormat 1 % File format selection %------ Basic structural parameters of model CC 10.0 % halo concentration V200 200.0 % circular velocity v_200 (in km/sec) LAMBDA 0.035 % spin parameter MD 0.0 % disk mass fraction MB 0.05 % bulge mass fraction MBH 0.0 % black hole mass fraction. If zero, no black % hole is generated, otherwise one at the centre % is added. 
JD 0.0 % disk spin fraction, typically chosen equal to MD DiskHeight 0.2 % thickness of stellar disk in units of radial scale length BulgeSize 0.1 % bulge scale length in units of halo scale length HaloStretch 1.15 % should be one for a spherical halo, smaller than one corresponds to prolate distortion, otherwise oblate BulgeStretch 0.85 % should be one for a spherical bulge, smaller than one corresponds to prolate distortion, otherwise oblate %------ Particle numbers in target model N_HALO 100000 % desired number of particles in dark halo N_DISK 0 % desired number of collisionless particles in disk N_BULGE 100000 % number of bulge particles %------ Selection of symmetry constraints of velocity structure TypeOfHaloVelocityStructure 2 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfDiskVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfBulgeVelocityStructure 2 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified HaloBetaParameter 0.0 % only relevant for TypeOfHaloVelocityStructure=1 BulgeBetaParameter 0.0 % only relevant for TypeOfBulgeVelocityStructure=1 HaloDispersionRoverZratio 4.0 % only relevant for TypeOfHaloVelocityStructure=3 DiskDispersionRoverZratio 4.0 % only relevant for TypeOfDiskVelocityStructure=3 BulgeDispersionRoverZratio 4.0 % only relevant for TypeOfBulgeVelocityStructure=3 HaloStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') DiskStreamingVelocityParameter 1.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') BulgeStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') %------ Orbit integration accuracy TorbitFac 10.0 % regulates the integration time of orbits % (this is of the order of the typical number of orbits per particle) TimeStepFactorOrbit 0.01 TimeStepFactorCellCross 0.25 %------ Iterative optimization parameters FractionToOptimizeIndependendly 0.001 IndepenentOptimizationsPerStep 100 StepsBetweenDump 10 MaximumNumberOfSteps 100 MinParticlesPerBinForDispersionMeasurement 100 MinParticlesPerBinForDensityMeasurement 50 %------ Grid dimension and extenstion/resolution DG_MaxLevel 7 EG_MaxLevel 7 FG_Nbin 256 % number of bins for the acceleration grid in the R- and z-directions OutermostBinEnclosedMassFraction 0.999 % regulates the fraction of mass of the Hernquist % halo that must be inside the grid (determines grid extension) InnermostBinEnclosedMassFraction 0.0000001 % regulates the fraction of mass enclosed by the innermost % bin (regulates size of innermost grid cells) MaxVelInUnitsVesc 0.9999 % maximum allowed velocity in units of the local escape velocity %------ Construction of target density field SampleDensityFieldForTargetResponse 1 % if set to 1, the code will randomly sample points to construct the density field SampleParticleCount 
100000000 % number of points sampled for target density field %------ Construction of force field SampleForceNhalo 100000000 % number of points to use to for computing force field with a tree SampleForceNdisk 0 SampleForceNbulge 100000000 Softening 0.05 %------ Accuracy settings of tree code used in construction of force field TypeOfOpeningCriterion 1 ErrTolTheta 0.4 ErrTolForceAcc 0.0025 %------ Domain decomposition parameters used in parallel tree code MultipleDomains 4 TopNodeFactor 4 %------ Parallel I/O paramaters, only affects writing of galaxy files NumFilesPerSnapshot 1 NumFilesWrittenInParallel 1 %------ Memory allocation parameters MaxMemSize 2300.0 % in MB BufferSize 100.0 BufferSizeGravity 100.0 %------ Specification of internal system of units UnitLength_in_cm 3.085678e21 % 1.0 kpc UnitMass_in_g 1.989e43 % 1.0e10 solar masses UnitVelocity_in_cm_per_s 1e5 % 1 km/sec GravityConstantInternal 0 GalIC/Model_D1.param000644 000765 000024 00000014531 12373713530 015060 0ustar00volkerstaff000000 000000 %------ File and path names, as well as output file format OutputDir ./Model_D1 OutputFile snap % Base filename of generated sequence of files SnapFormat 1 % File format selection %------ Basic structural parameters of model CC 10.0 % halo concentration V200 200.0 % circular velocity v_200 (in km/sec) LAMBDA 0.035 % spin parameter MD 0.035 % disk mass fraction MB 0.0 % bulge mass fraction MBH 0.0 % black hole mass fraction. If zero, no black % hole is generated, otherwise one at the centre % is added. JD 0.035 % disk spin fraction, typically chosen equal to MD DiskHeight 0.2 % thickness of stellar disk in units of radial scale length BulgeSize 0.1 % bulge scale length in units of halo scale length HaloStretch 1.0 % should be one for a spherical halo, smaller than one corresponds to prolate distortion, otherwise oblate BulgeStretch 1.0 % should be one for a spherical bulge, smaller than one corresponds to prolate distortion, otherwise oblate %------ Particle numbers in target model N_HALO 100000 % desired number of particles in dark halo N_DISK 100000 % desired number of collisionless particles in disk N_BULGE 0 % number of bulge particles %------ Selection of symmetry constraints of velocity structure TypeOfHaloVelocityStructure 2 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfDiskVelocityStructure 2 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfBulgeVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified HaloBetaParameter 0 % only relevant for TypeOfHaloVelocityStructure=1 BulgeBetaParameter 0 % only relevant for TypeOfBulgeVelocityStructure=1 HaloDispersionRoverZratio 4.0 % only relevant for TypeOfHaloVelocityStructure=3 DiskDispersionRoverZratio 4.0 % only relevant for TypeOfDiskVelocityStructure=3 BulgeDispersionRoverZratio 4.0 % only relevant for TypeOfBulgeVelocityStructure=3 HaloStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the 
TypeOf*VelocityStructure=2/3 cases ('k parameter') DiskStreamingVelocityParameter 1.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') BulgeStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') %------ Orbit integration accuracy TorbitFac 10.0 % regulates the integration time of orbits % (this is of the order of the typical number of orbits per particle) TimeStepFactorOrbit 0.01 TimeStepFactorCellCross 0.25 %------ Iterative optimization parameters FractionToOptimizeIndependendly 0.001 IndepenentOptimizationsPerStep 100 StepsBetweenDump 10 MaximumNumberOfSteps 100 MinParticlesPerBinForDispersionMeasurement 100 MinParticlesPerBinForDensityMeasurement 50 %------ Grid dimension and extenstion/resolution DG_MaxLevel 7 EG_MaxLevel 7 FG_Nbin 256 % number of bins for the acceleration grid in the R- and z-directions OutermostBinEnclosedMassFraction 0.999 % regulates the fraction of mass of the Hernquist % halo that must be inside the grid (determines grid extension) InnermostBinEnclosedMassFraction 0.0000001 % regulates the fraction of mass enclosed by the innermost % bin (regulates size of innermost grid cells) MaxVelInUnitsVesc 0.9999 % maximum allowed velocity in units of the local escape velocity %------ Construction of target density field SampleDensityFieldForTargetResponse 1 % if set to 1, the code will randomly sample points to construct the density field SampleParticleCount 100000000 % number of points sampled for target density field %------ Construction of force field SampleForceNhalo 0 % number of points to use to for computing force field with a tree SampleForceNdisk 100000000 SampleForceNbulge 0 Softening 0.05 %------ Accuracy settings of tree code used in construction of force field TypeOfOpeningCriterion 1 ErrTolTheta 0.4 ErrTolForceAcc 0.0025 %------ Domain decomposition parameters used in parallel tree code MultipleDomains 4 TopNodeFactor 4 %------ Parallel I/O paramaters, only affects writing of galaxy files NumFilesPerSnapshot 1 NumFilesWrittenInParallel 1 %------ Memory allocation parameters MaxMemSize 2300.0 % in MB BufferSize 100.0 BufferSizeGravity 100.0 %------ Specification of internal system of units UnitLength_in_cm 3.085678e21 % 1.0 kpc UnitMass_in_g 1.989e43 % 1.0e10 solar masses UnitVelocity_in_cm_per_s 1e5 % 1 km/sec GravityConstantInternal 0 GalIC/Model_D2.param000644 000765 000024 00000014532 12373713531 015063 0ustar00volkerstaff000000 000000 %------ File and path names, as well as output file format OutputDir ./Model_D2 OutputFile snap % Base filename of generated sequence of files SnapFormat 1 % File format selection %------ Basic structural parameters of model CC 10.0 % halo concentration V200 200.0 % circular velocity v_200 (in km/sec) LAMBDA 0.035 % spin parameter MD 0.035 % disk mass fraction MB 0.0 % bulge mass fraction MBH 0.0 % black hole mass fraction. If zero, no black % hole is generated, otherwise one at the centre % is added. 
JD 0.035 % disk spin fraction, typically chosen equal to MD DiskHeight 0.2 % thickness of stellar disk in units of radial scale length BulgeSize 0.1 % bulge scale length in units of halo scale length HaloStretch 0.85 % should be one for a spherical halo, smaller than one corresponds to prolate distortion, otherwise oblate BulgeStretch 1.0 % should be one for a spherical bulge, smaller than one corresponds to prolate distortion, otherwise oblate %------ Particle numbers in target model N_HALO 100000 % desired number of particles in dark halo N_DISK 100000 % desired number of collisionless particles in disk N_BULGE 0 % number of bulge particles %------ Selection of symmetry constraints of velocity structure TypeOfHaloVelocityStructure 2 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfDiskVelocityStructure 2 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfBulgeVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified HaloBetaParameter 0 % only relevant for TypeOfHaloVelocityStructure=1 BulgeBetaParameter 0 % only relevant for TypeOfBulgeVelocityStructure=1 HaloDispersionRoverZratio 4.0 % only relevant for TypeOfHaloVelocityStructure=3 DiskDispersionRoverZratio 4.0 % only relevant for TypeOfDiskVelocityStructure=3 BulgeDispersionRoverZratio 4.0 % only relevant for TypeOfBulgeVelocityStructure=3 HaloStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') DiskStreamingVelocityParameter 1.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') BulgeStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') %------ Orbit integration accuracy TorbitFac 10.0 % regulates the integration time of orbits % (this is of the order of the typical number of orbits per particle) TimeStepFactorOrbit 0.01 TimeStepFactorCellCross 0.25 %------ Iterative optimization parameters FractionToOptimizeIndependendly 0.001 IndepenentOptimizationsPerStep 100 StepsBetweenDump 10 MaximumNumberOfSteps 100 MinParticlesPerBinForDispersionMeasurement 100 MinParticlesPerBinForDensityMeasurement 50 %------ Grid dimension and extenstion/resolution DG_MaxLevel 7 EG_MaxLevel 7 FG_Nbin 256 % number of bins for the acceleration grid in the R- and z-directions OutermostBinEnclosedMassFraction 0.999 % regulates the fraction of mass of the Hernquist % halo that must be inside the grid (determines grid extension) InnermostBinEnclosedMassFraction 0.0000001 % regulates the fraction of mass enclosed by the innermost % bin (regulates size of innermost grid cells) MaxVelInUnitsVesc 0.9999 % maximum allowed velocity in units of the local escape velocity %------ Construction of target density field SampleDensityFieldForTargetResponse 1 % if set to 1, the code will randomly sample points to construct the density field SampleParticleCount 100000000 
% number of points sampled for target density field %------ Construction of force field SampleForceNhalo 100000000 % number of points to use to for computing force field with a tree SampleForceNdisk 100000000 SampleForceNbulge 0 Softening 0.05 %------ Accuracy settings of tree code used in construction of force field TypeOfOpeningCriterion 1 ErrTolTheta 0.4 ErrTolForceAcc 0.0025 %------ Domain decomposition parameters used in parallel tree code MultipleDomains 4 TopNodeFactor 4 %------ Parallel I/O paramaters, only affects writing of galaxy files NumFilesPerSnapshot 1 NumFilesWrittenInParallel 1 %------ Memory allocation parameters MaxMemSize 2300.0 % in MB BufferSize 100.0 BufferSizeGravity 100.0 %------ Specification of internal system of units UnitLength_in_cm 3.085678e21 % 1.0 kpc UnitMass_in_g 1.989e43 % 1.0e10 solar masses UnitVelocity_in_cm_per_s 1e5 % 1 km/sec GravityConstantInternal 0 GalIC/Model_D3.param000644 000765 000024 00000014534 12373713531 015066 0ustar00volkerstaff000000 000000 %------ File and path names, as well as output file format OutputDir ./Model_D3b OutputFile snap % Base filename of generated sequence of files SnapFormat 1 % File format selection %------ Basic structural parameters of model CC 10.0 % halo concentration V200 200.0 % circular velocity v_200 (in km/sec) LAMBDA 0.035 % spin parameter MD 0.035 % disk mass fraction MB 0.0 % bulge mass fraction MBH 0.0 % black hole mass fraction. If zero, no black % hole is generated, otherwise one at the centre % is added. JD 0.035 % disk spin fraction, typically chosen equal to MD DiskHeight 0.2 % thickness of stellar disk in units of radial scale length BulgeSize 0.1 % bulge scale length in units of halo scale length HaloStretch 1.0 % should be one for a spherical halo, smaller than one corresponds to prolate distortion, otherwise oblate BulgeStretch 1.0 % should be one for a spherical bulge, smaller than one corresponds to prolate distortion, otherwise oblate %------ Particle numbers in target model N_HALO 100000 % desired number of particles in dark halo N_DISK 100000 % desired number of collisionless particles in disk N_BULGE 0 % number of bulge particles %------ Selection of symmetry constraints of velocity structure TypeOfHaloVelocityStructure 2 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfDiskVelocityStructure 3 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfBulgeVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified HaloBetaParameter 0 % only relevant for TypeOfHaloVelocityStructure=1 BulgeBetaParameter 0 % only relevant for TypeOfBulgeVelocityStructure=1 HaloDispersionRoverZratio 1.1 % only relevant for TypeOfHaloVelocityStructure=3 DiskDispersionRoverZratio 2.0 % only relevant for TypeOfDiskVelocityStructure=3 BulgeDispersionRoverZratio 1.0 % only relevant for TypeOfBulgeVelocityStructure=3 HaloStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the 
TypeOf*VelocityStructure=2/3 cases ('k parameter') DiskStreamingVelocityParameter 1.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') BulgeStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') %------ Orbit integration accuracy TorbitFac 10.0 % regulates the integration time of orbits % (this is of the order of the typical number of orbits per particle) TimeStepFactorOrbit 0.01 TimeStepFactorCellCross 0.25 %------ Iterative optimization parameters FractionToOptimizeIndependendly 0.001 IndepenentOptimizationsPerStep 100 StepsBetweenDump 10 MaximumNumberOfSteps 100 MinParticlesPerBinForDispersionMeasurement 100 MinParticlesPerBinForDensityMeasurement 50 %------ Grid dimension and extenstion/resolution DG_MaxLevel 7 EG_MaxLevel 7 FG_Nbin 256 % number of bins for the acceleration grid in the R- and z-directions OutermostBinEnclosedMassFraction 0.999 % regulates the fraction of mass of the Hernquist % halo that must be inside the grid (determines grid extension) InnermostBinEnclosedMassFraction 0.0000001 % regulates the fraction of mass enclosed by the innermost % bin (regulates size of innermost grid cells) MaxVelInUnitsVesc 0.9999 % maximum allowed velocity in units of the local escape velocity %------ Construction of target density field SampleDensityFieldForTargetResponse 1 % if set to 1, the code will randomly sample points to construct the density field SampleParticleCount 100000000 % number of points sampled for target density field %------ Construction of force field SampleForceNhalo 0 % number of points to use to for computing force field with a tree SampleForceNdisk 100000000 SampleForceNbulge 0 Softening 0.05 %------ Accuracy settings of tree code used in construction of force field TypeOfOpeningCriterion 1 ErrTolTheta 0.4 ErrTolForceAcc 0.0025 %------ Domain decomposition parameters used in parallel tree code MultipleDomains 4 TopNodeFactor 4 %------ Parallel I/O paramaters, only affects writing of galaxy files NumFilesPerSnapshot 1 NumFilesWrittenInParallel 1 %------ Memory allocation parameters MaxMemSize 2300.0 % in MB BufferSize 100.0 BufferSizeGravity 100.0 %------ Specification of internal system of units UnitLength_in_cm 3.085678e21 % 1.0 kpc UnitMass_in_g 1.989e43 % 1.0e10 solar masses UnitVelocity_in_cm_per_s 1e5 % 1 km/sec GravityConstantInternal 0 GalIC/Model_D4.param000644 000765 000024 00000014533 12373713531 015066 0ustar00volkerstaff000000 000000 %------ File and path names, as well as output file format OutputDir ./Model_D4 OutputFile snap % Base filename of generated sequence of files SnapFormat 1 % File format selection %------ Basic structural parameters of model CC 10.0 % halo concentration V200 200.0 % circular velocity v_200 (in km/sec) LAMBDA 0.035 % spin parameter MD 0.035 % disk mass fraction MB 0.0 % bulge mass fraction MBH 0.0 % black hole mass fraction. If zero, no black % hole is generated, otherwise one at the centre % is added. 
JD 0.035 % disk spin fraction, typically chosen equal to MD DiskHeight 0.2 % thickness of stellar disk in units of radial scale length BulgeSize 0.1 % bulge scale length in units of halo scale length HaloStretch 1.0 % should be one for a spherical halo, smaller than one corresponds to prolate distortion, otherwise oblate BulgeStretch 1.0 % should be one for a spherical bulge, smaller than one corresponds to prolate distortion, otherwise oblate %------ Particle numbers in target model N_HALO 100000 % desired number of particles in dark halo N_DISK 100000 % desired number of collisionless particles in disk N_BULGE 0 % number of bulge particles %------ Selection of symmetry constraints of velocity structure TypeOfHaloVelocityStructure 2 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfDiskVelocityStructure 3 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfBulgeVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified HaloBetaParameter 0 % only relevant for TypeOfHaloVelocityStructure=1 BulgeBetaParameter 0 % only relevant for TypeOfBulgeVelocityStructure=1 HaloDispersionRoverZratio 1.1 % only relevant for TypeOfHaloVelocityStructure=3 DiskDispersionRoverZratio 4.0 % only relevant for TypeOfDiskVelocityStructure=3 BulgeDispersionRoverZratio 1.0 % only relevant for TypeOfBulgeVelocityStructure=3 HaloStreamingVelocityParameter 0.5 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') DiskStreamingVelocityParameter 1.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') BulgeStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') %------ Orbit integration accuracy TorbitFac 10.0 % regulates the integration time of orbits % (this is of the order of the typical number of orbits per particle) TimeStepFactorOrbit 0.01 TimeStepFactorCellCross 0.25 %------ Iterative optimization parameters FractionToOptimizeIndependendly 0.001 IndepenentOptimizationsPerStep 100 StepsBetweenDump 10 MaximumNumberOfSteps 100 MinParticlesPerBinForDispersionMeasurement 100 MinParticlesPerBinForDensityMeasurement 50 %------ Grid dimension and extenstion/resolution DG_MaxLevel 7 EG_MaxLevel 7 FG_Nbin 256 % number of bins for the acceleration grid in the R- and z-directions OutermostBinEnclosedMassFraction 0.999 % regulates the fraction of mass of the Hernquist % halo that must be inside the grid (determines grid extension) InnermostBinEnclosedMassFraction 0.0000001 % regulates the fraction of mass enclosed by the innermost % bin (regulates size of innermost grid cells) MaxVelInUnitsVesc 0.9999 % maximum allowed velocity in units of the local escape velocity %------ Construction of target density field SampleDensityFieldForTargetResponse 1 % if set to 1, the code will randomly sample points to construct the density field SampleParticleCount 100000000 % 
number of points sampled for target density field %------ Construction of force field SampleForceNhalo 0 % number of points to use to for computing force field with a tree SampleForceNdisk 100000000 SampleForceNbulge 0 Softening 0.05 %------ Accuracy settings of tree code used in construction of force field TypeOfOpeningCriterion 1 ErrTolTheta 0.4 ErrTolForceAcc 0.0025 %------ Domain decomposition parameters used in parallel tree code MultipleDomains 4 TopNodeFactor 4 %------ Parallel I/O paramaters, only affects writing of galaxy files NumFilesPerSnapshot 1 NumFilesWrittenInParallel 1 %------ Memory allocation parameters MaxMemSize 2300.0 % in MB BufferSize 100.0 BufferSizeGravity 100.0 %------ Specification of internal system of units UnitLength_in_cm 3.085678e21 % 1.0 kpc UnitMass_in_g 1.989e43 % 1.0e10 solar masses UnitVelocity_in_cm_per_s 1e5 % 1 km/sec GravityConstantInternal 0 GalIC/Model_D5.param000644 000765 000024 00000014533 12373713531 015067 0ustar00volkerstaff000000 000000 %------ File and path names, as well as output file format OutputDir ./Model_D5 OutputFile snap % Base filename of generated sequence of files SnapFormat 1 % File format selection %------ Basic structural parameters of model CC 10.0 % halo concentration V200 200.0 % circular velocity v_200 (in km/sec) LAMBDA 0.035 % spin parameter MD 0.035 % disk mass fraction MB 0.0 % bulge mass fraction MBH 0.0 % black hole mass fraction. If zero, no black % hole is generated, otherwise one at the centre % is added. JD 0.035 % disk spin fraction, typically chosen equal to MD DiskHeight 0.2 % thickness of stellar disk in units of radial scale length BulgeSize 0.1 % bulge scale length in units of halo scale length HaloStretch 0.85 % should be one for a spherical halo, smaller than one corresponds to prolate distortion, otherwise oblate BulgeStretch 1.0 % should be one for a spherical bulge, smaller than one corresponds to prolate distortion, otherwise oblate %------ Particle numbers in target model N_HALO 100000 % desired number of particles in dark halo N_DISK 100000 % desired number of collisionless particles in disk N_BULGE 0 % number of bulge particles %------ Selection of symmetry constraints of velocity structure TypeOfHaloVelocityStructure 2 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfDiskVelocityStructure 3 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfBulgeVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified HaloBetaParameter 0 % only relevant for TypeOfHaloVelocityStructure=1 BulgeBetaParameter 0 % only relevant for TypeOfBulgeVelocityStructure=1 HaloDispersionRoverZratio 1.1 % only relevant for TypeOfHaloVelocityStructure=3 DiskDispersionRoverZratio 2.0 % only relevant for TypeOfDiskVelocityStructure=3 BulgeDispersionRoverZratio 1.0 % only relevant for TypeOfBulgeVelocityStructure=3 HaloStreamingVelocityParameter 1.0 % gives the azimuthal streaming velocity in the 
TypeOf*VelocityStructure=2/3 cases ('k parameter') DiskStreamingVelocityParameter 1.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') BulgeStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') %------ Orbit integration accuracy TorbitFac 10.0 % regulates the integration time of orbits % (this is of the order of the typical number of orbits per particle) TimeStepFactorOrbit 0.01 TimeStepFactorCellCross 0.25 %------ Iterative optimization parameters FractionToOptimizeIndependendly 0.001 IndepenentOptimizationsPerStep 100 StepsBetweenDump 10 MaximumNumberOfSteps 100 MinParticlesPerBinForDispersionMeasurement 100 MinParticlesPerBinForDensityMeasurement 50 %------ Grid dimension and extenstion/resolution DG_MaxLevel 7 EG_MaxLevel 7 FG_Nbin 256 % number of bins for the acceleration grid in the R- and z-directions OutermostBinEnclosedMassFraction 0.999 % regulates the fraction of mass of the Hernquist % halo that must be inside the grid (determines grid extension) InnermostBinEnclosedMassFraction 0.0000001 % regulates the fraction of mass enclosed by the innermost % bin (regulates size of innermost grid cells) MaxVelInUnitsVesc 0.9999 % maximum allowed velocity in units of the local escape velocity %------ Construction of target density field SampleDensityFieldForTargetResponse 1 % if set to 1, the code will randomly sample points to construct the density field SampleParticleCount 100000000 % number of points sampled for target density field %------ Construction of force field SampleForceNhalo 100000000 % number of points to use to for computing force field with a tree SampleForceNdisk 100000000 SampleForceNbulge 0 Softening 0.05 %------ Accuracy settings of tree code used in construction of force field TypeOfOpeningCriterion 1 ErrTolTheta 0.4 ErrTolForceAcc 0.0025 %------ Domain decomposition parameters used in parallel tree code MultipleDomains 4 TopNodeFactor 4 %------ Parallel I/O paramaters, only affects writing of galaxy files NumFilesPerSnapshot 1 NumFilesWrittenInParallel 1 %------ Memory allocation parameters MaxMemSize 2300.0 % in MB BufferSize 100.0 BufferSizeGravity 100.0 %------ Specification of internal system of units UnitLength_in_cm 3.085678e21 % 1.0 kpc UnitMass_in_g 1.989e43 % 1.0e10 solar masses UnitVelocity_in_cm_per_s 1e5 % 1 km/sec GravityConstantInternal 0 GalIC/Model_H1.param000644 000765 000024 00000014520 12373713531 015063 0ustar00volkerstaff000000 000000 %------ File and path names, as well as output file format OutputDir ./Model_H1 OutputFile snap % Base filename of generated sequence of files SnapFormat 1 % File format selection %------ Basic structural parameters of model CC 10.0 % halo concentration V200 200.0 % circular velocity v_200 (in km/sec) LAMBDA 0.0 % spin parameter MD 0.0 % disk mass fraction MB 0.0 % bulge mass fraction MBH 0.0 % black hole mass fraction. If zero, no black % hole is generated, otherwise one at the centre % is added. 
JD             0.00    % disk spin fraction, typically chosen equal to MD
DiskHeight     0.2     % thickness of stellar disk in units of radial scale length
BulgeSize      0.2     % bulge scale length in units of halo scale length
HaloStretch    1.0     % should be one for a spherical halo, smaller than one corresponds to prolate distortion, otherwise oblate
BulgeStretch   1.0     % should be one for a spherical bulge, smaller than one corresponds to prolate distortion, otherwise oblate

%------ Particle numbers in target model

N_HALO    100000    % desired number of particles in dark halo
N_DISK    0         % desired number of collisionless particles in disk
N_BULGE   0         % number of bulge particles

%------ Selection of symmetry constraints of velocity structure

TypeOfHaloVelocityStructure    0    % 0 = spherically symmetric, isotropic
                                    % 1 = spherically symmetric, anisotropic (with beta parameter specified)
                                    % 2 = axisymmetric, f(E, Lz), with specified net rotation
                                    % 3 = axisymmetric, f(E, Lz, I_3), with the R/z dispersion ratio specified and net rotation specified

TypeOfDiskVelocityStructure    0    % 0 = spherically symmetric, isotropic
                                    % 1 = spherically symmetric, anisotropic (with beta parameter specified)
                                    % 2 = axisymmetric, f(E, Lz), with specified net rotation
                                    % 3 = axisymmetric, f(E, Lz, I_3), with the R/z dispersion ratio specified and net rotation specified

TypeOfBulgeVelocityStructure   0    % 0 = spherically symmetric, isotropic
                                    % 1 = spherically symmetric, anisotropic (with beta parameter specified)
                                    % 2 = axisymmetric, f(E, Lz), with specified net rotation
                                    % 3 = axisymmetric, f(E, Lz, I_3), with the R/z dispersion ratio specified and net rotation specified

HaloBetaParameter    0    % only relevant for TypeOfHaloVelocityStructure=1
BulgeBetaParameter   0    % only relevant for TypeOfBulgeVelocityStructure=1

HaloDispersionRoverZratio    4.0    % only relevant for TypeOfHaloVelocityStructure=3
DiskDispersionRoverZratio    4.0    % only relevant for TypeOfDiskVelocityStructure=3
BulgeDispersionRoverZratio   4.0    % only relevant for TypeOfBulgeVelocityStructure=3

HaloStreamingVelocityParameter    0.0    % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter')
DiskStreamingVelocityParameter    1.0    % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter')
BulgeStreamingVelocityParameter   0.0    % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter')

%------ Orbit integration accuracy

TorbitFac                 10.0    % regulates the integration time of orbits
                                  % (this is of the order of the typical number of orbits per particle)
TimeStepFactorOrbit       0.01
TimeStepFactorCellCross   0.25

%------ Iterative optimization parameters

FractionToOptimizeIndependendly   0.001
IndepenentOptimizationsPerStep    100
StepsBetweenDump                  10
MaximumNumberOfSteps              100

MinParticlesPerBinForDispersionMeasurement   100
MinParticlesPerBinForDensityMeasurement      50

%------ Grid dimension and extension/resolution

DG_MaxLevel   7
EG_MaxLevel   7
FG_Nbin       256    % number of bins for the acceleration grid in the R- and z-directions

OutermostBinEnclosedMassFraction   0.999       % regulates the fraction of mass of the Hernquist
                                               % halo that must be inside the grid (determines grid extension)
InnermostBinEnclosedMassFraction   0.0000001   % regulates the fraction of mass enclosed by the innermost
                                               % bin (regulates size of innermost grid cells)

MaxVelInUnitsVesc   0.9999    % maximum allowed velocity in units of the local escape velocity

%------ Construction of target density field

SampleDensityFieldForTargetResponse   1           % if set to 1, the code will randomly sample points to construct the density field
SampleParticleCount                   100000000   % number of points sampled for target density field

%------ Construction of force field

SampleForceNhalo    0    % number of points to use for computing the force field with a tree
SampleForceNdisk    0
SampleForceNbulge   0

Softening   0.05

%------ Accuracy settings of tree code used in construction of force field

TypeOfOpeningCriterion   1
ErrTolTheta              0.4
ErrTolForceAcc           0.0025

%------ Domain decomposition parameters used in parallel tree code

MultipleDomains   4
TopNodeFactor     4

%------ Parallel I/O parameters, only affects writing of galaxy files

NumFilesPerSnapshot         1
NumFilesWrittenInParallel   1

%------ Memory allocation parameters

MaxMemSize          2300.0    % in MB
BufferSize          100.0
BufferSizeGravity   100.0

%------ Specification of internal system of units

UnitLength_in_cm           3.085678e21    % 1.0 kpc
UnitMass_in_g              1.989e43       % 1.0e10 solar masses
UnitVelocity_in_cm_per_s   1e5            % 1 km/sec
GravityConstantInternal    0
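Once a model such as this one has been generated, the resulting particle data can be inspected directly. The sketch below is a minimal, illustrative reader for the SnapFormat 1 output selected above, assuming the standard Gadget-style format-1 binary layout (a 256-byte header block followed by a float32 position block, each framed by 4-byte block-size markers); the snapshot file name used here is only an example, and the actual names depend on OutputDir and OutputFile. This snippet is not part of GalIC.

import numpy as np

# Minimal sketch (not part of GalIC) for a SnapFormat=1 Gadget-style binary snapshot.
def read_npart_and_positions(filename):
    with open(filename, "rb") as f:
        header_size = int(np.fromfile(f, dtype=np.uint32, count=1)[0])   # leading size marker of header block (256)
        npart = np.fromfile(f, dtype=np.uint32, count=6)                 # particle numbers per type
        f.seek(4 + header_size + 4)                                      # skip the remainder of the header block
        np.fromfile(f, dtype=np.uint32, count=1)                         # leading size marker of the position block
        pos = np.fromfile(f, dtype=np.float32, count=3 * int(npart.sum())).reshape(-1, 3)
    return npart, pos

# Example usage with an assumed file name:
npart, pos = read_npart_and_positions("Model_H1/snap_010")
print(npart, pos.shape)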
GalIC/Model_H2.param

%------ File and path names, as well as output file format

OutputDir    ./Model_H2
OutputFile   snap    % Base filename of generated sequence of files
SnapFormat   1       % File format selection

%------ Basic structural parameters of model

CC       10.0     % halo concentration
V200     200.0    % circular velocity v_200 (in km/sec)
LAMBDA   0.0      % spin parameter
MD       0.0      % disk mass fraction
MB       0.0      % bulge mass fraction
MBH      0.0      % black hole mass fraction. If zero, no black hole is generated,
                  % otherwise one at the centre is added.
JD       0.00     % disk spin fraction, typically chosen equal to MD

DiskHeight     0.2    % thickness of stellar disk in units of radial scale length
BulgeSize      0.2    % bulge scale length in units of halo scale length
HaloStretch    1.0    % should be one for a spherical halo, smaller than one corresponds to prolate distortion, otherwise oblate
BulgeStretch   1.0    % should be one for a spherical bulge, smaller than one corresponds to prolate distortion, otherwise oblate

%------ Particle numbers in target model

N_HALO    100000    % desired number of particles in dark halo
N_DISK    0         % desired number of collisionless particles in disk
N_BULGE   0         % number of bulge particles

%------ Selection of symmetry constraints of velocity structure

TypeOfHaloVelocityStructure    1    % 0 = spherically symmetric, isotropic
                                    % 1 = spherically symmetric, anisotropic (with beta parameter specified)
                                    % 2 = axisymmetric, f(E, Lz), with specified net rotation
                                    % 3 = axisymmetric, f(E, Lz, I_3), with the R/z dispersion ratio specified and net rotation specified

TypeOfDiskVelocityStructure    0    % 0 = spherically symmetric, isotropic
                                    % 1 = spherically symmetric, anisotropic (with beta parameter specified)
                                    % 2 = axisymmetric, f(E, Lz), with specified net rotation
                                    % 3 = axisymmetric, f(E, Lz, I_3), with the R/z dispersion ratio specified and net rotation specified

TypeOfBulgeVelocityStructure   0    % 0 = spherically symmetric, isotropic
                                    % 1 = spherically symmetric, anisotropic (with beta parameter specified)
                                    % 2 = axisymmetric, f(E, Lz), with specified net rotation
                                    % 3 = axisymmetric, f(E, Lz, I_3), with the R/z dispersion ratio specified and net rotation specified

HaloBetaParameter    0.5    % only relevant for TypeOfHaloVelocityStructure=1
BulgeBetaParameter   0      % only relevant for TypeOfBulgeVelocityStructure=1

HaloDispersionRoverZratio    4.0    % only relevant for TypeOfHaloVelocityStructure=3
DiskDispersionRoverZratio    4.0    % only relevant for TypeOfDiskVelocityStructure=3
BulgeDispersionRoverZratio   4.0    % only relevant for TypeOfBulgeVelocityStructure=3

HaloStreamingVelocityParameter   0.0    % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k
parameter') DiskStreamingVelocityParameter 1.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') BulgeStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') %------ Orbit integration accuracy TorbitFac 10.0 % regulates the integration time of orbits % (this is of the order of the typical number of orbits per particle) TimeStepFactorOrbit 0.01 TimeStepFactorCellCross 0.25 %------ Iterative optimization parameters FractionToOptimizeIndependendly 0.001 IndepenentOptimizationsPerStep 100 StepsBetweenDump 10 MaximumNumberOfSteps 100 MinParticlesPerBinForDispersionMeasurement 100 MinParticlesPerBinForDensityMeasurement 50 %------ Grid dimension and extenstion/resolution DG_MaxLevel 7 EG_MaxLevel 7 FG_Nbin 256 % number of bins for the acceleration grid in the R- and z-directions OutermostBinEnclosedMassFraction 0.999 % regulates the fraction of mass of the Hernquist % halo that must be inside the grid (determines grid extension) InnermostBinEnclosedMassFraction 0.0000001 % regulates the fraction of mass enclosed by the innermost % bin (regulates size of innermost grid cells) MaxVelInUnitsVesc 0.9999 % maximum allowed velocity in units of the local escape velocity %------ Construction of target density field SampleDensityFieldForTargetResponse 1 % if set to 1, the code will randomly sample points to construct the density field SampleParticleCount 100000000 % number of points sampled for target density field %------ Construction of force field SampleForceNhalo 0 % number of points to use to for computing force field with a tree SampleForceNdisk 0 SampleForceNbulge 0 Softening 0.05 %------ Accuracy settings of tree code used in construction of force field TypeOfOpeningCriterion 1 ErrTolTheta 0.4 ErrTolForceAcc 0.0025 %------ Domain decomposition parameters used in parallel tree code MultipleDomains 4 TopNodeFactor 4 %------ Parallel I/O paramaters, only affects writing of galaxy files NumFilesPerSnapshot 1 NumFilesWrittenInParallel 1 %------ Memory allocation parameters MaxMemSize 2300.0 % in MB BufferSize 100.0 BufferSizeGravity 100.0 %------ Specification of internal system of units UnitLength_in_cm 3.085678e21 % 1.0 kpc UnitMass_in_g 1.989e43 % 1.0e10 solar masses UnitVelocity_in_cm_per_s 1e5 % 1 km/sec GravityConstantInternal 0 GalIC/Model_H3.param000644 000765 000024 00000014524 12373713531 015071 0ustar00volkerstaff000000 000000 %------ File and path names, as well as output file format OutputDir ./Model_H3 OutputFile snap % Base filename of generated sequence of files SnapFormat 1 % File format selection %------ Basic structural parameters of model CC 10.0 % halo concentration V200 200.0 % circular velocity v_200 (in km/sec) LAMBDA 0.0 % spin parameter MD 0.0 % disk mass fraction MB 0.0 % bulge mass fraction MBH 0.0 % black hole mass fraction. If zero, no black % hole is generated, otherwise one at the centre % is added. 
JD 0.00 % disk spin fraction, typically chosen equal to MD DiskHeight 0.2 % thickness of stellar disk in units of radial scale length BulgeSize 0.2 % bulge scale length in units of halo scale length HaloStretch 1.0 % should be one for a spherical halo, smaller than one corresponds to prolate distortion, otherwise oblate BulgeStretch 1.0 % should be one for a spherical bulge, smaller than one corresponds to prolate distortion, otherwise oblate %------ Particle numbers in target model N_HALO 100000 % desired number of particles in dark halo N_DISK 0 % desired number of collisionless particles in disk N_BULGE 0 % number of bulge particles %------ Selection of symmetry constraints of velocity structure TypeOfHaloVelocityStructure 1 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfDiskVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfBulgeVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified HaloBetaParameter -1.0 % only relevant for TypeOfHaloVelocityStructure=1 BulgeBetaParameter 0 % only relevant for TypeOfBulgeVelocityStructure=1 HaloDispersionRoverZratio 4.0 % only relevant for TypeOfHaloVelocityStructure=3 DiskDispersionRoverZratio 4.0 % only relevant for TypeOfDiskVelocityStructure=3 BulgeDispersionRoverZratio 4.0 % only relevant for TypeOfBulgeVelocityStructure=3 HaloStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') DiskStreamingVelocityParameter 1.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') BulgeStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the TypeOf*VelocityStructure=2/3 cases ('k parameter') %------ Orbit integration accuracy TorbitFac 10.0 % regulates the integration time of orbits % (this is of the order of the typical number of orbits per particle) TimeStepFactorOrbit 0.01 TimeStepFactorCellCross 0.25 %------ Iterative optimization parameters FractionToOptimizeIndependendly 0.001 IndepenentOptimizationsPerStep 100 StepsBetweenDump 10 MaximumNumberOfSteps 100 MinParticlesPerBinForDispersionMeasurement 100 MinParticlesPerBinForDensityMeasurement 50 %------ Grid dimension and extenstion/resolution DG_MaxLevel 7 EG_MaxLevel 7 FG_Nbin 256 % number of bins for the acceleration grid in the R- and z-directions OutermostBinEnclosedMassFraction 0.999 % regulates the fraction of mass of the Hernquist % halo that must be inside the grid (determines grid extension) InnermostBinEnclosedMassFraction 0.0000001 % regulates the fraction of mass enclosed by the innermost % bin (regulates size of innermost grid cells) MaxVelInUnitsVesc 0.9999 % maximum allowed velocity in units of the local escape velocity %------ Construction of target density field SampleDensityFieldForTargetResponse 1 % if set to 1, the code will randomly sample points to construct the density field SampleParticleCount 100000000 % 
number of points sampled for target density field %------ Construction of force field SampleForceNhalo 0 % number of points to use to for computing force field with a tree SampleForceNdisk 0 SampleForceNbulge 0 Softening 0.05 %------ Accuracy settings of tree code used in construction of force field TypeOfOpeningCriterion 1 ErrTolTheta 0.4 ErrTolForceAcc 0.0025 %------ Domain decomposition parameters used in parallel tree code MultipleDomains 4 TopNodeFactor 4 %------ Parallel I/O paramaters, only affects writing of galaxy files NumFilesPerSnapshot 1 NumFilesWrittenInParallel 1 %------ Memory allocation parameters MaxMemSize 2300.0 % in MB BufferSize 100.0 BufferSizeGravity 100.0 %------ Specification of internal system of units UnitLength_in_cm 3.085678e21 % 1.0 kpc UnitMass_in_g 1.989e43 % 1.0e10 solar masses UnitVelocity_in_cm_per_s 1e5 % 1 km/sec GravityConstantInternal 0 GalIC/Model_H4.param000644 000765 000024 00000014600 12373713531 015065 0ustar00volkerstaff000000 000000 %------ File and path names, as well as output file format OutputDir ./Model_H4 OutputFile snap % Base filename of generated sequence of files SnapFormat 1 % File format selection %------ Basic structural parameters of model CC 10.0 % halo concentration V200 200.0 % circular velocity v_200 (in km/sec) LAMBDA 0.0 % spin parameter MD 0.0 % disk mass fraction MB 0.0 % bulge mass fraction MBH 0.0 % black hole mass fraction. If zero, no black % hole is generated, otherwise one at the centre % is added. JD 0.00 % disk spin fraction, typically chosen equal to MD DiskHeight 0.2 % thickness of stellar disk in units of radial scale length BulgeSize 0.2 % bulge scale length in units of halo scale length HaloStretch 1.0 % should be one for a spherical halo, smaller than one corresponds to prolate distortion, otherwise oblate BulgeStretch 1.0 % should be one for a spherical bulge, smaller than one corresponds to prolate distortion, otherwise oblate %------ Particle numbers in target model N_HALO 100000 % desired number of particles in dark halo N_DISK 0 % desired number of collisionless particles in disk N_BULGE 0 % number of bulge particles %------ Selection of symmetry constraints of velocity structure TypeOfHaloVelocityStructure 1 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfDiskVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified TypeOfBulgeVelocityStructure 0 % 0 = spherically symmetric, isotropic % 1 = spherically symmetric, anisotropic (with beta parameter specified) % 2 = axisymmetric, f(E, Lz), with specified net rotation % 3 = axisymmetric, f(E, Lz, I_3), with / specified and net rotation specified HaloBetaParameter 2.0 % only relevant for TypeOfHaloVelocityStructure=1, value larger than 1 selects beta(r) model BulgeBetaParameter 0 % only relevant for TypeOfBulgeVelocityStructure=1 HaloDispersionRoverZratio 4.0 % only relevant for TypeOfHaloVelocityStructure=3 DiskDispersionRoverZratio 4.0 % only relevant for TypeOfDiskVelocityStructure=3 BulgeDispersionRoverZratio 4.0 % only relevant for TypeOfBulgeVelocityStructure=3 HaloStreamingVelocityParameter 0.0 % gives the azimuthal streaming velocity in the 

GalIC/Model_H5.param

%------ File and path names, as well as output file format
OutputDir  ./Model_H5
OutputFile  snap  % Base filename of generated sequence of files
SnapFormat  1  % File format selection

%------ Basic structural parameters of model
CC  10.0  % halo concentration
V200  200.0  % circular velocity v_200 (in km/sec)
LAMBDA  0.0  % spin parameter
MD  0.0  % disk mass fraction
MB  0.0  % bulge mass fraction
MBH  0.0  % black hole mass fraction; if zero, no black hole is generated, otherwise one is added at the centre
JD  0.00  % disk spin fraction, typically chosen equal to MD
DiskHeight  0.2  % thickness of stellar disk in units of radial scale length
BulgeSize  0.2  % bulge scale length in units of halo scale length
HaloStretch  1.0  % should be one for a spherical halo; smaller than one corresponds to a prolate distortion, larger than one to an oblate one
BulgeStretch  1.0  % should be one for a spherical bulge; smaller than one corresponds to a prolate distortion, larger than one to an oblate one

%------ Particle numbers in target model
N_HALO  100000  % desired number of particles in dark halo
N_DISK  0  % desired number of collisionless particles in disk
N_BULGE  0  % number of bulge particles

%------ Selection of symmetry constraints of velocity structure
TypeOfHaloVelocityStructure  2
        % 0 = spherically symmetric, isotropic
        % 1 = spherically symmetric, anisotropic (with beta parameter specified)
        % 2 = axisymmetric, f(E, Lz), with specified net rotation
        % 3 = axisymmetric, f(E, Lz, I_3), with specified R-to-z dispersion ratio and net rotation
TypeOfDiskVelocityStructure  0  % same coding as for TypeOfHaloVelocityStructure
TypeOfBulgeVelocityStructure  0  % same coding as for TypeOfHaloVelocityStructure

HaloBetaParameter  0  % only relevant for TypeOfHaloVelocityStructure=1; a value larger than 1 selects a beta(r) model
BulgeBetaParameter  0  % only relevant for TypeOfBulgeVelocityStructure=1
HaloDispersionRoverZratio  4.0  % only relevant for TypeOfHaloVelocityStructure=3
DiskDispersionRoverZratio  4.0  % only relevant for TypeOfDiskVelocityStructure=3
BulgeDispersionRoverZratio  4.0  % only relevant for TypeOfBulgeVelocityStructure=3
HaloStreamingVelocityParameter  -0.1  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases; a negative value gives this in units of kmax
DiskStreamingVelocityParameter  0.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases
BulgeStreamingVelocityParameter  0.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases

%------ Orbit integration accuracy
TorbitFac  10.0  % regulates the integration time of orbits (of the order of the typical number of orbits per particle)
TimeStepFactorOrbit  0.01
TimeStepFactorCellCross  0.25

%------ Iterative optimization parameters
FractionToOptimizeIndependendly  0.001
IndepenentOptimizationsPerStep  100
StepsBetweenDump  10
MaximumNumberOfSteps  100
MinParticlesPerBinForDispersionMeasurement  100
MinParticlesPerBinForDensityMeasurement  50

%------ Grid dimension and extension/resolution
DG_MaxLevel  7
EG_MaxLevel  7
FG_Nbin  256  % number of bins for the acceleration grid in the R- and z-directions
OutermostBinEnclosedMassFraction  0.999  % regulates the fraction of the mass of the Hernquist halo that must be inside the grid (determines the grid extension)
InnermostBinEnclosedMassFraction  0.0000001  % regulates the fraction of mass enclosed by the innermost bin (sets the size of the innermost grid cells)
MaxVelInUnitsVesc  0.9999  % maximum allowed velocity in units of the local escape velocity

%------ Construction of target density field
SampleDensityFieldForTargetResponse  1  % if set to 1, the code will randomly sample points to construct the density field
SampleParticleCount  100000000  % number of points sampled for target density field

%------ Construction of force field
SampleForceNhalo  0  % number of points to use for computing the force field with a tree
SampleForceNdisk  0
SampleForceNbulge  0
Softening  0.05

%------ Accuracy settings of tree code used in construction of force field
TypeOfOpeningCriterion  1
ErrTolTheta  0.4
ErrTolForceAcc  0.0025

%------ Domain decomposition parameters used in parallel tree code
MultipleDomains  4
TopNodeFactor  4

%------ Parallel I/O parameters, only affect the writing of galaxy files
NumFilesPerSnapshot  1
NumFilesWrittenInParallel  1

%------ Memory allocation parameters
MaxMemSize  2300.0  % in MB
BufferSize  100.0
BufferSizeGravity  100.0

%------ Specification of internal system of units
UnitLength_in_cm  3.085678e21  % 1.0 kpc
UnitMass_in_g  1.989e43  % 1.0e10 solar masses
UnitVelocity_in_cm_per_s  1e5  % 1 km/sec
GravityConstantInternal  0
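
Model_H5 is the only model in this set that uses a negative streaming parameter (HaloStreamingVelocityParameter of -0.1); according to the comment in the file, a negative value is interpreted in units of kmax. A tiny sketch of that convention, with a hypothetical helper name and an assumed kmax purely for illustration:

    def effective_k(k, k_max):
        # Convention described in the parameter comment: a negative 'k parameter'
        # is taken as a fraction of the maximum allowed value k_max.
        return abs(k) * k_max if k < 0 else k

    # With a made-up k_max of 0.8, a parameter value of -0.1 would act like k = 0.08.
    print(effective_k(-0.1, 0.8))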

GalIC/Model_H6.param

%------ File and path names, as well as output file format
OutputDir  ./Model_H6
OutputFile  snap  % Base filename of generated sequence of files
SnapFormat  1  % File format selection

%------ Basic structural parameters of model
CC  10.0  % halo concentration
V200  200.0  % circular velocity v_200 (in km/sec)
LAMBDA  0.0  % spin parameter
MD  0.0  % disk mass fraction
MB  0.0  % bulge mass fraction
MBH  0.0  % black hole mass fraction; if zero, no black hole is generated, otherwise one is added at the centre
JD  0.00  % disk spin fraction, typically chosen equal to MD
DiskHeight  0.2  % thickness of stellar disk in units of radial scale length
BulgeSize  0.2  % bulge scale length in units of halo scale length
HaloStretch  0.85  % should be one for a spherical halo; smaller than one corresponds to a prolate distortion, larger than one to an oblate one
BulgeStretch  1.0  % should be one for a spherical bulge; smaller than one corresponds to a prolate distortion, larger than one to an oblate one

%------ Particle numbers in target model
N_HALO  100000  % desired number of particles in dark halo
N_DISK  0  % desired number of collisionless particles in disk
N_BULGE  0  % number of bulge particles

%------ Selection of symmetry constraints of velocity structure
TypeOfHaloVelocityStructure  2
        % 0 = spherically symmetric, isotropic
        % 1 = spherically symmetric, anisotropic (with beta parameter specified)
        % 2 = axisymmetric, f(E, Lz), with specified net rotation
        % 3 = axisymmetric, f(E, Lz, I_3), with specified R-to-z dispersion ratio and net rotation
TypeOfDiskVelocityStructure  0  % same coding as for TypeOfHaloVelocityStructure
TypeOfBulgeVelocityStructure  0  % same coding as for TypeOfHaloVelocityStructure

HaloBetaParameter  0  % only relevant for TypeOfHaloVelocityStructure=1; a value larger than 1 selects a beta(r) model
BulgeBetaParameter  0  % only relevant for TypeOfBulgeVelocityStructure=1
HaloDispersionRoverZratio  4.0  % only relevant for TypeOfHaloVelocityStructure=3
DiskDispersionRoverZratio  4.0  % only relevant for TypeOfDiskVelocityStructure=3
BulgeDispersionRoverZratio  4.0  % only relevant for TypeOfBulgeVelocityStructure=3
HaloStreamingVelocityParameter  0.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases
DiskStreamingVelocityParameter  0.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases
BulgeStreamingVelocityParameter  0.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases

%------ Orbit integration accuracy
TorbitFac  10.0  % regulates the integration time of orbits (of the order of the typical number of orbits per particle)
TimeStepFactorOrbit  0.01
TimeStepFactorCellCross  0.25

%------ Iterative optimization parameters
FractionToOptimizeIndependendly  0.001
IndepenentOptimizationsPerStep  100
StepsBetweenDump  10
MaximumNumberOfSteps  100
MinParticlesPerBinForDispersionMeasurement  100
MinParticlesPerBinForDensityMeasurement  50

%------ Grid dimension and extension/resolution
DG_MaxLevel  7
EG_MaxLevel  7
FG_Nbin  256  % number of bins for the acceleration grid in the R- and z-directions
OutermostBinEnclosedMassFraction  0.999  % regulates the fraction of the mass of the Hernquist halo that must be inside the grid (determines the grid extension)
InnermostBinEnclosedMassFraction  0.0000001  % regulates the fraction of mass enclosed by the innermost bin (sets the size of the innermost grid cells)
MaxVelInUnitsVesc  0.9999  % maximum allowed velocity in units of the local escape velocity

%------ Construction of target density field
SampleDensityFieldForTargetResponse  1  % if set to 1, the code will randomly sample points to construct the density field
SampleParticleCount  100000000  % number of points sampled for target density field

%------ Construction of force field
SampleForceNhalo  100000000  % number of points to use for computing the force field with a tree
SampleForceNdisk  0
SampleForceNbulge  0
Softening  0.05

%------ Accuracy settings of tree code used in construction of force field
TypeOfOpeningCriterion  1
ErrTolTheta  0.4
ErrTolForceAcc  0.0025

%------ Domain decomposition parameters used in parallel tree code
MultipleDomains  4
TopNodeFactor  4

%------ Parallel I/O parameters, only affect the writing of galaxy files
NumFilesPerSnapshot  1
NumFilesWrittenInParallel  1

%------ Memory allocation parameters
MaxMemSize  2300.0  % in MB
BufferSize  100.0
BufferSizeGravity  100.0

%------ Specification of internal system of units
UnitLength_in_cm  3.085678e21  % 1.0 kpc
UnitMass_in_g  1.989e43  % 1.0e10 solar masses
UnitVelocity_in_cm_per_s  1e5  % 1 km/sec
GravityConstantInternal  0

GalIC/Model_H7.param

%------ File and path names, as well as output file format
OutputDir  ./Model_H7
OutputFile  snap  % Base filename of generated sequence of files
SnapFormat  1  % File format selection

%------ Basic structural parameters of model
CC  10.0  % halo concentration
V200  200.0  % circular velocity v_200 (in km/sec)
LAMBDA  0.0  % spin parameter
MD  0.0  % disk mass fraction
MB  0.0  % bulge mass fraction
MBH  0.0  % black hole mass fraction; if zero, no black hole is generated, otherwise one is added at the centre
JD  0.00  % disk spin fraction, typically chosen equal to MD
DiskHeight  0.2  % thickness of stellar disk in units of radial scale length
BulgeSize  0.2  % bulge scale length in units of halo scale length
HaloStretch  1.15  % should be one for a spherical halo; smaller than one corresponds to a prolate distortion, larger than one to an oblate one
BulgeStretch  1.0  % should be one for a spherical bulge; smaller than one corresponds to a prolate distortion, larger than one to an oblate one

%------ Particle numbers in target model
N_HALO  100000  % desired number of particles in dark halo
N_DISK  0  % desired number of collisionless particles in disk
N_BULGE  0  % number of bulge particles

%------ Selection of symmetry constraints of velocity structure
TypeOfHaloVelocityStructure  2
        % 0 = spherically symmetric, isotropic
        % 1 = spherically symmetric, anisotropic (with beta parameter specified)
        % 2 = axisymmetric, f(E, Lz), with specified net rotation
        % 3 = axisymmetric, f(E, Lz, I_3), with specified R-to-z dispersion ratio and net rotation
TypeOfDiskVelocityStructure  0  % same coding as for TypeOfHaloVelocityStructure
TypeOfBulgeVelocityStructure  0  % same coding as for TypeOfHaloVelocityStructure

HaloBetaParameter  0  % only relevant for TypeOfHaloVelocityStructure=1; a value larger than 1 selects a beta(r) model
BulgeBetaParameter  0  % only relevant for TypeOfBulgeVelocityStructure=1
HaloDispersionRoverZratio  4.0  % only relevant for TypeOfHaloVelocityStructure=3
DiskDispersionRoverZratio  4.0  % only relevant for TypeOfDiskVelocityStructure=3
BulgeDispersionRoverZratio  4.0  % only relevant for TypeOfBulgeVelocityStructure=3
HaloStreamingVelocityParameter  0.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases
DiskStreamingVelocityParameter  0.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases
BulgeStreamingVelocityParameter  0.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases

%------ Orbit integration accuracy
TorbitFac  10.0  % regulates the integration time of orbits (of the order of the typical number of orbits per particle)
TimeStepFactorOrbit  0.01
TimeStepFactorCellCross  0.25

%------ Iterative optimization parameters
FractionToOptimizeIndependendly  0.001
IndepenentOptimizationsPerStep  100
StepsBetweenDump  10
MaximumNumberOfSteps  100
MinParticlesPerBinForDispersionMeasurement  100
MinParticlesPerBinForDensityMeasurement  50

%------ Grid dimension and extension/resolution
DG_MaxLevel  7
EG_MaxLevel  7
FG_Nbin  256  % number of bins for the acceleration grid in the R- and z-directions
OutermostBinEnclosedMassFraction  0.999  % regulates the fraction of the mass of the Hernquist halo that must be inside the grid (determines the grid extension)
InnermostBinEnclosedMassFraction  0.0000001  % regulates the fraction of mass enclosed by the innermost bin (sets the size of the innermost grid cells)
MaxVelInUnitsVesc  0.9999  % maximum allowed velocity in units of the local escape velocity

%------ Construction of target density field
SampleDensityFieldForTargetResponse  1  % if set to 1, the code will randomly sample points to construct the density field
SampleParticleCount  100000000  % number of points sampled for target density field

%------ Construction of force field
SampleForceNhalo  100000000  % number of points to use for computing the force field with a tree
SampleForceNdisk  0
SampleForceNbulge  0
Softening  0.05

%------ Accuracy settings of tree code used in construction of force field
TypeOfOpeningCriterion  1
ErrTolTheta  0.4
ErrTolForceAcc  0.0025

%------ Domain decomposition parameters used in parallel tree code
MultipleDomains  4
TopNodeFactor  4

%------ Parallel I/O parameters, only affect the writing of galaxy files
NumFilesPerSnapshot  1
NumFilesWrittenInParallel  1

%------ Memory allocation parameters
MaxMemSize  2300.0  % in MB
BufferSize  100.0
BufferSizeGravity  100.0

%------ Specification of internal system of units
UnitLength_in_cm  3.085678e21  % 1.0 kpc
UnitMass_in_g  1.989e43  % 1.0e10 solar masses
UnitVelocity_in_cm_per_s  1e5  % 1 km/sec
GravityConstantInternal  0
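
All of the models write their particle dumps with SnapFormat 1. Assuming this corresponds to the classic GADGET binary format 1 (a guess based on the overall file layout, not a statement taken from the GalIC documentation), the header of a produced file can be inspected with a short script along these lines:

    import struct
    import sys

    # Sketch of a reader for the fixed 256-byte header of a GADGET format-1
    # binary snapshot; assumes native endianness and that GalIC's SnapFormat=1
    # follows this layout.
    def read_header(filename):
        with open(filename, "rb") as f:
            (blocksize,) = struct.unpack("i", f.read(4))    # leading block length
            if blocksize != 256:
                raise ValueError("not a format-1 header (or wrong endianness)")
            npart = struct.unpack("6I", f.read(24))          # particles per type in this file
            massarr = struct.unpack("6d", f.read(48))        # fixed particle mass per type (0 = stored per particle)
            time, redshift = struct.unpack("2d", f.read(16))
            flag_sfr, flag_feedback = struct.unpack("2i", f.read(8))
            npart_total = struct.unpack("6I", f.read(24))    # particles per type over all files
            return {"npart": npart, "massarr": massarr, "time": time,
                    "redshift": redshift, "npart_total": npart_total}

    if __name__ == "__main__":
        # usage: python read_header.py <path to a snapshot file written by GalIC>
        print(read_header(sys.argv[1]))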

GalIC/Model_M1.param

%------ File and path names, as well as output file format
OutputDir  ./Model_M1
OutputFile  snap  % Base filename of generated sequence of files
SnapFormat  1  % File format selection

%------ Basic structural parameters of model
CC  10.0  % halo concentration
V200  200.0  % circular velocity v_200 (in km/sec)
LAMBDA  0.035  % spin parameter
MD  0.035  % disk mass fraction
MB  0.05  % bulge mass fraction
MBH  0.0  % black hole mass fraction; if zero, no black hole is generated, otherwise one is added at the centre
JD  0.035  % disk spin fraction, typically chosen equal to MD
DiskHeight  0.2  % thickness of stellar disk in units of radial scale length
BulgeSize  0.1  % bulge scale length in units of halo scale length
HaloStretch  1.0  % should be one for a spherical halo; smaller than one corresponds to a prolate distortion, larger than one to an oblate one
BulgeStretch  1.0  % should be one for a spherical bulge; smaller than one corresponds to a prolate distortion, larger than one to an oblate one

%------ Particle numbers in target model
N_HALO  100000  % desired number of particles in dark halo
N_DISK  100000  % desired number of collisionless particles in disk
N_BULGE  100000  % number of bulge particles

%------ Selection of symmetry constraints of velocity structure
TypeOfHaloVelocityStructure  2
        % 0 = spherically symmetric, isotropic
        % 1 = spherically symmetric, anisotropic (with beta parameter specified)
        % 2 = axisymmetric, f(E, Lz), with specified net rotation
        % 3 = axisymmetric, f(E, Lz, I_3), with specified R-to-z dispersion ratio and net rotation
TypeOfDiskVelocityStructure  2  % same coding as for TypeOfHaloVelocityStructure
TypeOfBulgeVelocityStructure  2  % same coding as for TypeOfHaloVelocityStructure

HaloBetaParameter  0  % only relevant for TypeOfHaloVelocityStructure=1
BulgeBetaParameter  0  % only relevant for TypeOfBulgeVelocityStructure=1
HaloDispersionRoverZratio  4.0  % only relevant for TypeOfHaloVelocityStructure=3
DiskDispersionRoverZratio  4.0  % only relevant for TypeOfDiskVelocityStructure=3
BulgeDispersionRoverZratio  4.0  % only relevant for TypeOfBulgeVelocityStructure=3
HaloStreamingVelocityParameter  0.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases
DiskStreamingVelocityParameter  1.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases
BulgeStreamingVelocityParameter  0.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases

%------ Orbit integration accuracy
TorbitFac  10.0  % regulates the integration time of orbits (of the order of the typical number of orbits per particle)
TimeStepFactorOrbit  0.01
TimeStepFactorCellCross  0.25

%------ Iterative optimization parameters
FractionToOptimizeIndependendly  0.001
IndepenentOptimizationsPerStep  100
StepsBetweenDump  10
MaximumNumberOfSteps  100
MinParticlesPerBinForDispersionMeasurement  100
MinParticlesPerBinForDensityMeasurement  50

%------ Grid dimension and extension/resolution
DG_MaxLevel  7
EG_MaxLevel  7
FG_Nbin  256  % number of bins for the acceleration grid in the R- and z-directions
OutermostBinEnclosedMassFraction  0.999  % regulates the fraction of the mass of the Hernquist halo that must be inside the grid (determines the grid extension)
InnermostBinEnclosedMassFraction  0.0000001  % regulates the fraction of mass enclosed by the innermost bin (sets the size of the innermost grid cells)
MaxVelInUnitsVesc  0.9999  % maximum allowed velocity in units of the local escape velocity

%------ Construction of target density field
SampleDensityFieldForTargetResponse  1  % if set to 1, the code will randomly sample points to construct the density field
SampleParticleCount  100000000  % number of points sampled for target density field

%------ Construction of force field
SampleForceNhalo  0  % number of points to use for computing the force field with a tree
SampleForceNdisk  100000000
SampleForceNbulge  0
Softening  0.05

%------ Accuracy settings of tree code used in construction of force field
TypeOfOpeningCriterion  1
ErrTolTheta  0.4
ErrTolForceAcc  0.0025

%------ Domain decomposition parameters used in parallel tree code
MultipleDomains  4
TopNodeFactor  4

%------ Parallel I/O parameters, only affect the writing of galaxy files
NumFilesPerSnapshot  1
NumFilesWrittenInParallel  1

%------ Memory allocation parameters
MaxMemSize  2300.0  % in MB
BufferSize  100.0
BufferSizeGravity  100.0

%------ Specification of internal system of units
UnitLength_in_cm  3.085678e21  % 1.0 kpc
UnitMass_in_g  1.989e43  % 1.0e10 solar masses
UnitVelocity_in_cm_per_s  1e5  % 1 km/sec
GravityConstantInternal  0

GalIC/Model_M2.param

%------ File and path names, as well as output file format
OutputDir  ./Model_M2
OutputFile  snap  % Base filename of generated sequence of files
SnapFormat  1  % File format selection

%------ Basic structural parameters of model
CC  10.0  % halo concentration
V200  200.0  % circular velocity v_200 (in km/sec)
LAMBDA  0.035  % spin parameter
MD  0.035  % disk mass fraction
MB  0.05  % bulge mass fraction
MBH  0.0  % black hole mass fraction; if zero, no black hole is generated, otherwise one is added at the centre
JD  0.035  % disk spin fraction, typically chosen equal to MD
DiskHeight  0.2  % thickness of stellar disk in units of radial scale length
BulgeSize  0.1  % bulge scale length in units of halo scale length
HaloStretch  1.0  % should be one for a spherical halo; smaller than one corresponds to a prolate distortion, larger than one to an oblate one
BulgeStretch  1.0  % should be one for a spherical bulge; smaller than one corresponds to a prolate distortion, larger than one to an oblate one

%------ Particle numbers in target model
N_HALO  100000  % desired number of particles in dark halo
N_DISK  100000  % desired number of collisionless particles in disk
N_BULGE  100000  % number of bulge particles

%------ Selection of symmetry constraints of velocity structure
TypeOfHaloVelocityStructure  2
        % 0 = spherically symmetric, isotropic
        % 1 = spherically symmetric, anisotropic (with beta parameter specified)
        % 2 = axisymmetric, f(E, Lz), with specified net rotation
        % 3 = axisymmetric, f(E, Lz, I_3), with specified R-to-z dispersion ratio and net rotation
TypeOfDiskVelocityStructure  3  % same coding as for TypeOfHaloVelocityStructure
TypeOfBulgeVelocityStructure  2  % same coding as for TypeOfHaloVelocityStructure

HaloBetaParameter  0  % only relevant for TypeOfHaloVelocityStructure=1
BulgeBetaParameter  0  % only relevant for TypeOfBulgeVelocityStructure=1
HaloDispersionRoverZratio  4.0  % only relevant for TypeOfHaloVelocityStructure=3
DiskDispersionRoverZratio  2.0  % only relevant for TypeOfDiskVelocityStructure=3
BulgeDispersionRoverZratio  4.0  % only relevant for TypeOfBulgeVelocityStructure=3
HaloStreamingVelocityParameter  0.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases
DiskStreamingVelocityParameter  1.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases
BulgeStreamingVelocityParameter  0.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases

%------ Orbit integration accuracy
TorbitFac  10.0  % regulates the integration time of orbits (of the order of the typical number of orbits per particle)
TimeStepFactorOrbit  0.01
TimeStepFactorCellCross  0.25

%------ Iterative optimization parameters
FractionToOptimizeIndependendly  0.001
IndepenentOptimizationsPerStep  100
StepsBetweenDump  10
MaximumNumberOfSteps  100
MinParticlesPerBinForDispersionMeasurement  100
MinParticlesPerBinForDensityMeasurement  50

%------ Grid dimension and extension/resolution
DG_MaxLevel  7
EG_MaxLevel  7
FG_Nbin  256  % number of bins for the acceleration grid in the R- and z-directions
OutermostBinEnclosedMassFraction  0.999  % regulates the fraction of the mass of the Hernquist halo that must be inside the grid (determines the grid extension)
InnermostBinEnclosedMassFraction  0.0000001  % regulates the fraction of mass enclosed by the innermost bin (sets the size of the innermost grid cells)
MaxVelInUnitsVesc  0.9999  % maximum allowed velocity in units of the local escape velocity

%------ Construction of target density field
SampleDensityFieldForTargetResponse  1  % if set to 1, the code will randomly sample points to construct the density field
SampleParticleCount  100000000  % number of points sampled for target density field

%------ Construction of force field
SampleForceNhalo  0  % number of points to use for computing the force field with a tree
SampleForceNdisk  100000000
SampleForceNbulge  0
Softening  0.05

%------ Accuracy settings of tree code used in construction of force field
TypeOfOpeningCriterion  1
ErrTolTheta  0.4
ErrTolForceAcc  0.0025

%------ Domain decomposition parameters used in parallel tree code
MultipleDomains  4
TopNodeFactor  4

%------ Parallel I/O parameters, only affect the writing of galaxy files
NumFilesPerSnapshot  1
NumFilesWrittenInParallel  1

%------ Memory allocation parameters
MaxMemSize  2300.0  % in MB
BufferSize  100.0
BufferSizeGravity  100.0

%------ Specification of internal system of units
UnitLength_in_cm  3.085678e21  % 1.0 kpc
UnitMass_in_g  1.989e43  % 1.0e10 solar masses
UnitVelocity_in_cm_per_s  1e5  % 1 km/sec
GravityConstantInternal  0

GalIC/Model_M3.param

%------ File and path names, as well as output file format
OutputDir  ./Model_M3
OutputFile  snap  % Base filename of generated sequence of files
SnapFormat  1  % File format selection

%------ Basic structural parameters of model
CC  10.0  % halo concentration
V200  200.0  % circular velocity v_200 (in km/sec)
LAMBDA  0.035  % spin parameter
MD  0.035  % disk mass fraction
MB  0.05  % bulge mass fraction
MBH  0.0  % black hole mass fraction; if zero, no black hole is generated, otherwise one is added at the centre
JD  0.035  % disk spin fraction, typically chosen equal to MD
DiskHeight  0.2  % thickness of stellar disk in units of radial scale length
BulgeSize  0.1  % bulge scale length in units of halo scale length
HaloStretch  1.0  % should be one for a spherical halo; smaller than one corresponds to a prolate distortion, larger than one to an oblate one
BulgeStretch  1.0  % should be one for a spherical bulge; smaller than one corresponds to a prolate distortion, larger than one to an oblate one

%------ Particle numbers in target model
N_HALO  100000  % desired number of particles in dark halo
N_DISK  100000  % desired number of collisionless particles in disk
N_BULGE  100000  % number of bulge particles

%------ Selection of symmetry constraints of velocity structure
TypeOfHaloVelocityStructure  2
        % 0 = spherically symmetric, isotropic
        % 1 = spherically symmetric, anisotropic (with beta parameter specified)
        % 2 = axisymmetric, f(E, Lz), with specified net rotation
        % 3 = axisymmetric, f(E, Lz, I_3), with specified R-to-z dispersion ratio and net rotation
TypeOfDiskVelocityStructure  3  % same coding as for TypeOfHaloVelocityStructure
TypeOfBulgeVelocityStructure  2  % same coding as for TypeOfHaloVelocityStructure

HaloBetaParameter  0  % only relevant for TypeOfHaloVelocityStructure=1
BulgeBetaParameter  0  % only relevant for TypeOfBulgeVelocityStructure=1
HaloDispersionRoverZratio  1.1  % only relevant for TypeOfHaloVelocityStructure=3
DiskDispersionRoverZratio  4.0  % only relevant for TypeOfDiskVelocityStructure=3
BulgeDispersionRoverZratio  0.8  % only relevant for TypeOfBulgeVelocityStructure=3
HaloStreamingVelocityParameter  0.1  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases
DiskStreamingVelocityParameter  1.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases
BulgeStreamingVelocityParameter  0.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases

%------ Orbit integration accuracy
TorbitFac  10.0  % regulates the integration time of orbits (of the order of the typical number of orbits per particle)
TimeStepFactorOrbit  0.01
TimeStepFactorCellCross  0.25

%------ Iterative optimization parameters
FractionToOptimizeIndependendly  0.001
IndepenentOptimizationsPerStep  100
StepsBetweenDump  10
MaximumNumberOfSteps  100
MinParticlesPerBinForDispersionMeasurement  100
MinParticlesPerBinForDensityMeasurement  50

%------ Grid dimension and extension/resolution
DG_MaxLevel  7
EG_MaxLevel  7
FG_Nbin  256  % number of bins for the acceleration grid in the R- and z-directions
OutermostBinEnclosedMassFraction  0.999  % regulates the fraction of the mass of the Hernquist halo that must be inside the grid (determines the grid extension)
InnermostBinEnclosedMassFraction  0.0000001  % regulates the fraction of mass enclosed by the innermost bin (sets the size of the innermost grid cells)
MaxVelInUnitsVesc  0.9999  % maximum allowed velocity in units of the local escape velocity

%------ Construction of target density field
SampleDensityFieldForTargetResponse  1  % if set to 1, the code will randomly sample points to construct the density field
SampleParticleCount  100000000  % number of points sampled for target density field

%------ Construction of force field
SampleForceNhalo  0  % number of points to use for computing the force field with a tree
SampleForceNdisk  100000000
SampleForceNbulge  0
Softening  0.05

%------ Accuracy settings of tree code used in construction of force field
TypeOfOpeningCriterion  1
ErrTolTheta  0.4
ErrTolForceAcc  0.0025

%------ Domain decomposition parameters used in parallel tree code
MultipleDomains  4
TopNodeFactor  4

%------ Parallel I/O parameters, only affect the writing of galaxy files
NumFilesPerSnapshot  1
NumFilesWrittenInParallel  1

%------ Memory allocation parameters
MaxMemSize  2300.0  % in MB
BufferSize  100.0
BufferSizeGravity  100.0

%------ Specification of internal system of units
UnitLength_in_cm  3.085678e21  % 1.0 kpc
UnitMass_in_g  1.989e43  % 1.0e10 solar masses
UnitVelocity_in_cm_per_s  1e5  % 1 km/sec
GravityConstantInternal  0

GalIC/Model_M4.param

%------ File and path names, as well as output file format
OutputDir  ./Model_M4
OutputFile  snap  % Base filename of generated sequence of files
SnapFormat  1  % File format selection

%------ Basic structural parameters of model
CC  10.0  % halo concentration
V200  200.0  % circular velocity v_200 (in km/sec)
LAMBDA  0.035  % spin parameter
MD  0.035  % disk mass fraction
MB  0.05  % bulge mass fraction
MBH  0.0  % black hole mass fraction; if zero, no black hole is generated, otherwise one is added at the centre
JD  0.035  % disk spin fraction, typically chosen equal to MD
DiskHeight  0.2  % thickness of stellar disk in units of radial scale length
BulgeSize  0.1  % bulge scale length in units of halo scale length
HaloStretch  0.85  % should be one for a spherical halo; smaller than one corresponds to a prolate distortion, larger than one to an oblate one
BulgeStretch  1.15  % should be one for a spherical bulge; smaller than one corresponds to a prolate distortion, larger than one to an oblate one

%------ Particle numbers in target model
N_HALO  100000  % desired number of particles in dark halo
N_DISK  100000  % desired number of collisionless particles in disk
N_BULGE  100000  % number of bulge particles

%------ Selection of symmetry constraints of velocity structure
TypeOfHaloVelocityStructure  2
        % 0 = spherically symmetric, isotropic
        % 1 = spherically symmetric, anisotropic (with beta parameter specified)
        % 2 = axisymmetric, f(E, Lz), with specified net rotation
        % 3 = axisymmetric, f(E, Lz, I_3), with specified R-to-z dispersion ratio and net rotation
TypeOfDiskVelocityStructure  3  % same coding as for TypeOfHaloVelocityStructure
TypeOfBulgeVelocityStructure  2  % same coding as for TypeOfHaloVelocityStructure

HaloBetaParameter  0  % only relevant for TypeOfHaloVelocityStructure=1
BulgeBetaParameter  0  % only relevant for TypeOfBulgeVelocityStructure=1
HaloDispersionRoverZratio  1.5  % only relevant for TypeOfHaloVelocityStructure=3
DiskDispersionRoverZratio  2.0  % only relevant for TypeOfDiskVelocityStructure=3
BulgeDispersionRoverZratio  0.7  % only relevant for TypeOfBulgeVelocityStructure=3
HaloStreamingVelocityParameter  1.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases
DiskStreamingVelocityParameter  1.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases
BulgeStreamingVelocityParameter  1.0  % azimuthal streaming velocity ('k parameter') in the TypeOf*VelocityStructure=2/3 cases

%------ Orbit integration accuracy
TorbitFac  10.0  % regulates the integration time of orbits (of the order of the typical number of orbits per particle)
TimeStepFactorOrbit  0.01
TimeStepFactorCellCross  0.25

%------ Iterative optimization parameters
FractionToOptimizeIndependendly  0.001
IndepenentOptimizationsPerStep  100
StepsBetweenDump  10
MaximumNumberOfSteps  100
MinParticlesPerBinForDispersionMeasurement  100
MinParticlesPerBinForDensityMeasurement  50

%------ Grid dimension and extension/resolution
DG_MaxLevel  7
EG_MaxLevel  7
FG_Nbin  256  % number of bins for the acceleration grid in the R- and z-directions
OutermostBinEnclosedMassFraction  0.999  % regulates the fraction of the mass of the Hernquist halo that must be inside the grid (determines the grid extension)
InnermostBinEnclosedMassFraction  0.0000001  % regulates the fraction of mass enclosed by the innermost bin (sets the size of the innermost grid cells)
MaxVelInUnitsVesc  0.9999  % maximum allowed velocity in units of the local escape velocity

%------ Construction of target density field
SampleDensityFieldForTargetResponse  1  % if set to 1, the code will randomly sample points to construct the density field
SampleParticleCount  100000000  % number of points sampled for target density field

%------ Construction of force field
SampleForceNhalo  100000000  % number of points to use for computing the force field with a tree
SampleForceNdisk  100000000
SampleForceNbulge  100000000
Softening  0.05

%------ Accuracy settings of tree code used in construction of force field
TypeOfOpeningCriterion  1
ErrTolTheta  0.4
ErrTolForceAcc  0.0025

%------ Domain decomposition parameters used in parallel tree code
MultipleDomains  4
TopNodeFactor  4

%------ Parallel I/O parameters, only affect the writing of galaxy files
NumFilesPerSnapshot  1
NumFilesWrittenInParallel  1

%------ Memory allocation parameters
MaxMemSize  2300.0  % in MB
BufferSize  100.0
BufferSizeGravity  100.0

%------ Specification of internal system of units
UnitLength_in_cm  3.085678e21  % 1.0 kpc
UnitMass_in_g  1.989e43  % 1.0e10 solar masses
UnitVelocity_in_cm_per_s  1e5  % 1 km/sec
GravityConstantInternal  0
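
Every model above uses V200 = 200 km/sec and CC = 10. Under the common convention that the virial radius encloses a mean density of 200 times the critical density, and assuming a Hubble constant of 70 km/sec/Mpc (neither of which is read from these parameter files), those two numbers translate into a virial radius, virial mass and NFW scale radius roughly as sketched below:

    # Sketch: halo virial quantities implied by V200 and CC in the internal
    # units of the parameter files (kpc, 1e10 Msun, km/s).  The relations
    # R200 = V200 / (10 H0) and M200 = V200^3 / (10 G H0), and the value of H0,
    # are assumptions and not settings taken from the files.
    G = 43018.7          # G in kpc (km/s)^2 per 1e10 Msun (from the unit block above)
    H0 = 70.0 / 1000.0   # 70 km/s/Mpc expressed in km/s/kpc

    V200 = 200.0         # circular velocity at the virial radius, km/s
    CC = 10.0            # halo concentration

    R200 = V200 / (10.0 * H0)          # virial radius in kpc      (~286 kpc)
    M200 = V200**3 / (10.0 * G * H0)   # virial mass in 1e10 Msun  (~2.7e12 Msun)
    r_s = R200 / CC                    # NFW scale radius in kpc   (~29 kpc)
    print(R200, M200, r_s)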