Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
SubscribeLong Horizon Temperature Scaling
Temperature scaling is a popular technique for tuning the sharpness of a model distribution. It is used extensively for sampling likely generations and calibrating model uncertainty, and even features as a controllable parameter to many large language models in deployment. However, autoregressive models rely on myopic temperature scaling that greedily optimizes the next token. To address this, we propose Long Horizon Temperature Scaling (LHTS), a novel approach for sampling from temperature-scaled joint distributions. LHTS is compatible with all likelihood-based models, and optimizes for the long-horizon likelihood of samples. We derive a temperature-dependent LHTS objective, and show that fine-tuning a model on a range of temperatures produces a single model capable of generation with a controllable long-horizon temperature parameter. We experiment with LHTS on image diffusion models and character/language autoregressive models, demonstrating advantages over myopic temperature scaling in likelihood and sample quality, and showing improvements in accuracy on a multiple choice analogy task by 10%.
Temperature dependence of nonlinear elastic moduli of polystyrene
Nonlinear elastic properties of polymers and polymeric composites are essential for accurate prediction of their response to dynamic loads, which is crucial in a wide range of applications. These properties can be affected by strain rate, temperature, and pressure. The temperature susceptibility of nonlinear elastic moduli of polymers remains poorly understood. We have recently observed a significant frequency dependence of the nonlinear elastic (Murnaghan) moduli of polystyrene. In this paper we expand this analysis by the temperature dependence. The measurement methodology was based on the acousto-elastic effect, and involved analysis of the dependencies of velocities of longitudinal and shear single-frequency ultrasonic waves in the sample on the applied static pressure. Measurements were performed at different temperatures in the range of 25-65 {\deg}C and at different frequencies in the range of 0.75-3 MHz. The temperature susceptibility of the nonlinear moduli l and m was found to be two orders of magnitude larger than that of linear moduli lambda and mu. At the same time, the observed variations of n modulus with temperature were low and within the measurement tolerance. The observed tendencies can be explained by different influence of pressure on relaxation processes in the material at different temperatures.
EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling
Recently, Large Language Models (LLMs) have demonstrated outstanding performance across a wide range of downstream language tasks. Temperature sampling is a commonly used decoding strategy for LLMs' generation process. However, a fixed temperature parameter is used in most cases, which may not always be an optimal choice for balancing generation quality and diversity. In this paper, we propose an effective Entropy-based Dynamic Temperature (EDT) Sampling method, to achieve a more balanced performance in terms of both generation quality and diversity by dynamically selecting the temperature parameter. Additionally, we also show model performance and comprehensive analyses for 4 different generation benchmarks. Our experiments show that EDT significantly outperforms the existing strategies across different tasks.
KL-Divergence Guided Temperature Sampling
Temperature sampling is a conventional approach to diversify large language model predictions. As temperature increases, the prediction becomes diverse but also vulnerable to hallucinations -- generating tokens that are sensible but not factual. One common approach to mitigate hallucinations is to provide source/grounding documents and the model is trained to produce predictions that bind to and are attributable to the provided source. It appears that there is a trade-off between diversity and attribution. To mitigate any such trade-off, we propose to relax the constraint of having a fixed temperature over decoding steps, and a mechanism to guide the dynamic temperature according to its relevance to the source through KL-divergence. Our experiments justifies the trade-off, and shows that our sampling algorithm outperforms the conventional top-k and top-p algorithms in conversational question-answering and summarization tasks.
Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
Dense-to-sparse gating mixture of experts (MoE) has recently become an effective alternative to a well-known sparse MoE. Rather than fixing the number of activated experts as in the latter model, which could limit the investigation of potential experts, the former model utilizes the temperature to control the softmax weight distribution and the sparsity of the MoE during training in order to stabilize the expert specialization. Nevertheless, while there are previous attempts to theoretically comprehend the sparse MoE, a comprehensive analysis of the dense-to-sparse gating MoE has remained elusive. Therefore, we aim to explore the impacts of the dense-to-sparse gate on the maximum likelihood estimation under the Gaussian MoE in this paper. We demonstrate that due to interactions between the temperature and other model parameters via some partial differential equations, the convergence rates of parameter estimations are slower than any polynomial rates, and could be as slow as O(1/log(n)), where n denotes the sample size. To address this issue, we propose using a novel activation dense-to-sparse gate, which routes the output of a linear layer to an activation function before delivering them to the softmax function. By imposing linearly independence conditions on the activation function and its derivatives, we show that the parameter estimation rates are significantly improved to polynomial rates.
Small Temperature is All You Need for Differentiable Architecture Search
Differentiable architecture search (DARTS) yields highly efficient gradient-based neural architecture search (NAS) by relaxing the discrete operation selection to optimize continuous architecture parameters that maps NAS from the discrete optimization to a continuous problem. DARTS then remaps the relaxed supernet back to the discrete space by one-off post-search pruning to obtain the final architecture (finalnet). Some emerging works argue that this remap is inherently prone to mismatch the network between training and evaluation which leads to performance discrepancy and even model collapse in extreme cases. We propose to close the gap between the relaxed supernet in training and the pruned finalnet in evaluation through utilizing small temperature to sparsify the continuous distribution in the training phase. To this end, we first formulate sparse-noisy softmax to get around gradient saturation. We then propose an exponential temperature schedule to better control the outbound distribution and elaborate an entropy-based adaptive scheme to finally achieve the enhancement. We conduct extensive experiments to verify the efficiency and efficacy of our method.
Is Temperature the Creativity Parameter of Large Language Models?
Large language models (LLMs) are applied to all sorts of creative tasks, and their outputs vary from beautiful, to peculiar, to pastiche, into plain plagiarism. The temperature parameter of an LLM regulates the amount of randomness, leading to more diverse outputs; therefore, it is often claimed to be the creativity parameter. Here, we investigate this claim using a narrative generation task with a predetermined fixed context, model and prompt. Specifically, we present an empirical analysis of the LLM output for different temperature values using four necessary conditions for creativity in narrative generation: novelty, typicality, cohesion, and coherence. We find that temperature is weakly correlated with novelty, and unsurprisingly, moderately correlated with incoherence, but there is no relationship with either cohesion or typicality. However, the influence of temperature on creativity is far more nuanced and weak than suggested by the "creativity parameter" claim; overall results suggest that the LLM generates slightly more novel outputs as temperatures get higher. Finally, we discuss ideas to allow more controlled LLM creativity, rather than relying on chance via changing the temperature parameter.
To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO
The temperature parameter plays a profound role during training and/or inference with large foundation models (LFMs) such as large language models (LLMs) and CLIP models. Particularly, it adjusts the logits in the softmax function in LLMs, which is crucial for next token generation, and it scales the similarities in the contrastive loss for training CLIP models. A significant question remains: Is it viable to learn a neural network to predict a personalized temperature of any input data for enhancing LFMs"? In this paper, we present a principled framework for learning a small yet generalizable temperature prediction network (TempNet) to improve LFMs. Our solution is composed of a novel learning framework with a robust loss underpinned by constrained distributionally robust optimization (DRO), and a properly designed TempNet with theoretical inspiration. TempNet can be trained together with a large foundation model from scratch or learned separately given a pretrained foundation model. It is not only useful for predicting personalized temperature to promote the training of LFMs but also generalizable and transferable to new tasks. Our experiments on LLMs and CLIP models demonstrate that TempNet greatly improves the performance of existing solutions or models, e.g. Table 1. The code to reproduce the experimental results in this paper can be found at https://github.com/zhqiu/TempNet.
On the Limitations of Temperature Scaling for Distributions with Overlaps
Despite the impressive generalization capabilities of deep neural networks, they have been repeatedly shown to be overconfident when they are wrong. Fixing this issue is known as model calibration, and has consequently received much attention in the form of modified training schemes and post-training calibration procedures such as temperature scaling. While temperature scaling is frequently used because of its simplicity, it is often outperformed by modified training schemes. In this work, we identify a specific bottleneck for the performance of temperature scaling. We show that for empirical risk minimizers for a general set of distributions in which the supports of classes have overlaps, the performance of temperature scaling degrades with the amount of overlap between classes, and asymptotically becomes no better than random when there are a large number of classes. On the other hand, we prove that optimizing a modified form of the empirical risk induced by the Mixup data augmentation technique can in fact lead to reasonably good calibration performance, showing that training-time calibration may be necessary in some situations. We also verify that our theoretical results reflect practice by showing that Mixup significantly outperforms empirical risk minimization (with respect to multiple calibration metrics) on image classification benchmarks with class overlaps introduced in the form of label noise.
The First Room-Temperature Ambient-Pressure Superconductor
For the first time in the world, we succeeded in synthesizing the room-temperature superconductor (T_c ge 400 K, 127^circC) working at ambient pressure with a modified lead-apatite (LK-99) structure. The superconductivity of LK-99 is proved with the Critical temperature (T_c), Zero-resistivity, Critical current (I_c), Critical magnetic field (H_c), and the Meissner effect. The superconductivity of LK-99 originates from minute structural distortion by a slight volume shrinkage (0.48 %), not by external factors such as temperature and pressure. The shrinkage is caused by Cu^{2+} substitution of Pb^{2+}(2) ions in the insulating network of Pb(2)-phosphate and it generates the stress. It concurrently transfers to Pb(1) of the cylindrical column resulting in distortion of the cylindrical column interface, which creates superconducting quantum wells (SQWs) in the interface. The heat capacity results indicated that the new model is suitable for explaining the superconductivity of LK-99. The unique structure of LK-99 that allows the minute distorted structure to be maintained in the interfaces is the most important factor that LK-99 maintains and exhibits superconductivity at room temperatures and ambient pressure.
LTD: Low Temperature Distillation for Robust Adversarial Training
Adversarial training has been widely used to enhance the robustness of neural network models against adversarial attacks. Despite the popularity of neural network models, a significant gap exists between the natural and robust accuracy of these models. In this paper, we identify one of the primary reasons for this gap is the common use of one-hot vectors as labels, which hinders the learning process for image recognition. Representing ambiguous images with one-hot vectors is imprecise and may lead the model to suboptimal solutions. To overcome this issue, we propose a novel method called Low Temperature Distillation (LTD) that generates soft labels using the modified knowledge distillation framework. Unlike previous approaches, LTD uses a relatively low temperature in the teacher model and fixed, but different temperatures for the teacher and student models. This modification boosts the model's robustness without encountering the gradient masking problem that has been addressed in defensive distillation. The experimental results demonstrate the effectiveness of the proposed LTD method combined with previous techniques, achieving robust accuracy rates of 58.19%, 31.13%, and 42.08% on CIFAR-10, CIFAR-100, and ImageNet data sets, respectively, without additional unlabeled data.
Recent global temperature surge amplified by record-low planetary albedo
In 2023, the global mean temperature soared to 1.48K above the pre-industrial level, surpassing the previous record by 0.17K. Previous best-guess estimates of known drivers including anthropogenic warming and the El Nino onset fall short by about 0.2K in explaining the temperature rise. Utilizing satellite and reanalysis data, we identify a record-low planetary albedo as the primary factor bridging this gap. The decline is caused largely by a reduced low-cloud cover in the northern mid-latitudes and tropics, in continuation of a multi-annual trend. Understanding how much of the low-cloud trend is due to internal variability, reduced aerosol concentrations, or a possibly emerging low-cloud feedback will be crucial for assessing the current and expected future warming.
Proposal for room-temperature quantum repeaters with nitrogen-vacancy centers and optomechanics
We propose a quantum repeater architecture that can operate under ambient conditions. Our proposal builds on recent progress towards non-cryogenic spin-photon interfaces based on nitrogen-vacancy centers, which have excellent spin coherence times even at room temperature, and optomechanics, which allows to avoid phonon-related decoherence and also allows the emitted photons to be in the telecom band. We apply the photon number decomposition method to quantify the fidelity and the efficiency of entanglement established between two remote electron spins. We describe how the entanglement can be stored in nuclear spins and extended to long distances via quasi-deterministic entanglement swapping operations involving the electron and nuclear spins. We furthermore propose schemes to achieve high-fidelity readout of the spin states at room temperature using the spin-optomechanics interface. Our work shows that long-distance quantum networks made of solid-state components that operate at room temperature are within reach of current technological capabilities.
Enhanced proton parallel temperature inside patches of switchbacks in the inner heliosphere
Switchbacks are discrete angular deflections in the solar wind magnetic field that have been observed throughout the heliosphere. Recent observations by Parker Solar Probe (PSP) have revealed the presence of patches of switchbacks on the scale of hours to days, separated by 'quieter' radial fields. We aim to further diagnose the origin of these patches using measurements of proton temperature anisotropy that can illuminate possible links to formation processes in the solar corona. We fitted 3D bi-Maxwellian functions to the core of proton velocity distributions measured by the SPAN-Ai instrument onboard PSP to obtain the proton parallel, T_{p,|}, and perpendicular, T_{p,perp}, temperature. We show that the presence of patches is highlighted by a transverse deflection in the flow and magnetic field away from the radial direction. These deflections are correlated with enhancements in T_{p,|}, while T_{p,perp} remains relatively constant. Patches sometimes exhibit small proton and electron density enhancements. We interpret that patches are not simply a group of switchbacks, but rather switchbacks are embedded within a larger-scale structure identified by enhanced T_{p,|} that is distinct from the surrounding solar wind. We suggest that these observations are consistent with formation by reconnection-associated mechanisms in the corona.
Possible Meissner effect near room temperature in copper-substituted lead apatite
With copper-substituted lead apatite below room temperature, we observe diamagnetic dc magnetization under magnetic field of 25 Oe with remarkable bifurcation between zero-field-cooling and field-cooling measurements, and under 200 Oe it changes to be paramagnetism. A glassy memory effect is found during cooling. Typical hysteresis loops for superconductors are detected below 250 K, along with an asymmetry between forward and backward sweep of magnetic field. Our experiment suggests at room temperature the Meissner effect is possibly present in this material.
Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models
We present Quasar-1, a novel architecture that introduces temperature-guided reasoning to large language models through the Token Temperature Mechanism (TTM) and Guided Sequence of Thought (GSoT). Our approach leverages the concept of hot and cold tokens, where hot tokens are prioritized for their contextual relevance, while cold tokens provide supplementary information. This dynamic modulation of token importance enables the model to achieve superior logical reasoning capabilities compared to traditional chain-of-thought approaches. Through rigorous mathematical analysis, we prove that our temperature-guided attention mechanism converges to optimal reasoning paths with exponential guarantees. Empirical results show significant improvements in reasoning accuracy and computational efficiency across a wide range of tasks, making advanced AI reasoning accessible to a broader range of applications.
AFSD-Physics: Exploring the governing equations of temperature evolution during additive friction stir deposition by a human-AI teaming approach
This paper presents a modeling effort to explore the underlying physics of temperature evolution during additive friction stir deposition (AFSD) by a human-AI teaming approach. AFSD is an emerging solid-state additive manufacturing technology that deposits materials without melting. However, both process modeling and modeling of the AFSD tool are at an early stage. In this paper, a human-AI teaming approach is proposed to combine models based on first principles with AI. The resulting human-informed machine learning method, denoted as AFSD-Physics, can effectively learn the governing equations of temperature evolution at the tool and the build from in-process measurements. Experiments are designed and conducted to collect in-process measurements for the deposition of aluminum 7075 with a total of 30 layers. The acquired governing equations are physically interpretable models with low computational cost and high accuracy. Model predictions show good agreement with the measurements. Experimental validation with new process parameters demonstrates the model's generalizability and potential for use in tool temperature control and process optimization.
Out-of-Distribution Detection & Applications With Ablated Learned Temperature Energy
As deep neural networks become adopted in high-stakes domains, it is crucial to be able to identify when inference inputs are Out-of-Distribution (OOD) so that users can be alerted of likely drops in performance and calibration despite high confidence. Among many others, existing methods use the following two scores to do so without training on any apriori OOD examples: a learned temperature and an energy score. In this paper we introduce Ablated Learned Temperature Energy (or "AbeT" for short), a method which combines these prior methods in novel ways with effective modifications. Due to these contributions, AbeT lowers the False Positive Rate at 95% True Positive Rate (FPR@95) by 35.39% in classification (averaged across all ID and OOD datasets measured) compared to state of the art without training networks in multiple stages or requiring hyperparameters or test-time backward passes. We additionally provide empirical insights as to how our model learns to distinguish between In-Distribution (ID) and OOD samples while only being explicitly trained on ID samples via exposure to misclassified ID examples at training time. Lastly, we show the efficacy of our method in identifying predicted bounding boxes and pixels corresponding to OOD objects in object detection and semantic segmentation, respectively - with an AUROC increase of 5.15% in object detection and both a decrease in FPR@95 of 41.48% and an increase in AUPRC of 34.20% on average in semantic segmentation compared to previous state of the art.
Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization
In this paper, we aim to optimize a contrastive loss with individualized temperatures in a principled and systematic manner for self-supervised learning. The common practice of using a global temperature parameter tau ignores the fact that ``not all semantics are created equal", meaning that different anchor data may have different numbers of samples with similar semantics, especially when data exhibits long-tails. First, we propose a new robust contrastive loss inspired by distributionally robust optimization (DRO), providing us an intuition about the effect of tau and a mechanism for automatic temperature individualization. Then, we propose an efficient stochastic algorithm for optimizing the robust contrastive loss with a provable convergence guarantee without using large mini-batch sizes. Theoretical and experimental results show that our algorithm automatically learns a suitable tau for each sample. Specifically, samples with frequent semantics use large temperatures to keep local semantic structures, while samples with rare semantics use small temperatures to induce more separable features. Our method not only outperforms prior strong baselines (e.g., SimCLR, CLIP) on unimodal and bimodal datasets with larger improvements on imbalanced data but also is less sensitive to hyper-parameters. To our best knowledge, this is the first methodical approach to optimizing a contrastive loss with individualized temperatures.
A New Approach for Constraining Large-Scale Temperature Fluctuations in the Intergalactic Medium
The reionization of helium is thought to occur at 2.5lesssim zlesssim4, marking the last phase transition and final global heating event of the intergalactic medium (IGM). Since it is driven by rare quasars, helium reionization should give rise to strong temperature fluctuations in the IGM between neutral and recently-ionized regions of order sigma (ln T) sim Delta T/T = 20-50%. We introduce a novel method to search for reionization-induced temperature fluctuations in the IGM by using the effective optical depths of the Lyman-alpha forest towards a large number of background quasars. Higher IGM temperatures give rise to lower effective optical depths in the Lyman-alpha forest, implying that temperature fluctuations will broaden the observed optical depth distribution. We measured the distributions of effective Lyman-alpha forest optical depths across 71 X-Shooter spectra from the XQ-100 survey in four redshift bins from z=3.76 to z=4.19 and compared them to a large-volume cosmological hydrodynamical simulation. A good agreement is found between the observations and the simulation, which does not include temperature fluctuations; therefore, we do not detect a signature of helium reionization. We then post-process the simulations to include an increasing amount of temperature fluctuations until the model becomes inconsistent with the observations. We obtain tight constraints on sigma (ln T) < 0.29 (<0.40) at 2 sigma (3 sigma) at z=3.76 when averaging over scales of 100 comoving Mpc, and weaker constraints for higher redshifts and smaller scales. Our constraints are the tightest to date, and imply that either the IGM temperature contrast caused by helium reionization is less than sim30%, or that the process has not yet significantly started at z=3.76.
Neural Clamping: Joint Input Perturbation and Temperature Scaling for Neural Network Calibration
Neural network calibration is an essential task in deep learning to ensure consistency between the confidence of model prediction and the true correctness likelihood. In this paper, we propose a new post-processing calibration method called Neural Clamping, which employs a simple joint input-output transformation on a pre-trained classifier via a learnable universal input perturbation and an output temperature scaling parameter. Moreover, we provide theoretical explanations on why Neural Clamping is provably better than temperature scaling. Evaluated on CIFAR-100 and ImageNet image recognition datasets and a variety of deep neural network models, our empirical results show that Neural Clamping significantly outperforms state-of-the-art post-processing calibration methods.
Min P Sampling: Balancing Creativity and Coherence at High Temperature
Large Language Models (LLMs) generate longform text by successively sampling the next token based on the probability distribution of the token vocabulary at each decoding step. Current popular truncation sampling methods such as top-p sampling, also known as nucleus sampling, often struggle to balance coherence and creativity in generating text, particularly when using higher temperatures. To address this issue, we propose min-p, a dynamic truncation sampling method, that establishes a minimum base percentage threshold for tokens, which the scales according to the probability of the top candidate token. Through experiments on several benchmarks, such as GPQA, GSM8K and AlpacaEval Creative Writing, we demonstrate that min-p improves the coherence and quality of generated text even at high temperatures, while also facilitating more creative and diverse outputs compared to top-p and other sampling methods. As of writing, min-p has been adopted by multiple open-source LLM implementations, and have been independently assessed by members of the open-source LLM community, further validating its practical utility and potential.
Shaded Route Planning Using Active Segmentation and Identification of Satellite Images
Heatwaves pose significant health risks, particularly due to prolonged exposure to high summer temperatures. Vulnerable groups, especially pedestrians and cyclists on sun-exposed sidewalks, motivate the development of a route planning method that incorporates somatosensory temperature effects through shade ratio consideration. This paper is the first to introduce a pipeline that utilizes segmentation foundation models to extract shaded areas from high-resolution satellite images. These areas are then integrated into a multi-layered road map, enabling users to customize routes based on a balance between distance and shade exposure, thereby enhancing comfort and health during outdoor activities. Specifically, we construct a graph-based representation of the road map, where links indicate connectivity and are updated with shade ratio data for dynamic route planning. This system is already implemented online, with a video demonstration, and will be specifically adapted to assist travelers during the 2024 Olympic Games in Paris.
Complex-valued neural networks to speed-up MR Thermometry during Hyperthermia using Fourier PD and PDUNet
Hyperthermia (HT) in combination with radio- and/or chemotherapy has become an accepted cancer treatment for distinct solid tumour entities. In HT, tumour tissue is exogenously heated to temperatures between 39 and 43 ^circC for 60 minutes. Temperature monitoring can be performed non-invasively using dynamic magnetic resonance imaging (MRI). However, the slow nature of MRI leads to motion artefacts in the images due to the movements of patients during image acquisition. By discarding parts of the data, the speed of the acquisition can be increased - known as undersampling. However, due to the invalidation of the Nyquist criterion, the acquired images might be blurry and can also produce aliasing artefacts. The aim of this work was, therefore, to reconstruct highly undersampled MR thermometry acquisitions with better resolution and with fewer artefacts compared to conventional methods. The use of deep learning in the medical field has emerged in recent times, and various studies have shown that deep learning has the potential to solve inverse problems such as MR image reconstruction. However, most of the published work only focuses on the magnitude images, while the phase images are ignored, which are fundamental requirements for MR thermometry. This work, for the first time, presents deep learning-based solutions for reconstructing undersampled MR thermometry data. Two different deep learning models have been employed here, the Fourier Primal-Dual network and the Fourier Primal-Dual UNet, to reconstruct highly undersampled complex images of MR thermometry. The method reduced the temperature difference between the undersampled MRIs and the fully sampled MRIs from 1.3 ^circC to 0.6 ^circC in full volume and 0.49 ^circC to 0.06 ^circC in the tumour region for an acceleration factor of 10.
Climate-sensitive Urban Planning through Optimization of Tree Placements
Climate change is increasing the intensity and frequency of many extreme weather events, including heatwaves, which results in increased thermal discomfort and mortality rates. While global mitigation action is undoubtedly necessary, so is climate adaptation, e.g., through climate-sensitive urban planning. Among the most promising strategies is harnessing the benefits of urban trees in shading and cooling pedestrian-level environments. Our work investigates the challenge of optimal placement of such trees. Physical simulations can estimate the radiative and thermal impact of trees on human thermal comfort but induce high computational costs. This rules out optimization of tree placements over large areas and considering effects over longer time scales. Hence, we employ neural networks to simulate the point-wise mean radiant temperatures--a driving factor of outdoor human thermal comfort--across various time scales, spanning from daily variations to extended time scales of heatwave events and even decades. To optimize tree placements, we harness the innate local effect of trees within the iterated local search framework with tailored adaptations. We show the efficacy of our approach across a wide spectrum of study areas and time scales. We believe that our approach is a step towards empowering decision-makers, urban designers and planners to proactively and effectively assess the potential of urban trees to mitigate heat stress.
Radon concentration variations at the Yangyang underground laboratory
The concentration of radon in the air has been measured in the 700 m-deep Yangyang underground laboratory between October 2004 and May 2022. The average concentrations in two experimental areas, called A6 and A5, were measured to be 53.4pm0.2 Bq/m3 and 33.5pm0.1 Bq/m3, respectively. The lower value in the A5 area reflects the presence of better temperature control and ventilation. The radon concentrations sampled within the two A5 experimental rooms' air are found to be correlated to the local surface temperature outside of the rooms, with correlation coefficients r = 0.22 and r = 0.70. Therefore, the radon concentrations display a seasonal variation, because the local temperature driven by the overground season influences air ventilation in the experimental areas. A fit on the annual residual concentrations finds that the amplitude occurs each year on August, 31pm6 days.
Ergotropy and Capacity Optimization in Heisenberg Spin Chain Quantum Batteries
This study examines the performance of finite spin quantum batteries (QBs) using Heisenberg spin models with Dzyaloshinsky-Moriya (DM) and Kaplan--Shekhtman--Entin-Wohlman--Aharony (KSEA) interactions. The QBs are modeled as interacting quantum spins in local inhomogeneous magnetic fields, inducing variable Zeeman splitting. We derive analytical expressions for the maximal extractable work, ergotropy and the capacity of QBs, as recently examined by Yang et al. [Phys. Rev. Lett. 131, 030402 (2023)]. These quantities are analytically linked through certain quantum correlations, as posited in the aforementioned study. Different Heisenberg spin chain models exhibit distinct behaviors under varying conditions, emphasizing the importance of model selection for optimizing QB performance. In antiferromagnetic (AFM) systems, maximum ergotropy occurs with a Zeeman splitting field applied to either spin, while ferromagnetic (FM) systems benefit from a uniform Zeeman field. Temperature significantly impacts QB performance, with ergotropy in the AFM case being generally more robust against temperature increases compared to the FM case. Incorporating DM and KSEA couplings can significantly enhance the capacity and ergotropy extraction of QBs. However, there exists a threshold beyond which additional increases in these interactions cause a sharp decline in capacity and ergotropy. This behavior is influenced by temperature and quantum coherence, which signal the occurrence of a sudden phase transition. The resource theory of quantum coherence proposed by Baumgratz et al. [Phys. Rev. Lett. 113, 140401 (2014)] plays a crucial role in enhancing ergotropy and capacity. However, ergotropy is limited by both the system's capacity and the amount of coherence. These findings support the theoretical framework of spin-based QBs and may benefit future research on quantum energy storage devices.
The SRG/eROSITA All-Sky Survey: Large-scale view of the Centaurus cluster
Methods. We utilized the combined five SRG/eROSITA All-Sky Survey data (eRASS:5) to perform X-ray imaging and spectral analyses of the Centaurus cluster in various directions to large radii. Surface brightness (SB) profiles out to 2R_{200} were constructed. We acquired gas temperature, metallicity, and normalization per area profiles out to R_{200}. We compared our results with previous Centaurus studies, cluster outskirts measurements, and simulations. Comprehensive sky background analysis was done across the FoV, in particular, to assess the variation of the eROSITA Bubble emission that partially contaminates the field. Results. The processed X-ray images show the known sloshing-induced structures in the core. The core (rleq11~kpc) is better described with a 2T model than a 1T model. Here, we measured lower T from the cooler component (~1.0 keV) and higher Z (sim!1.6Z_odot), signifying an iron bias. In the intermediate radial range, we observed prominent SB and normalization per area excesses in the eastern sector (Cen 45 location), reaching out to R_{500}. Temperature enhancements near the location of Cen 45 imply that the gas is shock-heated due to the interaction with Cen 30, the significant excess behind Cen 45 center might be the tail/ram-pressure-stripped gas. We found good agreement between the outskirt temperatures with the profile from simulations and fit from Suzaku outskirts measurements. We detected significant SB emission to the sky background level out to R_{200} with a 3.5sigma and followed by 2.9sigma at 1.1R_{200}. The metallicity at R_{500}-R_{200} is low but within the ranges of other outskirts studies. Conclusions. We present the first measurement of ICM morphology and properties of Centaurus cluster sampling the whole azimuth beyond 30', increasing the probed volume by a factor of almost 30.
Protosolar D-to-H abundance and one part-per-billion PH_{3} in the coldest brown dwarf
The coldest Y spectral type brown dwarfs are similar in mass and temperature to cool and warm (sim200 -- 400 K) giant exoplanets. We can therefore use their atmospheres as proxies for planetary atmospheres, testing our understanding of physics and chemistry for these complex, cool worlds. At these cold temperatures, their atmospheres are cold enough for water clouds to form, and chemical timescales increase, increasing the likelihood of disequilibrium chemistry compared to warmer classes of planets. JWST observations are revolutionizing the characterization of these worlds with high signal-to-noise, moderate resolution near- and mid-infrared spectra. The spectra have been used to measure the abundances of prominent species like water, methane, and ammonia; species that trace chemical reactions like carbon monoxide; and even isotopologues of carbon monoxide and ammonia. Here, we present atmospheric retrieval results using both published fixed-slit (GTO program 1230) and new averaged time series observations (GO program 2327) of the coldest known Y dwarf, WISE 0855-0714 (using NIRSpec G395M spectra), which has an effective temperature of sim 264 K. We present a detection of deuterium in an atmosphere outside of the solar system via a relative measurement of deuterated methane (CH_{3}D) and standard methane. From this, we infer the D/H ratio of a substellar object outside the solar system for the first time. We also present a well-constrained part-per-billion abundance of phosphine (PH_{3}). We discuss our interpretation of these results and the implications for brown dwarf and giant exoplanet formation and evolution.
AutoTherm: A Dataset and Benchmark for Thermal Comfort Estimation Indoors and in Vehicles
Thermal comfort inside buildings is a well-studied field where human judgment for thermal comfort is collected and may be used for automatic thermal comfort estimation. However, indoor scenarios are rather static in terms of thermal state changes and, thus, cannot be applied to dynamic conditions, e.g., inside a vehicle. In this work, we present our findings of a gap between building and in-vehicle scenarios regarding thermal comfort estimation. We provide evidence by comparing deep neural classifiers for thermal comfort estimation for indoor and in-vehicle conditions. Further, we introduce a temporal dataset for indoor predictions incorporating 31 input signals and self-labeled user ratings by 18 subjects in a self-built climatic chamber. For in-vehicle scenarios, we acquired a second dataset featuring human judgments from 20 subjects in a BMW 3 Series. Our experimental results indicate superior performance for estimations from time series data over single vector input. Leveraging modern machine learning architectures enables us to recognize human thermal comfort states and estimate future states automatically. We provide details on training a recurrent network-based classifier and perform an initial performance benchmark of the proposed dataset. Ultimately, we compare our collected dataset to publicly available thermal comfort datasets.
Standardized Benchmark Dataset for Localized Exposure to a Realistic Source at 10-90 GHz
The lack of freely available standardized datasets represents an aggravating factor during the development and testing the performance of novel computational techniques in exposure assessment and dosimetry research. This hinders progress as researchers are required to generate numerical data (field, power and temperature distribution) anew using simulation software for each exposure scenario. Other than being time consuming, this approach is highly susceptible to errors that occur during the configuration of the electromagnetic model. To address this issue, in this paper, the limited available data on the incident power density and resultant maximum temperature rise on the skin surface considering various steady-state exposure scenarios at 10-90 GHz have been statistically modeled. The synthetic data have been sampled from the fitted statistical multivariate distribution with respect to predetermined dosimetric constraints. We thus present a comprehensive and open-source dataset compiled of the high-fidelity numerical data considering various exposures to a realistic source. Furthermore, different surrogate models for predicting maximum temperature rise on the skin surface were fitted based on the synthetic dataset. All surrogate models were tested on the originally available data where satisfactory predictive performance has been demonstrated. A simple technique of combining quadratic polynomial and tensor-product spline surrogates, each operating on its own cluster of data, has achieved the lowest mean absolute error of 0.058 {\deg}C. Therefore, overall experimental results indicate the validity of the proposed synthetic dataset.
The chemical inventory of the planet-hosting disk PDS 70
As host to two accreting planets, PDS 70 provides a unique opportunity to probe the chemical complexity of atmosphere-forming material. We present ALMA Band 6 observations of the PDS~70 disk and report the first chemical inventory of the system. With a spatial resolution of 0.4''-0.5'' (sim50 au), 12 species are detected, including CO isotopologues and formaldehyde, small hydrocarbons, HCN and HCO+ isotopologues, and S-bearing molecules. SO and CH3OH are not detected. All lines show a large cavity at the center of the disk, indicative of the deep gap carved by the massive planets. The radial profiles of the line emission are compared to the (sub-)mm continuum and infrared scattered light intensity profiles. Different molecular transitions peak at different radii, revealing the complex interplay between density, temperature and chemistry in setting molecular abundances. Column densities and optical depth profiles are derived for all detected molecules, and upper limits obtained for the non detections. Excitation temperature is obtained for H2CO. Deuteration and nitrogen fractionation profiles from the hydro-cyanide lines show radially increasing fractionation levels. Comparison of the disk chemical inventory to grids of chemical models from the literature strongly suggests a disk molecular layer hosting a carbon to oxygen ratio C/O>1, thus providing for the first time compelling evidence of planets actively accreting high C/O ratio gas at present time.
Fundamental Principle of Information-to-Energy Conversion
The equivalence of 1 bit of information to entropy was given by Landauer in 1961 as kln2, k the Boltzmann constant. Erasing information implies heat dissipation and the energy of 1 bit would then be (the Landauers limit) kT ln 2, T being the ambient temperature. From a quantum-cosmological point of view the minimum quantum of energy in the universe corresponds today to a temperature of 10^(-29) degrees K, probably forming a cosmic background of a Bose condensate [1]. Then, the bit with minimum energy today in the Universe is a quantum of energy 10^(-45)ergs, with an equivalent mass of 10^(-66)g. Low temperature implies low energy per bit and, of course, this is the way for faster and less energy dissipating computing devices. Our conjecture is this: the possibility of a future access to the CBBC (a coupling/channeling?) would mean a huge jump in the performance of these devices.
ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling
Modeling weather and climate is an essential endeavor to understand the near- and long-term impacts of climate change, as well as inform technology and policymaking for adaptation and mitigation efforts. In recent years, there has been a surging interest in applying data-driven methods based on machine learning for solving core problems such as weather forecasting and climate downscaling. Despite promising results, much of this progress has been impaired due to the lack of large-scale, open-source efforts for reproducibility, resulting in the use of inconsistent or underspecified datasets, training setups, and evaluations by both domain scientists and artificial intelligence researchers. We introduce ClimateLearn, an open-source PyTorch library that vastly simplifies the training and evaluation of machine learning models for data-driven climate science. ClimateLearn consists of holistic pipelines for dataset processing (e.g., ERA5, CMIP6, PRISM), implementation of state-of-the-art deep learning models (e.g., Transformers, ResNets), and quantitative and qualitative evaluation for standard weather and climate modeling tasks. We supplement these functionalities with extensive documentation, contribution guides, and quickstart tutorials to expand access and promote community growth. We have also performed comprehensive forecasting and downscaling experiments to showcase the capabilities and key features of our library. To our knowledge, ClimateLearn is the first large-scale, open-source effort for bridging research in weather and climate modeling with modern machine learning systems. Our library is available publicly at https://github.com/aditya-grover/climate-learn.
CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Claims
We introduce CLIMATE-FEVER, a new publicly available dataset for verification of climate change-related claims. By providing a dataset for the research community, we aim to facilitate and encourage work on improving algorithms for retrieving evidential support for climate-specific claims, addressing the underlying language understanding challenges, and ultimately help alleviate the impact of misinformation on climate change. We adapt the methodology of FEVER [1], the largest dataset of artificially designed claims, to real-life claims collected from the Internet. While during this process, we could rely on the expertise of renowned climate scientists, it turned out to be no easy task. We discuss the surprising, subtle complexity of modeling real-world climate-related claims within the fever framework, which we believe provides a valuable challenge for general natural language understanding. We hope that our work will mark the beginning of a new exciting long-term joint effort by the climate science and AI community.
ClimDetect: A Benchmark Dataset for Climate Change Detection and Attribution
Detecting and attributing temperature increases due to climate change is crucial for understanding global warming and guiding adaptation strategies. The complexity of distinguishing human-induced climate signals from natural variability has challenged traditional detection and attribution (D&A) approaches, which seek to identify specific "fingerprints" in climate response variables. Deep learning offers potential for discerning these complex patterns in expansive spatial datasets. However, lack of standard protocols has hindered consistent comparisons across studies. We introduce ClimDetect, a standardized dataset of over 816k daily climate snapshots, designed to enhance model accuracy in identifying climate change signals. ClimDetect integrates various input and target variables used in past research, ensuring comparability and consistency. We also explore the application of vision transformers (ViT) to climate data, a novel and modernizing approach in this context. Our open-access data and code serve as a benchmark for advancing climate science through improved model evaluations. ClimDetect is publicly accessible via Huggingface dataet respository at: https://huggingface.co/datasets/ClimDetect/ClimDetect.
HPCR: Holistic Proxy-based Contrastive Replay for Online Continual Learning
Online continual learning (OCL) aims to continuously learn new data from a single pass over the online data stream. It generally suffers from the catastrophic forgetting issue. Existing replay-based methods effectively alleviate this issue by replaying part of old data in a proxy-based or contrastive-based replay manner. In this paper, we conduct a comprehensive analysis of these two replay manners and find they can be complementary. Inspired by this finding, we propose a novel replay-based method called proxy-based contrastive replay (PCR), which replaces anchor-to-sample pairs with anchor-to-proxy pairs in the contrastive-based loss to alleviate the phenomenon of forgetting. Based on PCR, we further develop a more advanced method named holistic proxy-based contrastive replay (HPCR), which consists of three components. The contrastive component conditionally incorporates anchor-to-sample pairs to PCR, learning more fine-grained semantic information with a large training batch. The second is a temperature component that decouples the temperature coefficient into two parts based on their impacts on the gradient and sets different values for them to learn more novel knowledge. The third is a distillation component that constrains the learning process to keep more historical knowledge. Experiments on four datasets consistently demonstrate the superiority of HPCR over various state-of-the-art methods.
Selective Attention: Enhancing Transformer through Principled Context Control
The attention mechanism within the transformer architecture enables the model to weigh and combine tokens based on their relevance to the query. While self-attention has enjoyed major success, it notably treats all queries q in the same way by applying the mapping V^topsoftmax(Kq), where V,K are the value and key embeddings respectively. In this work, we argue that this uniform treatment hinders the ability to control contextual sparsity and relevance. As a solution, we introduce the Selective Self-Attention (SSA) layer that augments the softmax nonlinearity with a principled temperature scaling strategy. By controlling temperature, SSA adapts the contextual sparsity of the attention map to the query embedding and its position in the context window. Through theory and experiments, we demonstrate that this alleviates attention dilution, aids the optimization process, and enhances the model's ability to control softmax spikiness of individual queries. We also incorporate temperature scaling for value embeddings and show that it boosts the model's ability to suppress irrelevant/noisy tokens. Notably, SSA is a lightweight method which introduces less than 0.5% new parameters through a weight-sharing strategy and can be fine-tuned on existing LLMs. Extensive empirical evaluations demonstrate that SSA-equipped models achieve a noticeable and consistent accuracy improvement on language modeling benchmarks.
DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure
While speculative decoding has recently appeared as a promising direction for accelerating the inference of large language models (LLMs), the speedup and scalability are strongly bounded by the token acceptance rate. Prevalent methods usually organize predicted tokens as independent chains or fixed token trees, which fails to generalize to diverse query distributions. In this paper, we propose DySpec, a faster speculative decoding algorithm with a novel dynamic token tree structure. We begin by bridging the draft distribution and acceptance rate from intuitive and empirical clues, and successfully show that the two variables are strongly correlated. Based on this, we employ a greedy strategy to dynamically expand the token tree at run time. Theoretically, we show that our method can achieve optimal results under mild assumptions. Empirically, DySpec yields a higher acceptance rate and speedup than fixed trees. DySpec can drastically improve the throughput and reduce the latency of token generation across various data distribution and model sizes, which significantly outperforms strong competitors, including Specinfer and Sequoia. Under low temperature setting, DySpec can improve the throughput up to 9.1times and reduce the latency up to 9.4times on Llama2-70B. Under high temperature setting, DySpec can also improve the throughput up to 6.21times, despite the increasing difficulty of speculating more than one token per step for draft model.
Neural Networks for cosmological model selection and feature importance using Cosmic Microwave Background data
The measurements of the temperature and polarisation anisotropies of the Cosmic Microwave Background (CMB) by the ESA Planck mission have strongly supported the current concordance model of cosmology. However, the latest cosmological data release from ESA Planck mission still has a powerful potential to test new data science algorithms and inference techniques. In this paper, we use advanced Machine Learning (ML) algorithms, such as Neural Networks (NNs), to discern among different underlying cosmological models at the angular power spectra level, using both temperature and polarisation Planck 18 data. We test two different models beyond LambdaCDM: a modified gravity model: the Hu-Sawicki model, and an alternative inflationary model: a feature-template in the primordial power spectrum. Furthermore, we also implemented an interpretability method based on SHAP values to evaluate the learning process and identify the most relevant elements that drive our architecture to certain outcomes. We find that our NN is able to distinguish between different angular power spectra successfully for both alternative models and LambdaCDM. We conclude by explaining how archival scientific data has still a strong potential to test novel data science algorithms that are interesting for the next generation of cosmological experiments.
Quarks to Cosmos: Particles and Plasma in Cosmological evolution
We describe in the context of the particle physics (PP) standard model (SM) `PP-SM' the understanding of the primordial properties and composition of the Universe in the temperature range 130GeV>T>20keV. The Universe evolution is described using FLRW cosmology. We present a global view on particle content across time and describe the different evolution eras using deceleration parameter q. We follow the arrow of time in the expanding and cooling Universe: After the PP-SM heavies (t, h, W, Z) diminish in abundance below Tsimeq 50GeV, the PP-SM plasma in the Universe is governed by the strongly interacting Quark-Gluon content. Once the temperature drops below Tsimeq 150MeV, quarks and gluons hadronize into strongly interacting matter particles. Rapid disappearance of baryonic antimatter completes at T_B=38.2MeV. We study the ensuing disappearance of strangeness and mesons in general. We show that the different eras defined by particle populations are barely separated from each other with abundance of muons fading out just prior to T=O(2.5)MeV, the era of emergence of the free-streaming neutrinos. We discuss the two relevant fundamental constants controlling the decoupling of neutrinos. We subsequently follow the primordial Universe as it passes through the hot dense electron-positron plasma epoch. The high density of positron antimatter disappears near T=20.3keV: Nuclear reactions occur in the presence of a highly mobile and relatively strongly interacting electron-positron plasma phase. We apply plasma theory methods to describe the strong screening effects between heavy dust particle (nucleons). We analyze the paramagnetic characteristics of the electron-positron plasma when exposed to an external primordial magnetic field.
The impact of internal variability on benchmarking deep learning climate emulators
Full-complexity Earth system models (ESMs) are computationally very expensive, limiting their use in exploring the climate outcomes of multiple emission pathways. More efficient emulators that approximate ESMs can directly map emissions onto climate outcomes, and benchmarks are being used to evaluate their accuracy on standardized tasks and datasets. We investigate a popular benchmark in data-driven climate emulation, ClimateBench, on which deep learning-based emulators are currently achieving the best performance. We implement a linear regression-based emulator, akin to pattern scaling, and find that it outperforms the incumbent 100M-parameter deep learning foundation model, ClimaX, on 3 out of 4 regionally-resolved surface-level climate variables. While emulating surface temperature is expected to be predominantly linear, this result is surprising for emulating precipitation. We identify that this outcome is a result of high levels of internal variability in the benchmark targets. To address internal variability, we update the benchmark targets with ensemble averages from the MPI-ESM1.2-LR model that contain 50 instead of 3 climate simulations per emission pathway. Using the new targets, we show that linear pattern scaling continues to be more accurate on temperature, but can be outperformed by a deep learning-based model for emulating precipitation. We publish our code, data, and an interactive tutorial at github.com/blutjens/climate-emulator.
Predictive power of the Berezinskii-Kosterlitz-Thouless theory based on Renormalization Group throughout the BCS-BEC crossover in 2D superconductors
Recent experiments on 2D superconductors allow the characterization of the critical temperature and of the phase diagram across the BCS-BEC crossover as a function of density. We obtain from these experiments the microscopic parameters of the superconducting state at low temperatures by the BCS mean-field approach. For Li_xZrNCl, the extracted parameters are used to evaluate the superconducting phase stiffness and the Berezinskii-Kosterlitz-Thouless (BKT) critical temperature throughout the BCS-BEC crossover, by implementing the corresponding Renormalization Group (RG) approach. In this way, we make a quantitative test of the predictive power of the BKT theory for evaluating the critical temperature. The RG flow equations turn out to give a sizable renormalization of the phase stiffness and of the critical temperature, which is crucial to obtain a satisfactory agreement between the BKT theory and the experiments, in particular in the BCS-BEC crossover regime. We predict the temperature range where phase stiffness renormalization can be measured in Li_xZrNCl across the BCS-BEC crossover. Contrary to other microscopic theories of superconductivity, we find that the BKT theory can be exploited to evaluate quantitatively the critical temperature of 2D superconductors in different pairing regimes.
Knowledge Distillation Based on Transformed Teacher Matching
As a technique to bridge logit matching and probability distribution matching, temperature scaling plays a pivotal role in knowledge distillation (KD). Conventionally, temperature scaling is applied to both teacher's logits and student's logits in KD. Motivated by some recent works, in this paper, we drop instead temperature scaling on the student side, and systematically study the resulting variant of KD, dubbed transformed teacher matching (TTM). By reinterpreting temperature scaling as a power transform of probability distribution, we show that in comparison with the original KD, TTM has an inherent R\'enyi entropy term in its objective function, which serves as an extra regularization term. Extensive experiment results demonstrate that thanks to this inherent regularization, TTM leads to trained students with better generalization than the original KD. To further enhance student's capability to match teacher's power transformed probability distribution, we introduce a sample-adaptive weighting coefficient into TTM, yielding a novel distillation approach dubbed weighted TTM (WTTM). It is shown, by comprehensive experiments, that although WTTM is simple, it is effective, improves upon TTM, and achieves state-of-the-art accuracy performance. Our source code is available at https://github.com/zkxufo/TTM.
A Three-regime Model of Network Pruning
Recent work has highlighted the complex influence training hyperparameters, e.g., the number of training epochs, can have on the prunability of machine learning models. Perhaps surprisingly, a systematic approach to predict precisely how adjusting a specific hyperparameter will affect prunability remains elusive. To address this gap, we introduce a phenomenological model grounded in the statistical mechanics of learning. Our approach uses temperature-like and load-like parameters to model the impact of neural network (NN) training hyperparameters on pruning performance. A key empirical result we identify is a sharp transition phenomenon: depending on the value of a load-like parameter in the pruned model, increasing the value of a temperature-like parameter in the pre-pruned model may either enhance or impair subsequent pruning performance. Based on this transition, we build a three-regime model by taxonomizing the global structure of the pruned NN loss landscape. Our model reveals that the dichotomous effect of high temperature is associated with transitions between distinct types of global structures in the post-pruned model. Based on our results, we present three case-studies: 1) determining whether to increase or decrease a hyperparameter for improved pruning; 2) selecting the best model to prune from a family of models; and 3) tuning the hyperparameter of the Sharpness Aware Minimization method for better pruning performance.
What Food Do We Tweet about on a Rainy Day?
Food choice is a complex phenomenon shaped by factors such as taste, ambience, culture or weather. In this paper, we explore food-related tweeting in different weather conditions. We inspect a Latvian food tweet dataset spanning the past decade in conjunction with a weather observation dataset consisting of average temperature, precipitation, and other phenomena. We find which weather conditions lead to specific food information sharing; automatically classify tweet sentiment and discuss how it changes depending on the weather. This research contributes to the growing area of large-scale social network data understanding of food consumers' choices and perceptions.
Hard-Constrained Deep Learning for Climate Downscaling
The availability of reliable, high-resolution climate and weather data is important to inform long-term decisions on climate adaptation and mitigation and to guide rapid responses to extreme events. Forecasting models are limited by computational costs and, therefore, often generate coarse-resolution predictions. Statistical downscaling, including super-resolution methods from deep learning, can provide an efficient method of upsampling low-resolution data. However, despite achieving visually compelling results in some cases, such models frequently violate conservation laws when predicting physical variables. In order to conserve physical quantities, here we introduce methods that guarantee statistical constraints are satisfied by a deep learning downscaling model, while also improving their performance according to traditional metrics. We compare different constraining approaches and demonstrate their applicability across different neural architectures as well as a variety of climate and weather data sets. Besides enabling faster and more accurate climate predictions through downscaling, we also show that our novel methodologies can improve super-resolution for satellite data and natural images data sets.
Machine learning-driven Anomaly Detection and Forecasting for Euclid Space Telescope Operations
State-of-the-art space science missions increasingly rely on automation due to spacecraft complexity and the costs of human oversight. The high volume of data, including scientific and telemetry data, makes manual inspection challenging. Machine learning offers significant potential to meet these demands. The Euclid space telescope, in its survey phase since February 2024, exemplifies this shift. Euclid's success depends on accurate monitoring and interpretation of housekeeping telemetry and science-derived data. Thousands of telemetry parameters, monitored as time series, may or may not impact the quality of scientific data. These parameters have complex interdependencies, often due to physical relationships (e.g., proximity of temperature sensors). Optimising science operations requires careful anomaly detection and identification of hidden parameter states. Moreover, understanding the interactions between known anomalies and physical quantities is crucial yet complex, as related parameters may display anomalies with varied timing and intensity. We address these challenges by analysing temperature anomalies in Euclid's telemetry from February to August 2024, focusing on eleven temperature parameters and 35 covariates. We use a predictive XGBoost model to forecast temperatures based on historical values, detecting anomalies as deviations from predictions. A second XGBoost model predicts anomalies from covariates, capturing their relationships to temperature anomalies. We identify the top three anomalies per parameter and analyse their interactions with covariates using SHAP (Shapley Additive Explanations), enabling rapid, automated analysis of complex parameter relationships. Our method demonstrates how machine learning can enhance telemetry monitoring, offering scalable solutions for other missions with similar data challenges.
SimpleStrat: Diversifying Language Model Generation with Stratification
Generating diverse responses from large language models (LLMs) is crucial for applications such as planning/search and synthetic data generation, where diversity provides distinct answers across generations. Prior approaches rely on increasing temperature to increase diversity. However, contrary to popular belief, we show not only does this approach produce lower quality individual generations as temperature increases, but it depends on model's next-token probabilities being similar to the true distribution of answers. We propose , an alternative approach that uses the language model itself to partition the space into strata. At inference, a random stratum is selected and a sample drawn from within the strata. To measure diversity, we introduce CoverageQA, a dataset of underspecified questions with multiple equally plausible answers, and assess diversity by measuring KL Divergence between the output distribution and uniform distribution over valid ground truth answers. As computing probability per response/solution for proprietary models is infeasible, we measure recall on ground truth solutions. Our evaluation show using SimpleStrat achieves higher recall by 0.05 compared to GPT-4o and 0.36 average reduction in KL Divergence compared to Llama 3.
Automatic extraction of materials and properties from superconductors scientific literature
The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials Informatics). In this paper, we discuss Grobid-superconductors, our solution for automatically extracting superconductor material names and respective properties from text. Built as a Grobid module, it combines machine learning and heuristic approaches in a multi-step architecture that supports input data as raw text or PDF documents. Using Grobid-superconductors, we built SuperCon2, a database of 40324 materials and properties records from 37700 papers. The material (or sample) information is represented by name, chemical formula, and material class, and is characterized by shape, doping, substitution variables for components, and substrate as adjoined information. The properties include the Tc superconducting critical temperature and, when available, applied pressure with the Tc measurement method.
Developing an Optimal Model for Predicting the Severity of Wheat Stem Rust (Case study of Arsi and Bale Zone)
This research utilized three types of artificial neural network (ANN) methodologies, namely Backpropagation Neural Network (BPNN) with varied training, transfer, divide, and learning functions; Radial Basis Function Neural Network (RBFNN); and General Regression Neural Network (GRNN), to forecast the severity of stem rust. It considered parameters such as mean maximum temperature, mean minimum temperature, mean rainfall, mean average temperature, mean relative humidity, and different wheat varieties. The statistical analysis revealed that GRNN demonstrated effective predictive capability and required less training time compared to the other models. Additionally, the results indicated that total seasonal rainfall positively influenced the development of wheat stem rust. Keywords: Wheat stem rust, Back propagation neural network, Radial Basis Function Neural Network, General Regression Neural Network.
A note on the gravitational dark matter production
Dark matter, one of the fundamental components of the universe, has remained mysterious in modern cosmology and particle physics, and hence, this field is of utmost importance at present moment. One of the foundational questions in this direction is the origin of dark matter which directly links with its creation. In the present article we study the gravitational production of dark matter in two distinct contexts: firstly, when reheating occurs through the gravitational particle production, and secondly, when it is driven by the inflaton's decay. We establish a connection between the reheating temperature and the mass of dark matter, and from the reheating bounds, we determine the range of viable dark matter mass values.
Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data
This article benchmarked the ability of OpenAI's GPTs and a number of open-source LLMs to perform annotation tasks on political content. We used a novel protest event dataset comprising more than three million digital interactions and created a gold standard that includes ground-truth labels annotated by human coders about toxicity and incivility on social media. We included in our benchmark Google's Perspective algorithm, which, along with GPTs, was employed throughout their respective APIs while the open-source LLMs were deployed locally. The findings show that Perspective API using a laxer threshold, GPT-4o, and Nous Hermes 2 Mixtral outperform other LLM's zero-shot classification annotations. In addition, Nous Hermes 2 and Mistral OpenOrca, with a smaller number of parameters, are able to perform the task with high performance, being attractive options that could offer good trade-offs between performance, implementing costs and computing time. Ancillary findings using experiments setting different temperature levels show that although GPTs tend to show not only excellent computing time but also overall good levels of reliability, only open-source LLMs ensure full reproducibility in the annotation.
CloudTracks: A Dataset for Localizing Ship Tracks in Satellite Images of Clouds
Clouds play a significant role in global temperature regulation through their effect on planetary albedo. Anthropogenic emissions of aerosols can alter the albedo of clouds, but the extent of this effect, and its consequent impact on temperature change, remains uncertain. Human-induced clouds caused by ship aerosol emissions, commonly referred to as ship tracks, provide visible manifestations of this effect distinct from adjacent cloud regions and therefore serve as a useful sandbox to study human-induced clouds. However, the lack of large-scale ship track data makes it difficult to deduce their general effects on cloud formation. Towards developing automated approaches to localize ship tracks at scale, we present CloudTracks, a dataset containing 3,560 satellite images labeled with more than 12,000 ship track instance annotations. We train semantic segmentation and instance segmentation model baselines on our dataset and find that our best model substantially outperforms previous state-of-the-art for ship track localization (61.29 vs. 48.65 IoU). We also find that the best instance segmentation model is able to identify the number of ship tracks in each image more accurately than the previous state-of-the-art (1.64 vs. 4.99 MAE). However, we identify cases where the best model struggles to accurately localize and count ship tracks, so we believe CloudTracks will stimulate novel machine learning approaches to better detect elongated and overlapping features in satellite images. We release our dataset openly at {zenodo.org/records/10042922}.
Rethinking Data Distillation: Do Not Overlook Calibration
Neural networks trained on distilled data often produce over-confident output and require correction by calibration methods. Existing calibration methods such as temperature scaling and mixup work well for networks trained on original large-scale data. However, we find that these methods fail to calibrate networks trained on data distilled from large source datasets. In this paper, we show that distilled data lead to networks that are not calibratable due to (i) a more concentrated distribution of the maximum logits and (ii) the loss of information that is semantically meaningful but unrelated to classification tasks. To address this problem, we propose Masked Temperature Scaling (MTS) and Masked Distillation Training (MDT) which mitigate the limitations of distilled data and achieve better calibration results while maintaining the efficiency of dataset distillation.
Two-Scale Gradient Descent Ascent Dynamics Finds Mixed Nash Equilibria of Continuous Games: A Mean-Field Perspective
Finding the mixed Nash equilibria (MNE) of a two-player zero sum continuous game is an important and challenging problem in machine learning. A canonical algorithm to finding the MNE is the noisy gradient descent ascent method which in the infinite particle limit gives rise to the {\em Mean-Field Gradient Descent Ascent} (GDA) dynamics on the space of probability measures. In this paper, we first study the convergence of a two-scale Mean-Field GDA dynamics for finding the MNE of the entropy-regularized objective. More precisely we show that for each finite temperature (or regularization parameter), the two-scale Mean-Field GDA with a suitable {\em finite} scale ratio converges exponentially to the unique MNE without assuming the convexity or concavity of the interaction potential. The key ingredient of our proof lies in the construction of new Lyapunov functions that dissipate exponentially along the Mean-Field GDA. We further study the simulated annealing of the Mean-Field GDA dynamics. We show that with a temperature schedule that decays logarithmically in time the annealed Mean-Field GDA converges to the MNE of the original unregularized objective.
Model-Aware Contrastive Learning: Towards Escaping the Dilemmas
Contrastive learning (CL) continuously achieves significant breakthroughs across multiple domains. However, the most common InfoNCE-based methods suffer from some dilemmas, such as uniformity-tolerance dilemma (UTD) and gradient reduction, both of which are related to a P_{ij} term. It has been identified that UTD can lead to unexpected performance degradation. We argue that the fixity of temperature is to blame for UTD. To tackle this challenge, we enrich the CL loss family by presenting a Model-Aware Contrastive Learning (MACL) strategy, whose temperature is adaptive to the magnitude of alignment that reflects the basic confidence of the instance discrimination task, then enables CL loss to adjust the penalty strength for hard negatives adaptively. Regarding another dilemma, the gradient reduction issue, we derive the limits of an involved gradient scaling factor, which allows us to explain from a unified perspective why some recent approaches are effective with fewer negative samples, and summarily present a gradient reweighting to escape this dilemma. Extensive remarkable empirical results in vision, sentence, and graph modality validate our approach's general improvement for representation learning and downstream tasks.
An error indicator-based adaptive reduced order model for nonlinear structural mechanics -- application to high-pressure turbine blades
The industrial application motivating this work is the fatigue computation of aircraft engines' high-pressure turbine blades. The material model involves nonlinear elastoviscoplastic behavior laws, for which the parameters depend on the temperature. For this application, the temperature loading is not accurately known and can reach values relatively close to the creep temperature: important nonlinear effects occur and the solution strongly depends on the used thermal loading. We consider a nonlinear reduced order model able to compute, in the exploitation phase, the behavior of the blade for a new temperature field loading. The sensitivity of the solution to the temperature makes {the classical unenriched proper orthogonal decomposition method} fail. In this work, we propose a new error indicator, quantifying the error made by the reduced order model in computational complexity independent of the size of the high-fidelity reference model. In our framework, when the {error indicator} becomes larger than a given tolerance, the reduced order model is updated using one time step solution of the high-fidelity reference model. The approach is illustrated on a series of academic test cases and applied on a setting of industrial complexity involving 5 million degrees of freedom, where the whole procedure is computed in parallel with distributed memory.
WeatherBench 2: A benchmark for the next generation of data-driven global weather models
WeatherBench 2 is an update to the global, medium-range (1-14 day) weather forecasting benchmark proposed by Rasp et al. (2020), designed with the aim to accelerate progress in data-driven weather modeling. WeatherBench 2 consists of an open-source evaluation framework, publicly available training, ground truth and baseline data as well as a continuously updated website with the latest metrics and state-of-the-art models: https://sites.research.google/weatherbench. This paper describes the design principles of the evaluation framework and presents results for current state-of-the-art physical and data-driven weather models. The metrics are based on established practices for evaluating weather forecasts at leading operational weather centers. We define a set of headline scores to provide an overview of model performance. In addition, we also discuss caveats in the current evaluation setup and challenges for the future of data-driven weather forecasting.
Forecasting Global Weather with Graph Neural Networks
We present a data-driven approach for forecasting global weather using graph neural networks. The system learns to step forward the current 3D atmospheric state by six hours, and multiple steps are chained together to produce skillful forecasts going out several days into the future. The underlying model is trained on reanalysis data from ERA5 or forecast data from GFS. Test performance on metrics such as Z500 (geopotential height) and T850 (temperature) improves upon previous data-driven approaches and is comparable to operational, full-resolution, physical models from GFS and ECMWF, at least when evaluated on 1-degree scales and when using reanalysis initial conditions. We also show results from connecting this data-driven model to live, operational forecasts from GFS.
Top-$nσ$: Not All Logits Are You Need
Large language models (LLMs) typically employ greedy decoding or low-temperature sampling for reasoning tasks, reflecting a perceived trade-off between diversity and accuracy. We challenge this convention by introducing top-nsigma, a novel sampling method that operates directly on pre-softmax logits by leveraging a statistical threshold. Our key insight is that logits naturally separate into a Gaussian-distributed noisy region and a distinct informative region, enabling efficient token filtering without complex probability manipulations. Unlike existing methods (e.g., top-p, min-p) that inadvertently include more noise tokens at higher temperatures, top-nsigma maintains a stable sampling space regardless of temperature scaling. We also provide a theoretical analysis of top-nsigma to better understand its behavior. The extensive experimental results across four reasoning-focused datasets demonstrate that our method not only outperforms existing sampling approaches but also surpasses greedy decoding, while maintaining consistent performance even at high temperatures.
Priority Sampling of Large Language Models for Compilers
Large language models show great potential in generating and optimizing code. Widely used sampling methods such as Nucleus Sampling increase the diversity of generation but often produce repeated samples for low temperatures and incoherent samples for high temperatures. Furthermore, the temperature coefficient has to be tuned for each task, limiting its usability. We present Priority Sampling, a simple and deterministic sampling technique that produces unique samples ordered by the model's confidence. Each new sample expands the unexpanded token with the highest probability in the augmented search tree. Additionally, Priority Sampling supports generation based on regular expression that provides a controllable and structured exploration process. Priority Sampling outperforms Nucleus Sampling for any number of samples, boosting the performance of the original model from 2.87% to 5% improvement over -Oz. Moreover, it outperforms the autotuner used for the generation of labels for the training of the original model in just 30 samples.
Are LLMs Aware that Some Questions are not Open-ended?
Large Language Models (LLMs) have shown the impressive capability of answering questions in a wide range of scenarios. However, when LLMs face different types of questions, it is worth exploring whether LLMs are aware that some questions have limited answers and need to respond more deterministically but some do not. We refer to this as question awareness of LLMs. The lack of question awareness in LLMs leads to two phenomena that LLMs are: (1) too casual to answer non-open-ended questions or (2) too boring to answer open-ended questions. In this paper, we first evaluate the question awareness in LLMs. The experimental results show that LLMs have the issues of lacking awareness of questions in certain domains, e.g. factual knowledge, resulting in hallucinations during the generation. To mitigate these, we propose a method called Question Awareness Temperature Sampling (QuATS). This method enhances the question awareness of LLMs by adaptively adjusting the output distributions based on question features. The automatic adjustment in QuATS eliminates the need for manual temperature tuning in text generation and consistently improves model performance in various benchmarks.
Length Generalization of Causal Transformers without Position Encoding
Generalizing to longer sentences is important for recent Transformer-based language models. Besides algorithms manipulating explicit position features, the success of Transformers without position encodings (NoPE) provides a new way to overcome the challenge. In this paper, we study the length generalization property of NoPE. We find that although NoPE can extend to longer sequences than the commonly used explicit position encodings, it still has a limited context length. We identify a connection between the failure of NoPE's generalization and the distraction of attention distributions. We propose a parameter-efficient tuning for searching attention heads' best temperature hyper-parameters, which substantially expands NoPE's context size. Experiments on long sequence language modeling, the synthetic passkey retrieval task and real-world long context tasks show that NoPE can achieve competitive performances with state-of-the-art length generalization algorithms. The source code is publicly accessible
Understanding the Impact of Post-Training Quantization on Large Language Models
Large language models (LLMs) are rapidly increasing in size, with the number of parameters becoming a key factor in the success of many commercial models, such as ChatGPT, Claude, and Bard. Even the recently released publicly accessible models for commercial usage, such as Falcon and Llama2, come equipped with billions of parameters. This significant increase in the number of parameters makes deployment and operation very costly. The remarkable progress in the field of quantization for large neural networks in general and LLMs in particular, has made these models more accessible by enabling them to be deployed on consumer-grade GPUs. Quantized models generally demonstrate comparable performance levels to their unquantized base counterparts. Nonetheless, there exists a notable gap in our comprehensive understanding of how these quantized models respond to hyperparameters, such as temperature, max new tokens, and topk, particularly for next word prediction. The present analysis reveals that nf4 and fp4 are equally proficient 4-bit quantization techniques, characterized by similar attributes such as inference speed, memory consumption, and the quality of generated content. the study identifies nf4 as displaying greater resilience to temperature variations in the case of the llama2 series of models at lower temperature, while fp4 and fp4-dq proves to be a more suitable choice for falcon series of models. It is noteworthy that, in general, 4-bit quantized models of varying sizes exhibit higher sensitivity to temperature in the range of 0.5 to 0.8, unlike their unquantized counterparts. Additionally, int8 quantization is associated with significantly slower inference speeds, whereas unquantized bfloat16 models consistently yield the fastest inference speeds across models of all sizes.
Predicting Thermoelectric Power Factor of Bismuth Telluride During Laser Powder Bed Fusion Additive Manufacturing
An additive manufacturing (AM) process, like laser powder bed fusion, allows for the fabrication of objects by spreading and melting powder in layers until a freeform part shape is created. In order to improve the properties of the material involved in the AM process, it is important to predict the material characterization property as a function of the processing conditions. In thermoelectric materials, the power factor is a measure of how efficiently the material can convert heat to electricity. While earlier works have predicted the material characterization properties of different thermoelectric materials using various techniques, implementation of machine learning models to predict the power factor of bismuth telluride (Bi2Te3) during the AM process has not been explored. This is important as Bi2Te3 is a standard material for low temperature applications. Thus, we used data about manufacturing processing parameters involved and in-situ sensor monitoring data collected during AM of Bi2Te3, to train different machine learning models in order to predict its thermoelectric power factor. We implemented supervised machine learning techniques using 80% training and 20% test data and further used the permutation feature importance method to identify important processing parameters and in-situ sensor features which were best at predicting power factor of the material. Ensemble-based methods like random forest, AdaBoost classifier, and bagging classifier performed the best in predicting power factor with the highest accuracy of 90% achieved by the bagging classifier model. Additionally, we found the top 15 processing parameters and in-situ sensor features to characterize the material manufacturing property like power factor. These features could further be optimized to maximize power factor of the thermoelectric material and improve the quality of the products built using this material.
Generic Approach to Visualization of Time Series Data
Time series is a collection of data instances that are ordered according to a time stamp. Stock prices, temperature, etc are examples of time series data in real life. Time series data are used for forecasting sales, predicting trends. Visualization is the process of visually representing data or the relationship between features of a data either in a two-dimensional plot or a three-dimensional plot. Visualizing the time series data constitutes an important part of the process for working with a time series dataset. Visualizing the data not only helps in the modelling process but it can also be used to identify trends and features that cause those trends. In this work, we take a real-life time series dataset and analyse how the target feature relates to other features of the dataset through visualization. From the work that has been carried out, we present an effective method of visualization for time series data which will be much useful for machine learning modelling with such datasets.
Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance
Masked generative models (MGMs) have shown impressive generative ability while providing an order of magnitude efficient sampling steps compared to continuous diffusion models. However, MGMs still underperform in image synthesis compared to recent well-developed continuous diffusion models with similar size in terms of quality and diversity of generated samples. A key factor in the performance of continuous diffusion models stems from the guidance methods, which enhance the sample quality at the expense of diversity. In this paper, we extend these guidance methods to generalized guidance formulation for MGMs and propose a self-guidance sampling method, which leads to better generation quality. The proposed approach leverages an auxiliary task for semantic smoothing in vector-quantized token space, analogous to the Gaussian blur in continuous pixel space. Equipped with the parameter-efficient fine-tuning method and high-temperature sampling, MGMs with the proposed self-guidance achieve a superior quality-diversity trade-off, outperforming existing sampling methods in MGMs with more efficient training and sampling costs. Extensive experiments with the various sampling hyperparameters confirm the effectiveness of the proposed self-guidance.
Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets
Data availability across domains often follows a long-tail distribution: a few domains have abundant data, while most face dat . a scarcity. This imbalance poses challenges in training language models uniformly across all domains. In our study, we focus on multilingual settings, where data sizes vary significantly between high- and low-resource languages. Common strategies to address this include upsampling low-resource languages (Temperature Sampling) or upweighting their loss (Scalarization). Although often considered equivalent, this assumption has not been proven, which motivates our study. Through both theoretical and empirical analysis, we identify the conditions under which these approaches are equivalent and when they diverge. Specifically, we demonstrate that these two methods are equivalent under full gradient descent, but this equivalence breaks down with stochastic gradient descent. Empirically, we observe that Temperature Sampling converges more quickly but is prone to overfitting. We argue that this faster convergence is likely due to the lower variance in gradient estimations, as shown theoretically. Based on these insights, we propose Cooldown, a strategy that reduces sampling temperature during training, accelerating convergence without overfitting to low-resource languages. Our method is competitive with existing data re-weighting and offers computational efficiency.
Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling
Masked diffusion models (MDMs) have emerged as a popular research topic for generative modeling of discrete data, thanks to their superior performance over other discrete diffusion models, and are rivaling the auto-regressive models (ARMs) for language modeling tasks. The recent effort in simplifying the masked diffusion framework further leads to alignment with continuous-space diffusion models and more principled training and sampling recipes. In this paper, however, we reveal that both training and sampling of MDMs are theoretically free from the time variable, arguably the key signature of diffusion models, and are instead equivalent to masked models. The connection on the sampling aspect is drawn by our proposed first-hitting sampler (FHS). Specifically, we show that the FHS is theoretically equivalent to MDMs' original generation process while significantly alleviating the time-consuming categorical sampling and achieving a 20times speedup. In addition, our investigation raises doubts about whether MDMs can truly beat ARMs. We identify, for the first time, an underlying numerical issue, even with the commonly used 32-bit floating-point precision, which results in inaccurate categorical sampling. We show that the numerical issue lowers the effective temperature both theoretically and empirically, and the resulting decrease in token diversity makes previous evaluations, which assess the generation quality solely through the incomplete generative perplexity metric, somewhat unfair.
Transcendence: Generative Models Can Outperform The Experts That Train Them
Generative models are trained with the simple objective of imitating the conditional probability distribution induced by the data they are trained on. Therefore, when trained on data generated by humans, we may not expect the artificial model to outperform the humans on their original objectives. In this work, we study the phenomenon of transcendence: when a generative model achieves capabilities that surpass the abilities of the experts generating its data. We demonstrate transcendence by training an autoregressive transformer to play chess from game transcripts, and show that the trained model can sometimes achieve better performance than all players in the dataset. We theoretically prove that transcendence is enabled by low-temperature sampling, and rigorously assess this experimentally. Finally, we discuss other sources of transcendence, laying the groundwork for future investigation of this phenomenon in a broader setting.
Quantum Thermalization via Travelling Waves
Isolated quantum many-body systems which thermalize under their own dynamics are expected to act as their own thermal baths, thereby bringing their local subsystems to thermal equilibrium. Here we show that the infinite-dimensional limit of a quantum lattice model, as described by Dynamical Mean-Field theory (DMFT), provides a natural framework to understand this self-consistent thermalization process. Using the Fermi-Hubbard model as working example, we demonstrate that the emergence of a self-consistent bath thermalising the system is characterized by a sharp thermalization front, moving balistically and separating the initial condition from the long-time thermal fixed point. We characterize the full DMFT dynamics through an effective temperature for which we derive a travelling-wave equation of the Fisher-Kolmogorov-Petrovsky-Piskunov (FKPP) type. This equation allows to predict the asymptotic shape of the front and its velocity, which match perfectly the full DMFT numerics. Our results provide a new angle to understand the onset of quantum thermalisation in closed isolated systems.
Normalizing flows as an enhanced sampling method for atomistic supercooled liquids
Normalizing flows can transform a simple prior probability distribution into a more complex target distribution. Here, we evaluate the ability and efficiency of generative machine learning methods to sample the Boltzmann distribution of an atomistic model for glass-forming liquids. This is a notoriously difficult task, as it amounts to ergodically exploring the complex free energy landscape of a disordered and frustrated many-body system. We optimize a normalizing flow model to successfully transform high-temperature configurations of a dense liquid into low-temperature ones, near the glass transition. We perform a detailed comparative analysis with established enhanced sampling techniques developed in the physics literature to assess and rank the performance of normalizing flows against state-of-the-art algorithms. We demonstrate that machine learning methods are very promising, showing a large speedup over conventional molecular dynamics. Normalizing flows show performances comparable to parallel tempering and population annealing, while still falling far behind the swap Monte Carlo algorithm. Our study highlights the potential of generative machine learning models in scientific computing for complex systems, but also points to some of its current limitations and the need for further improvement.
Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval
Cross-modal retrieval (CMR) has been extensively applied in various domains, such as multimedia search engines and recommendation systems. Most existing CMR methods focus on image-to-text retrieval, whereas audio-to-text retrieval, a less explored domain, has posed a great challenge due to the difficulty to uncover discriminative features from audio clips and texts. Existing studies are restricted in the following two ways: 1) Most researchers utilize contrastive learning to construct a common subspace where similarities among data can be measured. However, they considers only cross-modal transformation, neglecting the intra-modal separability. Besides, the temperature parameter is not adaptively adjusted along with semantic guidance, which degrades the performance. 2) These methods do not take latent representation reconstruction into account, which is essential for semantic alignment. This paper introduces a novel audio-text oriented CMR approach, termed Contrastive Latent Space Reconstruction Learning (CLSR). CLSR improves contrastive representation learning by taking intra-modal separability into account and adopting an adaptive temperature control strategy. Moreover, the latent representation reconstruction modules are embedded into the CMR framework, which improves modal interaction. Experiments in comparison with some state-of-the-art methods on two audio-text datasets have validated the superiority of CLSR.
Employing Explainable Artificial Intelligence (XAI) Methodologies to Analyze the Correlation between Input Variables and Tensile Strength in Additively Manufactured Samples
This research paper explores the impact of various input parameters, including Infill percentage, Layer Height, Extrusion Temperature, and Print Speed, on the resulting Tensile Strength in objects produced through additive manufacturing. The main objective of this study is to enhance our understanding of the correlation between the input parameters and Tensile Strength, as well as to identify the key factors influencing the performance of the additive manufacturing process. To achieve this objective, we introduced the utilization of Explainable Artificial Intelligence (XAI) techniques for the first time, which allowed us to analyze the data and gain valuable insights into the system's behavior. Specifically, we employed SHAP (SHapley Additive exPlanations), a widely adopted framework for interpreting machine learning model predictions, to provide explanations for the behavior of a machine learning model trained on the data. Our findings reveal that the Infill percentage and Extrusion Temperature have the most significant influence on Tensile Strength, while the impact of Layer Height and Print Speed is relatively minor. Furthermore, we discovered that the relationship between the input parameters and Tensile Strength is highly intricate and nonlinear, making it difficult to accurately describe using simple linear models.
Evidence for Widespread Hydrogen Sequestration within the Moon's South Polar Cold Traps
The measured neutron flux from the Moons south polar region shows evidence of locally enhanced hydrogen concentrations, likely in the form of water ice, within most permanently shadowed regions (PSR), poleward of 77 deg S latitude. Results are consistent with the original findings of Watson et al, 1961, which found that the PSRs cryogenic surfaces create exclusive conditions for the sequestration of water ice, due to their extremely low sublimation rates. Widespread PSR hydrogenation is demonstrated in several studies by showing that the contrasting PSR area distribution is being instrumentally blurred. The PSRs expected hydrogen observations are correlated by their area fraction of the fixed 30 km diameter footprint area of the Collimated Sensor for Epithermal Neutrons (CSETN), which is part of the Lunar Exploration Neutron Detector (LEND) onboard the Lunar Reconnaissance Orbiter (LRO). The correlation indicates that the PSRs are similarly hydrogenated, with an expected concentration = 0.27 wt%, relative to that of the anhydrous reference terrain (lower bounds). Hydrogen concentrations are demonstrated to be correlated to maximum temperature distributions within the basins of Haworth, Shoemaker and Faustini PSRs. Cabeus-1 PSR shows an anomalously enhanced hydrogen concentration indicating a second process contributes to its hydrogen budget. Results are consistent with ongoing processes that introduce volatiles to the surface including outgassing, solar wind production with regolith silicates, and mixing from small scale meteor impacts and diurnal temperature variation. We validate the bandpass filter used to subtract CSETNs detection of uncollimated neutrons with profiles of several PSRs neutron suppression before and after processing. Keywords: Moon, Epithermal Neutron, Hydrogen, Water, Ice, Volatiles, LRO, LEND, Diviner, LOLA
Swing Distillation: A Privacy-Preserving Knowledge Distillation Framework
Knowledge distillation (KD) has been widely used for model compression and knowledge transfer. Typically, a big teacher model trained on sufficient data transfers knowledge to a small student model. However, despite the success of KD, little effort has been made to study whether KD leaks the training data of the teacher model. In this paper, we experimentally reveal that KD suffers from the risk of privacy leakage. To alleviate this issue, we propose a novel knowledge distillation method, swing distillation, which can effectively protect the private information of the teacher model from flowing to the student model. In our framework, the temperature coefficient is dynamically and adaptively adjusted according to the degree of private information contained in the data, rather than a predefined constant hyperparameter. It assigns different temperatures to tokens according to the likelihood that a token in a position contains private information. In addition, we inject noise into soft targets provided to the student model, in order to avoid unshielded knowledge transfer. Experiments on multiple datasets and tasks demonstrate that the proposed swing distillation can significantly reduce (by over 80% in terms of canary exposure) the risk of privacy leakage in comparison to KD with competitive or better performance. Furthermore, swing distillation is robust against the increasing privacy budget.
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Decoding methods for large language models often trade-off between diversity of outputs and parallelism of computation. Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. Alternatively, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and others), are embarrassingly parallel, but have no guarantees about duplicate samples. We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model, compatible with common sampling variations, with provable beam diversity under certain conditions, as well as being embarrassingly parallel and providing unbiased and consistent expectations from the original model. We demonstrate the effectiveness of our approach on WMT machine translation, more than halving the standard deviation when estimating expected BLEU score reward, and closing the BLEU score gap between independent sampling and beam search by up to 63%.
Understanding the Behaviour of Contrastive Loss
Unsupervised contrastive learning has achieved outstanding success, while the mechanism of contrastive loss has been less studied. In this paper, we concentrate on the understanding of the behaviours of unsupervised contrastive loss. We will show that the contrastive loss is a hardness-aware loss function, and the temperature {\tau} controls the strength of penalties on hard negative samples. The previous study has shown that uniformity is a key property of contrastive learning. We build relations between the uniformity and the temperature {\tau} . We will show that uniformity helps the contrastive learning to learn separable features, however excessive pursuit to the uniformity makes the contrastive loss not tolerant to semantically similar samples, which may break the underlying semantic structure and be harmful to the formation of features useful for downstream tasks. This is caused by the inherent defect of the instance discrimination objective. Specifically, instance discrimination objective tries to push all different instances apart, ignoring the underlying relations between samples. Pushing semantically consistent samples apart has no positive effect for acquiring a prior informative to general downstream tasks. A well-designed contrastive loss should have some extents of tolerance to the closeness of semantically similar samples. Therefore, we find that the contrastive loss meets a uniformity-tolerance dilemma, and a good choice of temperature can compromise these two properties properly to both learn separable features and tolerant to semantically similar samples, improving the feature qualities and the downstream performances.
Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks
We consider the problem of detecting out-of-distribution images in neural networks. We propose ODIN, a simple and effective method that does not require any change to a pre-trained neural network. Our method is based on the observation that using temperature scaling and adding small perturbations to the input can separate the softmax score distributions between in- and out-of-distribution images, allowing for more effective detection. We show in a series of experiments that ODIN is compatible with diverse network architectures and datasets. It consistently outperforms the baseline approach by a large margin, establishing a new state-of-the-art performance on this task. For example, ODIN reduces the false positive rate from the baseline 34.7% to 4.3% on the DenseNet (applied to CIFAR-10) when the true positive rate is 95%.
Observational signatures of mixing-induced cooling in the Kelvin-Helmholtz instability
Cool (approx 10^4K), dense material permeates the hot (approx 10^6K), tenuous solar corona in form of coronal condensations, for example prominences and coronal rain. As the solar atmosphere evolves, turbulence can drive mixing between the condensations and the surrounding corona, with the mixing layer exhibiting an enhancement in emission from intermediate temperature (approx10^5K) spectral lines, which is often attributed to turbulent heating within the mixing layer. However, radiative cooling is highly efficient at intermediate temperatures and numerical simulations have shown that radiative cooling can far exceed turbulent heating in prominence-corona mixing scenarios. As such the mixing layer can have a net loss of thermal energy, i.e., the mixing layer is cooling rather than heating. Here, we investigate the observational signatures of cooling processes in Kelvin-Helmholtz mixing between a prominence thread and the surrounding solar corona through 2D numerical simulations. Optically thin emission is synthesised for Si IV, along with optically thick emission for Halpha, Ca II K and Mg II h using Lightweaver The Mg II h probes the turbulent mixing layer, whereas Halpha and Ca II K form within the thread and along its boundary respectively. As the mixing evolves, intermediate temperatures form leading to an increase in Si IV emission, which coincides with increased radiative losses. The simulation is dominated by cooling in the mixing layer, rather than turbulent heating, and yet enhanced emission in warm lines is produced. As such, an observational signature of decreased emission in cooler lines and increased emission in hotter lines may be a signature of mixing, rather than an implication of heating.
Gas dynamics around a Jupiter mass planet: II. Chemical evolution of circumplanetary material
In an ongoing effort to understand planet formation the link between the chemistry of the protoplanetary disk and the properties of resulting planets have long been a subject of interest. These connections have generally been made between mature planets and young protoplanetary disks through the carbon-to-oxygen (C/O) ratio. In a rare number of systems, young protoplanets have been found within their natal protoplanetary disks. These systems offer a unique opportunity to directly study the delivery of gas from the protoplanetary disk to the planet. In this work we post-process 3D numerical simulations of an embedded Jupiter-massed planet in its protoplanetary disk to explore the chemical evolution of gas as it flows from the disk to the planet. The relevant dust to this chemical evolution is assumed to be small, co-moving grains with a reduced dust-to-gas ratio indicative of the upper atmosphere of a protoplanetary disk. We find that as the gas enters deep into the planet's gravitational well, it warms significantly (up to sim 800 K), releasing all of the volatile content from the ice phase. This change in phase can influence our understanding of the delivery of volatile species to the atmospheres of giant planets. The primary carbon, oxygen, and sulfur carrying ices: CO_2, H_2O, and H_2S are released into the gas phase and along with the warm gas temperatures near the embedded planets lead to the production of unique species like CS, SO, and SO_2 compared to the protoplanetary disk. We compute the column densities of SO, SO_2, CS, and H_2CS in our model and find that their values are consistent with previous observational studies.
Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks
This study examines the performance of open-source Large Language Models (LLMs) in text annotation tasks and compares it with proprietary models like ChatGPT and human-based services such as MTurk. While prior research demonstrated the high performance of ChatGPT across numerous NLP tasks, open-source LLMs like HugginChat and FLAN are gaining attention for their cost-effectiveness, transparency, reproducibility, and superior data protection. We assess these models using both zero-shot and few-shot approaches and different temperature parameters across a range of text annotation tasks. Our findings show that while ChatGPT achieves the best performance in most tasks, open-source LLMs not only outperform MTurk but also demonstrate competitive potential against ChatGPT in specific tasks.
CXR-LLaVA: Multimodal Large Language Model for Interpreting Chest X-ray Images
Purpose: Recent advancements in large language models (LLMs) have expanded their capabilities in a multimodal fashion, potentially replicating the image interpretation of human radiologists. This study aimed to develop open-source multimodal large language model for interpreting chest X-ray images (CXR-LLaVA). We also examined the effect of prompt engineering and model parameters such as temperature and nucleus sampling. Materials and Methods: For training, we collected 659,287 publicly available CXRs: 417,336 CXRs had labels for certain radiographic abnormalities (dataset 1); 241,951 CXRs provided free-text radiology reports (dataset 2). After pre-training the Resnet50 as an image encoder, the contrastive language-image pre-training was used to align CXRs and corresponding radiographic abnormalities. Then, the Large Language Model Meta AI-2 was fine-tuned using dataset 2, which were refined using GPT-4, with generating various question answering scenarios. The code can be found at https://github.com/ECOFRI/CXR_LLaVA. Results: In the test set, we observed that the model's performance fluctuated based on its parameters. On average, it achieved F1 score of 0.34 for five pathologic findings (atelectasis, cardiomegaly, consolidation, edema, and pleural effusion), which was improved to 0.46 through prompt engineering. In the independent set, the model achieved an average F1 score of 0.30 for the same pathologic findings. Notably, for the pediatric chest radiograph dataset, which was unseen during training, the model differentiated abnormal radiographs with an F1 score ranging from 0.84 to 0.85. Conclusion: CXR-LLaVA demonstrates promising potential in CXR interpretation. Both prompt engineering and model parameter adjustments can play pivotal roles in interpreting CXRs.
Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts
While score-based generative models are the model of choice across diverse domains, there are limited tools available for controlling inference-time behavior in a principled manner, e.g. for composing multiple pretrained models. Existing classifier-free guidance methods use a simple heuristic to mix conditional and unconditional scores to approximately sample from conditional distributions. However, such methods do not approximate the intermediate distributions, necessitating additional 'corrector' steps. In this work, we provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We derive a weighted simulation scheme which we call Feynman-Kac Correctors (FKCs) based on the celebrated Feynman-Kac formula by carefully accounting for terms in the appropriate partial differential equations (PDEs). To simulate these PDEs, we propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality. We empirically demonstrate the utility of our methods by proposing amortized sampling via inference-time temperature annealing, improving multi-objective molecule generation using pretrained models, and improving classifier-free guidance for text-to-image generation. Our code is available at https://github.com/martaskrt/fkc-diffusion.
DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation
Several recent studies have attempted to autoregressively generate continuous speech representations without discrete speech tokens by combining diffusion and autoregressive models, yet they often face challenges with excessive computational loads or suboptimal outcomes. In this work, we propose Diffusion Transformer Autoregressive Modeling (DiTAR), a patch-based autoregressive framework combining a language model with a diffusion transformer. This approach significantly enhances the efficacy of autoregressive models for continuous tokens and reduces computational demands. DiTAR utilizes a divide-and-conquer strategy for patch generation, where the language model processes aggregated patch embeddings and the diffusion transformer subsequently generates the next patch based on the output of the language model. For inference, we propose defining temperature as the time point of introducing noise during the reverse diffusion ODE to balance diversity and determinism. We also show in the extensive scaling analysis that DiTAR has superb scalability. In zero-shot speech generation, DiTAR achieves state-of-the-art performance in robustness, speaker similarity, and naturalness.
ACE2: Accurately learning subseasonal to decadal atmospheric variability and forced responses
Existing machine learning models of weather variability are not formulated to enable assessment of their response to varying external boundary conditions such as sea surface temperature and greenhouse gases. Here we present ACE2 (Ai2 Climate Emulator version 2) and its application to reproducing atmospheric variability over the past 80 years on timescales from days to decades. ACE2 is a 450M-parameter autoregressive machine learning emulator, operating with 6-hour temporal resolution, 1{\deg} horizontal resolution and eight vertical layers. It exactly conserves global dry air mass and moisture and can be stepped forward stably for arbitrarily many steps with a throughput of about 1500 simulated years per wall clock day. ACE2 generates emergent phenomena such as tropical cyclones, the Madden Julian Oscillation, and sudden stratospheric warmings. Furthermore, it accurately reproduces the atmospheric response to El Ni\~no variability and global trends of temperature over the past 80 years. However, its sensitivities to separately changing sea surface temperature and carbon dioxide are not entirely realistic.
QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models
The deployment of large language models (LLMs) faces considerable challenges concerning resource constraints and inference efficiency. Recent research has increasingly focused on smaller, task-specific models enhanced by distilling knowledge from LLMs. However, prior studies have often overlooked the diversity and quality of knowledge, especially the untapped potential of negative knowledge. Constructing effective negative knowledge remains severely understudied. In this paper, we introduce a novel framework called quality-guided contrastive rationale distillation aimed at enhancing reasoning capabilities through contrastive knowledge learning. For positive knowledge, we enrich its diversity through temperature sampling and employ self-consistency for further denoising and refinement. For negative knowledge, we propose an innovative self-adversarial approach that generates low-quality rationales by sampling previous iterations of smaller language models, embracing the idea that one can learn from one's own weaknesses. A contrastive loss is developed to distill both positive and negative knowledge into smaller language models, where an online-updating discriminator is integrated to assess qualities of rationales and assign them appropriate weights, optimizing the training process. Through extensive experiments across multiple reasoning tasks, we demonstrate that our method consistently outperforms existing distillation techniques, yielding higher-quality rationales.
Towards Better Text-to-Image Generation Alignment via Attention Modulation
In text-to-image generation tasks, the advancements of diffusion models have facilitated the fidelity of generated results. However, these models encounter challenges when processing text prompts containing multiple entities and attributes. The uneven distribution of attention results in the issues of entity leakage and attribute misalignment. Training from scratch to address this issue requires numerous labeled data and is resource-consuming. Motivated by this, we propose an attribution-focusing mechanism, a training-free phase-wise mechanism by modulation of attention for diffusion model. One of our core ideas is to guide the model to concentrate on the corresponding syntactic components of the prompt at distinct timesteps. To achieve this, we incorporate a temperature control mechanism within the early phases of the self-attention modules to mitigate entity leakage issues. An object-focused masking scheme and a phase-wise dynamic weight control mechanism are integrated into the cross-attention modules, enabling the model to discern the affiliation of semantic information between entities more effectively. The experimental results in various alignment scenarios demonstrate that our model attain better image-text alignment with minimal additional computational cost.
Inhomogeneous confinement and chiral symmetry breaking induced by imaginary angular velocity
We investigate detailed properties of imaginary rotating matter with gluons and quarks at high temperature. Previously, we showed that imaginary rotation induces perturbative confinement of gluons at the rotation center. We perturbatively calculate the Polyakov loop potential and find inhomogeneous confinement above a certain threshold of imaginary angular velocity. We also evaluate the quark contribution to the Polyakov loop potential and confirm that spontaneous chiral symmetry breaking occurs in the perturbatively confined phase.
Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
Despite the impressive generative capabilities of diffusion models, existing diffusion model-based style transfer methods require inference-stage optimization (e.g. fine-tuning or textual inversion of style) which is time-consuming, or fails to leverage the generative ability of large-scale diffusion models. To address these issues, we introduce a novel artistic style transfer method based on a pre-trained large-scale diffusion model without any optimization. Specifically, we manipulate the features of self-attention layers as the way the cross-attention mechanism works; in the generation process, substituting the key and value of content with those of style image. This approach provides several desirable characteristics for style transfer including 1) preservation of content by transferring similar styles into similar image patches and 2) transfer of style based on similarity of local texture (e.g. edge) between content and style images. Furthermore, we introduce query preservation and attention temperature scaling to mitigate the issue of disruption of original content, and initial latent Adaptive Instance Normalization (AdaIN) to deal with the disharmonious color (failure to transfer the colors of style). Our experimental results demonstrate that our proposed method surpasses state-of-the-art methods in both conventional and diffusion-based style transfer baselines.
On the Pareto Front of Multilingual Neural Machine Translation
In this work, we study how the performance of a given direction changes with its sampling ratio in Multilingual Neural Machine Translation (MNMT). By training over 200 multilingual models with various model sizes, data sizes, and language directions, we find it interesting that the performance of certain translation direction does not always improve with the increase of its weight in the multi-task optimization objective. Accordingly, scalarization method leads to a multitask trade-off front that deviates from the traditional Pareto front when there exists data imbalance in the training corpus, which poses a great challenge to improve the overall performance of all directions. Based on our observations, we propose the Double Power Law to predict the unique performance trade-off front in MNMT, which is robust across various languages, data adequacy, and the number of tasks. Finally, we formulate the sample ratio selection problem in MNMT as an optimization problem based on the Double Power Law. In our experiments, it achieves better performance than temperature searching and gradient manipulation methods with only 1/5 to 1/2 of the total training budget. We release the code at https://github.com/pkunlp-icler/ParetoMNMT for reproduction.
First Order Quantum Phase Transition in the Hybrid Metal-Mott Insulator Transition Metal Dichalcogenide 4Hb-TaS2
Coupling together distinct correlated and topologically non-trivial electronic phases of matter can potentially induce novel electronic orders and phase transitions among them. Transition metal dichalcogenide compounds serve as a bedrock for exploration of such hybrid systems. They host a variety of exotic electronic phases and their Van der Waals nature enables to admix them, either by exfoliation and stacking or by stoichiometric growth, and thereby induce novel correlated complexes. Here we investigate the compound 4Hb-TaS_2 that interleaves the Mott-insulating state of 1T-TaS_2 and the putative spin liquid it hosts together with the metallic state of 2H-TaS_2 and the low temperature superconducting phase it harbors. We reveal a thermodynamic phase diagram that hosts a first order quantum phase transition between a correlated Kondo cluster state and a flat band state in which the Kondo cluster becomes depleted. We demonstrate that this intrinsic transition can be induced by an electric field and temperature as well as by manipulation of the interlayer coupling with the probe tip, hence allowing to reversibly toggle between the Kondo cluster and the flat band states. The phase transition is manifested by a discontinuous change of the complete electronic spectrum accompanied by hysteresis and low frequency noise. We find that the shape of the transition line in the phase diagram is determined by the local compressibility and the entropy of the two electronic states. Our findings set such heterogeneous structures as an exciting platform for systematic investigation and manipulation of Mott-metal transitions and strongly correlated phases and quantum phase transitions therein.
Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
Understanding when the noise in stochastic gradient descent (SGD) affects generalization of deep neural networks remains a challenge, complicated by the fact that networks can operate in distinct training regimes. Here we study how the magnitude of this noise T affects performance as the size of the training set P and the scale of initialization alpha are varied. For gradient descent, alpha is a key parameter that controls if the network is `lazy'(alphagg1) or instead learns features (alphall1). For classification of MNIST and CIFAR10 images, our central results are: (i) obtaining phase diagrams for performance in the (alpha,T) plane. They show that SGD noise can be detrimental or instead useful depending on the training regime. Moreover, although increasing T or decreasing alpha both allow the net to escape the lazy regime, these changes can have opposite effects on performance. (ii) Most importantly, we find that the characteristic temperature T_c where the noise of SGD starts affecting the trained model (and eventually performance) is a power law of P. We relate this finding with the observation that key dynamical quantities, such as the total variation of weights during training, depend on both T and P as power laws. These results indicate that a key effect of SGD noise occurs late in training by affecting the stopping process whereby all data are fitted. Indeed, we argue that due to SGD noise, nets must develop a stronger `signal', i.e. larger informative weights, to fit the data, leading to a longer training time. A stronger signal and a longer training time are also required when the size of the training set P increases. We confirm these views in the perceptron model, where signal and noise can be precisely measured. Interestingly, exponents characterizing the effect of SGD depend on the density of data near the decision boundary, as we explain.
Causes and Cures for Interference in Multilingual Translation
Multilingual machine translation models can benefit from synergy between different language pairs, but also suffer from interference. While there is a growing number of sophisticated methods that aim to eliminate interference, our understanding of interference as a phenomenon is still limited. This work identifies the main factors that contribute to interference in multilingual machine translation. Through systematic experimentation, we find that interference (or synergy) are primarily determined by model size, data size, and the proportion of each language pair within the total dataset. We observe that substantial interference occurs mainly when the model is very small with respect to the available training data, and that using standard transformer configurations with less than one billion parameters largely alleviates interference and promotes synergy. Moreover, we show that tuning the sampling temperature to control the proportion of each language pair in the data is key to balancing the amount of interference between low and high resource language pairs effectively, and can lead to superior performance overall.
What Makes Graph Neural Networks Miscalibrated?
Given the importance of getting calibrated predictions and reliable uncertainty estimations, various post-hoc calibration methods have been developed for neural networks on standard multi-class classification tasks. However, these methods are not well suited for calibrating graph neural networks (GNNs), which presents unique challenges such as accounting for the graph structure and the graph-induced correlations between the nodes. In this work, we conduct a systematic study on the calibration qualities of GNN node predictions. In particular, we identify five factors which influence the calibration of GNNs: general under-confident tendency, diversity of nodewise predictive distributions, distance to training nodes, relative confidence level, and neighborhood similarity. Furthermore, based on the insights from this study, we design a novel calibration method named Graph Attention Temperature Scaling (GATS), which is tailored for calibrating graph neural networks. GATS incorporates designs that address all the identified influential factors and produces nodewise temperature scaling using an attention-based architecture. GATS is accuracy-preserving, data-efficient, and expressive at the same time. Our experiments empirically verify the effectiveness of GATS, demonstrating that it can consistently achieve state-of-the-art calibration results on various graph datasets for different GNN backbones.
Are Vision Transformers Robust to Patch Perturbations?
Recent advances in Vision Transformer (ViT) have demonstrated its impressive performance in image classification, which makes it a promising alternative to Convolutional Neural Network (CNN). Unlike CNNs, ViT represents an input image as a sequence of image patches. The patch-based input image representation makes the following question interesting: How does ViT perform when individual input image patches are perturbed with natural corruptions or adversarial perturbations, compared to CNNs? In this work, we study the robustness of ViT to patch-wise perturbations. Surprisingly, we find that ViTs are more robust to naturally corrupted patches than CNNs, whereas they are more vulnerable to adversarial patches. Furthermore, we discover that the attention mechanism greatly affects the robustness of vision transformers. Specifically, the attention module can help improve the robustness of ViT by effectively ignoring natural corrupted patches. However, when ViTs are attacked by an adversary, the attention mechanism can be easily fooled to focus more on the adversarially perturbed patches and cause a mistake. Based on our analysis, we propose a simple temperature-scaling based method to improve the robustness of ViT against adversarial patches. Extensive qualitative and quantitative experiments are performed to support our findings, understanding, and improvement of ViT robustness to patch-wise perturbations across a set of transformer-based architectures.
Blueprint for a Scalable Photonic Fault-Tolerant Quantum Computer
Photonics is the platform of choice to build a modular, easy-to-network quantum computer operating at room temperature. However, no concrete architecture has been presented so far that exploits both the advantages of qubits encoded into states of light and the modern tools for their generation. Here we propose such a design for a scalable and fault-tolerant photonic quantum computer informed by the latest developments in theory and technology. Central to our architecture is the generation and manipulation of three-dimensional hybrid resource states comprising both bosonic qubits and squeezed vacuum states. The proposal enables exploiting state-of-the-art procedures for the non-deterministic generation of bosonic qubits combined with the strengths of continuous-variable quantum computation, namely the implementation of Clifford gates using easy-to-generate squeezed states. Moreover, the architecture is based on two-dimensional integrated photonic chips used to produce a qubit cluster state in one temporal and two spatial dimensions. By reducing the experimental challenges as compared to existing architectures and by enabling room-temperature quantum computation, our design opens the door to scalable fabrication and operation, which may allow photonics to leap-frog other platforms on the path to a quantum computer with millions of qubits.
Neural Networks Fail to Learn Periodic Functions and How to Fix It
Previous literature offers limited clues on how to learn a periodic function using modern neural networks. We start with a study of the extrapolation properties of neural networks; we prove and demonstrate experimentally that the standard activations functions, such as ReLU, tanh, sigmoid, along with their variants, all fail to learn to extrapolate simple periodic functions. We hypothesize that this is due to their lack of a "periodic" inductive bias. As a fix of this problem, we propose a new activation, namely, x + sin^2(x), which achieves the desired periodic inductive bias to learn a periodic function while maintaining a favorable optimization property of the ReLU-based activations. Experimentally, we apply the proposed method to temperature and financial data prediction.
Planck 2018 results. V. CMB power spectra and likelihoods
This paper describes the 2018 Planck CMB likelihoods, following a hybrid approach similar to the 2015 one, with different approximations at low and high multipoles, and implementing several methodological and analysis refinements. With more realistic simulations, and better correction and modelling of systematics, we can now make full use of the High Frequency Instrument polarization data. The low-multipole 100x143 GHz EE cross-spectrum constrains the reionization optical-depth parameter tau to better than 15% (in combination with with the other low- and high-ell likelihoods). We also update the 2015 baseline low-ell joint TEB likelihood based on the Low Frequency Instrument data, which provides a weaker tau constraint. At high multipoles, a better model of the temperature-to-polarization leakage and corrections for the effective calibrations of the polarization channels (polarization efficiency or PE) allow us to fully use the polarization spectra, improving the constraints on the LambdaCDM parameters by 20 to 30% compared to TT-only constraints. Tests on the modelling of the polarization demonstrate good consistency, with some residual modelling uncertainties, the accuracy of the PE modelling being the main limitation. Using our various tests, simulations, and comparison between different high-ell implementations, we estimate the consistency of the results to be better than the 0.5sigma level. Minor curiosities already present before (differences between ell<800 and ell>800 parameters or the preference for more smoothing of the C_ell peaks) are shown to be driven by the TT power spectrum and are not significantly modified by the inclusion of polarization. Overall, the legacy Planck CMB likelihoods provide a robust tool for constraining the cosmological model and represent a reference for future CMB observations. (Abridged)
On the State Constrained Optimal Control of the Stefan Type Free Boundary Problems
We analyze the state constrained inverse Stefan type parabolic free boundary problem as an optimal control problem in the Sobolev-Besov spaces framework. Boundary heat flux, density of heat sources, and free boundary are components of the control vector. Cost functional is the sum of the L_2-norm declinations of the temperature measurement at the final moment, the phase transition temperature, the final position of the free boundary, and the penalty term, taking into account the state constraint on the temperature. We prove the existence of optimal control, Frechet differentiability, and optimality condition in the Besov spaces under minimal regularity assumptions on the data. We pursue space-time discretization through finite differences and prove that the sequence of discrete optimal control problems converges to the original problem both with respect to functional and control.
Frechet Differentiability in Besov Spaces in the Optimal Control of Parabolic Free Boundary Problems
We consider the inverse Stefan type free boundary problem, where information on the boundary heat flux and density of the sources are missing and must be found along with the temperature and the free boundary. We pursue optimal control framework where boundary heat flux, density of sources, and free boundary are components of the control vector. The optimality criteria consists of the minimization of the L_2-norm declinations of the temperature measurements at the final moment, phase transition temperature, and final position of the free boundary. We prove the Frechet differentiability in Besov spaces, and derive the formula for the Frechet differential under minimal regularity assumptions on the data. The result implies a necessary condition for optimal control and opens the way to the application of projective gradient methods in Besov spaces for the numerical solution of the inverse Stefan problem.