[SystemSafety] How much public Ada source code is there?
Derek M Jones
derek at knosof.co.uk
Tue Jun 4 17:48:35 CEST 2024
All,
Ada source code is present in version 2 of The Stack,
a public source code repo designed for training LLMs:
https://huggingface.co/datasets/bigcode/the-stack-v2
Technical details are here:
https://arxiv.org/abs/2402.19173
The amount of Ada source is:
language : "Ada"
num_files : 183,890
dedup_num_files : 92,104
train_num_files : 89,221
size_bytes : 2.03e+10
dedup_size_bytes : 8.25e+08
train_size_bytes: 6.14e+08
num_files is the number of unique source files.
dedup_num_files is the number of files left after further
deduplication, e.g., ignoring differences in whitespace and
blank lines.
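To put the figures above in proportion, here is a rough Python sketch
that works out what fraction of files and bytes survive deduplication,
and the mean file size at each stage. The dictionary just restates the
numbers quoted above; the variable and function names are my own, and
the byte counts are only given to three significant figures, so treat
the results as approximate.

```python
# Ada figures quoted above from The Stack v2 (byte counts are rounded
# to three significant figures in the source, so results are approximate).
ada_stats = {
    "num_files": 183_890,
    "dedup_num_files": 92_104,
    "train_num_files": 89_221,
    "size_bytes": 2.03e10,
    "dedup_size_bytes": 8.25e8,
    "train_size_bytes": 6.14e8,
}


def mean_file_size(total_bytes, n_files):
    """Mean size in bytes of a file, given a total size and a file count."""
    return total_bytes / n_files


# Fraction of files and of bytes remaining after the further deduplication.
dedup_file_fraction = ada_stats["dedup_num_files"] / ada_stats["num_files"]
dedup_byte_fraction = ada_stats["dedup_size_bytes"] / ada_stats["size_bytes"]

# Mean file size at each stage.
mean_raw = mean_file_size(ada_stats["size_bytes"], ada_stats["num_files"])
mean_dedup = mean_file_size(
    ada_stats["dedup_size_bytes"], ada_stats["dedup_num_files"]
)
mean_train = mean_file_size(
    ada_stats["train_size_bytes"], ada_stats["train_num_files"]
)

print(f"files kept after dedup: {dedup_file_fraction:.1%}")
print(f"bytes kept after dedup: {dedup_byte_fraction:.1%}")
print(f"mean file size (raw):   {mean_raw:,.0f} bytes")
print(f"mean file size (dedup): {mean_dedup:,.0f} bytes")
print(f"mean file size (train): {mean_train:,.0f} bytes")
```

On these numbers about half of the files survive deduplication but
only about 4% of the bytes, so the mean file size falls from roughly
110 kB to roughly 9 kB. That suggests the files removed were, on
average, much larger than the ones kept.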
Do these numbers look like they are representative
of the total amount of publicly available Ada source?
Is there some huge Ada repository someplace that looks
like it might not have been included?
--
Derek M. Jones Evidence-based software engineering
blog:https://shape-of-code.com