[ELF] Fix crash on invalid undefined local symbols

r320770 made LLD handle invalid DSOs where local symbols were found in
the global part of the symbol table. Unfortunately, it didn't handle the
case where those local symbols were also undefined, and r326242 exposed
an assertion failure in that case. Just warn on that case instead of
crashing, by moving the local binding check before the undefined symbol
addition.

The input file for the test is crafted by hand, since I don't know of
any tool that would produce such a broken DSO. I also don't understand
what it even means for a symbol to be undefined but have STB_LOCAL
binding - I don't think that combination makes any sense - but we have
found broken DSOs of this nature that we were linking against. I've
included detailed instructions on how to produce the DSO in the test.

Differential Revision: https://reviews.llvm.org/D52815

git-svn-id: https://llvm.org/svn/llvm-project/lld/trunk@343745 91177308-0d34-0410-b5e6-96231b3b80d8
diff --git a/ELF/InputFiles.cpp b/ELF/InputFiles.cpp
index c535a27..8fe8a1e 100644
--- a/ELF/InputFiles.cpp
+++ b/ELF/InputFiles.cpp
@@ -999,7 +999,17 @@
   for (size_t I = 0; I < Syms.size(); ++I) {
     const Elf_Sym &Sym = Syms[I];
 
+    // ELF spec requires that all local symbols precede weak or global
+    // symbols in each symbol table, and the index of first non-local symbol
+    // is stored to sh_info. If a local symbol appears after some non-local
+    // symbol, that's a violation of the spec.
     StringRef Name = CHECK(Sym.getName(this->StringTable), this);
+    if (Sym.getBinding() == STB_LOCAL) {
+      warn("found local symbol '" + Name +
+           "' in global part of symbol table in file " + toString(this));
+      continue;
+    }
+
     if (Sym.isUndefined()) {
       Symbol *S = Symtab->addUndefined<ELFT>(Name, Sym.getBinding(),
                                              Sym.st_other, Sym.getType(),
@@ -1008,16 +1018,6 @@
       continue;
     }
 
-    // ELF spec requires that all local symbols precede weak or global
-    // symbols in each symbol table, and the index of first non-local symbol
-    // is stored to sh_info. If a local symbol appears after some non-local
-    // symbol, that's a violation of the spec.
-    if (Sym.getBinding() == STB_LOCAL) {
-      warn("found local symbol '" + Name +
-           "' in global part of symbol table in file " + toString(this));
-      continue;
-    }
-
     // MIPS BFD linker puts _gp_disp symbol into DSO files and incorrectly
     // assigns VER_NDX_LOCAL to this section global symbol. Here is a
     // workaround for this bug.
diff --git a/test/ELF/invalid/Inputs/undefined-local-symbol-in-dso.so b/test/ELF/invalid/Inputs/undefined-local-symbol-in-dso.so
new file mode 100755
index 0000000..b884273
--- /dev/null
+++ b/test/ELF/invalid/Inputs/undefined-local-symbol-in-dso.so
Binary files differ
diff --git a/test/ELF/invalid/undefined-local-symbol-in-dso.test b/test/ELF/invalid/undefined-local-symbol-in-dso.test
new file mode 100644
index 0000000..8cb8869
--- /dev/null
+++ b/test/ELF/invalid/undefined-local-symbol-in-dso.test
@@ -0,0 +1,66 @@
+# REQUIRES: x86
+
+# LLD used to crash when linking against a DSO with an undefined STB_LOCAL
+# symbol in the global part of the dynamic symbol table (i.e. an STB_LOCAL
+# symbol with an index >= the sh_info of the dynamic symbol table section). Such
+# a DSO is very broken, because local symbols should precede all global symbols
+# in the symbol table, and because having a symbol that's both undefined and
+# STB_LOCAL is a nonsensical combination. Nevertheless, we should warn on such
+# input files instead of crashing.
+
+# We've found actual broken DSOs of this sort in the wild, but for this test, we
+# created a reduced broken input file. There are no tools capable of producing a
+# broken DSO of this nature, so instead we created a valid DSO with an undefiend
+# global symbol in the dynamic symbol table and then manually edited the binary
+# to make that symbol local. The valid DSO was created as follows:
+
+```
+% cat undef.s
+.hidden bar
+bar:
+	movq	foo@GOT, %rax
+
+% llvm-mc -triple=x86_64-linux-gnu -filetype=obj -o undef.o undef.s
+% ld.lld --no-rosegment -shared -o undefined-local-symbol-in-dso.so undef.o
+% strip undef.so
+```
+
+# (--no-rosegment and stripping are unnecessary; they just produce a smaller
+# binary)
+
+# This DSO should only have a single dynamic symbol table entry for foo, and
+# then we can use a small C program to modify that symbol table entry to be
+# STB_LOCAL instead of STB_GLOBAL.
+
+```
+#include <elf.h>
+#include <stdio.h>
+
+int main(int argc, char *argv[]) {
+  FILE *F = fopen(argv[1], "rb+");
+
+  Elf64_Ehdr Ehdr;
+  fread(&Ehdr, sizeof(Ehdr), 1, F);
+  fseek(F, Ehdr.e_shoff, SEEK_SET);
+
+  Elf64_Shdr Shdr;
+  do {
+    fread(&Shdr, sizeof(Shdr), 1, F);
+  } while (Shdr.sh_type != SHT_DYNSYM);
+
+  Elf64_Sym Sym;
+  fseek(F, Shdr.sh_offset + sizeof(Elf64_Sym), SEEK_SET);
+  fread(&Sym, sizeof(Sym), 1, F);
+  Sym.st_info = STB_LOCAL << 4 | ELF64_ST_TYPE(Sym.st_info);
+  fseek(F, Shdr.sh_offset + sizeof(Elf64_Sym), SEEK_SET);
+  fwrite(&Sym, sizeof(Sym), 1, F);
+  fclose(F);
+}
+```
+
+# (the C program just takes its input DSO and modifies the binding of the first
+# dynamic symbol table entry to be STB_LOCAL instead of STB_GLOBAL)
+
+# RUN: ld.lld %p/Inputs/undefined-local-symbol-in-dso.so -o %t 2>&1 | \
+# RUN:   FileCheck -check-prefix=WARN %s
+# WARN: found local symbol 'foo' in global part of symbol table in file {{.*}}undefined-local-symbol-in-dso.so