Please use this identifier to cite or link to this resource: https://doi.org/10.48548/pubdata-1469
Resource type: Journal article
Title: Automated invoice processing: Machine learning-based information extraction for long tail suppliers
DOI: 10.48548/pubdata-1469
Handle: 20.500.14123/1539
Authors: Krieger, Felix (0000-0002-6360-8115)
Drews, Paul (0000-0002-9845-5024)
Funk, Burkhardt (0000-0001-5855-2666)
Abstract: Automation of incoming invoice processing promises to yield vast efficiency improvements in accounting. Until a universal adoption of fully electronic invoice exchange formats has been achieved, machine learning can help bridge the adoption gaps in electronic invoicing by extracting structured information from unstructured invoice formats. Machine learning especially helps the processing of invoices of suppliers who only send invoices infrequently, as the models are able to capture the semantic and visual cues of invoices and generalize them to previously unknown invoice layouts. Since the population of invoices in many companies is skewed toward a few frequent suppliers and their layouts, this research examines the effects of training data taken from such populations on the predictive quality of different machine-learning approaches for the extraction of information from invoices. Comparing the different approaches, we find that they are affected to varying degrees by skewed layout populations: The accuracy gap between in-sample and out-of-sample layouts is much higher in the Chargrid and random forest models than in the LayoutLM transformer model, which also exhibits the best overall predictive quality. To arrive at this finding, we designed and implemented a research pipeline that pays special attention to the distribution of layouts in the splitting of data and the evaluation of the models.
Language: English
Keywords: Layout-rich Documents; Document Analysis; Natural Language Processing
Year of publication in PubData: 2024
Type of publication: Secondary publication
Publication version: Published version
Date of first publication: 2023-10-12
Context of creation: Research
Notes: This publication was funded by the German Research Foundation (DFG).
Published by: Medien- und Informationszentrum, Leuphana Universität Lüneburg
Files in this resource:
File: Krieger_Automated_invoice_processing_Machine_learning-based_information_extraction_for_long_tail_suppliers.pdf
MD5: cf62f97f569d0cadac812955848b1cce
License: open-access
Size: 1.99 MB
Format: Adobe PDF

All resources in this repository are protected by copyright unless otherwise indicated.

Access statistics

Page view(s): 10

Download(s): 30