Pramata’s Cleanse product enables you to easily and securely upload all your contractual documents into the Pramata platform. Our technology and team quickly get to work, cleansing and converting your documents into a digital format, identifying and removing any unnecessary and invalid files. This critical step ensures your centralized Repository comprises a clean set of files.
This page details the logic Pramata Cleanse uses to remove duplicates and extract metadata to provide inventory visibility, all without manual effort. With Cleanse, you gain a comprehensive overview of your files in days.
Documents Out-of-Scope
All files transferred to or received by Pramata from any intake source go through Pramata Cleanse, where they are either queued for further processing (Documents In-Scope) or automatically removed from further processing (Documents Out-of-Scope). Some files may also be flagged with issues that the customer needs to resolve (Documents with Issues).
The following documents are automatically moved into the Documents Out-of-Scope category upon intake:
- Duplicate documents - Pramata's proprietary algorithms check each incoming document against the others so that duplicates are identified immediately upon intake.
- Documents with the following extensions - .txt, .text, .xls, .xlsx, .csv, .ppt, .png, .crdownload, .rtf, .xlsm, .xlsb, .eml, .lnk, .url, .load, .dotx, .json, .blob, and .exe.
- Documents with variations of the following keyword(s) in the original file name:
- Template
- Redline
- Edit
- Draft
- Not Signed
- Unexecuted
- Invoice
- Unsigned
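The three intake rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Pramata's proprietary algorithm: duplicates are detected here by an exact SHA-256 content hash, which only catches byte-identical copies, and "variations" of a keyword are assumed to mean case-insensitive word-prefix matches, so that "Drafts" or "not_signed" match while "Credit" does not match "Edit". The function names and status strings are hypothetical.

```python
import hashlib
import re
from pathlib import PurePath

# Extensions listed above as automatically Out-of-Scope.
OUT_OF_SCOPE_EXTENSIONS = {
    ".txt", ".text", ".xls", ".xlsx", ".csv", ".ppt", ".png", ".crdownload",
    ".rtf", ".xlsm", ".xlsb", ".eml", ".lnk", ".url", ".load", ".dotx",
    ".json", ".blob", ".exe",
}

# Keywords listed above; matching rules for "variations" are an assumption.
EXCLUDED_KEYWORDS = ["template", "redline", "edit", "draft", "not signed",
                     "unexecuted", "invoice", "unsigned"]


def _tokens(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_excluded_keyword(filename):
    """True if a run of file-name words starts with each keyword word in turn."""
    name_tokens = _tokens(PurePath(filename).stem)
    for keyword in EXCLUDED_KEYWORDS:
        kw = _tokens(keyword)
        n = len(kw)
        for i in range(len(name_tokens) - n + 1):
            window = name_tokens[i:i + n]
            # Prefix match per word: "drafts" matches "draft", "credit" does not match "edit".
            if all(t.startswith(k) for t, k in zip(window, kw)):
                return True
    return False


def classify_intake(filename, content, seen_hashes):
    """Hypothetical intake check: duplicate, then extension, then file-name keyword."""
    digest = hashlib.sha256(content).hexdigest()
    if digest in seen_hashes:
        return "Out-of-Scope: duplicate"
    seen_hashes.add(digest)
    if PurePath(filename).suffix.lower() in OUT_OF_SCOPE_EXTENSIONS:
        return "Out-of-Scope: extension"
    if has_excluded_keyword(filename):
        return "Out-of-Scope: keyword"
    return "In-Scope"
```

Tokenizing the file name before matching is the design choice that matters here: a plain substring test would also flag names such as "Credit Agreement.pdf" because they contain "edit".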
Documents with Issues
The following documents are automatically tagged as Documents with Issues upon intake:
- 0KB documents - The system identifies files with a size of 0 KB and marks their status as “0KB”.
- Corrupt documents - The system identifies PDFs that are corrupt (that is, they cannot be opened) and marks their status as “Corrupt File”.
- Protected/Encrypted documents - The system identifies PDFs that are encrypted or read-protected with a password and attempts to decrypt them. Files that fail decryption, usually because they require a password to open, are marked as “Protected Document”.
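A minimal, standard-library-only sketch of these issue checks follows. The corruption and encryption tests are rough byte-level heuristics chosen for illustration (a missing `%PDF-` header or `%%EOF` marker, and the presence of an `/Encrypt` entry); they are assumptions, not Pramata's actual checks. The sketch also omits the decryption attempt described above, which would need a PDF library and would clear files that open without a user password, so here every encrypted file is flagged. The function name `detect_issue` is hypothetical.

```python
from typing import Optional


def detect_issue(data: bytes) -> Optional[str]:
    """Return an issue status for raw PDF bytes, or None if no issue is detected."""
    # 0KB documents: nothing to process.
    if len(data) == 0:
        return "0KB"
    # Heuristic corruption check (assumption): a readable PDF starts with a
    # %PDF- header and has an %%EOF marker near the end of the file.
    if not data.startswith(b"%PDF-") or b"%%EOF" not in data[-1024:]:
        return "Corrupt File"
    # Heuristic encryption check (assumption): encrypted PDFs reference an
    # /Encrypt dictionary. No decryption is attempted in this sketch.
    if b"/Encrypt" in data:
        return "Protected Document"
    return None
```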
File Conversion to PDF
Documents then move to the next step, where non-PDF files are converted to PDF.
Documents with the following extensions are automatically converted to PDF for further processing: .tif, .tiff, .html, .htm, .jpeg, .jpg, .doc, .docx, .msg.
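The routing decision in this step amounts to a lookup on the file extension, sketched below. The function `conversion_step` is hypothetical, and the handling of extensions that appear in neither documented list is not stated above, so the sketch simply reports them as not covered.

```python
from pathlib import PurePath

# Extensions listed above as automatically converted to PDF.
CONVERTIBLE_TO_PDF = {".tif", ".tiff", ".html", ".htm", ".jpeg", ".jpg",
                      ".doc", ".docx", ".msg"}


def conversion_step(filename: str) -> str:
    """Hypothetical router: decide what the conversion step does with a file."""
    ext = PurePath(filename).suffix.lower()  # case-insensitive, e.g. ".TIFF" -> ".tiff"
    if ext == ".pdf":
        return "already PDF"
    if ext in CONVERTIBLE_TO_PDF:
        return "convert to PDF"
    return "not covered by the documented list"
```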
Documents Sent to OCR
Once documents are in PDF format, each one is processed with optical character recognition (OCR), which automatically converts contract images into machine-readable text. This step ensures users can use Pramata’s Search features to find any clause or text inside their contracts.
Scoping Data Extraction
Once OCR finishes (usually within 24 hours), a defined set of metadata is extracted from each file. The concepts extracted, along with the currently calibrated extraction guidelines for each, are listed below:
Document Title
This key term identifies the name of the contract, that is, its header or title. The title lets users search for a document by phrases that typically appear in titles, and it often indicates which standard template the document is based on. Pramata follows these general extraction guidelines:
- Where the document states a title, it is digitized verbatim, without any normalization.
- Pramata normalizes titles only where none is clearly called out, as in the following instances:
- For documents in letter format, the title is “Letter Agreement”.
- For documents that are emails, the title is “Email”.
- If a document has no title, Pramata records the title as “Untitled Document”.
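The title guidelines above form a simple fallback chain, sketched below. The `is_letter` and `is_email` flags are hypothetical inputs assumed to come from an earlier classification step; the guidelines do not say which rule wins if both apply, so the letter check running first is an assumption.

```python
from typing import Optional


def normalize_title(extracted_title: Optional[str], is_letter: bool, is_email: bool) -> str:
    """Hypothetical helper applying the Document Title guidelines in order."""
    # A clearly stated title is kept verbatim, with no normalization.
    if extracted_title and extracted_title.strip():
        return extracted_title
    # Otherwise fall back to the normalized titles (letter checked first by assumption).
    if is_letter:
        return "Letter Agreement"
    if is_email:
        return "Email"
    return "Untitled Document"
```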
Document Type
Document Types group contracts of the same kind into fixed categories. Bucketing documents this way lets users search faster, without having to remember individual document titles. It also supports analysis, such as how many documents of a particular type have been sent or which terms a particular document type contains. Pramata broadly categorizes documents into the following types:
- Master Agreement
- Online Master
- Group Purchasing Agreement
- NDA
- Order
- Amendment
- Addendum
- Statement of Work
- Miscellaneous
- Amendment and Order
- Pricing Schedule
- Quotes, Purchase Orders (Depending on the relation type of the Customer)
When a single file contains contracts of more than one type, Pramata splits it into separate documents by type before digitization begins. For example, if a customer sends a file whose first 10 pages are a Master Agreement and whose next 4 pages are an Amendment, Pramata splits it into two documents: pages 1 to 10 are reviewed as a Master Agreement and pages 11 to 14 as an Amendment. Pramata also splits files that contain more than one document when each document has its own term.
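Once the boundaries between documents in a merged file are known, turning them into separate page ranges is mechanical, as the sketch below shows. Identifying the boundaries (where each contract starts and what type it is) is assumed to happen upstream during review; `split_file` and its input format are hypothetical.

```python
from typing import List, Tuple


def split_file(total_pages: int, boundaries: List[Tuple[int, str]]) -> List[Tuple[str, int, int]]:
    """Turn (first_page, document_type) boundaries into inclusive, 1-based page ranges."""
    ordered = sorted(boundaries)
    if not ordered or ordered[0][0] != 1:
        raise ValueError("the first document must start on page 1")
    parts = []
    for i, (start, doc_type) in enumerate(ordered):
        # Each document ends just before the next one starts; the last runs to the end.
        end = ordered[i + 1][0] - 1 if i + 1 < len(ordered) else total_pages
        if end < start:
            raise ValueError(f"empty page range starting at page {start}")
        parts.append((doc_type, start, end))
    return parts
```

With the example above, `split_file(14, [(1, "Master Agreement"), (11, "Amendment")])` yields one 10-page Master Agreement and one 4-page Amendment.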
Account / Company Group Name
Accounts represent entities with which you maintain a business relationship, whether you buy from them or sell to them. Pramata’s algorithms normalize the various entities named in contracts and group them under one main Account/Company Group. For example, contracts may name the signing entity as “IBM Inc”, “IBM Incorporated”, or “International Business Machines Inc”. In this case, Pramata consolidates all these contracts into one Company Group virtual folder rather than scattering them across folders named after each contracting entity, which makes them easier to search, keeps them organized, and establishes a correct contract hierarchy. Each Account contains all the contractual documents for a related set of entities. Storing documents in their respective Accounts plays a critical role in letting users search for contracts and in tying documents back to a Customer’s CRM accounts.
Pramata can also establish relationships, such as affiliations and acquisitions, between Company Groups/Accounts.
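The sketch below illustrates consolidating entity names into a Company Group using a swapped-in, deliberately simple technique: lowercase the name, drop punctuation and trailing corporate suffixes, then look the result up in an alias table. Pramata's own normalization algorithms are not described here; the suffix list, the `ALIASES` table, and the function names are hypothetical.

```python
import re

# Hypothetical alias table; real mappings would come from curated reference data.
ALIASES = {"international business machines": "IBM", "ibm": "IBM"}

# Hypothetical list of legal-form suffixes to ignore when grouping.
CORPORATE_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc",
                      "ltd", "limited", "co", "company"}


def company_group(entity_name: str) -> str:
    """Map a verbatim legal entity name to a normalized Company Group name."""
    tokens = re.findall(r"[a-z0-9]+", entity_name.lower())  # drops punctuation
    while tokens and tokens[-1] in CORPORATE_SUFFIXES:
        tokens.pop()  # "IBM Inc" -> "ibm"
    key = " ".join(tokens)
    return ALIASES.get(key, key.title())


def group_contracts(contracts):
    """Group (contract_id, entity_name) pairs into Company Group folders."""
    groups = {}
    for contract_id, entity in contracts:
        groups.setdefault(company_group(entity), []).append(contract_id)
    return groups
```

String cleanup alone cannot tell that "International Business Machines" and "IBM" are the same company, which is why the sketch needs an alias table on top of it.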
Effective Date
The Effective Date is the date on which the contract takes effect, that is, when the parties begin to fulfill their obligations under it or, in some cases, when provisioning of services begins. This information lets users search for contracts by date range; for example, a user can narrow the entire set to only the contracts effective in 2023. It is also a critical input for calculating end dates and renewal dates. Pramata follows these general extraction guidelines:
- Effective dates are usually defined in a contract’s recitals, which are Pramata’s first source for extraction.
- In the absence of a defined Effective Date, the 'Entered on Date', 'Made on Date', 'Created Date', or 'Execution Date' is used as the Effective Date.
- If none of the above dates are present, the latest signature date is extracted as the Effective Date.
- If none of the above determines the Effective Date, its value is set to “Date Unknown”.
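The Effective Date guidelines form an ordered fallback chain, sketched below. The inputs are assumed to be dates already extracted from the recitals and signature blocks by an upstream step. The guidelines do not rank the alternative date labels against each other, so trying them in the order listed is an assumption; `effective_date` is a hypothetical name.

```python
from datetime import date
from typing import Dict, List, Optional, Union

# Alternative labels from the guidelines; this priority order is an assumption.
FALLBACK_LABELS = ["Entered on Date", "Made on Date", "Created Date", "Execution Date"]


def effective_date(defined: Optional[date],
                   labeled_dates: Dict[str, date],
                   signature_dates: List[date]) -> Union[date, str]:
    """Apply the Effective Date fallback chain to pre-extracted dates."""
    # 1. An Effective Date defined in the recitals wins.
    if defined is not None:
        return defined
    # 2. Otherwise use an alternative labeled date.
    for label in FALLBACK_LABELS:
        if label in labeled_dates:
            return labeled_dates[label]
    # 3. Otherwise use the latest signature date.
    if signature_dates:
        return max(signature_dates)
    # 4. Nothing usable was found.
    return "Date Unknown"
```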
Doc ID
A unique identifier assigned to the contract, usually system-generated and alphanumeric.
- This is usually located in the top corner of the contract or in the document title, recitals, and/or introductory paragraph.
- Pramata extracts this verbatim; it is not subject to interpretation.
Entities
The parties to a contract are critical to determining who has entered into it. Pramata captures the legal entities that are parties to each contract. This data helps when someone needs to find contracts with a specific legal entity (for example, all contracts from an acquired entity). Entities differ from Account / Company Group in that Entities capture the verbatim name of the legal entity (e.g., International Business Machines, Inc.), whereas Account / Company Group normalizes legal entities under a more common name (e.g., IBM); the two are closely linked. Users search by legal entity to find the specific entity that signed a contract, at the most granular level of the hierarchy. Entity data is also used in mergers and acquisitions to identify which acquired entity transacted, and for group purchasing organizations (GPOs) to identify the participating members. Entities in Pramata are broadly classified as:
- Party - The legal entity of the Customer (that is, the Pramata customer).
- Other Party - The legal entity on the other side of the contract (the counterparty to the Pramata customer).
- Third Party - A legal entity that neither sells nor buys any product or service but is connected to the transaction (such as a partner, subcontractor, or affiliate).
Signature Status
This term indicates whether the contract has been signed. Signatures appear in a signature block, either on the first page or at the end of the contract. Pramata’s AI and human-assisted AI detect both handwritten and electronic signatures. Pramata indicates whether a contract is:
- Fully signed - Both parties to the contract have signed it.
- Partially Signed - One party has signed the contract but the other has not.
- Unsigned - Neither party has signed the contract.
- No signature block - The contract (such as a Letter Agreement or an Email) does not require a signature block.
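Deriving the status from extracted signature data can be sketched as a small decision function. It assumes a two-party contract and that an upstream detection step (hypothetical here) has already reported whether a signature block exists and whether each party signed; the enum values mirror the labels above, and the names are hypothetical.

```python
from enum import Enum


class SignatureStatus(Enum):
    FULLY_SIGNED = "Fully signed"
    PARTIALLY_SIGNED = "Partially Signed"
    UNSIGNED = "Unsigned"
    NO_SIGNATURE_BLOCK = "No signature block"


def signature_status(has_signature_block: bool,
                     party_signed: bool,
                     other_party_signed: bool) -> SignatureStatus:
    """Classify a two-party contract from pre-detected signature flags."""
    if not has_signature_block:
        return SignatureStatus.NO_SIGNATURE_BLOCK
    if party_signed and other_party_signed:
        return SignatureStatus.FULLY_SIGNED
    if party_signed or other_party_signed:
        return SignatureStatus.PARTIALLY_SIGNED  # exactly one side signed
    return SignatureStatus.UNSIGNED
```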
Extracting this set of metadata from each file enables any further scoping needed to remove unnecessary files from the Repository and to select files for inclusion in Pramata Organize.
Self-Service Scoping
Customer Admins can move documents to the Out-of-Scope category based on business needs. This should be done after all documents have gone through Pramata Cleanse and before they move into Pramata Organize.
Steps to Move Documents Out of Scope
- In the Pramata Platform, navigate to the Admin Console and then select Document Management.
- On the pie chart, click on Documents In-Scope.
- On the pie chart, click on Not Started.
- Select the documents you want to move from In-Scope to Out-of-Scope.
Tip: Add columns to your view so you can use each file's metadata to help with scoping.
- Click the Select Action drop-down list and select Move documents to Out-Of-Scope.
- In the Add Comment box, leave a comment explaining why you moved the documents out of scope, for future reference.