curl --request PUT \
  --url https://api.botpress.cloud/v1/files \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --header 'x-bot-id: <x-bot-id>' \
  --data '
{
  "key": "<string>",
  "size": 123,
  "tags": {},
  "index": false,
  "indexing": {
    "configuration": {
      "parsing": {
        "minimumParagraphLength": 1025,
        "smartCleanup": true
      },
      "chunking": {
        "maximumChunkLength": 2550,
        "embeddedContextLevels": 1,
        "embedBreadcrumb": true
      },
      "summarization": {
        "enable": false,
        "modelType": "balanced",
        "minimumInputLength": 5500,
        "outputTokenLimit": 5500,
        "generateMasterSummary": true
      },
      "stack": "legacy",
      "vision": {
        "transcribePages": "<unknown>",
        "indexPages": "<unknown>"
      }
    }
  },
  "accessPolicies": [
    "public_content"
  ],
  "contentType": "<string>",
  "expiresAt": "2023-11-07T05:31:56Z",
  "publicContentImmediatelyAccessible": true,
  "metadata": "<unknown>"
}
'

{
  "file": {
    "id": "<string>",
    "botId": "<string>",
    "key": "<string>",
    "url": "<string>",
    "size": 123,
    "contentType": "<string>",
    "tags": {},
    "metadata": {},
    "createdAt": "<string>",
    "updatedAt": "<string>",
    "accessPolicies": [
      "integrations"
    ],
    "index": true,
    "status": "upload_pending",
    "owner": {
      "type": "bot",
      "id": "<string>",
      "name": "<string>"
    },
    "uploadUrl": "<string>",
    "failedStatusReason": "<string>",
    "expiresAt": "<string>",
    "indexingStack": "v1"
  }
}

Creates or updates a file, using the key parameter as its unique identifier. Updating a file erases its existing content. To upload the file content, send it in a PUT request to the uploadUrl returned in the response.
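The two-step flow described above (register the file, then PUT its content to uploadUrl) could be sketched in Python roughly as follows. This is a minimal sketch using only the standard library, not an official client: the function names are hypothetical, and the Content-Type sent to uploadUrl is an assumption, since this page does not say which headers the upload URL expects.

```python
import json
import urllib.request

API_URL = "https://api.botpress.cloud/v1/files"


def build_upsert_request(token: str, bot_id: str, key: str, content: bytes) -> urllib.request.Request:
    """Step 1: describe the file (key and size in bytes) to the Files API."""
    body = json.dumps({"key": key, "size": len(content)}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "x-bot-id": bot_id,
        },
    )


def build_upload_request(upload_url: str, content: bytes,
                         content_type: str = "application/octet-stream") -> urllib.request.Request:
    """Step 2: PUT the raw file content to the uploadUrl from the response.

    Assumption: an explicit Content-Type is acceptable to the upload URL.
    """
    return urllib.request.Request(
        upload_url, data=content, method="PUT",
        headers={"Content-Type": content_type},
    )


def upsert_and_upload(token: str, bot_id: str, key: str, content: bytes) -> dict:
    """Run both steps. This performs real network calls."""
    with urllib.request.urlopen(build_upsert_request(token, bot_id, key, content)) as resp:
        file = json.load(resp)["file"]
    with urllib.request.urlopen(build_upload_request(file["uploadUrl"], content)) as resp:
        resp.read()
    return file
```

Keeping request construction separate from sending makes it possible to inspect the requests without touching the network.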
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Bot ID
Integration ID
Integration alias
Integration name
User ID
User role
Properties of the file to create or update.
Unique key for the file. Must be unique across the bot (and the integration, when applicable).
File size in bytes. This will count against your File Storage quota. If the index parameter is set to true, this will also count against your Vector DB Storage quota.
Set to a value of 'true' to index the file in vector storage. Only certain file formats are currently supported for indexing. Files larger than 95 MB cannot be indexed. Note that if a file is indexed, it will count towards both the Vector DB Storage quota and the File Storage quota of the workspace.
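As an illustration of the size and index properties, the sketch below derives size from the content and refuses to request indexing for oversized files. The helper name is hypothetical, and the byte value of the 95 MB limit is an assumption (this page does not say whether MB means 10^6 or 2^20 bytes), so the limit is left as a parameter.

```python
# Assumption: 1 MB = 1,000,000 bytes. The page does not define the unit.
INDEX_LIMIT_BYTES = 95 * 1000 * 1000


def file_properties(key: str, content: bytes, index: bool = False,
                    index_limit_bytes: int = INDEX_LIMIT_BYTES) -> dict:
    """Build the key/size/index part of the request body from the raw content."""
    size = len(content)  # File size in bytes, as the size property expects.
    if index and size > index_limit_bytes:
        raise ValueError(f"{key}: {size} bytes is too large to be indexed")
    return {"key": key, "size": size, "index": index}
```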
Configuration to use for indexing the file. It will be stored in the file's metadata for reference.
The minimum length a standalone paragraph should have. A paragraph shorter than this is merged with the paragraph that immediately follows it.
Required range: 50 <= x <= 2000
(Team/Enterprise plan only, charged as AI Spend) Enabling this uses a lightweight, inexpensive LLM to clean up the content extracted from PDF files before indexing, which improves the quality of the stored vectors. PDFs often store raw text in unusual ways, so extraction can produce formatting issues (e.g. broken sentences or paragraphs, unexpected headings, garbled characters) that hurt retrieval performance for certain user queries if left untouched.
The maximum length of a chunk in characters.
Required range: 100 <= x <= 5000
The number of surrounding context levels to include in the vector embedding of the chunk.
Required range: 0 <= x <= 3
Include the breadcrumb of the chunk in the vector embedding.
(Team/Enterprise plan only, charged as AI Spend) Create summaries for this file and index them as standalone vectors. Enabling this option incurs AI Spend cost (charged to the workspace of the bot) to generate the summaries. The cost depends on the amount of content in the file and the summarization model used.
The model type to use for summarization.
Available options: inexpensive, balanced, accurate
The minimum length a section of the file should have to create a summary of it.
Required range: 1000 <= x <= 10000
The maximum length of a summary (in tokens).
Required range: 1000 <= x <= 10000
Generate a summary of the entire file and index it as a standalone vector.
If not set, the default indexing stack will be used.
Available options: legacy, realtime-v1
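The ranges and allowed values listed above can be checked client-side before sending a request. The sketch below is one way to do that. It assumes each range belongs to the field at the same position in the sample request (for example, 50 to 2000 for minimumParagraphLength), and the validator itself is hypothetical, not part of the API.

```python
# Pairing of field names (taken from the sample request) with the documented ranges.
RANGES = {
    ("parsing", "minimumParagraphLength"): (50, 2000),
    ("chunking", "maximumChunkLength"): (100, 5000),
    ("chunking", "embeddedContextLevels"): (0, 3),
    ("summarization", "minimumInputLength"): (1000, 10000),
    ("summarization", "outputTokenLimit"): (1000, 10000),
}
MODEL_TYPES = {"inexpensive", "balanced", "accurate"}
STACKS = {"legacy", "realtime-v1"}


def validate_indexing_configuration(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    errors = []
    for (section, field), (low, high) in RANGES.items():
        value = config.get(section, {}).get(field)
        if value is not None and not low <= value <= high:
            errors.append(f"{section}.{field}={value} is outside {low}..{high}")
    model_type = config.get("summarization", {}).get("modelType")
    if model_type is not None and model_type not in MODEL_TYPES:
        errors.append(f"summarization.modelType={model_type!r} is not a known option")
    stack = config.get("stack")
    if stack is not None and stack not in STACKS:
        errors.append(f"stack={stack!r} is not a known option")
    return errors
```

Fields that are absent are skipped rather than reported, since this page does not mark any of them as required.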
(Team/Enterprise plan only, charged as AI Spend) For PDF files, set this option to true or pass an array with specific page numbers to use a vision-enabled LLM to transcribe each page of the PDF as standalone vectors and index them.
This feature is useful when a PDF contains custom designs or layouts, or many infographics, that need visual processing to be indexed effectively. In those cases the default text-based indexing may not be enough for your bot to correctly understand the content of your PDFs.
(Team/Enterprise plan only, charged as AI Spend) For PDF files, set this option to true or pass an array with specific page numbers to use a vision-enabled LLM to index each page of the PDF as a standalone image.
Enabling this feature allows Autonomous Nodes in your bot to answer visual or higher-level questions about the content of these pages that usually cannot be answered correctly using the default text-based indexing or visual transcription.
File access policies. Add "public_content" to allow public access to the file content. Add "integrations" to allow read, search and list operations for any integration installed in the bot.
Available options: public_content, integrations
File content type. If omitted, the content type will be inferred from the file extension (if any) specified in key. If a content type cannot be inferred, the default is "application/octet-stream".
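The fallback behaviour described for the content type can be approximated on the client with Python's mimetypes module. This is only an illustration: the server's own extension table is not documented on this page and may differ from Python's, and resolve_content_type is a hypothetical helper.

```python
import mimetypes
from typing import Optional

DEFAULT_CONTENT_TYPE = "application/octet-stream"


def resolve_content_type(key: str, content_type: Optional[str] = None) -> str:
    """Prefer an explicit content type, then the key's extension, then the default."""
    if content_type:
        return content_type
    guessed, _encoding = mimetypes.guess_type(key)
    return guessed or DEFAULT_CONTENT_TYPE
```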
Expiry timestamp in ISO 8601 format with UTC timezone. After expiry, the file will be deleted. The timestamp must be in the future and cannot be more than 90 days from now. Only the value up to the minute is considered; seconds and milliseconds are ignored.
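A value that satisfies these constraints could be produced as in the sketch below. The helper name and its lifetime parameter are hypothetical. It truncates to the minute up front so that the value sent matches what the API is described as keeping.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_LIFETIME = timedelta(days=90)


def make_expires_at(lifetime: timedelta, now: Optional[datetime] = None) -> str:
    """Return an expiresAt string `lifetime` after `now`.

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    if lifetime > MAX_LIFETIME:
        raise ValueError("expiresAt cannot be more than 90 days from now")
    # Seconds and smaller units are ignored by the API, so drop them here.
    expiry = (now + lifetime).astimezone(timezone.utc).replace(second=0, microsecond=0)
    if expiry <= now:
        raise ValueError("expiresAt must still be in the future after truncation to the minute")
    return expiry.strftime("%Y-%m-%dT%H:%M:%SZ")
```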
Use when your file has "public_content" in its access policy and you need the file's content to be immediately accessible through its URL after the file has been uploaded without having to wait for the upload to be processed by our system.
If set to true, the x-amz-tagging HTTP header with a value of public=true will need to be sent in the HTTP PUT request to the uploadUrl in order for the upload request to work.
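The header requirement above can be captured in a small helper that builds the extra headers for the PUT request to uploadUrl. The function name is hypothetical; the header name and value are the ones stated in the description.

```python
def upload_headers(public_content_immediately_accessible: bool) -> dict:
    """Extra headers for the PUT request that uploads content to uploadUrl."""
    headers = {}
    if public_content_immediately_accessible:
        # Required when publicContentImmediatelyAccessible was set to true.
        headers["x-amz-tagging"] = "public=true"
    return headers
```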
Custom metadata for the file expressed as an object of key-value pairs. The values can be of any type.
The created or updated file
File ID
The ID of the bot the file belongs to
Unique key for the file. Must be unique across the bot (and the integration, when applicable).
URL to retrieve the file content. This URL will be ready to use once the file is uploaded.
If the file has a public_content policy, this contains the permanent public URL to retrieve the file. Otherwise, it contains a temporary pre-signed URL to download the file. Use a pre-signed URL shortly after retrieving it and do not store it long-term, because it expires after a short time.
File size in bytes. Non-null once the file upload has completed.
MIME type of the file's content
File creation timestamp in ISO 8601 format
File last update timestamp in ISO 8601 format
Access policies configured for the file.
Available options: integrations, public_content
Whether the file was requested to be indexed for search or not.
Status of the file. If the status is upload_pending, the file content has not been uploaded yet. The status will be set to upload_completed once the file content has been uploaded successfully.
If the upload failed for any reason (e.g. exceeding the storage quota or the maximum file size limit) the status will be set to upload_failed and the reason for the failure will be available in the failedStatusReason field of the file.
However, if the file has been uploaded and the index attribute was set to true on the file, the status will immediately transition to the indexing_pending status (the upload_completed status step will be skipped).
Once the indexing is completed and the file is ready to be used for searching its status will be set to indexing_completed. If the indexing failed the status will be set to indexing_failed and the reason for the failure will be available in the failedStatusReason field.
Available options: upload_pending, upload_failed, upload_completed, indexing_pending, indexing_failed, indexing_completed
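When polling a file, the status lifecycle above can be reduced to three outcomes. The sketch below is one possible reading of it, assuming a file counts as ready at upload_completed when indexing was not requested and at indexing_completed when it was. The file_state helper and its return values are hypothetical.

```python
STATUSES = {
    "upload_pending", "upload_failed", "upload_completed",
    "indexing_pending", "indexing_failed", "indexing_completed",
}
FAILED = {"upload_failed", "indexing_failed"}


def file_state(status: str, index: bool) -> str:
    """Classify a file status as 'pending', 'ready', or 'failed'."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if status in FAILED:
        return "failed"  # The reason is in the failedStatusReason field.
    # Indexed files skip upload_completed and finish at indexing_completed.
    ready_status = "indexing_completed" if index else "upload_completed"
    return "ready" if status == ready_status else "pending"
```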
Available options: bot, integration, user
This field is present if type is "user" or "bot". If type is "user", this is the user ID. If type is "bot", this is the bot ID.
This field is present if type is "integration", in which case it is the integration name.
URL to upload the file content. File content needs to be sent to this URL via a PUT request.
If the file status is upload_failed or indexing_failed, this contains the reason for the failure.
File expiry timestamp in ISO 8601 format
Indicates the indexing stack used to index this file. Present only when file has been successfully indexed. A value of "v2" denotes the latest stack, "v1" denotes the legacy stack.
Available options: v1, v2