Hi team! I was interviewing yesterday and was advised to pick up a test task from the repository. I noticed the recent integration issues (like Google Drive and SharePoint) are already assigned or have pending PRs.
I would love to contribute a new integration for my test task. Maybe some knowledge connector like Airtable or Dropbox storage, or do you have something different in mind?
@user what's the best thing to do with connectors?
Matvey Kukuy (archestra team) —
Btw, is knowledge base under a feature flag? (side question) 😁
Innokentii Konstantinov (archestra team) —
Hi!
Yes, it's under a feature flag - `ARCHESTRA_ENTERPRISE_LICENSE_KNOWLEDGE_BASE_ACTIVATED`
Let me think of a connector
Matvey Kukuy (archestra team) —
Should we un-feature-flag it? 🙂
Innokentii Konstantinov (archestra team) —
> Should we un-feature-flag it? 🙂
I think it's a separate question to discuss, we had some reasons to do it.
Speaking of a connector - I actually have something in mind - "File Upload".
The idea is that a user should be able to create such a connector and upload files to it, so their content is searchable by the knowledge base.
I think we have some pieces in place already - for example, the SharePoint connector does some text recognition on files. WDYT @user. @user?
Sergey Frolov —
Sounds like a really nice and interesting feature!
Since users will be pushing files directly to the platform, I just have a few questions:
1. Should we restrict the file formats strictly to text-based types like .txt, .md, .pdf, and .docx?
2. Do you have a specific file size limitation you want me to enforce on the uploads?
Innokentii Konstantinov (archestra team) —
Hi, I created an issue for that - github.com/archestra-ai/archestra/issues/3781.
1. Text based formats are fine for first iteration
2. Put some meaningful file size restriction. I'm also happy to hear your ideas on how to store these uploaded files at our side 🙂
Sergey Frolov —
Thanks, Innokentii! I will check it
Sergey Frolov —
So we can use the existing `kb_documents` table to store the extracted file text for RAG, and create a new `kb_uploaded_files` table that stores `file_data` as raw bytes, so the user can `CRUD` uploaded files. The size limit will be 10 MB.
Also, if it's enough for the user to only see the uploaded file names and remove files, we can use just `kb_documents`.
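For reference, the format and size rules discussed above could be checked with something like the sketch below. It is only an illustration: the function name `validate_upload` and the constant names are hypothetical and not taken from the Archestra codebase, and it checks the file extension only, not the actual content.

```python
from pathlib import Path

# Assumed values for illustration: the thread mentions these four
# text-based formats and a 10 MB limit for the first iteration.
ALLOWED_EXTENSIONS = {".txt", ".md", ".pdf", ".docx"}
MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024


def validate_upload(filename: str, data: bytes) -> None:
    """Raise ValueError if the upload breaks the format or size rules."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if len(data) > MAX_FILE_SIZE_BYTES:
        raise ValueError(
            f"file is {len(data)} bytes, limit is {MAX_FILE_SIZE_BYTES}"
        )
```

Calling `validate_upload("notes.md", b"hello")` returns without error, while an `.exe` file or a payload over the limit raises `ValueError`.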
Innokentii Konstantinov (archestra team) —
@user Can you share your github handle, so I will assign you?
Sergey Frolov —
@bigcheeseh
Innokentii Konstantinov (archestra team) —
Can you please leave a comment under the issue, so I can assign you?
Sergey Frolov —
added
Innokentii Konstantinov (archestra team) —
Assigned!
To your questions:
1. I think a 10 MB limit is fine for a start
2. I feel that we need to store the file data too, since we may want to re-extract data from the files at some point, so we still have a source of truth.
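A rough sketch of the "source of truth" idea: keep the raw bytes in `kb_uploaded_files` and the extracted text in `kb_documents`, so the text can be rebuilt later with a different extractor. The table names come from this thread, but the column names, the SQLite backend, and the helper functions are assumptions made for illustration, not the real Archestra schema.

```python
import sqlite3
from typing import Callable

Extractor = Callable[[bytes], str]


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical columns; only the table names appear in the discussion.
    conn.executescript(
        """
        CREATE TABLE kb_uploaded_files (
            id INTEGER PRIMARY KEY,
            file_name TEXT NOT NULL,
            file_data BLOB NOT NULL
        );
        CREATE TABLE kb_documents (
            id INTEGER PRIMARY KEY,
            uploaded_file_id INTEGER NOT NULL UNIQUE
                REFERENCES kb_uploaded_files(id),
            content TEXT NOT NULL
        );
        """
    )


def store_upload(conn: sqlite3.Connection, file_name: str,
                 data: bytes, extract: Extractor) -> int:
    """Save the raw bytes and the text extracted from them."""
    cur = conn.execute(
        "INSERT INTO kb_uploaded_files (file_name, file_data) VALUES (?, ?)",
        (file_name, data),
    )
    file_id = cur.lastrowid
    conn.execute(
        "INSERT INTO kb_documents (uploaded_file_id, content) VALUES (?, ?)",
        (file_id, extract(data)),
    )
    return file_id


def reextract(conn: sqlite3.Connection, file_id: int,
              extract: Extractor) -> None:
    """Rebuild the searchable text from the stored raw bytes."""
    row = conn.execute(
        "SELECT file_data FROM kb_uploaded_files WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    conn.execute(
        "UPDATE kb_documents SET content = ? WHERE uploaded_file_id = ?",
        (extract(row[0]), file_id),
    )
```

Because the original bytes stay in `kb_uploaded_files`, an improved extractor can be run over existing uploads without asking users to upload their files again.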
Sergey Frolov —
Hi, Innokentii! I made a PR github.com/bigcheeseh/archestra/pull/1
Matvey Kukuy (archestra team) —
Hi @user do you mind making a PR to archestra repo, not to the fork?
Sergey Frolov —
Hi @user, sure, done! Didn't have permissions at first but found the button. PR is up.
Matvey Kukuy (archestra team) —
@user could you please share the link to the pr?
Sergey Frolov —
@user github.com/archestra-ai/archestra/pull/3924
Matvey Kukuy (archestra team) —
@user I see there is a comment from @user
Sergey Frolov —
Yes, I completely disagree with issues `3.` and `4.`, will try to explain