Thread

Sergey Frolov 1:31 PM
Hi team! I was interviewing yesterday and was advised to pick up a test task from the repository. I noticed the recent integration issues (like Google Drive and SharePoint) are already assigned or have pending PRs.
I would love to contribute a new integration for my test task. Maybe some knowledge connector like Airtable or Dropbox storage, or do you have something different in mind?

22 replies
Matvey Kukuy (archestra team) 2:53 PM
Hi @user!
Matvey Kukuy (archestra team) 2:53 PM
@user what's the best thing to do with connectors?
Matvey Kukuy (archestra team) 2:53 PM
Btw, is knowledge base under a feature flag? (side question) 😁
Innokentii Konstantinov (archestra team) 2:55 PM
Hi!
Yes, it's under a feature flag - ARCHESTRA_ENTERPRISE_LICENSE_KNOWLEDGE_BASE_ACTIVATED
Let me think of a connector
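
For readers unfamiliar with environment-variable feature flags, here is a minimal sketch of how a flag like the one named above might be read. Only the variable name comes from the thread; the helper `isKnowledgeBaseEnabled` and the accepted values ("true" or "1") are assumptions for illustration, not the project's actual parsing rules.

```typescript
// Flag name taken from the thread; everything else here is hypothetical.
const KB_FLAG = "ARCHESTRA_ENTERPRISE_LICENSE_KNOWLEDGE_BASE_ACTIVATED";

// A plain map of environment variables, so the helper is easy to test.
type Env = Record<string, string | undefined>;

// Assumption: "true" and "1" (trimmed, case-insensitive) enable the feature;
// an unset variable or any other value leaves it disabled.
function isKnowledgeBaseEnabled(env: Env): boolean {
  const raw = env[KB_FLAG];
  if (raw === undefined) {
    return false;
  }
  const value = raw.trim().toLowerCase();
  return value === "true" || value === "1";
}
```

In a Node.js service this would typically be called as `isKnowledgeBaseEnabled(process.env)`.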
Matvey Kukuy (archestra team) 2:55 PM
Should we un-feature-flag it? 🙂
Innokentii Konstantinov (archestra team) 2:59 PM
Should we un-feature-flag it? 🙂
I think that's a separate question to discuss; we had some reasons to do it.
Speaking of a connector, I actually have something in mind: "File Upload".
The idea is that a user should be able to create such a connector and upload files to it, so that their content is searchable by the knowledge base.
I think we already have some pieces in place; for example, the SharePoint connector does some text recognition on files. WDYT, @user, @user?
Sergey Frolov 6:07 AM
Sounds like a really nice and interesting feature!
Since users will be pushing files directly to the platform, I just have a few questions:
1. Should we restrict the file formats strictly to text-based types like .txt, .md, .pdf, and .docx?
2. Do you have a specific file size limitation you want me to enforce on the uploads?
Innokentii Konstantinov (archestra team) 12:56 PM
1. Text-based formats are fine for the first iteration.
2. Put in some meaningful file size restriction. I'm also happy to hear your ideas on how to store these uploaded files on our side 🙂
👍1
Sergey Frolov 1:01 PM
Thanks, Innokentii! I will check it
Sergey Frolov 2:14 PM
So we can use the existing kb_documents table to store the extracted file text and support RAG, and create a new kb_uploaded_files table that stores file_data as raw bytes so the user can CRUD uploaded files. The size limit will be 10 MB.
Also, if it is enough for the user to only see the uploaded file names and remove files, we can use kb_documents alone.
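
The two-table proposal above can be sketched roughly as follows. Only the names kb_documents, kb_uploaded_files, and file_data come from the thread; every other field, and the helper `buildUploadRows`, is a hypothetical illustration and may not match the real schema.

```typescript
// Hypothetical shape of a kb_documents row: the searchable text used for RAG.
interface KbDocumentRow {
  id: string;
  connectorId: string;
  title: string; // file name shown to the user
  content: string; // text extracted from the uploaded file
}

// Hypothetical shape of a kb_uploaded_files row: the raw upload.
interface KbUploadedFileRow {
  id: string;
  documentId: string; // links the raw file to its kb_documents row
  fileName: string;
  sizeBytes: number;
  fileData: Uint8Array; // raw bytes (the file_data column from the thread)
}

// Builds both rows for one upload. IDs are passed in so the function stays pure.
function buildUploadRows(
  ids: { documentId: string; fileId: string },
  connectorId: string,
  fileName: string,
  fileData: Uint8Array,
  extractedText: string,
): { document: KbDocumentRow; uploadedFile: KbUploadedFileRow } {
  const document: KbDocumentRow = {
    id: ids.documentId,
    connectorId: connectorId,
    title: fileName,
    content: extractedText,
  };
  const uploadedFile: KbUploadedFileRow = {
    id: ids.fileId,
    documentId: ids.documentId,
    fileName: fileName,
    sizeBytes: fileData.length,
    fileData: fileData,
  };
  return { document, uploadedFile };
}
```

Keeping the raw bytes in their own row means the text in kb_documents can be regenerated later without asking the user to upload the file again.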
Innokentii Konstantinov (archestra team) 2:16 PM
@user Can you share your github handle, so I will assign you?
Sergey Frolov 2:18 PM
@bigcheeseh
Innokentii Konstantinov (archestra team) 2:32 PM
Can you please leave a comment under the issue, so I can assign you?
👍1
Sergey Frolov 2:34 PM
added
Innokentii Konstantinov (archestra team) 2:39 PM
Assigned!
To your questions:
1. I think a 10 MB limit is fine for a start.
2. I feel that we need to store the file data too, since we may want to re-extract data from the files at some point, so we still have a source of truth.
👍1
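
The constraints agreed in the thread (text-based formats for the first iteration, a 10 MB limit) could be enforced with a check along these lines. This is a sketch under assumptions: the function name `validateUpload`, the error messages, the rejection of empty files, and reading "10 MB" as 10 * 1024 * 1024 bytes are all illustrative choices, not the merged implementation.

```typescript
// Assumption: "10 MB" is interpreted as 10 MiB.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

// Formats mentioned in the thread for the first iteration.
const ALLOWED_EXTENSIONS = [".txt", ".md", ".pdf", ".docx"];

type ValidationResult = { ok: true } | { ok: false; reason: string };

// Hypothetical validator: checks the extension first, then the size.
function validateUpload(fileName: string, sizeBytes: number): ValidationResult {
  // A leading dot (e.g. ".md") is treated as a hidden file with no extension.
  const dot = fileName.lastIndexOf(".");
  const extension = dot > 0 ? fileName.slice(dot).toLowerCase() : "";
  if (ALLOWED_EXTENSIONS.indexOf(extension) === -1) {
    return { ok: false, reason: "unsupported file type: " + (extension || "(none)") };
  }
  if (sizeBytes <= 0) {
    return { ok: false, reason: "file is empty" };
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "file exceeds the 10 MB limit" };
  }
  return { ok: true };
}
```

An extension check alone does not prove the content type, so a real upload path would likely also inspect the file contents before extracting text.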
Sergey Frolov 4:10 PM
Matvey Kukuy (archestra team) 5:26 PM
Hi @user, do you mind making a PR to the archestra repo, not to the fork?
Sergey Frolov 6:15 AM
Hi @user, sure, done! Didn't have permissions at first but found the button. PR is up.
Matvey Kukuy (archestra team) 11:33 AM
@user could you please share the link to the PR?
Matvey Kukuy (archestra team) 12:16 PM
@user I see there is a comment from @user
Sergey Frolov 12:19 PM
Yes, I completely disagree with issues 3 and 4; I will try to explain.
❤️1