Retrieving data through the API - performance is slow

from flask import Flask, jsonify

app = Flask(__name__)

# list_workspaces, workspace_list_applications, list_database_tables,
# list_database_table_views, list_database_table_grid_view_rows and
# get_database_table_field are thin wrappers around the Baserow REST
# API, defined elsewhere; strtoken holds the auth token.

@app.route('/api/data', methods=['GET'])
def get_data():
    data = []
    for eachworkspace in list_workspaces(strtoken):
        workspace_id = eachworkspace['id']
        workspace_name = eachworkspace['name']
        for eachdb in workspace_list_applications(strtoken, workspace_id):
            database_id = eachdb['id']
            database_name = eachdb['name']
            for eachtable in list_database_tables(strtoken, database_id):
                table_id = eachtable['id']
                table_name = eachtable['name']
                for eachview in list_database_table_views(strtoken, table_id):
                    view_id = eachview['id']
                    view_name = eachview['name']
                    for eachrow in list_database_table_grid_view_rows(strtoken, view_id)['results']:
                        newrow = eachrow.copy()
                        for columnid, cellvalue in eachrow.items():
                            # Cells come back keyed as 'field_<id>', so map
                            # each column id to its human-readable name.
                            if '_' in columnid:
                                newrow.pop(columnid)
                                realcolumnid = int(columnid.split('_')[1])
                                # One extra API call per cell to resolve the
                                # column name, since rows are keyed by id.
                                columnname = get_database_table_field(strtoken, realcolumnid)['name']
                                newrow[columnname] = cellvalue
                        # Append inside the row loop so every row is kept,
                        # not only the last one per view.
                        data.append({database_name: {table_name: newrow}})
    return jsonify(data)

I’ve come this far in getting the data by repeatedly calling your APIs. get_database_table_field in particular is called far too many times, and that is what makes retrieval slow.
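For illustration, one way to cut those repeated lookups is to memoize the call so each field id is fetched only once; a minimal sketch, assuming get_database_table_field is a plain function of the token and field id:

from functools import lru_cache

# Hypothetical memoized wrapper: each distinct field id hits the API
# once, no matter how many rows reference that column.
@lru_cache(maxsize=None)
def cached_field_name(field_id):
    return get_database_table_field(strtoken, field_id)['name']

# Inside the row loop, replace the direct API call with:
#   columnname = cached_field_name(realcolumnid)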

I am basically trying to integrate Baserow with Grafana so that data from Baserow can be queried and displayed in dashboards.

Please suggest how I can improve the performance of this code.

Thank you

I tried a little more tweaking, but the performance is still slow.

def find_name(data, number):
    # Walk the datantables structure built below (a list of dicts whose
    # values include lists of {field_id, field_name} sets) and return
    # the name stored alongside the given field id.
    for item in data:
        for _, value in item.items():
            if isinstance(value, list):
                for field in value:
                    if isinstance(field, set):
                        if number in field:
                            # The set holds exactly the id and the name, so
                            # removing the id leaves the name. (Fragile: it
                            # breaks if a field's name equals an id.)
                            return list(field - {number})[0]
    return None
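For illustration, a flat dict keyed by field id would make this lookup constant-time instead of scanning nested sets; a minimal sketch, assuming the same list_database_table_fields wrapper used in the next snippet:

# Hypothetical flat lookup: one dict of field id -> field name,
# filled once per table while walking the structure.
field_names = {}

def cache_table_fields(token, table_id):
    for field in list_database_table_fields(token, table_id):
        field_names[field['id']] = field['name']

# Then, inside the row loop:
#   columnname = field_names.get(realcolumnid)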



### Need ALL the data? Use this...
#
@app.route('/api/data', methods=['GET'])
def get_data():
    data = []
    datantables = []
    for eachworkspace in list_workspaces(strtoken):
        workspace_id = eachworkspace['id']
        workspace_name = eachworkspace['name']
        for eachdb in workspace_list_applications(strtoken, workspace_id):
            database_id = eachdb['id']
            database_name = eachdb['name']
            for eachtable in list_database_tables(strtoken, database_id):
                table_id = eachtable['id']
                table_name = eachtable['name']
                # Fetch the field list once per table instead of once per
                # cell, remembering each {id, name} pair for find_name.
                table_fields = []
                for eachtablefield in list_database_table_fields(strtoken, table_id):
                    table_fields.append({eachtablefield['id'], eachtablefield['name']})
                datantables.append({workspace_id: workspace_name,
                                    database_id: database_name,
                                    table_name: table_fields})
                for eachview in list_database_table_views(strtoken, table_id):
                    view_id = eachview['id']
                    view_name = eachview['name']
                    for eachrow in list_database_table_grid_view_rows(strtoken, view_id)['results']:
                        newrow = eachrow.copy()
                        for columnid, cellvalue in eachrow.items():
                            if '_' in columnid:
                                newrow.pop(columnid)
                                realcolumnid = int(columnid.split('_')[1])
                                # Resolve the column name from the local
                                # structure instead of another API call.
                                columnname = find_name(datantables, realcolumnid)
                                newrow[columnname] = cellvalue
                        # Append inside the row loop so every row is kept.
                        data.append({workspace_name: {database_name: {table_name: newrow}}})
    return jsonify(data)

It’s not slow because I’m searching the tree for the key and value; it’s slow because of the API calls.
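If the time really is going to sequential HTTP round trips, overlapping them can help; a rough sketch with a thread pool, assuming the wrappers are plain, thread-safe HTTP calls and view_ids has already been collected from the outer loops:

from concurrent.futures import ThreadPoolExecutor

def fetch_view_rows(view_id):
    # Each worker blocks on its own request, so several round trips
    # are in flight at once.
    return view_id, list_database_table_grid_view_rows(strtoken, view_id)['results']

with ThreadPoolExecutor(max_workers=8) as pool:
    rows_by_view = dict(pool.map(fetch_view_rows, view_ids))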

How do I use this API to access and process a lot of data? Can it be used for that?

Hi @computersrmyfriends, I don’t have any insight into what the list functions do exactly, but I assume they’re making an API call to fetch the content related to the name. If you have 5 workspaces, each with 5 databases, each having 5 tables with 5 views and 5 fields, you need to make well over a thousand requests to fetch all the data. That can indeed be a bit slow. This is not necessarily related to the API speed, but rather that it’s inefficient to fetch data this way.
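For concreteness, a back-of-the-envelope count of the requests those nested loops make with 5 of everything:

requests_needed = (
    1          # list workspaces
    + 5        # list applications, one per workspace
    + 5 * 5    # list tables, one per database
    + 5 ** 3   # list views, one per table
    + 5 ** 4   # list rows, one per view
)
print(requests_needed)  # 781, before any per-cell field lookups,
                        # which push the total well past a thousand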

Would you mind telling me more about your use case? Why do you need a full structure of all your databases?

Some initial ideas:

  • Instead of fetching all the workspaces and then their tables, you can use the list-applications endpoint from the Baserow API spec to fetch all the databases of all the workspaces the user has access to in one request.
  • There is no need to make a separate request to fetch the tables; they are already included in the list applications response (see the sketch after this list).
  • Depending on your use case, you can also check the Baserow API spec for endpoints that return the data without making separate fields and rows requests.
  • Are you aware that there is also a Baserow CLI command to export all the databases in a workspace to a structured JSON file? This is unfortunately not yet possible via the API.
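A rough sketch of the first two points (placeholder base URL, and assuming strtoken is a JWT): database applications in the list-applications response already carry their tables, so one request replaces the whole workspace/database/table traversal.

import requests

BASE_URL = 'https://api.baserow.io'  # placeholder instance URL

resp = requests.get(
    f'{BASE_URL}/api/applications/',
    headers={'Authorization': f'JWT {strtoken}'},
)
for application in resp.json():
    if application['type'] == 'database':
        # Tables come embedded in the response; no per-database request.
        for table in application.get('tables', []):
            print(application['name'], '->', table['name'], table['id'])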

Thank you, Bram. I need all workspaces to run queries since I’m creating a separate workspace for each user with a copy of the same database. This is necessary as each user shouldn’t be able to see the other’s database, but the admin should be able to see all of them.

Additionally, I need to query records based on databases in different workspaces, retrieve data from multiple tables together, and analyze them, etc.

When I posted this question, I was also trying to integrate Baserow with Grafana to create charts based on queries. However, I found it challenging and have since given up on it. Instead, I’ve been using Appsmith for simple analysis, spreadsheets for data analysis, and APIs with an API token to retrieve data.

The final method I’m using involves generating an API token if one does not exist and using the table APIs. This seems to work better, but I still need to test how it holds up when retrieving data repeatedly.
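For reference, with a database token the rows endpoint also accepts a user_field_names=true parameter, which returns cells keyed by field name and removes the id-to-name mapping step entirely; a sketch with placeholder table id and token:

import requests

TABLE_ID = 1              # placeholder table id
DATABASE_TOKEN = '...'    # placeholder database token

resp = requests.get(
    f'https://api.baserow.io/api/database/rows/table/{TABLE_ID}/',
    headers={'Authorization': f'Token {DATABASE_TOKEN}'},
    params={'user_field_names': 'true'},
)
for row in resp.json()['results']:
    # Rows arrive keyed by human-readable field names, so no
    # get_database_table_field / find_name pass is needed.
    print(row)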

Thank you for all your work on Baserow. I’ve settled on it for now, and I hope it works well for datasets of up to at least 50,000 records.
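On the 50,000-record point: the rows endpoint is paginated (page sizes up to 200), so a large table is walked page by page; a sketch reusing the placeholders above:

import requests

def iter_all_rows(table_id, token, base_url='https://api.baserow.io'):
    # Follow the paginated rows endpoint until 'next' runs out.
    page = 1
    while True:
        resp = requests.get(
            f'{base_url}/api/database/rows/table/{table_id}/',
            headers={'Authorization': f'Token {token}'},
            params={'user_field_names': 'true', 'page': page, 'size': 200},
        )
        payload = resp.json()
        yield from payload['results']
        if payload['next'] is None:
            return
        page += 1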

Thanks!